code quality vs. economics (aka, the bottom line matters in your code)

I was recently asked to write some code that, taken by itself, is actually very bad code. Basically:

my $sql = 'select * from table';   # no where clause: fetch every row
my $sth = $dbh->prepare($sql);
$sth->execute();
while (my @array = $sth->fetchrow_array) {
  # filter each row in perl instead of in the database
  if ($array[4] =~ /^text_.*_text_$somevar.*$/) {
    do_some_work_here();
  }
}
What is happening here is that the hard work of the where clause has been moved from the sql server to the web server. When I suggested that this could be part of why our product is slow, I was told that adding a web server is much easier and cheaper than adding a database server.
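
For contrast, here is a minimal sketch of what the pushed-down version might look like. The table and column names ('table', 'col5') are placeholders since the real names are obscured above, and the regex is approximated with a LIKE pattern, which is close but not semantically identical (mysql also has a REGEXP operator if LIKE can't express the pattern):

my $sql = 'select * from table where col5 like ?';
my $sth = $dbh->prepare($sql);
# in a mysql LIKE, '\_' is a literal underscore and '%' plays the role of '.*'
$sth->execute("text\\_%\\_text\\_$somevar%");
while (my @array = $sth->fetchrow_array) {
  do_some_work_here();   # every row returned already matches
}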

So the next time this code needs to be modified, the developer will again have to muddle through two different filtering mechanisms, all for the sake of not overloading our mysql server(s).

Although I have not been expressly forbidden from making changes to the sql, any changes do need to be reviewed by someone who understands the much larger architecture better than I do.

5 Comments

I would think that a text pattern check (even with % wildcards) is cheaper than sending a record over the TCP connection.

First off, if the number of possible values of $somevar is small and known in advance, the best thing you can do is create index tables or columns in the database that pre-compute these tests, so that you can then turn your pattern match into a simple lookup.

This is analogous to pre-computing the uppercase of your text if you just know you want to search case-insensitively anyway.
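
To make that concrete, here is a rough sketch of the pre-computation, assuming a small fixed set of values; the table, column, and value names are all hypothetical:

# Run at load/update time: record which known $somevar value each row
# matches in an indexed column ('somevar_match' is a made-up name).
my @known_values = ('foo', 'bar', 'baz');   # assumed small and fixed
for my $val (@known_values) {
  $dbh->do('update table set somevar_match = ? where col5 like ?',
           undef, $val, "text\\_%\\_text\\_$val%");
}
# The hot path then becomes a plain indexed lookup:
#   select * from table where somevar_match = ?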

If that won't work, such as if $somevar has many possible values, then ...

With all this talk about saving on database servers, do these people think about all the work the database server is doing, plus the extra network traffic it generates, just to transfer all of that data so that your application server can process it?

The only way this can possibly be reasonable is if your application server basically needed all of those rows anyway, or was keeping, say, about half of them.

But if you're going to end up only using a small fraction of the rows, especially if it is say less than 5%, then it is much better to have the database do the filtering so that it only has to transfer a tiny fraction of the data.
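
To put illustrative numbers on it (not from the original post): with a million rows of roughly 200 bytes each, shipping the whole table moves about 200 MB per query, while a 5% match rate filtered in the database moves about 10 MB, one twentieth of the traffic, before the web server has spent a single cycle on regexes.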

You should really profile and see whether any CPU you're saving actually makes up for all the time wasted in the network transfer.
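
A quick way to check, sketched here with Time::HiRes; the connection details, table name, and pattern are placeholders rather than anything from the original code:

use strict;
use warnings;
use DBI;
use Time::HiRes qw(gettimeofday tv_interval);

my $dbh = DBI->connect('dbi:mysql:dbname', 'user', 'pass',
                       { RaiseError => 1 });
my $somevar = 'foo';   # placeholder value

# Time the fetch-everything, filter-in-perl version.
my $t0 = [gettimeofday];
my $sth = $dbh->prepare('select * from table');
$sth->execute();
my $kept = 0;
while (my @row = $sth->fetchrow_array) {
  $kept++ if $row[4] =~ /^text_.*_text_$somevar.*$/;
}
printf "perl-side filter: %.3fs (%d rows kept)\n", tv_interval($t0), $kept;

# Time the where-clause version.
$t0 = [gettimeofday];
$sth = $dbh->prepare('select * from table where col5 like ?');
$sth->execute("text\\_%\\_text\\_$somevar%");
$kept = 0;
$kept++ while $sth->fetchrow_arrayref;
printf "db-side filter:   %.3fs (%d rows kept)\n", tv_interval($t0), $kept;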

With all the extra work the database will have to do to write the whole table to the socket, I doubt there will be any saving in doing the filtering in perl.

As Darren says, the biggest saving would come from pre-computing, if that is possible and the same pattern is used many times. Imagine, when the db is populated, running something like:

update table_name set widget_type = 4 where column_name like 'abc%xyz';

Anyway, hard to comment further without knowing a bit more about the problem they/you are trying to solve.

It does not really matter how many rows there are - if it is cheaper for one row, then when you do it a million times it is still cheaper, just a million times over :)

About this Entry

This page contains a single entry by leonard published on May 21, 2010 1:20 PM.
