* * * * *

                       Death by a thousand SQL queries

The Company just got hired to take over the maintenance and development of a
mid-sized Web 2.0 social website that's the next Big Thing™ on the Internet.
For the past week we've been given access to the source code, set up a new
development server and have been basically poking around both the code and
the site.

The major problem with the site was performance—loads exceeding 50 were
common on both the webserver and database server. The site apparently went
live in January and has since grown quickly, straining the existing
infrastructure. That's where we come in, to help with “Project:
SocialSpace2.0” (running on the ubiquitous LAMP (Linux Apache MySQL PHP)
stack).

The site is written with PHP [1] (of course), and one of the cardinal rules
of addresssing performance issues is “profile, profile, profile.”—the bottle
neck is almost never where you think it is. Now, I've profiled code before
[2], but that was C, not PHP. I'm not even sure where one would begin to
profile PHP code. And even if we had access to a PHP profiler, profiling the
program on the development server may not be good enough (the development
server has maybe half the data of the production server, which may not have
the pathological cases the production server might encounter).

So what to do as the load increases on the webserver?

Well, this answer to profiling C++ code [3] gave me an idea. In one window I
ran top [4]. In another window a command line. When a particular instance of
Apache hit the CPU (Central Processing Unit) hard as seen in top, I quickly
get a listing of open files in said process (listing the contents of
/proc/pid/fd to find the ofending PHP file causing the load spike).

Laugh if you will, but it worked. About half a dozen checks lead to one
particular script causing the issue—basically a “people who viewed this
profile also viewed these profiles” script.

I checked the code in question and found the following bit of code (in
pseudocode, to basically protect me):

> for viewers in SELECT userID
>               FROM people_who_viewed
>               WHERE profileID = {userid}
>               ORDER BY RAND()
>   for viewees in SELECT profileID
>               FROM people_who_viewed
>               WHERE userID = {viewers['userID']}
>               ORDER BY RAND()
>     ...
>   end
> end
>

Lovely!

An O(n^2) algorithm [5]—in SQL (Structured Query Language) no less!

No wonder the site was dying.

Worse, the site only displayed about 10 results anyway!

A simple patch:

> for viewers in SELECT userID
>               FROM people_who_viewed
>               WHERE profileID = {userid}
>               ORDER BY RAND() **LIMIT 10**
>   for viewees in SELECT profileID
>               FROM people_who_viewed
>               WHERE userID = {viewers['userID']}
>               ORDER BY RAND() **LIMIT 10**
>     ...
>   end
> end
>

And what do you know? The site is actually usable now.

[1] http://www.php.net/
[2] gopher://gopher.conman.org/0Phlog:2010/05/12.1
[3] http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024
[4] http://en.wikipedia.org/wiki/Top_(Unix)
[5] http://en.wikipedia.org/wiki/Quadratic_time#Polynomial_time

Email author at [email protected]