[cvs] [Wiki] created: RunawayApacheProcesses
Michael Slusarz
slusarz at horde.org
Tue Jan 30 09:15:04 PST 2007
slusarz Tue, 30 Jan 2007 09:15:04 -0800
Created page: http://wiki.horde.org/RunawayApacheProcesses
+ Runaway Apache Processes
++ Need for Analysis
The following is adapted from [http://lists.horde.org/archives/horde/Week-of-Mon-20070129/032765.html this post]:
When dealing with runaway Apache processes, it would **tremendously** help if you could figure out the URIs these runaway httpd processes are attempting to serve. Horde developers can give much better advice and, if appropriate, attempt to remedy the problem when we know what the problem is.
For example, we recently ran into an issue where a Horde installation was seeing runaway processes. That information alone didn't help much with debugging; after all, a full Horde installation contains tens of thousands of lines of code. However, analysis of the runaway httpd processes showed that each one was serving a request for a specific page in kronolith with a single specific parameter passed in. That allowed me to track the problem down in an hour and make this tiny change: http://lists.horde.org/archives/cvs/Week-of-Mon-20070115/064855.html
Here was the server analysis that sparked the discovery (thanks nuno):
<code>
20:19:59 up 29 days, 3:14, 1 user, load average: 55.16, 38.82, 23.92
412 processes: 403 sleeping, 8 running, 1 zombie, 0 stopped
CPU states: 82.5% user, 17.5% system, 0.0% nice, 0.0% idle
Mem: 3883356K total, 3865044K used, 18312K free, 207128K buffers
Swap: 979924K total, 61724K used, 918200K free, 2434148K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
4858 webmail 25 0 29196 28M 24876 R 71.7 0.7 17:39 httpd
27948 webmail 25 0 31888 31M 27104 R 13.5 0.8 6:20 httpd
PID REQUEST
4858 GET /kronolith/year.php?year=1104543267 HTTP/1.1
27948 GET /kronolith/year.php?year=1199151268 HTTP/1.1
</code>
The biggest obstacle was that I didn't see this on my local machine when debugging, because the runaway process turned out to be caused by a PHP bug. Only after I had narrowed the potential issues down to a few lines of code was I able to cross-reference the PHP bugs database and determine that this installation was running an older version of PHP with a bug in the mktime() function. My local installation wasn't seeing any problems because it was running a newer version of PHP.
As for how to do this server analysis: that's not something I am personally going to be able to help with, since it will vary depending on your setup. But if you can at least track the runaway processes down to a specific URI (and most likely a specific e-mail message), we will be more than happy to try to track it down.
++ How to Analyze
Here is information useful in linking the PIDs to the runaway requests (adapted from [http://lists.horde.org/archives/horde/Week-of-Mon-20070129/032771.html this post]):
I got the association between PIDs and requests through Apache's "server-status" page (provided by mod_status). What I usually do is run 'top' and {{lynx -dump http://my.server/server-status}} in separate windows. I usually start top first, then run lynx and press 'space, q' in top right away, so I'm almost sure that the PIDs reflect the requests in question (remember that in most configurations one Apache process serves several requests over its lifetime).
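The matching step can also be scripted. What follows is only a rough sketch in Python, not part of the original post. It assumes the text dump contains one row per worker, starting with a slot such as {{0-0}} followed by the PID and ending with the request line; as far as I know this requires {{ExtendedStatus On}}, and the exact column layout varies between Apache versions. The function names {{requests_by_pid}} and {{runaway_requests}} are made up for illustration.

```python
import re

# Assumed row shape in the text dump of server-status (may differ per Apache
# version): "<slot>-<generation> <pid> ... <METHOD> <uri> HTTP/x.y"
ROW = re.compile(
    r"^\s*\d+-\d+\s+(?P<pid>\d+)\s.*?"
    r"(?P<request>(?:GET|POST|HEAD|PUT|DELETE|OPTIONS)\s+\S+\s+HTTP/\d\.\d)\s*$"
)


def requests_by_pid(status_text):
    """Map PID -> request line for every worker row found in the dump."""
    result = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match:
            result[int(match.group("pid"))] = match.group("request")
    return result


def runaway_requests(status_text, pids):
    """Keep only the rows whose PID is in `pids` (the busy PIDs seen in top)."""
    found = requests_by_pid(status_text)
    return {pid: found[pid] for pid in pids if pid in found}
```

To use it, save the output of the lynx command to a file, read it into a string, and call {{runaway_requests(text, [4858, 27948])}} with the PIDs that top shows at the head of the list. PIDs that are missing from the result were most likely recycled between the two snapshots, so take both again.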
If you run Linux and you want to know which user is being served by that request, you can also do a {{ls -l /proc/<pid>/fd}} to find the session file the process has open, and then look inside that file.
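The same lookup can be sketched in Python. This is an illustration under assumptions, not a tested tool: it assumes PHP's default file-based session handler, which names its files {{sess_<id>}}, and it needs to run as root or as the web server user to read another process's descriptors. The helper names {{open_files}} and {{session_files}} and the {{proc_root}} parameter are made up here.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Return the targets of all descriptors open in process `pid` (Linux)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the open file, socket or pipe.
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets


def session_files(pid, proc_root="/proc"):
    """Keep only targets that look like PHP file-based session files."""
    return [
        target
        for target in open_files(pid, proc_root)
        if os.path.basename(target).startswith("sess_")
    ]
```

Calling {{session_files(4858)}} would then list the session file(s) held open by that httpd process, if any. A process only holds the file while it is actually handling the request, so run this while the process is still spinning.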