[horde] Apache/Horde unresponsive
Daniel A. Ramaley
daniel.ramaley at DRAKE.EDU
Mon May 22 09:00:15 PDT 2006
Hello. The Horde installation that i maintain has been having some
issues recently. What i am looking for is some ideas as to what
debugging steps to try next. The problem is that Apache stops
responding to web requests. First i'll describe the server's
configuration, then the problem in more detail:
The server is a Sun Fire v40z, which is a dual processor AMD Opteron box
with 4 GB RAM. It has an SSL accelerator card installed (an nFast Ultra
from Ncipher) which handles encryption of https traffic. The operating
system is Red Hat Enterprise Linux AS 4.3 with all updates applied within a
day of their availability. It is running Apache 2.0.52 with PHP 4.3.9
and PostgreSQL 7.4.8. We are using eAccelerator 0.9.4 to cache the
compiled PHP scripts, but the accelerator's code optimizer is turned
off. UP-imapproxy 1.2.4 is also installed, and Horde is configured to
use it. The Horde software we have installed, with versions, is:
Horde 3.1.1
Imp 4.1.1
Ingo 1.1.1
Kronolith 2.1.1
Passwd 3.0
Turba 2.1.1
The only setting that i know of which goes against recommendations is
in php.ini, where memory_limit is set to 128M. I could remove that limit
if it would help, though i'm not sure what (besides a bug in PHP) could
cause PHP to use more than 128M per request; the maximum message size
that our IMAP server will accept is 8M, so Imp should never have
anywhere near 128M to deal with at any one time.
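As a rough sanity check on those figures, the sketch below multiplies the
128M memory_limit by the MaxClients value of 128 (mentioned further down)
and compares the result with the 4 GB of RAM. It assumes the prefork MPM
with one PHP request per Apache child, and it assumes every child hits the
limit at the same moment; that is a worst-case upper bound for
illustration, not something that has been measured on this server. The
function name is made up for the example.

```python
# Worst-case memory arithmetic using the figures quoted in this post.
# Assumption (not measured): every Apache child runs one PHP request
# and all of them reach memory_limit at the same time.

MAX_CLIENTS = 128        # MaxClients from httpd.conf
MEMORY_LIMIT_MB = 128    # memory_limit from php.ini
RAM_MB = 4 * 1024        # 4 GB of physical RAM


def worst_case_mb(max_clients, memory_limit_mb):
    """Upper bound on PHP memory if every child hits its limit at once."""
    return max_clients * memory_limit_mb


worst = worst_case_mb(MAX_CLIENTS, MEMORY_LIMIT_MB)
print("worst case: %d MB, physical RAM: %d MB, ratio: %.1fx"
      % (worst, RAM_MB, worst / RAM_MB))
```

The bound comes out well above the installed RAM, so the limit by itself
does not guarantee the box stays out of swap if many requests grow large
at once. Whether that ever happens here is a separate question.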
When Apache intermittently stops responding to web requests, there is an
entry like this printed in Apache's error log:
    server reached MaxClients setting, consider raising the
    MaxClients setting
MaxClients is set to 128. While we have approximately 5000 people who
use the server, i doubt that more than 128 simultaneous connections are
ever made. I have checked Apache's access_log and error_log at the
times when the above message occurs, and not found anything unusual.
From the access_log it looks like several users are logged in and
reading their e-mail, and the server is handling several connections
per second. Right up until it stops responding, however,
the server responds very quickly to all requests. I am confident that
the server is not overloaded.
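One way to put a number on "several connections per second" is to count
access_log entries per timestamp. This is a minimal sketch, assuming the
log uses the bracketed timestamp of the common/combined log format; the
function names and the sample lines are made up for illustration. One
caveat: Apache writes an access_log entry only when a request finishes, so
requests that hang while holding a child process would not appear in this
count.

```python
import re
from collections import Counter

# Matches the bracketed timestamp of the common/combined log format,
# e.g. [19/May/2006:09:15:01 -0500]; the captured group has 1 s resolution.
TIMESTAMP = re.compile(
    r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_second(lines):
    """Count completed requests for each one-second timestamp."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest_seconds(lines, top=5):
    """Return the `top` timestamps with the most completed requests."""
    return requests_per_second(lines).most_common(top)


# Hypothetical sample lines, not taken from the real access_log.
SAMPLE = [
    '10.0.0.1 - - [19/May/2006:09:15:01 -0500] "GET /a HTTP/1.1" 200 512',
    '10.0.0.2 - - [19/May/2006:09:15:01 -0500] "GET /b HTTP/1.1" 200 512',
    '10.0.0.1 - - [19/May/2006:09:15:02 -0500] "GET /c HTTP/1.1" 200 512',
]

for stamp, count in busiest_seconds(SAMPLE):
    print(stamp, count)
```

On a real log the same functions could be fed an open file object, for
example `busiest_seconds(open("access_log"))`, with the path adjusted to
wherever the log actually lives.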
For now i've installed swatch and configured it to do several things
when that occurs: send me an e-mail alert, stop Apache, stop the IMAP
proxy, clear the PHP accelerator's cache, restart the PostgreSQL
database, start the IMAP proxy, and finally start Apache again. That
rather drastic action seems to get everything working again, but only
temporarily. The MaxClients message can occur again in anywhere from a
few minutes to a few weeks. Just stopping Apache, cleaning the PHP
accelerator cache and starting Apache again works to temporarily
correct the problem, but i added in the other steps to see if it would
lead to more stability. So far it has not.
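For reference, the restart sequence described above could be sketched as
follows. Every command and path in it is an assumption for illustration:
the init script names (httpd, imapproxy, postgresql) and the eAccelerator
cache directory are guesses at a typical Red Hat layout, not the actual
configuration, and the e-mail alert step is left to swatch. The script
defaults to a dry run that only prints what it would do.

```python
import subprocess

# Hypothetical commands mirroring the order described in the post.
# Service names and the cache path are assumptions, not the real setup.
RESTART_STEPS = [
    ["/sbin/service", "httpd", "stop"],
    ["/sbin/service", "imapproxy", "stop"],
    ["/bin/sh", "-c", "rm -f /var/cache/eaccelerator/*"],
    ["/sbin/service", "postgresql", "restart"],
    ["/sbin/service", "imapproxy", "start"],
    ["/sbin/service", "httpd", "start"],
]


def run_steps(steps, dry_run=True, runner=subprocess.call):
    """Run each command in order and record its exit code.

    With dry_run=True nothing is executed; each command is printed and
    its exit code is recorded as None.
    """
    results = []
    for cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append((cmd, None))
        else:
            results.append((cmd, runner(cmd)))
    return results


if __name__ == "__main__":
    run_steps(RESTART_STEPS, dry_run=True)
```

Recording the exit code of each step makes it possible to see afterwards
which part of the sequence failed, if any did.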
Horde had been running fine for several weeks without requiring my
attention until last Friday (May 19). At the time the server was under
a lighter load than normal; this is at a university and students had
gone home for the summer the weekend before. But Friday morning Apache
stopped responding several times. I completely rebooted the server just
to make sure there wasn't some subtle software problem, but the problem
continued to occur after the reboot. That is when i installed swatch
and set up the automatic restart of Apache. Between 17:00 on Friday
when i went home for the evening and Sunday at 04:00 swatch had to
restart Apache a total of 55 times! I've investigated Apache's logs
around some of those times and everything looked normal; the server was
under a light load with just a handful of people reading e-mail with
Imp. If the server were under some sort of scripted attack it should
have shown up in the access_log (this has happened before; the server
usually just shrugs off automated attacks from worms as if nothing is
happening). However, since Sunday morning swatch hasn't had to restart
Apache at all. The last time swatch restarted Apache was about 30 hours
ago. And everything has run just fine since then, but there is no way
to tell when the problem will happen again. Over the weekend the
problem was occurring frequently enough that users noticed and began to
complain about it.
Has anyone else experienced problems similar to this? Did you figure out
what was misconfigured or broken? What should i look at next? Apache's
logs so far have been less than helpful. Thanks in advance for any
assistance.
------------------------------------------------------------------------
Dan Ramaley                        Dial Center 118, Drake University
Network Programmer/Analyst         2407 Carpenter Ave
+1 515 271-4540                    Des Moines IA 50311 USA