[horde] Horde/Apache instability problems
Daniel A. Ramaley
daniel.ramaley at DRAKE.EDU
Mon Jun 12 11:45:12 PDT 2006
Hello. I've posted a few times about a few different (but possibly
related) problems with my Horde installation. I've recently been able
to gather some more information and am hoping someone might have an
idea or two about what I should try next. The next two paragraphs
briefly re-summarize the problems and the server configuration; later
in the message I describe what I've done since my last posts on these
problems.
In brief, the problems are as follows. First, Apache sometimes
segfaults; this seems to happen sporadically, and the server can go for
weeks without a crash and then produce numerous segfaults over a couple
of days, even with the Apache daemon being restarted regularly. The
second problem is "unexpected EOF on client connection" errors in
PostgreSQL's log file. These occur very frequently: once every few
minutes when the server is under a light load (such as now, when most
of the students have left for the summer), and several times a minute
under a heavier load. The third problem is that occasionally Apache
just stops responding to web requests until the daemon is restarted.
When this occurs, the line "server reached MaxClients setting, consider
raising the MaxClients setting" is added to Apache's error_log.
The hardware is a Sun Fire v40z (dual 64-bit Opteron CPUs, 4 GB RAM).
The relevant software it runs is: Red Hat Enterprise AS 4.3,
Apache 2.0.52, PHP 4.3.9, PostgreSQL 7.4.8, eAccelerator 0.9.4 (with
optimization off), and UP-imapproxy 1.2.4. The installed Horde software
consists of the latest release versions of Horde, Imp, Ingo, Kronolith,
Passwd, and Turba.
Now for the recently discovered information:
It was suggested that I try running Apache with CoreDumpDirectory
defined. I have done that, and Apache does dump core when the segfault
problem occurs. Since that problem is somewhat rare and the server
isn't under a very heavy load during the summer, I had to wait a while
before collecting a set of core dumps. I'm not familiar with analyzing
core dumps, but I loaded each dump into gdb and asked for a backtrace.
The first lines of the backtraces all looked like this, except that the
op_array values varied between dumps:
(gdb) bt
#0 0x0000002a99ff2492 in preg_replace_impl (ht=Variable "ht" is not
available.)
at /usr/src/redhat/BUILD/php-4.3.9/ext/pcre/php_pcre.c:1154
#1 0x0000002a9a0ac255 in execute (op_array=0x552afe2798)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1640
#2 0x0000002a9a0a9386 in execute (op_array=0x552aff7fb8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
#3 0x0000002a9a0a9386 in execute (op_array=0x552b0e26c8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
#4 0x0000002a9a0a9386 in execute (op_array=0x552af80db8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
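In case it helps to know how the dumps were obtained, the procedure was
roughly the following; the core directory path and the core file name
are just examples, and on this system the Apache binary is
/usr/sbin/httpd:

# in httpd.conf; the directory must be writable by the Apache children
CoreDumpDirectory /tmp/apache-cores

# then, for each core file (the file name here is only an example):
gdb /usr/sbin/httpd /tmp/apache-cores/core.12345
# at the gdb prompt, "bt" gives the short backtrace quoted above;
# "bt full" also prints local variables when symbols permit
(gdb) bt
(gdb) bt full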
I tried recompiling PHP with --enable-debug, hoping to get more
information, but unfortunately PHP from Red Hat's source RPM won't
compile with that configure flag. Is there something I should do to
pursue this further? If more debugging information would be helpful, I
can put more time into compiling PHP with --enable-debug, though I'll
probably have to bypass the package management system and install it
completely by hand.
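The hand-built route I have in mind is basically the stock source
build, roughly as sketched below; the configure options other than
--enable-debug are from memory and would need to be adjusted to match
Red Hat's RPM more closely:

tar xjf php-4.3.9.tar.bz2
cd php-4.3.9
# --with-apxs2 points at Apache 2's apxs; other extension flags omitted
./configure --enable-debug --with-apxs2=/usr/sbin/apxs --with-pgsql
make
make install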
In a different thread it was suggested that I configure Apache's
server-status handler and try to retrieve information that way when
Apache becomes unresponsive. I did that and can post the whole report
if someone wants to look at it, but I believe the most important bit is
the "scoreboard", which tells me there were 128 processes (128 being
the MaxClients setting) in the following states:
119 processes closing connections
2 processes reading requests
4 processes sending replies
3 processes waiting for connections
I checked the server-status page this morning when the server was
behaving properly just to get some typical values. It was running far
fewer processes, with only 2 in the closing state:
2 closing connections
11 reading requests
3 sending replies
9 waiting for connections
Any ideas what could occasionally leave so many connections stuck in
the closing state, to the point of bogging down the server?
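For reference, the status handler was enabled with a stanza roughly
like the one below; the address restriction shown is an example rather
than our exact access list:

# ExtendedStatus adds per-request detail to the report
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>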
One other thing I tried was turning up Apache's LogLevel. When it is
set to "debug", I get regular entries (several per minute) in the
error_log like these:
[Fri Jun 09 13:56:17 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25783
[Fri Jun 09 13:56:17 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25782
[Fri Jun 09 13:56:21 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25791
[Fri Jun 09 13:56:25 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25802
[Fri Jun 09 13:56:26 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25803
I'm not sure yet what significance those util_ldap debug lines have, if
any. If anyone knows more, I'd appreciate your sharing the information.
Thanks in advance for any ideas on how to make Apache more stable.
------------------------------------------------------------------------
Dan Ramaley
Network Programmer/Analyst
Dial Center 118, Drake University
2407 Carpenter Ave, Des Moines IA 50311 USA
+1 515 271-4540