[horde] Horde/Apache instability problems
Daniel A. Ramaley
daniel.ramaley at DRAKE.EDU
Mon Jun 12 11:45:12 PDT 2006
Hello. I've posted a few times about a few different (but possibly
related) problems with my Horde installation. I've recently been able
to gather some more information and am hoping someone might have an
idea or two about what I should try next. The next two paragraphs
briefly re-summarize the problems and the server configuration; later
in the message I describe what I've done since my last posts on these
problems.
In brief, the problems are as follows. First, Apache sometimes
segfaults; this seems to happen sporadically, and the server can go for
weeks without a crash and then produce numerous segfaults over a couple
of days, even with the Apache daemon being restarted regularly. The
second problem is "unexpected EOF on client connection" errors in
PostgreSQL's log file. These occur very frequently: once every few
minutes when the server is under a light load (such as now, when most
of the students have left for the summer), and several times a minute
under a heavier load. The third problem is that occasionally Apache
just stops responding to web requests until the daemon is restarted.
When this occurs, the line "server reached MaxClients setting, consider
raising the MaxClients setting" is added to Apache's error_log.
The hardware is a Sun Fire v40z (dual 64-bit Opteron CPUs, 4 GB RAM).
The relevant software it runs is: Red Hat Enterprise AS 4.3,
Apache 2.0.52, PHP 4.3.9, PostgreSQL 7.4.8, eAccelerator 0.9.4 (with
optimization off), and UP-imapproxy 1.2.4. The installed Horde software
consists of the latest release versions of Horde, Imp, Ingo, Kronolith,
Passwd, and Turba.
Now for the recently discovered information:
It was suggested that I try running Apache with CoreDumpDirectory
defined. I have done that, and Apache does dump core when the segfault
problem occurs. Since that problem is somewhat rare and the server
isn't under a very heavy load during the summer, I had to wait a while
before collecting a set of core dumps. I'm not familiar with analyzing
core dumps, but I loaded each dump into gdb and asked for a backtrace.
The first lines of the backtraces all looked like this, except that the
op_array values varied between dumps:
(gdb) bt
#0 0x0000002a99ff2492 in preg_replace_impl (ht=Variable "ht" is not
available.)
at /usr/src/redhat/BUILD/php-4.3.9/ext/pcre/php_pcre.c:1154
#1 0x0000002a9a0ac255 in execute (op_array=0x552afe2798)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1640
#2 0x0000002a9a0a9386 in execute (op_array=0x552aff7fb8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
#3 0x0000002a9a0a9386 in execute (op_array=0x552b0e26c8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
#4 0x0000002a9a0a9386 in execute (op_array=0x552af80db8)
at /usr/src/redhat/BUILD/php-4.3.9/Zend/zend_execute.c:1684
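In case it helps to know how the dumps were obtained, the procedure was
roughly the following; the core directory path and the core file name
are just examples, and on this system the Apache binary is
/usr/sbin/httpd:

# in httpd.conf; the directory must be writable by the Apache children
CoreDumpDirectory /tmp/apache-cores

# then, for each core file (the file name here is only an example):
gdb /usr/sbin/httpd /tmp/apache-cores/core.12345
# at the gdb prompt, "bt" gives the short backtrace quoted above;
# "bt full" also prints local variables when symbols permit
(gdb) bt
(gdb) bt full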
I tried recompiling PHP with --enable-debug, hoping to get more
information, but unfortunately PHP from Red Hat's source RPM won't
compile with that configure flag. Is there something I should do to
pursue this further? If more debugging information would be helpful, I
can put more time into compiling PHP with --enable-debug, though I'll
probably have to bypass the package management system and install it
completely by hand.
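The hand-built route I have in mind is basically the stock source
build, roughly as sketched below; the configure options other than
--enable-debug are from memory and would need to be adjusted to match
Red Hat's RPM more closely:

tar xjf php-4.3.9.tar.bz2
cd php-4.3.9
# --with-apxs2 points at Apache 2's apxs; other extension flags omitted
./configure --enable-debug --with-apxs2=/usr/sbin/apxs --with-pgsql
make
make install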
In a different thread it was suggested that I configure Apache's
server-status handler and try to retrieve information that way when
Apache becomes unresponsive. I did that and can post the whole report
if someone wants to look at it, but I believe the most important bit is
the "scoreboard", which tells me there were 128 processes (128 being
the MaxClients setting) in the following states:
119 processes closing connections
2 processes reading requests
4 processes sending replies
3 processes waiting for connections
I checked the server-status page this morning when the server was
behaving properly just to get some typical values. It was running far
fewer processes, with only 2 in the closing state:
2 closing connections
11 reading requests
3 sending replies
9 waiting for connections
Any ideas what could occasionally leave so many connections stuck in
the closing state, to the point of bogging down the server?
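For reference, the status handler was enabled with a stanza roughly
like the one below; the address restriction shown is an example rather
than our exact access list:

# ExtendedStatus adds per-request detail to the report
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>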
One other thing I tried was turning up Apache's LogLevel. When it is
set to "debug", I get regular entries (several per minute) in the
error_log like these:
[Fri Jun 09 13:56:17 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25783
[Fri Jun 09 13:56:17 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25782
[Fri Jun 09 13:56:21 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25791
[Fri Jun 09 13:56:25 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25802
[Fri Jun 09 13:56:26 2006] [debug] util_ldap.c(1441): INIT global
mutex /tmp/filessX9mx in child 25803
I'm not sure yet what significance those util_ldap debug lines have, if
any. If anyone knows more, I'd appreciate your sharing the information.
Thanks in advance for any ideas on how to make Apache more stable.
------------------------------------------------------------------------
Dan Ramaley
Network Programmer/Analyst
Dial Center 118, Drake University
2407 Carpenter Ave, Des Moines IA 50311 USA
+1 515 271-4540