[imp] Eye-popping load averages

andrew morgan morgan@orst.edu
Wed, 26 Sep 2001 12:42:38 -0700 (PDT)


On Wed, 26 Sep 2001, Joseph Formoso wrote:

> Folks,
>
> 	I've seen something happen on our IMP box a few times now, and
> was wondering if any of y'all have ever seen similar behavior.  First,
> of course, the specifics:
>
> Linux Mandrake 7.0
> IMP 2.2.7-cvs
> Horde 1.2.7-cvs
> Apache 1.3.19 w/mod_ssl and PHP4.0.4pl1
>
> 	It talks to an IMAP server which lives on an IRIX box.  The
> problem we've seen is that, occasionally, our IRIX box gets a load
> average spike (often due to that wonder of SGI ingenuity, nsd).  The
> spike causes our IMAP and POP servers to stop taking connections (by
> design; we have some services suspend themselves when the load average
> hits a certain point to try to throttle the out-of-control spin).  This
> has happened a few times over the past few months, and each time, the
> IMP box has had its load average go through the ceiling immediately
> after the IRIX box's did, and could only be fixed by shutting down and
> restarting Apache.  The correlation suggests that *something* in the
> IMP/PHP/c-client/Apache setup is getting put in a strange state when
> the IMAP server suddenly goes away (or possibly when it suddenly comes
> back).
>
> 	Anyone else ever seen anything like this?  Failing that, any
> suggestions for trying to find a root cause?  I'm assuming it's
> something at the PHP/c-client level, but on the off chance that IMP is
> in some way involved (I don't know how), I figured I'd ask.

I've run into similar trouble if the IMAP server is stuck somehow.  We are
fighting a problem right now where our IMAP server stops handling
authentications (at the OS level), so users can never complete an IMAP
authentication.  When this happens, every new call to the IMAP servers
gets stuck and doesn't seem to time out.  Each stuck call ties up an
Apache process, and as users keep retrying, Apache keeps spawning new
processes that get stuck in turn.

I don't know if the same thing happens when the IMAP server is no longer
listening at all.  I would assume c-client would error out, but maybe it
is doing lots of retries?
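
To tell the two cases apart (a server that refuses connections versus
one that accepts them and then goes silent), a small probe with an
explicit timeout can help. The following is only a rough sketch in
Python, not anything taken from IMP or c-client: the function name
probe_imap, the result labels, and the 5-second default are all made up
for illustration. It relies on the fact that an IMAP server normally
sends a greeting line ("* OK" or "* PREAUTH") right after the connection
opens.

```python
import socket


def probe_imap(host, port=143, timeout=5.0):
    """Classify an IMAP endpoint (hypothetical helper, labels are ours).

    Returns one of:
      "ok"      - connected and received an IMAP greeting
      "refused" - nothing is listening on the port (fails fast)
      "stalled" - connect or greeting did not finish within `timeout`
      "error"   - any other failure, or an unexpected greeting
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Bound the wait for the greeting as well as the connect.
            conn.settimeout(timeout)
            greeting = conn.recv(512)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "stalled"
    except OSError:
        return "error"
    if greeting.startswith((b"* OK", b"* PREAUTH")):
        return "ok"
    return "error"
```

Run from the web box while the IMAP server is misbehaving, a "stalled"
result would point at calls hanging without a timeout, while "refused"
would mean the client is getting a quick error and something else (such
as retries) is driving the load.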

	Andy