[Tickets #8535] Net_IMSP Sends too many strings as literals

bugs at horde.org bugs at horde.org
Thu Aug 27 19:50:00 UTC 2009


Ticket URL: http://bugs.horde.org/ticket/8535
------------------------------------------------------------------------------
  Ticket             | 8535
  Created By         | noah at lsit.ucsb.edu
  Summary            | Net_IMSP Sends too many strings as literals
  Queue              | Horde Framework Packages
  Version            | HEAD
  Type               | Enhancement
  State              | New
  Priority           | 1. Low
  Milestone          |
  Patch              | 1
  Owners             |
------------------------------------------------------------------------------


noah at lsit.ucsb.edu (2009-08-27 19:50) wrote:

Some users on our system have very large and complicated sets of
address books stored on a Cyrus IMSP server. Performance of the IMSP
backend is a serious issue for us. One thing I've noticed when
comparing IMSP conversations between Horde's IMSP client and, say,
Mulberry's is that Horde sends many more strings as literals
instead of double-quoted strings. From the RFC: "In the case of literals
transmitted from client to server, the client must wait to receive a
command continuation request (described later in this document) before
sending the octet data (and the remainder of the command)." Every
string sent as a literal therefore costs an extra round trip to the
server, so per user, per request, it adds up.
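
To make the cost concrete, here is a minimal sketch of how one string argument goes out on the wire in each form. The helper names are hypothetical and are not part of Net_IMSP; the sketch only models the chunks a client writes, with a pause for the server's continuation request between chunks.

```php
<?php
// Hypothetical helper, not part of Net_IMSP: split one string argument into
// the chunks a client writes, pausing for a continuation request between them.
function imsp_wire_segments(string $arg, bool $asLiteral): array
{
    if ($asLiteral) {
        // Literal: octet count in braces plus CRLF, then wait, then the octets.
        return ['{' . strlen($arg) . "}\r\n", $arg];
    }
    // Quoted string: sent inline with no wait. Assumes $arg contains no CR,
    // LF, double quote, backslash, or 8-bit bytes.
    return ['"' . $arg . '"'];
}

// Each boundary between chunks is one extra wait on the server.
function imsp_extra_round_trips(array $segments): int
{
    return count($segments) - 1;
}

$literal = imsp_wire_segments('First Last', true);
$quoted  = imsp_wire_segments('First Last', false);

echo imsp_extra_round_trips($literal), "\n"; // 1
echo imsp_extra_round_trips($quoted), "\n";  // 0
```

A command carrying several literal arguments pays this wait once per argument, which is where the per-request totals come from.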

I haven't done much timing benchmarking to back this up, but this seems
like one of the easiest and best ways to improve performance. For one
particular address book on our system, the number of separate IMSP
sends to the server for each request to Turba's browse interface
would normally be around 1900. More conservative use of literals drops
that to around 550 for the same request. We also have a custom mass
batching fetchaddress function that helps, so your mileage will vary.

Literal usage in the IMSP backend is governed by this definition:
define('IMSP_MUST_USE_LITERAL', "/[\W]/i");

The main reason /[\W]/i is overbroad is that it catches the vast
majority of contact names (generally "First Last" etc). Technically,
transferring all strings as literals would be valid behavior, but
there is a cost associated with waiting for all those command
continuation responses. It seems like there is some performance to be
gained by tightening the pattern so that, at a minimum, space
characters (and maybe commas) no longer force a literal, while CR and
LF still do.

From reading the IMSP RFC, looking at other client conversations, and
testing with imspd, it seems like a better value for
IMSP_MUST_USE_LITERAL might be:
define('IMSP_MUST_USE_LITERAL', "/[\x80-\xFF\\r\\n\"\\\\]/");

That is: any 8-bit character, CR, LF, double quote <">, or backslash
<\>. The regex is based on what a double-quoted string is NOT, and what
a literal IS (RFC bit follows below). It works well with Cyrus imspd.
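
As a rough illustration of the difference, the sketch below runs both patterns (copied from this ticket) over a handful of made-up sample values and counts how many would be forced into literal form. The function name and the sample data are assumptions for the example only.

```php
<?php
// Patterns copied from the ticket text.
$old = "/[\W]/i";
$new = "/[\x80-\xFF\\r\\n\"\\\\]/";

// Made-up sample values; the last one is UTF-8 "Jose" with an accented e,
// spelled with byte escapes.
$samples = [
    'First Last',
    'Doe, Jane',
    'jdoe',
    "Line one\r\nLine two",
    'Say "hi"',
    'back\\slash',
    "Jos\xC3\xA9",
];

// Hypothetical helper: count the values a pattern would send as literals.
function count_literals(string $pattern, array $values): int
{
    $n = 0;
    foreach ($values as $v) {
        if (preg_match($pattern, $v) === 1) {
            $n++;
        }
    }
    return $n;
}

printf("old pattern: %d of %d as literals\n", count_literals($old, $samples), count($samples));
printf("new pattern: %d of %d as literals\n", count_literals($new, $samples), count($samples));
```

With these samples the new pattern flags four of the seven values (the multi-line value, the one with a double quote, the one with a backslash, and the 8-bit one). The old pattern additionally flags 'First Last' and 'Doe, Jane', which is the common contact-name case.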

The new regex also handles the cases that were being fixed the last time
IMSP_MUST_USE_QUOTE was changed. Fields with multi-line input continue
to work fine, thanks to CR & LF in the regex. It's also been tested
with UTF-8, Latin-1, and ASCII chars.

In the provided patch, I've retained the /[\W]/i as  
IMSP_MUST_USE_QUOTE, but I've only used it once in the  
quoteSpacedString function, so it might not be necessary. I had been  
thinking there might be additional places in the IMSP code where  
quoteSpacedString would have to be called, but if there are, I have  
not found them.

From the RFC, regarding quoted strings and literals:
"A quoted string is a sequence of zero or more 7-bit characters,  
excluding CR and LF, with double quote (<">) characters at each end."
   TEXT_CHAR       ::= <any CHAR except CR and LF>
   QUOTED_CHAR     ::= <any TEXT_CHAR except quoted_specials> /
                        "\" quoted_specials

   quoted_specials ::= <"> / "\"

"A literal is a sequence of zero or more octets (including CR and LF)"
   literal         ::= "{" number "}" CRLF *CHAR8
                        ;; The number represents the number of CHAR8 octets

   CHAR            ::= <any 7-bit US-ASCII character except NUL, 0x01 - 0x7f>

   CHAR8           ::= <any 8-bit octet except NUL, 0x01 - 0xff>
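
One way to sanity-check the proposed regex against this grammar is to walk the bytes directly and compare. The sketch below is an assumption-laden check, not code from the patch: is_plain_quotable() is a hypothetical helper that accepts only bytes that are legal as unescaped QUOTED_CHARs, and the loop confirms the regex flags exactly the bytes that helper rejects.

```php
<?php
// Hypothetical helper: true when every byte of $s is a legal unescaped
// QUOTED_CHAR according to the grammar quoted above.
function is_plain_quotable(string $s): bool
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x01 || $b > 0x7f) {
            return false; // outside CHAR: NUL or an 8-bit octet
        }
        if ($b === 0x0d || $b === 0x0a) {
            return false; // CR or LF: not a TEXT_CHAR
        }
        if ($b === 0x22 || $b === 0x5c) {
            return false; // quoted_specials would need a backslash escape
        }
    }
    return true;
}

// Proposed pattern, copied from the ticket text.
$proposed = "/[\x80-\xFF\\r\\n\"\\\\]/";

// Check every single-byte string except NUL: the regex should demand a
// literal for exactly the bytes the grammar walk rejects.
$disagreements = [];
for ($b = 1; $b <= 255; $b++) {
    $s = chr($b);
    $regexSaysLiteral = preg_match($proposed, $s) === 1;
    if ($regexSaysLiteral === is_plain_quotable($s)) {
        $disagreements[] = $b;
    }
}
var_dump($disagreements); // expect an empty array
```

NUL is left out of the loop on purpose: the grammar excludes it from both CHAR and CHAR8, so neither form can carry it, and the regex does not test for it either. Strings containing a double quote or backslash could in principle be sent quoted with a backslash escape; the proposed regex sidesteps escaping by sending them as literals.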






