[Tickets #8535] Net_IMSP Sends too many strings as literals
bugs at horde.org
Thu Aug 27 19:50:00 UTC 2009
DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.
Ticket URL: http://bugs.horde.org/ticket/8535
------------------------------------------------------------------------------
Ticket | 8535
Created By | noah at lsit.ucsb.edu
Summary | Net_IMSP Sends too many strings as literals
Queue | Horde Framework Packages
Version | HEAD
Type | Enhancement
State | New
Priority | 1. Low
Milestone |
Patch | 1
Owners |
------------------------------------------------------------------------------
noah at lsit.ucsb.edu (2009-08-27 19:50) wrote:
Some users on our system have very large and complicated sets of
address books stored on a cyrus IMSP server. Performance of the IMSP
backend is a serious issue for us. One thing I've noticed when
comparing IMSP conversations from Horde's IMSP client with those from,
say, Mulberry is that Horde sends many more strings as literals
instead of as double-quoted strings. From the RFC: "In the case of literals
transmitted from client to server, the client must wait to receive a
command continuation request (described later in this document) before
sending the octet data (and the remainder of the command)." Every time
a string is sent as a literal there is an extra round trip to the
server, so per user, per request, the cost adds up.
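To make the round-trip cost concrete, here is a minimal sketch (in Python rather than Horde's PHP, purely for illustration) of how a client has to split one command into separate sends. The tag `A001`, the command name `CMD`, and the helper `split_into_sends` are hypothetical placeholders, not real IMSP commands or Net_IMSP functions.

```python
import re
from typing import Callable, List


def split_into_sends(tag: str, command: str, args: List[bytes],
                     needs_literal: Callable[[bytes], bool]) -> List[bytes]:
    """Split one command into the separate chunks a client must send.

    A quoted argument is appended to the current chunk.  A literal argument
    ends the chunk with its "{n}" CRLF marker; the client must then wait for
    the server's "+" continuation before sending the literal's octets and
    the rest of the command, which therefore start a new chunk.
    """
    sends: List[bytes] = []
    current = f"{tag} {command}".encode("ascii")
    for arg in args:
        if needs_literal(arg):
            current += b" {%d}\r\n" % len(arg)
            sends.append(current)  # stop here and wait for "+"
            current = arg          # octets go out after the continuation
        else:
            # Assumes the predicate already rejected anything a quoted
            # string cannot hold, so no escaping is done here.
            current += b' "' + arg + b'"'
    sends.append(current + b"\r\n")
    return sends


# Current rule from the IMSP backend: any non-word character forces a literal.
old_rule = re.compile(rb"[\W]", re.I)
sends = split_into_sends("A001", "CMD", [b"Personal", b"John Smith"],
                         lambda a: old_rule.search(a) is not None)
for chunk in sends:
    print(chunk)
```

Because "John Smith" contains a space, the old rule turns this one command into two sends. Each additional literal argument adds exactly one more send, and a command with no literals goes out in a single send.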
I didn't do much timing benchmarking to back this up, but this seems
like one of the easiest and most effective ways to improve performance.
For one particular address book on our system, Turba's browse interface
would normally make around 1900 separate IMSP sends to the server per
request. More conservative use of literals drops that to around 550 for
the same request. We also have a custom mass-batching fetchaddress
function that helps, so mileage will vary.
Literal usage in the IMSP backend is governed by this definition:
define('IMSP_MUST_USE_LITERAL', "/[\W]/i");
The main reason /[\W]/i is overbroad is that it matches the vast
majority of contact names (generally "First Last", etc.). Technically,
transferring all strings as literals would be valid behavior, but
there is a cost to waiting for all those command continuation
responses. It seems like there is performance to be gained by
tightening the use of literals so that it (at least) excludes space
characters (while still including CR and LF) and perhaps commas.
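A quick way to see how broad the current rule is: the Python sketch below translates /[\W]/i into a bytes regex and runs it over a few typical address-book values. It assumes that Python's bytes-mode \W ("not [A-Za-z0-9_]") behaves like PCRE's default, non-UTF-8 character tables; the sample values are made up.

```python
import re

# Bytes translation of the current PHP definition "/[\W]/i".  Assumption: in a
# bytes pattern Python's \W means "not [A-Za-z0-9_]", taken here to match
# PCRE's default (non-UTF-8) tables.  The /i flag is redundant for \W.
OLD_MUST_USE_LITERAL = re.compile(rb"[\W]", re.I)

samples = [b"Smith", b"John Smith", b"Smith, John", b"jsmith@example.com"]
for s in samples:
    print(s, OLD_MUST_USE_LITERAL.search(s) is not None)
```

Only the single bare word avoids a literal; the full name, the "Last, First" form and the email address all trigger one.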
From reading the IMSP RFC, looking at other clients' conversations, and
testing with imspd, it seems like a better value for
IMSP_MUST_USE_LITERAL might be:
define('IMSP_MUST_USE_LITERAL', "/[\x80-\xFF\\r\\n\"\\\\]/");
That is: any 8-bit character, CR, LF, double quote (<">), or backslash
(\). The regex is based on what a double-quoted string is NOT and what
a literal IS (the relevant RFC text follows below). It works well with
Cyrus imspd.
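The proposed class can be checked against the cases described here (names with spaces and commas, multi-line fields, UTF-8 and Latin-1 text, quotes and backslashes). This is a Python bytes translation of the PHP pattern, offered as an illustration only; `needs_literal` is a hypothetical helper name, and the sample values are made up.

```python
import re

# Assumption: Python bytes translation of the proposed PHP definition
#   define('IMSP_MUST_USE_LITERAL', "/[\x80-\xFF\\r\\n\"\\\\]/");
# After PHP unescapes the double-quoted string, PCRE sees a class matching
# any 8-bit octet, CR, LF, <"> or backslash.
NEW_MUST_USE_LITERAL = re.compile(rb'[\x80-\xff\r\n"\\]')


def needs_literal(value: bytes) -> bool:
    """True if value must be sent as a literal under the proposed rule."""
    return NEW_MUST_USE_LITERAL.search(value) is not None


cases = [
    b"John Smith",                  # spaces: quoted string is fine
    b"Smith, John",                 # commas: quoted string is fine
    b"line one\r\nline two",        # multi-line field: literal
    "Jos\u00e9".encode("utf-8"),    # UTF-8 multibyte sequence: literal
    "Jos\u00e9".encode("latin-1"),  # Latin-1 8-bit octet: literal
    b'say "hi"',                    # double quote: literal
    b"C:\\temp",                    # backslash: literal
]
for value in cases:
    print(value, needs_literal(value))
```

Ordinary names now go out as quoted strings, while anything a quoted string cannot carry still falls back to a literal.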
The new regex also handles the cases that were being fixed the last time
IMSP_MUST_USE_QUOTE was changed. Fields with multi-line input continue
to work fine, thanks to CR and LF in the regex. It has also been tested
with UTF-8, Latin-1, and ASCII characters.
In the provided patch, I've retained /[\W]/i as IMSP_MUST_USE_QUOTE,
but it is used only once, in the quoteSpacedString function, so it
might not be necessary. I had thought there might be other places in
the IMSP code where quoteSpacedString would need to be called, but if
there are, I have not found them.
From the RFC, regarding quoted strings and literals:
"A quoted string is a sequence of zero or more 7-bit characters,
excluding CR and LF, with double quote (<">) characters at each end."
    TEXT_CHAR       ::= <any CHAR except CR and LF>
    QUOTED_CHAR     ::= <any TEXT_CHAR except quoted_specials> /
                        "\" quoted_specials
    quoted_specials ::= <"> / "\"
"A literal is a sequence of zero or more octets (including CR and LF)"
    literal ::= "{" number "}" CRLF *CHAR8
                ;; The number represents the number of CHAR8 octets
    CHAR    ::= <any 7-bit US-ASCII character except NUL, 0x01 - 0x7f>
    CHAR8   ::= <any 8-bit octet except NUL, 0x01 - 0xff>
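Note that QUOTED_CHAR also allows <"> and "\" inside a quoted string when they are backslash-escaped, whereas the proposed regex sends both as literals. A possible further refinement, sketched below in Python, is to escape quoted_specials and keep literals only for 8-bit octets, CR and LF. This assumes the server implements the QUOTED_CHAR escapes as the grammar specifies (only the regex itself was reported as tested against imspd), and `encode_string` is a hypothetical helper for illustration.

```python
def encode_string(value: bytes) -> bytes:
    """Encode value as a quoted string when the grammar allows it,
    otherwise as a literal (hypothetical helper, illustration only)."""
    if b"\x00" in value:
        # Neither CHAR nor CHAR8 admits NUL, so it cannot be sent at all.
        raise ValueError("NUL is not a valid CHAR or CHAR8")
    if all(b < 0x80 for b in value) and b"\r" not in value and b"\n" not in value:
        # Every byte is a TEXT_CHAR; escape quoted_specials per QUOTED_CHAR.
        # Backslashes are escaped first so the added ones are not doubled.
        escaped = value.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
        return b'"' + escaped + b'"'
    # 8-bit octets, CR or LF: fall back to a literal, "{" number "}" CRLF data.
    return b"{%d}\r\n" % len(value) + value


print(encode_string(b'say "hi"'))
print(encode_string(b"x\r\ny"))
```

The trade-off is a slightly more complex encoder in exchange for one fewer continuation round trip per value that contains a quote or backslash.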