[Tickets #13153] Re: Bugs in the current implementation for PARTIAL limiting (RFC 5267 [4.4]) in Horde's Imap_Client library
noreply at bugs.horde.org
noreply at bugs.horde.org
Mon May 5 07:38:32 UTC 2014
DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.
Ticket URL: http://bugs.horde.org/ticket/13153
------------------------------------------------------------------------------
Ticket | 13153
Updated By | daniel_bistriceanu at yahoo.com
Summary | Bugs in the current implementation for PARTIAL limiting
| (RFC 5267 [4.4]) in Horde's Imap_Client library
Queue | Horde Framework Packages
Version | Git master
Type | Bug
State | Feedback
Priority | 1. Low
Milestone |
Patch |
Owners | Michael Slusarz
------------------------------------------------------------------------------
daniel_bistriceanu at yahoo.com (2014-05-05 07:38) wrote:
>>>> To fix this it should just enforce the usage of the provided partial
>>>> if it is a valid sequence range ('range_start:range_end'), as
>>>> specified in RFC 5267 [4.4].
>>>
>>> This won't work, since 'partial' can be, for example, array(1, 2, 3).
>>> So we do need to convert to an Ids object first. We just need to
>>> catch the single case where the range contains a single ID
>>
>> Why would you even want to allow the ability to specify partial
>> ranges as anything else but the actual literal representation?
>
> Because that's what the API (2.0) requires. You can't take that
> ability away from someone using the code ... that's the definition of
> breaking backward compatibility. Whether its a good idea or not is
> completely irrelevant - we're stuck with it until Horde_Imap_Client
> 3.0.
>
> And from an API perspective, by requiring someone to provide a full
> partial string you are requiring them to do 2 things: 1.) Have a
> detailed knowledge of RFC 3501 (number sequence formatting) and 2.)
> Have a detailed knowledge of RFC 5267.
>
> The brilliance/advantage of allowing someone to pass an array in is
> that all of these details are abstracted. The library does all the
> IMAP formatting details for you.
>
>> Partial ranges have nothing to do with message UIDs and I don't think
>> that trying to handle them the same way just for consistency is the
>> best idea.
>
> Why not? From RFC 3501 [9]:
>
> seq-number = nz-number / "*"
> uniqueid = nz-number
>
> Outside of the '*', they *are* identicial from a syntax level. So we
> can (and should) treat them the same internally.
> Horde_Imap_Client_Ids works on both equally well. We surely don't
> want to have two libraries that duplicate the exact same code.
>
There is a small difference between a partial range and a sequence
range of message UIDs. The latter is described in RFC 3501 [9]:
seq-range = seq-number ":" seq-number
and since PARTIAL syntax does not support '*', as described in RFC
5267 [4.4], a partial range is not exactly the same as a seq-range of
message UIDs, but a range like nz-number ":" nz-number.
Horde_Imap_Client_Ids is a much more generic handler that works with a
mixed set of ids representations:
* @param mixed $ids Either self::ALL, self::SEARCH_RES,
* Horde_Imap_Client_Ids object, array, or string.
As such it would accept any of those representation variants as valid
even though the rage_string for the corresponding $options['partial']
might not make any sense as a PARTIAL range in some particular cases.
For example, it would even accept a $options['partial?] set to
Horde_Imap_Client_Ids::ALL which would result in a PARTIAL range
broken error caused by the command that was actually sent that
included a RETURN (PARTIAL :), as the range_string property returns
an empty string in this particular case. And yes, this is not really
an option that should be supported and it falls in the same
type/sanity-checking category, like the existing check for a not empty
string/array, and can probably be avoided by just checking if the
range_string representation is not empty before proceeding further.
I still believe that the most efficient way of validating a partial
range would be to simply check that it does in fact follow its
specification, since we're only validating a specific subset of all
the possible representations accepted by the Horde_Imap_Client_Ids.
But on the other hand you are correct that the current API
implementation behaves differently and since in my opinion the
documentation on this is not exactly clear (see below) your approach
is probably best in order not to break backwards compatibility. So
maybe this should be changed in the next version (3.0) if you agree
that it?s a good idea.
>> In case of specifying a set of message UIDs
>
> This isn't allowed. So not sure what you are saying here.
>
>> Also you did not commit any changes to Horde_Imap_Client_Ids so at
>> the moment your current fix will always set the partials range to ':'
>> which is clearly invalid:
>
> How? Are you passing in an empty array? Don't do that. (I can add
> code to catch that, but the result's going to be the same ... an
> invalid search. GIGO).
>
Actually I didn't do anything special to get that error. I was simply
using an older version of the library (for reasons which are not
relevant to this issue) on which I manually applied your fix.
Unfortunately I wasn't thorough enough and because in that version
there was no `case 'range_string':` in the `__get `magic method from
Ids class I was always getting that ':', regardless of the actual
value of the 'partial' option. Like I said, my bad for not also
checking the current implementation of Ids and assuming it was the
same as mine.
> I haven't gone through and been 100% thorough in type-checking,
> sanity-checking, etc. especially for parameters that are very liberal
> about their data input. Trying to do that right now is simply going
> to add more bloat to the library - Socket.php is already too big.
> Once the commands are broken up into separate classes (coming in
> 3.0), this will probably happen since the code will be more
> maintainable even with all the sanity checks.
>
>> C: 4 UID SORT RETURN (PARTIAL :) (DATE) US-ASCII UNDELETED
>> S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.
>>
>> And last but not least, if you're hell bent on allowing this array
>> option
>> then you need even more additional processing, to make sure
>> that you're actually sending a proper partial range. For example
>> something like array(1,2,3,5,6,7), while valid as far as Ids class is
>> concerned, would still break your current fix because it does not
>> have a single sequence-set rage representation and after the
>> conversion it would get translated from '1:3,5:7'*, which is
>> obviously still not a valid partial range:
>
> Incorrect. Directly from the documentation:
>
> * - partial: (mixed) The range of results to return (message sequence
> * numbers) Only a single range is supported (represented by
> * the minimum and maximum values contained in the range
> * given).
> * DEFAULT: All messages are returned.
>
> This code:
Yes, I've read the documentation for partial syntax
(http://dev.horde.org/api/master/lib/Imap_Client/classes/Horde_Imap_Client_Base.html#method_search). That was actually among the first things I did when I tried to use PARTIAL results. But I didn't see that it stated anywhere that the 'partial' option supports an array representation, or any of the other representation variants that the Ids class can handle. The current implementation, after your fix, supports either the literal string representation: 'nz-number:nz-number' or an array representation to specify the same thing, like array('nz-number', 'nz-number') which may or may not contain any other values in between the two that define the range, or an already existing Ids object that may or may not contain a valid partial range, or even Horde_Imap_Client_Ids::ALL, Horde_Imap_Client_Ids::SEARCH_RES which make no sense in this case. I might be wrong, but for me this is neither really correct, nor clear/intuitive, and certainly not inferable only from the (mixed) type hint a!
nd the description for the 'partial' option available in the
documentation.
That being said, I strongly agree with you that this library doesn't
need even more bloat, especially not just for handling things like this.
>
> $pids = $this->getIdsOb(array(1,2,3,5,6,7))->range_string;
>
> returns:
>
> 1:7
>
>> C: 4 UID SORT RETURN (PARTIAL 1:3,5:7) (DATE) US-ASCII UNDELETED
>> S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.
>
> That's flat out impossible with the current code (at least for the
> PARTIAL return). $pids, used as the argument to PARTIAL, can NEVER
> contain a comma (,) for example. You either copy/pasted this example
> or you are not running an unmodified copy of the current code.
>
Yes, like already mentioned above I was running an older version of
the library on which I had to manually apply your fix and I didn't do
a thorough job at it.
>> However the reverse is neither implied nor required for either cases.
>> For example you can have a server that supports ESORT but not
>> CONTEXT=SORT, which is exactly the case on my test server:
>
> **EXACTLY**. And a server that supports ESORT but not CONTEXT=SORT
> can't use PARTIAL (or CONTEXT or UPDATE) with ESORT.
>
> Would you happen to be using Dovecot? Dovecot does NOT support
> CONTEXT=SORT. See, e.g., http://wiki2.dovecot.org/Roadmap
>
Yes, you are correct. I'm using Dovecot.
> Now what you are seeing is probably the following: internally,
> Dovecot's SEARCH and SORT code is the same. So limiting via PARTIAL
> will work on SORT simply because the code being used is identical and
> limitations on PARTIAL with SORT aren't being enforced. Which is
> perfectly legal: there's nothing in IMAP spec that prevents commands
> from executing if not listed in CAPABILITY.
>
> But the more difficult part in implementing CONTEXT=SORT actually
> deals with CONTEXT & UPDATE, which hasn't been written yet in
> Dovecot. Without those two, CONTEXT=SORT can't be listed as
> supported since there is no way to indicate which of the 3 return
> values are supported.
>
> Fortuitously this works on your system. But there is absolutely no
> way of knowing this from the CAPABILITY string and no guarantee this
> works on other systems. Per the RFC, the only way PARTIAL is
> supported for SORTs is if CONTEXT=SORT is available
>
> RFC 5267 [4.1]: if CONTEXT=SORT is advertised, ESORT accepts 3
> additional return arguments: UPDATE, CONTEXT, and PARTIAL. Without
> CONTEXT=SORT, these 3 RETURN arguments aren't available. That's all
> that section says.
>
> The RFC does not say that PARTIAL searching is supported for SORTs if
> CONTEXT=SEARCH is available. (Which makes logical sense: there's no
> reason to support CONTEXT=SOMETHING type capability strings unless
> they refer to a specific subset of behavior modifications. Otherwise
> a single CONTEXT capability would have been defined handling both
> SEARCH and SORT.)
>
> What really should be done, and would be a more productive use of our
> time, is to abstract partial support to the client also (i.e.
> client-side sorting if SORT is not available). Meaning a user should
> be able to pass-in the partial argument and get the partial list of
> responses, regardless if CONTEXT is available on the server.
> Obviously CONTEXT would make this a much less expensive operation,
> but that should be irrelevant for the purposes of someone using the
> API.
I'm certainly not an expert and the first time I've came in contact
with IMAP PARTIAL results and this library in particular was about a
month ago. The original issue was that the library was failing on
mailboxes that had a huge number of messages, due to memory
limitations while trying to retrieve all message UIDs, even though the
results were being paginated through on the client side and only the
relevant messages were being fetched afterwards. So it made sense to
try and leverage the use of PARTIAL results to retrieve only the
relevant message UIDs as that would have made the process a lot more
efficient and completely avoided the memory limitation issue. My
problem was that the existing implementation wasn't actually allowing
PARTIAL results for an UID SORT command because CONTEXT=SORT was not
available on my server. And even though that might be the correct
behavior as far as RFC 5267 is concerned, the strange thing was that I
could manually run an UID SORT command to return a correct PARTIAL,
which showed that it was in fact working properly on my server even
w/o the CONTEXT=SORT.
Thank you for your prompt replies and for the clarification about
Dovecot's quirky behavior which actually explains the weird issue
mentioned above.
More information about the bugs
mailing list