[dev] [Corrected] Horde_Imap_Client and fetching vanished messages.
Michael J Rubinsky
mrubinsk at horde.org
Wed Jan 11 14:56:21 UTC 2012
Quoting Michael M Slusarz <slusarz at horde.org>:
> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>
>> Quoting Michael M Slusarz <slusarz at horde.org>:
>>
>>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>>
>>>> Some background: This code is all for the purpose of syncing
>>>> email over ActiveSync. I'm using modseq and changedsince to
>>>> retrieve the uids of any recently changed email. In this context,
>>>> 'changed' would mean a new, never before seen email, or an email
>>>> that has had the seen flag added or removed.
>>>
>>> Pardon my ignorance: what does the ActiveSync client send to the
>>> server to indicate the current status of its synchronized cache?
>>> Is it a user-definable cache ID? Or is it a timestamp of the last
>>> sync? Or something else?
>>
>> First off, you probably know this, but the ActiveSync client knows
>> *nothing* about IMAP (or POP3 for that matter). In fact, it
>> doesn't care at all where the messages come from, as long as it
>> receives them in the format defined by the ActiveSync protocol. As
>> far as state information goes, the only thing the ActiveSync client
>> sends to or receives from the ActiveSync server is its current
>> syncKey. The syncKey is basically a random hash string with an
>> integer tacked onto the end of it. The server generates this key
>> during the first sync of each collection (email|contacts|calendar
>> etc...). The client sends this key along with each SYNC and PING
>> request to notify the server what it is assuming the last known
>> state was. When the state changes, the server increments the
>> syncKey after sending changes to the client. This is the only bit
>> of identifying information the client ever gets or sends. Server
>> side, this syncKey is linked to the server state at the time that
>> the syncKey was generated. So, e.g., with contacts or calendar
>> data, this state is basically a timestamp. We use this timestamp to
>> query the History system to get server changes to send.
>>
>> For mail, the state consists of the modseq/nextuid/uidvalidity
>> data, along with a list of UIDs and their seen state that are on
>> the device.
>
> To be clear, this is what each setup (QRESYNC and plain RFC 3501
> IMAP) needs in terms of state:
>
> QRESYNC: UIDVALIDITY, MODSEQ
> IMAP: UIDVALIDITY, UIDNEXT, message flag information
>
> QRESYNC does not require UIDNEXT or message flag information (unclear
> if that's what you were suggesting).
I keep the flag data (actually just whether or not the seen flag is
set), so we know what the device thinks the message's "read" state is
(as ActiveSync calls it). That way we know whether a flag change
needs to be sent to the device at all. Sending unnecessary changes,
even just flag changes, to the device wastes mobile bandwidth and
contributes to poor battery performance since the change causes the
currently running PING to terminate, a new SYNC request to be made and
handled, and finally, a new PING request to be made. Each one of those
requests, obviously, has all the normal ActiveSync protocol overhead.
I guess one could argue that this point is moot since these
changes would rarely be duplicates: if the IMAP server is sending a
change, it would be rare that the flag on the device would already
match the flag on the IMAP server. The only place this would
consistently happen is when a flag is initially changed on the device.
This causes the change to be sent to the IMAP server (through the
ActiveSync code, of course) which, in turn, will cause the flag change
to be detected the next time we FETCH changes with changedsince. Plus,
this case can be dealt with the same way we deal with device-caused
changes in the other collections - we save the incoming change in a
separate cache and compare changes that the server is sending against
those we *know* came from the client. When we find a match, we ignore
the change and remove the entry from the cache. This might still cause
premature PING termination, but *most* of the time the change would be
caught (and ignored) during the same SYNC request that is sending the
device changes anyway.
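The echo-suppression cache described above could look roughly like the
following Python sketch. It is not Horde code: the class and method names
are hypothetical, and it assumes flag state is reduced to a single
seen/unseen boolean per UID.

```python
class DeviceChangeCache:
    """Hypothetical cache of flag changes that originated on the device."""

    def __init__(self):
        # (uid, seen) pairs the device sent and we already wrote to IMAP.
        self._pending = set()

    def record(self, uid, seen):
        """Remember a device-originated change when it is applied to IMAP."""
        self._pending.add((uid, seen))

    def filter_server_changes(self, changes):
        """Drop server changes that merely echo a device change.

        `changes` maps uid -> seen as reported by the IMAP side (e.g. via
        CHANGEDSINCE). Returns only the changes the device still needs.
        """
        outgoing = {}
        for uid, seen in changes.items():
            if (uid, seen) in self._pending:
                # Our own change coming back: ignore it and forget the entry.
                self._pending.discard((uid, seen))
            else:
                outgoing[uid] = seen
        return outgoing


cache = DeviceChangeCache()
cache.record(42, True)  # device marked UID 42 as read
print(cache.filter_server_changes({42: True, 43: False}))
# {43: False}
```

Keying the cache on (uid, seen) means a stored entry never suppresses the
opposite change made elsewhere. A real implementation would probably also
need to expire entries whose echo never arrives from the server.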
Since I already needed to save the UIDs to detect deleted messages, it
was easy to just add the flag state there as well. The bottom line is
that if you implement this functionality in
Horde_Imap_Client, I would no longer need to cache the UID list and
flag state in the ActiveSync driver.
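For reference, the per-collection state discussed in this thread
(UIDVALIDITY and MODSEQ with QRESYNC; UIDVALIDITY, UIDNEXT and flag
information with plain RFC 3501 IMAP; plus the device's UID/seen list)
might be modelled as below. This is a hypothetical Python sketch, not the
ActiveSync driver's actual storage format; the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MailboxSyncState:
    """Hypothetical state stored alongside one ActiveSync syncKey."""

    uidvalidity: int
    uidnext: int                          # needed without QRESYNC
    highestmodseq: Optional[int] = None   # only set when CONDSTORE/QRESYNC is used
    # UIDs currently on the device -> True if the device shows them as read.
    seen: Dict[int, bool] = field(default_factory=dict)

    def requires_full_resync(self, server_uidvalidity: int) -> bool:
        # A UIDVALIDITY change invalidates every stored UID, so the device
        # state has to be thrown away and the collection re-synced.
        return server_uidvalidity != self.uidvalidity
```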
>>> If user-definable, you would obviously be using a combination of
>>> UIDVALIDITY + MODSEQ if QRESYNC is available. If QRESYNC is not
>>> available, it will be a combination of UIDVALIDITY + UIDNEXT. But
>>> note the latter cannot ever reliably catch flag changes (you need
>>> to do a FETCH FLAGS on every sync to reliably catch flag changes
>>> without QRESYNC).
>>>
>>>> If I don't use 'changedsince', I would have to retrieve the
>>>> complete set of UIDs again, along with flags and compare each
>>>> message with my stored device state.
>>>
>>> This should only need to be done once, on the first sync,
>>> regardless of whether QRESYNC is available or not.
>>
>> I'm confused. If QRESYNC/CONDSTORE is NOT available, how else would
>> I be able to catch flag changes, other than querying the imap
>> client for what flags are on each message and comparing them with
>> the device's state? To be clear, I'm talking about querying the
>> imap client, not the server itself. I assume the client will be
>> smart about when it actually talks to the server, and what it asks
>> for.
>
> Guess I skipped a step. You *do* need to do the above if you want
> to be 100% sure to sync flags correctly. However, this can be/is a
> potentially expensive operation on the server. So a practical
> solution, and one we use in IMP, is that we DON'T guarantee that
> flags are synchronized correctly. 99% of the time, this assumption
> is fine - very few people have multiple connections open to an IMAP
> server that are simultaneously updating flags.
For IMP that makes sense. For an ActiveSync client it does not. An
ActiveSync client can be considered always active as long as the
device is turned on. Every email the IMAP server receives will be
pushed to the device and marked as unseen. If I am sitting at my desk,
dealing with email throughout the day, I don't want my device to still
show all of those emails as unseen when I get home. Granted, this
point will be moot if the emails are moved to a different folder while
I read them, but not everybody keeps their INBOX that clean. The
reverse is not as big of a problem: if I leave IMP's session open all
night and, when I return, find all the mail I have already read on my
device still marked as unseen in IMP, I can simply refresh the
browser. The only way to 100% guarantee a full refresh like this on an
ActiveSync device is to force-remove the device's state on the server
and cause a complete re-sync (this is also how we would deal with the
need to invalidate the device's state due to e.g., UIDVALIDITY
changing).
ActiveSync NEEDS to provide a way to reliably sync flags to the
client. Ideally, it would be great to provide this for both QRESYNC
and plain IMAP servers. Perhaps make it a configuration switch to turn
on the support for IMAP-only servers. It would be nice if the
functionality were abstracted in Imap_Client before 4.1. If not, I can implement it
in the ActiveSync driver.
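One way to express the proposed configuration switch is a small strategy
selector. The names (choose_strategy, force_flag_sync, the Strategy
members) are hypothetical; the only protocol facts assumed are that the
server advertises QRESYNC in its CAPABILITY response and, as noted later in
this thread, that CONDSTORE alone is not enough for this proxy use case.

```python
from enum import Enum


class Strategy(Enum):
    # Server reports changed flags (CHANGEDSINCE) and expunged UIDs (VANISHED).
    QRESYNC = "qresync"
    # Fetch every UID and its FLAGS, then diff against the device state.
    FULL_SCAN = "full_scan"
    # Fetch the UID list only: new and deleted messages are caught,
    # but flag changes made elsewhere are missed.
    UIDS_ONLY = "uids_only"


def choose_strategy(capabilities, force_flag_sync):
    """Pick a sync strategy from the server's CAPABILITY list.

    `force_flag_sync` stands in for a hypothetical per-server config switch.
    """
    caps = {c.upper() for c in capabilities}
    if "QRESYNC" in caps:
        return Strategy.QRESYNC
    if force_flag_sync:
        return Strategy.FULL_SCAN
    return Strategy.UIDS_ONLY
```

A server that offers CONDSTORE but not QRESYNC deliberately falls through
to the non-QRESYNC paths here.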
>>> In the absence of QRESYNC, it is still trivial to determine the
>>> list of *new* messages since the last sync - since the cache ID
>>> will be using the last known UID, it is simply a matter of
>>> FETCH'ing all UIDs greater than this value. But this does not
>>> catch flag changes and it does not catch messages that have since
>>> been deleted. So, practically, you do need to do this
>>> array_diff() (both ways - one to catch new messages, one to catch
>>> deleted messages) to sync.
>>
>> This is why I thought I needed to query the imap client for the
>> list of UIDs and flag state when QRESYNC is not available - so I
>> can compare the flags against what the device thinks each message's
>> flag state is.
>
> Is there no way to just send flag changes to the device? In other
> words, changed messages are an all-or-nothing action?
Yes, only flag changes can be sent. In fact, that's all you *should*
send. The message is expected to be sent only once. Some devices will
simply duplicate the message in the INBOX if it is sent twice,
regardless of the UID. When a message is marked seen on the server,
all that is sent to the client is the instruction to mark it as read.
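Without QRESYNC, the two-way array_diff() described earlier, combined with
the rule that messages already on the device only ever receive flag
changes, can be sketched as follows. This is illustrative Python with a
hypothetical function name; it assumes both the device state and the
server snapshot have been reduced to a UID -> seen mapping.

```python
def diff_snapshot(device_seen, server_seen):
    """Compare the device's state with a full UID/FLAGS snapshot.

    Both arguments map uid -> bool (True if the \\Seen flag is set).
    Returns (new_uids, deleted_uids, flag_changes).
    """
    device_uids = set(device_seen)
    server_uids = set(server_seen)
    # First direction of the diff: messages the device has never received.
    new_uids = sorted(server_uids - device_uids)
    # Second direction: messages that were expunged on the server.
    deleted_uids = sorted(device_uids - server_uids)
    # Messages on both sides: never resend the message, only a flag change,
    # and only when the read state actually differs.
    flag_changes = {
        uid: server_seen[uid]
        for uid in sorted(device_uids & server_uids)
        if device_seen[uid] != server_seen[uid]
    }
    return new_uids, deleted_uids, flag_changes


print(diff_snapshot({1: False, 2: True, 3: False}, {2: True, 3: True, 4: False}))
# ([4], [1], {3: True})
```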
> I'm thinking it may be useful to abstract this kind of
> synchronization into Imap Client itself. Meaning: abstracting
> changedsince/vanished so that it will work even without QRESYNC.
>
>>>> Caching in the imap client would obviously help with imap server
>>>> load, but given how frequently this must occur, I'd like to avoid
>>>> having to iterate over each message to check the status of the
>>>> flags to determine what has changed. Using changedsince gets me
>>>> only the messages that have changed, greatly reducing the number
>>>> of messages I'd have to iterate.
>>>
>>> As mentioned above, without QRESYNC this is impossible.
>>> (Actually, CHANGEDSINCE was defined with CONDSTORE)
>>
>> See comment above. If QRESYNC is not available you say it still
>> shouldn't be necessary to get a list of UIDs and flags. Not sure
>> how else I would be able to catch flag changes? Am I
>> misunderstanding something again?
>
> I didn't realize that you were keeping a local cache of flags that
> you could use to compare against.
Yes. I was trying to avoid keeping that kind of state, but realized it
would be necessary. Since I was already keeping the list of the UIDs
on the device (to deal with detecting deletions), it will be easy to
add the flag state if needed.
> Still, it would be nice to have this done automatically by
> Horde_Imap_Client. Meaning that, upon opening a mailbox, it would
> automatically sync flag changes to the local cache without having to
> do it in application code. Although I realize it may not work in
> your situation: the server sends a list of messages to the client;
> the flags are changed somehow; this change occurs in a Horde access
> not associated with the activesync syncing; thus, the next time the
> activesync sync occurs we are comparing the IMAP server state with
> the Horde cache state, NOT the activesync client state.
Ah. I did not realize that the IMAP client cache would be the same for
both the user's Horde session and the ActiveSync session. I guess I'll
need to keep the flags cached in ActiveSync after all, right?
So when ActiveSync connects, in addition to getting new and expunged
messages, it gets a list of changed UIDs/flags from Horde_Imap_Client
(regardless of how they were determined - plain IMAP or QRESYNC) and
can then compare those flags against the device state. So, basically, the
ActiveSync client code would really not change much from what I was
planning - I would still need to compare the device flags with what
Horde_Imap_Client tells me they are - it's just that the optimizations
that would come from having QRESYNC available would be done inside
Horde_Imap_Client?
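That comparison step stays small whichever way the changed list was
produced. A sketch with a hypothetical function name: it also drops
changes that do not affect the read state, since a message's MODSEQ (and
therefore a CHANGEDSINCE hit) changes on any flag update, e.g. \Flagged,
not only \Seen.

```python
def pending_flag_updates(device_seen, changed):
    """Return the read-state updates the device actually needs.

    device_seen: uid -> seen for messages currently on the device.
    changed: uid -> seen for messages the IMAP layer reports as changed,
             however it found them (CHANGEDSINCE or a full FLAGS scan).
    """
    updates = {}
    for uid, seen in changed.items():
        if uid not in device_seen:
            # Not on the device yet: handled by the "new message" path.
            continue
        if device_seen[uid] != seen:
            updates[uid] = seen
    return updates


# UID 2 changed on the server (e.g. \Flagged was set) but its read state
# already matches the device, so only UID 1 needs to be sent.
print(pending_flag_updates({1: False, 2: True}, {1: True, 2: True, 9: False}))
# {1: True}
```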
> You can see how CONDSTORE/QRESYNC makes things much easier.
Indeed.
> Sidenote: CONDSTORE by itself is not enough to properly implement
> this. CONDSTORE works when the connecting client is the MUA. In
> the activesync case, the client (Horde) is actually acting as a
> proxy to the activesync client. We need the additional VANISHED
> functionality provided by QRESYNC to handle everything solely via
> the MODSEQ number. (Otherwise, it will require additional UID
> FETCHing to do this properly.)
Yeah, thanks for clearing that up. I'm already doing the extra UID
FETCHing and array_diff()ing from when VANISHED wasn't working. I was
planning to check for QRESYNC and use VANISHED if it's available, and
fall back to the additional FETCH if not... but if you are going to add
this functionality to Horde_Imap_Client...
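For the QRESYNC path, the request is a UID FETCH with the CHANGEDSINCE and
VANISHED modifiers (RFC 5162, since updated by RFC 7162), for example
UID FETCH 1:* (FLAGS) (CHANGEDSINCE 12345 VANISHED) where 12345 stands for
the stored MODSEQ, and expunged messages come back as an untagged
"* VANISHED (EARLIER) <uid-set>" response. A client library would normally
handle this; the Python sketch below, with a hypothetical function name,
only illustrates how that UID set expands into the list of deleted UIDs.

```python
import re

# Untagged VANISHED response, with or without the (EARLIER) tag.
_VANISHED = re.compile(r"^\* VANISHED (?:\(EARLIER\) )?([0-9:,]+)\s*$", re.IGNORECASE)


def parse_vanished(line):
    """Return the UIDs listed in an untagged VANISHED response, or None."""
    match = _VANISHED.match(line)
    if match is None:
        return None
    uids = []
    for part in match.group(1).split(","):
        if ":" in part:
            first, last = (int(x) for x in part.split(":", 1))
            # IMAP allows ranges written high:low, so normalise the order.
            lo, hi = min(first, last), max(first, last)
            uids.extend(range(lo, hi + 1))
        else:
            uids.append(int(part))
    return uids


print(parse_vanished("* VANISHED (EARLIER) 41,43:45,50"))
# [41, 43, 44, 45, 50]
```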
--
mike
The Horde Project (www.horde.org)
mrubinsk at horde.org