[dev] [Corrected] Horde_Imap_Client and fetching vanished messages.

Wed Jan 11 00:34:07 UTC 2012

Quoting Michael J Rubinsky <mrubinsk at horde.org>:

> Quoting Michael M Slusarz <slusarz at horde.org>:
>
>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>
>>> Some background: This code is all for the purpose of syncing email  
>>> over ActiveSync. I'm using modseq and changedsince to retrieve the  
>>> uids of any recently changed email. In this context, 'changed'  
>>> would mean a new, never before seen email, or an email that has  
>>> had the seen flag added or removed.
>>
>> Pardon my ignorance: what does the ActiveSync client send to the  
>> server to indicate the current status of its synchronized cache?   
>> Is it a user-definable cache ID?  Or is it a timestamp of the last  
>> sync?  Or something else?
>
> First off, you probably know this, but the ActiveSync client knows  
> *nothing* about IMAP (or POP3 for that matter). In fact, it doesn't  
> care at all where the messages come from, as long as it receives  
> them in the format defined by the ActiveSync protocol. As far as  
> state information goes, the only thing the ActiveSync client sends  
> to or receives from the ActiveSync server is its current syncKey.  
> The syncKey is basically a random hash string with an integer tacked  
> onto the end of it. The server generates this key during the first  
> sync of each collection (email|contacts|calendar etc...). The client  
> sends this key along with each SYNC and PING request to notify the  
> server what it is assuming the last known state was. When the state  
> changes, the server increments the syncKey after sending changes to  
> the client. This is the only bit of identifying information the  
> client ever gets or sends. Server side, this syncKey is linked to  
> the server state at the time that the syncKey was generated. So,  
> e.g., with contacts or calendar data, this state is basically a  
> timestamp. We use this timestamp to query the History system to get  
> server changes to send.
>
> For mail, the state consists of the modseq/nextuid/uidvalidity data,  
> along with a list of UIDs and their seen state that are on the device.

To be clear, this is what each setup (QRESYNC and plain RFC 3501 IMAP)  
needs in terms of state:

QRESYNC: UIDVALIDITY, MODSEQ
IMAP: UIDVALIDITY, UIDNEXT, message flag information

QRESYNC does not require UIDNEXT or message flag information (unclear  
if that's what you were suggesting).
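
As a rough illustration of the two state shapes listed above, here is a  
minimal Python sketch. The class and field names are hypothetical and  
are not taken from Horde_Imap_Client or the ActiveSync code; in  
particular, "highestmodseq" is just my label for the MODSEQ value.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class QresyncState:
    """State to keep per mailbox when the server supports QRESYNC."""
    uidvalidity: int
    highestmodseq: int  # hypothetical name for the stored MODSEQ value


@dataclass
class PlainImapState:
    """State to keep per mailbox for a plain RFC 3501 server."""
    uidvalidity: int
    uidnext: int
    # UID -> flags last sent to the device (hypothetical representation)
    flags: Dict[int, FrozenSet[str]] = field(default_factory=dict)


def cache_is_valid(stored_uidvalidity: int, server_uidvalidity: int) -> bool:
    # In either setup, a changed UIDVALIDITY means the cached UIDs can no
    # longer be trusted and a full resync is needed.
    return stored_uidvalidity == server_uidvalidity
```

The point of the sketch is only that the plain IMAP case has to carry  
the per-message flag map around, while the QRESYNC case does not.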

>> If user-definable, you would obviously be using a combination of  
>> UIDVALIDITY + MODSEQ if QRESYNC is available.  If QRESYNC is not  
>> available, it will be a combination of UIDVALIDITY + UIDNEXT.  But  
>> note the latter cannot ever reliably catch flag changes (you need  
>> to do a FETCH FLAGS on every sync to reliably catch flag changes  
>> without QRESYNC).
>>
>>> If I don't use 'changedsince', I would have to retrieve the  
>>> complete set of UIDs again, along with flags and compare each  
>>> message with my stored device state.
>>
>> This should only need to be done once, on the first sync,  
>> regardless of whether QRESYNC is available or not.
>
> I'm confused. If QRESYNC/CONDSTORE is NOT available, how else would  
> I be able to catch flag changes, other than querying the imap client  
> for what flags are on each message and comparing them with the  
> device's state? To be clear, I'm talking about querying the imap  
> client, not the server itself. I assume the client will be smart  
> about when it actually talks to the server, and what it asks for.

Guess I skipped a step.  You *do* need to do the above if you want to  
be 100% sure to sync flags correctly.  However, this is a  
potentially expensive operation on the server.  So a practical  
solution, and one we use in IMP, is that we DON'T guarantee that flags  
are synchronized correctly.  99% of the time, this assumption is fine  
- very few people have multiple connections open to an IMAP server  
that are simultaneously updating flags.

>> In the absence of QRESYNC, it is still trivial to determine the  
>> list of *new* messages since the last sync - since the cache ID  
>> will be using the last known UID, it is simply a matter of  
>> FETCH'ing all UIDs greater than this value.  But this does not  
>> catch flag changes and it does not catch messages that have since  
>> been deleted.  So, practically, you do need to do this array_diff()  
>> (both ways - one to catch new messages, one to catch deleted  
>> messages) to sync.
>
> This is why I thought I needed to query the imap client for the list  
> of UIDs and flag state when QRESYNC is not available - so I can  
> compare the flags against what the device thinks each message's flag  
> state is.

Is there no way to just send flag changes to the device?  In other  
words, changed messages are an all-or-nothing action?

I'm thinking it may be useful to abstract this kind of synchronization  
into Imap Client itself.  Meaning: abstracting changedsince/vanished  
so that it will work even without QRESYNC.
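
A minimal sketch of what such an emulation could compute without  
QRESYNC, assuming the caller already has the cached UID/flag map and a  
freshly fetched one. This is Python for illustration only; the function  
and type names are hypothetical and this is not how Horde_Imap_Client  
implements anything.

```python
from typing import Dict, FrozenSet, NamedTuple, Set

# UID -> set of flags, e.g. {12: frozenset({"\\Seen"})}
FlagMap = Dict[int, FrozenSet[str]]


class MailboxDiff(NamedTuple):
    new: Set[int]           # UIDs on the server but not in the cache
    vanished: Set[int]      # UIDs in the cache but gone from the server
    flag_changed: Set[int]  # UIDs present in both whose flags differ


def diff_mailbox(cached: FlagMap, current: FlagMap) -> MailboxDiff:
    """Emulate changedsince/vanished by comparing two UID/flag snapshots."""
    cached_uids = set(cached)
    current_uids = set(current)
    # The two set differences are the "both ways" diff: one direction
    # finds new messages, the other finds deleted ones.
    new = current_uids - cached_uids
    vanished = cached_uids - current_uids
    flag_changed = {
        uid for uid in cached_uids & current_uids
        if cached[uid] != current[uid]
    }
    return MailboxDiff(new, vanished, flag_changed)
```

Note this still needs the full UID/flag list from the server on every  
sync, so it only hides the cost from the application; it does not  
remove it.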

>>> Caching in the imap client would obviously help with imap server  
>>> load, but given how frequently this must occur, I'd like to avoid  
>>> having to iterate over each message to check the status of the  
>>> flags to determine what has changed. Using changedsince gets me  
>>> only the messages that have changed, greatly reducing the number  
>>> of messages I'd have to iterate.
>>
>> As mentioned above, without QRESYNC this is impossible.  (Actually,  
>> CHANGEDSINCE was defined with CONDSTORE)
>
> See comment above. If QRESYNC is not available you say it still  
> shouldn't be necessary to get a list of UIDs and flags. Not sure how  
> else I would be able to catch flag changes? Am I misunderstanding  
> something again?

I didn't realize that you were keeping a local cache of flags that you  
could use to compare against.  Still, it would be nice to have this  
done automatically by Horde_Imap_Client.  Meaning that, upon opening a  
mailbox, it would automatically sync flag changes to the local cache  
without having to do it in application code.  Although I realize it  
may not work in your situation: the server sends a list of messages to  
the client; the flags are changed somehow; this change occurs in a  
Horde access not associated with the activesync syncing; thus, the  
next time the activesync sync occurs we are comparing the IMAP server  
state with the Horde cache state, NOT the activesync client state.

You can see how CONDSTORE/QRESYNC makes things much easier.

Sidenote: CONDSTORE by itself is not enough to properly implement  
this.  CONDSTORE works when the connecting client is the MUA.  In the  
activesync case, the client (Horde) is actually acting as a proxy to  
the activesync client. We need the additional VANISHED functionality  
provided by QRESYNC to handle everything solely via the MODSEQ number.  
(Otherwise, it will require additional UID FETCHing to do this  
properly.)
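
For reference, a sketch of the wire commands involved, using the syntax  
from RFC 7162 as I understand it. The Python helpers are hypothetical  
string builders for illustration, not part of any library, and they  
assume the mailbox name needs no quoting and that QRESYNC has already  
been enabled on the connection.

```python
def qresync_select(tag: str, mailbox: str, uidvalidity: int, modseq: int) -> str:
    """Build a SELECT that asks the server to resync from a stored state.

    The server is expected to answer with VANISHED (EARLIER) and FETCH
    responses covering everything after the given MODSEQ.
    """
    return f"{tag} SELECT {mailbox} (QRESYNC ({uidvalidity} {modseq}))"


def fetch_changed_and_vanished(tag: str, modseq: int, uid_range: str = "1:*") -> str:
    """Build a UID FETCH for flag changes and expunges since modseq."""
    return f"{tag} UID FETCH {uid_range} (FLAGS) (CHANGEDSINCE {modseq} VANISHED)"


print(qresync_select("A02", "INBOX", 67890007, 20050715194045000))
print(fetch_changed_and_vanished("A03", 12345))
```

With CONDSTORE alone, the VANISHED modifier in the second command is  
not available, which is where the extra UID FETCHing comes from.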

michael

___________________________________
Michael Slusarz [slusarz at horde.org]