[dev] [Corrected] Horde_Imap_Client and fetching vanished messages.

Wed Jan 11 14:56:21 UTC 2012

Quoting Michael M Slusarz <slusarz at horde.org>:

> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>
>> Quoting Michael M Slusarz <slusarz at horde.org>:
>>
>>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>>
>>>> Some background: This code is all for the purpose of syncing  
>>>> email over ActiveSync. I'm using modseq and changedsince to  
>>>> retrieve the uids of any recently changed email. In this context,  
>>>> 'changed' would mean a new, never before seen email, or an email  
>>>> that has had the seen flag added or removed.
>>>
>>> Pardon my ignorance: what does the ActiveSync client send to the  
>>> server to indicate the current status of its synchronized cache?   
>>> Is it a user-definable cache ID?  Or is it a timestamp of the last  
>>> sync?  Or something else?
>>
>> First off, you probably know this, but the ActiveSync client knows  
>> *nothing* about IMAP (or POP3 for that matter). In fact, it  
>> doesn't care at all where the messages come from, as long as it  
>> receives them in the format defined by the ActiveSync protocol. As  
>> far as state information goes, the only thing the ActiveSync client  
>> sends to or receives from the ActiveSync server is its current  
>> syncKey. The syncKey is basically a random hash string with an  
>> integer tacked onto the end of it. The server generates this key  
>> during the first sync of each collection (email|contacts|calendar  
>> etc...). The client sends this key along with each SYNC and PING  
>> request to notify the server what it is assuming the last known  
>> state was. When the state changes, the server increments the  
>> syncKey after sending changes to the client. This is the only bit  
>> of identifying information the client ever gets or sends. Server  
>> side, this syncKey is linked to the server state at the time that  
>> the syncKey was generated. So, e.g., with contacts or calendar  
>> data, this state is basically a timestamp. We use this timestamp to  
>> query the History system to get server changes to send.
>>
>> For mail, the state consists of the modseq/nextuid/uidvalidity  
>> data, along with a list of UIDs and their seen state that are on  
>> the device.
>
> To be clear, this is what each setup (QRESYNC and plain RFC 3501  
> IMAP) needs in terms of state:
>
> QRESYNC: UIDVALIDITY, MODSEQ
> IMAP: UIDVALIDITY, UIDNEXT, message flag information
>
> QRESYNC does not require UIDNEXT or message flag information (unclear  
> if that's what you were suggesting).

I keep the flag data (actually just whether or not the seen flag is  
set), so we know what the device thinks the message's "read" state is  
(ActiveSync's term for the seen flag). That way we know if a flag change  
needs to be sent to the device or not. Sending unnecessary changes,  
even just flag changes, to the device wastes mobile bandwidth and  
contributes to poor battery performance since the change causes the  
currently running PING to terminate, a new SYNC request to be made and  
handled, and finally, a new PING request to be made. Each one of those  
requests, obviously, has all the normal ActiveSync protocol overhead.

I guess one could argue here that this point is moot since these  
changes would rarely be duplicates; if the IMAP server is sending a  
change, it would be rare that the flag on the device would already  
match the flag on the IMAP server. The only place this would  
consistently happen is when a flag is initially changed on the device.  
This causes the change to be sent to the IMAP server (through the  
ActiveSync code, of course) which, in turn, will cause the flag change  
to be detected the next time we FETCH changes with changedsince. Plus,  
this case can be dealt with the same way we deal with device-caused  
changes in the other collections - we save the incoming change in a  
separate cache and compare changes that the server is sending against  
those we *know* came from the client. When we find a match, we ignore  
the change and remove the entry from the cache. This might still cause  
premature PING termination, but *most* of the time the change would be  
caught (and ignored) during the same SYNC request that is sending the  
device changes anyway.
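
A minimal sketch of the separate-cache comparison described above, written in Python for brevity rather than as actual ActiveSync driver code. The function name, the `(uid, seen)` tuple shape, and the use of a set for the cache are all assumptions made for illustration.

```python
def drop_device_echoes(server_changes, pending_from_device):
    """Filter out server-reported changes that merely echo device-originated ones.

    server_changes: list of (uid, seen) tuples detected on the IMAP side.
    pending_from_device: set of (uid, seen) tuples previously received from
        the device (hypothetical cache). It is mutated in place: an entry is
        removed once its echo has been seen.
    Returns the changes that still need to be sent to the device.
    """
    to_send = []
    for change in server_changes:
        if change in pending_from_device:
            # We know this change came from the device: ignore it and
            # remove the entry from the cache.
            pending_from_device.discard(change)
        else:
            to_send.append(change)
    return to_send
```

With a cache holding `(5, True)`, a server report of `[(5, True), (6, False)]` would leave only `(6, False)` to send, and the cache would end up empty.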

Since I already needed to save the UIDs to detect deleted messages, it  
was easy to just add the flag state there as well. The bottom line to  
this point is that if you implement this functionality in  
Horde_Imap_Client, I would no longer need to cache the UID list and  
flag state in the ActiveSync driver.

>>> If user-definable, you would obviously be using a combination of  
>>> UIDVALIDITY + MODSEQ if QRESYNC is available.  If QRESYNC is not  
>>> available, it will be a combination of UIDVALIDITY + UIDNEXT.  But  
>>> note the latter cannot ever reliably catch flag changes (you need  
>>> to do a FETCH FLAGS on every sync to reliably catch flag changes  
>>> without QRESYNC).
>>>
>>>> If I don't use 'changedsince', I would have to retrieve the  
>>>> complete set of UIDs again, along with flags and compare each  
>>>> message with my stored device state.
>>>
>>> This should only need to be done once, on the first sync,  
>>> regardless of whether QRESYNC is available or not.
>>
>> I'm confused. If QRESYNC/CONDSTORE is NOT available, how else would  
>> I be able to catch flag changes, other than querying the imap  
>> client for what flags are on each message and comparing them with  
>> the device's state? To be clear, I'm talking about querying the  
>> imap client, not the server itself. I assume the client will be  
>> smart about when it actually talks to the server, and what it asks  
>> for.
>
> Guess I skipped a step.  You *do* need to do the above if you want  
> to be 100% sure to sync flags correctly.  However, this can be/is a  
> potentially expensive operation on the server.  So a practical  
> solution, and one we use in IMP, is that we DON'T guarantee that  
> flags are synchronized correctly.  99% of the time, this assumption  
> is fine - very few people have multiple connections open to an IMAP  
> server that are simultaneously updating flags.

For IMP that makes sense. For an ActiveSync client it does not. An  
ActiveSync client can be considered always active as long as the  
device is turned on. Every email the IMAP server receives will be  
pushed to the device and marked as unseen. If I am sitting at my desk,  
dealing with email throughout the day, I don't want my device to still  
show all of those emails as unseen when I get home. Granted, this  
point will be moot if the emails are moved to a different folder while  
I read them, but not everybody keeps their INBOX that clean. The  
reverse is not as big of a problem, since even if I leave IMP's session  
open all night and when I return I find all the mail I have already  
read on my device still marked as unseen, I can simply refresh the  
browser. The only way to 100% guarantee a full refresh like this on an  
ActiveSync device is to force-remove the device's state on the server  
and cause a complete re-sync (this is also how we would deal with the  
need to invalidate the device's state due to e.g., UIDVALIDITY  
changing).

ActiveSync NEEDS to provide a way to reliably sync flags to the  
client. Ideally, it would be great to provide this for both QRESYNC  
and IMAP. Perhaps make it a configuration switch to turn on the  
support in IMAP-only servers. It would be nice if the functionality was  
abstracted in the Imap_Client before 4.1. If not, I can implement it  
in the ActiveSync driver.

>>> In the absence of QRESYNC, it is still trivial to determine the  
>>> list of *new* messages since the last sync - since the cache ID  
>>> will be using the last known UID, it is simply a matter of  
>>> FETCH'ing all UIDs greater than this value.  But this does not  
>>> catch flag changes and it does not catch messages that have since  
>>> been deleted.  So, practically, you do need to do this  
>>> array_diff() (both ways - one to catch new messages, one to catch  
>>> deleted messages) to sync.
>>
>> This is why I thought I needed to query the imap client for the  
>> list of UIDs and flag state when QRESYNC is not available - so I  
>> can compare the flags against what the device thinks each message's  
>> flag state is.
>
> Is there no way to just send flag changes to the device?  In other  
> words, changed messages are an all-or-nothing action?

Yes, only flag changes can be sent. In fact, that's all you *should*  
send. The message is expected to be sent only once. Some devices will  
simply duplicate the message in the INBOX if it is sent twice,  
regardless of the UID.  When a message is marked seen on the server,  
all that is sent to the client is the instruction to mark it as read.

> I'm thinking it may be useful to abstract this kind of  
> synchronization into Imap Client itself.  Meaning: abstracting  
> changedsince/vanished so that it will work even without QRESYNC.
>
>>>> Caching in the imap client would obviously help with imap server  
>>>> load, but given how frequently this must occur, I'd like to avoid  
>>>> having to iterate over each message to check the status of the  
>>>> flags to determine what has changed. Using changedsince gets me  
>>>> only the messages that have changed, greatly reducing the number  
>>>> of messages I'd have to iterate.
>>>
>>> As mentioned above, without QRESYNC this is impossible.   
>>> (Actually, CHANGEDSINCE was defined with CONDSTORE)
>>
>> See comment above. If QRESYNC is not available you say it still  
>> shouldn't be necessary to get a list of UIDs and flags. Not sure  
>> how else I would be able to catch flag changes? Am I  
>> misunderstanding something again?
>
> I didn't realize that you were keeping a local cache of flags that  
> you could use to compare against.

Yes. I was trying to avoid keeping that kind of state, but realized it  
would be necessary. Since I was already keeping the list of the UIDs  
on the device (to deal with detecting deletions), it will be easy to  
add the flag state if needed.
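
To make the comparison concrete, here is a rough sketch of the two-way diff discussed in this thread, assuming the stored device state and the current server state are both available as maps of UID to seen flag. It is Python for illustration only; `diff_mailbox` and its argument shapes are hypothetical and not part of Horde_Imap_Client.

```python
def diff_mailbox(device_state, server_state):
    """Compare two {uid: seen} maps.

    device_state: what the device is believed to hold.
    server_state: what the IMAP side currently reports.
    Returns (new_uids, deleted_uids, flag_changes), where flag_changes maps
    a UID present on both sides to the seen value the device should adopt.
    """
    device_uids = set(device_state)
    server_uids = set(server_state)

    # One diff in each direction: new messages and deleted messages.
    new_uids = sorted(server_uids - device_uids)
    deleted_uids = sorted(device_uids - server_uids)

    # Only messages on both sides can have a flag change worth sending.
    flag_changes = {
        uid: server_state[uid]
        for uid in sorted(device_uids & server_uids)
        if server_state[uid] != device_state[uid]
    }
    return new_uids, deleted_uids, flag_changes
```

Only the entries in `flag_changes` would be sent as "mark read/unread" instructions; the messages themselves are never resent.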

> Still, it would be nice to have this done automatically by  
> Horde_Imap_Client.  Meaning that, upon opening a mailbox, it would  
> automatically sync flag changes to the local cache without having to  
> do it in application code.  Although I realize it may not work in  
> your situation: the server sends a list of messages to the client;  
> the flags are changed somehow; this change occurs in a Horde access  
> not associated with the activesync syncing; thus, the next time the  
> activesync sync occurs we are comparing the IMAP server state with  
> the Horde cache state, NOT the activesync client state.

Ah. I did not realize that the IMAP client cache would be the same for  
both the user's Horde session and the ActiveSync session. I guess I'll  
need to keep the flags cached in ActiveSync after all, right?  
ActiveSync connects and, in addition to getting new messages and expunged  
messages, gets a list of changed uids/flags from Horde_Imap_Client  
(regardless of how it determined them - IMAP or QRESYNC) and can then  
compare those flags against the client state. So, basically the  
ActiveSync client code would really not change much from what I was  
planning - I would still need to compare the device flags with what  
Horde_Imap_Client tells me they are - it's just that the optimizations  
that would come from having QRESYNC available would be done inside  
Horde_Imap_Client?

> You can see how CONDSTORE/QRESYNC makes things much easier.

Indeed.

> Sidenote: CONDSTORE by itself is not enough to properly implement  
> this.  CONDSTORE works when the connecting client is the MUA.  In  
> the activesync case, the client (Horde) is actually acting as a  
> proxy to the activesync client. We need the additional VANISHED  
> functionality provided by QRESYNC to handle everything solely via  
> the MODSEQ number.  (Otherwise, it will require additional UID  
> FETCHing to do this properly.)

Yeah, thanks for clearing that up. I'm already doing the extra UID  
FETCHing and array_diff()ing from when VANISHED wasn't working. I was  
planning to check for QRESYNC and then use VANISHED if it's available,  
fall back to the additional FETCH if not...but if you are going to add  
this functionality to Horde_Imap_Client...
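
The check-then-fall-back plan could look roughly like the sketch below. It is Python for illustration, not Horde code: `find_vanished` and the two callables passed to it are hypothetical stand-ins for whatever the IMAP layer exposes, and the capability set is assumed to be already known.

```python
def find_vanished(capabilities, known_uids, fetch_vanished, fetch_all_uids):
    """Return the sorted UIDs from known_uids that are no longer on the server.

    capabilities: set of capability names advertised by the server.
    known_uids: UIDs the device is believed to hold.
    fetch_vanished: callable returning UIDs reported as VANISHED since the
        stored MODSEQ (hypothetical; only used when QRESYNC is available).
    fetch_all_uids: callable returning every UID currently in the mailbox
        (hypothetical; the fallback path).
    """
    known = set(known_uids)
    if "QRESYNC" in capabilities:
        # The server may report UIDs the device never had; keep only known ones.
        return sorted(set(fetch_vanished()) & known)
    # Fallback: fetch the full UID list and diff it against the stored one.
    return sorted(known - set(fetch_all_uids()))
```

Either branch yields the same kind of result, so the calling code does not need to care which path was taken.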

-- 
mike

The Horde Project (www.horde.org)
mrubinsk at horde.org