[dev] [Corrected] Horde_Imap_Client and fetching vanished messages.

Mon Jan 16 23:50:59 UTC 2012

Quoting Michael M Slusarz <slusarz at horde.org>:

> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>
>> Quoting Michael M Slusarz <slusarz at horde.org>:
>>
>>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>>
>>>> Quoting Michael M Slusarz <slusarz at horde.org>:
>>>>
>>>>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>>>>
>>>>>> Some background: This code is all for the purpose of syncing  
>>>>>> email over ActiveSync. I'm using modseq and changedsince to  
>>>>>> retrieve the uids of any recently changed email. In this  
>>>>>> context, 'changed' would mean a new, never before seen email,  
>>>>>> or an email that has had the seen flag added or removed.
>>>>>
>>>>> Pardon my ignorance: what does the ActiveSync client send to the  
>>>>> server to indicate the current status of its synchronized cache?  
>>>>>  Is it a user-definable cache ID?  Or is it a timestamp of the  
>>>>> last sync?  Or something else?
>>>>
>>>> First off, you probably know this, but the ActiveSync client  
>>>> knows *nothing* about IMAP (or POP3 for that matter). In fact,  
>>>> it doesn't care at all where the messages come from, as long as  
>>>> it receives them in the format defined by the ActiveSync  
>>>> protocol. As far as state information goes, the only thing the  
>>>> ActiveSync client sends to or receives from the ActiveSync server  
>>>> is its current syncKey. The syncKey is basically a random hash  
>>>> string with an integer tacked onto the end of it. The server  
>>>> generates this key during the first sync of each collection  
>>>> (email|contacts|calendar etc...). The client sends this key along  
>>>> with each SYNC and PING request to notify the server what it is  
>>>> assuming the last known state was. When the state changes, the  
>>>> server increments the syncKey after sending changes to the  
>>>> client. This is the only bit of identifying information the  
>>>> client ever gets or sends. Server side, this syncKey is linked to  
>>>> the server state at the time that the syncKey was generated. So,  
>>>> e.g., with contacts or calendar data, this state is basically a  
>>>> timestamp. We use this timestamp to query the History system to  
>>>> get server changes to send.
>>>>
>>>> For mail, the state consists of the modseq/nextuid/uidvalidity  
>>>> data, along with a list of UIDs and their seen state that are on  
>>>> the device.
>>>
>>> To be clear, this is what each setup (QRESYNC and plain RFC 3501  
>>> IMAP) needs in terms of state:
>>>
>>> QRESYNC: UIDVALIDITY, MODSEQ
>>> IMAP: UIDVALIDITY, UIDNEXT, message flag information
>>>
>>> QRESYNC does not require UIDNEXT or message flag information  
>>> (unclear if that's what you were suggesting).
>>
>> I keep the flag data (actually just whether or not the seen flag is  
>> set), so we know what the device thinks the message's "read" state  
>> is (what ActiveSync refers to it as). That way we know if a flag  
>> change needs to be sent to the device or not. Sending unnecessary  
>> changes, even just flag changes, to the device wastes mobile  
>> bandwidth and contributes to poor battery performance since the  
>> change causes the currently running PING to terminate, a new SYNC  
>> request to be made and handled, and finally, a new PING request to  
>> be made. Each one of those requests, obviously, has all the normal  
>> ActiveSync protocol overhead.
>>
>> I guess one could argue here that this point is moot since these  
>> changes would rarely be duplicates; if the IMAP server is sending a  
>> change, it would be rare that the flag on the device would already  
>> match the flag on the IMAP server. The only place this would  
>> consistently happen is when a flag is initially changed on the  
>> device. This causes the change to be sent to the IMAP server  
>> (through the ActiveSync code, of course) which, in turn, will cause  
>> the flag change to be detected the next time we FETCH changes with  
>> changedsince.
>
> No, this shouldn't happen.  Say that, on a sync, you get a read flag  
> from the ActiveSync device.  You will take this flag change and send  
> to the IMAP server.  Once you are done with all the changes needed  
> on the IMAP server, you will grab the highestmodseq number and store  
> in your local cache.  Thus, if nothing changes on the IMAP server  
> the next time you sync, the highestmodseq will equal the current  
> modseq of the mailbox thus indicating you are properly sync'd.

The incoming changes are applied to the backend first. So, e.g., a  
typical sync session would be something like:

(1) At some point a SYNC occurs and we cache various server values  
like modseq.
(2) A message is read on the device, and the device sends a SYNC  
command to the server. This SYNC command can, but is not required to,  
also request server side changes.
(3) The incoming message change is sent to the backend server (thus  
altering the server's modseq value).
(4) Only after incoming changes are processed (and if this were a  
collection like contacts or events, checked for conflicts), we check  
for server side changes.
(5) After all changes (client and server) are dealt with, we cache the  
various bits of state that we need and tie it to the next syncKey.  
Again, in this case it would be modseq and uidvalidity. For other  
collections, it would be timestamp data.
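
A rough sketch of the five steps above, in Python rather than the PHP the real code uses. Everything here is hypothetical and for illustration only: FakeMailbox stands in for the IMAP backend, handle_sync and next_sync_key are made-up names, and the "{hash}counter" syncKey layout is an assumption, not the real key format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FakeMailbox:
    """Hypothetical stand-in for the IMAP backend."""
    highestmodseq: int = 1
    uidvalidity: int = 1000
    seen: dict = field(default_factory=dict)  # uid -> seen flag

    def set_seen(self, uid, value):
        # Any flag change on the backend bumps the mailbox modseq.
        self.seen[uid] = value
        self.highestmodseq += 1


@dataclass
class SyncState:
    """Server state tied to one syncKey (mail collection)."""
    modseq: int
    uidvalidity: int


def next_sync_key(sync_key):
    # Assumed layout: "{hash}" followed by an integer counter.
    match = re.fullmatch(r"(\{[0-9a-f]+\})(\d+)", sync_key)
    if match is None:
        raise ValueError("unexpected syncKey format")
    return match.group(1) + str(int(match.group(2)) + 1)


def handle_sync(mailbox, states, sync_key, incoming):
    cached = states[sync_key]              # (1) state cached at an earlier SYNC
    for uid, seen in incoming:             # (2) changes sent by the device
        mailbox.set_seen(uid, seen)        # (3) applied to the backend first
    # (4) only now check for server-side changes
    server_changed = mailbox.highestmodseq != cached.modseq
    # (5) cache the new state and tie it to the next syncKey
    new_key = next_sync_key(sync_key)
    states[new_key] = SyncState(mailbox.highestmodseq, mailbox.uidvalidity)
    return new_key, server_changed


mailbox = FakeMailbox()
states = {"{deadbeef}1": SyncState(modseq=1, uidvalidity=1000)}
new_key, server_changed = handle_sync(
    mailbox, states, "{deadbeef}1", [(42, True)])
```

In this sketch server_changed comes out true even though the only change was the one the device itself sent, because step (3) runs before step (4).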

Since the server side changes are checked after the incoming change is  
processed, the modseq value will *always* be different than the  
previously cached value. For other collections we deal with this with  
a combination of knowing what timespan we are checking for changes in,  
as well as caching the incoming changes until after the next SYNC.  
This is how the current incarnation of mail support in ActiveSync  
behaves in my local branch. A read flag almost immediately causes an  
attempt to send the same message back to the client (with the same  
flag change). It's only the previously discussed client change caching  
that prevents it.
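
The client change caching could look roughly like the following. This is a minimal sketch in Python, not the actual ActiveSync code; PendingClientChanges and its methods are hypothetical names, and only the seen flag is tracked.

```python
class PendingClientChanges:
    """Remembers flag changes that originated on the device."""

    def __init__(self):
        self._pending = {}  # uid -> seen flag the device sent us

    def remember(self, uid, seen):
        # Called when an incoming device change is applied to the backend.
        self._pending[uid] = seen

    def filter_server_changes(self, server_changes):
        """Return only the server changes the device does not already have.

        server_changes maps uid -> seen flag as reported by the backend.
        A change matching a remembered device change is dropped, and the
        remembered entry is removed so later real changes still go out.
        """
        to_send = {}
        for uid, seen in server_changes.items():
            if uid in self._pending and self._pending[uid] == seen:
                del self._pending[uid]
                continue
            to_send[uid] = seen
        return to_send


cache = PendingClientChanges()
cache.remember(42, True)                     # device marked UID 42 as read
first = cache.filter_server_changes({42: True, 43: False})
second = cache.filter_server_changes({42: True})
```

Here the first call drops the echo for UID 42 and passes UID 43 through. The second call passes UID 42 through, since the remembered entry was already consumed.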

>> Plus, this case can be dealt with the same way we deal with  
>> device-caused changes in the other collections - we save the  
>> incoming change in a separate cache and compare changes that the  
>> server is sending against those we *know* came from the client.  
>> When we find a match, we ignore the change and remove the entry  
>> from the cache. This might still cause premature PING termination,  
>> but *most* of the time the change would be caught (and ignored)  
>> during the same SYNC request that is sending the device changes  
>> anyway.
>
> This is an optimization, yes, but not technically necessary.

Unless I totally misunderstand what you are saying, I think this *is*  
necessary (see above).

>> Since I already needed to save the UIDs to detect deleted messages,  
>> it was easy to just add the flag state there as well. The bottom  
>> line to this point is that if you implement this functionality in  
>> Horde_Imap_Client, I would no longer need to cache the UID list and  
>> flag state in the ActiveSync driver.
>
> This is not something that can easily be built into the Imap_Client  
> driver, per se.  As previously discussed, the Imap_Client object is  
> only concerned about syncing between the client object and the  
> remote IMAP server.  It can not track syncing at a different level.   
> This was a valid design decision - useful because it hides all  
> caching logic within the client object so that an application does  
> not have to worry about how/why caching is needed.

Sure. Understood.

> Once you start caching an independent data store that Imap_Client  
> has no knowledge of, we necessarily have to expose some level of  
> caching since the application now has to manually handle the sync  
> state (Imap_Client can't automatically track this anymore).  So  
> implementing this functionality won't take place in the  
> Horde_Imap_Client_Base drivers themselves.

Agreed. That's why I originally planned this as a decorator/proxy that  
activesync would wrap around the Imap_Client.

> I was thinking instead that this could be implemented in an overlay,  
> or utility class, that just abstracts out some of the Imap calls  
> needed but doesn't try to automagically control the synchronization  
> process.  We can leverage the existing Horde_Imap_Client_Cache  
> object which would save some overhead.  But I haven't thought this  
> out too much yet.
>
>> For IMP that makes sense. For an ActiveSync client it does not. An  
>> ActiveSync client can be considered always active as long as the  
>> device is turned on. Every email the IMAP server receives will be  
>> pushed to the device and marked as unseen. If I am sitting at my  
>> desk, dealing with email throughout the day, I don't want my device  
>> to still show all of those emails as unseen when I get home.  
>> Granted, this point will be moot if the emails are moved to a  
>> different folder while I read them, but not everybody keeps their  
>> INBOX that clean. The reverse is not as big of a problem, since  
>> even if I leave IMP's session open all night and when I return I  
>> find all the mail I have already read on my device still marked as  
>> unseen, I can simply refresh the browser. The only way to 100%  
>> guarantee a full refresh like this on an ActiveSync device is to  
>> force-remove the device's state on the server and cause a complete  
>> re-sync (this is also how we would deal with the need to invalidate  
>> the device's state due to e.g., UIDVALIDITY changing).
>
> The good news is that UIDVALIDITY changing should never ever ever  
> never happen in practical usage.  So you should not worry about  
> performance issues regarding this.

Good news, indeed.

>> ActiveSync NEEDS to provide a way to reliably sync flags to the  
>> client. Ideally, it would be great to provide this for both QRESYNC  
>> and IMAP. Perhaps make it a configuration switch to turn on the  
>> support in IMAP only servers. Would be nice if the functionality  
>> was abstracted in the Imap_Client before 4.1. If not, I can  
>> implement it in the ActiveSync driver.
>
> This is something that can/should be added to  
> Horde_Imap_Client_Base.  Most likely as a configuration flag/option  
> along the lines of "always sync flag changes when opening a  
> mailbox".  When using a QRESYNC/CONDSTORE server, this is already  
> done automatically so this adds no load.  This flag would only cause  
> extra work for servers that don't support these extensions.
>
>> Ah. I did not realize that the IMAP client cache would be the same  
>> for both the user's Horde session and the ActiveSync session. I  
>> guess I'll need to keep the flags cached in ActiveSync after all,  
>> right?
>
> Technically, it does not need to be the same - you could pass a  
> different cache object in.  There would just be duplication of IMAP  
> data though if that happens, which could be quite a bit of data.
>
> Maybe what needs to be done is to separate message data caching from  
> mailbox list caching.  Although I'm not sure the activesync code has  
> access to the Imp IMAP cache object anyway, so this might be a  
> non-issue.

Currently, I use the mail/ImapOb API method to obtain the Imap_Client,  
if that's what you mean. Also, just FYI, I use the IMP API to get the  
mailbox list (so I don't have to duplicate code in Core's ActiveSync  
driver), and I have added an API method to IMP to get the special  
folder names. I know I can get it from prefs directly, but (1) didn't  
want to duplicate the code, and (2) using IMP to do this also takes  
into account various hooks.

>> ActiveSync connects, in addition to getting new messages and  
>> expunged messages, gets a list of changed uids/flags from  
>> Horde_Imap_Client (regardless of how it determined them - IMAP or  
>> QRESYNC) and can then compare those flags against the client state.  
>> So, basically the ActiveSync client code would really not change  
>> much from what I was planning - I would still need to compare the  
>> device flags with what Horde_Imap_Client tells me they are - it's  
>> just that the optimizations that would come from having QRESYNC  
>> available would be done inside Horde_Imap_Client?
>
> The optimizations from QRESYNC are already available in  
> Horde_Imap_Client.  It's the opposite - abstracting the code  
> necessary for the non-QRESYNC situation into Horde_Imap_Client.  So  
> you don't have to write all of that code in the application.
> Although again, not quite sure how I will/would/can do this - there  
> does not seem to be an easy way of abstracting the idea of a MODSEQ  
> value on the client side.  So that's what I need to figure out - an  
> easy way to determine sync state on both QRESYNC and non-QRESYNC  
> devices.

Ok. After digesting this and thinking about it some more, I think what  
I'm going to do for the time being is this:

Since the Horde session and ActiveSync session would theoretically  
need different cache objects, I'd rather not rely on this. It seems  
like a lot of duplicated data, especially since ActiveSync would only  
need a fraction of it.

It seems to me that the better way would be for ActiveSync to cache  
the data it needs. The infrastructure is already in place for this in  
ActiveSync code. If QRESYNC is available, this would only have to be  
the modseq/uidvalidity values. I will assume that any flag changes  
coming from the server are NOT duplicates so I would have no need to  
know the device's message flags (since we can catch 99.9% of the  
changes that would be duplicates by utilizing the system that already  
exists for the other collections).

If QRESYNC is NOT available, I'd need to cache nextuid and uidvalidity  
and at least the UIDs that exist on the device so I can detect  
vanished messages. To support flag changes if QRESYNC is not available  
I would also cache the seen flag state for each message. I'd probably  
make this part configurable to avoid large installs being killed by  
this.
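
For the non-QRESYNC case, the comparison could be sketched like this. It is Python for illustration only, with hypothetical names throughout (DeviceState, ResyncRequired, diff_without_qresync), and it assumes the driver has already fetched the current UIDs and seen flags from the IMAP server.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """What would be cached per folder when QRESYNC is unavailable."""
    uidvalidity: int
    uidnext: int
    seen: dict = field(default_factory=dict)  # uid -> seen flag on device


class ResyncRequired(Exception):
    """UIDVALIDITY changed, so the cached state is useless."""


def diff_without_qresync(state, server_uidvalidity, server_seen):
    """Compare cached device state with the current server listing.

    server_seen maps uid -> seen flag for every message in the mailbox.
    Returns (new_uids, vanished_uids, flag_changes).
    """
    if server_uidvalidity != state.uidvalidity:
        raise ResyncRequired("UIDVALIDITY changed; remove the device state")
    # Anything at or above the cached UIDNEXT arrived after the last sync.
    new_uids = sorted(uid for uid in server_seen if uid >= state.uidnext)
    # Anything the device has that the server no longer lists has vanished.
    vanished = sorted(uid for uid in state.seen if uid not in server_seen)
    # Same UID on both sides with a different seen flag is a flag change.
    flag_changes = {
        uid: seen
        for uid, seen in server_seen.items()
        if uid in state.seen and state.seen[uid] != seen
    }
    return new_uids, vanished, flag_changes


state = DeviceState(uidvalidity=1000, uidnext=5,
                    seen={1: False, 2: True, 3: False})
new_uids, vanished, flag_changes = diff_without_qresync(
    state, 1000, {1: True, 2: True, 5: False})
```

The per-message seen map is the part that grows with mailbox size, which is why it would be the configurable piece.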

Mail support in ActiveSync is being targeted for the 4.1 releases  
coming up in April. I think this approach probably makes the most  
sense, given the time constraints we are talking about and the fact  
that most of the infrastructure for this already exists in the  
ActiveSync code. If it turns out to be possible/practical to abstract  
out a MODSEQ-like functionality for non-QRESYNC servers that would  
make sense for ActiveSync, I could adapt the ActiveSync code for the  
5.0 release.

> (Wow, this thread is getting very technically dense.)

Indeed. Though I am learning a lot, thanks for taking the time to  
explain this.

-- 
mike

The Horde Project (www.horde.org)
mrubinsk at horde.org