[dev] History of share changes

Tue Aug 11 12:23:52 UTC 2009

Zitat von Evert Pot <evertpot at gmail.com>:

>> For CalDAV and other stuff we need to track when a share has been   
>> changed, or more precisely, when the content of a share, i.e the   
>> events/tasks in it have been changed.
>> Currently we only store history entries for individual objects, not  
>>  for the complete share. To identify objects we use a UID like   
>> application:share_id:object_id.
>>
>> There are two ways to solve this. The simple one is to extend the   
>> History API to get the timestamp from the most recently changed  
>> entry  in a share with:
>> SELECT history_ts FROM horde_histories WHERE object_uid LIKE   
>> 'app:share:%' ORDER BY history_ts DESC LIMIT 1
>>
>> The more complex solution is to add specific history entries for  
>> share  changes in all places where we already log object changes.  
>> We could  then use a query like:
>> SELECT history_ts FROM horde_histories WHERE object_uid = 'app:share'
>
>> Usually I would simply pick the most simple solution, but since the  
>> history table could grow pretty large, this might have an impact on  
>> the database performance. I'd like to get some feeback from anyone  
>> who has more experience with fine tuning database queries, where  
>> there may be a significant performance hit with the first query in  
>> a large table. Obviously this is slower, but does it outweigh the  
>> necessary code complexity and additional database record for the  
>> faster query?
>>
>> Also, and this question actually goes to Ben and Evert, do we only  
>> need ETags for existing shares/resources (which is why we need the  
>> change timestamp), or do we have to track the time of the  
>> deletion/addition of complete resources?
>
> Unless I'm misunderstanding the question, there's no need from  
> CalDAV's perspective to preserve older versions of events.
> I would personally generate ETags based on the content of an item  
> (MD5 hash), not the timestamp.  It might be worthwhile storing this  
> ETag for caching purposes.
>
> As for your points about DB speed. Like many others, my background  
> is mostly MySQL based.
> I don't think you'll have to worry about the more complicated query  
> if you have a proper index.
>
> However, using a string for your unique id (uid) is a bad idea. If  
> you were to split up that field, and use plain old integers (appid,  
> shareid) you will will be able to add a BTREE index on appid,  
> shareid, history_ts. In that case size will not be really be an  
> issue. If you're only interested in the history_ts field, in that  
> case the index will be a covering index, and the underlying table  
> will actually not be accessed at all.
>
> For large tables that are accessed often, you should pick InnoDB for  
> mysql, and recommend a large setting for innodb_buffer_pool_size. If  
> the latter setting is low (the default) MySQL will not be happy.
>
> If you are still worried about this after all that, create both a  
> history table and a 'current' table. The history table will likely  
> not be accessed as much as the current dataset, so your 'current'  
> table will be smaller in size, and you can optimize for specific  
> access patterns.

Any more opinions about this?

Jan.

-- 
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digitale PGP-Unterschrift
URL: <http://lists.horde.org/archives/dev/attachments/20090811/4f577536/attachment.bin>