[dev] History of share changes

Chuck Hagenbuch chuck at horde.org
Thu Aug 13 04:08:04 UTC 2009


Quoting Jan Schneider <jan at horde.org>:

> Zitat von Evert Pot <evertpot at gmail.com>:
>
>>> For CalDAV and other stuff we need to track when a share has been
>>> changed, or more precisely, when the content of a share, i.e. the
>>> events/tasks in it, has been changed.
>>> Currently we only store history entries for individual objects, not
>>> for the complete share. To identify objects we use a UID like
>>> application:share_id:object_id.
>>>
>>> There are two ways to solve this. The simple one is to extend the
>>> History API to get the timestamp from the most recently changed
>>> entry in a share with:
>>> SELECT history_ts FROM horde_histories WHERE object_uid LIKE
>>> 'app:share:%' ORDER BY history_ts DESC LIMIT 1
>>>
>>> The more complex solution is to add specific history entries for
>>> share changes in all places where we already log object changes.
>>> We could then use a query like:
>>> SELECT history_ts FROM horde_histories WHERE object_uid = 'app:share'
>>
>>> Usually I would simply pick the simplest solution, but since the
>>> history table could grow pretty large, this might have an impact on
>>> database performance. I'd like to get some feedback from anyone who
>>> has more experience with fine-tuning database queries, on whether
>>> there may be a significant performance hit with the first query on
>>> a large table. Obviously the first query is slower, but does that
>>> outweigh the additional code complexity and extra database records
>>> required for the faster query?
>>>
>>> Also, and this question actually goes to Ben and Evert, do we only  
>>> need ETags for existing shares/resources (which is why we need the  
>>> change timestamp), or do we have to track the time of the  
>>> deletion/addition of complete resources?
>>
>> Unless I'm misunderstanding the question, there's no need from  
>> CalDAV's perspective to preserve older versions of events.
>> I would personally generate ETags based on the content of an item  
>> (MD5 hash), not the timestamp.  It might be worthwhile storing this  
>> ETag for caching purposes.
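If we go the content-hash route, this is roughly what it looks like on the
MySQL side. It is only an illustration; "calendar_objects" and "ical_data"
are made-up names, not existing Horde schema:

    -- Illustration only: derive the ETag from the serialized item, not a timestamp.
    SELECT MD5(ical_data) AS etag FROM calendar_objects WHERE object_uid = ?;
    -- Optionally store it so it is only recomputed when the item changes:
    UPDATE calendar_objects SET etag = MD5(ical_data) WHERE object_uid = ?;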
>>
>> As for your points about DB speed: like many others, my background
>> is mostly MySQL-based.
>> I don't think you'll have to worry about the more complicated query
>> if you have a proper index.
>>
>> However, using a string for your unique id (uid) is a bad idea. If
>> you were to split up that field and use plain old integers (appid,
>> shareid), you will be able to add a BTREE index on appid, shareid,
>> history_ts. In that case size will not really be an issue. If you're
>> only interested in the history_ts field, the index will be a covering
>> index, and the underlying table will actually not be accessed at all.
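To make that concrete, a rough sketch of the split-column layout and the
covering index (the integer columns are hypothetical; the current table
only stores the object_uid string):

    -- Hypothetical layout with the UID split into integer columns.
    ALTER TABLE horde_histories
        ADD COLUMN history_appid   INT NOT NULL,
        ADD COLUMN history_shareid INT NOT NULL;
    CREATE INDEX history_share_ts_idx
        ON horde_histories (history_appid, history_shareid, history_ts);
    -- Covering: MySQL can answer this from the index without reading the rows.
    SELECT MAX(history_ts) FROM horde_histories
     WHERE history_appid = ? AND history_shareid = ?;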
>>
>> For large tables that are accessed often, you should pick InnoDB
>> for MySQL, and recommend a large setting for
>> innodb_buffer_pool_size. If the latter setting is left low (the
>> default), MySQL will not be happy.
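For reference, the two knobs in question, with an example value only (the
right buffer pool size depends entirely on the RAM available to MySQL):

    -- Convert the history table to InnoDB.
    ALTER TABLE horde_histories ENGINE=InnoDB;
    -- In my.cnf, under [mysqld]; 512M is just an example value:
    -- innodb_buffer_pool_size = 512M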
>>
>> If you are still worried about this after all that, create both a  
>> history table and a 'current' table. The history table will likely  
>> not be accessed as much as the current dataset, so your 'current'  
>> table will be smaller in size, and you can optimize for specific  
>> access patterns.
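One way to read the two-table idea, sketched with made-up names (this is
not existing Horde schema):

    -- Hypothetical 'current' table: one row per object with its latest change,
    -- while horde_histories keeps the full log.
    CREATE TABLE horde_histories_current (
        object_uid VARCHAR(255) NOT NULL PRIMARY KEY,
        history_ts INT          NOT NULL,
        KEY (history_ts)
    ) ENGINE=InnoDB;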
>
> Any more opinions about this?

I've been thinking that we should put created and modified fields back
into most of our entity tables. That would address a lot of the speed
concerns raised here and would also make History unnecessary for many
uses. The only thing it doesn't address is prior revisions (which
History doesn't really address either) and tracking deleted items
(which could either still use History, or use a simplified system).
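In SQL terms, and purely as a sketch (the column names are assumptions,
not a schema change anyone has agreed on), that would look something like:

    -- Hypothetical change tracking directly on an entity table.
    ALTER TABLE kronolith_events
        ADD COLUMN event_created  INT NOT NULL DEFAULT 0,
        ADD COLUMN event_modified INT NOT NULL DEFAULT 0;
    -- A share's last-change timestamp then becomes one indexed aggregate:
    SELECT MAX(event_modified) FROM kronolith_events WHERE calendar_id = ?;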

-chuck

