[dev] History of share changes

Jan Schneider jan at horde.org
Thu Aug 13 09:05:06 UTC 2009


Zitat von Chuck Hagenbuch <chuck at horde.org>:

> Quoting Jan Schneider <jan at horde.org>:
>
>> Zitat von Evert Pot <evertpot at gmail.com>:
>>
>>>> For CalDAV and other stuff we need to track when a share has been  
>>>>  changed, or more precisely, when the content of a share, i.e the  
>>>>  events/tasks in it have been changed.
>>>> Currently we only store history entries for individual objects,  
>>>> not  for the complete share. To identify objects we use a UID  
>>>> like  application:share_id:object_id.
>>>>
>>>> There are two ways to solve this. The simple one is to extend the  
>>>>  History API to get the timestamp from the most recently changed  
>>>> entry  in a share with:
>>>> SELECT history_ts FROM horde_histories WHERE object_uid LIKE   
>>>> 'app:share:%' ORDER BY history_ts DESC LIMIT 1
>>>>
>>>> The more complex solution is to add specific history entries for  
>>>> share  changes in all places where we already log object changes.  
>>>> We could  then use a query like:
>>>> SELECT history_ts FROM horde_histories WHERE object_uid = 'app:share'
>>>
>>>> Usually I would simply pick the most simple solution, but since  
>>>> the history table could grow pretty large, this might have an  
>>>> impact on the database performance. I'd like to get some feeback  
>>>> from anyone who has more experience with fine tuning database  
>>>> queries, where there may be a significant performance hit with  
>>>> the first query in a large table. Obviously this is slower, but  
>>>> does it outweigh the necessary code complexity and additional  
>>>> database record for the faster query?
>>>>
>>>> Also, and this question actually goes to Ben and Evert, do we  
>>>> only need ETags for existing shares/resources (which is why we  
>>>> need the change timestamp), or do we have to track the time of  
>>>> the deletion/addition of complete resources?
>>>
>>> Unless I'm misunderstanding the question, there's no need from  
>>> CalDAV's perspective to preserve older versions of events.
>>> I would personally generate ETags based on the content of an item  
>>> (MD5 hash), not the timestamp.  It might be worthwhile storing  
>>> this ETag for caching purposes.
>>>
>>> As for your points about DB speed. Like many others, my background  
>>> is mostly MySQL based.
>>> I don't think you'll have to worry about the more complicated  
>>> query if you have a proper index.
>>>
>>> However, using a string for your unique id (uid) is a bad idea. If  
>>> you were to split up that field, and use plain old integers  
>>> (appid, shareid) you will will be able to add a BTREE index on  
>>> appid, shareid, history_ts. In that case size will not be really  
>>> be an issue. If you're only interested in the history_ts field, in  
>>> that case the index will be a covering index, and the underlying  
>>> table will actually not be accessed at all.
>>>
>>> For large tables that are accessed often, you should pick InnoDB  
>>> for mysql, and recommend a large setting for  
>>> innodb_buffer_pool_size. If the latter setting is low (the  
>>> default) MySQL will not be happy.
>>>
>>> If you are still worried about this after all that, create both a  
>>> history table and a 'current' table. The history table will likely  
>>> not be accessed as much as the current dataset, so your 'current'  
>>> table will be smaller in size, and you can optimize for specific  
>>> access patterns.
>>
>> Any more opinions about this?
>
> I've been thinking that we should put created and modified fields  
> back into most of our entity tables. That would address a lot of the  
> problems here for speed and also make History not necessary for many  
> uses. The only thing it doesn't address is prior revisions (which  
> history doesn't really address either) and tracking deleted items  
> (which could either still use history, or use a simplified system).

I don't see any problem with the history system per se. I doubt that  
having the date attributes with the original data will have much if  
any performance improvement over a separate table.
Having a central place for all history data is much better  
maintenance-wise too, we would not only need a place to track deleted  
data, but we also have history that's not related to anything stored  
in our database, like message history.

Jan.

-- 
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digitale PGP-Unterschrift
URL: <http://lists.horde.org/archives/dev/attachments/20090813/50a98abc/attachment-0001.bin>


More information about the dev mailing list