[dev] History of share changes

Evert Pot evertpot at gmail.com
Sat Aug 1 01:16:07 UTC 2009


> Usually I would simply pick the most simple solution, but since the  
> history table could grow pretty large, this might have an impact on  
> the database performance. I'd like to get some feeback from anyone  
> who has more experience with fine tuning database queries, where  
> there may be a significant performance hit with the first query in a  
> large table. Obviously this is slower, but does it outweigh the  
> necessary code complexity and additional database record for the  
> faster query?
>
> Also, and this question actually goes to Ben and Evert, do we only  
> need ETags for existing shares/resources (which is why we need the  
> change timestamp), or do we have to track the time of the deletion/ 
> addition of complete resources?

Unless I'm misunderstanding the question, there's no need from  
CalDAV's perspective to preserve older versions of events.
I would personally generate ETags based on the content of an item (MD5  
hash), not the timestamp.  It might be worthwhile storing this ETag  
for caching purposes.

As for your points about DB speed. Like many others, my background is  
mostly MySQL based.
I don't think you'll have to worry about the more complicated query if  
you have a proper index.

However, using a string for your unique id (uid) is a bad idea. If you  
were to split up that field, and use plain old integers (appid,  
shareid) you will will be able to add a BTREE index on appid, shareid,  
history_ts. In that case size will not be really be an issue. If  
you're only interested in the history_ts field, in that case the index  
will be a covering index, and the underlying table will actually not  
be accessed at all.

For large tables that are accessed often, you should pick InnoDB for  
mysql, and recommend a large setting for innodb_buffer_pool_size. If  
the latter setting is low (the default) MySQL will not be happy.

If you are still worried about this after all that, create both a  
history table and a 'current' table. The history table will likely not  
be accessed as much as the current dataset, so your 'current' table  
will be smaller in size, and you can optimize for specific access  
patterns.

I hope this helps,
Evert



More information about the dev mailing list