[dev] History of share changes
Evert Pot
evertpot at gmail.com
Sat Aug 1 01:16:07 UTC 2009
> Usually I would simply pick the most simple solution, but since the
> history table could grow pretty large, this might have an impact on
> the database performance. I'd like to get some feeback from anyone
> who has more experience with fine tuning database queries, where
> there may be a significant performance hit with the first query in a
> large table. Obviously this is slower, but does it outweigh the
> necessary code complexity and additional database record for the
> faster query?
>
> Also, and this question actually goes to Ben and Evert, do we only
> need ETags for existing shares/resources (which is why we need the
> change timestamp), or do we have to track the time of the deletion/
> addition of complete resources?
Unless I'm misunderstanding the question, there's no need from
CalDAV's perspective to preserve older versions of events.
I would personally generate ETags based on the content of an item (MD5
hash), not the timestamp. It might be worthwhile storing this ETag
for caching purposes.
As for your points about DB speed. Like many others, my background is
mostly MySQL based.
I don't think you'll have to worry about the more complicated query if
you have a proper index.
However, using a string for your unique id (uid) is a bad idea. If you
were to split up that field, and use plain old integers (appid,
shareid) you will will be able to add a BTREE index on appid, shareid,
history_ts. In that case size will not be really be an issue. If
you're only interested in the history_ts field, in that case the index
will be a covering index, and the underlying table will actually not
be accessed at all.
For large tables that are accessed often, you should pick InnoDB for
mysql, and recommend a large setting for innodb_buffer_pool_size. If
the latter setting is low (the default) MySQL will not be happy.
If you are still worried about this after all that, create both a
history table and a 'current' table. The history table will likely not
be accessed as much as the current dataset, so your 'current' table
will be smaller in size, and you can optimize for specific access
patterns.
I hope this helps,
Evert
More information about the dev
mailing list