[dev] History of share changes
Jan Schneider
jan at horde.org
Thu Aug 13 09:05:06 UTC 2009
Zitat von Chuck Hagenbuch <chuck at horde.org>:
> Quoting Jan Schneider <jan at horde.org>:
>
>> Zitat von Evert Pot <evertpot at gmail.com>:
>>
>>>> For CalDAV and other stuff we need to track when a share has been
>>>> changed, or more precisely, when the content of a share, i.e the
>>>> events/tasks in it have been changed.
>>>> Currently we only store history entries for individual objects,
>>>> not for the complete share. To identify objects we use a UID
>>>> like application:share_id:object_id.
>>>>
>>>> There are two ways to solve this. The simple one is to extend the
>>>> History API to get the timestamp from the most recently changed
>>>> entry in a share with:
>>>> SELECT history_ts FROM horde_histories WHERE object_uid LIKE
>>>> 'app:share:%' ORDER BY history_ts DESC LIMIT 1
>>>>
>>>> The more complex solution is to add specific history entries for
>>>> share changes in all places where we already log object changes.
>>>> We could then use a query like:
>>>> SELECT history_ts FROM horde_histories WHERE object_uid = 'app:share'
>>>
>>>> Usually I would simply pick the most simple solution, but since
>>>> the history table could grow pretty large, this might have an
>>>> impact on the database performance. I'd like to get some feeback
>>>> from anyone who has more experience with fine tuning database
>>>> queries, where there may be a significant performance hit with
>>>> the first query in a large table. Obviously this is slower, but
>>>> does it outweigh the necessary code complexity and additional
>>>> database record for the faster query?
>>>>
>>>> Also, and this question actually goes to Ben and Evert, do we
>>>> only need ETags for existing shares/resources (which is why we
>>>> need the change timestamp), or do we have to track the time of
>>>> the deletion/addition of complete resources?
>>>
>>> Unless I'm misunderstanding the question, there's no need from
>>> CalDAV's perspective to preserve older versions of events.
>>> I would personally generate ETags based on the content of an item
>>> (MD5 hash), not the timestamp. It might be worthwhile storing
>>> this ETag for caching purposes.
>>>
>>> As for your points about DB speed. Like many others, my background
>>> is mostly MySQL based.
>>> I don't think you'll have to worry about the more complicated
>>> query if you have a proper index.
>>>
>>> However, using a string for your unique id (uid) is a bad idea. If
>>> you were to split up that field, and use plain old integers
>>> (appid, shareid) you will will be able to add a BTREE index on
>>> appid, shareid, history_ts. In that case size will not be really
>>> be an issue. If you're only interested in the history_ts field, in
>>> that case the index will be a covering index, and the underlying
>>> table will actually not be accessed at all.
>>>
>>> For large tables that are accessed often, you should pick InnoDB
>>> for mysql, and recommend a large setting for
>>> innodb_buffer_pool_size. If the latter setting is low (the
>>> default) MySQL will not be happy.
>>>
>>> If you are still worried about this after all that, create both a
>>> history table and a 'current' table. The history table will likely
>>> not be accessed as much as the current dataset, so your 'current'
>>> table will be smaller in size, and you can optimize for specific
>>> access patterns.
>>
>> Any more opinions about this?
>
> I've been thinking that we should put created and modified fields
> back into most of our entity tables. That would address a lot of the
> problems here for speed and also make History not necessary for many
> uses. The only thing it doesn't address is prior revisions (which
> history doesn't really address either) and tracking deleted items
> (which could either still use history, or use a simplified system).
I don't see any problem with the history system per se. I doubt that
having the date attributes with the original data will have much if
any performance improvement over a separate table.
Having a central place for all history data is much better
maintenance-wise too, we would not only need a place to track deleted
data, but we also have history that's not related to anything stored
in our database, like message history.
Jan.
--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digitale PGP-Unterschrift
URL: <http://lists.horde.org/archives/dev/attachments/20090813/50a98abc/attachment-0001.bin>
More information about the dev
mailing list