[dev] Horde_Share_Base::listShares()
Chuck Hagenbuch
chuck at horde.org
Sat Jan 15 04:14:07 UTC 2011
Quoting Gunnar Wrobel <wrobel at horde.org>:
> Zitat von Jan Schneider <jan at horde.org>:
>
>> Zitat von Michael Rubinsky <mrubinsk at horde.org>:
>>
>>> Quoting Jan Schneider <jan at horde.org>:
>>>
>>>> Zitat von Gunnar Wrobel <wrobel at horde.org>:
>>>>
>>>>> Hi!
>>>>>
>>>>> The function mentioned above does support several sorting/paging
>>>>> features ('from', 'count', 'sort_by', 'direction'). As far as I
>>>>> can tell none of these parameters are actively used within our
>>>>> code base (with the exception of 'sort_by' but 'name' seems to
>>>>> be the only setting used here). Currently only the Sql-Drivers
>>>>> supports these features and I wonder if we really need them.
>>>
>>> Huh. That's embarrasing. I'm pretty sure it worked at one point,
>>> but maybe that was back in the DT days. I never realised that the
>>> paging parameters were ignored by the SQL share drivers.
>
> They don't. As mentioned in my original mail the SQL drivers are the
> only ones that actually support it. It does not work within the
> Horde_Share_Base::listShares() but that only applies to the Datatree
> and Kolab drivers.
>
>>> I know that Ansel passes these values when performing certain
>>> searches, like listing all of a user's galleries, or most recent
>>> galleries, or any number of other searches. The View code in
>>> Ansel, however, also honors the per-page prefs, which is probably
>>> why I never noticed this.
>
> Ah, I overlooked that it is indeed used there.
>
>>>
>>> Ansel could potentially generate a large number of shares, since
>>> each gallery/subgallery is a separate share - though I imagine it
>>> would be rare to reach the thousands per user number. I can
>>> understand the issue of large session caches in certain cases.
>>>
>>> I have mixed feelings about not implementing this though. On the
>>> one hand, if the user *does* have a large number of shares - and
>>> I'm honestly not sure how large that number would have to be to
>>> start beccoming an issue - then it seems a waste to carry around
>>> potentially the user's entire share list in the session if all
>>> that was ever needed were the first 9 shares. For example -
>>> imagine I have 100 Ansel galleries. I log into Horde, and since
>>> Ansel is my initial application, I load the first page of my
>>> gallery list, I then move on to reading email and don't go back to
>>> Ansel again during this session. Without support for paging, this
>>> would cause all 100 of my ansel shares to be carried around for
>>> the life of my session...and from what you are saying this is
>>> currently happening.
>
> No, it is indeed working as you expect it to. I didn't see it this
> way as I assumed the listShare() parameters could vary a lot and you
> might end up with many overlapping and redundant segments of the
> same list. But you are probably right and in most cases the list
> parameters will be quite regular.
>
>>>
>>> So I guess the question is: Is it more expensive to keep more
>>> search result entries in the list cache - each with a smaller
>>> subset of shares, or to keep a smaller number of search results,
>>> but each with larger numbers of shares?
>>>
>>> I think the case where a user would load thousands of shares, then
>>> sort them in multiple ways is a fringe case. More likely, a
>>> smaller subset of those shares would be loaded at a time e.g.,
>>> when paging through a gallery list.
>
> I agree.
>
>>>
>>> Another option is to ignore the paging/sorting parameters when
>>> caching and hitting the DB, and apply to paging limits in PHP,
>>> this has the disadvantage of having to apply the sort to the share
>>> list each time, prior to applying the limits. Since these
>>> parameters are not currently being passed to the DB anyway, this
>>> is probably what we are already doing in the application code.
>
> For Kolab I will have to do the paging and sorting in PHP but of
> course it makes sense to delegate the task to the backend in case it
> supports it.
>
>>>
>>>>> Horde_Share_Base::listShares() is fully replaced within
>>>>> Horde_Share_Sql and Horde_Share_Sql_Hierarchical.
>>>
>>> Doing it this way is probably historical cruft, left over from
>>> when these drivers were written, and DT was the only supported
>>> driver.
>>
>> No, this is exactly for the paging, because overwriting the default
>> method in the base class (which should do the paging/sorting in PHP
>> code) allows to implement optimized paging/sorting in the backend.
>> It's well possible that this doesn't work anymore, but I'm pretty
>> sure this was the original idea behind it.
>
> Yes, I agree. I just need to repair the paging/sorting in
> Horde_Share_Base::listShares() then. For drivers not overwriting
> listShares but instead using _listShares the whole list of shares
> gets stored in the _listcache though. That is something I will keep
> that way as the backend returns the full list anyway. I could
> imagine that one could also store the sorted/paged results in the
> cache to avoid the cycles that sorting will take. But at the moment
> only ansel uses this feature and there is only the SQL driver for
> ansel so that wouldn't make much sense.
>
>>
>>>>> For Horde_Share_Kolab and Horde_Share_Datatree
>>>>> Horde_Share_Base::listShares() is called which delegates to
>>>>> Horde_Share_{Datatree,Kolab}::_listShares(). The sorting/paging
>>>>> apparently should be handled within
>>>>> Horde_Share_Base::listShares() for these drivers. The current
>>>>> implementation is incomplete however.
>>>
>>> Now that DT is deprecated, if we decide to implement the paging in
>>> the DB, we could probably just make Horde_Share_Base::listShares()
>>> abstract. Otherwise, yeah, we could just implement it in
>>> Horde_Share_Base::listShares()
>
> I don't mind either way. The current structure has the side effect
> of forcing you to add the _listShares() method to the SQL driver
> which is kind of useless. But if some other driver than SQL based
> and Kolab would come along it might benefit from the current
> structure.
>
>>>
>>>>> I don't mind completing the implementation if necessary but
>>>>> there are some things that currently puzzle me. Assuming a user
>>>>> has a few thousand shares and we would use the paging feature
>>>>> for example: Wouldn't the current Horde_Share_Sql:listShares()
>>>>> quickly fill the Horde_Share_Base::_listcache with plenty of
>>>>> caching information that gets stored in the session? For every
>>>>> sort setting it would be the same thing.
>>>
>>> This is assuming that all thousand shares would be returned in
>>> each query. If we implement paging, it's likely only a small
>>> subset of those shares would be returned for any one query - the
>>> amount that would fit on a page. The exception to this is when
>>> querying for things like select lists, but those would likely not
>>> be requested with more than a single sort parameter. Generally,
>>> though, yes, each sort/page combmination would be cached separately.
>>>
>
> As mentioned above I see now that the current scheme makes sense.
>
>>>>> I also wonder a bit if that is actually information that should
>>>>> go into the session at all.
>>>
>>> I know that in FW_3, at least at one time, it greatly improved
>>> performance since share queries tend to be expensive, especially
>>> on mysql. Jan could probably answer this part better than I, since
>>> he has done some benchmarking recently.
>>
>> Yes, it improves performance a lot. And IIRC Michael has already
>> reduced session storage a lot by implementing Serializable.
>> It might still offer enough benefit if this cached in a Horde_Cache
>> backend instead of the session though. This would be separate from
>> existing share caching though.
>
> Caching was no question. I was just wondering if it should be part
> of the session. But if the list really does not get longer than the
> share list then I'm not really worried (anymore).
>
>>
>>>>> On the other hand there has been a lot of work put into the
>>>>> Share driver recently and as far as I understand it there is
>>>>> more to come. I just wanted to get some feedback to understand
>>>>> where this is heading and if I should ignore the topic for now.
>>>>
>>>> If we don't use some of the paging/sorting features and don't
>>>> come up with any use case for it (and honestly, who is going to
>>>> have such a huge load of shares?), I'm fine with removing them to
>>>> reduce complexity.
>>>
>>> Honestly not 100% sure which way I'd prefer, but I think I'm
>>> leaning towards having the DB honor the parameters, instead of
>>> having to sort/page only in pure php.
>
>
> I would then start fixing the current situation soon. After all the
> Kolab backend is the only that will currently use the PHP based
> sorting/paging.
>
> Two additional questions though:
>
> 1) Why do we have a separate Sql and Sql_Hierarchical driver?
> Wouldn't it make sense to collapse the two and always be
> hierarchical? I'd have no problems supporting a tree structure with
> the Kolab driver. In fact I'd like to do that.
One of the lessons for me from the DataTree library was that mixing
trees in with storage *and* auth was a nightmare for scaling. I'd
rather keep the tree structure out of the Share driver, and
potentially even move Ansel to tags instead of explicitly nested
galleries...
-chuck
More information about the dev
mailing list