[dev] Horde_Share_Base::listShares()

Sat Jan 15 04:14:07 UTC 2011

Quoting Gunnar Wrobel <wrobel at horde.org>:

> Zitat von Jan Schneider <jan at horde.org>:
>
>> Zitat von Michael Rubinsky <mrubinsk at horde.org>:
>>
>>> Quoting Jan Schneider <jan at horde.org>:
>>>
>>>> Zitat von Gunnar Wrobel <wrobel at horde.org>:
>>>>
>>>>> Hi!
>>>>>
>>>>> The function mentioned above does support several sorting/paging  
>>>>> features ('from', 'count', 'sort_by', 'direction'). As far as I  
>>>>> can tell none of these parameters are actively used within our  
>>>>> code base (with the exception of 'sort_by' but 'name' seems to  
>>>>> be the only setting used here). Currently only the Sql-Drivers  
>>>>> supports these features and I wonder if we really need them.
>>>
>>> Huh. That's embarrasing. I'm pretty sure it worked at one point,  
>>> but maybe that was back in the DT days. I never realised that the  
>>> paging parameters were ignored by the SQL share drivers.
>
> They don't. As mentioned in my original mail the SQL drivers are the  
> only ones that actually support it. It does not work within the  
> Horde_Share_Base::listShares() but that only applies to the Datatree  
> and Kolab drivers.
>
>>> I know that Ansel passes these values when performing certain  
>>> searches, like listing all of a user's galleries, or most recent  
>>> galleries, or any number of other searches. The View code in  
>>> Ansel, however, also honors the per-page prefs, which is probably  
>>> why I never noticed this.
>
> Ah, I overlooked that it is indeed used there.
>
>>>
>>> Ansel could potentially generate a large number of shares, since  
>>> each gallery/subgallery is a separate share - though I imagine it  
>>> would be rare to reach the thousands per user number. I can  
>>> understand the issue of large session caches in certain cases.
>>>
>>> I have mixed feelings about not implementing this though. On the  
>>> one hand, if the user *does* have a large number of shares - and  
>>> I'm honestly not sure how large that number would have to be to  
>>> start beccoming an issue - then it seems a waste to carry around  
>>> potentially the user's entire share list in the session if all  
>>> that was ever needed were the first 9 shares. For example -  
>>> imagine I have 100 Ansel galleries. I log into Horde, and since  
>>> Ansel is my initial application, I load the first page of my  
>>> gallery list, I then move on to reading email and don't go back to  
>>> Ansel again during this session.  Without support for paging, this  
>>> would cause all 100 of my ansel shares to be carried around for  
>>> the life of my session...and from what you are saying this is  
>>> currently happening.
>
> No, it is indeed working as you expect it to. I didn't see it this  
> way as I assumed the listShare() parameters could vary a lot and you  
> might end up with many overlapping and redundant segments of the  
> same list. But you are probably right and in most cases the list  
> parameters will be quite regular.
>
>>>
>>> So I guess the question is: Is it more expensive to keep more  
>>> search result entries in the list cache - each with a smaller  
>>> subset of shares, or to keep a smaller number of search results,  
>>> but each with larger numbers of shares?
>>>
>>> I think the case where a user would load thousands of shares, then  
>>> sort them in multiple ways is a fringe case. More likely, a  
>>> smaller subset of those shares would be loaded at a time e.g.,  
>>> when paging through a gallery list.
>
> I agree.
>
>>>
>>> Another option is to ignore the paging/sorting parameters when  
>>> caching and hitting the DB, and apply to paging limits in PHP,  
>>> this has the disadvantage of having to apply the sort to the share  
>>> list each time, prior to applying the limits. Since these  
>>> parameters are not currently being passed to the DB anyway, this  
>>> is probably what we are already doing in the application code.
>
> For Kolab I will have to do the paging and sorting in PHP but of  
> course it makes sense to delegate the task to the backend in case it  
> supports it.
>
>>>
>>>>> Horde_Share_Base::listShares() is fully replaced within  
>>>>> Horde_Share_Sql and Horde_Share_Sql_Hierarchical.
>>>
>>> Doing it this way is probably historical cruft, left over from  
>>> when these drivers were written, and DT was the only supported  
>>> driver.
>>
>> No, this is exactly for the paging, because overwriting the default  
>> method in the base class (which should do the paging/sorting in PHP  
>> code) allows to implement optimized paging/sorting in the backend.
>> It's well possible that this doesn't work anymore, but I'm pretty  
>> sure this was the original idea behind it.
>
> Yes, I agree. I just need to repair the paging/sorting in  
> Horde_Share_Base::listShares() then. For drivers not overwriting  
> listShares but instead using _listShares the whole list of shares  
> gets stored in the _listcache though. That is something I will keep  
> that way as the backend returns the full list anyway. I could  
> imagine that one could also store the sorted/paged results in the  
> cache to avoid the cycles that sorting will take. But at the moment  
> only ansel uses this feature and there is only the SQL driver for  
> ansel so that wouldn't make much sense.
>
>>
>>>>> For Horde_Share_Kolab and Horde_Share_Datatree  
>>>>> Horde_Share_Base::listShares() is called which delegates to  
>>>>> Horde_Share_{Datatree,Kolab}::_listShares(). The sorting/paging  
>>>>> apparently should be handled within  
>>>>> Horde_Share_Base::listShares() for these drivers. The current  
>>>>> implementation is incomplete however.
>>>
>>> Now that DT is deprecated, if we decide to implement the paging in  
>>> the DB, we could probably just make Horde_Share_Base::listShares()  
>>> abstract. Otherwise, yeah, we could just implement it in  
>>> Horde_Share_Base::listShares()
>
> I don't mind either way. The current structure has the side effect  
> of forcing you to add the _listShares() method to the SQL driver  
> which is kind of useless. But if some other driver than SQL based  
> and Kolab would come along it might benefit from the current  
> structure.
>
>>>
>>>>> I don't mind completing the implementation if necessary but  
>>>>> there are some things that currently puzzle me. Assuming a user  
>>>>> has a few thousand shares and we would use the paging feature  
>>>>> for example: Wouldn't the current Horde_Share_Sql:listShares()  
>>>>> quickly fill the Horde_Share_Base::_listcache with plenty of  
>>>>> caching information that gets stored in the session? For every  
>>>>> sort setting it would be the same thing.
>>>
>>> This is assuming that all thousand shares would be returned in  
>>> each query. If we implement paging, it's likely only a small  
>>> subset of those shares would be returned for any one query - the  
>>> amount that would fit on a page. The exception to this is when  
>>> querying for things like select lists, but those would likely not  
>>> be requested with more than a single sort parameter.  Generally,  
>>> though, yes, each sort/page combmination would be cached separately.
>>>
>
> As mentioned above I see now that the current scheme makes sense.
>
>>>>> I also wonder a bit if that is actually information that should  
>>>>> go into the session at all.
>>>
>>> I know that in FW_3, at least at one time, it greatly improved  
>>> performance since share queries tend to be expensive, especially  
>>> on mysql. Jan could probably answer this part better than I, since  
>>> he has done some benchmarking recently.
>>
>> Yes, it improves performance a lot. And IIRC Michael has already  
>> reduced session storage a lot by implementing Serializable.
>> It might still offer enough benefit if this cached in a Horde_Cache  
>> backend instead of the session though. This would be separate from  
>> existing share caching though.
>
> Caching was no question. I was just wondering if it should be part  
> of the session. But if the list really does not get longer than the  
> share list then I'm not really worried (anymore).
>
>>
>>>>> On the other hand there has been a lot of work put into the  
>>>>> Share driver recently and as far as I understand it there is  
>>>>> more to come. I just wanted to get some feedback to understand  
>>>>> where this is heading and if I should ignore the topic for now.
>>>>
>>>> If we don't use some of the paging/sorting features and don't  
>>>> come up with any use case for it (and honestly, who is going to  
>>>> have such a huge load of shares?), I'm fine with removing them to  
>>>> reduce complexity.
>>>
>>> Honestly not 100% sure which way I'd prefer, but I think I'm  
>>> leaning towards having the DB honor the parameters, instead of  
>>> having to sort/page only in pure php.
>
>
> I would then start fixing the current situation soon. After all the  
> Kolab backend is the only that will currently use the PHP based  
> sorting/paging.
>
> Two additional questions though:
>
> 1) Why do we have a separate Sql and Sql_Hierarchical driver?  
> Wouldn't it make sense to collapse the two and always be  
> hierarchical? I'd have no problems supporting a tree structure with  
> the Kolab driver. In fact I'd like to do that.

One of the lessons for me from the DataTree library was that mixing  
trees in with storage *and* auth was a nightmare for scaling. I'd  
rather keep the tree structure out of the Share driver, and  
potentially even move Ansel to tags instead of explicitly nested  
galleries...

-chuck