[dev] Fix kronolith-agenda script

Thu Mar 29 14:14:23 UTC 2012

Zitat von Gonçalo Queirós <goncalo.queiros at portugalmail.net>:

> On 03/21/2012 11:40 AM, Gonçalo Queirós wrote:
>> Citando Jan Schneider <jan at horde.org>:
>>> Zitat von Gonçalo Queirós <goncalo.queiros at portugalmail.net>:
>>>> Hi there dev.
>>>>
>>>>   We were trying to use the kronolith-agenda script to send daily  
>>>> agendas to
>>>>   everyone on our service, but the problem is that we currently have more
>>>>   than 350.000 shares and the script just runs out of memory,  
>>>> even with 2Gb!
>>>>
>>>>   Looking at the script more closely, we think there's a way to make the
>>>>   script work, regardless of the number of shares on the system,  
>>>> but we would
>>>>   like your opinion on that before coming out with a patch.
>>>>
>>>>   Current script:
>>>>   1 - Get every calendar share
>>>>   2 - For every share, list its events, to check if it has any  
>>>> event to the
>>>>   current day
>>>  This is not correct, there is no such step
>> Sorry, was debugging on my own code ;-)
>>>
>>>> 3 - For the remaining list get all users that have access to the calendars
>>>>   4 - For every user, check if he desires to receive the daily  
>>>> agenda (pref)
>>>>   5 - For every user that wants to receive the daily agenda, get his
>>>>   calendars
>>>>   6 - Again for every calendar get the ones that have any event to the
>>>>   current day
>>>>   7 - Send the email if there's any calendar left
>>>>
>>>>   For our installation the current script stops on the first step, because
>>>>   it runs out of memory.
>>>>   We thought on creating sub-sets for the shares, but the problem  
>>>> is that we
>>>>   only know the full agenda of a user after we analyze all shares he has
>>>>   access to.
>>>>
>>>>   What we propose:
>>>>   1 - Get every users that desire to receive the daily agenda (pref)
>>>  That's exactly what steps 1, 3, and 4 do.
>> I know, the difference is that instead of asking for all the shares, we
>> could ask directly for all the users that wan't to receive an agenda, so
>> we could eventually narrow the search.
>> Also, knowing the user instead of the calendars allows us to immediately
>> send the user his agenda, which will free the memory when the loop for
>> that user ends.
>>>> 2 - execute the steps 5,6,7 of the original script
>>>>
>>>>   In the worst case scenario this script will still perform  
>>>> better than the
>>>>   current one, because it doesn't have the first 3 steps.
>>>>   With this approach we can create sub-sets of users which will allow the
>>>>   script to run until the end without running out of memory (even  
>>>> if this is
>>>>   a long process, it will execute)
>>>>
>>>>   Problems:
>>>>   We don't think there's currently a method to retrieve all prefs from the
>>>>   backed by its name. Maybe we need to create it, and state that  
>>>> this is for
>>>>   admin purposes only and shouldn't be called by the user-level code (just
>>>>   like the listAllShares method from Horde_Share_Sql).
>>>>   Currently the pref_name column is not indexed, so we expect  
>>>> slow queries.
>>>>   The fix for that is obvious.
>>>  The problem is that not all backends support listing preference  
>>> details for others than the current user. On the other hand, those  
>>> backends are already missing some features anyway, and we already  
>>> have a listScopes() method that is only implemented in a few  
>>> backends too. In the end this might be an option.
>>>  Actually, since the same script already requires the preference  
>>> backend to return any user's preference, this would be a safe  
>>> approach.
>>>
>>>  Another approach would be do take a further look at why  
>>> listAllShares() exceeds memory. Well, there is not much to look  
>>> actually when creating 350.000+ share objects and then attempting  
>>> to sort them. Alternatively we could implement an Iterator  
>>> interface in the share drivers and only receive the shares one by  
>>> one while looping them.
>>>
>>>  Jan.
>>>
>>>  --
>>>  The Horde Project
>>> http://www.horde.org/
>>>
>>>
>>>  --
>>>  Horde developers mailing list
>>>  Frequently Asked Questions: http://horde.org/faq/To unsubscribe,  
>>> mail: dev-unsubscribe at lists.horde.org
>> If you iterate the shares one by one i think you will end up with a
>> similar problem, because you can't send an agenda before you have all the
>> events from all the calendars that a user has access to. So you would
>> probably end up again with the 350.000+ shares on memory.
>>
>> Will try to produce a patch and submit for your appreciation.
>
> Jan, attached is a first preview patch, so you can see if things are  
> going in the desired direction.
> Returning a class that implements the ArrayAccess allows backward  
> compatibility, but unfortunately, any array_ like function  
> (array_merge, array_keys, etc) don't work, so a bit more of refactor  
> is needed.
>
> One thing that needs to be done is move the Horde_Share_List factory  
> to an injector (if I understand the injectors correctly), but for  
> now I instantiated the objects inside the Horde_Share_Sql directly.
>
> Another problem I found was that Horde_Shares can have callbacks for  
> listings, so for now I create the new Horde_Share_List object and if  
> the callback is active, that object is sent to the callback. This  
> might brake up for users that rely on array_ like functions, because  
> as stated above they wont work now.
>
> What do you devs think?

- You cannot implement ArrayAccess in the base class and Iterator in  
the sub-class only. Why do you implement ArrayAccess at all? If  
necessary, the Horde_Share_List consumers can use iterator_to_array().
- You cannot use PDO functionality in the SQL driver. You need to use  
the Horde_Db API. Horde_Db's select() already returns an Iterator that  
you can proxy in Horde_Share_List.
- You cannot use the injector or any other globals in library code.
- You only need share_id in the SQL driver's SELECT query.
- Using the iterator obviously scales better memory-wise, but did you  
test how it scales performance-wise. IIUC you are now reading each  
share individually from the backend which could be a performance  
degration.

Jan.

-- 
The Horde Project
http://www.horde.org/