[dev] Fix kronolith-agenda script

Gonçalo Queirós goncalo.queiros at portugalmail.net
Mon Apr 2 11:10:27 UTC 2012


Citando Jan Schneider <jan at horde.org>:
> Zitat von Gonçalo Queirós <goncalo.queiros at portugalmail.net>:
>> On 03/21/2012 11:40 AM, Gonçalo Queirós wrote:   > Citando Jan  
>> Schneider <jan at horde.org>:    > Zitat von Gonçalo Queirós  
>> <goncalo.queiros at portugalmail.net>:     > Hi there dev.
>>>>>
>>>>>       We were trying to use the kronolith-agenda script to send  
>>>>> daily agendas to
>>>>>       everyone on our service, but the problem is that we  
>>>>> currently have more
>>>>>       than 350.000 shares and the script just runs out of  
>>>>> memory, even with 2Gb!
>>>>>
>>>>>       Looking at the script more closely, we think there's a way  
>>>>> to make the
>>>>>       script work, regardless of the number of shares on the  
>>>>> system, but we would
>>>>>       like your opinion on that before coming out with a patch.
>>>>>
>>>>>       Current script:
>>>>>       1 - Get every calendar share
>>>>>       2 - For every share, list its events, to check if it has  
>>>>> any event to the
>>>>>       current day
>>>>      This is not correct, there is no such step
>>>     Sorry, was debugging on my own code ;-)    >> 3 - For the  
>>> remaining list get all users that have access to the calendars
>>>>>       4 - For every user, check if he desires to receive the  
>>>>> daily agenda (pref)
>>>>>       5 - For every user that wants to receive the daily agenda, get his
>>>>>       calendars
>>>>>       6 - Again for every calendar get the ones that have any  
>>>>> event to the
>>>>>       current day
>>>>>       7 - Send the email if there's any calendar left
>>>>>
>>>>>       For our installation the current script stops on the first  
>>>>> step, because
>>>>>       it runs out of memory.
>>>>>       We thought on creating sub-sets for the shares, but the  
>>>>> problem is that we
>>>>>       only know the full agenda of a user after we analyze all  
>>>>> shares he has
>>>>>       access to.
>>>>>
>>>>>       What we propose:
>>>>>       1 - Get every users that desire to receive the daily agenda (pref)
>>>>      That's exactly what steps 1, 3, and 4 do.
>>>     I know, the difference is that instead of asking for all the shares, we
>>>     could ask directly for all the users that wan't to receive an  
>>> agenda, so
>>>     we could eventually narrow the search.
>>>     Also, knowing the user instead of the calendars allows us to  
>>> immediately
>>>     send the user his agenda, which will free the memory when the loop for
>>>     that user ends.    >> 2 - execute the steps 5,6,7 of the  
>>> original script
>>>>>
>>>>>       In the worst case scenario this script will still perform  
>>>>> better than the
>>>>>       current one, because it doesn't have the first 3 steps.
>>>>>       With this approach we can create sub-sets of users which  
>>>>> will allow the
>>>>>       script to run until the end without running out of memory  
>>>>> (even if this is
>>>>>       a long process, it will execute)
>>>>>
>>>>>       Problems:
>>>>>       We don't think there's currently a method to retrieve all  
>>>>> prefs from the
>>>>>       backed by its name. Maybe we need to create it, and state  
>>>>> that this is for
>>>>>       admin purposes only and shouldn't be called by the  
>>>>> user-level code (just
>>>>>       like the listAllShares method from Horde_Share_Sql).
>>>>>       Currently the pref_name column is not indexed, so we  
>>>>> expect slow queries.
>>>>>       The fix for that is obvious.
>>>>      The problem is that not all backends support listing  
>>>> preference details for others than the current user. On the other  
>>>> hand, those backends are already missing some features anyway,  
>>>> and we already have a listScopes() method that is only  
>>>> implemented in a few backends too. In the end this might be an  
>>>> option.
>>>>      Actually, since the same script already requires the  
>>>> preference backend to return any user's preference, this would be  
>>>> a safe approach.
>>>>
>>>>      Another approach would be do take a further look at why  
>>>> listAllShares() exceeds memory. Well, there is not much to look  
>>>> actually when creating 350.000+ share objects and then attempting  
>>>> to sort them. Alternatively we could implement an Iterator  
>>>> interface in the share drivers and only receive the shares one by  
>>>> one while looping them.
>>>>
>>>>      Jan.
>>>>
>>>>      --
>>>>      The Horde Project
>>>> http://www.horde.org/
>>>>
>>>>
>>>>      --
>>>>      Horde developers mailing list
>>>>      Frequently Asked Questions: http://horde.org/faq/To  
>>>> unsubscribe, mail: dev-unsubscribe at lists.horde.org
>>>     If you iterate the shares one by one i think you will end up with a
>>>     similar problem, because you can't send an agenda before you  
>>> have all the
>>>     events from all the calendars that a user has access to. So you would
>>>     probably end up again with the 350.000+ shares on memory.
>>>
>>>     Will try to produce a patch and submit for your appreciation.
>>    Jan, attached is a first preview patch, so you can see if things  
>> are going in the desired direction.
>>    Returning a class that implements the ArrayAccess allows  
>> backward compatibility, but unfortunately, any array_ like function  
>> (array_merge, array_keys, etc) don't work, so a bit more of  
>> refactor is needed.
>>
>>    One thing that needs to be done is move the Horde_Share_List  
>> factory to an injector (if I understand the injectors correctly),  
>> but for now I instantiated the objects inside the Horde_Share_Sql  
>> directly.
>>
>>    Another problem I found was that Horde_Shares can have callbacks  
>> for listings, so for now I create the new Horde_Share_List object  
>> and if the callback is active, that object is sent to the callback.  
>> This might brake up for users that rely on array_ like functions,  
>> because as stated above they wont work now.
>>
>>    What do you devs think?
>   - You cannot implement ArrayAccess in the base class and Iterator  
> in the sub-class only. Why do you implement ArrayAccess at all? If  
> necessary, the Horde_Share_List consumers can use iterator_to_array().
If i remember correclty, you sugested the ArrayAccess for a matter of
backward compatibility.
>
>   - You cannot use PDO functionality in the SQL driver. You need to  
> use the Horde_Db API. Horde_Db's select() already returns an  
> Iterator that you can proxy in Horde_Share_List.
The Horde_Db api states that it returns an iterator, but it actually
returns a PDO statement, that's why i need to use PDO functionality.
>
>   - You cannot use the injector or any other globals in library code.
Ok. >
>   - You only need share_id in the SQL driver's SELECT query.
I also need the name (because of the key), but i understand your point.
>
>   - Using the iterator obviously scales better memory-wise, but did  
> you test how it scales performance-wise. IIUC you are now reading  
> each share individually from the backend which could be a  
> performance degration.

I haven't tested this performance-wise, but i don't doubt it will run
slower than the current approach. The difference is that the current
approach will stop working when you get to a certain amount of shares. I
will test this a bit further so we can have a better knowledge of the real
impact in performance as soon as i have a stable patch ;-)

    Thanks,

    Gonçalo Queirós


More information about the dev mailing list