[dev] Fix kronolith-agenda script

Jan Schneider jan at horde.org
Mon Apr 2 21:40:35 UTC 2012


Zitat von Gonçalo Queirós <goncalo.queiros at portugalmail.net>:

> Citando Jan Schneider <jan at horde.org>:
>> Zitat von Gonçalo Queirós <goncalo.queiros at portugalmail.net>:
>>> On 03/21/2012 11:40 AM, Gonçalo Queirós wrote:   > Citando Jan  
>>> Schneider <jan at horde.org>:    > Zitat von Gonçalo Queirós  
>>> <goncalo.queiros at portugalmail.net>:     > Hi there dev.
>>>>>>
>>>>>>      We were trying to use the kronolith-agenda script to send  
>>>>>> daily agendas to
>>>>>>      everyone on our service, but the problem is that we  
>>>>>> currently have more
>>>>>>      than 350.000 shares and the script just runs out of  
>>>>>> memory, even with 2Gb!
>>>>>>
>>>>>>      Looking at the script more closely, we think there's a way  
>>>>>> to make the
>>>>>>      script work, regardless of the number of shares on the  
>>>>>> system, but we would
>>>>>>      like your opinion on that before coming out with a patch.
>>>>>>
>>>>>>      Current script:
>>>>>>      1 - Get every calendar share
>>>>>>      2 - For every share, list its events, to check if it has  
>>>>>> any event to the
>>>>>>      current day
>>>>>     This is not correct, there is no such step
>>>>    Sorry, was debugging on my own code ;-)    >> 3 - For the  
>>>> remaining list get all users that have access to the calendars
>>>>>>      4 - For every user, check if he desires to receive the  
>>>>>> daily agenda (pref)
>>>>>>      5 - For every user that wants to receive the daily agenda, get his
>>>>>>      calendars
>>>>>>      6 - Again for every calendar get the ones that have any  
>>>>>> event to the
>>>>>>      current day
>>>>>>      7 - Send the email if there's any calendar left
>>>>>>
>>>>>>      For our installation the current script stops on the first  
>>>>>> step, because
>>>>>>      it runs out of memory.
>>>>>>      We thought on creating sub-sets for the shares, but the  
>>>>>> problem is that we
>>>>>>      only know the full agenda of a user after we analyze all  
>>>>>> shares he has
>>>>>>      access to.
>>>>>>
>>>>>>      What we propose:
>>>>>>      1 - Get every users that desire to receive the daily agenda (pref)
>>>>>     That's exactly what steps 1, 3, and 4 do.
>>>>    I know, the difference is that instead of asking for all the shares, we
>>>>    could ask directly for all the users that wan't to receive an  
>>>> agenda, so
>>>>    we could eventually narrow the search.
>>>>    Also, knowing the user instead of the calendars allows us to  
>>>> immediately
>>>>    send the user his agenda, which will free the memory when the loop for
>>>>    that user ends.    >> 2 - execute the steps 5,6,7 of the  
>>>> original script
>>>>>>
>>>>>>      In the worst case scenario this script will still perform  
>>>>>> better than the
>>>>>>      current one, because it doesn't have the first 3 steps.
>>>>>>      With this approach we can create sub-sets of users which  
>>>>>> will allow the
>>>>>>      script to run until the end without running out of memory  
>>>>>> (even if this is
>>>>>>      a long process, it will execute)
>>>>>>
>>>>>>      Problems:
>>>>>>      We don't think there's currently a method to retrieve all  
>>>>>> prefs from the
>>>>>>      backed by its name. Maybe we need to create it, and state  
>>>>>> that this is for
>>>>>>      admin purposes only and shouldn't be called by the  
>>>>>> user-level code (just
>>>>>>      like the listAllShares method from Horde_Share_Sql).
>>>>>>      Currently the pref_name column is not indexed, so we  
>>>>>> expect slow queries.
>>>>>>      The fix for that is obvious.
>>>>>     The problem is that not all backends support listing  
>>>>> preference details for others than the current user. On the  
>>>>> other hand, those backends are already missing some features  
>>>>> anyway, and we already have a listScopes() method that is only  
>>>>> implemented in a few backends too. In the end this might be an  
>>>>> option.
>>>>>     Actually, since the same script already requires the  
>>>>> preference backend to return any user's preference, this would  
>>>>> be a safe approach.
>>>>>
>>>>>     Another approach would be do take a further look at why  
>>>>> listAllShares() exceeds memory. Well, there is not much to look  
>>>>> actually when creating 350.000+ share objects and then  
>>>>> attempting to sort them. Alternatively we could implement an  
>>>>> Iterator interface in the share drivers and only receive the  
>>>>> shares one by one while looping them.
>>>>>
>>>>>     Jan.
>>>>>
>>>>>     --
>>>>>     The Horde Project
>>>>> http://www.horde.org/
>>>>>
>>>>>
>>>>>     --
>>>>>     Horde developers mailing list
>>>>>     Frequently Asked Questions: http://horde.org/faq/To  
>>>>> unsubscribe, mail: dev-unsubscribe at lists.horde.org
>>>>    If you iterate the shares one by one i think you will end up with a
>>>>    similar problem, because you can't send an agenda before you  
>>>> have all the
>>>>    events from all the calendars that a user has access to. So you would
>>>>    probably end up again with the 350.000+ shares on memory.
>>>>
>>>>    Will try to produce a patch and submit for your appreciation.
>>>   Jan, attached is a first preview patch, so you can see if things  
>>> are going in the desired direction.
>>>   Returning a class that implements the ArrayAccess allows  
>>> backward compatibility, but unfortunately, any array_ like  
>>> function (array_merge, array_keys, etc) don't work, so a bit more  
>>> of refactor is needed.
>>>
>>>   One thing that needs to be done is move the Horde_Share_List  
>>> factory to an injector (if I understand the injectors correctly),  
>>> but for now I instantiated the objects inside the Horde_Share_Sql  
>>> directly.
>>>
>>>   Another problem I found was that Horde_Shares can have callbacks  
>>> for listings, so for now I create the new Horde_Share_List object  
>>> and if the callback is active, that object is sent to the  
>>> callback. This might brake up for users that rely on array_ like  
>>> functions, because as stated above they wont work now.
>>>
>>>   What do you devs think?
>>  - You cannot implement ArrayAccess in the base class and Iterator  
>> in the sub-class only. Why do you implement ArrayAccess at all? If  
>> necessary, the Horde_Share_List consumers can use  
>> iterator_to_array().
> If i remember correclty, you sugested the ArrayAccess for a matter of
> backward compatibility.

Take a look at how the share list results are consumed, i.e. whether  
there is array access necessary at all. This was just a suggestion to  
ease the transition with keeping the API as close as possible to the  
older version. If we only loop over the results anyway, you only need  
to implement the iterator.
What I said about the base and child class still holds though.

>>  - You cannot use PDO functionality in the SQL driver. You need to  
>> use the Horde_Db API. Horde_Db's select() already returns an  
>> Iterator that you can proxy in Horde_Share_List.
> The Horde_Db api states that it returns an iterator, but it actually
> returns a PDO statement, that's why i need to use PDO functionality.

It depends on the driver what's being returned, and you must use  
select(), not execute(). The point is, whatever is returned by  
select(), even PDO statements, *are* iterators. They implement  
Traversable to be more specific.

>>  - You cannot use the injector or any other globals in library code.
> Ok. >
>>  - You only need share_id in the SQL driver's SELECT query.
> I also need the name (because of the key), but i understand your point.
>>
>>  - Using the iterator obviously scales better memory-wise, but did  
>> you test how it scales performance-wise. IIUC you are now reading  
>> each share individually from the backend which could be a  
>> performance degration.
>
> I haven't tested this performance-wise, but i don't doubt it will run
> slower than the current approach. The difference is that the current
> approach will stop working when you get to a certain amount of shares. I
> will test this a bit further so we can have a better knowledge of the real
> impact in performance as soon as i have a stable patch ;-)

This will be a key point. I rather accept that the agenda script  
simply doesn't work with a large number of shares, than losing any  
speed. The sqlng share driver is highly optimized and tuned for high  
performance even with a huge number of shares.
If we cannot have both, we can still implement a separate method  
that's only used by the agenda script or any other cli script that can  
run longer.

-- 
Jan Schneider
The Horde Project
http://www.horde.org/




More information about the dev mailing list