[dev] IMP and html2text filter called from lib/MIME/Viewer/plain.php

Chuck Hagenbuch chuck at horde.org
Sat Nov 17 17:08:07 UTC 2007


Quoting Chris Stromsoe <cbs at cts.ucla.edu>:

> The lines are long because the text file is DNA sequences (a total  
> of 43 "words" in the 400k attachment).  The way that the regexp for  
> framework/Text_Filter/Filter/linkurls.php in getPatterns() is written,
>
>     |([\w+]+)://([^\s"<]*[\w+#?/&=])|e
>
> it ends up chewing memory and cpu looking for something valid.  An  
> alternate fix would be changing to something like
>
>     /(^|\s)(ftp|http|mailto|news):\/\/([^\s"<]*[\w+#?\/&=])/e
>
> using a white-listed set of prefixes that either start a line or  
> have preceding whitespace.
>
> I can create a ticket, or a ticket + patch for linkurls.php to  
> whitelist URL prefixes if that would work better.

Whitelisting makes sense in some situations like security, but here  
it'd just lead to a semi-endless list of requests for new items, like  
svn+ssh, nextthing+tls.... etc.
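
To make that concrete, here is a rough Python sketch (not the Text_Filter code itself) that ports the two patterns quoted above, dropping the PHP delimiters and the /e modifier. The sample string and the helper name schemes_found are made up for illustration.

```python
import re

# Python ports of the two patterns from the thread; PHP delimiters and
# the /e modifier are dropped because they have no Python equivalent.
OPEN_SCHEME = re.compile(r'([\w+]+)://([^\s"<]*[\w+#?/&=])')
WHITELIST = re.compile(r'(^|\s)(ftp|http|mailto|news)://([^\s"<]*[\w+#?/&=])')

def schemes_found(pattern, text, group):
    """Return the scheme captured by `group` for every match in `text`."""
    return [m.group(group) for m in pattern.finditer(text)]

# Hypothetical sample text containing one common and one compound scheme.
sample = "see http://horde.org/ and svn+ssh://example.org/repo"

open_hits = schemes_found(OPEN_SCHEME, sample, 1)   # scheme is group 1
white_hits = schemes_found(WHITELIST, sample, 2)    # scheme is group 2
print(open_hits, white_hits)
```

The open pattern picks up both schemes, while the whitelist only sees http, so every compound scheme like svn+ssh would need its own entry.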

What about capping the length of the protocol at a limit that is high  
in theory but still low enough to help performance? Something like:

     |([\w+]{1,20})://([^\s"<]*[\w+#?/&=])|e

(so, between 1 and 20 characters for the protocol string). It could  
probably be a bit less if it makes a big difference.
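
A minimal Python sketch of the idea, again a port of the patterns rather than the PHP code. It assumes the cost comes from the unbounded [\w+]+ being retried from every offset of a long word that never contains "://". The 5000-character test word and the helper names are hypothetical, and the timings will vary by regex engine.

```python
import re
import time

# Unbounded pattern from linkurls.php and the proposed capped variant,
# ported to Python syntax (no delimiters, no /e modifier).
UNBOUNDED = re.compile(r'([\w+]+)://([^\s"<]*[\w+#?/&=])')
BOUNDED = re.compile(r'([\w+]{1,20})://([^\s"<]*[\w+#?/&=])')

def first_scheme(pattern, text):
    """Return the scheme of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

def time_search(pattern, text):
    """Return the wall-clock seconds taken by one search over `text`."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start

# Hypothetical stand-in for one long DNA "word" with no URL in it.
long_word = "ACGT" * 1250

unbounded_s = time_search(UNBOUNDED, long_word)
bounded_s = time_search(BOUNDED, long_word)
print(f"unbounded: {unbounded_s:.4f}s  bounded: {bounded_s:.4f}s")

# Compound schemes still match without any whitelist entry.
print(first_scheme(BOUNDED, "svn+ssh://example.org/repo"))
```

One thing to keep in mind: because the pattern is not anchored, the cap does not reject a protocol longer than 20 characters. It just matches the last 20 characters before the "://".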

-chuck

