[dev] IMP and html2text filter called from lib/MIME/Viewer/plain.php
    Chuck Hagenbuch 
    chuck at horde.org
       
    Sat Nov 17 17:08:07 UTC 2007
    
    
  
Quoting Chris Stromsoe <cbs at cts.ucla.edu>:
> The lines are long because the text file is DNA sequences (a total  
> of 43 "words" in the 400k attachment).  The way that the regexp for  
> framework/Text_Filter/Filter/linkurls.php in getPatterns() is written,
>
>     |([\w+]+)://([^\s"<]*[\w+#?/&=])|e
>
> it ends up chewing memory and cpu looking for something valid.  An  
> alternate fix would be changing to something like
>
>     /(^|\s)(ftp|http|mailto|news):\/\/([^\s"<]*[\w+#?\/&=])/e
>
> using a white-listed set of prefixes that either start a line or  
> have preceding whitespace.
>
> I can create a ticket, or a ticket + patch for linkurls.php to  
> whitelist URL prefixes if that would work better.
Whitelisting makes sense in some situations, such as security, but here  
it'd just lead to a near-endless stream of requests for new items, like  
svn+ssh, nextthing+tls, etc.
What about capping the length of the protocol instead? The cap can be  
generous and still low enough to help performance. Something like:
     |([\w+]{1,20})://([^\s"<]*[\w+#?/&=])|e
(so, between 1 and 20 characters for the protocol string). The upper  
bound could probably be lowered if that makes a big difference.
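
A rough sketch of how the two patterns compare, written with Python's re
module as a stand-in for PHP's preg functions (an assumption made purely
for illustration; the /e modifier has no Python equivalent and is left
out, and the find_urls helper and sample strings are made up):

```python
import re

# Pattern as currently written: the protocol part has no length limit.
UNBOUNDED = re.compile(r'([\w+]+)://([^\s"<]*[\w+#?/&=])')

# Proposed variant: the protocol part is limited to 1-20 characters.
BOUNDED = re.compile(r'([\w+]{1,20})://([^\s"<]*[\w+#?/&=])')


def find_urls(pattern, text):
    """Return a (protocol, rest) tuple for every match of pattern in text."""
    return [m.groups() for m in pattern.finditer(text)]


# Ordinary text: both patterns find the same links, including svn+ssh.
sample = 'see http://www.horde.org/ and svn+ssh://example.org/repo for details'
urls_unbounded = find_urls(UNBOUNDED, sample)
urls_bounded = find_urls(BOUNDED, sample)

# One very long "word" with no "://" in it, similar to a DNA sequence line.
# With the bounded pattern each start position tries at most 20 characters
# before giving up, instead of running to the end of the word.
dna = 'ACGT' * 2500
dna_match = BOUNDED.search(dna)

# The limit does not reject long prefixes outright: the pattern is not
# anchored, so the last 20 characters before "://" become the protocol.
long_prefix = find_urls(BOUNDED, 'x' * 30 + '://host')
```

Note the last case: if over-long protocols should not be linked at all,
the pattern would also need an anchor or lookbehind in front of the
protocol group, which this sketch does not attempt.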
-chuck
    
    