[dev] IMP and html2text filter called from lib/MIME/Viewer/plain.php
Chuck Hagenbuch
chuck at horde.org
Sat Nov 17 17:08:07 UTC 2007
Quoting Chris Stromsoe <cbs at cts.ucla.edu>:
> The lines are long because the text file is DNA sequences (a total
> of 43 "words" in the 400k attachment). The way that the regexp for
> framework/Text_Filter/Filter/linkurls.php in getPatterns() is written,
>
> |([\w+]+)://([^\s"<]*[\w+#?/&=])|e
>
> it ends up chewing memory and cpu looking for something valid. An
> alternate fix would be changing to something like
>
> /(^|\s)(ftp|http|mailto|news):\/\/([^\s"<]*[\w+#?\/&=])/e
>
> using a white-listed set of prefixes that either start a line or
> have preceding whitespace.
>
> I can create a ticket, or a ticket + patch for linkurls.php to
> whitelist URL prefixes if that would work better.

Whitelisting makes sense in some situations, such as security, but here
it would just lead to a semi-endless list of requests for new items, like
svn+ssh, nextthing+tls, etc.

What about putting a limit on the length of the protocol instead, one
that is generous in theory but still low enough to help performance?
Something like:

|([\w+]{1,20})://([^\s"<]*[\w+#?/&=])|e

(so, between 1 and 20 characters for the protocol string). The limit could
probably be a bit lower if that makes a big difference.

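
For anyone who wants to try the two patterns side by side, here is a
rough sketch. It is not the linkurls.php code: it uses Python's re module
as a stand-in for PCRE (an assumption, though both are backtracking
engines), drops the /e modifier since that only affects how PHP evaluates
the replacement, and the sample text and the find_urls helper are made up
for illustration. The reason the cap should help: on a long run of word
characters with no "://" in it, the unbounded [\w+]+ swallows the rest of
the run at every start position and then backs off one character at a
time, which is roughly quadratic in the run length, while {1,20} looks at
no more than 20 characters per start position.

```python
import re

# Python's re stands in for PCRE here (assumption); /e is omitted because
# it only changes how PHP treats the replacement string.
UNBOUNDED = re.compile(r'([\w+]+)://([^\s"<]*[\w+#?/&=])')
BOUNDED = re.compile(r'([\w+]{1,20})://([^\s"<]*[\w+#?/&=])')


def find_urls(pattern, text):
    """Return (protocol, rest) pairs for every match of pattern in text."""
    return [m.groups() for m in pattern.finditer(text)]


# Hypothetical sample: two ordinary links followed by one long "word",
# kept short enough that the unbounded pattern also finishes quickly.
sample = "see http://example.org/a?b=1 and svn+ssh://host/repo " + "ACGT" * 500

unbounded_hits = find_urls(UNBOUNDED, sample)
bounded_hits = find_urls(BOUNDED, sample)

# A protocol-like run longer than the cap still produces a match; the
# capture simply starts later, so only the last 20 characters are kept.
long_scheme = "x" * 25 + "://host"
capped_protocol = BOUNDED.search(long_scheme).group(1)

print(unbounded_hits)
print(bounded_hits)
print(len(capped_protocol))
```

On ordinary links the two patterns should return the same pairs, including
svn+ssh, so nothing needs whitelisting. One thing to be aware of: the cap
does not reject an over-long protocol, it just links the trailing 20
characters of it.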
-chuck