[dev] [framework-patch] clean HTML

Jan Schneider jan at horde.org
Fri Aug 6 05:30:46 PDT 2004


Quoting Francois Marier <francois at nit.ca>:

> There is a small glitch with the current _cleanHTML method:
>
> If the data between <script> and </script> is not commented out, it
> will be displayed when we strip the tags out (by replacing them with
> <HordeCleaned>).

True, this doesn't look nice.

> (The same problem arises with <style>)

No. Style tags get commented out in the part from line 185 on.

> This patch fixes the <script> problem by removing what's between the
> two tags.  It also fixes the <style> problem when displaying HTML
> inline (in non-inline mode, the <style> tags are preserved).

I don't like this approach. The style tag changes are not necessary anyway,
and the script regexps are too weak to catch all cases and too strong to
catch common cases. This is a cosmetic issue, so we don't want to catch
each obfuscated version of script tags here. Just take the style cleanup as
an example and model the script regexps after that.

> Furthermore, I also added a line that strips out all HTML comments
> (including scripts and styles) if we are displaying inline.  Since we
> cannot allow either script or styles, there is no point in sending
> this data to the browser.

I'm not sure if I want to trade the additional page size for the additional
CPU cycles, but I may be convinced. At least you shouldn't need to look for
whitespace characters; the full stop already matches them. If you intended to
catch newlines, use the DOTALL modifier /s instead.
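
To illustrate the point about the full stop and DOTALL, here is a minimal
sketch. It is written in Python rather than PHP, on the assumption that
Python's re.DOTALL flag behaves like the PCRE /s modifier for this purpose.
The sample HTML and the comment-stripping pattern are made up for the example
and are not taken from the patch.

```python
import re

# Hypothetical sample input: an HTML comment that spans two lines.
html = "<p>before</p><!-- first line\nsecond line --><p>after</p>"

# Illustrative pattern only; not the regexp from the patch under review.
comment = r"<!--.*?-->"

# Without DOTALL, "." stops at the newline, so the multi-line comment survives.
no_dotall = re.sub(comment, "", html)

# With DOTALL (the counterpart of /s), "." also matches "\n".
with_dotall = re.sub(comment, "", html, flags=re.DOTALL)

# "." already matches spaces and tabs, so no extra whitespace class is needed
# when the comment sits on a single line.
single_line = re.sub(comment, "", "a<!-- x\ty -->b")

print(no_dotall == html)
print(with_dotall)
print(single_line)
```

The non-greedy ".*?" keeps each match from running past the first closing
"-->", which matters once the dot is allowed to cross line boundaries.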

Jan.

--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting.php

