[dev] [framework-patch] clean HTML

Francois Marier francois at nit.ca
Tue Aug 3 18:13:06 PDT 2004


There is a small glitch with the current _cleanHTML method:

If the data between <script> and </script> is not commented out, it
is displayed as plain text once we strip the tags out (by replacing
them with <HordeCleaned>).

(The same problem arises with <style>.)

This patch fixes the <script> problem by removing the two tags
together with everything between them (the whole element is replaced
with <HordeCleaned_script>).  It also fixes the <style> problem in the
same way when displaying HTML inline (in non-inline mode, the <style>
tags are preserved).
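As a rough illustration (not part of the patch), here is a minimal
standalone sketch of the difference.  The function names and the
sample input are made up for this example, and the "old" pattern is
only a simplified stand-in for what _cleanHTML currently does; the two
removal patterns are the ones used in the patch.

```php
<?php
// Hypothetical helper: simplified stand-in for the current behaviour,
// where only the tags are replaced and the text between them survives.
function stripScriptTagsOnly($data)
{
    return preg_replace('|</?script[^>]*>|i', '<HordeCleaned>', $data);
}

// Hypothetical helper: applies the two patterns from the patch.
// $attachment mirrors the flag of the same name used in the diff.
function removeScriptAndStyle($data, $attachment = false)
{
    // Replace each <script>...</script> element, contents included.
    $data = preg_replace('|<[^>/]*s\s*c\s*r\s*i\s*p\s*t[^>]*>(.)*?<[^>/]*/\s*s\s*c\s*r\s*i\s*p\s*t[^>]*>|is', '<HordeCleaned_script>', $data);

    // Styles are only removed when displaying inline.
    if (!$attachment) {
        $data = preg_replace('|<[^>/]*style[^>]*>(.)*?<[^>/]*/style[^>]*>|is', '<HordeCleaned_style>', $data);
    }
    return $data;
}

$html = "<style>p { color: red }</style><p>Hi</p><script>\nalert(1);\n</script>";

// The script body is still present and would show up as text.
$old = stripScriptTagsOnly($html);

// Inline: <HordeCleaned_style><p>Hi</p><HordeCleaned_script>
$new = removeScriptAndStyle($html);

// Non-inline: the <style> element is kept, the script is still removed.
$asAttachment = removeScriptAndStyle($html, true);

echo $old, "\n", $new, "\n", $asAttachment, "\n";
```

Because the browser ignores the unknown <HordeCleaned> tags, the first
result renders "alert(1);" in the message body, which is the glitch
described above.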

I also added a line that strips out all HTML comments (including any
scripts and styles wrapped in them) if we are displaying inline.
Since we cannot allow either scripts or styles, there is no point in
sending this data to the browser.
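A small sketch of the comment stripping, again only for illustration:
the helper name and the sample input are invented here, and the
pattern is written with the /s modifier, which should match the same
text as the one in the patch.

```php
<?php
// Hypothetical helper, not part of the Horde framework.
// Comments are only stripped when the HTML is displayed inline.
function stripHtmlComments($data, $attachment = false)
{
    if ($attachment) {
        return $data;
    }
    // Non-greedy match from each "<!--" to the nearest "-->",
    // across line breaks.
    return preg_replace('/<!--.*?-->/s', '', $data);
}

$html = "<p>Hi</p><script><!--\nalert(1);\n//--></script><!-- note -->";

// Inline: <p>Hi</p><script></script>
$inline = stripHtmlComments($html);

// Non-inline: returned unchanged.
$kept = stripHtmlComments($html, true);

echo $inline, "\n";
```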

Francois
-------------- next part --------------
diff -rpuN -X ../ignorelist ../build/framework/MIME/MIME/Viewer/html.php framework/MIME/MIME/Viewer/html.php
--- ../build/framework/MIME/MIME/Viewer/html.php	Wed Jul 14 07:30:27 2004
+++ framework/MIME/MIME/Viewer/html.php	Tue Aug  3 20:59:45 2004
@@ -68,6 +68,12 @@ class MIME_Viewer_html extends MIME_View
             }
         }
 
+        /* Removes HTML comments (including some scripts & styles) 
+         * if displaying inline */
+        if (!$attachment) {
+            $data = preg_replace('/<!--.*?-->/s', '', $data);
+        }
+
         /* Change space entities to space characters. */
         $data = preg_replace('/&#(x0*20|0*32);?/i', ' ', $data);
 
@@ -123,6 +129,16 @@ class MIME_Viewer_html extends MIME_View
 
         /* Get all on<foo>="bar()". NEVER allow these. */
         $data = preg_replace('/(\s+[Oo][Nn]\w+)\s*=/', '\1HordeCleaned=', $data);
+
+        /* Remove all scripts since they might introduce garbage if they 
+         * are not quoted properly */
+        $data = preg_replace('|<[^>/]*s\s*c\s*r\s*i\s*p\s*t[^>]*>(.)*?<[^>/]*/\s*s\s*c\s*r\s*i\s*p\s*t[^>]*>|is', '<HordeCleaned_script>', $data);
+
+        /* Remove all styles since they might introduce garbage if they 
+         * are not quoted properly and we are displaying inline */
+        if (!$attachment) {
+            $data = preg_replace('|<[^>/]*style[^>]*>(.)*?<[^>/]*/style[^>]*>|is', '<HordeCleaned_style>', $data);
+        }
 
         /* Get all tags that might cause trouble - <object>, <embed>,
          * <base>, etc. Meta refreshes and iframes, too. */

