[Tickets #9567] Re: charset pb replying to message

bugs at horde.org
Wed Mar 16 16:41:57 UTC 2011


Ticket URL: http://bugs.horde.org/ticket/9567
  Ticket             | 9567
  Updated By         | rsalmon at mbpgroup.com
  Summary            | charset pb replying to message
  Queue              | IMP
  Version            | Git master
  Type               | Bug
  State              | Feedback
  Priority           | 1. Low
  Milestone          |
  Patch              |
  Owners             | Michael Slusarz

rsalmon at mbpgroup.com (2011-03-16 16:41) wrote:

>> --- Xss.php.org	2011-03-15 10:41:22.000000000 +0100
>> +++ Xss.php	2011-03-15 10:41:24.000000000 +0100
>> -        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
>> +        return $text;
> This isn't correct.  Xss filter needs to return text in whatever  
> charset it was provided in, which is why the convertCharset() call  
> is necessary.  The question is why $dom->encoding is 'ISO-8859-1'  
> for you and 'UTF-8' for *everybody* else.

If we assume that the information at http://devzone.zend.com/article/8855  
is correct, specifically section 5: "DOMDocument::saveXML($node) method is  
always performed in UTF-8"

Then, no matter what $doc->encoding is set to, the following code will  
*always* return a UTF-8 encoded string:
  if ($body && $body->hasChildNodes()) {
      foreach ($body->childNodes as $child) {
          $text .= $dom->dom->saveXML($child);
      }
  }
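To check the section 5 claim independently, here is a minimal standalone sketch. It assumes ext/dom is available and uses a plain DOMDocument named $doc in place of Horde's $dom->dom wrapper; the sample markup and variable names are made up for illustration. It serializes the <body> children the same way as the loop above and checks whether the result is valid UTF-8, even though the markup declares ISO-8859-1.

```php
<?php
// Hypothetical sample markup declaring ISO-8859-1;
// "\xE9" is 'é' encoded in ISO-8859-1.
$html = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=ISO-8859-1"></head>'
      . "<body><p>caf\xE9</p></body></html>";

libxml_use_internal_errors(true);   // keep libxml parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);

// Same loop as in the Xss filter: concatenate the serialized <body> children.
$text = '';
$body = $doc->getElementsByTagName('body')->item(0);
if ($body && $body->hasChildNodes()) {
    foreach ($body->childNodes as $child) {
        $text .= $doc->saveXML($child);
    }
}

// preg_match() with the /u modifier fails on invalid UTF-8, so this
// prints "UTF-8: yes" only if the serialized fragment is valid UTF-8.
echo 'UTF-8: ', (preg_match('//u', $text) === 1 ? 'yes' : 'no'), "\n";
echo $text, "\n";
```

If the fragment came back in the declared ISO-8859-1, the lone 0xE9 byte would make the UTF-8 check fail.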

So I think Horde_Text_Filter_Xss::postProcess should be patched like this:

-        return Horde_String::convertCharset($text, $dom->encoding,
+        return Horde_String::convertCharset($text, 'UTF-8',
             $this->_params['charset']);
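To see why the source charset matters, here is a small sketch comparing the two variants of the return statement. It rests on assumptions: PHP's iconv() stands in for Horde_String::convertCharset(), the helper name is hypothetical, and the scenario assumes $dom->encoding reports 'ISO-8859-1' while the target charset ($this->_params['charset']) is UTF-8. Under those assumptions the current code re-encodes text that is already UTF-8, while the patched version leaves it intact.

```php
<?php
// Hypothetical helper standing in for Horde_String::convertCharset();
// it uses PHP's iconv() extension instead.
function convertCharsetSketch(string $text, string $from, string $to): string
{
    $out = iconv($from, $to, $text);
    if ($out === false) {
        throw new RuntimeException("iconv could not convert $from -> $to");
    }
    return $out;
}

// "café" as DOMDocument::saveXML($node) returns it: UTF-8 bytes.
$fromSaveXml = "caf\xC3\xA9";

// Assumed scenario: $dom->encoding reports 'ISO-8859-1' and the
// filter's target charset ($this->_params['charset']) is UTF-8.
$domEncoding = 'ISO-8859-1';
$target      = 'UTF-8';

// Current code: uses $dom->encoding as the source charset, so the UTF-8
// bytes are misread as ISO-8859-1 and encoded a second time.
$current = convertCharsetSketch($fromSaveXml, $domEncoding, $target);

// Patched code: the source charset is always UTF-8.
$patched = convertCharsetSketch($fromSaveXml, 'UTF-8', $target);

echo $current, "\n";  // cafÃ© (double-encoded)
echo $patched, "\n";  // café
```

With an ISO-8859-1 target instead, the current code would convert ISO-8859-1 to ISO-8859-1 and hand back the UTF-8 bytes unchanged, so in both cases the output does not match the charset the filter is supposed to return.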

As for why $dom->encoding differs between my machine and yours, I don't  
have an answer (and I have tried a lot of things). But according to  
http://devzone.zend.com/article/8855, section 4,  
DOMDocument::loadHTML() should detect the charset declared in a meta tag.  
On my system it apparently does, which would explain why  
$dom->encoding is ISO-8859-1 (or whatever charset the meta tag declares;  
see other comments).
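Since the behaviour apparently differs between machines, here is a quick diagnostic sketch others could run to compare. The helper name and sample markup are hypothetical, and printing the libxml version (exposed by PHP as LIBXML_DOTTED_VERSION) is only an assumption about where a difference might come from, not an established cause. It reports what DOMDocument::loadHTML() stores as the document encoding for markup with and without a charset meta tag.

```php
<?php
// Hypothetical helper: report what loadHTML() stores in DOMDocument::$encoding.
function reportedEncoding(string $html): ?string
{
    $previous = libxml_use_internal_errors(true);  // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc->encoding;  // may be null when no charset was detected
}

$withMeta = '<html><head><meta http-equiv="Content-Type" '
          . 'content="text/html; charset=ISO-8859-1"></head>'
          . '<body><p>test</p></body></html>';
$withoutMeta = '<html><body><p>test</p></body></html>';

echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";
echo 'with meta:      ', var_export(reportedEncoding($withMeta), true), "\n";
echo 'without meta:   ', var_export(reportedEncoding($withoutMeta), true), "\n";
```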

Since I believe the small patch above is correct, I would appreciate it if  
some of the other devs could try replying to the message 'email_charset.eml'  
to see whether I'm really the only one hitting this.

