[Tickets #9567] Re: charset pb replying to message
bugs at horde.org
Wed Mar 16 16:41:57 UTC 2011
DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.
Ticket URL: http://bugs.horde.org/ticket/9567
------------------------------------------------------------------------------
Ticket | 9567
Updated By | rsalmon at mbpgroup.com
Summary | charset pb replying to message
Queue | IMP
Version | Git master
Type | Bug
State | Feedback
Priority | 1. Low
Milestone |
Patch |
Owners | Michael Slusarz
------------------------------------------------------------------------------
rsalmon at mbpgroup.com (2011-03-16 16:41) wrote:
>> --- Xss.php.org 2011-03-15 10:41:22.000000000 +0100
>> +++ Xss.php 2011-03-15 10:41:24.000000000 +0100
>> - return Horde_String::convertCharset($text, $dom->encoding,
>> $this->_params['charset']);
>> + return $text;
> This isn't correct. Xss filter needs to return text in whatever
> charset it was provided in, which is why the convertCharset() call
> is necessary. The question is why $dom->encoding is 'ISO-8859-1'
> for you and 'UTF-8' for *everybody* else.
If we assume that the information at
http://devzone.zend.com/article/8855 is right, specifically section 5:
"DOMDocument::saveXML($node) method is always performed in UTF-8",
then no matter what $dom->encoding is set to, the following code will
*always* return a UTF-8 encoded string:
    if ($body && $body->hasChildNodes()) {
        foreach ($body->childNodes as $child) {
            $text .= $dom->dom->saveXML($child);
        }
    }
So I think that Horde_Text_Filter_Xss::postProcess() should be patched
like this:
- return Horde_String::convertCharset($text, $dom->encoding,
$this->_params['charset']);
+ return Horde_String::convertCharset($text, 'UTF-8',
$this->_params['charset']);
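
To illustrate why the source charset matters here, this is a minimal sketch that does not use Horde at all. The helper convert_charset() is hypothetical and simply wraps mbstring's mb_convert_encoding() as a stand-in for Horde_String::convertCharset(); the target charset 'UTF-8' is also an assumption made for the example. It shows what happens when a string that is already UTF-8 is converted as if it were ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of Horde): a stand-in for
// Horde_String::convertCharset() built on mbstring.
function convert_charset($text, $from, $to)
{
    // mb_convert_encoding() takes the target charset first, then the source.
    return mb_convert_encoding($text, $to, $from);
}

// "é" as UTF-8 bytes, i.e. what saveXML($node) is assumed to return.
$text = "\xC3\xA9";

// Wrong source charset: each UTF-8 byte is treated as one Latin-1 character,
// so the result is double-encoded.
$wrong = convert_charset($text, 'ISO-8859-1', 'UTF-8');

// Correct source charset: the text passes through unchanged.
$right = convert_charset($text, 'UTF-8', 'UTF-8');

echo bin2hex($wrong), "\n"; // c383c2a9
echo bin2hex($right), "\n"; // c3a9
```

If $dom->encoding reports ISO-8859-1 while the string really is UTF-8, the first call is roughly what the current code would do, and the reply text would come out garbled in the same way.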
As for why $dom->encoding is different on my machine than on yours, I
don't have the answer (and I tried a lot of things). But according to
http://devzone.zend.com/article/8855, section 4,
DOMDocument::loadHTML() should detect the 'charset' meta tag. On my
system it apparently does, which would explain why
$dom->encoding is iso-8859-1 (or whatever charset the meta tag is set to,
see other comments).
Since I think the small patch above is right, I wouldn't mind if one
of the other devs could try replying to the message 'email_charset.eml'
to see whether I'm really alone on this one.
Thanks.