[i18n] Htmlspecialchars multibyte charset bug
Viljo Viitanen
vviitane+mail.imp@mappi.helsinki.fi
Thu Nov 7 21:26:49 2002
Slow reply. I checked the archive just now, after joining this list today.
I wrote a few weeks ago:
>>The php htmlspecialchars does _very_ nasty things when dealing with
>>multi-byte text, say utf-8. It somehow automagically decodes the utf8
>>characters in current locale (latin1, for most people) to 8 bit octets.
To which Jan Schneider replied:
>Are you sure? I know that htmlspecialchars is broken for a lot of multibyte
>charsets but I always thought (but never approved) that it has support for
>utf-8.
I'm pretty sure (I cannot test this with charsets other than utf-8; it's the
only multibyte charset I know anything about). See this simple test:
htmlspecialchars("ä",ENT_QUOTES,"utf-8") outputs "ä" as a single latin1
octet, even though the input "ä" is utf-8 encoded.
htmlspecialchars("ä",ENT_QUOTES) works as it should, however.
(This is the case with PHP 4.2.3 compiled from source on Debian 3.0.)
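Since a terminal or mail client can easily hide the difference between the
two encodings of "ä", a hex dump makes the test less ambiguous. The following
is only a minimal sketch: the helper name escaped_hex is made up for this
example, and I am assuming bin2hex() is available in the PHP build being
tested.

```php
<?php
// Hypothetical helper (not part of PHP or Horde): run htmlspecialchars() on
// $bytes and return the result as hex, so the display encoding cannot
// disguise which octets actually came back.
function escaped_hex($bytes, $charset = null)
{
    $out = ($charset === null)
        ? htmlspecialchars($bytes, ENT_QUOTES)            // default charset
        : htmlspecialchars($bytes, ENT_QUOTES, $charset); // explicit charset
    return bin2hex($out);
}

$a_umlaut = "\xC3\xA4"; // "ä" encoded as utf-8 (two octets)

echo "input:           ", bin2hex($a_umlaut), "\n";
echo "with utf-8:      ", escaped_hex($a_umlaut, "utf-8"), "\n";
echo "default charset: ", escaped_hex($a_umlaut), "\n";
```

If the function leaves multibyte text alone, I would expect c3a4 on all three
lines. A lone e4 on the "with utf-8" line would point to the utf-8 to latin1
conversion described above.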
Anyway, the "funny" side-effect of the bug is that imp 3.1 displays utf-8
encoded mails "correctly" by accident when the locale uses the default
charset, iso-8859-1. So either the function really is broken, or my
understanding of the PHP manual
(http://www.php.net/manual/en/function.htmlspecialchars.php) is broken...
--
Viljo Viitanen
(please use address Viljo.Viitanen@helsinki.fi for personal replies)