[Tickets #1609] NEW: Incorrect encoding by MIME::encode() on some UTF-8 strings

bugs at bugs.horde.org
Tue Mar 22 15:53:48 PST 2005


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/?id=1609
-----------------------------------------------------------------------
 Ticket             | 1609
 Created By         | horde at ndn.no
 Summary            | Incorrect encoding by MIME::encode() on some UTF-8 strings
 Queue              | Horde Base
 Version            | 3.0.3
 State              | Unconfirmed
 Priority           | 1. Low
 Type               | Bug
 Owners             | 
-----------------------------------------------------------------------


horde at ndn.no (2005-03-22 15:53) wrote:

While investigating a problem with the Norwegian character "Å" (capital "å"),
which causes incorrectly encoded headers when sending mail in UTF-8 (but not
ISO-8859-1), I traced it to line 142 in lib/Horde/MIME.php:

$size = preg_match_all('/([^\s]+)([\s]*)/', $text, $matches, PREG_SET_ORDER);

In my case, adding the Unicode option (/u) to the regex solved the problem:

$size = preg_match_all('/([^\s]+)([\s]*)/u', $text, $matches, PREG_SET_ORDER);
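A minimal sketch of how the fix could be applied only when the text is UTF-8, assuming the caller knows the charset of $text. The function split_words() and its $charset parameter are hypothetical, not part of Horde's MIME API, and the sketch targets a current PHP release (7.4+ for the arrow functions). The /u modifier is only safe on valid UTF-8: given invalid UTF-8, preg_match_all() returns false, so the sketch falls back to the original byte-oriented pattern in that case.

```php
<?php
// Hypothetical helper (not Horde's API): split $text into
// [word, trailing whitespace] pairs using the same regex as line 142
// of lib/Horde/MIME.php, adding /u only when $text is declared UTF-8.
function split_words(string $text, string $charset): array
{
    $pattern = '/([^\s]+)([\s]*)/';
    $matches = [];

    if (strtolower($charset) === 'utf-8') {
        // With /u, PCRE matches whole UTF-8 code points, so the bytes
        // 0xC3 0x85 ("Å") stay together inside a single word.
        $result = preg_match_all($pattern . 'u', $text, $matches, PREG_SET_ORDER);
        if ($result !== false) {
            return array_map(fn($m) => [$m[1], $m[2]], $matches);
        }
        // preg_match_all() returns false on invalid UTF-8 when /u is set;
        // fall through to the byte-oriented pattern below.
    }

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    return array_map(fn($m) => [$m[1], $m[2]], $matches);
}

$words = split_words("Re: \xC3\x85retur", 'UTF-8');
// $words holds ["Re:", " "] and ["\xC3\x85retur", ""]
```

Keying the modifier on the charset keeps the existing behaviour for single-byte charsets such as ISO-8859-1, where the byte-oriented match was already working.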

It seems that, without /u, preg_match_all() works on individual bytes and can
split a multibyte UTF-8 character such as Norwegian "Å". On a system running
PHP 4.3.10 and Apache/1.3.33, the bug appeared, as shown by this Amavis alert
for a message with "Åretur" as the subject:

X-Amavis-Alert: BAD HEADER Non-encoded 8-bit data (char 85 hex) in message
header 'Subject'
  Subject: Re: =?utf-8?b?ww==?=\205retur\n

A var_dump() of $matches shows the mangled first byte of "Å" as the first
entry in the array, with "retur" in the second entry.
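The alert is consistent with that split. In UTF-8, "Å" (U+00C5) is the two bytes 0xC3 0x85; the encoded word =?utf-8?b?ww==?= decodes to the single byte 0xC3, and \205 is octal for 0x85, the byte Amavis reports. That suggests the regex put 0x85 into the whitespace group. One plausible explanation, which is an assumption since the report does not say which locale was active, is that PHP passed PCRE locale-specific character tables in which 0x85 (NEL in ISO-8859-1) counts as whitespace. The snippet below checks those byte values and shows that with /u the whole word stays in one match; it writes "Å" as "\xC3\x85" so it does not depend on the source file's encoding.

```php
<?php
// "Å" (U+00C5) in UTF-8 is two bytes: 0xC3 0x85.
echo bin2hex("\xC3\x85"), "\n";            // c385

// The Base64 payload of =?utf-8?b?ww==?= is only the first of those bytes.
echo bin2hex(base64_decode('ww==')), "\n"; // c3

// "\205" in the Amavis alert is octal for the byte left unencoded.
echo dechex(octdec('205')), "\n";          // 85

// With /u, PCRE treats the subject as UTF-8 code points, so "Åretur"
// comes back as a single word in a single match.
$subject = "\xC3\x85retur";
preg_match_all('/([^\s]+)([\s]*)/u', $subject, $matches, PREG_SET_ORDER);
echo count($matches), "\n";                // 1
echo $matches[0][1] === $subject ? "intact\n" : "split\n"; // intact
```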

On another system running PHP 4.3.9 and Apache/1.3.31, the bug did NOT appear.

I'm not sure whether this also affects other character sets, or whether
enabling multibyte character support in PHP would solve the problem.



