[Tickets #1621] non-ASCII 7-bit message headers not RFC2047-encoded

bugs at bugs.horde.org bugs at bugs.horde.org
Fri Mar 25 16:00:54 PST 2005



Ticket URL: http://bugs.horde.org/ticket/?id=1621
-----------------------------------------------------------------------
 Ticket             | 1621
 Updated By         | windhamg at email.arizona.edu
 Summary            | non-ASCII 7-bit message headers not RFC2047-encoded
 Queue              | IMP
 Version            | HEAD
 State              | Feedback
 Priority           | 2. Medium
 Type               | Bug
 Owners             | Michael Slusarz
-----------------------------------------------------------------------


windhamg at email.arizona.edu (2005-03-25 16:00) wrote:

Well, I tried the '[^\x00-\x7f]' regex pattern in is8bit(), but no dice.  I
may be speaking ignorantly (in fact, it's very likely) but, even though we
are using a multibyte-aware regex function, this character set (ISO-2022-JP)
*is still* a 7-bit character set.  How are we going to find byte values in
the range [\x80-\xff] in a 7-bit-byte character set?
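
To illustrate the point above, here is a minimal sketch in Python (not the
Horde PHP code; the sample string and variable names are assumptions made
for the example). It encodes a short Japanese string as ISO-2022-JP and
checks whether an 8-bit pattern like the one I tried can ever match:

```python
import re

# Illustrative sample text; any Japanese string would do.
sample = "日本語"
encoded = sample.encode("iso2022_jp")

# Every byte of the encoded form is below 0x80 ...
all_seven_bit = all(b < 0x80 for b in encoded)

# ... so a test for bytes outside \x00-\x7f finds nothing to match.
has_8bit = re.search(rb"[^\x00-\x7f]", encoded) is not None

# What does give the encoding away is the ESC byte (0x1b) that
# introduces each character-set switch.
has_escape = b"\x1b" in encoded

print(all_seven_bit, has_8bit, has_escape)
```

If this sketch is right, the only in-band signal is the escape sequence, not
the byte range.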

I'm starting to think this is a lost cause...I placed some diagnostic output
in the String::regexMatch function and see that, even though the $charset
being passed in is "ISO-2022-JP", the resultant mb_regex_encoding() is
"EUC-JP".

IMHO, the root of this problem is that the MIME::encode function claims to
"Encode a string containing non-ASCII characters according to RFC 2047",
while it actually only encodes strings containing 8-bit characters.
Since 7-bit does not always imply ASCII, we need to find a good test of
"ASCII-ness".  I can test for ISO-2022-JP using a regex like '\x1b[\(\$]',
but it would be nicer to have a more general test (if one exists) for
non-ASCII 7-bit encodings.
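
One possible shape for a more general test, sketched in Python rather than
PHP and offered only as an assumption about what might work: treat a header
as needing RFC 2047 encoding if it contains any 8-bit byte, or any of the
control bytes that ISO 2022-style encodings use to switch character sets
(ESC, SO, SI). The function name needs_rfc2047 is hypothetical and is not
part of Horde.

```python
import re

# Bytes that signal "not plain ASCII text" in this sketch:
#   0x1b        ESC, introduces ISO 2022 escape sequences (e.g. ISO-2022-JP)
#   0x0e, 0x0f  SO / SI, shift bytes used by e.g. ISO-2022-KR
#   0x80-0xff   any 8-bit byte
_NON_ASCII_MARKERS = re.compile(rb"[\x0e\x0f\x1b\x80-\xff]")


def needs_rfc2047(raw: bytes) -> bool:
    """Return True if the raw header bytes look like something other
    than plain ASCII text (hypothetical helper, not Horde's API)."""
    return _NON_ASCII_MARKERS.search(raw) is not None


print(needs_rfc2047(b"Hello, world"))
print(needs_rfc2047("日本語".encode("iso2022_jp")))
print(needs_rfc2047("café".encode("latin-1")))
```

This would still miss 7-bit encodings that use only printable ASCII bytes to
switch modes (UTF-7 and HZ, for example), so it is at best a partial answer
to the question.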



