[imp] problem with attachments in unicode (UTF16)
Andrew Morgan
morgan at orst.edu
Tue Mar 25 19:12:47 UTC 2008
On Tue, 25 Mar 2008, Philip Steeman wrote:
> I've tested the upload with all browsers I have
> - IE6 (windows XP)
> - IE7 (windows XP)
> - Firefox 2 (windows XP)
> - konqueror (Knoppix)
>
> All gave the same wrong result.
When I view the file locally using the url file:///tmp/unicode.txt,
Iceweasel correctly identifies it as UTF-16LE according to the Page Info
screen.
I managed to grab a packet capture of my browser uploading the file to IMP
during new message composition. This is from the POST data:
Content-Disposition: form-data; name="upload_1"; filename="unicode.txt"
Content-Type: text/plain
..t.h.i.s. .i.s. .a. .t.e.s.t.
.
.i.n. .U.T.F.1.6.
.
.
.
.P.h.i.l.i.p. .S.t.e.e.m.a.n.
.
.
The periods are actually null (00) bytes in the data stream.
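To make the byte pattern above easier to reproduce, here is a minimal Python sketch (not IMP code). It assumes the two leading periods in the capture are the UTF-16LE byte-order mark (FF FE), which the capture does not confirm. The helper name sniff_utf16 and the 80% threshold are made up for illustration.

```python
import codecs

def sniff_utf16(data: bytes):
    """Guess 'utf-16-le' or 'utf-16-be' from a BOM or a NUL-byte pattern.

    Hypothetical helper for illustration only; returns None when unsure.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: mostly-ASCII text in UTF-16 has a NUL in every other byte.
    if len(data) >= 4 and len(data) % 2 == 0:
        half = len(data) // 2
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        if odd_nuls > 0.8 * half and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > 0.8 * half and odd_nuls == 0:
            return "utf-16-be"
    return None

# Rebuild the first line of the upload, with an assumed BOM in front.
sample = codecs.BOM_UTF16_LE + "this is a test\r\n".encode("utf-16-le")

# Render non-printable bytes as periods, the way a packet dump does.
printable = "".join(chr(b) if 32 <= b < 127 else "." for b in sample)

print(printable)
print(sniff_utf16(sample))
```

The NUL-pattern fallback is only a heuristic: it works for mostly-ASCII text like this test file and will miss UTF-16 text made up largely of non-Latin characters.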
Further testing shows that for attachments whose primary MIME type is
'text' (for example, 'text/plain'), IMP sets the charset of the
attachment to the character set of the language selected in IMP. When I
choose Japanese as my language when logging into IMP, the unicode.txt
attachment charset is "SHIFT_JIS". I suppose this means that if I could
force IMP into a UTF-16 language, it would correctly identify the attachment.
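The effect of labeling UTF-16 bytes with the language's charset can be sketched in Python as follows. This is not how IMP works internally; choose_charset is a hypothetical helper showing one possible approach (trust a byte-order mark if present, otherwise fall back to the language charset), and it assumes the uploaded file starts with a BOM.

```python
import codecs

def choose_charset(data: bytes, ui_charset: str) -> str:
    """Hypothetical helper: prefer a UTF-16 BOM over the UI language charset."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return ui_charset

text = "this is a test\r\nin UTF16\r\n"
upload = text.encode("utf-16")  # Python's "utf-16" codec prepends a BOM

# What a reader sees if the part is labeled with the language charset:
# the BOM bytes are invalid in Shift_JIS and the NUL bytes survive as U+0000.
mislabeled = upload.decode("shift_jis", errors="replace")

# What a reader sees if the charset is taken from the BOM instead.
detected = upload.decode(choose_charset(upload, "shift_jis"))

print(repr(mislabeled))
print(repr(detected))
```

A file without a BOM would still fall through to the language charset here, so this only covers the easy case.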
This seems like a tricky issue.
Andy