[imp] problem with attachments in unicode (UTF16)

Andrew Morgan morgan at orst.edu
Tue Mar 25 19:12:47 UTC 2008


On Tue, 25 Mar 2008, Philip Steeman wrote:

> I've tested the upload with all browsers I have
> - IE6 (windows XP)
> - IE7 (windows XP)
> - Firefox 2 (windows XP)
> - konqueror (Knoppix)
>
> All gave the same wrong result.

When I view the file locally using the URL file:///tmp/unicode.txt, 
Iceweasel correctly identifies it as UTF-16LE according to the Page Info 
screen.

I managed to grab a packet capture of my browser uploading the file to IMP 
during new message composition.  This is from the POST data:

Content-Disposition: form-data; name="upload_1"; filename="unicode.txt"
Content-Type: text/plain

..t.h.i.s. .i.s. .a. .t.e.s.t.
.
.i.n. .U.T.F.1.6.
.
.
.
.P.h.i.l.i.p. .S.t.e.e.m.a.n.
.
.


The periods are actually null (00) bytes in the data stream.
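
As a rough illustration of why those null bytes give the encoding away, 
here is a small Python sketch (not IMP code; the function name sniff_utf16 
and the 80% threshold are made up for this example).  It checks for a 
UTF-16 byte-order mark first, then falls back to looking for a null in 
every other byte, which is what mostly-ASCII UTF-16 text looks like on the 
wire:

```python
import codecs


def sniff_utf16(data: bytes):
    """Guess whether data is UTF-16; return a codec name or None.

    Hypothetical helper for illustration only. The BOM check cannot
    tell a UTF-16LE BOM from a UTF-32LE BOM, and the null-byte
    heuristic only works for text that is mostly ASCII.
    """
    # A byte-order mark is the most reliable hint.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    if len(data) < 2:
        return None

    # Without a BOM, count nulls at even and odd byte offsets.
    half = len(data) // 2
    even_nulls = data[0::2].count(0)
    odd_nulls = data[1::2].count(0)

    # ASCII in little-endian UTF-16: the high (second) byte is null.
    if odd_nulls > 0.8 * half and even_nulls == 0:
        return "utf-16-le"
    # ASCII in big-endian UTF-16: the high (first) byte is null.
    if even_nulls > 0.8 * half and odd_nulls == 0:
        return "utf-16-be"
    return None


# Sample resembling the uploaded file: BOM followed by UTF-16LE text.
sample = codecs.BOM_UTF16_LE + "this is a test\r\nin UTF16\r\n".encode("utf-16-le")
print(sniff_utf16(sample))
```

A check like this is only a guess, which is part of why a server that 
receives a bare "Content-Type: text/plain" with no charset parameter has 
little to go on.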


Further testing shows that for attachments whose primary MIME type is 
'text' (for example 'text/plain'), IMP sets the charset of the attachment 
to the character set of the language you selected in IMP.  When I choose 
Japanese as my language when logging into IMP, the unicode.txt attachment 
is labeled with charset "SHIFT_JIS".  I suppose this means that if I could 
force IMP into a language whose character set is UTF-16, it would label 
the attachment correctly.
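
To show what the wrong label does to the text, here is a minimal Python 
sketch.  It does not reproduce anything IMP does internally; the helper 
name decode_as is invented for this example, and Python's codec names 
stand in for the MIME charset labels.  It decodes the same UTF-16LE bytes 
once with the mislabeled charset and once with the real one:

```python
def decode_as(payload: bytes, charset: str) -> str:
    """Decode payload the way a reader trusting the charset label would.

    Hypothetical helper; undecodable bytes become U+FFFD instead of
    raising an error.
    """
    return payload.decode(charset, errors="replace")


text = "this is a test\r\nin UTF16\r\n"
payload = text.encode("utf-16-le")  # bytes as uploaded, without a BOM

wrong = decode_as(payload, "shift_jis")   # label taken from the UI language
right = decode_as(payload, "utf-16-le")   # the file's actual encoding

print(repr(wrong[:8]))  # letters interleaved with NUL characters
print(repr(right[:8]))
```

With the wrong label every other character comes out as a NUL, which 
matches the pattern in the packet capture; only the correct label 
recovers the original text.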

This seems like a tricky issue.

 	Andy

