[imp] problem with attachments in unicode (UTF16)
Michael M Slusarz
slusarz at horde.org
Tue Mar 25 17:02:04 UTC 2008
Quoting Tim Bannister <Tim.Bannister at manchester.ac.uk>:
>> > So it becomes corrupted when opened in Outlook or Thunderbird.
>>
>> I tested this with Iceweasel (Firefox) 2.0.0.12 on Debian Unstable as the
>> client, and the latest stable releases of Horde and IMP with PHP5 on
>> Debian Etch as the server. The browser says the content-type is
>> text/plain when it uploads the attachment. Here is the exact attachment
>> that was sent in the email:
>
> You haven't mentioned what encoding the browser claimed when uploading the
> file. For this to work, the browser needs to upload the file with
> metadata like this:
> Content-Type: text/plain; charset=UTF-16
No - this is incorrect. The correct (and unfortunate) answer is that
we cannot detect the charset of a text attachment if it is in a
different charset than the one the browser uses. Browser upload
information does not contain the charset of the uploaded data, only
its content type; all we have to go by is the charset the browser
reports to us via the HTTP headers.
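The following small Python sketch illustrates the point above. IMP itself is PHP; Python is used here only for illustration. The sample text and its UTF-16LE-with-BOM encoding are hypothetical stand-ins for the test attachment, and UTF-8 is assumed to be the charset the browser reports. Decoding the uploaded bytes under that assumed charset produces replacement characters and stray NULs. Only the file's real charset recovers the text, and nothing in the upload says what that charset is.

```python
import codecs

# Hypothetical attachment: a text file saved as UTF-16 (little-endian, with BOM).
ORIGINAL_TEXT = "Zażółć gęślą jaźń"
UPLOADED_BYTES = codecs.BOM_UTF16_LE + ORIGINAL_TEXT.encode("utf-16-le")


def decode_with_assumed_charset(data: bytes, assumed_charset: str) -> str:
    """Decode raw upload bytes under a charset we can only assume."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data.decode(assumed_charset, errors="replace")


if __name__ == "__main__":
    # Assuming the browser's charset (UTF-8) mangles the text.
    print(repr(decode_with_assumed_charset(UPLOADED_BYTES, "utf-8")))
    # Only the file's actual charset recovers it.
    print(decode_with_assumed_charset(UPLOADED_BYTES, "utf-16"))
```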
This is the reason UTF-8 is used to encode the file, and this is why
the quoted-printable (Q-P) output is incorrect. There is nothing wrong
with the way we Q-P encode, but if we Q-P encode data under the wrong
charset, the result is going to be invalid.
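The next sketch, again in Python with a hypothetical sample file, shows why the Q-P step itself is not at fault. Quoted-printable encoding followed by decoding returns exactly the same bytes. Those bytes form valid text only when they are read under the correct charset label.

```python
import codecs
import quopri

# Hypothetical UTF-16 attachment (little-endian, with BOM).
TEXT = "Zażółć gęślą jaźń"
DATA = codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le")


def qp_roundtrip(data: bytes) -> bytes:
    """Quoted-printable encode, then decode the result again."""
    # Text-mode Q-P may rewrite line-break bytes; this sample contains none,
    # so every byte should come back unchanged.
    return quopri.decodestring(quopri.encodestring(data))


def decodes_cleanly(data: bytes, charset_label: str) -> bool:
    """True if the bytes are valid text under the declared charset label."""
    try:
        data.decode(charset_label)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    restored = qp_roundtrip(DATA)
    print(restored == DATA)                     # Q-P preserved the bytes
    print(decodes_cleanly(restored, "utf-8"))   # wrong label
    print(decodes_cleanly(restored, "utf-16"))  # right label
```

The encoding step is byte-level and lossless here. Whether the message is readable depends entirely on the charset parameter attached to it.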
The greater issue is that PHP provides us no reliable means to
determine the charset of a given file. The mb extension has a function
called mb_detect_encoding(). However, the mb extension is not required
by IMP, the function is buggy and not fully reliable, and it doesn't
detect UTF-16 data, among other charsets. So it is useless for present
purposes. The libmagic file extension can guess a charset when it
determines that a file is a text file, but again it is not required by
IMP and doesn't produce correct results reliably enough (for example,
on my system it detects the UTF-16 test file as audio/mpeg).
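For comparison, one narrow heuristic that does work for some UTF-16 files is to check for a byte order mark (BOM). The sketch below is hypothetical and is not taken from IMP, mb_detect_encoding(), or libmagic. It only helps when the file starts with a BOM. UTF-16 without a BOM, and any legacy 8-bit charset, remains undetected, which is consistent with the limitation described above.

```python
import codecs
from typing import Optional

# Longer BOMs are checked first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE). A UTF-16LE file whose first character is U+0000 would
# therefore be misreported as UTF-32; that edge case is ignored here.
_BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "UTF-32"),
    (codecs.BOM_UTF32_BE, "UTF-32"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
)


def sniff_bom_charset(data: bytes) -> Optional[str]:
    """Return a charset label if the data starts with a BOM, else None.

    Labels are generic ("UTF-16", not "UTF-16LE") because the BOM is left
    in place and tells the decoder the byte order.
    """
    for bom, label in _BOM_TABLE:
        if data.startswith(bom):
            return label
    # No BOM: this heuristic cannot tell, so the charset stays unknown.
    return None
```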
We could attach text files as base64-encoded application/octet-stream
data, but this gets us no further: it may display correctly if you
download the text file to an OS that can detect the charset, but it
would not render properly in an environment where charset detection is
not available (for example, PHP and IMP).
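The sketch below shows the two options using Python's standard email package. It is only an illustration, not IMP's code, and the function name build_attachment_part is hypothetical. Both parts carry the raw bytes intact in base64. Only the second tells the recipient how to decode them, and that charset would have to come from somewhere other than the browser upload.

```python
from email.message import EmailMessage
from typing import Optional


def build_attachment_part(data: bytes, filename: str,
                          charset: Optional[str] = None) -> EmailMessage:
    """Build a base64-encoded MIME attachment part from raw uploaded bytes.

    Hypothetical helper. Without a known charset the part is
    application/octet-stream, so the receiving client has to guess how to
    display it. With a charset, the part is labelled text/plain; charset=...
    """
    part = EmailMessage()
    if charset is None:
        # Bytes content defaults to Content-Transfer-Encoding: base64.
        part.set_content(data, maintype="application", subtype="octet-stream",
                         filename=filename)
    else:
        part.set_content(data, maintype="text", subtype="plain",
                         filename=filename, params={"charset": charset})
    return part
```

Either way the bytes survive transport unchanged. The difference is only whether the charset label is present, which is the piece of information the browser upload does not provide.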
michael
--
___________________________________
Michael Slusarz [slusarz at horde.org]