[imp] problem with attachments in unicode (UTF16)

Michael M Slusarz slusarz at horde.org
Tue Mar 25 17:02:04 UTC 2008


Quoting Tim Bannister <Tim.Bannister at manchester.ac.uk>:

>> > So it becomes corrupted when opened in Outlook or Thunderbird.
>>
>> I tested this with Iceweasel (Firefox) 2.0.0.12 on Debian Unstable as the
>> client, and the latest stable releases of Horde and IMP with PHP5 on
>> Debian Etch as the server.  The browser says the content-type is
>> text/plain when it uploads the attachment.  Here is the exact attachment
>> that was sent in the email:
>
> You haven't mentioned what encoding the browser claimed when uploading the
> file. For this to work, the browser needs to upload the file with
> metadata like this:
> Content-Type: text/plain; charset=UTF-16

No - this is incorrect.  The correct (and unfortunate) answer is that
we cannot detect the charset of a text attachment when it differs from
the charset the browser is using.  The browser's upload information
contains only the type of the uploaded data, not its charset - all we
have to go by is the charset the browser reports to us via the HTTP
headers.
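
To illustrate the point above, here is a rough Python sketch (not IMP
code) that parses the headers of one uploaded form part.  The header
values are hypothetical, modelled on what was reported earlier in this
thread (the browser sends text/plain and nothing more); the field name
and filename are made up for the example.

```python
from email import message_from_string

# Hypothetical headers of a single multipart/form-data part.
# Only the media type is declared; there is no charset parameter.
part_headers = (
    'Content-Disposition: form-data; name="upload"; filename="test.txt"\n'
    'Content-Type: text/plain\n'
    '\n'
)

part = message_from_string(part_headers)

content_type = part.get_content_type()
# With no charset parameter present, this returns None.
declared_charset = part.get_content_charset()

print(content_type, declared_charset)
```

With nothing declared on the part, the receiving code has only the
charset of the request itself to fall back on.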

This is the reason UTF-8 is used to encode the file, and this is why
the resulting quoted-printable encoding is incorrect.  There is nothing
wrong with the way we do Q-P encoding - but if we Q-P encode using the
wrong charset, the data is going to be invalid.
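
A small Python sketch of the failure mode just described (again, not
IMP code; the sample text and the try_decode helper are made up for
illustration).  The Q-P round trip preserves the bytes exactly, so the
encoding step itself is sound, but the bytes only make sense under the
charset they were actually written in.

```python
import codecs
import quopri

# Sample text "Gruesse" with umlaut and sharp s, stored as UTF-16 (LE, with BOM).
text = "Gr\u00fc\u00dfe"
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Quoted-printable works on bytes and is lossless.
encoded = quopri.encodestring(raw)
decoded = quopri.decodestring(encoded)


def try_decode(data, charset):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Labelled with the wrong charset, the same bytes are invalid.
as_utf8 = try_decode(decoded, "utf-8")
# Labelled with the right charset, they decode cleanly.
as_utf16 = try_decode(decoded, "utf-16")

print(decoded == raw, as_utf8, as_utf16)
```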

The greater issue is that PHP provides us no means to determine what
the charset of a given file is.  There is a function in the mbstring
(mb) extension called mb_detect_encoding().  However, that extension is
not required by IMP, and the function is buggy, not fully reliable, and
doesn't detect, among other charsets, UTF-16 data.  So it is useless
for present purposes.  The libmagic file extension can provide charset
guesses when it determines a file is a text file, but again it is not
required for IMP and doesn't produce correct results reliably enough
(for example, on my system it detects the UTF-16 test file as
audio/mpeg).
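
For comparison only, here is a minimal Python sketch of the narrowest
kind of detection: checking for a byte order mark.  This is not
something IMP does or that is being proposed here, and sniff_bom is a
hypothetical name.  It answers only when the file starts with a BOM,
and returns None for everything else, so it does not remove the
underlying limitation.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def sniff_bom(data):
    """Return a charset name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
without_bom = "hi".encode("utf-16-le")

print(sniff_bom(with_bom), sniff_bom(without_bom))
```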

We could attach text files as base64-encoded application/octet-stream
data, but this gets us no further - the file may be viewable if you
download it to an OS that can detect the charset, but it would not
render properly in an environment where charset detection is not
available (say, for example, PHP and IMP).
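
A rough Python sketch of that alternative, using the standard email
package rather than anything from Horde (the filename and sample text
are made up).  The bytes survive intact, but the part declares no
charset, so whoever opens it is left to guess.

```python
import codecs
from email.mime.application import MIMEApplication

# Sample UTF-16 (LE, with BOM) payload.
raw = codecs.BOM_UTF16_LE + "Gr\u00fc\u00dfe".encode("utf-16-le")

# MIMEApplication base64-encodes the payload by default.
part = MIMEApplication(raw, _subtype="octet-stream")
part.add_header("Content-Disposition", "attachment", filename="test.txt")

content_type = part.get_content_type()
transfer_encoding = part["Content-Transfer-Encoding"]
# No charset parameter is attached to an octet-stream part.
declared_charset = part.get_content_charset()
# Decoding the base64 payload returns the original bytes.
recovered = part.get_payload(decode=True)

print(content_type, transfer_encoding, declared_charset, recovered == raw)
```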

michael

-- 
___________________________________
Michael Slusarz [slusarz at horde.org]


