[imp] problem with attachments in unicode (UTF16)

Michael M Slusarz slusarz at horde.org
Thu Mar 27 18:44:58 UTC 2008


Quoting Otto Stolz <Otto.Stolz at uni-konstanz.de>:

> So, I think the best solution would be:
> - Provide a Charset field next to the file-selection widget
>    for the user to specify the encoding of the file he chooses
>    for uploading;

No.  Because 99.9% of users have no idea what a charset is.  Even
I, as a somewhat experienced user, have no idea what charset my text
docs are in (nor do I care).

> - if the user chooses a text file and a charset, tag the
>    attachment so; optionally, warn if the uploaded file contains
>    illegal data w.r.t. the charset chosen;

No.  See above.

> - if the user chooses a text file, but leaves the Charset
>    default value ‘unknown’ alone, try to guess the charset,
>    as discussed earlier in this thread;

Alter this a bit: if a user uploads a text file, attempt to "guess" the
charset.  This will need to be done in PHP code.  Perl modules that
may be useful to port to PHP for this purpose:

http://search.cpan.org/dist/Encode-Detect/
http://search.cpan.org/~dankogai/Encode-2.24/  (more specifically, the
Encode::Guess module)

If the guess fails, fall back to the charset the browser is using, since
that is (most likely) the charset used by the underlying OS.
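
A rough sketch of that guess-then-fall-back order, written in Python
purely for illustration (the real thing would still need to be PHP).
The function name guess_charset and the browser_charset parameter are
made up for this example, and the checks are deliberately simple: a
byte-order mark first, then plain ASCII, then strict UTF-8, and the
browser's charset if none of those match.  It is nowhere near what
Encode::Detect or Encode::Guess do.

```python
import codecs

# Byte-order marks, with UTF-32 listed before UTF-16 so that the UTF-32 LE
# mark (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def guess_charset(data: bytes, browser_charset: str) -> str:
    """Guess the charset of an uploaded text file (hypothetical helper).

    browser_charset is the fallback: whatever charset the browser used
    for the request, assumed here to be known to the caller.
    """
    # 1. A byte-order mark is the most reliable hint when present.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Pure 7-bit data is valid in nearly every charset; call it ASCII.
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass

    # 3. Non-ASCII bytes that form valid UTF-8 are very unlikely to be
    #    anything else.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass

    # 4. No usable guess: fall back to the browser's charset.
    return browser_charset
```

One known gap: UTF-16 text without a byte-order mark is not detected
here, and if it holds only ASCII-range characters it will be reported as
US-ASCII, because the interleaved NUL bytes are themselves valid ASCII.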

michael

-- 
___________________________________
Michael Slusarz [slusarz at horde.org]


