[imp] Attachment corruption problem
Daniel A. Ramaley
daniel.ramaley at DRAKE.EDU
Wed Apr 26 09:15:44 PDT 2006
There has been an off-list discussion that should be part of this
thread. Both parties agree that it should be posted back to the list.
It follows, with most quoted text removed:
Date: Tue, 25 Apr 2006 16:54:06 +0200
From: Andreas Geesen
I experienced bad behaviour when attachments got encoded as
"quoted-printable". Can you confirm that this is the case with your
file, too?
If so, take a look at the bytes of the original and the broken file. If
they differ in the way EOL is encoded (0x0d 0x0a vs. 0x0a), you have the
same problem that i had.
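A minimal Python sketch of that byte-level check. The function name
eol_counts and the sample bytes are made up for illustration and do not
come from any tool mentioned in this thread:

```python
def eol_counts(data: bytes) -> dict:
    """Count CRLF (0x0d 0x0a) and bare LF (0x0a) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return {"crlf": crlf, "bare_lf": bare_lf}


# Illustrative data: the "original" uses bare LF, the "received" copy CRLF.
original = b"PAK\x00\x01\nrecord one\nrecord two\n"
received = original.replace(b"\n", b"\r\n")

print(eol_counts(original))
print(eol_counts(received))
```

If the two counts differ between the original and the received file, and
the received file is longer by exactly the number of converted line
endings, the corruption is most likely an EOL rewrite.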
Depending on your answer i'll take the time to explain what i went
through.
-----
Date: Tue, 25 Apr 2006 10:19:51 -0500
From: Daniel Ramaley
I think that is the problem. I just sent the bad file to myself through
Imp and through a Yahoo mail account. Yahoo sent the file with base64
encoding, and the file came through without corruption. Imp sent the
file quoted-printable, and it was corrupt.
Well, at least i know what the problem is now. I don't know how to
correct it though. If you have a solution to this, i would be very
happy to hear it!
-----
Date: Tue, 25 Apr 2006 18:11:29 +0200
From: Andreas Geesen
RFC 2045 is all about MIME mail, and there's a clear warning about
EOL sequences in quoted-printable. These don't need to be encoded (that's
what is defined in this RFC) and are expected to be changed into the
default EOL sequence during transport.
That's where your attachment gets broken.
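The effect can be reproduced with Python's standard library. This is only
a sketch: normalize_to_crlf is a made-up stand-in for whatever hop rewrites
line endings, and the payload bytes are invented for the example.

```python
import base64
import binascii


def normalize_to_crlf(wire: bytes) -> bytes:
    """Stand-in for a transport hop that rewrites every line ending as CRLF."""
    return wire.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Hypothetical file content: a few binary bytes plus LF (0x0a) bytes.
payload = b"PK\x03\x04\nsecond line\n"

# Hand-written quoted-printable body: the control bytes are =XX escaped,
# but the line breaks are left literal, as is allowed for text.
qp_body = b"PK=03=04\nsecond line\n"
assert binascii.a2b_qp(qp_body) == payload  # intact before transport

qp_after = binascii.a2b_qp(normalize_to_crlf(qp_body))
b64_after = base64.decodebytes(normalize_to_crlf(base64.encodebytes(payload)))

print("quoted-printable survived:", qp_after == payload)
print("base64 survived:", b64_after == payload)
```

With these inputs the quoted-printable copy comes back with 0x0d 0x0a
where the original had 0x0a, while the base64 copy is unchanged, because
the base64 decoder ignores the line breaks between encoded lines.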
My results of several cross-browser, cross-mailclient, cross-server
tests showed that one mailserver (Exchange), and all windows-mailclients
(tested: Thunderbird and Outlook) changed the EOL-sequence inside these
files into the windows-default EOL-sequence automatically.
Using a linux-thunderbird the EOL-sequence stayed linux-standard. And
since my horde/imp installation runs on a linux-server those files stay
broken.
My suggestion to let horde/imp choose the EOL sequence based on the
OS the browser is running on received no comments.
Since all users of my horde/imp setup use Windows, i forced imp to
never use quoted-printable again and changed the default EOL sequence
that imp substitutes in quoted-printable messages to the Windows EOL.
Both changes are made in horde-framework: Horde/MIME/Part.php
1.) Within the first 20 lines you'll find two definitions of
EOL-sequences: MIME_PART_EOL and MIME_PART_RFC_EOL.
Make sure both are set to "\r\n". As far as i remember i only had to
change MIME_PART_EOL.
2.) And in function "getTransferEncoding" replace all (2 in my case)
occurrences of 'quoted-printable' with 'base64'.
Don't do a "replace-all" in Part.php! The other occurrences there
have to remain unchanged.
As you can see the first part is a pure windows-fix. So it may cause
trouble saving these attachments on a linux-system. I didn't even try to
make a browser-os depending function.
The second part of my change has no negative effect on any
functionality. It has a slightly negative effect on mail-size.
I do have to say here that the cause of the corruption lies in the
sender's mail client. Thunderbird decides to send those files as
base64-encoded attachments and everything goes fine. But other mail
clients like Outlook may decide that your file isn't binary and thus
encode it in quoted-printable.
And i want to point out, too, that all mail clients i tested except imp
use the default EOL sequence of the OS they're running on when decoding
such a quoted-printable attachment. I still suggest imp should adopt this
behaviour, even if that isn't a complete solution to this problem.
Hope to have helped - at least a bit.
-----
Date: Tue, 25 Apr 2006 12:41:43 -0500
From: Daniel Ramaley
Thanks for your detailed response! I can't do the Windows-specific hack
that you mentioned; we have clients using many operating systems,
including Windows, Mac OS Classic, Mac OS X, and Linux. It wouldn't
surprise me a
bit if there are a few users in the physical science department who use
Solaris or Digital Unix. And i think there is at least one person in
the computer science department who uses Irix on an old SGI. Not that
it matters to distinguish between Unix systems, but this just shows how
varied the environment here can be.
I've tried your second change, to the getTransferEncoding function. It
doesn't quite work. If i send the bad file with Imp then check it with
my desktop mail client, the message comes through with an encoding type
of base64, but a file type of "Plain Text Document". When i send it
with Yahoo mail, the encoding is base64 with a file type of
"Unspecified Binary Data". Now, i think that the file type column in my
mail client may not be a standard way of describing the data; it may
just be something the mail client makes up. So i went in and looked at
the raw messages. Here's what i found in the attachment headers:
Imp sends this:
--Boundary_(ID_DYIdADiresNlQOvr+U6BMg)
Content-type: text/plain; charset=UTF-8; name=DU41.PAK
Content-transfer-encoding: BASE64
Content-disposition: attachment; filename=DU41.PAK
Yahoo sends something a bit different:
--Boundary_(ID_JbGuJ9PTwGU6ldeF62a8EA)
Content-type: application/octet-stream; name=DU41.PAK
Content-transfer-encoding: BASE64
Content-disposition: attachment; filename=DU41.PAK
Content-description: 3792803407-DU41.PAK
As you can see, Imp uses "text/plain; charset=UTF-8;" while Yahoo uses
"application/octet-stream;". I'll investigate more and see if i can
come up with a patch that will force Imp to use the correct encoding.
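For comparison, here is how the two header fields relate when a message is
built with Python's standard email package. This is not Horde/IMP code,
only a sketch of an attachment declared as application/octet-stream; the
payload bytes are placeholders, not the real DU41.PAK.

```python
from email.message import EmailMessage

data = b"\x00\x01binary\ndata\r\n"  # placeholder bytes, not the real file

msg = EmailMessage()
msg["Subject"] = "attachment test"
msg.set_content("See attached file.")
# The content type is declared explicitly; for bytes input the library
# applies base64 as the transfer encoding by default.
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="DU41.PAK",
)

part = next(msg.iter_attachments())
print(part.get_content_type())
print(part["Content-Transfer-Encoding"])
print(part.get_content() == data)
```

The content type tells the receiver what the data is; the transfer
encoding only says how the bytes were wrapped for transport. The two are
set independently here.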
PS: Your original response to me was off-list. Should this discussion be
on the list?
-----
Date: Wed, 26 Apr 2006 08:21:32 +0200
From: Andreas Geesen
No, it wasn't my intention to take this discussion off the list. I just
hit 'reply' on your first msg and the thread was off the list already.
As far as i know it doesn't matter which content-type is used. If the
sending mailer uses base64 encoding everything inside the attachment is
safe from being mangled on mail transport.
But as you state that the attachments differ between Yahoo and imp it
might be wise to analyze what's different. A diff of two files won't be
helpful if the files are treated as textfiles and differ in
non-printable characters. I got quite usable results on running a diff
on the outputs of "hexdump -C <file>". There you get text and bytes side
by side.
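Where hexdump is not available, the same side-by-side comparison can be
approximated in Python. A sketch only: hexdump_lines is a made-up helper
whose layout resembles, but does not exactly match, "hexdump -C", and the
two byte strings are invented samples.

```python
import difflib


def hexdump_lines(data: bytes, width: int = 16) -> list:
    """Render bytes as offset, hex and printable-text columns."""
    pad = width * 3 - 1  # room for 'width' two-digit bytes and the spaces between
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{text_part}|")
    return lines


good = b"AB\nCD\n"      # sample with bare LF
bad = b"AB\r\nCD\r\n"   # same sample after an EOL rewrite
diff = list(difflib.unified_diff(
    hexdump_lines(good), hexdump_lines(bad), "good", "bad", lineterm=""))
print("\n".join(diff))
```

Because every byte is rendered as two hex digits, an inserted 0x0d shows
up in the diff even though it is invisible in a plain text diff.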
If the difference is not between EOL-sequences (0x0a vs 0x0d 0x0a) your
problem is different from what i had to deal with.
Funny ... on the list Michael M Slusarz is posting a link on a
bug-ticket i started on my issue.
-----
Date: Wed, 26 Apr 2006 09:01:24 -0500
From: Daniel Ramaley
I did a diff of the hexdumps. The difference is indeed in line endings.
Today i intend to read more of the Horde code to try and learn exactly
how attachments are processed so that i can adjust the encoding.
One thing i don't understand yet is the distinction between encoding and
mime type. My current understanding is that when the browser uploads a
file, it should provide the server with a mime type. But the data
shouldn't require any special encoding at that point since http is
8-bit clean. But Horde has to encode the file to prepare it for e-mail.
Horde chooses the encoding based on the mime type that the client
provided. Or in the event that the client did not provide a mime type,
Horde examines the data and makes a guess. Does this sound correct so
far? If my understanding of how attachments flow through the system is
correct, it shouldn't take me too long to learn the details and figure
out how to adjust it.
>Funny ... on the list Michael M Slusarz is posting a link on a
>bug-ticket i started on my issue.
Yes, i saw that. I get the impression that the Horde developers consider
it a client-side bug with the web browser, not Horde. But if other
webmail systems work with the same browsers, then i'm not convinced it
is a bug with the clients. And asking thousands of computer-illiterate
users to change some esoteric setting in their browser just isn't
feasible at most organizations.
-----
Date: Wed, 26 Apr 2006 16:46:51 +0200
From: Andreas Geesen
The files i had problems with ended in '.dat' and when i sent those
files (using imp) from a windows-box that had this suffix bound to a
mediaplayer the files were automatically treated as binary. Another
windows-box had these files bound to notepad and imp treated them as
text. In fact every browser tells imp which mime-type the uploaded file
has depending on the filetype. Imp has afaik no "guessing" function, but
a fallback encoding which is binary safe.
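What a binary-safe choice could look like, as a hypothetical Python
sketch. This is not Horde's getTransferEncoding; the function name and the
policy (plain 7-bit text with CRLF-only line endings goes out unencoded,
everything else as base64) are assumptions made for illustration.

```python
def choose_transfer_encoding(data: bytes) -> str:
    """Pick a transfer encoding that never leaves ambiguous line endings exposed.

    Hypothetical policy: only 7-bit data with CRLF-only line endings and
    lines of at most 998 bytes is sent as-is; everything else is base64.
    """
    is_seven_bit = all(0 < byte < 128 for byte in data)
    without_crlf = data.replace(b"\r\n", b"")  # what is left besides CRLF pairs
    canonical_eol = b"\r" not in without_crlf and b"\n" not in without_crlf
    short_lines = all(len(line) <= 998 for line in data.split(b"\r\n"))
    if is_seven_bit and canonical_eol and short_lines:
        return "7bit"
    return "base64"


print(choose_transfer_encoding(b"hello\r\nworld\r\n"))
print(choose_transfer_encoding(b"hello\nworld\n"))
print(choose_transfer_encoding(b"\x00\x01\x02"))
```

The sketch never returns quoted-printable, so a file with bare LF bytes is
always wrapped in base64 rather than left open to EOL rewriting. The cost
is a somewhat larger message.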
If you hack horde-framework the way i described in #2.) you trick this
mechanism. Every time imp needs to encode a file it uses
"getTransferEncoding" to determine which encoding would be best and
"getTransferEncoding" returns 'base64' instead of 'quoted-printable'.
The bigger problem is that you/we have to deal with already modified
mail if the sender decided to use 'quoted-printable' on files where it
shouldn't be used. I already thought of a decision matrix for how to
replace EOL depending on the sender's OS and the receiver's OS. The small
version for two platform groups could be (content-type: text/plain ONLY!):
Linux -> Linux = no change
Linux -> Windows = replace bare 0x0a with 0x0d 0x0a (or replace "(?<!\r)\n" by "\r\n")
Windows -> Linux = no change (changed by transport already)
Windows -> Windows = replace bare 0x0a with 0x0d 0x0a (or replace "(?<!\r)\n" by "\r\n")
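The matrix above can be sketched in Python as follows. The names
EOL_MATRIX, to_crlf and convert_eol are invented for this example, and
nothing here is taken from imp or Horde.

```python
import re

# (sender OS, receiver OS) -> should bare LF be rewritten as CRLF?
EOL_MATRIX = {
    ("linux", "linux"): False,
    ("linux", "windows"): True,
    ("windows", "linux"): False,  # already changed by transport
    ("windows", "windows"): True,
}


def to_crlf(data: bytes) -> bytes:
    """Turn every bare LF into CRLF, leaving existing CRLF pairs alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_eol(data: bytes, sender_os: str, receiver_os: str,
                content_type: str) -> bytes:
    """Apply the matrix, restricted to text/plain as proposed above."""
    if content_type != "text/plain":
        return data
    if EOL_MATRIX[(sender_os, receiver_os)]:
        return to_crlf(data)
    return data


print(convert_eol(b"a\nb\r\n", "linux", "windows", "text/plain"))
```

The pattern uses a lookbehind so that only the 0x0a byte is replaced. A
pattern of the form "[^\r]\n" would also match, and therefore replace, the
character in front of the line feed.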
As you can see i'd restrict that replacement to text/plain; however, i
have encountered corrupted quoted-printable attachments of type
application/octet-stream, too. The latter is quite annoying. How can a
mail client sanely decide to use quoted-printable on application/*
types? And if a mail client decided to use quoted-printable on a
text file that is in fact not a text file, this EOL conversion matrix
would result in corrupted files on Linux platforms when they are sent
from Windows.
All that isn't enough to "cure" this bad behaviour. Sometimes i wish i
had a good connection to a developer of e.g. Thunderbird to see how they
deal with this on all the platforms they support. I don't have the time
to browse that code, too.
We both know it's a sender's issue, but we also recognize that imp is the
only application that has problems with it.
Maybe we really should post all this back to the list.
------------------------------------------------------------------------
Dan Ramaley Dial Center 118, Drake University
Network Programmer/Analyst 2407 Carpenter Ave
+1 515 271-4540 Des Moines IA 50311 USA