[imp] Attachment corruption problem

Daniel A. Ramaley daniel.ramaley at DRAKE.EDU
Wed Apr 26 09:15:44 PDT 2006


There has been an off-list discussion that should be part of this 
thread. Both parties agree that it should be posted back to the list. 
It follows, with most quoted text removed:



Date: Tue, 25 Apr 2006 16:54:06 +0200
From: Andreas Geesen

I experienced bad behaviour when attachments got encoded as
"quoted-printable". Can you confirm that this is the case with your
file, too?
If so, take a look at the bytes of the original and the broken file. If
they differ in the way EOL is used (0x0d 0x0a vs. 0x0a), you have the
same problem i had.
Depending on your answer i'll take the time to explain what i went 
through.
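
A minimal sketch of that byte comparison, in Python (not part of the
original exchange). The function name eol_counts and the sample bytes are
hypothetical; in practice the two byte strings would be read from the
original and the received file in binary mode.

```python
def eol_counts(data: bytes) -> dict:
    """Count each kind of line ending found in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "bare_lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "bare_cr": data.count(b"\r") - crlf,  # CR not followed by LF
    }


# Hypothetical stand-ins for the original and the corrupted attachment.
original = b"PAK\x00\n\x01\x02\n"
broken = original.replace(b"\n", b"\r\n")

print("original:", eol_counts(original))
print("broken:  ", eol_counts(broken))
```

If the only difference between the two files is that bare LF counts drop
while CRLF counts rise (or the reverse), the corruption is an EOL rewrite.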

-----

Date: Tue, 25 Apr 2006 10:19:51 -0500
From: Daniel Ramaley

I think that is the problem. I just sent the bad file to myself through 
Imp and through a Yahoo mail account. Yahoo sent the file with base64 
encoding, and the file came through without corruption. Imp sent the 
file quoted-printable, and it was corrupt.

Well, at least i know what the problem is now. I don't know how to 
correct it though. If you have a solution to this, i would be very 
happy to hear it!

-----

Date: Tue, 25 Apr 2006 18:11:29 +0200
From: Andreas Geesen

RFC 2045 is all about MIME mail, and there's a clear warning about
EOL-sequences in quoted-printable. These don't need to be encoded (that's
what is defined in this RFC) and are expected to be changed into the
default EOL-sequence during transport.
That's where your attachment gets broken.
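
The effect can be reproduced with Python's standard library. This is a
sketch under stated assumptions, not Horde code: the sample bytes are
made up, and through_transport is a hypothetical stand-in for any hop
that rewrites line endings to CRLF.

```python
import base64
import binascii

# Hypothetical stand-in for a small binary file that contains LF bytes.
data = b"\x01\x02\nPAK\n\x03"


def through_transport(wire: bytes) -> bytes:
    """Simulate a hop that normalizes every line ending to CRLF."""
    return wire.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Quoted-printable in text mode leaves line breaks as literal bytes,
# so a rewritten line ending ends up in the decoded payload.
qp_wire = binascii.b2a_qp(data, istext=True)
qp_result = binascii.a2b_qp(through_transport(qp_wire))

# Base64 uses line breaks only as formatting; decoders skip them.
b64_wire = base64.encodebytes(data)
b64_result = base64.decodebytes(through_transport(b64_wire))

print("quoted-printable intact:", qp_result == data)
print("base64 intact:          ", b64_result == data)
```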

My cross-browser, cross-mailclient, cross-server tests showed that one
mailserver (Exchange) and all windows-mailclients (tested: Thunderbird
and Outlook) automatically changed the EOL-sequence inside these files
into the windows-default EOL-sequence.
Using a linux-thunderbird the EOL-sequence stayed linux-standard. And
since my horde/imp installation runs on a linux-server those files stay
broken.
My suggestion to let horde/imp choose the EOL-sequence based on the
OS-type the browser is running on received no comments.

Since all users of my horde/imp setup use windows, i forced imp to
never use quoted-printable again and changed the default EOL-sequence
that imp substitutes in quoted-printable messages to the windows EOL.

Both changes are made in horde-framework: Horde/MIME/Part.php

1.) Within the first 20 lines you'll find two definitions of
    EOL-sequences: MIME_PART_EOL and MIME_PART_RFC_EOL.
    Make sure both are set to "\r\n". As far as i remember i only had to
    change MIME_PART_EOL.
2.) In function "getTransferEncoding" replace all (2 in my case)
    occurrences of 'quoted-printable' with 'base64'.
    Don't do a "replace-all" in Part.php! The other occurrences there
    have to remain unchanged.
As you can see, the first part is a pure windows-fix, so it may cause
trouble saving these attachments on a linux-system. I didn't even try to
make a browser-OS-dependent function.
The second part of my change has no negative effect on any
functionality. It has a slightly negative effect on mail-size.
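
To put a rough number on the size effect, here is a small Python sketch
(an illustration only; the sample text is hypothetical). For plain ASCII
text, quoted-printable adds almost nothing, while base64 costs roughly a
third more.

```python
import base64
import binascii

# Hypothetical sample: 100 short lines of plain ASCII text.
text = b"Hello, world\n" * 100

qp_size = len(binascii.b2a_qp(text, istext=True))
b64_size = len(base64.encodebytes(text))
ratio = b64_size / qp_size

print(f"raw: {len(text)}  quoted-printable: {qp_size}  base64: {b64_size}")
print(f"base64 / quoted-printable = {ratio:.2f}")
```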

I do have to say here that the cause of the corruption lies in the
sender's mailclient. Thunderbird decides to send those files as base64
encoded attachments and everything goes fine. But other mailclients like
Outlook may decide that your file isn't binary and thus encode it
as quoted-printable.
I also want to point out that all mailclients i tested except imp use
the default EOL-sequence of the OS they're running on when decoding such
a quoted-printable attachment. I still suggest imp should adopt this
behaviour, even if that isn't a complete solution to this problem.

Hope to have helped - at least a bit.

-----

Date: Tue, 25 Apr 2006 12:41:43 -0500
From: Daniel Ramaley

Thanks for your detailed response! I can't do the Windows-specific hack 
that you mentioned; we have clients using many operating systems, 
including Windows, Mac OS Classic, Mac OS X, and Linux. It wouldn't 
surprise me a 
bit if there are a few users in the physical science department who use 
Solaris or Digital Unix. And i think there is at least one person in 
the computer science department who uses Irix on an old SGI. Not that 
it matters to distinguish between Unix systems, but this just shows how 
varied the environment here can be.

I've tried your second change, to the getTransferEncoding function. It 
doesn't quite work. If i send the bad file with Imp then check it with 
my desktop mail client, the message comes through with an encoding type 
of base64, but a file type of "Plain Text Document". When i send it 
with Yahoo mail, the encoding is base64 with a file type of 
"Unspecified Binary Data". Now, i think that the file type column in my 
mail client may not be a standard way of describing the data; it may 
just be something the mail client makes up. So i went in and looked at 
the raw messages. Here's what i found in the attachment headers:

Imp sends this:

--Boundary_(ID_DYIdADiresNlQOvr+U6BMg)
Content-type: text/plain; charset=UTF-8; name=DU41.PAK
Content-transfer-encoding: BASE64
Content-disposition: attachment; filename=DU41.PAK

Yahoo sends something a bit different:

--Boundary_(ID_JbGuJ9PTwGU6ldeF62a8EA)
Content-type: application/octet-stream; name=DU41.PAK
Content-transfer-encoding: BASE64
Content-disposition: attachment; filename=DU41.PAK
Content-description: 3792803407-DU41.PAK

As you can see, Imp uses "text/plain; charset=UTF-8;" while Yahoo uses 
"application/octet-stream;". I'll investigate more and see if i can 
come up with a patch that will force Imp to use the correct encoding.
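
For comparison, this is how a part labelled like the Yahoo one can be
built with Python's standard email package. It is only a sketch of the
target headers, not a patch for Imp; the file contents and the subject
line are hypothetical.

```python
from email.message import EmailMessage

data = b"\x01\x02\nPAK\n\x03"  # hypothetical file contents

msg = EmailMessage()
msg["Subject"] = "attachment test"
msg.set_content("See attachment.")

# Declaring the part as application/octet-stream makes the library
# pick base64 as the transfer encoding for the binary payload.
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="DU41.PAK",
)

part = next(msg.iter_attachments())
print("Content-Type:", part["Content-Type"])
print("Content-Transfer-Encoding:", part["Content-Transfer-Encoding"])
print("Content-Disposition:", part["Content-Disposition"])
```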

PS: Your original response to me was off-list. Should this discussion be 
on the list?

-----

Date: Wed, 26 Apr 2006 08:21:32 +0200
From: Andreas Geesen

No, it wasn't my intention to take this discussion off the list. I just
hit 'reply' on your first msg and this thread was off the list already.

As far as i know it doesn't matter which content-type is used. If the
sending mailer uses base64 encoding everything inside the attachment is
safe from being mangled on mail transport.
But as you state that the attachments differ between Yahoo and imp it
might be wise to analyze what's different. A diff of two files won't be
helpful if the files are treated as textfiles and differ in
non-printable characters. I got quite usable results on running a diff
on the outputs of "hexdump -C <file>". There you get text and bytes side
by side.
If the difference is not between EOL-sequences (0x0a vs 0x0d 0x0a) your
problem is different from what i had to deal with.
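
Where "hexdump" is not available, the same side-by-side comparison can be
approximated in Python. This is a rough sketch: hexdump_lines is a
hypothetical helper whose layout only resembles "hexdump -C", and the
sample bytes stand in for the two attachment files.

```python
import difflib


def hexdump_lines(data: bytes, width: int = 16) -> list:
    """Render bytes as offset, hex and printable-ASCII columns."""
    pad = width * 3
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}} |{text_part}|")
    return lines


# Hypothetical stand-ins for the good and the corrupted file.
good = b"PAK\x00\ndata\n"
bad = good.replace(b"\n", b"\r\n")

diff = list(difflib.unified_diff(
    hexdump_lines(good), hexdump_lines(bad),
    fromfile="good", tofile="bad", lineterm="",
))
for line in diff:
    print(line)
```

Lines where "0a" on one side lines up with "0d 0a" on the other point to
the EOL problem described above.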

Funny ... on the list Michael M Slusarz is posting a link on a
bug-ticket i started on my issue.

-----

Date: Wed, 26 Apr 2006 09:01:24 -0500
From: Daniel Ramaley

I did a diff of the hexdumps. The difference is indeed in line endings. 
Today i intend to read more of the Horde code to try and learn exactly 
how attachments are processed so that i can adjust the encoding.

One thing i don't understand yet is the distinction between encoding and 
mime type. My current understanding is that when the browser uploads a 
file, it should provide the server with a mime type. But the data 
shouldn't require any special encoding at that point since http is 
8-bit clean. But Horde has to encode the file to prepare it for e-mail. 
Horde chooses the encoding based on the mime type that the client 
provided. Or in the event that the client did not provide a mime type, 
Horde examines the data and makes a guess. Does this sound correct so 
far? If my understanding of how attachments flow through the system is 
correct, it shouldn't take me too long to learn the details and figure 
out how to adjust it.

>Funny ... on the list Michael M Slusarz is posting a link on a
>bug-ticket i started on my issue.

Yes, i saw that. I get the impression that the Horde developers consider 
it a client-side bug with the web browser, not Horde. But if other 
webmail systems work with the same browsers, then i'm not convinced it 
is a bug with the clients. And asking thousands of computer-illiterate 
users to change some esoteric setting in their browser just isn't 
feasible at most organizations.

-----

Date: Wed, 26 Apr 2006 16:46:51 +0200
From: Andreas Geesen

The files i had problems with ended in '.dat' and when i sent those
files (using imp) from a windows-box that had this suffix bound to a
mediaplayer the files were automatically treated as binary. Another
windows-box had these files bound to notepad and imp treated them as
text. In fact every browser tells imp which mime-type the uploaded file
has depending on the filetype. Imp has afaik no "guessing" function, but
a fallback encoding which is binary safe.
If you hack horde-framework the way i described in #2.) you trick this
mechanism. Every time imp needs to encode a file it uses
"getTransferEncoding" to determine which encoding would be best, and
"getTransferEncoding" returns 'base64' instead of 'quoted-printable'.
The bigger problem is that you/we have to deal with already modified
mail if the sender decided to use 'quoted-printable' on files where it
shouldn't be used. I already thought of a decision-matrix on how to
replace EOL depending on senders-os and receivers-os. The small version
for two platform-groups could be: (content-type: text/plain  ONLY!)

Linux   -> Linux   = no change
Linux   -> Windows = replace 0x0a with 0x0d 0x0a (or replace "[^\r]\n" by "\r\n")
Windows -> Linux   = no change (changed by transport already)
Windows -> Windows = replace 0x0a with 0x0d 0x0a (or replace "[^\r]\n" by "\r\n")
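
The matrix can be sketched in Python as follows. Everything here is
hypothetical (function name, parameters, OS labels); it is not Horde
code. In the matrix the action depends only on the receiving side, so the
sender's OS is not a parameter. Note also that a literal "[^\r]\n"
pattern would match, and so replace, the character in front of the line
feed; the sketch uses a negative lookbehind instead.

```python
import re

# Bare LF: a line feed that is not already preceded by a carriage return.
BARE_LF = re.compile(rb"(?<!\r)\n")


def fix_eol(data: bytes, content_type: str, receiver_os: str) -> bytes:
    """Apply the matrix: text/plain only, and only for Windows receivers."""
    if content_type != "text/plain":
        return data  # never touch anything that is not declared as text
    if receiver_os == "windows":
        return BARE_LF.sub(b"\r\n", data)
    return data  # Linux receiver: no change


sample = b"a\nb\r\nc\n"
print(fix_eol(sample, "text/plain", "windows"))
print(fix_eol(sample, "text/plain", "linux"))
print(fix_eol(sample, "application/octet-stream", "windows"))
```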

As you can see i'd restrict that replacement to text/plain; however, i
have encountered corrupted quoted-printable attachments of type
application/octet-stream, too. The latter is quite annoying. How can a
mail-client sanely decide to use quoted-printable on application/*
types? And if a mail-client decided to use quoted-printable on a
text-file that is in fact not a text-file, this EOL-conversion-matrix
would result in corrupted files on linux-platforms when being sent from
windows.

All that isn't enough to "cure" this bad behaviour. Sometimes i wish i
had a good connection to a developer of e.g. thunderbird to see how they
deal with this on all the platforms they support. I don't have the time
to browse that code, too.
We both know it's a sender's issue, but we also recognize that imp is the
only application that has problems with it.
Maybe we really should post all this back to the list.




------------------------------------------------------------------------
Dan Ramaley                            Dial Center 118, Drake University
Network Programmer/Analyst             2407 Carpenter Ave
+1 515 271-4540                        Des Moines IA 50311 USA

