[imp] Attachment corruption problem
Michael M Slusarz
slusarz at horde.org
Wed Apr 26 10:33:40 PDT 2006
Quoting "Daniel A. Ramaley" <daniel.ramaley at DRAKE.EDU>:
> There has been an off-list discussion that should be part of this
> thread. Both parties agree that it should be posted back to the list.
> It follows, with most quoted text removed:
>
>
>
> Date: Tue, 25 Apr 2006 16:54:06 +0200
> From: Andreas Geesen
>
> I experienced bad behaviour when attachments got encoded as
> "quoted-printable". Can you confirm that this is the case with your
> file, too?
> If so take a look at the bytes of the original and the broken file. If
> they differ in the way EOL is used (0x0d 0x0a vs. 0x0a) you have the
> same prob which i had.
[snip]
This is the same discussion as appeared on the bug report and,
unfortunately, this discussion is still incorrect.
As mentioned in the bug report - this is not a Horde/IMP issue. This
is an issue with quoted-printable not being able to handle binary data
UNLESS IT IS EXPLICITLY TOLD IT IS BEING GIVEN BINARY DATA. More
important, this issue has *nothing* to do with EOL characters - or,
more correctly, messing with EOL characters is *absolutely* the wrong
way to look at this issue.
Maybe a simple example will be in order. Say I have the following
text/plain file:
-----
Line one.CR
Line two.
-----
And I send it in quoted-printable. It will be sent as the following:
-----
Line one.CRLF
Line two.
-----
As can be seen, pursuant to RFCs, all end of line characters are
converted to CRLF. Most important, no matter what OS the message is
read on, that OS can convert the CRLF string to whatever EOL
convention that OS uses - this is part of the decoding of an RFC
message on the receiving end. So the message appears with the same
line breaks no matter what OS is used to read the message. What is
important to realize is that this text message *WILL BE DIFFERENT*
depending on the OS used. On Unix, the message will look like the
following:
-----
Line one.LF
Line two.
-----
On Windows the message will look like the following:
-----
Line one.CRLF
Line two.
-----
As can be seen, the file length of the former file is 19. The file
length of the latter file is 20. *Ack*! What is going on? The
answer is nothing - as explained several times in the bug reports, this
is exactly what the RFCs allow. Horde/IMP isn't broken. Since it is
text data, the difference in file sizes doesn't make any difference
since with textual data we only care about the *display*.
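For anyone who wants to see the 19 vs. 20 byte difference for
themselves, here is a minimal Python sketch of the round trip
described above. The function names canonicalize_eol and to_local_eol
are made up for this illustration; this is not Horde/IMP code, just
the EOL handling reduced to two string operations.

```python
import re


def canonicalize_eol(text: bytes) -> bytes:
    """Convert every line break (CRLF, lone CR, lone LF) to CRLF for the wire."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", text)


def to_local_eol(wire: bytes, local_eol: bytes) -> bytes:
    """Convert wire-format CRLF to whatever the receiving OS uses."""
    return wire.replace(b"\r\n", local_eol)


original = b"Line one.\rLine two."      # the CR-delimited file from the example
wire = canonicalize_eol(original)       # what travels in the message
on_unix = to_local_eol(wire, b"\n")     # saved on a Unix machine
on_windows = to_local_eol(wire, b"\r\n")  # saved on a Windows machine

print(len(on_unix), len(on_windows))
```

Both results display as the same two lines of text, which is all that
matters for text data.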
But, exactly like the RFCs warn us, the problem occurs when we try to
use quoted-printable to send BINARY data. Using the same example as
above, let's assume that this message is not text data but is binary
data instead. Let's assume it is a Windows-based program that parses
this data, and this program delimits lines by CRLF. Let's assume
Horde/IMP is running on a UNIX machine. We go to attach our message
using IMP. So far so good since the message will be canonicalized
when sending to:
-----
Line one.CRLF
Line two.
-----
Which just fortuitously happens to be in the format we need. Now
imagine this message is received on an IMP installation on a UNIX
machine. We go to download the file. The file is downloaded as such:
-----
Line one.LF
Line two.
-----
And, no surprise, the file is in the wrong format. The Windows program
can't read the file. People incorrectly point the finger at Horde/IMP.
So how could this latter situation happen? Because the file is
reported to IMP at the time of sending as a text file. As adequately
demonstrated above, the RFCs clearly indicate that EOL formatting is
not guaranteed when using quoted-printable encoding of text data.
Thus, there is *nothing* broken. There is either an issue with the
browser incorrectly identifying the file as text to IMP when
attaching, or there is an issue with MIME magic detection of the file.
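The failure can be reproduced in a few lines of Python. This is only
a sketch under the assumptions of the example (the payload is labeled
text/*, the receiving side is Unix); send_as_text and receive_as_text
are hypothetical names, not anything in Horde/IMP.

```python
import re


def send_as_text(data: bytes) -> bytes:
    """Canonicalize line breaks to CRLF, as is done for text parts."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def receive_as_text(wire: bytes, local_eol: bytes) -> bytes:
    """Rewrite CRLF to the receiver's local EOL, as is done for text parts."""
    return wire.replace(b"\r\n", local_eol)


# "Binary" data whose parser insists on CRLF delimiters.
payload = b"Line one.\r\nLine two."

# Mislabeled as text, sent, then downloaded on a Unix machine.
downloaded = receive_as_text(send_as_text(payload), b"\n")

print(downloaded == payload)  # the bytes no longer match
print(downloaded)
```

Nothing in either step is wrong for text. The only mistake is the
label the data was given before it was handed over.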
We don't support Q-P encoding of binary data. It defeats the whole
purpose of Q-P in the first place - Q-P is intended to provide a non
MIME-compliant reader (e.g. simple mail user agent, a user looking at
the raw text of the message) a way to understand the gist of the text
message without having to do any further processing.
We are not going to send all messages in base64 since #1 it would
result in *all* messages being approximately 33% larger than they
should be and #2 it does not provide the ability to quickly look at a
mail message without specialized software and still be able to
understand most (if not all) of the message.
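To make both points concrete, here is a small Python sketch using the
standard quopri and base64 modules on the example text. It only
illustrates the readability and size trade-off; it says nothing about
how Horde/IMP builds messages internally.

```python
import base64
import quopri

text = b"Line one.\r\nLine two."

qp = quopri.encodestring(text)   # quoted-printable, text mode
b64 = base64.b64encode(text)     # base64

print(qp)    # the words are still readable in the raw message
print(b64)   # opaque without a decoder
print(len(text), len(qp), len(b64))
```

For plain ASCII text the Q-P form is essentially the text itself,
while base64 always costs 4 output bytes per 3 input bytes.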
Just FYI, to correctly Q-P binary data, the message above would have
to be sent as follows:
-----
Line one.=0D=0ALine two.
-----
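For reference, Python's standard binascii module exposes exactly this
text vs. binary distinction through the istext flag of b2a_qp. CR is
byte 0x0D and LF is byte 0x0A, so in binary mode they come out as
escapes instead of being treated as line breaks. This is a sketch to
show the difference, not the code path Horde/IMP uses.

```python
import binascii

payload = b"Line one.\r\nLine two."

# Text mode (the default): CRLF is treated as a line break and is not escaped.
as_text = binascii.b2a_qp(payload)

# Binary mode: CR and LF are data bytes and get escaped.
as_binary = binascii.b2a_qp(payload, istext=False)

# Decoding the binary-mode form gives back the exact original bytes.
restored = binascii.a2b_qp(as_binary)

print(as_text)
print(as_binary)
print(restored == payload)
```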
But if we know the message is binary data, we are just going to base64
encode it anyway since it is a more efficient way of sending binary
data (base64 adds roughly 33% overhead, while Q-P can triple the size
when every byte has to be escaped) and if a message is binary data, we
don't need the feature of being able to look at the message (e.g. Q-P)
without specialized software since the data is going to be
indecipherable anyway.
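A quick way to check the size argument is to encode the same binary
blob both ways. The sketch below uses Python's base64 and binascii
modules on 1024 bytes that are all outside printable ASCII, which is
the worst case for Q-P. The test data is made up for the illustration.

```python
import base64
import binascii

# 1024 bytes, none of them printable ASCII (worst case for Q-P).
data = bytes(range(128, 256)) * 8

b64 = base64.b64encode(data)
qp = binascii.b2a_qp(data, istext=False)

print(len(data), len(b64), len(qp))
```

Every byte in this blob needs a three-character =XX escape in Q-P
(plus soft line breaks), while base64 stays at 4 characters per 3
bytes no matter what the data looks like.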
So if binary data is reported as text at the time of attachment then
there can be no expectation that the message will be transmitted
through RFC-compliant mail without alteration. As mentioned
previously, there may be two reasons why binary data is attached as
text:
1.) browser reports data as text/*
This is a browser issue.
SOLUTION: Fix your browser. Or hack Horde/IMP to send all messages in
base64. But this will become neither an option nor the standard in our
codebase.
2.) MIME magic reports application/octet-stream data as text/*
This may or may not be a Horde issue. This is only a Horde issue if
our internal MIME magic detection is used. But this is the *third*
option and is only used if both the PECL fileinfo module is not
installed and the PHP mime_magic extension is not available. If either
of these modules is used, then the issue is with their MIME magic
algorithms, which is something outside our control.
In conclusion, there is absolutely nothing wrong with the way we send
Q-P data since we only Q-P encode a message if we are dealing with
text data. This is why Bug 3565 was correctly marked Bogus.
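The rule boils down to a decision on the reported MIME type. The
Python sketch below is a deliberately simplified illustration of that
decision; choose_transfer_encoding is a hypothetical name and the
real Horde/IMP logic is not reproduced here.

```python
def choose_transfer_encoding(mime_type: str) -> str:
    """Pick a transfer encoding from the reported MIME type (simplified)."""
    major = mime_type.split("/", 1)[0].strip().lower()
    # Text parts get Q-P so the raw message stays readable;
    # everything else is treated as binary and gets base64.
    return "quoted-printable" if major == "text" else "base64"


print(choose_transfer_encoding("text/plain"))
print(choose_transfer_encoding("application/octet-stream"))
```

This is why the reported type matters so much: if binary data comes
in labeled text/plain, it takes the text branch and is subject to EOL
canonicalization.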
michael
___________________________________
Michael Slusarz [slusarz at horde.org]