[dev] MIME.php wrapHeaders corrupting filenames
Samuel Nicolary
sam at nicolary.org
Tue Aug 3 12:03:29 PDT 2004
Under certain circumstances, the following function in the MIME framework
module takes a long filename that contains spaces and replaces one of
those spaces with a tab:
function wrapHeaders($header, $text, $eol = "\r\n")
{
/* Remove any existing linebreaks. */
$text = preg_replace("/\r?\n\s?/", ' ', $text);
/* Wrap the line. */
$line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");
/* Make sure there are no empty lines. */
$line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);
return substr($line, strlen($header) + 2);
}
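To make the mechanism concrete, here is a minimal sketch that reproduces the wrap step outside Horde. It uses only PHP's built-in wordwrap() and str_replace(). The variable names, and the reader that unfolds by dropping just the CRLF, are assumptions for illustration and are not taken from MIME.php or from any particular mailer.

```php
<?php
$eol = "\r\n";
$header = 'Content-Disposition';
$text = 'attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';

// Same call as in wrapHeaders(): wordwrap() swaps the space it breaks on
// for the break string, here CRLF followed by a tab.
$line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");

// Assumed naive reader: unfold by dropping the CRLF only. The tab stays
// behind inside the quoted filename, where the space used to be.
$unfolded = str_replace($eol, '', $line);

// True when the filename now contains a tab between the two words.
var_dump(strpos($unfolded, "Assessment\tForm") !== false);
```

The break lands inside the quoted-string because wordwrap() knows nothing about MIME parameters; it simply breaks on the last space before column 75.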
Example:
Horde:
Content-Type: application/msword; name="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Disposition: attachment; filename="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Transfer-Encoding: base64
Horde with filename > 78 and no spaces:
Content-Type: application/msword;
name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Disposition: attachment;
filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Here are some examples of how other mailers construct this:
Pine:
Content-Type: APPLICATION/msword; name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"
Pine with a filename > 78:
Content-Type: APPLICATION/msword; name*0="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and th";
name*1="is is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and th";
filename*1="is is a third test and just one more for kicks.doc"
Pine with a filename > 78 and no spaces:
Content-Type: APPLICATION/msword; name*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
name*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
filename*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Mulberry:
Content-Type: application/msword;
name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm Assessment Form000000000000000.doc"; size=25088
Mulberry with a filename > 78:
Content-Type: application/msword;
name="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and this is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and this is a third test and just one more for kicks.doc";
size=24064
Mulberry with a filename > 78 and no spaces:
Content-Type: application/msword;
name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc";
size=24064
The following patch replaces the tab character with a space. That at
least avoids embedding a funky character in the quoted-string attachment
filename, which some mailers cannot make sense of and therefore include
as-is. It does not deal with a long filename made up only of
alphanumeric characters, which has no space to wrap at:
diff -r1.132 MIME.php
809c809
< $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");
---
> $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . " ");
812c812
< $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);
---
> $line = preg_replace("/" . $eol . " \s*" . $eol . " /", "/" . $eol . " /", $line);
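As a rough check of what the patch does and does not fix, the sketch below runs the patched wordwrap() call on the two sample headers from above. The variable names are illustrative, and unfolding by dropping the CRLF is an assumption about how a reader treats the header.

```php
<?php
$eol = "\r\n";

// Patched call: continuation lines are indented with a space, not a tab.
$original = 'Content-Disposition: attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';
$short = wordwrap($original, 75, $eol . ' ');

// Dropping the CRLF when unfolding now gives back the original space.
$unfolded = str_replace($eol, '', $short);

// wordwrap() is called without its fourth ($cut) argument, so a "word"
// longer than the width is never split and ends up on one long line.
$name = 'Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc';
$long = wordwrap('Content-Type: application/msword; name="' . $name . '"', 75, $eol . ' ');

// Length of each physical line, excluding the CRLF.
$lengths = array_map('strlen', explode($eol, $long));
```

Here $unfolded is identical to $original again, while the second entry of $lengths is still well over 78.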
The Pine name*<n> notation (the parameter value continuations described in
RFC 2231) looks like an interesting way to handle this.
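For reference, this is roughly how a reader puts such numbered sections back together. It is a minimal sketch: joinContinuations() is a hypothetical helper, not an existing Horde function, and it assumes the parameters have already been parsed into an array with the quotes removed. It also ignores the separate encoded form (name*0*=...) that RFC 2231 defines.

```php
<?php
// Hypothetical helper: join "name*0", "name*1", ... sections back into a
// single "name" value. Other parameters are passed through untouched.
function joinContinuations(array $params)
{
    $sections = array();
    $joined = array();
    foreach ($params as $key => $value) {
        if (preg_match('/^(.+)\*(\d+)$/', $key, $m)) {
            // Group the sections by base name, indexed by section number.
            $sections[$m[1]][(int) $m[2]] = $value;
        } else {
            $joined[$key] = $value;
        }
    }
    foreach ($sections as $name => $parts) {
        // Concatenate in numeric order, whatever order they arrived in.
        ksort($parts);
        $joined[$name] = implode('', $parts);
    }
    return $joined;
}

$params = joinContinuations(array(
    'name*1' => 'is is a third test and just one more for kicks.doc',
    'name*0' => 'Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and th',
));
```

Because the sections are simply concatenated, the sender can cut the value anywhere, even in the middle of a word, which is what Pine does in the examples above.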
From RFC 2822:
There are two limits that this standard places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.
The 998 character limit is due to limitations in many implementations
which send, receive, or store Internet Message Format messages that
simply cannot handle more than 998 characters on a line. Receiving
implementations would do well to handle an arbitrarily large number
of characters in a line for robustness sake. However, there are so
many implementations which (in compliance with the transport
requirements of [RFC2821]) do not accept messages containing more
than 1000 character including the CR and LF per line, it is important
for implementations not to create such messages.
The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line, in spite of the fact that such
implementations are non-conformant to the intent of this
specification (and that of [RFC2821] if they actually cause
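information to be lost). Again, even though this limitation is put on
messages, it is encumbant upon implementations which display messages
to handle an arbitrarily large number of characters in a line
(certainly at least up to the 998 character limit) for the sake of
robustness.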
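The two limits are easy to check mechanically. The sketch below labels each physical line of a header block against them. classifyLineLengths() is a hypothetical helper, and it counts bytes with strlen(), which matches characters only as long as the headers are plain ASCII.

```php
<?php
// Hypothetical helper: label each line of a header block against the
// RFC 2822 limits. Lengths exclude the CRLF, as in the text above.
function classifyLineLengths($headerBlock)
{
    $labels = array();
    foreach (explode("\r\n", $headerBlock) as $line) {
        $length = strlen($line);
        if ($length > 998) {
            $labels[] = 'violates MUST (998)';
        } elseif ($length > 78) {
            $labels[] = 'exceeds SHOULD (78)';
        } else {
            $labels[] = 'ok';
        }
    }
    return $labels;
}

$labels = classifyLineLengths(
    "Content-Transfer-Encoding: base64\r\n"
    . str_repeat('x', 79) . "\r\n"
    . str_repeat('x', 999)
);
```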
Since the character limit is a "MUST be no more than 998" and a
"SHOULD be no more than 78", I think there are the following options:
- use spaces instead of tabs to indent continuation lines on MIME part
  headers
- start a new continuation line each time a semicolon is encountered
  outside of a quoted-string, unless it is the trailing character
- limit each of these lines to 998 or 78 characters:
  - either truncate the value portion of the header attribute so that
    the overall length of the line is less than 998
  - or use the attribute_key*<n> syntax to break up quoted-strings so
    that no line exceeds 78 characters
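The second option needs a way to tell a separating semicolon from one inside a quoted-string. A minimal sketch of that split follows. splitOutsideQuotes() is a hypothetical name, and the scanner only understands double quotes and backslash escapes inside them; it does not handle comments or any other part of the header grammar.

```php
<?php
// Hypothetical helper: split a header value on semicolons that are not
// inside a double-quoted string. Empty pieces (for example after a
// trailing semicolon) are dropped.
function splitOutsideQuotes($text)
{
    $parts = array();
    $current = '';
    $inQuotes = false;
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $ch = $text[$i];
        if ($ch === '\\' && $inQuotes && $i + 1 < $length) {
            // Escaped character: copy the backslash and what follows it.
            $current .= $ch . $text[++$i];
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes;
            $current .= $ch;
        } elseif ($ch === ';' && !$inQuotes) {
            if (trim($current) !== '') {
                $parts[] = trim($current);
            }
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    if (trim($current) !== '') {
        $parts[] = trim($current);
    }
    return $parts;
}

$parts = splitOutsideQuotes('attachment; filename="a;b.doc"; size=24064;');
```

Each element of $parts can then be placed on its own continuation line, and only the ones that are still too long need truncating or the attribute_key*<n> treatment.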
I was thinking of replacing the function with something like the
following (treat it as an untested sketch):
function wrapHeaders($header, $text, $eol = "\r\n")
{
    /* Remove any existing linebreaks. */
    $text = trim(preg_replace("/\r?\n\s?/", ' ', $text));
    $header = trim($header);
    $lines = array();
    if ((strlen($header) + 2 + strlen($text)) < 75) {
        $lines[] = $header . ': ' . $text;
    } else {
        /* need a more accurate separator regex here (this one also splits on
         * a semicolon inside a quoted-string) but this is just for
         * demonstrative purposes */
        $attrs = array_map('trim', preg_split('/;/', $text, -1, PREG_SPLIT_NO_EMPTY));
        for ($i = 0; $i < count($attrs); $i++) {
            if ($i == 0) {
                /* if this is the first line account for the length of the header addition */
                $prefix = $header . ': ';
            } else {
                /* otherwise it is just a single whitespace indent to account for */
                $prefix = ' ';
            }
            $offset = strlen($prefix);
            if (($offset + strlen($attrs[$i])) < 75) {
                $lines[] = $prefix . $attrs[$i];
            } else {
                $attrItems = explode('=', $attrs[$i], 2);
                /* if the separator isn't found in the attribute then
                 * the value should probably not be folded.
                 * just make sure it doesn't exceed 995
                 */
                if (count($attrItems) < 2) {
                    $lines[] = $prefix . substr($attrs[$i], 0, 995 - $offset);
                } else {
                    $attrName = $attrItems[0];
                    $attrValue = trim($attrItems[1], '"');
                    /* 6 = '*', a one-digit index, '=', two quotes and the ';' */
                    $size = max(1, 75 - ($offset + strlen($attrName) + 6));
                    $chunks = str_split($attrValue, $size);
                    for ($c = 0; $c < count($chunks); $c++) {
                        /* only the very first line carries the header name */
                        $lines[] = ($c == 0 ? $prefix : ' ') . "$attrName*$c=" . '"' . $chunks[$c] . '"';
                    }
                }
            }
        }
    }
    /* like the current function, return the value without the "Header: " prefix */
    return substr(implode(';' . $eol, $lines), strlen($header) + 2);
}
I think there should also be some code in place to deal with displaying
these long filenames at the top of the message in HTML. I think the
link text should be truncated to a certain number of characters and a
title attribute with the full string should be added.
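Something along these lines is what I have in mind for the display side. It is only a sketch: attachmentLink() and its 40 character default are made up for illustration, and it puts the full name in a title attribute because anchors have no alt attribute. It also truncates by bytes, so a multibyte filename would need a multibyte-aware cut.

```php
<?php
// Hypothetical helper: build a link whose visible text is shortened,
// while the full filename stays available in the title attribute.
function attachmentLink($url, $filename, $max = 40)
{
    $label = $filename;
    if (strlen($filename) > $max) {
        // Keep the total visible length at $max, including the dots.
        $label = substr($filename, 0, $max - 3) . '...';
    }
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '"'
        . ' title="' . htmlspecialchars($filename, ENT_QUOTES) . '">'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

$link = attachmentLink('/view?id=1', 'a.doc');
```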
Comments?
--
Sam Nicolary