[dev] MIME.php wrapHeaders corrupting filenames

Samuel Nicolary sam at nicolary.org
Tue Aug 3 12:03:29 PDT 2004


Under certain circumstances, the following function in the MIME framework
module takes a long filename that contains spaces and replaces one of
those spaces with a tab (the indent of the folded continuation line):

    function wrapHeaders($header, $text, $eol = "\r\n")
    {
        /* Remove any existing linebreaks. */
        $text = preg_replace("/\r?\n\s?/", ' ', $text);

        /* Wrap the line. */
        $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");

        /* Make sure there are no empty lines. */
        $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);

        return substr($line, strlen($header) + 2);
    }

Example:

Horde:
Content-Type: application/msword; name="Mid-Pgm Assessment
	Form000000000000000.doc"
Content-Disposition: attachment; filename="Mid-Pgm Assessment
	Form000000000000000.doc"
Content-Transfer-Encoding: base64

Horde with filename > 78 and no spaces:
Content-Type: application/msword;
	name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Disposition: attachment;
	filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
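
The first Horde case above can be reproduced outside the framework. This
is only a minimal sketch: it copies the wordwrap() call from wrapHeaders()
into a stand-alone script, and the variable names are mine, not Horde's.

```php
<?php
// Stand-alone copy of the wrapping step from wrapHeaders(), for illustration only.
$eol    = "\r\n";
$header = 'Content-Disposition';
$text   = 'attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';

$line = wordwrap($header . ': ' . $text, 75, $eol . "\t");

// wordwrap() breaks at the last space that still fits in 75 columns, which
// here is a space inside the quoted filename; it becomes CRLF + tab.
$brokeInsideQuotes = strpos($line, 'Assessment' . $eol . "\tForm") !== false;
var_dump($brokeInsideQuotes);
```

wordwrap() knows nothing about quoted-strings, so any space is a
candidate break point, including those inside the filename.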


Here are some examples of how other mailers construct this:

Pine:
Content-Type: APPLICATION/msword; name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"

Pine with a filename > 78:
Content-Type: APPLICATION/msword; name*0="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and th";
        name*1="is is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and th";
        filename*1="is is a third test and just one more for kicks.doc"

Pine with a filename > 78 and no spaces:
Content-Type: APPLICATION/msword; name*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
        name*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
        filename*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"


Mulberry:
Content-Type: application/msword;
 name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm Assessment Form000000000000000.doc"; size=25088

Mulberry with a filename > 78:
Content-Type: application/msword;
 name="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and this is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm Assessment Form000000000000000 this is a test and this is another test and this is a third test and just one more for kicks.doc";
 size=24064

Mulberry with a filename > 78 and no spaces:
Content-Type: application/msword;
 name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc";
 size=24064


The following patch replaces the tab character with a space. That at
least avoids embedding a funky character in the quoted-string attachment
filename, which some mailers cannot make sense of and therefore include
in the name as-is. It does not deal with a long filename that contains
no spaces at all:

diff -r1.132 MIME.php
809c809
<         $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");
---
>         $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . " ");
812c812
<         $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);
---
>         $line = preg_replace("/" . $eol . " \s*" . $eol . " /", "/" . $eol . " /", $line);
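
To show why the indent character matters: RFC 2822 (section 2.2.3) says a
header is unfolded by removing any CRLF that is immediately followed by
whitespace, so the whitespace itself stays in the value. A rough sketch,
where unfoldHeader() is a hypothetical helper and not part of Horde:

```php
<?php
// Hypothetical helper: unfold a header the way RFC 2822 section 2.2.3
// describes, by dropping each CRLF that is followed by a space or tab.
function unfoldHeader($folded)
{
    return preg_replace("/\r\n(?=[ \t])/", '', $folded);
}

$withTab   = "attachment; filename=\"Mid-Pgm Assessment\r\n\tForm000000000000000.doc\"";
$withSpace = "attachment; filename=\"Mid-Pgm Assessment\r\n Form000000000000000.doc\"";

// The whitespace after the CRLF survives unfolding, so the tab version
// yields a filename containing a tab and the space version the original name.
var_dump(unfoldHeader($withTab));
var_dump(unfoldHeader($withSpace));
```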


The Pine name*<n> notation looks like an interesting way to handle this.
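
For reference, the name*0 / name*1 form is the parameter value
continuation syntax from RFC 2231: a receiver concatenates the numbered
sections in order to get the full value. A sketch of that reassembly
follows. joinContinuations() is a made-up name, it assumes the parameters
were already parsed into an array with the quotes stripped, and it ignores
the charset/language extensions of the same RFC.

```php
<?php
// Hypothetical helper: merge "name*0", "name*1", ... parameters (the
// continuation syntax from RFC 2231) back into a single "name" value.
function joinContinuations(array $params)
{
    $pieces = array();
    $joined = array();
    foreach ($params as $name => $value) {
        if (preg_match('/^(.+)\*(\d+)$/', $name, $m)) {
            // Numbered section: file it under the base name by its index.
            $pieces[$m[1]][(int) $m[2]] = $value;
        } else {
            $joined[$name] = $value;
        }
    }
    foreach ($pieces as $name => $sections) {
        ksort($sections);               // sections may arrive in any order
        $joined[$name] = implode('', $sections);
    }
    return $joined;
}

$params = array(
    'name*0' => 'Mid-Pgm Assessment Form000000000000000 this is a test and th',
    'name*1' => 'is is a third test and just one more for kicks.doc',
);
print_r(joinContinuations($params));
```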

From RFC 2822:
   There are two limits that this standard places on the number of
   characters in a line. Each line of characters MUST be no more than
   998 characters, and SHOULD be no more than 78 characters, excluding
   the CRLF.

   The 998 character limit is due to limitations in many implementations
   which send, receive, or store Internet Message Format messages that
   simply cannot handle more than 998 characters on a line. Receiving
   implementations would do well to handle an arbitrarily large number
   of characters in a line for robustness sake. However, there are so
   many implementations which (in compliance with the transport
   requirements of [RFC2821]) do not accept messages containing more
   than 1000 character including the CR and LF per line, it is important
   for implementations not to create such messages.

   The more conservative 78 character recommendation is to accommodate
   the many implementations of user interfaces that display these
   messages which may truncate, or disastrously wrap, the display of
   more than 78 characters per line, in spite of the fact that such
   implementations are non-conformant to the intent of this
   specification (and that of [RFC2821] if they actually cause
   information to be lost). Again, even though this limitation is put on
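   messages, it is encumbant upon implementations which display messages
   to handle an arbitrarily large number of characters in a line
   (certainly at least up to the 998 character limit) for the sake of
   robustness.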


Since the character limit is a "MUST be no more than 998" and a
"SHOULD be no more than 78", I think there are the following options:

 - use spaces instead of tabs to indent continuation lines on MIME part 
   headers
 - start a new continuation line each time a semi-colon is encountered 
   outside of a quoted-string unless it is the trailing character
 - limit each of these lines to 998 or 78:

   - either truncate the value portion of the header attribute to make the 
     overall length of the line less than 998

     or

   - use the attribute_key*<n> syntax to break up quoted-strings so that 
     no line exceeds 78 characters
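
The second option needs a splitter that ignores semi-colons inside a
quoted-string. One possible way to do that is a simple character scan
rather than a regex. splitParams() below is a hypothetical helper, shown
only to illustrate the idea:

```php
<?php
// Hypothetical helper: split a header value on ';' but leave any ';'
// inside a quoted-string alone. Backslash escapes inside quotes are honoured.
function splitParams($text)
{
    $parts = array();
    $current = '';
    $inQuotes = false;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($inQuotes && $ch === '\\' && $i + 1 < $len) {
            $current .= $ch . $text[++$i];   // keep the escaped pair together
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes;
            $current .= $ch;
        } elseif ($ch === ';' && !$inQuotes) {
            $parts[] = trim($current);       // end of one attribute
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $parts[] = trim($current);
    // Drop empty pieces, e.g. the one left behind by a trailing ';'.
    return array_values(array_filter($parts, 'strlen'));
}

print_r(splitParams('attachment; filename="a;b.doc"; size=25088'));
```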

I was thinking of replacing the function with something like the
following (untested against real messages, for demonstration only):

    function wrapHeaders($header, $text, $eol = "\r\n")
    {
        /* Remove any existing linebreaks. */
        $text = trim(preg_replace("/\r?\n\s?/", ' ', $text));
        $header = trim($header);

        /* Short enough already: nothing to fold. */
        if (strlen($header) + 2 + strlen($text) < 75) {
            return $text;
        }

        /* need a more accurate separator regex here (this one also
         * splits on a ';' inside a quoted-string) but this is just for
         * demonstrative purposes */
        $attrs = array_map('trim', preg_split('/;/', $text, -1, PREG_SPLIT_NO_EMPTY));
        $parts = array();
        foreach ($attrs as $i => $attr) {
            /* the first line also carries "Header: "; otherwise it is
             * just a single whitespace indent to account for */
            $offset = ($i == 0) ? strlen($header) + 2 : 1;

            if ($offset + strlen($attr) < 75) {
                $parts[] = $attr;
                continue;
            }

            $attrItems = explode('=', $attr, 2);

            /* if the separator isn't found in the attribute then
             * the value should probably not be folded.
             * just make sure it doesn't exceed 995
             */
            if (count($attrItems) < 2) {
                $parts[] = substr($attr, 0, 995 - $offset);
                continue;
            }

            $attrName = $attrItems[0];
            $attrValue = trim($attrItems[1], '"');
            /* 6 covers the '*', a one-digit index, '=', both quotes and ';' */
            $size = max(1, 75 - ($offset + strlen($attrName) + 6));
            foreach (str_split($attrValue, $size) as $c => $chunk) {
                $parts[] = $attrName . '*' . $c . '="' . $chunk . '"';
            }
        }

        /* one attribute per line, indented with a space; as before, the
         * returned value does not include the header name itself */
        return implode(';' . $eol . ' ', $parts);
    }

I think there should also be some code in place to deal with displaying
these long filenames at the top of the message in HTML.  The anchor text
should be truncated to a certain number of characters, and a title
attribute with the full string should be added.
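
Something along these lines, as a sketch only. attachmentLink() and the
example URL are invented for illustration, the 40-character cutoff is
arbitrary, and lengths are counted in bytes, so multi-byte filenames
would need more care:

```php
<?php
// Hypothetical helper: build a download link whose visible text is cut to
// $max characters while the title attribute carries the full filename.
function attachmentLink($url, $filename, $max = 40)
{
    $label = $filename;
    if (strlen($filename) > $max) {
        $label = substr($filename, 0, $max - 3) . '...';
    }
    return '<a href="' . htmlspecialchars($url) . '" title="'
        . htmlspecialchars($filename) . '">' . htmlspecialchars($label) . '</a>';
}

echo attachmentLink('view.php?id=1', 'Mid-Pgm Assessment Form000000000000000.doc'), "\n";
```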

Comments?

-- 
Sam Nicolary


