[Tickets #648] NEW: MIME.php wrapHeaders corrupting filenames

bugs at bugs.horde.org bugs at bugs.horde.org
Tue Sep 28 21:58:03 PDT 2004


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/?id=648
-----------------------------------------------------------------------
 Ticket     | 648
 Created By | Michael Slusarz <slusarz at mail.curecanti.org>
 Summary    | MIME.php wrapHeaders corrupting filenames
 Queue      | Horde Framework
 State      | Assigned
 Priority   | 2. Medium
 Type       | Bug
 Owners     | Michael Slusarz
-----------------------------------------------------------------------


Michael Slusarz <slusarz at mail.curecanti.org> (2004-09-28 21:58) wrote:

Under certain circumstances, the following function in the MIME framework
module takes a long filename that contains spaces and replaces one of those
spaces with a tab:

    function wrapHeaders($header, $text, $eol = "\r\n")
    {
        /* Remove any existing linebreaks. */
        $text = preg_replace("/\r?\n\s?/", ' ', $text);

        /* Wrap the line. */
        $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");

        /* Make sure there are no empty lines. */
        $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);

        return substr($line, strlen($header) + 2);
    }

Example:

Horde:
Content-Type: application/msword; name="Mid-Pgm Assessment
        Form000000000000000.doc"
Content-Disposition: attachment; filename="Mid-Pgm Assessment
        Form000000000000000.doc"
Content-Transfer-Encoding: base64

Horde with filename > 78 and no spaces:
Content-Type: application/msword;
        name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Disposition: attachment;
        filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
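
The first Horde case above can be reproduced outside the framework. The
following is only a minimal sketch: it isolates the wordwrap() call from
wrapHeaders() and then unfolds the result the way a receiver would under
RFC 2822 (drop each CRLF that is followed by whitespace). The variable names
are illustrative and not part of MIME.php.

```php
<?php
// Isolates the folding step of wrapHeaders() using the header from the ticket.
$header = 'Content-Disposition';
$text   = 'attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';
$eol    = "\r\n";

// Same call as in MIME.php: wordwrap() replaces the space at the break
// point with CRLF + tab, even when that space is inside the quoted string.
$line   = wordwrap($header . ': ' . $text, 75, $eol . "\t");
$folded = substr($line, strlen($header) + 2);

// Unfolding removes only the CRLF; the tab stays behind in the filename.
$unfolded = preg_replace('/\r\n(?=[ \t])/', '', $folded);

var_dump($unfolded === $text);                              // bool(false)
var_dump(strpos($unfolded, "Assessment\tForm") !== false);  // bool(true)
```

The unfolded value differs from the original only in that the space between
"Assessment" and "Form" has become a tab.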


Here are some examples of how other mailers construct this:

Pine:
Content-Type: APPLICATION/msword; name="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename="Mid-Pgm Assessment
Form000000000000000.doc"

Pine with a filename > 78:
Content-Type: APPLICATION/msword; name*0="Mid-Pgm Assessment
Form000000000000000 this is a test and this is another test and th";
        name*1="is is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0="Mid-Pgm Assessment
Form000000000000000 this is a test and this is another test and th";
        filename*1="is is a third test and just one more for kicks.doc"

Pine with a filename > 78 and no spaces:
Content-Type: APPLICATION/msword;
name*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_anoth
er_test_and_;
        name*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment;
filename*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_a
nother_test_and_;
        filename*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"


Mulberry:
Content-Type: application/msword;
 name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm Assessment Form000000000000000.doc"; size=25088

Mulberry with a filename > 78:
Content-Type: application/msword;
 name="Mid-Pgm Assessment Form000000000000000 this is a test and this is
another test and this is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm Assessment Form000000000000000 this is a test and this is
another test and this is a third test and just one more for kicks.doc";
 size=24064

Mulberry with a filename > 78 and no spaces:
Content-Type: application/msword;
 name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_anoth
er_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_a
nother_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc";
 size=24064


The following patch replaces the tab character with a space. That at least
avoids embedding a funky character in the quoted-string attachment filename,
which some mailers cannot make sense of and therefore include literally.
It does not, however, deal with a long filename comprised of only
alphanumeric characters (that is, one with no spaces to break on):

diff -r1.132 MIME.php
809c809
<         $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");
---
>         $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . " ");
812c812
<         $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/" . $eol . "\t/", $line);
---
>         $line = preg_replace("/" . $eol . " \s*" . $eol . " /", "/" . $eol . " /", $line);
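
As a rough check of what the patch does and does not fix, the sketch below
folds with CRLF + space instead of CRLF + tab. It is a standalone
illustration, not Horde code; the 100-character filename is made up for the
test, and the unfolding regex is a simplified reading of RFC 2822.

```php
<?php
$eol = "\r\n";

// Case 1: filename with spaces, folded with a space as the patch does.
$spaced   = 'Content-Disposition: attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';
$folded   = wordwrap($spaced, 75, $eol . ' ');
$unfolded = preg_replace('/\r\n(?=[ \t])/', '', $folded);
var_dump($unfolded === $spaced);   // bool(true): the original space survives

// Case 2: a made-up filename with no spaces. wordwrap() has nowhere to
// break inside the value, so the continuation line stays too long.
$long       = 'Content-Disposition: attachment; filename="' . str_repeat('x', 100) . '.doc"';
$foldedLong = wordwrap($long, 75, $eol . ' ');
$longest    = max(array_map('strlen', explode($eol, $foldedLong)));
var_dump($longest > 78);           // bool(true)
```

So the space-based fold round-trips cleanly, but the 78-character
recommendation is still exceeded whenever the value has no spaces.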


The Pine name*<n> notation (the parameter value continuations defined in
RFC 2231) looks like an interesting way to handle this.

From RFC 2822:
   There are two limits that this standard places on the number of
   characters in a line. Each line of characters MUST be no more than
   998 characters, and SHOULD be no more than 78 characters, excluding
   the CRLF.

   The 998 character limit is due to limitations in many implementations
   which send, receive, or store Internet Message Format messages that
   simply cannot handle more than 998 characters on a line. Receiving
   implementations would do well to handle an arbitrarily large number
   of characters in a line for robustness sake. However, there are so
   many implementations which (in compliance with the transport
   requirements of [RFC2821]) do not accept messages containing more
   than 1000 character including the CR and LF per line, it is important
   for implementations not to create such messages.

   The more conservative 78 character recommendation is to accommodate
   the many implementations of user interfaces that display these
   messages which may truncate, or disastrously wrap, the display of
   more than 78 characters per line, in spite of the fact that such
   implementations are non-conformant to the intent of this
   specification (and that of [RFC2821] if they actually cause
   information to be lost). Again, even though this limitation is put on
   messages, it is encumbant upon implementations which display messages


Since the character limit is a "MUST be no more than 998" but only a
"SHOULD be no more than 78", I think there are the following options:

 - use spaces instead of tabs to indent continuation lines on MIME part
   headers
 - start a new continuation line each time a semicolon is encountered
   outside of a quoted-string, unless it is the trailing character
 - limit each of these lines to 998 or 78 characters:

   - either truncate the value portion of the header attribute to make the
     overall length of the line less than 998

     or

   - use the attribute_key*<n> syntax to break up quoted-strings so that
     no line exceeds 78 characters
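
The attribute_key*<n> option from the list above can be sketched in
isolation. This is only an illustration under stated assumptions:
splitParameter() is a hypothetical helper (not an existing Horde function),
the value is assumed to contain no double quotes or backslashes that would
need escaping, and fewer than 100 sections are assumed so the index fits in
two digits.

```php
<?php
/**
 * Hypothetical helper: split one parameter value into name*0, name*1, ...
 * sections so that each folded line, including the trailing ';', stays
 * within $maxLen characters.
 */
function splitParameter($name, $value, $maxLen = 78, $indent = ' ')
{
    // Overhead per line: indent, name, and the decoration *NN=""; around
    // the chunk (two digits reserved for the section index).
    $overhead = strlen($indent) + strlen($name) + strlen('*00="";');
    $width    = max(1, $maxLen - $overhead);

    $parts = array();
    foreach (str_split($value, $width) as $n => $chunk) {
        $parts[] = $indent . $name . '*' . $n . '="' . $chunk . '"';
    }
    return $parts;
}

$value = 'Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_'
       . 'another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc';
$parts = splitParameter('filename', $value);

// Joined the way they would appear after "Content-Disposition: attachment;"
echo implode(";\r\n", $parts), "\r\n";
```

Splitting at fixed offsets is what makes this work for values without
spaces, which is the case wordwrap() cannot handle.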

I was thinking of replacing the function with something like the following
(a sketch only; it has not been tested against real messages):

    function wrapHeaders($header, $text, $eol = "\r\n")
    {
        /* Remove any existing linebreaks. */
        $text = trim(preg_replace("/\r?\n\s?/", ' ', $text));
        $header = trim($header);

        /* Short headers need no folding. */
        if ((strlen($header) + 2 + strlen($text)) <= 75) {
            return $text;
        }

        /* A more accurate separator regex is needed here (this one also
         * splits on semicolons inside quoted strings); it is just for
         * demonstrative purposes. */
        $attrs = array_map('trim', preg_split('/;/', $text, -1, PREG_SPLIT_NO_EMPTY));

        $lines = array();
        for ($i = 0; $i < count($attrs); $i++) {
            /* The first line has to account for the length of the header;
             * continuation lines only for a single whitespace indent. */
            $prefix = ($i == 0) ? $header . ': ' : ' ';
            $offset = strlen($prefix);

            if (($offset + strlen($attrs[$i])) < 75) {
                $lines[] = $prefix . $attrs[$i];
                continue;
            }

            $attrItems = explode('=', $attrs[$i], 2);

            /* If the separator isn't found in the attribute then the
             * value should probably not be folded. Just make sure it
             * doesn't exceed 995. */
            if (count($attrItems) < 2) {
                $lines[] = $prefix . substr($attrs[$i], 0, 995 - $offset);
                continue;
            }

            $attrName = $attrItems[0];
            $attrValue = trim($attrItems[1], '"');

            /* Reserve 6 characters for the *<n>="..."; decoration. */
            $width = max(1, 75 - ($offset + strlen($attrName) + 6));
            $chunks = str_split($attrValue, $width);
            for ($c = 0; $c < count($chunks); $c++) {
                $lines[] = (($c == 0) ? $prefix : ' ') . "$attrName*$c=" . '"' . $chunks[$c] . '"';
            }
        }

        /* As before, return the folded value without the header name. */
        return substr(implode(';' . $eol, $lines), strlen($header) + 2);
    }

I think there should also be some code in place to deal with displaying
these long filenames at the top of the message in HTML.  I think the
anchor text should be truncated to a certain number of characters, and a
title attribute with the full string should be added.
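
A minimal sketch of that display idea, assuming the full name is carried in
the link's title attribute. attachmentLink(), the 40-character limit, and the
example URL are all hypothetical and not taken from IMP or Horde; substr()
works on bytes, so a multibyte filename would need a multibyte-aware cut.

```php
<?php
/**
 * Hypothetical helper: render a download link whose visible text is
 * truncated to $maxLen characters while the title keeps the full name.
 */
function attachmentLink($url, $filename, $maxLen = 40)
{
    $label = $filename;
    if (strlen($label) > $maxLen) {
        // Keep room for the three-dot marker inside the limit.
        $label = substr($label, 0, $maxLen - 3) . '...';
    }

    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '"'
        . ' title="' . htmlspecialchars($filename, ENT_QUOTES) . '">'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

$name = 'Mid-Pgm Assessment Form000000000000000 this is a test and this is another test.doc';
echo attachmentLink('/download.php?id=1', $name), "\n";
```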

Comments?

-- 
Sam Nicolary



