[Tickets #3101] RESOLVED: Wrong mime-encoding of subject header

Wed Dec 7 21:32:04 PST 2005

DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/?id=3101
-----------------------------------------------------------------------
 Ticket             | 3101
 Updated By         | Michael Slusarz <slusarz at mail.curecanti.org>
 Summary            | Wrong mime-encoding of subject header
 Queue              | IMP
 Version            | 4.0.4
-State              | Assigned
+State              | Bogus
 Priority           | 1. Low
 Type               | Bug
 Owners             | Michael Slusarz
-----------------------------------------------------------------------

Michael Slusarz <slusarz at mail.curecanti.org> (2005-12-07 21:32) wrote:

> When I write a new mail with umlaut characters in the subject, 
> sometimes spaces seem to be missing because the subject seems to be 
> wrongly encoded. For example with:
>
> Subject: Infos über kIZ für die Nightline
>
> IMP encodes it as:
>
> Subject: Infos =?iso-8859-1?b?/GJlcg==?= kIZ =?iso-8859-1?b?Zvxy?= die
>        Nightline
>
> Note that before "Nightline" there is a new-line and a tab.

Yes, and this is perfectly acceptable.  It is how you fold a long MIME
header.  The MUA should convert each newline, together with the leading
white space on the continuation line, to a single space.
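
A minimal sketch of that unfolding step, in Python.  This is not Horde or
IMP code; the function name unfold() is made up for illustration, and it
follows the "newline plus leading white space becomes one space" rule
stated above:

```python
import re

def unfold(raw_header: str) -> str:
    # Replace each line break, plus the white space that starts the
    # continuation line, with a single space.
    return re.sub(r"\r?\n[ \t]+", " ", raw_header)

# The folded header from the report: a line break and a tab before "Nightline".
folded = (
    "Subject: Infos =?iso-8859-1?b?/GJlcg==?= kIZ =?iso-8859-1?b?Zvxy?= die\r\n"
    "\tNightline"
)
print(unfold(folded))
```

After unfolding, "die" and "Nightline" are separated by exactly one space,
so no space goes missing.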

> In RFC2047 it is written:
>
>
>   When displaying a particular header field that contains multiple
>   'encoded-word's, any 'linear-white-space' that separates a pair of
>   adjacent 'encoded-word's is ignored.  (This is to allow the use of
>   multiple 'encoded-word's to represent long strings of unencoded text,
>   without having to separate 'encoded-word's where spaces occur in the
>   unencoded text.)

Correct.

> That is, I think that you need to put the newline between two encoded 
> words so that the newline is ignored.

No, this is not what the RFC says.  The RFC says *if* you put a newline
between two encoded words, then the space is ignored.  So it allows you to
break in the middle of a word, for example, if that encoded word would cause
the line to exceed 78 characters (plus CRLF) in length.  However, there is
no requirement to break a line this way.

> Mutt does encode the same 
> subject like this:
>
> Subject: Infos =?iso-8859-1?Q?=FCber_kIZ_f=FC?=
>        =?iso-8859-1?Q?r?= die Nightline

It looks like mutt uses a different, less complex algorithm.  It encodes
spaces within the encoded string using '_'.  IMP (actually Horde) only
encodes spaces when two consecutive words both contain characters that
require encoding. 

This results in (at least in my opinion) strings that are easier to read
when they are viewed undecoded (i.e. in the message source).  The IMP
string of:
  Infos =?iso-8859-1?b?/GJlcg==?= kIZ =?iso-8859-1?b?Zvxy?= die Nightline
is logically viewed by me as:
  Infos <some encoded word> kIZ <some encoded word> die Nightline

While the mutt way of doing things:
  Infos =?iso-8859-1?Q?=FCber_kIZ_f=FC?= =?iso-8859-1?Q?r?= die Nightline
is logically viewed by me as:
  Infos <some encoded word> <some encoded word> die Nightline

The 'kIZ' in the mutt example is completely lost in the encoded stuff.

Long story short - both ways of encoding are correct according to the RFCs. 
So if your mail reader is entering extra spaces between words with either
encoding, then your mail reader is broken.
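
One way to check this is to run both header values through an independent
RFC 2047 decoder.  The sketch below uses Python's standard email.header
module (decode_header and make_header); the helper name decode_subject()
is made up for illustration, and the two strings are the unfolded IMP and
mutt subjects quoted above:

```python
from email.header import decode_header, make_header

def decode_subject(value: str) -> str:
    # decode_header() splits the value into (bytes, charset) chunks and
    # drops white space between adjacent encoded words; make_header()
    # joins the chunks back into one Unicode string.
    return str(make_header(decode_header(value)))

imp = "Infos =?iso-8859-1?b?/GJlcg==?= kIZ =?iso-8859-1?b?Zvxy?= die Nightline"
mutt = "Infos =?iso-8859-1?Q?=FCber_kIZ_f=FC?= =?iso-8859-1?Q?r?= die Nightline"

print(decode_subject(imp))
print(decode_subject(mutt))
```

Both calls should print the original subject with every space intact.  In
the mutt form, the space between the two adjacent encoded words is dropped
on purpose, which is how "fü" and "r" are rejoined into "für".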