[Tickets #9201] Re: Check for ISO-8859-1/Windows-1252 improper charset labeling

bugs at horde.org
Thu Aug 26 18:08:48 UTC 2010


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/9201
------------------------------------------------------------------------------
  Ticket             | 9201
  Updated By         | Michael Slusarz <slusarz at horde.org>
  Summary            | Check for ISO-8859-1/Windows-1252 improper charset
                     | labeling
  Queue              | IMP
  Version            | Git master
  Type               | Enhancement
  State              | Feedback
  Priority           | 1. Low
  Milestone          | 5.0
  Patch              |
  Owners             | Jan Schneider, Michael Slusarz, Horde Developers
------------------------------------------------------------------------------


Michael Slusarz <slusarz at horde.org> (2010-08-26 14:08) wrote:

> My guess is that something odd is going on with the DOM
> encoding/loading.  It seems to work perfectly on my system, but
> that could be because I am using en_US.UTF-8.  It might not be
> working properly in, e.g., de or fr locales.
>
> I would suggest experimenting with the charsets in Horde_Domhtml
> (located in the horde/Util package).

For reference: when I view the HTML part in a new window,
Horde_Domhtml is called once.  The initial loadHTML() call fails
because libxml does not auto-detect the encoding ($doc->encoding is
left empty).  The code then falls back to the forced loadHTML() call,
after converting the text to UTF-8.  The charset passed into the
constructor is UTF-8.

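Simplified code (the constructor is called with $charset = 'UTF-8'):

class Horde_Domhtml
{
    public $encoding;

    public function __construct($text, $charset = null)
    {
        $doc = new DOMDocument();

        // First attempt: let libxml try to detect the encoding itself.
        $doc->loadHTML($text);

        // In this case $doc->encoding is empty (nothing was detected).
        $this->encoding = $doc->encoding;

        if (!is_null($charset) && !$doc->encoding) {
            // Fallback: convert the text to UTF-8 and prepend an XML
            // encoding declaration so libxml parses it as UTF-8.
            $doc->loadHTML('<?xml encoding="UTF-8">' .
                Horde_String::convertCharset($text, $charset, 'UTF-8'));
            $this->encoding = 'UTF-8';
        }
    }
}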
