[Tickets #4340] Problems with german umlaut

bugs@bugs.horde.org bugs at bugs.horde.org
Tue Aug 29 01:31:06 PDT 2006


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/?id=4340
-----------------------------------------------------------------------
 Ticket             | 4340
 Updated By         | s_gatterbauer at idlm.net
 Summary            | Problems with german umlaut
 Queue              | Jonah
 Type               | Enhancement
 State              | Feedback
 Priority           | 1. Low
 Owners             | 
-----------------------------------------------------------------------


s_gatterbauer at idlm.net (2006-08-29 01:31) wrote:

This evening I will try something like this in lib/Jonah.php:

look in the first 80 characters of the source file (which should contain the
XML declaration, ordered: version, encoding, standalone)
for the string after  encoding=  (which should be the charset).


        if (preg_match('/.*;\s?charset="?([^"]*)/', $content_type, $match)) {
            $result['charset'] = $match[1];
+        } else {
+            // Search only the first 80 characters, where the XML declaration sits.
+            $t_head  = substr($result['body'], 0, 80);
+            $t_start = strpos($t_head, 'encoding=');
+            if ($t_start !== false) {
+                // Skip 'encoding=' (9 characters) and the opening double quote.
+                $t_value = substr($t_head, $t_start + 10);
+                $t_stop  = strpos($t_value, '"');
+                if ($t_stop !== false) {
+                    $result['charset'] = strtolower(trim(substr($t_value, 0, $t_stop)));
+                }
+            }
        }

Not very inventive (I do not know PHP), but it should extract the right
value.
Yes, there is a problem with $t_stop if the encoding value is enclosed in
single quotes (I will look into it).
I am not sure about  preg_match,  but the following should also work:

        if (preg_match('/.*;\s?charset="?([^"]*)/', $content_type, $match)) {
            $result['charset'] = $match[1];
+        } elseif (preg_match('/.*\s?encoding="?([^"]*)/', substr($result['body'], 0, 80), $match)) {
+            $result['charset'] = $match[1];
        }
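
For the single-quote case mentioned above, a minimal sketch of a pattern
that accepts either quote style might look like the following. The function
name xml_declared_charset is hypothetical and not part of Jonah; the
80-character window is taken from the proposal above, and the set of
characters allowed in the encoding name is an assumption.

```php
<?php
// Hypothetical helper (not part of Jonah): returns the lowercased encoding
// named in the XML declaration of $body, or null if none is found.
function xml_declared_charset($body)
{
    // Only the first 80 characters are examined, as in the proposal above.
    $head = substr($body, 0, 80);

    // (["\']) captures the opening quote; the backreference \1 requires the
    // closing quote to be of the same kind, so both "..." and '...' work.
    $pattern = '/<\?xml[^>]*\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';

    if (preg_match($pattern, $head, $match)) {
        return strtolower($match[2]);
    }
    return null;
}
```

In the else branch it could be used by assigning the return value to
$result['charset'] only when it is not null, so that a feed without an
encoding declaration keeps whatever default applied before.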





