[Tickets #9617] Re: db_migrate and incorrect charset handling

bugs at horde.org bugs at horde.org
Mon Apr 4 14:33:13 UTC 2011


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/9617
------------------------------------------------------------------------------
  Ticket             | 9617
  Updated By         | Jan Schneider <jan at horde.org>
  Summary            | db_migrate and incorrect charset handling
  Queue              | Horde Framework Packages
  Version            | Git master
  Type               | Bug
-State              | Resolved
+State              | Assigned
  Priority           | 1. Low
  Milestone          |
  Patch              |
-Owners             | Michael Rubinsky
+Owners             | Jan Schneider, Michael Rubinsky
------------------------------------------------------------------------------


Jan Schneider <jan at horde.org> (2011-04-04 14:33) wrote:

>>> PHP's manual suggests that one should not assume that
>>> strtolower()/strtoupper() work correctly with
>>> multibyte charsets like UTF-8.
>>
>> Where does it say that? I don't see any such suggestions in the man pages.
>
> It does not say it in so many words, or at least it says so
> ambiguously: "Note that 'alphabetic' is determined by the current
> locale"

Which is exactly what we want.

> But if we look at PHP's source code for strtoupper(), it works byte
> by byte, therefore it will not work correctly with UTF-8 encoded
> strings that contain non-ASCII characters.

So the manual is plain wrong.

> Excerpt from ext/standard/string.c:
> char *php_strtoupper(char *s, size_t len)
> {
>         unsigned char *c, *e;
>
>         c = (unsigned char *)s;
>         e = (unsigned char *)c+len;
>
>         while (c < e) {
>                 *c = toupper(*c);
>                 c++;
>         }
>         return s;
> }
>
> The non-ASCII characters in UTF-8 are multibyte. Therefore using
> PHP's strtoupper()/strtolower() will not work correctly with UTF-8
> encoded strings that contain non-ASCII characters.

Thanks for tracking this down so deep.
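
For reference, the byte-wise behaviour can be reproduced outside PHP
with a small C sketch. The helper name bytewise_toupper() is
hypothetical and only mirrors the loop quoted above; it is not PHP's
actual API. The sketch assumes the default "C" locale, i.e. no
setlocale() call has been made.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the quoted php_strtoupper() loop:
 * every byte is passed to toupper() on its own, with no knowledge
 * of multibyte sequences.
 */
static char *bytewise_toupper(char *s, size_t len)
{
    unsigned char *c = (unsigned char *)s;
    unsigned char *e = c + len;

    while (c < e) {
        /* In the "C" locale only the bytes 'a'..'z' are changed. */
        *c = (unsigned char)toupper(*c);
        c++;
    }
    return s;
}
```

Given the UTF-8 string "caf\xc3\xa9" ("café"), the helper yields
"CAF\xc3\xa9": the ASCII letters are converted, but the two bytes that
encode "é" are left untouched, so the result is "CAFé" rather than
"CAFÉ". Under a single-byte locale the per-byte conversion may
additionally change individual bytes of a multibyte sequence.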





