[Tickets #9617] Re: db_migrate and incorrect charset handling
bugs at horde.org
Mon Apr 4 14:33:13 UTC 2011
DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.
Ticket URL: http://bugs.horde.org/ticket/9617
------------------------------------------------------------------------------
Ticket | 9617
Updated By | Jan Schneider <jan at horde.org>
Summary | db_migrate and incorrect charset handling
Queue | Horde Framework Packages
Version | Git master
Type | Bug
-State | Resolved
+State | Assigned
Priority | 1. Low
Milestone |
Patch |
-Owners | Michael Rubinsky
+Owners | Jan Schneider, Michael Rubinsky
------------------------------------------------------------------------------
Jan Schneider <jan at horde.org> (2011-04-04 14:33) wrote:
>>> PHP's manual suggests that one should not assume that
>>> strtolower()/strtoupper() work correctly with
>>> multibyte charsets like UTF-8.
>>
>> Where does it say that? I don't see any such suggestions in the man pages.
>
> It does not say it in so many words, or at least says it
> ambiguously: "Note that 'alphabetic' is determined by the current
> locale"
Which is exactly what we want.
> But if we look at PHP's source code for strtoupper(), it works
> byte by byte; therefore it will not work correctly with UTF-8
> encoded strings that contain non-ASCII characters.
So the manual is plain wrong.
> Excerpt from ext/standard/string.c:
> char *php_strtoupper(char *s, size_t len)
> {
>     unsigned char *c, *e;
>
>     c = (unsigned char *)s;
>     e = (unsigned char *)c+len;
>
>     while (c < e) {
>         *c = toupper(*c);
>         c++;
>     }
>     return s;
> }
>
> The non-ASCII characters in UTF-8 are multibyte. Therefore using
> PHP's strtoupper()/strtolower() will not work correctly with UTF-8
> encoded strings with non-ASCII characters.
Thanks for tracking this down so deep.
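To illustrate the byte-wise behaviour described in the quoted excerpt, here is a minimal C sketch. It is not part of the ticket or of any Horde fix. It assumes the program runs in the default "C" locale, where toupper() leaves bytes above 0x7F unchanged, and the names bytewise_upper and check_utf8_untouched are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Uppercase a buffer one byte at a time, in the same shape as the
 * php_strtoupper() excerpt quoted above. Hypothetical helper name. */
static char *bytewise_upper(char *s, size_t len)
{
    unsigned char *c = (unsigned char *)s;
    unsigned char *e = c + len;

    while (c < e) {
        *c = (unsigned char)toupper(*c);
        c++;
    }
    return s;
}

/* Assumption: default "C" locale (no setlocale() call). The UTF-8
 * encoding of U+00E9 is the two bytes 0xC3 0xA9; neither byte is an
 * ASCII lowercase letter, so the loop passes both through as-is. */
static void check_utf8_untouched(void)
{
    char buf[] = "h\xc3\xa9llo";

    bytewise_upper(buf, strlen(buf));

    /* ASCII letters are uppercased, the two-byte sequence is not. */
    assert(strcmp(buf, "H\xc3\xa9LLO") == 0);
}
```

In a single-byte locale such as ISO-8859-1, the same loop may instead rewrite individual bytes of a multibyte sequence and corrupt the string. On the PHP side, mb_strtoupper() and mb_strtolower() from the mbstring extension take an explicit encoding argument and operate on characters instead of bytes.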