[Tickets #9617] Re: db_migrate and incorrect charset handling

bugs at horde.org bugs at horde.org
Mon Apr 4 14:24:39 UTC 2011


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/9617
------------------------------------------------------------------------------
  Ticket             | 9617
  Updated By         | leena.heino at uta.fi
  Summary            | db_migrate and incorrect charset handling
  Queue              | Horde Framework Packages
  Version            | Git master
  Type               | Bug
  State              | Resolved
  Priority           | 1. Low
  Milestone          |
  Patch              |
  Owners             | Michael Rubinsky
------------------------------------------------------------------------------


leena.heino at uta.fi (2011-04-04 14:24) wrote:

>> PHP's manual suggests that one should not assume that
>> strtolower()/strtoupper() work correctly with
>> multibyte charsets like UTF-8.
>
> Where does it say that? I don't see any such suggestions in the man pages.

It does not say it in so many words, or at least it says it only
ambiguously: "Note that 'alphabetic' is determined by the current
locale"

But if we look at PHP's source code for strtoupper(), it operates
byte by byte. Therefore it will not work correctly with UTF-8 encoded
strings that contain non-ASCII characters.

Excerpt from ext/standard/string.c:
char *php_strtoupper(char *s, size_t len)
{
         unsigned char *c, *e;

         c = (unsigned char *)s;
         e = (unsigned char *)c+len;

         while (c < e) {
                 *c = toupper(*c);
                 c++;
         }
         return s;
}

Non-ASCII characters in UTF-8 are encoded as multiple bytes, and
toupper() sees each of those bytes in isolation. Therefore PHP's
strtoupper()/strtolower() will not work correctly on UTF-8 encoded
strings that contain non-ASCII characters.
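
To make the effect concrete, here is a minimal sketch of the same
byte-wise loop as a standalone C function. The name bytewise_upper()
is hypothetical and used only for illustration; it is not part of PHP.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PHP's API): uppercases a buffer one byte at
 * a time, the same way the excerpt above does. Each byte is passed to
 * toupper() on its own, so multi-byte UTF-8 sequences are never seen
 * as a single character.
 */
char *bytewise_upper(char *s, size_t len)
{
    unsigned char *c = (unsigned char *)s;
    unsigned char *e = c + len;

    while (c < e) {
        *c = (unsigned char)toupper(*c);
        c++;
    }
    return s;
}
```

In the default "C" locale, calling this on the UTF-8 bytes of "äiti"
(C3 A4 69 74 69) yields C3 A4 49 54 49, that is "äITI": the ASCII
letters are converted but the two-byte "ä" is left alone, whereas the
correct result "ÄITI" would be C3 84 49 54 49. Under a single-byte
locale such as ISO-8859-1 the outcome may be worse, because the
lower-casing counterpart could map a lead byte like C3 to E3 and
leave an invalid UTF-8 sequence behind.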






