[horde] switching to UTF8 database
Andrew Morgan
morgan at orst.edu
Wed Feb 8 02:38:13 UTC 2012
On Tue, 7 Feb 2012, Jan Schneider wrote:
>
> Zitat von Andrew Morgan <morgan at orst.edu>:
>
>> On Tue, 7 Feb 2012, Jan Schneider wrote:
>>
>>>
>>> Zitat von Andrew Morgan <morgan at orst.edu>:
>>>
>>>> Is this really a problem for latin1 to utf8? If there are no conversions
>>>> that result in a double-byte character, then I think it would be okay.
>>>
>>> That has nothing to do with the charset though. If you data only contains
>>> ascii data, it doesn't matter to which charset you convert, indeed. As
>>> soon it latin1-specific characters, it will break.
>>
>> Can you explain why? How does the string length change? Perhaps an
>> example would help me understand. :)
>
> latin1, windows-1251, iso-8859-1, etc. are single-byte charsets, each
> character is a byte.
> UTF-8 is multi-byte charset with variable character lengths. ASCII characters
> in UTF-8 are still single bytes with the same code like in US-ASCII, but any
> non-ASCII characters like äöuáé etc take at least two bytes.
I made a test table and PHP script to play with this. Oh god this problem
is ugly!
I don't know if this will come out right, but here is what I found.
Using these starting values:
val=äöuáé
serval=s:9:"äöuáé";
1. Client: latin1 Table: latin1
stored in MySQL: s:9:"äöuáé";
PHP can correctly deserialize previously stored values
2. Client: latin1 Table: utf8
stored in MySQL: s:9:"äöuáé";
PHP can correctly deserialize previously stored values
3. Client: utf8 Table: utf8
stored in MySQL: s:9:"äöuáé";
PHP cannot deserialize latin1 stored values
4. Client: utf8 Table: latin1
stored in MySQL: s:9:"äöuáé";
PHP cannot deserialize latin1 stored values
So if I am understanding this right, once I change the CLIENT charset from
latin1 to utf8, everything that is not ASCII will be wrong.
Is there any tool to scan all my tables for non-ASCII values? Maybe I'll
be lucky and my users will have only used ASCII....
Andy
More information about the horde
mailing list