[imp] Signature in UTF-8 [solved]

Jan Schneider jan at horde.org
Thu Feb 7 16:10:21 UTC 2008


Quoting "Daniel A. Ramaley" <daniel.ramaley at DRAKE.EDU>:

>>> Is any conversion of the dump file itself beyond changing the CREATE
>>> DATABASE encoding necessary? Right now if i perform a dump, the
>>> "file" utility reports the result as ASCII text.
>>
>> If you are absolutely sure that you only have ASCII data, this is
>> sufficient. But as soon as you have latin1 aka iso-8859-1 characters
>> above 127 inside your data, this no longer works, because those
>> characters are multibyte in utf-8. And this breaks at least in those
>> cases where we store serialized arrays.
>
> I'll be sure to do some more rigorous tests on the dump file to verify
> that it only contains ASCII; i think some versions of the "file"
> command only examine the first X characters of the file, for some value
> of X. If the dump is actually ISO-8859-1 i can probably just use iconv
> to switch it to UTF-8.

That still wouldn't solve the problem. Serialized PHP arrays record the
length of each string. But PHP versions lower than 6 have no notion of
characters; they treat all strings as binary, so the recorded length is
a byte count. As a result, the same string has different lengths in
different charsets. If you only convert the string contents, the string
lengths stored in the serialized arrays no longer match, and PHP is no
longer able to unserialize them.
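
To make the mismatch concrete, here is a small sketch in Python (not
PHP) that imitates the way PHP serializes a single string, i.e.
s:<byte length>:"<bytes>";. The helper names php_serialize_string and
declared_length_matches are made up for this illustration and are not
part of PHP or Horde; the decode/encode step merely stands in for
running iconv over the whole dump.

```python
def php_serialize_string(raw: bytes) -> bytes:
    """Imitate PHP's serialize() for one string: s:<byte length>:"<bytes>";"""
    return b's:' + str(len(raw)).encode("ascii") + b':"' + raw + b'";'


def declared_length_matches(serialized: bytes) -> bool:
    """Compare the declared byte count with the actual payload size."""
    prefix, rest = serialized.split(b':"', 1)
    declared = int(prefix[2:])   # the number after "s:"
    payload = rest[:-2]          # drop the trailing '";'
    return declared == len(payload)


text = "Grüße"

# Serialized while the data is still ISO-8859-1: 5 characters, 5 bytes.
latin1 = php_serialize_string(text.encode("iso-8859-1"))

# Naive charset conversion of the serialized data as a whole.
# The contents are re-encoded, but the stored length stays at 5.
converted = latin1.decode("iso-8859-1").encode("utf-8")

print(declared_length_matches(latin1))     # lengths agree before conversion
print(declared_length_matches(converted))  # lengths disagree afterwards
```

After the conversion the payload is 7 bytes long ("ü" and "ß" take two
bytes each in UTF-8), while the serialized data still claims 5, which
is the kind of inconsistency that makes unserializing fail.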

Jan.

-- 
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/


