[imp] Signature in UTF-8 [solved]
Jan Schneider
jan at horde.org
Thu Feb 7 16:10:21 UTC 2008
Quoting "Daniel A. Ramaley" <daniel.ramaley at DRAKE.EDU>:
>>> Is any conversion of the dump file itself beyond changing the CREATE
>>> DATABASE encoding necessary? Right now if i perform a dump, the
>>> "file" utility reports the result as ASCII text.
>>
>> If you are absolutely sure that you only have ASCII data, this is
>> sufficient. But as soon as you have latin1 aka iso-8859-1 characters
>> above 127 inside your data, this no longer works, because those
>> characters are multibyte in utf-8. And this breaks at least in those
>> cases where we store serialized arrays.
>
> I'll be sure to do some more rigorous tests on the dump file to verify
> that it only contains ASCII; i think some versions of the "file"
> command only examine the first X characters of the file, for some value
> of X. If the dump is actually ISO-8859-1 i can probably just use iconv
> to switch it to UTF-8.
That still wouldn't solve the problem. Serialized PHP arrays store the
length of each string. But PHP versions lower than 6 don't really know
about characters; they treat all strings as binary, so the stored
length is a byte count. As a result, the same string has different
lengths in different charsets. If you only convert the string contents,
the string lengths stored in the serialized arrays no longer match, and
PHP is no longer able to unserialize them.
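To make the mismatch concrete, here is a minimal sketch. It is written in Python rather than PHP and is not part of Horde. The helper names and the sample value are made up for illustration; the only thing taken from PHP is the s:N:"..."; notation that serialize() uses for strings, where N is the byte length.

```python
# Illustrative sketch only: these helpers are hypothetical and merely mimic
# the s:<byte length>:"<bytes>"; form that PHP's serialize() uses for strings.

def php_serialize_string(raw: bytes) -> bytes:
    """Build the serialized form of one string; the prefix counts bytes."""
    return b's:' + str(len(raw)).encode('ascii') + b':"' + raw + b'";'


def declared_length_matches(serialized: bytes) -> bool:
    """Check whether the declared length equals the actual payload length."""
    declared, rest = serialized[2:].split(b':', 1)
    payload = rest[1:-2]  # drop the opening quote and the closing quote + semicolon
    return int(declared) == len(payload)


# Sample value with two characters above 127 (u-umlaut and sharp s).
text = "Gr\u00fc\u00dfe"

# Serialized while the data is ISO-8859-1: 5 bytes.
latin1 = php_serialize_string(text.encode("iso-8859-1"))

# Serialized from scratch while the data is UTF-8: 7 bytes.
utf8 = php_serialize_string(text.encode("utf-8"))

# What a plain charset conversion of the dump does: the payload bytes change,
# but the length prefix written earlier stays at 5.
converted = latin1.decode("iso-8859-1").encode("utf-8")

print(declared_length_matches(latin1))     # lengths agree
print(declared_length_matches(utf8))       # lengths agree
print(declared_length_matches(converted))  # declared 5, payload is 7 bytes
```

So any serialized value has to be unserialized in the old charset, converted, and serialized again, rather than converted in place.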
Jan.
--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/