[dev] migration script for data with bad charset in SQL database

Jan Schneider jan at horde.org
Thu Sep 8 12:45:00 UTC 2011


Zitat von Vilius ?umskas <vilius at lnk.lt>:

> Hello,
>
> I have upgraded Whups recently and stumbled on a charset bug which I  
> never realized still exist in most of the applications.
>
> When correct MySQL charset handling was implemented in Horde_Db it  
> automatically enabled mysql <-> php charset conversion (which is  
> good). But all old data was written into the database like latin1  
> (which is the default in most older MySQL installations, check that  
> by connecting with console mysql client to your server, running  
> "show variables like '%char%';" and looking for  
> "character_set_connection"). This was the case even if you set UTF-8  
> in Horde 3.
>
> So now, when for example I have UTF-8 as my database charset in  
> Horde 4, I get plain UTF-8 characters appearing in Whups, Agora and  
> probably other ex-MDB2 applications I didn't try. I've never noticed  
> this with ex-PEAR::DB applications as I was running modified  
> PEAR::DB version which took care of UTF-8 handling in Horde 3 the  
> correct way. But I assume that for others with standard PEAR::DB  
> this still could be the problem when upgrading to H4.
>
> What needs to be done is a script, which takes all the data, dumps  
> it at default *MySQL* charset which was used with H3. And later  
> imported again through Horde_Db which should take care of data  
> automatically. I can volunteer to do so. However there is couple of  
> issues must be taken care of first.
>
> 1. Script must be dependent on default MySQL connection charset used  
> with H3. Do we assume that Horde administrator knows what he is  
> doing and he must input the charset manually? Or do we assume that  
> H3 was installed on the same server the script will be run on and  
> get it automatically from MySQL?

I'd say let's make this like in the existing convert-to-utf8 scripts  
and prompt for both the input and output charset, but default to  
iso-8859-1 for input and utf-8 for output.

> 2. Database/table charsets for horde tables needs to be changed to  
> match current Horde 4 charset setting in conf.php. Do you think the

Only if the target charset is indeed changed. Or does this problem  
only show up when switching to a different charset? In that case, the  
existing convert-to-utf8 script would do, no?

> following order for this is OK?
>     a) script dump all the data.
>    b) renames old database for backup purposes.
>    c) creates current H3 *schema* from backed-up old database with  
> correct charsets (like CREATE (....) ENGINE=.... DEFAULT  
> CHARSET=utf8).

That's not supported by Horde_Db's migrations.

>    d) imports data.
>
> 3. Should this script be the part of the upgrade process in every  
> application? But giving the above I'm not sure it is really  
> possible, no?

I'm still don't understand if this is necessary in any case, or only  
when switching to utf8. If the latter, this should be separated  
conversion script so that the conversion could be done at any time.

> As said, I imagine this bug would affect a fare number of users  
> running MySQL with standard configuration files. For example if you  
> had init-connect='SET NAMES utf8' in [mysql] section of your my.cnf  
> file it would not affect you.

I've only seen one such issue locally with Wicked recently, but this  
could as well have been some botched conversion while playing with  
migrations and charset conversions. I could probably test this again  
though.

Jan.

-- 
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/



More information about the dev mailing list