[dev] [commits] Horde branch master updated. fa8d644f8ca59a43056e5eae668ff1699d36be24

Jan Schneider jan at horde.org
Mon Dec 13 16:45:55 UTC 2010


Quoting Vilius Šumskas <vilius at lnk.lt>:

>> >> commit fa8d644f8ca59a43056e5eae668ff1699d36be24
>> >> Author: Jan Schneider <jan at horde.org>
>> >> Date:   Tue Nov 30 15:41:59 2010 +0100
>> >>
>> >>     Remove charset methods.
>> >>
>> >>     They are only implemented in the MySQL drivers anyway and using SET
>> >>     NAMES causes more problems than it solves. Actually it breaks the
>> >>     current share driver.
>> >
>> > Does this mean that it will not be implemented and we are stuck with
>> > ISO-8859-1 connection charsets forever? Horde is the only
>> > professional application which doesn't use charsets correctly at the
>> > DB level.
>>
>> DB charset handling has always worked fine in Horde. I removed this
>
> True if you don't manage Horde's databases with external tools or if  
> you don't integrate it with other applications.
>
> As far as I remember, there were dozens of questions on the mailing
> list along the lines of "why does phpMyAdmin display my Horde database
> with the wrong characters?".

I have never experienced any issues with several generations of  
phpMyAdmin, whether the data was UTF-8 or Latin1, whether it was data  
from H3 or H4.

>> stuff from Horde_Db because we spent almost an hour with two
>> developers to find out why we had some broken charset conversions with
>> the current Horde_Db code. We tried to align the MySQL documentation
>> with the behavior we've been seeing in different environments, to no
>> avail. It doesn't work as documented, and it doesn't work well either.
>
> I would gladly spare some of my free time for this. I've dealt with
> web application/MySQL charset problems numerous times before. From
> my perspective it works very well and consistently, unless you are in
> some rare case where it doesn't and which I don't know about. What
> exactly were the problems?

I'm going to paste the discussion we had for completeness below. And  
because I'm too lazy to summarize it. :)

>> In the end, after we removed the SET NAMES call, everything started to
>> work properly again, with old data, with new data, with UTF-8 data,
>> with Latin1 data.
>
> By 'old' and 'new', do you mean that you changed the DB charset in the Horde config?

No, this means with older existing data from Horde 3 as well as new  
data created with Horde 4.

>> Unless someone comes up with a convincing explanation, why SET NAMES
>> is really necessary, and shows me consistent, working, documented
>> behavior with different charsets and existing as well as new data,
>> plus an implementation that works with other databases too and with
>> the way we do charset conversions in Horde, this won't be added back.
>
> 1. It takes more space to store the same text. Here is a simple example:
>
> mysql> create table test (tekstas VARCHAR(255)) ENGINE=MyISAM;
> Query OK, 0 rows affected (0.06 sec)
>
> mysql> insert into test set tekstas='Premjera bus gruodžio 3 dieną, 19 val. "Forum Cinemas Vingis"';
> Query OK, 1 row affected (0.00 sec)
>
> mysql> set names utf8;
> Query OK, 0 rows affected (0.00 sec)
>
> mysql> insert into test set tekstas='Premjera bus gruodžio 3 dieną, 19 val. "Forum Cinemas Vingis"';
> Query OK, 1 row affected (0.00 sec)
>
> mysql> select length(tekstas) from test;
> +-----------------+
> | length(tekstas) |
> +-----------------+
> |              68 |
> |              63 |
> +-----------------+
> 2 rows in set (0.00 sec)
>
> You can imagine how much the numbers differ if you have Russian or other
> non-Latin text.
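The 68 vs 63 byte difference can be reproduced outside MySQL under one assumption that the message does not state: the client sent UTF-8 bytes while the connection still used MySQL's default latin1, which MySQL implements as cp1252. Under that assumption the first row is double-encoded: each UTF-8 byte of "ž" and "ą" is read as a separate cp1252 character and converted to UTF-8 again. A minimal Python sketch:

```python
TEXT = 'Premjera bus gruodžio 3 dieną, 19 val. "Forum Cinemas Vingis"'

def utf8_length(text: str) -> int:
    """Bytes needed when the text is stored once as UTF-8 (the SET NAMES utf8 row)."""
    return len(text.encode("utf-8"))

def double_encoded_length(text: str) -> int:
    """Bytes stored when UTF-8 input is misread as cp1252 and converted to UTF-8 again.

    Assumption: the connection charset was MySQL's latin1, i.e. cp1252.
    """
    misread = text.encode("utf-8").decode("cp1252")
    return len(misread.encode("utf-8"))

if __name__ == "__main__":
    print(len(TEXT), utf8_length(TEXT), double_encoded_length(TEXT))  # 61 63 68
```

"ž" (C5 BE) becomes "Å¾" (4 bytes) and "ą" (C4 85) becomes "Ä…" (5 bytes, since cp1252 maps 0x85 to the ellipsis), which accounts for the five extra bytes.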

That's really a lame reason. Especially since no one forces you to use  
a multi-byte charset for the backend storage. If shortness of strings  
is more important for you than interoperability, you are still free to  
choose a latin charset or any other legacy charset instead of UTF-8.

> 2. It confuses and irritates administrators, who spend hours
> debugging why the same text is shown correctly in one application but
> incorrectly in another. And I must remind you that administrators are the
> people who bring Horde to users... At least most of the time.
>
> 3. Sorting of data is inefficient and produces the wrong order, for
> example for Turba contacts with special characters in their surnames.

Sorting is done by collations, not by charsets.
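The distinction can be shown with a small Python sketch: the same Unicode strings (a made-up list of surnames) sorted by raw code point and by a simplified accent- and case-insensitive key. The key is only a rough analogue of an accent-insensitive collation such as MySQL's utf8_general_ci, not its real algorithm; the encoding is identical in both sorts and only the comparison rule changes.

```python
import unicodedata

NAMES = ["Zimmer", "Šumskas", "Schneider", "Sanders"]

def codepoint_key(name: str) -> str:
    """Binary-like ordering: compare raw code points, so 'Š' (U+0160) sorts after 'Z'."""
    return name

def folded_key(name: str) -> str:
    """Simplified collation-like key: strip accents via NFKD, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

if __name__ == "__main__":
    print(sorted(NAMES, key=codepoint_key))  # ['Sanders', 'Schneider', 'Zimmer', 'Šumskas']
    print(sorted(NAMES, key=folded_key))     # ['Sanders', 'Schneider', 'Šumskas', 'Zimmer']
```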

Jan.


(17:22:11) yunosh: mrubinsk: could it be there's something wrong with  
charsets in the share code?
(17:22:48) mrubinsk: not sure… what are you seeing?
(17:23:00) mrubinsk: maybe related to the caching/serialization?
(17:23:12) yunosh: it looks like it always assumes the backend charset  
is latin1. i.e. it converts utf8 share names to latin1 in the backend
(17:23:47) mrubinsk: weird… I'll look
(17:25:49) mrubinsk: Horde_Share_Sql::toDriverCharset() converts from
utf-8 to whatever Horde_Db reports as its charset… let's see what
Horde_Db is doing...
(17:36:44) mrubinsk: yunosh: what's the value of $conf[sql][charset]?
(17:36:56) yunosh: utf-8
(17:37:05) mrubinsk: hm
(17:37:26) mrubinsk: If I set mine to utf-8 $_db->getOption('charset')  
returns utf-8...
(17:37:40) mrubinsk: let me try saving some utf-8 names
(17:43:48) mrubinsk: I'm not seeing any issues using a few different  
special characters, they seem to be saved then reloaded and displayed  
correctly
(17:44:06) mrubinsk: do you have an example text string that is not working?
(17:44:27) yunosh: the conversion is done in both directions, so yes,  
it works if you retrieve chars that you just created
(17:44:37) yunosh: i'm seeing issues with old data
(17:44:46) yunosh: did you check the table contents?
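Why a conversion applied in both directions hides the problem can be sketched in Python. The two functions below are hypothetical stand-ins for the idea behind Horde_Share_Sql::toDriverCharset() (the real code is PHP and is not reproduced here), assuming the backend is wrongly taken to be iso-8859-1: new data survives the round trip, but the stored bytes are not UTF-8, and an older row that already holds UTF-8 bytes comes back garbled.

```python
DRIVER_CHARSET = "iso-8859-1"  # assumption: the charset the backend is wrongly taken to use

def to_driver_charset(value: str, driver_charset: str) -> bytes:
    """Hypothetical: encode a Unicode string in the charset the DB layer reports."""
    return value.encode(driver_charset)

def from_driver_charset(raw: bytes, driver_charset: str) -> str:
    """Hypothetical: decode bytes read from the DB using the same reported charset."""
    return raw.decode(driver_charset)

if __name__ == "__main__":
    new_name = "Café"
    stored = to_driver_charset(new_name, DRIVER_CHARSET)
    print(from_driver_charset(stored, DRIVER_CHARSET) == new_name)  # True: looks fine in the UI
    print(stored)  # b'Caf\xe9': Latin-1 bytes in the table, not UTF-8
    old_row = "Café".encode("utf-8")  # a row written earlier as real UTF-8
    print(from_driver_charset(old_row, DRIVER_CHARSET))  # CafÃ©
```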
(17:45:09) mrubinsk: ah '???'
(17:45:10) mrubinsk: hm
(17:59:43) mrubinsk: I'm lost.  The log entry for the sql insert  
statement in Horde_Db shows the correct characters
(18:00:02) mrubinsk: So it's sending the right characters, right?
(18:05:04) yunosh: yeah, sounds good. and do you see the correct  
characters in table too?
(18:05:25) mrubinsk: no, that's what is confusing me… the table shows '???'
(18:06:00) yunosh: huh
(18:06:10) yunosh: so, that's what i see too
(18:06:14) mrubinsk: and the charset for the table is listed as utf-8
(18:06:19) yunosh: yeah
(18:09:21) mrubinsk: ..and manually updating a name via the command  
line causes weird results in the UI
(18:15:12) mrubinsk: it seems to be an issue with anything using horde_db
(18:15:15) yunosh: wrobel: ping?
(18:15:41) mrubinsk: well, I've only tested tags as well, but other usage
(MDB2, DB) seems to work
(18:18:37) yunosh: what happens if you comment out the SET NAMES
statement in Horde_Db_Adapter_Mysql_Schema?
(18:19:42) mrubinsk: bingo
(18:20:44) mrubinsk: though I don't understand *why* :)
(18:21:37) yunosh: people keep arguing we should add this to our h3 db
code. but afaiu this setting makes mysql automatically convert data to
the db's charset. but since we already do this in userland code, this
is not necessary
(18:22:29) yunosh: but even if we do this, it shouldn't have any  
effect, because we already sent utf-8 data, store utf-8 in the  
backend, get utf-8 from the interface. i.e. there shouldn't be *any*  
conversion at all
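The argument that nothing should be converted when everything is UTF-8 can be checked against a deliberately simplified model, sketched in Python. Assumptions not taken from the thread: bytes from the client are decoded using character_set_client and re-encoded in the column charset on write, then converted to character_set_results on read; character_set_connection and collations are ignored; and Python's iso-8859-1 stands in for MySQL's latin1 so that every byte value decodes.

```python
from typing import Tuple

LATIN1 = "iso-8859-1"  # stand-in for MySQL's latin1; maps all 256 byte values

def write_then_read(sent: bytes, client: str, column: str, results: str) -> Tuple[bytes, bytes]:
    """Simplified model: return (bytes stored in the column, bytes returned to the client)."""
    stored = sent.decode(client).encode(column)
    returned = stored.decode(column).encode(results)
    return stored, returned

if __name__ == "__main__":
    sent = "gruodžio".encode("utf-8")  # the application already converted the value to UTF-8
    # Everything UTF-8: no step changes the bytes.
    print(write_then_read(sent, "utf-8", "utf-8", "utf-8") == (sent, sent))  # True
    # Everything latin1: the UTF-8 bytes pass through unchanged as opaque bytes.
    print(write_then_read(sent, LATIN1, LATIN1, LATIN1) == (sent, sent))  # True
    # latin1 connection, utf8 column: the column holds double-encoded text,
    # yet the client still gets its original bytes back.
    stored, returned = write_then_read(sent, LATIN1, "utf-8", LATIN1)
    print(len(sent), len(stored), returned == sent)  # 9 11 True
```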
(18:23:06) mrubinsk: yea, exactly
(18:23:43) yunosh: and even though i consider myself a charset expert,
i still don't grasp mysql's documentation about charset handling
completely
(18:23:58) mrubinsk: well, then I have absolutely no hope :)
(18:24:16) mrubinsk: which makes me feel slightly better, since I'm
having a hard time understanding the mysql docs at the moment
(18:24:22) yunosh: can you verify that we indeed *set* the correct  
charset in the SET NAMES statement?
(18:25:08) mrubinsk: var_dump($charset) shows utf8, which is what is  
set in the config
(18:26:02) yunosh: well, then i really don't understand it. either  
mysql doesn't work as advertised, or. well, i don't know any other  
reason
(18:28:19) yunosh: strange thing is that this is working perfectly  
fine on my dev server
(18:28:52) mrubinsk: hm, is there some my.cnf setting that needs to be
made for charset also?
(18:29:03) yunosh: the table's charset is utf8, right?
(18:29:32) yunosh: and the db's too?
(18:30:12) mrubinsk: if I execute: show create table … it shows utf8
(18:30:49) mrubinsk: ah, show create database shows latin1
(18:32:49) mrubinsk: doh, wrong machine … one sec
(18:33:28) mrubinsk: yea, same on this machine too… testing again
(18:34:03) mrubinsk: doesn't fix the problem
(18:34:33) mrubinsk: both table and db are utf8
(18:35:47) yunosh: i don't have any charset settings in the mysql  
configuration. do you?
(18:38:49) mrubinsk: no
(18:38:52) mrubinsk: but wait..
(18:39:11) mrubinsk: if I execute a set names utf8; on the command  
line, the characters show up correctly
(18:40:30) mrubinsk: so maybe the command line client is defaulting to  
latin1 and the data is actually correct?
(18:41:47) yunosh: that still doesn't explain the different behavior  
between your and my server
(18:42:02) yunosh: i see the same behavior with the command line client though
(18:42:18) yunosh: i.e. the data is only correct if i use SET NAMES
(18:42:45) yunosh: so that still doesn't tell us how the data is  
actually stored
(18:43:35) yunosh: oh crap, i mixed up the working and not working server
(18:44:01) yunosh: so, the server that works is using iso-8859-1 for  
the sql config
(18:44:19) yunosh: my head spins
(18:44:37) mrubinsk: yea, that's what my server *was* using before I  
just started testing…and probably why I hadn't seen any issues.
(18:45:09) mrubinsk: ok.. so, set names is telling mysql to expect  
utf8 and send back utf8 to the client, right?
(18:45:38) mrubinsk: assuming the sql config is utf8, that is.
(18:45:56) yunosh: theoretically yes
(18:46:50) yunosh: but it doesn't seem to be necessary. like it never  
was. actually it breaks things, right?
(18:47:31) yunosh: i.e. both you with your utf-8 setting and me with  
my iso-8859-1 setting see everything working fine if we remove that  
statement?
(18:47:48) mrubinsk: yea
(18:48:11) yunosh: is there any reason to keep it then?
(18:48:14) mrubinsk: ..and it worked fine with the iso-8859-1 setting  
I was using until now
(18:48:22) yunosh: yeah, same here
(18:48:32) yunosh: but not with utf-8 + set names enabled, right?
(18:48:41) mrubinsk: I don't see why we need it if we are doing all  
the conversion in userland
(18:48:50) mrubinsk: right
(18:49:12) mrubinsk: …unless there is some other mysql charset related  
setting that we are missing on the server
(18:49:44) yunosh: there might be an issue with sorting. because  
sorting depends on the collation
(18:50:23) yunosh: though i have no idea if SET NAMES has any  
influence on that. at least not from the mysql docs
(18:50:37) mrubinsk: these are the session variables I have in mysql,  
if it helps:
(18:50:39) mrubinsk: | character_set_client     | utf8   |
(18:50:39) mrubinsk: | character_set_connection | utf8   |
(18:50:39) mrubinsk: | character_set_database   | latin1 |
(18:50:39) mrubinsk: | character_set_filesystem | binary |
(18:50:39) mrubinsk: | character_set_results    | utf8   |
(18:50:39) mrubinsk: | character_set_server     | latin1 |
(18:50:39) mrubinsk: | character_set_system     | utf8   |
(18:51:19) mrubinsk: not sure what the server setting is doing...
(18:52:55) yunosh: docs say that's just the default charset. afaiu that
means unless specified differently for the db or table
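The default chain described here (server default, overridden by the database, then by the table, and in MySQL also by the column) can be sketched as a small Python lookup. MySQL applies these defaults when the database, table or column is created; the function name and parameters are illustrative only, and the sample values mirror the session variables above with a utf8 table, as in the `show create table` output mentioned earlier in the log.

```python
from typing import Optional

def effective_charset(server: str, database: Optional[str] = None,
                      table: Optional[str] = None, column: Optional[str] = None) -> str:
    """Return the charset a column ends up with: the most specific explicit setting wins."""
    for setting in (column, table, database):
        if setting is not None:
            return setting
    return server

if __name__ == "__main__":
    # character_set_server and character_set_database are latin1, table declared utf8
    print(effective_charset("latin1", database="latin1", table="utf8"))  # utf8
    # no table or column charset declared: the database default applies
    print(effective_charset("latin1", database="latin1"))  # latin1
```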
(18:54:49) mrubinsk: k
(18:59:03) mrubinsk: the docs say that "For comparisons of strings
with column values, collation_connection does not matter because columns
have their own collation, which has a higher collation precedence."
(http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_collation_connection)
Not sure if that also refers to sorting, but it sounds like it
should.
(18:59:53) yunosh: well, we don't set the collation anyway, only the  
charset, so this shouldn't matter
(19:00:31) yunosh: since we don't set the collation, it should be
possible to set it by choosing proper db/table/mysql settings
(19:01:27) mrubinsk: we don't set it explicitly, but the docs say the  
set names resets it to the default collation for the charset
(19:01:47) yunosh: one more reason to *not* use SET NAMES
(19:01:48) mrubinsk: …if that matters :)
(19:01:52) mrubinsk: yea, exactly
(19:03:29) mrubinsk: this is really confusing. mysql will convert FROM
character_set_client TO character_set_connection, but it also says
that SET NAMES sets both of those values to the same charset, so I
don't see why any conversion would take place at all.
(19:03:46) yunosh: exactly
(19:04:42) mrubinsk: not to mention, it also sets  
character_set_results to the same value as well… so no conversion  
going the other way either. *sigh*
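The variable changes discussed here can be modelled in a few lines of Python. Per the MySQL manual, SET NAMES 'x' sets character_set_client, character_set_connection and character_set_results to x, and setting character_set_connection also resets collation_connection to x's default collation. The session dict, the assumed latin1 starting values and the two-entry collation table are illustrative assumptions.

```python
from typing import Dict

# MySQL 5.x default collations for the two charsets that appear in this thread
DEFAULT_COLLATION: Dict[str, str] = {
    "utf8": "utf8_general_ci",
    "latin1": "latin1_swedish_ci",
}

def set_names(session: Dict[str, str], charset: str) -> Dict[str, str]:
    """Return a copy of `session` with the variables that SET NAMES changes."""
    updated = dict(session)
    for var in ("character_set_client", "character_set_connection", "character_set_results"):
        updated[var] = charset
    # Changing character_set_connection also resets collation_connection.
    updated["collation_connection"] = DEFAULT_COLLATION[charset]
    return updated

if __name__ == "__main__":
    # Assumed starting point: a session still on latin1 defaults.
    session = {
        "character_set_client": "latin1",
        "character_set_connection": "latin1",
        "character_set_results": "latin1",
        "character_set_database": "latin1",
        "collation_connection": "latin1_swedish_ci",
    }
    after = set_names(session, "utf8")
    # client and connection now agree, so the client-to-connection step is a no-op
    print(after["character_set_client"] == after["character_set_connection"]
          == after["character_set_results"])  # True
    # the database default is untouched, matching the session listing above
    print(after["character_set_database"], after["collation_connection"])  # latin1 utf8_general_ci
```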
(19:04:54) mrubinsk: I say we just get rid of the statement and see  
how that plays out
(19:05:10) yunosh: if only we could look at the data in the table  
without *any* conversion going on. but that doesn't even seem to be  
possible
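One way to get close to the raw bytes is MySQL's HEX() function, for example selecting HEX() of the share name column: the result is plain ASCII hex digits, so a result-charset conversion does not alter it. Given such a hex string, a heuristic check for UTF-8 that was encoded twice might look like the Python sketch below. The function name is hypothetical, it assumes the middle step was MySQL's latin1 (cp1252), and a positive result is only a hint, not proof.

```python
def looks_double_encoded(hex_value: str) -> bool:
    """Heuristic: True if the bytes are valid UTF-8 whose text, re-encoded as
    cp1252 (MySQL's latin1), is itself valid UTF-8 different from the input."""
    raw = bytes.fromhex(hex_value)
    try:
        once = raw.decode("utf-8")
        inner = once.encode("cp1252")
        inner.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return inner != raw  # pure ASCII round-trips unchanged and is not flagged

if __name__ == "__main__":
    print(looks_double_encoded("C385C2BE"))  # True: "ž" stored as "Å¾"
    print(looks_double_encoded("C5BE"))      # False: "ž" stored correctly as UTF-8
```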
(19:05:44) yunosh: that's the only thing that left me worrying,  
because of interaction with other systems
(19:05:58) mrubinsk: hm, yeah, good point
(19:06:40) yunosh: otoh we never used set names in h3 and there have only
been very few complaints
(19:07:11) yunosh: it seems to work just fine for 99.99% (or more) of
the users
(19:08:34) yunosh: and fwiw, the mysql drivers are the only ones in
Horde_Db that use the 'charset' setting at all
(19:09:10) mrubinsk: heh
(19:09:42) yunosh: i'm going to remove it completely
(19:10:02) mrubinsk: k


-- 
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/


