[dev] [commits] Horde branch master updated. fa8d644f8ca59a43056e5eae668ff1699d36be24
Jan Schneider
jan at horde.org
Mon Dec 13 16:45:55 UTC 2010
Quoting Vilius Šumskas <vilius at lnk.lt>:
>> >> commit fa8d644f8ca59a43056e5eae668ff1699d36be24
>> >> Author: Jan Schneider <jan at horde.org>
>> >> Date: Tue Nov 30 15:41:59 2010 +0100
>> >>
>> >> Remove charset methods.
>> >>
>> >> They are only implemented in the MySQL drivers anyway and using SET
>> >> NAMES causes more problems than it solves. Actually it breaks the
>> >> current share driver.
>> >
>> > Does this mean that it will not be implemented and we are stuck with
>> > ISO-8859-1 connection charsets forever? Horde is the only
>> > professional application which doesn't use charsets correctly on DB
>> > level.
>>
>> DB charset handling has always worked fine in Horde. I removed this
>
> True if you don't manage Horde's databases with external tools or if
> you don't integrate it with other applications.
>
> As far as I remember, there were dozens of questions on the mailing
> list asking "why does phpMyAdmin display my Horde database with wrong
> characters".
I have never experienced any issues with several generations of
phpMyAdmin, whether the data was UTF-8 or Latin1, whether it was data
from H3 or H4.
>> stuff from Horde_Db because we spent almost an hour with two
>> developers to find out why we had some broken charset conversions with
>> the current Horde_Db code. We tried to align the MySQL documentation
>> with the behavior we've been seeing in different environments, to no
>> avail. It doesn't work as documented, and it doesn't work well either.
>
> I would gladly spare some of my free time for this. I've dealt with
> web application/MySQL charset problems numerous times before. From
> my perspective it works very well and consistently. Unless you are in
> some rare case where it doesn't and which I don't know about. What
> exactly were the problems?
I'm going to paste the discussion we had for completeness below. And
because I'm too lazy to summarize it. :)
>> In the end, after we removed the SET NAMES call, everything started to
>> work properly again, with old data, with new data, with UTF-8 data,
>> with Latin1 data.
>
> By 'old' and 'new' you mean, you've changed db charset in Horde config?
No, this means with older existing data from Horde 3 as well as new
data created with Horde 4.
>> Unless someone comes up with a convincing explanation, why SET NAMES
>> is really necessary, and shows me consistent, working, documented
>> behavior with different charsets and existing as well as new data,
>> plus an implementation that works with other databases too and with
>> the way we do charset conversions in Horde, this won't be added back.
>
> 1. It takes more space to store the same text. This is simple example:
>
> mysql> create table test (tekstas VARCHAR(255)) ENGINE=MyISAM;
> Query OK, 0 rows affected (0.06 sec)
>
> mysql> insert into test set tekstas='Premjera bus gruodžio 3 dieną,
> 19 val. "Forum Cinemas Vingis"';
> Query OK, 1 row affected (0.00 sec)
>
> mysql> set names utf8;
> Query OK, 0 rows affected (0.00 sec)
>
> mysql> insert into test set tekstas='Premjera bus gruodžio 3 dieną,
> 19 val. "Forum Cinemas Vingis"';
> Query OK, 1 row affected (0.00 sec)
>
> mysql> select length(tekstas) from test;
> +-----------------+
> | length(tekstas) |
> +-----------------+
> |              68 |
> |              63 |
> +-----------------+
> 2 rows in set (0.00 sec)
>
> You can imagine how the numbers differ if you have Russian or other
> non-Latin text.
That's really a lame reason. Especially since no one forces you to use
a multi-byte charset for the backend storage. If shortness of strings
is more important for you than interoperability, you are still free to
choose a latin charset or any other legacy charset instead of UTF-8.
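For what it's worth, the 68 and 63 in the example above can be reproduced outside MySQL, since LENGTH() counts bytes, not characters. The following is only a minimal Python sketch under two assumptions that the quoted session does not confirm: that the test table's default charset was utf8, and that the first insert went over a connection still defaulting to latin1, which MySQL treats as cp1252. Under those assumptions the longer row is the UTF-8 text encoded twice, not a cost of the charset itself.

```python
# The sample text from the quoted mysql session, on one line.
text = 'Premjera bus gruodžio 3 dieną, 19 val. "Forum Cinemas Vingis"'

# What a UTF-8 terminal sends to the server.
utf8_bytes = text.encode("utf-8")

# Assumption: the server is told the bytes are latin1 (cp1252 in MySQL)
# and converts them to the utf8 column charset, encoding them a second time.
double_encoded = utf8_bytes.decode("cp1252").encode("utf-8")

print(len(text))            # 61 characters
print(len(utf8_bytes))      # 63 bytes, the row inserted after SET NAMES utf8
print(len(double_encoded))  # 68 bytes, the row inserted before it
```

CHAR_LENGTH() would be the function to use when comparing character counts.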
> 2. It confuses and irritates administrators as they spend hours
> debugging why the same text is shown ok in one application but badly
> in the second. And I must remind you that administrators are the
> people that bring Horde to users... At least most of the time.
>
> 3. Sorting of data is inefficient and in bad order. For example
> Turba contacts with special characters in their surnames.
Sorting is done by collations, not by charsets.
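To illustrate the difference, here is a small Python sketch in which the strings and their encoding stay the same and only the comparison rule changes. The surnames are made-up sample data, and `fold_key` is a hypothetical, crude stand-in for a real collation: it merely strips combining accents and ignores case, which is not how any particular MySQL collation is defined.

```python
import unicodedata


def fold_key(s):
    """Crude collation stand-in: drop combining accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", s)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_accents.casefold()


# Made-up sample surnames, one with a special character.
surnames = ["Zukas", "Šumskas", "Adomaitis"]

# Plain code point order: "Š" (U+0160) sorts after "Z".
by_code_point = sorted(surnames)

# Same strings, different comparison rule: "Š" is treated like "S".
by_folded = sorted(surnames, key=fold_key)

print(by_code_point)  # ['Adomaitis', 'Zukas', 'Šumskas']
print(by_folded)      # ['Adomaitis', 'Šumskas', 'Zukas']
```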
Jan.
(17:22:11) yunosh: mrubinsk: could it be there's something wrong with
charsets in the share code?
(17:22:48) mrubinsk: not sure… what are you seeing?
(17:23:00) mrubinsk: maybe related to the caching/serialization?
(17:23:12) yunosh: it looks like it always assumes the backend charset
is latin1. i.e. it converts utf8 share names to latin1 in the backend
(17:23:47) mrubinsk: weird… I'll look
(17:25:49) mrubinsk: Horde_Share_Sql::toDriverCharset() converts from
utf-8 to whatever Horde_Db reports as its charset… let's see what
Horde_Db is doing...
(17:36:44) mrubinsk: yunosh: what's the value of $conf[sql][charset]?
(17:36:56) yunosh: utf-8
(17:37:05) mrubinsk: hm
(17:37:26) mrubinsk: If I set mine to utf-8 $_db->getOption('charset')
returns utf-8...
(17:37:40) mrubinsk: let me try saving some utf-8 names
(17:43:48) mrubinsk: I'm not seeing any issues using a few different
special characters, they seem to be saved then reloaded and displayed
correctly
(17:44:06) mrubinsk: do you have an example text string that is not working?
(17:44:27) yunosh: the conversion is done in both directions, so yes,
it works if you retrieve chars that you just created
(17:44:37) yunosh: i'm seeing issues with old data
(17:44:46) yunosh: did you check the table contents?
(17:45:09) mrubinsk: ah '???'
(17:45:10) mrubinsk: hm
(17:59:43) mrubinsk: I'm lost. The log entry for the sql insert
statement in Horde_Db shows the correct characters
(18:00:02) mrubinsk: So it's sending the right characters, right?
(18:05:04) yunosh: yeah, sounds good. and do you see the correct
characters in table too?
(18:05:25) mrubinsk: no, that's what is confusing me… the table shows '???'
(18:06:00) yunosh: huh
(18:06:10) yunosh: so, that's what i see too
(18:06:14) mrubinsk: and the charset for the table is listed as utf-8
(18:06:19) yunosh: yeah
(18:09:21) mrubinsk: ..and manually updating a name via the command
line causes weird results in the UI
(18:15:12) mrubinsk: it seems to be an issue with anything using horde_db
(18:15:15) yunosh: wrobel: ping?
(18:15:41) mrubinsk: well, only tested tags also, but other, MDB2, DB
usage seems to work
(18:18:37) yunosh: what happens if you comment out the SET NAMES
statement in Horde_Db_Adapter_Mysql_Schema?
(18:19:42) mrubinsk: bingo
(18:20:44) mrubinsk: though I don't understand *why* :)
(18:21:37) yunosh: people keep arguing we should add this to our h3 db
code. but afaiu this setting makes mysql automatically convert data to
the db's charset. but since we already do this in userland code, this
is not necessary
(18:22:29) yunosh: but even if we do this, it shouldn't have any
effect, because we already send utf-8 data, store utf-8 in the
backend, get utf-8 from the interface. i.e. there shouldn't be *any*
conversion at all
(18:23:06) mrubinsk: yea, exactly
(18:23:43) yunosh: and even though i consider myself a charset expert,
i still don't grasp mysql's documentation about charset handling
completely
(18:23:58) mrubinsk: well, then I have absolutely no hope :)
(18:24:16) mrubinsk: which makes me feel slightly better, since I'm
having a hard time understanding the mysql docs at the moment
(18:24:22) yunosh: can you verify that we indeed *set* the correct
charset in the SET NAMES statement?
(18:25:08) mrubinsk: var_dump($charset) shows utf8, which is what is
set in the config
(18:26:02) yunosh: well, then i really don't understand it. either
mysql doesn't work as advertised, or. well, i don't know any other
reason
(18:28:19) yunosh: strange thing is that this is working perfectly
fine on my dev server
(18:28:52) mrubinsk: hm, is there some my.cnf setting that needs to be
made for charset also?
(18:29:03) yunosh: the table's charset is utf8, right?
(18:29:32) yunosh: and the db's too?
(18:30:12) mrubinsk: if I execute: show create table … it shows utf8
(18:30:49) mrubinsk: ah, show create database shows latin1
(18:32:49) mrubinsk: doh, wrong machine … one sec
(18:33:28) mrubinsk: yea, same on this machine too… testing again
(18:34:03) mrubinsk: doesn't fix the problem
(18:34:33) mrubinsk: both table and db are utf8
(18:35:47) yunosh: i don't have any charset settings in the mysql
configuration. do you?
(18:38:49) mrubinsk: no
(18:38:52) mrubinsk: but wait..
(18:39:11) mrubinsk: if I execute a set names utf8; on the command
line, the characters show up correctly
(18:40:30) mrubinsk: so maybe the command line client is defaulting to
latin1 and the data is actually correct?
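That guess fits how result conversion behaves: when a character in the column has no equivalent in the results charset, the server sends a question mark in its place. A minimal Python sketch of the effect, assuming a utf8 column holding correct data and a client whose results charset is latin1 (cp1252 in MySQL); the sample characters and the `fetch` helper are made up for illustration.

```python
# Assumed column content: correct UTF-8 for three Lithuanian letters
# that have no cp1252 equivalent.
column_bytes = "ąčę".encode("utf-8")


def fetch(stored, results_charset):
    """Toy model: convert utf8 column bytes to the results charset."""
    return stored.decode("utf-8").encode(results_charset, errors="replace")


latin1_view = fetch(column_bytes, "cp1252")  # client left at latin1
utf8_view = fetch(column_bytes, "utf-8")     # client after SET NAMES utf8

print(latin1_view)               # b'???'
print(utf8_view.decode("utf-8"))  # ąčę
```

In this model the '???' says nothing about the stored bytes; it only says the viewer asked for a charset that cannot represent them.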
(18:41:47) yunosh: that still doesn't explain the different behavior
between your and my server
(18:42:02) yunosh: i see the same behavior with the command line client though
(18:42:18) yunosh: i.e. the data is only correct if i use SET NAMES
(18:42:45) yunosh: so that still doesn't tell us how the data is
actually stored
(18:43:35) yunosh: oh crap, i mixed up the working and not working server
(18:44:01) yunosh: so, the server that works is using iso-8859-1 for
the sql config
(18:44:19) yunosh: my head spins
(18:44:37) mrubinsk: yea, that's what my server *was* using before I
just started testing…and probably why I hadn't seen any issues.
(18:45:09) mrubinsk: ok.. so, set names is telling mysql to expect
utf8 and send back utf8 to the client, right?
(18:45:38) mrubinsk: assuming the sql config is utf8, that is.
(18:45:56) yunosh: theoretically yes
(18:46:50) yunosh: but it doesn't seem to be necessary. like it never
was. actually it breaks things, right?
(18:47:31) yunosh: i.e. both you with your utf-8 setting and me with
my iso-8859-1 setting see everything working fine if we remove that
statement?
(18:47:48) mrubinsk: yea
(18:48:11) yunosh: is there any reason to keep it then?
(18:48:14) mrubinsk: ..and it worked fine with the iso-8859-1 setting
I was using until now
(18:48:22) yunosh: yeah, same here
(18:48:32) yunosh: but not with utf-8 + set names enabled, right?
(18:48:41) mrubinsk: I don't see why we need it if we are doing all
the conversion in userland
(18:48:50) mrubinsk: right
(18:49:12) mrubinsk: …unless there is some other mysql charset related
setting that we are missing on the server
(18:49:44) yunosh: there might be an issue with sorting. because
sorting depends on the collation
(18:50:23) yunosh: though i have no idea if SET NAMES has any
influence on that. at least not from the mysql docs
(18:50:37) mrubinsk: these are the session variables I have in mysql,
if it helps:
(18:50:39) mrubinsk: | character_set_client     | utf8   |
(18:50:39) mrubinsk: | character_set_connection | utf8   |
(18:50:39) mrubinsk: | character_set_database   | latin1 |
(18:50:39) mrubinsk: | character_set_filesystem | binary |
(18:50:39) mrubinsk: | character_set_results    | utf8   |
(18:50:39) mrubinsk: | character_set_server     | latin1 |
(18:50:39) mrubinsk: | character_set_system     | utf8   |
(18:51:19) mrubinsk: not sure what the server setting is doing...
(18:52:55) yunosh: docs say that's just the default charset. afaiu that
means unless specified differently for the db or table
(18:54:49) mrubinsk: k
(18:59:03) mrubinsk: the docs say that "For comparisons of strings
with column values, collation_connection does not matter because
columns have their own collation, which has a higher collation
precedence."
(http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_collation_connection)
Not sure if that also refers to sorting, but it sounds like it should.
(18:59:53) yunosh: well, we don't set the collation anyway, only the
charset, so this shouldn't matter
(19:00:31) yunosh: since we don't set the collation, it should be
possible to set it by choosing proper db/table/mysql settings
(19:01:27) mrubinsk: we don't set it explicitly, but the docs say the
set names resets it to the default collation for the charset
(19:01:47) yunosh: one more reason to *not* use SET NAMES
(19:01:48) mrubinsk: …if that matters :)
(19:01:52) mrubinsk: yea, exactly
(19:03:29) mrubinsk: this is really confusing. mysql will convert FROM
character_set_client TO character_set_connection but it also says
that SET NAMES sets both of those values to the same charset, so I
don't see why any conversion would take place at all.
(19:03:46) yunosh: exactly
(19:04:42) mrubinsk: not to mention, it also sets
character_set_results to the same value as well… so no conversion
going the other way either. *sigh*
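One possible reading, offered as an assumption and not as a confirmed diagnosis of the share problem: the conversion that matters is not client to connection but connection to column on the way in, and column to character_set_results on the way out. The toy Python model below sketches that. `store` and `fetch` are made-up helper names, cp1252 stands in for MySQL's latin1, and the scenario (UTF-8 bytes sent over a latin1 connection into a utf8 column) is hypothetical.

```python
def store(client_bytes, charset_client, column_charset):
    """Toy model of an insert: interpret the bytes as the declared
    client charset, then convert to the column charset."""
    return client_bytes.decode(charset_client).encode(column_charset)


def fetch(stored, column_charset, charset_results):
    """Toy model of a select: convert column bytes to the results charset."""
    return stored.decode(column_charset).encode(charset_results, errors="replace")


sent = "dieną".encode("utf-8")  # what the application hands to the driver

# With SET NAMES utf8 the column receives the bytes unchanged.
with_set_names = store(sent, "utf-8", "utf-8")
# Without it (connection assumed latin1) the column gets them encoded twice.
without_set_names = store(sent, "cp1252", "utf-8")

# Reading with the same session settings used for writing round-trips
# in both cases, so the application never notices.
same_new = fetch(with_set_names, "utf-8", "utf-8")
same_old = fetch(without_set_names, "utf-8", "cp1252")

# Reading the old row with SET NAMES utf8 exposes the double encoding.
mixed = fetch(without_set_names, "utf-8", "utf-8").decode("utf-8")

print(same_new == sent, same_old == sent)  # True True
print(mixed)                               # dienÄ…
```

If this reading applies, removing SET NAMES keeps existing rows readable because reads and writes stay symmetric, while other tools that declare utf8 see the doubly encoded form.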
(19:04:54) mrubinsk: I say we just get rid of the statement and see
how that plays out
(19:05:10) yunosh: if only we could look at the data in the table
without *any* conversion going on. but that doesn't even seem to be
possible
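Looking at the stored bytes without any conversion is possible: MySQL's HEX() function returns the bytes of a value as hexadecimal digits, and plain ASCII digits come through any results charset unchanged. As a hedged illustration of what such a dump would distinguish, here is a Python sketch with a made-up sample value; it assumes a utf8 column and models the latin1 connection as cp1252.

```python
# Made-up sample value.
clean = "dieną".encode("utf-8")

# The same value after an assumed latin1-to-utf8 conversion on insert.
double = clean.decode("cp1252").encode("utf-8")

# Upper-case hex, similar in shape to what HEX() would print.
clean_hex = clean.hex().upper()
double_hex = double.hex().upper()

print(clean_hex)   # 6469656EC485
print(double_hex)  # 6469656EC384E280A6
```

Two bytes (C4 85) for the "ą" would indicate cleanly stored UTF-8; the five-byte sequence C3 84 E2 80 A6 in its place would indicate double encoding.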
(19:05:44) yunosh: that's the only thing that left me worrying,
because of interaction with other systems
(19:05:58) mrubinsk: hm, yeah, good point
(19:06:40) yunosh: otoh we never used set names in h3 and there only
have been very few complaints
(19:07:11) yunosh: it seems to work just fine for 99.99% (or more) of
the users
(19:08:34) yunosh: and fwiw, the mysql drivers are the only ones in
Horde_Db that use the 'charset' setting at all
(19:09:10) mrubinsk: heh
(19:09:42) yunosh: i'm going to remove it completely
(19:10:02) mrubinsk: k
--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/