[dev] suggestion - turba, LDAP, UTF-8, ISO-8859-2
Jan Schneider
jan@horde.org
Wed, 23 Jan 2002 22:44:53 +0100
Quoting Maciej Uhlig <muhlig@us.edu.pl>:
> There were some messages on Unicode support for LDAP in Turba in
> November 2001.
> Quoting Jan:
> http://marc.theaimsgroup.com/?l=horde-dev&m=100619132529927&w=2:
>
> >We just started to be aware of any unicode issues. So we don't have any
> >direction yet how to handle such things.
> >But in this special case it not only depends on us how to handle
> >unicode but also on the storage backend. I don't think we can rely on
> >them to handle unicode entries correctly.
>
> >If the data in the windows addressbook is utf-8 I guess the best thing
> >we can do now is to use utf8_decode().
>
> Well, I live in Poland. We use ISO-8859-2 as our charset.
> utf8_decode() and utf8_encode() work for ISO-8859-1 only (this is noted
> as PHP bug/feature request 12225, BTW).
> So the current Turba code in /horde/turba/lib/Driver/ldap.php is
> unusable for us here.
> Instead of Polish diacritic marks, you simply get "?". Of course, the
> OpenLDAP server stores these marks as Unicode UTF-8.
>
> I'd like to propose an idea for a solution, which actually works very
> well for me.
> I'm not providing a patch because I'm not able to make the solution
> clean, but I hope a Horde/Turba developer can make instant use of this.
>
> The dirty (but simple and working) solution is:
>
> - configure PHP with libiconv (--with-iconv), which is needed for this
>   functionality. libiconv is the only free software I found that can do
>   UTF-8->ISO-8859-2 recoding.
> - edit /horde/turba/lib/Driver/ldap.php and change it in two places:
>
> from: $val = utf8_encode($val);
> to: $val = libiconv("ISO-8859-2", "UTF-8", $val);
>
> and
>
> from: $addr[$field] .= utf8_decode($entry[$field][$j]);
> to: $addr[$field] .= libiconv("UTF-8", "ISO-8859-2", $entry[$field][$j]);
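
For illustration, the two replacements above could be wrapped in one helper so that the local charset is not hard-coded in the driver. This is only a sketch under assumptions: convertCharset() and $localCharset are hypothetical names, not existing Turba code, and it calls iconv() rather than libiconv(), assuming the PHP build exposes the function under that name.

```php
<?php
// Hypothetical helper, not part of Turba: converts a string between two
// charsets and returns the input unchanged if conversion is not possible.
function convertCharset($value, $from, $to)
{
    if (strcasecmp($from, $to) === 0) {
        return $value; // nothing to convert
    }
    if (function_exists('iconv')) {
        // iconv() returns false on failure; @ silences its notice.
        $converted = @iconv($from, $to, $value);
        if ($converted !== false) {
            return $converted;
        }
    }
    return $value; // iconv missing or conversion failed
}

// Assumption: in a real driver this would come from configuration.
$localCharset = 'ISO-8859-2';

// Writing to LDAP: local charset -> UTF-8.
// "\xB1" is "a with ogonek" in ISO-8859-2.
$val = convertCharset("\xB1", $localCharset, 'UTF-8');

// Reading from LDAP: UTF-8 -> local charset.
$back = convertCharset($val, 'UTF-8', $localCharset);
```

Falling back to the unconverted value keeps the address book usable when iconv is not compiled in, at the cost of showing wrong characters instead of failing.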
This message made me take a closer look at libiconv for the first time, and
I started wondering whether we could use it for more than converting
Unicode data from LDAP.
I have this scenario in mind:
We determine with a Browser:: method whether the browser supports Unicode,
using the browser version or the Accept-Charset header
(HTTP_ACCEPT_CHARSET). If it doesn't, we keep things working the way they
do now.
If it does support Unicode, we put the iconv handler on the output buffer
stack (after HTTP compression, if enabled). If we come across content
that's not in the selected language's charset, such as an email in a
different charset, we print out the content buffered so far (now converted
into Unicode), print the other content (also converted into Unicode by
iconv), and then push the iconv handler onto the output buffering stack
again.
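
A rough sketch of how this could look, to make the idea concrete. All function names here (acceptsUtf8(), makeUtf8Handler(), emitSegment()) are hypothetical and not existing Horde code; the Accept-Charset parsing is deliberately simplified (it ignores q-values) and browser version sniffing is left out.

```php
<?php
// Hypothetical check: does an Accept-Charset header value list UTF-8?
// Simplified: q-values such as "utf-8;q=0" are not evaluated.
function acceptsUtf8($acceptCharset)
{
    foreach (explode(',', strtolower($acceptCharset)) as $item) {
        $parts = explode(';', $item);
        $name = trim($parts[0]);
        if ($name === 'utf-8' || $name === '*') {
            return true;
        }
    }
    return false;
}

// Builds an output-buffer callback that converts $charset to UTF-8.
// If iconv fails, the buffer is passed through unchanged.
function makeUtf8Handler($charset)
{
    return function ($buffer) use ($charset) {
        $converted = @iconv($charset, 'UTF-8', $buffer);
        return $converted === false ? $buffer : $converted;
    };
}

// Emits one piece of content through its own conversion buffer, so that
// pieces in different charsets all end up as UTF-8 in the response.
function emitSegment($content, $charset)
{
    ob_start(makeUtf8Handler($charset)); // push handler for this charset
    echo $content;
    ob_end_flush();                      // convert and hand to outer buffer
}
```

A page would then call emitSegment() once for the UI text in the language's charset and once for each message body in its own charset, only when acceptsUtf8() is true for the request's HTTP_ACCEPT_CHARSET value. If HTTP compression is used, its buffer (for example ob_start('ob_gzhandler')) would have to be started first so that the converted output flows into it.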
The result would be fully Unicode-enabled apps, where you can have the UI
in Russian and read Chinese messages without any information loss.
Thoughts on this are appreciated.
Jan.
--
::::::::::::::::::::::::::::::::::::::::
AMMMa AG - discover your knowledge
:::::::::::::::::::::::::::
Detmolder Str. 25-33 :: D-33604 Bielefeld
fon +49.521.96878-0 :: fax +49.521.96878-20
http://www.ammma.de
::::::::::::::::::::::::::::::::::::::::::::::