suggestion - turba, LDAP, UTF-8, ISO-8859-2

Maciej Uhlig muhlig@us.edu.pl
Fri, 18 Jan 2002 12:36:47 +0100


In November 2001 there were some messages about Unicode support for LDAP
in Turba. Quoting Jan
(http://marc.theaimsgroup.com/?l=horde-dev&m=100619132529927&w=2):

>We just started to be aware of any unicode issues. So we don't have any
>direction yet how to handle such things.
>But in this special case it not only depends on us how to handle unicode
>but also on the storage backend. I don't think we can rely on them to
>handle unicode entries correctly.

>If the data in the windows addressbook is utf-8 I guess the best thing we
>can do now is to use utf8_decode().

Well, I live in Poland, where we use ISO-8859-2 as the charset.
utf8_decode() and utf8_encode() work for ISO-8859-1 only (this is noted as
PHP bug/feature request 12225, BTW). So the current Turba code in
/horde/turba/lib/Driver/ldap.php is unusable for us here: instead of Polish
diacritic marks you simply get "?". The OpenLDAP server, of course, stores
these characters as Unicode UTF-8.

I'd like to propose an idea for a solution, which actually works very well
for me. I'm not providing a patch because I'm not able to make the solution
clean, but I hope a Horde/Turba developer can make immediate use of it.

The dirty (but simple and working) solution is:

- configure PHP with libiconv (--with-iconv), which this functionality
  requires; libiconv is the only free software I found that can do
  UTF-8->ISO-8859-2 recoding.
- edit /horde/turba/lib/Driver/ldap.php and change it in two places:

from:   $val = utf8_encode($val);
to:     $val = libiconv("ISO-8859-2", "UTF-8", $val);

and

from:   $addr[$field] .= utf8_decode($entry[$field][$j]);
to:     $addr[$field] .= libiconv("UTF-8", "ISO-8859-2", $entry[$field][$j]);
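
For illustration, the two replacements above can be wrapped in one small
helper. This is only a sketch, not Turba code: the function name
recode_or_keep() is hypothetical, and it calls iconv(), the name under which
PHP's iconv extension normally exposes this function (whether the libiconv()
spelling is available depends on how PHP was built). The helper falls back to
the unconverted value when the conversion fails, which is an assumption about
the desired behaviour.

```php
<?php
// Hypothetical helper, not part of Turba: recode a string between charsets.
function recode_or_keep(string $val, string $from, string $to): string
{
    // iconv() returns false (and raises a notice) when the input contains
    // a sequence it cannot convert; keep the original value in that case.
    $out = @iconv($from, $to, $val);
    return $out === false ? $val : $out;
}

// "Łódź" written as explicit UTF-8 bytes, so the file encoding does not matter.
$utf8   = "\xC5\x81\xC3\xB3d\xC5\xBA";

// Direction used when reading from LDAP (UTF-8 -> ISO-8859-2) ...
$latin2 = recode_or_keep($utf8, 'UTF-8', 'ISO-8859-2');

// ... and when writing back to LDAP (ISO-8859-2 -> UTF-8).
$back   = recode_or_keep($latin2, 'ISO-8859-2', 'UTF-8');
```

In ISO-8859-2 each of the three Polish letters takes a single byte, so
$latin2 is four bytes long, and converting it back gives the original UTF-8
string.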

I was not able to use $language and $nls['charsets'][$language] in ldap.php
because they were undefined there (I didn't work hard on it, though). It
should be possible to define the frontend and backend charsets in the
configuration files rather than in ldap.php itself.
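
A rough sketch of what configuration-driven charsets might look like follows.
The array keys 'frontend_charset' and 'backend_charset' and the two function
names are made up for this example; they are not existing Turba options, and
the real driver would need to read them from wherever the source definition
lives.

```php
<?php
// Hypothetical source definition. The 'frontend_charset' and
// 'backend_charset' keys are assumptions, not existing Turba settings.
$source = [
    'type'             => 'ldap',
    'frontend_charset' => 'ISO-8859-2', // what the browser sends and expects
    'backend_charset'  => 'UTF-8',      // what the LDAP server stores
];

// Convert a value coming from the web form before writing it to LDAP.
function to_backend(array $source, string $val): string
{
    $out = @iconv($source['frontend_charset'], $source['backend_charset'], $val);
    return $out === false ? $val : $out; // keep the input if conversion fails
}

// Convert a value read from LDAP before showing it to the user.
function to_frontend(array $source, string $val): string
{
    $out = @iconv($source['backend_charset'], $source['frontend_charset'], $val);
    return $out === false ? $val : $out;
}

$stored = to_backend($source, "\xB1");   // "ą" as one ISO-8859-2 byte
$shown  = to_frontend($source, $stored); // back to the frontend charset
```

With this layout a site using another charset would only change the two
configuration values, not the driver code.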

One should also change 'utf8' to 'UTF-8' in turba/config/sources.php and
ldap.php to be compatible with the libiconv charset naming convention.
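
If editing every occurrence is not practical, the name could instead be
normalized in one place before it is handed to the conversion function. The
following is a minimal sketch under that assumption; normalize_charset() and
its alias table are hypothetical and cover only the spelling mentioned above.

```php
<?php
// Hypothetical helper: map loosely written charset names to the spelling
// used elsewhere in this message ('UTF-8', 'ISO-8859-2').
function normalize_charset(string $name): string
{
    // Alias table for illustration only; extend as needed.
    $aliases = ['utf8' => 'UTF-8'];
    $key = strtolower(trim($name));
    // Unknown names are simply upper-cased, e.g. 'iso-8859-2' -> 'ISO-8859-2'.
    return $aliases[$key] ?? strtoupper($key);
}

$a = normalize_charset('utf8');
$b = normalize_charset('iso-8859-2');
```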

Hope it will be helpful for the developers. Thank you.

--
Maciek Uhlig
Computer Center, University of Silesia, Katowice, Poland