[dev] suggestion - turba, LDAP, UTF-8, ISO-8859-2

Jan Schneider jan@horde.org
Wed, 23 Jan 2002 22:44:53 +0100


Quoting Maciej Uhlig <muhlig@us.edu.pl>:

> There were some messages on Unicode support for LDAP in Turba in
> November 2001.
> Quoting Jan:
> http://marc.theaimsgroup.com/?l=horde-dev&m=100619132529927&w=2:
> 
> >We just started to be aware of any unicode issues. So we don't have any
> >direction yet how to handle such things.
> >But in this special case it not only depends on us how to handle unicode
> >but also on the storage backend. I don't think we can rely on them to
> >handle unicode entries correctly.
> 
> >If the data in the windows addressbook is utf-8 I guess the best thing we
> >can do now is to use utf8_decode().
> 
> Well, I live in Poland. We use ISO-8859-2 as charset.
> utf8_decode|utf8_encode work for ISO-8859-1 only (this is noted as PHP
> bug/feature request 12225 BTW).
> So, the current Turba /horde/turba/lib/Driver/ldap.php code is unusable
> for us here.
> Instead of Polish diacritic marks, you simply get "?". Of course, the
> OpenLDAP server stores these marks as Unicode UTF-8.
> 
> I'd like to propose an idea of the solution, which actually works for me
> very well.
> I don't provide a patch because I'm not able to make the solution clean,
> but I hope a Horde/Turba developer can make instant use of this.
> 
> The dirty (but simple and working) solution is:
> 
> - configure PHP with libiconv (--with-iconv) - which would be needed for
>   this functionality - libiconv is the only free software I found which
>   can do UTF-8->ISO-8859-2 recoding.
> - edit /horde/turba/lib/Driver/ldap.php and change it in two places:
> 
> from: 	$val = utf8_encode($val);
> to: 		$val = libiconv("ISO-8859-2", "UTF-8", $val);
> 
> and
> 
> from: 	$addr[$field] .= utf8_decode($entry[$field][$j]);
> to: 		$addr[$field] .= libiconv("UTF-8", "ISO-8859-2", $entry[$field][$j]);
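
The two replacements quoted above could be wrapped roughly as follows. This
is only a sketch, not the actual Turba driver code: the function names
toLdapCharset() and fromLdapCharset() and the $charset parameter are
hypothetical, and it assumes a PHP build where the conversion function is
exposed as iconv() rather than under the libiconv() name used in the quoted
message.

```php
<?php
// Hypothetical helper: encode a value from the local charset to UTF-8
// before it is written to the LDAP server.
function toLdapCharset($val, $charset = 'ISO-8859-2')
{
    return iconv($charset, 'UTF-8', $val);
}

// Hypothetical helper: decode a UTF-8 value read from the LDAP server
// into the local charset.
function fromLdapCharset($val, $charset = 'ISO-8859-2')
{
    return iconv('UTF-8', $charset, $val);
}

// Usage: the Polish word "łódź" given as ISO-8859-2 bytes.
$local = "\xB3\xF3d\xBC";
$utf8  = toLdapCharset($local);
echo bin2hex($utf8), "\n";                      // prints c582c3b364c5ba
echo bin2hex(fromLdapCharset($utf8)), "\n";     // prints b3f364bc
```

Passing the charset as a parameter instead of hard-coding "ISO-8859-2" is
one way to keep the change usable for other locales as well.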

I took a closer look at libiconv for the first time after this message and 
started wondering whether we could use libiconv for more than converting 
Unicode data from LDAP.

I have this scenario in mind:
We determine through a Browser:: method whether the browser supports 
Unicode, using the browser version or the HTTP Accept-Charset header. If it 
doesn't, we keep things going like they used to.
If it does support Unicode, we put the iconv handler on the output buffer 
stack (after HTTP compression if enabled). If we come across some content 
that's not in the selected language's charset, like emails in a different 
charset, we print out the content so far (now converted into Unicode), print 
the other content (also converted into Unicode by iconv), and push the 
iconv handler onto the output buffering stack again.
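
A rough sketch of that flow, under several assumptions: acceptsUtf8(),
makeUtf8Handler() and printAs() are hypothetical names (not existing
Browser:: methods), closures need PHP 5.3 or later, only the Accept-Charset
header is examined (no browser version check), and HTTP compression is left
out entirely.

```php
<?php
// Hypothetical check: does an Accept-Charset header value admit UTF-8?
function acceptsUtf8($acceptCharset)
{
    $utf8 = null;   // q-value given explicitly for utf-8, if any
    $wild = null;   // q-value given for the "*" wildcard, if any
    foreach (explode(',', strtolower($acceptCharset)) as $item) {
        $parts = explode(';', trim($item));
        $name  = trim($parts[0]);
        $q     = 1.0;   // a missing q parameter counts as q=1
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($name === 'utf-8') {
            $utf8 = $q;
        } elseif ($name === '*') {
            $wild = $q;
        }
    }
    if ($utf8 !== null) {
        return $utf8 > 0;
    }
    return $wild !== null && $wild > 0;
}

// Hypothetical factory: an output-buffer callback that recodes everything
// printed while it is active from $charset to UTF-8.
function makeUtf8Handler($charset)
{
    return function ($buffer) use ($charset) {
        $converted = iconv($charset, 'UTF-8', $buffer);
        // iconv() returns false on failure; pass the buffer through then.
        return $converted === false ? $buffer : $converted;
    };
}

// Hypothetical helper: print one chunk of content that is in $charset.
function printAs($charset, $content)
{
    ob_start(makeUtf8Handler($charset));
    echo $content;
    ob_end_flush();   // runs the handler and releases the converted chunk
}

// Usage: UI text in ISO-8859-2 followed by a message body in ISO-8859-1.
if (acceptsUtf8('iso-8859-2, utf-8;q=0.8')) {
    printAs('ISO-8859-2', "\xB3");   // emitted as UTF-8 bytes c5 82
    printAs('ISO-8859-1', "\xE9");   // emitted as UTF-8 bytes c3 a9
}
```

Starting and ending one buffer per chunk is a simplification of swapping
handlers on a long-lived stack, but the effect on the output should be the
same: each chunk leaves the script already converted to UTF-8.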

The result would be fully Unicode-enabled apps, where you can have the UI in 
Russian and read Chinese messages without any information loss.

Thoughts on this appreciated.

Jan.

--
::::::::::::::::::::::::::::::::::::::::
AMMMa AG - discover your knowledge
:::::::::::::::::::::::::::
Detmolder Str. 25-33 :: D-33604 Bielefeld
fon +49.521.96878-0 :: fax  +49.521.96878-20
http://www.ammma.de
::::::::::::::::::::::::::::::::::::::::::::::