[i18n] help.xml translation problem

Mon Jun 2 12:20:02 UTC 2008

Hello Vilius Šumskas,

<http://www.horde.org/horde/docs/?f=po_README.html> says,
w.r.t. translations:
> It is important to set the correct charset for the locale in the
> Content-Type: header
and w.r.t. help files:
> The help files must be encoded in the language's preferred character
> set.

I had asked:
> Now, what would happen, if you
> - set the Content-Type header in a translation source to UTF-8,
> - convert both that translation and the pertinent help file to UTF-8,
> - and re-compile the translation?
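To make the first two steps concrete, here is a minimal sketch in Python
(not Horde code). The helper name po_to_utf8 and the sample catalogue are
made up for illustration, and I assume the source file really is encoded
in ISO-8859-13. The re-compilation step is not covered.

```python
import re

def po_to_utf8(raw: bytes, old_charset: str = "iso-8859-13") -> bytes:
    """Re-encode the bytes of a .po file as UTF-8 and update its header.

    Hypothetical helper: assumes the header declares the charset in the
    usual "Content-Type: text/plain; charset=..." form.
    """
    text = raw.decode(old_charset)
    # Rewrite only the first charset declaration, i.e. the one in the header.
    text = re.sub(
        r"(Content-Type:\s*text/plain;\s*charset=)[\w.-]+",
        r"\1UTF-8",
        text,
        count=1,
    )
    return text.encode("utf-8")

# Made-up sample catalogue, stored in ISO-8859-13 as a Lithuanian one might be.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=ISO-8859-13\\n"\n'
    "\n"
    'msgid "Thank you"\n'
    'msgstr "Ačiū"\n'
).encode("iso-8859-13")

converted = po_to_utf8(sample)
```

The pertinent help file would need the same decode/encode treatment.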

You have written:
> Well, firstly preferred character set for the language Arminas is trying to
> translate *is* ISO-8859-13.

So, the very 1st question is: Where is that preferred character set
for any language defined? I was under the impression that it is the
charset defined in the Content-Type header of the pertinent translation,
but I could be completely wrong.

And the answer should go into the horde/po/README file.

You also have written:
> Secondly, special characters in help files must be entered numerically
> especially when they are Unicode.

I guess this is a misunderstanding.

<http://www.horde.org/horde/docs/?f=po_README.html> says,
w.r.t. help files:
> "There are no predefined entities beyond the XML standard entities [...]
> Any character available in the language's preferred character set
> can be entered as a numerical character reference (based on its Unicode
> scalar value), such as &#160; for the No-Break Space character."

This quote says "can", not "must". Incidentally, I am the author of
that paragraph, and I meant to make two points:
- you cannot use, e. g., "&nbsp;" in "&#160;"'s stead,
- you can only use the characters available in the language's preferred
   character set (wherever this may be defined), not arbitrary Unicode
   characters (as in many other XML sources).
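The first point can be checked with any plain XML parser. The sketch below
uses Python's xml.etree rather than Horde's own parser, so it only shows
the general XML behaviour; the element name "para" is made up.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by a plain XML parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# "&nbsp;" is an HTML entity, not one of the XML standard entities,
# so without a DTD declaring it the parser rejects the document.
named_ok = parses("<para>a&nbsp;b</para>")

# The numerical character reference needs no declaration.
numeric_ok = parses("<para>a&#160;b</para>")

# The reference is resolved to the No-Break Space character, U+00A0.
text = ET.fromstring("<para>a&#160;b</para>").text
```

The second point is a restriction of the help files, not of XML itself, so
a generic parser will not enforce it.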

Do you think I should have been more explicit? Would you prefer a different
wording, and if so, which one?

If UTF-8 happens to be the language's preferred character set, then you
can enter any conceivable character, as UTF-8 comprises the entire UCS
(Universal Character Set). On the other hand, if the preferred character
set is, e. g., ISO 8859-1 (as for German), you cannot even include a
particular character present on the German keyboard, viz. '€', as this
one is not contained in that charset.
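This is easy to verify. The following sketch (Python; the helper name
in_charset is made up) simply tries to encode a character in a given
charset:

```python
def in_charset(ch: str, charset: str) -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

euro_in_latin1 = in_charset("€", "iso-8859-1")         # Euro sign, U+20AC
euro_in_utf8 = in_charset("€", "utf-8")
sharp_s_in_latin1 = in_charset("ß", "iso-8859-1")      # ordinary German letter
apostrophe_in_latin1 = in_charset("’", "iso-8859-1")   # curly apostrophe, U+2019
```

Only the UTF-8 check and the one for 'ß' succeed; the Euro sign and the
curly apostrophe are both rejected for ISO 8859-1.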

So I guess the best long-term route for translators would be to use
UTF-8 throughout. This would give them the freedom to express their
thoughts freely, in particular, to use typographically sound
punctuation characters, such as (curly) apostrophes, (anti-symmetric)
quote symbols, and the correct sorts of hyphens, dashes and similar
marks -- all of these characters are missing from the ISO 8-bit
character sets. Hence my question.

> Thirdly, I think translation.php is not Unicode aware. But I could be wrong.

How, then, can km_KH and sl_SI work? Both of those translations have
utf-8 defined as their respective charset (in the pertinent .po files
for Horde, Imp, and Turba).

Jan, can you shed some light on the whole issue?

Best wishes,
   Otto Stolz