[dev] shortcuts

Eric Rostetter eric.rostetter at physics.utexas.edu
Fri Apr 25 15:04:53 PDT 2003


Quoting Jan Schneider <jan at horde.org>:

> > No, the problem can potentially happen with any multi-byte character
> > set except possibly utf, so it should be enabled for many if not all
> > multi-byte character sets except for maybe utf ones.
> >
> > Chuck has already changed it for utf-8, so your problem is at least for
> > now gone.  This is a fringe case, that I don't have a good answer for.
> > Basically, AFAIK, this breaks big-5 support again if utf-8 is enabled.
>
> No, it won't break it.

Correct.  I just tested, and text is only broken when utf isn't enabled.
So Chuck's change was correct, as far as "breaking" things go.  It may
not be correct as far as providing access keys go, since what you end
up with is no access keys in most cases.  But no access keys is better
than unreadable text.

> And to be honest, your "fix" is incorrect because it
> only fixes a symptom.

No, I think it is more that my fix is incomplete and not well tested with
all setups, as it says in the commit log.  I feel it doesn't fix
a symptom, but rather it partially fixes a problem, or actually several
problems.  These problems may be due to the way we do access keys, and
maybe changing the way we do access keys is a better solution.

The use of it, as it stands, for the main reason I wanted to do it, for
the utf-8 charset, was wrong however.  (And why I noted this after my
commit, that I hadn't tested it and had no idea what it would do for
utf-8 support, asking for comments/testing).

Sorry for the above two terrible sentences. ;)

> The real problem is that mbstring doesn't support
> Big-5 (along with some other mb charsets) though it claims to do so.

That may be, but truthfully I know so little about the subject that it
is meaningless to me.  And as long as it isn't working, then the fix
is still valid, even if it is only working around a limitation.

> UTF-8 doesn't need to be handled by your workaround because it _is_ supported
> by mbstring, so it's safe to take single characters out of a string with
> our multibyte-safe String:: functions.

Seems to be correct.  I didn't realize this, and no one responded to my
requests for info or testing.  But it still doesn't solve the problem
of access keys, only the problem of access keys corrupting the text strings.
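
To make the corruption concrete, here is a minimal sketch in Python (not
the actual Horde PHP code) of what happens when access-key markup is
inserted by byte position rather than by character position.  The helper
names are hypothetical, and the sketch assumes the string routine treats a
Big-5 string as plain bytes, which is the effect of the charset not being
handled correctly:

```python
# Illustrative only: these helpers are hypothetical, not Horde's String:: API.

def mark_first_byte(raw: bytes) -> bytes:
    """Wrap the first *byte* in <u>..</u>, as a charset-unaware routine would."""
    return b"<u>" + raw[:1] + b"</u>" + raw[1:]


def mark_first_char(text: str) -> str:
    """Wrap the first *character* in <u>..</u>."""
    return "<u>" + text[:1] + "</u>" + text[1:]


label = "中文"               # two characters, four bytes in Big-5
raw = label.encode("big5")

# Byte-wise: the markup lands in the middle of the first character,
# so the bytes no longer decode to the original text.
naive = mark_first_byte(raw).decode("big5", errors="replace")

# Character-wise: the markup wraps a whole character and survives a
# round trip through the Big-5 encoding.
safe = mark_first_char(label)
```

The byte-wise version is the "unreadable text" case; the character-wise
version is what a multibyte-safe routine gives when the charset really
is supported.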

> What we need to do to really fix this problem, instead of working around it,
> would be to find out why some charsets aren't working in mbstring, document
> how to enable them, and find a way to detect what charsets are supported
> and handle only these separately, not just every multibyte charset.

No.  This might prevent the text corruption, but it wouldn't provide access
keys for most multi-byte charsets.  So more than just this is needed.  And
more than just my patch so far, as was implied when I said that my changes
were an initial attempt and not a complete solution.

> Btw, is there any special reason, that you set the maximum bytes per
> character in nls.php instead of a simple bool flag if this charset is a
> multibyte one?

Because, as I noted in the commit log, I didn't know if this information would
be useful elsewhere, or if we would want to expand it for use elsewhere, etc.
So I wanted to err on the side of too much info rather than too little.
If it turns out that we have no use for this (or other additional info, like
minimum bytes, etc) then we can change it to a boolean later.
I hinted at this in the CVS commit log also, I believe.
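
As a rough sketch of the trade-off (in Python, and not the real contents
of nls.php), a table of maximum bytes per character carries strictly more
information than a flag, because the flag can be derived from it.  The
table name, the helper, and the choice to treat unknown charsets as
single-byte are assumptions made for illustration:

```python
# Hypothetical table: maximum encoded length of one character, in bytes.
MAX_BYTES_PER_CHAR = {
    "us-ascii": 1,
    "iso-8859-1": 1,
    "big5": 2,
    "shift_jis": 2,
    "euc-jp": 3,
    "utf-8": 4,
}


def is_multibyte(charset: str) -> bool:
    """Derive the simple yes/no flag from the richer table.

    Assumption: a charset missing from the table is treated as single-byte.
    """
    return MAX_BYTES_PER_CHAR.get(charset.lower(), 1) > 1
```

Going the other way, from a boolean back to a byte count, is not
possible, which is the argument for storing the count first and
simplifying later if nothing else needs it.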

So, to recap, there are multiple problems we need to solve here.

1) Without my changes, access keys corrupt the text of some multi-byte charsets.
2) With or without my changes, non-corrupted text strings in multi-byte charsets
   have no access keys provided.
3) With my changes, the assigned access keys for charsets affected by #1
   have no real relationship to the string itself.
4) For some multi-byte charsets, there would be no way to enable native
   access keys since the native characters can't be entered on most
   keyboards, so an alternative is needed.  The standard alternative used
   by most software is to use roman (us-ascii) letters in parentheses after
   the string.
5) From my testing, there are still problems with the preference to disable
   the access keys (it isn't always respected) so simply having people who
   are affected turn off access keys is not a feasible solution until these
   bugs are fixed.
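
The alternative described in #4 could look something like the following
Python sketch.  It is not a proposal for the actual implementation; the
function name, the "prefer a letter already in the label" rule, and the
way used keys are tracked are all assumptions made for illustration:

```python
import string
from typing import Optional, Set, Tuple


def assign_access_key(label: str, used: Set[str]) -> Tuple[str, Optional[str]]:
    """Return (label to display, access key or None).

    `used` holds the lowercase keys already taken on the page and is
    updated in place.
    """
    # Prefer an unused us-ascii letter that already appears in the label.
    for ch in label:
        if ch in string.ascii_letters and ch.lower() not in used:
            used.add(ch.lower())
            return label, ch.lower()

    # Otherwise append the first unused roman letter in parentheses.
    for ch in string.ascii_uppercase:
        if ch.lower() not in used:
            used.add(ch.lower())
            return "%s (%s)" % (label, ch), ch.lower()

    # Every letter is taken: show the label without an access key.
    return label, None
```

With an empty `used` set, a label such as "Inbox" keeps its text and gets
the key "i", while a label with no roman letters gets a letter appended
in parentheses, so the key always has a visible relationship to the label.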

I consider #1 the biggest problem, so it was my starting point.  With my
patches, #1 is no longer a problem.  To me, this is at least a good temporary
improvement, until we can work out a better and more charset-friendly
access key system.

I'd probably try to solve #5 next.  But if people want to, or want me to,
work on the other problems (2-4) that's fine also.

> Jan.

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Why get even? Get odd!

