Unicode Special Casing and Locale Support

Posted: Fri May 23, 2014 11:49 am
by Peter Wood
LiveCode fails 4 of my 16 UnicodeOutOfTheBox tests. The failing tests are:

11. Special Case - Turkish - Upper case "i"
Set locale/Language to indicate Turkish
Upper case "i"
The expected result is U+0130
U+0130 is UTF-8 C4 B0

(Requires the ability to indicate that Turkish language rules should be used.)

12. Special Case - Turkish - Lower case "I"
Set locale/Language to indicate Turkish
Lower case "I"
The expected result is U+0131
U+0131 is UTF-8 C4 B1

(Requires the ability to indicate that Turkish language rules should be used.)

13. Upper Case sharp s (U+00DF)
Upper case "straße"
The expected result is "STRASSE"
U+00DF is UTF-8 C3 9F

16. Performing case insensitive comparison
Compare "weiss" with "weiß"
The expected result is true
ß is U+00DF, UTF-8 C3 9F
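
For reference, the four expected results can be sketched in Python, used here only as a neutral illustration and not as LiveCode syntax. Python's built-in upper() and casefold() follow the default (non-locale) Unicode rules, so the Turkish cases need the small helpers upper_tr and lower_tr, which are hypothetical names invented for this sketch.

```python
def upper_tr(text: str) -> str:
    """Uppercase with Turkish rules: i -> U+0130, U+0131 -> I (hypothetical helper)."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def lower_tr(text: str) -> str:
    """Lowercase with Turkish rules: U+0130 -> i, I -> U+0131 (hypothetical helper)."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


def equal_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


# Test 11: Turkish upper case of "i" is U+0130
print(upper_tr("i").encode("utf-8").hex(" "))   # c4 b0
# Test 12: Turkish lower case of "I" is U+0131
print(lower_tr("I").encode("utf-8").hex(" "))   # c4 b1
# Test 13: default upper case of sharp s
print("stra\u00dfe".upper())                    # STRASSE
# Test 16: case insensitive comparison
print(equal_ignoring_case("weiss", "wei\u00df"))  # True
```

Without the Turkish helpers, "i".upper() gives plain "I", which is the correct default result but not the one test 11 expects.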

Should I report these as a bug?

Re: Unicode Special Casing and Locale Support

Posted: Fri May 23, 2014 12:06 pm
by LCfraser
Hi Peter,

We are aware of all of these issues but the fix isn't as simple as it would seem at first.

For issues 11 and 12, as you point out, you need a method to indicate that you want the Turkish rules to be used - this is something that we are considering for future versions of LiveCode.

Issues 13 and 16 seem simple - DP1 and DP2 would have passed these tests but we changed things in DP3/DP4 so the "simple" case mapping rules are used (this means that changing the case of a character does not change its length). In other words, we went from being right in some situations and wrong in others to being right in a different set of situations but wrong in things that were previously right...
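
To illustrate the difference, here is a rough Python sketch (not LiveCode code, and not how the engine is implemented). Python's upper() applies the full mappings; approx_simple_upper is a hypothetical helper that only approximates the simple mappings by leaving a character alone whenever its full mapping would change the length. The real simple-mapping tables differ from this approximation for a few characters.

```python
def approx_simple_upper(text: str) -> str:
    """Length-preserving uppercase: an approximation of simple case mapping."""
    out = []
    for ch in text:
        mapped = ch.upper()  # Python applies the full mapping here
        # Keep the original character if the full mapping would change the length.
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


word = "stra\u00dfe"
full = word.upper()
simple = approx_simple_upper(word)
print(full, len(full))      # STRASSE 7
print(simple, len(simple))  # STRAßE 6
```

The full mapping passes test 13 but changes the string length; the length-preserving version keeps the length and fails the test.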

The problem with ß mapping to SS is that it is incorrect in some situations - we have been informed that in some contexts (e.g. certain variants of Swiss German), ß stays as it is when uppercased, while in others a special capital-ß character is used. Again, we will need some way to indicate what rules are in use (de_DE vs de_CH vs gsw_CH). Unlike the Turkish situation, however, it isn't obvious which set of rules we should adopt. We've had some debate internally about what we should do about it but haven't yet come to any conclusions.
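
One way to picture the choice is as an explicit policy argument. The Python sketch below is purely illustrative: upper_german and its sharp_s parameter are hypothetical, and it deliberately does not decide which locale should get which policy, since that is the open question.

```python
SHARP_S = "\u00df"          # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def upper_german(text: str, sharp_s: str = "SS") -> str:
    """Uppercase text, replacing ß according to the chosen policy (hypothetical)."""
    if sharp_s not in ("SS", SHARP_S, CAPITAL_SHARP_S):
        raise ValueError("sharp_s must be 'SS', U+00DF or U+1E9E")
    # Handle ß explicitly; every other character uses the default mapping.
    return "".join(sharp_s if ch == SHARP_S else ch.upper() for ch in text)


print(upper_german("stra\u00dfe"))                   # STRASSE
print(upper_german("stra\u00dfe", SHARP_S))          # STRAßE
print(upper_german("stra\u00dfe", CAPITAL_SHARP_S))  # STRAẞE
```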

(One point that was brought up was that non-German-speaking people might be confused that "strass" and "straß" would refer to the same variable, as variable names are always case-insensitive. On the other hand, German speakers would probably desire this).

Turns out, internationalisation can be tricky ;)


Re: Unicode Special Casing and Locale Support

Posted: Fri May 23, 2014 3:58 pm
by Peter Wood

Thanks for the detailed and informative response. I look forward to the final solution on locales. As somebody mentioned on the mailing list, this is a great opportunity to add value to LiveCode. A locale "object" could hold all sorts of valuable location-dependent information: time zone, number formats (999,999.99 versus 999.999,99 etc.), currency symbols, and no doubt many other properties.

On the case folding issue, I would like to take the opportunity to respond with what I hope is a pragmatic view.

You and your sources are clearly much better informed than I on the vagaries of natural languages. Perhaps I don't have a sufficiently enquiring mind as I was satisfied to read in Wikipedia that the ß character was not used in Switzerland or Liechtenstein.

My pragmatic view is that Unicode support is being implemented in LiveCode 7.0 and so it is sufficient to implement the Unicode standard.

When it comes to special casing, I would like to suggest that implementing the special casings in ... -3.2.0.txt would be the correct thing to do. In the list of special casings there are only two small sets of locale (language) specific casings - (a) Lithuanian & (b) Turkish and Azeri. All other casings in the list are not considered locale specific, at least by the Unicode Consortium.
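
As a rough sketch of how such a list could be read, the Python below assumes the semicolon-separated layout used by the Unicode Character Database's special-casing data: code point; lower; title; upper; optional conditions; then a comment. The three sample lines are typed in by hand for illustration and should be checked against the actual file, and parse_line and languages are hypothetical helper names.

```python
from typing import NamedTuple, Optional

SAMPLE = """\
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
"""


class SpecialCase(NamedTuple):
    code: str
    lower: str
    title: str
    upper: str
    conditions: tuple


def to_str(hexes: str) -> str:
    """Turn space-separated hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in hexes.split())


def parse_line(line: str) -> Optional[SpecialCase]:
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if fields[-1] == "":  # the final ';' leaves an empty field
        fields.pop()
    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return SpecialCase(*(to_str(f) for f in fields[:4]), conditions)


def languages(entry: SpecialCase) -> list:
    """Language tags are lowercase; context conditions such as Not_Before_Dot are not."""
    return [c for c in entry.conditions if c.islower()]


entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]
for e in entries:
    print(repr(e.code), "->", repr(e.upper), languages(e) or "all locales")
```

In this sample the ß entry carries no language condition, while both entries for the letter i are restricted to tr.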

So I would suggest that the standard (or default) uppercase of "ß" is "SS".

Providing additional locale support beyond that required by Unicode would be a real bonus for users that needed it.