Yet Another Unicode Question
Posted: Wed Aug 20, 2014 2:31 pm
by WaltBrown
I have a field with long lists of button labels. Here are the first two lines:
0.0; Languages; Available command languages; Bulgarian; Croatian; Czech; Danish; Dutch; English; Estonian; Finnish; French; German; Greek; Hungarian; Icelandic; Irish; Italian; Latvian; Lithuanian; Macedonian; Maltese; Norwegian Bokmål; Norwegian Nynorsk; Polish; Portuguese; Raeto-Romance Ladin; Raeto-Romance Surmiran; Raeto-Romance Sursilvan; Raeto-Romance Rumantsch Grischun; Romanian; Russian; Slovak; Slovene; Spanish; Swedish; Turkish;
1.1; Yes; Confirm operation; Да; Da,Izvrši; Ano; Ja,Udfør;Ja;Yes,Confirm;Jah,Kinnita;Jatka,Kyllä;OK,Oui;Ja,OK,Ausführen;Επιβεβαίωση; Igen,Oké;Staðfesta;Cinnte;Sì,Confermo;Jā;Taip;Да,Продолжи;Iva;Ja;Ja;Tak;Sim;Schi; Ea;Gie;Gea;Da;Да; Áno;Da,Potrdi,Ja;Sí,Confirmar;Ja,OK;Onayla;
The language list (0.0) is my selector for changing some of the button labels in my stacks based on the ETSI European Language selection. The second line (1.1) is one of a large number (74 in command language release 2.1.1) of command names in the aforementioned languages. The UTF8 text cut and pasted nicely into the field (and into this window) and allowed formatting, changing punctuation, etc. But when I select one of the items and set a button's label to it, it generally fails for the languages with non-Latin characters (totally for the Cyrillic-based ones and Greek, and partially for Turkish). I tried variations of useUnicode, unicodeLabel, etc. I believe the issue is twofold. One is my ignorance of the specific details of the Unicode implementation in LC versions. The documents seem to say I can use UTF16 on button labels? I tried setting the unicodeLabel of the buttons but that also failed miserably, giving me a wide array of ideograms instead. Is there a UTF8 to UTF16 conversion method?
The other problem, obviously not in LC's domain, is that the ETSI language list has almost no correlation to the Unicode language list, but that's for ETSI TC HF to solve, not here.
Thanks,
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 4:51 pm
by endernafi
Hi there Walt,
Using Unicode in Livecode is always tricky.
Generally, this simple method should work, assuming that the selected file is saved with UTF8 encoding.
Code: Select all
on mouseUp
   answer file "Select Label File"
   set the unicodeLabel of me to uniEncode(url("file:" & it), "utf8")
end mouseUp
I've tested the above code as follows:
* Put all your sample labels into a txt file.
** The editor should be the simplest one, I suggest Sublime Text or Notepad++ respectively for Mac or Windows.
** The editor's *Encoding Option for Save* should be UTF8.
* Read that file via the url("file:" ...) function.
* Encode it via uniEncode( ..., "UTF8").
* Put it into your button with "set the unicodeLabel of", or into your fields with "set the unicodeText of", respectively.
Here is the result (or proof that it works):
That's the safest way both for desktop and mobile.
Use a separate UTF8 encoded resource file for your text.
Btw, I don't know who translated your labels;
but using *OK* or *Ja* for Turkish is totally, utterly, completely wrong.
Being a professional translator for 8 whole years, I can suggest these for Turkish:
OK, Confirm -> Tamam, Onayla
OK, Confirm Operation -> Tamam, İşlemi Onayla
Yes, Confirm -> Evet, Onayla
Yes, Confirm Operation -> Evet, İşlemi Onayla
Hope it helps...
Best,
~ Ender
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 5:56 pm
by FourthWorld
endernafi wrote:Using Unicode in Livecode is always tricky.
Hopefully that should be past tense, to read "Unicode in LiveCode prior to v7 was always tricky".
It would be helpful if Walt had a little time to see whether what he wants to do can be done easily and gracefully in v7 (dp10 was just released this morning):
http://downloads.livecode.com/livecode/
As with all software at all times (and slightly more so since v7 is still in testing), you'll want to make sure you have good backups before using it.
But the sweeping changes in v7 will make it the reference platform for the future, so even those folks not depending on Unicode would do well to work with it as much as possible, to ensure that it works flawlessly when the final build is released.
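For a rough idea of what that could look like, here is a minimal sketch for v7. It assumes the labels file is saved as UTF8 and that ";" separates the labels, as in Walt's field; item 9 is only borrowed from the examples in this thread. It relies on textDecode and on the plain label property accepting Unicode text in v7, and it has not been checked against dp10.

```livecode
on mouseUp
   -- Assumption: the chosen file is UTF8 and uses ";" between labels.
   answer file "Select Label File"
   if it is empty then exit mouseUp -- user cancelled the dialog
   -- Read the raw bytes, then decode them into text once.
   put textDecode(url ("binfile:" & it), "UTF-8") into tText
   -- Chunk expressions work on characters here, so ordinary item
   -- chunking applies to the decoded text.
   set the itemDelimiter to ";"
   set the label of me to item 9 of tText
end mouseUp
```

If this holds up in testing, the uniEncode/uniDecode steps and the unicodeLabel property would no longer be needed for this job.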
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 6:03 pm
by endernafi
FourthWorld wrote:
Hopefully that should be past tense, to read "Unicode in LiveCode prior to v7 was always tricky".
Hi Richard,
Well no, it's not past tense since v7 isn't officially released; it's still Developer Preview as you've stated, too.
But you have a point, of course.
Hopefully, v7 will change my sentence to this:
Using Unicode was tricky; thanks to Livecode 7.0, it's like a breeze now.
~ Ender
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 6:49 pm
by FourthWorld
True, v7 isn't final yet, but it is available and needs testing.
If we put off testing until after release, we'll have made the one choice that guarantees v7 won't have been adequately tested before release.
Given the powerful scope of changes in v7, the value of testing v7 now can't be overstated if we want to be able to rely on the final version when it's released.
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 7:01 pm
by WaltBrown
Dang, I just extracted all the Turkish and posted it here, but my message didn't post!
Ender, thanks. Check http://www.google.com/url?sa=t&rct=j&q= ... 1344,d.aWw for the actual language selections. The OK and Ja you saw were Swedish, just before the Turkish entry for Onayla.
Your suggestions helped somewhat but not completely. Maybe it's Windows. I may wait for LC7 rather than waste time debugging code that will be deprecated anyway.
Best, Walt
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 7:06 pm
by WaltBrown
Oh and by the way Ender, I like your signature - "Together we are smarter". True. I have a somewhat more cynical version - "Organizations get smarter proportional to the square root of the number of members"...
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 7:17 pm
by WaltBrown
Here are two snapshots of my stack, the English and Turkish buttons (You can see the Unicode issue).
Edit: Except the last button, this was a quickie and I lost the last entry so it put the button name instead.
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 7:17 pm
by endernafi
WaltBrown wrote:Ender, thanks. Check thatLongUrl for the actual language selections.
I've read all Turkish sections; it's actually a pretty good translation.
So, my apologies to the translator in her/his absence
WaltBrown wrote:I may wait for LC7 rather than waste time debugging code that will be deprecated anyway.
That sounds like a fair decision, the trouble of debugging deprecated code and all...
WaltBrown wrote:Your suggestions helped somewhat but not completely. Maybe it's Windows.
It may be Windows or some other thing.
It's kinda hard to explain the *thing*.
Let me try, though:
If the Unicode text passes through an internal channel and is not handled properly along the way, it might get corrupted.
Example?
Putting the content of the resource file into a custom property and then reading it back from that custom property.
This is a delicate process and should be handled carefully.
There may be similar situations that corrupt the Unicode text.
Anyway, if it's not urgent, you should wait for, or start trying, LiveCode 7.
WaltBrown wrote:"Organizations get smarter proportional to the square root of the number of members"
Very true
Regards,
Re: Yet Another Unicode Question
Posted: Wed Aug 20, 2014 9:01 pm
by endernafi
Walt hi,
You got me curious so I've installed Windows 7 on a VM and tried my proposed solution:
Clearly your problem is a bit quirky.
Please try this stack and the accompanying *labels* file:
If it doesn't produce the same result as the screenshot above,
then you'll certainly know the problem is related to your particular Windows installation.
If it does indeed work as expected, then you can dig into your code.
Best,
~ Ender
Re: Yet Another Unicode Question
Posted: Thu Aug 21, 2014 5:20 am
by WaltBrown
Thanks Ender, that was a good starting point. Your stack works perfectly. I could also do:
Code: Select all
set the unicodeLabel of me to uniEncode(item 9 of url("file:" & it), "utf8")
I can also get the proper field display with:
Code: Select all
set the unicodeText of fld "fFileData" to uniEncode(url("file:" & it),"utf8")
I can get the chunk into a local variable. This works:
Code: Select all
put item 9 of url("file:" & it) into tChunk
set the unicodeLabel of me to uniEncode(tChunk, "utf8")
It fails when I need to take a chunk of the field, either through a variable or directly, and use it to set the label. The following all fail:
Code: Select all
set the unicodeLabel of me to uniEncode(item 9 of fld "fFileData", "utf8")
set the unicodeLabel of me to uniEncode(item 9 of the unicodeText of fld "fFileData", "utf8")
// This one SHOULD work!
put item 9 of fld "fFileData" into tChunk
set the unicodeLabel of me to uniEncode(tChunk, "utf8")
put uniEncode(item 9 of fld "fFileData","utf8") into tChunk
set the unicodeLabel of me to uniEncode(tChunk, "utf8")
put item 9 of the unicodeText of fld "fFileData" into tChunk
set the unicodeLabel of me to uniEncode(tChunk,"utf8")
put uniEncode(item 9 of the unicodeText of fld "fFileData","utf8") into tChunk
set the unicodeLabel of me to uniEncode(tChunk,"utf8")
put uniEncode(item 9 of the unicodeText of fld "fFileData","utf8") into tChunk
set the unicodeLabel of me to tChunk
Arrrrgh! I should have put this away hours ago, when I decided to wait for LC7. I have a hard time dropping problems though. It seems I cannot get the data into and out of a field without it getting massaged somehow.
Re: Yet Another Unicode Question
Posted: Thu Aug 21, 2014 5:40 am
by WaltBrown
I think I finally got it. This seems to work:
Code: Select all
put url("file:" & it) into tChunk
set the unicodeText of fld "fFileData" to uniEncode(tChunk,"utf8")
set the unicodeLabel of me to the unicodeText of item 9 of fld "fFileData"
I will go back to the original app and see if I can get the labels correct now, without using an external text file. I still can't put a chunk of a field into a variable, I'll need to pass around the chunk descriptor instead.
Walt
Re: Yet Another Unicode Question
Posted: Thu Aug 21, 2014 10:40 am
by endernafi
WaltBrown wrote:
Code: Select all
// This one SHOULD work!
put uniEncode(item 9 of fld "fFileData","utf8") into tChunk
set the unicodeLabel of me to uniEncode(tChunk, "utf8")
...
put uniEncode(item 9 of the unicodeText of fld "fFileData","utf8") into tChunk
set the unicodeLabel of me to tChunk
Walt,
You're encoding the chunk twice; that is, you're encoding an already encoded text.
That's not the way.
You have to decode it first.
Try this instead:
Code: Select all
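-- tRawText stands for the UTF8 text read from the labels file
set the unicodeText of fld "fFileData" to uniEncode(tRawText, "utf8")
-- decode the field's UTF16 content back to UTF8 before chunking
put uniDecode(the unicodeText of fld "fFileData", "utf8") into tChunk
-- encode only the chosen item, once, when setting the label
set the unicodeLabel of me to uniEncode(item 9 of tChunk, "utf8")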
Here is the result:
Best,
~ Ender
Re: Yet Another Unicode Question
Posted: Thu Aug 21, 2014 3:46 pm
by WaltBrown
Thanks! That did the trick. I was manipulating chunks of the data in the file after putting it into local variables, assuming I could leave them UTF8 encoded. That almost worked. Decoding the entire field contents into a local variable, manipulating, then re-encoding the selected label after manipulating the contents worked. Once I moved from the examples we have been trading into the actual app, I had to crawl through the execution path, and remove ALL intermediate uniDecode/uniEncode steps. I can only imagine that, once UTF16 or UTF32 dependent text is involved, I will have to leave the data uniEncoded during the chunking and manipulation processes and trust LC7 to be consistent.
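For anyone landing here later, the pattern that ended up working can be sketched roughly as below for pre-v7 LiveCode. It is a sketch under assumptions, not the code from my app: it assumes fld "fFileData" was already filled through its unicodeText, that ";" separates the labels, and that item 9 is the one wanted. The variable names tUTF8 and tLabel are made up for illustration.

```livecode
on mouseUp
   -- Assumption: fld "fFileData" was filled via its unicodeText.
   -- 1. Take the field content as UTF16 and convert it to UTF8 once.
   put uniDecode(the unicodeText of fld "fFileData", "UTF8") into tUTF8
   -- 2. Do all chunking on the UTF8 copy. In UTF8 the ";" delimiter is a
   --    single ASCII byte that never occurs inside a multi-byte character.
   set the itemDelimiter to ";"
   put item 9 of tUTF8 into tLabel
   -- 3. Convert to UTF16 exactly once, at the point of setting the label.
   set the unicodeLabel of me to uniEncode(tLabel, "UTF8")
end mouseUp
```

The point is to have exactly one uniDecode on the way out of the field and exactly one uniEncode on the way into the label, with nothing in between re-encoding the data.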
Thanks again.
Re: Yet Another Unicode Question
Posted: Thu Aug 21, 2014 7:01 pm
by endernafi
You're most welcome,
glad that I could help...
~ Ender