Unicode Field routines
Difficulties with Unicode in Revolution are persuading me NOT to move my extensive Chinese study materials from HTML+JavaScript into Revolution. That is disappointing. Just when I think I am beginning to make progress, another problem arises. For example:
A stack which I have been studying (http://revolution.byu.edu/unicode/unico ... utines.rev) says "And field 1 contains four "000A" characters, you can see they are RED." Yes, I see them, and they are red. I also find, in my own exploratory work, that this particular Chinese character is always a problem. Why? (Curious that no explanation was supplied in that stack!)
JC
Re: Unicode Field routines
Hi JC,
Do you still want to try to solve this problem or have you decided not to use Revolution? If you still want to continue with Revolution, why is 000A causing you problems? Is this character a return character?
Best,
Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode
Re: Unicode Field routines
Mark, I'm still trying to make up my mind. This character, for example, 上, referred to in the source I quoted as 000A, is always omitted when I use code to move Chinese from one field, where I have pasted it, to another. It has taken me several hours to obtain Chinese-containing text from an external file, put it into a field, and then split the 3 items of each line (Chinese characters, pinyin representation, and English) into 3 other fields. So I have just about completed (apart from the 上 problem) the first two of a set of functions I am going to need for Chinese work. I hope I'm going to get better at it. If I do, I'll stay with Revolution. But if the 上 problem remains, or others appear, I'll be very sad, but I'll have to go back to quick'n'easy javascript.
JC
Re: Unicode Field routines
Further study of (http://revolution.byu.edu/unicode/unico ... utines.rev) reveals the use of "countUnicodeLines(tdata)" and "UnicodeLineOffset(tdata,5)". These, and any others relating to Unicode, could be useful to me. But I cannot find them defined anywhere in the stack in question, nor can I find them in the Revolution Dictionary. So how can they work? And is there some other source of ready-made and potentially useful Unicode functions which I don't know about?
JC
Re: Unicode Field routines
Hi JC,
The routines are self-defined functions. They are in the script of the stack, the stack script. Have a look at it. You can see from the rest of the stack how they are called and what to do with what they return. Here they are:
Code: Select all
function countUnicodeLines @tdata
  put number of characters of tdata into tlength
  set useunicode to true -- chartonum now reads 2 bytes as one character
  put 1 into tlinecount
  -- step through the data one 2-byte character at a time
  repeat with i = 1 to tlength step 2
    if chartonum(char i to i+1 of tdata) is 10 then add 1 to tlinecount
  end repeat
  return tlinecount
end countUnicodeLines
Code: Select all
function UnicodeLineOffset @tdata,whichline
  put number of characters of tdata into tlength
  set useunicode to true -- chartonum now reads 2 bytes as one character
  put 1 into tlinecount
  put 1 into tlineoffset
  repeat with i = 1 to tlength step 2
    if tlinecount is whichline then exit repeat
    if chartonum(char i to i+1 of tdata) is 10 then
      add 1 to tlinecount
      put i+2 into tlineoffset -- the next line starts right after the 2-byte line break
    end if
  end repeat
  -- asking for a line beyond the last one returns an offset just past the data
  if whichline > tlinecount then put tlength + 1 into tlineoffset
  return tlineoffset
end UnicodeLineOffset
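As a rough illustration of how the two functions above might be combined, here is a minimal sketch that pulls one whole line out of unicodeText data. The function name unicodeLine and the field numbers are made up for this example, and it assumes both functions above are available in the stack script.

```livecode
-- hypothetical helper: returns line pLine of UTF-16 data as binary data
function unicodeLine @pData, pLine
  put UnicodeLineOffset(pData, pLine) into tStart
  if pLine >= countUnicodeLines(pData) then
    -- the last line runs to the end of the data
    put the number of chars of pData into tEnd
  else
    -- stop just before the 2-byte line break that precedes the next line
    put UnicodeLineOffset(pData, pLine + 1) - 3 into tEnd
  end if
  return char tStart to tEnd of pData
end unicodeLine

on mouseUp
  put the unicodeText of fld 1 into tData
  set the unicodeText of fld 2 to unicodeLine(tData, 2)
end mouseUp
```

Asking for a line beyond the last one should simply return empty, because the start offset then lies past the end of the data.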
regards
Bernd
Re: Unicode Field routines
Hi JC,
Can you attach a simple text containing the 上 character? Just to make sure that I have a unicode file with some valid Chinese? I would like to do a little experiment with it.
As I said, 000A is a return character. It isn't a 上 character. Just isn't. Apparently, there is some incompatibility between your text and the UTF-16 encoding.
Best regards,
Mark
Re: Unicode Field routines
Hi Mark. This is the first time I have ever attached anything to a reply. I don't know if I have done it correctly. My stored texts are usually in UTF-8 format. I have made a second copy of the one-line sample in UTF-16 format, and am hoping to upload both.
Thanks for your interest
JC
Re: Unicode Field routines
No attachment 

Re: Unicode Field routines
Hi Mark,
I'm having a bad morning. I don't know how to send attachments, and cannot find out. The 17 references to "attach" in the FAQ do not tell me. I went through the process of selecting the files to upload, and assumed that they would then be uploaded when I submitted the message. I must have omitted an important step.
I have checked in the stack where I got the idea that 上 is 000A. It says "There are 14 lines in field 1.
And field 1 contains four "000A" characters,
you can see they are RED."
The red character is 上. The text is not Chinese. I assume it is Japanese. But that should not affect a character's unicode number, should it?
The two little files I have tried to send you each contain
上,shang4,above 上,shang4,above
PS Bernd, thanks for your help on another matter relating to unicode.
Re: Unicode Field routines
JC,
zip your stuff before uploading. Then it will work whatever is in it.
regards
Bernd
Re: Unicode Field routines
JC,
To put an end to all the confusion, the hex equivalent of 上 is 4e0a (decimal 19978) and the hex equivalent of a return (actually a linefeed in RunRev) is 000a. The unicode stack to which you linked works correctly, but the mention of the 上 characters makes no sense to me.
Best,
Mark
Re: Unicode Field routines
Thank you Mark and Bernd. Unfortunately many of the sources I have been using to understand and learn how to manipulate text containing unicode appear maddeningly incomplete (their authors assume I know more than I do) or "make no sense" as in the case of the specific character we were discussing. I have dropped the approach I was pursuing, and am trying an alternative which looks more promising.
On a different matter entirely, why are the methods of opening a card script and a stack script so different?
JC
Re: Unicode Field routines
JC,
If you have a stack open and in front, look at the Object menu: there you have the options "Card Script" and "Stack Script". They lead you to the respective scripts.
In the property inspector, if the focus is on the stack, you can choose "edit script" from the right arrow below the lock.
In the "Tools" menu, if you choose the "Application Browser", select the stack in the left pane and then right-click, it offers to take you to the script of the stack.
So there are many ways to get at the script. The "Application Browser" tells you the number of lines of a script, so there you can see whether there is a script at all and which objects have scripts.
In Rev it can be confusing at first where all the scripts are, since virtually any object can have a script. It helps to read up on the "message hierarchy", either in Shafer's book or in the "Revolution User Guide".
It just takes a little getting used to, but once you understand the message hierarchy it is quite logical.
regards
Bernd
Re: Unicode Field routines
For several days I made pleasing progress. I was splitting Chinese dialogues, each one in a field, into separate lines of dialogue, one line per field. New fields were created as needed. Using callbacks from a player, I could then get each line of text to be shown as it was spoken by the player. Splendid. I was on the brink of being well and truly hooked. Then I noticed that some of my lines of dialogue were being split into two lines, with two new fields being created instead of one.
As soon as I was aware of the problem, it was short work to locate the source of the problem. It's that 上 character, behaving as though it is a 'return'.
I made a couple of tests (with acknowledgements to Devin Asay of Brigham Young University) with a button and a field. I put the character 上 into a field called chinText:
on mouseUp
  set the useUnicode to true
  put charToNum(char 1 to 2 of fld "chinText") -- returns 19978
end mouseUp
Then I tried it the other way, using 19978:
on mouseUp
  set the useUnicode to true
  set the unicodeText of fld "chinLetter" to numToChar(19978) -- the letter 上 should appear in the field, and it does.
end mouseUp
So far so good. But when I try to move this line of a field:
这是你第上一次来中国吗?
into a new field of its own, I get two new fields
这是你第
一次来中国吗?
You see the problem. The 上 has disappeared. A return has taken its place.
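One way to see why this particular character could be mistaken for a return is to split its number into its two bytes. The sketch below is only arithmetic on the value 19978 from the test above; the variable names are made up for illustration.

```livecode
on mouseUp
  -- 19978 is hex 4E0A, the number charToNum reported for 上
  put 19978 div 256 into tHigh -- 78 (hex 4E)
  put 19978 mod 256 into tLow  -- 10 (hex 0A), the same value as a linefeed
  put tHigh && tLow
end mouseUp
```

So one of the two bytes stored for 上 has the value of a linefeed. Anything that scans the raw bytes for line breaks, as the line chunk appears to do here, would be expected to stop at that byte.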
The code which moves the text is a line such as:
set the unicodeText of fld "Field2" to the unicodeText of line lineNumber of fld "Field1"
It always works, unless there is a 上 in it.
The character 上 is listed as being the fourteenth most common character in the Chinese language. Shall I write special character-by-character checks to watch out for it whenever it appears? No way.
JC
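A possible way around character-by-character checks is to do the line splitting on UTF-8 instead of on the raw unicodeText. This is a minimal sketch, assuming the uniDecode and uniEncode functions with the "UTF8" language parameter behave as in the legacy (pre-7) engines; the handler name copyUnicodeLine and the parameter pLineNumber are hypothetical.

```livecode
-- hypothetical handler: copies one line of fld "Field1" into fld "Field2"
on copyUnicodeLine pLineNumber
  -- convert the field's 2-byte data to UTF-8, where byte 10 only ever means a line break
  put uniDecode(the unicodeText of fld "Field1", "UTF8") into tUTF8
  put line pLineNumber of tUTF8 into tLine
  -- convert the chosen line back before putting it into the target field
  set the unicodeText of fld "Field2" to uniEncode(tLine, "UTF8")
end copyUnicodeLine
```

The reasoning is that UTF-8 stores every non-ASCII character using only bytes of 128 and above, so neither a linefeed (10) nor a comma (44) can turn up inside 上. The same idea should therefore also work for item chunks, such as the three comma-separated parts of a vocabulary line.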
Re: Unicode Field routines
Hi JC,
Apparently, you are doing something wrong when splitting your fields. You need to keep in mind that unicodeText is binary. It isn't text. For example, if you use syntax such as
Code: Select all
put line 1 of fld 1 into x
put word 2 to -1 of fld 1 into x
put item 4 of fld 1 into x
things will go completely wrong. One of the reasons is that each unicode character in unicodeText consists of two bytes. These bytes can have the same values as commas, tabs, returns, linefeeds, quotes, etc. In other words, all references to items, lines and words are completely useless. That's why you need to write your own routines to find the correct items, lines and words (probably that was the purpose of the stack, which had the 上 characters coloured red). This explains why
Code: Select all
set the unicodeText of fld "Field2" to the unicodeText of line lineNumber of fld "Field1"
doesn't work. Your syntax contains a reference to lines, while one of the two bytes of 上 (19978, hex 4E0A) is 0A, which is the value of a linefeed (which behaves as a return in RunRev). A real line break is a linefeed together with a NULL. To find all lines in your text, you could use a repeat loop. The following example finds the first line.
Code: Select all
put the unicodeText of fld "Field1" into tData -- binary data, 2 bytes per character
repeat with x = 1 to (length(tData) - 1) step 2
  -- a real line break is linefeed & NULL (little-endian byte order assumed)
  if byte x to (x+1) of tData is linefeed & NULL then
    -- copy everything before the line break
    set the unicodeText of fld "Field2" to byte 1 to (x-1) of tData
    exit repeat
  end if
end repeat
Note that bytes are the same as chars in this example, but using "byte" makes it clear that we are actually not working with characters, since the actual characters as displayed in the field each consist of 2 bytes.
The example should allow you to write your own scripts to find all lines in your field.
Best,
Mark
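Extending the first-line example to every line might look like the sketch below. It is only an illustration: the function name unicodeLines is hypothetical, and it assumes the same little-endian byte order (linefeed first, then NULL) as the example above.

```livecode
-- hypothetical helper: returns an array with one element per line of the data
function unicodeLines pData
  put 1 into tStart
  put 0 into tCount
  put length(pData) into tLength
  -- walk through the data one 2-byte character at a time
  repeat with x = 1 to (tLength - 1) step 2
    if byte x to (x + 1) of pData is linefeed & null then
      add 1 to tCount
      put byte tStart to (x - 1) of pData into tLines[tCount]
      put x + 2 into tStart -- the next line starts after the line break
    end if
  end repeat
  -- whatever follows the last line break is the final line
  add 1 to tCount
  put byte tStart to tLength of pData into tLines[tCount]
  return tLines
end unicodeLines
```

It could then be used along these lines, with lineNumber holding the number of the line wanted:

```livecode
put unicodeLines(the unicodeText of fld "Field1") into tLines
set the unicodeText of fld "Field2" to tLines[lineNumber]
```

Because the loop only compares aligned 2-byte pairs, a character such as 上 should no longer be mistaken for a line break.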