Converting Unicode to single-byte text?
I am reading a tab-delimited Unicode file (UTF-16, I think) and I want to convert it to simple single-byte text so that it is easy to parse. For example, if I search for a string such as "productX", nothing matches, because my search string is single-byte while the text from the file is two-byte Unicode.
answer file "Please select a file:"
if it is empty then exit mouseUp
put it into field "File path"
open file fld "File path"
read from file fld "File path" until EOF
put it into fld "file text"
This puts the Unicode text into the field, where it looks like a run of characters with a space between each one. I want to get rid of the extra characters.
If I replace the last line above with something like:
put uniDecode(it) into fld "file text"
Then the text becomes a mess, mostly "????" question marks.
Ultimately, I simply want to read in about 800 to 2000 lines of tab-delimited text and parse it into a useful report. I welcome any guidance.
Skyfisher
Round Rock, Texas
Mac OS
Re: Converting Unicode to single-byte text?
I have a poor workaround. I open the file in a text editor and resave it as plain text (UTF-8). This lets me parse the file without problems, although the result still contains a fair number of junk characters.
I would really like to figure out how to convert the 2-byte text to 1-byte text within runrev, without the extra step of opening the file in a text editor, saving it, and then reopening it with my parser.
Any guidance is appreciated.
Skyfisher
Round Rock, Texas
Mac OS
Re: Converting Unicode to single-byte text?
I think I found a solution in a tutorial:
on mouseUp
   answer file "Select Text File"
   if the result is not "cancel" then
      set the unicodetext of field "text" to URL ("binfile:" & it)
   end if
end mouseUp
This seems to let me load the UTF-16 text without further manipulation.
Skyfisher
Round Rock, Texas
Mac OS
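Once the field displays the text correctly, the parsing step for the report might look roughly like the sketch below. It assumes the loaded text can be read back with the text of field "text" and compared with an ordinary string, which should hold at least for plain ASCII content. The function name linesMatching, its parameters, and the column number are placeholders for illustration, not anything from the tutorial.

```livecode
-- Return every line of pText whose tab-separated column pColumn contains pTerm.
-- linesMatching is a hypothetical helper name, not a built-in function.
function linesMatching pText, pTerm, pColumn
   local tLine, tResult
   set the itemDelimiter to tab
   repeat for each line tLine in pText
      if item pColumn of tLine contains pTerm then
         put tLine & return after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end linesMatching

-- Possible use, assuming the product name sits in the first column:
-- put linesMatching(the text of field "text", "productX", 1) into tHits
```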
Re: Converting Unicode to single-byte text?
Skyfisher,
To convert unicode, use the following:
Code: Select all
put unidecode(myUnicodeVar,"UTF16") into myText
or if you have UTF8 data:
Code: Select all
put unidecode(uniencode(myUnicodeVar,"UTF8"),"UTF16") into myText
If your unicode text file has a BOM signature, you have to delete that from the data before doing the conversion.
Best,
Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode
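Deleting a BOM signature before the conversion, as mentioned above, might be sketched as follows. This assumes the file was read as binary data (for example through a "binfile:" URL), and stripBOM is a hypothetical helper name. The byte values are the standard UTF-8 and UTF-16 signatures.

```livecode
-- Remove a leading byte order mark, if there is one, from raw file data.
-- stripBOM is a hypothetical helper name, not a built-in function.
function stripBOM pData
   local tB1, tB2
   if length(pData) < 2 then return pData
   put byteToNum(byte 1 of pData) into tB1
   put byteToNum(byte 2 of pData) into tB2
   -- UTF-8 signature: EF BB BF
   if tB1 = 239 and tB2 = 187 and length(pData) >= 3 then
      if byteToNum(byte 3 of pData) = 191 then
         return byte 4 to -1 of pData
      end if
   end if
   -- UTF-16 signature: FF FE (little-endian) or FE FF (big-endian)
   if (tB1 = 255 and tB2 = 254) or (tB1 = 254 and tB2 = 255) then
      return byte 3 to -1 of pData
   end if
   return pData -- no signature found, data is returned unchanged
end stripBOM
```

If the returned data is shorter than what was passed in, the file started with a BOM. Which of the two UTF-16 signatures was found also tells you the byte order of the rest of the file.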
Re: Converting Unicode to single-byte text?
Hi Mark, thank you.
When I use: put unidecode(myUnicodeVar,"UTF16") into myText
it works for a few lines out of hundreds, but the rest comes out as garbled text, as if the text had no linefeeds.
Question: How do I determine whether my text file includes a BOM signature?
Skyfisher
Round Rock, Texas
Mac OS
Re: Converting Unicode to single-byte text?
Skyfisher,
For information about BOM signatures, just search on the net: http://qurl.tk/37 and http://qurl.tk/38
I think I made a small mistake. If you want to convert unicode text into standard ASCII text, you use:
Code: Select all
unidecode(myUnicodeVar,"English")
or
Code: Select all
unidecode(myUnicodeVar)
If you want to convert to UTF8, you do
Code: Select all
unidecode(myUnicodeVar,"UTF8")
If you want to convert from UTF8 into UTF16, you do
Code: Select all
uniencode(myUnicodeVar,"UTF8")
Uniencode always converts from something into UTF16, while unidecode converts from UTF16 into something else. So, if you want to convert from Shift-JIS into UTF8 you need:
Code: Select all
unidecode(uniencode(myUnicodeVar,"Japanese"),"UTF8")
(let's hope I didn't make any mistakes this time)
Best,
Mark
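The rule above (uniencode goes into UTF16, unidecode goes out of UTF16) can be wrapped in one small function. This is only a sketch: convertEncoding and its parameter names are hypothetical, and the language names you pass in must be ones that uniencode and unidecode accept in your version of the engine.

```livecode
-- Convert pText from one encoding to another by going through UTF16.
-- convertEncoding is a hypothetical helper name, not a built-in function.
function convertEncoding pText, pFromLanguage, pToLanguage
   local tUTF16
   put uniEncode(pText, pFromLanguage) into tUTF16 -- source encoding -> UTF16
   return uniDecode(tUTF16, pToLanguage) -- UTF16 -> target encoding
end convertEncoding

-- Possible use: UTF8 data to single-byte text
-- put convertEncoding(tUTF8Data, "UTF8", "English") into tPlainText
```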