Converting Unicode to single-byte text?

Skyfisher
Posts: 15
Joined: Fri Jan 08, 2010 2:46 am

Post by Skyfisher » Sat Jan 23, 2010 12:11 am

I am reading a tab-delimited Unicode file (I think it is UTF-16) and I want to convert it to plain single-byte text so that it is easy to parse. For example, if I search for a string such as "productX", nothing matches, because "productX" is single-byte text and the file contents are double-byte Unicode.

answer file "Please select a file:"
if it is empty then exit mouseUp
put it into field "File path"
open file fld "File path" for read
read from file fld "File path" until EOF
close file fld "File path" -- release the file once its contents are in "it"
put it into fld "file text"

This puts the Unicode text into the field, where it looks like a string of characters with a space between each one. I want to get rid of the special characters.

If I replace the last line above with something like:
put uniDecode(it) into fld "file text"

Then the text becomes a mess, mostly "????" question marks.

Ultimately, I simply want to read in about 800 to 2000 lines of tab-delimited text and parse it into a useful report. I welcome any guidance.
Skyfisher
Round Rock, Texas
Mac OS

Post by Skyfisher » Sat Jan 23, 2010 9:48 pm

I have a poor workaround: I open the file in a text editor and resave it as plain text (UTF-8). This lets me parse the file without problems, although it still contains a fair number of junk characters.

I would really like to figure out how to convert the 2-byte text to 1-byte text within RunRev, without the extra step of opening the file in a text editor, saving it, and then reopening it with my parser.

Any guidance is appreciated.

Post by Skyfisher » Sun Jan 24, 2010 3:11 am

I think I found a solution in a tutorial:

on mouseUp
   answer file "Select Text File"
   if the result is not "cancel" then
      set the unicodeText of field "text" to URL ("binfile:" & it)
   end if
end mouseUp


This seems to allow me to load the UTF-16 text without further manipulation.

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Post by Mark » Sun Jan 24, 2010 9:55 am

Skyfisher,

To convert unicode, use the following:

Code:

put unidecode(myUnicodeVar,"UTF16") into myText
or if you have UTF8 data:

Code:

put unidecode(uniencode(myUnicodeVar,"UTF8"),"UTF16") into myText
If your Unicode text file has a BOM signature, you have to delete that from the data before doing the conversion.

Best,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode
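
The BOM advice above can be sketched as follows. This is only an illustration under stated assumptions: the function name decodeUTF16File and its parameter pPath are hypothetical, the file is assumed to be UTF-16 in the byte order of the machine running the script (as far as I know, uniDecode does not swap bytes), and the `byte` chunk and byteToNum are assumed to be available in your version of the engine.

```livecode
function decodeUTF16File pPath
   local tData, tFirst, tSecond
   -- binfile: reads the bytes untouched (no line-ending translation)
   put URL ("binfile:" & pPath) into tData
   if the number of bytes in tData >= 2 then
      put byteToNum(byte 1 of tData) into tFirst
      put byteToNum(byte 2 of tData) into tSecond
      -- FF FE and FE FF are the two possible UTF-16 signatures
      if (tFirst is 255 and tSecond is 254) or (tFirst is 254 and tSecond is 255) then
         delete byte 1 to 2 of tData
      end if
   end if
   -- assumption: the remaining data is in the host's byte order
   return uniDecode(tData)
end decodeUTF16File
```

A call might look like `put decodeUTF16File(it) into fld "file text"` right after an `answer file` dialog.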

Post by Skyfisher » Mon Jan 25, 2010 4:11 am

Hi Mark, thank you.

When I use: put unidecode(myUnicodeVar,"UTF16") into myText

It seems to work for a few lines out of hundreds but then it follows with garbled text, as if the text does not have any linefeeds.

Question: How do I determine if my text file includes a BOM signature?
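
One way to check for a BOM is to look at the first two or three bytes of the raw file data. The sketch below is an illustration, not an official API: bomType is a hypothetical name, the data is assumed to have been read with URL ("binfile:" & path) so the bytes are unmodified, and byteToNum is assumed to exist in your engine version. The byte values are the standard Unicode signatures: EF BB BF for UTF-8, FE FF for big-endian UTF-16, and FF FE for little-endian UTF-16.

```livecode
function bomType pData
   local tLength, tFirst, tSecond
   put the number of bytes in pData into tLength
   if tLength < 2 then return empty
   put byteToNum(byte 1 of pData) into tFirst
   put byteToNum(byte 2 of pData) into tSecond
   -- FE FF: UTF-16, big-endian
   if tFirst is 254 and tSecond is 255 then return "UTF-16BE"
   -- FF FE: UTF-16, little-endian (a UTF-32LE file also starts this way)
   if tFirst is 255 and tSecond is 254 then return "UTF-16LE"
   if tLength >= 3 then
      -- EF BB BF: UTF-8
      if tFirst is 239 and tSecond is 187 and byteToNum(byte 3 of pData) is 191 then return "UTF-8"
   end if
   -- no signature found; the file may still be Unicode without a BOM
   return empty
end bomType
```

For example, `answer bomType(URL ("binfile:" & it))` after an `answer file` dialog shows which signature, if any, the selected file starts with.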

Post by Mark » Wed Jan 27, 2010 4:09 pm

Skyfisher,

For information about BOM signatures, just search on the net: http://qurl.tk/37 and http://qurl.tk/38

I think I made a small mistake. If you want to convert unicode text into standard ASCII text, you use:

Code:

unidecode(myUnicodeVar,"English")
or

Code:

unidecode(myUnicodeVar)
If you want to convert to UTF8, you do

Code:

unidecode(myUnicodeVar,"UTF8")
If you want to convert from UTF8 into UTF-16, you do

Code:

uniencode(myUnicodeVar,"UTF8")
Uniencode always converts from something into UTF16, while unidecode converts from UTF16 into something else. So, if you want to convert from J-Shift into UTF8 you need:

Code:

unidecode(uniencode(myUnicodeVar,"J-Shift"),"UTF8")
(let's hope I didn't make any mistakes this time)

Best,

Mark
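
Applied to the original goal of searching a tab-delimited UTF-16 file, the single-argument uniDecode call could be used roughly like this. It is a sketch under assumptions: findProductLines and its parameters are hypothetical names, and the data passed in is assumed to be UTF-16 in the host's byte order with any BOM already removed. Data read through binfile: keeps its original line endings, so they are normalized here to the linefeed that the `line` chunk splits on. That may be one reason decoded text can look as if it has no line breaks, though this is a guess.

```livecode
function findProductLines pUTF16Data, pSearch
   local tText, tLine, tMatches
   -- UTF-16 (host byte order, no BOM) -> single-byte native text
   put uniDecode(pUTF16Data) into tText
   -- normalize CRLF and bare CR line endings to LF ("return" in LiveCode)
   replace crlf with return in tText
   replace numToChar(13) with return in tText
   repeat for each line tLine in tText
      if tLine contains pSearch then put tLine & return after tMatches
   end repeat
   -- drop the trailing line break left by the loop
   if tMatches is not empty then delete the last char of tMatches
   return tMatches
end findProductLines
```

Each returned line can then be split into columns by setting the itemDelimiter to tab. On LiveCode 7 and later, textDecode with an encoding name such as "UTF-16" is, as far as I know, the preferred replacement for uniDecode.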