textDecode/Encode problems

Hans-Helmut
Posts: 57
Joined: Sat Jan 14, 2017 6:44 pm

textDecode/Encode problems

Post by Hans-Helmut » Fri Mar 31, 2017 3:40 pm

I am using Windows 10, LC 9 dp6.

As officially recommended, the textEncode () and textDecode () functions should be used when working with external files.
After some testing, I do not know how to solve my current text encoding and decoding problem.
As a text editor, I am using the latest version of Notepad++.
I downloaded a text file from Google (exported contacts). According to Notepad++, the original text file "google.csv" is encoded as "UCS-2 LE BOM". I do not know what this actually means, but my usual textDecode ( text , "UTF-8" ) does not work here for any non-ASCII characters.
The text to test:
Программируем
Bewußt über das Thema diskutieren
The Russian text cannot be represented in the "ANSI" code page for Western European languages, so it will not display.
  • Reading this Google-encoded file using UTF-8 shows neither the German umlauts nor the Russian text (which appears as question marks).
  • Reading it without decoding correctly shows European and German characters, but no Russian characters.
  • Converting the file to UTF-8 using Notepad++ works fine. After that, textDecode(text, "UTF-8") also works correctly.
So, does this mean that we cannot rely on our standard way of decoding for files encoded like this? It would mean first opening such files with a third-party tool (text editor), converting them to "UTF-8", and then importing them into LiveCode. Correct?

Is there a way to convert a text file from the format "UCS-2 LE BOM" to "UTF-8" using LiveCode? Unfortunately, I have not been able to do this.


To read the file, I tried two ways:

Code: Select all

...
open file tFileName
   read from file tFileName until eof -- end of file
close file tFileName
put textDecode ( it , "UTF-8" ) into field "test"
...
And the other way:

Code: Select all

...
get URL("file:"&tFileName)
put textDecode ( it , "UTF-8" ) into field "test"
...
Since we are newbies here, a short hint: In both code snippets, the variable tFileName contains the full path to the file.
Now a side question: What is better, using "reading from file" or the shorter "URL" syntax?

Thierry
VIP Livecode Opensource Backer
Posts: 772
Joined: Wed Nov 22, 2006 3:42 pm
Location: France

Re: textDecode/Encode problems

Post by Thierry » Fri Mar 31, 2017 4:07 pm

Hallo Hans-Helmut,

Here is some information from http://www.unicode.org/faq/utf_bom.html

HTH,

Thierry

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters. [AF]
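
Since UCS-2 uses the same 16-bit code units as UTF-16, a file that Notepad++ labels "UCS-2 LE BOM" can presumably be decoded as UTF-16 little-endian instead of UTF-8. The following is only a sketch and not code from this thread: it takes raw bytes (as returned by "binfile:"), looks for a byte order mark, and picks the encoding accordingly. The function name decodeWithBOM and the fallback to UTF-8 when no BOM is found are assumptions made for illustration. The BOM is removed by hand because it is not certain that textDecode() strips it.

```livecode
-- Sketch only: decode binary data according to its byte order mark (BOM).
-- decodeWithBOM is a hypothetical helper name, not a built-in function.
function decodeWithBOM pData
   local tB1, tB2, tB3
   -- only look at bytes that actually exist
   if the length of pData >= 2 then
      put byteToNum(byte 1 of pData) into tB1
      put byteToNum(byte 2 of pData) into tB2
   end if
   if the length of pData >= 3 then
      put byteToNum(byte 3 of pData) into tB3
   end if
   
   if tB1 is 255 and tB2 is 254 then
      -- FF FE: UTF-16 little-endian ("UCS-2 LE BOM" in Notepad++)
      return textDecode(byte 3 to -1 of pData, "UTF-16LE")
   else if tB1 is 254 and tB2 is 255 then
      -- FE FF: UTF-16 big-endian
      return textDecode(byte 3 to -1 of pData, "UTF-16BE")
   else if tB1 is 239 and tB2 is 187 and tB3 is 191 then
      -- EF BB BF: UTF-8 with BOM
      return textDecode(byte 4 to -1 of pData, "UTF-8")
   end if
   -- No BOM found: assumption for this sketch, treat the data as UTF-8
   return textDecode(pData, "UTF-8")
end decodeWithBOM
```

With tFileName holding the full path, it could be called as: put decodeWithBOM(URL ("binfile:" & tFileName)) into field "test"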
Regex LiveCode sunnYrex
https://sunny-tdz.com

Klaus
Posts: 11873
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: textDecode/Encode problems

Post by Klaus » Fri Mar 31, 2017 4:31 pm

Hallo Hans-Helmut,

I am definitely not an encoding/Unicode expert, but you should read the file as BINARY!

Code: Select all

...
-- Variant 1: URL syntax, reads the raw bytes
get URL("binfile:" & tFileName)
put textDecode(it, "UTF-8") into field "test"
...
-- Variant 2: open/read/close, also in binary mode
open file tFileName for binary read
read from file tFileName until eof -- end of file
close file tFileName
put textDecode(it, "UTF-8") into field "test"
...
If the file to read is not too big, maybe a couple of MB, then -> url("binfile:...") or url("file:...") should be used,
if only for its brevity! :D

But this way the COMPLETE file is read into memory at once!

When it comes to reading/writing really BIG files, like logfiles of several hundred MB or more, then "open/read/close file..."
is the way to go, since you have more options, e.g. to read only a couple of lines or whatever.
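
As a rough illustration of the "more options" point, the sketch below reads only the beginning of a large file instead of loading all of it into memory. The file path and the 65536-byte limit are placeholders chosen for the example, not values from this thread.

```livecode
on mouseUp
   local tFileName
   -- hypothetical file; replace with the full path to your own log file
   put specialFolderPath("desktop") & "/big.log" into tFileName
   
   open file tFileName for binary read
   if the result is not empty then
      answer "Could not open the file:" && the result
      exit mouseUp
   end if
   read from file tFileName for 65536 -- at most the first 65536 bytes
   close file tFileName
   
   -- "it" now holds only the part that was read, not the whole file
   put the length of it && "bytes read" into msg
end mouseUp
```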

Hope that helps.


Best

Klaus

Hans-Helmut
Posts: 57
Joined: Sat Jan 14, 2017 6:44 pm

Re: textDecode/Encode problems

Post by Hans-Helmut » Fri Mar 31, 2017 5:30 pm

Yes, a helpful comment. Thank you very much! (These are best-practice hints!)
I just continued testing. I will also test big data and see where and when the problems appear.

In this special use case, I found a work-around, but it is not a generic solution, unfortunately.
1. Read the Google .csv file without using textDecode.
2. Write the content back to the file, this time encoding it as UTF-8 using textEncode ( text , "UTF-8" ).
3. From now on, always read it using textDecode ( text , "UTF-8" ), since somehow the file now "knows" that it is "UTF-8".
4. Better: Read from the original file only once without textDecode() and write the content to a new file using textEncode(). Then use only the new file, reading it with textDecode().

Here is the code:

Code: Select all

on mouseUp
   put specialfolderpath("desktop")&"/Google Contacts Convert" into fPath -- Folder path to file
   put "/Test.csv" after fPath  -- Whatever filename user is using
   put "file:" before fPath -- "file:" This keyword must be put into this variable, otherwise function does not work

   get URL (fPath) -- Read the file's content.
   -- get textDecode ( it , "UTF-8" ) -- ONLY WORKS FROM SECOND TIME
   put it into msg -- Testing the output. First time it is ok without using textDecode

   put textEncode( it ,"UTF-8") into URL fpath // Writing content back to file

   get URL (fPath)  -- Reading again second time
   get textDecode ( it , "UTF-8" ) -- Second time read: textDecode() is required
   put it into msg -- Testing the output
end mouseUp

# AVOID calling this function twice in this special case. Only once. 
# Writing the text back to the file converts the file to "UTF-8".
# Starting reading a second time requires using textDecode ( text , "UTF-8" )
# Hint: I am actually writing to a different file keeping the original file untouched.

# Not working: get URL "file:" & "C:/.../test.txt" for unknown reasons.
# Not working: put " some text " into URL "file:" & "C:/.../test.txt" for unknown reasons.
# But combining the keyword "file:" and the file path into the variable fPath works.
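
A more explicit variant of this workaround, offered only as a sketch: read the original file as raw bytes, decode it with the encoding it really has, and write a UTF-8 copy under a new name. It assumes the export is UTF-16 little-endian with a BOM (what Notepad++ calls "UCS-2 LE BOM"). The output name Test-utf8.csv is made up for the example.

```livecode
on mouseUp
   local tFolder, tData, tText
   put specialFolderPath("desktop") & "/Google Contacts Convert" into tFolder
   
   -- "binfile:" returns the bytes unchanged, without any text conversion
   put URL ("binfile:" & tFolder & "/Test.csv") into tData
   
   -- remove the UTF-16LE byte order mark (FF FE) if it is present
   if the length of tData >= 2 then
      if byteToNum(byte 1 of tData) is 255 and byteToNum(byte 2 of tData) is 254 then
         delete byte 1 to 2 of tData
      end if
   end if
   
   -- assumption: the remaining bytes are UTF-16 little-endian
   put textDecode(tData, "UTF-16LE") into tText
   
   -- write the UTF-8 copy; the original file stays untouched
   put textEncode(tText, "UTF-8") into URL ("binfile:" & tFolder & "/Test-utf8.csv")
end mouseUp
```

The new file can then be read with "binfile:" and textDecode ( text , "UTF-8" ) every time, including the first.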

Klaus
Posts: 11873
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: textDecode/Encode problems

Post by Klaus » Fri Mar 31, 2017 5:55 pm

Hallo Hans-Helmut,
# Not working: get URL "file:" & "C:/.../test.txt" for unknown reasons.
# Not working: put " some text " into URL "file:" & "C:/.../test.txt" for unknown reasons.
both do not work for the same reason: Bad concatenation of the filename!
Do like this:

Code: Select all

get URL ("file:" & "C:/.../test.txt")
put " some text " into URL ("file:" & "C:/.../test.txt")
The parens force the engine to evaluate the expression in parens first, so no error and success!

Same for everything you concatenate like filenames and object names!
Example:

Code: Select all

...
## e.g. in a repeat loop through all your NUMBERED buttons: myButt1 ... myButt5
repeat with i = 1 to 5
  ## Error: no such object!
  ## send "mouseup" to btn "myButt" & i

  ## Works:
  send "mouseup" to btn ("myButt" & i)
end repeat
...
Best

Klaus

Hans-Helmut
Posts: 57
Joined: Sat Jan 14, 2017 6:44 pm

Re: textDecode/Encode problems

Post by Hans-Helmut » Fri Mar 31, 2017 6:03 pm

Klaus:
When it comes to reading/writing really BIG files, like logfiles of several hundred MB or more, then "open/read/close file..."
is the way to go, since you have more options, e.g. to read only a couple of lines or whatever.
Yes, and I learned that really big files cannot be read into memory all at once. On my machine, this always leads to crashes. I do have such files. Then the only way is to read line by line and pause in the repeat loop, using "wait 0 milliseconds with messages" as an instruction placed after the "repeat...".
If there are no well-defined line or record delimiters, it is possible to just read a chunk of data and then pause with the wait command, which takes an unnoticeable time to execute.
Someone wrote somewhere that it is more efficient to read small chunks of data within the repeat loop than big chunks. There is some "best byte chunk". :?
Again - this would all be "best practice".
There should be some place for "best practice" in LiveCode addressing all such experiences.
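
The approach described above, reading a fixed-size chunk and pausing on each pass, might look roughly like this. It is a sketch under assumptions: the chunk size of 65536 bytes and the file path are arbitrary placeholders, not a measured "best" value, and the per-chunk processing is left as a comment.

```livecode
on mouseUp
   local tFileName, tAtEnd, tTotal
   -- hypothetical file; replace with the full path to your own file
   put specialFolderPath("desktop") & "/big.log" into tFileName
   put 0 into tTotal
   
   open file tFileName for binary read
   if the result is not empty then
      answer "Could not open the file:" && the result
      exit mouseUp
   end if
   
   repeat forever
      read from file tFileName for 65536
      -- remember the end-of-file state before another command changes the result
      put (the result is "eof") into tAtEnd
      add the length of it to tTotal
      -- process the chunk in "it" here
      if tAtEnd or it is empty then exit repeat
      wait 0 milliseconds with messages -- lets the engine handle pending messages
   end repeat
   close file tFileName
   
   put tTotal && "bytes processed" into msg
end mouseUp
```

Note that a fixed-size chunk can end in the middle of a multi-byte character, so decoding each chunk separately with textDecode() would need extra care.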

Hans-Helmut
Posts: 57
Joined: Sat Jan 14, 2017 6:44 pm

Re: textDecode/Encode problems

Post by Hans-Helmut » Fri Mar 31, 2017 6:11 pm

Klaus:

Code: Select all

get URL ("file:" & "C:/.../test.txt")
put " some text " into URL ("file:" & "C:/.../test.txt")
Well, yes, I addressed the issue in the comment to my script. I still do not understand why it would not work. In my case, it works because I am putting everything into one variable. I found this out through trial and error.

Putting get URL ("file:" & "C:/.../test.txt") works if the path string is typed literally. If it is not typed but held in a variable, as in "file:"&path, it does not work. If I put "file:"& "C:/.../test.txt" all together into a variable such as fPath, then it works with the expression "get URL (fPath)".

Why does the concatenation in the combined expression not work?

Klaus
Posts: 11873
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: textDecode/Encode problems

Post by Klaus » Fri Mar 31, 2017 6:19 pm

Hallo Hans-Helmut,

hm, just made a test and this does work of course, like in the 17 years before :D

Code: Select all

on mouseUp
   put "/Users/klaus/Documents/Texte/Diverses/FZdiscography.txt" into tFile
   put url("file:" & tFile) into fld 1
end mouseUp
So does this:

Code: Select all

on mouseUp
   put "/Users/klaus/Documents/Texte/Diverses/FZdiscography.txt" into tFile
   put "file:" before tFile
   put url(tFile) into fld 1
end mouseUp
And this:

Code: Select all

on mouseUp
   put url("file:/Users/klaus/Documents/Texte/Diverses/FZdiscography.txt") into fld 1
end mouseUp
macOS 10.12.4 and LC 9 dp6.


Best

Klaus

Hans-Helmut
Posts: 57
Joined: Sat Jan 14, 2017 6:44 pm

Re: textDecode/Encode problems

Post by Hans-Helmut » Fri Mar 31, 2017 7:20 pm

Klaus, you have been doing this for 17 years?

OK. )))

What am I doing wrong, then, in one of the variants? The only difference I can see is that there is a "C:/" in front of my path (I created an absolute path). I just tested again, too.

I will test once more and otherwise edit my post... I hope you are right !!! ) I also had a case where there was some invisible character, and I was pulling my hair out without finding the reason why something did not work, and then retyping solved the problem.

Hans-Helmut

Klaus
Posts: 11873
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: textDecode/Encode problems

Post by Klaus » Fri Mar 31, 2017 7:39 pm

Hi Hans-Helmut,
Hans-Helmut wrote: Klaus, you have been doing this for 17 years?
yep, started at the end of 1999 with the first Win and Mac version of MetaCard, the grandfather of LiveCode. :D
MetaCard has been around since 1992 but was UNIX-only until 1999. Its old website is still available: http://metacard.com
Hans-Helmut wrote: What am I doing wrong, then, in one of the variants? The only difference I can see is that there is a "C:/" in front of my path (I created an absolute path). I just tested again, too.
Macs do not have a drive letter at the beginning of volume paths!

Maybe try again without the extra concatenation:

Code: Select all

get URL ("file:C:/.../test.txt")
put " some text " into URL ("file:C:/.../test.txt")
Don't think this is the culprit, but who knows? :D

Best

Klaus
