Getting JSON from Google Sheets with unicode chars

Anything beyond the basics in using the LiveCode language. Share your handlers, functions and magic here.

Moderators: Klaus, FourthWorld, heatherlaine, robinmiller, kevinmiller

kaveh1000
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 337
Joined: Sun Dec 18, 2011 7:23 pm
Location: London
Contact:

Getting JSON from Google Sheets with unicode chars

Post by kaveh1000 » Sat Aug 31, 2019 1:55 pm

Hi all

Starting with this post from André:

https://andregarzia.com/2018/11/reading ... ecode.html

I have been trying to get data from Google Sheets, but in JSON as opposed to CSV. With some excellent offline help I have made great progress. A particular problem is that non-ascii characters are corrupted in the JSON I get into LiveCode. Here is the sheet I am trying to import:

https://docs.google.com/spreadsheets/d/ ... edit#gid=0

Please note the bullets and degree signs in the last column. I want these to be in the JSON exactly as they are. If you download the sheet as CSV or TSV, they are indeed downloaded as such. But in the JSON they come out as:

\u2022 and °

I attach a minimal stack that imports the data. Please see script of button and stack.

Any hints welcome.

Regards
Kaveh
Attachments
json from Google Sheets.zip
(1.6 KiB) Downloaded 20 times
Kaveh

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Fri Sep 27, 2019 9:51 am

Hi Kaveh,

I don't know if you have solved this.

Here is some code that fits into your stack that you might try.

Code: Select all

on mouseup
   put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
         into tURL
   put "any_name_with_no_space" into tSheet
   -- put break point below to see array returned
   put _get_data_from_Google_Sheet(tURL, tSheet) into tData
   
   ## takes care of "degrees"
   put textDecode(tData, "utf8") into tData
   
   ## clear templateField
   reset the templateField
   repeat
      put offset("\u", tData) into tOffset
      if tOffset > 0 then
         
         ## isolate hexadecimal number assuming all start with \u and are 4 chars long
         put char tOffset + 2 to tOffset + 5 of tData into tText
         
         ## convert hexadecimal number to html entity (hexadecimal) like "•"
         put "&#x" & tText & ";" into tHTML
         
         ## generate htmlText
         set the htmlText of the templateField to tHTML
         
         ## textual representation of hecadecimal number
         put the text of the templateField into tText
         
         ## replace json-hexadecimal with text representation
         put tText into char tOffset to tOffset + 5 of tData 
      else
         ## clear templateField
         reset the templateField
         exit repeat
      end if
   end repeat
   
   
   ## (optional) changes \n into linefeed
   put format(tData) into tData
   
   put tData into field 2
end mouseup

-- put "•" into tTest
Kind regards
Bernd

kaveh1000
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 337
Joined: Sun Dec 18, 2011 7:23 pm
Location: London
Contact:

Re: Getting JSON from Google Sheets with unicode chars

Post by kaveh1000 » Fri Sep 27, 2019 10:02 am

Thank you so much Bernd. What I was looking for was

Code: Select all

put textDecode(tData, "utf8") into tData
that I got from Richmond's post here:

https://forums.livecode.com/viewtopic.php?f=9&t=33141

It seems to correct all characters. I tested with around 20. Do I still need your code to test for \u ?
Kaveh

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Fri Sep 27, 2019 10:08 am

Do I still need your code to test for \u ?
I don't know but the code decodes \uxxxx

Although in a somewhat "homemade" fashion.
\u2022 which is used to represent the ° was not decoded. So it might be useful for exotic characters.

Kind regards
Bernd

kaveh1000
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 337
Joined: Sun Dec 18, 2011 7:23 pm
Location: London
Contact:

Re: Getting JSON from Google Sheets with unicode chars

Post by kaveh1000 » Sun Sep 29, 2019 4:46 pm

Hi Bernd

You were right that not all chars are translated with the textDecode handler. Your other lines do convert almost all. Working on it and will report back.

Kaveh
Kaveh

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Sun Sep 29, 2019 8:02 pm

Hi Kaveh,

I replaced the hack via html with numToCodePoint() to get the "\uXXXX" encoded Jason characters.
That looks a bit more "professional"...
And maybe it solves the problem for those that did not convert before?

Code: Select all

on mouseup
   local tURL, tSheet, tData, tOffset, tText
   put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
         into tURL
   put "any_name_with_no_space" into tSheet
   -- put break point below to see array returned
   put _get_data_from_Google_Sheet(tURL, tSheet) into tData
   
   ## takes care of "degrees"
   put textDecode(tData, "utf8") into tData
   
   repeat
      put offset("\u", tData) into tOffset
      if tOffset > 0 then
         
         ## isolate hexadecimal number assuming all start with \u and are 4 chars long
         put char tOffset + 2 to tOffset + 5 of tData into tText
         
         ## take hexadecimal number and convert it to codePoint i.e. a character
         put "0x" before tText
         put numToCodepoint(tText) into tText
         
         ## replace json-hexadecimal with text representation
         put tText into char tOffset to tOffset + 5 of tData 
      else
         exit repeat
      end if
   end repeat
   
   
   ## (optional) changes \n into linefeed
   put format(tData) into tData
   
   put tData into field 2
end mouseup
you can test this on above stack.
Kind regards
Bernd

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 3721
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Getting JSON from Google Sheets with unicode chars

Post by richmond62 » Sun Sep 29, 2019 9:54 pm

hexadecimal number assuming all start with \u and are 4 chars
That might be an unwise assumption:
-
Screenshot 2019-09-29 at 23.53.22.png
-
DA.jpg
DA.jpg (10.27 KiB) Viewed 445 times

kaveh1000
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 337
Joined: Sun Dec 18, 2011 7:23 pm
Location: London
Contact:

Re: Getting JSON from Google Sheets with unicode chars

Post by kaveh1000 » Sun Sep 29, 2019 10:22 pm

Thanks so much Bernd and Richmond. I will give this a try as soon as I can. In the meantime, I put some typical unicode symbols that I have been using, as a torture test:

https://docs.google.com/spreadsheets/d/ ... edit#gid=0

Regards
Kaveh
Kaveh

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Sun Sep 29, 2019 10:57 pm

richmond62 wrote:
Sun Sep 29, 2019 9:54 pm
hexadecimal number assuming all start with \u and are 4 chars
That might be an unwise assumption:
this is json format for whatever:

https://www.json.org

I take this as meaning they always have
escape + u + 4 chars

'\' escape

escape 'u' hex hex hex hex

hex: digit 'A' . 'F' 'a' . 'f'
from
https://www.crockford.com/mckeeman.html


@Kaveh
I tried both versions (the html and the numToCodepoint) with your torture test and they both seem to work. No funny stuff in it.

Kind regards
Bernd

Thierry
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 664
Joined: Wed Nov 22, 2006 3:42 pm
Location: France
Contact:

Re: Getting JSON from Google Sheets with unicode chars

Post by Thierry » Mon Sep 30, 2019 8:23 am

bn wrote: I replaced the hack via html with numToCodePoint() to get the "\uXXXX"
encoded Jason characters.
Hi Kaveh,

As you're a sunnYrex user, here is another solution using sunnYreplace().
In fact it's the same logic as Bernd's solution, but written differently.

Code: Select all

   <....>
   put textDecode(tSheetImported, "utf8") into tSheetImported
   get sunnyReplace( tSheetImported, "(\\u([\da-fA-F]+))", "?{Ux \2}", R)
   return R
end  _get_data_from_Google_Sheet

on Ux digits4
   return numToCodepoint("0x" & digits4)
end Ux
Well, not tested but you will tell me...

Kind regards,
Thierry
Thierry Douez - https://sunny-tdz.com
Pourquoi tant de notes lorsqu'il suffit de jouer les plus belles... [Barbara]

kaveh1000
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 337
Joined: Sun Dec 18, 2011 7:23 pm
Location: London
Contact:

Re: Getting JSON from Google Sheets with unicode chars

Post by kaveh1000 » Mon Sep 30, 2019 8:36 am

Thank you Thierry. I have not even got onto using the advanced functions of sunnYrex, where (I think) you are calling an external handler. Good reminder for me! i will certainly be using sunnYreplace as soon as I get the principle sorted out.

As an interim report, I can confirm all the strange chars I put in are converted correctly. I think the problem is in quotation characters (as so often in LiveCode). I have put another minimal sheet here:

https://docs.google.com/spreadsheets/d/ ... =622506043

These seem to be converted correctly with Bernd's code, but they fail further in LiveCode. I think it is a question escaping the quotes. I will get onto it as soon as I can.

Thanks all.
Kaveh

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Mon Sep 30, 2019 9:05 am

@Thierry: Bonjour Monsieur :)

@Kaveh

I tested numToCodepoint with all 16729 variants of XXXX - unicode. That seems to work, some characters are not displayed but apart from ASCII 0 to 31 which are not displaying anyways that was probably a font problem.

I tested in your stack you posted above with your latest link and the "torture" link . So when you have retrieved and cleaned up what you get from your sheet and it is put into the lower field then this code converts it into a nice array. No problem with quotes and such. The array is then resolved and displayed in the message box.

Code: Select all

on mouseUp
   local tData
   local tCollect
   
   put field 2 into tData
   
   ## get rid of non-jason stuff
   ## beginning
   set the itemDelimiter to "{"
   put empty into item 1 of tData
   
   ## end
   set the itemDelimiter to "}"
   put empty into item -1 of tData
   
   set the itemDelimiter to comma
   
   ## import tData
   put JsonImport(tData) into tCollect
   
   ## works only in IDE, puts the resolved array into the message box
   revPutArray tCollect
end mouseUp
Kind regards
Bernd

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Mon Sep 30, 2019 9:43 am

here is the stack I tested the display of the 16729 variants of XXXX unicode chars. Meaning using 4 "chars/numbers" to code the hex value of the char.
The source is:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt


jasonUnicode.livecode.zip
click button to see what is displayed
(42.57 KiB) Downloaded 9 times


Please be a bit patient when displaying the chars. When saved with the stack the loading is slow.

Just a proof of concept.

Kind regards
Bernd

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Mon Sep 30, 2019 4:02 pm

Hi Kaveh,

here is a stack that does the whole thing:
get the google data clean it, turn json into array, turn array into a table.

It uses a regular field with hGrid and vGrid turned no and tabstops set to 100.

the url is hardcode in the button.

It would be interesting to test this with more than one column

Kind regards
Bernd
Attachments
putJasonIntoTable.livecode.zip
resolves the array and puts it into a table
(2.64 KiB) Downloaded 12 times

bn
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3321
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting JSON from Google Sheets with unicode chars

Post by bn » Mon Sep 30, 2019 8:56 pm

Hi Kaveh,

could you put up a Google Sheet with a couple of columns and a couple of rows the way you did the other ones?
I would like to test above stack with more than one column.

Kind regards
Bernd

Post Reply

Return to “Talking LiveCode”