Getting JSON from Google Sheets with unicode chars
Moderators: Klaus, FourthWorld, heatherlaine, kevinmiller, robinmiller
-
- Livecode Opensource Backer
- Posts: 454
- Joined: Sun Dec 18, 2011 7:23 pm
- Location: London
- Contact:
Getting JSON from Google Sheets with unicode chars
Hi all
Starting with this post from André:
https://andregarzia.com/2018/11/reading ... ecode.html
I have been trying to get data from Google Sheets, but in JSON as opposed to CSV. With some excellent offline help I have made great progress. A particular problem is that non-ascii characters are corrupted in the JSON I get into LiveCode. Here is the sheet I am trying to import:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Please note the bullets and degree signs in the last column. I want these to be in the JSON exactly as they are. If you download the sheet as CSV or TSV, they are indeed downloaded as such. But in the JSON they come out as:
\u2022 and °
I attach a minimal stack that imports the data. Please see script of button and stack.
Any hints welcome.
Regards
Kaveh
Starting with this post from André:
https://andregarzia.com/2018/11/reading ... ecode.html
I have been trying to get data from Google Sheets, but in JSON as opposed to CSV. With some excellent offline help I have made great progress. A particular problem is that non-ascii characters are corrupted in the JSON I get into LiveCode. Here is the sheet I am trying to import:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Please note the bullets and degree signs in the last column. I want these to be in the JSON exactly as they are. If you download the sheet as CSV or TSV, they are indeed downloaded as such. But in the JSON they come out as:
\u2022 and °
I attach a minimal stack that imports the data. Please see script of button and stack.
Any hints welcome.
Regards
Kaveh
- Attachments
-
- json from Google Sheets.zip
- (1.6 KiB) Downloaded 64 times
Kaveh
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
Hi Kaveh,
I don't know if you have solved this.
Here is some code that fits into your stack that you might try.
Kind regards
Bernd
I don't know if you have solved this.
Here is some code that fits into your stack that you might try.
Code: Select all
on mouseup
put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
into tURL
put "any_name_with_no_space" into tSheet
-- put break point below to see array returned
put _get_data_from_Google_Sheet(tURL, tSheet) into tData
## takes care of "degrees"
put textDecode(tData, "utf8") into tData
## clear templateField
reset the templateField
repeat
put offset("\u", tData) into tOffset
if tOffset > 0 then
## isolate hexadecimal number assuming all start with \u and are 4 chars long
put char tOffset + 2 to tOffset + 5 of tData into tText
## convert hexadecimal number to html entity (hexadecimal) like "•"
put "&#x" & tText & ";" into tHTML
## generate htmlText
set the htmlText of the templateField to tHTML
## textual representation of hecadecimal number
put the text of the templateField into tText
## replace json-hexadecimal with text representation
put tText into char tOffset to tOffset + 5 of tData
else
## clear templateField
reset the templateField
exit repeat
end if
end repeat
## (optional) changes \n into linefeed
put format(tData) into tData
put tData into field 2
end mouseup
-- put "•" into tTest
Bernd
-
- Livecode Opensource Backer
- Posts: 454
- Joined: Sun Dec 18, 2011 7:23 pm
- Location: London
- Contact:
Re: Getting JSON from Google Sheets with unicode chars
Thank you so much Bernd. What I was looking for was
that I got from Richmond's post here:
https://forums.livecode.com/viewtopic.php?f=9&t=33141
It seems to correct all characters. I tested with around 20. Do I still need your code to test for \u ?
Code: Select all
put textDecode(tData, "utf8") into tData
https://forums.livecode.com/viewtopic.php?f=9&t=33141
It seems to correct all characters. I tested with around 20. Do I still need your code to test for \u ?
Kaveh
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
I don't know but the code decodes \uxxxxDo I still need your code to test for \u ?
Although in a somewhat "homemade" fashion.
\u2022 which is used to represent the ° was not decoded. So it might be useful for exotic characters.
Kind regards
Bernd
-
- Livecode Opensource Backer
- Posts: 454
- Joined: Sun Dec 18, 2011 7:23 pm
- Location: London
- Contact:
Re: Getting JSON from Google Sheets with unicode chars
Hi Bernd
You were right that not all chars are translated with the textDecode handler. Your other lines do convert almost all. Working on it and will report back.
Kaveh
You were right that not all chars are translated with the textDecode handler. Your other lines do convert almost all. Working on it and will report back.
Kaveh
Kaveh
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
Hi Kaveh,
I replaced the hack via html with numToCodePoint() to get the "\uXXXX" encoded Jason characters.
That looks a bit more "professional"...
And maybe it solves the problem for those that did not convert before?
you can test this on above stack.
Kind regards
Bernd
I replaced the hack via html with numToCodePoint() to get the "\uXXXX" encoded Jason characters.
That looks a bit more "professional"...
And maybe it solves the problem for those that did not convert before?
Code: Select all
on mouseup
local tURL, tSheet, tData, tOffset, tText
put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
into tURL
put "any_name_with_no_space" into tSheet
-- put break point below to see array returned
put _get_data_from_Google_Sheet(tURL, tSheet) into tData
## takes care of "degrees"
put textDecode(tData, "utf8") into tData
repeat
put offset("\u", tData) into tOffset
if tOffset > 0 then
## isolate hexadecimal number assuming all start with \u and are 4 chars long
put char tOffset + 2 to tOffset + 5 of tData into tText
## take hexadecimal number and convert it to codePoint i.e. a character
put "0x" before tText
put numToCodepoint(tText) into tText
## replace json-hexadecimal with text representation
put tText into char tOffset to tOffset + 5 of tData
else
exit repeat
end if
end repeat
## (optional) changes \n into linefeed
put format(tData) into tData
put tData into field 2
end mouseup
Kind regards
Bernd
-
- Livecode Opensource Backer
- Posts: 5262
- Joined: Fri Feb 19, 2010 10:17 am
- Location: Bulgaria
Re: Getting JSON from Google Sheets with unicode chars
That might be an unwise assumption:hexadecimal number assuming all start with \u and are 4 chars
- -
-
- Livecode Opensource Backer
- Posts: 454
- Joined: Sun Dec 18, 2011 7:23 pm
- Location: London
- Contact:
Re: Getting JSON from Google Sheets with unicode chars
Thanks so much Bernd and Richmond. I will give this a try as soon as I can. In the meantime, I put some typical unicode symbols that I have been using, as a torture test:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Regards
Kaveh
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Regards
Kaveh
Kaveh
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
this is json format for whatever:richmond62 wrote: ↑Sun Sep 29, 2019 9:54 pmThat might be an unwise assumption:hexadecimal number assuming all start with \u and are 4 chars
https://www.json.org
I take this as meaning they always have
escape + u + 4 chars
from'\' escape
escape 'u' hex hex hex hex
hex: digit 'A' . 'F' 'a' . 'f'
https://www.crockford.com/mckeeman.html
@Kaveh
I tried both versions (the html and the numToCodepoint) with your torture test and they both seem to work. No funny stuff in it.
Kind regards
Bernd
-
- VIP Livecode Opensource Backer
- Posts: 795
- Joined: Wed Nov 22, 2006 3:42 pm
- Location: France
- Contact:
Re: Getting JSON from Google Sheets with unicode chars
Hi Kaveh,bn wrote: I replaced the hack via html with numToCodePoint() to get the "\uXXXX"
encoded Jason characters.
As you're a sunnYrex user, here is another solution using sunnYreplace().
In fact it's the same logic as Bernd's solution, but written differently.
Code: Select all
<....>
put textDecode(tSheetImported, "utf8") into tSheetImported
get sunnyReplace( tSheetImported, "(\\u([\da-fA-F]+))", "?{Ux \2}", R)
return R
end _get_data_from_Google_Sheet
on Ux digits4
return numToCodepoint("0x" & digits4)
end Ux
Kind regards,
Thierry
Regex ❤ LiveCode ❤ sunnYrex
https://sunny-tdz.com
https://sunny-tdz.com
-
- Livecode Opensource Backer
- Posts: 454
- Joined: Sun Dec 18, 2011 7:23 pm
- Location: London
- Contact:
Re: Getting JSON from Google Sheets with unicode chars
Thank you Thierry. I have not even got onto using the advanced functions of sunnYrex, where (I think) you are calling an external handler. Good reminder for me! i will certainly be using sunnYreplace as soon as I get the principle sorted out.
As an interim report, I can confirm all the strange chars I put in are converted correctly. I think the problem is in quotation characters (as so often in LiveCode). I have put another minimal sheet here:
https://docs.google.com/spreadsheets/d/ ... =622506043
These seem to be converted correctly with Bernd's code, but they fail further in LiveCode. I think it is a question escaping the quotes. I will get onto it as soon as I can.
Thanks all.
As an interim report, I can confirm all the strange chars I put in are converted correctly. I think the problem is in quotation characters (as so often in LiveCode). I have put another minimal sheet here:
https://docs.google.com/spreadsheets/d/ ... =622506043
These seem to be converted correctly with Bernd's code, but they fail further in LiveCode. I think it is a question escaping the quotes. I will get onto it as soon as I can.
Thanks all.
Kaveh
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
@Thierry: Bonjour Monsieur 
@Kaveh
I tested numToCodepoint with all 16729 variants of XXXX - unicode. That seems to work, some characters are not displayed but apart from ASCII 0 to 31 which are not displaying anyways that was probably a font problem.
I tested in your stack you posted above with your latest link and the "torture" link . So when you have retrieved and cleaned up what you get from your sheet and it is put into the lower field then this code converts it into a nice array. No problem with quotes and such. The array is then resolved and displayed in the message box.
Kind regards
Bernd

@Kaveh
I tested numToCodepoint with all 16729 variants of XXXX - unicode. That seems to work, some characters are not displayed but apart from ASCII 0 to 31 which are not displaying anyways that was probably a font problem.
I tested in your stack you posted above with your latest link and the "torture" link . So when you have retrieved and cleaned up what you get from your sheet and it is put into the lower field then this code converts it into a nice array. No problem with quotes and such. The array is then resolved and displayed in the message box.
Code: Select all
on mouseUp
local tData
local tCollect
put field 2 into tData
## get rid of non-jason stuff
## beginning
set the itemDelimiter to "{"
put empty into item 1 of tData
## end
set the itemDelimiter to "}"
put empty into item -1 of tData
set the itemDelimiter to comma
## import tData
put JsonImport(tData) into tCollect
## works only in IDE, puts the resolved array into the message box
revPutArray tCollect
end mouseUp
Bernd
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
here is the stack I tested the display of the 16729 variants of XXXX unicode chars. Meaning using 4 "chars/numbers" to code the hex value of the char.
The source is:
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
Please be a bit patient when displaying the chars. When saved with the stack the loading is slow.
Just a proof of concept.
Kind regards
Bernd
The source is:
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
Please be a bit patient when displaying the chars. When saved with the stack the loading is slow.
Just a proof of concept.
Kind regards
Bernd
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
Hi Kaveh,
here is a stack that does the whole thing:
get the google data clean it, turn json into array, turn array into a table.
It uses a regular field with hGrid and vGrid turned no and tabstops set to 100.
the url is hardcode in the button.
It would be interesting to test this with more than one column
Kind regards
Bernd
here is a stack that does the whole thing:
get the google data clean it, turn json into array, turn array into a table.
It uses a regular field with hGrid and vGrid turned no and tabstops set to 100.
the url is hardcode in the button.
It would be interesting to test this with more than one column
Kind regards
Bernd
- Attachments
-
- putJasonIntoTable.livecode.zip
- resolves the array and puts it into a table
- (2.64 KiB) Downloaded 55 times
-
- VIP Livecode Opensource Backer
- Posts: 3461
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Getting JSON from Google Sheets with unicode chars
Hi Kaveh,
could you put up a Google Sheet with a couple of columns and a couple of rows the way you did the other ones?
I would like to test above stack with more than one column.
Kind regards
Bernd
could you put up a Google Sheet with a couple of columns and a couple of rows the way you did the other ones?
I would like to test above stack with more than one column.
Kind regards
Bernd