Page 1 of 2
Getting JSON from Google Sheets with unicode chars
Posted: Sat Aug 31, 2019 1:55 pm
by kaveh1000
Hi all
Starting with this post from André:
https://andregarzia.com/2018/11/reading ... ecode.html
I have been trying to get data from Google Sheets, but in JSON as opposed to CSV. With some excellent offline help I have made great progress. A particular problem is that non-ascii characters are corrupted in the JSON I get into LiveCode. Here is the sheet I am trying to import:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Please note the bullets and degree signs in the last column. I want these to be in the JSON exactly as they are. If you download the sheet as CSV or TSV, they are indeed downloaded as such. But in the JSON they come out as:
\u2022 and
°
I attach a minimal stack that imports the data. Please see script of button and stack.
Any hints welcome.
Regards
Kaveh
Re: Getting JSON from Google Sheets with unicode chars
Posted: Fri Sep 27, 2019 9:51 am
by bn
Hi Kaveh,
I don't know if you have solved this.
Here is some code that fits into your stack that you might try.
Code: Select all
on mouseup
put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
into tURL
put "any_name_with_no_space" into tSheet
-- put break point below to see array returned
put _get_data_from_Google_Sheet(tURL, tSheet) into tData
## takes care of "degrees"
put textDecode(tData, "utf8") into tData
## clear templateField
reset the templateField
repeat
put offset("\u", tData) into tOffset
if tOffset > 0 then
## isolate hexadecimal number assuming all start with \u and are 4 chars long
put char tOffset + 2 to tOffset + 5 of tData into tText
## convert hexadecimal number to html entity (hexadecimal) like "•"
put "&#x" & tText & ";" into tHTML
## generate htmlText
set the htmlText of the templateField to tHTML
## textual representation of hecadecimal number
put the text of the templateField into tText
## replace json-hexadecimal with text representation
put tText into char tOffset to tOffset + 5 of tData
else
## clear templateField
reset the templateField
exit repeat
end if
end repeat
## (optional) changes \n into linefeed
put format(tData) into tData
put tData into field 2
end mouseup
-- put "•" into tTest
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Fri Sep 27, 2019 10:02 am
by kaveh1000
Thank you so much Bernd. What I was looking for was
Code: Select all
put textDecode(tData, "utf8") into tData
that I got from Richmond's post here:
https://forums.livecode.com/viewtopic.php?f=9&t=33141
It seems to correct all characters. I tested with around 20. Do I still need your code to test for \u ?
Re: Getting JSON from Google Sheets with unicode chars
Posted: Fri Sep 27, 2019 10:08 am
by bn
Do I still need your code to test for \u ?
I don't know but the code decodes \uxxxx
Although in a somewhat "homemade" fashion.
\u2022 which is used to represent the ° was not decoded. So it might be useful for exotic characters.
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Sun Sep 29, 2019 4:46 pm
by kaveh1000
Hi Bernd
You were right that not all chars are translated with the textDecode handler. Your other lines do convert almost all. Working on it and will report back.
Kaveh
Re: Getting JSON from Google Sheets with unicode chars
Posted: Sun Sep 29, 2019 8:02 pm
by bn
Hi Kaveh,
I replaced the hack via html with numToCodePoint() to get the "\uXXXX" encoded Jason characters.
That looks a bit more "professional"...
And maybe it solves the problem for those that did not convert before?
Code: Select all
on mouseup
local tURL, tSheet, tData, tOffset, tText
put "https://docs.google.com/spreadsheets/d/1hadj4bbJcSsgWIFaupz-_O_2ZAX92jOBP5RhakFpG24/edit?usp=sharing" \
into tURL
put "any_name_with_no_space" into tSheet
-- put break point below to see array returned
put _get_data_from_Google_Sheet(tURL, tSheet) into tData
## takes care of "degrees"
put textDecode(tData, "utf8") into tData
repeat
put offset("\u", tData) into tOffset
if tOffset > 0 then
## isolate hexadecimal number assuming all start with \u and are 4 chars long
put char tOffset + 2 to tOffset + 5 of tData into tText
## take hexadecimal number and convert it to codePoint i.e. a character
put "0x" before tText
put numToCodepoint(tText) into tText
## replace json-hexadecimal with text representation
put tText into char tOffset to tOffset + 5 of tData
else
exit repeat
end if
end repeat
## (optional) changes \n into linefeed
put format(tData) into tData
put tData into field 2
end mouseup
you can test this on above stack.
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Sun Sep 29, 2019 9:54 pm
by richmond62
hexadecimal number assuming all start with \u and are 4 chars
That might be an unwise assumption:
-
-

- DA.jpg (10.27 KiB) Viewed 10261 times
Re: Getting JSON from Google Sheets with unicode chars
Posted: Sun Sep 29, 2019 10:22 pm
by kaveh1000
Thanks so much Bernd and Richmond. I will give this a try as soon as I can. In the meantime, I put some typical unicode symbols that I have been using, as a torture test:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Regards
Kaveh
Re: Getting JSON from Google Sheets with unicode chars
Posted: Sun Sep 29, 2019 10:57 pm
by bn
richmond62 wrote: ↑Sun Sep 29, 2019 9:54 pm
hexadecimal number assuming all start with \u and are 4 chars
That might be an unwise assumption:
this is json format for whatever:
https://www.json.org
I take this as meaning they always have
escape + u + 4 chars
'\' escape
escape 'u' hex hex hex hex
hex: digit 'A' . 'F' 'a' . 'f'
from
https://www.crockford.com/mckeeman.html
@Kaveh
I tried both versions (the html and the numToCodepoint) with your torture test and they both seem to work. No funny stuff in it.
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 8:23 am
by Thierry
bn wrote:
I replaced the hack via html with numToCodePoint() to get the "\uXXXX"
encoded Jason characters.
Hi Kaveh,
As you're a sunnYrex user, here is another solution using sunnYreplace().
In fact it's the same logic as Bernd's solution, but written differently.
Code: Select all
<....>
put textDecode(tSheetImported, "utf8") into tSheetImported
get sunnyReplace( tSheetImported, "(\\u([\da-fA-F]+))", "?{Ux \2}", R)
return R
end _get_data_from_Google_Sheet
on Ux digits4
return numToCodepoint("0x" & digits4)
end Ux
Well, not tested but you will tell me...
Kind regards,
Thierry
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 8:36 am
by kaveh1000
Thank you Thierry. I have not even got onto using the advanced functions of sunnYrex, where (I think) you are calling an external handler. Good reminder for me! i will certainly be using sunnYreplace as soon as I get the principle sorted out.
As an interim report, I can confirm all the strange chars I put in are converted correctly. I think the problem is in quotation characters (as so often in LiveCode). I have put another minimal sheet here:
https://docs.google.com/spreadsheets/d/ ... =622506043
These seem to be converted correctly with Bernd's code, but they fail further in LiveCode. I think it is a question escaping the quotes. I will get onto it as soon as I can.
Thanks all.
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 9:05 am
by bn
@Thierry: Bonjour Monsieur
@Kaveh
I tested numToCodepoint with all 16729 variants of XXXX - unicode. That seems to work, some characters are not displayed but apart from ASCII 0 to 31 which are not displaying anyways that was probably a font problem.
I tested in your stack you posted above with your latest link and the "torture" link . So when you have retrieved and cleaned up what you get from your sheet and it is put into the lower field then this code converts it into a nice array. No problem with quotes and such. The array is then resolved and displayed in the message box.
Code: Select all
on mouseUp
local tData
local tCollect
put field 2 into tData
## get rid of non-jason stuff
## beginning
set the itemDelimiter to "{"
put empty into item 1 of tData
## end
set the itemDelimiter to "}"
put empty into item -1 of tData
set the itemDelimiter to comma
## import tData
put JsonImport(tData) into tCollect
## works only in IDE, puts the resolved array into the message box
revPutArray tCollect
end mouseUp
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 9:43 am
by bn
here is the stack I tested the display of the 16729 variants of XXXX unicode chars. Meaning using 4 "chars/numbers" to code the hex value of the char.
The source is:
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
Please be a bit patient when displaying the chars. When saved with the stack the loading is slow.
Just a proof of concept.
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 4:02 pm
by bn
Hi Kaveh,
here is a stack that does the whole thing:
get the google data clean it, turn json into array, turn array into a table.
It uses a regular field with hGrid and vGrid turned no and tabstops set to 100.
the url is hardcode in the button.
It would be interesting to test this with more than one column
Kind regards
Bernd
Re: Getting JSON from Google Sheets with unicode chars
Posted: Mon Sep 30, 2019 8:56 pm
by bn
Hi Kaveh,
could you put up a Google Sheet with a couple of columns and a couple of rows the way you did the other ones?
I would like to test above stack with more than one column.
Kind regards
Bernd