Converting Unicode codes to readable text
I'm getting data from a website in JSON format. As far as I understand, the text is Russian and I've put it into a field.
The text is this:
\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?
Is there no simple way to convert this to readable text? I've searched the forum and I came across this:
https://forums.livecode.com/viewtopic.p ... on#p183823
Is there no easier way to convert those Unicode codes?
Re: Converting Unicode codes to readable text
Does the JSON file or the API documentation include a description of the encoding used?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: Converting Unicode codes to readable text
No, I checked both the returned JSON output and the API docs and encoding isn’t mentioned anywhere.
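For context on the encoding question: \uXXXX is the JSON string escape for a UTF-16 code unit written as four hex digits, so it is part of the JSON syntax rather than a separate text encoding. A JSON parser normally resolves these escapes while parsing. The sketch below assumes the JSON library bundled with LiveCode 8 and later (its JsonImport function) is available and that it decodes the escapes; the key name "text" and the sample payload are hypothetical.

```livecode
on mouseUp
   local tJson, tData, tEscaped
   -- Hypothetical payload: the key name "text" is an assumption, not from the API in question
   put "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?" into tEscaped
   put "{" & quote & "text" & quote & ":" & quote & tEscaped & quote & "}" into tJson
   -- JsonImport parses the JSON text into a LiveCode array
   put JsonImport(tJson) into tData
   answer tData["text"]
end mouseUp
```

If the parser handles the escapes as expected, the dialog should show the readable Russian text and no manual conversion is needed. In a standalone, the JSON library may have to be added in the standalone settings.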
Re: Converting Unicode codes to readable text
Code: Select all
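on mouseUp
   -- Split the text on the backslash; item 1 (before the first backslash) is empty, so start at item 2
   set the itemDelimiter to "\"
   put 2 into KOUNT
   -- The loop stops at an item equal to "XXX", so this relies on the text in fld "fRAW" ending with \XXX
   repeat until item KOUNT of fld "fRAW" is "XXX"
      put item KOUNT of fld "fRAW" into BUKVA
      delete char 1 of BUKVA -- drop the leading "u"
      put ("0x" & BUKVA) into MAGIC
      put numToCodepoint(MAGIC) after fld "fOUT"
      add 1 to KOUNT
   end repeat
end mouseUp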
Привет как дела
Hey, what's up?
Attachment: Dekoder.livecode.zip (Here's the stack.)
Re: Converting Unicode codes to readable text
Thanks for this, richmond62! So it does look like I need to convert each character one by one. I was hoping that LiveCode had some function for this that I overlooked, but I guess not. I'll figure out a way to make sure the space gets preserved. Thanks again.
Re: Converting Unicode codes to readable text
In principle, my script is a function!
Re: Converting Unicode codes to readable text
A bit quicker, but the same idea:
Code: Select all
function doTranslate pString
   set the itemDelimiter to "\u"
   if char 1 to 2 of pString = "\u" then delete char 1 to 2 of pString -- avoid empty first item
   repeat for each item i in pString
      put numToCodepoint("0x" & i) after tTranslation
   end repeat
   return tTranslation
end doTranslate
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: Converting Unicode codes to readable text
Hi,
Applying your text sample with the last solution, I found 2 errors:
-- spaces are suppressed
-- the last chunk \u0430? breaks the code (error with numToCodepoint)
So, here is my take on this:
Code: Select all
local T = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?"
on mouseUp
   put tdzTranslate(T)
end mouseUp
Code: Select all
on getCodePoint V
   return numToCodepoint("0x" & V)
end getCodePoint
function tdzTranslate T
   local R
   get sunnyReplace(T,"\\u([0-9a-f]{4})","?{ getCodePoint \1}", R)
   return R
end tdzTranslate
--> Привет как дела?
and thank you, I've learned my 1st Russian sentence today
Take care,
Thierry
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
Re: Converting Unicode codes to readable text
Thanks Jacque and Thierry.
Thierry, for my own small project I can't really afford a paid external, but it's good to know that it's there and I've made a note of it; maybe I will use it some time in the future.
For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder but anyway, here you have it:
Code: Select all
on mouseup
   put "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?" into myTranslation
   repeat
      put "\u" into myCharsToFind
      put offset(myCharsToFind, myTranslation) into myStartChar
      if myStartChar is 0 then exit repeat
      put myStartChar + 5 into myEndChar
      put char myStartChar to myEndChar of myTranslation into codeToConvert
      replace "\u" with "0x" in codeToConvert
      put numToCodepoint(codeToConvert) into myChar
      delete char myStartChar to myEndChar in myTranslation
      put myChar after char myStartChar - 1 in myTranslation
   end repeat
   answer myTranslation
end mouseup
Re: Converting Unicode codes to readable text
japino wrote: Thierry, for my own small project I can't really afford a paid external...
It's fine for me, I do understand.
Actually, I have a small number of regex followers who like
regex use cases; that's the main reason for my regex posts...
Oh, BTW, it's a library, not an external.
japino wrote: For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder but anyway, here you have it:
I've quickly made a new version of your excellent code,
just in case you're curious...
But your code and mine are not efficient for long input text!
Code: Select all
function tdzTranslate txt
   repeat
      put offset("\u", txt) into idxStart
      if idxStart is 0 then exit repeat
      put idxStart + 5 into idxEnd
      get numToCodepoint("0x" & char idxStart+2 to idxEnd of txt)
      put IT into char idxStart to idxEnd of txt
   end repeat
   return txt
end tdzTranslate
Thierry
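On the efficiency point: the offset loop rescans the text from the start for every escape, which adds up on long input. A single pass that splits on "\u" (the same multi-character itemDelimiter used earlier in the thread) avoids that. The following is a minimal sketch, not a tested library routine; the handler name decodeUnicodeEscapes is made up for illustration. It assumes every "\u" is followed by exactly four hex digits, and it does not treat escaped backslashes or surrogate pairs specially.

```livecode
function decodeUnicodeEscapes pString
   local tResult, tItem, tIsFirst
   set the itemDelimiter to "\u"
   put true into tIsFirst
   repeat for each item tItem in pString
      if tIsFirst then
         -- Anything before the first "\u" is plain text (empty if the string starts with "\u")
         put tItem after tResult
         put false into tIsFirst
      else
         -- The first four chars are the hex code; the rest (spaces, "?", etc.) is kept as-is
         put numToCodepoint("0x" & char 1 to 4 of tItem) after tResult
         put char 5 to -1 of tItem after tResult
      end if
   end repeat
   return tResult
end decodeUnicodeEscapes
```

Called with the sample text from the first post, this should return the sentence with its spaces and the trailing question mark intact, since only the four hex digits after each "\u" are passed to numToCodepoint.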
Re: Converting Unicode codes to readable text
Aw, many thanks for this Thierry, this is excellent! And I don't worry about long texts, because I should be dealing with sentences only.