Converting Unicode codes to readable text

Post by japino » Fri Mar 27, 2020 8:43 am

I'm getting data from a website in JSON format. As far as I understand, the text is Russian and I've put it into a field.

The text is this:
\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?

Is there no simple way to convert this to readable text? I've searched the forum and I came across this:
https://forums.livecode.com/viewtopic.p ... on#p183823

Is there no easier way to convert those Unicode codes?
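Each \uXXXX sequence is a standard JSON string escape. The four characters after \u are a Unicode code point written in hexadecimal, so one escape always maps to one character. Below is a minimal sketch of that single step. It assumes the input is exactly "\u" followed by four hex digits, and decodeOneEscape is a hypothetical name:

```livecode
-- hypothetical helper: turns one escape such as "\u041f" into its character
function decodeOneEscape pEscape
   -- chars 3 to 6 are the hex digits; the "0x" prefix makes LiveCode read them as hex
   return numToCodepoint("0x" & (char 3 to 6 of pEscape))
end decodeOneEscape

on mouseUp
   answer decodeOneEscape("\u041f") -- U+041F is CYRILLIC CAPITAL LETTER PE, shown as П
end mouseUp
```

The full sentence in the question is just a series of these escapes, with literal spaces and a "?" in between.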

Post by FourthWorld » Fri Mar 27, 2020 3:51 pm

Does the JSON file or the API documentation include a description of the encoding used?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post by japino » Fri Mar 27, 2020 8:45 pm

No, I checked both the returned JSON output and the API docs and encoding isn’t mentioned anywhere.

Post by richmond62 » Fri Mar 27, 2020 9:41 pm

[Image: Dekoder.png]

Code:

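on mouseUp
   -- split the raw text on backslashes: item 1 is empty, item 2 is "u041f", ...
   set the itemDelimiter to "\"
   put 2 into KOUNT
   -- stop marker: field "fRAW" is assumed to end with \XXX
   repeat until item KOUNT of fld "fRAW" is "XXX"
      put item KOUNT of fld "fRAW" into BUKVA
      delete char 1 of BUKVA -- drop the "u", leaving the hex digits
      put ("0x" & BUKVA) into MAGIC
      put numToCodepoint(MAGIC) after fld "fOUT"
      add 1 to KOUNT
   end repeat
end mouseUp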
Oddly enough the 3 words don't have gaps between them:

Привет как дела

Hey, what's up?
Attachments
Dekoder.livecode.zip
Here's the stack.

Post by japino » Sat Mar 28, 2020 6:38 pm

Thanks for this richmond62! So it does look like I need to convert each character one by one. I was hoping that LiveCode had some function for this that I overlooked, but I guess not. I'll figure out a way to make sure the space gets preserved. Thanks again.
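One way to keep the spaces and any other literal text, such as the final "?", is to split on "\u" and convert only the first four characters of each piece. The rest of each piece is copied unchanged. This is a sketch that assumes every \u is followed by exactly four hex digits, so a JSON-escaped backslash written as \\u would be misread. decodeUEscapes is a hypothetical name:

```livecode
-- hypothetical helper: decodes \uXXXX escapes and keeps all other text
function decodeUEscapes pString
   local tResult, tItem, tIsFirst
   set the itemDelimiter to "\u"
   put true into tIsFirst
   repeat for each item tItem in pString
      if tIsFirst then
         -- text before the first escape (empty when pString starts with \u)
         put tItem after tResult
         put false into tIsFirst
      else
         -- the first four chars are the hex code point
         put numToCodepoint("0x" & (char 1 to 4 of tItem)) after tResult
         -- anything after them (a space, "?", ...) is copied unchanged
         put char 5 to -1 of tItem after tResult
      end if
   end repeat
   return tResult
end decodeUEscapes
```

With the sample text this should give "Привет как дела?". A call such as `put decodeUEscapes(field "fRAW") into field "fOUT"` would also remove the need for an end marker in the field.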

Post by richmond62 » Sat Mar 28, 2020 7:41 pm

In principle, my script is a function! 8)

Post by jacque » Sat Mar 28, 2020 11:42 pm

A bit quicker, but the same idea:

Code:

function doTranslate pString
  local tTranslation
  -- the itemDelimiter may be more than one character
  set the itemDelimiter to "\u"
  if char 1 to 2 of pString = "\u" then delete char 1 to 2 of pString -- avoid empty first item
  -- each item now starts with the four hex digits of one escape
  repeat for each item i in pString
    put numToCodepoint("0x" & i) after tTranslation
  end repeat
  return tTranslation
end doTranslate
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

Post by Thierry » Sun Mar 29, 2020 8:57 am

Hi,

Applying the last solution to your text sample, I found 2 errors:
-- spaces are suppressed
-- the last chunk, \u0430?, breaks the code (error in numToCodepoint)

So, here is my take on this:

Code:

local T = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?"

on mouseUp
   put tdzTranslate(T)
end mouseUp

Code:

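-- callback: sunnyReplace runs this for each match, presumably passing the captured hex digits (\1)
on getCodePoint V
   return numToCodepoint( "0x" & V)
end getCodePoint

function tdzTranslate T
   local R
   -- sunnyReplace is not built into LiveCode; it comes from Thierry's paid regex library
   get sunnyReplace(T,"\\u([0-9a-f]{4})","?{ getCodePoint \1}", R)
   return R
end tdzTranslate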
--> Привет как дела?


and thank you, I've learned my 1st Russian sentence today :)

[Screenshot: sunnYscrrenshot 2020-03-29 à 09.49.57.png]

Take care,

Thierry
!
SUNNY-TDZ.COM has not belonged to me since 2021.
To contact me, use the Private messages. Merci.
!
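sunnyReplace is part of a paid library. You can write a similar regex-driven loop with LiveCode's built-in matchChunk, which reports the start and end positions of the capture group. This is a sketch that assumes matchChunk returns character (not byte) positions. decodeWithMatchChunk is a hypothetical name:

```livecode
-- hypothetical helper: regex-based decoding without an external library
function decodeWithMatchChunk pText
   local tResult, tStart, tEnd
   -- a LiveCode string literal has no escapes, so PCRE receives \\u(...),
   -- where \\ matches one literal backslash
   repeat while matchChunk(pText, "\\u([0-9A-Fa-f]{4})", tStart, tEnd)
      -- tStart..tEnd bracket the 4 hex digits; "\u" is the 2 chars before them
      put char 1 to tStart - 3 of pText after tResult
      put numToCodepoint("0x" & (char tStart to tEnd of pText)) after tResult
      -- drop everything already handled so the next search starts fresh
      delete char 1 to tEnd of pText
   end repeat
   -- whatever remains contains no more escapes
   return tResult & pText
end decodeWithMatchChunk
```

The pattern only matches \u followed by four hex digits. A plain search for "\u" would also catch a stray "\u" with no valid digits after it, but this loop leaves such text as it is.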

Post by japino » Sun Mar 29, 2020 10:25 am

Thanks Jacque and Thierry.

Thierry, for my own small project I can't really afford a paid external, but it's good to know that it's there and I've made a note of it; maybe I will use it sometime in the future.

For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder :oops: but anyway, here you have it:

Code:

on mouseup
   put "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?" into myTranslation
   repeat
      put "\u" into myCharsToFind
      -- find the next escape; 0 means none are left
      put offset(myCharsToFind, myTranslation) into myStartChar
      if myStartChar is 0 then exit repeat
      -- an escape is 6 chars long: "\u" plus four hex digits
      put myStartChar + 5 into myEndChar
      put char myStartChar to myEndChar of myTranslation into codeToConvert
      replace "\u" with "0x" in codeToConvert
      put numToCodepoint(codeToConvert) into myChar
      -- swap the escape for the decoded character
      delete char myStartChar to myEndChar in myTranslation
      put myChar after char myStartChar - 1 in myTranslation
   end repeat
   answer myTranslation
end mouseup

Post by Thierry » Sun Mar 29, 2020 11:35 am

japino wrote: Thierry, for my own small project I can't really afford a paid external...
That's fine with me; I understand.
Actually, I have a small number of regex followers who like regex use cases; that's the main reason for my regex posts...

Oh, BTW, it's a library, not an external.
japino wrote: For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder :oops: but anyway, here you have it:

I've quickly made a new version of your excellent code,
just in case you're curious...
But neither your code nor mine is efficient for long input text!

Code:

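function tdzTranslate txt
   repeat
      -- find the next escape; 0 means none are left
      put offset("\u", txt) into idxStart
      if idxStart is 0 then exit repeat
      put idxStart + 5 into idxEnd -- "\u" plus four hex digits
      -- convert the four hex digits that follow "\u"
      get numToCodepoint("0x" & char idxStart+2 to idxEnd of txt)
      -- overwrite the 6-char escape with the decoded character
      put IT into char idxStart to idxEnd of txt
   end repeat
   return txt
end tdzTranslate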
Take care,

Thierry
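The offset loop slows down on long text. Every pass searches again from the start of the string and then rewrites the string in place. A single-pass variant can pass offset its optional third parameter (the number of characters to skip) and collect the output in a separate variable. This is a sketch under the same assumption that every \u is followed by four hex digits. decodeSinglePass is a hypothetical name, and the speed-up has not been measured:

```livecode
-- hypothetical helper: one left-to-right pass, output built separately
function decodeSinglePass pText
   local tResult, tPos, tFound
   put 0 into tPos -- number of chars of pText already handled
   repeat
      -- with a skip count, offset returns a position relative to tPos
      put offset("\u", pText, tPos) into tFound
      if tFound is 0 then exit repeat
      -- copy the plain text between the previous escape and this one
      put char tPos + 1 to tPos + tFound - 1 of pText after tResult
      -- the four hex digits follow the two-char "\u"
      put numToCodepoint("0x" & (char tPos + tFound + 2 to tPos + tFound + 5 of pText)) after tResult
      add tFound + 5 to tPos
   end repeat
   -- append whatever follows the last escape
   return tResult & (char tPos + 1 to -1 of pText)
end decodeSinglePass
```

Neither pText nor the already-decoded text is scanned twice, which is where the original loop spent its extra time.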

Post by japino » Sun Mar 29, 2020 4:09 pm

Aw, many thanks for this Thierry, this is excellent! And I don't worry about long texts, because I should be dealing with sentences only. :)
