
character encoding - never ending story

Posted: Sun Feb 26, 2017 9:56 pm
by UKMC
Hi everyone,

I have a MariaDB (10.0.29) database with character set latin1 and collation latin1_german1_ci.

When I save the text "(ÄÖÜäöüßÉ)" from LiveCode to the DB without any special handling, the database stores "(????????)".

After searching the forum, I added the following line to my LiveCode script:
put unidecode(uniencode("(ÄÖÜäöüßÉ)"),"utf8") into utf8text
and sent utf8text to the database.

Result: characters are stored correctly.

But when I retrieve the data again, I get "(Ã„Ã–ÃœÃ¤Ã¶Ã¼ÃŸÃ‰)".
Now I do not know how to convert them back to the original characters in LiveCode. I tried:
put unidecode(uniencode("(ÄÖÜäöüßÉ)"),"utf8") into utf8text (does not work)
mactoiso("(ÄÖÜäöüßÉ)") results in "(ÄÖÜäöüßÉ)"
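What seems to be happening can be sketched in Python rather than LiveCode, because the effect is purely byte-level. The sketch assumes two things. First, the UTF-8 bytes come back from the latin1 column unchanged. Second, the client then maps each byte to one character using CP1252, the usual Windows native encoding. The "Ãœ" → "Ü" line in the function below points to this.

```python
# Sketch: how UTF-8 bytes turn into "Ã„Ã–..." when they are read back as CP1252.
# Assumption: the database returns the stored bytes unchanged and the client
# maps each byte to one character using CP1252 (Windows "native" encoding).
original = "(ÄÖÜäöüßÉ)"

# Roughly what unidecode(uniencode(...), "utf8") produced: two bytes per umlaut.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as one character per byte in CP1252.
garbled = utf8_bytes.decode("cp1252")  # "(Ã„Ã–ÃœÃ¤Ã¶Ã¼ÃŸÃ‰)"

print(len(original), len(utf8_bytes))  # 10 18
```

Each special character turns into "Ã" plus one more character. That is why a replace table keyed on two-character sequences happens to work.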

The following solution works, but it seems very inelegant to me:

function Umlautdecodierung eingabe
   -- repair UTF-8 sequences that came back as two CP1252 characters each
   -- Ä
   replace "Ã„" with "Ä" in eingabe
   -- Ö
   replace "Ã–" with "Ö" in eingabe
   -- Ü
   replace "Ãœ" with "Ü" in eingabe
   -- ä
   replace "Ã¤" with "ä" in eingabe
   -- ö
   replace "Ã¶" with "ö" in eingabe
   -- ü
   replace "Ã¼" with "ü" in eingabe
   -- ß
   replace "ÃŸ" with "ß" in eingabe
   -- É
   replace "Ã‰" with "É" in eingabe

   return eingabe
end Umlautdecodierung


I hope one of you has the better idea to solve this problem.

By the way: my favourite solution would be to avoid converting altogether, as it makes scripting much more complicated. Perhaps you know the magic configuration for my MariaDB.

Best regards


Ulrich

Re: character encoding - never ending story

Posted: Mon Feb 27, 2017 12:43 pm
by AndyP
I think you will also need to set the database's default character set to UTF-8.

See here: https://mariadb.com/kb/en/mariadb/setti ... ollations/

Re: character encoding - never ending story

Posted: Mon Feb 27, 2017 8:42 pm
by jacque
The uniEncode and uniDecode functions are deprecated and shouldn't be used with LC 7 or above, though they do still work. The new functions are much easier to work with, and I recommend them: see textEncode() and textDecode() in the dictionary. They do all the work for you, as long as you know which character set you're working with (and you do).

Or you can follow AndyP's suggestion, and set the database to use UTF8.
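The garbling can be undone without a hand-written replace table. Turn the characters back into the bytes they came from, then decode those bytes as UTF-8. In LiveCode, calling textDecode(tData, "UTF-8") on the raw query result should amount to this. The Python sketch below (not LiveCode) shows the same two steps and assumes the text was misread as CP1252. The function name `undo_utf8_read_as_cp1252` is hypothetical.

```python
def undo_utf8_read_as_cp1252(garbled: str) -> str:
    """Recover text whose UTF-8 bytes were displayed as CP1252 characters.

    Step 1: encode with CP1252 to get back the original byte sequence.
    Step 2: decode those bytes as UTF-8, which is what they really are.
    """
    raw_bytes = garbled.encode("cp1252")
    return raw_bytes.decode("utf-8")


# Round trip: garble the text the same way, then repair it.
sample = "(ÄÖÜäöüßÉ)"
garbled = sample.encode("utf-8").decode("cp1252")
repaired = undo_utf8_read_as_cp1252(garbled)
print(repaired == sample)  # True
```

Unlike a fixed table, this also covers characters the table does not list, such as "é" or "€". On non-ASCII text that was never garbled this way, for instance "Müller", it typically raises UnicodeDecodeError. It should therefore only be applied to data known to be double-encoded.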

Re: character encoding - never ending story

Posted: Thu Mar 09, 2017 1:12 pm
by MaxV

Code:

put urlencode("ÄÖÜäöüßÉ")
-- result is plain ASCII: %C4%D6%DC%E4%F6%FC%DF%C9

Code:

put urldecode("%C4%D6%DC%E4%F6%FC%DF%C9")
-- result is your characters: ÄÖÜäöüßÉ

:D
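urlencode works on bytes: every byte outside plain ASCII becomes %XX. A rough Python analogy using urllib.parse (not LiveCode's implementation) reproduces the result above if the text is first encoded as Latin-1. That is the assumption behind the single-byte codes. It also shows the catch: decoding must use the same character set that was used for encoding.

```python
from urllib.parse import quote, unquote

text = "ÄÖÜäöüßÉ"

# One byte per character in Latin-1, so each character becomes one %XX escape.
encoded = quote(text, encoding="latin-1")
print(encoded)  # %C4%D6%DC%E4%F6%FC%DF%C9

# Decoding with the same character set restores the text...
restored = unquote(encoded, encoding="latin-1")

# ...but decoding the same escapes as UTF-8 (unquote's default) does not:
# these bytes are not valid UTF-8 and are replaced with U+FFFD.
wrong = unquote(encoded)
print(restored == text, "\ufffd" in wrong)  # True True
```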

Re: character encoding - never ending story - mySQL

Posted: Thu May 18, 2017 8:55 pm
by Hans-Helmut
I am lost at the moment and need help. I have to use UTF-8-encoded international characters: mainly Russian, German and English together.
Even though everything looks fine on the server and when browsing the database with phpMyAdmin, it does not work in LiveCode, whatever settings I try.

Code:

# Simplified code snippet without error checking:
on mouseUp
   global gConnectionID
   put "SELECT name FROM party" into tSQL
   -- fetch the rows as tab/return-delimited text
   put revDataFromQuery(tab, cr, gConnectionID, tSQL) into tList
   -- interpret the returned data as UTF-8
   put textDecode(tList, "UTF-8") into field "data"
end mouseUp
The Russian string is "аловуе" (alowue).
Selecting this record results in "??????"

Settings
LiveCode: 8.1.4 (rc 2)
OS: Windows 2000, 64bit, latest update

Server: Localhost via UNIX socket
Server type: MySQL
Server version: 5.6.33-log - MySQL Community Server (GPL)
Protocol version: 10
User: b@localhost
Server charset: UTF-8 Unicode (utf8)

Server connection collation: utf8_general_ci
User language: English
Database: b_address
Table: party
Column: name
Column collation: utf8_general_ci

Re: character encoding - never ending story

Posted: Fri May 19, 2017 7:33 am
by Hans-Helmut
I am still stuck with Russian text in MySQL and LC...

For now, I cannot go through PHP or server-side scripting; I need to use the direct connection as described.

All settings in the MySQL database are for UTF-8.

Russian characters are visible on the server side. But they do not render on the client side.

Executing through LiveCode with textDecode():

1. Special Latin-1 characters are not shown: "Müller" becomes "Mller". Why? This is wrong.
2. Russian characters do not render at all: "Димитрий" becomes "?????????". Why? This is wrong.

Executing without textDecode():

1. Special Latin-1 characters are shown: "Müller" stays "Müller", with the u-umlaut.
2. Russian characters still do not render: "Димитрий" becomes "?????????".

Is this a bug in LiveCode?
Is there still something wrong on the server-side settings?

I really need this to work. I cannot use PHP for it, and a LiveCode server installation is not permitted.

Thanks for any help.

Re: character encoding - never ending story

Posted: Fri May 19, 2017 10:13 am
by bangkok
Hans-Helmut wrote: All settings in the MySQL database are for UTF-8.

Russian characters are visible on the server side. But they do not render on the client side.
Before you run your SELECT, try executing this query:

Code:

   revExecuteSQL gConnectionID, "SET NAMES 'utf8'"
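SET NAMES can matter because the connection's character set decides which bytes the server sends, whatever the column's utf8 collation says. The Python sketch below is neither LiveCode nor MySQL code. It models the situation under two stated assumptions. First, the connection was left at latin1. Second, the server substitutes "?" for characters latin1 cannot hold. The helper `bytes_over_connection` is hypothetical. How the invalid byte vanishes on a UTF-8 decode is modeled with `errors="ignore"` and is also an assumption. Under these assumptions the model reproduces the pattern of both symptoms Hans-Helmut reported.

```python
def bytes_over_connection(value: str, connection_charset: str) -> bytes:
    # Hypothetical model of the server side: convert the stored text to the
    # connection's character set, writing "?" for anything it cannot represent.
    return value.encode(connection_charset, errors="replace")


# Connection charset latin1 (assumed default before SET NAMES 'utf8').
mueller = bytes_over_connection("Müller", "latin-1")     # b"M\xfcller"
russian = bytes_over_connection("Димитрий", "latin-1")   # b"????????": lost on the server

# Without textDecode: the client reads the bytes as its native CP1252.
no_decode = mueller.decode("cp1252")                     # "Müller"

# With a UTF-8 decode: 0xFC is not valid UTF-8; dropping it (assumed) gives "Mller".
utf8_decode = mueller.decode("utf-8", errors="ignore")   # "Mller"

# After SET NAMES 'utf8' the server sends UTF-8, and a UTF-8 decode is right.
fixed = bytes_over_connection("Димитрий", "utf-8").decode("utf-8")
print(russian, fixed == "Димитрий")  # b'????????' True
```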

Re: character encoding - never ending story

Posted: Fri May 19, 2017 5:27 pm
by jacque
As mentioned above, you need to use textDecode() to translate the incoming text to a format LC can use. Most databases use UTF8 so I think it's safe to assume that.

Code:

put textDecode(data, "UTF8") into tString
Edit: I just saw you are already using textDecode, so ignore the above.

Re: character encoding - never ending story

Posted: Sat May 20, 2017 8:22 pm
by MaxV
Use the urlencode and urldecode functions: all characters are translated to standard ASCII, so the data can be put safely into a database, and urldecode brings them back in your character set.
The urlencode and urldecode functions are the best way to preserve data. See http://livecode.wikia.com/wiki/URLEncode

Examples:

put urlencode("Müller")
-- returns M%FCller

put urlencode(textEncode("Димитрий","UTF8"))
-- returns %D0%94%D0%B8%D0%BC%D0%B8%D1%82%D1%80%D0%B8%D0%B9

put urldecode("M%FCller")
-- returns Müller

put textdecode(urldecode("%D0%94%D0%B8%D0%BC%D0%B8%D1%82%D1%80%D0%B8%D0%B9"),"UTF8")
-- returns Димитрий

As you can see, urlencode always produces plain ASCII, which is compatible with any character set, so the data are compatible with any database in the world!!! :D
Why did I add textencode/textdecode for the Russian characters? Because my PC is UTF16, but urlencode/urldecode work only with UTF8 characters. LiveCode always works with the PC encoding, in my case UTF16, so I needed to add textencode for characters like Russian ones, whose hexadecimal values on my PC differ from their UTF8 values.
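The same round trip in Python (an analogy using urllib.parse, not LiveCode) makes the byte-level behaviour explicit. The Cyrillic text is first turned into UTF-8 bytes, and each byte becomes %XX. It also shows a cost worth weighing before adopting the approach. The stored value is several times longer than the text. The database also only sees the escaped form, so collation-aware sorting and plain-text searches (for example LIKE 'Дим%') no longer work on the column as they would on readable text.

```python
from urllib.parse import quote, unquote

name = "Димитрий"

# quote() encodes the text as UTF-8 by default, then escapes every byte.
encoded = quote(name)
print(encoded)                  # %D0%94%D0%B8%D0%BC%D0%B8%D1%82%D1%80%D0%B8%D0%B9
print(len(name), len(encoded))  # 8 48

# unquote() reverses both steps when it uses the same encoding (UTF-8 here).
restored = unquote(encoded)
print(restored == name)  # True

# Single-byte Latin-1 text gives the shorter form from the Müller example.
print(quote("Müller", encoding="latin-1"))  # M%FCller
```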

Re: character encoding - never ending story

Posted: Sat May 20, 2017 10:46 pm
by jacque
my PC is UTF16, but URLencode/urldecode works only with UTF8 chars.
Actually, textEncode/textDecode work with nine different encodings:

"ASCII"
"UTF-16"
"UTF-16BE"
"UTF-16LE"
"UTF-32"
"UTF-32BE"
"UTF-32LE"
"UTF-8"
"CP1252"
Livecode always works with PC encoding
When importing or opening files, LC uses the machine's native encoding, which varies by OS.
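A short Python sketch (not LiveCode) illustrates jacque's point. There is no single "PC encoding" for a character: its bytes depend on which encoding is chosen, and urlencode-style escaping simply exposes those bytes. So for text outside the native single-byte range, a textEncode call made beforehand effectively decides which bytes get escaped, as in MaxV's Russian example. The encodings below are just a sample from the list above.

```python
from urllib.parse import quote

char = "Д"  # U+0414 CYRILLIC CAPITAL LETTER DE

# The same character produces different byte sequences in each encoding;
# quote() accepts bytes and percent-escapes every non-safe byte.
escaped = {
    enc: quote(char.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
}

for enc, value in escaped.items():
    print(enc, value)
# utf-8 %D0%94
# utf-16-le %14%04
# utf-16-be %04%14
# utf-32-le %14%04%00%00
```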