MySQL and chars : best strategy use unicode->UTF8->ASCII?

Creating desktop or client-server database solutions?

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller

Post Reply
bangkok
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 937
Joined: Fri Aug 15, 2008 7:15 am

MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by bangkok » Sat Nov 13, 2010 12:20 pm

I have a MySQL DB. Would like to use it to store some french, chinese or portuguese texts, without making complex queries and adding a lot of tests in my LiveCode scripts.
Detail : those texts won't be searched, just stored and displayed.

I spent quite some time trying to figure out the wonderfull world of Unicode, UTF16 and UTF8 on the LiveCode side, and characters set on the MySQL side... Tricky.
:)

If I want to minimize the risks and the complexity... could you tell me what do you think about the following strategy :

To write to MySQL :

-put unidecode(the unicodeText of field "source",utf8) into tData
-put urlencode(tData) into tToBeWritten

We convert the chinese text for instance in UTF8 (binary). And then we convert the UTF8 in plain ASCII characters (with hexadecimal values for each characters)

To read from MySQL, reverse the process :
-put urldecode(uniencode(readFromDB,"utf8")into toBeDisplayed
-set the unicodetext of field "source" to toBeDisplayed

So :
-french word "BONJOUR" will be stored as... "BONJOUR"
-portuguese word "coração" will be stored as "cora%C3%A7%C3%A3o"
-chinese word "隶书 / 隸書" will be stored as "%E9%9A%B6%E4%B9%A6+%2F+%E9%9A%B8%E6%9B%B8"

Advantages :
-brute force, but easy, and should minimize the risks, and eventually my work...

-Inconvenient : requires longer column in mySQL (to store chinese)

What do you think ? Or should I persevere with UTF 8 format directly in MySQL ?

Mark
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm
Contact:

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by Mark » Sun Nov 14, 2010 12:56 pm

Bangkok,

The easiest way is the correct way. Set up the database correctly, e.g. with UTF8 encoding, and write all data to the database using that encoding. Since RunRev has some difficulties reading and writing binary to and from a database, you might want to set up a web server (could even be local) and use PHP as an intermediary.

Best regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

bangkok
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 937
Joined: Fri Aug 15, 2008 7:15 am

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by bangkok » Sun Nov 14, 2010 1:29 pm

Mark wrote: The easiest way is the correct way. Set up the database correctly, e.g. with UTF8 encoding, and write all data to the database using that encoding. Since RunRev has some difficulties reading and writing binary to and from a database, you might want to set up a web server (could even be local) and use PHP as an intermediary.
k
Thanks for the advice. Could you elaborate about the "difficulties" of reading/writing binary for LiveCode ? What's wrong ?

Mark
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm
Contact:

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by Mark » Sun Nov 14, 2010 2:24 pm

Bangkok,

Revolution can only exchange ASCII test with externals, database drivers and the shell. That's just a fact. However, it can exchange binary data through sockets, with PHP for example.

Kind regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

kevinmiller
Livecode Staff Member
Livecode Staff Member
Posts: 120
Joined: Thu Feb 23, 2006 7:57 pm

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by kevinmiller » Tue Nov 16, 2010 2:11 pm

Mark wrote:Bangkok,

Revolution can only exchange ASCII test with externals, database drivers and the shell. That's just a fact. However, it can exchange binary data through sockets, with PHP for example.

Kind regards,

Mark
That's not correct. The externals API allows you to pass data to and from externals - but it cannot currently have this passed directly as parameters nor as results - you need to indirect through a variable.

This is why database calls which fetch/store data need you to pass a name of a variable. e.g.
local tMyLocalVar
get revDatabaseColumnNamed(tQueryId, "mycolumn", "tMyLocalVar")

Also when creating queries and doing variable binds, it is the *name* of variables that should be passed, and not the variables themselves. Prefixing such variable names by '*b' indicates to revDB that their contents should be treated as binary, rather than text.

However if you are using UTF-8 as the encoding to pass data to the database, then the binary issue doesn't come up - as there are no NUL characters as part of the actual strings.
Kevin Miller ~ kevin@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

kevinmiller
Livecode Staff Member
Livecode Staff Member
Posts: 120
Joined: Thu Feb 23, 2006 7:57 pm

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by kevinmiller » Tue Nov 16, 2010 2:24 pm

Addendum:

Note that some care is needed here - of the top of my head I'm not 100% sure what happens when you interact with a UTF-8 database.

However, the thing to remember is that UTF-8 encoded text 'looks' just like Latin-1, it just won't display correctly unless it is appropriately converted. So if the database is only being used by LiveCode, the simplest thing to do is just insert/fetch UTF-8 strings and not worry about the database's underlying encoding.

If, however, LiveCode is interoperating with other languages/systems, then indirecting through a central CGI or such is generally good practice as that means you won't get into any 'incompatibility' difficulties.
Kevin Miller ~ kevin@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

Mark
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm
Contact:

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by Mark » Tue Nov 16, 2010 2:35 pm

Hi Kevin and Bangkok,

Yes, of course, if the external was made specifically to provide Rev variables with binary data, then it can be done, but I don't think that Bangkok meant to ask how to create an external that can set variables.

Clients of mine had lots of trouble using UTF8 and databases with Rev. That's one of two reasons why I always use PHP as an intermediate, if possible. The other reason is that web hosts usually don't allow direct access from an application to a database, unless you have a dedicated server. Also, PHP makes MySQL very easy, while the Rev commands are unnecessarily complicated.

Another problem is of course the impossibility to set the textFont of a field to e.g. "Polish,UTF8". Instead, you need to convert every string to UTF8 before writing it to the database.

Do NOT think that you can just write UTF8 to a database and not worry about the underlying encoding! Such a mistake might cost you a lot of money! The field definitions of a database must be correct!

Best regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

trevordevore
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 1005
Joined: Sat Apr 08, 2006 3:06 pm
Location: Overland Park, Kansas
Contact:

Re: MySQL and chars : best strategy use unicode->UTF8->ASCII?

Post by trevordevore » Tue Nov 16, 2010 3:10 pm

Bangkok - I would use UTF-8 as LiveCode, UTF-8 and databases can play quite nicely together. I currently use UTF-8 with SQLite, MySQL and PostgreSQL and maintaining a proper text encoding means you can search the data.

Depending on the database vendor you may or may not need to set the encoding. For MySQL you should set the default charset of the table to "utf8" when you create the table. Looking at some of my tables I have encoding set to "utf8" and the collation set to utf8_general_ci.

After that it is just a matter of encoding all text you pass to the database as UTF-8 and decoding all text that you retrieve from UTF-8 to whatever is appropriate via the unidecode/uniencode functions.
Trevor DeVore
ScreenSteps - https://www.screensteps.com

LiveCode Repos - https://github.com/search?q=user%3Atrevordevore+topic:livecode
LiveCode Builder Repos - https://github.com/search?q=user%3Atrevordevore+topic:livecode-builder

Post Reply

Return to “Databases”