CSV UTF-8

matgarage
Posts: 64
Joined: Sat Apr 20, 2013 11:39 am
Location: France

Post by matgarage » Wed Mar 20, 2024 2:08 pm

Hello,
I'm trying to generate a UTF-8 CSV file with LiveCode.
I use the CSV with an Adobe Illustrator plugin to generate variable data printing.
With Excel's "CSV UTF-8" save option, everything works correctly for French accented characters.
However, when I generate the CSV with LiveCode, my plugin does not recognize the accents.

The file generated with LiveCode is displayed correctly, with accents, in TextEdit on the Mac.
When I import it into the Illustrator plugin, it is always recognized with a "Western Europe (Windows)" encoding, and the accents are not handled correctly.
The same file imported into Excel and saved as a CSV UTF-8 file is recognized with a "UNICODE" encoding, and the accents are handled correctly.
How can I generate a CSV in LiveCode with "UNICODE" encoding?

Here is my code:

Code: Select all

on mouseUp pMouseButton
   put "NOM;ENTREPRISE;TEXTE" & CR into tVariable
   put "Matthieu;Garage;Hespéridée"  after tVariable
   ask file "Choisissez la destination de l'export" with "BD_test"
   if the result is not "cancel" then
      put it into tFileName 
      put textencode(tVariable,"UTF-8") into tVariable
      put tVariable into URL ("file:" & tFileName & ".csv")
   end if
end mouseUp
Attached are the two CSV files, one from LiveCode and one from Excel.
Attachments
CSVs.zip

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: CSV UTF-8

Post by Klaus » Wed Mar 20, 2024 3:21 pm

Hi Mat,

Evidently Excel adds a BOM (Byte Order Mark) at the beginning of your file, but LC does not.
I think the Illustrator plugin is the culprit, since a BOM does NOT guarantee that the file is UTF-8, but you could try this:
https://forums.livecode.com/viewtopic.php?f=7&t=22365

Best

Klaus

matgarage
Posts: 64
Joined: Sat Apr 20, 2013 11:39 am
Location: France

Re: CSV UTF-8

Post by matgarage » Wed Mar 20, 2024 4:32 pm

Hi Klaus

The BOM is the solution to my issue.
My IT colleague had to set the "BOM" option when he exports CSV files for me from the PHP engine.

I tried this in my code:

Code: Select all

put textEncode(tVariable,"UTF-8") into tVariable
put numToByte(238) & numToByte(187) & numToByte(191) before tVariable
put tVariable into URL ("binfile:" & tFileName & ".csv")
But it's not working: I can see "unknown" characters when I preview the file in TextEdit.
Illustrator sees that as a normal character and not as a BOM.

I've tried with both "binfile:" and "file:". The results are not the same, but in both cases the BOM is not "silent".

I don't understand what's wrong...

Thanks for your help

Mat
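
A possible cause, which the thread itself does not confirm: 238 is 0xEE in hexadecimal, whereas the first byte of the UTF-8 BOM is 0xEF, which is 239. The bytes 0xEE 0xBB 0xBF are valid UTF-8, but for a different, private-use character (U+EEFF), which would be consistent with the "unknown" character seen in TextEdit. Below is a minimal, untested sketch of the byte-level approach with the corrected value. The command name writeUtf8WithBom and its parameters are hypothetical, and it assumes LiveCode 7 or later.

```livecode
-- Hypothetical helper: encode pText as UTF-8 and write it with a leading BOM
command writeUtf8WithBom pText, pFilePath
   local tData
   -- Encode first, so that the BOM is added to binary data, not to text
   put textEncode(pText, "UTF-8") into tData
   -- UTF-8 BOM: 0xEF 0xBB 0xBF, i.e. 239, 187, 191 in decimal
   put numToByte(239) & numToByte(187) & numToByte(191) before tData
   -- "binfile:" writes the bytes as they are
   put tData into URL ("binfile:" & pFilePath)
end writeUtf8WithBom
```

With the variables from the snippet above, the call would be: writeUtf8WithBom tVariable, tFileName & ".csv"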

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: CSV UTF-8

Post by SparkOut » Wed Mar 20, 2024 10:37 pm

I don't know anything about how using a Mac would affect this, but in the past I have used text files with a BOM, generated from LC on Windows, to be read by a TV station's TriCaster in RTL languages.
I wonder whether the order of the steps of your file creation might be part of the issue? Just maybe

Code: Select all

put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable 
put textencode(tVariable,"UTF-8") into URL ("binfile:" & tFileName & ".csv")
I don't know if that will be any use though.

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: CSV UTF-8

Post by stam » Thu Mar 21, 2024 1:04 am

I'm not well versed with BOM so I looked it up: https://en.wikipedia.org/wiki/Byte_order_mark
For UTF-8 there is no distinction in 'endian-ness', so byte order doesn't matter. The BOM is the code point U+FEFF (0xFEFF in UTF-16); encoded as UTF-8 it becomes the three bytes 0xEF 0xBB 0xBF.

However, forgive me if I'm wrong, the code above looks like you're adding binary data to a string and then converting this again to binary data.
Would the correct process not be to add the BOM as a string and then convert the whole variable to binary? Or to add the binary to a binary?
e.g.

Code: Select all

put numToCodepoint(0xFEFF) before tVariable 
put textEncode (tVariable, "UTF-8") into URL ("binfile:" & tFilePath)
Or perhaps that first line should be

Code: Select all

put numToCodepoint(0xEF) & numToCodepoint(0xBB) & numToCodepoint(0xBF) before tVariable
I may be waaaay off, I'm really just commenting for my own learning...
Also, not sure if you can concatenate binary data but I'm guessing you can't concatenate binary with text (?)
More than likely the above is wrong - I have no way of testing as on Mac it seems to always respect the unicode text and can't see 'strange' characters.
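
The two variants above can be told apart by dumping the encoded bytes. The single code point U+FEFF encodes to the three bytes EF BB BF. The three separate code points U+00EF, U+00BB and U+00BF are each encoded as a two-byte sequence (C3 AF, C2 BB, C2 BF), which gives six bytes that a reader would display as visible characters rather than treat as a BOM. A minimal, untested sketch, assuming LiveCode 7 or later; hexOf and compareBomVariants are hypothetical names, not built-ins.

```livecode
-- Hypothetical helper: return the hex digits of binary data
function hexOf pData
   local tHex
   -- "H*" decodes all of the data as hex digits, high nibble first
   get binaryDecode("H*", pData, tHex)
   return tHex
end hexOf

-- Show the bytes produced by each variant in the message box
on compareBomVariants
   local tOne, tThree
   put textEncode(numToCodepoint(0xFEFF), "UTF-8") into tOne
   put textEncode(numToCodepoint(0xEF) & numToCodepoint(0xBB) & numToCodepoint(0xBF), "UTF-8") into tThree
   put hexOf(tOne) & return & hexOf(tThree)
end compareBomVariants
```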

matgarage
Posts: 64
Joined: Sat Apr 20, 2013 11:39 am
Location: France

Re: CSV UTF-8

Post by matgarage » Thu Mar 21, 2024 8:31 am

SparkOut wrote:
Wed Mar 20, 2024 10:37 pm
I don't know anything about how using a Mac would affect this, but have in the past used text files with BOM generated from LC on Windows to be read by TV station tricaster in RTL languages.
I wonder whether the order of the steps of your file creation might be part of the issue? Just maybe

Code: Select all

put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable 
put textencode(tVariable,"UTF-8") into URL ("binfile:" & tFileName & ".csv")
I don't know if that will be any use though.
The "strange" characters are still there with this option.

matgarage
Posts: 64
Joined: Sat Apr 20, 2013 11:39 am
Location: France

Re: CSV UTF-8

Post by matgarage » Thu Mar 21, 2024 8:36 am

stam wrote:
Thu Mar 21, 2024 1:04 am

Code: Select all

put numToCodepoint(0xFEFF) before tVariable 
put textEncode (tVariable, "UTF-8") into URL ("binfile:" & tFilePath)
It works!!! Nice.
Thanks to all for your help
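
Put together with the handler from the first post, the working version might look like the sketch below. It reuses the variable names from that post, adds the BOM as a code point before encoding, and writes through "binfile:". It assumes LiveCode 7 or later, and the local declaration is an addition for illustration.

```livecode
on mouseUp pMouseButton
   local tVariable, tFileName
   put "NOM;ENTREPRISE;TEXTE" & CR into tVariable
   put "Matthieu;Garage;Hespéridée" after tVariable
   ask file "Choisissez la destination de l'export" with "BD_test"
   if the result is not "cancel" then
      put it into tFileName
      -- Add the BOM as text (U+FEFF) so textEncode turns it into EF BB BF
      put numToCodepoint(0xFEFF) before tVariable
      -- Write the encoded bytes unchanged
      put textEncode(tVariable, "UTF-8") into URL ("binfile:" & tFileName & ".csv")
   end if
end mouseUp
```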

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: CSV UTF-8

Post by stam » Thu Mar 21, 2024 10:03 am

Glad that helped! I learned something new too ;)
Maybe edit the title of the original post and add "[SOLVED]" after the title or some such for others that may have a similar issue?
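
For others with a similar issue, one way to check whether a generated file really starts with the UTF-8 BOM is to read it back as binary and look at the first three bytes. A minimal, untested sketch, assuming LiveCode 7 or later; hasUtf8Bom is a hypothetical function name.

```livecode
-- Hypothetical helper: true if the file starts with EF BB BF
function hasUtf8Bom pFilePath
   local tData
   put URL ("binfile:" & pFilePath) into tData
   -- A file shorter than three bytes cannot contain the BOM
   if the number of bytes in tData < 3 then return false
   return byteToNum(byte 1 of tData) is 239 and \
         byteToNum(byte 2 of tData) is 187 and \
         byteToNum(byte 3 of tData) is 191
end hasUtf8Bom
```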
