UTF8 problem after writing a text file

Posted: Mon Jan 03, 2011 1:49 pm
by Zax
Hello,

I have a UTF-8 encoded text file, and BBEdit recognizes the encoding correctly.
I then read this file in Rev Studio 4.0 with:

Code:

open file myFile for binary read
read from file myFile until EOF
put uniDecode(uniEncode(it,"UTF8")) into data
-- here some treatments on data...
put data into URL ("binfile:" & myFile)
After that, when I open the file again in BBEdit, BBEdit no longer recognizes it as UTF-8.
Can anyone tell me what I am doing wrong?

Thanks.

Re: UTF8 problem after writing a text file

Posted: Mon Jan 03, 2011 2:33 pm
by Janschenkel
I'm afraid you're getting a little mixed up in the use of the uniDecode and uniEncode functions.

Code:

open file myFile for binary read
read from file myFile until EOF
-- first convert from UTF8 to UTF16
put uniEncode(it,"UTF8") into data
-- here some treatments on data...
-- finally convert from UTF16 to UTF8
put uniDecode(data,"UTF8") into data
-- and write that to file
put data into URL ("binfile:" & myFile)
Also make sure to set the useUnicode local property to true where needed while you process the data.

HTH,

Jan Schenkel.
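
The difference between the two scripts can be sketched as follows. This is only an illustration under stated assumptions: the handler name compareConversions and the variables tBytes, tUTF16, tNative and tUTF8 are hypothetical, and the comment about MacRoman assumes the script runs on Mac OS, where the native single-byte encoding is MacRoman.

```livecode
on compareConversions myFile
   -- hypothetical handler and variable names, for illustration only
   put URL ("binfile:" & myFile) into tBytes
   
   -- UTF-8 bytes -> UTF-16
   put uniEncode(tBytes, "UTF8") into tUTF16
   
   -- no language parameter: UTF-16 -> native single-byte text
   -- (assumed to be MacRoman on Mac OS)
   put uniDecode(tUTF16) into tNative
   
   -- with "UTF8": UTF-16 -> UTF-8 bytes
   put uniDecode(tUTF16, "UTF8") into tUTF8
end compareConversions
```

Writing tNative back with "binfile:" would produce a file that is no longer UTF-8, which is consistent with what BBEdit reported for the first script. Writing tUTF8 keeps the original encoding.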

Re: UTF8 problem after writing a text file

Posted: Mon Jan 03, 2011 3:14 pm
by Zax
Thank you Jan for your reply.

I'm not familiar with these encoding problems, but I found the uniDecode(uniEncode(it,"UTF8")) trick in the built-in Rev help, and I have to say that it works well (on Mac OS at least).
My problem occurs when writing the file.

I tried your script but was unable to make it work; maybe I missed something.

Re: UTF8 problem after writing a text file

Posted: Tue Jan 04, 2011 1:42 pm
by Zax
OK, I made a mistake. The following test script works for UTF-8 text files, with or without a BOM:

Code:

  open file myFile for binary read
  read from file myFile until EOF
  close file myFile
  
  put uniEncode("__et voilà__") into addedString -- for testing purposes
   
  set the useUnicode to true
  put uniEncode(it,"UTF8") into data
  put addedString after char 50 of data -- text modification
  put uniDecode(data,"UTF8") into data
  
  put data into URL ("binfile:" & myFile)
But now the problem is with files that are not UTF-8 encoded, Mac OS Roman for example: accented characters are lost and the output file ends up UTF-8 encoded :(
So, is there a way to know how a text file is encoded before modifying it?

Re: UTF8 problem after writing a text file

Posted: Thu Jan 20, 2011 5:29 pm
by Mark
Hi Zax,

The safest way to do this is to provide the user with an open file dialog, which includes a menu, e.g.

answer file "Choose a text file..." with type "MacRoman|txt|TEXT" or type "Windows Latin|txt|TEXT" or type "UTF8|txt|TEXT" or type "Rich Text Format|rtf|RTF "

After this command is executed, the result contains the name of the type the user chose, "MacRoman" or "Windows Latin" etc. This way, you can ask the user what kind of file you're dealing with.

Best regards,

Mark
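
A minimal sketch of how the chosen type could drive the conversion, so that a MacRoman file is written back as MacRoman and a UTF-8 file as UTF-8. It is reduced to two types for brevity. The variable names (tEncoding, tPath, tRaw, tText, tOut) are hypothetical, and the MacRoman branch assumes the script runs on Mac OS, where uniEncode and uniDecode without a language parameter convert to and from the native encoding.

```livecode
on mouseUp
   answer file "Choose a text file..." with type "MacRoman|txt|TEXT" or type "UTF8|txt|TEXT"
   put the result into tEncoding -- name of the chosen type, as described above
   put it into tPath
   if tPath is empty then exit mouseUp -- user cancelled the dialog
   
   put URL ("binfile:" & tPath) into tRaw
   
   -- decode the raw bytes to UTF-16 according to the user's choice
   if tEncoding is "UTF8" then
      put uniEncode(tRaw, "UTF8") into tText
   else
      -- assumption: native encoding is MacRoman (Mac OS)
      put uniEncode(tRaw) into tText
   end if
   
   -- ... modify tText here; it is UTF-16 at this point ...
   
   -- encode back to the encoding the file came in
   if tEncoding is "UTF8" then
      put uniDecode(tText, "UTF8") into tOut
   else
      put uniDecode(tText) into tOut
   end if
   put tOut into URL ("binfile:" & tPath)
end mouseUp
```

Windows Latin and RTF files would need their own branches. They are left out here rather than guessed at.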