HTML source code Non-English character issue

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sat Nov 09, 2013 12:09 pm

Hi Marek,

It is neither top secret nor a personal file (or something directly related to me). I just respect privacy. Please don't get me wrong.

This is roughly equivalent code. It contains some Turkish characters.

Code:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1254">
    <meta http-equiv="Content-Language" content="tr">
    <title>Kayıt</title>
    <meta name="keywords" content="Kaşık, Çatal, İstanbul, Ördek, Öğretmen, Üzüm">
    </head>
    <body>
    Kaşık, Çatal, İstanbul, Ördek, Öğretmen, Üzüm
    </body>
    </html>

snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: HTML source code Non-English character issue

Post by snm » Sat Nov 09, 2013 12:54 pm

Hi ARAS,

I saved the HTML from your post as a UTF-8 encoded text file.
Then I made a new stack with a field "Field" and a button. The button's script is:

Code:

    local tTemp

    on mouseUp
       -- Build a file URL for Turkish.txt on the desktop
       put the desktop folder & "/Turkish.txt" into tTemp
       put "file:" & tTemp into tTemp
       -- Read the file contents (UTF-8 bytes)
       put url tTemp into tTemp
       -- Convert the UTF-8 data to Unicode text and show it in the field
       set the unicodeText of fld "Field" to uniEncode(tTemp, "UTF8")
    end mouseUp

After clicking the button, the text in field "Field" looked the same as in your post (see the attached screenshot).
Please check it; maybe I missed something.

Marek
Attachments
Screen Shot 2013-11-09 at 12.39.43 PM.png

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sat Nov 09, 2013 1:54 pm

Hi snm,

I am not sure, but it is probably because you saved the file as UTF-8.

It is normally a windows-1254 encoded HTML file.

Could you try with the file below? Click "to Download" when the link opens.
http://rapidshare.com/share/DD3D80CB7DF ... 6E4EB35E38

ARAS

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: HTML source code Non-English character issue

Post by Klaus » Sat Nov 09, 2013 2:17 pm

Hi ARAS,

I downloaded the file as is and got the same correct result with Marek's script!

Looks like the file IS already UTF-8 encoded, although that is not indicated anywhere!?
Even my browser (Safari) shows garbage when opening that file 8)


Best

Klaus

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sat Nov 09, 2013 4:08 pm

Hi,

I use Coda 2. I am not sure, but it might have saved it as UTF-8. I am sorry, guys.

Could you also try with this file?
http://rapidshare.com/share/B8BFE16FF0B ... 9ABE798FBA

ARAS

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: HTML source code Non-English character issue

Post by Klaus » Sat Nov 09, 2013 4:16 pm

OK, Safari and every other browser display this file correctly, but it fails in LiveCode :(
Even when I run "isotomac" on the file first.

snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: HTML source code Non-English character issue

Post by snm » Sat Nov 09, 2013 4:36 pm

Hi ARAS,

Your HTML file is also working.
I just downloaded it, changed the path to it in my script, and imported it into the field by clicking the button:
Screen Shot 2013-11-09 at 16.27.32 PM.png
Please try my stack after unzipping it (attached).

Marek
Attachments
windows1254_html.livecode.zip

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sat Nov 09, 2013 6:44 pm

Hi snm,

Are you sure you have tried with the second file I uploaded?

I downloaded your stack, but it didn't work.

Image

Klaus,
Thanks for trying. :(



ARAS
Attachments
Turkish.html.zip
windows-1254

snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: HTML source code Non-English character issue

Post by snm » Sat Nov 09, 2013 9:35 pm

Hi ARAS,

The file I previously downloaded from your link was encoded in UTF-8. The new one is encoded as Turkish Windows Latin 5. As far as I can see, LC has a problem converting this encoding to Unicode. If you use uniEncode(tTemp, "Turkish"), you get the problems you describe. The only workaround I found is to open your file in TextWrangler, change the encoding to UTF-8, and save it, ignoring the warnings. Then my script with uniEncode(tTemp, "UTF8") works correctly.

Sorry, I have no other idea. Maybe there is some fault in your file, or you should report it as a bug to RunRev, or ask on the forum whether uniEncode with "Turkish" should work with files encoded as Turkish Windows Latin 5. As far as I can see, there are a few more "Turkish" encodings: DOS, ISO Latin 5, Mac OS, and Windows Latin 5.

As I remember, years ago we had a few encodings for Polish: two on Mac computers and even more on Windows. Additionally, on Mac computers we were using Mac Roman encoding with fonts prepared to work with it. The problems ended when we started using UTF-8.

If it is up to you, ask for files with Turkish text to be delivered encoded in UTF-8, not Windows Latin 5.
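
If the files cannot be re-saved as UTF-8, one possible fallback is to map the bytes by hand. In the range 160 to 255, windows-1254 matches ISO Latin 1 (where byte values equal the Unicode code points) except for six Turkish letters, so a small lookup table covers them. The sketch below is only an illustration under assumptions: the function name win1254ToUTF16 is made up, it targets LiveCode 6.x (where unicodeText takes UTF-16 in host byte order), the file name Turkish.html on the desktop is taken from this thread, and bytes 128 to 159 (euro sign, curly quotes and similar) are not remapped.

```livecode
-- Hypothetical helper, not a built-in: converts windows-1254 bytes
-- to UTF-16 for use with the unicodeText property (LiveCode 6.x).
function win1254ToUTF16 pBytes
   local tMap, tCode, tUTF16
   -- The six positions where windows-1254 differs from ISO Latin 1
   put 286 into tMap[208]  -- 0xD0 -> U+011E (G with breve)
   put 304 into tMap[221]  -- 0xDD -> U+0130 (capital I with dot)
   put 350 into tMap[222]  -- 0xDE -> U+015E (S with cedilla)
   put 287 into tMap[240]  -- 0xF0 -> U+011F (g with breve)
   put 305 into tMap[253]  -- 0xFD -> U+0131 (dotless i)
   put 351 into tMap[254]  -- 0xFE -> U+015F (s with cedilla)
   repeat for each char tByte in pBytes
      put charToNum(tByte) into tCode
      if tMap[tCode] is not empty then put tMap[tCode] into tCode
      -- "S" writes an unsigned 16-bit value in host byte order
      put binaryEncode("S", tCode) after tUTF16
   end repeat
   return tUTF16
end win1254ToUTF16

on mouseUp
   local tPath
   put specialFolderPath("desktop") & "/Turkish.html" into tPath
   -- binfile: reads the raw bytes without line-ending translation
   set the unicodeText of fld "Field" to win1254ToUTF16(url ("binfile:" & tPath))
end mouseUp
```

This avoids depending on what uniEncode understands by "Turkish" on a given platform, at the cost of maintaining the table by hand.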

Marek

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sat Nov 09, 2013 11:05 pm

Hi snm,

I appreciate your help. I need to work with a website encoded the same way. I won't be able to change the format.

Thanks anyway,
I've learned a lot. :o

A few hours ago, I was trying to get a snapshot of the Turkish-encoded HTML file. While trying that, I was surprised to find that I could see all the Turkish characters and view the page as if it were in the browser. I wish I could get the source code without any corrupted Turkish characters.

Code:

    get revBrowserOpen(the windowId of this stack, "/Users/Aras/Desktop/Turkish.html")
    if it is an integer then
       put it into tBrowserId
       revBrowserSet tBrowserId, "rect", the rect of image "Image"
    else
       answer "There was an error: " && it  -- if the browser doesn't start, show the error
    end if

Note: I found this code on the forum and edited the file path. Thanks to the person who wrote these lines.

I've just sent a bug report. I hope they respond quickly. :(

Regards,
ARAS

snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: HTML source code Non-English character issue

Post by snm » Sat Nov 09, 2013 11:33 pm

revBrowser uses the web browser engine of your operating system (Safari or Internet Explorer), which is why you see all the Turkish letters correctly. Check the revBrowserGet function in the Dictionary. You can get the htmltext property from revBrowser; maybe this will work.

If you send me a link to such a text (with the same wrong behaviour as your company's HTML), I could try to help you.

Marek

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sun Nov 10, 2013 3:12 pm

Hi Marek,

I was trying to use the htmltext property.
When I use the code below, LC shuts itself down. What's wrong with it?

Code:

    on mouseUp
       get revBrowserOpen(the windowId of this stack, "http://www.google.com")
       if it is an integer then
          put it into tBrowserId
          revBrowserSet tBrowserId, "rect", the rect of image "Image"
          put revBrowserGet(tBrowserId, "htmltext") into tSource
          put tSource into Field "Source"
       else
          answer "There was an error: " && it  -- if the browser doesn't start, show the error
       end if
       --answer tBrowserId
    end mouseUp

ARAS

snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: HTML source code Non-English character issue

Post by snm » Sun Nov 10, 2013 5:04 pm

Probably you are trying to get the HTML from revBrowser before it finishes loading data. It should not shut down the application; that is probably an LC bug.
If you put a breakpoint before revBrowserGet … and then step through manually, your code works.

To solve your problem, put this script in your card:

Code:

    local tBrowserId, tSource

    on startTest
       -- Open the browser without a URL, then position it and load the page
       get revBrowserOpen(the windowId of this stack)
       if it is an integer then
          put it into tBrowserId
          revBrowserSet tBrowserId, "rect", the rect of image "Image"
          revBrowserSet tBrowserId, "url", "http://www.google.com"
       else
          answer "There was an error: " && it  -- if the browser doesn't start, show the error
       end if
       --answer tBrowserId
    end startTest

    -- Sent once the page has finished loading
    on browserDocumentComplete
       put revBrowserGet(tBrowserId, "htmltext") into tSource
       put tSource into Field "Source"
    end browserDocumentComplete

and this in your button:

Code:

    on mouseUp
       startTest
    end mouseUp

This will work. Check browserDocumentComplete in the Dictionary. Don't forget to close the revBrowser instance before the next run of this script.
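
Closing the browser could look something like the sketch below. It assumes the handler is placed in the same card script as startTest, below the `local tBrowserId, tSource` line, so that it can see the script-local tBrowserId. Using closeCard for the cleanup is just one possible choice, not the only place to do it.

```livecode
on closeCard
   -- Close the browser instance opened by startTest, if there is one
   if tBrowserId is an integer then
      revBrowserClose tBrowserId
      put empty into tBrowserId
   end if
   pass closeCard
end closeCard
```

Clearing tBrowserId afterwards keeps a later call from trying to close an instance that no longer exists.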

Marek

ARAS
Posts: 55
Joined: Sat Nov 02, 2013 5:35 pm

Re: HTML source code Non-English character issue

Post by ARAS » Sun Nov 10, 2013 5:44 pm

Wow,

Thanks Marek, so it was about giving the page enough time to load.
When I first read "Probably you are trying to get the HTML from revBrowser before it finishes loading data", some kind of delay solution suddenly came to my mind. However, "browserDocumentComplete" is way more awesome :)

Thank you.

Unfortunately, htmltext didn't work for windows-1254. :(

I'd better try another solution. I am thinking of doing something like this:
if the text input contains "ç", replace it with "√á", and then search. Thus, even though the input is "çatal", it will search for "√áatal".
I hope I can do that.
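
The replacement idea above could be sketched roughly like this. The function name mapChars, the tab-separated pair list, and the field name "Search" are all made up for illustration. Whether non-ASCII literals such as "ç" survive inside a script depends on the LiveCode version and platform, so treat this as a starting point only.

```livecode
-- Hypothetical helper: applies "from<TAB>to" replacement pairs, one per line
function mapChars pText, pPairs
   set the itemDelimiter to tab
   -- Keep lower-case and upper-case letters distinct while replacing
   set the caseSensitive to true
   repeat for each line tPair in pPairs
      if tPair is empty then next repeat
      replace item 1 of tPair with item 2 of tPair in pText
   end repeat
   return pText
end mapChars

on mouseUp
   local tSearchTerm
   -- Convert the typed text before searching the garbled source
   put mapChars(the text of fld "Search", "ç" & tab & "√á") into tSearchTerm
   answer tSearchTerm
end mouseUp
```

Pairs are applied in order, so a replacement string that itself contains a later "from" character would be replaced again. The order of the list matters once more than one pair is used.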

Anyway, that is outside this topic.

Thanks again Marek.

ARAS
