The characters from an html file lose their accents

Deploying to Mac OS? Ask Mac OS specific questions here.

Mag
VIP Livecode Opensource Backer
Posts: 802
Joined: Fri Nov 16, 2012 10:51 pm

The characters from an html file lose their accents

Post by Mag »

Hi, all,

I'm getting text from an html file using this code:

set the htmlText of the templateField to thePageContent
put the text of the templateField into textVar

Everything works except accented characters lose their accents, for example instead of "è" it displays "√®"...

Does anyone have a better solution than this one? :D

replace "√†" with "à" in fixedText
replace "√®" with "è" in fixedText
replace "√π" with "ù" in fixedText
... and so on...
Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Re: The characters from an html file lose their accents

Post by Simon »

Hi Mag,
Is the "è" encoded as &egrave; or as a literal è in the HTML file?

Simon
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!
Mag
VIP Livecode Opensource Backer
Posts: 802
Joined: Fri Nov 16, 2012 10:51 pm

Re: The characters from an html file lose their accents

Post by Mag »

Hi Simon. One of the HTML pages I use for testing reads:

<li><a href="/jobs/it/">Opportunità di lavoro</a></li>
It is this one: http://www.apple.it
Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Re: The characters from an html file lose their accents

Post by Simon »

Hi Mag,
Sorry, I can't seem to get it to work either. :( I tried different encodings but I keep getting:
à not à
from the line: <li><a href="/jobs/it/">Opportunità di lavoro</a></li> at http://www.apple.com/it/

Simon
Mag
VIP Livecode Opensource Backer
Posts: 802
Joined: Fri Nov 16, 2012 10:51 pm

Re: The characters from an html file lose their accents

Post by Mag »

Thank you Simon. In the page source there is this text, but I don't know what it means...

Code:

 type="text/javascript" charset="utf-8"
I also don't know whether this affects the accented characters in some way...
Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Re: The characters from an html file lose their accents

Post by Simon »

Yes, I tried UTF8 but it made even more of a mess.

Simon
snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: The characters from an html file lose their accents

Post by snm »

Try

Code:

set the unicodeText of fld "field" to uniEncode (theFieldContent, "UTF8")
You should get the proper text in the field.

Marek
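
A note on why the UTF-8 suggestion helps: the symptoms reported in this thread ("√®" for "è", "√†" for "à", "√π" for "ù") are what you would expect if the page's UTF-8 bytes were being read as MacRoman, the legacy Mac OS text encoding. That is an inference from the examples, not something the posters confirmed. The sketch below uses Python rather than LiveCode, purely to illustrate the byte-level effect; the function names `misread_as_macroman` and `repair` are made up for this example.

```python
# Illustration only: shows how UTF-8 bytes turn into "√®"-style text when
# they are interpreted as MacRoman, and how to reverse that misreading.
# Function names are hypothetical, not part of any LiveCode API.

def misread_as_macroman(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were MacRoman."""
    return text.encode("utf-8").decode("mac_roman")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes via MacRoman, decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


if __name__ == "__main__":
    for ch in "àèù":
        print(ch, "->", misread_as_macroman(ch))
    print(repair("Opportunit√† di lavoro"))
```

Decoding the whole text in one step, as the uniEncode line above does on the LiveCode side, avoids having to maintain a replace line for every accented character.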
Mag
VIP Livecode Opensource Backer
Posts: 802
Joined: Fri Nov 16, 2012 10:51 pm

Re: The characters from an html file lose their accents

Post by Mag »

It works fine! :D

Thank you so much Marek and Simon
snm
VIP Livecode Opensource Backer
Posts: 253
Joined: Fri Dec 09, 2011 11:17 am

Re: The characters from an html file lose their accents

Post by snm »

You are always welcome, just ask. Next time you can help somebody.

Marek
jaguayo
Posts: 10
Joined: Sun Jun 14, 2009 8:12 pm

Re: The characters from an html file lose their accents

Post by jaguayo »

Hello Mag:

In your post you replace "√†" with "à", "√®" with "è"....
Where can I find the codes for "á", "é", "í", "ó" and "ú"?

Thanks.

Joseba
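
One way to find those codes without a lookup table is to generate them. This is a small Python sketch (not LiveCode), assuming the garbled pairs come from UTF-8 bytes being displayed as MacRoman, which matches the "√†" and "√®" examples earlier in the thread. The helper name `mojibake_table` is hypothetical.

```python
# Illustration only: builds the garbled form of each accented character,
# assuming UTF-8 bytes were misread as MacRoman. Helper name is made up.

def mojibake_table(chars: str) -> dict:
    """Map each character to the text it becomes when its UTF-8 bytes
    are decoded as MacRoman."""
    return {c: c.encode("utf-8").decode("mac_roman") for c in chars}


if __name__ == "__main__":
    for original, garbled in mojibake_table("áéíóúàèù").items():
        print(repr(garbled), "->", repr(original))
```

If the assumption holds, every accented Latin letter shows up as "√" followed by one other symbol, so converting the whole text from UTF-8 in one go is usually less work than adding more replace lines.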
richmond62
Livecode Opensource Backer
Posts: 10416
Joined: Fri Feb 19, 2010 10:17 am

Re: The characters from an html file lose their accents

Post by richmond62 »

Quick quote from the LiveCode documentation:

"Special characters (whose ASCII value is greater than 127) are encoded as HTML entities. LiveCode recognizes the following named entities:

Á &Aacute;
á &aacute;
Â &Acirc;
â &acirc;
´ &acute;
Æ &AElig;
æ &aelig;
À &Agrave;
à &agrave;
Å &Aring;
å &aring;
Ã &Atilde;
ã &atilde;
Ä &Auml;
ä &auml;
¦ &brvbar;
Ç &Ccedil;
ç &ccedil;
¸ &cedil;
¢ &cent;
© &copy;
¤ &curren;
° &deg;
÷ &divide;
É &Eacute;
é &eacute;
Ê &Ecirc;
ê &ecirc;
È &Egrave;
è &egrave;
Ð &ETH;
ð &eth;
Ë &Euml;
ë &euml;
½ &frac12;
¼ &frac14;
¾ &frac34;
> &gt;
Í &Iacute;
í &iacute;
Î &Icirc;
î &icirc;
¡ &iexcl;
Ì &Igrave;
ì &igrave;
¿ &iquest;
Ï &Iuml;
ï &iuml;
« &laquo;
< &lt;
¯ &macr;
µ &micro;
· &middot;
(non-breaking space) &nbsp;
¬ &not;
Ñ &Ntilde;
ñ &ntilde;
Ó &Oacute;
ó &oacute;
Ô &Ocirc;
ô &ocirc;
Ò &Ograve;
ò &ograve;
ª &ordf;
º &ordm;
Ø &Oslash;
ø &oslash;
Õ &Otilde;
õ &otilde;
Ö &Ouml;
ö &ouml;
¶ &para;
± &plusmn;
£ &pound;
» &raquo;
® &reg;
§ &sect;
(soft hyphen) &shy;
¹ &sup1;
² &sup2;
³ &sup3;
ß &szlig;
Þ &THORN;
þ &thorn;
× &times;
Ú &Uacute;
ú &uacute;
Û &Ucirc;
û &ucirc;
Ù &Ugrave;
ù &ugrave;
¨ &uml;
Ü &Uuml;
ü &uuml;
Ý &Yacute;
ý &yacute;
¥ &yen;
ÿ &yuml;

Unicode characters whose numeric value is greater than 255 are encoded as "bignum" entities, with a leading ampersand and trailing semicolon. For example, the Japanese character whose numeric value is 12387 is encoded as "&#12387;"."

This is why this stack doesn't do a very good job [attached].
HTMLer.rev.zip
HTML import
(7.44 KiB) Downloaded 515 times
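
For anyone who wants to see how the named and numeric entities quoted above resolve to characters, here is a short Python sketch (not LiveCode) using the standard library's `html.unescape`. It illustrates entity decoding in general and says nothing about how the attached stack or LiveCode's htmlText works internally. The wrapper name `decode_entities` is made up for this example.

```python
# Illustration only: resolves named entities (&agrave;) and numeric
# "bignum"-style entities (&#12387;) with Python's standard library.
import html


def decode_entities(fragment: str) -> str:
    """Replace HTML character references in fragment with the characters
    they stand for."""
    return html.unescape(fragment)


if __name__ == "__main__":
    print(decode_entities("Opportunit&agrave; di lavoro"))
    print(decode_entities("&#12387;"))
```

Note that entity decoding and byte-encoding problems are separate issues: a page that contains a literal "à" stored as UTF-8 has no entities to decode, so it still needs to be read with the right encoding.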
jaguayo
Posts: 10
Joined: Sun Jun 14, 2009 8:12 pm

Re: The characters from an html file lose their accents

Post by jaguayo »

Thanks Richmond62 !!
Your stack solved the problem.
Thank you very much.

Regards.

Joseba