LC Server and UTF8

RobertC
VIP Livecode Opensource Backer
Posts: 46
Joined: Sun Feb 04, 2007 3:43 pm

LC Server and UTF8

Post by RobertC » Thu May 30, 2013 8:52 pm

Hi,
I just identified a problem with my new site: all pages are in UTF-8, but this does not work if the file passes through the LC Server.
I.e.:
I write a page containing accented characters such as é and ü, in UTF-8.
If the extension is .html it displays perfectly.
If I change the extension to .lc without any other change, these characters are displayed badly.
The only difference is that in the .html case the page does not go through the lc server's interpretation, and in the other case it does. There is not even any LiveCode statement in it, i.e. no <?lc ... ?> at all.

Any hint as to how to solve this?
(I obviously mean in some way other than replacing é with &eacute; and ü with &uuml;, which is not an acceptable solution.)

Thanks.
The Old Rant Robert.

wsamples
VIP Livecode Opensource Backer
Posts: 262
Joined: Mon May 18, 2009 4:12 am

Re: LC Server and UTF8

Post by wsamples » Fri May 31, 2013 3:27 am

I looked into this in response to your post. Initially, adding some of these characters to a page resulted in incorrect rendering, but opening the page in my editor, changing the document encoding to UTF-16, saving, then changing back to UTF-8 and saving again resulted in correct rendering. The encoding was "apparently" correct to begin with, but doing this alters something, and the page now renders correctly in the browser. I can add other characters without issue after having done this. This is the kind of stuff that leads to bald spots from scratching one's head. (I haven't looked into any variations; this is enough strangeness for me!)

RobertC
VIP Livecode Opensource Backer
Posts: 46
Joined: Sun Feb 04, 2007 3:43 pm

Re: LC Server and UTF8

Post by RobertC » Wed Jul 17, 2013 5:48 pm

Thanks wsamples, I was "off" for a while so saw your reply only now.

I must say I do not quite understand the sequence you went through. I use BBEdit for my pages, and they all start like this:

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />

In addition, the BBEdit preference for the default encoding of new documents is set to UTF-8.
I have had no problem with other pages, on other servers or with other editors (e.g. GoLive). It only happens if I use the '.lc' extension, even without the <?lc ... ?> brackets anywhere in the page.
It's a pity, because my new site claims to run without php, but now that seems impossible.
You can look at
http://178.250.211.179/Strange.lc
and
http://178.250.211.179/Strange.html
(exactly the same code in the files)

At 66 I still have enough hair, but as bald spots will eventually form, I'm not going to do much scratching.
I'm also a VIP Kickstarter person, but did not use my badge. Maybe I should.

Robert.
The Old Rant Robert.

wsamples
VIP Livecode Opensource Backer
Posts: 262
Joined: Mon May 18, 2009 4:12 am

Re: LC Server and UTF8

Post by wsamples » Wed Jul 17, 2013 6:50 pm

Hi Robert. I'm not talking about the html encoding, but the document encoding used by the text editor itself. Paying closer attention to what my editor is doing suggests that what makes this work is setting (adding) the BOM. My editor does this automatically when the encoding is changed and toggling this seems to be the difference. Give it a try!

wsamples
VIP Livecode Opensource Backer
Posts: 262
Joined: Mon May 18, 2009 4:12 am

Re: LC Server and UTF8

Post by wsamples » Thu Jul 18, 2013 4:37 pm


RobertC
VIP Livecode Opensource Backer
Posts: 46
Joined: Sun Feb 04, 2007 3:43 pm

Re: LC Server and UTF8

Post by RobertC » Mon Jul 22, 2013 10:58 am

Thanks wsamples, that seems indeed to solve the problem for now.
Sigh.
Yes, BBEdit does have a preference setting to choose between "UTF-8" (which was what I used) and "UTF-8 with BOM".
I have now set it to output the byte order mark.
So:
(1) I suppose the LiveCode server does not try to guess the byte order, or maybe it does, but then outputs something in another byte order.
Therefore I still consider this a "bug" at some level.
(2) Loads of time wasted again by at least two people, just because there is no standard on byte order. There should of course not even be a byte order problem. The little-endian/big-endian problem dates back to the early 1970s if I am not mistaken, and had to do with storing multi-byte numbers (integers mainly) in byte-organised memory but used as "word"-organised numbers. For a stream of characters there should not be such a "problem".
Sigh (again).
And they talk of computer "science". It's just bad engineering. Reminds me of the horrible end of the railway gauge issue. (by the way, do you know where the 1/2 inch comes from in the current silly standard of 4 feet 8 inches and 1/2 ?)
Have fun, and many thanks again.

(BTW: this is what my editor's manual says:

BOM: When saving Unicode files, you may include a byte-order mark (BOM) so that the reading application knows what byte order the file’s data is in. However, since many applications do not correctly handle files which contain BOMs, you may wish to use an encoding variant without a BOM for maximum compatibility. (For purposes of recognition when you use this option, the UTF-16 BOM is FEFF, and the UTF-8 BOM is EFBBBF.)

and of course the reason I never used a BOM is that indeed many applications do not handle them correctly.)
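
For anyone wanting to check their own files, here is a minimal Python sketch (not from the original posts) that reports which of the byte-order marks quoted above a file starts with. The function name `bom_kind` is made up for illustration; the byte values come from Python's standard `codecs` module, and UTF-32 is deliberately not handled.

```python
import codecs


def bom_kind(data: bytes):
    """Return the name of the BOM that `data` starts with, or None.

    Hypothetical helper for illustration; UTF-32 marks are not handled.
    """
    # UTF-8 signature: EF BB BF
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # The UTF-16 BOM is U+FEFF; on disk it appears as FE FF (big-endian)
    # or FF FE (little-endian).
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None
```

Reading the first few bytes of a page in binary mode and passing them to `bom_kind` should be enough to tell whether the editor saved it as "UTF-8" or "UTF-8 with BOM".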

(BTW2: I now also see that for HTML5 the BOM is required... See http://www.w3.org/International/questio ... order-mark
and that article is again quite confusing because of too much terminology that is not or ill defined. Sigh (3rd time).)
The Old Rant Robert.

wsamples
VIP Livecode Opensource Backer
Posts: 262
Joined: Mon May 18, 2009 4:12 am

Re: LC Server and UTF8

Post by wsamples » Sat Oct 25, 2014 6:17 pm

It seems that setting the BOM might be a bad idea. There is something else that works for me in my environment that you could try.

Try inserting this at the very top of your document:

<?lc put header "Content-Type: text/html; charset=utf-8" ?>
I found that this was necessary despite the fact that my server's default is to set the content-type header to utf-8. The header being sent for lc pages was in fact iso-8859-1.
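
To illustrate why a wrong charset in the header garbles the accents, here is a small Python sketch (an illustration only, not what the LC server does internally). It assumes the page bytes are valid UTF-8 and simply decodes them the way a browser would under each declared charset.

```python
# Bytes the server sends for a UTF-8 page containing "é" and "ü".
page_bytes = "é and ü".encode("utf-8")

# What the text looks like if the header declares iso-8859-1:
# each two-byte UTF-8 sequence is read as two separate characters.
wrong = page_bytes.decode("iso-8859-1")

# What it looks like if the header declares utf-8.
right = page_bytes.decode("utf-8")

print(wrong)  # Ã© and Ã¼
print(right)  # é and ü
```

The bytes are identical in both cases; only the declared charset changes how they are interpreted, which fits the observation that the same file displays correctly as .html and badly as .lc.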

Make sure you have unset the BOM! This is critical because BOM information is sent first and interferes with the response headers! The BOM is not a requirement for html5. What w3c says is required is that browsers recognize and use it if it's present. Actually, if you leave the BOM set, the page works, but the header is not sent. This may be true of any other custom header you want to send, possibly including cookies, which could be a bad thing. (It may also be that some part of this behavior is peculiar to my environment.)

I would also point out that although w3c "requires" modern browsers to use the BOM when present to detect page encoding, ignoring any other page or header declarations, the w3c's validator is completely confused by the BOM, in my experience.

At any rate, this is something you could investigate in your server environment as a possible solution.

RobertC
VIP Livecode Opensource Backer
Posts: 46
Joined: Sun Feb 04, 2007 3:43 pm

Re: LC Server and UTF8

Post by RobertC » Sat Nov 08, 2014 9:32 pm

Thanks.
I have removed BOMs everywhere. They also interfere if you use them in .htaccess files and other places.
OK for now.
R.
The Old Rant Robert.
