Escapes in string literals


LCMark
Livecode Staff Member
Posts: 1206
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string literals

Post by LCMark » Mon Jun 03, 2013 10:36 am

monte wrote: Hmm... but \u<code-point> is translated to different bytes depending on the type of Unicode we are talking about. Or would a string literal only support UTF-8?
That's not quite the right way to look at things...

The \uXXXX form only has one meaning - that of the Unicode 'character' to which the code point maps. A string is a sequence of such characters. Text encodings (such as UTF-8) only come into play when you need to represent such a thing as a sequence of bytes - e.g. when reading/writing to files.

Dealing with Unicode in LiveCode will be no different from dealing with native text at the moment... A char will be a Roman a, a Greek alpha, an Egyptian hieroglyph etc... How it might be stored internally will be completely invisible to script, and when script requires a string in a particular encoding, any translation will happen at that point.
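To make the distinction between characters and encodings concrete, here is a minimal sketch in Python (not LiveCode, whose Unicode syntax was still undecided at this point in the thread). It only illustrates the general idea: the escape names a code point, and bytes appear only once an encoding is chosen. The variable names are illustrative.

```python
# One character, identified by its code point, independent of any encoding.
alpha = "\u03b1"            # GREEK SMALL LETTER ALPHA
hieroglyph = "\U00013000"   # EGYPTIAN HIEROGLYPH A001 (outside the 16-bit range)

# Script-level view: each is a single character.
alpha_length = len(alpha)
hieroglyph_length = len(hieroglyph)

# Bytes only appear when an encoding is requested, e.g. for file I/O.
utf8_bytes = alpha.encode("utf-8")
utf16_bytes = alpha.encode("utf-16-be")
utf32_bytes = alpha.encode("utf-32-be")

# Decoding any of these byte sequences yields the same one-character string.
round_trips = [
    utf8_bytes.decode("utf-8"),
    utf16_bytes.decode("utf-16-be"),
    utf32_bytes.decode("utf-32-be"),
]
```

The three byte sequences differ in length and content, yet all three decode back to the same single character, which is the only thing the literal itself describes.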

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Re: Escapes in string literals

Post by monte » Mon Jun 03, 2013 11:15 am

OK, I don't want to hijack the thread... clearly you are doing something more complicated than a buffer and a length to represent the string ;-)
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

LCMark
Livecode Staff Member
Posts: 1206
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string literals

Post by LCMark » Mon Jun 03, 2013 11:29 am

monte wrote: OK, I don't want to hijack the thread... clearly you are doing something more complicated than a buffer and a length to represent the string
Indeed - I'll try and post a separate topic about how Unicode is going to pan out this week :)

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string literals

Post by DarScott » Mon Jun 03, 2013 3:59 pm

Perhaps the forms of \u depend on where things are going with LiveCode and Unicode. For me, it would be convenient if that included UTF-8.

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string literals

Post by DarScott » Mon Jun 03, 2013 4:11 pm

Whoops, I missed some Unicode comments before I made mine.

I eagerly await seeing Unicode directions. And byte sequences.

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US

Re: Escapes in string literals

Post by mwieder » Mon Jun 03, 2013 6:19 pm

Focusing back on the main discussion about bringing C-style formatted strings to the language then, as I said before, this is perfectly feasible at the point the new parser kicks in and people need to translate their scripts to take advantage of it. The translator will be able to handle rewriting the string literals appropriately. An important thing to remember here is that we are just talking about the interpretation of the tokens in the script - the act of consuming the string literal at the lexical analysis phase will result in a value that has the appropriate content (i.e. escapes resolved) and as such causes no performance impact.
Maybe I missed something in these (to date) seven pages of posts... you're suggesting that it will be necessary to run existing scripts through a translator to bring them up to conformance with the new parser? I like this idea, even with the pain that forced translation would bring. That being the case...
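The quoted point about escapes being resolved while the literal is consumed can be sketched roughly as follows. This is Python rather than engine code, and the escape set (\n, \t, \", \\, \uXXXX) and the function name lex_string_literal are assumptions for illustration only; the thread had not settled which escapes the new syntax would accept.

```python
# Hypothetical escape table; the real set was still under discussion.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
HEX_DIGITS = "0123456789abcdefABCDEF"


def lex_string_literal(source, start):
    """Consume a double-quoted literal that begins at source[start].

    Returns (value, next_index). The value already has its escapes
    resolved, so nothing remains to be interpreted at run time.
    """
    if source[start] != '"':
        raise ValueError("expected opening quote")
    chars = []
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == '"':
            return "".join(chars), i + 1
        if ch == "\\":
            i += 1
            if i >= len(source):
                break
            esc = source[i]
            if esc == "u":
                digits = source[i + 1:i + 5]
                if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                    raise ValueError("\\u needs four hex digits")
                chars.append(chr(int(digits, 16)))
                i += 5
                continue
            if esc not in ESCAPES:
                raise ValueError(f"unknown escape \\{esc}")
            chars.append(ESCAPES[esc])
        else:
            chars.append(ch)
        i += 1
    raise ValueError("unterminated string literal")


source = r'put "a\tb\u03b1" into x'
value, end = lex_string_literal(source, 4)
```

After the call, value holds four characters (a, a tab, b, and a Greek alpha) and end points just past the closing quote. The token handed to the rest of the compiler is a finished string, which is why no run-time cost is involved.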

There are two issues here: readability and compatibility.
The two approaches that I prefer, agnostic quotes and C-style character escaping, both fail the backward-compatibility test separately or together if no translation of current scripts is involved. But taken together I think they improve readability.

The forced use of the "format" command is backward-compatible with current syntax but at a loss of readability.
Same with the "@" syntax, which has, I think, the added drawback of repurposing an existing operator.

Is there a way to mark scripts as having been translated, i.e., set a custom property or something that says "this script was made using a previous version of the engine and has now been updated for the new string syntax"? If so, then the translation could be automatic and behind the scenes.

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Re: Escapes in string literals

Post by monte » Mon Jun 03, 2013 9:48 pm

mwieder wrote: Is there a way to mark scripts as having been translated, i.e., set a custom property or something that says "this script was made using a previous version of the engine and has now been updated for the new string syntax"? If so, then the translation could be automatic and behind the scenes.
I think the plan is there will be some object property governing which parser is being used.

LCMark
Livecode Staff Member
Posts: 1206
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string literals

Post by LCMark » Tue Jun 04, 2013 8:56 am

mwieder wrote: you're suggesting that it will be necessary to run existing scripts through a translator to bring them up to conformance with the new parser? I like this idea, even with the pain that forced translation would bring.
Yes - the new parser and syntax are incompatible with the current parser and syntax, so using the new syntax will require a translation step. This won't be forced: you'll be able to translate scripts as and when you want, since the engine will carry all the old parsing code with it too (eventually it will likely be factored out into a 'legacy support' module). The internal representation will be the same for both new and old syntax - in the refactor-syntax branch each command/function class has been endowed with a 'compile()' method which constructs the new internal representation from the parsing of the syntax.

So, at this point it will be possible to also change the structure of (string) literals and warn people about using things-in-identifiers-they-really-shouldn't-be should it be decided that we are going down that path :)

mwieder wrote: Is there a way to mark scripts as having been translated, i.e., set a custom property or something that says "this script was made using a previous version of the engine and has now been updated for the new string syntax"? If so, then the translation could be automatic and behind the scenes.
There will need to be some sort of internal flag so that the engine knows what parser to run the script through - and perhaps some mechanism for switching an object between the two (although the translator will only go old -> new).
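As a rough illustration of the flag-plus-translator idea, here is a sketch in Python. Everything in it is hypothetical: the ScriptObject class, its parser field, and the assumption that old literals contain no escapes while new ones use C-style escapes are stand-ins, not the engine's actual design. The translator only handles string literals and ignores comments and other syntax.

```python
import re
from dataclasses import dataclass

# Assumed old rule: no escapes, a backslash is an ordinary character.
OLD_LITERAL = re.compile(r'"[^"\n]*"')
# Assumed new rule: C-style escapes inside the quotes.
NEW_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')
NEW_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def parse_literals_old(script):
    return [m.group()[1:-1] for m in OLD_LITERAL.finditer(script)]


def parse_literals_new(script):
    def resolve(body):
        return re.sub(r"\\(.)", lambda m: NEW_ESCAPES[m.group(1)], body)
    return [resolve(m.group(1)) for m in NEW_LITERAL.finditer(script)]


def translate_old_to_new(script):
    """One-way rewrite: double each backslash inside a string literal."""
    return OLD_LITERAL.sub(lambda m: m.group().replace("\\", "\\\\"), script)


@dataclass
class ScriptObject:
    script: str
    parser: str = "old"  # hypothetical internal flag selecting the parser

    def literals(self):
        if self.parser == "new":
            return parse_literals_new(self.script)
        return parse_literals_old(self.script)

    def upgrade(self):
        # The translator only goes old -> new; already-new scripts are untouched.
        if self.parser == "old":
            self.script = translate_old_to_new(self.script)
            self.parser = "new"


obj = ScriptObject(r'put "C:\temp\new" into tPath')
before = obj.literals()
obj.upgrade()
after = obj.literals()
```

The script text changes during the upgrade (each backslash is doubled), but before and after hold the same string value, which mirrors the point that both syntaxes compile to the same internal representation. Calling upgrade() a second time does nothing, since the flag already says "new".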

