Escapes in string literals

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Escapes in string literals

Post by monte » Sat May 18, 2013 9:27 am

Code: Select all

put "I \"think\" it would be cool if we could \n do this" into whereDoIStart
Last edited by monte on Mon Jun 03, 2013 2:37 am, edited 1 time in total.
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string constants

Post by DarScott » Sat May 18, 2013 5:46 pm

You can do that for a string that is the first parameter to format(). Special parsing is done there.

(However, the editor does not know about the special parsing, so colorization is off.)

The lack of such an ability can be mitigated (or perhaps better served) with constant folding. Is "constant folding" the right label? I mean that the compiler computes some constant expressions at compile time. Perhaps it should also apply, or only apply, to constants.

put format("I \"think\" it would be cool if we could \n do this") into whereDoIStart

This will work now and the value will fold into a constant if and when constant folding comes. Slightly longer of course, but it won't break old stacks.

(I must be using the Code button wrong. I can't seem to mark that as code.)

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Escapes in string constants

Post by sturgis » Sat May 18, 2013 8:12 pm

DarScott wrote: (I must be using the Code button wrong. I can't seem to mark that as code.)
Nah, it's broke for me too.

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US

Re: Escapes in string constants

Post by mwieder » Sat May 18, 2013 11:12 pm

It's been broken for quite a while now. Nobody's minding the store.

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string constants

Post by DarScott » Sun May 19, 2013 6:07 pm

We seem to have used the word "constant" two ways, so I'd better (try to) clear that up a bit.

Monte used "constant" in the sense of a literal.

I was referring to the constant command. If constant folding (the evaluation of constant expressions at compile time) is added at, or near, the same time as allowing expressions that evaluate to constants in the constant command, then that folding can be done there. My mention of the constant command is a bit of a distraction; I was just wondering whether people think folding should apply only there or should be extended to expressions in general, if constant folding is added to LiveCode.

Now, to the implied question in the variable name whereDoIStart...

Well, since I brought up format(), I'd say start there...

1. Examine format() with a single parameter literal:
A. Does this work as I say?
B. How fast is it?
C. Does the format(...) overhead get in the way of reading?
D. Does format(...) make lines too long?
E. Should the special string formatting stay with format() only, or should new functions be allowed to use it?
F. Should some special syntax or operator be used as a synonym? See 2D. (Consuming expansion flexibility)
G. How does this impact language philosophy?

2. If speed is an issue, then what language features can mitigate that?
A. Constant Folding (used where the compiler wants? explicit markers? constant command only? ((hints))?)
B. Handlers that run at compile time to set script locals used as constants
C. Changes that break scripts (e.g. an fmt() alias, or all literals using the format-style string).
D. Add special literal strings, that do this, and encourage developers to use that even in the first parameter of format().

LCMark
Livecode Staff Member
Posts: 1209
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string constants

Post by LCMark » Mon May 20, 2013 11:28 am

There are a number of issues to consider here beyond the ones Dar has mentioned.

First of all, the writing is on the wall for the way the input string to 'format()' works. There is a huge problem with format() and matchText() - for the argument list, the way script is tokenized is changed. This means that what a token means is different depending on whether it is within the parentheses of format()/matchText() or within parentheses anywhere else in the language. This is because the way strings are tokenized is changed (see 'allowescapes') during parsing. This is a really, really bad idea because it means that the syntactic structure of the script has a direct effect on the lexical structure of the script. (I don't think I can over-stress just how bad an idea this is in terms of language design - it means the parsing and lexical phases cannot be entirely separated, and it heavily restricts the type of parsing you can use).
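
To make the coupling concrete, here is a minimal Python sketch of a string scanner whose result depends on a flag that only the parser can set. It is not the engine's code: the function name is made up, and the allow_escapes parameter is only a stand-in for the 'allowescapes' switch mentioned above.

```python
def scan_string(src, start, allow_escapes):
    """Scan one double-quoted literal that begins at src[start].

    Returns (literal_text, index_after_closing_quote).
    allow_escapes is a hypothetical stand-in for the engine's
    'allowescapes' switch; the real tokenizer is not shown in this thread.
    """
    assert src[start] == '"'
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '\\' and allow_escapes and i + 1 < len(src):
            i += 2  # skip the escaped character, even if it is a quote
            continue
        if ch == '"':
            return src[start + 1:i], i + 1
        i += 1
    raise SyntaxError("unterminated string literal")


source = r'"a \" b"'
# As if inside format()/matchText() arguments: one literal containing \"
print(scan_string(source, 0, allow_escapes=True))
# As if anywhere else: the literal ends at the first quote after the backslash
print(scan_string(source, 0, allow_escapes=False))
```

The same eight characters produce two different tokens, so the scanner cannot run ahead of the parser without knowing which construct it is inside.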

So, in the new scheme of things (with the new parser), the format of the input strings to both format() and matchText() will change to remove the offending problem - that is, allowing \" (escaped quotes) - and instead something like \q will map to a quote. [ Note that this is a transformation that can be done by the script translator that will be built in, so there's no problem with 'breaking' things here ].
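
A rough Python sketch of the kind of rewrite such a translator could apply to the body of an old-style format() literal follows. The function name is hypothetical, and \q is only the "something like" placeholder from the paragraph above, not a settled escape.

```python
def rewrite_escaped_quotes(literal_body):
    """Rewrite \\" as \\q inside the body of an old-style format() literal.

    literal_body is the text between the outer quotes. Escape pairs are
    consumed left to right so that an escaped backslash is never mistaken
    for the start of an escaped quote. (\\q is an assumed target escape.)
    """
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == '\\' and i + 1 < len(literal_body):
            nxt = literal_body[i + 1]
            # Only the escaped quote changes; every other pair is copied as-is.
            out.append('\\q' if nxt == '"' else ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


old = r'I \"think\" it would be cool if we could \n do this'
print(rewrite_escaped_quotes(old))
```

After the rewrite the literal no longer contains a quote character, so it can be tokenized by the ordinary string rule with no special case for format().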

With that out of the way, back to Monte's original suggestion. There are two issues I see here.

The first is backwards compatibility right now - we cannot (right now) change significantly how string constants are interpreted as this will break anyone's scripts that contain '\' in string constants.

The second is the simple question - do we want C-style formatted strings as the default in the language? My issue here is that it does slightly go against the 'easy-to-use' language aspect as it means you have to understand much more about the format of string constants to be able to use them.

In terms of the first problem, we could consider changing things at the new syntax point, as there will be a translator for old-style scripts so all string constants could be transformed to adopt any new scheme. However, there is a compromise option which could be done now without any issue with backwards-compatibility, and that is to adopt the Pascal approach:

Code: Select all

put "Hello ""World""!" into tVar
Would result in the string

Code: Select all

Hello "World"!
Here two quotes next to each other would be elided to a single quote. Since there is no place (at the moment) you can have two string constants next to each other (in all cases it is a syntax error) there's no issue with backwards compatibility. The only slight issue I have with this is readability (although no less readable perhaps than having \" all over the place) and that would be mitigated heavily by script highlighting.
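
The doubled-quote rule is simple enough to sketch as a scanner. This is a minimal Python illustration with a made-up function name, not the engine's tokenizer:

```python
def scan_doubled_quote_literal(src, start=0):
    """Scan a literal in which "" inside the string stands for one quote.

    Returns (value, index_after_closing_quote).
    """
    assert src[start] == '"'
    out = []
    i = start + 1
    while i < len(src):
        if src[i] == '"':
            if i + 1 < len(src) and src[i + 1] == '"':
                out.append('"')  # two adjacent quotes collapse to one
                i += 2
                continue
            return ''.join(out), i + 1  # a lone quote closes the literal
        out.append(src[i])
        i += 1
    raise SyntaxError("unterminated string literal")


value, end = scan_doubled_quote_literal('"Hello ""World""!" into tVar')
print(value)
```

The scanner needs one character of lookahead and no knowledge of the surrounding syntax, which is what keeps the lexical and parsing phases separate.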

Of course, this compromise option does mean you don't get \t and \n, but then I think it is quotes in string literals that cause the most problem at the moment:

Code: Select all

put "I \"think\" it would be cool if we could \n do this" into whereDoIStart
Would be written something like this:

Code: Select all

put "I ""think"" it would be cool if we could" & return & \
        "do this" into whereDoIStart
So that is essentially the only option we have right now.

Speaking to the second concern I had above about usability - I am slightly uncomfortable forcing the notion of C-style formatted strings upon the language since I don't think they quite fit. Yes, I can see how such things would be convenient - but there is a balance to be struck between convenience for the (experienced) developer, and ease-of-use for all. Now, that being said, it doesn't seem too onerous to suggest that if you want C-style formatting you use 'format()' - especially as that allows you to do a lot more things than just parse out escape characters. Also, constant-folding (as Dar suggested) will be a relatively easy optimisation to add in the new scheme of things (with the new parser / evaluator) and at that point such uses of 'format' would become free.
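
As an illustration of what folding such a call could look like, here is a small Python sketch over a toy expression tree. The node classes, the escape table and the rule of skipping literals that contain '%' are all assumptions made for the example; none of it describes the planned evaluator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


# Assumed subset of escapes, for illustration only.
ESCAPES = {'n': '\n', 't': '\t', '"': '"', '\\': '\\'}


def decode_escapes(text):
    """Replace the escape pairs listed in ESCAPES with their characters."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)


def fold(node):
    """Fold format(<literal>) into a Literal at 'compile time'."""
    if isinstance(node, Call):
        args = tuple(fold(a) for a in node.args)
        if (node.func == 'format' and len(args) == 1
                and isinstance(args[0], Literal)
                and '%' not in args[0].value):  # leave % directives alone
            return Literal(decode_escapes(args[0].value))
        return Call(node.func, args)
    return node


tree = Call('format', (Literal(r'I \"think\" \n so'),))
print(fold(tree))
```

A call whose argument is a variable is returned unchanged, so only the all-literal case becomes free at run time.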

P.S. I had no problem using the 'code' block formatting option here - I used it above...

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Escapes in string constants

Post by sturgis » Mon May 20, 2013 2:50 pm

LCMark wrote: P.S. I had no problem using the 'code' block formatting option here - I used it above...
What browser are you using? I'm on Chrome, ML latest update. I have to type in the codes by hand; none of the buttons work for me. In fact, the cursor is entirely removed from the field and any selection is 'unselected' when a button is clicked.

LCMark
Livecode Staff Member
Posts: 1209
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string constants

Post by LCMark » Mon May 20, 2013 3:46 pm

@sturgis: Chrome on Snow Leopard - fully up to date and the buttons seem to work fine. How odd!

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string constants

Post by DarScott » Mon May 20, 2013 4:01 pm

I had missed the special parsing in matchText. Perhaps I accidentally used it and didn't know.

I agree with Mark about the language issue. The proposed fix is a hard hitting language change, though, and it would be nice to soften that with an overlapping time period. That is, maybe there is a period in which old scripts work for format() and the new string literal syntax works. Folks would be encouraged to use new string syntax in format() and have a period in which to convert their scripts. I'm not sure what the challenge is there.

This also makes it easier to switch from literals in the format() to literals in variables, the syntax is uniform.

To me, this approach cleans up the language so much that I don't mind the breaking of scripts.

(I'm on Safari with all its shots and I can't do Quote or Code.)

I also agree with staying with a LiveCode style in strings. An alternative syntax to double-quote might be to allow a single quote on the outside in which case the double-quote character does not need to be repeated, but, though slightly more readable in many cases, it is not as flexible and it consumes the single quote in future syntax options. There might be a way that colorizing can help with reading the repeated quote, so maybe that is not an issue. Maybe the first of a repeated quote is dimmed slightly.

LCMark
Livecode Staff Member
Posts: 1209
Joined: Thu Apr 11, 2013 11:27 am

Re: Escapes in string constants

Post by LCMark » Mon May 20, 2013 4:15 pm

@DarScott: I did perhaps muddy the waters here by mentioning the fate of format() and matchText(), although the issue there is highly-related to the proposed change in string literals.

The change to format() and matchText() (changing \" to \q) will come to the new style syntax ('Clean Syntax') which will appear before 'Open Language'. However, since the new syntax parser is incompatible with the old (due to ambiguity and inability to express the current ad-hoc syntax in a structured way) there will be a translation system built in. Scripters won't have to update their scripts (the 'old' parsing code will still be present in the engine to ensure 100% compatibility of old scripts) until they want to, at which point the IDE will be able to do it for them and this will include changing any string literals in format() or matchText() parameter lists at the same time. Thus there is a natural overlapping period built-in for that and no backwards-compatibility issue.

The "" (double quote) escape for quote in strings could be done now without any impact I can think of, and would make a lot of scripts manipulating quotes in string literals a little more readable.

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string constants

Post by DarScott » Mon May 20, 2013 5:48 pm

I think format() and matchText() are important in this issue. So, I don't think it is muddying the water.

I can't think of any way that adding the repeated-quote syntax would break anything, so I think you are right there. I'm trying to think of some command that would take two literals in a row.

And you are already thinking of a transition period. I'm not sure when \q is best turned on.

So, it seems to me that you are (as usual) way ahead of both me and Monte on this. I like how this is going.

(I have used "repeated-quote" to avoid the ambiguity of "double-quote" which is sometimes used for the two-tick quote to avoid confusion with the single tick quote mark.)

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Re: Escapes in string constants

Post by monte » Tue May 21, 2013 6:51 am

Hmm... well I guess "" is better than nothing. "& quote &" is terrible so at least we get past that. \t and \n would be nice though. I don't think "& return &" aids readability or learning at all. Also unless we get unicode scripts then \u would be helpful I think...

Is there any reason to add \q to format etc when you could just translate \" to "" ?

DarScott
Posts: 227
Joined: Fri Jul 28, 2006 12:23 am
Location: Albuquerque

Re: Escapes in string constants

Post by DarScott » Tue May 21, 2013 7:05 am

monte wrote: Is there any reason to add \q to format etc when you could just translate \" to "" ?
yeah

(My problem with "& return &" is because I'm an old timer who thinks ASCII CR.)
(It is interesting that comma and space have their own concatenating operator, but tab and new line do not.)

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Re: Escapes in string constants

Post by monte » Tue May 21, 2013 7:55 am

DarScott wrote: (My problem with "& return &" is because I'm an old timer who thinks ASCII CR.)
(It is interesting that comma and space have their own concatenating operator, but tab and new line do not.)
Can you think of nice operators to use?

markw
VIP Livecode Opensource Backer
Posts: 32
Joined: Mon Mar 04, 2013 4:44 pm

Re: Escapes in string constants

Post by markw » Tue May 21, 2013 9:51 am

To keep things easy to understand for the beginner I think you have to keep things like "First line" & return & "Second line" but some shorthand syntax might be nice for readability.

Are ; and : free for tab and return? e.g. "First line" : "Second Line" : "Third line" or "First item" ; "Second item" ; "Third item"
