Multiple itemDelimiters

Posted: Wed Sep 24, 2014 11:36 pm
by dunbarx
I have wanted this since forever. In the beginning, there was the comma. Then came any character. A wonderful improvement. But I wish there were, like custom properties, the ability to define as many "item" delimiters as one wanted. In many scripts I write, and especially in loops, I set and reset the itemDelimiter, back and forth, parsing data in different portions of the processed text. Consider data sets like:

aaa,bbb,ccc#ddd,eee,fff -- note the "," and "#" separating different portions of each string
ggg,hhh,jjj#kkk,xxx,yyy

Not sure what to call them ("item1", "item2", ...?). So you could write (pseudocode):

Code: Select all

get theAboveText
set the itemDel to comma
set the item2Del to "#"

repeat with x = 1 to the number of lines of it
  repeat with y = 1 to the number of items of line x of it
    repeat with z = 1 to the number of item2 of item y of line x of it
      put item2 z of item y of line x of it & return after temp
    end repeat
  end repeat
end repeat
That sort of thing...

I admit the naming scheme ("item2") is awful, but the concept seems like a winner. It would be as if not only could you delimit by items, but also by "buckets", "packages", "parcels" and "trainLoads":

Code: Select all

answer parcel 2 of item 3 of bucket 4 of yourString
Forget having as many as you want; perhaps only a small fixed number of new "item" keywords would be necessary. I bet adding the four above would cover most bases.
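
For comparison, here is a minimal sketch of the set-and-reset pattern that the proposal is meant to replace, written in current LiveCode. The handler name flattenNested and its variable names are made up for illustration; the delimiters are the comma and "#" from the sample data above.

```livecode
-- Hypothetical helper: flattenNested is an assumed name, not a built-in.
function flattenNested pText
   local tOut, tLine, tItem, tSub
   repeat for each line tLine in pText
      set the itemDelimiter to comma
      repeat for each item tItem in tLine
         -- switch to the inner delimiter for this item only
         set the itemDelimiter to "#"
         repeat for each item tSub in tItem
            put tSub & return after tOut
         end repeat
         -- restore the outer delimiter before the outer loop continues
         set the itemDelimiter to comma
      end repeat
   end repeat
   return tOut
end flattenNested
```

Calling flattenNested(theAboveText) would collect every innermost piece on its own line. The two "set the itemDelimiter" statements inside the loop are exactly the back-and-forth bookkeeping that extra delimiter keywords would remove.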

Craig Newman

Re: Multiple itemDelimiters

Posted: Thu Sep 25, 2014 11:37 am
by WaltBrown
I saw a request like that a while ago. Terms like wordDelim, phraseDelim, sentenceDelim, or chunkDelim might be other arbitrary names to hold multiple delimiters.

Re: Multiple itemDelimiters

Posted: Thu Sep 25, 2014 7:37 pm
by phaworth
I like that idea. An alternative might be to have a "delimited by" clause, with the default being comma. So your code would become:

Code: Select all

get theAboveText
repeat with x = 1 to the number of lines of it
  repeat with y = 1 to the number of items of line x of it
    repeat with z = 1 to the number of items delimited by "#" of item y of line x of it
     put item z of item y of line x of it & return after temp
    end repeat
  end repeat
end repeat

Re: Multiple itemDelimiters

Posted: Fri Sep 26, 2014 12:51 am
by dunbarx
Well, that is a great way to eliminate the silly names of the new "item" keywords. And it reads in a very grown-up way.

It is not as simple or neat as an explicit "newItem" keyword (like "trainload"), though. But I bet it would be easier to sell that way, and has the advantage of having virtually no limits as to the number of new keywords. Maybe "trainLoad" is a bit much, after all.

Note that there is another thread where it has been discovered that in LC 7, both the itemDelimiter and the "split" command (so far) may now contain any number of characters:

Code: Select all

set the itemDelimiter to "XYZ"
answer item 2 of "aaaXYZbbb" --gives "bbb"
I guess this is a good thing. It would allow the new syntax you suggested to be very descriptive:

Code: Select all

...the number of items delimited by "colorChoice" of... 
or whatever.

Craig

Re: Multiple itemDelimiters

Posted: Fri Sep 26, 2014 1:45 pm
by paul@researchware.com
I really like the 'item delimited by' convention. However, I think that to be reasonably parsable by a programming language syntax parser, it would also need to be in the 'put' statement. As in:

put item z delimited by "#" of item y [delimited by ","] of line x of it & return after temp

Otherwise, the interpreter has to somehow remember the delimiter associated with each variable to differentiate between "item z" and "item y".

Re: Multiple itemDelimiters

Posted: Sun Sep 28, 2014 1:24 am
by dunbarx
Good point.

Unless the parser always knows that if no "delimited by" phrase is seen, then either the default delimiter is to be used (comma), or the most recent explicit itemdelimiter declaration. I think this should be readily implemented.

Craig

Re: Multiple itemDelimiters

Posted: Sun Sep 28, 2014 1:45 am
by paul@researchware.com
Craig,

Exactly my point. With a parser rule that, when no "delimited by" phrase is seen, falls back to either the default delimiter (comma) or the most recent explicit itemDelimiter declaration, the original sample code (below) would not be parsed as intended.

Code: Select all

repeat with x = 1 to the number of lines of it
  repeat with y = 1 to the number of items of line x of it
    repeat with z = 1 to the number of items delimited by "#" of item y of line x of it
     put item z of item y of line x of it & return after temp
    end repeat
  end repeat
end repeat
The line put item z of item y of line x of it & return after temp would see that no "delimited by" clause was present, so it would use comma or the most recent delimited by (from the repeat above) of "#" and both item z and item y would look for "#" delimiters.

I mean, you can write a parser that binds each variable (z or y) to the item delimiter in effect when it was introduced, but from an interpreter coding perspective that is much more complicated than using the "delimited by" syntax in the put statement itself.

Re: Multiple itemDelimiters

Posted: Sun Sep 28, 2014 4:21 am
by phaworth
Apologies, I left the "delimited by" clause out of the put by oversight.
It would make coding simpler to use the "repeat for each" form:

repeat for each item ritem delimited by "#" of item x of line y of tText
  put ritem.....

I'm now thinking this syntax could be used to extend LC to include generic chunks. We'd need a new keyword, perhaps "chunk", and then we could:

put chunk 4 delimited by " $" of chunk 1 delimited by "*".....

Pete

Re: Multiple itemDelimiters

Posted: Mon Sep 29, 2014 4:42 pm
by dunbarx
Paul.

Phaworth's typo aside, are we all now on the same page? I think all the new ideas, especially a new keyword (and why not "chunk"? It is not currently a native word) would allow exquisite control over text parsing.

What office do we protest outside of? I will make placards.

Craig

Re: Multiple itemDelimiters

Posted: Mon Sep 29, 2014 4:57 pm
by FourthWorld
You could submit it to the request queue:
http://quality.runrev.com/

If there's enough interest in this you may be able to find someone in the community to implement it.

Re: Multiple itemDelimiters

Posted: Tue Sep 30, 2014 4:18 pm
by dunbarx
Enhancement logged. Mark replied:
Hi Craig,

I wonder if a slightly different approach might be better here.

What you are essentially doing is saying you have a string which is a sequence
of nested lists. At the first level, the list is delimited by ',', and at the
second level the list is delimited by '#'.

I wonder if extending the split command to handle arbitrarily deep nestings
might be more appropriate. For example:

split it by "," then "#"

Would give you an array:
it[1][1]="ggg"
it[2][1]="hhh"
it[3][1]="jjj"
it[3][2]="kkk"
it[4][1]="xxx"
it[5][1]="yyy"

Essentially, the command would have an identical effect as follows:
split it by ","
repeat with i = 1 to the number of elements of it
  split it[i] by "#"
end repeat

The loop would then become:

repeat for each line tLine in it
  split tLine by "," then "#"
  repeat for each element tItem1 in tLine
    repeat for each element tItem2 in tItem1
      put tItem2 & return after temp
    end repeat
  end repeat
end repeat

It provides a similar feature to the one you suggest, except that it could perhaps
be a bit more efficient (only one pass is ever done over the input string) and
avoids having to manage multiple delimiters simultaneously.

Warmest Regards,

Mark.
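
The "split ... by ... then ..." form above is proposed syntax, not something the engine accepts today. As a rough sketch of the same two-level result using only the existing split command, something like the following might serve. The function name splitNested and its parameter names are assumptions for illustration.

```livecode
-- Hypothetical helper: splitNested is an assumed name, not a built-in.
function splitNested pText, pOuterDelim, pInnerDelim
   local tArray, tCount, tPart, i
   put pText into tArray
   -- first level: numerically keyed array, one element per outer piece
   split tArray by pOuterDelim
   put the number of elements of tArray into tCount
   repeat with i = 1 to tCount
      -- second level: split each piece in a temporary variable,
      -- then store the resulting sub-array back under the same key
      put tArray[i] into tPart
      split tPart by pInnerDelim
      put tPart into tArray[i]
   end repeat
   return tArray
end splitNested
```

With the second sample line, splitNested("ggg,hhh,jjj#kkk,xxx,yyy", comma, "#") should produce an array in which element [3][2] holds "kkk". Unlike the proposed one-pass command, this version walks the data twice, which is the inefficiency Mark's suggestion aims to avoid.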


Apparently there is interest in Scotland. I replied:
Mark.

That is very nice indeed for the "split" command, to create a multi-level array in one pass. But what if I want to parse my data in the clear? In that case, as the lower portion of that thread delves into, it is necessary to declare a few explicit delimiters and place them as required in an implicit "nesting".

Both enhancements would be fantastic, but only upping the "split" command would leave much on the table, don't you agree?

Anyway, thank you for getting back to me. You are the Dan Winkler of the 21st century.

Craig

Re: Multiple itemDelimiters

Posted: Tue Sep 30, 2014 4:30 pm
by paul@researchware.com
I concur with Craig: both an enhanced split command for data with more than 2 dimensions and a 'chunk x delimited by y' form would be very helpful additions for people dealing with large amounts of data.

Sometimes you have a big blob of data you want to parse all at once (split), and sometimes you want a specific item from a specific line (chunk 3 delimited by "#" of chunk 2 delimited by "," of chunk 6 delimited by tab of tData).
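
Until something like "chunk x delimited by y" exists, a small function can approximate it. This is only a sketch; the name chunkDelimitedBy is invented here and is not a LiveCode keyword. It relies on the itemDelimiter being local to the handler that sets it, so the caller's delimiter is left alone.

```livecode
-- Hypothetical helper: chunkDelimitedBy is an assumed name, not a built-in.
function chunkDelimitedBy pIndex, pDelim, pText
   -- the itemDelimiter set here applies only inside this handler
   set the itemDelimiter to pDelim
   return item pIndex of pText
end chunkDelimitedBy
```

The nested reference above would then read, from the inside out:

```livecode
put chunkDelimitedBy(3, "#", chunkDelimitedBy(2, comma, chunkDelimitedBy(6, tab, tData))) into tValue
```

Each call carries its own delimiter, which matches the earlier point that the delimiter has to be stated wherever the chunk is referenced.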

Re: Multiple itemDelimiters

Posted: Sat Oct 18, 2014 3:34 am
by [-hh]
Better give it up.

Originally I intended to support your request. It's very creative and kind of trying to introduce multiplication after one has addition. But then I saw today arguments in the use-list: ...

Edit. If you can't see these posts, they can't be there.

Re: Multiple itemDelimiters

Posted: Sun Oct 19, 2014 3:06 am
by dunbarx
Hermann.

I did not see the posts on the use-list you refer to.

Oh, and never give up.

Craig

super itemDel redux

Posted: Tue May 15, 2018 6:16 pm
by dunbarx
I am re-opening this issue (See "Multiple itemDelimiters" in the forum). I had asked that the language be enhanced with additional chunkDelimiters, "parcels", "servings", whatever.

So one could have item 3 of parcel 4 of serving 5 of myText, all those gadgets defined beforehand.

Phaworth cleverly suggested a variant on this theme, created on the fly:

Code: Select all

repeat with z = 1 to the number of items delimited by "#" of line x of it
Both methods seem incredibly powerful and useful. I am wondering if this might yet gain traction.

Craig Newman