filter enhancements

Janschenkel
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium

Post by Janschenkel » Thu May 23, 2013 2:29 pm

Following up on a use-livecode mailing list discussion, I figured I'd post here.

There are a couple of enhancement requests regarding the 'filter' command:
  • Bug 2805 - Filter should implement the regular expression syntax
  • Bug 7441 - Filter should work with items, tokens, words and chars as well
  • Bug 9959 - Extend Filter and Sort commands with 'into' clause
There are two more (1889/9960), but I'd rather leave those for separate discussions.

Here's what I propose to add:
- filtering items in addition to lines (matching what 'sort container' can do)
- filtering using a regular expression (in addition to the existing 'filterPattern')
- outputting to another container instead of overwriting the existing one

How about the following syntax:

Code: Select all

filter [ { lines | items } of ] <sourceContainer> { { with | without } <filterPattern> | matching <regexPattern> } [ into <targetContainer> ] 
This would add a single keyword, 'matching', to the language, and the rest of the work can be pretty much confined to extending cmds.h/cmdsf.cpp.
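
To make the proposed behaviour concrete, here is a rough sketch of the semantics in Python rather than LiveCode or engine C++. The function name `filter_chunks` and its parameters are hypothetical, Python's `fnmatch` wildcards only approximate the existing 'filterPattern' rules, and the sketch assumes case-sensitive matching and a regex that may match anywhere in the chunk.

```python
import re
from fnmatch import fnmatchcase


def filter_chunks(source, pattern, *, chunk="lines", keep=True,
                  regex=False, item_delimiter=","):
    """Return the chunks of `source` that pass the filter, re-joined as a string.

    chunk -- "lines" or "items" (stand-in for the proposed [lines | items] clause)
    keep  -- True behaves like 'with', False like 'without'
    regex -- True behaves like the proposed 'matching <regexPattern>'
    """
    delimiter = "\n" if chunk == "lines" else item_delimiter
    chunks = source.split(delimiter)

    if regex:
        compiled = re.compile(pattern)

        def matches(text):
            # Assumption: a regex may match anywhere inside the chunk.
            return compiled.search(text) is not None
    else:
        def matches(text):
            # Wildcard match against the whole chunk.
            return fnmatchcase(text, pattern)

    # The source string is never modified; the caller decides where the
    # result goes, which mirrors the optional 'into' clause.
    return delimiter.join(c for c in chunks if matches(c) == keep)


data = "apple\nbanana\navocado\ncherry"
target = filter_chunks(data, "a*")  # like: filter data with "a*" into target
items = filter_chunks("a1,b2,a3", r"^a\d$", chunk="items", regex=True)
```

Returning a new string instead of mutating the input keeps the 'into' behaviour trivial: overwriting the source is just assigning the result back to it.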

What say you?

Jan Schenkel.
Quartam Reports & PDF Library for LiveCode
www.quartam.com

LCMark
Livecode Staff Member
Posts: 1207
Joined: Thu Apr 11, 2013 11:27 am

Re: filter enhancements

Post by LCMark » Thu May 23, 2013 2:59 pm

@Janschenkel: Well, since you asked ;)

There are a number of aspects here, which should probably be considered individually.

Regular Expressions:
It would be good to give access to PCRE REs rather than the (old-style) Unix type ones that filter currently supports. I quite like the idea of 'matching' being used to distinguish between the old and new, indeed a 'x matches y' operator would be quite useful also. Moving forward, 'replace' could benefit from being RE aware - although 'matching' / 'matches' doesn't really work here. Something like:

Code: Select all

    replace pattern <pattern> with [ pattern ] <replace_pattern> in <container>
Of course, this then raises the question: would

Code: Select all

    filter <container> [ not ] matching pattern <pattern>
    x matches pattern y
Be better / more consistent? It also clearly indicates that <pattern> is treated as a regular expression, rather than just a regular string.

Into Clause:
I think we could do a little better here - incorporating some semantics from 'convert' at the same time.

Code: Select all

    filter <container_or_expr> [ not ] matching pattern <pattern> [ into <container> ]
The combinations could work thusly:
  • If there is no into clause and <container_or_expr> is an expression (i.e. not writable) then the result is put into 'it'.
  • If there is no into clause and <container_or_expr> is a container (i.e. writable) then the result is put into the container.
  • If there is an into clause then it doesn't matter what sort of syntax <container_or_expr> is, as the result is put into the 'into' container.
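
The three combinations above can be sketched as a small decision function. This is only an illustration in Python; `resolve_destination` and the labels it returns are hypothetical and do not correspond to anything in the engine.

```python
def resolve_destination(source_is_container, into_target=None):
    """Decide where the filtered result would be stored.

    source_is_container -- True if the source is writable (a container),
                           False if it is a plain expression
    into_target         -- name of the 'into' container, or None if absent
    """
    if into_target is not None:
        # An explicit 'into' clause always wins, whatever the source is.
        return into_target
    if source_is_container:
        # No 'into' clause and a writable source: overwrite the source.
        return "source"
    # No 'into' clause and a read-only expression: fall back to 'it'.
    return "it"
```

For example, `resolve_destination(False)` yields `"it"`, while `resolve_destination(True, "tResult")` yields `"tResult"`.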
Other Chunks:
There's an issue with 'token' and 'word' chunks here due to white-space, i.e. a word can have many spaces between it and the next one - thus a policy on preservation (or not!) needs to be worked out. A similar problem arises when considering the set of chunks 'sort' can deal with. In either case, there's an abstraction lurking here in terms of processing this kind of stuff - and then there's the whole other bundle of joy with regard to how to get it to operate on field chunks without obliterating the formatting.
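
The white-space problem is easy to see in a small Python illustration (not engine code). The "preserving" variant below picks one possible policy, namely that a removed word takes its trailing white-space with it; that policy is an assumption made for the example, not something decided in this thread.

```python
import re

text = "alpha   beta\tgamma"

# Naive approach: split on runs of white-space, filter, re-join with one space.
# The original spacing (three spaces, a tab) is lost.
words = text.split()
naive = " ".join(w for w in words if w != "beta")

# Separator-preserving approach: pair each word with the white-space after it,
# so that the chunks that survive keep their original spacing.
pairs = re.findall(r"(\S+)(\s*)", text)
preserved = "".join(word + gap for word, gap in pairs if word != "beta")
```

Here `naive` collapses the gap to a single space, while `preserved` keeps the three spaces that followed "alpha".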

The Regular Expression and Into Clause changes could be relatively easily cleaved into the existing code... Sorting out 'Other Chunks' would probably be best done after - once suitable abstractions have been sorted out internally (to stop duplication of code / make it easier to implement).

Janschenkel
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium

Re: filter enhancements

Post by Janschenkel » Thu May 23, 2013 5:53 pm

@RunRevMark: You're right, we should split them up - at the very least when it comes to the implementation.
I mainly wanted to bounce the syntax idea off of the forum readers.

Regular expressions - looking at other commands and operators

I like the 'matches' operator idea as it would nicely complement 'begins with' and 'ends with' - but I'm not sure about the idea of having to add 'pattern' wherever I use 'matches' or 'matching'. The 'replace' command has a 'replaceText' function counterpart with regex support.

Then again, adding 'pattern' throughout would make for consistent syntax. How about making it optional syntactic sugar for 'matches' and 'matching'?

Code: Select all

filter <container> [ not ] matching [ pattern ] <pattern>
x matches [ pattern ] <pattern>
replace pattern <pattern> with [ pattern ] <replace_pattern> in <container>
Come to think of it, surrounding certain operators with a 'not' and parentheses also looks a bit clunky, so the following might be nice too:

Code: Select all

x does not match [ pattern ] <pattern>
x does not begin with <prefix>
x does not end with <suffix>
But that's definitely out of the scope of this discussion...

Into clause - incorporating semantics from 'convert'

Thanks for pointing out these semantics of the 'convert' command; they're not described in the docs but definitely handy!
And it does sound like a very fluent way of coding for the developer.
Thinking out loud, these semantics could also apply to the 'split' command - but that's perhaps a bridge too far and again out of scope...

Other chunks - let's limit this to 'items'

I didn't want to go the whole way, just the ability to filter items as this would match the sort command capabilities and is straightforward to implement.

Code: Select all

filter [ { lines | items } of ] <container> ... 
Other chunk types would definitely be a headache for the reasons RunRevMark mentioned.

Combining it all - into a single syntax definition

So when we combine the above we would come to the following new syntax for the 'filter' command.

Code: Select all

filter [ { lines | items } of ] <container_or_exp> { { with | without } <filter_pattern> | [ not ] matching [ pattern ] <regex_pattern> } [ into <container> ] 
Definitely food for multiple feature branches :-)

Jan Schenkel.

phaworth
Posts: 592
Joined: Thu Jun 11, 2009 9:51 pm

Re: filter enhancements

Post by phaworth » Thu May 23, 2013 6:38 pm

filter [ { lines | items } of ] <container_or_exp> { { with | without } <filter_pattern> | [ not ] matching [ pattern ] <regex_pattern> } [ into <container> ]

Do we need the "not" option? Regular expressions already allow for "not" operations. Could lead to some interesting double negatives!

Pete

ricochet1k
Posts: 39
Joined: Tue Apr 23, 2013 1:30 am

Re: filter enhancements

Post by ricochet1k » Thu May 23, 2013 6:56 pm

Regular expressions can be made to match negatives, but the results can be painful. What's the opposite of /abc[def]+ghi/ for instance? A not option sounds like a good idea to me.
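
As an illustration of why an explicit 'not' reads better, here is the same exclusion written both ways in Python's `re` module (PCRE supports the same lookahead construct). The sample lines are made up for the example.

```python
import re

lines = ["xxabcdghi", "abcghi", "abcdefghi!", "nothing here"]

# Option 1: keep the pattern readable and negate the test itself,
# which is what a 'not matching' clause would do.
pattern = re.compile(r"abc[def]+ghi")
with_not = [line for line in lines if not pattern.search(line)]

# Option 2: push the negation into the pattern with a negative lookahead.
# It selects the same lines, but the intent is much harder to read.
negated = re.compile(r"^(?!.*abc[def]+ghi).*$")
with_lookahead = [line for line in lines if negated.search(line)]
```

Both lists contain the two lines that do not contain the pattern, but only the first version leaves the original expression untouched.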

Are the regexs going to be a string or a new pattern token like Perl/Javascript/etc.? String patterns will be very annoying if escapes can't be used, but new token patterns can be a pain to parse.

LCMark
Livecode Staff Member
Posts: 1207
Joined: Thu Apr 11, 2013 11:27 am

Re: filter enhancements

Post by LCMark » Fri May 24, 2013 12:52 pm

ricochet1k wrote: Regular expressions can be made to match negatives, but the results can be painful. What's the opposite of /abc[def]+ghi/ for instance? A not option sounds like a good idea to me.

Yes - not is required I think: 'filter with' == 'filter matching', 'filter without' == 'filter not matching'.

ricochet1k wrote: Are the regexs going to be a string or a new pattern token like Perl/Javascript/etc.? String patterns will be very annoying if escapes can't be used, but new token patterns can be a pain to parse.

I was thinking patterns would just be strings rather than a new token type. Since it is proposed that patterns would be PCRE-style, that already comes with its own escape structure and so we'd just follow that (with the only change being "" would perhaps indicate a quote, rather than \").

Btw, as an aside, is it just me or is the verb 'filter' slightly confusing? I always find myself having to think twice about what 'filter ... with' does and I'm not sure 'filter matching' improves that any. Perhaps it's just one of these concepts one has to learn and then move on from, although I wonder if there might be a better syntax for expressing the command's action.

phaworth
Posts: 592
Joined: Thu Jun 11, 2009 9:51 pm

Re: filter enhancements

Post by phaworth » Fri May 24, 2013 3:37 pm

You're right, "not" is needed.

I'm not familiar with some of the terminology being used, PCRE specifically. I hope the intention is not to have a different regexp syntax than that used by matchText? It's hard enough learning one variant of regexp syntax, I'd hate to have to learn another.

Also, I'm wondering whether we need the " [not] matching" phrase. Maybe with/without would suffice:

filter with pattern .....
filter without pattern...

Maybe it doesn't flow as well as matching but pattern signifies the use of a regexp. The fewer syntax variations the better for me.

Pete

phaworth
Posts: 592
Joined: Thu Jun 11, 2009 9:51 pm

Re: filter enhancements

Post by phaworth » Fri May 24, 2013 3:44 pm

One other question. What does the "items" keyword do specifically? Is the intention to filter on a specific item number? If so, that would be great but I didn't see a way to specify an item number in the syntax.
Pete

LCMark
Livecode Staff Member
Posts: 1207
Joined: Thu Apr 11, 2013 11:27 am

Re: filter enhancements

Post by LCMark » Fri May 24, 2013 3:48 pm

@phaworth:

PCRE (perl-compatible regular expressions) is the library we use to provide the regexp support in matchText() and replaceText().

The 'items' clause would cause filter to use the itemDelimiter to split the string up into chunks rather than the lineDelimiter. It would bring greater consistency to the command in line with others such as sort.

Janschenkel
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium

Re: filter enhancements

Post by Janschenkel » Fri May 24, 2013 4:49 pm

@phaworth:

I've been pondering your suggestion of not introducing the 'matching' keyword and just adding 'pattern' after 'with'/'without' to distinguish between wildcard and regex filtering. The downside of this approach would be that the current wildcard is already known as 'filterPattern' - plus, we already have 'matchExpression' in multiple places in the docs where a regular expression can be used ('replaceText', 'matchText', 'matchChunk').
We definitely don't want to confuse developers when some people already have to sit back and think if they should use 'with' or 'without' when they're about to use the 'filter' command.

@RunRevMark:

I just noticed while looking at the code that the 'filter' command ignores the 'lineDelimiter' property and always uses '\n' - maybe I should file a bug report and then go in and fix it :-)

Jan Schenkel.

phaworth
Posts: 592
Joined: Thu Jun 11, 2009 9:51 pm

Re: filter enhancements

Post by phaworth » Fri May 24, 2013 5:19 pm

LCMark wrote: The 'items' clause would cause filter to use the itemDelimiter to split the string up into chunks rather than the lineDelimiter. It would bring greater consistency to the command in line with others such as sort.

This is probably outside the scope of the initial implementation but, continuing the sort command analogy, it would be great to be able to say:

filter lines of tVar by item 3 of each with "xyz"

or

filter items of tVar by word 2 of each with "mark" (understanding the previous comment on the difficulties of filtering words)
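
For what it's worth, the 'by item N of each' idea boils down to testing one field of each line instead of the whole line. A minimal Python sketch follows; `filter_lines_by_item` is a hypothetical helper, and `fnmatch` wildcards only approximate LiveCode's filter patterns.

```python
from fnmatch import fnmatchcase


def filter_lines_by_item(text, item_number, pattern, item_delimiter=","):
    """Keep the lines whose item `item_number` (1-based) matches `pattern`."""
    kept = []
    for line in text.split("\n"):
        items = line.split(item_delimiter)
        # Item numbers are 1-based here, as in LiveCode chunk expressions;
        # a missing item is treated as empty.
        value = items[item_number - 1] if item_number <= len(items) else ""
        if fnmatchcase(value, pattern):
            kept.append(line)
    return "\n".join(kept)


rows = "1,ann,xyz\n2,bob,abc\n3,cy,xyz"
result = filter_lines_by_item(rows, 3, "xyz")  # like: filter lines of rows by item 3 of each with "xyz"
```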

Pete

Janschenkel
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium

Re: filter enhancements

Post by Janschenkel » Fri May 24, 2013 5:28 pm

@phaworth

I was hoping to someday add Filter-Map-Reduce functionality in the engine and use the 'by' keyword for that purpose.
Check out enhancement request 9960 for more information.

Cheers,

Jan Schenkel.

phaworth
Posts: 592
Joined: Thu Jun 11, 2009 9:51 pm

Re: filter enhancements

Post by phaworth » Fri May 24, 2013 5:35 pm

Janschenkel wrote:@phaworth

I was hoping to someday add Filter-Map-Reduce functionality in the engine and use the 'by' keyword for that purpose.
Check out enhancement request 9960 for more information.

Cheers,

Jan Schenkel.
Sounds good Jan, "by" is more natural syntax I think.

Pete

Janschenkel
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium

Re: filter enhancements

Post by Janschenkel » Fri May 24, 2013 10:40 pm

@RunRevMark:
I've been pondering the implications of the 'convert' semantics wrt the <container_or_exp> part, and although more complex to implement, I think this deserves wider adoption.

Suppose you have a tab-separated text file with a number of lines you don't need:

Code: Select all

filter url "file:toomuchdata.txt" with "100" & tab & "*"
set the dgText of group "DataGrid" to it
Or if you don't need to filter anything out but want it sorted straight away:

Code: Select all

sort url "file:unsorteddata.txt" ascending numeric by item 2 of each into tPreSortedData
It sure is a nice change from the day-job bug-hunting activities to think about these things :-)

Jan Schenkel.

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am

Re: filter enhancements

Post by monte » Sat May 25, 2013 7:02 am

While we are pondering on filter it would be nice to consider arrays:

Code: Select all

filter [ { keys | elements } of ] <container> [ not ] matching [ pattern ] <pattern>
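
As a rough idea of what filtering an array could mean, here is a Python sketch using a dict in place of a LiveCode array. The helper `filter_array` and its `by` and `negate` parameters are hypothetical, and it assumes the result keeps the key/element pairs that pass the test.

```python
import re


def filter_array(array, pattern, *, by="keys", negate=False):
    """Return a new dict with the entries whose key or element matches `pattern`.

    by     -- "keys" tests each key, "elements" tests each value
    negate -- True behaves like 'not matching'
    """
    compiled = re.compile(pattern)

    def keep(key, value):
        subject = key if by == "keys" else str(value)
        return (compiled.search(subject) is not None) != negate

    return {k: v for k, v in array.items() if keep(k, v)}


prefs = {"user.name": "Jan", "user.mail": "jan@example.com", "app.theme": "dark"}
by_key = filter_array(prefs, r"^user\.")
by_value = filter_array(prefs, r"@", by="elements")
without_user = filter_array(prefs, r"^user\.", negate=True)
```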
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/
