Pattern Matching in Livecode

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Pattern Matching in Livecode

Post by capellan » Wed Nov 09, 2016 1:38 pm

Hi all,

Recently I made a very long script for searching a pattern among a list
(download and check the zipped stack attached to this message).

I suspect that LiveCode provides better tools for this task, but I don't know
which are and how to use them. Maybe a simpler solution is to employ a regex,
arrays combinations or a really clever handler.

How many different methods (functions and commands) provides Livecode
to make this task of comparing and finding 3 words (from a list of 12 words)
among 1080 lines of 4 words?

Thanks in advance!

Al
MatchingPatternsv02.zip

dunbarx
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Pattern Matching in Livecode

Post by dunbarx » Wed Nov 09, 2016 3:20 pm

Hi.

(I saw this on the use-list as well, so I repost here)

Long handler?

Wouldn't the judicious use of the "wordOffset" function, with a little help from the "words to skip" parameter, do this rather quickly and with just a handful of lines of code?

I did not download your stack, but perhaps I am not understanding what you meant?

Craig

[-hh]
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Pattern Matching in Livecode

Post by [-hh] » Wed Nov 09, 2016 8:09 pm

was a bit complicated :-)
Last edited by [-hh] on Thu Nov 10, 2016 9:53 pm, edited 1 time in total.
shiftLock happens

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Re: Pattern Matching in Livecode

Post by capellan » Wed Nov 09, 2016 8:47 pm

Hi Craig,

Your clever idea of reverting search terms works great too.
Many Thanks for answering my request! :D

Still, I am curious to learn if regex and arrays operations
could produce the same results.

Have a nice week!

Al
Matching Pattern v03.PNG
MatchingPatternsv03.zip

[-hh]
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Pattern Matching in Livecode

Post by [-hh] » Wed Nov 09, 2016 9:31 pm

Using Craigs idea that a repeat is faster if walking though the less words of the combination, this is my improved 'combined' proposal:

Code: Select all

on mouseUp
  put the millisecs into m1; lock screen; lock messages
  put empty into fOUT; put empty into check
  put fld id 1008 into zxs; put fld id 1014 into SC1080
  repeat for each line K in SC1080
    repeat for each word L in K
      if L is in zxs then add 1 to check[K]
    end repeat
    if check[K] = 4 then put cr & (word 1 to 3 of K) after fOUT
  end repeat
  delete char 1 of fOUT; sort fOUT; put fOUT into fld id 1025
  unlock screen; unlock messages; put "c+h: " & (the millisecs -m1) into fld "info"
end mouseUp
shiftLock happens

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Re: Pattern Matching in Livecode

Post by capellan » Wed Nov 09, 2016 9:36 pm

Just noticed that Peter and Hermann
posted handlers too. Later, I will test
Peter's handlers for processing data.

In this update, I just copy and paste
Hermann's handler in the script of
a button.

Checking their processing time, Craig's idea is
faster using just 3, 4 or 5 milliseconds.
Hermann script just takes from 10 to 15 milliseconds
but my script takes a lot longer: from 30 to 34 milliseconds.

Al
MatchingPatternsv04.livecode.zip
MatchingPatternsv04.png

Thierry
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Pattern Matching in Livecode

Post by Thierry » Thu Nov 10, 2016 11:56 am

Hi Alejandro,

As you are interested in different solutions, including regex,
here is a quick one I made in a couple of minutes:

Code: Select all

 
filter lines of field id 1014 with regex \
         pattern format("^(?:(?:%s)\\s){3}",replaceText(fld id 1008,return,"|")) \
         into fld "RegexRocks"

Enjoy!

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

[-hh]
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Pattern Matching in Livecode

Post by [-hh] » Thu Nov 10, 2016 2:18 pm

was a bit complicated :-)
Last edited by [-hh] on Thu Nov 10, 2016 9:54 pm, edited 1 time in total.
shiftLock happens

Thierry
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Pattern Matching in Livecode

Post by Thierry » Thu Nov 10, 2016 7:28 pm

Hi all,

Out of curiosity, I tried another solution with regex plus did some benchmarks too.
Tested on Mac with LC 8.1.1.
Here are the results in milliseconds:
tdz1: 2 tdz2: 9 hh1: 40 hh2: 266
tdz1: 3 tdz2: 8 hh1: 32 hh2: 263
tdz1: 2 tdz2: 9 hh1: 32 hh2: 262
tdz1: 3 tdz2: 8 hh1: 32 hh2: 263
tdz1: 2 tdz2: 9 hh1: 34 hh2: 262

Here is the benchmark part:

Code: Select all

on mouseUp
   local ml, R
   constant NumberOfTests = 5
   
   put fld id 1008 into random_12words // previously zxs
   put fld id 1014 into lines1080 // previously SC1080
   
   repeat for NumberOfTests
      put _timer200() into m1
      
      -- filter command with regex pattern
      put tdz1(random_12words, lines1080 ) into R2
      
      put "  tdz1: " & the millisecs -m1 after R
      put _timer200() into m1
      
      -- repeat loop with matchText() test
      put tdz2(random_12words, lines1080 ) into R3
      
      put "  tdz2: " & the millisecs -m1 after R
      put _timer200() into m1
      
      -- double repeat loop with 'is in' test
      put hh1(random_12words, lines1080 ) into R1
      
      put "   hh1: " & the millisecs -m1 after R
      put _timer200() into m1
      
      -- variant of double repeat loop with 'is in' test
      put hh2(random_12words, lines1080 ) into R4
      
      put "   hh2: " & the millisecs -m1 after R
      put cr after R
   end repeat
   
   put R
   
end mouseUp

function _timer200
   wait 200 milliseconds with messages
   return the milliseconds
end _timer200

and the 4 tested functions:

Code: Select all

function hh1 random12, text1080
   repeat for each line K in text1080
      repeat for each line L in random12
         if L is in K then add 1 to check[K]
      end repeat
      if check[K] = 3 then put cr & (word 1 to 3 of K) after R
   end repeat
   delete char 1 of R
   return R
end hh1

Code: Select all

function hh2 random12, text1080
   repeat for each line K in text1080
      repeat for each word L in K
         if L is in random12 then add 1 to check[K]
      end repeat
      if check[K] = 4 then put cr & (word 1 to 3 of K) after R
   end repeat
   delete char 1 of R
   return R
end hh2

Code: Select all

function tdz1 random12, text1080
   filter lines of text1080 with regex \
         pattern format("^(?:(?:%s)\\s){3}",replaceText(random12,return,"|")) \
         into R
   return R
end tdz1

Code: Select all

function tdz2 random12, text1080
   replace return with "|" in random12
   put "^((?:(?:" & random12 &  ")\s){3})" into REX
   repeat for each line K in text1080
      if matchText(K,REX,gotIT) then \
            put char 1 to -2 of gotIT &cr after R
   end repeat
   delete last char of R
   return R
end tdz2

Kind regards,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Re: Pattern Matching in Livecode

Post by capellan » Sat Nov 12, 2016 6:55 am

Hi Thierry,

It's amazing that your code almost reach 1 millisecond! :D
Previously, the best time was 3 milliseconds.

Many, many thanks for posting this regex solution.
Hopefully the array solution would appear too,
anytime soon.

Now, I am searching for sets of 12 words that produce 11
valid combinations. For now, I have just found only 1.

Have a nice weekend!

Al

rkriesel
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 119
Joined: Thu Apr 13, 2006 6:25 pm

Re: Pattern Matching in Livecode

Post by rkriesel » Mon Nov 14, 2016 8:46 am

Hi, capellan. I'm thinking of a technique using arrays to solve your problem, but there are a few things I don't understand about your stack.

Why does your stack say "There are 1080 unique possible combinations for those 81 words"? Why aren't there are (81! / (4! * (81-4)!)), which is 1663740 rather than 1080?

Why does any "possible combination" include the same word twice? Why does every combination start and end with same word?

-- Dick

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Re: Pattern Matching in Livecode

Post by capellan » Mon Nov 14, 2016 4:57 pm

Hi Richard,
Why does your stack say "There are 1080 unique possible combinations for those 81 words"? Why aren't there are (81! / (4! * (81-4)!)), which is 1663740 rather than 1080?
Because those are the rules for valid combinations of SET card game:
http://smart-games.org/en/main/rules/
http://smart-games.org/en/set/open_cards
Why does any "possible combination" include the same word twice? Why does every combination start and end with same word?
Actually, repeating the first word at the end of each line is
only needed for my very own (slow) handler. Craig's idea,
and both handlers posted by Hermann and Thierry
do not need this repetition of the first word.

You could delete safely the last word of each line without
affecting the final result:

repeat for each line z in 1080Combinations
delete last word of z
end repeat


Al

[-hh]
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Pattern Matching in Livecode

Post by [-hh] » Wed Nov 16, 2016 5:13 pm

Below is a handler that doesn't search all 1080 admissible combinations (fulfilling the rules of the game SET) but determines the admissible combinations that can be built with the random sample of 12 words. This seems to be very fast and is elementary LiveCode.

Code: Select all

on mouseUp
  put fld id 1008 into zxs -- the random sample, nothing else needed
  put the millisecs into m1; lock screen; lock messages
  -- repeat 20 -- start your alternate method here
  put empty into fOut
  repeat with i=1 to 12
    put line i of zxs into l1
    put char 1 of l1 into u1; put char 2 of l1 into u2
    put char 3 of l1 into u3; put char 4 of l1 into u4
    repeat with j=i+1 to 12
      put line j of zxs into l2
      put char 1 of l2 into v1; put char 2 of l2 into v2
      put char 3 of l2 into v3; put char 4 of l2 into v4
      repeat with k=j+1 to 12
        put line k of zxs into l3
        -- checking for the rules of the game SET
        if u1 & v1 &char 1 of l3 is not in "111,222,333,123,132,213,312,231,321" then next repeat
        if u2 & v2 &char 2 of l3 is not in "RRR,GGG,PPP,RGP,RPG,GRP,PRG,GPR,PGR" then next repeat
        if u3 & v3 &char 3 of l3 is not in "FFF,LLL,HHH,FLH,FHL,LFH,HFL,LHF,HLF" then next repeat
        if u4 & v4 &char 4 of l3 is not in "DDD,OOO,SSS,DOS,DSO,ODS,SDO,OSD,SOD" then next repeat
        put cr & l1 &space& l2 &space& l3 after fOut
      end repeat
    end repeat
  end repeat
  -- end repeat -- end your alternate method here
  unlock screen; unlock messages; put (the millisecs -m1)
  put char 2 to -1 of fOut into fld id 1025
end mouseUp
I have here, without the (avoidable) read/write from/to a field, when using

> LC 6.7.11: 6 ms for 20 repeats
> LC 8.1.1: 19 ms for 20 repeats
> LC 9.0.0: 20 ms for 20 repeats

Good luck with evaluating the 70'724'320'184'700 different samples of 12 different words out of the 81 words.
shiftLock happens

rkriesel
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 119
Joined: Thu Apr 13, 2006 6:25 pm

Re: Pattern Matching in Livecode

Post by rkriesel » Thu Nov 17, 2016 3:15 am

capellan wrote:Hopefully the array solution would appear too,
anytime soon.
Here's a solution using arrays.

Code: Select all

local sSetsForCard -- sSetsForCard[ tCard ][ tSet ] = "true" -- which sets contain a given card

function getSetsForShownCards pShownCards -- pShownCards is a return-delimited list; so is the result
  local tCountForSet -- tCountForSet[ tSet ] = tCount -- how many shown cards for a given set
  local tShownSets -- tShownSets[ tSet ] = "true" -- sets that contain three of the shown cards
  
  repeat for each line tCard in pShownCards -- for each shown card
    repeat for each key tSet in sSetsForCard[ tCard ] -- for each set that contains the card
      add 1 to tCountForSet [ tSet ] -- increment the set’s count of shown cards
      if tCountForSet [ tSet ] is 3 then -- if the set has three shown cards then
        put "true" into tShownSets[ tSet ] -- remember it
      end if
    end repeat
  end repeat

  return the keys of tShownSets -- the result is a return-delimited list
end getSetsForShownCards
Script-local array sSetsForCard is only a transformation of the field of sets. (So it could be stored in the stack rather than being generated at run time.)

Code: Select all

command initialize pSets -- pSets is a return-delimited list of sets; each set is a space-delimited list of cards
    repeat for each line tSet in pSets
        get word 1 to 3 of tSet
        put "true" into sSetsForCard[ word 1 of it ][ it ]
        put "true" into sSetsForCard[ word 2 of it ][ it ]
        put "true" into sSetsForCard[ word 3 of it ][ it ]
    end repeat
end initialize
In my testing in Livecode 8.1.1 on macOS 10.12.1, function getSetsForShownCards is usually slightly faster than Thierry's first regex solution:
dk1: 0.82 tdz1: 0.9 tdz2: 4 hh1: 33 hh2: 238
dk1: 0.9 tdz1: 0.93 tdz2: 4 hh1: 25 hh2: 237
dk1: 0.89 tdz1: 0.95 tdz2: 3 hh1: 25 hh2: 236
dk1: 0.89 tdz1: 0.93 tdz2: 3 hh1: 24 hh2: 236
dk1: 0.88 tdz1: 0.89 tdz2: 4 hh1: 25 hh2: 236
dk1: 0.88 tdz1: 0.93 tdz2: 3 hh1: 25 hh2: 245
dk1: 0.89 tdz1: 0.9 tdz2: 3 hh1: 24 hh2: 238
dk1: 0.91 tdz1: 0.94 tdz2: 3 hh1: 25 hh2: 239
dk1: 0.94 tdz1: 0.92 tdz2: 4 hh1: 24 hh2: 241
dk1: 0.89 tdz1: 0.9 tdz2: 3 hh1: 24 hh2: 238
Here's a new version of Al's stack, with the new version of Thierry's timing test that produced the above results.
Code I added or changed is only in the card script and the script of button "timing tests."
MatchingPatternsv05.livecode.zip
(18.54 KiB) Downloaded 260 times
What do you think, Al?

-- Dick

capellan
Posts: 654
Joined: Wed Aug 15, 2007 11:09 pm

Re: Pattern Matching in Livecode

Post by capellan » Tue Nov 22, 2016 1:02 am

Hi Dick Kriesel,

I feel humbled and deeply grateful of all of you
for sharing your time and expertise.

This thread is a great lesson about the many
ways that LiveCode offers to solve a simple
task like searching a list of characters within
a larger list.

With 3 milliseconds, Hermann latest handler
take the lead in my tests.
Just 4 milliseconds requires Dick Kriesel
handler and Thierry's regex only takes
5 milliseconds.

Although applying these methods in such small
amount of data seems overkill, all these methods
are really useful if you have to search for a few
words among millions of lines of data.
Still there is a lot to learn and practice for me.

By the way, I have found 11 combinations from 12 cards
(that you could find in the attached stack) using
this empirical method:

1) start with a group of 12 cards that produce 4 or more combinations
2) store the cards that produce these combinations (maybe 8 or 9 cards)
3) replace one by one the rest of the cards (only 4 or 3 cards) with cards
that produce more combinations.

For example: If a group of 8 cards produces 4 combinations, then
add 1 more (now, we have 9 cards) and check how many additional
combinations we could get and... repeat again.
Add another card (we have 10 cards) and check again
how many additional combinations we could get and...
repeat again with 11 cards... and again until you complete
all 12 cards. Store results and export as text. :D

I am sure that exists a surprisingly simple algorithm
that could find 11 valid combinations from
a set of 12 cards, but until someone publish it
this is the only way that I know. :)

Al
Matching Patternsv05B.livecode.zip

Post Reply