Implementing full regex in LiveCode

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US

Re: Implementing full regex in LiveCode

Post by mwieder » Thu Jul 11, 2019 11:29 pm

@Kaveh-

Here you go:

function convertBackwards pInputString
   local tNumber, tText
   repeat while matchtext(pInputString, " (\d+)([A-Z]+) ", tNumber, tText)
      replace (space & tNumber & tText & space) with (space & tText && tNumber & space) in pInputString
   end repeat
   return pInputString
end convertBackwards

kaveh1000
Livecode Opensource Backer
Posts: 508
Joined: Sun Dec 18, 2011 7:23 pm
Location: London

Re: Implementing full regex in LiveCode

Post by kaveh1000 » Fri Jul 12, 2019 12:48 am

Thank you so much Mark. This is a great test. So I have produced a test stack, attached.

There are 5000 lines to change in the top field (4998 unique). This takes some 10 seconds in LiveCode. In BBEdit, using

(\d+)([A-Z]+) > \2 \1

it is instantaneous (less than 1/4 sec, I estimate). Regex is fast!!

So the challenge is how to make this faster. Can it be done without native implementation of Regex?

Love to hear expert views.
Attachment: regex test.zip (15.32 KiB)
Kaveh

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 2:23 am

OK - this gets it down to about 250 milliseconds on my computer (I moved tReplaced to a script variable, sReplaced, and I lock the screen before calling the function and unlock it afterwards):

function convertBackwards2 pInputString
   local tNumber, tText
   local tReturnString
   
   put 0 into sReplaced
   set the itemdelimiter to space
   repeat for each item tItem in pInputString
      if matchtext(tItem, "(\d+)([A-Z]+)", tNumber, tText) then
         put (tText && tNumber) & space after tReturnString
         add 1 to sReplaced
      else
         put tItem & space after tReturnString
      end if
   end repeat
   delete char -1 of tReturnString
   return tReturnString
end convertBackwards2
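
A note for anyone trying this: the function above only compiles cleanly if sReplaced is declared as a script-local variable in the same script. Below is a minimal calling sketch, assuming a button script that also contains convertBackwards2 and two fields named "Input" and "Output". Those field names are made up for illustration and are not taken from the test stack.

```livecode
-- Script-local counter; convertBackwards2 (in this same script) updates it.
local sReplaced

on mouseUp
   local tStart, tResult
   put the milliseconds into tStart
   lock screen
   -- Field names "Input" and "Output" are hypothetical.
   put convertBackwards2(field "Input") into tResult
   put tResult into field "Output"
   unlock screen
   -- With no destination, "put" writes to the message box.
   put sReplaced && "replacements in" && (the milliseconds - tStart) && "ms"
end mouseUp
```

Reading the field once and writing the result once keeps field access out of the loop, so the timing mostly reflects the conversion itself.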

kaveh1000

Re: Implementing full regex in LiveCode

Post by kaveh1000 » Fri Jul 12, 2019 8:21 am

This is astonishing, Mark. Thank you again. A lot of food for thought. My first reaction:
  • Changing the local variable to a script variable has little or no effect
  • Locking the screen has little or no effect (there is only one screen update), but it is good practice
  • This is a huge improvement, but it is tied to this particular search. For example, what if there was no space before and after the string? I have a feeling there will be an answer, but I'm just thinking aloud
Very exciting; I will mull it over as soon as I have a chance.

Kaveh

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: Implementing full regex in LiveCode

Post by bogs » Fri Jul 12, 2019 11:27 am

I'm curious: as well as locking the screen, wouldn't locking messages while it is running, then unlocking them after it is done, reduce the time? Or do I misunderstand message locking (among other things)? :D

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 4:25 pm

Locking messages wouldn't have any effect here because there aren't any side effects that would invoke any other messages. My guess about the speed increase:

The matchtext function is notoriously slow, and is described that way in the dictionary notes. But it does allow you to use regex in the matching string. I'm guessing that pointing it at a single word, even though it's called more often, is faster than pointing it at the entire text. Even so, I'm surprised to end up with a 40x increase in speed.

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 4:29 pm

@Kaveh-
"what if there was no space before and after the string?"
All I have to work with is the text in the sample you posted. I have no idea what your real-world context might be like. The regex might have to be tweaked if you have other restrictions.

jacque
VIP Livecode Opensource Backer
Posts: 7215
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Re: Implementing full regex in LiveCode

Post by jacque » Fri Jul 12, 2019 4:43 pm

Could you repeat for each word instead of items? That would ignore extraneous spaces at both ends as well as internally.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 5:12 pm

That opens up other problems: by-word instead of by-item removes the crlfs so you'd have to add extra code to take care of that.

There's still the problem of the target text being at the end of a sentence because periods would be removed from the converted text:

Lorem ipsum dolor sit 1235ABCD feugait nulla facilisi 1235ABCD.
Lorem ipsum dolor sit 1236ABCD feugait nulla facilisi.

==>

Lorem ipsum dolor sit ABCD 1235 feugait nulla facilisi ABCD 1235 ipsum dolor sit ABCD 1236 feugait nulla facilisi.
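
One possible way around the lost punctuation and line breaks, offered as a sketch rather than a tested fix: walk the text line by line, split each line on spaces, and let the regex capture whatever sits before and after the number/letter pair so it can be put back. The handler name convertBackwards3 and the extra capture variables tBefore and tAfter are invented for this example.

```livecode
function convertBackwards3 pInputString
   local tNumber, tText, tBefore, tAfter
   local tReturnString, tNewLine
   
   set the itemDelimiter to space
   repeat for each line tLine in pInputString
      put empty into tNewLine
      repeat for each item tItem in tLine
         -- Capture any leading and trailing characters (e.g. a period)
         -- so they can be re-attached around the swapped pair.
         if matchText(tItem, "^(.*?)(\d+)([A-Z]+)(.*)$", tBefore, tNumber, tText, tAfter) then
            put tBefore & tText && tNumber & tAfter & space after tNewLine
         else
            put tItem & space after tNewLine
         end if
      end repeat
      -- Remove the extra space added after the last item of the line.
      if tNewLine is not empty then delete char -1 of tNewLine
      put tNewLine & return after tReturnString
   end repeat
   -- Remove the extra return added after the last line.
   if tReturnString is not empty then delete char -1 of tReturnString
   return tReturnString
end convertBackwards3
```

The intent is that an item such as 1235ABCD. comes out with its period still attached and the following line stays on its own line. A trailing space at the end of a line and a final trailing return are not preserved by this sketch, and it has not been timed against the earlier versions.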

kaveh1000

Re: Implementing full regex in LiveCode

Post by kaveh1000 » Fri Jul 12, 2019 5:16 pm

Regex is completely general, which is why it is so powerful, and you can do anything with it. But I have a good feeling we can find a good LC solution.

I have a rough idea: first put a unique delimiter around the strings that need to be changed, then look only at those. So first convert

Lorem ipsum dolor sit amet, 1234ABC consectetuer
adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore
zzril delenit 678XYZ augue duis dolore te feugait nulla
facilisi.
to

Lorem ipsum dolor sit amet, •1234ABC• consectetuer
adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore
zzril delenit •678XYZ• augue duis dolore te feugait nulla
facilisi.
assuming • will not appear in the text. Then use • as the itemdelimiter, do the replace, and remove the • characters.

This way we have fewer items.

Just a rough thought. I have a project for the next few days but will dive in next week. Wishing all you great folks a happy weekend. :-)
Kaveh

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 5:19 pm

Well, as you so aptly said,
nulla facilisi.

jacque

Re: Implementing full regex in LiveCode

Post by jacque » Fri Jul 12, 2019 6:11 pm

Maybe "trueWord" then. It ignores spaces, punctuation, etc. I'm just thinking out loud.

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Fri Jul 12, 2019 9:30 pm

Yep - same problem. Using "trueword" ignores punctuation, so the punctuation (period, comma, crlf, etc) never makes the comparison list and gets lost.

jacque

Re: Implementing full regex in LiveCode

Post by jacque » Sat Jul 13, 2019 4:57 pm

"assuming • will not appear in text, then use • as the itemdelimiter, do the replace, then remove •"
My favorite delimiters are ascii 3 and 8. They can't be typed, so neither will ever be in the text a user creates.

mwieder

Re: Implementing full regex in LiveCode

Post by mwieder » Sat Jul 13, 2019 6:24 pm

Yeah, mine too. But the problem with this approach in this situation is that if you've already located the text objects you want to delimit with non-printable characters, then you've already solved the problem you're trying to solve by delimiting them, and you might as well convert them in situ.
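
To make "convert them in situ" concrete, here is one rough sketch that uses matchChunk to get the character positions of each match and rebuilds the text as it goes. The handler name convertInSitu and all variable names are invented for this example. Unlike the earlier functions it does not require spaces around the target, so it behaves more like the plain BBEdit pattern.

```livecode
function convertInSitu pInputString
   local tNumStart, tNumEnd, tTextStart, tTextEnd
   local tRest, tResult
   
   put pInputString into tRest
   -- matchChunk fills a start/end pair of positions for each capture group.
   repeat while matchChunk(tRest, "(\d+)([A-Z]+)", tNumStart, tNumEnd, tTextStart, tTextEnd)
      -- Copy everything before the match unchanged.
      put char 1 to (tNumStart - 1) of tRest after tResult
      -- Emit the letters, a space, then the digits.
      put (char tTextStart to tTextEnd of tRest) && (char tNumStart to tNumEnd of tRest) after tResult
      -- Drop the consumed text so the next search starts after this match.
      delete char 1 to tTextEnd of tRest
   end repeat
   -- Whatever is left contains no further matches.
   put tRest after tResult
   return tResult
end convertInSitu
```

Punctuation and line breaks are never touched because only the matched characters are rearranged. Whether this is any faster than the item-based version is an open question: deleting from the front of a long string on every match has its own cost, and the sketch has not been timed.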
