Find text between two different delimiters/tags using Regex

golife
Posts: 103
Joined: Fri Apr 02, 2010 12:10 pm

Find text between two different delimiters/tags using Regex

Post by golife » Sat Apr 03, 2021 9:39 am

I frequently need to extract the text between two different tags or delimiters of varying length.

I would like to use a regex for this, but although I tried various published patterns, I could not get any of them working. Maybe someone has an idea how to rewrite the function below as a regex?

My LiveCode function looks like this:

Usage:
... Extract a number enclosed in parentheses "(" and ")"
... Extracting text between a start tag, for example "---", and an ending tag "---/"
... Defining one's own tags to extract text between such tags
... Extract tags from HTML or any other tagged source code

Note:
This function only returns the FIRST instance in a string. To go through a whole document, the function needs to be extended.

Code: Select all

// Example 1: Using "(" as the beginning tag and ")" as the ending tag
put "James hit the ball in the (23)rd soccer game in Milano." into tText
put filterTags (tText, "(,)" )

Code: Select all

// Example 2: Using beginning tag "---" and "---/" as ending tag
put "James played the ball at the ---23rd---/ soccer game." into tText
put filterTags (tText, "---,---/" )

Code: Select all

function filterTags pString,pDels
   ## Extract text that is between two different delimiters/tags
   
   if pString = "?" then 
      return "filterTags ( pString, pDels ). "& \
            "Param 'pDels': One or two delimiters ('tags','separators') "& \
            "as comma separated items."
   end if
  
   local a,b
   
   if the number of items of pDels is 2 then
      put item 1 of pDels into pDel1
      put item 2 of pDels into pDel2
   else
      return empty
   end if
   
   if pDel1 is pDel2 then return empty -- requires two tags that are different
   
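   put offset( pDel1 , pString ) into a
   if a = 0 then return empty -- start tag not found
   put a + length( pDel1 ) into a -- first char after the start tag
   
   -- look for the end tag only after the start tag; with a skip count,
   -- offset() returns a position relative to the skipped characters
   put offset( pDel2 , pString , a - 1 ) into b
   if b = 0 then return empty -- end tag not found
   put a + b - 2 into b -- last char before the end tag
   
   return char a to b of pString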
end filterTags
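
The note above says the function would need to be extended to walk a whole document. A minimal sketch of one way to do that, using the skip parameter of offset(). The name filterAllTags and its three-parameter signature are hypothetical, chosen for illustration, and are not part of the function posted above:

```livecode
function filterAllTags pString, pStartTag, pEndTag
   -- Hypothetical helper: returns every text run found between
   -- pStartTag and pEndTag, one per line.
   local tSkip, tRel, tFirst, tLast, tFound
   put 0 into tSkip
   repeat forever
      -- find the next start tag after the chars already consumed
      put offset(pStartTag, pString, tSkip) into tRel
      if tRel = 0 then exit repeat
      put tSkip + tRel + length(pStartTag) into tFirst -- first char after the start tag
      
      -- find the end tag that follows this start tag
      put offset(pEndTag, pString, tFirst - 1) into tRel
      if tRel = 0 then exit repeat
      put tFirst + tRel - 2 into tLast -- last char before the end tag
      
      put (char tFirst to tLast of pString) & cr after tFound
      put tLast + length(pEndTag) into tSkip -- continue after the end tag
   end repeat
   delete the last char of tFound -- drop the trailing return
   return tFound
end filterAllTags
```

Called as filterAllTags("a (1) b (22)", "(", ")"), this sketch should return 1 and 22 on separate lines. Passing the tags as separate parameters, rather than as one comma-separated item list, also allows a comma to be used as a tag.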

Regards, Roland (golife)

grzkmo
Posts: 8
Joined: Thu Aug 07, 2008 9:50 am

Re: Find text between two different delimiters/tags using Regex

Post by grzkmo » Sat Apr 03, 2021 10:52 am

Without regex:

Card with 2 Fields (f1, f2) and 1 Button

Text in fld "f1":

<div class="entry">
<h1>{{title}}</h1>
<div class="body">
{{body}}
</div>
</div>
A handlebars expression is a {{, some contents, followed by a }}



Code of field "f1":

Code: Select all

on mouseUp pMouseButton
  constant kLd = "}}"
  constant kId = "{{"
  put fld "f1" into tText
  set the linedelimiter to kLd
  set the itemdelimiter to kId
  repeat for each line tLine in tText
    put the last item of tLine & cr after tFound
  end repeat
  delete the last char of tFound
  put tFound into fld "f2"
end mouseUp
After the mouse click, the text of fld "f2" will be:

title
body
, some contents, followed by a

Best
Günter

grzkmo
Posts: 8
Joined: Thu Aug 07, 2008 9:50 am

Re: Find text between two different delimiters/tags using Regex

Post by grzkmo » Sat Apr 03, 2021 10:57 am

The label 'Code of field "f1":' in my post above has to be 'Code of the Button:'.

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Find text between two different delimiters/tags using Regex

Post by AxWald » Sat Apr 03, 2021 11:21 am

From my Library-Stack:

Code: Select all

function inBeet_BL theString, theStart, theEnd, offNum
   /* inBeet_BL() is a very fast function - it extracts data between 2 search hits.
   
   # theString (String): The data you're searching in;
   
   # theStart, theEnd (String): what is before and after your desired text,
   . assumed that theEnd comes AFTER theStart, and that at least theEnd is not empty;
   . If you look for something at the beginning of theString, leave theStart empty -
   . in case of mode 2 (2-line return) you need to set offNum to 0.
   
   # OffNum (Int, optional): skip [offNum] chars at the beginning of theString;
   . 2 modes of action, depending on the value of offNum:
   
   - Mode 1: When offNum is empty, it just returns the found string, or empty;
   - Mode 2: When offNum <> empty, it returns a 2-line result:
   .        line 1 is the pos of the last char touched in theString (= last char found of theEnd) 
   .        and line 2 is the found string. Use line 1 as offNum for your next repeat!
   
   What negative values do for offNum is left as an exercise - it has a (strange) use too!
   axwald @ forums.livecode.com, GPL v3, 10/2020   */
   
   if (offNum is not empty) then
      if theStart is empty then 
         put 1 into myStart -- nothing to search for: the wanted text starts at char 1
      else
         put offset(theStart,theString,offNum) + len(theStart) + offNum into myStart
      end if
      if myStart is (len(theStart) + offNum) then return empty
      put offset(theEnd,theString,myStart) + (myStart)-1 into myEnd
      if myEnd is (myStart)-1 then return empty
      return myEnd & CR & char myStart to myEnd of theString
   else
      return char (offset(theStart,theString) + len(theStart)) to \
            (offset(theEnd,theString,(offset(theStart,theString) + len(theStart))) \
            + (offset(theStart,theString) + len(theStart))-1) \
            of theString
   end if
end inBeet_BL
Have fun ;-)

PS: I don't use RegExp - I just detest the cryptic syntax. But they are mighty tools for those that can bear 'em - I'm sure Thierry will chime in with some breathtaking solution :)
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

stam
Posts: 2634
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Find text between two different delimiters/tags using Regex

Post by stam » Sat Apr 03, 2021 1:34 pm

I think the regex to match text between two different delimiters (delim1 and delim2 in this example) would be:

Code: Select all

(?<=delim1)(.*)(?=delim2)
hope that helps..

dunbarx
VIP Livecode Opensource Backer
Posts: 9578
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Find text between two different delimiters/tags using Regex

Post by dunbarx » Sat Apr 03, 2021 3:43 pm

What am I missing here? Why does this not do what is required? With a field 1 with text in it, and a button with this in its script:

Code: Select all

on mouseUp
   answer twoTags(fld 1,comma,"epoch")
end mouseUp

function twoTags tText,firsttag,secondTag
   set the itemDel to firstTag
   put item 2 to 1000 of tText into temp
   set the itemDel to secondTag
   return item 1 of temp
end twoTags
I had this in my own field 1:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.
I got back this:
it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the
Craig

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Find text between two different delimiters/tags using Regex

Post by Thierry » Sat Apr 03, 2021 4:09 pm

stam wrote:
Sat Apr 03, 2021 1:34 pm

Code: Select all

(?<=delim1)(.*)(?=delim2)
hope that helps..
Hello Stam,

here is a little saturday night quiz...

I've replaced delim1 and delim2 by _1 and _2

Code: Select all

on mouseUP
   constant T = "_1qwerty_2 _1asdf_2"
   
   put "Text in: " &tab& T &cr& \
         "regex:" &tab& "result:" &cr into fld 1
   
   constant rexStam1 = "(?<=_1)(.*)(?=_2)"
   if matchText( T, rexStam1, gotIt) then
      put rexStam1 &tab& gotIt &cr after fld 1
   end if
   
   constant rexStam2 = "_1(.*)_2"
   if matchText( T, rexStam2, gotIt) then
      put rexStam2 &tab& gotIt &cr after fld 1
   end if
   
   -- constant rexTdz1 = "?????"
   if matchText( T, rexTdz1, gotIt) then
      put "rexTdz1" &tab& gotIt &cr after fld 1
   end if
end mouseUP
and the corresponding results:

[Screenshot of the results: screenshot 2021-04-03 à 16.50.16.jpg]

Hint: I added 1 meta char in your regex. so what's in rexTdz1 ? :)

Of course, this doesn't resolve the OP question, as he wants all the pattern occurrences,
and not only the 1st.

Enjoy or not :)

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

dunbarx
VIP Livecode Opensource Backer
Posts: 9578
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Find text between two different delimiters/tags using Regex

Post by dunbarx » Sat Apr 03, 2021 4:47 pm

Thierry wrote:
Of course, this doesn't resolve the OP question, as he wants all the pattern occurrences,
and not only the 1st.
I did not know that. But anyway, like AxWald, I do not love regex either (though I admire it). This "old fashioned" gadget works. I had:
abc, def epoch, hij epoch klm, nop epoch qrs , tuv epoch wxyz epoch
in field 1 and this in the button script. Comma and "epoch" are the two "tags". A little recursion never hurts:

Code: Select all

on mouseUp
   answer twoTags(fld 1,comma,"epoch")
end mouseUp

function twoTags ttext,firstTag,secondTag,accum
   set the itemDel to firstTag
   put item 2 to 10000 of tText into temp
   set the itemDel to secondTag
   put item 1 of temp after accum
   
   delete item 1 of tText
   if firstTag is in tText and secondTag is in tText then
      return twoTags(tText,firstTag,secondTag,accum) -- hand the accumulated result back up explicitly
   else
      return accum
   end if
end twoTags
I get:
def hij nop tuv
Craig

stam
Posts: 2634
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Find text between two different delimiters/tags using Regex

Post by stam » Sat Apr 03, 2021 4:56 pm

Thierry wrote:
Sat Apr 03, 2021 4:09 pm
here is a little saturday night quiz...
...
Hint: I added 1 meta char in your regex. so what's in rexTdz1 ? :)
Thierry
Hi Thierry - and thanks for jumping in and correcting me ;)
I forgot to add the non-greedy operator '?'
code should be

Code: Select all

(?<=delim1)(.*?)(?=delim2)
Here's an example using the non-greedy operator, using the text Craig posted above:
regx.jpg
Using (.*) instead of (.*?) finds one long group, from the first delim1 to the last delim2:
greedy.jpg
Adding the non-greedy operator finds the individual groups - thank you for correcting me, i always find regex so useful but so difficult lol....
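
The difference between the greedy and the non-greedy quantifier can also be checked directly with matchText. A minimal sketch, assuming Thierry's test string from earlier in the thread and a field 1 on the card to show the output. The name firstBetween is a hypothetical helper, not a built-in:

```livecode
function firstBetween pText, pPattern
   -- hypothetical helper: returns the first capture group
   -- of pPattern in pText, or empty if nothing matches
   local tCaptured
   if matchText(pText, pPattern, tCaptured) then
      return tCaptured
   end if
   return empty
end firstBetween

on mouseUp
   local tText
   put "_1qwerty_2 _1asdf_2" into tText
   -- greedy: .* runs on to the LAST "_2" in the string
   put "greedy:" && firstBetween(tText, "_1(.*)_2") & cr into fld 1
   -- non-greedy: .*? stops at the FIRST "_2" it can reach
   put "non-greedy:" && firstBetween(tText, "_1(.*?)_2") after fld 1
end mouseUp
```

With this input the greedy pattern should capture qwerty_2 _1asdf, while the non-greedy one captures only qwerty.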

SparkOut
Posts: 2839
Joined: Sun Sep 23, 2007 4:58 pm

Re: Find text between two different delimiters/tags using Regex

Post by SparkOut » Sat Apr 03, 2021 6:21 pm

I always find regex difficult until Thierry helps out. Then I wonder what was so hard, but with a long time between needing to use regex it gets difficult again the next time, until Thierry swoops* in again.

*I don't need to link the xkcd strip again do I?

Edit: Oh alright then ...

[Image: xkcd strip]

Substitute LiveCode for PERL

stam
Posts: 2634
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Find text between two different delimiters/tags using Regex

Post by stam » Sat Apr 03, 2021 7:15 pm

SparkOut wrote:
Sat Apr 03, 2021 6:21 pm

*I don't need to link the xkcd strip again do I?

Edit: Oh alright then ...
that's so brilliant lol!

golife
Posts: 103
Joined: Fri Apr 02, 2010 12:10 pm

Re: Find text between two different delimiters/tags using Regex

Post by golife » Sun Apr 04, 2021 8:53 am

When I first posted the question, I had in mind that Thierry might jump in. He did ... :D

I feel like a bloody beginner regarding Regex. Nevertheless, it does things fast, and sometimes it is just the best solution. But it is also a black box for most of us, since we do not spend the time to study it deeply enough. Yes, it looks extremely cryptic. But how would it look if it did not use such a condensed way of parameterization?

I swear to myself to take some time to study it more deeply with examples and trial-and-error.

Very helpful posts here.. 8)

Thanks to all ..., Roland (golife)

dunbarx
VIP Livecode Opensource Backer
Posts: 9578
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Find text between two different delimiters/tags using Regex

Post by dunbarx » Sun Apr 04, 2021 5:13 pm

Hi.

Regex is very powerful and compact. It can do in one line what would take "ordinary" LC several. I spent a little time working through some basics. However, I never use it. Too old and not smart enough, you know.

But did you check out the second "twoTags" handler above? I think it does what you want, the old fashioned way.

Craig

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Find text between two different delimiters/tags using Regex

Post by Thierry » Mon Apr 05, 2021 11:57 am

stam wrote:
Sat Apr 03, 2021 4:56 pm
Hi Thierry - and thanks for jumping in ;)
I forgot to add the non-greedy operator '?'
code should be

Code: Select all

(?<=delim1)(.*?)(?=delim2)
please don't give too much heavy food to my 'sick-hatred-anti-regex' friends :roll:
this will work too with less typing: [1]

Code: Select all

delim1(.*?)delim2
here's an example using the non-greedy operator
file.jpg
Mmm, still some more work to do to get the same result in LiveCode :wink:
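
One way the remaining work might be done is to call matchChunk in a loop and cut off the part of the text already searched. This is a sketch under stated assumptions, not the result shown in the image above: allBetween is a hypothetical name, and the \Q...\E literal quoting is assumed to be honoured by the PCRE library behind LiveCode's regex functions, so that delimiters such as "(" are not read as regex syntax.

```livecode
function allBetween pText, pDelim1, pDelim2
   -- Hypothetical helper: collects every non-greedy match between
   -- pDelim1 and pDelim2, one per line.
   local tPattern, tStart, tEnd, tFound
   -- empty delimiters would match without consuming any text
   if pDelim1 is empty or pDelim2 is empty then return empty
   
   -- \Q...\E (assumed PCRE feature) treats the delimiters as literal text
   put "\Q" & pDelim1 & "\E(.*?)\Q" & pDelim2 & "\E" into tPattern
   
   -- matchChunk puts the start and end position of the group into tStart and tEnd
   repeat while matchChunk(pText, tPattern, tStart, tEnd)
      put (char tStart to tEnd of pText) & cr after tFound
      -- drop everything up to and including this end delimiter, then search again
      delete char 1 to (tEnd + length(pDelim2)) of pText
   end repeat
   delete the last char of tFound -- drop the trailing return
   return tFound
end allBetween
```

For example, allBetween("_1qwerty_2 _1asdf_2", "_1", "_2") should return qwerty and asdf on separate lines. Since "." does not match a line break by default, matches that span several lines would need a different pattern.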

Happy Easter :)

Thierry

[1] all my comments with regex are in a LiveCode context only!

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Find text between two different delimiters/tags using Regex

Post by Thierry » Mon Apr 05, 2021 12:13 pm

SparkOut wrote:
Sat Apr 03, 2021 6:21 pm
*I don't need to link the xkcd strip again do I?

Edit: Oh alright then ...
Hi SparkOut, funny but that's an old story and only one side of the play :)

So, look at what was happening after that:

tarzan-liane.gif
(click the gif to see it in action...)

Happy Easter,

Thierry
