Difficult find and replace problem (for me)


anmldr
Posts: 459
Joined: Tue Sep 11, 2012 11:13 pm

Difficult find and replace problem (for me)

Post by anmldr » Sat Jun 13, 2020 9:27 pm

I would like to have a search and replace button that can find a string BETWEEN two items. For instance, say I have 10 HTML files in a folder. The files may contain links to websites that have "James" in the URL, and I don't want to break the link(s). But in what is displayed on the HTML page, I would want it to be changed to "John".

So, in theory, I would search all files in a folder. I want to find all instances of "James" in the HTML files. There may be links that look something like this:

<a href="http://w w w.JamesWebSite.com/">James</a> (The only way that I could get this forum to NOT treat this as a real link was to add spaces between the "www".)

I want to replace "James" with "John" if it is between these portions of the anchor tags
/"> & </a>

AND if "James" is located somewhere in the URL, it would not be replaced.

So, Find and Replace "James" with "John" if the string is located between /"> and </a> and not anywhere else in the files.

Thanks. I am beating my head against the wall trying to figure this one out.

Linda

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Difficult find and replace problem (for me)

Post by SparkOut » Sat Jun 13, 2020 11:12 pm

I am sure this is easier than I made it, and I know for a fact that it would take Thierry about 3 seconds to produce an elegant one-liner, but here's a function I adapted from something I had in the past (I had help from Thierry then):

Code: Select all

function searchAndReplace pHaystack,pNeedle,pNewText
   put "(?ms)<a href=.http.+?" & pNeedle & ".+?>(" & pNeedle & ")</a>" into tBigNeedle
   repeat until matchChunk(pHaystack,tBigNeedle,tStart,tEnd) is false
      put pNewText into char tStart to tEnd of pHaystack
   end repeat
   return pHaystack
end searchAndReplace
If you feed the function with the contents of the file that you want to search for the anchor text to replace, then in your example you could do this:

Code: Select all

put url ("file:" & <path to the original html file to search>) into tHaystack
put "James" into tNeedle --for clarity - you could obviously specify this directly in the function call
put "John" into tNewText --for clarity - you could obviously specify this directly in the function call
put searchAndReplace(tHaystack,tNeedle,tNewText) into url ("file:" & <the path to the file for updated haystack content>)
The first line of the function is hard coded to build the regex to become the "big" needle to search for in the haystack.
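(?ms) sets two flags: m is multi-line mode (^ and $ also match at line breaks) and s is single-line or "dot-all" mode ("." also matches line breaks). The s flag is the one that matters here, since html is whitespace agnostic and a tag could be split over linebreaks.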
Then I have set to match some character patterns where the regex must find an anchor tag with <a href in it.
After that is a wildcard pattern of least hungriness ".+?" which skips text until it finds a match with "James" (pNeedle) BEFORE the closing bracket of the html tag, then more wildcards until it finds the closing bracket. The parenthesis "(" opens the recording of the position of the following matched text (pNeedle). Then there is a closing parenthesis and a requirement to match an html closing tag for the anchor element.
This means that you will get a match only when the anchor url contains James as well as the text ref containing James - in case you don't want to change

Code: Select all

<a href="www.peter.com">James</a>
The function repeats to swap all occurrences of the Needle in the Haystack.
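One thing to watch with the pattern above: under (?s) the "." wildcard also matches ">" and line breaks, so ".+?" can run past the end of one tag and into a later link. A minimal sketch of a tighter variant follows, assuming the needle contains no regex metacharacters. The name replaceLinkText is hypothetical and not part of the original function; "[^>]*" stands in for ".+?" so the wildcards stay inside a single tag.

```livecode
-- Hypothetical variant of searchAndReplace (a sketch, not the original code).
-- Assumes pNeedle holds plain text with no regex metacharacters.
function replaceLinkText pHaystack, pNeedle, pNewText
   local tPattern, tStart, tEnd, tResult
   if pNeedle is empty then return pHaystack
   -- [^>]* matches anything except ">", so the URL part cannot spill into the next tag
   put "(?s)<a href=.http[^>]*" & pNeedle & "[^>]*>(" & pNeedle & ")</a>" into tPattern
   put empty into tResult
   repeat while matchChunk(pHaystack, tPattern, tStart, tEnd)
      -- keep everything before the captured link text, then add the new text
      put (char 1 to (tStart - 1) of pHaystack) & pNewText after tResult
      -- carry on with the text that follows the captured link text;
      -- this also ends the loop when pNewText is the same as pNeedle
      delete char 1 to tEnd of pHaystack
   end repeat
   return tResult & pHaystack
end replaceLinkText
```

Called as replaceLinkText(tHaystack, "James", "John"), it is meant to change the link text only where the same tag's URL also contains "James", and to leave a link such as the peter.com one alone.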

anmldr
Posts: 459
Joined: Tue Sep 11, 2012 11:13 pm

Re: Difficult find and replace problem (for me)

Post by anmldr » Sun Jun 14, 2020 12:54 am

Thanks. I will try to digest this. It was only for learning’s sake anyway. I am not working on a project.

Before I try to deeply understand it, would it work if it were not HTML? Would it work on anything that you wanted the string “between” two chunks?

This may be over my head at the moment but it is still fun.

Linda

kdjanz
Posts: 300
Joined: Fri Dec 09, 2011 12:12 pm
Location: Fort Saskatchewan, AB Canada

Re: Difficult find and replace problem (for me)

Post by kdjanz » Sun Jun 14, 2020 1:34 am

Yes, regex comes with great power and great responsibility. It is a power saw on steroids that runs on 1000 volts and can do almost anything in the right hands. But the learning curve is vertical - at least for me.

https://regex101.com is a website where you can paste in a bunch of your own search text and then experiment with different regex expressions.

If you are on a Mac you can try the live regex in BBEdit v 13, showing all matches in a document as you type the regex.

There are many other resources because so many people need a refresher each time they need to pick up the tool again. But for text matching and searching there is no equal out there in the geek world.

Kelly

anmldr
Posts: 459
Joined: Tue Sep 11, 2012 11:13 pm

Re: Difficult find and replace problem (for me)

Post by anmldr » Sun Jun 14, 2020 1:43 am

I just had a look...I have not used BBEdit in a lot of years. The last version that I had was "2"!! I think that I am definitely behind the times on that one. Yes, I am on a Mac. It is old too. 2015 model.

Linda

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Difficult find and replace problem (for me)

Post by SparkOut » Sun Jun 14, 2020 5:10 am

anmldr wrote:
Sun Jun 14, 2020 12:54 am
would it work if it were not HTML? Would it work on anything that you wanted the string “between” two chunks?
In a word, yes. But the pattern to match would obviously need to be reconstructed - the example above is specific to your original problem. Regex is very powerful and versatile, and everything kdjanz said.

The built in function replaceText takes regex parameters and could be good to look up, especially if you want to experiment with slightly more basic setups.
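As a starting point for experiments, here is a minimal sketch of the replaceText route, assuming the text to change sits directly between two fixed delimiters (as in the original /"> and </a> case). The names replaceBetween and replaceInFolder are hypothetical. The pattern uses lookbehind and lookahead assertions, which PCRE (the regex library behind LiveCode's regex functions) supports, so the delimiters are checked but not replaced, and \Q...\E so the three strings are treated as literal text.

```livecode
-- Hypothetical helper: swap pOld for pNew only where it sits directly
-- between pBefore and pAfter. Assumes none of the strings contains "\E".
function replaceBetween pText, pBefore, pAfter, pOld, pNew
   local tPattern
   -- (?<=...) must come right before the match, (?=...) right after it
   put "(?<=\Q" & pBefore & "\E)\Q" & pOld & "\E(?=\Q" & pAfter & "\E)" into tPattern
   return replaceText(pText, tPattern, pNew)
end replaceBetween

-- Hypothetical command: apply the helper to every .html file in pFolder.
-- Assumes the files are plain text in the platform's native encoding;
-- try it on copies first, because each file is overwritten.
command replaceInFolder pFolder, pOld, pNew
   local tOldFolder, tFiles, tFile, tPath, tHtml
   put the defaultFolder into tOldFolder
   set the defaultFolder to pFolder
   put the files into tFiles
   set the defaultFolder to tOldFolder
   repeat for each line tFile in tFiles
      if tFile ends with ".html" then
         put pFolder & "/" & tFile into tPath
         put url ("file:" & tPath) into tHtml
         put replaceBetween(tHtml, "/" & quote & ">", "</a>", pOld, pNew) into url ("file:" & tPath)
      end if
   end repeat
end replaceInFolder
```

For the example in the first post that would be something like replaceInFolder tFolderPath, "James", "John". Because the name has to be the whole text between the two delimiters, a "James" inside the URL is left alone.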

anmldr
Posts: 459
Joined: Tue Sep 11, 2012 11:13 pm

Re: Difficult find and replace problem (for me)

Post by anmldr » Sun Jun 14, 2020 7:15 am

Thank you. I will spend some time on it...

Linda

richmond62
Livecode Opensource Backer
Posts: 9385
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Difficult find and replace problem (for me)

Post by richmond62 » Sun Jun 14, 2020 7:32 am

I am on a Mac. It is old too. 2015 model.
Really? I do the vast majority of my work on 2 2006 iMacs.

But, when I work with BBC BASIC I use my 1981 BBC MODEL B.

richmond62
Livecode Opensource Backer
Posts: 9385
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Difficult find and replace problem (for me)

Post by richmond62 » Sun Jun 14, 2020 7:39 am

<a href="http://w w w.JamesWebSite.com/">James</a>
AND if "James" is located somewhere in the URL, it would not be replaced.
Well, if your example URL is anything to go on ALL you need to do is to differentiate between
'JamesWebSite' and 'James' . . .

So, I suggest you look up both 'word' and 'trueWord' in the Documentation, and
stop over-complicating things. 8)

pseudoCode:

Code: Select all

 if XXX contains trueWord "James" then
      replace trueWord "James" with "John" in XXX
   end if
The main problem is that that does not work either with 'trueWord' or with 'word'.

This is possibly because 'James' is not space-delimited.

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Difficult find and replace problem (for me)

Post by Thierry » Sun Jun 14, 2020 9:03 am

anmldr wrote:
Sun Jun 14, 2020 7:15 am
Thank you. I will spend some time on it...
Hello Linda,

+42 for everything Sparkout and Kelly said.

Here are a few comments off the top of my head:

- You can get the free version of BBEdit 13 (fewer features than the full one)

- If your interest is to learn regex,
I kindly suggest that you start with some tutorials first,
then try to learn the basic syntax;
I believe there are around 30 or so meta-characters to remember.
Finally, practice a lot! or said differently, learn by doing.
It's all about a mind shift,
as we are dealing with patterns, sub-patterns,..

- Personally, when I'm stuck or need to refresh my memory,
most of the time I use this excellent resource:
https://www.regular-expressions.info/tutorial.html


- I find this minimalist MacOS app quite handy,
focusing on regex without any distractions.
You can get a free version at https://www.apptorium.com/expressions
and here is a screenshot with SparkOut's solution. You will have to adapt a bit
the working regex for LiveCode, but when you know how, it's easy.

Expressions screenshot.jpg

Finally, as a side note,
I find giving pills to my sick cat much more complicated...

I wish you an excellent Sunday,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

richmond62
Livecode Opensource Backer
Posts: 9385
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Difficult find and replace problem (for me)

Post by richmond62 » Sun Jun 14, 2020 9:12 am

Well, "fudge is as fudge does" . . . 8)
Screenshot 2020-06-14 at 11.08.44.png

Code: Select all

on mouseUp
   put empty into fld "fOUTPUT"
   put fld "fINPUT" into XXX
   -- "replace" changes every occurrence in one pass, so no loop is needed;
   -- the ">" and "<" around the name keep the URL untouched
   replace ">James<" with ">John<" in XXX
   put XXX into fld "fOUTPUT"
end mouseUp
This is because there are no words 'James' anywhere in your string.
Attachment: Reee Placer.livecode.zip (1.16 KiB). Here's the stack.
Last edited by richmond62 on Sun Jun 14, 2020 10:16 am, edited 1 time in total.

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Difficult find and replace problem (for me)

Post by SparkOut » Sun Jun 14, 2020 9:35 am

Hi Thierry :D

Oh, I also forgot to mention above, that in the Scripting Conferences there is some great info about text munging, and does cover some regex.

http://www.hyperactivesw.com/revscriptc ... ences.html

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Difficult find and replace problem (for me)

Post by AxWald » Sun Jun 14, 2020 2:10 pm

Hi,

RegEx was always beyond my limited scope. Too many hermetic symbols to remember, too rigid a syntax, too enigmatic the code - too many possibilities to completely "fraggle up", for an old coder with an average brain like me, at least.

"Offset()" on the other hand is a thing of sheer beauty - simple, elegant, powerful.
And sometimes you even understand what these offset arithmetic concatenations do, more or less!

A while ago I wrote an offset() based function to extract "what's between two strings". After long consideration I baptized it "inBeet()" :)

This one I use now to find every "James" as text part of a href in a list of those, for simplicity. This could be a whole web site source, or any other text. InBeet (the search function) is quite fast, but the replacement loop in the mouseUp handler could be improved when it comes to real phat data.

Code: Select all

on mouseUp
   put "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Jim</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Robert</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Jane</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Frank</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Susanna</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Anthony</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Mathilda</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" into myStr
   /* => myStr:
   <a href="http://w w w.aWebSite.com/">James</a>
   <a href="http://w w w.aWebSite.com/">Jim</a>
   <a href="http://w w w.aWebSite.com/">Robert</a>
   [ ... ] */
   
   put "James" into myFind                                           --  what we look for
   put "John" into myRepl                                         --  this is the replacement
   
   put 0 into myOff
   put 0 into myCnt
   repeat until the controlkey is down                                --  so we can interrupt
      put inBeet(myStr,"<a href=","</a>",myOff) into myRes           --  find a href construct
      if myRes is empty then exit repeat                           --  not found or we're through
      put line 1 of myRes into myOff                               --  = pos. of last char of foundChunk
      if (line 2 of myRes ends with (myFind)) then                   --  that's a hit!
         put myRepl into char (myOff - len(myFind) + 1) to myOff of myStr  --  so replace
         add len(myRepl) - len(myFind) to myOff                           --  correct myOff
         add 1 to myCnt
      end if
   end repeat
   
   answer information "Replaced '" & myFind & "' with '" & myRepl & "' in the String." & CR & \
         "Find count: " & myCnt & CR & CR & "Result: => message box" titled "woOOOot!"
   put myStr
end mouseUp

function inBeet theString, theStart, theEnd, offNum
   /* "inBeet" extracts what's between "theStart" and "theEnd" in "theString",
   .  starting its search at char "offNum" of "theString". It's fast :)
   
   # theString: a string that can be quite long.
   # theStart & theEnd: strings containing what's before & after the search result.
   . It's assumed that theEnd comes AFTER theStart & that both are not empty ...
   # offNum: an offset integer, determining where to start a search. Or empty.
   
   The result comes in 2 flavors, depending on the content of offNum:
   - offNum is empty: inBeet returns the first found string.
   - offNum is not empty: inBeet returns the position of the last char of the
   . found string in theString (behind position offNum) & CR & the found string.
   
   Exercise to the esteemed reader: What happens with negative offNums? ;-)
   axwald @ forums.livecode.com, GPL v3  */
   
   if offNum is not empty then
      put offset(theStart,theString,offNum) + len(theStart) + offNum into myStart
      if myStart is (len(theStart) + offNum) then return empty
      put offset(theEnd,theString,myStart) + (myStart)-1 into myEnd
      if myEnd is (myStart)-1 then return empty
      return myEnd & CR & char myStart to myEnd of theString
   else
      return char (offset(theStart,theString) + len(theStart)) to \
            (offset(theEnd,theString,(offset(theStart,theString) + len(theStart))) \
            + (offset(theStart,theString) + len(theStart))-1) of theString
   end if
end inBeet
Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

anmldr
Posts: 459
Joined: Tue Sep 11, 2012 11:13 pm

Re: Difficult find and replace problem (for me)

Post by anmldr » Sun Jun 14, 2020 6:41 pm

SparkOut wrote:
Sun Jun 14, 2020 9:35 am
Hi Thierry :D

Oh, I also forgot to mention above, that in the Scripting Conferences there is some great info about text munging, and does cover some regex.

http://www.hyperactivesw.com/revscriptc ... ences.html
Very good, SparkOut!

Also, BYU's lessons in section "Text Boolean Operators—Comparing Strings" talking about
is in
is not in
contains
is among
is not among
begins with
ends with

I was looking for something like "before" and "after". But these help me to understand the language a bit better.

About RegEx, I am afraid that I will get all wrapped up in it and will not spend the time that I want on LC. I tend to do that...delve deeply into a subject and stick with it for awhile. I was really spending time with LC back in 2015. It took me this long to get back to it. I will bookmark the info about RegEx though.

Linda

P.S. My background with coding is not very elaborate. I can play around with hand coding HTML and CSS. That is not really "coding" to me. But for about 10 years, I did write apps for the Palm OS, then Pocket PC and the beginnings of iOS. Almost everything that I wrote was for veterinary use. Calculators, conversion apps, eBooks, etc. It was fun. I didn't make much $ at it because it was just a hobby. I was making only a small percentage of the sales of the eBooks. The author made the vast majority of it and I was fine with that. But, the iPhone took over other handhelds and Apple would take 30% of the sales and there was no way to market the apps otherwise. I gave up. It was not worth spending the time on it. Xcode was too difficult for me anyway and I did not have the time to spend on it. Then I found LC. It looks like it is more up my alley. Besides, I have time now since my eyesight is failing and I no longer can work as a veterinarian.

P.P.S. As for old equipment, I just hooked up a Windows XP machine to find some of the old code used for the Palm OS so that I can try to translate it into LC apps. Fun. That way, I don't have to concentrate on collecting the info to put into the apps. I can spend all of the time on learning to use LC itself.

marksmithhfx
VIP Livecode Opensource Backer
Posts: 931
Joined: Thu Nov 13, 2008 6:48 am
Location: London, UK

Re: Difficult find and replace problem (for me)

Post by marksmithhfx » Tue Jun 16, 2020 2:41 pm

Does anyone know how to convert the following to regex?

select all text between "[" and "]" including the brackets

e.g. select "[abcdef123]" or "[bbb]" from a line of text

Thanks
Mark

PS if anyone is interested in how I did this without regex, here is the code (although I think a call to the replaceText function with a regex would be a lot simpler).

Code: Select all

   put empty into tResult
   repeat for each line tline in tText
      -- check for bracketed text like [this] in the string, and cut
      put offset ("[", tline) into tStart
      if tStart > 0 then
         -- look for the closing bracket only after the opening one
         put offset ("]", tline, tStart) into tEnd
         if tEnd > 0 then
            add tStart to tEnd -- offset counted from tStart, so make it absolute
            delete char tStart to tEnd of tline
         end if
      end if
      -- tline is only a copy of the line, so collect the edited lines
      put tline & return after tResult
   end repeat
   delete the last char of tResult
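For the regex version of the question, here is a minimal sketch, assuming the brackets are never nested. The names stripBracketed and firstBracketed are hypothetical. In the pattern, \[ and \] are literal brackets and [^\]]* is any run of characters other than "]".

```livecode
-- Hypothetical helper: remove every [bracketed] run, brackets included.
-- Assumes brackets are not nested.
function stripBracketed pText
   return replaceText(pText, "\[[^\]]*\]", empty)
end stripBracketed

-- Hypothetical helper: return the first [bracketed] run with its brackets,
-- or empty if there is none.
function firstBracketed pText
   local tFound
   -- the parentheses capture the whole bracketed run into tFound
   if matchText(pText, "(\[[^\]]*\])", tFound) then
      return tFound
   end if
   return empty
end firstBracketed
```

Unlike the offset loop, replaceText handles every bracketed run in the text in one call, not only the first one on each line. The text around a removed run is left as it is, so a space on each side of the brackets ends up as two spaces.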
macOS 12.6.5 (Monterey), Xcode 14.2, LC 10.0.0, iOS 15.6.1
Targets: Mac, iOS
