Difficult find and replace problem (for me)
I would like to have a search and replace button that can find a string BETWEEN two items. For instance, if I have 10 HTML files in a folder. The files may contain links to websites that contain "James" in the URL and I don't want to break the link(s). But in what is displayed on the HTML page, I would want it to be changed to "John".
So, in theory, I would search all files in a folder. I want to find all instances of "James" in the HTML files. There may be links that look something like this:
<a href="http://w w w.JamesWebSite.com/">James</a> (The only way that I could get this forum to NOT treat this as a real link was to add spaces between the "www".)
I want to replace "James" with "John" if it is between these portions of the anchor tags
/"> & </a>
AND if "James" is located somewhere in the URL, it would not be replaced.
So, Find and Replace "James" with "John" if the string is located between /"> and </a> and not anywhere else in the files.
Thanks. I am beating my head against the wall trying to figure this one out.
Linda
Re: Difficult find and replace problem (for me)
I am sure this is easier than I made it, and I know for a fact that it would take Thierry about 3 seconds to produce an elegant one-liner, but here's a function I adapted from something I had in the past (I had help from Thierry then):
Code: Select all
function searchAndReplace pHaystack,pNeedle,pNewText
   -- build the "big" needle: an anchor whose URL and link text both contain pNeedle
   put "(?ms)<a href=.http.+?" & pNeedle & ".+?>(" & pNeedle & ")</a>" into tBigNeedle
   -- matchChunk puts the start and end positions of the captured link text into tStart and tEnd
   repeat until matchChunk(pHaystack,tBigNeedle,tStart,tEnd) is false
      put pNewText into char tStart to tEnd of pHaystack
   end repeat
   return pHaystack
end searchAndReplace
If you feed the function with the contents of the file that you want to search for the anchor text to replace, then in your example you could
Code: Select all
put url ("file:" & <path to the original html file to search>) into tHaystack
put "James" into tNeedle --for clarity - you could obviously specify this directly in the function call
put "John" into tNewText --for clarity - you could obviously specify this directly in the function call
put searchAndReplace(tHaystack,tNeedle,tNewText) into url ("file:" & <the path to the file for updated haystack content>)
The first line of the function is hard coded to build the regex that becomes the "big" needle to search for in the haystack.
(?ms) turns on both multi-line mode (m) and single-line mode (s, in which "." also matches line breaks), since html is whitespace agnostic and could be split over linebreaks.
Then I have set some character patterns to match, so that the regex must find an anchor tag with <a href in it.
After that is a wildcard pattern of least hungriness ".+?" (non-greedy) which skips text until it finds a match with "James" (pNeedle) BEFORE the closing bracket of the html tag, then more wildcards until it finds the closing bracket. The parenthesis "(" opens the recording of the position of the following matched text (pNeedle). Then there is a closing parenthesis and a requirement to match an html closing tag for the anchor element.
This means that you will get a match only when the anchor url contains James as well as the text ref containing James - in case you don't want to change
Code: Select all
<a href="www.peter.com">James</a>
The function repeats to swap all occurrences of the Needle in the Haystack.
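The function in this post works on the text of one file, while the original question was about every HTML file in a folder. A minimal sketch of that outer loop follows, assuming a recent LiveCode version in which files() accepts a folder path, and assuming the searchAndReplace function from this post is in the same script. The handler name replaceInFolder, the parameter pFolder and the ".html" filter are illustrative and not from the thread.

```livecode
-- Sketch only: replaceInFolder, pFolder and the ".html" filter are assumptions.
-- searchAndReplace is the function posted above and must be in the same script.
command replaceInFolder pFolder, pNeedle, pNewText
   repeat for each line tFile in files(pFolder)
      -- skip anything that is not an .html file
      if not (tFile ends with ".html") then next repeat
      put pFolder & "/" & tFile into tPath
      put url ("file:" & tPath) into tHaystack
      -- write the changed text back over the same file
      put searchAndReplace(tHaystack, pNeedle, pNewText) into url ("file:" & tPath)
   end repeat
end replaceInFolder
```

It could be called as replaceInFolder tFolder, "James", "John" with tFolder holding the folder path. Because it overwrites each file in place, it is probably wise to try it on a copy of the folder first.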
Re: Difficult find and replace problem (for me)
Thanks. I will try to digest this. It was only for learning’s sake anyway. I am not working on a project.
Before I try to deeply understand it, would it work if it were not HTML? Would it work on anything that you wanted the string “between” two chunks?
This may be over my head at the moment but it is still fun.
Linda
Re: Difficult find and replace problem (for me)
Yes, regex comes with great power and great responsibility. It is a power saw on steroids that runs on 1000 volts and can do almost anything in the right hands. But the learning curve is vertical - at least for me.
https://regex101.com is a website where you can paste in a bunch of your own search text and then experiment with different regex expressions.
If you are on a Mac you can try the live regex in BBEdit v 13, showing all matches in a document as you type the regex.
There are many other resources because so many people need a refresher each time they need to pick up the tool again. But for text matching and searching there is no equal out there in the geek world.
Kelly
Re: Difficult find and replace problem (for me)
I just had a look...I have not used BBEdit in a lot of years. The last version that I had was "2"!! I think that I am definitely behind the times on that one. Yes, I am on a Mac. It is old too. 2015 model.
Linda
Re: Difficult find and replace problem (for me)
In a word, yes. But the pattern to match would obviously need to be reconstructed - the example above is specific to your original problem. Regex is very powerful and versatile, and everything kdjanz said.
The built in function replaceText takes regex parameters and could be good to look up, especially if you want to experiment with slightly more basic setups.
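As a starting point for experimenting with replaceText, here is a minimal sketch for the original James/John problem. It assumes that LiveCode's regex engine accepts PCRE lookbehind and lookahead groups, and the handler name replaceLinkText is made up for illustration. Unlike the function earlier in the thread, it does not require the needle to appear in the URL as well.

```livecode
-- Sketch only: replaceLinkText and the lookaround pattern are assumptions,
-- not code from this thread.
function replaceLinkText pHaystack, pNeedle, pNewText
   -- (?<=>) requires ">" just before the needle,
   -- (?=</a>) requires "</a>" just after it.
   -- Neither is part of the match, so only the needle itself is replaced.
   put "(?<=>)" & pNeedle & "(?=</a>)" into tPattern
   return replaceText(pHaystack, tPattern, pNewText)
end replaceLinkText
```

Called as replaceLinkText(tHaystack, "James", "John"), this should change the link text and leave "James" inside the href untouched. If the needle contains regex meta-characters such as "." or "(", they would need escaping first.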
Re: Difficult find and replace problem (for me)
Thank you. I will spend some time on it...
Linda
Re: Difficult find and replace problem (for me)
Linda wrote:
I am on a Mac. It is old too. 2015 model.
Really? I do the vast majority of my work on two 2006 iMacs.
But, when I work with BBC BASIC I use my 1981 BBC MODEL B.
Re: Difficult find and replace problem (for me)
Linda wrote:
<a href="http://w w w.JamesWebSite.com/">James</a>
AND if "James" is located somewhere in the URL, it would not be replaced.
Well, if your example URL is anything to go on, ALL you need to do is to differentiate between 'JamesWebSite' and 'James' . . .
So, I suggest you look up both 'word' and 'trueWord' in the Documentation, and
stop over-complicating things.
pseudoCode:
Code: Select all
if XXX contains trueWord "James" then
   replace trueWord "James" with "John" in XXX
end if
This is possibly because 'James' is not space-delimited.
Re: Difficult find and replace problem (for me)
Hello Linda,
+42 for everything Sparkout and Kelly said.
Here are a few comments off the top of my head:
- You can get the free version of BBEdit 13 (fewer features than the full one)
- If your interest is to learn regex,
I kindly suggest that you start with some tutorials first,
then try to learn the basic syntax;
I believe there are around 30 or so meta-characters to remember.
Finally, practice a lot! or said differently, learn by doing.
It's all about a mind shift,
as we are dealing with patterns, sub-patterns,..
- Personally, when I'm stuck or need to refresh my memory,
I mostly use this excellent resource:
https://www.regular-expressions.info/tutorial.html
- I find this minimalist MacOS app quite handy,
focusing on regex without any distractions.
You can get a free version at https://www.apptorium.com/expressions
and here a screenshot with sparkout's solution. You will have to adapt a bit
the working regex for LiveCode, but when you know how, it's easy.
Finally, as a side note,
I find giving pills to my sick cat much more complicated...
I wish you an excellent Sunday,
Thierry
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
Re: Difficult find and replace problem (for me)
Well, "fudge is as fudge does" . . .
This is because there are no words 'James' anywhere in your string.
Code: Select all
on mouseUp
   put empty into fld "fOUTPUT"
   put fld "fINPUT" into XXX
   -- "replace" already changes every occurrence in XXX,
   -- so no loop over the items of XXX is needed
   replace ">James<" with ">John<" in XXX
   put XXX into fld "fOUTPUT"
end mouseUp
Last edited by richmond62 on Sun Jun 14, 2020 10:16 am, edited 1 time in total.
Re: Difficult find and replace problem (for me)
Hi Thierry
Oh, I also forgot to mention above, that in the Scripting Conferences there is some great info about text munging, and does cover some regex.
http://www.hyperactivesw.com/revscriptc ... ences.html
Re: Difficult find and replace problem (for me)
Hi,
RegEx was always beyond my limited scope. Too many hermetic symbols to remember, too rigid a syntax, too enigmatic the code - too many possibilities to completely "fraggle up", for an old coder with an average brain like me, at least.
"Offset()" on the other hand is a thing of sheer beauty - simple, elegant, powerful.
And sometimes you even understand what these offset arithmetic concatenations do, more or less!
A while ago I wrote an offset() based function to extract "what's between two strings". After long consideration I baptized it "inBeet()" :)
Here I use it to find every "James" that is the text part of a href, in a list of those for simplicity. The input could be a whole web site source, or any other text. InBeet (the search function) is quite fast, but the replacement loop in the mouseUp handler could be improved when it comes to real phat data.
Have fun!
Code: Select all
on mouseUp
   put "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Jim</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Robert</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Jane</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Frank</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Susanna</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Anthony</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">Mathilda</a>" & CR & \
         "<a href=" & quote & "http://w w w.aWebSite.com/" & quote & ">James</a>" into myStr
   /* => myStr:
   <a href="http://w w w.aWebSite.com/">James</a>
   <a href="http://w w w.aWebSite.com/">Jim</a>
   <a href="http://w w w.aWebSite.com/">Robert</a>
   [ ... ] */
   put "James" into myFind -- what we look for
   put "John" into myRepl -- this is the replacement
   put 0 into myOff
   put 0 into myCnt
   repeat until the controlkey is down -- so we can interrupt
      put inBeet(myStr,"<a href=","</a>",myOff) into myRes -- find a href construct
      if myRes is empty then exit repeat -- not found or we're through
      put line 1 of myRes into myOff -- = pos. of last char of foundChunk
      if (line 2 of myRes ends with (myFind)) then -- that's a hit!
         put myRepl into char (myOff - len(myFind) + 1) to myOff of myStr -- so replace
         add len(myRepl) - len(myFind) to myOff -- correct myOff
         add 1 to myCnt
      end if
   end repeat
   answer information "Replaced '" & myFind & "' with '" & myRepl & "' in the String." & CR & \
         "Find count: " & myCnt & CR & CR & "Result: => message box" titled "woOOOot!"
   put myStr
end mouseUp
function inBeet theString, theStart, theEnd, offNum
   /* "inBeet" extracts what's between "theStart" and "theEnd" in "theString",
   . starting its search at char "offNum" of "theString". It's fast :)
   # theString: a string that can be quite long.
   # theStart & theEnd: strings containing what's before & after the search result.
   . It's assumed that theEnd comes AFTER theStart & that both are not empty ...
   # offNum: an offset integer, determining where to start a search. Or empty.
   The result comes in 2 flavors, depending on the content of offNum:
   - offNum is empty: inBeet returns the first found string.
   - offNum is not empty: inBeet returns the position in theString of the last char
   . of the found string (the char just before theEnd, behind position offNum)
   . & CR & the first found string.
   Exercise to the esteemed reader: What happens with negative offNums? ;-)
   axwald @ forums.livecode.com, GPL v3 */
   if offNum is not empty then
      put offset(theStart,theString,offNum) + len(theStart) + offNum into myStart
      if myStart is (len(theStart) + offNum) then return empty
      put offset(theEnd,theString,myStart) + (myStart)-1 into myEnd
      if myEnd is (myStart)-1 then return empty
      return myEnd & CR & char myStart to myEnd of theString
   else
      return char (offset(theStart,theString) + len(theStart)) to \
            (offset(theEnd,theString,(offset(theStart,theString) + len(theStart))) \
            + (offset(theStart,theString) + len(theStart))-1) of theString
   end if
end inBeet
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!
Re: Difficult find and replace problem (for me)
SparkOut wrote: ↑Sun Jun 14, 2020 9:35 am
Hi Thierry
Oh, I also forgot to mention above, that in the Scripting Conferences there is some great info about text munging, and does cover some regex.
http://www.hyperactivesw.com/revscriptc ... ences.html
Very good, SparkOut!
Also, BYU's lessons in section "Text Boolean Operators—Comparing Strings" talking about
is in
is not in
contains
is among
is not among
begins with
ends with
I was looking for something like "before" and "after". But these help me to understand the language a bit better.
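A short sketch of how some of these operators behave on the names from this thread may help. The variable names and sample strings are made up for illustration; the point is the difference between a substring test ("is in", "contains") and a whole-item test ("is among").

```livecode
on mouseUp
   -- Sketch only: tHost and tNames are illustrative values
   put "JamesWebSite" into tHost
   put "James,John,Linda" into tNames
   put ("James" is in tHost) into tIsIn -- true: substring test
   put (tHost contains "James") into tContains -- true: same test, operands swapped
   put ("James" is among the items of tNames) into tAmong -- true: a whole item matches
   put ("James" is among the items of "JamesWebSite,John") into tWhole -- false: no whole item matches
   put (tHost begins with "James") into tBegins -- true
   put (tHost ends with "Site") into tEnds -- true
   -- show all six results in the message box, one per line
   put tIsIn & CR & tContains & CR & tAmong & CR & tWhole & CR & tBegins & CR & tEnds
end mouseUp
```

None of these report a position, which is why a "before"/"after" style search usually ends up using offset() or a regex instead.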
About RegEx, I am afraid that I will get all wrapped up in it and will not spend the time that I want on LC. I tend to do that...delve deeply into a subject and stick with it for a while. I was really spending time with LC back in 2015. It took me this long to get back to it. I will bookmark the info about RegEx though.
Linda
P.S. My background with coding is not very elaborate. I can play around with hand coding HTML and CSS. That is not really "coding" to me. But for about 10 years, I did write apps for the Palm OS, then Pocket PC and the beginnings of iOS. Almost everything that I wrote was for veterinary use. Calculators, conversion apps, eBooks, etc. It was fun. I didn't make much $ at it because it was just a hobby. I was making only a small percentage of the sales of the eBooks. The author made the vast majority of it and I was fine with that. But, the iPhone took over other handhelds and Apple would take 30% of the sales and there was no way to market the apps otherwise. I gave up. It was not worth spending the time on it. Xcode was too difficult for me anyway and I did not have the time to spend on it. Then I found LC. It looks like it is more up my alley. Besides, I have time now since my eyesight is failing and I no longer can work as a veterinarian.
P.P.S. As for old equipment, I just hooked up a Windows XP machine to find some of the old code used for the Palm OS so that I can try to translate it into LC apps. Fun. That way, I don't have to concentrate on collecting the info to put into the apps. I can spend all of the time on learning to use LC itself.
Re: Difficult find and replace problem (for me)
Does anyone know how to convert the following to regex?
select all text between "[" and "]" including the brackets
e.g. select "[abcdef123]" or "[bbb]" from a line of text
Thanks
Mark
PS if anyone is interested in how I did this without regex, here is the code (although I think a call to the replaceText function with a regex expression would be a lot simpler).
Code: Select all
repeat for each line tline in tText
   -- check for bracketed text like [this] in the string, and cut
   put offset ("[", tline) into tStart
   if tStart > 0 then
      put offset ("]", tline) into tEnd
      if tEnd > tStart then
         put char 1 to tStart-1 of tline into tFirst
         put char tEnd + 1 to the number of chars of tline of tline into tLast
         put tFirst & tLast into tline
      end if
   end if
   -- tline is only a copy of the line, so collect the result (tResult is an assumed name)
   put tline & return after tResult
end repeat
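One possible regex version of the above, as a hedged sketch: the pattern \[[^\]]*\] matches an opening bracket, any run of characters that are not a closing bracket, and the closing bracket. The handler names stripBracketed and firstBracketed are made up for illustration. Note that replaceText removes every bracketed chunk in the text, whereas the loop above cuts only the first one on each line.

```livecode
-- Sketch only: both handler names are assumptions, not code from this thread.
function stripBracketed pText
   -- \[ and \] are literal brackets; [^\]]* is any run of characters except "]"
   return replaceText(pText, "\[[^\]]*\]", empty)
end stripBracketed

function firstBracketed pLine
   -- the parentheses capture the bracketed chunk, brackets included;
   -- matchChunk puts its start and end positions into tStart and tEnd
   if matchChunk(pLine, "(\[[^\]]*\])", tStart, tEnd) then
      return char tStart to tEnd of pLine
   end if
   return empty
end firstBracketed
```

For selecting rather than cutting, the tStart and tEnd values from matchChunk could be passed to a "select char tStart to tEnd of field ..." command.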
macOS 12.6.5 (Monterey), Xcode 14.2, LC 10.0.0, iOS 15.6.1
Targets: Mac, iOS