Hi!
I am trying to find URLs embedded in large text files. I want to search for occurrences of "://" and of "<stuff>.<stuff>.<stuff>" (three unknown chunks without spaces, separated by dots), so that I also catch URLs and IP addresses that don't have "http://" prepended.
Any suggestions?
Find chars "://" in field does not seem to like searching for "://": I get no results on a small test document that is full of URLs. And I cannot see how to write a regexp for the dotted address, given that the chunk contents and lengths are unknown.
Thanks!
Walt
Noob question on regexp
Walt Brown
Omnis traductor traditor
- VIP Livecode Opensource Backer
- Posts: 4000
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Walt,
Here is my shot at it, as two button scripts.
The first gives you a list of hits. Within these hits you will still have to extract the parts between the dots.
The second cuts each hit down to its host part.
To try it, make two fields and two buttons, and put the source code of an HTML page into field 1. The results end up in field 2.
The first script collects the hits:
Code:
-- from the dictionary
-- offset(charsToFind,stringToSearch[,charsToSkip])
on mouseUp
   put field 1 into myVar
   put "" into tCollector
   put 0 into tCounter -- number of chars already searched
   repeat -- repeats until the first offset returns 0, i.e. not found
      --put offset(quote & "://",myVar,tCounter) into myHitStart -- if you start with a quote
      put offset("://",myVar,tCounter) into myHitStart -- without quote
      if myHitStart > 0 then
         add myHitStart to tCounter -- tCounter now points at the ":" of "://"
         put offset(quote,myVar,tCounter) into myEnd -- looking for next quote
         if myEnd = 0 then exit repeat -- second search pattern not found
         -- select char tCounter to (tCounter + myEnd - 1) of field 1 -- if you want to look at the selection
         -- wait 1 second -- if you want to look at the selection in field 1
         -- make list of hits; the "- 1" leaves out the closing quote
         put char tCounter to (tCounter + myEnd - 1) of myVar & return after tCollector
         add myEnd to tCounter
      else
         exit repeat
      end if
   end repeat
   if tCollector <> "" then delete last char of tCollector -- trailing return
   put tCollector into field 2
end mouseUp
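In case the bookkeeping with tCounter is not obvious: when charsToSkip is given, offset returns a position counted from the end of the skipped part, not from the start of the string, so it has to be added to the skip count to get an absolute position. A minimal sketch of that, with a made-up test string (tText, tFirst and tSecond are just illustration names):

```livecode
on mouseUp
   local tText, tFirst, tSecond
   put "x://y z://w" into tText
   -- no charsToSkip: position counted from char 1
   put offset("://", tText) into tFirst
   -- skip tFirst chars: the result is relative to the skipped part,
   -- so add tFirst to get the absolute position of the second hit
   put tFirst + offset("://", tText, tFirst) into tSecond
   answer "first hit at char" && tFirst & return & "second hit at char" && tSecond
end mouseUp
```

This is the same pattern the script above uses with "add myHitStart to tCounter" and "add myEnd to tCounter".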
The second script gives you the stripped-down version of the hits:
Code:
on mouseUp
   put field 2 into myTemp
   replace "://" with empty in myTemp
   -- replace quote & "://" with empty in myTemp -- if you want to have a leading quote
   set the itemdelimiter to "/"
   repeat with i = 1 to the number of lines of myTemp
      put item 1 of line i of myTemp into line i of myTemp
   end repeat
   put myTemp into field 2
end mouseUp
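For the dotted addresses without "://" in front, a regexp might be worth a try. The following is only a sketch, not tested against real pages: it assumes that each chunk consists of letters, digits and hyphens, and it treats three or more such chunks joined by dots as a hit. The function name dottedHits and the pattern are my own assumptions; matchChunk is the built-in function that reports the start and end position of a match.

```livecode
-- hypothetical helper: returns one dotted hit per line
function dottedHits pText
   local tStart, tEnd, tHits
   -- chunk, then at least two more ".chunk" parts
   repeat while matchChunk(pText, "([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+){2,})", tStart, tEnd)
      put char tStart to tEnd of pText & return after tHits
      -- drop everything up to the end of this hit and search the rest
      delete char 1 to tEnd of pText
   end repeat
   delete last char of tHits -- trailing return
   return tHits
end function
```

It could be called from a button with something like "put dottedHits(field 1) into field 2". Note that a pattern this loose will also pick up things like version numbers (1.2.3) or dotted file names, so the hits would still need checking.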
regards
Bernd