Noob question on regexp

WaltBrown
Posts: 466
Joined: Mon May 11, 2009 9:12 pm
Location: NE USA

Post by WaltBrown » Wed May 13, 2009 11:05 pm

Hi!

I am trying to find URLs embedded in large text files. I want to search for occurrences of "://" and of "<stuff>.<stuff>.<stuff>" (three unknown chunks without spaces, separated by dots), so that I also catch URLs and IP addresses that don't have "http://" prepended.

Any suggestions?

find chars "://" in field does not seem to work for "://": I get no results on a small test document that is full of URLs. And I cannot see how to write a regexp for the dotted address, given that the chunk contents and lengths are unknown.

Thanks!

Walt
Walt Brown
Omnis traductor traditor

Post by WaltBrown » Wed May 13, 2009 11:10 pm

I got the first one to work, but any suggestions for the second form would be helpful.

Thanks,
Walt
Walt Brown
Omnis traductor traditor

bn
VIP Livecode Opensource Backer
Posts: 4000
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Post by bn » Thu May 14, 2009 12:25 pm

Walt,
Here is my shot at it:

Code:

-- from the dictionary
-- offset(charsToFind,stringToSearch[,charsToSkip])

on mouseUp
   put field 1 into myVar
   put "" into tCollector
   put 0 into tCounter
   
   repeat -- repeats until first offset returns 0 i.e. not found
      
      --put offset(quote &"://",myVar,tCounter) into myHitStart -- if you start with a quote
      
      put offset("://",myVar,tCounter) into myHitStart -- without quote
      
      if myHitStart > 0 then 
         add myHitStart to tCounter
         put offset(quote,myVar, tCounter) into myEnd -- looking for next quote
         if myEnd = 0 then exit repeat -- second searchpattern not found
         
         -- select char tCounter to (tCounter + myEnd) of field 1 -- if you want to look at the selection
         -- wait 1 second -- if you want to look at the selection in field 1
         
         -- stop one char before the closing quote so it is not part of the hit
         put char tCounter to (tCounter + myEnd - 1) of myVar & return after tCollector -- make list of hits
         add myEnd to tCounter
         
      else
         exit repeat
      end if
   end repeat
   if tCollector <> "" then delete last char of tCollector -- return
   put tCollector into field 2
end mouseUp

It gives you a list of hits. Within these hits you still have to extract the parts between the dots.
The following gives you the stripped-down version of the hits:

Code:

on mouseUp
   put field 2 into myTemp
   replace "://" with empty in myTemp
   -- replace quote & "://" with empty in myTemp -- if you want to have a leading quote
   
   set the itemdelimiter to "/"
   repeat with i = 1 to the number of lines of myTemp
      put item 1 of line i of myTemp into line i of myTemp
   end repeat
   put myTemp into field 2
end mouseUp

To try it, make two fields and two buttons: field 1 holds the source code of an HTML page, field 2 receives the results, and each button gets one of the two scripts. Give it a try.
regards
Bernd

Post by WaltBrown » Fri May 22, 2009 3:07 pm

Thanks, Bernd, I'm getting up to speed on the various flavors of regexp, especially on whether or not each one needs certain delimiting characters, such as parentheses. You've been a great help.
Walt Brown
Omnis traductor traditor
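
For the dotted form asked about in the first post (three chunks without spaces, separated by dots), one possible regexp approach is sketched below. It is not from the thread: the handler name dottedNames and the pattern are assumptions. The pattern accepts only letters, digits and hyphens in each chunk, allows three or more chunks so that IP addresses come out whole, and will also match things like version numbers. It relies on LiveCode's matchChunk function, which fills the start and end variables only for a part of the pattern that is wrapped in parentheses.

```livecode
-- Hypothetical helper (not from the thread): returns one dotted name per line
function dottedNames pText
   local tPattern, tStart, tEnd, tHits
   -- three or more runs of letters, digits or hyphens, joined by dots;
   -- the outer parentheses tell matchChunk which part to report
   put "([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+){2,})" into tPattern
   repeat while matchChunk(pText, tPattern, tStart, tEnd)
      put char tStart to tEnd of pText & return after tHits
      -- drop everything up to the end of this hit, then search the rest
      delete char 1 to tEnd of pText
   end repeat
   if tHits is not empty then delete last char of tHits -- trailing return
   return tHits
end dottedNames
```

Called from a button as put dottedNames(field 1) into field 2, this should list www.example.com for a line containing http://www.example.com/page, and 192.168.0.1 for a bare IP address. Deleting the front of the text on every pass keeps the loop simple but may be slow on very large files.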
