wordOffset problem

Barkandgargle
Posts: 9
Joined: Wed Feb 16, 2011 5:32 pm

wordOffset problem

Post by Barkandgargle » Sun Feb 20, 2011 7:24 am

I'm trying to grab the URL of the "official website" listed under "External Links" on any given Wikipedia page. I fetch the HTML source of the page in question and then try to use wordOffset to find the URL string. The URL I want is always preceded by the code <li><span class="official website"> , so I search for this string first.

The problem I'm having: because the word official has no space between it and the quotation mark before it, wordoffset does not see official as a separate word. Ok - so then I search for the word official, including all the characters before it, up until the prior space. Which means I'm searching for <li><span class="official But wordoffset always returns zero, indicating it could not find anything. When gosh darn it I can obviously see that it is indeed there.

Because what I'm looking for contains a quotation mark, I can't include that in the string. So I construct the string to search for and put it into a variable. Such as put "<li><span class=" & quote & "official" into xHolder. I've checked by putting the string into the message window to make sure the string is as it should be, and it is. But again, wordoffset returns zero. Argggg.

Any and all advice on this greatly appreciated.

Barkandgargle
Posts: 9
Joined: Wed Feb 16, 2011 5:32 pm

Re: wordOffset problem

Post by Barkandgargle » Sun Feb 20, 2011 7:29 am

Maybe a shorter way of asking the question above is this: how can I locate words or phrases within HTML code when they're not necessarily separated by spaces? wordOffset won't identify a word unless it is separated by spaces. If I can figure out how to do this, I won't need to muck around solving what I've described in the post above.

Thanks.

bangkok
VIP Livecode Opensource Backer
Posts: 937
Joined: Fri Aug 15, 2008 7:15 am

Re: wordOffset problem

Post by bangkok » Sun Feb 20, 2011 7:36 am

Barkandgargle wrote: But wordoffset always returns zero, indicating it could not find anything.
No. ;-)

Dictionary is your (best) friend.

If the wordToFind is more than one word, the wordOffset function always returns zero, even if the wordToFind appears in the stringToSearch.
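A minimal sketch of the behaviour described in that dictionary entry, using a made-up sample string (tText is a hypothetical variable, not from the original post):

```livecode
-- tText is a made-up sample string for illustration
local tText
put "say hello world" into tText

-- a single-word search reports the word number where the match sits
put wordOffset("world", tText) into tSingle

-- a search string containing a space counts as more than one word,
-- so per the dictionary entry the result is always zero
put wordOffset("hello world", tText) into tMulti
```

Any search string with a space in it falls into the second case, however the string was put together.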

Barkandgargle
Posts: 9
Joined: Wed Feb 16, 2011 5:32 pm

Re: wordOffset problem

Post by Barkandgargle » Sun Feb 20, 2011 7:38 am

Hi bangkok. Thanks for replying. However, the word I'm finding is not more than one word. The fact that it is a contiguous set of characters means that LiveCode is seeing it as one word.

Here's something else:

If I use put "<li><span class=" & quote & "official " is in field htmlsource it returns true - indicating the string or "word" does exist and can be found in the code.

But, as I said, when I use put the wordOffset ("<li><span class=" & quote & "official ", field "htmlsource") it returns zero, indicating it cannot be found. And I need to know not only that it exists, but what word number it is.

??

bangkok
VIP Livecode Opensource Backer
Posts: 937
Joined: Fri Aug 15, 2008 7:15 am

Re: wordOffset problem

Post by bangkok » Sun Feb 20, 2011 7:49 am

Ah, okay. Then perhaps you should do a replace first:

replace "<li>" with space in myHTMLSource

That would allow wordOffset to find the "word" (because it would then have one space before it and one space after it).

BvG
VIP Livecode Opensource Backer
Posts: 1239
Joined: Sat Apr 08, 2006 1:10 pm

Re: wordOffset problem

Post by BvG » Sun Feb 20, 2011 12:16 pm

Words are delimited by space, tab or return. In addition, everything enclosed by quotes is a word.

You are searching for a string that contains a quote, and LC can't see that as a word. Use the normal offset function to do that.
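A rough sketch of how that could look, assuming the page source is in a field named "htmlsource" as in the earlier post. The handler name wordNumberOf and its parameters are hypothetical. offset returns a character position, so the handler converts that to an approximate word number by counting the words up to the match:

```livecode
-- Hypothetical helper: returns the number of the word in which pNeedle
-- starts inside pSource, or 0 if pNeedle does not occur at all.
function wordNumberOf pNeedle, pSource
   local tCharPos
   -- offset searches for a plain string, so spaces and quotes are fine
   put offset(pNeedle, pSource) into tCharPos
   if tCharPos is 0 then return 0
   -- count the words up to and including the first character of the match
   return the number of words in char 1 to tCharPos of pSource
end function
```

Usage might look like this:

```livecode
put "<li><span class=" & quote & "official website" & quote & ">" into tMarker
put wordNumberOf(tMarker, field "htmlsource") into tWordNum
```

Because quoted text counts as a single word, the word number is only approximate on HTML that is full of quotes. The character position from offset is probably the more dependable value to work from when cutting the URL out of the source.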
Various teststacks and stuff:
http://bjoernke.com

Chat with other RunRev developers:
chat.freenode.net:6666 #livecode

uabclst
Posts: 7
Joined: Thu Feb 10, 2011 2:15 pm

Re: wordOffset problem

Post by uabclst » Mon Feb 28, 2011 11:54 pm

Have you tried using the matchText function to get the URL?
Something like get matchText(theText, "official website..(.*)xxx", theURL), where xxx stands for the characters that come after the URL.
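A slightly fuller sketch of the matchText idea. The handler name officialSiteURL is hypothetical, and the pattern assumes the link appears as href="..." somewhere after the official website span on the same line, which is a guess about the page markup rather than something confirmed in this thread:

```livecode
-- Hypothetical helper: returns the first href value found after the
-- "official website" span in pHTML, or empty if nothing matches.
function officialSiteURL pHTML
   local tPattern, tURL
   -- builds the pattern: official website">.*?href="([^"]+)
   put "official website" & quote & ">.*?href=" & quote & \
         "([^" & quote & "]+)" into tPattern
   -- matchText puts the text captured by the parentheses into tURL
   if matchText(pHTML, tPattern, tURL) then return tURL
   return empty
end function
```

By default the dot does not match a line break, so if the link sits on a later line than the span, prefixing the pattern with (?s) may be needed.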
