wordOffset problem
-
- Posts: 9
- Joined: Wed Feb 16, 2011 5:32 pm
wordOffset problem
I'm trying to grab the URL for the "official website" shown under "External Links" on any given Wikipedia page. I grab the HTML source for the Wikipedia page in question and then attempt to use wordOffset to find the URL string. The URL I want is always preceded by the code <li><span class="official website"> . So I search for this string first.
The problem I'm having: because the word official has no space between it and the quotation mark before it, wordOffset does not see official as a separate word. OK, so then I search for the word official, including all the characters before it, up until the prior space. Which means I'm searching for <li><span class="official . But wordOffset always returns zero, indicating it could not find anything, when gosh darn it I can obviously see that it is indeed there.
Because what I'm looking for contains a quotation mark, I can't include that in the string. So I construct the string to search for and put it into a variable. Such as put "<li><span class=" & quote & "official" into xHolder. I've checked by putting the string into the message window to make sure the string is as it should be, and it is. But again, wordoffset returns zero. Argggg.
Any and all advice on this greatly appreciated.
-
- Posts: 9
- Joined: Wed Feb 16, 2011 5:32 pm
Re: wordOffset problem
Maybe a shorter way of asking this is: how can I locate words or phrases within HTML code when they're not necessarily separated by spaces? wordOffset won't identify a word unless it is separated by spaces. If I can figure out how to do this, I won't need to muck around solving what I've described in the post above.
Thanks.
Re: wordOffset problem
Barkandgargle wrote: But wordoffset always returns zero, indicating it could not find anything.
No.

Dictionary is your (best) friend.
If the wordToFind is more than one word, the wordOffset function always returns zero, even if the wordToFind appears in the stringToSearch.
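The dictionary note above can be checked directly. This is only a minimal sketch, assuming the search string built in the first post; the handler and the variable name tNeedle are made up for illustration. The space between <li><span and class= should make LiveCode count more than one word in the search string, which is the case where wordOffset returns zero.

```livecode
on mouseUp
   local tNeedle
   -- Same string as in the first post: <li><span class="official
   put "<li><span class=" & quote & "official" into tNeedle
   -- The space after <span splits the string into separate words,
   -- so this count should come out greater than 1
   put the number of words of tNeedle
end mouseUp
```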
-
- Posts: 9
- Joined: Wed Feb 16, 2011 5:32 pm
Re: wordOffset problem
Hi Bankgkok. Thanks for replying. However, the word I'm looking for is not more than one word. The fact that it is a contiguous set of characters means that LiveCode is seeing it as one word.
Here's something else:
If I use put "<li><span class=" & quote & "official " is in field "htmlsource", it returns true, indicating that the string or "word" does exist and can be found in the code.
But, as I said, when I use put wordOffset("<li><span class=" & quote & "official ", field "htmlsource"), it returns zero, indicating it cannot be found. And I need to know not only that it exists, but what word number it is.
??
Re: wordOffset problem
Ah okay. So perhaps you should do a replace first.
replace "<li>" with space in myHTMLSource
That would allow the wordoffset to find the "word" (because 1 space before, 1 space after).
Re: wordOffset problem
Words are delimited by space, tab or return. In addition, everything enclosed by quotes is a word.
You are searching for a string that contains a quote, and LC can't see that as a word. Use the normal offset function to do that.
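The offset suggestion could look something like the sketch below. It assumes the field "htmlsource" and the marker string from the earlier posts; the handler and variable names are made up for illustration. offset returns a character position, and counting the words up to that character gives the word number that was asked for.

```livecode
on mouseUp
   local tNeedle, tSource, tCharPos, tWordNum
   -- Marker from the first post: <li><span class="official website">
   put "<li><span class=" & quote & "official website" & quote & ">" into tNeedle
   put field "htmlsource" into tSource
   -- offset matches plain characters, so spaces and quotes in the
   -- search string do not matter
   put offset(tNeedle, tSource) into tCharPos
   if tCharPos is 0 then
      answer "Marker not found"
   else
      -- Count the words up to and including the first matched character
      put the number of words in char 1 to tCharPos of tSource into tWordNum
      put "char" && tCharPos & ", word" && tWordNum
   end if
end mouseUp
```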
Various teststacks and stuff:
http://bjoernke.com
Chat with other RunRev developers:
chat.freenode.net:6666 #livecode
Re: wordOffset problem
Have you tried using the matchText function to get the URL?
For example: get matchText(theText, "official website..(.*)xxx", theURL), where xxx is some chars that come after the URL.
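A fuller version of the matchText idea might look like this. It is only a sketch: it assumes the link follows the marker as an href="..." attribute, which is a guess about Wikipedia's markup and not something confirmed in this thread, and the function name officialSiteURL is made up. The pattern uses \x22, the regex hex escape for a double quote, so no quote concatenation is needed.

```livecode
function officialSiteURL pHTML
   local tPattern, tURL
   -- (?s) lets . match line breaks; \x22 stands for a double quote.
   -- ASSUMPTION: the first href=" after the marker holds the wanted URL.
   put "(?s)official website\x22>.*?href=\x22([^\x22]+)\x22" into tPattern
   if matchText(pHTML, tPattern, tURL) then
      return tURL
   end if
   return empty
end officialSiteURL
```

It could then be called with something like put officialSiteURL(field "htmlsource"), which gives empty when the marker is not found.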