Getting text from HTML <div> into LiveCode

Posted: Fri May 24, 2013 2:51 pm
by aircooled76
Hi Guys,

I am working on a basketball scoreboard controller, and I would like to pull information from a statistics website to use in the scoreboard.

I can get the full HTML code into LiveCode via

Code: Select all

get URL "siteURLhere"
put it into field "statinfo"
The statistics I need are in the HTML below. In the first row, "30/67 (45%)" is the Field Goal stat of the first team.

Any suggestions on how to extract the text I need? I would need "30/67 (45%)" to be put in a separate text field, and so on for all the stats...

A

Code: Select all

<div class="game-sum-all-wrap">
	<div class="game-sum-row gs-fg">
		<div class="gs-col-left">30/67 (45%)</div>
		<div class="gs-col-mid">Field Goals</div>
		<div class="gs-col-right">35/72 (49%)</div>
	</div>
	<div class="game-sum-row gs-fg3">
		<div class="gs-col-left">6/14 (43%)</div>
		<div class="gs-col-mid">3 Points</div>
		<div class="gs-col-right">9/20 (45%)</div>
	</div>
	<div class="game-sum-row gs-ft">
		<div class="gs-col-left">23/27 (85%)</div>
		<div class="gs-col-mid">Free Throws</div>
		<div class="gs-col-right">9/16 (56%)</div>
	</div>
	<div class="game-sum-row gs-reb">
		<div class="gs-col-left">33</div>
		<div class="gs-col-mid">Rebounds</div>
		<div class="gs-col-right">31</div>
	</div>
	<div class="game-sum-row gs-ass">
		<div class="gs-col-left">13</div>
		<div class="gs-col-mid">Assists</div>
		<div class="gs-col-right">23</div>
	</div>
	<div class="game-sum-row gs-stl">
		<div class="gs-col-left">5</div>
		<div class="gs-col-mid">Steals</div>
		<div class="gs-col-right">2</div>
	</div>
	<div class="game-sum-row gs-blk">
		<div class="gs-col-left">2</div>
		<div class="gs-col-mid">Blocks</div>
		<div class="gs-col-right">2</div>
	</div>
	<div class="game-sum-row gs-tov">
		<div class="gs-col-left">10</div>
		<div class="gs-col-mid">Turnovers</div>
		<div class="gs-col-right">13</div>
	</div>
	<div class="game-sum-row gs-fou">
		<div class="gs-col-left">18 (24)</div>
		<div class="gs-col-mid">Fouls (Fouls On)</div>
		<div class="gs-col-right">24 (17)</div>
	</div>
	<div class="game-sum-row gs-pip">
		<div class="gs-col-left">34</div>
		<div class="gs-col-mid">Points in the Paint</div>
		<div class="gs-col-right">38</div>
	</div>
	<div class="game-sum-row gs-scp">
		<div class="gs-col-left">4</div>
		<div class="gs-col-mid">Second Chance Points</div>
		<div class="gs-col-right">7</div>
	</div>
	<div class="game-sum-row gs-pft">
		<div class="gs-col-left">14</div>
		<div class="gs-col-mid">Points off Turnovers</div>
		<div class="gs-col-right">11</div>

Re: Getting text from HTML <div> into LiveCode

Posted: Fri May 24, 2013 3:54 pm
by Klaus
Hi aircooled76,

you need to "parse" the data and remove everything that is NOT what you want :-D

Here is one solution to your problem:

Code: Select all

on mouseUp

  ## I copied your example data into a field for my example
  ## replace this line with: put URL (URL_to_get) into tData
  put fld 1 into tData
  
  ## 1. Since TEXT <> HTML e.g.:
  ## <p>ddfdf</p><p>iiiii</p>
  ## is ONE line of text but TWO lines of displayed HTML
  ## For my solution we need to make sure ALL DIVs are in a separate line:
  replace "<div" with (CR & "<div") in tData
  put  "<div class=" & q("gs-col-left") & ">" into tDivLine
  
  ## 2. Now we FILTER all the lines of tData with tDivLine
  ## so in the end tData will be a CR delimited list with only these divs 
  ## at the beginning of each line
  filter tData with (tDivLine & "*")
  
  ## 3. Remove all HTML code:
  ## starting DIV TAG
  replace tDivLine with "" in tData
  
  ## ending DIV TAG
  replace "</div>" with "" in tData
  put tData into fld 2
end mouseUp

function q tString
  return QUOTE & tString & QUOTE
end q
Tested and works :-)

Look up all unknown terms like FILTER etc. in the dictionary to get more info.

Here are some great learning resources to get the basics (= grips :-) ) of LiveCode:
http://www.hyperactivesw.com/revscriptc ... ences.html


Best

Klaus

P.S:
We can create a function that will extract ANY div tag you need, if you like.
Just drop a line here...
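
A minimal sketch of what such a function could look like, reusing the same replace/filter steps as the handler above. The name divText and its parameters are hypothetical, and it assumes each div opens with exactly class="..." and contains plain text with no nested tags:

```livecode
-- Hypothetical helper: returns the text of every div whose class is pClass,
-- one value per line. Assumes the opening tag is exactly <div class="pClass">
-- and that the div holds plain text only (no nested tags).
function divText pHTML, pClass
   local tOpenTag
   put "<div class=" & quote & pClass & quote & ">" into tOpenTag
   -- put every div on a line of its own, as in the handler above
   replace "<div" with (cr & "<div") in pHTML
   -- keep only the lines that start with the div we are looking for
   filter pHTML with (tOpenTag & "*")
   -- strip the opening and closing tags
   replace tOpenTag with empty in pHTML
   replace "</div>" with empty in pHTML
   return pHTML
end divText
```

With the sample data in tData, divText(tData, "gs-col-left") should return the left-hand column and divText(tData, "gs-col-right") the right-hand one. Class names containing the wildcard characters *, ? or [ would need extra care, because filter treats them specially.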

Re: Getting text from HTML <div> into LiveCode

Posted: Wed May 29, 2013 3:25 am
by icouto
Here is a one-line function that will remove all html tags from a string:

Code: Select all

function removeHTML pString
   return replaceText(pString,"<[^<]+>",empty)
end removeHTML
Removing the html tags makes it easier for you to then parse the text and find the information you want.

LiveCode's built-in replaceText function is very powerful because it uses Regular Expressions, which are an advanced topic in their own right.

A Regular Expression (or Regex) defines a way for you to select text by using a standard 'formula'. The regex that is used in the replaceText function above is: "<[^<]+>". This means, literally: "select a chunk of text which starts with '<', then one or more characters that are not '<', and finally a closing '>'." If you'd like to learn more about Regular Expressions, google 'Regular Expression Tutorial' - I'd post a link, but apparently my account does not allow me to do it!

In any case, be warned: Regular Expressions are complicated stuff! ;-)
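
To make the pattern concrete, here is a small sketch that runs removeHTML on one line copied from the sample data. The mouseUp handler and the variable names tLine and tPlain are only for illustration:

```livecode
on mouseUp
   local tLine, tPlain
   -- one line copied from the sample HTML in the first post
   put "<div class=" & quote & "gs-col-left" & quote & ">30/67 (45%)</div>" into tLine
   -- both the opening and the closing tag match the pattern and are removed
   put removeHTML(tLine) into tPlain
   put tPlain -- shows 30/67 (45%) in the message box
end mouseUp

function removeHTML pString
   return replaceText(pString, "<[^<]+>", empty)
end removeHTML
```

Note that the pattern only excludes '<' between the brackets, so text containing a literal '>' after a tag could be swallowed as well; for data like the sample above that does not arise.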

Re: Getting text from HTML <div> into LiveCode

Posted: Wed May 29, 2013 11:39 am
by Klaus
icouto wrote: In any case, be warned: Regular Expressions are complicated stuff! ;-)
Yep, that's why I avoid them :-D

Re: Getting text from HTML <div> into LiveCode

Posted: Wed May 29, 2013 7:46 pm
by jacque
Another way to remove html is:

Code: Select all

set the htmltext of the templatefield to htmlVar -- variable contains the html string
put the text of the templatefield into tPlainText

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 12:00 am
by Klaus
jacque wrote: Another way to remove html is:

Code: Select all

set the htmltext of the templatefield to htmlVar -- variable contains the html string
put the text of the templatefield into tPlainText
Yes, Ma'am, but that was not the question :-D

And won't work with DIV/CSS either... :cool:

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 12:14 am
by jacque
Yeah, you're right. icouto and I got off-track, but maybe someone will find it useful.

BTW, congrats on your honorable mention at the conference. You're a big help here, Klaus. :)

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 1:43 am
by sturgis
Here's an alternate method, though it assumes the data is formatted as cleanly as the sample you posted.

Code: Select all

   put field 1 into tData -- I put your sample data into a field.  You'd get it from wherever you already get it of course. 
   repeat for each line tLine in tData -- cycle through the data
      switch 
         case tLine contains "game-sum-row " -- if it's a sum row, use it as a category 
            put the last word of tLine into tCategory
            
            -- couldn't get it to delete using char -1 to... So did it the hard way. No clue why it wouldn't work.. 
            delete char -2 to -1  of tCategory 
            put cr & tCategory after tFilteredData -- building up the data
            break
         -- do the same thing for all three columns (they could be handled separately if need be,
         -- but this way the first two cases fall through to the last one, which has the actual code)
         case tLine contains "gs-col-left"
         case tLine contains "gs-col-mid"
         case tLine contains "gs-col-right"
            get  matchtext(tLine,">(.*)<",tMatch) -- if any of the 3 lines we're looking for shows up, grab the right part of it for the data
            put tab &  tMatch after tFilteredData -- add it to the data we're building up
            break
      end switch   
   end repeat
   delete the first char of tFilteredData -- remove extraneous leading cr
   put tFilteredData -- use the now filtered data however you need
You could always dump the first category since the data itself has the category in column 3. (just comment out the part that

After this is run, tFilteredData will contain a tab- and cr-delimited list that appears like so:

gs-fg 30/67 (45%) Field Goals 35/72 (49%)
gs-fg3 6/14 (43%) 3 Points 9/20 (45%)
gs-ft 23/27 (85%) Free Throws 9/16 (56%)
gs-reb 33 Rebounds 31
gs-ass 13 Assists 23
gs-stl 5 Steals 2
gs-blk 2 Blocks 2
gs-tov 10 Turnovers 13
gs-fou 18 (24) Fouls (Fouls On) 24 (17)
gs-pip 34 Points in the Paint 38
gs-scp 4 Second Chance Points 7
gs-pft 14 Points off Turnovers 11
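
To get from that list to the separate text fields asked for in the first post, one possible sketch follows. The command name fillStatFields is hypothetical, and it assumes the card has one pair of fields per row, named after the category, for example "gs-fg-left" and "gs-fg-right":

```livecode
-- Hypothetical command: copies the left and right values of every row
-- into fields named "<category>-left" and "<category>-right".
-- Those fields are assumed to exist already; a missing field causes an error.
command fillStatFields pFilteredData
   local tRow
   set the itemDelimiter to tab
   repeat for each line tRow in pFilteredData
      -- item 1 = category, item 2 = left value, item 3 = label, item 4 = right value
      put item 2 of tRow into field (item 1 of tRow & "-left")
      put item 4 of tRow into field (item 1 of tRow & "-right")
   end repeat
end fillStatFields
```

Calling fillStatFields tFilteredData after the loop above would then put "30/67 (45%)" into field "gs-fg-left" and "35/72 (49%)" into field "gs-fg-right", and so on for the other rows.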

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 12:23 pm
by Klaus
jacque wrote: BTW, congrats on your honorable mention at the conference.
Ma'am???

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 6:24 pm
by jacque
Klaus, that was my way of congratulating you for being commended in the keynote at the conference. (I think it was the keynote, it might have been the concluding session. I was jetlagged and exhausted at the beginning, and sleep deprived at the end, so I forget.) But Kevin was talking about how supportive our community is, and how helpful our members are, and explicitly mentioned your name as one of the foremost forum contributors. He said you had thousands of posts here.

I agree with him, your contributions here are amazing.

Re: Getting text from HTML <div> into LiveCode

Posted: Thu May 30, 2013 9:02 pm
by SparkOut
Yes they are, but did Kevin actually say "Klausimausi"?! :lol:

Re: Getting text from HTML <div> into LiveCode

Posted: Fri May 31, 2013 11:19 am
by Klaus
@Jaqueline
Ah, oh, OK, thanks for the info!

@SparkOut
I DOUBT (for his sake!) :-D

Re: Getting text from HTML <div> into LiveCode

Posted: Fri May 31, 2013 6:03 pm
by mwieder
It wasn't just an honorable mention, it was a stellar example of how supportive the community is, emphasized with an endorsement from <someone in the educational system> about what a wonderful resource Klaus is. And then a search to see how many posts, etc. I *think* it's in the keynote, but don't remember exactly.

Kudos to Klaus.

Re: Getting text from HTML <div> into LiveCode

Posted: Sun Jun 02, 2013 7:33 am
by Simon
Klaus wrote:
@Jaqueline
Ah, oh, OK, thanks for the info!

@SparkOut
I DOUBT (for his sake!)
Klaus,
Please post your replies to the proper forum :)

Simon