Getting text from HTML <div> into LiveCode

aircooled76
Posts: 38
Joined: Mon May 20, 2013 3:14 pm

Getting text from HTML <div> into LiveCode

Post by aircooled76 » Fri May 24, 2013 2:51 pm

Hi Guys,

I am working on a basketball scoreboard controller, and I would like to pull information from a statistics website to use in the scoreboard.

I can get the full HTML code into LiveCode via

Code: Select all

get URL "siteURLhere"
put it into field statinfo
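A slightly more defensive version of that fetch, as a minimal sketch. It assumes LiveCode sets `the result` to an error message when a URL download fails (check the `URL` entry in the dictionary for your version). "siteURLhere" is the placeholder from the post, and the field name is quoted, which is the safer habit for object names.

```livecode
on mouseUp
   local tHtml
   put URL "siteURLhere" into tHtml -- placeholder URL from the original post
   -- assumption: a failed download leaves an error message in the result
   if the result is not empty then
      answer "Download failed:" && the result
      exit mouseUp
   end if
   put tHtml into field "statinfo"
end mouseUp
```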
The statistics I need are in the HTML code below. In the first div, 30/67 (45%) is the Field Goal stat of the first team.

Any suggestions on how to extract the text I need? I would like "30/67 (45%)" to be put in a separate text field, and so on for all the stats...

A

Code: Select all

<div class="game-sum-all-wrap">
	<div class="game-sum-row gs-fg">
		<div class="gs-col-left">30/67 (45%)</div>
		<div class="gs-col-mid">Field Goals</div>
		<div class="gs-col-right">35/72 (49%)</div>
	</div>
	<div class="game-sum-row gs-fg3">
		<div class="gs-col-left">6/14 (43%)</div>
		<div class="gs-col-mid">3 Points</div>
		<div class="gs-col-right">9/20 (45%)</div>
	</div>
	<div class="game-sum-row gs-ft">
		<div class="gs-col-left">23/27 (85%)</div>
		<div class="gs-col-mid">Free Throws</div>
		<div class="gs-col-right">9/16 (56%)</div>
	</div>
	<div class="game-sum-row gs-reb">
		<div class="gs-col-left">33</div>
		<div class="gs-col-mid">Rebounds</div>
		<div class="gs-col-right">31</div>
	</div>
	<div class="game-sum-row gs-ass">
		<div class="gs-col-left">13</div>
		<div class="gs-col-mid">Assists</div>
		<div class="gs-col-right">23</div>
	</div>
	<div class="game-sum-row gs-stl">
		<div class="gs-col-left">5</div>
		<div class="gs-col-mid">Steals</div>
		<div class="gs-col-right">2</div>
	</div>
	<div class="game-sum-row gs-blk">
		<div class="gs-col-left">2</div>
		<div class="gs-col-mid">Blocks</div>
		<div class="gs-col-right">2</div>
	</div>
	<div class="game-sum-row gs-tov">
		<div class="gs-col-left">10</div>
		<div class="gs-col-mid">Turnovers</div>
		<div class="gs-col-right">13</div>
	</div>
	<div class="game-sum-row gs-fou">
		<div class="gs-col-left">18 (24)</div>
		<div class="gs-col-mid">Fouls (Fouls On)</div>
		<div class="gs-col-right">24 (17)</div>
	</div>
	<div class="game-sum-row gs-pip">
		<div class="gs-col-left">34</div>
		<div class="gs-col-mid">Points in the Paint</div>
		<div class="gs-col-right">38</div>
	</div>
	<div class="game-sum-row gs-scp">
		<div class="gs-col-left">4</div>
		<div class="gs-col-mid">Second Chance Points</div>
		<div class="gs-col-right">7</div>
	</div>
	<div class="game-sum-row gs-pft">
		<div class="gs-col-left">14</div>
		<div class="gs-col-mid">Points off Turnovers</div>
		<div class="gs-col-right">11</div>

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Getting text from HTML <div> into LiveCode

Post by Klaus » Fri May 24, 2013 3:54 pm

Hi aircooled76,

you need to "parse" the data and remove everything that is NOT what you want :-D

Here is one solution to your problem:

Code: Select all

on mouseUp

  ## I copied your example data into a field for my example
  ## replace this line with: put URL(URL_to_get) into tData
  put fld 1 into tData
  
  ## 1. Since TEXT <> HTML e.g.:
  ## <p>ddfdf</p><p>iiiii</p>
  ## is ONE line of text but TWO lines of displayed HTML
  ## For my solution we need to make sure ALL DIVs are in a separate line:
  replace "<div" with (CR & "<div") in tData
  put "<div class=" & q("gs-col-left") & ">" into tDivLine
  
  ## 2. Now we FILTER all the lines of tData with tDivLine
  ## so in the end tData will be a CR delimited list with only these divs 
  ## at the beginning of each line
  filter tData with (tDivLine & "*")
  
  ## 3. Remove all HTML code:
  ## starting DIV TAG
  replace tDivLine with "" in tData
  
  ## ending DIV TAG
  replace "</div>" with "" in tData
  put tData into fld 2
end mouseUp

function q tString
  return QUOTE & tString & QUOTE
end q
Tested and works :-)
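Klaus's handler leaves only the left-column values in tData, one per line, in the same order as the rows of the HTML. To get each value into its own field, as the original question asks, something like the following could be called instead of (or after) `put tData into fld 2`. This is only a sketch: the handler name `distributeStats` is hypothetical, and it assumes the card has one field per stat whose names you list in row order.

```livecode
-- Hypothetical helper: puts line i of pStatList into the field
-- named by item i of the comma-delimited list pFieldNames
on distributeStats pStatList, pFieldNames
   local i
   repeat with i = 1 to the number of items of pFieldNames
      put line i of pStatList into field (item i of pFieldNames)
   end repeat
end distributeStats
```

For example, `distributeStats tData, "fgTeam1,fg3Team1,ftTeam1"` would fill three fields with those (hypothetical) names from the first three lines.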

Look up all unknown terms like FILTER etc. in the dictionary to get more info.

Here are some great learning resources for getting to grips with the basics of LiveCode:
http://www.hyperactivesw.com/revscriptc ... ences.html


Best

Klaus

P.S:
We can create a function that will extract ANY div tag you need, if you like.
Just drop a line here...
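Along the lines of that P.S., here is one possible shape for such a function. It is a sketch, not something Klaus posted: `getDivTexts` is a hypothetical name, and it walks the HTML with `offset`, using its third "chars to skip" parameter, returning the contents of every div whose opening tag is exactly `<div class="...">` with the given class. It assumes that exact tag spelling (no extra attributes or spaces) and that the matching divs contain no nested divs, because it stops at the first `</div>`.

```livecode
-- Hypothetical helper: returns the contents of every <div class="pClass">...</div>,
-- one per line. Assumes that exact tag spelling and no nested divs inside a match.
function getDivTexts pHtml, pClass
   local tOpenTag, tSkip, tPos, tFrom, tTo, tResult
   put "<div class=" & quote & pClass & quote & ">" into tOpenTag
   put 0 into tSkip
   repeat
      -- offset() returns a position relative to the tSkip chars it was told to skip
      put offset(tOpenTag, pHtml, tSkip) into tPos
      if tPos is 0 then exit repeat
      put tSkip + tPos + length(tOpenTag) into tFrom -- first char after the opening tag
      put offset("</div>", pHtml, tFrom - 1) into tPos
      if tPos is 0 then exit repeat
      put tFrom + tPos - 2 into tTo -- last char before "</div>"
      put (char tFrom to tTo of pHtml) & cr after tResult
      put tTo into tSkip -- keep searching after this div's contents
   end repeat
   if tResult is not empty then delete the last char of tResult -- drop trailing cr
   return tResult
end getDivTexts
```

With the posted sample, `put getDivTexts(tData, "gs-col-right") into fld 2` would list the second team's column in the same way Klaus's handler lists the first.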

icouto
VIP Livecode Opensource Backer
Posts: 92
Joined: Wed May 29, 2013 1:54 am

Re: Getting text from HTML <div> into LiveCode

Post by icouto » Wed May 29, 2013 3:25 am

Here is a one-line function that will remove all html tags from a string:

Code: Select all

function removeHTML pString
   return replaceText(pString,"<[^<]+>",empty)
end removeHTML
Removing the html tags makes it easier for you to then parse the text and find the information you want.

LiveCode's built-in replaceText function is very powerful because it uses Regular Expressions, which are an advanced topic in their own right.

A Regular Expression (or Regex) defines a way for you to select text by using a standard 'formula'. The regex that is used in the replaceText function above is: "<[^<]+>". This means, literally: "select a chunk of text which starts with '<', then one or more characters that are not '<', and finally a closing '>'." If you'd like to learn more about Regular Expressions, google 'Regular Expression Tutorial' - I'd post a link, but apparently my account does not allow me to do it!

In any case, be warned: Regular Expressions are complicated stuff! ;-)
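To turn the stripped text into something usable for the posted sample, the tag-free lines still need their leading tabs trimmed and the blank lines dropped. A minimal sketch, assuming the HTML is laid out one div per line as in the sample. Note that it uses the pattern `<[^>]+>`, a common variant of icouto's pattern that ends each match at the first `>` instead of excluding `<`; the function name `htmlToStatLines` is hypothetical.

```livecode
-- Hypothetical helper: strips tags, trims each line and drops blank lines
function htmlToStatLines pHtml
   local tText, tLine, tClean, tResult
   put replaceText(pHtml, "<[^>]+>", empty) into tText -- remove every <...> tag
   repeat for each line tLine in tText
      put word 1 to -1 of tLine into tClean -- trims leading/trailing spaces and tabs
      if tClean is not empty then put tClean & cr after tResult
   end repeat
   if tResult is not empty then delete the last char of tResult -- drop trailing cr
   return tResult
end htmlToStatLines
```

For the posted sample, each stat row comes out as three consecutive lines: the left value, the label, and the right value.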

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Getting text from HTML <div> into LiveCode

Post by Klaus » Wed May 29, 2013 11:39 am

icouto wrote:In any case, be warned: Regular Expressions are complicated stuff! ;-)
Yep, that's why I avoid them :-D

jacque
VIP Livecode Opensource Backer
Posts: 7389
Joined: Sat Apr 08, 2006 8:31 pm

Re: Getting text from HTML <div> into LiveCode

Post by jacque » Wed May 29, 2013 7:46 pm

Another way to remove html is:

Code: Select all

set the htmltext of the templatefield to htmlVar -- variable contains the html string
put the text of the templatefield into tPlainText
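Those two lines can be wrapped in a reusable function. A sketch only: the name `htmlToPlain` is hypothetical, and the `reset` line puts the templateField back to its defaults so later `create field` calls are not affected. Since a field's htmlText understands only LiveCode's own subset of HTML, don't expect div/CSS structure to survive.

```livecode
-- Hypothetical helper: lets a (template) field parse the HTML, then reads back plain text
function htmlToPlain pHtml
   local tPlain
   set the htmlText of the templateField to pHtml
   put the text of the templateField into tPlain
   reset the templateField -- restore defaults for any later "create field"
   return tPlain
end htmlToPlain
```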
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Getting text from HTML <div> into LiveCode

Post by Klaus » Thu May 30, 2013 12:00 am

jacque wrote:Another way to remove html is:

Code: Select all

set the htmltext of the templatefield to htmlVar -- variable contains the html string
put the text of the templatefield into tPlainText
Yes, Ma'am, but that was not the question :-D

And won't work with DIV/CSS either... :cool:

jacque
VIP Livecode Opensource Backer
Posts: 7389
Joined: Sat Apr 08, 2006 8:31 pm

Re: Getting text from HTML <div> into LiveCode

Post by jacque » Thu May 30, 2013 12:14 am

Yeah, you're right. icouto and I got off-track, but maybe someone will find it useful.

BTW, congrats on your honorable mention at the conference. You're a big help here, Klaus. :)

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Getting text from HTML <div> into LiveCode

Post by sturgis » Thu May 30, 2013 1:43 am

Here's an alternate method, though it assumes the data is formatted as perfectly as the sample you posted.

Code: Select all

on mouseUp
   put field 1 into tData -- I put your sample data into a field. You'd get it from wherever you already get it, of course.
   repeat for each line tLine in tData -- cycle through the data
      switch
         case tLine contains "game-sum-row " -- if it's a sum row, use its class name as the category
            put the last word of tLine into tCategory
            -- the last word still ends with the closing quote and ">" (e.g. gs-fg">),
            -- so delete those two chars to leave just the class name
            delete char -2 to -1 of tCategory
            put cr & tCategory after tFilteredData -- start a new row of the data
            break
         -- do the same thing for all three columns (they could be separated out if need be),
         -- but this way each case falls through to the last one, which has the actual code
         case tLine contains "gs-col-left"
         case tLine contains "gs-col-mid"
         case tLine contains "gs-col-right"
            get matchText(tLine, ">(.*)<", tMatch) -- grab the text between the tag's ">" and "</div>"
            put tab & tMatch after tFilteredData -- add it to the row we're building up
            break
      end switch
   end repeat
   delete the first char of tFilteredData -- remove the extraneous leading cr
   put tFilteredData -- with no destination this shows it in the Message Box; use the data however you need
end mouseUp
After this is run, tFilteredData will contain a tab- and CR-delimited list that looks like this:

gs-fg 30/67 (45%) Field Goals 35/72 (49%)
gs-fg3 6/14 (43%) 3 Points 9/20 (45%)
gs-ft 23/27 (85%) Free Throws 9/16 (56%)
gs-reb 33 Rebounds 31
gs-ass 13 Assists 23
gs-stl 5 Steals 2
gs-blk 2 Blocks 2
gs-tov 10 Turnovers 13
gs-fou 18 (24) Fouls (Fouls On) 24 (17)
gs-pip 34 Points in the Paint 38
gs-scp 4 Second Chance Points 7
gs-pft 14 Points off Turnovers 11

You could always drop the first column (the class name), since the data itself has the category name in column 3. (just comment out the part that
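Once tFilteredData is built, its tab-delimited rows can be loaded into an array keyed by the class name, so each stat can be placed by name rather than by line number. A sketch with a hypothetical function name `statsToArray`: column 2 holds the gs-col-left values and column 4 the gs-col-right values.

```livecode
-- Hypothetical helper: builds an array keyed by category (item 1 of each row)
-- whose values come from the given tab-delimited column of the filtered data
function statsToArray pFilteredData, pColumn
   local tStats, tRow
   set the itemDelimiter to tab -- rows are tab-delimited; this setting only lasts for this handler
   repeat for each line tRow in pFilteredData
      put item pColumn of tRow into tStats[item 1 of tRow]
   end repeat
   return tStats
end statsToArray
```

For example, `put statsToArray(tFilteredData, 2) into tLeft` followed by `put tLeft["gs-fg"] into field "fgTeam1"` (a hypothetical field name) would show the first team's field goals.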

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Getting text from HTML <div> into LiveCode

Post by Klaus » Thu May 30, 2013 12:23 pm

jacque wrote:BTW, congrats on your honorable mention at the conference.
Ma'am???

jacque
VIP Livecode Opensource Backer
Posts: 7389
Joined: Sat Apr 08, 2006 8:31 pm

Re: Getting text from HTML <div> into LiveCode

Post by jacque » Thu May 30, 2013 6:24 pm

Klaus, that was my way of congratulating you for being commended in the keynote at the conference. (I think it was the keynote; it might have been the concluding session. I was jetlagged and exhausted at the beginning, and sleep deprived at the end, so I forget.) But Kevin was talking about how supportive our community is, and how helpful our members are, and explicitly mentioned your name as one of the foremost forum contributors. He said you had thousands of posts here.

I agree with him, your contributions here are amazing.

SparkOut
Posts: 2943
Joined: Sun Sep 23, 2007 4:58 pm

Re: Getting text from HTML <div> into LiveCode

Post by SparkOut » Thu May 30, 2013 9:02 pm

Yes they are, but did Kevin actually say "Klausimausi"?! :lol:

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Getting text from HTML <div> into LiveCode

Post by Klaus » Fri May 31, 2013 11:19 am

@Jacqueline
Ah, oh, OK, thanks for the info!

@SparkOut
I DOUBT (for his sake!) :-D

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Re: Getting text from HTML <div> into LiveCode

Post by mwieder » Fri May 31, 2013 6:03 pm

It wasn't just an honorable mention, it was a stellar example of how supportive the community is, emphasized with an endorsement from <someone in the educational system> about what a wonderful resource Klaus is. And then a search to see how many posts, etc. I *think* it's in the keynote, but don't remember exactly.

Kudos to Klaus.

Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Re: Getting text from HTML <div> into LiveCode

Post by Simon » Sun Jun 02, 2013 7:33 am

Klaus wrote:
@Jacqueline
Ah, oh, OK, thanks for the info!

@SparkOut
I DOUBT (for his sake!)
Klaus,
Please post your replies to the proper forum :)

Simon
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!
