Screen scraping or interpreting data from a website

Post by Jordy » Sun Aug 21, 2016 6:10 pm

Hi,

Is there a way to access the text on a website so that I can analyze and work with it? For instance, I'm trying to figure out what information a web form is asking for.

I'm not sure what capabilities LiveCode already has. Worst case, I was considering getting the page source from the website and parsing it to interpret the page.


THANKS

Post by Mikey » Mon Aug 22, 2016 2:44 pm

I typically use other tools to do the scrape, and then LC to analyze it, but you can use "put url" to get the data from a URL.
I haven't tried it yet, because I only noticed this last night, but the source for the browser widget is available in LiveCode 8, right in the application bundle, so my long-delayed dream of using LC to scrape directly might be closer once I see what the source is doing...

Post by FourthWorld » Mon Aug 22, 2016 5:03 pm

What tools do you use for scraping, Mikey?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post by Mikey » Mon Aug 22, 2016 5:16 pm

There are many, many tools. As I mentioned, I have used "put url" in LC, but if I'm doing a big scrape (think hundreds of thousands of records), the one I like best is a plugin for Chrome called "Web Scraper", from Martins Balodis. It takes a little fiddling, but once you get it set up, it works great, even when scraping huge sites, and it lets you set the delay between pages so that you don't piss off the operator by burning their pipe down. Martins has both a paid and a free version. The paid version runs on one of his servers, the free version on your machine. When you're done you end up with a CSV file. You can have multiple scrapes going in different tabs at the same time. Note that if you are trying to do a big scrape on a single URL, breaking it into sections can be tricky, but if not, then you're grinning.

I have also paid him to write a custom scrape configuration, in the case where things were more complicated than what I was able to figure out. The price was cheap, I thought, and after I saw it, I gained new insight into how to use the tool, so now I'm generally able to write my own scripts without much difficulty, even for the most complex sites.
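
For smaller jobs done with "put url" in LC itself, the same courtesy of pausing between pages can be sketched roughly like this. The handler name scrapePages and the 2-second delay are made up for illustration, and pURLList is assumed to hold one URL per line:

```livecode
-- Sketch: fetch a list of pages with a pause after each successful request.
-- scrapePages and the 2-second delay are illustrative, not from any library.
command scrapePages pURLList
   local tURL, tPage, tError
   repeat for each line tURL in pURLList
      put url tURL into tPage
      put the result into tError -- empty when the fetch succeeded
      if tError is not empty then
         -- report the failure in the message box and move on
         put "Failed:" && tURL && "(" & tError & ")" & return after msg
         next repeat
      end if
      -- analyze tPage here, e.g. save it or pull values out of it
      wait 2 seconds with messages -- pause without freezing the UI
   end repeat
end scrapePages
```

Calling scrapePages with a field's contents (one URL per line) would walk the list one page at a time; tune the delay to whatever the site can reasonably tolerate.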

Post by jacque » Mon Aug 22, 2016 6:20 pm

This will give you the plain text:

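Code:

-- tURL holds the address of the page to fetch
put url tURL into tData
-- The templateField interprets the HTML, so its text is the page without tags
set the htmlText of the templateField to tData
put the text of the templateField into tPlainText
reset the templateField -- restore defaults for fields created later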
Now you can parse the plain text to see what's there.
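
If the goal is to see what a form is asking for, the field names live in the HTML tags themselves, which the plain-text conversion will most likely drop. A rough sketch that works on the raw source in tData instead: listInputNames is a hypothetical helper, it assumes quoted name attributes, and it only looks at <input> tags (not <select> or <textarea>):

```livecode
-- Sketch: list the name attribute of each <input> tag in an HTML string.
-- listInputNames is a hypothetical helper, not a built-in function.
function listInputNames pHTML
   local tNames, tTag, tName, tPattern
   -- Matches a chunk that starts with "input" and captures a quoted name value
   put "(?i)^input\b[^>]*\sname\s*=\s*[" & quote & "']([^" & quote & "'>]*)" into tPattern
   -- Splitting on "<" makes every tag start its own "line"
   set the lineDelimiter to "<"
   repeat for each line tTag in pHTML
      if matchText(tTag, tPattern, tName) then
         put tName & return after tNames
      end if
   end repeat
   return tNames
end listInputNames
```

With the page source already in tData, `put listInputNames(tData)` would show one field name per line in the message box. Real-world HTML is messy, so treat this as a starting point rather than a full parser.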
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

Post by Mikey » Mon Aug 22, 2016 7:27 pm

By the way, when I say the source for the browser widget, I mean the LCB source, not C++ or Objective-C, for those who are wondering. The reason to NOT use the "put url" technique is for cases where there is a JS framework that has to be executed in the browser as well. In many of those cases, the data is not retrieved when you retrieve the page source. In those cases, the meat, i.e. the data, has to be pulled separately by the browser. In those cases, you can either read through the JS to figure out how to write the code to get what you want, or you can use a browser to get the net result and pull data from that.
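
A minimal sketch of the "use a browser to get the net result" route with the LiveCode 8 browser widget. It assumes a browser widget named "Browser" on the card, and relies on my understanding that the widget sends browserDocumentLoadComplete when a page finishes loading and that "do ... in widget" hands the JavaScript value back in the result:

```livecode
-- Sketch: grab the rendered page from a browser widget once it has loaded.
-- Assumes a browser widget named "Browser" exists on the card; this handler
-- goes in the widget's script or anywhere in its message path.
on browserDocumentLoadComplete pURL
   local tRenderedHTML
   -- Run JavaScript in the page; its value comes back in "the result"
   do "document.documentElement.outerHTML" in widget "Browser"
   put the result into tRenderedHTML
   -- tRenderedHTML now reflects the DOM at this moment, scripts included
   put the number of chars of tRenderedHTML && "characters of rendered HTML from" && pURL
end browserDocumentLoadComplete
```

Pages that keep loading data after the document itself is complete may need a short wait or a retry before the content you want shows up in the DOM.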

Post by FourthWorld » Mon Aug 22, 2016 7:33 pm

Mikey wrote: The reason to NOT use the "put url" technique is for cases where there is a JS framework that has to be executed in the browser as well. In many of those cases, the data is not retrieved when you retrieve the page source. In those cases, the meat, i.e. the data, has to be pulled separately by the browser. In those cases, you can either read through the JS to figure out how to write the code to get what you want, or you can use a browser to get the net result and pull data from that.
Given the growing need for JS blockers like NoScript, the prudent business owner should consider content requiring JS to be a bug.
