Screen scraping or interpreting data from a website
Screen scraping or interpreting data from a website
Hi,
Is there a way to access the text on a website so that I can analyze and work with it? For instance, I'm trying to figure out what information a web form is asking for.
I'm not sure what capabilities LiveCode already has. In the worst case, I was considering downloading the page's source code and parsing it myself to interpret the website.
THANKS
Re: Screen scraping or interpreting data from a website
I typically use other tools to do the scrape and then LC to analyze the results, but you can use "put url" to get the data from a URL.
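A minimal sketch of the "put url" approach with basic error handling. The function name fetchPage, the field "Source", and the example address are hypothetical. The detail worth copying is that LiveCode reports a failed fetch in "the result" rather than raising an error, so the sketch captures it immediately and turns it into a throw. Both handlers can live in one button script.

```livecode
-- Hypothetical helper: return the raw source of a page, or throw on failure.
function fetchPage pURL
   local tData, tError
   put url pURL into tData
   -- a failed fetch is reported in "the result", so capture it right away
   put the result into tError
   if tError is not empty then
      throw "Could not fetch" && pURL & ":" && tError
   end if
   return tData
end fetchPage

-- Example use: show the page source in a (hypothetical) field named "Source"
on mouseUp
   try
      put fetchPage("https://example.com/") into field "Source"
   catch tErr
      answer tErr
   end try
end mouseUp
```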
I haven't tried it yet because I only noticed this last night, but the source for the browser widget ships with LC 8 right in the application bundle, so my long-delayed dream of using LC to scrape directly might be closer once I see what the source is doing...
- VIP Livecode Opensource Backer
- Posts: 9823
- Joined: Sat Apr 08, 2006 7:05 am
- Location: Los Angeles
Re: Screen scraping or interpreting data from a website
What tools do you use for scraping, Mikey?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: Screen scraping or interpreting data from a website
There are many, many tools. As I mentioned, I have used "put url" in LC, but if I'm doing a big scrape (think hundreds of thousands of records), the one I like best is a Chrome extension called "Web Scraper", from Martins Balodis. It takes a little fiddling, but once you get it set up it works great, even on huge sites, and it lets you set the delay between pages so that you don't piss off the operator by burning their pipe down. Martins has both a paid and a free version: the paid version runs on one of his servers, the free version on your machine. When you're done, you end up with a CSV file. You can have multiple scrapes going in different tabs at the same time. Note that if you are trying to do a big scrape on a single URL, breaking it into sections can be tricky, but if not, you're grinning.
I have also paid him to write a custom scrape configuration for a case that was more complicated than I could figure out. The price was cheap, I thought, and after I saw his configuration I gained new insight into how to use the tool, so now I'm generally able to write my own scripts without much difficulty, even for the most complex sites.
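The Web Scraper extension handles the pacing for you. For a smaller job inside LC, the same idea (one page at a time, with a pause between requests so you don't hammer the site) can be sketched with "put url" and "wait". This is not how the extension works internally; crawlPolitely and its parameters are hypothetical, and the output is tab-delimited rather than CSV, which sidesteps CSV quoting rules.

```livecode
-- Hypothetical sketch: fetch each URL in pURLs (one per line), pausing
-- pDelaySeconds between requests, and return one tab-delimited row per page.
function crawlPolitely pURLs, pDelaySeconds
   local tURL, tSource, tError, tRows
   repeat for each line tURL in pURLs
      put url tURL into tSource
      put the result into tError -- empty on success
      if tError is empty then
         -- placeholder row: URL and page size; replace with your own parsing
         put tURL & tab & (the number of chars of tSource) & return after tRows
      else
         put tURL & tab & "ERROR:" && tError & return after tRows
      end if
      -- pause between requests, keeping the UI responsive meanwhile
      wait pDelaySeconds seconds with messages
   end repeat
   if tRows is not empty then delete the last char of tRows
   return tRows
end crawlPolitely
```

For example, crawlPolitely(field "URLs", 2) would fetch each URL listed in a (hypothetical) field named "URLs", two seconds apart.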
- VIP Livecode Opensource Backer
- Posts: 7228
- Joined: Sat Apr 08, 2006 8:31 pm
- Location: Minneapolis MN
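Re: Screen scraping or interpreting data from a website
This will give you the plain text:
put url tURL into tData -- tURL holds the page's address
set the htmltext of the templateField to tData -- the field engine interprets the markup
put the text of the templateField into tPlainText -- the text without the tags
reset the templateField -- restore defaults so fields created later aren't affected
Now you can parse the plain text to see what's there.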
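For the form question that started this thread, the plain text may not be enough: as far as I know, a field's htmlText only understands a small set of tags, so form controls such as <input> disappear in the conversion. A sketch that works on the raw source instead and lists the name attribute of each input, select, and textarea tag. The function name listFormFieldNames is hypothetical, and a regex pass like this is a rough heuristic, not a real HTML parser; it will also miss controls that are created by JavaScript.

```livecode
-- Hypothetical sketch: return the name attribute of each form control
-- (input, select, textarea) found in raw HTML, one name per line.
function listFormFieldNames pHTML
   local tLine, tName, tNames, tPattern
   -- matches name="..." or name='...' and captures the value
   put "(?i)\sname\s*=\s*[" & quote & "']([^" & quote & "']*)" into tPattern
   -- tags may span several source lines, so flatten line breaks first
   replace numToChar(13) with space in pHTML
   replace return with space in pHTML
   -- then put every tag on its own line so each one can be tested separately
   replace "<" with return & "<" in pHTML
   repeat for each line tLine in pHTML
      if matchText(tLine, "(?i)^<(input|select|textarea)\b") then
         if matchText(tLine, tPattern, tName) then
            put tName & return after tNames
         end if
      end if
   end repeat
   if tNames is not empty then delete the last char of tNames
   return tNames
end listFormFieldNames
```

For example, listFormFieldNames(url tURL) returns the field names a form on that page asks for; controls without a name attribute (such as most submit buttons) are skipped.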
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: Screen scraping or interpreting data from a website
By the way, when I say the source for the browser widget, I mean the LCB (LiveCode Builder) source, not the C++ or Objective-C, for those who are wondering. The reason NOT to use the "put url" technique is when the page uses a JS framework that has to be executed in the browser. In many of those cases, the data is not included when you retrieve the page source; the meat, i.e. the data, is pulled separately by the browser. Then you can either read through the JS to figure out how to write code that requests what you want, or you can let a browser render the page and pull the data from the result.
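For the "use a browser to get the net result" route, a sketch using the browser widget. It assumes the "do ... in widget" form for running JavaScript in a browser widget (with the value coming back in "the result") and the browserDocumentLoadComplete message the widget sends, both of which I believe are available from LiveCode 8 on; check the dictionary for your version. The widget name "Browser", the field "Rendered", and the URL are hypothetical. Data the page fetches after it finishes loading may not be in the DOM yet when this message arrives, so you may need to wait or poll.

```livecode
-- Hypothetical card script: load a page in a browser widget, then copy the
-- DOM as it stands after the page's JavaScript has run.
on openCard
   set the url of widget "Browser" to "https://example.com/"
end openCard

on browserDocumentLoadComplete pURL
   -- serialize the current (script-modified) DOM rather than the original source
   do "document.documentElement.outerHTML" in widget "Browser"
   put the result into field "Rendered"
end browserDocumentLoadComplete
```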
Re: Screen scraping or interpreting data from a website
Mikey wrote:
The reason to NOT use the "put url" technique is for cases where there is JS framework that has to be executed in the browser, as well. In many of those cases, the data is not retrieved when you retrieve the page source. In those cases, the meat, i.e. the data has to be separately pulled by the browser. In those cases, you can either read through the JS to figure out how to write the code to get what you want, or you can use a browser to get the net result and pull data from that.
Given the growing need for JS blockers like NoScript, the prudent business owner should consider content requiring JS to be a bug.
Richard Gaskin