Downloading dynamic web pages
Posted: Wed Oct 12, 2022 11:18 am
by danielrr
Most web pages today are dynamic (and when they are not, I always feel some nostalgia for the lost golden years of the Internet).
Anyway, when I try to download one of these dynamic pages (whether or not they are generated by PHP) using LC (with get URL or libURLDownloadToFile), I usually get nothing but rubbish: either a page sending me somewhere I can upgrade my browser (the one I'm not using), like http://browsehappy.com/, or a generic page from some company.
Is there a way to retrieve one of these dynamic pages using LC?
Re: Downloading dynamic web pages
Posted: Wed Oct 12, 2022 3:18 pm
by FourthWorld
WP's Browse Happy page seems to be a redirect destination when JavaScript isn't enabled. Do you have a URL to an example of a page that doesn't download well? The issue may be solvable by setting the User-Agent header, or it may require JS XHR, but telling which is the case will require seeing a page that exhibits the problem.
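To tell whether a download came back as this kind of fallback page rather than the real content, a rough heuristic check might look like the Python sketch below. The marker patterns (a browsehappy.com link, a meta refresh tag such as those often placed inside noscript) are assumptions chosen for illustration, not a complete list.

```python
import re

# Heuristic markers suggesting the server sent a "no JavaScript" or
# "unsupported browser" fallback page. These patterns are illustrative
# assumptions, not an exhaustive list.
FALLBACK_PATTERNS = [
    re.compile(r"browsehappy\.com", re.IGNORECASE),
    re.compile(r"<meta[^>]+http-equiv=[\"']?refresh", re.IGNORECASE),
]


def looks_like_fallback(html: str) -> bool:
    """Return True if the downloaded HTML resembles a JS/browser fallback page."""
    return any(pattern.search(html) for pattern in FALLBACK_PATTERNS)


if __name__ == "__main__":
    sample = '<p>Please <a href="http://browsehappy.com/">upgrade your browser</a>.</p>'
    print(looks_like_fallback(sample))  # True
```

A hit only means the page is probably a fallback; a miss does not prove the real content is present.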
Re: Downloading dynamic web pages
Posted: Wed Oct 12, 2022 3:23 pm
by Emily-Elizabeth
You might want to try setting the User-Agent header with libURLSetCustomHTTPHeaders (if I'm reading the docs correctly, this should work).
This is the one from my Safari:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15
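In LC the header would be set with libURLSetCustomHTTPHeaders, as suggested above. For comparison, here is a minimal Python sketch of the same idea using the standard urllib module: the request carries the Safari User-Agent string quoted above. The function names are hypothetical. Note that this only helps when the server decides what to send based on the User-Agent; no JavaScript is executed either way.

```python
import urllib.request

# The Safari User-Agent string quoted above; any real browser's string would do.
SAFARI_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Safari/605.1.15"
)


def build_request(url: str, user_agent: str = SAFARI_UA) -> urllib.request.Request:
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = SAFARI_UA, timeout: float = 30.0) -> str:
    """Download the page body as text. No JavaScript is run on the result."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Comparing the output of fetch() with and without a browser User-Agent is a quick way to see whether the server sniffs the client at all.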
Re: Downloading dynamic web pages
Posted: Thu Oct 13, 2022 1:18 am
by danielrr
Thanks FourthWorld, Emily-Elizabeth. An example of a page that doesn't download well may be this one:
https://www.dicciogriego.es/index.php#l ... =801&n=801
OTOH, regarding Emily-Elizabeth's suggestion, I'd appreciate an example of the actual code used in a script to set the User-Agent header to something server-savvy like "Mozilla/5.0 ..." and so on.
Re: Downloading dynamic web pages
Posted: Thu Oct 13, 2022 3:11 am
by FourthWorld
Thank you for that URL. When I use "get url..." to see what the server sends, what I get seems identical to what a web browser gets if JavaScript is turned off.
With some pages that use JS to load content dynamically, I might be inclined to suggest hunting for the one line that sits at the heart of what you're looking for, and making a URL out of that. But alas, this page is not a simple one, and its use of the Bootstrap framework makes it even less so.
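On simpler pages, that hunt can start by scanning the HTML and JS source for quoted strings that look like server endpoints. The Python sketch below does this with a regular expression; the ".php", "/api/" and "ajax" hints are assumptions for illustration, and real sites may build their request URLs in ways a regex will not catch.

```python
import re

# Heuristic: quoted strings in HTML/JS that look like server endpoints.
# The ".php", "/api/" and "ajax" hints are assumptions; adjust per site.
CANDIDATE = re.compile(
    r"""["']([^"'\s]*(?:\.php|/api/|ajax)[^"'\s]*)["']""", re.IGNORECASE
)


def candidate_endpoints(source: str) -> list[str]:
    """Return unique endpoint-like strings in the order they first appear."""
    seen: dict[str, None] = {}
    for match in CANDIDATE.finditer(source):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Each candidate can then be tried directly (for example with get url in LC) to see whether it returns the data without any JavaScript.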
At the heart of this there may be a mismatch between intention and implementation on the part of the organization running that site. I don't read Spanish, so my interpretation may be off, but they make their dictionary lookup web app freely available, and go further by apparently licensing it under Creative Commons so others can benefit from the work. But then they provide no obvious means by which anyone could do so.
I would have expected a team using smart tooling to deliver such smart work to also provide an API so folks like you could easily grab the portion you're looking for. And perhaps they're considering it; the site is marked as "Under Construction", and maybe when it's done they'll add extras like an API for other apps to access their dictionary.
At this point the most expedient suggestion I have is to write to them and ask whether they have a REST API or intend to offer one. If they do, post the URL to that API's documentation here and we can show you how to use it with LC.
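For a sense of what using such an API usually involves, here is a Python sketch: build a query URL with the search term percent-encoded, download it, and decode the JSON response. The endpoint and the "q" parameter are hypothetical placeholders, since the site is not known to publish an API; the real names would come from their documentation.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint and parameter name, for illustration only.
API_BASE = "https://api.example.org/v1/lookup"


def build_lookup_url(word: str, base: str = API_BASE) -> str:
    """Return base?q=<word>, with the word percent-encoded as UTF-8."""
    return base + "?" + urllib.parse.urlencode({"q": word})


def parse_lookup(body: str) -> dict:
    """Decode a JSON response body into a dict."""
    return json.loads(body)


def lookup(word: str, timeout: float = 30.0) -> dict:
    """Query the (hypothetical) API and return the decoded JSON."""
    with urllib.request.urlopen(build_lookup_url(word), timeout=timeout) as resp:
        return parse_lookup(resp.read().decode("utf-8"))
```

The appeal of an API is that the response is structured data meant for programs, so no page scraping or JavaScript is involved.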
If they have no API and no plans to implement one, you will have a much greater challenge. Are you up for learning Bootstrap?
Re: Downloading dynamic web pages
Posted: Fri Oct 14, 2022 9:52 am
by danielrr
Thanks for the reply. Yes, I'm not an expert, but I do have some experience with Bootstrap. And yet I wouldn't know how to solve the problem. Actually, this problem seems to appear on most of the PHP sites I'm finding in a quick survey. For instance, this one (a much simpler page):
https://logeion.uchicago.edu/κάναβος (the problem is the same if you encode the URL
https://logeion.uchicago.edu/%CE%BA%CE% ... F%89%CF%82 )
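The two forms of the URL are the same address: the encoded one just writes each UTF-8 byte of the Greek word as %XX, so encoding alone would not be expected to change what the server sends. A small Python sketch of that encoding (the helper name is hypothetical):

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Percent-encode a path segment as UTF-8, escaping '/' as well."""
    return quote(segment, safe="")


word = "κάναβος"
encoded = encode_path_segment(word)
url = "https://logeion.uchicago.edu/" + encoded
```

Decoding with unquote() gives back the original word, which is why the server treats both URLs identically.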
Re: Downloading dynamic web pages
Posted: Fri Oct 14, 2022 3:44 pm
by FourthWorld
PHP is server-side. If you run your browser with a JavaScript blocker you'll see the same thing you download with anything that doesn't run JavaScript as it loads the page, like LC's "get url" command.
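One way to see this in a downloaded page is to list its script tags: those are what a browser would run to fill in the content, and what "get url" leaves unexecuted. A Python sketch using the standard html.parser module (the class and function names are hypothetical):

```python
from html.parser import HTMLParser


class ScriptFinder(HTMLParser):
    """Collect external <script src> URLs and count inline <script> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[str] = []
        self.inline_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            else:
                self.inline_count += 1


def scripts_in(html: str) -> tuple[list[str], int]:
    """Return (external script URLs, number of inline scripts)."""
    finder = ScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.external, finder.inline_count
```

If the page body is mostly empty containers plus several scripts, the content is being assembled client-side, and a plain download will not contain it.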