Getting image URLs out of an HTML page

Post by **andyh1234** » Sun Sep 17, 2023 9:54 pm

Hi,

Does anyone have code to extract the image URLs from an HTML page and put them in an array?

I'm looking to write a small piece of code that downloads all of the images on a particular (customisable) web page. I can write the code to download them, but I'm struggling with parsing the HTML to extract just the src attributes of the img tags.
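One way to do the parsing step is to use an HTML parser rather than pattern matching. The sketch below is in Python (not LiveCode) and uses only the standard library: `html.parser.HTMLParser` reports each start tag with its attributes, the `src` of every `<img>` is appended to a list, and `urllib.parse.urljoin` turns relative paths into absolute URLs that can then be downloaded. The names `ImgSrcCollector`, `image_urls` and the `base_url` parameter are illustrative assumptions, and the sketch deliberately ignores `srcset`, `<picture>` sources and CSS background images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # illustrative: page URL used to resolve relative paths
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value)
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:  # value is None for a bare <img src>
                self.urls.append(urljoin(self.base_url, value))
    # The default handle_startendtag calls handle_starttag, so <img ... /> is covered too.


def image_urls(html, base_url=""):
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = '<p><img src="a.png" alt="A"><IMG SRC="/b.jpg"/><a href="c.html">x</a></p>'
    print(image_urls(page, "https://example.com/pics/index.html"))
    # prints ['https://example.com/pics/a.png', 'https://example.com/b.jpg']
```

A parser copes with upper-case tags, attribute order and self-closing `<img ... />` without extra work, and it decodes entities such as `&amp;` in attribute values, which is the main reason to prefer it over a regex for this particular job.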

Post by **stam** » Sun Sep 17, 2023 10:32 pm

Sounds like a job that could be done with regex?

John Gruber (creator of Markdown, and a blogger worth following) has a post on exactly this: https://daringfireball.net/2009/11/libe ... ching_urls

EDIT: regex pasted here:

 \b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Apparently this picks up every known variation of URL, including some wild ones I was unaware of. The linked post explains some of how it works; for things like this I use https://regex101.com to analyse and test patterns.

It should be possible to adapt this to LC, as the regex is fully PCRE-compatible.
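As a rough illustration of adapting the pattern to a regex engine other than PCRE, here is a Python sketch (Python rather than LC). Python's `re` module does not support the POSIX `[:punct:]` class, so that part is replaced with an explicit set of ASCII punctuation built from `string.punctuation`. Note that the pattern matches every URL in the text (links, scripts, stylesheets), not only image sources, and it will not match relative paths such as `images/cat.jpg`. The extension filter (`IMAGE_EXTENSIONS`, `find_image_urls`) is a hypothetical heuristic added for this example, not part of Gruber's post.

```python
import re
import string
from urllib.parse import urlsplit

# Gruber's pattern adapted for Python's re module, which has no POSIX [:punct:]
# class: ASCII punctuation is escaped and spelled out inside the character class.
_PUNCT = re.escape(string.punctuation)
URL_RE = re.compile(
    r"\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^" + _PUNCT + r"\s]|/)))"
)

# Hypothetical filter: treat a URL as an image if its path ends in one of these.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def find_urls(text):
    # finditer walks the whole string, so every match is returned, not just the first.
    return [m.group(0) for m in URL_RE.finditer(text)]


def find_image_urls(text):
    return [u for u in find_urls(text)
            if urlsplit(u).path.lower().endswith(IMAGE_EXTENSIONS)]


if __name__ == "__main__":
    sample = ('<a href="https://example.com/page.html">x</a> '
              '<img src="https://example.com/img/cat.jpg"> <img src="logo.png">')
    print(find_image_urls(sample))
    # prints ['https://example.com/img/cat.jpg']
```

Because the pattern contains capturing groups, `re.findall` would return tuples of groups rather than whole URLs, which is why the sketch uses `finditer` and `group(0)`.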

Post by **andyh1234** » Mon Sep 18, 2023 1:34 pm

Thanks. I guess that can return the first URL, and then I can loop through the HTML to find the rest.

Thanks for the idea.

Andy
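The loop described above (find one match, then search again from where it ended) can be sketched in Python as follows. The `IMG_SRC_RE` pattern and the `img_srcs` function are hypothetical: the pattern is a deliberately simple one that targets only quoted `src` values on `<img>` tags, rather than Gruber's general URL pattern. It will miss unquoted values, does not decode entities such as `&amp;`, and can be fooled by unusual markup (for example a `>` inside another attribute), so treat it as a heuristic.

```python
import re

# Hypothetical, deliberately simple pattern: a quoted src value on an <img> tag.
# \s before "src" avoids matching attributes such as data-src.
IMG_SRC_RE = re.compile(r"""<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1""",
                        re.IGNORECASE | re.DOTALL)


def img_srcs(html):
    urls = []
    pos = 0
    while True:
        m = IMG_SRC_RE.search(html, pos)  # next match at or after pos
        if m is None:
            break
        urls.append(m.group(2))  # group 2 is the text between the quotes
        pos = m.end()  # resume after this match so the loop always advances
    return urls


if __name__ == "__main__":
    html = '<IMG SRC="a.png"><img alt="x" data-src="lazy.png" src=\'b.gif\'><p>src="no.png"</p>'
    print(img_srcs(html))
    # prints ['a.png', 'b.gif']
```

In Python, `IMG_SRC_RE.finditer(html)` performs the same walk in a single call; the explicit loop is shown because it mirrors the "match, then resume from the end offset" approach that carries over to languages without such a helper.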

LiveCode Forums