
HTML entity decoding

Posted: Mon Mar 17, 2014 10:11 pm
by thatkeith
I'm grabbing text from a web page, and this often includes HTML entities, as in this string: "In this second chapter of Ayumi’s biography".
I could simply set the htmlText of a field to the grabbed text, but the text also contains things I want to keep, such as img tags. Alternatively, I could run it through a few 'replace' steps, but is there a neater method?

Re: HTML entity decoding

Posted: Tue Mar 18, 2014 5:24 pm
by jacque
The htmlText isn't lost when it's displayed in a field, so I think I'd just do that because it's the easiest way to translate all the entities. Then when you want to find a particular tag, you can "get the htmlText of fld 1" and use offset or wordOffset to find the tag you're looking for. If you need to collect all the tags of a particular type, include the "skip" parameter (the optional third argument) in the offset function. That will let you hop through the htmlText very quickly, collecting all the instances as you go.
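
For anyone wanting to try this, here is a rough sketch of the offset-with-skip loop, not a tested solution. The function name collectImgTags and the field name "Display" are made up for illustration, and it assumes each img tag ends at the next ">" character. Note that offset returns a position counted from the skipped characters, so the skip value has to be added back to get the real position.

```livecode
-- Hypothetical helper: returns every <img ...> tag found in the
-- htmlText of a field, one per line.
-- Assumes a field named "Display" exists on the current card.
function collectImgTags pRawHtml
   local tHtml, tSkip, tStart, tEnd, tTags

   -- Let the field do the entity translation, then read the htmlText back
   set the htmlText of field "Display" to pRawHtml
   put the htmlText of field "Display" into tHtml

   put 0 into tSkip
   repeat forever
      -- offset() counts from the character after the skipped ones
      put offset("<img", tHtml, tSkip) into tStart
      if tStart = 0 then exit repeat
      add tSkip to tStart -- convert to an absolute position

      -- find the ">" that closes this tag
      put offset(">", tHtml, tStart) into tEnd
      if tEnd = 0 then exit repeat
      add tStart to tEnd

      put char tStart to tEnd of tHtml & return after tTags
      put tEnd into tSkip -- carry on searching after this tag
   end repeat

   delete the last char of tTags -- drop the trailing return
   return tTags
end collectImgTags
```

You would call it with something like "put collectImgTags(tGrabbedText) into tImgTags", where tGrabbedText is whatever you fetched from the page. The decoded, readable text is then available as the text of the same field.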