livecode and *Readability.js* (='.'=)

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

livecode and *Readability.js* (='.'=)

Post by Mariasole » Mon Mar 12, 2018 12:13 pm

Hello everyone!
I would like to implement the boilerpipe algorithm inside a LiveCode application.

---
https://github.com/kohlschutter/boilerpipe
http://boilerpipe-web.appspot.com/
---

My project is simple: extract the content of an HTML page into a stack! 8)
This algorithm is used in browsers for their various "reader modes".
The catch is that this algorithm is written in Java... :shock:
In your opinion, is there a way to "attach" it to a LiveCode application and call it? :?
Oops ... let me rephrase the question ... :oops:
Is there a simple way to use this Java library in a LiveCode stack? :roll:

Sorry for my naivety! And to all those who already want to throw stones, I ask forgiveness in advance! :oops:


Mariasole
(='.'=)
Last edited by Mariasole on Thu Mar 22, 2018 12:23 pm, edited 4 times in total.
"I'm back" - The Cyberdyne Systems Model 101 Series 800 Terminator

MaxV
Posts: 1579
Joined: Tue May 28, 2013 2:20 pm
Location: Italy

Re: livecode boilerpipe (='.'=)

Post by MaxV » Mon Mar 12, 2018 2:57 pm

Why not just use the htmlText property and the text property?
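
A minimal sketch of what this suggestion might look like, assuming a card with a field named "scratch" (a hypothetical helper field used only for the conversion) and a field named "result" for the output. The page address is a placeholder. Note that this returns all of the text LiveCode manages to render from the page, menus and sidebars included; it does not try to pick out the main article.

```livecode
on mouseUp
   local tPage, tPlain
   -- fetch the raw HTML of the page (placeholder address)
   put URL "http://example.com/" into tPage
   if the result is not empty then
      answer "Could not load the page:" && the result
      exit mouseUp
   end if
   -- let the field interpret the markup; LiveCode's htmlText understands
   -- only a subset of HTML, so complex pages may leave stray text behind
   set the htmlText of field "scratch" to tPage
   -- reading the text property back gives the content without tags
   put the text of field "scratch" into tPlain
   put tPlain into field "result"
end mouseUp
```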
Livecode Wiki: http://livecode.wikia.com
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode boilerpipe (='.'=)

Post by Mariasole » Mon Mar 12, 2018 3:35 pm

Hi MaxV,
thank you for your answer. I think I understand what you suggest, but the boilerpipe library goes further: it uses advanced algorithms to automatically isolate the meaningful text of an HTML page.

For example, if you go to a news page and click on "reader mode" in the browser, the browser will show you only the text of the news, without menus, photo captions etc. The boilerpipe algorithm allows just that!

I would like to implement it inside a stack, but I do not know if it's possible. :cry:


Grazie!
Mariasole
(='.'=)

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by Mariasole » Tue Mar 13, 2018 10:09 am

I read the article:

https://livecode.com/infinite-livecode-java-progress/

I do not understand anything about Java ... but I think that what I ask cannot be done! :roll:

Thanks all the same for the warm help!

Mariasole
(='.'=)

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Tue Mar 13, 2018 2:16 pm

Well, I think it *can* be done, but since I'm not sure of the exact steps of doing it, I hesitated to open my pen as it were.

One way it *could* be done -
... use Java to handle the Java thing you're talking about, and use Lc to execute the Java, like you would use Lc to launch any external program. While you don't understand Java, I'm almost positive there are example Java scripts that run this library, probably on the very site that hosts it.

Of course, Max's suggestion of using built-in Lc to do whatever this library does is what I would target. It might take a bit of pain to figure out how to do it, but then you've learned more about a language you already know.

The reason I can't give you a more complete explanation is that I haven't reached doing something similar in Lc myself yet, and while I can (most of the time) parse java, I am by no means fluent in it.

Good luck with it Mariasole :)

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by Mariasole » Tue Mar 13, 2018 2:52 pm

Thanks Bogs,
you are so sweet! :D You gave me courage to find a solution ...
I think the only thing I could do is recreate the Java code in LC.
But I do not know Java, not even in an elementary way! Anyway I'll try! Maybe it takes me five or six years! :cry:
Thank you so much Bogs!



Mariasole
(='.'=)

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Tue Mar 13, 2018 4:44 pm

Mariasole wrote (Tue Mar 13, 2018 2:52 pm):
"Maybe it takes me five or six years!"

Heh, we're in the middle of another storm, but when I have some time to look into it more, I'll try to see if together we can figure out a way to shorten it to weeks or even days for you :D

SparkOut
Posts: 2839
Joined: Sun Sep 23, 2007 4:58 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by SparkOut » Tue Mar 13, 2018 9:30 pm

Cara Maria

I see in the pages you linked that there is a web API for the boilerpipe library. I agree it might be beneficial to migrate the library into LiveCode, but using the existing web API is a shortcut to getting the result, and it can be called from LC.

The linked page says:
Just call http://boilerpipe-web.appspot.com/extract?url=http://someurl to highlight the main content of an arbitrary URL.

This usually works fairly well, but you can adjust the extraction parameters to suit your needs.
<and more details... >
If you can keep below the api usage limits you should be able to get the results you need returned to LC which you can then put into a field or process as you require.

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Tue Mar 13, 2018 10:06 pm

LOL

SparkOut, you and I were literally thinking the same thing, be afraid, be VERY afraid :shock:

I had been looking at the pages you linked at the top. I set up a simple stack with 2 fields (result is the large field, txtAddress is the smaller field) and a button "btnApiCall".

This is probably overly complicated, and certainly not perfect, but the code I put in the button using Lc 6.5.2 was -

Code:

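on mouseUp
   -- build the boilerpipe web API request from the address in field "txtAddress"
   put "http://boilerpipe-web.appspot.com/extract?url=" & the text of field "txtAddress" into tmpVar
   -- fetch the response and show it as styled text in field "result"
   set the htmlText of field "result" to URL(tmpVar)
end mouseUp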

Which gave this result (screenshot attachment: Selection_004.png), which I am sure can be cleaned up a bit, either by further use of their web API or through Lc itself. Their API does allow output to text, among other things -
BoilerPipe API page wrote:
To change the extraction strategy, add the extractor parameter, with one of the following values:

Strategy | Description
ArticleExtractor (default) | A full-text extractor which is tuned towards news articles. In this scenario it achieves higher accuracy than DefaultExtractor.
DefaultExtractor | A quite generic full-text extractor, but usually not as good as ArticleExtractor.
LargestContentExtractor | Like DefaultExtractor, but only keeps the largest content block. Good for non-article style texts with only one main content block.
KeepEverythingExtractor | Treats everything as "content". Useful to track down SAX parsing errors.

To change the output format, add the output parameter, with one of the following values:

Output Format | Description
html (default) | Output the whole HTML document and highlight the extracted main content.
htmlFragment | Output only those HTML fragments that are regarded main content.
text | Output the extracted main content as plain text.
json | Output the extracted main content as JSON. For details, see this page.
debug | Output debug information to understand how boilerpipe internally represents a document.
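
As a rough sketch of how those two parameters could be combined from Lc, the handlers below build a request for plain-text output from the ArticleExtractor. The function name boilerpipeRequest is made up for this example; the parameter names extractor and output and their values are taken from the API page quoted above. It assumes the web service is still reachable and accepts a URL-encoded address, and it reuses the fields "txtAddress" and "result" from the stack described earlier.

```livecode
-- hypothetical helper: builds the request URL for the boilerpipe web API
function boilerpipeRequest pPageURL, pExtractor, pOutput
   -- fall back to the article extractor and plain-text output
   if pExtractor is empty then put "ArticleExtractor" into pExtractor
   if pOutput is empty then put "text" into pOutput
   return "http://boilerpipe-web.appspot.com/extract?url=" & urlEncode(pPageURL) \
         & "&extractor=" & pExtractor & "&output=" & pOutput
end boilerpipeRequest

on mouseUp
   local tRequest, tText
   put boilerpipeRequest(the text of field "txtAddress", "ArticleExtractor", "text") into tRequest
   put URL tRequest into tText
   -- the result is non-empty when the download failed
   if the result is not empty then
      answer "Request failed:" && the result
   else
      -- plain text needs no htmlText conversion, so put it straight into the field
      put tText into field "result"
   end if
end mouseUp
```

With output set to text there is nothing left to strip, so the response can go directly into the field instead of through htmlText.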
Hope that helps you Mariasole :mrgreen:

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Tue Mar 13, 2018 11:04 pm

While searching for how to actually communicate with Java, I did find this (apparently) incomplete lesson :D You might also want to look up alternateLanguages in the dictionary, although if you're on Linux you might be out of luck there.

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by Mariasole » Wed Mar 14, 2018 11:15 am

Thanks SparkOut and thanks bogs!
SparkOut knows that certain computer concepts are very difficult for me! :(
He also knows that without help it takes me years to create something! :cry:
For example, I've been asking for help (for years!) in the forum to create a simple library to post with LC on Facebook or Twitter! Yet, despite studying, I could not do it! :x

The things I ask are not for me, but they are for everyone! I write programs for pure personal knowledge. Like a meditation! :oops:
But let's get back to the topic!

I also thought of using the boilerpipe web API ... but since the code of boilerpipe is open source and there was the possibility of getting my hands on it, I said to myself: let's integrate it! 8)

The problem is that it is in Java. So I told myself: let's study how LC integrates with Java. I found practically nothing, and not knowing Java at all, I wondered if anyone could give me a hand ... :roll:

From what I understand, LC does not interface with Java, or interfaces badly and only partially.

So I'm thinking of opening the Java code and porting the library 8) , then making it available to everyone. :D

Obviously I do not know where to start. And so I predict it will take years and years! :|

But this will not stop me! I firmly believe in eternal life!

Thanks SparkOut and bogs you are very sweet!

Mariasole
(='.'=)

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Wed Mar 14, 2018 2:55 pm

Looks like all 3 of us think roughly the same then :)

I wasn't able to find much else on integrating Java with Lc. Unless someone else knows how, I don't think there is a way to do it other than modifying the Java source to write its result somewhere (say, to the clipboard or out to a file), then launching the Java program from Lc and reading the result back in Lc.
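
A variation on this idea can be sketched with the shell function, with the Java program printing to standard output instead of writing to the clipboard or a file. Everything on the command line here is an assumption: ExtractMain stands for a small Java class you would have to write yourself that runs boilerpipe on the address it is given and prints the extracted text, and boilerpipe.jar stands for the library jar (its own dependencies would need to be on the classpath too). It also assumes java is on the system path and reuses the fields "txtAddress" and "result".

```livecode
on mouseUp
   local tCommand, tOutput
   -- hypothetical command line: boilerpipe.jar and ExtractMain are placeholder
   -- names; use ";" instead of ":" as the classpath separator on Windows
   put "java -cp boilerpipe.jar:. ExtractMain" && quote & the text of field "txtAddress" & quote into tCommand
   -- shell() runs the command, waits for it to finish and returns its output
   put shell(tCommand) into tOutput
   -- as far as I know, the result then holds the exit code of the command
   if the result is not empty and the result is not 0 then
      answer "Java exited with an error:" & return & tOutput
   else
      put tOutput into field "result"
   end if
end mouseUp
```

Because the address is pasted into a command line, it should only come from input you trust.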

I mentioned alternateLanguages up there, but from all I could tell that doesn't work on linux (my os), so it is a little hard to test out. Mark W. had some input on solving that in a thread long ago, but I didn't find much else about it, and again I didn't see java per se mentioned there.

The only real question (still) is whether or not you really need any of this API to do what you're trying to accomplish. From what I was playing with, Lc's built-in functionality seems to accomplish much the same thing with far less pain.

I don't do much with fb and twitter, but I also think Lc has functions for 'post' and 'get' as well, unfortunately you would need to have someone chime in with far more knowledge than I have about how all that stuff works, since I am not about to sign up on either of the platforms to do testing, sorry :D

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by Mariasole » Wed Mar 14, 2018 6:16 pm

Thank you so much bogs!
You are very intelligent, kind and sensitive. Thanks for your help. I know that we have gender equality also between coders, but it is always nice that a girl is treated like a princess! :wink: So thank you again!
bogs wrote:
"The only real question (still) is whether or not you really need any of this API to do what you're trying to accomplish. From what I was playing with, Lc's built-in functionality seems to accomplish much the same thing with far less pain."

I would like to use this library not on a whim, but because it does what LC is obviously not designed to do.

With this library I can automatically extract only the main content of a webpage, without other spurious contents. :D

For example, if I feed a news page to this library, it will give me back only the title, summary and article.

Everything else (text of the menu, text blocks on the right, and other text that is part of the "frame" of the page) will not be taken into consideration! It is like the "reader mode" button of your browser.

So this is not about simply extracting all the text from an HTML page (that would be easy!), but only the important text, using an algorithm!

Grazie tante, palude! (Many thanks, "bog": palude is Italian for bog)
Mariasole
(='.'=)


PS: Thanks for Twitter and Facebook, but I have almost lost hope of building this free LC library! :cry:

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by bogs » Wed Mar 14, 2018 11:01 pm

As soon as I free up a block of time, I'll look a bit more into it all. While I'm not fluent in Italian, let me just say (hope I spell this right) buona sera, bambina :) <ducking any thrown rocks>

Mariasole
Posts: 235
Joined: Tue May 07, 2013 9:38 pm

Re: livecode and JAVA boilerpipe (='.'=)

Post by Mariasole » Fri Mar 16, 2018 6:51 pm

bogs wrote:
"buona sera, bambina"

Oh Bogs... so you enchant me! :D

Thank you for taking a look at this library... You do not know how much I thank you. :D
I continue to read the listings of the program, but I understand little ... especially the relationship between them! :?
Actually the functions inside seem very simple, but I do not understand their succession, I do not know if I explained myself! :oops:
If I could, I would try step by step (in a couple of years!) to port the library to LC!
As you can see my English is not fluent (it is orrible and ludicrous) ... but let me just say (hope I spell this right) hello nice big cat! :wink:

Thank you so much!


Mariasole
(='.'=)
