
browser widget use in livecode server (restrictions?)

Posted: Fri Feb 14, 2020 12:35 am
by couchpotato
I have a functioning stack that does what I need to do, running fine on LC 9 on Mac OS X with my Indy license. For now, it simply creates a browser widget, opens a website in it and extracts the value of a specific field.

on myJSHandler Message
answer "myJSHandler:" && Message
end myJSHandler

create widget "myBrowser" as "com.livecode.widget.browser"
set the javascriptHandlers of widget "myBrowser" to "myJSHandler"
set the url of widget "myBrowser" to "someserver address"
put "var myLoc;"&return&"myLoc=$('#aboutLandscape h2').html();"&return&"liveCode.myJSHandler(myLoc);" into tJS
do tJS in widget "myBrowser"

This opens the HTML page and sucks out the value of the specified id (I was ecstatic to discover that jQuery on the site was usable). This code fails silently when run within an index.lc page using LiveCodeIndyServer-9_5_1-Linux-x86_64. LiveCode Server is properly installed and functional. "Fail" means that the callback myJSHandler is never hit, so no extracted data is available. Since there's apparently no way to scrape content from a browser widget other than using "do" with the corresponding liveCode.callback (see above), I use "put" instead of "answer" in myJSHandler on LiveCode Server (since answer, ask, etc. aren't possible there). Am I foolish to believe that I should be able to use a browser widget in LiveCode Server .lc pages? I have lots of other jQuery goodies that I expected to hook up once I get this protocol to work. The fact that it works so well in the Mac IDE makes me suspicious that it's not expected to work on LiveCode Server. Is that true?

Re: browser widget use in livecode server (restrictions?)

Posted: Fri Feb 14, 2020 1:48 am
by FourthWorld
The browser widget provides a GUI for surfing the web. Linux servers don't normally include any GUI support.

You might be able to add GUI components to the server, but it seems like a lot of overhead for a non-GUI task.

Can you describe what you need to do there? I'll bet we can come up with a leaner solution.

Re: browser widget use in livecode server (restrictions?)

Posted: Fri Feb 14, 2020 4:09 am
by couchpotato
I am not expecting a GUI version of the browser, it was the functionality (not the appearance) that I was after.
I am looking for a means by which I can programmatically enter data into fields on an arbitrary website as a robotic typist would.
The intention here is to streamline processes for which external APIs aren't yet developed.
open website xxx
paste name into html input id#0
paste email into html input id#1
trigger click on html button #id2

this is all above-board, would honor login userid/password

What the target site is? Immaterial, any sequence of click navigation to
locate desired fields and perform simple data entry using scripted behavior.

It seems inconsistent that LiveCode Server would allow creating browser widgets but not actually populate them
with website contents - I had expected some kind of error, got none. After I create the browser widget and set its
url the number of widgets on the card goes from zero to one. I was (wistfully) hoping that the identical scripts
that work in the local IDE would be usable on LiveCode Server, but such is not the case. Oh Well...

Re: browser widget use in livecode server (restrictions?)

Posted: Fri Feb 14, 2020 4:55 pm
by FourthWorld
couchpotato wrote:
Fri Feb 14, 2020 4:09 am
I am not expecting a GUI version of the browser,
Ah, but that's exactly what a browser is, a way for a human user to literally browse web documents. A non-GUI "browser" would be something very different: there are no human eyes on a server system, so no browsing per se is possible.

That said, the tasks a user performs in a browser can, in many cases, be automated. But when running such automation on a system like a Linux server (which has no GUI support), using a GUI application like a web browser would not be the means by which developers accomplish that task.

Consider what a browser does:
1. Sends a request to a remote host.
2. Downloads the resulting page data.
3. Renders the page for display.
4. Handles any user interaction locally.
5. Where a form is used, it submits the form data to the remote host via POST.

Only step 3, rendering, truly requires any GUI support. Everything else can be done without embedding an entire web browser application into the mix.

LiveCode will indeed support nearly all of its GUI components across platforms. The limitation here isn't LC, but the nature of servers themselves: Linux servers are designed to be lean and efficient, and since no human eyes can see them they save considerable RAM and CPU time by not including GUI support at all.

You may be able to modify the system to include GUI components, but performance is always worth considering and in a server process much more so, since every request will consume more memory and CPU, and with enough traffic that can multiply to slow things down. Writing lean code is always useful; when writing for a server it's essential.

So how do people do things like this?

A quick web search for "automating web form submission from another server" brings up a number of ways this is done in various languages: ... her+server

In LiveCode, Steps 1 & 2 above can be done with "GET url"; step 4 can be done with some coding to look for form elements and set their values, and step 5 can be done with the POST command.
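
As a rough illustration of those steps, here is a minimal sketch in LiveCode script. The URL, the field names ("name", "email", "csrf_token") and the function name are all hypothetical; a real form will have its own, and the pattern used to find the hidden field assumes the attributes appear in exactly that order. Treat it as a starting point under those assumptions, not a finished solution.

```livecode
-- A sketch only: the URL and the field names (csrf_token, name, email)
-- are hypothetical and must be replaced with those of the real form.
function submitSignupForm pName, pEmail
   local tFormURL, tPage, tPattern, tToken, tFormData

   put "https://example.com/signup" into tFormURL

   -- Steps 1 and 2: request the page and download its source.
   put URL tFormURL into tPage
   if the result is not empty then
      return "GET failed:" && the result
   end if

   -- Step 4, part one: look for a form element in the page source.
   -- Here, an assumed hidden input whose value must be sent back.
   -- The pattern built below is: name="csrf_token" value="([^"]*)"
   put "name=" & quote & "csrf_token" & quote && "value=" & quote & \
         "([^" & quote & "]*)" & quote into tPattern
   if not matchText(tPage, tPattern, tToken) then
      put empty into tToken
   end if

   -- Step 4, part two: set the field values, URL-encoded as a browser would.
   put "name=" & urlEncode(pName) & "&email=" & urlEncode(pEmail) & \
         "&csrf_token=" & urlEncode(tToken) into tFormData

   -- Step 5: submit the form data via POST; the response body arrives in "it".
   post tFormData to URL tFormURL
   if the result is not empty then
      return "POST failed:" && the result
   end if
   return it
end submitSignupForm
```

It could be called with something like: put submitSignupForm("Jane Doe", "jane@example.com"). Step 3, rendering, is simply skipped. Pages that build their forms with JavaScript after loading will not show those fields in the downloaded source, so this approach only suits forms present in the raw HTML.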

However, you wisely anticipated important caveats here:
this is all above-board, would honor login userid/password
That introduces two important considerations: security and authentication.

Authentication will vary from page to page. Some will use OAuth2, others HTTPBasicAuth, others may use something else entirely.

How credentials are presented depends on the scheme. With OAuth2, an authorization token and/or session ID originally provided during login is included in the HTTP header. Where HTTPBasicAuth is used it may be as simple as embedding the user name and password in the URL. Other methods will require other means, usually far more complex than HTTPBasicAuth.

HTTPBasicAuth is easy to emulate if you have user credentials; but by design, OAuth2 is not.
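
For the HTTPBasicAuth case, a minimal sketch might look like the following. It sends the credentials in an Authorization header via the httpHeaders property rather than embedding them in the URL; the function name, URL and credentials are hypothetical, and it assumes the target host really does use Basic authentication over HTTPS.

```livecode
-- A sketch only: the function name and its arguments are hypothetical.
function fetchWithBasicAuth pURL, pUser, pPassword
   local tCredentials, tPage, tError

   -- Basic auth is the Base64 encoding of "user:password".
   put base64Encode(pUser & ":" & pPassword) into tCredentials
   -- base64Encode may wrap long output across lines; a header must be one line.
   replace return with empty in tCredentials

   -- Headers set here are sent along with the URL request that follows.
   set the httpHeaders to "Authorization: Basic" && tCredentials
   put URL pURL into tPage
   put the result into tError

   -- Clear the header so it is not sent to unrelated hosts later.
   set the httpHeaders to empty

   if tError is not empty then
      return "Request failed:" && tError
   end if
   return tPage
end fetchWithBasicAuth
```

A call such as put fetchWithBasicAuth("https://example.com/members", "someuser", "somepassword") would return the page source or an error string. Base64 is an encoding, not encryption, so this is only reasonable over HTTPS.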

Expect to encounter a wide range of authentication methods, and be prepared for non-trivial work to attempt to "honor login".

This is as it should be, given the implications if it were easy for robots to masquerade as human users all over the web.

While your intentions are no doubt honorable, the prudent designer of the system you're submitting form data to cannot make such assumptions about the other seven billion people on the planet. Everyone from script kiddies to hostile nation-states attempts to submit form data emulating a human user all the time for nefarious, sometimes dangerous, purposes.

Accordingly, in addition to ever-more-complex authentication schemes, many hosts add additional impediments to automated form submission (or when done by the bad guys, commonly called "spambots"). There's an entire niche industry built around blocking such things, CAPTCHA and reCAPTCHA being among the most popular. Those require not only GUI rendering within a browser, but sophisticated AI to attempt to emulate how a human user would interact with them.

Another possible limitation is that some server systems may authenticate by IP address, and subsequent attempts to use an auth token from an IP address different from the user's, such as your web server's, would invalidate the token.

And then there's the question of how you obtain the user credentials and/or auth token to begin with. Unless there's a serious bug in the user's browser you should not be able to obtain the auth token or any other data stored in cookies from another domain. And asking users to re-enter the credentials they have for another site into yours should at least raise eyebrows with them, and for you it creates an encumbrance of handling those credentials with extreme care and caution.

You originally defined your goal as "..programmatically enter data into fields on an arbitrary website as a robotic typist would." In your case that would be useful, but in a world chock full o' bad actors you'll have the best minds in the field, including the top security engineers at Google, working day and night to prevent that from happening.

You may need to reconsider your original goal, focusing on a subset of forms of use to your customers. Depending on what those forms are and the security skills of the host's administrator it may be easy, or challenging, or extremely difficult. But it will almost certainly be more achievable than attempting to write one system that can emulate a human user on any form on any site.