Threading

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, LCMark

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US
Contact:

Threading

Post by mwieder » Wed Mar 12, 2014 4:27 am

I posted this in a threading discussion in Feature Requests, but it seems appropriate here:

There's an interesting discussion on threading in an xtalk world over at
https://groups.yahoo.com/neo/groups/sta ... s/messages
Last edited by heatherlaine on Fri Mar 14, 2014 3:05 pm, edited 1 time in total.
Reason: split topic and renamed

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am
Contact:

Re: Non-dispatching wait and event handling

Post by monte » Wed Mar 12, 2014 6:03 am

Could you post anything important here? People need to sign up to that group to read it.
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US
Contact:

Re: Non-dispatching wait and event handling

Post by mwieder » Wed Mar 12, 2014 4:24 pm

Ack! Sorry - I thought the view permission was public... I'm not sure about the ethics of reposting, so I'm taking a chance here...
-----------------------------------------------------------------------------------------------
So, multi-core computing is the future, and threading is probably the most non-intuitive programming domain out there. It is so
un-HyperTalk, because it’s so easy to shoot yourself in the foot. That’s exactly why xTalks must solve this problem to have a future.

I know one xTalk at least had threading built in, via a clever ‘return through’ keyword. Essentially it spawned off its own thread
containing the rest of the handler’s lines. But AFAIK scripters still have to perform their own locking and do other low-level,
“programmery” things. (Dan?)

So I’m kinda trying to come up with a metaphor that somehow works for performing tasks on another thread without losing all the
simplicity and protection. I’m reading around on the web to find what threading approaches exist and what I’ve so far come up with is
a bit restricted and unnatural for my tastes, but is roughly the ‘Actor’ model. I.e. to do threading, you split an entire *object*
off into its own thread, where it doesn’t have access to anything that belongs to the main thread.

That allows for longer, concurrent operations. In addition, one could be permitted to pass data to it when it begins processing, and
it could “return to the main thread” once it is done. It’s kind of like phase-shifting it into a parallel reality, I guess ;-)

But unless I can find a decent real-life metaphor that is similar, I probably wouldn’t be happy with this approach. Anyone have an
idea? Experience?

I guess alternately I could make a “conveyor belt machine” card part not unlike a timer, which does its work on a separate thread
and on which you can queue operations, maybe using the ‘send’ command, and then it sends a reply whenever it has completed a message.
So kind of like NSOperationQueue/GCD queues on the Mac. But that’s also weird.
-----------------------------------------------------------------------------------------------
The issue of threading is a great one to discuss. Thanks for bringing it up!

In between other things in recent years I've refactored the SenseTalk runtime to make it thread-safe (I hope!) but have only begun
musing about exactly how to enable users to run more than one thing at once. So here are a few half-baked thoughts -- at this point I
have more questions than answers.

My goal (like yours, I know) is to make it as simple as possible for users so they don't have to think about locking or any memory
access or deadlock issues. It seems like the simplest approach would be to have a way to send a message to call a handler in another
thread. That handler would get a local copy of any parameters passed in and would operate independently in its own space. I guess
this is essentially the Actor model you mentioned (do you have any links to particularly good descriptions?). Access to global
variables would be protected by locks in the runtime, transparently to the user. That would allow globals to be used as a means of
sharing state/returning values. Not very elegant, but maybe at least a starting point for building something more elegant?
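
The idea above can be sketched in a few lines of Python (not SenseTalk; all names here are illustrative): a handler runs on its own thread with a private deep copy of its parameters, and every access to a shared global goes through a lock that the runtime would take transparently on the scripter's behalf.

```python
import copy
import threading

_globals = {}                     # stand-in for the language's global variables
_globals_lock = threading.Lock()  # the runtime would take this lock for the user

def set_global(name, value):
    with _globals_lock:
        _globals[name] = value

def get_global(name):
    with _globals_lock:
        return _globals.get(name)

def call_in_thread(handler, *params):
    # Deep-copy each parameter so the thread owns its own data,
    # independent of anything on the calling thread.
    own_params = copy.deepcopy(params)
    t = threading.Thread(target=handler, args=own_params)
    t.start()
    return t

def word_count(text):
    # Returns its result through a lock-protected global, as proposed above.
    set_global("result", len(text.split()))

t = call_in_thread(word_count, "the quick brown fox")
t.join()
print(get_global("result"))  # 4
```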

Once we have a fundamental mechanism for 'encapsulating' a background thread/task we can look at how to manipulate these 'blocks'.
Your "conveyor belt" metaphor is interesting, but I don't think it's quite right. Background tasks wouldn't necessarily be performed
in a strict FIFO order. So my first question from the user perspective is: What kinds of tasks does the user want to perform in the
background? Here are a couple of scenarios that come to mind:

(A) A long-running calculation or operation (like a web search) that would be better done in the background while other foreground
tasks continue.
(B) A batch process like performing some sort of conversion on a large number of files.

Case (A) is essentially a one-off. You start it up and ignore it until it's done. Some sort of completion notification would be
important, but otherwise it's basically on its own, and not dependent on any other task.
Case (B) is a bunch of independent tasks that can be performed in any order. Here, it would be more helpful to be notified when all
of the tasks are done -- the user most likely doesn't care to know when each individual task finishes.

It would be helpful to add more scenarios to this list. I'm sure there are some more complex situations with dependencies between
tasks, but how many of them are things people are likely to care about in an xTalk language? Should we aim to support all possible
scenarios or focus on the simple ones that will cover 90+% of the likely actual usage? I'm all for supporting every possible scenario
if that's easy to do, but maybe not if it would add a lot of complexity, unless it's easy to hide the complexity for the simple
cases. I think I'd like to get a better handle on how people would use this capability before nailing down the exact mechanism.

Hmm... for a metaphor, maybe the most natural one is an assistant that you can ask to perform a task for you. Or, in the case of a
batch process, a whole host of 'minions' that can perform lots of tasks at once? ;-) (yeah, I recently watched "Despicable Me")
-----------------------------------------------------------------------------------------------
> It seems like the simplest approach would be to have a way to send a message to call a handler in another thread. That handler
> would get a local copy of any parameters passed in and would operate independently in its own space.

Yes, this is the simplest approach one could think of. It gives us the lowest-level actual threading like Cocoa’s
detachNewThreadSelector:. The aspect of insulation seems to me to be central to providing a kind of threading that is safer than the
classic “put the burden of thread-safety on the scripter” that current programming languages do.

> I guess this is essentially the Actor model you mentioned (do you have any links to particularly good descriptions?).

No good ones. The Wikipedia article describes it in a basic way, but that doesn’t match the description I found when I wrote the
original message (which I can’t find anymore). But I guess what I took away from it is that it’s a standalone object that is
specifically declared as processing messages asynchronously, and that can mainly be interacted with using messages, and has a
restricted number of references to outside objects. Let’s just call it an “isolated concurrent object” for now to avoid confusing
people that know better than I what the Actor model should be.

Given how much I see the foundation of an xTalk being “composing objects”, it kind of feels wrong to have threading happen at the
granularity of the message. I’d want to be able to still access objects, even if it is just one or several objects that are
specifically cleared for this use. If we allow picking a thread at the message level, how do we decide what this object is permitted
to do? How do we avoid it “falling into a gap” when e.g. iterating over the list of card buttons while the main thread deletes
something? If we prevent all this, how do we let it still do anything useful beyond some calculations on variables?

I suppose this is less of a problem for SenseTalk, where objects are mostly identical to dictionaries in variables, which can just
be copied onto the thread?

> Access to global variables would be protected by locks in the runtime, transparently to the user. That would allow globals to be
> used as a means of sharing state/returning values. Not very elegant, but maybe at least a starting point for building something
> more elegant?

While I agree that globals are probably the safest thing we have to be accessed from a thread (it’s fairly straightforward to apply
locks when a global is accessed), I’m not sure I want to encourage use of globals as the only way for threads to communicate.

That’s what’s nice about the Actor model. Since it seems messages are its main inter-thread communication mechanism, you can just
send back a reply message with the result data. If we came up with a block syntax, we could probably even let the caller pass along a
callback block that gets called on the main thread when the thread is done, and gets a copy of the result data.
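
A rough Python sketch of that reply-message idea (names hypothetical): the worker runs on its own thread and, when done, posts the callback plus a private copy of the result onto a queue that the main thread drains as part of its event loop, so the callback always runs on the main thread.

```python
import copy
import queue
import threading

main_queue = queue.Queue()  # callbacks waiting to run on the main thread

def spawn_with_callback(work, arg, callback):
    def runner():
        result = work(arg)
        # Hand the callback and a copy of the result back to the main thread.
        main_queue.put((callback, copy.deepcopy(result)))
    threading.Thread(target=runner).start()

def run_pending():
    # Called from the main thread's event loop; blocks until a reply arrives.
    callback, result = main_queue.get()
    callback(result)

results = []
spawn_with_callback(lambda s: s.upper(), "hello", results.append)
run_pending()
print(results)  # ['HELLO']
```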

> Once we have a fundamental mechanism for 'encapsulating' a background thread/task we can look at how to manipulate these
> 'blocks'. Your "conveyor belt" metaphor is interesting, but I don't think it's quite right. Background tasks wouldn't necessarily
> be performed in a strict FIFO order. So my first question from the user perspective is: What kinds of tasks does the user want to
> perform in the background? Here are a couple of scenarios that come to mind:
>
> (A) A long-running calculation or operation (like a web search) that would be better done in the background while other
> foreground tasks continue.
> (B) A batch process like performing some sort of conversion on a large number of files.
>
> Case (A) is essentially a one-off. You start it up and ignore it until it's done. Some sort of completion notification would be
> important, but otherwise it's basically on its own, and not dependent on any other task.
> Case (B) is a bunch of independent tasks that can be performed in any order. Here, it would be more helpful to be notified when
> all of the tasks are done -- the user most likely doesn't care to know when each individual task finishes.

Are we looking for concrete or abstract examples? On the more concrete end of the spectrum, I’d see things like:

1) Sorting
2) Image manipulation
3) Path finding
4) File/Network access

The issue here is *what* is being operated on. If it’s just variables, the value will automatically be copied in the process of
sending a message. If it’s a native object in the image, like sorting cards, or iterating over card parts, it gets harder. I guess a
first approach could just be to not permit operations like that from a thread. But ideally, I’d want to have object graphs on the
thread.

Maybe if each thread had a “worksheet” on which it can create, access and destroy objects? And there was a “snapshot” command that
could be used to create a serialized representation of an object to pass along in a message? Then the snapshot could be
rubber-stamped into the workspace, manipulated there, and then snapshotted again, for the reply message? Or maybe, for performance,
these worksheets would be cards that are locked to a particular thread. You can create such a copy (from an entire card/background,
or just a subset of it), and then when you send one as part of a message to another thread, it becomes attached to the destination
thread, locked to it. That way, the engine could lazily copy things, and e.g. if nobody changes the background, it would remain a
reference?

> It would be helpful to add more scenarios to this list. I'm sure there are some more complex situations with dependencies
> between tasks, but how many of them are things people are likely to care about in an xTalk language? Should we aim to support all
> possible scenarios or focus on the simple ones that will cover 90+% of the likely actual usage?

I don’t think dependencies between tasks need to be built into the language right away (though it is a good idea). If we send
request and reply messages (kind of like Apple Events in HyperCard are handled, come to think of it), the reply message can take care
of kicking off the next message. If you have to wait for another reply, you’d have to keep the current result of each message in e.g.
a user property of the receiving object (or a real global, if you must), and when either of the messages comes in, check if the user
properties are now both filled, and if yes, kick off the next task that needs both.

Maybe we could have a special “group” for messages and send another message with all the reply messages’ parameters when all
messages in a group have arrived. Something like:

set the completionMessage of message group «web search for “Bill Atkinson”» to “foundBillAtkinson”
send searchOnGoogle “Bill Atkinson” to card part “GoogleThreadSearcher” with message group «web search for “Bill Atkinson”»
send searchOnDuckDuckGo “Bill Atkinson” to card part “DuckDuckGoThreadSearcher” with message group «web search for “Bill Atkinson”»

And then you would get reply messages like:

foundOnGoogle “Bill Atkinson”,listOfGooglePageURLsAndOffsets
foundOnDuckDuckGo “Bill Atkinson”,listOfDDGPageURLsAndOffsets
foundBillAtkinson “Bill Atkinson”, listOfGooglePageURLsAndOffsets, “Bill Atkinson”, listOfDDGPageURLsAndOffsets

The searchOnDuckDuckGo handler would internally send its reply with

send foundOnDuckDuckGo theSearchTerm,myResultsList to the sender with message group the message group

I.e., just before its message completes (and could potentially trigger the “foundBillAtkinson” message), it adds a new message to
the group, which postpones the “foundBillAtkinson” message until *that* message has been handled.

This still seems a bit complicated (not to mention the parameter-adding part would be order-dependent, and how do we keep the
original request’s parameters separate from the reply parameters?), but maybe it gives one of you an idea?
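
The “message group” idea above can be sketched in Python (again, purely illustrative): each task in the group reports its reply parameters, and only when every member has reported does the completion message fire with all of them. As noted, the order in which replies arrive is not guaranteed.

```python
import threading

class MessageGroup:
    def __init__(self, expected, completion):
        self.expected = expected      # number of replies to wait for
        self.completion = completion  # fired once, with all reply parameters
        self.replies = []
        self.lock = threading.Lock()

    def reply(self, *params):
        with self.lock:
            self.replies.extend(params)
            self.expected -= 1
            done = self.expected == 0
        if done:
            # Last reply in: fire the group's completion message.
            self.completion(*self.replies)

found = []
group = MessageGroup(2, lambda *params: found.extend(params))

def search(engine, term, grp):
    # Stand-in for searchOnGoogle / searchOnDuckDuckGo handlers.
    grp.reply(term, "results from " + engine)

t1 = threading.Thread(target=search, args=("Google", "Bill Atkinson", group))
t2 = threading.Thread(target=search, args=("DuckDuckGo", "Bill Atkinson", group))
t1.start(); t2.start(); t1.join(); t2.join()
# found now holds both searches' terms and result lists (order not guaranteed)
```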

> I'm all for supporting every possible scenario if that's easy to do, but maybe not if it would add a lot of complexity, unless
> it's easy to hide the complexity for the simple cases. I think I'd like to get a better handle on how people would use this
> capability before nailing down the exact mechanism.

Yeah. The message groups at least can be an optional feature until you actually need dependencies between tasks. Though they’re not
really very intuitive.

> Hmm... for a metaphor, maybe the most natural one is an assistant that you can ask to perform a task for you. Or, in the case of
> a batch process, a whole host of 'minions' that can perform lots of tasks at once? ;-) (yeah, I recently watched "Despicable Me")

Heh. I’m beginning to realize that this begins to sound more and more like an NSOperationQueue and dispatch_async, just without the
blocks.
-----------------------------------------------------------------------------------------------
>> Hmm... for a metaphor, maybe the most natural one is an assistant that you can ask to perform a task for you. Or, in the case
>> of a batch process, a whole host of 'minions' that can perform lots of tasks at once? ;-) (yeah, I recently watched "Despicable
>> Me")

I’ve been meaning to dive in on this (and other topics), but have been a bit busy. I would like to take more time to give a fuller
response at some point, but just briefly…

My gut feeling is that GCD, despite being quite low level, is by far the most intuitive solution I’ve come across for asynchronous
execution, so I would seriously consider trying to just wrap it (or NSOperationQueue) up in as clean and non-scary a way as is
possible.

The question this raises is whether you try to represent blocks. In some ways I like the idea of making a message the fundamental unit
of work, rather than some sort of block concept, but it does present a few problems which blocks neatly circumvent, mostly around
parameters and capturing the context.

Something along the lines of:

on mouseUp
    perform on <queue>
        set x to blah
        send doSomeStuff to card “fred”
    end
end

where <queue> could be a named queue you’d made yourself (queue “My Queue”), or perhaps some default queues supported directly by
the syntax (low priority queue, high priority queue, etc).
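
A minimal Python analogue of `perform on <queue>` (all names here are made up, not a proposed engine API): each named queue owns a worker thread draining a FIFO of closures, roughly like a GCD serial queue, and `perform` submits work to it.

```python
import queue
import threading

class NamedQueue:
    def __init__(self, name):
        self.name = name
        self.tasks = queue.Queue()
        # One worker thread per queue, draining tasks in FIFO order.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is None:  # sentinel: shut the queue down
                break
            task()

    def perform(self, task):
        self.tasks.put(task)

done = threading.Event()
q = NamedQueue("My Queue")
q.perform(lambda: print("running on", threading.current_thread().name))
q.perform(done.set)
done.wait()  # block until both tasks have run on the queue's thread
```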

I guess the other problem is whether you try to hide the fact that UI interaction etc. must happen on the main thread/queue. At the
xTalk level, it seems inappropriate for the user to need to know that, but inserting the various thread switches automatically
without destroying any hope of performance sounds a bit challenging...
-----------------------------------------------------------------------------------------------
Yeah, it feels kinda similar, and is a good start. I’d definitely go with arbitrary named queues, just because it feels like the
xTalk thing to do. Given xTalk is generally very forgiving and almost formless, with few restrictions on what you can do when, it
does feel kind of odd to limit threading to a particular object or to force users to have a message to contain their code. So making
it more of a control structure would make it feel more natural to scripters.

I wouldn’t necessarily have a problem with telling the user that certain operations are more costly on a thread. E.g. we could
dispatch all property manipulation to the main thread, transparently. Do we really need to protect a user from executing a ‘go’
command on a thread? Or should we just label that a slow command and move on? I mean, it’s not as if we prevent people from looping
forward over buttons by number and deleting each button. That’s a bit of logic everyone has to learn to code with an xTalk. (Until we
add an ‘all card buttons’ function that returns a list of buttons referenced by ID and make ‘delete’ cope with such lists, at least)

If the interpreter loop on the main thread implicitly yields to requests to do something on the main thread sent by other threads,
we can’t really have hard deadlocks, can we? Any deadlock would at least have to be the scripter’s fault, waiting for something. It’s
not as bad as C, though far from xTalk’s usual intuitiveness, or the even higher standards we’re striving for, but it is a baseline.

Still, I would prefer … something … more for an xTalk. This is still not far from detachNewThreadSelector. But being a Very High
Level Language, an xTalk can already do a lot more locking behind the scenes. But we can’t do it all. The classic example of adding
to a shared value really requires a lock or a critical region of some sort. Do we want to take out a lock on the card and background every
time a thread accesses a property or global to make sure the add command doesn’t lose a value? Worse, the same for the ‘set’ command?
‘put’? Those are a lot of thread sync points.
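
A quick Python illustration of why the `add` example above needs a lock: the update is a read-modify-write, and two threads can interleave between the read and the write and lose an update. Taking a lock around the whole step (the behind-the-scenes sync point discussed above) makes the result deterministic.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_repeatedly(times):
    global counter
    for _ in range(times):
        with counter_lock:   # without this, updates can be silently lost
            counter += 1     # read, add, write as one critical section

threads = [threading.Thread(target=add_repeatedly, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 -- deterministic only because of the lock
```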

Then again, we can detect this while parsing and switch to AddAtomicInstruction etc., so while a user can do this, we can also tell
them: “This is slow, this is how to structure your code if you need it to be faster”. Not that xTalk is a language where speed is the
highest priority. If a language embodies “First make it work, then make it fast”, it is probably xTalk.

However, while we can do this for direct variable accesses, if we have getter/setter methods, things get a lot more complicated.
Not that the other approaches we’ve discussed are that much less complicated… :-)

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am
Contact:

Re: Non-dispatching wait and event handling

Post by monte » Thu Mar 13, 2014 10:02 pm

I'd be surprised if we need much more than the concept of doing some work in the background|foreground. The syntax in that discussion seems pretty clunky to me compared to the dispatch idea I floated. I'd probably amend it to:

Code:

dispatch <commandName> to <objectReference> [with param1… n] [[in background|foreground] with message <callback>]

Adding dispatch in foreground would allow callbacks to the main script thread. Any dispatch or send commands would remain on the same thread unless specified otherwise. All engine messages come in on the main script thread. I suspect that this would be a bucket load of work by the way... anything that sends a callback would need to know which thread to send it to.
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US
Contact:

Re: Non-dispatching wait and event handling

Post by mwieder » Thu Mar 13, 2014 10:08 pm

Yeah, I like that approach as well. The interesting thing in the long thread I excerpted is the idea of named threads, which could simplify the situation of knowing which thread a callback would be attached to.

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am
Contact:

Re: Non-dispatching wait and event handling

Post by monte » Thu Mar 13, 2014 10:58 pm

Hmm… OK… how about there are two standard queues (foreground, background), with the ability to create more as required. A queue would have a thread and its own pending messages, with engine messages going to the foreground queue.

Maybe:

Code:

create queue <queue name>
dispatch <commandName> to <objectReference> [with param1… n] [[in {foreground|background|<queue name>}] with message <callback>]
delete queue <queue name>

A queue might have its own instance of the handlers to message, but perhaps we would need a sort of common script local which would be accessible from all threads and a regular script local which isn't. In most cases there would be no need to create a named queue, but it might be helpful in some cases.
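
The distinction above has a direct Python analogue (illustrative only): a per-thread "script local" where each queue's handlers see their own private copy, versus a "common script local" shared by every thread and guarded by a lock.

```python
import threading

per_queue = threading.local()  # each thread/queue gets its own attribute set
common = {"count": 0}          # the "common script local", shared by all threads
common_lock = threading.Lock()

def handler():
    # Private to this thread: other queues never see this value.
    per_queue.scratch = threading.current_thread().name
    # Shared across all threads, so access must be synchronized.
    with common_lock:
        common["count"] += 1

threads = [threading.Thread(target=handler) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(common["count"])  # 3
```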
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US
Contact:

Re: Non-dispatching wait and event handling

Post by mwieder » Thu Mar 13, 2014 11:27 pm

Making variables thread-safe is probably a whole topic on its own.

Q: what happens if two tasks are queued onto the background thread? What about onto a named thread? If this isn't prevented somehow then reference counting would have to come into play and "delete queue" commands would have to be themselves queued until safe to execute.

monte
VIP Livecode Opensource Backer
Posts: 1564
Joined: Fri Jan 13, 2012 1:47 am
Contact:

Re: Non-dispatching wait and event handling

Post by monte » Thu Mar 13, 2014 11:36 pm

I was thinking this form of dispatch would work like send in -1 milliseconds, so it would start executing as soon as the thread is idle. As you suggest, delete queue would need to wait until the queue is idle, using a flag that would also cause an error to be thrown if a script tried to dispatch to a queue that had already been deleted.
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

LCMark
Livecode Staff Member
Posts: 1207
Joined: Thu Apr 11, 2013 11:27 am

Re: Threading

Post by LCMark » Fri Mar 14, 2014 3:10 pm

@mwieder/@monte: Hope you don't mind - I asked Heather to split this topic as it diverged slightly from the original.

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US
Contact:

Re: Threading

Post by mwieder » Fri Mar 14, 2014 7:09 pm

OK - I don't see the divergence, but it's ok by me. I'll go back to lurking mode.
