Non-blocking repeat?

Posted: Mon Mar 24, 2014 12:32 am
by trenatos
I have a simple program that goes through a block of text and checks it against a list of words; if a word from the list is present, it's replaced with a bold version of itself.
The problem is that with long texts (book length) the program locks up until it's done, so I'm wondering if anyone knows of a threaded or non-blocking way to run a repeat loop.

Here's the code, not exactly pretty but functional.

Code: Select all

   -- show group "wkGroup" while the script runs
   set the visible of group "wkGroup" to true
   -- build the path to adverbs.txt, which sits next to the stack file
   put the filename of stack "advTool" into tempFolder
   set the itemDelimiter to slash
   delete the last item of tempFolder
   put slash & "adverbs.txt" after tempFolder
   open file tempFolder for read
   read from file tempFolder until EOF
   put it into theList
   close file tempFolder
   put field "out" into theWorks
   -- wrap the text in a paragraph if it has no <p> tags yet
   if not (theWorks contains "<p>") then
      put "<p>" & theWorks & "</p>" into theWorks
   end if
   replace CR & LF with "</p><br><p>" in theWorks
   -- the list holds one adverb per line
   set the itemDelimiter to CR
   repeat for each item theIt in theList
      -- only matches words with a space on both sides
      replace " " & theIt & " " with " <b>" & theIt & "</b> " in theWorks
   end repeat
   set the htmlText of field "out" to theWorks
   set the visible of group "wkGroup" to false

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 2:19 am
by bn
Hi Trenatos,

how long is long? How many adverbs? What is the length of the text (in chars)?

Basically you take the text of a field and change the formatting of the adverbs in the list by modifying the htmlText of that field. In itself this is a very fast operation.

Some of your assumptions are not right: a field in LiveCode has no CRLF line delimiter; it is always LF, for which CR is a synonym. This holds regardless of the source of the text: on Windows, where the line delimiter is CRLF, it is converted on the fly when text is read into a field. But that is not the speed-limiting factor.
What you do is a replace, and I suspect this is what slows things down, because LiveCode has to shuffle a lot of memory to replace the text. It is a lot faster to build a new list from your existing list using "put after". LiveCode is optimized for that: it only has to append, not split the list, insert the new string and reassemble the list.
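
As a rough illustration of the "put after" idea, here is a minimal sketch; it is not the code from any stack in this thread. The function name boldListedWords and its parameters are made up for the example. It assumes plain text in pText and one adverb per line in pAdverbs. Be aware that iterating by word collapses runs of spaces, that LiveCode treats a double-quoted passage as a single word, and that the sketch does not escape characters such as < or & for htmlText.

Code: Select all

function boldListedWords pText, pAdverbs
   local tIsAdverb, tOut, tLineOut
   -- turn the list into a lookup array, keyed by the lower-cased adverb
   repeat for each line tAdverb in pAdverbs
      if tAdverb is not empty then put true into tIsAdverb[toLower(tAdverb)]
   end repeat
   -- rebuild the text line by line, only ever appending
   repeat for each line tLine in pText
      put empty into tLineOut
      repeat for each word tWord in tLine
         if tIsAdverb[toLower(tWord)] is true then
            put "<b>" & tWord & "</b>" & space after tLineOut
         else
            put tWord & space after tLineOut
         end if
      end repeat
      -- drop the trailing space added after the last word
      if tLineOut is not empty then delete the last char of tLineOut
      put "<p>" & tLineOut & "</p>" & cr after tOut
   end repeat
   -- drop the trailing return added after the last line
   if tOut is not empty then delete the last char of tOut
   return tOut
end boldListedWords

It could be called as: set the htmlText of field "out" to boldListedWords(field "out", theList)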

I just tried this on a very long text of 5,000 lines and 10,494,999 chars, changing 40,000 occurrences of the word "Livecode" to bold, and it took about 380 milliseconds, basically using your approach of working on the htmlText.
Now, you want to change more than one word: you want to use a list of adverbs and set their style to bold. That may of course add to the time it takes, hence my question about how many adverbs.

But I would rather try to speed things up than do a non-blocking repeat.

If you could post an example stack with a couple of paragraphs (lines), which can be multiplied after download to the length of one of your average projects, and a typical list of adverbs in that stack, I would have a look at it.

Anyway, it takes more words to explain this than to attach a sample stack.

I hope Craig is not watching. :)

It illustrates what I try to explain above. First click on the button "Fill Field"; this generates the text. Then click on the button "bold HTML Trenatos". The script for filling the field is in the stack script, the other is in the button "bold HTML Trenatos". Adapting this to a list of words instead of a single word is up to you. You can make it easy by repeating the whole thing for each adverb; maybe you will find a faster way.

Kind regards

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 3:04 am
by trenatos
Hi Bernd,

The adverbs file contains 1338 separate words.

Yes, the idea is to change any word in the text field, that is also in the adverbs list, to bold.

My test-text is 103958 words, 622791 characters.

My assumptions about CRLF aren't assumptions, they're there in the text after it was copied in, and I do the replacement to keep paragraphs apart (Otherwise they all became one giant block of text), if there's a better solution, I'd love to hear it.
From what I know, CR and LF are *not* the same thing, and Notepad++ identifies it as two separate things as well, and it seems to work with the copied text.

The test text is a copyrighted book so I can't include it in a stack, but the stack is really just a text field, a button and a text saying "Working".

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 3:24 am
by trenatos
I just ran your stack, and while it says it only took 578ms to bold, it actually locked up for about 3 seconds.
The Fill field claims just over 4 seconds and seemed about accurate.
(I'm using 6.5.2 btw)

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 3:43 am
by dunbarx

A common method is to do this:

   repeat 100000000000
      wait 0 with messages
   end repeat

This allows you to run the IDE normally, at least within the time frame of that silly handler call. Try it. Make another button that puts a random number into msg. Click on the button while your repeat is running.
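
Applied to the replace loop from the first post, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the adverbs are taken from a field "adverbList" (a hypothetical name, used here instead of reading adverbs.txt), and the flag sBusy and the interval of 50 are arbitrary choices.

Code: Select all

local sBusy -- script-local flag, true while a run is in progress

on mouseUp
   local theList, theWorks, tCount
   if sBusy is true then exit mouseUp -- ignore clicks during a run
   put true into sBusy
   put field "adverbList" into theList -- hypothetical field, one adverb per line
   put field "out" into theWorks
   put 0 into tCount
   repeat for each line theIt in theList
      if theIt is empty then next repeat
      replace " " & theIt & " " with " <b>" & theIt & "</b> " in theWorks
      add 1 to tCount
      -- every 50 adverbs, let the engine handle pending messages
      if tCount mod 50 is 0 then wait 0 with messages
   end repeat
   set the htmlText of field "out" to theWorks
   put false into sBusy
end mouseUp

Because messages are delivered during the wait, a second click on the button would otherwise start a second run on top of the first; the sBusy guard is there to prevent that.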

Craig (always watching) Newman

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 8:21 am
by trenatos
Thanks Craig, that did the trick (Seems to keep the GUI from locking up)

Re: Non-blocking repeat?

Posted: Mon Mar 24, 2014 10:07 am
by bn
Hi Marcus,

Good thing that Craig watched; apparently the problem is solved.

trenatos wrote:
I just ran your stack, and while it says it only took 578ms to bold, it actually locked up for about 3 seconds.
The Fill field claims just over 4 seconds and seemed about accurate.

If you look at the times that are displayed when filling the field, you have a breakdown into the time it takes to generate the text in a variable and the time it takes to display the content of the variable when you put it into the field.

Sorry, I should have been more clear. 380 milliseconds refers to the generation of the htmlText in the variable, the display time is the same as for the original text. The time it takes Livecode to render the field is constant.
If you want to see the total time add

Code: Select all

put cr & "including putting the text into the field: " & the milliseconds - t && "ms total time" after field "rst_fld"

as the last line of the button's script. On my computer it is roughly 3500 milliseconds. In my example you would have 380 milliseconds that you could turn into non-blocking code; the remaining time is blocking, no matter what you do. That was the reasoning behind quoting these times.
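
To see the two parts side by side, a button script along the following lines could be used. This is a minimal sketch, not the script from the sample stack: it assumes a field "out" that holds the text, bolds the single word "Livecode", and the variable names are made up for the example.

Code: Select all

on mouseUp
   local tStart, tBuildTime, tHtml
   put the milliseconds into tStart
   -- part 1: build the marked-up text in a variable
   put the htmlText of field "out" into tHtml
   replace "Livecode" with "<b>Livecode</b>" in tHtml
   put the milliseconds - tStart into tBuildTime
   -- part 2: hand the result to the field
   set the htmlText of field "out" to tHtml
   put "build:" && tBuildTime && "ms, total including field update:" && (the milliseconds - tStart) && "ms"
end mouseUp

Only part 1 is a candidate for a non-blocking loop; the field update in part 2 is a single statement.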

You might have noticed that I don't set any line endings when building the htmlText. I just make the word bold the HTML way. That is because LiveCode does some behind-the-scenes magic on htmlText.

You could put a button into the stack and run this test on a non-empty result field:

Code: Select all

on mouseUp
   -- reports two booleans: does the field contain ASCII 13, and does it contain CR?
   put (field "rst_fld" contains numToChar(13)) && (field "rst_fld" contains CR)
end mouseUp

Anyway, you have it working now. Could you just tell me how long it takes to build your variable, or to run the whole script, on your text and adverbs?

Kind regards


Re: Non-blocking repeat?

Posted: Tue Mar 25, 2014 12:31 pm
by bn
Hi Marcus,

I was still intrigued by making adjectives from a list bold.

It turns out that I was wrong regarding "replace". It is in fact quite fast.

I changed the sample stack to test against a list of 1347 adjectives. I could not find a list of adverbs that long, but that is only for demonstration purposes.
Assuming that not all adverbs of your list will be in the text, I test which adjectives of the list actually occur in the text and reduce the list to those.

The way I do it is to convert the text into an array and the list into an array and make an intersect. (see code)

This reduces the time to process the text considerably, because instead of a list of 1300 you test against a list of maybe only 300.
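
For readers without the stack, the intersect step could be sketched roughly as follows. This is an assumption about how such a filter might be written, not the code from the stack; the function name adverbsInText and its parameters are invented for the example. Quotes are replaced by spaces first because LiveCode would otherwise treat a double-quoted passage as a single word.

Code: Select all

function adverbsInText pText, pAdverbs
   local tWordsA, tAdverbsA
   put toLower(pText) into pText
   replace quote with space in pText
   -- one array key per distinct word in the text
   repeat for each word tWord in pText
      put true into tWordsA[tWord]
   end repeat
   -- one array key per adverb in the list
   repeat for each line tAdverb in pAdverbs
      if tAdverb is not empty then put true into tAdverbsA[toLower(tAdverb)]
   end repeat
   -- keep only the adverbs whose key also exists in tWordsA
   intersect tAdverbsA with tWordsA
   return the keys of tAdverbsA
end adverbsInText

The result is a return-delimited list that can be fed to the replace loop in place of the full list. A word with punctuation attached, such as "quickly," with its comma, is not matched by this sketch.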

I also have a button that does the same with offset.

The only problem I see in the approach is that currently adverbs/adjectives are only found if there is a space before and a space after the index word.

For testing, there is an option, after generating the text, to append a number of adjectives to the test text. That is an option button that appends one-word lines surrounded by spaces.
That way you can see the impact of different numbers of occurrences in the text.

The button that uses the offset function has a conditional to test the character before the found offset and the character after the found offset. I had to implement that to avoid partial matches.
That could be used to test for non-space but still valid word boundaries like quotes, commas, periods, exclamation marks etc.
Using offset is unfortunately a lot more complicated than using replace.
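
A boundary test of that kind might look like the following minimal sketch. It is written from the description above and is not the script of the button in the stack; the function name boldWholeWord, its parameters and the set of boundary characters are assumptions made for the example.

Code: Select all

function boldWholeWord pText, pWord
   local tBoundaries, tLen, tTextLen, tSkip, tPos, tStart, tEnd
   local tBefore, tAfter, tCopied, tOut
   -- characters accepted as word boundaries (extend as needed)
   put space & tab & cr & quote & ",.;:!?()" into tBoundaries
   put the length of pWord into tLen
   put the length of pText into tTextLen
   if tLen is 0 then return pText
   put 0 into tSkip -- chars to skip before the next search
   put 0 into tCopied -- chars of pText already appended to tOut
   repeat forever
      put offset(pWord, pText, tSkip) into tPos
      if tPos is 0 then exit repeat
      put tSkip + tPos into tStart
      put tStart + tLen - 1 into tEnd
      -- neighbouring chars; empty at the very start or end of the text
      put empty into tBefore
      put empty into tAfter
      if tStart > 1 then put char (tStart - 1) of pText into tBefore
      if tEnd < tTextLen then put char (tEnd + 1) of pText into tAfter
      if (tBefore is empty or tBefore is in tBoundaries) and (tAfter is empty or tAfter is in tBoundaries) then
         -- whole word: append the untouched text before it, then the bold word
         if tStart - 1 > tCopied then put char (tCopied + 1) to (tStart - 1) of pText after tOut
         put "<b>" & (char tStart to tEnd of pText) & "</b>" after tOut
         put tEnd into tCopied
         put tEnd into tSkip
      else
         -- partial match, such as "so" inside "also": search on
         put tStart into tSkip
      end if
   end repeat
   -- append whatever follows the last match
   if tCopied < tTextLen then put char (tCopied + 1) to tTextLen of pText after tOut
   return tOut
end boldWholeWord

The output is built with "put after" rather than by editing pText in place, so the character positions returned by offset stay valid throughout the loop.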

Have a look at the stack.

Kind regards

Re: Non-blocking repeat?

Posted: Fri Mar 28, 2014 11:37 am
by bn
Hi Marcus,

did you have the time to look at the stack I posted?

If so did it help?

Kind regards