Here are ways to improve performance for processing large strings

MichaelBluejay · Post by **MichaelBluejay** » Fri Oct 04, 2019 9:48 am

I nearly abandoned the development of a commercial app using LiveCode because it was too slow for processing large strings (e.g., 8000 lines). I then found some ways to get the performance I needed, so I wanted to share some tips. I hope these tips will be added to the Dictionary and other documentation.

Of course, if you're doing data analysis where only you need to see the result, then waiting four seconds for the answer is no big deal. But if you're deploying a commercial app, you can't expect your users to wait an annoying four seconds every time the app needs to process data. Fortunately, there are solutions.

Benchmarking performance

When trying different strategies to see how long each one takes, time it with the milliseconds:

on mouseUp
   put the milliseconds into startTime
   put field "theHugeString" into stringToProcess
   repeat with counter = 1 to the number of lines of stringToProcess
      -- Do some processing here
   end repeat
   put the milliseconds - startTime -- elapsed ms goes to the Message Box
end mouseUp
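If you don't have a big field handy, you can generate test data in a variable and time a loop the same way. This is a minimal sketch: makeTestData is a made-up helper, and its eight comma-delimited items per line are just filler, chosen so the "delete item 3 to 7" examples in this post have something to work on.

```livecode
-- makeTestData (hypothetical helper): returns pLineCount comma-delimited
-- lines of 8 items each, with no trailing return.
function makeTestData pLineCount
   local tData, i
   repeat with i = 1 to pLineCount
      put i & ",alpha,beta,gamma,delta,epsilon,zeta,eta" & cr after tData
   end repeat
   if tData is not empty then delete the last char of tData -- drop the final return
   return tData
end makeTestData

on mouseUp
   local tData, tLine, tCount, tStart
   put makeTestData(8000) into tData
   put 0 into tCount
   put the milliseconds into tStart
   repeat for each line tLine in tData
      add 1 to tCount -- stand-in for the real per-line work
   end repeat
   put "Lines:" && tCount & "; ms:" && (the milliseconds - tStart)
end mouseUp
```

Swap the loop body for whatever you're testing and run each version a few times, since single timings can jump around.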

Use repeat for each... instead of repeat with...

For my 8000-line string, it took 3.9 seconds to process with repeat with...:

repeat with counter = 1 to the number of lines in hugeString
   delete item 3 to 7 of line counter of hugeString
end repeat

The same processing takes only 0.04 seconds with repeat for each:

put empty into output
repeat for each line theLine in hugeString
   delete item 3 to 7 of theLine
   put theLine & cr after output
end repeat
delete the last char of output -- drop the trailing return

Note that repeat for each... doesn't affect the original variable, so it's necessary to store each processed line in a new variable.
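Put together as a reusable function, it might look like this. It's a sketch: the name trimItems is made up, and it assumes comma-delimited lines like the ones above. To be on the safe side it edits a copy of each line rather than the loop variable itself, and it strips the trailing return that the "put ... & cr after" pattern leaves behind.

```livecode
-- trimItems (hypothetical name): returns pData with items 3 to 7 removed
-- from every line. pData itself is not changed.
function trimItems pData
   local tLine, tTrimmed, tOutput
   repeat for each line tLine in pData
      put tLine into tTrimmed -- edit a copy rather than the loop variable
      delete item 3 to 7 of tTrimmed
      put tTrimmed & cr after tOutput -- appending to a variable is cheap
   end repeat
   if tOutput is not empty then delete the last char of tOutput -- trailing return
   return tOutput
end trimItems
```

If you want the original variable updated, just put the result back into it in one go: `put trimItems(hugeString) into hugeString`.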

Use variables instead of fields

If your data is in a field, put it into a variable and then process the variable. Processing variables is way faster than processing fields.
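A minimal sketch of that pattern: read the field once, do all the work in a variable, and write the result back once. The field name "theHugeString" comes from the examples in this thread; processText and its toUpper step are placeholders for your own processing.

```livecode
-- processText (hypothetical): placeholder per-line work, done entirely
-- in a variable, never touching the field inside the loop.
function processText pData
   local tLine, tOutput
   repeat for each line tLine in pData
      put toUpper(tLine) & cr after tOutput
   end repeat
   if tOutput is not empty then delete the last char of tOutput
   return tOutput
end processText

on mouseUp
   local tData
   put field "theHugeString" into tData -- one read from the field
   put processText(tData) into field "theHugeString" -- one write back
end mouseUp
```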

Save the processed data so it doesn't have to be processed again, or pre-process it

In most cases, repeat for each (above) is so stupidly fast it will solve all your problems. When it doesn't, here are some other ideas to consider.

Once you've processed data, you can save it somewhere so you don't have to process it again in the future, such as a field, custom property, or SQLite database. If the result of the calculations will change when the user adds new data, save the old result anyway, and then just add the calculation for the newly-added data to the old result. That way you process only a small amount of data rather than the whole dataset.
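Here's one way the "add only the new data to the old result" idea could look. It's a sketch under some assumptions: new lines are only ever appended to the end of the data, the number being totaled is in item 2 of each line, and the cached values live in two made-up custom properties, cRunningTotal and cLinesCounted, on the current card.

```livecode
-- updatedTotal (hypothetical): adds item 2 of every line after line
-- pOldCount to pOldTotal and returns the new running total.
function updatedTotal pOldTotal, pOldCount, pAllData
   local tTotal, tStart, tNewLines, tLine
   put pOldTotal into tTotal
   if tTotal is empty then put 0 into tTotal
   if pOldCount is empty then
      put 1 into tStart -- nothing cached yet: start at line 1
   else
      put pOldCount + 1 into tStart
   end if
   -- -1 means "through the last line", so only the new lines are copied
   put line tStart to -1 of pAllData into tNewLines
   repeat for each line tLine in tNewLines
      add item 2 of tLine to tTotal
   end repeat
   return tTotal
end updatedTotal

-- refreshTotal (hypothetical): reads the cached values from the custom
-- properties, updates them, and stores them again.
command refreshTotal pAllData
   local tOldTotal, tOldCount
   put the cRunningTotal of this card into tOldTotal
   put the cLinesCounted of this card into tOldCount
   set the cRunningTotal of this card to updatedTotal(tOldTotal, tOldCount, pAllData)
   set the cLinesCounted of this card to the number of lines of pAllData
end refreshTotal
```

If users can edit or delete old lines, the cached total goes stale, so you'd need to invalidate the cache (e.g., empty both properties) whenever that happens.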

Another trick is to PRE-process the data. For example, let's say you have 800,000 records in a database, and the user wants to change the date format. The slow way is to pull all 800,000 records out of the database, reformat the date for each record and save it back to the database. Now say the user changes his mind and chooses *another* date format. Same as before, you pull all the records, process them, and then save them.

Instead, when the user enters a new record, you could save the date in three different fields, one for each date format the user can choose (e.g., 1/15/19; Jan 15, 2019; or 2019-01-15). Then when the user chooses a different date format, you just read the appropriate field from the database and don't have to process anything. This uses a little more disk space, so that's a tradeoff.
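To store every format up front, you'd compute all three strings once, when the record is first saved. A sketch, assuming you already have the year, month and day as numbers; dateVariants is a made-up name, and writing the three lines into your database columns is left out.

```livecode
-- dateVariants (hypothetical): returns the three display formats
-- 1/15/19, Jan 15, 2019 and 2019-01-15, one per line.
function dateVariants pYear, pMonth, pDay
   local tMonths, tShort, tLong, tISO
   put "Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec" into tMonths
   -- short US style with a two-digit year
   put pMonth & "/" & pDay & "/" & (char -2 to -1 of pYear) into tShort
   -- abbreviated month name, day, comma, four-digit year
   put (item pMonth of tMonths) && pDay & "," && pYear into tLong
   -- ISO style: pad month and day to two digits
   put pYear & "-" & (char -2 to -1 of ("0" & pMonth)) & "-" & \
         (char -2 to -1 of ("0" & pDay)) into tISO
   return tShort & cr & tLong & cr & tISO
end dateVariants
```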

Process in the background

If the user doesn't need the processed data right away, you can process it in the background with LiveCode's on idle handler.
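A sketch of background processing. Note that it swaps in a different technique from the idle handler mentioned above: it uses "send ... in 0 milliseconds" to process one chunk, then hands control back to the engine so the UI stays responsive before the next chunk runs. As far as I know the Dictionary's entry for idle points toward "send in time" as the more efficient option, but check it for your version. The field names, the script-local variables, the 500-line chunk size and the toUpper step are all placeholders.

```livecode
-- Script-local state shared between chunks (names are hypothetical).
local sPending, sOutput
constant kChunkSize = 500

on startBackgroundJob
   put field "theHugeString" into sPending
   put empty into sOutput
   send "processChunk" to me in 0 milliseconds
end startBackgroundJob

on processChunk
   local tChunk, tLine
   -- take the next slice off the front of the pending data
   put line 1 to kChunkSize of sPending into tChunk
   delete line 1 to kChunkSize of sPending
   repeat for each line tLine in tChunk
      put toUpper(tLine) & cr after sOutput -- placeholder processing
   end repeat
   if sPending is empty then
      if sOutput is not empty then delete the last char of sOutput
      put sOutput into field "output" -- all done
   else
      send "processChunk" to me in 0 milliseconds -- yield, then continue
   end if
end processChunk
```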

Use an older version of LiveCode?

AxWald reports below that LiveCode 6.7.10 ran his repeat for each example nearly twice as fast as LiveCode 9.5 (59 ms vs. 102 ms).

Process large strings in slices

This section is probably obsolete. I wrote it before I learned that repeat for each is way faster than repeat with.... I'm retaining it to illustrate the concept.

As a string gets longer, processing it line by line with repeat with... takes disproportionately longer: the time grows roughly with the square of the number of lines. That is, a string that's 8 times longer than a small string doesn't take 8 times longer to process; it can take 65 times longer (about 8 squared)! For example, when I run a loop to process a certain number of lines, this is how long it takes:

# of lines    # of seconds
----------- -----------------
 1000        0.06
 2000        0.27
 3000        0.58
 4000        0.99
 5000        1.50
 6000        2.20
 7000        3.00
 8000        3.90

I think I read long ago that the reason for the slowdown is that when LiveCode processes lines, it starts counting from the very beginning of the string on each iteration. Assuming this is true, when you target, say, line 4256 of a string, LiveCode doesn't jump straight to line 4256; it starts with line 1 and counts through until it gets to line 4256. So, when traversing a large string, it's fast at first but then progressively bogs down.

My solution here was to split up the string into 1000-line slices, process each slice, and then string them together. Doing that, I'm able to process 8000 records in only 0.9 seconds, over 4 times faster than processing a single 8000-line string.

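on mouseUp
   put field "theHugeString" into stringToProcess
   put empty into field "output"
   -- number of 1000-line slices, rounded up
   put ((the number of lines of stringToProcess) + 999) div 1000 into sliceCount
   repeat with sets = 1 to sliceCount
      put line 1 to 1000 of stringToProcess into subset
      delete line 1 to 1000 of stringToProcess

      repeat with counter = 1 to the number of lines of subset
         -- Do some processing here
      end repeat
      put subset & cr after field "output" -- keep a return between slices
   end repeat
   delete the last char of field "output" -- drop the final return
end mouseUp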

You might think that you could process the first line of the string, save it somewhere else, delete it, and then process the next line, which is now the first line because you deleted the previous one after processing and saving it. The idea is that you'd always be processing line 1, which should be fast. Unfortunately this doesn't help if you have to save each line: as you tack each processed line onto the end of a new string or a field (even with lockScreen and lockMessages set to true), that container gets bigger, and LiveCode presumably has to count from the top of the container every time you append to it. (And by the way, appending to the end of a field takes about twice as long as appending to a string variable.)

What if the order of the data didn't matter? Could you write each processed line to the *beginning* of a new string so that LiveCode wouldn't have to count from the top? I tried that, and for whatever reason, it doesn't save any time.
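Another workaround for the count-from-line-1 cost described in this section, if you really do need to reach lines by number, is to split the string into an array once, so each line is looked up by its numeric key instead of being found by counting returns from the top. A sketch; trimItemsViaArray is a made-up name, and it's worth timing against the table above on your own data before relying on it.

```livecode
-- trimItemsViaArray (hypothetical): deletes items 3 to 7 of every line,
-- reaching each line through an array key rather than a line number.
function trimItemsViaArray pData
   local tLines, tCount, tLine, tOutput, i
   put pData into tLines
   split tLines by return -- tLines[1] .. tLines[N] now hold the lines
   put the number of elements of tLines into tCount
   repeat with i = 1 to tCount
      put tLines[i] into tLine -- direct key lookup, no counting of returns
      delete item 3 to 7 of tLine
      put tLine & cr after tOutput -- rebuild in key order 1..N
   end repeat
   if tOutput is not empty then delete the last char of tOutput
   return tOutput
end trimItemsViaArray
```

The rebuild loop walks the keys in order 1 to N, so the original line order is preserved.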

AxWald · Post by **AxWald** » Fri Oct 04, 2019 11:43 am

Hi,
most important here:
If you go for large data, avoid "repeat with ..."! Instead use "repeat for each ..."

Example:
I dumped a table from an SQL database, 67247 records, 77 fields each, 18 MB. I'll count how often "899" occurs in the 3rd field.

1st try:

on mouseUp
   put URL "file:C:\Users\user\Desktop\demo.csv" into myData
   set itemdel to tab
   put 0 into myCounter
   put the millisecs into t1
   repeat with i = 1 to the number of lines of myData
      if item 3 of line i of mydata = "899" then
         add 1 to myCounter
      end if 
   end repeat
   put "Hits:" && myCounter & "; Time spent:" && the millisecs - t1
end mouseUp

Result: "Hits: 719; Time spent: 513464" (about 8.5 minutes) - unacceptable.

Now a faster approach:

on mouseUp
   put URL "file:C:\Users\user\Desktop\demo.csv" into myData
   set itemdel to tab
   put 0 into myCounter
   put the millisecs into t1
   repeat for each line L in myData
      if item 3 of L = "899" then
         add 1 to myCounter
      end if 
   end repeat
   put "Hits:" && myCounter & "; Time spent:" && the millisecs - t1
end mouseUp

Result: "Hits: 719; Time spent: 59" - that's better, right?

Additional: It might be useful to consider the version of LC used. The times above are with 6.7.10 Win, the version I use in everyday coding. The 2nd example in LC 9.5 Win-64bit gives:
Result: "Hits: 719; Time spent: 102" - nearly double the millisecs ...

Have fun!

Klaus · Post by **Klaus** » Fri Oct 04, 2019 11:46 am

Hi Michael,

you want speed? Then use the "repeat for each" syntax!

Try this and see what I mean:

on mouseUp
   put the milliseconds into startTime
   put field "theHugeString" into stringToProcess
   lock screen
   repeat for each line tLine in stringToProcess
      ## tLine holds the CONTENT of the line, not the line number!
      ## This way LC does NOT count the lines in every loop and is thus lightning fast!
      ## Do some processing here with tLine
   end repeat
   unlock screen
   put the milliseconds - startTime
end mouseUp

Best

Klaus

MichaelBluejay · Post by **MichaelBluejay** » Fri Oct 04, 2019 12:06 pm

Thanks, guys. I'd written off repeat for each because it doesn't modify the original string (which is what I need), so I thought I had to use repeat with. However, I just tried it and found that even if I create a new variable to hold the processed data, it's still blazing fast. I'll edit my first post to include that, so it's a one-stop shopping list of performance tips for large strings.

Klaus · Post by **Klaus** » Fri Oct 04, 2019 12:11 pm

Hi Michael,

MichaelBluejay wrote (Fri Oct 04, 2019 12:06 pm):
> Thanks, guys. I'd written off for each because it doesn't modify the original string (which is what I need) so I thought I should use the regular for. However, I just tried it, and found that even if I create a new variable to hold the processed data, it's still blazing fast.

Yes, that's the way to go: just create a new variable for the processed data.
And one has to see the speed to believe it, right?

Best

Klaus

MichaelBluejay · Post by **MichaelBluejay** » Fri Oct 04, 2019 12:18 pm

Indeed.

I think I'm a bigger fan of this forum than of LiveCode itself.

AxWald · Post by **AxWald** » Fri Oct 04, 2019 12:44 pm

Hi,

right, leaving the original data "as-is" & building a new output variable usually is much faster.
Only, at a certain point (= VERY big variables) you'll hit a wall - when appending to the output variable takes more & more time each iteration!
Should this happen you'll be faster writing your output variable to disk ("open file 'output.txt' for append" ...).
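A sketch of that idea: build the output in a buffer and, every so often, write the buffer to a file opened for append and then empty it, so the variable in memory never gets huge. writeProcessed, kFlushEvery, the toUpper step and the file path are all placeholders. The file is emptied first so output from an earlier run doesn't stay at the top.

```livecode
constant kFlushEvery = 10000 -- lines per write; tune for your data

-- writeProcessed (hypothetical): processes pData line by line and
-- streams the result to the file at pPath in batches.
command writeProcessed pData, pPath
   local tLine, tBuffer, tCount
   put empty into URL ("file:" & pPath) -- create or empty the file
   open file pPath for append
   if the result is not empty then throw the result -- could not open the file
   put 0 into tCount
   repeat for each line tLine in pData
      put toUpper(tLine) & cr after tBuffer -- placeholder processing
      add 1 to tCount
      if tCount mod kFlushEvery = 0 then
         write tBuffer to file pPath
         put empty into tBuffer -- keep the in-memory buffer small
      end if
   end repeat
   if tBuffer is not empty then write tBuffer to file pPath
   close file pPath
end writeProcessed
```

Every line, including the last, ends with a return in the output file.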
An example is here.

Have fun!

[-hh] · Post by **[-hh]** » Fri Oct 04, 2019 3:01 pm

@AxWald

With such huge data, I wonder whether the following could gain even more speed?

on mouseUp
   put URL "file:C:\Users\user\Desktop\demo.csv" into myData
   set itemdel to tab
   put the millisecs into t1
   repeat for each line L in myData
      add 1 to c[item 3 of L is "899"] #<-- do it in one stroke: the key is "true" or "false", no "if" needed
   end repeat
   put "Hits:" && c[true] & "; Time spent:" && the millisecs - t1
end mouseUp

LiveCode Forums
