How to Speed Up This Code?

sritcp
Posts: 431
Joined: Tue Jun 05, 2012 5:38 pm
Location: Alexandria, Virginia

Post by sritcp » Thu Jul 30, 2015 4:23 pm

I have a plain text document that has about 35000 lines.
Every so often, the text contains this pattern:

Special Education Director
Jane Smith
PH: 123-457-7890
Fax: 234-567-8901
jane.smith@something.org


I need to harvest the fax numbers of special ed directors. So I used this code:

Code:

   put the milliseconds into tTime
   repeat with i = 1 to the number of lines in field "RawData"
      if  line i of field "RawData" contains "Special Education Director" then
         put line (i+3) of field "RawData" & return after field "Output"
         add 3 to i
      end if
   end repeat
   put "Done" into msg
   put (the milliseconds - tTime) after msg
It takes 567964 millisecs or over 9 minutes to do this! I find this unreasonable.
I know that I can improve the efficiency by holding the output in a variable instead of writing into the "Output" field each time, etc., but I doubt that will bring it down a whole lot. I was hoping the whole thing would take no more than a few seconds (say 30). Am I being unreasonable?

My objective here is to improve my skills generally rather than solve this specific problem. Any advice is welcome.

Thanks,
Sri.

P.S.: I can upload the plain text raw data if you need it. It is public data of a state education directory.

Klaus
Posts: 13823
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Post by Klaus » Thu Jul 30, 2015 4:59 pm

Hi Sri,

Some general speed hints :D

1. lock screen!
2. do NOT access fields in a repeat loop! Do everything with variables and, after computing, write the text back into the field!
3. use REPEAT FOR EACH whenever possible!

Try this (off the top of my head :) )

Code:

...
   put the milliseconds into tTime
   lock screen
   put fld "RawData" into tRD
   repeat with i = 1 to the number of lines of tRD STEP 3
      if line i of tRD contains "Special Education Director" then
         put line (i+3) of tRD & return after OutPutVar
      end if
   end repeat
   put OutPutVar after field "Output"
   unlock screen
   put "Done" into msg
   put (the milliseconds - tTime) after msg
...
Will surely be a tad faster than before :D


Best

Klaus

sritcp

Post by sritcp » Thu Jul 30, 2015 9:25 pm

Hi Klaus:

Thanks for your suggestions.

1. Putting input and output into variables (not fields) reduces the total time only by about 5%.

2. It is better not to use lock screen when the processing time is appreciable and the user needs to be updated on the progress (so they don't freak out).

3. In my example, I can't loop in steps of 3 (as you have suggested). I have to look at each line to see if it contains "Special Education Director" and then can skip the next few lines. After that I have to check each line.
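Point 3 can also be handled without any `line (i+3) of ...` lookup. Below is a minimal sketch, assuming (as in the sample record and the original loop) that the fax line is always exactly three lines below the "Special Education Director" line. The function name `harvestSpEdFaxes` is hypothetical; the field names are the ones from the thread. It makes a single `repeat for each` pass and uses a small countdown instead of a line index, so "skipping the next few lines" costs nothing extra:

```livecode
-- Hypothetical helper: collects the line 3 below each
-- "Special Education Director" line, in one pass over pText.
-- Assumption: every record has the fax exactly 3 lines below the title.
function harvestSpEdFaxes pText
   local tOutput, tCountdown, tLine
   put 0 into tCountdown
   repeat for each line tLine in pText
      if tCountdown > 0 then
         -- inside a record: count down toward the fax line
         subtract 1 from tCountdown
         if tCountdown = 0 then
            put tLine & return after tOutput
         end if
      else if tLine contains "Special Education Director" then
         -- title found: the fax line is 3 lines further on
         put 3 into tCountdown
      end if
   end repeat
   -- drop the trailing return, if anything was found
   if tOutput is not empty then delete the last char of tOutput
   return tOutput
end harvestSpEdFaxes

on mouseUp
   local tTime
   put the milliseconds into tTime
   lock screen
   -- read the field once and write the field once
   put harvestSpEdFaxes(field "RawData") into field "Output"
   unlock screen
   put "Done" && (the milliseconds - tTime) into msg
end mouseUp
```

While the countdown is running, a second title line would be ignored, which mirrors the `add 3 to i` skip in the original loop.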

I still think 9 minutes is way too long. Maybe text processing is, by its very nature, slow.

Regards,
Sri

Klaus

Post by Klaus » Thu Jul 30, 2015 10:19 pm

Hi Sri,
sritcp wrote:1. Putting input and output into variables (not fields) reduces the total time only by about 5%.
At least something...
sritcp wrote:2. It is better not to use lock screen when the processing time is appreciable and the user needs to be updated on the progress (so they don't freak out).
Sure!
sritcp wrote:3. In my example, I can't loop in steps of 3 (as you have suggested). I have to look at each line to see if it contains "Special Education Director" and then can skip the next few lines. After that I have to check each line.
Oh, sorry, looks like I misunderstood your problem a bit.
Will think about it again :D


Best

Klaus

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Location: Berkeley, CA, US

Post by mwieder » Thu Jul 30, 2015 11:29 pm

Sri-

Lock the screen anyway. You can show progress through other means.
The following speeds things up by a factor of 3:

Code:

on mouseUp
   local tTime
   local tData
   local tFound
   local i
   local tCursor
   
   lock screen
   put field "RawData" into tData
   put 1 into i
   put the cursor into tCursor -- save the current cursor
   put the milliseconds into tTime
   repeat for each line tLine in tData
      if "Special Education Director" is in tLine then
         put line (i+3) of tData & cr after tFound
      end if
      add 1 to i
      set the cursor to busy -- show progress
   end repeat
   set the cursor to tCursor -- restore the original cursor
   put tFound into field "Output"
   put "Done" && (the milliseconds - tTime) into msg
   unlock screen
end mouseUp
On the other hand, if your code is really structured in the simple format you posted, the following gives a 1000% increase in speed:

Code:

on mouseUp
   local tTime
   local tData
   
   lock screen
   put the milliseconds into tTime
   put field "RawData" into tData
   filter tData with "Fax:*"
   put tData into field "Output"
   put "Done" && (the milliseconds - tTime) into msg
   unlock screen
end mouseUp

Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am
Location: Palo Alto

Post by Simon » Fri Jul 31, 2015 12:04 am

@mwieder Shouldn't that be add 5 to i?

Simon
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!

sritcp

Post by sritcp » Fri Jul 31, 2015 12:19 am

mwieder wrote:On the other hand, if your code is really structured in the simple format you posted,...
No, it is not.
Each school district has several kinds of administrators (I am only interested in one kind); so, my data of interest is sprinkled among, e.g.,

Superintendent
John Doe
PH: 321-555-1234
Fax: 890-444-4321
john.doe@school.org

plus a whole lot of other tracts of text.

Regards,
Sri

Simon

Post by Simon » Fri Jul 31, 2015 12:24 am

Hey Sri,
I'm game to play!
Do you have a URL to the data? Not saying I can beat any of the suggested methods, like you, just want to test/increase my knowledge.

Simon
Edit: I'm thinking Thierry will drop in and beat us all with his regex kung fu. Wait a sec... mwieder gots regex chops!

sritcp

Post by sritcp » Fri Jul 31, 2015 12:33 am

mwieder wrote:Sri-

Lock the screen anyway. You can show progress through other means.
The following speeds things up by a factor of 3:

..............

Unbelievably fast!
Do you mean to say that the gain in speed is due only to lock screen (I don't see any other significant changes to my code)?
I was under the impression that, since the screen wasn't getting updated during the handler run, any gain would be minimal. But this is astonishing!

Could you (or someone) explain why lock screen would result in such gains -- or more correctly, why an unlocked screen would cause such a drag even when nothing on the screen was being updated?

Sri.

sritcp

Post by sritcp » Fri Jul 31, 2015 12:45 am

sritcp wrote: Do you mean to say that the gain in speed is due only to lock screen (I don't see any other significant changes to my code)?..........
Sri.
My bad, Mark!
You have used "repeat for each line in ..." instead of "repeat with i=1 to ..." (and used a counter to keep track of the line number).
SO THAT'S WHAT DID IT!

I knew "repeat for each..." is much faster than "repeat with i = 1 to ..." but didn't imagine the quantum leap!

I think I (and a lot of others would benefit, too) need a few tutorials on how some of these functions/commands/iterative loops interact with the LC engine and their speed implications.

Thanks a lot, Mark!

(Simon, I have attached the plain text file of data, if you want to play around with it.
EDIT: The file won't upload. This forum says ".txt" files are not allowed. The .txt file is 689 KB. The original rich text file is 11.8 MB !!
EDIT #2: I have attached the zipped file. Thanks for the hint, mwieder!).

Regards,
Sri
Attachment: Plaintext SD SpEd 2015 Sch Directory.zip (85.4 KiB)
Last edited by sritcp on Fri Jul 31, 2015 2:19 pm, edited 1 time in total.

FourthWorld
VIP Livecode Opensource Backer
Posts: 9823
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Post by FourthWorld » Fri Jul 31, 2015 1:27 am

sritcp wrote:I think I (and a lot of others would benefit, too) need a few tutorials on how some of these functions/commands/iterative loops interact with the LC engine and their speed implications.
I've been collecting notes for a performance FAQ, hopefully up at livecodejournal.com in a few weeks.

The Dictionary entry for "repeat" notes that the "for each" form is much faster, but doesn't elaborate as to why.

The trick to most of these things is to think like a machine, though fortunately that doesn't happen naturally for most people. :)

Any chunk expressions, like "the number of lines", may be easy to write, but they require the engine to examine every character, counting carriage returns as it goes. When used in a repeat loop, such as "get line i of tSomeText", it means that each time through the loop it repeats that process of counting carriage returns from the beginning, taking longer and longer each time as it goes deeper into the text.

The "repeat for each" form bypasses all of that very smartly: each time through the loop it only examines the characters up to the chunk type (in the case of lines, return), puts what it found up to that point into the variable you'd specified in the loop instruction, ready to be operated on by your script. As it goes it keeps track of where it left off, so each time through the loop it counts only one line.
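To see the difference Richard describes on your own machine, here is a minimal timing sketch. The function name `compareLoopStyles` and the synthetic data it builds are hypothetical, not from the thread, and absolute timings depend on the machine and engine version, so none are quoted here. If the engine behaves as described above, the first loop's cost should grow roughly with the square of the line count, while the second grows roughly linearly:

```livecode
-- Hypothetical function: times "line i of" indexing against "repeat for each"
-- on synthetic data of pLineCount lines.
function compareLoopStyles pLineCount
   local tData, tTime, tCount1, tCount2, tIndexedMs, tForEachMs, tReport, i, tLine
   -- build synthetic test data, one short line per number
   repeat with i = 1 to pLineCount
      put "line" && i & return after tData
   end repeat
   if tData is not empty then delete the last char of tData -- drop trailing return

   -- 1. Indexed access: each "line i of tData" counts returns from the start
   put 0 into tCount1
   put the milliseconds into tTime
   repeat with i = 1 to pLineCount
      if line i of tData contains "9" then add 1 to tCount1
   end repeat
   put (the milliseconds - tTime) into tIndexedMs

   -- 2. "repeat for each": the engine resumes where the previous line ended
   put 0 into tCount2
   put the milliseconds into tTime
   repeat for each line tLine in tData
      if tLine contains "9" then add 1 to tCount2
   end repeat
   put (the milliseconds - tTime) into tForEachMs

   -- both loops should report the same number of matching lines
   put "repeat with:" && tIndexedMs && "ms," && tCount1 && "matches" into tReport
   put "; repeat for each:" && tForEachMs && "ms," && tCount2 && "matches" after tReport
   return tReport
end compareLoopStyles
```

From the message box, `put compareLoopStyles(35000)` would run it at roughly the size of Sri's file; trying a few sizes (for example 5000, 10000, 20000) shows how each loop scales.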

As a general rule, whenever you find performance in LiveCode surprisingly slow compared to other scripting languages like Python, PHP, etc., you'll find you can usually bring performance at least on par with those by using the best language options for the task at hand.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

mwieder

Post by mwieder » Fri Jul 31, 2015 3:23 am

Sri-

There are actually several things going on here.

1. Locking the screen while doing calculations improves the speed because updating the screen is slow.
In my test data file, the cursor barely had a chance to update, and the code would be even faster if I didn't update it, because again it's updating the screen in real time. (Setting the cursor to busy during a repeat loop is a trick to show that something's going on... each time through the loop the spinning cursor increments to its next cycle, so the user knows that the computer is actually busy)

2. Using data from a variable instead of a field object also contributes to the speed increase.
When you retrieve text from a field or put new text into it, more goes on than just dealing with text.
I haven't been through that section of code, but I think the engine is making a copy of the field's text and then extracting the section you want when you retrieve it. Writing into the field is even worse because a lot of text format mangling happens that isn't required in a variable that's just dealing with text itself.

3. Using the "repeat for each" format is *much* faster than any other form of looping.
I'm not sure why, and I'm looking forward to Richard's article.
The only downside to the "repeat for each" method is, as you've noticed, that you sometimes have to manipulate a loop index yourself.
But I almost always find that disadvantage more than offset by the speed increase of the repeat loop.
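mwieder's three points can be combined in one handler. The sketch below is a minimal example, not code from the thread: the function name `collectSpEdFaxes` and the constant `kProgressInterval` (1000 lines, an arbitrary choice) are hypothetical. It reads the field once, writes the field once, keeps its own line index alongside `repeat for each`, and advances the busy cursor only every `kProgressInterval` lines rather than on every line, so the user still sees activity without a cursor update per iteration:

```livecode
-- Hypothetical: how often (in lines) to advance the busy cursor.
constant kProgressInterval = 1000

-- Hypothetical helper: scans pData once and returns the matching fax lines.
function collectSpEdFaxes pData
   local tFound, tLineNum, tLine
   put 0 into tLineNum
   repeat for each line tLine in pData
      add 1 to tLineNum -- manual line index, as in mwieder's version
      if tLine contains "Special Education Director" then
         -- still a counted lookup, but it runs only once per match
         put line (tLineNum + 3) of pData & return after tFound
      end if
      -- advance the busy cursor every kProgressInterval lines, not every line
      if tLineNum mod kProgressInterval = 0 then set the cursor to busy
   end repeat
   return tFound
end collectSpEdFaxes

on mouseUp
   local tCursor, tTime
   put the milliseconds into tTime
   put the cursor into tCursor -- save the current cursor
   lock screen
   -- read the field once, write the field once
   put collectSpEdFaxes(field "RawData") into field "Output"
   unlock screen
   set the cursor to tCursor -- restore the original cursor
   put "Done" && (the milliseconds - tTime) into msg
end mouseUp
```

The `line (tLineNum + 3) of pData` lookup still counts lines from the start, as Richard explains, but it runs once per match (about 250 times in Sri's data) rather than once per line. If matches were very common, tracking the offset with a counter inside the loop would avoid that remaining cost.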

mwieder

Post by mwieder » Fri Jul 31, 2015 3:26 am

Oh... and if you zip the text file you may be able to attach it.

sritcp

Post by sritcp » Fri Jul 31, 2015 4:29 am

FourthWorld wrote:...... Any chunk expressions, like "the number of lines", may be easy to write, but they require the engine to examine every character, counting carriage returns as it goes. When used in a repeat loop, such as "get line i of tSomeText", it means that each time through the loop it repeats that process of counting carriage returns from the beginning, taking longer and longer each time as it goes deeper into the text.

The "repeat for each" form bypasses all of that very smartly: each time through the loop it only examines the characters up to the chunk type (in the case of lines, return), puts what it found up to that point into the variable you'd specified in the loop instruction, ready to be operated on by your script. As it goes it keeps track of where it left off, so each time through the loop it counts only one line.
This explains why the performance difference is particularly marked in cases where we are looking for a needle in a haystack; in my dataset, I was looking for 250 lines of interest among 35,000 total lines. So, "line i of tRawData" was starting at line #1 each time and plodding through. Towards the end, it was counting 35,000 lines to do a single iteration! The performance difference would not have been so stark if, for instance, 30,000 of the 35,000 lines contained retrievable data. This gives me some generalizable patterns to look for when evaluating data.
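The cost Sri describes can be made concrete with a rough count, assuming (as Richard describes) that each `line i of` lookup walks from the first line to line i. In the original loop that lookup runs on every iteration, so for n lines the total number of lines walked is:

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2},
\qquad
n = 35{,}000 \;\Rightarrow\; \frac{35{,}000 \times 35{,}001}{2} = 612{,}517{,}500
```

A single `repeat for each` pass walks each line once, i.e. n = 35,000 lines, so the indexed loop does about (n+1)/2, roughly 17,500 times, as much scanning. That figure depends only on n, because the original loop evaluates `line i of` on every iteration whether or not the line matches; the number of matches matters for the `line (i+3) of tData` lookups that remain in the `repeat for each` version, which run once per match. Wall-clock ratios will differ from this count, since each iteration also has fixed costs (and the original also read from the field on every iteration).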

I look forward to Richard's article. (I was under the impression that LiveCodeJournal was in suspended animation; the tutorials in there are of some vintage and the last blog entry is over a year old. Glad to know it is coming alive again!)

Regards,
Sri

sritcp

Post by sritcp » Fri Jul 31, 2015 4:33 am

mwieder wrote: .......... There are actually several things going on here.........
Yes, I see that.
Using variables instead of fields gave about 5% improvement. I didn't test lock screen separately, but I doubt it contributed much to the saving since very little screen updating was going on during the handler. "Repeat for each...." seems to account for most of the benefit, at least in this specific case.

Regards,
Sri


Return to “Getting Started with LiveCode - Experienced Developers”