adventuresofgreg wrote:I stand WAY corrected. I read over the benchmarking link that you sent, and set up a few of my own tests. In every case, avoiding fields as much as possible was faster - even with lock screen.
I've been using this family of languages for more than 20 years, and in the early days many xTalk dialects were quite slow, so I took up the habit of testing various ways of doing things to see which was faster. Sometimes the difference is trivial, but when it pays off it can pay off big, freeing up enough clock cycles to add new features like real-time updates.
LiveCode is so fast that many benchmarks don't matter much, but sometimes I find surprising things that make it worth taking a minute or so to test them out.
Glad that was helpful for you.
One question that is still bothering me: why is:
repeat for each line myline in holder
so much faster than:
repeat with i=1 to the number of lines of holder
The reason this bothers me is that when using the fast method, I have no way of referring to a line at some offset from the current "myline".
Let me first explain what's happening under the hood so you can better appreciate the speed difference, and then I'll offer a solution for your counter that will hopefully be just what you need while still taking advantage of the speed boost:
When you use this:
Code: Select all
repeat with i = 1 to the number of lines of tData
   DoSomethingWith line i of tData
end repeat
…here's what the engine needs to do each time through the loop:
1. That repeat statement requires the engine to walk through every character in tData, counting the return characters to arrive at the line total, while also incrementing the integer in i. The engine has no way to know whether tData was changed within the loop, so it must re-examine the data in full on every iteration.
2. The second line requires the engine to repeat most of the work it just did in the repeat statement before it, examining every character and counting returns until it reaches line number i.
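As an aside: if for some reason you're stuck with the counted form, one partial mitigation is to count the lines once before the loop, so at least the repeat statement itself isn't re-scanning tData on every pass. A sketch (using the same placeholder DoSomethingWith handler as above); note that the "line i" lookup inside the loop still scans from the top, so this only helps partially:

```
-- Count once, outside the loop, instead of on every iteration:
put the number of lines of tData into tLineCount
repeat with i = 1 to tLineCount
   -- "line i of tData" still walks tData from the top each time,
   -- so the loop as a whole remains roughly O(n^2)
   DoSomethingWith line i of tData
end repeat
```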
In contrast, when you use this:
Code: Select all
repeat for each line tLine in tData
   DoSomethingWith tLine
end repeat
…the engine is only counting return characters once for the entire duration of the process:
1. Since this repeat form allows the engine to assume that the data in tData will not be changed in the loop (indeed, it won't even allow changing tData), it makes a single pass through the data, examining every character but stopping at each return and putting everything since the previous return into tLine, so that:
2. tLine is already parsed out and ready to use, with no further traversing of tData needed.
If you need a counter in a "repeat for each" loop, you can do this:
Code: Select all
put 0 into i
repeat for each line tLine in tData
   add 1 to i
   DoSomethingWith tLine, i
end repeat
In that setup i can be used to update a progress indicator or do other things that depend on knowing which line you're currently working with. The cost of incrementing an integer is so low that it has no noticeable effect on performance.
If you need to alter the data in tData, you'll find that "put…after" has been well optimized since LiveCode 1.5, so you can rebuild lists on the fly very quickly. A simple example:
Code: Select all
put empty into tNuData
put 0 into i
repeat for each line tLine in tData
   add 1 to i
   put "This is line # " & i & tab & tLine & cr after tNuData
end repeat
delete last char of tNuData -- trailing cr
return tNuData
In your specific scenario, however, I would consider a completely different approach since you need to be able to operate on multiple lines in each iteration:
To keep it simple, we have a list of 10000 lines of random numbers. Let's say that I want to add the 1st line to the 100th line - for all 10000 lines.
If I use the "repeat with i = 1 to 10000" method, then all I need to do is add line i to line i + 100, then next i.
If I use the "repeat for each line myline" method, then I can't refer to "myline + 100". I have to start building a second list where I can refer to line x and line x + 100, and that slows the process down tremendously.
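For reference, here's a rough sketch of that index-based version (my own illustration, not benchmarked; tNuData is a hypothetical output variable, and I stop at line 9900 since with 10000 lines only those have a partner line 100 below them):

```
-- Naive index-based version: both chunk lookups scan tData from
-- the top on every iteration, which is what makes this form slow.
put empty into tNuData
repeat with i = 1 to 9900
   put (line i of tData) + (line (i + 100) of tData) & cr after tNuData
end repeat
delete last char of tNuData -- trailing cr
```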
This seems like an excellent place to use an array for part of that.
Using "repeat for each" parses data efficiently, sometimes as fast as or even faster than some array methods, but it's ideally suited to cases where you'll be working with only one line at a time.
If you need to traverse the data rapidly to various locations, arrays will be hard to beat. While lists must be parsed to count return characters to find a given line, an array uses a more sophisticated hashing scheme to point to slots where the data can be found much more quickly - often by orders of magnitude if you need to jump around through the data frequently.
Arrays are able to do this for the modest cost of a one-time parsing that's done within the engine's well-optimized C++ code using the "split" command.
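To make that concrete, here's a tiny sketch of what split does (with a single delimiter, split assigns consecutive numeric keys to the chunks it finds):

```
put "alpha" & cr & "beta" & cr & "gamma" into tList
split tList by cr
-- tList is now an array:
--   tList[1] = "alpha", tList[2] = "beta", tList[3] = "gamma"
-- so any element can be fetched by key without rescanning the text:
put tList[2]
```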
That said, it occurs to me that the "split" command can sometimes be a relatively expensive operation, in that it effectively has to parse the entire chunk, looking for all returns in order to put each line into the various array elements.
This leaves the performance-obsessive scripter with a question: Would it be faster to load the array incrementally using the values already parsed in each line?
Being the benchmarking fan that I am, I figured it would be worth a couple minutes to try both methods to see how they compare:
Code: Select all
on mouseUp
   put fld 1 into tData
   --
   -- Test 1: parse up front with split
   --
   put the millisecs into t
   put empty into tNuData1
   put tData into tParsedValuesA
   split tParsedValuesA by cr
   --
   put 0 into i
   repeat for each line tLine in tData
      add 1 to i
      put tLine + tParsedValuesA[i - 100] & cr after tNuData1
   end repeat
   delete last char of tNuData1 -- trailing cr
   put the millisecs - t into t1
   --
   -- Test 2: parse on the fly
   --
   put the millisecs into t
   put empty into tNuData2
   put empty into tParsedValuesA
   --
   put 0 into i
   repeat for each line tLine in tData
      add 1 to i
      put tLine into tParsedValuesA[i]
      put tLine + tParsedValuesA[i - 100] & cr after tNuData2
   end repeat
   delete last char of tNuData2 -- trailing cr
   put the millisecs - t into t2
   --
   -- Show results:
   put "Split: " & t1 & "  On-the-fly: " & t2 & cr & \
         "Results match: " & (tNuData1 = tNuData2)
   -- The last line is just a sanity check to make sure
   -- both methods produce the same result.
end mouseUp
That test shows that even with the expense of using split, it's still faster than loading each array element individually as tData is parsed, by about 30%.
There are probably many ways to speed it up even more, and with any luck Bernd will post a stack illustrating how to do it.
But for an off-the-cuff rough draft it's not bad, parsing 10,000 lines and adding the value of every hundredth line in 43 milliseconds on my 2.6 GHz Mac.
You can probably speed it up even more, but even this rough algo is a few orders of magnitude faster than diving into the field for each operation.
