Garrett wrote: I have a list of about 128,000 lines. Each line has anywhere from 2 to maybe 10 items. The longest line is maybe 100 characters total. So
each line isn't really a lot of data.
I did a simple repeat loop and compared the word being searched for with
the first word of each entry. Works of course, but it's so slow...
If you're using "repeat with i = 1 to the number of lines...", we can cut that time by about an order of magnitude in one move: switch it to "repeat for each line in..." instead.
When you use this:
repeat with i = 1 to the number of lines of tData
   put line i of tData into tLine
   -- do stuff on tLine
end repeat
...each time through the loop the engine has to count lines from the top of tData until it reaches line i. So as you can surmise, each pass is slower than the last, and the total work grows with the square of the number of lines; by the time you get past 100,000 lines it's dead slow.
But if you change that to:
repeat for each line tLine in tData
   -- do stuff on tLine
end repeat
...then Rev parses tData into lines as it goes, putting the current line into tLine and remembering where it left off for the next iteration.
It can do this because of one critical assumption: tData doesn't change while the loop runs. With "repeat with i = 1 to x" the data you're parsing can be dynamic, since the engine counts through it from the top on every pass anyway. But with "repeat for each" the engine keeps a running position in tData, which only works because you're not allowed to modify tData inside the loop.
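If you want to see the size of the difference for yourself, here's a rough timing sketch using "the milliseconds" (the mouseUp handler, the dummy data, and the variable names are all invented just for the test; fair warning, the indexed pass really will take a while at this size):

on mouseUp
   -- build a dummy list roughly the size of yours (contents made up)
   repeat with i = 1 to 128000
      put "entry" && i & return after tData
   end repeat
   delete the last char of tData -- drop the trailing return

   -- indexed form: re-scans tData from the top on every pass
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of tData
      put line i of tData into tLine
   end repeat
   put "repeat with:" && (the milliseconds - tStart) && "ms" & return into msg

   -- chunked form: walks tData front to back just once
   put the milliseconds into tStart
   repeat for each line tLine in tData
      -- tLine already holds the current line
   end repeat
   put "repeat for each:" && (the milliseconds - tStart) && "ms" after msg
end mouseUp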
It sounds like your data set doesn't need to change during the search, so the "repeat for each" form should work well for you.
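For example, since you're comparing the search term against the first word of each line, the whole search might look something like this (the handler name and variable names are just placeholders for whatever you're actually using):

function findFirstWordMatches tSearch, tData
   put empty into tMatches
   repeat for each line tLine in tData
      if word 1 of tLine is tSearch then
         -- collect hits in a separate variable so tData itself
         -- is never modified while "repeat for each" is walking it
         put tLine & return after tMatches
      end if
   end repeat
   if tMatches is not empty then delete the last char of tMatches
   return tMatches
end findFirstWordMatches

Then something like "put findFirstWordMatches(tWord, tData) into tHits" gives you every matching line in a single pass, with no re-counting.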
Of course, this assumes you're not already using that form. If you are, let's take a look at the code to see if we can find other areas for optimization.