array calculations slower after combine/split
Posted: Mon Apr 06, 2020 4:24 pm
by Wil
Take a look at the script below:
Code:
on mouseUp
   -- Fill an array with 500000 random numbers
   repeat with i = 1 to 500000
      put random(32000) into myArray[i]
   end repeat
   -- Time average() on the freshly built array
   put the milliSeconds into ttt
   put average(myArray) into mean1
   put the milliSeconds - ttt into time1
   -- Round-trip the array through combine/split
   combine myArray by return
   split myArray by return
   -- Time average() again on the round-tripped array
   put the milliSeconds into ttt
   put average(myArray) into mean2
   put the milliSeconds - ttt into time2
   put time1 && time2 & cr & mean1 && mean2
end mouseUp
On my system time1 is about 100 milliseconds, whereas time2 is about 150 milliseconds. The content of the array is exactly the same before and after the combine/split, which is also shown by the fact that mean1=mean2. On LC version 6 the difference is even more pronounced: 8 versus 32 milliseconds. I have not the faintest idea of the cause of this difference. Am I overlooking something? Something trivial perhaps?
Wil Dijkstra
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 5:06 pm
by LCMark
@Wil: It isn't *quite* the same...
In the first case, the values in the array are numbers already (random returns a number).
In the second case, the values in the array are strings (split doesn't know what is on each line, after all) so average has to first convert the string to a number in order to do its job.
So the difference is that the second 'average' is also doing 500000 string-to-number conversions, as well as computing the average.
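As a rough way to see the conversion cost on its own, the handler below is a minimal sketch that round-trips an array through combine/split and then times a loop that forces each element back to a number with `+ 0`. The variable names (`numArray`, `strArray`, `convArray`, `tStart`, `tConvert`) are made up for illustration, explicit variable declaration is assumed to be off as in the original script, and the measured time also includes the cost of storing into the new array, so treat it as an upper bound rather than an exact figure.

```livecode
on mouseUp
   -- Build an array of numbers, as in the original script
   repeat with i = 1 to 500000
      put random(32000) into numArray[i]
   end repeat

   -- Round-trip a copy through combine/split; its elements are now strings
   put numArray into strArray
   combine strArray by return
   split strArray by return

   -- Force each string element back to a number and time the loop
   -- (the timing also includes writing into convArray)
   put the milliseconds into tStart
   repeat for each key k in strArray
      put strArray[k] + 0 into convArray[k]
   end repeat
   put the milliseconds - tStart into tConvert

   put "conversion loop took" && tConvert && "ms"
end mouseUp
```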
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 5:20 pm
by Wil
Ah, of course. I already thought there had to be an easy explanation.
Thanks, Wil
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 5:48 pm
by Wil
Nevertheless, something mysterious remains. I added "put myArray + 0 into myArray" after the split:
Code:
on mouseUp
   -- Fill an array with 500000 random numbers
   repeat with i = 1 to 500000
      put random(32000) into myArray[i]
   end repeat
   -- Time average() on the freshly built array
   put the milliSeconds into ttt
   put average(myArray) into mean1
   put the milliSeconds - ttt into time1
   -- Round-trip the array through combine/split
   combine myArray by return
   split myArray by return
   -- Turn the string elements back into numbers
   put myArray + 0 into myArray
   -- Time average() again
   put the milliSeconds into ttt
   put average(myArray) into mean2
   put the milliSeconds - ttt into time2
   put time1 && time2 & cr & mean1 && mean2
end mouseUp
Apparently I convinced the array even more that it consists of numbers. Now time2 is less than time1: about 60 milliseconds (versus 100 for time1)! So what could now be the difference between the array before combine/split and the array after combine/split and adding zero?
Wil Dijkstra
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 6:11 pm
by LCMark
Cache locality, I think...
Average iterates over the elements of an array in hash-order (as this is the fastest way to iterate through an array and order of processing doesn't matter in this case).
The loop which constructs the values results in the values in memory being in numeric order (of i).
When you add 0 to an array (`myArray + 0`), the engine iterates through the array in hash-order (again because the order of processing doesn't matter), which means that the values in the newly formed array will be in hash-order in memory.
This means that when average processes the new array, the order of values of the elements in memory is the same as the order of processing of those values.
Put another way, the average(myArray) gives rise to lots of cache misses, average(myArray + 0) gives rise to very few cache misses.
So, it isn't the combine/split which is having this effect - it's the +0 which is (helpfully?) essentially reordering the values in memory so they match the order average wants them in.
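One way to check this explanation is to drop the combine/split entirely and apply only the `+ 0`. The handler below is a sketch of that experiment: it is the script from earlier in the thread with the combine/split lines removed and `tStart` used as a (made-up) timer variable. If the explanation holds, time2 should still come out lower than time1, but the actual numbers will depend on the machine and the engine version.

```livecode
on mouseUp
   -- Build the array exactly as before
   repeat with i = 1 to 500000
      put random(32000) into myArray[i]
   end repeat

   -- Time average() on the freshly built array
   put the milliseconds into tStart
   put average(myArray) into mean1
   put the milliseconds - tStart into time1

   -- No combine/split here: only the + 0, which builds a new array
   put myArray + 0 into myArray

   -- Time average() on the array produced by + 0
   put the milliseconds into tStart
   put average(myArray) into mean2
   put the milliseconds - tStart into time2

   put time1 && time2 & cr & mean1 && mean2
end mouseUp
```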
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 6:45 pm
by FourthWorld
LCMark wrote (Mon Apr 06, 2020 6:11 pm):
Put another way, the average(myArray) gives rise to lots of cache misses, average(myArray + 0) gives rise to very few cache misses.
I'm pre-coffee. How do we add 0 to an array?
Re: array calculations slower after combine/split
Posted: Mon Apr 06, 2020 7:03 pm
by LCMark
@FourthWorld: The arithmetic operators have always been overloaded - you can do Number op Number, Array op Number or Array op Array.
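A small sketch of the three forms, using `+` as the operator. The variable names (`tFirst`, `tSecond`, `tSum`, `tShifted`, `tTotal`) are made up for illustration, and the two arrays are deliberately given the same keys, on the assumption that Array op Array combines elements key by key.

```livecode
on mouseUp
   -- Two small arrays with the same keys
   put 1 into tFirst[1]
   put 2 into tFirst[2]
   put 10 into tSecond[1]
   put 20 into tSecond[2]

   -- Number op Number: ordinary arithmetic
   put 3 + 4 into tSum

   -- Array op Number: the number is applied to every element
   put tFirst + 100 into tShifted

   -- Array op Array: elements with matching keys are combined
   put tFirst + tSecond into tTotal

   put tSum & cr & tShifted[1] && tShifted[2] & cr & tTotal[1] && tTotal[2]
end mouseUp
```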