Page 1 of 1

speed of is among

Posted: Sun Dec 08, 2013 2:33 am
by mickclns
I have a variable that contains about 80,000 words (whole numbers in ascending order). I assume that some sort of hash is involved. I am currently using it like the following:

if 874693 is among the words of varContainingNumbersInOrder then ..

It seems like it is pretty slow on average. Would it be faster (since the numbers are in order) to write my own similar function (say, using binary search using the word indices)? Any other suggestions?
Thanks,
- Mick

Re: speed of is among

Posted: Sun Dec 08, 2013 2:48 am
by Simon
Hi Mick,
Welcome to the forums:)

I'm not too sure on speed but have you tried "lineOffset" or "filter"?

Simon

Re: speed of is among

Posted: Sun Dec 08, 2013 2:58 am
by FourthWorld
There may be a few strategies we might consider, such as trading off the one-time hit of using the split command for the blinding fast performance of lookups in arrays.

But there may be others as well. Are the numbers padded so that their all of equal length? What else is in the data besides the numbers?

Re: speed of is among

Posted: Sun Dec 08, 2013 3:57 am
by mickclns
Hi, Simon, I've never used lineOffset or filter.
The words are in a single line, so, if my reading of the dictionary is correct neither is applicable as the variable stands now. Perhaps wordOffset? Suggestions how to change the structure of the variable so either lineOffset or filter could be used for speedup?

Hi, Richard,
The numbers are not padded and the data is all numbers (the variable contains all the prime numbers up to 1,000,000). What do you see as the speedup if they were padded?

I keep the numbers in a field and load them into a global in a preopenstack handler. I assume the time for a split at that time would not be excessive. However, I am not sure what the keys would be. Perhaps, I'd have to repackage/reprocess the field so that the keys would be the same as the indices for the words? Or is there an option for split so that for a field/var that is ordered numerically (or some other way) without keys, the keys are automatically generated?

Thanks for the help, guys,
- Mick

Re: speed of is among

Posted: Sun Dec 08, 2013 4:06 am
by Simon
Hi Mick,
Yeah, lineOffset and filter will not work on a single line.
How are you delimiting each number?

Simon

Re: speed of is among

Posted: Sun Dec 08, 2013 4:08 am
by mickclns
Oops, sorry Richard, I didn't see before that the dictionary answered my question about a variable without specified keys.

Re: speed of is among

Posted: Sun Dec 08, 2013 4:12 am
by mickclns
Simon, at present they are simply words delimited by spaces, sometimes with several spaces or returns for readability when browsing with my eyes.

Re: speed of is among

Posted: Sun Dec 08, 2013 6:33 am
by FourthWorld
If all the data is just numbers and you already have the number you're looking for, are you just checking to see if a given number is in the set?

If so, and if the delimiters around the numbers are at all consistent, something like this using the offset function may be a little faster because it doesn't have to account for the concept of lines per se, it's only looking for a byte sequence:

Code: Select all

if offset( cr& tNumToFind &cr, cr& tWholeCollection &cr) > 0 then
   put "Got it!"
else
   put "Not found."
end if
If you need something more please provide a link to the data set if possible so we have something concrete to chew on.

Re: speed of is among

Posted: Sun Dec 08, 2013 9:40 am
by mickclns
Thanks, Richard, I'll try that, and do some timing and will let you know (here) which seems to be the best (for this particular set -- I'll have to check the code which creates the set to see how I decided when to add a return instead of a simple space (I initially wrote this years ago just after my transition (from Hypercard) to Rev/LC) ).

The main place I use it is in a function isPrime(n) where it first checks if n <= 1,000,000, if so, it checks whether it is in the list of primes and returns true if it is and false if it isn't, and (much less often) if n > 1,000,000 it checks whether it is divisible by one of the primes in the list, and in that way can check the primality of a number up to 1,000,000,000,000 (takes a lot longer than Mathematica but doesn't use any ethereal techniques and is easy to modify and more fun). In another context, it finds the prime factorization of a number, breaking it down (when possible) by dividing by the smallest primes first.

Thanks again,
- Mick

Re: speed of is among

Posted: Sun Dec 08, 2013 12:25 pm
by [-hh]
..........

Re: speed of is among

Posted: Sun Dec 08, 2013 5:49 pm
by dunbarx
Hermann.

LiveCode being the first?

Craig

Re: speed of is among

Posted: Sun Dec 08, 2013 7:21 pm
by [-hh]
..........

Re: speed of is among

Posted: Sun Dec 08, 2013 11:51 pm
by mickclns
|\_/||||//_\|/\|//____/|\_/|\_//|||//_\|/_\|/____//|\_|\_//|
|\_//|||//_\|_\|//____//|\/|\_//|||//_\|/_||/____//|\_/\_//|
|\_//|||/__\/_\|/_____//|\_|\_//|||//_\|/\|//_____/|\_/|\_//
|\__/////_\|/_\|/______/|\_/|\_/////__\/_\|/___\__/|\_/|\_//
/|\_/////_\|/_\|/_\\\__/|\_/|\_/////_\|/_\|/__\\\_/|\_/|\_//
/|\__///__\|/\|/__\\\\__/|_/|\__///__\|/_\//_\\\\__/|\_|\\_/
/|\\_____\|/_\|/_\\||\\_/|\_/|\_____\|/_\|/_\\||\\_/|\_/|\__
_/|\____\\|/_\/_\\||||\_/|\_/|\\___\\|/_\|/_\||||\\_/|_/|\\_
_/||\\\\\|/_\|/_\||/|||\_/|\_/|\\\\\||/\|/_\|||/||\_/|\_/|\\
\_/||\\\||/_\/_\||////|\\/|\_/||\\\||/_\|/_\|////||\_/\_/||\
|_//|||||/_\|/_\|//__//|\_/|\_/|||||/_\|/_\|//__//|\_/|\_/||
|\_//|||/__\/_\|/_____//|\_|\_//|||//_\|/\|//_____/|\_/|\_//
/|\_////__\|/_|//_\\\\_/|\_/|\__////_\|/_\|/_\\\\__/|_/|\__/
_|\\_____\|/_\|/_\\||\\_/|\_/|\_____\|/_\|/_\\||\\_/|\_/|\__
_/|\\__\\|/_\|/_\|||||\\_/|_/||\___\\|/_|/_\\|||||\_/|\_/|\\
\_/||\\\||/_\/_\||////|\\/|\_/||\\\||/_\|/_\|////||\_/\_/||\
|\_/|||||/_\|/_\|/___//|\_/|\_/|||||/_\|/_\|//___/|\_/|\_/||
|\__/////_\|/_\|/__\___/|\_/|\_/////__\/_\|/___\__/|\_/|\_//