Searching a List, or do I need a better DB Engine?

Post by Garrett » Tue Apr 18, 2006 4:18 am

Using Rev 2.6.1

Greetings,

I have a list of about 128,000 lines. Each line has anywhere from 2
to maybe 10 items. The longest line is maybe 100 characters total. So
each line isn't really a lot of data.

I did a simple repeat loop and compared the word being searched for with
the first word of each entry. Works of course, but it's so slow that I was
able to take a break, get something to eat, bathroom break, play with the
dog, give the wife and kids a bad time etc..... You get the point. ;-) It
was seriously slow.

Is there a better, faster way to search a list other than what I described
above?

Or am I going to have to convert the list over to a database like SQL?

Any suggestions or ideas welcome and appreciated,
-Garrett

Re: Searching a List, or do I need a better DB Engine?

Post by FourthWorld » Tue Apr 18, 2006 6:04 am

Garrett wrote:
I have a list of about 128,000 lines. Each line has anywhere from 2 to maybe 10 items. The longest line is maybe 100 characters total. So each line isn't really a lot of data.

I did a simple repeat loop and compared the word being searched for with the first word of each entry. Works of course, but it's so slow...

If you're using "repeat with i = 1 to the number of lines..." we can definitely cut this time down by about an order of magnitude in one move, by switching to "repeat for each line in..." instead.

When you use this:

  repeat with i = 1 to the number of lines of tData
    put line i of tData into tLine
    -- do stuff on tLine
  end repeat

...each time through the loop it has to count from the first line of tData until it reaches line i. So as you can surmise, each time through the loop it's slower than the last, and by the time you get past 100,000 lines it's dead slow.

But if you change that to:

  repeat for each line tLine in tData
    -- do stuff on tLine
  end repeat

...then Rev is parsing tData into lines as it goes, putting the current line into tLine and keeping track of where it is for the next iteration.
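
To see the difference on your own data, here is a rough timing sketch. The handler name compareLoops and its variable names are made up for illustration, and the numbers you get will depend on your machine and the size of the list.

```livecode
-- Hypothetical helper: times both loop forms over the same data.
on compareLoops pData
   local tStart, tLine, tIndexed, tForEach, i

   -- Indexed form: "line i of pData" recounts lines from the top each pass.
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of pData
      put line i of pData into tLine
   end repeat
   put the milliseconds - tStart into tIndexed

   -- "repeat for each" form: walks the data once, remembering its position.
   put the milliseconds into tStart
   repeat for each line tLine in pData
      -- nothing to do here; we are only timing the traversal
   end repeat
   put the milliseconds - tStart into tForEach

   answer "indexed:" && tIndexed && "ms, for each:" && tForEach && "ms"
end compareLoops
```

You could call it with something like: compareLoops field "listRhymeDB"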

It's able to do this by making the critical assumption that tData doesn't change. With "repeat with i = 1 to x", whatever you're parsing can be dynamic, because it counts through the data from the start each time anyway. But "repeat for each" can keep track of where it is as it goes only because it doesn't allow tData to change.

It sounds like for your purposes the data set doesn't need to change, so you should be able to use the "repeat for each" form well.
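
As a minimal sketch of what that might look like for your case: the function below leaves the source data untouched and collects matches in a separate variable. The name linesStartingWith and its parameter names are my own invention, and it assumes the word you want is always the first word of a line.

```livecode
-- Hypothetical helper: returns every line of pData whose first word is pWord.
-- pData itself is never modified, which is what "repeat for each" needs.
function linesStartingWith pWord, pData
   local tMatches, tLine
   repeat for each line tLine in pData
      -- "is" ignores case unless the caseSensitive is set to true
      if word 1 of tLine is pWord then
         put tLine & return after tMatches
      end if
   end repeat
   -- drop the trailing return, if anything was found
   if tMatches is not empty then delete the last char of tMatches
   return tMatches
end linesStartingWith
```

Usage might look like: put linesStartingWith(field "fWordToSearch", field "listRhymeDB") into field "listResults"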

Of course this assumes you're not already using that form. If you are, let's take a look at the code to see if we can find other areas for optimization.

Post by Garrett » Tue Apr 18, 2006 6:59 pm

Hi Richard,

Thanks for the reply. Here's the code I have been playing with:

  put field "fWordToSearch" into varTemp
  put empty into field "listResults"
  if varTemp is not empty then
    put field "listRhymeDB" into varData
    set the wholeMatches to false
    get lineOffset("dog ",field "listRhymeDB")
    put it into varResult
    answer varResult
    --put "0" into field "fSearchCounter"
    --put number of lines of varData into varLineTotal
    --put 1 into varLineCounter
    --repeat until varLineCounter > varLinetotal
      --if word 1 of line varLineCounter of varData is varTemp then
        --put line varLineCounter of varData after field "listResults"
        --put varLineTotal into varLineCounter
      --end if
      --put 1 + varLineCounter into varLineCounter
      --put varLineCounter into field "fSearchCounter"
    --end repeat
  end if

I found "lineOffset" in the docs and decided to give that a try, but am
having issues with the "wholeMatches" property. If I set it to false, it
works, but not exact matches of course. If I set it to true, it does not
work as intended: no matches are found even if the search word is
correct and there is a match in the list.

Here's an example of a line in the list:

  DOG  D AO1 G

I may have misunderstood how "lineOffset" works, of course, but I
assumed it would find "dog" in the list.

I commented out the repeat loop when I found "lineOffset".

Is "wholeMatches" case sensitive?

Thanks,
-Garrett
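
For anyone hitting the same wall, a hedged sketch follows. As far as I understand the dictionary, setting the wholeMatches to true makes lineOffset match only when the search string equals an entire line, so "dog " can never equal "DOG  D AO1 G"; case is governed separately by the caseSensitive property, which is false by default. One possible workaround keeps wholeMatches false and checks the first word of each candidate line. The function name lineNumberOfFirstWord is hypothetical, and it assumes every entry has at least one space after the word.

```livecode
-- Hypothetical helper: returns the number of the first line of pData whose
-- first word is pWord, or 0 if there is none.
function lineNumberOfFirstWord pWord, pData
   local tSkip, tFound
   set the wholeMatches to false -- find text inside a line, not whole lines only
   put 0 into tSkip
   repeat forever
      -- the third parameter is the number of lines to skip before searching
      put lineOffset(pWord & space, pData, tSkip) into tFound
      if tFound is 0 then return 0 -- no more candidate lines
      -- the result is relative to the skipped lines, so make it absolute
      add tFound to tSkip
      -- reject lines like "HOTDOG ..." where the match is not the first word
      if word 1 of line tSkip of pData is pWord then return tSkip
   end repeat
end lineNumberOfFirstWord
```

Something like put lineNumberOfFirstWord("dog", field "listRhymeDB") would then give the line number to fetch.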

Post by Garrett » Tue Apr 18, 2006 7:10 pm

Wholly Bat Dung Batman!

Richard, thank you so much. I just gave your example a try and the
results are instantaneous!!!

Thanks a million! :-)
-Garrett
