Modest sized dataset and datagrid

Paul.RaulersonBUSIvAg
Posts: 21
Joined: Tue Aug 07, 2012 2:00 pm

Modest sized dataset and datagrid

Post by Paul.RaulersonBUSIvAg » Thu Aug 16, 2012 1:40 pm

I wonder how you guys would do this? :)

A modest sized dataset, a text file of about 30,000 lines, formatted with fixed fields (no tabs).

Want to read it into a datagrid to enable fast finds, sorting by different columns, etc.

I can read the file into a variable (even pulling it from a webserver) very quickly indeed.

Once loaded into the datagrid, sorts / finds are blindingly fast.

But oh, loading that datagrid... That takes close to 2 minutes on a MacBook Pro with a 2.9 GHz i7. Huh? I repeated through the variable I read the file into, using an index variable to track which line I was working on. I used chunk expressions (char x to y) to pull each field out of each line and put it into an array, then assigned the array to the datagrid.

Besides being slow, LC was locked up during the load process. I guess that messages were locked out. So... Advice on this? Using a database on a server is not an option here, and it will eventually need to run on an iPad.

Advice, suggestions, and even laughs are quite welcome!

Paul

P.S. I can post the code later if you wish, but it is pretty much straight out of the online lessons, with just a few changes to deal with a fixed-format data file. Fields are padded by spaces in the file. I am on an iPad or I would have posted it first thing. :)

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Modest sized dataset and datagrid

Post by sturgis » Thu Aug 16, 2012 1:54 pm

Can you post a few lines of sample data in the same format as your real data, along with a quick explanation of how it needs to be split apart? Wouldn't hurt to post your processing loop either.

Paul.RaulersonBUSIvAg
Posts: 21
Joined: Tue Aug 07, 2012 2:00 pm

Re: Modest sized dataset and datagrid

Post by Paul.RaulersonBUSIvAg » Thu Aug 16, 2012 2:56 pm

I can do that -

Code: Select all

ABCDEF                                                          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                                                                            XXXXXXXXXXXXXXXX
                                                      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

                
Number      Pref   Address Line 1                     Address Line 2                    Address Line 3                 Address Line 4            Name         

9999999999  999    XX XXXXXXXX  XXXXX                 XX XXXX XX 99999-9999                                                                      XXXXXX,XXXXXXX
9999999999  999    XXXX XXXXXXXX XXXXX XXX X          XXXXXXXX XX 99999-9999                                                                     XXXXXXX,XXXXXXX
9999999999  999    XXXX XXX XXXXX                     XXXXXX XX 99999-9999                                                                       XXXXXXXX,XXXXXXX

And the code is:

Code: Select all

on mouseUp
   // Read File and Display Header
   put "" into field theHeader
   answer file "Select data file please"
   put it into theFile
   open file theFile for read
   read from file theFile until EOF
   put it into theFileData
   close file theFile
   put line 1 to 3 of theFileData into field theHeader

   local tCount
   local theDataArray
   local theLineCount
   local theCharCount
   
   put 1 into tCount
   put the number of lines in theFileData into theLineCount
   set the dgData of group Data1 to empty
   repeat with theFileIndex=8 to the number of lines in theFileData
   --repeat with theFileIndex=8 to 100
      put the number of characters in  line theFileIndex of theFileData into theCharCount
      put char 1 to 11 of line theFileIndex of theFileData into theDataArray[tCount][PNUMBER]
      put char 13 to 15 of line theFileIndex of theFileData into theDataArray[tCount][ZIPPRE]
      put char 20 to 54 of line theFileIndex of theFileData into theDataArray[tCount][ADDR1]
      put char 55 to 77 of line theFileIndex of theFileData into theDataArray[tCount][ADDR2]
      put char 89 to 119 of line theFileIndex of theFileData into theDataArray[tCount][ADDR3]
      put char 120 to 145 of line theFileIndex of theFileData into theDataArray[tCount][ADDR4]
      put char 146 to theCharCount of line theFileIndex of theFileData into theDataArray[tCount][INAME]
      add 1 to tCount
   end repeat
   set the dgData of group Data1 to theDataArray
end mouseUp
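
Since the fields are padded with spaces, each extracted value still carries that padding. A minimal sketch of one way to strip it is below. The helper name fixedField is hypothetical (it is not part of the posted code), and it assumes the column positions are the ones used in the handler above.

```livecode
## Hypothetical helper: return chars pStart to pEnd of pLine
## with the leading and trailing space padding removed.
function fixedField pLine, pStart, pEnd
   local tField
   put char pStart to pEnd of pLine into tField
   ## "word 1 to -1" keeps everything from the first word to the
   ## last word, which drops whitespace at both ends
   return word 1 to -1 of tField
end fixedField
```

It could be called as, for example, put fixedField(tSomeLine, 20, 54) into theDataArray[tCount]["ADDR1"], where tSomeLine holds one line of the file. Interior spaces (as in an address) are kept; only the padding at the ends goes away.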


sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Modest sized dataset and datagrid

Post by sturgis » Thu Aug 16, 2012 3:27 pm

EDIT: A tick is 1/60th of a sec. If the wait makes your loop too slow, you can wait in milliseconds instead, use "wait 0 with messages", or remove the wait with messages entirely.
Changes explained by ## comments in the code.

Code: Select all

on mouseUp
   // Read File and Display Header
   put "" into field theHeader
   answer file "Select data file please"
   put it into theFile
   open file theFile for read
   read from file theFile until EOF
   put it into theFileData
   close file theFile
   put line 1 to 3 of theFileData into field theHeader
   
   local tCount
   local theDataArray
   local theLineCount
   local theCharCount
   
   put 1 into tCount
   put the number of lines in theFileData into theLineCount
   set the dgData of group Data1 to empty
   
   ##Added this to track the current location in the data. If >= 8 then process the line. (used in the repeat loop) 
   put 0 into thefileindex
   
   ## Use repeat for each. If you access data using "line lineNum of container",
   ## each and every access must scan the container from the top
   ## until that line is found. So line 29000 has to bypass 28999 lines to reach its data.
   ## "repeat for each line" just proceeds through the data one line at a time
   ## and puts the contents of that line into tLine. It's HUGELY faster.
   ## You then have to keep your indexes yourself (as you did).
   ## I just added the check (untested) for >= 8 so that
   ## only the correct lines are processed.

   repeat for each line tLine in theFileData ## this will put each successive line into tLine
      wait 1 tick with messages ## Added this so that other events can occur while processing (don't know if you need this or not)
      add 1 to theFileIndex
      if theFileIndex >= 8 then     
         
         ## At this point, the data for whatever line you are working with is in tLine,
         ## so just process tLine as you were. tCount is still needed so that your array
         ## starts at 1.
         put the number of characters of tLine into theCharCount
         put char 1 to 11 of tLine into theDataArray[tCount][PNUMBER] ## <-- using tLine rather than referencing a line number in a container
         put char 13 to 15 of tLine into theDataArray[tCount][ZIPPRE]
         put char 20 to 54 of tLine into theDataArray[tCount][ADDR1]
         put char 55 to 77 of tLine into theDataArray[tCount][ADDR2]
         put char 89 to 119 of tLine into theDataArray[tCount][ADDR3]
         put char 120 to 145 of tLine into theDataArray[tCount][ADDR4]
         put char 146 to theCharCount of tLine into theDataArray[tCount][INAME]
         add 1 to tCount
      end if
   end repeat
   set the dgData of group Data1 to theDataArray
end mouseUp
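
To see the difference between the two loop styles on your own machine, a rough timing sketch like the one below may help. It is only an illustration: the handler name compareLoops and the generated test data are made up, and the numbers will vary by machine and LiveCode version.

```livecode
## Hypothetical test handler: times indexed line access against
## "repeat for each" over the same throwaway data.
on compareLoops
   local tData, tLine, tStart, tLength, i

   ## build 10,000 short lines of test data
   repeat with i = 1 to 10000
      put "line" && i & return after tData
   end repeat

   ## indexed access: every "line i of tData" scans from the top
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines in tData
      put the number of chars in line i of tData into tLength
   end repeat
   put "repeat with:" && (the milliseconds - tStart) && "ms" & return into msg

   ## repeat for each: walks through the data once
   put the milliseconds into tStart
   repeat for each line tLine in tData
      put the number of chars in tLine into tLength
   end repeat
   put "repeat for each:" && (the milliseconds - tStart) && "ms" after msg
end compareLoops
```

Both results land in the message box. The gap should widen as the line count grows, because the indexed version re-scans more and more of the data on each pass.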

Paul.RaulersonBUSrLn
Posts: 4
Joined: Tue Aug 07, 2012 2:00 pm

Re: Modest sized dataset and datagrid

Post by Paul.RaulersonBUSrLn » Thu Aug 16, 2012 8:27 pm

That sure did the trick! Now loads up in under 2 seconds on even the slowest machine. Thanks!
-Paul

P.S. If you test it or play with that, be sure to take the tCount increment out of the if statement! Took a minute or two to find that one. :)

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Modest sized dataset and datagrid

Post by sturgis » Thu Aug 16, 2012 8:31 pm

Hmm, actually I left the tCount increment inside the if so that your array will be built starting at 1, then added a separate counter (theFileIndex) to determine when the loop was far enough along to be at the actual data. However, having said that, it is possible, even likely, that I'm missing something! Missing key elements is my main strength. (Plus, I'm not sure if having the array keys start at 1 is a requirement anyway.)

Paul.RaulersonBUSIvAg
Posts: 21
Joined: Tue Aug 07, 2012 2:00 pm

Re: Modest sized dataset and datagrid

Post by Paul.RaulersonBUSIvAg » Fri Aug 17, 2012 11:56 am

No, I just wasn't clear because I was in a rush and wanted to be sure I got a thank you in there in a timely manner. I had written it up earlier, but lost that post.

Putting a wait in there with each iteration dramatically slowed down execution, so I added an if statement to only wait every 500 records or so, and update a progress bar. And of course, I left the increment in the wrong place, so essentially I got into an infinite loop. (I actually forget exactly how I did that.) A hopefully somewhat humorous description of that was in the original post I somehow managed to lose.
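
The "wait only every 500 records and update a progress bar" idea can be sketched roughly as follows. This is an assumption about the shape of the code, not the code actually used: the handler name loadWithProgress and the scrollbar name "Progress" (a scrollbar set to the progress style) are hypothetical.

```livecode
## Hypothetical handler: process the data, yielding to other
## messages only once every 500 records.
on loadWithProgress theFileData
   local tLine, tCount

   set the startValue of scrollbar "Progress" to 0
   set the endValue of scrollbar "Progress" to the number of lines in theFileData

   put 0 into tCount
   repeat for each line tLine in theFileData
      ## count every line, outside any if, so the mod test
      ## below always sees a changing value
      add 1 to tCount

      ## ... process tLine here ...

      if tCount mod 500 is 0 then
         set the thumbPosition of scrollbar "Progress" to tCount
         wait 0 milliseconds with messages
      end if
   end repeat

   set the thumbPosition of scrollbar "Progress" to tCount
end loadWithProgress
```

Keeping the counter increment unconditional at the top of the loop is one way to avoid the misplaced-increment problem described above.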

Essentially, I was wondering if there was a forced interrupt somewhere in the IDE, and if not, humorously suggesting that some way to interrupt an infinite loop might be a great enhancement... ;)

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Modest sized dataset and datagrid

Post by sturgis » Fri Aug 17, 2012 3:06 pm

You can try command-. on a Mac (I think it's ctrl-. on Windows) to break out of runaway loops. You might have to smack it repeatedly!

For some reason I've also had luck sometimes madly clicking the messages button on the toolbar to turn off messages.

I've also heard other people have had luck facerolling their keyboard until execution stops.

I just did a test with 100k loops, and a "wait 0 with messages" on every iteration is pretty fast (it added just over 1 sec on average), but without the wait the loop ran in 15 milliseconds. If you wait less often, as you have done (wait 0, every 500 loops or so), then you should be able to break out of it with the ctrl-. or command-. method. If you need other parts of your stack to function during the loop, this method (wait 0 with messages every so often) is likely best.
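
A test along these lines could look something like the sketch below. It is not the exact test that was run: the handler name timeLoop and its pYieldEvery parameter are hypothetical, and the timings you get will differ from the ones quoted.

```livecode
## Hypothetical handler: time 100,000 empty iterations, yielding
## with "wait 0 ... with messages" every pYieldEvery iterations.
## Pass 0 to never yield, 1 to yield on every iteration.
on timeLoop pYieldEvery
   local tStart, i
   put the milliseconds into tStart
   repeat with i = 1 to 100000
      if pYieldEvery > 0 then
         if i mod pYieldEvery is 0 then
            wait 0 milliseconds with messages
         end if
      end if
   end repeat
   put "yield every" && pYieldEvery & ":" && (the milliseconds - tStart) && "ms" & return after msg
end timeLoop
```

Running timeLoop 0, then timeLoop 1, then timeLoop 500 from the message box appends one result line per run, so the three cases can be compared side by side.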

During testing there is also the option of adding a line like:

Code: Select all

if the shiftKey is "down" then exit repeat

This way you only have to hold the shift key down until the next iteration and voilà, you're out of the loop. The check for the shift key added about 15 milliseconds to my execution time for 100k loops, so it's a negligible hit.

Paul.RaulersonBUSIvAg
Posts: 21
Joined: Tue Aug 07, 2012 2:00 pm

Re: Modest sized dataset and datagrid

Post by Paul.RaulersonBUSIvAg » Fri Aug 17, 2012 4:17 pm

The "If the Shiftkey is down" is really a cool testing idea, thanks!

I should have looked up wait in the Dictionary: "wait 0 ticks with messages" is a lot faster than "wait 1 tick with messages" :)

If I have not said how much I appreciate the pointers, well - I do! Thanks!
-Paul

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Modest sized dataset and datagrid

Post by sturgis » Fri Aug 17, 2012 5:06 pm

My fault with the 1 tick thing. I wasn't thinking well at the moment! (1 tick is 1/60th of a sec I think, and builds up FAST)

Glad things are coming together for you!
