Manipulating large text file stalling

Andycal
Posts: 144
Joined: Mon Apr 10, 2006 3:04 pm

Post by Andycal » Wed Aug 19, 2015 4:14 pm

I've got a text file with 50,000 rows in it all comma delimited. It's a list of people who have visited salons and who may have had various treatments, picked up loyalty cards etc. So, for example, it could have lines like this:

Name, Loyalty Card, Wax Treatment, EyeBrow treatment, email
Jane, 1,2,1, jane@email.com
Denise, 0,1,2, Denise@email.com
Jane, 1,1,0, jane@email.com

Now, I start with field one that has all the lines in it. I then copy each line into a variable and check whether the email address is already in field two. If not, I just pop it in field two.

If I find there's already a record, I add up all the numeric fields, then update the line. So in the example above, I'd end up with:

Jane, 2,3,1, jane@email.com
Denise, 0,1,2, Denise@email.com

As it's going through each line, I'm incrementing a counter and displaying the current line in another label. Thing is, as it's getting big, it seems to freeze, but if I leave it long enough, it does finish and then update the total.

For example, if I just do 5,000 records, it gets to 1813 and then stops. I come back ten minutes later and it's updated to 5,000 and finished.

So, a) Why does it pause output but actually finish (apparently) correctly?
b) Am I just a crap coder and this code should be waaaay more efficient?

My code is here for you to giggle at:

Code:

on mouseUp
   #put zero into field "lblProc"
   #put zero into field "lblDupes"
   repeat for each line tLine in field one
      put the value of field "lblProc" + 1 into field "lblProc"
      put item 11 of tLine into tEmail
      if tEmail is empty then
         put tLine & CR after fld "fldNoEmail"
      end if
      
      put lineoffset(tEmail, field two) into wLine
      if wLine is not 0 then
         put the value of field "lblDupes" + 1 into field "lblDupes"
         put line wLine of field two into tProc
         put line wLine of field two & CR after field "fldDupetxt"
         # Loyalty cards
         put item seven of tLine into tCardOne
         put item seven of tProc into tCardTwo
         put tCardOne + tCardTwo into tTotal
         put tTotal into item seven of tProc
         # End Loyalty cards
         # Browsin
         put item 9 of tLine into tBrowOne
         put item 9 of tProc into tBrowTwo
         put tBrowOne + tBrowTwo into tBrowTotal
         put tBrowTotal into item 9 of tProc
         # end Browsin
         # Back to wow
         put item 13 of tLine into tBackOne
         put item 13 of tProc into tBackTwo
         put tBackOne + tBackTwo into tBackTotal
         put tBackTotal into item 13 of tProc
         # end back to wow
         # Eyebrow Tint
         put item 14 of tLine into tTintOne
         put item 14 of tProc into tTintTwo
         put tTintOne + tTintTwo into tTintTotal
         put tTintTotal into item 14 of tProc
         # end eyebrow Tint
         # EyeLash Tint
         put item 15 of tLine into tLashOne
         put item 15 of tProc into tLashTwo
         put tLashOne + tLashTwo into tLashTotal
         put tLashTotal into item 15 of tProc
         # end eyelash tint
         # FLC
         put item 16 of tLine into tFLCOne
         put item 16 of tProc into tFLCTwo
         put tFLCOne + tFLCTwo into tFLCTotal
         put tFLCTotal into item 16 of tProc
         # end FLC
         # EyeDo
         put item 18 of tLine into tEyeOne
         put item 18 of tProc into tEyeTwo
         put tEyeOne + tEyeTwo into tEyeTotal
         put tEyeTotal into item 18 of tProc
         # end eyedo
         # WaxLash
         put item 22 of tLine into tWLashOne
         put item 22 of tProc into tWLashTwo
         put tWLashOne + tWLashTwo into tWLashTotal
         put tWLashTotal into item 22 of tProc
         # end WaxLash
         # wax Treat
         put item 23 of tLine into tWaxTreatOne
         put item 23 of tProc into tWaxTreatTwo
         put tWaxTreatOne + tWaxTreatTwo into tWaxTreatTotal
         put tWaxTreatTotal into item 23 of tProc
         # end wax treat
         # Wow Brow
         put item 24 of tLine into tWowOne
         put item 24 of tProc into tWowTwo
         put tWowOne + tWowTwo into tWowTotal
         put tWowTotal into item 24 of tProc
         # end Wow Brow
         
         put tProc into line wLine of field two    
         
         
      else put tLine & CR after field two
      
   end repeat
end mouseUp

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Manipulating large text file stalling

Post by Klaus » Wed Aug 19, 2015 4:31 pm

Hi Andy,

do not access FIELDS in a repeat loop if you want SPEED! 8)
Put everything into variables and fill/update the appropriate fields AFTER the loop "en bloc".

If you really need to update a field (to show progress), do not do it on every iteration;
update it only every 10th iteration or so.
You can do this with a custom counter inside "repeat for each...".

Is two the name of your field, or do you mean field 2?
If the former, please put QUOTES around the name; if the latter, use the number, which is more readable.


Best

Klaus

Post by Andycal » Wed Aug 19, 2015 4:39 pm

Ah, OK, variables then, I get you. I just noticed by the way that the counter stops when I take focus away from the stack to go do something else.

Field names: Yes, they're numbers. I'm a writer by trade so I automatically write numbers under 20 verbose! Bad habit for coding, I know!

So using a counter, I guess I'd do something like:

tCounter = 0

Repeat for each item tLine in tSource

...do all the gubbins

put tLine & CR after tDestination

tCounter = tCounter +1
if tCounter = 50 then
put tDestination into field two
put empty into tCounter

I guess. It's currently running, I'll give it a go in a tick.

Post by Klaus » Wed Aug 19, 2015 4:47 pm

Hi Andy,
tCounter = tCounter +1
will throw an error, this is not C of any kind :D
Do this:
...
add 1 to tCounter
...
And maybe use MOD, so you don't have to reset the counter every time:
...
if tCounter mod 50 = 0 then
...


Best

Klaus

Post by Andycal » Wed Aug 19, 2015 5:03 pm

Oh yeah, that's quicker, much quicker.

Cheers buddy!

Post by Klaus » Wed Aug 19, 2015 5:09 pm

Grrrrrrrreat! :D

FourthWorld
VIP Livecode Opensource Backer
Posts: 10043
Joined: Sat Apr 08, 2006 7:05 am

Post by FourthWorld » Wed Aug 19, 2015 6:17 pm

Related discussion with interesting tips for optimizing a similar task:
http://forums.livecode.com/viewtopic.php?f=8&t=24945
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post by Andycal » Wed Aug 19, 2015 6:34 pm

Interesting article. I'm using everything there except lock screen.

It's a lot faster now that I'm using variables. I update the counter every 500 lines because I'm processing the full 50K. However, it still freezes after a while: the window says "not responding" and only updates when it's finished.

It's running now, I expect it'll take about 30 mins. I might cancel it and try lock screen, see if that changes anything.
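
A note on the "not responding" symptom: it is typical of a handler that runs a long loop without yielding to the operating system's event queue. The work carries on, but the window is not redrawn until the handler ends. Below is a minimal sketch of one common remedy, which is to yield briefly with "wait ... with messages" whenever the progress label is updated. It assumes the field names used earlier in this thread; tSource and tCounter are illustrative variable names, and the per-line processing is left as a comment.

```livecode
on mouseUp
   local tSource, tCounter
   -- Assumption: this script lives in the button, so "me" is the button.
   -- Disabling it stops a second click from re-entering the handler
   -- while messages are being processed during the wait.
   disable me
   put field 1 into tSource
   put 0 into tCounter
   repeat for each line tLine in tSource
      add 1 to tCounter
      -- ... process tLine here, working only on variables ...
      if tCounter mod 500 = 0 then
         put tCounter into field "lblProc"
         -- Yield so pending events and the redraw can be handled
         wait 0 milliseconds with messages
      end if
   end repeat
   -- Show the final count once the loop has finished
   put tCounter into field "lblProc"
   enable me
end mouseUp
```

Note that a locked screen would hide the label update, so this sketch does not lock the screen around it.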

Post by Andycal » Wed Aug 19, 2015 7:31 pm

Well, lockscreen sped it up some more, however it still comes up as not responding and appears to have crashed when I move focus to another window, which is odd.

There are actually 56,000 rows in there, I guess whatever I do it's going to be slow as the routine that checks for duplicates is going to be going through an ever increasing sample size. So I wonder if there's another way.

Maybe de-dupe first, then just process the dupes? I'm convinced there's a better way to do this. It'll probably come to me when I'm on my third pint tonight...
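
One possible "better way", sketched under assumptions: key an array on the email address, so that each lookup no longer scans the ever-growing result text with lineOffset. The function name mergeByEmail and its variable names are hypothetical; the item numbers are the ones summed in the code posted in this thread. In this sketch rows without an email are simply skipped, and the order of the output rows is not guaranteed.

```livecode
-- Hypothetical helper: merges rows that share the same email (item 11)
function mergeByEmail pSource
   local tMerged, tEmail, tRow, tResult
   repeat for each line tLine in pSource
      put item 11 of tLine into tEmail
      -- Rows with no email are skipped here; collect them separately if needed
      if tEmail is empty then next repeat
      if tEmail is among the keys of tMerged then
         -- Seen before: add the numeric columns to the stored row
         put tMerged[tEmail] into tRow
         repeat for each item tCol in "7,9,13,14,15,16,18,22,23,24"
            add item tCol of tLine to item tCol of tRow
         end repeat
         put tRow into tMerged[tEmail]
      else
         -- First time this email appears: store the row as-is
         put tLine into tMerged[tEmail]
      end if
   end repeat
   -- Flatten the array back into return-delimited text (order not guaranteed)
   repeat for each element tRow in tMerged
      put tRow & cr after tResult
   end repeat
   delete the last char of tResult
   return tResult
end mergeByEmail
```

It could be called once, outside any loop, for example with: put mergeByEmail(field 1) into field 2. Unlike lineOffset, which finds the email anywhere in a line, the array lookup matches the whole email value.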

Post by FourthWorld » Wed Aug 19, 2015 8:00 pm

Unless those rows are very long, 50,000 of them isn't much data to traverse. We should be able to get speed down to well under a second, possibly less than 1/10 of a second. In the thread I linked to above the community pulled together and brought a routine that had taken 9 minutes down to about 3 milliseconds. I'm sure we can do the same for yours, provided we have both the current code and the data. The latter is especially important because sometimes the optimal method may vary depending on characteristics of the data being used. If the data contains sensitive information, taking the time to generate dummy values with the same length and structure will be equally valuable.

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Post by AxWald » Wed Aug 19, 2015 9:23 pm

Hi,
Andycal wrote: I've got a text file with 50,000 rows in it all comma delimited. It's a list of people who have visited salons and who may have had various treatments, picked up loyalty cards etc.
Cannot help but thinking that this is something where a database would be the most suitable tool. SQLite, for instance, may make your life a lot more comfortable.

Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

Post by Andycal » Thu Aug 20, 2015 8:32 am

AxWald wrote: Hi,

Cannot help but thinking that this is something where a database would be the most suitable tool. SQLite, for instance, may make your life a lot more comfortable.
That's not a bad idea actually! Anyway, just generating lots of lovely dummy data to test it with.

Post by Andycal » Thu Aug 20, 2015 11:24 am

Right, finally generated some dummy data with dupes. Taken hours, but helped me with the debugging process a lot. I discovered I hadn't converted all my fields to variables.

Anyway, the data dump is here : https://www.dropbox.com/s/83eoycq83hjw3 ... a.csv?dl=0

That's generated with some online generator, de-duped and then I forced some duplicates in.

The latest code doing the work is here:

Code:

on mouseUp
   put zero into field "lblProc"
   put zero into field "lblDupes"
   
   # transfer everything to vars
   
   put field 1 into tSource
   
   put empty into tDestination 
   
   put fld "lblProc" into tCounter
   
   Put empty into tCounter
   
   set the lockscreen to true
   
   repeat for each line tLine in tSource
      put tCounter+1 into tCounter
      
      if tCounter Mod 50 = 0 then
         set the lockscreen to false
         put tDestination into field 2
         put tCounter into fld "lblProc"
         set the lockscreen to true
      end if
   
      
      #put the value of field "lblProc" + 1 into field "lblProc"
      put item 11 of tLine into tEmail
      
      # If no email, store it in another field
      
      if tEmail is empty then
         put tLine & CR after fld "fldNoEmail"
      end if
                 
      # put tLine into fld "lblCurrent"
      
            
      put lineoffset(tEmail, tDestination) into wLine
      
      if wLine is not 0 then
         put the value of field "lblDupes" + 1 into field "lblDupes"
         put line wLine of tDestination into tProc
         set the lockscreen to false
         put line wLine of tDestination & CR after field "fldDupetxt"
         set the lockscreen to true
         # Loyalty cards
         put item seven of tLine into tCardOne
         put item seven of tProc into tCardTwo
         put tCardOne + tCardTwo into tTotal
         put tTotal into item seven of tProc
         # End Loyalty cards
         # Browsin
         put item 9 of tLine into tBrowOne
         put item 9 of tProc into tBrowTwo
         put tBrowOne + tBrowTwo into tBrowTotal
         put tBrowTotal into item 9 of tProc
         # end Browsin
         # Back to wow
         put item 13 of tLine into tBackOne
         put item 13 of tProc into tBackTwo
         put tBackOne + tBackTwo into tBackTotal
         put tBackTotal into item 13 of tProc
         # end back to wow
         # Eyebrow Tint
         put item 14 of tLine into tTintOne
         put item 14 of tProc into tTintTwo
         put tTintOne + tTintTwo into tTintTotal
         put tTintTotal into item 14 of tProc
         # end eyebrow Tint
         # EyeLash Tint
         put item 15 of tLine into tLashOne
         put item 15 of tProc into tLashTwo
         put tLashOne + tLashTwo into tLashTotal
         put tLashTotal into item 15 of tProc
         # end eyelash tint
         # FLC
         put item 16 of tLine into tFLCOne
         put item 16 of tProc into tFLCTwo
         put tFLCOne + tFLCTwo into tFLCTotal
         put tFLCTotal into item 16 of tProc
         # end FLC
         # EyeDo
         put item 18 of tLine into tEyeOne
         put item 18 of tProc into tEyeTwo
         put tEyeOne + tEyeTwo into tEyeTotal
         put tEyeTotal into item 18 of tProc
         # end eyedo
         # WaxLash
         put item 22 of tLine into tWLashOne
         put item 22 of tProc into tWLashTwo
         put tWLashOne + tWLashTwo into tWLashTotal
         put tWLashTotal into item 22 of tProc
         # end WaxLash
         # wax Treat
         put item 23 of tLine into tWaxTreatOne
         put item 23 of tProc into tWaxTreatTwo
         put tWaxTreatOne + tWaxTreatTwo into tWaxTreatTotal
         put tWaxTreatTotal into item 23 of tProc
         # end wax treat
         # Wow Brow
         put item 24 of tLine into tWowOne
         put item 24 of tProc into tWowTwo
         put tWowOne + tWowTwo into tWowTotal
         put tWowTotal into item 24 of tProc
         # end Wow Brow
         
         put tProc into line wLine of tDestination   
         
         
      else put tLine & CR after tDestination
      
      
      
      
   end repeat
   set the lockscreen to false
   put tDestination into field 2
end mouseUp
This is proving to be a very good debugging lesson, thanks y'all!

Post by Klaus » Thu Aug 20, 2015 12:52 pm

Hi Andy,

you are still accessing several fields inside of your repeat loop,
and I do not mean the field where you show your progress! 8)

Best

Klaus

Post by Andycal » Thu Aug 20, 2015 12:59 pm

Yes, you're right, however they're actually very rarely used in the data.

For example, in the whole dataset there are probably only 100 or so with no email, so that little bit of the routine wouldn't get called very often.

Also, there are actually only about 1000 dupes in my data, so again, that routine will very rarely be called.

Would they still cause overhead even if not used?
