Manipulating large text file stalling


Klaus

Re: Manipulating large text file stalling

Post by Klaus » Thu Aug 20, 2015 1:11 pm

They ARE being used and accessed in your repeat loop! :D

Andycal

Post by Andycal » Thu Aug 20, 2015 1:15 pm

Hmmmmmm, I'm going to have to take a look on a bigger screen, I must be code blind!

SparkOut

Post by SparkOut » Thu Aug 20, 2015 3:43 pm

Make sure that within the loop there is a line
wait 0 milliseconds with messages

Without it, the tight loop never gives the engine a chance to process events, which is why you are getting the "not responding" issue. Adding this line will not optimise or speed up the processing, but it will stop the user thinking the program has crashed.
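
For illustration, here is a minimal sketch of where that line could sit. It assumes a `repeat for each line` loop like the one in the script under discussion; the variable names and the interval of 1000 lines are arbitrary choices for the example, not measured values.

```livecode
on mouseUp
   local tSource, tCounter
   -- field 1 is assumed to hold the source text, as in the script under discussion
   put field 1 into tSource
   put 0 into tCounter
   repeat for each line tLine in tSource
      add 1 to tCounter
      -- ... per-line processing goes here ...
      if tCounter mod 1000 is 0 then
         -- let the engine handle pending events (redraws, clicks) so the
         -- OS does not flag the app as "not responding"
         wait 0 milliseconds with messages
      end if
   end repeat
end mouseUp
```

Because messages are handled during the wait, the user could click the button again while the loop is still running, so disabling the button for the duration may be worth considering.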

FourthWorld

Post by FourthWorld » Thu Aug 20, 2015 4:26 pm

Andycal wrote: Right, finally generated some dummy data with dupes. Taken hours, but helped me with the debugging process a lot. I discovered I hadn't converted all my fields to variables.

Anyway, the data dump is here : https://www.dropbox.com/s/83eoycq83hjw3 ... a.csv?dl=0

That's generated with some online generator, de-duped and then I forced some duplicates in.

The latest code doing the work is here:

Code: Select all

on mouseUp
   put zero into field "lblProc"
   put zero into field "lblDupes"
   
   # transfer everything to vars
   
   put field 1 into tSource
   
   put empty into tDestination 
   
   put fld "lblProc" into tCounter
   
   Put empty into tCounter
   
   set the lockscreen to true
   
   repeat for each line tLine in tSource
      put tCounter+1 into tCounter
      
      if tCounter Mod 50 = 0 then 
         set the lockscreen to false
         put tDestination into field 2
         put tCounter into fld "lblProc"
         set the lockscreen to true
      end if
   
      
      #put the value of field "lblProc" + 1 into field "lblProc"
      put item 11 of tLine into tEmail
      
      # If no email, store it in another field
      
      if tEmail is empty then
         put tLine & CR after fld "fldNoEmail"
      end if
                 
      # put tLine into fld "lblCurrent"
      
            
      put lineoffset(tEmail, tDestination) into wLine
      
      if wLine is not 0 then
         put the value of field "lblDupes" + 1 into field "lblDupes"
         put line wLine of tDestination into tProc
         set the lockscreen to false
         put line wLine of tDestination & CR after field "fldDupetxt"
         set the lockscreen to true
         # Loyalty cards
         put item seven of tLine into tCardOne
         put item seven of tProc into tCardTwo
         put tCardOne + tCardTwo into tTotal
         put tTotal into item seven of tProc
         # End Loyalty cards
         # Browsin
         put item 9 of tLine into tBrowOne
         put item 9 of tProc into tBrowTwo
         put tBrowOne + tBrowTwo into tBrowTotal
         put tBrowTotal into item 9 of tProc
         # end Browsin
         # Back to wow
         put item 13 of tLine into tBackOne
         put item 13 of tProc into tBackTwo
         put tBackOne + tBackTwo into tBackTotal
         put tBackTotal into item 13 of tProc
         # end back to wow
         # Eyebrow Tint
         put item 14 of tLine into tTintOne
         put item 14 of tProc into tTintTwo
         put tTintOne + tTintTwo into tTintTotal
         put tTintTotal into item 14 of tProc
         # end eyebrow Tint
         # EyeLash Tint
         put item 15 of tLine into tLashOne
         put item 15 of tProc into tLashTwo
         put tLashOne + tLashTwo into tLashTotal
         put tLashTotal into item 15 of tProc
         # end eyelash tint
         # FLC
         put item 16 of tLine into tFLCOne
         put item 16 of tProc into tFLCTwo
         put tFLCOne + tFLCTwo into tFLCTotal
         put tFLCTotal into item 16 of tProc
         # end FLC
         # EyeDo
         put item 18 of tLine into tEyeOne
         put item 18 of tProc into tEyeTwo
         put tEyeOne + tEyeTwo into tEyeTotal
         put tEyeTotal into item 18 of tProc
         # end eyedo
         # WaxLash
         put item 22 of tLine into tWLashOne
         put item 22 of tProc into tWLashTwo
         put tWLashOne + tWLashTwo into tWLashTotal
         put tWLashTotal into item 22 of tProc
         # end WaxLash
         # wax Treat
         put item 23 of tLine into tWaxTreatOne
         put item 23 of tProc into tWaxTreatTwo
         put tWaxTreatOne + tWaxTreatTwo into tWaxTreatTotal
         put tWaxTreatTotal into item 23 of tProc
         # end wax treat
         # Wow Brow
         put item 24 of tLine into tWowOne
         put item 24 of tProc into tWowTwo
         put tWowOne + tWowTwo into tWowTotal
         put tWowTotal into item 24 of tProc
         # end Wow Brow
         
         put tProc into line wLine of tDestination   
         
         
      else put tLine & CR after tDestination
      
      
      
      
   end repeat
   set the lockscreen to false
   put tDestination into field 2
end mouseUp
This is proving to be a very good debugging lesson, thanks y'all!

Your script requires a lot of fields with specific names. I'm happy to help optimize it, but rather than going through the script to find all the field references and create those fields from scratch, it would be very helpful if you could post a stack that already has the required objects in it.

Thanks -
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Andycal

Post by Andycal » Thu Aug 20, 2015 4:49 pm

I think it may be solved!

Thierry dropped me a PM and took a look, and he only blummin' well got it down to 30 seconds!!!

Rather than me just pile in here with the source, I'm going to look at it, digest it and understand it. It's also highlighted some other areas where I need to be altering my data handling. I'll report back with a fully optimised and working script.

This really has been a fantastic exercise and I thank you all, especially Thierry who provided that pivotal "Ah-ha!" moment!

FourthWorld

Post by FourthWorld » Thu Aug 20, 2015 4:56 pm

You're in good hands. Thierry's been a consistently generous contributor to these forums, and does excellent work.
Richard Gaskin

Andycal

Post by Andycal » Thu Aug 20, 2015 5:30 pm

Agree with that. Here's the bit of genius that I really liked:

Code: Select all

repeat for each item N in "7,9,13,14,15,16,18,23,24"
   put (item N of tLine) + (item N of tProc) into item N of tProc
end repeat
So instead of my clunky checking every little bit, do it in one swoop.
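
To make the effect of that loop concrete, here is a small worked example on made-up data. The two records and the item numbers 2, 3 and 4 are hypothetical; only the loop body is taken from the snippet above.

```livecode
on mouseUp
   local tProc, tLine
   -- two made-up records for the same customer: email plus three counters
   put "ann@example.com,2,0,5" into tProc   -- record already stored
   put "ann@example.com,1,3,4" into tLine   -- duplicate just read
   -- sum each counter of the duplicate into the stored record
   repeat for each item N in "2,3,4"
      put (item N of tLine) + (item N of tProc) into item N of tProc
   end repeat
   -- tProc is now "ann@example.com,3,3,9"; show it in the message box
   put tProc
end mouseUp
```

One thing to check against the real data: the hand-written version earlier in the thread also summed item 22, which does not appear in the item list quoted here. The thread does not say whether that omission is intentional.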

There's a lot of arrays going on as well. I never got on with those, so I'm having fun working out what they're doing.
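
As a rough sketch of the array idea (a minimal example of my own, not Thierry's stack): a LiveCode array can be keyed by any string, so the email itself can serve as the key, and finding an earlier record no longer needs `lineOffset` to scan the accumulated text. The data, the 3-item record layout and the variable names are all made up.

```livecode
on mouseUp
   local tSource, tSeen, tDupes
   -- hypothetical 3-item records: name, email, visits
   put "Ann,ann@example.com,2" & cr into tSource
   put "Bob,bob@example.com,1" & cr after tSource
   put "Ann,ann@example.com,5" after tSource
   
   put 0 into tDupes
   repeat for each line tLine in tSource
      put item 2 of tLine into tEmail
      if tSeen[tEmail] is empty then
         -- first time this email turns up: remember the whole record
         put tLine into tSeen[tEmail]
      else
         -- seen before: add the visit counts into the stored record
         put (item 3 of tLine) + (item 3 of tSeen[tEmail]) into item 3 of tSeen[tEmail]
         add 1 to tDupes
      end if
   end repeat
   
   -- report how many duplicates were merged and how many emails remain
   put tDupes && "duplicate(s) merged," && the number of lines of (the keys of tSeen) && "unique email(s)"
end mouseUp
```

With this sample data the record stored under ann@example.com ends up with 7 visits, and Bob's record is left untouched.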

FourthWorld

Post by FourthWorld » Thu Aug 20, 2015 6:10 pm

Arrays are well worth learning in any language, and in LiveCode they have so many uses you'll find the time an excellent investment.
Richard Gaskin

bn

Post by bn » Tue Aug 25, 2015 7:29 pm

Hi Andy,

What do you want to do with Arsenio Hampton? He sneaks into your sample data 6 times.

He lives in lines
18727,18735,18736,18737,18738,18739 of your sample data (not counting the header line).

ipsum.porta.elit@mollis.co.uk Arsenio,Hampton

Does your real data contain emails that occur more than twice? Or should I delete those from my copy of your sample data?

Anyway, with Arsenio Hampton reduced to 2 occurrences, the code below needs about a second of computation for 49777 lines of data with 20 duplicates. Loading the fields with the result takes about an additional 9 seconds.

All times using LC 7.1 RC1 on a MacBook Pro Intel i5 2.53 GHz.

I used Thierry's code for updating the items.

Code: Select all

on mouseUp
   put the milliseconds into t
   lock screen
   put field "source" into tSource
   put "7,9,13,14,15,16,18,23,24" into tItems
   
   split tSource by return
   repeat for each key aKey in tSource
      put item 11 of tSource[aKey] into tEmail
      put aKey & comma after tEmailArray[tEmail]["theLines"]
   end repeat
   
   repeat for each key aKey in tEmailArray
      put tEmailArray[aKey]["theLines"] into tLines
      delete last char of tLines -- a comma
      if the number of items of tLines > 1 then
         sort items of tLines ascending numeric
         repeat for each item N in tItems
            put item N of tSource[item 1 of tLines] + item N of tSource[item 2 of tLines] into item N of tSource[item 2 of tLines]
         end repeat
         put tSource[item 1 of tLines] & cr after tCollectDupes
         add 1 to tCountDupes
         put tSource[item 2 of tLines] & cr after tCollectDest
         add 1 to tCountProc
      else
         put tSource[item 1 of tLines] & cr after tCollectDest
         add 1 to tCountProc
      end if
   end repeat
   put "processing ms " & the milliseconds - t 
   put the milliseconds into t
   put tCountDupes into field "lblDupes"
   put tCountProc into field "lblProc"
   put tCollectDupes into field "fldDupeTxt"
   put tCollectDest into field "Destination"
   put " loading fields ms " & the milliseconds - t after msg
   unlock screen
end mouseUp
Kind regards
Bernd

Thierry

Post by Thierry » Thu Sep 03, 2015 6:50 am

Hi,

Finally got some time to clean up the script a bit.
Here are the timings on my mini-mac:
[Attached image: sunnY 2015-09-03.png (timings)]

Code: Select all


-- change the filenames to your need
local filenameOutput = "/Users/U/Desktop/dummydataOut.txt"
local filenameInput = "/Users/U/Desktop/dummydata.csv"

-- only for debugging (when it works, drop it + line where the test is )
local Maxi4testing = 1001

-- vars shared between handlers
local fldNoEmail, lblDupes, fldDupetxt, lblProc
local destination, Timers, N


on mouseUp
   put zero into field "lblProc"
   put zero into field "lblDupes"
   put empty into fld 1
   put empty into field 2
   put empty into destination
   put empty into fldNoEmail
   put empty into lblDupes
   put empty into fldDupetxt
   put empty into lblProc
   
   startChrono 1
   
   put URL  ("file:" & filenameInput )  into tSource
   
   startChrono 2
   
   processCSVfile tSource
   
   -- refresh our view
   put fldNoEmail into  fld "fldNoEmail"
   put lblDupes  into field "lblDupes"
   put fldDupetxt into field "fldDupetxt"
   put lblProc into fld "lblProc"
   put N into fld "lblProc"
   
   startChrono 3
   
   -- Transform our array back to text ( one line per value)
   combine destination by return
   
   startChrono 4
   
   -- well, updating the field text takes 10 seconds!!!!
   -- put ArrayDestination into field 2
   -- instead, save it as a text file:
   put destination into URL ( "file:" & filenameOutput)
   put "Check results in file: " & filenameOutput into field 2
   
   startChrono 5
   
   showTimers Timers
end mouseUp



local kMail = 11

on processCSVfile @CSVtext
   put 0 into N
   repeat for each line aLine in CSVtext
      add 1 to N
      -- drop it when script works:
      -- if N > Maxi4testing then exit repeat
      
      -- view progress: ( comment all only for speed measurement)
      --      if N Mod 1000 = 0 then
      --         put N into fld "lblProc"
      --         -- give the engine some time to breathe:
      --         wait 1 milliseconds with messages
      --      end if
      
      put item kMail of aLine into tEmail
      if tEmail is empty then  -- If no email, store it apart
         put aLine & cr after fldNoEmail
         next repeat
      end if
      
      if destination[ tEmail] is empty then
         put aLine into destination[ tEmail]
         next repeat
      end if
      
      add 1 to  lblDupes
      put destination[ tEmail] into tProc
      put tProc & cr after fldDupetxt
      
      repeat  for each item X in "7,9,13,14,15,16,18,23,24"
         add  (item X of aLine) to  item X of tProc
      end repeat
      
      put tProc into destination[ tEmail]
      
   end repeat
end processCSVfile


on showTimers T
   local s
   put "loading CSV file:" &tab&  T[ 2] - T[ 1] &cr into s 
   put "repeat loop:" &tab& T[ 3] - T[ 2] &cr after s
   put "combine:" &tab&  T[ 4] - T[ 3] &cr after s 
   put "fill up field:" &tab& T[ 5] - T[ 4] &cr after s 
   put "whole script:" &tab& T[ 5] - T[ 1] &cr after s
   answer s
end showTimers

on startChrono n
   put the milliseconds into Timers[ n]
end startChrono

Andy, I'll send you the whole stack.

Kind regards,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

Thierry

Post by Thierry » Sat Sep 05, 2015 10:19 am

Hi,

After a few interesting exchanges with Bernd,
here is a way to save some CPU cycles:

So, instead of:

Code: Select all

 combine destination by return
do this:

Code: Select all

   repeat for each element aRecord in destination
      put aRecord & cr after tCollect
   end repeat
And here are the new results, which you can compare with the previous ones.
[Attached image: sunnY 2015-09-05 à 11.12.33.png (timings)]
Tested with LC 7.1 (rc 2).
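
A sketch of how the replacement loop behaves on a tiny array, under some assumptions: `tDestination` is a hypothetical stand-in for the `destination` array of the earlier script, and `tCollect` is the new variable that receives the text. Note that the result ends up in `tCollect` rather than in the array, so that is the variable that would then be written to the file.

```livecode
on mouseUp
   local tDestination, tCollect
   -- hypothetical stand-in for the array built by processCSVfile
   put "Ann,ann@example.com,7" into tDestination["ann@example.com"]
   put "Bob,bob@example.com,1" into tDestination["bob@example.com"]
   
   -- collect one record per line instead of calling "combine"
   put empty into tCollect
   repeat for each element aRecord in tDestination
      put aRecord & cr after tCollect
   end repeat
   -- the loop leaves a trailing return, which "combine" would not
   delete last char of tCollect
   
   -- tCollect, not the array, now holds the text to save or display
   put tCollect
end mouseUp
```

As far as I know, neither `combine` nor `repeat for each element` promises to return the records in their original file order, so if order matters the result would need sorting afterwards.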

Kind regards,

Thierry

Thierry

Post by Thierry » Sun Sep 06, 2015 3:19 pm

Hi,

Just downloaded LC 8.0 dp 4 on my Mac and tried to run the same stack.

It works as expected. Great!

Here are the results:
[Attached image: sunnY 2015-09-06 à 16.04.36.png (timings)]

Seems a bit slower?

Thierry
