For each line loop slows down

edmerckx99
Posts: 8
Joined: Sat Nov 20, 2010 2:32 am

For each line loop slows down

Post by edmerckx99 » Thu Aug 25, 2016 5:00 pm

I have a script that parses a large text file: ~30,000 lines with around 40-50 items per line. It's taking quite a long time to process, sometimes 45 minutes. If I just loop through each line without running my parsing code, it takes about 3 minutes.

One of the interesting things is that the script seems to slow down over time. I am trying to convert each line to a SQL INSERT command so I can bulk import them into a SQLite database.

Here is my code:

Code: Select all

function prepareSQL pText, pColumns
   delete the first line of pText
   local sqlText, sqlLineTemplate, theLines
   put 1 into theLines
   put "INSERT INTO 'TheTable'" && "(" & theDatabaseColumns & ")" && "VALUES" into sqlLineTemplate
   repeat for each line tLine in pText
      local tempLine
      put tLine into tempLine
      if item 1 of tempLine is not empty then
         put quote before item 1 of tempLine
         put quote after item 1 of tempLine
      end if
      local x = 1
      repeat for each item tItem in tempLIne
         if theLines is 1947 then
            if theSchemaArray[x] is "INTEGER" and item x of tempLine is " " then
               put "NULL" into item x of tempLine
            end if
         end if
         if tItem is empty then
            put "NULL" into item x of the last line of tempLine
         else
            if Not(tItem contains quote) and not (item x of tempLine is a number) then
               if theSchemaArray[x] is "INTEGER" then
                  repeat with y = number of chars in tItem down to 1
                     if char y of tItem is not a number then 
                        delete char y of tItem
                     end if
                  end repeat
                  if tItem is empty then put "NULL" into item x of tempLine
               end if
            end if
         end if
         add 1 to x
      end repeat
      put sqlLineTemplate && "(" & tempLine & ");" into line theLines of sqlText
      
      add 1 to theLines
      wait 0 millisec with messages
      updateProgressBar theLines
   end repeat
   return sqlText
end prepareSQL
*Edit: sorry, forgot the last 2 lines.
Last edited by edmerckx99 on Fri Aug 26, 2016 1:39 am, edited 3 times in total.

dunbarx
VIP Livecode Opensource Backer
Posts: 7168
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: For each line loop slows down

Post by dunbarx » Thu Aug 25, 2016 5:12 pm

Hi.

Your code offering is incomplete, and therefore impossible to evaluate. I would make a 30,000 line sample text file to do so, but have no way forward.

Craig Newman

Thierry
VIP Livecode Opensource Backer
Posts: 794
Joined: Wed Nov 22, 2006 3:42 pm
Location: France

Re: For each line loop slows down

Post by Thierry » Thu Aug 25, 2016 5:46 pm

Hi,

After a very quick look,
here is the first thing I would do:

Replace this:

Code: Select all

wait 0 millisec with messages
updateProgressBar theLines
with this:

Code: Select all

if (theLines mod 300) is 0 then
   wait 0 millisec with messages
   -- will show every new percent
   updateProgressBar theLines
end if
What are your timings then?

Thierry
Regex LiveCode sunnYrex
https://sunny-tdz.com

FourthWorld
VIP Livecode Opensource Backer
Posts: 8169
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: For each line loop slows down

Post by FourthWorld » Thu Aug 25, 2016 7:27 pm

For general tips on speeding up LiveCode scripts, this thread is well worth the read:
http://forums.livecode.com/viewtopic.php?f=8&t=24945

In your case, though, it seems you have SQL inserts taking place without transactions, yes? If so I'll wager that just opening a transaction before the insert and closing afterward will speed up that routine many-fold.
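
A minimal sketch of the transaction idea, for the case where the statements are executed one at a time with revExecuteSQL. The handler name runInserts and its parameters are hypothetical, and pConnID is assumed to be a connection id returned earlier by revOpenDatabase:

```livecode
-- Hypothetical handler: runs one SQL statement per line inside a single transaction.
-- pConnID is assumed to come from an earlier revOpenDatabase call.
command runInserts pConnID, pStatements
   local tResult
   revExecuteSQL pConnID, "BEGIN TRANSACTION"
   repeat for each line tStatement in pStatements
      if tStatement is empty then next repeat
      revExecuteSQL pConnID, tStatement
      put the result into tResult
      -- a successful INSERT reports the number of rows affected;
      -- anything non-numeric is treated as an error here
      if tResult is not a number then
         revExecuteSQL pConnID, "ROLLBACK"
         return tResult
      end if
   end repeat
   revExecuteSQL pConnID, "COMMIT"
   return empty
end runInserts
```

This only helps once the statements are actually being executed; it does not change how long it takes to build the text.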
Richard Gaskin
Community volunteer LiveCode Community Liaison

LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

edmerckx99
Posts: 8
Joined: Sat Nov 20, 2010 2:32 am

Re: For each line loop slows down

Post by edmerckx99 » Fri Aug 26, 2016 1:38 am

Thierry,
That seems to have sped it up a bit: around half an hour instead of about 40 minutes. But it still seems to be slowing down as the script goes along.

FourthWorld wrote: For general tips on speeding up LiveCode scripts, this thread is well worth the read:
http://forums.livecode.com/viewtopic.php?f=8&t=24945

In your case, though, it seems you have SQL inserts taking place without transactions, yes? If so I'll wager that just opening a transaction before the insert and closing afterward will speed up that routine many-fold.
I am not executing any SQL here; I am prepping the text for a bulk transaction. None of my tests include the SQL transactions.

Here is the interesting part: I added some code to the update section that calculates the milliseconds elapsed for each batch of 300 lines.

Code: Select all

if (theLines mod 300) is 0 then
   put the millisecs into t2
   wait 0 millisec with messages
   updateProgressBar theLines
   put t2 - t1 into line timingLine of fld "TimingLine"
   add 1 to timingLine
   put the millisecs into t1
end if
The first 300 took 111 ms.
*** I have all the times between the start and finish; I just didn't think it was worth adding all 100 lines here. ***
The last 300 took 23,553 ms.

FourthWorld
VIP Livecode Opensource Backer
Posts: 8169
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: For each line loop slows down

Post by FourthWorld » Fri Aug 26, 2016 4:39 am

If this:

Code: Select all

put sqlLineTemplate && "(" & tempLine & ");" into line theLines of sqlText
...is changed to this:

Code: Select all

put sqlLineTemplate && "(" & tempLine & ");" & cr after sqlText
...then you take fuller advantage of "repeat for each" over "repeat with", since the chunk expression "into line" requires the engine to count all the lines within sqlText each time through the loop. As sqlText grows, each of those counts takes longer, which is why the loop slows down as it goes.

You may be able to extend that to some of the item expressions as well, appending the items being traversed with "repeat for each" into a new line built up with "put...after...", avoiding expressions that require counting delimiters like "...item x of..."

Appends in LC are generally pretty fast.
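
As a rough illustration of extending this to the item expressions, the inner loop could build a new line with appends instead of writing back into "item x of tempLine". This is a sketch under assumptions, not a drop-in replacement: the function name cleanLine is made up, pSchema is assumed to be an array of column types keyed by column number (like theSchemaArray in the original script), and it leaves out the quoting of item 1.

```livecode
-- Hypothetical helper: rebuilds one comma-delimited line using appends only.
-- pSchema is assumed to be an array of column types keyed by column number.
function cleanLine pLine, pSchema
   local tOut, tCol, tValue
   put 0 into tCol
   repeat for each item tItem in pLine
      add 1 to tCol
      put tItem into tValue
      if pSchema[tCol] is "INTEGER" and tValue is not a number then
         -- keep digits only, similar to the char-by-char loop in the original
         put replaceText(tValue, "[^0-9]", empty) into tValue
      end if
      if tValue is empty then put "NULL" into tValue
      put tValue & comma after tOut
   end repeat
   -- drop the trailing comma added by the last append
   delete the last char of tOut
   return tOut
end cleanLine
```

It would be called from the outer loop along the lines of: put sqlLineTemplate && "(" & cleanLine(tLine, theSchemaArray) & ");" & cr after sqlText. One thing to watch: "repeat for each item" does not visit an empty item after a trailing comma, so a line whose last column is empty would come out one column short and may need special handling.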
Richard Gaskin

edmerckx99
Posts: 8
Joined: Sat Nov 20, 2010 2:32 am

Re: For each line loop slows down

Post by edmerckx99 » Fri Aug 26, 2016 5:19 am

Wow, that was it. It is now taking just 2 minutes. I really didn't think that using

Code: Select all

put "text" into line x of <container>
would have caused that much of a slowdown.

Thank you!

Thierry
VIP Livecode Opensource Backer
Posts: 794
Joined: Wed Nov 22, 2006 3:42 pm
Location: France

Re: For each line loop slows down

Post by Thierry » Fri Aug 26, 2016 11:57 am

edmerckx99 wrote: Thierry,
That seems to have sped it up a bit: around half an hour instead of about 40 minutes. But it still seems to be slowing down as the script goes along.
Hi,

So now that you have reached the 2 minutes,
if you call updateProgressBar for every line, you should see that it's no longer a minor detail.
edmerckx99 wrote: Here is the interesting part: I added some code to the update section that calculates the milliseconds elapsed for each batch of 300 lines.

Code: Select all

if (theLines mod 300) is 0 then
   put the millisecs into t2
   wait 0 millisec with messages
   updateProgressBar theLines
   put t2 - t1 into line timingLine of fld "TimingLine"
   add 1 to timingLine
   put the millisecs into t1
end if
The first 300 took 111 ms.
*** I have all the times between the start and finish; I just didn't think it was worth adding all 100 lines here. ***
The last 300 took 23,553 ms.
Yes, that's the way it has worked in LC for a long time.

If the 2 minutes are still too long for your purpose,
then I suggest working with an external file for the output.
This way, you're not buffering your "sqlText" output into LC.
I've done that for big data quite a few times with good results.
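
A minimal sketch of the external-file approach, assuming the per-line cleanup has already been done. The handler name writeSQLFile and its parameters are hypothetical; pTemplate is assumed to hold the "INSERT INTO ... VALUES" prefix and pFilePath a writable path:

```livecode
-- Hypothetical handler: streams each generated statement straight to disk
-- instead of accumulating everything in a variable.
command writeSQLFile pText, pTemplate, pFilePath
   open file pFilePath for text write
   -- open file reports a problem through the result
   if the result is not empty then return the result
   repeat for each line tLine in pText
      write pTemplate && "(" & tLine & ");" & cr to file pFilePath
   end repeat
   close file pFilePath
   return empty
end writeSQLFile
```

Whether this beats appending to a variable will depend on the data size and the disk, so it is worth timing both.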

My 2 cents

Thierry

FourthWorld
VIP Livecode Opensource Backer
Posts: 8169
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: For each line loop slows down

Post by FourthWorld » Fri Aug 26, 2016 3:39 pm

edmerckx99 wrote: Wow, that was it. It is now taking just 2 minutes. I really didn't think that using

Code: Select all

put "text" into line x of <container>
would have caused that much of a slowdown.

Thank you!
Happy to help. LiveCode is a very capable language, made even better with flexible syntax that generally prevents us from having to think like a computer. But once in a while, esp. when performance is critical, thinking like a computer can help.

Chunk expressions like "item tSomething of line tSomethingElse" are very powerful, offering graceful parsing that rivals sed or awk for many tasks, but much more readable.

And in most cases they're pretty efficient, with the traversal of the string and keeping track of delimiters all done in reasonably well optimized C++.

But in loops, esp. loops working on large data, it pays to be mindful of any chunk expressions which may cause redundant string traversal.

The "repeat for each" form keeps a pointer into the source string as it goes, so it never needs to look for more than one delimiter. The "repeat with" form doesn't normally do that. When its iterator variable is used in a chunk expression (like "get line i of tSomething"), the engine has to traverse the string from the start, counting line delimiters as it goes, until it finds the i-th one.
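
To see the difference on your own data, a rough timing sketch along these lines may help. The function name compareLoops is made up for illustration, and the work done per line (counting non-empty lines) is only a placeholder:

```livecode
-- Hypothetical helper: times the same pass over pData with both loop forms.
function compareLoops pData
   local i, tStart, tCount, tWithTime, tForEachTime

   -- "repeat with": every "line i of pData" rescans the text to find line i
   put 0 into tCount
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of pData
      if line i of pData is not empty then add 1 to tCount
   end repeat
   put the milliseconds - tStart into tWithTime

   -- "repeat for each": the engine walks the text once
   put 0 into tCount
   put the milliseconds into tStart
   repeat for each line tLine in pData
      if tLine is not empty then add 1 to tCount
   end repeat
   put the milliseconds - tStart into tForEachTime

   return "repeat with:" && tWithTime && "ms, repeat for each:" && tForEachTime && "ms"
end compareLoops
```

On a few hundred lines the two should be close; the gap is expected to widen as the line count grows.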

I've been delightfully surprised with how efficiently LiveCode does appends. I have no idea how they manage such dynamic memory allocation under the hood, but it's pretty nice. In many cases where I need to transform a large chunk, I find that parsing it with "repeat for each" and then building up a transformed copy with append operations is much faster than transforming in place in the source text, since it never needs to count anything as it goes.
Richard Gaskin
