
Performance considerations when working with large directory

Posted: Mon Apr 29, 2013 9:27 pm
by tjm167us
Good afternoon everyone,
I am in the process of writing a file parsing program that will be processing files from a directory that has ~100,000 files. My general strategy is as follows:
1. Let the user select which directory they wish to process files from
2. Grab one file at a time, process it, append the results to an output file, and then grab the next file for processing.

Before I start grabbing files, however, I need to get a list of the files in the directory (along with some file details, such as date modified and date created). I am currently doing this the following way:

Code:

   answer folder "Select the folder of event files you wish to search:"
   if it is empty then exit to top -- user cancelled the dialog
   set the directory to it
   put it into field "fld_Folder"

   put the detailed files into fileDetails
The problem I am finding is that the files function takes a long while to process this (understandably so), but with this number of files I find it inappropriate for the program to be unresponsive to the user, with no option to opt out, no status bar, etc.

Is there a way to instead grab a smaller number of files from the directory (say 1,000), process them, give the user status on where the program is in its processing, and then grab the next 1,000 files, and so on? I look forward to hearing back from you guys!

Thanks,
Tom

Re: Performance considerations when working with large directory

Posted: Tue Apr 30, 2013 1:50 am
by sturgis
What OS? If it's a Windows-only application, you could use open process and then read from the process 1k lines at a time until the end (open process "dir" for text read).

If you need multi-platform support, it's a little more complicated due to how open process works, so you might consider using a script for each OS that outputs to a text file, then reading from that file (launch "whateverscript/app").
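A minimal sketch of the open process approach on Windows (note that dir is a cmd.exe built-in, not a standalone program, so it has to be run through cmd /c; tFolder is assumed to hold the folder chosen by the user):

Code:

   -- hedged sketch: list a folder via cmd.exe and read it back one line at a time
   put "cmd.exe /c dir /b" && quote & tFolder & quote into tCmd
   open process tCmd for text read
   repeat forever
      read from process tCmd for 1 line
      if the result is "eof" then exit repeat
      -- "it" now holds one file name; process it and update progress here
   end repeat
   close process tCmd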

In fact, you could write a separate app in LiveCode that just builds the file list and writes it to a file (the second app could be launched via a shell call, with the directory to be listed passed as a parameter). Since it would be a separate executable in its own process, it would not block your main app, which would sit in a read loop, reading the file created by the second app.
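One possible shape for that helper-app idea (the names lister.exe and tListFile are illustrative stand-ins, not real tools):

Code:

   -- hedged sketch: run a helper that writes the listing to a file, without blocking
   put specialFolderPath("temp") & "/filelist.txt" into tListFile
   open process ("lister.exe" && quote & tFolder & quote && quote & tListFile & quote) for neither
   -- the main app can then poll tListFile (e.g. with "send ... in" callbacks),
   -- reading and processing new lines as they appear while staying responsive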

Just thinking out loud here... er, out typed? There are other ways, I'm sure.

Re: Performance considerations when working with large directory

Posted: Thu May 09, 2013 12:39 pm
by tjm167us
Thanks for the reply!
I am still pretty new to this stuff, so this open process "dir" stuff seems a little over my head at the moment. This is going to be deployed on Windows only.

On a seemingly related issue, without worrying too much about performance for the moment, I have written my code to loop through this large number of files, and something interesting happens that I can't explain. As I loop through the files, I have a text label that displays "Processing file X of XXXX", which shows the program's progress. This works great for the first couple hundred files, at which point the UI stops updating and I get the spinny wheel; after a little longer, LiveCode becomes responsive again and the task is completed. Can someone explain why the UI stops updating, while the files continue to be properly processed? The general structure is shown below.

Code:

 
repeat for each line tLine in fileList
   -- open the file
   -- process it
   -- update my UI
end repeat
Any ideas would be greatly appreciated.
Thanks,
Tom

Re: Performance considerations when working with large directory

Posted: Thu May 09, 2013 12:43 pm
by Klaus
Hi Tom,

you could try adding a "wait 0 with messages" inside the loop. While a handler is running, the engine does not get a chance to process pending messages (including screen redraws); "wait 0 with messages" lets it catch up, which gives the UI time to respond:

Code:

...
repeat for each line tLine in fileList
  --open file
  --process file
  --update my UI
  wait 0 with messages
end repeat
...
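If calling wait on every file turns out to be slow, a variant is to only update every N files (the interval of 100 and the field name "lbl_Status" are arbitrary illustrative choices, not from the thread):

Code:

   put 0 into tCount
   put the number of lines of fileList into tTotal
   repeat for each line tLine in fileList
      add 1 to tCount
      -- open and process the file named in tLine
      if tCount mod 100 = 0 then
         put "Processing file" && tCount && "of" && tTotal into field "lbl_Status"
         wait 0 with messages -- lets the engine redraw and handle queued events
      end if
   end repeat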
Best

Klaus