Reading 100,000+ files

Posted: Thu Jul 19, 2012 7:10 pm
by lars604
Hi guys,

I have over 100,000 XML files that I need to read, building a list from a few specific tags in each one. The files are spread across several folders (as shown below in Fig1). I've put the code snippet below as well.

So far I've made the program run through the folders/files (by setting defaultFolder) and it seems to be working fine; however, after 244 files it just stops. It successfully goes through the first few folders no problem and the folder it's stopping on is right in the middle. Then I am unable to save or run the process again. I believe I'm running into some sort of system/memory limitation.

Fig1: Folder Structure (image not included)

Any tips on how to create this 100,000 item list without crashing?

Thanks!

** APP DETAILS **

Objects:
Buttons:
1: "Start" with the below script
Fields:
1: gPath
2: fileList
3: status


-- code for the "Start" button --


on mouseUp
   global gPath

   // Set default path
   if fld "gPath"=empty then
      answer folder "Choose root folder"
      put it into gPath
      put gPath into fld "gPath"
   else
      put fld "gPath" into gPath
   end if

   // Open Index Folder
   put gPath&"/IndexDetails_1/" into gPath

   set the defaultfolder to gPath
   put the folders into indexFolders

   filter indexFolders without ".*"

   // Repeat for each folder in the index
   put 0 into counter
   repeat for each line thisIndexFolder in indexFolders
      if the capslockkey=down then exit mouseUp

      set the defaultfolder to gPath&thisIndexFolder
      put the folders into folderLevel1
      filter folderLevel1 without ".*"

      repeat for each line thisFolder1 in folderLevel1
         if the capslockkey=down then exit mouseUp

         set the defaultfolder to gPath&thisIndexFolder&"/"&thisFolder1
         put the folders into folderLevel2
         filter folderLevel2 without ".*"

         repeat for each line thisFolder2 in folderLevel2
            if the capslockkey=down then exit mouseUp

            set the defaultfolder to gPath&thisIndexFolder&"/"&thisFolder1&"/"&thisFolder2
            put the files into allFiles
            filter allFiles without ".*"

            repeat for each line thisFile in allFiles
               if the capslockkey=down then exit mouseUp

               // Open the file and read the data
               put gPath&thisIndexFolder&"/"&thisFolder1&"/"&thisFolder2&"/"&thisFile into fPath
               open file fPath for read
               read from file fPath until end
               put it into rawData

               // Get the Doc location
               replace "><" with ">"&return&"<" in rawData
               put lineOffSet("<docfilename>",rawData) into x
               put line x of rawData into thisDocFile
               replace "<docfilename>" with empty in thisDocFile
               replace "</docfilename>" with empty in thisDocFile

               // Get the TransID
               put lineOffSet("<name>TransactionID",rawData) into x
               put line x+1 of rawData into thisTransID
               replace "<value>" with empty in thisTransID
               replace "</value>" with empty in thisTransID

               put thisTransID&tab&thisDocFile&return after fileList
               put counter+1 into counter
               put counter into fld "status"
            end repeat
         end repeat
      end repeat
   end repeat

   put fileList into fld "fileList"
end mouseUp

--

Re: Reading 100,000+ files

Posted: Thu Jul 19, 2012 7:28 pm
by Klaus
Hi Lars,

I would close all files after reading:

...
// Open the file and read the data
put gPath&thisIndexFolder&"/"&thisFolder1&"/"&thisFolder2&"/"&thisFile into fPath
open file fPath for read
read from file fPath until end
put it into rawData
close file fPath
...
8)

And/or add a breakpoint, e.g. "if counter = 243 then breakpoint" inside the innermost loop, to see what's going on just before it stops.
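
Every "open file" without a matching "close file" leaves a file handle open, and the number of handles that can be open at once is limited, which is presumably why the run stopped after a couple of hundred files. As an alternative to open/read/close, a whole file can be read in one step through a "file:" URL, so there is nothing left to close. The function below is only a sketch: readWholeFile is a made-up name, not something from the script above, and it assumes that returning empty is an acceptable way to signal a failed read.

```livecode
-- Hypothetical helper (not part of the original script):
-- returns the contents of the file at pPath, or empty if it could not be read.
function readWholeFile pPath
   local tData
   -- reading a "file:" URL opens, reads and closes the file in one step
   put URL ("file:" & pPath) into tData
   if the result is not empty then
      -- the result holds an error message when the read failed
      return empty
   end if
   return tData
end readWholeFile
```

In the innermost loop, the three lines that open, read and store the file would then shrink to: put readWholeFile(fPath) into rawData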


Best

Klaus

Re: Reading 100,000+ files

Posted: Fri Jul 20, 2012 5:16 pm
by lars604
Whoops! How did I forget that? Thanks for pointing it out. It now gets well past 244 files, but it eventually times out/crashes further along. Next I'll build in some logic to write the list out in files of 10,000 lines or so.

Many thanks!
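
One way the "10,000 lines per file" idea could look, as a rough sketch only. The handler names flushBatch and addToList, the "fileList_N.txt" naming and the pOutFolder parameter are all assumptions made for illustration; none of them come from the script above. The buffer and the counter are passed by reference (the @ parameters) so the helpers can clear and update the caller's variables.

```livecode
-- Hypothetical helper: write the buffered lines to a numbered text file
-- in pOutFolder, then clear the buffer.
command flushBatch pOutFolder, pBatchNumber, @pLines
   put pLines into URL ("file:" & pOutFolder & "/fileList_" & pBatchNumber & ".txt")
   put empty into pLines
end flushBatch

-- Hypothetical helper: add one line to the buffer and write the buffer
-- out each time the running count reaches a multiple of pBatchSize.
command addToList pLine, pOutFolder, pBatchSize, @pLines, @pCounter
   put pLine & return after pLines
   add 1 to pCounter
   if pCounter mod pBatchSize = 0 then
      flushBatch pOutFolder, pCounter div pBatchSize, pLines
   end if
end addToList
```

In the innermost loop this would replace the lines that append to fileList and increment counter, for example: addToList thisTransID & tab & thisDocFile, outFolder, 10000, fileList, counter (with outFolder being whatever folder the output should go to). After the loops finish, any lines still in fileList need one last flushBatch call with the next batch number. Keeping only the current batch in memory, and not putting the whole list into a field at the end, may also help with the slowdown, though that is a guess and not something tested here.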