I had exported my Safari bookmarks to a file so I could manipulate them. My file is 11,500 lines long and contains 294 bookmark folders. The file format is XML (fairly easy to understand and parse), but it has folders nested within folders, which makes processing the file linearly (top to bottom) difficult to do accurately. But as a first pass that is what I did. I wrote a linear routine that loops through the file using:
Code:
repeat for each line tLine in field "myData"
   -- process tLine
end repeat
The execution time: around 2 seconds.
But I knew that, if I was going to release this, I would have to bite the bullet and code it properly using recursion (i.e. the handler that processes a folder calls itself whenever it encounters an embedded folder, and so on). This required a bit of mental gymnastics to figure out. I knew I would have to give up the “repeat for…” loop I had used before, because each time I called “process new folder” I needed to start in the field wherever I was, continue processing from that point on, and then, when that call unwound, continue from where the last folder ended. In short, I needed explicit control over the line-number pointer into the field. So I wrote a function called getNextLine that increments the pointer, fetches a line, and tests for end of file. That way I could have any number of “repeat while not EOF” loops and have them start and stop at different sequential places.
Code:
function getNextLine
   -- lLineCount and lEOF are script-local variables
   add 1 to lLineCount
   put line lLineCount of fld "myData" into tLine
   if tLine = "</HTML>" then put true into lEOF
   return tLine
end getNextLine
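For reference, the recursive handler has roughly this shape (a simplified sketch only: the real tag tests and the categorising code are omitted, and the tag names shown are illustrative):
Code:
on processFolder
   repeat while not lEOF
      put getNextLine() into tLine
      if tLine contains "<DL>" then
         processFolder -- embedded folder: recurse into it
      else if tLine contains "</DL>" then
         exit processFolder -- folder done: unwind to the caller
      else
         -- categorise the bookmark on tLine here
      end if
   end repeat
end processFolder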
And yes, that worked remarkably well: it read and categorised the bookmarks accurately and reduced the number of orphans from 59 to just 3, which happened to be 3 URLs that were not categorised inside any folder anyway, and so were legitimate orphans from a folder perspective.
Great, except it took 40 minutes to run. Yes, you read that correctly. Same input file, same computer: 2 seconds vs 40 minutes. How could processing the same file go from 2 seconds to 40 minutes? I was curious, and set about finding out.
Since the internal data processing was essentially the same between the two versions, I decided to focus on the loop. There was no point in drilling into the “repeat for each line” version, since that was already running at virtually instant speed.
But in the other program I decided to just read through the file, top to bottom, using the getNextLine() function and doing NO other processing. That alone took 36 minutes (or 90% of the time).
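The bare test loop was essentially just this (a sketch, assuming lLineCount and lEOF are reset beforehand):
Code:
put 0 into lLineCount
put false into lEOF
repeat while not lEOF
   get getNextLine() -- fetch and discard each line
end repeat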
So I added a field to output the line number as it was being fetched and watched the progress. It actually started out pretty quickly (30 lines per second or thereabouts), but as things progressed it got slower and slower. By the time it reached line 5,000 I could easily count along as fast as the computer was processing, i.e. about 5 lines a second. And it continued to deteriorate.
So I drew two conclusions. First, “repeat for each line tLine…” is incredibly optimised: even with the folder and bookmark identification and counting, it processed about 5,000 lines per second. Second, the manual version using getNextLine is incredibly non-optimised, processing at most (even with NO folder or bookmark processing) about 5 lines per second. 5,000 vs 5. What could account for a 1,000-fold difference?
Watching the display as it continued to deteriorate, I came to the conclusion that my “fetch next line” routine was actually starting at the TOP of the field every time and counting all the way down to the point where the next line was located. So to fetch the 10,000th line it had to start at the beginning and count past 9,999 lines, then repeat the whole process, counting past 10,000 lines, to fetch the 10,001st line, and so on. If that is right, the whole pass is quadratic: reading all 11,500 lines this way means counting roughly 66 million lines in total (about 11,500² / 2). At least that is what it looks like. Can anyone confirm this?
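One way to confirm the suspicion would be to time a fixed number of fetches starting at different depths in the file: if each batch takes longer the deeper it starts, the fetch cost must grow with the line number. A quick, untested sketch (the “results” field name is made up):
Code:
on testFetchSpeed
   put false into lEOF
   repeat for each item tStart in "1,4000,8000"
      put tStart into lLineCount
      put the milliseconds into tT
      repeat 1000 times
         get getNextLine() -- fetch and discard
      end repeat
      -- report the starting line and the elapsed milliseconds
      put tStart && the milliseconds - tT & return after fld "results"
   end repeat
end testFetchSpeed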
So then I thought: well, this really sinks any recursive solution if it can’t be done with an optimised read through the data. Unless… (and I hope) someone knows a better approach?
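One idea I have not yet tried: read the field into a variable once and split it into a numerically keyed array, so each fetch is a cheap array lookup instead of a count from the top. A sketch (untested; the loadData handler name is just for illustration):
Code:
local lLines, lLineCount, lEOF

on loadData
   put fld "myData" into lLines
   split lLines by return -- lLines[1], lLines[2], ... one line per key
   put 0 into lLineCount
   put false into lEOF
end loadData

function getNextLine
   add 1 to lLineCount
   put lLines[lLineCount] into tLine
   if tLine = "</HTML>" then put true into lEOF
   return tLine
end getNextLine
Since array access is a hash lookup rather than a scan, it should not degrade as the line number grows. Does that sound right, or is there a more idiomatic way?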
Mark