Complicated loop function
Posted: Sun Mar 07, 2010 11:31 am
by ac11ca
Hi,
I was looking for an efficient way to do the following:
I have a formula/model, which contains five adjustable parameters/variables (a,b,d,g,m), that is used to predict Y upon being given X. I also have the true/observed value of Y.
I would like to find out what specific set of parameters will allow the formula/model to minimise the difference (i.e., Mean Square Deviation; MSD) between the predicted-Y and the observed-Y.
Each of the five parameters/variables can range between 0 and 3 (not inclusive).
I want to run a loop function that adjusts each parameter/variable in steps of 0.01, runs the model, and computes the MSD (with the intention of finding the set of parameters that minimises the MSD). That is: find the MSD when a=.01, b=.01, d=.01, g=.01, m=.01, then when a=.02, b=.01, d=.01, g=.01, m=.01, then when a=.03, b=.01, d=.01, g=.01, m=.01, and so on. Clearly, that is going to be a lot of number crunching. Basically, a brute-force search.
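For reference, the quantity being minimised could be sketched roughly as follows. This is only an illustration and not code from the thread: the function name meanSquareDeviation and its two return-delimited list parameters are assumptions, and the model that produces the predicted values is not shown.

```livecode
-- Hypothetical helper (not from the thread): mean square deviation
-- between two return-delimited lists of numbers of equal length.
function meanSquareDeviation pPredicted, pObserved
   local tSum, tCount, tDiff, i
   put 0 into tSum
   put the number of lines of pPredicted into tCount
   if tCount is 0 then return empty
   repeat with i = 1 to tCount
      -- difference between predicted and observed value on line i
      put (line i of pPredicted) - (line i of pObserved) into tDiff
      add tDiff * tDiff to tSum
   end repeat
   return tSum / tCount
end meanSquareDeviation
```

The brute-force search would call something like this once per parameter combination and remember the combination that gives the smallest result.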
Thanks for any help,
Adrian
Re: Complicated loop function
Posted: Sun Mar 07, 2010 8:01 pm
by Regulae
As I understand it, you want to cycle such that each of five parameters a,b,d,g,m takes values from 0.01 to 2.99, with 0.01 increments. Thus we would generate (a at 299 values) * (b at 299 values) * (d at 299 values) * (g at 299 values) * (m at 299 values), calculating the MSD given each combination of parameters. This would fit the description of “brute force”. The aim is to find the combination which yields the minimum MSD (perhaps there will not be a unique combination). Even just cycling through the combinations takes time. I tested with a field “output”, and the following script in a button:
Code:
on mouseUp
   put the milliseconds into starttime
   -- build a return-delimited list of the parameter values
   repeat with m = 0.01 to 2.98 step 0.01
      put m&return after mAccumulator
   end repeat
   -- cycle through every combination of three parameters
   repeat for each line d in mAccumulator
      repeat for each line g in mAccumulator
         repeat for each line m in mAccumulator
            -- d, g, and m are three parameters
         end repeat
      end repeat
   end repeat
   put the milliseconds into endtime
   -- report the elapsed time in milliseconds
   put endtime - starttime into card field "output"
end mouseUp
... which builds a list of the parameter values, 0.01 to 2.99, in mAccumulator, and uses the list purely to cycle through the combinations, outputting the time taken in milliseconds. On my machine the above takes 2230 milliseconds on average. If we add another “nest” of “repeat for each line b in mAccumulator”, the time multiplies to 299 * 2230 ms, giving 666770 ms, or roughly 11 minutes (which I calculated rather than sat through). The final nest of “repeat for each line a in mAccumulator” multiplies again: 299 * 11 minutes is 3289 minutes, or roughly 55 hours. Given that this is the time required even without actually involving the parameters in any calculations, I’m not sure a computationally feasible solution beckons, assuming I have understood the requirement. If not, perhaps my post will help clarify the question.
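The extrapolation can be written down as a small helper, sketched here under the assumption that each extra nesting level simply multiplies the measured time by the number of values per parameter. The function name extrapolateMs and its parameters are made up for illustration.

```livecode
-- Hypothetical helper (not from the thread): scale a measured loop
-- time in milliseconds by a number of extra nesting levels, each of
-- which multiplies the work by pValuesPerLevel.
function extrapolateMs pMeasuredMs, pValuesPerLevel, pExtraLevels
   return pMeasuredMs * (pValuesPerLevel ^ pExtraLevels)
end extrapolateMs
```

For example, entering put extrapolateMs(2230, 299, 2) / 3600000 in the message box converts the result to hours. With the unrounded figures this comes to a little over 55 hours, slightly more than an estimate built on the rounded 11 minutes.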
Regards,
Michael
Re: Complicated loop function
Posted: Mon Mar 08, 2010 12:35 am
by ac11ca
Thanks for your reply Michael,
You have understood correctly what I am trying to do and, as I suspected, it is a bit much. In order to save computation time, I am prepared to simplify the model and reduce the search to the following:
assume a = b = value between 0.01 and 1.99, with 0.01 increments.
assume d = g = value between 0.01 and 1.99, with 0.01 increments.
m must still be independent and should be able to take any value between 0.50 and 3.50, with 0.01 increments.
This simplification can be done in 572ms on my machine (860ms for the code quoted in your post), so it seems much more workable.
Your code is very helpful Michael, so thanks a million! Can I ask what change I should make to your code to ensure that a is treated as equal to b? I realise I could just find/replace all the a's with b's, but in the future I may want to drop my a=b simplification.
Cheers,
Adrian
Re: Complicated loop function
Posted: Mon Mar 08, 2010 3:20 pm
by Regulae
Sorry for the delay in reply. Nested repeat loops take a little taming. Attached is a fairly “rough and ready” stack which lets you test repeat cycles through the set of parameter combinations, and the duration of a “core calculation”. This should give an indication of how long a given arithmetic experiment will take. Hopefully, we make progress, and of course other folks on the forum may have better approaches to suggest.
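The attached stack is not reproduced here. As a rough sketch of the reduced search only, and not the stack's actual script, the loops could be driven by whole-number counters, with b copied from a and g copied from d so that either constraint can later be relaxed by giving that parameter a loop of its own. The variable names tA, tB, tD, tG, tM and tCount are illustrative assumptions.

```livecode
on mouseUp
   local i, j, k, tA, tB, tD, tG, tM, tCount
   put 0 into tCount
   repeat with i = 1 to 199            -- a = b: 0.01 to 1.99
      put i / 100 into tA
      put tA into tB                   -- give tB its own loop to relax a = b
      repeat with j = 1 to 199         -- d = g: 0.01 to 1.99
         put j / 100 into tD
         put tD into tG                -- give tG its own loop to relax d = g
         repeat with k = 50 to 350     -- m: 0.50 to 3.50
            put k / 100 into tM
            -- the core calculation using tA, tB, tD, tG, tM would go here
            add 1 to tCount
         end repeat
      end repeat
   end repeat
   -- number of combinations visited
   put tCount into card field "output"
end mouseUp
```

With these ranges the loops visit 199 * 199 * 301 = 11919901 combinations, so the cost of the core calculation per combination dominates the total run time.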
Regards,
Michael
Re: Complicated loop function
Posted: Tue Mar 09, 2010 1:55 pm
by ac11ca
Thanks for your help, Michael. I've been distracted today and haven't had a good chance to digest your attachment. Hopefully I will rectify that tomorrow and get back to you on how it goes.
Cheers,
Adrian
Re: Complicated loop function
Posted: Tue Mar 09, 2010 2:24 pm
by Regulae
Just a quick note. I’ve just noticed that the script of the button “Calculation time tester” accidentally includes a title comment from another control, which suggests the script is a non-working example. The button is intended to actually function. Sorry for any confusion.
Regards,
Michael
Re: Complicated loop function
Posted: Wed Mar 10, 2010 1:19 pm
by ac11ca
Hi Michael,
I had a chance to look at your stack this evening - it is very useful and user-friendly: thank you!
Attached is a stack that includes my original program with your additional scripts.
I have included my calculation in the button "Calculation time tester" and it takes on average 5ms. With the number of loops, that would take nearly a day. That's probably not feasible. I guess I will need to chop off some of the range in the parameters to bring this down...
In the button "Parameter Test" I have included your loop with my calculation. You might note that I changed some notation (a = alpha, etc). You will also note that I have adjusted the step to .1 instead of .01 to test that the script runs through without error: it does. However, the output is not correct: the MSD appears to equal 0 for all cases. I've not worked out why yet.
In case you were wondering what it all means... I am presenting people with a choice between two options (each line in my fields represents a new choice), gambles if you will: one that is safe and one that is risky. I am then looking to predict which option they will select with a model that has certain adjustable parameters that relate to some psychological construct (e.g., delta/gamma relate to how probabilities are subjectively weighted; cf. Prospect Theory if you are really interested). For example, the first row shows that the "safe" option paid out -.3 100% of the time and that the risky option paid out -2.1 4% of the time and -.3 96% of the time. If you click on "Model Fit", the model tries to predict the proportion of people that will select the riskier option. This number goes into the column "Prediction". It is based on parameters estimated by a guy named "Erev", and the column with his name shows his results for these problems. The fact that mine are within +/-1% of his shows some (rounding?) error in his or my code - still figuring out the source. The final column, "Observed", is what actually happened when we gave these decisions to people. Thus, the model fit that I am trying to perform seeks to minimise the difference (i.e., MSD) between the "Prediction" and "Observed" columns.
Cheers,
Adrian
Re: Complicated loop function
Posted: Thu Mar 11, 2010 6:15 pm
by Regulae
Hi Adrian,
The attached stack addresses several issues. A particular challenge has been finding an effective way of estimating the time required for an extended parametric investigation. I must admit there have been some false starts on this question, but it’s crucial to establishing the computational feasibility of a given combination of ranges and step-values. Still, progress is being made.
Regards,
Michael
Re: Complicated loop function
Posted: Sat Mar 13, 2010 3:07 pm
by ac11ca
Hi Michael,
Let me first just say that I am genuinely grateful for your help. The program is much improved in all respects and, most importantly, it can now accomplish what I initially set out to do. Success! As you say, there are numerous improvements that can be made, but the machinery is now established. I do not know how to repay you, or even if I can really. However, I must ask what has motivated you to invest so much time into my little problem?
Specific things that I want to note:
- opening scripts directly by holding down the shiftKey and clicking is an ingenious timesaver. I wish I had come across this shortcut years ago.
- the parameter setting panel is very useful in this testing period
- the filter test was a good idea to pick up the mistake
- detecting the reason for the MSD locking to 0 was crucial. Well done on spotting the cause.
- the performance report is useful particularly, as you say, for coming to terms with the impact of step changes and the subsequent implication for test duration. Well done.
- as you note, "Model fit" is just my core calculation with parameters estimated by someone else, so it is fairly straightforward
Apart from cosmetic rearrangements, eliminating unnecessary global variables and adding some script to ensure that my values are in the right variables, the major thing I have done is to test the validity of the parameter test. To do this, I compared its output (with only .05 steps as opposed to .01) against known output parameters for the same set of 60 problems (see table at the bottom of http://tx.technion.ac.il/~erev/Comp/BaseDecRisk1.html). The estimated parameters were close indeed, which suggests that things are working as they should be. Beautiful! Note that 60 problems increases the calculation time notably (3.3 ms on average for my computer). However, 60 problems are only being used for the sake of validity testing. In practice, I will never have more than 10 problems.
You mention that the program is still a work in progress and suggest some good possibilities for improvement (I particularly like the progress bar idea but am not sure I understand the idea related to data accumulation in the output field). However, I feel that the program basically does what I intended it to do and am thus moderately satisfied. Of course, if I decided that I wanted to separately estimate a, b, d, g (as I am sure I will at some point in the future), I would need to leave the program running over a weekend, which is not ideal but possible. My biggest concern at the moment, and what I will focus my energies on next, is locating the source of the discrepancy between my predictions and those of Erev (I have been in correspondence). Having said that, the latest version of the stack is attached and you are more than welcome to edit/improve it. I am not one to turn away from a superior tool, I am just time-pressed (and lacking in certain necessary skills) to create such an instrument.
Cheers,
Adrian
Re: Complicated loop function
Posted: Sun Mar 14, 2010 4:03 am
by Regulae
Hi Adrian,
It’s good to hear things have moved in the right direction. The problem has several points of interest- it involves complexity on more than one level. Because of this, any solution which is only partial is of little value. There is of course the complexity of the repeat structure required, with the challenge of optimising performance. There is also the fact that we intend the process to run for extended periods with iterations so numerous that only a computer can perform them. For this very reason, we can’t check the results “by hand”. Any confidence in the output rests on establishing four things about the process itself:
1. That the parameter ranges actually used in the process are as we intend.
2. That the parameter combinations available to the core calculation are correctly constructed by the “repeat” structure.
3. That the core calculation is an algorithm which correctly instantiates the model under test.
4. That the logical “filter” reliably gathers all, and only, the desired cases.
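As an illustration of point 4, a filter that keeps all, and only, the combinations with the lowest MSD seen so far might look something like the sketch below. It is an assumption about what such a filter could do, not the script from the attached stack. The handler name recordIfBest, the array keys "msd" and "combos", and the comma-separated combination string are all hypothetical.

```livecode
-- Hypothetical filter step (not from the attached stack), called once
-- per parameter combination. pBest is an array passed by reference:
-- pBest["msd"] holds the lowest MSD so far and pBest["combos"] the
-- combination(s) that produced it, one per line.
on recordIfBest @pBest, pCombo, pMsd
   if pBest["msd"] is empty or pMsd < pBest["msd"] then
      -- new minimum: discard earlier combinations
      put pMsd into pBest["msd"]
      put pCombo into pBest["combos"]
   else if pMsd = pBest["msd"] then
      -- tie: keep this combination alongside the others
      put return & pCombo after pBest["combos"]
   end if
end recordIfBest
```

Inside the innermost loop it could be called as recordIfBest tBest, tA & comma & tD & comma & tM, tMsd, where those variable names are again only placeholders. Because the comparison is an exact equality on computed values, ties would be detected only when the MSDs match exactly.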
The “test points” included in the “Parameter Test” script attempt to address 1, 2 and 4, and your tests to replicate existing data sets are encouraging concerning 3, though there seems to be something subtle going on. Hopefully our understanding will converge rapidly on the solution. Keeping in mind that the aim is to quickly produce a practical tool rather than anything more refined, there are four basic improvements to increase effectiveness:
1. Input validation to verify the correct number of parameters, and that they are numeric.
2. Changing the type of repeat used in the core calculation to improve speed.
3. Time estimation when changing parameter ranges, so you know what to expect.
4. Progress indication, so you know how things are going.
This sounds like a lot, but actually it’s pretty simple (I believe)- most of the pieces are already there, and much of it is interrelated. It will complete the basic mechanism, which can always be refined later. I’ll have a look at this over the next day or so.
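The first improvement, input validation, could be sketched along these lines. This is an assumption about one way to do it rather than the eventual implementation. The function name paramsAreValid and the idea of passing the parameters as a return-delimited list are hypothetical.

```livecode
-- Hypothetical validator (not from the attached stack): pParams is a
-- return-delimited list that should hold exactly pExpected numbers.
function paramsAreValid pParams, pExpected
   local tLine
   -- check that the expected number of parameters was supplied
   if (the number of lines of pParams) <> pExpected then return false
   -- check that every parameter is numeric
   repeat for each line tLine in pParams
      if tLine is not a number then return false
   end repeat
   return true
end paramsAreValid
```

A run would then start only if, say, paramsAreValid(field "test", 3) returns true, where the field name and the count are just examples.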
Regards,
Michael
Re: Complicated loop function
Posted: Sun Mar 21, 2010 1:27 am
by Regulae
The “complicated loop” problem is all about large scale number-crunching. A practical challenge is that a given “run” can take several hours. In the current implementation (attached) when a run is started, a field is shown giving, firstly, the estimated time of the run. Also shown, periodically updated, is the percentage completed (counting up) and the estimated time remaining (counting down). I found that for really long processes a progress bar is rather uninformative compared with just the percentage/time remaining figures. Another useful feature for a process which can run for hours is “pause”, so the computer can be used for other tasks. Given that the complex repeat is performing a mathematical investigation, special care must be taken to ensure that the results of such a “stop/start” process are still valid. The lengthier the process, the greater the potential for an interruption, such as a power failure. It seemed worthwhile to add an “auto-save” capability, for runs estimated at longer than half an hour, to limit the extent of any such set-back.
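The arithmetic behind a percentage and time-remaining display can be sketched as below, assuming each iteration takes roughly the same time. This is not the script from the attached stack. The function name progressReport and its parameters are made up for illustration.

```livecode
-- Hypothetical helper (not from the attached stack): given the number
-- of iterations done, the total number, and the elapsed milliseconds,
-- return "percentDone,estimatedMsRemaining".
function progressReport pDone, pTotal, pElapsedMs
   local tPercent, tRemainingMs
   -- no estimate is possible before the first iteration completes
   if pDone < 1 then return "0,"
   put round(100 * pDone / pTotal, 1) into tPercent
   -- assume the remaining iterations run at the average rate so far
   put round(pElapsedMs * (pTotal - pDone) / pDone) into tRemainingMs
   return tPercent & comma & tRemainingMs
end progressReport
```

Calling it only every few thousand iterations, rather than on every pass, would keep the reporting overhead small relative to the core calculation.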
One curiosity concerned:
Code:
   repeat with myVal = startVal to endVal step stepVal
... when endVal = 0.8, 0.9, 1.0 or 1.1, and stepVal = 0.1. What normally happens is, e.g.:
   repeat with myVal = 0.5 to 0.7 step 0.1
... with the resultant values 0.5, 0.6, 0.7, as you would expect. But:
   repeat with myVal = 0.5 to 0.8 step 0.1
... generates the values 0.5, 0.6, 0.7, 0.8, 0.9, which is surprising. I found the same “extra value” phenomenon for the other endVals mentioned above, but only for these values. To see this, put the following into a button script:
Code:
on mouseUp
   put line 1 of fld "test" into startVal
   put line 2 of fld "test" into endVal
   put line 3 of fld "test" into stepVal
   repeat with myVal = startVal to endVal step stepVal
      put myVal&return after valueStore
   end repeat
   put valueStore
end mouseUp
... the example needs a field “Test”, with values entered into the lines as indicated, and outputs to the message box. This anomaly, admittedly obscure, did cause considerable confusion.
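One common workaround for surprises of this kind, offered here as an assumption rather than a confirmed diagnosis of the anomaly, is to drive the loop with a whole-number counter and derive each value from it, so that the loop bound is never compared against a fractional running value. The t-prefixed variable names, the small tolerance of 0.000001 and the rounding to 6 decimal places are illustrative choices.

```livecode
on mouseUp
   local tStart, tEnd, tStep, tSteps, i, tValueStore
   put line 1 of fld "test" into tStart
   put line 2 of fld "test" into tEnd
   put line 3 of fld "test" into tStep
   -- number of whole steps that fit between start and end; the small
   -- tolerance absorbs tiny errors in the division
   put trunc((tEnd - tStart) / tStep + 0.000001) into tSteps
   repeat with i = 0 to tSteps
      -- derive each value from the counter instead of accumulating it
      put round(tStart + i * tStep, 6) & return after tValueStore
   end repeat
   put tValueStore
end mouseUp
```

This uses the same field “Test” as the example above, and the number of values generated depends only on the whole-number count.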
Regards,
Michael
Re: Complicated loop function
Posted: Sun Mar 21, 2010 10:48 am
by ac11ca
Hi Michael,
That's quite a strange phenomenon/anomaly you noticed. Testing an extra step isn't really a problem, but I can see how it could throw us off if we are unaware of it.
The improvements you have made in SCPT4 certainly make the program more usable and practical. The Monitor field is most welcome. The pause function, however, is probably less likely to be taken advantage of. My approach to using the program would most likely be to start the Parameter Test on a Friday and return to work with a result on Monday. Power outages, however, are always a possibility. You have also put in a number of safeguards with respect to data entry that are always useful to have in place (I should probably be the only user, so an advanced user assumption is warranted, but I can still make mistakes in data entry). Unfortunately, I don't understand what is meant to be happening with the "Field Behaviour" button and the fields you note in your comments that are associated with this button.
You may be happy to learn that I have narrowed down the disparity between my results and those of Erev to one of two lines in the core calculation. We are still trying to work out exactly where our differences emerge from with respect to these two lines.
One concern I have with the program is that it is getting a little beyond me: I've spent the last few hours trying to understand what everything does, and I am sure I haven't yet got it all. I am particularly concerned that if I tried to edit the script now to relax our constraints that a=b and d=g, I would find myself in no end of strife (even with the "a<>b variant" button that you have included). I was wondering, if you had time and there is no rush, if you could create a new button called, say, "Advanced Parameter Test", that was identical in most ways to the current "Parameter Test" except for allowing a<>b and d<>g?
Cheers,
Adrian
Re: Complicated loop function
Posted: Sun Mar 21, 2010 11:57 am
by bn
Hi Michael,
maybe this is a lead to the problem of ".8":
Code:
on mouseUp
   put line 1 of fld "test" into startVal
   put line 2 of fld "test" into endVal
   put line 3 of fld "test" into stepVal
   -- show many decimal places when myVal is converted to text
   set the numberformat to ".#####################"
   repeat with myVal = startVal to endVal step stepVal
      put myVal&return after valueStore
   end repeat
   put valueStore
end mouseUp
but this is way beyond me. It does not happen with integers.
regards
Bernd
"...and what we can not speak about we must pass over in silence"
Re: Complicated loop function
Posted: Mon Mar 22, 2010 6:05 am
by Regulae
Hi Adrian,
Congratulations on locating the source of the disparity in the core calculation. That’s a crucial hurdle and I didn’t envy you the task. Trying to determine, amid a forest of conditionals involving inequalities, why one calculation produces different results from another can be the stuff of nightmares. This is very good news!
Of the many benefits of developing with Rev, one is rapid development time, another is the language, RevTalk, which can be readily understood if well written. SCPT4 reflects the benefits of the former somewhat at the expense of the latter. A script which is difficult to understand is a liability to maintain and modify, the more so as memories fade. Though the current script may work, it is opaque and intricate- not at all as I would like. New features are of dubious merit if they render the script unintelligible. Addressing this issue is a key objective of the next iteration of the development cycle. I am glad of the chance to do this in the a<>b and d<>g versions. This is quietly underway.
The button “Field Behavior” was experimental. I wondered if data entry into number-laden fields, like field “Risky”, would be easier and less error-prone if the fields could be, temporarily, enlarged. You click the field, it doubles in size and centres in the card, you enter numbers or change them, move the mouse out of the field, it “snaps” back to its original size and place. As there were several fields which might benefit from this, I employed “behavior”, i.e.:
Code:
set the behavior of field "Risky" to the long id of button "Field Behavior"
... so now field “Risky” effectively has the script of button “Field Behavior” (my choice of button name was not helpful). This is a convenient way to have multiple objects, in this case fields “Safe”, “Risky”, “Observed” etc. all using the same script, that of the button “Field Behavior”. (It's worth mentioning that "behavior", like "color", has to have that spelling- I did forget.) To see the effect, click on any of the above-mentioned fields. They appear in the middle of the card, double height/width/textSize. Moving the mouse out of the field causes them to return to their original spot. To bypass the behavior, open the script of button “Field Behavior” and comment-in the line indicated in the “mouseUp” script. Another approach is to remove the behavior from an individual field. One way is to open its Object Inspector. At the bottom of the “Basic Properties” you should see “behavior”- delete the reference to button id 1158 of stack "SCPT2". You can also in the Message Box put:
Code:
set the behavior of fld "Risky" to empty
It was confusing to introduce this experimental element at the last minute.
@Bernd
The idea of using “numberFormat” is inspired, in that it reveals something of what is happening behind the scenes. I’m still mystified as to what to make of it, and can only conclude that:
"An 'inner process' stands in need of outward criteria"
Regards,
Michael
Re: Complicated loop function
Posted: Wed Mar 31, 2010 7:01 am
by Regulae
Attached is a version of the Parameter Testing stack which allows a<>b and d<>g to be tested. At the same time, the script structure has been improved somewhat.
Regards,
Michael