Conflict Catcher in the Rye: Testing Results

**Draft**

Notes: Still need to identify the max_pw_length parameters used in each of the tests. Not in email? Retrievable from wetware?

Types of Tests Performed:
• Tests done with 1 day, 2 day, 4 day, and 7 day max_pw size parameters.
• One round of tests done to vary the range of max_pw size.
• Another round of tests done to try using not_after set to a value < 1.0 (OPR47083).
• One test done to try setting the not-after date weight to less than 1 and using it to suppress the CC's visit planning beyond a certain date (cycle end).

Files in /data/operational1/control/ modified for PWCC runs:
• pwcc-control-file.lisp (as in load-a-control-file; file last modified Oct 14)
• pwcc-test.lisp (set up Oct 14 for one of the subsequent tests, with 2 iterations: max-pw-width-in-days 2 and then max-pw-width-in-days 7)
• plan-window-conflict-parameters.lisp (scripting of the CC; last updated Oct 25)

LRPs created:
02_218PWC01 (August 6)
02_218PWC02
02_226PWC01 (August 14)
02_226PWC02
02_282PWC01 (October 9)
02_282PWC02
02_288PWC01 (October 15)
02_288PWC02

Table cleanup has made those of the above format disappear. But still around are:
|CC02|
|CC03|
|CC04|
|CC05|
|CCTEST02|
|CCTEST02B|
|CCTEST02C|
|CCTEST02D|

Compilation of notes from the individual CC test run analyses:

Wednesday, August 6: Early test of the conflict catcher. History of parameter settings lost.
02_218AUTO_02_218PWC01.DIFFERENCE;1
02_218PWC01_02_218PWC02.DIFFERENCE;1

Thursday, August 14: Early test of the conflict catcher. History of parameter settings lost.
02_226AUTO_02_226PWC01.DIFFERENCE;1
02_226PWC01_02_226PWC02.DIFFERENCE;1

Thursday, October 9 run: Test of dual serial iterations using different max_pw sizes, i.e., of what happens in an LRP when the CC is run several times in sequence with differing max_pw_length parameters. Were the run values 2 days and then 7 days, 2 days/4 days, 1 day/2 days, or something else?
I think they were either 2/7, 7/2, 7/1, or 1/7.
02_282AUTO_02_282PWC01.DIFFERENCE;1
• 4 2003 GOODS visits lost PWs but remained schedulable. History lost. If there are no PWs and the visit is schedulable... does this state make sense?? OPR needed if not already filed. I thought I discussed this one with Drew, or maybe even at a users meeting....
• AV noted that suppressed resources circa 02.360 caused 9503 to be moved a year later by the CC.
• 2 visits in 9382 moved ~6 months early. Reason not identified.
• Numerous small changes identified, most ‘probably ok’.
02_282PWC01_02_282PWC02.DIFFERENCE;1
• Several “GOOD!” comments!
• But 9290 visits 01-08 lost PWs though they were still ‘schedulable’. I thought an OPR was filed on this, but I see no record of it being open.
• Things moving because of low resources in “cycle 12”.
• Change not-after date?

Wednesday, October 15: Test of dual serial iterations using different max_pw sizes and a different resource control strategy. This was essentially a repeat of the previous week's run (with, I think, identical numbers of iterations and max_pw sizes), except that we tried a different way of controlling where the conflict catcher would move visits. The problem was that the conflict catcher chose visits to move based on local violations of the resource levels, which are conventionally set lower than the nominal limits in order to suppress visits coming into those reduced-subscription areas. In this run, we set the not-after date earlier by about 6 months (to 03.181, as I recall), set its weight to a value < 1.0, and raised the resources to their maximum levels, thinking this would allow the CC to bias replanning of visits to before the not-after date. It did not, since SPIKE treats the not-after date as a kind of hard constraint regardless of its weight (not strictly true, but see the OPR47083 comments for the details). The result was that in the 02.288AUTO, visits and linksets beyond that date dropped away as unschedulable.
02_288AUTO_02_288PWC01.DIFFERENCE;1
Remembering that the 02.288AUTO is compromised by the lack of visits planned beyond 03.181, this shows the difference after the first PWC run.
02_288PWC01_02_288PWC02.DIFFERENCE;1
• Second iteration, using a different max_pw_length.
• Dropped 9290 visits' windows again. Appears to be the window size.
• Several “GOOD” comments.
• With 9503, Alison highlights that the CC is truncating PW-lets at the beginning of a plan window. Should these be left alone, since there are future opportunities as well, or do we not even want those to show?

Test Conclusions:

October 15: Stability/Convergence Check
These were sequential iterations, with LRPs written out in between, to check convergence toward a stable (or unstable) solution using the same CC parameters each time. All iterations used the same max_pw_width = 2 days.
02287A_CCTEST02.DIFFERENCE;1
• First iteration.
• 4 GOODS 9583 visits became unplanned because SPIKE didn't retry after the CC's attempted move.
• 966707 became unschedulable, probably as a result of PC changes?
• 3 9503 visits jumped a year.
• Only 2 other jumps larger than 1 month; about a dozen smaller than that.
CCTEST02_CCTEST02B.DIFFERENCE;1
• Second iteration.
• The 4 GOODS visits that lost their windows in the 1st iteration got them back!
• 4 visits changed flight ready dates.
• 1 visit ‘flipped’ back to its previous spot (0950901).
CCTEST02_CCTEST02C.DIFFERENCE;1
• Third iteration.
• 1 visit flipped back: 0940172.
• 1 new move: 0940149.
CCTEST02_CCTEST02D.DIFFERENCE;1
• Fourth iteration.
• 2 visits flipped back: 0938274 & 0950901.
Test Conclusions: The conflict catcher does not produce entirely stable moves. Over repeated iterations, visits can flip back and forth, and can lose and then regain their windows without a predictable pattern (e.g., 9583).

October 16: Varied max_plan_window Size Check
A set of scripts run from /home/lrp/cycle11/cctest. CC0# stands for a max_pw_width of # days.
These are ‘parallel’ runs, as opposed to sequential ones.
02287A_CC02.DIFFERENCE;1
• Didn't make the 3 G & H 9583 visits move at all.
• 09583F7, F8, G0, G1 lost PWs, but remained schedulable!
02287A_CC03.DIFFERENCE;1
• 12-hour windows for the 3 G & H 9583 visits.
• The 4 9583 F & G visits kept their PWs!!!
02287A_CC04.DIFFERENCE;1
• Visits 09583G2, G7, H0 turned into 0-second PWs.
• 09583F7, F8, G0, G1 lost PWs, but remained schedulable!
02287A_CC05.DIFFERENCE;1
• Visits 09583G2, G7, H0 turned into 0-second PWs.
• 09583F7, F8, G0, G1 lost PWs, but remained schedulable!
Test Conclusions: Results for particular visits are sensitive to the max_pw_length, but not necessarily in a predictable, linear fashion.

A. Vick's summary of shortcomings of the conflict catcher post-analysis, with peanut gallery thoughts:

Date: Wed, 20 Nov 2002 15:26:28 -0500 (EST)
From: Alison Vick <sherwin@stsci.edu>
To: jordan@stsci.edu
Subject: conflict catcher short comings

Our resources are often used to control things like the end of cycle, or a region we want undersubscribed, or at least not have new things moved into it. Thus, the resources used by the conflict catcher are often not what is really available. This inconsistency causes unnecessary conflicts and moves.
>>ij: Can SPIKE be told to ignore either the visits or the areas? Should an alternate resource parameter file input be made possible? An override? For example, could we simply do a (load "/data/operational1/control/mean-resources") on the very first line of the plan-window-conflict-parameters.lisp file, before the define-conflict-repair-iterations method is called? It appears that these new resources may be specifiable in plan-window-conflict-parameters.lisp.

Sometimes visits lose plan windows. Always unacceptable.
>>ij: Need to identify why SPIKE lost the PWs. What visits?
• Reason 1: the CC couldn't assign new PWs and does not try to put them back. OPR needs filing.
• Are there other reasons?
I noticed at least one time when the conflict catcher made zero-second plan windows. Not good.
>>ij: Need to identify the case and determine why. I seem to recall that the only candidate window available was a 0-sec window as passed through the critics??

There should be a reason written out in the LRP, “moved because of conflict catcher”, so that the PW moves can be easily identified.
>>ij: OPR46799.

On back-to-back runs (or possibly runs on subsequently released LRPs), some visits tend to “flip” back and forth between regions.
>>ij: A visit violates resources in one area, so SPIKE moves it... regardless of whether it then violates the new region's resources AFTER visits are placed there. The next SPIKE run undoes what was done. Check needed? SPIKE OPR?

4 days looked good for the max-plan-window size. But maybe we really want 1-2 days, since this tool is looking for small-window “absolute” conflicts, not just small area oversubscription.
>>ij: Or multiple runs, as is currently being done: :max-pw-width-in-days 1 and then :max-pw-width-in-days 2. Question: does the order matter? Which is better, 2,1 or 1,2?

False positives are generated because the resources are conservative. This is independent of the first point about non-realistic resources, in that even in the regions of highest resources, there are really more orbits available than we tell SPIKE. Is this why we saw 9503 continually moving?
>>ij: See earlier comment: this may be solvable in two ways, either in the pwcc-control-file.lisp or the “pwcc-test” file.
<end email>

Overall summary: The conflict catcher is not ready for daily automation. Numerous changes would be necessary before it could be considered for that role. However, manual supervision of the changes on a weekly schedule (during baselining) has been made part of standard procedure, though this adds time to the baselining process, since the changes need to be reviewed individually.
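Since each change currently has to be reviewed by hand, a simple triage pass could put the most serious kinds of change seen in these tests (lost plan windows, 0-second windows, moves of more than a month) at the top of the review list. The Python sketch below is hypothetical: it assumes the changes have already been extracted into records, and the VisitChange fields, the 30-day threshold, and the category names are illustrative assumptions that do not reflect the actual .DIFFERENCE file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical plan window: (start_day, end_day); None means "no plan window".
Window = Optional[Tuple[float, float]]


@dataclass
class VisitChange:
    """One visit's change between two LRPs (field names are illustrative only)."""
    visit: str
    old_pw: Window
    new_pw: Window


LARGE_MOVE_DAYS = 30.0  # assumed cut-off for "jumps larger than 1 month"

# Lower number = review first.
SEVERITY = {"lost-pw": 0, "zero-length-pw": 1, "large-move": 2,
            "other": 3, "small-move": 4}


def classify(change: VisitChange) -> str:
    """Assign a review category to a single change."""
    if change.old_pw is not None and change.new_pw is None:
        return "lost-pw"          # window dropped by the CC run
    if change.new_pw is not None and change.new_pw[1] <= change.new_pw[0]:
        return "zero-length-pw"   # 0-second (or inverted) plan window
    if change.old_pw is None or change.new_pw is None:
        return "other"            # e.g. a visit that gained a window
    shift = abs(change.new_pw[0] - change.old_pw[0])
    return "large-move" if shift > LARGE_MOVE_DAYS else "small-move"


def triage(changes: List[VisitChange]) -> List[Tuple[str, VisitChange]]:
    """Return (category, change) pairs, most severe first; ties keep input order."""
    labelled = [(classify(c), c) for c in changes]
    return sorted(labelled, key=lambda pair: SEVERITY[pair[0]])


if __name__ == "__main__":
    # Made-up example records, not taken from the actual test runs.
    example = [
        VisitChange("D", (50.0, 52.0), (55.0, 57.0)),
        VisitChange("C", (100.0, 102.0), (465.0, 467.0)),
        VisitChange("A", (10.0, 12.0), None),
        VisitChange("B", (20.0, 22.0), (25.0, 25.0)),
    ]
    for category, change in triage(example):
        print(f"{category:15s} {change.visit}")
```

Because Python's sort is stable, changes within the same category stay in the order they were extracted, so a reviewer still sees related visits together.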
It appears that iterated runs using different (but still small, apparently < 5 days) values of max_pw_length are the current best identified technique for catching as many conflicts as possible without losing too many of them to conflict intervals being ‘merged’ into overly large intervals.

Major items which need to be addressed in the use and function of the conflict catcher are:
• What is the best way to specify new resources to be loaded when the CC runs?
• The CC needs to ‘clean up’ unplanned visits (try putting them back).
• 0-second plan windows (generic SPIKE problem? Critics?)
• ‘Flipping’ of visits in sequential runs.
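To make the ‘flipping’ item easier to track during weekly review, the hypothetical Python sketch below compares plan-window positions across sequential CC iterations (as in the October 15 stability check) and flags visits that return to a state they held before the previous iteration. Losing and then regaining a window counts as a flip-back, matching the 9583 behavior noted above. The snapshot format (visit id mapped to plan-window start day, or None when the visit has no window) is an assumption; extracting it from the LRPs or .DIFFERENCE files is not shown.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot of one LRP iteration: visit id -> plan-window start
# day, or None if the visit has no plan window in that iteration.
Snapshot = Dict[str, Optional[float]]


def find_flip_backs(snapshots: List[Snapshot]) -> Dict[int, List[str]]:
    """Map iteration index -> visits that changed in that iteration back to a
    state (position or missing window) they had before the previous iteration.

    snapshots[0] is the starting LRP; snapshots[k] is the LRP after the k-th
    sequential CC run.
    """
    flips: Dict[int, List[str]] = {}
    for i in range(2, len(snapshots)):
        prev, cur = snapshots[i - 1], snapshots[i]
        flipped = []
        for visit, state in cur.items():
            if visit not in prev or prev[visit] == state:
                continue  # visit is new, or unchanged by this iteration
            # States the visit held in iterations 0 .. i-2.
            earlier = {snap[visit] for snap in snapshots[: i - 1] if visit in snap}
            if state in earlier:
                flipped.append(visit)
        if flipped:
            flips[i] = sorted(flipped)
    return flips


if __name__ == "__main__":
    # Made-up example: A flips back, B regains its window, C moves then flips back.
    runs = [
        {"A": 10.0, "B": 20.0, "C": 30.0},  # starting LRP
        {"A": 40.0, "B": None, "C": 30.0},  # after CC run 1
        {"A": 10.0, "B": 20.0, "C": 35.0},  # after CC run 2
        {"A": 10.0, "B": 20.0, "C": 30.0},  # after CC run 3
    ]
    print(find_flip_backs(runs))  # {2: ['A', 'B'], 3: ['C']}
```

This compares each iteration with its immediate predecessor; the October 15 difference files instead compared later iterations against CCTEST02, so a direct comparison with those listings would need the snapshots lined up the same way.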