This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.

If you fix everything you lose fixes for everything else

Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)
International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007
tim@menzies.us, oelrawas@mix.wvu.edu

What does this mean?
- A supposedly NP-hard task: abduction over first-order theories (nogood/2)
- Q: for what models does (a few peeks) = (many hard stares)?

A: models with “collars”
- “Collar” variables set the other variables
  - Narrows: Amarel in the 60s
  - Minimal environments: DeKleer ’85
  - Master variables: Crawford & Baker ’94
  - Feature subset selection: Kohavi & John ’97
  - Back doors: Williams et al ’03
  - Etc.
- If “collars”, then small rules, learned quickly, will suffice
- Grow
  - Monte Carlo a model, picking input settings at random
  - For each run: score each output, add the score to each input setting
- Harvest
  - Rule generation experiments, favoring settings with better scores
- Feather & Menzies, RE’02
- Implications for uncertainty?

For example
- STAR: collars + simulated annealing on Boehm’s USC software process models
- USC software process models for effort, defects, threats
  - y[i] = impact[i] * project[i] + b[i] for i in {1,2,3,…}
  - project[i] lies within a bounded range: uncertainty in project description (controllable)
  - impact[i] lies within a bounded range: uncertainty in model calibration (uncontrollable)
- Random solution: pick project[i] and impact[i] from anywhere in their ranges
  - project[i] range set via domain knowledge; e.g. process maturity in 3 to 5
  - impact[i] range known from history
- Score solution by effort (Ef), defects (De) and threat (Th)

Two studies: y[i] = impact[i] * project[i] + b[i]

One: certain methods
- Using much historical data
- Learn the magnitude of the impact[i] relationship
  - Regression-based tools that learn impact[i] from historical records
  - 93 records of JPL systems
- Tame uncontrollables via historical records
- With fixed impact[i], Monte Carlo at random across the project[i] settings
- E.g.
  - SCAT: JPL’s current methods
  - 2CEE: WVU’s improvement over SCAT (currently under test)

Two: methods with more uncertainty
- Using no historical data
- Monte Carlo at random across the project[i] settings and impact[i] settings
- E.g. STAR
  - Monte Carlo a model
  - Score each output
  - Sort settings by their “C” (“C” = cumulative score)
  - Rule generation experiments, favoring settings with better “C”

Inside STAR
1. Sampling: simulated annealing
2. Summarizing: post-processor
   - for setting Sx { value[setting] += E }
   - Sort all settings by their value
   - Ignore uncontrollables impact[i]
   - Assume the top (1 ≤ i ≤ max) project[i] settings
   - Randomly select the rest
   - “Policy point”: smallest i with lowest E
   - Median = 50th percentile
   - Spread = (75th - 50th) percentile
[Figure: settings sorted by value, from Good to Bad: 22 good ideas, 38 not-so-good ideas]

SCAT vs 2CEE vs STAR
- project[i]: varied by all three methods
- SCAT, 2CEE: control impact[i] via historical data
- STAR: stagger around a superset of possible impact[i]
[Figure: Flight (effort). Bars show median (50% point) and spread ((75 - 50)%) for SCAT, 2CEE and STAR; vertical axis 0 to 1600]
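
The summarizing steps listed under “Inside STAR” can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: plain Monte Carlo sampling stands in for simulated annealing, the three-attribute model and its ranges (`PROJECT_RANGE`, `IMPACT_RANGE`, `B`) are made up, settings are ranked by mean E rather than the raw cumulative sum, and when two ranked settings name the same attribute the better-ranked one wins.

```python
import random

# Hypothetical toy model: three project attributes with settings 1..5.
# The impact ranges are illustrative stand-ins for calibration uncertainty.
PROJECT_RANGE = {"p0": (1, 5), "p1": (1, 5), "p2": (1, 5)}
IMPACT_RANGE = {"p0": (8.0, 12.0), "p1": (0.5, 1.5), "p2": (0.1, 0.3)}
B = {"p0": 1.0, "p1": 1.0, "p2": 1.0}

def effort(project, impact):
    """E = sum over i of impact[i] * project[i] + b[i]; lower is better."""
    return sum(impact[k] * project[k] + B[k] for k in project)

def random_solution(rng, fixed=None):
    """Pick project[i] and impact[i] at random; `fixed` pins some project[i]."""
    fixed = fixed or {}
    project = {k: fixed.get(k, rng.randint(lo, hi))
               for k, (lo, hi) in PROJECT_RANGE.items()}
    impact = {k: rng.uniform(lo, hi) for k, (lo, hi) in IMPACT_RANGE.items()}
    return project, impact

def rank_settings(rng, runs=2000):
    """value[setting] += E for every run, then sort settings, lowest mean E first."""
    total, count = {}, {}
    for _ in range(runs):
        project, impact = random_solution(rng)
        e = effort(project, impact)
        for setting in project.items():          # setting = (attribute, value)
            total[setting] = total.get(setting, 0.0) + e
            count[setting] = count.get(setting, 0) + 1
    return sorted(total, key=lambda s: total[s] / count[s])

def policy_curve(rng, ranked, samples=500):
    """For i = 0..len(ranked): assume the top i settings, randomly select the rest."""
    curve = []
    for i in range(len(ranked) + 1):
        fixed = {}
        for attribute, value in ranked[:i]:
            fixed.setdefault(attribute, value)   # better-ranked setting wins
        es = sorted(effort(*random_solution(rng, fixed)) for _ in range(samples))
        median = es[len(es) // 2]                # 50th percentile
        spread = es[(3 * len(es)) // 4] - median # (75th - 50th) percentile
        curve.append((i, median, spread))
    return curve

rng = random.Random(1)
ranked = rank_settings(rng)
curve = policy_curve(rng, ranked)
# "Policy point": smallest i with the lowest median E (min keeps the first tie).
policy_point = min(curve, key=lambda row: row[1])[0]
```

In this toy model `p0` has by far the largest impact, so its lowest setting ranks first and pinning it alone removes most of the effort and most of the spread.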
[Figure: four bar-chart panels, Flight (effort), Ground (effort), OSP (effort) and OSP2 (effort), each showing median (50% point) and spread ((75 - 50)%) for SCAT, 2CEE and STAR]
- Flight: STAR/2cee = 50/800 = 6%; STAR/scat = 50/1300 = 4%
- Ratios shown beside the other three panels:
  - STAR/2cee = 30/620 = 5%; STAR/scat = 30/730 = 4%
  - STAR/2cee = 400/1600 = 25%; STAR/scat = 400/1900 = 21%
  - STAR/2cee = 180/400 = 45%; STAR/scat = 180/1900 = 60% [sic: 180/1900 is about 9%]
- Ignoring historical data is useful (!!!?)

If you fix everything, you lose fixes for everything else
- Luke, trust the force, I mean, collars
- “The strangest thing about software”, IEEE Computer, Jan 2007

Extra Material

Related work
- Abduction
  - World W = minimal set of assumptions (w.r.t. size) such that
    - T ∪ A => G
    - not(T ∪ A => error)
  - Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
  - Theoretically slow (NP-hard), but this should be practical:
    - Abduction + stochastic sampling
    - Find collars
    - Learn constraints on collars
- Feather, DDP, treatment learning
  - Optimization of requirement models
- Xerox PARC, 1980s, qualitative representations (QR)
  - Not overly specific
  - Quickly collected in a new domain
  - Used for model diagnosis and repair
  - Can find creative solutions in the larger space of possible qualitative behaviors than in the tighter space of precise quantitative behaviors

Possible optimizations (not used here)
- STAR, an example of a general process:
  - Stochastic sampling
  - Sort settings by “value”
  - Rule generation experiments favoring highly “value”-ed settings
- See also: elite sampling in the cross-entropy method
- If SA convergence is too slow
  - Try moving back select into the SA
  - Constrain solution mutation to prefer highly “value”-ed settings
- BORE (best or rest)
  - n runs
  - Best = top 10% of scores
  - Rest = remaining 90%
  - {a, b} = frequency of a discretized range in {best, rest}
  - Sort settings by -1 * (a/n)^2 / (a/n + b/n) (ask me why, off-line)
- Other valuable tricks:
  - Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
  - Limited discrepancy search: Harvey & Ginsberg
  - Treatment learning: Menzies & Yu

“Uncertainty helps planning” (questions? comments?)

At the “policy point”, STAR’s random solutions are surprisingly accurate
- LC: learn impact[i] via regression (JPL data)
- STAR: no tuning, randomly pick impact[i]
- Diff = ∑ mre(lc) / ∑ mre(star)
- Mre = abs(predicted - actual) / actual

∑ mre(lc) / ∑ mre(star):

          strategic  tactical
ground        66%       63%
all           91%       75%
OSP2          99%      125%
OSP          112%      111%
flight       101%      121%

- Cells are marked “diff” or “same”, where same = same at {95, 99}% confidence (MWU)
- Why so little Diff (median = 75%)?
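
The Mre and Diff definitions on the “policy point” slide translate directly into code. The sketch below uses made-up predictions and actuals purely to show the arithmetic; none of the numbers come from the JPL data, and the function names `mre` and `diff` are chosen here for illustration.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) / sum of mre(star), over the same projects."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Made-up numbers for illustration only.
actuals = [100.0, 200.0, 400.0]
lc_pred = [110.0, 180.0, 400.0]    # MREs: 0.10, 0.10, 0.00
star_pred = [120.0, 200.0, 360.0]  # MREs: 0.20, 0.00, 0.10
ratio = diff(lc_pred, star_pred, actuals)  # 0.2 / 0.3, about 67%
```

By construction, a Diff below 100% means LC accumulated less relative error than STAR, a Diff above 100% means the opposite, and a Diff near 100% means the two were about equally accurate.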
  - Most influential inputs tightly constrained

(Model uncertainty = collars) << inputs
- In many models, a few “collar” variables set the other variables
  - Narrows (Amarel in the 60s)
  - Minimal environments (DeKleer ’85)
  - Master variables (Crawford & Baker ’94)
  - Feature subset selection (Kohavi & John ’97)
  - Back doors (Williams et al ’03)
  - See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)
- Collars appear in all execution traces (by definition)
  - You don’t have to find the collars, they’ll find you
- So, to handle uncertainty
  - Write a simulator
  - Stagger over uncertainties
  - From stagger, find collars
  - Constrain collars
- This talk: a very simple example of this process

Comparisons
- Standard software process modeling
  - Models written more than run (PROSIM community)
  - Limited sensitivity analysis
  - Limited trade space
  - Point solutions
  - Or, expensive, error-prone, incomplete data collection programs
- Here:
  - No data collection
  - Found stable conclusions within a space of possibilities
  - Search: very simple
  - Solution: not brittle
  - With trade-off space
[Figure: 22 good ideas, sorted, on a scale from Good to Bad]

Summary
- Living with uncertainty
  - Sometimes simpler than you may think
  - Sometimes more useful than you might think
- Simple:
  - Here, the smallest change to simulated annealing
- Useful:
  - Sometimes uncertainty can teach you more than certainty
  - If you fix everything, you lose fixes to everything else
- Collars control certainty
  - Uncertainty plus constrained collars → more certainty
  - Also, can drive model to better performance
- An example you can explain to any business user
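
The four-step recipe from the collars slide (write a simulator, stagger over uncertainties, find collars, constrain collars) can be tried on a toy problem. The sketch below is an assumption-laden illustration rather than anything from the talk: `model` is a hypothetical simulator in which input 0 is deliberately the collar, and the collar is found with a literal reading of the BORE ranking from the “Possible optimizations” slide, with a and b counted per (input, setting) pair and n the total number of runs.

```python
import random

def model(x):
    """Hypothetical simulator: x[0] is the collar, the other inputs barely matter."""
    return 100 * x[0] + 2 * x[1] + x[2] + x[3]

def stagger(rng, n, fixed=None):
    """Monte Carlo over four inputs with settings 0..4; `fixed` pins some inputs."""
    fixed = fixed or {}
    runs = []
    for _ in range(n):
        x = [fixed.get(i, rng.randint(0, 4)) for i in range(4)]
        runs.append((model(x), x))
    return runs

def bore(runs):
    """Rank (input, setting) pairs by (a/n)^2 / (a/n + b/n), highest first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda r: r[0])  # lower score is better
    cut = n // 10                               # best = top 10% of runs
    a, b = {}, {}
    for rank, (_, x) in enumerate(ordered):
        counts = a if rank < cut else b         # best or rest
        for setting in enumerate(x):            # setting = (input index, value)
            counts[setting] = counts.get(setting, 0) + 1

    def rank_score(s):
        fa, fb = a.get(s, 0) / n, b.get(s, 0) / n
        return fa ** 2 / (fa + fb)

    return sorted(set(a) | set(b), key=rank_score, reverse=True)

def spread(runs):
    """(75th - 50th) percentile of the scores."""
    scores = sorted(score for score, _ in runs)
    return scores[(3 * len(scores)) // 4] - scores[len(scores) // 2]

rng = random.Random(0)
before = stagger(rng, 1000)                     # stagger over all uncertainties
collar = bore(before)[0]                        # find the top-ranked setting
after = stagger(rng, 1000, fixed={collar[0]: collar[1]})  # constrain it, re-stagger
```

Comparing `spread(before)` with `spread(after)` shows the point of the summary slide in miniature: the other three inputs stay fully random, yet pinning the one collar setting collapses most of the variation in the output.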