STAR
Seeking New Frontiers in Cost Modeling
Tim Menzies (WVU)
Jairus Hihn (JPL)
Oussama Elrawas (WVU)
Karen Lum (JPL)
Dan Baker (WVU)
22nd International Forum on COCOMO and Systems/Software Cost Modeling (2007)
STAR
STAR has three key advancements over traditional methods, and even over 2CEE:
– Provides an integrated set of COCOMO models
• COCOMO II
• COQUALMO
• COCOMO II Risk (threats) Assessment Model
– Can be used to systematically analyze strategic and tactical policy decisions
• Searches for the optimal combination of inputs that jointly reduces effort, defect rates, and threats
• Uses constraints to restrict the search
– Free, Floating, Fixed
– Can be tuned/calibrated with constraint sets instead of traditional historical data records
• Seek stable conclusions in the space of all tunings
• Abduction: view it as an alternative to Bayesian methods
STAR is an abductive inference engine that applies simulated annealing to a treatment learner (TAR)
Note
• This talk is an extension of material presented in "The Business Case for Automated Software Engineering"
– IEEE ASE 2007
– Menzies, Elrawas, Hihn, Feather, Madachy, Boehm
– http://menzies.us/pdf/07casease.pdf
Method
• Stagger across the space of known tunings and inputs (Monte Carlo)
• For N staggers, score the N runs by an index we call energy:
– Ef = (effort - minEffort) / (maxEffort - minEffort)
– De = (defects - minDefects) / (maxDefects - minDefects)
– Th = (threats - minThreats) / (maxThreats - minThreats)
– each term is normalized to 0 <= x <= 1
• Save the run with the lowest energy index
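To make the scoring step concrete, here is a minimal sketch in Python. The run data structure is a hypothetical placeholder, and the slide does not say how the three normalized scores combine into a single energy value, so the Euclidean combination below is an assumption rather than STAR's actual formula.

```python
import math

def normalize(x, lo, hi):
    """Map x into 0 <= x <= 1 using the observed min/max (as on the slide)."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def energy(run, lows, highs):
    """Combine normalized effort, defects, and threats into one index.
    NOTE: the Euclidean combination is an assumption, not from the slide."""
    ef = normalize(run["effort"],  lows["effort"],  highs["effort"])
    de = normalize(run["defects"], lows["defects"], highs["defects"])
    th = normalize(run["threats"], lows["threats"], highs["threats"])
    return math.sqrt(ef**2 + de**2 + th**2) / math.sqrt(3)

def best_run(runs):
    """Score N staggered runs and keep the one with the lowest energy."""
    keys = ("effort", "defects", "threats")
    lows  = {k: min(r[k] for r in runs) for k in keys}
    highs = {k: max(r[k] for r in runs) for k in keys}
    return min(runs, key=lambda r: energy(r, lows, highs))
```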
How to Stagger
• Simulated annealing (Von Neumann)
– Pick input ranges and internal values at random
– Do many runs
• starting from "boiling hot" (when you stagger around like a drunk)
• to "cooler" (no staggering; walk straight to your destination)
• Keep track of multiple solutions
– Current
– New
– Best
[Figure: sample runs from STAR, sorted from bad to good with the best 10% marked; after 500 runs, little improvement]
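A minimal sketch of the staggered search, assuming Python; the score, random_solution, and neighbor callables and the linear cooling schedule are illustrative placeholders, not the actual STAR implementation.

```python
import math
import random

def simulated_anneal(score, random_solution, neighbor, kmax=500, t0=1.0):
    """Generic simulated annealing: 'boiling hot' at first (accepts almost any
    move, i.e. staggers around), 'cooler' later (only accepts improvements).
    Keeps track of the current, new, and best solutions, as on the slide."""
    current = random_solution()
    e_cur = score(current)
    best, e_best = current, e_cur
    for k in range(kmax):
        t = t0 * (1 - k / kmax)              # simple linear cooling (an assumption)
        new = neighbor(current)              # stagger to a nearby solution
        e_new = score(new)
        if e_new < e_cur or random.random() < math.exp(-(e_new - e_cur) / max(t, 1e-9)):
            current, e_cur = new, e_new      # accept better moves, or worse ones while hot
        if e_cur < e_best:
            best, e_best = current, e_cur    # remember the best solution seen so far
    return best, e_best
```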
Staggering the Tunings
[Figure: range of effort multipliers (COCOMO)]
• COCOMO effort estimation
– Effort multipliers are straight(ish) lines
– when EM = 3 = nominal, multiply effort by one (i.e., nothing)
– i.e. they pass through the point {3,1}
• Increase effort: cplx, data, docu, pvol, rely, ruse, stor, time
• Decrease effort: acap, apex, ltex, pcap, pcon, plex, sced, site, tool
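A minimal sketch of what "staggering the tunings" can look like, assuming Python; the slope bounds are illustrative placeholders, not STAR's calibrated ranges.

```python
import random

def effort_multiplier(rating, slope):
    """An effort multiplier as a straight line through the nominal point {3, 1}:
    at rating 3 the multiplier is 1, i.e. multiply effort by one (nothing)."""
    return 1.0 + slope * (rating - 3)

def stagger_slope(kind):
    """Sample the line's slope at random instead of using one calibrated value.
    Positive slopes increase effort (e.g. cplx, rely, time); negative slopes
    decrease it (e.g. acap, pcap, site). The numeric bounds are illustrative."""
    return random.uniform(0.05, 0.20) if kind == "increase" else random.uniform(-0.20, -0.05)

# e.g. one randomly tuned 'cplx'-style multiplier, evaluated at rating 5:
print(effort_multiplier(5, stagger_slope("increase")))   # > 1, so effort goes up
```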
After staggering, select best things
• Sort all ranges by their "goodness"
– Try the first ranked range,
– then the first and second,
– then the first, second, and third,
– and so on
• Seek the "policy"
– the fewest ranges
– that most reduce
• threats,
• effort,
• defects
[Figure: candidate ranges sorted from bad to good; 38 not-so-good ideas vs. 22 good ideas]
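A minimal sketch of the policy search, assuming Python; ranked_ranges and evaluate are hypothetical stand-ins for the treatment learner's ranking and the Monte Carlo re-evaluation of the models.

```python
def select_policy(ranked_ranges, evaluate):
    """Greedily grow a policy from ranges already sorted by 'goodness':
    try the first range, then the first and second, and so on; keep the
    smallest prefix that gives the best (lowest) score seen."""
    best_score = float("inf")
    best_policy = []
    policy = []
    for rng in ranked_ranges:            # e.g. [("pmat", 5), ("acap", 5), ...]
        policy.append(rng)
        score = evaluate(policy)         # re-run the models with these settings fixed
        if score < best_score:
            best_score = score
            best_policy = list(policy)   # fewest ranges at the best score so far
    return best_policy, best_score
```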
Staggering the inputs : 5 different ways
1. COCOMO II: stagger over the entire model input space
[Figure: scenarios 2-5 constrain subsets of the inputs; "values" = fixed, "ranges" = loose (select within these ranges)]
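A minimal sketch of one way to represent "values" (fixed) versus "ranges" (loose) when staggering the inputs, assuming Python; the attribute names and bounds are illustrative.

```python
import random

# Hypothetical constraint set: a plain value pins an input ("fixed"),
# a (lo, hi) tuple leaves it loose ("select within these ranges").
constraints = {
    "pmat": 3,          # fixed value
    "acap": (3, 5),     # loose range
    "sced": (1, 5),     # effectively unconstrained
}

def stagger_inputs(constraints):
    """Draw one Monte Carlo sample of the model inputs."""
    sample = {}
    for name, c in constraints.items():
        if isinstance(c, tuple):
            lo, hi = c
            sample[name] = random.randint(lo, hi)   # loose: pick inside the range
        else:
            sample[name] = c                        # fixed: use the value as given
    return sample

print(stagger_inputs(constraints))
```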
Making Strategic Decisions
• All: full range of the model
• Flight, Ground: constrained by Jairus' guess at the JPL environment
[Figure: the strategic policy found for each constraint set; each policy fixes a handful of attributes such as pmat, acap, pcap, apex, ltex, prec, plex, pcon, and site, and every policy sets peer reviews, execution testing and tools, and automated analysis to 6]
Results : OSP
• One advantage of this output display:
– If you can't accept the full policy…
– …you can see what trade-offs arise with some partial policy
• But partial policies cannot include many choices. For example, note the missing values:
– Peer reviews < 6
– Execution testing & tools < 6
– Automated analysis < 5
Results: OSP2
• OSP2 was a more constrained environment, as it was a follow-on from OSP and 'inherited' the
– Team
– Development environment
– Design
– Etc.
• Again, note the missing values:
– Peer reviews < 6
– Execution testing & tools < 6
– Automated analysis < 6
Results: all experiments
• No point in half-hearted defect removal
– Never found in any policy:
• Peer reviews in 1,2,3,4
• Execution testing & tools in 1,2,3,4
• Automated analysis in 1,2,3,4
• Beware spurious generalities
– X = one of {cocomo, osp, osp2, flight, ground}
– Y = one of {cocomo, osp, osp2, flight, ground}
– Not(X = Y)
– X's best policy is not Y's best policy
– Exception…
• …Use more automated analysis (model checking, etc.)
– Automated analysis = 5 or 6 always in best policy
Calibrating/Tuning Models
• Traditional approach: current cost models are tuned to local contexts
– LC (Boehm, 1981)
• Tuned to local data using LC
• Hard to tell when old data is no longer locally relevant
• Suffers from the "large outlier problem"
• Row pruning done heuristically
• Next step
– 2CEE (Menzies, Jalali, Baker, Hihn, Lum, 2007)
• Tuned to local data using LC
• Tunes and validates every time it runs
• Tames outliers primarily with column pruning
• Uses nearest neighbor for row pruning
– Not all flight software is equal
– Culls old data that is no longer relevant
– Both of these approaches require that you get more data, which may be hard to obtain
• STAR
– Current research results suggest that we may be able to estimate almost as well without local data and LC
– Constrain parameter ranges based on
• the project being estimated
• knowledge of what typically varies in your environment
– Use est vs. actual instead of energy as the evaluation metric
– Assumes basic COCOMO tunings are 'representative'
• Seems reasonable
Comparisons
Mre = abs(predicted - actual) / actual
Diff = ∑ mre(lc) / ∑ mre(star)

             strategic   tactical
  ground        66%         63%
  all           91%         75%
  OSP2          99%        125%
  OSP          112%        111%
  flight       101%        121%

Half of the cells were marked "same" at 95% or 99% confidence (Mann-Whitney U test); the rest were marked "diff".
• Very little difference
– Half the time: insignificantly different
– Otherwise, median diffs = +/- 25%
• Why so little difference?
– Most influential inputs tightly constrained
– Most of the variance comes from uncertainty in the SLOC
• not from noise of internal staggering
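A minimal sketch of the comparison above, assuming Python with SciPy; the prediction and actual lists are hypothetical placeholders.

```python
from scipy.stats import mannwhitneyu

def mre(predicted, actual):
    """Magnitude of relative error, as defined on the slide."""
    return abs(predicted - actual) / actual

def compare(lc_preds, star_preds, actuals):
    """Diff = sum of LC's MREs over sum of STAR's MREs, plus a Mann-Whitney U
    test on the two MRE samples (the 'same at 95%/99% confidence' check)."""
    mre_lc   = [mre(p, a) for p, a in zip(lc_preds, actuals)]
    mre_star = [mre(p, a) for p, a in zip(star_preds, actuals)]
    diff = sum(mre_lc) / sum(mre_star)
    _, p_value = mannwhitneyu(mre_lc, mre_star, alternative="two-sided")
    return diff, p_value > 0.05          # True means "same" at 95% confidence

# Hypothetical usage with made-up numbers:
# diff, same = compare([110, 95, 80], [105, 90, 85], [100, 100, 100])
```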