Software Cost Estimation
Strictly speaking, effort!
Department of Computer Engineering, Gangneung National University
Ki-Tae Kwon
Agenda
1. Background
2. “Current” techniques
3. Machine learning techniques
4. Assessing prediction systems
5. Future avenues
1. Background
Scope:
 software projects
 early estimates
 effort ≠ cost
 estimate ≠ expected answer
What the Papers Say...
From Computing, 26 November 1998:
MoD project loses £34m: Defence system never worked
The Ministry of Defence has been forced to write off £34.6 million on an IT project it commissioned in 1988 and abandoned eight years later, writes Joanne Wallen. The Trawlerman system, designed ...
The Problem
Software developers need to predict, e.g.
 effort, duration, number of features
 defects and reliability
But ...
little systematic data
noise and change
complex interactions between variables
poorly understood phenomena
So What is an Estimate?
An estimate is a prediction based upon probabilistic assessment.
[Figure: probability density p over effort, peaking at the most likely value, with equal probability of under- / over-estimate either side of the median]
Some Causes of Poor Estimation
We don’t cope with political problems that hamper the process.
We don’t develop estimating expertise.
We don’t systematically use past experience.
Tom DeMarco, Controlling Software Projects: Management, Measurement and Estimation. Yourdon Press: NY, 1982.
2. “Current” Techniques
Essentially a software cost estimation system is an input vector mapped to an output.
 expert judgement
 COCOMO
 function points
 DIY models
Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.
2.1 Expert Judgement
 Most widely used estimation technique
 No consistently “best” prediction system
 Lack of historical data
 Need to “own” the estimate
 Experts plus … ?
Expert Judgement Drawbacks
BUT
Lack of objectivity
Lack of repeatability
Lack of recall / awareness
Lack of experts!
Preferable to use more than one expert.
What Do We Know About Experts?
Most commonly practised technique.
Dutch survey revealed 62% of estimators used intuition supplemented by remembered analogies.
UK survey: time to estimate ranged from 5 minutes to 4 weeks.
US survey found that the only factor with a significant positive relationship with accuracy was responsibility.
Information Used
Design requirements
Resources available
Base product/source code (enhancement projects)
Software tools available
Previous history of product
...
Information Needed
Rules of thumb
Available resources
Data on past projects
Feedback on past estimates
...
Delphi Techniques?
Methods for structuring group communication processes to solve complex problems.
Characterised by:
iteration
anonymity
Devised by the Rand Corporation (1948). Refined by Boehm (1981).
Stages for Delphi Approach
1. Experts receive spec + estimation form
2. Discussion of product + estimation issues
3. Experts produce individual estimate
4. Estimates tabulated and returned to experts
5. Only each expert's own estimate is identified to them; the rest stay anonymous
6. Experts meet to discuss results
7. Estimates are revised
8. Cycle continues until an acceptable degree of convergence is obtained
Wideband Delphi Form
Project: X134
Date: 9/17/03
Estimator: Hyolee
Estimation round: 1
[Scale from 0 to 50 with estimates marked]
Key: x = estimate; x* = your estimate; x! = median estimate
Observing Delphi Groups
Four groups of MSc students
Developing a C++ prototype for some simple scenarios
Requested to estimate the size of the prototype (number of delimiters)
Initial estimates followed by 2 group discussions
Recorded group discussions plus scribes
Delphi Size Estimation Results
Absolute errors:

Estimation   Mean   Median   Min   Max
Initial      371    160.5    23    2249
Round 1      219    40       23    749
Round 2      271    40       3     949
Converging Group
[Chart: successive group estimates converging toward the true size]
A Dominant Individual
[Chart: group estimates converging on a dominant individual's estimate, away from the true size]
2.2 COCOMO
Best known example of an algorithmic cost model. A series of three models: basic, intermediate and detailed.
The models assume relationships between:
size (KDSI) and effort
effort and elapsed time
MM = a * KDSI^b
TDEV = c * MM^d
Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.
http://sunset.usc.edu/COCOMOII/cocomo.html
COCOMO contd.
Model coefficients are dependent upon the type of project:
organic: small teams, familiar application
semi-detached: intermediate between organic and embedded
embedded: complex organisation, software and/or hardware interactions
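To make the model concrete, here is a minimal Python sketch of basic COCOMO using the coefficient values Boehm published for the three modes; the 32 KDSI example project is invented.

COEFFS = {
    # mode: (a, b, c, d)
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kdsi, mode="organic"):
    """Return (effort in person-months, duration in months)."""
    a, b, c, d = COEFFS[mode]
    mm = a * kdsi ** b      # MM = a * KDSI^b
    tdev = c * mm ** d      # TDEV = c * MM^d
    return mm, tdev

mm, tdev = basic_cocomo(32, "organic")   # invented example project
print(f"effort = {mm:.1f} PM, duration = {tdev:.1f} months")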
COCOMO Cost Drivers
• product attributes
• computer attributes
• personnel attributes
• project attributes
Drivers hard to empirically validate.
Many are inappropriate for the 1990s, e.g. database size.
Drivers are not independent, e.g. MODP and TOOL.
COCOMO Assessment
Very influential, non-proprietary model.
Drivers help the manager understand the impact of different factors upon project costs.
Hard to port to different development environments without extensive recalibration.
Vulnerable to mis-classification of development type.
Hard to estimate KDSI at the start of a project.
2.3 What are Function Points?
A synthetic (indirect) measure of the attribute functionality, derived from a software requirements specification. This conforms closely to our notion of specification size.
Uses:
effort prediction
productivity
Function Points (a brief history)
Albrecht developed FPs in the mid 1970s at IBM.
A measure of system functionality as opposed to size.
Weighted count of function types derived from the specification:
interfaces
inquiries
inputs / outputs
files
A. Albrecht and J. Gaffney, “Software function, source lines of code, and development effort prediction: a software science validation,” IEEE Transactions on Software Engineering, vol. 9, pp. 639-648, 1983.
C. Symons, “Function Point Analysis: Difficulties and Improvements,” IEEE Transactions on Software Engineering, vol. 14, pp. 2-11, 1988.
Function Point Rules
Weighted count of different types of functions:
external input types (4) e.g. file names
external output types (5) e.g. reports, msgs.
inquiries (4) i.e. interactive inputs needing a response
external files (7) i.e. files shared with other software systems
internal files (10) i.e. invisible outside system
The unadjusted count (UFC) is the weighted sum of the counts of each type of function.
Function Types
Type               Simple   Average   Complex
External input     3        4         6
External output    4        5         7
Logical int. file  7        10        15
Ext. interface     5        7         10
Ext. inquiry       3        4         6
Adjusted FPs
14 factors contribute to the technical complexity factor (TCF), e.g. performance, on-line update, complex interface.
Each factor is rated 0 (n.a.) to 5 (essential).
TCF = 0.65 + (sum of factors)/100
Thus TCF may range from 0.65 to 1.35, and
FP = UFC * TCF
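A minimal Python sketch of the UFC count and the TCF adjustment, using the average weights from the table above; the example counts and factor ratings are invented.

WEIGHTS = {
    # type: (simple, average, complex)
    "external input":  (3, 4, 6),
    "external output": (4, 5, 7),
    "logical file":    (7, 10, 15),
    "ext. interface":  (5, 7, 10),
    "ext. inquiry":    (3, 4, 6),
}

def unadjusted_fp(counts):
    """counts maps function type -> (n simple, n average, n complex)."""
    return sum(n * w
               for ftype, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[ftype]))

def adjusted_fp(ufc, ratings):
    """ratings: the 14 TCF factor scores, each 0 (n.a.) to 5 (essential)."""
    tcf = 0.65 + sum(ratings) / 100     # ranges from 0.65 to 1.35
    return ufc * tcf

ufc = unadjusted_fp({"external input": (4, 2, 0), "external output": (3, 1, 1),
                     "logical file": (0, 2, 0), "ext. interface": (1, 0, 0),
                     "ext. inquiry": (2, 0, 0)})
print(ufc, adjusted_fp(ufc, [3] * 14))  # 75, and 75 * 1.07 = 80.25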
Technical Complexity Factors
Data communications
Distributed functions
Performance
Heavily used configuration
Transaction rate
Online data entry
End user efficiency
Online update
Complex processing
Reusability
Installation ease
Operational ease
Multiple sites
Facilitate change
Function Points and LOC
Language          LOC per FP
Assembler         320
C                 150 (128)
COBOL             106 (105)
Modula-2          71 (80)
4GL               40 (20)
Query languages   16 (13)
Spreadsheet       6
(figures in parentheses are the second source's values)
Behrens (1983), IEEE TSE 9(6).
C. Jones, Applied Software Measurement. McGraw-Hill (1991).
FP Based Predictions
Simplest form is:
effort = FC + p * FP
Need to determine local productivity, p, and fixed costs, FC.
[Scatter plot: Effort v FPs at XYZ Bank; FP axis 500 to 2,000, effort axis up to 40,000]
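A sketch of deriving p and FC locally by least squares; numpy is assumed and the five project data points are invented.

import numpy as np

fp     = np.array([500.0, 800.0, 1100.0, 1500.0, 2000.0])
effort = np.array([9000.0, 14000.0, 19500.0, 26000.0, 35000.0])

p, fc = np.polyfit(fp, effort, 1)   # fit effort = FC + p * FP
print(f"effort = {fc:.0f} + {p:.2f} * FP")
print("prediction for 1200 FP:", fc + p * 1200)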
All environments are not equal
Productivity figures in FPs per 1000 hours:
IBM 29.6
Finnish 99.5
Canada 58.9
Mermaid 37.0
US 28.5
Differences reflect factors such as training, personnel, management, techniques, tools, applications, etc.
Function Point Users
Widely used, (e.g. government, financial
organisations) with some success:
monitor team productivity
cost estimation
Most effective where the environment is homogeneous
Variants include Mk II Function Points and Feature Points
Function Point Weaknesses
Subjective counting (Low and Jeffery report 30% variation between different analysts).
Hard to automate.
Hard to apply to maintenance work.
Not based upon organisational needs, e.g. is it productive to produce functions irrelevant to the user?
Oriented to traditional DP type applications.
Hard to calibrate.
Frequently leads to inaccurate prediction systems.
Function Point Strengths
The necessary data can be available early on in a project.
Language independent.
Layout independent (unlike LOC)
More accurate than estimated LOC?
What is the alternative?
2.4 DIY models
[Scatter plot: ACT against FILES; FILES axis 75 to 225, ACT axis up to 1,000]
Predicting effort using number of files
A Non-linear Model
To introduce an economies or diseconomies of scale exponent:
effort = p * S^e, where e > 0
An empirical study of 60 projects at IBM Federal Systems Division during the mid 1970s concluded that effort could be modelled as:
effort (PM) = 5.2 * KLOC^0.91
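Evaluated directly in Python, this model reproduces the productivity figures tabulated on the next slide; note that with an exponent below 1, the modelled productivity (KLOC/PM) improves with size.

def walston_felix_effort(kloc):
    """Effort in person-months for a size in KLOC."""
    return 5.2 * kloc ** 0.91

for kloc in (10, 20, 50, 100, 1000):
    pm = walston_felix_effort(kloc)
    print(f"{kloc:5d} KLOC -> {pm:8.2f} PM, {kloc / pm:.2f} KLOC/PM")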
Productivity and Size
Effort (PM)   Size (KLOC)   KLOC/PM
42.27         10            0.24
79.42         20            0.25
182.84        50            0.27
343.56        100           0.29
2792.57       1000          0.36
Productivity and Project Size using the Walston and Felix Model
Productivity v Size
Bespoke is Better!
Model                 Researcher         MMRE
Basic COCOMO          Kemerer            601%
FP                    Kemerer            103%
SLIM                  Kemerer            772%
ESTIMACS              Kemerer            85%
COCOMO                Miyazaki & Mori    166%
Intermediate COCOMO   Kitchenham         255%
So Where Are We?
• A major research topic.
• Poor results “off the shelf”.
• Accuracy improves with calibration but still mixed.
• Needs accurate, (largely) quantitative inputs.
3. Machine Learning Techniques
A new area but demonstrating promise.
The system “learns” how to estimate from a training set.
Doesn’t assume a continuous functional relationship.
In theory, more robust against outliers and able to capture more flexible types of relationship.
Du Zhang and Jeffrey Tsai, “Machine Learning and Software Engineering,” Software Quality Journal, vol. 11, pp. 87-119, 2003.
Different ML Techniques
Case based reasoning (CBR) or analogical reasoning
Neural nets
Neuro-fuzzy systems
Rule induction
Meta-heuristics, e.g. GAs, simulated annealing
Case Based Reasoning
[Diagram: the CBR cycle. A new problem enters as a new case; RETRIEVE matches it against previous cases and general knowledge to give a retrieved case; REUSE produces a solved case with a suggested solution; REVISE tests/repairs it into a confirmed solution; RETAIN stores the new case]
Using CBR
Characterise a project, e.g.
no. of interrupts
size of interface
development method
Find similar completed projects
Use completed projects as a basis for the estimate (with adaptation)
Problems
Finding the analogy, especially in a large organisation.
Determining how good the analogy is.
Need for domain knowledge and expertise for case adaptation.
Need for systematically structured data to represent each case.
ANGEL
ANaloGy Estimation tooL (ANGEL)
http://dec.bmth.ac.uk/ESERG/ANGEL/
ANGEL Features
Shell
n features (continuous or categorical)
Brute force search for the optimal subset of features: O(2^n - 1)
Measures Euclidean distance (standardised dimensions)
Uses k nearest cases
Simple adaptation strategy (weighted mean); with k=1 it becomes a nearest-neighbour technique
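A minimal sketch of ANGEL-style estimation by analogy: standardise the features, take the k nearest completed projects by Euclidean distance, and adapt with a weighted mean. Numpy is assumed and all project data are invented.

import numpy as np

def estimate_by_analogy(features, efforts, target, k=3):
    """features: (projects, features) array; efforts: known efforts."""
    mu, sigma = features.mean(axis=0), features.std(axis=0)
    z, zt = (features - mu) / sigma, (target - mu) / sigma   # standardise dimensions
    dist = np.sqrt(((z - zt) ** 2).sum(axis=1))              # Euclidean distance
    nearest = np.argsort(dist)[:k]                           # k nearest cases
    w = 1.0 / (dist[nearest] + 1e-9)                         # weighted-mean adaptation
    return float((w * efforts[nearest]).sum() / w.sum())

past = np.array([[120, 4, 2], [300, 9, 5], [150, 5, 2],
                 [450, 12, 8], [200, 6, 3]], dtype=float)    # e.g. FP, files, screens
effort = np.array([9.0, 28.0, 12.0, 45.0, 16.0])             # person-months
print(estimate_by_analogy(past, effort, np.array([180.0, 5.0, 3.0]), k=2))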
CBR Results
A study of 275 projects from 9 datasets suggests that CBR outperforms more traditional statistical methods, e.g. stepwise regression.
Shepperd, M. and Schofield, C., IEEE Trans. on Softw. Eng. 23(11), pp. 736-743, 1997.
Sensitivity Analysis
[Line chart: % MMRE (0 to 200) against number of projects (3 to 31) for three treatments T1, T2, T3]
Independent Replication
 Stensrud and Myrtveit (1998, 99)
 Jeffery and Walkerden (1999)
 Niessink and van Vliet (1997): no search for best subset of features
 Briand and El Emam (1998): approx. 30 features, so exhaustive search for the best subset not possible; homogeneity + well defined relationships favour regression techniques
Artificial Neural Nets
[Diagram: input layer (FP, # files, # screens, team size) -> hidden layers -> output layer (effort)]
A multi-layer feed forward ANN
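A toy sketch of such a net trained with back propagation (BP) in pure numpy; the topology is arbitrary and the "project" data are synthetic, purely to show the mechanics.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (30, 4))                 # 4 scaled inputs, e.g. FP, # files, # screens, team size
y = (5.0 + X @ np.array([3.0, 1.0, 0.5, 2.0])).reshape(-1, 1)  # synthetic 'true' effort

W1, b1 = rng.normal(0, 0.5, (4, 8)), np.zeros(8)   # input -> hidden layer
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)   # hidden -> output layer
lr = 0.05
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                   # hidden activations
    out = h @ W2 + b2                          # predicted effort
    err = out - y
    dW2, db2 = h.T @ err / len(X), err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)           # backpropagate through tanh
    dW1, db1 = X.T @ dh / len(X), dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

print("training MMRE:", float(np.abs((out - y) / y).mean()))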
ANN Results
Study                 Learning Algorithm    n         Results
Venkatachalam         BP                    63        “Promising”
Wittig & Finnie       BP                    81, 136   MMRE = 29%
Jorgenson             BP                    109       MMRE = 100%
Serluca               BP                    28        MMRE = 76%
Karunanithi et al.    Cascade-Correlation   N/A       “More accurate than algorithmic models”
Samson et al.         BP                    63        MMRE = 428%
Srinivasan & Fisher   BP                    78        MMRE = 70%
Hughes                BP                    33        MMRE = 55%
BP = back propagation learning algorithm
ANN Lessons
need large training sets
deal with heterogeneous datasets
opaque (poor explanatory power)
sensitive to choices of topology and learning algorithm
problems of over adaptation (neuro-fuzzy approaches?)
Rule Induction
IF module_size > 100 THEN
    high_development_effort
ELSE
    IF developer_experience < 2 THEN
        low_development_effort
    ELSE
        moderate_development_effort
C. Mair, G. Kadoda, M. Lefley, K. Phalp, C. Schofield, M. Shepperd, and S. Webster, “An investigation of machine learning based prediction systems,” J. of Systems Software, vol. 53, pp. 23-29, 2000.
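Rules like these can be induced automatically from project data; a minimal sketch using a regression tree, one common rule-induction method (scikit-learn assumed, training data invented).

from sklearn.tree import DecisionTreeRegressor, export_text

# features: [module_size (LOC), developer_experience (years)]
X = [[150, 1], [200, 5], [50, 1], [60, 4], [300, 2], [40, 3]]
y = [90, 70, 30, 20, 120, 15]                  # development effort (days)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["module_size", "developer_experience"]))
print(tree.predict([[120, 2]]))                # predict effort for a new module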
Machine Learning Summary
Need training sets
ANNs require significant sized sets n≈50
Configuring the system can be a hard search problem
Don’t need to specify the form of the relationship in advance
Can produce more accurate results than other methods
Adapts as new cases are acquired
4. Assessing Estimation Systems
 accuracy
 tolerant of measurement error
 explanatory power
 ease of use
 availability of inputs
 ...
Assessing Model Performance
Absolute error
Percentage error and mean percentage error
Magnitude of relative error and mean magnitude of relative error (MMRE)
PRED(n)
Sum of the squares of the residuals (SSR)
...
Absolute Error
|E_pred - E_act|
But it fails to take into account the size of the project: a 6 PM error is serious if the prediction is only 3 PM, yet a 6 PM error on a 3,000 PM project is a triumph.
Percentage Error
PE = (E_pred - E_act) / E_act
or, for more than one estimate, the mean percentage error:
MPE = (1/n) * Σ_{i=1..n} (E_pred,i - E_act,i) / E_act,i
where n is the number of estimates.
• Reveals any systematic bias in a predictive model, e.g. if the model always over-estimates then the percentage error will be positive.
• A weakness is that it will mask compensating errors.
MMRE
MMRE is defined as:
MMRE = (1/n) * Σ_{i=1..n} |E_pred,i - E_act,i| / E_act,i
Masks any systematic bias but highlights overall accuracy.
Penalises regression derived models based on least squares algorithms.
PRED(n)
Conte et al. suggest MMRE ≤ 25% as an indicator of an acceptable prediction model.
PRED(25) measures the % of predictions that lie within 25% of the actual values.
PRED(25) ≥ 75% is a typical target (seldom achieved!)
Sum of the Squared Residuals
If you are risk averse, SSR suits because it penalises large deviations more than small ones:
SSR = Σ (E_pred - E_act)^2
Can also compute the mean square error.
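The indicators above are straightforward to compute; a sketch follows (numpy assumed; the predicted and actual efforts are invented).

import numpy as np

pred = np.array([120.0, 80.0, 30.0, 400.0, 60.0])   # estimated effort (PM)
act  = np.array([100.0, 90.0, 45.0, 380.0, 40.0])   # actual effort (PM)

re = (pred - act) / act                     # signed relative error
mpe    = re.mean()                          # mean percentage error (reveals bias)
mmre   = np.abs(re).mean()                  # mean magnitude of relative error
pred25 = (np.abs(re) <= 0.25).mean()        # proportion within 25% of actual
ssr    = ((pred - act) ** 2).sum()          # sum of squared residuals

print(f"MPE={mpe:.2%}  MMRE={mmre:.2%}  PRED(25)={pred25:.0%}  SSR={ssr:.0f}")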
A Comparison Case Study
Statistic        LSR     Robust   Median
R-squared        0.28 *  0.25     0.26
MMRE             0.78    0.62 *   0.62 *
Pred (25)        45% *   35%      35%
Balanced MMRE    0.84    0.78     0.77 *
(* marks the best value for each statistic)
So What’s Going On?
The ith residual is ŷ_i - y_i
Its distribution can be summarised by:
 central tendency (mean, median)
 spread (variance, kurtosis + skewness)
M. J. Shepperd, M. H. Cartwright, and G. F. Kadoda, “On building prediction systems for software engineers,” Empirical Software Engineering, vol. 5, pp. 175-182, 2000.
Estimation Objectives
Objective          Indicator               Type
Risk averse        sum of squares          spread
Error minimising   median absolute error   spread
Portfolio          total error             centre
5. Summary
Accuracy is a non-trivial concept
No ‘best’ technique
Algorithmic models need to be calibrated
Simple linear models can be surprisingly effective
ANNs need large, not necessarily homogeneous, training sets
Evidence to suggest that CBR is often the most accurate and most robust technique
Some Estimation Guidelines
Collect data
Use more than one estimating technique.
Minimise the number of cost drivers / coefficients in a model to facilitate calibration:
smaller, more homogeneous data sets
look for simple solutions first
Exploit any local structure or standardisation.
Remember an estimate is a probabilistic statement (bounds?).
Provide feedback for estimators.
Future Avenues
Great need for useful prediction systems
Consider the nature of the prediction problem
Combining prediction systems
Collaboration with experts
Managing with little or no systematic data
Experts plus … ?
Experiment by Myrtveit and Stensrud using project managers at Andersen Consulting
Asked subjects to make predictions
Found expert + tool significantly better than either expert or tool alone.
What type of estimation systems are easiest to collaborate with?
I. Myrtveit and E. Stensrud, “A controlled experiment to assess the benefits of estimating with analogy and regression models,” IEEE Trans. on Softw. Eng., 25, pp. 510-525, 1999.