A COCOMO Extension for Software Maintenance

University of Southern California
Center for Systems and Software Engineering
25th International Forum on COCOMO
and Systems/Software Cost Modeling
Vu Nguyen, Barry Boehm
November 2nd, 2010
© 2010, USC-CSSE
Outline
• Motivation
• Problem and a Solution
• COCOMO Extension for SW Maintenance
– Sizing method
– Effort model
• Results
– Data collection results
– Calibrations
• Conclusions
Software Maintenance
• Work of modifying, enhancing, and providing
cost-effective support to the existing software
• Characteristics of maintenance projects
– Constrained by legacy system
• Quality of the system
• Requirements, architecture, and design
• System understandability
• Documentation
Magnitude of Software Maintenance
• The majority of software costs are incurred after the first operational release [Boehm 1981]
[Chart: Maintenance vs. Total Software Cost: percentage of total software cost spent on maintenance vs. other activities, as reported by Zelkowitz et al. (1979), McKee (1984), Moad (1990), and Erlikh (2000)]
Importance of Software Estimation in
Managing Software Projects
• Estimation is a key factor determining success or
failure of software projects
– Two of the three most-cited causes of project failure relate to resource estimation (CompTIA survey) [Rosencrance 2007]
• Cost estimates are key inputs for investment decisions, project planning and control, etc.
• Many software estimation approaches have been
proposed and used in industry
– E.g., COCOMO, SEER-SEM, SLIM, PRICE-S, Function Point
Analysis
Problem and Solution
• These models are built on the assumptions of new development projects
• The problem is that these assumptions do not always hold in software maintenance, due to differences between new development and maintenance
– The result is low estimation accuracy
Solution: Extend COCOMO II to support estimating maintenance projects
Objective: Improve estimation performance
COCOMO II for Maintenance
• An extension of COCOMO II
– COCOMO is the most popular non-proprietary model
– COCOMO has attracted many independent validations and
extensions
• Designed to estimate effort of a software release
• Has two components
– Maintenance Sizing Model
– Effort Model
• Supports maintenance types
– Enhancement
– Error corrections
COCOMO II for Maintenance – Extensions
• Maintenance Sizing Model
– Uniting Adaptation/Reuse and Maintenance models
– Redefining size parameters DM, CM, and IM
• Using deleted SLOC from modified modules
• Method to determine actual equivalent SLOC from code
• Effort Model
– Excluding RUSE and SCED cost drivers from the model
– Revising rating levels for personnel attributes
– Providing a reduced-parameter model
– Providing a new set of rating scales for the cost drivers
Software Maintenance Sizing
• Size is a key determinant of effort
• Sizing method has to take into account different types
of code
[Diagram: Types of Code. Preexisting code (existing system modules, reused modules, external modules) becomes delivered code (new modules, adapted modules, reused modules, automatically translated modules), either manually developed and maintained or automatically translated]
Software Maintenance Sizing (cont’d)
• Computing Equivalent SLOC:
– New Modules:
  KSLOC_added
– Adapted Modules:
  EKSLOC_adapted = AKSLOC × AAM
  AKSLOC: KSLOC of the adapted modules before changes
– Reused Modules:
  EKSLOC_reused = 0.3 × RKSLOC × IM_reused
  RKSLOC: KSLOC of the reused modules
– Total Equivalent KSLOC:
  EKSLOC = KSLOC_added + EKSLOC_adapted + EKSLOC_reused
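A minimal Python sketch of these sizing relations (not part of the original slides): the function name and the sample sizes, AAM, and IM values are hypothetical, and AAM is assumed to have been derived separately from the model's reuse parameters.

```python
def equivalent_ksloc(ksloc_added, aksloc, aam, rksloc, im_reused):
    """Equivalent size (EKSLOC) combining new, adapted, and reused code.

    Implements the relations above:
      EKSLOC_adapted = AKSLOC * AAM
      EKSLOC_reused  = 0.3 * RKSLOC * IM_reused
      EKSLOC         = KSLOC_added + EKSLOC_adapted + EKSLOC_reused
    AAM and IM_reused are assumed to be already-computed multipliers.
    """
    eksloc_adapted = aksloc * aam
    eksloc_reused = 0.3 * rksloc * im_reused
    return ksloc_added + eksloc_adapted + eksloc_reused

# Hypothetical release: 5 KSLOC new code, 40 KSLOC adapted with AAM = 0.35,
# and 20 KSLOC reused with IM = 0.4
print(equivalent_ksloc(5.0, 40.0, 0.35, 20.0, 0.4))  # -> 21.4 EKSLOC
```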
COCOMO Effort Model for Maintenance
• Uses the same non-linear COCOMO II form:

  PM = A × Size^(B + 0.01 × Σ SF_i) × Π EM_j

Where,
PM – project effort measured in person-months
A – a multiplicative constant, calibrated using the data sample
B – an exponent constant, calibrated using the data sample
Size – software size measured in EKSLOC
EM – 15 effort multipliers, cost drivers that have a multiplicative effect on effort
SF – 5 scale factors, cost drivers that have an exponential effect on effort
• Linearizing the model using a log-transformation:

  log(PM) = β0 + β1·log(Size) + Σ βi·SFi·log(Size) + Σ βj·log(EMj)
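A minimal sketch of evaluating this effort form (an illustration, not the calibrated tool): the 0.01 weighting of the scale-factor sum follows the COCOMO II.2000 convention, A and B are taken from the Bayesian calibration shown later in this deck, and the scale-factor ratings in the usage example are hypothetical.

```python
import math

def maintenance_effort_pm(size_eksloc, A, B, scale_factors, effort_multipliers):
    """Effort in person-months using the form above:
    PM = A * Size^(B + 0.01 * sum(SF)) * prod(EM)."""
    exponent = B + 0.01 * sum(scale_factors)
    return A * size_eksloc ** exponent * math.prod(effort_multipliers)

# Hypothetical 50-EKSLOC release with all effort multipliers at nominal (1.0)
# and illustrative scale-factor ratings; A and B from the calibrated model.
pm = maintenance_effort_pm(50.0, A=3.16, B=0.78,
                           scale_factors=[3.7, 3.0, 4.2, 3.3, 4.7],
                           effort_multipliers=[1.0] * 15)
print(round(pm, 1))
```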
Data Collection
• Delphi survey
– Surveying experts about rating scales of cost drivers
• Sample data
– Collecting data of completed maintenance projects from
industry
– Following inclusion criteria, e.g.,
• Starting and ending dates are clear
• Include only major releases with equivalent size of no less than 2,000 SLOC
• Maintenance type: error corrections, enhancements
[Timeline: the maintenance project for Release N+1 starts after Release N (Baseline 1) and ends at Release N+1 (Baseline 2); this interval is the release period]
Calibration
• Process of fitting data to the model to adjust its
parameters and constants
[Calibration flow: initial rating scales for the cost drivers and expert-judgment estimates from a Delphi survey of 8 experts feed into the model calibration, together with sample data (80 data points from 3 organizations), producing new rating scales for the cost drivers and new constants]
Calibration techniques:
- Ordinary Least Squares Regression (OLS)
- Bayesian Analysis [Boehm 2000]
- Constrained Regression [Nguyen 2008]
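One way to picture the Bayesian step [Boehm 2000] is as a precision-weighted average of the expert-judgment (Delphi) estimate and the regression estimate of each coefficient. The sketch below is a simplified, single-coefficient illustration of that idea under assumed variances, not the full multivariate procedure used in the calibration.

```python
def bayesian_combine(prior_mean, prior_var, sample_mean, sample_var):
    """Combine an expert (Delphi) prior with a regression estimate,
    weighting each by the inverse of its variance (its precision)."""
    prior_precision = 1.0 / prior_var
    sample_precision = 1.0 / sample_var
    posterior_mean = (prior_precision * prior_mean +
                      sample_precision * sample_mean) / (prior_precision + sample_precision)
    posterior_var = 1.0 / (prior_precision + sample_precision)
    return posterior_mean, posterior_var

# Hypothetical coefficient: experts say 0.20 with a wide variance,
# the regression on the sample data says 0.12 with a tighter variance.
print(bayesian_combine(0.20, 0.01, 0.12, 0.0025))  # -> (0.136, 0.002)
```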
Data Collection Results
• Delphi Survey Results
– 8 surveys collected from experts in the field
– Considerable changes seen in the personnel factors
[Bar chart: Delphi-survey productivity ranges (PRs) for the cost drivers, ordered from the lowest (RELY) to the highest (CPLX)]

Parameter   Delphi PR   COCOMO II.2000 PR   Difference
PMAT        1.41        1.43                -0.02
PREC        1.31        1.33                -0.02
TEAM        1.30        1.29                 0.01
FLEX        1.26        1.26                 0.00
RESL        1.39        1.38                 0.01
PCAP        1.83        1.76                 0.07
RELY        1.22        1.24                -0.02
CPLX        2.29        2.38                -0.09
TIME        1.55        1.63                -0.08
STOR        1.35        1.46                -0.11
ACAP        1.77        2.00                -0.23
PLEX        1.45        1.40                 0.05
LTEX        1.46        1.43                 0.03
DATA        1.38        1.42                -0.04
DOCU        1.52        1.52                 0.00
PVOL        1.46        1.49                -0.03
APEX        1.60        1.51                 0.09
PCON        1.69        1.59                 0.10
TOOL        1.55        1.50                 0.05
SITE        1.56        1.53                 0.04

Differences in productivity ranges (PRs) between COCOMO II.2000 and the Delphi results
Data Collection Results (cont’d)
• Sample data
– 86 releases in 24 programs (6 releases are outliers)
Releases   Source
64         A large organization, member of CSSE Affiliates, USA
14         A CMMI-L5 company, Vietnam
8          A CMMI-L3 company, Thailand

Statistics   Size (EKSLOC)   Effort (PM)   Schedule (Months)
Average      64.1            115.2         10.5
Median       39.6            58.7          10.2
Max          473.4           1505.1        36.9
Min          2.8             4.9           1.8

[Chart: Distribution of size metrics: ESLOC Added 31.8%, ESLOC Adapted 60.7%, ESLOC Reused 7.5%; equivalent SLOC differs from the SLOC of the delivered program]
Data Collection Results (cont’d)
• Distribution of Size and Effort
[Scatter plots: PM vs. EKSLOC and Log(PM) vs. Log(EKSLOC) for the collected releases]
Model Calibrations
• Full model calibrations
– Applying Bayesian and Constrained Regression
– Using 80 data points (6 outliers eliminated)
• Local calibrations
– Calibrating the model to individual organizations and programs
– Using four approaches
• productivity index, simple regression, Bayesian, constrained
regression
Full Model Calibrations
• Bayesian approach
– Productivity ranges indicate that
• ACAP is less influential than it is in COCOMO II.2000
• CPLX is still the most influential
• PCAP is more influential than ACAP
Parameter   COCOMO II Maintenance PR   COCOMO II.2000 PR   Difference
A           3.16                       2.94                 0.22
B           0.78                       0.91                -0.13
PMAT        1.41                       1.43                -0.03
PREC        1.31                       1.33                -0.02
TEAM        1.29                       1.29                 0.01
FLEX        1.26                       1.26                -0.01
RESL        1.39                       1.38                 0.01
PCAP        1.79                       1.76                 0.02
RELY        1.22                       1.24                -0.02
CPLX        2.22                       2.38                -0.16
TIME        1.55                       1.63                -0.08
STOR        1.35                       1.46                -0.11
ACAP        1.61                       2.00                -0.39
PLEX        1.44                       1.40                 0.04
LTEX        1.46                       1.43                 0.03
DATA        1.36                       1.42                -0.06
DOCU        1.53                       1.52                 0.01
PVOL        1.46                       1.49                -0.04
APEX        1.58                       1.51                 0.08
PCON        1.49                       1.59                -0.10
TOOL        1.55                       1.50                 0.05
SITE        1.53                       1.53                 0.01

[Bar chart: calibrated productivity ranges, ordered from the lowest (RELY) to the highest (CPLX)]

Differences in PRs between COCOMO II.2000 and COCOMO II for Maintenance
Full Model Calibrations (cont’d)
• Estimation accuracies
– COCOMO II.2000: use the model to estimate 80 data points
– COCOMO II for Maintenance: calibrated using Bayesian and
Constrained regression approaches
• COCOMO II for Maintenance outperforms COCOMO II.2000 by a
wide margin
Model                                    MMRE   PRED(0.25)   PRED(0.3)
COCOMO II.2000                           56%    31%          38%
COCOMO II for Maintenance: Bayesian      48%    41%          51%
COCOMO II for Maintenance: CMRE          37%    56%          60%
COCOMO II for Maintenance: CMSE          39%    43%          51%
COCOMO II for Maintenance: CMAE          42%    54%          58%
Three Constrained Regression Techniques:
CMRE: Constrained Minimum sum of Relative Errors
CMSE: Constrained Minimum sum of Square Errors
CMAE: Constrained Minimum sum of Absolute Errors
Local Calibration
• Local calibration can potentially improve the performance of estimation models [Chulani 1999, Valerdi 2005]
• In local calibration, the model's constants A and B are estimated using local data sets
• Local calibration types
– Organization-based
• All data points of each organization used to calibrate the
model
• 3 organizations, 80 releases
– Program-based
• All data points (releases) of each program
• Only programs having 5 or more releases
• Total 45 releases in 6 programs
Local Calibration (cont’d)
• Approaches to be compared
– Productivity index
• Using the productivity of past projects to estimate the effort of
the current project given size
• The simplest and most widely used approach
– Simple linear regression
• Building a simple regression model using log(PM) as the
response and log(EKSLOC) as the predictor
• Widely used estimation approach
– COCOMO II for Maintenance: Bayesian analysis
– COCOMO II for Maintenance: CMRE
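A minimal sketch of the two baseline approaches above, assuming a small history of (EKSLOC, PM) pairs for one organization or program; the data values are hypothetical.

```python
import numpy as np

def productivity_index_estimate(past_effort_pm, past_size_eksloc, new_size):
    """Productivity-index baseline: average PM per EKSLOC of past releases,
    scaled by the size of the new release."""
    productivity = np.sum(past_effort_pm) / np.sum(past_size_eksloc)
    return productivity * new_size

def loglinear_estimate(past_effort_pm, past_size_eksloc, new_size):
    """Simple-linear-regression baseline: fit log(PM) = b0 + b1*log(EKSLOC)
    on past releases, then predict the new release."""
    b1, b0 = np.polyfit(np.log(past_size_eksloc), np.log(past_effort_pm), 1)
    return float(np.exp(b0 + b1 * np.log(new_size)))

# Hypothetical history (EKSLOC, PM) and a 30-EKSLOC new release
sizes = np.array([10.0, 25.0, 40.0, 80.0])
effort = np.array([18.0, 50.0, 85.0, 190.0])
print(productivity_index_estimate(effort, sizes, 30.0))
print(loglinear_estimate(effort, sizes, 30.0))
```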
Local Calibration (cont’d)
• Organization-based calibration accuracies: 80 data points
Model                                    MMRE   PRED(0.25)   PRED(0.3)
Productivity index                       44%    40%          48%
Simple linear regression                 50%    34%          35%
COCOMO II for Maintenance: Bayesian      38%    54%          59%
COCOMO II for Maintenance: CMRE          34%    62%          64%
• Program-based calibration accuracies: 45 data points
Model                                    MMRE   PRED(0.25)   PRED(0.3)
Productivity index                       27%    53%          64%
Simple linear regression                 25%    64%          69%
COCOMO II for Maintenance: Bayesian      22%    71%          80%
COCOMO II for Maintenance: CMRE          21%    72%          79%
Conclusions
• A model for sizing maintenance and reuse is proposed
• A set of cost drivers and levels of their impact on
maintenance cost are derived
• Deleted SLOC is an important maintenance cost driver
• The extension performs better than the productivity index and simple linear regression
• Organization-based and program-based calibrations
improve estimation accuracy
– Best model generates estimates within 30% of the actuals 80%
of the time
Threats to Validity
• Threats to Internal Validity
– Unrecorded overtime not included in actual effort reported
– Various counting tools used in the US organization
– Reliability of the data reported from the organizations
• Threats to External Validity
– Bias in the data set: data from the three organizations may
not be relevant to the general software industry
– Bias in the selection of participants for the Delphi survey
Future Work
• Calibrate the model with more data points from
industry
• Build domain-specific, language-specific, or platform-specific models
• Survey a more diverse group of experts, not only those
who are familiar with COCOMO
• Extend the model to other types of maintenance
– reengineering, language and data migration, performance
improvement, etc.
• Extend the model to support effort estimation of
iterations in iterative development
Thank You
References – 1/2
Abran A., Silva I., Primera L. (2002), "Field studies using functional size measurement in building estimation models for
software maintenance", Journal of Software Maintenance and Evolution, Vol 14, part 1, pp. 31-64
Abran A., St-Pierre D., Maya M., Desharnais J.M. (1998), "Full function points for embedded and real-time software",
Proceedings of the UKSMA Fall Conference, London, UK, 14.
Albrecht A.J. (1979), “Measuring Application Development Productivity,” Proc. IBM Applications Development Symp.,
SHARE-Guide, pp. 83-92.
Basili V.R., Condon S.E., Emam K.E., Hendrick R.B., Melo W. (1997) "Characterizing and Modeling the Cost of Rework in a
Library of Reusable Software Components". Proceedings of the 19th International Conference on Software
Engineering, pp.282-291
Boehm B.W. (1981), “Software Engineering Economics”, Prentice-Hall, Englewood Cliffs, NJ, 1981.
Boehm B.W. (1999), "Managing Software Productivity and Reuse," Computer 32, Sept., pp.111-113
Boehm B.W., Horowitz E., Madachy R., Reifer D., Clark B.K., Steece B., Brown A.W., Chulani S., and Abts C. (2000),
“Software Cost Estimation with COCOMO II,” Prentice Hall.
Briand L.C. & Basili V.R. (1992) “A Classification Procedure for an Effective Management of Changes during the Software
Maintenance Process”, Proc. ICSM ’92, Orlando, FL
Chulani S. (1999), "Bayesian Analysis of Software Cost and Quality Models", PhD Thesis, the University of Southern
California.
Port D., Nguyen V., Menzies T., (2009) “Studies of Confidence in Software Cost Estimation Research Based on the
Criterions MMRE and PRED.” Submitted to Journal of Empirical Software Engineering
De Lucia A., Pompella E., Stefanucci S. (2003), “Assessing the maintenance processes of a software organization: an
empirical analysis of a large industrial project”, The Journal of Systems and Software 65 (2), 87–103.
Erlikh L. (2000). “Leveraging legacy system dollars for E-business”. (IEEE) IT Pro, May/June, 17-23.
Gerlich R. and Denskat U. (1994), "A Cost Estimation Model for Maintenance and High Reuse," Proceedings of ESCOM
1994, Ivrea, Italy.
IEEE (1998) IEEE Std. 1219-1998, Standard for Software Maintenance, IEEE Computer Society Press, Los Alamitos, CA.
References – 2/2
Jorgensen M. (1995), “Experience with the accuracy of software maintenance task effort prediction models”, IEEE
Transactions on Software Engineering 21 (8) 674–681.
McKee J. (1984). “Maintenance as a function of design”. Proceedings of the AFIPS National Computer Conference, 187-193.
Moad J. (1990). “Maintaining the competitive edge”. Datamation 61-62, 64, 66.
Niessink F., van Vliet H. (1998), "Two case studies in measuring maintenance effort", Proceedings of International Conference
on Software Maintenance, Bethesda, MD, USA, pp. 76–85.
Ramil J.F. (2003), “Continual Resource Estimation for Evolving Software," PhD Thesis, University of London, Imperial College
of Science, Technology and Medicine.
Nguyen V., Deeds-Rubin S., Tan T., Boehm B.W. (2007), “A SLOC Counting Standard,” The 22nd International Annual
Forum on COCOMO and Systems/Software Cost Modeling.
Nguyen V., Steece B., Boehm B.W. (2008), “A constrained regression technique for COCOMO calibration”, Proceedings of
the 2nd ACM-IEEE international symposium on Empirical software engineering and measurement (ESEM), pp. 213-222
Nguyen V., Boehm B.W., Danphitsanuphan P. (2009), “Assessing and Estimating Corrective, Enhancive, and Reductive
Maintenance Tasks: A Controlled Experiment.” In Proceedings of 16th Asia-Pacific Software Engineering Conference
(APSEC 2009), Dec.
Nguyen V., Boehm B.W., Danphitsanuphan P. (2010), “A Controlled Experiment in Assessing and Estimating Software
Maintenance Tasks”, APSEC Special Issue, Information and Software Technology Journal, 2010.
Sneed H.M., (1995), "Estimating the Costs of Software Maintenance Tasks," IEEE International Conference on Software
Maintenance, pp. 168-181
Rosencrance L. (2007), "Survey: Poor communication causes most IT project failures," Computerworld
Selby R. (1988), Empirically Analyzing Software Reuse in a Production Environment, In Software Reuse: Emerging
Technology, W. Tracz (Ed.), IEEE Computer Society Press, pp. 176-189.
Sneed H.M., (2004), "A Cost Model for Software Maintenance & Evolution," IEEE International Conference on Software
Maintenance, pp. 264-273
Symons C.R. (1988) "Function Point Analysis: Difficulties and Improvements," IEEE Transactions on Software Engineering,
vol. 14, no. 1, pp. 2-11
Valerdi R. (2005), "The Constructive Systems Engineering Cost Model (Cosysmo)", PhD Thesis, The University of Southern
California.
Zelkowitz M.V., Shaw A.C., Gannon J.D. (1979). “Principles of Software Engineering and Design”. Prentice-Hall
Backup Slides
Abbreviations
COCOMO      Constructive Cost Model
COCOMO II   Constructive Cost Model version II
CMMI        Capability Maturity Model Integration
EM          Effort Multiplier
PM          Person Month
OLS         Ordinary Least Squares
MSE         Mean Square Error
MAE         Mean Absolute Error
CMSE        Constrained Minimum Sum of Square Errors
CMAE        Constrained Minimum Sum of Absolute Errors
CMRE        Constrained Minimum Sum of Relative Errors
MMRE        Mean of Magnitude of Relative Errors
MRE         Magnitude of Relative Errors
PRED        Prediction Level
ICM         Incremental Commitment Model
PR          Productivity Range
SF          Scale Factor
Model Parameter Abbreviations
AA          Assessment and Assimilation
AAF         Adaptation Adjustment Factor
AAM         Adaptation Adjustment Multiplier
AKSLOC      Kilo Source Lines of Code of the Adapted Modules
CM          Code Modified
DM          Design Modified
EKSLOC      Equivalent Kilo Source Lines of Code
ESLOC       Equivalent Source Lines of Code
IM          Integration Modified
KSLOC       Kilo Source Lines of Code
RKSLOC      Kilo Source Lines of Code of the Reused Modules
SLOC        Source Lines of Code
SU          Software Understanding
UNFM        Programmer Unfamiliarity

ACAP        Analyst Capability
APEX        Applications Experience
CPLX        Product Complexity
DATA        Database Size
DOCU        Documentation Match to Life-Cycle Needs
FLEX        Development Flexibility
LTEX        Language and Tool Experience
PCAP        Programmer Capability
PCON        Personnel Continuity
PERS        Personnel Capability
PLEX        Platform Experience
PMAT        Equivalent Process Maturity Level
PREC        Precedentedness of Application
PREX        Personnel Experience
PVOL        Platform Volatility
RELY        Required Software Reliability
RESL        Risk Resolution
SITE        Multisite Development
STOR        Main Storage Constraint
TEAM        Team Cohesion
TIME        Execution Time Constraint
TOOL        Use of Software Tools
Model Accuracy Measures
• Magnitude of relative error (MRE)

  MRE_i = |y_i − ŷ_i| / y_i

• Mean of MRE (MMRE)

  MMRE = (1/N) Σ_{i=1..N} MRE_i

• Prediction Level: PRED(l) = k/N
– k is the number of estimates with MRE ≤ l
– Commonly used: PRED(0.30) and PRED(0.25)
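A small sketch computing MMRE and PRED(l) from paired actual and estimated efforts, following the definitions above; the sample values are hypothetical.

```python
def mmre_and_pred(actuals, estimates, level=0.30):
    """Mean magnitude of relative error and PRED(level) for paired
    actual/estimated efforts."""
    mres = [abs(y - y_hat) / y for y, y_hat in zip(actuals, estimates)]
    mmre = sum(mres) / len(mres)
    pred = sum(1 for m in mres if m <= level) / len(mres)
    return mmre, pred

# Hypothetical releases: actual vs. estimated person-months
print(mmre_and_pred([100.0, 40.0, 250.0], [80.0, 45.0, 300.0]))
```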