Domain-Driven Software Cost Estimation

University of Southern California
Center for Systems and Software Engineering
Wilson Rosa (Air Force Cost Analysis Agency)
Barry Boehm (USC)
Brad Clark (USC)
Thomas Tan (USC)
Ray Madachy (Naval Post Graduate School)
27th International Forum on COCOMO®
and Systems/Software Cost Modeling
October 16, 2012
This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Systems Engineering Research
Center (SERC) under Contract H98230-08-D-0171. The SERC is a federally funded University Affiliated Research Center (UARC) managed by
Stevens Institute of Technology consisting of a collaborative network of over 20 universities. More information is available at www.SERCuarc.org
Research Objectives
• Make collected data useful to oversight and management entities
  – Provide guidance on how to condition data to address challenges
  – Segment data into different Application Domains and Operating Environments
  – Analyze data for simple Cost Estimating Relationships (CER) and Schedule-Cost Estimating Relationships (SCER) within each domain
  – Develop rules-of-thumb for missing data
[Diagram: Data Records for one Domain → Data Preparation and Analysis → Domain CER/SER]
Cost (Effort) = a * Size^b
Schedule = a * Size^b * Staff^c
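To make the two relationship forms concrete, here is a minimal sketch that evaluates a hypothetical domain CER and SER in Python; the coefficient values a, b, and c below are placeholders for illustration only, not results from this study.

```python
# Minimal sketch of the CER/SER forms above.
# Coefficient values are placeholders, not fitted results from this study.

def effort_pm(size_kesloc: float, a: float = 3.0, b: float = 1.2) -> float:
    """Domain CER: Effort (person-months) = a * Size^b."""
    return a * size_kesloc ** b

def schedule_months(size_kesloc: float, staff_fte: float,
                    a: float = 4.0, b: float = 0.8, c: float = -0.5) -> float:
    """Domain SER: Schedule (months) = a * Size^b * Staff^c."""
    return a * size_kesloc ** b * staff_fte ** c

print(effort_pm(50.0))              # effort for a hypothetical 50-KESLOC project
print(schedule_months(50.0, 10.0))  # schedule with 10 full-time-equivalent staff
```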
Stakeholder Community
• Research is collaborative across heterogeneous stakeholder communities, which have helped us refine our data definition framework and taxonomy and have provided data and funding
[Figure: Funding Sources and Data Sources]
• Project has evolved into a Joint Government Software Study
Topics
• Data Preparation Workflow
  – Data Segmentation
• Analysis Workflow
• Software Productivity Benchmarks
• Cost Estimating Relationships
• Schedule Estimating Relationships
• Conclusion
• Future Work
Data Preparation
Current Dataset
• Multiple Data Formats (SRDR, SEER, COCOMO)
• SRDR (377 records) + Other (143 records) = 522 total records
Software Resources Data Report: Final Developer Report (DD Form 2630-3, Page 1 of 2) - Sample
Page 1: Report Context, Project Description and Size

Part 1. Report Context
  1. System/Element Name (version/release)
  2. Report As Of
  3. Authorizing Vehicle (MOU, contract/amendment, etc.)
  4. Reporting Event: Contract/Release End; Submission # ________ (Supersedes # _______, if applicable)
  Description of Actual Development Organization
  5. Development Organization
  6. Certified CMM Level (or equivalent)
  7. Certification Date
  8. Lead Evaluator
  9. Affiliation
  10. Precedents (list up to five similar systems by the same organization or team)
  Comments on Part 1 responses

Part 2. Product and Development Description
  1. Primary Application Type (items 1-4: application types, each with Percent of Product Size, Upgrade or New?, and Actual Development Process)
  17. Primary Language Used
  21. List COTS/GOTS Applications Used
  22. Peak staff (maximum team size in FTE) that worked on and charged to this project: __________
  23. Percent of personnel that was: Highly experienced in domain ___%; Nominally experienced ___%; Entry level, no experience ___%
  Comments on Part 2 responses

Part 3. Product Size Reporting (multiple sources; provide actuals at final delivery)
  1. Number of Software Requirements, not including External Interface Requirements (unless noted in associated Data Dictionary)
  2. Number of External Interface Requirements (i.e., not under project control)
  3. Amount of Requirements Volatility encountered during development (1 = Very Low .. 5 = Very High)
  Code Size Measures for items 4 through 6. For each, indicate S for physical SLOC (carriage returns); Snc for noncomment SLOC only; LS for logical statements; or provide abbreviation _________ and explain in associated Data Dictionary.
  4. Amount of New Code developed and delivered (Size in __________)
  5. Amount of Modified Code developed and delivered (Size in __________)
  6. Amount of Unmodified, Reused Code developed and delivered (Size in __________)
  Comments on Part 3 responses
The Need for Data Preparation
• Issues found in dataset
  – Inadequate information on modified code (size provided)
  – Inadequate information on size change or growth
  – Size measured inconsistently
  – Inadequate information on average staffing or peak staffing
  – Inadequate information on personnel experience
  – Inaccurate effort data in multi-build components
  – Missing effort data
  – Replicated duration (start and end dates) across components
  – Inadequate information on schedule compression
  – Missing schedule data
  – No quality data
Data Preparation Workflow
Start with SRDR submissions → Inspect each Data Point → Determine Data Quality Levels → Correct Missing or Questionable Data (if no resolution: Exclude from Analysis) → Normalize Data → Segment Data
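A minimal sketch of how this workflow might be expressed as a data-conditioning pipeline; the record fields, quality rule, and exclusion criteria below are hypothetical stand-ins, not the study's actual criteria.

```python
# Hypothetical sketch of the data preparation workflow above.
# Field names and the quality rule are illustrative, not the study's actual criteria.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SrdrRecord:
    name: str
    esloc: Optional[float]            # equivalent SLOC
    effort_pm: Optional[float]        # person-months
    duration_mo: Optional[float]      # calendar months
    operating_env: Optional[str]      # e.g., "GSM"
    productivity_type: Optional[str]  # e.g., "RTE"

def quality_level(rec: SrdrRecord) -> str:
    """Crude stand-in for 'Determine Data Quality Levels'."""
    required = (rec.esloc, rec.effort_pm, rec.duration_mo)
    return "good" if all(v is not None and v > 0 for v in required) else "questionable"

def prepare(records: list[SrdrRecord]) -> dict[tuple[str, str], list[SrdrRecord]]:
    """Inspect, screen, and segment records by (Operating Environment, Productivity Type)."""
    segments: dict[tuple[str, str], list[SrdrRecord]] = {}
    for rec in records:
        if quality_level(rec) != "good":
            continue  # no resolution -> exclude from analysis
        if rec.operating_env is None or rec.productivity_type is None:
            continue
        segments.setdefault((rec.operating_env, rec.productivity_type), []).append(rec)
    return segments
```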
Segment Data by Operating Environments (OE)
Operating Environment and Examples:

Ground Site (GS)
  Fixed (GSF): Command Post, Ground Operations Center, Ground Terminal, Test Facilities
  Mobile (GSM): Intelligence gathering stations mounted on vehicles, Mobile missile launcher
Ground Vehicle (GV)
  Manned (GVM): Tanks, Howitzers, Personnel carrier
  Unmanned (GVU): Robots
Maritime Vessel (MV)
  Manned (MVM): Aircraft carriers, destroyers, supply ships, submarines
  Unmanned (MVU): Mine hunting systems, Towed sonar array
Aerial Vehicle (AV)
  Manned (AVM): Fixed-wing aircraft, Helicopters
  Unmanned (AVU): Remotely piloted air vehicles
Ordnance Vehicle (OV)
  Unmanned (OVU): Air-to-air missiles, Air-to-ground missiles, Smart bombs, Strategic missiles
Space Vehicle (SV)
  Manned (SVM): Passenger vehicle, Cargo vehicle, Space station
  Unmanned (SVU): Orbiting satellites (weather, communications), Exploratory space vehicles
Segment Data by Productivity Type (PT)
• Different productivities have been observed for different software application types.
• The SRDR dataset was segmented into 14 productivity types to increase the accuracy of estimating cost and schedule:
1. Sensor Control and Signal Processing (SCP)
2. Vehicle Control (VC)
3. Real Time Embedded (RTE)
4. Vehicle Payload (VP)
5. Mission Processing (MP)
6. System Software (SS)
7. Telecommunications (TEL)
8. Process Control (PC)
9. Scientific Systems (SCI)
10. Planning Systems (PLN)
11. Training (TRN)
12. Test Software (TST)
13. Software Tools (TUL)
14. Intelligence & Information Systems (IIS)
Example: Finding Productivity Type
Finding the Productivity Type (PT) using the Aircraft MIL-STD-881 WBS: the highest-level element (Level 1) represents the environment. In the AVM environment, the Avionics subsystem contains the Fire Control sub-subsystem with the sensor, navigation, air data, display, bombing computer, and safety domains. Each domain has an associated productivity type.
Env (Level 1) | Subsystem (Level 2) | Sub-subsystem (Level 3) | Domain (Level 4) | PT
AVM | Avionics | Fire Control | Search, target, tracking sensors | SCP
AVM | Avionics | Fire Control | Self-contained navigation | RTE
AVM | Avionics | Fire Control | Self-contained air data systems | RTE
AVM | Avionics | Fire Control | Displays, scopes, or sights | RTE
AVM | Avionics | Fire Control | Bombing computer | MP
AVM | Avionics | Fire Control | Safety devices | RTE
AVM | Avionics | Data Display and Controls | Multi-function display | RTE
AVM | Avionics | Data Display and Controls | Control display units | RTE
AVM | Avionics | Data Display and Controls | Display processors | MP
AVM | Avionics | Data Display and Controls | On-board mission planning | TRN
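A small lookup sketch of the domain-to-PT mapping in this example; the dictionary simply encodes the table above and would need to be extended for other WBS branches.

```python
# Domain -> Productivity Type mapping taken from the aircraft WBS example above.
DOMAIN_TO_PT = {
    "Search, target, tracking sensors": "SCP",
    "Self-contained navigation": "RTE",
    "Self-contained air data systems": "RTE",
    "Displays, scopes, or sights": "RTE",
    "Bombing computer": "MP",
    "Safety devices": "RTE",
    "Multi-function display": "RTE",
    "Control display units": "RTE",
    "Display processors": "MP",
    "On-board mission planning": "TRN",
}

def productivity_type(domain: str) -> str:
    """Return the PT for a Level 4 WBS domain, or 'unknown' if not yet mapped."""
    return DOMAIN_TO_PT.get(domain, "unknown")

print(productivity_type("Bombing computer"))  # -> MP
```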
Operating Environment & Productivity Type
[Matrix: Productivity Types (SCP, VC, RTE, VP, MP, SS, TEL, PC, SCI, PLN, TRN, TST, TUL, IIS) versus Operating Environments (GSF, GSM, GVM, GVU, MVM, MVU, AVM, AVU, OVU, SVM, SVU); an X marks each PT/OE combination present in the dataset]

When the dataset is segmented by Productivity Type and Operating Environment, the impacts accounted for by many COCOMO II model drivers are considered.
Data Analysis
Analysis Workflow
Prepared, Normalized & Segmented Data → Derive CER Model Form → Derive Final CER & reference data subset → Publish Productivity Benchmarks by Productivity Type & Size Group → Publish CER results → Derive SCER → Publish SCER

CER: Cost Estimating Relationship
PR: Productivity Ratio
SER: Schedule Estimating Relationship
SCER: Schedule Compression / Expansion Relationship
Software Productivity Benchmarks
• Productivity-based CER
• Software productivity refers to the ability of an organization to generate outputs using the resources that it currently has as inputs. Inputs typically include facilities, people, experience, processes, equipment, and tools. Outputs generated include software applications and the documentation used to describe them.
• The metric used to express software productivity is equivalent source lines of code (ESLOC) per person-month (PM) of effort. While many other measures exist, ESLOC/PM will be used because most of the data collected by the Department of Defense (DoD) on past projects is captured using these two measures. While controversy exists over whether or not ESLOC/PM is a good measure, consistent use of this metric (see Metric Definitions) provides for meaningful comparisons of productivity.
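A brief sketch of how benchmark statistics like those on the following slides (min, mean, max, standard deviation, and coefficient of variation of ESLOC/PM) could be computed for one PT/OE segment; the sample (ESLOC, person-month) pairs are invented, not study data.

```python
# Hypothetical sketch: productivity benchmark statistics for one segment.
# The (esloc, person_months) pairs below are invented examples, not study data.
import statistics

segment = [(45_000, 300), (120_000, 650), (20_000, 180), (80_000, 420)]

productivities = [esloc / pm for esloc, pm in segment]   # ESLOC per person-month
mean = statistics.mean(productivities)
stdev = statistics.stdev(productivities)

print(f"Obs: {len(productivities)}")
print(f"Min/Mean/Max ESLOC/PM: {min(productivities):.0f} / {mean:.0f} / {max(productivities):.0f}")
print(f"Std. Dev.: {stdev:.0f}   CV: {stdev / mean:.0%}")
```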
Software Productivity Benchmarks
Benchmarks by PT, across all operating environments**
PT | MIN (ESLOC/PM) | MEAN (ESLOC/PM) | MAX (ESLOC/PM) | Obs. | Std. Dev. | CV | KESLOC MIN | KESLOC MAX
SCP | 10 | 50 | 80 | 38 | 19 | 39% | 1 | 162
VP | 28 | 82 | 202 | 16 | 43 | 52% | 5 | 120
RTE | 33 | 136 | 443 | 52 | 73 | 54% | 1 | 167
MP | 34 | 189 | 717 | 47 | 110 | 58% | 1 | 207
SCI | 9 | 221 | 431 | 39 | 119 | 54% | 1 | 171
SYS | 61 | 225 | 421 | 60 | 78 | 35% | 2 | 215
IIS | 169 | 442 | 1039 | 36 | 192 | 43% | 1 | 180
** The following operating environments were included in the analysis:
• Ground Surface Vehicles
• Sea Systems
• Aircraft
• Missile / Ordnance (M/O)
• Spacecraft
Preliminary Results – More Records to be added
Software Productivity Benchmarks
Benchmarks by PT, Ground System Manned Only
PT | OE | MIN (ESLOC/PM) | MEAN (ESLOC/PM) | MAX (ESLOC/PM) | Obs. | Std. Dev. | CV | KESLOC MIN | KESLOC MAX
SCP | GSM | 27 | 56 | 80 | 13 | 17 | 30% | 1 | 76
RTE | GSM | 51 | 129 | 239 | 22 | 46 | 36% | 9 | 89
MP | GSM | 87 | 162 | 243 | 6 | 52 | 32% | 15 | 91
SYS | GSM | 115 | 240 | 421 | 28 | 64 | 26% | 5 | 215
SCI | GSM | 9 | 243 | 410 | 24 | 108 | 44% | 5 | 171
IIS | GSM | 236 | 376 | 581 | 23 | 85 | 23% | 15 | 180

Preliminary Results – More Records to be added
CV: Coefficient of Variation; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PT: Productivity Type; OE: Operating Environment
Cost Estimating Relationships
Preliminary Results – More Records to be added
CER Model Forms
• Effort = a * Size
• Effort = a * Size + b
• Effort = a * Size^b + c
• Effort = a * ln(Size) + b
• Effort = a * Size^b * Duration^c
• Effort = a * Size^b * c1 * c2 * ... * cn

where a is the production cost (cost per unit), b is the scaling factor, and c1 ... cn are % adjustment factors.

Log-log transform:
  ln(Effort) = b0 + (b1 * ln(Size)) + (b2 * ln(c1)) + (b3 * ln(c2)) + ...
Anti-log transform:
  Effort = e^b0 * Size^b1 * c1^b2 * c2^b3 * ...
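A minimal sketch of the log-log transform above: fit ln(Effort) against ln(Size) by ordinary least squares with numpy, then anti-log the intercept to recover the multiplicative form Effort = e^b0 * Size^b1. The sample data points are invented, not study data.

```python
# Hypothetical sketch of fitting Effort = a * Size^b via the log-log transform above.
# The (kesloc, pm) pairs are invented examples, not study data.
import numpy as np

kesloc = np.array([5.0, 12.0, 30.0, 75.0, 150.0])
pm = np.array([40.0, 110.0, 300.0, 900.0, 2100.0])

# Ordinary least squares on ln(PM) = b0 + b1 * ln(KESLOC)
X = np.column_stack([np.ones_like(kesloc), np.log(kesloc)])
b0, b1 = np.linalg.lstsq(X, np.log(pm), rcond=None)[0]

a = np.exp(b0)  # anti-log transform of the intercept
print(f"PM = {a:.3f} * KESLOC^{b1:.3f}")
print("Predicted PM at 50 KESLOC:", a * 50 ** b1)
```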
Software CERs by Productivity Type (PT)
CERs by PT, across all operating environments**
PT | Equation Form | Obs. | R² (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | PM = 1.266 * KESLOC^1.179 | 37 | 90% | 35% | 65 | 1 | 180
MP | PM = 3.477 * KESLOC^1.172 | 48 | 88% | 49% | 58 | 1 | 207
RTE | PM = 34.32 + KESLOC^1.515 | 52 | 68% | 61% | 46 | 1 | 167
SCI | PM = 21.09 + KESLOC^1.356 | 39 | 61% | 65% | 18 | 1 | 171
SCP | PM = 74.37 + KESLOC^1.714 | 36 | 67% | 69% | 31 | 1 | 162
SYS | PM = 16.01 + KESLOC^1.369 | 60 | 85% | 37% | 53 | 2 | 215
VP | PM = 3.153 * KESLOC^1.382 | 16 | 86% | 27% | 50 | 5 | 120

** The following operating environments were included in the analysis:
• Ground Surface Vehicles
• Sea Systems
• Aircraft
• Missile / Ordnance (M/O)
• Spacecraft
Preliminary Results – More Records to be added
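As a usage illustration, the sketch below evaluates two of the preliminary CERs from the table above (IIS and RTE) at a hypothetical size; treat the outputs as illustrative only, since these are preliminary results.

```python
# Evaluate two preliminary CERs from the table above at a hypothetical size.
def pm_iis(kesloc: float) -> float:
    return 1.266 * kesloc ** 1.179   # IIS: multiplicative form

def pm_rte(kesloc: float) -> float:
    return 34.32 + kesloc ** 1.515   # RTE: additive-offset form

size = 50.0  # KESLOC, hypothetical project size
print(f"IIS estimate: {pm_iis(size):.0f} PM")
print(f"RTE estimate: {pm_rte(size):.0f} PM")
```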
Software CERs for Aerial Vehicle Manned (AVM)
CERs by Productivity Type, AVM Only
PT | OE | Equation Form | Obs. | R² (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
MP | AVM | PM = 3.098 * KESLOC^1.236 | 31 | 88% | 50% | 59 | 1 | 207
RTE | AVM | PM = 5.611 * KESLOC^1.126 | 9 | 89% | 50% | 33 | 1 | 167
SCP | AVM | PM = 115.8 + KESLOC^1.614 | 8 | 88% | 27% | 62 | 6 | 162
Preliminary Results – More Records to be added
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PRED: Prediction (Level); PT: Productivity Type; OE: Operating Environment
Software CERs for Ground System Manned (GSM)
CERs by Productivity Type (GSM)
PT | OE | Equation Form | Obs. | R² (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | GSM | PM = 30.83 + 1.381 * KESLOC^1.103 | 23 | | 16% | 91 | 15 | 180
MP | GSM | PM = 3.201 * KESLOC^1.188 | 6 | 86% | 24% | 83 | 15 | 91
RTE | GSM | PM = 84.42 + KESLOC^1.451 | 22 | | 24% | 73 | 9 | 89
SCI | GSM | PM = 34.26 + KESLOC^1.286 | 24 | | 37% | 56 | 5 | 171
SCP | GSM | PM = 135.5 + KESLOC^1.597 | 13 | | 39% | 31 | 1 | 76
SYS | GSM | PM = 20.86 + 2.347 * KESLOC^1.115 | 28 | | 19% | 82 | 5 | 215
Preliminary Results – More Records to be added
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PT: Productivity Type; OE: Operating Environment
Software CERs for Space Vehicle Unmanned
CERs by Productivity Type (PT) - SVU Only
PT | OE | Equation Form | Obs. | R² (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
VP | SVU | PM = 3.153 * KESLOC^1.382 | 16 | 86% | 27% | 50 | 5 | 120
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PRED: Prediction (Level); PT: Productivity Type; OE: Operating Environment
Preliminary Results – More Records to be added
Schedule Estimating Relationships
Preliminary Results – More Records to be added
Schedule Estimating Relationships (SERs)
• SERs by Productivity Type (PT), across operating environments**
PT | Equation Form | Obs. | R² (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | TDEV = 3.176 * KESLOC^0.7209 / FTE^0.4476 | 35 | 65 | 25 | 68 | 1 | 180
MP | TDEV = 3.945 * KESLOC^0.968 / FTE^0.7505 | 43 | 77 | 39 | 52 | 1 | 207
RTE | TDEV = 11.69 * KESLOC^0.7982 / FTE^0.8256 | 49 | 70 | 36 | 55 | 1 | 167
SYS | TDEV = 5.781 * KESLOC^0.8272 / FTE^0.7682 | 56 | 71 | 27 | 62 | 2 | 215
SCP | TDEV = 34.76 * KESLOC^0.5309 / FTE^0.5799 | 35 | 62 | 26 | 64 | 1 | 165
** The following operating environments were included in the analysis:
• Ground Surface Vehicles
• Sea Systems
• Aircraft
• Missile / Ordnance (M/O)
• Spacecraft
Preliminary Results – More Records to be added
Size – People – Schedule Tradeoff
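The tradeoff chart itself is not reproduced here; the sketch below illustrates the same idea using the preliminary RTE SER from the previous slide: holding size fixed, evaluate TDEV = 11.69 * KESLOC^0.7982 / FTE^0.8256 at several staffing levels to see how added people compress the schedule. The project size and staffing levels are hypothetical.

```python
# Size-people-schedule tradeoff illustrated with the preliminary RTE SER above.
def tdev_rte(kesloc: float, fte: float) -> float:
    """Schedule in months for the RTE productivity type (preliminary SER)."""
    return 11.69 * kesloc ** 0.7982 / fte ** 0.8256

size = 75.0  # KESLOC, hypothetical project size
for fte in (5, 10, 20, 40):
    print(f"{fte:>3} FTE -> {tdev_rte(size, fte):5.1f} months")
```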
COCOMO 81 vs. New Schedule Equations
• Model Comparisons
PT | Obs. | New Schedule Equation | COCOMO 81 Equation
IIS | 35 | TDEV = 3.176 * KESLOC^0.7209 * FTE^-0.4476 | TDEV = 2.5 * PM^0.38
MP | 43 | TDEV = 3.945 * KESLOC^0.968 * FTE^-0.7505 | TDEV = 2.5 * PM^0.35
RTE | 49 | TDEV = 11.69 * KESLOC^0.7982 * FTE^-0.8256 | TDEV = 2.5 * PM^0.32
SYS | 56 | TDEV = 5.781 * KESLOC^0.8272 * FTE^-0.7682 | TDEV = 2.5 * PM^0.35
SCP | 35 | TDEV = 34.76 * KESLOC^0.5309 * FTE^-0.5799 | TDEV = 2.5 * PM^0.32
** The following operating environments were included in the analysis:
• Ground Surface Vehicles
• Sea Systems
• Aircraft
• Missile / Ordnance (M/O)
• Spacecraft
Preliminary Results – More Records to be added
COCOMO 81 vs. New Schedule Equations
• Model Comparisons using PRED (30%)
PT | Obs. | New Schedule Equations PRED(30) | COCOMO 81 Equations PRED(30)
IIS | 35 | 68 | 28
MP | 43 | 52 | 23
RTE | 49 | 55 | 16
SYS | 56 | 62 | 5
SCP | 35 | 64 | 8
** The following operating environments were included in the analysis:
• Ground Surface Vehicles
• Sea Systems
• Aircraft
• Missile / Ordnance (M/O)
• Spacecraft
Preliminary Results – More Records to be added
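For reference, a small sketch of one common way the accuracy measures used in these comparisons are computed: MAD as the mean absolute relative error and PRED(30) as the share of estimates within 30 percent of the actual value. The actual/estimated pairs are invented, not study data.

```python
# Hypothetical sketch of the accuracy measures used above: MAD and PRED(30).
# Actual/estimated schedule pairs are invented examples, not study data.
def relative_errors(actuals, estimates):
    return [abs(est - act) / act for act, est in zip(actuals, estimates)]

def mad(actuals, estimates):
    """Mean of the absolute relative errors (reported as a percentage)."""
    errs = relative_errors(actuals, estimates)
    return sum(errs) / len(errs)

def pred(actuals, estimates, level=0.30):
    """Fraction of estimates within +/- level of the actual value."""
    errs = relative_errors(actuals, estimates)
    return sum(1 for e in errs if e <= level) / len(errs)

actual = [10.0, 18.0, 26.0, 40.0]
estimate = [12.0, 16.5, 31.0, 38.0]
print(f"MAD: {mad(actual, estimate):.0%}   PRED(30): {pred(actual, estimate):.0%}")
```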
Conclusions
Conclusion
• Developing CERs and Benchmarks by grouping appears to account for some of the variability in estimating relationships.
• Grouping software applications by Operating Environment and Productivity Type appears to have promise, but needs refinement.
• Analyses shown in this presentation are preliminary, as more data is available for analysis
  – It requires preparation first
Future Work
• Productivity Benchmarks need to be segregated by size groups
• More data is available to fill in missing cells in the OE-PT table
• Workshop recommendations will be implemented
  – New data grouping strategy
• Data repository that provides drill-down to source data
  – Presents the data to the analyst
  – If there is a question, it is possible to navigate to the source document, e.g., data collection form, project notes, EVM data, Gantt charts, etc.
• Final results will be published online at http://csse.usc.edu/afcaawiki