University of Southern California
Center for Systems and Software Engineering
Tutorial: System and Software Sizing
Barry Boehm, Ali Afzal Malik, USC-CSSE
COCOMO/SSCM Forum Sizing Workshop
October 31, 2007
Outline
• Nature and value of sizing
• Sizing paradoxes and challenges
• Sizing metrics
• Sizing methods and tools
• Sizing research and development needs
• Conclusions and references
Nature and Value of Sizing
• Definitions and general characteristics
• Criteria for good sizing parameters
• Value provided by sizing
• When does sizing add less value?
Sizing Definitions and Characteristics
• Dictionary definition: bigness, bulk, magnitude
• Generally considered additive
  – Size (A ∪ B) = Size (A) + Size (B)
• Often weighted by complexity
  – Some artifacts “heavier” to put in place
  – Requirements, function points
Criteria for Good Sizing Metrics
(Stutzke, 2005)
Criteria            Definition
Relevance           Represents the characteristic of interest.
Accuracy            Faithfully represents the amount or degree of that characteristic.
Adequate Precision  The level of detail needed by the user is achievable.
Dependable          Values are measured consistently. Mathematical operations “work” as expected.
Timely              User receives values in time to act on them.
Affordable          The value of the information exceeds the cost of obtaining it.
Predictability      Adequately accurate forecasts of future values are possible.
Controllability     Actions exist that can influence the measured value.
Value Provided by Sizing
• Often aids in cost and effort estimation (sketched below)
  – Cost (or Effort) = (# size units) × (cost or effort per size unit)
• Denominator for productivity and quality comparisons
  – Cost, effort / size unit
  – Defects / size unit
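To make both uses concrete, here is a minimal numeric sketch in Python; the size, effort-per-KSLOC, cost, and defect figures are hypothetical, not data from the tutorial.

```python
# Hypothetical example: size as the basis for cost/effort estimation and as the
# denominator for productivity and quality comparisons.
size_ksloc = 50.0            # estimated size (KSLOC), assumed
effort_per_ksloc = 2.5       # person-months per KSLOC, assumed historical rate
cost_per_pm = 15_000         # dollars per person-month, assumed

effort_pm = size_ksloc * effort_per_ksloc      # (# size units) x (effort per size unit)
cost = effort_pm * cost_per_pm
productivity = size_ksloc * 1000 / effort_pm   # SLOC per person-month
defect_density = 120 / size_ksloc              # defects per KSLOC, assuming 120 defects found

print(f"Effort: {effort_pm:.0f} PM   Cost: ${cost:,.0f}")
print(f"Productivity: {productivity:.0f} SLOC/PM   Quality: {defect_density:.1f} defects/KSLOC")
```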
When Does Sizing Add Less Value?
• Often easier to go directly to estimating effort
  – Imprecise size parameters
    • GUI builders; COTS integration
  – Familiar, similar-size applications
    • Analogy to previous effort: “yesterday’s weather”
• Value in having performers do direct effort estimation
  – Stronger in-budget completion commitment
  – Better understanding of job to be done
  • Sizing can add some value here
• When size is a dependent variable
  – Time-boxing with prioritized features
Outline
• Nature and value of sizing
• Sizing paradoxes and challenges
• Sizing metrics
• Sizing methods and tools
• Sizing research and development needs
• Conclusions and references
Paradox 1: More Software May Not Be Better
Product        Size (SLOC)   Effort (PM)   SLOC/PM
TRW UNAS 0.7        12,200            28       436
TRW UNAS 0.9         5,200            38       137

• Benjamin Franklin (paraphrased): “My apologies for the length of this letter.
  Had I more time, it would have been shorter.”
Productivity Paradoxes 2, 3
• Paradox 2: Cheaper software may not be better
  – More critical defects (Ariane V, Mars Climate Orbiter)
  – Later to market (cube-root vs. square-root law)
• Paradox 3: More reuse may not be better
  – Reusing obsolete software
  – Gaming the metrics: localization as reuse
Traditional Minimum-Cost Schedule
• Months = 3 × ∛(Person-Months)
• 27 PM → 9 months, average of 3 persons
• Low communications overhead, but late delivery
• Preferable RAD approach: 5.2 months, average of 5.2 persons
  – Square-root law: Months = √(Person-Months) (both rules are sketched below)
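A minimal sketch of the two schedule rules for the 27 person-month example on this slide; the cube-root constant of 3 and the square-root law are taken directly from the slide.

```python
import math

pm = 27                                  # total effort in person-months

months_cube = 3 * pm ** (1 / 3)          # traditional rule: Months = 3 * cube root of PM
staff_cube = pm / months_cube            # average staff level

months_sqrt = math.sqrt(pm)              # square-root (RAD) rule: Months = sqrt(PM)
staff_sqrt = pm / months_sqrt

print(f"Cube-root rule:   {months_cube:.1f} months with {staff_cube:.1f} persons on average")
print(f"Square-root rule: {months_sqrt:.1f} months with {staff_sqrt:.1f} persons on average")
```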
Productivity Pitfalls
• WYMIWYG: What you measure is what you get
  – Weinberg data I
• One approach fits all situations
  – Weinberg data II
• Assuming that asking for precise data will produce precise data
  – Not if it takes much extra effort
WYMIWYG: What You Measure Is What You Get*
*Weinberg-Schulman, 1974
[Chart not reproduced in this extract: rankings by objective, where 1 = best.]
Effect of Objectives on Productivity
(Weinberg-Schulman, 1974)
- Same application: Solve linear equations
Challenges: Product Emergence and Elaboration
• Brooks factors: software system, product
• Seaver emergence and elaboration data
• Cone of uncertainty data
Brooks’ Factor of 9 for Programming System Product
[Diagram adapted from Brooks, 1995 (not fully reproduced): a Program takes ×3 the effort to become a Programming Product or a Programming System, and ×9 to become a Programming Systems Product.]
• Product: handles off-nominal cases, testing; well-documented
• System: handles interoperability, data management, business management
• Telecom: 1/6 basic call processing; 1/3 off-nominals; 1/2 billing
Emergence and Elaboration Data
(Seaver, 2007)
• Example: three cost estimates for the same project – an employee information database

                                     Transactions   Data   Total Cost    Transaction Costs   Data Cost
What the CIO was asked to do                   15      4     $182,400           $122,143       $60,257
What IT added once they were asked            148     31   $1,710,000         $1,205,143      $504,857
What the HR department added                  153     32   $1,767,000         $1,245,857      $521,143
Totals                                        316     67   $3,659,400         $2,573,143    $1,086,257
Emergence and Elaboration 2
(Seaver, 2007)
[Transaction/data matrix not fully reproduced. The CIO request covers Core Employee Data, Online Help, and User Access Data, totaling 15 transactions (create, update, delete, read, report) and 4 data elements (save in file, read from file).]
• How big is it now?
• How much of what we have now do we need to keep?
• How much data do we need to transition?
• Any security, compliance, or privacy issues?
Emergence and Elaboration 3
(Seaver, 2007)
[Transaction/data matrix not fully reproduced. What IT knows adds 4 alternate data feeds, international data (UK, Japan, Germany, China), inputs from the hiring function (7), business units, HR functions (12), and data encryption, growing the request to 148 transactions and 31 data elements.]
• Are all business processes and rules covered?
Emergence and Elaboration 4
(Seaver, 2007)
[Transaction/data matrix not fully reproduced. What the HR department added includes end-user inputs, business units, HR functions (12), data encryption, a report writer, error notification (data audit), and data push/pull for error correction, growing the request to 153 transactions and 32 data elements.]
• Do we need all of this?
The Cost and Size Cone of Uncertainty
• If you don’t know what you’re building, it’s hard to estimate its size or cost
[Cone-of-uncertainty chart not reproduced (Boehm et al., 2000).]
Outline
• Nature and value of sizing
• Sizing paradoxes and challenges
• Sizing metrics
• Sizing methods and tools
• Sizing research and development needs
• Conclusions and references
Sizing Metrics vs. Time and Degree of Detail
(Stutzke, 2005)
Process Phase   Possible Measures                                 Primary Aids
Concept         Subsystems, Key Features                          Product Vision, Analogies
Elaboration     User Roles, Use Cases, Screens, Reports, Files,   Operational Concept, Specification,
                Function Points, Application Points               Context Diagram, Feature List
Construction    Components, Objects, Source Lines of Code,        Architecture, Top-Level Design,
                Logical Statements                                Detailed Design, Code

(Increasing time and detail →)
Example Metric Rating: Dependability
• Subsystems, Use Cases, Requirements
– Low: hard to pin down uniform level of detail
• Screens, Function Points, Components,
Objects
– Medium: takes some training and experience
• Lines of Code, Code Information Content
(Halstead)
– High: can be automatically counted
COSYSMO Sizing
• 4 size drivers*
  – Number of system requirements
  – Number of major interfaces
  – Number of operational scenarios
  – Number of critical algorithms

*Each weighted by complexity, volatility, and degree of reuse (see the sketch below)
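A minimal sketch of how the four weighted size drivers can be rolled up into a single size value. The easy/nominal/difficult weights below are illustrative placeholders, not calibrated COSYSMO weights; see Valerdi (2005) for the actual model.

```python
# Illustrative COSYSMO-style size roll-up. Weights are placeholders, NOT the
# calibrated COSYSMO values.
WEIGHTS = {                      # (easy, nominal, difficult)
    "system_requirements":   (0.5, 1.0, 5.0),
    "major_interfaces":      (1.0, 4.0, 8.0),
    "operational_scenarios": (6.0, 14.0, 30.0),
    "critical_algorithms":   (2.0, 4.0, 12.0),
}

def weighted_size(counts):
    """counts: {driver: (n_easy, n_nominal, n_difficult)} -> weighted size."""
    return sum(n * w
               for driver, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[driver]))

print(weighted_size({
    "system_requirements":   (40, 25, 5),
    "major_interfaces":      (6, 4, 2),
    "operational_scenarios": (2, 3, 1),
    "critical_algorithms":   (3, 2, 0),
}))
```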
Cockburn Use Case Level of Detail Scale
(Cockburn, 2001)
Function Points
• 5 function types (IFPUG)
  – External Input (EI), External Output (EO), External Query (EQ), Internal Logical File (ILF), External Interface File (EIF)
  – Complexity levels: Low, Average, High
  – Each combination of complexity level and function type assigned a weight
  – Unadjusted Function Points (UFPs): weighted sum of count × weight (see the sketch below)
• Implementation-independent metric
• Available at an early stage
• Variant: Feature Points (Jones, 1996)
  – Average complexity only, for all types
  – Sixth type: algorithms
  – Simple
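A minimal UFP computation sketch. The complexity weights are the commonly published IFPUG values (e.g., as reproduced in Boehm et al., 2000); the function counts are hypothetical.

```python
# Unadjusted Function Points: weighted sum of counts by function type and complexity.
WEIGHTS = {              # (Low, Average, High)
    "EI":  (3, 4, 6),
    "EO":  (4, 5, 7),
    "EQ":  (3, 4, 6),
    "ILF": (7, 10, 15),
    "EIF": (5, 7, 10),
}

def ufp(counts):
    """counts: {function_type: (n_low, n_avg, n_high)} -> unadjusted FP."""
    return sum(n * w
               for ftype, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[ftype]))

# Hypothetical application: 10 low EIs, 5 average EOs, 4 low EQs, 3 high ILFs, 2 low EIFs.
print(ufp({"EI": (10, 0, 0), "EO": (0, 5, 0), "EQ": (4, 0, 0),
           "ILF": (0, 0, 3), "EIF": (2, 0, 0)}))        # 30 + 25 + 12 + 45 + 10 = 122
```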
Object-Oriented Metrics
• Tradeoff (Stutzke, 2005)
  – Objects: application domain vs. solution domain
• Six OO metrics (Chidamber and Kemerer, 1994), illustrated in the sketch below
  – Weighted Methods per Class (WMC)
  – Depth of Inheritance Tree (DIT)
  – Number of Children (NOC)
  – Coupling Between Object Classes (CBO)
  – Response for a Class (RFC)
  – Lack of Cohesion in Methods (LCOM)
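A minimal sketch of what some of these metrics mean in practice, computed for toy Python classes by static introspection. This is illustrative only: WMC is simplified to an unweighted count of locally defined methods, and DIT excludes Python's implicit `object` base.

```python
import inspect

def wmc(cls):
    # Weighted Methods per Class, simplified: each locally defined method has weight 1.
    return sum(1 for v in vars(cls).values() if inspect.isfunction(v))

def dit(cls):
    # Depth of Inheritance Tree, ignoring the implicit `object` root.
    return len([c for c in inspect.getmro(cls) if c is not object]) - 1

def noc(cls):
    # Number of Children: direct subclasses only.
    return len(cls.__subclasses__())

class Shape:
    def area(self): ...
    def perimeter(self): ...

class Circle(Shape):
    def area(self): ...
    def radius(self): ...

print(wmc(Circle), dit(Circle), noc(Shape))   # 2 1 1
```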
Lines of Code
• Source Lines of Code (SLOC) = logical source statements
• Logical source statements = data declarations + executable statements
• CodeCount tools available on the USC CSSE web site (a simplified counter is sketched below)
(Adapted from Madachy, 2005)
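A deliberately simplistic counter sketch, not the USC CodeCount tool: it approximates logical SLOC by counting non-blank lines of a Python file that are not full-line comments, in the spirit of the counting rules on the next slide. Real counters parse statement terminators per language.

```python
def count_sloc(path, comment_prefix="#"):
    """Approximate logical SLOC: count non-blank lines that are not full-line comments."""
    sloc = 0
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith(comment_prefix):
                sloc += 1          # declarations and executable statements are counted
    return sloc                    # blank lines and full-line comments are excluded

print(count_sloc(__file__))        # count this script itself
```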
SLOC Counting Rules
• Standard definition for counting lines
  – Based on the SEI definition checklist from CMU/SEI-92-TR-20
  – Modified for COCOMO II

Statement type                           Includes   Excludes
1. Executable                                ✓
2. Non-executable:
3.   Declarations                            ✓
4.   Compiler directives                     ✓
5.   Comments:
6.     On their own lines                                ✓
7.     On lines with source code                         ✓
8.     Banners and non-blank spacers                     ✓
9.     Blank (empty) comments                            ✓
10.    Blank lines                                       ✓

(Adapted from Madachy, 2005)
Relationship Among Sizing Metrics
• Two broad categories of sizing metrics
  – Implementation-specific, e.g., Source Lines of Code (SLOC)
  – Implementation-independent, e.g., Function Points (FP)
• Need to relate the two categories
  – e.g., SLOC/FP backfire ratios (see the sketch below)
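A minimal backfiring sketch. The SLOC-per-FP ratios below are illustrative placeholders in the spirit of the Jones (1996) tables; the backfiring slide later notes that other published ratios run up to 60% higher.

```python
SLOC_PER_FP = {"C": 128, "C++": 55, "Java": 53, "COBOL": 107}   # illustrative ratios only

def backfire(function_points, language):
    """Convert unadjusted function points to an approximate SLOC figure."""
    return function_points * SLOC_PER_FP[language]

print(backfire(122, "Java"))   # ~6,500 SLOC for a hypothetical 122-FP application
```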
Multisource Estimation
Implementation-independent/dependent
• Implementation-independent estimators
– Use cases, function points, requirements
• Advantage: implementation-independent
– Good for productivity comparisons when using VHLLs,
COTS, reusable components
• Weakness: implementation-independent
– Gives same estimate when using VHLLs, COTS, reusable
components, 3GL development
• Multisource estimation reduces risk
SLOC/FP Backfiring Table
(Jones, 1996); other backfire ratios up to 60% higher
[Backfiring table not reproduced in this extract.]
Reused and Modified Software
• Effort for adapted software (reused or
modified) is not the same as for new
software.
• Approach: convert adapted software into
equivalent size of new software.
Nonlinear Reuse Effects
• The reuse cost function does not go through the origin, due to a cost of about 5% for assessing, selecting, and assimilating the reusable component.
• Small modifications generate disproportionately large costs, primarily due to the cost of understanding the software to be modified and the relative cost of interface checking.

[Figure not fully reproduced: relative cost vs. amount modified, based on data on 2,954 NASA modules (Selby, 1988). Near-zero modification already costs about 0.046 of new development, and the observed relative costs (labeled points around 0.55–0.75) rise well above the usual linear assumption.]
COCOMO Reuse Model
• A nonlinear estimation model to convert adapted (reused or modified) software into equivalent size of new software:

  AAF = 0.4 (DM) + 0.3 (CM) + 0.3 (IM)

  ESLOC = ASLOC × [AA + AAF × (1 + 0.02 (SU)(UNFM))] / 100,  for AAF ≤ 50
  ESLOC = ASLOC × [AA + AAF + (SU)(UNFM)] / 100,             for AAF > 50
COCOMO Reuse Model cont’d
• ASLOC – Adapted Source Lines of Code
• ESLOC – Equivalent Source Lines of Code
• AAF – Adaptation Adjustment Factor
• DM – Percent Design Modified: the percentage of the adapted software's design which is modified in order to adapt it to the new objectives and environment.
• CM – Percent Code Modified: the percentage of the adapted software's code which is modified in order to adapt it to the new objectives and environment.
• IM – Percent of Integration Required for Modified Software: the percentage of effort required to integrate the adapted software into an overall product and to test the resulting product, as compared to the normal amount of integration and test effort for software of comparable size.
• AA – Assessment and Assimilation: effort needed to determine whether a fully reused software module is appropriate to the application, and to integrate its description into the overall product description. See table.
• SU – Software Understanding: effort increment as a percentage. Only used when code is modified (zero when DM = 0 and CM = 0). See table.
• UNFM – Unfamiliarity: the programmer's relative unfamiliarity with the software, applied multiplicatively to the software understanding effort increment (0–1).

(A worked computation of these formulas is sketched below.)
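A minimal sketch that applies the reuse formulas from the previous slide. DM, CM, IM, AA, and SU are entered as percentages and UNFM on its 0–1 scale; the sample values are hypothetical.

```python
def esloc(asloc, dm, cm, im, aa, su, unfm):
    """Equivalent SLOC for adapted code (COCOMO II reuse model; dm, cm, im, aa, su in percent)."""
    if dm == 0 and cm == 0:
        su = 0                                   # SU applies only when code is modified
    aaf = 0.4 * dm + 0.3 * cm + 0.3 * im         # Adaptation Adjustment Factor
    if aaf <= 50:
        return asloc * (aa + aaf * (1 + 0.02 * su * unfm)) / 100
    return asloc * (aa + aaf + su * unfm) / 100

# 10,000 adapted SLOC; 10% design and 20% code modified; 30% integration effort;
# AA = 4, SU = 30 (nominal structure), UNFM = 0.4 (somewhat familiar).
print(round(esloc(10_000, dm=10, cm=20, im=30, aa=4, su=30, unfm=0.4)))   # ~2,756 ESLOC
```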
Assessment and Assimilation Increment (AA)
AA Increment   Level of AA Effort
0              None
2              Basic module search and documentation
4              Some module Test and Evaluation (T&E), documentation
6              Considerable module T&E, documentation
8              Extensive module T&E, documentation
Software Understanding Increment (SU)
• Take the subjective average of the three categories.
• Do not use SU if the component is being used unmodified (DM = 0 and CM = 0).
Structure
  Very Low:  Very low cohesion, high coupling, spaghetti code.
  Low:       Moderately low cohesion, high coupling.
  Nominal:   Reasonably well-structured; some weak areas.
  High:      High cohesion, low coupling.
  Very High: Strong modularity, information hiding in data/control structures.

Application Clarity
  Very Low:  No match between program and application world views.
  Low:       Some correlation between program and application.
  Nominal:   Moderate correlation between program and application.
  High:      Good correlation between program and application.
  Very High: Clear match between program and application world views.

Self-Descriptiveness
  Very Low:  Obscure code; documentation missing, obscure, or obsolete.
  Low:       Some code commentary and headers; some useful documentation.
  Nominal:   Moderate level of code commentary, headers, documentation.
  High:      Good code commentary and headers; useful documentation; some weak areas.
  Very High: Self-descriptive code; documentation up-to-date, well-organized, with design rationale.

SU Increment to ESLOC
  Very Low: 50   Low: 40   Nominal: 30   High: 20   Very High: 10
Programmer Unfamiliarity (UNFM)
• Only applies to modified software
UNFM Increment   Level of Unfamiliarity
0.0              Completely familiar
0.2              Mostly familiar
0.4              Somewhat familiar
0.6              Considerably familiar
0.8              Mostly unfamiliar
1.0              Completely unfamiliar
Software Maintenance
• Reuse model also addresses software
maintenance sizing
– all of reused software becomes the maintenance
base, not “equivalent SLOC”
Outline
• Nature and value of sizing
• Sizing paradoxes and challenges
• Sizing metrics
• Sizing methods and tools
• Sizing research and development needs
• Conclusions and references
Basic Methods, Strengths, and Weaknesses
(Adapted from Boehm, 1981)
• Pair-wise comparison
  – Strengths: accurate assessment of relative size
  – Weaknesses: absolute size of benchmark must be known
• Expert judgment
  – Strengths: assessment of representativeness, interactions, exceptional circumstances
  – Weaknesses: no better than participants; biases, incomplete recall
• Analogy
  – Strengths: based on representative experience
  – Weaknesses: representativeness of experience
• Parkinson
  – Strengths: correlates with some experience
  – Weaknesses: reinforces poor practice
• Price to win
  – Strengths: often gets the contract
  – Weaknesses: generally produces large overruns
• Top-down
  – Strengths: system-level focus; efficient
  – Weaknesses: less detailed basis; less stable
• Bottom-up
  – Strengths: more detailed basis; more stable; fosters individual commitment
  – Weaknesses: may overlook system-level costs; requires more effort
Counting Artifacts
• Artifacts: requirements, inputs, outputs,
classes, use cases, modules, scenarios
– Often weighted by relative difficulty
– Easy to count at initial stage
– Estimates may differ based on level of detail
Comparison with Previous Projects
• Comparable metadata: domain, user base,
platform, etc.
• Pair-wise comparisons
• Differential functionality
– analogy; yesterday’s weather
Group Consensus
• Wideband Delphi
  – Anonymous estimates
  – Facilitator provides summary
  – Estimators discuss results and rationale
  – Iterative process
  – Estimates converge after revision in subsequent rounds
  – Improves understanding of the product’s nature and scope
  – Works when estimators are collocated
• Planning Poker (Cohen, 2005)
  – Game: deck of cards
  – Moderator provides the item to be estimated
  – Participants privately choose the appropriate card from the deck
  – Divergence is discussed
  – Iterative: convergence in subsequent rounds
  – Ensures everyone participates
  – Useful for estimation in agile projects
Probabilistic Methods
• PERT (Putnam and Fitzsimmons, 1979)
  – 3 estimates: optimistic (a), most likely (m), pessimistic (b)
  – Expected Size = [optimistic + 4 × (most likely) + pessimistic] / 6
  – Std. Dev. = (pessimistic – optimistic) / 6
  – Easy to use
  – Ratio reveals uncertainty
  – Example (see the sketch below):

Component   ai    mi    bi    Ei     δi
SALES       6K   10K   20K   11K    2.33K
DISPLAY      4     7    13    7.5   1.5
INEDIT       8    12    19   12.5   1.83
TABLES       4     8    12    8     1.33
TOTALS      22    37    64   39     δE = 3.6
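A minimal sketch reproducing the example above. It assumes the component estimates are independent, so the total standard deviation combines the component deviations in quadrature, which is how the δE ≈ 3.6 on this slide is obtained.

```python
from math import sqrt

components = {              # (optimistic, most likely, pessimistic) size estimates
    "SALES":   (6, 10, 20),
    "DISPLAY": (4, 7, 13),
    "INEDIT":  (8, 12, 19),
    "TABLES":  (4, 8, 12),
}

total_e, total_var = 0.0, 0.0
for name, (a, m, b) in components.items():
    e = (a + 4 * m + b) / 6          # expected size
    d = (b - a) / 6                  # standard deviation
    total_e += e
    total_var += d ** 2              # variances add for independent components
    print(f"{name:8s} E = {e:5.2f}   d = {d:4.2f}")

print(f"TOTALS   E = {total_e:5.2f}   dE = {sqrt(total_var):4.2f}")   # 39.00, ~3.58
```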
Why Do People Underestimate Size?
(Boehm, 1981)
• Basically optimistic and desire to please
• Have incomplete recall of previous
experience
• Generally not familiar with the entire
software job
Sizing R & D Needs
• Counting rules: unambiguous, hard to game
– level of detail for implementation-independent metrics
(e.g. # of requirements) grows with elaboration
• Relations among metrics
– SLOC per FP, requirement, use case, etc.
• Accounting for reuse, volatility, complexity
– avoiding double counting
• Automating the counting process
• Critical success factors for judgment-based sizing
methods
• More empirical data and analysis
Conclusions
• Size plays an important role in estimation and project management activities
• Software estimation is a “garbage in, garbage out” process
  – Bad size in; bad cost out
• A number of paradoxical situations and challenges make estimation of size difficult
• Different sizing metrics, with varying degrees of detail, are available at different stages of the project lifecycle
• A plethora of sizing methods exists, each with a unique mix of strengths and weaknesses
• Sizing R&D is needed to bridge the gap between systems and software engineering
References
• Books
  – Boehm, Barry W., Software Engineering Economics, Prentice Hall, 1981.
  – Boehm, Barry W., et al., Software Cost Estimation with COCOMO II, Prentice Hall, 2000.
  – Brooks, Frederick P., Jr., The Mythical Man-Month, Addison-Wesley, 1995.
  – Cockburn, A., Writing Effective Use Cases, Addison-Wesley, 2001.
  – Cohen, M., Agile Estimating and Planning, Prentice Hall, 2005.
  – Futrell, Robert T., et al., Quality Software Project Management, Prentice Hall, 2002.
  – Jones, C., Applied Software Measurement: Assuring Productivity and Quality, McGraw-Hill, 1996.
  – Stutzke, Richard D., Estimating Software-Intensive Systems, Addison-Wesley, 2005.
• Journals
  – Chidamber, S., and C. Kemerer, “A Metrics Suite for Object Oriented Design,” IEEE Transactions on Software Engineering, 1994.
  – Putnam, L. H., and Fitzsimmons, A., “Estimating Software Costs,” Datamation, 1979.
• Tutorials/Lectures/Presentations
  – Boehm, Barry W., CSCI 577a Lecture, “Cost Estimation with COCOMO II”, 2004.
  – Boehm, Barry W., BAE Systems SPIRE Tutorial, “The COCOMO II Suite of Software Cost Estimation Models”, 2004.
  – Boehm, Barry W., Microsoft Presentation, “Software Productivity Perspectives, Paradoxes, Pitfalls, and Prospects”, 1999.
  – Madachy, Ray, CSCI 510 Lecture, “COCOMO II Overview”, 2005.
  – Seaver, David, “Tactical Benchmarking – The Rosetta Stone for Linking IT and Business Results”, 2007.
  – Valerdi, Ricardo, INCOSE Presentation, “The Constructive Systems Engineering Cost Model”, 2005.
• Websites
  – http://www.crisp.se/planningpoker/
  – http://www.ifpug.org/
  – http://www.spr.com/