
Lessons Learned for Development and Management of Large-Scale Software Systems
Rick Selby
Director of Software Products
Northrop Grumman Aerospace Systems
310-813-5570, Rick.Selby@NGC.com
Adjunct Professor of Computer Science
University of Southern California
© Copyright 2011. Richard W. Selby. All rights reserved.
Organizational Charter Focuses on Embedded Software Products
• Embedded software for
  - Advanced robotic spacecraft platforms
  - High-bandwidth satellite payloads
  - High-power laser systems
• High-reliability, long-life, real-time embedded software systems
• Emphasis on both system management and payload software
• Reusable, reconfigurable software architectures and components
• Languages: O-O to C to assembly
• CMMI Level 5 for Software in February 2004; ISO/AS9100; Six Sigma
• Software process flow for each build, with 3-15 builds per program: Software Development Lab, Software Analysis, Software Peer Review
• Representative programs: Prometheus/JIMO, NPOESS, JWST, EOS Aqua/Aura, Chandra, AEHF, MTHEL, Airborne Laser, GeoLITE, Restricted
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
  - People are the largest lever
  - Engage your stakeholders and set expectations
  - Embrace change because change is value
• Lifecycle and architecture strategy
  - Prioritize features and align resources for high payoff
  - Develop products incrementally
  - Iteration facilitates efficient learning
  - Reuse drives favorable effort and quality economics
• Execution
  - Organize to enable parallel activities
  - Invest resources in high return-on-investment activities
  - Automate testing for early and frequent defect detection
  - Create schedule margin by delivering early
• Decision making
  - Measurement enables visibility
  - Modeling and estimation improve decision making
  - Apply risk management to mitigate risks early
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
• Lifecycle and architecture strategy
• Execution
• Decision making
People are the Largest Lever
• Barry Boehm's comparisons of actual productivity rates across projects substantiated the COCOMO productivity multiplier factor of 4.18 due to personnel/team capability, etc. (see the sketch below)
Source: B. Boehm, “Improving Software Productivity,” IEEE Computer, Vol. 20, Issue 9, September 1987,
pp. 43-57.
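As a rough illustration of how large that personnel/team multiplier is, the sketch below plugs illustrative values into a simplified COCOMO-style effort equation; the coefficients, size, and multiplier values are made up for illustration and are not the published COCOMO tables.

```python
# Simplified COCOMO-style effort model: effort = A * KSLOC**B * product(effort multipliers).
# Coefficients and multiplier values below are illustrative only, not the published tables.
A, B = 3.0, 1.12
ksloc = 100.0

nominal_effort = A * ksloc ** B          # person-months with all multipliers at 1.0

# A 4.18x productivity range for personnel/team capability means the weakest-rated team
# can take roughly 4.18x the effort of the strongest-rated team on the same job.
low_capability_multiplier = 2.0          # illustrative penalty for a weak team
high_capability_multiplier = 2.0 / 4.18  # illustrative credit for a strong team (ratio = 4.18)

print(f"Nominal effort:       {nominal_effort:.0f} person-months")
print(f"Weak-team estimate:   {nominal_effort * low_capability_multiplier:.0f} person-months")
print(f"Strong-team estimate: {nominal_effort * high_capability_multiplier:.0f} person-months")
```

The point of the sketch is only the ratio: with everything else held constant, the personnel/team capability factor alone swings the estimate by more than a factor of four.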
Engage Your Stakeholders and Set Expectations
Source: B. Boehm, “Critical Success Factors for Schedule Estimation and Improvement,” 26th International
Forum on Systems, Software, and COCOMO Cost Modeling, November 2, 2011.
Embrace Change because Change is Value
Criteria for high adoption rate for innovations:
• Relative advantage – The innovation is technically superior (in terms of cost, functionality, "image", etc.) to the technology it supersedes.
• Compatibility – The innovation is compatible with existing values, skills, and work practices of potential adopters.
• Lack of complexity – The innovation is not relatively difficult to understand and use.
• Trialability – The innovation can be experimented with on a trial basis without undue effort and expense; it can be implemented incrementally and still provide a net positive benefit.
• Observability – The results and benefits of the innovation's use can be easily observed and communicated to others.
Source: E. M. Rogers, Diffusion of Innovations, Free Press, New York, 1983.
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
• Lifecycle and architecture strategy
• Execution
• Decision making
Prioritize Features and Align Resources for High Payoff
• Synchronize-and-stabilize lifecycle has planning, development, and stabilization phases
• Planning phase
  - Vision statement – Product and program management use extensive customer input to identify and prioritize product features
  - Specification document – Based on vision statement, program management and development group define feature functionality, architectural issues, and component interdependencies
  - Schedule and feature team formation – Based on specification document, program management coordinates schedule and arranges feature teams that each contain approximately 1 program manager, 3-8 developers, and 3-8 testers (who work in parallel 1:1 with developers)
• Development phase
  - Program managers coordinate evolution of specification. Developers design, code, and debug. Testers pair up with developers for continuous testing.
  - Subproject I – First 1/3 of features: Most critical features and shared components.
  - Subproject II – Second 1/3 of features.
  - Subproject III – Final 1/3 of features: Least critical features.
• Stabilization phase
  - Program managers coordinate OEMs and ISVs and monitor customer feedback. Developers perform final debugging and code stabilization. Testers recreate and isolate errors.
  - Internal testing – Thorough testing of complete product within the company.
  - External testing – Thorough testing of complete product outside the company by "beta" sites such as OEMs, ISVs, and end-users.
  - Release preparation – Prepare final release of "golden master" version and documentation for manufacturing.
Develop Products Incrementally
• Synchronize-and-stabilize lifecycle timeline and milestones enable frequent incremental deliveries
[Figure: lifecycle phases, timeline, milestones, major reviews, and intermediate documents/activities. Planning (3-12 months): Milestone 0, vision statement, specification document, prototypes, design feasibility studies, testing strategy, schedule, implementation plan, project plan approval. Development (6-12 months): Subprojects I, II, and III, each a 2-4 month development subproject covering 1/3 of all features, with code and optimizations, testing and debugging, feature stabilization, integration, and 2-5 weeks of buffer time; specification review, project review, Milestone I/II/III releases, visual freeze, feature complete, code complete. Stabilization (3-8 months): internal testing, beta testing, buffer time, zero bug release, release to manufacturing (ship date), schedule complete, postmortem document.]
Iteration Facilitates Efficient Learning
• Incremental software builds deliver early capabilities and accelerate integration and test
• Iteration helps refine problem statements, create potential solutions, and elicit feedback
• Incremental software deliveries support integration and test activities and synchronize with JPL, Hamilton Sundstrand, and Naval Reactors to facilitate teaming, reduce risk, and enhance mission assurance
[Figure 4.3-4. JIMO Incremental Software Builds: CY 2004-2013 schedule of incremental builds for the Flight Computer Unit (FCU1-FCU7), Science Computer Unit (SCU1-SCU2, common software only, no instrument software), Data Server Unit (DSU1-DSU3), and Ground Analysis Software (GAS1-GAS2) computers, spanning preliminary and final executive and C&DH software, science computer and power controller interfaces, AACS (including autonomous navigation), thermal and power control, configuration and fault protection, data server unique software, and ground analysis software. Each build progresses through requirements, preliminary design, detailed design, code and unit test/software integration, and verification and validation, with activities shared between JPL and NGC; deliveries feed preliminary and final hardware/software integration, reactor power controller integration, validation on the SMTB/SSTB testbeds, and ground system integration, aligned to the ATP, PMSR, SM PDR, SM CDR, BUS I&T, and SM AI&T milestones.]
Reuse Drives Favorable Effort and Quality Economics
• Analyses of component-based software reuse show favorable trends for decreasing faults
• Data from 25 NASA systems
• Overall difference is statistically significant (p < .0001). Number of components (or modules) in each category is 1629, 205, 300, 820, and 2954, respectively.

Faults per module (ave.) by module origin:

  Module origin      Mean   Std. dev.
  New development    1.28   2.88
  Major revision     1.18   1.81
  Slight revision    0.58   1.20
  Complete reuse     0.02   0.17
  All                0.85   2.29
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
• Lifecycle and architecture strategy
• Execution
• Decision making
Organize to Enable Parallel Activities
• Track cycletime between activities (a minimal tracking sketch follows below)
• [Figure: example flowchart of activities; legend marks which activities are automatable]
Source: "Flowchart," www.wikipedia.org, May 2010.
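As a rough illustration of tracking cycletime between activities, the sketch below computes the elapsed time between the completion of one activity and the start of the next from a simple event log; the activity names and timestamps are hypothetical, not program data.

```python
from datetime import datetime

# Hypothetical event log: (activity, start timestamp, finish timestamp).
events = [
    ("Requirements review", "2010-05-03 09:00", "2010-05-03 11:00"),
    ("Design update",       "2010-05-04 13:00", "2010-05-05 10:00"),
    ("Peer review",         "2010-05-07 09:00", "2010-05-07 10:30"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

# Cycletime between activities = gap from one activity's finish to the next activity's start.
for (name_a, _, finish_a), (name_b, start_b, _) in zip(events, events[1:]):
    gap_hours = (parse(start_b) - parse(finish_a)).total_seconds() / 3600.0
    print(f"{name_a} -> {name_b}: {gap_hours:.1f} hours between activities")
```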
Invest Resources in High Return-on-Investment Activities
• Return-on-investment (ROI) for software peer reviews ranges from 9:1 to 3800:1 per project
• Return-on-investment (ROI) = Net cost avoidance divided by non-recurring cost (see the sketch below)
• 2621 defects, 257 reviews, 9 systems, 1.5 years
• High ROI drivers
  - Mature and effective processes already in place
  - Significant new scope under development
  - Early lifecycle peer reviews (e.g., requirements phase)
• Four of the five programs with >80% requirements and design defects had relatively higher ROI
[Figure: per-project bar chart of peer-review defects, number of reviews, and ROI across the nine programs, with totals and averages for defects, reviews, defects per review, prevention cycles, defects per EKSLOC, and defects found out-of-phase.]
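A minimal worked example of the ROI definition above, using made-up costs and counts (not the program data on this slide):

```python
# Hypothetical peer-review ROI calculation (illustrative numbers only, not program data).
defects_found_in_reviews = 120
cost_to_fix_later = 5000.0      # assumed average cost per defect if found in test/operations
cost_to_fix_in_review = 300.0   # assumed average cost per defect when found in peer review

# Net cost avoidance: later-phase fix costs avoided, less the in-review fix costs actually paid.
net_cost_avoidance = defects_found_in_reviews * (cost_to_fix_later - cost_to_fix_in_review)

# Non-recurring cost: cost of running the peer-review process itself (preparation, meetings).
non_recurring_cost = 40000.0

roi = net_cost_avoidance / non_recurring_cost   # ROI = net cost avoidance / non-recurring cost
print(f"Peer-review ROI is roughly {roi:.0f}:1")
```

With these illustrative numbers the result is about 14:1, comfortably inside the 9:1 to 3800:1 range reported across the nine programs.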
Automate Testing for Early and Frequent Defect Detection
• Distribution of software defect injection phases based on using peer reviews across 12 system development phases (a tallying sketch follows below)
• 3418 defects, 731 peer reviews, 14 systems, 2.67 years
• 49% of defects injected during requirements phase
[Figure: percentage of defects (%) by software defect injection phase across the system development phases, from proposal and requirements through design, code, unit test, software integration, verification, I&T, and operations/maintenance; the requirements phase dominates at 49.1%, and no other single phase exceeds about 12%.]
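A minimal sketch of how such a distribution can be computed from peer-review records; the record format and phase names here are hypothetical, not the actual review database.

```python
from collections import Counter

# Hypothetical peer-review defect records: each defect is tagged with the phase
# in which it was injected (assigned during the review's root-cause discussion).
defect_records = [
    {"id": 1, "injection_phase": "Requirements"},
    {"id": 2, "injection_phase": "Requirements"},
    {"id": 3, "injection_phase": "Detailed Design"},
    {"id": 4, "injection_phase": "Code"},
    {"id": 5, "injection_phase": "Requirements"},
]

counts = Counter(rec["injection_phase"] for rec in defect_records)
total = sum(counts.values())

# Report the percentage of defects injected in each phase, largest first.
for phase, count in counts.most_common():
    print(f"{phase}: {100.0 * count / total:.1f}% of defects")
```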
Create Schedule Margin by Delivering Early
• Critical path defines the path through the network containing activities with zero slack (see the sketch below)
• [Figure: example activity network diagram with the critical path highlighted]
Source: "The Network Diagram and Critical Path," www.slideshare.net, May 2010.
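A minimal sketch of the zero-slack definition, computed with the standard forward/backward pass over a small, hypothetical activity network (the durations and dependencies are made up for illustration).

```python
from collections import defaultdict

# Hypothetical activity network: durations in days and predecessor lists.
durations = {"A": 3, "B": 2, "C": 4, "D": 2}
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Forward pass: earliest finish times.
earliest_finish = {}
def ef(act):
    if act not in earliest_finish:
        start = max((ef(p) for p in predecessors[act]), default=0)
        earliest_finish[act] = start + durations[act]
    return earliest_finish[act]
project_finish = max(ef(a) for a in durations)

# Backward pass: latest finish times.
successors = defaultdict(list)
for act, preds in predecessors.items():
    for p in preds:
        successors[p].append(act)

latest_finish = {}
def lf(act):
    if act not in latest_finish:
        latest_finish[act] = min((lf(s) - durations[s] for s in successors[act]),
                                 default=project_finish)
    return latest_finish[act]

# Slack = latest finish minus earliest finish (equivalently, latest start minus earliest start).
for act in durations:
    slack = lf(act) - ef(act)
    flag = "  <- on the critical path (zero slack)" if slack == 0 else ""
    print(f"{act}: slack = {slack} day(s){flag}")
```

Activities with zero slack (here A, C, and D) form the critical path; finishing any of them early creates schedule margin for everything downstream.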
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
• Lifecycle and architecture strategy
• Execution
• Decision making
Measurement Enables Visibility
• Interactive metric dashboards provide a framework for visibility, flexibility, integration, and automation
• Interactive metric dashboards incorporate a variety of information and features to help developers and managers characterize progress, identify outliers, compare alternatives, evaluate risks, and predict outcomes (a control-limit sketch follows below)
[Figure: example dashboard for the "ABC Products Division" organization and "XYZ System" project (status 10/1/2004), with monthly panels from Jun-04 through Sep-04 for Requirements, Reuse, Technology Infusion, Progress, Post-Delivery Defects, Pre-Delivery Defects, Cycletime, and Deliveries. Each panel plots Plan and Actual values against lower and upper control limits (LCL/UCL), offers Outliers/Data/Contact/Help links and Up/Down/Level 1/All drill-down navigation, and is annotated with Proposal, SSR, PDR, and CDR milestones.]
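The deck does not say how the LCL/UCL bands are computed; as one common convention, the sketch below derives 3-sigma control limits from a baseline period and flags actuals that fall outside them as outliers. The metric values are hypothetical.

```python
import statistics

# Hypothetical monthly "pre-delivery defects" actuals for a baseline period.
baseline = [12, 9, 14, 11, 10, 13]

mean = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)

# 3-sigma control limits (one common convention; the deck does not specify its method).
ucl = mean + 3 * sigma
lcl = max(0.0, mean - 3 * sigma)   # defect counts cannot go below zero

for month, actual in [("Jul-04", 15), ("Aug-04", 26), ("Sep-04", 8)]:
    status = "within control limits" if lcl <= actual <= ucl else "outlier"
    print(f"{month}: actual={actual}, LCL={lcl:.1f}, UCL={ucl:.1f} -> {status}")
```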
Modeling and Estimation Improve Decision Making
• Target: Identify error-prone (top 25%) and effort-prone (top 25%) components
• 16 large NASA systems
• 960 configurations
• Models use metric-driven decision trees and networks
• Analyses trade off prediction consistency versus completeness (see the sketch below)
[Figure: prediction consistency (= 100% less % of false positives) versus completeness (= 100% less % of false negatives), comparing the overall average without optimizations, optimizations for consistency, optimizations for completeness, and the baselines of predicting all components to be "+" and predicting none to be "+".]
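A minimal sketch of the consistency/completeness calculation, assuming the false-positive percentage is taken relative to all components predicted "+" and the false-negative percentage relative to all components that actually are "+" (the usual reading of these definitions); the predictions and labels are made up.

```python
# Hypothetical (predicted, actual) labels for components, where "+" means error-prone (top 25%).
results = [("+", "+"), ("+", "-"), ("+", "+"), ("-", "+"), ("-", "-"), ("-", "-")]

tp = sum(1 for p, a in results if p == "+" and a == "+")
fp = sum(1 for p, a in results if p == "+" and a == "-")
fn = sum(1 for p, a in results if p == "-" and a == "+")

# Consistency = 100% less the percentage of false positives among "+" predictions.
consistency = 100.0 * (1 - fp / (tp + fp))
# Completeness = 100% less the percentage of false negatives among actual "+" components.
completeness = 100.0 * (1 - fn / (tp + fn))

print(f"Consistency = {consistency:.0f}%, Completeness = {completeness:.0f}%")
```

Tuning a model toward consistency (few false alarms) generally sacrifices completeness (more missed error-prone components), which is the trade-off the figure plots.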
Apply Risk Management to Mitigate Risks Early
• Projects define risk mitigation "burn down" charts with specific tasks and exit criteria (a minimal tracking sketch follows below)
[Figure: example risk burn-down chart for risk CEV-252, "Flight Software Requirements Management" (WBS 4.0 Spacecraft IPT). The chart plots planned versus actual risk score from 2005 through 2010, stepping down from high through moderate to low as program milestones (CSRR, SDR, PDR, CDR) and roughly 20 numbered exit/success criteria are completed. Exit/success criteria include conducting BM1 with customer concurrence, estimating software requirements scope (preliminary and final), establishing the software control board and change control process, releasing the SDP and defining the spec tree, completing the RTOS lab evaluation, implementing system development process flow models, having spacecraft/subsystem users define and validate use cases, finalizing and validating IFC1-IFC6 requirements via models/simulation, baselining the allocation of software requirements to IFCs, conducting the SwRR with NASA customer agreement, completing the initial end-to-end architecture model, delivering IFC3, and delivering IFC7 with no new capabilities (software complete for the first mission).]
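A minimal sketch of tracking such a burn-down, using hypothetical risk scores and exit criteria (not the CEV-252 data): each completed exit criterion should step the actual risk score down toward the planned level, and gaps above plan are flagged for management attention.

```python
# Hypothetical risk burn-down data: planned vs. actual risk score after each exit criterion.
# Scores use an arbitrary scale where lower is better; None means not yet completed.
burn_down = [
    ("Conduct BM1",                                20, 20),
    ("Estimate requirements scope (preliminary)",  18, 19),
    ("Establish software control board",           15, 17),
    ("Conduct SwRR",                               10, None),
]

for criterion, planned, actual in burn_down:
    if actual is None:
        print(f"{criterion}: planned {planned}, not yet completed")
    else:
        gap = actual - planned
        status = "on plan" if gap <= 0 else f"{gap} point(s) above plan"
        print(f"{criterion}: planned {planned}, actual {actual} ({status})")
```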
Lessons Learned for Development and Management of Large-Scale Software Systems
• Early planning
  - People are the largest lever
  - Engage your stakeholders and set expectations
  - Embrace change because change is value
• Lifecycle and architecture strategy
  - Prioritize features and align resources for high payoff
  - Develop products incrementally
  - Iteration facilitates efficient learning
  - Reuse drives favorable effort and quality economics
• Execution
  - Organize to enable parallel activities
  - Invest resources in high return-on-investment activities
  - Automate testing for early and frequent defect detection
  - Create schedule margin by delivering early
• Decision making
  - Measurement enables visibility
  - Modeling and estimation improve decision making
  - Apply risk management to mitigate risks early