Lessons Learned for Development and Management of Large-Scale Software Systems

Rick Selby
Director of Software Products, Northrop Grumman Aerospace Systems
310-813-5570, Rick.Selby@NGC.com
Adjunct Professor of Computer Science, University of Southern California

© Copyright 2011. Richard W. Selby. All rights reserved.


Organizational Charter Focuses on Embedded Software Products

- Embedded software for advanced robotic spacecraft platforms, high-bandwidth satellite payloads, and high-power laser systems
- Emphasis on both system management and payload software
- Reusable, reconfigurable software architectures and components
- Software process flow for each build (Software Development Lab, Software Analysis, Software Peer Review), with 3-15 builds per program
- Restricted languages: object-oriented (O-O) to C to assembly
- CMMI Level 5 for software in February 2004; ISO/AS9100; Six Sigma
- High-reliability, long-life, real-time embedded software systems
- Representative programs: Prometheus/JIMO, NPOESS, JWST, EOS Aqua/Aura, Chandra, AEHF, MTHEL, Airborne Laser, GeoLITE


Overview

Early planning
- People are the largest lever
- Engage your stakeholders and set expectations
- Embrace change because change is value

Lifecycle and architecture strategy
- Prioritize features and align resources for high payoff
- Develop products incrementally
- Iteration facilitates efficient learning
- Reuse drives favorable effort and quality economics

Execution
- Organize to enable parallel activities
- Invest resources in high return-on-investment activities
- Automate testing for early and frequent defect detection
- Create schedule margin by delivering early

Decision making
- Measurement enables visibility
- Modeling and estimation improve decision making
- Apply risk management to mitigate risks early


Section: Early planning

People are the Largest Lever

Barry Boehm's comparisons of actual productivity rates across projects substantiated the COCOMO productivity multipliers, including the factor of 4.18 attributable to personnel/team capability.

Source: B. Boehm, "Improving Software Productivity," IEEE Computer, Vol. 20, No. 9, September 1987, pp. 43-57.
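The 4.18 factor is the ratio between the lowest- and highest-rated teams on the analyst-capability (ACAP) and programmer-capability (PCAP) cost drivers. The sketch below shows how COCOMO-style multipliers compound; the mode constants and rating values follow the published intermediate-COCOMO tables, while the 100-KSLOC project size is a hypothetical illustration.

```python
# Minimal sketch of intermediate-COCOMO effort estimation (organic mode).
# Rating values are from the published COCOMO tables (Boehm, 1981); the
# project size below is hypothetical.

def cocomo_effort(ksloc: float, multipliers: dict[str, float]) -> float:
    """Effort in person-months: a * KSLOC^b, scaled by cost drivers."""
    a, b = 3.2, 1.05  # organic-mode constants
    eaf = 1.0         # effort adjustment factor
    for value in multipliers.values():
        eaf *= value
    return a * ksloc ** b * eaf

# Analyst (ACAP) and programmer (PCAP) capability ratings, worst vs. best.
worst_team = {"ACAP": 1.46, "PCAP": 1.42}  # 15th-percentile team
best_team = {"ACAP": 0.71, "PCAP": 0.70}   # 90th-percentile team

ratio = cocomo_effort(100, worst_team) / cocomo_effort(100, best_team)
print(f"Effort ratio, worst vs. best team: {ratio:.2f}")  # ~4.17
```

Holding everything else fixed, the same project costs roughly four times the effort with a bottom-rated team as with a top-rated one, which is why staffing is the largest lever.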
Engage Your Stakeholders and Set Expectations

Source: B. Boehm, "Critical Success Factors for Schedule Estimation and Improvement," 26th International Forum on Systems, Software, and COCOMO Cost Modeling, November 2, 2011.


Embrace Change Because Change Is Value

Criteria for a high adoption rate for innovations:
- Relative advantage: the innovation is technically superior (in terms of cost, functionality, "image," etc.) to the technology it supersedes.
- Compatibility: the innovation is compatible with the existing values, skills, and work practices of potential adopters.
- Lack of complexity: the innovation is not relatively difficult to understand and use.
- Trialability: the innovation can be experimented with on a trial basis without undue effort and expense; it can be implemented incrementally and still provide a net positive benefit.
- Observability: the results and benefits of the innovation's use can be easily observed and communicated to others.

Source: E. M. Rogers, Diffusion of Innovations, Free Press, New York, 1983.


Section: Lifecycle and architecture strategy

Prioritize Features and Align Resources for High Payoff

The synchronize-and-stabilize lifecycle has planning, development, and stabilization phases.

Planning phase
- Vision statement: product and program management use extensive customer input to identify and prioritize product features.
- Specification document: based on the vision statement, program management and the development group define feature functionality, architectural issues, and component interdependencies.
- Schedule and feature team formation: based on the specification document, program management coordinates the schedule and arranges feature teams, each containing approximately 1 program manager, 3-8 developers, and 3-8 testers (who work in parallel 1:1 with developers).

Development phase
- Program managers coordinate evolution of the specification. Developers design, code, and debug. Testers pair up with developers for continuous testing.
- Subproject I: first 1/3 of features, covering the most critical features and shared components.
- Subproject II: second 1/3 of features.
- Subproject III: final 1/3 of features, the least critical.

Stabilization phase
- Program managers coordinate OEMs and ISVs and monitor customer feedback. Developers perform final debugging and code stabilization. Testers recreate and isolate errors.
- Internal testing: thorough testing of the complete product within the company.
- External testing: thorough testing of the complete product outside the company by "beta" sites such as OEMs, ISVs, and end users.
- Release preparation: prepare the final release ("golden master" version) and documentation for manufacturing.
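The development-phase split above cuts the prioritized feature list into thirds, most critical first, so the highest-payoff work lands in the earliest subproject. A minimal sketch of that policy follows; feature names and priority scores are hypothetical.

```python
# Minimal sketch of the synchronize-and-stabilize feature split: rank
# features by priority, then cut the list into three subprojects so the
# most critical 1/3 is built first. Features/priorities are hypothetical.

def split_into_subprojects(features: list[tuple[str, int]], parts: int = 3):
    """Sort by descending priority, then slice into `parts` chunks."""
    ranked = sorted(features, key=lambda f: f[1], reverse=True)
    size = -(-len(ranked) // parts)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

features = [("printing", 9), ("undo", 7), ("macros", 3),
            ("autosave", 8), ("themes", 2), ("spellcheck", 5)]

for n, subproject in enumerate(split_into_subprojects(features), start=1):
    print(f"Subproject {n}: {[name for name, _ in subproject]}")
# Subproject 1: ['printing', 'autosave']   <- most critical features
# Subproject 2: ['undo', 'spellcheck']
# Subproject 3: ['macros', 'themes']       <- least critical features
```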
Develop Products Incrementally

[Figure: synchronize-and-stabilize lifecycle timeline and milestones. Planning phase (3-12 months): vision statement, specification document, design feasibility studies, prototypes, testing strategy, schedule, and project plan approval, ending at Milestone 0. Development phase (6-12 months): three subprojects of 2-4 months each (1/3 of all features apiece), each cycling through code and optimizations, testing and debugging, feature stabilization, integration, and buffer time, ending in the Milestone I, II, and III releases (visual freeze, feature complete, code complete). Stabilization phase (3-8 months): internal testing, beta testing, buffer time, zero-bug release, release to manufacturing (ship date), and postmortem document.]

- Synchronize-and-stabilize lifecycle timeline and milestones enable frequent incremental deliveries.


Iteration Facilitates Efficient Learning

[Figure 4.3-4: JIMO incremental software builds, CY 2004-2013, spanning the ATP, PMSR, SM PDR, SM CDR, bus I&T, and SM AI&T milestones. Flight Computer Unit builds FCU1-FCU7 (preliminary and final executive and C&DH software, science computer interface, power controller interface, reactor AACS including autonomous navigation, thermal and power control, configuration and fault protection), Science Computer Unit builds SCU1-SCU2 (common software only, no instrument software), Data Server Unit builds DSU1-DSU3, and Ground Analysis Software builds GAS1-GAS2. Each build progresses through requirements, preliminary design, detailed design, code and unit test/software integration, and verification and validation, with deliveries to JPL, NGC, and Naval Reactors integration and test activities.]

- We provide incremental software deliveries that support integration and test activities and synchronize with JPL, Hamilton Sundstrand, and Naval Reactors to facilitate teaming, reduce risk, and enhance mission assurance.
- Incremental software builds deliver early capabilities and accelerate integration and test.
- Iteration helps refine problem statements, create potential solutions, and elicit feedback.


Reuse Drives Favorable Effort and Quality Economics

Analyses of component-based software reuse show a favorable trend of decreasing faults. Data are from 25 NASA systems; the overall difference across module origins is statistically significant (p < .0001).

Module origin     Modules   Faults per module (mean)   Std. dev.
New development      1629                       1.28        2.88
Major revision        205                       1.18        1.81
Slight revision       300                       0.58        1.20
Complete reuse        820                       0.02        0.17
All                  2954                       0.85        2.29
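As a consistency check on the table above, the "All" row follows from pooling the four origin categories. A short sketch recomputing the overall mean from the per-category counts and means:

```python
# Recompute the pooled "All" row of the reuse/fault table above as a
# count-weighted mean of the four module-origin categories.

categories = {              # origin: (module count, mean faults/module)
    "New development": (1629, 1.28),
    "Major revision":  (205, 1.18),
    "Slight revision": (300, 0.58),
    "Complete reuse":  (820, 0.02),
}

total = sum(n for n, _ in categories.values())
pooled = sum(n * mean for n, mean in categories.values()) / total
print(f"{total} modules, pooled mean = {pooled:.2f} faults/module")
# 2954 modules, pooled mean = 0.85 faults/module
```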
Section: Execution

Organize to Enable Parallel Activities

[Figure: flowchart of development activities, with automatable activities marked in the legend; track cycletime between activities. Source: "Flowchart," www.wikipedia.org, May 2010.]


Invest Resources in High Return-on-Investment Activities

[Figure: bar chart of defects found and return-on-investment per project for software peer reviews across nine programs.]

- Return-on-investment (ROI) for software peer reviews ranges from 9:1 to 3800:1 per project, where ROI = net cost avoidance divided by non-recurring cost.
- Data: 2621 defects, 257 reviews, 9 systems, 1.5 years.
- High-ROI drivers: mature and effective processes already in place; significant new scope under development; early-lifecycle peer reviews (e.g., requirements phase).
- Four of the five programs with >80% requirements and design defects had relatively higher ROI.


Automate Testing for Early and Frequent Defect Detection

[Figure: distribution of software defect injection phases across 12 system development phases, from Proposal through Operations and Maintenance; the requirements phase peaks at 49.1%.]

- Distribution of software defect injection phases is based on using peer reviews across 12 system development phases.
- Data: 3418 defects, 731 peer reviews, 14 systems, 2.67 years.
- 49% of defects were injected during the requirements phase.


Create Schedule Margin by Delivering Early

- The critical path is the path through the activity network containing the activities with zero slack.

Source: "The Network Diagram and Critical Path," www.slideshare.net, May 2010.
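A minimal sketch of the zero-slack rule on a hypothetical four-activity network: a forward pass computes earliest finish times, a backward pass computes latest finish times, and the activities where the two coincide form the critical path.

```python
# Critical-path sketch for a hypothetical activity-on-node network.
durations = {"A": 3, "B": 2, "C": 4, "D": 2}               # days
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = ["A", "B", "C", "D"]                               # topological order

# Forward pass: an activity starts when its last predecessor finishes.
earliest = {}
for act in order:
    start = max((earliest[p] for p in preds[act]), default=0)
    earliest[act] = start + durations[act]
project_end = max(earliest.values())

# Backward pass: an activity must finish before its tightest successor
# needs to start, or by project end if it has no successors.
succs = {a: [b for b in order if a in preds[b]] for a in order}
latest = {}
for act in reversed(order):
    latest[act] = min((latest[s] - durations[s] for s in succs[act]),
                      default=project_end)

critical = [a for a in order if latest[a] == earliest[a]]  # zero slack
print("Critical path:", " -> ".join(critical))  # A -> C -> D
```

Finishing zero-slack activities early converts directly into schedule margin, since every other path already carries slack.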
Section: Decision making

Measurement Enables Visibility

[Figure: interactive metric dashboard for a sample project ("XYZ System," ABC Products Division, status 10/1/2004). Panels track Requirements, Reuse, Technology Infusion, Progress, Cycletime, Deliveries, Post-Delivery Defects, and Pre-Delivery Defects, each plotting plan, actual, and lower/upper control limits (LCL/UCL) by month against milestones such as Proposal, SSR, PDR, and CDR, with controls for outliers, underlying data, contacts, help, and drill-up/drill-down navigation.]

- Interactive metric dashboards provide a framework for visibility, flexibility, integration, and automation.
- Interactive metric dashboards incorporate a variety of information and features to help developers and managers characterize progress, identify outliers, compare alternatives, evaluate risks, and predict outcomes.


Modeling and Estimation Improve Decision Making

- Target: identify error-prone (top 25%) and effort-prone (top 25%) components.
- Data: 16 large NASA systems; 960 configurations.
- Models use metric-driven decision trees and networks.
- Analyses trade off prediction consistency (100% less the percentage of false positives) against completeness (100% less the percentage of false negatives).

[Figure: scatter plot of consistency versus completeness for the model configurations, compared against the overall average without optimizations, optimizations for consistency, optimizations for completeness, and the degenerate strategies of predicting all components to be "+" or none to be "+".]
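A minimal sketch of the two axes above, assuming the false-positive percentage is taken over components flagged as "+" and the false-negative percentage over components that actually are "+"; the component names are hypothetical.

```python
# Consistency/completeness sketch for an error-proneness classifier,
# assuming %FP is over flagged components and %FN over actual ones.

def consistency_completeness(flagged: set[str], actual: set[str]):
    true_pos = flagged & actual
    consistency = len(true_pos) / len(flagged) if flagged else 1.0
    completeness = len(true_pos) / len(actual) if actual else 1.0
    return consistency, completeness

actual_error_prone = {"nav", "fdir", "telemetry", "cmd"}   # hypothetical
flagged = {"nav", "fdir", "thermal"}                       # model output

c, p = consistency_completeness(flagged, actual_error_prone)
print(f"consistency = {c:.0%}, completeness = {p:.0%}")
# consistency = 67%, completeness = 50%
```

Under these definitions, flagging every component gives 100% completeness and flagging none gives 100% consistency, which is exactly the tradeoff the chart's two degenerate strategies bound.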
Apply Risk Management to Mitigate Risks Early

[Figure: risk mitigation "burn down" chart for risk CEV-252, Flight Software Requirements Management (WBS 4.0, Spacecraft IPT), plotting planned versus actual risk scores against 20 dated exit/success criteria from 2005 to 2010, with high/moderate/low risk bands, TRL 4-7 maturation points, and the CSRR, SDR, PDR, and CDR milestones. Representative exit criteria: establish the software control board (preliminary, then final) and the change-control process; release the SDP with the spec tree defined; estimate software requirements scope (preliminary, then final); complete the RTOS lab evaluation and validate capabilities using simulation; have spacecraft/subsystem users define use cases (interfaces, functions, nominal and off-nominal operations) validated with models/simulation; finalize and validate the IFC1-IFC6 requirements (infrastructure software; inter-module and inter-subsystem interfaces; subsystem major functions; nominal operations; subsystem off-nominal operations); baseline the allocation of software requirements to IFCs with growth/correction margin; conduct the SwRR with NASA customer agreement on the software requirements; complete the initial end-to-end architecture model; deliver IFC3 (subsystem major functions) and IFC7 (no new capabilities, only system I&T corrections), completing the software for the first mission.]

- Projects define risk mitigation "burn down" charts with specific tasks and exit criteria (a minimal sketch of this structure follows the summary below).


Summary: Lessons Learned for Development and Management of Large-Scale Software Systems

Early planning
- People are the largest lever
- Engage your stakeholders and set expectations
- Embrace change because change is value

Lifecycle and architecture strategy
- Prioritize features and align resources for high payoff
- Develop products incrementally
- Iteration facilitates efficient learning
- Reuse drives favorable effort and quality economics

Execution
- Organize to enable parallel activities
- Invest resources in high return-on-investment activities
- Automate testing for early and frequent defect detection
- Create schedule margin by delivering early

Decision making
- Measurement enables visibility
- Modeling and estimation improve decision making
- Apply risk management to mitigate risks early
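Returning to the risk "burn down" chart above, a minimal sketch of its underlying structure: each mitigation step carries an exit criterion and a planned residual risk score, and actual scores are recorded as steps complete. All step names and scores here are hypothetical, loosely patterned on the CEV-252 example.

```python
# Hypothetical risk "burn down" record: planned vs. actual residual risk
# per mitigation step, loosely patterned on the CEV-252 chart above.
from dataclasses import dataclass

@dataclass
class MitigationStep:
    event: str                       # exit/success criterion
    planned_risk: int                # planned residual risk score
    actual_risk: int | None = None   # recorded when the step completes

burn_down = [
    MitigationStep("Software control board established", 20, 20),
    MitigationStep("SDP released; spec tree defined", 16, 18),
    MitigationStep("SwRR conducted; customer concurs", 11),
    MitigationStep("Final build delivered; I&T corrections only", 2),
]

for step in burn_down:
    actual = step.actual_risk if step.actual_risk is not None else "open"
    print(f"{step.event:45} plan={step.planned_risk:>2}  actual={actual}")
```

Tracking actuals against the planned curve makes a stalled mitigation visible early, while there is still schedule to react.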