USC CSSE Annual Research Review 2012

The Potential High Cost of Simple Systems Engineering Errors

Jim Gottfried
Chief Scientist/Engineer, Logistics and Engineering Solutions
SAIC
March 7, 2012

NATIONAL SECURITY • ENERGY & ENVIRONMENT • HEALTH • CYBERSECURITY
© SAIC. All rights reserved.

Ground Rules
• The projects and circumstances to be discussed were all performed by strong, competent, and well-disciplined engineering companies, often operating under CMMI L3 or higher processes
• The engineers working these projects were experienced, very competent, and disciplined system and software engineers
• Still, problems do occur, cost money to fix, and may have been avoided

Problem #1: Specification Errors
• Setting the Stage
  – What characteristics describe good requirements?
    • Clear/unambiguous
    • Accurate
    • Complete
    • Necessary, traceable to a higher level requirement
    • Consistent with other requirements/standards
    • Achievable
    • Verifiable

Problem #1: Specification Errors, cont.
Requirements Example
• Logistics Metrics (1): The radio system shall provide the capability for a remote or local user to view performance metrics information of the type listed below, as a minimum.
  – Availability (Ao): % time system capable of supporting prime mission
  – Mean Time Between Failures (MTBF): time in tenths of hours between failure of a software or hardware item
  – Mean Down Time (MDT): average downtime in tenths of hours where system cannot perform primary mission
• Logistics Metrics (2): The radio system shall be capable of calculating the values of the logistics metrics described above. The remote maintenance software shall be capable of displaying these values on a user screen available to both a local and a remote user. The calculated data will be air base specific.
• What is wrong or missing with the above requirements?

Problem #1: Specification Errors, cont.
• Note that the system reported the required metrics and the reporting format was acceptable to the user.
• The metrics were calculated accurately.
• The user reported the metrics to their management on a quarterly basis.
• Could the user perform this reporting function? Why or why not?
  – Answer: No, there was no capability to reset the metrics after reading them each quarter
• Resolution: Update software and documentation to allow resetting metrics upon command
• Cost: Over $80K

Problem #2: Systems Engineering Design Errors
• Setting the Stage
  – The power for the system came through an uninterruptible power supply (UPS)
  – The UPS was software controlled and monitored for failure
  – Commercial UPS specifications were reviewed
  – A commercial UPS was selected and installed with the system
  – After installation, when facility power failed, large electrical spikes were seen that shut down some of the electronic equipment
  – Investigation showed that this UPS was not designed to condition the power as installed on this system

Problem #2: Systems Engineering Design Analysis
• Resolution Options
  – Option 1: add a transformer between UPS and system
    • Customer does not like this option as a long term solution (for additional bases as well)
    • This would make the first system different from other, future bases
  – Option 2: replace the original UPS with a different UPS that will properly condition the power
    • The only available UPSs that will do the job properly have a different software interface
    • This UPS is lower cost and more flexible in sizing
    • Customer wants this solution on future system sites
• Action:
  – New UPS purchased, system software changed for compatibility
  – New UPS installed and tested
• Cost: Over $120K
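
Returning to the Problem #1 gap: the sketch below illustrates why quarterly reporting needs a reset command. It is a minimal illustration, not the fielded software. The class name `LogisticsMetrics`, its methods, the 2160-hour quarter, and the MTBF formula (uptime divided by number of failures) are all assumptions made for this example; the slides only define the metrics in words. Values are rounded to tenths, matching the "tenths of hours" wording in the requirement.

```python
from dataclasses import dataclass, field


@dataclass
class LogisticsMetrics:
    """Hypothetical per-period accumulator for Ao, MTBF, and MDT."""

    # Duration in hours of each outage recorded since the last reset.
    outages_hours: list = field(default_factory=list)

    def record_outage(self, down_hours: float) -> None:
        """Log one failure and how long the system was down."""
        self.outages_hours.append(down_hours)

    def report(self, period_hours: float) -> dict:
        """Compute the three metrics over the current reporting period."""
        failures = len(self.outages_hours)
        downtime = sum(self.outages_hours)
        uptime = period_hours - downtime
        return {
            # Availability: percent of the period the system was up.
            "Ao_percent": round(100.0 * uptime / period_hours, 1),
            # Assumed definition: uptime per failure, in tenths of hours.
            "MTBF_hours": round(uptime / failures, 1) if failures else None,
            # Average downtime per failure, in tenths of hours.
            "MDT_hours": round(downtime / failures, 1) if failures else None,
        }

    def reset(self) -> None:
        """The capability the original requirement omitted."""
        self.outages_hours.clear()


if __name__ == "__main__":
    metrics = LogisticsMetrics()
    metrics.record_outage(4.0)
    metrics.record_outage(6.0)
    # One 90-day quarter = 2160 hours (assumed period length).
    print(metrics.report(2160.0))
    # {'Ao_percent': 99.5, 'MTBF_hours': 1075.0, 'MDT_hours': 5.0}
    metrics.reset()  # start the next quarter from zero
```

Without the `reset()` call, outages from the first quarter would carry into the next quarter's figures, so each report would be cumulative rather than quarterly. A use case written for "user prepares the quarterly report" would likely have exposed that missing step.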

Problem #1 and #2 Lessons Learned
• Both problems resulted from relatively simple systems engineering (SE) errors
• Both problems resulted in substantial cost additions
• How to avoid
  – My opinion: we will never eliminate all SE problems; system engineers are human
  – The best approach to avoiding this type of problem is extremely thorough peer reviews of all requirements and design decisions using quality checklists
  – Thorough peer reviews take time and must be planned in the process
  – Peer reviews should involve a sufficient number of engineers to fully represent all stakeholder organizations including system, design, integration, test, and specialty engineers
  – Problem #1 (specification) might have been prevented by developing use cases for all user interactions with the system

Problem #3: A System vs. a Hardware Item
• What distinguishes a system from a hardware item (e.g., a communications radio [JTRS, air traffic control, etc.])?
  – Some characteristics:
    • More functionality
    • Multiple hardware items
    • More external interfaces
    • Computer controlled; more software/firmware
    • Larger, more dynamic user interfaces
    • … etc.
• Problem: Understanding and appreciating the complexity of a system versus the previous hardware item

Problem #3: A System vs. a Hardware Item, cont.
• The need to understand and appreciate the complexity of a system is very intuitive; however, the solution is very difficult to understand and address
• Why?
  – Psychology: Because we (system engineers) are the experts in the hardware item domain; we understand it well; the system is just an extension of what we know/do
  – New goals for the system are underestimated: rarely do we build a one-for-one replacement of the hardware
    • Systems are built to add flexibility to the product
      – Flexibility increases development complexity and time
    • Systems are built to add functionality to the product
      – More user/remote control, better user experience, easier maintenance, more capability, more accuracy, more timeliness
    • Systems are built to improve product reliability and availability
      – Better diagnostics, backup capability, redundancy and auto failover
    • Other?

Problem #3: A System vs. a Hardware Item, cont.
• Ramifications of failure to understand the system vs. the hardware item
  – Development time increases 2-3 times original plan
  – Cost can increase 2-4 times original plan
  – Late to market, competitor first to market
  – Unhappy customers
  – Frustrated management and engineers
  – Cancellation of project
• Solutions?
  – It must start with better appreciation of the problems, goals, and complexity of the system vs. the hardware item