The Potential High Cost of Simple Systems Engineering Errors

USC CSSE Annual Research Review 2012
Jim Gottfried
Chief Scientist/Engineer, Logistics and Engineering Solutions
SAIC
March 7, 2012
NATIONAL SECURITY • ENERGY & ENVIRONMENT • HEALTH • CYBERSECURITY
© SAIC. All rights reserved.
Ground Rules
• The projects and circumstances to be discussed were all
performed by strong, competent, and well-disciplined
engineering companies, often operating under CMMI L3 or
higher processes
• The engineers working these projects were experienced, very
competent, and disciplined system and software engineers
• Still, problems do occur, cost money to fix, and may have been
avoided
SAIC.com
Problem #1: Specification Errors
• Setting the Stage
– What characteristics describe good requirements?
• Clear/unambiguous
• Accurate
• Complete
• Necessary, traceable to a higher level requirement
• Consistent with other requirements/standards
• Achievable
• Verifiable
Problem #1: Specification Errors, cont.
Requirements Example
• Logistics Metrics (1): The radio system shall provide the capability for a remote
or local user to view performance metrics information of the type listed below,
as a minimum.
– Availability (Ao): % time system capable of supporting prime mission
– Mean Time Between Failures (MTBF): time in tenths of hours between failure of a
software or hardware item
– Mean Down Time (MDT): average downtime in tenths of hours where system cannot
perform primary mission
• Logistics Metrics (2): The radio system shall be capable of calculating the
values of the logistics metrics described above. The remote maintenance
software shall be capable of displaying these values on a user screen available
to both a local and a remote user. The calculated data will be air base specific.
• What is wrong with or missing from the above requirements?
Problem #1: Specification Errors, cont.
• Note that the system reported the required metrics, and the reporting
format was acceptable to the user.
• The metrics were calculated accurately.
• The user reported the metrics to their management on a quarterly basis.
• Could the user perform this reporting function? Why or why not?
– Answer: No. There was no capability to reset the metrics after reading them
each quarter, so the values could not be restarted for each new reporting period
• Resolution: Update software and documentation to allow resetting
metrics upon command
• Cost: Over $80K
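
The missing capability can be illustrated with a minimal Python sketch. It is not the fielded software: the class name, method names, and the formulas (availability as uptime over total time, MTBF as uptime per failure, MDT as downtime per failure, each rounded to a tenth of an hour) are assumptions chosen to match the metric descriptions in the requirement. The point is the reset() method, which the original requirements never called for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogisticsMetrics:
    """Accumulates up/down time for one reporting period (hypothetical design)."""
    up_hours: float = 0.0
    down_hours: float = 0.0
    failures: int = 0

    def record_uptime(self, hours: float) -> None:
        self.up_hours += hours

    def record_failure(self, down_hours: float) -> None:
        self.failures += 1
        self.down_hours += down_hours

    def availability_pct(self) -> Optional[float]:
        # Assumed formula: percent of total time the system was up.
        total = self.up_hours + self.down_hours
        return round(100.0 * self.up_hours / total, 1) if total else None

    def mtbf_hours(self) -> Optional[float]:
        # Assumed formula: uptime per failure, to a tenth of an hour.
        return round(self.up_hours / self.failures, 1) if self.failures else None

    def mdt_hours(self) -> Optional[float]:
        # Assumed formula: downtime per failure, to a tenth of an hour.
        return round(self.down_hours / self.failures, 1) if self.failures else None

    def reset(self) -> None:
        # The capability the requirements omitted: start a new reporting period.
        self.up_hours = 0.0
        self.down_hours = 0.0
        self.failures = 0


metrics = LogisticsMetrics()
metrics.record_uptime(90.0)
metrics.record_failure(down_hours=10.0)
quarterly_report = (metrics.availability_pct(), metrics.mtbf_hours(), metrics.mdt_hours())
metrics.reset()  # without this, next quarter's values would include this quarter's data
```

Without a reset, every later report would mix in data from earlier quarters, which is why the user could not perform quarterly reporting even though each value was calculated accurately.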
Problem #2: Systems Engineering Design Errors
• Setting the Stage
– The power for the system came through an uninterruptible power
supply (UPS)
– The UPS was software controlled and monitored for failure
– Commercial UPS specifications were reviewed
– A commercial UPS was selected and installed with the system
– After installation, when facility power failed, large electrical spikes
were seen that shut down some of the electronic equipment
– Investigation showed that this UPS was not designed to condition the
power as installed on this system
Problem #2: Systems Engineering Design Analysis
• Resolution Options
– Option 1: add a transformer between UPS and system
• Customer does not like this option as a long term solution (for additional bases as well)
• This would make the first system different from other, future bases
– Option 2: replace the original UPS with a different UPS that will properly
condition the power
• The only available UPSs that will do the job properly have a different software interface
• This UPS is lower cost and more flexible in sizing
• Customer wants this solution on future system sites
• Action:
– New UPS purchased, system software changed for compatibility
– New UPS installed and tested
• Cost: Over $120K
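
One common way to limit the software cost of a hardware swap like this is to keep vendor-specific details behind a small interface. The Python sketch below illustrates that idea only; it does not describe the actual system. The class names, status formats, and the needs_operator_alert function are all hypothetical.

```python
from abc import ABC, abstractmethod


class UpsMonitor(ABC):
    """What the system software needs to know about any UPS (hypothetical)."""

    @abstractmethod
    def on_battery(self) -> bool: ...

    @abstractmethod
    def has_fault(self) -> bool: ...


class OriginalUpsAdapter(UpsMonitor):
    """Wraps a hypothetical text status line such as 'MODE=BATT;FAULT=0'."""

    def __init__(self, status_line: str) -> None:
        self._fields = dict(item.split("=", 1) for item in status_line.split(";"))

    def on_battery(self) -> bool:
        return self._fields.get("MODE") == "BATT"

    def has_fault(self) -> bool:
        return self._fields.get("FAULT") == "1"


class ReplacementUpsAdapter(UpsMonitor):
    """Wraps a hypothetical dictionary status, e.g. {'source': 'battery', 'alarms': []}."""

    def __init__(self, status: dict) -> None:
        self._status = status

    def on_battery(self) -> bool:
        return self._status.get("source") == "battery"

    def has_fault(self) -> bool:
        return bool(self._status.get("alarms"))


def needs_operator_alert(ups: UpsMonitor) -> bool:
    # System logic depends only on the interface, not on the vendor format.
    return ups.on_battery() or ups.has_fault()
```

Under this assumption, replacing the UPS means writing one new adapter, while monitoring logic such as needs_operator_alert stays unchanged.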
Problems #1 and #2: Lessons Learned
• Both problems resulted from relatively simple systems engineering (SE)
errors
• Both problems resulted in substantial cost additions
• How to avoid
– My opinion: we will never eliminate all SE problems; system engineers are
human
– Best approach to avoid this type of problem is extremely thorough peer
reviews of all requirements and design decisions using quality checklists
– Thorough peer reviews take time and must be planned in the process
– Peer reviews should involve a sufficient number of engineers to fully represent
all stakeholder organizations including system, design, integration, test, and
specialty engineers
– Problem #1 (specification) may have been prevented by developing use cases
for all user interactions with the system
Problem #3: A System vs. a Hardware Item
• What distinguishes a system from a hardware item (e.g., a
communications radio [JTRS, air traffic control, etc.])?
– Some characteristics:
• More functionality
• Multiple hardware items
• More external interfaces
• Computer controlled; more software/firmware
• Larger, more dynamic user interfaces
• … etc.
• Problem: Understanding and appreciating the complexity of a
system versus the previous hardware item
Problem #3: A System vs. a Hardware Item, cont.
• The need to understand and appreciate the complexity of a system is
intuitive; however, the solution is difficult to understand and address
• Why?
– Psychology: Because we (system engineers) are the experts in the hardware item
domain; we understand it well; the system is just an extension of what we know/do
– New goals for the system are underestimated: rarely do we build a one-for-one
replacement of the hardware
• Systems are built to add flexibility to the product
– Flexibility increases development complexity and time
• Systems are built to add functionality to the product
– More user/remote control, better user experience, easier maintenance, more
capability, more accuracy, more timeliness
• Systems are built to improve product reliability and availability
– Better diagnostics, backup capability, redundancy and auto failover
• Other?
Problem #3: A System vs. a Hardware Item, cont.
• Ramifications of failing to understand the system vs. the hardware
item
– Development time increases 2-3 times original plan
– Cost can increase 2-4 times original plan
– Late to market, competitor first to market
– Unhappy customers
– Frustrated management and engineers
– Cancellation of project
• Solutions?
– It must start with better appreciation of the problems, goals, and
complexity of the system vs. the hardware item