Part 2 – Session 1 – Breakout #1
Old Lessons Apply in the New World
3/26/2006 Herb Shivers NASA/MSFC

Recap from Panel Discussion
• “There has to be an optimum balance among technical performance, time schedule and cost.” Dr. Eberhard Rees
• “If eternal vigilance is the price of liberty, then chronic unease is the price of safety.” Professor James Reason (2005, p 37) (substitute “quality” for “safety”)
• Quality and System Safety both are instrumental in the prevention process

What is Quality Engineering?
Juran:
– customer satisfaction, or simply "fitness for use" (p 20)
Ishikawa:
– the practice of developing, designing, producing, and servicing a quality product that is most economical, useful, and satisfactory to the customer (p 64)
Crosby:
– conformance to requirements (p 21)
Deming:
– a predictable degree of uniformity and dependability that is suited to the market at low cost. In other words, quality is meeting customer needs and wants (p 61)
ASQ, 2001

Quality Evolution
– Babylonian, Egyptian, Greek, Roman weights and measures for trade
– Trades and craft guilds standards (experts)
– Mass production and machinery (low level training)
– Supervisor quality monitors
– Inspectors (à la quality control)
– Deming: Plan-Do-Check-Act Cycle
– Juran, Feigenbaum, Ishikawa: TQM
– Quality assurance
  • designed in, not inspected in (James Reason, pp 46/7)

What is System Safety Engineering?
• System Safety Engineering (SSE) - A subset of the safety engineering discipline that provides direct support to programs and projects to achieve acceptable mishap risk through a systematic approach of hazard analysis, risk assessment, and risk management. (J.R.
Goodin/NASA/KSC (retired), 2004)
• System Safety is the application of engineering and management principles, criteria, and techniques to optimize all aspects of safety within the constraints of operational effectiveness, time, and cost throughout all phases of the system life cycle (Air Force Safety Agency, 2000, p vii)
• System safety is…
– A management doctrine, and
– A family of analytical approaches that support that doctrine
(Mohr, Jacobs Sverdrup, 2002)

Some Analysis Types
• Preliminary Hazard Analysis (PHA)
• System Hazard Analysis (SHA)
• Subsystem Hazard Analysis (SSHA)
• Occupational Health Hazard Assessment (OHHA)
• Software Hazard Analysis
SSE Analyses consider system limits and risks
Mohr, 2002

Some Analytical Techniques
• Preliminary Hazard Analysis
• Failure Modes and Effects Analysis
• Fault Tree Analysis
• Event Tree Analysis
• Cause-Consequence Analysis
• Sneak Circuit Analysis
• Probabilistic Risk Assessment
• Digraph Analysis
• Hazard and Operability Study (HAZOP)
• Management Oversight and Risk Tree Analysis (MORT)
SSE requires a toolbox of techniques; there is no one size fits all tool
Mohr, 2002

Why System Safety Engineering?
• Support management risk decisions relative to system hazards
• Avoid “fly-fail-fix-fly” and “pilot error” mentalities
• Manage safety in the same manner as any other design or operational parameter
• Prevent accidents, not react to them
• Consider impacts to: workers, the public, product quality, productivity, environment, facilities and equipment
Shivers, 2005

Effective System Safety Program Attributes
• Management Commitment
• Safety Culture
• Independent Safety Organization
• Communication
• Qualified/Educated Personnel
• Well-Defined Roles, Processes and Tools Including:
– Use of Technical Standards
– Capture/Use of Lessons Learned
– Audits and Reviews
– Stop Work Authority
• Sufficient Resources
(Kiessling, Shivers, and Tippett, 2004)

Systems Thinking
• Learn to view connected events as a system (Peter Senge, The Fifth Discipline)
• Seeing wholes – the “big picture,” unintended consequences, cause and effect (including delay), long term views, etc.
• Our jobs don’t exist in isolation
• Deal with root causes, not symptoms
Senge

Who Should Implement SSE?
• SSE is the responsibility of all technical and management personnel on a project team
• Chief engineers, systems engineers, design engineers, project managers all must include “SSE thinking” as a minimum in their work and understand what SSE is and does
• SSE practitioners generally come from the safety and mission assurance organizations, but must be planned for and included in the team activities
Shivers, 2005

SSE Thinking
• SSE thinking is focused on identifying and controlling potential failure, while design engineering thinking might be more focused on successful operation
• Together, the two thought modes are complementary and lead to a better chance of success, which is the goal of each
• Both thought modes need to be within the realm of “Systems Thinking” in general to consider all impacts of decisions made
Shivers, 2005

When is SSE Implemented?
• SSE considerations must be included in the up-front conceptualization so that pertinent information can be used in trade studies and requirements development
• SSE is applied throughout the life cycle with appropriate tools and analyses brought to bear as warranted
– The system safety process can be applied at any point in the system life cycle, but the greatest advantages are achieved when it is used early in the acquisition life cycle
– The system safety process is normally repeated as the system evolves or changes and as problem areas are identified (Air Force Safety Agency, 2000, p 14)
• Decisions made under cost and schedule pressure can lead to hazards (Stroup and Naylor, 2001)

SSE and the Life Cycle
• Early in the life cycle SSE considers hazards that may occur any time in the life cycle
• Early identification usually results in less expensive corrections
• Analysis can be and is done at any time in the life cycle
Shivers, 2005

System Safety Program Objectives
a.
Safety, consistent with mission requirements, is designed into the system in a timely, cost-effective manner
b. Hazards are identified, evaluated, and eliminated, or the associated risk reduced to a level acceptable to the managing activity (MA) throughout the entire life cycle of a system
c. Historical safety data, including lessons learned from other systems, are considered and used
d. Minimum risk is sought in accepting and using new designs, materials, and production and test techniques
e. Actions taken to eliminate hazards or reduce risk to a level acceptable to the MA are documented
f. Retrofit actions are minimized
g. Changes in design, configuration, or mission requirements are accomplished in a manner that maintains a risk level acceptable to the MA
h. Consideration is given to safety, ease of disposal, and demilitarization of any hazardous materials associated with the system
i. Significant safety data are documented as “lessons learned” and are submitted to data banks, design handbooks, or specifications
j.
Hazards identified after production are minimized consistent with program restraints
Air Force Safety Agency, 2000, p 1

Some Concept Phase SSE Tasks
• Concept Trade Studies
– Concept alternative studies include quantitative and qualitative SSE analysis input and criteria
• Concept Definition
– Requirements management, risk management planning, feasibility and design trades, and safety technical requirements generation include results from SSE analysis
Shivers, 2005

Some Development Phase SSE Tasks
• Development of contract requirements in the Statement of Work and for the contract data requirements (analyses & reports)
– Example analysis requirements:
  • System Safety Plan
  • Preliminary Hazard List
  • Preliminary Hazard Analysis
  • Operating & Support Hazard Analyses
  • System Hazard Analyses
  • Fault Tree Analyses (FTA)
  • Probabilistic Risk Assessment (PRA)
• Design and Development
– SSE input into specification development and verification planning
Shivers, 2005

Some Production Phase SSE Tasks
• Fabrication, integration, test and evaluation
– SSE input into ground activities and verification
– Test planning to validate safety features
– Conducting tests safely
Shivers, 2005

Some Operations Phase SSE Tasks
• Operations
– SSE input into operations and performance validation (must be considered early as well)
• Operation and Support Hazard Analyses
• Analyses from the Human Factors Program
Shivers, 2005

Some Close Out SSE Tasks
• Decommissioning, disposal, recycling
– SSE inputs into process decisions
Shivers, 2005

NASA S&MA Roles
• S&MA provides
– SSE practitioners
– Assurance that requirements are set and met
– Development of disciplines and tools
• S&MA in-line engineering has a review, evaluation and concurrence role
• The S&MA assurance supports engineering, validation and verification, policy and
planning, and independent assessments

System Safety Effort Throughout Project Lifecycle
• Proposal Support
• Requirements Definition
• Design Assessment
• Identification of Hazards
• Recommended Hazard Controls
• Assessment of Risk
• Verification of Hazard Controls
• Development of Safety Data Packages
• Interface with KSC & Range Safety
• Safety Support during I&T Activities
• Track Closure of Verification Items
• Safety Certification
• Prelaunch Safety Support
Goddard Space Flight Center, 2006

SUMMARY
• System safety is involved throughout the entire project lifecycle
• Hazards to personnel or mission success are identified, eliminated or controlled to an acceptable level of risk
• Effectiveness of hazard controls must be verified
• Hazard analysis results and verification results are documented
Goddard Space Flight Center, 2006

Organizational Accidents
• Rare, sometimes catastrophic, events that occur within complex modern technologies
• Have multiple causes
• Have devastating effects on uninvolved populations and things
• Contrast with individual accidents, in which a person is often both the victim and the agent of the event
• Difficult to understand and control
(James Reason, p 1)

Generic Cause of Organizational Accidents
• “All organizational accidents entail the breaching of the barriers and safeguards that separate damaging and injurious hazards from vulnerable people or assets – collectively termed ‘losses’”
• In individual accidents such defenses are often either inadequate or lacking
• Three factors of breaching defenses:
– Human, technical, organizational
– Governed by production and protection
(James Reason, p 2)

Unintended Consequences
• “… conflicts between production and protection pressures tend to be resolved in favour of the former – at least until a bad accident occurs.”
– “efficient” methods for work arise naturally
– “safety”
adds restrictions to procedures
– rules become more restrictive over time
– the scope of allowable actions is reduced
– violation of procedure becomes necessary to accomplish the job
(James Reason, p 49)

Maintenance Can Seriously Damage Your System
• “…it is often latent conditions created by maintenance lapses that either set the accident sequence in motion or thwart its recovery.”
• “…of the various possible error types associated with the reassembly, installation or restoration of components, omission – the failure to carry out necessary steps in the task – comprise the largest single error type.”
(James Reason, pp 85/6)

Some Well-known Accidents
• USS Thresher 1963 sinking
– QC of brazing, etc.; quality problem = safety problem
– Poor design, overhaul followed by severe test
– Quality: to prevent, not learn from, catastrophe
– Design, manufacturing, identify safety critical elements, test and verification, test planning
• X-31 Crash 1995
– Faulty Configuration Management
– Pitot tube heaters not present in design
– Failure to follow procedure, find process escapes, identify critical failures, verification
• Idaho Falls nuclear reactor explosion 1961
– Poor maintenance procedures, on-the-fly process modifications, design flaws, QE supervision of work
NASA, 2006

Project and Systems Management
Were developed to manage in an emerging new environment:
• A multitude of government agencies, industrial firms and other organizations, sometimes on an international basis
• Funds in the multimillion to billion dollar category
• Complex technology sometimes reaching beyond the state of the art
• Large forces of scientists, engineers, technicians and administrative personnel
• Construction of extensive and highly specialized facilities
Rees

Apollo Program Characteristics
• Program and systems management perspective
• Technical
risk trades with cost and schedule
• Planning
• Visibility
• Management review
• Configuration control
• Penetration
• Communication
• Contracting philosophies
• Organization
• Authority, roles and responsibilities
• Innovation
• Goal focus
• Continuous study and application of systems engineering
• Relate actions to schedule and budget

Systems Aspects
• Such projects of great magnitude and complexity had to be considered under the overall “systems” point of view
• The Apollo Program had shortcomings, setbacks, and deficiencies during its execution – all of which challenged the management:
– To assure success, minimize technical risks or actually mission risks
– Keep closely to the time schedule
– Wherever possible must engage in parallel rather than consecutive developments
Rees

Tight Budget Control and Highest Economy in Expenditure
• Budget controls subordinate to technical needs and the demands of the time schedule
• There is a trade-off between acceptable technical risks or product quality, time schedule and project cost.
• “To eliminate the technical risk problem, frequently undue quality control or over-testing of hardware is applied which delays schedules and makes costs skyrocket.”
Rees

Solid Planning
Master plans on hardware, software, and overall systems:
• Technical approaches
• Resources such as facilities, manpower and funds
• Schedules
• Detailed breakdowns of the overall job and the system into subsystems
Rees

Visibility
• Management at all levels should know almost in “real time” what is going on in the program:
– technical occurrences
– schedule progress or delays
– financial status
• From the outset of the program, proper and effective channels and ways of communication have to be established on the government side between upper and lower echelons of management
• Prime contractors must provide equally effective channels down to their respective subcontractors
Rees

Significance of Visibility
• Enable management on all levels to predict trends in the progression of the program
• Vital for taking corrective steps before the program runs into impediments
• “The capability of management to foretell trouble and thus avoid it by appropriate actions was one of the major cornerstones of the Apollo success.” Dr. Eberhard Rees

Review Milestones
• Schedule review between government and prime contractors.
• Apollo reviews, for instance, in a chronological sequence:
– Program Requirements Review (PRR)
– Preliminary Design Review (PDR)
– Critical Design Review (CDR)
– Design Certification Review (DCR)
– Pre-Delivery Turn-Over Review (PDTR)
– Flight Readiness Review (FRR)
– Countdown Demonstration Test and its Review (CDDT)
Rees

Significance of Reviews
• Critically examine and assess the project status
• Affirm the quality of the product and its reliability
• Assure systems safety
• Every review resulted in protocolled action items
• Resolve problems
• Authorize go-ahead with the next increment of the overall plan
Rees

Configuration Control
• The contractor followed acceptable drawing room practice as to procedure and discipline
• Design intentions were carried through manufacturing
• Only mandatory changes were approved
• The exact configuration, known down to the most minute detail, was delivered to the launching site
• Failures or unsuitable hardware or material could be traced down to the point of origin (Apollo management called this “traceability”)
• “Configuration control carried out in a strict sense is very expensive. It is, therefore, vital that these controls not be overdone and that they are wisely introduced to prime contractors and subcontractors.”
Rees

Application of the Penetration Principle
Dr. Eberhard Rees on the “Penetration Principle”
“It permeated through the contractor organization to the subcontractor structure.
Spawned by this approach, improved failure analysis appeared throughout the system; in-process inspection was maintained at a high level; and receiving inspection techniques and effectiveness were improved, among other benefits.”

Significance of Penetration
• Improved Communication Channels
• Created close interaction of highly dedicated, competent technical and scientific personnel, all motivated by the impressive challenge of a huge complex program, no matter whether they are government or contractor employees
• Most instrumental in this government-contractor relationship was the establishment of resident personnel in the prime contractor plants
Rees

Contracting Principles
• Cost-plus-fixed-fee contracts:
– Used because of the uncertainties of effective, close pricing in such a program with its many unknowns
• Incentive fee contracts:
– A base fee of modest proportions
– Plus a scaled or incentive segment awarded to a contractor for success in meeting program product requirements for performance, cost, and time schedule
– Lends itself well to hardware contracts with reasonable, well-determined milestones, cost levels and schedule
• Award Fee contracts:
– Used where parameters are not easily distinguished in advance
– Support service or engineering service contracts
– Motivational in nature
Rees

Other Pertinent Principles
• Organize and motivate to achieve effective high morale in the workforce
• Delegate authority clearly, concisely and positively to achieve timely decisions
• Apply innovative concepts and techniques courageously
• Keep objectives pointed toward the goal
• Require continuing study and application of the systems engineering approach
• Relate actions to schedule and to budget continuously
Rees

The Apollo Management System
“Our management system evolved after some painful experiences in the early days of Apollo.
In fact, at the beginning of the program in 1961, there was no common system in existence within the rather young National Aeronautics and Space Administration. Then as the program gathered headway and matured, the management system became better defined, changing as necessary to keep pace with unfolding events. Early it was learned that in the environment of a big development project, there can be no static system. Change and evolution are inevitable.” Dr. Eberhard Rees

Program Integration
Three categories of concern:
• First, there are the hardware, systems and subsystems specialists who devote attention to the delivery of items that are technically adequate and qualified for mission performance
• Second, there are the specialists who approach the project from the point of view of controlling costs and schedules
• As the third organizational element in the grouping, there is the on-site resident management office, to assure that project management interests were advanced and that decisions were made and implemented within the designated scope of authority of the resident group
Rees

Resident Management Offices
• This resident element proved to be a most important link between government and contractor activities
• To expedite decisions, the resident manager required functional support, which was provided by specialized, on-site contract administration and technical engineering staff assigned from parent functional organizations of the responsible Center
• Could make decisions “on the spot” or commit the parent office or function at the Center (within well-established limits)
Rees

Significance of the Resident Management Office
• Speed the project management process
• Provide a dynamic interface with the contractor on a continuing day-to-day basis
• Integrate technical and managerial personnel
• The technical functions tend to strive primarily toward perfection to a degree that possibly inhibits adequate attention to manufacturing and launch schedules or cost.
• The contractor could well be oriented toward schedule, costs and profits, whereas the project manager might weigh concern more heavily on schedule and costs.
• Through the office of the resident manager, an automatic system of checks and balances developed to the end that each consideration received its appropriate share of attention.
Rees

Contractor Penetration
• Contractor penetration is necessary to obtain visibility
• There is an understandably strong desire on the part of industry to take the control and the funding and to do the job with but minor government intervention.
• The restiveness that stemmed from such close control gradually dissipated early in the Apollo Program as the benefits accruing from the industry-government team approach were revealed.
• The manager must have control of competent technical and administrative staff in order to conduct activities efficiently.
Rees

Program Management
“While centralized program management has many values, of prime importance is the assignment of all responsibility to single organizational management structures, pyramiding into a single strong personality. Of course with the responsibility, the manager must have commensurate authority to resolve technical, financial, production and other problems that otherwise require coordination and approval in separate channels at different echelons. And the manager must have clear, concise communications flowing in all directions.” Dr. Eberhard Rees

Conclusion
• System Safety and Quality:
– necessary components of good program and systems management
– very similar in their objectives, but with quite different tools and techniques
– Must be applied early in the life cycle
– Must be implemented religiously throughout program execution
– Must be continuously examined and improved
– Are complementary for safety and mission success

Acknowledgements (1 of 2)
“A Brief Overview of Selected System Safety Analytical Approaches,” R. R. Mohr, Jacobs Engineering, 2002.
“Air Force System Safety Handbook,” Air Force Safety Agency, July 2000.
“Cost and Schedule – The Overlooked Hazards,” Ron Stroup and Warren Naylor, Proceedings of the 19th International System Safety Conference, 2001.
“Improving Performance of the System Safety Function at the Marshall Space Flight Center,” Ed Kiessling and Herb Shivers, NASA Marshall Space Flight Center and Donald D. Tippett, The University of Alabama in Huntsville, Proceedings of the American Society for Engineering Management Conference, 9/2004.
“Human Factors: A Personal Perspective,” James Reason, Human Factors Seminar, Helsinki, 2006.
Managing the Risks of Organizational Accidents, James Reason, Ashgate, 1997 (9th reprint, 2005).
“Quality 101,” American Society for Quality, 2001.

Acknowledgements (2 of 2)
“Safety and Mission Success,” Technical Managers Training, Goddard Space Flight Center, 10/2006.
Some general SSE information in this presentation was taken from works of Pat Clemens/APT Research, Huntsville, AL; Ronnie Goodin/KSC, retired.
“System Safety Engineering Awareness Training for NASA Managers and Engineers,” (not yet released), 2006.
“System Safety Engineering Technical Warrant,” Herb Shivers, presented to the NASA Technical Authority Conference, June 2005.
The Fifth Discipline: The Art and Practice of the Learning Organization, Peter Senge, Currency Doubleday, 1990 (1st edition), 1994 (paperback edition).
“System Failure Case Studies,” NASA, Office of Safety and Mission Assurance, Review and Assessment Division, 2006.