Old Lessons Apply in the New World Part 2 3/26/2006

Part 2 – Session 1 – Breakout #1
Old Lessons Apply in the New World
3/26/2006
Herb Shivers NASA/MSFC
Recap from Panel Discussion
• “There has to be an optimum balance among technical
performance, time schedule and cost.”
Dr. Eberhard Rees
• “If eternal vigilance is the price of liberty, then chronic unease is
the price of safety.”
Professor James Reason (2005, p 37) (substitute “quality” for “safety”)
• Quality and System Safety both are instrumental in the
prevention process
What is Quality Engineering?
Juran:
– customer satisfaction, or simply "fitness for use" (p 20)
Ishikawa:
– the practice of developing, designing, producing, and
servicing a quality product that is most economical,
useful, and satisfactory to the customer (p 64)
Crosby:
– conformance to requirements (p 21)
Deming:
– a predictable degree of uniformity and dependability
that is suited to the market at low cost. In other
words, quality is meeting customer needs and wants
(p 61)
ASQ, 2001
Quality Evolution
– Babylonian, Egyptian, Greek, Roman weights
and measures for trade
– Trades and craft guilds standards (experts)
– Mass production and machinery (low level
training)
– Supervisor quality monitors
– Inspectors (à la quality control)
– Deming: Plan-Do-Check-Act Cycle
– Juran, Feigenbaum, Ishikawa: TQM
– Quality assurance
• designed in, not inspected in
(James Reason, pp 46/7)
What is System Safety Engineering?
System Safety Engineering (SSE) - A subset of the safety
engineering discipline that provides direct support to programs
and projects to achieve acceptable mishap risk through a
systematic approach of hazard analysis, risk assessment, and
risk management.
(J.R. Goodin/NASA/KSC (retired), 2004)
System Safety is the application of engineering and management
principles, criteria, and techniques to optimize all aspects of safety
within the constraints of operational effectiveness, time, and cost
throughout all phases of the system life cycle
(Air Force Safety Agency, 2000, p vii)
System safety is…
– A management doctrine, and
– A family of analytical approaches that support that doctrine
(Mohr, Jacobs Sverdrup, 2002)
Some Analysis Types
Preliminary Hazard Analysis (PHA)
System Hazard Analysis (SHA)
Subsystem Hazard Analysis (SSHA)
Occupational Health Hazard Assessment (OHHA)
Software Hazard Analysis
SSE Analyses consider system limits and risks
Mohr, 2002
Some Analytical Techniques
Preliminary Hazard Analysis
Failure Modes and Effects Analysis
Fault Tree Analysis
Event Tree Analysis
Cause-Consequence Analysis
Sneak Circuit Analysis
Probabilistic Risk Assessment
Digraph Analysis
Hazard and Operability Study (HAZOP)
Management Oversight and Risk Tree Analysis (MORT)
SSE requires a toolbox of techniques; there is no one size fits all tool
Mohr, 2002
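As one illustration of the techniques listed above, a fault tree combines basic-event probabilities through AND and OR gates to estimate the probability of a top-level undesired event. The sketch below is not taken from the presentation: it assumes the basic events are independent, and the event names and probabilities are hypothetical values chosen only to show the arithmetic.

```python
from math import prod


def and_gate(probs):
    """Output occurs only if every input occurs (independent events)."""
    return prod(probs)


def or_gate(probs):
    """Output occurs if at least one input occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities per mission (illustrative only).
p_pump_a_fails = 0.01
p_pump_b_fails = 0.01
p_power_loss = 0.001

# Hypothetical top event: loss of coolant flow occurs if both redundant
# pumps fail (AND gate) or if power is lost (OR gate).
p_both_pumps = and_gate([p_pump_a_fails, p_pump_b_fails])
p_top = or_gate([p_both_pumps, p_power_loss])

print(f"Both pumps fail: {p_both_pumps:.6f}")
print(f"Top event:       {p_top:.6f}")
```

In this made-up case the single-point power loss dominates the redundant pump pair, which is the kind of insight a fault tree is meant to surface. Real analyses must also treat common-cause failures, which the independence assumption ignores.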
Why System Safety Engineering?
Support management risk decisions relative to system hazards
Avoid “fly-fail-fix-fly” and “pilot error” mentalities
Manage safety in the same manner as any other design or
operational parameter
Prevent accidents, not react to them
Consider impacts to: workers, the public, product quality,
productivity, environment, facilities and equipment
Shivers, 2005
Effective System Safety Program Attributes
Management Commitment
Safety Culture
Independent Safety Organization
Communication
Qualified/Educated Personnel
Well-Defined Roles, Processes and Tools, Including:
Use of Technical Standards, Capture/Use of Lessons Learned,
Audits and Reviews, Stop Work Authority
Sufficient Resources
(Kiessling, Shivers, and Tippet, 2004)
Systems Thinking
Learn to view connected events as a system (Peter Senge,
The Fifth Discipline)
Seeing wholes – the “big picture,” unintended
consequences, cause and effect (including delay), long
term views, etc.
Our jobs don’t exist in isolation
Deal with root causes, not symptoms
Senge
Who Should Implement SSE?
SSE is the responsibility of all technical and
management personnel on a project team
Chief engineers, systems engineers, design engineers,
project managers all must include “SSE thinking” as a
minimum in their work and understand what SSE is and
does
SSE practitioners generally come from the safety and
mission assurance organizations, but must be planned
for and included in the team activities
Shivers, 2005
SSE Thinking
SSE thinking is focused on identifying and controlling
potential failure, while design engineering thinking might
be more focused on successful operation
Together, the two thought modes are complementary and
lead to a better chance of success, which is the goal of
each
Both thought modes need to be within the realm of
“Systems Thinking” in general to consider all impacts of
decisions made
Shivers, 2005
When is SSE Implemented?
SSE considerations must be included in the up front
conceptualization so that pertinent information can be used in trade
studies and requirements development
SSE is applied throughout the life cycle with appropriate tools and
analyses brought to bear as warranted
– The system safety process can be applied at any point in the system
life cycle, but the greatest advantages are achieved when it is used
early in the acquisition life cycle
– The system safety process is normally repeated as the system evolves
or changes and as problem areas are identified (Air Force Safety Agency,
2000, p 14)
Decisions made under cost and schedule pressure can lead to
hazards (Stroup and Naylor, 2001)
SSE and the Life Cycle
Early in the life cycle SSE considers hazards that may
occur any time in the life cycle
Early identification usually results in less expensive
corrections
Analysis can be and is done at any time in the life cycle
Shivers, 2005
System Safety Program Objectives
a. Safety, consistent with mission requirements, is designed into the system
in a timely, cost-effective manner
b. Hazards are identified, evaluated, and eliminated, or the associated risk
reduced to a level acceptable to the managing activity (MA) throughout the
entire life cycle of a system
c. Historical safety data, including lessons learned from other systems, are
considered and used
d. Minimum risk is sought in accepting and using new designs, materials,
and production and test techniques
e. Actions taken to eliminate hazards or reduce risk to a level acceptable to
the MA are documented
f. Retrofit actions are minimized
g. Changes in design, configuration, or mission requirements are
accomplished in a manner that maintains a risk level acceptable to the MA
h. Consideration is given to safety, ease of disposal, and demilitarization of
any hazardous materials associated with the system
i. Significant safety data are documented as “lessons learned” and are
submitted to data banks, design handbooks, or specifications
j. Hazards identified after production are minimized consistent with program
restraints
Air Force Safety Agency, 2000, p 1
Some Concept Phase SSE Tasks
Concept Trade Studies
– Concept alternative studies include quantitative and
qualitative SSE analysis input and criteria
Concept Definition
– Requirements management, risk management planning,
feasibility and design trades, and safety technical requirements
generation include results from SSE analysis
Shivers, 2005
Some Development Phase SSE Tasks
Development of contract requirements in the Statement of Work and for
the contract data requirements (analyses & reports)
– Example analyses requirements
System Safety Plan
Preliminary Hazard List
Preliminary Hazard Analysis
Operating & Support Hazard Analyses
System Hazard Analyses
Fault Tree Analyses (FTA)
Probabilistic Risk Analysis (PRA)
Design and Development
– SSE input into specification development and verification planning
Shivers, 2005
Some Production Phase SSE Tasks
Fabrication, integration, test and evaluation
– SSE input into ground activities and verification
– Test planning to validate safety features
– Conducting tests safely
Shivers, 2005
Some Operations Phase SSE Tasks
Operations
– SSE input into operations and performance
validation (must be considered early as well)
Operation and Support Hazard Analyses
Analyses from the Human Factors Program
Shivers, 2005
Some Close Out SSE Tasks
Decommissioning, disposal, recycling
– SSE inputs into process decisions
Shivers, 2005
NASA S&MA Roles
S&MA provides
– SSE practitioners
– Assurance that requirements are set and met
– Development of disciplines and tools
S&MA in-line engineering has a review, evaluation and
concurrence role
The S&MA assurance supports engineering, validation
and verification, policy and planning, and independent
assessments
System Safety Effort Throughout Project Lifecycle
Proposal Support
Requirements Definition
Design Assessment
Identification of Hazards
Recommended Hazard Controls
Assessment of Risk
Verification of Hazard Controls
Development of Safety Data Packages
Interface with KSC & Range Safety
Safety Support during I&T Activities
Track Closure of Verification Items
Safety Certification
Prelaunch Safety Support
Goddard Space Flight Center, 2006
SUMMARY
System safety is involved throughout entire project
lifecycle
Hazards to personnel or mission success are
identified, eliminated or controlled to an acceptable
level of risk
Effectiveness of hazard controls must be verified
Hazard analysis results and verification results are
documented
Goddard Space Flight Center, 2006
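The summary points above (identify hazards, eliminate or control them, verify the controls, document the results) can be sketched as a small hazard-tracking record. This is only an illustration under stated assumptions: the class names, fields, closure rule, and example hazard are hypothetical and do not represent any NASA hazard report format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Control:
    """A hazard control and whether its effectiveness has been verified."""
    description: str
    verified: bool = False  # set True only once verification evidence exists


@dataclass
class Hazard:
    """A documented hazard with its controls (hypothetical structure)."""
    title: str
    controls: List[Control] = field(default_factory=list)
    eliminated: bool = False

    def can_close(self) -> bool:
        """Closable if eliminated, or if every control has been verified."""
        if self.eliminated:
            return True
        return bool(self.controls) and all(c.verified for c in self.controls)


# Hypothetical example hazard with two unverified controls.
hazard = Hazard(
    "Propellant leak during ground processing",
    [Control("Leak check before fueling"), Control("Redundant seals")],
)
open_before = hazard.can_close()   # controls exist but are not yet verified

for control in hazard.controls:
    control.verified = True        # verification results recorded
closed_after = hazard.can_close()

print(open_before, closed_after)
```

The design choice worth noting is that a hazard with no controls and no elimination can never be closed, which mirrors the requirement that control effectiveness be verified rather than assumed.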
Organizational Accidents
Rare, sometimes catastrophic, events that occur
within complex modern technologies
Have multiple causes
Have devastating effects on uninvolved
populations and things
Contrast with individual accidents that involve a
person as often the victim and agent of the
event
Difficult to understand and control
(James Reason, p 1)
Generic Cause of Organizational Accidents
“All organizational accidents entail the
breaching of the barriers and safeguards
that separate damaging and injurious
hazards from vulnerable people or assets –
collectively termed ‘losses’”
In individual accidents such defenses are
often either inadequate or lacking
Three factors of breaching defenses:
– Human, technical, organizational
– Governed by production and protection
(James Reason, p 2)
Unintended Consequences
“… conflicts between production and
protection pressures tend to be resolved
in favour of the former – at least until a
bad accident occurs.”
– “efficient” methods for work arise naturally
– “safety” adds restrictions to procedures
– rules become more restrictive over time
– the scope of allowable actions is reduced
– violation of procedure becomes necessary to
accomplish the job
(James Reason, p 49)
Maintenance Can Seriously Damage Your System
“…it is often latent conditions created by
maintenance lapses that either set the accident
sequence in motion or thwart its recovery.”
“…of the various possible error types associated
with the reassembly, installation or restoration of
components, omissions – the failure to carry out
necessary steps in the task – comprise the largest
single error type.”
(James Reason, pp 85/6)
Some Well-known Accidents
USS Thresher 1963 sinking
– QC of brazing, etc. Quality Problem = safety problem
– Poor design, overhaul followed by severe test
– Quality - to prevent, not learn from catastrophe
– Design, manufacturing, identify safety critical
elements, test and verification, test planning
X-31 Crash 1995
– Faulty Configuration Management
– Pitot tube heaters not present in design
– Failure to follow procedure, find process escapes,
identify critical failures, verification
Idaho Falls nuclear reactor explosion 1961
– Poor maintenance procedures, on the fly process
modifications, design flaws, QE supervision of work
NASA, 2006
Project and Systems Management
Were developed to manage in an emerging new
environment:
A multitude of government agencies, industrial firms and
other organizations, sometimes on an international basis
Funds in the multimillion to billion dollar category
Complex technology sometimes reaching beyond the
state of the art
Large forces of scientists, engineers, technicians and
administrative personnel
Construction of extensive and highly specialized facilities
Rees
Apollo Program Characteristics
Program and systems management perspective
Technical risk trades with cost and schedule
Planning
Visibility
Management review
Configuration control
Penetration
Communication
Contracting philosophies
Organization
Authority, roles and responsibilities
Innovation
Goal focus
Continuous study and application of systems engineering
Relate actions to schedule and budget
Systems Aspects
Such projects of great magnitude and complexity had to be
considered under the overall “systems” point of view
The Apollo Program had shortcomings, setbacks, and
deficiencies during its execution – all of which challenged
the management:
To assure success, minimize technical risks or actually
mission risks
Keep closely to the time schedule
Wherever possible, engage in parallel rather than
consecutive developments
Rees
Tight Budget Control and Highest Economy in Expenditure
Budget Controls
Subordinate to technical needs and the
demands of the time schedule
There is a trade-off between acceptable
technical risks or product quality, time
schedule and project cost.
“To eliminate the technical risk problem, frequently
undue quality control or over-testing of hardware is
applied which delays schedules and makes costs
skyrocket.”
Rees
Solid Planning
Master plans on hardware, software, and overall systems:
Technical approaches
Resources such as facilities, manpower and funds
Schedules
Detailed breakdowns of the overall job and the system into
subsystems
Rees
Visibility
Management at all levels should know almost in “real time”
what is going on in the program:
technical occurrences
schedule progress or delays
financial status
From the outset of the program, proper and effective channels
and ways of communication have to be established on the
government side between upper and lower echelons of
management
Prime contractors must provide equally effective channels
down to their respective subcontractors
Rees
Significance of Visibility
Enable management on all levels to predict trends in
the progression of the program
Vital for taking corrective steps before the program
runs into impediments
“The capability of management to foretell trouble and
thus avoid it by appropriate actions was one of the
major cornerstones of the Apollo success.”
Dr. Eberhard Rees
Review Milestones
Schedule review between government and prime contractors. Apollo reviews, for
instance, in chronological sequence:
Program Requirements Review (PRR)
Preliminary Design Review (PDR)
Critical Design Review (CDR)
Design Certification Review (DCR)
Pre-Delivery Turn-Over Review (PDTR)
Flight Readiness Review (FRR)
Countdown Demonstration Test and its Review (CDDT)
Rees
Significance of Reviews
Critically examine and assess the project status
Affirm the quality of the product and its reliability
Assure systems safety
Every review resulted in protocolled action items
Resolve problems
Authorize go-ahead with the next increment of the
overall plan
Rees
Configuration Control
The contractor followed acceptable drawing room practice as
to procedure and discipline
Design intentions were carried through manufacturing
Only mandatory changes were approved
The exact configuration, known down to the most minute detail
was delivered to the launching site
Failures or unsuitable hardware or material could be traced
down to the point of origin (Apollo management called this
“traceability”)
“Configuration control carried out in a strict sense is very
expensive. It is, therefore, vital that these controls not be
overdone and that they are wisely introduced to prime
contractors and subcontractors.”
Rees
Application of the Penetration Principle
Dr. Eberhard Rees on the “Penetration Principle”
“It permeated through the contractor organization to
the subcontractor structure. Spawned by this
approach, improved failure analysis appeared
throughout the system; in-process inspection was
maintained at a high level; and receiving inspection
techniques and effectiveness were improved, among
other benefits.”
Significance of Penetration
Improved Communication Channels
Created close interaction of highly dedicated,
competent technical and scientific personnel, all
motivated by the impressive challenge of a huge
complex program, no matter whether they are
government or contractor employees
Most instrumental in this government-contractor
relationship was the establishment of resident
personnel in the prime contractor plants
Rees
Contracting Principles
Cost-plus-fixed-fee contracts:
Used because of the uncertainties of effective, close pricing in such a
program with its many unknowns
Incentive fee contracts:
A base fee of modest proportions
Plus a scaled or incentive segment awarded to a contractor for
success in meeting program product requirements for performance,
cost, and time schedule
Lends itself well to hardware contracts with reasonable,
well-determined milestones, cost levels and schedule.
Award Fee contracts:
Used where parameters are not easily distinguished in advance
Support service or engineering service contracts
Motivational in nature
Rees
Other Pertinent Principles
Organize and motivate to achieve effective high morale in the
workforce
Delegate authority clearly, concisely and positively to achieve
timely decisions
Apply innovative concepts and techniques courageously
Keep objectives pointed toward the goal
Require continuing study and application of the systems
engineering approach
Relate actions to schedule and to budget continuously
Rees
The Apollo Management System
“Our management system evolved after some painful
experiences in the early days of Apollo. In fact, at the
beginning of the program in 1961, there was no common
system in existence within the rather young National
Aeronautics and Space Administration. Then as the program
gathered headway and matured, the management system
became better defined, changing as necessary to keep pace
with unfolding events. Early it was learned that in the
environment of a big development project, there can be no
static system. Change and evolution are inevitable.”
Dr. Eberhard Rees
Program Integration
Three categories of concern:
First, there are the hardware, systems and subsystems
specialists who devote attention to the delivery of items that
are technically adequate and qualified for mission performance
Second, there are the specialists who approach the project
from the point of view of controlling costs and schedules.
As the third organizational element in the grouping, there is the
on-site resident management office, which assured that project
management interests were advanced and that decisions were
made and implemented within the designated scope of
authority of the resident group.
Rees
Resident Management Offices
This resident element proved to be a most important
link between government and contractor activities
To expedite decisions, the resident manager
required functional support, which was provided by
specialized, on-site contract administration and
technical engineering staff who:
– were assigned from parent functional organizations of
the responsible Center
– could make decisions “on the spot” or commit
the parent office or function at the Center (within
well-established limits)
Rees
Significance of the Resident Management Office
Speed the project management process
Provide a dynamic interface with the contractor on a continuing
day-to-day basis
Integrate technical and managerial personnel
The technical functions tend to strive primarily toward perfection to
a degree that possibly inhibits adequate attention to manufacturing
and launch schedules or cost. The contractor could well be
oriented toward schedule, costs and profits, whereas the project
manager might weigh concern more heavily on schedule and costs.
Through the office of the resident manager, an automatic system of
checks and balances developed to the end that each consideration
received its appropriate share of attention.
Rees
Contractor Penetration
Contractor penetration is necessary to obtain visibility
There is an understandably strong desire on the
part of industry to take the control and the
funding and to do the job with but minor
government intervention. The restiveness that
stemmed from such close control gradually
dissipated early in the Apollo Program as the
benefits accruing from the industry-government
team approach were revealed. The manager
must have control of competent technical and
administrative staff in order to conduct activities
efficiently.
Rees
Program Management
“While centralized program management has many
values, of prime importance is the assignment of all
responsibility to single organizational management
structures, pyramiding into a single strong
personality. Of course with the responsibility, the
manager must have commensurate authority to
resolve technical, financial, production and other
problems that otherwise require coordination and
approval in separate channels at different echelons.
And the manager must have clear, concise
communications flowing in all directions.”
Dr. Eberhard Rees
Conclusion
• System Safety and Quality:
– necessary components of good program and
systems management
– very similar in their objectives, but with quite
different tools and techniques
– Must be applied early in the life cycle
– Must be implemented religiously throughout
program execution
– Must be continuously examined and improved
– Are complementary for safety and mission
success
Acknowledgements (1 of 2)
“A Brief Overview of Selected System Safety Analytical Approaches,” R. R.
Mohr, Jacobs Engineering, 2002.
“Air Force System Safety Handbook,” Air Force Safety Agency, July 2000.
“Cost and Schedule – The Overlooked Hazards,” Ron Stroup and Warren
Naylor, Proceedings of the 19th International System Safety Conference,
2001.
“Improving Performance of the System Safety Function at the Marshall
Space Flight Center,” Ed Kiessling and Herb Shivers, NASA Marshall Space
Flight Center and Donald D. Tippett, The University of Alabama in
Huntsville, Proceedings of the American Society for Engineering
Management Conference, 9/2004.
“Human Factors: A Personal Perspective,” James Reason, Human Factors
Seminar, Helsinki, 2006.
Managing the Risks of Organizational Accidents, James Reason, Ashgate,
1997 (9th reprint, 2005).
“Quality 101,” American Society for Quality, 2001.
Acknowledgements (2 of 2)
“Safety and Mission Success,” Technical Managers Training, Goddard
Space Flight Center, 10/2006
Some general SSE information in this presentation was taken from works of
Pat Clemens/APT Research, Huntsville, AL; Ronnie Goodin/KSC, retired.
“System Safety Engineering Awareness Training for NASA Managers and
Engineers,” (not yet released), 2006.
“System Safety Engineering Technical Warrant,” Herb Shivers, presented to
the NASA Technical Authority Conference, June 2005.
The Fifth Discipline: The Art and Practice of the Learning Organization,
Peter Senge, Currency Doubleday, 1990 - 1st edition, 1994 - paperback
edition.
“System Failure Case Studies,” NASA, Office of Safety and Mission
Assurance, Review and Assessment Division, 2006.