Reliability Risk Assessment – Ray Barl

Reliability Risk
Assessment
Ray Barlog, PE
March 24, 2015
“Service Measured to the Standard”
Cornerstone Electrical Consultants, Inc.
Safety and Reliability
• Both deal with uncertainty, aim to reduce
undesired outcomes
• Safety mostly concerned with avoiding
harm to humans
• Reliability most often concerned with
reducing economic losses - $$
Risk
- An event that has a negative consequence and has a probability of occurring (not an opportunity)
• Risk = Likelihood x Consequence
• Reliability Risk = Failure Probability x $$ Impact
• Reliability Risks are often not constant across time
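The Reliability Risk product can be sketched in a few lines of Python. This is only an illustration: the function name and the sample inputs (a 10% annual failure probability and a $550k impact per failure) are assumptions chosen for the example, not results from the assessment.

```python
def reliability_risk(failure_probability: float, dollar_impact: float) -> float:
    """Reliability risk = failure probability x $ impact (expected loss)."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    return failure_probability * dollar_impact


# Hypothetical inputs: 10% chance of failure per year, $550k loss per failure.
annual_risk = reliability_risk(0.10, 550_000)
print(f"Reliability risk: ${annual_risk:,.0f} per year")
```

Because reliability risks are often not constant across time, the same calculation would normally be repeated with the failure probability that applies to each period.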
Risk
Do We Want To…………
Eliminate Risk?
Reduce Risk?
Manage Risk?
Risk Management Process
Identify
Assessment
Control
Respond
Analyze
Evaluate
Risk Assessment
The process of identifying,
analyzing, evaluating, and
prioritizing risks
Some Reliability Risk
Assessment Methods
Functional FMEA
Process FMEA
Equipment FMEA
Expected Value FMEA
Fault Tree Analysis
Qualitative Fault Tree
What If Analysis
Bow Tie Analysis
RAM Modeling
Stochastic Life Cycle Cost
Concept FMEA
Event Tree Analysis
Layer of Protection Analysis
Markov Analysis
3 Reliability RA Tools
• Functional Failure Mode and Effects
Analysis
• Bow Tie Analysis
• Reliability, Availability, Maintainability
(RAM) Modeling
FMEA
• Probably the most common reliability risk
assessment tool
• Structured method
• Works best with a team of diverse backgrounds
FMEA
• Came from Military Procedure MIL-P-1629,
Procedures for Performing a Failure Mode, Effects
and Criticality Analysis, dated November 9, 1949.
• FMEA used and improved by NASA in the 1960's to
improve and verify reliability of space program
hardware.
• Mil-Std-1629A used in the military and by commercial
• Used in the Nuclear Power Industry for evaluating
design risks
• SAE J1739 - an FMEA standard used in the auto
industry
FMEA asks the questions
• What is the intended function?
• How does it fail? (failure mode)
• How often do we expect the failure to occur?
• How severe are the effects?
• What are the potential causes of the failure?
• How likely is the onset of failure to be detected?
Common Example
Objective: Determine the most
critical risk and its cause(s) for this
boiler feed water system.
Common Example
If 2 pumps fail,
both boilers
trip
Common Example
P-1
P-2
P-3
Risk Rating Factors
RATING | DEGREE OF SEVERITY | OCCURRENCE (Qualitative) | FAILURE RATE (/yr)
1 | Less than $50k | Likelihood of occurrence is remote | 1.00E-06
2 | $50k to $100k | Low failure rate with supporting documentation | 1.00E-05
3 | $100k to $500k | Low failure rate without supporting documentation | 1.00E-04
4 | $500k to $1mm | Occasional failures | 1.00E-03
5 | $1mm to $5mm | Medium Failure Rate | 1.00E-02
6 | $5mm to $10mm | Moderately High Failure Rate | 1.00E-01
7 | $10mm to $100mm | High Failure Rate | 1

DETECTION (Detection Certainty)
• 100%: Almost certain that the potential failure will be found or prevented before producing an economic loss
• 50%: Current controls may or may not detect impending failure
• 0%: Current controls probably will not detect the potential failure
FMEA Worksheet
Subsystem: Boiler Feed Pump System
Function of Subsystem: Deliver feedwater to boilers at 2mmpph rate
Potential Failure Mode: Loss of ALL feed water flow

Potential Cause: 2oo3 Pumps Fail Simultaneously due to seal failure
• Potential Failure Effects: Boilers trip, Production Loss of $100k per day x 5 days plus $50k pump repair cost, Total $550k loss
• SEV 4, OCC 2, DET 4, RPN 32
• Current Controls: Manual Condition Monitoring for vibration
• Recommended Actions: Consider continuous vibration monitoring
• Action Owner: Joe Engineer

Potential Cause: 1 Pump fails and auto-start for standby fails
• Potential Failure Effects: Boilers trip, Production Loss of $100k per day x 5 days plus $60k repair cost. Total $560k loss
• SEV 3, OCC 3, DET 3, RPN 27
• Current Controls: Periodic Testing of Auto-Start
• Recommended Actions: None
• Action Owner: NA

Potential Cause: Loss of Station Service Bus B
• Potential Failure Effects: Boilers trip, Production Loss of $100k per day x 15 days plus $100k repair cost. Total $1.6mm loss
• SEV 5, OCC 2, DET 4, RPN 40
• Current Controls: Periodic ultrasonic corona testing
• Recommended Actions: None
• Action Owner: NA

Potential Cause: Pump 1 fails and Station Service bus B fails
• Potential Failure Effects: Boilers trip, Production Loss of $100k per day x 15 days plus $50k repair cost. Total $1.55mm loss
• SEV 5, OCC 2, DET 3, RPN 30
• Current Controls: Periodic ultrasonic corona testing
• Recommended Actions: None
• Action Owner: NA
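The RPN values on the worksheet are consistent with the conventional risk priority number, the product of the severity, occurrence, and detection ratings. A minimal Python sketch of that ranking follows; the `FmeaRow` class is a hypothetical helper, and the ratings are the values as read from the worksheet above.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    cause: str
    severity: int    # SEV rating from the risk rating factors
    occurrence: int  # OCC rating from the risk rating factors
    detection: int   # DET rating from the risk rating factors

    @property
    def rpn(self) -> int:
        """Risk priority number = SEV x OCC x DET."""
        return self.severity * self.occurrence * self.detection


rows = [
    FmeaRow("2oo3 pumps fail simultaneously due to seal failure", 4, 2, 4),
    FmeaRow("1 pump fails and auto-start for standby fails", 3, 3, 3),
    FmeaRow("Loss of Station Service Bus B", 5, 2, 4),
    FmeaRow("Pump 1 fails and Station Service Bus B fails", 5, 2, 3),
]

# Highest RPN first: the most critical risk for the stated objective.
ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)
for row in ranked:
    print(f"RPN {row.rpn:>3}  {row.cause}")
```

On these ratings, loss of Station Service Bus B ranks first, ahead of the simultaneous seal failure case.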
Bow Tie Analysis
A simple graphical tool that shows the link
between potential causes, preventive and
mitigating controls, and consequences of a
risk event
• Shows at a glance how risks are managed
• Can be purely qualitative or semiquantitative
Reason’s Swiss Cheese
Generalized Bow Tie
[Diagram: Threats or Causes (Cause 1, Cause 2, Cause 3, Cause 4) → Barriers → TOP EVENT → Mitigations → Consequences]
Example Risk Matrix
Freq per Year or Likelihood
• 1 (1/yr)
• 2 (1/10yr)
• 3 (.001/yr)
• 4 (.0001/yr)
• 5 (.00001/yr)
Financial Consequence Severity
• A: <$50k
• B: $50k to $500k
• C: $500k - $5mm
• D: $5mm - $50mm
• E: $50mm - $100mm
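As a rough illustration of how an event could be placed on a matrix like this one, the sketch below maps a frequency and a dollar loss to a cell label such as "1C". Several choices here are assumptions, not part of the slides: a loss exactly on a boundary goes to the higher category, losses above $100mm are clamped to E, and a frequency is assigned to the nearest listed category on a log scale. The matrix in the slides does not survive with cell rankings, so none are assigned.

```python
import bisect
import math

# Upper bounds (in dollars) of severity categories A-E from the example matrix.
SEVERITY_BOUNDS = [50_000, 500_000, 5_000_000, 50_000_000, 100_000_000]
SEVERITY_LABELS = "ABCDE"

# Likelihood categories 1-5 with their listed frequencies per year.
LIKELIHOOD_FREQS = {1: 1.0, 2: 0.1, 3: 0.001, 4: 0.0001, 5: 0.00001}


def severity_category(loss_dollars: float) -> str:
    """Return A-E; boundary values fall in the higher category (assumption)."""
    idx = bisect.bisect_right(SEVERITY_BOUNDS, loss_dollars)
    return SEVERITY_LABELS[min(idx, len(SEVERITY_LABELS) - 1)]


def likelihood_category(freq_per_year: float) -> int:
    """Return the category whose listed frequency is nearest on a log scale."""
    if freq_per_year <= 0:
        raise ValueError("freq_per_year must be positive")
    target = math.log10(freq_per_year)
    return min(
        LIKELIHOOD_FREQS,
        key=lambda c: abs(target - math.log10(LIKELIHOOD_FREQS[c])),
    )


def risk_cell(freq_per_year: float, loss_dollars: float) -> str:
    """Combine both categories into a matrix cell label."""
    return f"{likelihood_category(freq_per_year)}{severity_category(loss_dollars)}"


print(risk_cell(1.0, 550_000))
```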
Bow Tie-Common Example
THREATS or CAUSES
• 2oo3 pumps fail due to seal failures (F=1)
• One pump fails and auto-start fails (F=1)
• P-2 or P-3 fails and SS Bus A fails (F=3)
• Station Service Bus B Failure (F=2)
BARRIERS / PREVENTIVE CONTROLS and MITIGATIVE CONTROLS
[Diagram: each control carries an effectiveness rating of Weak, Medium, or Strong]
• Pump redundancy
• Robust shaft and bearing design
• Periodic testing of auto-start
• Use of Predictive Maintenance Techniques
• Corona testing to detect onset of failure
• Operator response
• Burner Trip System
• 3 Element BFW Control System
• Planned Repairs Prior to Major Damage
• Spares Stocking Strategy
• Quick Pump Repairs
TOP EVENT
• Inadequate BFW Flow to Boilers
CONSEQUENCES
• Boiler Tubes Damaged, $10mm
• Large Production Downtime Losses, $550k-$5mm
• Significant Pump Repair Costs, >$100k
RAM Model
• RAM: Reliability, Availability, Maintainability
• Reliability: Probability of surviving a given time
interval without failure under given conditions
• Availability: Average % time a system is in a state
to perform a function
• Maintainability: Probability of completion of a
maintenance task in a given time interval
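The three definitions can be made concrete with a short Python sketch. It assumes constant failure and repair rates (exponential distributions), which is a common simplification and not something the slides state; the function names are illustrative only.

```python
import math


def reliability(failure_rate_per_hr: float, interval_hours: float) -> float:
    """P(no failure during the interval), assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hr * interval_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system can perform its function."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def maintainability(mttr_hours: float, allowed_hours: float) -> float:
    """P(repair completed within allowed_hours), assuming exponential repair times."""
    return 1.0 - math.exp(-allowed_hours / mttr_hours)


# Hypothetical inputs for illustration only.
print(f"Reliability over one year: {reliability(1e-4, 8760):.3f}")
print(f"Availability:              {availability(990, 10):.3f}")
print(f"Repair done within 8 hrs:  {maintainability(8, 8):.3f}")
```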
RAM Model
• A graphical and mathematical
representation of system operation,
dependency, and performance
• Most quantitative of the three methods
presented
• Requires failure data, repair time data,
and system operating logic
RAM Model Building Block
• Series
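For a sense of how such building blocks combine, here is a minimal Python sketch. The slide names only the series block; the parallel and k-out-of-n blocks are added as assumptions because they are the usual companions, and all blocks are assumed to fail independently. The 2-out-of-3 case mirrors the common example, where the boilers trip if 2 of the 3 pumps fail; the 0.9 block reliability is a made-up input.

```python
from math import comb, prod


def series(block_reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return prod(block_reliabilities)


def parallel(block_reliabilities):
    """At least one block must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in block_reliabilities)


def k_out_of_n(k: int, n: int, r: float) -> float:
    """At least k of n identical, independent blocks must work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical block reliability of 0.9 for each pump.
print(f"Series of two blocks:   {series([0.9, 0.9]):.3f}")
print(f"Parallel of two blocks: {parallel([0.9, 0.9]):.3f}")
print(f"2-out-of-3 pumps:       {k_out_of_n(2, 3, 0.9):.3f}")
```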
RAM Model Building Block
RAM Model-Example
RAM Model-Typical Input
RAM Model Results
System Life Cycle Performance Summary
• System Mean Availability: 99.986%, +/- 0.052%
• Average Annual Production Losses: 2.457 mmLb/yr
• Average Annual Production Losses: $5,120/yr
• Average Outage Duration: 160.6 Hrs
• Longest Duration Outage: 372 Hrs
• Shortest Duration Outage: 0.34 Hrs
Results of 1000 Simulations, 20 Years in Length
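A result set like this typically comes from Monte Carlo simulation of failures and repairs over the system life. The sketch below is a much-simplified stand-in, not the model behind these results: it simulates a single repairable block with exponentially distributed times to failure and repair, and the MTBF and MTTR inputs are hypothetical. Only the run count (1000) and horizon (20 years) are taken from the summary.

```python
import random

HOURS_PER_YEAR = 8760


def simulate_availability(mtbf_h, mttr_h, years, rng):
    """Simulate one life-cycle history; return the fraction of time up."""
    horizon = years * HOURS_PER_YEAR
    t = 0.0
    downtime = 0.0
    while t < horizon:
        t += rng.expovariate(1.0 / mtbf_h)      # run until the next failure
        if t >= horizon:
            break
        repair = rng.expovariate(1.0 / mttr_h)  # outage duration
        downtime += min(repair, horizon - t)    # count only time inside the horizon
        t += repair
    return 1.0 - downtime / horizon


def mean_availability(mtbf_h, mttr_h, years=20, runs=1000, seed=1):
    """Average availability over many simulated life cycles."""
    rng = random.Random(seed)
    results = [simulate_availability(mtbf_h, mttr_h, years, rng) for _ in range(runs)]
    return sum(results) / runs


# Hypothetical block: one failure per year on average, 160-hour mean repair.
print(f"Mean availability: {mean_availability(8760, 160):.3%}")
```

A full RAM model would add the system operating logic (redundancy, dependencies, operating rules) and track production losses per outage, which this sketch leaves out.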
RAM Model Results
Pros / Cons - FMEA
Pros
• Structured, Thorough
• Easy to Learn
• Uses Group Knowledge
• Requires no special software
• Excellent for evaluating designs early in the process
Cons
• Tedious, Time Consuming
• Requires robust risk matrix
• Doesn't handle redundancy or multiple failures well
• Doesn't handle dependencies well
• Doesn't handle increasing failure rates well
Pros / Cons – Bow Tie
Pros
• Excellent risk management communication tool
• Easy to learn and interpret
• Uses group knowledge to develop
• Fairly quick to develop
Cons
• Quantifying risk requires modification
• Requires robust risk matrix
• Becomes complex with large systems
• Software recommended for good documentation
Pros / Cons – RAM Model
Pros
• Quantifies risks for prioritization
• Estimates risks over time
• Handles dependencies, redundancy, special ops rules
• Evaluating “What Ifs” can be done quickly
Cons
• Can be labor and $$ intensive for large systems
• Not easily understood by those without training
• Requires special analyst skills for model building
• Quality of model depends on quality of data
Final Thoughts
There is NO one best or universal
method.
Use the simplest method that can help
you meet the objective of your
assessment with the minimum investment
of time and resources.
Risk assessment alone is valueless; risks
must be managed, and that takes action.
What are your questions?