22.38 PROBABILITY AND ITS APPLICATIONS TO Fall 2005, Lecture 2

advertisement
22.38 PROBABILITY AND ITS APPLICATIONS TO
RELIABILITY, QUALITY CONTROL AND RISK ASSESSMENT
Fall 2005, Lecture 2
RISK-INFORMED OPERATIONAL
DECISION MANAGEMENT (RIODM):
RELIABILITY AND AVAILABILITY
Michael W. Golay
Professor of Nuclear Engineering
Massachusetts Institute of Technology
Component States and Populations
Successful
Ns
Failure
Failed
Repair
Nf
Consider a population, Nso, of successful components and, Nfo,
failed components placed into service at the same time.
At time, t, progresses, some of these components will fail and some
of the failed components will be repaired and returned to service.
The expected populations of components vary in time as:
Expected Successful Components:
Ns = NoPs(t)
Expected Failed Components:
Nf = NoPf(t)
and
Probability Conservation:
Ps(t) + Pf(t) = 1
and
Component Conservation:
Ns(t) + Nf(t) = No
COMPONENT FAILURE PROBABILITY
Component (Conditional) Failure Rate, λ(t),
1 dPs ( t )
1 dN s ( t )
=
= −λ ( t )
Ps ( t ) dt
Ns (t ) dt
where
Ps(t) = probability that an individual component will be
successful at time, t;
Ns(t) = expected number of components surviving at time, t (note
that Ns(t=0) = Nso);
λ(t) = time-dependent (conditional) failure rate function.
Mean-Time-To-Failure (MTTF) = 1/λ = τf,
for λ = constant.
COMPONENT REPAIR PROBABILITY
Component Repair Coefficient, µ(t),
1 dPf ( t )
1 dN f ( t )
=
= −µ( t )
Pf ( t ) dt
N f ( t ) dt
where
Pf(t) = probability that an individual component will be failed at
time, t;
Nf(t) = expected number of components failed at time, t (note that
Nf(t=0) = Nfo);
µ(t) = time-dependent (conditional) repair rate function.
Mean-Time-To-Repair (MTTR) = 1/µ = τR,
for µ = constant.
Combined Repair and Failure
dN s
= −λNs ( t ) + µN f ( t )
dt
dN f
= λNs (t ) − µN f ( t)
dt
can express as matrix equation
dN
= MN ,
dt
where
 N s ( t )
 −λ µ 
N=
 , and M = 
.
 N f ( t )
 λ −µ 
This is the relationship for a Markov process, where for a single
component:
dP( t )
= MP( t ) ,
dt
where
 Ps ( t )
P(t ) = state vector of the component = 
.
 Pf ( t )
For initial condition Ps(t=0) = 1 and Pf(t=0) = 0,
Solution is:
 λ  − ( λ+µ ) t
µ
Ps ( t ) =
+
e
λ + µ  λ + µ
 λ 
− (λ +µ ) t
Pf ( t ) = 
.
 1− e
 λ + µ
[
]
Asymptotic result: (i.e., as t ∅ ∞)
 µ 
 λ 
Ps ∞ = 
Pf∞ = 
,
.
λ + µ
 λ + µ
1
Ps
Ps∞
Pf∞
Pf
0
0
Time, t
COMPONENT CYCLE: RUN-TO-FAILURE,
REPAIR AND RETURN-TO-SERVICE
Consider that total mean cycle time is τcycle for:
Component Status
= τs (= MTTF)
a) Service
b) Failure
c) Waiting for repair
d) Repaired to service
= τR (= MTTR)
τs
τR
1 1 µ+λ
τ cycle = τ s + τ R = + =
λ µ
λµ
τs
µ
Ps ∞ =
=
τ cycle µ + λ
τR
λ
Pf∞ =
=
τ cycle µ + λ
τ cycle
EFFECTS OF COMPONENT
TESTING AND INSPECTION
BENEFICIAL
•
•
•
•
Verify That Component Is Operable
Reveal Failures That Can Be Repaired
Exercise Component and Maintain Operability
Maintain Skills of Testing Team
HARMFUL
• Removal From Service Can Result in Complete Component
Unavailability
• Wear and Tear Due to Testing (Wear, Fatigue, Corrosion, …)
• Introduction of New Defects (e.g., via Damage During Inspection,
Fuel Depletion)
• Acceleration of Dependent Failures
• Damage or Degradation of Component via Incorrect Restoration to
Service
• Human Error Can Cause Wrong Component to Be Removed From
Service
TIME DEPENDENCE OF STANDBY
COMPONENT UNAVAILABILITY,
INCLUDING TEST AND REPAIR
1.0
Test, t t ,
duration
Mean
repair, t R ,
duration
Test
Test
fR ,
fraction
of tested
components
requiring repair
0
t1
Effect of tests, demonstrating that some
failure modes have
not become activated
Q1
Q2
Effect of untested failure modes and
new defects
Time, t
t2
t1 = time of first test
t i = time of i-th test
POST-TEST UNAVAILABILITY
CAUSED BY
• Failures Requiring Repairs, Caused by Tests
• Defects Introduced by Tests, Resulting in Later Failures
• Incorrect Component (and Supporting System)
Disengagement, Re-Engagement
• Incorrect Component Having Been Tested
MEAN AVAILABILITY, <Q>, UNDER
DIFFERENT COMBINATIONS OF TESTING
AND REPAIR: CASES TO BE CONSIDERED
(λ = CONSTANT)
CASES
1. Asymptotic Component Unavailability as Function of µ, λ
2. Mean Component Unavailability During Standby Interval
3. Cycle Mean Unavailability Due to
• Defects randomly introduced during standby,
• Unavailability due to testing and repairs
4. Cycle Mean Unavailability Due to
• Pre-existing defects,
• Defects introduced during standby, and
• Unavailability due to testing and repairs
5. Standby Interval That Minimizes <Q>
CASE 1. ASYMPTOTIC AVAILABILITY WHEN
FAILURES ARE MONITORED AND REPAIRED
Asymptotic Availability : A ∞ = Ps ∞
1
0
µ
=
µ+λ
Ps
Ps∞
Pf
Pf∞
0
1
Note that MTTR = = TD
µ
1
⇒ A=
1 + λTD
Time, t
(TD = repair-related down-time)
and
λTD
Q =1− A =
1 + λTD
also, Q ≈ λTD
CASE 2. MEAN UNAVAILABILITY
DURING STANDBY PERIOD, ts
During Standby :
Q( t ) = Pf ( t ) = 1 − e −λt s ≈ 1 − (1 − λt s )
Q( t ) ≈ λt s
Pf
<Q>
ts
Time
ts
ts
′
′
Q
t
d
t
∫ 0 λ(t′) dt′
tD ∫0 ( )
t s2
Q =
=
=
=λ
tc
ts
ts
2t s
€tc = cycle time
ts
Q =λ
2
CASE 3. MEAN CYCLE UNAVAILABILITY,
INCLUDING TESTING AND REPAIR
For the Entire Testing Cycle Can Evaluate Expected Unavailability,
<Q>, Due to Defects Introduced Randomly During Standby and
Unavailability Due to Testing and Repairs as:
1 tc
tD
Q = ∫ 0 Q( t)dt =
,
tc
tc
€
where
Q
<Q>
Time
CASE 3. MEAN CYCLE UNAVAILABILITY
(continued)
DOWNTIME:
t D = t Ds + t D t + t D R
λt 2s
t Ds =
During Standby:
2
t Dt = t t
During Testing:
t D R = fR t R
During Repair:
fR = repair frequency, the fraction of tests for which a repair is
required
CYCLE TIME:
tc = ts + tt + tR
cycle standby testing repair
AVERAGE UNAVAILABILITY:

t D 1  λt s2
Q = = ∗
+ t t + fR t R  ( t s + t t + t R )
tc tc  2

CASE 4. MEAN CYCLE UNAVAILABILITY,
INCLUDING PRE-EXISTING
UNAVAILABILITY, Qo
Evaluate Expected System Unavailability, <Q>, Due to
• Pre-Existing Defects
• Defects Introduced Randomly During Standby and
• Unavailability Due to Testing and Repairs as:
Q
<Q>
Qo
Time
CASE 4. MEAN CYCLE UNAVAILABILITY,
INCLUDING PRE-EXISTING UNAVAILABILITY,
Qo (continued)
t D = t Ds + t D t + t D R
λ 2
During Standby: t Ds = Qo t s + t s (1 − Q o )
2
Qo = expected unavailability due to pre-existing defects (i.e.,
those not interrogated during testing)
t Dt = t t
During Testing:
t D R = fR t R
During Repair:
λ 2
For Entire Cycle: t D = Qo t s + (1 − Q o ) t s + t t + fR t R
2
CYCLE TIME: tc = ts + tt + tR
cycle standby testing repair
DOWNTIME:
AVERAGE UNAVAILABILITY:

tD 1 
λ 2
Q =
=  Qo ts + (1 − Qo ) ts  + t t + fR t R 

tc tc 
2 
COMBINED CASE OF EFFECT UPON STANDBY
SYSTEM FAILURE OF PRE-EXISTING FAULT
AND RANDOMLY INTRODUCED FAULT
I
R
I = Pre-existing fault event
R = Random fault event
F = I+R = Component fault
P( F) = P(I + R ) = P(I ) + P( R ) − P( I ) ⋅ P( R )
λt s
λt s
− Qo ⋅
2
2
λt s
P( F) = Q o + (1 − Q o )
2
P( F) = Q o +
CASE 5. STANDBY INTERVAL THAT
MINIMIZES <Q>
For a Good System:
⇒
t t + fR t R << t s
Q o << 1

1
λ 2
Q ≈ Q o t s + t s + t t + fR t R 

tc 
2
∂Q
€
*
The value of ts which minimizes Q , t s , is obtained from
= 0 as
∂t s
1/ 2
 2(t t + fR t R ) 
1/ 2
*
€
ts = 
=
2τ
t
+
f
t
(
)
[ f t RR]

λ


€
τf = random defects contribution
(tt + fRtR) = testing and repair contribution
UNAVAILABILITY
• Failure density fT (t ) = λe − λt t ≥ 0
t
• Cumulative Density Function (CDF): FT (t) = P(T ≤ t) = fT (t)dt
0
• Unavailability Q(t) :
probability that system is down at time t,
∫
€
t
Q(t) = FT (t) = ∫ 0 fT (t)dt = 1− e−λt ≈ 1− (1− λt)
Q(t) ≈ λt
€
MEAN UNAVAILABILITY DURING
STANDBY PERIOD, ts
2
1 ts
t
t
< Q >= ∫ 0 Q(t)dt ≈ ∫ 0s λtdt = λ s
ts
2t s
ts
< Q >≈ λ
2
MEAN CYCLE UNAVAILABILITY,
INCLUDING TESTING AND REPAIR
tt
Q(t)
tR
<Q>
ts
tC
 λt

Q(t) =  1
f
 R
(0 ≤ t ≤ t s )
(t s < t ≤ t s + t t )
(t s + t t < t ≤ t C )
Time
1
t €
× ∫ 0C Q(t)dt
tC
1
t
t +t
t
= × ∫ 0s λtdt + ∫ t s t dt + ∫ t C+t fR dt
s
s
t
tC

1 λ 2
= ×  t s + t t + fR t R 

tC 2
<Q>=
[
]
MEAN CYCLE UNAVAILABILITY INCLUDING
PRE-EXISTING UNAVAILABILITY, Q0
tt
Q(t)
tR
<Q>
Q0
ts
tC
Q 0 + (1− Q 0 )λt (0 ≤ t ≤ t s )

Q(t) = 
1
(t s < t ≤ t s + t t )

(t s + t t < t ≤ t C )
fR

Time
€
1
t
× ∫ 0C Q(t)dt
tC
1
t
t +t
t
= × ∫ 0s Q 0 + (1− Q 0 )λtdt + ∫ t s t dt + ∫ t C+t fR dt
s
s
t
tC

1 
λ 2
= × Q 0 t s + (1− Q 0 ) t s  + t t + fR t R 

t C 
2 
<Q>=
[
]
STANDBY INTERVAL, ts*, THAT
MINIMIZES <Q>

1 
λ 2
• < Q >= t × Q 0 t s + (1− Q 0 ) 2 t s  + t t + fR t R 
C
t t + fR t R << t s
t C = t s + t t + t R ≈ t s
• For a good system 
⇒
(1− Q 0 ) ≈ 1
 Q 0 << 1

€

1 
λ 2
⇒ < Q > ≈ × Q 0 t + t s  + t t + fR t R 

t s 
2 
∂€< Q > *
(t s ) = 0
∂t s
∂<Q> * λ
1
(t s ) = − (t t + fR t R ) × 2 = 0
∂t s
2
t*
s
1/2


2(t
+
f
t
)
⇒ t*s =  t R R 


λ
STANDBY INTERVAL, ts*, THAT
MINIMIZES <Q> (continued)
1
<Q>
ts*
tC
1/2


2(t
+
f
t
)
1/2
t*s =  t R R  = [2τ f ( t t + fR t R )]


λ
tf = random defects contribution
€
(tt+fRtR) = testing and repair contribution
MEAN UNAVAILABILITY, EXAMPLES
• Mean unavailability during standby period ts:
t s = 10 3 hr, λ = 10−4 hr −1
3
ts
10
< Q >= λ = 10−4 ×
= 0.05
2
2
• Mean cycle unavailability, including testing and repair:
€
t s = 10 3 hr, λ = 10−4 hr −1, t t = 25 hr, t R = 60 hr, fR = 0.01

1 λt s2
<Q>= 
+ t t + fR t R 
tC  2

10−4 ×10 3×2

1
= 3
+ 25 + 0.01× 60 ≈ 0.07

2
10 + 25 + 60 

MEAN UNAVAILABILITY, EXAMPLES
(continued)
• Mean cycle unavailability including Q0:
t s = 10 3 hr, λ = 10−4 hr −1, t t = 25 hr, t R = 60 hr, fR = 0.01, Q 0 = 0.02

1
λt s2
< Q > = Q 0 t s + (1− Q 0 )
+ t t + fR t R 
tC 
2



1
10−4 ×10 3×2
3
= 3
+ 25 + 0.01× 60 ≈ 0.087
0.02 ×10 + (1− 0.02)
2
10 + 25 + 60 

• Optimum standby interval ts:
1/2
1/2 



2(t
+
f
t
)
2(25
+
0.01×
60)
t*s =  t R R  = 
≈ 715.54 hr

−4




λ
10
€
EXAMINATION OF SEQUENCING OF TESTS
EXAMPLE OF TWO PARALLEL IDENTICAL COMPONENTS*
A) Successive Testing
B) Staggered Testing
*
• Consider random failures during standby, time out-of-service during testing
• Ignore time out-of-service during repairs, pre-existing defects.
FOR REDUNDANT SYSTEMS CAN COMBINE INDIVIDUAL COMPONENT
UNAVAILABILITY VALUES TO OBTAIN OVERALL SYSTEM UNAVAILABILITY,
CONSIDER A 1/2 PARALLEL SYSTEM (e.g., Two Parallel EDGs), WHERE SUCCESS
OF ONE COMPONENT IS SUFFICIENT FOR SYSTEM SUCCESS
A
B
Q system = Q A ⋅ Q B
(ignoring dependencies)
During Interval with Units A & B in Standby:
(
)(
)
Q s ( t ) = 1 − e −λ A t A 1 − e −λ B t B ≈ λ A t A ⋅λ B t B = λ A λ B t A t B
tA = time that component A has been on standby
tB = time that component B has been on standby
Note, effects of downtime for repair omitted from this analysis.
FOR REDUNDANT SYSTEMS CAN COMBINE INDIVIDUAL COMPONENT
UNAVAILABILITY VALUES TO OBTAIN OVERALL SYSTEM UNAVAILABILITY,
CONSIDER A 1/2 PARALLEL SYSTEM (e.g., Two Parallel EDGs), WHERE SUCCESS
OF ONE COMPONENT IS SUFFICIENT FOR SYSTEM SUCCESS (continued)
(
)
During Interval with Unit A in Testing:
Q s = 1⋅ 1 − e −λ Bt B ≈ λ B t B
During Interval with Unit B in Testing:
Qs = 1 − e
During Interval with Unit A Possibly in Repair:
Q s = fR A 1 − e −λ B t B ≈ f R A ⋅λ B t B
where
−λ A t A
) ⋅1 ≈ λ At A
(
)
(
)
fR A = repair frequency of Unit A
During Interval with Unit B Possibly in Repair:
where
(
fR B = repair frequency of Unit B
Qs = fR B 1 − e −λ A t A ≈ fR B ⋅λ A t A
ILLUSTRATION OF INDIVIDUAL COMPONENT (e.g., EDG) UNRELIABILITIES
FOR A 1/2 PARALLEL SYSTEM GIVEN A STRATEGY OF TESTING EACH
COMPONENT AT SUCCESSIVE INTERVALS (e.g., TESTING BOTH
COMPONENTS DURING SAME OUTAGE)*
Let λA = λB = λ
Testing Time Start
Component A
Component B
Component A
Tested First
τ1 = τ
τ2 = 2τ + tt
τ1′ = τ + tt
τ2′ = τ2 + tt - 2τ + 2tt
Component B
Tested Second
* Role of repair omitted from the analysis.
ILLUSTRATION OF INDIVIDUAL COMPONENT (e.g., EDG) UNRELIABILITY
FOR A 1/2 PARALLEL SYSTEM GIVEN A STRATEGY OF TESTING EACH
COMPONENT AT EVENLY STAGGERED INTERVALS
Component A
Tested First
Component B
Tested Second
COMPARISON OF MAXIMUM AND AVERAGE
VALUES OF Q, FIRST CYCLE OF TESTING
Qmax
Successive Testing:
λ( τ + t t ) ≈ λτ
Staggered Testing:

τ
τ
λ + t t  ≈ λ
2

2
€
<Q>cycle
λτ
≈
3
5
≈ λτ
24
€
Successive Testing:
Staggered Testing:
€
€
HUMAN ERRORS ARE
TYPICALLY MOST IMPORTANT
Also, taking into account human errors committed during tests
and repair and failure modes not tested previously.
Qo = unavailability due to defects existing at the start of the
next testing cycle
Qo = Q U + Q H , where
QU = unavailability due to failure modes not interrogated during
the tests performed, and those activated upon demand
QH = λttt + λRtR, and
λt = rate of introduction of defects due to human errors during
tests (e.g., system realignment errors), hr -1
λR = rate of introduction of defects due to human errors during
repairs (e.g., incorrectly installed gaskets, tools or debris
left within a component), hr -1
SUMMARY
• Testing and Inspections Contribute to Simultaneous Increases
and Decreases in System Availability
• These Contributions Can Be Balanced Optimally
• Staggered Testing Yield Lower Peak and Lower Mean System
Unavailability vs. Successive Testing
Download