Introduction to Reliability

• Reliability is:
– An inherent feature of design
– Concerned with performance in the field, as opposed to
quality of production (conformance to design specs)
• Definition
– Reliability is the probability that a system will perform
in a satisfactory manner for a given period of time
when used under specified operating conditions.
Introduction to Reliability (cont)
• What is “satisfactory”?
– All critical functions
– Time-oriented quantitative factors, such as MTBF and $P(X > t_0)$, with X = lifetime
– Qualitative factors, too
• Operating Conditions
– Use
– Handling, Transport, Installation, Storage
Reliability in the System Life-Cycle
• Conceptual Design Phase
– Define reliability requirements of a system
– Plan Reliability Program
• Preliminary Design Phase
– Allocate reliability requirements
– Predict reliability of components/subsystems
– Provide reliability estimates to cost estimating and design
trade-off studies
– Participate in design reviews
– Assess subsystem/component supplier reliability estimates
Reliability in the System Life-Cycle (cont)
• Detail Design Phase
– More detailed reliability prediction
– Assist in detail design decisions
– Assist in logistic support analysis
– Assist in prototype development
– Recommend changes prior to production
– Evaluate reliability of prototype
– Participate in other test and evaluation activities as
related to reliability
Reliability in the System Life-Cycle (cont)
• Production/Construction Phase
– Monitor production
– Perform reliability tests of selected items
• Qualification tests: prior to production, repetitive tests to determine MTBF, degradation, and failure modes
• Acceptance tests: random-sample or 100% testing of items exiting production, to ensure that the reliability demonstrated during qualification testing is being achieved in production items.
– Collect and analyze data on operational test (product evaluation
tests at a designated site)
– Recommend corrective action
– Continue to update reliability models and predictions
Reliability in the System Life-Cycle (cont)
• System Use Phase
– Data collection and analysis
– Reliability improvement studies
– Change recommendations
– Equipment redesign projects
Measures of Reliability
• Let T = random variable measuring the “lifetime” of an item (time to the first, or next, failure)
• Range space of T = $\{t : t \ge 0\}$
• Tests to establish the PDF and parameters of T are called “life testing”
• The cumulative distribution function $F(t) = P(T \le t)$ is called the failure distribution function
Measures of Reliability (cont)
• The reliability function is:
– $R(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(x)\,dx$
[Figure: reliability density function, with F(t) and R(t) marked relative to time t; probability axis from 0 to 1]
• Four ways to determine R(t) for a particular system
– Test many systems to failure and develop the curve empirically.
– Test many subsystems and use historical field data on others to develop subsystem reliability functions, then combine them with a reliability system model.
– Extrapolate past experience with similar systems.
– Use physical properties to hypothesize a certain distribution.
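As a rough illustration of the first approach (test many units to failure and build the curve empirically), the sketch below estimates R(t) as the fraction of units still working at time t. The failure times and the function name `empirical_reliability` are hypothetical, and the sketch assumes every unit was run to failure (no censoring).

```python
from bisect import bisect_right


def empirical_reliability(failure_times, t):
    """Estimate P(T > t) as the fraction of tested units whose lifetime exceeds t."""
    times = sorted(failure_times)
    failed_by_t = bisect_right(times, t)  # number of units with lifetime <= t
    return (len(times) - failed_by_t) / len(times)


# Hypothetical failure times in hours (illustrative data, every unit run to failure)
failures = [120.0, 340.0, 410.0, 560.0, 790.0, 1020.0, 1300.0, 1850.0]

for t in (0, 400, 1000, 2000):
    print(t, empirical_reliability(failures, t))
```

With real data the step curve would usually be smoothed or fitted to a hypothesized distribution, which links this approach to the last one in the list.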
Failures and Failure Rates
• Three types of failure (see Figure 12.4)
– Initial (failure at t = 0)
– Random
– Wearout
• If initial failures are to be disregarded in your analysis, then use
$$f(t) = \frac{g(t)}{1 - P(T = 0)}, \quad t > 0$$
as the density of $(T \mid T > 0)$, where g(t) is the density of the lifetime T for t > 0.
Failures and Failure Rates (cont)
• The hazard function is the instantaneous failure rate at time t, given survival up to t. Its formula is:
$$h(t) = \frac{f(t)}{R(t)} = \frac{-R'(t)}{R(t)}$$
• Note: $H(t) = \int_0^t h(x)\,dx$ (number of failures in [0, t]) is called the failure count function, also known as the cumulative hazard function.
Failures and Failure Rates (cont)
• How are H(t), R(t), and F(t) related?
$$H(t) = -\int_0^t \frac{R'(x)}{R(x)}\,dx = -\ln R(x)\Big|_0^t = -\ln R(t) + \ln R(0)$$
• Since R(0) = 1 (no failures at t = 0), ln R(0) = 0. So
$$R(t) = e^{-H(t)}, \qquad F(t) = 1 - e^{-H(t)}$$
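A quick numerical check of $R(t) = e^{-H(t)}$: the sketch below integrates a hazard function with the trapezoidal rule and compares the result with the known reliability function. It assumes a Weibull lifetime with hypothetical shape and scale values, chosen only because its R(t) and h(t) have simple closed forms.

```python
import math

# Hypothetical Weibull lifetime: shape BETA, scale ETA (hours)
BETA, ETA = 2.0, 1000.0


def reliability(t):
    return math.exp(-((t / ETA) ** BETA))


def hazard(t):
    # h(t) = f(t) / R(t), which for the Weibull is (BETA/ETA) * (t/ETA)**(BETA - 1)
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)


def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h(x) over [0, t] by the trapezoidal rule."""
    dx = t / steps
    inner = sum(hazard(i * dx) for i in range(1, steps))
    return dx * (0.5 * hazard(0.0) + inner + 0.5 * hazard(t))


t = 750.0
print(reliability(t), math.exp(-cumulative_hazard(t)))  # R(t) vs e^{-H(t)}
```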
Mean Lifetime (Time Between Failures)
• Mean life $= \theta \equiv E(T) = \int_0^{\infty} t f(t)\,dt$, or equivalently
$$\int_0^{\infty} [1 - F(t)]\,dt = \int_0^{\infty} R(t)\,dt$$
• Example:
– Random failures are often modeled with an exponential time to failure with rate λ:
$$f(t) = \begin{cases} \lambda e^{-\lambda t}, & t \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
$$F(t) = 1 - e^{-\lambda t}, \qquad R(t) = e^{-\lambda t}$$
Example (cont)
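• Then,
$$h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda \quad \text{(constant)}$$
• Also, because $R(t) = e^{-H(t)}$, $H(t) = \lambda t$ (linear in t), and $\theta = E(T) = 1/\lambda$
• $P(T < \theta) = F(\theta) = 1 - e^{-\lambda\theta} = 1 - e^{-1} = 1 - \frac{1}{e} = 1 - .3679 = .6321$
• $P(T \ge \theta) = .3679$, independent of λ (or θ)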
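The exponential results above can be checked numerically. This sketch assumes a hypothetical rate λ = 0.002 failures/hour and confirms that h(t) is constant, that θ = ∫R(t)dt ≈ 1/λ, and that P(T < θ) ≈ .6321.

```python
import math

LAM = 0.002  # hypothetical failure rate, failures per hour


def R(t):
    return math.exp(-LAM * t)


def f(t):
    return LAM * math.exp(-LAM * t)


def integrate(func, a, b, steps=100_000):
    """Trapezoidal rule, used only to check the closed-form results."""
    dx = (b - a) / steps
    inner = sum(func(a + i * dx) for i in range(1, steps))
    return dx * (0.5 * func(a) + inner + 0.5 * func(b))


theta = 1 / LAM                               # 500 hours
hazard_at_100 = f(100.0) / R(100.0)           # equals LAM: the hazard is constant
mean_life = integrate(R, 0.0, 50 * theta)     # integral of R(t); the tail past 50*theta is negligible
p_fail_before_mean = 1 - R(theta)             # F(theta) = 1 - e^{-1}
print(hazard_at_100, mean_life, round(p_fail_before_mean, 4))
```

Changing `LAM` changes θ and the mean life, but `p_fail_before_mean` stays at about .6321, which is the "independent of λ" point above.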
Examples on Page 349
• Example 1
– 5 components did not fail in 600 hours
– 5 others failed at various points
$$\lambda = \frac{5 \text{ failures}}{4180 \text{ hours}} = 0.001196 \text{ failures per hour}$$
• Example 2
– Operating cycle = 168.8 hours
– Downtime = 26.8 hours
– Operating time = 168.8 − 26.8 = 142 hours
Examples on Pages 352-353 (cont)
• Number of failures = 6
• λ = 6 / 142 ≈ 0.0423 failures per hour
• MTBF = 1 / λ = 142 / 6 = 23.67 hours
• Operational availability (valid only if we treat MTBF = MTBM, i.e., instant maintenance), with MDT = 26.8 / 6 = 4.47 hours:
$$A_o = \frac{MTBM}{MTBM + MDT} = \frac{23.67}{23.67 + 4.47} = 0.841$$
Other examples are on handouts:
– Hines and Montgomery, example 15-7
– Halpern, examples 10-1 through 10-6
Note: For the exponential failure model, $R(t) = e^{-\lambda t}$ is the first term (zero failures) of a Poisson distribution with parameter λt.
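The arithmetic of the two examples, as a short Python sketch. It assumes MTBM = MTBF and MDT = total downtime / number of failures, as the examples do; the variable names are illustrative only.

```python
# Example 1: 5 failures observed over 4180 test hours
lam1 = 5 / 4180
print(round(lam1, 6))                    # failures per hour

# Example 2: 168.8 h operating cycle that includes 26.8 h of downtime; 6 failures
operating_time = 168.8 - 26.8            # 142 h
failures = 6
lam2 = failures / operating_time         # about 0.0423 failures per hour
mtbf = 1 / lam2                          # = 142 / 6
mdt = 26.8 / failures                    # mean downtime per failure, about 4.47 h
a_o = mtbf / (mtbf + mdt)                # treating MTBM = MTBF (instant maintenance)
print(round(lam2, 4), round(mtbf, 2), round(mdt, 2), round(a_o, 3))
```

Computing MTBF as 142 / 6 directly, rather than from a rounded λ, keeps the result consistent with $A_o = 142 / 168.8$.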
What if Failure Rate Not Constant?
• Distribution: failure rate h(t) behavior
– Normal: increasing function
– Lognormal: various shapes
– Weibull: decreasing (β < 1), constant (β = 1), increasing (β > 1)
– Gamma: decreasing (n < 1), constant (n = 1), increasing (n > 1)
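To see the Weibull row concretely, this sketch evaluates $h(t) = (\beta/\eta)(t/\eta)^{\beta-1}$ at a few sample times for β below, at, and above 1. The scale η = 1 and the sample times are arbitrary assumptions, and checking three points illustrates the trend rather than proving monotonicity.

```python
def weibull_hazard(t, beta, eta=1.0):
    """Weibull failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def trend(beta, times=(0.5, 1.0, 2.0)):
    """Classify h(t) at a few sample times as decreasing, constant, or increasing."""
    rates = [weibull_hazard(t, beta) for t in times]
    pairs = list(zip(rates, rates[1:]))
    if all(a > b for a, b in pairs):
        return "decreasing"
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a == b for a, b in pairs):
        return "constant"
    return "mixed"


for beta in (0.5, 1.0, 2.0):
    print(beta, trend(beta))
```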
What if Failure Rate Not Constant (cont)
• Use a different h(t) for each time interval over which the rate is constant
• Use the average failure rate (AFR) between t1 and t2:
$$AFR(t_1, t_2) = \frac{\int_{t_1}^{t_2} h(t)\,dt}{t_2 - t_1} = \frac{H(t_2) - H(t_1)}{t_2 - t_1} = \frac{\ln R(t_1) - \ln R(t_2)}{t_2 - t_1}$$
Note: $AFR(0, t) = \dfrac{H(t)}{t} = \dfrac{-\ln R(t)}{t}$
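A minimal sketch of the AFR formula, assuming a hypothetical Weibull lifetime so that R(t) and H(t) are available in closed form. It computes the AFR through ln R and through H and shows that the two routes agree.

```python
import math

# Hypothetical Weibull lifetime, used only to illustrate the AFR formula
BETA, ETA = 1.5, 2000.0


def R(t):
    return math.exp(-((t / ETA) ** BETA))


def H(t):
    return (t / ETA) ** BETA          # cumulative hazard, so R(t) = exp(-H(t))


def afr(t1, t2):
    """Average failure rate over [t1, t2]: (ln R(t1) - ln R(t2)) / (t2 - t1)."""
    return (math.log(R(t1)) - math.log(R(t2))) / (t2 - t1)


print(afr(500.0, 1500.0), (H(1500.0) - H(500.0)) / 1000.0)   # two routes, same value
print(afr(0.0, 1000.0), H(1000.0) / 1000.0)                   # AFR(0, t) = H(t) / t
```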
Concepts Our Text Skips
• Renewal rate function r(t) = instantaneous failure rate at time t, accounting for replacement of failed items with new components from the same population as the original parts
• Censored Type I data: A fixed test duration T is preset. Units that do not fail before T are “censored,” in that the data do not account for their survival beyond time T. If T is poorly chosen, there may be no failures by time T; then what?
Concepts Our Text Skips (cont)
• Censored Type II data: A fixed number of failures is prespecified: n items are tested until r fail. If r is poorly chosen, the test may take too long.
• Readout time data: record actual failure times of each failed component
Estimation of λ for Exponential Life
• λ = (number of failures) / (total unit test hours)
• Type I censored data
– n items, r failures
$$\lambda = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)T}$$
where $t_i$ = time of the i-th failure
• Type II censored data (ends at the r-th failure time $t_r$)
$$\lambda = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)t_r}$$
• If a system has n components and the system fails when the first component fails,
$$\lambda_s = \sum_{i=1}^{n} \lambda_i$$
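The two censored-data estimators and the series-system rate, as a Python sketch. The failure times and the number of units are hypothetical; the functions assume exponential lifetimes and that `failure_times` lists only the r observed failures.

```python
def lambda_type1(failure_times, n, T):
    """Type I censoring: n units on test, test stopped at the fixed time T."""
    r = len(failure_times)
    return r / (sum(failure_times) + (n - r) * T)


def lambda_type2(failure_times, n):
    """Type II censoring: test stopped at the r-th failure, whose time t_r is the largest."""
    r = len(failure_times)
    t_r = max(failure_times)
    return r / (sum(failure_times) + (n - r) * t_r)


def series_rate(component_rates):
    """System fails at the first component failure: lambda_s = sum of lambda_i."""
    return sum(component_rates)


# Hypothetical data: 10 units on test, 4 observed failures (hours)
fails = [150.0, 280.0, 410.0, 600.0]
print(lambda_type1(fails, n=10, T=1000.0))   # 4 / (1440 + 6 * 1000)
print(lambda_type2(fails, n=10))              # 4 / (1440 + 6 * 600)
print(series_rate([0.0002, 0.0005, 0.0003]))
```

Both estimators are the same ratio, failures divided by total unit test hours; only the time credited to the n − r surviving units differs.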
System Reliability Models
• Defined: Math models of the system that show functional
relationships among subsystems, components, etc.
• Examples
– Reliability block diagram
• Shows all possible success/failure combinations
• Series and parallel; also k-out-of-n configuration
• Any closed path through system is success
• May not resemble system physically
• Standby redundancy
System Reliability Models (cont)
• Coherent systems models
• Fault tree analysis and other cause-consequence diagrams
– Work from top level events (failures)
– To primary events (causes)
Series Configuration
[Figure: blocks 1, 2, ..., n connected in series]
• Static model:
$$R_s = \prod_{i=1}^{n} R_i = R_1 \cdot R_2 \cdots R_n$$
• Dynamic model:
$$R_s(t) = \prod_{i=1}^{n} R_i(t), \qquad h_s(t) = \sum_{i=1}^{n} h_i(t), \qquad H_s(t) = \sum_{i=1}^{n} H_i(t)$$
Example
• Exponential subsystem failure models
$$R_s(t) = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t}$$
$$h_s(t) = \sum_{i=1}^{n} \lambda_i \quad \text{(constant)}$$
$$\theta = MTBF = \frac{1}{\sum_{i=1}^{n} \lambda_i}$$
See example on page 354
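For exponential subsystems, the product of the individual reliabilities collapses to a single exponential with rate Σλ_i. This sketch, with hypothetical rates, checks that numerically and computes the series MTBF.

```python
import math


def series_reliability(rates, t):
    """R_s(t) = product of exp(-lambda_i * t) over the exponential subsystems."""
    return math.prod(math.exp(-lam * t) for lam in rates)


def series_mtbf(rates):
    """theta = 1 / (sum of lambda_i) for a series of exponential subsystems."""
    return 1 / sum(rates)


rates = [0.0001, 0.0004, 0.0005]   # hypothetical failure rates, per hour
print(series_reliability(rates, 100.0), math.exp(-sum(rates) * 100.0))
print(series_mtbf(rates))           # 1 / 0.001, about 1000 hours
```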
Active Parallel Configuration
• Static:
$$R_a = 1 - \prod_{i=1}^{n} (1 - R_i)$$
• Dynamic:
$$R_a(t) = 1 - \prod_{i=1}^{n} (1 - R_i(t))$$
• Identical components:
$$R_a(t) = 1 - [1 - R(t)]^n$$
[Figure: blocks 1, 2, ..., n connected in parallel]
• System fails only if all n subsystems fail
Example 1
• Always keep in mind: “Redundancy has a cost”

# of Components in Parallel    R           Wt.      Benefit/Cost
1                             0.95        5 lb     -
2                             0.9975      10 lb    .0475 / 5 lb
3                             0.999875    15 lb    .002375 / 5 lb
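The table can be regenerated from $R_a = 1 - \prod(1 - R_i)$. In this sketch the unit reliability 0.95 and weight 5 lb come from the table; the helper names are hypothetical, and the last field in each row is the reliability added by the most recent unit (the benefit bought with another 5 lb).

```python
import math


def parallel_reliability(reliabilities):
    """Active parallel: the system fails only if every unit fails."""
    return 1 - math.prod(1 - r for r in reliabilities)


def redundancy_table(r_unit, weight_lb, max_units):
    """Rows of (units, system R, total weight, reliability gained by the last unit added)."""
    rows, previous = [], None
    for n in range(1, max_units + 1):
        r_sys = parallel_reliability([r_unit] * n)
        gain = None if previous is None else r_sys - previous
        rows.append((n, r_sys, n * weight_lb, gain))
        previous = r_sys
    return rows


for row in redundancy_table(0.95, 5, 3):
    print(row)
```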
Example 2
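• Exponential subsystem lifetime, identical subsystems:
$$R_a(t) = 1 - \left[1 - e^{-\lambda t}\right]^n$$
$$\theta_a = \int_0^{\infty} R_a(t)\,dt = \sum_{i=1}^{n} \frac{1}{\lambda i} = \theta \sum_{i=1}^{n} \frac{1}{i}$$
e.g., if n = 3 and $\theta = 1/\lambda = 1000$ hours,
$$\theta_a = \frac{1000}{1} + \frac{1000}{2} + \frac{1000}{3} = 1000 + 500 + 333.33 = 1833.33 \text{ hours}$$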
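The harmonic-sum result for $\theta_a$ can be checked by integrating $R_a(t)$ directly. The sketch assumes independent, identical exponential units; the integration horizon and step count are arbitrary choices made only for the check.

```python
import math


def parallel_mtbf(theta, n):
    """Closed form: theta * (1 + 1/2 + ... + 1/n) for n identical exponential units."""
    return theta * sum(1 / i for i in range(1, n + 1))


def parallel_mtbf_numeric(theta, n, steps=200_000, horizon=40.0):
    """Check by integrating R_a(t) = 1 - (1 - e^{-t/theta})**n over [0, horizon*theta]."""

    def r_a(t):
        return 1 - (1 - math.exp(-t / theta)) ** n

    upper = horizon * theta
    dx = upper / steps
    inner = sum(r_a(i * dx) for i in range(1, steps))
    return dx * (0.5 * r_a(0.0) + inner + 0.5 * r_a(upper))


print(parallel_mtbf(1000.0, 3))          # 1000 + 500 + 333.33...
print(parallel_mtbf_numeric(1000.0, 3))
```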
Special Configurations
• K-out-of-n configuration
– The system works only if at least k of its n components are working. Assume identical, independent components with reliability R(t):
$$R_s(t) = \sum_{i=k}^{n} \binom{n}{i} [R(t)]^i [1 - R(t)]^{n-i}$$
• If $R(t) = e^{-\lambda t} = e^{-t/\theta}$ (exponential), then
$$\theta_s = \theta \sum_{i=k}^{n} \frac{1}{i}$$
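A direct implementation of the k-out-of-n formulas, assuming independent, identical components. The example values (2-out-of-3, R = 0.9, θ = 1000 hours) are hypothetical. With k = 1 the reliability reduces to active parallel, and with k = n to series.

```python
import math


def k_out_of_n_reliability(k, n, r):
    """P(at least k of n identical, independent units work), each with reliability r."""
    return sum(math.comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def k_out_of_n_mtbf(k, n, theta):
    """Exponential units with mean life theta: theta_s = theta * sum_{i=k}^{n} 1/i."""
    return theta * sum(1 / i for i in range(k, n + 1))


print(k_out_of_n_reliability(2, 3, 0.9))   # 3(0.81)(0.1) + 0.729, approx. 0.972
print(k_out_of_n_mtbf(2, 3, 1000.0))       # 1000 * (1/2 + 1/3), approx. 833.33
```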
Special Configurations (cont)
• Combined series-parallel
– Key: treat components in parallel as a single component, then expand
$$R_s = R_A \cdot R_{B \cup C} = R_A[1 - (1 - R_B)(1 - R_C)]$$
$$R_s = R_{A \cup B} \cdot R_{C \cup D} = [1 - (1 - R_A)(1 - R_B)][1 - (1 - R_C)(1 - R_D)]$$
See pages 354-355
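The “collapse parallel groups, then multiply” rule maps naturally onto two small helper functions. The component reliabilities below are hypothetical, chosen only to evaluate both formulas.

```python
def series(*reliabilities):
    """Series: every block must work, so multiply reliabilities."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(*reliabilities):
    """Active parallel: the group fails only if all blocks fail."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1 - r
    return 1 - all_fail


# Hypothetical component reliabilities
RA, RB, RC, RD = 0.95, 0.90, 0.85, 0.80

r_first = series(RA, parallel(RB, RC))                   # R_A * R_(B or C)
r_second = series(parallel(RA, RB), parallel(RC, RD))    # R_(A or B) * R_(C or D)
print(r_first, r_second)
```

Because each helper returns a single reliability, nested groups can be reduced from the inside out, exactly as the key above describes.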
Availability Measurement
• Inherent availability (ideal support environment)
$$A_i = \frac{MTBF}{MTBF + \bar{M}_{ct}}$$
$\bar{M}_{ct}$ = mean corrective maintenance time = mean time to repair (MTTR)
• Does not include preventive maintenance, logistics delay, or administrative delay.
• Achieved availability (ideal support environment)
$$A_a = \frac{MTBM}{MTBM + \bar{M}}$$
$\bar{M}$ = mean active maintenance time = weighted average of corrective and preventive maintenance time.
– MTBM = mean time between any maintenance action, corrective or preventive
Availability Measurement (cont)
• Operational availability (actual support environment)
$$A_o = \frac{MTBM}{MTBM + MDT}$$
MDT = mean downtime = weighted average of active maintenance (corrective and preventive) and delays (logistical and administrative).
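All three availability measures share the same ratio; what changes is which up-time and down-time means go into it. The sketch below uses one helper (a hypothetical name) and plugs in the Example 2 figures for $A_o$, treating MTBM = MTBF as that example does; the $A_i$ inputs are made-up values.

```python
def availability(mean_up, mean_down):
    """Common ratio behind A_i, A_a and A_o: mean_up / (mean_up + mean_down)."""
    return mean_up / (mean_up + mean_down)


# A_i: mean_up = MTBF, mean_down = mean corrective maintenance time (MTTR)
# A_a: mean_up = MTBM, mean_down = mean active maintenance time M-bar
# A_o: mean_up = MTBM, mean_down = MDT (active maintenance plus logistics/admin delay)

a_i = availability(500.0, 5.0)             # hypothetical MTBF = 500 h, MTTR = 5 h
a_o = availability(142 / 6, 26.8 / 6)      # Example 2 figures, treating MTBM = MTBF
print(round(a_i, 4), round(a_o, 3))
```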
Comments on Availability
• Availability is a function of both:
– Reliability of a prime item
– The logistics support subsystem
• Equipment designer can exert little control over support
operations, but can design in:
– Built-in diagnostics
– Easy access
– Rapid disconnect / connect
Comments on Availability (cont)
• The proper balance of R&M must be decided in early
stages, when flexibility is great.
• Discussion of availability is always in some context:
– Actual failure or not
– Which mission, what is critical to success
– Maintenance crew, equipment, spares availability
Reliability Techniques in System Design Phase
• Conceptual Design Phase
– Assignment of system reliability goal based on:
• Mission analysis
• Cost analysis
• Technical Limits
• Preliminary Design:
– Block Diagram Models
– Estimation of Ri(t) Functions
– Study of failure points, solutions
Reliability Techniques in System Design Phase (cont)
• Preliminary Design Phase (Cont.)
– Definition of success/failure criteria
– Budgeting/revision of reliability requirements
• Detail Design:
– Material and Parts Selection
– Standardization
– Test and Evaluation
– Requirements for Suppliers
– Series-Parallel Recommendations
– De-rating
Standardization
• Standardization:
– Means selecting components and materials whose reliability characteristics, including their degradation under stress and aging, are known. This indirectly eases the burden on spare-parts inventories, because the same component is used in several systems.
De-rating
• De-rating:
– Use a part in an application below its rated value
– A type of overdesign to provide a reliability margin
• Steps:
– Identify operating interval
– Select de-rating % (see RCA Corp. table)
– Calculate de-rated value of component to be used
• Example: ceramic capacitor for a 100 V (max) application
– RCA recommends 70% de-rating
– X(0.7) = 100, so X = 142.86 V is the minimum rating required for the component
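The de-rating calculation amounts to dividing the operating value by the de-rating fraction. A one-function sketch (the name `minimum_rating` is hypothetical) reproducing the capacitor example:

```python
def minimum_rating(operating_value, derating_fraction):
    """Smallest component rating X such that X * derating_fraction >= operating_value."""
    return operating_value / derating_fraction


print(round(minimum_rating(100.0, 0.70), 2))   # capacitor example: 100 V at 70% de-rating
```

In practice the result is rounded up to the next available standard rating.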
Binomial Expansion to Explain Parallel-Redundant Systems
• Consider 3 identical components in parallel
– P = probability of operation of each
– Q = probability of failure of each
$$(P + Q)^3 = P^3 + \binom{3}{1} P^2 Q + \binom{3}{2} P^1 Q^2 + \binom{3}{3} P^0 Q^3 = P^3 + 3P^2Q + 3PQ^2 + Q^3$$
– Terms, in order: all 3 up; 2 up, one failed; one up, two failed; all 3 down
$$P(\text{system operating}) = P^3 + 3P^2Q + 3PQ^2 = 1 - Q^3 = 1 - (1 - P)^3$$
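The binomial argument can be confirmed by brute force: enumerate all 2^3 up/down states of three independent components and add up the probabilities of those with at least one unit operating. P = 0.9 is an illustrative value.

```python
from itertools import product


def parallel_by_enumeration(p, n=3):
    """Add the probabilities of every up/down state in which at least one unit is up."""
    q = 1 - p
    total = 0.0
    for state in product((True, False), repeat=n):    # True = unit operating
        ups = sum(state)
        if ups >= 1:
            total += p**ups * q ** (n - ups)
    return total


p = 0.9
q = 1 - p
closed_form = p**3 + 3 * p**2 * q + 3 * p * q**2      # the three "system up" terms
print(parallel_by_enumeration(p), closed_form, 1 - q**3)
```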
Binomial Expansion to Explain Parallel-Redundant Systems (cont)
• Let $P_A = P_B = P_C = P_D = 0.9$
Which configuration is more reliable? Why?
[Figure: two alternative configurations of components A, B, C, and D]
Parallel Redundancy Has Its Drawbacks
• Limitations
– Each subsystem must have a “switch” to ensure its failure doesn’t disable the remaining components
– It is sometimes necessary to “disconnect” a failed subsystem
– Redundancy increases weight, volume, cost, and sometimes complexity. The failure-sensing device may be unreliable.
• Alternatives to redundancy
– Reduce number of parts
– Simplify
– Improve reliability level of parts used, especially at critical “nodes”
– Burn-in of parts
– On-board spares, repairs
Standby Redundancy
• Assume “cold” standby: not energized until failure is detected in the original component
• Assume reliability of the “decision switch” is 100%
• Lifetime variable is $T = T_1 + \cdots + T_n$
• Standby is always more reliable than simple parallel, if the switch is 100% reliable
[Figure: component 1 backed by standby components 2 through n via a decision switch (DS)]
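Standby Redundancy (cont)
• Assume lifetime variable is as follows:
$$T = T_1 + T_2 + \cdots + T_n$$
$$E(T) = \sum_{i} E(T_i), \qquad V(T) = \sum_{i} V(T_i) \quad \text{(for independent } T_i\text{)}$$
• If each $T_i$ is exponential with rate λ, T is gamma(λ, n):
$$E(T) = \frac{n}{\lambda}, \qquad V(T) = \frac{n}{\lambda^2}$$
• n = 2: $R(t) = P(\text{system life} > t \mid \text{one standby}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t}$
• n = 3: $R(t) = P(\text{system life} > t \mid \text{two standbys}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t} + \dfrac{(\lambda t)^2}{2!}e^{-\lambda t}$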
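For cold standby with a perfect switch, the system survives past t as long as fewer than n failures have occurred, which gives the Poisson-sum form above. This sketch evaluates it and compares it with active parallel for the same units; λ and t are hypothetical values with λt = 1.

```python
import math


def standby_reliability(lam, t, n):
    """Cold standby, perfect switch, n identical exponential units (1 active + n-1 spares).

    R(t) = sum_{k=0}^{n-1} (lam*t)^k * e^{-lam*t} / k!
    """
    x = lam * t
    return sum(x**k * math.exp(-x) / math.factorial(k) for k in range(n))


def active_parallel_reliability(lam, t, n):
    """All n identical exponential units energized at once."""
    return 1 - (1 - math.exp(-lam * t)) ** n


lam, t = 0.001, 1000.0   # hypothetical: lam * t = 1
for n in (2, 3):
    print(n, standby_reliability(lam, t, n), active_parallel_reliability(lam, t, n))
```

The comparison illustrates the earlier claim that standby beats simple parallel when the switch is perfect; an imperfect switch would need its own reliability term.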
Benefits of Computerized Reliability Models
• Helps keep track of reliability relationships
– Across levels of design
– Within a given level
• Rapid Sensitivity Analysis
– Is the overall R goal even feasible?
– Study effect of different R allocations
– Study effects of configuration changes on R
– Study effects of substituting different components
– Perform “worst-case” analysis
• Can be adapted to multiple missions: in essence, one model for each set of mission equipment/conditions
• Can be used to evaluate proposed modifications to existing system
Analytical Methods to Support Reliability Estimation and
Assist in Design Decisions
• Stress-Strength Analysis
• Critical-Useful-Life Analysis
• For complex systems (radar, missiles, computers)
– Failure Mode and Effect Analysis
– Worst-Case Analysis
– Sneak-Circuit Analysis
• Safety Analysis Techniques
– Fault-Tree Analysis
– Task and Error Analysis
– Hazard Analysis
Discussion of Stress-Strength Analysis
• Measures resistance to stress (strength)
• Examples: operating wattage versus rated wattage; operating temperature versus rated temperature; pounds/square inch
• Includes:
– Stress distribution, especially maximum stress
– Stress causes, timing, frequency
– Stress testing, such as metal fatigue tests
Discussion of Critical-Useful-Life Analysis
• Critical-Useful-Life Analysis:
– Identification of a critical-item list and of the requirements of each of these items for preventive maintenance, corrective maintenance, and replacement. Includes studies of how to eliminate critical items through redesign.
Discussion of FMEA
• Failure Mode and Effect Analysis:
– Identification of all possible failure modes of
equipment, the possible causes and the possible
immediate/ ultimate effects on the system and
operation
• Formal documentation in words not diagrams
• Estimation of probability of occurrence
• Classify each failure by criticality
• Describe corrective action alternatives
Discussion of Worst-Case Analysis
• Worst Case Analysis:
– Examining how the performance of an electrical circuit
(or other device) will change over time as a result of
drift in part characteristics. Provides guidance on how
to allow for part parameter variation in design
Discussion of Sneak-Circuit Analysis
• Sneak-Circuit Analysis:
– Use of math models to identify any unanticipated
performance signal paths in a circuit that may degrade
performance or introduce failure.
Reliability Prediction at Part, Circuit, and Subsystem Level
• Based On:
– Similar equipment--Extrapolate. Not very accurate.
– Number and complexity of “active element groups”--these are controllers or
converters of energy
– part types, counts, failure rates are combined into an estimate of system
reliability
– Prediction based on testing, such as stress tests
• Used For:
– Higher-level reliability prediction
– As input to maintenance and logistic support analysis
– Comparison with requirements: where are we over or under the required reliability?
Reliability Degradation Studies/Action
• Determine and correct potential/ actual adverse effects due
to:
– Storage, packing, transportation, handling
– Unpacking, assembly, set-up
– Preventive and corrective maintenance
• Carelessness
• Wrong tools and equipment
• Didn’t follow/ know proper procedure
Reliability Test and Evaluation
• To answer the question: “Will the mature system achieve its MTBF requirement in operation?”
• Should be part of an integrated test plan to test the entire spec.
• Type I tests:
– Are early enough in the design process that design changes are fairly cheap
• Type II and III tests must:
– Follow approved procedures (first drafts of tech manuals and training courses)
– Use test and support equipment that was specified in the maintenance
concept and detailed in LSA
– Be provided with ( test ) supply support
– Be carefully planned, instrumented, documented, analyzed
Type II Reliability Testing
• Evaluation of prototype and early production models, using
producer personnel
• Includes:
– Reliability qualification tests, to determine
• MTBF
• MTBM
• Failure sequences, detection, performance degradation
• Maintenance procedure adequacy
• Maintenance induced failures
– Production sampling acceptance tests
Types of Type 2 Tests
• Sequential Qualification Tests
– Environmental test chambers
– Environmental test cycle, equipment duty cycle
– Multiple identical test items
– Statistics-based accept-reject test plan
• Producer’s risk α
• Consumer’s risk β
• Both usually range from .05 to .25 (negotiated)
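One common statistics-based accept-reject plan is Wald’s sequential probability ratio test for an exponential MTBF; the slides do not name a specific plan, so treat this as an illustrative assumption. H0 is an acceptable MTBF θ0 and H1 an unacceptable MTBF θ1 < θ0; the risks α and β set the decision boundaries. The function name and the numbers in the usage loop are hypothetical.

```python
import math


def sprt_decision(total_time, failures, theta0, theta1, alpha, beta):
    """Wald sequential test for an exponential MTBF (illustrative sketch).

    H0: MTBF = theta0 (acceptable); H1: MTBF = theta1 < theta0 (unacceptable).
    alpha = producer's risk (rejecting good equipment),
    beta = consumer's risk (accepting bad equipment).
    """
    # log of L(theta1) / L(theta0) after `failures` failures in `total_time` unit-hours
    llr = failures * math.log(theta0 / theta1) - total_time * (1 / theta1 - 1 / theta0)
    if llr >= math.log((1 - beta) / alpha):
        return "reject"
    if llr <= math.log(beta / (1 - alpha)):
        return "accept"
    return "continue testing"


for hours, r in [(2500.0, 0), (1000.0, 5), (1500.0, 2)]:
    print(hours, r, sprt_decision(hours, r, 1000.0, 500.0, 0.1, 0.1))
```

The sequential form lets a test stop early when the evidence is clear in either direction, which is why it is attractive when test-chamber time is expensive.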
Types of Type 2 Tests (cont)
• Reliability acceptance testing: plot MTBF versus time; look for growth or decline
• Reliability life testing: to determine the failure distribution
– Continuous (steady)
• Fixed time, count failures
• Fixed number of failures, count time
– Step-stress (accelerated) testing
• Step up stress until all units fail
• Aids in planning burn-in
Type 3 Testing
• Definition: operational testing using:
– A group of production units
– Designated field test site
– Representative mix of mission profiles
– User personnel (trained first)
– First sets of support equipment; spares
• Uniqueness
– All elements of the system are operational and evaluated together
– The true R, M, A, and other performance measures are known for the first time, rather than estimated via models plus some Type 1 and 2 test data