slides - 2015 Accelerated Stress Testing & Reliability

Accelerated Stress Testing and Reliability Workshop
October 9-11, 2013
San Diego, CA
Accelerating Reliability into the 21st Century
Keynote Presenter Day 1: Vice Admiral Walter Massenburg
Keynote Presenter Day 2: Alain Bensoussan, Thales Avionics
CALL FOR PRESENTATIONS: We are now Accepting Abstracts.
Email to: don.gerstle@gmail.com.
Guidelines on website www.ieee-astr.org
For more details, click here to join our LinkedIn Group:
IEEE/CPMT Workshop on Accelerated Stress Testing and Reliability
This is the 3rd of a series of four webinars being
put on by Ops A La Carte, ASTR, and ASQ
Reliability Division
Each webinar will also be presented as a full 2 hour
tutorial at our ASTR Workshop Oct 9-11th, San Diego.
Abstracts for presentations are due Apr 30.
www.ieee-astr.org
Agenda
 Introduction: 5 min
 Accelerated Reliability Growth Testing: 45 min
 Questions: 10 min
Upcoming Reliability Webinars
Title: 40 Years of HALT: What Have We Learned
Author: Mike Silverman
Date: Sept 12, 2013, 12pm EST
http://reliabilitycalendar.org/webinars/english/40-years-of-halt-what-have-we-learned/
Location: Webinar
HALT began 40 years ago with a simple idea of testing beyond
specifications in order to better understand design margins. Over the
past 40 years, thousands of engineers around the world have been
exposed to the concepts of HALT and have tried the techniques.
This tutorial will explore what we have learned in the past 40 Years and
what the future of HALT could be.
Registration Demographics
For this webinar we have signed up
–250 Registrants
–17 Countries
–28 US States
Registration Question #1
Have you ever performed a Reliability Growth Test?
–Never: 45%
–All the time: 25%
–Tried Once: 20%
Registration Question #2
For your last RGT, did you have a chance to plan the duration and stresses?
–Neither: 50%
–Both: 25%
–Duration Only: 10%
–Stresses only: 10%
Traditional and Accelerated Reliability Growth
The Case of Lost (and Found) Failure Rates
Milena Krasich, P. E.
Raytheon, IDS
Copyright © 2012 Raytheon Company. All rights reserved.
Customer Success Is Our Mission is a registered trademark of Raytheon Company.
Tutorial Objectives
 Identify shortcomings of traditional reliability growth testing and
offer alternatives
 Reliability Growth Test objectives
 Explain traditional Reliability Growth test methodology along with the
assumptions
 Show shortfalls of the traditional methods
• Entire item failure rate not calculated and presented in results
• Test duration too long for the modern high reliability items
• Little or no relationship between reliability and the stresses on the tested item
 Show principles of the Physics of Failure (PoF) test methodology
 Show how the Reliability Growth test based on PoF is constructed
 Show how the expected stresses are applied and accelerated
 Show how to account for total final failure rates
 Show the considerable test cost reduction achieved
Traditional RG Test Methodology
 Overall test duration determined based on the initial and goal
reliability measures: failure rate or Mean Time Between Failures,
MTBF (or MTTF)
 Initial failure rate estimated for the entire item and then used for
calculations of reliability growth
 Reliability growth parameters and test duration determined based on
the goal reliability – mathematically
 Magnitude (stress level) of applied operational and environmental
stresses equal to those in use – but not their duration
 Applied stress duration determined by engineering judgment, and level
by assumptions of some “mean” stress
 Overall test duration and stress application are unrelated to use profiles
or required life or mission of the product – only to mathematics
 Additional errors:
 Mathematical
Principles and Assumptions
 Goal:
 Increase the current (existing) reliability – measured as mean time
between failures
 Goal magnitude guided by:
• Requirement or commercial logic
 Item as designed contains design errors:
 Those are going to appear in test reasonably within the determined
test time
 The errors found in test are going to be eliminated by design corrections
(type B failure modes)
 The test continuation will evaluate success of the fix.
 Design errors that cannot be fixed (type A failure modes) will
continuously be counted
 Failures determined to be random will not be counted
 Reliability growth will be measured.
Principles and Assumptions, cont.
 Failure rate during the test is constant when there are no changes
of the tested item
 Failure rate decreases with introduced design corrections in
steps, and remains constant through the next change
 The step curve is fitted with a curve representing a Non-Homogeneous Poisson Process (NHPP)
 The process definition: failure rate is constant until changes occur.
 The facts not considered in application of that theory:
 The initial failure rate is just the total failure rate. No rationale is given for how
much of it is attributed to:
• Design problems that can be corrected
• Random events (those failure modes one does not know where they
come from, they “just happen”)
• Design problems that cannot be corrected for one of the reasons:
– Technically impossible
– Economically not justifiable
– Time to market constraints
Mathematical Model - Refresher
 The expected accumulated number of failures up to test time T is
given by:
$$E[N(T)] = \lambda T^{\beta}, \quad \text{with } \lambda > 0,\ \beta > 0,\ T > 0$$
 where
 $\lambda$ is the scale parameter;
 $\beta$ is the shape parameter (a function of the general effectiveness of the
improvements): $0 < \beta < 1$ corresponds to reliability growth; $\beta = 1$
corresponds to no reliability growth; $\beta > 1$ corresponds to negative
reliability growth (reliability degradation)
 The failure intensity, when it is changing as a result of design
improvements, after T hours of testing is given by:
$$z(t) = \frac{d}{dt} E[N(t)] = \lambda \beta t^{\beta - 1}, \quad \text{with } t > 0$$
$$\theta(T) = \frac{1}{z(T)}$$
$$z_{Item}(t) = z_B(t) + z_A(t) + z_r(t)$$
$$z_{Item}(t) = \lambda \beta t^{\beta - 1} + z_A(t) + z_r(t)$$
$$z_A(t) = \lambda_A = \text{const.}, \qquad z_r(t) = \lambda_r = \text{const.}$$
$$z_{Item}(t) = \lambda \beta t^{\beta - 1} + \lambda_A + \lambda_r$$
Mathematics of Traditional Reliability Growth
 Failure modes types in test:
 Systematic: corrected in test (Type B) or not corrected (Type A); Random: constant
$$z_{Item}(t) = z_B(t) + z_A(t) + z_r(t)$$
$$z_{Item}(t) = \lambda \beta t^{\beta - 1} + z_A(t) + z_r(t)$$
Only type B failure modes failure rates are accounted for in a
reliability test program – those that show growth expressed by the
power law model; the type A and random remain constant.
[Figure: failure intensity/failure rate (failures/hour, 0 to 0.06) versus test duration (hours, 0 to 6000), showing the total $z_{\Sigma}(t) = z_A(t) + z_r(t) + z_B(t)$, the constant $z_r(t)$ and $z_A(t)$, and $z_B(t)$, the only failure modes with decreasing failure rates (power law)]
Planning Reliability Growth
 To plan reliability growth, the initial value of the failure rate, $\lambda_I$, or the
initial mean time between failures, $\theta_I$, was assumed as known at
some time $t_I$. This initial failure rate would have a value that was
known by experience for that item or by similarity with another like
item, $z_I(t_I) = \text{constant}$
 The thought process was then that this initial failure rate would
decrease under the rules of the power law and at the end of the
test with the corrections would assume a final value (a constant
again), $z_F(t_F)$.
 The Crow/AMSAA/Duane planning model is simple and easy to
implement:
$$z(t) = z_I(t_I) \left( \frac{t}{t_I} \right)^{\beta - 1}$$
 But the initial failure rate has three components, only one of
which can be improved and fitted with the power law: the failure
rate of the B failure modes. The remaining components are
constant.
Planning Reliability Growth, cont.
 The remaining two components are constant. The final failure rate
as a function of time also contains three components, two
constant and only one that can be fitted with the power law:
$$z_I(t_I) = z_{BI}(t_I) + z_A(t_I) + z_r(t_I)$$
$$z_I(t_I) = z_{BI}(t_I) + z_A + z_r$$
 The final B-modes failure rate is then made of the improved B-type
failure modes failure rate, and the total final item or system
failure rate also contains two additional constant components:
$$z_{BF}(t_F) = z_{BI}(t_I) \left( \frac{t_F}{t_I} \right)^{\beta - 1}$$
$$z_F(t_F) = \lambda \beta t_F^{\beta - 1} + z_A(t_F) + z_r(t_F)$$
$$z_F(t_F) = z_{BF}(t_F) + z_A + z_r$$
$$z_F(t_F) = z_{BI}(t_I) \left( \frac{t_F}{t_I} \right)^{\beta - 1} + z_A + z_r$$
A Failure Modes
 The random failure rates are not recorded or taken into account;
the A-type failures are considered in the number of failures
 It is said that they are included in the shape parameter calculations, but
there is no example in current handbooks that shows how this is done
 It is also stated that the Type A failure modes are counted every time they
show up, repetitions included; no example of that statement could be found
 Given that there is no improvement applied, type A failure modes
should be treated in the same manner as the random failure
rates. They could be separately accounted for, but numerically,
their failure rate will be added to the random failure rate.
 This means that during the test, the A type failure modes should
be counted as another group of constant failure rates
 In which case the methodology of the fixed duration testing should
be applied to determine failure rates for both:
• The A – type failure modes
• All other random failure modes where the origin is not identifiable.
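As a rough illustration of treating type A and random failure modes as constant failure rates, the Python sketch below uses the standard point estimate (failures divided by accumulated test hours) and, for the case of no observed failures, the one-sided upper bound that follows from the exponential model. The function names, the failure counts and the test hours are hypothetical; the tutorial does not prescribe a specific estimator.

```python
import math


def constant_failure_rate(n_failures, total_hours):
    """Point estimate of a constant failure rate: failures per accumulated hour."""
    if total_hours <= 0 or n_failures < 0:
        raise ValueError("total_hours must be positive and n_failures non-negative")
    return n_failures / total_hours


def zero_failure_upper_bound(total_hours, confidence):
    """One-sided upper bound on a constant failure rate when no failure occurred.

    Solves exp(-rate * total_hours) = 1 - confidence for the rate.
    """
    if total_hours <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("total_hours must be positive and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / total_hours


# Hypothetical test record: 3,000 accumulated hours, 2 type A occurrences
# (repetitions counted) and 1 failure of unidentified origin.
HOURS = 3000.0
rate_a = constant_failure_rate(2, HOURS)
rate_random = constant_failure_rate(1, HOURS)
print(f"type A rate:   {rate_a:.2e} failures/h")
print(f"random rate:   {rate_random:.2e} failures/h")
print(f"constant part: {rate_a + rate_random:.2e} failures/h")
print(f"90% bound with zero failures: {zero_failure_upper_bound(HOURS, 0.90):.2e} failures/h")
```

The zero-failure bound shows why short tests say little about small constant failure rates: the bound shrinks only in proportion to the accumulated test time.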
Present Method to Determine Test Duration
 Test duration is mathematically determined from the reciprocal of
the “failure rate” as:
$$\theta_F(t_F) = \theta_I(t_I) \left( \frac{t_F}{t_I} \right)^{1 - \beta}$$
$$t_F = \exp\left[ \frac{\ln \theta_F(t_F) - \ln \theta_I(t_I)}{1 - \beta} + \ln t_I \right]$$
Where:
$\theta_F$ = final product MTBF (for mitigated, “fixed” failure modes only) – given goal
$\theta_I$ = initial product MTBF (for failure modes that will be mitigated) – assumed
$t_F$ = test duration needed to achieve the final MTBF for fixed failure modes
$t_I$ = initial test time (has various explanations) – assumed – what is it?
 Example – old school:
 $\theta_I$ = 4,000 hours,
 $\theta_F$ = 10,000 hours, $\beta$ = 0.6
[Figure: test duration $t_F(t_I)$ (hours, logarithmic scale) versus initial test time $t_I$ (hours, 0 to 800)]
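The dependence of the planned test duration on the assumed initial test time can be checked with a short Python sketch. It evaluates the closed form $t_F = t_I (\theta_F / \theta_I)^{1/(1-\beta)}$, which is the exponential expression on this slide rearranged; the function name is an assumption, and the MTBF and shape values are the "old school" example values.

```python
def rg_test_duration(theta_initial, theta_final, beta, t_initial):
    """Planned test time t_F from the power law planning model.

    Rearranged from theta_F = theta_I * (t_F / t_I) ** (1 - beta).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("reliability growth requires 0 < beta < 1")
    if min(theta_initial, theta_final, t_initial) <= 0:
        raise ValueError("MTBF values and initial time must be positive")
    return t_initial * (theta_final / theta_initial) ** (1.0 / (1.0 - beta))


# Example values from the slide: 4,000 h initial MTBF, 10,000 h goal, beta = 0.6.
for t_i in (100.0, 200.0, 400.0, 800.0):
    t_f = rg_test_duration(4000.0, 10000.0, 0.6, t_i)
    print(f"t_I = {t_i:5.0f} h  ->  t_F = {t_f:8.0f} h")
```

The planned duration scales linearly with the assumed initial time, so an arbitrary choice of $t_I$ translates directly into an arbitrary test length.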
Initial MTBF – What is It?
 In the traditional test design, the initial test MTBF is the MTBF
assumed for the product, but:
 The reciprocal of this initial MTBF is the initial failure rate, made up of
three components, two of which are constant and do not follow the Power Law:
• Design – correctable
• Design – non-correctable
• Random failure rates or failure modes
 Only the design failure modes that can be corrected (B type)
can be fitted by the Power Law (Weibull Intensity Function), thus:
$$\theta_{BI}(t_I) = \frac{1}{z_{BI}(t_I)} = \theta_{BI}$$
• What part of the entire item's initially assumed (estimated) failure rate could
those correctable failure modes be?
• Analytical prediction contains only the random failure rates
– If the Design Engineering is reasonably competent, Type A or B failure modes
could be at most 40% of the assumed initial failure rate
– The B failure rate could be only a small fraction of the estimated product failure rate
before the test.
Parameters and Results
 Recorded in test are cumulative times of occurrence of A and B
failure modes.
$$z_B(t) = \frac{d}{dt} E[N_B(t)] = \lambda \beta t^{\beta - 1}, \quad \text{with } t > 0$$
$$\theta_B(T) = \frac{1}{z_B(T)}$$
 A modes are not addressed; they should not be a part of the power
law. Handbook text suggests they are counted; if they were, it would
be in error
 From test data, shape and scale parameters are determined:
$$\hat{\beta} = \frac{N_B}{N_B \ln(T) - \sum_{i=1}^{N_B} \ln(t_i)}; \quad \text{Unbiased: } \bar{\beta} = \frac{N_B - 1}{N_B \ln(T) - \sum_{i=1}^{N_B} \ln(t_i)}$$
$$\hat{\lambda} = \frac{N_B}{T^{\hat{\beta}}}$$
 The reported failure rate and MTBF are:
$$z_B(T) = \lambda \beta T^{\beta - 1}$$
$$\theta_B(T) = \frac{1}{\lambda \beta T^{\beta - 1}}$$
 Random and A modes do not seem to be a part of the achieved
growth. They are, unfortunately, forgotten.
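The parameter estimates on this slide can be sketched in Python as follows, assuming a time-terminated test of length T and the usual power law maximum likelihood form. The function name and the failure times are hypothetical; only type B failure times are passed in, in line with the slide's point that A and random modes do not belong in the power law fit.

```python
import math


def power_law_estimates(b_failure_times, total_time):
    """Shape and scale estimates for the power law from type B failure times.

    Returns (beta_hat, beta_unbiased, lambda_hat) for a test ended at total_time.
    """
    n = len(b_failure_times)
    if n < 2:
        raise ValueError("at least two failure times are needed")
    if any(t <= 0 or t > total_time for t in b_failure_times):
        raise ValueError("failure times must lie in (0, total_time]")
    denom = n * math.log(total_time) - sum(math.log(t) for t in b_failure_times)
    if denom <= 0:
        raise ValueError("failure times carry no information about the shape")
    beta_hat = n / denom
    beta_unbiased = (n - 1) / denom
    lambda_hat = n / total_time ** beta_hat
    return beta_hat, beta_unbiased, lambda_hat


# Hypothetical cumulative times (hours) at which type B failures occurred.
times = [35.0, 110.0, 260.0, 500.0, 900.0]
T = 1000.0
beta_hat, beta_unb, lam_hat = power_law_estimates(times, T)
z_b = lam_hat * beta_hat * T ** (beta_hat - 1.0)  # reported B-mode intensity at T
print(f"beta_hat = {beta_hat:.3f}, unbiased = {beta_unb:.3f}, lambda_hat = {lam_hat:.4f}")
print(f"z_B(T) = {z_b:.5f} failures/h, MTBF_B(T) = {1.0 / z_b:.0f} h")
```

The reported MTBF covers the B modes only; the constant A and random rates would still have to be added to obtain the item failure rate.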
Comparison
 If initial test time was assumed to be 200 hours
 Traditional test (all failure rates – power law):
 Initial failure rate: $\lambda_I = 2.5 \times 10^{-4}$ f/hr
 Initial MTBF: $\theta_I$ = 4,000 hours
 Final MTBF: $\theta_F$ = 10,000 hours
 Final test time: 1,976 hours (from the initial time)
 True status, only B-type failure modes improved (e.g. maximum 40% of the
old “initial” failure rate, $\lambda_I = 2.5 \times 10^{-4}$ f/hr):
 Initial failure rate for B modes: $\lambda_{IB} = 0.4 \times 2.5 \times 10^{-4}$ f/hr $= 1 \times 10^{-4}$ f/hr
 Initial MTBF: $\theta_{IB}$ = 10,000 hours
 Possible final MTBF for B modes: $\theta_{FB}$ = 30,000 hours
 Overall final failure rate, B modes + random and A modes: $1.833 \times 10^{-4}$ f/hr
 Final overall MTBF: $\theta_F$ = 5,455 hours
 Final test time: 3,118 hours (from the initial time)
 The forgotten, unreported failure rate: $1.5 \times 10^{-4}$ f/hr
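The comparison can be reproduced with a few lines of Python. This is a sketch under the slide's own assumptions (initial test time 200 hours, shape parameter 0.6, 40% of the initial failure rate attributable to B modes, B-mode MTBF improved from 10,000 to 30,000 hours); the variable and function names are illustrative.

```python
def rg_test_duration(theta_initial, theta_final, beta, t_initial):
    """Planned test time from theta_F = theta_I * (t_F / t_I) ** (1 - beta)."""
    return t_initial * (theta_final / theta_initial) ** (1.0 / (1.0 - beta))


BETA = 0.6
T_INITIAL = 200.0      # assumed initial test time, hours
LAMBDA_I = 2.5e-4      # assumed initial item failure rate, failures/hour
B_FRACTION = 0.4       # share of the initial rate attributed to B modes

# Traditional view: the whole failure rate follows the power law.
t_traditional = rg_test_duration(1.0 / LAMBDA_I, 10000.0, BETA, T_INITIAL)

# Corrected view: only the B modes improve, the rest stays constant.
lambda_b_initial = B_FRACTION * LAMBDA_I
lambda_constant = LAMBDA_I - lambda_b_initial   # A modes plus random failures
theta_b_final = 30000.0
t_b_only = rg_test_duration(1.0 / lambda_b_initial, theta_b_final, BETA, T_INITIAL)
lambda_final = 1.0 / theta_b_final + lambda_constant

print(f"traditional test time:   {t_traditional:.0f} h")
print(f"B-modes-only test time:  {t_b_only:.0f} h")
print(f"constant (unreported):   {lambda_constant:.2e} failures/h")
print(f"overall final rate:      {lambda_final:.4e} failures/h")
print(f"overall final MTBF:      {1.0 / lambda_final:.0f} h")
```

The overall final MTBF stays far below the 30,000 hours achieved for the B modes because the constant part dominates once the B modes have been improved.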
The Solution – Way Forward
 The possible correct solution:
 Prepare a reliability growth test for only B failure modes
 Count A type failure modes as if they are random
 Count random failures
 Calculate final B failure modes failure rate and MTBF
 Add the constant A and random failure rates to get results
 Possible problems - difficulties:
 The calculated mathematical test duration is unrelated to use stresses or use
profile
 The traditionally determined test duration is too short to account for the random
failures; normally the required test duration for a reasonable confidence is
about 10 MTBFs (in our example, about 70,000 hours)
• The traditional RG test duration does not support this test time
 A short reliability growth test does not disclose any cumulative damage or
failures of small failure rates that would start showing only after the test is
complete, while useful life of the item could be 10 or 20 years
 The proposed viable solution – accelerated Reliability Growth test.
Physics of Failure and Reliability
 Failures occur when an item is not strong enough to withstand one or
more attributes of a stress:
 Level, duration, or repetitions of its application
• The higher the level, the shorter the duration or the fewer the repetitions needed to induce a failure
 The area of overlap of strength and stress distributions
represents probability of failure for each of the stresses:
• $\mu_L$, $\sigma_L$ = mean and standard deviation of the load
distribution, $\sigma_L = b \times \mu_L$
• $\mu_S$, $\sigma_S$ = mean and standard deviation of the strength
distribution, $\sigma_S = a \times \mu_S$
• If the mean of strength is a k times multiple of the mean of stress (load) and the
standard deviations of each are a and b times their respective mean values,
reliability of an item regarding each use stress (i), and the total reliability will be:
$$R_i(k, \mu_{L\_i}) = \Phi\left[ \frac{k \mu_{L\_i} - \mu_{L\_i}}{\sqrt{(a k \mu_{L\_i})^2 + (b \mu_{L\_i})^2}} \right]$$
$$R_{Item}(t_0) = \prod_{i=1}^{S} R_{Stress\_i}(t_i)$$
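A short Python sketch of the stress-strength relation on this slide, assuming normally distributed load and strength. Because the mean load cancels from numerator and denominator, the reliability depends only on k, a and b. The function names are assumptions, and the standard normal distribution function is built from `math.erf` to avoid external dependencies.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stress_reliability(k, a, b):
    """Reliability regarding one stress for margin k.

    a and b are the strength and load standard deviations expressed as
    fractions of their respective means; the mean load cancels out.
    """
    if k <= 0 or a < 0 or b < 0 or (a == 0 and b == 0):
        raise ValueError("need k > 0 and at least one non-zero spread")
    return std_normal_cdf((k - 1.0) / math.sqrt((a * k) ** 2 + b ** 2))


def item_reliability(per_stress_reliabilities):
    """Total reliability as the product of the per-stress reliabilities."""
    total = 1.0
    for r in per_stress_reliabilities:
        total *= r
    return total


# Margin sweep for the values the tutorial calls most common: a = 0.05, b = 0.2.
for k in (1.0, 1.1, 1.2, 1.3, 1.4, 1.5):
    print(f"k = {k:.1f}  R = {stress_reliability(k, 0.05, 0.2):.4f}")
```

A margin of k = 1 means the mean strength equals the mean load, which gives a reliability of 0.5 regardless of a and b.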
Physics of Failure Reliability – Margin k Selection
 Allocate reliability regarding each of the expected stresses in use
 The cumulative damage and ultimately failure due to a stress is proportional
to the stress level and its duration. For the stress applied at the same level
as in life, the cumulative damage model is:
$$D(t) = \int_t S(t)\, dt$$
[Figure: reliability (0.50 to 1.00) versus multiplier k (1.00 to 1.50) for six parameter pairs: b = 0.5, a = 0.05; b = 0.2, a = 0.05; b = 0.05, a = 0.05; b = 0.2, a = 0.02; b = 0.1, a = 0.02; b = 0.05, a = 0.02]
For the allocated reliability regarding each stress, select
the value of margin k which would multiply its duration in use
to be applied in test;
Apply stresses simultaneously whenever possible;
If the same stress type is applied at different levels in use,
recalculate their durations to the highest level (using acceleration
factors);
The most common values for a and b are: a = 0.05, b = 0.2
Test Acceleration
 Each of the stresses is accelerated in test to allow for shorter test
duration
 Total item failure rate is the sum of its failure rates regarding each
individual stress ($\lambda_0$ is the item total failure rate in use condition and $\lambda_A$ is
the accelerated item total failure rate; in reliability growth $\lambda$ is equivalent
to $z$):
$$\lambda_A = A_{Test} \cdot \lambda_0 = \sum_{i=1}^{N_S} \left[ \prod_{j} A_j \right]_i \lambda_i$$
 The product over j applies when the stresses 1 to j produce the same failure mode.
 Stress acceleration models for different stresses – example:
 inverse power law model (usually applicable to thermal cycling, vibration,
shock, humidity);
 Arrhenius model (used for temperature acceleration using absolute
temperature);
 Eyring model (used also when the thermal stress is a factor in process
acceleration);
 step stress model, where the stress is increasing in steps;
 fatigue model representing the degradation due to the repetitious stress.
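Two of the listed models can be sketched in Python in their commonly used textbook forms: the Arrhenius factor for temperature and the inverse power law for stresses such as vibration. The function names, the rounded Boltzmann constant and the use of 273.15 for the kelvin offset are assumptions of this sketch; the input values are taken from the test example that follows in the tutorial.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K (rounded; an assumption of this sketch)


def arrhenius_factor(ea_ev, temp_use_c, temp_test_c):
    """Arrhenius acceleration factor between use and test temperatures (deg C)."""
    t_use = temp_use_c + 273.15
    t_test = temp_test_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use - 1.0 / t_test))


def inverse_power_factor(stress_use, stress_test, exponent):
    """Inverse power law acceleration factor (stress_test / stress_use) ** exponent."""
    if stress_use <= 0 or stress_test <= 0:
        raise ValueError("stress levels must be positive")
    return (stress_test / stress_use) ** exponent


# Example inputs: 1.2 eV activation energy between 35 and 65 deg C, and
# random vibration raised from 1.7 g rms to 3.2 g rms with exponent 4.
print(f"Arrhenius factor:         {arrhenius_factor(1.2, 35.0, 65.0):.1f}")
print(f"inverse power law factor: {inverse_power_factor(1.7, 3.2, 4.0):.2f}")
```

Dividing a use duration by the acceleration factor gives the equivalent test duration at the higher stress level.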
Test Example B Failure Modes – duration k×life
Parameter               Symbol       Value
Required life           t_0          10 years = 87,600 h
Required reliability    R_0(t_0)     0.8
Time ON                 t_ON         2 h/day = 7,300 h
Temperature ON          T_ON         65 °C
Time OFF                t_OFF        22 h/day = 80,300 h
Temperature OFF         T_OFF        35 °C
Thermal cycling         ΔT_Use       45 °C, two times per day
Total cycles            N_Use        7,300
Temperature ramp rate   ζ_Use        1.5 °C/min
Vibrations, random      W_Use        16.68 m/s² r.m.s.
Relative humidity       RH_Use       50 %
Activation energy       E_a          1.2 eV
Determination of factor k – for major stresses:
$$R_i(t_0) = \left[ R_0(t_0) \right]^{1/4} = 0.946$$
k = 1.5
[Figure: reliability (0.5 to 1.0) versus multiplier k (1.00 to 1.50) for a = 0.1, b = 0.1]
Stresses:
Thermal cycling
Thermal exposure (thermal dwell)
Humidity
Vibration
Operational cycling
Thermal cycling:
$$A_{TC} = \left( \frac{\Delta T_{Test}}{\Delta T_{Use}} \right)^{m}, \qquad A_{Ramp\_Rate} = \left( \frac{\zeta_{Test}}{\zeta_{Use}} \right)^{1/3}$$
$$N_{TC\_Test} = \frac{N_{TC\_Use} \cdot k}{A_{TC} \cdot A_{Ramp\_Rate}}$$
Thermal dwell (normalize exposure when OFF to duration at ON temperature):
$$t_{ON\_N} = t_{ON} + t_{OFF} \cdot \exp\left[ -\frac{E_a}{k_B} \left( \frac{1}{T_{OFF} + 273} - \frac{1}{T_{ON} + 273} \right) \right] = 8{,}754\ \text{hours}$$
Duration of accelerated exposure:
$$t_{T\_Test} = t_{ON\_N} \cdot k \cdot \exp\left[ -\frac{E_a}{k_B} \left( \frac{1}{T_{ON} + 273} - \frac{1}{T_{Test} + 273} \right) \right] = 168.1\ \text{h}$$
One thermal cycle in test = 24 hours in life
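The thermal dwell normalization can be checked numerically with the table values. The Python sketch below converts the OFF time at 35 °C into an equivalent duration at the 65 °C ON temperature and adds it to the ON time. The function name and the rounded Boltzmann constant are assumptions; the slide reports 8,754 hours, and the sketch lands within a few hours of that, the small difference coming from the rounding of the constant.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K (rounded; an assumption of this sketch)


def normalized_on_time(t_on_h, t_off_h, temp_on_c, temp_off_c, ea_ev):
    """Total thermal exposure expressed as hours at the ON temperature."""
    inv_diff = 1.0 / (temp_off_c + 273.0) - 1.0 / (temp_on_c + 273.0)
    off_equivalent = t_off_h * math.exp(-(ea_ev / K_B) * inv_diff)
    return t_on_h + off_equivalent


# Values from the example table: 7,300 h ON at 65 deg C, 80,300 h OFF at 35 deg C.
t_on_n = normalized_on_time(7300.0, 80300.0, 65.0, 35.0, 1.2)
print(f"normalized exposure at ON temperature: {t_on_n:.0f} h")
```

Because of the high activation energy, the 80,300 OFF hours contribute only on the order of 1,450 equivalent hours at the ON temperature.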
Test Example, Cont.
 The thermal exposure is combined with the thermal cycling, distributed over the high
temperature:
$$t_{TC} = 2 \times (\text{ramp time}) + (\text{temp. stabilization} + \text{thermal dwell}) + \text{dwell at cold}$$
 The test cycle profile:
$$t_{TC} = 2 \times \frac{125}{10} + 22.3 + 5 = 52.3\ \text{min} \approx 0.87\ \text{h}$$
 Humidity: Test 95% RH and temperature $T_{RH}$ = 85 °C (65 °C chamber + 20 °C internal
temperature rise)
$$t_{RH\_Test} = t_{ON\_N} \cdot k \cdot \left( \frac{RH_{Use}}{RH_{Test}} \right)^{h} \exp\left[ -\frac{E_a}{k_B} \left( \frac{1}{T_{ON} + 273} - \frac{1}{T_{RH} + 273} \right) \right]$$
$$h = 2.3, \qquad t_{RH\_Test} = 300\ \text{h}$$
 Vibration: 150,000 miles, 150 hours per axis vibration at 1.7 g rms. Test level: 3.2 g rms
 To project test time to life, use the acceleration factor to multiply test time
$$t_{Vib\_Test} = k \cdot t_{Vib\_Use} \cdot \left( \frac{W_{Use}}{W_{Test}} \right)^{w}, \quad \text{with } w = 4$$
$$t_{Vib\_Test} = 18\ \text{hours per axis}$$
Data for reliability plotting:
Failure   Time to failure (h)   Cumulative time to failure, n = 24 (h)   θ(t) (h)      log(t)   log[θ(t)]
1         3,821.33              91,711.92                                91,711.92     4.96     4.96
2         5,781.33              138,751.92                               69,375.96     5.14     4.84
3         14,016                336,384.00                               112,128       5.53     5.05
4         18,563.44             445,522.56                               111,380.64    5.65     5.05
t_0·k     131,400               3,153,600                                788,400       6.50     5.90
Initial B failure modes MTBF 100,000 hours, final $10^6$ hours
Initial test time: 100 hours
Total traditional test time: $4.6 \times 10^3$ hours
Final test reliability (B failure modes): 0.99997
Final MTBF (improved failure modes): 1,431,964 hours
Total accelerated test time: 526 hours
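The humidity and vibration test durations and the plotting table can be recomputed with the Python sketch below. It assumes the humidity relation combines a relative humidity power term with an Arrhenius temperature term, and that each table row multiplies the projected time by the 24 units on test and divides by the cumulative number of failures. The function names and the rounded Boltzmann constant are assumptions; all inputs come from the example.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K (rounded; an assumption of this sketch)


def humidity_test_hours(t_on_n, k, rh_use, rh_test, h, ea_ev, temp_on_c, temp_rh_c):
    """Humidity test duration: RH power term times Arrhenius temperature term."""
    rh_term = (rh_use / rh_test) ** h
    inv_diff = 1.0 / (temp_on_c + 273.0) - 1.0 / (temp_rh_c + 273.0)
    return t_on_n * k * rh_term * math.exp(-(ea_ev / K_B) * inv_diff)


def vibration_test_hours(t_use_h, k, w_use, w_test, w):
    """Vibration test duration per axis from the inverse power law."""
    return k * t_use_h * (w_use / w_test) ** w


def plotting_row(cumulative_failures, time_h, n_units):
    """Cumulative unit-hours, cumulative MTBF and their base-10 logarithms."""
    cumulative = n_units * time_h
    theta = cumulative / cumulative_failures
    return cumulative, theta, math.log10(cumulative), math.log10(theta)


t_rh = humidity_test_hours(8754.0, 1.5, 50.0, 95.0, 2.3, 1.2, 65.0, 85.0)
t_vib = vibration_test_hours(150.0, 1.5, 1.7, 3.2, 4.0)
print(f"humidity test: {t_rh:.0f} h, vibration test: {t_vib:.1f} h per axis")

failure_times = [3821.33, 5781.33, 14016.0, 18563.44]  # projected to life, hours
rows = [plotting_row(i, t, 24) for i, t in enumerate(failure_times, start=1)]
rows.append(plotting_row(4, 131400.0, 24))  # end of test at t_0 * k, still 4 failures
for cumulative, theta, log_t, log_theta in rows:
    print(f"{cumulative:12.2f}  {theta:12.2f}  {log_t:5.2f}  {log_theta:5.2f}")
```

The last row is not a failure: it is the end of the test at k times the required life, with the failure count held at four.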
Why Accelerated Reliability Growth?
 The test duration covers the product's entire life
 It allows detection of all design problems, not only those that appear in a
small fraction of product life
 It enables an estimate of the failure rate regarding product random events,
disregarded in traditional RG testing
 The failure rate achieved by design improvement, together with the random failure
rate, provides a realistic estimate of total product reliability
 Test duration is determined based on required total reliability in view of
product physical cumulative damage from life stresses in use
 Test acceleration allows achievement of a very reasonable test duration,
shorter than traditional mathematically derived testing
 The reliability improvement through test is no longer cost prohibitive
 Test failure times are projected to their appearance in real life and the
analysis uses this data
 Even though it covers the product's expected life (durability information), it
is still considerably shorter than the traditional reliability growth test.
Biography
Milena_krasich@raytheon.com
Milena Krasich is a Senior Principal Systems Engineer in Raytheon Integrated Defense
Systems, Whole Life Engineering, in the RAM Engineering Group, Sudbury, MA.
Prior to joining Raytheon, she was a Senior Technical Lead of Reliability Engineering in Design
Quality Engineering of Bose Corporation, Automotive Systems Division. Before joining Bose, she
was a Member of Technical Staff in the Reliability Engineering Group of General Dynamics
Advanced Technology Systems (formerly Lucent Technologies), after a five-year tenure at the
Jet Propulsion Laboratory in Pasadena, California. While in California, she was a part-time
professor at the California State University Dominguez Hills, where she taught graduate courses
in System Reliability, Advanced Reliability and Maintainability, and Statistical Process Control.
At that time, she was also a part-time professor at the California State Polytechnic University,
Pomona, teaching undergraduate courses in Engineering Statistics, Reliability, SPC,
Environmental Testing, and Production Systems Design. She holds a BS and MS in Electrical
Engineering from the University of Belgrade, Yugoslavia, and is a California registered
Professional Electrical Engineer. She is also a member of the IEEE and ASQC Reliability
Society, and a Fellow and the President Emeritus of the Institute of Environmental Sciences and
Technology. Currently, she is the Technical Advisor (Chair) to the US Technical Advisory Group
(TAG) to the International Electrotechnical Commission (IEC) Technical Committee TC56,
Dependability. As a part of the TC56 Working Groups she is working on dependability/reliability
standards as a project leader for revision of many released and current international standards
such as IEC/IEEE/ANSI Reliability Growth IEC 61014 and IEC 61164, Fault Tree Analysis IEC
/ANSI 61025, Testing for the constant failure rate and failure intensity (Reliability
compliance/demonstration tests) IEC/ANSI 61124, and FMEA IEC/ANSI 60812, and for
preparation of the new IEC standard on Accelerated Testing, IEC 62506.