Uploaded by psalmsgus99

Lecture V

advertisement
EEG560
Reliability & Maintainability of E&E
Components & Systems
Designing for Higher Reliability
Sunday ADETONA, PhD
sadetona@unilag.edu.ng
Outline
Introduction
Methods of Improving Reliability
◦ Part Selection
◦ Use of Improved Constructional Method and New Technology
◦ Use of Redundancy
◦ Standby Redundancy: Cold and Hot Standby Redundancies
◦ N Modular Redundancy: Dual, Triple, and Quadruple Redundancies
◦ 1:N Redundancy
◦ Use of Statistical Design Method
◦ Use of Derating Concept
Design Considerations and Packaging
Conclusion
Introduction
The main goal of designing for Higher Reliability is to:
 Reduce components failure rate
Methods of Improving Reliability
 Part Selection
 Use of Improved Constructional Method and New Technology
 Use of Redundancy
 Standby Redundancy: Cold and Hot Standby Redundancies
 N Modular Redundancy: Dual, Triple, and Quadruple Redundancies
 1:N Redundancy
 Use of Statistical Design Method
 Use of Derating Concept
Methods of Improving Reliability
Part Selection
 For the design of a reliable system, it is necessary to ensure
that the selected components:
 Are insensitive to the environment concerned.
 Do not create any harmful effect on other components.
 Can be protected individually or in group of a complete circuit from the
harmful effect of the environment.
 Do not change in normal value outside the allowable limit.
Use of Improved Constructional Method and New
Technology
Many methods and new technology have been developed
which are intended to reduce the quantity of electronics
components in any given equipment and protect them against
the effects of the environment.
 These measures go a long way in improving reliability. Some of the main
developments in construction have been the use of encapsulation and
printed circuits.
There have also been tremendous developments in
components themselves which make them less sensitive to
the environment.
 For example, the development of semiconductor diodes and transistors has
reduced the problem of mounting in sockets in order to reduce the effects of
mechanical shock and vibration.
 This is because they are small in size and dissipate little ‘harmful’ heat to
neighbouring components.
 Encapsulating of circuits and component provides a great deal of protection
against humidity, mechanical shock and vibration.
Use Of Redundancy
Redundancy
 Involves the use of additional units in a variety of configurations;
 Can be Standby Redundancy, N Modular Redundancy, and 1:N Redundancy
Standby Redundancy
 Also known as Backup Redundancy;
 Here, there is an identical secondary unit to back up the primary unit;
 The secondary unit typically does not monitor the system, but is there just as a spare.
 There is need for a third party to be a:
 Watchdog,
 which monitors the system to decide when a switchover condition is met and command the system to
switch control to the standby unit and
 Voter,
 which is the component that decides when to switch over and which unit is given control of the DUC.
The system cost increase for this type of redundancy is
 Usually about 2X or less depending on your software development costs.
In Standby redundancy there are two basic types,
 Cold Standby and Hot Standby.
 Cold Standby:
 Here, the secondary unit is powered off, thus preserving the reliability of the unit.
 Figure 5.1
 The drawback of this design is that the downtime is greater than in hot standby, because you
have to power up the standby unit and bring it online into a known state.
 This makes it more challenging to reconcile synchronization issues, but do to the length of the
time it takes to bring the standby unit on line, there will be a big bump on switchover.
 Hot Standby:
 Here, the secondary unit is powered up and can optionally monitor the DUC.
 Figure 5.2
 If the secondary unit is used as the watchdog and/or voter to decide when to switch over,
then a third party can be eliminated to this job.
 This design does not preserve the reliability of the standby unit as well as the cold standby
design.
 However, it shortens the downtime, which in turn increases the availability of the system.
 Hot Standby are similar to Dual Modular Redundancy (DMR) or Parallel Redundancy.
 These naming conventions are commonly interchanged.
 The main difference between Hot Standby and DMR is how tightly the primary and the secondary are
synchronized.
 DMR completely synchronizes the primary and secondary units.
N Modular Redundancy
It is also known as Parallel Redundancy, which :
 refers to the approach of having multiply units running in parallel.
 All units are highly synchronized and receive the same input information at the same time.
 Their output values are then compared; and
 A voter decides which output values should be used.
This model typically has faster switchover times than
 Hot Standby models, thus the system availability is very high,
 but because all the units are powered up and actively engaged with the DUC, the system is at
more risk of encountering a common mode failure across all the units.
There are three main typologies:
 Dual Modular Redundancy,
 Triple Modular Redundancy, and
 Quadruple Redundancy.
 Dual Modular Redundancy (DMR),

Figure 5.3
 In the figure, the voter decides which unit will actively control the Device Under Control (DUC)
 DMR uses two functional equivalent units, thus either can control the DUC.
 The most challenging aspect of DMR is determining when to switch over to the secondary unit.
 The obvious drawback is the 4X increase in system cost.
 Triple Modular Redundancy (TMR),

Figure 5.4
 TMR uses three functionally equivalent units to provide redundant backup
 TMR is more reliable than DMR due to two main aspects.
 The most obvious reason is that there are two “standby” units instead of just one.
 The other reason is that in TMR, there is a technique called diversity platforms
 The drawback of this approach is cost; TMR can raise the cost of your system by at least 3X.
 Quadruple Modular Redundancy (QMR),
 QMR is fundamentally similar to TMR but using four units instead of three to increase the
reliability.
 The obvious drawback is the 4X increase in system cost.
1:N Redundancy
 The design is used where a single backup is used for multiple systems;
 Figure 5.5
 This backup functions in the place of any single one of the active systems.
 Other drawbacks of this approach are
 the added complexity of deciding when to switch; and
 of a switch matrix that can reroute the signals correctly and efficiently
 This technique offers redundancy at a much lower cost than the other
models by using one standby unit for several primary units.
 This approach only works well when the primary units all have very
similar functions, thus allowing the standby to back up any of the
primary units if one of them fails
How redundant system design practices can improve
reliability of a system
R is the probability of success and F is the probability of failure
 Thus the sum of the two logical states is unity: i.e.
 R+F = 1
(5.1)
 Because most companies tend to track failures more than successes, the
equation can be recast around, to calculate reliability
 R = 1-F
(5.2)
 Example 5.1
 If it is assumed the probability of failure of a given system is 10 %; then
 Solution
 R = 1 - 0.1 = 0.90
 Which reveals that
 The probability of success for a given system would be 90 %, or 90 out of 100 should
succeed
The same concept
 Can be used to compute the reliability of a redundant system, viz:
 R = 1 - (F1)(F2)
 If systems 1 and 2 are redundant systems, i.e.,
 One system can functionally back up the other, and that:
 F1 is the probability of failure of system 1 and
 F2 is the probability of failure of system 2
 Upon leveraging on redundancy,
 R = = 1 - (F1)(F2) =1 - (0.1)(0.1) = 0.99
 This reveals that,
 The probability of success for a given system would be 99 %; or
 The 99 out of 100 should succeed
(5.3)
 Another way to calculate reliability is to take the probability equation
and instead solve for the mean time to failure (MTTF) of the system:
 R(t) = e-λt = e-t/MTTF
(5.4)
 Solving for MTTF: MTTF = - (t / ln [R(t)])
 For example,
 If an application had a mission time of 24 hours a day, 7 days a week, for
one year (24/7/365) and experienced a success rate of 90 %; then the
 MTTF = - (1 yr / ln [0.90] ) = 9.49 years
 If you add redundancy, then the success rate increases to approximately 99% for the
same mission time:
 MTTF = - (1 yr / ln [0.99]) = 99.50 years
 These equations effectively demonstrate
 The vast improvement in reliability that redundancy can bring to any system.

Levels of Redundancy
 There are many situations where it might be more beneficial to employ
redundancy only to a less reliable component of the system.
 The following model depicts a system that has three dependent
components in series. If one component fails, the entire system fails.

 Figure 5.6
 In Figure 5.6,
 R1 = Component 1 reliability, R2 = Component 2 reliability, R3 = Component 3 reliability
 The reliability of the entire system is obtained
 By multiplying the reliability of each of the components together.
 Rsystem = (R1) (R2) (R3)
(5.5)
 For an example, if R1 = .98, R2 = .85, and R3 = .97, then the reliability of
the system would be:
 Rsystem = (.98) (.85) (.97) = .81
 If a least reliable component of the system, R2, is backedup with
redundancy, the model now looks like this:

 Figure 5.6
 And known that: R = 1 – F
 And for a redundant component: R = 1 - (F1)(F2)
 The reliability equation of the system now looks like this:
 Rsystem = (R1) [1-(F2)(F2)] (R3)
 If system is now recalculated then:
 Rsystem = (.98) (1 - (.15) (.15) ) (.97) = .93; which reveals that
 The reliability of the system has been
 Improved by 12 % by implementing redundancy for only one component.
 This strategy is realizable at many levels; which is application specific;
 There is need to understand which components are most likely to fail; for instance,
 It may be at the sensor, the cabling, the device, the chassis, the power supply, or any number
of other components of the system.
Use of Statistical Design Method
The statistical design method involves
 The use of the distribution of component values between their tolerance
limits for the purpose of assessing whether a design meets its engineering
specifications; and
 Determining the proportion of the production which will be rejected as
unsatisfactory.
This procedure necessitates
 intensive use of the computer and is based on the use of random values
with the tolerance range for each component.
 Use of Derating Concept
Derating is a process by which
 a equipment is operated below the rated stress level in order to reduce the
equipment failure rate and consequently improve reliability significantly.
 The Arrhenius law can be used to achieved this and is expressed as;
 𝝀𝟐 = 𝝀𝟏
𝑽𝟐 𝒏
𝑲 𝑻𝟐 −𝑻𝟏
𝑽𝟏
(5.6)
 Where V1, V2; T1 and T2 are voltage and temperature levels respectively, and
 λ1, λ2 are the corresponding failure rates at those levels;
 K and n are constants.
 K is a constant showing the effect of temperature for a particular component. If the values of constants n and K are unknown, then they
may be determined by tests involving, firstly, keeping the voltage constant, and thus determining K, and secondly, keeping the
temperature constant, and thus determining n.
 This may be clarified by considering the equation
 Firstly for assumed constant voltage (i.e., V2 = V1 = V), giving
 𝝀𝟐 = 𝝀𝟏
𝑽 𝒏
𝑲 𝑻𝟐 −𝑻𝟏
𝑽
(5.7)
Since n > 0
𝝀𝟐 = 𝝀𝟏 𝑲
𝑻𝟐 −𝑻𝟏
(5.8)
Which gives
𝑙𝑜𝑔𝑒 𝐾 =
𝜆
𝑙𝑜𝑔𝑒 2
𝜆1
𝑇2 −𝑇1
↔𝐾= 𝑒
And for the constant temperature, 𝝀𝟐 = 𝝀𝟏
𝜆
𝑙𝑜𝑔𝑒 2
𝜆1
𝑇2 −𝑇1
𝑉2 𝒏
𝑲 𝑻𝟏 −𝑻𝟏
𝑉1
(5.9)
= 𝝀𝟏
𝑉2 𝒏
𝑲𝟎
𝑉1
= 𝝀𝟏
𝑉2 𝒏
𝑉1
Hence,
𝜆
𝒏 =
𝑙𝑜𝑔𝑒 𝜆2
1
𝑽
𝒍𝒐𝒈𝒆 𝑽𝟐
𝟏
(5.10)
Example:
In a test to determine failure rate, three sets of observations
were recorded under the following conditions
 a) Failure rate for rated voltage and rated temperature
 b) Failure rate for rated voltage and twice rated temperature
 c) Failure rate for half rated voltage and rated temperature
If the failure rates for these conditions are 0.025, 0.100, and
0.00125 respectively, and the rated temperature is 30°C,
calculate the probable failure rate at
 One third rated voltage, and Two-third rated temperature.
Solution
 For Failure rate for rated voltage and rated temperature,
 λ1 = 0.025, V1 = V , and T1 = 30°C
 For Failure rate for rated voltage and twice rated temperature,
 λ2 = 0.100, V2 = V1 = V , T1 = 30°C, T2 = 2T1 = 60°C ; and known that

 Then,
↔
; K = 1.047
 For Failure rate for ½ rated voltage and rated temperature,
 λ2 = 0.00125, V2 = ½ V1 = ½ V , T1 = T2 = 30°C ; and known that

↔

 Then
, hence
Then the resulting equation or law is therefore:
 𝜆2 = 0.025
𝑉2 4.32
𝑉1
× 1.047
𝑇2 −𝑇1
(5.11)
 The failure rate for one third rated voltage and two third rated temperature is therefore:
 λ2 = ??, V2 = 1/3 V1 =1/3 V , T2 = 2/3 T1 = 2/3 (30) = 20°C ;
 And upon substituting the above listed values in equation (5.11), we have
 𝜆2 = 0.025
1 4.32
3
× 1.047
20−30
= 0.00001371
This shows that
 There is a remarkable decrease in failure rate from 0.025 to 0.0001371, which proves convincingly
the benefit derived from derating
Design Considerations and Packaging
Packaging
 Refers to enclosures and protective features built into the product.
 of an electronic system must consider protection from
 mechanical damage, cooling, radio frequency noise emission, protection from
electrostatic discharge, maintenance, operator convenience, and cost.
Reason for packaging electrical & electronic products;
 To avoid breakage and potential release of hazardous constituents,
 To reduce the cost of transportation, maximize use of transport space,
 To simplify handling and prevents injuries, and
 To make it easier and faster to process material once it arrives at its
destination.
Things to be considered when selecting packaging:
 Hazards to be protected against:
 mechanical damage, exposure to weather and dirt, electromagnetic interference, etc.
 Heat dissipation requirements
 Availability and capability of suppliers
 User interface design and convenience
 Ease of access to internal parts when required for maintenance
 Product safety, and compliance with regulatory standards
 Aesthetics, and other marketing considerations
 Service life and reliability
Conclusion
In this lecture, we have discussed
◦ Methods of Improving Reliability; which include:
◦ Part Selection
◦ Use of Improved Constructional Method and New Technology
◦ Use of Redundancy
◦ Standby Redundancy: Cold and Hot Standby Redundancies
◦ N Modular Redundancy: Dual, Triple, and Quadruple Redundancies
◦ 1:N Redundancy
◦ Use of Statistical Design Method
◦ Use of Derating Concept
◦ Design Considerations and Packaging
Download