EEG560 Reliability & Maintainability of E&E Components & Systems Designing for Higher Reliability Sunday ADETONA, PhD sadetona@unilag.edu.ng Outline Introduction Methods of Improving Reliability ◦ Part Selection ◦ Use of Improved Constructional Method and New Technology ◦ Use of Redundancy ◦ Standby Redundancy: Cold and Hot Standby Redundancies ◦ N Modular Redundancy: Dual, Triple, and Quadruple Redundancies ◦ 1:N Redundancy ◦ Use of Statistical Design Method ◦ Use of Derating Concept Design Considerations and Packaging Conclusion Introduction The main goal of designing for Higher Reliability is to: Reduce components failure rate Methods of Improving Reliability Part Selection Use of Improved Constructional Method and New Technology Use of Redundancy Standby Redundancy: Cold and Hot Standby Redundancies N Modular Redundancy: Dual, Triple, and Quadruple Redundancies 1:N Redundancy Use of Statistical Design Method Use of Derating Concept Methods of Improving Reliability Part Selection For the design of a reliable system, it is necessary to ensure that the selected components: Are insensitive to the environment concerned. Do not create any harmful effect on other components. Can be protected individually or in group of a complete circuit from the harmful effect of the environment. Do not change in normal value outside the allowable limit. Use of Improved Constructional Method and New Technology Many methods and new technology have been developed which are intended to reduce the quantity of electronics components in any given equipment and protect them against the effects of the environment. These measures go a long way in improving reliability. Some of the main developments in construction have been the use of encapsulation and printed circuits. There have also been tremendous developments in components themselves which make them less sensitive to the environment. For example, the development of semiconductor diodes and transistors has reduced the problem of mounting in sockets in order to reduce the effects of mechanical shock and vibration. This is because they are small in size and dissipate little ‘harmful’ heat to neighbouring components. Encapsulating of circuits and component provides a great deal of protection against humidity, mechanical shock and vibration. Use Of Redundancy Redundancy Involves the use of additional units in a variety of configurations; Can be Standby Redundancy, N Modular Redundancy, and 1:N Redundancy Standby Redundancy Also known as Backup Redundancy; Here, there is an identical secondary unit to back up the primary unit; The secondary unit typically does not monitor the system, but is there just as a spare. There is need for a third party to be a: Watchdog, which monitors the system to decide when a switchover condition is met and command the system to switch control to the standby unit and Voter, which is the component that decides when to switch over and which unit is given control of the DUC. The system cost increase for this type of redundancy is Usually about 2X or less depending on your software development costs. In Standby redundancy there are two basic types, Cold Standby and Hot Standby. Cold Standby: Here, the secondary unit is powered off, thus preserving the reliability of the unit. Figure 5.1 The drawback of this design is that the downtime is greater than in hot standby, because you have to power up the standby unit and bring it online into a known state. This makes it more challenging to reconcile synchronization issues, but do to the length of the time it takes to bring the standby unit on line, there will be a big bump on switchover. Hot Standby: Here, the secondary unit is powered up and can optionally monitor the DUC. Figure 5.2 If the secondary unit is used as the watchdog and/or voter to decide when to switch over, then a third party can be eliminated to this job. This design does not preserve the reliability of the standby unit as well as the cold standby design. However, it shortens the downtime, which in turn increases the availability of the system. Hot Standby are similar to Dual Modular Redundancy (DMR) or Parallel Redundancy. These naming conventions are commonly interchanged. The main difference between Hot Standby and DMR is how tightly the primary and the secondary are synchronized. DMR completely synchronizes the primary and secondary units. N Modular Redundancy It is also known as Parallel Redundancy, which : refers to the approach of having multiply units running in parallel. All units are highly synchronized and receive the same input information at the same time. Their output values are then compared; and A voter decides which output values should be used. This model typically has faster switchover times than Hot Standby models, thus the system availability is very high, but because all the units are powered up and actively engaged with the DUC, the system is at more risk of encountering a common mode failure across all the units. There are three main typologies: Dual Modular Redundancy, Triple Modular Redundancy, and Quadruple Redundancy. Dual Modular Redundancy (DMR), Figure 5.3 In the figure, the voter decides which unit will actively control the Device Under Control (DUC) DMR uses two functional equivalent units, thus either can control the DUC. The most challenging aspect of DMR is determining when to switch over to the secondary unit. The obvious drawback is the 4X increase in system cost. Triple Modular Redundancy (TMR), Figure 5.4 TMR uses three functionally equivalent units to provide redundant backup TMR is more reliable than DMR due to two main aspects. The most obvious reason is that there are two “standby” units instead of just one. The other reason is that in TMR, there is a technique called diversity platforms The drawback of this approach is cost; TMR can raise the cost of your system by at least 3X. Quadruple Modular Redundancy (QMR), QMR is fundamentally similar to TMR but using four units instead of three to increase the reliability. The obvious drawback is the 4X increase in system cost. 1:N Redundancy The design is used where a single backup is used for multiple systems; Figure 5.5 This backup functions in the place of any single one of the active systems. Other drawbacks of this approach are the added complexity of deciding when to switch; and of a switch matrix that can reroute the signals correctly and efficiently This technique offers redundancy at a much lower cost than the other models by using one standby unit for several primary units. This approach only works well when the primary units all have very similar functions, thus allowing the standby to back up any of the primary units if one of them fails How redundant system design practices can improve reliability of a system R is the probability of success and F is the probability of failure Thus the sum of the two logical states is unity: i.e. R+F = 1 (5.1) Because most companies tend to track failures more than successes, the equation can be recast around, to calculate reliability R = 1-F (5.2) Example 5.1 If it is assumed the probability of failure of a given system is 10 %; then Solution R = 1 - 0.1 = 0.90 Which reveals that The probability of success for a given system would be 90 %, or 90 out of 100 should succeed The same concept Can be used to compute the reliability of a redundant system, viz: R = 1 - (F1)(F2) If systems 1 and 2 are redundant systems, i.e., One system can functionally back up the other, and that: F1 is the probability of failure of system 1 and F2 is the probability of failure of system 2 Upon leveraging on redundancy, R = = 1 - (F1)(F2) =1 - (0.1)(0.1) = 0.99 This reveals that, The probability of success for a given system would be 99 %; or The 99 out of 100 should succeed (5.3) Another way to calculate reliability is to take the probability equation and instead solve for the mean time to failure (MTTF) of the system: R(t) = e-λt = e-t/MTTF (5.4) Solving for MTTF: MTTF = - (t / ln [R(t)]) For example, If an application had a mission time of 24 hours a day, 7 days a week, for one year (24/7/365) and experienced a success rate of 90 %; then the MTTF = - (1 yr / ln [0.90] ) = 9.49 years If you add redundancy, then the success rate increases to approximately 99% for the same mission time: MTTF = - (1 yr / ln [0.99]) = 99.50 years These equations effectively demonstrate The vast improvement in reliability that redundancy can bring to any system. Levels of Redundancy There are many situations where it might be more beneficial to employ redundancy only to a less reliable component of the system. The following model depicts a system that has three dependent components in series. If one component fails, the entire system fails. Figure 5.6 In Figure 5.6, R1 = Component 1 reliability, R2 = Component 2 reliability, R3 = Component 3 reliability The reliability of the entire system is obtained By multiplying the reliability of each of the components together. Rsystem = (R1) (R2) (R3) (5.5) For an example, if R1 = .98, R2 = .85, and R3 = .97, then the reliability of the system would be: Rsystem = (.98) (.85) (.97) = .81 If a least reliable component of the system, R2, is backedup with redundancy, the model now looks like this: Figure 5.6 And known that: R = 1 – F And for a redundant component: R = 1 - (F1)(F2) The reliability equation of the system now looks like this: Rsystem = (R1) [1-(F2)(F2)] (R3) If system is now recalculated then: Rsystem = (.98) (1 - (.15) (.15) ) (.97) = .93; which reveals that The reliability of the system has been Improved by 12 % by implementing redundancy for only one component. This strategy is realizable at many levels; which is application specific; There is need to understand which components are most likely to fail; for instance, It may be at the sensor, the cabling, the device, the chassis, the power supply, or any number of other components of the system. Use of Statistical Design Method The statistical design method involves The use of the distribution of component values between their tolerance limits for the purpose of assessing whether a design meets its engineering specifications; and Determining the proportion of the production which will be rejected as unsatisfactory. This procedure necessitates intensive use of the computer and is based on the use of random values with the tolerance range for each component. Use of Derating Concept Derating is a process by which a equipment is operated below the rated stress level in order to reduce the equipment failure rate and consequently improve reliability significantly. The Arrhenius law can be used to achieved this and is expressed as; 𝝀𝟐 = 𝝀𝟏 𝑽𝟐 𝒏 𝑲 𝑻𝟐 −𝑻𝟏 𝑽𝟏 (5.6) Where V1, V2; T1 and T2 are voltage and temperature levels respectively, and λ1, λ2 are the corresponding failure rates at those levels; K and n are constants. K is a constant showing the effect of temperature for a particular component. If the values of constants n and K are unknown, then they may be determined by tests involving, firstly, keeping the voltage constant, and thus determining K, and secondly, keeping the temperature constant, and thus determining n. This may be clarified by considering the equation Firstly for assumed constant voltage (i.e., V2 = V1 = V), giving 𝝀𝟐 = 𝝀𝟏 𝑽 𝒏 𝑲 𝑻𝟐 −𝑻𝟏 𝑽 (5.7) Since n > 0 𝝀𝟐 = 𝝀𝟏 𝑲 𝑻𝟐 −𝑻𝟏 (5.8) Which gives 𝑙𝑜𝑔𝑒 𝐾 = 𝜆 𝑙𝑜𝑔𝑒 2 𝜆1 𝑇2 −𝑇1 ↔𝐾= 𝑒 And for the constant temperature, 𝝀𝟐 = 𝝀𝟏 𝜆 𝑙𝑜𝑔𝑒 2 𝜆1 𝑇2 −𝑇1 𝑉2 𝒏 𝑲 𝑻𝟏 −𝑻𝟏 𝑉1 (5.9) = 𝝀𝟏 𝑉2 𝒏 𝑲𝟎 𝑉1 = 𝝀𝟏 𝑉2 𝒏 𝑉1 Hence, 𝜆 𝒏 = 𝑙𝑜𝑔𝑒 𝜆2 1 𝑽 𝒍𝒐𝒈𝒆 𝑽𝟐 𝟏 (5.10) Example: In a test to determine failure rate, three sets of observations were recorded under the following conditions a) Failure rate for rated voltage and rated temperature b) Failure rate for rated voltage and twice rated temperature c) Failure rate for half rated voltage and rated temperature If the failure rates for these conditions are 0.025, 0.100, and 0.00125 respectively, and the rated temperature is 30°C, calculate the probable failure rate at One third rated voltage, and Two-third rated temperature. Solution For Failure rate for rated voltage and rated temperature, λ1 = 0.025, V1 = V , and T1 = 30°C For Failure rate for rated voltage and twice rated temperature, λ2 = 0.100, V2 = V1 = V , T1 = 30°C, T2 = 2T1 = 60°C ; and known that Then, ↔ ; K = 1.047 For Failure rate for ½ rated voltage and rated temperature, λ2 = 0.00125, V2 = ½ V1 = ½ V , T1 = T2 = 30°C ; and known that ↔ Then , hence Then the resulting equation or law is therefore: 𝜆2 = 0.025 𝑉2 4.32 𝑉1 × 1.047 𝑇2 −𝑇1 (5.11) The failure rate for one third rated voltage and two third rated temperature is therefore: λ2 = ??, V2 = 1/3 V1 =1/3 V , T2 = 2/3 T1 = 2/3 (30) = 20°C ; And upon substituting the above listed values in equation (5.11), we have 𝜆2 = 0.025 1 4.32 3 × 1.047 20−30 = 0.00001371 This shows that There is a remarkable decrease in failure rate from 0.025 to 0.0001371, which proves convincingly the benefit derived from derating Design Considerations and Packaging Packaging Refers to enclosures and protective features built into the product. of an electronic system must consider protection from mechanical damage, cooling, radio frequency noise emission, protection from electrostatic discharge, maintenance, operator convenience, and cost. Reason for packaging electrical & electronic products; To avoid breakage and potential release of hazardous constituents, To reduce the cost of transportation, maximize use of transport space, To simplify handling and prevents injuries, and To make it easier and faster to process material once it arrives at its destination. Things to be considered when selecting packaging: Hazards to be protected against: mechanical damage, exposure to weather and dirt, electromagnetic interference, etc. Heat dissipation requirements Availability and capability of suppliers User interface design and convenience Ease of access to internal parts when required for maintenance Product safety, and compliance with regulatory standards Aesthetics, and other marketing considerations Service life and reliability Conclusion In this lecture, we have discussed ◦ Methods of Improving Reliability; which include: ◦ Part Selection ◦ Use of Improved Constructional Method and New Technology ◦ Use of Redundancy ◦ Standby Redundancy: Cold and Hot Standby Redundancies ◦ N Modular Redundancy: Dual, Triple, and Quadruple Redundancies ◦ 1:N Redundancy ◦ Use of Statistical Design Method ◦ Use of Derating Concept ◦ Design Considerations and Packaging