Microelectronics Reliability 56 (2016) 182–188
Journal homepage: www.elsevier.com/locate/mr

Thermal reliability prediction and analysis for high-density electronic systems based on the Markov process

Yi Wan a,⁎, Hailong Huang a, Diganta Das b, Michael Pecht b
a College of Physics and Electronic Information Engineering, Wenzhou University, Wenzhou 325035, China
b Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742, USA

Article info
Article history: Received 9 April 2015; received in revised form 25 August 2015; accepted 5 October 2015; available online 14 October 2015
Keywords: Electronic systems; Thermal reliability estimation and prediction; Stochastic process; Markov theory; The feature parameters of thermal reliability evaluation

Abstract
Thermal-mechanical fatigue is one of the main failure modes for electronic systems, particularly for high-density electronic systems with high-power components. Thermal reliability estimation and prediction have been an increasing concern for improving the safety and reliability of electronic systems. In this paper, we propose a stochastic process prediction model to estimate the thermal reliability of an electronic system based on Markov theory. We first divided the high-density electronic systems into four modules: the energy transformation and protection module, the electronic control module, the connection module, and the signal transmission and transformation module. By integrating failure and repair characteristics of the four modules, a stochastic model of thermal reliability analysis and prediction for a whole electronic system was built based on the Markov process. The feature parameters of thermal reliability evaluation, including thermal reliability, thermal failure probability, mean time between thermal faults, and thermal stable availability, were derived based on our comprehensive model.
Finally, we applied the model to an indoor electronic system of DC frequency conversion conditioning. The thermal reliability was estimated and predicted using tested failure and debugging repair data. Effective methods for improving thermal reliability are presented and analyzed based on the comprehensive Markov model.
© 2015 Elsevier Ltd. All rights reserved.

⁎ Corresponding author. E-mail address: jsj_yiwan@wzu.edu.cn (Y. Wan).
http://dx.doi.org/10.1016/j.microrel.2015.10.006
0026-2714/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

In an electronic system, the electrical and mechanical connections between the electronic components and the circuit board are mainly implemented through the solder joints. As integration levels have increased, more attention has been focused on the overheating of electronic products [1]. The thermal failure rate of electronic components will increase by an order of magnitude per 10 °C increment of environmental temperature [2]. Thermal fatigue is the main failure mode of high-density electronic systems. Table 1 shows the relation between failure rate and the temperature of electronic components [3,4]. Temperature and thermal features directly affect the life of electronic products and are key issues for high-density electronic systems [5]. Todd et al. pointed out that the reliability of an electronic system is mainly the thermal reliability of the system [6]. As solder joints have become smaller, designing for reliability to prevent thermal failure has received greater emphasis [7]. Reliability engineering and reliability theory must therefore be applied to the thermal failure analysis and thermal design of an electronic system.

Many scholars have carried out relevant studies, and some reliability analysis and prediction models have been proposed.
The most common model used is the MIL standard model; the other common models are the Schick–Wolverton model, the Shooman model, the Musa execution time model, the Goel–Okumoto non-homogeneous Poisson process model, the likelihood Bayesian model, and so on [8–11]. The other popular methods are the thermal shock test/temperature cycling test (TST/TCT) method, the nonlinear finite element method, and the thermal structural reliability method [12–16].

In recent years, thermal reliability models have been developed. Johann-Peter Sommer et al. presented a parametric finite element analysis (FEA) method for the thermal and thermo-mechanical behavior of advanced electronic packages, taking into account the initial design phase in order to evaluate and achieve reliability [17]. Yu-min Lee et al. proposed an efficient statistical electro-thermal model for analyzing on-chip thermal reliability under process variations by the collocation-based statistical modeling technique. A mixed-mesh strategy is presented to further enhance the efficiency of the developed statistical electro-thermal model [18]. Catelani et al. evaluated and analyzed the reliability performance of the electronic device in accordance with the Arrhenius and modified Coffin–Manson degradation models. The whole test plan was characterized, and some results were reported [19]. Chipulis et al. developed an approach to evaluating the reliability of thermal power engineering based on processing retrospective information by regression analysis methods [20]. Physics-of-failure (PoF) and T-FEA methods have been developed and used in the literature to assess and predict the thermal reliability of electronic systems [21,22].

Table 1
Failure rate of electronic components at high and low temperatures.

Component              Basis failure rate,    Basis failure rate,    ΔT (°C)   FRHL⁎
                       high temperature       low temperature
Transistor             0.0640 at 160 °C       0.008 at 40 °C         120       8:1
Ceramic capacitor      0.029 at 125 °C        0.0009 at 40 °C        85        32:1
Transformer            0.0267 at 85 °C        0.001 at 40 °C         45        27:1
Carbon film resistor   0.0063 at 90 °C        0.0002 at 40 °C        50        31:1
IC chip                0.5100 at 90 °C        0.0068 at 40 °C        50        75:1

⁎ FRHL denotes the ratio of the failure rate at high temperature to that at low temperature.

However, the existing models and methods are basically for thermal reliability analysis and prediction of a single component. The comprehensive failure and debugging repair characteristics of an electronic system were not considered in the above models. An electronic system is a comprehensive system that consists of many components and is capable of debugging and repair. Its thermal reliability is determined not only by the thermal reliability of individual components, but also by the failure and debugging repair characteristics of every component.

In this paper, we propose a stochastic process prediction model to estimate the thermal reliability of electronic systems based on Markov theory. We first divided high-density electronic systems into four modules: the energy transformation and protection module, the electronic control module, the connection module, and the signal transmission and transformation module. By integrating the failure characteristics and debugging repair characteristics of the four modules, the stochastic process model of thermal reliability analysis and prediction for the whole electronic system was built based on Markov theory. Further, the evaluation feature parameters, including thermal reliability, thermal failure probability, mean time between thermal faults, and thermal stable availability, were derived from the comprehensive model. Finally, we applied the model to an indoor electronic system of DC frequency conversion conditioning. The thermal reliability was estimated and predicted using failure and debugging repair data, and effective methods of improving thermal reliability were presented and analyzed based on the comprehensive Markov model.

In this work, we developed a new method for thermal reliability design, accurate thermal reliability evaluation and prediction, and scientific management for high-density electronic systems.

2. State analysis of an electronic system

A high-density electronic system consists of an energy transformation and protection module, an electronic control module, a connection module, and a signal transmission and transformation module (Fig. 1). Every module consists of many components. An energy transformation and protection module includes a switching power supply, power-converting chips, transformers, fuses, magnetic rings, voltage-dependent resistors, and so on. An electronic control module includes relays, optocouplers, light-emitting diodes (LEDs), and so on. A connection module includes connectors, sockets, printed circuit boards (PCBs), connecting cables, and so on. A signal transmission and transformation module includes resistors, capacitors, inductors, diodes, transistors, digital integrated circuits, AD/DA circuits, operational amplifiers, and so on.

Every module in an electronic system has two states: normal and thermal failure. A module repeatedly passes from normal to thermal failure and then, through debugging or repair within a given time, from thermal failure back to normal. The state transitions are shown in Fig. 2. Both failure time and debugging repair time are random, and the failure probability of every module is also random.
The life and state transitions of every module are therefore random processes. The thermal failure random process of an electronic system can be considered a continuous-time Markov process with discrete states. Assume that the failure process of an electronic system satisfies the following statements [23]:
(1) The state is discernible.
(2) The thermal failure rate, λ, and the debugging repair rate, μ, of every module can both be regarded as approximately constant, so the reliability distribution of every module is an exponential distribution.
(3) The probability that a normal module fails within the interval from t to t + Δt is λΔt.
(4) The probability that a failed module is debugged and repaired within the interval from t to t + Δt is μΔt.
(5) The probability that two or more modules fail within Δt is approximately zero.
(6) The thermal failure event and the debugging repair event of every module are independent of each other.

Based on the above analysis, the thermal failure process of an electronic system is a Markov process wherein the state alternates between normal and failure. Assume that the state of every module is independent of the others and that two or more modules cannot be in thermal failure at the same time. A group of professional debugging and repair specialists is assumed to be available, and the state of every module is shown in Table 2.

3. Prediction and analysis of thermal reliability for an electronic system based on the Markov process

If λ1, λ2, λ3, and λ4 are the thermal failure rates, and μ1, μ2, μ3, and μ4 are the debugging repair rates of the energy transformation and protection module, the electronic control module, the connection module, and the

Fig. 1. Functional structure of an electronic system.
Fig. 2. State transformation of an electronic system.
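As a rough illustration of assumptions (2) to (4) in Section 2, the sketch below treats a single repairable module as a two-state continuous-time Markov chain with a constant failure rate and a constant repair rate. It is not the four-module system model developed in this paper: the function names and the rate values `LAM` and `MU` are hypothetical and chosen only for illustration. The sketch compares the standard closed-form probability of being in the normal state with a forward-Euler integration of the state equation dP/dt = -λP + μ(1 - P), which follows directly from the λΔt and μΔt transition probabilities.

```python
import math


def availability(t, lam, mu):
    """Closed-form P(module is normal at time t), starting normal at t = 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def availability_euler(t, lam, mu, steps=100_000):
    """Forward-Euler integration of dP/dt = -lam*P + mu*(1 - P), P(0) = 1."""
    dt = t / steps
    p = 1.0  # the module starts in the normal state
    for _ in range(steps):
        # In each step the module fails with probability lam*dt if normal
        # and is repaired with probability mu*dt if failed.
        p += dt * (-lam * p + mu * (1.0 - p))
    return p


# Hypothetical rates (per hour), assumed for illustration only.
LAM = 0.002  # thermal failure rate
MU = 0.5     # debugging repair rate

steady_state = MU / (LAM + MU)              # long-run probability of being normal
closed = availability(10.0, LAM, MU)        # closed form at t = 10 h
numeric = availability_euler(10.0, LAM, MU) # numerical check at t = 10 h

print("steady-state availability:", steady_state)
print("closed form at t = 10 h:  ", closed)
print("Euler estimate at t = 10 h:", numeric)
```

Under these assumptions the mean time to failure of the module is 1/λ and the mean repair time is 1/μ, so raising μ (faster debugging and repair) or lowering λ (better thermal design) both push the steady-state value toward 1.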