Design and Test Technology for Dependable Embedded Systems

Foreword

Preface

Part 1: Design, Modeling and Verification

Introduction
Zebo Peng
Linköping University, Sweden

1.1. System-Level Design of NoC-Based Dependable Embedded Systems
Mihkel Tagel, Peeter Ellervee, Gert Jervan
Tallinn University of Technology, Estonia

The complexity and communication requirements of SoCs are increasing, making the goal of designing a fault-free system a very difficult task. The network-on-chip (NoC) has been proposed as one of the alternatives to solve some of the on-chip communication problems and to address dependability at various levels of abstraction. The chapter presents system-level design techniques for NoC-based systems. The NoC architecture has been used to address the on-chip communication problems of complex SoCs. It can also be used to deal with faults, as it exhibits natural redundancy. The chapter presents an interesting system-level design framework to explore the large and complex design space of dependable NoC-based systems.

1.2. Synthesis of Flexible Fault-Tolerant Schedules for Embedded Systems with Soft and Hard Timing Constraints
Viacheslav Izosimov
Embedded Intelligent Solutions (EIS) By Semcon AB, Sweden
Paul Pop
Technical University of Denmark, Denmark
Petru Eles and Zebo Peng
Linköping University, Sweden

The chapter deals with the design and optimization of embedded applications with soft and hard real-time processes. The hard processes must always complete on time, while a soft process can complete after its deadline, and its completion time is associated with a quality of service (QoS). Deadlines for the hard processes must be guaranteed even in the presence of transient and intermittent faults, and the QoS should be maximized. The chapter presents a novel quasi-static scheduling strategy, where a set of schedules is synthesized off-line and, at run time, the scheduler selects the appropriate schedule based on the occurrence of faults and the actual execution times of processes.

1.3. Optimizing Fault Tolerance for Multi-Processor System-on-Chip
Dimitar Nikolov, Mikael Väyrynen, Urban Ingelsson, Erik Larsson
Linköping University, Sweden
Virendra Singh
Indian Institute of Science, India

The rapid development of semiconductor technologies makes it possible to manufacture ICs with multiple processors, so-called Multi-Processor Systems-on-Chip (MPSoC). The chapter deals with fault-tolerance design of MPSoCs for general-purpose applications, where the main concern is to reduce the average execution time (AET). It presents a mathematical framework for the analysis of AET and an integer linear programming model to minimize AET, which also takes communication overhead into account. It also describes an interesting approach to estimate the error probability and to adjust the fault-tolerance scheme dynamically during the operation of an MPSoC.
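As a small illustration of the kind of trade-off such an analysis captures, the sketch below compares the expected execution time of a job under rollback and re-execution with that under triple modular redundancy, assuming an independent error probability per run. The closed-form expressions follow from these idealized assumptions; the numbers are hypothetical and this is not the chapter's framework.

```python
# Illustrative sketch (not the chapter's model): expected execution time of a job
# under two fault-tolerance schemes, assuming independent errors per attempt.

def expected_aet_reexecution(t_exec: float, p_error: float) -> float:
    """Re-execute until one fault-free run: a geometric distribution of attempts
    gives an expected time of t_exec / (1 - p_error)."""
    if not 0.0 <= p_error < 1.0:
        raise ValueError("p_error must be in [0, 1)")
    return t_exec / (1.0 - p_error)

def expected_aet_tmr(t_exec: float, p_error: float) -> float:
    """Triple modular redundancy on three processors running in parallel:
    a voted round fails only if at least two of the three copies fail,
    and a failed round is repeated."""
    p_round_fails = 3 * p_error**2 * (1 - p_error) + p_error**3
    return t_exec / (1.0 - p_round_fails)

if __name__ == "__main__":
    t, p = 10.0, 0.05  # hypothetical execution time (ms) and per-run error probability
    print(f"re-execution AET: {expected_aet_reexecution(t, p):.3f} ms")
    print(f"TMR (3 CPUs) AET: {expected_aet_tmr(t, p):.3f} ms")
```

Re-execution pays with time on a single processor, while TMR pays with processor count; weighing such options against communication overhead is exactly the kind of decision the chapter's optimization framework addresses.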
1.4. Diagnostic Modeling of Digital Systems with Multi-Level Decision Diagrams
Raimund Ubar, Jaan Raik, Artur Jutman, Maksim Jenihhin
Tallinn University of Technology, Estonia

To cope with the complexity of today's digital systems in test generation, fault simulation and fault diagnosis, hierarchical multi-level formal approaches should be used. The chapter presents a unified diagnostic modelling technique based on Decision Diagrams (DD), which can be used to capture a digital system design at different levels of abstraction. Two new types of DDs, the logic-level structurally synthesized binary DDs (SSBDD) and the high-level DDs (HLDD), are defined together with several subclasses. Methods for the formal synthesis of both types of DDs are described, and it is shown how the DDs can be used in a design environment for dependable systems.

1.5. Enhanced Formal Verification Flow for Circuits Integrating Debugging and Coverage Analysis
Daniel Große, Görschwin Fey, Rolf Drechsler
University of Bremen, Germany

The chapter deals with techniques for formal hardware verification. An enhanced formal verification flow that integrates debugging and coverage analysis is presented. In this flow, a debugging tool locates the source of a failure by analyzing the discrepancy between the property and the circuit behavior. A technique to analyze the functional coverage of the proven Bounded Model Checking properties is then used to determine whether the property set is complete and, if it is not, to return the coverage gaps. The technique can be used to ensure the correctness of a design, which consequently facilitates the development of dependable systems.

Part 2: Faults, Compensation and Repair

Introduction
Heinrich Theodor Vierhaus
Technical University of Brandenburg Cottbus, Germany

2.1. Advanced technologies for transient faults detection and compensation
Matteo Sonza Reorda, Luca Sterpone, Massimo Violante
Politecnico di Torino, Italy

Transient faults have become an increasing issue in the past few years, as the smaller geometries of newer, highly miniaturized silicon manufacturing technologies have brought to the mass market failure mechanisms traditionally confined to niche markets, such as electronic equipment for avionics, space or nuclear applications. The chapter presents and discusses the origin of transient faults, fault propagation mechanisms, and the state-of-the-art design techniques that can be used to detect and correct transient faults. The concepts of hardware, data and time redundancy are presented, and their implementations to cope with transient faults affecting storage elements, combinational logic and IP-cores (e.g., processor cores) typically found in a System-on-Chip are discussed.

2.2. Memory Testing and Self-Repair
Mária Fischerová, Elena Gramatová
Institute of Informatics of the Slovak Academy of Sciences, Slovakia

Memories are very dense structures, and therefore the probability of defects is higher than that of logic and analogue blocks, which are not so densely laid out. Embedded memories are the largest components of a typical SoC, thus dominating the yield and reliability of the chip. The chapter gives a summary view of static and dynamic fault models, effective test algorithms for memory fault (defect) detection and localization, built-in self-test, and a classification of advanced built-in self-repair techniques supported by different types of repair allocation algorithms.
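As a concrete example of the kind of test algorithm surveyed in the chapter, the sketch below runs the classic March C- algorithm against a simple bit-oriented memory model with one injected stuck-at-0 cell. The memory model and the injected defect are illustrative assumptions, not material from the chapter.

```python
# Illustrative sketch: March C- applied to a tiny bit-oriented memory model
# with one injected stuck-at-0 cell.

class FaultyMemory:
    """N x 1-bit memory; cells listed in stuck_at_0 always hold 0."""
    def __init__(self, size: int, stuck_at_0=()):
        self.size = size
        self.cells = [0] * size
        self.stuck_at_0 = set(stuck_at_0)

    def write(self, addr: int, value: int):
        self.cells[addr] = 0 if addr in self.stuck_at_0 else value

    def read(self, addr: int) -> int:
        return self.cells[addr]

def march_c_minus(mem: FaultyMemory):
    """Run March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); up(r0)}.
    Returns a list of (element, address, expected, read) mismatches."""
    up, down = range(mem.size), range(mem.size - 1, -1, -1)
    elements = [
        (up,   [("w", 0)]),
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up,   [("r", 0)]),
    ]
    failures = []
    for idx, (order, ops) in enumerate(elements):
        for addr in order:
            for op, val in ops:
                if op == "w":
                    mem.write(addr, val)
                else:
                    got = mem.read(addr)
                    if got != val:
                        failures.append((idx, addr, val, got))
    return failures

if __name__ == "__main__":
    mem = FaultyMemory(size=16, stuck_at_0=[5])   # hypothetical defect at address 5
    print(march_c_minus(mem))                     # reports the mismatching reads
```

The recorded addresses of failing reads are what a built-in self-repair scheme would pass to its repair allocation logic to select spare rows or columns.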
2.3. Fault-tolerant and fail-safe design based on reconfiguration
H. Kubatova, P. Kubalik
Czech Technical University in Prague, Czech Republic

The chapter deals with the problem of how to design fault-tolerant or fail-safe systems in programmable hardware (FPGAs) for use in mission-critical applications. RAM-based FPGAs are usually considered unreliable due to the high probability of transient faults (SEUs) and are therefore deemed inapplicable in this area. However, FPGAs can be easily reconfigured if an error is detected. It is shown how to utilize an appropriate type of FPGA reconfiguration and combine it with fail-safe and fault-tolerant design. The trade-off between the requested level of dependability characteristics of the designed system and the area overhead, with respect to possible FPGA faults, is the main property and advantage of the presented methodology.

2.4. Self-repair technology for global interconnects on SoCs
Daniel Scheit and Heinrich Theodor Vierhaus
Technical University of Brandenburg Cottbus, Germany

The reliability of interconnects on ICs has become a major problem in recent years, due to rising complexity, low-k insulating materials with reduced stability, and wear-out effects caused by high current density. The total reliability of a system on a chip depends more and more on the reliability of its interconnects. The chapter presents an overview of the state of the art for fault-tolerant interconnects. Most of the published techniques are aimed at the correction of transient faults; built-in self-repair has not been discussed as much as the other techniques. This chapter fills the gap by discussing how to use built-in self-repair in combination with other approved solutions to achieve fault tolerance with respect to all kinds of faults.

2.5. Built-in Self Repair for Logic Structures
Tobias Koal, Heinrich T. Vierhaus
Technical University of Brandenburg, Germany

For several years, it has been predicted that nano-scale ICs will have a rising sensitivity to both transient and permanent fault effects. Most of the effort has so far gone into the detection and compensation of transient fault effects. More recently, the possibility of repairing permanent faults, due either to production flaws or to wear-out effects, has also received great attention. While built-in self-test (BIST) and even self-repair (BISR) for regular structures such as static memories (SRAMs) are well understood, the concepts for in-system repair of irregular logic and interconnects are few and mainly based on FPGAs as the basic implementation. In this chapter, different schemes of logic (self-)repair that are not based on FPGAs are described and analyzed with respect to cost and limitations. It can be shown that such schemes are feasible, but they require careful attention to hidden single points of failure.

2.6. Self-Repair by Program Reconfiguration in VLIW Processor Architectures
M. Schölzel, P. Pawlowski
Brandenburg University of Technology Cottbus, Germany
A. Dabrowski
Poznan University of Technology, Poland

Statically scheduled superscalar processors (e.g., very long instruction word processors) are characterized by multiple parallel execution units and small control logic. This makes them easily scalable and therefore attractive for use in embedded systems as application-specific processors. The chapter deals with the fault tolerance of VLIW processor architectures. If one or more components in the data path of a processor become permanently faulty, it becomes necessary to reconfigure either the hardware or the executed program such that operations are scheduled around the faulty units. The reconfiguration of the program is done either dynamically by the hardware or permanently by self-modifying code. In both cases a delay may occur during the execution of the application. This graceful performance degradation may become critical for real-time applications. A framework to overcome this problem by using scalable algorithms is provided.
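To make "scheduling around the faulty units" concrete, the sketch below repacks the operations of a toy VLIW program onto the remaining slots when one functional unit is marked permanently faulty, which lengthens the schedule and thus illustrates graceful performance degradation. The instruction format and operations are hypothetical, data dependencies are deliberately ignored, and this is not the chapter's algorithm.

```python
# Illustrative sketch: repacking VLIW operations when one functional-unit slot
# becomes permanently faulty. Operations keep their issue order but use only
# the healthy slots, so the schedule may grow longer.
# (Data dependencies between operations are ignored in this toy example.)

def reschedule(program, faulty_slot, num_slots):
    """program: list of instruction words, each a list with one entry per slot
    (an operation string or None). Returns a repaired program that never
    issues an operation in faulty_slot."""
    healthy = [s for s in range(num_slots) if s != faulty_slot]
    ops = [op for word in program for op in word if op is not None]
    repaired = []
    for i in range(0, len(ops), len(healthy)):
        chunk = ops[i:i + len(healthy)]
        word = [None] * num_slots
        for slot, op in zip(healthy, chunk):
            word[slot] = op
        repaired.append(word)
    return repaired

if __name__ == "__main__":
    # Hypothetical 3-slot VLIW program; slot 1 (second ALU) is permanently faulty.
    program = [["add r1,r2,r3", "mul r4,r5,r6", "ld r7,0(r8)"],
               ["sub r9,r1,r4", None,           "st r7,4(r8)"]]
    for word in reschedule(program, faulty_slot=1, num_slots=3):
        print(word)
    # Five operations now need three words instead of two: graceful degradation.
```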
Part 3: Fault Simulation and Fault Injection

Introduction

3.1. Fault simulation and fault injection technology based on SystemC
Silvio Misera
Kjellberg Finsterwalde, Germany
Roberto Urban
Brandenburg University of Technology Cottbus, Germany

The simulation of faults has two important areas of application. On the one hand, fault simulation is used for the validation of test patterns; on the other hand, simulation-based fault injection is used for the dependability assessment of systems. The chapter describes the simulation of faults in electronic systems using SystemC. Two application areas are targeted: fault simulation for the detection of fabrication faults, and fault injection for the analysis of electronic system designs for safety-critical applications with respect to their dependability under fault conditions. The chapter discusses the possibilities of using SystemC to simulate such designs, and state-of-the-art applications are presented for this purpose. It is shown how simulation with fault models can be implemented by several injection techniques, approaches that help to speed up simulations are presented, and some practical simulation environments are shown.

3.2. High-Level Decision Diagram Simulation for Diagnosis and Soft-Error Analysis
Jaan Raik, Urmas Repinski, Maksim Jenihhin, Anton Chepurov
Tallinn University of Technology

The chapter deals with high-level fault simulation for design error diagnosis. High-level decision diagrams (HLDD) are used for high-level fault reasoning, which allows efficient algorithms for locating design errors to be implemented. HLDDs can be efficiently used to determine the critical sets of soft errors to be injected when evaluating the dependability of systems. A holistic diagnosis approach is presented, based on high-level critical path tracing, for locating design errors and for generating critical fault lists to assess a design's vulnerability to soft errors by means of fault injection.

3.3. High-Speed Logic Level Fault Simulation
Raimund Ubar, Sergei Devadze
Tallinn University of Technology

The chapter is devoted to logic-level fault simulation. A new approach based on exact critical path tracing is presented. To speed up backtracing, the circuit is represented as a network of subcircuits modeled with structurally synthesized BDDs to compress the gate-level structural details. The method can be used for simulating permanent faults in combinational circuits, and transient or intermittent faults in both combinational and sequential circuits, with the goal of selecting critical faults for fault injection for dependability analysis purposes.

Part 4: Test Technology for Systems-on-Chip

Introduction

4.1. Software-Based Self-Test of Embedded Microprocessors
Paolo Bernardi, Michelangelo Grosso, Ernesto Sánchez and Matteo Sonza Reorda
Politecnico di Torino, Italy

In recent years, the use of embedded microprocessors in complex SoCs has become common practice. Their test is often a challenging task, due to their complexity and to the strict constraints coming from the environment and the application. The chapter focuses on the test of microprocessors or microcontrollers within a SoC. These modules often come from third parties, and the SoC designer is frequently not allowed to know the internal details of the module, nor to change or redesign it for test purposes. For this reason, an emerging solution for processor testing within a SoC is based on developing suitable test programs. The test technique, known as Software-Based Self-Test, is introduced, and the main approaches for test program generation and application are discussed.
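As a minimal sketch of the software-based self-test principle, the example below models a test routine that exercises a small ALU model with a handful of operations and compacts the results into a signature compared against a golden value. A real SBST program runs as native code on the processor under test; the operations, operands, injected fault, and signature scheme here are hypothetical and only illustrate the idea.

```python
# Illustrative sketch of the software-based self-test (SBST) principle:
# execute a sequence of ALU operations, fold the results into a signature,
# and compare against the golden signature from a fault-free model.

MASK = 0xFFFFFFFF  # model a 32-bit datapath

def alu(op: str, a: int, b: int, faulty: bool = False) -> int:
    """Tiny ALU model; 'faulty' forces bit 3 of the adder output to stick at 0."""
    ops = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b, "xor": a ^ b}
    result = ops[op] & MASK
    if faulty and op == "add":
        result &= ~(1 << 3)     # injected stuck-at-0 on one adder output bit
    return result

def run_test_program(faulty: bool = False) -> int:
    """Run the test instructions and compact the results into a 32-bit signature
    with a simple rotate-and-xor compactor."""
    test_program = [("add", 0x00000007, 0x00000001),
                    ("add", 0xFFFFFFFF, 0x00000001),
                    ("sub", 0x80000000, 0x00000001),
                    ("and", 0xAAAAAAAA, 0x55555555),
                    ("or",  0x12345678, 0x87654321),
                    ("xor", 0xDEADBEEF, 0xFFFFFFFF)]
    signature = 0
    for op, a, b in test_program:
        signature = ((signature << 1) | (signature >> 31)) & MASK  # rotate left
        signature ^= alu(op, a, b, faulty)
    return signature

if __name__ == "__main__":
    golden = run_test_program(faulty=False)
    observed = run_test_program(faulty=True)
    print("pass" if observed == golden else f"fail: {observed:#010x} != {golden:#010x}")
```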
4.2. SoC Self Test Based on a Test-Processor
Tobias Koal, Rene Kothe, H. T. Vierhaus
Brandenburg University of Technology Cottbus, Germany

Testing complex SoCs with up to billions of transistors has been a challenge for IC test technology for more than a decade. Most of the research work has focused on problems of production testing, while the problem of self-test in the field of application has received much less attention. The chapter faces this issue, describing a hierarchical HW/SW-based self-test solution that introduces a test processor in charge of orchestrating the test activities and controlling the test of the different modules within the SoC.

4.3. Delay Faults Testing
Marcel Baláž, Roland Dobai, Elena Gramatová
Institute of Informatics of the Slovak Academy of Sciences, Bratislava, Slovakia

SoC devices are among the most advanced devices currently manufactured; consequently, their test must take into consideration some crucial issues that can often be neglected in other devices manufactured with more mature technologies. One of these issues relates to delay faults: we are forced not only to check whether the functionality of SoCs is still guaranteed, but also whether they are able to work correctly at the maximum frequency they have been designed for. New semiconductor technologies tend to introduce new kinds of faults that cannot be detected unless the test is performed at speed and specifically targets these kinds of faults. The chapter focuses on delay faults: it provides an overview of the most important fault models introduced so far, as well as a presentation of the key techniques for detecting them.

4.4. Low Power Testing
Zdeněk Kotásek, Jaroslav Škarvada
Brno University of Technology, Czech Republic

Another increasingly important issue in SoC testing is power consumption, which is becoming critical not only for low-power devices. In general, a test tends to excite the device under test as much as possible; unfortunately, this normally results in higher-than-usual switching activity, which is closely correlated with power consumption. Therefore, test procedures may consume more power than the device is designed for, creating severe problems in terms of reliability and durability. The chapter deals with power issues during test, clarifying where the problem comes from and which techniques can be used to circumvent it.

4.5. Thermal-Aware SoC Test Scheduling
Zhiyuan He, Zebo Peng, Petru Eles
Linköping University, Sweden

The high degree of integration of SoC devices, combined with the power consumption issues mentioned above, may raise issues in terms of the temperature of the different parts of the device. In general, problems stemming from the fact that some part of the circuit reaches a critical temperature during the test can be solved by letting this part cool down before the test is resumed, but this obviously works against the common goal of minimizing test time. The chapter discusses thermal issues during test and proposes solutions to minimize their impact by identifying optimal strategies for fulfilling thermal constraints while still minimizing test time.
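A minimal sketch of the underlying trade-off, assuming a crude first-order thermal model rather than the chapter's: test cycles heat the core, cooling pauses are inserted whenever the temperature limit would be exceeded, and every pause lengthens the overall test. All coefficients and durations are hypothetical.

```python
# Illustrative sketch: interleaving test cycles with cooling periods so that
# a simple first-order thermal model never exceeds a temperature limit.

def schedule_with_cooling(test_cycles: int,
                          t_ambient: float = 45.0,    # deg C
                          t_limit: float = 90.0,      # deg C
                          heat_per_cycle: float = 0.8,
                          cool_factor: float = 0.95,  # decay toward ambient per idle cycle
                          cool_target: float = 60.0):
    """Apply test cycles while the limit is respected; otherwise insert idle
    (cooling) cycles until the temperature drops back to cool_target.
    Returns (total_cycles, idle_cycles)."""
    temp = t_ambient
    total = idle = applied = 0
    while applied < test_cycles:
        if temp + heat_per_cycle <= t_limit:
            temp += heat_per_cycle          # apply one test cycle
            applied += 1
            total += 1
        else:
            # cooling phase: exponential decay toward ambient temperature
            while temp > cool_target:
                temp = t_ambient + (temp - t_ambient) * cool_factor
                total += 1
                idle += 1
    return total, idle

if __name__ == "__main__":
    total, idle = schedule_with_cooling(test_cycles=500)
    print(f"test cycles: 500, idle (cooling) cycles: {idle}, total: {total}")
```

Every idle cycle is pure test-time overhead, which is why the chapter looks for schedules that satisfy the thermal constraint while keeping such cooling periods, and hence the total test time, as small as possible.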
Part 5: Test planning, compression and compaction in SoCs

Introduction

5.1. Study on Combined Test-Data Compression and Test Planning for Testing of Modular SoCs
Anders Larsson, Urban Ingelsson, Erik Larsson
Linköping University, Sweden
Krishnendu Chakrabarty
Duke University, USA

Test-data volume and test execution time are both costly commodities. To reduce the cost of test, previous studies have applied test-data compression techniques at the system level to reduce the test-data volume, or employed test architecture design for module-based SoCs to enable test schedules with low test execution time: test-data compression for non-modular SoCs and test planning for modular SoCs have so far been proposed separately to address test-data volume and test application time. Research on combining the two approaches is lacking. The chapter studies how core-level test-data compression can be combined with test architecture design and test planning to reduce the test cost.

5.2. Reduction of the Transferred Test Data Amount
Ondřej Novák
Technical University Liberec, Czech Republic

The chapter addresses the bandwidth problem between the external tester and the device under test (DUT). While the previous chapter assumes deterministic tests, this chapter suggests combining deterministic patterns stored on the external tester with pseudorandom patterns generated on chip. The chapter describes ad hoc compression techniques for deterministic tests and details a mixed-mode approach that combines deterministic test vectors with pseudorandom test vectors using on-chip automata.

5.3. Sequential Test Set Compaction in LFSR Reseeding
A. Jutman, I. Aleksejev, J. Raik
Tallinn University of Technology

The chapter continues along the lines of the previous chapter and discusses embedded self-test. Instead of transporting test data to the DUT, the approach in this chapter makes use of a fully embedded test solution where the test data is generated by on-chip linear feedback shift registers (LFSRs). While LFSRs are usually considered to deliver lower-quality tests than deterministic ATPG tests, the chapter demonstrates that the test quality can be made high by careful planning of LFSR reseeding.
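To illustrate the reseeding idea, the sketch below implements a small Fibonacci-style LFSR and searches, by brute force over seeds, for one whose output stream matches the care bits of a deterministic test cube; the don't-care positions are filled by the pseudorandom sequence. In practice, seeds are typically computed by solving linear equations rather than by search, and the LFSR width, tap positions, and test cube here are hypothetical values chosen for the example.

```python
# Illustrative sketch: LFSR reseeding viewed as choosing a seed whose output
# stream matches the care bits of a deterministic test cube.

def lfsr_stream(seed, taps=(16, 14, 13, 11), width=16, n=16):
    """Fibonacci-style LFSR: yield n output bits for a nonzero seed."""
    state = seed
    for _ in range(n):
        yield state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> (width - t)) & 1
        state = (state >> 1) | (feedback << (width - 1))

def find_seed(test_cube, width=16):
    """Brute-force search for a seed whose output matches every care bit
    ('0'/'1') of the cube; '-' marks a don't-care filled pseudorandomly."""
    for seed in range(1, 1 << width):
        bits = list(lfsr_stream(seed, width=width, n=len(test_cube)))
        if all(c == "-" or int(c) == b for c, b in zip(test_cube, bits)):
            return seed, bits
    return None

if __name__ == "__main__":
    cube = "1--0---1--1--------0----"   # hypothetical 24-bit test cube, 5 care bits
    match = find_seed(cube)
    if match is None:
        print("no single seed covers this cube; another seed would be needed")
    else:
        seed, bits = match
        print(f"seed {seed:#06x} expands to {''.join(map(str, bits))}")
```

Storing only the seeds instead of full test vectors is what makes reseeding attractive for compaction; the chapter's contribution concerns how to plan the reseeding so that test quality does not suffer.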