Design and Test Technology for Dependable Embedded Systems
Foreword
Preface
Part 1: Design, Modeling and Verification
Introduction
Zebo Peng
Linköping University, Sweden
1.1. System-Level Design of NoC-Based Dependable Embedded Systems
Mihkel Tagel, Peeter Ellervee, Gert Jervan
Tallinn University of Technology, Estonia
The complexity and communication requirements of SoCs are increasing, making the goal of
designing a fault-free system very difficult. Network-on-chip (NoC) has been proposed as one of the
alternatives to solve some of the on-chip communication problems and to address dependability at
various levels of abstraction. Since the NoC architecture exhibits natural redundancy, it can also be
used to deal with faults. The chapter presents a system-level design framework for exploring the
large and complex design space of dependable NoC-based systems.
1.2. Synthesis of Flexible Fault-Tolerant Schedules for Embedded Systems with Soft and Hard Timing Constraints
Viacheslav Izosimov
Embedded Intelligent Solutions (EIS) By Semcon AB, Sweden
Paul Pop
Technical University of Denmark, Denmark
Petru Eles and Zebo Peng
Linköping University, Sweden
The chapter deals with the design and optimization of embedded applications with soft and hard
real-time processes. Hard processes must always complete on time, while a soft process can
complete after its deadline, and its completion time is associated with the quality of service (QoS).
Deadlines for the hard processes must be guaranteed even in the presence of transient and
intermittent faults, and the QoS should be maximized. The chapter presents a novel quasi-static
scheduling strategy, in which a set of schedules is synthesized off-line and, at run time, the scheduler
selects the appropriate schedule based on the occurrence of faults and the actual execution times
of processes.
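A minimal sketch of the run-time side of such a quasi-static strategy is given below. The process and schedule names are invented for illustration and are not taken from the chapter: a small table of pre-computed schedules is indexed by the number of faults observed so far, and the scheduler switches schedules when a fault is detected.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One pre-computed schedule: an ordered list of process names (hypothetical).
using Schedule = std::vector<std::string>;

int main() {
    // Off-line synthesized schedule table, indexed by the number of faults
    // detected so far (0, 1 or 2). Names and contents are illustrative only.
    std::vector<Schedule> table = {
        {"P1", "P2", "P3_soft", "P4"},          // fault-free: soft process kept for QoS
        {"P1", "P1_reexec", "P2", "P4"},        // 1 fault: re-execute P1, drop soft P3
        {"P1", "P1_reexec", "P1_reexec", "P4"}  // 2 faults: only hard deadlines kept
    };

    int faults = 0;
    for (size_t slot = 0; slot < table[faults].size(); ++slot) {
        const std::string& proc = table[faults][slot];
        std::printf("running %s under schedule S%d\n", proc.c_str(), faults);
        // A detected transient fault switches to the next pre-computed schedule.
        bool fault_detected = (slot == 0 && faults == 0);  // injected for the demo
        if (fault_detected && faults + 1 < (int)table.size()) {
            ++faults;
            std::printf("fault detected -> switching to schedule S%d\n", faults);
        }
    }
    return 0;
}
```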
1.3. Optimizing Fault Tolerance for Multi-Processor System-on-Chip
Dimitar Nikolov, Mikael Väyrynen, Urban Ingelsson, Erik Larsson
Linköping University, Sweden
Virendra Singh
Indian Institute of Science, India
The rapid development of semiconductor technologies makes it possible to manufacture ICs with
multiple processors, so-called Multi-Processor Systems-on-Chip (MPSoCs). The chapter deals with
fault-tolerance design of MPSoCs for general-purpose applications, where the main concern is to
reduce the average execution time (AET). It presents a mathematical framework for the analysis of
AET and an integer linear programming model to minimize AET, which also takes communication
overhead into account. It also describes an approach to estimate the error probability and to adjust
the fault-tolerance scheme dynamically during the operation of an MPSoC.
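To illustrate the kind of analysis involved, the sketch below uses a deliberately simplified model that is not the chapter's framework: a task with nominal execution time t is re-executed until it succeeds, each run failing independently with error probability p, so the expected number of runs is 1/(1-p) and the AET is t/(1-p).

```cpp
#include <cstdio>

// Simplified model (not the chapter's exact framework): a task with nominal
// execution time t is re-executed until it completes without error; each run
// fails independently with probability p. The expected number of runs is
// 1 / (1 - p), so the average execution time (AET) is t / (1 - p).
double aet_with_reexecution(double t, double p) {
    return t / (1.0 - p);
}

int main() {
    const double t = 10.0;  // nominal execution time (arbitrary units)
    for (double p : {0.001, 0.01, 0.1}) {
        std::printf("p = %.3f -> AET = %.4f\n", p, aet_with_reexecution(t, p));
    }
    return 0;
}
```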
1.4. Diagnostic Modeling of Digital Systems with Multi-Level Decision Diagrams
Raimund Ubar, Jaan Raik, Artur Jutman, Maksim Jenihhin
Tallinn University of Technology, Estonia
To cope with the complexity of today's digital systems in test generation, fault simulation and fault
diagnosis, hierarchical multi-level formal approaches should be used. The chapter presents a unified
diagnostic modeling technique based on Decision Diagrams (DDs), which can be used to capture a
digital system design at different levels of abstraction. Two new types of DDs, the logic-level
structurally synthesized binary DDs (SSBDDs) and the high-level DDs (HLDDs), are defined together with
several subclasses. Methods for the formal synthesis of both types of DDs are described, and it is
shown how the DDs can be used in a design environment for dependable systems.
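As a minimal illustration of the underlying idea (generic BDD-style evaluation, not the SSBDD/HLDD formalism itself), a decision diagram is a graph of nodes labelled with variables, and simulating the modelled function means traversing the graph according to the current input values:

```cpp
#include <cstdio>
#include <vector>

// Generic binary decision diagram node (illustrative, not the SSBDD/HLDD data
// structures of the chapter): 'var' selects which input decides the branch;
// var == -1 marks a terminal node holding a constant value.
struct DDNode {
    int var;    // index of the deciding input variable, or -1 for a terminal
    int value;  // terminal value (0/1) when var == -1
    int lo;     // successor node index when the variable is 0
    int hi;     // successor node index when the variable is 1
};

// Evaluate the diagram for a given input vector by walking from the root.
int evaluate(const std::vector<DDNode>& dd, const std::vector<int>& inputs) {
    int n = 0;  // root is node 0
    while (dd[n].var >= 0) {
        n = inputs[dd[n].var] ? dd[n].hi : dd[n].lo;
    }
    return dd[n].value;
}

int main() {
    // Diagram for f = x0 AND x1: node 0 tests x0, node 1 tests x1.
    std::vector<DDNode> dd = {
        {0, 0, 2, 1},   // node 0: x0 ? go to node 1 : terminal 0
        {1, 0, 2, 3},   // node 1: x1 ? terminal 1 : terminal 0
        {-1, 0, 0, 0},  // node 2: terminal 0
        {-1, 1, 0, 0}   // node 3: terminal 1
    };
    std::printf("f(1,1) = %d, f(1,0) = %d\n",
                evaluate(dd, {1, 1}), evaluate(dd, {1, 0}));
    return 0;
}
```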
1.5. Enhanced Formal Verification Flow for Circuits Integrating Debugging and Coverage Analysis
Daniel Große, Görschwin Fey, Rolf Drechsler
University of Bremen, Germany
The chapter deals with techniques for formal hardware verification. An enhanced formal verification
flow that integrates debugging and coverage analysis is presented. In this flow, a debugging tool
locates the source of a failure by analyzing the discrepancy between the property and the circuit
behavior. A technique to analyze the functional coverage of the proven Bounded Model Checking
properties is then used to determine whether the property set is complete and, if not, to return the
coverage gaps. The technique can be used to ensure the correctness of a design, which consequently
facilitates the development of dependable systems.
Part 2: Faults, Compensation and Repair
Introduction
Heinrich Theodor Vierhaus
Technical University of Brandenburg Cottbus, Germany
2.1. Advanced technologies for transient faults detection and compensation
Matteo Sonza Reorda, Luca Sterpone, Massimo Violante
Politecnico di Torino, Italy
Transient faults have become an increasing issue in the past few years, as the smaller geometries of
newer, highly miniaturized silicon manufacturing technologies have brought to the mass market
failure mechanisms traditionally confined to niche markets such as electronic equipment for avionics,
space or nuclear applications. The chapter presents and discusses the origin of transient faults, fault
propagation mechanisms, and the state-of-the-art design techniques that can be used to detect and
correct transient faults. The concepts of hardware, data and time redundancy are presented, and
their implementations to cope with transient faults affecting storage elements, combinational logic
and IP cores (e.g., processor cores) typically found in a System-on-Chip are discussed.
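A minimal software-form sketch of the hardware-redundancy idea (a generic triple modular redundancy voter, not a design from the chapter): three replicas of a computation are run and a bitwise majority vote masks a single transient error.

```cpp
#include <cstdint>
#include <cstdio>

// Generic bitwise majority voter, as used in triple modular redundancy (TMR):
// each output bit takes the value on which at least two of the three replicas agree.
uint32_t majority_vote(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

int main() {
    uint32_t golden = 0xCAFE0001u;
    // One replica is hit by a transient fault that flips bit 16.
    uint32_t r1 = golden, r2 = golden ^ (1u << 16), r3 = golden;
    uint32_t voted = majority_vote(r1, r2, r3);
    std::printf("voted = 0x%08X (error masked: %s)\n",
                voted, voted == golden ? "yes" : "no");
    return 0;
}
```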
2.2. Memory Testing and Self-Repair
Mária Fischerová, Elena Gramatová
Institute of Informatics of the Slovak Academy of Sciences, Slovakia
Memories are very dense structures, and therefore the probability of defects is higher than in logic
and analogue blocks, which are not so densely laid out. Embedded memories are the largest
components of a typical SoC, thus dominating the yield and reliability of the chip. The chapter gives a
summary view of static and dynamic fault models, effective test algorithms for memory fault (defect)
detection and localization, built-in self-test, and a classification of advanced built-in self-repair
techniques supported by different types of repair allocation algorithms.
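For illustration, here is a sketch of one classical test of this kind: the March C- algorithm, which detects stuck-at, transition and many coupling faults. It is written as plain C++ over a one-bit-per-cell memory model, i.e., a simplified version rather than a production BIST engine.

```cpp
#include <cstdio>
#include <vector>

// Simplified March C- over a one-bit-per-cell memory model:
//   up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); down(r0)
// Returns true if every read returned its expected value.
bool march_c_minus(std::vector<int>& mem) {
    const int n = (int)mem.size();
    bool ok = true;
    for (int i = 0; i < n; ++i) mem[i] = 0;                               // up(w0)
    for (int i = 0; i < n; ++i) { ok &= (mem[i] == 0); mem[i] = 1; }      // up(r0,w1)
    for (int i = 0; i < n; ++i) { ok &= (mem[i] == 1); mem[i] = 0; }      // up(r1,w0)
    for (int i = n - 1; i >= 0; --i) { ok &= (mem[i] == 0); mem[i] = 1; } // down(r0,w1)
    for (int i = n - 1; i >= 0; --i) { ok &= (mem[i] == 1); mem[i] = 0; } // down(r1,w0)
    for (int i = n - 1; i >= 0; --i) ok &= (mem[i] == 0);                 // down(r0)
    return ok;
}

int main() {
    std::vector<int> mem(16, 0);
    std::printf("fault-free memory passes: %s\n", march_c_minus(mem) ? "yes" : "no");
    return 0;
}
```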
2.3. Fault-tolerant and fail-safe design based on reconfiguration
H. Kubatova, P. Kubalik
Czech Technical University in Prague, Czech Republic
The chapter deals with the problem of how to design fault-tolerant or fail-safe systems in
programmable hardware (FPGAs) for use in mission-critical applications. RAM-based FPGAs are
usually considered unreliable due to the high probability of transient faults (SEUs) and therefore
inapplicable in this area. However, FPGAs can be easily reconfigured when an error is detected. It is
shown how to utilize an appropriate type of FPGA reconfiguration and combine it with fail-safe and
fault-tolerant design. The main property and advantage of the presented methodology is the
trade-off between the required level of dependability of the designed system and the area
overhead, with respect to possible FPGA faults.
2.4. Self-repair technology for global interconnects on SoCs
Daniel Scheit and Heinrich Theodor Vierhaus
Technical University of Brandenburg Cottbus, Germany
The reliability of interconnects on ICs has become a major problem in recent years, due to the rise in
complexity, low-k insulating materials with reduced stability, and wear-out effects caused by high
current density. The total reliability of a system on a chip depends more and more on the reliability
of its interconnects. The chapter presents an overview of the state of the art for fault-tolerant
interconnects. Most of the published techniques are aimed at the correction of transient faults;
built-in self-repair has not been discussed as much as the other techniques. This chapter fills this gap
by discussing how to use built-in self-repair in combination with other approved solutions to
achieve fault tolerance with respect to all kinds of faults.
2.5. Built-in Self Repair for Logic Structures
Tobias Koal, Heinrich T. Vierhaus
Technical University of Brandenburg, Germany
For several years, it has been predicted that nano-scale ICs will have a rising sensitivity to both
transient and permanent fault effects. Most of the effort has so far gone into the detection and
compensation of transient fault effects. More recently, the possibility of repairing permanent faults,
due either to production flaws or to wear-out effects, has also received great attention. While
built-in self-test (BIST) and even built-in self-repair (BISR) for regular structures such as static
memories (SRAMs) are well understood, the concepts for in-system repair of irregular logic and
interconnects are few and mainly based on FPGAs as the basic implementation. In this chapter,
different schemes of logic (self-)repair, using repair schemes that are not based on FPGAs, are
described and analyzed with respect to cost and limitations. It can be shown that such schemes are
feasible, but need a lot of attention in terms of hidden single points of failure.
2.6. Self-Repair by Program Reconfiguration in VLIW Processor Architectures
M. Schölzel, P. Pawlowski
Brandenburg University of Technology Cottbus, Germany
A. Dabrowski
Poznan University of Technology, Poland
Statically scheduled multiple-issue processors (e.g., very long instruction word processors) are
characterized by multiple parallel execution units and small control logic. This makes them easily
scalable and therefore attractive for use in embedded systems as application-specific processors.
The chapter deals with the fault tolerance of VLIW processor architectures. If one or more
components in the data path of a processor become permanently faulty, it becomes necessary to
reconfigure either the hardware or the executed program such that operations are scheduled
around the faulty units. The reconfiguration of the program is done either dynamically by the
hardware or permanently by self-modifying code. In both cases a delay may occur during the
execution of the application. This graceful performance degradation may become critical for
real-time applications. A framework to overcome this problem by using scalable algorithms is
provided.
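A minimal sketch of the rescheduling idea follows, with illustrative data structures that are not the chapter's framework: operations bound to a permanently faulty issue slot are moved to a healthy slot of the same instruction when possible, otherwise an extra instruction word is inserted, which lengthens the schedule but keeps the program functionally correct.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One VLIW instruction word: one (possibly empty) operation per issue slot.
using LongInstruction = std::vector<std::string>;

// Re-schedule a program for a data path whose issue slot 'faulty' is broken:
// an operation bound to that slot moves into a free healthy slot of the same
// instruction if one exists, otherwise a new instruction word is appended.
// Data dependencies are ignored here for brevity.
std::vector<LongInstruction> reschedule(const std::vector<LongInstruction>& prog,
                                        int slots, int faulty) {
    std::vector<LongInstruction> out;
    for (const auto& word : prog) {
        LongInstruction cur(slots, "");
        std::string displaced;
        for (int s = 0; s < slots; ++s) {
            if (word[s].empty()) continue;
            if (s == faulty) displaced = word[s];  // cannot stay on the broken slot
            else cur[s] = word[s];
        }
        bool placed = displaced.empty();
        for (int s = 0; s < slots && !placed; ++s)
            if (s != faulty && cur[s].empty()) { cur[s] = displaced; placed = true; }
        out.push_back(cur);
        if (!placed) {                              // graceful degradation: extra cycle
            LongInstruction extra(slots, "");
            extra[faulty == 0 ? 1 : 0] = displaced;
            out.push_back(extra);
        }
    }
    return out;
}

int main() {
    std::vector<LongInstruction> prog = {{"add", "mul"}, {"sub", ""}};
    auto fixed = reschedule(prog, 2, 1);            // issue slot 1 is permanently faulty
    for (const auto& w : fixed)
        std::printf("[ %s | %s ]\n", w[0].c_str(), w[1].c_str());
    return 0;
}
```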
Part 3: Fault Simulation and Fault Injection
Introduction
3.1. Fault simulation and fault injection technology based on SystemC
Silvio Misera
Kjellberg Finsterwalde, Germany
Roberto Urban
Brandenburg University of Technology Cottbus, Germany
Simulation of faults has two important areas of application: on the one hand, fault simulation is used
for the validation of test patterns; on the other hand, simulation-based fault injection is used for the
dependability assessment of systems. The chapter describes the simulation of faults in electronic
systems using SystemC. Two application areas are targeted: fault simulation for detecting fabrication
faults, and fault injection for analyzing electronic system designs for safety-critical applications with
respect to their dependability under fault conditions. The chapter discusses the possibilities of using
SystemC to simulate such designs, and state-of-the-art applications are presented for this purpose.
It is shown how simulation with fault models can be implemented by several injection techniques.
Approaches that help to speed up simulations are presented, and some practical simulation
environments are shown.
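As one example of such an injection technique, the sketch below shows a minimal SystemC "saboteur": a small module spliced into a signal path that forwards the value unchanged in normal operation and inverts it while an injection-control signal is asserted. Module and port names are illustrative, not taken from the chapter.

```cpp
// Minimal SystemC saboteur sketch (module and signal names are illustrative).
#include <systemc.h>
#include <iostream>

SC_MODULE(Saboteur) {
    sc_in<bool>  in;       // value from the original driver
    sc_in<bool>  inject;   // fault-injection enable from the test environment
    sc_out<bool> out;      // possibly corrupted value seen by the design

    void propagate() {
        out.write(inject.read() ? !in.read() : in.read());  // bit-flip on demand
    }

    SC_CTOR(Saboteur) {
        SC_METHOD(propagate);
        sensitive << in << inject;
    }
};

int sc_main(int, char*[]) {
    sc_signal<bool> drv, inj, obs;
    Saboteur sab("sab");
    sab.in(drv); sab.inject(inj); sab.out(obs);

    drv = true;  inj = false; sc_start(1, SC_NS);
    std::cout << "no injection:   " << obs.read() << std::endl;  // prints 1
    inj = true;               sc_start(1, SC_NS);
    std::cout << "with injection: " << obs.read() << std::endl;  // prints 0
    return 0;
}
```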
3.2. High-Level Decision Diagram Simulation for Diagnosis and Soft-Error Analysis
Jaan Raik, Urmas Repinski, Maksim Jenihhin, Anton Chepurov
Tallinn University of Technology, Estonia
The chapter deals with high-level fault simulation for design error diagnosis. High-level decision
diagrams (HLDDs) are used for high-level fault reasoning, which allows efficient algorithms for
locating design errors to be implemented. HLDDs can also be used to efficiently determine the
critical sets of soft errors to be injected for evaluating the dependability of systems. A holistic
diagnosis approach is presented, based on high-level critical path tracing, for design error location
and for generating the critical fault list used to assess a design's vulnerability to soft errors by means
of fault injection.
3.3. High-Speed Logic Level Fault Simulation
Raimund Ubar, Sergei Devadze
Tallinn University of Technology, Estonia
The chapter is devoted to logic-level fault simulation. A new approach based on exact critical path
tracing is presented. To speed up backtracing, the circuit is represented as a network of subcircuits
modeled with structurally synthesized BDDs to compress the gate-level structural details. The
method can be used for simulating permanent faults in combinational circuits, and transient or
intermittent faults in both combinational and sequential circuits, with the goal of selecting critical
faults for fault injection for dependability analysis purposes.
Part 4: Test Technology for Systems-on-Chip
Introduction
4.1. Software-Based Self-Test of Embedded Microprocessors
Paolo Bernardi, Michelangelo Grosso, Ernesto Sánchez and Matteo Sonza Reorda
Politecnico di Torino, Italy
In recent years, the usage of embedded microprocessors in complex SoCs has become common
practice. Their test is often a challenging task, due to their complexity and to the strict constraints
coming from the environment and the application. The chapter focuses on the test of
microprocessors or microcontrollers existing within a SoC. These modules often come from third
parties, and the SoC designer is often not allowed to know the internal details of the module, nor to
change or redesign it for test purposes. For this reason, an emerging solution for processor testing
within a SoC is based on developing suitable test programs. This test technique, known as
Software-Based Self-Test, is introduced, and the main approaches for test program generation and
application are discussed.
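To give a flavour of what such a test program looks like, the routine below is a deliberately tiny, hypothetical example rather than an approach from the chapter: it exercises the processor's ALU with a few operand pairs and compacts the results into a signature that would, in practice, be compared against a value pre-computed on a known-good processor.

```cpp
#include <cstdint>
#include <cstdio>

// Tiny software-based self-test sketch (hypothetical, not a real test suite):
// exercise ALU operations with chosen operand pairs and fold all results into
// a simple rotate-and-xor signature.
static uint32_t alu_signature() {
    const uint32_t ops[][2] = {
        {0xFFFFFFFFu, 0x00000001u}, {0xAAAAAAAAu, 0x55555555u},
        {0x80000000u, 0x80000000u}, {0x12345678u, 0x9ABCDEF0u}};
    uint32_t sig = 0;
    for (const auto& p : ops) {
        uint32_t a = p[0], b = p[1];
        uint32_t results[] = {a + b, a - b, a & b, a | b, a ^ b, a << 3, a >> 5};
        for (uint32_t r : results)
            sig = ((sig << 1) | (sig >> 31)) ^ r;   // rotate-and-xor compaction
    }
    return sig;
}

int main() {
    const uint32_t golden = alu_signature();  // in practice stored as a constant
    std::printf("ALU self-test %s (signature 0x%08X)\n",
                alu_signature() == golden ? "PASSED" : "FAILED", golden);
    return 0;
}
```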
4.2. SoC Self Test Based on a Test-Processor
Tobias Koal, Rene Kothe, H. T. Vierhaus
Brandenburg University of Technology Cottbus, Germany
Testing complex SoCs with up to billions of transistors has been a challenge to IC test technology for
more than a decade. Most of the research work has focused on problems of production testing, while
the problem of self-test in the field of application has received much less attention. The chapter
addresses this issue, describing a hierarchical HW/SW-based self-test solution built around a test
processor that orchestrates the test activities and controls the test of the different modules within
the SoC.
4.3. Delay Faults Testing
Marcel Baláž, Roland Dobai, Elena Gramatová
Institute of Informatics of the Slovak Academy of Sciences, Bratislava, Slovakia
SoC devices are among the most advanced devices currently manufactured; consequently, their test
must take into consideration some crucial issues that can often be neglected in other devices
manufactured with more mature technologies. One of these issues relates to delay faults: we must
check not only whether the functionality of SoCs is still guaranteed, but also whether they are able
to work correctly at the maximum frequency they have been designed for. New semiconductor
technologies tend to introduce new kinds of faults that cannot be detected unless the test is
performed at speed and specifically targets them. The chapter focuses on delay faults: it provides an
overview of the most important fault models introduced so far, as well as a presentation of the key
techniques for detecting them.
4.4. Low Power Testing
Zdeněk Kotásek, Jaroslav Škarvada
Brno University of Technology, Czech Republic
Another increasingly important issue in SoC testing is power consumption, which is becoming critical
not only for low-power devices. In general, a test tends to excite the device under test as much as
possible; unfortunately, this normally results in higher-than-usual switching activity, which is strongly
correlated with power consumption. Therefore, test procedures may consume more power than the
device is designed for, creating severe problems in terms of reliability and duration. The chapter
deals with power issues during test, clarifying where the problem comes from and which techniques
can be used to circumvent it.
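A common first-order proxy for this switching activity is the number of bit transitions between consecutive test vectors. The sketch below is a simplified illustration, not a technique taken from the chapter: it counts these transitions and greedily reorders the vectors to reduce them while leaving the vector set (and hence fault coverage) unchanged.

```cpp
#include <bitset>
#include <cstdio>
#include <vector>

using Vec = std::bitset<16>;  // one 16-bit test vector (illustrative width)

// Number of bits that toggle when vector b is applied after vector a:
// a first-order proxy for switching activity, hence test power.
int transitions(const Vec& a, const Vec& b) { return (int)(a ^ b).count(); }

int total_transitions(const std::vector<Vec>& seq) {
    int t = 0;
    for (size_t i = 1; i < seq.size(); ++i) t += transitions(seq[i - 1], seq[i]);
    return t;
}

// Greedy reordering: repeatedly append the remaining vector closest (in
// Hamming distance) to the last one applied.
std::vector<Vec> reorder_low_power(std::vector<Vec> pool) {
    std::vector<Vec> out;
    out.push_back(pool.back()); pool.pop_back();
    while (!pool.empty()) {
        size_t best = 0;
        for (size_t i = 1; i < pool.size(); ++i)
            if (transitions(out.back(), pool[i]) < transitions(out.back(), pool[best]))
                best = i;
        out.push_back(pool[best]);
        pool.erase(pool.begin() + best);
    }
    return out;
}

int main() {
    std::vector<Vec> tests = {Vec(0xFFFF), Vec(0x0000), Vec(0xFF00), Vec(0x00FF)};
    std::printf("transitions before: %d\n", total_transitions(tests));
    std::printf("transitions after:  %d\n", total_transitions(reorder_low_power(tests)));
    return 0;
}
```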
4.5. Thermal-Aware SoC Test Scheduling
Zhiyuan He, Zebo Peng, Petru Eles
Linköping University, Sweden
The high degree of integration of SoC devices, combined with the already mentioned power
consumption, may raise issues in terms of the temperature of the different parts of the device. In
general, problems stemming from the fact that some part of the circuit reaches a critical
temperature during the test can be solved by letting this part cool down before the test is resumed,
but this obviously goes against the common goal of minimizing test time. The chapter discusses
thermal issues during test and proposes solutions to minimize their impact by identifying optimal
strategies that fulfill thermal constraints while still minimizing test time.
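The flavour of the trade-off can be seen in the toy model below (invented thermal constants, not the chapter's model): test cycles heat a core, idle cycles cool it towards ambient, and cooling periods are inserted only when the temperature limit would otherwise be exceeded, so total test time grows as little as possible.

```cpp
#include <cstdio>

// Toy thermal model (invented constants, not the chapter's model): each test
// cycle heats the core, each idle cycle cools it towards ambient. Test cycles
// are applied whenever the temperature limit allows; otherwise a cooling
// (idle) cycle is inserted. Elapsed time = test cycles + cooling cycles.
int main() {
    const double ambient = 45.0, limit = 90.0;
    const double heat_per_test_cycle = 0.8;   // degrees C gained per test cycle
    const double cool_factor = 0.95;          // exponential cooling towards ambient

    int test_cycles_left = 500, elapsed = 0, cooling_cycles = 0;
    double temp = ambient;

    while (test_cycles_left > 0) {
        if (temp + heat_per_test_cycle <= limit) {
            temp += heat_per_test_cycle;      // apply one test cycle
            --test_cycles_left;
        } else {
            temp = ambient + (temp - ambient) * cool_factor;  // idle to cool down
            ++cooling_cycles;
        }
        ++elapsed;
    }
    std::printf("500 test cycles applied in %d cycles (%d cooling cycles inserted)\n",
                elapsed, cooling_cycles);
    return 0;
}
```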
Part 5: Test planning, compression and compaction in SoCs
Introduction
5.1. Study on Combined Test-Data Compression and Test Planning for Testing of Modular SoCs
Anders Larsson, Urban Ingelsson, Erik Larsson
Linköping University, Sweden
Krishnendu Chakrabarty
Duke University, USA
Test-data volume and test execution time are both costly commodities. To reduce the cost of test,
previous studies have applied test-data compression at the system level to reduce the test-data
volume, or employed test architecture design for module-based SoCs to enable test schedules with
low test execution time. Test-data compression for non-modular SoCs and test planning for modular
SoCs have thus been proposed separately to address test application time and test-data volume, and
research on combining the two approaches is lacking. The chapter studies how core-level test-data
compression can be combined with test architecture design and test planning to reduce test cost.
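As one simple instance of test-data compression (generic run-length coding over don't-care-filled vectors, not the core-level scheme studied in the chapter), the sketch below fills don't-care bits so as to create long runs and then stores the stream as (value, run-length) pairs.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Simple test-data compression sketch (generic run-length coding, not the
// scheme analyzed in the chapter). Don't-care bits ('X') in the test set are
// filled with the value of the previous specified bit so that runs get longer,
// then the stream is stored as (value, run-length) pairs.
std::vector<std::pair<char, int>> rle_compress(const std::string& stimuli) {
    std::vector<std::pair<char, int>> out;
    char current = '0';
    for (char c : stimuli) {
        char bit = (c == 'X') ? current : c;     // don't-care filling
        if (!out.empty() && out.back().first == bit) ++out.back().second;
        else out.push_back({bit, 1});
        current = bit;
    }
    return out;
}

int main() {
    std::string test_data = "1XXX1XX00XXX0001XXXXXX";
    auto code = rle_compress(test_data);
    std::printf("%zu bits -> %zu (value, length) pairs:", test_data.size(), code.size());
    for (auto& p : code) std::printf(" (%c,%d)", p.first, p.second);
    std::printf("\n");
    return 0;
}
```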
5.2. Reduction of the Transferred Test Data Amount
Ondřej Novák
Technical University Liberec, Czech Republic
The chapter addresses the bandwidth problem between the external tester and the device under
test (DUT). While the previous chapter assumes deterministic tests, this chapter suggests combining
deterministic patterns stored on the external tester with pseudo-random patterns generated on
chip. The chapter describes ad-hoc compression techniques for deterministic tests and details a
mixed-mode approach that combines deterministic test vectors with pseudo-random test vectors
generated by on-chip automata.
5.3. Sequential Test Set Compaction in LFSR Reseeding
A. Jutman, I. Aleksejev, J. Raik
Tallinn University of Technology, Estonia
The chapter continues along the lines of the previous chapter and discusses embedded self-test.
Instead of transporting test data to the DUT, the approach in this chapter is to use a fully embedded
test solution in which the test data is generated by on-chip linear feedback shift registers (LFSRs).
While LFSRs are usually considered to deliver lower-quality tests than deterministic ATPG tests, the
chapter demonstrates that the test quality can be made high by careful planning of LFSR reseeding.
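A minimal sketch of the mechanism (a 16-bit Fibonacci LFSR with a standard maximal-length feedback polynomial, not the compaction algorithm of the chapter): the on-chip LFSR expands each stored seed into a block of pseudo-random patterns, and reseeding between blocks steers the generator towards patterns that the purely random sequences would otherwise miss.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11
// (x^16 + x^14 + x^13 + x^11 + 1, maximal length); purely illustrative.
struct Lfsr16 {
    uint16_t state;
    explicit Lfsr16(uint16_t seed) : state(seed) {}
    uint16_t next() {
        uint16_t bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1u;
        state = (uint16_t)((state >> 1) | (bit << 15));
        return state;
    }
};

int main() {
    // Seeds stored on the tester / in on-chip ROM: each seed is expanded into a
    // block of pseudo-random test patterns; reseeding starts the next block.
    std::vector<uint16_t> seeds = {0xACE1, 0x1234, 0xBEEF};
    const int patterns_per_seed = 4;

    for (uint16_t seed : seeds) {
        Lfsr16 lfsr(seed);
        std::printf("seed 0x%04X:", seed);
        for (int i = 0; i < patterns_per_seed; ++i)
            std::printf(" 0x%04X", lfsr.next());
        std::printf("\n");
    }
    return 0;
}
```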