ICCAD’03 Review
CSE 597B
Lin Li

Outline
• Overview
  • Archive download URL
  • Best paper award
  • Papers from our group
  • Interesting tutorials
• Papers in related areas
  • Power and energy optimization
  • Interconnect-centric SoC design
  • Reliability issues
  • Performance optimization
  • Simulation at the nanometer scale
• Other areas in ICCAD

Archive Download URL
• Papers and presentation slides can be downloaded from: http://www.iccad.com/archive.html

Best Paper Award
• 6C.1 - Noise Analysis for Optical Fiber Communication Systems
  Alper Demir (KOC University, Sariyer-Istanbul, Turkey)
• 8B.1 - Block-Based Static Timing Analysis with Uncertainty
  Anirudh Devgan, Chandramouli Kashyap (IBM Research at Austin, IBM Microelectronics)

Papers from Our Group
• 1A.1 - Adaptive Error Protection for Energy Efficiency
  Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary Jane Irwin
• 3C.1 - Array Composition and Decomposition for Optimizing Embedded Applications
  Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti Nadgir

Interesting Tutorials
• 2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology
  Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi, Ruchir Puri (IBM T.J. Watson)
• 11B.1 - Formal Methods for Dynamic Power Management
  Rajesh K. Gupta, Sandeep Shukla, Sandy Irani (UCSD, UCI, and VT)

2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology
• Introduction
• Planar device structures
  • Partially-depleted (PD) SOI
  • Fully-depleted (FD) SOI
  • Strained-Si & high-k gate
• Emerging technologies
  • CMOS device scaling
  • New devices for high-performance logic
  • Double-gate MOSFETs
  • 3D integration and interconnects
  • Carbon Nanotube Transistor (CNT)
  • Molecular computing
• CAD challenges
  • Challenges of advanced device technologies
  • Major issues
  • Power crisis
  • Coping with variability

11B.1 - Formal Methods for Dynamic Power Management
• Overview of the formal methods that have been explored in solving the system-level Dynamic Power Management (DPM) problem.
• Show how formal reasoning frameworks can unify apparently disparate DPM techniques.
• Approaches that treat the DPM problem as one of stochastic optimization with probabilistic guarantees on performance.

Power and Energy Optimization
• Using dynamic voltage scaling in embedded systems (Section 1B)
• Using software techniques in embedded systems (Section 3C)
• Energy issues in systems design (Section 7B)
• Power-aware design (Section 8C)

1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems
Vishnu Swaminathan, Krishnendu Chakrabarty (ECE@Duke)
• Energy consumption must be carefully balanced with real-time responsiveness in hard real-time systems.
• Presents an optimal offline dynamic voltage scaling (DVS) scheme for dynamic power management in such systems.
[Figure: Generalized network flow model for the DVS problem. Node s connects to job nodes j1 ... jn, then to speed nodes (s1h, s1i, s1l, ..., snh, sni, snl), then to interval nodes D1 ... D2n-1, and finally to node t. Each arc (i, j) is labeled l_ij, u_ij, C_ij, m_ij.]

1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages
Shaoxiong Hua, Gang Qu (ECE@UMCP)
• Setting: a multiple-voltage DVS system must serve a set of applications {(e_i, d_i, p_i): i = 1, 2, …, n} without missing their deadlines.
• Two problems, both minimizing energy consumption:
  • If the system has m voltages {v_1, v_2, …, v_m}, determine the value of each v_i.
  • Determine both m and the value of each v_i.
• Voltage set-up is the fundamental problem for a multiple-voltage DVS system.
  • Application-specific 2-voltage DVS system: analytic solutions and a linear search algorithm
  • m-voltage DVS system: an analytic solution does not exist, so an approximation method is used
• A multiple-voltage system can come very close to the maximal energy saving achievable by DVS.

1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems
Le Yan, Jiong Luo, Niraj K.
Jha (EE@Princeton)
• New scheduling algorithm that combines DVS and adaptive body biasing (ABB) to simultaneously optimize both dynamic power consumption and leakage power consumption for real-time distributed embedded systems.
• A novel two-phase approach
  • Phase I: optimal trade-off between supply and threshold voltages
  • Phase II: trade off energy consumption and clock period
[Figure: flowchart of the algorithm. Recoverable boxes: Initializations; Phase I; "Extensible tasks exist?" (No: Return; Yes: Allocate slack to reference task); Phase II; Allocate slack to each other task; "EST+WCET>LFT?" (Yes: Invalidate this slack allocation). Annotations: the reference task has the highest energy_derivative; the other tasks have an energy_derivative higher than the reference level.]

3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning
Jinfeng Liu, Pai H. Chou (ECE@UCI)
• Goal: energy minimization for distributed embedded processors
• Combined optimization
  • Selection of optimal compression algorithm
  • Functional partitioning
[Figure: schedules for nodes N1 and N2 built from RECV, PROC, SEND, and IDLE phases with D markers, shown at 150 MHz (labeled non-optimal without compression) and at 80 MHz.]
• A bad partitioning scheme that produces extra I/O load is non-optimal without compression.
• However, it could turn out optimal with compression, if the data from N1 to N2 can be compressed well.
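The trade-off on this slide can be sketched as a simple energy balance: compression pays off only when the link energy it saves exceeds the energy spent compressing and decompressing. The per-byte energy figures, the linear cost model, and the function names below are illustrative assumptions, not values or methods from the paper, and the sketch ignores the clock-frequency change shown in the figure.

```python
def link_energy(num_bytes, e_tx, e_rx):
    """Energy to transmit and receive num_bytes (per-byte costs e_tx, e_rx)."""
    return num_bytes * (e_tx + e_rx)


def energy_with_compression(num_bytes, ratio, e_tx, e_rx, e_comp, e_deco):
    """Energy when the sender compresses and the receiver decompresses.

    ratio is compressed size / original size, so smaller means better
    compression. e_comp and e_deco are CPU costs per original byte.
    """
    cpu = num_bytes * (e_comp + e_deco)
    return cpu + link_energy(num_bytes * ratio, e_tx, e_rx)


def break_even_ratio(e_tx, e_rx, e_comp, e_deco):
    """Compression ratio below which compressing saves energy."""
    return 1.0 - (e_comp + e_deco) / (e_tx + e_rx)


# Hypothetical per-byte energies in arbitrary units (not from the paper).
E_TX, E_RX, E_COMP, E_DECO = 2.0, 1.0, 0.5, 0.25
N = 1000  # bytes sent from N1 to N2

raw = link_energy(N, E_TX, E_RX)
well = energy_with_compression(N, 0.5, E_TX, E_RX, E_COMP, E_DECO)
poorly = energy_with_compression(N, 0.875, E_TX, E_RX, E_COMP, E_DECO)
limit = break_even_ratio(E_TX, E_RX, E_COMP, E_DECO)

print("no compression:", raw)
print("data compresses well (ratio 0.5):", well)
print("data compresses poorly (ratio 0.875):", poorly)
print("break-even ratio:", limit)
```

Under these assumed costs, the same partitioning is worse than the uncompressed baseline when the data compresses poorly and better when it compresses well, which is the point the slide makes about N1-to-N2 traffic.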
[Figure, continued: the 80 MHz schedule adds COMP and DECO phases on N1 and N2 and is labeled optimal with compression.]

3C.4 - Energy-Aware Fault Tolerance in Fixed-Priority Real-Time Embedded Systems
Ying Zhang, Krishnendu Chakrabarty, Vishnu Swaminathan (ECE@Duke)
• Goal: low-power, fault-tolerant real-time systems
• Fault tolerance is achieved via checkpointing.
• Power management is carried out using dynamic voltage scaling (DVS).

7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers
Ali Iranli, Hanif E. Fatemi, Massoud Pedram (EE@USC)
• A hierarchical formulation for energy optimization of wireless transceivers is proposed.
• A game-theoretic approach to this energy minimization is proposed; it reduces energy consumption by 15% for BER = 10^-5.
• The proposed hierarchical framework can be used in general for energy optimization of server-client systems.
• Mapping between the transceiver problem and the game:

  Transceiver Energy Optimization      Stackelberg Game
  Transmitter                          Leader
  Receiver                             Follower
  Transmit power & modulation level    Leader’s policy
  Overall energy consumption           Leader’s cost function
  Truncation length                    Follower’s policy
  Receiver’s energy consumption        Follower’s cost function

7B.2 - Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization
Girish V. Varatkar, Radu Marculescu (ECE@CMU)
• Recent work in the ES community: performance and energy are crucial!
• Voltage selection
  • The task scheduling algorithm should use the foresight that voltage selection is going to follow the scheduling step.
  • The schedule should provide the maximum slowing-down potential.
• This work brings the communication aspect into the picture.
  • A ‘communication-centric’ approach
  • A ‘voltage selection’ approach

7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
Praveen G.
Kalla, Xiaobo Sharon Hu, Joerg Henkel (CSE@Notre Dame)
• From LRU to LRU-SEQ (Sequential LRU)
• Constraining sequential fetches to the same bank (same way) avoids bank transitions.
• It also increases the sleep time of the banks, which helps overcome break-even time requirements.
• The LRU nature has to be maintained, otherwise associativity is lost (the hit ratio is affected).
• The distance between the last fetched line and the present line is a parameter that affects the performance of this policy.
• Replacement policy (pseudocode):

    FOR (every cache access) DO
      IF (access == HIT) THEN
        P_way = C_way
      ELSE
        dist = abs(Curr_Addr - Prev_Addr);
        IF (dist <= SEQ_DST) THEN
          C_way = P_way
        ELSE
          C_way = LRU_Way
        END
      END
      Update LRU state for access.
    END

  • P_( ) : Previous_( ); C_( ) : Current_( )
  • State Holder 1: P_way (entire cache)
  • State Holder 2: P_line (each cache way)

7B.4 - Compiler-Based Register Name Adjustment for Low-Power Embedded Processors
Peter Petrov, Alex Orailoglu (CSE@UCSD)
• Proposes compiler-driven register name adjustment for low power.
• Register names are reassigned without incurring any performance or power overhead.
• No hardware support is required.
• An efficient algorithm for register name adjustment is proposed, with an additional frequency-skew-enhancing phase.

8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches
Nam S. Kim, David Blaauw, Trevor N.
Mudge (EECS@UMICH)
• The cost-effective number of VTH values for cache leakage reduction depends on the target access time, but 1 or 2 high VTHs are enough for leakage reduction.
• Cache leakage is another design constraint in processor design: a trade-off among delay / area / leakage.
• Incorporates realistic cache miss statistics into the leakage optimization.
• Using a high-k dielectric reduces gate-oxide leakage.
• ITRS 2002 projections assume a doubling of the number of transistors every two years.
• Circuit model
  • Based on CACTI
  • 70nm Berkeley predictive technology model
  • Interconnect R/C annotated
  • Repeaters used to minimize interconnect delay
[Figure: cache sub-bank organization with threshold-voltage labels VTH1 to VTH4 on its components: Abus buffer w/ repeater, word-line decoder, bit-line pair, memory cell, sense-amp w/ I/O circuits, Dbus buffer w/ repeater.]

8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips
Krishna Sekar, Kanishka Lahiri, Sujit Dey (ECE@UCSD)
• Describes design techniques for dynamically customizing a general-purpose configurable platform.
• Dynamic platform management helps combine the benefits of general-purpose & application-specific approaches.
• Benefits
  • Improved application performance
  • More efficient platform resource usage
  • Improved energy efficiency
Figure axis label: improving flexibility, time-to-market, engineering
cost, time-in-market
[Figure: Platform Customization Techniques. A spectrum of General-Purpose Processors, General-Purpose Configurable Platforms, Customized Platforms, Domain-Specific Platforms, and ASIC / Custom SoC. Second axis label: improving performance, power, size.]
[Figure: Applications 1 to 3 each supply performance objectives, data properties, and processing requirements; together with power constraints these feed Dynamic Platform Management, which produces an optimized platform configuration for the general-purpose configurable platform.]
• Components of the general-purpose configurable platform
  • Programmable voltage regulator
  • Programmable PLL
  • Embedded processor
  • PLD
  • On-chip communication architecture
  • Flexible on-chip SRAM
  • Reconfigurable cache
  • Parameterized co-processor

Interconnect-Centric SoC Design

1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips
Ruibing Lu, Cheng-Kok Koh (ECE@Purdue)
• SAMBA: Single Arbitration, Multiple Bus Accesses
• Automatically delivers multiple bus transactions
• High bandwidth
• Bus transactions can be performed even without an explicit bus access grant from the arbiter.
• Communication latency increases only slightly even with high arbitration latency.
• Interface unit
• Two sub-buses
  • Forward sub-bus
  • Backward sub-bus
[Figure: modules M1 to M4 attached to the forward and backward sub-buses.]

1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology
Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng et al. (CSE@UCSD)
• The Y-architecture for on-chip interconnect is based on pervasive use of 0-, 120-, and 240-degree oriented semi-global and global wiring.
• Communication capability (throughput of meshes) is better than that of the Manhattan architecture and the X-architecture.
• Better total wire length compared to both H and X clock tree structures, and better path length compared to the H tree.
• Achieves 8.5% less IR drop than an equally-resourced power network in the Manhattan architecture.
[Figure: 7 x 7 meshes with different interconnect architectures: (a) Y-architecture, (b) Manhattan architecture, (c) X-architecture.]

Reliability Issues

3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation
Sanjay Pant, David Blaauw, Savithri Sundareswaran (UMICH, Motorola)
• Power supply integrity issues
  • Functional failure: voltage fluctuations inject noise in the circuit.
  • Performance failure: gate delay is becoming increasingly sensitive to supply voltage; a ±10% variation in supply can result in a 30% delay increase.
• Proposed approach
  • Vectorless
  • Conservative in estimating worst-case drop/delay increase
  • Takes into account both IR and L dI/dt drops
• Voltage drop estimation with input vectors
  • Worst drop is highly dependent on input vectors.
  • Slow simulation times allow only a few vectors to be tried.
• Worst-case voltage budget analysis
  • Highly conservative
  • Worst-case drop is localized.
  • Ignores voltage shifts between distant driver-receiver pairs
[Figure: analysis flow with blocks Input Vector Search, Power Grid Simulator, Worst Voltage Drop, Library Characterization, STA, and Worst-Case Timing.]
[Figure: chip on a power grid between VDD and GND.]
• Steps of the proposed flow
  • Divide chip into blocks
  • Compute unit pulse response
  • Characterize gate delay
  • Express delay/voltage using spatial/temporal superposition
  • Formulate delay/voltage maximization
    as linear optimization
[Figure: ground grid with voltages V(t) and current variables i(t).]

5B.2 - Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems
Diana Marculescu (ECE@CMU)
• Novel techniques for harnessing redundancy as a way of increasing fault tolerance
  • Assume a large number of networked devices.
  • Idle devices can act as surrogates for failing ones via application migration or remapping.
• Scheduling techniques for optimizing system lifetime
  • Determine the optimal migration schedule under realistic battery models.

8C.2 - Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems
Phillip Stanley-Marbell, Diana Marculescu (ECE@CMU)
• Introduces the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability.

Performance Optimization

5B.1 - Cache Optimization For Embedded Processor Cores: An Analytical Approach
Arijit Ghosh, Tony Givargis (CS@UCI)
• An efficient algorithm to directly compute cache parameters satisfying desired performance criteria.

5B.3 - Performance Efficiency of Context-Flow System-On-Chip Platform
Rami Beidas, Jianwen Zhu (ECE@Toronto)
• A new programming model, called context-flow, that is simple, safe, and highly parallelizable, yet transparent to the underlying architectural details.
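To make the 5B.1 problem statement concrete, the sketch below finds the smallest cache that meets a miss-rate target by brute-force simulation. This is a swapped-in exhaustive baseline, not the paper's analytical algorithm, which the slide describes as computing the parameters directly. The address trace, the candidate configurations, and all function names are assumptions for illustration.

```python
from collections import OrderedDict


def miss_rate(trace, num_sets, ways, line_bytes):
    """Simulate a set-associative LRU cache; return misses / accesses."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict least recently used
            lines[block] = True
    return misses / len(trace)


def smallest_cache(trace, target, configs):
    """Lowest-capacity (num_sets, ways, line_bytes) meeting the target, or None."""
    feasible = [c for c in configs if miss_rate(trace, *c) <= target]
    return min(feasible, key=lambda c: c[0] * c[1] * c[2], default=None)


# Hypothetical trace: a loop reading 64 four-byte words, repeated 10 times.
trace = [4 * i for _ in range(10) for i in range(64)]

# Hypothetical candidates as (num_sets, ways, line_bytes).
CONFIGS = [(4, 1, 16), (8, 1, 16), (4, 2, 16),
           (16, 1, 16), (8, 2, 16), (4, 4, 16)]

best = smallest_cache(trace, 0.05, CONFIGS)
print("smallest configuration meeting a 5% miss rate:", best)
```

Every candidate costs one full pass over the trace, so the search grows with both the trace length and the number of configurations; that cost is the motivation for computing the parameters analytically instead.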
Simulation at the Nanometer Scale

7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation
Iris Bahar, Joseph Mundy, Jie Chen (Brown)
• Proposes a new architectural framework designed to handle the faulty processes prevalent with nanoscale devices
  • Based on Markov random fields
• Dynamically defect tolerant
  • Adapts to errors as a natural consequence of probability maximization
  • Removes the need to actually detect faults
  • Can handle both structure- and signal-based faults
• Carbon Nanotubes (CNTs)
  • Excellent conductors
  • Diodes, FETs, and memory arrays using CNTs have been demonstrated.
  • Physical placement of CNTs is an issue.
  • Alumina substrates have been proposed to fabricate arrays of CNTs.
  [Figure: carbon nanotubes with an off junction and an on junction.]
• Molecular devices
  • Direct use of molecules and their electronic states
  • Conduction achieved by changes in physical configuration or electronic state
  • Diodes and memory have been demonstrated.
  [Figure: molecular switch; an additional electron switches it on.]
• Quantum Cellular Automata (QCA)
  • Based on local interaction of quantum dots arranged in cells
  • Logic function is encoded into spatial patterns of the cells.
  • Information propagates through chains of QCA devices.

7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation
Arijit Raychowdhury, Saibal Mukhopadhyay, Kaushik Roy (ECE@Purdue)
• Circuit/SPICE-level model for ballistic CNFETs
• Removes the need for self-consistent solutions of Poisson’s and Schrödinger’s equations
• The proposed model closely replicates the self-consistent numerical simulations.
• The model has been used to simulate simple adders/multipliers.
• Carbon nanotubes are graphite sheets rolled in the form of tubes.
• They act as channel material for FETs. (Source: IBM)
[Figure: two CNFET structures with band diagrams. One is a Schottky-barrier device with source S and drain D on an intrinsic CNT between a top gate and a bottom gate with ZrO2 dielectric, labeled b = Eg/2. The other has n+ regions on either side of an intrinsic CNT between a top gate and a bottom gate with ZrO2 dielectric.]
• Performance of CNFETs can be evaluated only through circuit simulations.
• SPICE-compatible compact modeling is essential for circuit simulations.

7A.3 - Circuit Simulation of Nanotechnology Devices with Non-Monotonic I-V Characteristics
Jiayong Le, Larry Pileggi, Anirudh Devgan (ECE@CMU)
• Describes a circuit-level simulator that can accommodate an important class of nanotechnology devices characterized by non-monotonic I-V characteristics.

Other Areas in ICCAD
• Placement, Routing, and Floorplanning
• Analog Design and Methodology
• Verification
  • Formal Verification
  • Dynamic Verification
• Timing Analysis
  • Delay and Signal Modeling
  • Statistical Static Timing
  • Retiming for Global Interconnects
• CAD Algorithms for Emerging Technologies
  • Reversible Logic Synthesis
  • DNA Probe Array Layout
  • MEMS
• Design for Customized Processors
• Synthesis
• Testing