Literature Review Summary for EE800
Name: Nathan Windels
ID: 10515113
ECE - U of S

Literature Review: 1
Title: Optimality Study of Logic Synthesis for LUT-Based FPGAs
Authors: Jason Cong and Kirill Minkovich

The purpose of the research presented in this paper was to develop an algorithm for generating synthetic benchmarks (LEKO and LEKU) with known optimal technology mapping solutions. The benefit is that arbitrarily large test circuits can be constructed for synthesis software. Basic building blocks called ‘Core Graphs’ are put together on several layers so that there exists a path from every basic block on the bottom layer to the top layer. To construct a core graph from a preexisting benchmark, one extracts a piece of logic from it that has an equal number of inputs and outputs. The authors later showed that these benchmarks are structurally similar to the MCNC benchmarks by using Rent’s Rule and MCFC numbers.

Literature Review: 2
Title: Application Specific Low Power ALU Design
Authors: Yu Zhou and Hui Guo

Power consumption is a critical design issue in embedded processor designs.

Two types of ALU structures:
- Tree structure (faster, larger area)
- Chain structure (slower, smaller area)

The idea of this paper is to customize the ALU by repositioning the elements in the chain structure. Swapping components in a chain structure may favour some applications and can save a considerable amount of ALU power. This approach to power reduction is almost cost free and is extremely simple to implement. The ALU design is customized by identifying frequent functional components and placing them close to the output. In application-specific designs, the frequency of a functional component is obtained from instruction frequencies. The instruction set is therefore partitioned into ALU and non-ALU instructions, and the ALU instructions are then grouped according to the functional component they actually use.
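
As a rough illustration of the reordering idea, the Python sketch below counts how often each functional component is used in an instruction trace and orders the chain so that the most frequently used component sits closest to the output. The instruction-to-component mapping, the component names, and the trace are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical mapping from ALU instructions to the functional component
# they use. Non-ALU instructions (loads, branches, ...) are not in the map.
COMPONENT_OF = {
    "add": "adder", "sub": "adder",
    "and": "logic", "or": "logic", "xor": "logic",
    "sll": "shifter", "srl": "shifter",
}

def component_frequencies(trace):
    """Count uses of each functional component, ignoring non-ALU instructions."""
    return Counter(COMPONENT_OF[op] for op in trace if op in COMPONENT_OF)

def chain_order(trace):
    """Return components ordered from the output end of the chain inward."""
    freq = component_frequencies(trace)
    # Most frequent first; ties are broken by name so the result is deterministic.
    return sorted(freq, key=lambda c: (-freq[c], c))

# Made-up trace: "lw" and "beq" are non-ALU instructions and are skipped.
trace = ["add", "lw", "and", "add", "sll", "sub", "beq", "xor", "add"]
print(chain_order(trace))
```

The sketch ranks components by usage count only; it does not model the actual power cost of each chain position.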
Different weights of power consumption can be assigned to different functional components.

Literature Review: 3
Title: FPGA Implementation of a 64-bit BID-Based Decimal Floating Point Adder/Subtractor
Authors: Amin Farmahini-Farahani, Charles Tsen, and Katherine Compton

FPGAs are a potential way to add hardware-based DFP engines to existing compute clusters, so that DFP calculations can be accelerated without replacing the computing infrastructure. The basic idea in this paper was to take an adder implemented in HDL for standard cells and improve it for the Xilinx Virtex 5.

Hardware Optimizations for Virtex 5:
- The rounder block is the largest component (used for alignment and rounding). The original design used 12 DSP48E blocks for the multiplier; multi-cycle multipliers use only 4 DSP48E blocks and are thus a good choice for many parallel DFP units.
- Several 64-bit 2:1 muxes are an inefficient use of LUT resources, but they were left as single LUTs because combining them causes wire routing problems.
- Constant tables: 2 were merged into a single BRAM (the other tables were left as LUTs because they would use BRAM inefficiently).
- Active-low signals with asynchronous reset were changed to active-high signals with synchronous reset (this fits the Virtex 5 setup better).

Literature Review: 4
Title: A Novel Cache Architecture with Enhanced Performance and Security
Authors: Zhenghong Wang and Ruby B. Lee

Problems with current (non-random) cache designs:
- Hardware caches in processors introduce interference between programs and users.
- One process can evict cache lines of other processes, causing them to miss cache accesses.
- Critical information can be leaked out due to common cache behaviour.
- Cache-based attacks allow the recovery of the full secret cryptographic key and require much less time and computation power to do so.

The new cache design involves the following:
- A new address decoder.
- A new replacement algorithm (SecRAND).
- It adopts the direct-mapped architecture and extends it with dynamic memory-to-cache remapping (implemented in a Re-Mapping Table) and a larger cache index (k bits + 1 protected flag bit).

The proposed address decoder not only takes up less area (metal lines + combinational logic) but also allows for the dynamic remapping (and, as a side effect, allows for fault tolerance within the cache).
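
To make the remapping idea concrete, the following Python sketch models a direct-mapped cache whose logical index is looked up in a remapping table instead of being wired directly to a physical line. It is a simplified toy under stated assumptions: the class name, field names, and the random victim choice are hypothetical, it is not the SecRAND algorithm, and it ignores the protected flag bit.

```python
import random

class RemappedCache:
    """Toy direct-mapped cache whose index goes through a remapping table."""

    def __init__(self, num_lines, index_bits, seed=None):
        self.num_lines = num_lines               # number of physical cache lines
        self.index_bits = index_bits             # width of the logical index
        self.index_mask = (1 << index_bits) - 1
        self.remap = {}                          # logical index -> physical line
        self.owner = [None] * num_lines          # physical line -> logical index
        self.tags = [None] * num_lines           # physical line -> stored tag
        self.rng = random.Random(seed)

    def access(self, block_addr):
        """Return True on a hit; on a miss, fill the block and return False."""
        index = block_addr & self.index_mask
        tag = block_addr >> self.index_bits
        line = self.remap.get(index)
        if line is not None and self.tags[line] == tag:
            return True
        if line is None:
            # Index miss: pick a random victim line and remap it to this index.
            line = self.rng.randrange(self.num_lines)
            old_index = self.owner[line]
            if old_index is not None:
                del self.remap[old_index]
            self.remap[index] = line
            self.owner[line] = index
        # Tag miss on an already mapped index reuses the same physical line.
        self.tags[line] = tag
        return False

# Assumed sizes: 4 physical lines, 4-bit logical index (wider than needed).
cache = RemappedCache(num_lines=4, index_bits=4, seed=1)
print([cache.access(addr) for addr in (0x12, 0x12, 0x32, 0x12)])
```

Because the index-to-line mapping lives in a table, a physical line can be reassigned to a different index at run time, which is the property the summary credits for both the dynamic remapping and the fault tolerance.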