Literature Review Summary for EE800
Name: Nathan Windels
ID: 10515113
ECE - U of S
Literature Review: 1
Title: Optimality Study of Logic Synthesis for LUT-Based FPGAs
Authors: Jason Cong and Kirill Minkovich
The purpose of the research presented in this paper was to develop an algorithm for generating
synthetic benchmarks with known optimal technology mapping solutions (LEKO) or known upper bounds
(LEKU). The benefit is that arbitrarily large test circuits can be constructed for synthesis software.
Basic building blocks called ‘Core Graphs’ are put together on several layers so that there exists a path
from every basic block on the bottom layer to the top layer. To construct a core graph from a
pre-existing benchmark, all you have to do is extract a piece of logic from it that has an equal number
of inputs and outputs.
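The layering described above can be illustrated with a small Python sketch. It is not the authors' construction algorithm: the wiring rule (output j of core i drives input j of core (i + j) mod k on the next layer), the function names, and the sizes used are all assumptions chosen for illustration. The sketch only checks two things: that the wiring drives every next-layer input exactly once, and that every core on the bottom layer has a path to the top layer.

```python
from collections import deque


def build_layers(num_layers, cores_per_layer, n):
    """Connect stacked cores, each with n inputs and n outputs.

    Assumed wiring rule (hypothetical): output j of core i on layer l
    drives input j of core (i + j) % cores_per_layer on layer l + 1.
    Returns a map from (layer, core) to the set of cores it feeds.
    """
    edges = {}
    for layer in range(num_layers - 1):
        for i in range(cores_per_layer):
            edges[(layer, i)] = {
                (layer + 1, (i + j) % cores_per_layer) for j in range(n)
            }
    return edges


def wiring_is_one_to_one(cores_per_layer, n):
    """Check that no two outputs drive the same (core, input) pair."""
    targets = [
        ((i + j) % cores_per_layer, j)
        for i in range(cores_per_layer)
        for j in range(n)
    ]
    return len(set(targets)) == len(targets)


def reaches_top(edges, start, num_layers):
    """Breadth-first search from `start` to any core on the top layer."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node[0] == num_layers - 1:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Example: 3 layers of 4 cores, each core with 5 inputs and 5 outputs.
edges = build_layers(num_layers=3, cores_per_layer=4, n=5)
print(wiring_is_one_to_one(4, 5))
print(all(reaches_top(edges, (0, i), 3) for i in range(4)))
```

Because every core has as many outputs as inputs, the outputs of one layer can be matched one-to-one with the inputs of the next, which is why the equal input/output count matters for stacking.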
Later, they showed that these benchmarks are structurally similar to the MCNC benchmarks by using
Rent’s Rule and MCFC numbers.
Literature Review: 2
Title: Application Specific Low Power ALU Design
Authors: Yu Zhou and Hui Guo
Power consumption is a critical design issue in embedded processor designs.
Two types of ALU structures:
• Tree structure (faster, larger area)
• Chain structure (slower, smaller area)
The idea of this paper is to customize the ALU by repositioning the elements in the chain structure.
Swapping different components in a chain structure may favour some applications and can save a
considerable amount of ALU power. This approach to power reduction is almost cost free and is
extremely simple to implement.
We can customize the ALU design by identifying frequent functional components and placing them close
to the output. In application specific designs, the frequency of a functional component is obtained from
instruction frequencies. We can therefore partition the instruction set into ALU and non-ALU
instructions. The ALU instructions can then be grouped according to the functional component they
actually use. Different weights of power consumption can be assigned to different functional
components.
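The ordering step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: the instruction mix, the instruction-to-component map, and the power weights are made up, and the cost model (a component at distance d from the output costs its weighted frequency times d) is a simplification introduced here.

```python
from collections import defaultdict


def component_scores(instr_freq, instr_component, power_weight):
    """Sum instruction frequencies per functional component.

    Instructions missing from `instr_component` are treated as non-ALU
    instructions and skipped. Each frequency is scaled by the component's
    power weight (default 1.0).
    """
    scores = defaultdict(float)
    for instr, freq in instr_freq.items():
        comp = instr_component.get(instr)
        if comp is None:
            continue
        scores[comp] += freq * power_weight.get(comp, 1.0)
    return dict(scores)


def order_chain(scores):
    """Order components from closest-to-output to farthest."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def chain_cost(order, scores):
    """Assumed cost model: score times distance (1 = next to the output)."""
    return sum(scores[c] * d for d, c in enumerate(order, start=1))


# Hypothetical application profile (execution counts per instruction).
instr_freq = {"add": 40, "sub": 10, "and": 15, "or": 5, "sll": 8, "nop": 22}
instr_component = {
    "add": "adder", "sub": "adder",
    "and": "logic", "or": "logic",
    "sll": "shifter",
}
power_weight = {"adder": 1.0, "logic": 1.0, "shifter": 1.0}

scores = component_scores(instr_freq, instr_component, power_weight)
order = order_chain(scores)
print(order)
print(chain_cost(order, scores), chain_cost(order[::-1], scores))
```

Under this cost model, sorting by weighted frequency gives the cheapest ordering, since the largest scores are paired with the smallest distances.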
Literature Review: 3
Title: FPGA Implementation of a 64-bit BID-Based Decimal Floating Point Adder/Subtractor
Authors: Amin Farmahini-Farahani, Charles Tsen, and Katherine Compton
FPGAs are a potential way to add hardware-based decimal floating point (DFP) engines to existing
compute clusters, accelerating DFP calculations without replacing the computing infrastructure.
The basic idea in this paper was to take an adder implemented in HDL for standard cells and improve
it for the Xilinx Virtex 5.
Hardware Optimizations for Virtex 5:
• The rounder block is the largest component (used for alignment and rounding). The original design
used 12 DSP48E blocks for the multiplier; multi-cycle multipliers use only 4 DSP48E blocks, and
are thus a good choice for many parallel DFP units.
• Several 64-bit 2:1 muxes use LUT resources inefficiently; they were left as single LUTs because
combining them causes wire routing problems.
• Constant tables: two were merged into a single BRAM (the other tables were left as LUTs because
they would use BRAM inefficiently).
• Active-low signals with asynchronous reset were changed to active-high signals with synchronous
reset (fits the Virtex 5 setup better).
Literature Review: 4
Title: A Novel Cache Architecture with Enhanced Performance and Security
Authors: Zhenghong Wang and Ruby B. Lee
Problems with current (non-random) cache designs:
• Hardware caches in processors introduce interference between programs and users.
• One process can evict cache lines of other processes, causing them to miss cache accesses.
• Critical information can be leaked out due to common cache behaviour.
• Cache-based attacks allow the recovery of the full secret cryptographic key and require much
less time and computation power to do so.
The new cache design involves the following: a new address decoder; a new replacement algorithm
(SecRAND); and the direct-mapped architecture, extended with dynamic memory-to-cache
remapping (implemented in a Re-Mapping Table) and a larger cache index (k bits + 1 protected flag bit).
The proposed address decoder not only takes up less area (metal lines + combinational logic) but
also allows for dynamic remapping (and, as a side effect, allows for fault tolerance within the cache).
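To make the remapping idea concrete, here is a toy Python model of a direct-mapped cache whose index-to-line mapping lives in a table instead of being fixed by the address bits. It is only a sketch under assumptions: the class and method names are hypothetical, the protected flag bit is left out, and the plain random eviction used here is a stand-in that does not implement SecRAND.

```python
import random


class RemappedCache:
    """Toy direct-mapped cache with a dynamic index-to-line table."""

    def __init__(self, num_lines, seed=None):
        self.num_lines = num_lines
        self.index_to_line = {}               # remapping table
        self.line_index = [None] * num_lines  # logical index held by each line
        self.line_tag = [None] * num_lines    # tag held by each line
        self.rng = random.Random(seed)

    def access(self, index, tag):
        """Return 'hit', 'tag_miss', or 'index_miss' and update the state."""
        line = self.index_to_line.get(index)
        if line is not None:
            if self.line_tag[line] == tag:
                return "hit"
            # Same logical index, different tag: refill the mapped line.
            self.line_tag[line] = tag
            return "tag_miss"
        # No line holds this logical index: evict a random physical line
        # (simplification; the paper's SecRAND algorithm is not modelled).
        victim = self.rng.randrange(self.num_lines)
        old_index = self.line_index[victim]
        if old_index is not None:
            del self.index_to_line[old_index]
        self.index_to_line[index] = victim
        self.line_index[victim] = index
        self.line_tag[victim] = tag
        return "index_miss"


cache = RemappedCache(num_lines=4, seed=0)
print(cache.access(10, 1))  # logical index 10 exceeds the 4 physical lines
print(cache.access(10, 1))
print(cache.access(10, 2))
```

Because the logical index space can be larger than the number of physical lines, which line an address lands in depends on the table contents rather than on the address alone. In this toy model a line that is never selected is simply never used, which hints at how remapping could route around a faulty line.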