ECE 545
Lecture 11
Modern FPGA Devices
ATHENa - Automated Tool for
Hardware EvaluatioN
George Mason University
Required Reading
Xilinx, Inc.
Virtex-5 FPGA Family
Virtex-5 FPGA User Guide
Chapter 5: Configurable Logic Blocks (CLBs)
Required Reading
Altera, Inc.
Stratix III FPGA Family
Stratix III Device Handbook
1. Stratix III Device Family Overview
2. Logic Array Blocks and Adaptive Logic
Modules in Stratix III Devices
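Xilinx FPGA Devices
+------------+-----------+------------------+
| Technology | Low-cost  | High-performance |
+------------+-----------+------------------+
| 120/150 nm |           | Virtex 2, 2 Pro  |
| 90 nm      | Spartan 3 | Virtex 4         |
| 65 nm      |           | Virtex 5         |
| 45 nm      | Spartan 6 |                  |
| 40 nm      |           | Virtex 6         |
+------------+-----------+------------------+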
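Altera FPGA Devices
+------------+-------------+-----------+------------------+
| Technology | Low-cost    | Mid-range | High-performance |
+------------+-------------+-----------+------------------+
| 130 nm     | Cyclone     |           | Stratix          |
| 90 nm      | Cyclone II  |           | Stratix II       |
| 65 nm      | Cyclone III | Arria I   | Stratix III      |
| 40 nm      | Cyclone IV  | Arria II  | Stratix IV       |
+------------+-------------+-----------+------------------+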
High-Performance Xilinx FPGAs
ECE 448 – FPGA and ASIC Design with VHDL
Virtex 5
Arrangement of Slices within the CLB
Row and Column Relationship
between CLBs and Slices
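Major Differences between Xilinx Families
+------------------------------+---------------------+-------------------------------+
|                              | Spartan 3, Virtex 4 | Virtex 5, Virtex 6, Spartan 6 |
+------------------------------+---------------------+-------------------------------+
| Look-Up Tables               | 4-input             | 6-input                       |
| Number of CLB slices per CLB | 4                   | 2                             |
| Number of LUTs per CLB slice | 2                   | 4                             |
+------------------------------+---------------------+-------------------------------+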
Distributed RAM Configurations
64 x 1 Single Port
64 x 1 Dual Port
64 x 1 Quad Port
64 x 3 Simple Dual Port
ROM Configurations
32-bit Shift Register, SRL
32-bit Shift Register
Dual 16-bit Shift Register
64-bit Shift Register
96-bit Shift Register
Fast Carry Logic Path
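Major Differences between Xilinx Families
+-----------------------------------------+---------------------+-------------------------------+
|                                         | Spartan 3, Virtex 4 | Virtex 5, Virtex 6, Spartan 6 |
+-----------------------------------------+---------------------+-------------------------------+
| Maximum Single-Port Memory Size per LUT | 16 x 1              | 64 x 1                        |
| Maximum Shift Register Size per LUT     | 16 bits             | 32 bits                       |
| Number of adder stages per CLB slice    | 2                   | 4                             |
+-----------------------------------------+---------------------+-------------------------------+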
Low-cost Altera FPGAs
Altera Cyclone III
Logic Element (LE) – Normal Mode
Altera Cyclone III
Logic Element (LE) – Arithmetic Mode
High-Performance Altera FPGAs
Stratix III Logic Array Blocks (LABs)
High-Level Block Diagram of the Stratix III ALM
Altera Stratix III
Adaptive Logic Modules (ALM) – Normal Mode
4 × 2 Crossbar Switch Example
Register Packing
Template for Seven-Input Functions
Supported in Extended LUT Mode
Altera Stratix III, Stratix IV
Adaptive Logic Modules (ALM) – Arithmetic Mode
Performing Operation R = (X < Y) ? Y : X
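The operation R = (X < Y) ? Y : X selects the larger of two operands. The following is only a software sketch of that selection, assuming unsigned operands of a hypothetical fixed width: the comparison is done by a subtraction whose carry-out decides which operand is passed on. The function name and the width parameter are illustrative, not taken from the Altera documentation.

```python
def select_larger(x: int, y: int, width: int = 8) -> int:
    """Return y if x < y, else x, using subtract-and-check-carry."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    # X - Y computed as X + (~Y) + 1 in two's complement.
    total = x + (~y & mask) + 1
    # A carry out of the top bit means no borrow occurred, i.e. X >= Y.
    carry_out = total >> width
    x_less_than_y = carry_out == 0
    return y if x_less_than_y else x


print(select_larger(3, 5))     # 5
print(select_larger(200, 17))  # 200
```

The subtraction is the part that maps naturally onto a carry chain; the final selection is a multiplexer controlled by the carry-out.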
Three Operand Addition Utilizing Shared Arithmetic Mode
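One common way to add three operands is to first compress them bitwise into a sum word and a carry word, and then add those two words with a single carry chain. The sketch below illustrates that idea in software; it is an assumption about how the three-operand addition can be decomposed, and the function name and width parameter are hypothetical.

```python
def three_operand_add(x: int, y: int, z: int, width: int = 8) -> int:
    """Add three unsigned operands via a 3:2 compression step plus one addition."""
    mask = (1 << width) - 1
    x, y, z = x & mask, y & mask, z & mask
    # Per-bit sum of the three inputs (no carries propagated yet).
    sum_bits = x ^ y ^ z
    # Per-bit carry (majority function), weighted one position higher.
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1
    # A single ordinary addition finishes the job.
    return sum_bits + carry_bits


print(three_operand_add(5, 9, 3))  # 17
```

The compression step has no carry propagation, so only the final addition needs a carry chain.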
LUT-Register Mode
Register Chain
Example of Resource Utilization Report (1)
+--------------------------------------------------------------------------+
; Fitter Resource Usage Summary                                            ;
+-------------------------------------------------+------------------------+
; Resource                                        ; Usage                  ;
+-------------------------------------------------+------------------------+
; ALUTs Used                                      ; 415 / 38,000 ( 1 % )   ;
;     -- Combinational ALUTs                      ; 415 / 38,000 ( 1 % )   ;
;     -- Memory ALUTs                             ; 0 / 19,000 ( 0 % )     ;
;     -- LUT_REGs                                 ; 0 / 38,000 ( 0 % )     ;
; Dedicated logic registers                       ; 136 / 38,000 ( < 1 % ) ;
;                                                 ;                        ;
; Combinational ALUT usage by number of inputs    ;                        ;
;     -- 7 input functions                        ; 0                      ;
;     -- 6 input functions                        ; 287                    ;
;     -- 5 input functions                        ; 0                      ;
;     -- 4 input functions                        ; 24                     ;
;     -- <=3 input functions                      ; 104                    ;
;                                                 ;                        ;
; Combinational ALUTs by mode                     ;                        ;
;     -- normal mode                              ; 335                    ;
;     -- extended LUT mode                        ; 0                      ;
;     -- arithmetic mode                          ; 80                     ;
;     -- shared arithmetic mode                   ; 0                      ;
+-------------------------------------------------+------------------------+
Example of Resource Utilization Report (2)
+-----------------------------------------------------------------------------------+----------------------+
; Logic utilization                                                                 ; 701 / 38,000 ( 2 % ) ;
;     -- Difficulty Clustering Design                                               ; Low                  ;
;     -- Combinational ALUT/register pairs used in final Placement                  ; 476                  ;
;         -- Combinational with no register                                         ; 340                  ;
;         -- Register only                                                          ; 61                   ;
;         -- Combinational with a register                                          ; 75                   ;
;     -- Estimated pairs recoverable by pairing ALUTs and registers as design grows ; -54                  ;
;     -- Estimated Combinational ALUT/register pairs unavailable                    ; 279                  ;
;         -- Unavailable due to Memory LAB use                                      ; 0                    ;
;         -- Unavailable due to unpartnered 7 LUTs                                  ; 0                    ;
;         -- Unavailable due to unpartnered 6 LUTs                                  ; 279                  ;
;         -- Unavailable due to unpartnered 5 LUTs                                  ; 0                    ;
;         -- Unavailable due to LAB-wide signal conflicts                           ; 0                    ;
;         -- Unavailable due to LAB input limits                                    ; 0                    ;
+-----------------------------------------------------------------------------------+----------------------+
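The figures in report (2) appear to add up in a simple way: the pairs used in final placement, the (negative) recoverable estimate, and the unavailable pairs sum to the reported logic utilization. The short check below only reproduces that arithmetic from the numbers shown on the slide; the relationship is read off this one report, not taken from the tool documentation, and the variable names are illustrative.

```python
# Values copied from the report on the slide.
comb_no_register = 340
register_only = 61
comb_with_register = 75
pairs_used = comb_no_register + register_only + comb_with_register

recoverable = -54          # estimated pairs recoverable as design grows
unavailable = 279          # all of it due to unpartnered 6-LUTs in this report
total_alut_register_pairs = 38_000

logic_utilization = pairs_used + recoverable + unavailable
percent = 100 * logic_utilization / total_alut_register_pairs

print(pairs_used)          # 476
print(logic_utilization)   # 701
print(round(percent))      # 2
```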
Example of Resource Utilization Report (3)
+------------------------------------------+------------------------+
; Total registers*                         ; 136                    ;
;     -- Dedicated logic registers         ; 136 / 38,000 ( < 1 % ) ;
;     -- I/O registers                     ; 0 / 2,752 ( 0 % )      ;
;     -- LUT_REGs                          ; 0                      ;
; ALMs: partially or completely used       ; 360 / 19,000 ( 2 % )   ;
; Total LABs: partially or completely used ; 42 / 1,900 ( 2 % )     ;
;     -- Logic LABs                        ; 42 / 42 ( 100 % )      ;
;     -- Memory LABs                       ; 0 / 42 ( 0 % )         ;
; User inserted logic elements             ; 0                      ;
; Virtual pins                             ; 0                      ;
; I/O pins                                 ; 20 / 488 ( 4 % )       ;
;     -- Clock pins                        ; 5 / 16 ( 31 % )        ;
;     -- Dedicated input pins              ; 0 / 12 ( 0 % )         ;
; Global signals                           ; 2                      ;
; M9K blocks                               ; 0 / 108 ( 0 % )        ;
; M144K blocks                             ; 0 / 6 ( 0 % )          ;
; Total MLAB memory bits                   ; 0                      ;
; Total block memory bits                  ; 0 / 1,880,064 ( 0 % )  ;
; Total block memory implementation bits   ; 0 / 1,880,064 ( 0 % )  ;
; DSP block 18-bit elements                ; 0 / 216 ( 0 % )        ;
; PLLs                                     ; 0 / 4 ( 0 % )          ;
; Global clocks                            ; 2 / 16 ( 13 % )        ;
+------------------------------------------+------------------------+
ATHENa
George Mason University
Resources
• ATHENa website
http://cryptography.gmu.edu/athena
ATHENa – Automated Tool for
Hardware EvaluatioN
Supported in part by the National Institute of Standards & Technology (NIST)
ATHENa Team
Venkata “Vinny”, MS CpE student
Ekawat “Ice”, PhD CpE student
Marcin, PhD ECE student
John, MS CpE student
Michal, PhD exchange student from Slovakia
Rajesh, PhD ECE student
ATHENa – Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Open-source benchmarking tool, written in Perl,
aimed at AUTOMATED generation of OPTIMIZED results
for MULTIPLE hardware platforms.
Currently under development at George Mason University.
Why Athena?
"The Greek goddess Athena was frequently
called upon to settle disputes between
the gods or various mortals. Athena Goddess
known for her superb logic and intellect.
Her decisions were usually well-considered,
highly ethical, and seldom motivated
by self-interest."
from "Athena, Greek Goddess
of Wisdom and Craftsmanship"
Basic Dataflow of ATHENa
[Figure: numbered dataflow (steps 0 to 6) among the Designer, the User, and the ATHENa Server. Labeled flows: Interfaces + Testbenches; HDL + scripts + configuration files; Download scripts and configuration files; HDL + FPGA Tools; FPGA Synthesis and Implementation; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs.]
[Figure: ATHENa inputs and outputs. Inputs: configuration files, synthesizable source files, constraint files, testbench. Outputs: result summary (user-friendly), database entries (machine-friendly).]
ATHENa Major Features (1)
• synthesis, implementation, and timing analysis in batch mode
• support for devices and tools of multiple FPGA vendors
• generation of results for multiple families of FPGAs of a given vendor
• automated choice of a best-matching device within a given family
ATHENa Major Features (2)
• automated verification of designs through simulation in batch mode
• support for multi-core processing
• automated extraction and tabulation of results
• several optimization strategies aimed at finding
– optimum options of tools
– best target clock frequency
– best starting point of placement
Generation of Results Facilitated by ATHENa
• batch mode of FPGA tools
• ease of extraction and tabulation of results
– Text Reports, Excel, CSV (Comma-Separated Values)
• optimized choice of tool options
– GMU_optimization_1 strategy
Relative Improvement of Results from Using ATHENa
Virtex 5, 256-bit Variants of Hash Functions
[Bar chart: ratios for Area, Thr (throughput), and Thr/Area; vertical axis from 0 to 2.5.]
Ratios of results obtained using ATHENa suggested options
vs. default options of FPGA tools
Other (Somewhat) Similar Tools
ExploreAhead (part of PlanAhead)
Design Space Explorer (DSE)
Boldport Flow
EDAx10 Cloud Platform
Distinguishing Features of ATHENa
• Support for multiple tools from multiple vendors
• Optimization strategies aimed at the best possible
performance rather than design closure
• Extraction and presentation of results
• Seamless integration with the ATHENa database of results
How To Start Working With ATHENa?
One-Time Tasks
Download and unzip ATHENa
http://cryptography.gmu.edu/athena/
Read the Tutorial!
Install the Required Tools
(see Tutorial - Part 1 – Tools Installation)
Run ATHENa_setup
How To Start Working With ATHENa?
Repetitive Tasks
Prepare or modify your source files
& source_list.txt
Modify design.config.txt
+ possibly other configuration files
Run ATHENa
design.config.txt
Your Design
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
# A file list containing list of files in the order suitable for synthesis and implementation
# low level modules first, top level entity last
SOURCE_LIST_FILE = source_list.txt
# project name
# it will be used in the names of result directories
PROJECT_NAME = SHA256
# name of top level entity
TOP_LEVEL_ENTITY = sha256
# name of top level architecture
TOP_LEVEL_ARCH = rs_arch
# name of clock net
CLOCK_NET = clk
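The settings above follow a simple `KEY = value` layout with `#` comment lines. As a rough illustration of that layout (not ATHENa's own Perl parser), the sketch below reads such lines into a dictionary. The function name is hypothetical, and stripping the angle brackets around a value is an assumption about how `<examples/sha256_rs>` is meant to be read.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse 'KEY = value' lines, skipping blank lines and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: angle brackets only delimit the value and can be dropped.
        settings[key.strip()] = value.strip().strip("<>").strip()
    return settings


SAMPLE = """\
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
SOURCE_LIST_FILE = source_list.txt
PROJECT_NAME = SHA256
TOP_LEVEL_ENTITY = sha256
TOP_LEVEL_ARCH = rs_arch
CLOCK_NET = clk
"""

config = parse_config(SAMPLE)
print(config["SOURCE_DIR"], config["TOP_LEVEL_ENTITY"])
```

This flat dictionary is only adequate for the design section; the vendor and family sections repeat keys inside FPGA_VENDOR / FPGA_FAMILY blocks and would need a nested structure.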
design.config.txt
Timing Formulas
#formula for latency
LATENCY = TCLK*65
#formula for throughput
THROUGHPUT = 512/(TCLK*65)
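The two formulas describe a design that processes a 512-bit block in 65 clock cycles. A minimal sketch of how they evaluate is shown below, assuming TCLK is given in nanoseconds and throughput is wanted in Mbit/s (the units used in the timing report legend); the clock period of 10 ns is a made-up input and the function names are illustrative.

```python
def latency_ns(tclk_ns: float, cycles: int = 65) -> float:
    """LATENCY = TCLK * 65, in ns when TCLK is in ns."""
    return tclk_ns * cycles


def throughput_mbps(tclk_ns: float, block_bits: int = 512, cycles: int = 65) -> float:
    """THROUGHPUT = 512 / (TCLK * 65), converted from bits/ns to Mbit/s."""
    # 1 bit/ns = 1 Gbit/s = 1000 Mbit/s (unit conversion is an assumption).
    return block_bits / (tclk_ns * cycles) * 1000.0


tclk = 10.0  # hypothetical clock period in ns (100 MHz)
print(latency_ns(tclk))                  # 650.0
print(round(throughput_mbps(tclk), 1))   # 787.7
```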
design.config.txt
Application & Optimization Target
# OPTIMIZATION_TARGET = speed | area | balanced
OPTIMIZATION_TARGET = speed
# OPTIONS = default | user
OPTIONS = default
# APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
#               GMU_Optimization_1 | GMU_Xilinx_optimization_1
APPLICATION = single_run
# TRIM_MODE = off | zip | delete
TRIM_MODE = zip
design.config.txt
FPGA Families
# commenting the next line removes all families of Xilinx
FPGA_VENDOR = xilinx
#commenting the next line removes a given family
FPGA_FAMILY = spartan3
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_SYN_FREQ = 120
REQ_IMP_FREQ = 100
MAX_SLICE_UTILIZATION = 0.8
MAX_BRAM_UTILIZATION = 0.8
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.9
END FAMILY
END VENDOR
design.config.txt
FPGA Families
# commenting the next line removes all families of Altera
FPGA_VENDOR = altera
#commenting the next line removes a given family
FPGA_FAMILY = Stratix III
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 0
MAX_MUL_UTILIZATION = 0
MAX_PIN_UTILIZATION = 0.8
END FAMILY
END VENDOR
Library Files
device_lib/xilinx_device_lib.txt
device_lib/altera_device_lib.txt
• Files created during ATHENa setup
• Characterize FPGA families and devices available in the version of
Xilinx and Altera tools installed on your computer
• Currently supported tool versions:
– Xilinx WebPACK 9.1, 9.2, 10.1, 11.1, 11.5, 12.1, 12.2, 12.3
– Xilinx Design Suite 11.1, 12.1, 12.2, 12.3
– Altera Quartus II Web Edition 8.1, 8.2, 9.0, 9.1, 10.0
– Altera Quartus II Subscription Edition 9.1, 10.0
• If a library for a given version is not available yet, use a library from the
closest available version
Library Files
device_lib/xilinx_device_lib.txt
VENDOR = Xilinx
#Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO
FAMILY = spartan3
xc3s50pq208-5,     768,   4,  0,  4,  124
xc3s200ft256-5,    1920,  12, 0,  12, 173
xc3s400fg456-5,    3584,  16, 0,  16, 264
xc3s1000fg676-5,   7680,  24, 0,  24, 391
xc3s1500fg676-5,   13312, 32, 0,  32, 487
END_FAMILY
FAMILY = virtex5
xc5vlx30ff676-3,   4800,  32, 32, 0,  400
xc5vfx30tff665-3,  5120,  68, 64, 0,  360
xc5vlx30tff665-3,  4800,  36, 32, 0,  360
xc5vlx50ff1153-3,  7200,  48, 48, 0,  560
xc5vlx50tff1136-3, 7200,  60, 48, 0,  480
END_FAMILY
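The device library, together with the MAX_*_UTILIZATION limits in design.config.txt, suggests how a best_match choice could work: walk the family from the smallest device upward and take the first one whose resources cover the design within the allowed utilization. The sketch below illustrates that idea only; it is not ATHENa's actual selection algorithm, and the function name, the dictionary of limits, and the sample requirements are assumptions.

```python
from typing import Optional

# (device, slices, BRAMs, DSPs, multipliers, I/O pins), listed smallest first,
# copied from the spartan3 section of the library file.
SPARTAN3 = [
    ("xc3s50pq208-5", 768, 4, 0, 4, 124),
    ("xc3s200ft256-5", 1920, 12, 0, 12, 173),
    ("xc3s400fg456-5", 3584, 16, 0, 16, 264),
    ("xc3s1000fg676-5", 7680, 24, 0, 24, 391),
    ("xc3s1500fg676-5", 13312, 32, 0, 32, 487),
]

# Limits taken from the Xilinx section of design.config.txt.
LIMITS = {"slice": 0.8, "bram": 0.8, "mul": 1.0, "pin": 0.9}


def best_match(slices: int, brams: int, mults: int, pins: int) -> Optional[str]:
    """Return the first (smallest) device that fits within the utilization limits."""
    for name, dev_slices, dev_brams, _dsps, dev_mults, dev_pins in SPARTAN3:
        fits = (
            slices <= LIMITS["slice"] * dev_slices
            and brams <= LIMITS["bram"] * dev_brams
            and mults <= LIMITS["mul"] * dev_mults
            and pins <= LIMITS["pin"] * dev_pins
        )
        if fits:
            return name
    return None  # nothing in the family is large enough


# Hypothetical requirements: 74 slices, 4 BRAMs, 7 multipliers, 20 I/O pins.
print(best_match(74, 4, 7, 20))  # xc3s200ft256-5
```

With these inputs the smallest device is rejected because 4 block RAMs exceed 80 % of its 4 available, so the next device is chosen.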
Result Files
report_resource_utilization.txt
xilinx : spartan3
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | %  | SLICES | %  | BRAMs | %  | MULTs | %  | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| default | xc3s200ft256-5*   | 1   | 142  | 3  | 74     | 3  | 4     | 33 | 7     | 58 | 0    | 0  | 20 | 11 |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
xilinx : spartan6
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | %  | SLICES | %  | BRAMs | %  | MULTs | %  | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| default | xc6slx9csg324-3*  | 1   | 41   | 1  | 22     | 1  | 4     | 6  | 0     | 0  | 9    | 56 | 20 | 10 |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
xilinx : virtex5
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | %  | SLICES | %  | BRAMs | %  | MULTs | %  | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| default | xc5vlx20tff323-2* | 1   | 101  | 1  | 56     | 1  | 4     | 15 | 0     | 0  | 9    | 37 | 20 | 11 |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
xilinx : virtex6
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | %  | SLICES | %  | BRAMs | %  | MULTs | %  | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
| default | xc6vlx75tff784-3* | 1   | 44   | 1  | 21     | 1  | 4     | 1  | 0     | 0  | 9    | 3  | 20 | 5  |
+---------+-------------------+-----+------+----+--------+----+-------+----+-------+----+------+----+----+----+
Result Files
report_timing.txt
REQ SYN FREQ – Requested synthesis clk. freq.
REQ SYN TCLK – Requested synthesis clk. period
REQ IMP FREQ – Requested implement. clk. freq.
REQ IMP TCLK – Requested implement. clk. period
SYN FREQ – Achieved synthesis clk. freq.
SYN TCLK – Achieved synthesis clk. period
IMP FREQ – Achieved implement. clk. freq.
IMP TCLK – Achieved implement. clk. period
LATENCY – Latency [ns]
THROUGHPUT – Throughput [Mbits/s]
TP/Area – Throughput/Area [(Mbits/s)/CLB slices]
Latency*Area – Latency*Area [ns*CLB slices]
xilinx : spartan3
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc3s200ft256-5*   | 1   | default      | 207.370  | default      | 4.822    | default      | 112.448  | default      | 8.893    | 17.786  | 449.792    | 6.078      | 1316.164     |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : spartan6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6slx9csg324-3*  | 1   | default      | 75.751   | default      | 13.201   | default      | 78.119   | default      | 12.801   | 25.602  | 312.476    | 14.203     | 563.244      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : virtex5
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc5vlx20tff323-2* | 1   | default      | 156.347  | default      | 6.396    | default      | 126.952  | default      | 7.877    | 15.754  | 507.808    | 9.068      | 882.224      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : virtex6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6vlx75tff784-3* | 1   | default      | 158.053  | default      | 6.327    | default      | 135.410  | default      | 7.385    | 14.770  | 541.638    | 25.792     | 310.170      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
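The two derived columns of the timing report can be recomputed from the legend: TP/Area is throughput divided by the number of CLB slices, and Latency*Area is latency multiplied by the number of CLB slices. The sketch below checks this against the reported figures, taking the slice counts (74, 22, 56, 21) from report_resource_utilization.txt; the function and variable names are illustrative.

```python
def tp_per_area(throughput_mbps: float, slices: int) -> float:
    """Throughput/Area in (Mbits/s) per CLB slice."""
    return throughput_mbps / slices


def latency_area(latency_ns: float, slices: int) -> float:
    """Latency*Area in ns * CLB slices."""
    return latency_ns * slices


# family: (slices, latency [ns], throughput [Mbits/s]) as reported on the slides
RESULTS = {
    "spartan3": (74, 17.786, 449.792),
    "spartan6": (22, 25.602, 312.476),
    "virtex5": (56, 15.754, 507.808),
    "virtex6": (21, 14.770, 541.638),
}

for family, (slices, latency, throughput) in RESULTS.items():
    print(
        family,
        round(tp_per_area(throughput, slices), 3),
        round(latency_area(latency, slices), 3),
    )
```

Virtex 6 comes out best on both derived metrics here, even though its raw throughput is only slightly above that of Virtex 5, because it uses far fewer slices.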
Result Files
report_options.txt
COST TABLE - parameter determining the starting point of placement
Synthesis Options – options of the synthesis tool
Map Options – Options of the mapping tool
PAR Options – Options of the place & route tool
xilinx : spartan3
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options            | PAR Options |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| default | xc3s200ft256-5*   | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std  |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
xilinx : spartan6
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options            | PAR Options |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| default | xc6slx9csg324-3*  | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b           | -w -ol std  |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
xilinx : virtex5
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options            | PAR Options |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| default | xc5vlx20tff323-2* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std  |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
xilinx : virtex6
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options            | PAR Options |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
| default | xc6vlx75tff784-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b           | -w -ol std  |
+---------+-------------------+-----+------------+------------------------------+------------------------+-------------+
Result Files
report_execution_time.txt
Synthesis Time – Time of Synthesis
Implementation Time – Time of Implementation
Elapsed Time – Total Time
xilinx : spartan3
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc3s200ft256-5*   | 1   | 0d 0h:0m:12s   | 0d 0h:0m:36s        | 0d 0h:0m:48s |
+---------+-------------------+-----+----------------+---------------------+--------------+
xilinx : spartan6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6slx9csg324-3*  | 1   | 0d 0h:0m:21s   | 0d 0h:1m:13s        | 0d 0h:1m:34s |
+---------+-------------------+-----+----------------+---------------------+--------------+
xilinx : virtex5
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 0d 0h:0m:39s   | 0d 0h:1m:50s        | 0d 0h:2m:29s |
+---------+-------------------+-----+----------------+---------------------+--------------+
xilinx : virtex6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 0d 0h:0m:22s   | 0d 0h:3m:22s        | 0d 0h:3m:44s |
+---------+-------------------+-----+----------------+---------------------+--------------+
design.config.txt
Functional Simulation (1)
# FUNCTIONAL_VERIFICATION_MODE = <on | off>
FUNCTIONAL_VERIFICATION_MODE = <off>
# directory containing source files of the testbench
VERIFICATION_DIR = <examples/sha256_rs/tb>
# A file containing a list of testbench files in the order suitable for compilation;
# low level modules first, top level entity last.
# Test vector files should be located in the same directory and listed
# in the same file, unless fixed path is used. Please refer to tutorial for more detail.
VERIFICATION_LIST_FILE = <tb_srcs.txt>
# name of testbench's top level entity
TB_TOP_LEVEL_ENTITY = <sha_tb>
# name of testbench's top level architecture
TB_TOP_LEVEL_ARCH = <behavior>
design.config.txt
Functional Simulation (2)
# MAX_TIME_FUNCTIONAL_VERIFICATION = <$time $unit>
# supported units are: ps, ns, us, and ms
# if blank, simulation will run until it finishes
# (= no changes in signals, i.e., the clock is stopped and no more inputs are coming in)
MAX_TIME_FUNCTIONAL_VERIFICATION = <>
# Perform only verification (synthesis and implementation parameters are ignored)
# VERIFICATION_ONLY = <ON | OFF>
VERIFICATION_ONLY = <off>
test_circuit: ATHENa Example including embedded FPGA resources
design.config.txt
Global Generics
GLOBAL_GENERICS_BEGIN
# Number of stages
# n is currently set to the default value, i.e., n = 16
# for other values of n, modify the formulas for Latency and Throughput accordingly
n = 16
# Memory type: 0 = MEM_DISTRIBUTED, 1= MEM_EMBEDDED
mem_type = 0, 1
# Adder type: 0 = ADD_SCCA_BASED (Simple Carry Chain Adder, "+" in VHDL),
#             1 = ADD_DSP_BASED
# Multiplier type: 0 = MUL_LOGIC_BASED (multiplier based on configurable logic),
#                  1 = MUL_DSP_BASED
# Allowed combinations of adder and multiplier types
(adder_type, multiplier_type) = (0, 0), (1, 1)
GLOBAL_GENERICS_END
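The global generics above list one value for n, two values for mem_type, and two allowed (adder_type, multiplier_type) pairs. Assuming each combination of listed values corresponds to one variant of the design to be evaluated (an interpretation of the listing, not a statement about ATHENa internals), the variants can be enumerated as follows; the variable names are illustrative.

```python
from itertools import product

n_values = [16]
mem_types = [0, 1]                    # 0 = MEM_DISTRIBUTED, 1 = MEM_EMBEDDED
adder_mult_pairs = [(0, 0), (1, 1)]   # only these pairings are allowed

# Independent generics are crossed; the adder and multiplier types move together.
combinations = [
    {"n": n, "mem_type": mem, "adder_type": add, "multiplier_type": mul}
    for n, mem, (add, mul) in product(n_values, mem_types, adder_mult_pairs)
]

for combo in combinations:
    print(combo)
print(len(combinations))  # 4
```

Listing the adder and multiplier types as tuples keeps the disallowed mixes, such as a DSP-based adder with a logic-based multiplier, out of the search.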
design.config.txt
FPGA Family Specific Generics
FPGA_FAMILY = Cyclone II
GENERICS_BEGIN
# FPGA vendor: 0 = XILINX, 1 = ALTERA
vendor = 1
# Memory block size: 0 = M512, 1 = M4K, 2 = M9K, 3 = M20K,
#                    4 = MLAB, 5 = MRAM, 6 = M144K
mem_block_size = 1
GENERICS_END
FPGA_DEVICES = best_match
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 1
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.8
END FAMILY
ATHENa – Database of Results
ATHENa Database
http://cryptography.gmu.edu/athenadb
ATHENa Database – Result View
• Algorithm parameters
• Design parameters
– Optimization target
– Architecture type
– Datapath width
– I/O bus widths
– Availability of source code
• Platform
– Vendor, Family, Device
• Timing
– Maximum clock frequency
– Maximum throughput
• Resource utilization
– Logic blocks (Slices/LEs/ALUTs)
– Multipliers/DSP units
• Tools
– Names & versions
– Detailed options
• Credits
– Designers & contact information
ATHENa Database – Compare Feature
Matching fields in grey
Non-matching fields in red and blue
Currently in the Database
Hash Functions in FPGAs
GMU Results for
• 20 hash functions (14 Round 2 SHA-3 + 5 Round 3 SHA-3 + SHA-2)
x 2 variants (256-bit output & 512-bit output)
x 11 FPGA families
= 440 combinations
(440 - not_fitting) = 423 optimized results
Coming soon!
• GMU results for Hash Functions in FPGAs
– Folded & unrolled architectures
– Pipelined architectures
– Lightweight architectures
– Architectures based on embedded resources
• Other Groups’ results for Hash Functions in FPGAs
• Other Groups’ results for Hash Functions in ASICs
• Modular Arithmetic (basis of public key cryptography) in FPGAs & ASICs
Possible Future Customizations
The same basic database can be customized
and adapted for other domains, such as
• Digital Signal Processing
• Bioinformatics
• Communications
• Scientific Computing, etc.
ATHENa - Website
ATHENa Website
http://cryptography.gmu.edu/athena/
• Download of ATHENa Tool
• Links to related tools
SHA-3 Competition in FPGAs & ASICs
• Specifications of candidates
• Interface proposals
• RTL source codes
• Testbenches
• ATHENa database of results
• Related papers & presentations
GMU Source Codes and Block Diagrams
• First batch of GMU Source Codes for
all Round 3 SHA-3 Candidates & SHA-2
made available at the ATHENa website at:
http://cryptography.gmu.edu/athena
• Included in this release:
• Basic architectures
• Folded architectures
• Unrolled architectures
• Each code supports two variants:
with 256-bit and 512-bit output.
• Each source code accompanied by comprehensive
hierarchical block diagrams
ATHENa Result Replication Files
• Scripts and configuration files sufficient to easily
reproduce all results (without repeating optimizations)
• Automatically created by ATHENa for all
results generated using ATHENa
• Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
• J. Claerbout (Stanford University)
“Electronic documents give reproducible research a new meaning,”
in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992,
http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92
.....
• Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL)
“Reproducible research in signal processing - what, why, and how,”
IEEE Signal Processing Magazine, May 2009. http://rr.epfl.ch/17/
Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations
of the same cryptographic algorithm
3. hardware platforms from the point of view
of their suitability for the implementation of a given algorithm,
(e.g., choice of an FPGA device or FPGA board)
4. tools and languages in terms of quality
of results they generate (e.g. Verilog vs. VHDL,
Synplicity Synplify Premier vs. Xilinx XST,
ISE v. 13.1 vs. ISE v. 12.3)