ECE 545
Lecture 11
Modern FPGA Devices
ATHENa - Automated Tool for Hardware EvaluatioN
George Mason University

Required Reading
Xilinx, Inc. Virtex-5 FPGA Family
• Virtex-5 FPGA User Guide, Chapter 5: Configurable Logic Blocks (CLBs)

Required Reading
Altera, Inc. Stratix III FPGA Family
• Stratix III Device Handbook
  1. Stratix III Device Family Overview
  2. Logic Array Blocks and Adaptive Logic Modules in Stratix III Devices

Xilinx FPGA Devices
+------------+-----------+------------------+
| Technology | Low-cost  | High-performance |
+------------+-----------+------------------+
| 120/150 nm |           | Virtex 2, 2 Pro  |
| 90 nm      | Spartan 3 | Virtex 4         |
| 65 nm      |           | Virtex 5         |
| 45 nm      | Spartan 6 |                  |
| 40 nm      |           | Virtex 6         |
+------------+-----------+------------------+

Altera FPGA Devices
+------------+-------------+-----------+------------------+
| Technology | Low-cost    | Mid-range | High-performance |
+------------+-------------+-----------+------------------+
| 130 nm     | Cyclone     |           | Stratix          |
| 90 nm      | Cyclone II  |           | Stratix II       |
| 65 nm      | Cyclone III | Arria I   | Stratix III      |
| 40 nm      | Cyclone IV  | Arria II  | Stratix IV       |
+------------+-------------+-----------+------------------+

High-Performance Xilinx FPGAs
ECE 448 – FPGA and ASIC Design with VHDL

Virtex 5
• Arrangement of Slices within the CLB
• Row and Column Relationship between CLBs and Slices

Major Differences between Xilinx Families (1)
+------------------------------+---------------------+-------------------------------+
| Feature                      | Spartan 3, Virtex 4 | Virtex 5, Virtex 6, Spartan 6 |
+------------------------------+---------------------+-------------------------------+
| Look-Up Tables               | 4-input             | 6-input                       |
| Number of CLB slices per CLB | 4                   | 2                             |
| Number of LUTs per CLB slice | 2                   | 4                             |
+------------------------------+---------------------+-------------------------------+

Virtex 5 CLB Slice Features
• Distributed RAM Configurations: 64 x 1 Single Port, 64 x 1 Dual Port, 64 x 1 Quad Port, 64 x 3 Simple Dual Port
• ROM Configurations
• 32-bit Shift Register (SRL)
• Dual 16-bit Shift Register
• 64-bit Shift Register
• 96-bit Shift Register
• Fast Carry Logic Path

Major Differences between Xilinx Families (2)
+-----------------------------------------+---------------------+-------------------------------+
| Feature                                 | Spartan 3, Virtex 4 | Virtex 5, Virtex 6, Spartan 6 |
+-----------------------------------------+---------------------+-------------------------------+
| Maximum Single-Port Memory Size per LUT | 16 x 1              | 64 x 1                        |
| Maximum Shift Register Size per LUT     | 16 bits             | 32 bits                       |
| Number of adder stages per CLB slice    | 2                   | 4                             |
+-----------------------------------------+---------------------+-------------------------------+

Low-cost Altera FPGAs
• Altera Cyclone III Logic Element (LE) – Normal Mode
• Altera Cyclone III Logic Element (LE) – Arithmetic Mode

High-Performance Altera FPGAs
• Stratix III Logic Array Blocks (LABs)
• High-Level Block Diagram of the Stratix III ALM
• Altera Stratix III Adaptive Logic Modules (ALM) – Normal Mode
• 4 × 2 Crossbar Switch Example
• Register Packing
• Template for Seven-Input Functions Supported in Extended LUT Mode
• Altera Stratix III, Stratix IV Adaptive Logic Modules (ALM) – Arithmetic Mode
• Performing Operation R = (X < Y) ? Y : X
• Three Operand Addition Utilizing Shared Arithmetic Mode
• LUT-Register Mode
• Register Chain

Example of Resource Utilization Report (1)
+--------------------------------------------------------------------------+
; Fitter Resource Usage Summary                                            ;
+-------------------------------------------------+------------------------+
; Resource                                        ; Usage                  ;
+-------------------------------------------------+------------------------+
; ALUTs Used                                      ; 415 / 38,000 ( 1 % )   ;
;     -- Combinational ALUTs                      ; 415 / 38,000 ( 1 % )   ;
;     -- Memory ALUTs                             ; 0 / 19,000 ( 0 % )     ;
;     -- LUT_REGs                                 ; 0 / 38,000 ( 0 % )     ;
; Dedicated logic registers                       ; 136 / 38,000 ( < 1 % ) ;
;                                                 ;                        ;
; Combinational ALUT usage by number of inputs    ;                        ;
;     -- 7 input functions                        ; 0                      ;
;     -- 6 input functions                        ; 287                    ;
;     -- 5 input functions                        ; 0                      ;
;     -- 4 input functions                        ; 24                     ;
;     -- <=3 input functions                      ; 104                    ;
;                                                 ;                        ;
; Combinational ALUTs by mode                     ;                        ;
;     -- normal mode                              ; 335                    ;
;     -- extended LUT mode                        ; 0                      ;
;     -- arithmetic mode                          ; 80                     ;
;     -- shared arithmetic mode                   ; 0                      ;
+-------------------------------------------------+------------------------+

Example of Resource Utilization Report (2)
; Logic utilization                                                                 ; 701 / 38,000 ( 2 % )   ;
;     -- Difficulty Clustering Design                                               ; Low                    ;
;     -- Combinational ALUT/register pairs used in final Placement                  ; 476                    ;
;         -- Combinational with no register                                         ; 340                    ;
;         -- Register only                                                          ; 61                     ;
;         -- Combinational with a register                                          ; 75                     ;
;     -- Estimated pairs recoverable by pairing ALUTs and registers as design grows ; -54                    ;
;     -- Estimated Combinational ALUT/register pairs unavailable                    ; 279                    ;
;         -- Unavailable due to Memory LAB use                                      ; 0                      ;
;         -- Unavailable due to unpartnered 7 LUTs                                  ; 0                      ;
;         -- Unavailable due to unpartnered 6 LUTs                                  ; 279                    ;
;         -- Unavailable due to unpartnered 5 LUTs                                  ; 0                      ;
;         -- Unavailable due to LAB-wide signal conflicts                           ; 0                      ;
;         -- Unavailable due to LAB input limits                                    ; 0                      ;

Example of Resource Utilization Report (3)
; Total registers*                         ; 136                    ;
;     -- Dedicated logic registers         ; 136 / 38,000 ( < 1 % ) ;
;     -- I/O registers                     ; 0 / 2,752 ( 0 % )      ;
;     -- LUT_REGs                          ; 0                      ;
; ALMs: partially or completely used       ; 360 / 19,000 ( 2 % )   ;
; Total LABs: partially or completely used ; 42 / 1,900 ( 2 % )     ;
;     -- Logic LABs                        ; 42 / 42 ( 100 % )      ;
;     -- Memory LABs                       ; 0 / 42 ( 0 % )         ;
;                                          ;                        ;
; User inserted logic elements             ; 0                      ;
; Virtual pins                             ; 0                      ;
; I/O pins                                 ; 20 / 488 ( 4 % )       ;
;     -- Clock pins                        ; 5 / 16 ( 31 % )        ;
;     -- Dedicated input pins              ; 0 / 12 ( 0 % )         ;
; Global signals                           ; 2                      ;
; M9K blocks                               ; 0 / 108 ( 0 % )        ;
; M144K blocks                             ; 0 / 6 ( 0 % )          ;
; Total MLAB memory bits                   ; 0                      ;
; Total block memory bits                  ; 0 / 1,880,064 ( 0 % )  ;
; Total block memory implementation bits   ; 0 / 1,880,064 ( 0 % )  ;
; DSP block 18-bit elements                ; 0 / 216 ( 0 % )        ;
; PLLs                                     ; 0 / 4 ( 0 % )          ;
; Global clocks                            ; 2 / 16 ( 13 % )        ;

ATHENa

Resources
• ATHENa website: http://cryptography.gmu.edu/athena

ATHENa – Automated Tool for Hardware EvaluatioN
Supported in part by the National Institute of Standards & Technology (NIST)

ATHENa Team
• Venkata "Vinny": MS CpE student
• Ekawat "Ice": PhD CpE student
• Marcin: PhD ECE student
• John: MS CpE student
• Michal: PhD exchange student from Slovakia
• Rajesh: PhD ECE student

ATHENa – Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Open-source benchmarking tool, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.
Currently under development at George Mason University.

Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess [was] known for her superb logic and intellect.
Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."
from "Athena, Greek Goddess of Wisdom and Craftsmanship"

Basic Dataflow of ATHENa
[Figure: basic dataflow of ATHENa among the Designer, the User, the ATHENa Server, and FPGA synthesis and implementation. Labeled flows: interfaces + testbenches; HDL + FPGA tools; HDL + scripts + configuration files; result summary + database entries; database entries; database query; ranking of designs; download of scripts and configuration files.]

[Figure: ATHENa inputs (configuration files, synthesizable source files, constraint files, testbench) and outputs (result summary, user-friendly; database entries, machine-friendly).]

ATHENa Major Features (1)
• synthesis, implementation, and timing analysis in batch mode
• support for devices and tools of multiple FPGA vendors
• generation of results for multiple families of FPGAs of a given vendor
• automated choice of a best-matching device within a given family

ATHENa Major Features (2)
• automated verification of designs through simulation in batch mode
• support for multi-core processing
• automated extraction and tabulation of results
• several optimization strategies aimed at finding
  – optimum options of tools
  – best target clock frequency
  – best starting point of placement

Generation of Results Facilitated by ATHENa
• batch mode of FPGA tools
• ease of extraction and tabulation of results
  – Text Reports, Excel, CSV (Comma-Separated Values)
• optimized choice of tool options
  – GMU_optimization_1 strategy

Relative Improvement of Results from Using ATHENa
[Figure: Virtex 5, 256-bit variants of hash functions. Bar chart (y-axis 0 to 2.5) of Area, Throughput (Thr), and Throughput/Area (Thr/Area), each given as the ratio of results obtained using ATHENa-suggested options vs. default options of FPGA tools.]

Other (Somewhat) Similar Tools
• ExploreAhead (part of PlanAhead)
• Design Space Explorer (DSE)
• Boldport Flow
• EDAx10 Cloud Platform

Distinguishing Features of ATHENa
• Support for multiple tools from multiple vendors
• Optimization strategies aimed at the best possible performance rather than design closure
• Extraction and presentation of results
• Seamless integration with the ATHENa database of results

How To Start Working With ATHENa?
One-Time Tasks
• Download and unzip ATHENa: http://cryptography.gmu.edu/athena/
• Read the Tutorial!
• Install the required tools (see Tutorial, Part 1 – Tools Installation)
• Run ATHENa_setup

How To Start Working With ATHENa?
Repetitive Tasks
• Prepare or modify your source files & source_list.txt
• Modify design.config.txt + possibly other configuration files
• Run ATHENa

design.config.txt
Your Design
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
# A file containing the list of source files in the order suitable for synthesis and implementation:
# low level modules first, top level entity last
SOURCE_LIST_FILE = source_list.txt
# project name
# it will be used in the names of result directories
PROJECT_NAME = SHA256
# name of top level entity
TOP_LEVEL_ENTITY = sha256
# name of top level architecture
TOP_LEVEL_ARCH = rs_arch
# name of clock net
CLOCK_NET = clk

design.config.txt
Timing Formulas
# formula for latency
LATENCY = TCLK*65
# formula for throughput
THROUGHPUT = 512/(TCLK*65)

design.config.txt
Application & Optimization Target
# OPTIMIZATION_TARGET = speed | area | balanced
OPTIMIZATION_TARGET = speed
# OPTIONS = default | user
OPTIONS = default
# APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
#               GMU_Optimization_1 | GMU_Xilinx_optimization_1
APPLICATION = single_run
# TRIM_MODE = off | zip | delete
TRIM_MODE = zip

design.config.txt
FPGA Families (Xilinx)
# commenting the next line removes all families of Xilinx
FPGA_VENDOR = xilinx
# commenting the next line removes a given family
FPGA_FAMILY = spartan3
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_SYN_FREQ = 120
REQ_IMP_FREQ = 100
MAX_SLICE_UTILIZATION = 0.8
MAX_BRAM_UTILIZATION = 0.8
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.9
END FAMILY
END VENDOR

design.config.txt
FPGA Families (Altera)
# commenting the next line removes all families of Altera
FPGA_VENDOR = altera
# commenting the next line removes a given family
FPGA_FAMILY = Stratix III
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 0
MAX_MUL_UTILIZATION = 0
MAX_PIN_UTILIZATION = 0.8
END FAMILY
END VENDOR

Library Files
device_lib/xilinx_device_lib.txt
device_lib/altera_device_lib.txt
• Files created during ATHENa setup
• Characterize the FPGA families and devices available in the versions of the Xilinx and Altera tools installed on your computer
• Currently supported tool versions:
  – Xilinx WebPACK 9.1, 9.2, 10.1, 11.1, 11.5, 12.1, 12.2, 12.3
  – Xilinx Design Suite 11.1, 12.1, 12.2, 12.3
  – Altera Quartus II Web Edition 8.1, 8.2, 9.0, 9.1, 10.0
  – Altera Quartus II Subscription Edition 9.1, 10.0
• If a library for a given tool version is not available yet, use the library from the closest available version

Library Files
device_lib/xilinx_device_lib.txt
VENDOR = Xilinx
#Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO
FAMILY = spartan3
xc3s50pq208-5, 768, 4, 0, 4, 124
xc3s200ft256-5, 1920, 12, 0, 12, 173
xc3s400fg456-5, 3584, 16, 0, 16, 264
xc3s1000fg676-5, 7680, 24, 0, 24, 391
xc3s1500fg676-5, 13312, 32, 0, 32, 487
END_FAMILY
FAMILY = virtex5
xc5vlx30ff676-3, 4800, 32, 32, 0, 400
xc5vfx30tff665-3, 5120, 68, 64, 0, 360
xc5vlx30tff665-3, 4800, 36, 32, 0, 360
xc5vlx50ff1153-3, 7200, 48, 48, 0, 560
xc5vlx50tff1136-3, 7200, 60, 48, 0, 480
END_FAMILY

Result Files
report_resource_utilization.txt

xilinx : spartan3
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| GENERIC | DEVICE          | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | %  | DSPs | % | IO | %  |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| default | xc3s200ft256-5* | 1   | 142  | 3 | 74     | 3 | 4     | 33 | 7     | 58 | 0    | 0 | 20 | 11 |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+

xilinx : spartan6
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| GENERIC | DEVICE           | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | %  | IO | %  |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| default | xc6slx9csg324-3* | 1   | 41   | 1 | 22     | 1 | 4     | 6 | 0     | 0 | 9    | 56 | 20 | 10 |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+

xilinx : virtex5
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | % | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| default | xc5vlx20tff323-2* | 1   | 101  | 1 | 56     | 1 | 4     | 15 | 0     | 0 | 9    | 37 | 20 | 11 |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+

xilinx : virtex6
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| default | xc6vlx75tff784-3* | 1   | 44   | 1 | 21     | 1 | 4     | 1 | 0     | 0 | 9    | 3 | 20 | 5 |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+

Result Files
report_timing.txt
REQ SYN FREQ – Requested synthesis clock frequency [MHz]
REQ SYN TCLK – Requested synthesis clock period [ns]
REQ IMP FREQ – Requested implementation clock frequency [MHz]
REQ IMP TCLK – Requested implementation clock period [ns]
SYN FREQ – Achieved synthesis clock frequency [MHz]
SYN TCLK – Achieved synthesis clock period [ns]
IMP FREQ – Achieved implementation clock frequency [MHz]
IMP TCLK – Achieved implementation clock period [ns]
LATENCY – Latency [ns]
THROUGHPUT – Throughput [Mbit/s]
TP/Area – Throughput/Area [(Mbit/s)/CLB slices]
Latency*Area – Latency*Area [ns*CLB slices]

xilinx : spartan3
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE          | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc3s200ft256-5* | 1   | default      | 207.370  | default      | 4.822    | default      | 112.448  | default      | 8.893    | 17.786  | 449.792    | 6.078      | 1316.164     |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : spartan6
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE           | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6slx9csg324-3* | 1   | default      | 75.751   | default      | 13.201   | default      | 78.119   | default      | 12.801   | 25.602  | 312.476    | 14.203     | 563.244      |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc5vlx20tff323-2* | 1   | default      | 156.347  | default      | 6.396    | default      | 126.952  | default      | 7.877    | 15.754  | 507.808    | 9.068      | 882.224      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6vlx75tff784-3* | 1   | default      | 158.053  | default      | 6.327    | default      | 135.410  | default      | 7.385    | 14.770  | 541.638    | 25.792     | 310.170      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

Result Files
report_options.txt
COST TABLE – parameter determining the starting point of placement
Synthesis Options – options of the synthesis tool
Map Options – options of the mapping tool
PAR Options – options of the place & route tool

xilinx : spartan3
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE          | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc3s200ft256-5* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE           | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6slx9csg324-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+

Result Files
report_execution_time.txt
Synthesis Time – Time of synthesis
Implementation Time – Time of implementation
Elapsed Time – Total time

xilinx : spartan3
+---------+-----------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE          | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-----------------+-----+----------------+---------------------+--------------+
| default | xc3s200ft256-5* | 1   | 0d 0h:0m:12s   | 0d 0h:0m:36s        | 0d 0h:0m:48s |
+---------+-----------------+-----+----------------+---------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE           | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+------------------+-----+----------------+---------------------+--------------+
| default | xc6slx9csg324-3* | 1   | 0d 0h:0m:21s   | 0d 0h:1m:13s        | 0d 0h:1m:34s |
+---------+------------------+-----+----------------+---------------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 0d 0h:0m:39s   | 0d 0h:1m:50s        | 0d 0h:2m:29s |
+---------+-------------------+-----+----------------+---------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 0d 0h:0m:22s   | 0d 0h:3m:22s        | 0d 0h:3m:44s |
+---------+-------------------+-----+----------------+---------------------+--------------+

design.config.txt
Functional Simulation (1)
# FUNCTIONAL_VERIFICATION_MODE = <on | off>
FUNCTIONAL_VERIFICATION_MODE = <off>
# directory containing source files of the testbench
VERIFICATION_DIR = <examples/sha256_rs/tb>
# A file containing a list of testbench files in the order suitable for compilation;
# low level modules first, top level entity last.
# Test vector files should be located in the same directory and listed
# in the same file, unless a fixed path is used. Please refer to the tutorial for more details.
VERIFICATION_LIST_FILE = <tb_srcs.txt>
# name of testbench's top level entity
TB_TOP_LEVEL_ENTITY = <sha_tb>
# name of testbench's top level architecture
TB_TOP_LEVEL_ARCH = <behavior>

design.config.txt
Functional Simulation (2)
# MAX_TIME_FUNCTIONAL_VERIFICATION = <$time $unit>
# supported units are: ps, ns, us, and ms
# if blank, simulation will run until it finishes,
# i.e., until there are no changes in signals (the clock is stopped and no more inputs are coming in).
MAX_TIME_FUNCTIONAL_VERIFICATION = <>
# Perform only verification (synthesis and implementation parameters are ignored)
# VERIFICATION_ONLY = <ON | OFF>
VERIFICATION_ONLY = <off>

test_circuit: ATHENa Example Including Embedded FPGA Resources

design.config.txt
Global Generics
GLOBAL_GENERICS_BEGIN
# Number of stages
# n is currently set to the default value, i.e., n = 16
# for other values of n, modify the formulas for Latency and Throughput accordingly
n = 16
# Memory type: 0 = MEM_DISTRIBUTED, 1 = MEM_EMBEDDED
mem_type = 0, 1
# Adder type: 0 = ADD_SCCA_BASED (Simple Carry Chain Adder, "+" in VHDL),
#             1 = ADD_DSP_BASED
# Multiplier type: 0 = MUL_LOGIC_BASED (multiplier based on configurable logic),
#                  1 = MUL_DSP_BASED
# Allowed combinations of adder and multiplier types
(adder_type, multiplier_type) = (0, 0), (1, 1)
GLOBAL_GENERICS_END

design.config.txt
FPGA Family Specific Generics
FPGA_FAMILY = Cyclone II
GENERICS_BEGIN
# FPGA vendor: 0 = XILINX, 1 = ALTERA
vendor = 1
# Memory block size: 0 = M512, 1 = M4K, 2 = M9K, 3 = M20K,
#                    4 = MLAB, 5 = MRAM, 6 = M144K
mem_block_size = 1
GENERICS_END
FPGA_DEVICES = best_match
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 1
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.8
END FAMILY

ATHENa – Database of Results

ATHENa Database
http://cryptography.gmu.edu/athenadb

ATHENa Database – Result View
• Algorithm parameters
• Design parameters
  – Optimization target
  – Architecture type
  – Datapath width
  – I/O bus widths
  – Availability of source code
• Platform
  – Vendor, Family, Device
• Timing
  – Maximum clock frequency
  – Maximum throughput
• Resource utilization
  – Logic blocks (Slices/LEs/ALUTs)
  – Multipliers/DSP units
• Tools
  – Names & versions
  – Detailed options
• Credits
  – Designers & contact information

ATHENa Database – Compare Feature
• Matching fields in grey
• Non-matching fields in red and blue

Currently in the Database
Hash Functions in FPGAs: GMU results for
• 20 hash functions (14 Round 2 SHA-3 + 5 Round 3 SHA-3 + SHA-2)
  x 2 variants (256-bit output & 512-bit output)
  x 11 FPGA families
  = 440 combinations
• 440 combinations minus those that do not fit = 423 optimized results

Coming soon!
• GMU results for Hash Functions in FPGAs
  – Folded & unrolled architectures
  – Pipelined architectures
  – Lightweight architectures
  – Architectures based on embedded resources
• Other groups' results for Hash Functions in FPGAs
• Other groups' results for Hash Functions in ASICs
• Modular Arithmetic (basis of public key cryptography) in FPGAs & ASICs

Possible Future Customizations
The same basic database can be customized and adapted for other domains, such as
• Digital Signal Processing
• Bioinformatics
• Communications
• Scientific Computing, etc.

ATHENa - Website

ATHENa Website
http://cryptography.gmu.edu/athena/
• Download of ATHENa Tool
• Links to related tools
• SHA-3 Competition in FPGAs & ASICs
  – Specifications of candidates
  – Interface proposals
  – RTL source codes
  – Testbenches
  – ATHENa database of results
  – Related papers & presentations

GMU Source Codes and Block Diagrams
• First batch of GMU source codes for all Round 3 SHA-3 candidates & SHA-2 made available at the ATHENa website: http://cryptography.gmu.edu/athena
• Included in this release:
  – Basic architectures
  – Folded architectures
  – Unrolled architectures
• Each code supports two variants: with 256-bit and 512-bit output.
• Each source code is accompanied by comprehensive hierarchical block diagrams.

ATHENa Result Replication Files
• Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
• Automatically created by ATHENa for all results generated using ATHENa
• Stored in the ATHENa Database

In the same spirit of Reproducible Research as:
• J. Claerbout (Stanford University), "Electronic documents give reproducible research a new meaning," in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992, http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92
.....
• P. Vandewalle (EPFL), J. Kovacevic (CMU), and M. Vetterli (EPFL), "Reproducible research in signal processing - what, why, and how," IEEE Signal Processing Magazine, May 2009, http://rr.epfl.ch/17/

Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations of the same cryptographic algorithm
3. hardware platforms from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
4. tools and languages in terms of the quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 12.3)
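Comparisons of this kind ultimately rest on the derived figures of merit in report_timing.txt. The sketch below recomputes those columns from the example rows for the four Xilinx families and ranks them by TP/Area, one possible criterion for goal 3 (comparing hardware platforms). It is written in Python, which is not part of ATHENa; the Result class and all function names are hypothetical. The relationships it uses (frequency = 1000/TCLK with TCLK in ns, TP/Area = THROUGHPUT/SLICES, Latency*Area = LATENCY × SLICES) are the ones the example rows satisfy. The throughput helper follows the design.config.txt formula THROUGHPUT = 512/(TCLK*65), under the assumption that TCLK is in ns and the result is scaled by 1000 to Mbit/s.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One ATHENa result row, hand-copied from the example report_*.txt files (hypothetical class)."""
    family: str
    device: str
    imp_tclk_ns: float      # IMP TCLK: achieved clock period after implementation [ns]
    latency_ns: float       # LATENCY [ns]
    throughput_mbps: float  # THROUGHPUT [Mbit/s]
    slices: int             # SLICES: CLB slices used


def freq_mhz(tclk_ns: float) -> float:
    """Clock frequency [MHz] from a clock period [ns]."""
    return 1000.0 / tclk_ns


def throughput_from_formula(block_bits: int, cycles_per_block: int, tclk_ns: float) -> float:
    """THROUGHPUT = block_bits / (TCLK * cycles); bits/ns scaled by 1000 to Mbit/s (assumed units)."""
    return block_bits / (tclk_ns * cycles_per_block) * 1000.0


def tp_per_area(r: Result) -> float:
    """TP/Area column: throughput per CLB slice [(Mbit/s)/slice]."""
    return r.throughput_mbps / r.slices


def latency_times_area(r: Result) -> float:
    """Latency*Area column [ns * CLB slices]."""
    return r.latency_ns * r.slices


def rank_by_tp_per_area(results: List[Result]) -> List[Result]:
    """Sort results so that the highest throughput-to-area ratio comes first."""
    return sorted(results, key=tp_per_area, reverse=True)


# Values from the example report_timing.txt and report_resource_utilization.txt.
RESULTS = [
    Result("spartan3", "xc3s200ft256-5", 8.893, 17.786, 449.792, 74),
    Result("spartan6", "xc6slx9csg324-3", 12.801, 25.602, 312.476, 22),
    Result("virtex5", "xc5vlx20tff323-2", 7.877, 15.754, 507.808, 56),
    Result("virtex6", "xc6vlx75tff784-3", 7.385, 14.770, 541.638, 21),
]

if __name__ == "__main__":
    # Hypothetical 10 ns clock period, used only to exercise the SHA-256 formula.
    print(f"SHA-256 at TCLK = 10 ns: {throughput_from_formula(512, 65, 10.0):.3f} Mbit/s")
    for r in rank_by_tp_per_area(RESULTS):
        print(f"{r.family:9s} {freq_mhz(r.imp_tclk_ns):8.3f} MHz  "
              f"TP/Area = {tp_per_area(r):7.3f}  Latency*Area = {latency_times_area(r):9.3f}")
```

Treat such a ranking only as a rough indicator. According to the "Major Differences between Xilinx Families" table, a Spartan 3 slice contains 2 LUTs, while a Virtex 5, Virtex 6, or Spartan 6 slice contains 4, so slice counts are not directly comparable across these families.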