ECE 545
Digital System Design with VHDL

Course web page: ECE web page > Courses > ECE 545

Kris Gaj
Research and teaching interests:
• reconfigurable computing
• computer arithmetic
• cryptography
• network security
Contact:
The Engineering Building, room 3225
kgaj@gmu.edu
Office hours: Thursday, 7:30-8:30 PM; Tuesday, 6:00-7:00 PM; and by appointment

ECE 545 is part of:
MS in Computer Engineering
• One of five core courses (must be passed with B or better)
• Fundamental course for the specialization areas Digital Systems Design and Digital Signal Processing
• Elective course in the remaining specialization areas
MS in Electrical Engineering
• Elective
PhD in Electrical and Computer Engineering
• Knowledge tested at the Technical Qualifying Exam (TQE), Topic 2: Digital Design and Computer Organization

[Diagram: choosing a program from "I am interested in…" and "I want to specialize primarily in…". Topics: CAD tools & design automation, VLSI, hardware description languages (VHDL/Verilog), FPGAs & reconfigurable computing, ASICs & FPGAs, computer arithmetic, front-end ASIC design (algorithmic down to gate level), back-end ASIC design (circuit and mask layout levels), analog & digital circuit design, microelectronics, VLSI fabrication, nanoelectronics, semiconductor devices. Recommended program & specialization: MS CpE, Digital Systems Design; or MS EE, Microelectronics/Nanoelectronics.]

[Diagram: courses by design level (algorithmic, register-transfer, gate, transistor, layout, devices): ECE 545 Digital System Design with VHDL, ECE 645 Computer Arithmetic, ECE 681 VLSI Design for ASICs, ECE 682 VLSI Test Concepts, ECE 586 Digital Integrated Circuits, ECE 680 Physical VLSI Design, ECE 584 Semiconductor Device Fundamentals, ECE 684 MOS Device Electronics.]

CpE Digital Systems Design: Pre-Approved Electives
ECE 545 Digital System Design with VHDL
ECE 586 Digital Integrated Circuits
ECE 645 Computer Arithmetic
ECE 681 VLSI Design for ASICs
ECE 682 VLSI Test Concepts
ECE 699 DSP HW Architectures

CpE Microprocessors and Embedded Systems
ECE 510 Real-Time Concepts
ECE 511 Microprocessors
ECE 611 Advanced Microprocessors
ECE 612 Real-Time Embedded Systems
ECE 641 Computer System Architecture

Suggested Electives
CS 540, 583 (languages, algorithms)
CS 635 (parallel machines)
ECE 584, 684, … (technology)
ECE 511, 611, … (microprocessors)
ECE 542, 642, 742 (networks)
ECE 537, 646, 746, … (applications)
ECE 645, 681 (digital design)
ECE 548 (sequential machine theory)

Professors: K. Gaj, K. Hintz, H. Homayoun, T. Storey, A. Cohen; H. Homayoun, J. Kaps, P. Pachowicz, C. Sabzevari

DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Ken Hintz, Houman Homayoun
1. ECE 545 Digital System Design with VHDL – K. Gaj, project, FPGA design with VHDL
2. ECE 645 Computer Arithmetic – K. Gaj, project, FPGA design with VHDL
3. ECE 681 VLSI Design for ASICs – H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
4. ECE 586 Digital Integrated Circuits – D. Ioannou, R. Mulpuri
5a. ECE 682 VLSI Test Concepts – T. Storey
5b. ECE 699 Digital Signal Processing Hardware Architectures – A. Cohen, project, FPGA design with VHDL and Matlab/Simulink

DIGITAL SIGNAL PROCESSING
Concentration advisors: Aaron Cohen, Kris Gaj, Ken Hintz, Jill Nelson, Kathleen Wage
1. ECE 535 Digital Signal Processing – L. Griffiths, J. Nelson, Matlab
2. ECE 545 Digital System Design with VHDL – K. Gaj, project, FPGA design with VHDL
3. ECE 645 Computer Arithmetic – K. Gaj, project, FPGA design with VHDL
4. ECE 699 Digital Signal Processing Hardware Architectures – A. Cohen, project, FPGA design with VHDL and Matlab/Simulink
5a. ECE 537 Introduction to Digital Image Processing – K. Hintz
5b. ECE 738 Advanced Digital Signal Processing – K. Wage

Grading Scheme
• Homework - 15%
• Project - 35%
• Midterm Exam - 20%
• Final Exam - 30%

Midterm Exam
• 2 hours 40 minutes, in class
• design-oriented
• open-books, open-notes
• practice exams available on the web
• Tentative date: last week of October

Final Exam
• 2 hours 45 minutes, in class
• design-oriented
• open-books, open-notes
• practice exams available on the web
• Date: Thursday, December 12, 4:30-7:15pm

Textbooks
Required Textbook
Pong P. Chu, RTL Hardware Design Using VHDL, Wiley-Interscience, 2006.
Supplementary Textbook – Basics Refresher
Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, McGraw-Hill, 3rd or 2nd Edition.
Supplementary Textbook – Advanced
Hubert Kaeslin, Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication, Cambridge University Press, 1st Edition, 2008. Used in ECE 681 "VLSI Design for ASICs".

Technology & Tools

What is an FPGA?
[Diagram: FPGA fabric of Configurable Logic Blocks surrounded by Block RAMs and I/O Blocks.]

FPGA Design process (1)
Example specification: Design and implement a simple unit that speeds up encryption with an RC5-similar cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
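The example specification asks for an "RC5-similar" cipher but does not define its round function. As a rough sketch, assuming the standard RC5-32 half-round published by Rivest, A = ((A xor B) <<< B) + S[2i], where the rotation amount is the five least significant bits of B, one half-round could be written as a VHDL function. The package name rc5_sketch_pkg and the function name rc5_half_round are hypothetical, and the course's cipher may differ from standard RC5.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: one half-round of standard RC5-32 encryption.
-- The "RC5-similar" cipher in the specification may differ from this.
package rc5_sketch_pkg is
  -- Returns ((a xor b) <<< b) + s, where the rotation amount is the
  -- five least significant bits of b and the addition is modulo 2**32.
  function rc5_half_round(a, b, s : std_logic_vector(31 downto 0))
    return std_logic_vector;
end package rc5_sketch_pkg;

package body rc5_sketch_pkg is
  function rc5_half_round(a, b, s : std_logic_vector(31 downto 0))
    return std_logic_vector is
    variable x   : unsigned(31 downto 0);
    variable amt : natural range 0 to 31;
  begin
    x   := unsigned(a xor b);                    -- mix the two words
    amt := to_integer(unsigned(b(4 downto 0)));  -- data-dependent rotation amount
    x   := rotate_left(x, amt);                  -- circular left shift
    return std_logic_vector(x + unsigned(s));    -- add round key, wraps modulo 2**32
  end function rc5_half_round;
end package body rc5_sketch_pkg;
```

Calling the function twice, with the roles of the two words swapped, gives one full RC5 round. In a basic iterative architecture, a datapath could evaluate one such round per clock cycle and repeat it for the 32 rounds the specification requires.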
Specification / Pseudocode → On-paper hardware design (Block diagram & ASM chart) → VHDL description (your source files), for example the interface of the RC5 core:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;  -- standard arithmetic package (instead of non-standard std_logic_unsigned)

    entity RC5_core is
      port(
        clock, reset, encr_decr : in  std_logic;
        data_input              : in  std_logic_vector(31 downto 0);
        data_output             : out std_logic_vector(31 downto 0);
        out_full                : in  std_logic;
        key_input               : in  std_logic_vector(31 downto 0);
        key_read                : out std_logic   -- no semicolon after the last port
      );
    end RC5_core;  -- closing name must match the entity name

→ Functional simulation → Synthesis → Post-synthesis simulation

FPGA Design process (2)
Implementation → Timing simulation → Configuration → On-chip testing

Simulation Tools

FPGA Synthesis Tools

Logic Synthesis
VHDL description:

    library ieee;
    use ieee.std_logic_1164.all;

    -- entity reconstructed from the signals used in MLU_DATAFLOW
    entity MLU is
      port(
        A, B                : in  std_logic;
        NEG_A, NEG_B, NEG_Y : in  std_logic;
        L1, L0              : in  std_logic;
        Y                   : out std_logic
      );
    end MLU;

    architecture MLU_DATAFLOW of MLU is
      signal A1, B1, Y1 : std_logic;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : std_logic;
      signal L : std_logic_vector(1 downto 0);  -- function select code
    begin
      -- optional inversion of the inputs and of the output
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;
      -- candidate logic functions
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      -- VHDL-93 does not accept a concatenation as a select expression,
      -- so L1 & L0 is first assigned to a declared signal
      L <= L1 & L0;
      with L select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

→ Circuit netlist

FPGA Implementation
• After synthesis, the entire implementation process is performed by FPGA vendor tools.

Design Process Control from Active-HDL

Xilinx FPGA Tools, ECE Labs
Aldec Active-HDL Design Flow:
• simulation: Aldec Active-HDL (IDE)
• synthesis: Xilinx XST or Synopsys Synplify Premier
• implementation: Xilinx ISE Design Suite
Xilinx ISE Design Flow:
• simulation: ModelSim or ISim
• synthesis: Xilinx XST or Synopsys Synplify Premier
• implementation: Xilinx ISE Design Suite (IDE)

Xilinx FPGA Tools, Home
Aldec Active-HDL Design Flow:
• simulation: Aldec Active-HDL Student Edition (IDE)
• synthesis: Xilinx XST (restricted)
• implementation: Xilinx ISE WebPACK (restricted)
Xilinx ISE Design Flow:
• simulation: ISim
• synthesis: Xilinx XST (restricted)
• implementation: Xilinx ISE WebPACK (IDE) (restricted)

Altera FPGA Tools, ECE Labs
Altera Design Flow:
• simulation: Mentor Graphics ModelSim-Altera
• synthesis & implementation: Altera Quartus II Subscription Edition
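Functional simulation, the first simulation step in the design process above, is easier when a testbench can compare the design against an independent reference model. A minimal sketch of such a model for the MLU follows, assuming the port names used in the MLU_DATAFLOW architecture; the package mlu_ref_pkg and the function mlu_ref are hypothetical names, not part of the course materials.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reference model of the MLU for functional simulation.
-- It mirrors MLU_DATAFLOW with sequential statements (if/case) instead
-- of concurrent ones, so a testbench can compare the two descriptions.
package mlu_ref_pkg is
  function mlu_ref(a, b, neg_a, neg_b, neg_y, l1, l0 : std_logic)
    return std_logic;
end package mlu_ref_pkg;

package body mlu_ref_pkg is
  function mlu_ref(a, b, neg_a, neg_b, neg_y, l1, l0 : std_logic)
    return std_logic is
    variable a1, b1, y1 : std_logic;
    variable sel        : std_logic_vector(1 downto 0);
  begin
    -- optional input inversion (pass through only when NEG_x = '0')
    if neg_a = '0' then a1 := a; else a1 := not a; end if;
    if neg_b = '0' then b1 := b; else b1 := not b; end if;
    -- function select: "00" and, "01" or, "10" xor, otherwise xnor
    sel := l1 & l0;
    case sel is
      when "00"   => y1 := a1 and b1;
      when "01"   => y1 := a1 or b1;
      when "10"   => y1 := a1 xor b1;
      when others => y1 := a1 xnor b1;
    end case;
    -- optional output inversion
    if neg_y = '0' then
      return y1;
    else
      return not y1;
    end if;
  end function mlu_ref;
end package body mlu_ref_pkg;
```

A testbench could drive A, B, NEG_A, NEG_B, NEG_Y, L1, and L0 through all 128 combinations and assert that Y equals mlu_ref(...) for each. Because the model is written in a different style from the dataflow description, a mismatch is more likely to reveal a real design error than an error copied into both.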
Altera FPGA Tools, Home
Altera Design Flow:
• simulation: Mentor Graphics ModelSim-Altera Starter (restricted)
• synthesis & implementation: Altera Quartus II Web Edition (restricted)

Project

Two Major Areas to Choose From
• Cryptography
• Digital Signal Processing

Cryptography Project
• requires knowledge of C or Java
• related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU, supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard

DSP Project (1)
• requires knowledge of Matlab and the basics of digital signal processing
• processing of waveform audio files (.wav, .wave) and images
• co-advised by Dr. Aaron Cohen
• recommended (but not required) for students who would like to specialize in digital signal processing
• soft introduction to ECE 699 Digital Signal Processing Hardware Architectures

DSP Project (2)
Examples of Topics:
• Signal Spectrum Analyzer (FFT)
• VoIP Conference Call voice mixing
• Background adaptive noise canceling
• Video/Audio codecs
• Classifiers (Speech Recognition)
• Adaptive Filtering with Quantization
• Compressed Sensing Solver for Piano Notes

Background for Cryptography Projects

Crypto 101

Cryptography is Everywhere
• Buying a book on-line
• Teleconferencing over Intranets
• Withdrawing cash from an ATM
• Backing up files on a remote server

Cryptographic Standards Before 1997
[Timeline, 1970-2010: secret-key block ciphers: DES (Data Encryption Standard, IBM & NSA), Triple DES; hash functions: SHA, SHA-1 (Secure Hash Algorithm, NSA), SHA-2.]

Why a Contest for a Cryptographic Standard?
• Avoid back-door theories
• Speed up the acceptance of the standard
• Stimulate non-classified research on methods of designing a specific cryptographic transformation
• Focus the effort of a relatively small cryptographic community

Cryptographic Standard Contests (timeline, 1996-2013)
• AES: IX.1997 to X.2000, 15 block ciphers, 1 winner
• NESSIE: I.2000 to XII.2002
• CRYPTREC
• eSTREAM: XI.2004 to V.2008, 34 stream ciphers, 4 HW winners + 4 SW winners
• SHA-3: X.2007 to XII.2012, 51 hash functions, 1 winner

Cryptographic Contests - Evaluation Criteria
• Security
• Software Efficiency: μProcessors, μControllers
• Hardware Efficiency: FPGAs, ASICs
• Flexibility
• Simplicity
• Licensing

Specific Challenges of Evaluations in Cryptographic Contests
• Very wide range of possible applications and, as a result, of performance and cost targets:
  - throughput: from single Mbit/s to hundreds of Gbit/s
  - cost: from single cents to thousands of dollars
• Winner in use for the next 20-30 years, implemented using technologies not in existence today
• Large number of candidates
• Limited time for evaluation
• Only one winner, and the results are final

Mitigating Circumstances
• Security is a primary criterion
• Performance of competing algorithms tends to vary significantly (sometimes by as much as 500 times)
• Only relatively large differences in performance matter (typically at least 20%)
• Multiple groups independently implement the same algorithms (catching mistakes, comparing best results, etc.)
• Second best may be good enough

AES Contest 1997-2000

Rules of the Contest
Each team submits:
• Detailed cipher specification
• Justification of design decisions
• Source code in C
• Source code in Java
• Tentative results of cryptanalysis
• Test vectors

AES: Candidate Algorithms (15, by country of origin)
• Canada: CAST-256, Deal
• USA: Mars, RC6, Twofish, Safer+, HPC
• Costa Rica: Frog
• Germany: Magenta
• Belgium: Rijndael
• France: DFC
• Israel, UK, Norway: Serpent
• Korea: Crypton
• Japan: E2
• Australia: LOKI97

AES Contest Timeline
• June 1998: 15 candidates: CAST-256, Crypton, Deal, DFC, E2, Frog, HPC, LOKI97, Magenta, Mars, RC6, Rijndael, Safer+, Serpent, Twofish
• Round 1 criteria: security, software efficiency
• August 1999: 5 final candidates: Mars, RC6, Twofish (USA); Rijndael, Serpent (Europe)
• Round 2 criteria: security, software efficiency, hardware efficiency
• October 2000: 1 winner: Rijndael (Belgium)

NIST Report: Security & Simplicity
[Diagram: candidates placed by Security (Adequate to High) and Simplicity (Complex to Simple): MARS, Twofish, Serpent, Rijndael, RC6.]

Efficiency in software: NIST-specified platform
[Chart: 200 MHz Pentium Pro, Borland C++. Throughput [Mbit/s] with 128-, 192-, and 256-bit keys for Rijndael, RC6, Twofish, Mars, and Serpent.]

NIST Report: Software Efficiency
[Table: encryption and decryption speed rated high, medium, or low on 32-bit processors, 64-bit processors, and DSPs for RC6, Rijndael, Twofish, Mars, and Serpent; Serpent is in the low category on all three platforms.]

Efficiency in FPGAs: Speed
[Chart: throughput [Mbit/s] on Xilinx Virtex XCV-1000 for Serpent x8, Rijndael, Twofish, Serpent x1, RC6, and Mars; results from George Mason University, University of Southern California, and Worcester Polytechnic Institute.]

Efficiency in ASICs: Speed
[Chart: throughput [Mbit/s], MOSIS 0.5 μm, NSA Group, with 128-bit key scheduling and with 3-in-1 (128-, 192-, 256-bit) key scheduling, for Rijndael, Serpent x1, Twofish, RC6, and Mars.]

Lessons Learned
Results for ASICs matched the results for FPGAs very well, and both were very different from the results in software.
[Figure: FPGA ranking (GMU+USC, Xilinx Virtex XCV-1000; Serpent x8 and x1) vs. ASIC ranking (NSA Team, 0.5 μm MOSIS; Serpent x1).]
Serpent: fastest in hardware, slowest in software.

Lessons Learned
Hardware results matter!
Final round of the AES Contest, 2000
[Figure: speed in FPGAs (GMU results) compared with the votes at the AES 3 conference.]

Limitations of the AES Evaluation
• Optimization for maximum throughput
• Single high-speed architecture per candidate
• No use of embedded resources of FPGAs (Block RAMs, dedicated multipliers)
• Single FPGA family from a single vendor: Xilinx Virtex

FPGA Evaluations                AES          eSTREAM                 SHA-3
Multiple FPGA families          No           No                      Yes
Multiple architectures          No           Yes                     Yes
Use of embedded resources       No           No                      Yes
Primary optimization target     Throughput   Area, Throughput/Area   Throughput/Area
Experimental results            No           No                      Yes
Availability of source codes    No           No                      Yes
Specialized tools               No           No                      Yes

ASIC Evaluations                AES          eSTREAM                 SHA-3
Multiple processes/libraries    No           No                      Yes
Multiple architectures          No           Yes                     Yes
Primary optimization target     Throughput   Power x Area x Time     Throughput/Area
Post-layout results             No           Yes                     Yes
Experimental results            No           Yes                     Yes
Availability of source codes    No           No                      Yes
Specialized tools               No           No                      No

Benchmarking Tools

Tools for Benchmarking Implementations of Cryptography
• Software: eBACS, D. Bernstein (UIC), T. Lange (TUE), 2006-present
• FPGAs: ATHENa, K. Gaj, J. Kaps, et al. (GMU), 2009-present
• ASICs: ?

Benchmarking in Software: eBACS

eBACS: ECRYPT Benchmarking of Cryptographic Systems: http://bench.cr.yp.to/
SUPERCOP: toolkit developed by D. Bernstein and T. Lange for measuring the performance of cryptographic software
• measurements on multiple machines (currently over 90)
• each implementation is recompiled multiple times (currently over 1600 times) with various compiler options
• time measured in clock cycles/byte for multiple input/output sizes
• median, lower quartile (25th percentile), and upper quartile (75th percentile) reported
• standardized function arguments (common API)

SUPERCOP Extension for Microcontrollers: XBX, 2009-present
• Allows on-board timing measurements
• Supports at least the following microcontrollers:
  - 8-bit: Atmel ATmega1284P (AVR)
  - 32-bit: TI AR7 (MIPS), Atmel AT91RM9200 (ARM 920T), Intel XScale IXP420 (ARM v5TE), Cortex-M3 (ARM)
• Developers: Christian Wenzel-Benner, ITK Engineering AG, Germany; Jens Gräf, LiNetCo GmbH, Heiger, Germany

Benchmarking in FPGAs: ATHENa

ATHENa: Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Open-source benchmarking environment, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.
The most recent version, 0.6.4, was released in December 2012.

Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."
from "Athena, Greek Goddess of Wisdom and Craftsmanship"

Basic Dataflow of ATHENa
[Diagram: numbered dataflow between the Designer, the User, and the ATHENa Server: interfaces + testbenches; download of scripts and configuration files; HDL + FPGA tools; FPGA synthesis and implementation; result summary + database entries; database query; ranking of designs.]

Three Components of the ATHENa Environment
• ATHENa Tool
• ATHENa Database of Results
• ATHENa Website

ATHENa Database of Results
http://cryptography.gmu.edu/athenadb

ATHENa Database: Result View
• Algorithm parameters
• Design parameters: optimization target, architecture type, datapath width, I/O bus widths, availability of source code
• Platform: vendor, family, device
• Timing: maximum clock frequency, maximum throughput
• Resource utilization: logic blocks (Slices/LEs/ALUTs), multipliers/DSP units
• Tools: names & versions, detailed options
• Credits: designers & contact information

ATHENa Database: Compare Feature
Matching fields in grey; non-matching fields in red and blue.

ATHENa Website
http://cryptography.gmu.edu/athena/
• Download of ATHENa Tool
• Links to related tools
• SHA-3 Competition in FPGAs & ASICs:
  - Specifications of candidates
  - Interface proposals
  - RTL source codes
  - Testbenches
  - ATHENa database of results
  - Related papers & presentations

ATHENa Result Replication Files
• Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
• Automatically created by ATHENa for all results generated using ATHENa
• Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
• J. Claerbout (Stanford University), "Electronic documents give reproducible research a new meaning," in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992, http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92 .....
• Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), "Reproducible research in signal processing - what, why, and how," IEEE Signal Processing Magazine, May 2009, http://rr.epfl.ch/17/

Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations of the same cryptographic algorithm
3. hardware platforms, from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
4. tools and languages, in terms of the quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 12.3)

Your Project: Implementation and Benchmarking of Authenticated Ciphers

Features of Authenticated Ciphers (each illustrated with Alice, Bob, and Charlie)
1. Confidentiality
2. Message integrity
3. Message authentication

All Projects - Organization
• Projects divided into phases
• Deliverables for each phase submitted through Blackboard at selected checkpoints and evaluated by the instructor and/or TA
• Feedback provided to students on a best-effort basis
• Final report and codes submitted using Blackboard at the end of the semester

Honor Code Rules
• All students are expected to write and debug their codes individually
• Students are encouraged to help and support each other in all problems related to the:
  - operation of the CAD tools
  - understanding of an investigated algorithm and existing implementations
  - understanding of the project tasks

Additional Skills Learned in the Project
• Reading & understanding the specification of a complex algorithm
• Design of new hardware architectures based on existing architectures (datapath & controller)
• Reading, understanding, and modifying existing VHDL code
• Using embedded resources of modern FPGAs
• Characterizing the performance of your codes for multiple FPGA families
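Characterizing performance usually starts from the maximum clock frequency reported by the FPGA tools (listed under Timing in the ATHENa result view). A sketch for a basic iterative architecture, assuming one block of b bits is processed in N_cyc clock cycles at clock frequency f_max (these symbols, and the numbers in the example line, are hypothetical and introduced here only for illustration):

```latex
\mathrm{Throughput} = \frac{b \cdot f_{\max}}{N_{\mathrm{cyc}}},
\qquad
\frac{\mathrm{Throughput}}{\mathrm{Area}} = \frac{b \cdot f_{\max}}{N_{\mathrm{cyc}} \cdot \mathrm{Area}}

\text{e.g. } b = 128,\; N_{\mathrm{cyc}} = 11,\; f_{\max} = 100\,\mathrm{MHz}:\quad
\mathrm{Throughput} = \frac{128 \cdot 100\,\mathrm{MHz}}{11} \approx 1163.6\,\mathrm{Mbit/s}
```

Area is expressed in the platform's own logic-block units (Slices, LEs, or ALUTs), so Throughput/Area values are most meaningful when compared within one FPGA family.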