Kris Gaj
Research and teaching interests:
• cryptography
• computer arithmetic
• VLSI design and testing
Contact:
Engineering Bldg., room 3225
kgaj@gmu.edu
(703) 993-1575
Office hours: Monday, 7:30-8:30 PM, Tuesday & Thursday 4:30-5:30 PM, and by appointment

ECE 645
Part of:
MS in CpE
- Digital Systems Design: pre-approved course
- Other concentration areas: elective course
MS in EE
Certificate in VLSI Design/Manufacturing
PhD in ECE
PhD in IT

DIGITAL SYSTEMS DESIGN
1. ECE 545 Digital System Design with VHDL - K. Gaj, project, FPGA design with VHDL, Aldec/Xilinx/Altera
2. ECE 645 Computer Arithmetic - K. Gaj, project, FPGA design with VHDL or Verilog, Aldec/Xilinx/Altera/Synopsys
3. ECE 586 Digital Integrated Circuits - D. Ioannou
4. ECE 681 VLSI Design for ASICs - N. Klimavicz, project/lab, front-end and back-end ASIC design with Synopsys tools
5. ECE 682 VLSI Test Concepts - T. Storey, homework

Prerequisites
ECE 545 Digital System Design with VHDL
or
Permission of the instructor, granted assuming that you know
• VHDL or Verilog
• a high-level programming language (preferably C)

Prerequisite knowledge
• This class assumes proficiency with the FPGA CAD tools from ECE 545
• You are expected to be proficient with:
- Synthesizable VHDL coding
- Advanced VHDL testbenches, including file input/output
- Xilinx FPGA synthesis and post-synthesis simulation
- Xilinx FPGA place-and-route and post-place-and-route simulation
- Reading and interpreting all synthesis and implementation reports

Course web page
ECE web page > Courses > Course web pages > ECE 645
http://ece.gmu.edu/coursewebpages/ECE/ECE645/S11/

Computer Arithmetic
Lecture
• Homework 10%
• Midterm Exam (in class) 15%
• Final Exam (in class) 25%
Project
• Project 1 20%
• Project 2 30%

Advanced digital circuit design course covering
Efficient
• addition and subtraction
• multiplication
• division and modular reduction
• exponentiation
applied to
• Integers: unsigned and signed
• Real numbers: fixed point; single and double precision floating point
• Elements of the Galois field $GF(2^n)$: polynomial base

Lecture topics

INTRODUCTION
1. Applications of computer arithmetic algorithms. Initial discussion of project topics.

ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on Parallel Prefix Networks

MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda Trees
3. Adding multiple unsigned and signed numbers

TECHNOLOGY
1. Internal Structure of Xilinx and Altera FPGAs
2. Two-operand and multi-operand addition in FPGAs
3. Pipelining

NUMBER REPRESENTATIONS
• Unsigned Integers
• Signed Integers
• Fixed-point real numbers
• Floating-point real numbers
• Elements of the Galois Field $GF(2^n)$

LONG INTEGER ARITHMETIC
1. Modular Exponentiation
2. Montgomery Multipliers and Exponentiation Units

MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring

TECHNOLOGY
Multiplication in Xilinx and Altera FPGAs
- using distributed logic
- using embedded multipliers
- using DSP blocks

DIVISION
1. Basic restoring and non-restoring sequential dividers
2. SRT and high-radix dividers
3. Array dividers

FLOATING POINT AND GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois Field $GF(2^n)$ units

Literature (1)
Required textbook:
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, 2nd edition, Oxford University Press, 2010.

Literature (2)
Recommended textbooks:
Jean-Pierre Deschamps, Gery Jean Antoine Bioul, Gustavo D. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems, Wiley-Interscience, 2006.
Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002.

Literature (3)
VHDL books:
1. Pong P. Chu, RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability, Wiley-IEEE Press, 2006.
2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.
3. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

Literature (4)
Supplementary books:
1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990.
2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998.

Literature (5)
Proceedings of conferences
ARITH - International Symposium on Computer Arithmetic
ASIL - Asilomar Conference on Signals, Systems, and Computers
ICCD - International Conference on Computer Design
CHES - Workshop on Cryptographic Hardware and Embedded Systems
Journals and periodicals
IEEE Transactions on Computers, in particular special issues on computer arithmetic
IEEE Transactions on Circuits and Systems
IEEE Transactions on Very Large Scale Integration
IEE Proceedings: Computer and Digital Techniques
Journal of VLSI Signal Processing

Homework
• reading assignments
• design of small hardware units using VHDL
• analysis of computer arithmetic algorithms and implementations

Exams
Midterm Exam - 2 hrs 30 minutes, in class
multiple choice + short problems
Final Exam - 2 hrs 45 minutes
comprehensive
conceptual questions, analysis and design of arithmetic units
Practice exams on the web
Tentative days of exams:
Midterm Exam - Monday, March 28
Final Exam - Tuesday, May 16, 4:30-7:15 PM

Project 1
Project I (individual, 20% of grade)
Optimizing addition in Skein
Choosing the optimal architecture for
• combinational adder
• pipelined adder
in
• Xilinx FPGAs (Virtex 5 & Virtex 6)
• Altera FPGAs (Stratix III & Stratix IV)
• ASICs (bonus)
Done individually
Final report due Monday, March 14

Basic Operations of 14 SHA-3 Candidates
[Table not recoverable from the source. Legend: NTT - Number Theoretic Transform, GF MUL - Galois Field multiplication, MUL - integer multiplication, mADDn - multioperand addition with n operands]

Basic operation in Skein x1 and Skein x4
[Figures: basic operation, MIX, in Skein x1 (basic iterative); basic operation, MIX, in Skein x4 (4 times unrolled)]

How to Increase the Speed? The case for pipelining and parallel processing
• Protocols: IPSec, SSL, WLAN (802.11)
• Required throughput range: 100 Mbit/s - 40 Gbit/s (based on the specs of security processors from Cavium Networks, HiFn, and Broadcom)
• Supported sizes of packets: 40 B - 1500 B
  1500 B = Maximum Transmission Unit (MTU) for Ethernet v2
  576 B = Maximum Transmission Unit (MTU) for Internet IPv4 Path
• Most common operation involving hashing: HMAC

Cumulative Distribution of Packet Sizes
[Figure]

Multiple Packets Available for Parallel Processing
[Figure]

Parallel Processing
[Figure: k independent data streams (Data Stream 1 ... Data Stream k), each with its own unit; block labels: IV, H, R, Step t, CLR, Wt, Kt]

Pipelining
[Figure: a single unit in which step t is split into stage 1 and stage 2; block labels: IV, H, R1, R2, Wt, Kt]

Project 2
Project II (in groups of two or individually, 30% of grade)
Modular Exponentiation of Large Integers
Investigation of alternative architectures for the best performance in terms of
• Latency
• Latency x Area product
in
• Xilinx FPGAs (Virtex 5 & Virtex 6)
• Altera FPGAs (Stratix III & Stratix IV)
• ASICs (bonus)
Final report due Monday, May 9

Primary applications (1)
Execution units of general purpose microprocessors
• Integer units: integers (8, 16, 32, 64 bits)
• Floating point units: real numbers (32, 64 bits)

Primary applications (2)
Digital signal and digital image processing
e.g.,
• digital filters
• Discrete Fourier Transform
• Discrete Hilbert Transform
General purpose DSP processors
Specialized circuits
Real or complex numbers (fixed-point or floating point)

Primary applications (3)
Coding
• Error detection codes
• Error correcting codes
Elements of the Galois fields $GF(2^n)$ (4-64 bits)

Secret-key (Symmetric) Cryptosystems
[Figure: Alice encrypts with the key of Alice and Bob, $K_{AB}$; the ciphertext crosses the network; Bob decrypts with the same key $K_{AB}$]

Hash Function
message m (arbitrary length) -> hash function h -> hash value h(m) (fixed length)
It is computationally infeasible to find two different messages m and m' such that h(m) = h(m').

Primary applications (4)
Cryptography
• IDEA, RC6, Mars, SHA-3 candidates: integers (16, 32, 64 bits)
• Twofish, Rijndael, SHA-3 candidates: elements of the Galois field $GF(2^n)$ (4, 8 bits)

Main and auxiliary operations
• RC6: main operations 2 x SQR32, 2 x ROL32; auxiliary operations XOR, ADD/SUB32
• MARS: main operations MUL32, 2 x ROL32, S-box 9x32; auxiliary operations XOR, ADD/SUB32
• Twofish: main operations 96 S-box 4x4, 24 MUL $GF(2^8)$; auxiliary operations XOR, ADD32
• Rijndael: main operations 16 S-box 8x8, 24 MUL $GF(2^8)$; auxiliary operations XOR
• Serpent: main operations 8 x 32 S-box 4x4; auxiliary operations XOR

Public Key (Asymmetric) Cryptosystems
[Figure: Alice encrypts with the public key of Bob, $K_B$; the ciphertext crosses the network; Bob decrypts with the private key of Bob, $k_B$]

RSA as a trap-door one-way function
PUBLIC KEY: $C = f(M) = M^e \bmod N$
PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
$N = P \cdot Q$, where P, Q are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

RSA keys
PUBLIC KEY: { e, N }
PRIVATE KEY: { d, P, Q }
$N = P \cdot Q$, where P, Q are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

Primary applications (5)
Cryptography
Public key cryptography
• RSA, DSA, Diffie-Hellman: long integers (1k-16k bits)
• Elliptic Curve Cryptosystems: elements of the Galois field $GF(2^n)$ (150-500 bits)

Primary applications (6)
Cipher Breaking
Public key cryptography
RSA PUBLIC KEY { e, N } -> RSA PRIVATE KEY { d, P, Q }
$N = P \cdot Q$
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
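
Example: RSA modular exponentiation
The RSA relations on the preceding slides (encryption as M to the power e modulo N, decryption as C to the power d modulo N) both reduce to modular exponentiation, the operation studied in Project 2. The following is a minimal software sketch of the left-to-right binary (square-and-multiply) method, not a description of any hardware architecture from the course. The function name `mod_exp` and the toy parameters (P = 61, Q = 53, e = 17, d = 2753) are assumptions chosen for illustration; real RSA operands are 1k-16k bits long.

```python
def mod_exp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Illustrative sketch only: one squaring per exponent bit, plus one
    multiplication for every bit that is 1.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    if modulus < 1:
        raise ValueError("modulus must be positive")
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square
        if bit == "1":
            result = (result * base) % modulus    # multiply
    return result


# Toy RSA parameters (hypothetical, far too small for real use).
P, Q = 61, 53
N = P * Q                      # modulus, N = P * Q
PHI = (P - 1) * (Q - 1)
E, D = 17, 2753                # must satisfy E * D = 1 mod PHI
assert (E * D) % PHI == 1

message = 65
ciphertext = mod_exp(message, E, N)      # C = M^e mod N
recovered = mod_exp(ciphertext, D, N)    # M = C^d mod N
```

With these values, `recovered` equals `message`, and `mod_exp` agrees with Python's built-in three-argument `pow`. In hardware, each modular multiplication in the loop would typically be replaced by a dedicated unit such as a Montgomery multiplier, which is where latency and area trade-offs arise.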