The GMU ECE Department

ECE 545
Digital System Design with VHDL
Course web page: ECE web page → Courses → ECE 545
Kris Gaj
Research and teaching interests:
• reconfigurable computing
• computer arithmetic
• cryptography
• network security
Contact:
The Engineering Building, room 3225
kgaj@gmu.edu
Office hours: Thursday, 7:30-8:30 PM,
Tuesday, 6:00-7:00 PM,
and by appointment
ECE 545
Part of:
MS in Computer Engineering
One of five core courses
(must be passed with B or better)
Fundamental course for the specialization areas:
Digital Systems Design
Digital Signal Processing
Elective course in the remaining specialization areas
MS in Electrical Engineering
Elective
ECE 545
Part of:
PhD in Electrical and Computer Engineering
Knowledge tested at the
Technical Qualifying Exam (TQE)
Topic 2: Digital Design and Computer Organization
I am interested in… CAD tools & Design Automation, VLSI, Hardware Description Languages
I want to specialize primarily in… / Recommended program & specialization:
• FPGAs & Reconfigurable Computing, ASICs & FPGAs, Computer Arithmetic, VHDL/Verilog, Front-end ASIC Design (algorithmic down to gate level), CAD Tools → MS CpE, Digital Systems Design
• Back-end ASIC Design (circuit and mask layout levels), Analog & Digital Circuit Design, Microelectronics, VLSI Fabrication, Nanoelectronics, Semiconductor Devices → MS EE, Microelectronics/Nanoelectronics
Courses by design level (algorithmic, register-transfer, gate, transistor, layout, devices):
ECE 545 Digital System Design with VHDL
ECE 645 Computer Arithmetic
ECE 681 VLSI Design for ASICs
ECE 682 VLSI Test Concepts
ECE 586 Digital Integrated Circuits
ECE 680 Physical VLSI Design
ECE 584 Semiconductor Device Fundamentals
ECE 684 MOS Device Electronics
CpE: Digital Systems Design
Pre-approved Electives
ECE 545 Digital System Design with VHDL
ECE 586 Digital Integrated Circuits
ECE 645 Computer Arithmetic
ECE 681 VLSI Design for ASICs
ECE 682 VLSI Test Concepts
ECE 699 DSP HW Architectures
CpE: Microprocessors and Embedded Systems
ECE 510 Real-Time Concepts
ECE 511 Microprocessors
ECE 611 Advanced Microprocessors
ECE 612 Real-Time Embedded Systems
ECE 641 Computer System Architecture
Suggested Electives
CS 540, 583 (languages, algorithms)
CS 635 (parallel machines)
ECE 584, 684, … (technology)
ECE 511, 611, … (microprocessors)
ECE 542, 642, 742 (networks)
ECE 537, 646, 746, … (applications)
ECE 645, 681 (digital design)
ECE 548 (sequential machine theory)
Professors
K. Gaj, K. Hintz, H. Homayoun, T. Storey, A. Cohen
H. Homayoun, J. Kaps, P. Pachowicz, C. Sabzevari
DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Ken Hintz, Houman Homayoun
1. ECE 545 Digital System Design with VHDL
– K. Gaj, project, FPGA design with VHDL
2. ECE 645 Computer Arithmetic
– K. Gaj, project, FPGA design with VHDL
3. ECE 681 VLSI Design for ASICs
– H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
4. ECE 586 Digital Integrated Circuits
– D. Ioannou, R. Mulpuri
5a. ECE 682 VLSI Test Concepts
– T. Storey
5b. ECE 699 Digital Signal Processing Hardware Architectures
– A. Cohen, project, FPGA design with VHDL and Matlab/Simulink
DIGITAL SIGNAL PROCESSING
Concentration advisors: Aaron Cohen, Kris Gaj, Ken Hintz, Jill Nelson, Kathleen Wage
1. ECE 535 Digital Signal Processing
– L. Griffiths, J. Nelson, Matlab
2. ECE 545 Digital System Design with VHDL
– K. Gaj, project, FPGA design with VHDL
3. ECE 645 Computer Arithmetic
– K. Gaj, project, FPGA design with VHDL
4. ECE 699 Digital Signal Processing Hardware Architectures
– A. Cohen, project, FPGA design with VHDL and Matlab/Simulink
5a. ECE 537 Introduction to Digital Image Processing
– K. Hintz
5b. ECE 738 Advanced Digital Signal Processing
– K. Wage
Grading Scheme
• Homework: 15%
• Project: 35%
• Midterm Exam: 20%
• Final Exam: 30%
Midterm exam
• 2 hours 40 minutes
• in class
• design-oriented
• open-book, open-notes
• practice exams available on the web
Tentative date: last week of October
Final exam
• 2 hours 45 minutes
• in class
• design-oriented
• open-book, open-notes
• practice exams available on the web
Date: Thursday, December 12, 4:30-7:15 PM
Textbooks
Required Textbook
Pong P. Chu, RTL Hardware Design Using VHDL,
Wiley-Interscience, 2006.
Supplementary Textbook – Basics Refresher
Stephen Brown and Zvonko Vranesic,
Fundamentals of Digital Logic with VHDL Design,
McGraw-Hill, 3rd or 2nd Edition
Supplementary Textbook – Advanced
Hubert Kaeslin, Digital Integrated Circuit Design:
From VLSI Architectures to CMOS Fabrication,
Cambridge University Press, 1st Edition, 2008.
Used in ECE 681 “VLSI Design for ASICs”
Technology & Tools
What is an FPGA?
[Figure: FPGA fabric with Configurable Logic Blocks, Block RAMs, and I/O Blocks]
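A common way to use the Block RAMs of an FPGA from VHDL is to describe a memory in a coding style that synthesis tools recognize. A minimal sketch, assuming a single-port memory with synchronous write and synchronous (registered) read, a style that tools such as Xilinx XST and Altera Quartus II commonly map to block RAM; the entity name `simple_bram` and the generic values are hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM. The registered (synchronous) read is what
-- typically allows synthesis tools to map the array onto a block RAM.
entity simple_bram is
  generic (
    ADDR_WIDTH : natural := 9;   -- 512 words (illustrative size)
    DATA_WIDTH : natural := 32
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end simple_bram;

architecture behavioral of simple_bram is
  type ram_type is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal ram : ram_type := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Read-first behavior: during a write, dout receives the old contents
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end behavioral;
```

Whether the memory actually ends up in block RAM depends on the tool, its options, and the memory size; small arrays may be mapped to distributed (LUT-based) RAM instead.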
FPGA Design process (1)
Specification / Pseudocode, e.g.:
“Design and implement a simple unit that speeds up encryption with an RC5-similar cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..”
On-paper hardware design
(Block diagram & ASM chart)
VHDL description (Your Source Files)
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- IEEE standard; preferred over the non-standard std_logic_unsigned
entity RC5_core is
  port(
    clock, reset, encr_decr: in std_logic;
    data_input: in std_logic_vector(31 downto 0);
    data_output: out std_logic_vector(31 downto 0);
    out_full: in std_logic;
    key_input: in std_logic_vector(31 downto 0);
    key_read: out std_logic  -- last port: no trailing semicolon
  );
end RC5_core;  -- closing name must match the entity name
Functional simulation
Synthesis
Post-synthesis simulation
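The example specification asks for an RC5-similar cipher. For orientation, one round of standard RC5 with 32-bit words computes A = ((A xor B) <<< B) + S[2i] and then B = ((B xor A) <<< A) + S[2i+1], where <<< is a left rotation by the 5 least significant bits of the other word, + is addition modulo 2^32, and S is the expanded key table. Below is a minimal sketch of one such half-round as a combinational VHDL function; the package name `rc5_pkg` and the function name `rc5_half_round` are hypothetical, and the exact variant used in the assignment may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package containing one RC5-32 half-round.
package rc5_pkg is
  -- Computes ((x xor y) <<< (y mod 32)) + s, with addition modulo 2**32
  function rc5_half_round(x, y, s : std_logic_vector(31 downto 0))
    return std_logic_vector;
end package rc5_pkg;

package body rc5_pkg is
  function rc5_half_round(x, y, s : std_logic_vector(31 downto 0))
    return std_logic_vector is
    variable t : unsigned(31 downto 0);
  begin
    t := unsigned(x xor y);
    -- Data-dependent rotation by the 5 least significant bits of y
    t := rotate_left(t, to_integer(unsigned(y(4 downto 0))));
    -- 32-bit addition: the carry out of bit 31 is discarded (mod 2**32)
    return std_logic_vector(t + unsigned(s));
  end function rc5_half_round;
end package body rc5_pkg;
```

A full round applies the function twice, the second time with the updated A, e.g. `A := rc5_half_round(A, B, S(2*i)); B := rc5_half_round(B, A, S(2*i+1));`, where `S` is a hypothetical key-table array. In an iterative datapath, this logic would typically sit between the A and B registers and be reused in every round.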
FPGA Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
Simulation Tools
FPGA Synthesis Tools
Logic Synthesis
VHDL description
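library ieee;
use ieee.std_logic_1164.all;
-- MLU: optionally inverts inputs A and B and output Y, and selects
-- AND, OR, XOR, or XNOR of the (possibly inverted) inputs via L1 & L0
entity MLU is
  port(
    NEG_A, NEG_B, NEG_Y: in STD_LOGIC;
    A, B: in STD_LOGIC;
    L1, L0: in STD_LOGIC;
    Y: out STD_LOGIC
  );
end MLU;
architecture MLU_DATAFLOW of MLU is
  signal A1:STD_LOGIC;
  signal B1:STD_LOGIC;
  signal Y1:STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
  -- helper signal: a concatenation such as L1 & L0 is not a legal
  -- selector expression in VHDL-93
  signal L: STD_LOGIC_VECTOR(1 downto 0);
begin
  A1<=A when (NEG_A='0') else
      not A;
  B1<=B when (NEG_B='0') else
      not B;
  Y<=Y1 when (NEG_Y='0') else
     not Y1;
  MUX_0<=A1 and B1;
  MUX_1<=A1 or B1;
  MUX_2<=A1 xor B1;
  MUX_3<=A1 xnor B1;
  L<=L1 & L0;
  with L select
    Y1<=MUX_0 when "00",
        MUX_1 when "01",
        MUX_2 when "10",
        MUX_3 when others;
end MLU_DATAFLOW;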
Circuit netlist
FPGA Implementation
• After synthesis the entire implementation
process is performed by FPGA vendor tools
Design Process control from Active-HDL
Xilinx FPGA Tools: ECE Labs
Aldec Active-HDL Design Flow:
• simulation: Aldec Active-HDL (IDE)
• synthesis: Xilinx XST or Synopsys Synplify Premier
• implementation: Xilinx ISE Design Suite
Xilinx ISE Design Flow:
• simulation: ModelSim or ISim
• synthesis: Xilinx XST or Synopsys Synplify Premier
• implementation: Xilinx ISE Design Suite (IDE)
Xilinx FPGA Tools: Home
Aldec Active-HDL Design Flow:
• simulation: Aldec Active-HDL Student Edition (IDE)
• synthesis: Xilinx XST (restricted)
• implementation: Xilinx ISE WebPACK (restricted)
Xilinx ISE Design Flow:
• simulation: ISim
• synthesis: Xilinx XST (restricted)
• implementation: Xilinx ISE WebPACK (IDE) (restricted)
Altera FPGA Tools: ECE Labs
Altera Design Flow:
• simulation: Mentor Graphics ModelSim-Altera
• synthesis & implementation: Altera Quartus II Subscription Edition
Altera FPGA Tools: Home
Altera Design Flow:
• simulation: Mentor Graphics ModelSim-Altera Starter (restricted)
• synthesis & implementation: Altera Quartus II Web Edition (restricted)
Project
Two Major Areas to Choose From
• Cryptography
• Digital Signal Processing
Cryptography Project
• requires knowledge of C or Java
• related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU
• supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard
DSP Project (1)
• requires knowledge of Matlab and basics of digital signal processing
• processing of waveform audio files (.wav, .wave) and images
• co-advised by Dr. Aaron Cohen
• recommended (but not required) for students who would like to specialize in digital signal processing
• soft introduction to ECE 699 Digital Signal Processing Hardware Architectures
DSP Project (2)
Examples of Topics:
• Signal Spectrum Analyzer (FFT)
• VoIP Conference Call voice mixing
• Background adaptive noise canceling
• Video/Audio codecs
• Classifiers (Speech Recognition)
• Adaptive Filtering with Quantization
• Compressed Sensing Solver for Piano Notes
Background for Cryptography Projects
Crypto 101
Cryptography is Everywhere
• Buying a book online
• Teleconferencing over intranets
• Withdrawing cash from an ATM
• Backing up files on a remote server
Cryptographic Standards Before 1997
[Figure: timeline, 1970–2010. Secret-key block ciphers: DES – Data Encryption Standard (IBM & NSA, 1977), Triple DES (1999). Hash functions (NSA): SHA (1993), SHA-1 – Secure Hash Algorithm (1995), SHA-2.]
Why a Contest for a Cryptographic Standard?
• Avoid back-door theories
• Speed up the acceptance of the standard
• Stimulate non-classified research on methods of designing a specific cryptographic transformation
• Focus the effort of a relatively small cryptographic community
Cryptographic Standard Contests
• AES (IX.1997 – X.2000): 15 block ciphers → 1 winner
• NESSIE (I.2000 – XII.2002)
• CRYPTREC
• eSTREAM (XI.2004 – V.2008): 34 stream ciphers → 4 HW winners + 4 SW winners
• SHA-3 (X.2007 – XII.2012): 51 hash functions → 1 winner
Cryptographic Contests - Evaluation Criteria
• Security
• Software Efficiency: μProcessors, μControllers
• Hardware Efficiency: FPGAs, ASICs
• Flexibility
• Simplicity
• Licensing
Specific Challenges of Evaluations
in Cryptographic Contests
• Very wide range of possible applications, and as a result, of performance and cost targets:
- throughput: from single Mbit/s to hundreds of Gbit/s
- cost: from single cents to thousands of dollars
• Winner in use for the next 20-30 years, implemented using
technologies not in existence today
• Large number of candidates
• Limited time for evaluation
• Only one winner and the results are final
Mitigating Circumstances
• Security is a primary criterion
• Performance of competing algorithms tends to vary significantly
(sometimes by as much as 500 times)
• Only relatively large differences in performance matter
(typically at least 20%)
• Multiple groups independently implement the same algorithms
(catching mistakes, comparing best results, etc.)
• Second best may be good enough
AES Contest 1997-2000
Rules of the Contest
Each team submits:
• Detailed cipher specification
• Justification of design decisions
• Source code in C
• Source code in Java
• Tentative results of cryptanalysis
• Test vectors
AES: Candidate Algorithms
• Canada: CAST-256, Deal
• USA: Mars, RC6, Twofish, Safer+, HPC
• Costa Rica: Frog
• Germany: Magenta
• Belgium: Rijndael
• France: DFC
• Korea: Crypton
• Japan: E2
• Israel, UK, Norway: Serpent
• Australia: LOKI97
AES Contest Timeline
June 1998: 15 Candidates
CAST-256, Crypton, Deal, DFC, E2, Frog, HPC, LOKI97, Magenta, Mars, RC6, Rijndael, Safer+, Serpent, Twofish
Round 1 (Security, Software efficiency)
August 1999: 5 final candidates
Mars, RC6, Twofish (USA)
Rijndael, Serpent (Europe)
Round 2 (Security, Software efficiency, Hardware efficiency)
October 2000: 1 winner: Rijndael (Belgium)
NIST Report: Security & Simplicity
[Chart: security (High / Adequate) vs. simplicity (Complex / Simple). High security: MARS, Twofish, Serpent. Adequate security: Rijndael, RC6.]
Efficiency in software: NIST-specified platform
[Chart: throughput in Mbit/s (scale 0–30) on a 200 MHz Pentium Pro, Borland C++, for 128-bit, 192-bit, and 256-bit keys; candidates in order: Rijndael, RC6, Twofish, Mars, Serpent]
NIST Report: Software Efficiency
Encryption and Decryption Speed
• 32-bit processors: high: RC6, Rijndael, Twofish; medium: Mars; low: Serpent
• 64-bit processors: high: Rijndael, Twofish; medium: Mars, RC6; low: Serpent
• DSPs: high: Rijndael, Mars, Twofish; medium: RC6; low: Serpent
Efficiency in FPGAs: Speed
[Chart: throughput in Mbit/s on Xilinx Virtex XCV-1000 for Serpent (x8), Rijndael, Twofish, Serpent (x1), RC6, and Mars; results from George Mason University, University of Southern California, and Worcester Polytechnic Institute]
Efficiency in ASICs: Speed
[Chart: throughput in Mbit/s, MOSIS 0.5μm, NSA Group, for 128-bit key scheduling / 3-in-1 (128, 192, 256 bit) key scheduling: Rijndael 606 / 443, Serpent (x1) 202 / 202, Twofish 105 / 105, RC6 103 / 104, Mars 57 / 57]
Lessons Learned
Results for ASICs matched results for FPGAs very well, and both were very different from the results in software.
[Figure: FPGA results (GMU+USC, Xilinx Virtex XCV-1000) vs. ASIC results (NSA Team, 0.5μm MOSIS), with Serpent in x8 and x1 variants]
Serpent: fastest in hardware, slowest in software
Lessons Learned
Hardware results matter!
Final round of the AES Contest, 2000
[Figure: speed in FPGAs (GMU results) compared with votes at the AES 3 conference]
Limitations of the AES Evaluation
• Optimization for maximum throughput
• Single high-speed architecture per candidate
• No use of embedded resources of FPGAs (Block RAMs, dedicated multipliers)
• Single FPGA family from a single vendor: Xilinx Virtex
FPGA Evaluations
Criterion | AES | eSTREAM | SHA-3
Multiple FPGA families | No | No | Yes
Multiple architectures | No | Yes | Yes
Use of embedded resources | No | No | Yes
Primary optimization target | Throughput | Area, Throughput/Area | Throughput/Area
Experimental results | No | No | Yes
Availability of source codes | No | No | Yes
Specialized tools | No | No | Yes
ASIC Evaluations
Criterion | AES | eSTREAM | SHA-3
Multiple processes/libraries | No | No | Yes
Multiple architectures | No | Yes | Yes
Primary optimization target | Throughput | Power x Area x Time | Throughput/Area
Post-layout results | No | Yes | Yes
Experimental results | No | Yes | Yes
Availability of source codes | No | No | Yes
Specialized tools | No | No | No
Benchmarking Tools
Tools for Benchmarking Implementations of Cryptography
• Software: eBACS, D. Bernstein (UIC), T. Lange (TUE), 2006-present
• FPGAs: ATHENa, K. Gaj, J. Kaps, et al. (GMU), 2009-present
• ASICs: ?
Benchmarking in Software: eBACS
eBACS: ECRYPT Benchmarking of Cryptographic Systems: http://bench.cr.yp.to/
SUPERCOP - toolkit developed by D. Bernstein and T. Lange for measuring performance of cryptographic software
• measurements on multiple machines (currently over 90)
• each implementation is recompiled multiple times (currently over 1600 times) with various compiler options
• time measured in clock cycles/byte for multiple input/output sizes
• median, lower quartile (25th percentile), and upper quartile (75th percentile) reported
• standardized function arguments (common API)
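To make the cycles/byte metric concrete: if processing a message of n bytes takes C clock cycles, the reported figure is C/n, and on a processor running at clock frequency f the corresponding software throughput follows directly. A worked relation, assuming a constant clock frequency during the measurement (an idealization, not part of the eBACS methodology itself):

```latex
\text{cycles/byte} = \frac{C}{n}, \qquad
\text{Throughput} = \frac{f}{\text{cycles/byte}}\ \text{bytes/s}
                  = \frac{8f}{\text{cycles/byte}}\ \text{bit/s}
```

For example, at f = 2 GHz and 10 cycles/byte, the throughput is 2·10^9 / 10 = 2·10^8 bytes/s, i.e., 200 MB/s or 1.6 Gbit/s.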
SUPERCOP Extension for Microcontrollers – XBX: 2009-present
• Allows on-board timing measurements
• Supports at least the following microcontrollers:
- 8-bit: Atmel ATmega1284P (AVR)
- 32-bit: TI AR7 (MIPS), Atmel AT91RM9200 (ARM 920T), Intel XScale IXP420 (ARM v5TE), Cortex-M3 (ARM)
Developers:
• Christian Wenzel-Benner, ITK Engineering AG, Germany
• Jens Gräf, LiNetCo GmbH, Heiger, Germany
Benchmarking in FPGAs: ATHENa
ATHENa – Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Open-source benchmarking environment, written in Perl, aimed at
AUTOMATED generation of
OPTIMIZED results for
MULTIPLE hardware platforms.
The most recent version, 0.6.4, was released in December 2012.
Why Athena?
"The Greek goddess Athena was frequently
called upon to settle disputes between
the gods or various mortals. Athena Goddess
known for her superb logic and intellect.
Her decisions were usually well-considered,
highly ethical, and seldom motivated
by self-interest.”
from "Athena, Greek Goddess
of Wisdom and Craftsmanship"
Basic Dataflow of ATHENa
[Figure: dataflow among the Designer, the User, the ATHENa Server, the Database, and the FPGA Synthesis and Implementation tools, in numbered steps (0–6) labeled: Interfaces + Testbenches; HDL + FPGA Tools; Download scripts and configuration files; HDL + scripts + configuration files; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs]
Three Components of the ATHENa Environment
• ATHENa Tool
• ATHENa Database of Results
• ATHENa Website
ATHENa – Database of Results
ATHENa Database
http://cryptography.gmu.edu/athenadb
ATHENa Database – Result View
• Algorithm parameters
• Design parameters
- Optimization target
- Architecture type
- Datapath width
- I/O bus widths
- Availability of source code
• Platform
- Vendor, Family, Device
• Timing
- Maximum clock frequency
- Maximum throughput
• Resource utilization
- Logic blocks (Slices/LEs/ALUTs)
- Multipliers/DSP units
• Tools
- Names & versions
- Detailed options
• Credits
- Designers & contact information
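The maximum throughput field is usually derived from the maximum clock frequency. For a design that processes one block of b bits in a fixed number N of clock cycles (an assumption that fits a basic, non-pipelined iterative architecture), the commonly used relation is:

```latex
\text{Throughput}_{\max} = \frac{b}{N \cdot T_{\min}} = \frac{b \cdot f_{\max}}{N},
\qquad T_{\min} = \frac{1}{f_{\max}}
```

With illustrative numbers b = 128 bits, N = 11 cycles, and f_max = 200 MHz, this gives 128 · 200 / 11 ≈ 2327 Mbit/s. Dividing this value by the logic resources used (e.g., slices) yields the Throughput/Area metric listed as the optimization target in the SHA-3 evaluations.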
ATHENa Database – Compare Feature
Matching fields in grey
Non-matching fields in red and blue
ATHENa - Website
ATHENa Website
http://cryptography.gmu.edu/athena/
• Download of ATHENa Tool
• Links to related tools
SHA-3 Competition in FPGAs & ASICs
• Specifications of candidates
• Interface proposals
• RTL source codes
• Testbenches
• ATHENa database of results
• Related papers & presentations
ATHENa Result Replication Files
• Scripts and configuration files sufficient to easily
reproduce all results (without repeating optimizations)
• Automatically created by ATHENa for all
results generated using ATHENa
• Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
• J. Claerbout (Stanford University)
“Electronic documents give reproducible research a new meaning,”
in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992,
http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92
.....
• Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL)
Reproducible research in signal processing - what, why, and how.
IEEE Signal Processing Magazine, May 2009. http://rr.epfl.ch/17/
Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations
of the same cryptographic algorithm
3. hardware platforms from the point of view
of their suitability for the implementation of a given algorithm,
(e.g., choice of an FPGA device or FPGA board)
4. tools and languages in terms of quality
of results they generate (e.g. Verilog vs. VHDL,
Synplicity Synplify Premier vs. Xilinx XST,
ISE v. 13.1 vs. ISE v. 12.3)
Your Project: Implementation and Benchmarking of Authenticated Ciphers
Features of Authenticated Ciphers
1. Confidentiality
2. Message integrity
3. Message authentication
[Figure: each feature illustrated with Alice, Bob, and Charlie]
All Projects - Organization
• Projects divided into phases
• Deliverables for each phase submitted through
Blackboard at selected checkpoints and evaluated
by the instructor and/or TA
• Feedback provided to students on a best effort basis
• Final report and codes submitted using Blackboard
at the end of the semester
Honor Code Rules
• All students are expected to write and debug
their codes individually
• Students are encouraged to help and support each
other in all problems related to the
- operation of the CAD tools
- understanding of an investigated algorithm and
existing implementations
- understanding of the project tasks
Additional Skills Learned in the Project
• Reading & understanding specification of a complex
algorithm
• Design of new hardware architectures based on
existing architectures (datapath & controller)
• Reading, understanding, and modifying existing
VHDL code
• Using embedded resources of modern FPGAs
• Characterizing performance of your codes
for multiple FPGA families