The CMU Reconfigurable Computing Project April 9, 1999 Mihai Budiu

The CMU Reconfigurable
Computing Project
April 9, 1999
Mihai Budiu
mihaib@cs.cmu.edu
SSS 4/9/99
CMU Reconfigurable Computing
1
Current Project Members
CS Department: Seth Copen Goldstein, Mihai Budiu
ECE Department: Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer
Why Study Reconfigurable Hardware?
It is a nice computation paradigm
(wire your own computer)
Why Study Reconfigurable Hardware
Algorithm            Year  System         Versus               Speedup (x)
DNA matching         1992  SPLASH 2       SPARC 10             4300
FIR Filter           1998  PipeRench      UltraSparc 300 MHz   90
IDEA Encryption      1998  PipeRench      UltraSparc 300 MHz   61
SAT solver           1997  Pamette        SPARC 5 110 MHz      17 to 1100
Ray Casting          1995  RIPP-10        Pentium 75 MHz       33.8
Hidden Markov Model  1996  1 Xilinx FPGA  SPARC 10             24.4
DES Encryption       1996  GARP           UltraSparc 170 MHz   24
SPEC92               1994  MIPS+RC        MIPS                 1.22
Commercial Players
Source: In-stat April 1998
*Does not include software, hardwire or support EPROMs
What Is “Reconfigurable Hardware?”
[Figure labels: Interconnection network; Universal gates and/or storage elements; Switches]
Basic Ingredient: RAM cell
[Figure: a four-entry RAM addressed by a0 and a1, storing 0, 0, 0, 1; its data output equals a0 & a1]
Universal gate = RAM
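The "universal gate = RAM" idea can be sketched in a few lines of Python. This is only an illustration: the `LUT` class, its `read` method, and the convention that a0 is the least significant address bit are assumptions made here, not details from the slides.

```python
# Sketch: a k-input look-up table (LUT) modeled as a small RAM.
# The class and method names are illustrative, not taken from the slides.

class LUT:
    def __init__(self, num_inputs, contents):
        if len(contents) != 2 ** num_inputs:
            raise ValueError("a k-input LUT needs 2**k stored bits")
        self.num_inputs = num_inputs
        self.contents = list(contents)  # the RAM cells (configuration bits)

    def read(self, *address_bits):
        # Assumption: address_bits[0] is a0, the least significant address bit.
        address = sum(bit << i for i, bit in enumerate(address_bits))
        return self.contents[address]


# Storing 0, 0, 0, 1 configures the RAM as a two-input AND gate.
and_gate = LUT(2, [0, 0, 0, 1])
# Rewriting the stored bits turns the same structure into XOR.
xor_gate = LUT(2, [0, 1, 1, 0])

and_table = [and_gate.read(a0, a1) for a1 in (0, 1) for a0 in (0, 1)]
xor_table = [xor_gate.read(a0, a1) for a1 in (0, 1) for a0 in (0, 1)]
```

The point of the sketch is that the logic function lives entirely in the stored bits, so changing the function means rewriting memory rather than rewiring.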
Basic Ingredients (ctd)
A switch is controlled by a 1-bit RAM cell
Outline
• What is reconfigurable hardware
• RH vs other computation paradigms
• Challenges in RH research
• PipeRench: the CMU project:
– the hardware
– the software
• Conclusions
RH vs ASICs
• Generally Application-Specific Integrated Circuits
will be faster than RH:
– RH wires are slow & big
– RH bit-slices are costly to interconnect
– RH devices must store configuration on the chip
but
• RH can be reprogrammed
– new algorithms
– to fix bugs
• RH cheaper in small production
• RH tolerates faults better
• RH sometimes faster with staged computation
RH vs Microprocessors
• RH less flexible (like a VLIW with fixed
instructions)
but
• RH provides more (customized)
computation elements
• RH can decrease memory traffic
• RH can be tailored for specific algorithms
and data types
RH will not replace μPs, but will complement them
Types of RH
• FPGAs: bit-level logic functionality
(the basic processing elements compute on 1 bit)
• word-based architectures: PipeRench (CMU)
(basic PE operates on 8 bits)
(basic PE is a small ALU)
• coarse architectures: RAW (MIT)
(basic PE is a MIPS 2000 core)
RH In A System
[Figure: coupling (image not available)]
Challenges In RC
• Software tools:
– Programming RC like software development
– Automatic compilation from HLL
– Automatic program partitioning
• Mapping algorithms efficiently (no ISA)
• System issues
– interfaces
– find “ideal” RC fabric
The CMU Reconfigurable
Computing Project
Hardware Goals
• To build a complete reconfigurable
hardware device
• To build the system integration hardware
• To host the device in a PC
Our Device:
• Word processing elements
• Pipelined architecture
• Virtualized hardware
• Local interconnection network
• Wide pipelined bus
[Figure labels: Configuration memory; Data & Config controller; Stripes; Processing elements]
Hardware Virtualization
[Figure labels: Actual available hardware; Instructions currently in hardware; Instructions paged out]
Hardware Virtualization (2)
[Figure labels: Page out; compute; configure; Page in; hardware; Program in configuration memory]
Overlap configuration with computation.
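A toy model may help make "overlap configuration with computation" concrete. It assumes, purely for illustration, that exactly one physical stripe is reconfigured per cycle while the others keep computing, and that virtual pipeline stages are paged in round-robin. The function name `virtualize` and this scheduling policy are assumptions, not a description of the actual PipeRench controller.

```python
# Sketch: a virtual pipeline with more stages than the physical fabric holds.
# Assumption: one stripe is (re)configured per cycle, in round-robin order.

def virtualize(num_virtual, num_physical, num_cycles):
    """Return, per cycle, what each physical stripe holds and does."""
    fabric = [None] * num_physical  # virtual stage resident in each stripe
    schedule = []
    for cycle in range(num_cycles):
        slot = cycle % num_physical         # stripe being reconfigured now
        fabric[slot] = cycle % num_virtual  # page in next stage, paging out the old one
        row = []
        for i, stage in enumerate(fabric):
            if stage is None:
                row.append("idle")
            elif i == slot:
                row.append(f"configure {stage}")
            else:
                row.append(f"compute {stage}")
        schedule.append(row)
    return schedule


# Five virtual stages running on three physical stripes.
schedule = virtualize(num_virtual=5, num_physical=3, num_cycles=6)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

In this model, once the fabric is full, every cycle has one stripe configuring and the remaining stripes computing, so configuration time is hidden behind useful work.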
Processing Elements
[Figure labels: PE0, PE1, PE2; inputs a, b, Cin; output out]
• Look-up table
• Any 3-to-1 function
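As a hedged illustration of what "any 3-to-1 function" allows, the sketch below tabulates two three-input functions (sum and carry of a full adder) and chains them into a ripple-carry adder, one bit per PE. Using two tables per bit and the helper names `make_lut3`, `lut3_read`, and `ripple_add` are assumptions made for this example. The slides only state that each PE contains a look-up table.

```python
# Sketch: a ripple-carry adder built from 3-input look-up tables.
# Assumption: each bit position uses one table for the sum and one for the carry.

def make_lut3(function):
    """Tabulate any 3-input, 1-output Boolean function into 8 stored bits."""
    return [function(a, b, c) & 1 for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def lut3_read(table, a, b, c):
    # The index order matches make_lut3: a is the most significant address bit.
    return table[(a << 2) | (b << 1) | c]


SUM_LUT = make_lut3(lambda a, b, cin: a ^ b ^ cin)
CARRY_LUT = make_lut3(lambda a, b, cin: (a & b) | (cin & (a ^ b)))


def ripple_add(x, y, width):
    """Add two unsigned integers bit by bit; return (sum mod 2**width, carry out)."""
    carry, result = 0, 0
    for i in range(width):  # bit 0 plays the role of PE0, bit 1 of PE1, ...
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= lut3_read(SUM_LUT, a, b, carry) << i
        carry = lut3_read(CARRY_LUT, a, b, carry)
    return result, carry
```

For example, `ripple_add(100, 27, 8)` walks eight bit positions and threads the carry from each position into the next, which mirrors the Cin input shown on the slide.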
The Interconnection Network
[Figure labels: PE 0, PE 1, ..., PE N; B bits; Word-level cross-bar; P*B bits; Pass Registers; P*B*N bits]
The PCI Board
[Figure: chip.eps (image not available)]
Software Goal
To program reconfigurable devices using the
standard software development processes:
– Compile C or Java
– Do it quickly
[Figure labels: Java; Partitioner; Data-flow Intermediate Language (DIL); Built; Configuration; Reconfigurable HW; CPU]
Building Circuits From DIL
a = b + c * d;
e = c - d;
• variables -> wires
• operators -> gates
[Figure: data-flow graph with inputs b, c, d; operators *, +, -; outputs a, e]
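The mapping of variables to wires and operators to gates can be sketched as follows. Because the slides show only two DIL statements, this example borrows Python's own parser (the two statements happen to be valid Python) rather than a real DIL grammar. The function `build_circuit` and the tuple layout for gates are hypothetical.

```python
# Sketch: turn simple assignments into a netlist of gates and wires.
# Assumption: Python's ast module stands in for a DIL parser.
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def build_circuit(source):
    """Return (gates, outputs); each gate is (id, op, left_wire, right_wire)."""
    gates, outputs = [], {}

    def wire_for(node):
        if isinstance(node, ast.Name):
            # A variable is a wire: an earlier result or a primary input.
            return outputs.get(node.id, node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = wire_for(node.left), wire_for(node.right)
            gate_id = f"g{len(gates)}"
            gates.append((gate_id, OPS[type(node.op)], left, right))
            return gate_id
        raise ValueError("unsupported construct in this sketch")

    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are handled")
        outputs[stmt.targets[0].id] = wire_for(stmt.value)
    return gates, outputs


gates, outputs = build_circuit("a = b + c * d; e = c - d;")
```

Each operator in the source becomes one gate, and each variable name resolves either to a primary input wire or to the output wire of the gate that last defined it.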
Mapping Circuits To
[Figure: circuits over inputs a, b, c with + and - operators (image not available)]
The DIL Compiler Front-End
[Figure labels: DIL input file; Parser; Evaluator; Loader; Circuit; Backend; Loader; component library; Component circuits]
The DIL Compiler Backend
[Figure labels: Front-end; Circuit (expanded); Optimizer; Placer/Router; Circuit (placed); Code generator; Circuit; xfig; C++; Asm]
The whole compilation process is very fast (compared to classical CAD tools).
We can compile two orders of magnitude faster.
Processing Element Size Tradeoffs
Small                     Big
Efficient usage           Wasteful
Slower                    Faster bit-slice
Flexible interconnect     Coarse routing
Bigger configuration      Fewer configuration bits
Place and route easier    Constrains the compiler
Stripe Width Tradeoffs
Wider                     Narrower
Fewer stripes             More will fit
Virtualize more           Fewer page-ins
Bandwidth waste           Less bandwidth available
Placer freedom            Placement constrained
Bus Width Tradeoffs
Wider                     Narrower
More area                 Less area
High bandwidth            Time-mux bus
Clock Speed Tradeoffs
(run-time)
Faster                    Slower
Short critical path       Big chains
Long pipeline built       Compact circuits
Decomposition overhead    Little decomposition
Virtualized more          Less virtualized
[Figure: adder diagrams labeled with 24-bit and 8-bit widths]
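The decomposition tradeoff can be illustrated with a wide addition split into narrower adders chained by their carries. The 24-bit and 8-bit widths follow the labels on the slide. The function `add_decomposed` and its stage count are assumptions for illustration, and the sketch expects the total width to be a multiple of the chunk width.

```python
# Sketch: one wide addition decomposed into a chain of narrow adders.
# Assumption: each narrow adder occupies one pipeline stage.

def add_decomposed(x, y, total_bits=24, chunk_bits=8):
    """Add two unsigned values; return (result, carry_out, stages_used)."""
    mask = (1 << chunk_bits) - 1
    carry, result, stages = 0, 0, 0
    for shift in range(0, total_bits, chunk_bits):
        # The carry out of this narrow adder feeds the next stage.
        partial = ((x >> shift) & mask) + ((y >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> chunk_bits
        stages += 1
    return result, carry, stages


# Faster clock: three short 8-bit stages, so a longer pipeline.
narrow = add_decomposed(0x00FFFF, 0x000001, total_bits=24, chunk_bits=8)
# Slower clock: one long 24-bit carry chain in a single stage.
wide = add_decomposed(0x00FFFF, 0x000001, total_bits=24, chunk_bits=24)
```

Both calls produce the same sum. The difference is the number of stages, which is the slide's point: short critical paths cost more pipeline stages and therefore more virtualization.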
Configuration Bits per Stripe
[Chart: configuration bits (y axis, 0 to 1600) versus stripe width (x axis, 64 to 144), one series per PE bit width (2, 4, 8, 16, 32); data points not recoverable]
[Figure: fir-throughput.eps (image not available)]
Project Status
• Operational:
– Behavioral and structural models of PipeRench
in Verilog
– Assembler, simulator
– Tools for visualization and debugging
– One tile fabricated and tested
– Very fast compiler from intermediate language
• In progress:
– Prototype PipeRench to be taped out this summer
– PCI board to host PipeRench in a PC
Simulated Speed-up vs. UltraSparc @ 300 MHz
[Bar chart, logarithmic axis from 1.0 to 1000.0. Kernels: ATR, Cordic, DCT, FIR, IDEA, Nqueens, Over. Bar values as extracted (assignment to kernels not recoverable): 328.8, 90.9, 76.1, 61.8, 29.0, 26.0, 20.6]
Future Work
• Build the PCI board
• Build the OS device drivers
• Start investigating HLL issues:
– automatic partitioning
– translation to DIL
– special code transformations
Conclusions
• A set of important applications can benefit from
RC devices
• RC offers the potential for substantial performance
improvement at a low cost
• RC devices will soon be mainstream in the embedded
computing world; perhaps in the future they will
also permeate the desktop
[Figure label: Pentium V]