Reconfigurable Architectures
• Forces that drive a Reconfigurable Architecture
– Price
• Mass production 100K to millions
• Experimental 1 to 10’s
– Granularity of reconfiguration
• Fine grain
• Coarse Grain
– Degree of system integration/coupling
• Tightly
• Loosely
All are a function of the application that
will run on the Architecture
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures
Iowa State University (Ames)
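Example Points in (Price, Granularity, Coupling) Space
[Figure: 3-D design space with axes Price ($100’s to $1M’s), Granularity (Fine to Coarse), and Coupling (Loose to Tight). Labeled example points: an Intel/AMD processor whose pipeline (Decode, Exec, Store) has an RFU beside the Int and float units, and an ML507 board connected to a PC over Ethernet.]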
What’s the point of a Reconfigurable Architecture?
• Performance metrics
– Computational
• Throughput
• Latency
– Power
• Total power dissipation
• Thermal
– Reliability
• Recovery from faults
Increase application performance!
Typical Approach for Increasing Performance
• Application/algorithm implemented in software
– Often easier to write an application in software
• Profile application (e.g. gprof)
– Determine where the application is spending its time
• Identify kernels of interest
– e.g. application spends 90% of its time in function
matrix_multiply()
• Design custom hardware/instruction to accelerate
kernel(s)
– Analyze the kernel to determine how to extract
fine/coarse grain parallelism (does any parallelism even
exist?)
Amdahl’s Law!
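A quick worked check of Amdahl's Law for the example above, as a minimal sketch. The 90% figure comes from the slide; the kernel speedup factors (10x, 100x, unlimited) are assumed values chosen only for illustration.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the run time is accelerated."""
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / kernel_speedup)


# 90% of run time in matrix_multiply() (from the slide).
# The kernel speedups below are assumed values, not measurements.
for kernel_speedup in (10, 100, float("inf")):
    overall = amdahl_speedup(0.9, kernel_speedup)
    print(f"kernel {kernel_speedup}x faster -> application {overall:.2f}x faster")
```

Even an unlimited kernel speedup caps the application at about 10x, because the remaining 10% of the run time still executes in software.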
Granularity
Granularity: Coarse Grain
• rDPA: reconfigurable Data Path Array
• Function Units with programmable interconnects
Example: [Figure: array of nine ALUs joined by programmable interconnect]
Granularity: Fine Grain
• FPGA: Field Programmable Gate Array
• Sea of general purpose logic gates
[Figure: grid of CLBs; CLB = Configurable Logic Block]
Granularity: Trade-offs
Trade-offs associated with LUT size
Example: 2-LUT (2^2 = 4 bits, drawn as 2x2) vs. 10-LUT (2^10 = 1024 bits, drawn as 32x32)
[Figure, built up over several slides: a "Microprocessor" block with inputs op, A, and B (labeled 4, 3, and 3 bits) and a 3-bit result, mapped onto 1024 bits of 2-LUTs and onto 1024 bits of 10-LUT.]
Granularity: Trade-offs
Bit logic and constants: (A and “1100”) or (B or “1000”), with 4-bit A and B
[Figure: the expression built from AND and OR gates with constant inputs, mapped onto 1024 bits of 2-LUTs and onto 10-LUTs; the area that was required using 2-LUTs is marked for comparison.]
With 10-LUTs it’s much worse: each 10-LUT only has one output.
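The area argument can be checked with a small sketch. It models a k-input LUT as a truth table of 2^k bits with one output, and maps the slide's expression onto one LUT per output bit. Folding each bit's constants into the LUT contents is an assumption made here for brevity (the slide draws separate AND and OR gates); the helper names are hypothetical.

```python
def lut_config_bits(k):
    """A k-input LUT stores one output bit per input combination."""
    return 2 ** k


def make_lut(k, func):
    """Build the truth table of a k-input, single-output LUT."""
    table = []
    for index in range(2 ** k):
        bits = [(index >> i) & 1 for i in range(k)]
        table.append(func(*bits) & 1)
    return table


def lookup(table, *bits):
    """Read the LUT output for the given input bits."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return table[index]


WIDTH = 4
C_AND, C_OR = 0b1100, 0b1000  # constants from the slide

# Assumption: one 2-LUT per output bit, with that bit's constants folded in.
luts = []
for i in range(WIDTH):
    ca, co = (C_AND >> i) & 1, (C_OR >> i) & 1
    luts.append(make_lut(2, lambda a, b, ca=ca, co=co: (a & ca) | (b | co)))


def evaluate(a_word, b_word):
    """Compute (A and 1100) or (B or 1000) using the four 2-LUTs."""
    out = 0
    for i, table in enumerate(luts):
        out |= lookup(table, (a_word >> i) & 1, (b_word >> i) & 1) << i
    return out


bits_2lut = WIDTH * lut_config_bits(2)    # four 2-LUTs
bits_10lut = WIDTH * lut_config_bits(10)  # one 10-LUT per output bit
print(f"2-LUTs: {bits_2lut} config bits, 10-LUTs: {bits_10lut} config bits")
```

Under these assumptions the 10-LUT mapping spends 1024 configuration bits per output bit even though each output depends on only two inputs, which is the point of the slide's closing remark.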
Granularity: Example Architectures
• Fine grain: GARP
• Coarse grain: PipeRench
Granularity: GARP
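[Figure: Memory, I-cache, D-cache, CPU, RFU, and Config cache; the on-chip part is labeled Garp chip. The RFU is expanded into N rows of PEs (Processing Elements), each row with 1 control PE and 16 2-bit execution PEs.]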
Example computations in one cycle
A<<10 | (b&c)
(A-2*b+c)
Granularity: GARP
Impact of configuration size
• 1 GHz bus frequency
• 128-bit memory bus
• 512 Kbits of configuration size
On an RFU context switch, how long does it take
to load a new full configuration?
• 512 Kbits / 128 bits per bus cycle is about 4,000 cycles: 4 microseconds
An estimate of the amount of time for the
CPU to perform a context switch is
~5 microseconds
~2x increase in context
switch latency!!
[Figure: Memory, I-cache, D-cache, CPU, RFU, and Config cache; the on-chip part is labeled Garp chip]
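The load-time estimate can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes one full-width bus transfer per cycle with no memory latency or protocol overhead, and that 1 Kbit means 1024 bits; the bus, configuration, and context-switch figures are the ones on the slide.

```python
BUS_HZ = 1e9               # 1 GHz bus frequency (from the slide)
BUS_WIDTH_BITS = 128       # 128-bit memory bus (from the slide)
CONFIG_BITS = 512 * 1024   # 512 Kbits, assuming 1 Kbit = 1024 bits
CPU_SWITCH_US = 5.0        # estimated CPU context switch time (from the slide)

# Assumption: one full-width transfer per bus cycle, no stalls.
transfers = CONFIG_BITS // BUS_WIDTH_BITS
load_us = transfers / BUS_HZ * 1e6
slowdown = (CPU_SWITCH_US + load_us) / CPU_SWITCH_US

print(f"{transfers} bus transfers, {load_us:.3f} us to load a configuration")
print(f"context switch latency grows by a factor of {slowdown:.2f}")
```

The result is a little over 4 microseconds per full configuration load, which roughly doubles the context switch latency, in line with the slide's estimate.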
Granularity: GARP
“The Garp Architecture and C Compiler”
http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
Granularity: PipeRench
• Coarse granularity
• Higher (higher) level programming
• Reference papers
– PipeRench: A Coprocessor for Streaming Multimedia Acceleration (ISCA 1999): http://www.cs.cmu.edu/~mihaib/research/isca99.pdf
– PipeRench Implementation of the Instruction Path Coprocessor (Micro 2000): http://class.ee.iastate.edu/cpre583/papers/piperench_Micro_2000.pdf
Granularity: PipeRench
[Figure: three rows of three PEs, each PE containing an 8-bit ALU and a Reg file; the rows are joined by Interconnect, with a Global bus alongside.]
Granularity: PipeRench
[Figure, built up over several slides: two stage-by-cycle tables for cycles 1 to 6. The first shows a pipeline with stages 0 to 4, each stage starting one cycle after the one before it. The second shows the same stages placed on three rows of PEs (pipeline stages 0 to 2): row 0 holds stage 0 and then 3, row 1 holds 1 and then 4, and row 2 holds 2 and then 0 again.]
Degree of Integration/Coupling
• Independent Reconfigurable Coprocessor
– Reconfigurable Fabric does not have direct
communication with the CPU
• Processor + Reconfigurable Processing Fabric
– Loosely coupled on the same chip
– Tightly coupled on the same chip
Degree of Integration/Coupling
[Figure, built up over several slides: block diagram of a computer system with a CPU pipeline (Fetch, Decode, Execute with ALU and FPU, Memory, Write Back), L1 Cache, L2 Cache, Memory Controller, Main Memory, DMA Controller, and an I/O Controller serving USB, PCI, PCI-Express, NIC, and a SATA Hard Drive. Successive slides place the RPF at different points in this system, from the I/O side (next to PCI) toward the CPU, adding a Config I/F in the later placements; the last slide shows an RFU inside the Execute stage beside the ALU and FPU.]
Next Class
• Reconfiguration Management
– Chapter 4
Questions/Comments/Concerns
• Write down
– Main point of lecture
– One thing that’s still not quite clear
OR
– If everything is clear, then give an example
of how to apply something from lecture
Lecture notes
Granularity: PipeRench
• Scheduling virtual stages onto physical stages
• Partial/dynamic reconfiguration (each cycle)
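The two bullets above can be sketched as a small scheduler. This is an illustrative model, not PipeRench's actual controller: it assumes a round-robin policy in which exactly one physical stage is reconfigured per cycle with the next virtual stage, and the sizes (5 virtual stages, 3 physical stages, 6 cycles) are assumed values. The function name is hypothetical.

```python
def piperench_schedule(num_virtual, num_physical, num_cycles):
    """Return table[p][c]: virtual stage held by physical stage p in cycle c.

    None means the physical stage has not been configured yet.
    Assumes round-robin reconfiguration of one physical stage per cycle.
    """
    table = [[None] * num_cycles for _ in range(num_physical)]
    for cycle in range(num_cycles):
        target = cycle % num_physical   # physical stage reconfigured this cycle
        virtual = cycle % num_virtual   # virtual stage loaded into it
        for p in range(num_physical):
            if p == target:
                table[p][cycle] = virtual
            elif cycle > 0:
                # Other stages keep executing their current configuration.
                table[p][cycle] = table[p][cycle - 1]
    return table


schedule = piperench_schedule(num_virtual=5, num_physical=3, num_cycles=6)
for p, row in enumerate(schedule):
    cells = " ".join("-" if v is None else str(v) for v in row)
    print(f"physical stage {p}: {cells}")
```

In this model each virtual stage stays resident for as many cycles as there are physical stages before it is overwritten, so a pipeline deeper than the hardware still runs, at the cost of reconfiguring one stage every cycle.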
Granularity: GARP
• Impact of configuration size on performance
• Context switching
• Garp features
• Dynamically reconfigurable
• Stores multiple configurations in an on-chip
cache (4)
• One configuration at a time
• Example app mapping to GARP (loop)
• Amdahl's Law
The Garp Architecture and C Compiler
• http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
Overview
• Dimensions
– Price
– Granularity
– Coupling
– To optimize App Performance (compute (throughput, latency),
Power, reliability)
• RPF to efficiently implement VICs
– Main picture the authors want to convey
• What’s the point of having a reconfigurable architecture?
– Example (increase app performance)
• App -> SW/CPU
• Profile
• ID kernels of intense compute
• Design custom hardware/instruction (Amdahl’s Law)
– Intel FPL paper, great example for reading by Friday
Reconfigurable Architectures
• RPF -> VIC (short slide)