Design, Integration, and Implementation of the DySER Hardware Accelerator into OpenSPARC

Department of Computer Science
Chris Frericks, Jesse Benson, Ryan Cofell,
Chen-Han Ho, Venkatraman Govindaraju, Tony Nowatzki,
Karthikeyan Sankaralingam
Vertical Research Group
University of Wisconsin−Madison
Executive Summary

DySER is a programmable in-core accelerator
Minimally invasive, high performance, high energy efficiency
Implemented DySER in Verilog RTL
Integrated DySER into OpenSPARC
Full system implemented on off-the-shelf FPGA
Software stack complete
10-month effort
Outline




Motivation & Background on DySER
What We Did
What We Found Challenging
What We Learned
Motivation

Multi-core trends
  Performance limited by parallelism
  Performance limited by voltage scaling
Single-core trends
  Hard to improve performance
Accelerators and specialization will likely drive future performance gains
DySER

DySER concept first proposed at HPCA 2011
Dynamically Specialized Execution Resources
2X to 10X performance increase and 70% reduction in energy
[Figure: Computation Kernel and DySER]
Utilizes a network of functional units
Dynamically specializes hardware to match application phases
Generic Processor
[Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback, with I$, Register File, Exec Units, and D$]
Processor with Integrated DySER
[Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback, with I$, Register File, Exec Units, D$, and DySER]
Program Execution with DySER
[Figure: program code divided between Processor and DySER; a Config step precedes a region computed by -, +, and x functional units]
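To make the execution model on this slide concrete, here is a minimal Python sketch that treats a configured fabric as a small dataflow graph of functional units. The graph shape, the port names, and the `evaluate` helper are illustrative assumptions; the slides do not specify a configuration format.

```python
import operator

# Hypothetical functional-unit operations; the slide shows -, + and x units.
OPS = {"+": operator.add, "-": operator.sub, "x": operator.mul}


def evaluate(config, inputs):
    """Evaluate a configured dataflow graph.

    config: list of (dest, op, src_a, src_b) tuples in topological order,
            standing in for the configuration loaded into the fabric.
    inputs: dict mapping input port names to values sent by the processor.
    """
    values = dict(inputs)
    for dest, op, src_a, src_b in config:
        values[dest] = OPS[op](values[src_a], values[src_b])
    return values


# Illustrative kernel (an assumption, not from the slides): out = (a - b) * (c + d)
CONFIG = [
    ("t0", "-", "a", "b"),
    ("t1", "+", "c", "d"),
    ("out", "x", "t0", "t1"),
]

result = evaluate(CONFIG, {"a": 7, "b": 2, "c": 3, "d": 1})
print(result["out"])  # (7 - 2) * (3 + 1) = 20
```

In this sketch the processor's role is reduced to supplying `inputs` and reading `out`, which loosely mirrors the send and receive steps named later in the deck.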
Outline




Motivation & Background
What We Did
What We Found Challenging
What We Learned
Prototype Components
[Figure: prototype components. Software Stack: C code, compiler, generated Proc and DySER code. FPGA: OpenSPARC (Decode, I$, Exec Units, D$, Register File) with DySER. Also labeled: ISA Extensions, DySER μArch Interface]
Software Stack

Utilized LLVM compiler framework and IR
[Figure: compiler flow. C code, then Find Frequently Executed Regions, then Map Regions to DySER, producing Proc and DySER code]
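The two compiler steps named above (find frequently executed regions, then map them to DySER) can be illustrated with a small profile-driven selection. This is only a sketch under stated assumptions: the region names, the `select_regions` helper, and the 10% threshold are hypothetical and do not describe the actual LLVM-based passes.

```python
def select_regions(profile, min_fraction=0.1):
    """Return names of regions whose share of total execution count
    is at least min_fraction, hottest first.

    profile: dict mapping a region name to its execution count
             (hypothetical profiling output).
    """
    total = sum(profile.values())
    if total == 0:
        return []
    hot = [(name, count) for name, count in profile.items()
           if count / total >= min_fraction]
    hot.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hot]


# Hypothetical profile: one dominant loop and two cold regions.
profile = {"loop_a": 900, "loop_b": 80, "init": 20}
print(select_regions(profile))        # ['loop_a']
print(select_regions(profile, 0.05))  # ['loop_a', 'loop_b']
```

Regions returned here would be the candidates handed to the mapping step; everything else stays on the processor.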
ISA Extensions
Instruction   | op | field    | op3    | remaining fields
DySER_Init    | 10 | config   | 110111 | config, 000, config
DySER_Send    | 10 | DI1[4:0] | 110111 | RS1[4:0], 001, DI2[4:0], V, RS2[4:0]
DySER_Load    | 11 | DI1[4:0] | 000000 | RS1[4:0], 0, 1000000000, RS2[4:0]
DySER_Receive | 10 | DO1[4:0] | 110111 | RD[4:0], unused, 010, unused
DySER_Store   | 11 | DI1[4:0] | 000100 | RS1[4:0], 0, 1000000000, RS2[4:0]
DySER μArch Interface
[Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback, with I$, Register File, Exec Units, and D$; DySER is connected through Configuration, Data In, and Data Out]
RTL Modifications
Stage | Modules Modified | Lines Modified
Fetch | Fetch Control Logic, Top Level Fetch Module | 46
Thread Select | Thread Switch Logic, Thread Completion Control Logic | 19
Decode | Instruction Decode Logic, Long Latency Instr Control Logic | 210
Execute | Execute and Bypass Control Logic, Bypass Mux Module, Top Level Exec Module | 216
Store Buffer | Store Buffer Datapath and Control Logic | 23
Small change required
FPGA Bring-up



Prototype mapped onto Xilinx Virtex 5 FPGA board
Boots unmodified OpenSPARC Ubuntu 7.10 Linux
DySER not part of critical path!
Outline




Motivation & Background
What We Did
What We Found Challenging
What We Learned
Compiler Debugging

Can’t debug internal state of DySER, so a ‘debugging’ backend was created
  Converts code generated for DySER back to SPARC
  Ensures intermediate representations are functionally correct
[Figure: code split into Proc and DySER regions, alongside a Proc-only version]
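The idea behind the debugging backend, checking that code lowered back to ordinary instructions still computes the same result, resembles differential testing. The sketch below is a loose illustration of that idea only: the two kernels, the `differential_test` helper, and the input ranges are assumptions, and nothing here reflects the real compiler backend.

```python
import random


def reference_kernel(a, b, c, d):
    """Plain computation, standing in for the original code on the processor."""
    return (a - b) * (c + d)


def lowered_kernel(a, b, c, d):
    """Stand-in for DySER-mapped code converted back to ordinary instructions."""
    t0 = a - b
    t1 = c + d
    return t0 * t1


def differential_test(ref, candidate, trials=1000, seed=0):
    """Return the first input tuple on which the two functions disagree,
    or None if they agree on every sampled input."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = tuple(rng.randint(-1000, 1000) for _ in range(4))
        if ref(*args) != candidate(*args):
            return args
    return None


print(differential_test(reference_kernel, lowered_kernel))  # None
```

A mismatch returned by `differential_test` would point at a specific input to replay, which is the kind of visibility the slide says is missing inside DySER itself.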
ISA Extensions


SPARC ISA uses most of its encoding space
Headroom found in ‘Implementation Dependent’ instructions
DySER Implementation Dependent Instructions
Instruction   | op | field    | op3    | remaining fields
DySER_Init    | 10 | config   | 110111 | config, 000, config
DySER_Send    | 10 | DI1[4:0] | 110111 | RS1[4:0], 001, DI2[4:0], V, RS2[4:0]
DySER_Receive | 10 | DO1[4:0] | 110111 | RD[4:0], unused, 010, unused
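As a rough illustration of how such an instruction word could be assembled, the sketch below packs the leading fields using the standard SPARC format-3 bit positions (op in bits 31:30, a 5-bit field in 29:25, op3 in 24:19, a 5-bit field in 18:14). The values `10` and `110111` come from the slide; the handling of the low 14 bits as one opaque field, and the names `encode`, `decode`, `field_a`, `field_b`, and `low14`, are assumptions made for illustration.

```python
OP_IMPDEP = 0b10        # leading two bits shown on the slide
OP3_IMPDEP2 = 0b110111  # six-bit opcode field shown on the slide


def encode(op, field_a, op3, field_b, low14=0):
    """Pack fields into a 32-bit word using SPARC format-3 bit positions.

    The layout of the low 14 bits is treated as opaque (assumption).
    """
    for value, width in ((op, 2), (field_a, 5), (op3, 6), (field_b, 5), (low14, 14)):
        if not 0 <= value < (1 << width):
            raise ValueError("field out of range")
    return (op << 30) | (field_a << 25) | (op3 << 19) | (field_b << 14) | low14


def decode(word):
    """Unpack a 32-bit word into the same fields used by encode."""
    return {
        "op": (word >> 30) & 0x3,
        "field_a": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "field_b": (word >> 14) & 0x1F,
        "low14": word & 0x3FFF,
    }


# Hypothetical operands: port 3 in the first field, register 9 in the second.
word = encode(OP_IMPDEP, 3, OP3_IMPDEP2, 9)
print(f"{word:032b}")
print(decode(word))
```

Keeping the first four fields in their usual positions is what would let an existing decoder recognize the word as implementation dependent before any DySER-specific logic inspects the remaining bits.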
DySER Configuration

Original “DySER Config” proposal abandoned
Instead, embedded configuration bits in instructions and let the processor handle the rest
Mitigates system-level interference
[Figure: two memory layouts. Original proposal: DySER Config<ptr>, with ptr referencing configuration bits stored in memory (0101010101010, 1010101011011, …). Adopted approach: a sequence of DySER_Init<0010101..>, DySER_Init<0101101..>, DySER_Init<1100100..>, DySER_Init<0100110..> instructions]
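Embedding configuration bits in instructions implies cutting the configuration into instruction-sized payloads. The sketch below shows one way that could look; the payload width of 7 bits, the zero-padding of the last chunk, and the `split_config` helper are all hypothetical choices, since the slides do not give the real payload size.

```python
def split_config(bits, payload_width):
    """Split a configuration bit string into fixed-width payloads,
    one per DySER_Init instruction.

    The final chunk is right-padded with zeros (padding policy is an assumption).
    """
    if payload_width <= 0:
        raise ValueError("payload_width must be positive")
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only 0 and 1")
    chunks = []
    for start in range(0, len(bits), payload_width):
        chunk = bits[start:start + payload_width]
        chunks.append(chunk.ljust(payload_width, "0"))
    return chunks


# Bit patterns taken from the memory picture on the slide; width 7 is hypothetical.
config_bits = "0101010101010" + "1010101011011"
for payload in split_config(config_bits, payload_width=7):
    print(f"DySER_Init<{payload}>")
```

Because each payload travels in the instruction stream, the configuration is fetched like any other code, which is consistent with the slide's point about letting the processor handle the rest.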
FPGA Sizing



Fit only a 2x2 DySER with 32-bit datapath
Switches contribute heavily to DySER’s size
Future work includes mapping 8x8 DySER
Resource | Available | OpenSPARC | OpenSPARC with 2x2 DySER (32-bit) | OpenSPARC with 4x4 DySER (8-bit)
# Slice Registers | 69120 | 19634 | 36358 | 25616
# Slice LUTs | 69120 | 31010 | 57110 | 45419
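The sizing numbers are easier to compare as utilization percentages. The short sketch below computes them from the figures on this slide, assuming the columns read Available, OpenSPARC, OpenSPARC with 2x2 DySER (32-bit), and OpenSPARC with 4x4 DySER (8-bit); the dictionary keys and the `utilization` helper are illustrative names.

```python
# Resource counts from the slide (column assignment is an assumption).
AVAILABLE = {"slice_registers": 69120, "slice_luts": 69120}
USED = {
    "OpenSPARC": {"slice_registers": 19634, "slice_luts": 31010},
    "OpenSPARC with 2x2 DySER (32-bit)": {"slice_registers": 36358, "slice_luts": 57110},
    "OpenSPARC with 4x4 DySER (8-bit)": {"slice_registers": 25616, "slice_luts": 45419},
}


def utilization(used, available):
    """Return percentage of each available resource consumed by a design."""
    return {res: 100.0 * used[res] / available[res] for res in used}


for name, used in USED.items():
    pct = utilization(used, AVAILABLE)
    summary = ", ".join(f"{res} {value:.1f}%" for res, value in pct.items())
    print(f"{name}: {summary}")
```

Under that column assumption, the 2x2 32-bit design uses roughly 83% of the LUTs, which fits the statement that nothing larger could be mapped at that datapath width.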
Outline




Motivation & Background
What We Did
What We Found Challenging
What We Learned
Can be Integrated into Processor

DySER + OpenSPARC prototype:
  Hardware and Software work
  Compiler work in ten months, hardware in six
Design of DySER remained modular and extendable
‘Least Invasive’ Mantra

3 person-months “understanding” OpenSPARC



Pipeline organization
Verilog RTL
Avoided greedy integration strategy

E.g. DySER load and store instructions
Speedup less than ideal
[Table: speedup per benchmark (bzip, hmmer, h264ref, gobmk, libquantum, mcf), with columns HPCA-11 and After Prototyping. Values in source order: 1.08x, 1.09x, 1.30x, 1.11x, 1.36x, 1.07x, 1.20x, 1.01x, 1.09x, 1.00x, 1.30x, 1.50x]
Baseline: Single Issue In-Order Pipeline
Documentation
"What does this do?"
"TBD: is this necessary?"
"Kill the next three interrupts, after that, you are on your own."
"There must be a cleaner way to do this!"
Healthy skepticism of documentation is good
Conclusion

DySER is an integrable accelerator
Produced prototype in ten months with a team of six graduate students
Pay attention to system interaction of accelerator
Compiler- and RTL-intensive work feasible with medium effort
Questions?
[Figure legend: Red = DySER, Green = OpenSPARC, Blue = Mem Controller]
Verification


Compiler
OpenSPARC/DySER Prototype



OpenSPARC comes with VCS test suite
Included regression tests for normal SPARC execution
Embedded DySER instructions into regression tests