slides - FASTER
NASA/ESA Conference on
Adaptive Hardware and Systems (AHS-2013)
Runtime Adaptation on Dataflow HPC
Platforms
R. Cattaneo, C. Pilato, M. Mastinu, M.D. Santambrogio
Politecnico di Milano – Dip. di Elettronica, Informazione e Bioingegneria
O. Kadlcek, O. Pell
Maxeler Technologies Ltd., London, UK
Torino, Italy – June 25, 2013
Context Definition
• The portion of the application that needs to be accelerated is usually implemented in hardware
• Resource limitations can become a bottleneck
• In some contexts, the HPC application should be able to adapt to the environment
• Partial dynamic reconfiguration is a well-known technique to change the behavior at run time while reusing the same logic across different tasks
Christian Pilato – Politecnico di Milano
Reconfigurable Computing
“Reconfigurable computing is intended to fill
the gap between hardware and software, achieving potentially
much higher performance than software, while maintaining a higher
level of flexibility than hardware”
(K. Compton and S. Hauck, Reconfigurable Computing: A Survey of Systems and Software, 2002)
Reasons Behind
• Some applications require performance that cannot be achieved by software
• Some applications need to be flexible, modifiable, and adaptable; traditional hardware cannot provide this
• Reconfigurable computing platforms can be altered after deployment, yielding a high-performance device able to meet resource, adaptability, and reliability constraints
Maxeler Architecture
• Maxeler systems are based on the interaction between a
CPU and an FPGA
• Maxeler exploits FPGAs only as devices devoted to
hardware acceleration
Why not try to enhance the
flexibility and performance of
Maxeler platforms by
exploiting some intrinsic
characteristics of FPGAs?
Objectives
Rationale
• Dynamic Partial Reconfiguration is a technique that can be applied to cope with problems such as limited available resources and the need for system adaptability and reliability
• Maxeler architectures are very efficient for computation, but they do not support Dynamic Partial Reconfiguration
Goals
• Design a new tool flow that supports Dynamic Partial Reconfiguration in Maxeler architectures, to offer adaptivity in the HPC domain
Canny edge detector
Reconfiguration in FPGAs
Useful Definitions
• Full Bitstream
• Reconfigurable partitions
• Reconfigurable modules
• Partial Bitstream
• Configurations
[Figure labels: FPGA, Full bitstream]
Maxeler Architecture
Example application
[Figure labels: SLiC, Manager]
MaxCompiler flow
[Figure labels: MaxIDE, Java runtime, Java compilation, VHDL, BIT file]
Preliminary Considerations
• Hierarchical design vs. flat design
• NGDBuild, MAP, PAR, and BitGen are run once per configuration
• A PXML file is needed to drive the process
Proposed Approach
• Focus on Kernels instead of the Manager
• Kernels in the same Reconfigurable Block must have the same characteristics;
• In every Configuration, exactly one Kernel must be assigned to each Reconfigurable Block;
• The same Kernel cannot be placed in two different Reconfigurable Blocks.
• Preserve as much as possible of the MaxCompiler/Xilinx tool flow structure
• Hide the details from the designer
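The three rules on Kernels and Reconfigurable Blocks can be sketched as a small consistency check. This is only an illustration, not part of the proposed tool flow: the `Kernel` class, the `validate` function, and the use of an `interface` tuple as a stand-in for "the same characteristics" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    # Hypothetical stand-in for a Kernel's "characteristics"
    # (for instance, the signature of its input/output streams).
    interface: tuple


def validate(blocks, configurations):
    """Check the three rules and return a list of violation messages.

    blocks:         dict mapping a block name to the list of Kernels it may host
    configurations: list of dicts mapping a block name to one kernel name
    """
    errors = []

    # Rule 1: Kernels in the same block share the same characteristics.
    for block, kernels in blocks.items():
        if len({k.interface for k in kernels}) > 1:
            errors.append(f"{block}: kernels differ in characteristics")

    # Rule 3: the same Kernel is not placed in two different blocks.
    owner = {}
    for block, kernels in blocks.items():
        for k in kernels:
            if k.name in owner and owner[k.name] != block:
                errors.append(f"{k.name}: placed in both {owner[k.name]} and {block}")
            owner.setdefault(k.name, block)

    # Rule 2: every Configuration assigns exactly one Kernel to each block.
    for index, cfg in enumerate(configurations):
        for block, kernels in blocks.items():
            if cfg.get(block) not in {k.name for k in kernels}:
                errors.append(f"configuration {index}: {block} has no valid kernel")
        for block in sorted(cfg.keys() - blocks.keys()):
            errors.append(f"configuration {index}: unknown block {block}")

    return errors
```

Because each Configuration is a dictionary keyed by block name, "at most one Kernel per block" holds by construction, so the check only has to catch missing or invalid assignments.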
Reconfiguration on Kernels
User interface: DFE code
PRManager
Main
...
Configuration A = ...
Configuration B = ...
build(A,B)
• Reconfigurable Block = Reconfigurable Partition
• Kernel = Reconfigurable Module
Considerations
User interface: Host code
DFE
max_reconfig_partial_bitstream
Case Study: Edge Detection
• Canny edge detection is applied to a video
• There are two Reconfigurable Blocks and a total of four filters
   • each filter represents a Reconfigurable Module
• Initially, the first two filters are applied
• Then, the device is partially reconfigured and the other two filters are applied
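The two-phase schedule of the case study can be pictured with a toy software model. Everything here is an assumption made for illustration: the slides do not name the four filters, so `filter_1` to `filter_4` are arbitrary placeholders, the `Dfe` class is hypothetical, frames are reduced to flat lists of integers, and the two blocks are assumed to be chained with the intermediate result kept between phases.

```python
from typing import Callable, Dict, List

Frame = List[int]                      # toy 1-D "frame"; a real frame is a 2-D image
Filter = Callable[[Frame], Frame]


# Placeholder filters (hypothetical; not the actual Canny stages).
def filter_1(frame: Frame) -> Frame:
    return [p + 1 for p in frame]


def filter_2(frame: Frame) -> Frame:
    return [p * 2 for p in frame]


def filter_3(frame: Frame) -> Frame:
    return [p - 3 for p in frame]


def filter_4(frame: Frame) -> Frame:
    return [max(p, 0) for p in frame]


class Dfe:
    """Toy model: named Reconfigurable Blocks, each hosting one module."""

    def __init__(self, configuration: Dict[str, Filter]):
        self.blocks = dict(configuration)   # initial configuration
        self.reconfigurations = 0

    def reconfigure(self, block: str, module: Filter) -> None:
        """Swap the module hosted in one block (models a partial reconfiguration)."""
        if block not in self.blocks:
            raise KeyError(block)
        self.blocks[block] = module
        self.reconfigurations += 1

    def run(self, frame: Frame) -> Frame:
        """Apply the modules of all blocks in block order."""
        for module in self.blocks.values():
            frame = module(frame)
        return frame


def process(frame: Frame) -> Frame:
    dfe = Dfe({"RB0": filter_1, "RB1": filter_2})
    intermediate = dfe.run(frame)           # phase 1: first two filters
    dfe.reconfigure("RB0", filter_3)        # partial reconfiguration of both blocks
    dfe.reconfigure("RB1", filter_4)
    return dfe.run(intermediate)            # phase 2: the other two filters
```

The point of the sketch is only the ordering: two modules run, both blocks are swapped, then the remaining two modules run on the intermediate result while the rest of the device model stays unchanged.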
MaxWorkstation
• The targeted platform is a MaxWorkstation
• It contains an Intel i7 870 quad-core CPU with 16 GB RAM
• The Intel CPU is connected to the DFE via PCI Express
• The DFE is a MAX3 board with a Xilinx Virtex-6 FPGA and 24 GB RAM
Experimental Results
• Methodology applied to a video taken from “Mission Impossible”
   • combined with a set of compiler extensions for the automatic code generation of the kernels
   • details are totally hidden from the designer
[VIDEO]
Conclusions and Future Work
• The proposed approach integrates Partial Dynamic Reconfiguration into a dataflow architecture
• The process is totally transparent to the designer
• Future work will focus on the current limitations:
   • Reconfigurable Area constraints can be specified only as multiples of clock regions
   • During the partial reconfiguration of some Reconfigurable Blocks, all the Kernels are held in reset
Questions
Implementation: design flow
The build process is divided into four main
stages
First build stage
• When the build process starts, MaxDC, XST and NGCBuild are run independently for each Reconfigurable Block and for the static part
• The result of this first stage is a large number of netlist files
Second build stage
• The second stage consists of running NGDBuild, MAP, PAR, pr_verify and BitGen for each Configuration
• The PXML file is generated automatically
• The static part is implemented only in the first Configuration
• The reconfigurable modules are implemented only the first time they appear in a Configuration
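The reuse policy of the second stage (static part implemented once, each reconfigurable module implemented on first appearance) can be sketched as a planning function. The function name `plan_implementation` and the dictionary-based representation of a Configuration are assumptions for this example; the sketch does not call any Xilinx or Maxeler tool.

```python
def plan_implementation(configurations):
    """Decide, per Configuration, what is newly implemented and what is reused.

    configurations: list of dicts mapping a block name to a kernel name,
                    in the order in which they are built.
    Returns one {"implement": [...], "reuse": [...]} entry per Configuration.
    """
    implemented = set()   # (block, kernel) pairs already implemented
    plan = []
    for index, cfg in enumerate(configurations):
        step = {"implement": [], "reuse": []}

        # The static part is implemented only in the first Configuration.
        if index == 0:
            step["implement"].append("static")
        else:
            step["reuse"].append("static")

        # A module is implemented only the first time it appears.
        for block, kernel in cfg.items():
            key = (block, kernel)
            if key in implemented:
                step["reuse"].append(key)
            else:
                step["implement"].append(key)
                implemented.add(key)

        plan.append(step)
    return plan
```

With two Configurations that share one module, the second entry of the plan lists only the module that changed under `implement`, which is where the saving over a full re-implementation comes from.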
Final stage
• Once the full bitstream and all the partial ones have been generated, they are encapsulated in the .max file
• The first Configuration passed to the build method is chosen as the “default” Configuration
• This means that its full bitstream will be loaded in the CFPGA when the program starts
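The role of the "default" Configuration can be sketched as a small manifest. This is a guess at the bookkeeping, not the actual .max layout: the `package` and `initial_state` functions are hypothetical, and the assumption that one partial bitstream exists per distinct (block, kernel) pair is made only for this example.

```python
def package(configurations):
    """Build a toy manifest from the Configurations passed to the build.

    configurations: non-empty list of dicts mapping a block name to a kernel name.
    """
    if not configurations:
        raise ValueError("at least one Configuration is required")

    # The first Configuration passed to the build is the default one.
    default = dict(configurations[0])

    # Assumption: one partial bitstream per distinct (block, kernel) pair.
    partials = []
    for cfg in configurations:
        for block, kernel in cfg.items():
            if (block, kernel) not in partials:
                partials.append((block, kernel))

    return {"default_configuration": default, "partial_bitstreams": partials}


def initial_state(manifest):
    """Block contents right after program start (the default full bitstream)."""
    return dict(manifest["default_configuration"])
```

Swapping the order of the arguments changes only which Configuration is live at startup; the set of partial bitstreams in the manifest stays the same.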