Implementing Image Processing Pipelines in a Hardware/Software Environment
Heather Quinn¹, Dr. Miriam Leeser (Northeastern University)
Dr. Laurie Smith King (College of the Holy Cross)
¹ hquinn@ece.neu.edu
Motivation: Accelerate image processing tasks through efficient
use of FPGAs. Combine already designed components at runtime
to implement a series of transformations (pipelines).

Our Environment
 A heterogeneous network of workstations (NOW)
 FPGAs are expensive, available on some hosts but not others
 NOW provide coarse-grained parallelism, FPGAs provide fine-grained parallelism
 Software processing
 Designed in Java
 Done on a single host
 Hardware processing
 Designed with JHDL (from BYU)
 Input and output image in FPGA's on-board memory

Hardware Systems
 Input image communicated from host to FPGA at beginning of processing
 Output image communicated from FPGA to host at end of processing
 Host and FPGA connected through PCI bus

Efficient Use of FPGAs
 Using hardware incurs execution costs not present in software systems
 Hardware initialization,
 Communicating image, and
 Reprogramming
 Software algorithm's runtime for small images is less than the hardware costs
 Profiling the hardware and software runtimes for different image sizes determines the crossover point
 Deciding at runtime to execute in software or hardware is simple for one algorithm processing one image

Image Processing Pipelines
 Series of image processing algorithms applied to an image
 Each algorithm has a software and hardware implementation
 Finding the crossover point for a pipeline is complicated
 Exponential number of implementations
 Reprogramming costs
 Need a strategy to find a fast pipeline implementation at runtime

An Example
Median Filter & Edge Detection
[Figure: processing flow diagram. Box labels: Start App, HW Init, Get Data, Send Data, FPGA Repgm, FPGA2, Median, Edge Det, Display.]

Median Filter and Edge Detection Profiles
[Figure: Median and Edge Runtimes (with Init Times). x-axis: Total Pixels (0 to 8000); y-axis: Milliseconds; series: median sw, median hw, edge sw, edge hw.]
[Figure: Median to Edge Running Time (with Initialization Time). x-axis: Total Pixels (0 to 8000); y-axis: Milliseconds; series: sw/sw total, sw/hw total, hw/sw total, hw/hw total.]

Possible Implementations
[Figure 1: four implementations of the pipeline: 1a) sw/sw implementation, 1b) sw/hw implementation, 1c) hw/sw implementation, 1d) hw/hw implementation. Stage boxes: Pad Image, Median Filter, Remove Padding, Fix Padding, RPRG, Edge Detect.]
 Blue boxes are hw/sw boundaries
 Red boxes are fixing image edges
 Green boxes are reprogramming

The Library of Components
 Each algorithm has two implementations: hardware and software
 Each implementation has known runtimes for a set of images
 Interpolation used for rest of image sizes
 Each hardware implementation has a known area size
 All components are image in/image out

Problem Statement
 Inputs: a profiled library of image processing implementations (components), a pipeline, and an image
 Output: an assignment of each component to a hardware or software implementation
Need pipeline implementations that minimize reprogramming and communication costs

Assumptions
 Reprogramming and communication costs incurred at sw/hw boundaries
 Might need to fix image edge in between components
 Problem sizes of 20 or fewer stages
 500 ms to make a decision

Solving Pipeline Assignment
 Exhaustive Search
 Find: Optimal solutions
 How: Search entire problem space
 Algorithm Runtime: O(2^N), where N is the number of pipeline stages
 ILP
 Find: Optimal solutions
 How: AMPL model running on CPLEX
 Need: ILP formulation of the problem statement
 Algorithm Runtime: Unknown
 Greedy
 Find: Sub-optimal solutions
 How: Make optimal decisions for each pipeline stage based on hardware area usage and speedup values
 Algorithm Runtime: O(N), where N is the number of pipeline stages
 Local Search
 Find: Sub-optimal solutions
 How: Improve upon initial solutions (found through Greedy or randomly)
 Algorithm Runtime: runs for user supplied amount of time

Experiments
 Synthetic components arranged into pipelines of length 1 to 20
 Exhaustive algorithm run to completion
 Used as a baseline for solution quality
 Timed to find 500 ms boundary
 ILP solver constrained to 500 ms
 Ability to solve dependent on components
 Local Search returns best solution found within time limit

Results
 Optimal solutions in 500 ms
 Exhaustive: optimal solutions for up to 11 stages
 ILP: all pipelines up to 13 stages, some pipelines up to 18 stages, and none larger
 Sub-optimal solutions in 500 ms
 Greedy and local search: all problem sizes
 Strategy
 Exhaustive and ILP up to 13 stages
 Greedy or local search for more than 13 stages
[Figure: Runtime for Exhaustive and ILP Algorithms. x-axis: Problem Size; y-axis: milliseconds (logarithmic scale); series: Average Exhaustive Runtime, Maximum Exhaustive Runtime, Average ILP Runtime, Maximum ILP Runtime.]
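The exhaustive and greedy strategies listed under Solving Pipeline Assignment can be illustrated with a small Python sketch. This is not the poster's implementation (which was designed in Java, with an AMPL model on CPLEX for ILP). The component profiles in `STAGES`, the `boundary_ms` and `reprogram_ms` parameters, and the cost model itself are assumptions made for illustration: a communication charge at every sw/hw boundary, and a reprogramming charge for every hardware stage after the first. The greedy variant shown picks each stage by incremental time only; it does not model hardware area, which the poster's greedy algorithm also considers.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical profiles: (total pixels, runtime in ms) samples per implementation.
STAGES = [
    {"sw": [(1000, 40.0), (8000, 320.0)], "hw": [(1000, 10.0), (8000, 30.0)]},  # median
    {"sw": [(1000, 20.0), (8000, 160.0)], "hw": [(1000, 8.0), (8000, 20.0)]},   # edge
]

def interpolate(profile, pixels):
    """Linearly interpolate a runtime from sorted (pixels, ms) samples."""
    xs = [p for p, _ in profile]
    ys = [t for _, t in profile]
    if pixels <= xs[0]:
        return ys[0]
    if pixels >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, pixels)  # first sample at or above the requested size
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (pixels - xs[i - 1]) / (xs[i] - xs[i - 1])

def pipeline_cost(assignment, stages, pixels, boundary_ms, reprogram_ms):
    """Total time of one sw/hw choice per stage under the assumed cost model."""
    total = 0.0
    prev = "sw"        # the image starts on the host
    hw_used = False
    for choice, stage in zip(assignment, stages):
        total += interpolate(stage[choice], pixels)
        if choice != prev:
            total += boundary_ms          # communication at a sw/hw boundary
        if choice == "hw":
            if hw_used:
                total += reprogram_ms     # FPGA must be reprogrammed for this stage
            hw_used = True
        prev = choice
    if prev == "hw":
        total += boundary_ms              # final image returns to the host
    return total

def exhaustive(stages, pixels, boundary_ms, reprogram_ms):
    """Evaluate all 2^N assignments and return the cheapest one."""
    return min(
        product(("sw", "hw"), repeat=len(stages)),
        key=lambda a: pipeline_cost(a, stages, pixels, boundary_ms, reprogram_ms),
    )

def greedy(stages, pixels, boundary_ms, reprogram_ms):
    """Fix one stage at a time, keeping the cheaper choice for the pipeline prefix."""
    chosen = []
    for k in range(len(stages)):
        best = min(
            ("sw", "hw"),
            key=lambda c: pipeline_cost(chosen + [c], stages[: k + 1],
                                        pixels, boundary_ms, reprogram_ms),
        )
        chosen.append(best)
    return tuple(chosen)
```

With these made-up numbers, `exhaustive(STAGES, 1000, 50.0, 100.0)` keeps both stages in software, while at 4500 pixels the median filter moves to hardware. The crossover depends on image size and on the boundary costs, which is why a per-pipeline decision is needed.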
Future Work
 ADAPT: Algorithm that calls exhaustive, ILP and local search
algorithms to solve pipeline assignment problem based on
problem size
 Decision Time: Study how the amount of time allotted affects
ADAPT results
 Virtex-II Pro: Add scheduling support for using embedded
PowerPC cores
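A rough idea of how an ADAPT-style dispatcher might choose a solver by problem size is sketched below in Python. Everything here is an assumption for illustration rather than the planned design: the function names, the 0/1 encoding (0 for software, 1 for hardware), the `exhaustive_limit` of 11 stages and the 0.5 s budget (both taken from the figures reported on this poster), and the toy cost function. The ILP branch is omitted because it would require an external solver.

```python
import time
from itertools import product

def make_cost(sw_ms, hw_ms, boundary_ms):
    """Build a toy cost function; assignments are tuples of 0 (sw) / 1 (hw)."""
    def cost(assign):
        total, prev = 0.0, 0              # the image starts in software
        for a, s, h in zip(assign, sw_ms, hw_ms):
            total += h if a else s
            if a != prev:
                total += boundary_ms      # charge for crossing a sw/hw boundary
            prev = a
        return total + (boundary_ms if prev else 0.0)  # return image to host
    return cost

def exhaustive(n, cost):
    """Search all 2^n assignments."""
    return min(product((0, 1), repeat=n), key=cost)

def local_search(n, cost, budget_s, start=None):
    """Flip one stage at a time, keeping improvements, until no flip helps
    or the time budget runs out. May stop at a local optimum."""
    deadline = time.monotonic() + budget_s
    best = tuple(start) if start is not None else (0,) * n
    best_cost = cost(best)
    improved = True
    while improved and time.monotonic() < deadline:
        improved = False
        for i in range(n):
            if time.monotonic() >= deadline:
                break
            cand = best[:i] + (1 - best[i],) + best[i + 1:]
            c = cost(cand)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
    return best

def adapt(n, cost, budget_s=0.5, exhaustive_limit=11):
    """Pick a solver by problem size: exhaustive when small, local search otherwise."""
    if n <= exhaustive_limit:
        return exhaustive(n, cost)
    return local_search(n, cost, budget_s)
```

Note that single-flip local search can stall: with a high boundary cost, moving any one stage to hardware looks worse than staying in software, even when moving all of them would be faster. Seeding it with a greedy solution, as the poster describes, is one way to start from a better point.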
Publications
L. Smith King, H. Quinn, M. Leeser, D. Galatopoullos and E. S.
Manolakos, “Run-time Execution of Reconfigurable Hardware in a Java
Environment”, International Conference on Computer Design,
September 2001.
H. Quinn, M. Leeser, and L. Smith King, “Accelerating Image Processing
in a Software/Hardware Environment”, MAPLD International
Conference, September 2002.