Acceleration of the Retinal Vascular Tracing Algorithm using FPGAs

Miriam Leeser
Shawn Miller
Smart Camera: provides the host PC with image data, along with image
processing results, at frame rate and with low latency
Figure: Firebird board block diagram. Framegrabber; MEMORY 0 and MEMORY 1 (image data); FPGA containing the Memory Switching Design, the Direction Filter Design (BlockRAM) and the Data Packing Design; MEMORY 2 and MEMORY 3 (results); PCI bus to the host.
Retinal Vascular Tracing Application
Goal: Detection and enhancement of the vascular
structure of a patient’s retina from a live video
feed
The latency and throughput requirements of real-time
image processing cannot be met by software
on a general-purpose processor
Timing Issues
Return results for each pixel at the camera's frame rate
Very Low Latency
•Surgical laser must be shut off immediately after detecting that it
is aimed incorrectly
•Cannot tolerate a 1-frame delay (33 msec at 30 frames/sec)
•Complex memory management is required to achieve minimum
latency
Figure: Storage of a 5x5 pixel image (pixels numbered 0 to 24) in Memory 0 and Memory 1.
Latency is currently on the order of 100 msec
Objective: accelerate retinal vascular tracing by implementing the
computation of template responses in reconfigurable
hardware.
Figure: System overview. Framegrabber (Dillon Eng.); Firebird board with MEMORY 0 to MEMORY 3, the Memory Switching Design, the Direction Filter Design (BlockRAM) and the Data Packing Design; image data passed to the host over the PCI bus.
Figure: Direction Templates. Datapath labels: input registers REG0 to REG23; POS, POS2, NEG and NEG2 terms; PARTIAL RESPONSE stages; INTERCONNECTION; RESPONSE_A (TEMPLATE_A) compared against RESPONSE_B (TEMPLATE_B) to select the RESPONSE; codes 000 to 111.
What does the algorithm do?
Retinal vascular tracing: detection
of blood vessels in images of the
retina
The algorithm finds blood vessels
and traces out their structure
Where is the algorithm used?
Processing live video of the
patient's retina during laser retinal
surgery
Highlighting the vascular
structure helps the surgeon avoid
damage
Why do we need to accelerate it?
Current implementation: software
on a general-purpose processor
Images are 512x512 pixels and
must be processed at frame rate
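The frame-rate requirement can be made concrete with a little arithmetic. This sketch assumes the 30 frames/sec rate quoted under Timing Issues and counts pixels only; it does not model pixel depth or how the hardware pipelines the work.

```python
# Pixel-rate requirement for 512x512 images at 30 frames/sec
# (frame rate taken from the timing discussion; pixel depth is not modeled).
WIDTH, HEIGHT = 512, 512
FRAMES_PER_SEC = 30

pixels_per_frame = WIDTH * HEIGHT                   # pixels in one frame
pixels_per_sec = pixels_per_frame * FRAMES_PER_SEC  # pixels that must be handled each second
ns_per_pixel = 1e9 / pixels_per_sec                 # time budget per pixel if handled one at a time

print(f"pixels per frame : {pixels_per_frame:,}")
print(f"pixels per second: {pixels_per_sec:,}")
print(f"budget per pixel : {ns_per_pixel:.0f} ns")
```

This prints 262,144 pixels per frame, 7,864,320 pixels per second and a budget of about 127 ns per pixel. That budget covers all direction templates for a pixel if pixels are processed one after another; evaluating the templates in parallel in hardware is one way to stay within it.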
Implementation
Hardware Acceleration
Template responses are calculated in parallel in hardware
All pixels in the image are processed
The camera is connected directly to the board (no host interaction)
Only the results are sent to the host, as soon as they become
available and while new results are being calculated
Memory Management
Very Low Latency
Surgical laser must be shut off immediately if we detect that it
is aimed incorrectly
Cannot tolerate a one-frame delay
A complex memory management scheme is required
Problems
New data from the camera must be written to input memory continuously,
while previously written data is read out for processing
A single memory cannot be read and written on the same clock
cycle
The image is stored row-wise, but must be read column-wise
Solution
Store the image in “checkerboard” fashion in two memories.
Horizontally and vertically adjacent pixels are stored in different memories
A column of data is read by alternating between the two
memories on every clock cycle
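The checkerboard scheme can be sketched in a few lines. This is a minimal software model, not the hardware design: it assumes pixels are written in row-major order and that a pixel's word address within its memory is its row-major index halved, which keeps addresses unique for both odd and even image widths. The helper names (`bank_of`, `address_of`, `store_checkerboard`, `read_column`) are hypothetical. Reading a column visits the two memories alternately, so each memory is idle on every other cycle; presumably those cycles are what leave room for camera writes.

```python
from typing import Dict, List, Tuple

Banks = Tuple[Dict[int, int], Dict[int, int]]

def bank_of(row: int, col: int) -> int:
    """Checkerboard rule: even row+col goes to Memory 0, odd row+col to Memory 1."""
    return (row + col) % 2

def address_of(row: int, col: int, width: int) -> int:
    """Assumed word address inside the pixel's memory: row-major index halved."""
    return (row * width + col) // 2

def store_checkerboard(image: List[List[int]]) -> Banks:
    """Write a row-major image into two memories, alternating memories pixel by pixel."""
    width = len(image[0])
    banks: Banks = ({}, {})
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            banks[bank_of(r, c)][address_of(r, c, width)] = value
    return banks

def read_column(banks: Banks, col: int, height: int, width: int) -> Tuple[List[int], List[int]]:
    """Read one column, one pixel per clock cycle; also return which memory each cycle used."""
    schedule = [(bank_of(r, col), address_of(r, col, width)) for r in range(height)]
    values = [banks[b][a] for b, a in schedule]
    return values, [b for b, _ in schedule]

# 5x5 image with pixels numbered 0..24 in row-major order, as in the figure.
image = [[5 * r + c for c in range(5)] for r in range(5)]
banks = store_checkerboard(image)
col, order = read_column(banks, 1, 5, 5)
print(sorted(banks[0].values()))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
print(col)                        # [1, 6, 11, 16, 21]
print(order)                      # [1, 0, 1, 0, 1]
```

Because the image width here is odd, Memory 0 ends up holding exactly the even-numbered pixels; with an even width such as 512 the same rule still alternates memories down every column.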
Figure: Checkerboard storage of a 5x5 image.
Figure: TEMPLATE_COMPARATOR units whose outputs are combined by adders and a subtractor to form the RESPONSE.
Badrinath Roysam
Charles Stewart
Ken Fritsche
Miriam Leeser
Shawn Miller
Figure: Reconfigurable Hardware. Input Memory 0 and Input Memory 1 feed the TEMPLATE_COMPARATOR units, which produce the RESULT (DIRECTION) and the TEMPLATE RESPONSE.
Figure: Memory 0 and Memory 1 activity over time (legend: Writing data from camera; Reading data to be processed; Inactive).
Firebird reconfigurable computing engine from Annapolis Micro Systems
•1 Xilinx Virtex-E (XCV2000E) FPGA
•5 memory banks (4 x 64-bit, 1 x 32-bit)
•5.4 Gbytes/sec of memory bandwidth
•66 MHz/64-bit PCI interface to host
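The Firebird figures can be cross-checked with peak-rate arithmetic. The sketch assumes each of the five banks transfers one full word per cycle on a common clock; under that assumption the quoted 5.4 Gbytes/sec implies a 150 MHz memory clock, which is an inference and not a figure from the poster. The PCI value is the theoretical burst peak of a 66 MHz, 64-bit bus, and the image-input rate assumes 8-bit pixels, which the poster does not state.

```python
# Peak-rate arithmetic for the Firebird board (theoretical peaks, not measurements).
BANK_WIDTHS_BITS = [64, 64, 64, 64, 32]  # 4 x 64-bit + 1 x 32-bit banks
QUOTED_MEM_BW = 5.4e9                    # bytes/sec, as listed above
PCI_CLOCK_HZ = 66e6                      # 66 MHz PCI
PCI_WIDTH_BITS = 64                      # 64-bit PCI

# Assumption: every bank moves one full word per cycle on a shared clock.
bytes_per_cycle = sum(BANK_WIDTHS_BITS) // 8
implied_mem_clock_hz = QUOTED_MEM_BW / bytes_per_cycle

pci_peak_bytes = PCI_CLOCK_HZ * PCI_WIDTH_BITS / 8

# Assumption: 8-bit pixels, 512x512 frames at 30 frames/sec.
image_input_bytes = 512 * 512 * 30 * 1

print(f"bytes per cycle      : {bytes_per_cycle}")
print(f"implied memory clock : {implied_mem_clock_hz / 1e6:.0f} MHz")
print(f"PCI peak             : {pci_peak_bytes / 1e6:.0f} MB/s")
print(f"image input (8-bit)  : {image_input_bytes / 1e6:.2f} MB/s")
```

This gives 36 bytes per cycle, an implied 150 MHz memory clock, a 528 MB/s PCI peak and about 7.86 MB/s of raw camera data, a small fraction of the PCI peak under these assumptions.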
Results
Original Image
Each pixel is passed through the design
unaltered.
More Information
In proceedings:
A. Can, H. Shen, J. N. Turner, H. L. Tanenbaum and B. Roysam, “Rapid Automated Tracing and Feature Extraction from Retinal Fundus Images Using Direct Exploratory Algorithms,” IEEE Transactions on Information Technology in Biomedicine, June 1999.
On the web:
http://www.ece.neu.edu/groups/rpl/projects/retinaltracing
This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering
Research Centers Program of the National Science Foundation (Award Number EEC-9986821).
Direction
The direction template with the maximum
response for each pixel. The direction is
represented by a value between 0 and 15.
Response
The maximum response that led to the
direction decision for each pixel.
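The Direction and Response outputs amount to an argmax over the 16 template responses at each pixel. The datapath figures show TEMPLATE_COMPARATOR units; the sketch below assumes they are arranged as a pairwise comparator tree (4 levels for 16 inputs) and that ties go to the lower direction index. It takes the responses as given and does not model how the templates themselves are computed; `max_response` is a hypothetical name.

```python
from typing import List, Tuple

def max_response(responses: List[int]) -> Tuple[int, int]:
    """Return (direction, response) for one pixel using a pairwise comparator tree.

    responses[d] is the template response for direction d (0..15).
    Each level halves the candidates, so 16 directions need 4 comparator levels.
    Ties keep the lower direction index (an assumed convention).
    """
    candidates = list(enumerate(responses))  # (direction, response) pairs
    while len(candidates) > 1:
        next_level = []
        for i in range(0, len(candidates), 2):
            a = candidates[i]
            if i + 1 == len(candidates):     # odd count: pass the last one through
                next_level.append(a)
                continue
            b = candidates[i + 1]
            # One comparator: b replaces a only if strictly larger.
            next_level.append(b if b[1] > a[1] else a)
        candidates = next_level
    return candidates[0]

resp = [3, 9, 4, 9, 1, 0, 7, 2, 8, 5, 6, 2, 0, 1, 4, 3]
print(max_response(resp))  # (1, 9)
```

A tree keeps the comparison depth logarithmic in the number of directions, which suits a pipelined hardware implementation; a sequential scan would give the same result in software.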
Conclusions
•Stand-alone camera outputs only image data.
•Our design outputs not only image data, but directional template
responses as well.
•The cost of additional image processing is a latency on the order of
10^-4 seconds. This is a low cost considering that at 30
frames/sec, a new frame of image data arrives every
33 msec.
•The application for this project is Retinal Vascular Tracing, but the
same method can be applied to any problem that requires real-time
image processing.
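The latency comparison above reduces to dividing by the frame period. The sketch uses the order-of-magnitude values stated on the poster (10^-4 seconds for the FPGA design, about 100 msec for the latency quoted under Timing Issues) and a 30 frames/sec camera; the variable names are illustrative.

```python
# Express latencies as a fraction of one video frame at 30 frames/sec.
FRAMES_PER_SEC = 30
frame_period_s = 1 / FRAMES_PER_SEC  # about 33.3 msec between frames

hw_latency_s = 1e-4                  # order of magnitude reported for the FPGA design
quoted_latency_s = 100e-3            # "on the order of 100 msec" from the timing section

print(f"frame period        : {frame_period_s * 1e3:.1f} msec")
print(f"FPGA design latency : {hw_latency_s / frame_period_s:.1%} of a frame")
print(f"100 msec latency    : {quoted_latency_s / frame_period_s:.1f} frames")
```

About 0.3% of a frame is well inside a one-frame budget, whereas roughly 3 frames of delay would violate the requirement that the laser be shut off without even a one-frame delay.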