Acceleration of the Retinal Vascular Tracing Algorithm using FPGAs
Miriam Leeser, Shawn Miller

Objective
To accelerate retinal vascular tracing by implementing computation of template responses in reconfigurable hardware.

Smart Camera
Provides the host PC with image data along with image processing results at frame rate with low latency.

[Figure: block diagram of the FIREBIRD BOARD. Labels: Framegrabber (Dillon Eng.), Image Data, FPGA (Memory Switching Design, Direction Filter Design, Data Packing Design, BlockRAM), MEMORY 0, MEMORY 1, MEMORY 2, MEMORY 3, Results, PCI Bus, Host.]

Retinal Vascular Tracing Application
Goal: Detection and enhancement of the vascular structure of a patient’s retina from a live video feed.
Software on a general-purpose processor cannot meet the latency and throughput requirements of real-time image processing.

Timing Issues
Return results for each pixel at the frame rate of the camera.
Very Low Latency
• The surgical laser must be shut off immediately after detecting that it is aimed incorrectly
• Cannot tolerate a 1 frame delay (33 msec at 30 frames/sec)
• Complex memory management is required to achieve minimum latency
Latency is currently on the order of 100 µsec.

Storage of a 5x5 pixel image
Memory 0: 0 2 4 6 8 10 12 14 16 18 20 22 24
Memory 1: 1 3 5 7 9 11 13 15 17 19 21 23

[Figure: Direction Filter Design datapath. Labels: REG0 to REG23, TEMPLATE_A, TEMPLATE_B, POS, POS2, NEG, NEG2, PARTIAL RESPONSE, INTERCONNECTION, RESPONSE_A, RESPONSE_B, RESPONSE, gt (>).]
[Figure: Direction Templates, labeled 000, 001, 010, 011, 100, 101, 110, 111.]

What does the algorithm do?
• Retinal vascular tracing: detection of blood vessels in images of the retina
• The algorithm finds blood vessels and traces out their structure

Where is the algorithm used?
• Processing live video of the patient’s retina during laser retinal surgery
• Highlighting the vascular structure helps the surgeon avoid damage

Why do we need to accelerate it?
• Current implementation: software on a general-purpose processor
• Images are 512x512 pixels and need to be processed at frame rate

Implementation
Hardware Acceleration
• Template responses are calculated in hardware in parallel
• All pixels in the image are processed
• The camera is connected directly to the board (no host interaction)
• Only the results are sent to the host, after they become available and while new results are being calculated

Memory Management
Very Low Latency
• The surgical laser must be shut off immediately if we detect that it is aimed incorrectly
• Cannot tolerate a one frame delay
• A complex memory management scheme must be introduced
Problems
• Must continuously write new data from the camera to input memory while reading the data to be processed
• Cannot read and write one memory on the same clock cycle
• The image is stored row-wise but must be read column-wise
Solution
• Store the image in “checkerboard” fashion in two memories.
• Every other pixel is stored in a different memory
• A column of data is read by alternating between the two memories on every clock cycle

Checkerboard storage of a 5x5 image
Input Memory 0: 0 2 4 6 8 10 12 14 16 18 20 22 24
Input Memory 1: 1 3 5 7 9 11 13 15 17 19 21 23
[Figure: timing diagram of Memory 0 and Memory 1 against Clock and Time. Legend: Writing data from camera, Reading data to be processed, Inactive.]
[Figure: template comparator structure. Labels: TEMPLATE, TEMPLATE_COMPARATOR, +, -, < 0, RESPONSE, DIRECTION, RESULT.]

Badrinath Roysam, Charles Stewart, Ken Fritsche, Miriam Leeser, Shawn Miller

Reconfigurable Hardware
Firebird reconfigurable computing engine from Annapolis Micro Systems
• 1 Xilinx VIRTEX E (XCV2000E) FPGA
• 5 memory banks (4 x 64-bit, 1 x 32-bit)
• 5.4 Gbytes/sec of memory bandwidth
• 66 MHz/64-bit PCI interface to host

Results
Original Image: Each pixel is passed through the design unaltered.
Direction: The direction template with the maximum response for each pixel. The direction is represented by a value between 0 and 15.
Response: The maximum response that led to the direction decision for each pixel.

More Information
In proceedings: A. Can, H. Shen, J.N. Turner, H.L. Tanenbaum and B. Roysam, "Rapid Automated Tracing and Feature Extraction from Retinal Fundus Images Using Direct Exploratory Algorithms," IEEE Transactions on Information Technology in Biomedicine, June 1999.
On the web: http://www.ece.neu.edu/groups/rpl/projects/retinaltracing

This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC-9986821).

Conclusions
• A stand-alone camera outputs only image data.
• Our design outputs not only image data but directional template responses as well.
• The cost of additional image processing is a latency on the order of 10^-4 seconds.
This is a low cost considering that at 30 frames/sec a new frame of image data arrives every 33 msec.
• The application for this project is Retinal Vascular Tracing, but the same method can be applied to any problem that requires real-time image processing.
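
The checkerboard storage scheme described under Memory Management can be illustrated with a small software model. The following is only a sketch, not the hardware design: the function names `checkerboard_store` and `column_read_schedule` are hypothetical, and it assumes that pixel (r, c) is assigned to memory (r + c) mod 2 and that each memory keeps its pixels in row-major arrival order. For the 5x5 example it reproduces the split into even-numbered and odd-numbered pixels.

```python
def checkerboard_store(image):
    """Split a row-major 2D image into two memories in checkerboard fashion.

    Assumption (not stated on the poster): pixel (r, c) goes to memory
    (r + c) % 2, and each memory keeps pixels in row-major arrival order.
    """
    memories = ([], [])
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            memories[(r + c) % 2].append(pixel)
    return memories


def column_read_schedule(rows, col):
    """Return one (memory, pixel_row) pair per clock cycle for reading a column.

    Moving down a column changes r by 1 each cycle, so the memory index
    (r + col) % 2 alternates on every cycle.
    """
    return [((r + col) % 2, r) for r in range(rows)]


# 5x5 image whose pixel values are their row-major indices 0..24.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
mem0, mem1 = checkerboard_store(image)
schedule = column_read_schedule(5, 0)

print("Memory 0:", mem0)
print("Memory 1:", mem1)
print("Column 0 read order (memory, row):", schedule)
```

In this model both a row-wise write and a column-wise read alternate between the two memories on successive pixels, which is what would let a reader and a writer be phased so that they use different memories on a given clock cycle. The exact schedule used on the board is not recoverable from the poster text.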