OpenCV

How to Accelerate OpenCV Applications
with the Zynq-7000 All Programmable
SoC using Vivado HLS Video Libraries
August 28, 2013
© Copyright 2013 Xilinx
OpenCV Overview
Open Source Computer Vision (OpenCV) is widely used to
develop Computer Vision applications
– Library of 2500+ optimized video functions
– Optimized for desktop processors and GPUs
– Tens of thousands of users
– Runs out of the box on ARM processors in Zynq
However
– HD processing with OpenCV is often limited by external memory
– Memory bandwidth is a bottleneck for performance
– Memory accesses limit power efficiency
Zynq All Programmable SoCs are a great way of
implementing embedded computer vision applications
– High performance and Low Power
Real-Time Computer Vision Applications
Computer vision applications and their real-time analytics functions:
– Advanced Driver Assist for safety: lane or pedestrian detection
– Surveillance for security: friend vs. foe recognition
– Machine Vision for quality: high-velocity object detection
– Medical Imaging for non-invasive surgery: tumor detection
Real-time Video Analytics Processing
Pixel based: image processing and feature extraction
– Input resolutions: 4Kx2K, 1080p, 720p, 480p
– 100s Ops/pixel
– 8 MP x 100 Ops/frame = 100s Gops
Frame based: feature processing and decision making
– Features F1, F2, F3, …
– 10000s Ops/feature
– 1000s of features/sec = Mops
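The operation counts above can be reproduced with a small calculation. The sketch below is only an illustration: the function names are hypothetical, and the frame rate is an assumption because this slide does not state one.

```cpp
#include <cassert>
#include <cstdint>

// Pixel-domain workload: every pixel of every frame is processed.
inline std::uint64_t pixel_ops_per_second(std::uint64_t pixels_per_frame,
                                          std::uint64_t ops_per_pixel,
                                          std::uint64_t frames_per_second) {
    assert(pixels_per_frame > 0 && frames_per_second > 0);
    return pixels_per_frame * ops_per_pixel * frames_per_second;
}

// Feature-domain workload: only the extracted features are processed.
inline std::uint64_t feature_ops_per_second(std::uint64_t ops_per_feature,
                                            std::uint64_t features_per_second) {
    return ops_per_feature * features_per_second;
}
```

Reading 4Kx2K as 4096 x 2048 pixels and assuming 200 Ops/pixel at 60 frames per second, `pixel_ops_per_second(4096ULL * 2048ULL, 200, 60)` gives about 100 Gops, while `feature_ops_per_second(10000, 1000)` gives 10 Mops. The four orders of magnitude between the two are what motivates treating the two stages differently.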
Heterogeneous Implementation of Real-time
Video Analytics
Hardware Domain (FPGA), pixel based: image processing and feature extraction
– Input resolutions: 4Kx2K, 1080p, 720p, 480p
– 100s Ops/pixel
– 8 MP x 100 Ops/frame = 100s Gops
Software Domain (ARM), frame based: feature processing and decision making
– Features F1, F2, F3, …
– 10000s Ops/feature
– 1000s of features/sec = Mops
Xilinx Real-time Image Analytics
Implementation: Zynq All Programmable SoC
[Diagram: the pixel-based stage (image processing and feature extraction; 4Kx2K, 1080p, 720p, 480p; 100s Ops/pixel, 8 MP x 100 Ops/frame = 100s Gops) and the frame-based stage (feature processing and decision making; features F1, F2, F3, …; 10000s Ops/feature, 1000s of features/sec = Mops).]
Vivado: Productivity gains for OpenCV functions
– C simulation of HD video algorithm: ~1 fps
– RTL simulation of HD video: 1 frame per hour
– Real-time FPGA implementation: up to 60 fps
Accelerating OpenCV Applications
– Frame-level processing: library for PS
– Pixel processing: interfaces and basic functions for analytics (Vivado HLS)
Application areas shown: Driver Assist, Broadcast Monitor, HD Surveillance, Video Conferencing, Studio Cinema Camera, Digital Signage, Consumer Displays, Office-class MFP, Machine Vision, Cinema Projection, Medical Displays
Zynq Video TRD architecture
[Block diagram. Processing System: dual-core Cortex-A9, DDR memory controller, hardened peripherals, SD card; attached to DDR3 external memory. Outside the Processing System: AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller with HDMI. Legend: S_AXI_HP 64 bit, S_AXI_GP 32 bit, AXI4-Stream, IP core.]
Video access to external memory using 64-bit High Performance ports
Control register access using 32-bit General Purpose ports
Video streams implemented using AXI4-Stream
IP Centric Design flow
Accelerated IP Generation and Integration
C based IP Creation
User Preferred System Integration Environment
C, C++ or SystemC
System Generator for DSP
C Libraries
• Floating point math.h
• Fixed point
• Video
VHDL or Verilog
plus SW Drivers
Vivado IP Integrator
IP Subsystem
Xilinx IP
3rd Party IP
Vivado RTL Integration
User IP
Using OpenCV in FPGA designs
Pure OpenCV Application:
Image File Read (OpenCV) -> OpenCV function chain -> Image File Write (OpenCV)
Integrated OpenCV Application:
Live Video Input -> OpenCV function chain -> Live Video Output
Accelerated OpenCV Application (synthesized block):
Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output
OpenCV Reference (synthesizable block):
Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV)
Pure OpenCV Application
[Block diagram: Image File Read (OpenCV) -> OpenCV function chain -> Image File Write (OpenCV), drawn on the Zynq Video TRD architecture (Processing System with dual-core Cortex-A9, DDR memory controller, hardened peripherals and SD card; DDR3 external memory; AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller).]
Integrated OpenCV Application
[Block diagram: Live Video Input -> OpenCV function chain -> Live Video Output, drawn on the Zynq Video TRD architecture (Processing System with dual-core Cortex-A9, DDR memory controller, hardened peripherals and SD card; DDR3 external memory; AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller).]
OpenCV Reference / Software Execution
[Block diagram: Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV), drawn on the Zynq Video TRD architecture (Processing System with dual-core Cortex-A9, DDR memory controller, hardened peripherals and SD card; DDR3 external memory; AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller).]
OpenCV Reference / In system Test
[Block diagram: Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV), drawn on the Zynq Video TRD architecture (Processing System with dual-core Cortex-A9, DDR memory controller, hardened peripherals and SD card; DDR3 external memory; AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller).]
Accelerated OpenCV Application
[Block diagram: Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output, drawn on the Zynq Video TRD architecture (Processing System with dual-core Cortex-A9, DDR memory controller, hardened peripherals and SD card; DDR3 external memory; AXI Interconnect, AXI VDMA, HDMI video input, HLS-generated pipeline, Xylon display controller).]
OpenCV design flow
1) Develop OpenCV application on Desktop
2) Run OpenCV application on ARM cores without modification
3) Abstract FPGA portion using I/O functions
4) Replace OpenCV function calls with synthesizable code
5) Run HLS to generate FPGA accelerator
6) Replace call to synthesizable code with call to FPGA accelerator
[Diagram: an application made up of OpenCV Blocks A, B, C, and D.]
Partitioned OpenCV Application
[Diagram: OpenCV Block A -> opencv2AXIvideo -> AXIvideo2HLS -> HLS Block B -> HLS Block C -> HLS2AXIvideo -> AXIvideo2opencv -> OpenCV Block D. HLS Blocks B and C are shown alongside OpenCV Blocks B and C and are marked "Synthesizable"; the diagram also carries a "Synchronization" label.]
OpenCV Design Tradeoffs
OpenCV-based image processing is built around memory
frame buffers
– Poor access locality -> small caches perform poorly
– Complex architectures for performance -> higher power
– Likely ‘good enough’ for many applications
• Low resolution or framerate
• Processing of features or regions of interest in a larger image
Streaming architectures give high performance and low
power
– Chaining image processing functions reduces external memory
accesses
– Video-optimized line buffers and window buffers simpler than
processor caches
– Can be implemented with streaming optimizations in HLS
– Requires conversion of code to be synthesizable
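The line-buffer and window-buffer idea can be sketched in plain C++ without any HLS headers. The code below is an illustration only: the name `box3x3_stream`, the template parameter `W`, and the choice of a 3x3 box filter are assumptions, not part of the HLS video library. Pixels are consumed once in raster order, and only two previous lines plus a 3x3 window are stored.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical streaming 3x3 box filter over an image of width W.
// State is two image lines and a 3x3 window, never a whole frame.
template <int W>
std::vector<std::uint8_t> box3x3_stream(const std::vector<std::uint8_t>& in, int rows) {
    assert(static_cast<int>(in.size()) == rows * W);
    std::array<std::array<std::uint8_t, W>, 2> lines{};  // line buffer (rows r-2, r-1)
    std::array<std::array<std::uint8_t, 3>, 3> win{};    // window buffer
    std::vector<std::uint8_t> out(in.size(), 0);

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < W; ++c) {
            const std::uint8_t pix = in[r * W + c];       // single read per pixel

            // Shift the window one column to the left.
            for (int i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            // Fill the new right-hand column from the line buffer and input.
            win[0][2] = lines[0][c];
            win[1][2] = lines[1][c];
            win[2][2] = pix;

            // Age the line buffer for this column.
            lines[0][c] = lines[1][c];
            lines[1][c] = pix;

            // The window is valid once three rows and three columns are in.
            if (r >= 2 && c >= 2) {
                int sum = 0;
                for (int i = 0; i < 3; ++i)
                    for (int j = 0; j < 3; ++j)
                        sum += win[i][j];
                // Result belongs to the window centre; border pixels stay 0.
                out[(r - 1) * W + (c - 1)] = static_cast<std::uint8_t>(sum / 9);
            }
        }
    }
    return out;
}
```

For a 1920-pixel-wide image the state is roughly two lines of pixels, which fits in on-chip memory, whereas a frame-buffer implementation would read and write the full frame in external memory for each function in the chain.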
HLS Video Libraries
OpenCV functions are not directly synthesizable with HLS
– Dynamic memory allocation
– Floating point
– Assumes images are modified in external memory
The HLS video library is intended to replace many basic
OpenCV functions
– Similar interfaces and algorithms to OpenCV
– Focus on image processing functions implemented in FPGA fabric
– Includes FPGA-specific optimizations
• Fixed point operations instead of floating point
• On-chip Linebuffers and window buffers
– Not necessarily bit-accurate
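A small plain C++ sketch shows why a fixed-point replacement may not be bit-accurate. The grayscale conversion, its weights, and the function names below are chosen for illustration only and are not taken from the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Floating-point reference: luma weights 0.299/0.587/0.114, rounded to nearest.
inline std::uint8_t gray_float(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::lround(0.299 * r + 0.587 * g + 0.114 * b));
}

// Fixed-point version: weights scaled by 256 (77 + 150 + 29 = 256);
// adding 128 rounds to nearest before the shift. Integer operations only.
inline std::uint8_t gray_fixed(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b + 128u) >> 8);
}

// Largest absolute difference between the two versions on a sampled RGB grid.
inline int max_gray_mismatch(int step = 5) {
    assert(step > 0);
    int worst = 0;
    for (int r = 0; r < 256; r += step)
        for (int g = 0; g < 256; g += step)
            for (int b = 0; b < 256; b += step) {
                const int d = std::abs(static_cast<int>(gray_float(r, g, b)) -
                                       static_cast<int>(gray_fixed(r, g, b)));
                if (d > worst) worst = d;
            }
    return worst;
}
```

For pure red (255, 0, 0) the floating-point version yields 76 and the fixed-point version yields 77, so comparisons against an OpenCV reference generally need a tolerance rather than an exact match.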
Xilinx HLS Video Library 2013.2
AXI4-Stream IO Functions
– AXIvideo2Mat, Mat2AXIvideo
Video Data Modeling
– Linebuffer class
– Window class
OpenCV Interface Functions
– cvMat2AXIvideo, AXIvideo2cvMat
– IplImage2AXIvideo, AXIvideo2IplImage
– CvMat2AXIvideo, AXIvideo2CvMat
– cvMat2hlsMat, hlsMat2cvMat
– IplImage2hlsMat, hlsMat2IplImage
– CvMat2hlsMat, hlsMat2CvMat
Video Functions
– AbsDiff, AddS, AddWeighted, And, Avg, AvgSdv, Cmp, CmpS, CornerHarris, CvtColor, Dilate
– Duplicate, EqualizeHist, Erode, FASTX, Filter2D, GaussianBlur, Harris, HoughLines2, Integral, InitUndistortRectifyMap, Max
– MaxS, Mean, Merge, Min, MinMaxLoc, MinS, Mul, Not, PaintMask, Range, Reduce
– Remap, Resize, Scale, Set, Sobel, Split, SubRS, SubS, Sum, Threshold, Zero
For function signatures and descriptions, see the HLS user guide UG902
Video Library Functions
C++ code contained in hls namespace. #include "hls_video.h"
Similar interface, equivalent behavior with OpenCV, e.g.
– OpenCV library:
cvScale(src, dst, scale, shift);
– HLS video library:
hls::Scale<...>(src, dst, scale, shift);
Some constructor arguments have corresponding or replacement
template parameters, e.g.
– OpenCV library:
cv::Mat mat(rows, cols, CV_8UC3);
– HLS video library:
hls::Mat<ROWS, COLS, HLS_8UC3> mat(rows, cols);
ROWS and COLS specify the maximum size of an image processed
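The split between compile-time maximum size and run-time actual size can be sketched in plain C++. `BoundedImage` below is a hypothetical stand-in written for this illustration; it is not `hls::Mat` and does not model its pixel storage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: template parameters fix the largest image the
// hardware must handle, the constructor receives the size actually in use.
template <int MAX_ROWS, int MAX_COLS>
struct BoundedImage {
    int rows;
    int cols;

    BoundedImage(int r, int c) : rows(r), cols(c) {
        // The run-time size may be smaller than the bound, never larger.
        assert(r > 0 && r <= MAX_ROWS);
        assert(c > 0 && c <= MAX_COLS);
    }

    // Worst-case pixel count, known at compile time (sizes buffers).
    static constexpr std::size_t max_pixels() {
        return static_cast<std::size_t>(MAX_ROWS) * MAX_COLS;
    }

    // Pixel count of the frame currently being processed.
    std::size_t pixels() const {
        return static_cast<std::size_t>(rows) * cols;
    }
};
```

With this pattern, `BoundedImage<1080, 1920> img(720, 1280);` describes a 720p frame processed by a design whose buffers are sized for 1080p. Fixing the bound at compile time is what lets a synthesis tool allocate storage statically, since dynamic allocation is not available.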
Video Library Core Structures
Corresponding types (OpenCV -> HLS Video Library):
– cv::Point_<T>, CvPoint -> hls::Point_<T>, hls::Point
– cv::Size_<T>, CvSize -> hls::Size_<T>, hls::Size
– cv::Rect_<T>, CvRect -> hls::Rect_<T>, hls::Rect
– cv::Scalar_<T>, CvScalar -> hls::Scalar<N, T>
– cv::Mat, IplImage, CvMat -> hls::Mat<ROWS, COLS, T>
Corresponding declarations:
– OpenCV: cv::Mat mat(rows, cols, CV_8UC3);
  HLS: hls::Mat<ROWS, COLS, HLS_8UC3> mat(rows, cols);
– OpenCV: IplImage* img = cvCreateImage(cvSize(cols,rows), IPL_DEPTH_8U, 3);
  HLS: hls::Mat<ROWS, COLS, HLS_8UC3> img(rows, cols);
  or: hls::Mat<ROWS, COLS, HLS_8UC3> img;
HLS Video Library only (no OpenCV counterpart listed):
– hls::Window<ROWS, COLS, T>
– hls::LineBuffer<ROWS, COLS, T>
Limitations
Must replace OpenCV calls with video library functions
Frame buffer access not supported through pointers
– use VDMA and AXI Stream adapter functions
Random access not supported
– data read more than once must be duplicated
– see hls::Duplicate()
In-place update not supported
– e.g. cvRectangle (img, point1, point2)
Read operation
– OpenCV: pix = cv_mat.at<T>(i,j) or pix = cvGet2D(cv_img,i,j)
– HLS Video Library: hls_img >> pix
Write operation
– OpenCV: cv_mat.at<T>(i,j) = pix or cvSet2D(cv_img,i,j,pix)
– HLS Video Library: hls_img << pix
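The reason data read more than once must be duplicated can be seen with a simple FIFO model. The sketch below uses `std::queue` in plain C++; `PixelStream` and `duplicate` are hypothetical names for this illustration and only mimic the `>>` and `<<` style of access, not the library's actual classes.

```cpp
#include <cassert>
#include <queue>

// Hypothetical FIFO model of a pixel stream: a read removes the element,
// so pixel (i, j) cannot be revisited later.
template <typename T>
struct PixelStream {
    std::queue<T> fifo;

    PixelStream& operator<<(const T& v) {  // write: append one pixel
        fifo.push(v);
        return *this;
    }
    PixelStream& operator>>(T& v) {        // read: consume the oldest pixel
        assert(!fifo.empty());
        v = fifo.front();
        fifo.pop();
        return *this;
    }
    bool empty() const { return fifo.empty(); }
};

// Fan one stream out into two so that two consumers each see every pixel.
template <typename T>
void duplicate(PixelStream<T>& src, PixelStream<T>& dst0, PixelStream<T>& dst1) {
    while (!src.empty()) {
        T pix;
        src >> pix;
        dst0 << pix;
        dst1 << pix;
    }
}
```

After `duplicate(src, a, b)` the source is empty and each of `a` and `b` delivers the full pixel sequence once, which is the role the slide assigns to hls::Duplicate().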
OpenCV Code
One image input, one image output
– Processed by chain of functions sequentially
…
IplImage* src=cvLoadImage("test_1080p.bmp");
IplImage* dst=cvCreateImage(cvGetSize(src),
src->depth, src->nChannels);
cvSobel(src, dst, 1, 0);
cvSubS(dst, cvScalar(100,100,100), src);
cvScale(src, dst, 2, 0);
cvErode(dst, src);
cvDilate(src, dst);
cvSaveImage("result_1080p.bmp", dst);
cvReleaseImage(&src);
cvReleaseImage(&dst);
…
test_opencv.cpp
[Diagram: Image Read (OpenCV) -> OpenCV function chain -> Image Write (OpenCV)]
Integrated OpenCV Application
System provides pointer to frame buffers
Synthesizable code can also be run on ARM
void img_process(ZNQ_S32 *rgb_data_in, ZNQ_S32 *rgb_data_out,
                 int height, int width, int stride, int flag_OpenCV) {
    // constructing OpenCV interface
    IplImage* src_dma =
        cvCreateImageHeader(cvSize(width, height), IPL_DEPTH_8U, 4);
    IplImage* dst_dma =
        cvCreateImageHeader(cvSize(width, height), IPL_DEPTH_8U, 4);
    src_dma->imageData = (char*)rgb_data_in;
    dst_dma->imageData = (char*)rgb_data_out;
    src_dma->widthStep = 4 * stride;
    dst_dma->widthStep = 4 * stride;
    if (flag_OpenCV) {
        opencv_image_filter(src_dma, dst_dma);
    } else {
        sw_image_filter(src_dma, dst_dma);
    }
    cvReleaseImageHeader(&src_dma);
    cvReleaseImageHeader(&dst_dma);
}
img_filters.c
[Diagram: Live Video Input -> OpenCV function chain -> Live Video Output]
Accelerated with Vivado HLS video library
Top level function extracted for HW acceleration
top.h
#include "hls_video.h" // header file of HLS video library
#include "hls_opencv.h" // header file of OpenCV I/O
// typedef video library core structures
typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;
typedef hls::Scalar<3, uchar> RGB_PIXEL;
typedef hls::Mat<1080,1920,HLS_8UC3> RGB_IMAGE;
void image_filter(AXI_STREAM& src_axi, AXI_STREAM& dst_axi,
                  int rows, int cols);
test.cpp
#include "top.h"
…
IplImage* src=cvLoadImage("test_1080p.bmp");
IplImage* dst=cvCreateImage(cvGetSize(src),
                            src->depth, src->nChannels);
AXI_STREAM src_axi, dst_axi;
IplImage2AXIvideo(src, src_axi);
image_filter(src_axi, dst_axi, src->height, src->width);
AXIvideo2IplImage(dst_axi, dst);
cvSaveImage("result_1080p.bmp", dst);
cvReleaseImage(&src);
cvReleaseImage(&dst);
[Diagram: Image Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image Write (OpenCV)]
Accelerated with Vivado HLS video library
HW Synthesizable Block for FPGA acceleration
– Consist of video library function and interfaces
– Replace OpenCV function with similar function in hls namespace
void image_filter(AXI_STREAM& input, AXI_STREAM& output, int rows, int cols) {
    //Create AXI streaming interfaces for the core
#pragma HLS RESOURCE variable=input core=AXIS metadata="-bus_bundle INPUT_STREAM"
#pragma HLS RESOURCE variable=output core=AXIS metadata="-bus_bundle OUTPUT_STREAM"
#pragma HLS RESOURCE variable=rows core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS RESOURCE variable=cols core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS RESOURCE variable=return core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols
    RGB_IMAGE img_0(rows, cols), img_1(rows, cols), img_2(rows, cols);
    RGB_IMAGE img_3(rows, cols), img_4(rows, cols), img_5(rows, cols);
    RGB_PIXEL pix(50, 50, 50);
#pragma HLS dataflow
    hls::AXIvideo2Mat(input, img_0);
    hls::Sobel<1,0,3>(img_0, img_1);
    hls::SubS(img_1, pix, img_2);
    hls::Scale(img_2, img_3, 2, 0);
    hls::Erode(img_3, img_4);
    hls::Dilate(img_4, img_5);
    hls::Mat2AXIvideo(img_5, output);
}
top.cpp
Using Linux Userspace API
Modify device tree to include register map
FILTER@0x400D0000 {
    compatible = "xlnx,generic-hls";
    reg = <0x400d0000 0xffff>;
    interrupts = <0x0 0x37 0x4>;
    interrupt-parent = <0x1>;
};
Call from userspace after mmap()
XImage_filter xsfilter;
int fd_uio = 0;
if ((fd_uio = open("/dev/uio0", O_RDWR)) < 0) {
    printf("UIO: Cannot open device node\n");
}
xsfilter.Control_bus_BaseAddress =
    (u32)mmap(NULL, XSOBEL_FILTER_CONTROL_BUS_SIZE,
              PROT_READ|PROT_WRITE, MAP_SHARED, fd_uio, 0);
xsfilter.IsReady = XIL_COMPONENT_IS_READY;
// init the configuration for image filter
XImage_filter_SetRows(&xsfilter, sobel_configuration.height);
XImage_filter_SetCols(&xsfilter, sobel_configuration.width);
XImage_filter_EnableAutoRestart(&xsfilter);
XImage_filter_Start(&xsfilter);
[Diagram: Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output]
HLS Directives for Video Processing
Assign 'input' to be an AXI4 stream named "INPUT_STREAM"
#pragma HLS RESOURCE variable=input core=AXIS metadata="-bus_bundle INPUT_STREAM"
Assign control interface to an AXI4-Lite interface
#pragma HLS RESOURCE variable=return core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
Assign 'rows' to be accessible through the AXI4-Lite interface
#pragma HLS RESOURCE variable=rows core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
Declare that 'rows' will not be changed during the execution of
the function
#pragma HLS INTERFACE ap_stable port=rows
Enable streaming dataflow optimizations
#pragma HLS dataflow
A more complex OpenCV example: fast-corners
This code is not 'streaming' and must be rewritten
– Random access and in-place operation on 'dst'
void opencv_image_filter(IplImage* img, IplImage* dst ) {
IplImage* gray = cvCreateImage(cvSize(img->width,img->height), 8, 1 );
cvCvtColor( img, gray, CV_BGR2GRAY );
std::vector<cv::KeyPoint> keypoints;
cv::Mat gray_mat(gray,0);
cv::FAST(gray_mat, keypoints, 20,true );
int rect=2;
cvCopy(img,dst);
for (int i=0; i<keypoints.size(); i++) {
cvRectangle(dst,
cvPoint(keypoints[i].pt.x,keypoints[i].pt.y),
cvPoint(keypoints[i].pt.x+rect,keypoints[i].pt.y+rect),
cvScalar(255,0,0),1);
}
cvReleaseImage( &gray );
}
opencv_top.cpp
A more complex OpenCV example: fast-corners
This code is 'streaming'
– Note that function correspondence is not 1:1!
void opencv_image_filter(IplImage* src, IplImage* dst)
{
    IplImage* gray = cvCreateImage( cvGetSize(src), 8, 1 );
    IplImage* mask = cvCreateImage( cvGetSize(src), 8, 1 );
    IplImage* dmask = cvCreateImage( cvGetSize(src), 8, 1 );
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat gray_mat(gray,0);
    cvCvtColor(src, gray, CV_BGR2GRAY );
    cv::FAST(gray_mat, keypoints, 20, true);
    GenMask(mask, keypoints);                // cv::FAST + GenMask -> hls::FASTX
    cvDilate(mask,dmask);
    cvCopy(src,dst);
    PrintMask(dst,dmask,cvScalar(255,0,0));  // cvCopy + PrintMask -> hls::PaintMask
    cvReleaseImage( &mask );
    cvReleaseImage( &dmask );
    cvReleaseImage( &gray );
}
opencv_top.cpp
A more complex OpenCV example: fast-corners
Synthesizable code
– Note '#pragma HLS stream'
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _src(rows,cols);
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _dst(rows,cols);
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src0(rows,cols);
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src1(rows,cols);
#pragma HLS stream depth=20000 variable=src1.data_stream
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> mask(rows,cols);
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> dmask(rows,cols);
hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> gray(rows,cols);
hls::Scalar<3,unsigned char> color(255,0,0);
hls::AXIvideo2Mat(input, _src);
hls::Duplicate(_src,src0,src1);
hls::CvtColor<HLS_BGR2GRAY>(src0,gray);
hls::FASTX(gray,mask,20,true);
hls::Dilate(mask,dmask);
hls::PaintMask(src1,dmask,_dst,color);
hls::Mat2AXIvideo(_dst, output);
top.cpp
Streams and Reconvergent paths
hls::Mat conceptually represents a whole image, but is
implemented as a stream of pixels
template<int ROWS, int COLS, int T> class Mat {
public:
HLS_SIZE_T rows, cols;
hls::stream<HLS_TNAME(T)> data_stream[HLS_MAT_CN(T)];
};
hls_video_core.h
Fast-corners contains a reconvergent path
– The stream of pixels for src1 must include enough buffering to match
the delay through FASTX and Dilate (approximately 10 video lines *
1920 pixels)
[Diagram: one path runs CvtColor -> FASTX -> Dilate -> PaintMask; src1 bypasses these stages and feeds PaintMask directly.]
#pragma HLS stream depth=20000 variable=src1.data_stream
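The depth in the pragma can be checked against the delay estimate with a short calculation. The helper names below are hypothetical, and the 10-line figure is the approximate value quoted on the slide rather than a measured delay.

```cpp
#include <cassert>

// Pixels that pile up on the bypass path while the processing path is
// still delayed by a given number of video lines.
inline int bypass_fifo_depth(int delay_lines, int pixels_per_line) {
    assert(delay_lines >= 0 && pixels_per_line > 0);
    return delay_lines * pixels_per_line;
}

// True when the configured stream depth covers the estimated requirement.
inline bool depth_is_sufficient(int configured_depth, int delay_lines,
                                int pixels_per_line) {
    return configured_depth >= bypass_fifo_depth(delay_lines, pixels_per_line);
}
```

With 10 lines of 1920 pixels the estimate is 19200 pixels, so the configured depth of 20000 leaves a margin of 800. If the bypass FIFO were too shallow, the producer feeding both paths would stall before the processing path produced its first mask pixel, and the pipeline would deadlock.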
Performance Analysis
AXI Performance Monitor collects statistics on memory
bandwidth
– see /mnt/AXI_PerfMon.log
Video + fast corners
– 1920*1080*60*32 = ~4 Gb/s per stream
– HP0: Read 4.01 Gb/s, Write 4.01 Gb/s, Total 8.03 Gb/s
– HP2: Read 4.01 Gb/s, Write 4.01 Gb/s, Total 8.03 Gb/s
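The per-stream figure follows directly from the video format. The function name in this sketch is hypothetical; the inputs are the 1080p60, 32-bit-per-pixel values from the slide.

```cpp
#include <cassert>
#include <cstdint>

// Raw bit rate of one uncompressed video stream, in bits per second.
inline std::uint64_t stream_bits_per_second(std::uint64_t width,
                                            std::uint64_t height,
                                            std::uint64_t frames_per_second,
                                            std::uint64_t bits_per_pixel) {
    assert(width > 0 && height > 0);
    return width * height * frames_per_second * bits_per_pixel;
}
```

`stream_bits_per_second(1920, 1080, 60, 32)` evaluates to 3,981,312,000 bits per second, about 3.98 Gb/s. One read stream plus one write stream per port gives about 7.96 Gb/s, close to the logged totals of 8.03 Gb/s.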
Power Analysis
Voltage and Current can be read from the digital power
regulators on the ZC702 board.
Custom, realtime HD video processing in 2-3 Watts total system
power
– FASTX is less than 200 mW incremental power
[Stacked bar chart: power on an axis from 0 to 3000 (units not shown in the extracted text), broken down into DDR, PL IO, PL core, PS IO, and PS core, for the scenarios Active Idle, Idle + Video, and Fast Corners + video.]
HLS and Zynq accelerate OpenCV apps
OpenCV functions enable fast prototyping of Computer
Vision algorithms
Computer Vision applications are inherently heterogeneous
and require a mix of HW and SW implementation
Vivado HLS video library accelerates mapping of OpenCV
functions to FPGA programmable fabric
Zynq offers a power-optimized integrated solution with high
performance programmable logic and embedded ARM
Additional OpenCV Collateral at Xilinx.com
Download XAPP1167 from
Xilinx.com
QuickTake: Leveraging
OpenCV and High-Level
Synthesis with Vivado
http://www.xilinx.com/hls
http://www.xilinx.com/getlicense