Final Presentation - High Speed Digital Systems Lab

Yaron Doweck
Yael Einziger
Supervisor: Mike Sumszyk
Spring 2011
Semester Project
* Learn to use the new TI C6678 multi-core
platform and to exploit its abilities and
advantages.
* Implement a real-time tracking algorithm
using multi-core programming and VLIB.
* Create a framework for multi-core,
Ethernet video streaming and DSP-FPGA
communication.
1. Keystone Architecture
2. SYS/BIOS
3. VLIB
4. Tracking Algorithm
5. Tracking System
6. Performance Analysis
7. Encountered Difficulties
8. Future Projects
* In the first part of the project, the main
goal was learning:
* The C6678 platform
* TI development environments
* The Multicore SDK (MCSDK)
* The SYS/BIOS real-time OS
[Figure: Keystone architecture block diagram, with the following callouts]
* C66 DSP core: up to 20 GFLOPS @ 1.25GHz
* L1P: 32KB cache/SRAM, 32000MB/s
* L2: 512KB cache/SRAM, 16000MB/s
* Shared memory: 4 access ports, each up to 16000MB/s
* DDR3: up to 10666MB/s
* TeraNet switch fabric: up to 256GB/s
Core               C64x           C67x            C66x
Data type          Fixed-Point    Floating-Point  Fixed and Floating-Point
Speed              720-1000 MHz   250-350 MHz     1-1.25 GHz
GMAC/core          0.88           0.5             40
GFLOPS/core        -              2.4             20
L1 Cache           32KB           32KB            32KB L1P + 32KB L1D
L2 Cache           1MB            -               512KB + 4MB shared
Power Consumption  3.7W@700MHz    1.5W@300MHz     10W@1GHz
                   (1 core)       (1 core)        (8 cores)
2. SYS/BIOS
* SYS/BIOS is an advanced, lightweight real-time operating
system from Texas Instruments.
* It is designed for use in embedded applications that
need real-time scheduling and synchronization.
* SYS/BIOS is delivered as a set of pre-compiled
packages that provide the modules that make up
the OS.
* Each module can be loaded and configured
separately (only the selected modules are loaded,
making the OS as light as possible).
Main SYS/BIOS modules used in the project:
* BIOS – Manages the OS.
* Task – Creating and managing threads.
* HWI – Hardware interrupts.
* Semaphore – Creating and managing semaphores.
* IPC – Inter-processor communication.
* Timestamp – Provides a timestamp service for
performance analysis.
3. VLIB
* VLIB is an extensible library of more than 40
software kernels that are optimized for TI's C64x+
digital signal processor (DSP) core.
* These kernels perform background modeling and
subtraction, object feature extraction, tracking,
recognition and low-level pixel processing to
provide a foundation for video analytics
application development.
* TI also provides developers with a bit-exact
version of the library for testing and debugging in
a PC (Windows) environment.
* The VLIB version used in this project is an unofficial
release compiled with C66x support, obtained from
TI’s Video Surveillance team (VLIB’s developers).
4. Tracking Algorithm
Tracking algorithm stages:
1. Statistical Background Subtraction
2. Connected Components Labeling
3. Tracking
* Unlike moving objects, the background of the
image doesn’t change.
* However, there are still small variations
over time due to illumination changes, camera
noise, trees, etc.
* Hence, by studying the variation over time of
each pixel, we can deduce whether it belongs to a
moving object or to the background.
* API: VLIB_subtractBackgroundS16
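The per-pixel idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VLIB's implementation and not the signature of VLIB_subtractBackgroundS16: the function names, the threshold factor `k` and the learning rate `alpha` are hypothetical, and each pixel is modeled by a running mean and variance.

```python
def subtract_background(frame, mean, var, k=2.5):
    """Return a binary mask: 1 where a pixel deviates from its background model."""
    mask = []
    for pixel, m, v in zip(frame, mean, var):
        # Foreground if the squared deviation exceeds k^2 times the variance.
        mask.append(1 if (pixel - m) ** 2 > (k * k) * v else 0)
    return mask


def update_model(frame, mean, var, mask, alpha=0.05):
    """Exponentially update mean and variance, for background pixels only."""
    for i, pixel in enumerate(frame):
        if mask[i] == 0:
            diff = pixel - mean[i]
            mean[i] += alpha * diff
            var[i] = (1 - alpha) * var[i] + alpha * diff * diff


# Four pixels with a learned background around 100 (hypothetical values).
mean = [100.0, 100.0, 100.0, 100.0]
var = [4.0, 4.0, 4.0, 4.0]
frame = [101, 99, 180, 100]  # the third pixel is covered by a moving object

mask = subtract_background(frame, mean, var)
update_model(frame, mean, var, mask)
print(mask)
```

Only pixels classified as background update the model, so a moving object does not get absorbed into the background immediately.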
* Connected components labeling groups foreground pixels that have other
foreground pixels as 8-connected neighbors, and
labels each discrete group as a component.
* Once this is done, component properties can be
measured and used to extract foreground
information. These properties include bounding box,
centroid and area.
* API: VLIB_createConnectedComponentsList
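As a rough illustration of 8-connected labeling and of the measured properties, the following Python sketch uses a breadth-first flood fill. It is an assumption-laden example, not the algorithm used inside VLIB_createConnectedComponentsList; the function name and the output format are hypothetical.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected foreground regions; return one property dict per component."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] == 0 or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unlabeled foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and image[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "area": len(pixels),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    return components


image = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
components = connected_components(image)
print(components)
```

The two diagonal pixels in the top-left corner form a single component only because diagonal neighbors count under 8-connectivity; with 4-connectivity they would be two components.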
[Figure: binary foreground image and its connected components labeling]
* Tracker association is done by matching each
component to the closest tracker from the previous
image.
* If no existing tracker is close enough, a new
tracker is associated with the component.
* After all components are associated, any remaining
unmatched trackers are discarded.
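The association rules above can be sketched as a greedy nearest-neighbor match. The Python below is a minimal illustration, not the project's code: the function name, the `max_distance` threshold and the rule that a tracker is matched to at most one component are assumptions.

```python
import math


def associate(trackers, centroids, max_distance, next_id):
    """Match each component centroid to the closest tracker of the previous image.

    trackers: dict of tracker id -> (x, y) position in the previous image.
    Returns the updated tracker dict and the next unused tracker id.
    """
    unused = dict(trackers)
    updated = {}
    for cx, cy in centroids:
        best_id, best_dist = None, max_distance
        for tid, (tx, ty) in unused.items():
            dist = math.hypot(cx - tx, cy - ty)
            if dist <= best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:
            # No existing tracker is close enough: start a new one.
            best_id = next_id
            next_id += 1
        else:
            del unused[best_id]
        updated[best_id] = (cx, cy)
    # Trackers still in `unused` matched no component and are discarded.
    return updated, next_id


previous = {0: (10, 10), 1: (50, 50)}
current, next_id = associate(previous, [(12, 11), (90, 90)], max_distance=15, next_id=2)
print(current, next_id)
```

In this example tracker 0 follows the nearby component, the distant component gets a new tracker with id 2, and tracker 1 is dropped because nothing matched it.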
5. Tracking System
* The developed tracking system demonstrates
multicore programming on the DSP using its
powerful features:
* Network coprocessor.
* EDMA engine.
* Multicore Navigator.
* Synchronization modules.
* Event-driven operations.
[Figure: tracking system block diagram; blocks (1) and (4) are marked with *]
* Blocks (1) and (4) were implemented using
OpenCV 2.3 with 2 separate threads.
[Figure: data flow on the DSP. Blocks: Ethernet Controller, Packet DMA, DDR3 / Shared Memory double buffer, Cache Controller, L1 Cache, Processing (CORE 0), Processing (CORE 1). CORE 0 sends CORE 1 a message to notify that the foreground image is ready.]
[Figure: data held in DDR3 and shared memory]
* Image at T and image at T-1
* Background model: mean and variance
* Queue of binary foreground images
* List of trackers
Each block is event driven. That is, it wakes
up only when a specific event happens.
* CORE 0: an EDMA interrupt triggers Statistical
Background Subtraction.
* CORE 1: a multicore message triggers Connected
Component and Tracking.
* Synchronization: semaphores and the multicore
messaging service.
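The event-driven structure can be imitated on a PC with two threads. The Python sketch below is an analogy only and does not use SYS/BIOS or IPC APIs: a `threading.Semaphore` stands in for the semaphore posted from the EDMA interrupt, a `queue.Queue` stands in for the multicore messaging service, and the processing steps are placeholders. All names are hypothetical.

```python
import queue
import threading

frame_ready = threading.Semaphore(0)  # posted by the stand-in for the EDMA interrupt
frames = queue.Queue()                # stand-in for the shared-memory double buffer
messages = queue.Queue()              # stand-in for the multicore messaging service
results = []
STOP = None                           # sentinel used only to end this demo


def core0():
    """Wakes on the semaphore, produces a foreground image, notifies core 1."""
    while True:
        frame_ready.acquire()         # sleep until a frame has arrived
        frame = frames.get()
        if frame is STOP:
            messages.put(STOP)
            return
        foreground = [1 if p > 128 else 0 for p in frame]  # placeholder processing
        messages.put(foreground)      # "foreground image is ready"


def core1():
    """Wakes on a message; would run connected components and tracking."""
    while True:
        foreground = messages.get()   # sleep until core 0 sends a message
        if foreground is STOP:
            return
        results.append(sum(foreground))  # placeholder: count foreground pixels


def frame_arrived(frame):
    """Stand-in for the interrupt handler: store the frame, post the semaphore."""
    frames.put(frame)
    frame_ready.release()


workers = [threading.Thread(target=core0), threading.Thread(target=core1)]
for w in workers:
    w.start()
for f in ([0, 200, 255], [10, 20, 30], STOP):
    frame_arrived(f)
for w in workers:
    w.join()
print(results)
```

Neither thread polls: each one blocks until its own event arrives, which is the behavior the slide describes for the blocks on CORE 0 and CORE 1.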
6. Performance Analysis
Ethernet: PCDSP
10MB/s > 34FPS
Ethernet
Controller
Packet DMA
Shared Memory
Double Buffer
DDR3
DDR3 memory
throughput:
10666MB/s via
EDMA3.
SRAM
(local\shared)
memory throughput:
16000MB/s, direct
access.
Processing
Cache Controller
L1 Cache
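The Ethernet figure can be related to frame rate with simple arithmetic. The Python sketch below assumes 8-bit (1 byte per pixel) frames and 1MB = 2**20 bytes; neither assumption is stated on the slide, and the function name is hypothetical. Under these assumptions, the 34FPS figure corresponds to the 480x640 frame size.

```python
def max_fps(bandwidth_bytes_per_s, rows, cols, bytes_per_pixel=1):
    """Upper bound on the frame rate a link of the given bandwidth can carry."""
    return bandwidth_bytes_per_s / (rows * cols * bytes_per_pixel)


link = 10 * 2**20  # 10MB/s, taking 1MB = 2**20 bytes (assumption)
for rows, cols in ((120, 160), (240, 320), (480, 640)):
    print(f"{rows}x{cols}: {max_fps(link, rows, cols):.1f} FPS")
# prints:
# 120x160: 546.1 FPS
# 240x320: 136.5 FPS
# 480x640: 34.1 FPS
```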
* Webcam: Up to 30FPS. Frame size 120x160, 240x320 or
480x640.
* Shared SRAM size: L2 double buffering requires
2Bytes/pixel. Gaussian model (for background subtraction)
requires 4Bytes/pixel. Largest frame size possible:
240x320.
* By processing only a part of the image at a time,
the size of the double buffer can be significantly
reduced, allowing a larger frame size.
* In conclusion, the system can process frame sizes of
120x160 or 240x320 at up to 30FPS.
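The memory requirement per frame size follows directly from the 2 + 4 bytes per pixel quoted above. The Python sketch below compares it against a 512KB budget; that budget is an assumption (it matches the per-core L2 size quoted elsewhere in this presentation), since the slide does not state which memory budget was applied.

```python
BYTES_PER_PIXEL = 2 + 4   # double buffer + Gaussian model, as quoted in the text
BUDGET = 512 * 1024       # assumed budget in bytes (512KB); not stated on the slide


def footprint(rows, cols):
    """Bytes needed for one frame size under the per-pixel costs above."""
    return rows * cols * BYTES_PER_PIXEL


for rows, cols in ((120, 160), (240, 320), (480, 640)):
    size = footprint(rows, cols)
    verdict = "fits" if size <= BUDGET else "too large"
    print(f"{rows}x{cols}: {size} bytes, {verdict}")
# prints:
# 120x160: 115200 bytes, fits
# 240x320: 460800 bytes, fits
# 480x640: 1843200 bytes, too large
```

Under this assumed budget, 240x320 is the largest of the three webcam frame sizes that fits, which agrees with the limit stated above.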
[Figure: shared memory contents]
* Image at T (char) and image at T-1 (char).
* Background model: mean (short) and variance (short).
* Queue of binary foreground images.
* List of connected components.
Memory can be significantly reduced by processing a
part of the frame at a time, but the implementation of the
Connected Component algorithm will be more complicated.
* VLIB is optimized for the TI C64x+ DSP.
* As a part of the performance analysis, VLIB’s
performance on the C66 core and on the C64x+ core
was compared:

Name                  Timing on C64x+  Data in L2,  Data in L2,  Data in L1
                                       L1 is SRAM   L1 is cache
Subtract Background   1.1              4.2          1.59         1.02
Update Variance       1.3              5.1          1.9          1.26
Update Mean           1.0              4.19         1.3          1.01
Erosion               0.29             0.98         0.2          0.2
Dilation              0.27             0.98         0.2          0.2
Connected Components  1.1              2.3          1.07         1.5*

*Since Connected Components requires a lot of memory, the image
was located in L1 but the Connected Components buffer was located
in L2 memory.
7. Encountered Difficulties
Documentation:
* Incomplete platform documentation.
* Lack of documentation for the MCSDK
examples.
* Had to learn by trial and error.
* Posted questions on TI’s E2E forums.
Software Bugs:
* Software bugs in the development
environment.
* Software bugs in the MCSDK examples.
* Repeatedly updated software versions.
* Some bugs are still unsolved.
* Unable to receive large UDP transmissions.
* TI’s E2E forums were highly effective in
solving problems.
* The posted questions were answered by TI’s
employees almost immediately.
8. Future Projects
* The tracking system can be enhanced to
support larger frame sizes.
* Motion estimation (e.g. Kalman filter)
can be added for better tracking
capabilities.
* The final report was written as a user’s
guide to the DSP, the development
environment and the tracking system.
* The program on the DSP side is highly
modular and can be easily adapted for any type
of multi-core pipeline processing.