Yaron Doweck, Yael Einziger
Supervisor: Mike Sumszyk
Spring 2011 Semester Project

* Learn to use the new TI C6678 multi-core platform and to exploit its abilities and advantages.
* Implement a real-time tracking algorithm using multi-core programming and VLIB.
* Create a framework for multi-core, Ethernet video streaming and DSP-FPGA communication.

1. Keystone Architecture
2. SYS/BIOS
3. VLIB
4. Tracking Algorithm
5. Tracking System
6. Performance Analysis
7. Encountered Difficulties
8. Future Projects

1. Keystone Architecture

* In the first part of the project, the main goal was learning:
  * The C6678 platform
  * TI development environments
  * The Multicore SDK (MCSDK)
  * The SYS/BIOS real-time OS

[Figure: architecture block diagram (TeraNet)]

* DDR3: up to 10666MB/s
* Shared memory: 4 access ports, each up to 16000MB/s
* TeraNet switch fabric: up to 256GB/s
* C66x DSP: up to 20 GFLOPS @ 1.25GHz
  * 32KB L1P cache/SRAM: 32000MB/s
  * 512KB L2 cache/SRAM: 16000MB/s

Core               C64x          C67x            C66x
Data type          Fixed-point   Floating-point  Fixed- and floating-point
Speed              720-1000 MHz  250-350 MHz     1-1.25 GHz
GMAC/core          0.88          0.5             40
GFLOPS/core        -             2.4             20
L1 cache           32KB          32KB            32KB L1P + 32KB L1D
L2 cache           1MB           -               512KB + 4MB shared
Power consumption  3.7W@700MHz   1.5W@300MHz     10W@1GHz
                   (1 core)      (1 core)        (8 cores)

2. SYS/BIOS

* SYS/BIOS is an advanced, lightweight real-time operating system from Texas Instruments.
* It is designed for use in embedded applications that need real-time scheduling and synchronization.
* SYS/BIOS is delivered as a set of pre-compiled packages that provide the modules that make up the OS.
* Each module is loaded and configured separately; only the selected modules are loaded, keeping the OS as light as possible.

Main SYS/BIOS modules used in the project:
* BIOS - Manages the OS.
* Task - Creates and manages threads.
* HWI - Hardware interrupts.
* Semaphore - Creates and manages semaphores.
* IPC - Inter-processor communication.
* Timestamp - Provides a timestamp service for performance analysis.

3. VLIB

* VLIB is an extensible library of more than 40 software kernels that are optimized for TI's C64x+ digital signal processor (DSP) core.
* These kernels perform background modeling and subtraction, object feature extraction, tracking, recognition and low-level pixel processing, providing a foundation for developing video analytics applications.
* TI also provides developers with a bit-exact version of the library for testing and debugging in a PC (Windows) environment.
* The VLIB version used in this project is an unofficial release compiled with C66x support, obtained from the TI Video Surveillance team (VLIB’s developers).

4. Tracking Algorithm

Statistical Background Subtraction -> Connected Components Labeling -> Tracking

Statistical background subtraction:
* Unlike moving objects, the background of the image doesn’t change.
* However, there are still some small variations over time due to luminosity changes, camera noise, trees, etc.
* Hence, by studying the variation of each pixel over time, we can deduce whether it belongs to a moving object or to the background.
* API: VLIB_subtractBackgroundS16

Connected components labeling:
* Groups foreground pixels that have other foreground pixels as 8-connected neighbors, and labels each discrete group as a component.
* Once labeling is done, component properties can be measured and used to extract foreground information. These properties include bounding box, centroid and area.
* API: VLIB_createConnectedComponentsList

[Figure: binary foreground image and its connected components labeling]

Tracking:
* Tracker association is done by matching each component to the closest tracker from the previous image.
* If no existing tracker is close enough, a new tracker is associated with the component.
* After all components are associated, any remaining trackers are discarded.

5. Tracking System

* The developed tracking system demonstrates multicore programming on the DSP using its powerful features:
  * Network coprocessor.
  * EDMA engine.
  * Multicore Navigator.
  * Synchronization modules.
  * Event-driven operations.

[Figure: system block diagram with numbered blocks]
* Blocks (1) and (4) were implemented using OpenCV 2.3 with 2 separate threads.

[Figure: data path. Ethernet Controller, Packet DMA, DDR3, Shared Memory Double Buffer, Cache Controller, L1 Cache, Processing]

[Figure: inter-core messaging. CORE 0 processing sends a message to CORE 1 processing to notify that the foreground image is ready]

[Figure: memory layout (DDR3 and shared memory). Image at T, Image at T-1, Background Model (Mean, Var), queue of binary foreground images, list of trackers]

[Figure: event flow. CORE 0: EDMA interrupt, then Statistical Background Subtraction. CORE 1: multicore message, then Connected Component and Tracking. Synchronization: semaphore, multicore messaging service]
* Each block is event driven. That is, it wakes up only when a specific event happens.

6. Performance Analysis

[Figure: data path. Ethernet Controller, Packet DMA, DDR3, Shared Memory Double Buffer, Cache Controller, L1 Cache, Processing]

* Ethernet (PC-DSP): 10MB/s, > 34FPS.
* DDR3 memory throughput: 10666MB/s via EDMA3.
* SRAM (local/shared) memory throughput: 16000MB/s, direct access.
* Shared SRAM size: L2 double buffering requires 2 bytes/pixel. The Gaussian model (for background subtraction) requires 4 bytes/pixel. Largest frame size possible: 240x320.
* Webcam: up to 30FPS. Frame size 120x160, 240x320 or 480x640.
* In conclusion, the system can process frame sizes of 120x160 or 240x320 at up to 30FPS.
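The frame-rate and frame-size figures above can be cross-checked with a few lines of arithmetic. The sketch below is only one possible reading of the slide: it assumes 8-bit grayscale frames on the Ethernet link, 1MB = 2**20 bytes, that the double buffer must fit in the 512KB L2 of a single core, and that the Gaussian model is kept in the 4MB shared SRAM. None of these assumptions is stated explicitly in the slides, and all function and constant names are illustrative.

```python
# Back-of-the-envelope check of the throughput and memory figures.
# ASSUMPTIONS (not stated in the slides): 8-bit grayscale frames on the wire,
# 1MB = 2**20 bytes, the double buffer lives in the 512KB L2 of one core,
# and the Gaussian model lives in the 4MB shared SRAM.

KB = 2**10
MB = 2**20

ETHERNET_BYTES_PER_SECOND = 10 * MB   # "Ethernet: 10MB/s"
L2_BYTES = 512 * KB                   # "512KB L2 cache/SRAM"
SHARED_SRAM_BYTES = 4 * MB            # "4MB shared"
DOUBLE_BUFFER_BYTES_PER_PIXEL = 2     # "L2 double buffering requires 2 bytes/pixel"
GAUSSIAN_MODEL_BYTES_PER_PIXEL = 4    # "Gaussian model requires 4 bytes/pixel"

FRAME_SIZES = [(120, 160), (240, 320), (480, 640)]  # webcam frame sizes


def ethernet_fps(rows, cols, bytes_per_pixel=1):
    """Frames per second the link can carry for one frame size."""
    return ETHERNET_BYTES_PER_SECOND / (rows * cols * bytes_per_pixel)


def double_buffer_bytes(rows, cols):
    """Size of the double buffer for one frame size."""
    return rows * cols * DOUBLE_BUFFER_BYTES_PER_PIXEL


def gaussian_model_bytes(rows, cols):
    """Size of the background model (mean and variance) for one frame size."""
    return rows * cols * GAUSSIAN_MODEL_BYTES_PER_PIXEL


def fits(rows, cols):
    """True if both buffers fit in the memories assumed above."""
    return (double_buffer_bytes(rows, cols) <= L2_BYTES
            and gaussian_model_bytes(rows, cols) <= SHARED_SRAM_BYTES)


for rows, cols in FRAME_SIZES:
    print(f"{rows}x{cols}: "
          f"Ethernet {ethernet_fps(rows, cols):.1f} FPS, "
          f"double buffer {double_buffer_bytes(rows, cols) / KB:.0f}KB, "
          f"model {gaussian_model_bytes(rows, cols) / KB:.0f}KB, "
          f"fits: {fits(rows, cols)}")
```

Under these assumptions a 480x640 double buffer (600KB) does not fit in 512KB, a 240x320 one (150KB) does, and a 10MB/s link carries roughly 34 frames of 480x640 per second. This agrees with the quoted figures, but the real limits also depend on how much L2 is reserved for cache, code and other data.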
* By processing only part of the image at a time, the size of the double buffer can be significantly reduced, allowing a larger frame size.

[Figure: memory layout (shared memory). Image at T (char), Image at T-1 (char), Background Model Mean (short) and Var (short), queue of binary foreground images, list of connected components]
* Memory can be significantly reduced by processing a part of the frame at a time.
* The implementation of the connected components algorithm will be more complicated.

* VLIB is optimized for the TI C64x+ DSP. As part of the performance analysis, VLIB’s performance on the C66x core was compared with its performance on the C64x+ core:

Name                  Timing    Data in L2,  Data in L2,  Data
                      on C64x+  L1 is SRAM   L1 is cache  in L1
Subtract Background   1.1       4.2          1.59         1.02
Update Variance       1.3       5.1          1.9          1.26
Update Mean           1.0       4.19         1.3          1.01
Erosion               0.29      0.98         0.2          0.2
Dilation              0.27      0.98         0.2          0.2
Connected Components  1.1       2.3          1.07         1.5*

*Since Connected Components requires a lot of memory, the image was located in L1 but the Connected Components buffer was located in L2 memory.

7. Encountered Difficulties

Documentation:
* Incomplete platform documentation.
* Lack of documentation for the MCSDK examples.
* Had to learn by trial and error.
* Posted questions on TI’s E2E forums.

Software Bugs:
* Software bugs in the development environment.
* Software bugs in the MCSDK examples.
* Repeatedly updated software versions.
* Some bugs are still unsolved.
* Unable to receive large UDP transmissions.

* TI’s E2E forums were highly effective in solving problems.
* The posted questions were answered by TI employees almost immediately.

8. Future Projects

* The tracking system can be enhanced to support larger frame sizes.
* Motion estimation (e.g. Kalman filter) can be added for better tracking capabilities.

* The final report was written as a user’s guide to the DSP, the development environment and the tracking system.
* The program on the DSP side is highly modular and can be easily adapted for any type of multicore pipeline processing.
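For readers adapting the system, the tracker association rule from the Tracking Algorithm section (match each component to the closest tracker from the previous image, start a new tracker when none is close enough, discard trackers that are left over) can be sketched as follows. This is a minimal illustration, not the project's DSP code: the `Tracker` class, the `associate` function, the `max_distance` threshold and the choice that each tracker is matched to at most one component are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from itertools import count


@dataclass
class Tracker:
    """Hypothetical tracker record: an ID and the last known centroid."""
    tracker_id: int
    centroid: tuple  # (x, y)


def associate(component_centroids, previous_trackers, max_distance, id_source):
    """Greedy nearest-neighbour association of components to trackers.

    ASSUMPTION: a tracker is matched to at most one component, and
    components are processed in list order.
    """
    unused = list(previous_trackers)
    current = []
    for centroid in component_centroids:
        # Closest tracker from the previous image that is still unmatched.
        best = min(unused,
                   key=lambda t: math.dist(t.centroid, centroid),
                   default=None)
        if best is not None and math.dist(best.centroid, centroid) <= max_distance:
            unused.remove(best)
            current.append(Tracker(best.tracker_id, centroid))
        else:
            # No existing tracker is close enough: start a new one.
            current.append(Tracker(next(id_source), centroid))
    # Trackers still in `unused` matched no component and are discarded.
    return current


# Usage: two trackers from the previous image, two components in the new one.
ids = count(1)
previous = [Tracker(next(ids), (10.0, 10.0)), Tracker(next(ids), (50.0, 50.0))]
current = associate([(12.0, 11.0), (200.0, 200.0)], previous,
                    max_distance=20.0, id_source=ids)
print([(t.tracker_id, t.centroid) for t in current])
```

In the usage example the first component keeps the ID of the nearby tracker, the second component is too far from any tracker and gets a new ID, and the unmatched tracker from the previous image is dropped. A motion estimator such as the Kalman filter mentioned above would replace the last known centroid with a predicted position before the distance test.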