Avnet SpeedWay Workshops
Accelerating Your Success™

Avnet Speedway Design Workshop™
Creating FPGA-based Co-Processors for DSPs Using Model-Based Design Techniques
Lecture 5: Creating a Stand-alone Video System
V10_1_2_0

Model-Based Design Flow
• Develop Executable Spec in Simulink
• Design Exploration for Targeting Hardware
• Partition Between DSP and FPGA Co-Processor
• Verify Hardware in HW Co-simulation
• Implement Stand-Alone Video System

After verification in simulation, the final design phase is implementation as a stand-alone system comprising a DSP and an FPGA co-processor.

The Problem We Wish to Solve
Maintaining a complex system involving a DSP and an FPGA co-processor can be tedious and error-prone. MathWorks model-based design bridges the TI DSP and Xilinx FPGA design flows with automatic code generation, removing the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio. The final FPGA co-processor system offers better performance.

Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation

Design Flow for Stand-Alone Implementation
[Figure: tool flow from MATLAB and Simulink (algorithm and system design, MathWorks) through Real-Time Workshop Embedded Coder, Targets and Links to Code Composer (TI DSP, generated C/ASM), and through Xilinx System Generator for DSP to ISE (Xilinx FPGA); verification via HDL Link for CCS, hardware co-simulation and ChipScope; video source in, LCD panel out, on the Avnet Spartan3A-DSP FPGA / DaVinci platform]

We begin by examining the connectivity for data transfer between the DM6437 and the FPGA co-processor.

We continue with automatic code generation of executables for both DSP and FPGA, including the Avnet board support package for Simulink on the Avnet Spartan-3A DSP DaVinci Development Kit.

We conclude with in-system verification techniques for the combined DSP and FPGA co-processor system. Note that video now flows into the system from a live source, in contrast to the video frames generated by a Simulink testbench for verification using hardware co-simulation.

Model Partition DSP / FPGA
[Figure: video stabilization model in Simulink (algorithm and system design): motion estimation by Sum-of-Absolute Differences (SAD), location estimation with updated template and updated ROI, relative motion vector from frame to frame, and image translate]

Recall the steps of labs 3 and 4, where a Simulink model was partitioned between DSP and FPGA. Moving to a stand-alone implementation, we must now bridge the FPGA co-processor hardware to the DSP software.

Bridging Software to Hardware
[Figure: DSP core and co-processor with an undefined link between them]
• Requires hardware interface and communication protocol
• Managing asynchronous clock domains
• Software API to communicate with hardware

How can this be implemented? Let's examine these aspects in detail, especially as they relate to exchanging streaming data such as video between the FPGA co-processor and the DSP.

Bridging Software to Hardware / EMIF
[Figure: DSP core connected to co-processor over EMIF, carrying both data and control]
Data and control on common bus (EMIF)
– Obliges burst transfer over time-shared bus
– Inefficient for streaming data (e.g. video)
– Requires inserted syncs, framing in DSP software, handshaking

Bridging software on the DSP side to the hardware co-processor requires first and foremost a hardware interface and communication protocol. One possibility is EMIF, the External Memory Interface, which groups address, data and control signals for interfacing to external devices. EMIF comes in a variety of sizes across different families of DaVinci, from synchronous 32-bit data on the DM642 to asynchronous 8-bit data on the DM6437.
It is convenient to differentiate between control data and streaming data. Control data is often bursty and not time-critical, while streaming data is constant and requires a fixed bandwidth. Exchanging streaming data such as video between the DSP and the FPGA co-processor over a shared bus such as EMIF requires time-multiplexed burst transactions so that other devices can still access the bus, and control data must be inserted between the streaming data bursts. Furthermore, exchanging video over a bus such as EMIF would necessitate inserted syncs, framing in software in the DSP, and asynchronous FIFOs in the FPGA. For these reasons, EMIF is not the best choice of interface between the DSP and the FPGA co-processor.

Bridging Software to Hardware / VLYNQ
[Figure: DSP core connected to co-processor: VPBE video to VPBE interface, VLYNQ control to VLYNQ LogiCORE, VPFE video from VPFE interface]
Separate data and control
– Streaming full-duplex video over dedicated Video Processing Subsystem of DM6437
– Control over VLYNQ
– Simple, fast, efficient

A simple and efficient approach is to transport streaming data over dedicated ports of the Video Processing Subsystem, while control data flows through a separate, non-time-critical link. This offers simple, fast, uninterrupted bi-directional streaming video between the DSP and the FPGA co-processor. Let's examine the resources on the DM6437 to implement separate video and control interfaces to the FPGA co-processor.

Why does video flow through the FPGA and not directly to the DSP? Because the board is built to pipe video through the FPGA to and from the DSP.

Note: Although not officially supported, TI has done some work to allow general-purpose data, not just video, to flow into the VPFE and out of the VPBE ports. Contact Bernie Thompson at TI.

VLYNQ
[Figure: DSP core connected to co-processor: VLYNQ control to VLYNQ LogiCORE, VPFE video from VPFE interface]
• Xilinx and TI collaborating for seamless interconnection between DSPs and FPGAs
• Low pin count, low cost, scalable bandwidth
• DaVinci has on-chip VLYNQ peripheral
• Xilinx VLYNQ LogiCORE™ IP delivered through Xilinx CORE Generator

VLYNQ is a serial (i.e. low pin count) communications interface that enables the extension of an internal bus segment to one or more external physical devices (e.g. an FPGA). VLYNQ accomplishes this by serializing bus transactions in one device, transferring the serialized transaction between devices via a VLYNQ port, and deserializing the transaction in the external device.

The VLYNQ peripheral is offered in DaVinci (DM644x and DM643x devices), Jacinto, Avalanche, Puma, Sangam, Titan, APEX and other TI communication processors. Xilinx has licensed VLYNQ, so it is a great opportunity to connect FPGAs to TI DSPs, in addition to EMIF (External Memory Interface) and Serial RapidIO (SRIO).

VLYNQ
• High-speed, low pin-count, full-duplex, peer-to-peer serial interface
• Extension of an internal bus segment to one or more external devices
• Point-to-point serial interface for other VLYNQ compatible devices
• External devices are mapped to local physical address space and appear as if they are on the internal bus of the local device

Packet fields shown on the slide:
  CMD 1 (10 bits)
  CMD 2 (10 bits)
  Packet Type (4 bits)
  Address Mask (4 bits)
  Byte Count (10 bits)
  Address (<4*10 bits)
  Data (N*10 bits)
  End of Packet, EOP (10 bits)

[Figure: VLYNQ link between the DSP and the Xilinx FPGA with TRANSMIT, RECEIVE, CLOCK and optional CLK REQ signals]
• Scalable to meet bandwidth requirements (3-pin to 10-pin)
• Memory mapped, master & slave on a single bus
• Software transparent for future device integration
• Single ended, unidirectional I/O
• 8b/10b encoding.
• In-band signaling

The Avnet Spartan-3A DSP DaVinci board uses all 4 data (transmit / receive) pairs. Individual pins can serve as GPIO if a lower-bandwidth VLYNQ interface is desired or if VLYNQ is not used.

VLYNQ Performance
• 8b/10b coding causes 20% overhead: only 8 bits of data are contained in every 10 bits sent
• Total overhead = protocol overhead + 8b/10b overhead
• Theoretical maximum throughput = 4 data lines x 100 MHz max clock = 400 Mbits/sec (50 Mbytes/sec), before coding and protocol overhead

Maximum effective throughput with 99 MHz clock (100 MHz max clock supported). All benchmarks use 4 VLYNQ transmit/receive pairs.

  Burst size (32-bit words)   Throughput (Mbits/sec)   Throughput (Mbytes/sec)
  1                           126.72                   15.84
  4                           220.37                   27.55
  8                           259.93                   32.49
  16                          285.56                   35.70

The max write rate is the maximum data rate available on the serial interface for transmission, taking the 8b/10b encoding overhead into consideration. It is calculated as follows:

  Max write rate = VLYNQ serial clock (MHz) x number of pins x 8b/10b encoding factor

The 8b/10b encoding costs 20% of the raw bit rate, so the effective data throughput is the raw rate multiplied by a factor of 0.8. For example, if the VLYNQ clock is running at 99 MHz on a 4-pin-per-direction interface, the raw data rate is 99 x 4 = 396 Mbps. After the 8b/10b encoding is removed, the maximum write rate is 396 x 0.8 = 316.8 Mbps.

The total throughput on the VLYNQ interface includes both transmit and receive directions. For the above configuration, a remote device can write to the local device at the same data rate, so the total throughput is the sum of the transmit and receive rates, or 633.6 Mbps.

In addition to the 8b/10b encoding, the packet structure for read/write operations adds further overhead. The VLYNQ module can transfer single 32-bit words or a burst of up to sixteen 32-bit words.
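As a worked check of the max write rate formula above, the following C sketch reproduces the 99 MHz, 4-pin example using integer arithmetic in kbit/s. The function names are illustrative assumptions and are not part of any TI or Avnet driver; the 8/10 factor is simply the 8b/10b coding ratio quoted on the slide.

```c
#include <stdint.h>

/* Effective VLYNQ write rate for one direction, in kbit/s.
 * clock_khz: VLYNQ serial clock in kHz (hypothetical parameter name)
 * pins:      number of data pins per direction
 * 8b/10b coding carries 8 payload bits in every 10 line bits. */
uint32_t vlynq_max_write_rate_kbps(uint32_t clock_khz, uint32_t pins)
{
    uint64_t raw_kbps = (uint64_t)clock_khz * pins;  /* raw line rate */
    return (uint32_t)(raw_kbps * 8u / 10u);          /* remove coding overhead */
}

/* Full-duplex total: transmit and receive at the same rate. */
uint32_t vlynq_total_rate_kbps(uint32_t clock_khz, uint32_t pins)
{
    return 2u * vlynq_max_write_rate_kbps(clock_khz, pins);
}
```

Calling vlynq_max_write_rate_kbps(99000, 4) yields 316800 kbit/s, the 316.8 Mbps figure derived above, and vlynq_total_rate_kbps(99000, 4) yields 633600 kbit/s. Against that ceiling, the benchmarked 16-word burst (285.56 Mbits/sec) reaches about 90%, and the single-word transfer (126.72 Mbits/sec) reaches 40%; the difference is the packet overhead.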
The data and throughput calculations shown here are sample calculations for ideal conditions. In general, the data rates depend on a variety of other factors, such as the efficiency of read/write burst transactions, the ability to buffer read/write data, how well data can be serially shifted out without stalling additional read/write bursts, and the remote and local components, both external and internal (device operations, board considerations, etc.).

Reference: TMS320DM643x DMP VLYNQ Port User's Guide, TI literature number SPRU938B (Appendix B)

VLYNQ Remote Memory Mapping
[Figure: VLYNQ address decode between the DSP core (with Video Processing Subsystem) and Peripherals A to D in the remote device, with Map Regions 1 to 4 on each side. Address labels recoverable from the figure: 0000:0000, 03FF:FFFF, 0400:0000, 0400:00FF, 07FF:FFFF, 0500:0000, 0500:FFFF, 0800:0000, 0800:00FF, 0800:0100, 0801:00FF, 0801:0100, 0841:00FF, 0B00:0000, 0B3F:FFFF]
• Remote VLYNQ devices memory mapped to the local (DSP host) device’s address space
• Finer memory-decoding can target smaller address ranges within the FPGA co-processor

Remote VLYNQ devices are memory mapped to the local (host) device’s address space when a link is established, and appear as if they are on the internal bus, like any other on-chip peripheral. Enumerating the VLYNQ devices (single or multiple) into a coherent memory map for accessing each device is part of the initialization sequence. After enumeration, the host (local) device can access the remote device address map using local device addresses. The VLYNQ module in the host device manages the translation of the local address to the remote address.

A remote VLYNQ device is mapped to the local device’s address space via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4).
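The role of the address map registers can be sketched in C as follows. This is a minimal model under stated assumptions, not the VLYNQ hardware or driver: it assumes the transmit side subtracts the TX address map base from the local address, and that the receive side selects one of the four regions by walking the cumulative region sizes and then adds that region's offset. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define VLYNQ_NUM_RX_REGIONS 4   /* RX address map size/offset n, n = 1 to 4 */

/* One receive map region (hypothetical struct): size in bytes and the
 * base address of the region in the receiving device. */
typedef struct {
    uint32_t size;
    uint32_t offset;
} rx_region_t;

/* Transmit side: strip the TX address map base from the local address. */
uint32_t vlynq_tx_translate(uint32_t local_addr, uint32_t tx_map_base)
{
    return local_addr - tx_map_base;
}

/* Receive side: find the region containing pkt_addr and translate it.
 * Returns 0 and stores the translated address in *out_addr on success,
 * or -1 if pkt_addr lies beyond the last mapped region. */
int vlynq_rx_translate(const rx_region_t regions[VLYNQ_NUM_RX_REGIONS],
                       uint32_t pkt_addr, uint32_t *out_addr)
{
    uint32_t base = 0;  /* cumulative size of the regions already skipped */
    for (size_t i = 0; i < VLYNQ_NUM_RX_REGIONS; ++i) {
        if (pkt_addr - base < regions[i].size) {
            *out_addr = regions[i].offset + (pkt_addr - base);
            return 0;
        }
        base += regions[i].size;
    }
    return -1;
}
```

In this model the transmit window is contiguous, while the four receive regions can land at unrelated addresses in the receiving device, which is what lets separate FPGA peripherals sit behind one VLYNQ window.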
The transmit side has a contiguous map; the size of the map is the same as the remote device map. The figure illustrates this mapping. This capability makes VLYNQ ideal for memory-mapping FPGA-based peripherals. For clarity, only 4 peripherals are shown above; finer memory decoding can target any number of smaller address ranges to communicate with registers within the FPGA co-processor. The Avnet VLYNQ block allows memory-mapped address spaces down to single-register level using System Generator shared memory registers.

Reference: In the local device, the address of the VLYNQ remote memory map in the local configuration space is the transmit address for accessing remote devices over the serial interface. The address of the VLYNQ remote memory map is programmed in the TX address map register (XAM). When the local device transmits, it first strips off the transmit address offset in the local device memory map. Then the local device sends the …

Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation

Integration of 3 Pieces of Avnet Hardware
• Spartan-3A DSP DaVinci Evaluation Kit
• PS Video EXP
• 6.5” NEC LCD panel

The 6.5” NEC panel is targeted for $500 resale, but we do not have an established price yet.

Avnet Spartan-3A DSP DaVinci Evaluation Kit
[Figure: board block diagram. Spartan 3A-DSP 3SD1800A FPGA with LEDs, parallel Flash, SPI Flash, RS232, DDR2, 10/100/1G PHY, image sensor interface, clocks, switches, EXP and JTAG; DaVinci DM6437 with LEDs, USB, RS232, DDR2, SPI Flash, parallel Flash, 10/100 PHY, component video out and JTAG; shared audio CODEC; FPGA-to-DSP links: 8-bit EMIF, VPBE, VPFE, VLYNQ, McBSP1, I2C]

The Avnet Spartan-DSP DaVinci Evaluation Platform combines on the same baseboard the new Xilinx Spartan 3A-DSP FPGA and the TI DaVinci TMS320DM6437 Digital Media Processor, optimized for video applications such as surveillance, automotive and machine vision. The DM6437 connects to the Spartan3A-DSP over several interfaces: VLYNQ, EMIFA, VPBE, VPFE.

Features:
• Xilinx 3SD1800A-FG676 FPGA
• Programmable LVDS Clock Generator
• On-board 27 MHz LVTTL Oscillator
• On-board LVTTL Oscillator Socket
• 16M x 32-bit DDR2 SDRAM
• 256K x 36-bit ZBT SRAM
• EXP Expansion Slot
• 10/100 PHY
• 64Mb x 2 SPI Configuration Flash
• JTAG Programming/Configuration Port
• RS232 Port
• Two User LEDs
• A 4-position User DIP Switch
• Three User Push Button Switches
• Audio CODEC shared with DM6437

TI DaVinci DSP Processor
• TMS320DM6437 Digital Media Processor
• 128 MB 166 MHz DDR2 SDRAM
• 64 Mb serial SPI Flash program code storage
• 10/100 PHY
• VGA Out
• Audio CODEC shared with FPGA

Avnet Spartan-3A DSP DaVinci Evaluation Kit

FPGA
• Xilinx XC3SD1800A-4FG676C FPGA
• Clocks
  – Programmable LVDS clock generator
  – On-board 27 MHz LVTTL oscillator
  – On-board LVTTL oscillator socket
• Memory
  – 128M x 32-bit DDR2 SDRAM
  – 16M x 8 parallel / BPI configuration Flash
  – 64Mb SPI configuration/storage Flash
• Interfaces
  – 10/100/1000 PHY
  – JTAG programming/configuration port
  – RS-232 serial port
  – Image Sensor Interface
  – 2 EXP expansion connectors
• Buttons and switches
  – 4 LEDs
  – Eight 4-position DIP switch
  – 4 push-button switches

DSP
• TI TMS320DM6437 DaVinci Processor
• Memory
  – 128 MB 166 MHz DDR2 SDRAM
  – 128 Mb parallel Flash program code storage
  – 64 Mb serial SPI Flash program code storage
• Interfaces
  – 10/100 Ethernet Port
  – Component and composite video out
  – Audio CODEC shared with FPGA
  – USB
• Buttons and switches
  – 4 User LEDs

Avnet Pro-Sumer Video EXP Module
• High-Definition Video Decoder: Texas Instruments TVP7001 (RGB, Component)
• Standard-Definition Video Decoder: Texas Instruments TVP5150 (Composite, S-Video)
• DVI Transmitter: TFP410
• DVI Receiver: AD9887A
• Analog Devices ADV7123 RGB DAC
• Parallel RGB and LVDS interfaces to Flat Panel Displays
• Stereo Audio CODECs

The Avnet EXP ProSumer Video (EXP PS Video) Module is a plug-in module designed to interface with compatible Avnet baseboards, including the Avnet Spartan-DSP DaVinci Evaluation Platform. The EXP PS Video Module provides a number of video and audio interfaces to its host via two EXP connectors.

NEC TFT Display
• NEC XGA LCD flat panel display NL10276BC13-01C
• Super-Transmissive Natural Light TFT
• 1024 x 768 Resolution
• 6.5 inches Diagonal
• 16.77M colors
• LVDS Interface
• LED Backlight

LVDS Flat Panel Controller
[Figure: on the Avnet Spartan3A-DSP / DaVinci Evaluation Kit, VPBE output feeds a 2X scaler, then 24-bit RGB into the flat panel controller, which drives the 1024 x 768 XGA flat panel display; 62.5 MPixels / sec, 62.5 x 7 = 437.5 Mbps]

Avnet provides a controller for LVDS flat panel displays. It is provided at no extra cost to customers who purchase the PS Video EXP module. RGB + syncs digital video arrives at the flat panel controller at 62.5 MPixels / sec. The outputs of the LVDS flat panel controller comprise 5 LVDS transmit pairs:
• a forwarded clock at 1/7th the bit rate with 4:3 duty cycle, comprising the LCD_FTXC pair
• 4 data lines LCD_FTX[3:0], each of which carries a 7:1 serialized bit stream.
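The LVDS rate arithmetic above (62.5 x 7 = 437.5 Mbps per data pair) can be written out as a short C sketch. The names are illustrative assumptions; only the 7:1 serialization ratio and the four data pairs come from the notes.

```c
#include <stdint.h>

#define LVDS_SERIALIZATION 7u  /* 7:1 serialization per data pair */
#define LVDS_DATA_PAIRS    4u  /* LCD_FTX[3:0] */

/* Bit rate of one LVDS data pair in kbit/s, for a pixel rate in kHz.
 * The forwarded clock runs at the pixel rate, i.e. 1/7th of this value. */
uint32_t lvds_pair_rate_kbps(uint32_t pixel_rate_khz)
{
    return pixel_rate_khz * LVDS_SERIALIZATION;
}

/* Number of bit slots available per pixel clock across all data pairs. */
uint32_t lvds_bits_per_pixel_clock(void)
{
    return LVDS_SERIALIZATION * LVDS_DATA_PAIRS;
}
```

For the 62.5 MPixels / sec stream, lvds_pair_rate_kbps(62500) gives 437500 kbit/s per pair, and the 28 bit slots per pixel clock leave room for the 24-bit RGB data plus the sync signals.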
These 5 LVDS transmit pairs originate from the baseboard FPGA and are routed up through the EXP connector to J6 of the Avnet EXP PS Video module. J6 is a JAE FI-X30S-HF connector that accepts a cable assembly to drive an NEC 6.5” XGA TFT-LCD module.

Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation

Avnet Board Support Package for Simulink
[Figure: Simulink library browser view of the Avnet board support package, including the DM6437 blockset]

Here is an overview of the Avnet board support package for Simulink for the Spartan-3A DSP DaVinci Development Kit. It is subdivided into 3 blocksets. On the left are Simulink blocks that map to physical peripheral devices within the DM6437, such as UART, CAN and the Video Processing Subsystem. On the right are blocks that are implemented in the Spartan3A-DSP; these blocks are used in the System Generator portion of the Simulink model. At the bottom are Simulink blocks that map to the DSP but communicate with FPGA functions, or with physical board-level circuitry via the FPGA, such as LEDs. The Avnet board support package for Simulink is the result of collaborative work between Avnet and The MathWorks.

Avnet Board Support Library for Simulink
• Library of Simulink blocks supporting features of DM6437 on Avnet Spartan-3A DSP DaVinci Evaluation Kit
• Exposes parameters of each peripheral
• Generates API to DSP/BIOS drivers

Overview of Simulink blocks in the BSP to support the DM6437. Note the extensive list of parameters offered for each peripheral.

Avnet Board Support Package / VPSS
[Figure: DSP core connected to co-processor: VPBE video to VPBE interface, VLYNQ control to VLYNQ LogiCORE, VPFE video from VPFE interface]
• VPSS blocks used by automatic code generation to call DSP/BIOS driver APIs

How is the VPSS connectivity accomplished? With the Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VPFE and VPBE blocks are used by RTW Embedded Coder to call the DSP/BIOS driver API.

Avnet Board Support Library / VLYNQ
[Figure: DSP core connected to co-processor: VLYNQ control to VLYNQ LogiCORE, VPFE video from VPFE interface]
• VLYNQ block used by automatic code generation to call VLYNQ DSP/BIOS driver API

How is the VLYNQ connectivity accomplished on the DSP side? With the Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VLYNQ block is used by RTW Embedded Coder to call the DSP/BIOS driver API.
(Recall the directory structure of the Avnet BSP from lecture 4.)

Passing FPGA Memory Map via MATLAB
[Figure: design flow as before (MATLAB and Simulink, Real-Time Workshop Embedded Coder, Code Composer, System Generator for DSP, ISE, HDL Link for CCS), annotated "Export memory map via MATLAB". Memory-mapped IO in the DaVinci processor (address labels 0400:0000, 04000001) corresponds over VLYNQ to REG, FIFO and RAM elements in the FPGA co-processor (address labels 0800:0000, 0800:00FF, 0800:0100, 0801:00FF) on the Avnet Spartan3A-DSP FPGA / DaVinci platform]

Shared memories in the System Generator model destined for the FPGA co-processor are associated with the DM6437 through the GUI of the ‘DaVinci Processor’ VLYNQ Interface block in System Generator. After an association is made, System Generator automatically generates a memory map of all shared memory in the model.

During code generation, the memory map is exported to Code Composer Studio via the MATLAB workspace to create memory-mapped IO in the DM6437 that communicates with the corresponding registers, FIFOs and RAM elements in the FPGA co-processor over VLYNQ.

On the FPGA side, System Generator project integration with ISE carries the memory mapping information to the VLYNQ IP in ISE, where the final bitstream is created.

The result is an association between memory-mapped IO space in the DM6437 and the registers, FIFOs and RAM memory elements in the FPGA co-processor, which appear to the DM6437 as local memory space through VLYNQ. Push-button automatic code generation removes all the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.
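Because the FPGA registers, FIFOs and RAM appear to the DM6437 as local memory, DSP code can reach them with ordinary loads and stores. The following C sketch illustrates the idea only; it is not the auto-generated code. The offsets and function names are hypothetical, 32-bit wide shared memories are assumed, and the window base is passed in as a pointer so the sketch can be exercised on a host with a plain array standing in for the VLYNQ window.

```c
#include <stdint.h>

/* Hypothetical byte offsets of two shared memories inside the FPGA window.
 * In the real flow these would come from the exported memory map. */
#define FPGA_REG_OFFSET   0x0000u
#define FPGA_FIFO_OFFSET  0x0100u

/* Write one 32-bit word to the FPGA window. 'volatile' keeps the compiler
 * from caching or reordering away the access to the hardware. */
static inline void fpga_write32(volatile uint32_t *window,
                                uint32_t byte_offset, uint32_t value)
{
    window[byte_offset / 4u] = value;
}

/* Read one 32-bit word from the FPGA window. */
static inline uint32_t fpga_read32(volatile uint32_t *window,
                                   uint32_t byte_offset)
{
    return window[byte_offset / 4u];
}
```

On the target, window would be the local base address at which the VLYNQ remote region is mapped; keeping the offsets in one generated header is the part that the automatic memory map export takes over.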

Implementing DSP to FPGA VLYNQ Interface
[Figure: DSP design (top windows) and FPGA design (bottom windows), with the memory map communicated via MATLAB]

Here we show usage of the DM6437 VLYNQ Interface blocks in Simulink to connect the DM6437 (top windows) to the FPGA co-processor in System Generator (bottom windows). Note the memory mapping for a single shared register passed via the MATLAB workspace.

TC6 Automatic Code Generation for DM6437
• VLYNQ DSP/BIOS driver API created by automatic code generation from Avnet BSP VLYNQ block

Excerpt of auto-generated code from The MathWorks Embedded Coder for TC6, from the VLYNQ block in the Avnet board support library for Simulink.

Clock Domains in System Generator
• Various FPGA infrastructure on different clock domains
• Multiple Subsystem Generator allows multiple asynchronous clock domains in one System Generator model

Multiple clock domains are handled seamlessly by the Avnet board support package using a powerful feature of System Generator: Multiple Subsystem Generator. This example shows the VLYNQ interface to the DSP on one clock domain, VPFE for incoming video on another clock domain, and VPBE for video display on a third clock domain. Note that the top-level FPGA design is finalized in ISE after project export from System Generator.

Avnet Board Support Package / Demos
• Suite of demos integrated into board support package
• FPGA-based co-processors using model-based design

A comprehensive suite of demos is integrated into the Simulink board support package for the Avnet Spartan-3A DSP FPGA DaVinci Development Kit.
Demos cover these aspects of creating FPGA-based co-processors using model-based design:
• LCD demo: generates an image on the LCD panel of the Avnet Spartan-3A DSP FPGA DaVinci Development Kit
• Resizer demo: demonstrates two methods for resizing an image
• NTSC to LCD passthrough: demonstrates how to implement an NTSC to LCD passthrough
• SVGA to LCD passthrough: demonstrates how to implement an SVGA to LCD passthrough
• Video surveillance recording: demonstrates a video surveillance recording application with a motion-detection algorithm on the DM6437 DSP
• LED demo: using a very simple example, a model-based design is gradually targeted to DSP and FPGA hardware

Avnet Design Resource Center
• Download Board Support Package for Simulink from DRC

Stand-Alone Video Stabilization System
[Figure: Avnet Xilinx Spartan3A-DSP DaVinci Evaluation Platform. NTSC video source into the FPGA (scaler, SAD, VPFE interface) and on to the DM6437 over VPFE; DM6437 with DDR2 performs image translate; template and ROI to the FPGA and best-match row, column back over VLYNQ (VLYNQ LogiCORE); output over VPBE to the FPGA (VPBE interface, 2X scaler, flat panel controller), 24-bit RGB, 1024 x 768 at 60 Hz to the XGA flat panel]

Block diagram of the stand-alone video stabilization system that will be built in lab 5. The architecture of the Avnet Spartan-3A DSP DaVinci board routes video data through the FPGA towards the DM6437 over the dedicated VPFE video port. Template and ROI data are sent to the FPGA at each frame for the SAD search of the template within the region of interest (ROI). The best-match result of the SAD is sent back to the DM6437 over VLYNQ. The motion vector is used as the offset for image translation to stabilize the video from frame to frame. Video output is sent over VPBE to the FPGA for display on the XGA flat panel.
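The SAD search performed by the FPGA co-processor can be illustrated in software. The sketch below is a plain exhaustive search in C under stated assumptions (8-bit grayscale pixels, row-major buffers, first best match wins on ties); it is not the System Generator implementation, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Result of a search: top-left position of the best match and its SAD. */
typedef struct {
    int row;
    int col;
    uint32_t sad;
} sad_match_t;

/* Exhaustive Sum-of-Absolute-Differences search of a th x tw template
 * inside an rh x rw region of interest. Both buffers are row-major,
 * 8 bits per pixel. Returns row = col = -1 if the template does not fit. */
sad_match_t sad_search(const uint8_t *roi, int rh, int rw,
                       const uint8_t *tmpl, int th, int tw)
{
    sad_match_t best = { -1, -1, UINT32_MAX };
    for (int r = 0; r + th <= rh; ++r) {
        for (int c = 0; c + tw <= rw; ++c) {
            uint32_t sad = 0;
            for (int y = 0; y < th; ++y) {
                for (int x = 0; x < tw; ++x) {
                    int d = (int)roi[(r + y) * rw + (c + x)]
                          - (int)tmpl[y * tw + x];
                    sad += (uint32_t)abs(d);
                }
            }
            if (sad < best.sad) {      /* keep the lowest SAD seen so far */
                best.row = r;
                best.col = c;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The returned row and column play the role of the best-match result sent back over VLYNQ; subtracting the template's previous position gives the frame-to-frame motion vector used for image translation.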

Integrating the DSP and FPGA Co-processor
[Figure: tool flow from MATLAB and Simulink (algorithm and system design, MathWorks) through Real-Time Workshop Embedded Coder, Targets and Links to Code Composer (TI DSP, generated C/ASM), and through Xilinx System Generator for DSP to ISE (Xilinx FPGA); DSP and FPGA linked by VLYNQ; verification via HDL Link for CCS, hardware co-simulation and ChipScope; video source in, LCD panel out, on the Avnet Spartan3A-DSP FPGA / DaVinci platform]

Preview of lab 5:
1. Implement connectivity in System Generator for data transfer between the DM6437 and the FPGA co-processor over VLYNQ.
2. Continue with automatic code generation of executables for both DSP and FPGA, including the Avnet board support package for Simulink on the Avnet Spartan-3A DSP DaVinci Development Kit.
3. Conclude with in-system verification techniques for the combined DSP and FPGA co-processor system.

Hardware co-simulation was used for functional verification in lab 4. It is not used for stand-alone implementation, and is shown here for reference only. Note that video now flows into the system from a live source, in contrast to the video frames generated by Simulink for hardware co-simulation.

Summary
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
… proceed to lab 5: Integrating the DSP and FPGA Co-processor

Reference Slides

VLYNQ Data Flow
[Figure: VLYNQ block diagram. DSP core and Video Processing Subsystem with local VLYNQ, linked to the remote VLYNQ and custom interface in the co-processor]
The previous slide showed memory mapping between the local (host) device’s address space and the remote address space. This is accomplished via the address translation blocks. A remote VLYNQ device is mapped to the local device’s address space via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4). For clarity, the map registers aren’t shown on the block diagram above.

The data flow between two VLYNQ devices is shown here. The write originates from the DM643x slave configuration bus interface and, after address translation, goes to the outbound command (CMD) FIFO. Data is subsequently read from the FIFO and encapsulated in a write request packet. The packet is encoded and serialized before being transmitted to the remote VLYNQ in the FPGA. The remote device de-serializes and decodes the received data and writes it into the inbound CMD FIFO. A write operation is then initiated on the FPGA VLYNQ OPB (On-Chip Peripheral Bus) master interface after the address and data are read from the FIFO. The 32-bit OPB interface standard can connect directly to an embedded processor in the FPGA or to a custom user interface, as shown. Finally, address decoding delivers the data to the register(s) of the addressed peripheral.

The Xilinx VLYNQ serial interface is not directly coupled to the OPB interface; there are asynchronous FIFOs between the two interface domains, and the interfaces operate independently. If the OPB fails to generate sufficient commands and data to consume all of the VLYNQ interface’s bandwidth, the VLYNQ interface generates idle packets. If the OPB fails to immediately accept all remotely generated commands and data, the FIFOs fill and the VLYNQ interface turns flow control on.
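The decoupling role of the FIFOs can be modelled with a small ring buffer in C. This is a behavioural sketch only, under loudly stated assumptions: the depth of 16 words is hypothetical, flow control is modelled as asserting exactly when the FIFO is full (real hardware may use a different threshold), and an empty FIFO stands for the case where the link side would send idle packets. None of the names come from the Xilinx or TI cores.

```c
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 16u   /* hypothetical depth, for illustration only */

typedef struct {
    uint32_t data[FIFO_DEPTH];
    uint32_t head;    /* next write position */
    uint32_t tail;    /* next read position */
    uint32_t count;   /* words currently stored */
} cmd_fifo_t;

/* Modelled flow control: asserted when no more words can be accepted. */
bool fifo_flow_control(const cmd_fifo_t *f)
{
    return f->count == FIFO_DEPTH;
}

/* Link side pushes a received word; fails while flow control is on. */
bool fifo_push(cmd_fifo_t *f, uint32_t word)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[f->head] = word;
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* Bus side pops a word; fails when empty (link would carry idle packets). */
bool fifo_pop(cmd_fifo_t *f, uint32_t *word)
{
    if (f->count == 0u)
        return false;
    *word = f->data[f->tail];
    f->tail = (f->tail + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

In this model a slow consumer on the bus side lets the FIFO fill until fifo_flow_control() returns true, and a single pop releases it again, which mirrors the behaviour described in the notes.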
Reference:
• TMS320DM643x DMP VLYNQ Port User's Guide, literature number SPRU938B, Section 2.5.1
• Xilinx VLYNQ v1.3 / Core Generator 10.1, literature number DS324

VLYNQ References
• www.xilinx.com/products/ipcenter/DO-DI-VLYNQ.htm
• http://focus.ti.com/lit/ug/spru938b/spru938b.pdf

VLYNQ documentation consists of the TMS320DM643x DMP VLYNQ Port User’s Guide from TI and the VLYNQ LogiCORE datasheet from Xilinx.

VLYNQ DSP/BIOS Driver

    vlynq_config.peer_tx_addr = 0;
    vlynq_config.local_rtm_cfg_type = no_rtm_cfg;
    vlynq_config.peer_rtm_cfg_type = no_rtm_cfg;
    vlynq_config.local_tx_fast_path = FALSE;
    vlynq_config.peer_tx_fast_path = FALSE;

    /* Initialize the VLYNQ control module */
    ptr_vlynq = PAL_sysVlynqInitSoc(&vlynq_config);
    if(NULL == ptr_vlynq)
    {
        VLYNQ_DEBUG("VLYNQ :Failed to initialize the vlynq 0x%08x\n\r",
                    vlynq_config.base_addr);
        VLYNQ_DEBUG("VLYNQ :The error msg: %s\n\r", vlynq_config.error_msg);
        goto av_vlynq_init_fail;
    }

    /* Map memory regions of device for remote/local VLYNQ depending on
       region ID to be mapped and the size and offset. */
    while(init_p_region->id > -1)
    {
        if(VLYNQ_APP_SUCCESS != PAL_sysVlynqMapRegion(ptr_vlynq,
                                                      init_p_region->remote,
                                                      init_p_region->id,
                                                      init_p_region->offset,
                                                      init_p_region->size,
                                                      ptr_vlynq_dev))
        ...

On the TI SoC software side, a VLYNQ peripheral is driven through a set of functions within the API (application programming interface) provided by the VLYNQ device driver. Shown above are 2 of the preparatory steps to activate VLYNQ: PAL_sysVlynqInitSoc initializes the VLYNQ control module, and PAL_sysVlynqMapRegion maps memory regions of the device for remote/local VLYNQ depending on the region ID to be mapped and the size and offset. Refer to the VLYNQ Device Driver architecture for a full description of all functions in the API.

Avnet BSP Installation Package
Avnet Tools:
- avnet_3adsp_dm6437_0_04
  AVNET_S3ADSP_DM6437_INSTALL_DIR => C:\avnet_s3adsp_dm6437_0_04
  PSP_EVMDM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp
  CSLR_DM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp\pspdrivers\soc\dm6437\dsp\inc

[Figure: installed directory tree with the following annotations]
• DSP drivers (CCS specific)
• FPGA logic (ISE specific)
• DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
• FPGA blockset (SysGen specific)
• Modified version of C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
• Modified version of C:\dvsdk_1_01_00_15\psp_1_00_02_00

Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure.

Note:
• NDK = Modified-for-Avnet version of DVSDK for TI DM6437 EVM: C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
• PSP = Modified-for-Avnet version of PSP DSP/BIOS drivers for TI DM6437 EVM: C:\dvsdk_1_01_00_15\psp_1_00_02_00

Spartan-3A DSP DaVinci Board Support Package
[Figure: installed directory tree with the following annotations]
• DSP drivers (Code Composer Studio specific)
• FPGA logic (ISE specific)
• DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
• FPGA blockset (System Generator specific)
• Network Developer’s Kit (DSP/BIOS)
• PSP Drivers for DM6437 (DSP/BIOS)
• Ethernet Hardware Co-Simulation support files

Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure. We concentrate here on the Ethernet hardware co-simulation support files. All other components of the BSP will be presented in lecture 5.
Note:
• NDK = Modified-for-Avnet version of DVSDK for TI DM6437 EVM : C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
• PSP = Modified-for-Avnet version of PSP DSP/BIOS drivers for TI DM6437 EVM : C:\dvsdk_1_01_00_15\psp_1_00_02_00

Ethernet Hardware Co-Simulation Support Files

• Board appears in list of targets for Ethernet hardware co-simulation

Avnet provides Ethernet hardware co-simulation support files for the Spartan-3A DSP DaVinci, as well as for several Avnet Virtex-5 evaluation kits. The support files, known as ‘plugins’, are packaged in a standard format for the System Generator plugin installer ‘xlinstallplugin’. Once installed under the directory tree shown here, the board appears in the target list for Ethernet point-to-point hardware co-simulation.

Installation Package

BSL – Board Support Libraries
MSL – Model Support Libraries
LED Demo

BSL – DSP drivers

bsl\dsp\gel:
- avnet_s3adsp_dm6437.ccs => CCS setup for BlackHawk USB510L
- avnet_s3adsp_dm6437.gel => GEL file for Avnet board

bsl\dsp\src:
bsl\dsp\inc:
- dm6437_init.c/.h => various init/config routines
- fpga_interface.c/.h => FPGA device driver (apply/release reset)
- vlynq_interface.c/.h => VLYNQ device driver
- led_interface.c/.h => LED device driver
- dip_interface.c/.h => DIP Switch device driver
- vpss_interface.h => contains a number of useful defines

bsl\dsp\dspbios:
- Platform.tci => ??
BSL – FPGA Logic

bsl\fpga\rtl:
- pattern => XGA pattern generator (color bars + moving logo)
- lcd => LCD flat panel interface
- picoblaze => PicoBlaze-based I2C controller
- vlynq => VLYNQ interface core
- video => video interfaces (stddef, hidef, vpfe, vpbe)
- debug => ChipScope debug module
- top_level => top level designs

bsl\fpga\chipscope:
- ChipScope Analyzer project for FPGA debug

bsl\fpga\ucf:
- constraints file for FPGA designs

bsl\fpga\ise:
- davinci_coprocessor_stddef => example design for Composite input
- davinci_coprocessor_hidef => example design for VGA input

Installation Package

BSL – Board Support Libraries
MSL – Model Support Libraries
LED Demo

MSL – DSP Logic

DIP Switch:
- Reads one of SW10[1:4] switches (cannot be used with VPFE/VPBE)

LED:
- Writes to one of D7, D8, D9, D10 LEDs

VLYNQ Read/Write:
- Reads/Writes to FPGA peripherals via VLYNQ

MSL – FPGA Blockset

DaVinci Processor:
- similar to Xilinx’s EDK Processor block
- automatically creates VLYNQ bus logic to all shared regs/fifos/mems
- creates memory map

I2C Controller:
- PicoBlaze-based I2C Controller
- Command Port via request/response FIFOs

Implementing DSP to FPGA VLYNQ Interface

[Figure: FPGA design with automatically created VLYNQ bus logic]

Memories used in the co-processor are associated with the DaVinci processor through the block’s GUI in System Generator. After an association is made, System Generator automatically generates an interface that marshals data to and from the processor over VLYNQ. On the DaVinci side, Target for C6000 handles automatic code generation.
Having the control and processor in the same development environment removes all the grunt work of manually maintaining the API, including memory-maps, function headers and C-code device drivers in Code Composer Studio.

Installation Package

BSL – Board Support Libraries
MSL – Model Support Libraries
LED Demo

LED Demo – DIP Implementations
- for simulation only
- for DSP build

LED Demo – LED Implementations
- for simulation only
- for DSP build

LED Demo – Simulation only

LED Demo – DSP only

Serial RapidIO™ Enables Increased Bandwidth (TI TMS320C6455, C6474, etc.)

Serial RapidIO is a high-performance, packet-switched interconnect technology that addresses the embedded industry's need for:
- Reliability
- Increased Bandwidth
- Faster Bus Speeds

Serial RapidIO allows chip-to-chip and board-to-board communications at performance levels scaling to ten Gigabits per second and beyond.

• C6455 Serial RapidIO Support
– IEEE 1149.6 Compliant
– 1.25, 2.5, 3.125 GBit/sec per link
    Up to four 1x links (each 1x link is bidirectional)
    --OR--
    Up to one 4x link (bi-directional pipe), which provides up to 12.5 GBit/sec
– Resulting range 10 – 25 GBits/sec total (1.25 – 3.125 GBytes/sec)
– Supports DSP-to-DSP on the same board, DSP-to-Switch, DSP-to-FPGA, etc.

• Benefits
– 1x Link is fast enough to send HD 1080i raw video between devices
– 4x Link is easily fast enough to send HD 1080p raw video between devices
– Reduction in chip count, board area and system cost

TI customers asked for faster IO performance. TI listened. TI is bus agnostic.
So, let’s first explain why TI chose Serial RapidIO for the C6455:
- High performance for HD video and telecom channel density
- Worldwide standard, multiple applications, broad OEM adoption
- Flexible / scalable rates and widths (1x or 4x)
- Low pin count and low power per link

TI was part of the consortium that defined the standard with other industry leaders.

The theoretical payload bandwidth is up to 25 Gbits/sec, but, as with any communications protocol, there is some overhead (addresses, acknowledgement, error correction). In reality the figure may be ~19 or 20 Gbits/sec.

From a video infrastructure applications perspective, the 1x link is fast enough to send HD 1080i raw video between devices, and the 4x link can easily send HD 1080p raw video between devices.

The use of SRIO in infrastructure applications with large “DSP farms” may allow our OEMs to reduce FPGA cost (quantity, pin count, size and/or cost).
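The bandwidth claims above can be checked with a short calculation. The sketch below compares raw video bit rates against Serial RapidIO link rates. Its video parameters are assumptions not stated on the slide: 1920x1080 frames, 30 frames/sec for 1080i, 60 frames/sec for 1080p, and 16 bits per pixel (YCbCr 4:2:2). It also assumes 8b/10b line coding, which carries 8 data bits in every 10 line bits. The function names are illustrative only.

```c
#include <stdint.h>

/* Raw (uncompressed) video bit rate in bits per second. */
static uint64_t video_bps(uint32_t width, uint32_t height,
                          uint32_t frames_per_sec, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * frames_per_sec * bits_per_pixel;
}

/* Line rate of a link in bits per second, one direction:
   number of lanes times the per-lane rate given in Mbit/sec. */
static uint64_t srio_line_bps(uint32_t lanes, uint32_t lane_mbit_per_sec)
{
    return (uint64_t)lanes * lane_mbit_per_sec * 1000000ull;
}

/* Data rate left after 8b/10b coding (assumed): 8 of every 10 line bits. */
static uint64_t srio_data_bps(uint64_t line_bps)
{
    return line_bps / 10u * 8u;
}
```

Under these assumptions, 1080i raw video needs about 0.995 Gbit/sec and 1080p about 1.99 Gbit/sec. A 1x link at 3.125 GBit/sec carries 2.5 Gbit/sec of data in each direction, and a 4x link carries 10 Gbit/sec in each direction. Counting both directions, the 4x link gives a 25 Gbit/sec line rate and 20 Gbit/sec after coding, which matches the "25 Gbits/sec theoretical, ~19 or 20 Gbits/sec in reality" figures in the notes.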