Avnet Speedway Design Workshop™

Avnet SpeedWay Workshops
Accelerating Your Success™
Creating FPGA-based Co-Processors for DSPs Using Model Based Design Techniques
Lecture 5: Creating a Stand-alone Video System
V10_1_2_0
Model-Based Design Flow
Develop Executable Spec in Simulink
Design Exploration for Targeting Hardware
Partition Between DSP and FPGA Co-Processor
Verify Hardware in HW Co-simulation
Implement Stand-Alone Video System
The final design phase, after verification in simulation, is implementation as a stand-alone
system comprising a DSP and an FPGA co-processor.
The Problem We Wish to Solve
Maintaining a complex system involving DSP and FPGA
co-processor can be tedious and error-prone.
MathWorks model-based design bridges TI DSP and
Xilinx FPGA design flows with automatic code
generation to remove the grunt work of manually
maintaining the API, including memory-maps, function
headers and C-code device drivers in Code Composer
Studio.
Final FPGA co-processor system offers better
performance.
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video
EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
Design Flow for Stand-Alone Implementation
[Figure: design flow for stand-alone implementation on the Avnet Spartan3A-DSP FPGA / DaVinci Platform. Blocks: MATLAB® and Simulink® (Algorithm and System Design); Real-Time Workshop Embedded Coder, Targets, Links; Xilinx System Generator for DSP; Generate (C/ASM, HDL); Verify; Link for CCS; Hardware Co-simulation; Code Composer (DSP); ISE (FPGA); Chipscope; Video source; LCD Panel. Vendor labels: TI, Xilinx, MathWorks, Avnet.]
< mouse click >
We begin by examining the connectivity for data transfer between the DM6437 and
FPGA co-processor.
< mouse click >
We continue with automatic code generation of executables for both DSP and FPGA,
including the Avnet board support package for Simulink on Avnet Spartan-3A DSP
DaVinci development Kit.
< mouse click >
We conclude with in-system verification techniques for the combined DSP and FPGA
co-processor system.
Note that video now flows into the system from a live source, unlike verification using
hardware co-simulation, where video frames were generated by a Simulink testbench.
Model Partition DSP / FPGA
[Figure: video stabilization model in Simulink (Algorithm and System Design). Labels: Image Translate; Motion estimation; Location estimation; updated template; updated ROI; relative motion vector from frame to frame; Sum-of-Absolute Differences (SAD); partition markers 1 and 2.]
Recall the steps of labs 3 and 4, where a Simulink model was partitioned between DSP
and FPGA …
Moving to a stand-alone implementation, we must now bridge the FPGA co-processor
hardware to the DSP software.
Bridging Software to Hardware
[Figure: DSP core and co-processor, with the connection between them marked "?".]
• Requires hardware interface and communication protocol
• Managing asynchronous clock domains
• Software API to communicate with hardware
Bridging software (DSP) to hardware (FPGA Co-Processor) requires:
•hardware interface and communication protocol
•managing asynchronous clock domains
•software API to communicate with hardware
Mouse click …
How can this be implemented ?
Let's examine these aspects in detail, especially as they relate to exchanging streaming
data such as video between the FPGA co-processor and the DSP.
Bridging Software to Hardware / EMIF
[Figure: DSP core connected through EMIF to the co-processor, with data and control sharing the bus.]
Data and control on common bus (EMIF)
– Obliges burst transfer over time-shared bus
– Inefficient for streaming data (ex. video)
– Requires inserted syncs, framing in DSP software, handshaking
Bridging software on the DSP-side to the hardware co-processor requires first and
foremost a hardware interface and communication protocol. One possibility is EMIF,
‘External Memory Interface’, which groups address, data and control signals for interface
to external devices. EMIF comes in a variety of sizes across different families of DaVinci,
from synchronous 32-bit data on DM642 to asynchronous 8-bit data on the DM6437.
It is convenient to differentiate between control data and streaming data. Control data is
often bursty in nature and not time-critical, while streaming data is continuous and
requires a fixed bandwidth. Exchanging streaming data such as video between the DSP and
the FPGA co-processor over a shared bus such as EMIF requires time-multiplexed burst
transactions so that other devices can also access the bus. Control data must then be
inserted between the streaming data bursts. Furthermore,
exchanging video over a bus such as EMIF would necessitate inserted syncs, framing in
software on the DSP, and asynchronous FIFOs in the FPGA. For these
reasons, EMIF is not the best choice of interface between the DSP and the FPGA co-processor.
Bridging Software to Hardware / VLYNQ
[Figure: DSP core and co-processor with separate links: video over VPBE (VPBE interface), control over VLYNQ (VLYNQ LogiCORE), and video over VPFE (VPFE interface).]
Separate data and control
– Streaming full-duplex video over dedicated Video Processing Subsystem of
DM6437
– Control over VLYNQ
– Simple, fast, efficient
A simple and efficient approach is to transport streaming data over dedicated ports of
the Video Processing Subsystem, while control data flows through a separate, non-time
critical link. This offers simple, fast uninterrupted bi-directional streaming video between
DSP and FPGA co-processor.
Let's examine the resources on DM6437 to implement separate video and control
interfaces to the FPGA co-processor.
Why does video flow through the FPGA and not directly to the DSP?
… because the board is built to pipe video through the FPGA to and from the DSP.
Note: Although not officially supported, TI has done some work to allow general-purpose
data, not just video, to flow into the VPFE and out of the VPBE ports.
Contact Bernie Thompson at TI.
VLYNQ
[Figure: DSP core and co-processor linked by VLYNQ control (VLYNQ LogiCORE) and VPFE video (VPFE interface).]
• Xilinx and TI collaborating for seamless interconnection between DSPs and FPGAs
• Low pin count, low cost, scalable bandwidth
• DaVinci has on-chip VLYNQ peripheral
• Xilinx VLYNQ LogiCORE™ IP delivered through Xilinx CORE Generator
VLYNQ is a serial (i.e. low pin count) communications interface that enables the
extension of an internal bus segment to one or more external physical devices (ex.
FPGA). VLYNQ accomplishes this function by serializing bus transactions in one device,
transferring the serialized transaction between devices via a VLYNQ port, and deserializing the transaction in the external device.
VLYNQ peripheral is offered in DaVinci (DM644x and DM643x devices), Jacinto,
Avalanche, Puma, Sangam, Titan, APEX and other TI communication processors.
Xilinx has licensed VLYNQ, so it is a great opportunity to connect FPGAs to TI DSPs, in
addition to EMIF (External memory I/F) and Serial Rapid IO (SRIO).
VLYNQ
High-Speed, low pin-count, full duplex, peer-to-peer Serial I/F
Extension of an internal bus segment to one or more external devices
Point-to-point serial interface for other VLYNQ compatible devices
External devices are mapped to local physical address space and appear
as if they are on the internal bus of the local device
[Figure: VLYNQ packet fields: CMD 1 (10 bits), CMD 2 (10 bits), Packet Type (4 bits), Address Mask (4 bits), Byte Count (10 bits), Address (<4*10 bits), Data (N*10 bits), End of Packet EOP (10 bits). Signals between the VLYNQ device and the Xilinx FPGA: TRANSMIT, RECEIVE, CLOCK, CLK REQ (optional).]
• Scalable to meet bandwidth requirements (3-pin to 10-pin)
• Memory mapped, master & slave on a single bus
• Software transparent for future device integration
• Single ended, unidirectional I/O
• 8b/10b encoding, in-band signaling
The Avnet Spartan-3A DSP DaVinci board uses all 4 data (transmit / receive) pairs. Individual
pins can be used as GPIO if a lower-bandwidth VLYNQ interface is sufficient or if VLYNQ is not used.
VLYNQ Performance
• 8b/10b coding causes 20% overhead: only 8 bits of data are
contained in every 10 bits sent
• Total overhead = protocol overhead + 8b/10b overhead
• Theoretical maximum throughput = 4 data lines x 100 MHz
max clock = 400 Mbits/sec (50 Mbytes/sec)
Maximum Effective Throughput - With 99Mhz Clock (100 Mhz max clock supported)
All benchmarks using 4 VLYNQ transmit/receive pairs.
Burst Size (32-bit words)   Throughput (Mbits/sec)   Throughput (Mbytes/sec)
1                           126.72                   15.84
4                           220.37                   27.55
8                           259.93                   32.49
16                          285.56                   35.7
The max write rate is the maximum data rate available on the serial interface for
transmission, taking the 8b/10b encoding overhead into account. It is calculated
as follows:
Max write rate = VLYNQ serial clock (MHz) x number of pins x 8b/10b encoding factor
The 8b/10b encoding costs 20% of the raw bit rate, so the encoding factor is
0.8. For example, if the VLYNQ clock is running at 99 MHz on an interface with 4 pins per
direction, the raw data rate is 99 x 4 = 396 Mbps. After the 8b/10b encoding is removed, the
maximum write rate is 396 x 0.8 = 316.8 Mbps.
The total throughput on the VLYNQ interface includes both transmit and receive
directions. In the above configuration, a remote device can write to
the local device at the same data rate, so the total throughput is the sum of the transmit
and receive rates, or 633.6 Mbps. In addition to the 8b/10b encoding, the packet
structure for read/write operations adds further overhead. The VLYNQ
module can transfer single 32-bit words or bursts of up to sixteen 32-bit words.
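The rate calculation above can be written as a small C helper. This is only a sketch of the arithmetic in these notes; the function and macro names are hypothetical and are not part of any TI or Avnet API.

```c
/* Fraction of serial bits that carry payload under 8b/10b coding
 * (8 data bits in every 10 bits sent). */
#define VLYNQ_8B10B_FACTOR 0.8

/* Maximum one-direction write rate in Mbps:
 * serial clock (MHz) x pins per direction x 8b/10b factor.
 * Hypothetical helper name, for illustration only. */
double vlynq_max_write_rate_mbps(double clock_mhz, int pins_per_direction)
{
    return clock_mhz * pins_per_direction * VLYNQ_8B10B_FACTOR;
}

/* Total throughput when transmit and receive run at the same rate. */
double vlynq_total_rate_mbps(double clock_mhz, int pins_per_direction)
{
    return 2.0 * vlynq_max_write_rate_mbps(clock_mhz, pins_per_direction);
}
```

Calling vlynq_max_write_rate_mbps(99.0, 4) reproduces the 316.8 Mbps figure, and vlynq_total_rate_mbps(99.0, 4) the 633.6 Mbps total, before any packet overhead is taken into account.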
The data and throughput calculations shown here are sample calculations for ideal
conditions. In general, the data rates depend on a variety of other factors, such as the
efficiency of read/write burst transactions, the ability to buffer read/write data and
shift it out serially without stalling subsequent read/write bursts, and the
remote and local components, both external and internal (device operations, board
considerations, etc.).
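To see how much of the 316.8 Mbps maximum write rate each burst size actually delivers, the benchmark figures from the slide can be divided by that maximum. The sketch below does only that division; the type and function names are hypothetical, and the numbers are copied from the benchmark table (99 MHz clock, 4 transmit/receive pairs).

```c
#include <stddef.h>

/* One row of the benchmark table on the slide. */
typedef struct {
    int    burst_words;    /* burst size in 32-bit words */
    double mbits_per_sec;  /* quoted effective throughput */
} vlynq_benchmark_t;

static const vlynq_benchmark_t vlynq_benchmarks[] = {
    { 1, 126.72 }, { 4, 220.37 }, { 8, 259.93 }, { 16, 285.56 },
};

/* Maximum write rate from the notes: 99 MHz x 4 pins x 0.8. */
#define VLYNQ_MAX_WRITE_RATE_MBPS 316.8

/* Returns quoted throughput divided by the maximum write rate,
 * or -1.0 if the burst size is not in the table. */
double vlynq_burst_efficiency(int burst_words)
{
    size_t n = sizeof vlynq_benchmarks / sizeof vlynq_benchmarks[0];
    for (size_t i = 0; i < n; i++) {
        if (vlynq_benchmarks[i].burst_words == burst_words) {
            return vlynq_benchmarks[i].mbits_per_sec / VLYNQ_MAX_WRITE_RATE_MBPS;
        }
    }
    return -1.0;
}
```

With these figures a single-word transfer reaches about 40% of the maximum write rate, while a 16-word burst reaches about 90%, which is why longer bursts are preferable for moving blocks of data over VLYNQ.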
References:
TMS320DM643x DMP VLYNQ Port User's Guide / TI Literature: spru938b.pdf (Appendix B)
VLYNQ Remote Memory Mapping
[Figure: VLYNQ remote memory mapping. Blocks: DSP Core, Video Processing Subsystem, Connectivity, VLYNQ, VLYNQ address decode, Peripherals A to D, Map Regions 1 to 4. Address labels: 0000:0000, 03FF:FFFF, 0400:0000, 0400:00FF, 07FF:FFFF, 0500:0000, 0500:FFFF, 0800:0000, 0800:00FF, 0800:0100, 0801:00FF, 0801:0100, 0841:00FF, 0B00:0000, 0B3F:FFFF.]
Remote VLYNQ devices memory mapped to the local (DSP host) device’s address space
Finer memory-decoding can target smaller address ranges within the FPGA co-processor
Remote VLYNQ device(s) are memory mapped to the local (host) device’s
address space when a link is established (and appear as if they are on the
internal bus, similar to any other on-chip peripherals). Enumerating the
VLYNQ devices (single or multiple) into a coherent memory map for accessing
each device is part of the initialization sequence.
After the enumeration, the host (local) device can access the remote device
address map using local device addresses. The VLYNQ module in the host
device manages the address translation of the local address to the remote
address. A remote VLYNQ device is mapped to the local device’s address via
the address map registers (TX address map, RX address map size n, RX
address map offset n, where n = 1 to 4). The transmit side has a contiguous
map; the size of the map is the same as the remote device map.
The figure illustrates this mapping.
This capability makes VLYNQ ideal for memory-mapping FPGA-based
peripherals. For clarity, only 4 peripherals are shown above; finer memory-decoding
can target any number of smaller address ranges to communicate
with registers within the FPGA co-processor. The Avnet VLYNQ block allows
memory-mapped address spaces down to single-register level using System
Generator shared memory registers.
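The address translation described above can be sketched in C. This is a simplified model under stated assumptions, not the VLYNQ hardware or driver implementation: the local side subtracts the TX address map base, and the remote side is assumed to walk its RX map regions in order, using each region's size to find the match and adding that region's offset. All names are hypothetical, and the example values are illustrative only, loosely based on address labels in the figure.

```c
#include <stdint.h>
#include <stddef.h>

/* One RX address map entry (size n, offset n). Hypothetical type. */
typedef struct {
    uint32_t size;    /* RX address map size, in bytes */
    uint32_t offset;  /* RX address map offset (remote base address) */
} vlynq_rx_region_t;

/* Illustrative region table; values are assumptions for this sketch. */
static const vlynq_rx_region_t example_regions[3] = {
    { 0x00000100u, 0x04000000u },
    { 0x00010000u, 0x05000000u },
    { 0x00400000u, 0x0B000000u },
};

/* Local (TX) side: strip the TX address map base from the local address. */
uint32_t vlynq_tx_offset(uint32_t local_addr, uint32_t tx_map_base)
{
    return local_addr - tx_map_base;
}

/* Remote (RX) side: regions are contiguous in the packet address space.
 * Returns 0 and writes the remote address on success, -1 if the packet
 * address falls outside every region. */
int vlynq_rx_translate(uint32_t pkt_addr, const vlynq_rx_region_t *regions,
                       size_t region_count, uint32_t *remote_addr)
{
    uint32_t region_start = 0;
    for (size_t i = 0; i < region_count; i++) {
        if (pkt_addr - region_start < regions[i].size) {
            *remote_addr = regions[i].offset + (pkt_addr - region_start);
            return 0;
        }
        region_start += regions[i].size;
    }
    return -1;
}
```

With a TX map base of 0x08000000 (an assumed value), local address 0x08000054 falls in the first region and translates to 0x04000054, while 0x08000100 is the first byte of the second region and translates to 0x05000000.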
Reference:
In the local device, the address of the VLYNQ remote memory map in the local
configuration space is the transmit address accessing remote devices over the serial
interface. The address of the VLYNQ remote memory map is programmed in the TX
address map register (XAM). When the local device transmits, first it strips off the
transmit address offset in the local device memory map. Then the local device sends the
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video
EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
Integration of 3 Pieces of Avnet Hardware
+
+
Spartan-3A DSP DaVinci
Evaluation Kit
PS Video EXP
6.5” NEC LCD panel
6.5” NEC panel is targeted for $500 resale, but we do not have an established price yet.
Avnet Spartan-3A DSP DaVinci Evaluation Kit
[Figure: block diagram of the kit, centered on the Spartan 3A-DSP 3SD1800A FPGA and the DaVinci DM6437. Links between the two devices: 8-bit EMIF, VPBE, VPFE, VLYNQ. Other block labels: LEDs, Parallel Flash, SPI Flash, DDR2, USB, RS232, 10/100/1G PHY, 10/100 PHY, McBSP1, I2C, Audio CODEC, Image Sensor Interface, Component Video Out, Clocks, Switches, EXP, JTAG.]
The Avnet Spartan-DSP DaVinci Evaluation Platform combines on the same baseboard the new Xilinx Spartan 3A-DSP FPGA
and TI DaVinci TMS320DM6437 Digital Media Processor, optimized for video applications such as surveillance, automotive,
machine vision.
DM6437 connects to Spartan3A-DSP over several interfaces : VLYNQ, EMIFA, VPBE, VPFE.
Features:
• Xilinx 3SD1800A-FG676 FPGA
• Programmable LVDS Clock Generator
• On-board 27 MHz LVTTL Oscillator
• On-board LVTTL Oscillator Socket
• 16M x 32-bit DDR2 SDRAM
• 256K x 36bit ZBT SRAM
• EXP Expansion Slot
• 10/100 PHY
• 64Mb x 2 SPI Configuration Flash
• JTAG Programming/Configuration Port
• RS232 Port
• Two User LEDs
• A 4-position User DIP Switch
• Three User Push Button Switches
• Audio CODEC shared with DM6437
TI DaVinci DSP Processor
• TMS320DM6437 Digital Media Processor
• 128 MB 166 MHz DDR2 SDRAM
• 64 Mb serial SPI Flash program code storage
• 10/100 PHY
• VGA Out
• Audio CODEC shared with FPGA
Avnet Spartan-3A DSP DaVinci Evaluation Kit
FPGA
• Xilinx XC3SD1800A-4FG676C FPGA
• Clocks
– Programmable LVDS clock generator
– On-board 27 MHz LVTTL oscillator
– On-board LVTTL oscillator socket
• Memory
– 128M x 32-bit DDR2 SDRAM
– 16M x 8 parallel / BPI configuration Flash
– 64Mb SPI configuration/storage Flash
• Interfaces
– 10/100/1000 PHY
– JTAG programming/configuration port
– RS-232 serial port
– Image Sensor Interface
– 2 EXP expansion connectors
• Buttons and switches
– 4 LEDs
– Eight 4-position DIP switch
– 4 push-button switches
DSP
• TI TMS320DM6437 DaVinci Processor
• Memory
– 128 MB 166 MHz DDR2 SDRAM
– 128 Mb parallel Flash program code storage
– 64 Mb serial SPI Flash program code storage
• Interfaces
– 10/100 Ethernet Port
– Component and composite video out
– Audio CODEC shared with FPGA
– USB
• Buttons and switches
– 4 User LEDs
Avnet Pro-Sumer Video EXP Module
• High-Definition Video Decoder – Texas Instruments TVP7001 (RGB, Component)
• Standard-Definition Video Decoder – Texas Instruments TVP5150 (Composite, S-Video)
• DVI Transmitter – TFP410
• DVI Receiver – AD9887A
• Analog Devices ADV7123 RGB DAC
• Parallel RGB and LVDS interfaces to Flat Panel Displays
• Stereo Audio CODECs
The Avnet EXP ProSumer Video (EXP PS Video) Module is a plug-in module designed to
interface with compatible Avnet baseboards, including the Avnet Spartan-DSP DaVinci
Evaluation Platform. The EXP PS Video Module provides a number of video and audio
interfaces to its host via two EXP connectors.
NEC TFT Display
• NEC XGA LCD flat panel display NL10276BC13-01C
• Super-Transmissive Natural Light TFT
• 1024 x 768 Resolution
• 6.5 inches Diagonal
• 16.77M colors
• LVDS Interface
• LED Backlight
LVDS Flat Panel Controller
[Figure: LVDS flat panel controller on the Avnet Spartan3A-DSP / DaVinci Evaluation Kit. Labels: VPBE; 2X Scaler; RGB 24-bits; Flat Panel Controller; Flat Panel Display, 1024 x 768 XGA. Rates: 62.5 MPixels / sec; 62.5 x 7 = 437.5 Mbps.]
Avnet provides a controller for LVDS flat panel displays. It is provided at no extra cost to
customers who purchase the PS Video EXP module.
RGB + syncs digital video arrives at the flat panel controller at 62.5 MPixels / sec.
The outputs of the LVDS flat panel controller comprise 5 LVDS transmit pairs:
• a forwarded clock at 1/7th the bit rate with 4:3 duty cycle, on the LCD_FTXC
pair
• 4 data lines LCD_FTX[3:0], each of which carries a 7:1 serialized bit stream.
These 5 LVDS transmit pairs originate from the baseboard FPGA and are routed up through
the EXP connector to J6 of the Avnet EXP PS Video module.
J6 is a JAE FI-X30S-HF connector that accepts a cable assembly to drive an NEC 6.5” XGA
TFT-LCD module.
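The LVDS rates quoted on the slide follow from the pixel rate and the 7:1 serialization. The sketch below only restates that arithmetic; the macro and function names are hypothetical.

```c
/* Each LVDS data line carries a 7:1 serialized stream. */
#define LVDS_SERIALIZATION_FACTOR 7
/* Number of data lines, LCD_FTX[3:0]. */
#define LVDS_DATA_LINES 4

/* Bit rate of one data line in Mbps for a given pixel rate in MPixels/sec. */
double lvds_line_rate_mbps(double pixel_rate_mhz)
{
    return pixel_rate_mhz * LVDS_SERIALIZATION_FACTOR;
}

/* Bits transferred per pixel clock across all data lines. */
int lvds_bits_per_pixel_clock(void)
{
    return LVDS_DATA_LINES * LVDS_SERIALIZATION_FACTOR;
}

/* Combined bit rate of all data lines in Mbps. */
double lvds_aggregate_rate_mbps(double pixel_rate_mhz)
{
    return lvds_line_rate_mbps(pixel_rate_mhz) * LVDS_DATA_LINES;
}
```

At 62.5 MPixels/sec each data line runs at 437.5 Mbps, and the four lines together carry 28 bits per pixel clock, which leaves room for the 24-bit RGB value plus sync signals.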
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video
EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
Avnet Board Support Package for Simulink
DM6437
Here is an overview of the Avnet board support package for Simulink for Spartan-3A DSP
DaVinci Development kit. It is subdivided into 3 blocksets.
On the left are Simulink blocks that map to physical peripheral devices within the
DM6437, such as UART, CAN and the Video-Processing subsystem.
On the right are blocks that are implemented in the Spartan3A-DSP. These blocks are
used in the System Generator portion of the Simulink model.
At the bottom are Simulink blocks that map to the DSP, but which communicate with
FPGA functions, or physical board-level circuitry via the FPGA, such as LEDs.
The Avnet board support package for Simulink is the result of collaborative work
between Avnet and The MathWorks.
Avnet Board Support Library for Simulink
• Library of Simulink blocks supporting features of DM6437 on Avnet
Spartan-3A DSP DaVinci Evaluation Kit
• Exposes parameters of each peripheral
• Generates API to DSP/BIOS drivers
Overview of Simulink blocks in BSP to support DM6437. Note the extensive list of
parameters offered for each peripheral.
Avnet Board Support Package / VPSS
[Figure: DSP core and co-processor linked by VPBE video (VPBE interface), VLYNQ control (VLYNQ LogiCORE) and VPFE video (VPFE interface).]
• VPSS blocks used by automatic
code-generation to call DSP/BIOS
driver APIs
How is the VPSS connectivity accomplished ?
This is accomplished with Avnet BSP for Simulink, developed in collaboration with The
MathWorks. For code generation, the VPFE and VPBE blocks are used by RTW
Embedded Coder to call the DSP/BIOS driver API.
Avnet Board Support Library / VLYNQ
[Figure: DSP core and co-processor linked by VLYNQ control (VLYNQ LogiCORE) and VPFE video (VPFE interface).]
• VLYNQ block used by automatic code-generation to call VLYNQ
DSP/BIOS driver API
How is the VLYNQ connectivity accomplished on the DSP side ?
This is accomplished with Avnet BSP for Simulink, developed in collaboration with The
MathWorks. For code generation, the VLYNQ block is used by RTW Embedded Coder to
call the DSP/BIOS driver API.
(Recall directory structure of Avnet BSP from lecture 4)
Passing FPGA Memory Map via MATLAB
[Figure: passing the FPGA memory map via MATLAB on the Avnet Spartan3A-DSP FPGA / DaVinci Platform. The design-flow blocks (MATLAB® and Simulink®; Real-Time Workshop Embedded Coder, Targets, Links; Xilinx System Generator for DSP; Link for CCS; Code Composer; ISE) are annotated with "Export memory map via MATLAB". Memory-mapped IO in the DSP (DaVinci processor) connects over VLYNQ to REG, FIFO and RAM elements in the FPGA co-processor. Address labels: 0400:0000, 04000001, 0800:0000, 0800:00FF, 0800:0100, 0801:00FF.]
Shared memories in the System Generator model destined for the FPGA co-processor are
associated with the DM6437 through the ‘DaVinci Processor’ VLYNQ Interface block’s
GUI in System Generator. After an association is made, System Generator automatically
generates a memory map of all shared memory in the model.
<mouse click>
During code generation, the memory map is exported to Code Composer Studio via the
MATLAB workspace to create memory-mapped IO in the DM6437 that communicates with the
corresponding registers, FIFOs and RAM elements in the FPGA co-processor over VLYNQ.
<mouse click>
On the FPGA side, System Generator project integration with ISE carries memory
mapping information to the VLYNQ IP in ISE, where the final bitstream is created.
<mouse click>
The result is an association between memory-mapped IO space in the DM6437 and
registers, FIFOs and RAM memory elements in the FPGA co-processor, which appear to
the DM6437 as local memory space through VLYNQ.
Push-button automatic code generation removes all the grunt work of manually
maintaining the API, including memory-maps, function headers and C-code device
drivers in Code Composer Studio.
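To make the idea of memory-mapped IO concrete, here is a minimal hand-written sketch of what accessing a shared register through an exported memory map could look like in C. It is not the auto-generated code: the table layout, register names, offsets and function names are all hypothetical. The `window` pointer stands for the base of the VLYNQ remote address window on the DSP; in the usage example it is replaced by an ordinary array so the code can run on a host.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One exported memory-map entry: a shared-memory name and its byte
 * offset inside the VLYNQ window. Hypothetical layout. */
typedef struct {
    const char *name;
    uint32_t    byte_offset;
} shared_mem_entry_t;

/* Illustrative map; names and offsets are assumptions for this sketch. */
static const shared_mem_entry_t example_memory_map[] = {
    { "template_reg",   0x00u },
    { "roi_reg",        0x04u },
    { "best_match_reg", 0x08u },
};

/* Look up a shared memory by name. Returns 0 on success, -1 if absent. */
int shared_mem_lookup(const char *name, uint32_t *byte_offset)
{
    size_t n = sizeof example_memory_map / sizeof example_memory_map[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(example_memory_map[i].name, name) == 0) {
            *byte_offset = example_memory_map[i].byte_offset;
            return 0;
        }
    }
    return -1;
}

/* 32-bit accesses through a volatile pointer, so the compiler does not
 * cache or reorder reads and writes to hardware registers. */
void shared_mem_write32(volatile uint32_t *window, uint32_t byte_offset, uint32_t value)
{
    window[byte_offset / sizeof(uint32_t)] = value;
}

uint32_t shared_mem_read32(volatile uint32_t *window, uint32_t byte_offset)
{
    return window[byte_offset / sizeof(uint32_t)];
}
```

On a host, a plain `uint32_t fake_window[4]` can be passed as `window` to exercise the lookup and accessors. On the target, the pointer would instead be the DSP address at which the VLYNQ remote window is mapped, a value that automatic code generation takes from the exported memory map.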
Implementing DSP to FPGA VLYNQ Interface
DSP design
FPGA design
Memory Map
communicated
via MATLAB
Here we show usage of the DM6437 VLYNQ Interface blocks in Simulink to connect
DM6437 in the top windows to the FPGA co-processor in System Generator in the bottom
windows. Note the memory-mapping for a single shared register passed via the MATLAB
workspace.
TC6 Automatic Code Generation for DM6437
• VLYNQ DSP/BIOS driver API created by automatic code-generation
from Avnet BSP VLYNQ block
Excerpt of auto-generated code from The MathWorks Embedded Coder for TC6 from
VLYNQ block in Avnet board support library for Simulink.
Clock Domains in System Generator
• Various FPGA infrastructure on different clock domains
• Multiple Subsystem Generator allows multiple asynchronous clock
domains in one System Generator model
Multiple clock domains are handled seamlessly by the Avnet board support package
using a powerful feature of System Generator: Multiple Subsystem Generator.
This example shows the VLYNQ interface to the DSP on one clock domain, VPFE for incoming
video on another clock domain, and VPBE for video display on a third clock domain.
Note that the top-level FPGA design is finalized in ISE after project export from System
Generator.
Avnet Board Support Package / Demos
• Suite of demos integrated into board support package
• FPGA-based co-processors using model based design
A comprehensive suite of demos is integrated into the Simulink board support package
for the Avnet Spartan-3A DSP FPGA DaVinci Development Kit. Demos cover these
aspects of creation of FPGA-based co-processors using model based design:
•LCD Demo: generate an image on the LCD panel of the Avnet Spartan-3A DSP
FPGA DaVinci Development Kit
•Resizer demo: demonstrates two methods for resizing an image
•NTSC to LCD passthrough: demonstrates how to implement a NTSC to LCD
passthrough
•SVGA to LCD passthrough: demonstrates how to implement a SVGA to LCD
passthrough
•Video surveillance recording: demonstrates a video surveillance recording
application with motion-detection algorithm on the DM6437 DSP
•LED Demo: Using a very simple example, a model-based design is gradually
targeted to DSP and FPGA hardware.
Avnet Design Resource Center
• Download Board Support Package for Simulink from DRC
Stand-Alone Video Stabilization System
[Figure: stand-alone video stabilization system on the Avnet Xilinx Spartan3A-DSP DaVinci Evaluation Platform. Labels: NTSC video source; Scaler; VPFE interface; VPFE; SAD; VLYNQ LogiCORE; VLYNQ (template, ROI; best-match row, column); DDR2; Image Translate; VPBE; VPBE interface; 2X Scaler; RGB 24-bits; Flat Panel Controller; XGA flat panel, 1024 x 768, 60 Hz.]
Block diagram of stand-alone video stabilization system that will be built in
lab 5. The architecture of the Avnet Spartan-3A-DSP DaVinci board routes
video data through the FPGA towards the DM6437 over the dedicated
VPFE video port.
Template and ROI data are sent to the FPGA at each frame for SAD search of
template in region of interest (ROI). Best-match result of SAD is sent back
to DM6437 over VLYNQ.
Motion vector is used as offset for image translation to stabilize the video
from frame-to-frame. Video output is sent over VPBE to FPGA for display
on XGA flat panel.
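As a reminder of what the SAD search computes, here is a plain C reference model. In the lab the search runs in the FPGA co-processor; this software version is only a sketch for illustration, with hypothetical names, 8-bit grayscale pixels assumed, and row-major image buffers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Best-match position inside the ROI and its SAD score. Hypothetical type. */
typedef struct {
    int      row;
    int      col;
    uint32_t sad;
} sad_match_t;

/* Exhaustive Sum-of-Absolute-Differences search of a template inside a
 * region of interest. Returns the first position with the lowest SAD.
 * If the template does not fit in the ROI, row and col stay at -1. */
sad_match_t sad_search(const uint8_t *roi, int roi_rows, int roi_cols,
                       const uint8_t *tmpl, int tmpl_rows, int tmpl_cols)
{
    sad_match_t best = { -1, -1, UINT32_MAX };

    for (int r = 0; r + tmpl_rows <= roi_rows; r++) {
        for (int c = 0; c + tmpl_cols <= roi_cols; c++) {
            uint32_t sad = 0;
            for (int i = 0; i < tmpl_rows; i++) {
                for (int j = 0; j < tmpl_cols; j++) {
                    int diff = (int)roi[(r + i) * roi_cols + (c + j)]
                             - (int)tmpl[i * tmpl_cols + j];
                    sad += (uint32_t)abs(diff);
                }
            }
            if (sad < best.sad) {
                best.row = r;
                best.col = c;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The returned row and column play the role of the best-match result that the FPGA sends back over VLYNQ; the difference between that position and the previous one gives the motion vector used for image translation.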
Integrating the DSP and FPGA Co-processor
[Figure: design flow for integrating the DSP and FPGA co-processor on the Avnet Spartan3A-DSP FPGA / DaVinci Platform. Blocks: MATLAB® and Simulink® (Algorithm and System Design); Real-Time Workshop Embedded Coder, Targets, Links; Xilinx System Generator for DSP; Generate (C/ASM, HDL); Verify; Link for CCS; Hardware Co-simulation; Code Composer (DSP); ISE (FPGA); VLYNQ between DSP and FPGA; Chipscope; Video source; LCD Panel. Vendor labels: TI, Xilinx, MathWorks, Avnet.]
Preview of lab 5:
< mouse click >
1. Implement connectivity in System Generator for data transfer between the DM6437
and FPGA co-processor over VLYNQ.
< mouse click >
2. Continue with automatic code generation of executables for both DSP and FPGA,
including the Avnet board support package for Simulink on Avnet Spartan-3A DSP
DaVinci development Kit.
< mouse click >
3. Conclude with in-system verification techniques for the combined DSP and FPGA
co-processor system.
Hardware co-simulation was used for functional verification in lab 4. It is not used for
stand-alone implementation, and is shown here as reference only.
Note that video now flows into the system from a live source, contrary to video frames
generated by Simulink for hardware co-simulation.
Summary
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video
EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
… proceed to lab 5 Integrating the DSP and FPGA Coprocessor
Reference Slides
VLYNQ Data Flow
[Figure: VLYNQ data flow. Labels: DSP Core; Video Processing Subsystem; Local VLYNQ; Remote VLYNQ; Custom Interface; Co-Processor.]
VLYNQ block diagram.
The previous slide showed memory mapping between the local (host) device’s address space and the remote address space. This is accomplished
via the address translation blocks. A remote VLYNQ device is mapped to the local device’s address via the address map registers (TX address map,
RX address map size n, RX address map offset n, where n = 1 to 4). For clarity, the map registers aren’t shown on the block diagram above.
The data flow between two VLYNQ devices is shown here. A write originates from the DM643x slave configuration bus interface and, after address translation, goes to
the outbound command (CMD) FIFO. Data is subsequently read from the FIFO and encapsulated in a write request packet.
The packet is encoded and serialized before being transmitted to the remote VLYNQ in the FPGA.
The remote device subsequently de-serializes and decodes the received data and writes it into the inbound CMD FIFO. A write operation is initiated on
the FPGA VLYNQ OPB (On-Chip Peripheral Bus) master interface after reading the address and data from the FIFO. The 32-bit OPB interface
can connect directly to an embedded processor in the FPGA, or to a custom user interface, as shown.
Finally, address decoding delivers the data to the register(s) of the addressed peripheral.
The Xilinx VLYNQ serial interface is not directly coupled to the OPB interface; there are asynchronous FIFOs between the two interface domains, and
the interfaces operate independently. However, if the OPB fails to generate sufficient commands and data to consume all the VLYNQ interface’s
bandwidth, the VLYNQ interface generates idle packets. If the OPB fails to immediately accept all remotely generated commands and data, the
FIFOs fill and the VLYNQ interface turns flow control on.
Reference:
TMS320DM643x DMP VLYNQ Port User's Guide
Literature Number: SPRU938B
Section 2.5.1
Xilinx VLYNQ v1.3 / Core Generator 10.1
Literature Number: DS324
VLYNQ References
www.xilinx.com/products/ipcenter/DO-DI-VLYNQ.htm
http://focus.ti.com/lit/ug/spru938b/spru938b.pdf
VLYNQ documentation consists of the TMS320DM643x DMP VLYNQ Port User’s Guide from
TI and of the VLYNQ LogiCore datasheet from Xilinx.
VLYNQ DSP/BIOS Driver
vlynq_config.peer_tx_addr = 0;
vlynq_config.local_rtm_cfg_type = no_rtm_cfg;
vlynq_config.peer_rtm_cfg_type = no_rtm_cfg;
vlynq_config.local_tx_fast_path = FALSE;
vlynq_config.peer_tx_fast_path = FALSE;

/* Initialize the VLYNQ control module */
ptr_vlynq = PAL_sysVlynqInitSoc(&vlynq_config);
if (NULL == ptr_vlynq)
{
    VLYNQ_DEBUG("VLYNQ :Failed to initialize the vlynq 0x%08x\n\r",
                vlynq_config.base_addr);
    VLYNQ_DEBUG("VLYNQ :The error msg: %s\n\r", vlynq_config.error_msg);
    goto av_vlynq_init_fail;
}

/* Map memory regions of device for remote/local VLYNQ depending on region ID to be mapped and the size and offset. */
while (init_p_region->id > -1)
{
    if (VLYNQ_APP_SUCCESS != PAL_sysVlynqMapRegion(ptr_vlynq, init_p_region->remote, init_p_region->id,
                                                   init_p_region->offset, init_p_region->size, ptr_vlynq_dev))
On the TI SOC software side, a VLYNQ peripheral is implemented using a set of functions within
the API (application programming interface) provided by the VLYNQ device driver.
Shown above are 2 of the preparatory steps to activate VLYNQ: PAL_sysVlynqInitSoc to
initialize the VLYNQ control module, and PAL_sysVlynqMapRegion to map memory regions of
the device for remote/local VLYNQ depending on the region ID to be mapped and the size and
offset.
Refer to VLYNQ Device Driver architecture for a full description of all functions in the API.
Avnet BSP Installation Package
Avnet Tools:
- avnet_3adsp_dm6437_0_04
AVNET_S3ADSP_DM6437_INSTALL_DIR => C:\avnet_s3adsp_dm6437_0_04
PSP_EVMDM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp
CSLR_DM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp\pspdrivers\soc\dm6437\dsp\inc
DSP drivers (CCS specific)
FPGA logic (ISE specific)
DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
FPGA blockset (SysGen specific)
Modified version of C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
Modified version of C:\dvsdk_1_01_00_15\psp_1_00_02_00
Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the
above directory structure.
Note:
• NDK = Modified-for-Avnet version of the DVSDK Network Developer’s Kit for the TI DM6437 EVM: C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
• PSP = Modified-for-Avnet version of the PSP DSP/BIOS drivers for the TI DM6437 EVM: C:\dvsdk_1_01_00_15\psp_1_00_02_00
Spartan-3A DSP DaVinci Board Support Package
DSP drivers (Code Composer Studio specific)
FPGA logic (ISE specific)
DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
FPGA blockset (System Generator specific)
Network Developer’s Kit (DSP/BIOS)
PSP Drivers for DM6437 (DSP/BIOS)
Ethernet Hardware Co-Simulation support files
Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the
above directory structure. We concentrate here on Ethernet hardware co-simulation
support files. All other components of the BSP will be presented in lecture 5.
Ethernet Hardware Co-Simulation Support Files
• Board appears in the list of targets for Ethernet hardware co-simulation
Avnet provides Ethernet hardware co-simulation support files for the Spartan-3A DSP DaVinci, as well as for several Avnet Virtex-5 evaluation kits. The support files, known as ‘plugins’, are packaged in a standard format for the System Generator plugin installer ‘xlinstallplugin’. Once installed under the directory tree shown here, the board appears in the target list for Ethernet point-to-point hardware co-simulation.
• Installation Package
• BSL – Board Support Libraries
• MSL – Model Support Libraries
• LED Demo
BSL – DSP drivers
bsl\dsp\gel:
- avnet_s3adsp_dm6437.ccs => CCS setup for BlackHawk USB510L
- avnet_s3adsp_dm6437.gel => GEL file for Avnet board
bsl\dsp\src:
bsl\dsp\inc:
- dm6437_init.c/.h => various init/config routines
- fpga_interface.c/.h => FPGA device driver (apply/release reset)
- vlynq_interface.c/.h => VLYNQ device driver
- led_interface.c/.h => LED device driver
- dip_interface.c/.h => DIP Switch device driver
- vpss_interface.h => contains a bunch of useful defines
bsl\dsp\dspbios:
- Platform.tci => ??
BSL – FPGA Logic
bsl\fpga\rtl:
- pattern => XGA pattern generator (color bars + moving logo)
- lcd => LCD flat panel interface
- picoblaze => picoblaze-based I2C controller
- vlynq => VLYNQ interface core
- video => video interfaces (stddef, hidef, vpfe, vpbe)
- debug => ChipScope debug module
- top_level => top level designs
bsl\fpga\chipscope:
- ChipScope Analyzer project for FPGA debug
bsl\fpga\ucf:
- constraints file for FPGA designs
bsl\fpga\ise:
- davinci_coprocessor_stddef => example design for Composite input
- davinci_coprocessor_hidef => example design for VGA input
MSL – DSP Logic
DIP Switch:
- Reads one of SW10[1:4] switches (cannot be used with VPFE/VPBE)
LED:
- Writes to one of D7, D8, D9, D10 LEDs
VLYNQ Read/Write:
- Reads/Writes to FPGA peripherals via VLYNQ
MSL – FPGA Blockset
DaVinci Processor:
- similar to Xilinx’s EDK Processor block
- automatically creates VLYNQ bus logic to all shared regs/fifos/mems
- creates memory map
I2C Controller:
- PicoBlaze-based I2C Controller
- Command Port via request/response FIFOs
Implementing DSP to FPGA VLYNQ Interface
[Figure: FPGA design with automatically created VLYNQ bus logic]
Memories used in the co-processor are associated with the DaVinci processor through the block’s GUI in System Generator.
After an association is made, System Generator automatically generates an interface that marshals data to and from the processor over VLYNQ. On the DaVinci side, Target for C6000 handles automatic code generation. Having the processor and the co-processor in the same development environment removes the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.
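To make the "memory map plus device driver" idea concrete, here is a minimal sketch of the kind of code that would otherwise be maintained by hand. Everything in it is an assumption made for illustration: the register names COPROC_REG_GAIN and COPROC_REG_STATUS, their offsets, the window size, and the accessor functions coproc_write and coproc_read are hypothetical and are not taken from the generated output. On hardware the window would be the VLYNQ remote address range; here it is a plain array so that the sketch runs on a host PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory map of the kind a generated header might contain.
   Names and offsets are illustrative only. */
#define COPROC_REG_GAIN     0x00u  /* shared register written by the DSP */
#define COPROC_REG_STATUS   0x04u  /* shared register read by the DSP    */
#define COPROC_WINDOW_SIZE  0x10u  /* size of the mapped window in bytes */

/* Host-side stand-in for the VLYNQ remote window (assumption: 32-bit
   word access). Static storage starts out zeroed. */
static volatile uint32_t sim_window[COPROC_WINDOW_SIZE / 4u];

/* Write one 32-bit word at a word-aligned byte offset inside the window. */
void coproc_write(uint32_t offset, uint32_t value)
{
    assert(offset % 4u == 0u && offset < COPROC_WINDOW_SIZE);
    sim_window[offset / 4u] = value;
}

/* Read one 32-bit word from a word-aligned byte offset inside the window. */
uint32_t coproc_read(uint32_t offset)
{
    assert(offset % 4u == 0u && offset < COPROC_WINDOW_SIZE);
    return sim_window[offset / 4u];
}
```

With this in place, coproc_write(COPROC_REG_GAIN, 3) followed by coproc_read(COPROC_REG_GAIN) returns 3. Each time a shared register is added or moved in the FPGA design, the offsets in a hand-written header like this one have to change with it, which is the bookkeeping that automatic generation is meant to remove.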
LED Demo – DIP Implementations
[Figure: DIP switch block implementations, one for simulation only and one for the DSP build]
LED Demo – LED Implementations
[Figure: LED block implementations, one for simulation only and one for the DSP build]
LED Demo – Simulation only
LED Demo – DSP only
Serial RapidIO™ Enables Increased Bandwidth
(TI TMS320C6455, C6474, etc.)
Serial RapidIO is a high-performance, packet-switched interconnect technology that addresses the embedded industry's need for:
• Reliability
• Increased Bandwidth
• Faster Bus Speeds
Serial RapidIO allows chip-to-chip and board-to-board communications at performance levels scaling to ten Gigabits per second and beyond.
• C6455 Serial RapidIO Support – IEEE 1149.6 Compliant
– 1.25, 2.5, 3.125 GBit/sec per link
   - Up to four 1x links (each 1x link is bidirectional), OR
   - Up to one 4x link (bidirectional pipe), which provides up to 12.5 GBit/sec
– Resulting range 10 – 25 GBits/sec total (1.25 – 3.125 GBytes/sec)
– Supports DSP-to-DSP on the same board, DSP-to-Switch, DSP-to-FPGA, etc.
• Benefits
– 1x Link is fast enough to send HD 1080i raw video between devices
– 4x Link is easily fast enough to send HD 1080p raw video between devices
– Reduction in chip count, board area and system cost
TI customers asked for faster I/O performance, and TI listened. TI is bus agnostic, so first, here is why TI chose Serial RapidIO for the C6455:
• High performance for HD video and telecom channel density
• Worldwide standard, multiple applications, broad OEM adoption
• Flexible, scalable rates and widths (1x or 4x)
• Low pin count and low power per link
TI was part of the consortium that defined the standard with other industry leaders.
The theoretical payload bandwidth is up to 25 Gbits/sec, but as with any communications protocol there is some overhead (addresses, acknowledgement, error correction), so the achievable rate may be closer to 19 or 20 Gbits/sec.
From a video infrastructure applications perspective, the 1x link is fast enough to send raw HD 1080i video between devices, and the 4x link can easily send raw HD 1080p video between devices. The use of SRIO in infrastructure applications with large “DSP farms” may allow the reduction of FPGA cost (quantity, pin count, and/or size) for our OEMs.
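The bandwidth figures above can be checked with a little arithmetic. The sketch below is a rough estimate under stated assumptions and is not a statement about the C6455 implementation. It assumes 8b/10b line coding on each lane (8 data bits carried per 10 line bits), raw video stored as 16 bits per pixel (for example YCbCr 4:2:2), 1080i counted as 30 full frames per second, and active pixels only with no blanking. The function names srio_raw_mbps, srio_data_mbps and video_mbps are hypothetical.

```c
#include <assert.h>

/* Raw line rate in Mbit/s for one direction: lanes x per-lane rate.
   Only the 1x and 4x widths named on the slide are accepted. */
unsigned long srio_raw_mbps(unsigned lanes, unsigned long lane_mbaud)
{
    assert(lanes == 1u || lanes == 4u);
    return (unsigned long)lanes * lane_mbaud;
}

/* Data rate for one direction after 8b/10b line coding (assumption):
   8 data bits are carried for every 10 bits on the wire. */
unsigned long srio_data_mbps(unsigned lanes, unsigned long lane_mbaud)
{
    return srio_raw_mbps(lanes, lane_mbaud) * 8ul / 10ul;
}

/* Raw video rate in Mbit/s, rounded down, counting active pixels only. */
unsigned long video_mbps(unsigned long width, unsigned long height,
                         unsigned long frames_per_sec,
                         unsigned long bits_per_pixel)
{
    unsigned long long bits = (unsigned long long)width * height
                              * frames_per_sec * bits_per_pixel;
    return (unsigned long)(bits / 1000000ull);
}
```

Under these assumptions a 4x link at 3.125 Gbit/sec per lane gives srio_raw_mbps(4, 3125) = 12500 Mbit/s per direction, or 25 Gbit/s for both directions combined. Line coding alone brings that to 20 Gbit/s, which is in the range quoted above before any packet overhead. For video, video_mbps(1920, 1080, 30, 16) gives 995 Mbit/s for 1080i and video_mbps(1920, 1080, 60, 16) gives 1990 Mbit/s for 1080p at 60 frames per second, against srio_data_mbps(1, 3125) = 2500 Mbit/s for a single 1x link at the top rate.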