BittWare Overview

March 2007
Agenda
• Corporate Overview
• Hardware Technical Overview
• Software Technology Overview
• Demo
Who is BittWare?
A leading COTS signal processing vendor, focused on providing the “Essential building blocks” (DSPs, FPGAs, I/O, SW Tools, IP, & Integration) that our customers can use to build “Innovative solutions”
BittWare Corporate Overview
• Private company founded in 1989
 - Founded by Jim Bittman (hence the spelling)
• Essential Building Blocks for innovative Signal Processing Solutions
 - Focused on doing one thing extremely well
 - #2 in recognition for DSP Boards (source: VDC Merchant Board Survey 2004)
• Committed to providing leading edge, deployable products, delivered on time and with consistently high quality
 - Tens of thousands of boards shipped
 - 100’s of active customers
• Financially Strong: Profitable & Growing
• Headquartered in Concord, New Hampshire, USA
 - Engineering/Sales Offices in:
   - Belfast, Northern Ireland (UK) (formerly EZ-DSP, acquired Sept. 2004)
   - Leesburg, Virginia (Washington DC)
   - Phoenix, Arizona
• 15 International Distributors
 - Representing 38 countries
BittWare’s Building Blocks
• High-end Signal Processing Hardware (HW)
 Altera FPGAs & TigerSHARC DSPs
 High Speed I/O
 Board Formats:
 CompactPCI (cPCI), PMC, & PCI
 VME
 Advanced Mezzanine Card (AMC or AdvancedMC)
• Silicon & IP Framework
 SharcFIN
 ATLANTiS
• Development Tools
 BittWorks
 Trident
• Systems & Services
BittWare Business Model & Markets
BittWare provides essential building blocks for
innovative signal processing solutions
at every stage of the OEM Life-cycle
COTS Products
• Signal Processing HW
 - Altera FPGAs
 - TigerSHARC DSPs
 - High Performance I/O
 - PCI; PMC; cPCI
 - VME
 - AdvancedMC (AMC)
• Silicon & IP Frameworks
 - SharcFIN
 - ATLANTiS
• Development Tools
 - BittWorks Tools
 - Function Libraries
 - Trident MP-RTOE
Application-specific Products
• System Integration
• Custom FPGA Design
 - Interfacing
 - Processing
• Development/Deployment
• Tailored Signal Processing Boards
• Specialized/Custom I/O
• Application Software integration/implementation
• Technology & Intellectual Property Licensing
Markets
• Defense/Aerospace
• Communications
• High-End Instrumentation
• Life Sciences
Hardware Technology Overview
• Hybrid Signal Processing
• T2 Family
SharcFIN
ATLANTiS
T2 Boards (PCI, PMC, cPCI, VME)
• GT and GX Family
FINe
GT Boards (cPCI, VME)
GX Boards (AMC, VME)
Hybrid Signal Processing Concept
[Diagram: input and output I/O interfaces around an FPGA that performs pre-processing, co-processing, and post-processing, connected through Inter-Processor Communications (IPC) to programmable DSP(s).]
Hybrid Signal Processing Architecture
[Diagram: hybrid signal processing cluster. Elements shown: an FPGA with a memory module, I/O interfacing (SerDes, LVDS pairs or single-ended DIO), and Interprocessor Communications (IPC) to DSPs #0-#3; a host/control bridge on the control plane with RS232/422, GigE, Flash (64 MB), the PCI bus, and a command & control bus.]
BittWare’s T2 Board Family
TigerSHARC multiprocessing boards for ultra-high performance applications, using a common architecture across multiple platforms and formats
• Clusters of 4 ADSP-TS201S DSPs @ up to 600MHz
 - 14,400 MFLOPS per cluster
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS™ Architecture: up to 8.0 GB/sec I/O
 - 2 Links per DSP off-board @ 125MB/sec each
 - routed via FPGA to DIO/RocketIO SERDES
• Ring of link ports interconnected within cluster
• SharcFIN ASIC (SFIN-201) providing:
 - 64-bit, 66 MHz PCI bus
 - 8MB Boot Flash
 - FPGA Control Interface
• PMC+ expansion site(s)
• Large shared SDRAM memory (up to 512MB)
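The per-cluster MFLOPS figure can be reproduced with a short calculation. This is a minimal sketch, not vendor code: the function name is hypothetical, and the value of 6 floating-point operations per cycle per DSP is an assumption inferred from the slide's own numbers (14,400 MFLOPS / (4 DSPs x 600 MHz)).

```python
# Assumption: 6 floating-point operations per clock cycle per DSP,
# inferred from the slide figures rather than taken from a datasheet.
FLOPS_PER_CYCLE = 6


def peak_mflops(num_dsps, clock_mhz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak floating-point rate in MFLOPS for identical DSPs at one clock rate."""
    return num_dsps * clock_mhz * flops_per_cycle


if __name__ == "__main__":
    # One T2 cluster: 4 DSPs at 600 MHz
    print(peak_mflops(4, 600))
    # Two clusters (8 DSPs) at 500 MHz, as on the octal boards
    print(peak_mflops(8, 500))
```

Under the same assumption, eight DSPs at 500 MHz come to 24,000 MFLOPS, which agrees with the 24 GFLOPS quoted for the octal boards later in the deck.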
T2 Architecture Block Diagram
[Block diagram: four TS201 DSPs (#1-#4), each with links L2 and L3, on a 64-bit, 83.3 MHz cluster bus with SDRAM (SO-DIMM, up to 512MB); an ATLANTiS FPGA taking 4 x L0 and 4 x L1 links and driving DIO (64-192 pins) and RocketIO SerDes (8 channels); a SharcFIN SF201 with boot flash on an 8-bit bus, 8 interrupts & 8 flags, and a 64-bit, 66 MHz PCI-to-local bus.]
Diagram callouts:
• Up to 3 separate 64-pin DIO (Digital I/O) ports can be used to implement link ports, parallel buses, and/or other interconnects
• ATLANTiS FPGA implements link routing
 - Configured & controlled via SharcFIN
 - Access via TigerSHARCs and Host
 - Can also be used for pre/post-processing
• 8 full-duplex link ports from DSPs (2 from each DSP)
 - Each link provides 125 MB/sec transmit and 125 MB/sec receive
 - Total I/O bandwidth = 2.0 GB/sec
• Basic architecture is the same as before (HH & TS) except the two I/O links per DSP are routed (transferred) via the ATLANTiS FPGA
• SharcFIN-201 Bridge provides powerful, easy to use PCI/Host command & control interface
• 8 channels of RocketIO SerDes @ 2.5 GHz each
 - Each channel provides ~250 MB/sec both ways
 - Total I/O bandwidth is 4.0 GB/sec
 - Connected via two 4x Infiniband-type HW, or backplane
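The two I/O bandwidth totals quoted on this slide follow from counting both directions of every full-duplex channel. A minimal sketch of that arithmetic, with a hypothetical function name and 1 GB taken as 1000 MB:

```python
def aggregate_mb_per_s(channels, tx_mb_per_s, rx_mb_per_s):
    """Total bandwidth of full-duplex channels, counting transmit and receive."""
    return channels * (tx_mb_per_s + rx_mb_per_s)


if __name__ == "__main__":
    # 8 link ports, 125 MB/sec in each direction
    print(aggregate_mb_per_s(8, 125, 125) / 1000, "GB/sec over link ports")
    # 8 RocketIO channels, ~250 MB/sec in each direction
    print(aggregate_mb_per_s(8, 250, 250) / 1000, "GB/sec over SerDes")
```

Note that both totals are sums over transmit and receive; the one-way figure in either case is half the quoted value.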
SharcFIN 201 Features
• 64/66MHz PCI bus master interface (rev. 2.2)
 - 528MB/sec burst
 - 460MB/sec sustained writes (SF to SF)
 - 400MB/sec sustained reads (SF to SF)
• Cluster bus interface to ADSP-TS201s @ 83.3MHz
• Access DSP internal memory & SDRAM from PCI
• 2 independent PCI bus mastering DMA engines
• 6 independent FIFOs (2.4KB total)
 - 2 for PCI target to/from DSP DMA (fly-by to SDRAM)
 - 2 for PCI target to/from DSP internal memory
 - 2 for PCI bus mastering DMA to/from DSP DMA
• General purpose peripheral bus
 - 8 bits wide, 22 address bits, 16MB/sec
 - Reduces cluster bus loading, increasing cluster bus speed
 - Accessible from DSP cluster bus & PCI bus
 - Flash interface for DSP boot & non-volatile storage
• I2O V1.5 compliant
• I2S serial controller
• Programmable interrupt & flag multiplexer
 - 10 inputs; 7 outputs
 - 1 input/1 output dedicated to PCI
• Extensive SW support via BittWorks HIL & DSP21K
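The 528MB/sec burst figure is simply bus width times clock rate, and the sustained figures can be read as a fraction of that peak. The sketch below is illustrative only; the function name is hypothetical and the efficiency ratios are derived here, not quoted by the slide.

```python
def bus_peak_mb_per_s(width_bits, clock_mhz):
    """Peak transfer rate of a parallel bus moving one word per clock."""
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    pci_burst = bus_peak_mb_per_s(64, 66)      # 64-bit, 66 MHz PCI
    print(f"PCI burst: {pci_burst:.0f} MB/sec")
    # Sustained SharcFIN-to-SharcFIN rates from the slide, as a share of peak
    print(f"write efficiency: {460 / pci_burst:.2f}")
    print(f"read efficiency:  {400 / pci_burst:.2f}")
```

The same function can be applied to the 64-bit, 83.3 MHz cluster bus to estimate its peak rate, with the caveat that real bus protocols rarely sustain one word per clock.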
SharcFIN-201 Block Diagram
What is ATLANTiS?
A Generic FPGA Framework for I/O, Routing & Processing
• An I/O routing device in which every I/O can be dynamically connected to any other I/O!
 - Like a software-programmable ‘cable’ – but better!
 - ATLANTiS provides communication between the TigerSHARC link ports and all other I/Os connected to the FPGA/board
 - Off-board I/O defined by board architecture
 - Communication can be point-to-point, or broadcast to various outputs
 - Devices can be connected or disconnected as requirements dictate w/o recompiling or changing cables
• A configurable FPGA Pre/Post/Co-Processing engine
 - Standard IP blocks
 - Customer/Custom developed blocks
T2 ATLANTiS Detail Diagram
External I/O & connectors dependent on specific board implementation
[Diagram: ATLANTiS FPGA connected to links L0 and L1 of each of four TigerSHARC TS-201 DSPs (#1-#4), two RocketIO SerDes blocks, a PMC+ 64-bit DIO port, an optional off-board 64-bit DIO port, an optional cluster-to-cluster 64-bit DIO port, and the peripheral control bus from the SharcFIN; the SharcFIN also connects the PCI bus, the TigerSHARC cluster bus, and SDRAM.]
8 x 8 ATLANTiS Switch Diagram
[Diagram: an 8 x 8 switch with eight 128-bit inputs (IN 0-7) and eight 128-bit outputs (OUT 0-7), set up through configuration registers on the control bus.]
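The routing behaviour of an 8 x 8 switch driven by configuration registers can be modelled in a few lines. This is a toy model written for illustration: the class, its methods, and the one-register-per-output layout are assumptions, since the slides do not describe the actual ATLANTiS register map.

```python
class CrossbarSwitch:
    """Toy N x N switch: one configuration register per output port."""

    def __init__(self, ports=8):
        self.ports = ports
        # config[out] holds the input index feeding that output, or None
        self.config = [None] * ports

    def _check(self, port):
        if not 0 <= port < self.ports:
            raise ValueError(f"port {port} out of range")

    def connect(self, in_port, out_port):
        """Point-to-point: route one input to one output."""
        self._check(in_port)
        self._check(out_port)
        self.config[out_port] = in_port

    def broadcast(self, in_port, out_ports):
        """Route one input to several outputs at once."""
        for out_port in out_ports:
            self.connect(in_port, out_port)

    def disconnect(self, out_port):
        self._check(out_port)
        self.config[out_port] = None

    def route(self, inputs):
        """Return the word seen on each output for one word per input."""
        return [None if src is None else inputs[src] for src in self.config]


if __name__ == "__main__":
    switch = CrossbarSwitch()
    switch.connect(0, 4)            # IN 0 -> OUT 4
    switch.broadcast(1, [5, 6])     # IN 1 -> OUT 5 and OUT 6
    print(switch.route(["a", "b", "c", "d", "e", "f", "g", "h"]))
```

Rewriting an entry in `config` at run time changes the routing without touching anything else, which mirrors the idea of reconnecting devices without recompiling or changing cables.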
Other Major ATLANTiS Components
[Diagram: ATLANTiS component blocks; the diagram pairs them with Output FIFO Buffers on the transmit side and Input FIFO Buffers on the receive side.]
• Transmit side
 - SerDes Transmitter (Packet Protocol)
 - TS201 LinkPort Transmitter (Link Port Intf. Circuit)
 - Null Transmitter (Null Intf. Circuit)
 - Post-Processing Block
 - Co-Processing Block
• Receive side
 - SerDes Receiver (Packet Protocol)
 - TS201 LinkPort Receiver (Link Port Intf. Circuit)
 - Null Receiver (Null Intf. Circuit)
 - Pre-Processing Block
 - Co-Processing Block
• Through Routing Block
• Diagram note: “Must be used as a pair to same endpoint”
ATLANTiS Put Together
[Diagram: the ATLANTiS FPGA built around the 8 x 8 switch (IN 0-7, OUT 0-7) and its configuration registers. Switch ports connect to SerDes blocks (with CDR), TigerSHARC link interfaces (L0 and L1 of TS-201 #1-#4), the PMC+ 64-bit DIO port, the optional off-board and cluster-to-cluster 64-bit DIO ports. The SharcFIN provides the peripheral control bus and connects the PCI bus, the TigerSHARC cluster bus, and SDRAM.]
*Links, DIO, & SerDes are now routed by Switch
How is ATLANTiS Used?
FPGA Configuration
1) BittWare Standard Implementations (Loads)
 Works out-of-the-box (doesn’t require any FPGA design capabilities)
 Fixed interfaces & connections define switch I/Os
 Variety of I/O configuration options are available with boards
2) Developer’s kit
 Fully customizable (by BittWare and/or end user)
 All component cores in kit
 Requires FPGA Development Tools & design capabilities
Run-Time Set-up and Control
1) Powerful, easy to use GUI (Navigator)
 Set up for any and all possible routings
2) Use DSP or Host to program Control Registers
 Initial configuration
 Change routing at any time by re-programming Control Registers
ATLANTiS Configurator
T2 Board Family
T2PC: Quad PCI TigerSHARC Board
T2PM: Quad PMC TigerSHARC Board
T26U: Octal 6U cPCI TigerSHARC Board
T2V6: Octal 6U VME TigerSHARC Board
T2-PCI Features
• One Cluster of Four ADSP-TS201S TigerSHARC® DSP
processors running at 600 MHz each
- 24 Mbits of on-chip SRAM per DSP
- Static Superscalar Architecture
- Fixed or Floating point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP
Processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS Architecture: up to 4.0 GB/sec I/O
- Eight external link ports @ 250MB/sec each
- Routed via Virtex-II Pro
- RocketIO SerDes Xcvrs, PMC+, DIO headers
• Two link ports per DSP dedicated for interprocessor
communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC site with PMC+ extensions for BittWare’s PMC+ I/O
modules
• 64 MB-512 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and
debug, support for multiple run-time and host operating
systems, and optimized function libraries
• Standalone operation
T2PC Block Diagram
[Block diagram: four TS201 DSPs (#1-#4, links L2/L3) on a 64-bit, 83.3 MHz cluster bus with SDRAM (SO-DIMM, up to 512MB); a Virtex-II Pro FPGA taking 4 x L0 and 4 x L1 links, with RocketIO SerDes (8 channels), a 64-signal DIO header, two 20-signal DIO headers, and the PMC+ J4 connector (64 signals); a SharcFIN SF201 with boot flash on an 8-bit bus and 8 interrupts & 8 flags; a 64-bit, 66 MHz PCI local bus through a PCI-PCI bridge to the PCI connector; JTAG header; external power.]
T2PM Features
• One Cluster of Four ADSP-TS201S TigerSHARC® DSP
processors running at up to 600 MHz each
- 24 Mbits of on-chip SRAM per DSP
- Static Superscalar Architecture
- Fixed or Floating point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP
Processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS Architecture: up to 4.0 GB/sec I/O
- Eight external link ports @ 250MB/sec each
- Routed via Virtex-II Pro
- RocketIO SerDes Xcvrs, PMC+, DIO header
• Two link ports per DSP dedicated for interprocessor
communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC format with BittWare’s PMC+ extensions
• 64 MB-256 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and debug,
support for multiple run-time and host operating systems, and
optimized function libraries
• Standalone operation
T2PM Block Diagram
[Block diagram: four TS201 DSPs (#1-#4, links L2/L3) on a 64-bit, 83.3 MHz cluster bus with SDRAM (up to 256MB); a Virtex-II Pro FPGA taking 4 x L0 and 4 x L1 links, with RocketIO SerDes (8 channels) to the front panel and 64 signals to the PMC+ connector (J4); a SharcFIN SF201 with boot flash on an 8-bit bus and 8 interrupts & 8 flags; a 64-bit, 66 MHz PCI local bus to the PMC connectors (J1-3); optional JTAG header.]
T26U cPCI Features
• Two Clusters of Four ADSP-TS201S TigerSHARC® DSP
processors (8 total) running at 500 MHz each
- 24 Mbits of on-chip SRAM per DSP
- Static Superscalar Architecture
- Fixed or Floating point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP Processing
power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS Architecture: up to 6.0 GB/sec I/O
- Sixteen external link ports @ 250MB/sec each
- Routed via Virtex-II Pro
- RocketIO SerDes Xcvrs, PMC+, DIO (Cross-cluster)
• Two link ports per DSP dedicated for interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Two PMC sites with PMC+ extensions for BittWare’s PMC+ I/O
modules
• 128 MB-512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and debug,
support for multiple run-time and host operating systems, and
optimized function libraries
• Standalone operation
T26U Block Diagram
[Block diagram: two clusters (A and B), each with four TS201 DSPs (#1-#4, links L2/L3) on a 64-bit, 83.3 MHz cluster bus, SDRAM (up to 256MB), an FPGA taking 4 x L0 and 4 x L1 links, a SharcFIN SF201 with boot flash on an 8-bit bus and 8 interrupts & 8 flags, and a PMC+ site (J4, 64 signals). Each FPGA drives high-speed SerDes (RocketIO, 4 channels per connection) and rear panel DIO (64 signals), with a 64-signal connection between the two FPGAs. Each cluster has a 64-bit, 66 MHz PCI local bus, joined through PCI-PCI bridges to the CPCI 64/66 backplane; JTAG header.]
T2 6U VME/VXS Features
• Two Clusters of Four ADSP-TS201S TigerSHARC® DSP
processors (8 total) running at 500 MHz each
- 24 Mbits of on-chip SRAM per DSP
- Static Superscalar Architecture
- Fixed or Floating point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP
Processing power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS Architecture: up to 8.0 GB/sec I/O
- Sixteen external link ports @ 250MB/sec each
- Routed via Virtex-II Pro
- RocketIO SerDes Xcvrs, PMC+, DIO (Cross-cluster)
• Two link ports per DSP for interprocessor ring
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Tundra TSI-148 PCI-VME bridge with 2eSST support
• VITA-41 VXS Switched-Fabric Interface
• PMC site with PMC+ extensions for BittWare’s PMC+ I/O
modules
• 128 MB-512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and
debug, support for multiple run-time and host operating
systems, and optimized function libraries
• Standalone operation
T2V6 Block Diagram
[Block diagram: two clusters (A and B), each with four TS201 DSPs (#0-#3, links L2/L3) on a 64-bit, 83.3 MHz cluster bus, SDRAM (up to 256MB), an FPGA taking 4 x L0 and 4 x L1 links, and a SharcFIN SF201 with boot flash on an 8-bit bus and 8 interrupts & 8 flags. The FPGAs drive high-speed SerDes to VXS/P0 (8 channels) and RocketIO (4 channels each), with 64-signal connections between FPGAs and to the PMC+ site (J4). A 64-bit, 66 MHz PCI local bus connects through the VME-PCI bridge to VME64/2eSST; P2 user pins are available as factory options; JTAG header.]
T2V6 Heat Frame - Transparent
T2V6 Heat Frame
T2V6 Thermal Model
BittWare Levels of Ruggedization
Characteristic | Commercial | Level 1 | Level 1c | Level 2c | Level 3c
Type | Air-Cooled | Air-Cooled | Air-Cooled | Conduction-Cooled | Conduction-Cooled
Operating temperature | 0C to 50C w/ 300 lin. ft/min airflow | -40C to 75C w/ 300 lin. ft/min airflow | -40C to 75C w/ 300 lin. ft/min airflow | -40C to 75C at thermal interface | -40C to 85C at thermal interface
Storage temperature | -55C to 100C | -55C to 100C | -55C to 100C | -55C to 100C | -55C to 100C
Random vibration, 15Hz to 2kHz | 0.01 g2/Hz | 0.04 g2/Hz | 0.04 g2/Hz | 0.1 g2/Hz | 0.1 g2/Hz
Shock, 11ms duration | 20g peak sawtooth | 20g peak sawtooth | 20g peak sawtooth | 40g peak sawtooth | 40g peak sawtooth
Conformal coating | No | No | Yes | Yes | Yes
Humidity | 0 to 95% non-condensing | 0 to 95% non-condensing | 0 to 100% condensing | 0 to 100% condensing | 0 to 100% condensing
Environmental test conditions are per MIL-STD-810E where marked on the original table.
Hardware Technology Overview
• PMC+ Extensions
• Barracuda High-Speed 2-ch ADC
• Tetra High-Speed 4-ch ADC
BittWare PMC+ Extensions
• BittWare’s PMC+ boards are an extension of the standard PMC specification (user-defined J4 connector)
• Provides tightly coupled I/O and processing to BittWare’s DSP boards:
 - Hammerhead Family
   - 4 links, Serial TDM, flags, irqs, reset, I2C
 - TS Family
   - 4 links, flags, irqs, reset, I2C
 - T2 Family
   - 64 signals, routed as 32 diff pairs to ATLANTiS
   - Standard use is 4 links, plus flags and irqs
   - Can be customized for 3rd party PMCs
Barracuda PMC+ Features
• 2 channel 14 bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system performance)
• AC (transformer) or DC (op-amp) coupled options
• 64 bit, 66 MHz bus mastering PCI interface via SharcFIN
• 64 MB-512 MB SDRAM for large snapshot acquisitions
• Virtex-II 1000 FPGA reconfigurable over PCI
 - used for A/D control and data distribution
 - configurable preprocessing of high speed A/D data, such as digital filtering, decimation, digital down conversion, etc.
 - Developer’s kit available with VHDL source code
 - Optional IP cores and integration from 3rd Parties for DDR/DDC/SDR/comms applications
 - Plethora of other IP cores available
• PMC+ links (4) in FPGA configurable for use with Hammerhead or Tiger PMC+ carrier boards
• Internal/external clock and triggering
• Optional oven controlled oscillator/high stability clock
• Onboard programmable clock divider & decimator
• Large Snapshot acquisition to SDRAM (4K-256M samples)
 - 1 ch @ 105 MHz
 - 2 ch @ 75 MHz
• Continuous acquisition
 - 2 ch @ 105 MHz to TigerSHARC links
 - 1 ch @ 105 MHz or 2 ch @ 52.5 MHz to PCI (system dependent)
Barracuda PMC+ Block Diagram
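The Barracuda acquisition figures in the feature list can be sanity-checked with a little arithmetic. The sketch below assumes each 14-bit sample is stored in 2 bytes; that packing is an assumption, not something the slides state, and the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # assumption: 14-bit samples padded to 16 bits


def stream_mb_per_s(channels, sample_rate_mhz):
    """Continuous data rate in MB/sec for the given channel count and rate."""
    return channels * sample_rate_mhz * BYTES_PER_SAMPLE


def snapshot_mb(samples_millions):
    """Memory needed for a snapshot, in MB, for a sample count in millions."""
    return samples_millions * BYTES_PER_SAMPLE


if __name__ == "__main__":
    print(stream_mb_per_s(1, 105))    # 1 ch @ 105 MHz to PCI
    print(stream_mb_per_s(2, 52.5))   # 2 ch @ 52.5 MHz to PCI
    print(stream_mb_per_s(2, 105))    # 2 ch @ 105 MHz to TigerSHARC links
    print(snapshot_mb(256))           # largest snapshot, 256M samples
```

Under this assumption the two PCI modes carry the same data rate, and the largest snapshot fills exactly the 512 MB maximum SDRAM option, which is consistent with the figures on the slide.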
Tetra PMC+ Features
• 4 channel 14 bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system
performance)
• DC (op-amp) coupled
• 32 bit, 66 MHz bus mastering PCI interface via
SharcFIN
• Cyclone-II 20/35/50 FPGA reconfigurable over PCI
 used for A/D control and data distribution
 configurable preprocessing of high speed A/D
data, such as digital filtering, decimation, digital
down conversion, etc.
 Developer’s kit available with VHDL source
code
 Optional IP cores and integration from 3rd
Parties including DDC
• PMC+ links (4) in FPGA configurable for use with
TigerSHARC/ATLANTiS
• Internal/external clock and triggering
 Can source clock for chaining
• Onboard programmable clock divider & decimator
Tetra PMC+ (TRPM) Block Diagram
[Block diagram: four 14-bit, 105 MHz ADCs (Ch.0-Ch.3) feeding a Cyclone II FPGA (EP2C20/35); the FPGA provides Links 0-3 and flags/interrupts on the user defined pins (P4 connector) and a 32-bit, 66 MHz PCI bus to the PMC interface (P1-P3 connectors); clocking from an XO or Clk In through a clock driver, with Trig In/Clk Out; factory option noted.]
Hardware Technology Overview
• New FINe
• New ATLANTiS
FINe Host Interface Bridge
[Block diagram: the FINe bridge, implemented in a Cyclone™ II FPGA around a NIOS II processor and an Avalon bus, with SDRAM. Host/control side (control plane): PCI Express bridge (1x), GigE and 10/100 Ethernet (Ethernet MAC and PHY), two UARTs with RS232/422 PHYs, and a 32-bit, 66 MHz PCI bus interface. Signal processing side (data plane): a 64-bit, 83 MHz cluster bus interface, a link port to the DSPs & ATLANTiS, a UART to ATLANTiS, a flag/interrupt mux carrying interrupts and flags to/from the DSPs (2 per DSP), and an 8-bit peripheral interface to the boot flash.]
New ATLANTiS - Putting it all Together
[Diagram: the new ATLANTiS FPGA built around the 8 x 8 switch (IN 0-7, OUT 0-7) and its configuration registers. Switch ports connect to SerDes blocks (with CDR), TigerSHARC link interfaces (L0 and L1 of TS-201 #1-#4), a co-processing block, a 64-bit DIO port, and a DMA engine with DDR controller to the memory module. A cluster bus interface ties the FPGA to the TigerSHARC cluster bus, and the SharcFINe (FINe) bridge connects to the PCI bus.]
New Product Families
• B2 Family
• B2AM
• GT Family
• GT3U-cPCI
• GTV6-Vita41/VXS
• GX Family
• GXAM
B2AM Features
• Full-height, single wide AMC (Advanced Mezzanine Card)
• ATLANTiS/ADSP-TS201 Hybrid Signal Processing cluster
• Altera Stratix II FPGA for I/O routing and processing
• 4 ADSP-TS201S TigerSHARC® DSP processors at up to 600 MHz
- 57.5 GOPS (16-bit) or 14.4 GFLOPS (floating point) of DSP
Processing power
• Fat Pipes & Common Options Interface for Data & Control
• Module management Control Implementing IPMI
- Monitors temperature and power usage of major devices
- Supports hot swapping
• SharcFINe bridge providing GigE and PCI Express
• ATLANTiS provides Fat Pipes Switch Fabric Interfaces:
- Serial RapidIO™
- PCI Express
- GigE, XAUI™ (10 GigE)
• System Synchronization via AMC system clocks
• Front Panel I/O
• 10/100 ethernet
• LVDS & General Purpose Digital I/O
• JTAG port for debug support
• Fiber-optic Transceiver @ 2.5GHz (optional)
• Booting of DSPs and FPGA via Flash nonvolatile memory
B2-AMC Block Diagram
[Block diagram: four ADSP-TS201S TigerSHARC DSPs (#0-#3) with TigerSHARC link ports to an ATLANTiS FPGA (Stratix II EP2S60, 90, or 130); SharcFINe bridge with FLASH, 1x PCIe, GigE (Bx), and 10/100 Ethernet; MMC for IPMI (AtMega16) with temperature monitoring; SERDES QuadPHY (PM8358) switchable between front and back for the Fat Pipes network interface (sRIO, PCIe, ASI, GigE, XAUI); system clocks and Common Options on the AMC edge connector (B+). AMC front panel: 24-bit GP-DIO, 11 LVDS (5 Rx; 5 Tx; 1 Clk), RS-232, fiber transceiver, JTAG header, LEDs, and switch.]
GT Cluster Architecture
[Block diagram: four TigerSHARC TS-201 DSPs (#0-#3) on a 64-bit, 100 MHz bus, with TigerSHARC link ports to an ATLANTiS FPGA (Stratix II GX 2SGX90/130); memory module (1GB of DDR2 or 64MB of QDR); 32 LVDS pairs or 64 single-ended DIO; SerDes; SharcFINe bridge with RS232/422 interface, GigE, 2 transceivers, Flash (64 MB), and local PCI (32-bit, 66MHz).]
BittWare Memory Module (BMM)
• Convection or Conduction Cooled
• 67 mm x 40 mm
• 240-pin Connector
• 160 usable signals (plus 80 power/ground)
- Capability to address TBytes
• Can be implemented today as:
- 1 bank of SDRAM up to 1GB (x64)
- 2 banks of SDRAM up to 512MB each (x32)
- 1 bank of SRAM up to 64MB (x64)
- 1 bank of SDRAM up to 512MB (x32) and 1 bank of SRAM up to 32MB (x32)
[Photos: top side; back side showing the 240-pin connector to carrier]
GT 3U cPCI Features
• Altera® Stratix® II GX FPGA for I/O, routing, and processing
• One cluster of four ADSP-TS201S TigerSHARC® DSPs
- 57.5 GOPS 16-bit fixed point, 14.4 GFLOPS floating point processing power
- Four link ports per DSP
- Two link ports routed to the ATLANTiS FPGA
- Two link ports routed for interprocessor communications
- 24 Mbits of on-chip RAM per DSP; Static superscalar architecture
• ATLANTiS architecture
- 4 GB/s of simultaneous external input and output
- Eight link ports @ up to 500 MB/s routed from the on-board DSPs
- 36 LVDS pairs (72 pins) comprising 16 inputs and 20 outputs
- Four channels of high-speed SerDes transceivers
• BittWare Memory Module
- Up to 1 GB of on-board DDR2 SDRAM or 64 MB of QDR SRAM
• BittWare’s SharcFINe PCI bridge
- 32-bit/66 MHz PCI
- 10/100 Ethernet
- Two UARTs, software configurable as RS232 or RS422
- One link port routed to ATLANTiS
• 64 MB of flash memory for booting of DSPs and FPGA
• 3U CompactPCI form factor – Air Cooled or Conduction
• Complete software support
GT3U Block Diagram
[Block diagram: four TigerSHARC TS-201 DSPs (#0-#3) on a 64-bit, 100 MHz bus, with TigerSHARC link ports to an ATLANTiS FPGA (Stratix II GX 2SGX90/130); memory module (1GB of DDR2 or 64MB of QDR); 36 LVDS pairs (72 pins; 16 in, 20 out) on the user defined pins (J2 connector); 4x SerDes connector (Infiniband-type); SharcFINe bridge with RS232/422 interface (8 pins), Ethernet 10/100 (4 pins), 2 transceivers, Flash (64 MB), and local PCI (32-bit, 66MHz) to the CompactPCI 32-bit, 66MHz bus (J1 connector).]
GTV6 Block Diagram
[Block diagram: two clusters (A and B), each with four TigerSHARC TS-201 DSPs on a 64-bit, 83.3 MHz bus, TigerSHARC link ports to an ATLANTiS FPGA (ATLANTiS A and ATLANTiS B, Stratix-II GX 2SGX90), a memory module (1GB of DDR2 or 64MB of QDR), a SharcFINe bridge with Flash (64 MB) and Ethernet (GigE), and a PMC+ site (J4). High-speed serial ports (SerDes) go to VXS/VITA41 (P0) and to a 4x SerDes connector (Infiniband-type); user defined pins on P2 by factory configuration. Local PCI (64-bit/66MHz) connects through the Tsi148 VME-PCI bridge to VME64 with 2eSST (P1 & P2).]
Available Q2 2007
PMC Front-Panel I/O only available on air-cooled versions
GT3U/GTV6 BittWare Levels of Ruggedization
Characteristic | Commercial | Level 1 | Level 1c | Level 2c | Level 3c
Type | Air-Cooled | Air-Cooled | Air-Cooled | Conduction-Cooled | Conduction-Cooled
Operating temperature | 0C to 50C w/ 300 lin. ft/min airflow | -40C to 75C w/ 300 lin. ft/min airflow | -40C to 75C w/ 300 lin. ft/min airflow | -40C to 75C at thermal interface | -40C to 85C at thermal interface
Storage temperature | -55C to 100C | -55C to 100C | -55C to 100C | -55C to 100C | -55C to 100C
Random vibration, 15Hz to 2kHz | 0.01 g2/Hz | 0.04 g2/Hz | 0.04 g2/Hz | 0.1 g2/Hz | 0.1 g2/Hz
Shock, 11ms duration | 20g peak sawtooth | 20g peak sawtooth | 20g peak sawtooth | 40g peak sawtooth | 40g peak sawtooth
Conformal coating | No | No | Yes | Yes | Yes
Humidity | 0 to 95% non-condensing | 0 to 95% non-condensing | 0 to 100% condensing | 0 to 100% condensing | 0 to 100% condensing
Environmental test conditions are per MIL-STD-810E where marked on the original table.
GXAM Features
• Mid-size, single wide AMC (Advanced Mezzanine Card)
 - Common Options region:
   - Port 0 GigE; Ports 1, 2 & 3 connect to BittWare’s ATLANTiS framework
 - Fat Pipes region has eight ports: ports 4-11 configurable to support:
   - Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
 - Rear panel I/O has eight ports (8 LVDS IN, 8 LVDS OUT)
 - System synchronization via AMC system clocks (all connected)
• High-density Altera Stratix II GX FPGA (2S90/130)
 - BittWare’s ATLANTiS framework for control of I/O, routing, and processing
• BittWare’s FINe bridge provides control plane processing and interface
 - GigE, 10/100 Ethernet, and RS-232
• Over 1 GByte of Bulk Memory
 - Two banks of DDR2 SDRAM (up to 512 MBytes each)
 - One bank of QDR2 SRAM (up to 9 MBytes)
• Front panel I/O
 - 10/100 Ethernet, RS-232, JTAG port for debug support, 4x SERDES providing: Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
• BittWare I/O Module
 - 72 LVDS pairs, 4x SerDes, Clocks, I2C, JTAG, Reset
• Booting of FINe and FPGA via Flash
Available Q2 2007
GXAM Block Diagram
Available Q2 2007 (PRELIMINARY)
[Block diagram: FPGA (Stratix II GX, EP2SGX90/130) supported by the ATLANTiS Framework, with two banks of DDR2 SDRAM (up to 512 MB each, 32-bit) and one bank of QDR2 SRAM (up to 9 MB, 36-bit in/out); FINe bridge with FLASH, 10/100 Ethernet, GigE (Bx), RS-232, and a 32-bit control bus to the FPGA; MMC for IPMI (AtMega16) with temperature monitoring; JTAG header, LEDs, and switch. AMC edge connector (B+): Common Options ports 0-3, Fat Pipes/network interface ports 4-11 over SerDes (sRIO, PCIexp, GigE, XAUI, ...), system clocks, and 16 LVDS pairs (8 In & 8 Out) of rear panel I/O. Front panel I/O module (can be the whole width of the AMC front panel): 76 LVDS pairs (38 In & 38 Out), SerDes to an optional 4X Infiniband-type connector (one SerDes group on 2SGX130 only), clocks, I2C, JTAG, reset, and optional FP I/O connectors.]
IFFM Features - Preliminary
The IFFM is an IF transceiver in a Front-panel Module (FM) format. Combined with a GXAM, it forms an integrated IF/FPGA interface & processing AMC board.
• 2 channels of high-speed (HS) ADCs (AD9640: 14-bit, 150 MHz) with good SFDR specs (target is 80 dB)
 - dual package to better sync channels
 - fast detect (non-pipelined upper 4 bits) helps with AGC control
• 2 channels of HS-DACs (AD9777: 16-bit; 400 MHz)
 - dual package to better sync channels
 - built-in up-conversion interpolation of 1x, 2x, 4x, and 8x
• High performance clock generation via PLL/VCO (AD9516)
 - inputs reference clock (e.g. 10MHz) from front panel or Baseboard
 - generates programmable clocks for HS-ADCs and HS-DACs
 - sources reference clock to Baseboard (for system distribution)
• General Purpose (GP) 12-bit ADCs & DACs
 - GP-ADCs can be used for driving AGC on RF front-end
 - GP-DACs can be used for other utility signals such as GPS, positions, ...
Available Q3 2007
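The relationship between the DAC interpolation factor and the usable input data rate can be sketched as follows. Both limits are assumptions made for illustration: the 400 MHz figure from the feature list is read here as the maximum DAC output rate, and the 160 MHz figure from the block diagram label as the maximum input data rate. Neither reading is confirmed by the slides.

```python
DAC_MAX_OUTPUT_MHZ = 400   # assumption: limit on the interpolated output rate
MAX_INPUT_MHZ = 160        # assumption: limit on the input data rate
INTERPOLATION_FACTORS = (1, 2, 4, 8)


def max_input_rate_mhz(factor):
    """Highest input rate whose interpolated output stays within the DAC limit."""
    if factor not in INTERPOLATION_FACTORS:
        raise ValueError(f"unsupported interpolation factor: {factor}")
    return min(MAX_INPUT_MHZ, DAC_MAX_OUTPUT_MHZ / factor)


if __name__ == "__main__":
    for factor in INTERPOLATION_FACTORS:
        rate = max_input_rate_mhz(factor)
        print(f"{factor}x: input up to {rate:g} MHz, output {rate * factor:g} MHz")
```

Under these assumptions the input rate is the limiting factor at 1x and 2x, while the output rate becomes the limit at 4x and 8x.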
IFFM Block Diagram - Preliminary
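[Block diagram: between the front panel and the baseboard connector. Rx 1 and Rx 2 feed a dual HS-ADC (AD9640; 14-bit; 150 MHz) with a 14-bit output bus, fast detect for AGC, and command/status lines. A dual HS-DAC (AD9777; 16-bit; 160 MHz) drives Tx 1 and Tx 2 from a 16-bit input bus with command/status lines. A PLL/VCO clock generator takes a reference clock input from the front panel or the baseboard and provides a reference clock output. GP ADC and GP DAC connect over SPI.]
Available Q3 2007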
Software Technology Overview
• BittWorks
• TS Libs
• Trident MPOE
• GEDAE
Software Products
• Analog Devices Family Development Tools
 - VisualDSP C++, C, Assembler, Linker, Debugger, Simulator, VDK Kernel
 - JTAG Emulators (ADI/White Mountain)
• BittWorks
 - DSP21k Toolkit (DOS, Windows, Linux & VxWorks)
 - VDSP Target
 - Remote VDSP Target & DSP21k Toolkit via Ethernet (combined in 8.0 Toolkit)
 - Board Support Packages/Libraries & I/O GUIs
 - SpeedDSP (ADSP-21xxx only - no TS)
 - FPGA Developer’s Kits
 - Porting Kit
• Function Libraries
 - TS-Lib Float
 - TS-Lib Fixed
 - Algorithmic Design, Implementation, & Integration
• Real-Time Operating Systems
 - BittWare’s Trident
 - Enea’s OSEck
• Graphical Development Tools
 - GEDAE
 - MATLAB/Simulink/RTW
Software Products Diagram
DSP21k-SF Toolkit
• Host Interface Library (HIL)
 - Provides C callable interface to BittWare boards from host system
 - Download, upload, board and processor control, interrupts
 - Symbol table aware, converts DSP based addresses
 - Full featured, mature application programming interface (API)
 - Supports all BittWare boards, including FPGA and I/O
• Configuration Manager (BwConfig)
 - Find, track, and manage all BittWare devices in your system
• Diag21k – Command line diagnostic utility
 - All the power of the HIL at a command prompt
 - Built-in scripting language with conditionals and looping
 - Assembly level debug with breakpoints
 - stdio support (printf, etc.)
• BitLoader
 - Dynamically load FPGAs via PCI bus (or Ethernet)
 - Reprogram FPGA configuration EEPROM
• DspBAD/DspTest
 - Automated diagnostic tests for PCI, onboard memory, DSP memory & execution
• DspGraph
 - Graphing utility for exploring board memory (Windows only)
BittWare Target
• Software Debug Target for VisualDSP++
 - VisualDSP++ source level debugging via PCI bus
 - Supports most features of the debugger
• Only Software Target for COTS Sharc Boards
 - Other board vendors require JTAG emulator for VisualDSP debug
• Multiprocessor Debug Sessions on All DSPs in a System
 - Any processor in the system can be included in a debug session
 - Not limited to the board-level JTAG chain
• Virtually Transparent to Application
 - No special code, instrumentation, or build required
 - Only uses a maximum of 8 words of program memory - user selectable location
• Some restrictions compared to JTAG debug
 - For very low level debugging (e.g. interrupt service routines), an ICE is still nice
Remote Toolkit & Target
Allows Remote Code Development, Debug, & Control
• Client-Server using RPC (remote procedure calls)
 - Server on system with BittWare hardware in it (Windows, Linux, VxWorks)
 - Client on Windows machine connected via TCP/IP to server
• Run All BittWare Tools on Remote PC via Ethernet
 - Diag21k, configuration manager, DspGraph, DspBad, Target
 - Great for remote technical support
• Run All User Applications on Remote PC
 - Just rebuild user app with Remote HIL instead of regular HIL
• Run VisualDSP++ Debug Session on Remote PC!
 - No need to plug in JTAG emulator
 - Don’t need Windows on target platform!
• Toolkit 8.0 Combines Remote and Standard Dsp21k-SF
 - Allows you to access boards in local machine and remote machine
 - No need to rebuild application to use remote board
Board Support Libraries & Examples
• All Boards Ship with Board Support Libraries & Examples
 Actual contents specific to each board
 Provides interface to standard hardware
 Examples of how to use key features of the board
 Same code as used by BittWare for validation & production test
 Examples include: PCI, links, SDRAM, FLASH, UART, utilities, ...
 Royalty free on BittWare hardware
• Source Provided for User Customization
 Users may tailor to their specific needs
 Hard to create “generic” optimal library as requirements vary greatly
• PCI Library for All DSP Boards
 Bus mastering DMA read/write
 Single access read/write
• Windows GUIs for All I/O Boards
 Allow user to learn board control and operation
 IOBarracuda, AdcPerf
FPGA Developer’s Kits
• For Users Customizing FPGAs on BittWare Boards
 Source for standard FPGA loads or examples
 Royalty free on BittWare hardware
 Mainly VHDL with some schematic (usually top level)
 Uses standard Xilinx (ISE Foundation) and Altera (Quartus) tools
• B2/T2 ATLANTiS FPGA Developer’s Kit
 TS-201 link transmit and receive
 ATLANTiS Switches
 Control registers on peripheral bus (TigerSHARC and PCI accessible)
 Digital I/O
 SerDes I/O (Aurora, SerialLite, Serial RapidIO in the works)
 Pre/Post/Co-Processing shells
TS-Libs
Hand optimised, C-callable TigerSHARC Function Libraries
Floating Point Library
 Over 450 optimised 32-bit floating point signal processing routines
 With over 200 extra striding versions
Integer Library
 Over 100 optimised 32-bit integer routines
 With over 80 extra striding versions
Fixed point (16-bit) Library
 Over 120 optimised 16-bit fixed point signal processing routines
• Fastest, most optimised library for TS (up to 10x faster than C)
• Uses latest algorithm theory
• Well documented, easy to use, and proven over wide user base
• Allows customers to focus on application (not implementation)
• Supported & maintained by highly experienced TS programmers
• Additional routines & functions available upon request
TS-Libs Function Coverage
• FFT & DCTs
 1 & 2-dimension, real/complex,
• Filters
 Convolution, correlation, IIR, FIR
• Trigonometric
• Vector Mathematics
• Matrix Mathematics
• Logic-Test-Sort Operations
• Statistics
• Windowing functions
• Compander
• Distribution and Pseudo-Random Number Generation
 Scalar/vector log/cubes, etc.
• Memory Move Matrix/Vector
• Other Routines
 Doppler, signal to noise density, Choleski decomposition
Routine                       Input Length     VDSP Run-time   TS-Lib     % Faster
Real Vector and Vector Add.   1,000            1,273           776        64.0
Real Vector and Vector Mult.  1,000            1,273           776        64.0
Complex Vector Addition       1,000            2,766           1,526      81.3
Complex Vector Mult.          1,000            3,012           2,526      19.2
Complex Vector Dot Product    1,000            3,022           2,039      48.2
Complex Matrix Addition       (100,100)        25,030          12,713     96.9
Real Vector Mean              1,000            1,431           1,045      36.9
FIR                           20 & 10,000      202,534         104,420    94.0
Real Cross Correlation        1,000 & 1,000    1,145,056       260,821    339.0
Real Convolution              1,000 & 1,000    2,513,531       874,567    187.4
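The "% Faster" figures above are consistent with the ratio of the two run-time columns, (VDSP run-time / TS-Lib run-time - 1) x 100. A minimal Python sketch that reproduces three of them; the function name is illustrative, and the slide does not state the run-time units, so only the ratio is used.

```python
# (VDSP run-time, TS-Lib run-time) pairs copied from the benchmark figures above.
# Units are not stated on the slide; only the ratio matters here.
RUNTIMES = {
    "Real Vector and Vector Add.": (1273, 776),
    "Complex Vector Mult.": (3012, 2526),
    "Real Cross Correlation": (1145056, 260821),
}


def percent_faster(vdsp_runtime, tslib_runtime):
    """Speed-up of TS-Lib over the VDSP run-time library, in percent."""
    return (vdsp_runtime / tslib_runtime - 1.0) * 100.0


for name, (vdsp, tslib) in RUNTIMES.items():
    print(f"{name}: {percent_faster(vdsp, tslib):.1f}% faster")
```

Note that 339% faster corresponds to a run-time ratio of about 4.4x.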
Software Technology Overview
Trident
Multi Processor Operating Environment
BittWare’s Trident - MPOE
Multi-Processor Operating Environment
• Designed specifically for BittWare’s TigerSHARC boards
• Built on top of Analog Devices’ VDK
• Provides easy-to-use ‘Virtual Single Processor’ programming model
• Optimized for determinism, low-latency, & high-throughput
• Trident’s 3 Prongs:
 Multi-Tasking: multiple threads of execution on a single processor
 Multi-Processor: transparent coordination of multiple threads on multiple processors in a system
 Data Flow Management: managing high-throughput, low-latency data transfer throughout the system
Why is Trident Needed?
Ease of Programming
• Multiprocessor DSP programming is complicated
• Many customers don’t have this background/experience
Higher-level Tool Integration
• Need underlying support for higher level software concepts (CORBA, MPI, etc.)
Lack of Alternatives
• Most RTOSs focus on control and single processor, not data flow and multiprocessor
• VDK is multiprocessor limited
 multiprocessor messaging but limited to 32 DSPs
 no multiprocessor synchronization
 limited data flow management
Transparent Multiprocessing
• The key feature Trident provides is Transparent Multiprocessing
• Allows programmer to concentrate on developing threads of
sequential execution (more traditional programming style)
• Provides for messaging between threads and synchronization of
threads over processor boundaries transparently
• Programmer does not need to know where a thread is located in the
system when coding
• Tools allow for defining system configuration and partitioning threads
onto the available processors (at build time)
• Similar to “Virtual Single Processor” model of Virtuoso/VspWorks
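The idea of location-transparent messaging can be illustrated with a small Python model. This is a conceptual sketch only, not the Trident API: the `System` class, the `PLACEMENT` table, and the thread names are all hypothetical. Thread code addresses other threads by name, and a table fixed at "build time" decides which processor each thread lives on.

```python
from collections import defaultdict, deque

# Hypothetical build-time placement: thread name -> processor id.
# Changing this table moves threads without touching the thread code.
PLACEMENT = {"acquire": 0, "filter": 1, "detect": 1}


class System:
    def __init__(self, placement):
        self.placement = dict(placement)
        self.mailboxes = defaultdict(deque)  # one mailbox per thread name
        self.hops = []                       # records which transport each send used

    def send(self, src, dst, payload):
        """Deliver payload to thread `dst`; the caller never names a processor."""
        same_cpu = self.placement[src] == self.placement[dst]
        self.hops.append("local" if same_cpu else "inter-processor")
        self.mailboxes[dst].append((src, payload))

    def receive(self, name):
        return self.mailboxes[name].popleft()


system = System(PLACEMENT)
system.send("acquire", "filter", [1, 2, 3])  # crosses a processor boundary
system.send("filter", "detect", [2, 4, 6])   # stays on processor 1
print(system.hops)
```

Repartitioning the application in this model means editing `PLACEMENT` only; the `send` calls are unchanged.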
Trident Threads
• Multiple threads spread over single or multiple processors
 allows user to split application into logical units of operation
 provides for more familiar linear programming style, i.e. one thread
deals with one aspect of the system
 locate threads at build time on appropriate processors
• Priority based preemptive scheduler (per processor)
 multiple levels of priority for threads
 round robin (time slice) or run to completion within a level
 preemption between levels based on a system event (e.g. an
interrupt)
• Synchronization & control of threads
 Messaging between threads within a processor or spanning multiple
processors
 semaphores for resource control available for access anywhere in
system
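The per-processor scheduling policy described above (priority levels, round robin within a level, preemption when an event makes a higher-priority thread ready) can be sketched as a small simulation. This is an illustrative model, not Trident or VDK code; it assumes one tick per time slice and that a lower priority number means higher priority.

```python
from collections import deque


def run_schedule(threads):
    """Simulate one processor, one tick per time slice.

    threads: list of (name, priority, work_ticks, ready_at_tick).
    Lower priority number means higher priority (an assumption of this sketch).
    """
    not_ready = sorted(threads, key=lambda t: t[3])
    levels = {}      # priority -> deque of [name, remaining ticks]
    timeline = []
    tick = 0
    while not_ready or any(levels.values()):
        # A "system event" (e.g. an interrupt) makes threads ready.
        while not_ready and not_ready[0][3] <= tick:
            name, priority, work, _ = not_ready.pop(0)
            levels.setdefault(priority, deque()).append([name, work])
        runnable = [p for p, q in levels.items() if q]
        if not runnable:
            timeline.append(None)          # idle tick
        else:
            queue = levels[min(runnable)]  # highest priority level wins
            entry = queue.popleft()
            timeline.append(entry[0])
            entry[1] -= 1
            if entry[1] > 0:
                queue.append(entry)        # round robin: back of its level
        tick += 1
    return timeline


print(run_schedule([("a", 2, 2, 0), ("b", 2, 2, 0), ("isr_worker", 1, 1, 1)]))
```

In this run, threads `a` and `b` share a level and alternate, while `isr_worker` becomes ready at tick 1 and takes the processor ahead of both.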
Trident Runtime
• Device drivers for underlying board components
• Framework: message passing core responsible for addressing, topology and boot-time synchronization to support up to 65k processors
• Initial Modules
 CDF, MPSync, MPMQ
• Optional Modules
 Future functionality
 User expansion
• User API
Trident Modules - CDF
• Continuous Data Flow module provides raw link port support
• Suitable for device I/O at the system/processing edge, e.g. ADC
• Simple-to-use interface for reading and writing data blocks across
link ports
• Supports
 Single data block transfer
 Vector data block transfer
 Continuous data block transfers
• User-supplied call-back
• Mix-and-match approach
Continuous Data Flows API
Trident_RegisterCallbackFunction
Trident_UnregisterCallbackFunction
Trident_Write
Trident_Read
Trident_WriteV
Trident_ReadV
Trident_WriteC
Trident_ReadC
Trident Modules - MPSync
• Multiprocessor Synchronization
• Synchronization methods are essential in any distributed system to
protect shared resources or coordinate activities
• Allows threads to synchronize across processor boundaries
• Semaphores: counting and binary
• Barriers: a simple group synchronization method
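The three primitives named above behave like their single-machine counterparts. A minimal sketch using Python's standard `threading` module as a stand-in; it runs inside one process and says nothing about how MPSync implements them across processors.

```python
import threading

WORKERS = 4
barrier = threading.Barrier(WORKERS)  # group synchronization point
resource = threading.Semaphore(1)     # binary semaphore guarding a shared resource
slots = threading.Semaphore(2)        # counting semaphore: two concurrent users

log = []             # shared resource protected by `resource`
phase_two_seen = []  # phase-one entries visible to each worker after the barrier


def worker(index):
    with slots:              # at most two workers in this region at once
        with resource:       # exclusive access to the shared log
            log.append(("phase1", index))
    barrier.wait()           # nobody proceeds until all workers finish phase 1
    with resource:
        phase_two_seen.append(len(log))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(phase_two_seen)
```

Because of the barrier, every worker sees all four phase-one entries before it continues.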
Trident Modules - MPMQ
• Multiprocessor Message
Queues
• Provides for messaging
between threads anywhere in
the system transparently
• Extends the native VDK
channel-based messaging into
multiprocessor space
• Provides point-to-point and
broadcast capabilities
VDSP++ IDE Integration
• Trident Plugin fully integrated within
VDSP++
• Configures
 The boards and their
interconnections
 The VDK projects
 Any Trident objects
• Builds the configuration files
• Configures VDK kernel to support
Trident runtime
Trident – to Market
• Beta released Summer 2006
• First full release November 7
• Pricing
 ~$10k per project (max 3 developers) when
purchased with BittWare Hardware
 Royalty free on BittWare hardware
• 30 day trials available
Trident – Future Directions
• Extend debug and config tools
• Add support for buses (cluster, PCI)
• Add support for switch fabrics (RapidIO, ?)
• Incorporate FPGAs as processing elements
 “Threads” located in FPGAs as sources/sinks for messaging
• Port to other processors
 Trident designed to use basic features of a kernel,
so could port to other platforms and kernels
• BittWare’s Gedae BSP
for TigerSHARC
What Gedae says Gedae is
What is Gedae?
• Graphical Entry Distributed Application Environment
– Originally developed by Martin Marietta (now Lockheed Martin) under
DARPA’s RASSP initiative to ‘abstract’ HW-level implementation
• A graphical software development tool for signal processing
algorithm design and implementation on real-time embedded
multiprocessor systems
• A tool designed to reduce software development costs and
build reusable designs
• A tool that can help analyze the performance of the
embedded implementation and optimize to the hardware
System Development in Gedae
1) Develop Algorithm that runs on the workstation
- A tool for algorithm development
- Design hardware independent systems
- Design reusable components
2) Implement systems on the embedded hardware
- Port designs to any supported hardware
- Re-port to new hardware
Designing Data Flow Graphs (DFG)
• Basic Gedae interface: Design systems from Standard Function
Units in the hardware optimized embeddable library
• Function blocks represent the function units (FFT, sin, FIR, etc.)
• Optimized routines/blocks form GEDAE “e_” library
• 200 routines taken from TS-Libs for BittWare BSP
• Underlying code that each function block calls for execution is
called a Primitive (written similar to C)
Designing Data Flow Graphs (DFG)
• Create sub-blocks to define your own function units (add to e_
library for component reuse)
• Connecting lines represent the token streams. The underlying
communications are handled by the hardware BSP
Gedae Data Communications
• Uses data flow by token streams
• Communication is handled when data transfers across hardware
Scalar values
(or structures)
Vectors
Matrices
Run-time Schedules
Static Scheduling
• The execution sequence and memory layout specified by the DFG
• A schedule boundary is forced by dynamic queues
Dynamic (Runtime) Scheduling
• Static schedule boundaries are forced when variable token streams are only
determined at runtime
• Queues are used to separate two static schedules when this occurs
• Functions require defined number of tokens to run
• Branch, valve, merge, and switch blocks affect the token flow
• Produces one static schedule for each part separated by a queue
This black square indicates a queue
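When every block consumes and produces a fixed number of tokens, the firing order can be worked out before the graph runs, which is what makes a static schedule possible. A minimal sketch of that idea for a simple chain of blocks; this is not Gedae's scheduler, and the block names and token counts are made up for illustration.

```python
def static_schedule(blocks, sink_firings=1):
    """blocks: ordered chain of (name, tokens_consumed, tokens_produced).

    The first block is a source (consumes 0). Returns the firing sequence
    needed for the last block to fire `sink_firings` times.
    """
    queued = [0] * len(blocks)  # tokens waiting on each block's input
    schedule = []
    fired_sink = 0
    while fired_sink < sink_firings:
        # Fire the most downstream block that has enough input tokens.
        for i in reversed(range(len(blocks))):
            name, need, make = blocks[i]
            if queued[i] >= need:
                queued[i] -= need
                if i + 1 < len(blocks):
                    queued[i + 1] += make
                else:
                    fired_sink += 1
                schedule.append(name)
                break
    return schedule


chain = [("adc", 0, 1), ("fir", 4, 4), ("fft", 4, 4)]
print(static_schedule(chain))
```

If a block's token count were only known at runtime, this precomputation would no longer be possible for the whole chain, which is the case the queues above are there to handle.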
Run-time Schedules – Memory Usage
• One of the primary resources available on a DSP is the memory
• Memory scheduling dramatically reduces the amount of memory used by a static schedule
• Gedae memory packer modes:
 No packer: Gedae uses different memory for each output (wasteful)
 When function is finished, the memory is reused
 Other packers trade off the time to pack against optimality of packing
• Vertically - static schedule
• Horizontally - memory used
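The saving from reusing memory once a function has finished can be estimated from buffer lifetimes in a static schedule. A minimal sketch with made-up buffer sizes and steps; it reports peak live memory, which is a lower bound on what any real packer would need, and does not model Gedae's actual packer algorithms.

```python
# (buffer name, size in words, step that produces it, last step that reads it)
# All values are hypothetical.
BUFFERS = [
    ("adc_out", 1024, 0, 1),
    ("fir_out", 1024, 1, 2),
    ("fft_out", 2048, 2, 3),
    ("mag_out", 1024, 3, 4),
]


def no_packer(buffers):
    """Every output gets its own memory for the whole schedule."""
    return sum(size for _, size, _, _ in buffers)


def reuse_after_finish(buffers):
    """Memory is released once the last reader has finished; report peak live size."""
    last_step = max(end for _, _, _, end in buffers)
    peak = 0
    for step in range(last_step + 1):
        live = sum(size for _, size, start, end in buffers if start <= step <= end)
        peak = max(peak, live)
    return peak


print(no_packer(BUFFERS), reuse_after_finish(BUFFERS))
```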
Create parallelism in DFG
• In a simple flow graph, function blocks can be distributed across multiple
processors
• A “family” of function blocks can be distributed across multiple
processors
• Families create multiple instances of a function block, which can
express parallelism
• Gedae treats family members as separate function blocks (referenced with a
vector index)
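A family can be pictured as a list of identical block instances, each identified by its index and free to be placed on a different processor. A minimal sketch under that reading; the helper names, the round-robin mapping, and the `scale` block are hypothetical and not part of Gedae.

```python
def make_family(block, size):
    """Return `size` instances of a block, each tagged with its family index."""
    return [(index, block) for index in range(size)]


def partition(family, processors):
    """Assign family members to processors round-robin (one possible mapping)."""
    mapping = {p: [] for p in range(processors)}
    for index, _ in family:
        mapping[index % processors].append(index)
    return mapping


def scale(channel):  # hypothetical function block
    return [2 * sample for sample in channel]


channels = [[1, 2], [3, 4], [5, 6], [7, 8]]
family = make_family(scale, len(channels))
mapping = partition(family, processors=2)
# Each family member works on the input selected by its index.
outputs = [block(channels[index]) for index, block in family]
print(mapping, outputs)
```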
Partitioning a Graph
Partitioning a Graph to multiple processors
• To run the function blocks on separate processors,
partition a DFG into parts
• A separate executable is created for each part
• Partitions are independent of schedules
• Gedae creates a static schedule for each partition
• Extensive Group Controls facilitate management of
partitions
Visualization Tools: Trace table
Gedae has powerful visualization tools to view
the timings of the processor schedules
Receive
Operation
Send
Blocked
Trace table – Function Details
Gedae has powerful visualization tools to view
the trace details of a given function
Trace table - Parallel Operation
Parallel DSP Operation
BittWare’s Gedae BSP for TigerSHARC
What does the BittWare Gedae BSP Provide?
• Optimized routines for the Gedae embeddable “e_” library
200 TS-Libs functions – more can be ported if needed
• Memory Handler
• Communication Methods
Support for HW capabilities: Link & Cluster bus
• Multi-DSP Board Capability
Up to 128 clusters
• Networking Support
Development and control of distributed network of BittWare boards, with
remote debug capabilities
• BSP Support with over 12 man-years of TigerSHARC expertise
BSP Data Transfer Methods
- SHARED_WORD (cluster bus word-sync transfers)
- SHMBUF (cluster bus buffered transfers)
- LINK (link port transfers)
- DSA_LINK (DMA over the link ports)
- DSA_SHMBUF (DSA DMA over the cluster bus)
Data Transfer Rates – Shared Memory
[Chart: dsa_shmbuf Data Transfer Rate. Y-axis: MBytes/sec (0 to 700). X-axis: Data Transfer Size (32 to 8192). Series: best_send_ready]
• Hardware Max rate
 666.4 Mbytes / second
• For 1k data packets  450 Mbytes / second (on-board)
Data Transfer Rates – Link Ports
[Chart: dsa_link Data Transfer Rate. Y-axis: MBytes/sec (50 to 250). X-axis: Data Transfer Size. Series: best_send_ready]
• Hardware Max rate
 250 Mbytes / second
• For 1k data packets  230 Mbytes / second
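The figures quoted on the two data-transfer slides can be turned into a fraction of the hardware maximum. A short sketch of the arithmetic; the function name is illustrative, and the rates are the ones stated on the slides for 1k data packets.

```python
def efficiency(measured_mb_s, hardware_max_mb_s):
    """Measured rate as a percentage of the hardware maximum."""
    return 100.0 * measured_mb_s / hardware_max_mb_s


shared_memory = efficiency(450.0, 666.4)  # dsa_shmbuf, 1k packets, on-board
link_port = efficiency(230.0, 250.0)      # dsa_link, 1k packets
print(f"shared memory: {shared_memory:.1f}%  link port: {link_port:.1f}%")
```

On these numbers, the link ports run closer to their hardware limit at 1k packets (about 92%) than the cluster bus does (about 68%).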
Gedae/BSP Summary
• Gedae
 Provides portable designs for embedded multi-DSP
 Scheduling, communication and memory handling are provided
 Optimized functions are provided for each supported board
• BittWare’s Gedae BSP for TigerSHARC:
 Allows Gedae to target BittWare’s TigerSHARC Boards
 Compiles onto multiple DSPs (up to 8 per board)
 Compiles to multiple boards (currently up to 128 boards)
 Optimized TigerSHARC library of functions
 Multiple communication methods (with efficient, high data rates)
 Removes the need for TigerSHARC specialist engineering
Additional Slides/Info
Demo Description
• Dual B2-AMC hybrid signal processing boards
 2S90 Stratix II FPGA
 Quad TigerSHARC DSPs
• FINe control interface via GigE
• ATLANTiS framework
 Reconfigurable data routing
 ‘Patch-able’ processing
• 4x Serial RapidIO endpoint implemented in FPGA
 12.5 Gb/s inter-board xfer rates; 10 Gb/sec max payload rate
 90% efficiencies
• MicroTCA-like “Pico Box”
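The Serial RapidIO rates quoted above fit together as follows. This sketch assumes a per-lane signalling rate of 3.125 Gbaud (12.5 / 4) and 8b/10b line coding, which is what the 12.5 and 10 Gb/s figures imply but the slide does not spell out. It also assumes the 90% efficiency applies to the payload rate.

```python
LANES = 4
LANE_RATE_GBAUD = 3.125        # assumed per-lane signalling rate (12.5 / 4)
ENCODING_EFFICIENCY = 8 / 10   # assumed 8b/10b line coding
LINK_EFFICIENCY = 0.90         # "90% efficiencies" from the slide

raw_gbps = LANES * LANE_RATE_GBAUD               # inter-board transfer rate
payload_gbps = raw_gbps * ENCODING_EFFICIENCY    # max payload rate
achieved_gbps = payload_gbps * LINK_EFFICIENCY   # under the 90% assumption
print(f"{raw_gbps:.1f} Gb/s raw, {payload_gbps:.1f} Gb/s payload, "
      f"{achieved_gbps:.1f} Gb/s achieved")
```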
Demo Hardware
BittWare’s
B2-AMC
CorEdge’s PicoTCA
Demo System Architecture
[Diagram: CoreEdge PICO Box containing a CoreEdge Power, IPMI, & Ethernet Module (Ethernet hub, RJ45) and two BittWare B2-AMC boards. Each B2-AMC has four DSPs (TS201), a FINe (2S60) with GigE, and an FPGA (2S90) with SerDes and QuadPHY. The boards are linked by 4x Serial RapidIO; a PC/Laptop attaches via RJ45.]
ATLANTiS – B2
[Diagram: FINe (2S60) with GigE on the Cluster Bus; four TS-201 TigerSHARC DSPs (DSP#1 to DSP#4), each with link ports L0 and L1 to the ATLANTiS FPGA (Stratix II 90/130); FPGA connections to Front-Panel I/O, GMII, SerDes, and PMC Sierra QuadPHY.]
ATLANTiS – SRIO
[Diagram: Switch 1 configuration. FINe (2S60) with GigE on the Cluster Bus; four TS-201 TigerSHARC DSPs (DSP#1 to DSP#4), each with link ports L0 and L1 to the ATLANTiS FPGA (Stratix II 90/130); FPGA connections to Front-Panel I/O, GMII, SerDes, and PMC Sierra QuadPHY.]
ATLANTiS – Connecting to FPGA Filters
[Diagram: Switch 2 configuration. FINe (2S60) with GigE on the Cluster Bus; four TS-201 TigerSHARC DSPs (DSP#1 to DSP#4), each with link ports L0 and L1 to the ATLANTiS FPGA (Stratix II 90/130); FPGA connections to Front-Panel I/O, GMII, SerDes, and PMC Sierra QuadPHY.]