BittWare Overview
March 2007

Agenda
• Corporate Overview
• Hardware Technology Overview
• Software Technology Overview
• Demo

Who is BittWare?
A leading COTS signal processing vendor, focused on providing the "essential building blocks" (DSPs, FPGAs, I/O, software tools, IP, and integration) that our customers can use to build "innovative solutions."

BittWare Corporate Overview
• Private company founded in 1989
  Founded by Jim Bittman (hence the spelling)
• Essential building blocks for innovative signal processing solutions
  Focused on doing one thing extremely well
  Ranked #2 in recognition for DSP boards (source: VDC Merchant Board Survey, 2004)
• Committed to providing leading-edge, deployable products, produced with timely and consistently high quality
  Tens of thousands of boards shipped; hundreds of active customers
• Financially strong: profitable & growing
• Headquartered in Concord, New Hampshire, USA
  Engineering/sales offices in: Belfast, Northern Ireland (UK) (formerly EZ-DSP, acquired Sept. 2004); Leesburg, Virginia (Washington DC); Phoenix, Arizona
• 15 international distributors representing 38 countries

BittWare's Building Blocks
• High-end signal processing hardware (HW)
  Altera FPGAs & TigerSHARC DSPs
  High-speed I/O
  Board formats: CompactPCI (cPCI), PMC, PCI, VME, Advanced Mezzanine Card (AMC or AdvancedMC)
• Silicon & IP framework
  SharcFIN
  ATLANTiS
• Development tools
  BittWorks
  Trident
• Systems & services

BittWare Business Model & Markets
BittWare provides essential building blocks for innovative signal processing solutions at every stage of the OEM life-cycle.
COTS Products
• Signal processing HW: Altera FPGAs, TigerSHARC DSPs, high-performance I/O processing; PCI, PMC, cPCI, VME, and AdvancedMC (AMC) boards
• Silicon & IP frameworks: SharcFIN, ATLANTiS
• Development tools: BittWorks tools, function libraries, Trident MP-RTOE
Application-specific Products
• System integration
• Custom FPGA design & interfacing
• Development/deployment
• Tailored signal processing
• Specialized/custom I/O
• Application software integration/implementation
• Technology & intellectual property licensing
Markets
• Defense/Aerospace
• Communications
• High-end instrumentation
• Life sciences

Hardware Technology Overview
• Hybrid signal processing
• T2 family: SharcFIN, ATLANTiS, T2 boards (PCI, PMC, cPCI, VME)
• GT and GX family: FINe, GT boards (cPCI, VME), GX boards (AMC, VME)

Hybrid Signal Processing Concept
[Diagram: data flows from the input I/O interface through the FPGA (pre-processing, co-processing, post-processing) to the programmable DSP(s) and back out through the output I/O interface; inter-processor communications (IPC) link the FPGA and the DSPs.]

Hybrid Signal Processing Architecture
[Diagram: on the data plane the FPGA handles I/O interfacing (SerDes, LVDS pairs or single-ended DIO, RS232/422) and a memory module; IPC connects the FPGA to a cluster of four DSPs (#0–#3); on the control plane a host/control bridge provides a GigE command & control bus and a PCI bus, with 64 MB of flash.]

BittWare's T2 Board Family
TigerSHARC multiprocessing boards for ultra-high-performance applications, using a common architecture across multiple platforms and formats
• Clusters of 4 ADSP-TS201S DSPs @ up to 600 MHz
  14,400 MFLOPS per cluster
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS™ architecture: up to 8.0 GB/sec I/O
  2 links per DSP off-board @ 125 MB/sec each, routed via the FPGA to DIO/RocketIO SerDes
  Ring of link ports interconnected within the cluster
• SharcFIN ASIC (SFIN-201) providing:
  64-bit, 66 MHz PCI bus
  8 MB boot flash
  FPGA control interface
• PMC+ expansion site(s)
• Large shared SDRAM memory (up to 512 MB)
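As a sanity check on the headline figures in the T2 family slide above, the short C program below (not from BittWare's materials) re-derives the per-DSP operation rate and the aggregate link-port and RocketIO bandwidths from the quoted numbers; the assumption that "MFLOPS" means peak floating-point operations summed over the four-DSP cluster is mine.

#include <stdio.h>

int main(void)
{
    const double clk_mhz       = 600.0;    /* ADSP-TS201S clock          */
    const int    dsps          = 4;        /* DSPs per cluster           */
    const double mflops_quoted = 14400.0;  /* figure from the slide      */

    /* Implied FLOPs per cycle per DSP from the quoted cluster figure.   */
    double flops_per_cycle = mflops_quoted / (dsps * clk_mhz);
    printf("implied FLOPs/cycle/DSP : %.1f\n", flops_per_cycle);   /* 6.0 */

    /* Off-board link I/O: 2 links per DSP routed to the FPGA,
     * 125 MB/s transmit + 125 MB/s receive per link.                    */
    double link_gb_s = dsps * 2 * (125.0 + 125.0) / 1000.0;
    printf("link-port aggregate     : %.1f GB/s\n", link_gb_s);    /* 2.0 */

    /* RocketIO SerDes: 8 channels at ~250 MB/s in each direction.       */
    double serdes_gb_s = 8 * 2 * 250.0 / 1000.0;
    printf("RocketIO aggregate      : %.1f GB/s\n", serdes_gb_s);  /* 4.0 */
    return 0;
}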
T2 Architecture Block Diagram
[Block diagram: four TS201 DSPs (#1–#4) share a 64-bit, 83.3 MHz cluster bus with the SharcFIN SF201 (64-bit, 66 MHz PCI-to-local-bus bridge, 8-bit boot flash bus, 8 interrupts & 8 flags to the DSPs and FPGA) and SDRAM (SO-DIMM, up to 512 MB); the L0 and L1 link ports of each DSP (4 x L0, 4 x L1) connect to the ATLANTiS FPGA, which drives DIO (64–192 pins), SerDes, and RocketIO (8 channels).]
Notes:
• Up to 3 separate 64-pin DIO (Digital I/O) ports can be used to implement link ports, parallel buses, and/or other interconnects.
• The ATLANTiS FPGA implements link routing for 8 full-duplex link ports from the DSPs (2 from each DSP); it is configured and controlled via the SharcFIN, accessible from the TigerSHARCs and the host, and can also be used for pre/post-processing.
• Each link provides 125 MB/sec transmit and 125 MB/sec receive; total link I/O bandwidth = 2.0 GB/sec.
• The basic architecture is the same as before (Hammerhead & TS families) except that the two I/O links per DSP are routed (transferred) via the ATLANTiS FPGA.
• The SharcFIN-201 bridge provides a powerful, easy-to-use PCI/host command & control interface.
• 8 channels of RocketIO SerDes @ 2.5 GHz each; each channel provides ~250 MB/sec in both directions, for a total I/O bandwidth of 4.0 GB/sec; connected via two 4x Infiniband-type connectors to hardware or a backplane.

SharcFIN 201 Features
• 64/66 MHz PCI bus master interface (rev. 2.2)
  528 MB/sec burst
  460 MB/sec sustained writes (SF to SF)
  400 MB/sec sustained reads (SF to SF)
• Cluster bus interface to ADSP-TS201s @ 83.3 MHz
• Access to DSP internal memory & SDRAM from PCI
• 2 independent PCI bus-mastering DMA engines
• 6 independent FIFOs (2.4 KB total)
  2 for PCI target to/from DSP DMA (fly-by to SDRAM)
  2 for PCI target to/from DSP internal memory
  2 for PCI bus-mastering DMA to/from DSP DMA
• General-purpose peripheral bus
  8 bits wide, 22 address bits, 16 MB/sec
  Reduces cluster bus loading, increasing cluster bus speed
  Accessible from the DSP cluster bus & PCI bus
  Flash interface for DSP boot & non-volatile storage
• I2O V1.5 compliant
• I2S serial controller
• Programmable interrupt & flag multiplexer
  10 inputs; 7 outputs
  1 input / 1 output dedicated to PCI
• Extensive SW support via BittWorks HIL & DSP21k

SharcFIN-201 (SFIN-201) Block Diagram
[Block diagram of the SFIN-201 bridge.]

What is ATLANTiS?
A generic FPGA framework for I/O, routing & processing
• An I/O routing device in which every I/O can be dynamically connected to any other I/O
  Like a software-programmable 'cable' – but better!
  ATLANTiS provides communication between the TigerSHARC link ports and all other I/Os connected to the FPGA/board
  Off-board I/O is defined by the board architecture
  Communication can be point-to-point, or broadcast to various outputs
  Devices can be connected or disconnected as requirements dictate without recompiling or changing cables
• A configurable FPGA pre/post/co-processing engine
  Standard IP blocks
  Customer/custom developed blocks

T2 ATLANTiS Detail Diagram
[Block diagram: the ATLANTiS FPGA sits between the four TigerSHARC TS-201 DSPs (L0/L1 link ports) and the external I/O — RocketIO SerDes, PMC+ 64-bit DIO port, off-board 64-bit DIO port (optional), cluster-to-cluster 64-bit DIO port (optional) — with connections to SDRAM, the peripheral control bus, and the SharcFIN on the TigerSHARC cluster bus / PCI bus. External I/O & connectors depend on the specific board implementation.]

8 x 8 ATLANTiS Switch Diagram
[Diagram: an 8 x 8 switch with eight 128-bit inputs (IN 0–IN 7) and eight 128-bit outputs (OUT 0–OUT 7), set up through configuration registers on the switch control bus.]
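The 8 x 8 switch above is steered through configuration registers that either the host or a DSP can rewrite at run time. The C sketch below illustrates that idea only; the one-register-per-output layout is invented for this illustration (a plain array stands in for the memory-mapped register block) and is not BittWare's actual ATLANTiS register map.

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the switch's configuration register block: one 32-bit
 * source-select register per output.  On real hardware this would be a
 * memory-mapped region written over PCI (host) or the peripheral bus
 * (DSP); the layout here is assumed for illustration. */
static volatile uint32_t atlantis_out_sel[8];

/* Route switch input 'in' (0-7) to switch output 'out' (0-7). */
static void atlantis_route(unsigned out, unsigned in)
{
    atlantis_out_sel[out] = in;      /* select which input feeds this output */
}

int main(void)
{
    /* Example: feed a SerDes receive channel (say IN 4) to a DSP link-port
     * transmitter (say OUT 0), then later route the same input to a
     * co-processing block (say OUT 2) without recompiling or recabling.   */
    atlantis_route(0, 4);
    atlantis_route(2, 4);

    for (unsigned o = 0; o < 8; o++)
        printf("OUT %u <- IN %u\n", o, (unsigned)atlantis_out_sel[o]);
    return 0;
}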
Other Major ATLANTiS Components
[Diagram: besides the 8 x 8 switch, the framework provides matched transmit and receive components — a SerDes transmitter/receiver with packet-protocol logic, a TS201 link-port transmitter/receiver with link-port interface circuits, and a null transmitter/receiver with null interface circuits — each with output or input FIFO buffers, plus pre-processing, post-processing, and co-processing blocks and a through-routing block. Co-processing blocks must be used as a pair to the same endpoint.]

ATLANTiS Put Together
[Diagram: the 8 x 8 switch and its configuration registers inside the ATLANTiS FPGA now route all links, DIO, and SerDes — the SerDes channels (with CDR links), the PMC+ 64-bit DIO port, the off-board and cluster-to-cluster 64-bit DIO ports (optional), SDRAM, the peripheral control bus, and the L0/L1 link ports of the four TigerSHARC TS-201 DSPs on the cluster bus with the SharcFIN and PCI bus.]

How is ATLANTiS Used?
FPGA Configuration
1) BittWare standard implementations (loads)
   Works out-of-the-box (doesn't require any FPGA design capabilities)
   Fixed interfaces & connections define the switch I/Os
   A variety of I/O configuration options are available with the boards
2) Developer's kit
   Fully customizable (by BittWare and/or the end user)
   All component cores included in the kit
   Requires FPGA development tools & design capabilities
Run-Time Set-up and Control
1) Powerful, easy-to-use GUI (Navigator)
   Sets up any and all possible routings
2) Use the DSP or host to program the control registers
   Initial configuration
   Change routing at any time by re-programming the control registers

ATLANTiS Configurator
[Screenshot of the ATLANTiS configurator GUI.]

T2 Board Family
T2PC: Quad PCI TigerSHARC board
T2PM: Quad PMC TigerSHARC board
T26U: Octal 6U cPCI TigerSHARC board
T2V6: Octal 6U VME TigerSHARC board

T2-PCI Features
• One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at 600 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS architecture: up to 4.0 GB/sec I/O
  - Eight external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes transceivers, PMC+, DIO headers
• Two link ports per DSP dedicated to interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC site with PMC+ extensions for BittWare's PMC+ I/O modules
• 64 MB–512 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation

T2PC Block Diagram
[Block diagram: four TS201 DSPs on the 64-bit, 83.3 MHz cluster bus with the SharcFIN SF201 (64-bit, 66 MHz PCI local bus behind a PCI-PCI bridge to the PCI connector, boot flash on the 8-bit bus, 8 interrupts & 8 flags) and SDRAM (SO-DIMM, up to 512 MB); the 4 x L0 and 4 x L1 link ports feed the Virtex-II Pro FPGA, which connects to a 64-signal DIO header, two 20-signal DIO headers, SerDes/RocketIO (8 channels), and the PMC+ site (J4); JTAG header and external power are also shown.]
T2PM Features
• One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at up to 600 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS architecture: up to 4.0 GB/sec I/O
  - Eight external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes transceivers, PMC+, DIO header
• Two link ports per DSP dedicated to interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC format with BittWare's PMC+ extensions
• 64 MB–256 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation

T2PM Block Diagram
[Block diagram: four TS201 DSPs on the 64-bit, 83.3 MHz cluster bus with the SharcFIN SF201 (64-bit, 66 MHz PCI local bus to the PMC J1–J3 connectors, boot flash on the 8-bit bus, 8 interrupts & 8 flags) and SDRAM (up to 256 MB); the 4 x L0 and 4 x L1 link ports feed the Virtex-II Pro FPGA, which connects to front-panel SerDes/RocketIO (8 channels), the PMC+ connector (J4), and an optional JTAG header.]

T26U cPCI Features
• Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS architecture: up to 6.0 GB/sec I/O
  - Sixteen external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes transceivers, PMC+, DIO (cross-cluster)
• Two link ports per DSP dedicated to interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Two PMC sites with PMC+ extensions for BittWare's PMC+ I/O modules
• 128 MB–512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation

T26U Block Diagram
[Block diagram: two mirrored clusters (A and B), each with four TS201 DSPs on a 64-bit, 83.3 MHz cluster bus, a SharcFIN SF201, boot flash, SDRAM (up to 256 MB), and a Virtex-II Pro FPGA driving high-speed SerDes, RocketIO (4 channels per FPGA), rear-panel DIO (64 signals per cluster), and a PMC+ site (A and B); both SharcFINs sit on 64-bit, 66 MHz PCI local buses joined to the cPCI 64/66 backplane through PCI-PCI bridges; a JTAG header is provided.]

T2 6U VME/VXS Features
• Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS architecture: up to 8.0 GB/sec I/O
  - Sixteen external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes transceivers, PMC+, DIO (cross-cluster)
• Two link ports per DSP for the interprocessor ring
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Tundra TSI-148 PCI-VME bridge with 2eSST support
• VITA-41 VXS switched-fabric interface
• PMC site with PMC+ extensions for BittWare's PMC+ I/O modules
• 128 MB–512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation
T2V6 Block Diagram
[Block diagram: two clusters (A and B), each with four TS201 DSPs on a 64-bit, 83.3 MHz cluster bus, a SharcFIN SF201, boot flash, SDRAM (up to 256 MB), and an FPGA driving high-speed SerDes and RocketIO (4 channels); the VME-PCI bridge connects the 64-bit, 66 MHz PCI local bus to VME64/2eSST and the P2 user pins, and the VXS/P0 interface carries 8 SerDes channels; a PMC+ site, JTAG header, and factory options are also shown.]

T2V6 Heat Frame
[Photos: T2V6 heat frame, shown transparent and assembled.]

T2V6 Thermal Model
[Thermal model plot of the T2V6.]

BittWare Levels of Ruggedization (vibration and shock per MIL-STD-810E)
• Commercial (air-cooled): operating 0 C to 50 C with 300 lin.ft/min airflow; storage -55 C to 100 C; random vibration 0.01 g^2/Hz, 15 Hz to 2 kHz; shock 20 g peak sawtooth, 11 ms duration; no conformal coating; humidity 0 to 95%, non-condensing
• Level 1 (air-cooled): operating -40 C to 75 C with 300 lin.ft/min airflow; storage -55 C to 100 C; random vibration 0.04 g^2/Hz, 15 Hz to 2 kHz; shock 20 g peak sawtooth, 11 ms duration; no conformal coating; humidity 0 to 95%, non-condensing
• Level 1c (air-cooled): operating -40 C to 75 C with 300 lin.ft/min airflow; storage -55 C to 100 C; random vibration 0.04 g^2/Hz, 15 Hz to 2 kHz; shock 20 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100%, condensing
• Level 2c (conduction-cooled): operating -40 C to 75 C at the thermal interface; storage -55 C to 100 C; random vibration 0.1 g^2/Hz, 15 Hz to 2 kHz; shock 40 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100%, condensing
• Level 3c (conduction-cooled): operating -40 C to 85 C at the thermal interface; storage -55 C to 100 C; random vibration 0.1 g^2/Hz, 15 Hz to 2 kHz; shock 40 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100%, condensing

Hardware Technology Overview
• PMC+ extensions
• Barracuda high-speed 2-channel ADC
• Tetra high-speed 4-channel ADC

BittWare PMC+ Extensions
• BittWare's PMC+ boards are an extension of the standard PMC specification (user-defined J4 connector)
• Provides tightly coupled I/O and processing to BittWare's DSP boards:
  Hammerhead family: 4 links, serial TDM, flags, IRQs, reset, I2C
  TS family: 4 links, flags, IRQs, reset, I2C
  T2 family: 64 signals, routed as 32 differential pairs to ATLANTiS; standard use is 4 links, plus flags and IRQs
• Can be customized for 3rd-party PMCs

Barracuda PMC+ Features
• 2-channel 14-bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system performance)
• AC (transformer) or DC (op-amp) coupled options
• 64-bit, 66 MHz bus-mastering PCI interface via SharcFIN
• 64 MB–512 MB SDRAM for large snapshot acquisitions
• Virtex-II 1000 FPGA, reconfigurable over PCI
  Used for A/D control and data distribution
  Configurable preprocessing of high-speed A/D data, such as digital filtering, decimation, digital down conversion, etc.
  Developer's kit available with VHDL source code
  Optional IP cores and integration from 3rd parties for DDR/DDC/SDR/comms applications
  Plethora of other IP cores available
• PMC+ links (4) in FPGA, configurable for use with Hammerhead or Tiger PMC+ carrier boards
• Internal/external clock and triggering
• Optional oven-controlled oscillator / high-stability clock
• Onboard programmable clock divider & decimator
• Large snapshot acquisition to SDRAM (4K–256M samples)
  1 ch @ 105 MHz, or 2 ch @ 75 MHz
• Continuous acquisition
  2 ch @ 105 MHz to TigerSHARC links
  1 ch @ 105 MHz or 2 ch @ 52.5 MHz to PCI (system dependent)
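A quick sizing sketch for the Barracuda snapshot mode just described, assuming each 14-bit sample is stored as a 16-bit word and that the 4K–256M sample budget is the total across channels (neither assumption is stated on the slide):

#include <stdio.h>

int main(void)
{
    const double max_samples     = 256.0 * 1024 * 1024;  /* 256M-sample snapshot */
    const double bytes_per_sample = 2.0;                 /* 14 bits held in 16   */

    printf("SDRAM needed for a full snapshot: %.0f MB\n",
           max_samples * bytes_per_sample / (1024 * 1024));       /* 512 MB */

    /* Capture duration: 1 channel at 105 MHz, or 2 channels at 75 MHz. */
    printf("1 ch @ 105 MHz : %.2f s of signal\n", max_samples / 105e6);
    printf("2 ch @ 75 MHz  : %.2f s of signal\n", (max_samples / 2.0) / 75e6);
    return 0;
}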
Barracuda PMC+ Block Diagram
[Block diagram of the Barracuda PMC+.]

Tetra PMC+ Features
• 4-channel 14-bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system performance)
• DC (op-amp) coupled
• 32-bit, 66 MHz bus-mastering PCI interface via SharcFIN
• Cyclone II 20/35/50 FPGA, reconfigurable over PCI
  Used for A/D control and data distribution
  Configurable preprocessing of high-speed A/D data, such as digital filtering, decimation, digital down conversion, etc.
  Developer's kit available with VHDL source code
  Optional IP cores and integration from 3rd parties, including DDC
• PMC+ links (4) in FPGA, configurable for use with TigerSHARC/ATLANTiS
• Internal/external clock and triggering
  Can source the clock for chaining
• Onboard programmable clock divider & decimator

Tetra PMC+ (TRPM) Block Diagram
[Block diagram: four AD6645 ADC channels (Ch.0–Ch.3, 14 bits @ 105 MHz) feed the Cyclone II FPGA (EP2C20/35), which drives links 0–3, flags/interrupts, and the user-defined pins (P4 connector), plus the 32-bit, 66 MHz PCI bus on the PMC interface (P1–P3 connectors); clocking comes from an onboard XO and clock driver with external clock in and trigger in / clock out (factory option).]
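The configurable preprocessing in the Tetra's Cyclone II matters because the raw four-channel ADC rate far exceeds what the 32-bit/66 MHz PCI interface can move. The arithmetic below uses the figures from the feature list; the 16-bit-per-sample storage format is an assumption.

#include <stdio.h>

int main(void)
{
    const double channels = 4;
    const double fs_hz    = 105e6;    /* samples/s per channel            */
    const double bytes    = 2;        /* 14-bit sample held in 16 bits    */
    const double pci_peak = 264e6;    /* 32-bit x 66 MHz PCI, bytes/s     */

    double raw = channels * fs_hz * bytes;
    printf("raw ADC output : %.0f MB/s\n", raw / 1e6);       /* 840 MB/s */
    printf("PCI peak       : %.0f MB/s\n", pci_peak / 1e6);  /* 264 MB/s */
    printf("minimum decimation to fit PCI: %.1fx\n", raw / pci_peak);
    return 0;
}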
Hardware Technology Overview
• New FINe
• New ATLANTiS

FINe Host Interface Bridge
[Block diagram: the FINe bridge (Cyclone™ II with a NIOS II processor on an Avalon bus) separates the host/control side (control plane) from the signal processing side (data plane). Control-plane interfaces: GigE and 10/100 Ethernet MAC/PHY, two UARTs with RS232/422 PHYs, a 1x PCI Express bridge, boot flash on a peripheral interface, and SDRAM. Data-plane interfaces: a 32-bit, 66 MHz PCI bus interface, a 64-bit, 83 MHz cluster bus interface, a link port to ATLANTiS, and a flag/interrupt multiplexer carrying 8 interrupts and flags to/from the DSPs (2 per DSP).]

New ATLANTiS – Putting It All Together
[Block diagram: the new ATLANTiS FPGA keeps the 8 x 8 switch and configuration registers but adds a DDR controller for a memory module, a DMA engine, co-processing blocks, and a cluster bus interface; SerDes channels (with CDR links), a 64-bit DIO port, and the L0/L1 link ports of the four TigerSHARC TS-201 DSPs are all routed by the switch, with the SharcFINe/FINe bridge on the PCI bus and TigerSHARC cluster bus.]

New Product Families
• B2 family: B2AM
• GT family: GT3U-cPCI, GTV6-VITA41/VXS
• GX family: GXAM

B2AM Features
• Full-height, single-wide AMC (Advanced Mezzanine Card)
• ATLANTiS/ADSP-TS201 hybrid signal processing cluster
• Altera Stratix II FPGA for I/O routing and processing
• 4 ADSP-TS201S TigerSHARC® DSP processors at up to 600 MHz
  - 57.5 GOPS (16-bit) or 14.4 GFLOPS (floating point) of DSP processing power
• Fat Pipes & Common Options interfaces for data & control
• Module management controller implementing IPMI
  - Monitors temperature and power usage of major devices
  - Supports hot swapping
• SharcFINe bridge providing GigE and PCI Express
• ATLANTiS provides Fat Pipes switch fabric interfaces:
  - Serial RapidIO™
  - PCI Express
  - GigE, XAUI™ (10 GigE)
• System synchronization via AMC system clocks
• Front-panel I/O
  - 10/100 Ethernet
  - LVDS & general-purpose digital I/O
  - JTAG port for debug support
  - Fiber-optic transceiver @ 2.5 GHz (optional)
• Booting of DSPs and FPGA via Flash non-volatile memory

B2-AMC Block Diagram
[Block diagram: four ADSP-TS201S TigerSHARCs (#0–#3) connect via their link ports to the ATLANTiS FPGA (Stratix II EP2S60, 90, or 130) and to the SharcFINe bridge; the AMC edge connector (B+) carries 1x PCIe, GigE (Bx), system clocks, and the Fat Pipes network interface through a SerDes quad PHY (PM8358) supporting sRIO, PCIe, ASI, GigE, and XAUI (front or back); an MMC (ATmega16) provides IPMI with temperature monitoring; the front panel offers 10/100 Ethernet, RS-232, 24-bit GP-DIO, 11 LVDS (5 Rx, 5 Tx, 1 clock), a JTAG header, LEDs, a switch, and an optional fiber transceiver; FLASH boots the DSPs and FPGA.]

GT Cluster Architecture
[Block diagram: the ATLANTiS FPGA (Stratix II GX 2SGX90/130) routes a memory module (1 GB of DDR2 or 64 MB of QDR), 32 LVDS pairs or 64 single-ended DIO, SerDes transceivers, and an RS232/422 interface; the four TigerSHARC TS-201 DSPs (#0–#3) sit on a 64-bit, 100 MHz cluster bus and connect to the FPGA via their link ports; the SharcFINe bridge provides GigE, 64 MB of flash, and a local PCI bus (32-bit, 66 MHz).]

BittWare Memory Module (BMM)
• Convection- or conduction-cooled
• 67 mm x 40 mm
• 240-pin connector
• 160 usable signals (plus 80 power/ground)
  - Capability to address TBytes
• Can be implemented today as:
  - 1 bank of SDRAM up to 1 GB (x64)
  - 2 banks of SDRAM up to 512 MB each (x32)
  - 1 bank of SRAM up to 64 MB (x64)
  - 1 bank of SDRAM up to 512 MB (x32) and 1 bank of SRAM up to 32 MB (x32)
[Photos: top and back sides, showing the 240-pin connector to the carrier.]

GT3U Features (GT 3U cPCI)
• Altera® Stratix® II GX FPGA for I/O, routing, and processing
• One cluster of four ADSP-TS201S TigerSHARC® DSPs
  - 57.5 GOPS 16-bit fixed-point, 14.4 GFLOPS floating-point processing power
  - Four link ports per DSP: two routed to the ATLANTiS FPGA, two routed for interprocessor communications
  - 24 Mbits of on-chip RAM per DSP; static superscalar architecture
• ATLANTiS architecture
  - 4 GB/s of simultaneous external input and output
  - Eight link ports @ up to 500 MB/s routed from the on-board DSPs
  - 36 LVDS pairs (72 pins) comprised of 16 inputs and 20 outputs
  - Four channels of high-speed SerDes transceivers
• BittWare Memory Module
  - Up to 1 GB of on-board DDR2 SDRAM or 64 MB of QDR SDRAM
• BittWare's SharcFINe PCI bridge
  - 32-bit/66 MHz PCI
  - 10/100 Ethernet
  - Two UARTs, software-configurable as RS232 or RS422
  - One link port routed to ATLANTiS
• 64 MB of flash memory for booting of DSPs and FPGA
• 3U CompactPCI form factor – air-cooled or conduction-cooled
• Complete software support

GT3U Block Diagram
[Block diagram: the ATLANTiS FPGA (Stratix II GX 2SGX90/130) connects the memory module (1 GB of DDR2 or 64 MB of QDR), 36 LVDS pairs (72 pins; 16 in, 20 out) on the J2 user-defined connector, SerDes transceivers to a 4x Infiniband-type connector, and an RS232/422 interface; the four TigerSHARC TS-201 DSPs (#0–#3) sit on a 64-bit, 100 MHz cluster bus and connect to the FPGA via link ports; the SharcFINe bridge provides 10/100 Ethernet, 64 MB of flash, local PCI (32-bit, 66 MHz), and the CompactPCI 32-bit, 66 MHz bus on the J1 connector.]

GTV6 Block Diagram
[Block diagram: two GT clusters (A and B), each with four TigerSHARC TS-201 DSPs on a 64-bit, 83.3 MHz cluster bus, an ATLANTiS Stratix II GX 2SGX90 FPGA, a SharcFINe bridge, a memory module (1 GB of DDR2 or 64 MB of QDR), 64 MB of flash, and a PMC+ site; the Tsi148 VME-PCI bridge connects the local PCI bus (64-bit/66 MHz) to VME64 with 2eSST (P1 & P2); high-speed serial ports (SerDes) route to the VXS/VITA 41 (P0) interface and a 4x Infiniband-type SerDes connector, with GigE Ethernet and P2 user-defined pins as factory configurations. Available Q2 2007. PMC front-panel I/O is only available on air-cooled versions.]
GT3U/GTV6 BittWare Levels of Ruggedization
[Same ruggedization levels as shown earlier for the T2 family: Commercial, Level 1, and Level 1c air-cooled; Level 2c and Level 3c conduction-cooled.]

GXAM Features
• Mid-size, single-wide AMC (Advanced Mezzanine Card)
  Common Options region: port 0 GigE; ports 1, 2 & 3 connect to BittWare's ATLANTiS framework
  Fat Pipes region has eight ports: ports 4–11 configurable to support Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
  Rear-panel I/O has eight ports (8 LVDS in, 8 LVDS out)
  System synchronization via AMC system clocks (all connected)
• High-density Altera Stratix II GX FPGA (2S90/130)
  BittWare's ATLANTiS framework for control of I/O, routing, and processing
• BittWare's FINe bridge provides control-plane processing and interfaces
  GigE, 10/100 Ethernet, and RS-232
• Over 1 GByte of bulk memory
  Two banks of DDR2 SDRAM (up to 512 MBytes each)
  One bank of QDR2 SRAM (up to 9 MBytes)
• Front-panel I/O
  10/100 Ethernet, RS-232, JTAG port for debug support, 4x SerDes providing Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
• BittWare I/O module
  72 LVDS pairs, 4x SerDes, clocks, I2C, JTAG, reset
• Booting of FINe and FPGA via flash
Available Q2 2007

GXAM Block Diagram (Preliminary)
[Block diagram: the Stratix II GX FPGA (EP2SGX90/130), running the ATLANTiS framework, connects two banks of DDR2 SDRAM (up to 512 MB each, 32-bit), one bank of QDR2 SRAM (up to 9 MB, 36-bit), 76 LVDS pairs (38 in & 38 out) and an optional 4x Infiniband-type SerDes connector to a front-panel I/O module (which can span the whole width of the AMC front panel), 16 LVDS pairs (8 in & 8 out) of rear-panel I/O, and the Fat Pipes ports 4–11 (sRIO, PCIexp, GigE, XAUI; additional SerDes on the 2SGX130 only); the FINe bridge handles Common Options ports 0–3 (port 0 GigE on the Bx edge connector), 10/100 Ethernet, RS-232, flash, a JTAG header, and a 32-bit control bus; the MMC (ATmega16) provides IPMI with temperature monitoring; system clocks, I2C, JTAG, reset, LEDs, and a switch are also shown. Available Q2 2007.]

IFFM Features – Preliminary
The IFFM is an IF transceiver in a Front-panel Module (FM) format. Combined with a GXAM, it forms an integrated IF/FPGA interface & processing AMC board.
• 2 channels of high-speed (HS) ADCs (AD9640: 14-bit, 150 MHz) with good SFDR specs (target is 80 dB)
  Dual package to better synchronize channels
  Fast detect output (non-pipelined upper 4 bits) helps with AGC control
• 2 channels of HS DACs (AD9777: 16-bit; 400 MHz)
  Dual package to better synchronize channels
  Built-in up-conversion interpolation of 1x, 2x, 4x, and 8x
• High-performance clock generation via PLL/VCO (AD9516)
  Inputs a reference clock (e.g. 10 MHz) from the front panel or baseboard
  Generates programmable clocks for the HS-ADCs and HS-DACs
  Sources a reference clock to the baseboard (for system distribution)
• General-purpose (GP) 12-bit ADCs & DACs
  GP-ADCs can be used for driving AGC on the RF front-end
  GP-DACs can be used for other utility signals such as GPS, positions, ...
Available Q3 2007

IFFM Block Diagram – Preliminary
[Block diagram: a dual HS-ADC (AD9640, 14-bit, 150 MHz; Rx 1/Rx 2 inputs, 14-bit output bus, fast detect for AGC) and a dual HS-DAC (AD9777, 16-bit; 160 MHz; Tx 1/Tx 2 outputs, 16-bit input bus) connect through the baseboard connector, along with command/status and GP SPI ADC/DAC interfaces; the clock generator (PLL/VCO) takes a reference clock input from the front panel or baseboard and provides a reference clock output. Available Q3 2007.]

Software Technology Overview
• BittWorks
• TS Libs
• Trident MPOE
• GEDAE

Software Products
• Analog Devices family development tools
  VisualDSP++: C++, C, assembler, linker, debugger, simulator, VDK kernel
  JTAG emulators (ADI / White Mountain)
• BittWorks
  DSP21k Toolkit (DOS, Windows, Linux & VxWorks)
  VDSP Target
  Remote VDSP Target & DSP21k Toolkit via Ethernet (combined in the 8.0 Toolkit)
  Board support packages/libraries & I/O GUIs
  SpeedDSP (ADSP-21xxx only – no TS)
  FPGA Developer's Kits
  Porting kit
• Function libraries
  TS-Lib Float, TS-Lib Fixed
  Algorithmic design, implementation, & integration
• Real-time operating systems
  BittWare's Trident
  Enea's OSEck
• Graphical development tools
  GEDAE
  MATLAB/Simulink/RTW

Software Products Diagram
[Diagram of how the software products fit together.]

DSP21k-SF Toolkit
• Host Interface Library (HIL)
  Provides a C-callable interface to BittWare boards from the host system
  Download, upload, board and processor control, interrupts
  Symbol-table aware, converts DSP-based addresses
  Full-featured, mature application programming interface (API)
  Supports all BittWare boards, including FPGA and I/O
• Configuration Manager (BwConfig)
  Find, track, and manage all BittWare devices in your system
• Diag21k – command-line diagnostic utility
  All the power of the HIL at a command prompt
  Built-in scripting language with conditionals and looping
  Assembly-level debug with breakpoints
  stdio support (printf, etc.)
• BitLoader
  Dynamically load FPGAs via the PCI bus (or Ethernet)
  Reprogram the FPGA configuration EEPROM
• DspBAD/DspTest
  Automated diagnostic tests for PCI, onboard memory, DSP memory & execution
• DspGraph
  Graphing utility for exploring board memory (Windows only)

BittWare Target
• Software debug target for VisualDSP++
  VisualDSP++ source-level debugging via the PCI bus
  Supports most features of the debugger
• Only software target for COTS SHARC boards
  Other board vendors require a JTAG emulator for VisualDSP debug
• Multiprocessor debug sessions on all DSPs in a system
  Any processor in the system can be included in a debug session
  Not limited to the board-level JTAG chain
• Virtually transparent to the application
  No special code, instrumentation, or build required
  Uses a maximum of only 8 words of program memory – user-selectable location
• Some restrictions compared to JTAG debug
  For very low-level debugging (e.g. interrupt service routines), an ICE is still nice
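To give a feel for the Host Interface Library workflow described above, here is a hedged host-side sketch. The function names and signatures (bw_open, bw_load, bw_run, bw_read_symbol) are hypothetical stand-ins, stubbed so the example compiles on its own; the real HIL API differs, but the flow — open a board, download a DSP executable, run it, read results back by symbol — is what the toolkit provides.

#include <stdio.h>

typedef int bw_board;                                   /* stand-in board handle */
static bw_board *bw_open(int idx)                 { static bw_board b; (void)idx; return &b; }
static int bw_load(bw_board *b, int dsp, const char *dxe)
{ (void)b; (void)dsp; (void)dxe; return 0; }
static int bw_run(bw_board *b, int dsp)           { (void)b; (void)dsp; return 0; }
static int bw_read_symbol(bw_board *b, int dsp, const char *sym,
                          float *dst, unsigned n)
{ (void)b; (void)dsp; (void)sym; for (unsigned i = 0; i < n; i++) dst[i] = 0.0f; return 0; }
static void bw_close(bw_board *b)                 { (void)b; }

int main(void)
{
    float result[1024];
    bw_board *board = bw_open(0);              /* first BittWare board in the system */

    bw_load(board, 0, "fir_demo.dxe");         /* download a DSP executable (name is
                                                  hypothetical)                        */
    bw_run(board, 0);                          /* release the DSP and let it run       */
    /* ... wait on an interrupt or poll a flag here ... */
    bw_read_symbol(board, 0, "output_buffer", result, 1024);  /* symbol-aware upload   */

    printf("first output sample: %f\n", result[0]);
    bw_close(board);
    return 0;
}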
Remote Toolkit & Target
Allows remote code development, debug, & control
• Client-server using RPC (remote procedure calls)
  Server on the system with BittWare hardware in it (Windows, Linux, VxWorks)
  Client on a Windows machine connected via TCP/IP to the server
• Run all BittWare tools on a remote PC via Ethernet
  Diag21k, configuration manager, DspGraph, DspBad, Target
  Great for remote technical support
• Run all user applications on a remote PC
  Just rebuild the user app with the Remote HIL instead of the regular HIL
• Run VisualDSP++ debug sessions on a remote PC
  No need to plug in a JTAG emulator
  Don't need Windows on the target platform!
• Toolkit 8.0 combines Remote and standard DSP21k-SF
  Allows you to access boards in the local machine and a remote machine
  No need to rebuild the application to use a remote board

Board Support Libraries & Examples
• All boards ship with board support libraries & examples
  Actual contents specific to each board
  Provides an interface to the standard hardware
  Examples of how to use key features of the board
  Same code as used by BittWare for validation & production test
  Examples include: PCI, links, SDRAM, FLASH, UART, utilities, ...
  Royalty-free on BittWare hardware
• Source provided for user customization
  Users may tailor to their specific needs
  Hard to create a "generic" optimal library as requirements vary greatly
• PCI library for all DSP boards
  Bus-mastering DMA read/write
  Single-access read/write
• Windows GUIs for all I/O boards
  Allow the user to learn board control and operation
  IOBarracuda, AdcPerf

FPGA Developer's Kits
• For users customizing FPGAs on BittWare boards
  Source for standard FPGA loads or examples
  Royalty-free on BittWare hardware
  Mainly VHDL with some schematic (usually top level)
  Uses standard Xilinx (ISE Foundation) and Altera (Quartus) tools
• B2/T2 ATLANTiS FPGA Developer's Kit
  TS-201 link transmit and receive
  ATLANTiS switches
  Control registers on the peripheral bus (TigerSHARC- and PCI-accessible)
  Digital I/O
  SerDes I/O (Aurora, SerialLite; Serial RapidIO in the works)
  Pre/post/co-processing shells

TS-Libs
Hand-optimised, C-callable TigerSHARC function libraries
Floating Point Library: over 450 optimised 32-bit floating-point signal processing routines, with over 200 extra striding versions
Integer Library: over 100 optimised 32-bit integer routines, with over 80 extra striding versions
Fixed Point (16-bit) Library: over 120 optimised 16-bit fixed-point signal processing routines
• Fastest, most optimised library for TS (up to 10x faster than C)
• Uses the latest algorithm theory
• Well documented, easy to use, and proven over a wide user base
• Allows customers to focus on the application (not the implementation)
• Supported & maintained by highly experienced TS programmers
• Additional routines & functions available upon request

TS-Libs Function Coverage
• FFTs & DCTs: 1- & 2-dimensional, real/complex
• Filters: convolution, correlation, IIR, FIR
• Trigonometric
• Vector mathematics
• Matrix mathematics
• Logic-test-sort operations
• Statistics
• Windowing functions
• Compander
• Distribution and pseudo-random number generation
• Scalar/vector log/cubes, etc.
• Memory move (matrix/vector)
• Other routines: Doppler, signal-to-noise density, Cholesky decomposition

TS-Libs vs. VisualDSP++ run-time comparison:
Routine                        Input Length     VDSP Run-time    TS-Lib     % Faster
Real Vector and Vector Add.    1,000                1,273            776       64.0
Real Vector and Vector Mult.   1,000                1,273            776       64.0
Complex Vector Addition        1,000                2,766          1,526       81.3
Complex Vector Mult.           1,000                3,012          2,526       19.2
Complex Vector Dot Product     1,000                3,022          2,039       48.2
Complex Matrix Addition        (100,100)           25,030         12,713       96.9
Real Vector Mean               1,000                1,431          1,045       36.9
FIR                            20 & 10,000        202,534        104,420       94.0
Real Cross Correlation         1,000 & 1,000    1,145,056        260,821      339.0
Real Convolution               1,000 & 1,000    2,513,531        874,567      187.4
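Because TS-Libs routines are C-callable, dropping one into application code is just a function call. The sketch below shows the shape of such a call; the routine name tslib_vecvec_mult_f and its signature are illustrative placeholders rather than the library's actual identifiers, and a plain-C fallback body is supplied so the example runs anywhere.

#include <stdio.h>

/* Fallback with the same shape as a typical vector*vector routine:
 * out[i] = a[i] * b[i] for i in [0, n).  On a TS201 this call would be
 * the hand-optimised assembly routine from TS-Libs instead. */
static void tslib_vecvec_mult_f(const float *a, const float *b,
                                float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

int main(void)
{
    enum { N = 1000 };                 /* same length as in the benchmark table */
    static float a[N], b[N], y[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 0.5f; }

    tslib_vecvec_mult_f(a, b, y, N);
    printf("y[999] = %f\n", y[999]);   /* 499.5 */
    return 0;
}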
Software Technology Overview
Trident Multi-Processor Operating Environment

BittWare's Trident – MPOE
Multi-Processor Operating Environment
• Designed specifically for BittWare's TigerSHARC boards
• Built on top of Analog Devices' VDK
• Provides an easy-to-use 'Virtual Single Processor' programming model
• Optimized for determinism, low latency, & high throughput
• Trident's 3 prongs:
  Multi-tasking: multiple threads of execution on a single processor
  Multi-processor: transparent coordination of multiple threads on multiple processors in a system
  Data flow management: managing high-throughput, low-latency data transfer throughout the system

Why is Trident Needed?
• Ease of programming
  Multiprocessor DSP programming is complicated
  Many customers don't have this background/experience
• Higher-level tool integration
  Need underlying support for higher-level software concepts (CORBA, MPI, etc.)
• Lack of alternatives
  Most RTOSs focus on control and single processors, not data flow and multiprocessing
  VDK is multiprocessor-limited: multiprocessor messaging but limited to 32 DSPs, no multiprocessor synchronization, limited data flow management

Transparent Multiprocessing
• The key feature Trident provides is transparent multiprocessing
• Allows the programmer to concentrate on developing threads of sequential execution (a more traditional programming style)
• Provides messaging between threads and synchronization of threads across processor boundaries transparently
• The programmer does not need to know where a thread is located in the system when coding
• Tools allow defining the system configuration and partitioning threads onto the available processors (at build time)
• Similar to the "Virtual Single Processor" model of Virtuoso/VSPWorks

Trident Threads
• Multiple threads spread over single or multiple processors
  Allows the user to split the application into logical units of operation
  Provides a more familiar linear programming style, i.e. one thread deals with one aspect of the system
  Locate threads at build time on the appropriate processors
• Priority-based preemptive scheduler (per processor)
  Multiple levels of priority for threads
  Round robin (time slice) or run to completion within a level
  Preemption between levels based on a system event (e.g. an interrupt)
• Synchronization & control of threads
  Messaging between threads within a processor or spanning multiple processors
  Semaphores for resource control, available for access anywhere in the system

Trident Runtime
• Device drivers for underlying board components
• Framework: message-passing core responsible for addressing, topology, and boot-time synchronization, to support up to 65k processors
• Initial modules: CDF, MPSync, MPMQ
• Optional modules: future functionality, user expansion
• User API

Trident Modules – CDF
• Continuous Data Flow module provides raw link-port support
• Suitable for device I/O at the system/processing edge, e.g. an ADC
• Simple-to-use interface for reading and writing data blocks across link ports
• Supports single data block transfers, vector data block transfers, and continuous data block transfers
• User-supplied call-back
• Mix-and-match approach
API: Trident_RegisterCallbackFunction, Trident_UnregisterCallbackFunction, Trident_Write, Trident_Read, Trident_WriteV, Trident_ReadV, Trident_WriteC, Trident_ReadC
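A sketch of what a CDF edge-I/O loop might look like using the API names listed above. The argument lists are guesses made for illustration (Trident's real prototypes are defined by its headers), and the functions are stubbed so the sketch compiles on its own.

#include <stdio.h>

typedef void (*cdf_callback)(void *buf, unsigned words);

/* Hypothetical signatures, stubbed for this illustration only. */
static int Trident_RegisterCallbackFunction(int link, cdf_callback cb)
{ (void)link; (void)cb; return 0; }
static int Trident_Read(int link, void *buf, unsigned words)
{ (void)link; (void)buf; (void)words; return 0; }
static int Trident_Write(int link, const void *buf, unsigned words)
{ (void)link; (void)buf; (void)words; return 0; }

static void block_done(void *buf, unsigned words)     /* user-supplied call-back */
{
    printf("block of %u words ready at %p\n", words, buf);
}

int main(void)
{
    unsigned raw[256], processed[256];

    Trident_RegisterCallbackFunction(0, block_done);  /* link port 0: ADC side     */
    Trident_Read(0, raw, 256);                        /* single data block transfer */
    for (unsigned i = 0; i < 256; i++)                /* trivial stand-in processing */
        processed[i] = raw[i];
    Trident_Write(1, processed, 256);                 /* pass downstream on link 1  */
    return 0;
}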
Trident Modules – MPSync
• Multiprocessor synchronization
• Synchronization methods are essential in any distributed system to protect shared resources or coordinate activities
• Allows threads to synchronize across processor boundaries
• Semaphores: counting and binary
• Barriers: a simple group synchronization method

Trident Modules – MPMQ
• Multiprocessor message queues
• Provides messaging between threads anywhere in the system, transparently
• Extends the native VDK channel-based messaging into multiprocessor space
• Provides point-to-point and broadcast capabilities
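A conceptual sketch of how the MPSync and MPMQ services combine: a worker thread waits at a barrier and then posts a result to a message queue that a collector thread on another DSP drains. Every identifier here is invented for the illustration and stubbed; Trident's actual multiprocessor API differs.

#include <stdio.h>

typedef struct { int id; }               mpsync_barrier;   /* stand-in types */
typedef struct { int id; }               mpmq_queue;
typedef struct { int src; float value; } msg_t;

static void mpsync_barrier_wait(mpsync_barrier *b)     { (void)b; }
static void mpmq_send(mpmq_queue *q, const msg_t *m)   { (void)q; (void)m; }
static void mpmq_recv(mpmq_queue *q, msg_t *m)         { (void)q; m->src = 0; m->value = 0.0f; }

static mpsync_barrier start_of_frame = { 1 };
static mpmq_queue     results        = { 7 };

/* Runs on each worker DSP (placement chosen at build time, not in code). */
static void worker_thread(int my_id)
{
    mpsync_barrier_wait(&start_of_frame);       /* group sync across DSPs       */
    msg_t m = { my_id, 42.0f };                 /* ... computed result ...      */
    mpmq_send(&results, &m);                    /* location-transparent send    */
}

/* Runs on whichever DSP the collector thread was partitioned onto. */
static void collector_thread(int workers)
{
    for (int i = 0; i < workers; i++) {
        msg_t m;
        mpmq_recv(&results, &m);
        printf("result from worker %d: %f\n", m.src, m.value);
    }
}

int main(void)        /* single-process stand-in for the multi-DSP system */
{
    worker_thread(0);
    collector_thread(1);
    return 0;
}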
VDSP++ IDE Integration
• Trident plugin fully integrated within VDSP++
• Configures:
  The boards and their interconnections
  The VDK projects
  Any Trident objects
• Builds the configuration files
• Configures the VDK kernel to support the Trident runtime

Trident – to Market
• Beta released Summer 2006
• First full release November 7
• Pricing: ~$10k per project (max 3 developers) when purchased with BittWare hardware
  Royalty-free on BittWare hardware
• 30-day trials available

Trident – Future Directions
• Extend debug and config tools
• Add support for buses (cluster, PCI)
• Add support for switch fabrics (RapidIO, ?)
• Incorporate FPGAs as processing elements
  "Threads" located in FPGAs as sources/sinks for messaging
• Port to other processors
  Trident is designed to use only basic features of a kernel, so it could be ported to other platforms and kernels

BittWare's Gedae BSP for TigerSHARC

What is Gedae? (What Gedae says Gedae is)
• Graphical Entry Distributed Application Environment – originally developed by Martin Marietta (now Lockheed Martin) under DARPA's RASSP initiative to 'abstract' the HW-level implementation
• A graphical software development tool for signal processing algorithm design and implementation on real-time embedded multiprocessor systems
• A tool designed to reduce software development costs and build reusable designs
• A tool that can help analyze the performance of the embedded implementation and optimize it to the hardware

System Development in Gedae
1) Develop the algorithm so that it runs on the workstation
   - A tool for algorithm development
   - Design hardware-independent systems
   - Design reusable components
2) Implement systems on the embedded hardware
   - Port designs to any supported hardware
   - Re-port to new hardware

Designing Data Flow Graphs (DFG)
• Basic Gedae interface: design systems from standard function units in the hardware-optimized embeddable library
• Function blocks represent the function units (FFT, sin, FIR, etc.)
• Optimized routines/blocks form the Gedae "e_" library
  200 routines taken from TS-Libs for the BittWare BSP
• The underlying code that each function block calls for execution is called a Primitive (written similarly to C)
• Create sub-blocks to define your own function units (add them to the e_ library for component reuse)
• Connecting lines represent the token streams; the underlying communications are handled by the hardware BSP

Gedae Data Communications
• Uses data flow via token streams
• Communication is handled when a transfer crosses hardware boundaries
  Scalar values (or structures), vectors, matrices

Run-time Schedules
Static scheduling
• The execution sequence and memory layout are specified by the DFG
• A schedule boundary is forced by dynamic queues
Dynamic (run-time) scheduling
• Static schedule boundaries are forced when variable token streams are only determined at run time
• Queues are used to separate two static schedules when this occurs
• Functions require a defined number of tokens to run
• A branch, valve, merge, or switch affects the token flow
• Produces one static schedule for each part separated by a queue
  [In the graph view, a black square indicates a queue.]

Run-time Schedules – Memory Usage
• One of the primary resources available on a DSP is memory
• Memory scheduling dramatically reduces the amount of memory used by a static schedule
• Gedae offers memory packer modes:
  No packer: Gedae uses different memory for each output (wasteful)
  With packing, memory is reused when a function is finished
  Other packers trade off the time to pack against the optimality of packing
  [Memory-usage chart: static schedule plotted vertically, memory used horizontally.]

Create Parallelism in a DFG
• In a simple flow graph, function blocks can be distributed across multiple processors
• A "family" of function blocks can be distributed across multiple processors
• Families create multiple instances of a function block, which can express parallelism
• Gedae treats family members as separate function blocks (referenced with a vector index)

Partitioning a Graph
Partitioning a graph to multiple processors
• To run the function blocks on separate processors, partition the DFG into parts
• A separate executable is created for each part
• Partitions are independent of schedules
• Gedae creates a static schedule for each partition
• Extensive group controls facilitate management of partitions

Visualization Tools: Trace Table
Gedae has powerful visualization tools to view the timings of the processor schedules (receive, operation, send, blocked states), the trace details of a given function, and parallel DSP operation across processors.
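A plain-C illustration (not Gedae syntax) of the "family" parallelism and partitioning ideas above: one logical function block is instantiated several times, each instance handling its own slice of the token stream as if it had been placed on its own processor at build time.

#include <stdio.h>

#define FAMILY 4                     /* e.g. one instance per TS201 in a cluster */
#define TOKENS 16

/* The "primitive" each function-block firing would call. */
static float scale_primitive(float token) { return 2.0f * token; }

int main(void)
{
    float in[TOKENS], out[TOKENS];
    for (int i = 0; i < TOKENS; i++) in[i] = (float)i;

    /* Family member m processes the m-th slice of the stream; Gedae would
     * place each member on a processor at build time rather than in code. */
    for (int m = 0; m < FAMILY; m++) {
        int lo = m * (TOKENS / FAMILY), hi = lo + TOKENS / FAMILY;
        for (int i = lo; i < hi; i++)
            out[i] = scale_primitive(in[i]);
        printf("family[%d] handled tokens %d..%d\n", m, lo, hi - 1);
    }
    printf("out[15] = %f\n", out[15]);   /* 30.0 */
    return 0;
}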
BittWare's Gedae BSP for TigerSHARC
What does the BittWare Gedae BSP Provide?
• Optimized routines for the Gedae embeddable "e_" library
  200 TS-Libs functions – more can be ported if needed
• Memory handler
• Communication methods
  Support for HW capabilities: link & cluster bus
• Multi-DSP board capability
  Up to 128 clusters
• Networking support
  Development and control of a distributed network of BittWare boards, with remote debug capabilities
• BSP support backed by over 12 man-years of TigerSHARC expertise

BSP Data Transfer Methods
- SHARED_WORD (cluster bus word-sync transfers)
- SHMBUF (cluster bus buffered transfers)
- LINK (link port transfers)
- DSA_LINK (DMA over the link ports)
- DSA_SHMBUF (DSA DMA over the cluster bus)

Data Transfer Rates – Shared Memory
[Chart: dsa_shmbuf data transfer rate (best_send_ready), MBytes/sec vs. data transfer size from 32 to 8192.]
• Hardware max rate: 666.4 MBytes/second
• For 1k data packets: 450 MBytes/second (on-board)

Data Transfer Rates – Link Ports
[Chart: dsa_link data transfer rate (best_send_ready), MBytes/sec vs. data transfer size from 32 to 16384.]
• Hardware max rate: 250 MBytes/second
• For 1k data packets: 230 MBytes/second

Gedae/BSP Summary
• Gedae provides portable designs for embedded multi-DSP
  Scheduling, communication, and memory handling are provided
  Optimized functions are provided for each supported board
• BittWare's Gedae BSP for TigerSHARC:
  Allows Gedae to target BittWare's TigerSHARC boards
  Compiles onto multiple DSPs (up to 8 per board)
  Compiles to multiple boards (currently up to 128 boards)
  Optimized TigerSHARC library of functions
  Multiple communication methods (with efficient, high data rates)
  Removes the need for TigerSHARC specialist engineering

Additional Slides/Info

Demo Description
• Dual B2-AMC hybrid signal processing boards
  2S90 Stratix II FPGA
  Quad TigerSHARC DSPs
• FINe control interface via GigE
• ATLANTiS framework
  Reconfigurable data routing
  'Patch-able' processing
• 4x Serial RapidIO endpoint implemented in the FPGA
  12.5 Gb/s inter-board transfer rates; 10 Gb/sec max payload rate
  90% efficiencies
• MicroTCA-like "Pico Box"

Demo Hardware
BittWare's B2-AMC; CorEdge's PicoTCA

Demo System Architecture
[Diagram: a CorEdge Pico Box with a power, IPMI, & Ethernet module and an Ethernet hub (RJ45) hosts two BittWare B2-AMCs — each with four TS201 DSPs, a FINe (2S60), and an ATLANTiS FPGA (2S90) — connected to each other over 4x Serial RapidIO through their SerDes quad PHYs, with GigE control out to a PC/laptop.]

ATLANTiS – B2
[Diagram: inside a B2-AMC, the ATLANTiS FPGA (Stratix II 90/130) links the four TigerSHARC TS-201 DSPs (L0/L1 link ports), the FINe (2S60) on the cluster bus with GigE, front-panel I/O, GMII, and the SerDes quad PHY (PMC-Sierra).]

ATLANTiS – SRIO
[Diagram: the same B2-AMC fabric with switch 1 routing traffic through the Serial RapidIO endpoint in the ATLANTiS FPGA.]

ATLANTiS – Connecting to FPGA Filters
[Diagram: the same fabric with switch 2 routing data through filter blocks implemented in the ATLANTiS FPGA.]
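A rough transfer-time estimate using the rates quoted in this deck: the Gedae BSP's dsa_link path at about 230 MB/s for 1 KB packets, and the demo's 4x Serial RapidIO at a 10 Gb/s payload rate with roughly 90% efficiency. The 4 MB buffer size is an arbitrary example.

#include <stdio.h>

static double ms_for(double bytes, double bytes_per_sec)
{
    return 1e3 * bytes / bytes_per_sec;
}

int main(void)
{
    const double buf      = 4.0 * 1024 * 1024;       /* 4 MB example buffer       */
    const double link_bps = 230e6;                    /* dsa_link, 1 KB packets    */
    const double srio_bps = 0.90 * (10e9 / 8.0);      /* ~90% of 10 Gb/s payload   */

    printf("4 MB over dsa_link : %.1f ms\n", ms_for(buf, link_bps));   /* ~18.2 ms */
    printf("4 MB over 4x SRIO  : %.1f ms\n", ms_for(buf, srio_bps));   /* ~3.7 ms  */
    return 0;
}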