Trading Dynamics
• 20-25% CAGR in market volumes
• Competitive advantage hinges on speed, transparency, and proximity to data sources.
• The application must be in the data path, seamlessly
• Quest to balance risk/compliance with performance

HPC on Wall Street - 2012

10GbE Switches for the Virtualized Data Center, but a software company at the core
• >1300 Customers
• >325 Employees
• Profitable, self-funded, pre-IPO network infrastructure provider
• Open Linux-based OS
• Fully automated testing and SW development

Arista Application Switch - 7124FX
• Couples ultra-low latency switch with next generation programmable FPGA and memory subsystem
• Customer programmable FPGA and Control Plane provides total control over the network, forwarding, inspection, redirection, etc.
• Targeted for early adopters of hardware accelerated applications such as risk analysis, data arbitrage, order routing

Exegy believes…
• Exegy believes in continually challenging the status quo of market data delivery systems and trading platforms.
  – First to market with hardware-accelerated market data appliances based on FPGA technology.
  – Best of breed solutions for major use cases faced by low-latency, high-capacity consumers of financial market data feeds.
• Exegy believes that delivery and consumption of quality market data should be as easy and painless as possible.
  – Fully managed and constantly monitored appliances to assure optimal performance and the best customer experience.
  – A passion to help our customers succeed in the face of escalating complexity and the increasing demands placed on them.

Impulse C, Custom FPGA-Accelerated Solutions for the Arista 7124FX
Brian Durwood, Co-founder

Converting C to multiple streaming hardware processes ain’t that hard.
• Focus on reducing clock cycles
• Verify as you go
• Iterate, iterate, iterate (no “magic button”)

The tool flow is a bit awkward for first timers:
• Visual Studio or equivalent
• Impulse C co-development, analysis & compile
• Altera Quartus II for place & route into FPGA

Things you can do to get up to speed quickly:
• Work from known good SW modules
• Get up-front training or factory engineering

Programming With Impulse C
• C-language for FPGA programming
  – Not a new language
  – Based on standard ANSI C
  – For embedded and HPC applications
  – Supports standard C development tools
  – Supports multi-process partitioning
• A software-to-hardware compiler
  – Optimizes C code for parallelism
  – Generates HDL, ready for FPGA synthesis
  – Also generates hardware/software interfaces
• Purpose
  – Describe hardware accelerators using C
  – Move compute-intensive functions to FPGAs
[Figure: flow from C language applications to three outputs: generate accelerator hardware (HDL files, Arista’s on-board FPGA), generate hardware interfaces, generate software interfaces (C software libraries).]

www.ImpulseAccelerated.com
www.ImpulseC.com

Reference slides from hereafter

Converting C to Multiple Streaming Hardware Processes

FPGAs - Advantages Over Software
• Massive parallelism
  – At system level, loop level, instruction level
• One FPGA can replace multiple CPUs
  – For specific tasks/algorithms, using much lower power
• No need for separate NIC card
  – Enable in-line processing at near line speed
• Minimize OS interference in filtering
  – Especially during high transaction load events
  – Reduces jitter and other interference
• Offloads standard CPUs with customized pre-processors
  – e.g. select limited analysis of X message types that meet X criteria for X symbols

3 Popular FPGA Configurations
• Usage 1: Create a hardware module (generated hardware module in the FPGA)
• Usage 2: Accelerate an embedded CPU (embedded CPU core plus generated hardware accelerators in the FPGA)
• Usage 3: Accelerate an external/host CPU or computing cluster (host processor or cluster plus generated hardware accelerators in an FPGA coprocessor)

Configurations Can Be Combined
Combining streaming, embedded processor, and host processor
[Figure: 10G Ethernet feeding an FPGA that holds stream processing and parsing, matching algorithm and strategy, host message generation, an embedded CPU for configuration, and embedded and shared RAM.]
FPGA strategies can be coded using C for hardware and for embedded CPU, with shared RAM for hash table lookup or other local data.

Impulse C Programming Model
[Figure: C-language H/W processes and S/W processes connected to one another.]
Communicating C-Language Processes
• Supports dataflow and message-based communications
• Supports parallelism at the application level and at the level of individual processes
• Allows simulation and debugging of parallel software processes.
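The idea of communicating C-language processes joined by dataflow streams can be tried out in ordinary C before any FPGA tooling is involved. The sketch below is only a software model under stated assumptions: `msg_t`, `fifo_t`, the `fifo_*` helpers, and `run_pipeline` are hypothetical names invented for illustration and are not the Impulse C API, and the three stages run one after another rather than in parallel. It shows a producer, a message-type filter, and a consumer that talk to each other only through streams.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message record; field names are illustrative only. */
typedef struct {
    char     type;    /* message type code, e.g. 'A' or 'T' */
    uint32_t symbol;  /* numeric symbol identifier */
    uint32_t price;
} msg_t;

/* Hypothetical software stand-in for a hardware stream: a bounded FIFO
   plus an end-of-stream flag. This is not the Impulse C API. */
#define FIFO_DEPTH 64

typedef struct {
    msg_t  slot[FIFO_DEPTH];
    size_t head, count;
    int    eos;       /* set when the writer closes the stream */
} fifo_t;

void fifo_open(fifo_t *f)  { f->head = 0; f->count = 0; f->eos = 0; }
void fifo_close(fifo_t *f) { f->eos = 1; }

/* Returns 1 if the message was queued, 0 if the FIFO is full. */
int fifo_write(fifo_t *f, msg_t m) {
    if (f->count == FIFO_DEPTH) return 0;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = m;
    f->count++;
    return 1;
}

/* Returns 1 and fills *m if a message was available, 0 otherwise. */
int fifo_read(fifo_t *f, msg_t *m) {
    if (f->count == 0) return 0;
    *m = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Process 1: push the input messages into the stream, then close it. */
void producer(fifo_t *out, const msg_t *src, size_t n) {
    fifo_open(out);
    for (size_t i = 0; i < n; i++) {
        int ok = fifo_write(out, src[i]);
        assert(ok);  /* this sequential model has no back-pressure */
        (void)ok;
    }
    fifo_close(out);
}

/* Process 2: forward only messages of the wanted type. */
void filter(fifo_t *in, fifo_t *out, char wanted_type) {
    msg_t m;
    fifo_open(out);
    while (fifo_read(in, &m)) {
        if (m.type == wanted_type) {
            int ok = fifo_write(out, m);
            assert(ok);
            (void)ok;
        }
    }
    assert(in->eos);  /* the writer must have closed the stream */
    fifo_close(out);
}

/* Process 3: drain the stream into dst; returns the number stored. */
size_t consumer(fifo_t *in, msg_t *dst, size_t max) {
    size_t n = 0;
    msg_t m;
    while (n < max && fifo_read(in, &m)) dst[n++] = m;
    return n;
}

/* Runs the three stages back to back; real hardware would run them
   concurrently, each stage working on a different message. */
size_t run_pipeline(const msg_t *src, size_t n, char wanted_type,
                    msg_t *dst, size_t max) {
    fifo_t a, b;
    producer(&a, src, n);
    filter(&a, &b, wanted_type);
    return consumer(&b, dst, max);
}
```

Calling `run_pipeline` with a small array of messages and a wanted type returns how many messages passed the filter and copies them to `dst`. Because each stage touches only its own streams, the same decomposition is the kind of partitioning that could later be mapped to separate hardware processes.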
Parallelism via Multiple Processes
• Spatial parallelism
• Temporal parallelism (system-level pipelining)
[Figure: C processes arranged side by side (spatial) and in a chain (temporal).]

An Impulse C Process
[Figure: a C process with stream inputs and outputs, signal inputs and outputs, register inputs and outputs, shared memory block reads/writes, and App Monitor outputs.]
• Multiple methods of process-to-process communications are supported
• Processes are independently synchronized

Compile and Optimize
• Optimize the results using interactive tools
  – Pipeline analysis
  – Loop unrolling
  – Instruction scheduling
• Generate FPGA hardware
  – VHDL or Verilog
  – Low level interfaces to memory, I/O and busses
• ModelSim test bench

Debug and Verify
• Use C tools for application debugging
  – Source-level debuggers
  – C-language testing
• Test and analyze parallel dataflow with the Impulse Application Monitor
• Automatically generate VHDL or Verilog testbenches

Constructs Familiar to C Programmers
Concept is similar to getc(), putc() in C for I/O

  co_stream_create      Used in configuration
  co_stream_open        Open the stream (clear eos)
  co_stream_close       Close the stream (set eos)
  co_stream_eos         Check end of stream (eos)
  co_stream_read        Read from stream (with rdy, en)
  co_stream_write       Write to stream (with rdy, en)
  co_stream_read_nb     Non-blocking read (no rdy)
  co_stream_write_nb    Non-blocking write (no rdy)

Credible Solution
In use by multiple confidential, NDA-covered financial teams

Impulse Platform Support Package
[Figure: FPGA containing an embedded processor, memory resources, host interfaces, Ethernet, other I/O, and the FPGA fabric processing core that Impulse CoDeveloper™ produces.]
PSP generates HW/SW wrappers between FPGA core & system elements
• Extensions (scripts and wrapper generators)
• Platform-specific library functions
• Documentation and tutorials
• Current ready to run examples for platform
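Loop unrolling, listed under Compile and Optimize, is easier to picture with a small example. The sketch below is plain C written by hand, assuming a byte-summing loop as the workload; `sum_rolled` and `sum_unrolled4` are hypothetical names, and nothing here claims to be what the Impulse C optimizer actually emits. It shows the same computation in rolled form and unrolled by four, where the four partial sums have no dependence on each other and so are candidates for side-by-side hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one addition per loop iteration. */
uint32_t sum_rolled(const uint8_t *buf, size_t n) {
    assert(buf != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += buf[i];
    }
    return acc;
}

/* Unrolled by four: each iteration updates four independent partial
   sums, which parallel hardware could in principle compute together.
   A short tail loop handles lengths that are not a multiple of four. */
uint32_t sum_unrolled4(const uint8_t *buf, size_t n) {
    assert(buf != NULL || n == 0);
    uint32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += buf[i];
        a1 += buf[i + 1];
        a2 += buf[i + 2];
        a3 += buf[i + 3];
    }
    uint32_t acc = a0 + a1 + a2 + a3;
    for (; i < n; i++) {
        acc += buf[i];
    }
    return acc;
}
```

Both functions return the same value for any input, which is the property to verify "as you go" when restructuring a loop: keep the rolled version as the known-good reference and compare the restructured one against it.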
Examples of FPGA processing:
• Financial feed kernel bypass or full hardware-based trading
  – Direct handling of financial feeds
  – Parsing incoming feeds and triggering outbound orders: your strategy in hardware
• Normalization or Protocol Conversion
  – Gateway sending a sub-feed of data
• Pre-Trade Risk Checking
  – Low Latency Broker Dealer Compliance
• Financial valuations
  – Co-processor off-loading for Monte Carlo and other algorithms

Stand-Alone Feed Handling Solution
Usage 3
[Figure: 1G or 10G Ethernet MAC connected through an RX Adapter (Verilog) to the Feed Handler and Outbound UDP block (Impulse C), and back out through a TX Adapter (Verilog).]

Network Processing Pipeline
[Figure: inside the FPGA, a 1/10GigE MAC, Enet Filter, UDP Parser and/or TCP/IP Stack, Custom Filtering Application, Embedded CPU, and Host I/O Interface; on the host system, Driver, User Application, and Host Memory.]
UDP and TCP/IP implemented directly in FPGA hardware for low latency

Complex Order Support
[Figure: an FPGA or FPGA-based board that replaces a generic UDP NIC, sitting between exchanges and user algorithms and trading applications.
 Incoming: exchanges, feed handlers, and order data sources (RMDS, Bloomberg, and custom) over 10 Gb/s Ethernet; Impulse UDP/TCP direct connection adapters without OS processing; standard and custom feed handler formats, e.g. ITCH, OUCH, OPRA, BATS; decompression and decryption; ultra-fast pattern matching across feeds; normalizing; produce sub-feed; pull and present opportunities; apply trade logic.
 Outgoing: revert feed to exchange formats; hardwire potential X required responses; message management; trade with exchanges; data filtering; analytics; insert risk limitations awaiting confirm; manage risk.]

Three Ways To Get Started
• Learn the tools
  – Acquire an Impulse CoDeveloper license.
  – Work from the included reference designs.
  – Experiment with ways to optimize your algorithms to run efficiently as multiple streaming processes in FPGA.
• Turn Key System (“Bump in the Wire”)
  – License above + UDP or other network attached FPGA-enabled reference design.
  – FPGA-based accelerator platform.
  – Impulse factory engineers to help get your system on line.
• Turn Key System Running A Target Algorithm
  – License above + Turn Key System above + Impulse Engineers, under NDA, refactor your target algorithm(s) for efficient compilation to FPGA.
  – Impulse Engineers train your team on how the refactoring works.

About Impulse
• Most widely used C to FPGA tool
• Pure ANSI C
  – No PAR or HW statements inserted
• Founded in 2002
  – By part of the original ABEL team

Additional Resources
• Engineering consultation: info@ImpulseAccelerated.com
• Tutorials: www.ImpulseAccelerated.com/Tutorials
• Book: Practical FPGA Programming in C
• www.ImpulseAccelerated.com

Arista Application Switch - Systems Design
Compute, Storage, Memory, I/O, Application Acceleration - Together

Platform Details
[Figure: chassis callouts: Console Port, Clock Input, Air Vents, USB Port, Management Port, 16 Base SFP/SFP+ Ports, 8 FX SFP/SFP+ Ports.]
• 24 Wirespeed 1G/10G SFP/SFP+ Ports
• High Availability:
  – Dual Hot-swappable Power Supplies
  – Multiple Hot-swappable Fan Units
• Designed for Data Center + Colocation:
  – Flexible Front-to-Rear or Rear-to-Front Airflow
  – Choice of AC or DC Power Supplies

Application Switching for Cloud Networks

Arista Application Switch - 7124FX
Ultra Low Latency 24 port 10GbE Switch
• 16 10GbE ports connected to LLE ASIC
• 8 10GbE ports connected through Stratix V FPGA
• Built in 50GB SSD
• Optional Chip-Scale Atomic Clock and External Clock Source

Application Switch Markets
• Financial Services: Broker/DMA, Market Data, HFT/Algo, Exchanges
• Government: Signals Intelligence, Link Encryption, Distributed Lawful Intercept
• HPC & Medical: Diagnostic Imaging, Telemedicine, Data Filtering
• Telecom: Video Broadcasting, Transcoding, Network Security

Financial Services Applications

  Inline Risk Analysis                 Low Latency Broker Dealer Compliance
  Feed Handling and A/B Arbitration    Offload line arbitration to dramatically improve application performance
  Real-time Data Analysis              Instrument transaction performance at high resolution
  Algorithmic Trading                  Reducing system latency increases performance of trading strategies
  Order Protocol Conversion            Convert or normalize multiple order entry formats to a common format
  Order Execution Routing              Set order policies for best execution

March 19, 2011

Developing on the Application Switch
• Full Custom: Customer programs the FPGA subsystem. Arista provides software that validates HW and implements the FPGA for the 8 ports.
• Outsourced Custom: Customer outsources development to an Arista verified partner like Impulse C or Enyx, who develops the custom capability.
• Off the Shelf: Customer purchases a prebuilt application such as Exegy to run on the Application Switch.

Application Switch Development Partners
Complete integrated appliance model
• Novasparks: 100% hardware market data solution
• Exegy: Appliance-based robust ticker plant
System integrators and development support
• Impulse C: C to RTL tools
• Enyx: Customer trading solutions and IP blocks

Arista Application Switch 7124FX
• A new category of product that provides a network accelerated platform for high performance app vendors to develop on
• Combining a true network switch, with full routing and switching protocols, with fully programmable hardware creates a new market for the most demanding applications
• Application logic inserted into real-time environments with complete transparency
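The A/B arbitration item under Financial Services Applications can be illustrated with a short sketch. It assumes the common arrangement in which the same sequence-numbered messages arrive on two redundant lines and the consumer keeps whichever copy shows up first. The names `arbiter_t`, `arbiter_init`, and `arbiter_accept` are hypothetical, and the policy is deliberately simple: a message that arrives after a later sequence number has already been forwarded is treated as stale and dropped, which a production arbiter might handle differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arbiter state shared by both incoming lines. */
typedef struct {
    uint64_t next_seq;  /* lowest sequence number not yet forwarded */
    uint64_t gaps;      /* sequence numbers skipped so far; a late copy
                           of a skipped message is dropped as stale */
} arbiter_t;

void arbiter_init(arbiter_t *a, uint64_t first_seq) {
    assert(a != NULL);
    a->next_seq = first_seq;
    a->gaps = 0;
}

/* Call once per message received on either line.
   Returns 1 if the message should be forwarded downstream,
   0 if it is a duplicate (or stale) and should be dropped. */
int arbiter_accept(arbiter_t *a, uint64_t seq) {
    assert(a != NULL);
    if (seq < a->next_seq) {
        return 0;                     /* already seen on the other line */
    }
    a->gaps += seq - a->next_seq;     /* count any numbers jumped over */
    a->next_seq = seq + 1;
    return 1;
}
```

In use, every packet from line A and line B is passed through `arbiter_accept` with its sequence number, and only the calls that return 1 reach the application. The per-message work is one comparison and two additions, which is the sort of small, fixed logic that suits being placed in the data path.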