Using Layer 2 Ethernet For High-Throughput, Real-Time Applications
High-Performance Embedded Computing (HPEC) Conference – September 24, 2008
Robert Blau and Dr. Ian Dunn
Mercury Computer Systems, Inc.
© 2008 Mercury Computer Systems, Inc. www.mc.com
Preliminary Information, Subject to Change

Packets Over Ethernet (POET) Topics
• Ethernet Trends
• POET Overview
• POET Rationale
• POET Results
• Conclusion

Problem/Opportunity Statement
• Never bet against Ethernet
    Ethernet's ability to prevail over seemingly superior technologies has made this into a networking axiom
    • Seamless interoperation at 1, 10, 40, & 100 Gbps standards
    • Location-agnostic operation – applications can reside wherever optimal, depending on performance, productivity, and life-cycle considerations
    • Low cost of ownership
• Ethernet is the catalyst behind the growing pervasiveness of compute clusters
    Corporations/individuals looking for highest performance, most productive, and cost-efficient compute platforms to help translate research into value, deployed capabilities
    • Spawning new repositories of innovative software-based capabilities that rely on Ethernet as the communications substrate
    • Clustering/networking will become dominant components of any good system engineering toolkit
• Ethernet is the antithesis of real time
    • Unbounded latency, power consumption
    • Industry-wide reliance on inefficient protocols in both software and hardware from a real-time perspective

No Longer an Inferior Technology
Catching up rapidly to embedded fabrics

Impressive 10GbE Switch ASIC Port Counts
[Chart: 10GbE switch ASIC port count (0 to 70) by year, 2002 to 2012; data labels 4, 16, 26, 48, 64]

POET
• Layer 2 Solutions Make Sense
    • Ride the Ethernet technology curve
    • ATAoE, FCoE, IBoE
• POET is Architected with Real-Time Requirements
• Simple Principles:
    • Encapsulate an existing packet protocol using L2 frames
    • Add H/W-based guaranteed delivery
    • Add H/W-based traffic policing and shaping
        • Priorities
        • Flow control
        • Adaptive routing
        • Channel bonding

POET Internals
[Block diagram; labels: PCIe Edge Interface, PCIe I/F, Output Queue, Input Queue, Tunnel Logic, Tunnel Bypass, Bypass, MAC, 3.125/6.25 Gig Phy, Ethernet Tunnel]

TCP/IP Ethernet Format
Byte offset   Fields (4 bytes per row)
0             Ethernet Preamble (PRE)
4             Ethernet Preamble (PRE) | Ethernet Start Delimiter (SFD)
8             Ethernet Destination MAC Address (DA)
12            Ethernet Destination MAC Address (DA) | Ethernet Source MAC Address (SA)
16            Ethernet Source MAC Address (SA)
20            Ethernet Type – VLAN = 0x8100 | PRI | 0 | VLAN ID (VID)
24            Ethernet Type (IP) | IP Header
28            IP Header
32            IP Header
36            IP Header
40            IP Header
44            IP Header | TCP Header
48            TCP Header
52            TCP Header
56            TCP Header
60            TCP Header
64            TCP Header | Data
…             Data
N             Ethernet Frame CRC

POET Format
Example of grouped packets encapsulated in the layer 2 Ethernet format – a small write packet and a read request in the same number of bytes as a TCP/IP header
Byte offset   Fields (4 bytes per row)
0             Ethernet Preamble (PRE)
4             Ethernet Preamble (PRE) | Ethernet Start Delimiter (SFD)
8             Ethernet Destination MAC Address (DA)
12            Ethernet Destination MAC Address (DA) | Ethernet Source MAC Address (SA)
16            Ethernet Source MAC Address (SA)
20            Ethernet Type – VLAN = 0x8100 | PRI | 0 | VLAN ID (VID)
24–28         Ethernet Type (POET) | Flow Control | Payload Type | Ver | SeqID | Header CRC
32            Packet 0 Write Header
36            Packet 0 Header
40            Packet 0 Data
44            Packet 0 Data
48            Packet 0 Data
52            Packet 0 CRC
56            Packet 1 Read Header
60            Packet 1 Header (No Data)
64            Ethernet Frame CRC

Why Use Layer 2 Ethernet for Transport?
• High Performance
    • Layer 2 avoids TCP/IP overhead
    • Latency and throughput very competitive with embedded fabrics
• Complete Functionality
    • Ability to provide all necessary functions and services
    • Scalable to 64K+ nodes
    • Multicast capability using VLANs
• Robust
    • Lossless transport layered on top in hardware
    • No TCP/IP (S/W Stack) overhead
    • Hardware-based proactive traffic policing and shaping to deal with contention and priority
• Compatible with Regular Networking Protocols
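
The POET Format slide groups several tunneled packets behind a single VLAN-tagged layer 2 header. A minimal Python sketch of that idea follows. It is not Mercury's implementation: the slides give no value for the POET EtherType and no field widths for the POET header, so `ETHERTYPE_POET` is a placeholder (an IEEE local experimental EtherType), the tunneled packets are treated as opaque bytes, and the names `vlan_tci`, `l2_header`, and `build_frame` are hypothetical. The preamble and SFD are left out because the MAC hardware normally adds them, and minimum-frame padding is ignored.

```python
import struct
import zlib

TPID_VLAN = 0x8100       # 802.1Q tag type shown on the slide
ETHERTYPE_POET = 0x88B5  # ASSUMPTION: placeholder, IEEE local experimental EtherType


def vlan_tci(pri: int, vid: int) -> int:
    """Combine the 3-bit priority and 12-bit VLAN ID; the bit between them stays 0."""
    if not 0 <= pri <= 7:
        raise ValueError("priority must fit in 3 bits")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return (pri << 13) | vid


def l2_header(dst: bytes, src: bytes, pri: int, vid: int) -> bytes:
    """DA, SA, VLAN tag, and EtherType: 18 bytes in total."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    tag = struct.pack("!HHH", TPID_VLAN, vlan_tci(pri, vid), ETHERTYPE_POET)
    return dst + src + tag


def build_frame(dst: bytes, src: bytes, pri: int, vid: int,
                packets: list[bytes]) -> bytes:
    """Group several tunneled packets into one frame and append the frame CRC."""
    body = l2_header(dst, src, pri, vid) + b"".join(packets)
    # The Ethernet frame CRC is the same CRC-32 that zlib computes,
    # appended least significant byte first.
    return body + struct.pack("<I", zlib.crc32(body))


frame = build_frame(
    dst=bytes.fromhex("020000000002"),
    src=bytes.fromhex("020000000001"),
    pri=3,
    vid=5,
    packets=[b"W" * 24, b"R" * 8],  # hypothetical stand-ins for a write and a read request
)
print(len(frame), frame[12:18].hex())
```

Keeping the tunneled packets opaque mirrors the "encapsulate an existing packet protocol" principle: the layer 2 wrapper does not need to understand the packets it carries.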

Key Requirements
• Logical Layer Semantics
    • Memory ops: writes, reads, atomic ops
    • Messaging ops: mailbox, doorbell
• Performance
    • 32+ Gbps point-to-point
    • Lightweight protocol
    • Channel bonding capability
    • Rapidly improving switch ASIC data rates
• Latency
    • 1 µs end-to-end
    • Minimal S/W involvement
    • Cut-through operation
    • Rapidly improving switch ASIC latencies
        • Fewer hops as # switch ports increase
        • Faster switch latencies

Key Interconnect Features
• Reliability
    • Guaranteed delivery
    • Datagram (send and forget)
• Contention Management
    • Per-flow bandwidth control
    • Adaptive routing capability
• Security
    • DestinationID and address translation for memory region protection
    • Switch-based ACL, DOS features
• Interoperability
    • POET, TCP/IP, UDP, FCoE…

Mercury’s Communications Model
[Diagram labels: Global Information Grid, IPv6 Backbone, Gateway, External Networking, Internal System Flows, External I/O Flows, External/Internal Management, I/O Planes, Comm. Plane, Co-Processor Plane, Data Plane, Control Plane, Sensor Plane, Management Plane]

Nominal Prioritization Scheme
Priority 3 – Traffic Class: Sensor I/O, Management
    Acquires, conditions, and forwards resulting digitized data to data plane for segmentation, re-assembly, processing, and/or storage. Failure to service this communications plane properly adversely affects all downstream processing.
Priority 2 – Traffic Class: Data, Co-Processing
    Transforms digitized data into information symbols and exploits those symbols to create information. As an adjunct to the data plane, encompasses data communication and processing used in the acceleration of data plane functions; as such, it must have an elevated priority over data.
Priority 1 – Traffic Class: Control
    Performs initialization, configuration, and synchronization of sensor processing components in data plane, including mapping and/or routing of data through sensor, data, and co-processing planes.
Priority 0 (low) – Traffic Class: Communications
    Provides external data communication, which includes, but is not limited to, network backhaul, wireless interfaces, and free-space optical transmission schemes. Control traffic associated with external interfaces is assumed to be part of control plane in this model.

Layered Interconnect Architecture
[Diagram labels: Session Services; QOS; 1-sided Channel; 2-sided Channel; Reliability (Guaranteed or Datagram); Scatter/Gather; Protocol Handling; Traffic Policing (Reservation Protocol); Adaptive Routing; Traffic Shaping; Aggregation; L5 Services; L3/L4 Services; Data Link; Framing Engine; VLAN; 10GbE/40GbE; L2 Tunnel; Connection Management; Fabric Switch Management; Switch Control; Routing; VLANs; Monitoring; Reconfiguration]

POET 10GbE Throughput
PCIe Writes Tunneled Through Layer 2 Ethernet – Modeled Peak Performance
[Chart: Bandwidth (GB/s, 0.0 to 2.0) versus Data Size (256, 512, 1024, 2048). Series: x4 Gen2 PCIe Peak; x4 Gen2 PCIe Over L2 20GbE; x4 Gen2 PCIe Over L2 10GbE; TCP/IP Through Network Stack and 10GbE TOE]

POET 40GbE Throughput
PCIe Writes Tunneled Through Layer 2 Ethernet – Modeled Peak Performance
[Chart: Bandwidth (GB/s, up to 5.00) versus Data Size (256, 512, 1024, 2048). Series: 40GbE Peak; x8 Gen3 PCIe Over L2 40GbE; x8 Gen2 PCIe Peak; x8 Gen2 PCIe Over L2 40GbE]
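
The throughput slides plot modeled peak bandwidth against data size. The slides do not give the model itself, so the following Python sketch only illustrates the general shape of such a curve: payload bytes divided by total bytes on the wire, scaled by the line rate. The fixed Ethernet overheads are the standard ones; `tunnel_overhead_bytes` is a hypothetical parameter standing in for POET header and packet CRC bytes, and the value 16 used in the loop is an assumption, not a figure from the slides. PCIe limits, switch behavior, and flow control are ignored.

```python
# Per-frame bytes that standard Ethernet adds around the payload:
# preamble + SFD (8), DA + SA + EtherType (14), VLAN tag (4),
# frame CRC (4), and the minimum inter-frame gap (12).
ETH_OVERHEAD_BYTES = 8 + 14 + 4 + 4 + 12


def effective_gbytes_per_s(payload_bytes: int, line_rate_gbps: float,
                           tunnel_overhead_bytes: int) -> float:
    """Payload bandwidth in GB/s when each payload travels in one frame."""
    if payload_bytes <= 0 or tunnel_overhead_bytes < 0:
        raise ValueError("sizes must be positive")
    wire_bytes = payload_bytes + ETH_OVERHEAD_BYTES + tunnel_overhead_bytes
    return (line_rate_gbps / 8.0) * payload_bytes / wire_bytes


for size in (256, 512, 1024, 2048):
    # tunnel_overhead_bytes=16 is a hypothetical value, not taken from the slides
    print(size, round(effective_gbytes_per_s(size, 10.0, 16), 3))
```

The sketch sends each payload in a single frame; payloads above the standard 1500-byte Ethernet limit would need jumbo frames or segmentation, which it does not model.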

Summary: A New Level of Convergence
Bringing together signal and image processing, information exploitation, and information management
[Diagram labels: Data Acquisition & Signal Conditioning; Signal & Image Processing; Mission Processing; Exploitation; Processing; Storage; Network Router & Gateway; GIG; Ethernet; Tightly Coupled; Loosely Coupled]
• Integrated, optimized for low latency, high throughput, and SWaP
• Designed to deliver an “embedded” Quality-of-Service that supports convergence of processing and net-centric capabilities

Typical System Architecture
[Diagram labels: POET Backplane, POET Front Panel, TCP/IP Front Panel]

Questions?