Processing packets in packet switches
CS343, May 7th 2003

High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/~nickm

Contents
1. What processing is done where?
2. What does a packet switch look like?
     What does a packet switch do?
     Typical packet switch architecture
     Evolution of high performance packet switch architecture
3. Trends and consequences
4. Technology options for processing packets
     General purpose CPU
     Network processors
     FPGA
     ASIC
     My 2c
5. Examples of packet switches

The Network Layer View of the Internet
[Figure: end hosts connected through routers.]

Hierarchical arrangement (a crude approximation)
[Figure: end hosts, edge routers, core routers.]
Core routers: Maximum capacity, minimum function. Typically: 16 ports of 10Gb/s. Capacity 160Gb/s, 200Mpps. Price $1M.
Edge routers: Medium capacity, maximum flexibility and function. Typically: 16 ports of 2.5Gb/s. Capacity 20-30Gb/s, 10-20Mpps. Price $200k.

Hierarchical arrangement
[Figure: end hosts (1000s per mux), access multiplexer, edge routers and core routers inside a POP; POPs joined by 10Gb/s “OC192” links.]
POP: Point of Presence. Richly interconnected by mesh of long-haul links.
Typically: 40 POPs per national network operator; 10-40 core routers per POP.

Autonomous Systems
[Figure: the POP meshes of AT&T, Worldcom, Global Crossing and Sprint, joined at “peering points”.]

How we connect: corporate/campus environment
[Figure: Ethernet switch, building-wide router, campus router and its interface to a POP on the 10Gb/s “OC192” mesh.]
Ethernet switch. Typically: 100 ports of 100Mb/s Ethernet.
Building-wide router, e.g. gates-rtr.stanford.edu. Typically: 16 ports of 1Gb/s Ethernet.
Campus or company-wide router, e.g. border-rtr.stanford.edu. Typically: mixture of 2.5Gb/s “OC48” and Gb/s Ethernet.

How we connect: home modem/DSL environment
[Figure: DSL router/NAT in the home, telephone switch, interface to a POP on the 10Gb/s “OC192” mesh.]
Telephone switch with DSL line interface at your local Central Office.
DSL Router/NAT. Typically: 10/100Mb/s.

What a High Performance Router Looks Like
Cisco GSR 12416: Capacity 160Gb/s. Power 4.2kW. Dimensions shown: 19” x 6ft x 2ft.
Juniper M160: Capacity 80Gb/s. Power 2.6kW. Dimensions shown: 19” x 3ft x 2.5ft.

Other packet switches
[Photos:]
Cisco 7500 “edge” routers
Lucent GX550 Core ATM switch
D-Link DSL router
Wiring closet in Packard building

The IP Datagram
[Figure: header layout, one row per 32-bit word.]
  vers | HLen | TOS | Total Length (<=64 KBytes)
  ID | Flags | FRAG Offset (offset within original packet)
  TTL (hop count) | Protocol | checksum
  SRC IP Address
  DST IP Address
  (OPTIONS) | (PAD)

Forwarding in an IP Router
1. Lookup packet DA in forwarding table.
     – If known, forward to correct port.
     – If unknown, drop packet.
2. Decrement TTL, update header checksum.
3. Forward packet to outgoing interface.
4. Transmit packet onto link.

Ethernet Frame Format
  Field:  Preamble  SFD  DA  SA  Type  Data    Pad   CRC
  Bytes:  7         1    6   6   2     0-1500  0-46  4
1. Preamble: trains clock-recovery circuits.
2. Start of Frame Delimiter: indicates start of frame.
3. Destination Address: 48-bit globally unique address assigned by manufacturer.
     1b: unicast/multicast
     1b: local/global address
4. Type: Indicates protocol of encapsulated data (e.g. IP = 0x0800).
5. Pad: Zeroes used to ensure minimum frame length.
6. Cyclic Redundancy Check: check sequence to detect bit errors.

Encapsulation
[Figure: the IP datagram (IP Header + IP Data) is carried in the Data field of an Ethernet frame: Preamble | SFD | DA | SA | Type = IP | Data | Pad | CRC.]

Generic Router Architecture
[Figure: a packet (Data + Hdr) passes through header processing and is then queued.]
Header Processing: Lookup IP Address, Update Header.
Address Table: IP Address to Next Hop. ~1M prefixes. Off-chip DRAM.
Queue Packet: Buffer Memory. ~1M packets. Off-chip DRAM.

Generic Router Architecture
[Figure: three parallel copies of the same path, each with Header Processing (Lookup IP Address, Update Header), an Address Table, a Buffer Manager and Buffer Memory.]
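The forwarding steps listed earlier (look up the destination address, decrement TTL, update the header checksum) and the header-processing stage of the generic router architecture can be sketched in a few lines of Python. This is an illustrative sketch, not how a line card does it: the names `make_header`, `lookup` and `forward` are hypothetical, the table is a plain dictionary scanned linearly rather than a ~1M-prefix structure in off-chip DRAM, and dropping a packet whose TTL would reach zero is an assumption taken from standard IP behaviour rather than from the slides.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of the 16-bit header words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src: str, dst: str, ttl: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20,            # vers=4 / HLen=5 words, TOS, Total Length
        0, 0,                   # ID, Flags + FRAG Offset
        ttl, 6, 0,              # TTL, Protocol (6 = TCP), checksum placeholder
        ipaddress.ip_address(src).packed,
        ipaddress.ip_address(dst).packed,
    ))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


def lookup(table, addr):
    """Longest-prefix match by linear scan (for clarity only, not for speed)."""
    best_len, best_port = -1, None
    for prefix, port in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


def forward(table, header: bytes):
    """Return (output port, updated header), or None if the packet is dropped."""
    addr = ipaddress.ip_address(bytes(header[16:20]))   # DST IP Address field
    port = lookup(table, addr)
    ttl = header[8]
    if port is None or ttl <= 1:        # unknown destination, or TTL expired (assumption)
        return None
    hdr = bytearray(header)
    hdr[8] = ttl - 1                    # decrement TTL
    struct.pack_into("!H", hdr, 10, 0)  # zero the checksum field before recomputing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return port, bytes(hdr)


if __name__ == "__main__":
    table = {"10.0.0.0/8": 1, "10.1.0.0/16": 2}     # hypothetical prefixes and ports
    print(forward(table, make_header("192.0.2.1", "10.1.2.3", 64)))
```

Recomputing the whole checksum keeps the sketch short; real routers typically update it incrementally, since only the TTL changes.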
First Generation Routers
[Figure: a shared backplane connects a CPU (with route table and buffer memory) to line interfaces, each with a MAC.]
Typically <0.5Gb/s aggregate capacity.

Second Generation Routers
[Figure: a CPU (with route table and buffer memory) and line cards, each with buffer memory, a forwarding cache and a MAC.]
Typically <5Gb/s aggregate capacity.

Third Generation Routers
[Figure: a switched backplane connects line cards (local buffer memory, forwarding table, MAC) and a CPU card (routing table).]
Typically <50Gb/s aggregate capacity.

Fourth Generation Routers
[Figure: linecards connected to a switch core by optical links, 100s of metres long.]
160Gb/s - 20Tb/s routers in development.

Trends in Technology, Routers & Traffic
[Figure: normalized growth since 1980, log scale from 1 to 1,000,000, years 1980 to 2001.]
Line Capacity: 2x / 7 months
User Traffic: 2x / 12 months
Router Capacity: 2.2x / 18 months
Moore’s Law: 2x / 18 months
DRAM Random Access Time: 1.1x / 18 months

Trends and Consequences
[Figure 1: CPU instructions per minimum length packet, log scale from 1 to 1000, years 1996 to 2001.]
[Figure 2: Disparity between traffic and router growth. Normalized growth of traffic and of router capacity, scale 0 to 600, years 2003 to 2012; 5-fold disparity marked.]
Consequences:
1. Packet processing is getting harder, and eventually network processors will be used less for high performance routers.
2. (Much) bigger routers will be developed.
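The arithmetic behind these two charts can be checked with a short Python sketch. The growth rates are the ones quoted on the trends slide (user traffic 2x / 12 months, router capacity 2.2x / 18 months). The function names are hypothetical, and the 40-byte minimum packet and the 1e9 instructions/s CPU in the usage lines are assumptions chosen for illustration, not figures from the slides.

```python
import math


def annual_factor(factor: float, months: float) -> float:
    """Convert 'grows by `factor` every `months` months' into a per-year factor."""
    return factor ** (12.0 / months)


TRAFFIC_PER_YEAR = annual_factor(2.0, 12)   # user traffic: 2x / 12 months
ROUTER_PER_YEAR = annual_factor(2.2, 18)    # router capacity: 2.2x / 18 months


def disparity(years: float) -> float:
    """Ratio of traffic growth to router-capacity growth after `years` years."""
    return (TRAFFIC_PER_YEAR / ROUTER_PER_YEAR) ** years


def years_until(gap: float) -> float:
    """Years until traffic has outgrown router capacity by a factor of `gap`."""
    return math.log(gap) / math.log(TRAFFIC_PER_YEAR / ROUTER_PER_YEAR)


def instructions_per_packet(cpu_ips: float, line_rate_bps: float,
                            packet_bytes: int) -> float:
    """CPU instructions available per packet at full line rate."""
    packets_per_second = line_rate_bps / (8 * packet_bytes)
    return cpu_ips / packets_per_second


if __name__ == "__main__":
    print(f"disparity after 9 years: {disparity(9):.1f}x")
    print(f"years to a 5-fold gap:   {years_until(5):.1f}")
    # Assumed values: 1e9 instructions/s CPU, 10Gb/s line, 40-byte packets.
    print(f"instructions per packet: {instructions_per_packet(1e9, 10e9, 40):.0f}")
```

Under these rates the gap widens by about 18% per year, reaching roughly 4.5x after 9 years and 5x after about 9.6 years, which is roughly in line with the 5-fold disparity marked on the second chart.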
Trends and Consequences (2)
[Figure 3: Power consumption will exceed POP limits. Power (kW), scale 0 to 6, years 1990 to 2002, approx...]
[Figure 4: Disparity between line-rate and memory access time. Normalized growth rate, log scale from 1 to 1,000,000, years 1980 to 1998.]
Consequences:
3. Multi-rack routers will spread power over multiple racks.
4. It will get harder to build packet buffers for linecards.

Technology Options
General purpose processor
     MIPS
     PowerPC
     Intel
Network processor
     Intel IXA and IXP processors
     IBM Rainier
     Control plane processors: SiByte (Broadcom), QED (PMC-Sierra).
FPGA
ASIC

Network Processors: Load-balancing
[Figure: a dispatch CPU feeds four CPUs, each with its own cache and dedicated HW support (e.g. lookups); all share off-chip memory.]
Incoming packets are dispatched to:
1. An idle processor, or
2. A processor dedicated to packets in this flow (to prevent mis-sequencing), or
3. A processor for the special processing needed by the packet, e.g. security, transcoding, application-level processing.

Network Processors: Pipelining
[Figure: four CPUs in series, each with its own cache and dedicated HW support (e.g. lookups); all share off-chip memory.]
Processing is broken down into (hopefully balanced) steps. Each processor performs one step of processing.

Network Processors: Pros and Cons
Pros:
     Flexibility: Protocols change, features are added.
     Reduced development time: In principle, should be quicker to develop software than design a custom chip. Reduces time-to-market, development costs, …
Cons:
     Less efficient: slower than custom chip, more power.
     Usually designed using standard processor cores, not optimized for stream processing. Generally about 10x slower than general purpose CPU.
     Unusual development environments; hard to program.
     Often hard to partition functions over processors.

General Observations
Up until about 1998:
     Low-end packet switches used general purpose processors.
     Mid-range packet switches used FPGAs for datapath, general purpose processors for control plane.
     High-end packet switches used ASICs for datapath, general purpose processors for control plane.
More recently:
     3rd party network processors now used in many low- and mid-range datapaths.
     Home-grown network processors used in mid- and high-end.

My 2c on network processors
Is it clear that multiple small parallel processors are needed? When are 10 processors at speed 1 better than 1 processor at speed 10?
Network processors make sense if:
     Application is parallelizable into multiple threads/contexts.
     Uniprocessor performance is limited by load-latency.
If general purpose processors evolve anyway to:
     Contain multiple processors per chip,
     Support hardware multi-threading,
…then perhaps they are better suited because:
     Greater development effort means faster general purpose processors,
     Existing well-known development environments.

My 2c on network processors
The nail: packets (Data, Hdr) and Context.
Characteristics:
1. Stream processing.
2. Multiple flows.
3. Most processing on header, not data.
4. Two sets of data: packets, context.
5. Packets have no temporal locality, and special spatial locality.
6. Context has temporal and spatial locality.
The hammer: a processor with data cache(s).
Characteristics:
1. Shared in/out bus.
2. Optimized for data with spatial and temporal locality.
3. Especially optimized for register accesses.

A network uniprocessor
[Figure: off-chip FIFOs feeding on-chip FIFOs on input and output, head/tail mailbox registers, a context memory hierarchy, data cache(s), off-chip memory.]
Add hardware support for multiple threads/contexts.
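A toy model shows why hardware support for multiple threads/contexts helps when uniprocessor performance is limited by load latency. It is not taken from the slides. It assumes every packet needs `compute_ns` of processor time plus `stall_ns` waiting on off-chip memory, and that a stalled context costs nothing while another context runs. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def packets_per_second(compute_ns: float, stall_ns: float, contexts: int = 1) -> float:
    """Throughput of one processor that switches contexts on each memory stall."""
    per_context = 1e9 / (compute_ns + stall_ns)   # one context, stalls fully exposed
    peak = 1e9 / compute_ns                       # processor never idle
    return min(contexts * per_context, peak)


def contexts_to_hide_latency(compute_ns: float, stall_ns: float) -> int:
    """Smallest number of contexts that keeps the processor busy all the time."""
    return math.ceil((compute_ns + stall_ns) / compute_ns)


if __name__ == "__main__":
    # Assumed values: 20 ns of work and 180 ns of memory stall per packet.
    for k in (1, 4, 10, 16):
        print(k, "contexts:", packets_per_second(20, 180, k) / 1e6, "Mpps")
    print("contexts needed:", contexts_to_hide_latency(20, 180))
```

In this model a single fast processor with enough contexts reaches the same peak rate as several slower processors working in parallel, which is one way to frame the question of 10 processors at speed 1 versus 1 processor at speed 10.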