Intel® IXP2XXX Network Processor Architecture Overview
John Morgan
Infrastructure Processor Division
September 2004

Agenda
– IXP2400 External Features
– IXP2800 External Features
– Comparison of IXP2400 and IXP2800
– IXP2XXX Resource Overviews
  – MEv2 Overview
  – QDR SRAM Overview
  – DDR Overview
  – RDRAM Overview
  – PCI Overview
  – MSF Overview
  – Miscellaneous

IXP2400 External Features
[Figure: IXP2400 external interfaces. Ingress and egress IXP2400 devices with an optional host CPU on PCI 64-bit/66 MHz; QDR SRAM (1.6 GB/s, 64 MByte) and DDR DRAM (2 GByte); classification accelerator and customer ASICs on the coprocessor bus; flash on the slow port; UTOPIA 1/2/3 or POS-PL2/3 to an ATM/POS PHY or Ethernet MAC; CSIX switch fabric port interface; flow control bus between the two devices.]
– MSF interface supports UTOPIA 1/2/3, SPI-3 (POS-PL3), and CSIX.
  – Four independent, configurable 8-bit channels, which can be aggregated into wider interfaces.
  – The media interface can support channelized media on RX and a 32-bit connection to the switch fabric over SPI-3 on TX (and vice versa) to support the switch fabric option.
– 2 Quad Data Rate (QDR) SRAM channels; a QDR SRAM channel can interface to coprocessors.
– 1 DDR SDRAM channel.
– PCI 64-bit/66 MHz host CPU interface.
– Flash and PHY management interface.
– Dedicated inter-IXP channel that carries fabric flow control information from egress to ingress in a dual-chip solution.
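As a rough illustration of channel aggregation, the sketch below enumerates the ways a 32-bit MSF path can be split into 8-, 16-, or 32-bit buses, matching the media configuration options given for the IXP2400 (single 32-bit, quad 8-bit, dual 16-bit, or a mix of 8- and 16-bit). The helper name and the representation of a configuration as a tuple of bus widths are assumptions for illustration; which physical channels each bus occupies is set by MSF registers not covered here.

```python
def msf_bus_configs(total_bits=32, widths=(32, 16, 8)):
    """Return every multiset of bus widths (largest first) that fills the path exactly.

    Hypothetical helper: models only the width arithmetic, not which
    physical 8-bit channels a given bus uses.
    """
    results = []

    def fill(remaining, max_width, chosen):
        if remaining == 0:
            results.append(tuple(chosen))
            return
        for w in widths:
            # Only use widths no larger than the previous one, so each
            # multiset is produced once, in descending order.
            if w <= max_width and w <= remaining:
                fill(remaining - w, w, chosen + [w])

    fill(total_bits, max(widths), [])
    return results


for cfg in msf_bus_configs():
    print(cfg)
```

The four results, (32,), (16, 16), (16, 8, 8) and (8, 8, 8, 8), correspond to the single 32-bit, dual 16-bit, mixed 8/16-bit, and quad 8-bit options.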
IXP2400 Full-Duplex OC-48 System Implementation
[Figure: an IXF6048 framer (1x OC-48 or 4x OC-12) feeds an IXP2400 ingress processor and an IXP2400 egress processor, each with QDR SRAM (queues and tables), DDR SDRAM (packet memory), and a TCAM classification accelerator; a host CPU (IOP or iA) and a switch fabric gasket complete the system.]
– Ingress processor: SAR’ing, classification, metering, policing, initial congestion management.
– Egress processor: traffic shaping, with flexible choices (DiffServ, TM 4.0, …).

IXP2400 Chaining
– Glueless interface between IXP2400 devices using CSIX-L1.
[Figure: a control plane processor on PCI 64/66 and a chain of IXP2400 processors linked at 2.5 Gb/s over SPI-3 and CSIX-L1; each processor has QDR SRAM (queues and tables) and DDR packet memory.]

IXP2400 Block Diagram
[Figure: eight MEv2 microengines (MEv2 1–8); Intel® XScale™ core (32K IC, 32K DC); 72-bit DDR DRAM controller; two QDR SRAM controllers; PCI 64-bit/66 MHz; SPI-3 or CSIX media interface with Rbuf and Tbuf (64 elements @ 128 B each); hash unit (48/64/128-bit); 16 KB scratch; CSRs (Fast_wr, UART, timers, GPIO, BootROM/slow port).]

IXP2400 Bandwidths
– 600 MHz operation
– 4.8+ GOPS
– 2.5 Gb/s full-duplex media interface
  – POS-PHY
  – UTOPIA
  – CSIX-L1
– 2.4 GB/s DDR memory bandwidth at 300 MT/s
– 1.6 GB/s QDR memory bandwidth with 200 MHz QDR II devices

IXP2400 Resources Summary
– Half-duplex OC-48 / 2.5 Gb/s network processor
– (8) multi-threaded microengines
– Intel® XScale™ core
– Media/switch fabric interface
– PCI interface
– 2 QDR SRAM interface controllers
– 1 DDR SDRAM interface controller
– 8-bit asynchronous port (flash and CPU bus)
– Additional integrated features
  – Hardware hash unit
  – 16 KByte scratchpad memory
  – Serial UART port
  – 8 general-purpose I/O pins
  – Four 32-bit timers
  – JTAG support

IXP2800 External Features
[Figure: IXP2800 external interfaces. Ingress and egress IXP2800 devices with an optional host CPU on PCI 64-bit/66 MHz; QDR SRAM (12.8 Gb/s x 4, 64 MByte x 4 channels); RDRAM (50+ Gb/s, 2 GByte total over 3 channels); classification accelerator and customer ASICs on the coprocessor bus; flash on the slow port; SPI-4 or CSIX-L1 switch fabric port interface; ATM/POS PHY or Ethernet MAC; flow control bus between the two devices.]
– Media interface supports both SPI-4 and CSIX.
– 4 Quad Data Rate (QDR) SRAM channels; each channel can interface to coprocessors.
– 3 RDRAM channels.
– PCI 64-bit/66 MHz host CPU interface.
– Flash and PHY management interface.
– Dedicated inter-IXP channel that carries fabric flow control information from egress to ingress in a dual-chip solution.

10Gb/s SONET Line Card
[Figure: IXP2800 ingress and egress processors, each with QDR SRAM (queues and tables) and RDRAM packet memory, and a control plane processor on PCI 64/66; CDR/DEMUX and an IXF18101 on the 10 Gb/s line side (10 GbE, OC-192c; WAN/PPP/ATM/OTN/SONET/SDH); a fabric interface chip (FIC) with SPI I/F, CSIX fabric I/F and flow control on the 15 Gb/s fabric side.]
– Ingress processor: SAR’ing, classification, metering, policing, initial congestion management.
– Egress processor: traffic shaping, with flexible choices (DiffServ, TM 4.1, …).

IXP2800 System with SPI Gasket
[Figure: a control plane processor on PCI 64/66 and an IXP2800 ingress processor with QDR SRAM (queues and tables) and RDRAM packet memory; a 10 Gb/s SPI-4 link to a SPI gasket that bridges to multiple UTOPIA 3 (U3) ports; dual CSIX.]

IXP2800 Chaining
– Glueless interface between IXP2800 devices using SPI-4.2.
[Figure: a control plane processor on PCI 64/66 and a chain of IXP2800 processors linked at 10 Gb/s over SPI-4; each processor has four QDR SRAM channels (queues and tables) and three RDRAM channels (packet memory).]

IXP2800 Block Diagram
[Figure: sixteen MEv2 microengines (MEv2 1–16); Intel® XScale™ core (32K IC, 32K DC); three striped RDRAM controllers (RDRAM 1–3); four QDR SRAM controllers; PCI 64-bit/66 MHz; 16-bit SPI-4 or CSIX media interface with Rbuf and Tbuf (64 elements @ 128 B each); hash unit (48/64/128-bit); 16 KB scratch; CSRs (Fast_wr, UART, timers, GPIO, BootROM/slow port).]

IXP2800 Bandwidths
– 1.4 GHz operation
– 20+ GOPS
– 10 Gb/s full-duplex media interface
  – SPI-4.2
  – CSIX-L1
– 1.9 GB/s QDR SRAM memory bandwidth per channel
– 2.1 GB/s RDRAM memory bandwidth per channel

IXP2800 Resources Summary
– Half-duplex OC-192 / 10 Gb/s network processor
– (16) multi-threaded microengines
– Intel® XScale™ core
– Media/switch fabric interface
– PCI interface
– 4 QDR SRAM interface controllers
– 3 Rambus* DRAM interface controllers
– 8-bit asynchronous port (flash and CPU bus)
– Additional integrated features
  – Hardware hash unit for generating 48-, 64-, or 128-bit adaptive polynomial hash keys
  – 16 KByte scratchpad memory
  – Serial UART port for debug
  – 8 general-purpose I/O pins
  – Four 32-bit timers
  – JTAG support

IXP2800 and IXP2400 Comparison
– Frequency: IXP2800 1.4/1.0 GHz/650 MHz; IXP2400 600/400 MHz
– DRAM memory: IXP2800 3 channels RDRAM 800/1066 MHz, up to 2 GB; IXP2400 1 channel DDR DRAM 150 MHz, up to 2 GB
– SRAM memory: IXP2800 4 channels QDR (or coprocessor); IXP2400 2 channels QDR (or coprocessor)
– Media interface: IXP2800 separate 16-bit Tx and Rx, configurable to SPI-4 P2 or CSIX-L1; IXP2400 separate 32-bit Tx and Rx, configurable to SPI-3, UTOPIA 3 or CSIX-L1
– Number of microengines: IXP2800 16 (MEv2); IXP2400 8 (MEv2)
– Performance: IXP2800 dual-chip full-duplex OC-192; IXP2400 dual-chip full-duplex OC-48

MicroEngine v2
[Figure: MEv2 datapath. Two banks of 128 GPRs (A and B operands); 128 next neighbor registers; DRAM and SRAM transfer-in and transfer-out registers on the D/S push and pull buses; 640-word local memory with two LM address registers per context; 4K/8K-instruction control store; 16-entry CAM (tags 0–15, state, status and LRU logic); CRC unit and CRC remainder; pseudo-random number generator; multiply; find first bit; 32-bit execution datapath (add, shift, logical); timers and timestamp; other local CSRs; links from and to the next neighbor.]

Microengine v2 Features – Part 1
– Clock rates
  – IXP2400: 600/400 MHz
  – IXP2800: 1.4/1.0 GHz/650 MHz
– Control store
  – IXP2400: 4K instruction store
  – IXP2800: 8K instruction store
– Configurable to 4 or 8 threads
  – Each thread has its own program counter, registers, signal and wakeup events
  – Generalized thread signaling (15 signals per thread)
– Local storage options
  – 256 GPRs
  – 256 transfer registers
  – 128 next neighbor registers
  – 640 32-bit words of local memory

Microengine v2 Features – Part 2
– CAM (content addressable memory)
  – Performs a parallel lookup on 16 32-bit entries
  – Reports a 9-bit lookup result
    – 4 state bits (software controlled, no impact on hardware)
    – Hit/miss bit
    – 4-bit index: the entry that hit, or the LRU entry on a miss
  – Improves usage of multiple threads on the same data
– CRC hardware
  – IXP2400: CRC-16, CRC-32
  – IXP2800: CRC-16, CRC-32, iSCSI, CRC-10 and CRC-5
  – Accelerates CRC computation for ATM AAL/SAR, ATM OAM and storage applications
– Multiply hardware
  – Supports 8x24, 16x16 and 32x32
  – Accelerates metering in QoS algorithms (DiffServ, MPLS)
– Pseudo-random number generation
  – Accelerates RED and WRED algorithms
– 64-bit timestamp and 16-bit profile count

Intel® XScale™ Core Overview
– High-performance, low-power, 32-bit embedded RISC processor
– Clock rate
  – IXP2400: 600 MHz
  – IXP2800: 700/500/325 MHz
– 32 KByte instruction cache
– 32 KByte data cache
– 2 KByte mini-data cache
– Write buffer
– Memory management unit

QDR SRAM Overview
– Controller configuration
  – IXP2400: 2 channels
  – IXP2800: 4 channels
  – Optional parity (supports x16 or x18 parts)
– Addresses up to 64 MBytes of SRAM per channel
– Pin design supports up to 4 SRAM loads
– Supports burst-of-2 QDR devices
– Supports byte parity bits [8] and [17] for bytes 0 and 1
– Parity can be enabled or disabled per channel in the SRAM_control CSR

QDR SRAM Overview (continued)
– Peak bandwidth of 1.6 GBytes/s per channel, using 200 MHz SRAMs
– Specialized SRAM operations:
  – Atomic swap, bit set, bit clear, add, subtract
  – Hardware support for ring, queue and journal operations
  – 64 Q_Array registers per channel
– Interface to QDR-compatible TCAMs and coprocessors
  – Compliant with the Network Processing Forum LA-1 coprocessor standard
  – "Clamshell" topology lets memory and a coprocessor share the same channel

IXP2400 DDR DRAM Overview
– One 64-bit (72-bit with ECC) SDRAM channel
– DRAM sizes of 64 Mb, 128 Mb, 256 Mb, 512 Mb, or 1 Gb
  – Maximum capacity is 2 GB (using 1 Gb parts)
  – Supports x8 or x16 devices, DIMM or direct-soldered
  – Supports devices with 4 banks
  – Supports 1- or 2-sided DIMMs
– Optional ECC
– 200/300 MT/s (100 MHz/150 MHz clock, respectively)
– Hardware interleaving spreads contiguous addresses across multiple banks

IXP2800 RDRAM Overview
– 3 independent Rambus* DRAM channels that operate concurrently
– 1.6 GBytes/s (12.8 Gb/s) per channel at 800 MHz
– Maximum total of 2 GBytes
  – 768 MBytes each if 3 channels are populated
  – 1 GByte each if only 2 channels are populated
  – 2 GBytes if only 1 channel is populated
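The bandwidth and GOPS figures on the IXP2400 and IXP2800 slides follow from simple width-times-rate arithmetic. The sketch below reproduces them under stated assumptions: the DDR channel moves 64 data bits per transfer (ECC excluded); a QDR SRAM channel has separate 16-bit read and write data ports (x18 parts, parity excluded), each transferring on both clock edges; an RDRAM channel moves 16 data bits per transfer; and each microengine is credited with one instruction per cycle. These widths are chosen to match the quoted numbers rather than taken from a datasheet.

```python
def peak_gbytes_per_s(data_bits, mega_transfers_per_s, ports=1):
    """Peak bandwidth in GB/s (10**9 bytes/s) for a bus of the given data width."""
    return data_bits / 8 * mega_transfers_per_s * ports / 1000


# IXP2400 DDR: 64 data bits at 300 MT/s (150 MHz clock, both edges).
ddr = peak_gbytes_per_s(64, 300)
# IXP2400 QDR: 16 data bits per port at 200 MHz on both edges (400 MT/s),
# counting the independent read and write ports (assumption).
qdr = peak_gbytes_per_s(16, 2 * 200, ports=2)
# IXP2800 RDRAM: 16 data bits at 800 and 1066 MT/s.
rdram_800 = peak_gbytes_per_s(16, 800)
rdram_1066 = peak_gbytes_per_s(16, 1066)

# Aggregate instruction rate, assuming one instruction per cycle per microengine.
gops_2400 = 8 * 0.6    # 8 MEs at 600 MHz
gops_2800 = 16 * 1.4   # 16 MEs at 1.4 GHz

print(f"DDR   {ddr:.1f} GB/s")
print(f"QDR   {qdr:.1f} GB/s")
print(f"RDRAM {rdram_800:.1f} GB/s ({rdram_800 * 8:.1f} Gb/s) at 800 MT/s")
print(f"RDRAM {rdram_1066:.2f} GB/s at 1066 MT/s")
print(f"GOPS  IXP2400 {gops_2400:.1f}, IXP2800 {gops_2800:.1f}")
```

The results line up with the slides: 2.4 GB/s DDR, 1.6 GB/s QDR, 1.6 GB/s (12.8 Gb/s) and about 2.1 GB/s RDRAM, 4.8 GOPS for the IXP2400, and a little over 20 GOPS for the IXP2800.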
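To make the CAM lookup semantics from the MEv2 feature list concrete, here is a small behavioral model of a 16-entry CAM with LRU replacement: a lookup compares a 32-bit tag against all entries and reports 4 software-defined state bits, a hit flag, and a 4-bit entry number (the matching entry on a hit, the LRU entry on a miss). The class and function names, the rule that a hit or a write makes an entry most recently used, and the packing of the 9-bit result (state in bits 8..5, hit in bit 4, index in bits 3..0) are assumptions for illustration, not the MEv2 instruction encoding.

```python
class CamModel:
    """Behavioral sketch of a 16-entry MEv2-style CAM with LRU replacement (hypothetical)."""

    ENTRIES = 16

    def __init__(self):
        self.tags = [None] * self.ENTRIES   # None marks an entry never written
        self.state = [0] * self.ENTRIES     # 4 software-defined state bits per entry
        self.lru_order = list(range(self.ENTRIES))  # least recently used first

    def _touch(self, entry):
        # Assumption: a hit or a write makes the entry most recently used.
        self.lru_order.remove(entry)
        self.lru_order.append(entry)

    def lookup(self, tag):
        """Compare tag against all entries; return the packed 9-bit result."""
        tag &= 0xFFFFFFFF
        for entry, stored in enumerate(self.tags):
            if stored == tag:
                self._touch(entry)
                return (self.state[entry] << 5) | (1 << 4) | entry
        # Miss: hit bit clear, state bits 0, index = LRU entry for software to replace.
        return self.lru_order[0]

    def write(self, entry, tag, state):
        self.tags[entry] = tag & 0xFFFFFFFF
        self.state[entry] = state & 0xF
        self._touch(entry)


def decode(result):
    """Unpack the illustrative layout: state[8:5], hit[4], entry[3:0]."""
    return {"state": (result >> 5) & 0xF, "hit": bool((result >> 4) & 1), "entry": result & 0xF}


cam = CamModel()
miss = decode(cam.lookup(0x0A000001))        # empty CAM: miss, LRU entry 0
cam.write(miss["entry"], 0x0A000001, state=0x3)
print(decode(cam.lookup(0x0A000001)))        # {'state': 3, 'hit': True, 'entry': 0}
```

The miss-returns-LRU behavior is what lets several threads share a small cache of hot data (for example queue descriptors): a thread that misses is told which entry to evict, and threads that hit find the cached entry plus its state bits in one operation.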
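The CRC unit computes standard CRCs in hardware. As a software reference for checking results, the sketch below is a plain bitwise implementation of reflected CRC-32, with the generator polynomial as a parameter so the same loop also yields CRC-32C (Castagnoli), the CRC used by iSCSI. It models only the arithmetic, not the MEv2 CRC instructions or their operand formats; the CRC-16, CRC-10 and CRC-5 variants are not shown.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320   # IEEE 802.3 / zlib CRC-32
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli CRC-32C, used by iSCSI


def crc32_bitwise(data: bytes, poly: int = CRC32_POLY_REFLECTED) -> int:
    """Reflected CRC-32: init 0xFFFFFFFF, LSB-first processing, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out the low bit; if it was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


check = b"123456789"  # conventional CRC check string
print(hex(crc32_bitwise(check)))                          # 0xcbf43926
print(hex(zlib.crc32(check)))                             # 0xcbf43926
print(hex(crc32_bitwise(check, CRC32C_POLY_REFLECTED)))   # 0xe3069283
```

Processing one bit per step makes the cost visible: eight shift/XOR steps per byte, which is the loop the hardware CRC unit removes from the microengine instruction stream.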
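Pseudo-random numbers matter for RED and WRED because the drop decision is probabilistic. A minimal RED sketch, assuming the classic formulation (an exponentially weighted average queue depth, and a drop probability that rises linearly from 0 at min_th to max_p at max_th), is shown below. The parameter names and values are illustrative, Python's random module stands in for the microengine's pseudo-random number generator, and refinements such as count-based spacing of drops are omitted. WRED would keep a separate (min_th, max_th, max_p) profile per traffic class.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RedQueue:
    min_th: float          # average depth below which nothing is dropped
    max_th: float          # average depth at or above which everything is dropped
    max_p: float           # drop probability as the average approaches max_th
    weight: float = 0.002  # EWMA weight for the average queue depth
    avg: float = 0.0
    rng: random.Random = field(default_factory=lambda: random.Random(1))

    def update_avg(self, depth: int) -> None:
        """Fold the instantaneous queue depth into the moving average."""
        self.avg = (1 - self.weight) * self.avg + self.weight * depth

    def should_drop(self) -> bool:
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        # Compare a uniform pseudo-random number in [0, 1) against p.
        return self.rng.random() < p


q = RedQueue(min_th=20, max_th=60, max_p=0.1)
q.avg = 40  # halfway between thresholds, so p = 0.05
drops = sum(q.should_drop() for _ in range(10_000))
print(f"dropped {drops} of 10000")
```

With the average halfway between the thresholds, the drop fraction should come out near 5%; one random draw per packet is exactly the operation the hardware generator accelerates.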
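The RDRAM slides state that hardware interleaving uses a 128-byte granule to balance accesses across channels. A simple model, assuming consecutive 128-byte blocks are assigned to channels round-robin (the actual IXP2800 address-to-channel mapping, including how it handles three channels, is not described here and may differ), shows why a contiguous buffer spreads evenly. The function names are hypothetical.

```python
INTERLEAVE_BYTES = 128


def rdram_channel(addr: int, channels: int = 3) -> tuple[int, int]:
    """Map a byte address to (channel, offset within that channel), round-robin assumption."""
    block, within = divmod(addr, INTERLEAVE_BYTES)
    channel = block % channels
    local_block = block // channels
    return channel, local_block * INTERLEAVE_BYTES + within


def channels_touched(start: int, length: int, channels: int = 3) -> dict[int, int]:
    """Count how many bytes of [start, start + length) land on each channel."""
    counts = {c: 0 for c in range(channels)}
    for addr in range(start, start + length):
        counts[rdram_channel(addr, channels)[0]] += 1
    return counts


print(channels_touched(0, 768))  # {0: 256, 1: 256, 2: 256}
print(rdram_channel(300))        # (2, 44)
```

Because each 128-byte block goes to the next channel, a 768-byte buffer puts an equal share on all three channels, so no single channel's 1.6–2.1 GB/s limits a large transfer.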
– Supports 64 Mb, 128 Mb, 256 Mb, 512 Mb and 1 Gb devices
– Supports RDRAMs with 1x16 dependent, 2x16 dependent, and 4 independent banks
– Optional ECC and parity support
– Interleaving implemented in hardware provides balanced access across all channels
  – Interleave size is 128 bytes

PCI Interface Overview
– PCI 2.2 compliant
– PCI bus target
  – SRAM
  – DRAM
  – Control and status registers
– PCI bus master to other devices
– DMA channels
  – IXP2400: 3 channels
  – IXP2800: 2 channels
– Doorbell and mailbox registers
– Loads:
  – 4 loads at 66 MHz
  – 8 loads at 33 MHz

IXP2400 Media Switch Fabric Interface
– Protocols
  – POS-PHY Levels 2 and 3
  – UTOPIA Levels 1, 2 and 3
  – CSIX-L1 for the switch fabric interface
– LVTTL I/O (3.3 V)
– 32-bit receive, 32-bit transmit
– 25–133 MHz
– 8 KB receive buffer and 8 KB transmit buffer

IXP2800 Media Switch Fabric Interface
– Protocols
  – SPI-4 Phase 2 for network devices
  – CSIX-L1 for the switch fabric interface
– LVDS I/O (IEEE 1596.3, ANSI/TIA/EIA-644)
– 16-bit receive, 16-bit transmit
– 311–500 MHz
– 8 KB receive buffer and 8 KB transmit buffer

Miscellaneous
– UART
  – Standard RS-232, primarily for debugging
– Timers
  – Four 32-bit timers
  – Timer 4 can be used as a watchdog timer
– GPIO
  – 8 general-purpose I/O pins
  – Can be used as an interrupt source to the XScale core or as a clock for the timers
– Interrupt controller
  – Enables or masks interrupts from chip-wide sources such as timers, PCI devices, DRAM ECC errors, etc.
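The 8 KB receive buffer is divided into fixed-size elements: the block diagrams show Rbuf as 64 elements of 128 bytes, and the receive logic slide gives 64 or 128 elements of 128 or 64 bytes. A quick sketch of that arithmetic, plus how many elements a frame of a given length would occupy, assuming a frame is simply split across whole elements (real MSF behavior for SPI-4 bursts or CSIX cells is governed by its configuration and may differ). The helper names are hypothetical.

```python
RBUF_BYTES = 8 * 1024


def rbuf_elements(element_bytes: int) -> int:
    """Number of Rbuf elements for a given element size."""
    if RBUF_BYTES % element_bytes:
        raise ValueError("element size must divide the 8 KB buffer evenly")
    return RBUF_BYTES // element_bytes


def elements_needed(frame_bytes: int, element_bytes: int) -> int:
    """Elements a frame occupies if split across whole elements (ceiling division)."""
    return -(-frame_bytes // element_bytes)


for size in (128, 64):
    print(f"{size:>3} B elements -> {rbuf_elements(size)} elements")
print(elements_needed(1500, 128))  # 12
print(elements_needed(53, 64))     # 1 (one ATM cell)
```

Smaller elements give more of them, which suits cell traffic; larger elements reduce per-element bookkeeping for packet traffic.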
– Slow port
  – Used for flash ROM access and 8-, 16-, or 32-bit asynchronous device access
  – Allows the XScale core to perform read/write data transfers to these slave devices

Backup

IXP2400 Target Applications
– The IXP2400 gives IXP1200 customers a performance upgrade for OC-12 applications and enables multi-Gigabit Ethernet platforms up to OC-48
– WAN edge/access aggregation
  – Includes IP service switches, multiservice switches, DSLAMs, cable head ends
– Wireless infrastructure
– Layer 4–7 switches
  – Includes firewalls, server offload, content-based load balancing

IXP2800 Target Applications
– Metropolitan Area Network (MAN) switches and routers
– Internet core access switches and routers
– Multi-service switches
– 10 Gb/s enterprise switches and routers supporting tomorrow’s data centers
– Storage area networks (SAN)
– Content-aware server offload/web switches
– Security/VPN solutions
– Wireless base stations
– Digital Subscriber Line Access Multiplexers (DSLAMs)

Edge Multi-Service Switch – WAN/LAN Solutions
[Figure: an IXP2400 acting as an OC-48c ATM SAR and traffic manager between a WAN backbone (ATM, SONET, optical ring; "Amazon" IXF6048 OC-48c ATM & POS framer) and 1 Gig LAN backbone or server farm ports ("Vallejo" 4 x 1G Ethernet MAC, "Oahu" quad gigabit PHY) over SPI-3 (UTOPIA 3 packet), or a CSIX switch fabric (80 Gig to 1+ Terabit).]

Edge Server Offload
[Figure: an IXP2400 with a host CPU (IOP or iA) on the PCI bus, connecting server farm ports ("Vallejo" 4 x 1G Ethernet MAC, "Oahu" quad gigabit PHY) over SPI-3 (UTOPIA 3 packet) to a CSIX Level 1 switch fabric (80 Gig to 1+ Terabit).]

IXP2400 Media Configurations
[Figure: IXP2400 devices, each with QDR SRAM, TCAM, DDR and an XScale core, with Rx and Tx paths configured as 32-bit, 16-bit and 8-bit buses running UTOPIA 1/2/3, SPI-3 (POS-PHY 2/3), or CSIX_L1B.]
– Rx and Tx paths each have 2 separate clock domains for asynchronous traffic.
– Each Rx and Tx path may be configured as a single 32-bit bus, quad 8-bit buses, dual 16-bit buses, or a combination of 8- and 16-bit buses.

10-Port 1Gb/s Ethernet Line Card
[Figure: IXP2800 ingress and egress processors, each with QDR SRAM (queues and tables) and RDRAM packet memory, and a control plane processor on PCI 64/66; a "Ben Nevis" 10x1GbE device on the 10 x 1 GbE LAN side at 10 Gb/s; a fabric interface chip (FIC) with SPI I/F, CSIX fabric I/F and flow control on the 15 Gb/s fabric side.]
– Ingress processor: SAR’ing, classification, metering, policing, initial congestion management.
– Egress processor: traffic shaping, with flexible choices (DiffServ, TM 4.1, …).

10Gb/s to InfiniBand
[Figure: IXP2800 ingress and egress processors, each with QDR SRAM and RDRAM packet memory, and a control plane processor on PCI 64/66, bridging 10GbE, 10x1Gb or OC-192c line interfaces ("Calypso", "Ben Nevis", "Loch Lomond" devices) at 10 Gb/s to an InfiniBand fabric over 2.5 Gb/s links, with CSIX fabric I/F and flow control at 15 Gb/s.]

10Gb/s Ethernet to SONET
[Figure: IXP2800 ingress and egress processors, each with QDR SRAM and RDRAM packet memory, and a control plane processor on PCI 64/66, connecting 10GbE or 10x1Gb interfaces from servers or disk farms ("Calypso", "Ben Nevis" devices) over SPI at 10 Gb/s to OC-192 or 4xOC48 SONET toward metro or WAN ("Loch Lomond"), with flow control between the processors.]

Media/Fabric Receive Logic
[Figure: a media device (port A: packets; port B: ATM cells) sends an SPI-4.2 frame of interleaved packet control words and packet/cell payloads to the media switch fabric unit, whose receive state machine places data in Rbuf (64/128 elements of 128/64 B each) and tracks a status word per element in a bit vector.]
1. Receive threads are placed on the thread freelist.
2. Data arrives; idle packets are discarded into the idle bucket.
3. The receive state machine gets a free element number from the Rbuf freelist and assigns a thread number from the thread freelist.
4. The receive state machine creates the status for the element.
5. The status is autopushed to the assigned thread.
6. The thread moves the data out of Rbuf.
7. The thread pushes the element ID back onto the Rbuf freelist.
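The seven receive steps amount to a handshake between two freelists: one of idle receive threads and one of free Rbuf elements. The sketch below models that handshake in software. All names are hypothetical, each arrival is assumed to fit in a single element, idle detection is reduced to a flag, the status word is a plain dict, and a real receive thread would move data with MEv2 memory commands rather than Python assignments.

```python
from collections import deque


class ReceiveModel:
    """Hypothetical software model of the MSF receive handshake (steps 1-7)."""

    def __init__(self, num_elements: int = 64, element_bytes: int = 128):
        self.element_bytes = element_bytes
        self.rbuf = [b""] * num_elements
        self.rbuf_freelist = deque(range(num_elements))
        self.thread_freelist = deque()
        self.delivered = []  # (thread, data) pairs produced by step 6

    def thread_ready(self, thread_id: int) -> None:
        # Step 1: a receive thread puts itself on the thread freelist.
        self.thread_freelist.append(thread_id)

    def data_arrives(self, data: bytes, idle: bool = False) -> bool:
        # Step 2: idle frames go to the idle bucket and are discarded.
        if idle:
            return False
        if len(data) > self.element_bytes:
            raise ValueError("this sketch assumes each arrival fits one element")
        if not self.rbuf_freelist or not self.thread_freelist:
            return False  # in this model, data with no element or thread is rejected
        # Step 3: get a free element number and assign a thread number.
        element = self.rbuf_freelist.popleft()
        thread = self.thread_freelist.popleft()
        self.rbuf[element] = data
        # Step 4: create the status for the element.
        status = {"element": element, "length": len(data)}
        # Step 5: autopush the status to the assigned thread.
        self._run_thread(thread, status)
        return True

    def _run_thread(self, thread: int, status: dict) -> None:
        # Step 6: the thread moves the data out of the Rbuf element.
        element = status["element"]
        self.delivered.append((thread, self.rbuf[element]))
        # Step 7: the thread pushes the element ID back onto the Rbuf freelist.
        self.rbuf_freelist.append(element)


rx = ReceiveModel(num_elements=2)
rx.thread_ready(0)
rx.thread_ready(1)
rx.data_arrives(b"pkt-a")
rx.data_arrives(b"cell-b")
print(rx.delivered)  # [(0, b'pkt-a'), (1, b'cell-b')]
```

In this model a thread does not return to the thread freelist automatically; it must call thread_ready again, mirroring step 1 being repeated each time a thread finishes with an element.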