The Cisco ASR 9000 Architecture (BRKARC-2003)
Sebastián Maulén, Service Provider Systems Engineer, semaulen@cisco.com

Agenda
- Hardware Architecture Overview
  – Chassis
  – RSP
  – Line Card
- Switch Fabric Architecture and Fabric/System QoS
- Multicast Architecture
- QoS Overview

ASR 9000 At a Glance
- Optimized for aggregation of dense 10GE and 100GE
- Designed for longevity: scalable up to 400 Gbps of bandwidth per slot
- Based on IOS-XR for nonstop availability and manageability
- Market focus:
  – CE L2 Business VPN
  – Residential Triple Play
  – Mobile Backhaul
  – Advanced Video Services

ASR 9010 and ASR 9006 Chassis
[Chassis photos: ASR 9010 with integrated cable management with cover, system fan trays, front-to-back airflow, RSP slots 0-1, line card slots 0-3 and 4-7, six modular power supplies; ASR 9006 with side-to-back airflow, air-draw cable management, RSP slots 0-1, line card slots 0-3, three modular power supplies]

ASR 9010 and 9006 Chassis with Door
- Optional door with lock
- Quarter rack (9006): 17.38"w x 17.35"h x 28"d
- Half rack (9010): 17.38"w x 36.75"h x 28"d

Chassis Overview – ASR 9000 10-Slot System
- 10 slots: 8x line cards + 2x RSP
- Half rack: 17.38"w x 36.75"h x 28"d
- Bandwidth (initial): 400 Gbps backplane; 180 Gbps fabric → 400 Gbps; 40G/80G line cards → Nx100G
- Carrier-class hardware redundancy
- AC and DC systems
- Pay-as-you-grow, modular power
- Green emphasis throughout

Chassis Overview – ASR 9000 6-Slot System
- 6 slots: 4x line cards + 2x RSP
- Quarter rack: 17.38"w x 17.35"h x 28"d
- Bandwidth (initial): 400 Gbps backplane; 180 Gbps fabric → 400 Gbps; 40G/80G line cards → Nx100G
- Carrier-class hardware redundancy
- AC and DC systems
- Pay-as-you-grow, modular power
- Green emphasis throughout

ASR 9000 System Scalability – Outlasting the Future

                          10-slot (2009)       6-slot (2009)        18-slot (future)
  Linecards per chassis   8 LC + 2 RSP         4 LC + 2 RSP         16 LC + 2 RSP
  Linecard density        40/80 → 200 Gbps     40/80 → 200 Gbps     80 → 200 Gbps
  Bandwidth per slot      180 → 400 Gbps       180 → 400 Gbps       400 Gbps
  Bandwidth per chassis   2.8 → 6.4 Terabits   1.4 → 3.2 Terabits   12.8 Terabits
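The per-chassis bandwidth figures in the table follow from the slot count and the per-slot bandwidth, counted in both directions. A quick sketch to reproduce them (the full-duplex doubling is an assumption made here to match the quoted numbers):

# Reproduce the "bandwidth per chassis" figures from the scalability table.
# Assumes per-slot bandwidth is full duplex, so chassis bandwidth =
# linecard slots x per-slot Gbps x 2 directions.

def chassis_bandwidth_tbps(lc_slots: int, per_slot_gbps: int) -> float:
    return lc_slots * per_slot_gbps * 2 / 1000.0

print(chassis_bandwidth_tbps(8, 180))   # 10-slot today  -> 2.88 (quoted as 2.8 Tbps)
print(chassis_bandwidth_tbps(8, 400))   # 10-slot future -> 6.4 Tbps
print(chassis_bandwidth_tbps(4, 180))   # 6-slot today   -> 1.44 (quoted as 1.4 Tbps)
print(chassis_bandwidth_tbps(4, 400))   # 6-slot future  -> 3.2 Tbps
print(chassis_bandwidth_tbps(16, 400))  # 18-slot future -> 12.8 Tbps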
The ASR 9000 Chassis – Built on a Green Foundation
- Longevity: line card 3D space (L x W x H), power, cooling, and signal integrity designed for growth to 400 Gbps per slot
- "Green" efficiency: low wattage per Gbps of capacity
- "Pay as you grow": modular power supplies with 50 A DC input or 16 A AC for easy CO install
- Variable-speed fans for low noise output, with reduced power, for NEBS + OSHA compliance

Power Distribution – DC (N:1 protection), 10-slot chassis
- Single power zone, one power distribution bus; all modules load share
- 2 kW and 1.5 kW supplies (PS 0-5 across the two power shelves)
- Each power supply is wired to both the 'A' and 'B' feed
- A feed failure doubles the draw on the remaining feed; a supply failure increases the draw on the remaining supplies

Power Distribution – AC (1:1 protection), 10-slot chassis
- Single power zone, one power distribution bus; all modules load share
- AC power supplies are rated at 3 kW each
- 'A' feed wired to the top power shelf, 'B' feed wired to the bottom power shelf

10-Slot Chassis at 6 kW Redundant: AC "1:1" Protection vs. DC "N:1" Protection
- For 6 kW redundant AC, you need 4x 3 kW power supplies: a feed failure takes down two of them, a supply failure takes down one
- For 6 kW redundant DC, 4x 2 kW power supplies suffice: every supply sees both feeds, so a feed failure has no impact and only a supply failure needs to be protected against

Power Distribution – DC (N:1 protection), 6-slot chassis
- Single power entry shelf and distribution bus; all modules load share
- 2 kW and 1.5 kW supplies (PS 0-2)
- Each power supply is wired to both the 'A' and 'B' feed

Power Distribution – AC (2+1 protection), 6-slot chassis
- Single power entry shelf and distribution bus; all modules load share
- 3 kW supplies (PS 0-2)
- 6 kW maximum power per chassis; the additional 3 kW is used for protection

Power Check and Rules
- Available power is checked when:
  – An LC is inserted
  – An LC is powered up via the CLI
  – An LC is reset via "hw-mod reload"
- If the system does not have enough available power to accommodate the LC, the LC becomes "UNPOWERED"
- Installing new power supplies will not automatically power up any UNPOWERED line cards; the user can force a recheck using "hw-mod reload loc <>"
- RSP and fan tray cards are given priority allocation of the power budget
- The LC power budget is then checked in numeric slot order until it is exhausted (see the sketch below)
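A minimal sketch of this admission logic. Only the ordering rules come from the slide (RSPs and fan trays budgeted first, then line cards in numeric slot order); the function, structure and wattage figures are illustrative, not the actual IOS-XR implementation.

# Illustrative model of the power-budget check: RSPs and fan trays are budgeted
# first, then line cards are admitted in numeric slot order until the budget
# runs out. Cards that do not fit stay UNPOWERED until the user forces a
# recheck (the slide's "hw-mod reload loc <>" command on the real system).

def allocate_power(available_watts, rsp_fan_draw, lc_draw_by_slot):
    remaining = available_watts - sum(rsp_fan_draw)      # RSP/fan tray priority
    powered, unpowered = [], []
    for slot in sorted(lc_draw_by_slot):                 # numeric slot order
        draw = lc_draw_by_slot[slot]
        if draw <= remaining:
            powered.append(slot)
            remaining -= draw
        else:
            unpowered.append(slot)                       # LC stays UNPOWERED
    return powered, unpowered, remaining

# Illustrative draws only: 3 kW available, RSPs/fans take 800 W, slot 3 does not fit.
print(allocate_power(3000, [350, 350, 100], {0: 500, 1: 750, 2: 750, 3: 900}))
# -> ([0, 1, 2], [3], 200)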
RSP Engine
- Performs control plane and management functions
- Dual-core CPU with 4 GB DRAM
- 2 MB NVRAM, 2 GB internal bootdisk, 2 external compact flash slots
- Dual out-of-band 10/100/1000 management interfaces
- Console and auxiliary serial ports
- USB ports for file transfer
- Hard drive: 40 GB HDD
[Front panel: console port, management Ethernet, BITS, AUX port, status lights, alarm LEDs, compact flash slots]

RSP Engine Architecture
[Block diagram: CPU complex (CPU, 4 GB memory, boot flash, NVRAM, HDD, 4 GB CF card), timing domain (BITS clock, time FPGA), front-panel I/O (management Ethernet, console, aux, alarm, CF), punt FPGA and I/O FPGA, Ethernet switch for the EOBC / control-plane GE, arbitration logic, and two crossbar fabric ASICs with their fabric interfaces]

FCS Line Card Support
- Three line cards at FCS: A9K-4T (4x TenGigE, line rate), A9K-8T/4 (8x TenGigE, oversubscribed), A9K-40GE (40x GigE)
- 40 Gbps line rate, scalable architecture
- Flexible, microcode-based architecture
- Base and extended memory options: additional memory → higher scale; Medium Queue (-B) and High Queue / extended memory (-E) variants
- Common architecture enabling feature parity across all variants
- L2 and L3 feature coexistence on the same line card and chassis
- Advanced IP software licence for L3VPN

FCS Line Card Hardware Architecture – Example 1: A9K-4T-B
[Block diagram: four XFPs, each through a 10GE PHY to its own NPU (NPU 0-3), with 2 GB flash and 2 GB memory on the LC CPU; NPU pairs connect to Bridge FPGA 0 and Bridge FPGA 1, which feed the fabric interface ASIC (with arbitration) and its fabric channels to the crossbar fabric ASICs on RSP0 and RSP1 via the backplane; GigE EOBC and network clocking also connect to the RSPs]

Line Card Hardware Components
- Fabric interface ASIC
  – Provides the data connection to the switch fabric crossbar ASICs
  – Each fabric interface ASIC has one fabric channel (23 Gbps bidirectional) to each crossbar
  – With a dual-RSP system, each line card therefore has 4x 23 Gbps = 92 Gbps of bidirectional fabric bandwidth
  – Has hardware queues and virtual output queues for system-level QoS (see the system QoS section for more information)
  – Has a multicast replication table and performs multicast replication in hardware towards the two Bridge FPGAs
- Bridge FPGA
  – Connects the NPUs to the fabric interface ASIC and converts between the NPU header and the fabric header
  – Has hardware queues for system-level QoS
  – Has a multicast replication table and performs multicast replication in hardware towards the two NPUs
- "Trident" NPU (network processor unit)
  – Main forwarding engine: L2 and L3 lookups, features and multicast replication are all done by the NPU (more in the following slides)
- CPU
  – Same type of CPU as the RP
  – Some control plane protocols are distributed onto the local line card CPU for higher scale, e.g. BFD, CFM, ARP
  – Local software processes receive the FIB table from the RP and program the hardware forwarding tables into the NPU, Bridge FPGA and fabric interface ASIC
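From the fabric interface ASIC bullets above, per-line-card fabric bandwidth is the number of fabric interface ASICs on the card, times the crossbars reachable (two per installed RSP), times 23 Gbps per channel. A small sketch (the dual-fabric-ASIC 80G case anticipates figures shown later in the fabric overview):

# Per-linecard fabric bandwidth = fabric ASICs on the LC
#   x crossbars reachable (2 per installed RSP) x 23 Gbps per channel.

def lc_fabric_gbps(fabric_asics: int, rsps_installed: int, channel_gbps: int = 23) -> int:
    crossbars = 2 * rsps_installed
    return fabric_asics * crossbars * channel_gbps

print(lc_fabric_gbps(1, 2))  # 40G LC, dual RSP   -> 92 Gbps (4 x 23)
print(lc_fabric_gbps(1, 1))  # 40G LC, single RSP -> 46 Gbps
print(lc_fabric_gbps(2, 2))  # 80G LC, dual RSP   -> 184 Gbps (8 x 23)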
FCS Line Card Hardware Architecture – Example 2: A9K-8T/4-B (oversubscribed ~1.5:1)
[Block diagram: eight XFPs, two 10GE PHYs per NPU (NPU 0-3, roughly 15 Gbps / 29 Mpps unidirectional each), with 2 GB flash and 2 GB memory on the LC CPU; each Bridge FPGA aggregates two NPUs over 2x 30 Gbps into the fabric interface ASIC, which has 4x 23 Gbps of fabric bandwidth towards the crossbar fabric ASICs on RSP0 and RSP1 via the backplane; GigE EOBC and network clocking also connect to the RSPs]

Trident Network Processor Unit (NPU) – Main Forwarding Engine
- Multi-stage, microcode-based architecture; feature rich
- 10 Gbps bidirectional line rate
- Each NPU has four main associated memories: TCAM, search/lookup memory, frame/buffer memory and statistics memory
  – TCAM is used for VLAN tag, QoS and ACL classification
  – Lookup memory is used for storing the FIB tables, MAC address table and adjacencies
  – Stats QDR memory is used for interface statistics, forwarding statistics, etc.
  – Frame memory is the buffer memory for queues
- FCS line cards come in two versions, base and extended, which differ in memory size
  – Search memory is the same on base and extended cards, so mixing line cards does not impact system-level scale (routing table, multicast, MAC address table)
  – TCAM, QDR and frame memory are smaller on base cards, giving lower per-line-card scale for QoS queues and sub-interfaces

Hardware Subsystem: Linecard – Synchronous Ethernet Support on Existing HW
- RSP has BITS input, DTI and centralized clock distribution hardware
- Full support for L1 Sync-E on line cards (XR 3.9)
- Flexible time sourcing: line cards can recover clock and send it to the RSP, and receive transmit clock from the RSP
- Future: hardware capable of IEEE 1588v2

Hardware Subsystem: Linecard – Optics and SFP Support
- Gigabit SFPs (40xGE card): SFP-GE-T= (supports 10M, 100M and 1000M on Cat5/6 cable), SFP-GE-S=, SFP-GE-L=, SFP-GE-Z=
- 10GE XFPs (all Nx10GE cards): XFP-10GLR-OC192SR=, XFP-10GZR-OC192LR=, XFP-10GER-192IR+=
- CWDM SFPs: 1470 nm to 1610 nm
- DWDM XFPs and SFPs: most wavelengths from 1530-1561 nm, ITU 100 GHz C-band spacing

New Line Cards in Release 3.9.x
- 8x 10GE line-rate line cards
- Combo 2x 10GE + 20x GE line cards
- Low-queue line cards

ASR 9000 Ethernet Linecards – Capability Comparison

                   Low Queue*   Medium Queue   High Queue
  MAC addresses    512k         512k           512k
  IPv4 routes      1M           1M             1M
  VRFs             4k           4k             4k
  MPLS labels      128k         128k           128k
  L3 subif/port    4k           4k             4k
  Bridge domains   8k           8k             8k
  EFPs             4k           16k            32k
  Egress queues    8/port       64k            256k
  Policers         8k           128k           256k
  Packet buffer    50 ms        50 ms          150 ms

  Rows with identical values are common capabilities across the variants; EFPs, egress queues, policers and packet buffer are the differentiating metrics.
  * Low Queue line cards are targeted for 3.9
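Looking back at the Trident NPU slide above, here is a purely conceptual sketch of how a forwarded packet touches the four memories; the class, stages and table contents are illustrative, not the real microcode pipeline.

# Conceptual only: which NPU memory the bullets above associate with each step.
class TridentNpuModel:
    def __init__(self):
        self.tcam   = {("vlan", 100): "classify-to-class-A"}   # VLAN/QoS/ACL classification
        self.lookup = {"10.1.1.0/24": "adjacency-7"}           # FIB / MAC / adjacency memory
        self.stats  = {}                                       # QDR statistics memory
        self.frames = []                                       # frame/buffer memory (queues)

    def forward(self, vlan, prefix, packet):
        qos_class = self.tcam.get(("vlan", vlan), "default")   # 1. TCAM classification
        adjacency = self.lookup[prefix]                        # 2. lookup-memory FIB search
        self.stats[prefix] = self.stats.get(prefix, 0) + 1     # 3. statistics update
        self.frames.append((qos_class, adjacency, packet))     # 4. queue in frame memory
        return adjacency

npu = TridentNpuModel()
print(npu.forward(100, "10.1.1.0/24", b"payload"))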
Hardware Subsystem: Linecard – IPoDWDM / G.709 / WAN-PHY Support
- Available on the 2x10GE+20xGE and 8x TenGE cards: these cards have a separate Ethernet MAC chip that provides WAN-PHY and G.709 support
- WAN-PHY: provides Ethernet framing over SONET/SDH at the OC-192 signalling rate (9.953 vs. 10.0 Gbps)
- IPoDWDM / G.709:
  – Uses forward error correction (FEC) coding to improve the signal-to-noise ratio
  – Extends fiber span distances without regeneration
  – Two FEC variants on ASR 9000 line cards: "standard" FEC (a.k.a. "G-FEC"), compatible with most vendors' L2/L3 equipment and MSTP; "enhanced" FEC, proprietary, compatible with the 7600 and other ASR 9000s but not CRS-1
- Software licensing for G.709: WAN-PHY is configurable without any additional licence; G.709 / FEC / EFEC requires an additional licence per line card

Cisco ASR 9000 8x10GE Line-Rate LC Testimonials
- "LR and EANTC had a first look at a linecard with eight 10Gigabit Ethernet ports ... This shows that the device could handle delivering 160 Gbit/s of multicast and unicast traffic or 80 Gbit/s in each direction."
- "The ASR 9010 was able to deliver high-priority traffic, such as VoIP calls, even when the network was under an unusual traffic load and fending off a simulated denial-of-service attack."
- European Advanced Networking Test Center AG (EANTC) test of the Cisco ASR 9000, commissioned by Light Reading: http://www.lightreading.com/document.asp?doc_id=177356&page_number=9

Agenda
- Hardware Architecture Overview (Chassis, RSP, Line Card)
- Switch Fabric Architecture and Fabric/System QoS
- Multicast Architecture
- QoS Overview

Fabric Overview
- The fabric is logically separate from the LCs/RSPs but physically resides on the RSPs
- Separate data and arbitration paths
- Each LC/RSP has a fabric interface ASIC (80 Gbps line-rate LCs have two fabric interface ASICs)
- 23 Gbps per fabric channel
  – Single-fabric-interface 40G line card: 4x 23 Gbps = 92 Gbps with dual RSP, 2x 23 Gbps = 46 Gbps with a single RSP
  – Dual-fabric-interface 80G line card: 8x 23 Gbps = 184 Gbps with dual RSP, 4x 23 Gbps = 92 Gbps with a single RSP

Fabric Load Sharing – Unicast
- Unicast traffic is sent across the first available fabric link to the destination (maximizes efficiency)
- Each frame (or superframe) carries sequencing information
- All destination fabric interface ASICs have re-sequencing logic
- The additional re-sequencing latency is measured in nanoseconds

Fabric Load Sharing – Multicast
- Multicast traffic is hashed based on (S,G) information to maintain flow integrity; flows exit in order
- The very large set of multicast destinations precludes re-sequencing
- Multicast traffic is not arbitrated; it is sent across a different fabric plane
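A minimal sketch of the two load-sharing behaviours just described: unicast takes the first available fabric link and relies on re-sequencing at the destination, while multicast hashes on (S,G) so a given flow always uses the same link and stays in order. The CRC hash and the four-link model are illustrative, not the actual ASIC algorithm.

# Illustrative fabric-link selection: unicast = first available link,
# multicast = stable hash on (S,G) so each flow stays on one link (in order).
import zlib

FABRIC_LINKS = [0, 1, 2, 3]          # e.g. 4 channels with dual RSP on a 40G LC

def pick_unicast_link(link_busy):
    # first link that is not busy; the destination ASIC re-sequences by frame number
    return next(l for l in FABRIC_LINKS if not link_busy[l])

def pick_multicast_link(source_ip: str, group_ip: str):
    # the same (S,G) always hashes to the same link -> no re-sequencing needed
    key = f"{source_ip},{group_ip}".encode()
    return FABRIC_LINKS[zlib.crc32(key) % len(FABRIC_LINKS)]

print(pick_unicast_link({0: True, 1: False, 2: False, 3: False}))   # -> 1
print(pick_multicast_link("10.0.0.1", "232.1.1.1"))                 # stable per flow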
Fabric Super-framing Mechanism
- Multiple unicast frames from/to the same destination are aggregated into one superframe
- A superframe is built whenever frames are waiting in the queue: up to 32 frames, or whatever has accumulated when the minimum threshold is met, are aggregated into one superframe
- Super-framing applies only to unicast, not multicast
- Super-framing significantly improves total fabric throughput
[Illustration: without super-framing each packet crosses the fabric individually; with super-framing packets are packed into minimum- or maximum-sized superframes, up to jumbo size]

Access to Fabric Bandwidth – Arbitration
- Access to the fabric is controlled by central arbitration, performed by a central high-speed arbitration ASIC on the RSP
- At any time a single arbiter is responsible for arbitration
- The arbitration algorithm is QoS aware: P1 classes have preference over P2 classes, and both have preference over non-priority classes
- Arbitration ensures bandwidth fairness among the line cards
- Arbitration is performed relative to a given egress 10G complex (NPU)
- Fabric capacity on egress modules is represented by virtual output queues (VOQs) at the ingress to the fabric

Fabric Arbitration
1. Fabric request from the ingress fabric interface / VOQ
2. Arbitration by the active arbiter on the RSP
3. Fabric grant to the ingress fabric interface
4. Load-balanced transmission across the fabric links
5. Credit return

ASR 9000 Advanced System QoS (1) – User Priority Is Mapped to System Priority → End-to-End Priority
- Low-priority packets are dropped first during congestion
- System queues in the fabric interface ASIC: P1, P2 and 2x BE
- System queues in the Bridge FPGA: high and low
- User (interface) level queues: egress queue hierarchy with priority propagation
[Illustration: 5 Gbps high + 5 Gbps low priority from each of two ingress NPUs converging on one egress 10 Gbps NPU; the receive queue itself is non-blocking and the system queues preserve the priority end to end]

User QoS to System QoS Mapping
- The ASR 9000 supports traffic differentiation at all relevant points within the system: P1/P2/BE or HP/LP differentiation throughout
- Classification into these priorities is based on the user's ingress QoS classification into P1, P2 or other queues on the ingress line card
  – A packet classified into a P1 class on ingress is mapped to the PQ1 queue along the system QoS path
  – A packet classified into a P2 class on ingress is mapped to the PQ2 queue along the system QoS path; if a system component has only one priority queue, both user P1 and P2 map to the same system PQ
  – A packet classified into a non-PQ1/2 class on ingress is mapped to the LP queue along the system QoS path
- Note: the marking is implicit; assigning a packet to a given queue on ingress sets the fabric header priority bits, so no explicit "set" action is required
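A sketch of the mapping rules above: the ingress classification into P1/P2/other implicitly selects the system queue at every hop, and a component with only one priority queue folds P1 and P2 together. The function and queue names are illustrative.

# Map the user's ingress class priority to the system queue used at each hop.
# Components with two priority queues keep P1/P2 distinct; components with a
# single priority queue (e.g. the Bridge FPGA's Hi/Lo pair) fold them together.

def system_queue(user_priority: str, component_pq_count: int) -> str:
    if user_priority == "P1":
        return "PQ1" if component_pq_count >= 2 else "HP"
    if user_priority == "P2":
        return "PQ2" if component_pq_count >= 2 else "HP"
    return "LP"          # everything else rides the low-priority / best-effort queue

for prio in ("P1", "P2", "BE"):
    print(prio, system_queue(prio, 2), system_queue(prio, 1))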
ASR 9000 Advanced System QoS (2) – Backpressure and VOQs → Avoiding Head-of-Line Blocking
- The ingress LC fabric interface has virtual output queues per egress NPU, for all LCs in the system
- Packets to different egress NPUs sit in different VOQs, so there is no head-of-line blocking
- Backpressure is applied to the fabric interface ASIC if the high-priority queues become congested
[Illustration: 6 Gbps and 4 Gbps ingress flows DRR-scheduled into VOQs for slot 2/T0 and slot 2/T1 towards a 10 Gbps egress NPU]

What Are VOQs?
- Virtual output queues (VOQs) on ingress modules represent fabric capacity on egress modules
- If a VOQ is available at the ingress to the fabric, capacity exists at the egress module to receive traffic from the fabric
- The central arbiter determines whether a VOQ is available for a given packet
- A VOQ is "virtual" because it represents EGRESS capacity but resides on INGRESS modules; it is still a physical buffer where packets are stored
- Note: a VOQ is NOT equivalent to an ingress or egress port buffer or queue; it relates only to the ASICs at the ingress and egress to the fabric

Fabric Interface ASIC VOQs
- 136 ingress VOQs are used:
  – 8 destination LCs x 4 10G ports (NPUs) per LC x 4 classes per port = 128 VOQs for LCs
  – 2 destination RSPs x 1 10G port per RSP x 4 classes per port = 8 VOQs for RSPs
  – plus 4 multicast queues
- 20 egress fabric queues:
  – 4 classes per port x 4 ports per LC (unicast) = 16
  – 4 multicast classes = 4
[Illustration: per-destination DRR schedulers feeding the ingress scheduler and crossbars, and an egress scheduler fanning out to the local ports]

VOQ Destinations
- For every "destination" on other modules in the system, each ingress module has a corresponding VOQ with four priority levels
- One VOQ with four priority levels serves one "destination", which is one NPU complex on an egress module
- One NPU complex can drive: one front-panel 10G port (non-blocking line card), or two front-panel 10G ports (oversubscribed line card), or ten front-panel 10/100/1000 ports

ASR 9000 Advanced System QoS (3) – Multicast and Unicast Separation → Unicast:Multicast Bandwidth Fairness
- Guaranteed unicast-to-multicast bandwidth ratio under congestion
- Separate unicast and multicast fabric planes
- Unicast and multicast have separate system queues (high and low)
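The queue counts on the "Fabric Interface ASIC VOQs" slide above can be reproduced directly:

# Reproduce the queue counts from the "Fabric Interface ASIC VOQs" slide.
unicast_voqs  = 8 * 4 * 4 + 2 * 1 * 4   # 8 LCs x 4 NPUs x 4 classes + 2 RSPs x 1 port x 4 classes = 136
multicast_inq = 4                       # 4 multicast classes (not arbitrated)
egress_queues = 4 * 4 + 4               # 4 classes x 4 ports (unicast) + 4 multicast = 20
print(unicast_voqs, multicast_inq, egress_queues)   # -> 136 4 20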
Advanced System QoS Summary
- Central arbitration for fabric access
  – Ensures fair access to bandwidth when multiple ingress ports transmit to one egress port
  – The central arbiter ensures all traffic sources get appropriate access to fabric bandwidth, even when the sources are on different modules
- System queue priority
  – Ensures priority traffic takes precedence over best-effort traffic across system components
  – User priority is mapped to system priority automatically
- Flow control and VOQs
  – Prevent congested egress ports from blocking ingress traffic destined to other ports
  – Mitigate head-of-line blocking by providing a dedicated buffer (VOQ) for each destination across the fabric
- Unicast and multicast queue/fabric separation
  – Guarantees the unicast-to-multicast bandwidth ratio across the fabric and system components under congestion

Agenda
- Hardware Architecture Overview (Chassis, RSP, Line Card)
- Switch Fabric Architecture and Fabric/System QoS
- Multicast Architecture
- QoS Overview

ASR 9000 Multicast Architecture Overview – Clean, Simple and Scalable for Bandwidth Efficiency and Guaranteed QoS
- Clean and scalable multicast architecture
  – Replication points at the different system components; replication always happens in the fabric and on the egress line card
  – Line-rate multicast replication, independent of scale
- Bandwidth-efficient multicast replication
  – The packet is replicated in the most optimized way
- Guaranteed system QoS
  – Separate unicast and multicast high- and low-priority system-level queues and a separate switch fabric plane: simple, predictable, guaranteed priority
  – Guaranteed unicast-to-multicast bandwidth ratio
[Illustration: a multicast source on LC1 and IGMP joins on LC2 and LC3; the numbered replication points 1-4 are the fabric, the fabric interface ASIC, the Bridge FPGA and the NPU]

ASR 9000 Multicast Architecture Overview
- Control plane
  – All multicast control protocols run on the RP
  – The local line card CPU handles exceptional multicast packets (software-switched packets, multicast signaling, etc.)
- Distributed forwarding plane
  – Multicast forwarding is fully distributed on the LCs
  – Two-stage forwarding: the ingress LC lookup determines the destination egress LCs and NPUs; the egress LC lookup determines the destination egress ports (see the sketch below)
- Optimal hardware-based multicast packet replication
  – The packet is replicated in the switch fabric and on the egress LC; the ingress LC never replicates multicast packets
  – Replication is done by hardware in an efficient way
  – Line-rate multicast with a fully loaded chassis
- L2 vs. L3 multicast
  – L2 and L3 multicast have separate control planes
  – L2 and L3 multicast share a uniform data forwarding plane
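A compact sketch of the two-stage lookup just described: the ingress LC only decides which egress NPUs need a copy, and each egress NPU then decides its own output ports. The table layout and names here are illustrative; the real system encodes the first stage in the FGID/MGID tables covered on the next slides.

# Two-stage multicast forwarding, in miniature. Ingress lookup -> set of egress
# NPUs that need a copy; egress lookup (per NPU) -> local output ports.
ingress_mfib = {("10.0.0.1", "232.1.1.1"): {"LC1/NPU0", "LC2/NPU3"}}                 # stage 1
egress_olist = {"LC1/NPU0": ["Te0/1/0/2"], "LC2/NPU3": ["Te0/2/0/0", "Te0/2/0/1"]}   # stage 2

def forward(source, group):
    for npu in ingress_mfib[(source, group)]:      # one copy per egress NPU
        for port in egress_olist[npu]:             # replication to ports happens at egress
            print(f"copy of ({source},{group}) out {port} via {npu}")

forward("10.0.0.1", "232.1.1.1")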
L3 Multicast Control Plane (T: Trident NPU, B: Bridge FPGA, MGID: Multicast Group ID, FGID: Fabric Group ID)
- Incoming IGMP and PIM packets are punted directly to the RP, bypassing the LC CPU
- The protocols (PIM/IGMP) send their route/OLIST information to the MRIB process
- MRIB sends the multicast state information (mroute, olist) to MFIB, a process running on the LC CPU
- MRIB assigns an FGID to each mroute, indicating which slots should receive a multicast copy; this is based on the OLIST information
- MRIB assigns a globally unique MGID to each mroute
- MFIB programs the hardware forwarding tables in the NPU, Bridge FPGA and fabric interface ASIC:
  – The MGID table in the fabric interface ASIC indicates which Bridge FPGAs should receive a copy, based on the OLIST information
  – The MGID table in the Bridge FPGA does the same towards the NPUs
  – The FIB table in the NPU is programmed similarly to IPv4 unicast

L2 Multicast Control Plane (T: Trident NPU, B: Bridge FPGA, MGID: Multicast Group ID, FGID: Fabric Group ID)
- Incoming IGMP packets are punted directly to the RP, bypassing the LC CPU
- The IGMP snooping process sends its information to the L2FIB process
- L2FIB sends the multicast state information (mroute, olist) to the L2FIB process running on the LC CPU
- L2FIB assigns an FGID to each mroute, indicating which slots should receive a multicast copy; this is based on the OLIST information
- L2FIB assigns a globally unique MGID to each mroute
- L2FIB programs the hardware forwarding tables in the NPU, Bridge FPGA and fabric interface ASIC:
  – The MGID table in the fabric interface ASIC indicates which Bridge FPGAs should receive a copy, based on the OLIST information
  – The MGID table in the Bridge FPGA does the same towards the NPUs
  – The FIB table in the NPU is programmed similarly to L3 multicast

Data Plane – Efficient Multicast Packet Replication (FGID: Fabric Group ID, MGID: Multicast Group ID, FPOE: Fabric Point of Exit)
1. Fabric replication → the packet is replicated to the egress LCs based on the FGID and FPOE table
2. Fabric interface replication → the packet is replicated to the Bridge FPGAs based on the MGID table
3. Bridge FPGA replication → the packet is replicated to the NPUs based on the MGID table
4. NPU replication → the packet is replicated to the egress ports
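A sketch of the four replication points above, fanning out stage by stage (fabric by FGID, fabric interface and Bridge FPGA by MGID, NPU to its ports). The table contents are illustrative.

# Hierarchical replication: each stage only fans out to the next stage's consumers,
# so no single component has to replicate to every receiver.
fgid_table     = {100: ["LC1", "LC2"]}                             # fabric: which egress LCs
mgid_fabric_if = {200: {"LC1": ["B0"], "LC2": ["B0", "B1"]}}       # which Bridge FPGAs
mgid_bridge    = {200: {"B0": ["NPU0"], "B1": ["NPU3"]}}           # which NPUs
npu_olist      = {("LC1", "NPU0"): ["port2"], ("LC2", "NPU0"): ["port1"],
                  ("LC2", "NPU3"): ["port4", "port5"]}             # which egress ports

def replicate(fgid, mgid):
    for lc in fgid_table[fgid]:                        # 1. crossbar fabric replication
        for bridge in mgid_fabric_if[mgid][lc]:        # 2. fabric interface ASIC replication
            for npu in mgid_bridge[mgid][bridge]:      # 3. Bridge FPGA replication
                for port in npu_olist[(lc, npu)]:      # 4. NPU egress replication
                    print(f"{lc}/{bridge}/{npu} -> {port}")

replicate(100, 200)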
Agenda
- Hardware Architecture Overview (Chassis, RSP, Line Card)
- Switch Fabric Architecture and Fabric/System QoS
- Multicast Architecture
- QoS Overview

ASR 9000 QoS Capability Overview
- Very scalable SLA enforcement
  – Up to 3 million queues per system (with extended line cards)
  – Up to 2 million policers per system (with extended line cards)
- Hierarchical scheduling support
  – Four-layer scheduling hierarchy: Port, Subscriber Group, Subscriber, Class
  – Egress and ingress dual-priority scheduling with priority propagation for minimum latency and jitter
- Flexible and granular classification
  – Full Layer 2 and full Layer 3/4 IPv4 and IPv6 classification
- Robust implementation
  – System-level QoS on the fabric and LC system components
  – H-QoS uses dedicated, purpose-built traffic manager hardware

4-Layer Hierarchy QoS Overview
- L1 Port level → L2 Subscriber group level (S-VLAN / EFP) → L3 Subscriber level (C-VLAN / EFP) → L4 Class level
- Note on how hierarchies are counted:
  – A 4-layer hierarchy = a 3-level nested policy-map; a 3-layer hierarchy = a 2-level nested policy-map
  – The L1 (port) level is not configurable; it is implicitly assumed
  – The hierarchy levels used are determined by how many nested levels a policy-map is configured with and applied to a given subinterface
  – Maximum of 8 classes per subscriber level

H-QoS – Priority and Priority Propagation
- Priority level 1 and 2 support
  – The level 1 high-priority queue is scheduled at strict priority as long as it has not exceeded its configured maximum bandwidth, enforced by policing
  – The level 2 high-priority queue is scheduled at relative strict priority after PQ level 1 has been scheduled, as long as it (PQ L2) has not exceeded its configured maximum bandwidth, enforced by policing or shaping
- Priority propagation
  – Strict-priority scheduling (the latency/priority behaviour) is carried through all layers of the hierarchy when there is congestion at any level
  – Latency assurance at a child class is automatically assured at the parent/grandparent levels for traffic in that class; under congestion at the parent/grandparent levels, traffic in this class is serviced first
- Unshaped priority traffic for lowest latency
  – If priority level 1 traffic is scheduled into a parent shaper it is NOT actually shaped, but scheduled at line rate
  – It is only accounted for at the parent scheduler so that the shapers are not overrun

www.cisco.com/go/asr9000
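As a closing illustration of the H-QoS rules above, a minimal sketch of one scheduling round at a congested parent, assuming the offered PQ1 load is already within its policed maximum; this is illustrative, not the actual traffic manager behaviour.

# One scheduling round at a parent shaper, illustrating the two rules above:
# PQ1 is serviced first and is not shaped at the parent (only accounted for),
# PQ2 comes next, and the remaining parent bandwidth goes to normal classes.

def parent_round(parent_rate, pq1_offered, pq2_offered, be_offered):
    sent_pq1 = pq1_offered                      # priority 1: sent at line rate, unshaped
    budget = max(parent_rate - sent_pq1, 0)     # ...but accounted against the parent shaper
    sent_pq2 = min(pq2_offered, budget)         # priority 2: strict after PQ1
    budget -= sent_pq2
    sent_be = min(be_offered, budget)           # best effort gets whatever is left
    return sent_pq1, sent_pq2, sent_be

print(parent_round(parent_rate=100, pq1_offered=30, pq2_offered=40, be_offered=60))
# -> (30, 40, 30): priority classes are served first even though the parent is congested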