ACI Troubleshooting: Endpoints
Andy Gossett, DCBU ACI Escalation
@agccie
BRKACI-2641
#CLUS

Cisco Webex Teams
Questions? Use Cisco Webex Teams to chat with the speaker after the session.
How:
1 Find this session in the Cisco Live Mobile App
2 Click "Join the Discussion"
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space
Webex Teams will be moderated by the speaker until June 16, 2019.
cs.co/ciscolivebot#BRKACI-2641
#CLUS © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public

[Figure: schedule of related ACI sessions: BRKACI-1001, BRKACI-3545, BRKACI-2641, BRKACI-2642, BRKACI-2643, BRKACI-2644, BRKACI-2645, BRKACI-2271, BRKACI-2934]

Agenda
• ACI Endpoint Learning
• Configuration Options
• Endpoint Learning Troubleshooting Tips

Acronyms/Definitions (Reference Slide)
    Acronym        Definition
    ACI            Application Centric Infrastructure
    ACL            Access Control List
    APIC/IFC       Application Policy Infrastructure Controller / Insieme Fabric Controller
    BD             Bridge Domain
    COOP           Council of Oracle Protocol
    ECMP           Equal Cost Multipath
    EP             Endpoint
    EPG            Endpoint Group
    EPM            Endpoint Manager
    EPMC           Endpoint Manager Client (LC component)
    FTEP/VTEP      Fabric/Virtual or VXLAN Tunnel Endpoint
    LPM            Longest Prefix Match
    MDT            Multicast Distribution Tree
    pcTag          Policy Control Tag
    PL             Physical Local
    sclass         Source class (source pcTag)
    SVI            Switch Virtual Interface
    TC             Topology Change
    VL             Virtual Local
    VNID           Virtual Network Identifier
    VXLAN/iVXLAN   Virtual Extensible LAN / Insieme VXLAN
    XR             VXLAN Remote

Endpoint Learning

What is an ACI Endpoint
Depends on who's counting…
• An endpoint is a MAC with one or more IPv4 (/32) or IPv6 (/128) addresses
• An endpoint is a MAC, IPv4 (/32), or IPv6 (/128) address
    fvCEp   <epg-dn>/cep-00:00:00:00:0a
    fvIp    <epg-dn>/cep-00:00:00:00:0a/ip-[10.0.0.10]
    Spine coop db   mac: 00:00:00:00:0a   count: 1   ip0 : 10.0.0.10
    Endpoint            Synthetic IP
    00:00:00:00:00:0a   28.186.73.78
    10.0.0.10           21.215.190.9
Two hardware entries

What is an ACI Endpoint
Why the count matters
• # MAC w/ one or more IPs
• # MAC + # IP
• 450K max

Classical Learning
Frame fields: L4/Payload, Proto, DIP, SIP, 802.1Q, SMAC, DMAC
Encap + Interface => VLAN
VLAN => VRF
• L2 Forwarding for (VLAN, DMAC)
• L2 Learning for (VLAN, SMAC) => (Interface)
• L3 Forwarding for (VRF, DIP)
L2 Forwarding:
• (VLAN, DMAC) Miss => Flood
• (VLAN, DMAC) Gateway MAC => Route
• (VLAN, DMAC) Hit => Destination Port
  Config on destination port + VLAN determines egress encap (tagged or untagged)
L3 Forwarding (Longest Prefix Match):
• (VRF, DIP) Miss => Drop
• (VRF, DIP) Hit => Adjacency
  Might be Glean or packet rewrite (SMAC, DMAC, VLAN, etc…), may include destination port in adjacency or require second L2 lookup on new DMAC

Classical Learning
LPM Routes
• Connected/direct routes manually configured
• Static/dynamic routing protocols to learn prefixes
Host Routes (IP Endpoints)
• Glean adjacency for connected routes to punt frame and generate ARP request
• ARP/ND used to create MAC to IP binding and install host route into routing table
    Route            Adj
    10.1.1.101/32    …
    20.1.1.101/32    …
    10.1.1.0/24      Glean
    20.1.1.0/24      Glean
ARP Packet fields: DMAC, SMAC, Eth: 0x0806, Hdr/Opcode, Sender MAC, Sender IP, Target MAC, Target IP
[Diagram: hosts 10.1.1.101/24 and 20.1.1.101/24 exchanging ARP]

ACI Learning (Physical Local - PL)
EPGs and L3 Learning
Frame fields: L4/Payload, Proto, DIP, SIP, 802.1Q, SMAC, DMAC
Encap + Interface => EPG
EPG => BD
BD => VRF
• L2 Forwarding for (BD, DMAC)
• L2 Learning for (BD, SMAC) => (EPG, Interface)
• L3 Learning for (VRF, SIP) => (EPG, Interface)
• L3 Forwarding for (VRF, DIP)
L2 Forwarding:
• (BD, DMAC) Miss => (Flood/Proxy+Drop)
• (BD, DMAC) Gateway MAC => Route
• (BD, DMAC) Hit => Adjacency
L3 Forwarding (Longest Prefix Match):
• (VRF, DIP) Miss => Drop
• Proxy/Glean for BD subnets
• (VRF, DIP) Hit => Adjacency
  Adjacency contains dst EPG, encap information, dst VTEP or port, etc… More in upcoming slides

ACI Learning (ARP)
Optimize Forwarding (ARP Flooding disabled)
Frame fields: Target IP, Target MAC, Sender IP, Sender MAC, Hdr/Opcode, ethtype ARP, 802.1Q, SMAC, DMAC
Encap + Interface => EPG
EPG => BD
BD => VRF
• L2 Learning for (BD, SMAC) => (EPG, Interface)
• L2 Learning for (BD, ARP SMAC) => (EPG, Interface)
• L3 Learning for (VRF, ARP Sender IP) => (EPG, Interface)
• L3 Forwarding for (VRF, ARP Target IP)
ARP L3 Forwarding:
• (VRF, ARP Target IP) Miss => Proxy
• (VRF, ARP Target IP) Hit => Adjacency
L3 forwarding is based on the ARP target IP field, with a miss sent to the spine proxy.
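
The local (PL) learning steps on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dictionaries stand in for the leaf's classification and endpoint tables, the `Leaf` class and its field names are hypothetical, and the caller decides when a source IP is present (routed packets, or the ARP sender IP).

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    # Hypothetical tables; names and contents are illustrative, not an ACI API.
    encap_to_epg: dict = field(default_factory=dict)  # (interface, encap) -> EPG
    epg_to_bd: dict = field(default_factory=dict)     # EPG -> BD
    bd_to_vrf: dict = field(default_factory=dict)     # BD -> VRF
    mac_table: dict = field(default_factory=dict)     # (BD, MAC) -> (EPG, interface)
    ip_table: dict = field(default_factory=dict)      # (VRF, IP) -> (EPG, interface)

    def learn(self, interface, encap, smac, sip=None):
        """Classify a frame, then learn its source MAC and (if given) source IP."""
        epg = self.encap_to_epg[(interface, encap)]    # Encap + Interface => EPG
        bd = self.epg_to_bd[epg]                       # EPG => BD
        vrf = self.bd_to_vrf[bd]                       # BD => VRF
        self.mac_table[(bd, smac)] = (epg, interface)  # L2 learning
        if sip is not None:
            self.ip_table[(vrf, sip)] = (epg, interface)  # L3 learning
        return epg, bd, vrf


leaf = Leaf(
    encap_to_epg={("po3", "vlan-101"): "e1"},
    epg_to_bd={"e1": "bd1"},
    bd_to_vrf={"bd1": "v1"},
)
classification = leaf.learn("po3", "vlan-101", "0000.0000.000a", "10.1.1.101")
```

Note that the MAC is keyed by BD while the IP is keyed by VRF, which mirrors the (BD, SMAC) and (VRF, SIP) learning keys on the slide.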

ACI Learning (Virtual Local - VL)
VXLAN tunnel between host (AVS/AVE/OVS) and fabric
Frame fields:
    Inner Header:  L4/Payload, Proto, DIP, SIP, ethtype, SMAC, DMAC
    VXLAN:         VNID, Rsvd
    Outer Header:  Proto UDP, DIP, SIP, 802.1Q, SMAC, DMAC
Outer header labels: Fabric TEP, Host VTEP, Infra VLAN, Infra BD MAC, Host MAC
External VNID => EPG
EPG => BD
BD => VRF
• L2 Forwarding for (BD, DMAC)
• L2 Learning for (BD, SMAC) => (EPG, Tunnel)
• L3 Learning for (VRF, SIP) => (EPG, Tunnel)
• L3 Forwarding for (VRF, DIP)

iVXLAN Header
[Figure: iVXLAN frame layout. Outer: MAC Header, 802.1Q, IPv4 Header, UDP Header, iVXLAN Header. Inner: MAC Header, IPv4 Header, UDP Header, Payload, FCS. The iVXLAN header (bits 0 to 63) carries Flags (including DL, E, SP, DP), Source Group, Virtual Network Identifier (VNID), and Reserved fields.]
    Abbr.          Name                                Description
    DL             Do not learn                        Informs remote leaf that it should not perform dataplane learning from this frame
    E              Exception                           Set when frame has gone through proxy path
    SP             Source-policy-applied               Policy has already been applied to this frame
    DP             Destination-policy-applied          (DP and SP are always set together)
    sclass/pcTag   Source group (policy-control tag)   16-bit policy control tag representing the EPG that sourced the frame

ACI Learning (Remote - XR)
Frame fields:
    Inner Header:   L4/Payload, Proto, DIP, SIP, ethtype, SMAC, DMAC
    iVXLAN:         VNID, flags, EPG
    Outer Header:   Proto UDP, DIP, SIP, 802.1Q, SMAC, DMAC
Labels: Dst Leaf VTEP, Src Leaf VTEP, Fabric QoS, Internal MAC, EPG (pcTag), BD or VRF VNID (based on routed or switched)
• L2 Forwarding for (BD, DMAC)
• L2 Learning for (BD, SMAC) => (EPG, Tunnel)
• L3 Learning for (VRF, SIP) => (EPG, Tunnel)
• L3 Forwarding for (VRF, DIP)

ACI Learning
Learning Exceptions
• No IP EP learning if routing is disabled on the BD
• No IP EP learning on external BDs (Layer-3 Outside interfaces)
• No IP EP learning on Infra VLAN
• No IP learning of shared service prefixes outside of our VRF
LPM Routes (Same as Classical)
• Pervasive SVI Routes (BD Subnets)
• Static and dynamic routing protocols on L3Out
[Diagram: VXLAN/Opflex traffic between host (AVS/AVE/OVS) and fabric over a VXLAN tunnel on the Infra VLAN; static/dynamic routing on the L3Out toward WAN/Internet]

ACI Learning
    Frame                     Forwarding Operation   Learn
    NonIP/IP                  Bridged                MAC
    ARP                       -                      MAC (sender-HW), IP (sender-IP)
    IPv4 Unicast              Routed                 MAC, IP
    IPv6 Unicast              Routed                 MAC, IP
    IPv6 Neighbor Discovery                          MAC, IP
Leaf Endpoint Database
• VRF: Remote IP Entries (VRF, IP)
• BD: Remote MAC Entries (VRF, BD, MAC)
• Encap: Local MAC and IP Entries (VRF, BD, VLAN/VXLAN, MAC), (VRF, BD, VLAN/VXLAN, IP)
• Endpoint Entry: EPG (pcTag), Interface/Tunnel, Control flags
• A MAC entry has a relationship to multiple IP entries

ACI Learning (COOP and EP Sync)
• COOP sync between oracles (spines)
• Spines learn all endpoints through COOP
• COOP citizen (leaf) update to oracle (spine) for local EP learn
• Local learn on leaf from dataplane packet
• Remote learn on leaf from dataplane packet
• EP sync between vPC peers for remote learns
• EP sync between vPC peers for local learns (both orphan and vPC ports)
[Diagram: vPC Domain 1 and vPC Domain 2]

ACI Learning: Review
• MAC learning for all frames
• IP learning for routed packets and ARP packets
• No IP learning on frames received on L3Out or Infra vlan
• All local endpoint learns are published to coop
• Spine has full knowledge of all fabric endpoints
• Proxy forwarding for any fabric endpoint allowing for zero-penalty impact for remote endpoint miss

Moves and Bounce
Initial State (host A on leaf101/102, host B on leaf104)
Spines
    Addr  Interface  Detail
    A     tun1001    leaf101/102 vTEP
    B     tun4       leaf104 TEP
Leaf101/102
    Addr  Interface  Detail
    A     vpc1       local vpc
    B     tun4       XR -> leaf104
Leaf 103
    Addr  Interface  Detail
    -     -          -
Leaf 104
    Addr  Interface  Detail
    A     tun1001    XR -> leaf101/102 VIP
    B     eth1/1     local learn

Moves and Bounce
1 Host A moves to leaf-103
2 Learn on leaf103, published to coop
3 Spines receive event and update leaf101/102
4 Bounce set on old leaf101/102
! leaf104 still points to old tunnel
Spines
    Addr  Interface             Detail
    A     tun3 (was tun1001)    leaf103 TEP (was leaf101/102 vTEP)
    B     tun4                  leaf104 TEP
Leaf101/102
    Addr  Interface                 Detail
    A     tun3, bounce (was vpc1)   XR -> leaf103 with bounce bit set (was local vpc)
    B     tun4                      XR -> leaf104
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn from 1st packet
Leaf 104
    Addr  Interface  Detail
    A     tun1001    XR -> leaf101/102 VIP
    B     eth1/1     local learn

Moves and Bounce
1 Host B sends packet to host A
2 leaf101/102 bounce to leaf103
3 leaf103 learns host B to leaf104
Spines
    Addr  Interface  Detail
    A     tun3       leaf103 TEP
    B     tun4       leaf104 TEP
Leaf101/102
    Addr  Interface      Detail
    A     tun3, bounce   XR -> leaf103 with bounce bit set
    B     tun4           XR -> leaf104
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     tun4       XR -> leaf104 (new)
Leaf 104
    Addr  Interface  Detail
    A     tun1001    XR -> leaf101/102 VIP
    B     eth1/1     local learn

Moves and Bounce
4 Host A sends packet to host B
5 leaf104 updates XR to leaf103
Spines
    Addr  Interface  Detail
    A     tun3       leaf103 TEP
    B     tun4       leaf104 TEP
Leaf101/102
    Addr  Interface      Detail
    A     tun3, bounce   XR -> leaf103 with bounce bit set
    B     tun4           XR -> leaf104
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     tun4       XR -> leaf104
Leaf 104
    Addr  Interface            Detail
    A     tun3 (was tun1001)   XR -> leaf103 TEP (was XR -> leaf101/102 VIP)
    B     eth1/1               local learn
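
The move-and-bounce sequence above can be replayed with a small simulation. This is a simplified sketch, not how the fabric is implemented: each table is a Python dict, `leaf101` stands for the leaf101/102 vPC pair, and the function names (`new_fabric`, `move`, `send`) are made up for illustration. Frames that reach a leaf with neither a local nor a bounce entry are treated as dropped, which is a modelling assumption.

```python
def new_fabric():
    """Initial state from the slides; 'leaf101' stands for the leaf101/102 vPC pair."""
    return {
        "coop": {"A": "leaf101", "B": "leaf104"},  # spine view of endpoint owners
        "leaf101": {"A": ("local", "vpc1"), "B": ("xr", "leaf104")},
        "leaf103": {},
        "leaf104": {"A": ("xr", "leaf101"), "B": ("local", "eth1/1")},
    }


def move(fabric, addr, new_leaf, port):
    """Host moves: local learn on the new leaf, coop update, bounce on the old leaf."""
    old_leaf = fabric["coop"][addr]
    fabric[new_leaf][addr] = ("local", port)
    fabric["coop"][addr] = new_leaf
    fabric[old_leaf][addr] = ("bounce", new_leaf)


def send(fabric, src, dst):
    """Forward one frame and return the list of leaves it crosses (simplified)."""
    src_leaf = fabric["coop"][src]
    path = [src_leaf]
    kind, where = fabric[src_leaf].get(dst, ("miss", None))
    if kind == "local":
        return path
    # A miss goes to the spine proxy, which resolves the owner from coop.
    leaf = fabric["coop"][dst] if kind == "miss" else where
    while True:
        path.append(leaf)
        kind, where = fabric[leaf].get(dst, ("miss", None))
        if kind == "local":
            fabric[leaf][src] = ("xr", src_leaf)  # remote learn of the sender
            return path
        if kind != "bounce":
            path.append("drop")  # assumption: no bounce entry means the frame is dropped
            return path
        leaf = where  # bounce entry: re-forward toward the new leaf


fabric = new_fabric()
move(fabric, "A", "leaf103", "eth1/1")
first = send(fabric, "B", "A")   # bounced through leaf101
reply = send(fabric, "A", "B")   # leaf104 relearns A behind leaf103
second = send(fabric, "B", "A")  # now goes straight to leaf103
```

After the reply from host A, leaf104 points directly at leaf103, so the bounce entry on leaf101 is no longer on the forwarding path.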

Aging
Example entry A when the timer is reset: Time-left 15 second -> 900 second; Reset-count 224 -> 225; Hit Yes -> No
• Hardware maintains a hit-bit for each entry, which is set whenever a frame is received from the corresponding source address
• If a packet is not seen within the timeout, then the entry is aged and removed from hardware
• Else if the leaf receives a frame and the hit-bit is set, then software resets the timer and the hit bit and the entry is not aged out
• For local IP endpoints, at 75% of the endpoint timer, host tracking sends 3x ARP/ND to verify that the endpoint is still present
• ARP/ND reply resets the timer for both IP and MAC
• No regular ARP/ND required if traffic is regularly received!
• Support for silent hosts to verify IP is still present
• No response and the endpoint will eventually age out

VPC Aging
(A: vpc host, B: orphan host)
First vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached
    B     No   peer-attached
Second vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached
    B     No   local
• For vpc, both leaves in the vpc domain have to age out the entry before it is removed. This applies to remote and local entries
• For orphan ports, as soon as the local leaf ages it out it is deleted from both switches.

VPC Aging (vpc host)
1 When vpc endpoint is aged, set local-aged flag and send update to peer
2 Peer-aged flag set indicating that peer has aged the entry. Will be deleted once local leaf ages it out as well.
First vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached, peer-aged
    B     No   peer-attached
Second vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached, local-aged
    B     No   local

VPC Aging (vpc host)
3 Endpoint is locally-aged, send update to peer. Since both local-aged and peer-aged are set, delete entry
4 Receive peer-aged from peer.
Since both local-aged and peer-aged are set, delete entry
First vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached, peer-aged, local-aged
    B     No   peer-attached
Second vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached, local-aged, peer-aged
    B     No   local

VPC Aging (orphan host)
1 When orphan port is locally-aged, simply delete and send update to peer
2 Orphan port deleted as soon as peer ages it out
First vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached
    B     No   peer-attached
Second vPC peer
    Addr  Hit  Flags
    A     No   local, vpc-attached
    B     No   local, local-aged

Configuration Options
Nerd Knobs

Timers – Endpoint Retention Policy
    Timer    Default    Applied at BD    Applied at VRF
    Local    900 sec    Mac and IP       -
    Bounce   630 sec    Mac              IP
    Remote   300 sec    Mac              IP
    Move     256/sec    -                -
    Hold     300 sec    -                -
XR MACs are always learned at BD level
XR IPs are always learned at VRF level
• If moves/sec exceed rate then learning is disabled on BD for the hold time as a protection mechanism for software components (epm/epmc/coop)

Timers – Endpoint Retention Policy
Custom Aging Timers at BD level

Timers – Endpoint Retention Policy
Custom Aging Timers at VRF level

Issue #1
Switch independent NIC team and load/spreading (Misconfigured Host)
Host: eth2-1 mac: A, eth2-2 mac: B, IP: C
1 ARP on eth2-1 with mac A, IP C
2 Source traffic for flow-X from B
3 Source traffic for flow-Y from A
• Each routed IP frame triggers a new IP learn within the fabric and the endpoint is rapidly moving between mac A and mac B
• Possibly no perceived impact on dataplane traffic, however high CPU on leaf.
If NIC is between two leaves, then may see coop process high on spine as well.

Issue #1
Fix – Enable Rogue Endpoint Detection
Available in 3.2(1)
System -> System Settings -> Endpoint Controls -> Rogue EP Control
• An endpoint is marked as Rogue if it moves over the multiplication factor within the detection interval.
• Endpoint is programmed as static to prevent new local learns and DL bit is set for all frames to prevent XR updates.
• Fault raised for endpoints detected as rogue.
Note, this is not a fix but allows operators an opportunity to protect their fabric and get notified of misconfigured hosts.

Issue #1
Fix – Enable Rogue Endpoint Detection
Example Fault
• Fault is raised under the node and can also be seen under System faults.

Issue #1
Fix – Enable Rogue Endpoint Detection
Check EPM flag on leaf
    fab4-leaf101# show system internal epm endpoint ip 10.1.1.101

    MAC : 0000.0000.000a ::: Num IPs : 1
    IP# 0 : 10.1.1.101 ::: IP# 0 flags : rogue|
    Vlan id : 3028 ::: Vlan vnid : 8292 ::: VRF name : ag:v1
    BD vnid : 15958069 ::: VRF vnid : 2555909
    Phy If : 0x16000002 ::: Tunnel If : 0
    Interface : port-channel3
    Flags : 0x80080c05 ::: sclass : 10932 ::: Ref count : 5
    EP Create Timestamp : 12/31/1969 19:00:00.000000
    EP Update Timestamp : 05/13/2019 19:58:26.310178
    EP Flags : local|vPC|IP|MAC|sclass|rogue|
    ::::

Issue #1
What about EP Loop Protection?
Not RECOMMENDED
• Action is potentially disruptive to other stable endpoints.
• BD Learn disable prevents new learns on the entire BD
• Port disable may impact a critical port such as fabric interconnect or DCI link. No mechanism to prioritize a host port.
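
The rogue endpoint rule from the Rogue EP Control slide (an endpoint is marked rogue when it moves more than the multiplication factor within the detection interval) can be sketched as a sliding-window counter. The class name `RogueDetector` and the values 60 seconds and 4 are illustrative assumptions, not settings taken from the slides.

```python
from collections import deque


class RogueDetector:
    """Sliding-window move counter; a hypothetical model of rogue EP detection."""

    def __init__(self, detect_interval=60.0, multiplication_factor=4):
        # Both defaults are placeholder values chosen for the example.
        self.detect_interval = detect_interval
        self.multiplication_factor = multiplication_factor
        self.moves = {}     # endpoint -> deque of move timestamps (seconds)
        self.rogue = set()  # endpoints currently flagged as rogue

    def record_move(self, endpoint, now):
        """Record one move at time `now`; return True if the endpoint is rogue."""
        window = self.moves.setdefault(endpoint, deque())
        window.append(now)
        # Discard moves that fell out of the detection interval.
        while now - window[0] > self.detect_interval:
            window.popleft()
        if len(window) > self.multiplication_factor:
            self.rogue.add(endpoint)
        return endpoint in self.rogue


detector = RogueDetector()
# IP C flapping between mac A and mac B once per second, as in Issue #1.
flapping = [detector.record_move("IP C", t) for t in range(5)]
# A host that moves once every 30 seconds stays below the threshold.
slow = [detector.record_move("IP D", t) for t in (0, 30, 60, 90, 120)]
```

In this model the flag is sticky once set; how long a real fabric holds the rogue state is outside what the slides describe.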

Issue #2
Old IP never times out after new IP is assigned to host
    fab4-leaf101# show endpoint ip 10.1.1.101
    Legend:
     s - arp              H - vtep             V - vpc-attached     p - peer-aged
     R - peer-attached-rl B - bounce           S - static           M - span
     D - bounce-to-proxy  O - peer-attached    a - local-aged       L - local
    +-----------------------------------+---------------+-----------------+--------------+-------------+
          VLAN/                           Encap           MAC Address       MAC Info/       Interface
          Domain                          VLAN            IP Address        IP Info
    +-----------------------------------+---------------+-----------------+--------------+-------------+
    3028                                       vlan-101    0000.0000.000a LV                        po3
    ag:v1                                      vlan-101      169.254.8.62 LV                        po3
    ag:v1                                      vlan-101        10.1.1.101 LV                        po3

Issue #2
Fix: Enable IP Aging Policy
Available in 2.1(1)
System -> System Settings -> Endpoint Controls -> IP Aging
• For aging, an endpoint is a MAC with one or more IP addresses. If the MAC is active then all IPs learned on the MAC will remain active.
• IP Aging policy performs aging on each IP individually

Issue #3
Misconfigured host/L4-L7 service triggers unexpected learn
Initial Working State
[Diagram: hosts A and B behind the Service Leaf (SL); host C behind the Border Leaf (BL); L3Out with IP X on the border leaf]
Border Leaf (BL)
    Addr  Interface  Detail
    A     tun1       XR -> Service Leaf
    B     tun1       XR -> Service Leaf
    C     eth1/1     local learn
Service Leaf (SL)
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     eth1/2     local learn
    C     tun6       XR -> Border Leaf
IP X represents a prefix that is learned on the L3Out. During stable state, the service leaf would have an LPM route pointing to the border leaf for this prefix.

Issue #3
Misconfigured host/L4-L7 service triggers unexpected learn
1 Host-A sends pkt with source-IP X (SIP-X, DIP-C)
2 Triggers a learn on service leaf
3 Triggers a learn on border leaf
Border Leaf (BL)
    Addr  Interface  Detail
    A     tun1       XR -> Service Leaf
    B     tun1       XR -> Service Leaf
    C     eth1/1     local learn
    X     tun1       XR -> Service Leaf (new)
Service Leaf (SL)
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     eth1/2     local learn
    C     tun6       XR -> Border Leaf
    X     eth1/1     local learn (new)

Issue #3
Misconfigured host/L4-L7 service triggers unexpected learn
1 Host-C sends pkt to destination IP X (SIP-C, DIP-X)
2 BL has learned IP X toward SL
3 Packet incorrectly sent to SL instead of L3Out
Same problem if Host-B tries to send packet to IP X. All connectivity to this IP is broken.
Border Leaf (BL)
    Addr  Interface  Detail
    A     tun1       XR -> Service Leaf
    B     tun1       XR -> Service Leaf
    C     eth1/1     local learn
    X     tun1       XR -> Service Leaf
Service Leaf (SL)
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     eth1/2     local learn
    C     tun6       XR -> Border Leaf
    X     eth1/1     local learn

Issue #3
Fix: Limit IP Learning to Subnet
Available in 1.1(1)
Tenant -> Networking -> Bridge Domain
• Default setting for new BDs created in 2.3(1e) and 3.0(1k) and above.
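
The local check behind Limit IP Learning to Subnet can be approximated with Python's standard `ipaddress` module: a source IP is only learned when it falls inside one of the BD's subnets. The function name `should_learn_ip` is hypothetical, the BD subnet uses the gateway/prefix form seen in the deck's CLI output, and 192.0.2.10 is a documentation address standing in for the off-subnet IP X.

```python
import ipaddress


def should_learn_ip(src_ip, bd_subnets):
    """Return True if src_ip belongs to at least one subnet configured on the BD."""
    ip = ipaddress.ip_address(src_ip)
    # strict=False accepts gateway-style entries such as 10.1.1.1/24.
    return any(ip in ipaddress.ip_network(subnet, strict=False)
               for subnet in bd_subnets)


bd_subnets = ["10.1.1.1/24"]
on_subnet = should_learn_ip("10.1.1.101", bd_subnets)   # normal host in the BD
off_subnet = should_learn_ip("192.0.2.10", bd_subnets)  # stand-in for source IP X
```

This only models the decision on the leaf where the host is attached, since that is where the BD subnets are known.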

Issue #3
Fix: Limit IP Learning to Subnet (Partial Fix)
Host-A sends pkt with source-IP X (SIP-X, DIP-C)
1 Local off-subnet learn is ignored
2 Packet is still forwarded to BL
3 Triggers a learn on border leaf
Border Leaf (BL)
    Addr  Interface  Detail
    A     tun1       XR -> Service Leaf
    B     tun1       XR -> Service Leaf
    C     eth1/1     local learn
    X     tun1       XR -> Service Leaf
Service Leaf (SL)
    Addr  Interface  Detail
    A     eth1/1     local learn
    B     eth1/2     local learn
    C     tun6       XR -> Border Leaf
Limit IP Learning to Subnet prevents the off-subnet learn on the local leaf, but the border leaf cannot apply off-subnet logic to the XR frame because BD information is not present in the packet; only the VRF VNID is carried in the iVXLAN header.

Issue #3
Fix: Enforce Subnet Check
Available in 2.2(2) and 3.0(2)
System -> System Settings -> Fabric Wide Settings

Issue #3
Fix: Enforce Subnet Check
Available in 2.2(2) and 3.0(2)
Host-A sends pkt with source-IP X (SIP-X, DIP-C)
1 Local off-subnet learn is ignored
2 XR off-subnet for all BDs in VRF is ignored
• This feature is available only for Gen2 switches and above
• This implicitly enables the local subnet check whether or not it is enabled on the BD (i.e., Limit IP Learning to Subnet on the BD is no longer required).
• For remote learns, the IP is only learned if the IP belongs to at least one BD in the VRF.

Issue #4
Stale Endpoint on Border Leaf
Traffic from L3Out destined to Host-A is bounced through leaf101
Leaf101
    Addr  Interface      Detail
    A     tun3, bounce   XR -> leaf103 with bounce bit set
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn
Border Leaf
    Addr  Interface  Detail
    A     tun1       XR -> leaf101 TEP
• In initial state, Host-A has triggered an XR learn on the border leaf.
Let's assume in this example that Host-A was communicating with Host-B.
• Host-A then moves to leaf103. It no longer sends any frames to Host-B but continues sending frames out the L3Out toward the border leaf.
• Leaf101 maintains a bounce entry for Host-A

Issue #4
Stale Endpoint on Border Leaf
Leaf101 (eventually the bounce entry times out)
    Addr  Interface      Detail
    A     tun3, bounce   XR -> leaf103 with bounce bit set
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn
Border Leaf (HIT bit set, but move ignored due to DL bit)
    Addr  Interface  Detail              Hit
    A     tun1       XR -> leaf101 TEP   No -> Yes
• Leaf103 is a Gen1 leaf and the VRF is in ingress enforcement. Due to a hardware restriction on Gen1, traffic sent to the L3Out has the DL (don't-learn) bit set in the iVXLAN header.
• When the border leaf receives the frame, it updates the aging hit bit but does not update the learn entry since the DL bit is set.
• Eventually, the bounce entry on leaf101 will time out, but the border leaf will still have an XR entry pointing to leaf-101. Any traffic destined to Host-A will be dropped.

Issue #4
Stale Endpoint on Border Leaf
Traffic from L3Out toward Host-A is sent to leaf-101. Leaf-101 drops the packet.
Leaf101
    Addr  Interface  Detail
    A     -          Bounce entry timed-out
Leaf 103
    Addr  Interface  Detail
    A     eth1/1     local learn
Border Leaf
    Addr  Interface  Detail              Hit
    A     tun1       XR -> leaf101 TEP   Yes
Entry on BL is now stale. It points to leaf-101, which is not where Host-A exists.
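
The stale-entry behavior described for Issue #4 comes down to two separate updates on the border leaf: the aging hit bit is always refreshed, while the entry's location is only updated when the DL bit is clear. A minimal sketch, assuming a hypothetical `XrEntry` record and `receive` function rather than any real switch data structure:

```python
from dataclasses import dataclass


@dataclass
class XrEntry:
    tep: str           # leaf the remote (XR) entry currently points to
    hit: bool = False  # aging hit bit


def receive(entry, src_tep, dl_bit):
    """Apply one received fabric frame to a remote endpoint entry."""
    entry.hit = True         # any frame from the source keeps the entry from aging
    if not dl_bit:
        entry.tep = src_tep  # dataplane learning follows the endpoint move
    return entry


# Border leaf entry for Host-A, learned before the move to leaf103.
host_a = XrEntry(tep="leaf101")
receive(host_a, src_tep="leaf103", dl_bit=True)  # Gen1 leaf103 sends with DL set
stale_tep, stale_hit = host_a.tep, host_a.hit

receive(host_a, src_tep="leaf103", dl_bit=False)  # same frame without the DL bit
fixed_tep = host_a.tep
```

With the DL bit set, the entry keeps pointing at leaf101 and never ages out, which is exactly the stale state shown on the slide.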

Issue #4
Fix: Disable Remote Endpoint Learning on Border Leaf
Available in 2.2(2) and 3.0(1)
System -> System Settings -> Fabric Wide Settings
• No XR IP learning on Border Leaf
• L3Out deployed with VRF in ingress policy enforcement mode
• Prevents stale endpoint caused by Gen1 sending traffic to L3Out with DL bit set
• Note, routed multicast will still trigger an XR IP learn on Border Leaf with Gen2 switches

Stale Endpoint Software Fix
Feature: EP Announce on Bounce Delete
Leaf101
    Addr  Interface      Detail
    A     tun3, bounce   XR -> leaf103 with bounce bit set
Border Leaf
    Addr  Interface  Detail              Hit
    A     tun1       XR -> leaf101 TEP   Yes
• Let's consider the same scenario as Issue #4. Host-A moved from leaf101 to leaf103, a bounce entry for Host-A is present on leaf101, and some flow is resetting the XR hit-bit on the border leaf toward leaf101

Stale Endpoint Software Fix
Feature: EP Announce on Bounce Delete
Bounce timer expires on leaf101, which sends an EP Announce Delete. This triggers an XR delete on any leaf still pointing to leaf101.
Leaf101
    Addr  Interface  Detail
    A     -          Bounce entry timed-out (was tun3, XR -> leaf103)
Border Leaf
    Addr  Interface  Detail
    A     -          Deleted by announce (was tun1, XR -> leaf101 TEP)
• Enabled by default in 3.2.2 and above, no configuration required
• Supports Gen1 and Gen2
• Prevents stale endpoint issues
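
The effect of EP Announce on bounce delete can be shown with the same kind of toy endpoint tables: when the bounce entry expires on the old leaf, every leaf that still holds an XR entry pointing at that leaf removes it. The function name `expire_bounce` and the table layout are illustrative assumptions, not the actual announce mechanism.

```python
def expire_bounce(tables, old_leaf, addr):
    """Bounce timer expiry on old_leaf; returns the leaves that cleared a stale XR."""
    entry = tables[old_leaf].get(addr)
    if entry is None or entry[0] != "bounce":
        return []
    del tables[old_leaf][addr]  # bounce entry timed out
    cleared = []
    for leaf, table in tables.items():
        # The announce removes XR entries that still point at the old leaf.
        if table.get(addr) == ("xr", old_leaf):
            del table[addr]
            cleared.append(leaf)
    return cleared


tables = {
    "leaf101": {"A": ("bounce", "leaf103")},
    "leaf103": {"A": ("local", "eth1/1")},
    "border": {"A": ("xr", "leaf101")},  # stale entry kept alive by the hit bit
}
cleared = expire_bounce(tables, "leaf101", "A")
```

Once the stale entry is gone, the next frame for Host-A on the border leaf is a miss and can be resolved through the spine proxy instead of being sent to leaf101.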

Issue #5
I have no control over the devices connected to the network…
• Some environments must support VMs with multiple NICs that perform their own routing OR allow users to spin up their own virtual routers, load-balancers, or firewalls
• There are supported design recommendations to address each scenario, however it is too difficult or not possible to address each in the current network
• Can we just do traditional IP learning?
[Diagram labels: users, servers, routing through their own virtual firewalls, IP load-sharing, virtual routers, dynamic load balancers]

Issue #5
Fix: Disable IP Dataplane Learning on the VRF
Available in 4.0(1)
Tenant -> Networking -> VRFs (setting: IP Dataplane learning)
• Local MAC learning still occurs via dataplane
• Remote MAC learning still occurs via dataplane for Gen2
• BD L2 hardware proxy is required to support Gen1 since remote MAC learning will not occur
• Local IPs are only learned via ARP/ND control plane
• Remote IPs are not learned from unicast
• Remote IPs are still learned from routed multicast packets

Issue #5
What about Disable IP Dataplane Learning on the BD?
Tenant -> Networking -> Bridge Domains
Not recommended to disable
• Disabling IP Dataplane learning on the BD is only tested/supported for service graph BDs with PBR
• In 3.1 and above with Gen2, this feature is auto-enabled on the PBR node EPG, so disabling on BD is not required with PBR

Endpoint Control Best Practices
• Run 3.2 or above to take advantage of EP Announce Delete
• Per BD, enable Limit IP Learning to Subnet
• Enable Global IP Aging
• Enable Global Enforce Subnet Check (not applicable for Gen1)
• If Gen1 leaf present, enable Disable Remote EP Learn on Border Leaf

Endpoint Learning Troubleshooting Tips

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Host A: 10.1.1.101, 0000.0000.000A, EPG: e1, BD: bd1, VRF: v1
• Start with the basics:
  Verify EPG/BD/VRF basic config
  What leaf/port is the host connected to?
  Is the vlan-encap deployed to the leaf?
  Is the port a member of the vlan?
  Is the SVI present with gateway config?
  Is the endpoint learned?
If we were learning the endpoint in the fabric, we could quickly tell which leaf/port it was connected to and, most likely, it would be able to ping its gateway…

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Is the endpoint learned?
Skip to the last step first, since it can validate all other steps
Check EP Tracker in APIC UI
Check for endpoint on APIC CLI
    fab4-apic1# show endpoint ip 10.1.1.101
    Legends:
    (P):Primary VLAN
    (S):Secondary VLAN

    Total Dynamic Endpoints: 0
    Total Static Endpoints: 0

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Validate static path attachment and encap. In this example, vpc on node101/102 and VLAN encap 101

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Ensure the BD is associated to the EPG. Also (not shown), ensure the BD is associated to the VRF.
Ensure there are no faults for the EPG that might have stopped deployment to your leaf.
Network faults may require you to verify your access policy configuration (AEP, phy domain, vlan pool, switch/interface selectors)

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Is the vlan-encap deployed?
Is the port a member of the vlan?
Port-channel ag_po1001 with id Po3 and member interface Eth1/3
    fab4-leaf101# show port-channel extended | egrep ag_po1001
    3     Po3(SU)     ag_po1001     LACP     Eth1/3(P)
Get the PI vlan for the encap (FD) and the BD vlans
    fab4-leaf101# vsh_lc -c 'show system internal eltmc info vlan access_encap_vlan 101' | egrep "vlan_id"
               vlan_id:   3028   :::   hw_vlan_id:   3009
               vlan_id:   3028   :::   isEpg:   1
            bd_vlan_id:   3027   :::   hwEpgId:   12766
Verify my interface is forwarding for both EPG and BD vlans
    fab4-leaf101# show vlan id 3028 extended
     VLAN Name                             Encap            Ports
     ---- -------------------------------- ---------------- ------------------------
     3028 ag:app:e1                        vlan-101         Eth1/3, Eth1/4, Eth1/6, Po3, Po4
    fab4-leaf101# show vlan id 3027 extended
     VLAN Name                             Encap            Ports
     ---- -------------------------------- ---------------- ------------------------
     3027 ag:bd1                           vxlan-15958069   Eth1/3, Eth1/4, Eth1/6, Po3, Po4

Packet Walk Checklist
Problem: Host-A cannot ping the gateway
Is the SVI present with gateway config?
Is the endpoint learned?
fab4-leaf101# show ip interface vlan 3027 IP Interface Status for VRF "ag:v1" vlan3027, Interface status: protocol-up/link-up/admin-up, iod: 1028, mode: pervasive IP address: 10.1.1.1, IP subnet: 10.1.1.0/24 IP broadcast address: 255.255.255.255 IP primary address route-preference: 1, tag: 0 fab4-leaf101# show system internal epm endpoint ip 10.1.1.101 <none> Remember, vlan-3027 is the vlan for bd1 Queries EPM state directly (fast) fab4-leaf101# show endpoint ip 10.1.1.101 Legend: Same command used on s - arp H - vtep V - vpc-attached p - peer-aged R - peer-attached-rl B - bounce S - static M - span APIC, queries epm MIT state D - bounce-to-proxy O - peer-attached a - local-aged L - local +-----------------------------------+---------------+-----------------+--------------+-------------+ VLAN/ Encap MAC Address MAC Info/ Interface Domain VLAN IP Address IP Info +-----------------------------------+---------------+-----------------+--------------+-------------+ <none> #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 65 Packet Walk Checklist Is the endpoint learned? Problem: Host-A cannot ping the gateway Is the correct subnet pushed? Is learning enabled? fab4-leaf101# show system internal epm vlan 3027 detail | egrep "Learn|fwd_mode|BD Subnet" Valid : Yes ::: Incomplete : No ::: Learn Enable : Yes fwd_mode : route,bridge ::: fwd_ctrl : mdst-flood,ip-lrn-pfx-check, BD Subnet ip_pfx-1 : 10.1.1.1/24 fab4-leaf101# vsh_lc -c 'show system internal epmc vlan 3027 detail' | egrep "Learn|fwd_mode|BD Subnet" fwd_mode : route,bridge ::: fwd_ctrl : mdst-flood,ip-lrn-pfx-check, ::: bridge_mode: mac ::: unk_mac_ucast: proxy Learning disabled :no BD Subnet ip_pfx-1 : 10.1.1.1/24 Both epm (sup component) and epmc (LC Gen2 only, ensure that learning is globally enabled in Hal component) have routing enabled on the BD and learning is enabled. 
Also BD subnet list contains our prefix fab4-leaf101# vsh_lc -c 'show system internal epmc global-info' | egrep "Hal Learn" Hal Learn Disabled : No fab4-leaf101# vsh_lc -c 'show platform internal hal learn learn' | egrep status status : Enabled status_reason : None #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 66 Packet Walk Checklist Is the endpoint learned? Is the correct subnet pushed? Problem: Host-A cannot ping the gateway Is learning enabled? • Under what conditions do we expect learning to be disabled? Endpoint Retention Policy Remember, if moves per second exceed BD configured policy, learning will temporarily be disabled! Timer Default Applied at BD Applied at VRF Local 900 sec Mac and IP - Bounce 630 sec Mac IP Remote 300 sec Mac IP Move 256/sec - - Hold 300 sec - - #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 67 Packet Walk Checklist Is the endpoint learned? Problem: Host-A cannot ping the gateway Are we receiving the frame? What tools do we have to help? SPAN, ELAM (ELAM-Assistant App) fab4-leaf101# show endpoint mac 0000.0000.000a Legend: We did learn the MAC, but in the s - arp H - vtep V - vpc-attached p - peer-aged R - peer-attached-rl B - bounce S - static M span wrong vlan. Misconfigured host D - bounce-to-proxy O - peer-attached a - local-aged L - local +-----------------------------------+---------------+-----------------+--------------+-------------+ VLAN/ Encap MAC Address MAC Info/ Interface Domain VLAN IP Address IP Info +-----------------------------------+---------------+-----------------+--------------+-------------+ 291/ag:v1 vlan-102 0000.0000.000a LV po3 • We got lucky that the vlan-encap the host was sending in was configured on the leaf, else the frame would have been dropped and no MAC learn triggered Limit IP Learning to Subnet enabled by • Why wasn’t the IP learned? 
default, vlan-102 in a different BD or unicast routing disabled on that BD #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 68 Packet Walk Checklist Is the endpoint learned? Fixed: Host-A can ping the gateway Fixed the host config and now we’re learning the IP! fab4-leaf101# show endpoint ip 10.1.1.101 Legend: s - arp H - vtep V - vpc-attached p - peer-aged R - peer-attached-rl B - bounce S - static M - span D - bounce-to-proxy O - peer-attached a - local-aged L - local +-----------------------------------+---------------+-----------------+--------------+-------------+ VLAN/ Encap MAC Address MAC Info/ Interface Domain VLAN IP Address IP Info +-----------------------------------+---------------+-----------------+--------------+-------------+ 3028 vlan-101 0000.0000.000a LV po3 ag:v1 vlan-101 10.1.1.101 LV po3 fab4-leaf101# show system internal epm endpoint ip 10.1.1.101 MAC : 0000.0000.000a ::: Num IPs : 1 IP# 0 : 10.1.1.101 ::: IP# 0 flags : Vlan id : 3028 ::: Vlan vnid : 8292 ::: VRF name : ag:v1 BD vnid : 15958069 ::: VRF vnid : 2555909 Phy If : 0x16000002 ::: Tunnel If : 0 Interface : port-channel3 Flags : 0x80000c05 ::: sclass : 10932 ::: Ref count : 5 EP Create Timestamp : 05/17/2019 02:14:09.965041 EP Update Timestamp : 05/17/2019 02:14:09.965041 EP Flags : local|vPC|IP|MAC|sclass| :::: #CLUS Remember that epm/epmc treat an endpoint as a MAC with one or more IPs, so MAC is also displayed for local IP endpoints BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 69 Packet Walk Checklist Does coop have the endpoint? 
Fixed: Host-A can ping the gateway Bonus validation fab4-spine201# show coop internal info ip-db key 2555909 10.1.1.101 IP address : 10.1.1.101 Verify endpoint in coop using Vrf : 2555909 Flags : 0 VRF vnid and IP address EP bd vnid : 15958069 EP mac : 00:00:00:00:00:0A Publisher Id : 10.0.128.93 Mac and BD VNID Record timestamp : 06 09 2019 13:32:53 827717825 Publish timestamp : 06 09 2019 13:32:53 828777370 Seq No: 0 Remote publish timestamp: 12 31 1969 19:00:00 0 URIB Tunnel Info Num tunnels : 1 Tunnel address : 10.0.128.95 pTEP/vTEP/eTEP of leaf/pod/site Tunnel ref count : 1:::: • Endpoint must be in coop in order for proxy lookups to work. This is critical for XR miss for both intra/inter-pod and intra/inter-site. You should see the same state on all spines. #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 70 Packet Walk Checklist Does coop have the endpoint? Fixed: Host-A can ping the gateway Bonus validation fab4-spine201# show coop internal info repo ep key 15958069 00:00:00:00:00:0A | egrep "^Vrf|^Tunnel nh|^EP|num of active|^Real" EP bd vnid : 15958069 EP mac : 00:00:00:00:00:0A Verify endpoint is in coop using Vrf vnid : 2555909 Tunnel next-hop BD VNID and mac address Tunnel nh : 10.0.128.95 num of active ipv4 addresses : 4 num of active ipv6 addresses : 1 Real IPv4 EP : 10.1.1.101 IPv4/IPv6 addressed Real IPv4 EP : 10.1.1.102 tied to this MAC Real IPv4 EP : 10.1.1.103 Real IPv4 EP : 10.1.1.104 Real IPv6 EP : 2001:0000:0000:0000:0000:0000:0000:0065 #CLUS BRKACI-2641 © 2019 Cisco and/or its affiliates. All rights reserved. 
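The `show system internal epm endpoint ip` output shown earlier uses a `key : value ::: key : value` layout, which is easy to misread by eye when comparing several leaves. The following is a minimal Python sketch, assuming the output format matches the capture in this deck exactly; the function names `parse_epm_endpoint` and `ep_flags` are hypothetical helpers, not part of any Cisco tool.

```python
# Sample text copied from the "show system internal epm endpoint ip" capture in this deck.
SAMPLE = """\
MAC : 0000.0000.000a ::: Num IPs : 1
IP# 0 : 10.1.1.101 ::: IP# 0 flags :
Vlan id : 3028 ::: Vlan vnid : 8292 ::: VRF name : ag:v1
BD vnid : 15958069 ::: VRF vnid : 2555909
Phy If : 0x16000002 ::: Tunnel If : 0
Interface : port-channel3
Flags : 0x80000c05 ::: sclass : 10932 ::: Ref count : 5
EP Create Timestamp : 05/17/2019 02:14:09.965041
EP Update Timestamp : 05/17/2019 02:14:09.965041
EP Flags : local|vPC|IP|MAC|sclass|
::::
"""


def parse_epm_endpoint(text):
    """Split each line on ':::' and each chunk on its first ':' (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        for chunk in line.split(":::"):
            key, sep, value = chunk.partition(":")
            key = key.strip()
            if sep and key:  # skip the trailing '::::' separator and blank chunks
                fields[key] = value.strip()
    return fields


def ep_flags(fields):
    """Return the EP Flags value as a list, dropping the empty trailing item."""
    return [flag for flag in fields.get("EP Flags", "").split("|") if flag]


fields = parse_epm_endpoint(SAMPLE)
print(fields["MAC"], fields["sclass"], fields["VRF vnid"], ep_flags(fields))
```

Splitting on the first colon only keeps values such as `ag:v1` and the timestamps intact. The sketch keeps every value as a string, since fields like `Flags` are hexadecimal and others are plain integers.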
Endpoint Learning Troubleshooting Review
• Verify logical config (EPG/BD/VRF and contracts)
• Verify no network faults under the EPG that would prevent the encap from being deployed
• Verify that the leaf has the encap deployed
• Verify that the port is a member of the vlan
• Verify that the SVI is present on the leaf with the proper subnets
• Verify that local leaf is learning the endpoint
• Verify learning is enabled on the BD
• Verify software components have the correct BD prefixes programmed
• Verify the leaf is receiving the frame on expected interface and encapsulation
• Verify that endpoint is present in coop and coop has correct tunnel address

Recommended Troubleshooting Apps
https://aciappcenter.cisco.com/

ELAM Assistant
The ELAM Assistant performs ELAM to capture a packet and decode the result. ELAM is a built-in tool that captures a single packet at the ASIC level to check forwarding decision details. It is typically used by Cisco TAC as it requires a deep knowledge of each ACI ASIC to both perform and correctly understand the resulting output. This app wraps the differences between each ACI ASIC and provides a UI to perform an ELAM capture for those who don't have access to ASIC level information. It then decodes the results of the ELAM capture in a user-friendly format.

EnhancedEndpointTracker
The EnhancedEndpointTracker is a Cisco ACI application that maintains a database of endpoint events on a per-node basis, allowing for unique fabric-wide analysis. The application can be configured to analyze, notify, and automatically remediate various endpoint events. This gives ACI fabric operators better visibility and control over the endpoints in the fabric.
Enhanced Endpoint Tracker
• Active endpoint count and fast search
• Start/Stop the monitor
• Uptime of the monitor and number of queued events to process
• Health/history of the monitor itself

Enhanced Endpoint Tracker
• Fast search for IP or MAC
• ~150ms for search to complete

Enhanced Endpoint Tracker
• Historical tables to browse various events along with browsing all endpoints in the fabric
• Top moves in the fabric, quickly see any unstable/misconfigured endpoints

Enhanced Endpoint Tracker
• Full details of current state of endpoint within the fabric including local and XR learns
• Also per-node detailed history, move events, and rapid/offsubnet/stale/clear events
• History of where endpoint was learned or if it was deleted from the fabric

Enhanced Endpoint Tracker
• Clear problem endpoints on multiple nodes quickly

Complete your online session evaluation
• Please complete your session survey after each session. Your feedback is very important.
• Complete a minimum of 4 session surveys and the Overall Conference survey (starting on Thursday) to receive your Cisco Live water bottle.
• All surveys can be taken in the Cisco Live Mobile App or by logging in to the Session Catalog on ciscolive.cisco.com/us.
Cisco Live sessions will be available for viewing on demand after the event at ciscolive.cisco.com.
Continue your education
• Demos in the Cisco campus
• Walk-in labs
• Meet the engineer 1:1 meetings
• Related sessions

Thank you

Appendix

Packet Walk Checklist
Problem: Host-A cannot ping Host-B
Topology: leaf101, leaf102, leaf103, VRF: v1
Host-A: 10.1.1.101, 0000.0000.000A, EPG: e1, BD: bd1
Host-B: 10.1.2.102, 0000.0000.000B, EPG: e2, BD: bd2
• Is this frame bridged or routed? Subtle but important. If bridged then we need to check MAC endpoints, if routed we need to check IP…
• Am I learning Host-A and Host-B IPs in the fabric?
• Do we have a remote learn for Host-B on ingress leaf or are we using proxy-path?
• Do the spines have Host-B entry programmed to handle proxy forwarding?
• For the leaf that is performing policy enforcement, do I have the appropriate contract?

Packet Walk Checklist: Am I learning Host-A and Host-B IPs in the fabric?
Problem: Host-A cannot ping Host-B
We can check the endpoint directly on the APIC. If not present, then repeat the previous local learn troubleshooting.
fab4-apic1# show endpoint ip 10.1.1.101
<snip>
Dynamic Endpoints:
Tenant : ag
Application : app
AEPg : e1
End Point MAC     IP Address                               Node       Interface
----------------- ---------------------------------------- ---------- ------------------------------
00:00:00:00:00:0A 10.1.1.101                               101 102    vpc ag_po1001
fab4-apic1# show endpoints ip 10.1.2.102
<snip>
Dynamic Endpoints:
Tenant : ag
Application : app
AEPg : e2
End Point MAC     IP Address                               Node       Interface
----------------- ---------------------------------------- ---------- ------------------------------
00:00:00:00:00:0B 10.1.2.102                               103        eth1/5

Packet Walk Checklist: Do we have a remote learn for Host-B on ingress leaf or are we using proxy-path?
Problem: Host-A cannot ping Host-B
fab4-leaf101# show endpoint ip 10.1.2.102
Legend:
s - arp              H - vtep             V - vpc-attached     p - peer-aged
R - peer-attached-rl B - bounce           S - static           M - span
D - bounce-to-proxy  O - peer-attached    a - local-aged       L - local
+-----------------------------------+---------------+-----------------+--------------+-------------+
      VLAN/                           Encap           MAC Address       MAC Info/       Interface
      Domain                          VLAN            IP Address        IP Info
+-----------------------------------+---------------+-----------------+--------------+-------------+
<none>
Leaf-101 (ingress leaf) does not have an XR learn for Host-B.
fab4-leaf101# show ip route 10.1.2.0 vrf ag:v1
IP Route Table for VRF "ag:v1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.1.2.0/24, ubest/mbest: 1/0, attached, direct, pervasive
*via 10.0.208.64%overlay-1, [1/0], 00:24:38, static, tag 4294967295
recursive next hop: 10.0.208.64/32%overlay-1
Ensure that the route has the pervasive flag for 'pervasive BD'.
fab4-leaf101# show isis dteps vrf overlay-1 | grep 10.0.208.64
10.0.208.64 SPINE N/A PHYSICAL,PROXY-ACAST-V4
Next-hop IP is the spine anycast IPv4 proxy.

Packet Walk Checklist: Do the spines have Host-B entry programmed to handle proxy?
Problem: Host-A cannot ping Host-B
First, we need the VNID for the VRF to validate routed flow.
We can get the VRF vnid from the leaf:
fab4-leaf101# moquery -c fvCtxDef -x 'query-target-filter=eq(fvCtxDef.ctxDn,"uni/tn-ag/ctx-v1")'
scope : 2555909
…
fab4-leaf101# vsh_lc -c 'show system internal eltmc info vrf ag:v1' | egrep vnid: | head -1
overlay_index: 0 ::: vnid: 2555909
We can get the VRF vnid from the APIC CLI:
fab4-apic1# moquery -d uni/tn-ag/ctx-v1 | egrep scope
scope : 2555909
We can get the VRF vnid from the APIC UI: Tenant -> Networking -> VRFs
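The VRF vnid lookup above feeds directly into the coop query on the spine. As a rough illustration, the Python sketch below pulls the `scope` value out of moquery text and builds the `show coop internal info ip-db key <vrf-vnid> <ip>` command string used in this deck. The helper names `vrf_vnid_from_moquery` and `coop_ip_db_command` are hypothetical, and the sketch assumes the `scope : <number>` line appears exactly as in the capture above; it only builds a string and does not connect to any device.

```python
import ipaddress
import re

# Single line copied from the moquery capture in this deck.
MOQUERY_SAMPLE = "scope : 2555909\n"


def vrf_vnid_from_moquery(output):
    """Return the VRF vnid (the 'scope' attribute) as an int, or None if absent."""
    match = re.search(r"^\s*scope\s*:\s*(\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


def coop_ip_db_command(vrf_vnid, ip):
    """Build the spine command string; raises ValueError on a malformed IP."""
    address = ipaddress.ip_address(ip)  # validates IPv4 or IPv6 text
    return f"show coop internal info ip-db key {vrf_vnid} {address}"


vnid = vrf_vnid_from_moquery(MOQUERY_SAMPLE)
print(coop_ip_db_command(vnid, "10.1.2.102"))
```

Validating the address before building the command is a small guard against pasting a MAC or a prefix where the ip-db key expects a host IP.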
Packet Walk Checklist: Do the spines have Host-B entry programmed to handle proxy?
Problem: Host-A cannot ping Host-B
fab4-spine201# show coop internal info ip-db key 2555909 10.1.2.102
IP address : 10.1.2.102
Vrf : 2555909
Flags : 0x2
EP bd vnid : 16187409
EP mac : 00:00:00:00:00:0B
Publisher Id : 10.4.0.2
Record timestamp : 12 31 1969 19:00:00 0
Publish timestamp : 12 31 1969 19:00:00 0
Seq No: 0
Remote publish timestamp: 05 17 2019 02:22:08 814730181
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.0.16.94
Tunnel ref count : 1
Spine has the entry in coop (should validate each spine).
The tunnel address can be one of several different types of TEPs:
• Physical TEP within same pod
• VPC TEP within same pod
• Anycast External IP for remote pod or site
• RemoteLeaf PTEP
In this case, this is leaf103 PTEP:
admin@fab4-apic1:~> acidiag fnvread | grep 10.0.16.94
103 1 fab4-leaf103 SAL19069BUY 10.0.16.94/32 leaf

Packet Walk Checklist: For the leaf that is performing policy enforcement, do I have the appropriate contract?
Problem: Host-A cannot ping Host-B
Which leaf applies the contract?
• Ingress leaf applies contract if remote endpoint is known, so packet does not have to be forwarded all the way through the fabric
• Egress leaf applies contract if packet was sent via spine proxy. We will focus on leaf-103
• Border leaf in ingress policy enforcement does not apply contract unless application EPG is deployed locally.
To verify the contract, we need:
• VRF VNID
• Source EPG pcTag (Host-A)
• Destination EPG pcTag (Host-B)

Packet Walk Checklist: For the leaf that is performing policy enforcement, do I have the appropriate contract?
Problem: Host-A cannot ping Host-B
Host-A local EPM entry on leaf101 contains source pcTag:
fab4-leaf101# show system internal epm end ip 10.1.1.101
MAC : 0000.0000.000a ::: Num IPs : 1
IP# 0 : 10.1.1.101 ::: IP# 0 flags :
Vlan id : 3028 ::: Vlan vnid : 8292 ::: VRF name : ag:v1
BD vnid : 15958069 ::: VRF vnid : 2555909
Phy If : 0x16000002 ::: Tunnel If : 0
Interface : port-channel3
Flags : 0x80004c05 ::: sclass : 49155 ::: Ref count : 5
EP Create Timestamp : 05/17/2019 02:14:09.965041
EP Update Timestamp : 05/17/2019 03:46:08.819921
EP Flags : local|vPC|IP|MAC|sclass|timer|
::::
Host-B local EPM entry on leaf103 contains dest pcTag:
fab4-leaf103# show system internal epm endpoint ip 10.1.2.102
MAC : 0000.0000.000b ::: Num IPs : 1
IP# 0 : 10.1.2.102 ::: IP# 0 flags :
Vlan id : 279 ::: Vlan vnid : 8293 ::: VRF name : ag:v1
BD vnid : 16187409 ::: VRF vnid : 2555909
Phy If : 0x1a004000 ::: Tunnel If : 0
Interface : Ethernet1/5
Flags : 0x80004c04 ::: sclass : 16389 ::: Ref count : 5
EP Create Timestamp : 05/17/2019 02:21:47.612351
EP Update Timestamp : 05/17/2019 03:45:01.836174
EP Flags : local|IP|MAC|sclass|timer|
::::

Packet Walk Checklist: For the leaf that is performing policy enforcement, do I have the appropriate contract?
Problem: Host-A cannot ping Host-B
fab4-leaf103# show zoning-rule scope 2555909
Rule ID SrcEPG DstEPG FilterID operSt  Scope   Action
======= ====== ====== ======== ======  =====   ======
4419    0      0      implicit enabled 2555909 deny,log
4420    0      0      implarp  enabled 2555909 permit
4421    0      15     implicit enabled 2555909 deny,log
4535    0      49154  implicit enabled 2555909 permit
contract_parser.py is available since 3.2.2:
fab4-leaf103# contract_parser.py --vrf ag:v1
Key:
[prio:RuleId] [vrf:{str}] action protocol src-epg [src-l4] dst-epg [dst-l4] [flags][contract:{str}] [hit=count]
[16:4535] [vrf:ag:v1] permit any epg:any tn-ag/bd-bd2(49154) [contract:implicit] [hit=0]
[16:4420] [vrf:ag:v1] permit arp epg:any epg:any [contract:implicit] [hit=0]
[21:4419] [vrf:ag:v1] deny,log any epg:any epg:any [contract:implicit] [hit=5157]
[22:4421] [vrf:ag:v1] deny,log any epg:any pfx-0.0.0.0/0(15) [contract:implicit] [hit=0]
fab4-leaf103# show logging ip access-list internal packet-log deny | egrep 10.1.2.102 | head
[ Fri May 17 04:02:02 2019 634490 usecs]: CName: ag:v1(VXLAN: 2555909), VlanType: Unknown, Vlan-Id: 0, SMac: 0x000c0c0c0c0c, DMac:0x000c0c0c0c0c, SIP: 10.1.1.101, DIP: 10.1.2.102, SPort: 0, DPort: 0, Src Intf: Tunnel14, Proto: 1, PktLen: 98
<snip>

Packet Walk Checklist: For the leaf that is performing policy enforcement, do I have the appropriate contract?
Problem: Host-A cannot ping Host-B
In this instance the contract was missing. Add the proper consumer/provider and/or VzAny/preferred group updates to allow communication between the two EPGs.
Traffic from Host-A (pcTag 49155) to Host-B (pcTag 16389):
fab4-leaf101# show zoning-rule scope 2555909 | egrep "Rule|===|16389"
Rule ID SrcEPG DstEPG FilterID operSt  Scope   Action
======= ====== ====== ======== ======  =====   ======
4735    49155  16389  7        enabled 2555909 permit
4700    49155  16389  default  enabled 2555909 permit
4736    16389  49155  default  enabled 2555909 permit
6137    16389  49155  6        enabled 2555909 permit
fab4-leaf101# contract_parser.py --vrf ag:v1 --epg tn-ag/ap-app/epg-e1
Key:
[prio:RuleId] [vrf:{str}] action protocol src-epg [src-l4] dst-epg [dst-l4] [flags][contract:{str}] [hit=count]
[7:6137] [vrf:ag:v1] permit ip tcp tn-ag/ap-app/epg-e2(16389) tn-ag/ap-app/epg-e1(49155) eq 80 [contract:uni/tn-ag/brc-c1] [hit=0]
[7:4735] [vrf:ag:v1] permit ip tcp tn-ag/ap-app/epg-e1(49155) eq 80 tn-ag/ap-app/epg-e2(16389) [contract:uni/tn-ag/brc-c1] [hit=0]
[9:4736] [vrf:ag:v1] permit any tn-ag/ap-app/epg-e2(16389) tn-ag/ap-app/epg-e1(49155) [contract:uni/tn-ag/brc-c1] [hit=0]
[9:4700] [vrf:ag:v1] permit any tn-ag/ap-app/epg-e1(49155) tn-ag/ap-app/epg-e2(16389) [contract:uni/tn-ag/brc-c1] [hit=220,+10]
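The contract check above comes down to asking whether any zoning rule in the VRF scope permits the source pcTag toward the destination pcTag. The Python sketch below shows that lookup against the two `show zoning-rule` captures from this deck. It assumes the seven-column layout shown here (newer releases may print a different table), the helper names `parse_zoning_rules` and `permits` are hypothetical, and it deliberately ignores rule priority and the wildcard pcTag 0 rules, so it is a quick filter rather than a full policy evaluation.

```python
# Rows copied from the leaf103 capture taken while the contract was missing.
BEFORE = """\
Rule ID SrcEPG DstEPG FilterID operSt  Scope   Action
======= ====== ====== ======== ======  =====   ======
4419    0      0      implicit enabled 2555909 deny,log
4420    0      0      implarp  enabled 2555909 permit
4421    0      15     implicit enabled 2555909 deny,log
4535    0      49154  implicit enabled 2555909 permit
"""

# Rows copied from the leaf101 capture taken after the contract was added.
AFTER = """\
Rule ID SrcEPG DstEPG FilterID operSt  Scope   Action
======= ====== ====== ======== ======  =====   ======
4735    49155  16389  7        enabled 2555909 permit
4700    49155  16389  default  enabled 2555909 permit
4736    16389  49155  default  enabled 2555909 permit
6137    16389  49155  6        enabled 2555909 permit
"""


def parse_zoning_rules(text):
    """Parse seven-column rule rows; header and '====' lines are skipped."""
    rules = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) == 7 and cols[0].isdigit():
            rules.append({
                "rule_id": int(cols[0]),
                "src": int(cols[1]),
                "dst": int(cols[2]),
                "filter": cols[3],
                "oper_st": cols[4],
                "scope": int(cols[5]),
                "action": cols[6],
            })
    return rules


def permits(rules, src_pctag, dst_pctag):
    """Return enabled permit rules that match the exact src/dst pcTag pair."""
    return [
        rule for rule in rules
        if rule["src"] == src_pctag
        and rule["dst"] == dst_pctag
        and rule["oper_st"] == "enabled"
        and rule["action"].split(",")[0] == "permit"
    ]


for label, capture in (("before", BEFORE), ("after", AFTER)):
    matched = permits(parse_zoning_rules(capture), 49155, 16389)
    print(label, [rule["rule_id"] for rule in matched])
```

An empty result for the Host-A to Host-B pair, as in the first capture, is consistent with the traffic falling through to the implicit deny rule seen in the packet-log output.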