BFD protocol update
IETF 62
David Ward
mailto:dward@cisco.com

What has changed in the base?
• We have a new, incompatible change in the state machine (more later)
• We added SHA-1 authentication
• We explained how to enable|disable authentication w/o resetting the session
• We added Diags for concatenated links

What has changed in single hop?
• We specified what to do during Graceful Restart
  – In particular what the IGPs are to do
• Stated that you don’t have to use TTL 255 when using auth

YAND
• On the list we discussed hacking single hop vs a new draft
• WG chairs would like a draft describing how to ‘generically’ bootstrap a BFD session vs explicitly stating more protocols in the single hop draft.
• Will become WG item

Concatenated Paths and BFD
• Two diagnostic codes are defined for this purpose:
  – Concatenated Path Down (toward the interworking system)
  – Reverse Concatenated Path Down (away from the interworking system)
• Note that the BFD session is not taken down.
• Note that if the BFD session subsequently fails, the diagnostic code will be overwritten with a code detailing the cause of the failure, so it is up to the interworking agent to perform this procedure again

Security Stuff
• We were forced^H^H^H^H^H^H asked to add SHA-1
• We were told to make sure that we can enable|disable auth w/o dropping the session
  – We removed the requirement that both sides have to have strict drops
  – Although outside the scope of the spec, we give some hints on how to develop it.
  – If it is confusing or really not wanted (it is rather easy to code and interoperate), we are willing to revert, though it seems unnecessary

BFD V1
• Why? What is the problem w/ V0?
• The Daves send many thanks to Richard Spencer!
• BFD as spec'ed has the following problems:
  – The fundamental problem is that BFD has two separate wait states (Init and Failing) and is thus bi-stable, and there is not enough information available (only the IHU bit) to detect this case.

BFD V1 Problem slide 2
• Worse, if the two ends use different timers during session establishment (say, 1 sec on one end and 5 sec on the other), the deadlock scenario is guaranteed to happen repeatedly (unrecoverably).
• If there is a unidirectional failure, the deadlock scenario will happen 50% of the time (depending on who gets their packet across first), with the timer expiring to get one side out of Init state; then the dice roll again.

BFD V1 Solution
• The good news is that by adding a bit in the protocol, we get rid of a state.
  – Makes sense from a global information theory perspective.
• This is much better, simpler, and is demonstrably correct by inspection.
  – It also comes up in fewer packets, and handles the unidirectional failure well (it actually takes one fewer packet to come up in that scenario).

BFD V1 Solution slide 2
• It's the same as every protocol you've seen, with the addition of the loop in the DOWN state as long as the neighbor is in UP state
  – Thus forcing the neighbor to acknowledge the failure before advancing.
  – This is sufficient to ensure both sides see the session failure.
  – ISIS and OSPF have more or less the same state machine, except that they will advance directly from DOWN to UP without having ever sent a packet, thus depriving the neighbor of the knowledge of the flap; we can’t do that

V1 slide 2.1
• The Mandatory Section of a BFD Control packet has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Vers |  Diag   |Sta|P|F|C|A|D|R|  Detect Mult  |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       My Discriminator                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Your Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Desired Min TX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Required Min RX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Required Min Echo RX Interval                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

V1 slide 2.2
• As said before … no more IHU, no more Failing state
• State (Sta) Values are:
    0 -- AdminDown
    1 -- Down
    2 -- Init
    3 -- Up
• Each system communicates its state in the State (Sta) field in the BFD Control packet
  – No more ambiguity

V1 State Machine slide 2.3
• Down state means that the session is down (or has just been created).
  – A session remains in Down state until the remote system indicates that it agrees that the session is down by sending a BFD Control packet with the State field set to anything other than Up.
  – If that packet signals Down state, the session advances to Init state; if that packet signals Init state, the session advances to Up state.

V1 State Machine slide 2.4
• Init state means that the remote system is communicating, and the local system desires to bring the session up, but the remote system does not yet realize it.
  – A session will remain in Init state until either a BFD Control packet signalling Init or Up state is received, or the detection time expires, meaning that communication with the remote system has been lost

V1 State Machine slide 2.5
• Up state means that the BFD session has successfully been established, and implies that connectivity between the systems is working.
  – The session will remain in the Up state until either connectivity fails, or the session is taken down administratively.
  – If either the remote system signals Down state, or the detection time expires, the session advances to Down state

V1 State Machine slide 2.6
• AdminDown state means that the session is being held administratively down.
  – This causes the remote system to enter Down state, and remain there until the local system exits AdminDown state.

BFD V1 Solution slide 3
• There will not be any fancy versioning machinery added to the protocol
• V1 will become the default
• V1 is assumed unless V0 (another version) is heard, then revert
  – V1 will not specify that you have to be backward compatible
  – The protocol is not deployed widely enough to need a versioning requirement

BFD-ISIS interaction (see ISIS WG)
What is the Problem?
• The control plane (ISIS) can run even though there is a forwarding plane failure.
  – The BFD session will dutifully fail in these conditions, but ISIS will come back up anyhow (because it can't differentiate this scenario from having a neighbor that doesn't run BFD).

BFD-ISIS interaction.2 (ISIS WG)
What is the Solution?
• The ISIS router will advertise that BFD is running on an interface in a TLV in the IIH.
• If there is no advertisement, don’t attempt a BFD session w/ that neighbor.
• When receiving an IIH from a neighbor on an interface with BFD enabled, and if the IIH contains the BFD enabled TLV:
  – Then the establishment of a BFD session with that neighbor will be required before allowing the adjacency to the neighbor to reach the UP state.
  – Will require the 3-way handshake on p2p links

Doc status
• New Base spec when embargo lifted
  – Yes, it is actually written already
  – We plan to have a review period and the LC before the next meeting
• New single hop draft w/ more nits picked
  – We will last call after a review period
• New generic bootstrap draft - agree to take on as WG doc
  – We will LC after a review period
• MIB will be updated to reflect changes
  – We will LC after a review period but before Paris
• BFD - LSPping will be LC’ed
• Review periods will be 3 wks and LC will be 3 wks
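
As a rough illustration of the V1 state machine slides (2.3 through 2.6), the transitions can be sketched as a single function. This is only a sketch under stated assumptions: the names `State` and `next_state`, and the use of `None` to model detection-time expiry, are made up here and do not come from the draft. A received AdminDown is treated as forcing Down, following slide 2.6.

```python
from enum import IntEnum
from typing import Optional


class State(IntEnum):
    """Sta field values as listed on slide 2.2."""
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def next_state(local: State, remote: Optional[State]) -> State:
    """Return the new local session state.

    `remote` is the Sta value carried in a received BFD Control packet,
    or None to model detection-time expiry (a convention of this sketch).
    """
    if local is State.ADMIN_DOWN:
        # Held administratively down until the operator releases it.
        return State.ADMIN_DOWN
    if remote is None:
        # Detection time expired: communication with the remote is lost.
        return State.DOWN
    if remote is State.ADMIN_DOWN:
        # Slide 2.6: a remote in AdminDown keeps us in Down.
        return State.DOWN
    if local is State.DOWN:
        if remote is State.DOWN:
            return State.INIT
        if remote is State.INIT:
            return State.UP
        # Remote still says Up: loop in Down until it acknowledges the failure.
        return State.DOWN
    if local is State.INIT:
        # Advance only once the remote signals Init or Up.
        return State.UP if remote in (State.INIT, State.UP) else State.INIT
    # local is UP: only a remote Down takes the session down here.
    return State.DOWN if remote is State.DOWN else State.UP


if __name__ == "__main__":
    a = b = State.DOWN
    b = next_state(b, a)   # B hears Down from A
    a = next_state(a, b)   # A hears Init from B
    b = next_state(b, a)   # B hears Up from A
    print(a.name, b.name)  # prints "UP UP"
```

The `DOWN` branch is where the extra loop from "BFD V1 Solution slide 2" shows up: a side that has gone Down stays there while the neighbor still reports Up, so the neighbor has to see the failure before the session can come back.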