CLEO Project
CLEO consortium
www.cleo-project.org

GIMPS Finite State Machine
Events and States Design Document

Elwyn Davies
v0.01, 25 January 2005

Chapter 1
Introduction

The GIMPS Finite State Machine

The GIMPS protocol

The GIMPS protocol is quite complex and involves some state for each connection which passes through an NSIS-aware node in the process of path discovery and/or interacts with an NSLP on the node. The state is created, modified and deleted as a result of incoming messages and timer events, of which there are a considerable number. Connections can use either unreliable (DMODE) or reliable (CMODE) transmission. For DMODE connections, and for the discovery phase of all connections, messages come from and are sent to UDP sockets. CMODE connections use TCP sockets at present.

The intention is to design a Finite State Machine (FSM) which will control the behaviour of the GIMPS protocol for each connection. The FSMs for all extant connections have to be operated in parallel, with incoming message events directed to the correct machine and timers applied separately to each FSM.

GIMPS is a soft state protocol and operates at least partially over unreliable connections, so a multiplicity of timers is needed to control transmission retries and soft state lifetime. Several timers are needed for each connection.

To minimize the number of network communication end points (sockets) needed, CMODE connections can opt to use a common ‘messaging association’, with one TCP socket per association carrying all NSIS messages between a pair of adjacent NSIS-aware nodes.

The behaviour of GIMPS is slightly different for the NSIS Initiator (the NSIS-aware node nearest to the flow sender), the NSIS Receiver (the NSIS-aware node closest to the flow receiver) and NSIS-aware nodes on the path between the initiator and the receiver.
One challenge for the design of the FSM is to minimise the duplication of effort in programming the different types of FSM and to allow a node to function as initiator, receiver or mid-node for each connection on demand.

A GIMPS node may be an end host with a single interface or a router with a multiplicity of interfaces. Especially in the early stages of deployment, end hosts may not be NSIS enabled, and the first hop router (or site edge router) may be acting as proxy NSIS initiator/receiver for the end host flow sender/receiver. Thus all types of node may be found in a single box, and all will have to be capable of handling messages from and to a multiplicity of sockets. Typically this involves a single UDP socket for DMODE messages that originate or terminate at this node (subject to possible message prioritization needing additional sockets), one RAW/UDP socket to handle intercepted UDP messages with the Router Alert flag set, plus one TCP socket for every messaging association that is currently instantiated in the node. Additionally, the connections to the NSLPs may be handled through a local inter-process connection using IP protocols.

The events driving the state machine come from various sources:

- Incoming GIMPS messages from upstream and downstream nodes
- Timer expiries related to potential retransmission needs or soft state expiry
- Messages generated by the API offered by GIMPS to the NSLP applications
- Notifications from Messaging Associations to the connections using them that the Messaging Association has expired (or been rerouted?)

The following sections detail the states and messages for the state machines. For the initial work, the initiator, intermediate and receiver cases are treated separately, although there is some commonality for established states.
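As an illustration of how events from these four sources might be directed to the correct per-connection machine, here is a minimal Python sketch. It is not part of the design: the class names (`EventSource`, `Event`, `ConnectionFsm`, `Dispatcher`), the use of the SID as the lookup key, and the rule that only an NSLP API request may create a new machine are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EventSource(Enum):
    """The four event sources listed above."""
    GIMPS_MESSAGE = auto()     # incoming message from an upstream or downstream node
    TIMER_EXPIRY = auto()      # retransmission or soft-state timer
    NSLP_API = auto()          # request made by an NSLP through the GIMPS API
    MA_NOTIFICATION = auto()   # Messaging Association expired (or rerouted?)


@dataclass
class Event:
    source: EventSource
    sid: str     # assumed lookup key for the connection
    name: str    # e.g. "GIMPS-response", "AdjacencyResponse", "SendMessage"


class ConnectionFsm:
    """Placeholder per-connection machine that only records what it receives."""

    def __init__(self, sid):
        self.sid = sid
        self.received = []

    def handle(self, event):
        self.received.append(event.name)


class Dispatcher:
    """Routes each event to the FSM of the connection it belongs to."""

    def __init__(self):
        self.machines = {}

    def dispatch(self, event):
        fsm = self.machines.get(event.sid)
        if fsm is None:
            # Assumption for this sketch: only an NSLP request creates a connection.
            if event.source is not EventSource.NSLP_API:
                return None
            fsm = self.machines[event.sid] = ConnectionFsm(event.sid)
        fsm.handle(event)
        return fsm
```

Keeping one machine object per connection means timers and messages for different connections never share state, which is the property the parallel-FSM requirement asks for.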
Chapter 2
Connection FSM States and Events

Node Nearest Flow Sender States

IDLE

State when a connection is first created: this will be because an NSLP has issued a SendMessage request with an unknown SID. (Note that this must be a ‘downstream’ request; an upstream request with an unknown SID should be rejected immediately.) It is also the relevant state where a connection has terminated leaving a Messaging Association behind [See SetStateLifetime request].

If the connection is created due to a SendMessage request, a GIMPS-query message is transmitted and a transition is made to either state WaitResponse1 (Cmode, or Dmode with EstablishMA true) or WaitResponse2 (Dmode with EstablishMA false). In either case the AdjacencyResponse timer is started [Timeout TBD/configurable] and the AdjacencyMaxRetries variable initialized [Value TBD/configurable]. Note that the SendMessage API specifies a total amount of time to wait rather than a number of retries and an interval between retries. [Presumably this value should be used as some sort of default for the StateLifetime]

Any NSLPdata provided by SendMessage is either sent with the GIMPS-query message (for Dmode with EstablishMA false) or queued for transmission when the Messaging Association is established in Cmode.

WaitResponse1

State entered while waiting for Messaging Association setup to complete for CMODE connections. It is intended that the AdjacencyResponse timer is running in this state.

If a GIMPS-response message with matching MRI is received, then connection has been made to the adjacent downstream node. Transition is made to the EstablishedDownstreamMRS+MA state and a GIMPS-confirm message is sent back to the adjacent node. If the GIMPS-response message was received over an existing Messaging Association (Cmode response), this association is linked to the connection state.
Otherwise (Dmode response) a new Messaging Association is created and linked to the connection state, and a GIMPS-confirm message is sent across the new association. In either case Message Routing State (MRS) is installed. The queued NSLPdata from the original SendMessage can be piggybacked onto the GIMPS-confirm message. Start MRSLifetimeTimer [Timeout: longer of any MRS lifetime set with SetStateLifetime and time to try delivering this message from SendMessage request?? This needs discussion]

If the AdjacencyResponse timer expires before a GIMPS-response message is received, the AdjacencyMaxRetries counter is decremented. If it is still positive, the GIMPS-query is retransmitted, the AdjacencyResponse timer is restarted and the state remains WaitResponse1. If the counter has decremented to zero, a MessageDelivery error notification is sent back to the relevant NSLP, the state transitions to IDLE and (?) the connection state is deleted. [There might be some point in keeping it around if the NSLP might retry, but probably not.]

If a SetStateLifetime request event is received for this connection, record the new state lifetime and continue in this state unless the new lifetime is 0, in which case do the same as for MaxRetries = 0. [?Check if this is less than the initial value of MaxRetries * AdjacencyResponse initial timeout; do something if it is, maybe: error? Reduce MaxRetries?]

WaitResponse2

State entered while waiting for Message association setup to complete for DMODE connections. It is intended that the AdjacencyResponse timer is running in this state.

If a GIMPS-response message with matching MRI (Dmode necessarily) is received, then connection has been made to the adjacent downstream node. Transition is made to the EstablishedDownstreamMRS state and a GIMPS-confirm message is sent back to the adjacent node. If InstallMRS was requested the MRS is installed and start MRS. Start DownstreamRefreshTimer [Timeout TBD/configurable].
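The AdjacencyResponse retry rule described for WaitResponse1 above (decrement first, then test the counter) can be sketched in Python as follows. This is only a sketch under assumptions: the `QueryRetry` class, its field names and the action strings are hypothetical, and the real AdjacencyMaxRetries value is still TBD in this document.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRetry:
    """Retry bookkeeping for a pending GIMPS-query (hypothetical helper)."""
    max_retries: int                  # AdjacencyMaxRetries; actual value is TBD
    state: str = "WaitResponse1"
    actions: list = field(default_factory=list)

    def on_adjacency_response_timeout(self):
        """Apply one AdjacencyResponse timer expiry and return the resulting state."""
        self.max_retries -= 1
        if self.max_retries > 0:
            # Still positive: retransmit, restart the timer, stay in the same state.
            self.actions += ["retransmit GIMPS-query", "restart AdjacencyResponse"]
        else:
            # Counter reached zero: report the failure to the NSLP and go to IDLE.
            self.actions.append("notify NSLP: MessageDelivery error")
            self.state = "IDLE"
        return self.state
```

Under this reading of the rule, an initial value of 3 gives the original GIMPS-query plus two retransmissions before the error is reported on the third expiry.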
If the AdjacencyResponse timer expires before a GIMPS-response message is received, the AdjacencyMaxRetries counter is decremented. If it is still positive, the GIMPS-query is retransmitted, the AdjacencyResponse timer is restarted and the state remains WaitResponse2. If the counter has decremented to zero, a MessageDelivery error notification is sent back to the relevant NSLP, the state transitions to IDLE and (?) the connection state is deleted. [There might be some point in keeping it around if the NSLP might retry, but probably not.]

EstablishedDownstreamMRS+MA

State entered when the downstream adjacency has been confirmed and a Messaging Association has been set up (either Cmode, or Dmode with EstablishMA true). Note that since this state uses a Messaging Association, the state timeout is tied to the Messaging Association as well as to the MRS timeout. If either the MRS times out or the MA signals that it has timed out or that there has been a routing refresh, we go back to IDLE state. [If there is still a message pending to be sent, need to start again… needs further discussion; not needed for first version]

In this state the connection may:

- receive new (downstream) SendMessage events [again, upstream is an error]: send them and stay in the same state. In principle these could be Dmode or Cmode; just send them on the MA anyway, I assume.
- receive upstream messages (Dmode or Cmode): send to the NSLP and continue in the same state
- receive downstream messages [BIG error… action?]

EstablishedDownstreamMRS

State when the downstream adjacency has been confirmed for Dmode messages with no MA.
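The transitions described so far for the node nearest the flow sender can be collected into a lookup table. The Python sketch below is illustrative only: the event labels (for example `"SendMessage/MA"` and `"retries-exhausted"`) and the helper names are assumptions, and events whose handling is still open in the text (downstream messages in an established state, and all events in EstablishedDownstreamMRS) are deliberately left out.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    WAIT_RESPONSE_1 = "WaitResponse1"
    WAIT_RESPONSE_2 = "WaitResponse2"
    ESTABLISHED_MRS_MA = "EstablishedDownstreamMRS+MA"
    ESTABLISHED_MRS = "EstablishedDownstreamMRS"


# (state, event) -> next state. Event names are hypothetical labels for this sketch.
TRANSITIONS = {
    (State.IDLE, "SendMessage/MA"): State.WAIT_RESPONSE_1,
    (State.IDLE, "SendMessage/noMA"): State.WAIT_RESPONSE_2,
    (State.WAIT_RESPONSE_1, "GIMPS-response"): State.ESTABLISHED_MRS_MA,
    (State.WAIT_RESPONSE_1, "retries-exhausted"): State.IDLE,
    (State.WAIT_RESPONSE_2, "GIMPS-response"): State.ESTABLISHED_MRS,
    (State.WAIT_RESPONSE_2, "retries-exhausted"): State.IDLE,
    (State.ESTABLISHED_MRS_MA, "SendMessage/MA"): State.ESTABLISHED_MRS_MA,
    (State.ESTABLISHED_MRS_MA, "SendMessage/noMA"): State.ESTABLISHED_MRS_MA,
    (State.ESTABLISHED_MRS_MA, "upstream-message"): State.ESTABLISHED_MRS_MA,
    (State.ESTABLISHED_MRS_MA, "MRS-timeout"): State.IDLE,
    (State.ESTABLISHED_MRS_MA, "MA-timeout"): State.IDLE,
    (State.ESTABLISHED_MRS_MA, "routing-refresh"): State.IDLE,
}


def send_message_event(cmode, establish_ma):
    """Classify a SendMessage request: Cmode, or Dmode with EstablishMA, needs an MA."""
    return "SendMessage/MA" if (cmode or establish_ma) else "SendMessage/noMA"


def next_state(state, event):
    """Look up the next state; undefined combinations are reported as errors."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event}") from None
```

A table of this kind makes the gaps visible: any (state, event) pair that is missing raises an error, which matches the places where the text still asks what the action should be.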