Interior Routing Protocols

Interior routing protocols or interior gateway protocols (IGPs) are used to exchange routing information between routers within a single autonomous system. They are also used by routers which run exterior routing protocols to collect network-reachability information for the autonomous system.

Note: The term interior routing protocol has no abbreviation in common use, so we shall use the abbreviation IGP as is usual in TCP/IP literature.

The most widely used IGPs are:

- The Hello protocol.
- The Routing Information Protocol.
- The Open Shortest Path First protocol.

Before discussing these three protocols in detail, we shall look at two important groups of routing algorithms used in IGPs.

Routing Algorithms

In this section, we discuss the Vector-Distance and the Link-State, Shortest Path First routing algorithms.

Vector-Distance

The term Vector-Distance refers to a class of algorithms that gateways use to update routing information. Each router begins with a set of routes for those networks or subnets to which it is directly attached, and possibly some additional routes to other networks or hosts if the network topology is such that the routing protocol will be unable to produce the desired routing correctly. This list is kept in a routing table, where each entry identifies a destination network or host and gives the ``distance'' to that network. The distance is called a metric and is typically measured in ``hops''.

Periodically, each router sends a copy of its routing table to any other router it can reach directly. When a report arrives at router B from router A, B examines the set of destinations it receives and the distance to each. B will update its routing table if:

- A knows a shorter way to reach a destination.
- A lists a destination that B does not have in its table.
- A's distance to a destination that B already routes through A has changed.
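The three update rules above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular implementation: the function name `merge_report`, the table layout (destination mapped to a metric and next hop) and the fixed `link_cost` of 1 are all assumptions made for the example.

```python
def merge_report(table, neighbor, report, link_cost=1):
    """Merge a neighbor's distance report into a routing table, in place.

    table maps destination -> (metric, next_hop); next_hop is None for
    directly attached networks (a hypothetical layout for this sketch).
    report maps destination -> the neighbor's own metric.
    Returns the list of destinations whose entries changed.
    """
    changed = []
    for dest, advertised in report.items():
        candidate = advertised + link_cost  # cost of reaching dest via neighbor
        current = table.get(dest)
        if current is None:
            # Rule 2: the neighbor lists a destination we do not have.
            accept = True
        elif current[1] == neighbor:
            # Rule 3: we already route through this neighbor, so any change
            # in its distance (better or worse) must be taken over.
            accept = candidate != current[0]
        else:
            # Rule 1: the neighbor knows a strictly shorter way.
            accept = candidate < current[0]
        if accept:
            table[dest] = (candidate, neighbor)
            changed.append(dest)
    return changed


# Router B's table before a report from A arrives.
table_b = {"net1": (1, None), "net2": (3, "C"), "net3": (2, "A")}
report_from_a = {"net1": 1, "net2": 1, "net3": 4, "net4": 2}
print(merge_report(table_b, "A", report_from_a))  # ['net2', 'net3', 'net4']
```

Note that the third rule accepts a worse metric as well as a better one: if B already routes through A, A's report is the only information B has about that path.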
This kind of algorithm is easy to implement, but it has a number of disadvantages:

- When routes change rapidly, that is, when a new connection appears or an old one fails, the routing topology may not stabilize to match the changed network topology. Information propagates slowly from one router to another, and while it is propagating, some routers will have incorrect routing information.
- Each router has to send a copy of its entire routing table to every neighbor at regular intervals. One can use longer intervals to reduce the network load, but that makes the network respond more slowly to changes in topology.
- Vector-distance algorithms using hop counts as a metric do not take account of link speed or reliability. Such an algorithm will use a path with hop count 2 that crosses two slow-speed lines instead of a path with hop count 3 that crosses three token rings and may be substantially faster.

The most difficult task in a vector-distance algorithm is to prevent instability. Different solutions are available:

Counting to infinity

Let us choose a value of 16 to represent infinity. Suppose a network becomes inaccessible; all the immediately neighboring routers time out and set the metric to that network to 16. We can consider that all the neighboring routers have a piece of hardware that connects them to the vanished network, with a cost of 16. Since that is the only connection to the vanished network, all the other routers in the system will converge to new routes that go through one of those routers with a direct but unavailable connection. Once convergence has happened, all the routers will have metrics of 16 for the vanished network. Since 16 indicates infinity, all routers then regard the network as unreachable. The question with vector-distance algorithms is not whether convergence will occur, but how long it will take.
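To give a feel for how long convergence can take, the following sketch simulates the simplest possible case: two routers, A and B, where B has just lost its direct connection to a network and A still holds a route to it through B. The scenario, the function name `count_to_infinity` and the strictly alternating exchange of updates are assumptions chosen for illustration; real routers send updates on independent timers.

```python
INFINITY = 16  # the value chosen above to represent infinity


def count_to_infinity(infinity=INFINITY):
    """Return the (metric_a, metric_b) pairs seen after each update round.

    Assumed scenario: B was directly attached to the network and has just
    timed the route out; A still believes it can reach the network via B.
    """
    metric_a = 2          # A reaches the network via B (B's old metric of 1, plus 1)
    metric_b = infinity   # B has marked its own route as unreachable
    history = [(metric_a, metric_b)]
    while metric_a < infinity or metric_b < infinity:
        # A's periodic update reaches B. B has no better route, so it
        # believes A and installs a route via A.
        metric_b = min(metric_a + 1, infinity)
        # B's update reaches A. A routes via B, so it must accept the
        # new (worse) metric.
        metric_a = min(metric_b + 1, infinity)
        history.append((metric_a, metric_b))
    return history


rounds = count_to_infinity()
print(len(rounds) - 1)   # 8 rounds until both routers agree on infinity
print(rounds[-1])        # (16, 16)
```

In this simplified model the number of rounds grows in proportion to the value chosen for infinity, which is why that value has to be kept small.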
Let us consider the configuration shown in Figure - The Counting to Infinity Problem.

Figure: The Counting to Infinity Problem - All links have a metric of 1 except for the indirect route from C to D which has a metric of 10.

Let us consider only the routes from each gateway to the target network. Now, consider that the link from B to D fails. The routes should now adjust to use the link from C to D. The routing changes start when B notices that the route to D is no longer usable. For RIP this occurs when B does not receive a routing update on its link to D for 180 seconds. The following picture shows the metric to the target network, as it appears in the routing table of each gateway.

Figure: The Counting to Infinity Problem

The problem is that B can get rid of its route to D (using a timeout mechanism), but vestiges of that route persist in the system for a long time (the time between iterations is 30 seconds using RIP). Initially, A and C still think they can reach D via B, so they keep sending updates listing metrics of 3. B will receive these updates and, in the next iteration, will claim that it can get to D via either A or C. Of course, it cannot, because the routes claimed by A and C (D reachable via B with a metric of 3) are now gone, but they have no way of knowing that yet. Even when they discover that their routes via B have gone away, they each think there is a route available via the other. Eventually the system will converge, when the direct link from C to D has a lower cost than the one received (by C) from B and A.

The worst case is when a network becomes completely inaccessible from some part of the system: in that case, the metrics may increase slowly in a pattern like the one above until they finally reach ``infinity''. For this reason, the problem is called counting to infinity. Thus the choice of infinity is a trade-off between network size and speed of convergence in case counting to infinity happens.
This explains why we chose as low a value as 16 to represent infinity. 16 is the value used by RIP. The other solutions will be discussed within the RIP protocol (see Routing Information Protocol (RIP)).

Link-State, Shortest Path First

The growth in networking over the past few years has pushed the currently available Interior Gateway Protocols, which use vector-distance algorithms, past their limits. The primary alternative to vector-distance schemes is a class of protocols known as Link State, Shortest Path First. The important features of these routing protocols are:

- A set of physical networks is divided into a number of areas.
- All routers within an area have an identical database.
- Each router's database describes the complete topology (which routers are connected to which networks) of the routing domain. The topology of an area is represented with a database called a Link State Database describing all of the links that each of the routers in the area has.
- Each router uses its database to derive the set of optimum paths to all destinations from which it builds its routing table. The algorithm used to determine the optimum paths is called a Shortest Path First (SPF) algorithm.

In general, a link state protocol works as follows. Each router periodically sends out a description of its connections (the state of its links) to its neighbors (routers are neighbors if they are connected to the same network). This description, called a Link State Advertisement (LSA), includes the configured cost of the connection. The LSA is flooded throughout the router's domain. Each router in the domain maintains an identical synchronized copy of a database composed of this link state information. This database describes both the topology of the router's domain and routes to networks outside of the domain such as routes to networks in other autonomous systems. Each router runs an algorithm on its topological database resulting in a shortest-path tree.
This shortest-path tree contains the shortest path to every router and network the gateway can reach. From the shortest-path tree, the cost to the destination and the next hop to forward a datagram to are used to build the router's routing table.

Link-state protocols, in comparison with vector-distance protocols, send out updates when there is news, and may send out regular updates as a way of assuring neighbor routers that a connection is still active. More importantly, the information exchanged is the state of a router's links, not the contents of the routing table. This means that link-state algorithms use fewer network resources than their vector-distance counterparts, particularly when the routing is complex or the autonomous system is large. They are, however, compute-intensive. In return, users get faster response to network events, faster route convergence, and access to more advanced network services.

The Hello Protocol

The Hello protocol was used in the ``Fuzzball'' software for LSI/11 minicomputers, which were widely used in Internet experimentation. It is described in RFC 891 - DCN Local-Network Protocols. It is not an Internet standard.

Note: OSPF (see Open Shortest Path First Protocol (OSPF) Version 2) includes a quite separate protocol for negotiation between routers which is also called the Hello protocol.

The communication in the Hello protocol is via Hello messages which are carried in IP datagrams. Hello uses protocol number 63 (reserved for ``any local network''). The Hello protocol is significant partly because of its wide deployment during the early expansion of the Internet and partly because it provides an example of a vector-distance algorithm that does not use hop counts like RIP (see Routing Information Protocol Version 1 (RIP, RIP-1)) but, instead, uses network delays as the metric for the distance.
A Distributed Computer Network (DCN) physical host is a PDP11-compatible processor which supports a number of cooperating sequential processes, each of which is given a unique 8-bit identifier called its port ID. Every DCN host contains one or more internet processes, each of which supports a virtual host given a unique 8-bit identifier called its host ID. There is a one-to-one correspondence between internet addresses and host IDs. Each DCN physical host is identified by a host ID for the purpose of detecting loops in routing updates, which establish the minimum-delay paths between the virtual hosts. Each physical host contains two tables:

Host Table

This contains estimates of round-trip delay and logical-clock offset (that is, the difference between the logical clock of this host and the logical clock of the sender's host). It is indexed by the host number. The host table is maintained dynamically using updates generated by periodic (from 1 to 30 seconds) Hello messages.

Net Table

This contains an entry for every neighbor network that may be connected to the local network and certain other networks that are not neighbors. Each entry contains the network number, as well as the host number of the router (located on the local network) to that network. The Net table is fixed at configuration time for all hosts except those that support the GGP or EGP routing protocols. In these cases the Net table is updated as part of the routing operation.

In addition, entries in either table can be changed by operator commands. The delay and offset estimates are updated by Hello messages exchanged on the links connecting physical neighbors.
Here is the format of a Hello message:

Figure: Hello Message Format

Where:

Checksum: contains a checksum covering the fields indicated.
Date: the local host's date.
Time: the local host's time.
Timestamp: used in the round-trip calculation (see below).
L Offset: contains the offset of the block of entries of internet addresses used on the local network.
#hosts: contains the number of entries from the host table that follow.
Delay n: the delay to reach host n.
Offset n: the offset from host n (difference between clocks).

Let us consider the two main steps of the Hello protocol.

Round-Trip Delay Calculation

Periodically each host sends a Hello message to its neighbor on each of the communication links common to both of them. For each of these links the sender keeps a set of state variables, including a copy of the source-address field of the last Hello message received. When constructing a Hello message the sender sets the destination-address field to this state variable and the source-address field to its own address. It then fills in the date and time fields from its clock and the timestamp from another state variable. It finally copies the delay and offset values from its host table into the message.

Round-trip delay calculations are performed on the host receiving the Hello message. Each link has an internal state variable assigned, which is updated as each Hello message is received; this variable takes the value of the time field minus the current time-of-day. When the next Hello message is transmitted, the value assigned to the timestamp field is computed as the low-order 16 bits of this variable plus the current time-of-day, that is, the sender's original time advanced by the interval the message was held at this host. The round-trip delay is computed as the low-order 16 bits of the current time-of-day minus the value of the timestamp field.

Host Updates

When a Hello message arrives which results in a valid round-trip delay calculation, a host update process is performed.
This consists of adding the round-trip delay to each of the ``Delay n'' entries in the Hello message in turn and comparing each of these calculated delays to the delay field of the corresponding host table entry. Each entry is then updated according to the following rules:

- If the link connects to another host on the same network and the port ID of the link output process matches the port ID field of the entry, then update the entry.
- If the link connects to another host on the same network and the port ID of the link output process does not match the port ID field of the entry and the calculated delay is less than the host delay field of the host table by at least a specified switching threshold (currently 100 milliseconds), then update the entry.

For example, if host A sends host B a Hello message, and if B's current delay to reach a given destination, D, is greater than the delay from A to D plus the delay from B to A, B changes its route and sends traffic to D via A.

The purpose of the switching threshold is to avoid (together with the minimum delay specification) unnecessary switching between links and transient loops which can occur due to normal variations in propagation delays. Please refer to RFC 891 for more details.

Routing Information Protocol (RIP)

There are two versions of RIP. Version 1 (RIP-1) is a widely deployed protocol with a number of known limitations. Version 2 (RIP-2) is an enhanced version designed to alleviate the limitations of RIP while being highly compatible with it. The term RIP is used to refer to Version 1, while RIP-2 refers to Version 2. Whenever the reader encounters the term RIP in TCP/IP literature, it is safe to assume that it is referring to Version 1 unless explicitly stated otherwise. We shall use this nomenclature in this section except when the two versions are being compared, when we shall use the term RIP-1 to avoid possible confusion.

Routing Information Protocol Version 1 (RIP, RIP-1)

RIP is a standard protocol (STD 34).
Its status is elective. It is described in RFC 1058, although many RIP implementations pre-date this RFC by a number of years. RIP is generally implemented with a daemon named routed. RIP is also supported by gated daemons.

RIP was based on the Xerox PUP and XNS routing protocols. It is widely used, as the code is incorporated in the routing code of Berkeley BSD UNIX which provides the basis for many UNIX implementations.

RIP is a straightforward implementation of vector-distance routing for local networks. RIP communication uses UDP as a transport protocol, with port number 520 as the destination port (see User Datagram Protocol (UDP) for a description of UDP and ports). RIP operates in one of two modes: active (normally used by routers) and passive (normally used by hosts). The difference between the two is explained below. RIP messages are sent in UDP datagrams and each contains up to 25 pairs of numbers as shown in Figure - RIP Message.

Figure: RIP Message - Between 1 and 25 routes may be listed in a RIP message. With 25 routes the message is 504 bytes long (25x20+4) which is the maximum size message that can be transmitted in a 512-byte UDP datagram.

Command: 1 for a RIP request or 2 for a RIP reply.
Version: 1.
Address Family: 2 for IP addresses.
IP address: the IP address for this routing entry: either a host or a subnet (in which case the host number is zero).
Hop count metric: the number of hops to the destination. The hop count for a directly connected interface is 1, and each intermediate router increments it by 1 to a maximum of 15, with 16 indicating that no route exists to the destination.

Both active and passive RIP participants listen to all broadcast messages and update their routing table according to the vector-distance algorithm described earlier.

Basic Operation

When RIP is started it sends a message to each of its neighbors (on well-known UDP port 520) asking for a copy of the neighbor's routing table.
This message is a query (command set to 1) with an address family of 0 and a metric of 16. The neighboring routers return a copy of their routing tables.

When RIP is in active mode it sends all or part of its routing table to all of its neighbor routers (by broadcasting and/or by sending it on any point-to-point links to its neighbors). This is done every 30 seconds. The routing table is sent as a reply (command is 2, even though it is unsolicited).

When RIP discovers a metric has changed, it broadcasts the change to other routers.

When RIP receives a reply, the message is validated and the local routing table is updated if necessary. To improve performance and reliability, RIP specifies that once a router (or host) learns a route from another router, it must keep that route until it learns of a better one (with a strictly lower cost). This prevents routes from oscillating between two or more equal cost paths.

When RIP receives a request other than one for the entire table, the list of entries in the request is returned as the response, with the metric for each entry set to the value from the local routing table. If no route exists in the local table, the metric is set to 16.

RIP routes learned from other routers time out unless they are readvertised within 180 seconds (6 broadcast cycles). When a route times out, its metric is set to infinity, the invalidation of the route is broadcast to the router's neighbors, and 60 seconds later, the route is deleted from the local routing table.

Limitations

RIP is not designed to solve every possible routing problem. RFC 1720 (STD 1) describes these technical limitations of RIP as ``serious'' and the IETF is evaluating candidates for a new standard ``open'' protocol to replace RIP. Possible candidates include OSPF (see Open Shortest Path First Protocol (OSPF) Version 2) and OSI IS-IS (see OSI Intermediate System to Intermediate System (IS-IS)). However, RIP is widely deployed and therefore is unlikely to be completely replaced for some time.
RIP has the following specific limitations:

- The maximum cost allowed in RIP is 16, which means that the network is unreachable. Thus RIP is inadequate for large networks (that is, those in which legitimate hop counts approach 16).
- RIP does not support variable length subnet masks (variable subnetting). There is no facility in a RIP message to specify a subnet mask associated with the IP address.
- RIP has no facilities to ensure that routing table updates come from authorized routers. It is an insecure protocol.
- RIP only uses fixed metrics to compare alternative routes. It is not appropriate for situations where routes need to be chosen based on real-time parameters such as measured delay, reliability, or load.
- The protocol depends upon counting to infinity to resolve certain unusual situations. As described earlier (Vector-Distance), the resolution of a loop would require either much time (if the frequency of updates was limited) or much bandwidth (if updates were sent whenever changes were detected). As the size of the routing domain grows, the instability of the vector-distance algorithm in the face of changing topology becomes apparent. RIP specifies mechanisms to minimize the problems with counting to infinity (these are described below) which allow RIP to be used for larger routing domains, but eventually RIP will be unable to cope. There is no fixed upper limit, but the practical maximum depends upon the frequency of changes to the topology, the details of the network topology itself, and what is deemed as an acceptable maximum time for the routing topology to stabilize.

The counting to infinity problem is addressed by using the split horizon, poisoned reverse and triggered updates techniques.

Split horizon with poisoned reverse

Let us consider our example network (shown in Figure - The Counting to Infinity Problem) again.
Figure: The Counting to Infinity Problem - All links have a metric of 1 except for the indirect route from C to D which has a metric of 10.

As described in Vector-Distance, the problem was caused by the fact that A and C are engaged in a pattern of mutual deception. Each claims to be able to reach D via the other. This can be prevented by being more careful about where information is sent. In particular, it is never useful to claim reachability for a destination network to the neighbor from which the route was learned (reverse routes). The split horizon with poisoned reverse scheme includes routes in updates sent to the router from which they were learned, but sets their metrics to infinity. If two routers have routes pointing at each other, advertising reverse routes with a metric of 16 will break the loop immediately. If the reverse routes are simply not advertised (this scheme is called simple split horizon), the erroneous routes will have to be eliminated by waiting for a timeout. Poisoned reverse does have a disadvantage: it increases the size of the routing messages.

Triggered updates

Split horizon with poisoned reverse will prevent any routing loop that involves only two gateways. However, it is still possible to end up with patterns in which three routers are engaged in mutual deception. For example, A may believe it has a route through B, B through C, and C through A. This cannot be solved using split horizon. This loop will only be resolved when the metric reaches infinity and the network or host involved is then declared unreachable. Triggered updates are an attempt to speed up this convergence. Whenever a router changes the metric for a route, it is required to send update messages almost immediately, even if it is not yet time for one of the regular update messages (RIP specifies a small time delay, between 1 and 5 seconds, in order to avoid having triggered updates generate excessive network traffic).
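The difference between simple split horizon and split horizon with poisoned reverse, as described above, comes down to how a router builds the update it sends to one particular neighbor. The sketch below illustrates this; the function name `build_update` and the table layout (destination mapped to a metric and next hop) are assumptions made for the example, not part of RIP itself.

```python
INFINITY = 16  # RIP's value for "unreachable"


def build_update(table, to_neighbor, poisoned_reverse=True):
    """Build the set of routes to advertise to one neighbor.

    table maps destination -> (metric, next_hop); next_hop is None for
    directly attached networks (a hypothetical layout for this sketch).
    Returns a dict mapping destination -> advertised metric.
    """
    update = {}
    for dest, (metric, next_hop) in table.items():
        if next_hop == to_neighbor:
            # A reverse route: it was learned from this very neighbor.
            if poisoned_reverse:
                update[dest] = INFINITY  # advertise it, but as unreachable
            # With simple split horizon the route is silently omitted.
        else:
            update[dest] = metric
    return update


# Router A's table: D is reached via B, Y via C, X is directly attached.
table_a = {"D": (3, "B"), "X": (1, None), "Y": (2, "C")}
print(build_update(table_a, "B"))         # {'D': 16, 'X': 1, 'Y': 2}
print(build_update(table_a, "B", False))  # {'X': 1, 'Y': 2}
```

The first output shows the cost mentioned in the text: the poisoned update carries one more entry than the simple split horizon update.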
Routing Information Protocol Version 2 (RIP-2)

RIP-2 is a draft standard protocol. Its status is elective. It is described in RFC 1723.

RIP-2 extends RIP-1. It is less powerful than other recent IGPs such as OSPF (see Open Shortest Path First Protocol (OSPF) Version 2) and IS-IS (see OSI Intermediate System to Intermediate System (IS-IS)), but it has the advantages of easy implementation and lower overheads. The intention of RIP-2 is to provide a straightforward replacement for RIP which can be used on small to medium-sized networks, can be employed in the presence of variable subnetting (see Subnets) or supernetting (see Classless Inter-Domain Routing (CIDR)) and, importantly, can interoperate with RIP-1.

RIP-2 takes advantage of the fact that half of the bytes in a RIP-1 message are reserved (must be zero) and that the original RIP-1 specification was well designed with enhancements in mind, particularly in the use of the version field. One notable area where this is not the case is in the interpretation of the metric field. RIP-1 specifies it as being a value between 0 and 16 stored in a four-byte field. For compatibility, RIP-2 preserves this definition, meaning that it agrees with RIP-1 that 16 is to be interpreted as infinity, and wastes most of this field.

Note: Neither RIP-1 nor RIP-2 is properly suited for use as an IGP in an AS where a value of 16 is too low to be regarded as infinity, because high values of infinity exacerbate the counting to infinity problem. The more sophisticated Link-State protocol used in OSPF and IS-IS provides a much better routing solution when the AS is large enough to have a legitimate hop count close to 16.

Provided that a RIP-1 implementation obeys the specification in RFC 1058, RIP-2 can interoperate with RIP-1. The RIP message format is extended as shown in Figure - RIP-2 Message.

Figure: RIP-2 Message - The first entry in the message may be an authentication entry, as shown here, or it may be a route as in a RIP-1 message.
If the first entry is an authentication entry, only 24 routes may be included in a message; otherwise the maximum is 25 as in RIP-1.

The fields in a RIP-2 message are the same as for a RIP-1 message except as follows:

Version: is 2. This tells RIP-1 routers to ignore the fields designated as ``must be zero'' (if the value is 1, RIP-1 routers are required to discard messages with non-zero values in these fields since the messages originate with a router claiming to be RIP-1-compliant but sending non-RIP-1 messages).
Address Family: may be X'FFFF' in the first entry only, indicating that this entry is an authentication entry.
Authentication Type: defines how the remaining 16 bytes are to be used. The only defined types are 0 indicating no authentication and 2 indicating that the field contains password data.
Authentication Data: the password is 16 bytes, plain text ASCII, left adjusted and padded with ASCII NULLs (X'00').
Route Tag: a field intended for communicating information about the origin of the route information. It is intended for interoperation between RIP and other routing protocols. RIP-2 implementations must preserve this tag, but RIP-2 does not further specify how it is to be used.
Subnet Mask: the subnet mask associated with the subnet referred to by this entry.
Next Hop: a recommendation about the next hop that the router should use to send datagrams to the subnet or host given in this entry.

To ensure safe interoperation with RIP, RFC 1723 specifies the following restrictions for RIP-2 routers sending over a network interface where a RIP-1 router may hear and operate on the RIP messages.

1. Information internal to one network must never be advertised into another network.
2. Information about a more specific subnet may not be advertised where RIP-1 routers would consider it a host route.
3.
Supernet routes (routes with a subnet mask shorter than the natural or ``unsubnetted'' network mask) must not be advertised where they could be misinterpreted by RIP-1 routers.

RIP-2 also supports the use of multicasting rather than simple broadcasting. This can reduce the load on hosts which are not listening for RIP-2 messages. This option is configurable for each interface to ensure optimum use of RIP-2 facilities when a router connects mixed RIP-1/RIP-2 subnets to RIP-2-only subnets. Similarly, the use of authentication in mixed environments can be configured to suit local requirements.

RIP-2 is implemented in recent versions of the gated daemon, often termed gated Version 3. Since the draft standard is new at the time of writing, many implementations will comply with the earlier version described in RFC 1388. Such implementations will interoperate with those adhering to RFC 1723.

For more information on RIP-2, see:

- RFC 1721 - RIP Version 2 Protocol Analysis
- RFC 1722 - RIP Version 2 Protocol Applicability Statement
- RFC 1723 - RIP Version 2 - Carrying Additional Information
- RFC 1724 - RIP Version 2 MIB Extension
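To make the RIP-2 message layout described in this section more concrete, the following sketch packs a RIP-2 reply using Python's standard struct module. It assumes the 4-byte header and 20-byte entries described above, with the fields of each route entry in the order address family, route tag, IP address, subnet mask, next hop and metric. The function name `pack_rip2_reply`, the tuple layout of `routes` and the sample addresses are hypothetical, chosen only for illustration.

```python
import socket
import struct

HEADER = struct.Struct("!BBH")       # command, version, must-be-zero
ROUTE = struct.Struct("!HH4s4s4sI")  # family, tag, address, mask, next hop, metric
AUTH = struct.Struct("!HH16s")       # X'FFFF', authentication type, 16 bytes of data
MAX_ENTRIES = 25                     # 25 x 20 + 4 = 504 bytes


def pack_rip2_reply(routes, password=None):
    """Pack routes into one RIP-2 reply (command 2, version 2).

    routes is a list of (address, mask, next_hop, metric, tag) tuples with
    dotted-decimal strings for the three addresses (an assumed layout).
    If password is given, a type 2 authentication entry is placed first.
    """
    entries = []
    if password is not None:
        data = password.encode("ascii")
        if len(data) > 16:
            raise ValueError("password is limited to 16 bytes")
        # struct pads the 16s field with NUL bytes on the right.
        entries.append(AUTH.pack(0xFFFF, 2, data))
    for address, mask, next_hop, metric, tag in routes:
        entries.append(ROUTE.pack(
            2,  # address family 2 = IP
            tag,
            socket.inet_aton(address),
            socket.inet_aton(mask),
            socket.inet_aton(next_hop),
            metric,
        ))
    if len(entries) > MAX_ENTRIES:
        raise ValueError("too many entries for a single RIP message")
    return HEADER.pack(2, 2, 0) + b"".join(entries)


message = pack_rip2_reply(
    [("192.0.2.0", "255.255.255.0", "0.0.0.0", 3, 0)], password="secret")
print(len(message))  # 44: header, authentication entry, one route
```

Because the authentication entry occupies one of the 25 entry slots, a message carrying a password can hold at most 24 routes, which matches the limit stated earlier in this section.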