CS 640 Introduction to Computer Networks Fall 1999 Professor Landweber Copyright 1999, Lawrence H. Landweber Copyright 1996, Larry Peterson (slides marked with @ in lower left corner and those with white background with the exception of the OSPF and TCP dialog slides which are from IETF RFCs) Computer Communications Network (network, computer network) Distributed system with: • Transmission media (coaxial cable, twisted pair (cat3, cat5), glass fibre, wireless-ether) • Switching elements (switch, router, node, hub, bridge, gateway …) • Hosts (computer, server, workstation endsystem, node) • Protocols (physical, link, network, transport, session, presentation, application) Communications Medium • Connects switching elements to each other and to hosts • Transports digital data stream • Link – point-to-point - connects single host to switching elt or switching elt to another switching elt – multipoint or broadcast - connects one or more hosts and switching elts to each other Units - Orders of Magnitude • kilo (thousand), mega (million), giga (billion), tera (trillion… thousand billion), peta (10**15), exa (10**18) • Till early 1980s high bandwidth meant 56Kbps - today Gbps is high speed • A commercial fibre can transmit 800Gbps (80 data streams at 10Gbps - DWDM Dense Wave Division Multiplexing) • Research state of the art is 3.2Tbps (80 data streams at 40Gbps) for a single fibre Switching Elements • Responsible for the transport of data across a network - routing • May support congestion handling or quality of service mechanisms • Examples – routers - IP – switches - ATM or local area networks (e.g., Ethernet-802.3, token ring) – bridges, hubs - local area networks Network Types - Direct Connectivity • point-to-point • mulitpoint /broadcast Ethernet Token Ring Also satellite/radio Network Types - Indirect Connectivity link • switched networks host • internetworks backbone switching element Protocols A protocol is a set of conventions that specifies the rules or parameters for communication between two entities (hosts, processes, switching elts, devices, etc). Protocols specify conventions for: • The type, semantics, and format of information to be conveyed between entities • Establishing/closing connections or sessions • Routing across a network or internetwork • Flow control: speed matching between entities • Reliability: error detection, handling, correction Protocols also specify • Data and object types and their representation • Session characteristics (e.g., checkpointing, encryption-security, compression) • Congestion detection and control • Quality of service parameters and handling The Basic Idea: Packet Switching Data networks use packet switching instead of circuit switching, the method traditionally used in the telephone system. Packet switching was first invented in the 1960s as a method for building a survivable communications system in a battlefield environment. Circuit Switching - Telephone System Model • Dedicated bandwidth (capacity) reserved along fixed path from caller to called for duration of conversation (signalling -setup) • Several conversations can share a link…but each is allocated fixed fraction of link bw • Bandwidth not shared so during periods of silence, it is wasted • If dedicated bandwidth not available for a call, then the call not completed (blocking) Circuit Switching Circuit Switching Circuit Switching Circuit Switching Path is fixed - all data follows this path Bandwidth / capacity reserved on all links Data flows continuously - NO PACKETS BW unavailable --> BLOCKING Packet Switching - Data Network Model • Each data stream divided into chunks, called packets; each packet sent separately into the network; packets may follow diff paths from source to dest • Each packet has a header and possibly a trailer that contain information as to its source and destination 5 4 Packet Header 3 Payload 2 1 Packet Switching - Data Network Model • Bandwidth not reserved but is shared between data streams • Complete packet arrives at sw elt before it is forwarded to next sw elt; store / forward • Store and forward systems must deal with allocation of buffers in sw elts and with congestion that may result from too many packets arriving in too short a time • If a buffer not available when a packet arrives at sw elt, a packet is discarded (may not be one that just arrived) Packet Switching - Datagrams and Virtual Circuits • Examples of packet switching ( i.e., S&F, buffers, congestion, no link bw reserved) • Virtual Circuit – must set up path - vc (conn estab - signalling) – path fixed for all packets of a data stream – after setup, header need only specify vc/path • Datagram – packets for data stream may travel on diff paths – headers must include full address of dest Multiplexing • Share network resources (sw elt and links) among multiple users • TDM - time div multiplexing (Alohanet, satellite) • FDM - frequency div multiplexing (CATV) @ TDM • Slots can be statically or dynamically assigned; assignment decision can be centralized or distributed • Circuit switching can use TDM - e.g., T1 • Statistical TDM - packet switching – packets from different sources interleaved – buffer contending packets – order of service may be FIFO or other A B C A A C B TDM - Circuit Sw. - T1 8 bits 7 data 1 parity 125 microseconds - each with 8 bits (digital) for 24 calls/conversations in parallel Each call gets 8 * 8000 bits per second or 64Kbps Total BW is 64Kbps * 24 + some admin bits or about 1.53Mbps T3 or DS3 = 28 T1 or ~45Mbps E1= 2Mbps What Can Go Wrong? • Signal errors - noise - leads to garbled bit(s) – CRC, checksum • Packet loss due to congestion – sequence numbers, acks, timers, retransmission (voice vs. file transfer) • Packets delayed, garbled, duplicated • Link or sw elt failure • Depending on application, protocols may have to compensate for or correct the above @ Performance - Bandwidth • Amount of data that can be transmitted per unit time (Kbps, Gbps, etc) • Related to bit width / coding • BW of link vs. end-end throughput • Throughput determined by @ – – – – bw of links on path sw elts proc speed congestion-queuing time transmission rate of sender Performance - Latency Delay • Time to send message or single bit from A to B • Components – propagation - signal speed in medium - length of links – transmission / processing time at sender and sw elts – queuing time at sw elts and at entrance to network • Speed of light – 3.0 * (10**8) meters / second in a vacuum or ~1 foot per nanosecond (billionth of a second) – 2.0 to 2.7 * (10**8) meters per second in various media – latency in LAN vs. terrestrial WAN vs. satellite @ Latency vs. Bandwidth • Independent • Bandwidth - Delay Product – amount of data that the network can store between source and destination – if RTT used for delay, then amount that can be sent before sender can receive an ack – example: 100ms RTT and 45Mbps: bw-delay product is 560Kbytes • Jitter - difference in time between arrival of successive packets in a data stream Protocol Architecture - Layering • Use layers to localize protocol functionality • Protocol that implements a layer has interfaces to: – higher level protocol and lower layer protocol • services to / from (service interface) • encapsulation / decapsulation • *** connection oriented or connectionless – peer protocol • messages exchanged with peer protocol entity • control information in header / trailer Interfaces service interface Protocol Entity Protocol Entity peer-to-peer interface service interface Connection-oriented Service Interface (CO) • Users of peer protocol entities can assume there a direct physical connection or wire between them (connection setup required) • Data sent by one user is received by the peer user in same order as it was sent by first user • Definition of CO Service Interface is indep of its implementation in network (vc/dg) • May or may not be reliable (TCP vs. ATM) Connectionless Service Interface (CL) • No notion of a connection between users of peer protocol entities • CL service interface promises best effort delivery of data from one user to the other, data may or may not reach the peer user and may or may not be delivered in order sent • AGAIN NOTE that this says nothing about the implementation method within the network More on Service Interfaces • CO, CL refer to service interfaces NOT to the protocols used to implement the service • The spec of a service interface does not imply anything about how the interface is implemented by a protocol • It also does not imply anything about how data is transported across the network • THIS DISTINCTION IS OFTEN BLURRED application presentation OSI Open Systems Interconnection Model 7 Layers session sw elts in network transport network link physical end system end system OSI Model • Model of protocol functionality (CCITT and ISO - Int. Org. for Standardization) • Just a model … does not specify details of protocols and interfaces or their implementation • Different protocols may conform to the model but not interoperate • Useful as a way of thinking about protocols - a taxonomy of protocol functionality Layers • Physical - what happens on wire - coding of bits, electrical or optical characteristics, config of pins, connectors and other hardware components • (Data) Link - handles transfer of data between two or more entities connected by a single link or hop. Service interface may be CO or CL. May or may not be reliable. (X.25 Level 2, IEEE 802.x) Layers • Network - Transfer of data across a network. Includes switching (dg, vc), routing, congestion avoidance and control, network wide addressing. (IP addresses, IP/ICMP, X.25 Level 3, ATM, BGP, OSPF) • Transport - Process to process transfer of data. Connectionless (UDP) or connectionoriented service (TCP) interface. Layers • Session - conventions for sessions. E.g., checkpointing, half or full duplex, security. • Presentation - syntactic conventions for communication between apps, encoding rules and file formats. • Application - includes session and presentation in the Internet protocol architecture Internet Protocol Architecture (IETF) FTP SMTP Telnet TCP HTTP UDP RTP IP 802.2 X.25 802.3 802.5 OSPF Link Layer • Two nodes (hosts, sw elts) connected by a single link • Framing and packet format • CO or CL service interface • Reliability • Flow control Link Layer - Framing • Link layer packet = frame • Problem: how to recognize beginning and end of frame? • Three methods – byte counting (DDCMP) – byte stuffing/oriented (BISYNCH, IMP-IMP) – bit stuffing/oriented (X.25 Level 2, 802.x) Byte Counting • Include count field giving number of bytes in frame • Problem: what if byte field is corrupted? • Might be able to catch with CRC… but only if max length assumed SYN…Count Header Body @ CRC Byte Stuffing/Oriented • Special characters used for control • Must distinguish special characters when used for control from when in data • Use special data link escape character, DLE • E.g., STX = start of text… use DLE STX when STX is in data (stuff DLE before STX if it appears in data (also for SOH, ETX,...) • If DLE in data, send DLE DLE Byte Stuffing Problems • Dependence on fixed character set • Must examine every byte of data on sending and receiving (insert / remove DLE) • Was used widely in IBM bisynch (at 9600bps) Bit Stuffing/Oriented • Frame beginning and end marked by special bit string …. usually 01111110 • If 5 1’s in data to be sent, sender inserts 0 • If receiver sees 5 1’s check next bit(s) – if 0, remove it (stuffed bit) – if 10, end of frame marker (01111110) – if 11, error (7 1’s cannot be in data) • Can be done fast in hardware FLAG Header Body … CRC FLAG Error Detection Cyclic Redundancy Check (CRC) • Bitwise binary arithmetic 0+0=1+1= 0 0+1=1+0= 1 • Addition and subtraction are same operation • Example 11011000 +/- 10001001 01010001 CRC • Represent n bit message as an n-1 degree polynomial • Message M = 101101101 M(x) = x**8 + x**6 + x**5 + x**3 + x**2 + 1 • Special generator polynomial, C(x), degree k • C(x) chosen to have special properties - this is key to the method working well CRC - Method • • • • • • • Pad M with k 0s yielding M’ (k = deg C(x)) Represent M’ as M’(x) R(x) = rem ( M’(x) / C(x)) P = M’ + R (P(x) = M’(x) + R(x)) Transmit P (P(x)) At dest, receive P(x) + E(x) Compute rem (P(x) + E(x)) / C(x) – If 0, either message rec ok or C(x) divides E(x) – If not 0, then error in transmission Shift Register • Consider C(x) = x**3 + x**2 + 1 • M(x) = X**7 + x**4 + x**3 + x • M’ = 10011010000 1 X**3 + 0 0 X**2 X**1 + 1 1010000 1 The shift register performs long division using bitwise operations - M’ / C Shift Register • Consider C(x) = x**6 + x**5 + x**3 + x +1 • M’ = 10110110101000000 0101000000 1 X**6 + 0 1 X**5 X**4 + 1 0 X**3 X**2 + 1 + 1 x The shift register performs long division using bitwise operations - M’ / C 1 Properties of C(x): Should not divide E(x) for common errors • Single bit errors – non-zero x**k, x**0 terms • Double (2 consecutive) bit errors – C(x) has factor with at least 3 terms • Odd number errors: contains factor (x+1) – consider E(x) with odd no. terms - so E(1) = 1 – but if (x+1) is factor of E, then E(1) = 0 – so if odd number of errors, E cannot have (x+1) factor More on C(x) • Burst error - at least first and last bits of burst are in error – if length of burst is <= k, the degree of C(x), then burst error is detected – C(x) with degree k cannot divide E(x) with degree less than k • High probability of detecting burst errors of length greater than k… but not 100% Reliable Transmission • When CRC detects error, frame discarded • Corrected by link layer retransmission ARQ (Automatic Repeat reQuest) – mechanisms include: frame sequence numbers, acknowledgements (acks), negative acks (naks), selective acks (sacks), timers / timeout – often implemented by sliding window • Also used by transport layer - but link impl simpler - wire connects protocol entities Sliding Window - Functions • Put data in order for delivery to user of service interface • Recovery from losses – retransmission • Flow control - prevent sender from overrunning receiver’s buffers – link - SWS usually fixed, RWS often = 1 – credits - signal from receiver to sender - used with transport protocols Timer / Retransmission Strategies • Timers for link layer – 1 timer for each frame – 1 timer for link • Retransmission on timeout – retransmit frame whose timer goes off – retransmit earliest unacked frame – retransmit all frames from earliest unacked frame - go back n • Retransmit on nak or sack Sliding Window Examples • Principle SWS + RWS <= MaxSeqNum (size seq # space) • MaxSeqNum = 8, SWS = 4, RWS = 4 Sender 0,…,3 Receiver 0,…,3 accepted lost resend old 0,…,3 ack 0,…,3, NFE = 4 all rejected-no overlap with expected frames Problem occurs if resent frame overlaps with expected new frame. Not possible if above principle observed Sliding Window Examples Sender 0,…,4 Receiver 0,…,3 accepted, disc 4 lost resend old 0,…,4 acc 4, disc 0,1,2,3 lost resend old 0,…,4 ack 0,…,3, NFE = 4 ack 4, will acc 5,6,7,0, NFE = 4 acc 0, as new but is retrans Problem occurs because resent frame overlaps with expected new frame. Bit Oriented Data Link Protocols • ANSI Standard - ADCCP, Advanced Data Communications Control Procedures • Choices of functionality and operation • Designed to operate – – – – over point-point or multi-point links in 2-way alternate or 2-way simultaneous mode over terrestrial and satellite links when communication is between logical equals or non-equals Station Types • Primary – flow control – error recovery (other than retransmission) – initialize and terminate link • Secondary – simpler than primary commands secondary primary responses Data Transfer Modes • NRM - Normal Response Mode – one primary, multiple secondary stations – polled multipoint, secondary often a device • ARM - Asynchronous Response Mode – one primary, one secondary station – secondary can send at will as responses – point-point - not good for multipoint • ABM - Asynchronous Balanced Mode – two combined stations (both issue commands and responses) Frame Format F A C Info FCS F F = 01111110 (flag sequence) A = address of receiver if command sender if response all 1s = broadcast C = 8 or 16 control bits - type of frame + functions FCS = CRC Control Fields • All can be commands or responses • Information transfer – 0 N(S) P/F N(R) • Supervisory (4) – 10 SS P/F N(R) • Unnumbered (32) – 11 MM P/F MMM • N(S) and N(R) are 3 or 7 bits; P/F is 1 bit P/F Bit • NRM - P=1, poll secondary, secondary sets F=1 in last response frame • ARM/ABM - P=1 is sent by primary asking secondary to set F=1 in next response • P/F bit used for synchronization The Sliding Window N(R), N(S) • N(S) is the sequence number of the information frame that contains it • N(R) acks all info frames with seq numbers < N(R); i.e., N(R) = NFE • Sequence number space is 7 (3 bits) or 127 (7 bits) • With X.25 L2, receive window is always 1 so frames only accepted in order Miscellaneous • To abort a frame (so it will be recognized as not correct) – append incorrect FCS or – transmit 7 1s • 15 ones transmitted means that station is giving up control of the link Supervisory Frames (4) • Contains N(R), ack to N(R)-1 • RR - receive ready - ready to rec I frame – clear busy condition (RNR) • RNR - receive not ready - busy condition – received frames may be discarded • REJ - reject - ask retrans from N(R) – clear busy condition – cleared when frame with N(S) = N(R) of rejected frame is received • SREJ - selective reject - ask retrans of N(R) Selected Unnumbered Frames • SABM - set asynchronous balanced mode link establishment (also SARM, SNRM) • DISC - disconnect • UA - unnumbered ack, ack unnumbered command • DM - disconnect mode - request set mode command, reject set mode • FRMR - frame reject - bad control field or bad I frame length or invalid N(R) Link Initialization SABM P=! DTE DCE UA (yes) Or DM (no) F=1 DM F=0 SABM Timers / Constants • T1 = frame retransmission timer • T2 = may delay timer • T3 = idle channel timer - inform protocol user • N1 = Max I frame size with headers • N2 = Max # retransmission attempts • K = max number of outstanding frames (<= 7 or <= 127) - pipelining Unbalanced Normal Class (NRM) UNC 3,4 Commands Primary I RR RNR SREJ * SNRM DISC UI *Receive window can be > 1 Responses Secondary I RR RNR SREJ * UA DM FRMR UI Primary B, SNRM,P B,RR0,P B,I00 B,I10 B,I20,P B,RNR3 B,I33,P Secondary B,UA,F B,RR0,F B,I03 B,I13 B,I23,F B,I34 B,I44,F B,RNR5,P B,RR4,F B,DISC,P B,UA,F Balanced Asynchronous Class (ABM) BAC 2,8 Commands Primary I RR RNR REJ * SABME DISC Responses Secondary RR RNR REJ * UA DM FRMR *Receive window = 1, only receive in order, REJ requests resend of ALL sent, beginning with lost one Local Area Networks • Standardized by IEEE as 802.x • A variety of Media Access Control (MAC) protocols and associated topologies – – – – – 802.3, Ethernet, CSMA/CD (3u, 3z) 802.4, Token Bus 802.5, Token Ring 802.11, Wireless DQDB, etc • With 802.2, comprises link layer 802.x Model IP or other network protocol Link Layer LLC Logical Link Control - 802.2 MAC Media Access Control - 802.3, 802.4, ... PHY Physical Layer - 802.3,802.4,... medium Hub Ethernet Token Token Ring FDDI Collision Domain Computer 802.2 - Logical Link Control • PDU - Protocol Data Unit DSAP SSAP Control Information Addresses are 8 bits - all 1s = broadcast DSAP = Destination Service Access Point, address leftmost bit for Group or Individual SSAP = Source Address leftmost bit for Command or Response (a la X.25 L2) Control = 16 bits if seq no included, else 8 bits LLC Types / Classes • There are two types of operation – Type 1 - provides connectionless service • no flow control or error control • no connection setup – Type 2 - provides a connection-oriented service • similar to X.25 L2 (BAC 2,8) • I frames can be responses • There are also 2 classes of procedure – Class 1 - includes Type 1 only – Class 2 - includes Type 1 and Type 2 Commands/Responses Type 1 Commands UI XID TEST Type 2 I RR RNR REJ SABME DISC Responses XID TEST I RR RNR REJ UA DM FRMR Details - Type 1 • • • • Every LLC on a LAN supports Type 1 UI frames used for data transfer Does not use XID TEST used by one entity to see if other still is “alive” Details - Type 2 • I frames used for data transfer • Seq number space is 7 bits • Receive window is1, frames only accepted in order • Exchange Information (XID) is used by receiver to tell sender the max amount, k, that the sender can send beyond the most most recently received ack (N(R)). This bounds pipelining. Ethernet - 802.3 • Ethernet developed by Xerox in mid 1970s • Basic ideas from AlohaNet packet radio project • Ethernet standardized by Xerox, DEC, Intel in 1978 • IEEE later standardized as 802.3 - at MAC layer differs in one header field from Ethernet • 10, 100, 1000 Mbps CSMA/CD • Carrier sense - stations sense whether there is traffic on the wire • Multiple access - more than one station can simultaneously be connected and contend for the right to send • Collision detection - individual stations detect when there has been a collision (transmissions from two stations interfere with each other) and react appropriately Classical Ethernet (Thick-Net) • Standardized as 802.3, 10Base5 • Coaxial cable for medium • Parameters - max – segment length = 500m, 2 repeaters on path – length = 1500m, 1024 nodes • Also – 10Base2 (thin-net, 200m) – 10BaseT (twisted-pair, star config, 100m) – 10BaseF (fiber, 1-2KM, FP,FB, FL) CD 4 Repeater Multi-Port Bridge CD 1 Repeater=Hub Bridge=Switch CD=Collision Domain Repeater CD 2 CD 3 Repeater Each CD includes a MP Bridge Port 802.3 Frame Format Preamble-7 StartDel-1 DstAddr-6 SrcAddr-6 Length-2 Payload FCS-4 StartDelimiter = 10101011 Dst and Src Addr = 6 bytes from flat address space first bit is individual/group Length = length of LLC packet in payload max is 1500 bytes Ethernet uses this field to specify upper layer protocol, values are over 1500 *802.3 uses 802.2 to select upper level protocol Payload is LLC packet plus PAD (min payload is 46 bytes) so min packet length is 64 bytes (without preamble, and start delimiter LLC TransDataEncaps RecDataDecaps Framing Addressing (SRC, DST) Error detection (FCS) M RecMediaAccess A TransMediaAccess Management C Management Medium allocation (collision avoidance) Contention Resolution (collision handling) TransDataEncoding RecDataDecoding Bit transmission/reception P Collision detection L Carrier sense S Preamble generation/detection Transmission Without Contention • Client layer (LLC) sends transmission request to MAC • Construct frame, compute FCS • Monitor carrier sense signal – if channel clear, initiate frane transmission after interframe delay (9.6 us) - send bits to PLS • Transmit encoded preamble, encode and transmit bits • Listen for collision Reception Without Contention • Synchronize with preamble, convert signal to binary • Set carrier sense while receiving bits • Check address - if for this station or station is in promiscuous mode, unpack frame, pass data to client • Compute FCS and send status code to client Collision Handling on Sending • If collision detect signal turned on while transmitting • Stop sending frame and transmit jam signal (32 bits, special pattern) to insure collision is detected by all stations • Schedule retransmission after back-off time Collision Handling on Reciving • Smaller than minimum size packet is detected and discarded (jam detected) • Terminated transmitted frame plus jam must be smaller than minimum size packet Retransmission • When station detects collision on transmitting, backoff/retransmission is scheduled as follows • For n th retransmission, randomly choose r in the range – 0 < = r < 2**k, where k=min(10,n) – wait r slot times before retransmission where a slot time is 512 bit times (64 * 8) or 51.2us for 10mbps • Truncated exponential backoff Collision Detection • To insure collision detection, enough bits must be sent so that if other sender is maximally distant, its first byte arrives while first sender is still sending • Consider 10Base5 – max length = 1500m – sender must continue sending for number of bit times necessary for data to travel 1500 * 2 = 3000m – at 10**7 bits/sec, assuming 2/3 speed of light in medium, each bit occupies 20m so 150 bits must be sent to deal with propagation time, plus additional bits to deal with other delays, but min packet size =512 bits – the jam is a special 32 bit pattern that allows all stations to learn that that there has been a collision Higher Bandwidth 802.3 • 802.3u, 802.3z standardize 100, 1000Mbps • Bridges (switches) can be connected to collision domains with different bandwidths • To insure collision detection must maintain relationship between – – – – 802.3 bandwidth minimum frame size length of combined media e.g., higher bandwidth, same min frame size, must have smaller length Parameters Parameters 10Mbps 100Mbps 1000Mbps slotTime interFrameGap jamSize maxFrameSize minFrameSize extendSize 512 bT 9.6us 32 1518 B 64 B 0B 512 bT 9.6us 32 1518 B 64 B 0B 4096 bT .096us 32 1518 B 64 B 448 B CD 4 (shared 10Mbps) CD 3 Repeater Multi-Port Bridge Repeater Repeater CD2 (dedicated 100Mbps) CD 1(shared 100Mbps) 1Gbps (dedicated) CD1 Multi-Port Bridge Multi-Port Bridge Repeater 100Mbps (shared) CD4 Repeater 100Mbps (shared) CD3 1Gbps (dedicated) CD2 Multi-Port Bridge DSL - Digital Subscriber Line • Digital copper loop transmission technology and associated equipment • Connection between subscriber and telco central office or concentrator - local loop access network • Achieves broadband speed (up to Mbps) over twisted pair Terminology • ATU - ADSL (Asynchronous Digital Subscriber Line) Transceiver Unit • CSU - Channel Service Unit • DSU - Digital Service Unit • CPE - Customer Premise Equipment Current Telco Infrastructure • Terminology – – – – – – ILEC - Incumbent Local Exchange Carrier CLEC - Competitive Local Exchange Carrier PTO - Public Telephone Operator CO - Central Office DLC/RT - Digital Loop Carrier/Remote Term MDF - Main Distribution Frame - where local loops terminate – DSLAM - DSL Access Multiplexer The Access Network • Typically, cable bundles with up to thousands of twisted wire pairs each of which terminates at a customer premise • Long loops – signals dissipate/weaken - lower signal to noise – solutions for loops > 18K feet • loading coils - every 6K feet, amplify signal, not compatible with DSL • DLC - concentrator, terminate twisted pair and backhaul to CO with T1/E1, fiber or copper, terminate DSL which needs end-end twisted pair Transmission Frequency • Higher frequency enables greater BW • POTS - 0 to 3600 Hertz (cycles per sec) – can also support modem up to 56Kbps (analog) • DSL uses higher frequencies • Problems – attenuation - dissipation of signal as it traverses copper wire – crosstalk - interference between signals on different copper wires Attenuation • Higher frequencies attenuate energy faster so shorter loop distance possible • Solution – use lower resistance wire - thicker gauge – but telcos have used thinnest possible wire because thicker wire more expensive to buy/install Crosstalk • Electrical energy transmitted on wire radiates energy to adjacent wires in bundle • Interferes more with similar frequencies • Less of a problem for lower frequencies, e.g., those used with POTS • Solutions for DSL – two pair instead of one pair so lower freq can be used to get desired BW – use different frequencies upstream/downstream, lower upstream because more crosstalk at CO end - many lines converge Some DSL Types • HDSL - High Bit Rate DSL, 2 pair, T1/E1, symmetric • SDSL - Symmetric DSL, 1 pair, multiple data rates to T1/E1 • ADSL - Asymmetric DSL, 1 pair, up to 6Mbps • RADSL - Rate Adaptive DSL, 1 pair, up to 12Mbps, asymmetric or symmetric Issues • Frequencies to use for up/down stream • Desired max distance - subscriber/DLC, about 18K feet is usual upper bound • Coding of bits • NON ISSUE - compatibility with other DSL systems - DSL is point-point and single organization controls both ends Network Layer A variety of different types of protocols • Access / service interface – connection-oriented (X.25 Level 3, ATM) or connectionless (IP) • Within network – switching method - DG (IP), VC (ATM) • Routing - OSPF / BGP (IP) • Congestion detection / handling - RED (IP) • QoS - TM4.0 (ATM), RSVP-DiffServ (IP) Routing • Problem: Find path from a source to a destination through a network (internet) • Possible path criteria – – – – – – – correctness - packets sent to correct destination simplicity stability - converge to equilibrium) robustness - operates in presence of failures fairness - serves all users - can prioritize optimality - minimize cost of path QoS - paths depend on app requirements Cost of Link • Capacity / bandwidth • Type - satellite / land • Error rate • Delay • Security Cost of path = sum of costs of links on path Some Issues for Routing Protocols • Who decides route – centralized - Routing or Network Control Center (RCC or NCC) – distributed - individual sw elts • When are routes changed – fixed or static or non-adaptive - at system startup time by administrator – adaptive or dynamic - with each DG, with each VC, weekly, ?? More Issues for Routing Protocols • Changes based on – measurements or estimate of traffic over some period – network topology - links, sw elts up/down, added/removed • Algorithm used – distance vector (centralized or distributed) – link state (distributed) Process for Changing Routes • Determine relevant parameters - topology, delay, queue lengths, traffic intensity • Send information to place(s) making decisions (sw elts or RCC/NCC) • Place receiving info computes routing tables • Routing tables distributed to sw elts (if NCC) Distance Vector Routing • Each sw elt maintains set of triples – (Desination, Cost, NextHop) • Each sw elt sends updates to (receives updates from) directly connected neighbors – periodically (on the order of several seconds) – when its table changes (triggered update) • Update is list of pairs - (Destination, Cost) • Update local table if receive “better” route • Refresh existing routes; delete if time out 1 D A 3 B 1 2 F 8 E 1 Database at C A B C D E F -3B -1D --- 3A -8C 1D --- -8B --4E -- 1A 1B ---2F --4C --1F ---2D 1E -- 4 Destination A B C D E F Distributed algorithm, each column at different sw elt Centralized algorithm, entire table at RCC 1 D 2 F A 3 B 1 8 E 1 Database at C A B C D E F -3B -1D --- 2D -8C 1D 12C 3D -8B --4E -- 1A 1B ---2F -12C 4C 3F -1F ---2D 1E -- 4 Destination A B C D E F Changes in B, E tables after one exchange of routing infomation Example B C A D E F @ G Routing table at node B Dest Cost NextHop A 1 A C 1 C D 2 C E 2 A F 2 A G 3 A Example - Recovery from Link Failure B C A D E F @ G F detects that link to G has failed F sets distance to G to inf, sends update to A A sets distance to G to inf since it uses F to reach G A receives update from C with 2 hop path to G A sets distance to G to 3 and sends update to F F decides it can reach G in 4 hops via A Looping Problem B C A D E F @ G Link from A to E fails A advertises distance of infinity to E B and C advertise distance of 2 to E B decides it can reach E in3 hops, advertises this to A A decides it can reach E in 4 hops, advertises this toC C decides it can reach E in 5 hops A, B, C form a routing loop 1 A 1 B 1 C D 1. Destination D 2. Cost from A 3B (cost to D from A is 3, first hop is B) B 2C C 1D 3. C-D fails (second column of 4) 4. Successive table entries for destination D from node A entries 3B 3B 3B 5B 5B 7B 7B… B entries 2C 2C 4A 4A 6A 6A 8A… C entries 1D inf 3B 5B 5B 7B 7B… Eventually, algorithm converges but only after “count to infinity” Problem: path to D from B uses B but B is not aware of this Solution - Split Horizon • If sw elt W learned route to X from Y, W does not tell Y about its route to X • Modification: split horizon with poisoned reverse - use metric infinity in update for routes that have ceased to exist – advantage - convergence can be faster • Can also speed up convergence by triggering update to neighbors when a sw elt’s routing database changes With Split Horizon A B C D Cost From A B C 3B 2C 1D 3B 2C inf 3B inf inf inf inf inf Split Horizon: do not report route to destination to the neighbor from which the route was learned C Consider A 1 B 1 D E 1 1. Destination E 2. Cost from A 3B (cost to D is 3, first hop is B) B 2D C 3B 3. B-D fails 4. Successive table entries for destination E from node A entries 3B 3B 4C 5C 6C… B entries 2D inf 4C 5C 6C… C entries 3B 3B 4A 5A 6A… Eventually, algorithm converges - after “count to infinity” Split horizon does not help! C A B D E Split horizon does not help Successive table entries for destination E from nodes A B C 3B 2D 3B 3B inf 3B 4C inf 4A inf 5A inf 6B inf inf Solution - Hold Down • When a route to a destination is lost by a sw elt, it does not accept a new route to that destination for a certain amount of time time based on max possible path loop • Difficulty is determining hold down period. Too short may lead to (incorrect) acceptance of new routes. Too long may lead to loss of data enroute to desination C A B D E With sufficient hold down Successive table entries for destination E from nodes A B C 3B 2D 3B 3B inf 3B inf inf inf Link State • Each sw elt sends information on directly connected links to ALL sw elts in the AD • Contrast with distance vector - entire routing table sent just to neighbors • Link state packet - packet used to carry info • Reliable flooding - hop-hop reliable (optional) • Shortest Path First Algorithm (Dijkstra) used to compute routes Link State Packet - LSP • Routing protocol packet - on Internet encapsulated in UDP DG or TCP segment • Includes – – – – @ ID of sw elt that created LSP cost of link to each directly connected neighbor sequence number (SEQNO) time-to-live (TTL) for LSP - max hop count or max number of time units, as defined by protocol LSP Handling @ • Each sw elt periodically generates new LSP with increased SEQNO (set to 0 on reboot) • LSP forwarded on all links except one on which it arrived • Each sw elt stores newest LSP from each source sw elt • TTL decremented before being forwarded • TTL of stored LSP may also be periodically decremented Dijkstra’s Shortest Path First Algorithm Goal - find shortest path from source node s to each node in a graph • N = set of nodes in the graph • l(i, j)= cost (weight) for edge (i, j) (>= 0) • seN = the source node • M = set of nodes incorporated so far • C(n) = cost of path from s to n @ Shortest Path First Algorithm M = {s} for each n in N - {s} C(n) = l(s,n) (inf if s not dir conn to n) while (N =/ M) -M = M union {w}: where C(w) is the minimum for all w in (N - M) -for each n directly conn to a w C(n) = MIN (C(n), C(w) + l(w,n)) @ M s w added Min dist from S X Assume for all nodes in M, the min distance and correct first hop from S have been found - consider node X in N-M having min dist from S among all nodes in N-M. Is there shorter path to X? NO, WHY? Consider shorter path to X. If node before X is not in M, then it would have been chosen instead of X. So X is first node outside of M on this shorter path. But then dist to X would already be this shorter value ... Route Table Calculation • Each sw elt contains two lists: Tentative and Confirmed • Each list contains a set of triples (Destination, Cost, NextHop) • Note that “NextHop” is the first sw elt on the path from the source S to Destination @ Confirmed Tentative S 15 3 C 10 11 2 A 1 D 3 B 1. (S,0,-) 2. (S,0.-) 3. (S,0,-) (C,3,C) 4. (S,0,-) (C,3,C) 5. (S,0,-) (C,3,C) (D,5,C) 6. (S,0,-) (C,3,C) (D,,5,C) (A,15,A) (C,3,C) (D,11,D) (A.15,A) (D,11,D) (A.13,C) (D,5,C) (A,13,C) (A,6,C) 7. (S,0,-) (C,3,C) (D,,5,C) (A,6,C) 8. (S,0,-) (C,3,C) (D,,5,C) (A,6,C) (B,9,C) Distance Vector vs. Link State • Distance vector – does not reveal net topology to sw elt – looping behavior with ad hoc solutions – slow to converge • Link state – scales well – can secure by authenticating origin of LSPs – each sw elt knows topology of entire network The Transport Layer • Implemented in end-systems (hosts) not sw elts • Uses network layer services • Issues – – – – – addressing CO vs. CL, reliable vs. unreliable connection establishment flow control connection termination CO vs. CL • Transport layer services may be CO or CL , reliable or unreliable • Consider CO transport service over CL network service • Consider reliable transport service over unreliable network service Connection Establishment 2-way Handshake • Initiator sends connection request (w. id) • Responder sends accept or reject (w. id) • Responder assumes connection exists when it sends the ack • Initiator assumes connection exists when it receives the ack • OK if network service reliable and CO • If not? (consider lost, old conn reqs/ack) Deal with Old Connection Requests • Keep track of obsolete connection ids – problem: lose state info on crash • Limit packet lifetime with max hop count – problem: queuing times • If can bound time in network for request and ack, then can use initial seq number based on clock to distinguish requests – done in TCP, problem: queuing times • 3-way handshake 3-way Handshake • Initiator sends request with ID (ISN in TCP) • Responder sends ack of initiator’s ID (ISN) plus its own ID (ISN) • Initiator sends ack of responder’s ID (ISN) and includes its ID • Solves old connection request problem because only establishes connection when both sides ack the other’s ID Flow Control • Similar mechanism (e.g., sliding window) to link layer flow control • Link layer receive window is usually fixed • Transport protocols may include a “credit” parameter so receiver can inform sender of its current receive window (variable receive window), i.e., how much it will accept beyond what is being acked • Consider credit of zero!! Connection Termination Abort • User indicates it will not send any more data nor will it accept any more data • Transport protocol immediately closes connection • Application must insure that all data sent has been received before issuing abort Connection Termination Graceful Close • User indicate it will send no more data but will continue to accept data • Transport protocol does not close connection until both users have issued graceful close • Application can count on transport protocol to insure all data sent is delivered When is a segment accepted by a receiver? Segment Receive Test Length Window -------- -------- --------------------------------------------------0 0 0 >0 >0 0 >0 >0 SEG.SEQ = = RCV.NXT RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND not acceptable RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND Basic 3-way Handshake for Connection Synchronization TCP A TCP B SEQ ACK CTL 1. CLOSED 2. SYN-SENT LISTEN <100> <SYN> SYN-RECEIVED 3. ESTABLISHED <300> <101> <SYN,ACK> SYN-RECEIVED 4. ESTABLISHED <101> <301> <ACK> ESTABLISHED 5. ESTABLISHED <101> <301> <ACK> DATA ESTABLISHED Simultaneous Connection Synchronization TCP A TCP B SEQ ACK CTL 1. CLOSED CLOSED <100> <SYN> ... 3. SYN-RECEIVED <300> <SYN> SYN-SENT 4. <SYN> SYN-RECEIVED 2. SYN-SENT ... <100> 5. SYN-RECEIVED <100> <301> <SYN,ACK> ... 6. ESTABLISHED <300> <101> <SYN,ACK> SYN-RECEIVED 7. ESTABLISHED <101> <301> <ACK> ESTABLISHED Recovery from Old Duplicate SYN TCP A TCP B SEQ ACK CTL 1. CLOSED LISTEN 2. SYN-SENT <100> <SYN> ... 3. (duplicate) ... <90> <SYN> SYN-RECEIVED 4. SYN-SENT <300> <91> <SYN,ACK> SYN-RECEIVED 5. SYN-SENT <91> <RST> 6. ... <100> <SYN> 7. SYN-SENT <400> <101> <SYN,ACK> SYN-RECEIVED 8. ESTABLISHED <101> <401> <ACK> LISTEN SYN-RECEIVED ESTABLISHED Half-Open Connection Discovery TCP A TCP B SEQ ACK CTL 1. (Crash) (Send 300, receive 100) 2. CLOSED ESTABLISHED 3. SYN-SENT <400> 4. (!!) <300> <100> <ACK> ESTABLISHED 5. SYN-SENT <100> (Abort!!) <SYN> <RST> (??) 6. SYN-SENT 7. SYN-SENT CLOSED <400> <SYN> Old Duplicate SYN Initiates a Reset on Two Passive Sockets TCP A TCP B SEQ ACK CTL 1. LISTEN LISTEN ... < Z > 3. (??) < X > <Z+1> <SYN,ACK> SYN-RECEIVED 4. <Z+1> (return to LISTEN) 5. LISTEN <SYN> SYN-RECEIVED 2. <RST> LISTEN Normal Close Sequence TCP A TCP B SEQ ACK CTL 1. ESTABLISHED ESTABLISHED 2. (Close) FIN-WAIT-1 <100> <300> <FIN,ACK> CLOSE-WAIT 3. FIN-WAIT-2 <300> <101> <ACK> CLOSE-WAIT <300> <101> <FIN,ACK> (Close) LAST-ACK <101> <301> <ACK> CLOSED 4. TIME-WAIT 5. TIME-WAIT 6. (2 MSL) CLOSED Simultaneous Close Sequence TCP A TCP B SEQ ACK CTL 1. ESTABLISHED 2. (Close) FIN-WAIT-1 3. CLOSING 4. TIME-WAIT (2 MSL) CLOSED ESTABLISHED <100> <300> <FIN,ACK> <300> <100> <FIN,ACK> ... <100> <300> <FIN,ACK> (Close) ... FIN-WAIT-1 <101> <301> <ACK> <301> <101> <ACK> ... <101> <301> <ACK> ... CLOSING TIME-WAIT (2 MSL) CLOSED Service Interface (to IP) • Net Send – – – – – identifier source, destination IP address protocol (TCP) data, data length type of service (precision, reliability, delay, throughput) – don’t fragment – options (IP source routing, time stamp, etc) Service Interface (from IP) • Net Deliver – – – – – source, destination IP address protocol (TCP) data, data length type of service options Service Interface (from application) • Open – unspecified passive - source port – fully specified passive - +dest port, address – active, active w. data - +data length, data, psh, urg+ • • • • • Send - lcn, data length, data, psh, urg+ Allocate (accept) - lcn, data length, data, urg+ Close (graceful) - lcn Abort - lcn Status - lcn Service Interface (to application) • Open ID - local connection name, source port, dest port and address (if known) • Open failure - lcn • Open success - lcn • Deliver - lcn, data, data length, urg+ • Closing - lcn • Terminate - lcn • Status response and error When Bad Segments Arrive • Send RST (stay in same state) – closed state and anything arrives except RST – listen, syn-sent, syn-received states and incoming acks something not sent • Send ACK - established and other synchronized states • Use seq, ack numbers that other side will believe (e.g., from incoming ack or data) TCP Throughput - Error Free Environment • Max TCP window/credit = 64KB • Max TCP throughput – cross US - 100ms RTT - 5.2Mbps – metropolitan - 1ms RTT - 520Mbps – local - .1ms RTT - 5.2Gbps • Problem is the max credit of 64KB TP4 Throughput • Max TP4 window/credit – (8KB per TPDU) * (32K TPDUs) – 256MB = 2Gb • Max TP4 Throughput – cross US - 100ms RTT - 20Gbps – metropolitan - 1ms RTT - 2Tbps – local - .1ms RTT - 20Tbps TCP Solution - Window Scaling • Add window scale option • Both ends of TCP connection must agree at connection setup time to use scale factors • Each end tells the other what scale factor it will accept (0 if willing to use scaling for other end but not accept scaled window) • ActualReceiveWindow = SegmentReceiveWindow * 2**shift-cnt Window Scaling • Max shift-cnt = 14 • Max window/credit = 64KB * 2**14 = ~8Gb • Max TCP througput across US = 80Gbps TCP vs. TP4 - Common Features • Connection-oriented reliable service interface for client processes • Sliding windows with cumulative acks and credits • Timers to trigger retransmission • Checksum to insure data integrity • Three-way handshake for connection establishment TCP vs. TP4 - Differences • Sequence number space – TCP - 32 bits indexing BYTES – TP4 - 31 bits indexing TPDUs • Maximum packet size – TCP - 64KB – TP4 - 8KB • Credit - maximum receive window size – 64KB – 32K TPDUs, each with max.size 8KB TCP vs. TP4 - Differences • Packet types – TCP - one type with flags (segment) • SYN, FIN, ACK, RST, PSH, URG – TP4 - multiple packet types (TPDU) • CR, CC, DT, AK, DR, DC, ED, EA • Out of band data – TCP - URG, PSH – TP4 expedited data • Termination – TCP - graceful close or abort – TP4 - abort TCP vs. TP4 - Differences • Checksum – TCP - one addition for 16 bytes of data – TP4 - two additions for 1 byte of data • Opening connections – TCP - passive or active open, listen – TP4 - active open, no listen • Connections – TCP - one between address/port pairs – TP4 - multiple between TSAPs with diff ref #s The Internet Protocol - IP Version 4 Internet Model R R R R R R @ R TCP R IP IP TCP IP Hub Packet Delivery Model • Connectionless service interface – datagrams may not be delivered to destination user in the same order as sent by source user • Potentially unreliable – datagrams sent may never be delivered • Datagrams are used within the network – unbounded delays, reordering, loss, duplication within network are possible IP Header • • • • • • • Version (4) - currently version 4 HLEN (4) - number of 32 bit words in header TOS (8) - Type of Service (not widely used) Length (16) - number of bytes in this DG (max 64K) Ident (16) - used by fragmentation/reassemlby Flags/Offset (16) - used by frag/reass (3 +13) TTL (8) - Time to Live - number of hops before discard this DG • Protocol (8) - TCP = 6, UDP = 17 • Checksum (16) - of header only @ • Destination / Source IP address (32 bits each) Fragmentation and Reassembly • Each network has some MTU • Strategy – – – – – – @ fragment when necessary (MTU < DG size) avoid fragmentation at source host regragmentation is possible fragments are self contained datagrams delay reassembly until destination host do not recover from lost fragments Indent = x 0 Offset=0 1400 bytes Indent = x Indent = x 1 Offset=0 512 bytes Indent = x 1 Flag = more fragments 512 512 bytes Indent = x @ 0 376 bytes 1024 Global Addresses • Properties – globally unique – hierarchical - network + host – each network interface has an IP address • Four classes - A, B, C, D (multicast-111…) 7 0 net 10 110 @ 24 host 14 net 21 network 12.5.120.9 16 host 138.108.12.240 8 host 205.23.198.12 Datagram Forwarding • Strategy @ – every DG contains destination’s address – if directly connected destination network, then forward to host – otherwise, forward to some router – forwarding table maps network number into next hop – each host has a default router – each router maintains a forwarding table Address Translation • Map IP addresses into physical addresses (e.g., 48 bit IEEE) • Address Resolution Protocol (ARP) – table of IP to physical address bindings – sender broadcasts request if destination IP address not in its table – target machine responds with its physical address @ How do Hosts/Routers Build/Maintain ARP Caches • Table entries discarded if not refreshed (~10 minutes) • Update your table with source when you are the target • Update your table if you already have an entry and see info pass by on net • Do not add new table entry based on info that passes on net ICMP Internet Control Message Protocol • Uses IP but is a separate protocol in the network layer ICMP HEADER IP HEADER IP HEADER PROTOCOL = 1 TYPE CODE CHECKSUM IP DATA REMAINDER OF ICMP MESSAGE (FORMAT IS TYPE SPECIFIC) ICMP Types • • • • • • • • @ Echo - Echo Reply (ping) Redirect (use other router) Destination Unreachable (prot, port, host) Time Exceeded Source Quench Checksum Failed Reassembly Failed Cannot Fragment Echo and Echo Reply TYPE CODE CHECKSUM IDENTIFIER SEQUENCE # DATA …. TYPE = 8 = ECHO; 0 = ECHO REPLY CODE = 0 IDENTIFIER An identifier to aid in matching echoes and replies SEQUENCE # Same use as for IDENTIFIER UNIX “ping” uses echo/echo reply Redirect TYPE CODE CHECKSUM NEW ROUTER ADDRESS IP HEADER + 64 bits data from original DG TYPE = 5 CODE = 0 = Network redirect 1 = Host redirect 2 = Network redirect for specific TOS 3 = Host redirect for specific TOS Destination Unreachable TYPE CODE CHECKSUM UNUSED IP HEADER + 64 bits data from original DG TYPE = 3 CODE 0 = Net unreachable 1 = Host unreachable 2= Protocol unreachable 3 = Port unreachable 4 = Fragmentation needed but DF set 5 = Source route failed Time Exceeded TYPE CODE CHECKSUM UNUSED IP HEADER + 64 bits data from original DG TYPE = 11 CODE 0 = Time to live exceeded in transit 1 = Fragment reassembly time exceeded Source Quench TYPE CODE CHECKSUM UNUSED IP HEADER + 64 bits data from original DG TYPE = 4; CODE = 0 Indicates that a router has dropped the original DG or may indicate that a router is approaching its capacity limit. Correct behavior for source host is not defined. Traceroute • UNIX utility - displays router used to get to a specified Internet Host • Operation – router sends ICMP Time Exceeded message to source if TTL is decremented to 0 – if TTL starts at 5, source host will receive Time Exceeded message from router that is 5 hopes away • Traceroute sends a series of probes with different TTL values… and records the source address of the ICMP Time Exceeded message for each • Probes are formatted to that the destination host will send an ICMP Port Unreachable message Scaling Issues • Problems – Internet addresses have only 32 bits and only 2 levels of hierarchy – original addresses only allowed 256 networks – routing protocols do not scale • IPv6 solves both of these problems • Classes (A, B, C) provided for more networks • Subnetting adds a third level of hierarchy • CIDR makes routing more efficient Subnetting • Add another level to address/routing hierarchy: subnet • Subnet masks define variable partition of host part of Class A and B addresses • Subnets are visible only within the local site Class B Address 00000000 111111111111111111111111 Subnet Mask (255.255.255.0) Network Number SubnetID Subnetted Address HostID Subnet Mask: 255.255.255.128 Subnet number: 128.96.34.0 129.96.34.15 128.96.34.1 H1 R1 Subnet Mask: 255.255.255.128 Subnet number: 128.96.34.128 1228.96.34.130 128.96.34.139 128.96.34.129 H2 R2 H3 128.96.33.14 128,96.33.1 Subnet Mask: 255.255.255.0 Subnet number: 128.96.33.0 @ Subnet Subnet Number Mask 128.96.34.0 255.255.255.128 128.96.34.128 255.255.255.128 128.96.33.0 255.255.255.0 Next Hop int0 int1 R2 Forwarding Algorithm D = destination IP address for each entry (SubnetNum, SubnetMask, NextHop) D1 = SubnetMask & D if D1 = SubnetNum if NextHop is an interface deliver datagram directly to destination else deliver datagram to NextHop (a router) @ Notes • Would use a default router if nothing matches • Not necessary for all ones in subnet mask to be contiguous • Can put multiple subnets on one physical network • Subnets not visible from rest of the Internet @ @ Supernetting • Assign block of contiguous network numbers to near-by networks • Called CIDR: Classless Inter-Domain Routing • Represent blocks with a single pair – (first _network_address, count) • Restrict block sizes to powers of 2 • Use bit mask (CIDR mask) to identify block size • All routers must understand CIDR addressing Route Propagation Idea: Impose a second hierarchy on the network that limits what routers talk to each other. (The first hierarchy is the address hierarchy that governs packet forwarding.) • Autonomous Systems (AS) – corresponds to administrative domain – example: company backbone, ISP backbone – assign each AS a 16 bit number Routing Protocol Types • Two-level route propagation hierarchy • Interior gateway protocol – inside an AS – each AS selects its own – routing decisions based on metric • Exterior gateway protocol – between ASs – Internet-wide standard) – routing decisions based on administrative issues Popular Interior Gateway Protocols • RIP: Route Information Protocol – developed for XNS, distributed with UNIX – distance vector algorithm, metric is hop count • OSPF: Open Shortest Path First – – – – – recent Internet standard link state algorithm routes changed when topology changes supports load balancing 2-level internal hierarchy OSPF OSPF - Open Shortest Path First Intra-AS Routing Protocol • Used within a single AS • Calculates routes to destination IP network address • Link state protocol - SPF/flooding • Route changes based on topology not traffic OSPF • Supports different routes depending on TOS required • Supports multiple equal cost paths • Exchange of routing updates is reliable between adjacent routers • Link state advertisement (LSA) contains info on state of router’s interfaces and adjacencies, LSAs are flooded Areas • Two level routing hierarchy • Divide AS into areas • SPF is used to determine routes within each area and on the backbone that routes between the areas • The backbone is also an area Area 1 Area 2 Area 3 Area 4 Packet Types • • • • Hello - acquire neighbors Database description Link state request Link state update, Link state ack – reliable hop-hop transfer of link state advertisements (LSA) – may aggregate LSAs for several sources Link State Advertisements • Router link - originated by all routers, flooded in area • Network link - originated by designated router, flooded in area • Summary link - originated by area border routers, flooded into area, describes route to destination outside the area but in the AS • AS external link - originated by an AS border router, flooded throughout AS, describes route to dest in another AS Types of OSPF Routers • Internal router - within an area • Area border router – attached to multiple areas – runs a copy of SPF for each attached area – relays topological info on attached areas to backbone • Backbone router - includes area border routers • AS boundary routers Popular Exterior Gateway Protocols • EGP - Exterior Gateway Protocol – original protocol in this class – based on reachability tree – no metric • BGP4 - Border Gateway Protocol\ – recent internet standard Routing Hierarchy Summary • AS to destination AS (BGP4) • Within AS (OSPF) – 2 levels – to destination network • Within network – to subnet, host Border Routers • Each AS has – one or more border routers – one or more border routers that serve as the speakers for the AS – border routers forward datagrams between ASs • Speakers communicate with border routers in adjacent ASs, using TCP • Speakers advertise and receive routing info and determine which routes to use • Message types: OPEN, KEEPALIVE, UPDATE, NOTIFICATION (errors) UPDATE MESSAGES • Routes are advertised and withdrawn in UPDATE messages • A route includes a sequence or set of ASs on the path to the destination IP networks unfeasible routes length (2B) withdrawn routes (destination IP addresses) total path attribute length (2B) path attributes (routes) network layer reachability info (destination IP addresses) Routes • When a BGP speaker advertises a route, it adds itself to the route • A BGP speaker only advertises routes that it uses itself • Policy considerations are used to determine route; routes are not optimal • An advertised route (sequence or set of ASs) may change -- downstream ASs may not discover in a timely fashion Update Handling • Receive an update with routes • Decision process – D (attributes of route) ------> non neg integer – D specifies the degree of preference for a route – route to dest with highest preference used • Note that D does not depend on the existence or non-existence of other routes • D is based on policy considerations, not necessarily on the cost of routes