¦ The TCP/IP Protocol Suite
This talk will outline parts of TCP/IP
    Focus on aspects relevant to network diagnosis
    IP and DNS, but not FTP and telnet
For more information, see ftp://ftp.cs.rutgers.edu/runet/tcp-ip-intro.txt & tcp-ip-admin.txt
Official source of info is the “RFCs”
    At Rutgers, on /rutgers/ref/rfcs. See rfc-index.txt
    Or look at www.internic.net
Comer’s book on TCP/IP

¦ TCP/IP: The Protocol Stack
TCP/IP: a family of protocols
    FTP, telnet, SMTP, etc.: application layer
    TCP and UDP: transport layer
        gets packets to the right application, maintains a connection
    IP: network layer
        gets packets to the right machine, routing
    Ethernet, etc.: encapsulation
Each layer has its own addressing and tables.
A packet has headers from each layer:
    Ethernet | IP | TCP | application

¦ TCP/IP: The Protocol Stack
All communications are based on packets
    A packet has up to 1500 bytes of data
    If you’re sending a big file, it gets broken up into packets
    Each packet is addressed to a specific application on a specific system.
Each packet has a “header”, which looks something like this:
    IP:   From IP address 128.6.134.22
          To IP address 165.230.180.144
    TCP:  From FTP server
          To FTP program
    FTP:  Bytes number 22,123 through 23,620 of the file
          <actual data>

¦ IP: protocol definition
IP: Internet Protocol
    Gets packets from machine to machine
    This is the level normally involved in network diagnosis
Addressing
    32-bit address, displayed as octets, e.g. 128.6.134.22, or in hex, 80068616
    Usually divided into three parts: network, subnet, host
        128.6 is Rutgers [we have more than one network]
        134 is the subnet, in this case one of several NBCS staff nets
        22 is the particular machine
    Division is not always on an octet boundary.
        Subnet mask 255.255.255.0, or FFFFFF00. 1’s mark the network part
        More recently, just show the number of bits, e.g.
128.6.0.0/16

¦ IP: Addressing
Address: 128.6.134.22 = 80068616
Mask: 255.255.255.0 = FFFFFF00
    1’s mark the network: 80068600, i.e. 128.6.134.0 to 128.6.134.255
    0’s mark the host: 16 hex = 22 decimal
    Newer terminology: net is 128.6.134.0/24
Address: 165.230.180.141 = A5E6B48D
Mask: 255.255.255.192 = FFFFFFC0
    1’s mark the network: A5E6B480, i.e. 165.230.180.128 to …180.191
    0’s mark the host: 0D hex = 13 decimal
    Newer terminology: net is 165.230.180.128/26
The main thing is to know what range of addresses is included in the net

¦ IP: Addressing
The Internet authorities assign Rutgers one or more nets
    Currently they are 128.6.0.0/16 and 165.230.0.0/16
    Normally use one or more /16 nets for large institutions
    Possible to assign a range, e.g. 7.1.0.0/16 - 7.3.0.0/16
Class A: 1.0.0.0 to 126.255.255.255 used /8 mask
Class B: 128.0.0.0 to 191.255.255.255 used /16 mask
Class C: 192.0.0.0 to 223.255.255.255 used /24 mask
No longer true: now assigning “class B” nets everywhere
127.0.0.1 is “loopback”. (whole 127 is “reserved”)

¦ IP: Addressing
So Rutgers gets 128.6.0.0/16 and 165.230.0.0/16
We then allocate subnets to departments.
Currently 128.6 uses a /24 mask and 165.230 uses /26
    A department that needs up to 254 hosts on a subnet is allocated from 128.6, e.g. 128.6.4.0/24
    A department that needs up to 62 hosts on a subnet is allocated from 165.230, e.g. 165.230.180.192/26

¦ IP: Routing
So how do you get from your machine to 165.230.180.22?
The network is a mesh of networks, routers, and point-to-point links
[Figure: network diagram. Labels: Ethernets for Phys, Chem, GSB, Sociol; Busch Fiber Ring; Busch-Livingston Fiber Trunk; Livingston Fiber Ring]
Any resemblance between this and Runet is pure coincidence
There are only two real choices:
    Send it directly over the Ethernet
    Send it to a router

¦ IP: Routing
You want to send from 128.6.134.22 to 165.230.180.144
The IP system will break the addresses into subnet and host
    So you want to get from host 22 on 128.6.134.0 to host 16 on 165.230.180.128
These are obviously on different networks. So you want to send it to the router.
The routers keep track of each other.
    The router on 128.6.134 knows which router handles 165.230.180.128.
So 128.6.134.22 sends it to a router, e.g. 128.6.134.1.
The routers get it to the router for 165.230.180.128.
The last router sends it to 165.230.180.144.
This talk does not discuss the protocols used among the routers
    They keep track of each other, and compute a (nearly) optimal route
    This is complex, and can go wrong: unreachable nets or looping

¦ IP: Routing
Each system has a “routing table”.
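As a rough illustration of the subnet/host split and of how a system consults its routing table, here is a minimal Python sketch. The ROUTES entries and the helper names split_address and next_hop are assumptions made up for this example (a host at 128.6.134.22 whose router is 128.6.134.1, as in the routing example); a real IP stack also tracks interfaces and metrics.

```python
import ipaddress

# Hypothetical routing table for a host at 128.6.134.22. These entries are
# illustrative assumptions, not copied from a real machine.
# Each entry: (destination network, gateway, or None for "send directly").
ROUTES = [
    (ipaddress.ip_network("128.6.134.0/24"), None),      # our own subnet
    (ipaddress.ip_network("0.0.0.0/0"), "128.6.134.1"),  # default route
]


def split_address(address_with_prefix):
    """Split 'a.b.c.d/bits' into (network, host number within that network)."""
    iface = ipaddress.ip_interface(address_with_prefix)
    # The host part is whatever the 0 bits of the subnet mask leave over.
    host_part = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network), host_part


def next_hop(destination):
    """Return the address the packet is handed to on the local Ethernet."""
    dest = ipaddress.ip_address(destination)
    # Keep the routes whose network contains the destination, then prefer
    # the most specific one (longest prefix).
    matches = [(net, gw) for net, gw in ROUTES if dest in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, gateway = max(matches, key=lambda entry: entry[0].prefixlen)
    # No gateway means the destination is on our own subnet: send directly.
    return destination if gateway is None else gateway


print(split_address("128.6.134.22/24"))     # network and host part of the sender
print(split_address("165.230.180.144/26"))  # network and host part of the target
print(next_hop("128.6.134.2"))              # same subnet: delivered directly
print(next_hop("165.230.180.144"))          # other subnet: handed to the router
```

The only decision the sending host makes is the one in next_hop: deliver directly, or hand the packet to a router and let the routers work out the rest.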
Show with netstat -r or route

    Network Address   Netmask           Gateway Address   If    Metric
    0.0.0.0           0.0.0.0           165.230.180.1     le0   1
    127.0.0.0         255.0.0.0         127.0.0.1         lo0   1
    165.230.180.0     255.255.255.192   165.230.180.4     le0   0
    255.255.255.255   255.255.255.255   165.230.180.1     le0   1

Type of destination:
    This one host (mask 255.255.255.255)
    All hosts on network (mask 255.255.255.192)
    Default route (address 0.0.0.0)
How to send it
    Send directly on this Ethernet (metric 0)
    Send through a router (metric 1): router is “gateway address”

¦ IP: Routing
The main things you have to worry about in routing are:
    Make sure your system has the right address
    Make sure you have the correct subnet mask
    Make sure your system knows how to find a router
Options for the address: configuration or DHCP/PPP
    At RU we always configure the address, except for dialups
Options for the subnet mask: configuration or ICMP/PPP
    At RU we normally configure the subnet mask, except for dialups
Finding the router:
    Configure a default router
    Find a router using router discovery
    Use DHCP/PPP
    Use proxy ARP

¦ IP: Routing
Router Discovery
    A special protocol that allows systems to find a router automatically
    Your router must have router discovery enabled
    Solaris uses router discovery by default, modified rdisc recommended
DHCP/PPP
    Most dialup software automatically sets up the dialup as the default route
    We don’t currently support DHCP
Proxy ARP
    Use “route add default XXXX 0” where XXXX is your own IP or name
    Causes the system to think that the whole world is on your Ethernet
    Routers will respond to the ARP request with their own address
    Sort of a hack. Has only been used at Rutgers with older Suns
We encourage you to configure the default router if your system doesn’t do router discovery

¦ IP: Broadcasting
Ethernet and most other local media allow broadcasts
    Send to the Ethernet address of all ones
    All stations will accept it as for them
IP allows you to use the address 255.255.255.255 to broadcast
Normally used to find things, e.g.
    ARP sends an Ethernet broadcast to find the machine with a given IP
    Routers send broadcasts so you can find them
These days applications tend to use multicast instead
    Problem: everyone sees broadcasts. Often they aren’t interested.
    Multicast is based on a whole range of Ethernet addresses that are similar to broadcast, but have to be enabled
    IP multicast addresses run from 224.0.0.0 through 239.255.255.255
    Most routing protocols now use multicast
    The MBONE (video conferencing) uses multicast

¦ IP: Broadcasting
It is possible to broadcast on other subnets
    To send a broadcast on 128.6.4, send a packet to 128.6.4.255
    This can become a security issue (“smurf”)
    Thus we often disable this feature
Multicasting has a whole protocol to distribute it
    The routers keep track of which hosts are interested in getting specific multicasts
    Send traffic only to routers where someone wants it
    This protocol runs over the entire Internet for audio and video
        Called the “MBONE”
        Not enabled by default; you’ll need to negotiate with TD

¦ IP: testing
Single most useful test is ping
    Relies on having another system you know is up and supports ping
    Normally do several tests: hosts on the local subnet, then on other subnets
    “ping 128.6.4.4” [recommend using IP address, not name]
        128.6.4.4 is alive
        or
        no response from 128.6.4.4
If you can’t get anything from local addresses, suspect either a gross error in your machine’s TCP/IP setup or a wiring/network hardware problem. If other machines on your net are OK, probably it’s your setup or your machine’s cabling.
If you can get to machines on your subnet but not to any other address, probably your router is not working or there’s a problem in the wiring going to it.
If you can get to machines on some subnets but not others, probably the routers are having problems.

¦ IP: testing
For network delays or inconsistent results, use ping -s or -t
    ping -s 128.6.4.4
    PING 128.6.4.4: 56 data bytes
    64 bytes from ns-lcsr.rutgers.edu (128.6.4.4): icmp_seq=0. time=5. ms
    64 bytes from ns-lcsr.rutgers.edu (128.6.4.4): icmp_seq=1. time=3. ms
    ----128.6.4.4 PING Statistics----
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 2/3/5
If some sequence numbers are missing, packets are being dropped.
    Users will report this as “the network is slow”.
    It is normal to drop the first packet.
    The cause can be almost anything: bad wiring, bad hub, Runet.
    If hosts on your subnet are OK, it is probably a Runet problem.
    If some or all hosts on your subnet drop packets, it is either a badly overloaded network or wiring, hub, etc.

¦ IP: testing
Another good tool for testing connectivity is traceroute (tracert on some PCs)
    C:\USERS\HEDRICK>tracert athos
      1   130 ms   131 ms   120 ms   calloway-a.rutgers.edu [165.230.80.66]
      2     *      120 ms   121 ms   busch-gw.rutgers.edu [165.230.80.65]
      3   120 ms   121 ms   120 ms   rucs-gw.rutgers.edu [165.230.96.130]
      4   121 ms   120 ms   130 ms   lcsr-gw.rutgers.edu [165.230.212.130]
      5   130 ms   130 ms   130 ms   athos.rutgers.edu [128.6.25.4]
Look for
    Dropped packets (*), except for the first
    Unreasonable times. 2 to 5 ms is typical for internal Runet, 120 ms for dialup, as here. Outside Rutgers, several hundred ms is typical.
    Looping
Note that you can’t do much about any of this. However the output may be useful to Telecom people if they can’t duplicate the problem.

¦ IP: testing
Ifconfig (ipconfig for NT) and netstat (-i for Unix, -e for NT) can be useful for detecting problems with your network card, wiring, hubs, etc.
Ifconfig and ipconfig can be useful for checking configuration options.
Netstat has many options, which let you look at open connections, the routing table, etc.
Netstat -i or -e gives packet counts. Look specifically at output errors and collisions.
Output errors should be very small. They indicate that the system was unable to send a packet, even after retrying 16 times. This indicates a badly overloaded or broken network. A few per day are normal.
Collisions are a way to judge network load. Compare output packets with collisions.
More than 5% indicates some load; 10-20% is cause for concern. However we’ve seen Ethernets work with up to 50% collisions!

¦ TCP: Protocol
IP is “unreliable”
    Best effort at delivering packets, but can fail
    Temporary equipment failure, routing in flux, etc.
    If you can’t deliver a packet, drop it
    If you’re too badly overloaded, drop packets
TCP creates a reliable stream using IP packets
    Breaks the conversation into packets, reassembles at other end
    Assigns “sequence numbers” to each packet
    Acknowledges arrival using “acknowledge sequence number”
    If your packets haven’t been acknowledged, retransmit
    Verifies checksum, ignores the packet if bad.
TCP is good in bad networks
    Does “window management”, i.e. flow control
    If network is “slow”, probably packets are being retransmitted
    Some TCP implementations are better than others in bad conditions

¦ TCP: Protocol
     1  0.00000  geneva -> athos   TCP  D=23 S=60756 Syn Seq=00 Len=0
     2  0.00454  athos -> geneva   TCP  D=60756 S=23 Syn Ack=01 Seq=287 Len=0
     3  0.00008  geneva -> athos   TCP  D=23 S=60756 Ack=288 Seq=01 Len=0
     4  0.00189  geneva -> athos   TCP  D=23 S=60756 Ack=288 Seq=01 Len=24
     5  0.04711  athos -> geneva   TCP  D=60756 S=23 Ack=25 Seq=288 Len=0
     8  0.12893  athos -> geneva   TCP  D=60756 S=23 Ack=25 Seq=288 Len=12
     9  0.00009  geneva -> athos   TCP  D=23 S=60756 Ack=300 Seq=25 Len=0
    10  0.00068  geneva -> athos   TCP  D=23 S=60756 Ack=300 Seq=25 Len=6
    11  0.00160  athos -> geneva   TCP  D=60756 S=23 Ack=25 Seq=300 Len=18
    12  5.00023  geneva -> athos   TCP  D=23 S=60756 Ack=300 Seq=25 Len=6
    13  0.00010  athos -> geneva   TCP  D=60756 S=23 Ack=31 Seq=300 Len=18
    14  0.00009  geneva -> athos   TCP  D=23 S=60756 Ack=318 Seq=31 Len=0
    15  0.00008  geneva -> athos   TCP  D=23 S=60756 Ack=318 Seq=31 Len=9
    ...
    28  0.01424  athos -> geneva   TCP  D=60756 S=23 Fin Ack=93 Seq=409 Len=0
    29  0.00008  geneva -> athos   TCP  D=23 S=60756 Ack=410 Seq=93 Len=0
    30  0.00059  geneva -> athos   TCP  D=23 S=60756 Fin Ack=410 Seq=93 Len=0
    31  0.00221  athos -> geneva   TCP  D=60756 S=23 Ack=94 Seq=410 Len=0

¦ TCP: Protocol
TCP manages
“ports” (or “sockets”)
    Ports are 16-bit numbers. They are the way you get to a specific application
    Telnet is port 23, mail is port 25, etc.
    These are “well known ports”, documented in /etc/services
    (source IP, dest IP, source TCP port, dest TCP port) must be unique
All ports under 1024 are “privileged”
    Users can’t run their own telnetd
    Rsh, rlogin, etc., use this to avoid needing passwords
        Rsh starts by sending the user name
        Can you trust it? If it’s coming from a port number below 1024, in theory you can
        But it must be coming from a machine you trust
        A reverse lookup alone is not secure; you also need to do a forward lookup
        Even so, faking addresses is too easy; only do this for your own network

¦ TCP: Diagnosis
Normally you don’t do any explicit diagnosis or testing of TCP
The main issue is applications: is the right daemon running?
The only problems I know of with TCP have been on large Unix servers, when the kernel gets confused about memory. It may become impossible to start a TCP connection.
    “netstat -m” will show memory allocation failures under Unix
“netstat -s” [Unix and NT] will show all kinds of neat TCP stats

¦ Other Protocols
UDP: User Datagram Protocol (unreliable)
    Used for simple question and answer, where overhead for a connection is not justified: it’s easier just to ask again
    Implements ports just like TCP, so you can find your application
    Primary use is DNS
    NFS used to use it, but found that TCP was better
    The general trend is to use TCP for everything
ICMP: Internet Control Message Protocol
    Used for infrastructure messages
    Currently not secure
    Ping (ICMP echo request; ICMP echo reply)
    Routers send back: ICMP host or network unreachable
    Default router sends: ICMP redirect (use this other router instead)
    Subnet mask: ICMP subnet mask request and reply

¦ DNS: Protocol Description
The Domain Name System maps names to IP addresses, and back
A distributed database
    Central information distributed via “root name servers”
    Campus information distributed via campus servers
    Many campuses
have departmental servers.
How do we handle ftp.athena.mit.edu?
    Ask a root name server who knows about edu
    Ask the server for edu who knows about mit.edu
    Ask the MIT campus server who knows about athena.mit.edu
    Ask the Athena project server for the address for ftp.athena.mit.edu

¦ DNS: Protocol Description
Sometimes one server can handle several levels.
You actually ask each level the whole question
Root servers are listed in a configuration file
    Ask root servers: ftp.ai.mit.edu
        Response: for mit.edu, see MIT (names and IP addresses given)
    Ask MIT server: ftp.ai.mit.edu
        Response: for ai.mit.edu, see the AI servers (names and IP addresses given)
    Ask AI server: ftp.ai.mit.edu
        Response: ftp.ai.mit.edu is an alias for mini-wheats.ai.mit.edu
    We now go through the same thing for mini-wheats.
To avoid generating lots of traffic, this information is cached

¦ DNS: Protocol Description
You don’t want every PC to have to do this
So PCs normally send their queries to a local DNS server, which will handle queries for them and cache the information
So on a typical PC or workstation, all you have to do is configure a list of DNS servers for it to talk to. [2 or 3, for safety]
    The system administrator’s web page lists recommended servers
We recommend that departments run caching servers locally
    They can be configured just to point to one of the servers we maintain.
    We currently support DNS servers only for Suns. There are Track packages to help you set this up.
Security: an issue if people do access control by hostname
    Forward lookups go to the authority for the domain; probably OK
    Reverse lookups go to the authority for the IP address range, which can claim any name
    So normally do the reverse lookup and then look up the name forward to check

¦ DNS: Diagnosis
For PC/Workstation, there isn’t much to do.
    Verify that it has a correct list of servers configured
    Verify that it has the right default domain or list of domains, to allow people to type eden, rather than eden.rutgers.edu
To check servers, use “host” on Unix.
    “host athos 128.6.4.4” will ask the server 128.6.4.4 about athos.
    First check that you can get to the server using ping
Most supposed DNS problems are network problems
    Particularly problems looking up names outside Rutgers
Diagnosing problems outside Rutgers is fairly complex
    Do “host -r -v target server” several times, starting with root servers. Eventually you’ll find that some server isn’t responding

¦ ARP: protocol definition
ARP: Address Resolution Protocol
    Used for Ethernet and similar LAN technologies
Ethernet cards have 48-bit addresses. All transmissions on the Ethernet must use those addresses.
ARP lets a system go from IP address to Ethernet address:
    > ping 128.6.134.2
    geneva -> (broadcast)   ARP C Who is snagglepuss, 128.6.134.2
    snagglepuss -> geneva   ARP R 128.6.134.2 is 0:5:2:fa:dd:24
The result goes into an ARP table:
    le0   128.6.134.1   255.255.255.255   00:60:70:2f:a0:29
    le0   128.6.134.2   255.255.255.255   00:05:02:fa:dd:24
If you ping an IP address on your subnet, you should get an ARP entry
It is very uncommon for there to be a problem with ARP itself

¦ ARP: displaying
arp -a [Unix, NT]
    Net to Media Table
    Device   IP Address              Mask              Flags   Phys Addr
    ------   ---------------------   ---------------   -----   -----------------
    le0      nb-gw                   255.255.255.255           00:60:70:2f:a0:29
    le0      toolbox                 255.255.255.255           08:00:20:1a:e4:3e
    le0      farside                 255.255.255.255           00:60:70:2f:a0:29
    le0      ALL-ROUTERS.MCAST.NET   255.255.255.255           01:00:5e:00:00:02
    le0      geneva                  255.255.255.255   SP      08:00:20:7e:a4:91
    le0      128.6.134.222           255.255.255.255   U
Nb-gw is the default router. Note how farside, on a different subnet, shows with nb-gw’s Ethernet address
Toolbox is another machine on the same subnet
128.6.134.222 shows a request that hasn’t gotten a response

¦ Putting it Together
When I’m having trouble, I try to do testing in a systematic order, basically following the order in which various protocols are used.
First, I try doing all tests using IP addresses rather than hostnames. That tells immediately if the problem is DNS or something else.
    If it is DNS, the main diagnostic tool is “host”
Otherwise, try pings for both local hosts and hosts on other subnets.
    If no pings work, it’s likely to be a serious setup problem or hardware.
    If you can’t get off the local subnet, ping the router.
To check your configuration, look at the routing table with netstat -r or route, and verify that you have the right IP address, net mask, etc. Ifconfig or ipconfig and various options of netstat can help.
For network slowness or inconsistency, ping -s (or -t) and traceroute (tracert) are normally the most useful.
Snoop on a Sun is really useful. (On Linux: tcpdump) It lets you check to see whether the packets you expect are being sent and responses are arriving. This is how I debug most complex problems.

¦ Putting it Together
Use the various tools to check out each step. Suppose you want to do “telnet athos”. Here are the steps you need to check:
    DNS lookup of athos [DNS setup, are DNS servers working? Try it with IP address first. Most supposed DNS failures are network failures.]
    Look up IP address in routing table [are IP address, net mask right? Do you have a default router set up?]
    Look up next hop IP address (destination or router) in ARP table. If not there, send ARP request [failures of ARP are unusual]
    Send packet [use ping, netstat, ifconfig, etc. to check for overloaded network, router down, wiring problems, etc.]
    Is other end OK? [this requires cooperation from someone at the other end. Try the same command to a number of different hosts. If all fail, it’s probably your problem. If only some do, it may be theirs]
A lot depends upon care in trying several different tests. Ping to a variety of machines. But you need to think carefully about the results. E.g. if pings work on your side of the Raritan only, suspect RUnet.
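
The order of tests above can be written down as a small decision function. This is only a sketch in Python: the function name diagnose, its four yes/no inputs, and the wording of the results are assumptions for illustration. You still run the pings yourself (by IP address) and feed in what you saw.

```python
def diagnose(local_ok, router_ok, remote_ok, name_ok):
    """Suggest where to look next, given the results of four manual tests.

    local_ok  - ping by IP address to a host on your own subnet worked
    router_ok - ping by IP address to your default router worked
    remote_ok - ping by IP address to a host on another subnet worked
    name_ok   - the same command worked when given a hostname
    """
    if not local_ok:
        # Nothing works locally: gross setup error, or wiring/hardware.
        return "local setup or hardware"
    if not remote_ok:
        if not router_ok:
            # Stuck on the local subnet and the router does not answer.
            return "default router or wiring to it"
        # The router answers, but traffic is not getting beyond it.
        return "routers beyond the local subnet"
    if not name_ok:
        # IP addresses work everywhere, names do not.
        return "DNS"
    # The network path looks fine; check the far end or the daemon.
    return "remote host or application"


# Example: local hosts and the router answer, nothing off-subnet does.
print(diagnose(local_ok=True, router_ok=True, remote_ok=False, name_ok=False))
```

The checks run in the same order a packet depends on things working: local subnet first, then the router, then the wider network, and only then DNS.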