Lecture 3: Review of Internet Protocols
Transport Layer Protocols: HW/SW Interface
• Internetworking: allows computers on independent and incompatible networks to communicate reliably and efficiently
• Enabling technologies: SW standards that allow reliable communications without reliable networks
• Hierarchy of SW layers, giving each layer responsibility for a portion of the overall communications task, called protocol families or protocol suites
• Transmission Control Protocol/Internet Protocol (TCP/IP)
  - This protocol family is the basis of the Internet
  - IP makes best effort to deliver; TCP guarantees delivery
  - TCP/IP used even when communicating locally: NFS uses IP even though communicating across homogeneous LAN

Services provided by layers
• Each layer in protocol stack provides a “service”
  - Uses service from lower layers
• Benefits of layering
  - Isolates complexity
  - Clearly defined interfaces
• Protocols implement functionality within layer
• Layered protocol stack:

  Layer               Service provided
  Application layer   Application-specific communication
  Transport layer     Process-to-process communication
  Network layer       Connectivity between network interfaces
  Link layer          Point-to-point frame transmission
  Physical layer      Transmission of bits in medium

Layered Network Architecture (OSI)
[Figure: hosts A and B connected by a network. DATA passes down the seven OSI layers (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical) on one host, gaining a header per layer (AH, PH, SH, TH, NH, DH), and back up the same layers on the other host.]

TCP/IP Model

  OSI layer         TCP/IP
  7 Application     Application
  6 Presentation    (not present)
  5 Session         (not present)
  4 Transport       TCP
  3 Network         IP
  2 Data Link       Host-to-Net
  1 Physical        Host-to-Net

• ISO OSI (Open Systems Interconnection) not fully implemented
• Presentation and Session layers not present in TCP/IP

Protocols
• Protocols define communication between entities
  - Format and order of messages
  - Actions taken on transmission and/or receipt of message or other event
• Protocols use headers (and trailers) for control information
• Naming depends on layer (H = header, T = trailer):

  Layer               Unit       Layout
  Application layer   Message    Data
  Transport layer     Segment    H Data
  Network layer       Datagram   H H Data
  Link layer          Frame      H H H Data T
  Physical layer      Bit

Process-to-process communication
• We have a network. How to get between programs?
[Figure: programs on end systems attached to a network.]

Process Communication
• How do the end systems communicate?
[Figure: two end systems connected through routers. Each end system runs the application, transport, network, link and physical layers; routers run the network, link and physical layers. Data gains one header per layer on the way down (plus a link-layer trailer) and loses it on the way up.]

Interface-to-interface connectivity
• We now have links. How to get across the network?

Datagrams
• Datagrams are forwarded independently
[Figure: network of nodes A to F. A forwarding table maps destination to output port, e.g. A: port 3, B: port 1, ...]

TCP/IP packet
• Application sends message
• TCP breaks into 64KB segments, adds 20B header
• IP adds 20B header, sends to network
• If Ethernet, broken into 1500B packets with headers, trailers
• Header, trailers have length field, destination, window number, version, ...
[Figure: Ethernet frame carrying IP header and IP data; the IP data carries the TCP header and TCP data (≤ 64KB).]

Communicating with the Server: The O/S Wall
[Figure: user space, kernel and CPU connected over the PCI bus to the NICs.]
Problems:
• O/S overhead to move a packet between network and application level => Protocol Stack (TCP/IP)
• O/S interrupt
• Data copying from kernel space to user space and vice versa
• Oh, the PCI Bottleneck!
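
Returning to the header sizes on the TCP/IP packet slide, the per-packet overhead can be estimated with a small sketch. The 20 B TCP and IP headers and the 1500 B Ethernet limit come from the slide; the 18 B of Ethernet framing (14 B header plus 4 B CRC), the function name `packetize`, and the simplification that every Ethernet packet carries its own option-free TCP and IP header are assumptions made here for illustration.

```python
import math

TCP_HEADER = 20    # bytes, from the slide (no options assumed)
IP_HEADER = 20     # bytes, from the slide (no options assumed)
ETH_PAYLOAD = 1500 # bytes of IP packet per Ethernet frame, from the slide
ETH_FRAMING = 18   # assumption: 14 B Ethernet header + 4 B CRC trailer

def packetize(message_bytes):
    """Return (packet_count, bytes_on_wire) for one application message.

    Simplification: each Ethernet packet carries its own TCP and IP header.
    """
    max_data = ETH_PAYLOAD - IP_HEADER - TCP_HEADER  # application bytes per packet
    packets = math.ceil(message_bytes / max_data)
    wire = message_bytes + packets * (TCP_HEADER + IP_HEADER + ETH_FRAMING)
    return packets, wire

packets, wire = packetize(64 * 1024)
print(f"{packets} packets, {wire} bytes on the wire, "
      f"efficiency {64 * 1024 / wire:.1%}")
```

Under these assumptions a 64 KB message needs 45 packets and roughly 96% of the transmitted bytes are application data; the CPU cost of producing those 45 packets is what the O/S Wall slide is concerned with.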
The Send/Receive Operation
• The application writes the transmit data to the TCP/IP sockets interface for transmission, in payload sizes ranging from 4 KB to 64 KB.
• The data is copied from user space to kernel space.
• The OS segments the data into maximum transmission unit (MTU)-size packets, and then adds TCP/IP header information to each packet.
• The OS copies the data onto the network interface card (NIC) send queue.
• The NIC performs the direct memory access (DMA) transfer of each data packet from the TCP buffer space to the NIC, and interrupts CPU activities to indicate completion of the transfer.

Transmitting data across the memory bus using a standard NIC
http://www.dell.com/downloads/global/power/1q04-her.pdf

TCP/IP Processing Path (RX)

Network I/O Processing
• TCP requirements, rule of thumb: 1GHz for 1Gbps
• Network bandwidth outpaces Moore’s Law
[Figure: log-scale plot of GHz and Gbps (.01 to 1000) against time (1990 to 2010), comparing network bandwidth with Moore’s Law.]

I/O Acceleration Techniques
• Architectural Improvement
• TCP Offload: offload TCP/IP checksum and segmentation to interface hardware or a programmable device (e.g., TOEs)
• O/S Bypass: user-level software techniques to bypass the protocol stack
  - Zero Copy Protocol (needs a programmable device in the NIC for direct user-level memory access, i.e. virtual-to-physical memory mapping; e.g., VIA)
• All the high bandwidth NICs today employ some kind of TCP Offload and O/S Bypass techniques

Multiplexing/de-multiplexing
• Multiple processes operate on one computer
  - Interface address alone is not sufficient to distinguish them
  - Need to (de)multiplex traffic from different processes
• 5-tuple used for unique identification of connection
  - IP source address
  - IP destination address
  - Transport layer source port
  - Transport layer destination port
  - Transport layer protocol

TCP/IP packet (TCP header)
Frame layout: MAC | IP | TCP | APP. DATA | MAC
TCP header fields:
• Source Port, Destination Port
• Sequence Number
• Acknowledgement Number
• Header Length and Flags, Window Size
• Checksum, Urgent Pointer
• Options (0 or more 32-bit words)

TCP/IP packet (IP header)
Frame layout: MAC | IP | TCP | APP. DATA | MAC
IP header fields:
• Ver., IHL, Service Type, Total Length
• Identification, Flags and Fragment Offset
• Time to Live, Protocol, Header Checksum
• Source Address
• Destination Address
• Options (0 or more 32-bit words)

TCP/IP packet (MAC frame)
Frame layout: Preamble | D. Add. | S. Add. | Leng. | IP | TCP | APP. DATA | CRC

TCP Header: 5-tuple example
• 5-tuple is reversed for return communication
• Destination port is associated with application layer protocol (e.g., 80 for HTTP)
• Operating system picks source port randomly

  Field              Sender (client) to receiver (server)   Return direction
  source IP          128.119.91.53                          74.125.39.99
  destination IP     74.125.39.99                           128.119.91.53
  source port        35466                                  80
  destination port   80                                     35466
  protocol           TCP                                    TCP

TCP header
Fields, in 32-bit rows (bits 31 to 0):
• Source port number, Destination port number
• Sequence number
• Acknowledgement number
• Data offset, Reserved, flags (URG, ACK, PSH, RST, SYN, FIN), Window
• Checksum, Urgent pointer
• Options
• Data
Annotations:
• Port numbers: source and destination port
• Sequence number: position of data
• ACK number: next expected data
• Flags: used for connection setup and teardown
• Checksum

What functionality does TCP provide?
• Reliability
  - Recovery from errors in the network layer
• Flow control
  - Limit transmission rate to not overwhelm receiver
• Congestion control
  - Limit transmission rate to not overwhelm network

Reliable data transfer
• How can reliability be achieved?
  - Consider different assumptions for network layer
• Case 1: completely reliable network layer
  - Send segment
• Case 2: bit errors in network layer
  - Add error detection and ACK/NAK
  - Add sequence number to handle garbled ACK/NAK
• Case 3: bit errors and packet loss in network layer
  - Timer to trigger retransmission
  - “Stop-and-wait” protocol

Reliable data transfer
• Stop-and-wait has low performance
• How can we increase throughput?
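
To put a number on the low performance of stop-and-wait, here is a minimal sketch of sender utilization. It assumes an error-free channel and ignores the transmission time of acknowledgements; the function name `utilization` and the example values (1500 B segments, 1 Gbps link, 10 ms round-trip time) are illustrative assumptions, not figures from the slides.

```python
def utilization(segment_bytes, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy with `window` segments in flight.

    window=1 corresponds to stop-and-wait: send one segment, then idle
    until its acknowledgement returns one round-trip time later.
    """
    t_tx = segment_bytes * 8 / link_bps  # time to put one segment on the link
    return min(1.0, window * t_tx / (rtt_s + t_tx))

u1 = utilization(1500, 1e9, 0.010)
u100 = utilization(1500, 1e9, 0.010, window=100)
print(f"stop-and-wait:        {u1:.3%}")
print(f"100 segments in flight: {u100:.1%}")
```

With these assumed values the sender is busy for only about 0.12% of the time under stop-and-wait, and utilization grows in proportion to the number of segments allowed in flight until the link is saturated.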
• Sliding window
  - Allow multiple segments “in-flight”

Sliding window example
[Figure: timeline of a sender and a receiver, each with buffer positions 1 to 10. Sender states: sent and acknowledged; sent, not yet acknowledged; ready, not yet sent. Receiver states: received, acknowledged, sent to application; received, acknowledged, not yet sent to application; received, not yet acknowledged; free buffer. The sender transmits segments with sequence numbers S=1 to S=9, the receiver returns acknowledgement numbers including A=2, A=4, A=6 and A=10, and S=6 and S=7 are sent a second time after their timers.]

UDP: bare bones protocol
• Ports, length, checksum
• Header fields, in 32-bit rows (bits 31 to 0):
  - Source port number, Destination port number
  - Length, Checksum
• Checksum is optional (in IPv4)

Internet Protocol
IP header fields, in 32-bit rows (bits 31 to 0):
• Version, Header length, Type of service, Datagram length
• Identifier, Flags, Fragment offset
• Time to live, Upper layer protocol, Header checksum
• Source address
• Destination address
• Options
• Data
Annotations:
• Source and destination address
• Datagram length
• Upper layer protocol: identifies TCP, UDP, etc.
• Time to live: protection against accidental loops
• Header checksum: protection against bit errors
• Fragmentation possible: the link layer limits the datagram size (every IPv4 host must accept datagrams of at least 576 bytes)

Other IP aspects
• Routing
  - Determines forwarding
• ICMP
  - Error handling
• Link layer
  - Address resolution (ARP)
  - Dynamic IP addresses (DHCP)
• Application layer
  - Domain names (DNS)
• Transport layer
  - Network address translation (NAT)
• New IP version: IPv6
[Figure: protocol stack. Application layer: Domain Name System (DNS). Transport layer. Network layer: routing protocols (OSPF, RIP, BGP), Forwarding Information Base (FIB), Internet Protocol (IP), Internet Control Message Protocol (ICMP), Address Resolution Protocol (ARP). Link layer. Physical layer.]

Network systems
• Data is switched between network stacks
[Figure: a network system containing several protocol stacks (physical, data link, network and transport layers), each attached to its own link.]

Classification of network systems
• Network systems differ by level of protocol processing
[Figure: protocol stacks by highest layer processed: physical-layer network system (physical only), bridge (link layer: DLC with MAC and LLC), router (network layer), gateway (transport layer and above).]

Example network system: NIC
• Network interface card / adapter connects to link
• Block diagram: adapter containing physical and MAC interfaces to the link, memory, DMA, microprocessor, and the end system bus interface

Example network system: switch
• Switch connects multiple links
• Block diagram: adapters 1 to N, each with processor, memory, DMA and MAC/PHY attached to a link, connected through an interconnection to the switch processor and switch memory

Example network system: switch
• Systems may differ by system architecture
• Example: shared memory vs. distributed memory
[Figure: adapters 1 to N, each with processor, DMA and MAC/PHY attached to a link, sharing a single switch memory over the interconnection.]

Application requirements
Different applications have different requirements:

  Application                 Throughput   Delay     Jitter           Packet loss
  Internet browsing           Low          Large     Insensitive      Unacceptable
  Scientific data archiving   High         Medium    Indifferent      Unacceptable
  Telephony                   Low          Minimal   Sensitive        Low
  Internet TV                 High         Minimal   Sensitive        Low
  First-person shooter game   Low          Minimal   Very sensitive   Unacceptable
  Real-time surgery           High         Minimal   Very sensitive   Unacceptable
  Delay-tolerant networking   Low          Large     Insensitive      Acceptable

Throughput preservation
• Throughput performance
  - Ensure network system can handle link rates at all points
• Delay/jitter
  - Ensure network system processes traffic quickly
• Packet loss
  - Ensure sufficient buffer space and fast processing
• Most network system design focuses on bandwidth

Packet rate vs. data rate
• Data rate states total number of bits per second
• Each packet requires specific processing
  - Packet rate sometimes more meaningful
• What is the packet rate for a 10Gbps link?
  - Distinguish small packets and large packets

System design for throughput
• What are the differences between these systems?
• How do they affect throughput preservation?
[Figure: two bus-based systems, each with processor, memory, bus and link adapter; one of them also has a DMA unit.]
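
As a rough answer to the packet-rate question for a 10Gbps link, the following sketch compares small and large packets. It assumes Ethernet framing, which the slide does not specify: 64 B minimum and 1518 B maximum frames, plus 20 B of per-frame overhead on the wire (8 B preamble and 12 B inter-frame gap). The function name `packet_rate` is illustrative.

```python
def packet_rate(link_bps, frame_bytes, overhead_bytes=20):
    """Packets per second on a saturated link.

    overhead_bytes is an assumption: Ethernet preamble (8 B) plus
    inter-frame gap (12 B), which occupy the link but carry no frame data.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

LINK = 10e9  # 10 Gbps

small = packet_rate(LINK, 64)    # minimum-size Ethernet frames
large = packet_rate(LINK, 1518)  # maximum-size Ethernet frames

print(f"small packets: {small / 1e6:.2f} Mpps, {1e9 / small:.1f} ns per packet")
print(f"large packets: {large / 1e6:.2f} Mpps, {1e9 / large:.1f} ns per packet")
```

Under these assumptions the same 10Gbps data rate corresponds to about 14.88 million small packets per second but only about 0.81 million large packets per second, so a system that does fixed work per packet has roughly 67 ns per packet in the worst case.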