Networked Applications: Sockets
Assembled by Ossi Mokryn, based on slides by Jennifer Rexford, Princeton, and on data from beej’s guide: http://beej.us/guide/bgnet

Socket programming
Goal: learn how to build client/server applications that communicate using sockets
- Socket API
  - introduced in BSD4.1 UNIX, 1981
  - explicitly created, used, released by apps
  - client/server paradigm
  - two types of transport service via socket API:
    - unreliable datagram
    - reliable, byte stream-oriented
- socket: a host-local, application-created, OS-controlled interface (a “door”) into which an application process can both send and receive messages to/from another application process

Clients and Servers
- Client program
  - Running on end host
  - Requests service
  - E.g., Web browser
- Server program
  - Running on end host
  - Provides service
  - E.g., Web server
- Example exchange: the client sends GET /index.html; the server replies “Site under construction”

Client-Server Communication
- Client is “sometimes on”
  - Initiates a request to the server when interested
  - E.g., Web browser on your laptop or cell phone
  - Doesn’t communicate directly with other clients
  - Needs to know the server’s address
- Server is “always on”
  - Services requests from many client hosts
  - E.g., Web server for the www.cnn.com Web site
  - Doesn’t initiate contact with the clients
  - Needs a fixed, well-known address

Client and Server Processes
- Program vs. process
  - Program: collection of code
  - Process: a running program on a host
- Communication between processes
  - Same end host: inter-process communication, governed by the operating system on the end host
  - Different end hosts: exchanging messages, governed by the network protocols
- Client and server processes
  - Client process: process that initiates communication
  - Server process: process that waits to be contacted

Socket: End Point of Communication
- Sending message from one process to another
  - Message must traverse the underlying network
- Process sends and receives through a “socket”
  - In essence, the doorway leading in/out of the house
- Socket as an Application Programming Interface
  - Supports the creation of network applications
[Figure: two user processes, each with its own socket, each running on top of its operating system]

Identifying the Receiving Process
- Sending process must identify the receiver
  - Name or address of the receiving end host
  - Identifier that specifies the receiving process
- Receiving host
  - Destination address that uniquely identifies the host
  - An IPv4 address is a 32-bit quantity
- Receiving process
  - Host may be running many different processes
  - Destination port that uniquely identifies the socket
  - A port number is a 16-bit quantity

Using Ports to Identify Services
[Figure: server host 128.2.194.242 runs a Web server (port 80) and an echo server (port 7). A client's service request for 128.2.194.242:80 is handed by the server's OS to the Web server; a service request for 128.2.194.242:7 is handed to the echo server.]

Knowing What Port Number To Use
- Popular applications have well-known ports
  - E.g., port 80 for Web and port 25 for e-mail
  - Well-known ports listed at http://www.iana.org
- Well-known vs. ephemeral ports
  - Server has a well-known port (e.g., port 80), between 0 and 1023
  - Client picks an unused ephemeral (i.e., temporary) port, between 1024 and 65535
- Uniquely identifying the traffic between the hosts
  - Two IP addresses and two port numbers
  - Underlying transport protocol (e.g., TCP or UDP)

Delivering the Data: Division of Labor
- Network
  - Deliver data packet to the destination host
  - Based on the destination IP address
- Operating system
  - Deliver data to the destination socket
  - Based on the protocol and destination port #
- Application
  - Read data from the socket
  - Interpret the data (e.g., render a Web page)

Windows Socket API
- Socket interface
  - Originally provided in Berkeley UNIX
  - Later adopted by all popular operating systems
  - Simplifies porting applications to different OSes
- In UNIX, everything is like a file
  - All input is like reading a file
  - All output is like writing a file
  - File is represented by an integer file descriptor
- System calls for sockets
  - Client: create, connect, write, read, close
  - Server: create, bind, listen, accept, read, write, close
- Winsock Programmer's FAQ: http://tangentsoft.net/wskfaq/newbie.html#interop

Typical Client Program
- Prepare to communicate
  - Create a socket
  - Determine server address and port number
  - Initiate the connection to the server
- Exchange data with the server
  - Write data to the socket
  - Read data from the socket
  - Do stuff with the data (e.g., render a Web page)
- Close the socket

Socket programming with UDP
- UDP: no “connection” between client and server
  - no handshaking
  - sender explicitly attaches IP address and port of destination to each packet
  - server must extract IP address, port of sender from received packet
- UDP: transmitted data may be received out of order, or lost
- Application viewpoint: UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server

Client/server socket interaction: UDP
- Server (running on hostid)
  - create socket, port=x, for incoming request: serverSocket = DatagramSocket()
  - read request from serverSocket
  - write reply to serverSocket, specifying client host address, port number
- Client
  - create socket: clientSocket = DatagramSocket()
  - create address (hostid, port=x), send datagram request using clientSocket
  - read reply from clientSocket
  - close clientSocket

Creating a Socket: socket()
- Operation to create a socket
    SOCKET WSAAPI socket(int af, int type, int protocol);
- af: an address family specification
  - The only format currently supported is PF_INET, which is the ARPA Internet address format.
- type: semantics of the communication
  - SOCK_STREAM: reliable byte stream
  - SOCK_DGRAM: message-oriented service
- protocol: specific protocol
  - 0: unspecified (PF_INET and SOCK_STREAM already imply TCP; PF_INET and SOCK_DGRAM imply UDP)

Establishing the server’s name and port
    struct sockaddr_in server;
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = inet_addr(IPstring);
    server.sin_port = htons(ServerPort);

Sending Data
    int WSAAPI sendto(SOCKET s, const char FAR * buf, int len, int flags, const struct sockaddr FAR * to, int tolen);
- s: A descriptor identifying a (possibly connected) socket.
- buf: A buffer containing the data to be transmitted.
- len: The length of the data in buf.
- flags: Specifies the way in which the call is made.
- to: An optional pointer to the address of the target socket.
- tolen: The size of the address in to.

Receiving Data
    int WSAAPI recvfrom(SOCKET s, char FAR * buf, int len, int flags, struct sockaddr FAR * from, int FAR * fromlen);
- s: A descriptor identifying a bound socket.
- buf: A buffer for the incoming data.
- len: The length of buf.
- flags: Specifies the way in which the call is made.
- from: An optional pointer to a buffer which will hold the source address upon return.
- fromlen: An optional pointer to the size of the from buffer.
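The socket(), sendto() and recvfrom() calls described above can be sketched end to end as a UDP echo over the loopback interface. This is only a minimal sketch under stated assumptions: it uses the POSIX (Berkeley) socket headers rather than Winsock, so descriptors are plain int values and no WSAStartup() call appears; udp_echo_round_trip is a hypothetical helper name; and both endpoints live in one process purely to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg from a client socket to a server socket
 * on 127.0.0.1, lets the server echo it back, and stores the echoed text
 * in reply. Returns the reply length in bytes, or -1 on a socket error. */
int udp_echo_round_trip(const char *msg, char *reply, size_t reply_cap)
{
    int server = -1, client = -1, result = -1;
    struct sockaddr_in srv_addr, cli_addr;
    socklen_t srv_len = sizeof srv_addr, cli_len = sizeof cli_addr;
    char buf[512];
    ssize_t n, m;

    /* SOCK_DGRAM with protocol 0 implies UDP. */
    server = socket(AF_INET, SOCK_DGRAM, 0);
    client = socket(AF_INET, SOCK_DGRAM, 0);
    if (server < 0 || client < 0)
        goto done;

    /* Server: bind to the loopback address; port 0 lets the OS pick one. */
    memset(&srv_addr, 0, sizeof srv_addr);
    srv_addr.sin_family = AF_INET;
    srv_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    srv_addr.sin_port = htons(0);
    if (bind(server, (struct sockaddr *)&srv_addr, sizeof srv_addr) < 0)
        goto done;
    /* Ask which port was assigned (already in network byte order). */
    if (getsockname(server, (struct sockaddr *)&srv_addr, &srv_len) < 0)
        goto done;

    /* Client: the destination address travels with every datagram. */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&srv_addr, sizeof srv_addr) < 0)
        goto done;

    /* Server: recvfrom() fills in the sender's address and port. */
    n = recvfrom(server, buf, sizeof buf, 0,
                 (struct sockaddr *)&cli_addr, &cli_len);
    if (n < 0)
        goto done;

    /* Server: reply to the address extracted from the request. */
    if (sendto(server, buf, (size_t)n, 0,
               (struct sockaddr *)&cli_addr, cli_len) < 0)
        goto done;

    /* Client: read the reply; the source address is not needed here. */
    m = recvfrom(client, reply, reply_cap - 1, 0, NULL, NULL);
    if (m < 0)
        goto done;
    reply[m] = '\0';
    result = (int)m;

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    return result;
}
```

Calling udp_echo_round_trip("hello", reply, sizeof reply) from a small main is enough to try it. Note that the client never calls bind(): the first sendto() on an unbound UDP socket makes the OS assign it an ephemeral port, which is the port the server sees in cli_addr.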

Byte Ordering: Little and Big Endian
- Hosts differ in how they store data
  - E.g., four-byte number (byte3, byte2, byte1, byte0)
- Little endian (“little end comes first”): Intel PCs!!!
  - Low-order byte stored at the lowest memory location
  - Byte0, byte1, byte2, byte3
- Big endian (“big end comes first”)
  - High-order byte stored at lowest memory location
  - Byte3, byte2, byte1, byte0
- IP is big endian (aka “network byte order”)
  - Use htons() and htonl() to convert to network byte order
  - Use ntohs() and ntohl() to convert to host order

Why Can’t Sockets Hide These Details?
- Dealing with endian differences is tedious
  - Couldn’t the socket implementation deal with this … by swapping the bytes as needed?
- No, swapping depends on the data type
  - Two-byte short int: (byte 1, byte 0) vs. (byte 0, byte 1)
  - Four-byte long int: (byte 3, byte 2, byte 1, byte 0) vs. (byte 0, byte 1, byte 2, byte 3)
  - String of one-byte characters: (char 0, char 1, char 2, …) in both cases
- Socket layer doesn’t know the data types
  - Sees the data as simply a buffer pointer and a length
  - Doesn’t have enough information to do the swapping

Servers Differ From Clients
- Passive open
  - Prepare to accept connections
  - … but don’t actually establish one
  - … until hearing from a client
- Hearing from multiple clients
  - Allow a backlog of waiting clients
  - ... in case several try to start a connection at once
- Create a socket for each client
  - Upon accepting a new client
  - … create a new socket for the communication

Typical Server Program
- Prepare to communicate
  - Create a socket
  - Associate local address and port with the socket
- Wait to hear from a client (passive open)
  - In TCP: Indicate how many clients-in-waiting to permit
  - Accept an incoming connection from a client
- Exchange data with the client over new socket
  - Receive data from the socket
  - Do stuff to handle the request (e.g., get a file)
  - Send data to the socket
  - Close the socket
- Repeat with the next connection request

Server Preparing its Socket
- Bind socket to the local address and port number (associate a local address with a socket)
    #include <winsock2.h>
    int WSAAPI bind(SOCKET s, const struct sockaddr* name, int namelen);
- Arguments: socket descriptor, server address, address length
  - s: A descriptor identifying an unbound socket.
  - name: The address to assign to the socket.
  - namelen: The length of the name.
- Often, you bind your listening socket to the special IP address INADDR_ANY. This allows your program to work without knowing the IP address of the machine it is running on, or, in the case of a machine with multiple network interfaces, it allows your server to receive packets destined to any of the interfaces. When sending, a socket bound with INADDR_ANY binds to the default IP address, which is that of the lowest-numbered interface.
- Returns 0 on success, and -1 (SOCKET_ERROR in Winsock) if an error occurs

Putting it All Together
- Server: socket(), bind(), listen(), accept() (blocks until a client connects)
- Client: socket(), connect()
- Client write() -> server read()
- Server processes the request
- Server write() -> client read()

Serving One Request at a Time?
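As a baseline for this question, the socket(), bind(), listen(), accept(), connect(), read(), write() sequence can be sketched as a server that handles exactly one request and then stops. This is a minimal sketch under stated assumptions: POSIX sockets rather than Winsock, a hypothetical helper name tcp_serial_round_trip, a made-up "request processing" step (upper-casing the text), and client and server placed in one process only so the example is self-contained. The client's connect() completes before accept() is called because the kernel queues the connection in the listen() backlog.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one TCP request/response exchange on 127.0.0.1.
 * The server upper-cases the request and sends it back.
 * Returns the reply length in bytes, or -1 on error. */
int tcp_serial_round_trip(const char *request, char *reply, size_t reply_cap)
{
    int listener = -1, client = -1, conn = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    char buf[256];
    ssize_t n, m, i;

    /* Server: socket(), bind(), listen(). Port 0 lets the OS choose. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    addr.sin_port = htons(0);
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 5) < 0 ||          /* backlog of 5 waiting clients */
        getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0)
        goto done;

    /* Client: socket(), connect(), write(). */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (write(client, request, strlen(request)) < 0)
        goto done;

    /* Server: accept() returns a NEW socket for this one client. */
    conn = accept(listener, NULL, NULL);
    if (conn < 0)
        goto done;

    /* Server: read(), process, write(). A single read() is assumed to
     * return the whole short request; real code would loop. */
    n = read(conn, buf, sizeof buf);
    if (n <= 0)
        goto done;
    for (i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    if (write(conn, buf, (size_t)n) != n)
        goto done;

    /* Client: read() the response. */
    m = read(client, reply, reply_cap - 1);
    if (m < 0)
        goto done;
    reply[m] = '\0';
    result = (int)m;

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return result;
}
```

A real server would wrap the accept()/read()/write()/close() part in a loop. While it is busy inside that loop body for one client, every other client sits in the backlog, which is the serialization problem discussed here.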
- Serializing requests is inefficient
  - Server can process just one request at a time
  - All other clients must wait until previous one is done
- Need to time share the server machine
  - Alternate between servicing different requests
    - Do a little work on one request, then switch to another
    - Small tasks, like reading HTTP request, locating the associated file, reading the disk, transmitting parts of the response, etc.
  - Or, start a new process to handle each request
    - Allow the operating system to share the CPU across processes
  - Or, some hybrid of these two approaches

TCP info for later in the course

Socket-programming using TCP
- Socket: a door between application process and end-to-end transport protocol (UDP or TCP)
- TCP service: reliable transfer of bytes from one process to another
[Figure: on each host or server, a process (controlled by application developer) talks through a socket to TCP with buffers and variables (controlled by operating system); the two hosts are connected through the internet]

Blocking and non blocking
- "block" is techie jargon for "sleep"
- Lots of functions block. accept() blocks. All the recv() functions block.
- Why?
- Because that is the default: when you first create the socket with socket(), the kernel sets it to blocking.
- Can we set the socket to be non-blocking? What does it mean?
  - Yes. It means you have to poll it for data (check on it).
  - If there’s no data yet, the call returns -1 (an error) and you have to try again.
- Isn’t that CPU intensive?
  - Yes! So, you can use select()…

Reading or writing to multiple sockets
- A server wants to listen for incoming connections as well as keep reading from the connections it has.
- select(): a way to monitor several sockets at the same time and learn their state.
- select(): Synchronous I/O Multiplexing
  - Manipulates sets of sockets and lets you know:
    - which ones are ready for reading
    - which are ready for writing
- How?
  - By creating and handling sets of sockets, and obtaining additional information for each.

Manipulating Sets for the select() func.
- FD_SET(int fd, fd_set *set); Add fd to the set.
- FD_CLR(int fd, fd_set *set); Remove fd from the set.
- FD_ISSET(int fd, fd_set *set); Return true if fd is in the set.
- FD_ZERO(fd_set *set); Clear all entries from the set.
- Note: in Windows, fd is of type SOCKET

Additional select() info
- What happens if a socket in the read set closes the connection?
  - In that case, select() returns with that socket descriptor set as "ready to read".
  - When you actually do recv() from it, recv() will return 0. That's how you know the client has closed the connection.
- One more note of interest about select(): if you have a socket that is listen()ing, you can check whether there is a new connection by putting that socket's file descriptor in the readfds set.

Wanna See Real Clients and Servers?
- Apache Web server
  - Open source server first released in 1995
  - Name derives from “a patchy server” ;-)
  - Software available online at http://www.apache.org
- Mozilla Web browser
  - http://www.mozilla.org/developer/
- Sendmail
  - http://www.sendmail.org/
- BIND Domain Name System
  - Client resolver and DNS server
  - http://www.isc.org/index.pl?/sw/bind/
- …

A final note: what is Peer-to-Peer Communication?
- No always-on server at the center of it all
  - Hosts can come and go, and change addresses
  - Hosts may have a different address each time
- Example: peer-to-peer file sharing
  - Any host can request files, send files, query to find where a file is located, respond to queries, and forward queries
- Scalability by harnessing millions of peers
  - Each peer acting as both a client and server
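
A peer that acts as both client and server has to watch several sockets at once, which is the situation select() was introduced for earlier. The following is a minimal sketch under stated assumptions: POSIX rather than Winsock, hypothetical helper names wait_readable and select_demo, and socketpair() standing in for real network connections so the example runs without a network. It builds a read set with FD_ZERO/FD_SET, tests it with FD_ISSET, and shows that a closed peer makes the socket "ready to read" with recv() returning 0.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for either descriptor to
 * become readable. Returns 0 if fd_a is readable, 1 if only fd_b is,
 * and -1 on timeout or error. */
int wait_readable(int fd_a, int fd_b, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

    FD_ZERO(&readfds);              /* start from an empty set */
    FD_SET(fd_a, &readfds);         /* add both descriptors to it */
    FD_SET(fd_b, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor plus one. */
    if (select(maxfd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return -1;
    return FD_ISSET(fd_a, &readfds) ? 0 : 1;
}

/* Walks through three cases; returns 0 if each behaves as described. */
int select_demo(void)
{
    int a[2] = { -1, -1 }, b[2] = { -1, -1 }, ok = -1, i;
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0)
        goto done;

    /* Case 1: nothing has been sent, so select() times out. */
    if (wait_readable(a[0], b[0], 50) != -1)
        goto done;

    /* Case 2: data arrives on b only, so only b[0] is readable. */
    if (write(b[1], "x", 1) != 1)
        goto done;
    if (wait_readable(a[0], b[0], 1000) != 1)
        goto done;

    /* Case 3: the peer of a closes; a[0] becomes "ready to read"
     * and recv() returns 0, signalling the closed connection. */
    close(a[1]);
    a[1] = -1;
    if (wait_readable(a[0], b[0], 1000) != 0)
        goto done;
    if (recv(a[0], &c, 1, 0) != 0)
        goto done;
    ok = 0;

done:
    for (i = 0; i < 2; i++) {
        if (a[i] >= 0) close(a[i]);
        if (b[i] >= 0) close(b[i]);
    }
    return ok;
}
```

Because select() sleeps until something happens or the timeout expires, the process avoids the busy polling that a non-blocking socket would otherwise require. A listening socket can be added to the same read set to learn when accept() will not block.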