– TCP/UDP and Sockets introduction
– Elementary TCP sockets
– Source: Chapter 4 of Stevens’ book
– TCP Client-Server example
– Source: Chapter 5 of Stevens’ book
– I/O multiplexing
– Source: Chapter 6 of Stevens’ book
TELE 402 Lecture 3: Elementary … 1
• sys/socket.h should be included for the socket API
• int socket(int family , int type , int protocol )
– Returns: nonnegative descriptor if OK, -1 on error
• Family
– AF_INET (IPv4), AF_INET6 (IPv6), AF_LOCAL (Unix domain protocols), AF_ROUTE (routing sockets), and more
• Type
– SOCK_STREAM (stream socket), SOCK_DGRAM (datagram socket), SOCK_RAW (raw socket), and more (e.g. SOCK_PACKET)
• Protocol is normally 0 except for raw sockets
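The socket call above can be sketched as follows. This is a minimal illustration, not code from the lecture: the helper names open_tcp_socket and open_udp_socket are hypothetical, and IPv4 is assumed.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: create an IPv4 stream (TCP) socket.
 * Protocol 0 lets the kernel pick the default protocol for the
 * family/type pair. Returns the descriptor, or -1 on error. */
int open_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        perror("socket");
    return fd;
}

/* Hypothetical helper: the datagram (UDP) counterpart. */
int open_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        perror("socket");
    return fd;
}
```

Both helpers return a nonnegative descriptor on success, matching the return convention listed above.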
• AF_xxx vs PF_xxx
– AF_xxx stands for address family
– PF_xxx stands for protocol family
– They have the same set of values at the moment
• int connect(int sockfd , const struct sockaddr * servaddr , socklen_t addrlen )
– Returns: 0 if OK, -1 on error
– The socket address structure must contain the IP address and port number of the server
• The client does not have to call bind before calling connect. The kernel will choose both an ephemeral port and the source IP address when necessary
• For a TCP socket, connect initiates TCP’s three-way handshake. The function returns only when the connection is established or an error occurs
• Can also be used on UDP sockets!
• Possible errors when connect is called on a TCP socket
– If there is no response to the SYN segment, ETIMEDOUT is returned
– If the response is RST, this indicates that no process is waiting for connections on the server host at the port we specified. In this case ECONNREFUSED is returned immediately
• RST is generated in the following conditions: when a SYN arrives for a port that has no listening server, when TCP wants to abort an existing connection, and when TCP receives a segment for a connection that does not exist.
– If the SYN segment elicits an ICMP destination unreachable message, the kernel saves the message and keeps retransmitting the SYN until it gives up. On failure, the saved ICMP error is returned as EHOSTUNREACH or ENETUNREACH.
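The connect error cases can be sketched as follows. This is an illustration under stated assumptions: the helper names connect_loopback and connect_error_cause are hypothetical, and the server is assumed to be on the loopback address 127.0.0.1.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a new TCP socket to 127.0.0.1:port.
 * Returns the connected descriptor, or -1 with errno set by connect. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in servaddr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    servaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        int saved = errno;          /* close may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: map a connect errno to the cause listed above. */
const char *connect_error_cause(int err)
{
    switch (err) {
    case ETIMEDOUT:    return "no response to SYN";
    case ECONNREFUSED: return "RST received: no listener on that port";
    case EHOSTUNREACH:
    case ENETUNREACH:  return "ICMP destination unreachable";
    default:           return "other error";
    }
}
```

A caller would typically test the return value and then pass errno to connect_error_cause to report why the handshake failed.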
• int bind(int sockfd , const struct sockaddr * myaddr , socklen_t addrlen )
– Returns: 0 if OK, -1 on error (e.g. EADDRINUSE)
– Assigns a local protocol address to a socket
• Combination of an IP address (32 or 128 bit) and a 16-bit TCP or UDP port number
– For TCP sockets, calling bind lets us specify a port number and/or IP address
– Server programs bind their well-known port
– A process can bind a specific IP address to its socket (the IP address must belong to an interface on the host):
• For a TCP client, the IP address is used as source IP address
• For a TCP server, this restricts the socket to receive incoming client connections destined only to that IP address.
• Important points
– Normally a TCP client does not bind an IP address to its socket. The kernel chooses the source IP address when the socket is connected, based on the outgoing interface that is used.
– If a TCP server doesn’t bind an IP address to its socket (i.e. using wildcard INADDR_ANY), the kernel uses the destination IP address of the client’s SYN segment as the server’s source IP address.
– If we specify a port number of 0, the kernel chooses an ephemeral port when bind is called.
– If we specify a wildcard IP address, the kernel doesn’t choose the local IP address until either the socket is connected (TCP) or a datagram is sent on the socket (UDP)
– For IPv4, the constant INADDR_ANY should be used for wildcard address
– To get kernel chosen port or IP address, use getsockname .
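The wildcard address, port 0 and getsockname points can be sketched together. This is a minimal example assuming IPv4 and TCP; the helper name bind_ephemeral is hypothetical.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to the wildcard address and
 * port 0, then ask the kernel which ephemeral port it chose.
 * Returns the descriptor and stores the port (host byte order) in *port;
 * returns -1 on error. */
int bind_ephemeral(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* wildcard IP address */
    addr.sin_port = htons(0);                   /* kernel chooses the port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

After the call, the port is known but the local IP address reported by getsockname is still the wildcard, since the kernel has not yet had to choose one.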
• int listen(int sockfd , int backlog )
– Called after socket and bind , and must be called before accept . listen converts an unconnected socket into a passive socket, indicating that the kernel should accept incoming connection requests directed to this socket.
– Moves the socket from CLOSED to LISTEN
– Backlog specifies the maximum number of connections that the kernel should queue for this socket
• Two queues maintained in the kernel
– Incomplete connection queue: in SYN_RCVD state
– Completed connection queue: in ESTABLISHED state
• When accept is called, the entry at the front of the completed connection queue is removed and returned to the process; if the queue is empty, the process is put to sleep until an entry arrives
• backlog is historically regarded as the maximum value for the sum of the two queues
• Berkeley-derived implementations multiply backlog by a fudge factor of 1.5, but other OSes may treat it differently
• Don’t specify a backlog of 0, as different implementations interpret this differently
• If the three-way handshake completes normally, an entry remains on the incomplete queue for one RTT (round-trip time)
• What value should the application specify?
– An environment variable can be used to override the value specified by the caller
• Most of the time (99.4%) the completed queue is empty while there is a long incomplete queue.
• If the queues are full, a newly arrived SYN is ignored (no RST is sent), so the client TCP can retransmit it
• Data that arrives after the three-way handshake completes but before the server calls accept is queued by the server TCP, up to the size of the connected socket’s receive buffer
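The environment-variable override for backlog could look like the sketch below. The variable name LISTENQ follows the convention of the Listen wrapper in Stevens’ book; the helper names effective_backlog and listen_with_override are hypothetical.

```c
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical helper: choose the backlog to pass to listen.
 * If the LISTENQ environment variable holds a positive number it wins;
 * otherwise the value requested by the caller is used. */
int effective_backlog(int requested)
{
    const char *env = getenv("LISTENQ");
    if (env != NULL) {
        int v = atoi(env);
        if (v > 0)
            return v;
    }
    return requested;
}

/* Hypothetical wrapper around listen that applies the override. */
int listen_with_override(int sockfd, int requested)
{
    return listen(sockfd, effective_backlog(requested));
}
```

This lets an administrator tune the queue length without recompiling the server. Values of 0 are deliberately not accepted as an override, given the portability warning above.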
• int accept(int sockfd , struct sockaddr * cliaddr , socklen_t * addrlen )
– Returns: nonnegative descriptor if OK, -1 on error
– Called by a TCP server to return the next completed connection from the front of the completed connection queue; if the queue is empty, the process is put to sleep (assuming a blocking socket)
– cliaddr and addrlen are used to return the protocol address of the connected peer process.
– addrlen is a value-result argument.
– If not interested in the protocol address of the client, both can be set to null pointers.
– The descriptor returned by accept is a brand-new one, called the connected socket; the argument sockfd is called the listening socket.
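The value-result use of addrlen can be sketched as follows, assuming an IPv4 listening socket. The helper name accept_with_peer is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on listenfd and write the
 * peer's address into buf in ip:port form.
 * Returns the connected descriptor, or -1 on error. */
int accept_with_peer(int listenfd, char *buf, size_t buflen)
{
    struct sockaddr_in cliaddr;
    socklen_t clilen = sizeof(cliaddr);   /* value: size of our structure */
    char ip[INET_ADDRSTRLEN];
    int connfd;

    memset(&cliaddr, 0, sizeof(cliaddr));
    connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &clilen);
    if (connfd < 0)
        return -1;
    /* result: clilen now holds the number of bytes the kernel stored */

    if (inet_ntop(AF_INET, &cliaddr.sin_addr, ip, sizeof(ip)) == NULL)
        ip[0] = '\0';
    snprintf(buf, buflen, "%s:%d", ip, ntohs(cliaddr.sin_port));
    return connfd;
}
```

The listening descriptor stays open and can be passed to accept again; only the returned descriptor is used to talk to this particular client.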
• pid_t fork(void)
– Create a copy of the process
– Returns:
• child’s process ID in the parent
• 0 in the child
• -1 on error
• Note: it is called once but returns twice
– All descriptors open in the parent before the call to fork are shared with the child after fork returns.
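The three return values of fork can be seen in a small sketch. The helper run_in_child and the task function answer are hypothetical names used only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn() in a child process and return the
 * child's exit status to the parent, or -1 on error. */
int run_in_child(int (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(fn());            /* child: fork returned 0 */

    /* parent: fork returned the child's process ID */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical task run in the child. */
int answer(void)
{
    return 42;
}
```

Calling run_in_child(answer) in the parent yields the child's exit status. The child uses _exit so that stdio buffers inherited from the parent are not flushed a second time.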
• Two typical uses of fork
– A process copies itself so that one copy handles an operation while the other copy does another task
– A process wants to execute another program
• There are six exec functions
– exec replaces the current process image with the new program file and this program normally starts at the main function
– These functions return to the caller only if an error occurs
– The difference in the six functions is
• Whether the program file to execute is specified by a filename or a pathname. If it is a filename, the PATH environment variable is used to locate it; if the name contains a “/”, PATH is not used
• Whether the arguments to the new program are listed one by one or referenced through an array of pointers
• Whether the environment of the calling process is passed to the new program or a new environment is specified
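Two of the differences listed above can be sketched by contrasting execlp with execv. The helper names run_shell and run_shell_v are hypothetical, and the sketch assumes a POSIX shell is installed as /bin/sh and reachable through PATH as sh.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for the child and return its exit status, or -1 on error. */
static int wait_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical helper: filename looked up via PATH, arguments listed
 * one by one (execlp). */
int run_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* reached only if exec failed */
    }
    return wait_status(pid);
}

/* Hypothetical helper: full pathname, arguments referenced through an
 * array of pointers (execv). */
int run_shell_v(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv("/bin/sh", argv);
        _exit(127);             /* reached only if exec failed */
    }
    return wait_status(pid);
}
```

Both variants pass the caller's environment through unchanged; the variants ending in e (such as execve) would take an explicit environment array instead.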
• Descriptors opened in the process before calling exec normally remain open across the exec
– This can be disabled by using fcntl to set the FD_CLOEXEC descriptor flag.
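Setting the close-on-exec flag with fcntl can be sketched as follows. The helper names set_cloexec and is_cloexec are hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: mark fd so that it is closed automatically
 * across a successful exec. Returns 0 if OK, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* read current descriptor flags */
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Hypothetical helper: report whether FD_CLOEXEC is set on fd. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Reading the flags first and OR-ing in FD_CLOEXEC preserves any other descriptor flags that are already set.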
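listenfd = socket( . . . );
bind( . . . );
listen(listenfd, . . . );
for ( ; ; ) {
    connfd = accept(listenfd, . . . );
    if ((pid = fork()) == 0) {
        close(listenfd);    /* child closes listening socket */
        /* process request (connfd) */
        close(connfd);      /* done with this client */
        exit(0);            /* child terminates */
    }
    close(connfd);          /* parent closes connected socket */
}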
• int close(int sockfd )
– Returns
• 0 if OK
• -1 on error
– The socket is no longer usable by the process.
– But already queued data will be sent to the other end before the TCP termination sequence takes place.
• Descriptor reference count
– Used to track how many processes are using the descriptor.
– Only when the count reaches 0 does TCP initiate the connection termination sequence for the socket
– It is important for a process to close descriptors after using them; otherwise they will remain open for the life of the process
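The effect of the reference count can be observed in a small experiment. The sketch below makes two simplifying assumptions: a Unix domain socketpair stands in for a TCP connection, and dup stands in for the sharing that fork creates. The function names peer_closed and shared_descriptor_demo are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has closed (end-of-file seen), 0 if the
 * connection is still open, -1 on any other error. Does not block. */
int peer_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}

/* Returns 1 if the peer sees end-of-file only after the last reference
 * to the other end is closed, 0 otherwise, -1 on setup error. */
int shared_descriptor_demo(void)
{
    int sv[2], copy, before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    copy = dup(sv[0]);          /* two descriptors now reference one end */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[0]);               /* count drops to 1: still open */
    before = peer_closed(sv[1]);
    close(copy);                /* count drops to 0: peer sees EOF */
    after = peer_closed(sv[1]);

    close(sv[1]);
    return before == 0 && after == 1;
}
```

This is why the parent in a concurrent server must close connfd itself: until it does, the child's close does not terminate the connection.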
• These two functions return either the local protocol address associated with a socket or the foreign protocol address associated with a socket
– int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen)
– int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen)
– Both return: 0 if OK, -1 on error
• Typical uses
– Find the ephemeral port number assigned to a socket by the kernel
– Find the address family of a socket in an exec’ed process
– Find the address of the bound interface in a server
– Find the address of the peer in a server process
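Finding the address family of a socket can be sketched with getsockname and a generic address structure. The function name sockfd_to_family follows a helper of the same name in Stevens’ book, but this body is an independent sketch.

```c
#include <sys/socket.h>

/* Returns the address family of the socket (e.g. AF_INET),
 * or -1 on error. */
int sockfd_to_family(int sockfd)
{
    struct sockaddr_storage ss;     /* large enough for any family */
    socklen_t len = sizeof(ss);

    if (getsockname(sockfd, (struct sockaddr *)&ss, &len) < 0)
        return -1;
    return ss.ss_family;
}
```

Because the caller does not know the family in advance, struct sockaddr_storage is used instead of a family-specific structure such as struct sockaddr_in.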
• Description
– The client reads a line of text from its standard input and writes the line to the server
– The server reads the line from its network input and echoes the line back to the client
– The client reads the echoed line and prints it on its standard output
• main (in tcpcliserv/tcpserv01.c)
– Create socket, bind server’s well-known port
– Wait for client connection to complete
– Concurrent server (using fork)
• str_echo function
– Read a line and echo the line
• See tcpcliserv/tcpserv01.c
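listenfd = socket( . . . );
bind( . . . );
listen(listenfd, . . . );
for ( ; ; ) {
    connfd = accept(listenfd, . . . );
    if ((childpid = fork()) == 0) {   /* child process */
        close(listenfd);              /* close listening socket */
        str_echo(connfd);             /* process the request */
        close(connfd);
        exit(0);
    }
    close(connfd);                    /* parent closes connected socket */
}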
void str_echo(int sockfd)
{
    ssize_t n;
    char buf[MAXLINE];

again:
    while ((n = read(sockfd, buf, MAXLINE)) > 0)
        writen(sockfd, buf, n);

    if (n < 0 && errno == EINTR)
        goto again;                 /* interrupted by a signal: retry */
    else if (n < 0)
        err_sys("str_echo: read error");
}
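str_echo relies on a writen helper that is not shown in the slides. A minimal sketch of such a helper, written in the spirit of the one in Stevens’ library but not copied from it, might be:

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly n bytes to fd, retrying after short writes and after
 * interruption by a signal. Returns n if OK, -1 on error. */
ssize_t writen(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;           /* interrupted: call write again */
            return -1;
        }
        p += w;                     /* short write: advance and retry */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

A loop is needed because write on a socket may accept fewer bytes than requested when the socket send buffer is nearly full.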
• main (in tcpcliserv/tcpcli01.c)
– Create socket, fill in Internet socket address structure
– Connect to server
• str_cli function
– Read a line, write to server
– Read echoed line from server, write to standard output
– Return to main on end-of-file or error
• See tcpcliserv/tcpcli01.c for details
sockfd = socket( . . . );
connect(sockfd, . . . );
str_cli(stdin, sockfd);
exit(0);
void str_cli(FILE *fp, int sockfd)
{
    char sendline[MAXLINE], recvline[MAXLINE];

    while (fgets(sendline, MAXLINE, fp) != NULL) {
        writen(sockfd, sendline, strlen(sendline));
        if (readline(sockfd, recvline, MAXLINE) == 0)
            err_quit("str_cli: server quit prematurely");
        fputs(recvline, stdout);
    }
}
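str_cli depends on a readline helper that is not shown in the slides. The following is a deliberately simple sketch that reads one byte at a time; it is an assumption about how such a helper might look, not the buffered version from Stevens’ library.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to and including a newline from fd into buf (at most
 * maxlen-1 bytes), then null-terminate. Returns the number of bytes
 * stored, 0 on end-of-file with no data, -1 on error. */
ssize_t readline(int fd, char *buf, size_t maxlen)
{
    size_t n = 0;

    if (maxlen == 0)
        return -1;
    while (n + 1 < maxlen) {
        char c;
        ssize_t rc = read(fd, &c, 1);
        if (rc == 1) {
            buf[n++] = c;
            if (c == '\n')
                break;              /* newline is stored, as with fgets */
        } else if (rc == 0) {
            break;                  /* end-of-file */
        } else if (errno != EINTR) {
            return -1;              /* real error; EINTR just retries */
        }
    }
    buf[n] = '\0';
    return (ssize_t)n;
}
```

One system call per byte is slow, so a production version would buffer its reads. The return value of 0 is what str_cli tests to detect that the server closed the connection.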
• Zombie process (defunct)
– process state Z for zombie
• When a child process terminates, the kernel sends the SIGCHLD signal to the parent. If the parent does not wait for the child (by calling wait or waitpid), the child remains a zombie.
• When a process terminates, its remaining child processes are inherited by the init process (PID 1), which waits for them and so cleans up any zombies.
• Zombies are a major problem for long-running servers: they occupy kernel process-table entries, and the system can eventually run out of processes.
• A signal is a notification to a process that an event has occurred
– Also called software interrupts
– Usually occur asynchronously
– Use man 7 signal to find all signals in Linux
• Signals can be sent
– By one process to another process (or to itself), e.g. SIGKILL signal to a process
– By the kernel to a process (e.g. SIGSEGV and SIGCHLD signals)
• Commonly used signals
– SIGALRM, SIGHUP, SIGPIPE, SIGIO, …
• Every signal has an action (disposition) associated with it, set in one of three ways
– Catch the signal: use the sigaction function to set a handler function, which is called when the signal occurs
• SIGKILL and SIGSTOP cannot be caught
– Ignore the signal by setting the handler to SIG_IGN
• SIGKILL and SIGSTOP cannot be ignored
– Set the default action by setting the handler to SIG_DFL. The default is normally to terminate the process on receipt of the signal, with certain signals also generating a core image of the process (e.g. SIGABRT)
• Prototype of signal functions
– Signal handler: void (*func)(int)
– struct sigaction {
    void (*sa_handler)(int);
    void (*sa_sigaction)(int, siginfo_t *, void *);
    sigset_t sa_mask;
    int sa_flags;
  };
– int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);
void sig_chld(int signo)
{
    pid_t pid;
    int stat;

    pid = wait(&stat);
    printf("Child %d terminated\n", pid);
    return;
}
• Set the above handler in the parent process to catch the SIGCHLD signal
struct sigaction act, oact;
act.sa_handler = sig_chld;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
if (sigaction(SIGCHLD, &act, &oact) < 0)
    return(SIG_ERR);
• Once a signal handler is installed, it remains installed.
• While a signal handler is executing, the signal being delivered is blocked. Furthermore any additional signals that were specified in the sa_mask argument passed to sigaction when the handler was installed are also blocked.
– Use sigaddset to add to-be-blocked signals in sa_mask
• If a signal is generated one or more times while it is blocked, it is normally delivered only one time after the signal is unblocked.
• It is possible to selectively block and unblock a set of signals using the sigprocmask function. This lets us protect a critical region of code by preventing certain signals from being caught while that region of code is executing.
• Handling interrupted system calls
– Compare programs tcpcliserv/tcpserv02.c and tcpcliserv/tcpserv03.c
• accept is an example of a ‘slow system call’
• A slow system call is one that can block forever, i.e. it may never return
• The term applies to many network functions
• If a process is blocked in a slow system call and the process catches a signal and the signal handler returns, the system call can return an error of EINTR.
for ( ; ; ) {
    clilen = sizeof(cliaddr);
    if ((connfd = accept(listenfd, . . . )) < 0) {
        if (errno == EINTR)
            continue;       /* interrupted by a signal: call accept again */
        else
            err_sys("accept error");
    }
    . . .
}
– pid_t wait(int * statloc )
• Block if no terminated child; otherwise, return the pid of the first terminated child.
– pid_t waitpid(pid_t pid , int * statloc , int options )
• Block if the specified child hasn’t terminated yet
• WNOHANG specified as options tells kernel not to block
• If pid is -1, the function waits for the first of any of the child processes to terminate
– Both return: process ID if OK, -1 on error; waitpid also returns 0 if WNOHANG is specified and no child has terminated yet
– statloc returns the termination status of the child
• Install a SIGCHLD handler using sigaction
– E.g. the sig_chld function
• Wait for the child processes to prevent them from becoming zombies
– Use wait or waitpid
• When wait is used in the SIGCHLD handler, zombies can be left behind if several child processes terminate at about the same time: Unix signals are normally not queued, so the handler may run only once
• The fix is to call waitpid with WNOHANG in a loop until there are no more terminated children
void sig_chld(int signo)
{
    pid_t pid;
    int stat;

    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
        printf("Child %d terminated\n", pid);
    return;
}
– We must catch the SIGCHLD signal when forking child processes
– We must handle interrupted system calls when signals are caught.
– A SIGCHLD handler must be coded correctly using waitpid to prevent any zombies from being left around.
• Connection abort before accept returns
– ECONNABORTED error returned in POSIX.1g
• Termination of server process
– FIN is sent to the client, and the client TCP responds with an ACK
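– Also SIGCHLD is sent to the server parent and handled
– When the client then reads a line and sends it to the server, the write itself succeeds, but the server TCP responds with a RST because the process that had the socket open has terminated
– When the client then calls readline, it returns 0 (end-of-file) immediately because of the FIN received earlier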
– The problem is that the client is blocked in the call to fgets when the FIN arrives. It is working with two descriptors and needs to block for input from either.
• SIGPIPE signal
– Suppose the client ignored the above error returned from readline , and writes more data to the server.
– The first write elicited the RST
– RULE: When a process writes to a socket that has received a RST, the SIGPIPE signal is sent to the process. The default action of this signal is process termination.
– If the process catches the signal and returns from the handler, or ignores the signal, the write operation returns EPIPE.
• Crashing of server host
– When the server host crashes, nothing is sent out on the existing network connections.
– We type a line of input to the client, it is written by write , and is sent by the client TCP as a data segment. The client then blocks in the call to readline , waiting for the echoed reply.
– The client TCP continually retransmits the data segments, trying to receive an ACK from the server. When the client TCP finally gives up, an error is returned to the client process.
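– Since the client is blocked in the call to readline, readline returns the error.
– The error is normally ETIMEDOUT.
– If an intermediate router determined that the server host was unreachable and responded with an ICMP destination unreachable message, the error is EHOSTUNREACH or ENETUNREACH.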
• Crashing and rebooting of server host
– We type a line of input to the client, it is written by write , and is sent by the client TCP as a data segment. The client then blocks in the call to readline , waiting for the echoed reply
– When the server host reboots after crashing, its TCP loses all information about connections that existed at the time of the crash.
– The server host would respond to the data segment with a RST.
– The client is blocked in the call to readline when the RST is received, causing readline to return error ECONNRESET
• Shutdown of server host
– When a Unix system is shut down, init normally sends the SIGTERM signal to all processes and waits a fixed amount of time. Then it sends the SIGKILL signal to any processes still running.
– When the processes terminate, all open connections are closed.
– This is the same scenario as termination of the server process.
• A protocol should be designed between the server and the client
– The simple solution is a text (ASCII) protocol
• If binary structures are passed between the server and the client
– Pass all the numeric data as text strings (using snprintf )
– Explicitly define the binary formats of the supported data types (number of bits, big or little endian) and pass all data between the client and server in this format, e.g. the External Data Representation (XDR) standard (RFC 1832)
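The two options above can be sketched side by side. The helpers put_u32, get_u32 and format_pair are hypothetical; the binary format assumed here is a 32-bit unsigned integer in big-endian (network) byte order, and the text format is two decimal numbers on one line.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Binary option: store value in buf as 4 bytes in network byte order,
 * independent of the host's endianness. */
void put_u32(unsigned char buf[4], uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof(net));
}

/* Inverse of put_u32. */
uint32_t get_u32(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);
}

/* Text option: format two numbers as one line of text.
 * Returns the number of characters written, as snprintf does. */
int format_pair(char *line, size_t len, long a, long b)
{
    return snprintf(line, len, "%ld %ld\n", a, b);
}
```

Sending a C structure directly would instead expose the sender's byte order, integer sizes and padding to the receiver, which is the portability problem both options avoid.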
– Routers provide interconnections among physical networks
– Use network address for routing
– Users do not have to know the physical networks
– All physical networks are equal
• Class A, B, C, D (multicast)
• Network id
– Host id 0 is never used for hosts
• Directed broadcast address
– Host id with all 1s, reaching all hosts in the network. Sometimes a host id with all 0s also works (BSD Unix before v4.3).
• Limited broadcast address
– All 32 bits set to 1 (255.255.255.255); limited to the local subnet (LAN) and dropped by routers.
• Weaknesses in Internet addressing
– No mobility, non-extensibility due to limited fixed address space, trouble for multi-homed hosts
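Classful addressing and the directed broadcast address can be worked through in code. The sketch assumes addresses are held in host byte order and uses the classic class boundaries (leading bits 0, 10, 110, 1110); the function names ipv4_class and directed_broadcast are hypothetical.

```c
#include <stdint.h>

/* Return the class letter of an IPv4 address given in host byte order.
 * 'E' covers the remaining reserved range. */
char ipv4_class(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)           return 'A';  /* 0...    */
    if ((addr & 0xC0000000u) == 0x80000000u) return 'B';  /* 10...   */
    if ((addr & 0xE0000000u) == 0xC0000000u) return 'C';  /* 110...  */
    if ((addr & 0xF0000000u) == 0xE0000000u) return 'D';  /* 1110... */
    return 'E';
}

/* Directed broadcast: keep the network id, set every host id bit to 1.
 * Returns 0 for classes that have no host id (D and E). */
uint32_t directed_broadcast(uint32_t addr)
{
    switch (ipv4_class(addr)) {
    case 'A': return addr | 0x00FFFFFFu;    /* 8-bit network id  */
    case 'B': return addr | 0x0000FFFFu;    /* 16-bit network id */
    case 'C': return addr | 0x000000FFu;    /* 24-bit network id */
    default:  return 0;
    }
}
```

For example, 192.168.1.1 (0xC0A80101) is a class C address, so its directed broadcast address is 192.168.1.255 (0xC0A801FF).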
• Server
– Starts first, but passive, uses well known ports
– Concurrent and complex due to security and reliability concerns
• Client
– Starts later, but is active and initiates the communication; uses temporary (ephemeral) ports
– Simple, and can use broadcast address to find a server
• Alternatives to demand driven?
– Caching and prefetching
TELE 402 Lecture 3: Elementary … 64