Workbook 7: Standard I/O and Pipes
Pace Center for Business and Technology

Key Concepts

• Terminal based programs tend to read information from one source, and write information to one destination.
• The source programs read from is referred to as Standard In (stdin), and is usually connected to a terminal's keyboard.
• The destination programs write to is referred to as Standard Out (stdout), and is usually connected to a terminal's display.
• When using the bash shell, stdout can be redirected using > or >>, and stdin can be redirected using <.

This workbook shows how you can redirect where input is read from and where output goes. The output of one command can be used as the input for another command, allowing simple commands to be used together to perform more complicated tasks.

Three types of programs

In Linux (and Unix), programs can generally be grouped into the following three designs.

Graphical Programs
Graphical programs are designed to run in the X graphical environment. They expect the user to be using a mouse, and use common graphical components, such as popup menus and buttons, for user input. The mozilla web browser is an example of a graphical program.

Screen Programs
Screen based programs expect to use a text console. They make use of the entire display, and handle text placement and screen redraws in sophisticated ways. They do not require a mouse, and are appropriate for terminals and virtual consoles. The vi and nano text editors, and the links web browser, are examples of screen based programs.

Terminal Programs
Terminal programs collect input and display output in a stream, seldom if ever redrawing the screen, as if writing directly to a printer that does not allow the cursor to move back up the page. Because of their simplicity, terminal based programs are often simply called commands. ls, grep, and useradd are examples of terminal based programs.

This chapter focuses on the last type of program.
Do not let the simplicity of the way these commands receive input and produce output fool you. You will find that many of these commands are very sophisticated, and allow you to use the command line interface in powerful ways.

Standard In (stdin) and Standard Out (stdout)

Terminal based programs generally read information as a stream from a single source, such as a terminal's keyboard. Likewise, they generally write information as a stream to a single destination, such as a display. In Linux (and Unix), the input stream is referred to as Standard In (usually abbreviated stdin), and the output stream is referred to as Standard Out (usually abbreviated stdout).

Usually, stdin and stdout are connected to the terminal that runs the command. Sometimes, in order to automate commonly repeated commands, or in order to record the output of a command for later inclusion in a report or email, people find it convenient to redirect stdin from files or stdout into files.

Redirecting stdout

Writing Output to a File

When a terminal based program generates output, it generally writes that output to its stdout stream, without knowing what is connected to the receiving end of that stream. Usually, the stdout stream is connected to the terminal that started the process, so the output is written to the terminal's display. The bash shell uses > to redirect a process's stdout stream to a file.

For example, suppose the machine elvis is using becomes very sluggish and non-responsive. In order to diagnose the problem, elvis would like to examine the currently running processes. Because the machine is so sluggish, however, he wants to collect the information now, but analyze it later. He can redirect the output of the ps aux command into the file sluggish.txt, and come back to examine the file when the machine is more responsive.

Notice that no output is displayed to the terminal.
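The redirection described above can be sketched roughly as follows. The scratch directory created with mktemp is an assumption of this sketch, added only so that no existing file is overwritten; the ps aux command and the file name sluggish.txt come from the text.

```shell
# Work in a scratch directory (an assumption of this sketch) so nothing real is clobbered.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The process list is written to sluggish.txt; nothing appears on the terminal.
ps aux > sluggish.txt

# Later, when the machine is more responsive, examine the start of the file.
head -n 5 sluggish.txt
```

The file is created by the shell before ps starts, so sluggish.txt exists even if ps itself produces no output.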
The ps command writes to stdout, as it always does, but stdout is redirected by the bash shell to the file sluggish.txt. The user elvis can examine the file later, at a more convenient time.

Appending Output to a File

If the file sluggish.txt already existed, its original contents would be lost. This is often referred to as clobbering a file. To append a command's output to a file, rather than clobbering it, bash uses >>.

Suppose that elvis wanted to record a timestamp of when the sluggish behavior was happening, as well as a list of currently running processes. He could first create (or clobber) the file with the output of the date command, using >, and then append to it the output of the ps aux command using >>.

Redirecting stdin

Just as bash uses > to coax commands into delivering their output somewhere other than the display, bash uses < to cause them to read input from somewhere other than the keyboard.

The user elvis is still trying to figure out why his machine was acting sluggish. He talked to his local system administrator, who thought that looking at the list of currently running processes sounded like a good idea, and asked elvis to mail him a copy.

Using the terminal based mail command, elvis first writes an email message to the administrator "manually", from the keyboard. The mail command expects a recipient as an argument, and the subject line can be specified with the -s command line switch. The email body is then entered from the keyboard. The end of the message text is signaled by a lone period on a line.

For his follow-up message, elvis can easily mail the output of the ps command he recorded in the file sluggish.txt. He just redirects the mail command's stdin stream to be read from the file. The system administrator will receive an email from elvis, with "ps output" as its subject line, and the contents of the file sluggish.txt as its body.
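A minimal sketch of appending with >> and of redirecting stdin with < might look like the following. Because sending mail depends on local configuration, the sketch uses wc as a stand-in reader of stdin and only shows the mail invocation as a comment; the recipient name sysadmin is a hypothetical placeholder, as is the scratch directory.

```shell
# Scratch directory (an assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create (or clobber) the file with a timestamp, then append the process list to it.
date > sluggish.txt
ps aux >> sluggish.txt

# Redirect stdin: wc is given no file name, so it counts the lines arriving on stdin.
line_count=$(wc -l < sluggish.txt)
echo "sluggish.txt has $line_count lines"

# The mail command would read its message body from the same file in the same way
# (not run here; "sysadmin" is a hypothetical recipient):
#   mail -s "ps output" sysadmin < sluggish.txt
```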
In the first case, the mail process's stdin was connected to the terminal, and the message body was provided by the keyboard. In the second case, bash arranged for the mail process's stdin to be connected to the file sluggish.txt, and the message body was provided by its contents. The mail command doesn't change its basic behavior: it reads the body of the email message from stdin.

Under the Hood: Open Files and File Descriptors

To fully appreciate how processes manage Standard In, Standard Out, and files, we must introduce the concept of a file descriptor. In order to read information from or write information to a file, a process must open the file. Linux (and Unix) processes keep track of the files they currently have open by assigning each an integer. The integer is called a file descriptor.

The Linux kernel provides an easy way to examine the open files and file descriptors of a currently running process, using the /proc file system. Every process has an associated subdirectory under /proc, named after its PID (process ID). The process's subdirectory in turn has a subdirectory called fd (for file descriptor). Within the /proc/pid/fd subdirectory, a symbolic link exists for every file the process has open. The name of the symbolic link is the open file's integer file descriptor, and the symbolic link resolves to the open file itself.

In the following, elvis cats the file /usr/share/hwdata/oui.txt, and then almost immediately suspends the program with a CTRL+Z.

Using the ps command to look up the process's PID, elvis next examines the process's /proc/pid/fd directory. Not surprisingly, the cat process has the file /usr/share/hwdata/oui.txt open (it must be able to read the file to display its contents). Perhaps a little surprisingly, it is not the only, or even the first, file that the process has open.
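Suspending a process with CTRL+Z requires an interactive terminal, so the following sketch swaps in a simpler technique: it inspects the file descriptors of the shell that runs it, using $$ (the shell's own PID). It assumes a Linux system with /proc mounted, and the choice of /etc/passwd as the extra file to open is arbitrary.

```shell
# $$ expands to the PID of the current shell; list its open file descriptors.
ls -l "/proc/$$/fd"

# Open an extra file for reading on descriptor 3, then list again:
# a new symbolic link named 3 appears in the directory.
exec 3< /etc/passwd
ls -l "/proc/$$/fd"

# The link named 3 resolves to the file that was opened.
target=$(readlink "/proc/$$/fd/3")
echo "descriptor 3 -> $target"

# Close descriptor 3 again.
exec 3<&-
```

Descriptors 0, 1, and 2 appear in the first listing before any file has been explicitly opened.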
The cat command has three open files before it, or, more exactly, the same file open three times: /dev/tty1.

As a Linux (and Unix) convention, every process inherits three open files upon startup. The first, file descriptor 0, is Standard In. The second, file descriptor 1, is Standard Out, and the third, file descriptor 2, is Standard Error (to be discussed in the next Lesson). What open files did the cat command inherit from the bash shell that started it? The device node /dev/tty1 for all three.

Recall that /dev/tty1 is the device node which connects to the console serial driver within the kernel. Whatever elvis types can be read from this file, and whatever is written to this file is displayed on elvis's terminal. What happens if the cat process reads from stdin? It reads input from elvis's keyboard. What happens if it writes to stdout? Whatever is written is displayed on elvis's terminal.

Redirection

In the next example, elvis cats the /usr/share/hwdata/oui.txt file, but this time redirects stdout to the file /tmp/foo. Again, elvis suspends the command in mid-stride with the CTRL+Z control sequence. Using the same technique as above, elvis examines the files opened by the cat command, and the file descriptors associated with them.

What happens when elvis redirects both Standard Out and Standard In?

When the cat command is called without arguments (i.e., without the names of any files to display), it displays Standard In instead. Rather than opening a specified file (using file descriptor 3, as above), the cat command reads from stdin.

What is the effective difference between the following three commands? There is none.
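As a rough illustration of that equivalence, the sketch below compares cat reading a named file with cat reading the same file through redirected stdin. The file name foo, its contents, and the scratch directory are made up for this sketch.

```shell
# Scratch file with known contents (names are assumptions of this sketch).
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/foo"

# cat is given a file name, so it opens the file itself on a new file descriptor.
from_argument=$(cat "$workdir/foo")

# cat is given no file name, so it copies stdin, which bash has connected to the file.
from_stdin=$(cat < "$workdir/foo")

# Both forms produce identical output.
[ "$from_argument" = "$from_stdin" ] && echo "same output"
```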
In order to appreciate the real benefit of designing commands to read from Standard In in lieu of named files, we must wait until pipes are introduced in a subsequent Lesson.

Examples

Chapter 1. Standard In and Standard Out
Automating Graph Generation with gnuplot
About: 20 minutes
http://csis.pace.edu/adelgado/rha-030/scripts/workbook-07/chapter-1/Gnuplot-lab.htm

Chapter 2. Standard Error

Key Concepts

• Unix programs commonly report error conditions to a destination called Standard Error (stderr).
• Usually, stderr is connected to a terminal's display, and error messages are found intermixed with standard output.
• When using the bash shell, the stderr stream can be redirected to a file using 2>.
• When using bash, the stderr stream can be combined with the stdout stream using 2>&1 or >&.

Standard Error (stderr)

We have discussed the standard input and output streams, stdin and stdout, and how to use > and < in the bash command line to redirect them. We are now ready to confuse matters a little by introducing a second output stream, commonly used for reporting error conditions, called Standard Error (often abbreviated stderr).

In the following sequence, elvis is using the head -1 command to generate a list of the first lines of all the files in the /etc/rc.d directory.

The head command, when fed multiple file names as arguments, conveniently prints a decorated header with the name of each file, followed by the first specified number of lines (in this case, one). When the head command encounters a directory, however, it merely complains.

Next, elvis runs the same command, redirecting stdout to the file rcsummary.out. Most of the output is obediently redirected to the file rcsummary.out, but the directory complaints are still displayed. Although not obvious at the outset, the head command is really sending output to two independent streams.
Normal output is written to Standard Out, but error messages are written to a separate stream called Standard Error (often abbreviated stderr). Usually, both streams are connected to the terminal, and so the two are difficult to distinguish. By redirecting stdout, however, the information written to stderr becomes obvious.

Redirecting stderr

Just as bash uses > to redirect stdout, bash uses 2> to redirect stderr. For example, elvis repeats the head command from above, but instead of redirecting stdout to rcsummary.out, he redirects stderr to the file rcsummary.err.

The output is the complement to the previous example. We now see the normal output displayed to the screen, but no error messages. Where did the error messages go? It shouldn't be hard to guess.

In the following example, both > and 2> are used to redirect stdout and stderr independently. In this case, the standard output can be found in the file rcsummary.out, error messages can be found in rcsummary.err, and nothing is left over to be displayed to the screen.

Combining stdout and stderr: Old School

Often, a user would like to redirect the combined stdout and stderr streams to a single file. As a first attempt, elvis tries the following command. Upon examining the file rcsummary.both, however, elvis doesn't find what he expects.

The bash shell opened the file rcsummary.both twice, but treated each open file independently. When stdout and stderr both wrote to the file, they clobbered each other's information. What is needed instead is some way to tell bash to effectively combine stderr and stdout into a single stream, and then redirect that stream to a single file. As you would expect, there is such a way. Although awkward, the last token 2>&1 should be thought of as saying "take stderr, and send it wherever stdout is currently going". Now rcsummary.both contains the expected output.
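The redirections discussed in this section can be sketched as follows. Instead of /etc/rc.d, the sketch builds a small scratch directory holding one file and one subdirectory; those names are assumptions made for illustration, while the rcsummary file names come from the text. It uses head -n 1, the portable spelling of head -1.

```shell
# Scratch directory containing one regular file and one subdirectory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir subdir
echo "first line" > file.txt

# Redirect stdout and stderr independently: normal output goes to rcsummary.out,
# the complaint about the directory goes to rcsummary.err, and nothing reaches the screen.
head -n 1 file.txt subdir > rcsummary.out 2> rcsummary.err

# Combine the streams: stdout goes to the file, then stderr is sent
# wherever stdout is currently going.
head -n 1 file.txt subdir > rcsummary.both 2>&1

# rcsummary.both now holds both the normal output and the complaint.
cat rcsummary.both
```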
Combining stdout and stderr: New School

Using 2>&1 to combine stdout and stderr was introduced in the Bourne shell (sh), the traditional Unix shell. Because bash is designed to be backwards compatible with sh, it supports the syntax as well. The syntax, however, is inconvenient. Besides being difficult to write, it is sensitive to the order of the redirections. Using ">out.txt 2>&1" and "2>&1 >out.txt" does not have the same effect! In order to simplify things, bash uses >& to combine both stdout and stderr, as in the following example.

Summary

The following table summarizes the syntax used by the bash shell for redirecting stdin, stdout, and stderr learned in this and the previous lesson.

Examples

Using /dev/null to filter out stderr

The user elvis has recently learned that, besides the /home/elvis and /tmp directories he's familiar with, he may also own files in the /var directory. These files are usually spooling files for received but not yet viewed email, print jobs waiting to be sent to the printer, etc. Curious, he uses the find command to find all files within the /var directory that he owns.

Although the find command appropriately reported the /var/spool/mail/elvis file, the output is difficult to find among all of the "Permission denied" error messages being reported from various subdirectories of /var. In order to help separate the wheat from the chaff, elvis redirects stderr to some file in the /tmp directory.

While this works, elvis is left with a file called /tmp/foo that he really didn't want. In situations like this, when a user wants to discard a stream of information, experienced Unix users usually redirect output to a pseudo device called /dev/null. As the following long listing shows, /dev/null is a character device node, like those used for conventional device drivers. When a user writes to /dev/null, the information is merely discarded by the kernel.
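Two points from this section, the importance of redirection order and discarding stderr with /dev/null, can be sketched as follows. The function both_streams is a hypothetical helper invented for this sketch (it writes one line to each stream), and the file names are likewise made up.

```shell
# Scratch directory (an assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: writes one line to stdout and one line to stderr.
both_streams() { echo "to stdout"; echo "to stderr" >&2; }

# stdout is pointed at the file first, then stderr follows it: the file gets both lines.
both_streams > ordered.txt 2>&1

# stderr is pointed at the current stdout (here, the command substitution) before
# stdout is moved to the file: only the stdout line lands in the file.
leaked=$(both_streams 2>&1 > swapped.txt)

# Discard stderr entirely by writing it to /dev/null; only stdout survives.
kept=$(both_streams 2> /dev/null)

echo "ordered.txt has $(wc -l < ordered.txt) lines, swapped.txt has $(wc -l < swapped.txt)"
echo "leaked: $leaked / kept: $kept"
```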
When a user reads from /dev/null, they encounter an immediate end of file. Notice that /dev/null is one of the few files in Red Hat Enterprise Linux that has world writable permissions by default.

Questions

Chapter 2. Standard Error: 1 and 2

Chapter 3. Pipes

Key Concepts

• The stdout stream from one process can be connected to the stdin stream of another process, using what Unix calls a "pipe".
• Many commands in Unix are designed to operate as a filter, reading input from stdin and sending output to stdout.
• bash uses "|" to create a pipe between two commands.

Pipes

In the previous Lessons, we have seen that a process's output can be redirected to somewhere other than the terminal display, or that a process can be asked to read input from some location other than the terminal keyboard. One of the most common, and most powerful, forms of redirection is a combination of the two, where the output (Standard Out) of one command is "piped" directly into the input (Standard In) of another command, forming what Linux (and Unix) refers to as a pipe.

When two commands are joined by a pipe, the stdout stream of the first process is tied directly to the stdin stream of the second process, so that multiple processes can be combined in a sequence. In order to create a pipe using bash, the two commands are joined with a vertical bar |. (On most keyboards, this character is found on the same key as the backslash, above the RETURN key.) All processes that are joined in a pipe are referred to as a process group.

As an example, consider prince, who is trying to find the largest files underneath the /etc directory. He begins by composing a find command that will list all files with a size greater than 100 Kbytes.

Observing that the find command seems to list the files in no particular order, prince decides he would like the files to be listed alphabetically. He could redirect the output to a file, and then sort the file.
Instead, he takes advantage of the fact that the sort command, when invoked without arguments, looks to Standard In for the data to sort. He pipes the output of his find command into sort. The files are now listed in alphabetical order.

Filtering output using grep

The traditional Unix grep command is commonly used in pipes to reduce data to only the "interesting" parts. The grep command will be discussed in detail in a later Workbook. Here, we introduce grep in its simplest form.

The grep command is used to search for and extract lines which contain a specified string of text. For example, in the following, prince prints all lines that contain the text "root" from the /etc/passwd file.

The first argument to the grep command is the string of text to be searched for, and any remaining arguments are files to be searched for the text. If the grep command is called with only one argument (a string to be searched for, but no files to search), it looks to Standard In as its source of data on which to operate.

In the following, prince has so many files in his home directory that he is having trouble keeping track of them. He's trying to find a directory called templates that he created a few months ago. He uses the locate command to help him find it.

Unfortunately for prince, there are many files on the system which contain the text templates in their name, and prince becomes overwhelmed with lines and lines of output. In order to reduce the information to more relevant files, prince next takes stdout from the locate command, and creates a pipe to stdin of the grep command, "grepping" for the word "prince".

Because the grep command is not given a file to search, it looks to stdin, where it finds the stdout stream of the locate command. Filtering the stream, grep copies to its stdout only the lines that match the specified text, "prince". The rest are discarded.
The user prince easily finds his directory under ~/proj, as well as another directory created by the application quanta.

Pipes and stderr

In the next example, prince is curious to see where he shows up in the system's configuration files, and "greps" for his name in the /etc directory.

Again, prince is overwhelmed by the amount of output from this command. He tries the same trick, "grepping" it down for all lines that contain the word "passwd". While stdout from the first grep command was appropriately filtered, stderr is unaffected, and still gets displayed to the screen. How would prince go about suppressing stderr as well?

Commands as filters

The concept of a pipe extends naturally, so that multiple commands can be used together, each reading information from stdin, somehow modifying or filtering the information, and passing the result to stdout. In a subsequent Workbook, you will find that there are many standard Linux (and Unix) commands that are designed for this purpose, including some that you are already familiar with: grep, head, tail, cut, sort, sed, and awk, to name a few.

Examples

Listing Processes by Name

Often, one would like to list information about processes which are running a specific command. While ps aux tabulates a lot of information about currently running processes, the number of processes running on the machine can make the output overwhelming. The grep command can help simplify the output. In the following, prince would like to list information about the processes which are implementing his web server, the httpd command. He lists all processes, but then reduces the output to only those lines which contain the text httpd.

Questions

Chapter 3. Pipes: 1 and 2
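The pipe techniques from this chapter can be sketched with scratch data as follows, so that the results do not depend on what is installed or running on a particular machine. The file name fruit.txt, its contents, and the name missing.txt are assumptions of this sketch; the same shapes apply to the commands in the text, such as ps aux | grep httpd.

```shell
# Scratch directory and sample data (assumptions of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\nfig\n' > fruit.txt

# sort is given no file name, so it sorts what arrives on stdin through the pipe;
# head then keeps only the first line of the sorted stream.
first_sorted=$(printf 'pear\napple\nfig\n' | sort | head -n 1)

# grep with only a search string filters stdin; -c counts the matching lines.
match_count=$(printf 'pear\napple\nfig\n' | grep -c "p")

# Only stdout travels through a pipe. ls complains about missing.txt on stderr,
# which is discarded here so that it does not clutter the terminal.
listed=$(ls fruit.txt missing.txt 2> /dev/null | grep "fruit")

echo "first: $first_sorted, matches: $match_count, listed: $listed"
```

Redirecting stderr on the first command of a pipeline, as in the last step, is one way to approach the question posed under "Pipes and stderr".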