Uploaded by Fattah Libenanga

mashey-command

advertisement
Using a Command Language as a High-Level Programming Language
J. R. Mashey
Bell Laboratories
Piscataway, New Jersey 08854
Keywords: Command languages, command interpreters, UNIX.
•
Abstract: The command language for the Programmer's Workbench (PWB) utilizes an extended version of the standard UNIX
shell program, plus commands designed mainly for use within
shell procedures (command files). Modifications have been aimed
at improving the use of the shell by large programming groups,
and making it even more convenient to use as a high-level programming language. In line with the philosophy of much existing UNIX software, an attempt has been made to add new
features only when they are shown necessary by actual user
experience in order to avoid contaminating a compact, elegant
system through "creeping featurism." By utilizing the shell as a
programming language, PWB users have been able to eliminate a
great deal of the programming drudgery that often accompanies a
large project. Many manual procedures have been quickly,
cheaply, and conveniently automated. Because it is so easy to
create and use shell procedures, each separate project has tended
to customize the general PWB environment into one tailored to its
own requirements, organizational structure, and terminology. A
summary is given of the usage patterns revealed by a survey of
1,725 existing shell procedures.
I.
•
•
•
INTRODUCTION
A good operating system command language (CL) is an invaluable tool for large programming projects. A common occurrence
in such projects is the diversion of significant resources from
building the end product to creating internal support programs
and procedures. If a CL is a flexible programming language, it
can be used to solve many internal support problems, without
requiring compilable programs to be written, debugged, and
maintained. Its most important advantage is the ability to do the
job now.
•
The Programmer's Workbench (PWB) environment [IVI75A,
DOL76AI supports projects ranging in size from several people to
those involving hundreds. The PWB CL is a modified version of
the UNIX shell language [RIT74A, THO75AI. Experience with it has
shown it to be an effective tool for automating procedures and
eliminating programming drudgery. Nearly two thousand CL procedures have been inspected to observe real-life usage. Many of
the assertions made in this paper are based on the results of this
survey.
2. OVERVIEW OF THE UNIX ENVIRONMENT
Full understanding of some later discussions depends on familiarity with UNIX. [RIT74A] is a definite prerequisite, and it would
be helpful to read at least one of [KER75A, KER76A, THO75A]. For
completeness, a short overview of the most relevant concepts is
given below.
Existing CLS include a wide range of structures and abilities,
ranging from those quite similar to assembly language to a few
having the appearance and abilities of high-level programming
languages [BRU76A, COW75AI. A diversity of viewpoints exists
regarding the relative importance of various CL characteristics
[DOL69A, SIM74A,UNG75A]. Several years of PWB experience suggests that the following qualities are valuable in a CL:
•
•
Interchangeability o f programs and CL procedures--No user
should need to know whether a given command is implemented as a compiled, executable program or as a CL procedure. Syntactic differences should be avoided.
Ease o f creation and maintenance o f CL procedures--It should
require very little effort to write a CL procedure, save it for
later use, and maintain it. The overhead of writing a small
procedure should be especially small; as will be seen later,
many procedures are only a few lines long.
Organization o f CL procedures--It must be possible to share
easily the use of procedures among people in a way consistent
with organizational structure.
Programming language features--It must be possible to write
procedures with convenient conditional branching, looping,
argument handling, variables, string manipulation, and occa,
sional arithmetic. Thus, many of the capabilities of a typical
procedural language are necessary for a good CL. However,
they may not be sufficient: a CL may need additional capabilities, a somewhat different syntax, and a radically different set
of priorities concerning the importance of various constructs.
For example, a crucial CL capability is that of connecting executing programs in a variety of ways. Pattern-matching and
other string manipulation operations are quite useful. Arithmetic operations are also helpful, but seem to be much less
important than the others.
Separation from operating s y s t e m - - A CL interpreter should normally be an ordinary command, treated like other commands,
and definitely not embedded in the heart of the operating system. The operating system should not be viewed as an implementation of the CL, but as an environment that can support
a variety of good CLS. Different users should be able to have
different CLs if they feel like it. This approach permits experimentation and evolution without bothering, other users. It
often leads to a "survival-of-the-fittest" behavior. Mutations
occur, live, and die on their merits, and eventually breed
hybrids containing features found useful from actual practice.
2.1 File System
The UNIX file system's overall structure is that of a rooted tree
composed of directories and other files. A .file name is a sequence
of characters. A path name is a sequence of directory names followed by a file name, each separated from the previous one by a
slash ("/"). If a path name begins with a " / " , the search for the
file begins at the root of the entire tree; otherwise, it begins at
the user's current directory. (The first type of name is often called
an absolute path name because it is invariant with regard to the
user's current directory.) The user may change current directories at any time by using the chdir command. In general, file
names and path names can be used in the same ways. Some
sample names are:
Ease o f on-line u s e - - M a n y advantages can be gained by
replacing keypunches with terminals, even when jobs are
prepared for batch execution. To support this approach, CL
syntax should emphasize simplicity and avoid the need for
redundant typing.
Convenient use o/" the same language as an on-line CL and as a
programming language--It must be possible to build and store
sequences of commands (CL procedures) that can be invoked
at a later time.
169
/
root of the entire file structure.
/bin
directory of commonly used
mands.
Limiting interprocess communication to a small n u m b e r of
well-defined methods is a great aid to uniformity, understandability, and reliability of programs. It encourages the packaging of
functions into small programs that are easily connected. The
pipe mechanism is especially desirable, both for human
comprehension and for computer performance [THO75A,KER75A,
KER76A].
public com-
/u0/tnds/tf/jtb/bin a path name typical of multi-person programming projects. This one is a private directory
of commands belonging to person "jtb" of
group "tf" in project "tnds".
bin/umail
a name depending on the current directory:
it refers to file "umail" found in subdirectory
" b i n " of the current directory. If the current
directory is " / " , it names "/bin/umail". If
the current directory is "/u0/tnds/tf/jtb", it
names " /uO/tnds/tf/jtb/bin/umail".
3. SHELL BASICS
Most UNIX users utilize the CL provided by a program called the
shell. It reads input from a terminal or file and arranges for the
execution of the requested commands. The shell is a small program (about 20 pages of C code); many CL functions are actually
supported by independent programs that work with the shell, but
are not built into it. The discussion is adapted from [THO75A,
THO75B].
Large projects require the ability to quickly and easily modify
directory structures to fit changing needs. In particular, the
"current directory" feature makes it possible for each person to
move around in the file system and work where most convenient. This allows simple names to be used, even when the
current directory is many levels deep in the structure. It also
permits individual directories to remain fairly small, lessening the
load on both human and computer; i.e., "locality of reference" is
good for the performance of both.
3.1 Commands
A command is a sequence of non-blank arguments separated by
blanks. The first argument specifies the name of the command
to be executed; the remaining items are passed as arguments to
the command executed. The following line requests the pr command to print files a, b, and c:
prabc
2.2 Processes and Their Interactions
If the first argument names a file that is marked as executable 2
and is actually a load module, the shell (as parent) spawns a new
(child) process that immediately executes that program. If the
file is marked executable, but is neither a load module nor a
directory, it is assumed to be a command file (shell procedure). A
command file is a file of ordinary text--sheU command lines and
possibly lines to be read by other programs. In this case, the
shell spawns a new instance of itself to read the file and execute
the commands included in it.
An image is a computer execution environment, including core
-image, register values, current directory, status of open files,
information recorded at login time, and various other items. A
process is the execution of an image; most UNIX commands execute as separate processes. One process may spawn another
using the fork system call, which duplicates the image of the
original (parent) process. The new (child) process may continue
execution of the .image, or may abandon it by issuing an exec
system call, which initiates execution of another program.
Processes use independent address spaces and
segments, 1 and communicate in a limited number of wavs:
•
•
•
•
•
From the user's viewpoint, executable programs and shell
procedures are invoked in exactly the same way. The shell
determines which implementation has been used, rather than
requiring the user to do so. Most operating systems can execute
existing load modules without requiring the user to state the
source language used to produce the load module. CL proceOures
are treated in the same way, for several reasons. First, the
existence of two distinct types of commands is confusing to the
novice user. Second, any user becomes irritated when forced to
type repetitive information, especially when the system already
has it. Finally, the implementation of a given command may
well change with time, typically from shell procedure to compiled
program. This change could cause great pain to the users if it
required the invocation method to change also.
data
Open files--a child inherits the parent's open files, and can
manipulate the associated read/write pointers thus shared
with the parent. This ability permits processes to share the
use of a common input stream in various useful ways. In
particular, an open file possesses a pointer that indicates a
location in the file, and is modified by various operations.
Read and write copy a requested number of bytes from (to) a
file, beginning at the location given by the current value of
the pointer. As a side effect, the pointer is incremented by
the number of bytes transferred, yielding the effect of
sequential I/O. Seek can be used to obtain random-access
I/O; it sets the pointer to an absolute location within the file,
or to a location offset from the end of the file or the current
pointer location.
Arguments--a sequence of arguments (character strings) can
be passed from one program to another via exec.
Return code--when a process terminates, it can set a numeric
return code that is available to the process's parent.
Files--some programs arrange conventions to share files in
various ways, or to use files of specified names.
Pipes--pipes are interprocess channels that are similar to files
in ways of access, but allow very convenient handling of t h e
"producer-consumer" relationship between programs executing in parallel. The "producer" writes into one end of a pipe,
while the " c o n s u m e r " empties it by reading from the other
end. Because UNIX handles details of buffering and syn:hronization, neither program needs explicit information
about the other's activities.
3.2 Finding Commands
The shell normally searches for commands in a way that permits
them to be found in three distinct locations in the file structure.
It first attempts to use the command name without modification,
then prepends the string " / b i n / " to the name, and then
"/usr/bin/". If the original command name is a simple one, the
effect is to search in order the current directory, "/bin", and
"/usr/bin". A more complex path name may be given, either to
locate a file relative to the user's current directory, or to access
one via an absolute path name.
This mechanism gives the user ,,onvenient execution of public commands and of commands in or " n e a r " the current directory, as well as the ability to execute any accessible command,
regardless of location in the file structure. The search order permits a standard command to be ret~laced by a user:s 'command
without affecting anyone else.
1. The text segment of a reentrant program is shared by all processes
executing that program. Almost all programs are reentrant.
2. As shown by a set of flag bits associated with the file.
170
3.3 Command Lines
3.4 Redirection of Input and Output
A series of commands separated by " ] " make up a pipeline. Each
command is run as a separate process connected to its
neighbor(s) by pipes, i.e., the output of each command (except
the last one) becomes the input of the next command in line. A
filter is a command that reads its input, transforms it in some
way, then writes it as output. A pipeline normally consists of a
series of filters. Although the processes in a pipeline are permitted to execute in parallel, they are synchronized to the extent
that each program needs to read the output of its predecessor.
Many commands operate on individual lines of text, reading a
line, processing it, writing it, and looping back for more input.
Some must read larger amounts of data before producing output;
sort is an example of the extreme case that requires all input to
be read before any output is produced.
When a command begins execution, it usually expects three files
to be already open, a "standard input," a "standard output," and
a "diagnostic output." When the user's original shell is started,
all three are opened to the user's terminal. A child process normally inherits these files from its parent. The shell permits them
to be redirected elsewhere before control is passed to an invoked
command.
An argument of the form " < f i l e " or " > f i l e " opens the
specified file as standard input or output, respectively. An argument of the form " > >file" opens the standard output to the
end of the file, thus providing a way to append data to it. In
either output case, the shell creates the file if it did not already
exist.
These forms of I/O redirection complement that of piping:
files and programs can both be used as data "sources" and
"sinks."
The following is an example of a typical pipeline:
nroff - m m
t e x t ] c o i l reform
In general, most commands neither know nor care whether
their input (output) is coming from (going to) a terminal, file, or
pipe. Commands are thus easily used in many different contexts.
A few commands vary their actions depending on the nature of
their input or output, either for etiiciency's sake, or to avoid useless actions (such as attempting random-access I/O on a terminal
or pipe).
Nroff is a text formatter whose output may contain reverse line
motions; col converts them to a form that can be printed on a
terminal lacking reverse motion, and reform is used here to speed
printing by converting the (tab-less) output of col to one containing horizontal tab characters. The flag " - m m " indicates one Of
many possible formatting options, and "text" is the name of the
file to be formatted.
3.5 Generation of Argument Lists
A simple command in a pipeline may be replaced by a command line enclosed in parentheses "( )"; in this case, another
instance of the shell is spawned to execute the command line so
enclosed. This action is helpful in combining the output of
several sequentially executed commands into a stream to be processed by a pipeline. The following line prints two separate
documents in a way similar to the previous example.
(nroff - m m
textl; nroff - m m
text2) [
coil
Many command arguments are names of files. When certain
characters are found in an argument, they cause replacement of
that argument by a sorted list of zero or more file names
obtained by pattern-matching on the contents of directories.
Most characters match themselves. The " ? " matches any single
character; the " * " matches any string of any characters other
than " / " , including the null string. Enclosing a set of characters
within square brackets "[...]" causes the construct to match any
single one of the characters in that set. Inside brackets, a pair of
characters separated by " - " includes in the set all characters lexically within the inclusive range of that pair.
reform
If the last command in a pipeline is terminated by a semicolon
(";") or new-line, the shell waits for the command to finish
before continuing execution. It does not wait if the command is
terminated by an ampersand ( " & " ) ; both sequential and asynchronous execution are thus allowed. An asynchronous pipeline
continues execution until it terminates voluntarily, or until it is
killed (by one of various means). A command line consists of zero
or more pipelines separated by Semicolons or ampersands.
For example, " * " matches all names in the current
" * t e m p * " matches any names containing "temp",
matches all names beginning with "a" through
" / u 0 / t n d s / t f / b i n / ? " matches all single-character names
"/u0/tnds/tf/bin".
This capability saves much typing, and more importantly, provides convenience in organizing files for various purposes. It
allows convenient use of large numbers of small files.
For example, the following command line is used to run timing tests on an empty system. Makeload is a cyclic shell procedure used to generate a heavy, repeatable load of disk accesses,
and testl performs timing tests on various prog~:ams. The shell
runs testl with no load on the system, then starts one makeload
to create a single unit of disk load for the second testl. Another
makeload is invoked to yield two units of load for the last test1.
testl; makeload &
testl; make load &
directory,
"]a-f]*"
" f ' , and
found in
3.6 Quoting Mechanisms
If a character has a special meaning to the shell, that meaning
may be removed by preceding the character with a backslash
( " \ " ) ; the " \ " acts as an escape and disappears. Sequences of
characters enclosed in double (") or single (.') quotes are in general taken literally, except that substitution of shell arguments
and variables is normally performed.
testl
Each makeload runs until explicitly killed by the .user. A
minimum of three processes are active by the time the final test1
is run (two makeloads and one testl). In this particular case, all
commands are implemented as shell procedures, so there is a
separate invocation of the shell for each of the five commands
on the above line, and each shell may well spawn hundreds of
additional processes. Thus, a single user may consume all system resources by creating large numbers of long-lived asynchronous processes) More typical uses o f " & " include off-line printing, background compilation, and generation of large jobs to be
sent to remote computers.
A " \ " followed by a new-line is treated as a blank, permitting
convenient continuation of multi-line commands.
4. USING THE SHELL AS A COMMAND: SHELL PROCEDURES
4.1 Invoking the Shell
The shell may be invoked explicitly in various ways:
sh "
3. Lockout of other users in this way occurs several times pei- year on PWB
systems; it is usualty caused by overly enthusiastic beginning users.
171
The new shell reads the standard input, but does
not prompt. This is often used to let the shell
act as filter, i.e., it can be used in a pipeline to
read and execute a dynamically-generated
stream of commands.
sh file [args]
A new instance of the shell is created to begin
reading the file. Arguments can be manipulated
as described in the next section.
file [args]
As noted in Section 3.1, if the file is marked
executable, and is neither a directory nor a load
module, the effect is the same as "sh file [args]".
of the variable. A common use is to capture the output of a program. For example, date writes the current time and date to its
standard output. The following line saves this value in $d:
date I = d
Thus, $d would be set to a value such as:
Tue Jul 13 19:06:02 EDT 1976
4.2 Passing Arguments to the Shell
A second use is in writing of interactive shell procedures, which
are heavily used in PWB work. The following example is part of a
procedure to ask the user what kind of terminal is being used, so
that tabs can be set, delays changed, and other useful actions
taken. The " < / d e v / t t y " indicates a redirection of the standard
input to the user's terminal; it is not seen as an argument to " = " ,
but rather causes the variable to be set to the next line typed by
the user.
When a command line is scanned, any character sequence of the
form $n is replaced by the nth argument to the shell, counting
the name of the file being read as $0. A procedure may possess
several different names and can check $0 to determine the
specific name being used, then vary its actions accordingly.
This notation permits up to 10 arguments to be referenced.
Additional arguments can be processed using the shift command.
It shifts arguments to the left; i.e., the value of $1 is thrown
away, $2 replaces $1, $3 replaces $2, etc. For example, consider
the file "loopdump" below. Echo writes its arguments to the
standard output; if, exit, and goto are discussed later, but perform
fairly obvious functions.
: loop
echo "terminal?"
= a </dev/tty
if "$a" = ti goto ti
if "$a" = hp goto hp
echo "$a no good; try ti or hp"
goto loop
: hp
... processing for terminal type " h p "
exit
: ti
... processing for terminal type "ti"
: loop
if "$1" = .... exit
echo $1 $2 $3 $4 $5 $6 $7 $8 $9
shift
goto loop
If the file is invoked by "loopdump a b c" it would print:
abc
bc
C
Currently, five variables are assigned special meanings:
The form "shift n " has no effect on the arguments' to the left of
the nth argument; the nth argument is discarded, and highernumbered ones shifted. Thus, shift is equivalent to ' ~ h ~ 1."
4.3 Shell Variables
Sn
records the number of arguments passed to the shell, not
counting the name of the shell procedure itself. Thus,
"sh file argl arg2 arg3" sets $n to 3. Shift never changes
the value of $n.
$p
permits alteration of the names and order of directory path
names used when searching for commands. It contains a
sequence of directory names (separated by colons) that are
to be used as search prefixes, in order from left to right.
The current directory is indicated by a null string. The
string is normally initialized to a value producing the effect
described in Section 3.2: " : / b i n :/usr/bin". A user could
possess
a personal directory of commands
(e.g.,
" / u 2 / p w / b i n " ) and cause it to be searched before the other
three directories by using:
Adding if and goto commands (described later) to the existing
facilities permits convenient expression of some kinds of procedures: repetitive ones that perform a given set of actions for
each argument and those that use simple conditional logic. Clear
expression of many procedures requires at least a few shell variables.
The PWB shell provides 26 string variables, $a through Sz.
Those in the first half of the alphabet are guaranteed to be initialized to null strings at the beginning of execution and are
never modified except by explicit user request. On the other
hand, some variables in the range $m through $z have specific
initial values, and may possibly be changed implicitly by the shell
during execution. As will be seen later, few shell procedures
ever use more than a few variables. A variable is given a value
as follows:
= p /u2/pw/bin : :/bin :/usr/bin
$r
gives the value of the return code of the most recent command executed by the shell. When the shell terminates, it
returns the current value of $r as its own return code.
Ss
is initialized to the name of the user's Iogin directory, i.e., the
directory that becomes the current directory upon completion of a login. Users can avoid embedding unnecessary
absolute path names in their procedures by using this variable. This is of great benefit when path names are changed
either to balance disk loads or to reflect project organizational changes.
$t
is initialized to the user's terminal identification, a single
letter or digit.
= letter [argument]
If argument is given, its value is assigned to the variable given by
letter. As an example of common usage, the procedure below
expects to be called with a list of file names, optionally preceded
by a flag " - w " . If the first argument is " - w " , the fact is
recorded by setting $a to "w", and the argument is shifted off
the argument list, leaving only file names. If the first argument
is not " - w " , $a is left unchanged, i.e., it is a null string.
if~"$1" = - w then
maw
shift
In addition to these variables, the following is provided:
endif
... code to process file names, using $a as needed
$$
If no argument follows the letter in the " = " command, a single
line is read from the standard input, and the resulting string •
(with the trailing new-line, if any, r e m o v e d ) becomes the value
172
contains a 5-digit number that is the unique process
number of the current shell. In some circumstances, it is
necessary to know the number of a process, in order to kill
it, for example. However, its most c o m m o n use to date has
been that of generating unique names for temporary files.
4.4 Extended Order of Search for Commands
All of the operators, flags, and values are separate arguments
to ~ and must be separated by blanks. The following are typical
argument-testing operations:
The user may request automatic initialization of each shell's Sp
by creating a fi.le named ". path" in the login directory. This file
Should contain a single line of the form shown for $p. Every
instance of the shell looks for this file and initializes its own Sp
from it, if it exists; otherwise " : / b i n : / u s r / b i n " is used. Thus,
the ". path" value propagates through all of the user's shells, but
changing $p in one shell has no effect on the $p of any other.
:
check a file argument to make sure it exists
if ! - r "$1" echo "can't read $1"
:
assure either that:
$1 is a and $3 is either b or c
:
or that:
$1 is d and $2 is e
if "$1" = a - a "(" "$3" = b - o "$3" = c ")" \
- o "$1" = d - a "$2" = e goto legal
This facility is heavily used in large projects. It greatly
simplifies the sharing of procedures, and can be quickly changed
to adapt to changing organizational requirements.
Recall that the effect of the " \ " at the end of the line is that of a
blank. It is generally desirable to quote arguments when they
might possibly contain blanks or other characters having special
meaning to the shell.
4.5 Control Structures
The more complex shell control structures are actually implemented as independent commands that cooperate with the shell,
but are not actually part of it. They are designed specifically for
use in shell procedures, but are treated as ordinary commands.
This separation of function allows the shell to remain a small
program, efficient for on-line use, but still able to support powerful control structures in procedures.
4.5.3 If-then-else-endif To improve the readability and speed of
shell procedures, / f was extended to provide the "if-then-elseendif" form. The general form is:
if expr then
... commands
else
... commands
endif
4.5.1 Labels and Goto. The command " : " is recognized by the
shell and treated as a comment. The most common use of .....
is to define a label to act as a target for goto. Goto is a separate
command. Using "goto label" causes the following actions:
The "else" and the commands following it may be omitted, and
it is legal to nest if's within the commands.
When tf is called with a command, using the form of Section 4.5.2, it acts as described there, directly executing (or not)
the supplied command. When called with then instead of a command, tf simply exits on a true condition, allowing the shell to
read (and interpret) the immediately following lines. On a false
condition, tf reads the file until it finds the next unmatched else
or endif, thus skipping it and the lines in between. Else reads to
the next unmatched endif, and endifis a null command.
1. A seek is performed to move the read/write pointer to the
beginning of the command file.
2. The file is scanned from the beginning, searching for
": n a m e " on a line, either alone or followed by a blank or a
tab.
3. The read/write pointer is adjusted (via the seek) to point at
the line following the labeled line.
Thus, the only effect of goto is the adjustment of the shell's file
read/write pointer to cause the shell to resume interpreting commands starting at the line following the labeled line.
These commands work together in a way that produces the
appearance of a familiar control structure, although they do little
but adjust read/write pointers.
4.5.2 I f
4.5.4 Switch-breaksw-endsw. The switch command manipulates
the input file in a way quite similar to /f. It is modeled on the
corresponding "switch" statement of the C language [RIT75Al,
and like it, provides an etticient multi-way branch:
if expr command [args]
I f is also a separate command. If the conditional expression expr
is found to be true, tf executes the command (via exec system
call), passing the arguments to it. If it is f a l s e , / f m e r e l y exits.
switch value
: labell
... commands
label2
... commands
The following primaries are used to construct the expression:
- r file
true if the file exists and is readable by the user.
- w file
true if the file exists and is writable by the user.
sl = s2
true if strings sl and s2 are equal.
sl != s2
true if the strings are not equal.
nl - e q n2
true if the integers nl and n2 are algebraically
equal. Other algebraic comparisons are indicated
by " - n e " , " - g t " , " - g e " , " - I t " , and " - l e " .
{ command }
the command is executed; a return code of 0 is
considered true, any other value is considered
false. Most commands return 0 to indicate successful complelion.
default
... commands
endsw
Switch reads the input until it finds:
•
•
•
These primaries may be combined with the following operators:
Again, from the shell's viewpoint, the only effect of switch is to
adjust the read/write pointer so that the shell effectively skips
over part of the procedure, then continues executing commands
following the chosen label or endsw.
unary negation operator.
-a
binary logical and operator.
-o
binary logical or operator: it has lower precedence
than " - a ' .
( expr )
parentheses for grouping. They must be escaped
(as \( or ")", for example) to remove their
significance to the shell.
value used as a statement label, or
"default" used as a statement label (optional), or
the next unmatched endsw command.
Value is obtained from an argument or variable; if the label
"default" is present, it must be the last label in the list; i.e., it
indicates a default action to be taken if value matches none of
the preceding labels. This construct may be nested; labels
enclosed by interior "switch-endsw" pairs are ignored.
173
The command breaksw reads the input until the next
unmatched endsw, and c o m m o n l y ends the sequence of commands associated with a label. Endsw is a null command like
endif.
The onintr command was added to solve these problems. It takes
three forms: "onintr label" causes the effect of "goto label" to
occur upon receipt of an interrupt; "onintr - " causes interrupts
to be ignored completely; "onintr" alone causes normal interrupt
action to be restored. A typical use of onintr is:
4.5.5 End-of-file and Exit. W h e n the shell reaches the end-of-file,
it terminates execution, returning to its parent the return code
found in Sr. The exit command simply seeks to the end-of-file
and returns~ setting the return code to the value of its argument,
if any. Thtis, a procedure can be terminated normally by using
"exit 0". The fact that exit is not part of the shell permits
straightforward use of it as an argument for
onintr cleanup
:
create temporary file
ls -ll tee temp$$a [ grep - c "^d" [ = d
grep - c "^-" temp$$a I = f
echo "directories: Sd, files: Sf"
: cleanup
rm temp$$a
4.5.6 The Missing Loop. Conspicuous by its absence is s o m e
form of while or do. All of the control structures described so far
are implemented outside the shell; it appears that any useful
looping construct requires significant changes to the shell itself.
In any case, the most frequently observed kind of loop is that
used to process arguments one at a time. For example, the following applies the first argument as a command to every remaining argument:
This procedure displays the numbers of subdirectories and ordi~
nary files in the current directory. The output of the ls command is a listing of the current directory; it is passed to tee,
which m a k e s a n extra copy of it in "temp$$a", but also passes it
to grep, which, in this instance, counts the n u m b e r of lines
whose first character is "d". This is the n u m b e r of subdirectories, and is saved in variable $d. The ordinary files (whose listing entries begin with " - " ) are counted in a similar way, and
both counts are displayed. If the process is interrupted by the
user, control transfers to "cleanup", where the temporary file is
removed.
: loop
if 052 - 0 exit
$1 $2
shift 2
goto loop
4.7 Additional Supporting Commands
The " 2 " causes shift to leave the first argument in place.
4. 7.1 Evaluation of Expressions. Expr supports arithmetic and logical operators on integers, and PL/I-like " s u b s t f ' , "length", and
"index" operators for string manipulation. It evaluates a single
expression and writes the result to the standard output, typically
piped into " = " to be assigned to a variable. Typical examples
are:
4.5.7 Transfer to Another Command File--Next. The command
" n e x t n a m e " causes the shell to abandon the current procedure
and begin reading from file name. Next with no argume~lts
causes the shell to read from the user's terminal. The idea of
next is to permit the use of a file to initialize variables for use at
the terminal:
:
increment Sa
expr Sa + 1 I = a
= a /u2/pw/mash/articles
= b "nroff-rT2 -mm"
:
strip off first 2 chars, of $1 and put result in Sb
:
expr substr abcde 3 999 returns cde
expr substr "$1" 3 9991 = b
. . .
next
.
If this text were stored in file "init", it could be invoked by
using " n e x t init", causing the current shell to process it and
return to the terminal. The user can then reference Sa and Sb
appropriately. The user could of course use " = " to accomplish
this directly, but at the cost of more typing.
:
obtain length of $1
expr length "$1" I = c
The most common uses of expr are counting (for loops) and
using "substr" to pick apart strings (such as the output from
date, as in Section 4.3).
Next is an attempt to obtain an effect like that of a subroutine
call with a shared environment. It handles some problems well,
but will probably be changed somewhat to make it more useful.
Its most c o m m o n application has actually been in very complex
procedures that analyze their arguments, set up variables, then
pass control to one of several successor procedures.
4. 7.2 Ec/~o. The echo command, invoked as "echo [args]", copies
its arguments to the standard output, each separated from the
preceding one by a blank, with a new-line appended to the last
argument. It is often used to perform prompting or issue diagnostics in shell procedures, to add a few lines to an output
stream in the middle of a pipeline, and to create editing scripts;
" \ n '''. yields a new-line and "\On" yields the ASCII character
given by the octal number n; " \ c " is used to get rid of unwanted
new-lines. For example, the following code prompts the user for
some input and allows the user to type on the same line as the
prompt:
4.6 Interrupt Handling in Shell Procedures
Many PWB users have taken advantage of the ease and speed of
writing shell procedures to automate various operations. In
many cases, such procedures need to be used by clerical personnel who have no knowledge of these procedures' inner workings.
A terminal interrupt (depression of " r u b o u t " or "del" key) can
be.ignored or intercepted by a compiled program, or can cause
termination of that program. The lack of interrupt-handling
facilities in the shell quickly led to the usual p r o b l e m s :
•
•
•
echo "enter name:\c"
= a </dev/tty
No procedure could use a terminal interrupt as a control
haechanism.
Any procedure that created files for temporary use left them
in existence if interrupted before it could remove them. In
practice, any procedure that prints very much information is
likely to be interrupted sooner or later.
Some procedures need to temporarily ignore interrupts so
they can guarantee consistency among files making up data
bases. The PWB supports a profusion of packages that consist
of file groupings a.ccessed only through shell procedures.
4.8 Creation of Shell Procedures
A shell procedure can be created in t w o s i m p l e steps. The first
step is that of building an ordinary text file, normally using the
UNIX text editor ed. The second step is that of changing the
mode of the file to make it executable, thus permitting it to be
invoked by " n a m e args", rather than "sh name args", as discussed in Section 3.1. The second step may be omitted for a procedure to be used once or twice and then discarded, but is
recommended for longer-lived procedures.
174
The following shows the entire user input needed to set up a
simple procedure to format text files according to a standard format and print the output on a particular type of terminal:
5. PATrERNS OF USAGE
5.1 Survey Methodology
A survey of PWB shell procedures was conducted, with the following goals:
ed
a
nroff - r T 1 - r C 3 - m m $1 $2 $3 $4 $5 $6 $7 $8 $9 1 gsi +12
•
w draft
q
chmod 755 draft
•
In the sequence above, the user called the text editor, entered a
single line of text, wrote that line (creating the new file "draft"),
and finally changed its mode to make it executable. The uffer
may then invoke this command as "draft filel file2", for example. The procedure calls the formatter with certain fixed arguments and any others supplied by the user; the formatter output
is passed to gsi to convert it to a form that is appropriate for the
user's terminal (in this case, a GSI-300).
•
A program was written to find accessible shell procedures and
print them for analysis by the author. Two facts made this a
simple program. First, shell procedures are easy to manipulate
because they are stored as simple text files. Second, the UNIX
directory structure supports simple methods of traversal that permit easy investigation of all accessible files.
If the sequence above were performed in a directory ipcluded
in the user's ". path" file,-Ihe user could change directories and
still use the "draft" command. Other people might makeluse of
it also, especially if it were placed in a shared directory of commands.
The files summarized in the next section represent a sample
taken over a user population of more than 500 people. The data
was collected in March, 1976.
The command sequence above could itself be stored as a shell
procedure, although this particular sequence is an unlikely candidate for such an action. Note that the five lines following the ed
call are processed by ed rather than the shell. It is quite reasonable to include data for other programs inside shell procedures,
as long as those programs are careful in their method of reading,
i.e., do not read beyond their own data. This- method has
beneficial results for performance, because I/O buffers can be
shared without need for separate temporary files.
5.2 Survey Results
A total of 1,725 procedures was analyzed. Table 1 summarizes
the different forms of overall structure.
TABLE 1. Control Flow Summary
Category
Shell procedures may be dynamically created by other shell
procedures. A procedure may generate .a file of commands,
invoke another instance of the shell to execute that file, then
remove it. An alternate approach is that of using next to make
the current shell execute the new command file, allowing use o f
existing values of shell variables and avoiding the spawning of an
additional process for another shell. In some cases, the use, of a
temporary file may be eliminated by using .the shell in a pipeline.
For example:
Is a* I sod "s/.*/cp & x&/"
1: single line
2. straight-line
3. argument loop only
4. bi'anching, no loops
51 more complex
Number
369
935
83
201
137
Percentage •
21
54
5
12
8
Procedures in category 1 consist of a single command line. Procedures in category 2 possess no control logic, but contain more
than one line of text. The distribution of lines per procedure
clearly favored small files, with a long "tail" of larger ones. This
indicates that users find it helpful to " c a n " even one-line
sequences. The large number of small procedures found on the
PWB is at least partly due to the ease of creating them.
I sh-
The output of Is is a list of all file names in the currem directory
whose names begin with "a", one per line. Sed (a "stream editor") converts each line of the form " n a m e " into the form
"cp name xname~, 4 and passes it to a s h e l l to be interpreted. A
copy of each named file is generated under the name prefixed by
~¢X ~'.
Discovery of procedures that could easily be redone in better
ways, mainly to help users learn better ways of using existing
tools, but also to improve system performance.
Analysis of usage patterns, in order to improve existing PWB
tools or build new ones. Rearrangements of command functions often occur when people recognize that the functions
should be combined in different patterns. Examining user
code is a good way to discover real needs.
Determination of overall properties of procedures ~to help in
redesign or enhancement to the shell and its supporting commands. This is part of an educational process aimed at putting existing tools to better use.
Thus, most procedures (75%) contain no control logic, but
consist instead of straight-line scripts. The remaining 25% contain significant use of control logic, This percentage Will probably increase with time, as more people become familiar with the
shell's programming ability. In any case, it does indicate a
significant need for control structures not provided conveniently
in many CLS.
..
Implied in the above discussion are several reasons why users
like shell procedures better than compiled code. First, it is trivially easy to create and maintain a shell procedure, since it is only
a file of ordinary text. Second, it has no corresponding object
program that must be generated and maintained. Third, it is
easy to create procedures "on the fly," use them, and remove
them, without having to worry about managing libraries or about
allocating disk storage. Finally, because such procedures are
short in length, written at a high programming level, and kept in
their source-language form, they are generally easy to find,
understand, and modify.
Procedures in category 3 are those whose Control flow consists only of a single loop for processing the argument list, one a t
a time, such as:
: loop,
"if :"$1" = .... exit
... commands
shift
g o t o loopProcedures in category 4 are those possessing conditional branching, but no loops. Those in category 5 thus have at least one
loop and at least one conditional branch in addition to the one
implementing the loop.
4. Cp copies the file named by its first argument Onto that named by the
second, creating the latter, if necessary.
175
Each entry in Table 2 gives the number of shell procedures
and the percentage of the total in which the specified construct
was found. These figures were obtained by visual inspection of
the files, and should probably be taken as lower bounds on the
number of occurrences.
Multi-person projects often possess several sets of directories
used to store shared procedures, and individuals may well have
their own directories in addition. For example, project "tnds"
may use " / u 0 / t n d s / b i n " as a repository for procedures needed by
the entire project. Group "tf" within "tnds" • may use
" / u 0 / t n d s / t f / b i n " for procedures needed only by the group, and
individual "jtb" might use another directory for personal commands. The ". path" file used in this case might contain:
TABLE 2. Occurrence of Shell Constructs
Shell Construct
switch (or obvious use for it)
non-shell commands in file
pipe into shell
(command I sh - )
parenthesized commands
(i.e., implicit sh call)
explicit sh calls or obvious
shell procedure calls
= a </dev/tty
(read line from terminal)
command [ = a
(assign output of
command to variable)
= a string
(assign string to
variable)
onintr label
(intercept interrupt)
onintr (ignore interrupt)
expr substr
expr + or next
Number
Percentage
45
514
32
3
30
2
39
2
170
14
81
5
124
7
:/ u O / t n d s / t f / j t b / b i n :/ u O / t n d s / t f /b in :l u O / t n d s / b i n :/ b i n :/ u s r l b i n
In general, people use this type of sharing mechanism to adapt
the system to their own organizational needs.
5.4 Command Usage
Although the survey concentrated on static analysis, a few
dynamic usage statistics may be of interest. Each of the larger
PWB computers execute 25,000-40,000 commands per day.
Approximately 80% of these commands are executed within shell
procedures. These figures provide clear evidence that people
utilize the shell programming capabilities to a very large degree.
6. AN ASSESSMENT
146
8
35
2
14
1
33
24
13
2
1
1
In the environment described in this paper, advances occur in
three distinct ways: improving old tools, building new ones, and
improving the framework used to put them together. Work is
under way in all three areas, including most of the following:
•
•
•
•
•
Addition of while to the shell.
Creation of a "high-speed" shell that includes the bulk of the
control structure code.
Addition of different methods of manipulating pipes. For
example, one would like to use programs that can have
several input or output pipes, rather than only one of each.
Cleanup of the overall syntax, which has its ugly aspects.
Better ways of handling variables, including more powerful
default and substitution mechanisms.
Commands to perform argument processing without loops.
The figures indicate the frequent intermingling (30%) of shell
and non-shell commands in the same file, utilizing the fact that
the shell and other programs share the same input. This intermingling occurs most often in the straight-line scripts. At least
14% of the procedures invoke another level of shell, in one or
more of the three ways listed. When CL subroutines are available, people do make use of them, and tend to become
extremely irritated with any CL that does not provide them. At
least 5% of the procedures expect to write prompts to the user
and read from the terminal; the figure would be much higher if
it included prompts issued by programs other than the shell.
•
Despite the fact that expr supports multiplication and division,
no serious uses of these operations were found. 5 From observa-
7. CONCLUSIONS
It is clearly recognized that the PWB CL does not do everything
that its users would like it to do, so it is likely to continue evolving at a rapid rate. This paper has attempted to present a consistent view of the PWB CL as it has existed for a relatively long
time (about nine months). It has emphasized the description of
existing facilities whose usage patterns are well known~ rather
than the forecasting of things to come.
This paper has described a minicomputer-based CL that is more
powerful and convenient to use than many of those available on
much larger systems. It is a real CL used daily by people doing
production work under tight schedules. Although much work
remains to be done On it, it has definitely shown its worth as a
programming language for many applications, and has made
major contributions to the ability of users to get their work done
with a minimum of effort and irritation.
tion of the procedures, it is clear that users are doing much more
string manipulation than arithmetic. Although tf can perform
arithmetic comparisons, the bulk of the operations performed by
tf were string comparisons. Likewise, the string operations of
expr occurred more frequently than the arithmetic ones. It
appears that CL arithmetic is a helpful facility, but string manipulation is a necessity, an ordering contrary to that of some existing
CLS [COL76AI.
5.3 Informal Observations
ACKNOWLEDGEMENTS
In many cases, a shell procedure is kept in a directory with the
files that it manipulates, and is never used except in that directory. This is convenient in practice, and allows the user to forget
about possible naming conflicts with procedures in other directories. It also supports the c o m m o n practice of packaging related
files in a separate directory.
The shell was originally written by K. Thompson; that it has
been so adaptable to a wide variety of uses and machine
configurations is a great tribute to its creator. Many of the additions were implemented by R. C. Haight and A. L. Glasser.
REFERENCES
All references cited in this paper appear at the end of "An IntroWorkbench. '" by Dolotta, T. A., and
Mashey, J. R., in these Proceedings.
5. Multiplication was used in two procedures, but they were ignored in the
survey because they were obviously "'play'~ procedures, i.e., two versions of
a procedure to compute factorials, one iterative and the other recursive.
duction to the Programmer's
176
Download