Introduction to Grid Engine

Workbook
Edition 8
January 2011
Document reference: 3609-2011
Introduction to
Grid Engine
for ECDF Users
Author: Brian Fletcher, ITI Unix Section
Previous title: Introduction to Sun Grid Engine
CONTENTS
ABOUT THE COURSE
MODULE 1: BASIC INTRODUCTION TO GRID ENGINE
  WHAT IS A GRID?
  THE ECDF CLUSTER (EDDIE)
  WHAT IS GRID ENGINE?
BASIC JOB SUBMISSION
  TWO WAYS OF RUNNING GRID ENGINE
THE COMMAND-LINE INTERFACE
  QSUB
  QSTAT
  QACCT
  QALTER
  QDEL
  PRACTICAL EXERCISE 1
THE GRAPHICAL INTERFACE
  QMON
  PRACTICAL EXERCISE 2
REQUESTING RESOURCES
  HOW TO REQUEST RESOURCES
  RUN TIME
  MEMORY AND PROCESSORS
  REQUESTING RESOURCES USING QMON
  RESERVING RESOURCES
PARALLEL JOBS
  PARALLEL ENVIRONMENT
  ARRAY JOBS
TROUBLESHOOTING
  LOGGING ON
  WHY DOES MY JOB NOT START
  LIMITS
  MORE QSTAT OPTIONS
  WHY DOES MY JOB STOP UNEXPECTEDLY?
DOCUMENTATION
MODULE 2: INTRODUCTION TO SHELL SCRIPTS
  INTRODUCTION
  MAKING SHELL SCRIPTS
  SYNTAX RULES
  SHELL VARIABLES
  COMMAND LINE ARGUMENTS
  PRACTICAL EXERCISE 1
  SHELL ARITHMETIC
  PRACTICAL EXERCISE 2
  CONDITIONAL STATEMENTS
  TEST
  PRACTICAL EXERCISE 3
  LOOPING
  PRACTICAL EXERCISE 4
  READ
  CASE
  PRACTICAL EXERCISE 5
  ANSWERS TO PRACTICALS
DOCUMENTATION
About the Course
This course consists of two modules which may be run together, or
individually as separate courses. The first module covers basic job
submission using Grid Engine, and the second covers writing Unix shell
scripts, based on material from the Edinburgh University Information Services
Training Course.
Module 1
Prerequisites:
Knowledge of Unix equivalent to at least Unix 1
Some experience of shell scripting would be an advantage, but is not essential.
Learning goals
By the end of the course you should:
understand the concept of a Grid
know about ECDF from a user viewpoint
understand what a scheduler is
know how to submit jobs to the ECDF
Module 2
Prerequisites:
Knowledge of Unix equivalent to at least Unix 1
Learning goals
By the end of the course you should:
be able to write simple Shell scripts
This workbook
This workbook will lead you through both modules, with practical sessions on the
way.
Module 1: Basic Introduction to Grid Engine
What is a grid?
• A collection of computing resources
• Cluster grids
• Campus grids
• Global grids
The Grid Engine User’s Guide defines a grid as “a collection of computing
resources that perform tasks. In its simplest form, a grid appears to users as
a large system that provides a single point of access to powerful distributed
resources.”
There are three main classes of grids. In order of increasing size, they are
Cluster grids, Campus grids and Global grids.
A Cluster Grid is a number of computers from a single organisation or
department which have been set up to work together.
A Campus Grid extends this so that many departments from the same
organisation can share computing resources.
A Global Grid is a collection of Campus Grids which enables many
organisations to create very large systems.
The ECDF Cluster (Eddie)
The “Compute” part of the Edinburgh Compute and Data Facility (ECDF) is a
purpose-built cluster grid known as Eddie. It was installed in 2007 in two
phases. Phase 1 (replaced in 2010) consists of 128 nodes and Phase 2
(which will be replaced in 2011) consists of 118 nodes. Each node contains
eight cores (or processors), making a total of 1968 cores altogether.
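The core count quoted above can be checked with a line of shell arithmetic (covered in Module 2). This is only a quick sanity check of the figures in the text; the variable names are illustrative.

```shell
#!/bin/sh
# Node counts as stated in the text above
phase1_nodes=128
phase2_nodes=118
cores_per_node=8

# Total cores = (all nodes) x (cores per node)
total_cores=$(( (phase1_nodes + phase2_nodes) * cores_per_node ))
echo "Total cores: $total_cores"
```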
What is Grid Engine?
Grid Engine is a Scheduler. To understand what this is, a little preamble is
necessary.
ECDF runs a “Batch System”. In practice, this means that a job submitted
by a user will wait in a queue until the resources it requires are available.
Usually it will not have long to wait as many jobs can be running at the same
time. See the section “Basic Job Submission” for more information on this.
Obviously there has to be some way for the system to know what resources
each job in the queue needs, and to match this with those currently available
before releasing jobs for execution.
It is the job of the Scheduler to manage these resources before, during
and after a job’s journey through the system. In brief, a scheduler will:
• Orchestrate allocation of resources
• Accept jobs (user requests for resources)
• Send jobs to worker nodes
• Manage running jobs
• Log a record of each job
More about this will be covered in the section “Basic Job Submission”.
The term “job” in Grid Engine refers to a shell script containing the
commands you wish to run. If you do not know how to write shell scripts, see
the section “Introduction to Shell Scripts”.
Basic Job Submission
Two ways of running Grid Engine
Grid Engine can be run in either of two ways.
The first is by typing Unix-like commands to specify resources and submit
jobs. The commands we will look at are:
• qsub to submit a job to the queue, and request resources
• qstat to examine a job’s current status
• qacct to get a job’s accounting information
• qalter to change the attributes of pending jobs
• qdel to delete jobs from the queue
The second way is to use a graphical interface, initiated with the qmon
command. We shall look at each of these in turn.
The Command-line Interface
qsub
This command sends a job, in the form of a shell script, to the Grid Engine
queuing system. It does not accept executable files.
In its simplest form, it would be a command something like:
qsub myscript.sh
where myscript.sh is a file containing the shell script you wish to run.
Some options
-o filename          Send output to the file called filename
-e filename          Send error output to filename
-cwd                 Run job in current working directory (default is home
                     directory)
-m                   followed by b, e, a or s (or any combination) sends mail
                     message when job begins, ends, aborts, or is suspended
-M user@host         email address to be used for the -m option
-v variable[=value]  defines or redefines an environment variable to be
                     passed to the job. The = sign and value are optional.
-V                   specifies that all environment variables are to be passed to
                     the job
-j yes               Write output and error messages to the same file
-l h_rt=h:m:s        required runtime in hours:minutes:seconds (see the
                     “Run Time” section)
Many more options are available. See the manual page for details.
Qsub options can also be embedded in shell scripts, instead of on the
command line. See the Introduction to Shell Scripts chapter for how to do
this.
Examples
qsub -cwd -o outfile -e errfile myscript.sh
qsub -V -o outfile -m bae -M me@myhost myscript.sh
qsub -v TERM myscript.sh
qsub -l h_rt=30 myscript.sh
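As a sketch of the embedded-option form, the snippet below writes a small job script in which each qsub option sits on its own line beginning with #$. The file name myscript.sh and the particular options chosen are only illustrative. The last line prints the submission command rather than running it, so the snippet can be tried on a machine without Grid Engine.

```shell
#!/bin/sh
# Create an example job script (illustrative name: myscript.sh).
# Lines beginning "#$" carry qsub options; to the shell they are comments.
cat > myscript.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -o outfile
#$ -e errfile
#$ -l h_rt=0:10:0
hostname
date
EOF

# On Eddie the script would then be submitted with:
echo "qsub myscript.sh"
```

With the options embedded in this way, the qsub command line itself needs nothing more than the script name.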
qstat
Once a job has been submitted, its current status can be examined with the
command qstat, as long as the job has not finished. Information on finished
jobs can be obtained using the qacct command (see below).
In its simplest form, it would be a command something like:
qstat
and will produce a list of all pending or running jobs belonging to you, similar
to the following. Note that each job has a job-ID number, such as 2005034 in the
example below. The most common values for the state column are r for a job
which is currently running, or qw for one which is queued and waiting. The
letter E in the state column, such as Eqw, indicates that the scheduler has
detected an error state. Examine your script to see whether you have
requested resources incorrectly. See the Troubleshooting section.
job-ID  prior   name       user  state submit/start at     queue  slots ja-task-ID
----------------------------------------------------------------------------------
2005181 0.00000 PC_0.1-NP_ bjf   qw    05/17/2010 11:53:49        1
2005034 0.06429 simple.sh  bjf   r     05/17/2010 11:53:03        1
Some options
-u username   Show only jobs for username
-u "*"        Show jobs for all users
-f            Full listing by node
-j job-ID     Detailed information on pending/running jobs, including
              reasons for a job being rejected.
qacct
Qacct is a command which scans the accounting file and will produce
accounting information in a variety of ways. The accounting file is not
structured in any way, and accessing it can be wasteful of resources.
qacct should only be used if necessary.
To get information for a particular job, the command is
qacct -j job-ID
The -j option causes qacct to produce its output in a different format from the
other options below.
Some other options
-o owner name     gives summary of system use by that owner (user)
-o (on its own)   gives data for all users (not recommended)
-g group name     gives summary of system use by members of a group
The qacct command on its own will produce total figures.
Examples
qacct -o bjf
qacct -g scisup
qalter
This command can be used to modify attributes of a job once it has been
submitted and is in the Pending queue (see Note 2 below). The attributes to
be modified are specified as options to the qalter command. For example,
the following command will alter the stdout file for job 1234567 to be the file
junk:
qalter -o junk 1234567
The system should reply with a message similar to the following:
modified stdout path list of job 1234567
Some other options
-e              modify pathname for stderr
-m              followed by b, a, e (see qsub above) or n for no mail
-M              change mail address to which mail will be sent
-N              change name of job
-l h_rt=H:M:S   change runtime
For a full list see the man page for qalter.
Notes
1) If, for any reason, one of the specified attributes cannot be modified as
requested, then none of the attributes will be modified.
2) Although many attributes can be altered while a job is running, in some
cases these changes will not take effect until the job is re-run.
3) Qalter also appears as a button on the qmon Job Control windows for
Pending and Running Jobs (see below).
qdel
The qdel command is used to remove jobs from the pending queue. You
might want to do this if you have submitted the wrong job or wish to withdraw
it for any reason. The syntax is
qdel job-id
where job-id is the number of the job you wish to delete, as given by qsub or
qstat.
Note that unlike qstat and qacct, qdel does not
require -j before the job-id.
If a job fails to delete, you can use the -f flag to force deletion:
qdel -f job-id
Practical Exercise 1
• Log in to eddie, using your UUN and EASE password.
• Use an editor to create a text file containing the following simple script:
#!/bin/sh
#$ -o junk
hostname
#
# print date and time
date
The output from this script will contain the name of the host on which
the script is running, and also the current date and time. It will send its
output to a file called junk.
• Submit this job using the qsub command. Unless the job runs very
quickly, you should be able to check its progress using qstat. If it's
running too quickly, try adding a sleep command to give you more time.
• Submit the job again, using some other qsub options, such as -m
• If you are confident with directories, experiment with the -cwd option
• Try adding qsub options to the script, instead of just the command line.
• Modify the file in any way you like.
You can also delete jobs from the queue by using the Delete button in the
qmon Job Control window (see below).
The Graphical Interface
qmon
The qmon command provides a graphical interface for submitting jobs. It is
typed on the command line in the same way as qsub.
It is a good idea to issue this command in background mode (by typing
qmon&) so that it will continue running while you use foreground mode to
work with your scripts and files.
The first window to open is the Grid Engine “splash” window, followed by the
qmon Main Control window.
The buttons on this window allow you to submit jobs, examine the queue, and
control many other aspects of Grid Engine. The button third from the left on
the top row is “Submit Jobs”, which you would use to tell qmon which
script you want to run.
Submitting Jobs using qmon
Click the Submit Jobs button in the Main Control window. The following
window will appear.
Ensure the General tab is selected.
You can specify the script you wish to run by typing its name in the box
labelled Job Script, or you can browse for it by clicking the button to the right
of the box. Your job should also be given a name, so that it can easily be
located when checking its status (see below). By default, the system uses
the name of your script file, but you may change this if you wish. Also, you
can type the names for your job’s stdin, stdout and stderr files in the relevant
boxes. (These can also be specified in the command line or shell script, and
the boxes here will be filled in automatically.) To run the job, click the Submit
button in the right-hand panel. To close this window, click the Done button.
The Advanced tab
Clicking the Advanced tab will give a host of other options. The most useful
will probably be Mail, in the middle column. This allows you to receive an
email when your job starts, ends, is aborted or suspended.
Simply check the boxes next to the options you require, and enter your email
address in the Mail To box. These options can also be specified using the -m
option in the qsub command line. See above for more details. The
examples below show typical mail content for started and completed jobs.
Job 2039014 (course1.sh) Started
 User       = bjf
 Queue      = ecdf
 Host       = eddie315.ecdf.ed.ac.uk
 Start Time = 12/03/2010 15:52:16

Job 2039014 (course1.sh) Complete
 User           = bjf
 Queue          = ecdf@eddie315.ecdf.ed.ac.uk
 Host           = eddie315
 Start Time     = 12/03/2010 15:52:16
 End Time       = 12/03/2010 15:52:16
 User Time      = 00:00:00
 System Time    = 00:00:00
 Wallclock Time = 00:00:00
 CPU            = 00:00:00
 Max vmem       = NA
 Exit Status    = 0
Checking job status in qmon
To view your job’s status, go back to the Main Control panel and click the top
left button (Job Control). The Job Control window will open.
If your job has not yet started executing, you should be able to see it in the
Pending Jobs list, by clicking the appropriate tab. If it has started running, it
will be in Running Jobs, and if it has run and terminated, you will find it in
Finished Jobs. The figure above shows a typical Finished Jobs display.
To make it easier to find your job, you can sort the list by any of the columns
by clicking the appropriate heading (Jobid, Priority, Owner etc). Clicking the
same heading a second time will reverse the order.
Pending jobs can be removed from the queue by highlighting the job and
clicking the Delete button in the right-hand panel. Running jobs can similarly
be suspended or deleted. You cannot delete finished jobs from the list. They
will remain there until cleared by the system.
The Job Control window can be closed by clicking Done.
Practical Exercise 2
 Repeat exercise 1 but this time use the qmon interface instead of qsub.
Requesting Resources
How to request Resources
The qsub command described above has options which allow you to specify
the time limit for your job, the number of CPUs you need, and the memory
requirements. These resources are specified using the -l option (lower
case L) followed by the resource and quantity desired.
It is also possible to request resources via the qmon graphical interface. Both
methods are described below.
Run Time
You can set a time limit for your job using a command similar to
qsub -l h_rt=H:M:S jobscript.sh
where H:M:S is the time limit in Hours:Minutes:Seconds, and jobscript.sh is
the shell script that you wish to run. If any of the values is zero, it can be
omitted as long as the colon remains. A single value, with no colons, will be
treated as a number of seconds.
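The run-time format described above can be illustrated with a small helper that converts a value into seconds. This is a sketch of the rules as stated in this workbook (three colon-separated fields with empty fields counting as zero, or a single number of seconds), not a copy of Grid Engine's own parsing. The function name hms_to_seconds is made up for the example.

```shell
#!/bin/sh
# Convert a run time written as H:M:S, or as plain seconds, into seconds.
# Empty fields (as in "::30") are treated as zero.
hms_to_seconds() {
    echo "$1" | awk -F: '
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }  # H:M:S
        NF == 1 { print $1 + 0; next }                    # seconds only
        { exit 1 }                                        # any other shape is rejected'
}

for t in 1:30:00 ::30 1:: 30; do
    echo "$t -> $(hms_to_seconds "$t") seconds"
done
```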
There are four different runtime limits for various groups of nodes on Eddie.
These are currently set to 30 minutes, 6 hours, 24 hours and 48 hours. You
should get into the habit of specifying your required run time when you submit
your job. If you do not do this, your job will go into the first available slot,
whatever the maximum run time of that slot is, and could therefore be limited to
a 30 minute maximum run time.
Memory and Processors
Memory
By default, the processing core running your job will have a memory
allocation of 2GB. If your job requires more than 2GB, you should proceed
as described below. Failure to do this will result in your job being killed when
it becomes larger than 2GB.
Within Eddie, memory is allocated in “slots” of either 2GB or 6GB. To
allocate, say, 4GB to your job, there is a parallel environment (pe - see later)
known as memory_2G which is used as follows:
qsub -pe memory_2G 2 myjob.sh
The number after memory_2G is the number of 2GB slots required, so 2 will
give you 4GB, 3 will give you 6GB etc. The maximum number of slots you
can request is 8. The environment memory_6G is used in an analogous way.
Note that there are more 2GB slots available than 6GB slots, so your jobs
should be released more quickly if you specify your requirements in terms of
memory_2G rather than memory_6G.
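The slot arithmetic above can be sketched as a small shell function that rounds a memory requirement up to a whole number of slots. The function name memory_slots is hypothetical, and applying the limit of 8 slots to both slot sizes is an assumption based on the statement that memory_6G is used in an analogous way.

```shell
#!/bin/sh
# Print the number of slots needed for a given amount of memory in GB.
# Usage: memory_slots GB [SLOT_SIZE_GB]   (slot size defaults to 2)
memory_slots() {
    gb=$1
    slot_gb=${2:-2}
    # Integer division rounded up, so 5GB in 2GB slots needs 3 slots
    slots=$(( (gb + slot_gb - 1) / slot_gb ))
    if [ "$slots" -gt 8 ]; then
        echo "memory_slots: $gb GB needs $slots slots, more than the maximum of 8" >&2
        return 1
    fi
    echo "$slots"
}

# Build the qsub command for a 4GB job (printed, not run)
echo "qsub -pe memory_2G $(memory_slots 4) myjob.sh"
```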
Processors
More than one processor will be required if you are doing true parallel
processing - the most basic environment for this is called "OpenMP".
To make use of more than one CPU core on a node, use OpenMP or one of
the OpenMPI parallel environments by adding one of the following options to
your qsub command:
-pe OpenMP n
where n is the number of CPU cores you want, up to a maximum of 8. If you
request more than this, your job cannot run.
-pe openmpi_smp8_mark2 n
where n is the number of CPU cores you want, which must be a multiple of 8,
up to a maximum of 1024 on 128 worker nodes.
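The limits just described can be checked before submitting. The following is a minimal sketch; the function name check_slots is made up for the example and it only knows about the two environments discussed in this section.

```shell
#!/bin/sh
# Succeed (exit status 0) if N is a valid slot count for the named PE,
# according to the limits described in this section.
check_slots() {
    pe=$1
    n=$2
    case "$pe" in
        OpenMP)
            # 1 to 8 cores on a single node
            [ "$n" -ge 1 ] && [ "$n" -le 8 ]
            ;;
        openmpi_smp8_mark2)
            # a multiple of 8, up to 1024
            [ "$n" -ge 8 ] && [ "$n" -le 1024 ] && [ $(( n % 8 )) -eq 0 ]
            ;;
        *)
            # PE not covered by this sketch
            return 2
            ;;
    esac
}

if check_slots OpenMP 10; then
    echo "OpenMP 10: accepted"
else
    echo "OpenMP 10: too many cores for one node"
fi
```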
Requesting resources using qmon
Run time
This can be set by clicking the large Request Resources button from the
General tab of the Submit Job window. The window shown below will
appear.
Ensure the Hard Request option is lit, then double-click on h_rt in the list of
resources. A further window will appear in which you can choose the
required time limit for your job. Do not click the Infinity button, and remember
that the maximum time for a job on Eddie is 48 hours. Remember also that if
you do not specify a time limit, your job may end up in a queue whose limit is
30 minutes, and this may not be enough for your job to complete.
Hard Resources are those which must be allocated before a job can be
started. Soft Resources are requested on a best-effort basis: the job can
start even if they are not available.
Memory and processors
Parallel Environments such as “memory” and “OpenMP” can also be selected
using qmon. Click the Advanced tab on the Submit Job window and in the
top left you will see a box labelled Parallel Environment. Click the button to
the right of this box, and a menu will appear from which you can select the
required PE. The name of the PE will appear in the box, along with a number
(defaulted to 1). This is the number of units selected, and can be changed as
required.
Reserving resources
If you have a big parallel job which won't run because the cluster is too busy,
use resource reservation. Put the
-R y
option into your qsub command.
Parallel Jobs
Parallel Environment
Parallel Environment (PE) is a Grid Engine software package that enables
parallel computing. By selecting a Parallel Environment, users can make use
of various Message Passing Interface (MPI) libraries installed on the cluster.
All these MPI libraries have already been tightly integrated with Grid Engine
to provide hassle-free initialisation with reliable slave process control and
correct accounting information.
Which Parallel Environments are available?
You can view the parallel environments currently available with the command
$ qconf -spl
The following valid PEs exist on Eddie at the time of writing:
OpenMP
infinipath
memory
memory-2G
memory-6G
mpich2
openib
openib-smp
openib-smp8
openib_smp8_qdr
openib_smp8_sdr
openmpi
openmpi-smp
openmpi-smp8
openmpi_fillup_mark1
openmpi_fillup_mark2
openmpi_smp8_mark1
openmpi_smp8_mark2
qlogic-mpi
qlogic-mpi8
qlogic-mpi_smp8_qdr
qlogic-mpi_smp8_sdr
The -smp suffix (Shared Memory Programming) tells the scheduler to
guarantee that every 4 slots are allocated on a single node to provide the
"SMP Cluster" Shared Memory environment. These PEs only accept
requests for slots in multiples of 4. The -smp8 suffix acts similarly for 8 slots.
The side effect is that your job will probably need to wait longer in the queue
until there are sufficient empty nodes available.
Using Parallel Environments
A Parallel Environment is selected when submitting a job to Grid Engine
using the -pe flag for the qsub command. Only one PE can be selected per
job. You specify the name of the Parallel Environment and the number of
slots to request:
$ qsub -pe <pe_name> <slots> -l h_rt=h:m:s <job_script>
for example
qsub -pe OpenMP 8 -l h_rt=16:00:00 hello.sh
Tutorials
For more information, worked examples and tutorials on using PEs, see the
links under Running Parallel Jobs on the ECDF wiki page
https://www.wiki.ed.ac.uk/display/ecdfwiki/Documentation
Array Jobs
Sometimes, you want to run a number of mostly identical jobs with the only
difference being input parameters or data sets. Instead of submitting each as
an independent job, you could submit an Array Job. Grid Engine provides this
feature to help users to easily manage a job series with one command. Array
jobs also place significantly lower load on the system than would otherwise
be the case.
You may have done the following operations before:
$ qsub job.sh data.1
$ qsub job.sh data.2
...
$ qsub job.sh data.100
Writing a script to submit in a loop is one option. But you may also need to
prepare the loop script to stop or cancel those jobs once you find something
wrong after the jobs start running. Such a script can also place excessive
load on the queueing system. Using an Array Job can help you to avoid these
issues as you can submit, stop and delete jobs with just one command.
Use the following command to submit an array job:
$ qsub -t 1-100 job.array.sh data
Where the file job.array.sh looks like this:
#!/bin/sh
job.sh $1.$SGE_TASK_ID
This will schedule 100 jobs, with each one being identical except for the data
input being data.number, with number counting up from 1 to 100.
$1 represents the string passed on the command line (in this case, "data"),
and $SGE_TASK_ID represents the counter.
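The way $1 and $SGE_TASK_ID combine can be tried without a scheduler by setting the counter by hand. In the sketch below the loop stands in for Grid Engine, and the function task_input is a hypothetical helper that builds the file name in the same way as job.array.sh.

```shell
#!/bin/sh
# Build the input file name for one task, as job.array.sh does
# with $1.$SGE_TASK_ID
task_input() {
    echo "$1.$SGE_TASK_ID"
}

# Stand-in for the scheduler: set SGE_TASK_ID by hand for three tasks
for SGE_TASK_ID in 1 2 3; do
    echo "task $SGE_TASK_ID would read $(task_input data)"
done
```

In a real array job the scheduler sets SGE_TASK_ID for each task, so no loop is needed in the job script itself.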
Troubleshooting
Logging on
Permission denied, please try again.
Wrong username or password. Make sure you are using your UUN and
EASE password.
X11 connection rejected
This is most likely because the user is over quota on their home directory.
This means the ~/.Xauthority file cannot be written to, hence the
authentication error.
Why does my job not start?
• Check that the resources asked for are sensible. For example, the following
will not work:
   -pe OpenMP 10 (maximum is 8)
   -l h_rt=49:00:00 (maximum is 48 hrs)
• Check with qstat that your intentions have been understood by
Grid Engine:
   Time
   Parallel environment
   Number of slots
Limits
• A user can submit up to 5000 jobs
• Up to 900 can be running at one time
• Jobs asking for unavailable resources will not run, e.g. more than 48
hours
CPUs
Run Time   Number of CPUs in total   Number of CPUs with Infiniband
30 min     20                        0
6 hr       36                        36
24 hr      64                        64
48 hr      1301                      140
So an 80-way job submitted with a run time of 23 hours will never start.
More qstat options
qstat -s p
• Will only list waiting (pending) jobs
• Count with qstat -s p | wc -l
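Note that a plain line count also counts the column-heading line and the dashed line beneath it, as seen in the qstat example earlier in this workbook. The sketch below skips those two lines; it assumes that layout, and the function name count_jobs is made up for the example. It reads sample text here so that it can be tried without Grid Engine.

```shell
#!/bin/sh
# Count job lines in qstat-style output read from standard input,
# skipping the column-heading line and the dashed line beneath it.
count_jobs() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Sample text in the layout of the earlier qstat example (not live output)
count_jobs <<'EOF'
job-ID  prior   name       user  state submit/start at     queue  slots ja-task-ID
----------------------------------------------------------------------------------
2005181 0.00000 PC_0.1-NP_ bjf   qw    05/17/2010 11:53:49        1
EOF
```

On Eddie the function would be used as qstat -s p | count_jobs.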
Why does my job stop unexpectedly?
• Check log
   qacct -j jobnumber
   Out of time (compare time used with time requested)
   Out of memory (check memory used with maxvmem)
   Exit status (> 127 means out of memory)
• Ask for more information when submitting
   use -m and -M
Documentation
Grid Engine:
Using Grid Engine
Downloadable from
http://wikis.sun.com/display/GridEngine/Using+Sun+Grid+Engine
Eddie:
Web site
http://www.ecdf.ed.ac.uk/
Wiki
https://www.wiki.ed.ac.uk/display/ecdfwiki/Home
Module 2: Introduction to Shell Scripts
Introduction
This chapter assumes knowledge of the Unix command line structure. Its
purpose is to show how commands are used to create shell scripts, and to
introduce flow-control (loops) and conditional constructs.
Making shell scripts
A shell script is a collection of Unix command lines assembled in a normal
text file, usually given a name ending in the file extension .sh, and stored in
the Unix filestore just like any other file. The commands are normal Unix
command lines as you would type at a terminal foreground session. They will
be run in order, as if they had been typed manually. A shell script must have
execute permission turned on before it will run. The command to do this is:
chmod +x script.sh
(replacing script.sh with the name of your own script file).
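The steps above can be tried end to end with a throwaway script. The file name hello.sh and its contents are only an example.

```shell
#!/bin/sh
# Write a two-line script into the current directory
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello from a shell script"
EOF

# Turn on execute permission, then run the script by name
chmod +x hello.sh
./hello.sh
```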
Syntax rules
Here is a simple (and trivial) shell script which simply returns the name of the
host it is running on, and the current date and time:
#!/bin/sh
#
#$ -o myoutput
#
hostname
#
# print date and time
date
We shall now examine what this means.
The first thing to note is that most lines beginning with a hash (#) character
are comments, and will be ignored when the script is executed.
An exception to this rule is the special construction in the first line:
#!/bin/sh
This is an instruction to Unix to say which shell to use when interpreting the
script. The /bin/sh refers to the location of the original Bourne shell, which
is present in all Unix systems. Specifying the Bourne shell means the script
can be passed to other Unix installations and will always be interpreted in the
same way.
Another exception is the line beginning #$. This line contains an option to
the qsub command, exactly as if it had been specified in the qsub
command line. You would need one such line for every option you wished to
specify in this way.
Shell variables
Variable names may comprise upper and lower case alphabetic characters,
digits and underscores, and must start with a letter or an underscore. Names
are case sensitive, so that for example FRED and fred would represent
different variables.
Some names are reserved for Unix use, such as the environment variables
PATH, TERM etc.
Variables do not have type, so there is no need for declaration. Shell
variables are treated as being string variables, and are created by simply
assigning them a value. For example
Greeting="Hello World!"
would create a variable called Greeting, and give it the value “Hello World!”
Notice that when assigning values to variables, there should be no spaces
around the equals sign.
To print out the value of a variable (called dereferencing), precede it with a
dollar sign:
echo $Greeting
Hello World!
Constants are variables whose initial value cannot be changed. These are
created using the readonly command, used as follows:
Author="Charles Dickens"
readonly Author
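The short sketch below pulls these points together: assignment without spaces, case sensitivity, dereferencing, and readonly. The attempt to change the constant is made inside parentheses (a subshell) so that the resulting error cannot stop the main script; the replacement value is purely illustrative.

```shell
#!/bin/sh
# Assignment: no spaces around the equals sign
Greeting="Hello World!"
# Names are case sensitive, so this is a second, separate variable
greeting="a different variable"

echo "$Greeting"
echo "$greeting"

# Create a constant
Author="Charles Dickens"
readonly Author

# Try to change it inside a subshell; the attempt fails with an error
if ( Author="Jane Austen" ) 2>/dev/null
then
    echo "Author was changed"
else
    echo "Author is read-only and is still $Author"
fi
```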
Command line arguments
When a Unix command is run, for example
grep -i smith wholist
the shell will run the first item on the line, which it knows will be a
command or script. It is then up to the command to interpret the rest of the
line, and sort out options and arguments. Inside a shell script, the items on
the line are referred to by the positional parameters $1 to $9.
In the example, $1 would be -i, $2 would be smith, and $3 would be
wholist.
There are some special variables to help handle this inside shell scripts:
$#        number of arguments on the command line
$* and $@ all arguments; when unquoted, each is split again at spaces
"$*"      all arguments as a single quoted string
"$@"      each argument as a separate quoted string; this form is usually
          best, because arguments containing spaces are preserved
$$        process ID of the current shell
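The difference between the quoted and unquoted forms is easiest to see by counting what each one passes on. In this sketch, count_args is a hypothetical helper function, and set -- is used to simulate a command line of three arguments, the first of which contains a space.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received
count_args() {
    echo $#
}

# Simulate a command line of three arguments, the first containing a space
set -- "guten tag" hello ciao

n_total=$#
n_star=`count_args "$*"`
n_at=`count_args "$@"`
n_bare=`count_args $*`

echo "\$# reports $n_total arguments"
echo "\"\$*\" passes on $n_star argument"
echo "\"\$@\" passes on $n_at arguments"
echo "unquoted \$* passes on $n_bare arguments"
```

Only "$@" hands on the same three arguments that were received; "$*" joins them into one, and the unquoted form splits "guten tag" into two.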
Simple example
Consider the following shell script, which will print out tony's name if he's
logged on
#!/bin/sh
# Is tony logged on?
who | grep tony
This will only ever search for tony. A much better script would be one where
you could specify who you were looking for:
#!/bin/sh
# Is the specified user logged on?
who | grep $1
If we called this script, say, showon, and gave it execute permission, we
could search for any user we liked:
showon tony
showon bert
Shift
An obvious question is "How do we find the tenth argument?" The answer is
that we use the shift command, which moves the parameters down one
place, so that $2 becomes $1, $3 becomes $2, and so on. Notice that as a
result of this we have lost the original $1, which would have to be assigned to
another variable before the shift.
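A small sketch of the effect of shift, using set -- to simulate three command-line arguments (the values one, two and three are illustrative):

```shell
#!/bin/sh
# Simulate a command line of three arguments
set -- one two three

# Save the first argument before it is lost
saved=$1
count_before=$#

shift

count_after=$#
now_first=$1

echo "Before shift: $count_before arguments, first was $saved"
echo "After shift: $count_after arguments, first is $now_first"
```

Note that shift also reduces $# by one each time it is used.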
Practical exercise 1

Write a shell script, called args.sh, which can take an arbitrary number of arguments and
print out how many there are. For example
args.sh a b c d
would print out “There are 4 arguments.”

Ensure the script can run with at least 10 arguments, then adapt it to print out the ninth one.

Adapt it again to print out the tenth argument.

Finally, adapt it to print out the first and tenth arguments simultaneously, for example
“The first argument is x and the tenth is y.”
Shell arithmetic
The shell treats all shell variables as strings, so doing arithmetic requires a
conversion command, expr, to treat variables as numbers. Note that expr
only does integer (whole number) arithmetic. Real arithmetic (involving
decimal fractions) is not possible with expr. It can be done in a round-about
way using the shell calculator bc, which is beyond the scope of this course.
There are five arithmetic operations:
expr num + num     Addition
expr num - num     Subtraction
expr num \* num    Multiplication
expr num / num     Division
expr num % num     Remainder
Note that in some circumstances the multiplication operator, *, must be
escaped, to prevent it being treated as a wild card. This is done in the usual
way by preceding it with a backslash (\).
As stated above, expr only does integer arithmetic. If presented with
non-integer arguments, it will not do the calculation.
The result of a calculation is usually assigned to a shell variable using
command substitution (backquoting):
Result=`expr $1 + $2`
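As a sketch, the five operations applied to two illustrative values (17 and 5) behave as follows. Note the spaces around every operator and the backslash before the multiplication sign.

```shell
#!/bin/sh
a=17
b=5

sum=`expr $a + $b`
difference=`expr $a - $b`
product=`expr $a \* $b`
quotient=`expr $a / $b`
remainder=`expr $a % $b`

echo "$a + $b = $sum"
echo "$a - $b = $difference"
echo "$a * $b = $product"
echo "$a / $b = $quotient (the fractional part is discarded)"
echo "$a % $b = $remainder"
```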
Practical exercise 2

Write a script which takes two numeric arguments and prints out what percentage the first is
of the second. For example,
percent.sh 1 2
50%
Conditional statements
Exit status
Every Unix command returns a number to the system to indicate whether it
ran successfully or failed. This is known as the command’s exit status:
- An exit status of zero means the command was successful
- A non-zero exit status means the command failed
The exit status of the last command is held in the special shell variable $?. It
can be examined with the command
echo $?
You can force a particular exit status (n) from within a shell script using
the command
exit n
The related command
return n
is intended for use inside a shell function; its behaviour elsewhere in a
script is not portable.
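The sketch below records several exit statuses in variables so they can be compared. Shell functions are not covered in this chapter, so always_two is a hypothetical function included only to show return, and the "command || variable=$?" form is simply a compact way of capturing the status of a command that fails.

```shell
#!/bin/sh
# A successful command leaves 0 in $?
true
status_ok=$?

# "command || variable=$?" records the status of a command that fails
false || status_fail=$?

# A subshell in parentheses stands in for a script that ends with "exit 3"
( exit 3 ) || status_forced=$?

# return sets the exit status of a shell function
always_two() {
    return 2
}
always_two || status_func=$?

echo "true gave $status_ok"
echo "false gave $status_fail"
echo "exit 3 gave $status_forced"
echo "return 2 gave $status_func"
```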
The if statement
Conditions can be tested for being true or false, for example, does a certain
file exist, or does a certain variable have a certain value? A command, or list
of commands, can then be executed depending on the result of the test.
This is done using the if-then-fi construct:
if conditional-expression
then
    list of commands
fi
The if command introduces the conditional-expression, which will be
evaluated (more on this later) to be either true or false, and the list of
commands following then, will be executed only if the condition is true. If the
condition is false, then nothing will happen, and the next command in
sequence will be executed as normal. The fi marks the end of the sequence
of commands, and terminates the if command.
if-then-else-fi
Adding an else clause to the above enables a second list of commands to be
executed only when the condition is false:
if conditional-expression
then
    first list of commands
else
    second list of commands
fi
In this case, the first list of commands will be executed only if the condition is
true, as before, but the second list will only be executed if the condition is
false.
Example
#!/bin/sh
# Is the specified user logged in?
if who | grep "$1" > /dev/null
then
    echo "$1 is logged in"
    exit 0
else
    echo "$1 is not logged in"
    exit 1
fi
If the user is logged in, grep will succeed and return an exit status 0 (true) so
the then part will be run. If the user is not logged in, grep will return 1
(false) so the else part will run. The script itself will return either 0 or 1 as
appropriate, using exit commands.
elif
To make things even more complex, the else part could also be a
conditional in its own right. To avoid saying else if, there is a special word
elif, used as follows:
if condition 1
then
    command list 1
elif condition 2
then
    command list 2
else
    command list 3
fi
If you choose not to use elif, you would need nested if statements, and
each would require its own fi:
if condition 1
then
    command list 1
else
    if condition 2
    then
        command list 2
    else
        command list 3
    fi
fi
test
The test command evaluates conditions and returns zero if the condition is
true and non-zero if it is false. For example, the command
test -f filename
will test to see if the file filename exists and is a regular file. It will return
zero if it does, and non-zero if it doesn't. Other options work in the same
way.
File attributes
test -d file        file exists and is a directory
test -r file        file exists and is readable
test -w file        file exists and is writeable
test -x file        file exists and is executable/searchable
test -s file        file exists and has a size greater than zero
Strings
test str            str is not null
test str1 = str2    str1 equals str2
test str1 != str2   str1 does not equal str2
Note that = and != must have spaces around them. Without the spaces, test
sees a single non-null string and always returns true.
Numeric
test n1 -eq n2      n1 and n2 are equal
test n1 -ne n2      not equal to
test n1 -gt n2      greater than
test n1 -ge n2      greater than or equal to
test n1 -lt n2      less than
test n1 -le n2      less than or equal to
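As a sketch of test used with if, elif and else, the following compares two integers and then two strings. The comparison is wrapped in a shell function (a feature not covered in this chapter) so that it can be called several times; the name compare and the sample values are illustrative.

```shell
#!/bin/sh
# compare is a hypothetical function that reports how two integers relate
compare() {
    if test "$1" -lt "$2"
    then
        echo "$1 is less than $2"
    elif test "$1" -eq "$2"
    then
        echo "$1 equals $2"
    else
        echo "$1 is greater than $2"
    fi
}

compare 3 7
compare 7 7
compare 9 7

# String comparison: note the spaces around the = sign
word="hello"
if test "$word" = "hello"
then
    match="yes"
else
    match="no"
fi
echo "String match: $match"
```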
Practical Exercise 3
Write a shell script, called filetype which takes a single argument and does
the following:
Determine whether the argument is a file, a directory, or something else.
If it is a file, print out whether it is readable, writeable or executable.
If it is a directory, print out whether it is readable, writeable or searchable.
Exit the script with a status of 0.
If it is something else, print out a suitable message then exit with status of 1.
When your script terminates, check that the exit status is what you expect.
Looping
Looping allows us to repeat any part or parts of a script a given number of
times, usually with different values for some variables at each cycle. The
number of repeats can be predetermined, or can be set to continue while (or
until) a given condition occurs. When the loop has finished, the script
continues with the next command.
There are three main types:
For loops
While loops
Until loops
We shall look at each in turn.
For loops
These are the simplest loops, which repeat a given number of times, once for
each item in a list.
The general syntax is:
for variable in word-list
do
    command-list
done
Each value in word-list will be substituted in turn for the variable, and the
command-list will be executed once for each new substitution. The variable
may appear in the command-list (usually as an argument of one of the
commands), or it may simply be used as a counter to control the number of
repetitions.
Example
#!/bin/sh
# Counts up to three, one count per second
for i in one two three
do
    echo $i
    sleep 1
done
To start, i has the value one; its value is echoed to stdout, and the script
then waits one second. i then takes the value two, and the process is repeated
until the word-list is exhausted.
The word-list may be of an unknown length, for example
for file in a*
will create a word list of all files beginning with a.
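A minimal sketch of a wildcard word-list in use is given below. It builds a scratch directory containing three empty files (the names apple, avocado and banana are illustrative, and mktemp -d is assumed to be available), then loops over those beginning with a.

```shell
#!/bin/sh
# Create a scratch directory holding three empty files
scratch=`mktemp -d`
touch "$scratch/apple" "$scratch/avocado" "$scratch/banana"

# Loop over every file in that directory whose name begins with a
count=0
for file in "$scratch"/a*
do
    name=`basename "$file"`
    echo "Found $name"
    count=`expr $count + 1`
done
echo "$count files begin with a"

# Tidy up the scratch directory
rm -r "$scratch"
```

The shell expands the wildcard before the loop starts, so the word-list is fixed even if files are created or removed while the loop is running.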
While loops
These loops contain a conditional test, and the looping will continue while the
condition remains true. The syntax is:
while condition
do
    command-list
done
The command-list is repeated as long as the condition is true. Obviously
there must be something in the command-list which can change the condition
to false, otherwise it will loop forever.
Example
#!/bin/sh
# Blast off - countdown from 10
i=10
while test $i -ge 0
do
    echo $i
    i=`expr $i - 1`
    sleep 1
done
echo " ... we have lift off!"
exit 0
The variable i is initially set to 10. Each time round the loop, it is printed out
and then decreased by 1. This will continue as long as it is greater than or
equal to 0. As soon as i goes negative, the condition will be false and looping
will terminate.
Until loops
These are the opposite of while-loops, and the command-list will be executed
as long as the condition is false. Looping will cease if the condition becomes
true.
Syntax
until condition
do
    command-list
done
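As a short sketch, the loop below counts upwards and stops as soon as the condition becomes true. The variable names and the limit of 3 are illustrative.

```shell
#!/bin/sh
# Repeat until n is greater than 3
n=1
passes=""
until test $n -gt 3
do
    echo "Pass $n"
    passes="$passes$n"
    n=`expr $n + 1`
done
echo "Finished with n=$n"
```

The condition is tested before each pass, so if it is already true at the start the command-list is never executed.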
Practical Exercise 4
- Write a shell script to test whether a file exists, and to list it out (using
cat) if it does. Output a suitable message if the file does not exist.
- Create a shell script using the script in the while loop example
above, and run this job to make sure it works.
- Can you adapt the script to work using until instead of while?
Read
The read command takes a line of input from the standard input channel
(usually the keyboard, but can be reassigned with the < sign) and assigns the
words on the input line to the variables given as arguments.
Example
read a b c
will read a line of input, and assign the first word to a, the second to b, and
the third to c. If there are more than 3 words, then c will contain all words
from the third to the end. If there are fewer than 3 words, then any "unused"
variables will be set to the empty string, even if they had a value before.
read returns an exit code of 0 if it managed to read a line. If it failed, such as
by reaching the end of the file, then it returns a non-zero code.
It is often used with the while loop, such as
while read f
which will cause the loop to continue reading input (and presumably doing
something with it) until the input ends, at which point read returns a
non-zero code and the loop terminates.
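The sketch below shows both behaviours. Input is supplied from here-documents (the lines between <<EOF and EOF) rather than the keyboard, and the names and scores are made-up sample data.

```shell
#!/bin/sh
# Extra words on the line all go into the last variable
read a b c <<EOF
one two three four five
EOF
echo "a=$a b=$b c=$c"

# Read name/score pairs one line at a time until the input ends
total=0
while read name score
do
    echo "$name scored $score"
    total=`expr $total + $score`
done <<EOF
alice 12
bob 30
EOF
echo "Total: $total"
```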
Case
The case command is an extension of the if command, offering multi-way
branching. It compares a given string (possibly read in from the terminal)
against a set of patterns, each of which has a corresponding set of
commands, which will be executed if the string matches the pattern.
The format is:
case string in
    pattern1)
        command-list1
        ;;
    pattern2)
        command-list2
        ;;
    etc.
esac
Note that each command-list is terminated with two semi-colons, and that the
whole case construct is terminated with esac (case backwards). Only the first
pattern that matches is used, and as soon as its command-list is
finished, the whole case command terminates.
Patterns
The standard shell pattern-matching facilities (*, ? and []) can be used in the
patterns, as well as the vertical bar to indicate options, for example
tom | jerry)
will match either word. Note that * will match any string, so can be used as
the last pattern, to match anything not caught by previous patterns.
Example
#!/bin/sh
# what-we-got - determine character type of first arg
case "$1" in
    [0-9]*)
        echo "$1 starts with a number"
        ;;
    [a-z]*|[A-Z]*)
        echo "$1 starts with an alphabetic"
        ;;
    *)
        echo "$1 starts with a non numeric or alphabetic character"
        ;;
esac
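A further sketch shows the vertical bar being used to group several alternatives under one command-list. The case statement is wrapped in a hypothetical function, classify, so that it can be tried with several values; the words being matched are illustrative.

```shell
#!/bin/sh
# classify is a hypothetical function wrapping a case statement
classify() {
    case "$1" in
        y|Y|yes|YES)
            echo "affirmative"
            ;;
        n|N|no|NO)
            echo "negative"
            ;;
        [0-9]*)
            echo "starts with a digit"
            ;;
        *)
            echo "unrecognised"
            ;;
    esac
}

classify yes
classify N
classify 42
classify maybe
```

Because patterns are tried in order, the catch-all * must come last; placed first, it would match everything.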
Practical Exercise 5
Consider the following list of different words for "hello":
hello         (English)
bonjour       (French)
hola          (Spanish)
ciao          (Italian)
"guten tag"   (German)
Write a script which will read a word from the terminal, and use a
case statement to print the corresponding language. Don't forget to include a
wildcard pattern at the end to catch any unknown words.
Answers to practicals
Practical 1
#!/bin/sh
# Use $# to get the number of arguments
#
echo There are $# arguments.
#
#!/bin/sh
# Use $9 to get the ninth argument
#
echo The ninth argument is $9.
#
#!/bin/sh
# Use shift and $9 to get the tenth argument
#
shift
echo The tenth argument is $9.
#
#!/bin/sh
# Save $1 in another variable, then shift to get the
# tenth argument
#
temp=$1
shift
echo The first argument is $temp and the tenth is $9.
#
Practical 2
#!/bin/sh
result=`expr $1 \* 100 / $2`
echo "$result%"
It is important to separate the operators and operands with spaces, as shown.
If your normal way of doing this is to divide the two numbers and then
multiply by 100 to get a percentage, it will not work in this case. The division
is likely to produce a fraction less than 1, and since expr only works in integer
mode, this will be represented as zero.
Since the expression is evaluated first due to its back-quoting, it could be
given as a direct argument to echo, without the intervening result variable.
Practical 3
#!/bin/sh
#
# Check if it is a file
#
if test -f "$1"
then
    echo "$1 is a file"
    if test -r "$1"
    then
        echo It is readable
    fi
    if test -w "$1"
    then
        echo It is writeable
    fi
    if test -x "$1"
    then
        echo It is executable
    fi
    exit 0
fi
#
# Check if it is a directory
#
if test -d "$1"
then
    echo "$1 is a directory"
    if test -r "$1"
    then
        echo It is readable
    fi
    if test -w "$1"
    then
        echo It is writeable
    fi
    if test -x "$1"
    then
        echo It is searchable
    fi
    exit 0
fi
#
echo "$1 is something other than a file or directory, or does not exist."
exit 1
Practical 4
#!/bin/sh
#
if test -f "$1"
then
    cat "$1"
else
    echo The requested file does not exist.
fi
#
#!/bin/sh
# Blast off - countdown from 10
i=10
until test $i -lt 0
do
    echo $i
    i=`expr $i - 1`
    sleep 1
done
echo " ... we have lift off!"
exit 0
Practical 5
#!/bin/sh
# Words for hello
# Read the word from the terminal, as the exercise asks
read word
case "$word" in
    hello)
        echo English
        ;;
    bonjour)
        echo French
        ;;
    hola)
        echo Spanish
        ;;
    ciao)
        echo Italian
        ;;
    "guten tag")
        echo German
        ;;
    *)
        echo Word not in database.
        ;;
esac
Documentation
Shell Scripts:
The Information Services Training Section runs a course on Shell
Programming. For further details, contact the Computing Skills Centre on
0131-650 3350 or email eucs.training@ed.ac.uk, quoting course code 1160.
The workbook is available on-line at
http://www.ucs.ed.ac.uk/eucs_documentation/Documents_by_Number/2630/