Introduction to Grid Engine for ECDF Users
Workbook

Edition 8 January 2011
Document reference: 3609-2011
Author: Brian Fletcher, ITI Unix Section
Previous title: Introduction to Sun Grid Engine

CONTENTS

ABOUT THE COURSE

MODULE 1: BASIC INTRODUCTION TO GRID ENGINE
    WHAT IS A GRID?
    THE ECDF CLUSTER (EDDIE)
    WHAT IS GRID ENGINE?

BASIC JOB SUBMISSION
    TWO WAYS OF RUNNING GRID ENGINE

THE COMMAND-LINE INTERFACE
    QSUB
    QSTAT
    QACCT
    QALTER
    QDEL
    PRACTICAL EXERCISE 1

THE GRAPHICAL INTERFACE
    QMON
    PRACTICAL EXERCISE 2

REQUESTING RESOURCES
    HOW TO REQUEST RESOURCES
    RUN TIME
    MEMORY AND PROCESSORS
    REQUESTING RESOURCES USING QMON
    RESERVING RESOURCES

PARALLEL JOBS
    PARALLEL ENVIRONMENT
    ARRAY JOBS

TROUBLESHOOTING
    LOGGING ON
    WHY DOES MY JOB NOT START
    LIMITS
    MORE QSTAT OPTIONS
    WHY DOES MY JOB STOP UNEXPECTEDLY?

DOCUMENTATION

MODULE 2: INTRODUCTION TO SHELL SCRIPTS
    INTRODUCTION
    MAKING SHELL SCRIPTS
    SYNTAX RULES
    SHELL VARIABLES
    COMMAND LINE ARGUMENTS
    PRACTICAL EXERCISE 1
    SHELL ARITHMETIC
    PRACTICAL EXERCISE 2
    CONDITIONAL STATEMENTS
    TEST
    PRACTICAL EXERCISE 3
    LOOPING
    PRACTICAL EXERCISE 4
    READ
    CASE
    PRACTICAL EXERCISE 5
    ANSWERS TO PRACTICALS

DOCUMENTATION

About the Course

This course consists of two modules which may be run together, or individually as separate courses.
The first module covers basic job submission using Grid Engine, and the second covers writing Unix shell scripts, based on material from the Edinburgh University Information Services Training Course.

Module 1

Prerequisites: Knowledge of Unix equivalent to at least Unix 1. Some experience of shell scripting would be an advantage, but is not essential.

Learning goals

By the end of the course you should:
• understand the concept of a Grid
• know about ECDF from a user viewpoint
• understand what a scheduler is
• know how to submit jobs to the ECDF

Module 2

Prerequisites: Knowledge of Unix equivalent to at least Unix 1.

Learning goals

By the end of the course you should:
• be able to write simple Shell scripts

This workbook

This workbook will lead you through both modules, with practical sessions on the way.

Module 1: Basic Introduction to Grid Engine

What is a grid?

• A collection of computing resources
• Cluster grids
• Campus grids
• Global grids

The Grid Engine User’s Guide defines a grid as “a collection of computing resources that perform tasks. In its simplest form, a grid appears to users as a large system that provides a single point of access to powerful distributed resources.”

There are three main classes of grids. In order of increasing size, they are Cluster grids, Campus grids and Global grids.

• A Cluster Grid is a number of computers from a single organisation or department which have been set up to work together.
• A Campus Grid extends this so that many departments from the same organisation can share computing resources.
• A Global Grid is a collection of Campus Grids which enables many organisations to create very large systems.

The ECDF Cluster (Eddie)

The “Compute” part of the Edinburgh Compute and Data Facility (ECDF) is a purpose-built cluster grid known as Eddie. It was installed in 2007 in two phases. Phase 1 (replaced in 2010) consists of 128 nodes and Phase 2 (which will be replaced in 2011) consists of 118 nodes.
Each node contains eight cores (or processors), making a total of 1968 cores.

What is Grid Engine?

Grid Engine is a Scheduler. To understand what this is, a little preamble is necessary.

ECDF runs a “Batch System”. In practice, this means that a job submitted by a user will wait in a queue until the resources it requires are available. Usually it will not have long to wait, as many jobs can be running at the same time. See the section “Basic Job Submission” for more information on this.

There has to be some way for the system to know what resources each job in the queue needs, and to match this with those currently available before releasing jobs for execution. It is the job of the Scheduler to manage these resources before, during and after a job’s journey through the system.

In brief, a scheduler will:
• Orchestrate allocation of resources
• Accept jobs (user requests for resources)
• Send jobs to worker nodes
• Manage running jobs
• Log a record of each job

More about this will be covered in the section “Basic Job Submission”.

The term “job” in Grid Engine refers to a shell script containing the commands you wish to run. If you do not know how to write shell scripts, see the section “Introduction to Shell Scripts”.

Basic Job Submission

Two ways of running Grid Engine

Grid Engine can be run in either of two ways. The first is by typing Unix-like commands to specify resources and submit jobs. The commands we will look at are:

    qsub     to submit a job to the queue, and request resources
    qstat    to examine a job’s current status
    qacct    to get a job’s accounting information
    qalter   to change the attributes of pending jobs
    qdel     to delete jobs from the queue

The second way is to use a graphical interface, initiated with the qmon command. We shall look at each of these in turn.

The Command-line Interface

qsub

This command sends a job, in the form of a shell script, to the Grid Engine queuing system. It does not accept executable files.
In its simplest form, the command looks like this:

    qsub myscript.sh

where myscript.sh is a file containing the shell script you wish to run.

Some options

    -o filename           Send output to the file called filename
    -e filename           Send error output to filename
    -cwd                  Run job in current working directory (default is home directory)
    -m                    Followed by b, e, a or s (or any combination): send a mail message when the job begins, ends, aborts, or is suspended
    -M user@host          Email address to be used for the -m option
    -v variable[=value]   Define or redefine an environment variable to be passed to the job. The = sign and value are optional
    -V                    Pass all environment variables to the job
    -j yes                Write output and error messages to the same file
    -l h_rt=h:m:s         Required runtime in hours:minutes:seconds (see the section “Run Time”)

Many more options are available. See the manual page for details.

Qsub options can also be embedded in shell scripts, instead of on the command line. See the Introduction to Shell Scripts chapter for how to do this.

Examples

    qsub -cwd -o outfile -e errfile myscript.sh
    qsub -V -o outfile -m bae -M me@myhost myscript.sh
    qsub -v TERM myscript.sh
    qsub -l h_rt=30 myscript.sh

qstat

Once a job has been submitted, its current status can be examined with the command qstat, as long as the job has not finished. Information on finished jobs can be obtained using the qacct command (see below).

In its simplest form, the command is:

    qstat

It produces a list of all pending or running jobs belonging to you, similar to the example below. Note that each job has a job-ID number, such as 2005034 in the example.

The most common values for the state column are r for a job which is currently running, or qw for one which is queued and waiting. The letter E in the state column, such as Eqw, indicates that the scheduler has detected an error state. Examine your script to see whether you have requested resources incorrectly. See the troubleshooting section.
job-ID  prior   name       user  state submit/start at     queue  slots ja-task-ID
-----------------------------------------------------------------------------------
2005181 0.00000 PC_0.1-NP_ bjf   qw    05/17/2010 11:53:49            1
2005034 0.06429 simple.sh  bjf   r     05/17/2010 11:53:03            1

Some options

    -u username   Show only jobs for username
    -u "*"        Show jobs for all users
    -f            Full listing by node
    -j job-ID     Detailed information on pending/running jobs, including reasons for a job being rejected

qacct

Qacct is a command which scans the accounting file and will produce accounting information in a variety of ways. The accounting file is not structured in any way, and accessing it can be wasteful of resources. qacct should only be used if necessary.

To get information for a particular job, the command is

    qacct -j job-ID

The -j option causes qacct to produce its output in a different format from the other options below.

Some other options

    -o owner      Gives a summary of system use by that owner (user)
    -o            (on its own) Gives data for all users (not recommended)
    -g group      Gives a summary of system use by members of a group

The qacct command on its own will produce total figures.

Examples

    qacct -o bjf
    qacct -g scisup

qalter

This command can be used to modify attributes of a job once it has been submitted and is in the Pending queue (see Note 2 below). The attributes to be modified are specified as options to the qalter command. For example, the following command will alter the stdout file for job 1234567 to be the file junk:

    qalter -o junk 1234567

The system should reply with a message similar to the following:

    modified stdout path list of job 1234567

Some other options

    -e              Modify pathname for stderr
    -m              Followed by b, a, e (see qsub above) or n for no mail
    -M              Change mail address to which mail will be sent
    -N              Change name of job
    -l h_rt=H:M:S   Change runtime

For a full list see the man page for qalter.
Notes

1) If, for any reason, one of the specified attributes cannot be modified as requested, then none of the attributes will be modified.
2) Although many attributes can be altered while a job is running, in some cases these changes will not take effect until the job is re-run.
3) Qalter also appears as a button on the qmon Job Control windows for Pending and Running Jobs (see below).

qdel

The qdel command is used to remove jobs from the pending queue. You might want to do this if you have submitted the wrong job or wish to withdraw it for any reason. The syntax is

    qdel job-id

where job-id is the number of the job you wish to delete, as given by qsub or qstat. Note that unlike the other commands we have looked at, qdel does not require -j before the job-id.

If a job fails to delete, you can use the -f flag to force deletion:

    qdel -f job-id

Practical Exercise 1

Log in to Eddie, using your UUN and EASE password. Use an editor to create a text file containing the following simple script:

    #!/bin/sh
    #$ -o junk
    hostname
    #
    # print date and time
    date

The output from this script will contain the name of the host on which the script is running, and also the current date and time. It will send its output to a file called junk.

• Submit this job using the qsub command. Unless the job runs very quickly, you should be able to check its progress using qstat. If it runs too quickly, try adding a sleep command to give you more time.
• Submit the job again, using some other qsub options, such as -m.
• If you are confident with directories, experiment with the -cwd option.
• Try adding qsub options to the script, instead of just the command line.
• Modify the file in any way you like.

You can also delete jobs from the queue by using the Delete button in the qmon Job Control window (see below).

The Graphical Interface

qmon

The qmon command provides a graphical interface for submitting jobs. It is typed on the command line in the same way as qsub.
It is a good idea to issue this command in background mode (by typing qmon&) so that it will continue running while you use foreground mode to work with your scripts and files.

The first window to open is the Grid Engine “splash” window, followed by the qmon Main Control window.

The buttons on this window allow you to submit jobs, examine the queue, and control many other aspects of Grid Engine. The button third from the left on the top row is “Submit Jobs”, which you use to tell qmon which script you want to run.

Submitting Jobs using qmon

Click the Submit Jobs button in the Main Control window. The Submit Job window will appear. Ensure the General tab is selected.

You can specify the script you wish to run by typing its name in the box labelled Job Script, or you can browse for it by clicking the button to the right of the box.

Your job should also be given a name, so that it can easily be located when checking its status (see below). By default, the system uses the name of your script file, but you may change this if you wish.

You can also type the names for your job’s stdin, stdout and stderr files in the relevant boxes. (These can also be specified in the command line or shell script, and the boxes here will be filled in automatically.)

To run the job, click the Submit button in the right-hand panel. To close this window, click the Done button.

The Advanced tab

Clicking the Advanced tab will give a host of other options. The most useful will probably be Mail, in the middle column. This allows you to receive an email when your job starts, ends, is aborted or suspended. Simply check the boxes next to the options you require, and enter your email address in the Mail To box. These options can also be specified using the -m option in the qsub command line. See above for more details. The examples below show typical mail content for started and completed jobs.
    Job 2039014 (course1.sh) Started
     User           = bjf
     Queue          = ecdf
     Host           = eddie315.ecdf.ed.ac.uk
     Start Time     = 12/03/2010 15:52:16

    Job 2039014 (course1.sh) Complete
     User           = bjf
     Queue          = ecdf@eddie315.ecdf.ed.ac.uk
     Host           = eddie315
     Start Time     = 12/03/2010 15:52:16
     End Time       = 12/03/2010 15:52:16
     User Time      = 00:00:00
     System Time    = 00:00:00
     Wallclock Time = 00:00:00
     CPU            = 00:00:00
     Max vmem       = NA
     Exit Status    = 0

Checking job status in qmon

To view your job’s status, go back to the Main Control panel and click the top left button (Job Control). The Job Control window will open.

If your job has not yet started executing, you should be able to see it in the Pending Jobs list, by clicking the appropriate tab. If it has started running, it will be in Running Jobs, and if it has run and terminated, you will find it in Finished Jobs. The figure above shows a typical Finished Jobs display.

To make it easier to find your job, you can sort the list by any of the columns by clicking the appropriate heading (Jobid, Priority, Owner etc). Clicking the same heading a second time will reverse the order.

Pending jobs can be removed from the queue by highlighting the job and clicking the Delete button in the right-hand panel. Running jobs can similarly be suspended or deleted. You cannot delete finished jobs from the list. They will remain there until cleared by the system.

The Job Control window can be closed by clicking Done.

Practical Exercise 2

Repeat exercise 1 but this time use the qmon interface instead of qsub.

Requesting Resources

How to request Resources

The qsub command described above has options which allow you to specify the time limit for your job, the number of CPUs you need, and the memory requirements. These resources are specified using the -l option (lower case L) followed by the resource and quantity desired.

It is also possible to request resources via the qmon graphical interface. Both methods are described below.
Run Time

You can set a time limit for your job using a command similar to

    qsub -l h_rt=H:M:S jobscript.sh

where H:M:S is the time limit in Hours:Minutes:Seconds, and jobscript.sh is the shell script that you wish to run. If any of the values is zero, it can be omitted as long as the colon remains. A single value, with no colons, will be treated as a number of seconds.

There are four different runtime limits for various groups of nodes on Eddie. These are currently set to 30 minutes, 6 hours, 24 hours and 48 hours.

You should get into the habit of specifying your required run time when you submit your job. If you do not do this, your jobs will go into the first available slot, no matter what the maximum run time is on that slot. This means that your job could be subject to a 30 minute maximum run time.

Memory and Processors

Memory

By default, the processing core running your job will have a memory allocation of 2GB. If your job requires more than 2GB, you should proceed as described below. Failure to do this will result in your job being killed when it becomes larger than 2GB.

Within Eddie, memory is allocated in “slots” of either 2GB or 6GB. To allocate, say, 4GB to your job, there is a parallel environment (PE, see later) known as memory_2G which is used as follows:

    qsub -pe memory_2G 2 myjob.sh

The number after memory_2G is the number of 2GB slots required, so 2 will give you 4GB, 3 will give you 6GB etc. The maximum number of slots you can request is 8.

The environment memory_6G is used in an analogous way. Note that there are more 2GB slots available than 6GB slots, so your jobs should be released more quickly if you specify your requirements in terms of memory_2G rather than memory_6G.

Processors

More than one processor will be required if you are doing true parallel processing. The most basic environment for this is called "OpenMP".
To make use of more than one CPU core on a node, use OpenMP or one of the OpenMPI parallel environments by adding one of the following options to your qsub command:

    -pe OpenMP n
        where n is the number of CPU cores you want, up to a maximum of 8. If you request more than this, your job cannot run.

    -pe openmpi_smp8_mark2 n
        where n is the number of CPU cores you want, which must be a multiple of 8, up to a maximum of 1024 on 128 worker nodes.

Requesting resources using qmon

Run time

This can be set by clicking the large Request Resources button from the General tab of the Submit Job window. A window listing the available resources will appear.

Ensure the Hard Request option is lit, then double-click on h_rt in the list of resources. A further window will appear in which you can choose the required time limit for your job. Do not click the Infinity button, and remember that the maximum time for a job on Eddie is 48 hours. Remember also that if you do not specify a time limit, your job may end up in a queue whose limit is 30 minutes, and this may not be enough for your job to complete.

Hard Resources are those which must be allocated before a job can be started. Soft Resources can be allocated while a job is running.

Memory and processors

Parallel Environments such as “memory” and “OpenMP” can also be selected using qmon. Click the Advanced tab on the Submit Job window and in the top left you will see a box labelled Parallel Environment. Click the button to the right of this box, and a menu will appear from which you can select the required PE. The name of the PE will appear in the box, along with a number (defaulted to 1). This is the number of units selected, and can be changed as required.

Reserving resources

If you have a big parallel job which won't run because the cluster is too busy, use resource reservation. Put the -R y option into your qsub command.
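
The run-time and memory rules described in this chapter can be checked before a job is submitted. The following is a minimal sketch only and is not part of Grid Engine: the function names hrt_to_seconds and memory_2g_slots are hypothetical, and the limits (48 hours, 8 slots of 2GB) are simply those quoted above for Eddie at the time of writing. It assumes h_rt is given either as a single number of seconds or with both colons present; two-field values are not covered by this workbook, so the sketch rejects them. The final lines print a qsub command rather than running it.

```shell
#!/bin/sh
# Sketch only: helper functions for checking resource requests before qsub.
# The function names are hypothetical; the limits are those quoted in this
# workbook for Eddie (48 hour maximum run time, at most 8 memory slots).

# Convert an h_rt value to seconds. Accepts H:M:S with any field left empty
# (for example 1:: or :30:), or a single number of seconds.
hrt_to_seconds() {
    case "$1" in
        ''|*:*:*:*) return 1 ;;         # empty, or too many fields
        *:*:*)
            h=${1%%:*}                  # text before the first colon
            rest=${1#*:}                # text after the first colon
            m=${rest%%:*}
            s=${rest#*:}
            ;;
        *:*) return 1 ;;                # two fields: not handled in this sketch
        *)  h=0; m=0; s=$1 ;;           # no colons: a number of seconds
    esac
    case "$h$m$s" in
        *[!0-9]*) return 1 ;;           # reject anything that is not a digit
    esac
    # expr exits with status 1 when the result is 0; that is not an error here
    expr "${h:-0}" \* 3600 + "${m:-0}" \* 60 + "${s:-0}" || :
}

# Number of 2GB slots needed for a job that wants $1 GB, rounded up.
# Fails if more than the maximum of 8 slots would be needed.
memory_2g_slots() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;        # whole numbers of GB only
    esac
    slots=$(expr \( "$1" + 1 \) / 2) || :
    [ "$slots" -ge 1 ] || slots=1       # never request fewer than one slot
    [ "$slots" -le 8 ] || return 1
    echo "$slots"
}

# Example: a 4GB job with a 16 hour limit, checked against the 48 hour maximum
limit=$(hrt_to_seconds 48:00:00)
wanted=$(hrt_to_seconds 16:00:00)
if [ "$wanted" -le "$limit" ]; then
    echo "qsub -pe memory_2G $(memory_2g_slots 4) -l h_rt=16:00:00 myjob.sh"
fi
```

Printing the command first makes it easy to confirm that the request is sensible before it reaches the queue; a request that exceeds a limit would otherwise simply wait without ever starting.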
Parallel Jobs

Parallel Environment

A Parallel Environment (PE) is a Grid Engine software package that enables parallel computing. By selecting a Parallel Environment, users can make use of various Message Passing Interface (MPI) libraries installed on the cluster. All these MPI libraries have already been tightly integrated with Grid Engine to provide a hassle-free initialisation with reliable slave process control and correct accounting information.

Which Parallel Environments are available?

You can view the parallel environments currently available with the command

    $ qconf -spl

The following valid PEs exist on Eddie at the time of writing:

    OpenMP
    infinipath
    memory
    memory-2G
    memory-6G
    mpich2
    openib
    openib-smp
    openib-smp8
    openib_smp8_qdr
    openib_smp8_sdr
    openmpi
    openmpi-smp
    openmpi-smp8
    openmpi_fillup_mark1
    openmpi_fillup_mark2
    openmpi_smp8_mark1
    openmpi_smp8_mark2
    qlogic-mpi
    qlogic-mpi8
    qlogic-mpi_smp8_qdr
    qlogic-mpi_smp8_sdr

The -smp suffix (Shared Memory Programming) tells the scheduler to guarantee that every 4 slots are allocated on a single node to provide the "SMP Cluster" Shared Memory environment. These PEs only accept requests for slots in multiples of 4. The -smp8 suffix acts similarly for 8 slots. The side effect is that your job will probably need to wait longer in the queue until there are sufficient empty nodes available.

Using Parallel Environments

A Parallel Environment is selected when submitting a job to Grid Engine using the -pe flag for the qsub command. Only one PE can be selected per job.
You specify the name of the Parallel Environment and the number of slots to request:

    $ qsub -pe <pe_name> <slots> -l h_rt=h:m:s <job_script>

for example

    qsub -pe OpenMP 8 -l h_rt=16:00:00 hello.sh

Tutorials

For more information, worked examples and tutorials on using PEs, see the links under Running Parallel Jobs on the ECDF wiki page
https://www.wiki.ed.ac.uk/display/ecdfwiki/Documentation

Array Jobs

Sometimes you want to run a number of mostly identical jobs, with the only difference being input parameters or data sets. Instead of submitting each as an independent job, you can submit an Array Job. Grid Engine provides this feature to help users manage a series of jobs with one command. Array jobs also place significantly lower load on the system than would otherwise be the case.

You may have done the following operations before:

    $ qsub job.sh data.1
    $ qsub job.sh data.2
    ...
    $ qsub job.sh data.100

Writing a script to submit the jobs in a loop is one option, but you would then also need a script to stop or cancel those jobs if you find something wrong after they start running. Such a script can also place excessive load on the queueing system. Using an Array Job avoids these issues, as you can submit, stop and delete jobs with just one command.

Use the following command to submit an array job:

    $ qsub -t 1-100 job.array.sh data

where the file job.array.sh looks like this:

    #!/bin/sh
    job.sh "$1.$SGE_TASK_ID"

This will schedule 100 jobs, with each one being identical except for the data input being data.number, with number counting up from 1 to 100. $1 represents the string passed on the command line (in this case, "data"), and $SGE_TASK_ID represents the counter.

Troubleshooting

Logging on

Permission denied, please try again.

Wrong username or password. Make sure you are using your UUN and EASE password.

X11 connection rejected

This is most likely because the user is over quota on their home directory.
This means the ~/.Xauthority file cannot be written to, hence the authentication error.

Why does my job not start

Why jobs fail to start

• Check resources asked for are sensible, e.g. the following will not work:
      -pe OpenMP 10       (maximum is 8)
      -l h_rt=49:00:00    (maximum is 48 hrs)
• Check with qstat that your intentions have been understood by Grid Engine:
      Time
      Parallel environment
      Number of slots

Limits

• A user can submit 5000 jobs
• 900 running at one time
• Jobs asking for unavailable resources will not run, e.g. more than 48 hours

CPUs

    Run Time   Number of CPUs in total   Number of CPUs with Infiniband
    30 min     20                        0
    6 hr       36                        36
    24 hr      64                        64
    48 hr      1301                      140

So submitting an 80-way job with 23 hours will never get started.

More qstat options

    qstat -s p

• Will only list waiting (pending) jobs
• Count with qstat -s p | wc -l

Why does my job stop unexpectedly?

• Check log:
      qacct -j jobnumber
      Out of time (compare time used with time requested)
      Out of memory (check memory used with maxvmem)
      Exit status (> 127 means out of memory)
• Ask for more information when submitting: use -m and -M

Documentation

Grid Engine:
    Using Grid Engine
    Downloadable from http://wikis.sun.com/display/GridEngine/Using+Sun+Grid+Engine

Eddie:
    Web site   http://www.ecdf.ed.ac.uk/
    Wiki       https://www.wiki.ed.ac.uk/display/ecdfwiki/Home

Module 2: Introduction to Shell Scripts

Introduction

This chapter assumes knowledge of the Unix command line structure. Its purpose is to show how commands are used to create shell scripts, and to introduce flow-control (loops) and conditional constructs.

Making shell scripts

A shell script is a collection of Unix command lines assembled in a normal text file, usually given a name ending in the file extension .sh, and stored in the Unix filestore just like any other file.

The commands are normal Unix command lines as you would type at a terminal foreground session. They will be run in order, as if they had been typed manually.
A shell script must have execute permission turned on before it will run. The command to do this is:

    chmod +x script.sh

(replacing script.sh with the name of your own script file).

Syntax rules

Here is a simple (and trivial) shell script which returns the name of the host it is running on, and the current date and time:

    #!/bin/sh
    #
    #$ -o myoutput
    #
    hostname
    #
    # print date and time
    date

We shall now examine what this means.

The first thing to note is that most lines beginning with a hash (#) character are comments, and will be ignored when the script is executed.

An exception to this rule is the special construction in the first line:

    #!/bin/sh

This is an instruction to Unix to say which shell to use when interpreting the script. The /bin/sh refers to the location of the original Bourne shell, which is present in all Unix systems. Specifying the Bourne shell means the script can be passed to other Unix installations and will always be interpreted in the same way.

Another exception is the line beginning #$. This line contains an option to the qsub command, exactly as if it had been specified in the qsub command line. You would need one such line for every option you wished to specify in this way.

Shell variables

Variable names may comprise upper and lower case alphabetic characters, digits and underscores, and must start with a letter or an underscore. Names are case sensitive, so that for example FRED and fred would represent different variables. Some names are reserved for Unix use, such as the environment variables PATH, TERM etc.

Variables do not have type, so there is no need for declaration. Shell variables are treated as being string variables, and are created by simply assigning them a value. For example

    Greeting="Hello World!"

would create a variable called Greeting, and give it the value “Hello World!”

Notice that when assigning values to variables, there should be no spaces around the equals sign.
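
The assignment rules above can be tried out with a short script. This is a minimal sketch; apart from Greeting, FRED and fred, which come from the text, the values are made up for illustration. Each value is printed with echo by putting a dollar sign in front of the variable name.

```shell
#!/bin/sh
# Names are case sensitive: FRED and fred are two separate variables.
FRED="upper case"
fred="lower case"
echo "FRED is $FRED, fred is $fred"

# No spaces around the equals sign; the quotes keep the embedded space.
Greeting="Hello World!"
echo "$Greeting"

# Assigning again simply replaces the old value; no declaration is needed.
Count=1
Count=2
echo "Count is $Count"
```

Writing `Greeting = "Hello World!"` with spaces would not assign anything: the shell would try to run a command called Greeting instead.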
To print out the value of a variable (called dereferencing), precede it with a dollar sign:

    echo $Greeting
    Hello World!

Constants are variables whose value cannot be changed once it has been set. These are created using the readonly command, used as follows:

    Author="Charles Dickens"
    readonly Author

Command line arguments

When a Unix command is run, for example

    grep -i smith wholist

the shell will initiate the first item on the line, which it knows will be a command or script. It is then up to the command to interpret the rest of the line, and sort out options and arguments. Items in the list are referred to inside the command by the positional parameters $1 - $9. In the example, $1 would be -i, $2 would be smith, and $3 would be wholist.

There are some special variables to help handle this inside shell scripts:

    $#           number of arguments on the command line
    $* and $@    contain all arguments as one string
    "$*"         quoted string of all arguments
    "$@"         string of individually quoted arguments ("$@" is best)
    $$           current process ID of the shell

Simple example

Consider the following shell script, which will print out tony's name if he's logged on:

    #!/bin/sh
    # Is tony logged on?
    who | grep tony

This will only ever search for tony. A much better script would be one where you could specify who you were looking for:

    #!/bin/sh
    # Is the specified user logged on?
    who | grep $1

If we called this script, say, showon, and gave it execute permission, we could search for any user we liked:

    showon tony
    showon bert

Shift

An obvious question is "How do we find the tenth argument?" The answer is that we use the shift command, which moves the parameters down one place, so that $2 becomes $1, $3 becomes $2, and so on. Notice that as a result of this we have lost the original $1, which would have to be assigned to another variable before the shift.

Practical exercise 1

Write a shell script, called args.sh, which can take an arbitrary number of arguments and print out how many there are.
For example,

    args.sh a b c d

would print out “There are 4 arguments.” Ensure the script can run with at least 10 arguments, then adapt it to print out the ninth one. Adapt it again to print out the tenth argument. Finally, adapt it to print out the first and tenth arguments simultaneously, for example “The first argument is x and the tenth is y.”

Shell arithmetic

The shell treats all shell variables as strings, so doing arithmetic requires a conversion command, expr, to treat variables as numbers. Note that expr only does integer (whole number) arithmetic. Real arithmetic (involving decimal fractions) is not possible with expr. It can be done in a round-about way using the shell calculator bc, which is beyond the scope of this course.

There are five arithmetic operations, as below.

    expr num + num     Addition
    expr num - num     Subtraction
    expr num \* num    Multiplication
    expr num / num     Division
    expr num % num     Remainder

Note that in some circumstances the multiplication operator, *, must be escaped, to prevent it being treated as a wild card. This is done in the usual way by preceding it with a backslash (\).

As stated above, expr only does integer arithmetic. If presented with non-integer arguments, it will not do the calculation.

The result of a calculation is usually assigned to a shell variable using command substitution (backquoting):

    Result=`expr $1 + $2`

Practical exercise 2

Write a script which takes two numeric arguments and prints out what percentage the first is of the second. For example,

    percent.sh 1 2
    50%

Conditional statements

Exit status

Every Unix command returns a number to the system, to indicate whether it ran successfully or failed. This is known as the command's exit status, as follows:

• An exit status of zero means the command was successful
• An exit status which is non-zero means the command failed

The exit status of the last command is held in the special shell variable $?. It can be examined with the command:

    echo $?
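As a small illustration of exit statuses, the sketch below runs a few standard commands and saves $? immediately after each one, because the value is overwritten by every subsequent command. The variable names (status_ok and so on) are invented for this example.

```shell
#!/bin/sh
# Illustrative sketch: the status_* variable names are invented.

# true always succeeds; false always fails
true
status_ok=$?
false
status_fail=$?

# grep succeeds (0) if it finds a match, and fails (non-zero) if not
echo "hello world" | grep hello > /dev/null
status_match=$?
echo "hello world" | grep goodbye > /dev/null
status_nomatch=$?

echo "true: $status_ok  false: $status_fail"
echo "match: $status_match  no match: $status_nomatch"
```

For a pipeline such as the grep lines above, $? holds the exit status of the last command in the pipeline, which here is grep.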
You can force a particular exit status (n) from within a shell script using one of the commands

    exit n
    return n

Here exit n terminates the script with status n, while return n is intended for leaving a shell function (or a sourced script) with status n.

The if statement

Conditions can be tested for being true or false: for example, does a certain file exist, or does a certain variable have a certain value? A command, or list of commands, can then be executed depending on the result of the test. This is done using the if-then-fi construct:

    if conditional-expression
    then
        list of commands
    fi

The if command introduces the conditional-expression, which will be evaluated (more on this later) to be either true or false, and the list of commands following then will be executed only if the condition is true. If the condition is false, then nothing will happen, and the next command in sequence will be executed as normal. The fi marks the end of the sequence of commands, and terminates the if command.

if-then-else-fi

Adding an else clause to the above enables a second list of commands to be executed only when the condition is false:

    if conditional-expression
    then
        first list of commands
    else
        second list of commands
    fi

In this case, the first list of commands will be executed only if the condition is true, as before, but the second list will only be executed if the condition is false.

Example

    #!/bin/sh
    # Is the specified user logged in?
    if who | grep $1 > /dev/null
    then
        echo $1 is logged in
        exit 0
    else
        echo $1 is not logged in
        exit 1
    fi

If the user is logged in, grep will succeed and return an exit status 0 (true), so the then part will be run. If the user is not logged in, grep will return 1 (false), so the else part will run. The script itself will return either 0 or 1 as appropriate, using exit commands.

elif

To make things even more complex, the else part could also be a conditional in its own right.
To avoid saying else if, there is a special word elif, used as follows:

    if condition 1
    then
        command list 1
    elif condition 2
    then
        command list 2
    else
        command list 3
    fi

If you choose not to use elif, you would need nested if statements, and each would require its own fi:

    if condition 1
    then
        command list 1
    else
        if condition 2
        then
            command list 2
        else
            command list 3
        fi
    fi

test

The test command evaluates conditions and returns zero if the condition is true and non-zero if it is false. For example, the command

    test -f filename

will test to see if the file filename exists and is a regular file. It will return zero if it does, and non-zero if it doesn't. Other options work in the same way.

File attributes

    test -d file    file exists and is a directory
    test -r file    file exists and is readable
    test -w file    file exists and is writeable
    test -x file    file exists and is executable/searchable
    test -s file    file exists and has a size greater than zero

Strings

    test str             str is not null
    test str1 = str2     str1 equals str2
    test str1 != str2    str1 does not equal str2

The spaces around = and != are required; without them test sees a single non-null string, which is always true.

Numeric

    test n1 -eq n2    n1 and n2 are equal
    test n1 -ne n2    not equal to
    test n1 -gt n2    greater than
    test n1 -ge n2    greater than or equal to
    test n1 -lt n2    less than
    test n1 -le n2    less than or equal to

Practical Exercise 3

Write a shell script, called filetype, which takes a single argument and does the following:

• Determine if the argument is a file or a directory, or something else.
• If it is a file, print out whether it is readable, writeable or executable.
• If it is a directory, print out whether it is readable, writeable or searchable. Exit the script with a status of 0.
• If it is something else, print out a suitable message then exit with a status of 1.

When your script terminates, check that the exit status is what you expect.

Looping

Looping allows us to repeat any part or parts of a script a given number of times, usually with different values for some variables at each cycle.
The number of repeats can be predetermined, or can be set to continue while (or until) a given condition occurs. When the loop has finished, the script continues with the next command. There are three main types:

• For loops
• While loops
• Until loops

We shall look at each in turn.

For loops

These are the simplest loops, which repeat a given number of times. This is best illustrated with an example. The general syntax is:

    for variable in word-list
    do
        command-list
    done

Each value in word-list will be substituted in turn for the variable, and the command-list will be executed once for each new substitution. The variable may appear in the command-list (usually as an argument of one of the commands), or it may simply be used as a counter to control the number of repetitions.

Example

    #!/bin/sh
    # Counts up to three, one count per second
    for i in one two three
    do
        echo $i
        sleep 1
    done

To start, i has the value one; its value is echoed to stdout, and then the script waits one second. i then takes the value two, and the process is repeated until the word-list is exhausted.

The word-list may be of an unknown length, for example

    for file in a*

will create a word list of all files beginning with a.

While loops

These loops contain a conditional test, and the looping will continue while the condition remains true. The syntax is:

    while condition
    do
        command-list
    done

The command-list is repeated as long as the condition is true. Obviously there must be something in the command-list which can change the condition to false, otherwise it will loop forever.

Example

    #!/bin/sh
    # Blast off - countdown from 10
    i=10
    while test $i -ge 0
    do
        echo $i
        i=`expr $i - 1`
        sleep 1
    done
    echo " ... we have lift off!"
    exit 0

The variable i is initially set to 10. Each time round the loop, it is printed out and then decreased by 1. This will continue as long as it is greater than or equal to 0. As soon as i goes negative, the condition will be false and looping will terminate.
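The same while, test and expr pieces can be combined to accumulate a value as well as to count. The following is a minimal sketch, not part of the original course material, which adds up the integers from 1 to 5; the variable names i and total are chosen only for illustration.

```shell
#!/bin/sh
# Illustrative sketch: add up the integers from 1 to 5
i=1
total=0
while test $i -le 5
do
    total=`expr $total + $i`   # add the current value to the running total
    i=`expr $i + 1`            # move on, so the condition eventually becomes false
done
echo "The total is $total"
```

The line that increases i is what eventually makes the condition false; leaving it out would make the loop run forever.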
Until loops

These are the opposite of while loops, and the command-list will be executed as long as the condition is false. Looping will cease if the condition becomes true.

Syntax

    until condition
    do
        command-list
    done

Practical Exercise 4

Write a shell script to test whether a file exists, and to list it out (using cat) if it does. Output a suitable message if the file does not exist.

Create a shell script using the script in the while loop example above, and run this job to make sure it works. Can you adapt the script to work using until instead of while?

Read

The read command takes a line of input from the standard input channel (usually the keyboard, but can be reassigned with the < sign) and assigns the words on the input line to the variables given as arguments.

Example

    read a b c

will read a line of input, and assign the first word to a, the second to b, and the third to c. If there are more than 3 words, then c will contain all words from the third to the end. If there are fewer than 3 words, then any "unused" variables will be left empty, even if they had a value before.

read returns an exit code of 0 if it managed to read a line. If it failed, such as by reaching the end of the file, then it returns a non-zero code. It is often used with the while loop, such as

    while read f

which will cause the loop to continue reading input (and presumably doing something with it) until the input ends, read returns a non-zero code, and the loop terminates.

Case

The case command is an extension of the if command, offering multi-way branching. It compares a given string (possibly read in from the terminal) against a set of patterns, each of which has a corresponding set of commands, which will be executed if the string matches the pattern. The format is:

    case string in
        pattern1) command-list1 ;;
        pattern2) command-list2 ;;
        etc.
    esac

Note that each command-list is terminated with two semi-colons, and that the whole case construct is terminated with esac (case backwards). Only one pattern can be matched at a time, and as soon as its command-list is finished, the whole case command terminates.

Patterns

The standard shell pattern-matching facilities (*, ? and []) can be used in the patterns, as well as the vertical bar to indicate options, for example

    tom | jerry)

will match either word. Note that * will match any string, so it can be used as the last pattern, to match anything not caught by previous patterns.

Example

    #!/bin/sh
    # what-we-got - determine character type of first arg
    case "$1" in
        [0-9]*) echo "$1 starts with a number" ;;
        [a-z]*|[A-Z]*) echo "$1 starts with an alphabetic" ;;
        *) echo "$1 starts with a non numeric or alphabetic character" ;;
    esac

Practical Exercise 5

Consider the following list of different words for "hello":

    hello          (English)
    bonjour        (French)
    hola           (Spanish)
    ciao           (Italian)
    "guten tag"    (German)

Write a script which will read a foreign word from the terminal, and use a case statement to return the appropriate nationality. Don't forget to include a wildcard pattern at the end to catch any unknown words.

Answers to practicals

Practical 1

    #!/bin/sh
    # Use $# to get the number of arguments
    #
    echo There are $# arguments.
    #

    #!/bin/sh
    # Use $9 to get the ninth argument
    #
    echo The ninth argument is $9.
    #

    #!/bin/sh
    # Use shift and $9 to get the tenth argument
    #
    shift
    echo The tenth argument is $9.
    #

    #!/bin/sh
    # Save $1 in another variable, then shift to get the tenth argument
    #
    temp=$1
    shift
    echo The first argument is $temp and the tenth is $9.
    #

Practical 2

    #!/bin/sh
    result=`expr $1 \* 100 / $2`
    echo $result%

It is important to space out the operators and operands as shown. If your normal way of doing this is to divide the two numbers and then multiply by 100 to get a percentage, it will not work in this case.
The division is likely to produce a fraction less than 1, and since expr only works in integer mode, this will be represented as zero. Since the expression is evaluated first due to its back-quoting, it could be given as a direct argument to echo, without the intervening result variable.

Practical 3

    #!/bin/sh
    #
    # Check if it is a file
    #
    if test -f $1
    then
        echo $1 is a file
        if test -r $1
        then
            echo It is readable
        fi
        if test -w $1
        then
            echo It is writeable
        fi
        if test -x $1
        then
            echo It is executable
        fi
        exit 0
    fi
    #
    # Check if it is a directory
    #
    if test -d $1
    then
        echo $1 is a directory
        if test -r $1
        then
            echo It is readable
        fi
        if test -w $1
        then
            echo It is writeable
        fi
        if test -x $1
        then
            echo It is searchable
        fi
        exit 0
    fi
    #
    echo $1 is something other than a file or directory, or does not exist.
    exit 1

Practical 4

    #!/bin/sh
    #
    if test -f $1
    then
        cat $1
    else
        echo The requested file does not exist.
    fi

    #!/bin/sh
    # Blast off - countdown from 10
    i=10
    until test $i -lt 0
    do
        echo $i
        i=`expr $i - 1`
        sleep 1
    done
    echo " ... we have lift off!"
    exit 0

Practical 5

    #!/bin/sh
    # Words for hello
    case "$1" in
        hello) echo English ;;
        bonjour) echo French ;;
        hola) echo Spanish ;;
        ciao) echo Italian ;;
        "guten tag") echo German ;;
        *) echo Word not in database. ;;
    esac

Documentation

Shell Scripts: The Information Services Training Section runs a course on Shell Programming. For further details, contact the Computing Skills Centre on 0131-650 3350 or email eucs.training@ed.ac.uk, quoting course code 1160. The workbook is available on-line at http://www.ucs.ed.ac.uk/eucs_documentation/Documents_by_Number/2630/