CST 383 Shell & Script Programming with Unix Spring 2012

Brief Lecture Notes 1 for CST 383 Spring 2012

My suggestion for learning bash, if you have no experience with *nix shell scripting, is to start with an online book or tutorial. A good reference is http://www.tldp.org/LDP/abs/ which gives you access to several varieties of online or downloadable bash guides.

The first thing you need to understand about the kind of scripting we will cover in this class is that it centers on command line windows (also known as terminal windows). Once you have gotten used to issuing single commands with parameters (arguments), you should add I/O redirection using <, >, <<, >>, and | to provide additional flexibility. Add the quotes ", ', and `, the end-of-command character ;, and the comment character #. Once you understand the PATH variable and how the execute permission works, you should be ready to get to the design part of the first half of the semester.

IO Redirection

Each UNIX program begins existence with default I/O "ports", usually called standard I/O, sometimes streams, sometimes descriptors. They are accessible via regular I/O routines/functions such as read, write, open, close, printf, etc. In the shells they are called stdin, stdout and stderr. They have the associated descriptor numbers 0, 1, 2. They are initially connected to the keyboard for stdin, the terminal window for stdout and (frequently) the terminal window for stderr. If the command-line typist (you) wants something different from that, she must indicate it with I/O redirection. Some of the (usually) unused characters are used to indicate what is to happen to the command I/O streams. These characters are:
>, <, >&, >>, << and |

They are used as follows:

cmd > someplace       redirect the stdout of cmd to file someplace
cmd 2>&1              redirect the stderr to stdout
cmd >> someplace      redirect and append the stdout of cmd to someplace
cmd 2> someplace      stderr goes to file someplace
cmd &> someplace      stdout and stderr go to someplace
cmd < someplace       stdin comes from someplace
cmd << THEEND         stdin comes from command line until THEEND
cmd1 | cmd2           stdout of cmd1 connected to stdin of cmd2

Access to arguments within a shell file:

$0    cmd name
$1    first argument to cmd
$n    nth argument (written ${n} when n is greater than 9)
$*    all arguments
$#    number of arguments

Useful Commands

echo, ls, cat, sort, help, type, more, cut, grep, ssh, exec, bzip2, test

Use man or info to get invocation semantics and processes.

Useful bash operations/key-words

if then elif else fi, while do done, for in do done

Look them up in the bash tutorials.

Other useful bash info

1) In Unix shells
   0 (zero) => success == true
   1 (one, or any other nonzero value) => failure == false

2) All three of the constructs a) test, b) [ x ] and c) [[ x ]] are kinds of testing functionality and return 0 or 1. The bash internal test works mostly on file metadata and string comparisons but may be extended with the full plethora of testing tools and regex functions by using parentheses ( x ). The bash built-in "[ x ]" is essentially the same as test. The bash extended testing keyword pair "[[ x ]]" is more versatile in that it performs no filename expansion and no word splitting, and it does arithmetic evaluation, though not at the level of arithmetic that the double parentheses "(( x ))" accommodate. Be sure to leave spaces around all built-ins and keywords.

3) The if construct is composed of the following possible keywords: if then elif else fi. If you observe the construction "else if", it may have been mistakenly used in place of elif, but it may also indicate a nested if statement. Evaluate first.
4) In bash (as well as most of the other scripting shells in Unix) there are two types of quotes usually used with sequences of characters which are typically not regex. The less strict of these is the so-called double quote ( " ), the other is the single quote ( ' ). The double quote prevents reinterpretation of all special characters except $, `, \ (dollar sign, back quote, and escape). Within single quotes even these lose their specialness (only the closing single quote itself remains special). There is another quote character, usually only used to indicate sub-command execution: the back quote ( ` ).

Regular Expressions (regex)

The general format of a regular expression statement is:

/addr1/,/addr2/op/regex1/regex2/flags

addr1             is the starting line to which the operation is applied; may be a regex
addr2             is the last line to which the operation is applied; may be a regex
op                (if present) is m (for match), s (for substitute), t (for translate; written y in sed), etc.
regex1 & regex2   depend on the operation as to their presence and how they are used

a single address indicates all lines that match the address criteria (e.g., /addr/op/regex///)
no address means all lines
a regex2 that is missing but still delimited indicates "throw away"

sed is a stream editor and may have the full regular expression functionality in its instructions.
awk is a stream editor too, but has more capabilities and can group operations via the use of braces { }.
grep is usually only used for matching, but its reach can be extended by running it through the -exec parameter of find (grep itself has no exec parameter).

This part of the course is built around regular expressions and programs that were written to use them effectively: e.g., grep, sed and awk, with some help from cut and find. Bash uses regular expressions as a natural part of parameters in command invocation, where appropriate. grep, sed and awk use regular expressions as a natural part of their instructions.
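As a rough illustration of how grep, sed and awk take regular expressions in their instructions, here is a minimal sketch. The sample file contents, the function names and the threshold of 80 are assumptions made up for this example; they do not come from the course materials.

```shell
#!/usr/bin/env bash
# Hypothetical sample data written to a temporary file (illustrative only).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
# roster
alice,cst383,91
bob,cst311,78
carol,cst383,85
# end
EOF

# grep: print the lines that match a regex (here: lines starting with a-z).
match_students() { grep '^[a-z]' "$sample"; }

# sed: -n suppresses the default output; s/re1/re2/ substitutes and the
# p flag prints only the lines that were changed.
rename_course() { sed -n 's/cst383/CST 383/p' "$sample"; }

# sed with two regex addresses: delete from a line starting with # through
# the next line containing "bob" (or through end of input if none follows).
drop_range() { sed '/^#/,/bob/d' "$sample"; }

# awk: -F sets the field delimiter, -v passes a variable into the program.
high_scores() { awk -F, -v min=80 'NF == 3 && $3 >= min { print $1 }' "$sample"; }

match_students   # the three student lines
rename_course    # alice and carol lines, with "CST 383"
drop_range       # carol,cst383,85
high_scores      # alice, then carol
```

The pipeline style encouraged in these notes applies here too: for instance, `match_students | cut -d, -f1` would list just the names.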
Sed is commonly invoked this way:

  % sed -f filename filelist

Note: here and below where appropriate, % is being used to indicate a user input prompt, not a hash (associative array).

The -f filename indicates that the file called filename contains instructions that are to be used on each line of the files in filelist, iteratively. For single instructions you could use

  % sed -e "instruction" filelist

In awk, instructions usually are aimed at fields that are separated by a delimiter (default space). This can be changed with awk's -F parameter; sed has no such parameter and works on whole lines.

Another way to embed multiple instructions inline (on the command line) looks like:

  % sed '
  > instruction
  > instruction
  > instruction
  ' filelist > outputfile

This gets all the command issues visible at the same time. Note that the > characters at the start of each line in this case are second-level prompts from the shell, asking for more input.

An additional sed parameter of interest: -n. With this parameter, only lines explicitly affected by the instructions and that have a p in the instruction flag field are output.

awk also performs operations on a file list with essentially the same parameters. In addition to the parameters discussed for sed, awk has an additional useful argument, -v variable=value, which takes effect in the instruction stream.

Instructions for sed and awk are usually composed of a regular expression and an action, although the action may be implicit by not being part of the instruction stream. A typical instruction involves a regular expression that is used to determine where in the input the action is to happen, what the action is, and whether the action is to take place. Typically, instructions involve flags indicating substitution, matching, translation, printing (to stdout), scope of operation and others.

Here are some of the more useful regex metacharacters:

.            matches any single character but newline
*            matches any number of the single character preceding it
[ . . . ]    matches any occurrence of the class enclosed by [ ]
^            as the first character of a regex, matches the start of the line
-            range of characters (inside [ ])
$            as the last character of a regex, matches the end of the line
\            escape the standard meaning of the following character
\{n,m\}      n to m of the preceding character; \{n\} exactly n; \{n,\} n or more
+            1 or more of the preceding character
?            0 or 1 of the preceding character
[^ range]    nothing in the range
( )          used to group regular expressions; must be escaped for normal meaning
{ }          used in awk to group (with the aid of ; ) actions/procedures; need to be escaped otherwise

Use of pipes in bash is encouraged. They are used to connect commands such as sed, awk, cut, grep.

BACK TO BASH SPECIAL CHARACTERS AND WORDS

#        comment (escapable, as are most to follow, so no more comments unless unescapable)
echo     print
;        command separator (line feed replacement)
         if true/false ; then true/false ; else true/false ; fi ; cmd/lf
;;       case option terminator
.        "dot": among other meanings, source or current working dir
"        partial quote string: most special characters inside are treated as ordinary characters
'        full quote string: all special characters inside are treated as ordinary characters
,        links arithmetic operations or concatenates strings
\        escape special meaning
while something-is-true ; do some things ; done
until something-is-true ; do some things ; done
case "$variable" in
  "$var1") cmd ;;
  "$var2") cmd ;;
  . . .
esac
!        negation, true -> false ; false -> true
*        wild card or multiply
?        single char wild card or test operator (e.g., a op b ? x : y means if a op b then x else y)
$        end of line or variable dereference
$*       all parameters as a single word; "$*" handles spaces
$@       parameters as array of quoted strings
$?       exit status of last op
$$       process ID
()       command group, starts a subshell; locals in subshell unavailable to parent
()       array initialization
{}       brace expansion, or anonymous function; locals available to the greater script
[]       test or array element
[[ ]]    test keyword
|        pipe
&&       logical AND
||       logical OR
&        run cmd or grouping in background
exit val
return val
break
xyz () { stuff }
local
{} vs () sort of subroutines

COMMAND LINE INTERFACES

Different invocation methods

  command options script input-file
  command options 'script' input-file
  command options -f script-file input-file
  command options script-input < input-file
  command options script-input <<        # input from command line; sometimes -
  something | command options script

-E option for extended reg expr (-e introduces an instruction)

In line example

  sed '
  > …
  > …
  ' < infile > outfile

Output

sed prints every line unless quiet (-n), then only when told to (p)
awk only prints when told to (print)

reg expr for sed/awk

/addr1/,/addr2/op/re1/re2/flags

Addr1    start addr
Addr2    stop addr
Op       operations: s (substitute), t (translate; written y in sed), d (delete), … uses re1 and re2 as needed
Flags    options (e.g., g for global)

awk 'instructions' in-files
awk -f scriptfile in-files
awk -F,                        to change delimiter to , (or another char) from space
fields per line                $0 all fields, else $1, $2, …
awk '{ print }' in-file        # output to stdout
difference in
  print $1; print $2; …
  print $1 $2 $3 …
awk option -v var=value
see $ClassHome/../indexonly/sedawk2progs for examples

EXTRA REG EXPR notation to look up

Equivalence classes     [= … =]
POSIX ranges/classes    [[: … :]] such as alpha, alnum, blank, cntrl, digit, space, …

STRING MANIPULATION IN BASH

Length
  ${#string}
  expr length $string
Match
  expr match "$string" '$substring'
  expr "$string" : '$substring'
Index
  expr index $string $substring
Substring Extraction
  ${string:position}
  ${string:position:length}
Substring removal (substring is a shell pattern, i.e. a glob, not a full RE)
  ${string#substring}     # front, shortest
  ${string##substring}    # front, longest
  ${string%substring}     # back, shortest
  ${string%%substring}    # back, longest
Substring replacement
  ${string/substring/replacement}     # first match
  ${string//substring/replacement}    # all matches

Functions

  function function_name {
  …
  }

Or

  function_name() {
  …
  }

Or

  func_nam () { … ;}

A function call is (generally) equivalent to a command. The function definition must precede the first call to it. There is no declaration of a function. A function may not be empty, but functions can be nested.

Local variables

A local variable (i.e., declared as such) is visible only within the block of code in which it is declared (e.g., within a function). Global variables declared within a function are not visible until the function is called. All variables declared outside of a function have global scope (i.e., cannot be declared local).
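The string operations and the function and local-variable rules above can be tried together in a short script. This is only a sketch under stated assumptions: the function names (base_name, set_global, describe), the variable counter and the sample strings are hypothetical, and the script assumes bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# All names below are hypothetical and chosen only for illustration.

# Strip the directory part (longest front match of */) and the last
# extension (shortest back match of .*) using parameter expansion only.
base_name() {
    local path=$1            # local: visible only inside this function
    local file=${path##*/}   # front, longest
    echo "${file%.*}"        # back, shortest
}

# Assigning without "local" creates a global variable, but it only comes
# into existence once the function has actually been called.
set_global() {
    counter=42
}

# Length, substring extraction and replace-all on one string.
describe() {
    local s=$1
    echo "length=${#s} first3=${s:0:3} swapped=${s//a/A}"
}

base_name /home/user/notes.tar.gz        # prints notes.tar
echo "before call: ${counter:-unset}"    # counter does not exist yet
set_global
echo "after call: $counter"              # prints after call: 42
describe banana                          # prints length=6 first3=ban swapped=bAnAnA
```

Because the definitions come before the calls, the script respects the rule that a function must be defined before it is first used.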