CST 383
Shell & Script Programming with Unix
Spring 2012
Brief Lecture Notes 1
for
CST 383
Spring 2012
My suggestion for learning bash is to find an online book or tutorial to start if you have no experience with *nix
shell scripting. A good reference is http://www.tldp.org/LDP/abs/ which gives you access to several varieties of
online or downloadable bash guides.
The first thing you need to understand about the kind of scripting we will cover in this class is that it centers on
command line windows (also known as terminal windows). Once you have gotten used to issuing single commands
with parameters (arguments), you should add I/O redirection using <, >, <<, >>, and | to provide additional
flexibility. Then add the quotes ", ', and `, the end-of-command character ; and the comment character #. Once you understand
the PATH variable and how the execute permission works, you should be ready to get to the design part of the first
half of the semester.
IO Redirection
Each UNIX program begins existence with default I/O “ports”, usually called standard I/O, sometimes
streams, sometimes descriptors.
They are accessible via regular I/O routines/functions such as read, write, open, close, printf, etc.
In the shells they are called stdin, stdout and stderr.
They have the associated descriptor numbers 0, 1, 2.
They are initially connected to the keyboard for stdin, the terminal window for stdout and (frequently)
the terminal window for stderr.
If the command-line typist (you) wants something different from that, she must indicate it with I/O
redirection. Some of the (usually) unused characters are used to indicate what is to happen to the
command I/O streams. These characters are >, <, >&, >>, << and |. They are used as
follows:
cmd > someplace       redirect the stdout of cmd to file someplace
cmd 2>&1              redirect the stderr to stdout
cmd >> someplace      redirect and append the stdout of cmd to someplace
cmd 2> someplace      stderr goes to file someplace
cmd &> someplace      stdout and stderr go to someplace
cmd < someplace       stdin comes from someplace
cmd << THEEND         stdin comes from command line until THEEND
cmd1 | cmd2           stdout of cmd1 connected to stdin of cmd2
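The operators in the list above can be tried out with a short script. This is a minimal sketch: the helper noisy and all file names are invented for the example, and the files live in a scratch directory created with mktemp.

```shell
#!/usr/bin/env bash
# noisy is a hypothetical helper that writes one line to each output stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

dir=$(mktemp -d)                             # scratch directory for the example files

echo "first"  >  "$dir/out.txt"              # >  creates or truncates the file
echo "second" >> "$dir/out.txt"              # >> appends to it

noisy 2> "$dir/err.txt" > /dev/null          # stderr goes to err.txt, stdout is discarded
noisy > "$dir/both.txt" 2>&1                 # stdout to the file, then stderr to the same place

upper=$(tr 'a-z' 'A-Z' < "$dir/out.txt")     # stdin of tr comes from out.txt

# here-document: stdin of sort is the text up to the line THEEND
sort << THEEND > "$dir/sorted.txt"
pear
apple
THEEND

count=$(cat "$dir/both.txt" | grep -c "to")  # stdout of cat connected to stdin of grep

echo "upper=$upper count=$count"
```

Note the order in the 2>&1 line: the file redirection comes first, so that stderr is pointed at the file rather than at the terminal.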
Access to arguments within a shell file:
$0      cmd name
$1      first argument to cmd
$n      nth argument
$*      all arguments
$#      number of arguments
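A quick way to experiment with these variables without writing a separate script file is the set -- built-in, which replaces the current positional parameters. The argument values below are made up for illustration.

```shell
#!/usr/bin/env bash
# Pretend the script was invoked as: script alpha beta gamma
set -- alpha beta gamma

count=$#        # number of arguments
first=$1        # first argument
third=$3        # nth argument, here n = 3
all="$*"        # all arguments, joined by spaces

echo "name=$0 count=$count first=$first third=$third all=$all"
```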
Useful Commands
echo, ls, cat, sort, help, type, more, cut, grep, ssh, exec, bzip2, test
Use man or info to look up how each command is invoked and what it does.
Useful bash operations/keywords
if then elif else fi, while do done, for in do done
Look them up in the bash tutorials.
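As a starting point before reading the tutorials, here is a small sketch that uses each of the three constructs once. The function name classify and the numbers are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
# classify is a hypothetical function name used only for this sketch.
classify() {
    if [ "$1" -lt 0 ]; then          # if ... then
        echo "negative"
    elif [ "$1" -eq 0 ]; then        # elif ... then
        echo "zero"
    else                             # else
        echo "positive"
    fi                               # fi closes the if
}

total=0
for n in 1 2 3 4; do                 # for ... in ... do ... done
    total=$((total + n))
done

countdown=""
i=3
while [ "$i" -gt 0 ]; do             # while ... do ... done
    countdown="$countdown$i "
    i=$((i - 1))
done

kind=$(classify "$total")
echo "total=$total kind=$kind countdown=$countdown"
```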
Other useful bash info
1) In Unix shells an exit status of 0 (zero) => success == true;
any nonzero status, most often 1 (one) => failure == false.
2) All three of the constructs a) test, b) [ x ] and c) [[ x ]] are kinds of testing functionality and return 0 (true) or 1 (false).
The bash built-in test works mostly on file metadata, string comparisons and integer comparisons;
its expressions can be grouped with escaped parentheses \( x \).
The bash built-in “[ x ]” is essentially the same as test.
The bash extended testing keyword pair “[[ x ]]” is more versatile: inside it there is no filename
expansion and no word splitting, it can match regular expressions with =~, and it evaluates arithmetic
in numeric comparisons, though not at the level the double parentheses “(( x ))” accommodate.
Be sure to leave spaces around all built-ins and keywords.
3) The if construct is composed of the following possible keywords: if then elif else fi
If you see the construction “else if”, it may have been mistakenly used in place of elif, but it may also
indicate a nested if statement. Evaluate first.
4) In bash (as well as most of the other scripting shells in Unix) there are two types of quotes usually used
with sequences of characters which are typically not regex. The less strict of these is the so-called double
quote ( " ), the other is the single quote ( ' ). The double quote prevents reinterpretation of all special
characters except $, `, \ (dollar sign, back quote, and backslash escape).
Within single quotes even these lose their specialness; the only thing a single-quoted string cannot contain is another single quote.
There is another quote character, the back quote ( ` ), usually only used to indicate sub-command execution.
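Items 1, 2 and 4 above can be checked directly at the prompt. The following is a minimal sketch; the variable names and sample strings are invented, and each exit status is captured in a variable right after the test so it can be inspected.

```shell
#!/usr/bin/env bash
scratch=$(mktemp)                        # an empty scratch file

test -e "$scratch"; exists=$?            # 0: the file exists
[ -s "$scratch" ];  nonempty=$?          # 1: the file has size zero

name="lecture notes"
[[ $name == lecture* ]]; glob_match=$?   # 0: no word splitting, so $name needs no quotes here
(( 6 * 7 == 42 ));       arithmetic=$?   # 0: arithmetic evaluation

course="CST 383"
double="Course: $course"                 # $ keeps its meaning inside double quotes
single='Course: $course'                 # nothing is special inside single quotes
escaped="Cost: \$5"                      # backslash turns $ into an ordinary character
backquoted=`echo sub-command`            # back quotes run a sub-command

rm -f "$scratch"
echo "$exists $nonempty $glob_match $arithmetic"
echo "$double | $single | $escaped | $backquoted"
```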
Regular Expressions (regex)
The general format of a regular expression statement is:
/addr1/,/addr2/op/regex1/regex2/flags
addr1   is the first line to which the operation is applied; may be a regex
addr2   is the last line to which the operation is applied; may be a regex
op      (if present) is m (for match), s (for substitute), t (for translate), etc.
regex1 & regex2   depend on the operation as to their presence and how they are used
A single address indicates all lines that match the address criteria (e.g., /addr/op/regex//).
No address means all lines.
A missing but delineated regex2 indicates “throw away”.
sed is a stream editor and may use the full regular expression functionality in its instructions.
awk is a stream editor too, but has more capabilities and can group operations via the use of braces { }.
grep is usually only used for matching but can be extended, for example by running it from the -exec parameter of find.
This part of the course is built around regular expressions and programs that were written to use them effectively:
e.g., grep, sed and awk with some help from cut and find. Bash uses regular expressions as a natural part of
parameters in command invocation, where appropriate. grep, sed and awk use regular expressions as a natural
part of their instructions.
sed is commonly invoked this way:
% sed -f filename filelist # Note: here and below where appropriate, % is being used to indicate a
user input prompt, not a hash (associative array).
The -f filename indicates that the file called filename contains instructions that are to be used on each
line of the files in filelist, iteratively.
For single instructions you could use % sed -e "instruction" filelist
sed instructions operate on whole lines. In awk, instructions are usually aimed at fields that are separated by a delimiter (default whitespace).
This can be changed with awk's -F parameter.
Another way to embed multiple instructions inline (on the command line) looks like:
% sed '
> instruction
> instruction
> instruction ' filelist > outputfile
This keeps all the instructions visible at the same time. Note that the leading > characters in this case are second-level
prompts from the shell asking for more input; you do not type them.
An additional sed parameter of interest: -n. With this parameter only lines explicitly affected by the
instructions and that have a p in the instruction flag field are output.
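The sed invocations described above can be compared side by side on a small input. This is only a sketch: the sample lines and the instruction file are invented, and both are written to temporary files so that the example is self-contained.

```shell
#!/usr/bin/env bash
input=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3' 'delta 4' > "$input"

# -e: a single instruction; every line is printed, changed or not
all_lines=$(sed -e 's/a/A/g' "$input")

# -n plus the p flag: only the line where the substitution happened is printed
only_changed=$(sed -n -e 's/beta/BETA/p' "$input")

# /addr1/,/addr2/: delete from the line matching beta through the line matching gamma
ranged=$(sed -e '/beta/,/gamma/d' "$input")

# -f: the same kind of instructions, read from a file
script=$(mktemp)
printf '%s\n' 's/alpha/first/' '/delta/d' > "$script"
from_file=$(sed -f "$script" "$input")

rm -f "$input" "$script"
echo "$only_changed"
echo "$ranged"
echo "$from_file"
```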
awk also performs operations on a file list with essentially the same parameters. In addition to the parameters
discussed for sed, awk has an additional useful argument, -v variable=value, which sets a variable that takes effect
in the instruction stream.
Instructions for sed and awk are usually composed of a regular expression and an action, although the action
may be implicit by not being part of the instruction stream. A typical instruction involves a regular
expression that is used to determine where in the input the action is to happen, whether the action is to
take place, and what the action is. Typically, instructions involve flags indicating substitution, matching,
translation, printing (to stdout), scope of operation and others.
Here are some of the more useful regex metacharacters:
.           matches any single character but newline
*           matches any number (including zero) of the single character preceding it
[ . . . ]   matches any one character of the class enclosed by [ ]
    ^       as the first character of the class, negates the class
    -       range of characters
^           outside a class, matches the start of the line
$           matches the end of the line
\           escape the standard meaning of the following character
\{n,m\}     n to m of the preceding character; \{n\} exactly n; \{n,\} n or more
+           1 or more of the preceding character
?           0 or 1 of the preceding character
[^ range]   any character not in the range
( )         used to group regular expressions; must be escaped to get their normal (literal) meaning
{ }         used in awk to group (with the aid of ; ) actions/procedures; need to be escaped otherwise
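One way to get a feel for these metacharacters is to count matches with grep -c against a small word list. The words below are invented for the sketch; + , ? and unescaped ( ) are used with grep -E (extended regex), while the others are shown in grep's default basic syntax.

```shell
#!/usr/bin/env bash
words=$(mktemp)
printf '%s\n' 'cat' 'cot' 'coat' 'ct' 'caat' 'dog' > "$words"

dot=$(grep -c '^c.t$' "$words")            # cat, cot
star=$(grep -c '^ca*t$' "$words")          # ct, cat, caat
class=$(grep -c '^c[ao]t$' "$words")       # cat, cot
negated=$(grep -c '^[^c]' "$words")        # dog
interval=$(grep -c '^ca\{2\}t$' "$words")  # caat
plus=$(grep -E -c '^ca+t$' "$words")       # cat, caat
optional=$(grep -E -c '^ca?t$' "$words")   # ct, cat
group=$(grep -E -c '^(ca|do)' "$words")    # cat, caat, dog

rm -f "$words"
echo "$dot $star $class $negated $interval $plus $optional $group"
```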
Use of pipes in bash is encouraged. They are used to connect commands such as sed, awk, cut, grep.
BACK TO BASH
SPECIAL CHARACTERS AND WORDS
#       comment (escapable, as are most of the characters that follow, so escapability is mentioned again only when a character is not escapable)
echo    print
;       command separator (line feed replacement)
            if true/false ; then true/false ; else true/false ; fi; cmd/lf
;;      case option terminator
.       “dot”: among other things, means source (a file) or the current working dir
"       partial quote: most special characters inside the string are treated as ordinary characters
'       full quote: all special characters inside the string are treated as ordinary characters
,       links arithmetic operations or concatenates strings
\       escape special meaning
            while something-is-true ; do some things; done
            until something-is-true ; do some things; done
            case "$variable" in "$var1") cmd ;; "$var2") cmd ;; . . . esac
!       negation, true -> false ; false -> true
*       wild card or multiply
?       single char wild card or test operator (e.g., a op b ? x : y, meaning if a op b then x else y)
$       end of line or variable dereference
$*      all parameters; quoted as "$*" they become a single word, spaces included
$@      all parameters; quoted as "$@" each parameter is its own quoted string
$?      exit status of last op
$$      process ID
()      command group, starts a subshell; variables set in the subshell are unavailable to the parent
()      array initialization
{}      brace expansion, or an inline command group (anonymous function) whose variables remain available to the rest of the script
[]      test or array element
[[ ]]   test keyword
|       pipe
&&      logical AND
||      logical OR
&       run cmd or grouping in background
exit val
return val
break
xyz () { stuff; }
local
{} vs ()
sort or subroutines
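Several entries in the list above are easiest to understand by watching what they do to a variable. The sketch below is illustrative only: count_words is a hypothetical helper and all values are invented.

```shell
#!/usr/bin/env bash
count_words() { echo $#; }          # hypothetical helper: prints how many arguments it received

set -- "one two" three              # two parameters; the first contains a space
star_quoted=$(count_words "$*")     # 1: "$*" joins all parameters into a single word
at_quoted=$(count_words "$@")       # 2: "$@" keeps each parameter as its own word

where="parent"
( where="subshell" )                # ( ) runs in a subshell, so the assignment is lost
after_parens=$where
{ where="braces"; }                 # { } runs in the current shell, so the assignment sticks
after_braces=$where

true  && and_result="ran"           # && runs the right side only after success
false || or_result="ran"            # || runs the right side only after failure
! true; negated_status=$?           # ! negates: true (0) becomes 1, and $? reports it

case "$after_braces" in             # ;; terminates each case option
    parent) kind="unchanged" ;;
    braces) kind="changed" ;;
    *)      kind="other" ;;
esac

echo "$star_quoted $at_quoted $after_parens $after_braces $negated_status $kind"
```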
COMMAND LINE INTERFACES
Different invocation methods
command options script input-file
command options 'script' input-file
command options -f script-file input-file
command options script-input < input-file
command options script-input << #input from command line; sometimes -
something | command options script
-E option for extended reg expr (grep, and sed in its GNU and BSD versions); -e introduces an instruction or pattern
In line example
sed '
> …
> …
' < infile > outfile
Output
sed prints every line unless quiet (-n) then only when told to (p)
awk only prints when told to (print)
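The invocation methods and the output rules can be compared on one small input. In this sketch the data file, the program file and the pattern /t/ are all invented for illustration; each awk call prints the lines that contain a t.

```shell
#!/usr/bin/env bash
data=$(mktemp)
printf '%s\n' one two three > "$data"

from_file=$(awk '/t/ { print }' "$data")          # 'script' input-file
from_stdin=$(awk '/t/ { print }' < "$data")       # input redirected with <
from_pipe=$(cat "$data" | awk '/t/ { print }')    # something | command

prog=$(mktemp)
echo '/t/ { print }' > "$prog"
from_script=$(awk -f "$prog" "$data")             # -f script-file input-file

silent=$(awk '{ count++ }' "$data")               # no print, so awk outputs nothing
sed_default=$(sed 's/one/ONE/' "$data")           # sed prints every line by default

rm -f "$data" "$prog"
echo "$from_file"
echo "$sed_default"
```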
reg expr for sed/awk
/addr1/,/addr2/op/re1/re2/flags
addr1   start addr
addr2   stop addr
op      operations: s (substitute), t (translate; in sed itself the command letter is y), d (delete), …; uses re1 and re2 as needed
flags   options (e.g., g for global)
awk 'instructions' in-files
awk -f scriptfile in-files
awk -F, to change delimiter to , (or another char) from space
fields per line
    $0 all fields, else $1, $2, …
awk '{ print }' in-file #output to stdout
difference in
    print $1; print $2; …      (each field on its own output line)
    print $1 $2 $3 …           (fields concatenated on one output line, with nothing between them)
awk option -v var=value
see $ClassHome/../indexonly/sedawk2progs for examples
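Until you have looked at those examples, the following sketch shows the awk features listed above on a tiny comma-separated file. The file contents and the variable name limit are invented; print $1, $2 (with a comma) is included for contrast because it inserts the output separator between fields.

```shell
#!/usr/bin/env bash
csv=$(mktemp)
printf '%s\n' 'ann,30,nyc' 'bob,25,sf' > "$csv"

names=$(awk -F, '{ print $1 }' "$csv")               # first field of every line
glued=$(awk -F, '{ print $1 $2 }' "$csv")            # concatenation: ann30, bob25
spaced=$(awk -F, '{ print $1, $2 }' "$csv")          # comma adds a space: ann 30, bob 25
separate=$(awk -F, '{ print $1; print $2 }' "$csv")  # two print statements, two lines each
whole=$(awk -F, 'NR == 1 { print $0 }' "$csv")       # $0 is the whole line

# -v passes a shell-side value into the awk program as the variable limit
older=$(awk -F, -v limit=26 '$2 > limit { print $1 }' "$csv")

rm -f "$csv"
echo "$glued"
echo "$older"
```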
EXTRA REG EXPR notation to look up
Equivalence classes [= … =]
POSIX ranges/classes [[: … :]] such as alpha, alnum, blank, cntrl, digit, space, …
STRING MANIPULATION IN BASH
Length
${#string}
expr length $string
Match (length of the regex $substring matched at the start of $string)
expr match "$string" '$substring'
expr "$string" : '$substring'
Index
expr index $string $substring
Substring Extraction
${string:position}
${string:position:length}
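The length, match and extraction forms can be tried on a sample string. The string below is arbitrary; positions are counted from zero. Only the portable expr "string" : 'regex' form is used here, since expr length, expr match and expr index are not available in every expr implementation.

```shell
#!/usr/bin/env bash
string="abcABC123ABCabc"

length=${#string}                          # number of characters: 15
matched=$(expr "$string" : 'abc[A-Z]*')    # length of the match at the start: 6 (abcABC)
tail=${string:7}                           # from position 7 to the end: 23ABCabc
middle=${string:3:3}                       # 3 characters starting at position 3: ABC

echo "length=$length matched=$matched tail=$tail middle=$middle"
```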
Substring removal (substring is a shell glob pattern, not a regex)
${string#substring}      # front, shortest
${string##substring}     # front, longest
${string%substring}      # back, shortest
${string%%substring}     # back, longest
Substring replacement
${string/substring/replacement}      # first match
${string//substring/replacement}     # all matches
Functions
function function_name {
…
}
Or
function_name()
{
…
}
Or
func_nam () {
… ;}
A function call is (generally) equivalent to a command. The function definition must precede the first
call to it. There is no separate declaration of a function. A function may not be empty but can be nested.
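Here is a small sketch of the three definition styles, each called the way a command would be. The function names and the sample path are invented; the bodies reuse the substring removal and replacement forms from the string manipulation section so that both can be seen working together.

```shell
#!/usr/bin/env bash
# All function names here are hypothetical.
function base_name {
    echo "${1##*/}"                   # remove the longest match of */ from the front
}

strip_ext()
{
    echo "${1%.*}"                    # remove the shortest match of .* from the back
}

swap_dashes () { echo "${1//-/_}" ;}  # one-line form: note the ; before the closing brace

path="/home/user/lecture-notes-1.tar.gz"
file=$(base_name "$path")             # lecture-notes-1.tar.gz
stem=$(strip_ext "$file")             # lecture-notes-1.tar
bare=${file%%.*}                      # longest match from the back: lecture-notes-1
ident=$(swap_dashes "$bare")          # all matches replaced: lecture_notes_1
first_only=${bare/-/_}                # first match only: lecture_notes-1

echo "$file $stem $bare $ident $first_only"
```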
Local variables
A local variable (i.e., declared as such) is visible only within the block of code in which it is
declared (e.g., within a function).
Global variables declared within a function are not visible until the function is called. All variables
declared outside of a function have global scope (i.e., cannot be declared local).
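The scoping rules can be observed with a short experiment. This is a sketch with invented names; ${name:-unset} is used to print the word unset when a variable has no value.

```shell
#!/usr/bin/env bash
set_values() {
    local hidden="only inside"      # local: disappears when the function returns
    created="made in function"      # not declared local: global once the function has run
    inner_view=$hidden              # the local is visible here, inside the function
}

before=${created:-unset}            # the function has not been called yet
set_values
after=${created:-unset}             # now the global exists
leaked=${hidden:-unset}             # the local never reaches the caller

echo "before=$before after=$after leaked=$leaked inner_view=$inner_view"
```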