AWK overview
Patterns and actions
Records and fields
Print vs. printf
Students' grades in a text file
John 22 56 38 70 85 80
Alex 90 89 79 98 35
How can I calculate John's current average within this file?
– grep?
Search for John with grep, which gives me the line.
Now I can use my calculator to figure it out.
– sed?
sed will allow me to print, change, delete, etc.
I really want to automatically manipulate the values within this line.
This is where awk comes in.
(awk me amadeus)
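A minimal sketch of where this is heading, assuming the grades are the whitespace-separated fields after the name. The loop and the names sum and avg are illustrative and do not come from the slides:

```shell
# Average the grades on the line that starts with John: fields 2..NF hold the scores.
avg=$(printf 'John 22 56 38 70 85 80\nAlex 90 89 79 98 35\n' |
  awk '/^John/ {
         sum = 0
         for (i = 2; i <= NF; i++) sum += $i   # add up every grade on the line
         printf "%.2f\n", sum / (NF - 1)       # NF - 1 grades follow the name
       }')
echo "$avg"
```

The pattern picks the line, and the action does the arithmetic that grep and sed cannot.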
The name comes from the first initials of the last names of its authors: Aho, Weinberger, and Kernighan
Which awk are we tawking about?
– awk
– nawk – new awk (on CS machines)
– gawk – GNU awk (bart)
awk '/pattern/' file
awk '{action}' file
awk '/pattern/ {action;}' file
cat file | awk '{action}'
Awk automatically reads in the file for you line by line.
– No need to open or close the file yourself (unlike in C or Java)
– pattern section FINDS LINES with that pattern
– action section does the actions you defined on the lines it found
– The original file does not change.
awk '{ print }' fruit_prices
Note: The pattern is missing here, so the awk command print is applied to every line that is read.
awk '
/\$[0-9]*\.[0-9][0-9]*/ { print }
' fruit_prices
Actions are written by the programmer, so you are not limited to a fixed set such as print, delete, and substitute (p/d/s in sed). That is why it is so awesome!
Actions consist of
– variable assignments,
– arithmetic and logic operators,
– decision structures,
– looping structures.
For example: print, if, while, and for
awk '{print}' filename
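As a rough illustration of those building blocks in one action, the sketch below prints each student's highest grade. The sample data and the variable names max and i are assumptions made for the example:

```shell
# Print each name with the highest grade on its line.
result=$(printf 'John 22 56 38\nAlex 90 89 79\n' |
  awk '{
         max = $2 + 0                           # variable assignment (+ 0 forces a number)
         for (i = 3; i <= NF; i++)              # looping structure
           if ($i + 0 > max) max = $i + 0       # decision structure
         print $1, max
       }')
echo "$result"
```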
format 1: awk 'script'
– where INPUT must come from a pipe or STDIN
– command | awk 'script'
format 2: awk 'script' input1 input2 ... inputn
– where we supply input FILES as input1, input2, etc.
format 3: awk -f script_file input1...
– where the script is read from script_file (in a script, # starts a comment)
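The three formats can be compared side by side. This is only a sketch: the scratch directory and the file names fruit and count.awk are made up for the demonstration.

```shell
# Hypothetical scratch files for the demonstration.
dir=$(mktemp -d)
printf 'apple 1\nbanana 2\n' > "$dir/fruit"
printf '# count.awk: a comment line inside a script file\n{ n++ }\nEND { print n }\n' > "$dir/count.awk"

out1=$(cat "$dir/fruit" | awk '{ n++ } END { print n }')   # format 1: input from a pipe
out2=$(awk '{ n++ } END { print n }' "$dir/fruit")         # format 2: input from a named file
out3=$(awk -f "$dir/count.awk" "$dir/fruit")               # format 3: script read from a file
echo "$out1 $out2 $out3"
rm -r "$dir"
```

All three count the same two lines; only the way the script and the input reach awk differs.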
Types
– Regular expressions
– BEGIN
Do all the stuff BEFORE reading any input
– END
does all this stuff AFTER reading ALL input.
Pattern is optional
If no pattern is specified, the "action" will occur for EVERY LINE, one at a time.
awk '{Action}' filename
awk '{print;}' names prints all lines
awk 'BEGIN {print "The average grades"}'
Supports
– ^, $, ., *, +, ?, [ABC], [^ABC],
– [A-Z], A|B, (AB)+, \, &
Does not support
– Backreferencing, \( \)
– Repetition, \{ \}
awk '
BEGIN { actions ; }
/pattern/ { actions ; }
/pattern/ { actions ; }
END { actions ; }
' files
Execution steps:
1) If a BEGIN pattern is present, executes its actions
2) Reads an input line and parses it into fields
3) Compares each of the specified patterns against the input line; when a pattern matches, executes its actions. This step is repeated for all patterns.
4) Repeats steps 2 and 3 while input lines are present
5) After the script reads all the input lines, if the END pattern is present, executes its actions
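The execution steps can be traced with a small sketch. The two-line input and the rule labels are invented for illustration:

```shell
# Each rule leaves a marker so the order of execution is visible in the output.
order=$(printf 'a\nb\n' | awk '
  BEGIN  { printf "begin " }          # step 1: runs before any input is read
  /a/    { printf "a-rule " }         # steps 2-3: tried against every input line
  /./    { printf "line%d ", NR }     # a line can match more than one pattern
  END    { print "end" }              # step 5: runs after the last line
')
echo "$order"
```

The first line matches both patterns, so both actions run for it, in the order the rules appear.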
Place the following in the file tryawk1.awk
BEGIN { print "Starting to read input"; nLines = 0; }
/^.*$/ { nLines++; }
END { print "DONE: Total lines = " nLines; }
– Run the command: cat tryawk1.awk | awk -f tryawk1.awk
– Counts the number of lines in the input
– nLines is a variable; note there is NO declaration, you just use it
– The print command prints a line of text and adds a newline to the end of the line
awk has RECORDS (lines) and FIELDS
$0 represents the entire line of input
$1 represents the first field
print works just like echo
– print $1 $2 # $1 concatenated with $2
– print $1, $2 # $1 OFS $2
cat fruit_prices
awk '{print;}' fruit_prices # prints all lines
awk '{print $0;}' fruit_prices # prints each entire line
awk '{print $1;}' fruit_prices # prints first field in each line
awk '{print $2;}' fruit_prices # prints second field in each line
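The difference between a space and a comma in print is easy to miss, so here is a short sketch. The sample line is an assumption, since the contents of fruit_prices are not shown:

```shell
line='Apple 0.89'
concat=$(echo "$line" | awk '{ print $1 $2 }')                       # no comma: the fields are concatenated
spaced=$(echo "$line" | awk '{ print $1, $2 }')                      # comma: OFS (a space by default) goes between
dashed=$(echo "$line" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')  # same print, different OFS
printf '%s\n' "$concat" "$spaced" "$dashed"
```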
cat phones.data
John Robinson 234-3456
Yin Pan 123-4567
awk '{ print $1, $2, $3 }' phones.data
John Robinson 234-3456
Yin Pan 123-4567
awk '{ print $2 ",", $1, $3 }' phones.data
Robinson, John 234-3456
Pan, Yin 123-4567
awk '/^$/ { print x += 1 }' phones.data # prints a running count of blank lines
awk '/Mary/ { print $0 }' phones.data # prints lines containing Mary
ls -l | awk '
$6 == "Oct" { sum += $5 ; }
END { print sum ; }
'
ls -l | awk -f block_use.awk
cat block_use.awk
$6 == "Oct" { sum += $5 ; }
END { print sum ; }
#!/bin/sh
awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0,"*";}
/\$0\.[0-9][0-9]*/ { print ;}
' fruit_prices
awk defines RECORDS (lines) and FIELDS
– FS, input field separator (default=space/tab)
– OFS, output field separator (default=space)
– ORS, Output record separator (default=newline)
– RS, Input record separator (default=newline)
– NR, number of the current record being processed
– NF, number of fields within current record
– FILENAME, awk sets this variable to the name of the file that it is currently reading. (If you have more than one input file, awk resets this variable as it reads each file in turn.)
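A quick way to watch NR, NF, and FILENAME change is to print them for every record. The two input files named one and two are hypothetical scratch files:

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one"
printf 'f\n' > "$dir/two"
# Print the file name, the running record number, and the field count for each line.
report=$(cd "$dir" && awk '{ print FILENAME, NR, NF }' one two)
echo "$report"
rm -r "$dir"
```

Note that NR keeps counting across files, while FILENAME switches when awk moves on to the second file.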
awk '{print $1, $3}' names
– Reads one line of input into $0, based on RS
– Breaks the line into fields based on FS and stores them in numbered variables, starting with $1
– Prints the fields with print (or another output statement), using OFS to separate them
– After awk displays its output, it goes to the next line and repeats. The output lines are separated by ORS.
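To make the output side of that cycle visible, the sketch below changes OFS and ORS to values that stand out. The input lines and the separator choices are assumptions for the example:

```shell
# OFS goes between the printed fields; ORS ends every printed record.
out=$(printf 'Ann Lee 1\nBob Ray 2\n' |
  awk 'BEGIN { OFS = ":"; ORS = ";\n" } { print $1, $3 }')
echo "$out"
```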
Manually resetting FS in a BEGIN pattern
– Forces you to hard code the value of the field separator
– BEGIN{FS=":" ; }
– Example:
$ awk 'BEGIN { FS=":" ; } { print $1, $6 ; }' /etc/passwd
Specifying the -F option to awk
– awk -F: ' { … } '
– Enables using a shell variable to specify the field separator dynamically
– Example:
$ sep=':'
$ awk -F"$sep" '{ print $1, $6 ; }' /etc/passwd
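Both approaches can be tried without touching /etc/passwd. The sketch uses one made-up passwd-style line as sample data:

```shell
data='root:x:0:0:root:/root:/bin/sh'       # one passwd-style line (sample data)
sep=':'
a=$(echo "$data" | awk 'BEGIN { FS = ":" } { print $1, $6 }')   # separator hard-coded in the script
b=$(echo "$data" | awk -F"$sep" '{ print $1, $6 }')             # separator taken from a shell variable
echo "$a"
echo "$b"
```

Quoting "$sep" keeps the shell from splitting or expanding the separator before awk sees it.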
FirstName;LastName;Address;City;State;Zip;Phone
SSN:DOB:NumberOfDependents
HospitalizationCode,DentalCode,LifeCode
Convert this file format to:
SSN,LastName,FirstName,Address,….
awk 'BEGIN { OFS=","; FS=";" }
NR%3==1 { F=$1; L=$2; A=$3; …..
          FS=":" }   # prepare FS for the next line
NR%3==2 { SSN=$1; DOB=$2; …
          FS="," }
NR%3==0 { …; print SSN, L, F, A, …
          FS=";" }
' filename
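A runnable sketch of the same idea, under stated assumptions: the three sample lines are invented, the variable names are illustrative, and the order of the columns after Address is a guess, since the slide elides it. Each rule saves its fields first and only then changes FS, because a new FS value applies to the next record read.

```shell
out=$(printf '%s\n' \
  'Ann;Lee;1 Main St;Troy;NY;12180;555-1234' \
  '123-45-6789:1990-01-02:2' \
  'H1,D2,L3' |
  awk 'BEGIN { OFS = ","; FS = ";" }
       NR % 3 == 1 { first = $1; last = $2
                     rest = $3 OFS $4 OFS $5 OFS $6 OFS $7
                     FS = ":" }                 # takes effect on the next line
       NR % 3 == 2 { ssn = $1; dob = $2; deps = $3
                     FS = "," }
       NR % 3 == 0 { print ssn, last, first, rest, dob, deps, $1, $2, $3
                     FS = ";" }                 # back to the first-line separator
      ')
echo "$out"
```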
printf
– 1st argument is a string … the ‘format’
– Prints each character of the format
Upon reaching a %, the next few characters are a format specifier
The next argument is printed according to the specifier
– Does not append a newline
– More control over appearance of output
– Consider awk 'BEGIN { printf "%5.2f\n", 2/3; }'
Prints " 0.67" (one leading space pads the value to the 5-character field)
%5.2f means print a fractional number (the ‘f’) in a field 5 characters wide, with 2 digits to the right of the decimal point.
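Wrapping the specifier in brackets makes the padding visible. The brackets are only an aid for this sketch and are not part of the format:

```shell
out=$(awk 'BEGIN { printf "[%5.2f]\n", 2/3 }')   # 5 characters wide, 2 after the point
echo "$out"
```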
printf: for formatting the output of your "print"
We already have the print function, so why printf?
– printf allows us to FORMAT stuff.
– can FORCE printing of string
– Decimals
– whole numbers
– how many digits fall on either side of decimal pt
– scientific notation
– make things line up nicely
printf (format, what to print)
printf ( "%s", x)
– %s is a PLACEHOLDER for some OUTPUT.
– s is a specific type of output (string)
– ONE item (%s) must have ONE thing to print in the "what to print"
– format inside of quotes, followed by comma, followed by variables outside the quotes to print.
printf ( " s = %s ", x )
– "s=" is a LITERAL string
– s = a character string
– f = a floating point number
– d or i = the integer part of a decimal number
– e = scientific notation of a floating point number
– g = the shorter of the e and f notations
– c = an ASCII character
If x=65 and I use this print statement
printf ( " s = %c ", x )
the output is " s = A "
awk 'BEGIN{x=65; printf("char: %c\n", x)}'
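The specifiers can be compared in one call. The values below are arbitrary examples, and the | characters only separate the results:

```shell
out=$(awk 'BEGIN {
  x = 65
  # string, integer part, two decimals, scientific notation, character
  printf "%s|%d|%.2f|%e|%c\n", "abc", 3.99, 3.14159, 1234.5, x
}')
echo "$out"
```

Note that %d drops the fraction of 3.99 rather than rounding it.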
More control:
– %wd
Print an integer out in a field of width w
If the number is smaller than w characters, print leading spaces
Try awk 'BEGIN { printf "%10d\n", 10; }' /dev/null
– Try to add a ‘-’ immediately after the %
Left justifies the value in the field
%ws
– Print a string out in a field of width w
– Supply leading spaces as necessary
Place a ‘-’ immediately after the % to get left justification
%w.df
– Prints the value out in a field of width w
– Prints d digits to the right of the decimal point
– Place a ‘-’ immediately after the % to get left justification
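The width and justification rules for %wd, %ws, and %w.df can be checked together. The brackets are added here only to show where each field starts and ends:

```shell
out=$(awk 'BEGIN {
  printf "[%5d]\n",   42        # right-justified in 5 columns
  printf "[%-5d]\n",  42        # left-justified
  printf "[%6s]\n",   "awk"     # leading spaces fill the field
  printf "[%-6s]\n",  "awk"     # trailing spaces fill the field
  printf "[%7.2f]\n", 3.14159   # 7 wide, 2 digits after the point
}')
echo "$out"
```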
Apple 10 20 25
<---10----><-5-><-5-><-5->
awk '{printf (" %10s %5d %5d %d ", $1, $2, $3, $4 )}' file
awk '{printf (" %-10s %5d %5d %d ", $1, $2, $3, $4 )}' file
minus sign designates that this field will be LEFT JUSTIFIED
awk '{printf (" %-10s %-5d %-5d %d ", $1, $2, $3, $4 )}' file
awk '{printf ("|%-15s|\n", $1)}'
Let’s put an average in there...
printf (" %-10s %-5d %-5d %-5d %f ", $1, $2, $3, $4, average )
Will print the RAW number (by default, %f shows 6 digits to the RIGHT of the decimal point)
printf (" %-10s %-5d %-5d %-5d %.2f ", $1, $2, $3, $4, average )
%.2f says use TWO digits to the RIGHT of the decimal point
printf doesn't provide the newline automatically....
printf (" %-10s %-5d %-5d %-5d %.2f \n ", $1, $2, $3, $4, average )
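Putting the pieces together, a minimal sketch that computes the average the slide leaves undefined. It assumes average means the mean of the three numeric fields, which the slide does not state:

```shell
row=$(echo 'Apple 10 20 25' |
  awk '{ average = ($2 + $3 + $4) / 3          # assumed definition of the average
         printf "%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average }')
echo "$row"
```

The name is padded to 10 columns and each count to 5, so rows from a longer file would line up underneath each other.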
OFMT: a special awk variable
Controls how numbers are printed by the print function
awk 'BEGIN{print 1.243434534;}'
awk 'BEGIN{OFMT="%.2f"; print 1.23344455;}'
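The effect of OFMT shows up when the two commands are run next to each other. The third call is an added illustration that integer values are printed as integers regardless of OFMT:

```shell
default=$(awk 'BEGIN { print 1.243434534 }')                 # default OFMT is "%.6g"
short=$(awk 'BEGIN { OFMT = "%.2f"; print 1.23344455 }')     # two digits after the point
whole=$(awk 'BEGIN { OFMT = "%.2f"; print 17 }')             # integer values ignore OFMT
echo "$default $short $whole"
```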