AWK

Kevin Taylor
CSC 415: Programming Languages
Dr. Lyle
October 21, 2014
History of AWK
Starting out
The original version of awk was developed in 1977 at AT&T Bell Laboratories. The
name “awk” comes from the surnames of its three designers: Alfred V. Aho, Peter J. Weinberger,
and Brian W. Kernighan. Like most programming languages, the first version had various
shortcomings that the three designers and others worked to fix. The new awk, or nawk, was
released eight years later. This new version kept what awk already offered and added
user-defined functions, multiple input streams, and computed regular expressions. Nawk became
widely available with UNIX System V Release 3.1, and Release 4 brought more new features
and cleanups, although some faults remained.
GNU awk, or gawk, was created before those System V releases. Gawk had four
designers, none of whom were the original three. Gawk has possibly become the most popular
of the three variants, and in the late 80's it was reworked to be compatible with nawk.
During the 90's and after, gawk saw constant development for bug fixes, performance
updates, and occasionally new features, including network access. Currently, gawk version 4.0
is the most used variant, but awk itself is still used by many people.
Data Types
First and foremost, awk is a data-driven, text-processing scripting language. The value of an
awk expression is always either a string or a number. As a result, awk programs have no type
keywords such as int, double, string, float, or long (although int is not a reserved word, there
is an int(x) function that truncates x toward 0). Certain contexts require numeric values, and
awk converts strings to numbers by interpreting the text of the string as a numeral. If the
string does not look like a numeral, it converts to 0. Other contexts require string values, and
numbers are converted to strings as if by “sprintf”. It is also possible to force a
string-to-number conversion or vice versa. String-to-number conversion is done by adding 0 to
the value, while number-to-string conversion is done by concatenating the value with the null
string. A variable that has not been initialized holds the null string, or 0 if a number is
required.
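The conversion rules above can be tried out with a short sketch. This is only an illustration, not part of the original sources: the wrapper name conversions_demo and the variable names s, n, str, and never_set are made up for the example, and the awk program is run from a shell function so it needs no input file.

```shell
# conversions_demo is a hypothetical wrapper name used only for this sketch.
conversions_demo() {
    awk 'BEGIN {
        s = "3.5"                  # a string that looks like a numeral
        n = 42                     # a number

        print s + 0                # adding 0 forces string -> number
        print "abc" + 0            # not a numeral, so it converts to 0
        str = n ""                 # concatenating "" forces number -> string
        print length(str)          # length of the string "42"
        print never_set + 0        # uninitialized variable used as a number
        print "[" never_set "]"    # uninitialized variable used as a string
    }'
}

conversions_demo
```

Run as written, this should print 3.5, 0, 2, 0, and [] on separate lines, matching the rules described above.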
Control Statements
Control statements direct the flow of execution of a program. Awk has eight main control
statements: if, while, do, for, break, continue, next, and exit. The first six behave as they do
in C, while next and exit are tied to the way awk processes input: next skips to the next input
record, and exit stops processing input.
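A small sketch can show several of these statements working together. It is an assumed example rather than one taken from the sources: the wrapper name control_demo, the sample input, and the variables kind and stars are all made up for illustration. It uses if/else, for, next, and exit.

```shell
# control_demo is a hypothetical wrapper; the input lines are sample data.
control_demo() {
    printf '3\n-1\n4\nstop\n5\n' | awk '
    $1 == "stop" { exit }                  # exit: stop reading input, then run END
    $1 < 0       { next }                  # next: skip straight to the next record
    {
        if ($1 % 2 == 0) {                 # if/else picks a label
            kind = "even"
        } else {
            kind = "odd"
        }

        stars = ""
        for (i = 0; i < $1; i++) {         # for loop builds a row of stars
            stars = stars "*"
        }

        print $1, kind, stars
    }
    END { print "done" }'
}

control_demo
```

With this input the sketch should print "3 odd ***", "4 even ****", and "done". The line -1 is skipped by next, and the line 5 is never reached because exit stops input at "stop".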
Functions
Awk has two main types of functions: built-in and user-defined. Built-in functions are
always available to call in your program. They fall into three groups: numeric functions (such
as sin, cos, atan2, and rand), string functions (such as index, length, and match), and I/O
functions (such as close and system). User-defined functions work just like built-in functions,
but you must define what they do. To show the syntax of a user-defined function, here is an
example that prints a number in a chosen format:
function mynumber(num)
{
    printf "%6.3g\n", num
}
One thing to watch out for when calling a user-defined function is that whitespace
characters (spaces and tabs) are not allowed between the function name and the
open-parenthesis of the argument list. If you put whitespace there by mistake, awk might
think that you mean to concatenate a variable with an expression in parentheses. It should
also be noted that function names, whether built-in or user-defined, are global, and a
user-defined function can be called in code that appears before its definition.
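As a rough sketch of the last point, the program below calls a function before it is defined. The names function_demo and fmt3 are hypothetical. Unlike the formatting function shown earlier, fmt3 returns the formatted text with sprintf instead of printing it, which is one possible design and not the only one.

```shell
# function_demo is a hypothetical wrapper name used only for this sketch.
function_demo() {
    awk 'BEGIN {
        # The calls appear before the definition; function names are global.
        print fmt3(3.14159)
        print fmt3(1234567)
    }

    # fmt3 is a hypothetical name; it returns the formatted text
    # so that the caller decides what to do with it.
    function fmt3(num) {
        return sprintf("%6.3g", num)
    }'
}

function_demo
```

The first call should print 3.14 padded with two leading spaces to a width of six, and the second should print 1.23e+06. Note that there is no space between fmt3 and its open-parenthesis in either call.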
Expressions
Expressions are the basic building blocks of awk programs. An expression evaluates to a
value, which you can print, test, store in a variable, or pass to a function. As in other
programming languages, expressions in awk include variables, array references, constants,
and function calls, as well as combinations of these with various operators. A variable name
in awk is a sequence of letters, digits, and underscores that may not begin with a digit.
Unlike languages such as Ada, case is significant, so “myname” and “myName” are completely
different variables. Awk also provides many useful built-in variables. Some of the more
common ones include FS (Input Field Separator), OFS (Output Field Separator), NF (Number of
Fields), and NR (Number of Records). A quick example of a built-in variable can be seen here:
#!/bin/awk -f
{
    if ( $0 ~ /:/ ) {
        FS=":";
    } else {
        FS=" ";
    }
    # re-split the current line so the new FS applies to it
    $0 = $0
    # print the third field, whatever format
    print $3
}
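The other built-in variables mentioned above can be seen together in a short sketch. The wrapper name builtin_vars_demo and the colon-separated sample records are assumptions made for the example.

```shell
# builtin_vars_demo is a hypothetical wrapper; the input is sample data.
builtin_vars_demo() {
    printf 'alice:30:nyc\nbob:25\n' | awk '
    BEGIN { FS = ":"; OFS = " | " }    # set both separators before input is read
    { print NR, NF, $1, $NF }          # record number, field count, first and last field
    END { print "records: " NR }'      # NR now holds the total number of records
}

builtin_vars_demo
```

This should print "1 | 3 | alice | nyc", then "2 | 2 | bob | 25", then "records: 2". The commas in the print statement are what cause OFS to be placed between the values.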
Another topic to cover is assignment expressions. Assigning values in awk is as simple as in
any other language. Examples of storing a number and a string are shown below:
x = 1
thing = "rabbit"
Awk also allows incrementing and decrementing the value of a variable, as in other C-based
languages (ex. --value, value--, ++value, or value++).
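The difference between the prefix and postfix forms is easiest to see by running them. The following is a minimal sketch with made-up names (assign_demo, a, b, c); it reuses the x and thing assignments from the text.

```shell
# assign_demo is a hypothetical wrapper name used only for this sketch.
assign_demo() {
    awk 'BEGIN {
        x = 1
        thing = "rabbit"
        a = x++      # post-increment: a gets the old value 1, x becomes 2
        b = ++x      # pre-increment: x becomes 3 first, so b is 3
        c = x--      # post-decrement: c gets 3, x drops back to 2
        print a, b, c, x, thing
    }'
}

assign_demo
```

This should print "1 3 3 2 rabbit".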
Object Orientation in awk
Although awk itself does not support object orientation, it should be noted that a version of
awk called awk++ was developed to provide standard OO tools. Awk++ adds classes, class
properties, methods, and inheritance, including multiple inheritance. Object orientation in
awk++ is very similar to C++ while retaining awk syntax. Some code templates that show how to
define a class with methods, create an object, and declare a class with inheritance can be
seen here:
class class_name {
    attribute variable_name1
    method method_name(parameters) {
        ...any awk code....
    }
    ..other method definitions...
}
object_variable = class_name.new[(optional parameters)]
class class_name : inherited_class_name [ : inherited_class_name...] {.....}
Other
One topic that has not been covered yet but is important is exception handling. Awk has no
dedicated exception mechanism, so the main form of error handling is the exit statement. The
exit statement causes awk to immediately stop executing the current rule and to stop
processing input; any remaining input is ignored. A quick example of this can be seen here:
BEGIN {
    if (("date" | getline date_now) <= 0) {
        print "Can't get system date" > "/dev/stderr"
        exit 1
    }
    print "current date is", date_now
    close("date")
}
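The same idea can be applied while reading input. The sketch below is an assumed example, not one from the sources: the wrapper name exit_demo, the variables failed and total, and the sample input are made up. It stops at the first line that is not a whole number and reports the failure through the exit status. One detail worth noting is that an exit inside a main rule still runs the END rule.

```shell
# exit_demo is a hypothetical wrapper; the input lines are sample data.
exit_demo() {
    printf '10\n20\noops\n30\n' | awk '
    $1 !~ /^[0-9]+$/ {
        print "bad input on line " NR ": " $1 > "/dev/stderr"
        failed = 1
        exit                       # stop reading input; the END rule still runs
    }
    { total += $1 }
    END {
        if (failed)
            exit 1                 # report the failure through the exit status
        print "total:", total
    }'
}

exit_demo || echo "exit status: $?"
```

With this input the message "bad input on line 3: oops" goes to standard error, no total is printed, and the shell reports an exit status of 1. The line 30 is never read.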
Evaluation
For this last section, awk will be evaluated in four separate categories. These categories
are readability, writability, reliability, and cost.
Readability
For someone with a few years of background in programming and scripting, awk
is somewhat difficult to read at first. To start, its syntax looks slightly different from that
of its successor Perl, which is fairly easy to grasp. Perl can be learned easily on its own,
but if you have learned awk first, you will pick up Perl very quickly. Another hindrance to
readability is the behavior of some built-in variables. The Input Field Separator is one
example. Suppose the input was the following line:
One Two:Three:4 Five
and you executed the following script:
#!/bin/awk -f
{
    print $2
    FS=":"
    print $2
}
At first glance you would expect the second print to output “Three”, but that isn't the case.
The script actually prints “Two:Three:4” twice. If you change the field separator before a
line is read, the change affects how that line is split. If you change it after the line has
been read, the fields that were already split are not redefined. Awk has a number of surprises
like this, which make scripts harder to read when first learning the language.
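The behavior described above can be checked from the shell. In this sketch the function names fs_late and fs_early and the shell variable line are made up for illustration. The first function repeats the script from the text, and the second sets FS in a BEGIN rule so the change is in place before the line is read.

```shell
line='One Two:Three:4 Five'

# FS changed inside the rule: the current line was already split on whitespace.
fs_late()  { printf '%s\n' "$line" | awk '{ print $2; FS = ":"; print $2 }'; }

# FS set in BEGIN: the change is in place before the first line is read.
fs_early() { printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $2 }'; }

fs_late
fs_early
```

The first function should print "Two:Three:4" twice, and the second should print "Three".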
Writability
The writability of awk is closely tied to its readability. As soon as one is able to
read awk scripts, one should be able to write them with ease. Writing awk is also
similar to writing sed, since the two have very similar syntax. For a language that
is similar to awk and at least as powerful, Perl would be the best option, since it is more or
less a combination of awk and sed. Overall, awk is easier to write than to read,
although that seems strange. When you are writing a script, it will more than likely be
easier to catch your own mistake than to find someone else's mistake in
a script you have just seen.
Reliability
As far as the reliability of awk goes, it works. Since its initial development it has been
updated many times and has spawned other versions (nawk, gawk, awk++), so naturally the newer
versions will be more reliable. Most places that used awk have since moved on to different
languages (mostly Bash or Perl). However, even if businesses do not tend to use awk, it is still
fairly popular among UNIX programmers for the simple programs they develop.
Cost
The cost of awk comes down to how much you spend on the machine you use and how much it
would cost to train someone to be proficient in the language (although this is largely
inapplicable today). The language itself is free and can be installed on nearly any machine,
though it is most common on Linux/UNIX machines. That cost aside, a company that still uses
awk today is probably using gawk or awk++, which would also require several weeks to learn
the basics (usually in a class-like setting), followed by some learning on your own. Overall,
the cost cannot be put into a definite number.
Overall
From a still-novice programmer's perspective, awk seems to be a good scripting
language overall. Although this type of language, and awk specifically, is not used very often
today, it is still powerful enough to get the job done and then some. For those who felt there
would be no use for awk, later versions have given it network access and object orientation.
This is a language that still has some life left in it in today's world, even though it can be
easily replaced.
References
Barnett, Bruce. "Awk." A Tutorial and Introduction. General Electric Company, 22 Sept. 2001.
Web. 20 Oct. 2014. <http://www.grymoire.com/Unix/Awk.html>.
"The GNU Awk User’s Guide." The GNU Awk User’s Guide. Free Software Foundation,
Inc., n.d. Web. 20 Oct. 2014. <http://www.gnu.org/software/gawk/manual/gawk.html>.
Robbins, Arnold D. "Getting Started with Awk." AWK Language Programming. N.p., 22 July
1996. Web. 19 Oct. 2014. <http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_3.html>.
Robbins, Arnold D. "Preface." AWK Language Programming. N.p., 22 July 1996. Web. 19 Oct.
2014. <http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_1.html>.
Bezroukov, Nikolai. "AWK Programming." AWK Programming. Softpanorama, n.d. Web. 15
Oct. 2014. <http://www.softpanorama.org/Tools/awk.shtml>.
Natarajan, Ramesh. "AWK Vs NAWK Vs GAWK." The Geek Stuff. The Geek Stuff, 29 June
2011. Web. 16 Oct. 2014. <http://www.thegeekstuff.com/2011/06/awk-nawk-gawk/>.
"Scope/Function Names and Labels." Rosetta Code. Rosettacode.org, n.d. Web. 16 Oct.
2014. <http://rosettacode.org/wiki/Scope/Function_names_and_labels>.
Schreiner, Axel T. "OO Tools in AWK." Awk.info » Oo. Awk.Info, Mar. 2009. Web. 15 Oct. 2014.
<http://awk.info/?Oo#4>.
Close, Diane B., and Arnold D. Robbins. "The AWK Manual." Table of Contents. N.p., 17 July
1995. Web. 16 Oct. 2014.
<http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_toc.html#TOC115>.