AWK
Kevin Taylor
CSC 415: Programming Languages
Dr. Lyle
October 21, 2014

History of AWK

Starting out

The original version of awk was developed in 1977 at AT&T Bell Laboratories. The name "awk" comes from the initials of its three designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. Like most programming languages, the first version had various issues that the three designers and others worked to fix. The new awk, or nawk, was released eight years later. It combined what awk had to offer with three new features: user-defined functions, multiple input streams, and computed regular expressions. Nawk became readily available with UNIX System V Releases 3.1 and 4. Release 4 brought further new features and refinements, although some faults remained.

GNU awk, or gawk, was created before the System V releases. Gawk had four designers, none of whom were the original three. Gawk has become possibly the most popular of the three variants, and in the late 1980s it was reworked to be compatible with nawk. Through the 1990s, awk and gawk were under constant development for bug fixes, performance updates, and occasionally new features, and gawk eventually gained network access. Currently, gawk version 4.0 is the most used variant, but many people still use awk itself.

Data Types

First and foremost, awk is a data-driven, text-processing scripting language. Every awk value is either a string or a number. Because of this, awk programs never declare types, so there are no type keywords such as int, double, string, float, or long. There is an int(x), but it is a built-in function that truncates a number toward 0, not a type.

Certain contexts require numeric values. In those contexts a string is converted to a number by interpreting its text as a numeral. Only the leading numeric part is used, so "25fix" has the value 25, and a string that does not start with a numeral converts to 0. Other contexts require string values. In those contexts a number is converted to a string as if by sprintf, using the format stored in the built-in variable CONVFMT. Integer values are converted as integers.
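The two conversion rules above can be seen in a short sketch. It is written as a POSIX shell function that runs a single awk BEGIN rule, so no input is needed. The function name demo_conversions and the sample values are hypothetical, chosen only for illustration. The sketch assumes a POSIX-conforming awk such as gawk or mawk, where CONVFMT defaults to "%.6g".

```shell
# demo_conversions is a hypothetical helper name used only for this sketch.
demo_conversions() {
  awk 'BEGIN {
      # Numeric contexts: strings are read as numerals.
      s = "3.5apples"
      print s + 1          # leading numeral 3.5 is used, so this prints 4.5
      print "abc" * 2      # no leading numeral, so "abc" is 0 and this prints 0

      # String contexts: numbers are formatted as if by sprintf(CONVFMT, n).
      n = 0.1 + 0.2
      print (n "")         # CONVFMT defaults to "%.6g", so this prints 0.3
      x = 17
      print (x "")         # integer values are converted as integers: 17

      CONVFMT = "%.2f"     # changing CONVFMT changes later conversions
      y = 3.14159
      print (y "")         # prints 3.14
  }'
}

demo_conversions
```

Concatenating with the null string ("") is used here only to force a string context, so CONVFMT is visibly applied.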
You can also force a conversion explicitly. To convert a string to a number, add zero to it (str + 0). To convert a number to a string, concatenate it with the null string (num ""). An uninitialized variable holds the null string, which acts as 0 when a number is required.

Control Statements

Control statements direct the flow of execution through a program. Awk has eight main control statements: if, while, do, for, break, continue, next, and exit. A viable scripting language needs all of these, and awk provides them all.

Functions

Awk has two main kinds of functions: built-in and user-defined. Built-in functions can be called from any program. They fall into three groups:

- numeric functions, such as sin, cos, atan2, and rand
- string functions, such as index, length, and match
- I/O functions: close and system

User-defined functions work like built-in functions, except that you define what they do. The following example shows the syntax. It prints a number in a chosen format:

    function mynumber(num) {
        printf "%6.3g\n", num
    }

One thing to watch out for when calling a user-defined function is whitespace. Spaces and tabs are not allowed between the function name and the opening parenthesis of the argument list. If whitespace is put there by mistake, awk may think you mean to concatenate a variable with an expression in parentheses. Function names, whether built-in or user-defined, are global, so a function can be called in code that appears before its definition.

Expressions

Expressions are the basic building blocks of awk programs. An expression evaluates to a value, which you can print, test, store in a variable, or pass to a function.
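To make the four uses of an expression concrete, the following sketch computes one arithmetic expression and then stores it, prints it, tests it, and passes it to a function. It reuses the mynumber function from the Functions discussion. The function is deliberately defined after its first call, which works because function names are global. The wrapper name demo_expression and the values are hypothetical, chosen only for illustration.

```shell
# demo_expression is a hypothetical helper name used only for this sketch.
demo_expression() {
  awk 'BEGIN {
      total = 3 * 4 + 1    # store: the value of 3 * 4 + 1 is assigned to total
      print total          # print: outputs 13
      if (total > 10)      # test: a comparison is itself an expression
          print "big"
      mynumber(total / 3)  # pass: 13 / 3 is evaluated first, then handed over
  }

  # Defined after its first use; this works because function names are global.
  # Note: no space between a function name and "(" when calling it.
  function mynumber(num) {
      printf "%6.3g\n", num
  }'
}

demo_expression
```

Because of the "%6.3g" format, the last line is 4.33 right-aligned in a six-character field.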
As in other programming languages, expressions in awk include variables, array references, constants, and function calls, as well as combinations of these with various operators. A variable name is a sequence of letters, digits, and underscores, and it may not begin with a digit. Unlike in languages such as Ada, case is significant, so "myname" and "myName" are completely different variables.

Awk also provides many useful built-in variables. Some of the more common ones are:

- FS: input field separator
- OFS: output field separator
- NF: number of fields in the current record
- NR: number of records read so far

The following script uses FS to choose a separator based on the contents of each line. A change to FS normally takes effect only on the next record. The script therefore assigns $0 to itself, which re-splits the current record with the new separator:

    #!/bin/awk -f
    {
        if ( $0 ~ /:/ ) {
            FS = ":"
        } else {
            FS = " "
        }
        $0 = $0    # re-split the current record using the new FS
        # print the third field, whatever format
        print $3
    }

Assignment expressions are as simple in awk as in most other languages. Storing a number or a string looks like this:

    x = 1
    thing = "rabbit"

Awk also supports the C-style increment and decrement operators: ++value, value++, --value, and value--.

Object Orientation in awk

Although awk itself does not support object orientation, a version called awk++ was developed to provide standard object-oriented tools. It adds classes, class properties, methods, and single and multiple inheritance. Object orientation in awk++ is similar to C++ while retaining awk syntax. The following fragments show the awk++ syntax for defining a class with attributes and methods, creating an object, and declaring inheritance:

    class class_name {
        attribute variable_name1
        method method_name(parameters) {
            ...any awk code...
        }
        ...other method definitions...
    }

    object_variable = class_name.new[(optional parameters)]

    class class_name : inherited_class_name [ : inherited_class_name...] {.....}

Other

One important topic not yet covered is exception handling.
The main form of exception handling in awk is the exit statement. When awk executes exit, it immediately stops executing the current rule and stops processing input. Any remaining input is ignored, although an END rule, if present, still runs. For example:

    BEGIN {
        if (("date" | getline date_now) <= 0) {
            print "Can't get system date" > "/dev/stderr"
            exit 1
        }
        print "current date is", date_now
        close("date")
    }

Evaluation

This last section evaluates awk in four categories: readability, writability, reliability, and cost.

Readability

For someone with a few years of programming and scripting experience, awk is somewhat difficult to read at first. Its syntax differs slightly from that of its successor, Perl, which is fairly easy to grasp. Perl is easy to learn on its own, and someone who already knows awk will pick up Perl very quickly.

Built-in variables are another hindrance to readability. Consider the input field separator. Suppose the input is the line

    One Two:Three:4 Five

and you execute the following script:

    #!/bin/awk -f
    {
        print $2
        FS = ":"
        print $2
    }

At first glance you might expect the second print to output "Three", but that is not the case: the script prints "Two:Three:4" twice. This happens because a change to the field separator only affects records read after the change. The current record has already been split into fields using the old separator, and its fields are not redefined. Awk is full of surprises like this, which make it hard to read when first learning the language.

Writability

The writability of awk is closely tied to its readability. Once someone can read awk scripts, they should be able to write them with ease. Writing awk is also similar to writing Sed, since the two have very similar syntax.
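Once the rule behind the field-separator surprise from the Readability discussion is known, it is easy to write the script so it does what was intended. The sketch below shows two common fixes, run from a POSIX shell on the same input line:

- After changing FS, assign $0 to itself to re-split the current record.
- Use awk's -F option to set the separator before any input is read.

The wrapper name fs_demo is hypothetical.

```shell
# fs_demo is a hypothetical helper name used only for this sketch.
fs_demo() {
  echo 'One Two:Three:4 Five' | awk '{
      print $2     # split with the default FS (blanks): prints Two:Three:4
      FS = ":"     # by itself, this only affects records read later
      $0 = $0      # assigning to $0 re-splits the current record with the new FS
      print $2     # now prints Three
  }'
}

fs_demo

# Alternative: set the separator before any input is read.
echo 'One Two:Three:4 Five' | awk -F ':' '{ print $2 }'    # prints Three
```

The -F form is usually the simpler choice when every line uses the same separator. Reassigning $0 is only needed when the separator must change partway through processing.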
Anyone looking for a language similar to awk and about as powerful would do best with Perl, since it is more or less a combination of awk and Sed. Overall, awk is easier to write than to read, strange as that seems. While writing a script, you are more likely to catch your own mistakes than to find someone else's mistake in a script you have just seen.

Reliability

As far as reliability goes, awk works. Since its creation it has been updated many times and has spawned other versions (nawk, gawk, awk++), and the newer versions are naturally more reliable. Most places that used awk have since moved on to other languages, mostly Bash or Perl. However, even if businesses do not tend to use awk, it is still fairly popular among UNIX programmers for the simple programs they write.

Cost

The cost of awk comes down to two things: the machine you use, and training someone to be proficient in the language (although the latter hardly applies today). The language itself is free and can be installed on any machine, though it is most common on Linux/UNIX machines. Beyond that, a company still using awk today is probably using gawk or awk++. Learning the basics would take several weeks, usually in a class-like setting, followed by some learning on one's own. Overall, the cost cannot be put into a definite number.

Overall

From a still-novice programmer's perspective, awk seems to be a good scripting language overall. Although this kind of language, and awk specifically, is not used very often today, it is still powerful enough to get the job done and then some. For those who felt there would be no use for awk, its later versions have added network access (gawk) and object orientation (awk++).
Overall, this is a language which still has some life left in it in today's world, even though it can be easily replaced.