Perl

Major parts of this lecture adapted from http://www.scs.leeds.ac.uk/Perl/start.html
26-Jul-16

Why Perl?
Perl is built around regular expressions
  REs are good for string processing
  Therefore Perl is a good scripting language
  Perl is especially popular for CGI scripts
Perl makes full use of the power of UNIX
Perl programs can be very short
"Perl is designed to make the easy jobs easy, without making the difficult jobs impossible."
  -- Larry Wall, Programming Perl

Why not Perl?
Perl is very UNIX-oriented
Perl does not scale well to large programs
Perl is available on other platforms...
  ...but isn't always fully implemented there
  However, Perl is often the best way to get some UNIX capabilities on less capable platforms
Weak subroutines, heavy use of global variables
Perl's syntax is not particularly appealing

What is a scripting language?
Operating systems can do many things
  copy, move, create, delete, compare files
  execute programs, including compilers
  schedule activities, monitor processes, etc.
A command-line interface gives you access to these functions, but only one at a time
A scripting language is a "wrapper" language that integrates OS functions

Major scripting languages
UNIX has sh, Perl
Macintosh has AppleScript, Frontier
Windows has no major scripting languages
  probably due to the weaknesses of DOS
Generic scripting languages include:
  Perl (most popular)
  Tcl (easiest for beginners)
  Python (new, Java-like, best for large programs)

Perl versions
To find out which version of Perl you are using, type perl -v at the command line
Alternatively, put the following in a file version.pl ($] holds the version number of the running Perl):

    #!/usr/bin/perl
    print "\n\nHello from Perl $] !\n\n";

Run this program with perl version.pl
  This may or may not work, depending on where and how Perl is located on your system
Please use Perl 5 in your assignments
  I'm using Perl 5.8.8

Perl Example 1

    #!/usr/local/bin/perl
    #
    # Program to do the obvious
    #
    print 'Hello world.';    # Print a message

Save this in a file named hello.pl

Comments on "Hello, World"
Perl statements end with semicolons
Perl is case-sensitive
Perl is compiled and run in a single operation
Comments run from # to the end of the line
  But the first line, #!/usr/local/bin/perl, tells where to find the Perl compiler on your system
  It's usually there or at /usr/bin/perl
Perl files should have the .pl extension
How to run the hello.pl Perl program:
  perl hello.pl may work, and does not require the first #! line
  ./hello.pl (or .\hello.pl) may work, and requires the first #! line (on UNIX the file must also be executable: chmod +x hello.pl)

Perl in EasyEclipse
Open the Perl perspective
  Window -> Open Perspective -> Other... -> Perl
Create a Perl Project
  File -> New -> Project... -> Perl -> Perl Project
  File -> New -> Perl Project (if in the Perl perspective)
Create a file, and give it the .pl extension
  File -> New -> Perl File
Write your Perl code
Create a run configuration
  Run -> Run... -> Perl Local -> New_configuration
  In the Main tab:
    Name: enter a meaningful name (in place of "New_configuration")
    Project: Browse, choose project
    File to execute: choose file
  Apply and Run

Perl Example 2

    #!/usr/bin/perl
    # Remove blank lines from a file
    # Usage: singlespace < oldfile > newfile
    while ($line = <STDIN>) {
        if ($line eq "\n") { next; }    # skip empty lines
        print "$line";
    }

More Perl notes
On the UNIX command line:
  < filename means to get input from this file
  > filename means to send output to this file
In Perl, STDIN is the input filehandle (read a line with <STDIN>) and STDOUT is the output filehandle
Scalar variables start with $
  Scalar variables hold strings or numbers, and they are interchangeable
  Examples:
    $priority = 9;
    $priority = '9';
Array variables start with @

Perl Example 3

    #!/usr/local/bin/perl
    # Usage: fixm <filenames>
    # Replace \r with \n -- replaces input files
    foreach $file (@ARGV) {
        print "Processing $file\n";
        if (-e "fixm_temp") {
            die "*** File fixm_temp already exists!\n";
        }
        if (! -e $file) {
            die "*** No such file: $file!\n";
        }
        open DOIT, "| tr \'\\015' \'\\012' < $file > fixm_temp"
            or die "*** Can't: tr '\\015' '\\012' < $file > fixm_temp\n";
        close DOIT;
        open DOIT, "| mv -f fixm_temp $file"
            or die "*** Can't: mv -f fixm_temp $file\n";
        close DOIT;
    }

Comments on example 3
In # Usage: fixm <filenames>, the angle brackets just mean to supply a list of file names here
In UNIX text editors, the \r (carriage return) character usually shows up as ^M (hence the name fixm)
The UNIX command tr '\015' '\012' replaces all \015 characters (\r) with \012 (\n) characters
The format of the open and close commands is:
  open fileHandle, fileName
  close fileHandle
A leading | in the file name tells open to run the rest of the string as a shell command
"| tr \'\\015' \'\\012' < $file > fixm_temp" says: take input from $file, pipe it to the tr command, and put the output on fixm_temp

Arithmetic in Perl

    $a = 1 + 2;     # Add 1 and 2 and store in $a
    $a = 3 - 4;     # Subtract 4 from 3 and store in $a
    $a = 5 * 6;     # Multiply 5 and 6
    $a = 7 / 8;     # Divide 7 by 8 to give 0.875
    $a = 9 ** 10;   # Nine to the power of 10, that is, 9^10 = 3486784401
    $a = 5 % 2;     # Remainder of 5 divided by 2
    ++$a;           # Increment $a and then return it
    $a++;           # Return $a and then increment it
    --$a;           # Decrement $a and then return it
    $a--;           # Return $a and then decrement it

String and assignment operators

    $a = $b . $c;   # Concatenate $b and $c
    $a = $b x $c;   # $b repeated $c times
    $a = $b;        # Assign $b to $a
    $a += $b;       # Add $b to $a
    $a -= $b;       # Subtract $b from $a
    $a .= $b;       # Append $b onto $a

Single and double quotes

    $a = 'apples';
    $b = 'bananas';
    print $a . ' and ' . $b;    # prints: apples and bananas
    print '$a and $b';          # prints: $a and $b
    print "$a and $b";          # prints: apples and bananas

Variables are interpolated inside double quotes, but not inside single quotes

Arrays

    @food = ("apples", "bananas", "cherries");

But an element is a scalar, so it is written with $, not @ (and indexing starts at 0):

    print $food[1];                 # prints "bananas"
    @morefood = ("meat", @food);
    # @morefood is now ("meat", "apples", "bananas", "cherries")
    ($a, $b, $c) = (5, 10, 20);

push and pop
push adds one or more things to the end of a list

    push(@food, "eggs", "bread");

push returns the new length of the list
pop removes and returns the last element

    $sandwich = pop(@food);
    $len = @food;    # $len gets length of @food
    $#food           # returns index of last element

foreach

    # Visit each item in turn and call it $morsel
    foreach $morsel (@food) {
        print "$morsel\n";
        print "Yum yum\n";
    }

Tests
"Zero" is false. This includes: 0, '0', "0", '', "" (undefined values are also false)
Anything not false is true
Use == and != for numbers, eq and ne for strings
&&, ||, and ! are and, or, and not, respectively

for loops
for loops are just as in C or Java

    for ($i = 0; $i < 10; ++$i) {
        print "$i\n";
    }

while loops

    #!/usr/local/bin/perl
    print "Password? ";
    $a = <STDIN>;
    chomp $a;    # Remove the trailing newline (chop would remove any last character)
    while ($a ne "fred") {
        print "sorry. Again? ";
        $a = <STDIN>;
        chomp $a;
    }

do..while and do..until loops

    #!/usr/local/bin/perl
    do {
        print "Password? ";
        $a = <STDIN>;
        chomp $a;
    } while ($a ne "fred");

The same loop can end with } until ($a eq "fred"); which inverts the test

if statements

    if ($a) {
        print "The string is not empty\n";
    } else {
        print "The string is empty\n";
    }

Note: the string "0" is also false, so this code would report it as empty

if - elsif statements

    if (!$a) {
        print "The string is empty\n";
    } elsif (length($a) == 1) {
        print "The string has one character\n";
    } elsif (length($a) == 2) {
        print "The string has two characters\n";
    } else {
        print "The string has many characters\n";
    }

Why Perl?
Two factors make Perl important:
  Pattern matching/string manipulation
    Based on regular expressions (REs)
    REs are similar in power to those in Formal Languages...
    ...but have many convenience features
  Ability to execute UNIX commands
    The Perl interpreter emulates these commands on non-UNIX platforms
    Often Perl is used simply for its UNIX emulation

Basic pattern matching
$sentence =~ /the/
  True if $sentence contains "the"

    $sentence = "The dog bites.";
    if ($sentence =~ /the/)    # is false

  ...because Perl is case-sensitive
!~ is "does not contain"

RE special characters

    .    # Any single character except a newline
    ^    # The beginning of the line or string
    $    # The end of the line or string
    *    # Zero or more of the last character
    +    # One or more of the last character
    ?    # Zero or one of the last character

RE examples

    ^.*$       # matches the entire string
    hi.*bye    # matches from "hi" to "bye" inclusive (to the last "bye", since * is greedy)
    x +y       # matches x, one or more blanks, and y
    ^Dear      # matches "Dear" only at beginning
    bags?      # matches "bag" or "bags"
    hiss+      # matches "hiss", "hisss", "hissss", etc.
Square brackets

    [qjk]       # Either q or j or k
    [^qjk]      # Neither q nor j nor k
    [a-z]       # Anything from a to z inclusive
    [^a-z]      # Any character except a lower case letter
    [a-zA-Z]    # Any letter
    [a-z]+      # Any non-zero sequence of lower case letters

More examples

    [aeiou]+        # matches one or more vowels
    [^aeiou]+       # matches one or more nonvowels
    [0-9]+          # matches an unsigned integer
    [0-9A-F]        # matches a single (uppercase) hex digit
    [a-zA-Z]        # matches any letter
    [a-zA-Z0-9_]+   # matches identifiers

More special characters

    \n    # A newline
    \t    # A tab
    \w    # Any word character (letter, digit, or _); same as [a-zA-Z0-9_]
    \W    # Any non-word char; same as [^a-zA-Z0-9_]
    \d    # Any digit. The same as [0-9]
    \D    # Any non-digit. The same as [^0-9]
    \s    # Any whitespace character
    \S    # Any non-whitespace character
    \b    # A word boundary, outside [] only
    \B    # No word boundary

Quoting special characters

    \|    # Vertical bar
    \[    # An open square bracket
    \)    # A closing parenthesis
    \*    # An asterisk
    \^    # A caret symbol
    \/    # A slash
    \\    # A backslash

Alternatives and parentheses

    jelly|cream    # Either jelly or cream
    (eg|le)gs      # Either eggs or legs
    (da)+          # Either da or dada or dadada or...

Substitution
=~ is a test, as in:
    $sentence =~ /the/
!~ is the negated test, as in:
    $sentence !~ /the/
=~ is also used for replacement with the s operator, as in:
    $sentence =~ s/london/London/
This is an expression, whose value is the number of substitutions made (0 or 1)

The $_ variable
Often we want to process one string repeatedly
The $_ variable holds the current string
If a subject is omitted, $_ is assumed
Hence, the following are equivalent:
    if ($sentence =~ /under/) ...
    $_ = $sentence; if (/under/) ...
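A minimal sketch, assuming Perl 5, of the Substitution and $_ slides: the s operator returns how many replacements it made, !~ is true when a pattern does not match, and a match or substitution written without =~ works on $_. The strings are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "london is big and london is old";

# Without /g, s/// replaces only the first match; its value is the count (here 1)
my $count = ($sentence =~ s/london/London/);
print "$sentence (substitutions: $count)\n";

# !~ is true when the pattern does NOT match
print "no 'paris'\n" if $sentence !~ /paris/;

# With no =~, matching and substitution both act on $_
$_ = "under london bridge";
my $found = /under/ ? 1 : 0;    # same as $_ =~ /under/
s/london/London/;               # same as $_ =~ s/london/London/
print "$_\n";
```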
Global substitutions

    s/london/London/     # substitutes London for the first occurrence of london in $_
    s/london/London/g    # substitutes London for each occurrence of london in $_

The value of a substitution expression is the number of substitutions actually made

Case-insensitive substitutions

    s/london/London/i

Case-insensitive substitution; will replace london, LONDON, London, LoNDoN, etc.
You can combine global substitution with case-insensitive substitution:

    s/london/London/gi

Remembering patterns
Any part of the pattern enclosed in parentheses is assigned to the special variables $1, $2, $3, and so on
Numbers are assigned according to the left (opening) parentheses

    "The moon is high" =~ /The (.*) is (.*)/

Afterwards, $1 = "moon" and $2 = "high"

Dynamic matching
During the match, an earlier part of the match that has been tentatively assigned to $1, $2, etc. can be referred to as \1, \2, etc.
Example: \b\w+\b matches a single word, so /\b(\w+) \1\b/ matches a repeated word

    "Now is the the time" =~ /\b(\w+) \1\b/

Afterwards, $1 = "the"

tr
tr does character-by-character translation
tr returns the number of characters it replaced (or counted)

    $sentence =~ tr/abc/edf/;          # replaces a with e, b with d, c with f
    $count = ($sentence =~ tr/*/*/);   # counts asterisks
    tr/a-z/A-Z/;                       # converts $_ to all uppercase

split
split breaks a string into parts

    $info = "Caine:Michael:Actor:14, Leafy Drive";
    @personal = split(/:/, $info);
    # @personal is now ("Caine", "Michael", "Actor", "14, Leafy Drive")

Associative arrays
Associative arrays allow lookup by name rather than by index
Associative array names begin with %
Example:

    %fruit = ("apples", "red", "bananas", "yellow", "cherries", "red");

Now, $fruit{"bananas"} returns "yellow"
  Note: braces, not parentheses

Associative Arrays II
Can be converted to normal arrays:

    @food = %fruit;

You cannot index an associative array by position, but you can use the keys and values functions:

    foreach $f (keys %fruit) {
        print("The color of $f is " . $fruit{$f} . "\n");
    }

Associative Arrays III
The function each gets key-value pairs

    while (($f, $c) = each(%fruit)) {
        print "$f is $c\n";
    }

Calling subroutines
Assume you have a subroutine printargs that just prints out its arguments
Subroutine calls:

    &printargs("perly", "king");          # Prints: "perly king"
    &printargs("frog", "and", "toad");    # Prints: "frog and toad"

In Perl 5, the & can be left off when the arguments are in parentheses: printargs("perly", "king");

Defining subroutines
Here's the definition of printargs:

    sub printargs {
        print "@_\n";
    }

Where are the parameters?
  Parameters are put in the array @_
  @_ has nothing to do with $_; they are unrelated

Returning a result
The value of a subroutine is the value of the last expression that was evaluated (an explicit return statement also works)

    sub maximum {
        if ($_[0] > $_[1]) {
            $_[0];
        } else {
            $_[1];
        }
    }
    $biggest = &maximum(37, 24);    # $biggest is 37

Local variables
@_ is local to the subroutine, and...
...so are $_[0], $_[1], $_[2], ...
  However, each $_[n] is an alias for the caller's argument, so assigning to $_[0] changes the caller's variable
local creates local variables
  local gives a global variable a temporary value; in Perl 5, my creates a private (lexical) variable

Example subroutine

    sub inside {
        local($a, $b);                  # Make local variables
        ($a, $b) = ($_[0], $_[1]);      # Assign values
        $a =~ s/ //g;                   # Strip spaces from
        $b =~ s/ //g;                   #   local variables
        ($a =~ /$b/ || $b =~ /$a/);     # Is $b inside $a
                                        #   or $a inside $b?
    }
    &inside("lemon", "dole money");     # true

Perl 5
Perl 5 is usually described as "a whole new language"
However, Perl 5 is mostly backward compatible, and there are only a few apparent differences
  Perl 4 had three types of data: $scalar, @array, and %hash
  Perl 5 adds another item: the reference
    References are indicated by \
  Perl 5 interpolates arrays into double-quoted strings
  Perl 5 provides reluctant quantifiers (with ?)
  Perl 5 has modules, which are similar to classes
The most significant difference is that Perl can now be written in a more object-oriented fashion (as C++ compares to C)
Perl now provides a handful of modules (just as Java has always provided prewritten classes)
Perl 5 has "auto" variables
  Variables may now be declared (with my) within a lexical scope

Perl 6
Whereas Perl 5 is often described as "a whole new language," Perl 6 is a whole new language
The best summary of the differences that I've found is http://perlcabal.org/syn/Differences.html
Perl 6 status: Vaporware?
  Under development since 1999 or 2000
  Larry Wall: "We're working on it, slowly but surely...or not-so-surely in the spots we're not so sure..."
Just interesting reading:
  An interview with Larry Wall: http://lwn.net/2001/features/LarryWall/

The End
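A small Perl 5 sketch of the features listed on the Perl 5 slide: references taken with \, array interpolation in double-quoted strings, reluctant quantifiers, and my variables with lexical scope. The variable names and data are illustrative only.

```perl
#!/usr/bin/perl
use strict;     # every variable must be declared, e.g. with my
use warnings;

my @food  = ("apples", "bananas", "cherries");
my %fruit = (apples => "red", bananas => "yellow");

# A backslash takes a reference to an existing variable
my $food_ref  = \@food;
my $fruit_ref = \%fruit;

# The arrow operator follows a reference to an element
my $second = $food_ref->[1];           # "bananas"
my $color  = $fruit_ref->{bananas};    # "yellow"

# References let structures nest: here, a hash whose values are arrays
my %menu = (lunch => ["soup", "bread"], dinner => \@food);
my $dinner_items = scalar @{ $menu{dinner} };    # 3

# Arrays are interpolated into double-quoted strings, separated by spaces
my $line = "Food: @food";    # "Food: apples bananas cherries"

# Greedy versus reluctant (non-greedy) quantifiers
my ($greedy)    = ("<b>bold</b>" =~ /<(.+)>/);     # "b>bold</b"
my ($reluctant) = ("<b>bold</b>" =~ /<(.+?)>/);    # "b"

# A my variable is visible only inside the block that declares it
{
    my $inner = "only visible in this block";
    print "$inner\n";
}

print "$second $color $dinner_items\n$line\n$greedy | $reluctant\n";
```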