Perl

OBJECTIVES
- What is Perl
- Concepts
- Variables
- Control Structures
- Modules
- Objects
- Windows

AE6382

Perl
- Practical Extraction and Report Language
- Originally designed as a text processing and "glue" language
- Perl is a scripting language
- Each invocation of a Perl script compiles the code, then executes it
- Uses a C-like syntax
- Has object-oriented programming features
- Highly portable between operating systems

Running Perl
- On Unix
  - Typically set line 1 to #!/usr/bin/perl (or wherever Perl is installed)
- On Windows
  - Set the file extension to
    - .pl for standard Perl
    - .pls for PerlScript (ActiveX scripting engine)
- Run from the perl command line

Variables
- Perl is not a strongly typed language; the contents of a variable are converted as necessary
- The first character of a variable name indicates the type of the variable
      $name
- The name part of a variable can also be enclosed in { }
      ${name}
      @{$reference_to_array}

      Type         Character   Meaning
      Scalar       $           An individual value
      Array        @           List of values, keyed by index
      Hash         %           List of values, keyed by string
      Subroutine   &           Callable Perl code
      Typeglob     *           Everything with that name

Variables - Scalar
- A scalar represents a single value
  - Integer
  - Floating point
  - String
  - Reference
- The data held by the variable is converted as necessary
- Scalar names start with a $
      $name
- As an lvalue
      $name = "george burdell";

Variables - Arrays
- An array is an ordered list of scalars
- Arrays are indexed by a number, starting at 0
- Negative indexes count backwards from the end of the array
- The indexing operator is [ ]
- An array name starts with @
- To refer to the full array (or a slice)
      @names
      @names[1,3,5]     slice
      @names[2 .. 6]    slice
- A single element of an array starts with $
      $names[4]
      $names[$value]

Variables - Arrays
- As an lvalue
      $names[4] = 345;
      @names = (1,2,3,4,5);
      @names = 1 .. 5;
      $last_value = $names[-1];

Variables - Hashes
- A hash, or associative array, is an unordered collection of scalars
- Hashes are indexed by strings
- The indexing operator is { }
- A hash name starts with %
- To refer to the entire hash
      %months
- A single element of a hash starts with $
      $months{'Mar'}
      $months{$some_string}
- As an lvalue
      $months{'Mar'} = 'March';
      %months = ('Jan' => 'January', 'Feb' => 'February');

Variables - Namespaces
- Two types of namespace
  - Global
  - Lexical
- Global variables are kept in symbol tables that are named and accessible
  - Are created in the context of a package (the default is main, as in $main::variable)
  - Can be referenced from another package using $package::variable
- Lexical variables are created and exist only in the context of a Perl block (normally the region enclosed with { })

Literals - Numeric
- Numeric literals can take several formats
      12345        integer
      12345.67     floating point
      1.23e06      scientific
      1_234_567    integer, with underscores for legibility
      0123         octal
      0xffff       hexadecimal
      0b101010     binary

Literals - String
- There are several ways to quote a string
- Substitution for variables in a string is known as interpolation
      print "The value is $value\n";
      print 'The value is ',$value,"\n";
- Interpolation occurs for variables and backslash literals

      Usual     General    Meaning                 Interpolate
      ' '       q/ /       Literal string          No
      " "       qq/ /      Literal string          Yes
      ` `       qx/ /      Command execution       Yes
      ( )       qw/ /      Word list               No
      / /       m/ /       Pattern match           Yes
      s/ / /    s/ / /     Pattern substitution    Yes
      y/ / /    tr/ / /    Character translation   No

Literals - String
- Special additions to the character set
- Backslash escape characters
      \n          newline
      \r          carriage return
      \t          tab
      \033        character represented by octal 033
      \cX         Control-X
      \x{263a}    Unicode character
      \\          backslash
- Translation escapes
      \u    force next character to uppercase
      \l    force next character to lowercase
      \U    force all following characters to uppercase
      \L    force all following characters to lowercase
      \E    end \U or \L switch

Literals - String
- There is flexibility in choosing quotes
      $string = qq[This method allows inclusion of ' and "];
      $string = qq{This method allows inclusion of ' and "};
      $string = qq/This method allows inclusion of ' and "/;
- The following executes a command using the OS shell and returns its output as a string
      $result = qx(ls);
- Word list form does not require tedious quoting
      @months = qw(January February March April);

Interpolation
- Interpolation is the process of expanding a variable in a string literal, the " form of the string
- Scalars are resolved in place; numeric values are converted to characters
- Arrays are interpolated by joining all the elements of the array, separated by the value of the special $" variable
      $" = '~';
      @months = qw(jan feb mar apr may jun);
      $string = "The months are: @months";
  - The months are: jan~feb~mar~apr~may~jun
- Whole hashes are not interpolated: %months inside a string is left as literal text. Individual hash elements such as $months{Mar} are interpolated like scalars

List Values
- A list consists of values enclosed in ( ) and separated by commas
      @array = (1,3,5,7,9,11);
      $value = (1,3,5,7,9,11);
- In list context (first line) the array is loaded with the values
- In scalar context (second line) each value is evaluated and the last value is returned, $value == 11
- There is an important difference between a list and an array: when an array is evaluated in scalar context it returns its length, $length == 6
      $length = @array;
      $length = scalar @array;
      $length = @array + 0;

List Values
- List interpolation
      (@array1, @array2, 1)
  - Each element above is evaluated and inserted into the list that is generated
  - There are no lists of lists
- Lists can be indexed using [ ]
      ($day,$month,$year) = (localtime())[3..5];
- Lists may be used as lvalues (see above)

Context
- Every operation in Perl is evaluated in one of two contexts: scalar or list
- Assignment to a scalar lvalue will cause the right side to be evaluated in scalar context
- Assignment to an array, hash, or slice lvalue will cause the right side to be evaluated in list context
- Assignment to a list on the left will cause the right side to be evaluated in list context
- Use the scalar function to force evaluation in scalar context
- Some operations return different values depending on the context in which they are evaluated
      $matched = m/([^,]+)/;     # scalar context: true if the pattern matched
      @fields  = m/([^,]+)/g;    # list context: every captured field

Arrays and Context
- An array referenced using @ operates in list context
- An array element operates in scalar context
- When a list is assigned to an array, each value is inserted into the next element
- Special forms of arrays
      $length = scalar @array;        (scalar not required here)
      $last_index = $#array;
      scalar @array == $#array + 1    (an identity)

Hashes and Context
- A hash referenced in the % form operates in list context
- A hash element operates in scalar context
- When a list is assigned to a hash, each pair of values in the list is taken as a key-value pair
      %colors = ('red',0xff0000,'green',0x00ff00,'blue',0x0000ff);
- There is a special syntax available for this
      %colors = (red => 0xff0000, green => 0x00ff00, blue => 0x0000ff);
- Use the keys function to generate a list of keys for a hash
- To find the number of keys in a particular hash
      $number_of_keys = scalar keys %hash;

Filehandles and Input
- A filehandle refers to a file
- Filehandles are, by convention, all upper case
- STDIN, STDOUT, STDERR are predefined
- Use the <> operator to read from a filehandle
      $line = <STDIN>;     read one line from STDIN
      @lines = <STDIN>;    read all lines from STDIN
- Read and print entire STDIN
      while(<>) { print; }
  - reads each line into the special variable $_, which is used implicitly by both <> and print

Operators
- Operator precedence, highest first
      Terms and list operators (leftward)
      ->
      ++ --
      **
      ! ~ \ unary + unary -
      =~ !~
      * / % x
      + - .
      << >>
      Named unary operators
      < > <= >= lt gt le ge
      == != <=> eq ne cmp
      &
      | ^
      &&
      ||
      .. ...
      ? : (ternary)
      = += -= *= (etc)
      , =>
      List operators (rightward)
      not
      and
      or xor
- Operators can be overloaded when using objects

Simple Statements
- A simple statement is an expression that is evaluated
- A simple statement is terminated with a ;
- A simple statement may be followed by a modifier
      if expr
      unless expr
      while expr
      until expr
      foreach list
- Examples
      print "Value is $i\n" if $i > 5;
      print "i=", $i--, "\n" while $i != 0;

Compound Statements
- Expressions containing blocks
- A block is normally contained in { }
- if statement
      if (expr) block
      if (expr) block else block
      if (expr) block elsif (expr) block
      if (expr) block elsif (expr) block else block
- unless statement is similar
      $i = $max;
      if ($i == $max) {
          print "The max is five\n";
          exit;
      } else {
          $i++;
      }

      $i = $max;
      unless ($i == $max) {
          $i++;
      } else {
          print "The max is five\n";
          exit;
      }

Compound Statements
- while statement
      label while (expr) block
      label while (expr) block continue block
- until statement
      label until (expr) block
      label until (expr) block continue block
- The continue block is executed before starting the next iteration of the loop
      while (<STDIN>) {
          chomp;
          @fields = split(/:/);
          print "Field 1: $fields[0]\n";
      }

Compound Statements
- for loop
      label for (expr1 ; expr2 ; expr3) block
      expr1    initialization, evaluated once before the loop starts
      expr2    loop test, the loop runs while it is true
      expr3    loop statement, evaluated after each iteration
      for (my $i = 0; $i < 10; $i++) {
          print "i=$i\n";
      }

Compound Statements
- foreach statement
      label foreach (list) block
      label foreach var (list) block
      label foreach var (list) block continue block
- Loops over each entry in the list
- When var is omitted, $_ is used
      foreach my $key (sort keys %people) {
          print "Key: $key, Value=$people{$key}\n";
      }
      foreach my $entry (@items) {
          print "Item: $entry\n";
      }

Compound Statements
- Labeled block
      label block
      label block continue block
- Equivalent to a loop that executes once
- Can be used with last, next, and redo

Loop Control
- These statements can be used with blocks
- The optional label further refines their effect
- last label
  - Exit the loop (block)
  - The continue block is not executed
- next label
  - Skip the rest of this iteration and start the next iteration
  - The continue block is executed before the next iteration begins
- redo label
  - Restart the loop with the current iteration parameters
  - The continue block is not executed
- The label parameter enables multi-level block control

Declarations
- Subroutine declaration is a global declaration
- A subroutine must be declared before it is used if it is to be called without parentheses
- Can define a subroutine at declaration
      sub count;
      sub count { … }
- Pragmas are directives to the Perl compiler
      use strict;
      use integer;
      use warnings;
      use English;

Declarations
- Variable declarations
- Lexically scoped declarations
      my $var;
      my ($var1, $var2);
      my $value = function();
- Lexically scoped global declarations
      our $var;
- Dynamically scoped global declarations
      local $var;

Pattern Matching
- Regular Expressions
  - Rule-based pattern matching mechanism
- Simple pattern
      m/Class/
- Complex pattern
      m/AE[0-9]+[A-Z]/

Regular Expressions
- Meta-characters
      \ | ( ) [ { ^ $ * + ? .
  - Have special meanings inside patterns
  - \ is the escape character, used to match one of the meta-characters as itself in a pattern, e.g., \\ or \.
- Quantifiers
      * + ? {3} {2,5}
  - REs normally match maximal text
  - Add ? to the end of a quantifier to match minimal text
- Character classes
      [ ] or [^ ]
- Grouping
      ( )

Regular Expressions
- The pattern matching operators
      m//      match
      s///     substitute
      tr///    transliterate
- Binding operators
      =~ !~    bind a string to a pattern operator
- Examples
      $string =~ m/AE[0-9]{4}[A-Z]/;
      $string =~ s/old/new/;
      $string =~ s(old)(new);     can use arbitrary delimiters
      $string =~ s'old'new';

Regular Expressions
- Maximal and Minimal matches
      "exasperate" =~ m/e(.*)e/
  - Captures "xasperat"
      "exasperate" =~ m/e(.*?)e/
  - Captures "xasp"

Functions
- There are many built-in functions
- Can be used with or without parentheses around arguments
  - With parentheses it will be parsed as a function
  - Without parentheses it will be parsed as a prefix operator, preferred
  - Use the -w switch on the #!/usr/bin/perl -w line to flag when it is being parsed as a function
  - Example
    • print 1+2*4;      # prints 9
    • print (1+2)*4;    # prints 3
- For details see the Perl documentation or the Camel book
- Users may define functions
      sub name { code }
- User functions are called with parentheses around arguments

Functions - Arguments
- Arguments are passed to functions in the built-in array @_
- The elements of @_ can be accessed by any of several techniques
      sub func {
          my $arg1 = $_[0];
          my $arg2 = $_[1];
      }

      sub func {
          my $arg1 = shift;
          my $arg2 = shift;
      }

      sub func {
          my $arg1 = shift;
          my @rest = @_;
      }

      sub func {
          my $nargs = @_;
          my $arg1 = shift;
          my @rest = @_;
      }
- shift is a built-in function that returns the first element of an array, then shifts the remaining elements down
- shift operates in a manner similar to a stack pop, but removes from the front of the array

eval Function
- The eval function is normally used to trap runtime errors
- The eval function has two forms
  - eval block
    - Will execute the code enclosed by the block
  - eval expr
    - Compiles and executes the code in expr
    - The code in expr can be dynamically created
- The special variable $@ reports the result of execution
  - $@ is set to the error message if there is an error
  - $@ is set to an empty string if there is no error
      eval { … };      # execute block of code
      if ($@) { … }    # handle error

References
- A reference in Perl is a scalar that contains a pointer to some data in memory
- Perl has two types: symbolic and hard
  - Symbolic: scalar contains the name of another variable
  - Hard: scalar contains the address of the data in memory
- Use the $ prefix to dereference a reference
  - $ref is the scalar that contains the reference
      $$ref       # dereference
      ${$ref}     # dereference
- Hard references are generally more common

References
- The \ (backslash) operator is used to create a hard reference
      $ref = \$sample
  - In this example $$ref is an alias for $sample; they both refer to the same location in memory
  - Use $$ref to refer to that memory location: $$ref == $sample and ${$ref} == $sample
      $ref = \@array
  - In this example $ref refers to @array
  - To access an array element: $$ref[1] or ${$ref}[1] or $ref->[1]
  - To access the array: @$ref or @{$ref}

Data Structures
- References are useful in accessing anonymous data structures
- Anonymous array
      [ element1, element2, … , elementN ]
      $ref = [0,1,2,3,4];
      $$ref[0] or ${$ref}[0] or $ref->[0]
- Anonymous hash
      { key1=>element1, key2=>element2, … , keyN=>elementN }
      $ref = { Jan=>1, Feb=>2, Mar=>3, Apr=>4 };
      $$ref{Jan} or ${$ref}{Jan} or $ref->{Jan}
- The -> operator is syntactic shorthand that removes the extra $ dereference

Data Structures
- Creating arbitrarily complex data structures is relatively easy using references
- Create any number of anonymous structures, placing their addresses into scalars (references)
- Store the resulting scalars in other structures

Arrays of Arrays
- An array of arrays is how to create a multi-dimensional array in Perl
- In each cell of one array save a reference to another array
- There is no requirement that each secondary array be the same length
      # Using a named array
      my @array;
      for (my $i=0;$i<4;$i++) {
          my $ref;
          for (my $j=$i;$j<$i+4;$j++) {
              push @{$ref},$j;
          }
          $array[$i] = $ref;
      }
      print $array[0]->[0],"\n";

      # Using a reference to an array
      my $array_ref;
      for (my $i=0;$i<4;$i++) {
          my $ref;
          for (my $j=$i;$j<$i+4;$j++) {
              push @{$ref},$j;
          }
          $array_ref->[$i] = $ref;
      }
      print $array_ref->[0]->[0],"\n";

Hash of Arrays
- In each cell of a hash table save a reference to an array
      my %months = ( Jan=>[1..31], Feb=>[1..28]);
      foreach my $month (keys %months) {
          print "$month: ", join(', ', @{$months{$month}}), "\n";
      }
- Output (the order of the keys is not guaranteed)
      Jan: 1, 2, 3, 4, … 27, 28, 29, 30, 31
      Feb: 1, 2, 3, 4, … 27, 28

Complex Structures
- Data structures can be created to any level of complexity
- Can mix all types to any depth
  - Arrays of hashes of hashes of arrays
- Hashes containing references to user defined functions
  - &{$func_list{$member}}(…arguments…)
      sub startup { print "Startup\n"; }
      sub shutdown { my $code = shift; print "Shutdown: $code\n"; }
      %func_list = (Startup=>\&startup, Shutdown=>\&shutdown);
      &{$func_list{Shutdown}}(99);

Packages
- A package is the way to isolate code in its own namespace
- This is particularly useful for re-usable code (libraries)
- As generally used, the scope of a package declaration is the file in which it appears
- Usually package is the first line of a file that is processed by require or use
- To refer to a variable in another package use $package::variable
- The default package is main: $main::variable or $::variable

Modules
- The module is the basic unit of re-usable Perl code
- Module files end with the .pm file extension
- Modules come in two forms
  - Traditional: functions and variables
  - Object-Oriented: methods and properties
- Modules are accessed with the use keyword
      use Module;
- A module file contains a package declaration with the same name as the file
- A module may export a list of functions and variables to the namespace that contains the use statement (do not export OO methods)

Modules
- Module names should begin with a capital letter, and the file name ends with .pm
- The last line of a module must be 1;
      File Sample.pm
          package Sample;
          sub func1 { }
          sub func2 { }
          1;

      Calling code
          use Sample;
          my $result = Sample::func1();

Modules
- Beyond the simple form there is additional support for modules
- The Exporter module can be used to place selected symbols into the Perl code that uses the module
- There is a version checking mechanism
- There is an autoload feature
      File Sample.pm
          package Sample;
          require Exporter;
          our @ISA = qw(Exporter);
          our @EXPORT = qw(func1 func2);
          sub func1 { }
          sub func2 { }
          1;

      Calling code
          use Sample;
          my $result = func1();

Objects
- The module forms the basis of the Object Oriented features of Perl
- The package name is the class name (type)
- The function definitions in the module are the methods
- A class may inherit methods from parent classes
- A class may be sub-classed
- Perl classes inherit methods, not data
- An object is a reference to an instance of a class
- All Perl classes are sub-classes of the UNIVERSAL class

Objects - Method Invocation
- Assume a class named Sample with an instance named $instance
- Invoking a class method
      Sample->class_method(…arguments…);
- Invoking an instance method
      $instance->instance_method(…arguments…);
- The first argument of a method invocation is hidden and is either the class name (class method) or a reference to an object (instance method)
- Methods can override superclass methods

Objects - Method Invocation (2)
- There is an alternate invocation method using indirect objects
- Looks like
      method object (list)
      method object list
      method object
- This method is less common as it suffers from some syntactic ambiguity
- Frequently used in calling a constructor
      $q = new CGI;      indirect object form
      $q = CGI->new;     equivalent arrow form

Objects - Constructors
- A constructor method is an ordinary method, usually named new
- Constructors for sub-classable classes need to be designed carefully (Camel Book 3rd ed, p 318)
- The instance properties are usually kept in an anonymous hash that is saved in the instance variable
- The bless function associates the reference variable with the class
      # Constructor for class named Sample
      sub new {
          my $obj = shift;
          my $class = ref($obj) || $obj;
          my $self = { @_ };
          bless($self,$class);
          return $self;
      }

      $object = Sample->new(alpha=>1,beta=>2);

Objects - Constructors
- In the previous example the instance data are stored in an anonymous hash
- The ref built-in function returns the class name of the object that is referred to
- Any reference can be used; hashes are common and convenient
- The use fields …; pragma is useful for creating object field storage; use this with the use base …; pragma

Objects - Properties
- The instance data can be referenced as hash entries when the object is hash based
      my $prop1 = $object->{alpha};
      my $prop2 = $object->{beta};
- Instance data should normally be accessed using accessor methods

Objects - Overloading
- Perl provides a mechanism to overload operators
- use overload implements this
- There is a handler (method/function) associated with each operator that has been overloaded; Perl will take care of the details

Tied Variables
- In Perl the tie function associates an object with a normal Perl variable (scalar, array, hash)
- The store and fetch accesses to the variable are provided by methods; Perl handles the details
- For example, a file can be accessed as if it were a simple array
- There are numerous available modules that create tied variables to access more complex data sources

Extending Perl
- There are several ways to extend Perl
  - Create modules (object oriented or traditional)
    - Hundreds of modules are available at http://www.cpan.org/
  - Create native code, C code, that is added to the Perl interpreter
- Perl is available for almost every OS
  - Generally pre-compiled for Linux
  - Windows version from http://www.activestate.com/
- The Perl interpreter can be embedded in native code programs
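
Worked example: context

The Context, Arrays and Context, and List Values slides can be tried out with a short script. This is only a sketch: the sample string and the variable names ($text, @fields, $matched) are assumptions made for illustration and do not come from the slides.

```perl
use strict;
use warnings;

my @array = (1, 3, 5, 7, 9, 11);

my $length     = @array;      # array in scalar context gives the element count
my $last_index = $#array;     # index of the last element
my ($first, @rest) = @array;  # list assignment: first value, then the remainder

# The same pattern match behaves differently in each context.
my $text    = "red,green,blue";        # sample data, assumed for this sketch
my $matched = $text =~ m/([^,]+)/;     # scalar context: true if it matched
my @fields  = $text =~ m/([^,]+)/g;    # list context: every captured field

# A list slice, as in (localtime())[3..5]
my ($second, $third) = (split /,/, $text)[1, 2];
```

Here scalar @array == $#array + 1 holds, as stated on the Arrays and Context slide.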
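
Worked example: a hash of code references with eval

The Complex Structures and eval Function slides can be combined into one small sketch: a hash whose values are references to subroutines, called by name, with eval trapping the error raised for an unknown name. The dispatch helper, the handler bodies, and the name 'Restart' are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hash of anonymous subroutines (hypothetical handlers for this sketch).
my %func_list = (
    Startup  => sub { return "Startup" },
    Shutdown => sub { my $code = shift; return "Shutdown: $code" },
);

# Look up a handler by name and call it; die if the name is unknown.
sub dispatch {
    my ($name, @args) = @_;
    my $handler = $func_list{$name}
        or die "No handler named '$name'\n";
    return $handler->(@args);
}

my $ok = dispatch('Shutdown', 99);

# eval block: the die inside dispatch is trapped and reported in $@.
my $result = eval { dispatch('Restart') };
my $error  = $@;    # copy $@ at once, before another eval can reset it
```

Returning strings rather than printing keeps the handlers easy to check. The $handler->(@args) call is the arrow form of &{$handler}(@args) shown on the slide.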
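
Worked example: a class with a constructor and an accessor

The Objects - Constructors and Objects - Properties slides recommend accessor methods over direct hash access. The sketch below reuses the constructor from the slides and adds one accessor. The accessor name alpha and its combined get/set behaviour are assumptions chosen for illustration, not something the slides prescribe.

```perl
use strict;
use warnings;

package Sample;

# Constructor from the slides: works as a class or an instance method.
sub new {
    my $obj   = shift;
    my $class = ref($obj) || $obj;
    my $self  = { @_ };           # instance data kept in an anonymous hash
    return bless($self, $class);
}

# Accessor (hypothetical): sets the property when given an argument,
# and always returns the current value.
sub alpha {
    my $self = shift;
    $self->{alpha} = shift if @_;
    return $self->{alpha};
}

package main;

my $object = Sample->new(alpha => 1, beta => 2);
my $before = $object->alpha;      # read through the accessor
$object->alpha(10);               # write through the accessor
my $after  = $object->alpha;
my $class  = ref $object;         # class name the reference was blessed into
```

Both packages live in one file here for brevity. In practice package Sample would normally sit in its own Sample.pm, as described on the Modules slides.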