Perl

Kurtis Hage
CSC 415: Programming Languages
History of Perl
Perl 1.0 was originally released in December of 1987; it was developed by Larry
Wall as an interpreted language optimized for scanning arbitrary text files, extracting
information from those text files, and printing reports based on that information, as
stated in the original Perl Manual. Perl is implemented in C and borrows from awk,
sh, and sed, all languages that preceded it. (Lapworth)
Perl has a few nicknames; it is called “the Swiss Army chainsaw of programming
languages”(Sheppard) and “the duct tape that holds the Internet together.”(Leonard) It
gained these nicknames because of its high degree of adaptability and its rise in
popularity as the web was being developed: because most of the work on the early web
involved text, and because Perl was designed, at least in part, for text processing, it
was better suited to that work than the alternative languages available at the time.
(Sheppard)
The Perl motto is 'There's More Than One Way To Do It', which emphasizes both
the flexibility of Perl and the fact that Perl is about getting the job done. Its strengths
are its writability, which stems from its English-like keywords and syntax and the
design decision to be easy for humans to write rather than easy for computers to
understand (Cozens), and its free, open-source distribution model, which allows for a
high degree of portability across computers and platforms. (Cozens)
Perl's most common use in today's environment is for CGI (Common Gateway
Interface, not to be confused with Computer Generated Imagery) programming,
meaning that Perl is used to create web pages dynamically. Perl is the powerhouse
behind popular sites such as Slashdot and Amazon.
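As a rough illustration of what "dynamically creating a web page" means in CGI terms, the sketch below prints an HTTP header, a blank line, and an HTML body whose content changes on every request. It assumes a web server configured to run the script as a CGI program, and it builds the response by hand rather than using a module such as CGI.pm; the page text is purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal CGI-style sketch: the web server runs this script and sends whatever
# it prints (an HTTP header, a blank line, then HTML) to the browser.
my $when = scalar localtime;    # changes on every request, so the page is dynamic
my $page = "Content-Type: text/html\r\n\r\n"
         . "<html><body>\n"
         . "<h1>Hello from Perl</h1>\n"
         . "<p>This page was generated at $when.</p>\n"
         . "</body></html>\n";
print $page;
```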
Overview of the Language
Names and Scopes
Perl scripts start with an implied “package main;” declaration, where “main” is the
name of the namespace to which the following code belongs until another package
declaration appears. Package variables can be declared by using the fully
package-qualified name in the code,
# can use variable without declaring it with 'my'
$some_package::answer=42;
warn "The value is '$some_package::answer'\n";
which is allowed regardless of which namespace the code currently resides in, or by use
of the “our” keyword, which declares the variable in the current namespace. (London)
package Hogs;
our $speak = 'oink';
warn "Hogs::speak is '$Hogs::speak'";
> Hogs::speak is 'oink' ...
Package declarations can be made inside code blocks, but upon leaving the block,
the package namespace reverts to the previous, enclosing namespace. After
reverting, the package variables declared inside the block can still be accessed;
however, they can then only be accessed by their fully package-qualified names.
(London) This is because a package declaration is lexically scoped: it lasts only until
the end of the enclosing block.
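The block-scoped package behavior can be sketched as follows. The package name Barn and the variable $animal are hypothetical, chosen only for illustration; the point is that after the closing brace the code is back in package main, yet the variable is still reachable by its fully qualified name.

```perl
use strict;
use warnings;

{
    package Barn;              # this package declaration lasts only until the closing brace
    our $animal = 'cow';       # creates the package variable $Barn::animal
}

# Outside the block we are back in package main.
my $current = __PACKAGE__;     # 'main'
my $animal  = $Barn::animal;   # still exists, but only reachable by its fully qualified name
print "current package: $current\n";
print "Barn::animal is '$animal'\n";
```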
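Lexical scope refers to anything that is visible or has an effect only within a certain
boundary of the source text or code. (London) Lexically scoped variables have three
main features: 1) Lexical variables do not belong to any package namespace. 2)
Lexical variables are only directly accessible from the point where they are declared to
the end of the nearest enclosing block, subroutine, eval, or file. 3) Lexical variables are
subject to garbage collection at the end of their scope: their memory is released then,
unless something else, such as a reference, still refers to them. (London)
no warnings;   # silence the "uninitialized value" warning for this demonstration
no strict;     # without strict, the $speak in warn silently means the package variable $main::speak
{
    my $speak = 'moo';   # lexical: exists only inside these braces
}
warn "speak is '$speak'\n";
> speak is ''
In the above example, the lexical $speak exists only inside the braces, so the $speak
used in the warn statement is a different, undefined package variable.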
Perl declares lexical variables with the “my” keyword; as noted above, they go out of
scope at the end of the enclosing block. Package variables, by contrast, are global:
they persist for the life of the program and never go out of scope.
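The contrast between the two kinds of variables can be sketched with a subroutine that updates one of each. The names visit, $visits, and $local_count are hypothetical; the lexical counter is recreated on every call, while the package variable keeps accumulating.

```perl
use strict;
use warnings;

our $visits = 0;              # package variable $main::visits: lives for the whole program

sub visit {
    my $local_count = 0;      # lexical: a brand-new variable on every call
    $local_count++;
    $visits++;                # the package variable keeps its value between calls
    return $local_count;
}

my $last_local;
$last_local = visit() for 1 .. 3;
print "visits = $visits, last local count = $last_local\n";
```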
Data Types
Perl uses three basic storage types for its variables. (London) Each variable name
requires a special character at the beginning to specify its type.
(Sebesta) These special characters are $, which represents a scalar, @, which
represents an array, and %, which represents a hash. (London)
Scalars
Scalar ($) variables can store strings, numbers, references, and filehandles.
(London) No declaration of the intended type is necessary; Perl automatically
converts the scalar between string and number as the context requires.
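A small sketch of this context-driven conversion: the same kinds of values act as numbers under arithmetic operators and as strings under string operators, with no type declarations involved. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text   = "10";         # holds a string
my $sum    = $text + 5;    # numeric context: "10" is used as the number 10
my $joined = 3 . 4;        # string context: each number is turned into a string
my $size   = length 12345; # a number used where a string is expected
print "sum=$sum joined=$joined size=$size\n";
```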
Because Perl was designed in part to handle text processing, it has a variety of very
useful functions to operate on strings, which are a subset of scalars. These functions
are a large part of what makes the language so powerful, and some of these functions
are documented in the Appendix.
my $fullname = 'mud' . "bath";   # concatenation; $fullname is "mudbath"
my $line = '-' x 80;             # repetition; $line is eighty hyphens
my $len = length($line);         # length; $len is 80
# qw(), which builds a list of words, is covered below
Perl string literals are normally placed in single or double quotes, though a list of
string literals can be created using the qw() quote-like operator. (London)
my ($first,$last)=qw( John Doe );
print "first is '$first'\n";
print "last is '$last'\n"; # prints: first is 'John', then last is 'Doe'
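The choice between the two quote styles matters: double quotes interpolate variables and escape sequences such as \n, while single quotes keep every character literally. The sketch below, with illustrative variable names, shows both alongside qw().

```perl
use strict;
use warnings;

my $name   = 'John';
my $double = "Hello, $name\n";          # double quotes interpolate $name and the \n escape
my $single = 'Hello, $name\n';          # single quotes keep $name and \n as literal characters
my @words  = qw(alpha bravo charlie);   # qw() splits on whitespace; no commas or quotes needed
print $double;
print $single, "\n";
print scalar(@words), " words\n";
```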
Perl scalars allow for different formats for numeric literals with special prefixes;
numbers preceded by “0b” are binary, by “0x” are hexadecimal, and by “0” are octal
(London); otherwise, numeric literals are read as decimal, and floating-point and scientific notation are also allowed.
my $base_address = 01234567; # octal
my $high_address = 0xfa94; # hexadecimal
my $low_address = 0b100101; # binary
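To see what those literals actually hold, the sketch below prints them in decimal. It also uses two facts not mentioned in the paper: underscores may separate digits in a numeric literal, and strings read at run time are not converted from these prefixes automatically, so the built-ins hex() and oct() are used for that.

```perl
use strict;
use warnings;

my $octal  = 01234567;    # octal literal (leading 0)
my $hexa   = 0xfa94;      # hexadecimal literal
my $binary = 0b100101;    # binary literal
my $big    = 1_000_000;   # underscores may separate digits for readability
my $sci    = 1.5e3;       # scientific notation

# %d shows the decimal value of each literal
printf "%d %d %d %d %d\n", $octal, $hexa, $binary, $big, $sci;

# Strings read at run time are NOT interpreted this way automatically;
# hex() and oct() convert them explicitly (oct() also accepts 0x and 0b prefixes).
my $from_hex = hex('fa94');
my $from_bin = oct('0b100101');
print "$from_hex $from_bin\n";
```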
Similar to its string functions, Perl also has many powerful built-in numeric
functions; it does not need a module for some trigonometric functions (sin, cos, and
atan2, which work in radians), exponentiation (**), square roots (sqrt), or natural logarithms (log).
my $pi = 4 * atan2(1, 1);              # pi, computed with a built-in
my $radian = 45 * ( $pi / 180 );       # 45 degrees in radians
my $sine_radians = sin($radian);       # trig sine, about 0.7071
my $seven_squared = 7 ** 2;            # exponentiation, 49
my $square_root_of_123 = sqrt(123);    # square root, about 11.0905
Arrays
Arrays are preceded with the “@” symbol and store scalars that are accessed via
an integer index. (London) Arrays start at index 0 (as opposed to index 1); negative
indexes count backward from the end of the array, so index -1 is the last element.
Perl arrays are one-dimensional only (London); multidimensional structures are built by
storing references inside arrays.
Element-access notation in Perl differs from most other languages: while the array itself
is written with the @ symbol, accessing a single element uses the scalar $ symbol
(because the element is a scalar), with the index placed inside square ( [] ) brackets.
my @numbers = qw ( zero one two three );
my $string = $numbers[2];
warn $string; # prints "two", the third element (index 2)
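A few more indexing forms, sketched on the same array: a negative index, a slice (which takes the @ sigil because it produces a list), and $#array, which gives the highest index. The variable names other than @numbers are illustrative.

```perl
use strict;
use warnings;

my @numbers = qw(zero one two three);
my $first   = $numbers[0];      # index 0 is the first element
my $last    = $numbers[-1];     # negative indexes count back from the end
my @middle  = @numbers[1, 2];   # a slice uses @ because it yields a list
my $top     = $#numbers;        # $#array is the highest index
print "first=$first last=$last middle=@middle top=$top\n";
```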
The length of Perl arrays is not predefined; instead, Perl allocates whatever space
is needed. The number of elements can be found with scalar(), which evaluates the array in scalar context.
my @phonetic = qw ( alpha bravo charlie delta );
my $quantity = scalar(@phonetic);
warn $quantity; # returns 4, the number of elements in the array
This can also be done by assigning the entire array to a scalar variable.
my @phonetic = qw ( alpha bravo charlie );
my $quant = @phonetic;
warn $quant; # returns 3
Arrays can be treated like a stack data structure: the push() and pop() functions
operate on the end (highest index) of an array. Also included are shift() and
unshift(), which operate at the beginning of the array (index 0). Other built-in
operations include sort(), which by default sorts the elements in string (character
code) order, and reverse(), which returns the elements in reverse order. (London)
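The sketch below exercises these functions on small illustrative lists. One detail worth noting: because the default sort compares strings, numbers need an explicit comparison block ({ $a <=> $b }) to sort numerically.

```perl
use strict;
use warnings;

my @stack = (1, 2);
push @stack, 3;              # add to the end: (1, 2, 3)
my $top = pop @stack;        # remove from the end: $top is 3, @stack is (1, 2)
unshift @stack, 0;           # add to the front: (0, 1, 2)
my $front = shift @stack;    # remove from the front: $front is 0, @stack is (1, 2)

my @words    = sort qw(pear Apple banana);       # default sort compares strings
my @as_text  = sort(10, 9, 100);                 # so numbers sort as text: 10 100 9
my @numeric  = sort { $a <=> $b } (10, 9, 100);  # a comparison block sorts numerically
my @backward = reverse @numeric;

print "top=$top front=$front stack=@stack\n";
print "words=@words as_text=@as_text numeric=@numeric backward=@backward\n";
```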
Hashes
Hashes are preceded with the % symbol and store scalars that are accessed via a
string index called a key. Like arrays, hashes are one-dimensional only. Any list with
an even number of scalars can be assigned to a hash; Perl takes them in pairs. (London)
The odd-numbered items (first, third, and so on) are treated as keys, and each
even-numbered item is treated as the value of the key before it. Below is a typical hash lookup:
my %info = qw ( name John age 42 );
my $data = $info{name};
warn $data; #returns “John”
Keys do not have to be pre-declared (as they were in the above example); if the key
does not exist during an assignment, the key is created and given the assigned value.
my %inventory;
$inventory{apples}=42;
Hashes also have many other built-in functions, such as keys, values, each, exists, and delete, which are not covered in detail here.
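For a brief look beyond the paper's scope, the sketch below uses a few of those built-ins on an illustrative inventory hash. It also uses the => operator, which works like a comma but quotes the word to its left. Since keys come back in no fixed order, they are sorted before printing.

```perl
use strict;
use warnings;

my %inventory = (apples => 42, pears => 7);   # => works like a comma and quotes the key
$inventory{plums} = 3;                        # assigning to a new key creates it

my @fruits  = sort keys %inventory;           # keys have no fixed order, so sort them
my $has_fig = exists $inventory{figs} ? 'yes' : 'no';
delete $inventory{pears};                     # removes the key and its value
my $count   = keys %inventory;                # keys in scalar context: number of pairs

print "fruits=@fruits has_fig=$has_fig count=$count\n";
```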
Expressions and Assignment Statements
Perl integrates regular expressions into the syntax of the core language
itself. Its regular expressions allow you to search for and transform text in innumerable
ways with ease and speed. (Cozens)
Regular expressions let you search strings for particular patterns, find out what
matched, and substitute new strings for the matched text. This is accomplished through
three separate operators: match (m//), substitute (s///), and transliterate (tr///, also
written y///). Perl allows almost any punctuation character as the delimiter for these
operators, for example m{...} or s#...#...#. (London)
There are two ways to “bind” these operators to a string expression: 1) =~, which
applies the operator to the string, and 2) !~, which negates the result, so a match is
true when the pattern does NOT match the string expression. (London) The delimiters of
the match and substitute operators behave like double-quote marks: variables inside the
pattern (and in the replacement text of s///) are interpolated. tr/// does not interpolate.
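The three operators and both binding forms can be sketched on one illustrative sentence. Working on copies ((my $copy = $text) =~ s/.../.../) keeps the original string unchanged, and tr/// with an empty replacement list simply counts the listed characters.

```perl
use strict;
use warnings;

my $text = 'The quick brown fox';

my $has_fox = ($text =~ m/fox/) ? 1 : 0;      # =~ binds the match operator to $text
my $no_cat  = ($text !~ /cat/)  ? 1 : 0;      # !~ is true when the pattern does NOT match
my ($word)  = $text =~ m{(q\w+)};             # braces as delimiters; () captures the match

(my $changed = $text) =~ s/brown/red/;        # substitute on a copy, leaving $text alone
my $animal = 'fox';
(my $shout = $text) =~ s/$animal/\U$animal/;  # pattern and replacement interpolate variables
my $vowels = ($text =~ tr/aeiou//);           # tr/// with an empty replacement just counts

print "$has_fox $no_cat $word\n$changed\n$shout\nvowels=$vowels\n";
```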
Statement-Level Control Structures
Standard statements are executed in sequential order in Perl, but control flow
statements allow you to alter the order of execution. Many of these control flow
structures are shared with other languages, such as if..elsif..else blocks (note the
spelling elsif) and while loops. Perl also includes an unless block, which runs its body
when the condition is false, the inverse of if. An unless block can be extended by elsif
and else blocks as well. In Perl, for and foreach are synonyms; the typical loop iterates
over a list of values, although the C-style for (initialize; test; step) form is also available. (London)
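A compact sketch of these structures: a foreach-style loop containing an if/elsif/else chain, an unless block, and a C-style for loop. The labels pushed onto @out ('fizz', 'even', and so on) are arbitrary and only make the control flow visible.

```perl
use strict;
use warnings;

my @out;
for my $n (1 .. 6) {                          # foreach-style loop over a list
    if    ($n % 3 == 0) { push @out, 'fizz' }
    elsif ($n % 2 == 0) { push @out, 'even' } # spelled elsif, not elseif
    else                { push @out, $n }
}

my $ready = 0;
unless ($ready) {                             # runs because the condition is false
    push @out, 'not ready';
}

for (my $i = 0; $i < 2; $i++) {               # C-style for loop
    push @out, "i$i";
}
print join(',', @out), "\n";
```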
Perl also has labels, which are optional names (such as OUTER:) attached to loops.
The loop control commands next, last, and redo can name a label to say which
enclosing loop they apply to; without a label, they apply to the innermost loop.
The last command exits the entire loop immediately, skipping any continue block if
one exists. The next command skips the rest of the current iteration but executes the
continue block if there is one; whether or not a continue block exists, execution then
proceeds to the next iteration of the loop. The redo command restarts the current
iteration of the loop block, again without executing the continue block and without
evaluating the loop condition again. (London)
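The sketch below shows a labeled next and last jumping out of an inner loop, and a while loop whose continue block still runs after next. The label OUTER and the variables are illustrative.

```perl
use strict;
use warnings;

my @pairs;
OUTER: for my $i (1 .. 3) {
    for my $j (1 .. 3) {
        next OUTER if $j > $i;    # jump straight to the next iteration of the outer loop
        last OUTER if $i == 3;    # leave both loops entirely
        push @pairs, "$i$j";
    }
}

my @seen;
my $k = 0;
while ($k < 5) {
    next if $k == 2;              # skip the rest of the body...
    push @seen, $k;
} continue {
    $k++;                         # ...but the continue block still runs
}
print "pairs=@pairs seen=@seen\n";
```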
Subroutines
Perl allows named and anonymous subroutines, which are referred to with the &
symbol, though unlike the symbols for scalars, arrays, and hashes, the & is usually
optional when calling a subroutine. Subroutines follow a syntax of “sub NAME BLOCK”,
where NAME is any valid identifier and BLOCK is a code block enclosed in curly braces.
The name of a subroutine is placed in the current package namespace and can be
accessed with just NAME if you are in that package, or with the fully package-qualified
name from anywhere, in either case with or without the optional &.
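package MyArea;                  # the subroutine below belongs to package MyArea
sub Ping { print "ping\n"; }
Ping;                            # plain name, from inside package MyArea
&Ping;                           # optional & (also passes along the current @_)
MyArea::Ping();                  # fully package-qualified name
&MyArea::Ping;                   # fully qualified name with &
# all four are correct calls, and each prints "ping"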
Lexical variables declared inside a subroutine's block are invisible to code outside
that block. (London)
Values passed to or from a subroutine are flattened into a single list of scalars
(arguments are listed in parentheses at the call): an array contributes its elements and
a hash its key/value pairs, so the subroutine does NOT know whether those scalars
originally came from scalars, arrays, or hashes. (London)
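Flattening can be observed by counting what arrives in @_. In the sketch below (count_args is a hypothetical helper), an array and a hash dissolve into their elements; passing references instead is the usual way to keep each structure intact as a single scalar.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }       # @_ receives one flat list of scalars

my @list = (1, 2, 3);
my %pair = (a => 1);
my $n = count_args(@list, %pair, 'x');    # 3 elements + 1 key/value pair + 1 scalar = 6
my $m = count_args(\@list, \%pair);       # references arrive as just 2 scalars
print "flattened: $n, as references: $m\n";
```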
Inside the subroutine, the arguments are accessed via a special array called @_. If
the arguments are fixed and known, they can be extracted by assigning @_ to a list of
scalars with meaningful names.
sub compare {
my ($left,$right)=@_;
return $left<=>$right; }
The @_ array is really a list of aliases for the original arguments that were passed in,
so one must be careful: assigning a value to an element of @_ changes the value of the
original variable that was passed in. One must also be careful when calling a
subroutine with the & symbol and no parentheses; the current @_ array is then
implicitly passed to the subroutine being called. (London)
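Both pitfalls can be sketched in a few lines. The subroutine names are hypothetical: double_in_place changes its caller's variables through the aliases in @_, and pass_through forwards its own @_ implicitly by calling &show_args without parentheses.

```perl
use strict;
use warnings;

sub double_in_place { $_ *= 2 for @_ }    # $_ aliases each element of @_, which aliases the caller's variable

sub show_args    { return join(',', @_) }
sub pass_through { return &show_args }    # & with no parentheses hands over the current @_

my ($x, $y) = (3, 4);
double_in_place($x, $y);                  # $x and $y are changed in place
my $seen = pass_through('a', 'b');        # show_args receives ('a', 'b') without being given them
print "x=$x y=$y seen=$seen\n";
```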
Subroutines can return single values or lists of values; the return value can be given
explicitly with return, or it is implicitly the value of the last expression evaluated.
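A minimal sketch of both return styles, using hypothetical helpers: min_max returns a two-element list explicitly, while square has no return statement and yields the value of its last expression.

```perl
use strict;
use warnings;

sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);    # explicit return of a two-element list
}

sub square { my $n = shift; $n * $n }    # no return: the last expression's value is returned

my ($lo, $hi) = min_max(5, 2, 9);
my $sq = square(7);
print "lo=$lo hi=$hi sq=$sq\n";
```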
An interesting built-in function related to subroutines is caller(). This function returns
a list of information about how and where the current subroutine was called. This
information includes the package namespace at the time of the call, the filename and
the line where the call was made, and, when caller() is given a frame number, the
subroutine's name, whether it was called with its own arguments, and some other information. (London)
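A small sketch of caller(0), which describes the current call frame; whoami is a hypothetical subroutine, and only the first four returned fields (package, file, line, subroutine name) are used here.

```perl
use strict;
use warnings;

sub whoami {
    # caller(0) describes the current call: package, file, line, and subroutine name
    my ($package, $file, $line, $sub) = caller(0);
    return "$sub was called from package $package, line $line";
}

my $info = whoami();
print "$info\n";
```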
Support for Object-Oriented Programming
Perl supports both procedural and object-oriented programming. Object orientation is
built on packages: a class is simply a package (often in its own module), an object is a
reference that has been “blessed” into that package, and methods are the package's subroutines.
The SUPER:: prefix allows a method in a child class (a class that uses another class as
its base) to call the version of that method belonging to its parent class. This has a
built-in limitation, though: the search up the class inheritance hierarchy starts at the
package in which the calling method was originally defined. (London) In long trees of
class extension, then, SUPER may not find the method one expects.
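The pieces described above can be sketched in one file: a constructor that blesses a hash reference, a child class that inherits through @ISA, and an overriding method that still calls its parent's version through SUPER::. The class names Animal and Cow are hypothetical; @ISA is set directly here instead of with "use base" so the sketch stays self-contained.

```perl
use strict;
use warnings;

package Animal;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} };
    return bless $self, $class;          # the object is a hash reference blessed into $class
}
sub speak { my $self = shift; return "$self->{name} makes a sound" }

package Cow;
our @ISA = ('Animal');                   # Cow inherits from Animal
sub speak {                              # overrides Animal::speak
    my $self = shift;
    my $base = $self->SUPER::speak();    # call the parent's version of speak
    return "$base: moo";
}

package main;
my $cow  = Cow->new(name => 'Bessie');   # new is inherited from Animal
my $line = $cow->speak;
print "$line\n";
```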
Created objects are subject to garbage collection through what is called object
destruction. This occurs when the last reference to a specific object goes away, for
example when every variable holding it has gone out of lexical scope. When this
happens, Perl calls the object's DESTROY method (if one exists) and then frees the
object's data. This has a limitation similar to SUPER: Perl calls only the first DESTROY
method it finds when searching up the inheritance tree, so a child class's DESTROY
must explicitly call its parent's (for example via SUPER::DESTROY) if both should run. (London)
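The timing of object destruction can be sketched by logging events. Temp is a hypothetical class whose DESTROY method records when it runs; because the only reference lives in a block, DESTROY runs as soon as that block ends.

```perl
use strict;
use warnings;

my @log;    # file-scoped lexical, visible to both packages below

package Temp;    # hypothetical class for illustration
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my $self = shift; push @log, "destroyed $self->{name}" }

package main;
{
    my $obj = Temp->new('scratch');
    push @log, 'in scope';
}    # the only reference ($obj) goes out of scope here, so DESTROY runs now
push @log, 'after block';
print join('; ', @log), "\n";
```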
Modules are loaded with statements of the form “use ModuleName;”, optionally followed
by a list of names to import, and “use lib 'directory'” adds a directory to the places Perl
searches for modules. “use base” can also be used to have classes inherit from a base
class that has common methods. Classes can override a method of their base class
simply by defining a method with the same name. (London)
Concurrency
Perl supports concurrency... sort of. The language itself wasn't designed with
concurrency in mind; instead, Perl handles concurrency with a threading
module. Each thread runs as a separate virtual machine, and only data that is explicitly
marked as shared can be shared. (Wegrzanowski)
In this way, Perl can run code concurrently; this model gives every thread access to all
Perl libraries and avoids some problematic errors by not sharing everything by default.
This type of concurrency works very well for some problems, but very poorly for
large-scale concurrent programs, for which specialty languages may be more
suitable. (Wegrzanowski)
Exception and Event Handling
Perl exception handling is traditionally done with die, which throws an exception, and
an eval block, which catches it and stores the error in the special variable $@. The
common “eval { ... } or do { ... }” idiom is functionally similar to a try..catch block.
Exception objects typically carry three important pieces of information with them: 1)
the type of exception, as determined by the class of the exception object, 2) where the
exception occurred, and 3) context information, which includes the error message and
other state information. (Shankar)
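The eval/die idiom with an exception object carrying all three kinds of information can be sketched as below. MyError and divide are hypothetical names invented for this illustration (this is not the Error.pm module); the class identifies the type, caller() records where the object was created, and the message holds the context.

```perl
use strict;
use warnings;

package MyError;    # hypothetical exception class, for illustration only
sub new {
    my ($class, %args) = @_;
    my (undef, $file, $line) = caller;    # where the exception was created
    return bless { message => $args{message}, file => $file, line => $line }, $class;
}
sub message { return $_[0]{message} }
sub line    { return $_[0]{line} }

package main;

sub divide {
    my ($num, $den) = @_;
    die MyError->new(message => 'division by zero') if $den == 0;   # throw an object
    return $num / $den;
}

my ($result, $caught, $where);
eval {
    $result = divide(1, 0);
    1;                                    # reached only if nothing died
} or do {
    my $err = $@;                         # the thrown value, here a MyError object
    die $err unless ref $err && $err->isa('MyError');   # rethrow anything unexpected
    $caught = $err->message;              # context information
    $where  = $err->line;                 # where it happened
};
print "caught '$caught' thrown at line $where\n";
```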
Shankar states:
“Object-oriented exception handling allows you to separate error-handling code from the
normal code. As a result, the code is less complex, more readable and, at times, more efficient.
The code is more efficient because the normal execution path doesn't have to check for errors.
As a result, valuable CPU cycles are saved.”
For programmers who want a more familiar style, a module called Error.pm, hosted on
CPAN (the Comprehensive Perl Archive Network), attempts to mimic the exception
handling of object-oriented languages like Java and C++. This module provides
interfaces for procedural exception handling and a base class for other exception classes. (Shankar)
Other Issues
Perl does not have a boolean type variable. Instead, the interpreter interprets scalar
strings and numbers as true or false based on a set of rules.
1) The strings "" and "0" are FALSE; any other string or stringification is TRUE.
2) The number 0 is FALSE; any other number is TRUE.
3) All references are TRUE.
4) undef is FALSE.
Any value being tested that is NOT a scalar is evaluated in scalar context and then
treated as a string or number. An array in scalar context yields its size, so an empty
array is false; note, though, that an array containing a single undefined value is still
true, because its size is 1. Subroutines return a scalar or a list depending on the
context in which they are called, and to explicitly return false, an empty return
statement is used: a bare return yields undef in scalar context and an empty list in list context. (London)
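The truth rules above can be checked directly. The sketch below tests a list of sample values in boolean context (note that the strings '0.0' and '00' are true, since only "" and "0" are false strings), then checks array sizes and the two results of a bare return. The names are illustrative.

```perl
use strict;
use warnings;

# Each value is tested in boolean context and marked T (true) or F (false).
my @values = ('', '0', '0.0', '00', 0, 0.0, 1, 'abc', [], undef);
my $truth  = join '', map { $_ ? 'T' : 'F' } @values;

my @empty      = ();
my @one_undef  = (undef);
my $empty_flag = @empty     ? 'T' : 'F';   # size 0: false
my $undef_flag = @one_undef ? 'T' : 'F';   # size 1: true, even though the element is undef

sub nothing { return }                     # bare return
my $scalar_result = nothing();             # undef in scalar context
my @list_result   = nothing();             # empty list in list context

print "$truth $empty_flag $undef_flag ",
      (defined $scalar_result ? 'defined' : 'undef'), ' ', scalar(@list_result), "\n";
```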
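A potential issue with Perl is that it creates variables on first use. Without “use strict”
(and “use warnings”, in some cases), a variable that was never declared with “my” is
silently created and initialized to be undefined (undef in Perl), which evaluates as false,
so a misspelled variable name goes unreported. (London) A related behavior,
autovivification, automatically creates missing intermediate levels of a nested hash or
array when that structure is accessed.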
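Both behaviors can be sketched briefly. A string eval is used only so the strict error can be caught and inspected in the same script; the misspelled $totl and the %config hash are illustrative. Note that merely reading $config{db}{port} brings $config{db} into existence.

```perl
use strict;
use warnings;

# With strict in effect, a misspelled variable is a compile-time error instead of a new
# undefined variable. The string eval exists only so the error can be caught and shown.
my $ok = eval q{ use strict; my $total = 1; my $next = $totl + 1; 1 };
my $strict_caught = (!$ok && $@ =~ /Global symbol/) ? 'yes' : 'no';

# Autovivification: reading a nested element creates the missing intermediate level.
my %config;
my $before = exists $config{db} ? 1 : 0;    # 0: nothing there yet
my $port   = $config{db}{port};             # only a read...
my $after  = exists $config{db} ? 1 : 0;    # ...yet $config{db} now exists (an empty hash ref)
print "strict caught typo: $strict_caught; db before=$before after=$after\n";
```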
A last potential issue is Perl's garbage collection, which was briefly discussed in the
“Names and Scopes” section of this paper. When Perl frees memory, the memory is
generally not returned to the operating system; instead, it is kept for reuse by new
lexically scoped variables and other data created later in the program (after the
garbage has been collected). This means that a running Perl program generally does
not shrink; memory it has allocated remains under Perl's control. (London)
Evaluation
Readability
Perl ranges from very simple to read to notoriously difficult, as is abundantly clear
from the obfuscated Perl competitions that have taken place in the past.
Much of what makes Perl so powerful also contributes to its potential obfuscation;
the operators that accept any delimiter and its regular expressions, in particular, can
quickly become a jumbled mess. Another feature of the language that can make Perl
difficult to read is the flexibility of its subroutine calls; the Subroutines section above
shows four separate ways to call the same subroutine, even though it may not be
obvious that each line produces the same output.
With good programming habits, though, Perl's typical block structure allows for
relatively easy reading, except perhaps for the special characters assigned to
variable types, which may puzzle readers unfamiliar with the language.
Writability
For the average user, Perl's writability is almost on par with languages like C++ or
Java, again with the possible exception of how variable types are marked.
Much of what can potentially harm Perl's readability can also greatly enhance its
writability, as seen in Examples 1 and 2 of the Code Appendix, which are incredibly
powerful programs written in 2 and 7 lines of code, respectively.
Reliability
Perl code tends to have a long lifespan, as is evident from its continued use in today's
programming environment. As noted in the introduction, Perl is the “duct tape that
holds the Internet together.” (Leonard)
With the continued open-source development of Perl 5 and its iterations, and the
ongoing development of the not-yet-fully-released Perl 6, Perl appears likely to
continue to be maintained and reliable.
Cost
Perl's initial cost is zero; the full source code and documentation are free to copy,
compile, print, and give away. Programs written in Perl incur no royalties or
restrictions on distribution. Perl itself is released under a dual license: users may choose
either the “Artistic” License or the GNU General Public License. The Artistic License
requires, among other conditions, that modifications be clearly flagged, and one of its
options for distributing a modified version is to distribute the original along with it.
Training a new Perl user, though, may not be free. While its open-source nature,
with plenty of free books and example code, can certainly allow for a programmer to be
self-taught, courses on Perl can range anywhere from $120 to upwards of $5000 for
“boot camp” type courses.
Perl is available for most operating systems, particularly Unix and its variants,
meaning hardware costs are typically kept to a minimum.
Conclusion
Perl is an incredibly powerful language that can be as simple or as complex as you
want it to be, and it is still widely used today. It continues to be updated and
improved, and it should remain relevant to programmers for some time to
come.
Code Appendix
Example 1: Two Line RSA Algorithm – (Beck – just two lines of Perl.)
print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<>
)]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`
Example 2: 7-Line DVD Copy-Protection Descrambler (Winstein and Horowitz)
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map{$_%16or$t^=$c^=(
$m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;$t^=(72,@z=(64,72,$a^=12*($_%16
-2?0:$m&17)),$b^=$_%64?12:0,@z)[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h
=5;$_=unxb24,join"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$
d=unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d>>12^$d>>4^
$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*8^$q<<6))<<9,$_=$t[$_]^
(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}print+x"C*",@a}';s/x/pack+/g;eval
Bibliography
Lapworth, Leo. "The Perl Programming Language." The Perl Programming Language. N.p., 2011. Web. 26 Sep. 2011. <http://www.perl.org/>.
Wiersdorf, Scott. Perl.com. Tom Christiansen, September 21st, 2011. Web. 26 Sep. 2011. <http://www.perl.com/>.
Shankar, Arun Udaya. "Object Oriented Exception Handling in Perl." Perl.com. Tom Christiansen, n.d. Web. 26 Oct. 2011.
Wolfgang, G. Perl Beginners' Site. N.p., July 22, 2011. Web. 26 Sep. 2011. <http://perl-begin.org/>.
Sebesta, Robert W. Concepts of Programming Languages. 9th ed. Addison-Wesley.
McCullagh, Declan. "Descramble that DVD in 7 Lines." Wired. 07 03 2001: n. page. Web. 11 Oct. 2011. <http://www.wired.com/culture/lifestyle/news/2001/03/42259>.
Sheppard, Doug. "Beginner's Introduction to Perl." Perl.com. Tom Christiansen, 16 10 2000. Web. 12 Oct. 2011.
Leonard, Andrew. "The joy of Perl." Salon. 08 01 2011: n. page. Web. 12 Oct. 2011.
Cozens, Simon. Beginning Perl. Wrox Press, 2000. Print. <http://www.perl.org/books/beginning-perl/>.
Schwartz, Randal, Tom Phoenix, and brian d foy. Learning Perl. 5th ed. O'Reilly. Print.
London, Gregg. Impatient Perl. Version 9. 2010. eBook.
Wegrzanowski, Tomasz. "Why Perl Is a Great Language for Concurrent Programming." TAW'S Blog. 04 October 2009. Web. 26 Oct. 2011. <http://t-a-w.blogspot.com/2006/10/why-perl-is-great-language-for.html>.