PERL super-short course
Genetics 875
The following is a brief summary of the basics of Perl. We will use the
concepts listed here to write some simple Perl programs in lab. The exercise is meant to
highlight the power of Perl programming and encourage people to learn Perl in more
depth. For an in-depth tutorial, the ‘Learning Perl’ books are straightforward and
useful and are available online for UW staff and students.
O’Reilly “Learning Perl”, available online through UW:
http://proquest.safaribooksonline.com/0596101058?uiCode=uwimad
Perl
Perl is a versatile programming language that can be used for many tasks. Perl provides a
large number of built-in tools and functions for quite complicated programs. It is
especially useful for manipulating data, files, searching text/sequence, etc.
Programs written in Perl are called Perl scripts – they are written by the user using the
Perl language. A Perl script is read and executed by the master program PERL (the Perl
interpreter, which is invoked as perl on the command line). The PERL program must
therefore be installed on your computer before any Perl scripts can be run (it typically
comes with newer versions of OS X).
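As a minimal sketch of what this looks like in practice, the lines below form a complete Perl script. The file name hello.pl is hypothetical; assuming the script is saved under that name, it would be run from a terminal by typing perl hello.pl. The ‘use strict’ and ‘use warnings’ lines and the ‘my’ keyword are additions not covered in this summary; they ask PERL to report common mistakes. Variables such as $message are explained under the definitions further on.

```perl
# hello.pl (hypothetical file name): a complete Perl script.
# Lines starting with # are comments and are ignored by PERL.
use strict;      # addition: report undeclared variables
use warnings;    # addition: report other likely mistakes

my $message = "Hello from Perl\n";   # \n marks the end of the line
print $message;                      # print writes the text to the screen
```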
Perl is implemented as an interpreted language (as opposed to a ‘compiled’ language,
whose programs are converted ahead of time into machine code that is read directly by
the CPU of your computer). This means that your scripts are interpreted and executed by
the PERL program. Thus, a Perl script tends to run slower than an equivalent compiled
program because it is one step removed from the CPU.
However, the advantage of Perl is that it is especially good at manipulating, searching,
reading, and reorganizing text and data files compared to many other languages. In fact,
this was in large part why Perl was invented.
Perl syntax
Perl is a language not unlike English. A book has words, sentences, paragraphs, chapters,
and punctuation linking them all together. Analogously, perl uses variables, functions,
loops, subroutines, modules, etc. Like any language, the perl language has a specific
syntax that is very important to its interpretation by the PERL program. Therefore,
spacing, capital vs. lowercase letters, and the correct punctuation (brackets,
parentheses, semicolons) are very important. For example, $Blah and $blah are two
different variables.
First, some definitions:
A scalar is a single unit that can be either a number (e.g. 123) or ‘string’ of characters
(e.g. abc).
A variable is a container for a single unit/scalar – the value stored can vary (hence, the
name variable). The variable can have nearly any name you like (letters, digits, and
underscores, not starting with a digit), but it is indicated to the PERL program with a $
in front.
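$bazooka is a variable named bazooka.
$bazooka = 5;          # here $bazooka is set to equal 5
$bazooka = 23;         # now $bazooka has been changed to equal 23
$bazooka = "JimBob";   # now $bazooka holds the string ‘JimBob’
(Text after a # is a comment and is ignored by PERL. Note that strings must be typed with
plain straight quotes, not the curly quotes a word processor inserts.)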
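The assignments above can be put together into a short runnable sketch. The variable $doubled is a hypothetical addition used only to show that a stored number can be used in arithmetic; ‘use strict’, ‘use warnings’, and ‘my’ (which declares a variable the first time it is used) are also additions not covered in this summary.

```perl
use strict;
use warnings;

my $bazooka = 5;               # $bazooka is created and set to 5
$bazooka = 23;                 # the old value is replaced by 23
my $doubled = $bazooka * 2;    # a stored number can be used in arithmetic
$bazooka = "JimBob";           # the same variable can also hold a string

print "doubled: $doubled\n";   # prints: doubled: 46
print "bazooka: $bazooka\n";   # prints: bazooka: JimBob
```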
Perl also has containers for multiple variables grouped together:
An array is an ordered group of variables, essentially a list. A given array can have any
name you like, but it is indicated to PERL with an @ in front.
@guns;                                      # this is an array called guns
@guns = ("bazooka", "shotgun", "pistol");   # here the array holds 3 strings
To ‘call’ a single one of the variables in the array we use this syntax:
$guns[1]
The $ tells PERL this is a single variable – BUT the brackets indicate that this is really a
single variable that’s part of an array, stored at position 1 in the array.
A note about counting: in Perl, as in most programming languages, we start counting at 0 (not 1!).
So in our example above:
$guns[0] stores the string ‘bazooka’
$guns[1] stores the string ‘shotgun’
$guns[2] stores the string ‘pistol’
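The array example can be tried out with the short sketch below. The names $first, $last, and $count and the replacement value "rifle" are hypothetical additions for illustration, as are ‘use strict’, ‘use warnings’, and ‘my’; scalar(@guns) is a built-in way to ask how many items an array holds.

```perl
use strict;
use warnings;

my @guns = ("bazooka", "shotgun", "pistol");

my $first = $guns[0];         # position 0 is the first item: bazooka
my $last  = $guns[2];         # position 2 is the third item: pistol
my $count = scalar(@guns);    # number of items in the array: 3

$guns[1] = "rifle";           # a single position can be overwritten

print "first: $first, last: $last, count: $count\n";
print "all: @guns\n";         # prints: all: bazooka rifle pistol
```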
A hash is a fancier container for multiple variables. Again, you can name it whatever
you want but it is indicated to PERL with a % in front:
%Guns;
The main difference between a hash and an array is how the values are stored:
In an array each variable is at a position in the list (0, 1, 2, etc).
In a hash, each variable is ‘indexed’ with a key word:
$Guns{"Arkansas"} = "bazooka";
The $ tells PERL this is a single variable – BUT the curly brackets indicate that this is
really a single variable that’s part of a hash: the value ‘bazooka’ is stored under the key
word ‘Arkansas’. Hashes can be very useful for linking related things to one another, e.g.
storing each value under a meaningful key word.
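A minimal sketch of a hash in use follows. The second entry ("Texas") and the names $value and @states are hypothetical additions; keys, sort, and foreach are Perl built-ins not covered in this summary, used here only to list what the hash contains, and ‘use strict’, ‘use warnings’, and ‘my’ are likewise additions.

```perl
use strict;
use warnings;

my %Guns;                          # an empty hash called Guns
$Guns{"Arkansas"} = "bazooka";     # store 'bazooka' under the key 'Arkansas'
$Guns{"Texas"}    = "shotgun";     # hypothetical second entry

my $value  = $Guns{"Arkansas"};    # look a value up by its key
my @states = sort keys %Guns;      # all key words, in alphabetical order

foreach my $state (@states) {
    print "$state => $Guns{$state}\n";
}
# prints:
# Arkansas => bazooka
# Texas => shotgun
```

Unlike an array, a hash has no built-in order, which is why the keys are sorted before printing.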
Syntax
The proper syntax is critical to get your scripts to run.
$Blah means something to PERL – if you put a space between $ and Blah PERL sees two
separate things and can’t understand either one. Therefore PERL will give you an error.
Often the error can guide you in exactly where in your program things are failing.
Any literal text that is not a variable (e.g. $word, @word, %word) or a built-in Perl term
needs to be in quotes; the quotes tell PERL to treat the word as a string rather than as a
predefined term or function.
Punctuation is also very important. A semicolon is the ‘period’ of PERL and is required
at the end of each statement. Other important punctuation marks link items that go
together:
( … ) { …. } [ … ] etc.
Functions
A function is like a verb in English: it is an action that is taken on a scalar (or on
several of them). PERL has a lot of useful built-in functions and operators. Some work on
numbers (such as +, -, *, / which add, subtract, multiply, and divide). Others work on
strings (e.g. concatenating two words).
PERL is especially good for pattern (string) matching and manipulation, e.g. searching
for a word in a long string, replacing one subsequence with another, and
transposing/swapping letters. We will learn how to do some of this in lab.
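The sketch below illustrates a few of these built-in operations. The sequence "ATGCATGC" and all variable names are hypothetical; the operators shown ( . for concatenation, =~ with /…/ for matching, s/…/…/ for substitution, and tr/…/…/ for swapping letters) are standard Perl, but this is only one possible illustration and not necessarily what will be used in lab. ‘use strict’, ‘use warnings’, and ‘my’ are additions not covered in this summary.

```perl
use strict;
use warnings;

# Numbers
my $sum     = 7 + 3;              # 10
my $product = 7 * 3;              # 21

# Strings
my $name   = "Jim" . "Bob";       # the . operator joins two strings: JimBob
my $length = length($name);       # number of characters: 6

# Pattern matching and manipulation on a hypothetical DNA sequence
my $seq   = "ATGCATGC";
my $found = 0;
if ($seq =~ /GCA/) {              # does the sequence contain GCA?
    $found = 1;
}

my $rna = $seq;
$rna =~ s/T/U/g;                  # replace every T with U: AUGCAUGC

my $complement = $seq;
$complement =~ tr/ACGT/TGCA/;     # swap A<->T and C<->G: TACGTACG

print "sum: $sum, product: $product\n";
print "name: $name ($length letters)\n";
print "found: $found, rna: $rna, complement: $complement\n";
```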
Conditional Statements
Many functions are useful if only applied under certain conditions. For example:
if ($variable eq "bazooka") {
… do something …
}
In Perl the ‘eq’ operator tests whether two strings are equal (to compare numbers, use ==
instead), { .. } specifies a group of activities/functions to be done together, and the
if (…) part stipulates that { .. } is executed ONLY IF the statement in parentheses is true.
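A runnable sketch of a conditional statement follows. The names $result and $number and the ‘else’ branch (executed when the condition is false) are additions for illustration, as are ‘use strict’, ‘use warnings’, and ‘my’.

```perl
use strict;
use warnings;

my $variable = "bazooka";
my $result;

if ($variable eq "bazooka") {      # eq compares strings
    $result = "match";
} else {                           # runs only when the test above is false
    $result = "no match";
}

my $number = 23;
my $is_big = 0;
if ($number > 10) {                # numbers are compared with ==, <, >, etc.
    $is_big = 1;
}

print "result: $result, is_big: $is_big\n";   # prints: result: match, is_big: 1
```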
Loops
Many times you want to apply the same function repeatedly. This can be done using
loops. Typically a loop is set up for a determined number of cycles at the outset:
for ($x=0; $x<5; $x++) { … do something .. }
Everything in { … } gets executed with each run through the loop. The ‘for (…)’ part
sets the loop up to run 5 times: $x starts at 0, the loop continues as long as $x is less
than 5, and $x is increased by 1 after each pass (we will discuss the setup of this in
lab).
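As a small runnable sketch of such a loop, the example below prints the value of $x on each pass and adds it to a running total. The variable $total is a hypothetical addition, as are ‘use strict’, ‘use warnings’, and ‘my’.

```perl
use strict;
use warnings;

my $total = 0;

for (my $x = 0; $x < 5; $x++) {    # $x takes the values 0, 1, 2, 3, 4
    print "pass number $x\n";
    $total = $total + $x;          # add the current value of $x to the total
}

print "total: $total\n";           # prints: total: 10
```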