PERL super-short course
Genetics 875

The following is a very brief summary of the basics of Perl. We will use the concepts listed here to write some simple Perl programs in lab. The exercise is meant to highlight the power of Perl programming and to encourage people to learn Perl in more depth. For an in-depth tutorial, the ‘Learning Perl’ books are straightforward and useful, and are available online for UW staff and students.

O’Reilly “Learning Perl”, available online through UW:
http://proquest.safaribooksonline.com/0596101058?uiCode=uwimad

Perl

Perl is a versatile programming language that can be used for many tasks. It provides a large number of built-in tools and functions, enough to write quite complicated programs. It is especially useful for manipulating data and files, searching text or sequence, etc.

Programs written in Perl are called Perl scripts; they are written by the user in the Perl language. A Perl script is read and executed by the master program PERL. The PERL program must therefore be installed on your computer before any Perl scripts can be run (it typically comes with newer versions of OS-X).

Perl is implemented as an interpreted language (as opposed to a ‘compiled’ language, which is converted ahead of time into machine instructions read directly by the CPU of your computer). This means that your scripts are interpreted and executed by the PERL program. Thus, a Perl script tends to run more slowly than an equivalent program in a compiled language, because it is one step removed from the CPU. However, the advantage of Perl is that it is especially good at manipulating, searching, reading, and reorganizing text and data files compared to many other languages. In fact, this was in large part why Perl was invented.

Perl syntax

Perl is a language not unlike English. A book has words, sentences, paragraphs, chapters, and punctuation linking them all together. Analogously, Perl uses variables, functions, loops, subroutines, modules, etc.
Like any language, Perl has a specific syntax that is very important to its interpretation by the PERL program. Spaces, capital vs. lower-case letters, and correct punctuation (brackets, parentheses, semicolons) therefore all matter.

First, some definitions:

A scalar is a single unit that can be either a number (e.g. 123) or a ‘string’ of characters (e.g. abc).

A variable is a container for a single unit/scalar; the value stored can vary (hence the name variable). The variable can have any name you like, but it is indicated to the PERL program with a $ in front. $bazooka is a variable named bazooka.

    $bazooka = 5;           # here $bazooka is set to equal 5
    $bazooka = 23;          # now $bazooka has been changed to equal 23
    $bazooka = "JimBob";    # now $bazooka holds the string 'JimBob'

Perl also has containers for multiple variables grouped together:

An array is an ordered group of variables, essentially a list. A given array can have any name you like, but it is indicated to PERL with an @ in front.

    @guns;                                      # this is an array called guns
    @guns = ("bazooka", "shotgun", "pistol");   # here the array holds 3 strings

To ‘call’ a single one of the variables in the array we use this syntax:

    $guns[1]

The $ tells PERL this is a single variable, BUT the square brackets indicate that it is really a single variable that is part of an array, stored at position 1 in the array.

A note about counting: Perl, like most programming languages, starts counting at 0 (not 1!). So in our example above:

    $guns[0] stores the string ‘bazooka’
    $guns[1] stores the string ‘shotgun’
    $guns[2] stores the string ‘pistol’

A hash is a fancier container for multiple variables. Again, you can name it whatever you want, but it is indicated to PERL with a % in front:

    %Guns;

The main difference between a hash and an array is how the values are stored. In an array, each variable is at a position in the list (0, 1, 2, etc.).
In a hash, each variable is ‘indexed’ with a key word:

    $Guns{"Arkansas"} = "bazooka";

The $ tells PERL this is a single variable, BUT the curly brackets indicate that it is really a single variable that is part of a hash; the value ‘bazooka’ is stored under the key word ‘Arkansas’. Hashes can be very useful for linking related things to one another, e.g. storing each value linked to a key word.

Syntax

The proper syntax is critical to get your scripts to run. $Blah means something specific to PERL. Small slips, such as leaving off the $, forgetting a semicolon, or changing the capitalization ($Blah and $blah are two different variables), change what PERL sees, and PERL will often give you an error or an unexpected result. Often the error message can guide you to exactly where in your program things are failing.

Any word that is meant as literal text, rather than a defined item in PERL (e.g. $word, @word, %word) or a built-in term or function, needs to be in quotes. The quotes let PERL know it does not need to recognize the word as a predefined term or function.

Punctuation is also very important. A semicolon is the ‘period’ of PERL and is required at the end of each statement. Other important punctuation marks link items that go together: ( … ) { … } [ … ] etc.

Functions

A function is like a verb in English: it is an action that is taken on a scalar. PERL has a lot of useful built-in functions and operators. Some work on numbers (such as +, -, *, /, which add, subtract, multiply, and divide). Others work on strings (e.g. concatenating two words). PERL is especially good for pattern (string) matching and manipulation, e.g. searching for a word in a long string, replacing one subsequence with another, and swapping one set of letters for another. We will learn how to do some of this in lab.

Conditional Statements

Many functions are only useful if applied under certain conditions. For example:

    if ($variable eq "bazooka") { … do something … }

In Perl, ‘eq’ tests whether two strings are equal (numbers are compared with == instead), { … } specifies a group of activities/functions to be done together, and the if (…) part stipulates that { … } is executed ONLY IF the statement in parentheses is true.

Loops

Many times you want to apply the same function repeatedly. This can be done using loops. Typically a loop is set up for a determined number of cycles at the outset:

    for ($x=0; $x<5; $x++) { … do something … }

Everything in { … } gets executed with each run through the loop. The ‘for’ statement essentially establishes that the loop will run 5 times: $x starts at 0, the loop continues while $x is less than 5, and $x goes up by 1 after each pass (we will discuss the setup of this in lab).
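
The pieces described in this handout (scalars, arrays, hashes, a conditional, and a loop) can be put together in one short script. This is only a minimal sketch, not the lab exercise itself. The ‘use strict’ and ‘use warnings’ lines and the ‘my’ keyword are additions that this handout does not cover; they ask PERL to report common mistakes and to have each variable declared before it is used. The ‘Texas’ entry and the names $second, $matched, and $count are made up for illustration.

```perl
#!/usr/bin/perl
use strict;      # assumption: not covered in the handout; catches undeclared variables
use warnings;    # assumption: not covered in the handout; reports common mistakes

# A scalar variable holds one number or one string, and its value can change.
my $bazooka = 5;
$bazooka = "JimBob";

# An array is an ordered list; positions start at 0.
my @guns = ("bazooka", "shotgun", "pistol");
my $second = $guns[1];             # the element at position 1 ("shotgun")

# A hash stores each value under a key word.
my %Guns;
$Guns{"Arkansas"} = "bazooka";
$Guns{"Texas"}    = "shotgun";     # hypothetical extra entry for illustration

# Conditional: the block runs only if the string comparison is true.
my $matched = 0;
if ($Guns{"Arkansas"} eq "bazooka") {
    $matched = 1;
}

# Loop: the block runs 5 times, with $x going 0, 1, 2, 3, 4.
my $count = 0;
for (my $x = 0; $x < 5; $x++) {
    $count = $count + 1;
}

# Loop over the array, printing each position and the string stored there.
for (my $i = 0; $i < scalar(@guns); $i++) {
    print $i, " ", $guns[$i], "\n";
}

print "second item: ", $second, "\n";
print "matched: ", $matched, "\n";
print "loop ran ", $count, " times\n";
```

If the script is saved under a name of your choice, for example sketch.pl, it can be run from a terminal with: perl sketch.pl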