A Crash Course on
Perl programming
Presented by:
Shailender Nagpal, Al Ritacco
Research Computing
UMASS Medical School
Information Services, 09/17/2012
AGENDA
Perl Basics: Scalars, Arrays, Expressions, Printing
Built-in functions, Blocks, Branching, Loops
Hash arrays, String and Array operations
File reading and writing
Writing custom functions
Regular expressions: Find/replace/count
Providing input to programs
Using Perl scripts with the LSF cluster
What is Perl?
• Perl is a high-level, general-purpose, interpreted,
dynamic programming language
• Provides a simple iterative, top-down, left-to-right
programming environment for users to create small
and largish programs
• Originally developed by Larry Wall in 1987, while working at Unisys
• “Perl” is not an official acronym, but it is often expanded as
– Practical Extraction and Report Language
Features of Perl
• Perl code is portable between Linux, Mac, Windows
• Easy to use and lots of resources (CPAN) are available
• Procedural programming, not strongly “typed”
• Similar programming syntax as other languages
– if, if-else, while, for, functions, classes, etc.
• Provides several methods to manipulate data
– Arrays, hash arrays, array of arrays, hash of hashes
– What does this mean?
• Well, we can store things easily and compare easily
Advantages of Perl
• Perl is a general-purpose programming language like
C, Java, etc., but it is “higher” level, which makes it
advantageous in certain applications such as
Bioinformatics
– Fewer lines of code than C, Java
– No compilation necessary. Prototype and run!
– Vast function library geared towards scientific computing
– Save coding time and automate computing tasks
– Intuitive. Code is concise, but human readable
• CPAN is a vast repository of reusable Perl modules
First Perl program
• The obligatory “Hello World” program
#!/usr/bin/perl
# Comment: 1st program: variable, print
$name = "World";
print "Hello $name\n";
• Save these lines of text as a text file with the “.pl”
extension, then at the command prompt (linux):
$ perl hello.pl
Hello World
Understanding the code
• The first line of a Perl script usually gives the interpreter
location, which is the path to the Perl executable
#!/path/to/perl
– This line lets the script be run directly; it is optional when
the script is started with “perl hello.pl”
• 2nd line: A comment, beginning with “#”
• 3rd line: Assignment of a string to a scalar variable
• 4th line: Printing some text to the shell with a
variable, whose value is interpolated
• The quotes are not printed, and $name is replaced by
“World” in the output. All statements end in “;”
Second program
• Report summary statistics of DNA sequence
#!/usr/bin/perl
$DNA = "ATAGCAGATAGCAGACGACGAGA";
print "Length of DNA is ".length($DNA)."\n";
print "Number of A bases is ".($DNA=~tr/A//)."\n";
print "Number of C bases is ".($DNA=~tr/C//)."\n";
print "Number of G bases is ".($DNA=~tr/G//)."\n";
print "Number of T bases is ".($DNA=~tr/T//)."\n";
print "Number of G+C bases is ".($DNA=~tr/GC//)."\n";
# Count the matches in list context; s/GC//g would delete them from $DNA
$gc_pairs = () = $DNA=~/GC/g;
print "Number of GC dinucleotides is $gc_pairs\n";
print "G+C percent content is ".(($DNA=~tr/GC//)/length($DNA)*100)."\n";
• In about a dozen lines of code, we can summarize our data!
• Can re-use this code to find motifs, RE sites, etc.
Perl Comments
• Use “#” character at beginning of line for adding
comments into your code
• Helps you and others to understand your thought
process
• Let's say you intend to sum up a list of numbers
\sum_{x=1}^{100} x
(sum from 1 to 100 of x)
• The code would look like this:
$sum = 0;                        # Initialize variable called "sum" to 0
for($x=1; $x<=100; $x++) {       # Use "for" loop to iterate over 1 to 100
    $sum=$sum+$x;                # Add $x to the previous sum
}
print "The sum of 1..100 is $sum\n";   # Report the result ($x is 101 after the loop)
Perl Variables
• Variables
– Provide a location to “store” data we are interested in
• Strings, decimals, integers, characters, lists, …
– What is a character – a single letter, digit or symbol
– What is a string – a sequence of characters
– What is an integer – a whole number such as 4 (a number with a
decimal point, such as 4.7, is a real or floating-point number)
• Variables can be assigned or changed easily within a
perl script
Variables and built-in keywords
• Variable names should represent or describe the
data they contain
– Do not use meta-characters; stick to letters, digits and
underscores. Begin the variable name with a letter
• Perl as a language has keywords that are best not
used as variable names. They are reserved for writing
syntax and logical flow of the program
– Examples include: my, if, elsif, else, for, foreach, while, do,
unless, until, last, next, sub
Scalar and Array variables
• Variables preceded by a “$” sign are called “Scalar”
variables. They hold a single value – could be a
number or string, etc.
$score = 5.3;
$dna = "ATAGGATAGCGA";
$name = "Shailender";
• Variables preceded by a “@” sign are called “Array”
variables. They hold a list of values – could be a list of
students in a class, scores from a test, etc
@students = ("Alan", "Shailender", "Chris");
@scores = (89.1, 65.9, 92.4);
@binding_pos = (9439984, 114028942);
Printing text
• Using double " " and single quotes ' '
• Double quotes process all items within them
– Ex: print "This \t is a test\nwith text"
– "\t" is a tab character. "\n" is a newline character.
– Output:
This      is a test
with text
• Single quotes are not processed at all. So all items
are treated as actual text.
– Ex: print 'This \t is a test\nwith text'
– Output: This \t is a test\nwith text
Printing scalar variables
• Scalar variables can be printed easily within double
quotes following a print statement. Variable names
are “interpolated”, printing the values they contain
$x = 5;
$name = "John";
print "$name has $x dollars\n";
• If you run this as a program, you get this output
John has 5 dollars
Printing array variables
• Array variables can also be printed as a list with a
default delimiter, but another way to print arrays is
put them in a loop and print them as scalars
@students = ("Alan", "Shailender", "Chris");
print "@students\n";                  # Method 1
foreach $name (@students)             # Method 2
{ print "Student name is $name\n" }
• If you run this as a program, you get this output:
Alan Shailender Chris          (Method 1)
Student name is Alan           (Method 2)
Student name is Shailender     (Method 2)
Student name is Chris          (Method 2)
Math Operators and Expressions
• Math operators
– Eg: 3 + 2
– + is the operator
– We read this left to right
– Basic operators such as + - / * and ** (exponentiation; in Perl, ^ is bitwise XOR, not a power operator)
– Variables can be used
print "Sum of 2 and 3 is ".(2+3)."\n";
$x = 3;
print "Sum of 2 and x is ".(2+$x)."\n";
• PEMDAS rules are followed to build mathematical
expressions. Built-in math functions can be used
Mathematical operations
• $x=3;
• $y=5;
• $z=$y+$x;
– Is this the same: $z=$x+$y ?
– Yes, the result is the same; only the order in which the operands are read differs
• $x=$x*$z;
• $y=$y+1;
• $y++;
Perl built-in functions
• The perl language comes with many built-in
functions that can be used with variables to produce
summary output, eg,
– length: return length of a string
– substr: return sub-string from a string
– uc: convert string to upper case
– reverse: reverse the contents of a list
– sort: Sort a list of numbers or strings
– pop: Return the last element of an array and remove it
• Many mathematical functions are also available
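A short sketch of the built-in functions listed above. The sample string and scores are hypothetical values chosen for illustration, not data from the course files.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration
my $dna    = "atagcagat";
my @scores = (89.1, 65.9, 92.4);

my $len      = length($dna);                # number of characters in the string
my $start    = substr($dna, 0, 4);          # 4 characters starting at index 0
my $upper    = uc($dna);                    # upper-case copy of the string
my @sorted   = sort { $a <=> $b } @scores;  # numeric sort, ascending
my @reversed = reverse(@sorted);            # same values, descending
my $last     = pop(@scores);                # removes and returns the last element

print "Length: $len\n";
print "First four bases: $start\n";
print "Upper case: $upper\n";
print "Sorted: @sorted\n";
print "Reversed: @reversed\n";
print "Popped $last, ", scalar(@scores), " scores remain\n";
```

Note that sort compares values as strings by default; the { $a <=> $b } block asks for a numeric comparison.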
Array Indexing
• Arrays can be indexed by number to retrieve individual
elements (scalars)
• Indexes have range 0 to (n-1), where 0 is the index of the
first element and n-1 is the last item’s index
@array=("A","C","G","T","U");
@nucleotides=("adenine", "cytosine", "guanine",
"thymine", "uracil");
$array[0] is equal to A
$nucleotides[3] is equal to thymine
$nucleotides[4] is equal to what?
@array[0..2] is equal to what?
• Any element of an array can be re-assigned
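The indexing rules above can be tried out with a small sketch. The @codons array is a hypothetical example, used here so that the quiz questions on this slide stay open.

```perl
use strict;
use warnings;

# Hypothetical array, not one of the arrays from the slide
my @codons = ("ATG", "GCC", "TAA", "TGA");

my $first      = $codons[0];       # first element, index 0
my $last       = $codons[-1];      # negative indexes count from the end
my $last_index = $#codons;         # index of the last element (n-1)
my $count      = scalar(@codons);  # number of elements (n)
my @middle     = @codons[1..2];    # slice: a list of the elements at indexes 1 and 2

$codons[1] = "GCA";                # re-assign a single element

print "First: $first, last: $last\n";
print "Last index: $last_index, count: $count\n";
print "Middle slice: @middle\n";
print "After re-assignment: @codons\n";
```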
Array Operations
• Perl arrays are dynamic, they assume
whatever values or size needed
– Can dynamically lengthen or shorten arrays
– May be defined, but empty
– No predefined size or "out of bounds" error
– unshift and shift add to and remove from the
front
– push and pop add to and remove from the end
Array operations
my @fruits;                              # Declared, but empty
@fruits = qw(apples bananas cherries);   # Assigned
@fruits = (@fruits, "dates");            # Lengthen
@fruits = ();                            # Empty
unshift @fruits, "acorn";                # Add an item to the front
my $nut = shift @fruits;                 # Remove from the front
print "Well, a squirrel would think a $nut was a fruit!\n";
push @fruits, "mango";                   # Add an item to the end
my $food = pop @fruits;                  # Remove from the end
print "My, that was a yummy $food!\n";
Output:
Well, a squirrel would think a acorn was a fruit!
My, that was a yummy mango!
Array operations
• A slice of an array (sub-array) is itself a list of values
– Take an array slice with @array[@indices]
– $array[0] is a scalar, that is, a single value - item
– @array[0] is a slice containing a single scalar
– Scalars always begin with $ - single item
– Arrays and slices always begin with @ - array itself
String Operations: Split
• Perl is very useful in handling string variables
• The “split” command allows users to search for patterns in a
string and use them as a delimiter to break the string apart
• For example, to extract the words in a sentence, we use the
space delimiter to capture the words
@words = split(/ /, "This is a sentence");
• Now @words contains the words
$words[0] = "This"
$words[1] = "is"
$words[2] = "a"
$words[3] = "sentence"
String Operations: Split (…contd)
• Another way to assign the results of a split command is to
anticipate how many scalar variables will be created
($first, undef, $third, $fourth) = split(/ /,
"This is a sentence");
• In this case, we anticipate 4 values to be returned by the split
command but we choose to ignore the second one, which we
assign to “undef”. Only 3 variables will get created
String Operations: Join
• The join command concatenates multiple variables in the order
they are entered into the join command
– A delimiter must be specified, such as space, tab, comma, etc
– Empty delimiter “” results in full concatenation
• Syntax:
@names = ("Joe", "John", "David");
$new_string = join(":", @names, "Paul", "Debbie");
• The value of $new_string shall be
"Joe:John:David:Paul:Debbie"
• Try out:
perl split_and_join.pl
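Split and join are often used together to re-delimit a record. The following is a minimal sketch with a hypothetical colon-delimited record; it is not the contents of split_and_join.pl.

```perl
use strict;
use warnings;

# Hypothetical colon-delimited record
my $record = "Smith:Al:18";

my @fields    = split(/:/, $record);       # ("Smith", "Al", "18")
my $tabbed    = join("\t", @fields);       # same fields, tab-delimited
my $full_name = join(" ", @fields[1, 0]);  # slice picks first name, then last name

print "Number of fields: ", scalar(@fields), "\n";
print "Tab-delimited: $tabbed\n";
print "Full name: $full_name\n";
```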
String operations: substr
• Perl allows the extraction of a substring from a scalar string
variable
• Syntax:
$part_string = substr($fullstring, $start,
$length);
$part_string = substr("This is a full string",
10, 4);
• The $part_string value is "full"
• Remember that indexing within string variables begins with 0,
just like with arrays, but a string cannot be indexed or sliced
with [ ] like an array; use substr instead
Iterating over Arrays with “foreach”
• Ok, so we have these arrays, but how do we work
with each element automatically?
– How can we iterate over them and perform the same
operation to each element?
• We use looping logic to work with the arrays
• We use Perl’s “for”, more specifically foreach
foreach $named_item (@array) {
$named_item = <some expression>;
}
Perl Arrays, foreach cont.
• Example:
my @nucleotides=("adenine", "cytosine", "guanine",
"thymine", "uracil");
foreach $nt (@nucleotides)
{
print "Nucleotide is: $nt\n";
}
Output:
Nucleotide is: adenine
Nucleotide is: cytosine
Nucleotide is: guanine
Nucleotide is: thymine
Nucleotide is: uracil
Iterating over Arrays with “for”
• Example:
my @nucleotides=("adenine", "cytosine", "guanine",
"thymine", "uracil");
for($i=0;$i<scalar(@nucleotides);$i++)   # scalar(@array) is the element count; length() is for strings
{
print "Nucleotide is: $nucleotides[$i]\n";
}
Output:
Nucleotide is: adenine
Nucleotide is: cytosine
Nucleotide is: guanine
Nucleotide is: thymine
Nucleotide is: uracil
Iterating over Arrays with “while”
• Example:
@nucleotides=("adenine", "cytosine", "guanine",
"thymine", "uracil");
$i = 0;
while($i<scalar(@nucleotides)) {   # scalar(@array) is the element count
print "Nucleotide is: $nucleotides[$i]\n";
$i++; }
• Output:
Nucleotide is: adenine
Nucleotide is: cytosine
Nucleotide is: guanine
Nucleotide is: thymine
Nucleotide is: uracil
The infamous “$_”
• $_ is Perl's default variable: many loops and built-in
functions read from or assign to it when no variable is named
• Example
@a= (1, 1, 2, 3, 5, 8, 11);
foreach (@a) {
print $_;      # "print" without $_ will work too
print " ";
}
}
• Output:
1 1 2 3 5 8 11
Boolean Operations
• Boolean operators provide Boolean context
• Many types of operators are provided
– Relational (<, >, lt, gt)
– Equality (==, !=, eq, ne)
– Logical (high precedence) (&&, ||, !)
– Logical (low precedence) (and, or, not)
– Conditional (?:)
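The difference between the numeric operators (<, >, ==) and the string operators (lt, gt, eq) is easy to miss. A small sketch with hypothetical values:

```perl
use strict;
use warnings;

# Hypothetical values chosen to show where numeric and string comparison disagree
my ($x, $y) = (10, 9);

my $numeric = ($x > $y)  ? "yes" : "no";   # numeric comparison: 10 is greater than 9
my $stringy = ($x gt $y) ? "yes" : "no";   # string comparison: "1" sorts before "9"
my $both    = ($x > 5 && $y > 5) ? "yes" : "no";   # logical AND of two tests
my $same    = ("ACGT" eq "ACGT") ? "yes" : "no";   # string equality

print "10 >  9 : $numeric\n";
print "10 gt 9 : $stringy\n";
print "both greater than 5: $both\n";
print "strings equal: $same\n";
```

Use the symbolic operators for numbers and the word operators for strings such as sequences or names.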
Command blocks in Perl
• A group of statements surrounded by braces {}
• Creates a new scope for statements and variables
• Starts with “{“
• Ends with “}”
• Ex:
{
print "Test\n";
}
Conditional operations with “if-else”
• If-else syntax allows programmers to introduce
logic in their programs (Perl has no “then” keyword; the block in braces plays that role)
• Blocks of code can be branched to execute only
when certain conditions are met
if(condition1 is true)
{ <execute these statements if condition1 is true>; }
else
{ <execute these statements if condition1 is false>; }
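A concrete sketch of the template above, including an elsif branch. The 40% and 60% cut-offs and the labels are hypothetical and only serve the example.

```perl
use strict;
use warnings;

my $dna     = "ATAGCAGATAGCAGACGACGAGA";
my $gc      = ($dna =~ tr/GC//);           # count of G and C bases
my $percent = 100 * $gc / length($dna);    # G+C percent content

# Hypothetical cut-offs, used only to illustrate branching
my $label;
if ($percent > 60) {
    $label = "GC-rich";
}
elsif ($percent < 40) {
    $label = "AT-rich";
}
else {
    $label = "balanced";
}

printf "G+C content is %.1f%%, sequence is %s\n", $percent, $label;
```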
Perl nested blocks
• Blocks within blocks
{
    if ($x>1) {
        if ($y>2) {
            print("y>2\n");
        }
        print("x>1\n");
    }
}
Hash Arrays
• What is a Hash Array?
– Associative arrays, also frequently called hashes, are the
third major data type in Perl after scalars and arrays.
• Hashes work very similarly to a common data
structure that programmers use in other languages: hash tables
• Hashes in Perl are a data type supported directly
by the language
• All Perl hash variable names are prefixed with a %
symbol, and can hold key-value pairs
Hash Arrays (…contd)
• Example data pairs that are suited to be stored in a
hash array (as opposed to storing them in 2 separate
arrays)
– Words (key) and their meanings (value)
– Gene symbols (key) and their full names (value)
– Country names (key) and their capitals/ currencies (value)
• Accessing a hash works much like an array: instead
of a numeric subscript in [ ], you provide a key in { } to retrieve the value
• Looking up an item by key is faster than searching through an
array
Hash Arrays, cont.
• Example
Key => Value
%wheels = (
unicycle => 1,
bike => 2,
tricycle => 3,
car => 4,
semi => 18
);
• To print the number of wheels in a car, do
print "A car has $wheels{'car'} wheels\n";
Hash Arrays, cont.
• Other ways to assign Hash Arrays:
%dessert = ("pie", "apple", "cake", "carrot", "sorbet",
"orange");                                     # Method 1
%dessert = (pie => "apple", cake => "carrot",
sorbet => "orange");                           # Method 2
$dessert{"pie"} = "apple";                     # Method 3
$dessert{"cake"} = "carrot";                   # Method 3
$dessert{"sorbet"} = "orange";                 # Method 3
print "I would like $dessert{pie} pie.\n";
Output:
I would like apple pie.
Hash Array operations
• Certain key-value pairs can be deleted
print "Before deleting, the Unicycle has $wheels{'unicycle'} wheels\n";
delete $wheels{'unicycle'}; # delete unicycle entry
print "After deleting, the Unicycle has $wheels{'unicycle'} wheels\n";
Output (the deleted value is undefined, so it prints as an empty string):
Before deleting, the Unicycle has 1 wheels
After deleting, the Unicycle has  wheels
• Keys and values of a hash array can be retrieved
@vehicles = keys %wheels;
@wheel_nums = values %wheels;
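Because keys are returned in no particular order, a report usually sorts them first, and exists is a safer test than printing a deleted value. A minimal sketch reusing the %wheels data from the earlier slide:

```perl
use strict;
use warnings;

my %wheels = (unicycle => 1, bike => 2, tricycle => 3, car => 4, semi => 18);

delete $wheels{unicycle};                  # remove one key-value pair

# exists tests for a key without touching its value
my $has_unicycle = (exists $wheels{unicycle}) ? "yes" : "no";

my @vehicles = sort keys %wheels;          # keys in alphabetical order
foreach my $vehicle (@vehicles) {
    print "$vehicle has $wheels{$vehicle} wheels\n";
}
print "Unicycle still present: $has_unicycle\n";
```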
Hash Array iteration
• Easy to iterate
– “each” returns key/value pairs in no particular order
– while loop can iterate over entire hash
– we can create pairs such as:
• ($vehicle, $count) = each %wheels
– when called on a hash in list context, returns a 2-element
list consisting of the key and value for the
next element of a hash
Hash Array iteration (...contd)
my %sounds = (cow => "moooo", duck => "quack",
horse => "whinny", sheep => "baa", hen => "cluck",
pig => "oink");
my @barnyard_sounds = @sounds{"horse","hen","pig"};   # Hash slice
while (my ($animal, $noise) = each %sounds) {
print "Old MacDonald had a $animal.";
print " With a $noise! $noise! here...\n";
}
Output (one line per animal, order varies; first lines of one run shown):
Old MacDonald had a hen. With a cluck! cluck! here...
Old MacDonald had a cow. With a moooo! moooo! here...
Old MacDonald had a sheep. With a baa! baa! here...
Numerical Operators (more)
• We have shortcut operators for common operations,
e.g. $x += 1 instead of $x = $x + 1, and $x /= 2 instead
of $x = $x / 2
• Numeric operators provide numeric context
• All common operators are provided
– Increment and decrement (++, --)
– Arithmetic (+, -, *, /)
– Assignment (+=, *=)
– Bitwise Shifts (<<, >>)
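The shortcut operators can be traced step by step. A small sketch with a hypothetical starting value; the comments give the value of $x after each statement.

```perl
use strict;
use warnings;

my $x = 4;      # hypothetical starting value
$x++;           # increment:        5
$x += 10;       # add and assign:   15
$x *= 2;        # multiply, assign: 30
$x -= 6;        # subtract, assign: 24
$x /= 4;        # divide, assign:   6

my $shifted = 1 << 3;    # bitwise left shift by 3 bits
my $power   = 2 ** 10;   # exponentiation

print "x is $x\n";
print "1 << 3 is $shifted\n";
print "2 ** 10 is $power\n";
```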
Perl File access
• What is file access?
– set of Perl commands/syntax to work with data files
• Why do we need it?
– Makes reading data from files easy, we can also create new
data files
• What different types are there?
– Read, write, append
Perl File access
• Access to files is similar to shell redirection
– open() allows access to the file
– Redirect characters (<, >) define access type
– Can read, write, append, read & write, etc.
– Filehandle refers to opened file
– close() stops access to the file
– $! contains IO error messages (similar to $?)
Perl File access
• Example
# Open file for reading
open INPUT, "< datafile" or die "Can't open input file: $!";
# Open file for writing
open OUTPUT, "> outfile" or die "Can't open output file: $!";
# Append
open LOG, ">> logfile" or die "Can't open log file: $!";
# Read and write
open RWFILE, "+< myfile" or die "Can't open file: $!";
# Close a file when done
close INPUT;
Perl Files access (reading)
• Reading from files
– Input operator <> reads one line from the file,
including the newline character
– chomp will remove newline if you want
– Can modify the input record separator $/ to read
characters, words, paragraphs, records, etc.
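The points above can be tried without creating a file. This sketch assumes Perl 5.8 or later, where open can read from a string (an in-memory file); it also uses the three-argument form of open with a lexical filehandle, a common alternative to the bareword handles on the slides. The data is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data held in a string instead of a file on disk
my $data = "Smith:Al:18\nJones:Bo:42\n";

# Read line by line; chomp strips the trailing newline
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my @lastnames;
while (my $line = <$fh>) {
    chomp $line;
    my ($lastname) = split(/:/, $line);   # keep only the first field
    push @lastnames, $lastname;
}
close $fh;

# With $/ undefined, a single read returns the whole file
open($fh, '<', \$data) or die "Can't open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

print "Last names: @lastnames\n";
print "Whole file is ", length($whole), " characters long\n";
```

Using local limits the change to $/ to the enclosing block, so later reads go back to line-by-line behaviour.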
Perl File access (reading)
• Reading from files
– Easy to loop over entire file
– Loops will assign to $_ by default
– Be sure that the file is open for reading first
• Input file:
Lastname:Firstname:Age:Address:Apartment:City:State:ZIP
Smith:Al:18:123 Apple St.:Apt.#1:Cambridge:MA:02139
Perl File access Example
• Example
open(CUSTOMERS, "< mailing_list") or die "Can't open input file: $!";
my $header = <CUSTOMERS>;                 # Skip the header line
while (my $line = <CUSTOMERS>) {
chomp $line;                              # Remove the trailing newline
my @fields = split(":", $line);           # Fields separated by colons
print "$fields[1] $fields[0]\n";          # Display selected fields
print "$fields[3], $fields[4]\n";
print "$fields[5], $fields[6] $fields[7]\n";
}
close CUSTOMERS;
• Output:
Al Smith
123 Apple St., Apt.#1
Cambridge, MA 02139
Perl File access writing
• Writing to files
– print writes to a file
– print writes to a STDOUT by default
– Be sure that the file is open for writing first
• Check for errors along the way
Perl File access writing
• Example writing to a file
# Read file
open CUSTOMERS, "< mailing_list" or die "Can't open input file: $!";
# Output file
open LABELS, "> labels" or die "Can't open output file: $!";
my $header = <CUSTOMERS>;      # Skip the header line
while (my $line = <CUSTOMERS>) {
chomp $line;                   # Remove the newline so it is not written twice
my @fields = split(":", $line);
print LABELS "$fields[1] $fields[0]\n";
print LABELS "$fields[3], $fields[4]\n";
print LABELS "$fields[5], $fields[6] $fields[7]\n";
}
close CUSTOMERS;
close LABELS;                  # Flush and close the output file
Subroutines/ Functions
• What is a subroutine?
– group related statements into a single task
– segment code into logical blocks
– avoid code and variable name collisions
– can be “called” by segments of other code
• Perl allows
– both declared and anonymous subs
– various ways of handling arguments
– various ways of calling subs
Perl Subroutines
• Subroutines return values
– Explicitly with the return command
– Implicitly as the value of the last executed statement
• Return values can be a scalar or a flat list
• One can pass arguments to subs
• Arguments are passed into the @_ array
– @_ is the "fill in the blanks" array
– Elements of @_ are aliases to the caller's variables; copying @_ into
local (my) variables gives pass-by-value behaviour
Perl Subroutines
sub add_one {
my ($n) = @_;            # Copy first argument
return ($n + 1);         # Return 1 more than input
}
my ($a, $b) = (10, 0);
add_one($a);             # Return value is lost, nothing changes
$b = add_one($a);        # $a is 10, $b is 11
Perl Subroutines
• Subroutine calls usually have arguments in
parentheses
– Parentheses are not needed if sub is declared first
– But using parentheses is often good style
• Subroutine calls may be recursive
• Subroutines are another data type
– Name may be preceded by an & character
– & is not needed when calling subs
Perl Subroutines
• Example function
– fib(n) = fib(n-1) + fib(n-2), with fib(1) = fib(2) = 1
• Compute
– 1, 1, 2, 3, 5, 8, 13, …
sub fibonacci {
my ($n) = @_;      # "my" keeps $n private to each recursive call
die "Number must be positive" if $n <= 0;
return 1 if $n <= 2;
return (fibonacci($n-1) + fibonacci($n-2));
}
foreach my $i (1..5) {
my $fib = fibonacci($i);
print "fibonacci($i) is $fib\n";
}
• Example Output:
fibonacci(1) is 1
fibonacci(2) is 1
fibonacci(3) is 2
fibonacci(4) is 3
fibonacci(5) is 5
Regular Expressions
• There are three pattern-based operators
– Matching: returns true or false depending on whether a pattern is found in the string
$search_string =~ m/pattern/
– Substitution: matches a pattern, then substitutes it with
another. Returns count of substitutions
$search_string =~ s/search_pattern/substitute_pattern/;
– Transliteration: matches single characters from a list (not a regular
expression) and translates each into the corresponding character.
Returns count of characters matched
$search_string =~ tr/search_list/replacement_list/;
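The three operators (find, replace, count) can be compared side by side. A sketch with a hypothetical sequence that contains two GAATTC sites:

```perl
use strict;
use warnings;

# Hypothetical sequence, made up for this example
my $dna = "GAATTCATAGGAATTCCA";

# Matching: true or false
my $has_site = ($dna =~ m/GAATTC/) ? 1 : 0;

# Counting matches: a global match in list context, then count the list
my $site_count = () = $dna =~ /GAATTC/g;

# Substitution on a copy; s///g returns the number of replacements
my $rna  = $dna;
my $subs = ($rna =~ s/T/U/g);

# Transliteration with an empty replacement list only counts characters
my $t_count = ($dna =~ tr/T//);

print "Site found: $has_site, number of sites: $site_count\n";
print "RNA: $rna ($subs substitutions)\n";
print "Number of T bases: $t_count\n";
```

The original $dna is unchanged at the end, because the substitution was applied to the copy in $rna.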
Regular expressions (…contd)
• RE's are useful to Biologists looking for “patterns” in
DNA and protein sequences such as restriction
enzyme sites, motifs and single AA's or DNA bases
• Transcribe a DNA sequence to RNA
$dna = "ATTAGGACGAAGATTGA";
($rna = $dna) =~ s/T/U/g;      # Work on a copy so $dna stays intact
print "$rna\n";
• Obtain the reverse complement of DNA sequence
($comp = $dna) =~ tr/ATCG/TAGC/;
print "Reverse complement is ".scalar(reverse($comp))."\n";
Providing input to programs
• Perl programs can certainly have variables containing
parameter data, but it is sometimes convenient not
to have to edit a program to change some of the data
• There are 2 ways of doing this
– Reading data from shell directly into program variables by
requesting keyboard input
– Command line arguments
Requesting keyboard input
• Example
print "What type of pet do you have? ";
my $pet = <STDIN>;        # Read a line from STDIN
chomp $pet;               # Remove newline
print "Enter your pet's name: ";
my $name = <>;            # With no command line arguments, <> also reads STDIN
chomp $name;
print "Your pet $pet is named $name.\n";
• Output:
What type of pet do you have? parrot
Enter your pet's name: Polly
Your pet parrot is named Polly.
Command Line Arguments
• Command line arguments are optional values that
can be passed as input to the perl program
– After the name of the program, string or numeric values
are placed, with spaces separating them
– These values can be accessed by the @ARGV array variable
inside the program
• Examples:
perl arguments.pl arg1 arg2 10 20
• Why do you need Command Line Arguments?
– Specify inputs at runtime without re-editing the program
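A minimal sketch of reading @ARGV. The parse_args helper, the argument meanings and the default of 10 are hypothetical, chosen for illustration; they are not the contents of arguments.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: expects a file name and an optional minimum length
sub parse_args {
    my @args = @_;
    die "Usage: perl arguments.pl <file> [min_length]\n" unless @args >= 1;
    my $file       = $args[0];
    my $min_length = defined $args[1] ? $args[1] : 10;   # assumed default
    return ($file, $min_length);
}

# @ARGV holds the values typed after the program name
if (@ARGV) {
    my ($file, $min_length) = parse_args(@ARGV);
    print "Input file: $file\n";
    print "Minimum length: $min_length\n";
}
```

Run as, for example, "perl arguments.pl seqs.fa 20"; the file name seqs.fa is only a placeholder.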
Using Perl programs on the cluster
• Perl scripts can easily be submitted as jobs to be run
on the MGHPCC infrastructure
• Basic understanding of Linux commands is required,
and an account on the cluster
• Lots of useful information, including account registration, at
www.umassrc.org
• Feel free to reach out to Research Computing for
help
hpcc-support@umassmed.edu
What is a computing “Job”?
• A computing “job” is an instruction to the HPC
system to execute a command or script
– Simple linux commands or Perl/Python/R scripts that can
be executed within milliseconds would probably not qualify
to be submitted as a “job”
– Any command that is expected to take up a big portion of
CPU or memory for more than a few seconds on a node
would qualify to be submitted as a “job”. Why? (Hint:
multi-user environment)
How to submit a “job”
• The basic syntax is:
bsub <valid linux command>
• bsub: LSF command for submitting a job
• Let's say a user wants to execute a Perl script. On
a linux PC, the command is
perl countDNA.pl
• To submit a job to do the work, do
bsub perl countDNA.pl
Specifying more “job” options
• Jobs can be marked with options for better job
tracking and resource management
– Job should be submitted with parameters such as queue
name, estimated runtime, job name, memory required,
output and error files, etc.
• These can be passed on in the bsub command
bsub -q short -W 1:00 -R "rusage[mem=2048]" -J "Myjob" -o hpc.out -e hpc.err perl countDNA.pl
Job submission “options”
Option flag   Description
-q            Name of queue to use. On our systems, possible values are “short” (<=4 hrs execution time), “long” and “interactive”
-W            Allocation of node time. Specify hours and minutes as HH:MM
-J            Job name. Eg. “Myjob”
-o            Output file. Eg. “hpc.out”
-e            Error file. Eg. “hpc.err”
-R            Resources requested from assigned node. Eg. “-R rusage[mem=1024]”, “-R span[hosts=1]”
-n            Number of cores to use on assigned node. Eg. “-n 8”
Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help MGHPCC staff determine when new resources are
needed
A bioinformatics demo
• Log on to the UMass server using PuTTY on Windows or
Terminal on Mac
• Request an interactive shell session on one of the
compute nodes for this demo
$ bsub -q interactive -W 4:00 -Is bash
• Navigate to the training directory or copy the
examples to your local directory
$ cd /project/umw_rcs/training/perl
OR
$ cp /project/umw_rcs/training/perl/* .
A bioinformatics demo (…contd)
• Let's say we have microRNA data across 3 different
files
– microRNA sequence FASTA file
– Data file containing abundance of microRNA in sample
– Annotation file containing targets of microRNA
• Our goal in this exercise is to bring all of this data
together into a single report or table, along with
some analysis
Questions?
• How can we help further?
• Please check out books we recommend as well as
web references (next 2 slides)
Perl Books
• Perl books which may be helpful
– Beginning Perl for Bioinformatics
http://shop.oreilly.com/product/9780596000806.do
– Mastering Perl for Bioinformatics
http://shop.oreilly.com/product/9780596003074.do
– Perl Cookbook
http://shop.oreilly.com/product/9780596003135.do
– Programming Perl – The Reference
http://shop.oreilly.com/product/9780596004927.do
Perl References
• http://en.wikipedia.org/wiki/Perl
• https://docs.google.com/viewer?url=http://blob.perl.org/books/beginning-perl/3145_Chap01.pdf
• http://www.ebb.org/PickingUpPerl/pickingUpPerl_6.html
• http://sipb.mit.edu/iap/perl/slides/slides.html
• http://perldoc.perl.org/
• http://www.tutorialspoint.com/perl/perl_oo_perl.htm