Computer Programming for Biologists
Class 9
Dec 4th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25
Overview
• mock exam
• revision
• variable scope
• extensions
• file handles
Mock Exam
http://bioinf.gen.tcd.ie/GE3M25/programming/exam
Revision - Subroutines
my $prot = &translate($seq);    # call
sub translate {                 # definition
    my $seq = shift @_;         # parameters
    …
    return $prot;               # return value(s)
}
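The three parts above (call, parameters, return value) can be tried out with a small self-contained script. This is only a sketch: revcomp is a hypothetical stand-in, since the body of translate is not shown on the slide.

```perl
use strict;
use warnings;

# Hypothetical example subroutine (not the course's translate):
# returns the reverse complement of a DNA string.
sub revcomp {
    my $seq = shift @_;              # parameter: first element of @_
    my $rc  = scalar reverse $seq;   # reverse the string
    $rc =~ tr/ACGT/TGCA/;            # complement each base
    return $rc;                      # return value
}

my $result = &revcomp('ATGC');       # call
print "$result\n";                   # prints GCAT
```

The variable $seq exists only inside the subroutine; the main part sees nothing but the returned value.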
scope
The area of the script in which a variable is visible.
Different blocks defined by:
• main program
• subroutines
• loops
• branches
→ different namespaces
scope
# main part
my $global_1;
…
while (my $input = <>) {
    statement1;
}
if (condition) {
    my $local_1 = 'xxx';
}
# subroutine
sub subroutine {
    my $local_1;
    foreach my $nuc (@bases) {
        statement2;
    }
}
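A runnable sketch of the same idea: each block (main part, subroutine, loop, branch) gets its own variable, even though all of them share one name. The names $label, @seen and show_scope are made up for this illustration.

```perl
use strict;
use warnings;

my $label = 'main';                  # visible from here to the end of the file

sub show_scope {
    my $label = 'sub';               # separate variable, hides the outer one
    return $label;
}

my @seen;
foreach my $i (1 .. 2) {             # $i exists only inside the loop
    my $label = "loop $i";           # new variable on each pass
    push @seen, $label;
}

if (1) {
    my $label = 'branch';            # exists only inside this branch
    push @seen, $label;
}

push @seen, show_scope(), $label;    # the outer $label is still 'main'
print join(', ', @seen), "\n";       # loop 1, loop 2, branch, sub, main
```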
scope
Tip: Keep variables local to subroutines
→ explicitly pass content between main part and subs, e.g.:
$protein = &translate($seq);
# $seq: value passed into subroutine
# $protein: value returned from subroutine
→ avoid accidentally changing global variables
scope
Wrong:
# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        my $header = $1;    # exists only inside this if block
    }
}
print "sequence ID: $header\n";
Error: Global symbol "$header" requires explicit package name
scope
Correct:
# initialize global variable
my $header = '';
# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        $header = $1;
    }
}
print "sequence ID: $header\n";
course project
common errors:
(scope)
my $dna = '';
# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;
    # append to sequence string
    my $dna .= $input;    # wrong: 'my' creates a different variable
}
course project
common errors:
(scope)
my $dna = '';
# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;
    # append to sequence string
    $dna .= $input;       # correct: same variable as declared above
}
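The difference between the two versions can be checked without any input file. In this sketch the array @lines stands in for the lines read from input, and $wrong and $right are made-up names for the two variants.

```perl
use strict;
use warnings;

my @lines = ("ACGT\n", "TTGA\n");    # stand-in for lines read from input

my $wrong = '';
foreach my $input (@lines) {
    chomp(my $line = $input);
    my $wrong .= $line;              # 'my' creates a new variable on each pass
}

my $right = '';
foreach my $input (@lines) {
    chomp(my $line = $input);
    $right .= $line;                 # appends to the variable declared above
}

print "wrong: '$wrong'\n";           # wrong: ''
print "right: '$right'\n";           # right: 'ACGTTTGA'
```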
course project
common errors:
(arrangement)
# print output in chunks of 60 bp width
while ($dna) {
    $out = substr $dna, 0, 60, '';    # removes the chunk from $dna
    print "$i $out\n";
    $i += length($out);
}
# the loop above empties $dna, so nothing is left to split here
# change string to array:
my @chars = split //, $dna;
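One possible way around this, sketched under assumptions: do the split first and let the printing loop consume a copy. The 80 bp test sequence, the names $rest, $pos and @rows, and the choice to start counting at 1 are all made up for this example.

```perl
use strict;
use warnings;

my $dna = 'ACGT' x 20;               # 80 bp of made-up sequence

# change string to array while $dna still holds the sequence
my @chars = split //, $dna;

# print in chunks of 60 bp, consuming a copy so that $dna survives
my $rest = $dna;
my $pos  = 1;                        # assumed start of the position counter
my @rows;
while ($rest ne '') {
    my $chunk = substr $rest, 0, 60, '';   # removes the chunk from $rest
    push @rows, "$pos $chunk";
    $pos += length $chunk;
}
print "$_\n" for @rows;
```

Simply moving the split above the loop, without a copy, fixes the slide's example as well.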
course project
considerations:
order is important
Reverse complement first, then translate (translates the reverse strand):
# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;
# translate
my $protein = &translate($dna);
Translate first, then reverse complement (translates the original strand):
# translate
my $protein = &translate($dna);
# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;
course project
considerations:
make actions optional
# define variables:
my $do_revcomp = '';
my $do_composition = '';
my $do_translate = '';
# read sequence
my $dna = '';
…
# calculate GC content
if ($do_composition) {
    &composition($dna);
}
# form the reverse complement
if ($do_revcomp) {
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
}
course project
considerations:
# switch an action on by giving its variable a true value:
my $do_revcomp = '1';
my $do_composition = '';
my $do_translate = '';
course project
Work on your course project (sequanto.pl):
1. fix bugs
2. add "choice" variables at the top
3. move code blocks into subroutines (GC-content, composition)
Control through options
Perl module
Getopt::Long
allows processing command line options.
Control through options
$ man Getopt::Long
NAME
    Getopt::Long - Extended processing of command line options
SYNOPSIS
    use Getopt::Long;
    my $data    = "file.dat";
    my $length  = 24;
    my $verbose = '';
    $result = GetOptions ("length=i" => \$length,     # numeric
                          "file=s"   => \$data,       # string
                          "verbose"  => \$verbose);   # flag
Each entry: name of parameter ("length"), type of argument ("=i"), reference to the variable (\$length)
Control through options
Command line parameters (with arguments):
perl test.pl -verbose -length 20 -file input.txt
→ $verbose set to '1', $length set to '20', $data set to 'input.txt'
Reorder:
perl test.pl -file input.txt -length 20 -verbose
Long version:
perl test.pl --verbose --length=20 --file=input.txt
Short version:
perl test.pl -v -l 20 -f input.txt
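The four spellings can be compared inside one script. The sketch below uses GetOptionsFromArray, which Getopt::Long exports on request, so that each argument list is parsed from an array instead of the real command line. The helper parse_args is made up for this example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse one argument list, return the three values
sub parse_args {
    my @args = @_;
    my ($data, $length, $verbose) = ('file.dat', 24, '');
    GetOptionsFromArray(\@args,
        'length=i' => \$length,      # numeric
        'file=s'   => \$data,        # string
        'verbose'  => \$verbose,     # flag
    ) or die "bad options\n";
    return "$verbose $length $data";
}

my @variants = (
    [qw(-verbose -length 20 -file input.txt)],       # original order
    [qw(-file input.txt -length 20 -verbose)],       # reordered
    [qw(--verbose --length=20 --file=input.txt)],    # long version
    [qw(-v -l 20 -f input.txt)],                     # short version
);
my @results = map { parse_args(@$_) } @variants;
print "$_\n" for @results;           # each line: 1 20 input.txt
```

The short version works because Getopt::Long accepts unique abbreviations of option names by default.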
Control through options
Try this in your script:
use Getopt::Long;                                # 1. Import module
my $do_translation = '';                         # 2. Initialise variables
my $do_revcomp = '';
&GetOptions ("translate" => \$do_translation,    # 3. Call function: define flags and
             "revcomp"   => \$do_revcomp,        #    associate with referenced variables
            );
To allow the following execution (-gc needs its own flag, defined the same way):
perl sequanto.pl -gc -revcomp test.fa
Data Input/Output
Redirect output
Program prints output to screen:
$ translate.pl seq.fa
MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…
Redirect into file:
$ translate.pl seq.fa > seq.aa
Append to file:
$ translate.pl seq.fa >> seq.aa
Filehandles
Reading from the default input stream (files named on the command line, otherwise STDIN):
my $in = <>;
Use filehandle to read input from a specific file:
open (IN, 'input.txt');          # open file for reading; IN is the filehandle
while (my $in = <IN>) { … }      # read content line by line
close IN;                        # close filehandle when finished
Filehandles
Syntax:
open (FH, filename);         # open file for reading
open (FH, "< filename");     # open file for reading
open (FH, "> filename");     # open file for writing
open (FH, ">> filename");    # append to file
close FH;                    # flushes buffer
Write and append mode will create files if necessary
Write mode will empty file first
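The effect of the write and append modes can be seen in a scratch file. Note that this sketch swaps in the three-argument form of open with lexical filehandles ($fh, $in) in place of the bareword, two-argument form shown above; both are standard Perl. The file name and the helper put are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/results.txt";       # hypothetical file name

# Hypothetical helper: open $file in the given mode and print one text
sub put {
    my ($mode, $text) = @_;
    open(my $fh, $mode, $file) or die "Can't open $file: $!";
    print $fh $text;
    close $fh;
}

put('>',  "first\n");                # creates the file
put('>',  "second\n");               # write mode empties the file first
put('>>', "third\n");                # append mode keeps existing content

open(my $in, '<', $file) or die "Can't read from $file: $!";
my @content = <$in>;
close $in;
print @content;                      # second, then third
```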
Writing to files
$file_name = 'results.txt';
if ($write_modus eq 'append') {
    # append to file (creates file if necessary)
    open (OUT, ">>$file_name");
} else {
    # normal write (erases content if file exists)
    open (OUT, ">$file_name");
}
print OUT 'some text';
close OUT;    # output might not appear until FH is closed!
Error check!
Always test whether an important operation succeeded:
open (IN, $file_name)
    or die "Can't read from $file_name: $!";
open (OUT, ">>$file_out")
    or die "Can't append to $file_out: $!";
# Note: special variable $! contains the error message
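To see what such a message looks like without stopping the script, the failing open can be wrapped in eval. This is a sketch only: the path is made up and assumed not to exist, and the exact wording of $! depends on the operating system.

```perl
use strict;
use warnings;

my $missing = '/no/such/dir/input.txt';   # hypothetical path, assumed not to exist

# eval catches the die, so the script can carry on and inspect the message
my $ok = eval {
    open(my $in, '<', $missing) or die "Can't read from $missing: $!\n";
    close $in;
    1;
};
my $error = $ok ? '' : $@;
print "caught: $error" if $error;
```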
Reading from Filehandle
One or more file names are specified after the program,
loop over each argument:
foreach my $file (@ARGV) {        # special variable @ARGV
    open (IN, $file) or die;      # open filehandle
    while (my $in = <IN>) {       # read file line by line
        # do something
    }
    close IN;                     # close filehandle
}
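A self-contained version of this loop, as a sketch: it first writes two small made-up FASTA files so that it also runs without arguments, then counts the lines in each file. It uses lexical filehandles; the %lines hash and the file names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small input files so the sketch runs without arguments
my $dir = tempdir(CLEANUP => 1);
my @files;
foreach my $n (1, 2) {
    my $name = "$dir/seq$n.fa";
    open(my $fh, '>', $name) or die "Can't write to $name: $!";
    print $fh ">seq$n\n", "ACGT\n" x $n;   # header plus $n sequence lines
    close $fh;
    push @files, $name;
}
@ARGV = @files unless @ARGV;               # fall back to the made-up files

my %lines;
foreach my $file (@ARGV) {
    open(my $in, '<', $file) or die "Can't read from $file: $!";
    while (my $line = <$in>) {             # read file line by line
        $lines{$file}++;
    }
    close $in;
}
printf "%s: %d lines\n", $_, $lines{$_} for sort keys %lines;
```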
Reading sequence from a file
Note: two command line arguments!
$ perl split.pl gcctg test.fa
With an explicit filehandle:
# read pattern and file name
my ($pattern, $file) = @ARGV;
open (IN, $file) or die "$!";
my $sequence = '';
while (<IN>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
close IN;
With the default input stream:
# get pattern; <> then reads the remaining argument (the file)
my $pattern = shift @ARGV;
my $sequence = '';
while (<>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
Reading sequence from a file
Note: two command line arguments!
$ perl split.pl gcctg test.fa
The two arguments can also be taken off @ARGV one at a time:
# read pattern and file name
my $pattern = shift @ARGV;
my $file = shift @ARGV;
open (IN, $file) or die "$!";
my $sequence = '';
while (<IN>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
close IN;
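The reading loop can be tested without a file on disk by opening a filehandle on a string. The sketch below does that with a made-up FASTA record. What split.pl then does with the pattern is not shown on the slides; cutting the sequence at each match is only an assumption for illustration.

```perl
use strict;
use warnings;

my $fasta   = ">test\nAAGCCTGTT\nCCGCCTGAA\n";   # made-up FASTA record
my $pattern = 'gcctg';

# open a filehandle on the string instead of a file on disk
open(my $in, '<', \$fasta) or die "Can't open in-memory file: $!";
my $sequence = '';
while (<$in>) {
    next if /^>/;                    # skip header line
    chomp;                           # remove line ending
    $sequence .= $_;                 # append to sequence string
}
close $in;

# assumed use of the pattern: cut the sequence at each match, ignoring case
my @fragments = split /\Q$pattern\E/i, $sequence;
print "$sequence\n";                 # AAGCCTGTTCCGCCTGAA
print join(' | ', @fragments), "\n"; # AA | TTCC | AA
```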
course project
Work on your course project (sequanto.pl):
1. Add explicit opening of file-handle
2. Store translated sequence into a new file
Exam
Reminder
Exam:
Thu, Dec 11th, 11 - 1 pm