Computer Programming for Biologists
Class 9
Dec 4th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25

Overview

• mock exam
• revision
• variable scope
• extensions
• file handles

Mock Exam

http://bioinf.gen.tcd.ie/GE3M25/programming/exam

Revision - Subroutines

    my $prot = &translate($seq);   # call

    sub translate {                # definition
        my $seq = shift @_;        # parameters
        …
        return $prot;              # return value(s)
    }

scope

The area of the script in which a variable is visible.

Different blocks defined by:
• main program
• subroutines
• loops
• branches

Different blocks mean different namespaces.

scope: Blocks

    my $global_1;                     # main part
    …
    while (my $input = <>) {          # block: loop
        statement1;
    }

    if (condition) {                  # block: branch
        my $local_1 = 'xxx';
    }

    sub subroutine {                  # block: subroutine
        my $local_1;
        foreach my $nuc (@bases) {    # block: loop within the subroutine
            statement2;
        }
    }

scope

Tip: Keep local variables within subroutines and explicitly pass content between the main part and subs, e.g.:

    $protein = &translate($seq);

Here $seq is the value passed into the subroutine and $protein receives the value returned from the subroutine. This avoids accidentally changing global variables.

scope

Wrong:

    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            my $header = $1;
        }
    }
    print "sequence ID: $header\n";

Error message: Global symbol "$header" requires explicit package name

$header is declared inside the if block, so it is no longer visible at the print statement.

scope

Correct:

    # initialize global variable
    my $header = '';

    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            $header = $1;
        }
    }
    print "sequence ID: $header\n";

course project

common errors: (scope)

Wrong (different variables): the second "my" declares a new $dna inside the loop, so the outer $dna stays empty.

    my $dna = '';

    # read input
    while (my $input = <>) {

        # remove line ending
        chomp $input;

        # append to sequence string
        my $dna .= $input;
    }

Correct (same variable):

    my $dna = '';

    # read input
    while (my $input = <>) {

        # remove line ending
        chomp $input;

        # append to sequence string
        $dna .= $input;
    }

course project

common errors: (arrangement)

    # print output in chunks of 60 bp width
    while ($dna) {
        $out = substr $dna, 0, 60, '';    # empties $dna
        print "$i $out\n";
        $i += length($out);
    }

    # change string to array:
    my @chars = split //, $dna;

The loop empties $dna, so the split that follows it receives an empty string.

course project

considerations: order is important

    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;

    # translate
    my $protein = &translate($dna);

versus:

    # translate
    my $protein = &translate($dna);

    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;

The first version translates the reverse complement, the second translates the original sequence.

course project

considerations: make actions optional

    # define variables:
    my $do_revcomp     = '';
    my $do_composition = '';
    my $do_translate   = '';

    # read sequence
    my $dna = '';
    …

    # calculate GC content
    if ($do_composition) {
        &composition($dna);
    }

    # form the reverse complement
    if ($do_revcomp) {
        $dna = reverse($dna);
        $dna =~ tr/ACTG/TGAC/;
    }

Setting a variable to '1' switches the corresponding action on, e.g.:

    my $do_revcomp = '1';

course project

Work on your course project (sequanto.pl):
1. fix bugs
2. add "choice" variables at the top
3. move code blocks into subroutines (GC-content, composition)

Control through options

The Perl module Getopt::Long allows processing of command line options.

    $ man Getopt::Long

    NAME
        Getopt::Long - Extended processing of command line options

    SYNOPSIS
        use Getopt::Long;
        my $data    = "file.dat";
        my $length  = 24;
        my $verbose = '';
        $result = GetOptions ("length=i" => \$length,    # numeric
                              "file=s"   => \$data,      # string
                              "verbose"  => \$verbose);  # flag

In "length=i" => \$length:
• "length" is the name of the parameter
• "=i" is the type of argument
• \$length is a reference to the variable that receives the value

Control through options

Command line parameters (with arguments):

    perl test.pl -verbose -length 20 -file input.txt

$verbose set to '1', $length set to '20', $data set to 'input.txt'

Reorder:

    perl test.pl -file input.txt -length 20 -verbose

Long version:

    perl test.pl --verbose --length=20 --file=input.txt

Short version:

    perl test.pl -v -l 20 -f input.txt

Control through options

Try this in your script:

    use Getopt::Long;                        # 1. Import module

    my $do_translation = '';                 # 2. Initialise variables
    my $do_revcomp     = '';

    &GetOptions (                            # 3. Call function
        "translate" => \$do_translation,     # define flags and associate
        "revcomp"   => \$do_revcomp,         # them with referenced variables
    );

To allow the following execution:

    perl sequanto.pl -gc -revcomp test.fa

Data Input/Output

Redirect output

Program prints output to screen:

    $ translate.pl seq.fa
    MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…

Redirect into file:

    $ translate.pl seq.fa > seq.aa

Append to file:

    $ translate.pl seq.fa >> seq.aa

Filehandles

Reading from the default input stream (STDIN, or any files named on the command line):

    my $in = <>;

Use a filehandle (here: IN) to read input from a specific file:

    open (IN, 'input.txt');      # open file for reading
    while (my $in = <IN>) {      # read content line by line
        …
    }
    close IN;                    # close filehandle when finished

Filehandles

Syntax:

    open (FH, filename);         # open file for reading
    open (FH, "< filename");     # open file for reading
    open (FH, "> filename");     # open file for writing
    open (FH, ">> filename");    # append to file
    close FH;                    # empties buffer

• Write and append mode will create files if necessary
• Write mode will empty file first

Writing to files

    $file_name = 'results.txt';

    if ($write_modus eq 'append') {
        # append to file (creates file if necessary)
        open (OUT, ">>$file_name");
    } else {
        # normal write (erases content if file exists)
        open (OUT, ">$file_name");
    }

    print OUT 'some text';
    close OUT;    # output might not appear until FH is closed!

Error check!

Always test if an important operation worked out:

    open (IN, $file_name) or die "Can't read from $file_name: $!";

    open (OUT, ">>$file_out") or die "Can't append to $file_out: $!";

    # Note: special variable $! contains error message

Reading from Filehandle

One or more file names are specified after the program; loop over each argument:

    foreach my $file (@ARGV) {       # special variable @ARGV
        open (IN, $file) or die;     # open filehandle
        while (my $in = <IN>) {      # read file line by line
            # do something
        }
        close IN;                    # close filehandle
    }

Reading sequence from a file

Note: two command line arguments!

    $ perl split.pl gcctg test.fa

Version with an explicit filehandle:

    # read pattern and sequence
    my ($pattern, $file) = @ARGV;

    open (IN, $file) or die "$!";
    my $sequence = '';
    while (<IN>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
    close IN;

The two arguments can also be read one at a time:

    my $pattern = shift @ARGV;
    my $file    = shift @ARGV;

Version with <>: once the pattern has been shifted off @ARGV, <> reads from the remaining file name.

    # get pattern
    my $pattern = shift @ARGV;

    my $sequence = '';
    while (<>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }

course project

Work on your course project (sequanto.pl):
1. Add explicit opening of file-handle
2. Store translated sequence into a new file

Exam Reminder

Exam: Thu, Dec 11th, 11 - 1 pm
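
Putting the pieces together

The slides above introduce variable scope, Getopt::Long flags, and explicit filehandles separately. The following is a minimal sketch of how they might fit together in one script; it is not the reference solution for sequanto.pl. The subroutine names read_fasta and revcomp, the -out option, and its default file name are hypothetical, chosen only for illustration. The sketch also uses the three-argument form of open with a lexical filehandle (my $in), a variation on the open (IN, $file) form shown on the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# "choice" variables, switched off by default
my $do_revcomp = '';
my $out_file   = 'results.txt';    # hypothetical default output name

# hypothetical options for this sketch: -revcomp (flag), -out (string)
GetOptions(
    "revcomp" => \$do_revcomp,
    "out=s"   => \$out_file,
) or die "Usage: $0 [-revcomp] [-out file] input.fa\n";

# read a FASTA file and return the sequence without header lines
sub read_fasta {
    my $file     = shift @_;       # local to this subroutine
    my $sequence = '';
    open (my $in, '<', $file) or die "Can't read from $file: $!";
    while (my $line = <$in>) {
        next if ($line =~ /^>/);   # skip header lines
        chomp $line;
        $sequence .= $line;        # same variable as declared above the loop
    }
    close $in;
    return $sequence;
}

# return the reverse complement of a DNA string
sub revcomp {
    my $dna = shift @_;
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
    return $dna;
}

# main part: runs only if a file name is left after option processing
if (@ARGV) {
    my $dna = read_fasta($ARGV[0]);
    $dna = revcomp($dna) if ($do_revcomp);

    open (my $out, '>', $out_file) or die "Can't write to $out_file: $!";
    print $out "$dna\n";
    close $out;                    # output might not appear until closed
}
```

Assuming the script is saved as sketch.pl, it could be run as: perl sketch.pl -revcomp -out rc.txt test.fa. Values are passed into the subroutines and returned from them explicitly, so no subroutine changes a variable of the main part.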