Genomics (Ecol 553L) Computational Lab
Week 11: Oct 30, Nov 1
Course webpage: http://genomics.arizona.edu/553/

Topics: Perl subroutines and modules

In-class exercises:

1) Copy /genome/student/ecol553_week12 to your home directory.
2) Rewrite the seq_len.pl script to use Bio::SeqIO and name the new script bp_seq_len.pl. How many lines of code do you need, and how does this compare to the older seq_len.pl script we wrote?
3) Using bp_seq_len.pl as a starting point, add the rev_com subroutine from week11, and use Bio::SeqIO to write an output file that contains reverse-complemented sequences, with each sequence name prepended with “RC1_”. Name the new sequence file with “RC1_” as the filename prefix. Name the script bp_seq_rc1.pl.
4) Go to http://doc.bioperl.org and find the documentation for the revcom method in the Bio::Seq module. Remember to check the PrimarySeq and PrimarySeqI modules listed under Included or Inherited modules on the BioPerl doc pages.
5) Using bp_seq_rc1.pl as a starting point, remove the rev_com subroutine and figure out how to call a Bio::Seq method to reverse complement the sequences. Name the new script bp_seq_rc2.pl and prepend each output sequence name with “RC2_”. Name the new sequence file with “RC2_” as the filename prefix.
6) Use the Unix diff command to compare the outputs of bp_seq_rc1.pl and bp_seq_rc2.pl.
7) Add an argument specifying the sequence file format (e.g. fasta, genbank, etc.) to each of the bp_*.pl scripts to make them more flexible. How do we pass the format to BioPerl?

Homework: calculate theta and pi for an input alignment

Create a Perl program named piTheta.pl that is given a file name from the command line. This file will be a simplified aligned FASTA file. Each sequence will be on a single line, so you can use the following pseudo-code to read in the file:

    Foreach line in the file
        If the first char is not ‘>’
            Push the line onto a sequence array
        End if
    End foreach

The sequences are also guaranteed to be the same length.
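The pseudo-code above might translate to Perl roughly as follows. This is only a sketch: the subroutine name read_alignment is made up for illustration, and skipping blank lines and stripping Windows line endings are assumptions that the assignment does not require.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_alignment($path): hypothetical helper (name chosen for this sketch).
# Returns the sequence lines of a one-sequence-per-line FASTA file,
# skipping header lines that start with '>'.
sub read_alignment {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my @seqs;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;      # strip the line ending (Unix or Windows)
        next if $line eq '';       # assumption: blank lines are ignored
        push @seqs, $line if substr($line, 0, 1) ne '>';
    }
    close $fh;
    return @seqs;
}

# Only run the demo when a file name is given on the command line.
if (@ARGV) {
    my @seqs = read_alignment($ARGV[0]);
    print scalar(@seqs), " sequences read\n";
}
```

Run as, for example, `perl piTheta.pl alignment.fasta` (the file name here is a placeholder).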
You will need to find the columns that are variable, calculate the nucleotide ratios, etc. It might be helpful to keep an auxiliary array with one element per alignment column, indicating whether that column is variable or not.

Your output to standard out should look like:

    pi: <PI> \t var:<vartheta>
    theta: <THETA> \t var:<vartheta>

To standard error, you should output the alignment, but only the variable columns.

For a reminder of the formulas for pi and theta, look at Dr. Whiteman's slides from 23 Oct.
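One possible way to organize the variable-column bookkeeping is sketched below. It assumes the common textbook definitions (pi as the mean number of pairwise differences, theta as Watterson's estimator S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1)); the lecture slides may define or scale these differently, for example per site, so check them before relying on this. The variance terms are left out because their formulas are not given here. All subroutine names and the toy alignment are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# variable_columns(\@seqs): one 0/1 flag per alignment column
# (1 = more than one distinct character appears in that column).
sub variable_columns {
    my ($seqs) = @_;
    my @is_var;
    for my $col (0 .. length($seqs->[0]) - 1) {
        my %seen;
        $seen{ substr($_, $col, 1) } = 1 for @$seqs;
        push @is_var, (scalar(keys %seen) > 1 ? 1 : 0);
    }
    return \@is_var;
}

# mean_pairwise_diff(\@seqs): average number of differing positions
# over all pairs of sequences (assumed definition of pi).
sub mean_pairwise_diff {
    my ($seqs) = @_;
    my $n   = scalar @$seqs;
    my $len = length $seqs->[0];
    my ($diffs, $pairs) = (0, 0);
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            # grep in scalar context counts the mismatching columns
            $diffs += grep {
                substr($seqs->[$i], $_, 1) ne substr($seqs->[$j], $_, 1)
            } 0 .. $len - 1;
            $pairs++;
        }
    }
    return $diffs / $pairs;
}

# watterson_theta($S, $n): S / a_n, with a_n = sum of 1/i for i = 1 .. n-1
# (assumed definition of theta).
sub watterson_theta {
    my ($S, $n) = @_;
    my $a_n = 0;
    $a_n += 1 / $_ for 1 .. $n - 1;
    return $S / $a_n;
}

# Toy alignment for illustration only (not course data).
my @seqs   = ('ACGTAC', 'ACGTTC', 'ATGTAC', 'ACGTAC');
my $is_var = variable_columns(\@seqs);
my $S      = grep { $_ } @$is_var;    # number of variable columns

printf "pi: %.4f\n",    mean_pairwise_diff(\@seqs);
printf "theta: %.4f\n", watterson_theta($S, scalar @seqs);

# Variable columns only, written to standard error.
for my $seq (@seqs) {
    print STDERR join('', map { substr($seq, $_, 1) }
                          grep { $is_var->[$_] } 0 .. $#$is_var), "\n";
}
```

Keeping the 0/1 flags in their own array means the same array drives both the count of variable sites and the filtered alignment printed to standard error.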