Perl exercise 5 (Due on 01/06/2009)

Don't forget to write well-organized scripts, use meaningful variable names, and write comments about what you are doing in each part of the code. Always test your script on several examples. If you need more biological sequences, search for appropriate examples in GenBank.

Hashes

1. Change the solution for class exercise 4, question 1 (reading names and expenses from a file): after reading the input file, ask the user for a name (from STDIN) and print the sum of expenses for that name (use a hash).

2. Write a script that reads a FASTA file and stores the sequences in a hash. The header line should be the key and the sequence should be the value. Now ask the user for a header line and extract the sequence from the hash.

3. Write a script that reads a GenBank record and stores the title of each paper in a hash, with the last names of the authors as the keys (each title should be stored once for each of its authors). Now ask the user for an author's last name and print the title of a paper by this author. If an author appears on more than one paper, print only one of those papers (no need to store all of this author's papers).

Complex data structures

4. Write a script that reads a file where each line contains a sample number followed by any number of protein-level measurements, for example:

104 0.4322 0.3992 0.4832

Store an array of measurements for each sample in a hash, where the sample number is the key. Ask the user for a sample number and print a sorted list of that sample's measurements.

5. Write a script that reads a FASTA file of mRNA sequences, where the FASTA header line of each sequence has the following format:

>accession #DE description #LN length #CD coding sequence start..end #TA TATA-box #PA poly-A start #RP repetitive element start..end

The #PA, #TA and #RP fields are optional (some sequences won't have them), and #RP may appear more than once, but always at the end of the line.
(The RP fields describe areas in the sequence that are annotated as repetitive elements.) For example:

>AF070670 #DE Homo sapiens protein phosphatase 2C alpha 2 mRNA, complete cds. #LN 2100 #CD 360..1334 #TA 328 #PA 2046 #RP 122..133 #RP 1874..1899

An example of such a file is available from the course webpage (the "rich-FASTA" file).

a. Store all data from the file in a complex data structure of your choice.

b. Ask the user for a length and print all accessions of sequences shorter than that length.

c. Print the lengths of the proteins coded by the mRNAs (number of amino acids), and add a note for every protein with two or more RPs.

d. Ask the user for a word and print all accessions of sequences whose definition contains that word. For example, "phosphatase" appears in the header above (hint: lesson 6, slides 14-15).
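
As an illustration of the header format in question 5, here is a minimal sketch of how a single header line might be split into a hash. It is only one possible approach, not the required solution. The helper name parse_header and the choice of hash keys (accession, plus the two-letter tags) are assumptions made for this example, and it assumes the description never contains " #".

```perl
use strict;
use warnings;

# Hypothetical helper: parse one rich-FASTA header line into a hash reference.
# Keys are 'accession' plus the two-letter tags found in the header.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^>//;    # drop the leading '>'

    # Every field starts with whitespace followed by '#', so split on that.
    my ($accession, @fields) = split /\s+#/, $line;

    # RP may appear several times, so it is always an array reference.
    my %record = (accession => $accession, RP => []);

    for my $field (@fields) {
        # Each field is a two-letter tag followed by its value.
        my ($tag, $value) = $field =~ /^([A-Z]{2})\s+(.*?)\s*$/ or next;

        if ($tag eq 'CD' or $tag eq 'RP') {
            # Ranges are written as start..end; store them as [start, end].
            my ($start, $end) = $value =~ /^(\d+)\.\.(\d+)$/ or next;
            if ($tag eq 'RP') { push @{ $record{RP} }, [$start, $end] }
            else              { $record{CD} = [$start, $end] }
        }
        else {
            $record{$tag} = $value;    # DE, LN, TA, PA are kept as plain values
        }
    }
    return \%record;
}

# Usage with the example header from the exercise.
my $rec = parse_header('>AF070670 #DE Homo sapiens protein phosphatase 2C alpha 2 mRNA, complete cds. #LN 2100 #CD 360..1334 #TA 328 #PA 2046 #RP 122..133 #RP 1874..1899');
print "$rec->{accession}: length $rec->{LN}, ", scalar @{ $rec->{RP} }, " repetitive elements\n";
# prints: AF070670: length 2100, 2 repetitive elements
```

Storing each parsed record in an outer hash keyed by accession would give a hash of hashes. Optional fields such as TA and PA are simply absent from the record when the header does not contain them, so they can be tested with exists.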