Genomics (ECOL 553) Computational Lab
Week 7: Oct 2 & 4, 2012
Course webpage: http://genomics.arizona.edu/553/

Topics:
   Functions glob, length, substr
   Command line arguments
   File Input/Output
   Wrapping command line programs in Perl: system and backticks

In-class exercises:

1) Copy the directory ~deblasio/ecol553_student/ecol553_week7 to your home directory.

2) In the ecol553_week7 directory, run the glob_length.pl script. Then use an editor to change glob_length.pl as follows:
   a. Add code to find and output the shortest filename and its length. (For the initial $shortest_len value, use 9999.)
   b. How could you output multiple shortest filenames in the case of a tie?

3) Run the cmdline_args.pl script, giving only one argument. Run it again correctly. Add code before the while loop to append an exclamation point (!) to each element of the @ARGV array. Test your modified script.

4) Run the file_input.pl script and be sure that you understand what it is doing. What Unix command would you use to check that file_input.pl is printing the correct result?

5) Run the file_output.pl script and be sure that you understand what it is doing. Notice that the output filename is similar to the input filename. Is that a good idea? Why or why not?

6) Run the file_output_testexist.pl script and answer 'N'. Figure out what is wrong with the script and fix it.

7) Run the system_cmd.pl script. Add another command line argument that specifies the number of hits and alignments to output. (The -num_descriptions and -num_alignments options will use the argument value.)

Homework, due by 11:59pm on Tues., Oct. 15 (PRINTED ON BOTH SIDES!)

Note that this is a 20 point homework and that you have 2 weeks to complete it. Get started right away so that you have time to ask for help if you need it. Remember to code and test incrementally!

The scripts described below in numbers 1) and 2) should be placed in a new directory on login.hpc.arizona.edu, named: ~/eeb553_homework/homework5.
To submit your homework, run the "turnin" script as usual:

   turnin homework5

(* remember your command prompt may be different from mine and the one listed *)

1) Using glob_length.pl, cmdline_args.pl, and file_output.pl as a guide, write a script named count_matched.pl
   a. Your script should use one command line argument that specifies a filename extension. Use the glob function to get an array of files having the specified filename extension and print the number of matching files found. Add comments to explain the code you write. You will lose points for inadequate comments. Output from an example run might look like this:

      [service1][~/ecol553_week7]> perl count_matched.pl "bln"
      There are 3 files with a .bln extension

   b. Instead of writing output to the screen, open an output file named matched_xxx.txt, where xxx is the pattern specified as an argument. An example run might look like this:

      [service1][~/ecol553_week7]> perl count_matched.pl "bln"
      [service1][~/ecol553_week7]> cat matched_bln.txt
      There are 3 files with a .bln extension

2) Using system_cmd.pl as a starting point, rename the script as blast_grep.pl. Add comments to explain the code you write. You will lose points for inadequate comments. Make the following modifications to the script:
   a. Remove the code that creates a file with numbered lines.
   b. Using backticks, run the command `grep JAY291 $file.blastn` and capture the results of the grep in an array.
   c. For each element in the resulting array, print a substring of the array element that includes only the GI number (use substr). (Note that in general, substr is not a good way to extract GI numbers, as they may vary in length. We'll learn a better way soon.)
   d. To make the script more flexible, add code to accept a second command line argument that replaces the hard-coded JAY291 in the grep command.

   Example runs with output might look like this:

      [service1][~/ecol553_week7]> perl blast_grep.pl yeastgenes.fa Patent
      Starting BLAST run...Finished BLAST run.
      254748699
      257307076
      257307082
      [service1][~/week7]> perl blast_grep.pl yeastgenes.fa EC1118
      Starting BLAST run...Finished BLAST run.
      259149040
      259145041
      259148440
      259145824
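
The note in 2c about substr can be seen in a small sketch. The two sample lines, the "gi|" prefix layout, and the helper name gi_by_substr are assumptions made up for illustration; the lines in your own .blastn file may be laid out differently, so check the offsets against your real output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines, NOT real BLAST output. They assume each hit
# starts with "gi|" followed by the GI number and another "|".
my @hits = (
    'gi|254748699|gb|EXAMPLE1| hypothetical hit',
    'gi|2547486|gb|EXAMPLE2| hypothetical hit with a shorter GI number',
);

# Fixed-width extraction: skip the 3-character "gi|" prefix,
# then take the next 9 characters.
sub gi_by_substr {
    my ($line) = @_;
    return substr($line, 3, 9);
}

# Print what substr returns for each sample line.
foreach my $hit (@hits) {
    print gi_by_substr($hit), "\n";
}
```

With these made-up lines, the first value printed is the 9-digit GI number, but the second also picks up the characters that follow the shorter number. That is why a fixed offset and length only works when every GI number has the same width.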