Handout & Homework

Genomics (ECOL 553) Computational Lab
Week 7: Oct 2 & 4, 2012.
Course webpage: http://genomics.arizona.edu/553/
Topics: Functions glob, length, substr; Command line arguments; File Input/Output;
Wrapping command line programs in Perl: system and backticks
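As a quick orientation to the three functions listed above, here is a minimal sketch. It is not one of the course scripts; the helper name longest_name and the *.pl pattern are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the longest name in a list,
# using length() to compare the candidates.
sub longest_name {
    my @names = @_;
    my $longest = '';
    foreach my $name (@names) {
        $longest = $name if length($name) > length($longest);
    }
    return $longest;
}

# glob expands a shell-style pattern into a list of matching filenames.
my @files = glob('*.pl');
print 'Found ', scalar(@files), " Perl scripts\n";

if (@files) {
    my $longest = longest_name(@files);
    print "Longest name: $longest (", length($longest), " characters)\n";
    # substr(STRING, OFFSET, LENGTH): offsets start at 0
    print 'First three characters: ', substr($longest, 0, 3), "\n";
}
```

Run it in any directory that contains .pl files to see what glob returns there.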
In-class exercises:
1) Copy the directory ~deblasio/ecol553_student/ecol553_week7 to your home
directory.
2) In the ecol553_week7 directory, run the glob_length.pl script. Then use an editor
to change glob_length.pl as follows:
a. Add code to find and output the shortest filename and its length. (For the
initial $shortest_len value, use 9999)
b. How could you output multiple shortest filenames in the case of a tie?
3) Run the cmdline_args.pl script, giving only one argument. Run it again correctly.
Add code before the while loop to append an exclamation point (!) to each
element of the @ARGV array. Test your modified script.
4) Run the file_input.pl script and be sure that you understand what it is doing.
What Unix command would you use to check that file_input.pl is printing the
correct result?
5) Run the file_output.pl script and be sure that you understand what it is doing.
Notice that the output filename is similar to the input filename. Is that a good
idea? Why or why not?
6) Run the file_output_testexist.pl script and answer ‘N’. Figure out what is wrong
with the script and fix it.
7) Run the system_cmd.pl script. Add another command line argument that
specifies the number of hits and alignments to output (the -num_descriptions and
-num_alignments options will use the argument value).
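The exercises above combine command line arguments with file input and output. The sketch below shows one way those pieces can fit together; it is not one of the course scripts, and the subroutine name number_lines and the usage message are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, prefixing each line with its
# line number. Returns the number of lines written and refuses to
# overwrite an existing output file.
sub number_lines {
    my ($in, $out) = @_;
    die "Output file $out already exists\n" if -e $out;

    open(my $in_fh,  '<', $in)  or die "Cannot read $in: $!\n";
    open(my $out_fh, '>', $out) or die "Cannot write $out: $!\n";

    my $count = 0;
    while (my $line = <$in_fh>) {
        $count++;
        print $out_fh "$count\t$line";
    }
    close($in_fh);
    close($out_fh) or die "Cannot close $out: $!\n";
    return $count;
}

# @ARGV holds the command line arguments; check the count before using them.
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    my $n = number_lines($infile, $outfile);
    print "Wrote $n lines to $outfile\n";
}
else {
    print STDERR "Usage: perl $0 <input file> <output file>\n";
}
```

The -e file test is what lets a script notice that its output file already exists before opening it for writing, since opening with '>' silently erases the old contents.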
Homework, due by 11:59pm on Tues., Oct. 15 (PRINTED ON BOTH SIDES!)
Note that this is a 20 point homework and that you have 2 weeks to complete it. Get
started right away so that you have time to ask for help if you need it. Remember to code
and test incrementally!
The scripts described below in numbers 1) and 2) should be placed in a new directory on
login.hpc.arizona.edu, named: ~/eeb553_homework/homework5. To submit your
homework, run the “turnin” script as usual:
turnin homework5
(* remember your command prompt may be different from
mine and the one listed *)
1) Using glob_length.pl, cmdline_args.pl, and file_output.pl as a
guide, write a script named count_matched.pl
a. Your script should use one command line argument that specifies a
filename extension. Use the glob function to get an array of files having
the specified filename extension and print the number of matching files
found. Add comments to explain the code you write. You will lose points
for inadequate comments. Output from an example run might look like
this:
[service1][~/ecol553_week7]> perl count_matched.pl "bln"
There are 3 files with a .bln extension
b. Instead of writing output to the screen, open an output file named
matched_xxx.txt, where xxx is the pattern specified as an argument. An
example run might look like this:
[service1][~/ecol553_week7]> perl count_matched.pl "bln"
[service1][~/ecol553_week7]> cat matched_bln.txt
There are 3 files with a .bln extension
2) Using system_cmd.pl as a starting point, rename the script as
blast_grep.pl. Add comments to explain the code you write. You will lose
points for inadequate comments.
Make the following modifications to the script:
a. Remove the code that creates a file with numbered lines.
b. Using backticks, run the command `grep JAY291 $file.blastn` and capture
the results of the grep in an array.
c. For each element in the resulting array, print a substring of the array
element that includes only the GI number (use substr).
(Note that in general, substr is not a good way to extract GI numbers as they may
vary in length…we’ll learn a better way soon.)
d. To make the script more flexible, add code to accept a second command
line argument that replaces the hard-coded JAY291 in the grep command.
Example runs with output might look like this:
[service1][~/ecol553_week7]> perl blast_grep.pl yeastgenes.fa Patent
Starting BLAST run...Finished BLAST run.
254748699
257307076
257307082
[service1][~/week7]> perl blast_grep.pl yeastgenes.fa EC1118
Starting BLAST run...Finished BLAST run.
259149040
259145041
259148440
259145824
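For reference, the difference between system and backticks can be sketched as follows. This is a generic illustration rather than a solution: the helper name capture_lines and the echo and ls commands are assumptions chosen so the example runs anywhere, and the substr offset and length are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a shell command with backticks and
# return its output lines with the trailing newlines removed.
sub capture_lines {
    my ($cmd) = @_;
    my @lines = `$cmd`;      # list context: one element per output line
    chomp(@lines);
    return @lines;
}

# system runs a command but does not capture its output;
# its return value is 0 when the command succeeds.
my $status = system('echo "Running a command with system"');
print "Command failed with status $status\n" if $status != 0;

# Backticks capture what the command prints to standard output.
my @entries = capture_lines('ls -1');
foreach my $entry (@entries) {
    # substr with a fixed offset and length: first 5 characters of each line
    print substr($entry, 0, 5), "\n";
}
```

Because backticks in list context return one array element per output line, the result can be looped over directly, and substr can then pull a fixed-position piece out of each line.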