Computational Molecular Biology and Genomics
Assignment 1
02-711/03-711/15-856
Due November 1st, 2012

Articles:
C. A. Hutchison, DNA Sequencing: bench to bedside and beyond. Nucleic Acids Research, 35 (18): 6227-6237, 2007.
H. Stevens, Dr. Sanger, meet Mr. Moore. Bioessays 34: 103-105, 2011.

Read these articles and briefly answer the following questions. You may read additional materials if you wish. If you do, you must cite your sources. You may not quote verbatim without attribution.

1. Name four innovations that were prerequisites for the development of Sanger’s dideoxy sequencing method and explain in one or two sentences the importance of each one.

2. Draw a picture of the gel that would result from sequencing the DNA fragment ATGTATTTC using Sanger’s “plus and minus” method.

3. The main problem with the “plus and minus” method was accurate determination of homopolymer runs. Why? Why did this problem not occur with sequences that have no repeated nucleotides?

4. Protein sequencing methods took advantage of enzymes that cleave proteins at a specific amino acid. Nucleic acid sequencing could not be implemented in an analogous manner, because there were no known enzymes that cut DNA at a specific base. Sanger offered one solution to this problem with his dideoxy method. Maxam and Gilbert had a different solution in their chemical method. Describe these two solutions.

5. Describe two similarities between the dideoxy method and the chemical method.

6. Two major innovations, introduced ten years apart, made high-throughput, automated sequencing factories possible. What were they? For each one, state in one or two sentences why it made sequencing more efficient.

7. Expressed sequence tags (ESTs) are short sequences from messenger RNA that has been reverse transcribed to cDNA.
   (i) When did Venter introduce EST sequencing?
   (ii) When was the first whole genome sequence published?
   (iii) When was the draft human genome sequence announced?
   (iv) What information did EST sequencing provide that was unique at the time it was first introduced?

8. Venter’s use of whole genome shotgun sequencing, first to sequence an entire genome (H. influenzae) and later to sequence the large, complex genome of a higher eukaryote (human), was controversial. What is shotgun sequencing, and what major computational problem had to be solved for this approach to work?

9. What was the “paired end” strategy and what problem did it help to solve?

10. What are two similarities between Sanger’s “plus and minus” method and early pyrosequencing?

11. Stevens notes the distinction between “data driven” and “hypothesis driven” science. In basic terms, what does Stevens mean when he says next-gen sequencing encourages “data driven” science? (No need to comment on Bacon and Popper, unless you want to.)