Genetics 302 - Winter 2011 - Exercise 5 - Structural Variations
Due Wed April 6 at 9:00 AM

The purpose of this assignment is two-fold. (A) Introduce you to the concept of structural variations (SVs): large-scale insertions, deletions, and inversions that exceed the size of a typical sequence read, say 1 kb, to be definite. (B) Let you work out for yourself how end-sequence pairs (ESPs) are used in genome analysis, in this case to detect SVs. Questions should be answered by drawing pictures. This assignment will count for 4% of your course grade.

Two papers have been posted. The first is a policy paper that explains why SVs are important and how they are being measured. Read it all. The second is a data paper that contains more detail than you need to know; read only enough of it to understand Figure 2.

EE Eichler, et al. 2007. Completing the map of human genetic variation. Nature 447: 161-165.
JM Kidd, et al. 2008. Mapping and sequencing of structural variation from eight human genomes. Nature 453: 56-64.

The methods used can be summarized as follows. We already have a reference sequence for the human genome. Now consider 8 additional genomes from 8 individuals. From each of these genomes we make a fosmid library consisting of millions of fosmid clones. The fosmid cloning system generates an exceptionally narrow distribution of clone insert sizes, 40 ± 2.8 kb. Each fosmid clone is sequenced from both ends, creating an ESP: two 500 bp sequence reads separated by a known distance (40 kb) in the test genome from which the clone was made. SVs are then detected by computationally aligning ESPs to the reference genome. In the absence of an SV, that alignment would be depicted as follows.

[Figure: an ESP from the test genome aligned to the reference genome (REF); span = 40 kb]

Notice that the two ends of an ESP are represented by arrows. The directionality refers to the fact that sequence reads always start from one or the other end of a clone and work their way toward the interior.
The arrows carry information, and their directionality matters.

Question 1: Suppose that the test genome had a 10 kb deletion relative to the reference genome. How would the ESP alignment be depicted? Given the fosmid size limitations, what are the smallest and largest deletions that can be detected?

Question 2: Suppose that the test genome had a 10 kb insertion relative to the reference genome. How would the ESP alignment be depicted? Given the fosmid size limitations, what are the smallest and largest insertions that can be detected?

Question 3: Suppose that the test genome had a 10 kb inversion relative to the reference genome. How would the ESP alignment be depicted? Given the fosmid size limitations, what are the smallest and largest inversions that can be detected?

In all 3 questions you are expected to show more detail than Figure 1 of the first paper. Show the expected distance between the two ends of the alignment on the reference genome. The directions of the arrows matter. Consider information from multiple ESPs if necessary. All questions have equal weight.
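
For those who find it easier to think in code, the alignment check described in the methods summary can be sketched as below. This is only an illustrative sketch, not the procedure used in either paper. The record type `AlignedESP`, its field names, and the 3-standard-deviation tolerance `N_SD` are assumptions made for this example; only the 40 kb mean and 2.8 kb spread come from the text above. The sketch deliberately reports only whether an ESP agrees with the reference, and leaves the interpretation of each disagreement to you.

```python
from dataclasses import dataclass

MEAN_INSERT_BP = 40_000  # from the handout: fosmid inserts are about 40 kb
SD_INSERT_BP = 2_800     # from the handout: +/- 2.8 kb
N_SD = 3                 # assumption: tolerate 3 standard deviations


@dataclass
class AlignedESP:
    """One ESP placed on the reference genome (hypothetical record)."""
    left_start: int    # reference coordinate where the leftmost read begins
    right_end: int     # reference coordinate where the rightmost read ends
    left_strand: str   # '+' means the arrow points toward higher coordinates
    right_strand: str  # '-' means the arrow points toward lower coordinates


def reference_span(esp: AlignedESP) -> int:
    """Distance on the reference between the outer ends of the two reads."""
    return esp.right_end - esp.left_start


def span_is_concordant(esp: AlignedESP, n_sd: int = N_SD) -> bool:
    """True if the reference span is within the expected insert-size range."""
    low = MEAN_INSERT_BP - n_sd * SD_INSERT_BP
    high = MEAN_INSERT_BP + n_sd * SD_INSERT_BP
    return low <= reference_span(esp) <= high


def orientation_is_concordant(esp: AlignedESP) -> bool:
    """True if the two arrows point toward each other (toward the interior)."""
    return esp.left_strand == "+" and esp.right_strand == "-"


def classify(esp: AlignedESP) -> str:
    """Label an ESP as concordant, or say which property disagrees."""
    span_ok = span_is_concordant(esp)
    orient_ok = orientation_is_concordant(esp)
    if span_ok and orient_ok:
        return "concordant"
    if orient_ok:
        return "discordant: span"
    if span_ok:
        return "discordant: orientation"
    return "discordant: span and orientation"


if __name__ == "__main__":
    examples = [
        AlignedESP(100_000, 140_000, "+", "-"),
        AlignedESP(100_000, 150_000, "+", "-"),
        AlignedESP(100_000, 140_000, "+", "+"),
    ]
    for esp in examples:
        print(reference_span(esp), classify(esp))
```

A single discordant ESP could also be a cloning or alignment artifact, which is one reason the questions suggest considering multiple ESPs.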