Genetics 302 - Winter 2011 - Exercise 5 – Structural Variations
Due Wed April 6 at 9:00 AM
The purpose of this assignment is twofold.
(A) Introduce you to the concept of structural variations (SVs): large-scale
insertions, deletions, and inversions that exceed the size of a typical sequence
read (say, 1 kb or more, to be definite).
(B) Let you work out for yourself how end-sequence pairs (ESPs) are used in
genome analysis, in this case to detect SVs.
Questions should be answered by drawing pictures. This assignment will count
for 4% of your course grade.
Two papers have been posted. The first is a policy paper that explains why SVs
are important and how they are being measured. Read all of it. The second is a
data paper that contains more detail than you need to know. Read only enough of
it to understand Figure 2.
EE Eichler, et al. 2007. Completing the map of human genetic variation. Nature
447: 161-165.
JM Kidd, et al. 2008. Mapping and sequencing of structural variation from eight
human genomes. Nature 453: 56-64.
The methods used can be summarized as follows. We already have a reference
sequence for the human genome. Now consider 8 additional genomes from 8
individuals. From each of these genomes we make a fosmid library, consisting of
millions of fosmid clones per library. The fosmid cloning system generates an
exceptionally narrow distribution of clone insert sizes, 40 ± 2.8 kb. Each of these
fosmid clones is sequenced from both ends, creating an ESP with two 500 bp
sequence reads separated by a known distance (40 kb) in the test genome from
which the fosmid clone was made. SVs are then detected by computationally
aligning ESPs to the reference genome. In the absence of an SV, that alignment
would be depicted as follows.
[Figure: an ESP from the test genome (test) aligned to the reference genome (REF); span = 40 kb]
Notice that the two ends of an ESP are represented by arrows. The directionality
refers to the fact that sequence reads always start from one or the other end of a
clone and work their way toward the interior. There is information in the arrow
and the directionality matters.
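
The two pieces of information carried by an aligned ESP (the span on the reference and the direction of each arrow) can be written down as a small check. The sketch below is only an illustration and is not taken from either paper: the names `MappedRead`, `reference_span`, and `is_concordant`, the use of "+" and "-" for arrow direction, the choice to measure the span from outer edge to outer edge, and the 3-standard-deviation tolerance are all assumptions made here. Only the 40 kb mean and the 2.8 kb spread come from the description above.

```python
from dataclasses import dataclass

EXPECTED_SPAN_BP = 40_000  # mean fosmid insert size (from the handout)
SPAN_SD_BP = 2_800         # spread of the insert size (from the handout)
N_SD = 3                   # assumption: tolerate up to 3 standard deviations


@dataclass
class MappedRead:
    """One end read of an ESP after alignment to the reference (hypothetical type)."""
    start: int   # leftmost reference coordinate of the read, in bp
    end: int     # rightmost reference coordinate of the read, in bp
    strand: str  # "+" if the arrow points rightward on the reference, "-" if leftward


def reference_span(a: MappedRead, b: MappedRead) -> int:
    """Distance on the reference between the outer edges of the two reads."""
    return max(a.end, b.end) - min(a.start, b.start)


def span_is_expected(a: MappedRead, b: MappedRead) -> bool:
    """True if the span on the reference matches the fosmid insert size."""
    return abs(reference_span(a, b) - EXPECTED_SPAN_BP) <= N_SD * SPAN_SD_BP


def arrows_point_inward(a: MappedRead, b: MappedRead) -> bool:
    """True if the left read points right and the right read points left."""
    left, right = sorted((a, b), key=lambda r: r.start)
    return left.strand == "+" and right.strand == "-"


def is_concordant(a: MappedRead, b: MappedRead) -> bool:
    """True if the ESP looks like the no-SV picture: expected span, inward arrows."""
    return span_is_expected(a, b) and arrows_point_inward(a, b)


# An ESP that aligns exactly as in the no-SV figure.
normal = (MappedRead(100_000, 100_500, "+"), MappedRead(139_500, 140_000, "-"))
print(reference_span(*normal), is_concordant(*normal))
```

An ESP that fails either check is a candidate for spanning an SV. Working out which kind of failure goes with which kind of SV is the point of the questions that follow.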
Question 1: Suppose that the test genome had a 10 kb deletion relative to the
reference genome. How would the ESP alignment be depicted? Given the fosmid
size limitations, what are the smallest and largest deletions that can be detected?
Question 2: Suppose that the test genome had a 10 kb insertion relative to the
reference genome. How would the ESP alignment be depicted? Given the fosmid
size limitations, what are the smallest and largest insertions that can be detected?
Question 3: Suppose that the test genome had a 10 kb inversion relative to the
reference genome. How would the ESP alignment be depicted? Given the fosmid
size limitations, what are the smallest and largest inversions that can be detected?
In all three questions you are expected to show more detail than Figure 1 of the first
paper. Show the expected distance between the two ends of the alignment on
the reference genome. The directions of the arrows matter. Consider information
from multiple ESPs if necessary.
All questions have equal weight.