Basic Concepts in Genetics

advertisement
Tutorial #6
by Ma’ayan Fishelson
Based on notes by Terry Speed
Background – Lander & Green’s HMM
• Complexity:
– Linear in the number of loci, and number of founders.
– Exponential in the number of non-founders.
• Recombinations across successive intervals are
independent  sequential computation across loci using
the forward-backward algorithm is enabled.
• The algorithm computing the probability of the data
given an inheritance vector is linear in the number of
founders.
• We need to sum over all possible inheritance vectors
(exponential in the number of non-founders).
Goal
• Compute Pr[ml | vl], at locus l.
marker data at this
locus (evidence).
A certain
inheritance vector.
References
The algorithm presented herein was introduced by
Sobel and Lange [2], and Kruglyak et al. [1].
1. E. Sobel and K. Lange. Descent graphs in pedigree
analysis: applications to haplotyping, location
score, and marker-sharing statistics. Am. J. Hum.
Genet., 58:1323--1337. 1996.
2. L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S.
Lander. Parametric and nonparametric linkage
analysis: a unified multipoint approach. Am. J.
Hum. Genet., 58:1347--1363, 1996.
Main Idea
• Let a = (a1,…,a2f) be a vector of alleles assigned to
founders of the pedigree (f is the number of
founders).
• We want to represent by a graph the restrictions
imposed by the observed marker genotypes on the
vectors a that can be assigned to the founder genes.
• The algorithm extracts from the graph only vectors
a compatible with the marker data.
• Pr[m|v] is obtained via a sum over all compatible
vectors a.
Example – marker data on a
pedigree
1
11
2
12
13
a/b
a/b
21
22
23
24
a/b
a/b
a/c
b/d
14
Descent Graph
• Corresponds to a specific inheritance
vector.
• Vertices: the individuals’ genes (2 genes
for each individual in the pedigree).
• Edges: represent the gene flow specified
by the inheritance vector. A child’s gene is
connected by an edge to the parent’s gene
from which it flowed.
Example – Descent Graph (vertices)
1
2
11
12
13
14
21
22
23
24
a/b
a/b
a/b
a/b
a/c
Descent Graph
1
2
(a,b)
Assume that the descent graph
vertices below represent the
pedigree on the left.
b/d
3
4
(a,b)
(a,b)
5
6
(a,b)
(a,c)
7
8
(b,d)
Example – Descent Graph (cont.)
Descent Graph
1
2
(a,b)
3
4
(a,b)
(a,b)
5
6
(a,b)
(a,c)
7
8
(b,d)
Assume that paternally inherited genes are on the left.
2. Assume that non-founders are placed in increasing
order.
3. A ‘1’ (‘0’) is used to denote a paternally (maternally)
originated gene.
 The gene flow above corresponds to the inheritance
vector: v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )
1.
Founder Graph
• Vertices: the founder genes.
• Edges: connect the genes appearing
together in a genotyped individual for the
gene flow specified by the inheritance
vector v.
• Note: the edges are labeled with the
genotype of the corresponding individuals.
Example – Founder Graph
Descent Graph
1
2
3
4
5
(a,b)
(a,b)
6
7
(a,b)
(a,b)
(a,c)
(b,d)
Founder Graph
5
(a,b)
3
(a,b)
2
1
6
(b,d)
8
(a,b)
8
4
(a,c)
7
Founder Graph
• Includes m connected components, C1,…Cm.
• The founder genes assigned to different
components appear in different genotyped
individuals, by construction.
• Under random mating and Hardy-Weinberg
equilibrium, the vectors of alleles assigned to
different components are independent
• Each component can be processed individually.
Singleton Components
• The vertices corresponding to genes that
never passed through genotyped individuals
form singleton components.
• Any allele type can be assigned to singleton
components.
Singleton
component
5
(a,b)
3
(a,b)
2
1
6
(b,d)
8
(a,b)
4
(a,c)
7
Singleton Components (cont.)
3
1
2
(a,b)
4
(a,b)
(a,b)
5
6
(a,b)
(a,c)
7
8
(b,d)
Find compatible allelic assignments
for non-singleton components
1.
Identify the set of compatible alleles for each
vertex. This is the intersection of the genotypes.
attached to the edges incident to the vertex.
{a,b} ∩ {a,b} = {a,b}
5
(a,b)
3
(a,b)
2
1
{a,b} ∩ {b,d} = {b}
6
(b,d)
8
(a,b)
4
(a,c)
7
Find compatible allelic assignments
for non-singleton components (cont.)
2.
Utilize the whole structure of the component to find allelic
assignments compatible with observed genotypes for the
component.
I.
Pick an arbitrary vertex in the component.
II.
If the set of compatible alleles for that vertex contains one
element  select that allele type.
Otherwise, repeat step III for each of the 2 allele types.
III.
Traverse the graph & record the alleles assigned to each vertex
to obtain a compatible allelic assignment (when selecting one
allele type, the allele types of the adjacent vertices are
determined…).
IV.
If an incompatibility is encountered at some point  there’s no
compatible assignment for the allele type we started from.
Possible Allelic Assignments
(example)
{a,b}
5
(a,b)
{a,b}
3
(a,b)
2
{a,b,c,d}
1
{a,b}
{b}
{a}
(a,b)
6
4
(a,c)
(b,d)
8
7
{b,d}
{a,c}
Graph Component
(2)
Allelic Assignments
(a), (b), (c), (d)
(1,3,5)
(4,6,7,8)
(a,b,a), (b,a,b)
(a,b,c,d)
Compatible Allelic Assignments
• Denote by A1,…,Am the set of compatible allelic
assignments obtained for each connected
component at the end of the algorithm.
• Except for singleton components, each Ai contains
0,1, or 2 assignments.
• If for some i, Ai is empty  Pr[m|v] = 0.
• The compatible assignments are those in the
Cartesian product A1x…xAm.
Computing Pr[m|v]
• The probability of singleton components is 1  we can
ignore them.
• Let ahi be an element of Ai (a vector of alleles assigned
to the vertices of component Ci).
Pr[ ahi ] 
 Pr[a
j
]
hi
]
{ j: jCi }
Pr[Ci ] 
 Pr[ a
{h:ah iAi }
Pr[ m | v ] 
m
 Pr[C
i 1
i
]
Computing Pr[m|v] – Complexity
Pr[ ahi ] 
 Pr[a
j
]
The product is over 2f
elements.
hi
]
The summation contains at
most 2 terms.
{ j: jCi }
Pr[Ci ] 
 Pr[ a
{h:ah iAi }
Pr[ m | v ] 
m
 Pr[C
i
]
i 1
The maximum number of operations is 4f.
The computation scales linearly with the no. of founders.
Download