Tutorial #6 by Ma’ayan Fishelson Based on notes by Terry Speed Background – Lander & Green’s HMM • Complexity: – Linear in the number of loci, and number of founders. – Exponential in the number of non-founders. • Recombinations across successive intervals are independent sequential computation across loci using the forward-backward algorithm is enabled. • The algorithm computing the probability of the data given an inheritance vector is linear in the number of founders. • We need to sum over all possible inheritance vectors (exponential in the number of non-founders). Goal • Compute Pr[ml | vl], at locus l. marker data at this locus (evidence). A certain inheritance vector. References The algorithm presented herein was introduced by Sobel and Lange [2], and Kruglyak et al. [1]. 1. E. Sobel and K. Lange. Descent graphs in pedigree analysis: applications to haplotyping, location score, and marker-sharing statistics. Am. J. Hum. Genet., 58:1323--1337. 1996. 2. L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S. Lander. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58:1347--1363, 1996. Main Idea • Let a = (a1,…,a2f) be a vector of alleles assigned to founders of the pedigree (f is the number of founders). • We want to represent by a graph the restrictions imposed by the observed marker genotypes on the vectors a that can be assigned to the founder genes. • The algorithm extracts from the graph only vectors a compatible with the marker data. • Pr[m|v] is obtained via a sum over all compatible vectors a. Example – marker data on a pedigree 1 11 2 12 13 a/b a/b 21 22 23 24 a/b a/b a/c b/d 14 Descent Graph • Corresponds to a specific inheritance vector. • Vertices: the individuals’ genes (2 genes for each individual in the pedigree). • Edges: represent the gene flow specified by the inheritance vector. A child’s gene is connected by an edge to the parent’s gene from which it flowed. Example – Descent Graph (vertices) 1 2 11 12 13 14 21 22 23 24 a/b a/b a/b a/b a/c Descent Graph 1 2 (a,b) Assume that the descent graph vertices below represent the pedigree on the left. b/d 3 4 (a,b) (a,b) 5 6 (a,b) (a,c) 7 8 (b,d) Example – Descent Graph (cont.) Descent Graph 1 2 (a,b) 3 4 (a,b) (a,b) 5 6 (a,b) (a,c) 7 8 (b,d) Assume that paternally inherited genes are on the left. 2. Assume that non-founders are placed in increasing order. 3. A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene. The gene flow above corresponds to the inheritance vector: v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 ) 1. Founder Graph • Vertices: the founder genes. • Edges: connect the genes appearing together in a genotyped individual for the gene flow specified by the inheritance vector v. • Note: the edges are labeled with the genotype of the corresponding individuals. Example – Founder Graph Descent Graph 1 2 3 4 5 (a,b) (a,b) 6 7 (a,b) (a,b) (a,c) (b,d) Founder Graph 5 (a,b) 3 (a,b) 2 1 6 (b,d) 8 (a,b) 8 4 (a,c) 7 Founder Graph • Includes m connected components, C1,…Cm. • The founder genes assigned to different components appear in different genotyped individuals, by construction. • Under random mating and Hardy-Weinberg equilibrium, the vectors of alleles assigned to different components are independent • Each component can be processed individually. Singleton Components • The vertices corresponding to genes that never passed through genotyped individuals form singleton components. • Any allele type can be assigned to singleton components. Singleton component 5 (a,b) 3 (a,b) 2 1 6 (b,d) 8 (a,b) 4 (a,c) 7 Singleton Components (cont.) 3 1 2 (a,b) 4 (a,b) (a,b) 5 6 (a,b) (a,c) 7 8 (b,d) Find compatible allelic assignments for non-singleton components 1. Identify the set of compatible alleles for each vertex. This is the intersection of the genotypes. attached to the edges incident to the vertex. {a,b} ∩ {a,b} = {a,b} 5 (a,b) 3 (a,b) 2 1 {a,b} ∩ {b,d} = {b} 6 (b,d) 8 (a,b) 4 (a,c) 7 Find compatible allelic assignments for non-singleton components (cont.) 2. Utilize the whole structure of the component to find allelic assignments compatible with observed genotypes for the component. I. Pick an arbitrary vertex in the component. II. If the set of compatible alleles for that vertex contains one element select that allele type. Otherwise, repeat step III for each of the 2 allele types. III. Traverse the graph & record the alleles assigned to each vertex to obtain a compatible allelic assignment (when selecting one allele type, the allele types of the adjacent vertices are determined…). IV. If an incompatibility is encountered at some point there’s no compatible assignment for the allele type we started from. Possible Allelic Assignments (example) {a,b} 5 (a,b) {a,b} 3 (a,b) 2 {a,b,c,d} 1 {a,b} {b} {a} (a,b) 6 4 (a,c) (b,d) 8 7 {b,d} {a,c} Graph Component (2) Allelic Assignments (a), (b), (c), (d) (1,3,5) (4,6,7,8) (a,b,a), (b,a,b) (a,b,c,d) Compatible Allelic Assignments • Denote by A1,…,Am the set of compatible allelic assignments obtained for each connected component at the end of the algorithm. • Except for singleton components, each Ai contains 0,1, or 2 assignments. • If for some i, Ai is empty Pr[m|v] = 0. • The compatible assignments are those in the Cartesian product A1x…xAm. Computing Pr[m|v] • The probability of singleton components is 1 we can ignore them. • Let ahi be an element of Ai (a vector of alleles assigned to the vertices of component Ci). Pr[ ahi ] Pr[a j ] hi ] { j: jCi } Pr[Ci ] Pr[ a {h:ah iAi } Pr[ m | v ] m Pr[C i 1 i ] Computing Pr[m|v] – Complexity Pr[ ahi ] Pr[a j ] The product is over 2f elements. hi ] The summation contains at most 2 terms. { j: jCi } Pr[Ci ] Pr[ a {h:ah iAi } Pr[ m | v ] m Pr[C i ] i 1 The maximum number of operations is 4f. The computation scales linearly with the no. of founders.