Genetic Linkage Analysis
Dotan Schreiber
According to a series of presentations by M. Fishelson

Outline
• Introduction.
• Basic concepts and some background.
• Motivation for linkage analysis.
• Linkage analysis: main approaches.
• Latest developments.

“Genetic linkage analysis is a statistical method that is used to associate functionality of genes to their location on chromosomes.”
http://bioinfo.cs.technion.ac.il/superlink/

The Main Idea / Usage
Neighboring genes on a chromosome tend to stick together when passed on to offspring. Therefore, if a disease is often passed to offspring along with specific marker genes, it can be concluded that the gene(s) responsible for the disease lie close to these markers on the chromosome.

Basic Concepts
• Locus
• Allele
• Genotype
• Phenotype

Dominant vs. Recessive Allele
Classic example: eye color.
[Figure labels: heterozygote, homozygote]

Sex-Linked (X-Linked) Alleles
Most human cells contain 46 chromosomes:
• 2 sex chromosomes (X, Y): XY in males, XX in females.
• 22 pairs of chromosomes called autosomes.
Around 1000 human genes are found only on the X chromosome.
“…the Y chromosome essentially is reproduced via cloning from one generation to the next. This prevents mutant Y chromosome genes from being eliminated from male genetic lines. Subsequently, most of the human Y chromosome now contains genetic junk rather than genes.”
http://anthro.palomar.edu/biobasis/bio_3b.htm

Medical Perspective
When studying rare disorders, 4 general patterns of inheritance are observed:
• Autosomal recessive (e.g., cystic fibrosis).
– Appears in both male and female children of unaffected parents.
• Autosomal dominant (e.g., Huntington disease).
– Affected males and females appear in each generation of the pedigree.
– An affected parent transmits the phenotype to both male and female children.

Medical Perspective (continued)
• X-linked recessive (e.g., hemophilia).
– Many more males than females show the disorder.
– All daughters of an affected male are “carriers”.
– None of the sons of an affected male show the disorder or are carriers.
• X-linked dominant.
– Affected males pass the disorder to all of their daughters but to none of their sons.
– Affected heterozygous females married to unaffected males pass the condition to half of their sons and daughters.

Example
[Figure: pedigree with individuals numbered 1 to 10]
– After the disease is introduced into the family in generation #2, it appears in every generation: dominant!
– Fathers do not transmit the phenotype to their sons: X-linked!

Crossing Over
Sometimes in meiosis, homologous chromosomes exchange parts in a process called crossing-over, or recombination.

Recombination Fraction
The probability of a recombination between two genes is a monotone, nonlinear function of the physical distance between their loci on the chromosome.
(Linkage) $0 \le P(\text{Recombination}) \le 0.5$ (No linkage)

Linkage
The further apart two genes on the same chromosome are, the more likely it is that a recombination will occur between them.
Two genes are called linked if the recombination fraction between them is small (well below a 50% chance).

Linkage-Related Concepts
• Interference: a crossover in one region usually decreases the probability of a crossover in an adjacent region.
• CentiMorgan (cM): 1 cM is the distance between genes for which the recombination frequency is 1%.
• LOD score: the base-10 logarithm of the odds comparing the likelihood of the data under linkage at a given recombination fraction with the likelihood under no linkage (recombination fraction 0.5). It is used to test for linkage and to estimate the distance between genes.

Ultimate Goal: Linkage Mapping
With the following few minor problems:
– It is impossible to make controlled crosses in humans.
– Human progenies are rather small.
– The human genome is immense, and the distances between genes are large on average.

Possible Solutions
• Make general assumptions:
– Hardy-Weinberg equilibrium: assumes that the probability of an individual having a given genotype follows from the population allele frequencies (for two alleles with frequencies p and q: p², 2pq, q²).
– Linkage equilibrium: assumes that alleles at two different loci are independent of each other.
• Incorporate those assumptions into possible solutions:
– Elston-Stewart method.
– Lander-Green method.

Elston-Stewart method
• Input: a simple pedigree plus phenotype information about some of the people. These people are called typed.
• Simple pedigree: no cycles, single pair of founders.
[Figure: pedigree with the labels "founder", "leaf", and "1/2"]

Elston-Stewart method (continued)
• Output: the probability of the observed data, given some probability model for the transmission of alleles. The model is composed of:
– Founder probabilities: given by Hardy-Weinberg equilibrium.
– Penetrance probabilities: the probability of the phenotype, given the genotype.
– Transmission probabilities: the probability of a child having a certain genotype, given the parents' genotypes.

Elston-Stewart method (continued)
• Bottom-up: sum the conditional probabilities over all possible genotypes of the children first, and only then over the possible genotypes of the parents.
• Linear in the number of people.

Lander-Green method
• Computes the probability of the marker genotypes, given an inheritance vector: $P(M_i \mid V_i)$ at locus $i$, where $M_i$ is the marker data at this locus (the evidence) and $V_i$ is a certain inheritance vector.

Main Idea
• Let $a = (a_1, \ldots, a_{2f})$ be a vector of alleles assigned to the founders of the pedigree ($f$ is the number of founders).
• We want a graph representation of the restrictions that the observed marker genotypes impose on the vectors $a$ that can be assigned to the founder genes.
• The algorithm extracts only the vectors $a$ that are compatible with the marker data.
• $\Pr[m \mid v]$ is obtained via a sum over all compatible vectors $a$.

Example: marker data on a pedigree
[Figure: pedigree with individuals 1, 2, 11 to 14, and 21 to 24; six typed individuals carry the genotypes a/b, a/b, a/b, a/b, a/c, and b/d]

Example: Descent Graph
[Figure: the pedigree next to its descent graph over the founder genes 1 to 8; the typed individuals are labeled (a,b), (a,b), (a,b), (a,b), (a,c), and (b,d)]

Descent Graph
1. Assume that paternally inherited genes are on the left.
2. Assume that non-founders are placed in increasing order.
3. A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.
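The three conventions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-founder, two-child pedigree, the names `FOUNDER_GENES`, `PARENTS`, `MEIOSES`, and both helper functions are hypothetical and are not taken from the slides or from any linkage program. The sketch reads a bit of 1 as "the child received the parent's paternally originated gene" and a bit of 0 as "the child received the parent's maternally originated gene".

```python
def inheritance_vector(meioses):
    """Concatenate (paternal_bit, maternal_bit) pairs, non-founders in increasing order."""
    return [bit for person in sorted(meioses) for bit in meioses[person]]


def trace_founder_genes(founder_genes, parents, meioses):
    """Follow the gene flow and return {person: (paternal gene, maternal gene)}.

    Each pair is ordered with the paternally inherited gene on the left.
    Assumes every parent has a smaller id than its children.
    """
    genes = dict(founder_genes)
    for person in sorted(meioses):
        father, mother = parents[person]
        paternal_bit, maternal_bit = meioses[person]
        # Bit 1 selects the parent's paternal (left) gene, bit 0 the maternal (right) one.
        paternal_gene = genes[father][0] if paternal_bit == 1 else genes[father][1]
        maternal_gene = genes[mother][0] if maternal_bit == 1 else genes[mother][1]
        genes[person] = (paternal_gene, maternal_gene)
    return genes


# Hypothetical toy pedigree: founders 1 and 2 carry founder genes 1..4.
FOUNDER_GENES = {1: (1, 2), 2: (3, 4)}
# Non-founders 3 and 4 are both children of father 1 and mother 2.
PARENTS = {3: (1, 2), 4: (1, 2)}
# One (paternal_bit, maternal_bit) pair per non-founder.
MEIOSES = {3: (1, 0), 4: (0, 1)}

print(inheritance_vector(MEIOSES))  # [1, 0, 0, 1]
print(trace_founder_genes(FOUNDER_GENES, PARENTS, MEIOSES))
# {1: (1, 2), 2: (3, 4), 3: (1, 4), 4: (2, 3)}
```

Each non-founder contributes two bits, so a pedigree with n non-founders has an inheritance vector of length 2n.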
The gene flow in the descent graph corresponds to the inheritance vector:
v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )

Example: Founder Graph
[Figure: the descent graph next to the founder graph on the founder genes 1 to 8. Each typed individual contributes an edge labeled with its genotype: 1–3 (a,b), 3–5 (a,b), 4–6 (a,b), 4–7 (a,c), 6–8 (b,d). Vertex 2 is isolated.]

Find compatible allelic assignments for non-singleton components
1. Identify the set of compatible alleles for each vertex. This is the intersection of the genotypes on the edges that meet at the vertex. For example:
   {a,b} ∩ {a,b} = {a,b}
   {a,b} ∩ {b,d} = {b}

Possible Allelic Assignments

Vertex    Compatible alleles
1         {a,b}
2         {a,b,c,d}
3         {a,b}
4         {a}
5         {a,b}
6         {b}
7         {a,c}
8         {b,d}

Graph Component    Allelic Assignments
(2)                (a), (b), (c), (d)
(1,3,5)            (a,b,a), (b,a,b)
(4,6,7,8)          (a,b,c,d)

Computing P(m|v)
• If for some component there are no possible allelic assignments, then $\Pr[m \mid v] = 0$.
• The probability of a singleton component is 1, so we can ignore singletons.
• Let $a_{h_i}$ be a vector of alleles assigned to the vertices of component $C_i$, and let $A_i$ be the set of such vectors that are compatible with the marker data.

$$\Pr[a_{h_i}] = \prod_{\{j \,:\, j \in C_i\}} \Pr[a_j] \qquad \text{(a product over at most } 2f \text{ elements)}$$

$$\Pr[C_i] = \sum_{\{h \,:\, a_{h_i} \in A_i\}} \Pr[a_{h_i}] \qquad \text{(at most 2 terms)}$$

$$\Pr[m \mid v] = \prod_{i=1}^{m} \Pr[C_i]$$

In the last product, the upper limit $m$ counts the components of the founder graph.
• Linear in the number of founders.

Latest News: SuperLink
• Combines the approaches covered here in one unified program.
• Has other built-in abilities that increase its computational efficiency.
• Its authors claim that it is more capable and faster than other related programs.
• http://bioinfo.cs.technion.ac.il/superlink/
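As a worked illustration of the "Computing P(m|v)" slide, the sketch below evaluates the product-of-sums formula on the two non-singleton components from the founder graph example. The allele frequencies in `ALLELE_FREQ` are made up for illustration (the slides give none), and the function names are hypothetical. The components and their compatible assignments are copied from the table of allelic assignments.

```python
from math import prod

# Hypothetical allele frequencies (an assumption, not from the slides).
ALLELE_FREQ = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# Non-singleton components of the founder graph and their compatible
# allelic assignments; the singleton component (2) has probability 1
# and is left out.
COMPONENTS = {
    (1, 3, 5): [("a", "b", "a"), ("b", "a", "b")],
    (4, 6, 7, 8): [("a", "b", "c", "d")],
}


def assignment_prob(assignment, freq):
    """Pr[a_hi]: product of the frequencies of the alleles in one assignment."""
    return prod(freq[allele] for allele in assignment)


def component_prob(assignments, freq):
    """Pr[C_i]: sum over the compatible assignments of one component."""
    return sum(assignment_prob(a, freq) for a in assignments)


def marker_prob_given_vector(components, freq):
    """Pr[m|v]: product of the component probabilities.

    A component with no compatible assignment contributes 0,
    which makes the whole product 0.
    """
    return prod(component_prob(assignments, freq) for assignments in components.values())


print(marker_prob_given_vector(COMPONENTS, ALLELE_FREQ))
```

With these assumed frequencies, component (1,3,5) contributes 0.4·0.3·0.4 + 0.3·0.4·0.3 = 0.084 and component (4,6,7,8) contributes 0.4·0.3·0.2·0.1 = 0.0024, so the result is about 2.0e-4.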