1) Heterozygosity

Several measures of heterozygosity exist. Their values range from zero (no heterozygosity) to nearly 1.0 (for a system with a large number of equally frequent alleles). We will focus primarily on expected heterozygosity ($H_E$, or gene diversity, $D$). The simplest way to calculate it for a single locus is

$$H = 1 - \sum_{i=1}^{k} p_i^2$$

where $p_i$ is the frequency of the $i$th of $k$ alleles. If we want the gene diversity over several loci, we need a double summation and a second subscript:

$$H = 1 - \frac{1}{m} \sum_{j=1}^{m} \sum_{i} p_{ij}^2$$

where $p_{ij}$ is the frequency of the $i$th allele at the $j$th of $m$ loci, so that $H$ is averaged over loci.

Under Hardy-Weinberg (H.W.) equilibrium with two alleles, heterozygosity is given by $2pq$. The rest of the expression, $p^2 + q^2$, is the homozygosity. The heterozygosity of a two-allele system is described by a concave-down parabola that starts at zero (when $p = 0$), rises to a maximum at $p = 0.5$, and returns to zero when $p = 1$. In fact, for any multi-allelic system, heterozygosity is greatest when $p_1 = p_2 = p_3 = \dots = p_k$. The maximum heterozygosity for a 10-allele system comes when each allele has a frequency of 0.1; $H$ then equals 0.9.

2) Identity

Two genes G1 and G2 are identical by descent (i.b.d.) if both are physical copies of the same ancestral gene, or if one is a copy of the other. G1 and G2 are identical by state (i.b.s.) if they represent the same allele.

The kinship between two relatives, $f_{ij}$, is the probability that genes drawn at random from the same autosomal locus in $i$ and in $j$ are i.b.d. The inbreeding coefficient of an individual is the probability that his or her two genes at an autosomal locus are i.b.d.

- Every mutation creates a new allele.
- Identity in state = identity by descent (IBD).
- $F = 1 - H$ (inbreeding coefficient) is the probability of IBD = 1/2.

3) Back to genetic drift

Assume a population of size $N$, and therefore $2N$ alleles in the population. Imagine eggs and sperm released randomly into the environment (e.g. the sea). What is the probability that 2 gametes drawn at random carry the same allele?
Therefore, after 1 generation the level of inbreeding is $F_1 = 1/(2N)$. After $t$ generations the probability is

$$F_t = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}$$

which, starting from $F_0 = 0$, gives

$$F_t = 1 - \left(1 - \frac{1}{2N}\right)^t$$

- Genetic drift will make initially identical populations different.
- Eventually, each population will become fixed for a single allele, and different populations may be fixed for different alleles.
- If there are very many populations, the proportion of populations fixed for each allele will correspond to the initial frequency of that allele.
- Small populations will diverge more rapidly.

The effective population size is determined by:

- Large variation in the number of offspring
- Overlapping generations
- Fluctuations in population size
- Unequal numbers of males and females contributing to reproduction

4) Founder effect

| Population | # of founders | # of generations | Current size |
|---|---|---|---|
| Costa Rica | 4,000 | 12 | 2,500,000 |
| Finland | 500 | 80-100 | 5,000,000 |
| Hutterites | 80 | 14 | 36,000 |
| Japan | 1,000 | 80-100 | 120,000,000 |
| Iceland | 25,000 | 40 | 300,000 |
| Newfoundland | 25,000 | 16 | 500,000 |
| Quebec | 2,500 | 12-16 | 6,000,000 |
| Sardinia | 500 | 400 | 1,660,000 |

5) Coalescence

- Simplification: 0, 1 or 2 offspring
- Coalesce: have the same parent
- Probability to coalesce: $1/N$
- Probability not to coalesce: $1 - 1/N$
- Probability not to coalesce over $t$ generations: $(1 - 1/N)^t$
- Average time to coalesce for 2 genes: $N$
- For the whole population: $2N$

6) Genetic drift and mutation

With a mutation rate $\mu$ per allele per generation, the probability that neither of 2 alleles has mutated is $(1-\mu)^2$, so

$$F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1-\mu)^2$$

At equilibrium,

$$\hat{F} = F_t = F_{t-1} \approx \frac{1}{1 + 4N\mu}$$

If one also includes gene flow at migration rate $m$:

$$F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1 - m - \mu)^2$$

7) Balance between mutation and selection

Mutations can provide a balancing force to selection. Let us assume a mutation rate of $\mu$ from $A_2$ to $A_1$, with genotype fitnesses $1-r$, $1$ and $1-s$ for $A_1A_1$, $A_1A_2$ and $A_2A_2$, so that the mean fitness is $\bar{W} = (1-r)p^2 + 2pq + (1-s)q^2$. With selection followed by mutation, the dynamics equation is:

$$\Delta p = \frac{(1-r)p^2 + pq}{\bar{W}} - p + \mu\,\frac{pq + (1-s)q^2}{\bar{W}}$$

An equilibrium is obtained when

$$q = (1-\mu)\,\frac{pq + (1-s)q^2}{\bar{W}}$$

8) How to compute kinship

$f_{AC}$ is the coancestry of A with C etc., i.e. the probability that 2 gametes taken at random, one from A and one from C, are IBD. The inbreeding coefficient of an individual is thus the coancestry of its parents. Let $f_{AA}$ be the probability that 2 gametes taken at random from A are IBD.
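(Pedigree figure: A and B are the parents of P, C and D are the parents of Q, and P and Q are the parents of X.)

$$F_X = \Phi_{PQ} = \tfrac{1}{4}\Phi_{AD} + \tfrac{1}{4}\Phi_{AC} + \tfrac{1}{4}\Phi_{BC} + \tfrac{1}{4}\Phi_{BD}$$

where $\Phi$ denotes the kinship (coancestry) coefficient.

(Pedigree figure: A and B are the parents of both P and Q, and P and Q are the parents of X.)

In general $\Phi_{XX} = 0.5(1 + F_X)$. If A and B are unrelated and non-inbred, then $F_X = \Phi_{PQ} = 1/4$ and $\Phi_{XX} = 5/8$.

9) General algorithm

(Pedigree figure with individuals numbered 1 to 6.)

- If $i$ originates from parents $k$ and $l$, and $j$ is not a descendant of $i$: $\Phi_{ij} = \Phi_{ji} = \tfrac{1}{2}(\Phi_{jk} + \Phi_{jl})$
- If $i$ originates from parents $k$ and $l$: $\Phi_{ii} = \tfrac{1}{2} + \tfrac{1}{2}\Phi_{kl}$

10)

Kinship matrix $\Phi_{ij}$ for the six individuals. The entries are consistent with 1 and 2 being unrelated founders, 3 and 4 their children, and 5 and 6 the children of 3 and 4.

| i \ j | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1/2 | 0 | 1/4 | 1/4 | 1/4 | 1/4 |
| 2 | 0 | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 |
| 3 | 1/4 | 1/4 | 1/2 | 1/4 | 3/8 | 3/8 |
| 4 | 1/4 | 1/4 | 1/4 | 1/2 | 3/8 | 3/8 |
| 5 | 1/4 | 1/4 | 3/8 | 3/8 | 5/8 | 3/8 |
| 6 | 1/4 | 1/4 | 3/8 | 3/8 | 3/8 | 5/8 |

Identity coefficients

We can now summarize the kinship coefficient of some basic family relations:

| Relation | $\Phi$ |
|---|---|
| Parent-Offspring | 1/4 |
| Half Sibling | 1/8 |
| Full Sibling | 1/4 |
| First Cousins | 1/16 |
| Double First Cousins | 1/8 |
| Second Cousins | 1/64 |
| Uncle-Nephew | 1/8 |

11) Detailed Identity States

(Figure: identity states for the two alleles, allele 1 and allele 2, of individuals $i$ and $j$.)

The detailed states $S^*$ are merged into condensed states $S$:

- $S_3 = S^*_2 \cup S^*_3$
- $S_5 = S^*_4 \cup S^*_5$
- $S_7 = S^*_9 \cup S^*_{12}$
- $S_8 = S^*_{10} \cup S^*_{11} \cup S^*_{13} \cup S^*_{14}$

With $\Delta_r$ the probability of condensed state $S_r$:

- $\Delta_1, \Delta_2, \Delta_3, \Delta_4$ are 0 when $i$ is not inbred.
- $\Delta_1, \Delta_2, \Delta_5, \Delta_6$ are 0 when $j$ is not inbred.
- $\Delta_1, \Delta_3, \Delta_5, \Delta_7$ and $\Delta_8$ are 0 when $i$ and $j$ are unrelated.

$$\Phi_{ij} = \Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8$$

12)

| Relation | $\Delta_7$ | $\Delta_8$ | $\Delta_9$ | $\Phi$ |
|---|---|---|---|---|
| Parent-Offspring | 0 | 1 | 0 | 1/4 |
| Half Sibling | 0 | 1/2 | 1/2 | 1/8 |
| Full Sibling | 1/4 | 1/2 | 1/4 | 1/4 |
| First Cousins | 0 | 1/4 | 3/4 | 1/16 |
| Double First Cousins | 1/16 | 6/16 | 9/16 | 1/8 |
| Second Cousins | 0 | 1/16 | 15/16 | 1/64 |
| Uncle-Nephew | 0 | 1/2 | 1/2 | 1/8 |

Genotype prediction

What is the probability that $j$ has a given genotype, given the genotype of $i$? For example, if my uncle has a genetic disease, what is the probability that I will also have it? What is the probability that brothers born to inbred parents are homozygous for a disease-causing gene?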
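As an aside, the two recursive rules of the general algorithm in section 9 can be sketched in Python. This is a minimal sketch rather than the lecture's own code: the function name `kinship_matrix`, the layout of the `parents` dictionary, and the assumed pedigree (1 and 2 unrelated, non-inbred founders; 3 and 4 their children; 5 and 6 the children of 3 and 4, inferred from the matrix in section 10) are illustrative assumptions.

```python
from fractions import Fraction


def kinship_matrix(parents):
    """Kinship coefficients phi[i][j] for a pedigree.

    `parents` maps each individual to a (k, l) pair of parents, or to None
    for a founder. Assumption of this sketch: founders are unrelated and
    non-inbred, and the dict lists parents before their children.
    """
    half = Fraction(1, 2)
    phi = {}
    earlier = []  # individuals already processed (none descend from i)
    for i, par in parents.items():
        phi[i] = {}
        if par is None:
            # Founder: unrelated to everyone processed so far.
            for j in earlier:
                phi[i][j] = phi[j][i] = Fraction(0)
            phi[i][i] = half
        else:
            k, l = par
            # Rule 1: phi_ij = 1/2 (phi_jk + phi_jl), j not a descendant of i.
            for j in earlier:
                phi[i][j] = phi[j][i] = half * (phi[j][k] + phi[j][l])
            # Rule 2: phi_ii = 1/2 + 1/2 phi_kl.
            phi[i][i] = half + half * phi[k][l]
        earlier.append(i)
    return phi


# Hypothetical encoding of the six-person pedigree described above.
pedigree = {1: None, 2: None, 3: (1, 2), 4: (1, 2), 5: (3, 4), 6: (3, 4)}
phi = kinship_matrix(pedigree)
print(phi[5][5], phi[5][6], phi[3][5])  # 5/8 3/8 3/8
```

Exact fractions are used so that the result can be compared entry by entry with the matrix in section 10. Processing individuals in birth order guarantees that rule 1 is only ever applied to a `j` that is not a descendant of `i`.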
The prediction conditions on the nine condensed identity states:

$$\Pr(j = m/n \mid i = k/l) = \sum_{r=1}^{9} \Pr(j = m/n \mid S_r, i = k/l)\,\Pr(S_r \mid i = k/l)$$

If $i$ is heterozygous ($k \ne l$), with an inbreeding coefficient $f_i$:

$$\Pr(S_r \mid i = k/l) = \frac{\Pr(S_r, i = k/l)}{\Pr(i = k/l)} =
\begin{cases}
0 & r \le 4 \\
\dfrac{\Delta_r\, 2p_k p_l}{(1 - f_i)\, 2p_k p_l} = \dfrac{\Delta_r}{1 - f_i} & r > 4
\end{cases}$$

If $i$ is homozygous, with an inbreeding coefficient $f_i$:

$$\Pr(S_r \mid i = k/k) = \frac{\Pr(S_r, i = k/k)}{\Pr(i = k/k)} =
\begin{cases}
\dfrac{\Delta_r\, p_k}{f_i p_k + (1 - f_i) p_k^2} = \dfrac{\Delta_r}{f_i + (1 - f_i) p_k} & r \le 4 \\
\dfrac{\Delta_r\, p_k^2}{f_i p_k + (1 - f_i) p_k^2} = \dfrac{\Delta_r\, p_k}{f_i + (1 - f_i) p_k} & r > 4
\end{cases}$$

The term $\Pr(j = m/n \mid S_r, i = k/l)$ depends on the state:

- $S_1, S_7$: $j$ has the same genotype as $i$
- $S_2, S_4, S_6, S_9$: $j$ is independent of $i$
- $S_3, S_8$: $j$ shares one gene with $i$
- $S_5$: $j$ is either $k/k$ or $l/l$

When $j$ is independent of $i$, its genotype simply follows the H.W. equilibrium. When $j$ is equivalent to $i$, the probability is one if $m/n = k/l$ and zero otherwise. When $j$ shares one allele with $i$, $m/n$ and $k/l$ must overlap in one allele, and the other allele has the H.W. distribution.

Example: What is the blood type of non-inbred siblings?

For non-inbred siblings $\Delta_7 = 1/4$, $\Delta_8 = 1/2$ and $\Delta_9 = 1/4$, so

$$\Pr(j = A/B \mid i = A/B) = \tfrac{1}{4}\Pr(j = A/B \mid S_7, i = A/B) + \tfrac{1}{2}\Pr(j = A/B \mid S_8, i = A/B) + \tfrac{1}{4}\Pr(j = A/B \mid S_9, i = A/B)$$

$$= \tfrac{1}{4} \cdot 1 + \tfrac{1}{2}\left(\tfrac{1}{2}p_A + \tfrac{1}{2}p_B\right) + \tfrac{1}{4} \cdot 2p_A p_B$$

A second sibling example: the probability that $j$ has blood type A given that $i$ is $O/O$.

$$\Pr(j = A \mid i = O/O) = \Pr(j = A/O \mid i = O/O) + \Pr(j = A/A \mid i = O/O)$$

where

$$\Pr(j = A/O \mid i = O/O) = \tfrac{1}{4}\Pr(j = A/O \mid S_7, i = O/O) + \tfrac{1}{2}\Pr(j = A/O \mid S_8, i = O/O) + \tfrac{1}{4}\Pr(j = A/O \mid S_9, i = O/O)$$

and similarly for $A/A$. Collecting the terms,

$$\Pr(j = A \mid i = O/O) = \tfrac{1}{4} \cdot 0 + \tfrac{1}{2} \cdot 0 + \tfrac{1}{4}p_A^2 + \tfrac{1}{2}p_A + \tfrac{1}{4} \cdot 2p_A p_O$$

Risk Ratios and Genetic Model Discrimination

Let us assume that each person in the population is assigned a factor $X = 1$ if he or she is affected by a condition and $X = 0$ otherwise. The prevalence of the condition is $K = E(X)$. Given two non-inbred relatives $i$ and $j$, and given that $i$ is affected, what is the probability that $j$ is affected?
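Denote this recurrence risk for relatives of type $R$ by $K_R = P(X_j = 1 \mid X_i = 1)$. Then

$$P(X_j = 1, X_i = 1) = P(X_j = 1 \mid X_i = 1)\,P(X_i = 1) = K_R K = E(X_i X_j)$$

$$P(X_j = 1 \mid X_i = 1) = \frac{E(X_i X_j)}{K} = \frac{\mathrm{cov}(X_i, X_j) + K^2}{K} = \frac{\mathrm{cov}(X_i, X_j)}{K} + K$$

This result simply represents the fact that the extra risk for $j$ results from the covariance of $X$ between $i$ and $j$. The risk ratio $\lambda_R = K_R / K$ thus satisfies:

$$\lambda_R - 1 = \frac{\mathrm{cov}(X_i, X_j)}{K^2}$$

Let us compute this covariance and, from it, the risk ratio. Let us assume that a given property is defined by a single gene with multiple alleles, and let $\theta_{kl}$ be the expected value of $X$ for genotype $k/l$:

$$E(X) = \sum_k \sum_l \theta_{kl}\, p_k p_l$$

For the sake of simplicity let us normalize $E(X) = 0$ and decompose:

$$\theta_{kl} = \alpha_k + \alpha_l + \delta_{kl}; \qquad \sum_k \alpha_k p_k = 0; \qquad \sum_k \delta_{kl}\, p_k = 0$$

Then

$$E(X_i X_j) = \sum_k \sum_l \sum_m \sum_n \theta_{mn}\, \theta_{kl}\; p(m, n \mid k, l)\; p_k p_l$$

$$= \Delta_{7ij} \sum_k \sum_l (\alpha_k + \alpha_l + \delta_{kl})^2 p_k p_l + \Delta_{8ij} \sum_k \sum_l \sum_m (\alpha_k + \alpha_l + \delta_{kl})(\alpha_k + \alpha_m + \delta_{km})\, p_k p_l p_m + \Delta_{9ij} \sum_k \sum_l \sum_m \sum_n (\alpha_k + \alpha_l + \delta_{kl})(\alpha_m + \alpha_n + \delta_{mn})\, p_k p_l p_m p_n$$

The cross terms vanish because of the normalization, and the $\Delta_9$ term is zero, leaving

$$= \Delta_{7ij}\left[2\sum_k \alpha_k^2 p_k + \sum_k \sum_l \delta_{kl}^2\, p_k p_l\right] + \Delta_{8ij} \sum_k \alpha_k^2 p_k$$

$$= 2\left(\tfrac{1}{2}\Delta_{7ij} + \tfrac{1}{4}\Delta_{8ij}\right) 2\sum_k \alpha_k^2 p_k + \Delta_{7ij} \sum_k \sum_l \delta_{kl}^2\, p_k p_l$$

$$= 2\Phi_{ij}\,\sigma_a^2 + \Delta_{7ij}\,\sigma_d^2$$

where $\sigma_a^2 = 2\sum_k \alpha_k^2 p_k$ is the additive variance and $\sigma_d^2 = \sum_k \sum_l \delta_{kl}^2\, p_k p_l$ is the dominance variance.

Substituting $\Phi$ and $\Delta_7$ for each relative type gives:

| R | Relative type | Risk ratio $\lambda_R - 1$ |
|---|---|---|
| M | Identical twin | $\sigma_a^2/K^2 + \sigma_d^2/K^2$ |
| S | Sibling | $\sigma_a^2/(2K^2) + \sigma_d^2/(4K^2)$ |
| 1 | First degree | $\sigma_a^2/(2K^2)$ |
| 2 | Second degree | $\sigma_a^2/(4K^2)$ |
| 3 | Third degree | $\sigma_a^2/(8K^2)$ |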
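The relation between the covariance and the risk ratio can be checked numerically. The sketch below evaluates $(2\Phi\sigma_a^2 + \Delta_7\sigma_d^2)/K^2$ for each relative type. It is an illustration only: the function name `excess_risk_ratio`, the `RELATIVES` table, and the numerical values chosen for $K$, $\sigma_a^2$ and $\sigma_d^2$ are hypothetical, and "first degree" is taken to mean parent-offspring.

```python
from fractions import Fraction as Fr

# (kinship Phi, Delta_7) for non-inbred relative pairs.
RELATIVES = {
    "identical twin": (Fr(1, 2), Fr(1)),
    "sibling": (Fr(1, 4), Fr(1, 4)),
    "first degree": (Fr(1, 4), Fr(0)),   # parent-offspring
    "second degree": (Fr(1, 8), Fr(0)),
    "third degree": (Fr(1, 16), Fr(0)),
}


def excess_risk_ratio(phi, delta7, var_a, var_d, K):
    """Return cov(X_i, X_j) / K^2 = (2*Phi*var_a + Delta_7*var_d) / K^2."""
    return (2 * phi * var_a + delta7 * var_d) / K**2


# Hypothetical inputs: 1% prevalence, additive and dominance variances.
K = Fr(1, 100)
var_a = Fr(4, 10000)
var_d = Fr(1, 10000)

excess = {
    name: excess_risk_ratio(phi, delta7, var_a, var_d, K)
    for name, (phi, delta7) in RELATIVES.items()
}
for name, value in excess.items():
    print(f"{name}: {value}")
```

With these inputs the additive contribution halves with each further degree of relationship, while the dominance variance contributes only for identical twins and siblings, the two relative types with $\Delta_7 > 0$.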