# statgen5

```
1) Heterozygosity
• Several measures of heterozygosity exist. Their values range
  from zero (no heterozygosity) to nearly 1.0 (for a system with
  a large number of equally frequent alleles). We will focus
  primarily on expected heterozygosity (H_E, or gene diversity,
  D). The simplest way to calculate it for a single locus is:
      H = 1 - \sum_{i=1}^{k} p_i^2
  where p_i is the frequency of the i-th of k alleles.
• If we want the gene diversity over several loci, we need
  double summation and subscripting. With p_{ij} the frequency
  of the i-th allele at the j-th of m loci, and the single-locus
  values averaged over the loci:
      H = 1 - (1/m) \sum_{j=1}^{m} \sum_{i} p_{ij}^2
• Under Hardy-Weinberg (H.W.) equilibrium, heterozygosity is
  given by 2pq. The rest of the expression (p^2 + q^2) is the
  homozygosity.
• The heterozygosity for a two-allele system is described by a
  concave-down parabola that starts at zero (when p = 0), rises
  to a maximum of 0.5 at p = 0.5 and returns to zero when p = 1.
  In fact, for any multi-allelic system, heterozygosity is
  greatest when
      p_1 = p_2 = p_3 = ... = p_k
• The maximum heterozygosity for a 10-allele system comes when
  each allele has a frequency of 0.1; H then equals 0.9.
2) Identity
• G1 and G2 are identical by descent (i.b.d.) if they are both
  physical copies of the same ancestral gene, or if one is a
  physical copy of the other.
• G1 and G2 are identical by state (i.b.s.) if they represent
  the same allele.
• The kinship coefficient f_ij between two relatives i and j is
  the probability that a gene drawn at random from i and a gene
  drawn at random from j, at the same autosomal locus, are
  i.b.d.
• The inbreeding coefficient of an individual is the
  probability that his or her two genes at an autosomal locus
  are i.b.d.
• Every mutation creates a new allele.
• Identity in state = identity by descent (IBD).
• F = 1 - H (inbreeding coefficient) is the probability of IBD = 1/2.
3) Back to genetic drift
• Assume a population of size N, and therefore 2N alleles in the
  population. Imagine eggs and sperm released randomly into the
  environment (e.g. the sea).
• What is the probability that 2 gametes drawn at random carry
  copies of the same allele? The second gamete must match the
  first among the 2N alleles, so the probability is 1/(2N).
• Therefore, after 1 generation the level of inbreeding is
      F_1 = 1/(2N)
• After t generations the probability is
      F_t = 1/(2N) + (1 - 1/(2N)) F_{t-1}
  which, starting from F_0 = 0, gives
      F_t = 1 - (1 - 1/(2N))^t
• Genetic drift will make initially identical populations
  different.
• Eventually, each population will be fixed for a different
  allele.
• If there are very many populations, the proportion of
  populations fixed for each allele will correspond to the
  initial frequency of the allele.
• Small populations will diverge more rapidly.

The effective population size is determined by:
• Large variation in the number of offspring
• Overlapping generations
• Fluctuations in population size
• Unequal numbers of males and females contributing to
  reproduction
4) Founder effect

Population      # of founders   # of generations   Current size
Costa Rica              4,000                 12      2,500,000
Finland                   500             80-100      5,000,000
Hutterites                 80                 14         36,000
Japan                   1,000             80-100    120,000,000
Iceland                25,000                 40        300,000
Newfoundland           25,000                 16        500,000
Quebec                  2,500              12-16      6,000,000
Sardinia                  500                400      1,660,000
5) Coalescence

Simplification: 0, 1 or 2 offspring
Coalesce: have the same parent
Probability to coalesce: 1/N
Probability not to coalesce: 1 - 1/N
Probability not to coalesce for t generations: (1 - 1/N)^t
Average time to coalesce for 2 genes: N
For the whole population: 2N
6) Genetic drift and mutation
• The probability of neither of 2 alleles being mutated is
  (1 - \mu)^2, where \mu is the mutation rate, so
      F_t = [1/(2N) + (1 - 1/(2N)) F_{t-1}] (1 - \mu)^2
          = (1/(2N)) (1 - \mu)^2
          + (1 - 1/(2N)) (1 - \mu)^2 F_{t-1}
• At equilibrium F_t = F_{t-1} = \hat{F}, and for small \mu
      \hat{F} \approx 1 / (1 + 4N\mu)
• If one also includes gene flow at rate m:
      F_t = [1/(2N) + (1 - 1/(2N)) F_{t-1}] (1 - m - \mu)^2
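7) Balance between mutation and selection
• Mutations can provide a balancing force to selection.
• Let us assume a mutation rate \mu from A2 to A1, with p and q
  the frequencies of A1 and A2, genotype fitnesses 1 - r
  (A1/A1), 1 (A1/A2) and 1 - s (A2/A2), and mean fitness
      \bar{W} = (1 - r) p^2 + 2pq + (1 - s) q^2
  With selection followed by mutation, the dynamics equation is:
      p' = { (1 - r) p^2 + pq + \mu [pq + (1 - s) q^2] } / \bar{W}
      q' = (1 - \mu) [pq + (1 - s) q^2] / \bar{W}
• An equilibrium is obtained when
      q = (1 - \mu) [pq + (1 - s) q^2] / \bar{W}
  which, for a rare A1 allele (p^2 negligible), gives
      q \approx 1 + \mu (1 - s) / s
  (below 1 when s < 0, i.e. when A2/A2 is fitter than A1/A2).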
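8) How to compute kinship
• f_AC is the coancestry of A with C, etc., i.e. the probability
  that 2 gametes taken at random, one from A and one from C, are
  IBD (written \Phi_{AC} in the formulas).
• Likewise, f_AA is the probability that 2 gametes taken at
  random from A are IBD.
• The inbreeding coefficient of X is thus the coancestry of its
  parents P and Q. In the pedigree

      A - B   C - D
        |       |
        P   -   Q
            |
            X

      F_X = \Phi_{PQ}
          = (1/4) \Phi_{AD} + (1/4) \Phi_{AC}
          + (1/4) \Phi_{BC} + (1/4) \Phi_{BD}

• When P and Q are full siblings,

      A - B
      |   |
      P - Q
        |
        X

  and A and B are unrelated and non-inbred, F_X = \Phi_{PQ} = 1/4
  and
      \Phi_{XX} = 0.5 (1 + F_X) = 5/8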
9) General algorithm
• Number the individuals so that parents precede their
  offspring. In the example pedigree, 1 and 2 are unrelated,
  non-inbred founders, 3 and 4 are their offspring, and 5 and 6
  are the offspring of 3 and 4.
• For founders, \Phi_{ii} = 1/2, and \Phi_{ij} = 0 for two
  different founders.
• If i originates from k and l, and j is not a descendant of i:
      \Phi_{ij} = \Phi_{ji} = (1/2) (\Phi_{jk} + \Phi_{jl})
• If i originates from k and l:
      \Phi_{ii} = 1/2 + (1/2) \Phi_{kl}

Kinship matrix \Phi_{ij} for the example pedigree:

         1     2     3     4     5     6
  1     1/2    0    1/4   1/4   1/4   1/4
  2      0    1/2   1/4   1/4   1/4   1/4
  3     1/4   1/4   1/2   1/4   3/8   3/8
  4     1/4   1/4   1/4   1/2   3/8   3/8
  5     1/4   1/4   3/8   3/8   5/8   3/8
  6     1/4   1/4   3/8   3/8   3/8   5/8

10) Identity coefficients
• We can now summarize the kinship coefficient \Phi of some
  basic family relations:

  Relation                \Phi
  Parent-Offspring        1/4
  Half Sibling            1/8
  Full Sibling            1/4
  First Cousins           1/16
  Double First Cousins    1/8
  Second Cousins          1/64
  Uncle-Nephew            1/8
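11) Detailed identity states
[Figure: the identity states, drawn as the two alleles (allele 1
and allele 2) of individuals i and j]
• The detailed identity states S* are merged into the condensed
  states S_1, ..., S_9; in particular
      S_3 = S*_2 U S*_3
      S_5 = S*_4 U S*_5
      S_7 = S*_9 U S*_12
      S_8 = S*_10 U S*_11 U S*_13 U S*_14
• With \Delta_r the probability of condensed state S_r:
  \Delta_1, \Delta_2, \Delta_3, \Delta_4 are 0 when i is not inbred.
  \Delta_1, \Delta_2, \Delta_5, \Delta_6 are 0 when j is not inbred.
  \Delta_1, \Delta_3, \Delta_5, \Delta_7 and \Delta_8 are 0 when
  i and j are unrelated.
• The kinship coefficient follows as
      \Phi_{ij} = \Delta_1 + (1/2) (\Delta_3 + \Delta_5 + \Delta_7)
                + (1/4) \Delta_8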
12)
  \Phi    \Delta_9   \Delta_8   \Delta_7   Relation
  1/4     0          1          0          Parent-Offspring
  1/8     1/2        1/2        0          Half Sibling
  1/4     1/4        1/2        1/4        Full Sibling
  1/16    3/4        1/4        0          First Cousins
  1/8     9/16       6/16       1/16       Double First Cousins
  1/64    15/16      1/16       0          Second Cousins
  1/8     1/2        1/2        0          Uncle-Nephew
Genotype prediction
What is the probability that j has a given genotype, given the
genotype of i? For example, if my uncle has a genetic disease,
what is the probability that I will also have it? What is the
probability that brothers born to inbred parents are homozygous
for a disease-causing gene?

Condition on the condensed identity state S_r of i and j:
    Pr(j = m/n | i = k/l)
        = \sum_{r=1}^{9} Pr(j = m/n | S_r, i = k/l) Pr(S_r | i = k/l)

If i is heterozygous (k \ne l), with an inbreeding coefficient f_i:
    Pr(S_r | i = k/l) = Pr(S_r, i = k/l) / Pr(i = k/l)
      = 0                                              r <= 4
      = \Delta_r 2 p_k p_l / [(1 - f_i) 2 p_k p_l]     r > 4
    that is,
      = 0                                              r <= 4
      = \Delta_r / (1 - f_i)                           r > 4

If i is homozygous, with an inbreeding coefficient f_i:
    Pr(S_r | i = k/k) = Pr(S_r, i = k/k) / Pr(i = k/k)
      = \Delta_r p_k   / [f_i p_k + (1 - f_i) p_k^2]   r <= 4
      = \Delta_r p_k^2 / [f_i p_k + (1 - f_i) p_k^2]   r > 4
    that is,
      = \Delta_r / [f_i + (1 - f_i) p_k]               r <= 4
      = \Delta_r p_k / [f_i + (1 - f_i) p_k]           r > 4

Pr(j = m/n | S_r, i = k/l) depends on the state:
    r = 1, 7          j has the same genotype as i
    r = 2, 4, 6, 9    j is independent of i
    r = 3, 8          j shares one gene with i
    r = 5             j is either k/k or l/l

• When j is independent of i, its genotype simply follows the
  H.W. equilibrium.
• When j is equivalent to i, the probability is one if
  m/n = k/l and zero otherwise.
• When j shares one allele with i, m/n and k/l must overlap in
  one allele, and the other allele of j has the H.W.
  distribution.
• When j is either k/k or l/l, each has probability 1/2.
Example
What is the blood type of non-inbred siblings? For non-inbred
full siblings, \Delta_7 = 1/4, \Delta_8 = 1/2 and \Delta_9 = 1/4.

Pr(j = A/B | i = A/B)
    = (1/4) Pr(j = A/B | S_7, i = A/B)
    + (1/2) Pr(j = A/B | S_8, i = A/B)
    + (1/4) Pr(j = A/B | S_9, i = A/B)
    = (1/4) * 1 + (1/2) (1/2) (p_A + p_B) + (1/4) 2 p_A p_B

Pr(j = A | i = O/O)
    = Pr(j = A/O | i = O/O) + Pr(j = A/A | i = O/O)
    = (1/4) Pr(j = A/O | S_7, i = O/O)
    + (1/2) Pr(j = A/O | S_8, i = O/O)
    + (1/4) Pr(j = A/O | S_9, i = O/O)
    + (the same three terms for j = A/A)
    = (1/4) * 0 + (1/2) p_A + (1/4) 2 p_A p_O       (j = A/O)
    + (1/4) * 0 + (1/2) * 0 + (1/4) p_A^2           (j = A/A)
Risk Ratios and Genetic Model Discrimination
• Let us assume that each person in the population is assigned
  an indicator X = 1 if he/she is affected by a condition and
  X = 0 otherwise.
• The prevalence of the condition is K = E(X).
• Given two non-inbred relatives i and j, and given that i is
  affected, what is the probability that j is affected?
      K_R = P(X_j = 1 | X_i = 1)
      P(X_j = 1, X_i = 1) = P(X_j = 1 | X_i = 1) P(X_i = 1)
                          = K_R K = E(X_i X_j)
      P(X_j = 1 | X_i = 1) = E(X_i X_j) / K
                           = (cov(X_i, X_j) + K^2) / K
                           = cov(X_i, X_j) / K + K
• This result simply represents the fact that the extra risk for
  j results from the covariance of X between i and j.
• The risk ratio \lambda_R = K_R / K can thus be written as:
      \lambda_R - 1 = cov(X_i, X_j) / K^2
• Let us compute this covariance, and following it the risk
  ratio.
• Let us assume that a given property is defined by a single
  gene with multiple alleles, and let x_{kl} be the expected
  value of X for genotype k/l:
      E(X) = \sum_k \sum_l x_{kl} p_k p_l
• For the sake of simplicity let us normalize E(X) = 0, and
  decompose:
      x_{kl} = \alpha_k + \alpha_l + \delta_{kl}
      \sum_k \alpha_k p_k = 0
      \sum_k \delta_{kl} p_k = 0
• With p(m, n | k, l) the probability that j is m/n given that
  i is k/l, and a_{kl} = \alpha_k + \alpha_l + \delta_{kl}:
      E(X_i X_j)
        = \sum_k \sum_l \sum_m \sum_n
              x_{mn} x_{kl} p(m, n | k, l) p_k p_l
        = \Delta_{7ij} \sum_k \sum_l a_{kl}^2 p_k p_l
        + \Delta_{8ij} \sum_k \sum_l \sum_m a_{kl} a_{km} p_k p_l p_m
        + \Delta_{9ij} \sum_k \sum_l \sum_m \sum_n
              a_{kl} a_{mn} p_k p_l p_m p_n
        = \Delta_{7ij} [ 2 \sum_k \alpha_k^2 p_k
                         + \sum_k \sum_l \delta_{kl}^2 p_k p_l ]
        + \Delta_{8ij} \sum_k \alpha_k^2 p_k
        = 2 ( (1/2) \Delta_{7ij} + (1/4) \Delta_{8ij} )
              [ 2 \sum_k \alpha_k^2 p_k ]
        + \Delta_{7ij} \sum_k \sum_l \delta_{kl}^2 p_k p_l
        = 2 \Phi_{ij} \sigma_a^2 + \Delta_{7ij} \sigma_d^2
  where \sigma_a^2 = 2 \sum_k \alpha_k^2 p_k is the additive
  variance and \sigma_d^2 = \sum_k \sum_l \delta_{kl}^2 p_k p_l
  is the dominance variance.
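Risk ratios by type of relative, as
\lambda_R - 1 = cov(X_i, X_j) / K^2, with
cov(X_i, X_j) = 2 \Phi_{ij} \sigma_a^2 + \Delta_{7ij} \sigma_d^2:

R   Relative type    \lambda_R - 1
M   Identical twin   \sigma_a^2 / K^2 + \sigma_d^2 / K^2
S   Sibling          \sigma_a^2 / (2 K^2) + \sigma_d^2 / (4 K^2)
1   First degree     \sigma_a^2 / (2 K^2)
2   Second degree    \sigma_a^2 / (4 K^2)
3   Third degree     \sigma_a^2 / (8 K^2)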
```
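
The heterozygosity formulas in section 1 are easy to check numerically. The sketch below is illustrative only: the function names `expected_heterozygosity` and `mean_heterozygosity` are hypothetical, and the multi-locus version assumes the single-locus values are simply averaged over loci.

```python
def expected_heterozygosity(freqs):
    """Single-locus gene diversity H = 1 - sum(p_i^2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(loci):
    """Average of the single-locus values over several loci (assumption)."""
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)


# Ten equally frequent alleles: the 10-allele maximum quoted in the notes.
print(round(expected_heterozygosity([0.1] * 10), 6))
# Two alleles at p = q = 0.5: the top of the 2pq parabola.
print(round(expected_heterozygosity([0.5, 0.5]), 6))
```

A monomorphic locus contributes zero, so adding fixed loci to the average lowers the multi-locus value.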
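
The drift recursions of sections 3 and 6 can be iterated directly. This is a minimal sketch under the notes' idealized model (constant size N, random union of gametes); the names `inbreeding_series`, `closed_form` and `equilibrium_approx` are made up for the illustration, and the parameter values are arbitrary.

```python
def inbreeding_series(n, generations, mu=0.0, f0=0.0):
    """Iterate F_t = [1/(2N) + (1 - 1/(2N)) F_{t-1}] (1 - mu)^2."""
    two_n = 2.0 * n
    f = f0
    series = [f]
    for _ in range(generations):
        f = (1.0 / two_n + (1.0 - 1.0 / two_n) * f) * (1.0 - mu) ** 2
        series.append(f)
    return series


def closed_form(n, t):
    """Drift only, starting from F_0 = 0: F_t = 1 - (1 - 1/(2N))^t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n)) ** t


def equilibrium_approx(n, mu):
    """Approximate drift-mutation equilibrium 1 / (1 + 4 N mu)."""
    return 1.0 / (1.0 + 4.0 * n * mu)


drift_only = inbreeding_series(50, 100)
print(drift_only[100], closed_form(50, 100))

with_mutation = inbreeding_series(1000, 20000, mu=1e-4)
print(with_mutation[-1], equilibrium_approx(1000, 1e-4))
```

Without mutation the series climbs toward 1 (fixation); with mutation it levels off near the approximate equilibrium value.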
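
The two recursive rules of section 9 are enough to fill in a whole kinship matrix. The sketch below assumes that founders are unrelated and non-inbred and that parents are listed before their offspring; `kinship_matrix` and `EXAMPLE_PEDIGREE` are hypothetical names, and the pedigree is the six-person example (1 and 2 founders, 3 and 4 their children, 5 and 6 the children of 3 and 4).

```python
from fractions import Fraction


def kinship_matrix(pedigree):
    """pedigree: list of (id, parent1, parent2), parents before offspring.

    Founders are given as (id, None, None) and are assumed unrelated and
    non-inbred. Returns a dict mapping (i, j) to the kinship coefficient.
    """
    phi = {}
    seen = []
    for ind, k, l in pedigree:
        for j in seen:
            if k is None:
                value = Fraction(0)  # a founder is unrelated to earlier people
            else:
                value = (phi[(j, k)] + phi[(j, l)]) / 2
            phi[(ind, j)] = phi[(j, ind)] = value
        if k is None:
            phi[(ind, ind)] = Fraction(1, 2)
        else:
            phi[(ind, ind)] = (1 + phi[(k, l)]) / 2
        seen.append(ind)
    return phi


EXAMPLE_PEDIGREE = [
    (1, None, None), (2, None, None),
    (3, 1, 2), (4, 1, 2),
    (5, 3, 4), (6, 3, 4),
]

phi = kinship_matrix(EXAMPLE_PEDIGREE)
for i in range(1, 7):
    print(" ".join(str(phi[(i, j)]).rjust(4) for j in range(1, 7)))
```

Exact fractions are used so the output can be compared entry by entry with the matrix in the notes. Processing people in pedigree order guarantees that every value on the right-hand side of a rule has already been computed.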
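
The blood-type examples can be reproduced with a few lines of code. This sketch covers only the non-inbred case, where the states S_7, S_8 and S_9 are the only ones with non-zero probability; the function names and the allele frequencies (p_A = 0.3, p_B = 0.1, p_O = 0.6) are illustrative assumptions, not values from the notes.

```python
def hardy_weinberg(m, n, p):
    """Probability of the unordered genotype m/n under Hardy-Weinberg."""
    return p[m] ** 2 if m == n else 2 * p[m] * p[n]


def share_one(m, n, shared, p):
    """Pr(j = m/n) when j carries `shared` and its other gene is random."""
    if m == shared and n == shared:
        return p[shared]
    if m == shared:
        return p[n]
    if n == shared:
        return p[m]
    return 0.0


def predict(m, n, k, l, d7, d8, d9, p):
    """Pr(j = m/n | i = k/l) for non-inbred relatives i and j."""
    same = 1.0 if sorted((m, n)) == sorted((k, l)) else 0.0
    # In S_8 the shared gene is k or l, each with probability 1/2.
    one = 0.5 * share_one(m, n, k, p) + 0.5 * share_one(m, n, l, p)
    return d7 * same + d8 * one + d9 * hardy_weinberg(m, n, p)


freqs = {"A": 0.3, "B": 0.1, "O": 0.6}   # hypothetical allele frequencies
sib = (0.25, 0.5, 0.25)                  # Delta_7, Delta_8, Delta_9 for full sibs

p_ab = predict("A", "B", "A", "B", *sib, freqs)
p_type_a = (predict("A", "O", "O", "O", *sib, freqs)
            + predict("A", "A", "O", "O", *sib, freqs))
print(p_ab, p_type_a)
```

Summing `predict` over all six unordered genotypes of j gives 1, which is a quick sanity check on the three conditional rules.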
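
The risk-ratio section ends with the covariance between relatives written as twice the kinship coefficient times the additive variance plus Delta_7 times the dominance variance. The sketch below turns that into a risk ratio using K_R = cov/K + K; the function name `risk_ratio` and all numeric inputs are hypothetical and chosen only to make the arithmetic visible.

```python
def risk_ratio(phi, delta7, var_a, var_d, prevalence):
    """lambda_R = K_R / K = 1 + (2 phi var_a + delta7 var_d) / K^2."""
    cov = 2 * phi * var_a + delta7 * var_d
    return 1 + cov / prevalence ** 2


# Hypothetical variance components and prevalence.
VAR_A, VAR_D, K = 0.002, 0.001, 0.05

print(risk_ratio(0.25, 0.25, VAR_A, VAR_D, K))    # full siblings
print(risk_ratio(0.25, 0.0, VAR_A, VAR_D, K))     # parent-offspring
print(risk_ratio(0.125, 0.0, VAR_A, VAR_D, K))    # second-degree relatives
```

Siblings and parent-offspring pairs share the same kinship coefficient, so any excess sibling risk in this model comes from the dominance term alone, which is what makes the comparison useful for discriminating between genetic models.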