HGEN502, 2011
Hermine H. Maes
1/18: Course introduction; Introduction to
Quantitative Genetics & Genetic Model Building
1/20: Study Design and Genetic Model Fitting
1/25: Basic Twin Methodology
1/27: Advanced Twin Methodology and Scope of
Genetic Epidemiology
2/1: Quantitative Genetics Problem Session
Historical Background
Genetical Principles
Genetic Parameters: additive, dominance
Biometrical Model
Statistical Principles
Basic concepts: mean, variance, covariance
Path Analysis
Likelihood
Analysis of patterns and mechanisms underlying variation in continuous traits to resolve and identify their genetic and environmental causes
Continuous traits have continuous phenotypic range; often polygenic & influenced by environmental effects
Ordinal traits are expressed in whole numbers; can be treated as approx discontinuous or as threshold traits
Some qualitative traits; can be treated as having underlying quantitative basis, expressed as a threshold trait (or multiple thresholds)
Mendelian Disorders
Single gene, highly penetrant, severe, small % affected (e.g., Huntington’s Disease)
Chromosomal Disorders
Insertions, deletions of chromosomal sections, severe, small % affected (e.g., Down’s Syndrome)
Complex Traits
Multiple genes (of small effect), environment, large % population, susceptibility – not destiny (e.g., depression, alcohol dependence, etc)
Gregor Mendel (1822-1884): Mathematical rules of particulate inheritance (“Mendel’s Laws”)
Charles Darwin (1809-1882): Evolution depends on differential reproduction of inherited variants
Francis Galton (1822-1911): Systematic measurement of family resemblance
Karl Pearson (1857-1936): “Pearson Correlation”; graduate student of Galton
Pearson and Lee’s diagram for measurement of “span”
(finger-tip to finger-tip distance)
From Pearson and Lee (1903) p.378
From Pearson and Lee (1903) p.387
© Lindon Eaves, 2009
Family Studies
Does the trait aggregate in families?
The (Really!) Big Problem: Families are a mixture of genetic and environmental factors
Twin Studies
Galton’s solution: Twins
One (Ideal) solution: Twins separated at birth
But unfortunately MZAs (MZ twins reared apart) are rare
Easier solution: MZ & DZ twins reared together
Minnesota Study of Twins Reared Apart (T. Bouchard et al., 1979)
>100 sets of reared-apart twins from across the US & UK
All pairs spent formative years apart (but vary tremendously in amount of contact prior to study)
56 MZAs participated
Monozygotic (MZ; “identical”): result from fertilization of a single egg by a single sperm; share 100% of genetic material
Dizygotic (DZ, “fraternal” or “nonidentical”): result from independent fertilization of two eggs by two sperm; share on average 50% of their genes
MZs share 100% genes, DZs (on avg) 50%
Both twin types share 100% of their common (family) environment
If rMZ > rDZ, then genetic factors are important
If rDZ > ½ rMZ, then growing up in the same home is important
If rMZ < 1, then non-shared environmental factors are important
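The three rules of thumb above can be written as simple comparisons. The sketch below is only an illustration: the function name is hypothetical, and the example values are the stature correlations quoted in this deck for the Virginia twin sample (r = 0.924 and r = 0.535, taken here as MZ and DZ respectively).

```python
def interpret_twin_correlations(r_mz, r_dz):
    """Apply the three rules of thumb to a pair of twin correlations.

    Hypothetical helper: it only restates the slide's inequalities and does
    no significance testing or model fitting.
    """
    return {
        "genetic_factors": r_mz > r_dz,               # rMZ > rDZ
        "shared_environment": r_dz > 0.5 * r_mz,      # rDZ > 1/2 rMZ
        "nonshared_environment": r_mz < 1.0,          # rMZ < 1
    }


# Stature correlations from the Virginia twin scatterplots in this deck
stature = interpret_twin_correlations(r_mz=0.924, r_dz=0.535)
print(stature)
```

These are descriptive checks on point estimates; whether a difference is meaningful depends on sample size, which the rules themselves do not address.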
For MZs, twinning appears to be random
For DZs, twinning is influenced by:
Mother’s age: rate increases with age (follicle stimulating hormone, FSH, levels increase with age)
Hereditary factors (FSH)
Fertility treatment
Rates of twins/multiple births are increasing, currently ~3% of all births
100% of DZ twins are dichorionic
~1/3 of MZ twins are dichorionic and
~2/3 are monochorionic
Virginia Twin Study of Adolescent Behavioral Development
[Figure: scatterplots of age- and sex-corrected stature in twin pairs (axis label HTDEV1). MZ Stature: r = 0.924; DZ Stature: r = 0.535.]
© Lindon Eaves, 2009
1918: On the Correlation Between Relatives on the Supposition of Mendelian Inheritance
1921: Introduced concept of “likelihood”
1930: The Genetical Theory of Natural Selection
1935: The Design of Experiments
Fisher developed mathematical theory that reconciled Mendel’s work with Galton and Pearson’s correlations
Continuous variation caused by lots of genes (polygenic inheritance)
Each gene followed Mendel’s laws
Environment smoothed out genetic differences
Genes may show different degrees of dominance
Genes may have many forms (multiple alleles)
Mating may not be random (assortative mating)
Showed that correlations obtained by Pearson & Lee were explained well by polygenic inheritance
[“Mendelian” Crosses with Quantitative Traits]
Manuel Ferreira, Shaun Purcell
Pak Sham, Lindon Eaves
Building a Genetic Model
Revisit common genetic parameters such as allele frequencies, genetic effects, dominance, variance components, etc
Use these parameters to construct a biometrical genetic model
Model that expresses the:
(1) Mean
(2) Variance
(3) Covariance between individuals for a quantitative phenotype as a function of genetic parameters.
Genetic Concepts
Population level
Allele and genotype frequencies
Transmission level
Mendelian segregation
Genetic relatedness
Phenotype level
Biometrical model
Additive and dominance components
[Diagram: many genotype nodes (G) connected to phenotype nodes (P)]
Population level
1. Allele frequencies
A single locus, with two alleles
- Biallelic / diallelic
- Single nucleotide polymorphism, SNP
Alleles A and a
- Frequency of A is p
- Frequency of a is q = 1 – p
Every individual inherits two alleles
- A genotype is the combination of the two alleles
- e.g. AA, aa (the homozygotes) or Aa (the heterozygote)
Population level
2. Genotype frequencies (Random mating)
                      Allele 2
                      A (p)         a (q)
Allele 1    A (p)     AA ($p^2$)    aA ($qp$)
            a (q)     Aa ($pq$)     aa ($q^2$)
Hardy-Weinberg Equilibrium frequencies
P ( AA ) = $p^2$
P ( Aa ) = $2pq$
P ( aa ) = $q^2$
$p^2 + 2pq + q^2 = 1$
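As a quick numerical check of the Hardy-Weinberg frequencies, the sketch below computes the three genotype frequencies from the allele frequency p under random mating. The function name and the example value p = 0.3 are illustrative assumptions.

```python
def hardy_weinberg(p):
    """Genotype frequencies at a biallelic locus under random mating.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.3)  # hypothetical allele frequency
print(freqs, sum(freqs.values()))
```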
Transmission level
Mendel’s experiments
Pure Lines: AA x aa
F1: Aa
Intercross: Aa x Aa
Offspring: AA, Aa, Aa, aa
3:1 Segregation Ratio
Transmission level
Back cross: F1 (Aa) x Pure line (aa)
Offspring: Aa, aa
1:1 Segregation ratio
Transmission level
Mendel’s law of segregation
Father ($A_1A_2$) x Mother ($A_3A_4$)
Segregation, Meiosis
Gametes: father transmits $A_1$ (½) or $A_2$ (½); mother transmits $A_3$ (½) or $A_4$ (½)
Offspring: $A_1A_3$ (¼), $A_1A_4$ (¼), $A_2A_3$ (¼), $A_2A_4$ (¼)
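The law of segregation can be illustrated by enumerating the four equally likely gamete combinations. The sketch below does this for any pair of parental genotypes; the function name and allele labels are illustrative assumptions, and genotypes are treated as unordered pairs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(father, mother):
    """Offspring genotype probabilities under Mendelian segregation.

    Each parent is a 2-tuple of allele labels and transmits either allele
    with probability 1/2, so every gamete pairing has probability 1/4.
    """
    dist = Counter()
    for paternal, maternal in product(father, mother):
        genotype = tuple(sorted((paternal, maternal)))  # unordered genotype
        dist[genotype] += Fraction(1, 4)
    return dict(dist)


intercross = offspring_distribution(("A", "a"), ("A", "a"))
backcross = offspring_distribution(("A", "a"), ("a", "a"))
four_alleles = offspring_distribution(("A1", "A2"), ("A3", "A4"))
print(intercross, backcross, four_alleles, sep="\n")
```

With a dominant trait, the intercross result (¼ AA, ½ Aa, ¼ aa) gives the 3:1 phenotypic ratio, and the back cross gives 1:1.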
Phenotype level
1. Classical Mendelian traits
Dominant trait ( D - presence, R - absence)
AA, Aa → D
aa → R
Recessive trait ( D - absence, R - presence)
AA, Aa → D
aa → R
Codominant trait ( X, Y, Z )
AA → X
Aa → Y
aa → Z
Phenotype level
2. Dominant Mendelian inheritance
                          Mother (Dd)
                          D (½)       d (½)
Father (Dd)    D (½)      DD (¼)      dD (¼)
               d (½)      Dd (¼)      dd (¼)
Phenotype level
3. Dominant Mendelian inheritance with incomplete penetrance and phenocopies
                          Mother (Dd)
                          D (½)       d (½)
Father (Dd)    D (½)      DD (¼)      dD (¼)
               d (½)      Dd (¼)      dd (¼)
Incomplete penetrance: some carriers of D are unaffected
Phenocopies: some dd individuals are affected
Phenotype level
4. Recessive Mendelian inheritance
                          Mother (Dd)
                          D (½)       d (½)
Father (Dd)    D (½)      DD (¼)      dD (¼)
               d (½)      Dd (¼)      dd (¼)
Phenotype level
Two kinds of differences
Continuous
Graded, no distinct boundaries e.g. height, weight, blood-pressure, IQ, extraversion
Categorical
Yes/No
Normal/Affected (Dichotomous)
None/Mild/Severe (Multicategory)
Often called “threshold traits” because people are “affected” if they fall above some level of a measured or hypothesized continuous trait
Phenotype level
Polygenic Traits
Mendel’s Experiments in Plant Hybridization showed how discrete particles (particulate theory of inheritance) behave mathematically: all-or-nothing states (round/wrinkled, green/yellow), “Mendelian” disease
How do these particles produce a continuous trait like stature or liability to a complex disorder?
1 Gene
3 Genotypes
3 Phenotypes
2 Genes
9 Genotypes
5 Phenotypes
3 Genes
27 Genotypes
7 Phenotypes
4 Genes
81 Genotypes
9 Phenotypes
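The genotype and phenotype counts above follow from simple enumeration. The sketch below assumes biallelic loci with equal, purely additive effects, so that the phenotype is just the total number of "increasing" alleles carried; that assumption is not stated on the slide but reproduces its numbers.

```python
from itertools import product


def count_classes(n_loci):
    """Count distinct genotypes and phenotype classes for n biallelic loci.

    Assumes each locus contributes 0, 1 or 2 increasing alleles and that all
    loci have equal additive effects (illustrative assumption).
    """
    genotypes = list(product((0, 1, 2), repeat=n_loci))
    phenotypes = {sum(genotype) for genotype in genotypes}
    return len(genotypes), len(phenotypes)


counts = {n: count_classes(n) for n in range(1, 5)}
print(counts)
```

In general there are 3^n genotypes and 2n + 1 phenotype classes, so the phenotype distribution approaches a continuous one as the number of loci grows.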
Phenotype level
Quantitative traits
[Figure: histograms of a quantitative trait (qt) by genotype (g = -1, 0, 1; genotypes aa, Aa, AA)]
Phenotype level
Biometric Model
[Figure: distribution P ( X ) of trait X, shown separately for genotypes aa, Aa and AA]
Genotype             aa        Aa        AA
Genotypic effect     -a        d         +a
Genotypic means      m - a     m + d     m + a
Very Basic Statistical Concepts
1. Mean ( X )
2. Variance ( X )
3. Covariance ( X,Y )
4. Correlation ( X,Y )
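Mean, variance, covariance
1. Mean ( X )
$\mu_X = \frac{\sum_i x_i}{n} = \sum_i x_i f(x_i)$
Mean, variance, covariance
2. Variance ( X )
$\sigma_X^2 = \frac{\sum_i (x_i - \mu_X)^2}{n-1} = \sum_i (x_i - \mu_X)^2 f(x_i)$
Mean, variance, covariance
3. Covariance ( X,Y )
$\mathrm{cov}_{X,Y} = \frac{\sum_i (x_i - \mu_X)(y_i - \mu_Y)}{n-1} = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
Mean, variance, covariance (& correlation)
4. Correlation ( X,Y )
$r_{x,y} = \frac{\mathrm{cov}_{x,y}}{s_x s_y}$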
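The four statistics can be computed directly from paired observations. The sketch below is a minimal plain-Python version that assumes the sample (n - 1) denominator for the variance and covariance; the function names and the small data set are made up for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with an n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance of paired observations with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Hypothetical paired scores, e.g. twin 1 and twin 2
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(mean(x), variance(x), covariance(x, y), correlation(x, y))
```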
Biometrical model for single biallelic QTL
Biallelic locus
- Genotypes: AA, Aa, aa
- Genotype frequencies: $p^2$, $2pq$, $q^2$
Alleles at this locus are transmitted from parent to offspring (P-O) according to Mendel’s law of segregation
Genotypes for this locus influence the expression of a quantitative trait X (i.e. locus is a QTL)
Biometrical genetic model that estimates the contribution of this QTL towards the (1) Mean , (2) Variance and (3) Covariance between individuals for this quantitative trait X
Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean ( X )
Mean ( X ) = $\sum_i x_i f(x_i)$
Genotypes               AA        Aa        aa
Effect, x               $a$       $d$       $-a$
Frequencies, f ( x )    $p^2$     $2pq$     $q^2$
Mean ( X ) = $a(p^2) + d(2pq) - a(q^2) = a(p - q) + 2pqd$
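The simplification uses $p^2 - q^2 = (p - q)(p + q) = p - q$. The sketch below compares the frequency-weighted sum with the closed form; the function names and parameter values are illustrative assumptions.

```python
def qtl_mean(p, a, d):
    """Mean of the genotypic effects, weighted by Hardy-Weinberg frequencies."""
    q = 1.0 - p
    effects = (a, d, -a)                  # AA, Aa, aa
    freqs = (p * p, 2 * p * q, q * q)
    return sum(x * f for x, f in zip(effects, freqs))


def qtl_mean_closed_form(p, a, d):
    """Closed form a(p - q) + 2pqd."""
    q = 1.0 - p
    return a * (p - q) + 2 * p * q * d


# Hypothetical parameter values
print(qtl_mean(0.3, 1.0, 0.5), qtl_mean_closed_form(0.3, 1.0, 0.5))
```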
Biometrical model for single biallelic QTL
2. Contribution of the QTL to the Variance ( X )
Var ( X ) = $\sum_i (x_i - m)^2 f(x_i)$, where $m$ = Mean ( X ) = $a(p - q) + 2pqd$
Genotypes               AA        Aa        aa
Effect, x               $a$       $d$       $-a$
Frequencies, f ( x )    $p^2$     $2pq$     $q^2$
Var ( X ) = $(a - m)^2 p^2 + (d - m)^2\,2pq + (-a - m)^2 q^2$
= $V_{QTL}$
Broad-sense heritability of X at this locus = $V_{QTL} / V_{Total}$
Broad-sense total heritability of X = $\Sigma V_{QTL} / V_{Total}$
Biometrical model for single biallelic QTL
Var ( X ) = $(a - m)^2 p^2 + (d - m)^2\,2pq + (-a - m)^2 q^2$
= $2pq[a + (q - p)d]^2 + (2pqd)^2$
= $V_{A_{QTL}} + V_{D_{QTL}}$
Additive effects: the main effects of individual alleles
Dominance effects: represent the interaction between alleles
[Figure: genotypic scale with aa at –a, AA at +a and Aa at d, relative to the midpoint m; here d = 0]
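The decomposition can be checked numerically. In the sketch below, m is taken to be the population mean of the genotypic effects, $a(p - q) + 2pqd$, which is what the identity requires; the function name and the parameter values are illustrative assumptions.

```python
def qtl_variance_components(p, a, d):
    """Return (total, additive, dominance) variance for a biallelic QTL.

    The total is the frequency-weighted sum of squared deviations from the
    population mean m; the two components use the closed forms.
    """
    q = 1.0 - p
    m = a * (p - q) + 2 * p * q * d
    total = (a - m) ** 2 * p * p + (d - m) ** 2 * 2 * p * q + (-a - m) ** 2 * q * q
    v_a = 2 * p * q * (a + (q - p) * d) ** 2
    v_d = (2 * p * q * d) ** 2
    return total, v_a, v_d


total, v_a, v_d = qtl_variance_components(p=0.3, a=1.0, d=0.5)  # hypothetical values
print(total, v_a, v_d, v_a + v_d)
```

With d = 0 the dominance term vanishes and all of the QTL variance is additive.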
Biometrical model for single biallelic QTL
[Figure: the same genotypic scale for d > 0 (Aa lies on the AA side of the midpoint m) and for d < 0 (Aa lies on the aa side of the midpoint m); in both cases Var ( X ) = $V_{A_{QTL}} + V_{D_{QTL}}$]
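Biometrical model for single biallelic QTL
[Figure: genotypic values –a (aa), d (Aa) and +a (AA) plotted around the midpoint m, with a regression line through the three genotypes]
Var ( X ) = Regression Variance + Residual Variance
= Additive Variance + Dominance Variance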
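One way to see why the regression variance is the additive variance is to regress genotypic value on the number of A alleles, weighting each genotype by its Hardy-Weinberg frequency. The slide does not name the predictor, so the allele count is an assumption here, as are the function name and parameter values.

```python
def regression_decomposition(p, a, d):
    """Weighted regression of genotypic value on the count of A alleles.

    Returns (slope, regression variance, residual variance). Requires
    0 < p < 1 so that the allele count has non-zero variance.
    """
    q = 1.0 - p
    counts = (2, 1, 0)                    # A alleles in AA, Aa, aa
    values = (a, d, -a)
    freqs = (p * p, 2 * p * q, q * q)
    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    cov_xg = sum(f * (x - mean_x) * (g - mean_g)
                 for f, x, g in zip(freqs, counts, values))
    slope = cov_xg / var_x
    total_var = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, values))
    regression_var = slope ** 2 * var_x
    return slope, regression_var, total_var - regression_var


slope, reg_var, res_var = regression_decomposition(p=0.3, a=1.0, d=0.5)
print(slope, reg_var, res_var)
```

Under these assumptions the slope equals $a + d(q - p)$, the regression variance equals $2pq[a + (q - p)d]^2$, and the residual variance equals $(2pqd)^2$.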
Biometrical model for single biallelic QTL
Var ( X ) = $2pq[a + (q - p)d]^2 + (2pqd)^2$
= $V_{A_{QTL}} + V_{D_{QTL}}$
Demonstrate
2A. Average allelic effect
2B. Additive genetic variance
NOTE: Additive genetic variance depends on allele frequency ($p$) & additive genetic value ($a$) as well as dominance deviation ($d$)
Additive genetic variance typically greater than dominance variance
Biometrical model for single biallelic QTL
2A. Average allelic effect ( α)
The deviation of the allelic mean from the population mean
Population mean: Mean ( X ) = $a(p - q) + 2pqd$
                AA       Aa       aa       Allelic mean     Average allelic effect ( α)
Effect          $a$      $d$      $-a$
Allele A        $p$      $q$               $ap + dq$        $\alpha_A = q(a + d(q - p))$
Allele a                 $p$      $q$      $dp - aq$        $\alpha_a = -p(a + d(q - p))$
Biometrical model for single biallelic QTL
Denote the average allelic effects
$\alpha_A = q(a + d(q - p))$
$\alpha_a = -p(a + d(q - p))$
If only two alleles exist, we can define the average effect of allele substitution
$\alpha = \alpha_A - \alpha_a$
$\alpha = (q - (-p))(a + d(q - p)) = (a + d(q - p))$
Therefore:
$\alpha_A = q\alpha$
$\alpha_a = -p\alpha$
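The average allelic effects can be computed from their definition, as the deviation of each allelic mean from the population mean, and compared with the closed forms. The function name and parameter values below are illustrative assumptions.

```python
def average_allelic_effects(p, a, d):
    """Return (alpha_A, alpha_a, alpha) for a biallelic QTL.

    The allelic mean of A averages over the partner allele: A with
    probability p (genotype AA, effect a) or a with probability q
    (genotype Aa, effect d); similarly for allele a.
    """
    q = 1.0 - p
    population_mean = a * (p - q) + 2 * p * q * d
    mean_A = a * p + d * q
    mean_a = d * p - a * q
    alpha_A = mean_A - population_mean
    alpha_a = mean_a - population_mean
    return alpha_A, alpha_a, alpha_A - alpha_a


alpha_A, alpha_a, alpha = average_allelic_effects(p=0.3, a=1.0, d=0.5)
print(alpha_A, alpha_a, alpha)
```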
Biometrical model for single biallelic QTL
2A. Average allelic effect ( α)
2B. Additive genetic variance
The variance of the average allelic effects
Genotype    Freq.      Additive effect
AA          $p^2$      $2\alpha_A = 2q\alpha$
Aa          $2pq$      $\alpha_A + \alpha_a = (q - p)\alpha$
aa          $q^2$      $2\alpha_a = -2p\alpha$
where $\alpha_A = q\alpha$ and $\alpha_a = -p\alpha$
$V_{A_{QTL}} = (2q\alpha)^2 p^2 + ((q - p)\alpha)^2\,2pq + (-2p\alpha)^2 q^2$
= $2pq\alpha^2$
= $2pq[a + d(q - p)]^2$
d = 0: $V_{A_{QTL}} = 2pqa^2$
p = q: $V_{A_{QTL}} = \tfrac{1}{2}a^2$
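The additive variance can be checked by computing the variance of the additive effects over the three genotypes. The sketch below uses the closed form $\alpha = a + d(q - p)$; the function name and parameter values are illustrative assumptions.

```python
def additive_variance(p, a, d):
    """Variance of the additive effects 2q*alpha, (q-p)*alpha, -2p*alpha."""
    q = 1.0 - p
    alpha = a + d * (q - p)
    additive_effects = (2 * q * alpha, (q - p) * alpha, -2 * p * alpha)  # AA, Aa, aa
    freqs = (p * p, 2 * p * q, q * q)
    mean_effect = sum(f * b for f, b in zip(freqs, additive_effects))   # equals 0
    return sum(f * (b - mean_effect) ** 2 for f, b in zip(freqs, additive_effects))


print(additive_variance(0.3, 1.0, 0.5))   # general case, 2pq * alpha^2
print(additive_variance(0.3, 1.0, 0.0))   # d = 0, 2pq * a^2
print(additive_variance(0.5, 1.0, 0.5))   # p = q, a^2 / 2
```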
Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean ( X )
2. Contribution of the QTL to the Variance ( X )
2A. Average allelic effect ( α)
2B. Additive genetic variance
3. Contribution of the QTL to the Covariance ( X,Y )
Biometrical model for single biallelic QTL
3. Contribution of the QTL to the Cov ( X,Y )
Cov ( X,Y ) = $\sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
                  AA $(a - m)$           Aa $(d - m)$           aa $(-a - m)$
AA $(a - m)$      $(a - m)^2$
Aa $(d - m)$      $(a - m)(d - m)$       $(d - m)^2$
aa $(-a - m)$     $(a - m)(-a - m)$      $(d - m)(-a - m)$      $(-a - m)^2$
Biometrical model for single biallelic QTL
3A. Contribution of the QTL to the Cov ( X,Y ) – MZ twins
Pair         Frequency    Cross-product
AA, AA       $p^2$        $(a - m)^2$
AA, Aa       0            $(a - m)(d - m)$
AA, aa       0            $(a - m)(-a - m)$
Aa, Aa       $2pq$        $(d - m)^2$
Aa, aa       0            $(d - m)(-a - m)$
aa, aa       $q^2$        $(-a - m)^2$
Cov ( $X_i$, $X_j$ ) = $(a - m)^2 p^2 + (d - m)^2\,2pq + (-a - m)^2 q^2$
= $2pq[a + (q - p)d]^2 + (2pqd)^2$
= $V_{A_{QTL}} + V_{D_{QTL}}$
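Biometrical model for single biallelic QTL
3B. Contribution of the QTL to the Cov ( X,Y ) – Parent-Offspring
Pair         Frequency    Cross-product
AA, AA       $p^3$        $(a - m)^2$
AA, Aa       $p^2q$       $(a - m)(d - m)$
AA, aa       0            $(a - m)(-a - m)$
Aa, Aa       $pq$         $(d - m)^2$
Aa, aa       $pq^2$       $(d - m)(-a - m)$
aa, aa       $q^3$        $(-a - m)^2$
Each pair of unlike genotypes occurs in two orders (e.g. AA parent with Aa offspring, and Aa parent with AA offspring), each with the frequency shown.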
Biometrical model for single biallelic QTL
e.g. given an AA father, an AA offspring can come from either AA x AA or AA x Aa parental mating types
AA x AA occurs with probability $p^2 \times p^2 = p^4$ and has AA offspring with Prob() = 1
AA x Aa occurs with probability $p^2 \times 2pq = 2p^3q$ and has AA offspring with Prob() = 0.5 and Aa offspring with Prob() = 0.5
therefore, P( AA father & AA offspring) = $p^4 + p^3q$
= $p^3(p + q)$
= $p^3$
Biometrical model for single biallelic QTL
3B. Contribution of the QTL to the Cov ( X,Y ) – Parent-Offspring
Cov ( $X_i$, $X_j$ ) = $(a - m)^2 p^3 + \dots + (-a - m)^2 q^3$
= $pq[a + (q - p)d]^2$
= $\tfrac{1}{2} V_{A_{QTL}}$
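The parent-offspring joint frequencies can be rebuilt by enumerating mating types under random mating and Mendelian segregation, which also confirms the covariance result numerically. This is a sketch under those two assumptions; the function names, the choice of the father as the parent, and the parameter values are illustrative.

```python
from collections import defaultdict
from itertools import product

GENOTYPES = (("A", "A"), ("A", "a"), ("a", "a"))


def parent_offspring_joint(p):
    """Joint probability of (father genotype, offspring genotype)."""
    q = 1.0 - p
    freq = dict(zip(GENOTYPES, (p * p, 2 * p * q, q * q)))
    joint = defaultdict(float)
    for father, mother in product(GENOTYPES, repeat=2):
        mating = freq[father] * freq[mother]          # random mating
        for paternal, maternal in product(father, mother):
            child = tuple(sorted((paternal, maternal)))  # "A" sorts before "a"
            joint[(father, child)] += mating * 0.25      # each gamete pair: 1/4
    return dict(joint)


def parent_offspring_covariance(p, a, d):
    """Covariance of parent and offspring genotypic effects at the QTL."""
    q = 1.0 - p
    value = dict(zip(GENOTYPES, (a, d, -a)))
    m = a * (p - q) + 2 * p * q * d
    return sum(prob * (value[parent] - m) * (value[child] - m)
               for (parent, child), prob in parent_offspring_joint(p).items())


joint = parent_offspring_joint(0.3)
print(joint[(("A", "A"), ("A", "A"))])            # p^3
print(parent_offspring_covariance(0.3, 1.0, 0.5))  # pq * alpha^2
```

The dominance deviation does not contribute because a parent transmits only one allele, never an intact genotype.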
Biometrical model for single biallelic QTL
3C. Contribution of the QTL to the Cov ( X,Y ) – Unrelated individuals
Pair         Frequency      Cross-product
AA, AA       $p^4$          $(a - m)^2$
AA, Aa       $2p^3q$        $(a - m)(d - m)$
AA, aa       $p^2q^2$       $(a - m)(-a - m)$
Aa, Aa       $4p^2q^2$      $(d - m)^2$
Aa, aa       $2pq^3$        $(d - m)(-a - m)$
aa, aa       $q^4$          $(-a - m)^2$
Cov ( $X_i$, $X_j$ ) = $(a - m)^2 p^4 + \dots + (-a - m)^2 q^4$
= 0
Biometrical model for single biallelic QTL
3D. Contribution of the QTL to the Cov ( X,Y ) – DZ twins and full sibs
# identical alleles inherited from parents    Share of genome    Equivalent to
2                                             ¼ genome           MZ twins
1 (father)                                    ¼ genome           P-O
1 (mother)                                    ¼ genome           P-O
0                                             ¼ genome           Unrelateds
¼ (2 alleles) + ½ (1 allele) + ¼ (0 alleles)
Cov ( $X_i$, $X_j$ ) = ¼ Cov(MZ) + ½ Cov(P-O) + ¼ Cov(Unrel)
= $\tfrac{1}{4}(V_{A_{QTL}} + V_{D_{QTL}}) + \tfrac{1}{2}(\tfrac{1}{2} V_{A_{QTL}}) + \tfrac{1}{4}(0)$
= $\tfrac{1}{2} V_{A_{QTL}} + \tfrac{1}{4} V_{D_{QTL}}$
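The ¼, ½, ¼ weights can be recovered by enumerating which parental allele each sib receives, and the sibling covariance then follows as a weighted mixture. The sketch below assumes independent Mendelian transmission to each sib; the function names and variance values are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_sharing_distribution():
    """Probability that two full sibs share 0, 1 or 2 parental alleles.

    Each sib receives paternal allele 0 or 1 and maternal allele 0 or 1,
    giving 16 equally likely transmission patterns.
    """
    dist = Counter()
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        shared = int(pat1 == pat2) + int(mat1 == mat2)
        dist[shared] += Fraction(1, 16)
    return dict(dist)


def sib_covariance(v_a, v_d):
    """Mix the MZ, parent-offspring and unrelated covariances by sharing."""
    sharing = ibd_sharing_distribution()
    cov_by_sharing = {2: v_a + v_d, 1: 0.5 * v_a, 0: 0.0}
    return sum(float(sharing[k]) * cov_by_sharing[k] for k in sharing)


print(ibd_sharing_distribution())
print(sib_covariance(v_a=0.6048, v_d=0.0441))  # hypothetical variance components
```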
Summary
Biometrical model predicts contribution of a QTL to the mean, variance and covariances of a trait
1 QTL
Var ( X ) = $V_{A_{QTL}} + V_{D_{QTL}}$
Cov ( MZ ) = $V_{A_{QTL}} + V_{D_{QTL}}$
Cov ( DZ ) = $\tfrac{1}{2} V_{A_{QTL}} + \tfrac{1}{4} V_{D_{QTL}}$
Multiple QTL
Var ( X ) = $\Sigma(V_{A_{QTL}}) + \Sigma(V_{D_{QTL}}) = V_A + V_D$
Cov ( MZ ) = $\Sigma(V_{A_{QTL}}) + \Sigma(V_{D_{QTL}}) = V_A + V_D$
Cov ( DZ ) = $\Sigma(\tfrac{1}{2} V_{A_{QTL}}) + \Sigma(\tfrac{1}{4} V_{D_{QTL}}) = \tfrac{1}{2} V_A + \tfrac{1}{4} V_D$
Summary
Biometrical model underlies the variance components estimation performed in Mx
Var ( X ) = $V_A + V_D + V_E$
Cov ( MZ ) = $V_A + V_D$
Cov ( DZ ) = $\tfrac{1}{2} V_A + \tfrac{1}{4} V_D$
Write equations for the means, variances and covariances of different types of relatives, or
Draw path diagrams for easy derivation of expected means, variances and covariances and translation to mathematical formulation
Allows us to represent linear models for the relationship between variables in diagrammatic form, e.g. a genetic model; a factor model; a regression model
Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
Permits easy translation into matrix formulation as used by statistical programs
Squares or rectangles denote observed variables
Circles or ellipses denote latent (unmeasured) variables
Upper-case letters are used to denote variables
Lower-case letters (or numeric values) are used to denote covariances or path coefficients
[Figure: example path diagram labelling latent variables (circles) and observed variables (squares)]
Single-headed arrows or paths (→) are used to represent causal relationships between variables under a particular model, where the variable at the tail is hypothesized to have a direct influence on the variable at the head
Double-headed arrows (↔) represent a covariance between two variables, which may arise through common causes not represented in the model. They may also be used to represent the variance of a variable
[Figure: example path diagram labelling double-headed and single-headed arrows]
Trace backwards, change direction at a two-headed arrow, then trace forwards (this implies that we can never trace through two two-headed arrows in the same chain).
The expected covariance between two variables, or the expected variance of a variable, is computed by multiplying together all the coefficients in a chain, and then summing over all possible chains.
Cov AB = kl + mqn + mpl
Cov AB =
Cov BC =
Cov AC =
Var A =
Var B =
Var C =
Cov AB = kl + mqn + mpl
Cov BC = no
Cov AC = mqo
Var A = $k^2 + m^2 + 2kpm$
Var B = $l^2 + n^2$
Var C = $o^2$
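The path diagram itself is not reproduced in this text, so the sketch below assumes one structure that is consistent with the expectations listed above: three latent variables with unit variance (here called F1, F2, F3, hypothetical names), with A = k·F1 + m·F2, B = l·F1 + n·F3, C = o·F3, Cov(F1, F2) = p and Cov(F2, F3) = q. It illustrates the translation of tracing rules into matrix form, Σ = Λ Φ Λᵀ, using made-up coefficient values.

```python
def implied_covariance(loadings, latent_cov):
    """Expected covariance matrix of observed variables: Lambda * Phi * Lambda^T."""
    n_obs, n_lat = len(loadings), len(latent_cov)
    return [[sum(loadings[i][r] * latent_cov[r][s] * loadings[j][s]
                 for r in range(n_lat) for s in range(n_lat))
             for j in range(n_obs)]
            for i in range(n_obs)]


# Hypothetical path coefficients and latent correlations
k, l, m, n, o, p, q = 0.5, 0.6, 0.7, 0.4, 0.8, 0.3, 0.2

loadings = [[k, m, 0.0],      # A = k*F1 + m*F2
            [l, 0.0, n],      # B = l*F1 + n*F3
            [0.0, 0.0, o]]    # C = o*F3
latent_cov = [[1.0, p, 0.0],  # unit variances; F1-F2 covary by p
              [p, 1.0, q],    # F2-F3 covary by q
              [0.0, q, 1.0]]

sigma = implied_covariance(loadings, latent_cov)
print(sigma[0][1], k * l + m * q * n + m * p * l)  # Cov AB by matrix and by tracing
```

Each term of a traced expectation corresponds to one non-zero product in the double sum, which is why the matrix form scales to diagrams that are tedious to trace by hand.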
MZ Twins Reared Together
DZ Twins Reared Together
MZ Twins Reared Apart
DZ Twins Reared Apart
Parents & Offspring
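Expected Covariance, MZ twins reared together
             Twin 1                  Twin 2
Twin 1       $a^2 + c^2 + e^2$       $a^2 + c^2$
Twin 2       $a^2 + c^2$             $a^2 + c^2 + e^2$
(diagonal: variance; off-diagonal: covariance)
Expected Covariance, DZ twins reared together
             Twin 1                  Twin 2
Twin 1       $a^2 + c^2 + e^2$       $.5a^2 + c^2$
Twin 2       $.5a^2 + c^2$           $a^2 + c^2 + e^2$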