Imperial090106 - Wellcome Trust Centre for Human Genetics

An integrated statistical framework for mapping quantitative traits in mice
Richard Mott
Jonathan Flint
Wellcome Trust Centre for Human Genetics, Oxford
Richard.Mott@well.ox.ac.uk
Outline
• Introduction
• QTL Mapping
• Multiple Phenotype Heterogeneous Stock Experiment
• Testing for Functional Variants
• Expression Data
• Future
Genetic Traits
• Quantitative (height, weight)
• Dichotomous (affected/unaffected)
• Factorial (blood group)
• Mendelian – controlled by single gene (cystic fibrosis)
• Complex – controlled by multiple genes*environment (diabetes, asthma)
Quantitative Trait Loci
• QTL: Quantitative Trait Locus
• QTG: Quantitative Trait Gene
• QTN: Quantitative Trait Nucleotide
[Diagram: chromosome with genes]
Map in Humans or Animal Models?
Humans
• Disease studied directly
• Population and environment stratification
• Very many SNPs (1,000,000?) required
• Hard to detect trait loci – very large sample sizes (5,000-10,000) required to detect loci of small effect
• Potentially very high mapping resolution – single gene
• Very Expensive
Animal Models
• Animal Model required
• Population and environment controlled
• Fewer SNPs required (~100-10,000)
• Easy to detect QTL with ~500 animals
• Poorer mapping resolution – 1Mb (10 genes)
• Relatively inexpensive
Mosaic Crosses
[Diagram: inbred founders; mixing (G3 to GN), "chopping up"; F2, diallel; inbreeding (F20). Labelled populations: Heterogeneous Stock, Advanced Intercross, Random Outbreds; Recombinant Inbred Lines]
Sizes of Behavioural QTL in rodents (% of total phenotypic variance)
[Histogram: x-axis Effect size (% var), 1 to 59; y-axis Number, 0 to 30]
Effect size of cloned genes
[Histogram: x-axis Effect size (% var), 1 to 59; y-axis Number, 0 to 4]
Mapping Resolution
• F2 crosses
– Powerful at detecting QTL
– Poor at Localisation – 20cM
– Too few recombinants
• Increase number of recombinants:
– more animals
– more generations in cross
Heterogeneous Stocks
• cross 8 inbred strains for >10 generations
0.25 cM
Multiple Phenotype QTL Experiment
Multiple Phenotypes measured on a Heterogeneous Stock
• 2000 HS mice (Northport, Bob Hitzeman)
– 84 families
– 40th generation
• 150 traits measured on each animal
– Standardised phenotyping protocol
– Covariates Recorded
  • Experimenter
  • Time/Date
  • Litter
– Microchipping
Phenotypes
• Anxiety (Conditioned and Unconditioned Tests)
• Asthma (Plethysmography)
• Diabetes (Glucose Tolerance Test)
• Haematology
• Immunology
• Biochemistry
• Wound Healing (Ear Punch)
• Gene Expression
• ….others….
High throughput phenotyping facility
Neophobia
Fear Potentiated Startle
Ovalbumin sensitization
Plethysmograph
Intraperitoneal Glucose Tolerance Test
Ears
Genotyping
• 15360 SNPs genotyped by Illumina
– 2000 HS mice
– 300 HS parents
– 8 inbred HS founders
– 500 other inbreds
• www.well.ox.ac.uk/mouse/snp.selector
• 13459 SNPs successful
• 99.8% accuracy (parent-offspring)
Distribution of Marker Spacing
Mean Interval (kb): 204
SD: 231
Max interval: 11328 (chromosome X)
Min interval: 0 (9 Markers)
[Histogram: x-axis Distance (Mb), 0 to 5; y-axis Number of Markers, 0 to 1200]
LD Decay with distance
[Plot: R squared (0 to 0.9) against Distance (Mb) (0 to 25), one curve per chromosome, Chr 1 to Chr 19 and Chr X]
99.2% of marker pairs on different autosomes have R2 < 0.05.
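The slides do not say how R squared was computed. As a minimal sketch, assuming it is the squared correlation between genotype counts (coded 0/1/2) at two markers, the calculation could look like the following. The function name and the marker vectors are invented toy data for illustration only.

```python
from statistics import mean

def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return float("nan")  # monomorphic marker: r^2 is undefined
    return cov * cov / (var1 * var2)

# Toy genotypes (hypothetical): marker_b tracks marker_a closely, marker_c does not.
marker_a = [0, 1, 2, 2, 1, 0, 0, 2]
marker_b = [0, 1, 2, 2, 1, 0, 1, 2]
marker_c = [1, 1, 0, 2, 1, 1, 2, 0]
print(r_squared(marker_a, marker_b), r_squared(marker_a, marker_c))
```

Plotting this quantity against the physical distance between marker pairs would give a decay curve of the kind summarised on this slide.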
Genetic Drift in HS
• 40 generations of breeding
• Allele Frequency in founders will drift
• 8% of genome fixed

Allele Frequency in Founders    Allele Frequency in HS
12.5                            14.99
25                              23.23
37.5                            29.77
50                              31.45
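To illustrate how drift over 40 generations can fix part of the genome, here is a rough Wright-Fisher simulation. It is only a sketch: the number of chromosomes resampled per generation (`N_CHROM`) is an assumed value, not the effective size of the real HS colony, so the printed fraction is not expected to reproduce the 8% figure.

```python
import random

def simulate_drift(p0, n_chrom, generations, rng):
    """Wright-Fisher drift: resample n_chrom allele copies each generation."""
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = count / n_chrom
        if p in (0.0, 1.0):
            break  # allele lost or fixed; frequency can no longer change
    return p

rng = random.Random(1)
N_CHROM = 100  # assumed number of chromosomes per generation (hypothetical)
# Start at 12.5%, i.e. an allele carried by one of eight founders.
finals = [simulate_drift(0.125, N_CHROM, 40, rng) for _ in range(500)]
fixed_fraction = sum(p in (0.0, 1.0) for p in finals) / len(finals)
print(f"fraction of simulated loci fixed after 40 generations: {fixed_fraction:.3f}")
```

Smaller values of `N_CHROM` produce faster fixation, which is why the breeding population size matters for how much founder variation survives.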
Analysis
• Automated analysis pipeline
– R HAPPY package
– Single Marker Association
• Each phenotype analysed independently
– Transformed to Normality, outliers removed
– Tailored set of covariates
– Linear models for most phenotypes
– Survival models for latency phenotypes
Twisted Pair Analysis of Heterogeneous Stock
[Diagram: chromosome with markers and alleles: 1 1 2 1 1 1 2 1 11 2 2 1 2 2 1 1 1 1 2 1 1 2 111 11 2 2 1 2 1 2]
• Want to predict ancestral strain from genotype
• We know the alleles in the founder strains
• Single marker association lacks power, can’t distinguish all strains
• Multipoint analysis – combine data from neighbouring markers
Twisted Pair Analysis of Heterogeneous Stock
• Hidden Markov model HAPPY
• Hidden states = ancestral strains
• Observed states = genotypes
• Unknown phase of genotypes
• Analyse both chromosomes simultaneously
• Twisted pair of HMMs
• Mott et al 2000 PNAS
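The idea of hidden ancestral strains and observed alleles can be sketched with a much simpler model than HAPPY itself. The example below is a haploid, single-chromosome forward-backward pass, so it ignores the unknown phase that the twisted pair of HMMs handles. The switch probability, error rate, founder alleles and observed alleles are all assumed toy values, and the function is not part of the HAPPY package.

```python
def ancestry_posteriors(obs, founder_alleles, switch_prob=0.1, err=0.01):
    """Posterior P(strain | all observed alleles) at each marker (haploid sketch)."""
    n_strains, n_markers = len(founder_alleles), len(obs)

    def emit(s, m):
        # Probability of the observed allele if the segment descends from strain s.
        return 1.0 - err if founder_alleles[s][m] == obs[m] else err

    def trans(a, b):
        # With probability switch_prob the ancestry is redrawn uniformly.
        return (1.0 - switch_prob if a == b else 0.0) + switch_prob / n_strains

    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = [[emit(s, 0) / n_strains for s in range(n_strains)]]
    for m in range(1, n_markers):
        row = [emit(b, m) * sum(fwd[-1][a] * trans(a, b) for a in range(n_strains))
               for b in range(n_strains)]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass with the same rescaling.
    bwd = [[1.0] * n_strains for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        row = [sum(trans(a, b) * emit(b, m + 1) * bwd[m + 1][b] for b in range(n_strains))
               for a in range(n_strains)]
        total = sum(row)
        bwd[m] = [x / total for x in row]

    post = []
    for m in range(n_markers):
        row = [fwd[m][s] * bwd[m][s] for s in range(n_strains)]
        total = sum(row)
        post.append([x / total for x in row])
    return post

# Three hypothetical founders typed at six markers (alleles coded 1/2).
founders = [[1, 1, 2, 1, 2, 1], [2, 1, 1, 2, 1, 1], [1, 2, 1, 1, 1, 2]]
obs = [1, 1, 2, 2, 1, 1]  # matches founder 0 on the left, founder 1 on the right
post = ancestry_posteriors(obs, founders)
for m, row in enumerate(post):
    print(m, [round(p, 3) for p in row])
```

No single marker distinguishes all three founders here, but combining neighbouring markers does, which is the point of the multipoint approach.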
Testing for a QTL
• $p_{iL}(s,t)$ = Prob(animal i is descended from strains s,t at locus L)
• $p_{iL}(s,t)$ calculated by HMM using
– genotype data
– founder strains’ alleles
• Phenotype is modelled
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
• Test for no QTL at locus L
– H0: T(s,t) are all same
– ANOVA partial F test
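The partial F test compares the model with strain effects against the model without them. A minimal sketch with NumPy follows, assuming the descent probabilities have been flattened into one column per state. The simulated probabilities, effects and helper names are hypothetical, and the real analysis also includes covariates.

```python
import numpy as np

def fit(X, y):
    """Residual sum of squares and rank of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X))

def partial_f(X_null, X_full, y):
    """Partial F statistic and degrees of freedom for nested linear models."""
    rss0, rank0 = fit(X_null, y)
    rss1, rank1 = fit(X_full, y)
    df1, df2 = rank1 - rank0, len(y) - rank1
    return ((rss0 - rss1) / df1) / (rss1 / df2), df1, df2

rng = np.random.default_rng(0)
n, k = 200, 4
P = rng.dirichlet(np.full(k, 0.3), size=n)   # hypothetical descent probabilities
T = np.array([0.0, 0.0, 1.0, 1.0])           # hypothetical state effects
y = P @ T + rng.normal(scale=0.5, size=n)

X_null = np.ones((n, 1))                      # mean only (no covariates here)
X_full = np.hstack([X_null, P])               # rows of P sum to 1, so rank is k
f_value, df1, df2 = partial_f(X_null, X_full, y)
print(f"F = {f_value:.1f} on ({df1}, {df2}) df")
```

Because each row of `P` sums to one, the design matrix is rank deficient. The sketch uses the matrix rank rather than the column count for the degrees of freedom. A p-value would come from the F distribution with `(df1, df2)` degrees of freedom.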
Genome Scan
• Additive and dominance models
• Record all peaks that exceed 5% genomewide significance,
– Threshold based on 200 permutations
– 9000 preliminary candidate QTL found
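A permutation threshold of this kind can be sketched as follows: shuffle the phenotype, record the largest test statistic across the genome, and take the 95th percentile over permutations. This is an illustration under simplifying assumptions. The statistic is a squared correlation with a single per-locus predictor rather than the F test used in the scan, covariates are ignored, and all data are simulated.

```python
import numpy as np

def max_r2(X, y):
    """Largest squared correlation between y and any column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(r ** 2))

def permutation_threshold(X, y, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: (1 - alpha) quantile of the permuted maxima."""
    rng = np.random.default_rng(seed)
    null_max = [max_r2(X, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.quantile(null_max, 1 - alpha))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))               # 50 hypothetical loci, 300 animals
y = 0.5 * X[:, 10] + rng.normal(size=300)    # one simulated QTL at locus 10
threshold = permutation_threshold(X, y)
observed = max_r2(X, y)
print(f"threshold = {threshold:.3f}, observed maximum = {observed:.3f}")
```

Taking the maximum over loci within each permutation is what makes the threshold genome-wide rather than per-locus.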
Jointly Significant QTL
• Forward selection over candidate QTL
• Test each QTL conditional on other QTL
• Rescan genome conditional on selected QTLs to identify new QTL
• 5% genome-wide significance threshold
• Bootstrap residuals to find QTL confidence regions
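Forward selection over candidates can be sketched as a greedy loop that adds whichever candidate most improves the fit, conditional on those already chosen, until nothing passes the threshold. In this hypothetical example each candidate QTL is a single numeric column and the F threshold is an arbitrary assumed value. The real analysis uses multi-parameter strain effects and a genome-wide threshold.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def forward_select(candidates, y, f_threshold):
    """Greedy forward selection; returns indices of selected candidate columns."""
    n, n_cand = candidates.shape
    selected = []
    X = np.ones((n, 1))
    while len(selected) < n_cand:
        rss0 = rss(X, y)
        best_j, best_f = None, 0.0
        for j in range(n_cand):
            if j in selected:
                continue
            X1 = np.column_stack([X, candidates[:, j]])
            rss1 = rss(X1, y)
            # One-df F statistic for candidate j given the current model.
            f = (rss0 - rss1) / (rss1 / (n - X1.shape[1]))
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None or best_f < f_threshold:
            break
        selected.append(best_j)
        X = np.column_stack([X, candidates[:, best_j]])
    return selected

rng = np.random.default_rng(3)
candidates = rng.normal(size=(300, 10))      # 10 hypothetical candidate QTL
y = 1.0 * candidates[:, 2] + 0.8 * candidates[:, 7] + rng.normal(size=300)
selected = forward_select(candidates, y, f_threshold=15.0)
print(sorted(selected))
```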
Results
• ~7 jointly significant QTL per phenotype
• 95% Confidence Interval ~ 2 Mb
• ~50% of QTL have a significant nonadditive component
• Only 3 phenotypes were explained by
single major QTL
– Most phenotypes are complex
Distribution of QTL Effects
Mean Effect size 2.7%
[Histogram: x-axis Effect size of QTL (% Var), 0 to 25; y-axis Number of QTL, 0 to 180]
Distribution of #Genes under QTL
[Histogram: x-axis #genes, binned from 0-1 up to 10-11; y-axis #QTL, 0 to 60. Annotation: +20 QTL with > 10 genes]
%Variance Explained
[Scatter plot: % Additive Genetic Variance (0 to 100) against %Var Joint QTL (0 to 100)]
[% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]
Coat colour genes

Coat colour   Gene    Chr.   Position (Mb)   HS Mapping Position
albino        Tyr     7      149             148.8 - 150.6
agouti        Asip    2      310.14          309.6 - 310.2
brown         Tyrp    4      158.4           158.2 - 159
dilute        Myo5a   9      150.8           150.8 - 151.2
A known QTL: HDL
Wang et al, 2003
HS mapping
New QTLs: two examples
• Ear Punch Hole Area Regrowth
– wound healing
• Cue Conditioning Freeze.During.Tone
– measure of fear
Cue Conditioning
• Freeze.During.Tone: huge effect, small number of genes
[Figure: chr15; cntn1: Contactin precursor (Neural cell surface protein)]
Gene x Environment
Gene x Sex
• Repeat analysis looking for QTLs that interact with
– Gender
– Litter number
– Season, Month, etc
– Experimenter
• Compare models
$E(y) = \mu + \text{locus} + \text{env}$
$E(y) = \mu + \text{locus} * \text{env}$
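The comparison of the additive and interaction models can be sketched as a nested-model F test. In this illustration the locus is reduced to a single allele dosage (0/1/2) and the environment to a binary covariate, both simulated. The actual analysis uses strain-effect terms, so treat the names and data as assumptions.

```python
import numpy as np

def fit(X, y):
    """Residual sum of squares and rank of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X))

def interaction_f(locus, env, y):
    """F statistic for locus*env versus locus + env."""
    n = len(y)
    X_add = np.column_stack([np.ones(n), locus, env])   # mu + locus + env
    X_int = np.column_stack([X_add, locus * env])       # adds the interaction
    rss0, k0 = fit(X_add, y)
    rss1, k1 = fit(X_int, y)
    return ((rss0 - rss1) / (k1 - k0)) / (rss1 / (n - k1))

rng = np.random.default_rng(2)
n = 400
locus = rng.integers(0, 3, size=n).astype(float)   # hypothetical allele dosage
env = rng.integers(0, 2, size=n).astype(float)     # hypothetical binary covariate
y = 0.2 * locus + 1.0 * locus * env + rng.normal(scale=0.5, size=n)
f_gxe = interaction_f(locus, env, y)
print(f"interaction F = {f_gxe:.1f}")
```

A large F here indicates that the locus effect differs between environments, which is what a GxE QTL means in this framework.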
Gene x Environment
• 431 jointly significant GxE QTLs
– 27 gene x experimenter
– 81 gene x litter number
– 67 gene x age
– 105 gene x study day
– 151 gene x season
• 13% of variation is GxE
• 25 GxE QTLs overlapped with original joint QTL
– defined as lying within 4Mb of the peak position
• 42 GxSex QTLs
Testing for Functional Variants
• Is a SNP functional for a trait?
• Is a functional assay measured in
founders related to a trait?
– Gene expression
– DNA-Protein binding
Testing for non-Functional Variants
• Is a SNP’s pattern of variation inconsistent
with the QTL’s pattern of action ?
• Is a functional assay’s distribution
inconsistent with the QTL’s pattern of
action ?
Merge Analysis
Yalcin et al 2005 Genetics
• Require sequence of HS founders
– Determine all variants and their strain
distribution patterns (SDP)
• Don’t genotype every variant in the HS
– Instead predict genotypes in HS at all variants
based on a sparse skeleton of genotypes
Merge Analysis
• A variant v will partition the HS founder
strains into 2 or more groups, depending
on its strain distribution pattern (SDP)
• If v is functional for the trait then the strain effects at the QTL must be identical for strains with the same allele.
– so if merging founders according to v’s SDP
destroys significance then we reject v
Merge Analysis
Model Comparison
• $p_{iL}(s,t)$ = Prob(animal i is descended from strains s,t at locus L)
• Replace strains s,t by merged pseudo-strains g,h
– Add together probabilities for strains with the same allele
– Phenotypic effect of merged strains g,h is $F(g,h)$
• $v_{iL}(g,h)$ = Prob(animal i is descended from merged strains g,h at locus L)
• Compare fits of nested models
– unmerged: $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
– merged: $E(y_i) = \sum_{g,h} v_{iL}(g,h)\,F(g,h) + \mu_i$
– null: $E(y_i) = \mu_i$
• Require no significant difference between merged and unmerged models,
– and for both to be significant compared to null model
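The merging step and the two model comparisons can be sketched as below. This is an illustration with simulated descent probabilities for four hypothetical founders and a biallelic variant, not the published implementation. The helper names, the strain distribution pattern and the data are all assumptions.

```python
import numpy as np

def merge_probabilities(P, sdp):
    """Sum strain-pair probabilities over strains that share an allele.

    P has shape (animals, strains, strains); sdp[s] is the allele of strain s.
    """
    alleles = sorted(set(sdp))
    M = np.array([[1.0 if a == g else 0.0 for g in alleles] for a in sdp])
    return np.einsum("nst,sg,th->ngh", P, M, M)

def fit(X, y):
    """Residual sum of squares and rank of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X))

def f_stat(X_small, X_big, y):
    """Partial F statistic for nested models, X_small within X_big."""
    rss0, k0 = fit(X_small, y)
    rss1, k1 = fit(X_big, y)
    return ((rss0 - rss1) / (k1 - k0)) / (rss1 / (len(y) - k1))

rng = np.random.default_rng(0)
n, sdp = 300, [0, 0, 1, 1]        # hypothetical SDP: strains 0,1 vs strains 2,3
a = rng.dirichlet(np.full(4, 0.5), size=n)
b = rng.dirichlet(np.full(4, 0.5), size=n)
# Symmetric strain-pair probabilities built from two simulated haplotypes.
P = 0.5 * (a[:, :, None] * b[:, None, :] + b[:, :, None] * a[:, None, :])
V = merge_probabilities(P, sdp)

# Simulate a trait driven only by the variant's allele dosage.
dosage = np.einsum("nst,st->n", P, np.add.outer(sdp, sdp).astype(float))
y = dosage + rng.normal(scale=0.5, size=n)

X_null = np.ones((n, 1))
X_unmerged = P.reshape(n, -1)
X_merged = V.reshape(n, -1)
f_merged_vs_null = f_stat(X_null, X_merged, y)
f_unmerged_vs_merged = f_stat(X_merged, X_unmerged, y)
print(f"merged vs null F = {f_merged_vs_null:.1f}")
print(f"unmerged vs merged F = {f_unmerged_vs_merged:.2f}")
```

In this simulation the variant really is the causal one, so merging should lose little: the merged model beats the null while the unmerged model adds no significant improvement. A variant with the wrong strain distribution pattern would instead show a large unmerged-versus-merged F and would be rejected.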
Merge Analysis
Open Field Activity, Chr 1
Merge Analysis
rgs18
Functional Merge Analysis
• Measure functional assay on HS founders
– $F_L(s)$ is value at locus L on founder s (written $F(s)$ below)
– e.g. gene expression
• Expected value in HS, assuming additivity, is
$E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)]$
• If assay is related to phenotype y then
$E(y_i) = q\,E(f_i) + \mu_i$
• Compare nested models (thanks to Chris Holmes)
– unmerged: $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
– merged: $E(y_i) = q \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)] + \mu_i$
– null: $E(y_i) = \mu_i$
• Require no significant difference between merged and unmerged models,
– and for both to be significant compared to null model
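The expected assay value and the one-parameter regression on it can be sketched as follows. The founder assay values, the descent probabilities and the true slope are invented for illustration, and the helper name is hypothetical.

```python
import numpy as np

def expected_assay(P, F):
    """E(f_i) = sum over s,t of p_iL(s,t) * [F(s) + F(t)], assuming additivity."""
    pair_values = F[:, None] + F[None, :]
    return np.einsum("nst,st->n", P, pair_values)

rng = np.random.default_rng(4)
n = 300
F = np.array([1.0, 2.0, 0.5, 3.0])   # hypothetical founder assay values
a = rng.dirichlet(np.full(4, 0.5), size=n)
b = rng.dirichlet(np.full(4, 0.5), size=n)
P = a[:, :, None] * b[:, None, :]    # simulated strain-pair probabilities

f = expected_assay(P, F)
y = 0.7 * f + rng.normal(scale=0.1, size=n)   # simulated trait with q = 0.7

# Merged model: a single slope q on the expected assay value, plus a mean.
X = np.column_stack([np.ones(n), f])
(mu_hat, q_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated q = {q_hat:.3f}")
```

The merged model spends one parameter on the locus where the unmerged model spends one per strain pair. That difference is why agreement between the two fits is informative about whether the assay explains the QTL.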
Gene Expression Data
(with Binnaz Yalcin, Jennifer Taylor)
• Illumina 40k chip
• Livers, Lungs
– 190 HS
– HS founders
[Figure: per-phenotype panels titled Weight.GrowthRanSlope, Weight.GrowthSlope, Freeze, Explore, Anx, Context, Biochem.LDL, Biochem.HDL, Biochem.Tot.Cholesterol, Biochem.Tot.Protein, Biochem.Sodium, Biochem.Phosphorous, Biochem.Urea, Biochem.Triglycerides. Axis and legend labels: locus.log(Pr>F), exp.log(Pr>F), model difference logp]
Future Work
Extensions to basic model
• Generalised linear models
• Multivariate data
• Mixture Models, EM (Chris Holmes)
• Family Effects, Variance Components, REML (Peter Visscher, Allan McRae)
• Epistasis
• Pleiotropy
Mixture Models
(with Chris Holmes, Sascha Antonyuk)
• $p_{iL}(s,t)$ = Prob(animal i is descended from strains s,t at locus L)
• Expectation Model
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
• Mixture model:
$y_i \sim \sum_{s,t} p_{iL}(s,t)\, f([y_i - \mu_i - T(s,t)]/\sigma)$
e.g. f is the standard Normal density
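The mixture model treats each animal's phenotype as drawn from a mixture over its possible strain-pair states, weighted by the descent probabilities. Below is a minimal sketch of the corresponding log-likelihood with a Normal component density. The strain pairs are flattened into a single list of states, the 1/σ normalising factor is written out explicitly, and the data are invented. An EM or numerical optimiser would maximise this quantity over the effects and σ.

```python
import math

def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mixture_loglik(y, mu, probs, effects, sigma):
    """Log-likelihood of the mixture model.

    probs[i][k] is the probability that animal i is in state k (a strain pair),
    effects[k] is the phenotypic effect T of state k.
    """
    total = 0.0
    for yi, mi, pi in zip(y, mu, probs):
        density = sum(p * normal_pdf((yi - mi - t) / sigma) / sigma
                      for p, t in zip(pi, effects))
        total += math.log(density)
    return total

# Toy data (hypothetical): two states with effects 0 and 2.
y = [0.1, 1.9, 1.0]
mu = [0.0, 0.0, 0.0]
probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ll_true = mixture_loglik(y, mu, probs, effects=[0.0, 2.0], sigma=1.0)
ll_flat = mixture_loglik(y, mu, probs, effects=[0.0, 0.0], sigma=1.0)
print(ll_true, ll_flat)
```

The expectation model averages the effects before comparing with the phenotype. The mixture model averages the densities instead, so the two differ when an animal's ancestry at the locus is uncertain.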
What do we want?
• Biological:
– Joint QTL that contain the functional genes and lead to their identification
– But genetic mapping finds the variants not the genes
• Statistical:
– Multi-locus QTL selection algorithms that predict the phenotype
of new animals accurately
– Model-Averaging: no best choice?
– Ghost QTL
• Are statistical QTL algorithms consistent?
– Do they find the biological QTL given a large enough sample
size?
– Simulations of multiple QTL models indicate mapping accuracy
declines as complexity increases [Valdar et al 2006 Genetics in press]
Conclusions
• Complexity of analysis
• No definitive analysis
• Gene x Environment
• Mouse Systems Biology
Work of many hands
Carmen Arboleda-Hitas
Amarjit Bhomra
Stephanie Burnett
Peter Burns
Richard Copley
Stuart Davidson
Simon Fiddy
Jonathan Flint
Polinka Hernandez
Sue Miller
Richard Mott
Chela Nunez
Gemma Peachey
Sagiv Shifman
Leah Solberg
Amy Taylor
Martin Taylor
William Valdar
Binnaz Yalcin
Dave Bannerman
Shoumo Bhattacharya
Bill Cookson
Rob Deacon
Dominique Gauguier
Doug Higgs
Tertius Hough
Paul Klenerman
Nick Rawlins
Jennifer Taylor
Chris Holmes
Project funded by
The Wellcome Trust, UK