Gonzalez-Martinez Association Genetics

Association genetics in forest trees
Santiago C. González-Martínez
Center of Forest Research, INIA, PO Box 8111, 28080 Madrid, Spain
santiago@inia.es
Individual  SNP 1  SNP 2  SNP 3
1           AT     GT     CC
2           AT     GT     GG
3           TT     TT     GG
4           AA     GG     CG
5           AA     GG     CG
6           AT     TT     CG
7           AA     GT     CC
8           TT     GT     GG
9           TT     TT     GG
10          TT     TT     CC
11          AT     GG     CG
12          AA     GT     CG
13          AA     TT     CG
14          TT     TT     GG
15          AT     GG     CC
16          AT     GT     CG
17          AT     GG     CC
18          AA     GG     CC
19          AA     GG     GG
20          TT     GT     GG
SNP 1: f(AA)=0.35, f(AT)=0.35, f(TT)=0.30
SNP 2: f(GG)=0.35, f(GT)=0.35, f(TT)=0.30
SNP 3: f(CC)=0.30, f(CG)=0.35, f(GG)=0.35
Trait 1, SNP 1: u(AA)=32/7=4.57, u(AT)=28/7=4.00, u(TT)=51/6=8.50
Trait 1, SNP 2: u(GG)=28/7=4.00, u(GT)=42/7=6.00, u(TT)=41/6=6.83
Trait 1, SNP 3: u(CC)=6/6=1.00, u(CG)=35/7=5.00, u(GG)=70/7=10.00
Trait 2, SNP 1: u(AA)=14/7=2.00, u(AT)=33/7=4.71, u(TT)=49/6=8.17
Trait 2, SNP 2: u(GG)=24/7=3.43, u(GT)=33/7=4.71, u(TT)=39/6=6.50
Trait 2, SNP 3: u(CC)=28/6=4.67, u(CG)=24/7=3.43, u(GG)=44/7=6.29
Individual  Trait 1  Trait 2
1           1        3
2           10       4
3           10       7
4           5        1
5           5        3
6           5        5
7           1        2
8           10       8
9           10       7
10          1        10
11          5        6
12          5        4
13          5        1
14          10       9
15          1        6
16          5        4
17          1        5
18          1        2
19          10       1
20          10       8
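The genotype frequencies and per-genotype trait means of this worked example can be recomputed with a few lines of Python. This is a minimal sketch, assuming that the three SNP columns and the trait columns list the same 20 individuals in the same order, and that the Trait 1 / Trait 2 values alternate per individual; the function name `genotype_summary` is illustrative.

```python
from collections import defaultdict

# Toy data transcribed from the slide (20 individuals, same order in every list).
snp1 = "AT AT TT AA AA AT AA TT TT TT AT AA AA TT AT AT AT AA AA TT".split()
snp2 = "GT GT TT GG GG TT GT GT TT TT GG GT TT TT GG GT GG GG GG GT".split()
snp3 = "CC GG GG CG CG CG CC GG GG CC CG CG CG GG CC CG CC CC GG GG".split()
trait1 = [1, 10, 10, 5, 5, 5, 1, 10, 10, 1, 5, 5, 5, 10, 1, 5, 1, 1, 10, 10]
trait2 = [3, 4, 7, 1, 3, 5, 2, 8, 7, 10, 6, 4, 1, 9, 6, 4, 5, 2, 1, 8]


def genotype_summary(genotypes, trait):
    """Return {genotype: (frequency, mean trait value)}."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, trait):
        groups[genotype].append(value)
    n = len(genotypes)
    return {g: (len(v) / n, sum(v) / len(v)) for g, v in groups.items()}


if __name__ == "__main__":
    for name, snp in (("SNP 1", snp1), ("SNP 2", snp2), ("SNP 3", snp3)):
        for label, trait in (("Trait 1", trait1), ("Trait 2", trait2)):
            summary = genotype_summary(snp, trait)
            text = ", ".join(
                f"f({g})={f:.2f} u({g})={m:.2f}"
                for g, (f, m) in sorted(summary.items())
            )
            print(f"{name}, {label}: {text}")
```

In this toy data set Trait 1 is fully determined by SNP 3 (means of 1, 5 and 10 for CC, CG and GG), whereas the other SNP-trait combinations show weaker or no separation between genotype classes.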
What is association genetics?
Linkage versus Association: finding the molecular variation underlying complex traits
[Figure: a favourable mutation (X) followed over several generations in a mapping pedigree (linkage group, LG) versus a natural population (= multiple genetic backgrounds)]
For which organisms is genetic association a promising approach?
• Relatively undomesticated species with outbred mating systems and large natural populations.
• Organisms with long life spans, where generating pedigrees would take several years.
• Organisms (such as humans) where artificial crosses are not possible or are difficult to obtain (incompatible species).
• In plants: opportunity to test for genetic association of multiple traits and phenotypes in long-term common garden experiments (including clonal tests, which give high precision in the estimation of phenotypes).
The ‘immortal’ association population
Linkage disequilibrium and association
[Figure, panels a) to c): decay of LD (r², 0 to 0.5) with distance (0 to 3500 base pairs) in Picea abies: all populations, P. abies without Romania, Baltico-Nordic domain and Alpine domain (Heuertz et al. 2006, Genetics); further panels from Stumpf & McVean (2003), Nature Reviews Genetics]
Rapid decay of LD in conifers, but LD might be stronger in regions under selection (example: LD extends over 800 kb around the Y1 gene in maize, Palaisa et al. 2004, although maize in general also shows a rapid decay of LD with physical distance, Remington et al. 2001).
Extent of LD and association: higher LD makes it easier to detect associations but more difficult to identify the causal mutations
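The r² statistic used to describe LD between two sites can be computed directly from haplotype frequencies. The sketch below is illustrative only: it assumes phased haplotypes for two biallelic sites, and both the function name `ld_r2` and the example haplotypes are hypothetical rather than taken from any of the cited studies.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    Each haplotype is a 2-character string: the allele at site 1
    followed by the allele at site 2.
    """
    n = len(haplotypes)
    a = haplotypes[0][0]  # reference allele at site 1
    b = haplotypes[0][1]  # reference allele at site 2
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical haplotypes: complete LD, then no LD.
    print(ld_r2(["AG", "AG", "TC", "TC"]))
    print(ld_r2(["AG", "AC", "TG", "TC"]))
```

With high r² a genotyped SNP is a good proxy for a nearby causal variant, which is why detection becomes easier while pinpointing the causal mutation becomes harder.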
Variation among species
[Figure: LD decay (r² against distance in base pairs) in conifers (Picea abies: all populations, without Romania, Baltico-Nordic domain, Alpine domain) compared with humans]
Variation among genes
Stumpf & McVean (2003)
Nature Reviews Genetics
Approaches to genetic association in plants
[Figure: association methods arranged by population structure and familial relatedness. Labels: complex demography; population structure unknown; natural populations; breeding populations. Methods shown: SA, GC, GLM, MLM, TDT, QTDT]
Based on Yu & Buckler (2006)
Current Opinion in Biotechnology
Power considerations: the size of an association population
A single random mating population with mutation, random genetic drift, and recombination
[Figure: power (0 to 1) against % variation explained by QTN (0 to 50) for N=500, N=100 and N=50]
Long & Langley (1999)
Genome Research
Increased rate of false positives due to population structure…
…but correcting for population structure can produce false negatives!
[Figure: Hirschhorn & Daly 2005, Nature Reviews Genetics]
[Figure, maritime pine (panels a to c): drought cline; multiple glacial refugia with Moroccan, Western and Eastern haplotypes; postglacial migrations]
Power considerations: structured populations
[Figure: power against % variation explained by QTN; Zhao et al. (2007), PLoS Genetics]
(Small association population of ~100 accessions)
Methods for genetic association in forest trees
• Standard general linear models (GLMs), usually with p values
computed by permutation.
$y = \mu + m_i + e_{ij}$, where $y$ is the trait value, $\mu$ is a general mean, $m_i$ is the genotype of the i-th SNP and $e_{ij}$ is the residual.
• Structured Association (Pritchard et al. 2000; Thornsberry 2001)
and PCA Association (Price et al. 2006).
Controls for population structure by incorporating a Q matrix.
This matrix is an n × p population structure incidence matrix
where n is the number of individuals assayed and p is the
number of populations defined.
• Mixed Linear Models (MLMs; Yu et al. 2006).
They incorporate a Q matrix (fixed effect) but also a pairwise relatedness matrix (K matrix, a random effect), which accounts for within-population structure.
• Family-based methods (Transmission Disequilibrium Test, TDT or
QTDT, and its several extensions).
Parents must be heterozygous to be informative.
From few to moderate genetic backgrounds tested.
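The first method in the list, a general linear model with permutation-based p values, can be sketched as follows. This is a simplified illustration, not the procedure of any of the cited papers: it assumes a single SNP treated as a factor (a one-way ANOVA F statistic), reuses the SNP 1 and Trait 2 toy values from the worked example at the start of these slides, and uses illustrative function names. No correction for population structure (Q or K matrix) is included.

```python
import random
from collections import defaultdict


def f_statistic(genotypes, trait):
    """One-way ANOVA F statistic with genotype class as the factor."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, trait):
        groups[genotype].append(value)
    n, k = len(trait), len(groups)
    grand_mean = sum(trait) / n
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v
    )
    if ss_within == 0:
        return float("inf")  # genotype explains the trait completely
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(genotypes, trait, n_perm=999, seed=1):
    """Share of shuffled data sets whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        if f_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data from the worked example (SNP 1 and Trait 2).
snp1 = "AT AT TT AA AA AT AA TT TT TT AT AA AA TT AT AT AT AA AA TT".split()
trait2 = [3, 4, 7, 1, 3, 5, 2, 8, 7, 10, 6, 4, 1, 9, 6, 4, 5, 2, 1, 8]

if __name__ == "__main__":
    print("F =", round(f_statistic(snp1, trait2), 2))
    print("permutation p =", permutation_p_value(snp1, trait2))
```

Shuffling the trait values keeps the genotype counts and the trait distribution intact, so the p value does not rely on normally distributed residuals. With `n_perm` permutations the smallest attainable p value is 1/(n_perm + 1).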
FBRC association population in loblolly pine
Partial diallel, including 15-24 offspring from 61 families.
Association with WUE (isotope discrimination in two sites)
[Figure: trait value (-0.8 to 0.8) by genotype and family for DHN1-S2]
González-Martínez et al. (2008)
Heredity
Corrections for multiple testing
• Experiment-wise permutation
• Bonferroni (α/k, with k = the number of tests)
• False Discovery Rate (FDR)
FDR: the expected proportion of false positives among all significant tests
Storey & Tibshirani (2003)
PNAS
Permutation tests
(Hirschhorn and Daly 2005)
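The difference between the Bonferroni threshold and FDR control can be illustrated with a short sketch. Note the assumptions: the FDR procedure shown here is the Benjamini-Hochberg step-up rule, a simpler stand-in for the q-value approach of Storey & Tibshirani (2003), and the p values are hypothetical numbers chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test when p < alpha / k, with k the number of tests."""
    k = len(p_values)
    return [p < alpha / k for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    # Largest rank whose p value falls under its rank-specific threshold.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / k:
            largest_rank = rank
    rejected = [False] * k
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p values for ten SNP-trait tests (not from any study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

if __name__ == "__main__":
    print("Bonferroni rejections:", sum(bonferroni(example_p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(example_p)))
```

Bonferroni controls the chance of even one false positive across all k tests and is therefore the more conservative of the two; FDR control tolerates a set proportion of false positives among the significant tests, which usually retains more candidate associations for follow-up.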
Some examples
Pinus taeda L.: monolignol biosynthesis and cell-wall related genes (González-Martínez et al. 2007, Genetics). Continuous range, no clear population genetic structure.
Pinus pinaster Ait.: drought tolerance (Collada et al., in prep.). 22 populations. Fragmented range, significant population structure.
[Map: Pinus pinaster geographic range, with country labels France, Spain, Portugal, Tunisia and Morocco. Labelled populations: (2) Restonica, (10) Aulenne, (11) Pineta, (15) Pinia, (20) Cenicientos, (21) Arenas de San Pedro, (22) Coca, (23) Cuellar, (24) Valdemaqueda, (25) San Leonardo de Yagüe, (26) Bayubas de Abajo, (27) San Cipriano, (28) Ahin, (29) Oria, (30) Tamrabta, (40) Petrock, (41) Mimizan, (42) Hourtin, (43) Le Verdon, (44) Olonne/Mer, (45) St Jean de Monts, (46) Pleucadec, (47) Erdeven, (50) Tabarka]
ADEPT project
TREESNIPS project
(also P. sylvestris, Picea abies and oaks)
Genetic association with wood property traits
Phenotypic traits
• Earlywood specific gravity (ewsg)
• Latewood specific gravity (lwsg)
• Percent latewood (lw)
• Earlywood microfibril angle (ewmfa)
• Lignin & cellulose content (lgn-cel)
[Figure: cell-wall layers (primary wall; secondary wall S1, S2 and S3) and microfibril angle]
• Synthetic PCAs for different wood-age types
SNP genotyping
FP-TDI platform: 58 SNPs from 20 wood- and drought-related candidate genes.
González-Martínez et al. 2007
Genetics
Significant genetic association of the cad gene with earlywood specific gravity, and of 4cl with % latewood
[Figure: gene maps of cad and 4cl (scale in bp) showing primer sites (F/R labels) and SNP positions; legend: tested but not giving significant associations]
cinnamyl alcohol dehydrogenase (cad)
SNP M28 (position 16 bp)
[Figure: M28/M29 haplotypes and the corresponding amino acid sequences (positions 10 and 180 marked *): MGSLESEKTV […] SPMKHFGMTEP versus MGSLETEKTV […] SPMKHFAMTEP]
Genetic association with WUE
Phenotypic traits
• Isotope discrimination (WUE)
• Growth (height, diameter, annual
increments)
• Biomass (total and aerial)
• Ontogeny scores
• Survival
Provenance-progeny combined tests in two sites: Cálcena (central Spain) & Bordeaux (southwestern France)
SNP genotyping
Pyrosequencing: relatively high genotyping error.
Collada et al. (in prep.)
GLMs, population as a factor
[Figure: haplotype table for SNPs in the candidate genes agp4, dhn1, ccoaomt, erd3, dhn2, lp3-3 and rd21, including a Pinus taeda row and the counts 1, 6, 5, 10, 29, 1, 2, 1, 1, 1, 1]
SNP positions, in slide order: 470, 1062, 1069, 116, 171, 1229, 92, 248, 254, 259, 293, 43, 69, 75, 223, 267, 272 and 3 bp
BLUEs (pop effect removed), isotope discrimination (labels: pr-agp4, FRD13C). P values, in slide order: 0.1469, 0.000999, 0.000999, 0.013, 0.2188, 0.0699, 0.4256, 0.4286, 0.2927, 0.3646, 0.3457, 0.4605, 0.7373, 0.027, 0.3377, 0.9071, 0.4366, 0.7313
[Figure: BLUEs (-0.20 to 0.20) by genotype (TT, GT, GG). Average for TT: 0.0034. Average for GT: -0.0407. Central/marginal pairs]
Tassel demo
R SNPassoc package demo
Perspectives on genetic association in forest trees
• Enormous potential, but still many technical challenges ahead: optimization of SNP genotyping platforms, dealing with recently evolved gene families, building large unstructured association populations, transferring information to non-model species, etc.
• Linking genotype and phenotype through association genetics works well for well-known metabolic pathways, and for some species such as loblolly pine genome-wide approaches are now in place. As large-scale association studies are developed, more complex questions will be addressed: gene interactions, heterosis, plasticity (G x E), etc.
• Apart from industry applications, given the ecosystem-wide importance of forest trees, genetic association will have a strong influence on evolutionary and ecological research.
Absence of trans-specific SNPs between P. pinaster and P. taeda, two pine species separated by ~120 Myr
[Figure: ABA-and-WDS-induced-gene-3 (lp3-3) in P. pinaster, primers F1 and R1, positions 0, 185, 352 and 406 bp]
Variable sites (nt), in column order: 43, 44, 55, 59, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 81, 85, 87, 91, 97, 106, 115, 127, 134, 143, 156, 158, 161, 188, 196, 198, 199, 200, 201, 204, 223, 235, 236, 246, 267, 272, 298, 318, 319, 330, 363
Species      Haplotype  Alleles at the variable sites (blocks of 10)
P. pinaster  Hap_1      CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTAAGATAC
P. pinaster  Hap_2      CGCGGGAGGT GAAGAGTGAG TGCGACCTGG GCCGATCCTT TTCTCATAC
P. pinaster  Hap_3      CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTCTCATAC
P. pinaster  Hap_4      CGCGGGAGGA GAAGACTGAG TGCGACCTGG GCCGATCCTT CTAAGATAC
P. pinaster  Hap_5      CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCAT TTAAGATAC
P. pinaster  Hap_6      TGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCAT TTAAGATAC
P. pinaster  Hap_7      CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTAAGATAT
P. pinaster  Hap_8      CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTCAGATAC
P. taeda     Hap_A      CGTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGACGC
P. taeda     Hap_B      CTTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGACGC
P. taeda     Hap_C      CGTA------ ------CATT CTTAGTAGAA A-TA---TTC TCAAGACGC
P. taeda     Hap_D      CGTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGGCGC
Average Ks between P. pinaster and P. taeda of ~2%
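The absence of shared polymorphisms can be checked directly from the lp3-3 haplotype alignment on this slide. The sketch below is a hedged illustration: it assumes the alignment columns correspond, in order, to the listed nt positions, treats "-" as an alignment gap that does not count as an allele, and uses an illustrative function name.

```python
# Variable sites (nt) in column order, as listed on the slide.
positions = [
    43, 44, 55, 59, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
    81, 85, 87, 91, 97, 106, 115, 127, 134, 143, 156, 158, 161, 188, 196,
    198, 199, 200, 201, 204, 223, 235, 236, 246, 267, 272, 298, 318, 319,
    330, 363,
]

# Haplotypes transcribed from the slide, written in blocks of 10 sites.
pinaster = {
    "Hap_1": "CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTAAGATAC",
    "Hap_2": "CGCGGGAGGT GAAGAGTGAG TGCGACCTGG GCCGATCCTT TTCTCATAC",
    "Hap_3": "CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTCTCATAC",
    "Hap_4": "CGCGGGAGGA GAAGACTGAG TGCGACCTGG GCCGATCCTT CTAAGATAC",
    "Hap_5": "CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCAT TTAAGATAC",
    "Hap_6": "TGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCAT TTAAGATAC",
    "Hap_7": "CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTAAGATAT",
    "Hap_8": "CGCGGGAGGT GAAGACTGAG TGCGACCTGG GCCGATCCTT TTCAGATAC",
}
taeda = {
    "Hap_A": "CGTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGACGC",
    "Hap_B": "CTTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGACGC",
    "Hap_C": "CGTA------ ------CATT CTTAGTAGAA A-TA---TTC TCAAGACGC",
    "Hap_D": "CGTA------ ------CATT CTTAGTAGGA A-TA---TTC TCAAGGCGC",
}


def polymorphic_sites(haplotypes):
    """Positions where one species carries more than one (non-gap) allele."""
    sequences = [s.replace(" ", "") for s in haplotypes.values()]
    sites = []
    for column, position in enumerate(positions):
        alleles = {seq[column] for seq in sequences} - {"-"}
        if len(alleles) > 1:
            sites.append(position)
    return sites


pinaster_sites = polymorphic_sites(pinaster)
taeda_sites = polymorphic_sites(taeda)
shared_sites = sorted(set(pinaster_sites) & set(taeda_sites))

if __name__ == "__main__":
    print("P. pinaster polymorphic sites:", pinaster_sites)
    print("P. taeda polymorphic sites:", taeda_sites)
    print("Shared (trans-specific) sites:", shared_sites)
```

In this alignment the sites that vary within P. pinaster and those that vary within P. taeda do not overlap, which is what the slide title states; the remaining columns are fixed differences or indels between the two species.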
Acknowledgements
TREESNIPS (for maritime pine: C. Collada, E. Eveno, M.A.
Guevara, A. Booth, A. Soto, C. Plomion, L. Díaz, S. McCallum, I.
Aranda, O. Brendel, R. Alía, V. Leger, J. Brach, J. Russell, P.H.
Garnier-Géré, M.T. Cervera)
ADEPT & ADEPT2 (N.C. Wheeler, E. Ersoz, G.R. Brown, G.P. Gill,
R.J. Kuntz, J.A. Beal, J. Manares, D. Huber, J. Davis, B. Pande, J.
Lee, A. Eckert, J. Wegrzyn, C.D. Nelson)
FUNDING AGENCIES (NSF, CSREES-USDA, EU, MEC-Spain)
and, of course, all of you!