Association genetics in forest trees

Santiago C. González-Martínez
Center of Forest Research, INIA, PO Box 8111, 28080 Madrid, Spain
santiago@inia.es

Individual  SNP 1  SNP 2  SNP 3  Trait 1  Trait 2
1           AT     GT     CC     1        3
2           AT     GT     GG     10       4
3           TT     TT     GG     10       7
4           AA     GG     CG     5        1
5           AA     GG     CG     5        3
6           AT     TT     CG     5        5
7           AA     GT     CC     1        2
8           TT     GT     GG     10       8
9           TT     TT     GG     10       7
10          TT     TT     CC     1        10
11          AT     GG     CG     5        6
12          AA     GT     CG     5        4
13          AA     TT     CG     5        1
14          TT     TT     GG     10       9
15          AT     GG     CC     1        6
16          AT     GT     CG     5        4
17          AT     GG     CC     1        5
18          AA     GG     CC     1        2
19          AA     GG     GG     10       1
20          TT     GT     GG     10       8

Genotype frequencies
SNP 1: f(AA)=0.35  f(AT)=0.35  f(TT)=0.30
SNP 2: f(GG)=0.35  f(GT)=0.35  f(TT)=0.30
SNP 3: f(CC)=0.30  f(CG)=0.35  f(GG)=0.35

Genotype means (u = sum of trait values / number of individuals)
Trait 1, SNP 1: u(AA)=32/7=4.57  u(AT)=28/7=4.00  u(TT)=51/6=8.50
Trait 1, SNP 2: u(GG)=28/7=4.00  u(GT)=42/7=6.00  u(TT)=41/6=6.83
Trait 1, SNP 3: u(CC)=6/6=1.00   u(CG)=35/7=5.00  u(GG)=70/7=10.00
Trait 2, SNP 1: u(AA)=14/7=2.00  u(AT)=33/7=4.71  u(TT)=49/6=8.17
Trait 2, SNP 2: u(GG)=24/7=3.43  u(GT)=33/7=4.71  u(TT)=39/6=6.50
Trait 2, SNP 3: u(CC)=28/6=4.67  u(CG)=24/7=3.43  u(GG)=44/7=6.29

What is association genetics?

Linkage versus association: finding the molecular variation underlying complex traits

[Figure: a favourable mutation (X) followed over several generations; LG mapping in a pedigree versus association in a natural population (= multiple genetic backgrounds)]

For which organisms is genetic association a promising approach?
• Relatively undomesticated species with outbred mating systems and large natural populations.
• Organisms with long life spans, where generating pedigrees would take several years.
• Organisms (such as humans) where artificial crosses are not possible or are difficult to obtain (incompatible species).
• In plants: opportunity to test for genetic association of multiple traits and phenotypes in long-term common garden experiments (including clonal tests, which give high precision in the estimation of phenotypes).

The ‘immortal’ association population

Linkage disequilibrium and association

[Figure, panels a to c: decay of linkage disequilibrium (r²) with distance (0 to 3500 base pairs) in Picea abies; series: Picea abies all, P. abies without Romania, Baltico-Nordic domain, Alpine domain. Sources: Heuertz et al. 2006 Genetics; Stumpf & McVean (2003) Nature Reviews Genetics]

Rapid decay of LD in conifers, but LD might be stronger in regions under selection (example: LD extends over 800 kb around the Y1 gene in maize, Palaisa et al. 2004, although maize in general also shows a rapid decay of LD with physical distance, Remington et al. 2001).

Extent of LD and association: higher LD makes it easier to detect associations but more difficult to identify the causal mutations.

Variation among species; variation among genes

[Figure: decay of r² with distance (base pairs) in conifers (Picea abies, same series as above) compared with humans. Stumpf & McVean (2003) Nature Reviews Genetics]

Approaches to genetic association in plants

[Figure: association methods (SA, GC, GLM, MLM, TDT, QTDT) arranged by population structure and familial relatedness, for natural populations (complex demography, population structure unknown) and breeding populations. Based on Yu & Buckler (2006) Current Opinion in Biotechnology]

Power considerations: the size of an association population

A single random mating population with mutation, random genetic drift, and recombination.

[Figure: power (0 to 1) against % variation explained by the QTN (0 to 50) for N=500, N=100 and N=50. Long & Langley (1999) Genome Research]

Increased rate of false positives due to population structure… but correcting for population structure produces false negatives!

[Figure, panels a to c: maritime pine haplotypes (Moroccan, Western, Eastern), multiple glacial refugia, postglacial migrations and a drought cline. Sources cited on the slide: Hirschhorn & Daly 2005 Nature Reviews Genetics; Zhao et al. (2007) PLoS Genetics]

Power considerations: structured populations

[Figure: power against % variation explained by the QTN, for a small association population of ~100 accessions. Zhao et al. (2007) PLoS Genetics]

Methods for genetic association in forest trees
• Standard general linear models (GLMs), usually with p-values computed by permutation.
$y = \mu + m_i + e_{ij}$, where $y$ is the trait value, $\mu$ is a general mean, $m_i$ is the genotype of the i-th SNP and $e_{ij}$ is the residual.
• Structured Association (Pritchard et al. 2000; Thornsberry et al. 2001) and PCA Association (Price et al. 2006). They control for population structure by incorporating a Q matrix: an n × p population structure incidence matrix, where n is the number of individuals assayed and p is the number of populations defined.
• Mixed Linear Models (MLMs; Yu et al. 2006). They incorporate a Q matrix (fixed effect) and also a pairwise relatedness matrix (K matrix, a random effect), which accounts for within-population structure.
• Family-based methods (Transmission Disequilibrium Test, TDT, or QTDT, and their several extensions). Parents must be heterozygous to be informative. Only a few to a moderate number of genetic backgrounds are tested.

FBRC association population in loblolly pine

Partial diallel, including 15-24 offspring from 61 families. Association with WUE (isotope discrimination in two sites).

[Figure: trait value (-0.8 to 0.8) by genotype and family for DHN1-S2. González-Martínez et al. (2008) Heredity]

Corrections for multiple testing
• Experiment-wise permutation
• Bonferroni ($\alpha/k$, with k = the number of tests)
• False Discovery Rate (FDR)

FDR: the expected proportion of false positives among all significant tests. Storey & Tibshirani (2003) PNAS
Permutation tests (Hirschhorn and Daly 2005)

Some examples

Monolignol biosynthesis and cell-wall related genes: González-Martínez et al. (2007) Genetics
Drought tolerance: Collada et al. (in prep.)

Pinus taeda L.
Continuous range, no clear population genetic structure.

Pinus pinaster Ait.
22 populations. Fragmented range, significant population structure.

[Map: Pinus pinaster geographic range, with country labels France, Spain, Portugal, Tunisia and Morocco. Labelled populations: (2) Restonica, (10) Aulenne, (11) Pineta, (15) Pinia, (20) Cenicientos, (21) Arenas de San Pedro, (22) Coca, (23) Cuellar, (24) Valdemaqueda, (25) San Leonardo de Yagüe, (26) Bayubas de Abajo, (27) San Cipriano, (28) Ahin, (29) Oria, (30) Tamrabta, (40) Petrock, (41) Mimizan, (42) Hourtin, (43) Le Verdon, (44) Olonne/Mer, (45) St Jean de Monts, (46) Pleucadec, (47) Erdeven, (50) Tabarka]

ADEPT project
TREESNIPS project (also P. sylvestris, Picea abies and oaks)

Genetic association with wood property traits

Phenotypic traits
• Earlywood specific gravity (ewsg)
• Latewood specific gravity (lwsg)
• Percent latewood (lw)
• Earlywood microfibril angle (ewmfa)
• Lignin & cellulose content (lgn-cel)
• Synthetic PCAs for different wood-age types

[Figure: cell wall layers (primary wall; secondary wall with layers S1, S2, S3) and the microfibril angle]

SNP genotyping
FP-TDI platform. 58 SNPs from 20 wood- and drought-related candidate genes. González-Martínez et al. 2007 Genetics

Significant genetic association of the cad gene with earlywood specific gravity and of 4cl with % latewood

[Figure: maps of the cad and 4cl genes (0 to 3500 bp) with primer and SNP positions; some SNPs were tested but gave no significant associations. Cinnamyl alcohol dehydrogenase (cad) SNP M28 (position 16 bp), shown with SNP M29 and protein alignment fragments MGSLESEKTV […] SPMKHFGMTEP and MGSLETEKTV […] SPMKHFAMTEP (residues 10 and 180 marked)]

Genetic association with WUE

Phenotypic traits
• Isotope discrimination (WUE)
• Growth (height, diameter, annual increments)
• Biomass (total and aerial)
• Ontogeny scores
• Survival

Provenance-progeny combined tests in two sites: Cálcena (central Spain) & Bordeaux (southwestern France)

SNP genotyping
Pyrosequencing. Relatively high genotyping error. Collada et al. (in prep.)
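The single-marker GLM listed under the methods (trait value = general mean + genotype effect + residual, with p-values by permutation) and the Bonferroni correction can be sketched on the 20-individual toy data from the opening slide. This is only an illustration under stated assumptions, not the analysis used in the studies cited here: the function names are hypothetical, genotype is treated as a three-level factor, the between-genotype sum of squares stands in for the F statistic (under permutation the total sum of squares is fixed, so both rank permutations identically), and no population factor, Q matrix or K matrix is included.

```python
import numpy as np

# Toy data from the opening slide: 20 individuals, 3 SNPs, 2 traits.
SNPS = {
    "SNP1": "AT AT TT AA AA AT AA TT TT TT AT AA AA TT AT AT AT AA AA TT".split(),
    "SNP2": "GT GT TT GG GG TT GT GT TT TT GG GT TT TT GG GT GG GG GG GT".split(),
    "SNP3": "CC GG GG CG CG CG CC GG GG CC CG CG CG GG CC CG CC CC GG GG".split(),
}
TRAITS = {
    "Trait1": np.array([1, 10, 10, 5, 5, 5, 1, 10, 10, 1,
                        5, 5, 5, 10, 1, 5, 1, 1, 10, 10], dtype=float),
    "Trait2": np.array([3, 4, 7, 1, 3, 5, 2, 8, 7, 10,
                        6, 4, 1, 9, 6, 4, 5, 2, 1, 8], dtype=float),
}


def genotype_means(genotypes, y):
    """Mean trait value for each genotype class (hypothetical helper)."""
    g = np.asarray(genotypes)
    return {str(c): float(y[g == c].mean()) for c in np.unique(g)}


def between_ss(genotypes, y):
    """Between-genotype sum of squares; monotone in the one-way ANOVA F
    when trait values are permuted, and safe when residual variance is 0."""
    g = np.asarray(genotypes)
    grand = y.mean()
    return float(sum((g == c).sum() * (y[g == c].mean() - grand) ** 2
                     for c in np.unique(g)))


def permutation_pvalue(genotypes, y, n_perm=999, seed=0):
    """Permutation p-value: share of shuffled trait vectors whose statistic
    is at least as large as the observed one (observed data included)."""
    rng = np.random.default_rng(seed)
    observed = between_ss(genotypes, y)
    hits = sum(between_ss(genotypes, rng.permutation(y)) >= observed - 1e-9
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def run_all(alpha=0.05, n_perm=999):
    """Test every SNP-trait pair and flag those passing Bonferroni (alpha/k)."""
    tests = [(s, t) for s in SNPS for t in TRAITS]
    threshold = alpha / len(tests)  # k = 6 tests in this toy example
    results = {}
    for s, t in tests:
        p = permutation_pvalue(SNPS[s], TRAITS[t], n_perm=n_perm)
        results[(s, t)] = (p, p <= threshold)
    return threshold, results


if __name__ == "__main__":
    threshold, results = run_all()
    print(f"Bonferroni threshold: {threshold:.4f}")
    for (snp, trait), (p, significant) in results.items():
        print(snp, trait, genotype_means(SNPS[snp], TRAITS[trait]),
              f"p={p:.3f}", "significant" if significant else "")
```

With 999 permutations the smallest attainable p-value is 0.001, so the number of permutations has to grow with the number of tests for a Bonferroni threshold to remain reachable.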
GLMs, population as a factor

[Figure: SNP haplotype table for the candidate genes agp4, dhn1, ccoaomt, erd3, dhn2, lp3-3 and rd21, with Pinus taeda as reference and haplotype counts, next to per-SNP p-values for isotope discrimination (FRD13C; BLUEs, population effect removed). SNP positions, in slide order: 470bp, 1062bp, 1069bp, 116bp, 171bp, 1229bp, 92bp, 248bp, 254bp, 259bp, 293bp, 43bp, 69bp, 75bp, 223bp, 267bp, 272bp, 3bp. P-values, in slide order: 0.1469, 0.000999, 0.000999, 0.013, 0.2188, 0.0699, 0.4256, 0.4286, 0.2927, 0.3646, 0.3457, 0.4605, 0.7373, 0.027, 0.3377, 0.9071, 0.4366, 0.7313]

[Figure: isotope discrimination (BLUEs, population effect removed; axis from -0.20 to 0.20) by pr-agp4 genotype (TT, GT, GG). Average for TT: 0.0034. Average for GT: -0.0407]

Central/marginal pairs

Tassel demo

R SNPassoc package demo

Perspectives on genetic association in forest trees
• Enormous potential, but still many technical challenges ahead: optimization of SNP genotyping platforms, dealing with recently evolved gene families, building large unstructured association populations, transferring information to non-model species, etc.
• Linking genotype and phenotype through association genetics works well for well-known metabolic pathways, and for some species, such as loblolly pine, genome-wide approaches are now in place. As large-scale association studies are developed, more complex questions will be addressed: gene interactions, heterosis, plasticity (G × E), etc.
• Apart from industry applications, given the ecosystem-wide importance of forest trees, genetic association will have a strong influence on evolutionary and ecological research.

Absence of trans-specific SNPs between P. pinaster and P. taeda, two pine species separated by ~120 Myr

[Figure: map of ABA-and-WDS-induced-gene-3 (lp3-3) in P. pinaster, with primers F1 and R1 and positions 0, 185, 352 and 406]

Variable sites (49 columns, in order): nt_43 nt_44 nt_55 nt_59 nt_64 nt_65 nt_66 nt_67 nt_68 nt_69 nt_70 nt_71 nt_72 nt_73 nt_74 nt_75 nt_76 nt_77 nt_81 nt_85 nt_87 nt_91 nt_97 nt_106 nt_115 nt_127 nt_134 nt_143 nt_156 nt_158 nt_161 nt_188 nt_196 nt_198 nt_199 nt_200 nt_201 nt_204 nt_223 nt_235 nt_236 nt_246 nt_267 nt_272 nt_298 nt_318 nt_319 nt_330 nt_363

P. pinaster
Hap_1  CGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCTTTTAAGATAC
Hap_2  CGCGGGAGGTGAAGAGTGAGTGCGACCTGGGCCGATCCTTTTCTCATAC
Hap_3  CGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCTTTTCTCATAC
Hap_4  CGCGGGAGGAGAAGACTGAGTGCGACCTGGGCCGATCCTTCTAAGATAC
Hap_5  CGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCATTTAAGATAC
Hap_6  TGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCATTTAAGATAC
Hap_7  CGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCTTTTAAGATAT
Hap_8  CGCGGGAGGTGAAGACTGAGTGCGACCTGGGCCGATCCTTTTCAGATAC

P. taeda
Hap_A  CGTA------------CATTCTTAGTAGGAA-TA---TTCTCAAGACGC
Hap_B  CTTA------------CATTCTTAGTAGGAA-TA---TTCTCAAGACGC
Hap_C  CGTA------------CATTCTTAGTAGAAA-TA---TTCTCAAGACGC
Hap_D  CGTA------------CATTCTTAGTAGGAA-TA---TTCTCAAGGCGC

Average Ks between P. pinaster and P. taeda of ~2%

Acknowledgements

TREESNIPS (for maritime pine: C. Collada, E. Eveno, M.A. Guevara, A. Booth, A. Soto, C. Plomion, L. Díaz, S. McCallum, I. Aranda, O. Brendel, R. Alía, V. Leger, J. Brach, J. Russell, P.H. Garnier-Géré, M.T. Cervera)

ADEPT & ADEPT2 (N.C. Wheeler, E. Ersoz, G.R. Brown, G.P. Gill, R.J. Kuntz, J.A. Beal, J. Manares, D. Huber, J. Davis, B. Pande, J. Lee, A. Eckert, J. Wegrzyn, C.D. Nelson)

FUNDING AGENCIES (NSF, CSREES-USDA, EU, MEC-Spain)

and, of course, all you!