Landscape genomics in sugar pines (Pinus lambertiana)
Exploring patterns of adaptive genetic variation along environmental gradients
Carl Vangestel
Spatial Genomics

Why associations with measures of aridity?
• Drought stress is a common cause of mortality and annual yield loss
• Shortage of water is one of the strongest environmental constraints and abiotic selective forces in trees
• Geography directly affects water availability → clinal variation in adaptive traits
• Future climate change → affects local abiotic conditions and the distribution of trees
  → higher temperatures and increased variability in precipitation in the SW US
  → increase in frequency and intensity of drought

Why sugar pine?
• Sugar pines are less tolerant to drought stress than other conifer species
  → expected to show strong clinal patterns in adaptive genetic variation along an aridity gradient
  → very sensitive to future climate change: alterations in the current distribution range
• One of the most diverse genomes among conifers
  → average heterozygosity of specific genes was 26 percent (upper range of pines studied so far)

Climate Change
[Figure: map panels labelled current, 2030, 2060 and 2090. Different scenarios; Hadley Climate Scenario. Source: USDA Forest Service, RMRS, Moscow Forestry Science Laboratory]

• Detailed knowledge of adaptive variation may become crucial to mitigate the impact of global climate change
• How adaptive variation is distributed over the range of environments is largely unknown

Goal of this study:
• identify adaptive SNPs associated with variation in temperature, precipitation, aridity index (precipitation/potential evapotranspiration) and elevation
• functionally annotate these genes
• explore both neutral and adaptive variation across the sugar pine's range

N = 338 individuals

• Transcriptome assembly: Sanger, 454 (pool) and Illumina (3 ind)
• Candidate SNP selection: literature, SNP quality
• MYB proteins (stomatal closure, etc.)
• Heat shock proteins (prevention of protein denaturation during cellular dehydration)
• Trehalose-6-phosphate synthase (osmotic protection of cell membranes during dehydration)
• LEA proteins (membrane and protein stabilisers, etc.)
• ...
• First screening: 67 genes selected
• Second screening: 109 under review

Multi-analytical approach
• Generalized linear models
• Fst outlier analysis
• Bayesian environmental analyses

Neutral SNP
• Gene flow (IBD)
• Genetic drift

• "Separate" neutral patterns from selective ones
• Explore adaptive patterns while accounting for neutral population structure
'Neutral SNP' 'Adaptive SNP'

Generalized linear models
For each SNP j:

$$
\log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_{int} + \beta_{0}\,ENV_{i} + \beta_{1}\,q_{1i} + \dots + \beta_{12}\,q_{12i}
$$

ENV_i = environmental value for tree i
q_1i .. q_12i: first n (here 12) principal components of the Q-matrix for tree i

Fst Outlier Analysis
• Arlequin
• BayeScan
[Figure: BayeScan plots of fst against log10(q value) at FDR = 0.2, FDR = 0.05 and FDR = 0.001; labelled points include SNPs 10, 11 and 66]
[Figure: posterior distributions (density) of Alpha10, Alpha11 and Alpha66]
HPDI: SNP10 [0.68, 2.35]; SNP11 [0.92, 2.52]; SNP66 [0.00, 2.20]

Bayesian Environmental Analysis
[Diagram labels: ε_l (f_ancestral); x_l1 ... x_l5; g(θ_l1) ... g(θ_l5)]
• Drift: population frequencies (f_population) deviate
• Gene flow: deviations covary
• Transformed variable: g(θ_li)

Heat map of var-cov matrix (Coop et al., 2010)
Structure (rows and columns ordered pop1 to pop5):

$$
\Omega =
\begin{pmatrix}
1 & \rho & \rho^{2} & \rho^{3} & \rho^{4} \\
\rho & 1 & \rho & \rho^{2} & \rho^{3} \\
\rho^{2} & \rho & 1 & \rho & \rho^{2} \\
\rho^{3} & \rho^{2} & \rho & 1 & \rho \\
\rho^{4} & \rho^{3} & \rho^{2} & \rho & 1
\end{pmatrix}
$$

• Selected 1 SNP per gene for the var-cov matrix (excluded putative selective genes)
[Figure: BayEnv correlation matrix and pairwise Fst matrix]

• Formulate null model: drift/gene flow
• Alternative model: drift/gene flow + selection

Null model:
$$
P(\theta_{l} \mid \Omega, \varepsilon_{l}) \sim N\big(\varepsilon_{l},\ \varepsilon_{l}(1-\varepsilon_{l})\,\Omega\big)
$$

Alternative model:
$$
P(\theta_{l} \mid \Omega, \varepsilon_{l}, \beta) \sim N\big(\varepsilon_{l} + \beta Y,\ \varepsilon_{l}(1-\varepsilon_{l})\,\Omega\big)
$$

• Bayes Factor: ratio of the posterior probability under the alternative model to the one under the null model
• A high BF is indicative of selection
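
The null and alternative models above can be illustrated with a small sketch. It builds the banded covariance structure Ω with entries ρ^|i−j| for five populations and compares the Gaussian log-density of one locus under the two models. Everything numeric here is an assumption for illustration only: the value of ρ, the ancestral frequency ε, the environmental vector Y, the θ values and the fixed β are made up, and the function names are hypothetical. The quantity returned is a log-density ratio at a single fixed β, not the Bayes factor itself, which would average over the unknown parameters.

```python
import math

def ar1_covariance(n_pops, rho):
    """Omega[i][j] = rho ** |i - j|, the banded structure shown for pop1..pop5."""
    return [[rho ** abs(i - j) for j in range(n_pops)] for i in range(n_pops)]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, computed via the Cholesky factor."""
    n = len(x)
    low = cholesky(cov)
    z = []  # forward substitution: solve low z = x - mean
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((x[i] - mean[i] - s) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

def log_density_ratio(theta, eps, omega, beta, env):
    """log N(theta; eps + beta*Y, eps(1-eps)Omega) - log N(theta; eps, eps(1-eps)Omega)."""
    scale = eps * (1.0 - eps)
    cov = [[scale * v for v in row] for row in omega]
    null_mean = [eps] * len(theta)
    alt_mean = [eps + beta * y for y in env]
    return mvn_logpdf(theta, alt_mean, cov) - mvn_logpdf(theta, null_mean, cov)

# Made-up example: theta follows the environmental gradient exactly.
omega = ar1_covariance(5, 0.5)
env = [-2.0, -1.0, 0.0, 1.0, 2.0]        # standardized environmental variable Y
theta = [0.30, 0.35, 0.40, 0.45, 0.50]   # transformed allele frequencies
ratio = log_density_ratio(theta, eps=0.4, omega=omega, beta=0.05, env=env)
print(ratio)  # positive: the alternative model fits this locus better
```

Setting `beta=0.0` makes the two means identical, so the ratio is zero; that is a quick sanity check that only the environmental term separates the two models.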
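
The generalized linear model from the earlier slide (logit of a per-tree probability regressed on the environmental value plus principal components of the Q-matrix) can be sketched in the same spirit. This is a minimal illustration, not the software used in the study: the response is assumed to be a binary allele-presence indicator, only one principal component is included instead of twelve, the data are invented, and plain gradient ascent stands in for whatever fitting routine was actually used. `fit_logistic` is a hypothetical helper.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(rows, labels, lr=0.1, n_iter=5000):
    """Fit log(pi / (1 - pi)) = b_int + b_0 * ENV + b_1 * q_1 + ...
    by gradient ascent on the mean log-likelihood."""
    n_coef = len(rows[0]) + 1          # intercept + one coefficient per covariate
    coef = [0.0] * n_coef
    for _ in range(n_iter):
        grad = [0.0] * n_coef
        for x, y in zip(rows, labels):
            feats = [1.0] + list(x)
            resid = y - sigmoid(sum(c * f for c, f in zip(coef, feats)))
            for k, f in enumerate(feats):
                grad[k] += resid * f
        coef = [c + lr * g / len(rows) for c, g in zip(coef, grad)]
    return coef

# Invented data for 20 trees: (ENV_i, q_1i) and allele presence at one SNP.
rows, labels = [], []
for i in range(20):
    env_i = -1.9 + 0.2 * i               # environmental value, e.g. an aridity index
    q1_i = 0.5 if i % 2 == 0 else -0.5   # first principal component of the Q-matrix
    present = 1 if env_i > 0 else 0
    if i in (3, 16):                     # two discordant trees, so the fit stays finite
        present = 1 - present
    rows.append((env_i, q1_i))
    labels.append(present)

b_int, b_env, b_q1 = fit_logistic(rows, labels)
print(b_int, b_env, b_q1)  # b_env is the SNP-environment association of interest
```

Including the principal components as covariates is what lets the environmental coefficient be read as an association beyond neutral population structure; a per-SNP test would then ask whether `b_env` differs from zero.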