Ecological genetics of Arabidopsis thaliana from reservoir populations in low-disturbance habitats

Neil Pearson, Warwick HRI
Contact: n.pearson@warwick.ac.uk
Student ID: 0867630
Supervisors: Professor Eric Holub, Professor Robin Allaby
Funding: BBSRC

Summary

Bioinformatic analyses of high-density SNP data from Arabidopsis thaliana accessions collected in the UK will attempt to identify regions in the genome that could trace the history of haplotypes back to founding populations, and to determine whether regions are under selection due to parasitism (e.g., Albugo candida, the white blister rust pathogen). A number of genome-wide studies have recently discovered evidence of selection in the human genome, and this project will extend such techniques into the field of Arabidopsis genomic research.

Haplotype blocks under selection will be identified by incidences of contiguous covariant SNPs at a rate significantly higher than expected under a neutral model. The lengths and distributions of these haplotypes may grant insight into the migratory history and recent adaptive walks of A. thaliana, and may also provide indications of the allelic compositions of the founders of the UK population. The population history inference software DADI will also be used to compare a frequency spectrum derived from SNP data with models of potential population histories.

In parallel, genome-wide association mapping will be used to identify regions that may confer susceptibility in a global sample of A. thaliana to a common oomycete parasite, Albugo candida (white blister rust). Use of A. candida will allow a test for correlation between resistance and haplotype blocks; such a correlation would indicate positive selection for resistance to infection. Correlation between haplotype blocks and environment types (gardens versus low-disturbance wall sites), or between broader habitat types, will be investigated using available geographic data.

A software solution dedicated to finding evidence of positive selection from such combined SNP and phenotype data will be produced and released to facilitate further research into the underlying genetic causes of phenotypic variation using this approach.

The aim of this project is therefore to identify local genetic variation in A. thaliana that can be attributed to the action of selection, especially that caused by A. candida.

Background

Efforts to understand the genetic basis of phenotypic diversity have advanced in 3 major stages, with techniques generally being pioneered in human genetics and subsequently applied to the study of other model organisms, including A. thaliana:

1. Human Genome Project: First complete sequence of the entire genome. Raised the possibility that the genetic causes of all phenotypic variation might soon be known.
2. International HapMap Consortium: Utilised high-density SNP data to attempt to trace genetic differences responsible for phenotypic variation. Shifted perspective from simple Mendelian characters to more complex, quantitative traits, as described by Plomin (2010).
3. 1000 Genomes Project: Resequencing effort, made possible by technological advances. Addresses biases inherent to the HapMap approach, and enables comprehensive genome-wide association mapping techniques.

These techniques, pioneered in human genetic research, have proven effective when applied to A. thaliana, being used, for example, to identify genes associated with flowering time (Ehrenreich et al., 2009). It is further argued (Bergelson and Roux, 2010) that placing such genome-wide association studies in an ecological context enables the study of past evolutionary trajectories (i.e., selective walks) and the prediction of future ones.

Following this line of thought, three complementary tests were applied to a set of genome-wide SNP data generated from A. thaliana by Horton et al. (2012), in order to identify previously unknown genomic regions that are under selection. Many were found, but the exact details of the population history responsible for these results are, as yet, unknown.

Image 1 Albugo candida, encountered on Arabidopsis thaliana's close relative and competitor Capsella bursa-pastoris (Shepherd's purse), causing the disease 'white rust'

Images 2a-g Observed response phenotypes to Albugo candida infection in Arabidopsis thaliana, ranging from complete resistance (a) through partial resistance (b, c, d, e) to full susceptibility (f, g)

Objective 1: Identification of haplotypes

Previous work (Platt et al., 2010) indicated that, globally, the Arabidopsis thaliana population followed a model of gene flow known as isolation by distance, in which the likelihood of two individuals sharing alleles decreases as geographic separation increases, and that this model held true at all scales examined across Eurasia. Due to the relatively small number of loci available, however, this approach could not be used to investigate specific predictions concerning selection acting on particular loci. Applying a similar approach to the 250K SNP dataset allows such predictions to be made.

Two subsets of accessions were selected from the full dataset: accessions collected in the UK, and accessions from the Nordborg-Bergelson set. SNP data were divided into windows of 100 adjacent loci each, and a script was used to locate pairs of accessions with similarity of 99% or greater in each window. A K-means clustering algorithm was then used to separate out haplotypes that failed to be distinguished due to proximity (see Figure 2).

Results (Images 3a and 3b) show a close similarity in the distribution of haplotypes across the genome in both subsets. This suggests that, in all likelihood, most haplotypes are older than the species' entry to the UK. This will be confirmed by a comparison seeking corresponding blocks occurring in both subsets, which will also enable a comparison of frequencies between subsets (see Objective 3).
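
The windowed similarity search described under Objective 1 can be sketched roughly as follows. This is a minimal illustration rather than the script actually used in the project: the function names, the input layout (a dictionary mapping each accession name to a string of allele codes) and the handling of the last, shorter window are all assumptions, and missing genotype calls are not treated specially.

```python
from itertools import combinations


def windows(n_loci, size=100):
    """Yield (start, end) index pairs for consecutive non-overlapping windows."""
    for start in range(0, n_loci, size):
        yield start, min(start + size, n_loci)


def similarity(a, b):
    """Fraction of loci at which two equal-length allele sequences match."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similar_pairs(genotypes, size=100, threshold=0.99):
    """Map each window index to the accession pairs meeting the threshold.

    `genotypes` is a hypothetical layout: {accession_name: allele_string},
    with every allele string covering the same loci in the same order.
    """
    names = sorted(genotypes)
    n_loci = len(genotypes[names[0]])
    result = {}
    for w, (start, end) in enumerate(windows(n_loci, size)):
        hits = []
        for p, q in combinations(names, 2):
            score = similarity(genotypes[p][start:end], genotypes[q][start:end])
            if score >= threshold:
                hits.append((p, q))
        result[w] = hits
    return result


# Made-up toy data: three accessions, 200 loci, alleles coded "0"/"1".
toy = {
    "A": "0" * 200,
    "B": "0" * 99 + "1" + "0" * 100,  # differs from A at one locus in window 0
    "C": "1" * 100 + "0" * 100,       # differs from A throughout window 0
}
print(similar_pairs(toy))
# {0: [('A', 'B')], 1: [('A', 'B'), ('A', 'C'), ('B', 'C')]}
```

Comparing every pair in every window scales quadratically with the number of accessions, which is acceptable for a sketch but would probably need vectorising for a full 250K SNP dataset.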
Objective 2: QTL mapping of Albugo resistance

A set of Multiparent Advanced Generation Intercross (MAGIC) lines (see Kover et al., 2009) was grown and inoculated with ACEM2, a race of Albugo candida, and the resulting phenotypes were recorded (see Images 2a to 2g). Analysis of phenotypes using MAGIC mapping software revealed 3 association peaks, closely corresponding with the genes WRR4 (Borhan et al., 2008) and WRR5/6 (Holub and Cevik, pers. comm.), and with an unnamed gene. On first analysis using all accessions, a strong association peak is found (see Images 3c and 3d). Removing lines showing complete resistance reveals two more peaks (Image 3e).

This experiment is now being repeated with a second A. candida isolate collected from C. bursa-pastoris, in order to establish that the two isolates are clonal and provoke the same phenotypes, associated with the same defence genes, in A. thaliana.

Figure 2 An overview of run structure in this group: an example of similarity within a short series of windows (windows 1 to 10) for the accessions Sq_1, NFA_8, Hil_1, Crl_1, Edburgh_8, HR_5, HR_10, Cnt_1, UKSE6_640, UKSE6_618, UKID35, UKID87, UKID103, UKID28, UKID17, CIBC_5A, UKSE6_626 and PHW_13, demonstrating the necessity of employing clustering analysis to determine haplotype structure. Vertical marks represent ≥99% similarity between 2 or more accessions; horizontal marks represent less extensive similarity.
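
To make the idea of run structure in Figure 2 concrete, the sketch below collects, for each pair of accessions, the maximal runs of consecutive windows in which the pair stays above the similarity threshold. It is only an illustration under stated assumptions: the input layout (window number mapped to a list of similar pairs) and the function name are hypothetical, the toy values are invented, and this simple run-merging step is not the K-means clustering used in the project.

```python
def similarity_runs(pairs_by_window):
    """Return {pair: [(first_window, last_window), ...]} of maximal runs.

    `pairs_by_window` is a hypothetical layout: {window_number: [pair, ...]},
    where each pair is a tuple of two accession names.
    """
    runs = {}
    open_runs = {}  # pair -> window number at which its current run began
    last = None
    for w in sorted(pairs_by_window):
        current = set(pairs_by_window[w])
        # A gap in window numbering ends every run that is still open.
        contiguous = last is not None and w == last + 1
        for pair in list(open_runs):
            if not contiguous or pair not in current:
                runs.setdefault(pair, []).append((open_runs.pop(pair), last))
        for pair in current:
            open_runs.setdefault(pair, w)  # keeps the start of a continuing run
        last = w
    # Close whatever is still open after the final window.
    for pair, start in open_runs.items():
        runs.setdefault(pair, []).append((start, last))
    return runs


# Made-up example with placeholder accession names A, B and C.
example = {
    1: [("A", "B")],
    2: [("A", "B"), ("B", "C")],
    3: [("B", "C")],
    4: [],
    5: [("A", "B")],
}
runs = similarity_runs(example)
print(runs[("A", "B")])  # [(1, 2), (5, 5)]
print(runs[("B", "C")])  # [(2, 3)]
```

Overlapping runs such as those of (A, B) and (B, C) in windows 2 are the kind of case in which neighbouring haplotypes cannot be told apart by position alone, which is where a clustering step becomes useful.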
Image 3a-e Haplotypes (≥99% similarity) identified from 250K SNP data in UK accessions (a) and international accessions (b), and MAGIC mapping traces using all phenotypes (c), binary resistant/susceptible phenotypes (d), and with entirely resistant phenotypes removed (e)

Image 4 PCA of international 250K SNP data (taken from Horton et al., 2012, supplementary data)

Objective 3: Further investigation

Several lines of enquiry may now be followed:
• Measure A. candida resistance phenotypes of accessions used in the 250K dataset
• Carry out DADI analysis (Gutenkunst et al., 2010), comparing frequency spectra to models specified from data derived from Platt et al. (2010), in order to infer population history, in addition to simple geographic correlations of haplotypes
• Use Kimura's equation (Kimura & Ohta, 1973) to estimate divergence time (in generations) between haplotypes found in UK and Nordborg-Bergelson accessions, assuming neutrality
• Resample in regions showing differences in frequency. Use F-statistics and Hardy-Weinberg equilibrium to identify instances of gene flow between populations in distinct geographic areas, and of selection

DADI analysis

Image 5 Initial 2-dimensional comparison of 250K SNP data (UK and Nordborg-Bergelson groups) against a frequency spectrum (FS) derived from a bottlenecked and diverging population model. Note that the process of constructing the data is, as yet, flawed.

The end goal… RELATE THE ECOLOGY TO THE GENETICS

References

• Borhan, M. H., Gunn, N., Cooper, A., Gulden, S., Tör, M., Rimmer, S. R., & Holub, E. B. (2008). WRR4 encodes a TIR-NB-LRR protein that confers broad-spectrum white rust resistance in Arabidopsis thaliana to four physiological races of Albugo candida. Molecular plant-microbe interactions : MPMI, 21(6), 757-68.
• Ehrenreich, I. M., Hanzawa, Y., Chou, L., Roe, J. L., Kover, P. X., & Purugganan, M. D. (2009). Candidate gene association mapping of Arabidopsis flowering time. Genetics, 183(1), 325-35.
• Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2010). Inferring the demographic history of multiple populations from genomic polymorphism data. Statistics, 4-4.
• Horton, M. W., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Auton, A., Muliyati, N. W., et al. (2012). Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nature Genetics, 44(2), 212-216. Nature Publishing Group.
• Kimura, M., & Ohta, T. (1973). The age of a neutral mutant persisting in a finite population. Genetics, 75(1). Genetics Soc America. Retrieved from http://www.genetics.org/content/75/1/199.short
• Kover, P. X., Valdar, W., Trakalo, J., Scarcelli, N., Ehrenreich, I. M., Purugganan, M. D., Durrant, C., et al. (2009). A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS genetics, 5(7), e1000551.
• Platt, A., Horton, M., Huang, Y. S., Li, Y., Anastasio, A. E., Mulyati, N. W., Agren, J., et al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS genetics, 6(2), e1000843.
• Plomin, R., Haworth, C. M. A., & Davis, O. S. P. (n.d.). … quantitative traits. Genetics.

Acknowledgments

Prof. Eric Holub
Prof. Robin Allaby
Dr. Volkan Cevik
Warwick School of Life Sciences
BBSRC