Ecological genetics of Arabidopsis thaliana from
reservoir populations in low-disturbance habitats
Neil Pearson, Warwick HRI. Contact: n.pearson@warwick.ac.uk. Student ID: 0867630
Supervisors: Professor Eric Holub, Professor Robin Allaby. Funding: BBSRC
Summary
Bioinformatic analyses of high-density SNP data from Arabidopsis thaliana accessions
collected in the UK will attempt to identify regions of the genome that trace the
history of haplotypes back to founding populations, and to determine whether regions
are under selection due to parasitism (e.g., Albugo candida, the white blister rust
pathogen). A number of genome-wide studies have recently discovered evidence of
selection in the human genome, and this project will extend such techniques into the
field of Arabidopsis genomic research.
Haplotype blocks under selection will be identified as runs of contiguous covariant
SNPs occurring at a rate significantly higher than expected under a neutral model. The
lengths and distributions of these haplotypes may grant insight into the migratory history
and recent adaptive walks of A. thaliana, and may also indicate the allelic
composition of the founders of the UK population. The population history inference
software DADI will also be used to compare a frequency spectrum derived from SNP
data with models of potential population histories.
In parallel, genome-wide association mapping will be used to identify regions that may
confer susceptibility in a global sample of A. thaliana to a common oomycete parasite,
Albugo candida (white blister rust). The A. candida results will allow a test for
correlation with haplotype blocks, which would indicate positive selection for
resistance to infection.
Correlation between haplotype blocks and environment types (gardens versus
low-disturbance wall sites), or between broader habitat types, will be investigated using
available geographic data. A software solution dedicated to finding evidence of positive
selection from such combined SNP and phenotype data will be produced and released
to facilitate further research into the underlying genetic causes of phenotypic variation
using this approach.
The aim of this project is therefore to identify local genetic variation in A. thaliana that
can be attributed to the action of selection, especially that caused by A. candida.
Background
Efforts to understand the genetic basis of phenotypic diversity have advanced in three
major stages, with techniques generally being pioneered in human genetics and
subsequently applied to the study of other model organisms, including A. thaliana:
1. Human Genome Project: First complete sequence of the entire genome. Raised the
possibility that the genetic causes of all phenotypic variation might soon be known
2. International HapMap Consortium: Utilised high-density SNP data to attempt to
trace genetic differences responsible for phenotypic variation. Shifted perspective
from simple Mendelian characters to more complex, quantitative traits, as
described by Plomin (2010)
3. 1000 Genomes Project: Resequencing effort, made possible by technological
advances. Addresses biases inherent to the HapMap approach, and enables
comprehensive genome-wide association mapping techniques
These techniques, pioneered in human genetic research, have proven effective when
applied to A. thaliana, being used, for example, to identify genes associated with
flowering time (Ehrenreich et al., 2009). It is further argued (Bergelson and Roux, 2010)
that placing such genome-wide association studies in an ecological context enables the
study of past evolutionary trajectories and the prediction of future ones (i.e., selective
walks).
Following this line of thought, three complementary tests were applied to a set of
genome-wide SNP data generated from A. thaliana by Horton et al. (2012), in order to
identify previously unknown genomic regions that are under selection. Many were
found, but the exact details of the population history responsible for these results are,
as yet, unknown.
Image 1 Albugo candida, encountered on Arabidopsis thaliana’s close relative and
competitor Capsella bursa-pastoris (shepherd’s purse), causing the disease ‘white rust’
Images 2a-g Observed response phenotypes to Albugo candida infection in Arabidopsis
thaliana, ranging from complete resistance (a) through partial resistance (b, c, d, e)
to full susceptibility (f, g)
Objective 1: Identification of haplotypes
Previous work (Platt et al., 2010) indicated that
globally, the Arabidopsis thaliana population
followed a model of gene flow known as isolation by
distance – in which the likelihood of two individuals
sharing alleles decreases as geographic separation
increases – and that this model held true at all scales
examined across Eurasia. Due to the relatively small
number of loci available, however, this approach
could not be used to investigate specific predictions
concerning selection acting on particular loci.
Applying a similar approach with the 250K SNP
dataset, though, allows such predictions to be made.
Two subsets of accessions were selected from the
full dataset: accessions collected in the UK, and
accessions from the Nordborg-Bergelson set. SNP
data were divided into windows of 100 adjacent loci
each, and a script was used to find pairs of
accessions with 99% or greater similarity within
each window. A K-means clustering algorithm was
then used to separate haplotypes that could not
otherwise be distinguished because of their
proximity (see Figure 2).
Results (Figures 1a and 1b) show a close similarity in
the distribution of haplotypes across the genome in
both subsets. This suggests that most haplotypes
predate the species’ arrival in the UK; this will be
tested by searching for corresponding blocks that
occur in both subsets, which will also allow
haplotype frequencies to be compared between
subsets (see Objective 3).
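The window comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the project: the function name `similar_pairs_by_window`, the toy accession names and the encoding of alleles as characters are all assumptions, and the K-means step is not shown.

```python
from itertools import combinations


def similar_pairs_by_window(snps, window=100, threshold=0.99):
    """Find accession pairs that are near-identical within each SNP window.

    snps: dict mapping accession name -> sequence of allele calls, all of
          the same length and in the same genomic order (assumed layout).
    Returns a dict mapping window index -> list of (name_a, name_b) pairs
    whose proportion of matching loci in that window is >= threshold.
    """
    names = sorted(snps)
    n_loci = len(snps[names[0]])
    hits = {}
    for w, start in enumerate(range(0, n_loci, window)):
        end = min(start + window, n_loci)  # the last window may be shorter
        size = end - start
        pairs = []
        for a, b in combinations(names, 2):
            same = sum(x == y for x, y in zip(snps[a][start:end],
                                              snps[b][start:end]))
            if same / size >= threshold:
                pairs.append((a, b))
        hits[w] = pairs
    return hits


# Toy data: three hypothetical accessions, eight loci, windows of four loci.
toy = {
    "acc_A": "00110011",
    "acc_B": "00110000",
    "acc_C": "11110011",
}
result = similar_pairs_by_window(toy, window=4, threshold=1.0)
```

With the poster's settings the call would use `window=100` and `threshold=0.99`; the toy call uses smaller values only so that the result can be checked by eye.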
Objective 2: QTL mapping of Albugo resistance
A set of Multiparent Advanced Generation Inter-Cross
(MAGIC) lines (see Kover et al., 2009) was
grown and inoculated with ACEM2, a race of Albugo
candida, and the resulting phenotypes were recorded
(see Images 2a to 2g). Analysis of the phenotypes using
MAGIC mapping software revealed three association
peaks, closely corresponding to the genes WRR4
(Borhan et al., 2008) and WRR5/6 (Holub and Cevik,
pers. comm.), and to an unnamed gene. The first
analysis, using all lines, reveals one strong association
peak (see Images 3c and 3d). Removing
lines showing complete resistance reveals two more
peaks (Image 3e).
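The three phenotype sets behind Images 3c to 3e (all phenotypes, binary resistant/susceptible, and completely resistant lines removed) could be prepared along these lines. This is only a sketch: the function name, the line identifiers and the ordinal coding are hypothetical, and the binary cut-off (only classes f and g counted as susceptible) is an assumption, since the poster does not say where partial resistance falls.

```python
# Phenotype classes follow the panel labels of Images 2a-g:
# "a" = complete resistance, "b"-"e" = partial resistance,
# "f"-"g" = full susceptibility.
CLASS_ORDER = "abcdefg"
SUSCEPTIBLE = {"f", "g"}  # assumption: partial resistance counts as resistant


def phenotype_codings(scores):
    """Build three phenotype codings from per-line class letters.

    scores: dict mapping line identifier -> class letter "a".."g".
    Returns (ordinal, binary, no_complete):
      ordinal     - class position, 0 (complete resistance) to 6
      binary      - 1 if fully susceptible, otherwise 0
      no_complete - ordinal coding with completely resistant lines dropped
    """
    ordinal = {line: CLASS_ORDER.index(cls) for line, cls in scores.items()}
    binary = {line: int(cls in SUSCEPTIBLE) for line, cls in scores.items()}
    no_complete = {line: v for line, v in ordinal.items() if v > 0}
    return ordinal, binary, no_complete


# Hypothetical scores for four MAGIC lines.
scores = {"line_1": "a", "line_2": "c", "line_3": "g", "line_4": "a"}
ordinal, binary, no_complete = phenotype_codings(scores)
```

Each coding would then be passed to the mapping software separately; that step depends on the software's own input format and is not shown.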
This experiment is now being repeated with a
second A. candida isolate, collected from C. bursa-pastoris,
in order to establish whether the two isolates are clonal
and provoke the same phenotypes, associated
with the same defence genes in A. thaliana.
An overview of run structure in this group: windows 1 to 10, for accessions Sq_1, NFA_8,
Hil_1, Crl_1, Edburgh_8, HR_5, HR_10, Cnt_1, UKSE6_640, UKSE6_618, UKID35, UKID87,
UKID103, UKID28, UKID17, CIBC_5A, UKSE6_626 and PHW_13.
Figure 2 An example of similarity within a short series of windows,
demonstrating the necessity of employing clustering analysis to determine
haplotype structure. Vertical marks represent ≥99% similarity between 2
or more accessions, horizontal marks represent less extensive similarity.
Images 3a-e Haplotypes (≥99% similarity) identified from 250K SNP data in UK accessions
(a) and international accessions (b), and MAGIC mapping traces using all phenotypes (c),
binary resistant/susceptible phenotypes (d), and with entirely resistant phenotypes removed (e)
DADI analysis
Image 4 PCA of international 250K SNP data (taken from Horton et al. 2012
supplementary data)
Objective 3: Further investigation
Several lines of enquiry may now be followed:
• Measure A. candida resistance phenotypes of the accessions used in the 250K dataset
• Carry out DADI analysis (Gutenkunst et al., 2010) of frequency spectra in
order to infer population history, complementing simple geographic correlations of
haplotypes with comparisons to models specified from data derived from Platt et al. (2010)
• Use Kimura’s equation (Kimura & Ohta, 1973) to estimate divergence time (in
generations) between haplotypes found in UK and Nordborg-Bergelson accessions,
assuming neutrality
• Resample in regions showing differences in frequency. Use F-statistics and
Hardy-Weinberg equilibrium to identify instances of gene flow between populations in
distinct geographic areas, and of selection
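The quantities named in this list can be illustrated with a short sketch, under several assumptions. The frequency spectrum is computed here as a folded spectrum from 0/1 allele calls, with each inbred accession contributing one allele per locus; DADI builds spectra through its own data structures, which are not used here. F_ST is computed per SNP as (H_T - H_S) / H_T for two equally weighted populations. The age function uses the mean age of a neutral allele at frequency x, -4 N_e [x / (1 - x)] ln x generations, a result from Kimura & Ohta (1973); whether this is the exact equation intended above is an assumption. All function names and toy values are hypothetical.

```python
import math
from collections import Counter


def folded_sfs(genotypes):
    """Folded site frequency spectrum from a 0/1 SNP matrix.

    genotypes: list of accessions, each a list of 0/1 allele calls.
    Entry k of the result counts SNPs whose minor allele is carried by
    exactly k accessions (k = 0 means the site is monomorphic).
    """
    n = len(genotypes)
    counts = Counter()
    for j in range(len(genotypes[0])):
        ones = sum(acc[j] for acc in genotypes)
        counts[min(ones, n - ones)] += 1
    return [counts[k] for k in range(n // 2 + 1)]


def wright_fst(p1, p2):
    """Per-SNP F_ST from allele frequencies in two equally weighted groups."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                    # pooled diversity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0                                       # monomorphic overall
    return (h_total - h_within) / h_total


def mean_neutral_allele_age(x, n_e):
    """Mean age (generations) of a neutral allele at frequency x, 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return -4.0 * n_e * (x / (1.0 - x)) * math.log(x)


# Toy data: four hypothetical accessions scored at five SNPs.
toy = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
sfs = folded_sfs(toy)
fst = wright_fst(0.2, 0.8)
age = mean_neutral_allele_age(0.5, 1000)
```

The heterozygosity terms in `wright_fst` are expected values under random mating, so in a largely selfing species they are better read as gene diversities than as observed heterozygosities.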
The end goal…
Image 5 Initial 2-dimensional comparison of 250K SNP data (UK and Nordborg-Bergelson
groups) against a frequency spectrum (FS) derived from a bottlenecked and diverging
population model. Note that the process of constructing the data is, as yet, flawed.
RELATE THE ECOLOGY TO THE GENETICS
References
• Borhan, M. H., Gunn, N., Cooper, A., Gulden, S., Tör, M., Rimmer, S. R., & Holub, E. B.
(2008). WRR4 encodes a TIR-NB-LRR protein that confers broad-spectrum white rust
resistance in Arabidopsis thaliana to four physiological races of Albugo candida.
Molecular plant-microbe interactions : MPMI, 21(6), 757-68.
• Ehrenreich, I. M., Hanzawa, Y., Chou, L., Roe, J. L., Kover, P. X., & Purugganan, M. D.
(2009). Candidate gene association mapping of Arabidopsis flowering time. Genetics,
183(1), 325-35.
• Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2010).
Inferring the demographic history of multiple populations from genomic polymorphism data.
Statistics, 4-4.
• Horton, M. W., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Auton, A.,
Muliyati, N. W., et al. (2012). Genome-wide patterns of genetic variation in worldwide
Arabidopsis thaliana accessions from the RegMap panel. Nature Genetics, 44(2), 212-216.
Nature Publishing Group.
• Kimura, M., & Ohta, T. (1973). The age of a neutral mutant persisting in a finite
population. Genetics, 75(1). Genetics Soc America. Retrieved from
http://www.genetics.org/content/75/1/199.short
• Kover, P. X., Valdar, W., Trakalo, J., Scarcelli, N., Ehrenreich, I. M., Purugganan, M. D.,
Durrant, C., et al. (2009). A Multiparent Advanced Generation Inter-Cross to fine-map
quantitative traits in Arabidopsis thaliana. PLoS genetics, 5(7), e1000551.
• Platt, A., Horton, M., Huang, Y. S., Li, Y., Anastasio, A. E., Mulyati, N. W., Agren, J., et
al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS genetics, 6(2),
e1000843.
• Plomin, R., Haworth, C. M. A., & Davis, O. S. P. (n.d.).
quantitative traits. Genetics.
Acknowledgments
Prof. Eric Holub
Prof. Robin Allaby
Dr. Volkan Cevik
Warwick School of Life Sciences
BBSRC