Use of RAD Sequencing to
Create a Meiotic Recombination
Map in Arabidopsis
Natasha Elina
Plant Sciences, University of Cambridge
Edinburgh
October 21st 2009
Meiotic Recombination
Homologous chromosomes
Chiasma
Recombinant chromatids
Recombination is not evenly distributed along
the chromosome – ‘hot’ and ‘cold’ spots
Drouaud et al., Genome Research, 2006
What is the distribution of crossover frequencies (a fine map)?
What genomic features do crossover distributions correlate with?
Why do we want to know that?
Plant breeding,
population genetics
Model Plant – Arabidopsis thaliana
Genome completely sequenced
Sequence information available for
different ecotypes
Epigenomic information (siRNA
database, DNA methylation, etc)
Mutants that change this information
More cross-overs in male than in female
meiosis; the sex-averaged number is 8.9 per meiosis
High Resolution Mapping of Meiotic
Recombination Points
Columbia-0 X Landsberg erecta
F1
Back-crossing F1 to one of the parents (Columbia-0)
RAD sequencing this population and making a recombination map
Two Arabidopsis Ecotypes Columbia-0 and
Landsberg erecta
Col-0
Ler
1% sequence divergence ~ 120,000 total
polymorphisms
Statistical Rationale for the Project
Arabidopsis genome – 5 chromosomes
Whole genome ~125Mb
Average of 8.9 cross-overs per meiosis
~240 SNPs/chromosome;
1200 SNPs total
200 kb ~ 1 cM
1200 SNP markers in 500 F2 plants will give a 95%
likelihood of observing an average of 10 CO events
per interval
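The figures on this slide can be explored with a back-of-envelope calculation. The sketch below is not the calculation behind the 95% statement; it assumes, purely for illustration, that cross-overs fall uniformly along the genome (which the talk says they do not) and that an F2 plant samples two meioses while a backcross plant samples one. The function names are hypothetical.

```python
# Back-of-envelope marker spacing and cross-over counts.
# Assumption: cross-overs are spread uniformly over the genome.
GENOME_BP = 125_000_000   # ~125 Mb genome (from the slide)
N_SNPS = 1200             # planned SNP markers (from the slide)
CO_PER_MEIOSIS = 8.9      # sex-averaged cross-overs per meiosis (from the slide)


def interval_size_kb(genome_bp=GENOME_BP, n_markers=N_SNPS):
    """Mean spacing between adjacent markers, in kb."""
    return genome_bp / n_markers / 1000


def expected_co_per_interval(n_plants, meioses_per_plant,
                             co_per_meiosis=CO_PER_MEIOSIS,
                             n_intervals=N_SNPS):
    """Expected number of cross-overs per marker interval over the population."""
    total_co = n_plants * meioses_per_plant * co_per_meiosis
    return total_co / n_intervals


if __name__ == "__main__":
    print(f"mean marker interval: {interval_size_kb():.0f} kb")
    # Assumption: two informative meioses per F2 plant, one per backcross plant.
    print(f"F2, 500 plants: {expected_co_per_interval(500, 2):.1f} COs per interval")
    print(f"backcross, 500 plants: {expected_co_per_interval(500, 1):.1f} COs per interval")
```

The number of intervals is approximated by the number of markers; with five chromosomes the exact count is slightly lower, which does not change the order of magnitude.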
Looking at the same SNPs in all plants:
RAD mapping (Baird et al., PLoS ONE 2008)
Krys Kelly
Distribution of SNPs linked to the
restriction enzyme
Krys Kelly
RAD Mapping
DNA digestion
Sonication
Size selection
Library amplification
RAD library profile (gel): 1 – sample, 2 – DNA ladder
Adapters and Bar Coding
[Diagram: genomic DNA fragment carrying SNPs, flanked by the P1 adapter and the P2 adapter]
Illumina sequence read length – 35-100 nt
Based on 31x depth with Illumina, we need 0.5 million reads per plant
P1 adapter:
Contains a barcode: a four-base code followed by a fifth ‘checksum’ base;
Barcode sequences are chosen to be as divergent from each other as possible
Barcodes used: A – cgtga, B – gtcga, C – agcgc, D – tatga
P1 and P2 adapters:
5’ phosphorylated
3’ base is linked through a phosphorothioate bond
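To illustrate how barcoded reads might be sorted back into samples, here is a minimal sketch using the four barcodes listed above. It assumes exact matching on the first five bases of each read; the rule behind the fifth ‘checksum’ base is not given on the slide, so it is not checked here. The function names are hypothetical, and which sample each of the labels A to D refers to is not stated.

```python
from collections import defaultdict
from itertools import combinations

# Barcodes as listed on the slide; the sample behind each label is not stated.
BARCODES = {"A": "CGTGA", "B": "GTCGA", "C": "AGCGC", "D": "TATGA"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def min_barcode_distance(barcodes=BARCODES):
    """Smallest pairwise Hamming distance within the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes.values(), 2))


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by exact barcode prefix and trim the barcode off.

    Reads whose prefix matches no barcode are kept untrimmed under
    the key "unassigned".
    """
    by_prefix = {seq.upper(): label for label, seq in barcodes.items()}
    length = len(next(iter(by_prefix)))  # all barcodes share one length
    bins = defaultdict(list)
    for read in reads:
        read = read.upper()
        label = by_prefix.get(read[:length])
        if label is None:
            bins["unassigned"].append(read)
        else:
            bins[label].append(read[length:])
    return dict(bins)
```

Exact matching discards any read with a sequencing error in the barcode. A more tolerant scheme would assign a read to the nearest barcode, which is only safe when the minimum pairwise distance of the set is large enough.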
Work in progress (pilot study)
Expected:
Col-0: Col-0, homozygous
Ler: Ler, homozygous
F1 (Col x Ler): Col, Ler heterozygous
F2: Col, Ler, recombination point(s)
Total number of reads: 8,491,734
After de-multiplexing: 7,936,046
Percentage recovered: 93%
                                         Columbia          Landsberg         F1                F2
                                         Number      %     Number      %     Number      %     Number      %
Total reads for each sample              2,496,244         2,091,127         2,199,201         1,149,474
Total genome matching reads (redundant)  1,556,149   62%   1,238,547   59%   1,382,835   63%     586,472   51%
Uniquely matching                        1,080,455   69%     892,745   72%   1,038,490   75%     316,231   54%
Multiply matching reads                    516,320           371,690           344,346           270,242
Non-redundant reads                        124,127           115,317           124,024            71,981
Non-redundant genome matching reads         77,265            65,455            76,502            41,513
Percentages: genome matching reads as a share of total reads; uniquely matching reads as a share of genome matching reads.
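The read counts in the pilot table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the recovery rate and the genome-matching percentages; the pairing of each total with a sample is an assumption inferred from the percentages shown on the slide, and the helper names are hypothetical.

```python
TOTAL_READS = 8_491_734     # total number of reads (from the slide)
DEMULTIPLEXED = 7_936_046   # reads left after de-multiplexing (from the slide)

# (total reads, genome-matching reads) per sample.
# Assumption: the pairing is inferred from the percentages on the slide.
SAMPLES = {
    "Columbia": (2_496_244, 1_556_149),
    "Landsberg": (2_091_127, 1_238_547),
    "F1": (2_199_201, 1_382_835),
    "F2": (1_149_474, 586_472),
}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


def matching_percentages(samples=SAMPLES):
    """Genome-matching reads as a percentage of total reads, per sample."""
    return {name: percent(matched, total)
            for name, (total, matched) in samples.items()}


def sample_total(samples=SAMPLES):
    """Sum of the per-sample read totals."""
    return sum(total for total, _ in samples.values())
```

With this pairing, the four per-sample totals add up exactly to the de-multiplexed read count, which supports the assignment of totals to samples.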
Workflow – from DNA to a Cross-Over Frequency Map
Population of recombinants
RAD library
Illumina sequencing data
Aligning sequences to the reference genome
Genotype calling
Cross-over frequency map
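The last two steps of the workflow, from genotype calls to a cross-over frequency map, could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used in the project: it assumes a backcross to Columbia-0, so each SNP in a plant is called either "C" (Col homozygous) or "H" (Col/Ler heterozygous), with None for a missing call, and it places a cross-over in any interval where consecutive calls differ. The function names are hypothetical.

```python
from collections import Counter


def find_crossovers(positions, genotypes):
    """Return (left_bp, right_bp) marker intervals where the genotype switches.

    positions: sorted SNP coordinates on one chromosome
    genotypes: call at each SNP, "C", "H", or None for no call
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    crossovers = []
    last_pos, last_call = None, None
    for pos, call in zip(positions, genotypes):
        if call is None:
            continue  # skip missing calls; the interval widens accordingly
        if last_call is not None and call != last_call:
            crossovers.append((last_pos, pos))
        last_pos, last_call = pos, call
    return crossovers


def crossover_counts(plants):
    """Count, over all plants, how often each interval contains a cross-over.

    plants: iterable of (positions, genotypes) pairs, one per plant
    """
    counts = Counter()
    for positions, genotypes in plants:
        counts.update(find_crossovers(positions, genotypes))
    return counts
```

A single miscalled SNP would show up here as two adjacent cross-overs, so a real pipeline would probably require several consecutive concordant calls before accepting a switch. Two cross-overs between the same pair of adjacent markers remain invisible to this approach.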
Future Plans
DNA methylation profile
Expression profile in met1
Cross-over frequencies
Drouaud et al., Genome Research, 2006; Zhang et al., Cell 2006; Zilberman et al., Nature Genetics 2007;
Cokus et al., Nature 2008
Acknowledgements
Ian Henderson
Liz Alvey
David Baulcombe
Krys Kelly
Kim Rutherford
Paul Etter
Eric Johnson
CRUK Cancer Research Institute
James Hadfield
Nik Matthews
Kevin Howe
Rory Stark