Use of RAD Sequencing to Create a Meiotic Recombination Map in Arabidopsis
Natasha Elina
Plant Sciences, University of Cambridge
Edinburgh, October 21st 2009

Meiotic Recombination
  Figure labels: homologous chromosomes; chiasma; recombinant chromatids
  Recombination is not evenly distributed along the chromosome: there are 'hot' and 'cold' spots (Drouaud et al., Genome Research, 2006)
  What is the distribution of cross-over frequencies (a fine map)?
  What genomic features do cross-over distributions correlate with?
  Why do we want to know that? Plant breeding, population genetics

Model Plant – Arabidopsis thaliana
  Genome completely sequenced
  Sequence information available for different ecotypes
  Epigenomic information (siRNA database, DNA methylation, etc.)
  Mutants that change this information
  More cross-overs in male than in female meiosis; the sex average is 8.9

High Resolution Mapping of Meiotic Recombination Points
  Columbia-0 X Landsberg erecta, giving the F1
  Back-crossing the F1 to one of the parents (Columbia-0)
  RAD sequencing this population and making a recombination map

Two Arabidopsis Ecotypes: Columbia-0 (Col-0) and Landsberg erecta (Ler)
  1% sequence divergence
  ~120,000 total polymorphisms

Statistical Rationale for the Project
  Arabidopsis genome: 5 chromosomes
  Whole genome ~125 Mb
  Average of 8.9 cross-overs per meiosis
  ~240 SNPs per chromosome; 1200 SNPs total
  200 kb ~ 1 cM
  1200 SNP markers in 500 F2 plants will give a 95% likelihood of observing an average of 10 CO events per interval
  Looking at the same SNPs in all plants: RAD mapping (Baird et al., PLoS ONE 2008)
  Krys Kelly

Distribution of SNPs linked to the restriction enzyme
  Krys Kelly

RAD Mapping
  DNA digestion
  Sonication
  Size selection
  Library amplification
  RAD library profile (gel lanes): 1 – sample; 2 – DNA ladder

Adapters and Bar Coding
  Diagram labels: P1 adapter; genomic DNA carrying SNPs; P2 adapter
  Illumina sequence read length: 35-100 nt
  Based on 31x depth with Illumina, we need 0.5 million reads per plant
  P1 adapter: contains a barcode, a four-base code followed by a fifth 'checksum' base;
  Barcode sequences: a combination of the most divergent ones
  Barcodes used: A – cgtga, B – gtcga, C – agcgc, D – tatga
  P1 and P2 adapters: 5' phosphorylated; the 3' base is linked through a phosphorothioate bond

Work in progress (pilot study)
  Expected genotypes:
    Col-0           Col-0, homozygous
    Ler             Ler, homozygous
    F1 (Col x Ler)  Col and Ler, heterozygous
    F2              Col, Ler, recombination point(s)

  Total number of reads:  8,491,734
  After de-multiplexing:  7,936,046
  Percentage recovered:   93%

                                           Columbia         Landsberg        F1               F2
                                              Number    %      Number    %      Number    %      Number    %
  Total reads for each sample              2,496,244        2,091,127        2,199,201        1,149,474
  Total genome matching reads (redundant)  1,556,149  62%   1,238,547  59%   1,382,835  63%     586,472  51%
  Uniquely matching reads                  1,080,455  69%     892,745  72%   1,038,490  75%     316,231  54%
  Multiply matching reads                    516,320          371,690          344,346          270,242
  Non-redundant reads                        124,127          115,317          124,024           71,981
  Non-redundant genome matching reads         77,265           65,455           76,502           41,513

Workflow – from DNA to a Cross-Over Frequency Map
  Population of recombinants
  RAD library
  Illumina sequencing data
  Aligning sequences to the reference genome
  Genotype calling
  Cross-over frequency map

Future Plans
  DNA methylation profile
  Expression profile in met1
  Cross-over frequencies
  Drouaud et al., Genome Research, 2006; Zhang et al., Cell 2006; Zilberman et al., Nature Genetics 2007; Cokus et al., Nature 2008

Acknowledgements
  Ian Henderson, Liz Alvey, David Baulcombe, Krys Kelly, Kim Rutheford, Paul Etter, Eric Johnson
  CRUK Cancer Research Institute: James Hadfield, Nik Mattews, Kevin Howe, Rory Stark
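
Note on the barcodes
  The adapter slide describes the four barcodes as a combination of the most divergent sequences. One way to make that concrete is to count the positions at which each pair differs (the Hamming distance). The sketch below does this for the barcodes A-D as listed on the slide. The function names are illustrative only, and the rule used to derive the fifth 'checksum' base is not given in the talk, so it is not modelled here.

```python
from itertools import combinations

# Barcodes A-D exactly as listed on the adapter slide (five bases each).
BARCODES = {"A": "cgtga", "B": "gtcga", "C": "agcgc", "D": "tatga"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(barcodes: dict) -> dict:
    """Hamming distance for every unordered pair of barcode labels."""
    return {
        (m, n): hamming(barcodes[m], barcodes[n])
        for m, n in combinations(sorted(barcodes), 2)
    }


distances = pairwise_distances(BARCODES)
min_distance = min(distances.values())

if __name__ == "__main__":
    for (m, n), d in distances.items():
        print(f"{m} vs {n}: {d} mismatches")
    print("minimum pairwise distance:", min_distance)
```

  Every pair differs at two or more positions, so a single sequencing error in a barcode cannot turn it into one of the other three.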
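
Note on the pilot-study read counts
  The percentages in the pilot-study figures can be re-derived from the raw counts, which is a quick consistency check on the numbers. The sketch below assumes that the genome-matching percentage is taken relative to each sample's total reads, and the uniquely-matching percentage relative to that sample's genome-matching reads. This reading is an assumption, although it reproduces the reported values. Variable and function names are illustrative.

```python
# Read counts copied from the pilot-study slide.
RAW_READS = 8_491_734
DEMULTIPLEXED_READS = 7_936_046

SAMPLES = {
    "Col-0": {"total": 2_496_244, "matching": 1_556_149, "unique": 1_080_455},
    "Ler":   {"total": 2_091_127, "matching": 1_238_547, "unique": 892_745},
    "F1":    {"total": 2_199_201, "matching": 1_382_835, "unique": 1_038_490},
    "F2":    {"total": 1_149_474, "matching": 586_472, "unique": 316_231},
}


def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


def summarize(samples: dict) -> dict:
    """Per sample: (% genome-matching of total, % unique of genome-matching)."""
    return {
        name: (
            percent(c["matching"], c["total"]),
            percent(c["unique"], c["matching"]),
        )
        for name, c in samples.items()
    }


per_sample_sum = sum(c["total"] for c in SAMPLES.values())
recovered = percent(DEMULTIPLEXED_READS, RAW_READS)
summary = summarize(SAMPLES)

if __name__ == "__main__":
    print("sum of per-sample reads:", per_sample_sum)
    print("recovered after de-multiplexing (%):", recovered)
    for name, (matching_pct, unique_pct) in summary.items():
        print(f"{name}: {matching_pct}% genome-matching, {unique_pct}% unique")
```

  The four per-sample totals add up to the de-multiplexed read count, which supports reading them as the Columbia, Landsberg, F1 and F2 libraries from a single barcoded run.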