Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
Jin Billy Li
George Church Lab, Harvard Medical School
jli@genetics.med.harvard.edu

Genetic Loci X Sample Size = Information
[Figure: methods placed by # samples vs. # genetic loci: PCR seq, Mass-spec, SNP array, Shotgun seq, RNA-seq, ChIP-seq]

Target Capturing with Padlock Probes (aka MIPs)
[Figure: padlock probes for feature 1 … feature n; steps labeled pol, lig, then PCR (or RCA)]
Porreca et al., Nat Methods 2007

Mass Production of Padlock Oligos
• 55k features of up to 200nt
[Figure: size labels at 150 nt, 100 nt, 50 nt]

~10,000-fold Improvement Since Nov 2007
1. longer hybridization time; 2. more probes; 3. right [dNTP]
[Figure: three panels (variable hyb time, variable probe:gDNA, variable dNTP amount); axes: Capturing efficiency (%) and Fold improvement (1x to 10,000x). Hyb times shown: 15 mins, 1 hour, 1 day, 2 days, 5 days, 1 day + cycling. Probe:gDNA ratios shown: 10:1, 50:1, 100:1, 250:1. Fixed-condition labels: probe:gDNA = 10:1, 2 day hyb time, 1 day hyb time, 100x dNTP, probe:gDNA = 100:1]
20-fold improvement already by better probe design and synthesis
Li et al., in preparation

Improved Technology -> Better Performance
[Figure: Sensitivity + Uniformity and Correlation panels, Nov 2007 vs. Current]
Current: 95% captured; 85% within 100-fold range; 55% within 10-fold range
Li et al., in preparation

Summary of Improvements
                                     Nov 2007   Current
Specificity                          ~100%      ~100%
Sensitivity/Multiplexity (of 55k)    18%        95%
Uniformity (in 100-fold range)       16%        85%
Correlation of replicates (r)        0.35       0.98
Accuracy (heterozygous calls)        31%        99%

Targeted Capturing of
• Genomes
  – Exome: PGP etc.
  – Contiguous regions or gene panels
  – SNPs
  – Hypermutable CpG dinucleotides
• Transcriptomes
  – Allelotyping
  – RNA editing sites
• Methylomes
  – CpG methylation

A -> I (G) RNA Editing
• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions

Predicting Putative Editing Sites
• A in the genome
• G in some mRNAs or ESTs

Discovery of 100’s of Novel Editing Sites
• 36,000 predicted editing sites
• gDNA + 7 tissue cDNAs from an individual
• Padlock + Solexa: 239 sites found to be edited
• Validation (PCR + Sanger): 18 of 20 random sites are obviously edited
with Erez Levanon, in preparation

Example: VEZF1
[Figure: sequence traces for Genomic DNA and for RNA from cerebellum, corpus callosum, frontal lobe, diencephalon, intestine, kidney, adrenal]

Bisulfite Padlock Probes (BSP): CpG Methylation
• Bisulfite-treated genome: “3-base” genome
• High specificity of padlock

Methylation Level Accurately Measured
[Figure: BSP-Sanger correlation, r = 0.979; axes: Methylation level estimated by Sanger sequencing vs. Methylation level measured by BSP sequencing]
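The correlation figures on this slide compare per-site methylation levels. As a minimal sketch of how such a comparison could be computed: the slides do not spell out the formula, so this assumes the standard bisulfite readout, in which unmethylated C is read as T and the methylation level at a CpG is C / (C + T). The function names and the read counts in the usage lines are hypothetical, not data from the talk.

```python
from math import sqrt


def methylation_level(c_reads, t_reads):
    """Fraction of reads that still show C at a CpG after bisulfite treatment.

    Assumption: unmethylated C is converted and read as T, so C / (C + T)
    estimates the methylation level. Returns None when the site has no reads.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical (C reads, T reads) per CpG site in two replicates.
replicate_1 = [(18, 2), (5, 15), (0, 20), (10, 10)]
replicate_2 = [(17, 3), (6, 14), (1, 19), (11, 9)]

levels_1 = [methylation_level(c, t) for c, t in replicate_1]
levels_2 = [methylation_level(c, t) for c, t in replicate_2]
print(pearson_r(levels_1, levels_2))
```

Sites with no coverage return None and would need to be dropped from both replicates before calling pearson_r.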
[Figure: BSP-BSP correlation, r = 0.966; axes: Methylation level, replicate 1 vs. Methylation level, replicate 2, both measured by BSP sequencing]

Methylation Pattern around Genes
[Figure: Gene-Body Methylation]
with Madeleine Price Ball, in preparation (poster)

Acknowledgements
George Church
Padlock technology and sequencing: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford, Yuan Gao, Bin Xie, Bob Steen
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball
Church Lab
Agilent: Emily Leproust, Wilson Woo

Superior Quality of Padlock Oligos
• 55k features of up to 200nt
• PCR (2x), then Solexa sequencing
[Figure: size labels at 150 nt, 100 nt, 50 nt]
[Figure: Percentage of probes (%) vs. Number of reads (0 to 50); series: before amplification (data), after amplification (data), before amplification (Poisson), after amplification (Poisson)]

From Agilent Oligos to Padlock Probes
[Figure: amplification and selection workflow, labels in slide order: Agilent oligo, 136 bp, with 18bp flanks and a DpnII site; PCR; exonuclease; annealed with DpnII guide oligo; USER + DpnII; Padlock probe]

Heterozygous Genotypes Correctly Called
[Figure: before vs. after; genotype classes: Homozygous wild type, Heterozygous variation, Homozygous variation]

Methods in Comparison
                                       Padlock                   Array-based hyb
Upfront probe cost (10-20% of exome)   $12,000 per 55k 100mers   $600 per 385k 70mers
Probes amplifiable?                    Yes                       No
Reaction phase                         Solution, 10-20 μl        Surface, 200 μl
Enzymatic hyb?                         Yes                       No
gDNA required                          ~0.5-1 μg                 20 μg (WGA)
Efficiency (->accuracy)                1%                        N/A (<0.1%?)
Uniformity                             100-fold range            10-fold range
Specificity                            ~100% on target           30-80% on or near target

Differential Clamping at Ligation Junction
[Figure: Average coverage by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm; bar labels range from 38 to 293]

% GC VS Capturing Efficiency
[Figure: Average coverage vs. % GC in 5-point bins from (10,15] to (85,90]; series: gap + arms, gap, extension arm, ligation arm]

99% Concordance Between Padlock and HapMap

The Editing “Calls” Are Well Correlated
[Figure: G/(A+G), frontal lobe replicate 1 vs. replicate 2, r = 0.964; log axes from 0.01 to 1]

Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
• 10k CpG sites tiling the ENCODE regions
  – 1 CpG site every 3kb region on average
• High specificity
  – 79 of 80 Sanger reads match correct locations

[Figure: capture workflow with strands labeled B and P, labels in slide order: collected in a tube; shearing, end polishing; PCR; adapter ligation; λ exonuclease; hybridization in closed-tube solution; strep; denaturing, PCR]
Li et al., unpublished

Methods in Comparison
                                       Padlock                   Array-based hyb            Biotin-coupled hyb
Upfront probe cost (10-20% of exome)   $12,000 per 55k 100mers   $600 per 385k 70mers       $500 per 244k 60mers
Probes amplifiable?                    Yes                       No                         Yes
Reaction phase                         Solution, 10-20 μl        Surface, 200 μl            Solution, 10-20 μl
Enzymes in hyb?                        Yes                       No                         No
gDNA required                          ~0.5-1 μg                 20 μg (WGA)                ~0.5-1 μg
Efficiency (->accuracy)                1%                        N/A (<0.1%?)               ~10%?
Uniformity                             100-fold range            10-fold range              10-fold range?
Specificity                            ~100% on target           30-80% on or near target   ~55% on or near target

Two Tech Replicates Are Well Correlated
[Figure: Correlation of counts; axes: Counts, replicate 1 vs. Counts, replicate 2]
[Figure: Uniformity; axes: Number of reads per site vs. Ranked target sites]
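
The sensitivity and uniformity figures quoted in the talk ("95% captured", "85% within 100-fold range") are summaries of per-site read counts. The slides do not give the exact definitions, so the sketch below rests on two assumptions: a site counts as captured if it has at least one read, and uniformity is the largest fraction of all target sites whose read counts fit inside a single window [low, low * fold]. The function names and the example counts are hypothetical.

```python
from bisect import bisect_right


def sensitivity(counts):
    """Fraction of target sites with at least one read (assumed definition)."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for c in counts if c > 0) / len(counts)


def fraction_within_fold_range(counts, fold):
    """Largest fraction of all sites whose counts fit in one [low, low * fold] window.

    Assumption: this is one plausible reading of "within N-fold range";
    sites with zero reads count against the fraction.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if fold < 1:
        raise ValueError("fold must be >= 1")
    captured = sorted(c for c in counts if c > 0)
    best = 0
    for i, low in enumerate(captured):
        # Index just past the last count that is <= low * fold.
        j = bisect_right(captured, low * fold)
        best = max(best, j - i)
    return best / len(counts)


# Hypothetical read counts per target site.
counts = [0, 1, 5, 10, 50, 1000]
print(sensitivity(counts))
print(fraction_within_fold_range(counts, 10))
print(fraction_within_fold_range(counts, 100))
```

Anchoring the window at each observed count, rather than at the median, avoids choosing a reference value; a median-centered window would be an equally defensible reading and would give different numbers.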