Polony Sequencing

Targeted Sequencing of Human
Genomes, Transcriptomes, and Methylomes
Jin Billy Li
George Church Lab
Harvard Medical School
jli@genetics.med.harvard.edu
Genetic Loci X Sample Size = Information
[Figure: methods positioned by # samples and # genetic loci: PCR seq, Mass-spec, SNP array, Shotgun seq, RNA-seq, ChIP-seq]
Target Capturing with Padlock Probes (aka MIPs)
[Diagram labels: pol, lig; feature 1 … feature n; PCR (or RCA)]
Porreca et al., Nat Methods 2007
Mass Production of Padlock Oligos
55k features of up to 200 nt
[Figure; size markers: 150 nt, 100 nt, 50 nt]
~10,000-fold Improvement Since Nov 2007
1. longer hybridization time; 2. more probes; 3. right [dNTP]
[Figure: three panels showing capturing efficiency (%) and fold improvement (1x to 10,000x)]
Variable hyb time: 15 mins, 1 hour, 1 day, 2 days, 5 days, 1 day + cycling
Variable probe:gDNA: 10:1, 50:1, 100:1, 250:1
Variable dNTP amount
Fixed conditions noted on the panels: probe:gDNA = 10:1; 2 day hyb time; 1 day hyb time; 100x dNTP (twice); probe:gDNA = 100:1
20-fold improvement already by better probe design and synthesis
Li et al., in preparation
Improved Technology -> Better Performance
[Figure: Sensitivity + Uniformity and Correlation panels, each shown for Nov 2007 and Current]
Current: 95% captured; 85% within 100-fold range; 55% within 10-fold range
Li et al., in preparation
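The sensitivity and uniformity figures quoted on this slide can be computed from per-target read counts. The sketch below is one possible reading, not the authors' definition: it assumes "captured" means at least one read, and "within N-fold range" means the largest share of all targets whose counts fit in a single window with max/min <= N. The function names and the example counts are hypothetical.

```python
def fraction_captured(counts):
    """Share of targets with at least one read (assumed meaning of 'captured')."""
    return sum(1 for c in counts if c > 0) / len(counts)


def fraction_within_fold(counts, fold):
    """Largest share of ALL targets whose nonzero counts fit in one
    window where max/min <= fold (assumed meaning of 'N-fold range')."""
    positive = sorted(c for c in counts if c > 0)
    best = 0
    lo = 0
    for hi in range(len(positive)):
        # Shrink the window from the left until it spans at most `fold`.
        while positive[hi] > fold * positive[lo]:
            lo += 1
        best = max(best, hi - lo + 1)
    return best / len(counts)


# Hypothetical read counts for eight target sites.
counts = [0, 1, 5, 20, 40, 90, 400, 2500]
print(fraction_captured(counts))          # 0.875
print(fraction_within_fold(counts, 100))  # 0.625
print(fraction_within_fold(counts, 10))   # 0.375
```

Dividing by the total number of targets, rather than by captured targets only, keeps the uniformity figure comparable with the sensitivity figure; that choice is also an assumption.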
Summary of Improvements
                                    Nov 2007    Current
Specificity                         ~100%       ~100%
Sensitivity/Multiplexity (of 55k)   18%         95%
Uniformity (in 100-fold range)      16%         85%
Correlation of replicates (r)       0.35        0.98
Accuracy (heterozygous calls)       31%         99%
Targeted Capturing of
• Genomes
– Exome: PGP etc.
– Contiguous regions or gene panels
– SNPs
– Hypermutable CpG dinucleotides
• Transcriptomes
– Allelotyping
– RNA editing sites
• Methylomes
– CpG methylation
A -> I (G) RNA Editing
• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions
Predicting Putative Editing Sites
A in the genome
G in some mRNAs or ESTs
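The prediction rule on this slide (A in the genome, G in some mRNAs or ESTs) can be sketched as a simple scan. This is a minimal illustration, not the pipeline used for the 36,000 predicted sites: it assumes transcripts are already aligned to the genomic sequence as equal-length strings with "-" for uncovered positions, and the function name is hypothetical.

```python
def candidate_editing_sites(genome, transcripts):
    """Return 0-based positions where the genome has A and at least one
    aligned transcript (mRNA or EST) shows G.

    Assumes every transcript string has the same length as `genome`,
    with '-' marking positions the transcript does not cover.
    """
    sites = []
    for pos, ref_base in enumerate(genome):
        if ref_base != "A":
            continue
        if any(t[pos] == "G" for t in transcripts):
            sites.append(pos)
    return sites


# Hypothetical toy alignment.
genome = "ACAGTA"
transcripts = ["ACGGTA", "ACAGTA", "-CAGTG"]
print(candidate_editing_sites(genome, transcripts))  # [2, 5]
```

A real scan would also have to exclude known genomic SNPs and alignment errors, which this sketch does not attempt.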
Discovery of 100’s of Novel Editing Sites
36,000 predicted editing sites
gDNA + 7 tissue cDNAs from an individual
Padlock + Solexa:
239 sites found to be edited
Validation (PCR + Sanger):
18 of 20 random sites are obviously edited
with Erez Levanon, in preparation
Example:
VEZF1
Genomic DNA
RNA - cerebellum
RNA - corpus callosum
RNA - frontal lobe
RNA - diencephalon
RNA - intestine
RNA - kidney
RNA - adrenal
Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome (“3-base” genome)
High specificity of padlock
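The "3-base" genome can be illustrated in silico: bisulfite treatment converts unmethylated C to U, which is read as T after amplification, while methylated C is protected. The sketch below models only that conversion on one strand; the function names, the example sequence, and the C/(C+T) read-count estimate of methylation level are illustrative assumptions rather than the authors' analysis code.

```python
def bisulfite_convert(seq, methylated=()):
    """Convert every C to T except at the 0-based positions listed in
    `methylated` (single strand only, idealized full conversion)."""
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def methylation_level(c_reads, t_reads):
    """Estimate methylation at one CpG as C / (C + T) read counts.
    Returns None when the site has no reads."""
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


seq = "ACGTCCGA"
print(bisulfite_convert(seq))                  # ATGTTTGA (only A, G, T remain)
print(bisulfite_convert(seq, methylated={1}))  # ACGTTTGA (methylated C kept)
print(methylation_level(30, 10))               # 0.75
```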
Methylation Level Accurately Measured
[Figure: two scatter plots with axes from 0 to 1; reported correlations r = 0.979 and r = 0.966]
BSP-Sanger correlation: methylation level estimated by Sanger sequencing vs. methylation level measured by BSP sequencing
BSP-BSP correlation: methylation level, replicate 1 vs. methylation level, replicate 2
Methylation Pattern around Genes
Gene-Body Methylation
with Madeleine Price Ball, in preparation (poster)
Acknowledgements
George Church
Padlock technology
Sequencing
Kun Zhang
John Aach
Abraham Rosenbaum
Jay Shendure
Greg Porreca
Annika Ahlford
Yuan Gao
Bin Xie
Bob Steen
RNA editing
Erez Levanon
Jung-Ki Yoon
CpG methylation
Madeleine Price Ball
Church Lab
Agilent
Emily Leproust
Wilson Woo
Superior Quality of Padlock Oligos
55k features of up to 200 nt
PCR (2x), then Solexa sequencing
[Figure; size markers: 150 nt, 100 nt, 50 nt]
[Plot: percentage of probes (%) vs. number of reads (0 to 50); an overlapping axis label reads "Fraction of sites"]
Series: before amplification (data); after amplification (data); before amplification (Poisson); after amplification (Poisson)
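The plot compares the observed distribution of reads per probe with a Poisson expectation. A minimal sketch of that comparison follows, assuming the Poisson mean is set to the observed mean read count (the slide does not say how the curve was parameterized). The function names and the example counts are hypothetical.

```python
import math
from collections import Counter


def observed_percent(read_counts):
    """Percentage of probes observed at each read count."""
    n = len(read_counts)
    tally = Counter(read_counts)
    return {k: 100.0 * v / n for k, v in sorted(tally.items())}


def poisson_percent(mean, k):
    """Percentage of probes expected to have exactly k reads if reads
    per probe were Poisson-distributed with the given mean."""
    return 100.0 * math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical reads per probe.
reads = [8, 10, 10, 12]
mean = sum(reads) / len(reads)
print(observed_percent(reads))
for k in (8, 10, 12):
    print(k, round(poisson_percent(mean, k), 2))
```

A probe pool whose observed curve stays close to the Poisson curve is about as uniform as random sampling allows, which is presumably the point of the "before" and "after amplification" comparison.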
From Agilent Oligos to Padlock Probes
amplification and selection
[Diagram, steps in order: Agilent oligo, 136 bp (two 18 bp end segments; DpnII site) -> PCR -> λ exonuclease -> annealed with DpnII guide oligo -> USER + DpnII -> padlock probe]
[Other diagram labels: T, U, A, p, *, NN]
Heterozygous Genotypes Correctly Called
before
after
Homozygous wild type
Heterozygous variation
Homozygous variation
Methods in Comparison
                                      Padlock                  Array-based hyb
Upfront probe cost (10-20% of exome)  $12,000 per 55k 100mers  $600 per 385k 70mers
Probes amplifiable?                   Yes                      No
Reaction phase                        Solution, 10-20 μl       Surface, 200 μl
Enzymatic hyb?                        Yes                      No
gDNA required                         ~0.5-1 μg                20 μg (WGA)
Efficiency (->accuracy)               1%                       N/A (<0.1%?)
Uniformity                            100-fold range           10-fold range
Specificity                           ~100% on target          30-80% on or near target
Differential Clamping at Ligation Junction
[Bar chart: average coverage (axis 0 to 300) by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm]
[Bar labels as extracted, not assignable to individual bars: 293, 181, 165, 155, 160, 166, 153, 146, 125, 162, 166, 156, 142, 159, 139, 38]
% GC VS Capturing Efficiency
[Plot: average coverage (axis 0 to 200) vs. % GC in 5% bins from (10,15] to (85,90]; series: gap + arms, gap, extension arm, ligation arm]
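The % GC plot groups probes into 5% GC bins such as (10,15] and reports average coverage per bin. A minimal sketch of that binning is shown below; which part of the probe is scored (gap, extension arm, ligation arm, or gap + arms) is left to the caller, and the function names and example sequences are hypothetical.

```python
import math
from collections import defaultdict


def gc_percent(seq):
    """GC content of a sequence, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(pct, width=5):
    """Label of the half-open bin (lo,hi] that contains pct."""
    hi = math.ceil(pct / width) * width
    return f"({hi - width},{hi}]"


def mean_coverage_by_bin(probes):
    """probes: iterable of (sequence, coverage) pairs.
    Returns {bin label: mean coverage}."""
    grouped = defaultdict(list)
    for seq, coverage in probes:
        grouped[gc_bin(gc_percent(seq))].append(coverage)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


# Hypothetical probe segments with their coverage.
probes = [("GGCCAATT", 120), ("GCGCATAT", 100), ("ATATATGC", 40)]
print(mean_coverage_by_bin(probes))
```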
99% Concordance Between Padlock and HapMap
The Editing “Calls” Are Well Correlated
[Scatter plot: G/(A+G), frontal lobe replicate 1 vs. G/(A+G), frontal lobe replicate 2; axis ticks at 0.01, 0.1, 1; r = 0.964]
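The quantity on both axes, G/(A+G), is the fraction of reads at a site that carry G instead of the genomic A. The sketch below computes it per tissue from read counts; the function name and the counts are hypothetical and only illustrate the ratio.

```python
def editing_level(a_reads, g_reads):
    """Editing level at one site: G / (A + G) read counts.
    Returns None when the site has no A or G reads."""
    total = a_reads + g_reads
    if total == 0:
        return None
    return g_reads / total


# Hypothetical (A reads, G reads) at one site in several samples.
site_counts = {
    "Genomic DNA": (200, 0),
    "RNA - frontal lobe": (75, 25),
    "RNA - kidney": (95, 5),
}
for sample, (a, g) in site_counts.items():
    print(sample, editing_level(a, g))
```

Genomic DNA from the same individual serves as the control: a true editing site should show a level near 0 in gDNA and a nonzero level in cDNA.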
Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
• 10k CpG sites tiling the ENCODE regions
– 1 CpG site every 3kb region on average
• High specificity
– 79 of 80 Sanger reads match correct locations
[Workflow diagram; labels in extraction order: collected in a tube; shearing, end polishing; PCR; adapter ligation; λ exonuclease; hybridization in closed-tube solution; strep; denaturing, PCR. Molecules are marked B and P.]
Li et al., unpublished
Methods in Comparison
                                      Padlock                  Array-based hyb           Biotin-coupled hyb
Upfront probe cost (10-20% of exome)  $12,000 per 55k 100mers  $600 per 385k 70mers      $500 per 244k 60mers
Probes amplifiable?                   Yes                      No                        Yes
Reaction phase                        Solution, 10-20 μl       Surface, 200 μl           Solution, 10-20 μl
Enzymes in hyb?                       Yes                      No                        No
gDNA required                         ~0.5-1 μg                20 μg (WGA)               ~0.5-1 μg
Efficiency (->accuracy)               1%                       N/A (<0.1%?)              ~10%?
Uniformity                            100-fold range           10-fold range             10-fold range?
Specificity                           ~100% on target          30-80% on or near target  ~55% on or near target
Two Tech Replicates Are Well Correlated
[Figure, two panels]
Correlation of counts: counts, replicate 1 vs. counts, replicate 2
Uniformity: number of reads per site vs. ranked target sites
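Replicate agreement in this deck is summarized by a correlation coefficient r. The sketch below computes a Pearson r from two lists of per-site counts, assuming r here means the Pearson coefficient on raw counts (the slides do not state whether counts were log-transformed first). The function name and example counts are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per target site in two technical replicates.
replicate_1 = [12, 40, 7, 150, 63]
replicate_2 = [15, 38, 9, 140, 70]
print(round(pearson_r(replicate_1, replicate_2), 3))
```

Because a few high-count sites can dominate r on raw counts, comparing the value on log-transformed counts as well is a common sanity check.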