Presentation by Prof. J. Vandesompele

Bioinformatics and PCR
Jo Vandesompele
professor, Ghent University
co-founder and CEO, Biogazelle
Bioinformatics for the Medical Molecular Laboratory
HOWEST Brugge, 18 October 2011
outline
• introduction to (q)PCR
• primer design and assay validation
• reference gene selection and validation
• cross-laboratory data comparison using external standards
• data analysis
• RDML data exchange and reporting
• statistics
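polymerase chain reaction (PCR)
• 30-45 cycles of a 3-step process
  o denaturation (95 °C, 15 s)
  o annealing (50-70 °C, 15 s): a given 20 bp primer sequence is expected by chance only about once every 4^20 ≈ 10^12 bases (~1/1 000 000 000 000)
  o extension (72 °C, 1 min)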
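Where the ~1/1 000 000 000 000 figure for a 20 bp primer comes from can be checked in a few lines of Python. This is a minimal sketch assuming four equally likely, independent bases, which real genomes violate (repeats, GC bias); the function names are hypothetical and the 3 × 10^9 bp target size is an illustrative assumption.

```python
def random_match_probability(primer_length: int) -> float:
    """Chance that a given primer sequence occurs at one position of a random
    sequence, assuming four equally likely, independent bases."""
    return 0.25 ** primer_length


def expected_chance_sites(primer_length: int, sequence_length: int) -> float:
    """Expected number of exact chance matches on one strand of a random
    sequence of sequence_length bases (idealized model: real genomes contain
    repeats, so this underestimates off-target risk)."""
    return random_match_probability(primer_length) * sequence_length


p20 = random_match_probability(20)
print(f"20 bp primer: 1 in {1 / p20:.3g} positions")  # 1 in 1.1e+12 positions
# hypothetical 3 x 10^9 bp target, roughly the size of a haploid human genome
print(f"expected chance sites: {expected_chance_sites(20, 3_000_000_000):.4f}")
```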
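polymerase chain reaction (PCR)
• exponential amplification: at 100% efficiency the amount of product doubles every cycle
  1 → 2 (2^1) → 4 (2^2) → 8 (2^3) → 16 (2^4) → …
• 35 cycles: 2^35 ≈ 34 × 10^9 (3.4 × 10^10) copies from a single template molecule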
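The doubling series can be written as N = N0 × (1 + efficiency)^n. A minimal Python sketch, assuming a constant efficiency and no plateau phase (both idealizations); `copies_after_cycles` is a hypothetical helper name. At 100% efficiency, 2^35 evaluates to about 3.4 × 10^10.

```python
def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Amplicon copies after `cycles` cycles at a constant efficiency
    (1.0 = 100 %, i.e. perfect doubling); ignores the plateau phase."""
    return initial_copies * (1 + efficiency) ** cycles


for n in range(5):
    print(n, copies_after_cycles(1, n))          # 1.0, 2.0, 4.0, 8.0, 16.0 copies
print(f"{copies_after_cycles(1, 35):.3g}")       # 3.44e+10
print(f"{copies_after_cycles(1, 35, 0.9):.3g}")  # fewer copies at 90 % efficiency
```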
polymerase chain reaction (PCR)
• endpoint detection (agarose gel electrophoresis)
  o not really quantitative
  o risk of carry-over contamination
  o laborious
real-time PCR principle
• continuous monitoring of PCR product accumulation, i.e. the amount of fluorescence is measured every cycle
• the cycle at which fluorescence rises above background is related to the initial amount of template, i.e. the sooner fluorescence becomes visible, the more template was present, and vice versa
amplification curve
• exponential function (y = a^x + b)
• sigmoid amplification curve
quantification cycle value: threshold method
[figure: amplification curve and threshold line; Cq is the cycle at which the curve crosses the threshold]
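The threshold method can be sketched on a simulated curve. This assumes a logistic (sigmoid) curve with hypothetical parameters, no baseline subtraction, and linear interpolation between the two cycles that flank the threshold; instrument software may compute Cq differently.

```python
import math


def simulated_curve(cycles=40, f_max=100.0, c_half=25.0, steepness=1.5, baseline=1.0):
    """Hypothetical sigmoid (logistic) amplification curve, one fluorescence
    value per cycle 1..cycles."""
    return [baseline + f_max / (1 + math.exp(-(c - c_half) / steepness))
            for c in range(1, cycles + 1)]


def cq_threshold(fluorescence, threshold):
    """Fractional cycle at which fluorescence first crosses `threshold`,
    interpolated linearly between the two flanking cycles (cycle 1 = index 0).
    Returns None if the threshold is never reached."""
    for i in range(1, len(fluorescence)):
        below, above = fluorescence[i - 1], fluorescence[i]
        if below < threshold <= above:
            # index i - 1 is cycle i, index i is cycle i + 1
            return i + (threshold - below) / (above - below)
    return None


early = simulated_curve(c_half=25.0)
late = simulated_curve(c_half=28.0)  # same curve shifted by 3 cycles (~8x less template at 100 %)
print(cq_threshold(early, 10.0), cq_threshold(late, 10.0))
```

Because the late curve is the early one shifted by 3 cycles, the two Cq values differ by exactly 3. The threshold should sit in the exponential phase, where the curves run parallel.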
quantification cycle value: 2nd derivative maximum method
[figure: Cq is the cycle at which the second derivative of the amplification curve is maximal]
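The second derivative maximum method needs no user-chosen threshold. A minimal sketch, assuming the same kind of hypothetical logistic curve, approximates the second derivative by central second differences and refines the peak with a three-point parabola; this is one possible implementation, not necessarily the one used by any instrument software.

```python
import math


def simulated_curve(cycles=40, f_max=100.0, c_half=25.0, steepness=1.5, baseline=1.0):
    """Hypothetical sigmoid (logistic) amplification curve, cycles 1..cycles."""
    return [baseline + f_max / (1 + math.exp(-(c - c_half) / steepness))
            for c in range(1, cycles + 1)]


def cq_second_derivative_max(fluorescence):
    """Cycle at which the second derivative of the curve peaks, estimated from
    central second differences and refined with a parabola through the highest
    point and its two neighbours (cycle 1 = index 0)."""
    d2 = [fluorescence[i + 1] - 2 * fluorescence[i] + fluorescence[i - 1]
          for i in range(1, len(fluorescence) - 1)]
    k = max(range(len(d2)), key=d2.__getitem__)  # d2[k] belongs to cycle k + 2
    offset = 0.0
    if 0 < k < len(d2) - 1:
        left, mid, right = d2[k - 1], d2[k], d2[k + 1]
        denominator = left - 2 * mid + right
        if denominator != 0:
            offset = 0.5 * (left - right) / denominator
    return k + 2 + offset


print(round(cq_second_derivative_max(simulated_curve()), 2))
```

For this curve the estimate falls roughly two cycles before the midpoint (cycle 25), in the late exponential phase, which is where the curvature of a sigmoid is largest.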
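comparative Cq quantification method
• example: delta-Cq = 19 – 17 = 2, so the sample with Cq 17 contains 2^2 = 4 times more template than the sample with Cq 19 (assuming 100% efficiency)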
comparative Cq quantification method
delta-Cq   RQ
1          0.5   = 1/2 = 2^-1
2          0.25  = 1/4 = 2^-2
3          0.125 = 1/8 = 2^-3
n          2^-n (E^-n for amplification efficiency E)
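• Relative Quantity (RQ) = E^(delta-Cq), with delta-Cq = Cq(calibrator) – Cq(unknown sample)
• E = amplification efficiency (E = 2 at 100% efficiency)
• normalization to correct for experimental differences: NRQ = RQ(goi) / RQ(ref), with goi = gene of interest and ref = reference gene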
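The RQ and NRQ formulas translate directly into code. A minimal Python sketch; the function names and the reference gene Cq values are hypothetical, and combining several reference genes by their geometric mean follows the multi-gene normalization recommended by geNorm.

```python
from math import prod


def relative_quantity(cq_calibrator: float, cq_sample: float, e: float = 2.0) -> float:
    """RQ = E ** (Cq_calibrator - Cq_sample), with E the amplification factor
    per cycle (E = 2 at 100 % efficiency)."""
    return e ** (cq_calibrator - cq_sample)


def normalized_relative_quantity(rq_goi: float, rq_refs: list[float]) -> float:
    """NRQ = RQ_goi / RQ_ref; with several reference genes, RQ_ref is taken
    here as their geometric mean."""
    rq_ref = prod(rq_refs) ** (1 / len(rq_refs))
    return rq_goi / rq_ref


rq_goi = relative_quantity(19.0, 17.0)     # 2 ** 2 = 4.0
rq_refs = [relative_quantity(20.0, 20.5),  # hypothetical reference gene Cq values
           relative_quantity(22.0, 22.5)]
print(rq_goi, round(normalized_relative_quantity(rq_goi, rq_refs), 3))  # 4.0 5.657
```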
standard curve quantification method
[figure: standard curve of Cq (15-35) versus log10 quantity (10^1 to 10^6 copies), fitted as y = a*x + b; the Cq of an unknown sample read off the curve corresponds to 350 copies]
standard curve quantification method
• the slope (a) of the standard curve reflects the efficiency of the PCR
• efficiency = 10^(-1/a) - 1
• ideal efficiency: 100 % (1), which corresponds to a slope of -3.322 and to 2 as the base of exponential amplification
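Fitting the standard curve and converting its slope into an efficiency can be sketched with an ordinary least-squares line through Cq versus log10 quantity. The dilution series is hypothetical and noise-free, chosen so that the slope is -3.322; the function names are illustrative only.

```python
import math
from statistics import fmean


def fit_standard_curve(quantities, cq_values):
    """Ordinary least-squares fit of Cq = a * log10(quantity) + b; returns (a, b)."""
    x = [math.log10(q) for q in quantities]
    mx, my = fmean(x), fmean(cq_values)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, cq_values))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx


def efficiency_from_slope(a):
    """Efficiency as a fraction, 10 ** (-1 / a) - 1 (1.0 = 100 %)."""
    return 10 ** (-1 / a) - 1


def quantity_from_cq(cq, a, b):
    """Read an unknown sample's starting quantity off the standard curve."""
    return 10 ** ((cq - b) / a)


# hypothetical noise-free 10-fold dilution series, 10 to 10^6 copies
quantities = [10 ** k for k in range(1, 7)]
cqs = [-3.322 * math.log10(q) + 38.0 for q in quantities]
a, b = fit_standard_curve(quantities, cqs)
print(f"slope {a:.3f}, efficiency {efficiency_from_slope(a):.1%}")  # slope -3.322, efficiency 100.0%

unknown_cq = -3.322 * math.log10(350) + 38.0  # hypothetical unknown sample
print(round(quantity_from_cq(unknown_cq, a, b)))  # 350
```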
real-time PCR
• advantages
  o large dynamic range of linear quantification
  o sensitivity
  o accuracy and reproducibility
  o high-throughput
  o absence of post-PCR manipulations
real-time PCR
• detection chemistries
  o double-stranded DNA specific binding dyes: SYBR Green I, EvaGreen
    - very well suited for quantification of a large number of different sequences
    - detection of primer-dimer artifacts and non-specific amplification
    - product differentiation (DNA melting curve analysis) (Ririe et al., 1997)
  o sequence-specific probes: hydrolysis probe (TaqMan), molecular beacon, dual hybridization probes
  o combined primer-probes: Scorpion
  o fluorescent primers: Sunrise, LightUp, Lux
• probes are useful for multiplexing, genotyping and diagnostics; for all other applications, dsDNA binding dyes are the method of choice
SYBR Green I
melt curve analysis
[figure: melting peaks, -dF/dT versus temperature (65-90 °C); melting temperatures (Tm) of 82.6 and 88.1 °C]
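Melting peaks are the maxima of -dF/dT. A minimal sketch, assuming a hypothetical melt curve built from two sigmoid transitions at the Tm values shown on the slide, estimates the derivative by central differences on a 0.1 °C grid and reports local maxima above 10% of the highest peak; real software usually smooths the raw data first.

```python
import math


def melt_curve(temps, tms, weights, width=0.5):
    """Hypothetical melt curve: each product loses its fluorescence
    sigmoidally around its own Tm."""
    return [sum(w / (1 + math.exp((t - tm) / width)) for tm, w in zip(tms, weights))
            for t in temps]


def melt_peaks(temps, fluorescence, min_height=0.1):
    """Temperatures of local maxima of -dF/dT (central differences) that reach
    at least min_height times the highest -dF/dT value."""
    neg_deriv = [-(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
                 for i in range(1, len(temps) - 1)]  # neg_deriv[k] belongs to temps[k + 1]
    top = max(neg_deriv)
    peaks = []
    for k in range(1, len(neg_deriv) - 1):
        if (neg_deriv[k] >= top * min_height
                and neg_deriv[k - 1] < neg_deriv[k] >= neg_deriv[k + 1]):
            peaks.append(round(temps[k + 1], 1))
    return peaks


temps = [65 + 0.1 * i for i in range(251)]  # 65.0 to 90.0 °C
f = melt_curve(temps, tms=[82.6, 88.1], weights=[1.0, 0.6])
print(melt_peaks(temps, f))  # [82.6, 88.1]
```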
uMelt - http://dna.utah.edu/umelt/umelt.html
• prediction of melt peaks
hydrolysis probe
[figure series: hydrolysis probe principle, FRET]
hydrolysis probe (TaqMan probe)
• "Pacman" principle: the 5' nuclease activity of the polymerase cleaves the probe during extension
• booming technology (Higuchi et al., 1993)
real-time PCR applications
• gene expression analysis (gold standard)
• fusion gene detection (minimal residual disease, MRD)
• pathogen detection (e.g. viral load)
• genotyping (SNP analysis, mutation analysis)
• gene copy number quantification (e.g. oncogenes, exon deletions)
critical factors contributing to reliable results
Derveaux et al., Methods, 2010
assay design – probes vs SYBR
• choose probes for
  o multiplexing
  o genotyping
  o absolute sensitivity (detection past cycle 40), e.g. in a clinical-diagnostic setting or for GMO detection
• choose SYBR Green I for
  o all other applications
  o low cost
  o seeing what you do (melt curve analysis of the amplified product)
assay design – design guidelines
• location
  o sequence repeats, protein domains (Pfam database)
  o splice variants
  o intron-spanning vs. exonic
• size
  o short amplicons: 80-150 bp
• primers
  o dTm < 2 °C (forward vs. reverse primer)
  o identical Tm for all assays
  o maximum 2 G/C in the last 5 nucleotides (3' end)
• use software to design assays
  o Primer3(Plus), BeaconDesigner, RTPrimerDB / primerXL
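Some of the primer rules can be checked automatically. This sketch uses the Wallace rule, 2 × (A+T) + 4 × (G+C), as a stand-in Tm estimate (an assumption: it is much cruder than the nearest-neighbour models used by design tools such as Primer3), and the primer sequences are made up for illustration.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm in °C by the Wallace rule, 2 * (A + T) + 4 * (G + C).
    Only a crude approximation for short oligos."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def gc_in_3prime_end(seq: str, n: int = 5) -> int:
    """Number of G/C bases among the last n nucleotides (3' end)."""
    return sum(base in "GC" for base in seq.upper()[-n:])


def check_primer_pair(fwd: str, rev: str, max_dtm: float = 2.0) -> list[str]:
    """Return guideline violations for a primer pair (empty list = passes)."""
    problems = []
    dtm = abs(wallace_tm(fwd) - wallace_tm(rev))
    if dtm >= max_dtm:
        problems.append(f"dTm {dtm} °C is not below {max_dtm} °C")
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if gc_in_3prime_end(primer) > 2:
            problems.append(f"{name} primer: more than 2 G/C in the last 5 nt")
    return problems


fwd = "AGCTTAGCATGCATTAGCAT"  # hypothetical primers, for illustration only
rev = "ATCGATCGATTCAGCGGCCG"
print(check_primer_pair(fwd, rev))
```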
PCR assay design and validation
• do thorough in silico assay evaluation
  o BLAST/BiSearch specificity analysis
  o mfold secondary structure
  o SNP analysis of primer annealing regions
  o splice variant specificity
• do experimental validation
  o standard curve (range and number of dilution points are important); see formula 4 in Hellemans et al., 2007, Genome Biology
  o electrophoresis (agarose, polyacrylamide, microfluidic) (only once)
  o melting curves
  o (sequencing)
• submit validated assays to a public database, such as RTPrimerDB
Primer3Plus - http://www.primer3plus.com
BiSearch - http://bisearch.enzim.hu
• developed for PCR specificity assessment
• fast, because the genome is indexed
dbSNP - http://www.ncbi.nlm.nih.gov/projects/SNP/
mfold - http://mfold.rna.albany.edu/?q=mfold
assay QC – in silico validation
RTPrimerDB – http://www.rtprimerdb.org
• database of experimentally verified qPCR assays
• in silico evaluation of custom assays
Nucleic Acids Research, 2003, 2006, 2009
assay QC – in silico validation
• step 1
• step 2
• step 3
• mfold (Zuker et al., Nucleic Acids Research, 2003)
assay QC – wet lab validation
[figure: effect of oligo producer on amplification efficiency; efficiency (1.5-2.3) for assays 1-15, with series labelled Ig, Id and B]
primerXL – http://www.primerxl.org (not online yet)
• Lefever et al., in preparation
• primer design for qPCR, resequencing, genotyping
reference gene variability
• quantitative RT-PCR analysis of 10 reference genes (belonging to different functional and abundance classes) on 85 samples from 13 different human tissues
[figure: expression (scale 0-4) of ACTB, HMBS, HPRT1, TBP and UBC in samples A-G]
• 15-fold difference between samples A and B if normalized by only one gene (ACTB or HMBS)
our geNorm solution
• framework for qPCR gene expression normalisation using the reference gene concept:
  o quantified the errors related to the use of a single reference gene (> 3-fold in 25% of the cases; > 6-fold in 10% of the cases)
  o developed a robust algorithm to assess the expression stability of candidate reference genes
  o proposed the geometric mean of at least 3 reference genes for accurate and reliable normalisation
  o Vandesompele et al., Genome Biology, 2002
Data QC – reference gene stability
• pairwise variation V (between 2 genes)

  sample     gene A   gene B   log2 ratio
  sample 1   a1       b1       log2(a1/b1)
  sample 2   a2       b2       log2(a2/b2)
  sample 3   a3       b3       log2(a3/b3)
  …          …        …        …
  sample n   an       bn       log2(an/bn)

  V = standard deviation of the log2 ratios
• gene stability measure M: average pairwise variation V of a gene with all other genes
geNorm software
• ranking of candidate reference genes according to their stability
• determination of how many genes are required for reliable normalization
• http://www.genorm.info
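The pairwise variation V and stability measure M can be computed as described, followed by geNorm-style stepwise exclusion of the least stable gene. This is a simplified sketch for illustration, not the geNorm software itself; it expects relative quantities (not Cq values), and the gene names and data are hypothetical.

```python
import math
from statistics import fmean, stdev


def pairwise_variation(a, b):
    """V: standard deviation over samples of log2(a_i / b_i) for two genes."""
    return stdev([math.log2(x / y) for x, y in zip(a, b)])


def stability_m(expr):
    """M per gene: mean pairwise variation V with all other genes.
    expr maps gene name -> relative quantities, same sample order for every gene."""
    return {g: fmean([pairwise_variation(expr[g], expr[h]) for h in expr if h != g])
            for g in expr}


def stepwise_ranking(expr):
    """Repeatedly exclude the gene with the highest M; returns genes from least
    to most stable (the final two cannot be ranked against each other)."""
    remaining = dict(expr)
    excluded = []
    while len(remaining) > 2:
        m = stability_m(remaining)
        worst = max(m, key=m.get)
        excluded.append(worst)
        del remaining[worst]
    return excluded + sorted(remaining)


# hypothetical relative quantities of 4 candidate reference genes in 4 samples
expr = {
    "GENE_A": [1.0, 2.0, 0.5, 1.5],
    "GENE_B": [2.0, 4.0, 1.0, 3.0],  # always 2x GENE_A, so V(A, B) = 0
    "GENE_C": [1.0, 1.0, 1.0, 1.0],
    "GENE_D": [1.0, 8.0, 0.25, 2.0],
}
print({g: round(m, 3) for g, m in stability_m(expr).items()})
print(stepwise_ranking(expr))  # ['GENE_D', 'GENE_C', 'GENE_A', 'GENE_B']
```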
Data QC – reference gene stability
• geometric mean of 3 reference gene expression levels
  o geometric mean = (a × b × c)^(1/3)
  o arithmetic mean = (a + b + c) / 3
• controls for outliers
• compensates for differences in expression level between the reference genes
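Why the geometric mean compensates for differences in expression level can be shown numerically. In this hypothetical example the three reference genes differ 1000-fold in abundance, and one gene at a time doubles; `mean_ratios` is an illustrative helper name.

```python
from statistics import fmean, geometric_mean


def mean_ratios(reference, sample):
    """Ratio sample/reference of the arithmetic and of the geometric mean
    of a set of reference gene expression levels."""
    return (fmean(sample) / fmean(reference),
            geometric_mean(sample) / geometric_mean(reference))


base = [1000.0, 10.0, 1.0]  # hypothetical levels of 3 reference genes
print(mean_ratios(base, [2000.0, 10.0, 1.0]))  # highly expressed gene doubles
print(mean_ratios(base, [1000.0, 10.0, 2.0]))  # weakly expressed gene doubles
```

The arithmetic mean nearly doubles when the most abundant gene doubles but hardly moves when the weakest one does; the geometric mean changes by the same factor, 2^(1/3) ≈ 1.26, in both cases.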
geNorm validation
• cancer patient survival curves: statistically more significant results (log-rank statistics)
[figure: log-rank p-values of 0.003, 0.006, 0.021, 0.023 and 0.056 for different normalization factors, including NF4 and NF1]
Hoebeeck et al., Int J Cancer, 2006
normalization using multiple stable reference genes
• geNorm is the de facto standard for reference gene validation and normalization
• > 3500 citations of our geNorm technology
• > 15,000 geNorm software downloads in 100 countries
improved geNorm: geNormPLUS
feature                  classic geNorm     improved geNorm
platform                 Excel (Windows)    qbasePLUS (Win, Mac, Linux)
speed                    1x                 20x
interpretation           -                  +
ranking best 2 genes     -                  +
handling missing data    -                  +
raw data (Cq) as input   -                  +
external oligonucleotide standards
• synthetic control: FP + stuffer + RCRP
• 55-60 nucleotides
• standard desalted, unblocked (unlike Vermeulen et al., 2009)
• 5-point dilution series: 150 000 molecules down to 15 molecules (10-fold steps)
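The composition of the external standard and its dilution series can be sketched as follows. Reading RCRP as the reverse complement of the reverse primer (so that the assay's primer pair amplifies the oligo) is an assumption, and all sequences and function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def synthetic_standard(fwd_primer: str, stuffer: str, rev_primer: str) -> str:
    """Assumed layout FP + stuffer + RCRP: forward primer, stuffer, reverse
    complement of the reverse primer."""
    return fwd_primer.upper() + stuffer.upper() + reverse_complement(rev_primer)


def dilution_series(top: int = 150_000, points: int = 5, factor: int = 10) -> list[int]:
    """Number of molecules per dilution point."""
    return [top // factor ** i for i in range(points)]


oligo = synthetic_standard("AGCTTAGCATGCATTAGCAT",  # hypothetical sequences
                           "ACGTACGTACGTACGTAC",
                           "GTCAGTCAGTCAGTCAGTCA")
print(len(oligo), 55 <= len(oligo) <= 60)  # 58 True
print(dilution_series())                   # [150000, 15000, 1500, 150, 15]
```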
external oligonucleotide standards cross lab comparison
• 366 samples
• 5 standards (triplicates)
• 3 reference genes + 5 genes of interest
external oligonucleotide standards cross lab comparison
[figure: Cq on qPCR instrument 1 (mastermix 1) versus Cq on qPCR instrument 2 (mastermix 2), axes 16-36; the average ΔCq of the 5 standards (triplicates) is used to correct the Cq values of the samples]
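The correction shown in the figure, shifting sample Cq values by the average ΔCq of the standards, can be sketched in Python. This assumes a single constant offset between the two set-ups; the published method (Vermeulen et al., 2009) may model the relationship differently, and the Cq values and function names are hypothetical.

```python
from statistics import fmean


def platform_offset(standards_cq_1, standards_cq_2):
    """Average delta-Cq (platform 2 minus platform 1) over the same standards."""
    return fmean([c2 - c1 for c1, c2 in zip(standards_cq_1, standards_cq_2)])


def to_platform_1_scale(sample_cqs_2, offset):
    """Shift Cq values measured on platform 2 onto the platform-1 scale."""
    return [cq - offset for cq in sample_cqs_2]


# hypothetical mean Cq values (triplicate averages) of 5 standards on both set-ups
cq_1 = [18.0, 21.0, 24.0, 27.0, 30.0]
cq_2 = [19.4, 22.6, 25.5, 28.4, 31.6]
offset = platform_offset(cq_1, cq_2)
print(round(offset, 2), to_platform_1_scale([26.5, 30.1], offset))
```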
[figure: Cq on qPCR instrument 1 (mastermix 1) versus Cq on qPCR instrument 2 (mastermix 2), 5 standards (triplicates)]
external oligonucleotide standards cross lab comparison
• ARHGEF7 gene
• 366 samples
• use of 5 standards (triplicates) for correction
[figure: Cq on 7900HT versus Cq on LC480; abs(dCq)]
external oligonucleotide standards cross lab comparison
• 5-15% better patient classification accuracy after calibration
• Vermeulen et al., Nucleic Acids Research, 2009
problem of data analysis
• extraction of meaningful biological information from qPCR data
• universal quantification model with proper error propagation
  o qBase paper: Hellemans et al., Genome Biology, 2007
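To illustrate what error propagation means here, the sketch below carries Cq standard deviations through RQ and NRQ with first-order (delta method) formulas. This is a generic textbook approach, assuming independent errors and an exactly known efficiency; it is not claimed to reproduce the exact qBase equations, and the function names and inputs are hypothetical.

```python
import math


def rq_with_sd(cq_cal, sd_cal, cq_sample, sd_sample, e=2.0):
    """RQ = E ** (Cq_cal - Cq_sample) and its approximate standard deviation,
    propagating independent Cq errors to first order (efficiency treated as exact)."""
    sd_delta = math.hypot(sd_cal, sd_sample)
    rq = e ** (cq_cal - cq_sample)
    return rq, rq * math.log(e) * sd_delta


def nrq_with_sd(rq_goi, sd_goi, rq_ref, sd_ref):
    """NRQ = RQ_goi / RQ_ref; relative errors of a ratio add in quadrature."""
    nrq = rq_goi / rq_ref
    return nrq, nrq * math.hypot(sd_goi / rq_goi, sd_ref / rq_ref)


goi = rq_with_sd(19.0, 0.10, 17.0, 0.10)  # hypothetical Cq means and SDs
ref = rq_with_sd(20.0, 0.05, 20.0, 0.05)
print(goi, nrq_with_sd(*goi, *ref))
```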
qbasePLUS
• most powerful, flexible and user-friendly real-time PCR data-analysis software
• based on Ghent University’s geNorm and qBase technology
• up to fifty 384-well plates
• multiple reference genes for accurate normalization
• detection and correction of inter-run variation
• dedicated error propagation
• automated analysis; no manual interaction required
http://www.qbaseplus.com
MIQE guidelines in Clinical Chemistry
http://www.rdml.org/miqe
MIQE checklist for authors, reviewers and editors, covering:
• experimental design
• sample
• nucleic acid extraction
• reverse transcription
• target information
• oligonucleotides
• qPCR protocol
• qPCR validation
• data analysis
RDML data exchange format
• RDML: http://www.RDML.org
• Lefever et al., Nucleic Acids Research, 2009
Statistical analysis & interpretation
• sample size (~ power analysis)
• log-transform gene expression data
• consider pairing
• parametric vs. non-parametric
  o central limit theorem: parametric test
  o better safe than sorry: non-parametric test

  design                  parametric       non-parametric
  2 groups, unpaired      t-test           Mann-Whitney
  2 groups, paired        paired t-test    Wilcoxon signed rank test
  > 2 groups              ANOVA            Kruskal-Wallis
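Two of the points above, power analysis and paired testing on log-transformed data, can be sketched with Python's standard library alone. The sample-size formula is the usual normal approximation for comparing two group means (it slightly underestimates n for small groups), the function names and example data are hypothetical, and for p-values a library routine such as SciPy's `scipy.stats.ttest_rel` would be used instead of the bare t statistic.

```python
import math
from statistics import NormalDist, fmean, stdev


def samples_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-sample
    comparison of means: n = 2 * (z_(1 - alpha/2) + z_power)^2 * sd^2 / delta^2.
    delta and sd are on the log2 scale (delta = 1 means a 2-fold difference)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2)


def paired_t_statistic(before, after):
    """Paired t statistic and degrees of freedom on log2-transformed data."""
    diffs = [math.log2(b) - math.log2(a) for a, b in zip(before, after)]
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


print(samples_per_group(delta=1.0, sd=1.0))                  # 16
print(paired_t_statistic([1.0, 1.0, 1.0], [2.0, 4.0, 2.0]))  # t ≈ 4.0 with 2 degrees of freedom
```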