Bioinformatics and PCR
Jo Vandesompele
professor, Ghent University
co-founder and CEO, Biogazelle
Bioinformatics for the Medical Molecular Laboratory, HOWEST Brugge, 18 October 2011

outline
• introduction on (q)PCR
• primer design and assay validation
• reference gene selection and validation
• cross-laboratory data comparison using external standards
• data analysis
• RDML data exchange and reporting
• statistics

polymerase chain reaction (PCR)
• 30-45 cycles of a 3-step process:
  o denaturation (95 °C, 15 s)
  o annealing (50-70 °C, 15 s); a 20 bp primer sequence is expected to occur by chance only about once in $4^{20} \approx 10^{12}$ bases
  o extension (72 °C, 1 min)
• exponential amplification: the amount of product doubles every cycle, 1, 2 ($2^1$), 4 ($2^2$), 8 ($2^3$), 16 ($2^4$), ...; after 35 cycles, $2^{35} \approx 3.4 \times 10^{10}$
• endpoint detection (agarose gel electrophoresis):
  o not really quantitative
  o risk of carry-over contamination
  o laborious

real-time PCR principle
• continuous monitoring of PCR product accumulation, i.e. the amount of fluorescence is measured in every cycle
• relationship between the cycle at which fluorescence rises above background and the initial amount of template: the sooner fluorescence is visible, the more template was present, and vice versa

amplification curve
• exponential phase: $y = a^x + b$
• overall, the amplification curve is sigmoid

quantification cycle (Cq) value
• threshold method: Cq is the cycle at which the fluorescence crosses a set threshold
• second derivative maximum method: Cq is the cycle at which the second derivative of the amplification curve is maximal

comparative Cq quantification method
• example: $\Delta Cq = 19 - 17 = 2$, so the difference in starting quantity is $2^2 = 4$-fold
• relative quantity (RQ) as a function of ΔCq:

  ΔCq    RQ
  1      $1/2 = 2^{-1} = 0.5$
  2      $1/4 = 2^{-2} = 0.25$
  3      $1/8 = 2^{-3} = 0.125$
  n      $2^{-n}$, or in general $E^{-n}$

• relative quantity: $RQ = E^{\Delta Cq}$ with $\Delta Cq = Cq_{calibrator} - Cq_{unknown}$, where E is the amplification efficiency expressed as fold increase per cycle (E = 2 at 100 % efficiency)
• normalization to correct for experimental differences: $NRQ = RQ_{goi} / RQ_{ref}$ (goi: gene of interest; ref: reference gene)

standard curve quantification method
• linear fit $Cq = a \cdot \log_{10}(\text{quantity}) + b$ on a dilution series
[Figure: standard curve of Cq (15 to 35) versus log10 quantity (10^1 to 10^6); an unknown sample is read off the curve at 350 copies]
• the slope (a) of the standard curve reflects the efficiency of the PCR: $\text{efficiency} = 10^{-1/a} - 1$
• ideal efficiency: 100 % (efficiency = 1), corresponding to a slope of -3.322 and exact doubling every cycle (E = 2, the base of exponential amplification)
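The two Cq definitions can be illustrated on a list of background-corrected fluorescence readings, one per cycle. This is a minimal sketch, assuming linear interpolation between the two bracketing cycles for the threshold method and a discrete second difference as a stand-in for the second derivative; instrument software uses its own, often curve-fitting, implementations. The function names and the simulated sigmoid curve are hypothetical.

```python
import math


def cq_threshold(fluorescence, threshold):
    """Cq by the threshold method: fractional cycle at which the
    background-corrected fluorescence first crosses `threshold`.

    fluorescence[i] is the reading of cycle i + 1. Linear interpolation
    between the two bracketing cycles is an assumption of this sketch.
    Returns None if the threshold is never crossed.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # reading i - 1 belongs to cycle i; add the interpolated fraction
            return i + (threshold - prev) / (curr - prev)
    return None


def cq_second_derivative_max(fluorescence):
    """Cq by the second derivative maximum method, approximated here by the
    cycle with the largest discrete second difference f[i+1] - 2 f[i] + f[i-1]."""
    best = max(
        range(1, len(fluorescence) - 1),
        key=lambda i: fluorescence[i + 1] - 2 * fluorescence[i] + fluorescence[i - 1],
    )
    return best + 1  # list index -> cycle number


# Simulated sigmoid amplification curve over 40 cycles (hypothetical data).
curve = [100 / (1 + math.exp(-(cycle - 20))) for cycle in range(1, 41)]
print(cq_threshold(curve, 10.0))
print(cq_second_derivative_max(curve))
```

Both methods give a fractional or integer cycle number; the second derivative method needs no user-chosen threshold, which is why it is less sensitive to baseline settings.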
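The comparative Cq formulas $RQ = E^{\Delta Cq}$ and $NRQ = RQ_{goi} / RQ_{ref}$ translate directly into code. A minimal sketch, assuming E is given as fold increase per cycle (2.0 for 100 % efficiency) and, for several reference genes, taking $RQ_{ref}$ as the geometric mean of their RQs, in line with the geNorm recommendation in the reference gene part of this talk. The function names are hypothetical, and the error propagation of the qBase model is not reproduced.

```python
from math import prod


def relative_quantity(cq_calibrator, cq_sample, efficiency_base=2.0):
    """RQ = E ** (Cq_calibrator - Cq_sample).

    efficiency_base is E as fold increase per cycle (2.0 = 100 % efficiency);
    a per-assay value would normally come from a standard curve.
    """
    return efficiency_base ** (cq_calibrator - cq_sample)


def normalized_relative_quantity(rq_goi, rq_refs):
    """NRQ = RQ_goi / RQ_ref, with RQ_ref taken here as the geometric mean
    of the relative quantities of one or more reference genes."""
    rq_ref = prod(rq_refs) ** (1.0 / len(rq_refs))
    return rq_goi / rq_ref


print(relative_quantity(19, 17))  # dCq = 2 -> 4.0
print(relative_quantity(20, 21))  # sample 1 cycle later than calibrator -> 0.5
print(normalized_relative_quantity(4.0, [2.0, 8.0]))  # 4 / sqrt(16) -> 1.0
```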
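The standard curve method can be sketched as an ordinary least-squares fit of Cq against log10 quantity, followed by $\text{efficiency} = 10^{-1/a} - 1$ and inversion of the fit for an unknown sample. This is an illustration under stated assumptions: the dilution series and Cq values below are hypothetical, built with the ideal slope of -3.322, and the function names are not from any particular software package.

```python
import math


def fit_standard_curve(quantities, cqs):
    """Least-squares fit of Cq = a * log10(quantity) + b; returns (a, b)."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def efficiency_from_slope(a):
    """PCR efficiency = 10 ** (-1 / a) - 1 (1.0 means 100 %)."""
    return 10 ** (-1 / a) - 1


def quantity_from_cq(cq, a, b):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((cq - b) / a)


# Hypothetical 10-fold dilution series (10 to 10**6 copies) with an ideal slope.
quantities = [10 ** k for k in range(1, 7)]
cqs = [38.0 - 3.322 * k for k in range(1, 7)]
a, b = fit_standard_curve(quantities, cqs)
print(a, b)
print(efficiency_from_slope(a))
print(quantity_from_cq(cqs[1], a, b))
```

In practice the replicate Cq values of each dilution point are fitted, and the spread of the points around the line shows how precise the efficiency estimate is; this is why the range and number of dilution points matter.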
real-time PCR advantages
• large dynamic range of linear quantification
• sensitivity
• accuracy and reproducibility
• high throughput
• no post-PCR manipulations

real-time PCR detection chemistries
• double-stranded DNA binding dyes: SYBR Green I, EvaGreen
  o very well suited for quantification of a large number of different sequences
  o detection of primer-dimer artifacts and non-specific amplification
  o product differentiation by DNA melting curve analysis (Ririe et al., 1997)
• sequence-specific probes: hydrolysis probe (TaqMan), molecular beacon, dual hybridization probes
• combined primer-probes: Scorpion
• fluorescent primers: Sunrise, LightUp, Lux
• probes are useful for multiplexing, genotyping and diagnostics; for all other applications, dsDNA binding dyes are the method of choice

SYBR Green I
• melt curve analysis: each product shows a melting peak in -dF/dT (negative derivative of fluorescence with respect to temperature) at its melting temperature (Tm)
[Figure: melt curve, -dF/dT versus temperature (65 to 90 °C), with melting peaks at 82.6 °C and 88.1 °C]
• prediction of melt peaks: uMelt, http://dna.utah.edu/umelt/umelt.html

hydrolysis probe (TaqMan probe)
• FRET-based detection; the probe is degraded by the polymerase during extension ("Pacman")
• real-time PCR: a booming technology (Higuchi et al., 1993)

real-time PCR applications
• gene expression analysis (gold standard)
• fusion gene detection (minimal residual disease, MRD)
• pathogen detection (e.g. viral load)
• genotyping (SNP analysis, mutation analysis)
• gene copy number quantification (e.g. oncogenes, exon deletions)

critical factors contributing to reliable results (Derveaux et al., Methods, 2010)

assay design: probes vs. SYBR Green I
• choose probes for:
  o multiplexing
  o genotyping
  o absolute sensitivity (detection beyond cycle 40), e.g. in a clinical-diagnostic setting or for GMO detection
• choose SYBR Green I for all other applications:
  o low cost
  o seeing what you do

assay design: design guidelines
• location: take into account sequence repeats, protein domains (Pfam database), splice variants, intron-spanning vs. exonic
• size: short amplicons (80-150 bp)
• primers:
  o ΔTm < 2 °C (forward vs. reverse primer)
  o identical Tm for all assays
  o at most 2 G/C in the last 5 nucleotides at the 3' end
• use software to design assays: Primer3(Plus), BeaconDesigner, RTPrimerDB / primerXL

PCR assay design and validation
• do a thorough in silico assay evaluation:
  o specificity analysis (BLAST, BiSearch)
  o secondary structure (mfold)
  o SNP analysis of primer annealing regions
  o splice variant specificity
• do experimental validation:
  o standard curve (range and number of dilution points are important; see formula 4 in Hellemans et al., Genome Biology, 2007)
  o electrophoresis (agarose, polyacrylamide, microfluidic), only once
  o melting curves (sequence)
• submit validated assays to a public database, such as RTPrimerDB

tools for in silico assay evaluation
• Primer3Plus: http://www.primer3plus.com
• BiSearch: http://bisearch.enzim.hu; developed for PCR specificity assessment, faster because the genome is indexed
• dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/
• mfold: http://mfold.rna.albany.edu/?q=mfold (Zuker et al., Nucleic Acids Research, 2003)

assay QC: in silico validation
• RTPrimerDB: http://www.rtprimerdb.org (Nucleic Acids Research, 2003, 2006, 2009)
  o database of experimentally verified qPCR assays
  o in silico evaluation of custom assays (steps 1 to 3, including secondary structure analysis with mfold)

assay QC: wet-lab validation
• effect of the oligonucleotide producer on amplification efficiency
[Figure: amplification efficiency (1.5 to 2.3) of 15 assays for oligonucleotides from different producers (Ig, Id, B)]

primerXL: http://www.primerxl.org (not online yet; Lefever et al., in preparation)
• primer design for qPCR, resequencing and genotyping

reference gene variability
• quantitative RT-PCR analysis of 10 reference genes (belonging to different functional and abundance classes) on 85 samples from 13 different human tissues
[Figure: expression of ACTB, HMBS, HPRT1, TBP and UBC in samples A to G]
• 15-fold difference between samples A and B if normalized by only one gene (ACTB or HMBS)

our geNorm solution (Vandesompele et al., Genome Biology, 2002)
• framework for qPCR gene expression normalisation using the reference gene concept:
  o quantified the errors related to the use of a single reference gene (> 3-fold in 25 % of the cases; > 6-fold in 10 % of the cases)
  o developed a robust algorithm for assessing the expression stability of candidate reference genes
  o proposed the geometric mean of at least 3 reference genes for accurate and reliable normalisation

data QC: reference gene stability
• pairwise variation V between 2 genes A and B: the standard deviation of the log2-transformed expression ratios over all samples

  sample     gene A    gene B    log2 ratio
  1          $a_1$     $b_1$     $\log_2(a_1/b_1)$
  2          $a_2$     $b_2$     $\log_2(a_2/b_2)$
  3          $a_3$     $b_3$     $\log_2(a_3/b_3)$
  ...        ...       ...       ...
  n          $a_n$     $b_n$     $\log_2(a_n/b_n)$

• gene stability measure M: the average pairwise variation V of a gene with all other candidate genes (the lower M, the more stable the gene)

geNorm software: http://www.genorm.info
• ranking of candidate reference genes according to their stability
• determination of how many genes are required for reliable normalization

data QC: geometric mean of 3 reference gene expression levels
• geometric mean $= \sqrt[3]{a \cdot b \cdot c}$, in contrast to the arithmetic mean $= (a + b + c)/3$
• controls for outliers
• compensates for differences in expression level between the reference genes

geNorm validation (Hoebeeck et al., Int J Cancer, 2006)
• survival curves of cancer patients: statistically more significant results with multiple reference genes
• log-rank p-values: NF4 (normalization factor based on 4 reference genes) 0.003; NF1 (single reference genes) 0.006, 0.021, 0.023, 0.056

normalization using multiple stable reference genes
• geNorm is the de facto standard for reference gene validation and normalization
• > 3500 citations of our geNorm technology
• > 15,000 geNorm software downloads in 100 countries

improved geNorm: geNormPLUS

                              classic geNorm     improved geNorm
  platform                    Excel, Windows     qbasePLUS (Win, Mac, Linux)
  speed                       1x                 20x
  interpretation              -                  +
  ranking of best 2 genes     -                  +
  handling missing data       -                  +
  raw data (Cq) as input      -                  +

external oligonucleotide standards
• synthetic control: forward primer (FP), stuffer and reverse complement of the reverse primer (RCRP), 55-60 nucleotides
• standard desalted, unblocked (unlike Vermeulen et al., 2009)
• 5-point dilution series: 150,000 molecules down to 15 molecules

external oligonucleotide standards: cross-lab comparison
• 366 samples; 5 standards (triplicates); 3 reference genes + 5 genes of interest
• the average ΔCq of the standards between two set-ups is used to correct the Cq values of the samples
[Figure: Cq on qPCR instrument 1 with mastermix 1 versus Cq on qPCR instrument 2 with mastermix 2 (16 to 36) for the 5 standards (triplicates)]
[Figure: ARHGEF7 gene, 366 samples: Cq on 7900HT versus Cq on LC480 and abs(ΔCq), using 5 standards (triplicates) for correction]
• 5-15 % better patient classification accuracy after calibration (Vermeulen et al., Nucleic Acids Research, 2009)

problem of data analysis
• extraction of meaningful biological information from qPCR data
• universal quantification model with proper error propagation (qBase paper: Hellemans et al., Genome Biology, 2007)

qbasePLUS: http://www.qbaseplus.com
• most powerful, flexible and user-friendly real-time PCR data-analysis software
• based on Ghent University's geNorm and qBase technology
• up to fifty 384-well plates
• multiple reference genes for accurate normalization
• detection and correction of inter-run variation
• dedicated error propagation
• automated analysis; no manual interaction required

MIQE guidelines (Clinical Chemistry): http://www.rdml.org/miqe
• MIQE checklist for authors, reviewers and editors, covering:
  o experimental design
  o sample
  o nucleic acid extraction
  o reverse transcription
  o target information
  o oligonucleotides
  o qPCR protocol
  o qPCR validation
  o data analysis

RDML data exchange format: http://www.RDML.org (Lefever et al., Nucleic Acids Research, 2009)

statistical analysis and interpretation
• sample size (~ power analysis)
• log-transform gene expression data
• consider pairing
• parametric vs. non-parametric:
  o central limit theorem: parametric test
  o better safe than sorry: non-parametric test

  parametric        non-parametric
  t-test            Mann-Whitney
  paired t-test     Wilcoxon signed rank test
  ANOVA             Kruskal-Wallis
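Log-transforming and pairing can be illustrated with a paired t statistic on log2 expression values: on the log2 scale a 2-fold increase (+1) and a 2-fold decrease (-1) are symmetric, which is what the t-test assumes. A minimal sketch using only the Python standard library, with hypothetical function name and data; the p-value (t distribution with n - 1 degrees of freedom) is left to a statistics package.

```python
import math
import statistics


def paired_log2_t(before, after):
    """Paired t statistic on log2-transformed expression values.

    Returns (mean log2 fold change, t statistic, degrees of freedom).
    """
    diffs = [math.log2(a) - math.log2(b) for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t, n - 1


# Hypothetical NRQ values for 3 patients before and after treatment.
mean_lfc, t, df = paired_log2_t([1.0, 2.0, 4.0], [2.0, 4.0, 16.0])
print(mean_lfc, t, df)
```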
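The geNorm quantities described in the reference gene part (pairwise variation V, stability measure M, and the geometric-mean normalization factor) can be sketched as follows. This is a simplified illustration, not the geNorm software: it computes M once without the stepwise exclusion of the least stable gene, does not determine the optimal number of reference genes, and uses the sample standard deviation (an assumption). Function names and data are hypothetical.

```python
import math
import statistics


def pairwise_variation(a, b):
    """V: standard deviation of log2(a_i / b_i) over all samples."""
    return statistics.stdev([math.log2(x / y) for x, y in zip(a, b)])


def gene_stability_m(expression):
    """M per gene: mean pairwise variation V with every other candidate gene.

    expression maps gene name -> list of relative quantities (same sample
    order for every gene). Lower M means more stable expression.
    """
    m = {}
    for g in expression:
        vs = [pairwise_variation(expression[g], expression[h])
              for h in expression if h != g]
        m[g] = sum(vs) / len(vs)
    return m


def normalization_factor(expression, genes):
    """Per-sample geometric mean of the chosen reference genes."""
    n_samples = len(expression[genes[0]])
    return [
        math.prod(expression[g][i] for g in genes) ** (1 / len(genes))
        for i in range(n_samples)
    ]


expression = {  # hypothetical relative quantities in 4 samples
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [2.0, 4.0, 8.0, 16.0],  # perfectly proportional to A
    "C": [1.0, 1.0, 1.0, 1.0],
}
m = gene_stability_m(expression)
print(sorted(m, key=m.get))  # most stable first
print(normalization_factor(expression, ["A", "B"]))
```

Genes A and B vary together across samples, so their ratio is constant and V between them is 0; gene C does not follow them and receives the highest M.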
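The cross-lab calibration with external oligonucleotide standards (average ΔCq of the standards, used to correct the sample Cq values) can be sketched as a constant Cq shift. This assumes one mean Cq per standard (e.g. the mean of the triplicates) measured on both set-ups; the constant-offset model, the function names and the values are illustrative assumptions, not necessarily the exact procedure of Vermeulen et al., 2009.

```python
def calibration_offset(standards_reference, standards_run):
    """Average Cq difference (run - reference) of the shared external standards."""
    if len(standards_reference) != len(standards_run):
        raise ValueError("both set-ups must measure the same standards")
    diffs = [run - ref for ref, run in zip(standards_reference, standards_run)]
    return sum(diffs) / len(diffs)


def calibrate(sample_cqs, offset):
    """Shift sample Cq values onto the reference set-up."""
    return [cq - offset for cq in sample_cqs]


# Hypothetical mean Cq of 5 standards on two instrument/mastermix set-ups.
reference = [18.0, 21.5, 25.0, 28.5, 32.0]
run = [19.0, 22.5, 26.0, 29.5, 33.0]
offset = calibration_offset(reference, run)
print(offset)
print(calibrate([24.0, 30.0], offset))
```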