Measurements and Statistics of ddPCR Experiments
Version 1.0, released Dec 3, 2012

Presentation Overview
This presentation answers the following questions:
• How does QuantaSoft calculate target concentration?
• Why is amount of sample loaded critical to accurate ddPCR measurements?
• What do the two types of error bars presented in QuantaSoft mean, and which ones do I use?
• Technical replicates on the QX100 at a range of concentrations

Key Points About Droplet Digital PCR
• Each droplet is an isolated reaction vessel
• All PCR reagents are contained in each droplet volume
• In any given droplet, target may or may not be present
• Readout: number of droplets with target (positives) and number of droplets without target (negatives)
• This is an endpoint assay: only +/- matters. Consistent PCR efficiency is less important than it is in qPCR
[Figure: workflow. Generate droplets, run PCR, then count positives and negatives with the droplet reader.]

Copies Per Droplet
• CPD = copies of target per droplet
• Units are number per unit volume, not mass per unit volume
• CPD is the average number of copies per droplet. For a CPD of 2, some droplets will have 0 copies, some will have 1, 2, 3, 4, etc.

Multiple ways to calculate CPD:

Example 1:
$$\mathrm{CPD} = \frac{\text{total number of molecules}}{\text{total number of droplets}}$$
100,000 molecules total, 20,000 droplets:
CPD = (100,000 molecules) / (20,000 droplets)
CPD = 5 molecules/droplet

Example 2:
$$\mathrm{CPD} = \frac{\text{molecules}}{\mu\text{l}} \times \text{droplet volume}\ (\mu\text{l})$$
20 molecules in a 20 µl sample, 1 nl droplet volume:
20 molecules / 20 µl = 1.0 molecule/µl
CPD = (1.0 molecule/µl)(0.001 µl/droplet)
CPD = 0.001

Low CPD Quantitation: Each Droplet Contains ≤ 1 Target
• Before PCR, each droplet contains at most 1 target. This is traditional “limiting dilution.”
• Example: 6 copies of target in a 20 µl sample (0.3 copies/µl). Likely outcome of ddPCR: 6 positives
• Simple formula for low-concentration case only (1 nl droplet volume):
$$\frac{N_{pos}}{N_{total}} \times 1000 = \text{copies}/\mu\text{l}$$
  N_pos = # of droplets with template
  N_total = total # of droplets

Intermediate CPD: Some Droplets with > 1 Target
Example: 5,000 targets in 20,000 droplets
Note: 5,000 targets in 20 µl = 250 copies per µl = 0.25 CPD
Observations: We might expect 25% of the droplets to contain 1 copy of target and 75% of the droplets to contain no copies of target, but in reality, we’ll see some droplets with 2, 3, or even 4 copies, and correspondingly more droplets with 0 copies.
Statistics tells us exactly what to expect (on average):

  # of targets    % of droplets
  0               78%
  1               19.5%
  2               2.4%
  3               0.2%
  4               0.01%

“negative”: 78% (not 75%)
“positive”: 22% (not 25%)
Droplets that start out with 1, 2, 3, or 4 targets all look the same after thermal cycling: PCR saturates.

High CPD: More Target Molecules Than Droplets
Example: 50,000 targets in 20 µl = 2500 targets per µl = 2.5 CPD
• The average number of molecules per droplet will be 2.5.
• But there will still be many droplets with 0 molecules.

  # of target molecules    # of droplets    Percent of total droplets
  0                        1642             8.21%
  1                        4104             20.5%
  2                        5130             25.7%
  3                        4275             21.4%
  4                        2672             13.4%
  5                        1336             6.68%
  6                        557              2.78%
  7                        199              0.99%
  8                        62               0.31%
  9                        17               0.086%
  10                       4                0.022%
  11                       1                0.0049%

[Figure: Droplet occupancy, 2.5 CPD. Bar chart of count of droplets vs. # target molecules (0 to 10).]
When we package 50,000 molecules into 20,000 droplets, on average 1642 droplets will have 0 targets, 4104 droplets will have 1 target, 5130 droplets will have 2 targets, etc.
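The occupancy figures quoted for 0.25 CPD and 2.5 CPD can be reproduced with a few lines of Python. This is only an illustrative sketch of the Poisson arithmetic; the function names poisson_occupancy and expected_droplet_counts are invented for this example and are not part of QuantaSoft.

```python
import math


def poisson_occupancy(cpd, max_copies):
    """Return [Pr(0), Pr(1), ..., Pr(max_copies)] for a Poisson mean of `cpd`."""
    return [
        math.exp(-cpd) * cpd**n / math.factorial(n)
        for n in range(max_copies + 1)
    ]


def expected_droplet_counts(total_targets, total_droplets, max_copies):
    """Expected number of droplets holding 0..max_copies targets (rounded)."""
    cpd = total_targets / total_droplets  # average copies per droplet
    return [round(total_droplets * p) for p in poisson_occupancy(cpd, max_copies)]


if __name__ == "__main__":
    # 50,000 targets packaged into 20,000 droplets (2.5 CPD)
    for n, count in enumerate(expected_droplet_counts(50_000, 20_000, 11)):
        print(n, count)
```

Calling poisson_occupancy(0.25, 4) gives the fractions behind the 78% / 19.5% / 2.4% breakdown for the intermediate-CPD example.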
Average Number of Empty Droplets Changes with CPD
[Figure: three histograms of # of droplets vs. # of target molecules in droplet, at 0.25, 1, and 2.5 CPD. The 2.5 CPD data are from the table on the previous slide.]

  CPD     Empty droplets (expected “negatives”)    Occupied droplets (expected “positives”)
  0.25    15,576                                   4,425
  1       7,358                                    12,642
  2.5     1,642                                    18,357

We calculate CPD based on the number of empty droplets observed.

If CPD is Too High, There Are Not Enough Negative Droplets for Quantitation
[Figure: three histograms of # of droplets vs. # target molecules (0 to 28), at 8, 10, and 15 CPD.]

  CPD    Empty droplets (expected “negatives”)
  8      7
  10     1
  15     0

Formulas Used in ddPCR
c = CPD
E = observed fraction of droplets that are empty
V_droplet = volume of droplet

$$\Pr(n) = \frac{c^{n} e^{-c}}{n!}$$
Poisson distribution: probability that a droplet will contain n copies of target if the mean # of target copies per droplet is c.

$$\Pr(0) = e^{-c}$$
Poisson distribution, n = 0. Probability that a droplet will be empty for a given value of c.

$$c = -\ln(E)$$
Best estimate of CPD given the fraction of observed droplets that are empty.

$$\mathrm{conc} = \frac{c}{V_{droplet}}$$
Concentration of sample.

ddPCR Confidence Intervals
Two types of errors are reported by QuantaSoft:
• Poisson errors: calculated for a single well or merged well, with contributions from subsampling and partitioning
• Total errors: calculated for replicates

Experiments Involve Subsampling
• In most molecular biology experiments, we analyze part of a whole (a subsample)
• Examples: a sample of blood, a biopsy from a tumor, an aliquot from a tube of DNA
• Whenever you subsample from a larger volume, there is a subsampling error
• Subsampling error is most significant at low concentrations

Subsampling At Low Concentrations
No subsampling: analyze entire volume of sample
[Figure: perfect counting machine.]
Count molecules in entire volume: no subsampling error.

Subsampling error: analyze part of a sample
[Figure: perfect counting machine. 150 molecules in sample. Subsample 1/25 (6 molecules expected).]
Expect 6 molecules. Measure 5 molecules.
Most of the 25 subsamples contain 4, 5, 6, 7, or 8 molecules. This uncertainty is what we mean by subsampling error.
This uncertainty contributes to the “Poisson error bars.”

Subsampling Error is Inevitable
M = expected number of target molecules in ddPCR reaction
Fundamental subsampling limits:
$$\mathrm{stdev} = \sqrt{M}, \qquad \mathrm{CV} = \frac{1}{\sqrt{M}}$$
Subsampling example: Suppose a person has a total of 100,000 copies of a particular target in his blood (5 liters total volume) and you take 5 ml of plasma, extract DNA, and run ddPCR. On average, you will find 100 copies of target, but the standard deviation of this measurement is 10 and the CV is 10%.
[Figure: CV (%) vs. expected # of items (0 to 5000).]
When subsampling from a large volume, these are absolute limits on measurement error. You cannot do better when measuring error properly.

Error Bars at High Concentration: Effect of Partitioning into Droplets
• Start with a sample with exactly 288 target molecules.
• Partition into 144 droplets (2 CPD)
• Empty droplets: 22 (19 expected)
• Calculated concentration: 1.88 CPD
[Figure: grid of droplets. Each small square represents a 1 nl droplet, shown as either empty or occupied.]
Repeat 3 times. Partitioning of the 288 molecules will be a little different every time:

  Empty droplets    Estimated concentration
  20                1.97 CPD
  17                2.14 CPD
  19                2.03 CPD

This uncertainty contributes to the “Poisson error bars.”

Error Bars at High Concentration
[Figure: relative contribution of partitioning error and subsampling error to ddPCR error. CV (%) vs. CPD (0 to 6), with curves for partitioning error, subsampling error, and ddPCR error (15,000 droplets).]
• At high CPD, uncertainty due to partitioning is higher than uncertainty due to subsampling.
• CV is standard deviation expressed as a percentage of the mean: $\mathrm{CV} = \frac{\mathrm{stdev}}{\mathrm{mean}} \cdot 100$
• Dotted lines show the CPD range with CV < 2.5% (0.11 to 5.73 CPD).
• The lowest CV occurs at a CPD of ~1.6.
• Example: 3 CPD
  • CV = 1.19% (assuming 15,000 droplets read)
  • 95% CI = 2.93-3.07

Error Bars at Low Concentration
[Figure: errors at low concentration. CV (%) vs. target molecules per 20 µl (0 to 15,000), with curves for subsampling error and ddPCR error (15,000 droplets).]
• At low concentration, the largest contribution to the error is from subsampling. Partitioning a given sample into more droplets will not change it.
• Range is 0.0067 to 1 CPD, or 100 to 15,000 copies of target in the sample (15,000 droplets)

“Technical Replicates” Describes Multiple Different Experimental Designs
[Figure: workflow from sample to post-treatment DNA, DNA for ddPCR, DNA + MMX, and ddPCR well.]
• Poisson confidence intervals (CIs) reported by QuantaSoft capture the 1st example. CIs can be calculated without replicates.
• The 2nd and 3rd examples show why total CIs might be larger than Poisson CIs: additional variability from the entire process is often larger than measurement error.
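To illustrate how a confidence interval can be calculated from a single well without replicates, here is a minimal Python sketch. It assumes a textbook approximation: the observed empty fraction is treated as binomial, and the error is propagated through c = -ln(E) with the delta method. This is not stated to be QuantaSoft's actual algorithm, and all function names are hypothetical, but the approximation does reproduce the 3 CPD example above (CV = 1.19%, 95% CI = 2.93-3.07 for 15,000 droplets).

```python
import math


def cpd_from_counts(n_negative, n_total):
    """Best estimate of CPD from the observed empty fraction: c = -ln(E)."""
    return -math.log(n_negative / n_total)


def poisson_stdev(cpd, n_total):
    """Approximate stdev of the CPD estimate.

    Assumption: empty-droplet count is binomial, so var(E) = E(1-E)/N, and
    the delta method on c = -ln(E) gives var(c) ~ (e^c - 1) / N.
    """
    return math.sqrt((math.exp(cpd) - 1.0) / n_total)


def cv_percent(cpd, n_total):
    """Coefficient of variation of the CPD estimate, in percent."""
    return 100.0 * poisson_stdev(cpd, n_total) / cpd


def poisson_ci95(cpd, n_total):
    """Approximate 95% confidence interval (normal approximation)."""
    half_width = 1.96 * poisson_stdev(cpd, n_total)
    return cpd - half_width, cpd + half_width


if __name__ == "__main__":
    print(round(cpd_from_counts(22, 144), 2))   # 144-droplet example
    print(round(cv_percent(3.0, 15_000), 2))    # 3 CPD example
    print([round(x, 2) for x in poisson_ci95(3.0, 15_000)])
```

Under this approximation the CV depends only on CPD and the number of droplets read, which is why a single well is enough to attach an error bar.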
Poisson Error Bars Estimate Errors on Pure Technical Replicates
[Figure: mean and CI plots for qPCR (relative concentration per sample) and for ddPCR (copies per microliter).]
• qPCR: estimate of the mean and CI based on technical replicates. This type of error is captured by Poisson CI in ddPCR.
• ddPCR, single well: calculate mean and CI based on known statistical properties of digital observations.
• ddPCR, droplets from replicates pooled in metawell: treat all three wells together as one big well, and calculate CI as for 1 well.
• ddPCR, replicate wells: estimate mean and CI based on replicates, to account for additional types of variability.
• Inner error bars show CIs based on fundamental ddPCR statistics (Poisson CI).
• Outer error bars (total CI) include all the observed variability that is not accounted for by fundamental ddPCR statistics.

Biological Replicates: qPCR and ddPCR
• qPCR: multiple wells required to estimate measurement error
• ddPCR: one well is sufficient to estimate the measurement error*
* Multiple wells per sample recommended for extremely low concentrations

Error Bar “Rules of Thumb”
• Total error bars are always greater than or equal to Poisson error bars
  • This is enforced by QuantaSoft; it is not a fundamental property of the math
• Total error bars will be approximately equal to Poisson error bars for true instrument technical replicates with a good assay and good technique.
• If in doubt, report total error bars as 95% confidence interval (CI)
• For downstream analysis, stdev = (CImax - CImin)/(2*1.96)

Summary
• In ddPCR, we determine concentration by effectively counting target molecules
• We estimate measurement error in two ways
  • Based on fundamental statistics
  • Based on technical replicates combined with fundamental statistics
• At low concentrations, subsampling error is a fundamental limitation of any measurement technique

Units

Term: Copies/µl
  Definition: Number of target molecules per µl.
  Typical range: 1-6000
  Significance: QuantaSoft reports concentration in these units.

Term: Copies/droplet (CPD)
  Definition: Number of target molecules per droplet.
  Typical range: 0.001-6
  Significance: To ensure some empty droplets, load less than 6 CPD.

Term: Genome equivalents (GE)
  Definition: Approximate number of human genomes present. 1 diploid cell contains 2 GEs of DNA. 1 GE = 3.3 pg.
  Typical range: Depends
  Significance: For targets present at 1 copy per genome, load <= 6 GE per droplet.

Notes:
• A 20 µl sample is partitioned into 20,000 droplets
• For a 1 nl droplet, 1 CPD = 1 copy/nl = 1000 copies/µl (QuantaSoft units)
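The unit relationships in the notes, together with the formulas presented earlier, can be combined into a short calculation from raw droplet counts. The sketch below is illustrative only: it assumes a 1 nl (0.001 µl) droplet as in the notes, and the names copies_per_ul and stdev_from_ci95 are made up for this example rather than taken from QuantaSoft.

```python
import math

# Assumption from the notes: 1 nl droplet = 0.001 µl
DROPLET_VOLUME_UL = 0.001


def copies_per_ul(n_positive, n_total, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Concentration in copies/µl from droplet counts.

    Uses c = -ln(E), with E the fraction of empty droplets,
    then conc = c / V_droplet.
    """
    n_negative = n_total - n_positive
    if n_negative <= 0:
        # With no empty droplets, -ln(E) is undefined (CPD too high)
        raise ValueError("no empty droplets: CPD cannot be estimated")
    cpd = math.log(n_total / n_negative)  # same value as -ln(E)
    return cpd / droplet_volume_ul


def stdev_from_ci95(ci_min, ci_max):
    """Rule of thumb from the slides: stdev = (CImax - CImin) / (2 * 1.96)."""
    return (ci_max - ci_min) / (2 * 1.96)


if __name__ == "__main__":
    # 5,000 positive droplets out of 20,000 read
    print(round(copies_per_ul(5_000, 20_000), 2))
    print(round(stdev_from_ci95(2.93, 3.07), 4))
```

Note that 25% positive droplets corresponds to about 288 copies/µl rather than 250, because some positive droplets started with more than one target.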