Chem. 31 – 2/11 Lecture

Announcements
• Today’s Lecture – Chapter 4 Material
  – Probability within Limits
  – Confidence Intervals
  – Statistical Tests

Chapter 4 – Gaussian Distributions
Now for a “real” limit problem example:
A man wants to get life insurance. If his measured cholesterol level is over 240 mg/dL (2,400 mg/L), his premium will be 25% higher. His level is measured and found to be 249 mg/dL. His uncle, a biochemist who developed the test, tells him that a typical standard deviation on the measurement is 25 mg/dL. What is the chance that a second measurement (with no crash diet or extra exercise) will result in a value under 240 mg/dL (i.e., beat the test)?

Graphical view of example
[Figure: normal distribution, frequency vs. Z value (−5 to 5), with 240 and 249 mg/dL marked on the X-axis; labeled regions: table area, desired area, equivalent area]

Chapter 4 – Calculation of Confidence Interval
1. Confidence Interval = $\bar{x} \pm$ uncertainty
2. Calculation of uncertainty depends on whether σ is “well known”
3. When σ is not well known (covered later)
4. When σ is well known (not in text):
   Value ± uncertainty = $\bar{x} \pm \dfrac{Z\sigma}{\sqrt{n}}$
Z depends on area or desired probability:
– At Area = 0.45 (90% both sides), Z = 1.65
– At Area = 0.475 (95% both sides), Z = 1.96 => larger confidence interval
[Figure: normal distribution, frequency vs. Z value (−3 to 3)]

Chapter 4 – Calculation of Uncertainty
Example: The concentration of NO₃⁻ in a sample is measured twice and found to give 18.6 and 19.0 ppm. The method is known to have a constant relative standard deviation of 2.0% (from past work). Determine the concentration and 95% confidence interval.
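The two problems above (the cholesterol limit and the nitrate confidence interval) can be checked numerically. The following is a minimal sketch, not the worked solution from lecture. It assumes that the single measured cholesterol value (249 mg/dL) is the best estimate of the true level, that σ for nitrate is 2.0% of the mean, and it uses Python's standard-library `NormalDist` in place of a Z table. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# --- Cholesterol limit problem ---
# Assumption: the measured 249 mg/dL is treated as the true mean and
# sigma = 25 mg/dL is well known, so a repeat measurement is Gaussian.
measured = 249.0    # mg/dL
sigma_chol = 25.0   # mg/dL
limit = 240.0       # mg/dL

z_limit = (limit - measured) / sigma_chol   # Z value of the 240 mg/dL limit
p_below_limit = NormalDist().cdf(z_limit)   # area below the limit

# --- Nitrate confidence interval (sigma well known) ---
nitrate_ppm = [18.6, 19.0]
rsd = 0.020          # constant relative standard deviation (2.0%)
confidence = 0.95

x_bar = mean(nitrate_ppm)
sigma_no3 = rsd * x_bar                             # assumed sigma in ppm
z_95 = NormalDist().inv_cdf(0.5 + confidence / 2)   # close to the table value 1.96
uncertainty = z_95 * sigma_no3 / sqrt(len(nitrate_ppm))

print(f"Z = {z_limit:.2f}, P(second result < 240 mg/dL) = {p_below_limit:.2f}")
print(f"NO3- = {x_bar:.1f} +/- {uncertainty:.1f} ppm (95% CI)")
```

Under these assumptions the chance of a second result under 240 mg/dL is roughly 36%, and the nitrate result is about 18.8 ± 0.5 ppm.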
Chapter 4 – Calculation of Confidence Interval with σ Not Known
Value ± uncertainty = $\bar{x} \pm \dfrac{tS}{\sqrt{n}}$
t = Student’s t value
t depends on:
– the number of samples (more samples => smaller t)
– the probability of including the true value (larger probability => larger t)

Chapter 4 – Calculation of Uncertainties Example
• Measurement of lead in drinking water sample:
  – values = 12.3, 9.8, 11.4, and 13.0 ppb
• What is the 95% confidence interval?

Chapter 4 – Ways to Reduce Uncertainty
1. Decrease standard deviation in measurements (usually requires more skill in analysis or better equipment)
2. Analyze each sample more times (this increases n and decreases t)
3. Understand variability better (so that σ is known and Z-based uncertainty can be used)

Overview of Statistical Tests
• t-Tests: Determine if a systematic error exists in a method or between methods, or if a difference exists between sample sets
• F-Test: Determine if there is a significant difference in standard deviations between two methods or sample sets (which method is more precise/which set is more variable)
• Grubbs Test: Determine if a data point can be excluded on a statistical basis

Statistical Tests – Possible Outcomes
• Outcome #1 – There is a statistically significant result (e.g. a systematic error)
  – this is at some probability (e.g. 95%)
  – can occasionally be wrong (possibly 5% of the time if the result is only barely significant at 95% confidence)
• Outcome #2 – No significant result can be detected
  – this doesn’t mean there is no systematic error
  – it does mean that the systematic error, if it exists, is not detectable (e.g. not observable due to larger random errors)
  – it is not possible to prove a null hypothesis beyond any doubt

Statistical Tests – Example from Research This Week
• Goal of Work: be able to consistently use a high resolution mass spectrometer to measure mass with an error of less than 5 ppm (limit set for publication in several journals)
• Measurement is challenging and could be subject to poor data treatment (e.g.
selection of “good” vs. “bad” data)
• Does any measurement within the 5 ppm limit meet the requirement?
• No. We couldn’t just pick 1 out of 4 repeated measurements that meets the standard. We want to be 95+% certain that the true measured value is within the 5 ppm limit
• So we need to use statistics to set rules for meeting the limit
• In this case (unlike the tests in this class), a measured value is acceptable if the furthest 90% limit and the closest 95% limit are both within the 5 ppm limit
Example compound: expected mass = 809.4587 amu
To meet 5 ppm limit, meas. mass = 809.4547 to 809.4628
Measured Mass = 809.4569 amu

Statistical Tests – Example from Research This Week
• Graphical Explanation of Mass Measurement
  – multiple mass measurements made, giving a mean value +/- 90% and 95% CIs
  – not only the mean but also the 90%/95% limits need to be within the limit
  – in example, >5% chance of error
[Figure: expected distribution (based on SD) around the mean measured mass, compared with the expected mass (from mass of each atom) and the + and – 5 ppm range; the 90% high limit is out of range]

Statistical Tests – t Tests
• Case 1 – used to determine if there is a significant bias by measuring a test standard and determining if there is a significant difference between the known and measured concentration
• Case 2 – used to determine if there is a significant difference between two methods (or samples) by measuring one sample multiple times by each method (or each sample multiple times)
• Case 3 – used to determine if there is a significant difference between two methods (or sample sets) by measuring multiple samples once by each method (or each sample in each set once)

Case 1 t test
• Methylmannopyranoside (MMP) example
• Added as an internal standard at 5 ppm
• Analysis will tell if sample causes a bias compared to standard

Case 2 t test Example
• A winemaker found a barrel of wine that was labeled as a merlot, but was suspected of being part of a chardonnay wine batch and was obviously mislabeled.
To see if it was part of the chardonnay batch, the mislabeled barrel wine and the chardonnay batch were analyzed for alcohol content. The results were as follows:
  – Mislabeled wine: n = 6, mean = 12.61%, S = 0.52%
  – Chardonnay wine: n = 4, mean = 12.53%, S = 0.48%
• Determine if there is a statistically significant difference in the ethanol content.

Case 3 t Test Example
• Case 3 t Test used when multiple samples are analyzed by two different methods (only once by each method)
• Useful for establishing if there is a constant systematic error
• Example: Cl⁻ in Ohio rainwater measured by Dixon and PNL (14 samples)

Case 3 t Test Example – Data Set and Calculations
Conc. of Cl⁻ in Rainwater (Units = µM)
Step 1 – Calculate Difference

Sample #   Dixon Cl⁻   PNL Cl⁻   Difference
1          9.9         17.0      7.1
2          2.3         11.0      8.7
3          23.8        28.0      4.2
4          8.0         13.0      5.0
5          1.7         7.9       6.2
6          2.3         11.0      8.7
7          1.9         9.9       8.0
8          4.2         11.0      6.8
9          3.2         13.0      9.8
10         3.9         10.0      6.1
11         2.7         9.7       7.0
12         3.8         8.2       4.4
13         2.4         10.0      7.6
14         2.2         11.0      8.8

Step 2 – Calculate mean and standard deviation in differences
ave d = (7.1 + 8.7 + ...)/14
ave d = 7.49
Sd = 2.44
Step 3 – Calculate t value:
$t_{Calc} = \dfrac{\bar{d}}{S_d}\sqrt{n}$
tCalc = 11.5

Case 3 t Test Example – Rest of Calculations
• Step 4 – look up tTable
  – (t(95%, 13 degrees of freedom) = 2.17)
• Step 5 – Compare tCalc with tTable, draw conclusion
  – tCalc >> tTable so difference is significant

t-Tests
• Note: These (Case 2 and 3) can be applied to two different scenarios:
  – samples (e.g. sample A and sample B, do they have the same % Ca?)
  – methods (analysis method A vs. analysis method B)

F-Test
• Similar methodology to t tests, but used to compare standard deviations between two methods to determine if there is a statistically significant difference in precision between the two methods (or variability between two sample sets)
$F_{Calc} = \dfrac{S_1^2}{S_2^2}$, where S1 > S2
As with t tests, if FCalc > FTable, difference is statistically significant
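As a rough illustration of the F test, the sketch below applies it to the summary statistics from the wine example and then computes a Case 2 t value. The pooled-standard-deviation form of the Case 2 t test is not written out in these slides, so it is an assumption here, and it is appropriate only when the F test shows no significant difference in standard deviations. Variable names are illustrative, and the table values (FTable, tTable) still have to be looked up separately.

```python
from math import sqrt

# Summary statistics from the wine example (percent alcohol)
n_a, mean_a, s_a = 6, 12.61, 0.52   # mislabeled barrel
n_b, mean_b, s_b = 4, 12.53, 0.48   # chardonnay batch

# F test: the larger standard deviation goes in the numerator (S1 > S2),
# so F_calc is always >= 1.
(s_big, n_big), (s_small, n_small) = sorted(
    [(s_a, n_a), (s_b, n_b)], reverse=True
)
f_calc = s_big**2 / s_small**2
df_numerator = n_big - 1       # degrees of freedom for S1
df_denominator = n_small - 1   # degrees of freedom for S2

# Case 2 t test (assumed pooled form, valid if the standard deviations
# are not significantly different).
s_pooled = sqrt(
    ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
)
t_calc = abs(mean_a - mean_b) / s_pooled * sqrt(n_a * n_b / (n_a + n_b))
df_t = n_a + n_b - 2

print(f"F_calc = {f_calc:.2f} (df = {df_numerator}, {df_denominator})")
print(f"s_pooled = {s_pooled:.2f}%, t_calc = {t_calc:.2f} (df = {df_t})")
```

With these inputs FCalc is about 1.17 and tCalc is about 0.25. Each is then compared with its table value at the chosen confidence level, as described above.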