Chapter 4 Statistics

Standard Deviation

Sample standard deviation (for use with small samples, n < ~25):
$s = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}}$

Population standard deviation (for use with large samples, n > 25):
$\sigma = \sqrt{\dfrac{\sum_i (x_i - \mu)^2}{N}}$

μ = population mean. In the absence of systematic error, the population mean approaches the true value of the measured quantity.

Example
The following results were obtained in the replicate analysis of a blood sample for its lead content: 0.752, 0.756, 0.752, 0.760 ppm lead. Calculate the mean and standard deviation for the data set.

$\bar{x} = 0.755$ ppm
$s = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{(-0.003)^2 + (0.001)^2 + (-0.003)^2 + (0.005)^2}{4 - 1}} = 0.0038$ ppm

You'd report the amount of lead in this sample of blood as 0.755 ± 0.0038 ppm.

Excel® Demo

Distributions of Experimental Data
We find that the distribution of replicate data from most quantitative analytical measurements approaches a Gaussian curve.

Example: Consider the calibration of a pipet.
Replicate data on the calibration of a 10-mL pipet:

Trial  Volume, mL   Trial  Volume, mL   Trial  Volume, mL
1      9.988        18     9.975        35     9.976
2      9.973        19     9.980        36     9.990
3      9.986        20     9.994        37     9.988
4      9.980        21     9.992        38     9.971
5      9.975        22     9.984        39     9.986
6      9.982        23     9.981        40     9.978
7      9.986        24     9.987        41     9.986
8      9.982        25     9.978        42     9.982
9      9.981        26     9.983        43     9.977
10     9.990        27     9.982        44     9.977
11     9.980        28     9.991        45     9.986
12     9.989        29     9.981        46     9.978
13     9.978        30     9.969        47     9.983
14     9.971        31     9.985        48     9.980
15     9.982        32     9.977        49     9.983
16     9.983        33     9.976        50     9.979
17     9.988        34     9.983

Mean = 9.982 mL; median = 9.982 mL; spread (largest minus smallest value) = 0.025 mL; standard deviation = 0.0056 mL

Frequency distribution
Volume range, mL   Number in range   % in range
9.969 to 9.971     3                 6
9.972 to 9.974     1                 2
9.975 to 9.977     7                 14
9.978 to 9.980     9                 18
9.981 to 9.983     13                26
9.984 to 9.986     7                 14
9.987 to 9.989     5                 10
9.990 to 9.992     4                 8
9.993 to 9.995     1                 2

A Gaussian curve has the form
$y = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$

[Figure: histogram of the 50 pipet calibrations, number of measurements vs. range of measured values (9.965 to 9.995 mL); average = 9.982 mL, std. dev. = ±0.0056 mL]
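The numbers in the two examples above can be reproduced with a short sketch using Python's standard-library `statistics` module, where `stdev` divides by n − 1 (the sample s) and `pstdev` divides by N (the population σ). The helper names `summarize`, `frequency_distribution` and `gaussian` are illustrative, not part of the original material, and the binning assumes the 0.003-mL ranges of the frequency table.

```python
import math
import statistics

# Replicate lead results for one blood sample, ppm
lead = [0.752, 0.756, 0.752, 0.760]

# 50 replicate calibrations of a 10-mL pipet, mL (trials 1 to 50 in order)
pipet = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

def summarize(data):
    """Mean, median, spread (max - min), sample s and population sigma."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "spread": max(data) - min(data),
        "s": statistics.stdev(data),       # sum of squared deviations / (n - 1)
        "sigma": statistics.pstdev(data),  # sum of squared deviations / N
    }

def frequency_distribution(data, start=9.969, width=0.003, n_bins=9):
    """Count values in consecutive ranges of `width` mL beginning at `start`.

    Values are converted to integer thousandths of a mL first, so the
    range boundaries are not blurred by floating-point error."""
    lo, w = round(start * 1000), round(width * 1000)
    counts = [0] * n_bins
    for v in data:
        idx = (round(v * 1000) - lo) // w
        if not 0 <= idx < n_bins:
            raise ValueError(f"{v} lies outside the tabulated ranges")
        counts[idx] += 1
    return counts

def gaussian(x, mu, sigma):
    """y = exp(-(x - mu)**2 / (2 sigma**2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

lead_stats = summarize(lead)
print(round(lead_stats["mean"], 3), round(lead_stats["s"], 4))  # 0.755 0.0038

pipet_stats = summarize(pipet)
print({k: round(v, 4) for k, v in pipet_stats.items()})
counts = frequency_distribution(pipet)
print(counts, [100 * c / len(pipet) for c in counts])  # counts, then % in range
```

Working in integer thousandths for the binning is a design choice of this sketch; comparing raw floats against edges such as 9.971 can misplace values that sit exactly on a boundary.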
4-2 Confidence Intervals

For small data sets:
Confidence interval for μ: $\mu = \bar{x} \pm \dfrac{ts}{\sqrt{n}}$

μ is the true mean, and t is Student's t for n − 1 degrees of freedom at the chosen confidence level. The equation expresses that the "true mean" will lie in the calculated range at a given confidence.

Example
Using the blood-lead results from the standard deviation example (0.752, 0.756, 0.752, 0.760 ppm lead; $\bar{x} \pm s$ = 0.755 ± 0.0038 ppm), find (a) the 50% CL and (b) the 90% CL.

$\text{CL for } \mu = \bar{x} \pm \dfrac{ts}{\sqrt{n}} = 0.755 \pm \dfrac{t(0.0038)}{\sqrt{4}}$

(a) 50% CL, t = 0.765 (3 degrees of freedom), using the unrounded s = 0.00383 ppm:
$\text{50\% CL for } \mu = 0.755 \pm \dfrac{0.765(0.00383)}{\sqrt{4}} = 0.755 \pm 0.00146$ ppm

(b) 90% CL, t = 2.353 (3 degrees of freedom):
$\text{90\% CL for } \mu = 0.755 \pm \dfrac{2.353(0.00383)}{\sqrt{4}} = 0.755 \pm 0.0045$ ppm

[Figure: the 50% and 90% confidence intervals drawn on a concentration axis from 0.750 to 0.760 ppm]

There is a 50% chance that the true mean, μ, lies in the range 0.755 ± 0.001 ppm (or from 0.754 to 0.756 ppm). Likewise, these calculations mean that there is a 90% chance that the true mean, μ, lies in the range 0.755 ± 0.005 ppm (or from 0.750 to 0.760 ppm).

Confidence limits and uncertainty
Suppose we measure the volume of a vessel five times and observe values of 6.372, 6.375, 6.374, 6.377, and 6.375 mL. We find average = 6.3746 mL and s = 0.0018 mL. Use a 90% CL to estimate the uncertainty!

Experimental Uncertainty
Well, a 90% CI means that there is a 90% chance that the true volume is within the range:
$\text{CL for } \mu = \bar{x} \pm \dfrac{ts}{\sqrt{n}} = 6.3746 \pm \dfrac{t(0.0018)}{\sqrt{5}}$

For n − 1 = 4 degrees of freedom at 90% confidence, t = 2.132:
$\text{90\% CL for } \mu = 6.3746 \pm \dfrac{2.132(0.0018)}{\sqrt{5}} = 6.3746 \pm 0.0017$ mL

Comparison of Means with Student's t
A t test is used to compare one set of measurements with another to decide whether or not they are "the same." Two common uses:
- Comparing a measured result with a "true" value
- Comparing two experimental means

Comparing a mean with a true value
Good for detecting systematic (determinate) errors. Uses Student's t values:
$t_\text{calculated} = \dfrac{|\bar{x} - x_t|}{s}\sqrt{n}$
Compare to t_table:
If t_calc > t_table, the difference is significant.
If t_calc < t_table, the difference is NOT significant.

Example
A new procedure for the rapid determination of sulfur in kerosene was tested on a KNOWN sample (μ or x_t = 0.123% S). The results were: % S = 0.112, 0.118, 0.115, and 0.119. Is there a difference at the 95% confidence level?
$t_\text{calculated} = \dfrac{|0.116 - 0.123|}{0.003162}\sqrt{4} = 4.43$
t_table = 3.182 (95% confidence, 3 degrees of freedom)
Are they significantly different? Since t_calc (4.43) > t_table (3.182), yes: the difference is significant at the 95% confidence level.

Strutt's Story
At the turn of the last century it was generally thought that dry air contained about one-fifth oxygen and four-fifths nitrogen. One man, John William Strutt (Lord Rayleigh), wanted to confirm this.
1st sample: dry air. Red-hot copper was added; Cu reacts with oxygen to make solid copper oxide (CuO), leaving air without oxygen.
2nd sample: an equal volume of nitrogen. Pure nitrogen can be generated by decomposition of N₂O (nitrous oxide) or NO (nitric oxide).
He reasoned that the amounts would be the same. Is there a difference at the 95% confidence level?

Comparison of two means
$t_\text{calculated} = \dfrac{|\bar{x}_1 - \bar{x}_2|}{s_\text{pooled}}\sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$

$s_\text{pooled} = \sqrt{\dfrac{\sum_{\text{set 1}}(x_i - \bar{x}_1)^2 + \sum_{\text{set 2}}(x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$

For Rayleigh's data, $\bar{x}_1$ = 2.31011 (nitrogen from air, s₁ = 0.000143, n₁ = 7) and $\bar{x}_2$ = 2.29947 (chemically generated nitrogen, s₂ = 0.00138, n₂ = 8):

$s_\text{pooled} = \sqrt{\dfrac{(0.000143)^2(7 - 1) + (0.00138)^2(8 - 1)}{7 + 8 - 2}} = 0.00102$

$t_\text{calculated} = \dfrac{|2.31011 - 2.29947|}{0.00102}\sqrt{\dfrac{7 \cdot 8}{7 + 8}} = 20.2$

If t_calc > t_table, the difference is significant. Here t_table = 2.160 (95% confidence, 7 + 8 − 2 = 13 degrees of freedom), and 20.2 is far larger.

Why the difference?
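The confidence-interval and t-test formulas in this part can be sketched with Python's standard library alone. Because `statistics` provides no Student's t quantile function, the t values are passed in by hand from a t table: 0.765, 2.353 and 3.182 are the values quoted in the slides, while 2.132 (90%, 4 degrees of freedom) and 2.160 (95%, 13 degrees of freedom) are standard two-tailed table values supplied here. The function names are hypothetical.

```python
import math
import statistics

def confidence_interval(data, t):
    """Return (mean, half_width) for mu = mean ± t*s/sqrt(n).

    `t` must be Student's t for n - 1 degrees of freedom at the chosen
    confidence level, read from a t table by the caller."""
    n = len(data)
    return statistics.mean(data), t * statistics.stdev(data) / math.sqrt(n)

def t_vs_known(data, known):
    """t_calc = |mean - known| / s * sqrt(n): compare a mean with a true value."""
    n = len(data)
    return abs(statistics.mean(data) - known) / statistics.stdev(data) * math.sqrt(n)

def pooled_sd(s1, n1, s2, n2):
    """s_pooled = sqrt((s1**2 (n1 - 1) + s2**2 (n2 - 1)) / (n1 + n2 - 2))."""
    return math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))

def t_two_means(m1, s1, n1, m2, s2, n2):
    """t_calc = |m1 - m2| / s_pooled * sqrt(n1 n2 / (n1 + n2))."""
    return abs(m1 - m2) / pooled_sd(s1, n1, s2, n2) * math.sqrt(n1 * n2 / (n1 + n2))

# Blood lead, 3 degrees of freedom: t = 0.765 at 50%, 2.353 at 90%
lead = [0.752, 0.756, 0.752, 0.760]
mean, hw50 = confidence_interval(lead, 0.765)
_, hw90 = confidence_interval(lead, 2.353)
print(round(mean, 3), round(hw50, 5), round(hw90, 4))  # 0.755 0.00146 0.0045

# Vessel volume, 4 degrees of freedom: t = 2.132 at 90%
vessel = [6.372, 6.375, 6.374, 6.377, 6.375]
v_mean, v_hw = confidence_interval(vessel, 2.132)
print(round(v_mean, 4), round(v_hw, 4))  # 6.3746 0.0017

# Sulfur in kerosene vs. known 0.123% S; t_table = 3.182 (95%, 3 df)
t_s = t_vs_known([0.112, 0.118, 0.115, 0.119], 0.123)
print(round(t_s, 2), t_s > 3.182)  # 4.43 True

# Rayleigh's two nitrogen samples; t_table = 2.160 (95%, 13 df)
t_r = t_two_means(2.31011, 0.000143, 7, 2.29947, 0.00138, 8)
print(round(t_r, 1), t_r > 2.160)  # 20.2 True
```

`t_two_means` takes summary statistics rather than raw data because only the means, standard deviations and counts are given for Rayleigh's samples.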
In 1904, Lord Rayleigh was awarded the Nobel Prize for discovering argon.

Comparison of Standard Deviations between Data Sets
The F test may be used to provide insight into:
- Whether there is a difference in the precision of two methods (this may warrant a different calculation to compare means!)
- Whether method A is more precise than method B

F test (comparison of standard deviations)
$F_\text{calculated} = \dfrac{s_1^2}{s_2^2}$
We always put the larger standard deviation in the numerator, so that F ≥ 1. If F_calculated > F_table, the difference is significant at the 95% CL.

Example
A well-developed method for protein concentration determination yields a standard deviation of 0.25 M over many hundreds of replicates.
A) Dr. Skeels develops a rapid method for the determination of protein concentration that yields a standard deviation of 0.15 M (for 12 degrees of freedom).
B) Dr. Marano's method yields a standard deviation of 0.11 M for the same number of degrees of freedom.
Is Dr. Skeels' method more precise than the standard, is Dr. Marano's, or neither?

Throwing Out "Bad" Data
For an analysis of alcohol content in wine, Dr. Skeels finds the following: 12.53, 12.56, 12.47, 12.67, and 12.48%.

Q test for bad data
$Q_\text{calculated} = \dfrac{\text{gap}}{\text{range}}$
The gap is the difference between the suspect value and its nearest neighbor; the range is the difference between the largest and smallest values. Compare to Q_critical: if Q_calc > Q_critical, the suspect value can be rejected.

Sorted data: 12.47, 12.48, 12.53, 12.56, 12.67. The suspect value is 12.67, so gap = 12.67 − 12.56 = 0.11 and range = 12.67 − 12.47 = 0.20.
$Q_\text{calculated} = \dfrac{0.11}{0.20} = 0.55$
Q_table = 0.64 (5 observations, 90% confidence). Since 0.55 < 0.64, the value 12.67 cannot be rejected and should be retained.
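A minimal standard-library sketch of the F test and the Q test for the two examples above. `F_TABLE = 2.30` is an assumption of this sketch: it is the 95% F-table value for ∞ (numerator) and 12 (denominator) degrees of freedom, treating the standard method's "many hundreds" of replicates as ∞, so check it against the table you use. `Q_TABLE = 0.64` is the value given in the slides. Choosing the suspect as whichever end value has the larger gap is also an assumption; the helper names are hypothetical.

```python
# F test and Dixon Q test, standard library only.
F_TABLE = 2.30  # assumed: 95% F for df = inf (numerator), 12 (denominator)
Q_TABLE = 0.64  # critical Q for 5 observations (90% confidence), as in the slides

def f_calc(sd_a, sd_b):
    """F = s_larger**2 / s_smaller**2, so F is always >= 1."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return larger**2 / smaller**2

def q_test(data):
    """Return (suspect, Q) with Q = gap / range.

    The suspect is taken to be whichever end value (lowest or highest)
    is farther from its nearest neighbor. Needs at least 3 values that
    are not all equal."""
    x = sorted(data)
    spread = x[-1] - x[0]
    low_gap = x[1] - x[0]
    high_gap = x[-1] - x[-2]
    if high_gap >= low_gap:
        return x[-1], high_gap / spread
    return x[0], low_gap / spread

standard_sd = 0.25
for name, sd in [("Skeels", 0.15), ("Marano", 0.11)]:
    f = f_calc(standard_sd, sd)
    print(name, round(f, 2), "significant" if f > F_TABLE else "not significant")

wine = [12.53, 12.56, 12.47, 12.67, 12.48]
suspect, q = q_test(wine)
print(suspect, round(q, 2), "reject" if q > Q_TABLE else "retain")  # 12.67 0.55 retain
```

With these inputs F = 2.78 for Dr. Skeels' method and F = 5.17 for Dr. Marano's; if the assumed F_TABLE is right, both exceed it, which would indicate that both rapid methods are more precise than the standard.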