Systematic Error

Illustration of Bias (figure)

Sources of Systematic Errors
– Instrument errors
– Method errors
– Personal errors: prejudice, a preconceived notion of the "true" value, and number bias (a preference for 0 and 5, for small digits over large, and for even over odd)

Effects of Systematic Errors
– Constant errors become more serious as the size of the measurement gets smaller.
– Proportional errors arise from interfering contaminants: if the amount of contaminant becomes larger, the signal becomes larger.

Detection of Systematic and Personal Errors
– Calibration
– Care and self-discipline: instrument readings, notebook entries, calculations, physical disabilities (e.g., color blindness)

Bias
– Difficult to detect
– Analyze standard samples
– Do an independent analysis
– Determine a blank
– Vary the sample size

Applying Statistics to Data Evaluation
– Is a suspect result a gross error or part of the population?
– Define the confidence interval.
– Find the number of replicates necessary to ensure that the mean falls within a predetermined interval.
– What is the probability that an experimental mean and a "true" value, or two experimental means, are different?
– Calibrate.

Gross Errors: The Q-test for Rejecting Outliers

Q_{exp} = \frac{x_q - x_n}{x_q - x_1} = \frac{d}{w} = \frac{\text{questionable result} - \text{nearest neighbor}}{\text{range}}

The Q-test: An Example
A calcite sample yields the following data for the determination of calcium as CaO: 55.95, 56.00, 56.04, 56.08, and 56.23. Should we reject 56.23?

Q_{exp} = \frac{56.23 - 56.08}{56.23 - 55.95} = 0.54

The Q-test: The Q-Table (5-1)

                          Q_crit (confidence level)
Number of Observations    90%      95%      99%
3                         0.941    0.970    0.994
4                         0.765    0.829    0.926
5                         0.642    0.710    0.821
6                         0.560    0.625    0.740
7                         0.507    0.568    0.680

The criterion: if Q_exp > Q_crit, reject the suspect point; if Q_exp < Q_crit, retain it. Here Q_exp = 0.54 and Q_crit = 0.64 (90% level, 5 observations), so we retain 56.23.

Can We Reject Data?
– Blind application of statistical tests is no better than doing nothing. Use good judgement based on experience.
– If you know that something went wrong with a sample and the sample produces an outlier, then rejection may be warranted.
– Be cautious about rejecting data for any reason.

Recommendations
– Keep good records and examine the data carefully.
– If possible, estimate the precision of the method.
– Repeat the analysis if time and sample are available, and compare with the first data.
– If that is not feasible, apply the Q-test (sketched in code after these recommendations).
– If the Q-test indicates retention, consider reporting the median. The median allows inclusion of all of the data without undue influence from the outlier, and the median of a set of 3 measurements from a normal distribution gives a better estimate than the mean of the remaining 2 values after an outlier is rejected.
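As a sketch only (the helper function and its name are not from the slides), the Q-test applied to the CaO data above might look like this, using the 90% column of Table 5-1:

```python
# Q-test sketch for the CaO example above (illustrative helper, not from the text).
Q_CRIT_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507}  # 90% column of Table 5-1

def q_test(data, suspect):
    """Return (Q_exp, reject?) for the suspect value at the 90% confidence level."""
    data = sorted(data)
    # The suspect value must be an extreme; its nearest neighbor is the adjacent point.
    if suspect == data[-1]:
        neighbor = data[-2]
    elif suspect == data[0]:
        neighbor = data[1]
    else:
        raise ValueError("suspect value must be the smallest or largest point")
    q_exp = abs(suspect - neighbor) / (data[-1] - data[0])   # gap / range
    return q_exp, q_exp > Q_CRIT_90[len(data)]

cao = [55.95, 56.00, 56.04, 56.08, 56.23]
q_exp, reject = q_test(cao, 56.23)
print(f"Q_exp = {q_exp:.2f}, reject = {reject}")   # Q_exp = 0.54, reject = False -> retain 56.23
```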
Confidence Limits and Intervals
Confidence limits are limits around an experimentally determined mean within which the true mean lies with a given degree of probability. The confidence interval is the interval around the mean defined by the confidence limits.

Confidence Limits if s Is a Good Estimate of \sigma

CL for \mu = x \pm z\sigma                         (single measurement)
CL for \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}   (mean \bar{x} of N measurements)

(Figures: the 50%, 80%, 90%, 95%, and 99% confidence limits illustrated as areas under the normal distribution.)

Confidence Limits if s Is Not a Good Estimate of \sigma
Use Student's t (analogous to z):

CL for \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}        (mean \bar{x} of N measurements)

Values of Student's t

                        Probability Level
Degrees of Freedom    90%     95%     99%     99.8%
1                     6.31    12.7    63.7    318
2                     2.92    4.30    9.92    22.3
3                     2.35    3.18    5.84    10.2
4                     2.13    2.78    4.60    7.17
5                     2.02    2.57    4.03    5.89
∞ (z)                 1.64    1.96    2.58    3.09

Finding the Confidence Interval: An Example
Determination of the alcohol content in blood gives the following data: % C2H5OH: 0.084, 0.089, and 0.079.
(a) If the precision of the method is unknown, find the 95% confidence limits of the mean.
(b) Perform the same calculation given that the standard deviation of the method is known to be \sigma = 0.0050% C2H5OH. (How could we determine \sigma?)

(a) \sigma unknown (use t):
\bar{x} = 0.084% C2H5OH, s = 0.0050% C2H5OH
95% CL = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.084 \pm \frac{(4.30)(0.0050)}{\sqrt{3}} = 0.084 \pm 0.012% C2H5OH

(b) \sigma = 0.0050% C2H5OH (use z):
\bar{x} = 0.084% C2H5OH
95% CL = \bar{x} \pm \frac{z\sigma}{\sqrt{N}} = 0.084 \pm \frac{(1.96)(0.0050)}{\sqrt{3}} = 0.084 \pm 0.006% C2H5OH

Comparing a Mean to the True Value: The Null Hypothesis
– The null hypothesis assumes that the two measurements are the same; any numerical difference is assumed to be due to random error.
– If the observed difference is greater than or equal to the difference that would occur only 5% of the time, the null hypothesis is rejected and the difference is judged significant.

The Critical Value
Rearrange the equation for the confidence interval:

\bar{x} - \mu = \pm \frac{ts}{\sqrt{N}}

The difference \bar{x} - \mu is compared to the critical value \pm ts/\sqrt{N} at the desired probability level. If |\bar{x} - \mu| is greater than the critical value, the null hypothesis is rejected.

An Example: The Determination of Sulfur in Kerosenes
A known sample containing 0.123% sulfur was analyzed, and the results for four samples were 0.112, 0.118, 0.115, and 0.119 %S. Is there bias in the method? Let's do a spreadsheet (a code sketch of the same test follows the two-means comparison below).

The Spreadsheet (5% level)

Data (%S): 0.112, 0.118, 0.115, 0.119
Mean = 0.116              Std. dev. = 0.0032
True value = 0.123        Difference = -0.007
t(95%, 3 d.f.) = 3.18     ts/\sqrt{N} = 0.0050

If we wish to be wrong no more than 5% of the time, we must reject the null hypothesis: there is systematic error.

What About the 1% Level?

t(99%, 3 d.f.) = 5.84     ts/\sqrt{N} = 0.0092

If we wish to be wrong no more than 1% of the time, we must accept the null hypothesis: there is no systematic error.

Comparing Two Experimental Means
The difference between the two means is compared to the critical value:

\bar{x}_1 - \bar{x}_2 = \pm t s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}},   with d.f. = N_1 + N_2 - 2
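Returning to the sulfur-in-kerosene example, here is a minimal sketch of the one-sample test against the known value; the structure and variable names are illustrative, and the t values come from the Student's t table above.

```python
import statistics

# One-sample test of the sulfur-in-kerosene data against the known value (sketch).
data = [0.112, 0.118, 0.115, 0.119]       # %S found in four samples
true_value = 0.123                        # known %S
t_values = {"95%": 3.18, "99%": 5.84}     # Student's t, 3 degrees of freedom (table above)

mean = statistics.mean(data)              # 0.116
s = statistics.stdev(data)                # 0.0032
N = len(data)

for level, t in t_values.items():
    critical = t * s / N ** 0.5           # ts / sqrt(N)
    reject = abs(mean - true_value) > critical
    print(f"{level}: difference = {mean - true_value:+.4f}, "
          f"critical value = {critical:.4f}, reject null hypothesis: {reject}")
# 95%: critical value ~ 0.0050 -> reject (systematic error)
# 99%: critical value ~ 0.0092 -> accept (no systematic error demonstrated)
```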
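And a sketch of the two-sample comparison just described, using the pooled standard deviation; the data sets and function names are hypothetical, chosen only to illustrate the formula.

```python
import statistics

def pooled_std(a, b):
    """Pooled standard deviation of two data sets (N1 + N2 - 2 degrees of freedom)."""
    ss_a = sum((x - statistics.mean(a)) ** 2 for x in a)
    ss_b = sum((x - statistics.mean(b)) ** 2 for x in b)
    return ((ss_a + ss_b) / (len(a) + len(b) - 2)) ** 0.5

def means_differ(a, b, t):
    """True if |x1_bar - x2_bar| exceeds t * s_pooled * sqrt((N1 + N2) / (N1 * N2))."""
    n1, n2 = len(a), len(b)
    critical = t * pooled_std(a, b) * ((n1 + n2) / (n1 * n2)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) > critical

# Hypothetical pair of analyses; t is chosen for N1 + N2 - 2 = 5 degrees of freedom.
set_1 = [0.112, 0.118, 0.115, 0.119]
set_2 = [0.120, 0.123, 0.121]
print(means_differ(set_1, set_2, t=2.57))   # t(95%, 5 d.f.) from the table above
```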
Least-Squares for Analyzing Linear Calibrations: y = mx + b
– Least-squares assumes that there is relatively little error in the x measurement.
– The mathematics of the derivation minimizes the sum of the squares of the deviations (the residuals) of the points from the best line in the y direction only.
– From calculus: take the partial derivatives of the equation for the sum of squares with respect to m and b, set them equal to zero, and solve for the variables.

The Intermediate Equations (see pp. 161-2)

S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{N}
S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{N}
S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N}

where \bar{x} = \frac{\sum x_i}{N} and \bar{y} = \frac{\sum y_i}{N}

The Results
1. Slope: m = \frac{S_{xy}}{S_{xx}}
2. Intercept: b = \bar{y} - m\bar{x}
3. The standard deviation about regression (the standard error of the estimate):
   s_r = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}},   where N - 2 = d.f.
4. The standard deviation of the slope: s_m = \sqrt{\frac{s_r^2}{S_{xx}}}

The Standard Deviation about Regression
– Analogous to the standard deviation: a measure of the scatter of the points about the line.
– Its precision is similar to that of the individual data.

s_r = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} = \sqrt{\frac{\sum (y_i - y_{line})^2}{N - 2}} = \sqrt{\frac{\sum [y_i - (mx_i + b)]^2}{N - 2}}

More Results
5. The standard deviation of the intercept:
   s_b = s_r \sqrt{\frac{\sum x_i^2}{N \sum x_i^2 - (\sum x_i)^2}}
6. The standard deviation of results from the calibration curve:
   s_c = \frac{s_r}{m} \sqrt{\frac{1}{M} + \frac{1}{N} + \frac{(\bar{y}_c - \bar{y})^2}{m^2 S_{xx}}}
   where \bar{y}_c = \frac{1}{M} \sum_{i=1}^{M} y_i and M is the number of replicate measurements of the unknown.

(These equations are collected in the code sketch after the assignment below.)

Assignment 2
7-2, 7-4, 7-6, 7-11, 7-16, 7-19
SS p. 164
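A sketch that collects the least-squares equations above into one routine; the calibration data and function names are hypothetical, and only the formulas for m, b, s_r, s_m, s_b, and s_c come from the slides.

```python
import math

def linear_calibration(x, y):
    """Least-squares fit y = mx + b with the uncertainty estimates defined above."""
    N = len(x)
    xbar, ybar = sum(x) / N, sum(y) / N
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    Syy = sum((yi - ybar) ** 2 for yi in y)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    m = Sxy / Sxx                                    # slope
    b = ybar - m * xbar                              # intercept
    sr = math.sqrt((Syy - m ** 2 * Sxx) / (N - 2))   # std. dev. about regression
    sm = math.sqrt(sr ** 2 / Sxx)                    # std. dev. of the slope
    sum_x2 = sum(xi ** 2 for xi in x)
    sb = sr * math.sqrt(sum_x2 / (N * sum_x2 - sum(x) ** 2))   # std. dev. of the intercept
    return m, b, sr, sm, sb, (N, xbar, ybar, Sxx)

def result_std_dev(m, sr, fit_info, yc_mean, M):
    """Std. dev. of a result read off the curve from the mean of M replicate measurements."""
    N, xbar, ybar, Sxx = fit_info
    return (sr / m) * math.sqrt(1 / M + 1 / N + (yc_mean - ybar) ** 2 / (m ** 2 * Sxx))

# Hypothetical calibration data (concentration vs. signal).
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [0.04, 0.96, 2.04, 3.01, 3.98]
m, b, sr, sm, sb, info = linear_calibration(x, y)
sc = result_std_dev(m, sr, info, yc_mean=2.50, M=3)   # mean of 3 replicate readings of an unknown
print(f"m = {m:.4f}, b = {b:.4f}, sr = {sr:.4f}, sm = {sm:.4f}, sb = {sb:.4f}, sc = {sc:.4f}")
```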