Version 2012 Updated on 030212 Copyright © All rights reserved Dong-Sun Lee, Prof., Ph.D. Chemistry, Seoul Women's University

Chapter 7 Statistical Data Treatment and Evaluation

America's most beloved illustrator Norman Rockwell - Jury Holdout
Artist Name: NORMAN ROCKWELL
Title: "JURY ROOM" or sometimes "THE HOLDOUT"
Print size: 13.5" X 11"
Release Date: POST Cover February 14, 1959
"This lively cover provided Rockwell with the opportunity to do character studies of eleven good men and true, plus that of one determined woman who is not about to be shaken by arguments that she finds unconvincing. Rockwell has attacked with relish the problem of portraying the debris produced during the course of the marathon session leading up to the moment shown here."
Not guilty or guilty? Norman Rockwell's Saturday Evening Post cover The Holdout, from Feb. 14, 1959. One of the 12 jurors does not agree with the others, who are trying to convince her.
http://www.curtispublishing.com/images/Rockwell/9590214.jpg
http://stores.ebay.com/Norman-Rockwells

Confidence Intervals
In statistical inference, one wishes to estimate population parameters using observed sample data. A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
The common notation for the parameter in question is θ. Often, this parameter is the population mean μ, which is estimated through the sample mean x̄. The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter θ.

Significance level
The probability that a result is outside the confidence interval is often called the significance level. When expressed as a fraction, the significance level is often given the symbol α. 
Confidence level
The confidence level is the probability value 1 – α associated with a confidence interval, where α is the level of significance. It can also be expressed as a percentage, 100(1 – α)%, and is then sometimes called the confidence coefficient. The confidence level (CL) is related to α on a percentage basis by
CL = (1 – α) × 100%

Confidence interval for the mean; accuracy of an analysis
The confidence interval (CI) for the mean is the range of values, centered on the measured mean x̄, within which the true population mean μ is expected to lie with a certain probability. The confidence level (CL) is the probability that the true mean lies within a certain interval. It is often expressed as a percentage (%).

Finding the confidence interval when σ is known or s is a good estimate of σ (s → σ):
Single measurement: CI for μ = x ± zσ
n measurements: CI for μ = x̄ ± zσ/√n

Areas under a Gaussian curve for various values of z:
Confidence Level, % | z
50.0  | 0.67
68.0  | 1.00
80.0  | 1.28
90.0  | 1.64
95.0  | 1.96
95.4  | 2.00
99.0  | 2.58
99.7  | 3.00
99.9  | 3.29

Finding the confidence interval when σ is unknown:
For a single measurement: t = (x – μ)/s
For the mean of n measurements: t = (x̄ – μ)/(s/√n)
where t is a statistical constant that depends both on the confidence level and on the number of measurements involved (DF, degrees of freedom = n – 1).
CI for μ = x̄ ± ts/√n

Values of t for Various Levels of Probability
Degrees of Freedom | 80%  | 90%  | 95%  | 99%  | 99.9%
1   | 3.08 | 6.31 | 12.7 | 63.7 | 637
2   | 1.89 | 2.92 | 4.30 | 9.92 | 31.6
3   | 1.64 | 2.35 | 3.18 | 5.84 | 12.9
4   | 1.53 | 2.13 | 2.78 | 4.60 | 8.61
5   | 1.48 | 2.02 | 2.57 | 4.03 | 6.87
6   | 1.44 | 1.94 | 2.45 | 3.71 | 5.96
7   | 1.42 | 1.90 | 2.36 | 3.50 | 5.41
8   | 1.40 | 1.86 | 2.31 | 3.36 | 5.04
9   | 1.38 | 1.83 | 2.26 | 3.25 | 4.78
10  | 1.37 | 1.81 | 2.23 | 3.17 | 4.59
15  | 1.34 | 1.75 | 2.13 | 2.95 | 4.07
20  | 1.32 | 1.73 | 2.09 | 2.84 | 3.85
40  | 1.30 | 1.68 | 2.02 | 2.70 | 3.55
60  | 1.30 | 1.67 | 2.00 | 2.62 | 3.46
∞   | 1.28 | 1.64 | 1.96 | 2.58 | 3.29

Breath alcohol analyzers. Many states have ruled that a blood alcohol level of 0.1% or greater indicates intoxication.

Ex. 
A chemist obtained the following data for the alcohol content of a sample of blood, % C2H5OH: 0.084, 0.089 and 0.079. Calculate the 95% CI for the mean assuming (a) the three results obtained are the only indication of the precision of the method and (b) from previous experience on hundreds of samples, we know that the standard deviation of the method is s = 0.005% and is a good estimate of σ (s → σ).
Values of t at the 95% confidence level: 4.30 (2 degrees of freedom), 3.18 (3 degrees of freedom).
(a) Σxi = 0.084 + 0.089 + 0.079 = 0.252
Σxi² = (0.084)² + (0.089)² + (0.079)² = 0.021218
x̄ = 0.252/3 = 0.084
s = {Σ(xi – x̄)²/(n – 1)}½ = 0.0050%
95% CI = 0.084 ± (4.30 × 0.0050)/√3 = 0.084 ± 0.012%
(b) 95% CI = x̄ ± zσ/√n = 0.084 ± (1.96 × 0.0050)/√3 = 0.084 ± 0.006%

Significance level
The probability of a false rejection of the null hypothesis in a statistical test; also called the level of significance. The significance level of a test is the probability that the test statistic will reject the null hypothesis when the hypothesis is true. Significance is a property of the distribution of a test statistic, not of any particular draw of the statistic.
In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis. It is used as follows: First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared to the significance level. If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant. Traditionally, experimenters have used either the .05 level (sometimes called the 5% level) or the .01 level (1% level), although the choice of level is largely subjective. 
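As a cross-check, the arithmetic of this example can be reproduced with a short script. This is a sketch using only the Python standard library; the t and z values are taken from the tables above.

```python
import math
import statistics

data = [0.084, 0.089, 0.079]           # % C2H5OH results
n = len(data)
xbar = statistics.mean(data)           # 0.084
s = statistics.stdev(data)             # 0.0050; note stdev uses n - 1 in the denominator

# (a) s from the data is the only precision estimate: use t with n - 1 = 2 DF
t_95 = 4.30                            # 95% CL, 2 DF, from the t table above
half_width_a = t_95 * s / math.sqrt(n)         # ±0.012%

# (b) sigma well known from experience (s -> sigma = 0.0050%): use z
half_width_b = 1.96 * 0.0050 / math.sqrt(n)    # ±0.006%
```

Note that case (b) gives a narrower interval for the same data: when σ is known, the small z value replaces the much larger t value demanded by only 2 degrees of freedom.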
The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the .01 level is more conservative than the .05 level. The Greek letter alpha (α) is sometimes used to indicate the significance level.

Hypothesis testing
Hypothesis testing is a method of inferential statistics. An experimenter starts with a hypothesis about a population parameter called the null hypothesis. Data are then collected and the viability of the null hypothesis is determined in light of the data. If the data are very different from what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is rejected. If the data are not greatly at variance with what would be expected under that assumption, then the null hypothesis is not rejected. Failure to reject the null hypothesis is not the same thing as accepting it.

Null hypothesis
In statistics, a null hypothesis is a hypothesis that is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise. It is a hypothesis that the parameters, or mathematical characteristics, of two or more populations are identical. The null hypothesis is a hypothesis about a population parameter. The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data. Depending on the data, the null hypothesis either will or will not be rejected as a viable possibility.
Consider a researcher interested in whether the time to respond to a tone is affected by the consumption of alcohol. The null hypothesis is that µ1 – µ2 = 0, where µ1 is the mean time to respond after consuming alcohol and µ2 is the mean time to respond otherwise. Thus, the null hypothesis concerns the parameter µ1 – µ2 and states that it equals zero. 
The null hypothesis is often the reverse of what the experimenter actually believes; it is put forward to allow the data to contradict it. In the experiment on the effect of alcohol, the experimenter probably expects alcohol to have a harmful effect. If the experimental data show a sufficiently large effect of alcohol, then the null hypothesis that alcohol has no effect can be rejected. It should be stressed that researchers very frequently put forward a null hypothesis in the hope that they can discredit it. For a second example, consider an educational researcher who designed a new way to teach a particular concept in science, and wanted to test experimentally whether this new method worked better than the existing method. The researcher would design an experiment comparing the two methods. Since the null hypothesis would be that there is no difference between the two methods, the researcher would be hoping to reject the null hypothesis and conclude that the method he or she developed is the better of the two. The symbol H0 is used to indicate the null hypothesis. For the example just given, the null hypothesis would be designated by the following symbols: H0: µ1 – µ2 = 0 or by H0: µ1 = µ2. The null hypothesis is typically a hypothesis of no difference as in this example where it is the hypothesis of no difference between population means. That is why the word "null" in "null hypothesis" is used -- it is the hypothesis of no difference. Despite the "null" in "null hypothesis," there are occasions when the parameter is not hypothesized to be 0. For instance, it is possible for the null hypothesis to be that the difference between population means is a particular value. Or, the null hypothesis could be that the mean SAT score in some population is 600. The null hypothesis would then be stated as: H0: μ = 600. 
Although the null hypotheses discussed so far have all involved the testing of hypotheses about one or more population means, null hypotheses can involve any parameter. An experiment investigating the correlation between job satisfaction and performance on the job would test the null hypothesis that the population correlation (ρ) is 0. Symbolically, H0: ρ = 0.
Some possible null hypotheses are given below:
H0: μ = 0
H0: μ = 10
H0: μ1 – μ2 = 0
H0: π = .5
H0: π1 – π2 = 0
H0: μ1 = μ2 = μ3
H0: ρ1 – ρ2 = 0
When a one-tailed test is conducted, the null hypothesis includes the direction of the effect. A one-tailed test of the difference between means might test the null hypothesis that μ1 – μ2 is greater than 0. If M1 – M2 were much less than 0, then the null hypothesis would be rejected in favor of the alternative hypothesis Ha: μ1 – μ2 < 0.

The p-value (level of significance)
All statistical tests produce a p-value, and this is equal to the probability of obtaining the observed difference, or one more extreme, if the null hypothesis is true. To put it another way: if the null hypothesis is true, the p-value is the probability of obtaining a difference at least as large as that observed due to sampling variation. Consequently, if the p-value is small, the data support the alternative hypothesis; if the p-value is large, the data support the null hypothesis. But how small is 'small' and how large is 'large'? Conventionally (and arbitrarily), a p-value of 0.05 (5%) is generally regarded as sufficiently small to reject the null hypothesis. If the p-value is larger than 0.05, we fail to reject the null hypothesis. The 5% value is called the significance level of the test. Other significance levels that are commonly used are 1% and 0.1%. 
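To make the definition concrete: for a test statistic z that follows the standard normal distribution under H0, the two-tailed p-value is P(|Z| ≥ |z|). A minimal sketch (the function name is my own):

```python
import math

def p_value_two_tailed(z):
    """Two-tailed p-value for a standard-normal test statistic:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

p = p_value_two_tailed(1.96)   # ~0.05: z = 1.96 sits right at the 5% level
```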
Some people use the following terminology:
p-value                | Outcome of test          | Statement
greater than 0.05      | Fail to reject H0        | No evidence to reject H0
between 0.01 and 0.05  | Reject H0 (accept H1)    | Some evidence to reject H0 (therefore accept H1)
between 0.001 and 0.01 | Reject H0 (accept H1)    | Strong evidence to reject H0 (therefore accept H1)
less than 0.001        | Reject H0 (accept H1)    | Very strong evidence to reject H0 (therefore accept H1)

Hypothesis Testing
To explain an observation, a hypothetical model is advanced and is tested experimentally to determine its validity. In statistics, a null hypothesis postulates that two or more observed quantities are the same.

Comparing an experimental mean with a known value
Large sample z test:
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic: z = (x̄ – μ0)/(σ/√n)
3. State the alternative hypothesis, Ha, and determine the rejection region:
For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit
For Ha: μ > μ0, reject H0 if z ≥ zcrit
For Ha: μ < μ0, reject H0 if z ≤ –zcrit

Rejection regions for the 95% confidence level. (a) Two-tailed test for Ha: μ ≠ μ0. Note the critical value of z is 1.96. (b) One-tailed test for Ha: μ > μ0. Here the critical value zcrit is 1.64, so that 95% of the area is to the left of zcrit and 5% of the area is to the right. (c) One-tailed test for Ha: μ < μ0. Here the critical value is again 1.64, so that 5% of the area lies to the left of –zcrit.

Small sample t test:
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic: t = (x̄ – μ0)/(s/√n)
3. State the alternative hypothesis, Ha, and determine the rejection region:
For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ –tcrit
For Ha: μ > μ0, reject H0 if t ≥ tcrit
For Ha: μ < μ0, reject H0 if t ≤ –tcrit

Illustration of systematic error in an analytical method. Curve A is the frequency distribution for the accepted value (μ0) by a method without bias. Curve B illustrates the frequency distribution of results by a method that could have a significant bias (mean μB). 
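The three steps of the small sample t test can be sketched in code. This is a hypothetical illustration: the data set and μ0 are invented, and tcrit comes from the t table earlier in the chapter.

```python
import math
import statistics

def t_statistic(data, mu0):
    """Small-sample test statistic: t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)       # n - 1 in the denominator
    return (xbar - mu0) / (s / math.sqrt(n))

# hypothetical two-tailed test at the 95% level, n = 3, so 2 degrees of freedom
t = t_statistic([0.084, 0.089, 0.079], 0.100)
t_crit = 4.30                        # from the t table, 95% CL, 2 DF
reject_h0 = abs(t) >= t_crit         # True: the mean differs from 0.100
```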
bias = μB – μ0

Comparison of two experimental means
The t test for differences in means
Two sets of data: x̄1 from n1 replicate analyses with standard deviation s1; x̄2 from n2 replicate analyses with standard deviation s2.
The standard error of the mean (sm) is the standard deviation of a set of data divided by the square root of the number of data points in the set:
sm1 = s1/√n1
The variance of each mean: s²m1 = s²1/n1, s²m2 = s²2/n2
The variance of the difference (s²d) between the means: s²d = s²m1 + s²m2, so the standard deviation of the difference is
sd = √(s²1/n1 + s²2/n2)
The pooled (= combined) standard deviation:
spooled = √[{Σ(xi – x̄1)² + Σ(xj – x̄2)² + …}/(n1 + n2 + … – Nsets)]
where Nsets is the number of data sets pooled. With the pooled standard deviation, the standard deviation of the difference becomes
sd = √(s²pooled/n1 + s²pooled/n2) = spooled √{(n1 + n2)/(n1n2)}
t = (x̄1 – x̄2)/[spooled √{(n1 + n2)/(n1n2)}]
If tcalculated > ttable (95%), the difference is significant.

Paired data
H0: μd = Δ0 = 0
t = (d̄ – Δ0)/(sd/√n), where d̄ = Σdi/n

Ex. Glucose in serum (mg/L)
           | Patient 1 | Patient 2 | Patient 3 | Patient 4 | Patient 5 | Patient 6 | mean  | s
Method A   | 1044      | 720       | 845       | 800       | 957       | 650       | 836.0 | 146.5
Method B   | 1028      | 711       | 820       | 795       | 935       | 639       | 821.3 | 142.7
Difference | 16        | 9         | 25        | 5         | 22        | 11        | 14.67 | 7.76
n = 6, Σdi = 16 + 9 + 25 + 5 + 22 + 11 = 88, Σdi² = 1592, d̄ = 14.67
sd = √[{1592 – (88)²/6}/(6 – 1)] = 7.76
t = 14.67/(7.76/√6) = 4.628
DF = n – 1 = 6 – 1 = 5; at CL 95%, tcrit = 2.57
Since t (4.628) > tcrit (2.57), H0 is rejected: Method A and Method B give significantly different results.

Errors in hypothesis testing
A type I error occurs when H0 is rejected although it is actually true; when H0 is a hypothesis of no difference or no effect, this is a false positive. A type II error occurs when H0 is accepted although it is actually false; this is a false negative.

The F test: comparison of precision
The F test is used to compare the precision of two sets of data. It is designed to indicate whether there is a significant difference between two methods based on their standard deviations. F is defined in terms of the variances of the two methods:
F = s²1/s²2 = V1/V2, where s²1 > s²2
There are two different degrees of freedom, one for the numerator and one for the denominator. 
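The paired-data calculation for the glucose example, and an F test on the two methods' precisions, can be verified with a few lines of standard-library Python:

```python
import math
import statistics

method_a = [1044, 720, 845, 800, 957, 650]       # glucose, mg/L
method_b = [1028, 711, 820, 795, 935, 639]
d = [a - b for a, b in zip(method_a, method_b)]  # paired differences

n = len(d)
dbar = statistics.mean(d)                        # 14.67
s_d = statistics.stdev(d)                        # 7.76
t = dbar / (s_d / math.sqrt(n))                  # 4.63 > tcrit = 2.57 (95% CL, 5 DF)

# F test on the precisions: variance ratio, larger variance on top
F = statistics.variance(method_a) / statistics.variance(method_b)
# F is about 1.05, well below the 95% critical value of 5.05 for (5, 5) DF,
# so the two methods do not differ significantly in precision
```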
If the calculated F value exceeds the tabulated F value at the selected confidence level, then there is a significant difference between the variances of the two methods.

Critical Values of F at the 5% Probability Level (95% confidence level)
Degrees of Freedom | Degrees of Freedom (Numerator)
(Denominator)      | 2     | 3     | 4     | 5     | 6     | 10    | 12    | 20    | ∞
2                  | 19.00 | 19.16 | 19.25 | 19.30 | 19.33 | 19.40 | 19.41 | 19.45 | 19.50
3                  | 9.55  | 9.28  | 9.12  | 9.01  | 8.94  | 8.79  | 8.74  | 8.66  | 8.53
4                  | 6.94  | 6.59  | 6.39  | 6.26  | 6.16  | 5.96  | 5.91  | 5.80  | 5.63
5                  | 5.79  | 5.41  | 5.19  | 5.05  | 4.95  | 4.74  | 4.68  | 4.56  | 4.36
6                  | 5.14  | 4.76  | 4.53  | 4.39  | 4.28  | 4.06  | 4.00  | 3.87  | 3.67
10                 | 4.10  | 3.71  | 3.48  | 3.33  | 3.22  | 2.98  | 2.91  | 2.77  | 2.54
12                 | 3.89  | 3.49  | 3.26  | 3.11  | 3.00  | 2.75  | 2.69  | 2.54  | 2.30
20                 | 3.49  | 3.10  | 2.87  | 2.71  | 2.60  | 2.35  | 2.28  | 2.12  | 1.84
∞                  | 3.00  | 2.60  | 2.37  | 2.21  | 2.10  | 1.83  | 1.75  | 1.57  | 1.00

Analysis of Variance (ANOVA)
ANOVA is used to test whether a difference exists in the means of more than two populations. After ANOVA indicates a potential difference, multiple comparison procedures can be used to identify which specific population means differ from the others. In ANOVA procedures, we detect differences in several population means by comparing variances. For comparing I population means μ1, μ2, μ3, …, μI, the null hypothesis H0 is of the form
H0: μ1 = μ2 = μ3 = … = μI
and the alternative hypothesis Ha is
Ha: at least two of the μi's are different.
The populations have differing values of a common characteristic called a factor or sometimes a treatment. The different values of the factor of interest are called levels. The comparisons among the various populations are made by measuring a response for each item sampled. The factor can be considered the independent variable, whereas the response is the dependent variable.
The basic principle of ANOVA is to compare the variations between the different factor levels (groups) with those within factor levels.
Pictorial of the results from the ANOVA study of the determination of calcium by five analysts. Each analyst does the determination in triplicate. 
Analyst is considered a factor, whereas analyst 1, analyst 2, analyst 3, analyst 4, and analyst 5 are levels of the factor.
Pictorial representation of the ANOVA principle. The results of each analyst are considered a group. The triangles represent individual results, and the circles represent the means. Here the variation between the group means is compared with that within groups.

Single-Factor ANOVA
H0: μ1 = μ2 = μ3 = … = μI
Group means x̄1, x̄2, x̄3, …, x̄I with variances s²1, s²2, s²3, …, s²I.
The grand average x̿ is the average of all the data:
x̿ = (n1/N)x̄1 + (n2/N)x̄2 + (n3/N)x̄3 + … + (nI/N)x̄I
where N is the total number of measurements.
1. The sum of the squares due to the factor (SSF):
SSF = n1(x̄1 – x̿)² + n2(x̄2 – x̿)² + n3(x̄3 – x̿)² + … + nI(x̄I – x̿)²
2. The sum of the squares due to error (SSE):
SSE = Σ(x1j – x̄1)² + Σ(x2j – x̄2)² + Σ(x3j – x̄3)² + … + Σ(xIj – x̄I)²
    = (n1 – 1)s²1 + (n2 – 1)s²2 + (n3 – 1)s²3 + … + (nI – 1)s²I
3. The total sum of the squares (SST):
SST = SSF + SSE

ANOVA (analysis of variance) table
Source of Variation            | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS)
Between groups (factor effect) | SSF                 | I – 1                   | MSF = SSF/(I – 1)
Within groups (error)          | SSE                 | N – I                   | MSE = SSE/(N – I)
Total                          | SST                 | N – 1                   |
Note that the degrees of freedom add: (N – 1) = (I – 1) + (N – I).
MSE estimates the error variance σ²E, while MSF estimates the error variance plus the factor-effect variance, σ²E + σ²F. The test statistic is
F = MSF/MSE

Determining which results differ
In the least significant difference (LSD) method, a difference is calculated that is judged to be the smallest difference that is significant. The difference between each pair of means is then compared with the least significant difference to determine which means are different. For an equal number of replicates Ng in each group, the least significant difference is calculated as follows:
LSD = t √(2 × MSE/Ng)
The value of t should have (N – I) degrees of freedom. 
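The SSF/SSE bookkeeping above can be sketched as a small function. The triplicate "analyst" data below are invented purely for illustration; only the structure (several groups of replicate results) matches the calcium study described above.

```python
import statistics

def one_way_anova(groups):
    """Single-factor ANOVA: returns (SSF, SSE, F) for a list of groups."""
    N = sum(len(g) for g in groups)            # total number of measurements
    I = len(groups)                            # number of factor levels
    grand = sum(sum(g) for g in groups) / N    # grand average of all the data
    ssf = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msf = ssf / (I - 1)                        # between-groups mean square
    mse = sse / (N - I)                        # within-groups mean square
    return ssf, sse, msf / mse

# hypothetical triplicate results from three analysts
ssf, sse, F = one_way_anova([[10.3, 10.2, 10.4],
                             [10.1, 10.0, 10.2],
                             [10.6, 10.5, 10.7]])
# compare F with the tabulated value for (I - 1, N - I) = (2, 6) DF, 5.14 at 95% CL
```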
Detection of gross errors
Rejection of aberrant data: the Q-test
Q = gap/range
If Qobserved > Qtabulated, discard the questionable point.

Critical Values for the Rejection Quotient, Q* (reject if Q > Qcrit)
Number of Observations | 90% confidence | 95% confidence | 99% confidence
3  | 0.941 | 0.970 | 0.994
4  | 0.765 | 0.829 | 0.926
5  | 0.642 | 0.710 | 0.821
6  | 0.560 | 0.625 | 0.740
7  | 0.507 | 0.568 | 0.680
8  | 0.468 | 0.526 | 0.634
9  | 0.437 | 0.493 | 0.598
10 | 0.412 | 0.466 | 0.568

Example: data 12.47, 12.48, 12.53, 12.56, 12.67
gap = 12.67 – 12.56 = 0.11; range = 12.67 – 12.47 = 0.20
Q = 0.11/0.20 = 0.55 < 0.64 (table value for n = 5, α = 0.10)
12.67 should be retained.

Summary
confidence interval, confidence level
Student's t
null hypothesis
the t test for differences in means
type I error, type II error
F test
ANOVA
Q-test
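The Q-test example above can be reproduced in a few lines (standard library only; the function checks the suspect value at either end of the sorted data):

```python
def q_statistic(values):
    """Dixon's Q for the most suspect value: Q = gap / range on sorted data."""
    v = sorted(values)
    data_range = v[-1] - v[0]
    q_low = (v[1] - v[0]) / data_range       # gap below the smallest value
    q_high = (v[-1] - v[-2]) / data_range    # gap above the largest value
    return max(q_low, q_high)

Q = q_statistic([12.47, 12.48, 12.53, 12.56, 12.67])
# Q = 0.55 < Qcrit = 0.64 (90% confidence, n = 5): retain 12.67
```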