Analytical Chemistry I / Lecture Note 7-1

Version 2012 Updated on 030212 Copyright © All rights reserved
Dong-Sun Lee, Prof., Ph.D. Chemistry, Seoul Women’s University
Chapter 7
Statistical Data Treatment and
Evaluation
America's most beloved illustrator
Norman Rockwell - Jury Holdout
Artist Name: NORMAN ROCKWELL
Title: "JURY ROOM" or sometimes "THE HOLDOUT"
Print size: 13.5" × 11". Release date: Post cover, February 14, 1959
“This lively cover provided Rockwell with the
opportunity to do character studies of eleven good
men and true, plus that of one determined woman who
is not about to be shaken by arguments that she finds
unconvincing. Rockwell has attacked with relish the
problem of portraying the debris produced during the
course of the marathon session leading up to the
moment shown here.”
Not guilty or guilty?
Norman Rockwell's Saturday Evening Post cover
The Holdout, from Feb. 14, 1959. One of the 12
jurors does not agree with the others, who are trying to
convince her.
http://www.curtispublishing.com/images/Rockwell/9590214.jpg
http://stores.ebay.com/Norman-Rockwells
Confidence Intervals
In statistical inference, one wishes to estimate population parameters using
observed sample data.
A confidence interval gives an estimated range of values which is likely to
include an unknown population parameter, the estimated range being
calculated from a given set of sample data. (Definition taken from Valerie J.
Easton and John H. McColl's Statistics Glossary v1.1)
The common notation for the parameter in question is θ. Often, this
parameter is the population mean μ, which is estimated through the sample
mean x̄.
The level C of a confidence interval gives the probability that the interval
produced by the method employed includes the true value of the parameter.
Significance level
The probability that a result is outside the confidence interval is often called
the significance level. When expressed as a fraction, the significance level
is often given the symbol α.
Confidence level
The confidence level is the probability value 1 – α associated with a
confidence interval, where α is the level of significance. It can also be
expressed as a percentage 100(1 – α)% and is then sometimes called the
confidence coefficient.
The confidence level (CL) is related to α on a percentage basis by
CL = (1 – α) × 100%
Confidence interval for mean ;
accuracy of an analysis
The confidence interval (CI) for the mean is the range of values within
which the true population mean, μ, is expected to lie with a certain
probability, based on the measured mean, x̄.
The confidence level (CL) is the probability that the true mean lies within a
certain interval. It is often expressed as a percentage (%).
Finding the confidence interval when σ is
known or s is a good estimate of σ (s → σ):
Single measurement:
CI for μ = x ± zσ
n measurements:
CI for μ = x̄ ± zσ/√n
Confidence Level, %    z
50.0                   0.67
68.0                   1.00
80.0                   1.28
90.0                   1.64
95.0                   1.96
95.4                   2.00
99.0                   2.58
99.7                   3.00
99.9                   3.29
Areas under a Gaussian curve for various values of z.
Finding the confidence interval when σ is unknown:
For a single measurement:
t = (x – μ) / s
For the mean of n measurements: t = (x̄ – μ) / (s / √n)
where t is a statistical constant that depends both on the confidence level
and on the number of measurements involved (DF, degrees of freedom = n – 1).
CI for μ = x̄ ± ts/√n
Values of t for Various Levels of Probability
Degrees of Freedom    80%     90%     95%     99%     99.9%
1                     3.08    6.31    12.7    63.7    637
2                     1.89    2.92    4.30    9.92    31.6
3                     1.64    2.35    3.18    5.84    12.9
4                     1.53    2.13    2.78    4.60    8.61
5                     1.48    2.02    2.57    4.03    6.87
6                     1.44    1.94    2.45    3.71    5.96
7                     1.42    1.90    2.36    3.50    5.41
8                     1.40    1.86    2.31    3.36    5.04
9                     1.38    1.83    2.26    3.25    4.78
10                    1.37    1.81    2.23    3.17    4.59
15                    1.34    1.75    2.13    2.95    4.07
20                    1.32    1.73    2.09    2.84    3.85
40                    1.30    1.68    2.02    2.70    3.55
60                    1.30    1.67    2.00    2.62    3.46
∞                     1.28    1.64    1.96    2.58    3.29
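These tabulated values can also be generated in software. The following is a minimal Python sketch (assuming the scipy library is available, which is not part of this lecture) that reproduces two-tailed t critical values such as those in the table above.

# Sketch: reproduce two-tailed Student's t critical values like those tabulated above.
# Assumes scipy is installed; not part of the original lecture material.
from scipy import stats

for df in (1, 2, 3, 10, 60):
    for level in (0.80, 0.90, 0.95, 0.99):
        alpha = 1 - level                        # two-tailed significance level
        t_crit = stats.t.ppf(1 - alpha / 2, df)  # upper-tail cut-off
        print(f"df={df:>2}  {level:.0%}: t = {t_crit:.2f}")

For example, stats.t.ppf(0.975, 2) returns about 4.30, matching the entry for 2 degrees of freedom at the 95% level.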
Breath alcohol analyzers.
Many states have ruled that a blood alcohol level of 0.1% or greater
indicates intoxication.
Ex. A chemist obtained the following data for the alcohol content of a sample of
blood: % C2H5OH: 0.084, 0.089, and 0.079.
Calculate the 95% CI for the mean assuming (a) the three results obtained are
the only indication of the precision of the method and (b) from previous
experience on hundreds of samples, we know that the standard deviation of
the method s = 0.005% and is a good estimate of σ (s → σ).
Values of t for 2 and 3 degrees of freedom at the 95% confidence level:
4.30 (2 degrees of freedom), 3.18 (3 degrees of freedom)
(a) Σxi = 0.084 + 0.089 + 0.079 = 0.252
Σxi² = (0.084)² + (0.089)² + (0.079)² = 0.021218
x̄ = 0.252 / 3 = 0.084
s = {Σ(xi – x̄)² / (n – 1)}^1/2 = 0.0050%
95% CI = x̄ ± ts/√n = 0.084 ± (4.30 × 0.0050) / √3 = 0.084 ± 0.012%
(b) 95% CI = x̄ ± zσ/√n = 0.084 ± (1.96 × 0.0050) / √3 = 0.084 ± 0.006%
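The same calculation can be scripted. A minimal Python sketch (standard library only; the critical values 4.30 and 1.96 are taken from the tables above) is:

# Sketch of the blood-alcohol confidence-interval calculation (cases a and b).
from math import sqrt
from statistics import mean, stdev

data = [0.084, 0.089, 0.079]           # % C2H5OH
n = len(data)
xbar = mean(data)                      # 0.084
s = stdev(data)                        # about 0.0050

# (a) s estimated from only these three results: use Student's t with n - 1 = 2 degrees of freedom
t_95 = 4.30
ci_a = t_95 * s / sqrt(n)              # about 0.012

# (b) sigma known from experience (s -> sigma = 0.0050): use z
z_95 = 1.96
sigma = 0.0050
ci_b = z_95 * sigma / sqrt(n)          # about 0.006

print(f"(a) {xbar:.3f} +/- {ci_a:.3f} %")
print(f"(b) {xbar:.3f} +/- {ci_b:.3f} %")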
Significance level
The probability of a false rejection of the null hypothesis in a statistical test.
Also called the level of significance, α.
The significance level of a test is the probability that the test statistic will reject
the null hypothesis when the hypothesis is true. Significance is a property of the
distribution of a test statistic, not of any particular draw of the statistic.
In hypothesis testing, the significance level is the criterion used for rejecting the
null hypothesis. The significance level is used in hypothesis testing as follows:
First, the difference between the results of the experiment and the null
hypothesis is determined. Then, assuming the null hypothesis is true, the
probability of a difference that large or larger is computed. Finally, this
probability is compared to the significance level. If the probability is less than
or equal to the significance level, then the null hypothesis is rejected and the
outcome is said to be statistically significant. Traditionally, experimenters have
used either the .05 level (sometimes called the 5% level) or the .01 level (1%
level), although the choice of levels is largely subjective. The lower the
significance level, the more the data must diverge from the null hypothesis to
be significant. Therefore, the .01 level is more conservative than the .05 level.
The Greek letter alpha (α) is sometimes used to indicate the significance level.
Hypothesis testing
Hypothesis testing is a method of inferential statistics. An experimenter starts with
a hypothesis about a population parameter called the null hypothesis. Data are then
collected and the viability of the null hypothesis is determined in light of the data.
If the data are very different from what would be expected under the assumption
that the null hypothesis is true, then the null hypothesis is rejected. If the data are
not greatly at variance with what would be expected under the assumption that the
null hypothesis is true, then the null hypothesis is not rejected. Failure to reject the
null hypothesis is not the same thing as accepting the null hypothesis.
Null hypothesis
In statistics, a null hypothesis is a hypothesis that is presumed true until statistical
evidence in the form of a hypothesis test indicates otherwise. It is a hypothesis that
the parameters, or mathematical characteristics, of two or more populations are
identical.
The null hypothesis is a hypothesis about a population parameter. The purpose of
hypothesis testing is to test the viability of the null hypothesis in the light of
experimental data. Depending on the data, the null hypothesis either will or will
not be rejected as a viable possibility.
Consider a researcher interested in whether the time to respond to a tone is
affected by the consumption of alcohol. The null hypothesis is that µ1 – µ2 = 0
where µ1 is the mean time to respond after consuming alcohol and µ2 is the mean
time to respond otherwise. Thus, the null hypothesis concerns the parameter µ1 –
µ2 and the null hypothesis is that the parameter equals zero.
The null hypothesis is often the reverse of what the experimenter actually believes;
it is put forward to allow the data to contradict it. In the experiment on the effect
of alcohol, the experimenter probably expects alcohol to have a harmful effect. If
the experimental data show a sufficiently large effect of alcohol, then the null
hypothesis that alcohol has no effect can be rejected.
It should be stressed that researchers very frequently put forward a null
hypothesis in the hope that they can discredit it. For a second example, consider
an educational researcher who designed a new way to teach a particular concept
in science, and wanted to test experimentally whether this new method worked
better than the existing method. The researcher would design an experiment
comparing the two methods. Since the null hypothesis would be that there is no
difference between the two methods, the researcher would be hoping to reject the
null hypothesis and conclude that the method he or she developed is the better of
the two.
The symbol H0 is used to indicate the null hypothesis. For the example just given,
the null hypothesis would be designated by the following symbols:
H0: µ1 – µ2 = 0
or by
H0: µ1 = µ2.
The null hypothesis is typically a hypothesis of no difference as in this example
where it is the hypothesis of no difference between population means. That is
why the word "null" in "null hypothesis" is used -- it is the hypothesis of no
difference.
Despite the "null" in "null hypothesis," there are occasions when the
parameter is not hypothesized to be 0. For instance, it is possible for the
null hypothesis to be that the difference between population means is a
particular value. Or, the null hypothesis could be that the mean SAT score
in some population is 600. The null hypothesis would then be stated as: H0:
μ = 600. Although the null hypotheses discussed so far have all involved
the testing of hypotheses about one or more population means, null
hypotheses can involve any parameter. An experiment investigating the
correlation between job satisfaction and performance on the job would test
the null hypothesis that the population correlation (ρ) is 0. Symbolically,
H0: ρ = 0.
Some possible null hypotheses are given below:
H0: μ = 0
H0: μ = 10
H0: μ1 – μ2 = 0
H0: π = 0.5
H0: π1 – π2 = 0
H0: μ1 = μ2 = μ3
H0: ρ1 – ρ2 = 0
When a one-tailed test is conducted, the null hypothesis includes the
direction of the effect. A one-tailed test of the difference between means
might test the null hypothesis that μ1 – μ2 is greater than or equal to 0. If M1 – M2 were
much less than 0, then the null hypothesis would be rejected in favor of the
alternative hypothesis (Ha): μ1 – μ2 < 0.
The p-value (level of significance)
All statistical tests produce a p-value and this is equal to the probability of
obtaining the observed difference, or one more extreme, if the null hypothesis is
true. To put it another way - if the null hypothesis is true, the p-value is the
probability of obtaining a difference at least as large as that observed due to
sampling variation.
Consequently, if the p-value is small the data support the alternative hypothesis.
If the p-value is large the data support the null hypothesis. But how small is
'small' and how large is 'large'?!
Conventionally (and arbitrarily) a p-value of 0.05 (5%) is generally regarded as
sufficiently small to reject the null hypothesis. If the p-value is larger than 0.05
we fail to reject the null hypothesis.
The 5% value is called the significance level of the test. Other significance
levels that are commonly used are 1% and 0.1%. Some people use the
following terminology:
p-value                  Outcome of test          Statement
greater than 0.05        Fail to reject H0        No evidence to reject H0
between 0.01 and 0.05    Reject H0 (accept H1)    Some evidence to reject H0 (therefore accept H1)
between 0.001 and 0.01   Reject H0 (accept H1)    Strong evidence to reject H0 (therefore accept H1)
less than 0.001          Reject H0 (accept H1)    Very strong evidence to reject H0 (therefore accept H1)
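As an illustration, the following short Python sketch (assuming scipy is available; the t statistic and degrees of freedom are hypothetical) computes a two-tailed p-value and applies the cut-offs in the table above.

# Sketch: compute a two-tailed p-value for a t statistic and interpret it
# with the conventional cut-offs listed above. Values are illustrative only.
from scipy import stats

t_stat, df = 2.8, 9                      # hypothetical test statistic and degrees of freedom
p = 2 * stats.t.sf(abs(t_stat), df)      # two-tailed p-value

if p > 0.05:
    verdict = "no evidence to reject H0"
elif p > 0.01:
    verdict = "some evidence to reject H0"
elif p > 0.001:
    verdict = "strong evidence to reject H0"
else:
    verdict = "very strong evidence to reject H0"
print(f"p = {p:.3f}: {verdict}")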
Hypothesis Testing
To explain an observation, a hypothetical model is advanced and is tested
experimentally to determine its validity.
In statistics, a null hypothesis postulates that two or more observed quantities
are the same.
Comparing an experimental mean with a known value
Large sample z test:
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic: z = (x̄ – μ0) / (σ/√n)
3. State the alternative hypothesis, Ha, and determine the rejection region:
For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit
For Ha: μ > μ0, reject H0 if z ≥ zcrit
For Ha: μ < μ0, reject H0 if z ≤ –zcrit
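A minimal Python sketch of the two-tailed large-sample z test follows (the data and accepted value are hypothetical, not from the lecture).

# Sketch of the large-sample z test against a known value mu0.
# zcrit = 1.96 is the two-tailed 95% value from the z table above.
from math import sqrt

xbar, mu0 = 100.3, 100.0     # experimental mean and accepted value (hypothetical)
sigma, n = 0.8, 30           # known population standard deviation, number of results
z = (xbar - mu0) / (sigma / sqrt(n))

z_crit = 1.96                # two-tailed, 95% confidence
if abs(z) >= z_crit:
    print(f"z = {z:.2f}: reject H0 (mu != mu0)")
else:
    print(f"z = {z:.2f}: fail to reject H0")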
Rejection regions for the 95% confidence level.
(a) Two-tailed test for Ha: μ ≠ μ0. Note that the critical
value of z is 1.96.
(b) One-tailed test for Ha: μ > μ0. Here, the critical
value of zcrit is 1.64, so that 95% of the area is to
the left of zcrit and 5% of the area is to the right.
(c) One-tailed test for Ha: μ < μ0. Here the critical
value is again 1.64, so that 5% of the area lies to
the left of –zcrit.
Small sample t test:
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic: t = (x̄ – μ0) / (s/√n)
3. State the alternative hypothesis, Ha, and determine the rejection region:
For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ –tcrit
For Ha: μ > μ0, reject H0 if t ≥ tcrit
For Ha: μ < μ0, reject H0 if t ≤ –tcrit
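A corresponding sketch of the small-sample t test, using scipy's one-sample routine (assumed available; the replicate data and accepted value are hypothetical):

# Sketch of the small-sample t test with scipy's one-sample routine.
from scipy import stats

data = [3.91, 4.03, 3.98, 3.87]   # hypothetical replicate results
mu0 = 4.00                        # accepted (known) value

t_stat, p_two_tailed = stats.ttest_1samp(data, mu0)
print(f"t = {t_stat:.2f}, p = {p_two_tailed:.3f}")
# Reject H0 at the 95% level if p <= 0.05 (two-tailed), or equivalently if
# |t| >= tcrit for n - 1 = 3 degrees of freedom (3.18 from the t table above).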
Illustration of systematic error in an analytical method. Curve A is the
frequency distribution for the accepted value (μ0) by a method without bias.
Curve B illustrates the frequency distribution of results by a method that
could have a significant bias.
bias = μB – μ0
Comparison of two experimental means
The t test for differences in means
Two sets of data: x1, n1 replicate analyses, s1
x2, n2 replicate analyses, s2
The standard error of the mean (sm) is the standard deviation of a set of data divided by
the square root of the number of data points in the set.
sm1 = s1 / √n1
The variance of the mean:
s²m1 = s²1 / n1 ,  s²m2 = s²2 / n2
The variance of the difference (s²d) between the means:
s²d = s²m1 + s²m2
sd = √(s²1/n1 + s²2/n2)
The pooled (= combined) standard deviation:
spooled = √{ [Σ(xi – x̄1)² + Σ(xj – x̄2)² + …] / (n1 + n2 + … – Nsets) }
where Nsets is the number of data sets being pooled (n1 + n2 – 2 degrees of freedom for two sets).
sd = √(s²pooled/n1 + s²pooled/n2) = spooled √[(n1 + n2)/(n1n2)]
t = (x̄1 – x̄2) / [spooled √((n1 + n2)/(n1n2))]
If tcalculated > t table (95%), the difference is significant.
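A minimal Python sketch of this pooled t test, for two hypothetical data sets:

# Sketch of the pooled t test for comparing two experimental means (hypothetical data).
from math import sqrt
from statistics import mean

set1 = [14.2, 14.5, 14.3, 14.6]       # n1 replicate results (hypothetical)
set2 = [14.0, 14.1, 13.9]             # n2 replicate results (hypothetical)
n1, n2 = len(set1), len(set2)
x1, x2 = mean(set1), mean(set2)

# Pooled standard deviation: combine the sums of squared deviations of both sets,
# dividing by the total degrees of freedom (n1 + n2 - 2 for two sets).
ss1 = sum((x - x1) ** 2 for x in set1)
ss2 = sum((x - x2) ** 2 for x in set2)
s_pooled = sqrt((ss1 + ss2) / (n1 + n2 - 2))

t = (x1 - x2) / (s_pooled * sqrt((n1 + n2) / (n1 * n2)))
print(f"t = {t:.2f}")   # compare with tcrit for n1 + n2 - 2 = 5 degrees of freedom (2.57 at 95%)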
Paired data
H0: μd = Δ0
Δ0 = 0
t = (d̄ – Δ0) / (sd / √n)
d̄ = Σdi / n
Ex. Glucose in serum (mg/L)
              Patient 1   Patient 2   Patient 3   Patient 4   Patient 5   Patient 6   mean     s
Method A        1044        720         845         800         957         650       836.0   146.5
Method B        1028        711         820         795         935         639       821.3   142.7
Difference        16          9          25           5          22          11        14.67    7.76
n = 6, Σdi = 16 + 9 + 25 + 5 + 22 + 11 = 88, Σdi² = 1592, d̄ = 14.67
sd = √{ [1592 – (88)²/6] / (6 – 1) } = 7.76
t = 14.67 / (7.76 / √6) = 4.628
DF = n – 1 = 6 – 1 = 5
tcrit = 2.57 at CL 95%
t (4.628) > tcrit (2.57)
∴ H0 is rejected
∴ Method A ≠ Method B
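A short Python sketch (standard library only) reproducing the paired t test on the glucose data above:

# Sketch reproducing the paired t test on the glucose data above.
from math import sqrt
from statistics import mean, stdev

method_a = [1044, 720, 845, 800, 957, 650]
method_b = [1028, 711, 820, 795, 935, 639]

d = [a - b for a, b in zip(method_a, method_b)]   # per-patient differences
n = len(d)
d_bar = mean(d)                                   # 14.67
s_d = stdev(d)                                    # 7.76

t = (d_bar - 0) / (s_d / sqrt(n))                 # about 4.63
print(f"t = {t:.2f} vs tcrit = 2.57 (5 df, 95%)")
# t > tcrit, so H0 (no difference between the methods) is rejected.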
Errors in hypothesis testing
A type I error occurs when H0 is rejected although it is actually true. In
some sciences, a type I error is called a false positive.
A type II error occurs when H0 is accepted (not rejected) although it is actually false. This is
sometimes termed a false negative.
The F test : comparison of precision
The F test is used to compare the precision of two sets of data. The F test is
a test designed to indicate whether there is a significant difference between
two methods based on their standard deviations. F is defined in terms of
the variance of the two methods.
F = s1² / s2² = V1 / V2
where s1² > s2². There are two different degrees of freedom, one each for the
numerator and the denominator. If the calculated F value exceeds the tabulated F
value at the selected confidence level, then there is a significant difference
between the variances of the two methods.
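A minimal Python sketch of the F test (the standard deviations are hypothetical; the critical value is read from the table that follows):

# Sketch of the F test for comparing the precision (variances) of two methods.
s1, n1 = 0.051, 7    # method with the larger standard deviation (hypothetical)
s2, n2 = 0.032, 7    # method with the smaller standard deviation (hypothetical)

F = s1 ** 2 / s2 ** 2          # always put the larger variance in the numerator
F_crit = 4.28                  # 95% level, 6 numerator and 6 denominator degrees of freedom
print(f"F = {F:.2f} vs Fcrit = {F_crit}")
if F > F_crit:
    print("The difference in precision is significant at the 95% level.")
else:
    print("No significant difference in precision.")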
Critical Values of F at the 5% Probability Level (95% confidence level)
Degrees of Freedom              Degrees of Freedom (Numerator)
(Denominator)        2       3       4       5       6       10      12      20      ∞
2                  19.00   19.16   19.25   19.30   19.33   19.40   19.41   19.45   19.50
3                   9.55    9.28    9.12    9.01    8.94    8.79    8.74    8.66    8.53
4                   6.94    6.59    6.39    6.26    6.16    5.96    5.91    5.80    5.63
5                   5.79    5.41    5.19    5.05    4.95    4.74    4.68    4.56    4.36
6                   5.14    4.76    4.53    4.39    4.28    4.06    4.00    3.87    3.67
10                  4.10    3.71    3.48    3.33    3.22    2.98    2.91    2.77    2.54
12                  3.89    3.49    3.26    3.11    3.00    2.75    2.69    2.54    2.30
20                  3.49    3.10    2.87    2.71    2.60    2.35    2.28    2.12    1.84
∞                   3.00    2.60    2.37    2.21    2.10    1.83    1.75    1.57    1.00
Analysis of Variance (ANOVA)
ANOVA is used to test whether a difference exists in the means of more than two
populations. After ANOVA indicates a potential difference, multiple comparison
procedures can be used to identify which specific population means differ from the
others.
In ANOVA procedures, we detect differences in several population means by
comparing the variances. For comparing I population means, μ1, μ2, μ3, …, μI, the null
hypothesis H0 is of the form
H0: μ1 = μ2 = μ3 = … = μI
and the alternative hypothesis Ha is
Ha: at least two of the μi's are different.
The populations have differing values of a common characteristic called a factor or
sometimes a treatment. The different values of the factor of interest are called levels.
The comparisons among the various populations are made by measuring a response for
each item sampled. The factor can be considered the independent variable, whereas
the response is the dependent variable.
The basic principle of ANOVA is to compare the variations between the different
factor levels (groups) with those within factor levels.
Pictorial of the results from the ANOVA study of the determination of calcium by
five analysts. Each analyst does the determination in triplicate. Analyst is considered
a factor, whereas analyst 1, analyst 2, analyst 3, analyst 4, and analyst 5 are levels of
the factor.
Pictorial representation of the ANOVA principle. The results of each
analyst are considered a group. The triangles represent individual results,
and the circles represent the means. Here the variation between the group
means is compared with that within groups.
Single-Factor ANOVA
H0: μ1 = μ2 = μ3 = … = μI
Group means: x̄1, x̄2, x̄3, …, x̄I
Group variances: s²1, s²2, s²3, …, s²I
The grand average x̄ is the average of all the data:
x̄ = (n1/N) x̄1 + (n2/N) x̄2 + (n3/N) x̄3 + … + (nI/N) x̄I
where N is the total number of measurements.
1. The sum of the squares due to the factor (SSF):
SSF = n1(x̄1 – x̄)² + n2(x̄2 – x̄)² + n3(x̄3 – x̄)² + … + nI(x̄I – x̄)²
2. The sum of the squares due to error (SSE):
SSE = Σ(x1j – x̄1)² + Σ(x2j – x̄2)² + Σ(x3j – x̄3)² + … + Σ(xIj – x̄I)²
SSE = (n1 – 1) s²1 + (n2 – 1) s²2 + (n3 – 1) s²3 + … + (nI – 1) s²I
3. The total sum of the squares (SST):
SST = SSF + SSE
ANOVA (Analysis of Variance) table

Source of Variation              Sum of        Degrees of      Mean Square (MS)      Mean Square       F
                                 Squares (SS)  Freedom (df)                          Estimates
Between groups (factor effect)   SSF           I – 1           MSF = SSF/(I – 1)     σ²E + σ²F         F = MSF/MSE
Within groups (error)            SSE           N – I           MSE = SSE/(N – I)     σ²E
Total                            SST           N – 1

SSF = the sum of the squares due to the factor
SSE = the sum of the squares due to error
SST = the total sum of the squares = SSF + SSE
(N – 1) = (I – 1) + (N – I)
F = MSF / MSE
The factor mean square MSF estimates σ²E + σ²F, whereas the error mean square MSE estimates σ²E alone.
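A minimal Python sketch of these single-factor ANOVA computations (the triplicate data for three hypothetical groups are illustrative only, not the calcium data mentioned above):

# Sketch of single-factor ANOVA following the SSF/SSE formulas above.
from statistics import mean

groups = [
    [10.3, 9.8, 11.4],   # hypothetical triplicate results, group 1
    [9.5, 8.6, 8.9],     # group 2
    [12.1, 13.0, 12.5],  # group 3
]
I = len(groups)
N = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

SSF = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)   # between groups
SSE = sum((x - mean(g)) ** 2 for g in groups for x in g)          # within groups
MSF = SSF / (I - 1)
MSE = SSE / (N - I)
F = MSF / MSE
print(f"SSF = {SSF:.2f}, SSE = {SSE:.2f}, F = {F:.2f}")
# Compare F with Fcrit(I - 1, N - I) from the F table above.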
Determining which results differ
In the least significant difference (LSD) method, a difference is calculated
that is judged to be the smallest difference that is significant. The difference
between each pair of means is then compared with the least significant
difference to determine which means are different.
For an equal number of replicates Ng in each group, the least significant
difference is calculated as follows:
LSD = t √(2 × MSE / Ng)
The value of t should have (N – I) degrees of freedom.
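Continuing the hypothetical ANOVA sketch above, the LSD could be computed as follows (a sketch, not the lecture's calcium example):

# Sketch of the least-significant-difference calculation,
# using the approximate error mean square from the ANOVA sketch above.
from math import sqrt

MSE = 0.361          # error mean square from the hypothetical ANOVA sketch
Ng = 3               # replicates per group
t_crit = 2.45        # 95% level, N - I = 6 degrees of freedom (from the t table earlier)

LSD = t_crit * sqrt(2 * MSE / Ng)
print(f"LSD = {LSD:.2f}")
# Any pair of group means differing by more than LSD is judged significantly different.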
Detection of gross errors: rejection of aberrant data: the Q-test
Q = gap / range
If Qobserved > Qtabulated, discard the questionable point.
Critical Values for the Rejection Quotient, Q* (reject if Q > Qcrit)
Number of Observations   90% confidence   95% confidence   99% confidence
3                          0.941            0.970            0.994
4                          0.765            0.829            0.926
5                          0.642            0.710            0.821
6                          0.560            0.625            0.740
7                          0.507            0.568            0.680
8                          0.468            0.526            0.634
9                          0.437            0.493            0.598
10                         0.412            0.466            0.568
Example:
Data: 12.47, 12.48, 12.53, 12.56, 12.67
gap = 12.67 – 12.56 = 0.11
range = 12.67 – 12.47 = 0.20
∴ Q = 0.11 / 0.20 = 0.55 < 0.64 (table value for n = 5, α = 0.10)
∴ 12.67 should be retained.
Values of Q for rejection of data (90% confidence)
Number of observations   Q
3                        0.94
4                        0.76
5                        0.64
6                        0.56
7                        0.51
8                        0.47
9                        0.44
10                       0.41
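A short Python sketch (standard library only) reproducing the Q-test decision for the worked example above:

# Sketch of the Q-test on the example data above: should 12.67 be rejected?
data = sorted([12.47, 12.48, 12.53, 12.56, 12.67])

suspect = data[-1]                      # the questionable (largest) value
gap = suspect - data[-2]                # 0.11
rng = data[-1] - data[0]                # 0.20
Q = gap / rng                           # 0.55

Q_crit = 0.642                          # n = 5, 90% confidence, from the table above
print(f"Q = {Q:.2f} vs Qcrit = {Q_crit}")
if Q > Q_crit:
    print("Reject 12.67")
else:
    print("Retain 12.67")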
Summary
confidence interval, confidence level
Student’s t
null hypothesis
The t test for differences in means
type I error
type II error
F test
ANOVA
Q-test