CHEMISTRY 59-320
ANALYTICAL CHEMISTRY
Fall - 2010
Lecture 6
• Example: The daily level of an impurity in a reactor has a mean $\mu = 4.0$ and a standard deviation $\sigma = 0.3$.
What is the probability that the impurity level on a randomly chosen day will exceed 4.4?
$z = \dfrac{4.4 - 4.0}{0.3} = 1.333$
Tail = 0.0918, or ~9%
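The tail area can also be computed directly instead of being read from a table. The following is a minimal sketch, assuming the impurity level follows a Gaussian distribution; the function name `upper_tail` is illustrative and not part of the lecture.

```python
import math

def upper_tail(x, mu, sigma):
    """Return z and the probability that a Gaussian variable
    (mean mu, standard deviation sigma) exceeds x."""
    z = (x - mu) / sigma
    # Area above z under the standard normal curve.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return z, tail

z, tail = upper_tail(4.4, mu=4.0, sigma=0.3)
print(f"z = {z:.3f}, tail = {tail:.4f}")
```

The unrounded z gives a tail close to 0.091; the tabulated 0.0918 corresponds to z rounded to 1.33. Either way the answer is about 9%.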
• The more times you measure a quantity, the more confident you can be that the average of your measurements is close to the true population mean.
• Uncertainty decreases in proportion to $1/\sqrt{n}$, where n is the number of measurements.
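Confidence interval: the interval within which the true value is expected to lie at a stated level of confidence,
$\mu = \bar{x} \pm \dfrac{ts}{\sqrt{n}}$ (Equation 4-6)
• In Equation 4-6, s is the measured standard deviation and t is a statistical factor (Student's t) that depends on the confidence level and on the number of degrees of freedom (degrees of freedom = n − 1).
• n is the number of measurements.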
Values of t at different confidence levels and degrees of freedom are listed in Table 4-2.
• Exercise 4A: For the numbers 116.0, 97.9, 114.2, 106.8, and 108.3, find the mean, standard deviation, and 90% confidence interval for the mean.
• Solution: the mean = (116.0 + 97.9 + 114.2 + 106.8 + 108.3)/5 = 108.6
The standard deviation s = …
The t value from Table 4-2 (90% confidence, 4 degrees of freedom) is 2.132.
Use Equation 4-6 to calculate the confidence interval.
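The arithmetic for Exercise 4A can be checked with a short script. This is a sketch that assumes Equation 4-6 has the usual form, mean ± t·s/√n, and takes t = 2.132 from the solution above; the variable names are illustrative.

```python
import math
import statistics

data = [116.0, 97.9, 114.2, 106.8, 108.3]
t_90 = 2.132  # Table 4-2 value quoted in the solution (90%, 4 degrees of freedom)

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)            # sample standard deviation (n - 1 in the denominator)
half_width = t_90 * s / math.sqrt(n)  # assumed form of Equation 4-6: t*s/sqrt(n)

print(f"mean = {mean:.1f}, s = {s:.2f}")
print(f"90% confidence interval: {mean:.1f} +/- {half_width:.1f}")
```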
• Standard deviation is frequently used as the estimated uncertainty.
• It is good practice to report the number of measurements so that confidence limits can be calculated.
• Confidence limits and the t test assume that data follow a Gaussian distribution. If they do not, different formulas would be required.
• The t test can be used to decide whether two sets of measurements are “the same”, i.e. whether the observed difference between the two means can be explained by purely random measurement error.
• We customarily accept the result if there is a 95% chance that the conclusion is correct.
• Compute the 95% confidence interval for your answer and check whether that range includes the “known” answer.
• If the known answer is not within the 95% confidence interval, the results do not agree.
• A reliable assay shows that the ATP (adenosine triphosphate) content of a certain cell type is 111 μmol/100 mL. You developed a new assay, which gave the following values for replicate analyses: 117, 119, 111, 115, 120 μmol/100 mL (average = 116.4). Can you be 95% confident that your result differs from the “known” value?
The 95% confidence interval does not include the accepted value of 111 μmol/100 mL, so the difference is significant.
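The comparison with the known value can be reproduced numerically. This sketch assumes the confidence interval is mean ± t·s/√n and uses t = 2.776, the standard Student's t value for 95% confidence and 4 degrees of freedom; that value is not quoted on the slide and should be checked against Table 4-2.

```python
import math
import statistics

known = 111.0                      # accepted value, umol/100 mL
values = [117, 119, 111, 115, 120]
t_95 = 2.776  # assumed: Student's t for 95% confidence, 4 degrees of freedom

mean = statistics.mean(values)
s = statistics.stdev(values)
half_width = t_95 * s / math.sqrt(len(values))
lower, upper = mean - half_width, mean + half_width

# The result differs significantly if the known value lies outside the interval.
differs = not (lower <= known <= upper)
print(f"95% interval: {lower:.1f} to {upper:.1f}; differs from known: {differs}")
```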
• Lord Rayleigh’s experiments: the discovery of argon.
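• For two sets of data consisting of $n_1$ and $n_2$ measurements with averages $\bar{x}_1$ and $\bar{x}_2$, calculate a value of t with the formula
$t_{calculated} = \dfrac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$, where $s_{pooled} = \sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$
• Find t in Table 4-2 for $n_1 + n_2 - 2$ degrees of freedom. If $t_{calculated} > t_{table}$ (95%), the difference is significant.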
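A sketch of the two-sample comparison in code, assuming the pooled-standard-deviation form of the t statistic (appropriate when the two standard deviations are not significantly different). The function name `pooled_t` and the two data sets are hypothetical; they are not Rayleigh's measurements.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t_calculated, degrees_of_freedom) for two data sets,
    using a pooled standard deviation."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t = diff / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2

# Hypothetical data, for illustration only.
set_a = [1.0, 2.0, 3.0]
set_b = [4.0, 5.0, 6.0]
t_calc, dof = pooled_t(set_a, set_b)
print(f"t_calculated = {t_calc:.3f} with {dof} degrees of freedom")
```

The returned t is then compared with the tabulated t for the same degrees of freedom at the chosen confidence level.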
• Situation: using two methods to make single measurements on different samples, i.e. no measurement has been duplicated.
• To see whether there is a significant difference between the methods, use the paired t test.
$t_{table}$ = 2.228 at the 95% confidence level
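The paired t test works on the differences between the two methods for each sample. A minimal sketch, assuming the usual form t = |mean difference| / s_d × √n, where s_d is the standard deviation of the differences; the function name and the data are hypothetical.

```python
import math
import statistics

def paired_t(method_1, method_2):
    """Return (t_calculated, degrees_of_freedom) for paired results,
    one pair per sample."""
    d = [x - y for x, y in zip(method_1, method_2)]
    n = len(d)
    d_mean = statistics.mean(d)
    s_d = statistics.stdev(d)  # standard deviation of the differences
    return abs(d_mean) / s_d * math.sqrt(n), n - 1

# Hypothetical results for four samples, for illustration only.
method_1 = [1.0, 2.0, 3.0, 4.0]
method_2 = [1.5, 2.1, 3.4, 4.2]
t_calc, dof = paired_t(method_1, method_2)
print(f"t_calculated = {t_calc:.3f} with {dof} degrees of freedom")
```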
Related problems: 4-1 to 4-4, 4-7, 4-17, and 4-19 to 4-22.
• If the standard deviations of the two data sets are significantly different, a different equation is needed for the t test.
• The F test tells us whether two standard deviations are significantly different from each other.
• $F = s_1^2 / s_2^2$
• Use the degrees of freedom of data sets 1 and 2 to find an F value in Table 4-4.
• If the calculated F value exceeds a tabulated F value at a selected confidence level (95%), then there is a significant difference between the variances of the two methods.
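The F ratio itself is a one-line calculation. The sketch below assumes the common convention of placing the larger standard deviation in the numerator so that F ≥ 1; the function name and the two standard deviations are hypothetical, and the tabulated F must still be looked up in Table 4-4.

```python
def f_statistic(s1, s2):
    """Ratio of variances, with the larger standard deviation on top
    (assumed convention) so that F >= 1."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return larger**2 / smaller**2

# Hypothetical standard deviations, for illustration only.
f_calc = f_statistic(0.5, 1.0)
print(f"F_calculated = {f_calc:.2f}")
```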
• Problem 4-17. If you measure a quantity 4 times and the standard deviation is 1.0% of the average, can you be 90% confident that the true value is within 1.2% of the measured average?
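One way to approach Problem 4-17 is to compute the 90% confidence half-width in percent and compare it with 1.2%. This sketch assumes the interval is t·s/√n and uses t = 2.353, the standard Student's t value for 90% confidence and 3 degrees of freedom; confirm it against Table 4-2.

```python
import math

n = 4
s_percent = 1.0   # standard deviation, as a percentage of the average
t_90 = 2.353      # assumed: Student's t for 90% confidence, 3 degrees of freedom

half_width = t_90 * s_percent / math.sqrt(n)  # in percent of the average
within = half_width <= 1.2
print(f"90% half-width = {half_width:.2f}% of the average; within 1.2%: {within}")
```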
• The Q test is used to decide whether an “outlier” is due to a determinate error. If it is not, then it falls within the expected random error and should be retained.
• Q = gap/w, where gap = difference between the “outlier” and the nearest result, and w = range of results.
• If $Q_{calculated} > Q_{table}$, the questionable point should be discarded.
$Q_{calculated}$ = 0.55 < $Q_{table}$ = 0.64, so the questionable point is retained.
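The Q calculation can be sketched as follows. The data set is illustrative, chosen so that Q comes out at 0.55; it does not appear on the slide. Reading 0.64 as the tabulated Q is also an assumption based on the comparison above.

```python
def q_statistic(values):
    """Q = gap / range for the more isolated of the two extreme values."""
    ordered = sorted(values)
    w = ordered[-1] - ordered[0]            # range of results
    gap_low = ordered[1] - ordered[0]       # gap if the lowest value is suspect
    gap_high = ordered[-1] - ordered[-2]    # gap if the highest value is suspect
    return max(gap_low, gap_high) / w

results = [12.47, 12.48, 12.53, 12.56, 12.67]  # illustrative values only
q_calc = q_statistic(results)
q_table = 0.64  # assumed to be the tabulated Q in the comparison above

print(f"Q_calculated = {q_calc:.2f}; discard: {q_calc > q_table}")
```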
4-7: The method of least squares
(Regression Analysis)
The straight-line model: $y = mx + b$
Starting point: a line through the origin, $y = mx$
Experience suggests that there is an error in the response; therefore, $y_{obs,i} = m x_i + \varepsilon_i$, where $\varepsilon_i$ represents the error in $y_{obs,i}$.
The method of least squares selects the best-fitting model by minimizing the quantity
$S = S(\beta) = \sum_{i=1}^{n} (y_{obs,i} - \beta x_i)^2$
A plot of S as a function of $\beta$ shows a minimum; the value of $\beta$ at the minimum is the least-squares estimate of the slope, m.
Once m is known, you have all the calculated values $y_i = m x_i$.
The difference between the observed and calculated values is the residual, and the sum of the squares of the residuals is also a minimum value:
$S_R = \sum_{i=1}^{n} (y_{obs,i} - y_i)^2$
Estimate of the experimental error variance, $s^2$:
$S_R = \sum_{i=1}^{n} (y_{obs,i} - y_i)^2$
$s^2 = \dfrac{S_R}{n - 1}$
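The line-through-the-origin fit can be sketched in a few lines. Setting dS/dβ = 0 gives the closed form m = Σ(x·y_obs)/Σ(x²); that step is not shown on the slides and is supplied here as a standard result. The function names and the data are hypothetical.

```python
def sum_of_squares(beta, x, y_obs):
    """S(beta): sum of squared deviations for a trial slope beta."""
    return sum((yo - beta * xi) ** 2 for xi, yo in zip(x, y_obs))

def fit_through_origin(x, y_obs):
    """Return (m, S_R, s2) for the model y = m*x."""
    m = sum(xi * yo for xi, yo in zip(x, y_obs)) / sum(xi * xi for xi in x)
    s_r = sum_of_squares(m, x, y_obs)  # residual sum of squares at the minimum
    s2 = s_r / (len(x) - 1)            # one fitted parameter, so n - 1
    return m, s_r, s2

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.1, 3.9, 6.2, 7.8]
m, s_r, s2 = fit_through_origin(x, y_obs)
print(f"m = {m:.3f}, S_R = {s_r:.4f}, s^2 = {s2:.4f}")
```

Evaluating `sum_of_squares` at slopes slightly above and below m gives larger values, which is the minimum seen in a plot of S against β.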
The coefficient of determination, $R^2$, is the proportion of variability in a data set that is accounted for by a statistical model.
The version most common in statistics texts is based on an analysis-of-variance decomposition, as follows:
$R^2 = \dfrac{SS_R}{SS_T}$
$SS_R = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$SS_T = \sum_{i=1}^{n} (y_{obs,i} - \bar{y})^2$
where $\bar{y}$ is the mean of the observed values.
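A short sketch of the R² calculation. It assumes a straight-line fit with an intercept (y = mx + b), because the decomposition of the total sum of squares into regression and residual parts holds exactly for that model; the function name and the data are hypothetical.

```python
def r_squared(x, y_obs):
    """R^2 = SS_R / SS_T for an ordinary least-squares line y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y_obs) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yo - y_bar) for xi, yo in zip(x, y_obs))
    m = sxy / sxx
    b = y_bar - m * x_bar
    y_calc = [m * xi + b for xi in x]
    ss_r = sum((yc - y_bar) ** 2 for yc in y_calc)  # explained by the model
    ss_t = sum((yo - y_bar) ** 2 for yo in y_obs)   # total variability
    return ss_r / ss_t

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.1, 3.9, 6.2, 7.8]
r2 = r_squared(x, y_obs)
print(f"R^2 = {r2:.4f}")
```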