Chem. 31 – 2/11 Lecture
Announcements
• Today’s Lecture
– Chapter 4 Material
• Probability within Limits
• Confidence Intervals
• Statistical Tests
Chapter 4 – Gaussian Distributions
Now for a “real” limit problem example:
A man wants to get life insurance. If his measured
cholesterol level is over 240 mg/dL (2,400 mg/L), his
premium will be 25% higher. His level is measured and
found to be 249 mg/dL. His uncle, a biochemist who
developed the test, tells him that a typical standard
deviation on the measurement is 25 mg/dL. What is the
chance that a second measurement (with no crash diet
or extra exercise) will result in a value under 240 mg/dL
(i.e., beat the test)?
Graphical view of example
[Figure: Normal Distribution, Frequency vs. Z value (-5 to 5), with 240 and 249 marked on the x-axis; labeled regions: Table area, Desired area, Equivalent Area]
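The cholesterol problem can be worked numerically along the following lines. This is a minimal sketch that assumes the first measurement (249 mg/dL) can be treated as the true mean and that repeat measurements are normally distributed with σ = 25 mg/dL; the function name `prob_below` is illustrative only.

```python
from statistics import NormalDist

def prob_below(limit, mean, sigma):
    """Z value of `limit` and the probability that one measurement falls below it."""
    z = (limit - mean) / sigma           # convert the limit to a Z value
    return z, NormalDist().cdf(z)        # area of the standard normal curve left of z

# Assumption: the measured 249 mg/dL is treated as the true mean.
z, p = prob_below(limit=240, mean=249, sigma=25)
print(f"Z = {z:.2f}, P(x < 240) = {p:.3f}")
```

Under these assumptions Z = -0.36 and the area below 240 mg/dL is about 0.36, so a second measurement has roughly a 36% chance of coming in under the limit.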
Chapter 4 – Calculation of Confidence
Interval
1. Confidence Interval = x ± uncertainty
2. Calculation of uncertainty depends on whether σ is “well known”
3. When σ is not well known (covered later)
4. When σ is well known (not in text):
Value ± uncertainty = $\bar{x} \pm \dfrac{Z\sigma}{\sqrt{n}}$
Z depends on area or desired probability
At Area = 0.45 (90% both sides), Z = 1.65
At Area = 0.475 (95% both sides), Z = 1.96 => larger confidence interval
[Figure: Normal Distribution, Frequency vs. Z value (-3 to 3)]
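The Z values quoted for 90% and 95% confidence can be checked without a table. The sketch below uses Python's standard-library `NormalDist`; the helper name `z_for_confidence` is an illustrative choice, not part of the lecture.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided Z value: the central `confidence` fraction of the curve lies within ±Z."""
    area_one_side = confidence / 2       # e.g. 0.45 for 90%, 0.475 for 95%
    return NormalDist().inv_cdf(0.5 + area_one_side)

for conf in (0.90, 0.95):
    print(f"{conf:.0%} confidence: Z = {z_for_confidence(conf):.3f}")
```

This gives Z of about 1.645 for 90% (1.65 when read from a two-decimal table) and 1.960 for 95%.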
Chapter 4 – Calculation of
Uncertainty
Example:
The concentration of NO3- in a sample is measured 2 times
and found to give 18.6 and 19.0 ppm. The method is
known to have a constant relative standard deviation of
2.0% (from past work). Determine the concentration
and 95% confidence interval.
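A possible worked solution for the nitrate example is sketched below. It assumes that σ is taken as 2.0% of the mean of the two measurements and that the Z-based interval (σ well known) applies; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(values, rel_sd, confidence=0.95):
    """Mean and half-width of the CI when sigma is known as a relative SD."""
    x_bar = mean(values)
    sigma = rel_sd * x_bar                               # absolute sigma from the relative SD
    z = NormalDist().inv_cdf(0.5 + confidence / 2)       # about 1.96 for 95%
    return x_bar, z * sigma / sqrt(len(values))

x_bar, half_width = z_confidence_interval([18.6, 19.0], rel_sd=0.020)
print(f"{x_bar:.1f} ± {half_width:.1f} ppm")
```

With these inputs the mean is 18.8 ppm, σ is about 0.38 ppm, and the 95% confidence interval comes out as roughly 18.8 ± 0.5 ppm.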
Chapter 4 –
Calculation of Confidence Interval with s Not Known
Value ± uncertainty = $\bar{x} \pm \dfrac{tS}{\sqrt{n}}$
t = Student’s t value
t depends on:
- the number of samples (more samples => smaller t)
- the probability of including the true value (larger
probability => larger t)
Chapter 4 –
Calculation of Uncertainties Example
• Measurement of lead in drinking water
sample:
– values = 12.3, 9.8, 11.4, and 13.0 ppb
• What is the 95% confidence interval?
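The lead example can be worked as in the sketch below. The t value is hard-coded from a standard two-sided Student's t table (95%, n - 1 = 3 degrees of freedom) rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

T_95_DOF3 = 3.182    # two-sided Student's t, 95%, 3 degrees of freedom (standard table value)

values = [12.3, 9.8, 11.4, 13.0]         # ppb
x_bar = mean(values)
s = stdev(values)                        # sample standard deviation (n - 1 in the denominator)
half_width = T_95_DOF3 * s / sqrt(len(values))
print(f"{x_bar:.1f} ± {half_width:.1f} ppb")
```

Here the mean is about 11.6 ppb, S is about 1.4 ppb, and the 95% confidence interval is roughly 11.6 ± 2.2 ppb.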
Chapter 4 –
Ways to Reduce Uncertainty
1. Decrease standard deviation in
measurements (usually requires more
skill in analysis or better equipment)
2. Analyze each sample more times (this
increases n and decreases t)
3. Understand variability better (so that s is
known and Z-based uncertainty can be
used)
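The effect of item 2 can be illustrated by tabulating the factor t/√n that multiplies S in the t-based uncertainty. The sketch below hard-codes two-sided 95% t values from a standard table; the dictionary and function names are illustrative.

```python
from math import sqrt

# Two-sided 95% Student's t values from a standard table, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def uncertainty_factor(n):
    """Multiplier on S in the t-based uncertainty, t / sqrt(n), for n replicate measurements."""
    return T_95[n - 1] / sqrt(n)

for n in range(2, 7):
    print(f"n = {n}: t = {T_95[n - 1]:.3f}, t/sqrt(n) = {uncertainty_factor(n):.2f}")
```

The factor drops steeply at first (from about 9.0 at n = 2 to about 1.6 at n = 4), which is why a few extra replicates help much more when n is small. For comparison, the Z-based factor at 95% is 1.96/√n, i.e. about 0.98 at n = 4.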
Overview of Statistical Tests
• t-Tests: Determine if a systematic error
exists in a method or between methods or
if a difference exists in sample sets
• F-Test: Determine if there is a significant
difference in standard deviations in two
methods or sample sets (which method is
more precise/which set is more variable)
• Grubbs Test: Determine if a data point
can be excluded on a statistical basis
Statistical Tests
Possible Outcomes
• Outcome #1 – There is a statistically significant
result (e.g. a systematic error)
– this is at some probability (e.g. 95%)
– can occasionally be wrong (up to 5% of the time if the test is
only barely significant at 95% confidence)
• Outcome #2 – No significant result can be
detected
– this doesn’t mean there is no systematic error
– it does mean that the systematic error, if it exists, is
not detectable (e.g. not observable due to larger
random errors)
– It is not possible to prove a null hypothesis beyond
any doubt
Statistical Tests
Example from Research This Week
• Goal of Work: be able to consistently use
high resolution mass spectrometer to
measure mass with error less than 5 ppm
(limit set for publication in several journals)
• Measurement is challenging and could be
subject to poor data treatment (e.g. selection
of “good” vs. “bad” data)
• Do any measurements within 5 ppm limit
meet the requirement?
• No. We couldn’t just pick 1 out of 4
repeated measurements that meets the
standard. We want to be 95+% certain true
measured value is within the 5 ppm limit
• So we need to use statistics to set rules for
meeting the limit
• In this case (different from the tests in this
class), the measured value is acceptable if the
furthest 90% limit is within the 5 ppm limit and
the closest 95% limit is within the 5 ppm limit
Example compound: expected mass = 809.4587 amu
Measured mass = 809.4569 amu
To meet 5 ppm limit, meas. mass = 809.4547 to 809.4628 amu
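The ppm arithmetic behind this example can be sketched as follows. The function names are hypothetical, and the window is computed from the expected mass as printed (four decimal places), so its last digit may differ slightly from a window computed from an unrounded mass.

```python
def ppm_error(measured, expected):
    """Relative mass error in parts per million."""
    return (measured - expected) / expected * 1e6

def ppm_window(expected, limit_ppm=5.0):
    """Lowest and highest measured mass that stay within ±limit_ppm of `expected`."""
    delta = expected * limit_ppm * 1e-6
    return expected - delta, expected + delta

expected, measured = 809.4587, 809.4569      # amu, values from the example
low, high = ppm_window(expected)
print(f"error = {ppm_error(measured, expected):.1f} ppm")
print(f"window = {low:.4f} to {high:.4f} amu")
```

The single measured value is about 2.2 ppm low and so lies inside the window, but that alone does not satisfy the rule above: the 90% and 95% confidence limits around the mean must also be checked against the window.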
Statistical Tests
Example from Research This Week
• Graphical Explanation of Mass Measurement
– multiple mass measurements made – giving:
• mean value +/- 90% and 95% CIs
– not only mean but 90%/95% limits need to be within
limit
– in example, >5% chance of error
[Figure: expected distribution (based on SD) around the mean measured mass, shown against the expected mass (from mass of each atom) and the + and – 5 ppm limits; the 90% high limit is out of range]
Statistical Tests
t Tests
• Case 1
– used to determine if there is a significant bias by measuring a
test standard and determining if there is a significant difference
between the known and measured concentration
• Case 2
– used to determine if there is a significant difference between
two methods (or samples) by measuring one sample multiple
times by each method (or each sample multiple times)
• Case 3
– used to determine if there is a significant difference between
two methods (or sample sets) by measuring multiple samples
once by each method (or each sample in each set once)
Case 1 t test
• Methylmannopyranoside (MMP) example
• Added as an internal standard at 5 ppm
• Analysis will tell if sample causes a bias
compared to standard
Case 2 t test Example
• A winemaker found a barrel of wine that was labeled as
a merlot, but was suspected of being part of a
chardonnay wine batch and was obviously mislabeled.
To see if it was part of the chardonnay batch, the
mislabeled barrel wine and the chardonnay batch were
analyzed for alcohol content. The results were as
follows:
– Mislabeled wine: n = 6, mean = 12.61%, S = 0.52%
– Chardonnay wine: n = 4, mean = 12.53%, S = 0.48%
• Determine if there is a statistically significant difference
in the ethanol content.
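One way to work the wine example is sketched below. The slides do not spell out the Case 2 formula, so this assumes the usual pooled-standard-deviation form, tCalc = |x̄1 - x̄2| / Spooled × √(n1·n2/(n1 + n2)) with n1 + n2 - 2 degrees of freedom; the table value is hard-coded from a standard two-sided t table, and the function name is illustrative.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """tCalc for comparing two means using a pooled standard deviation."""
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t_calc = abs(mean1 - mean2) / s_pooled * sqrt(n1 * n2 / (n1 + n2))
    return t_calc, n1 + n2 - 2           # t value and degrees of freedom

T_TABLE_95_DOF8 = 2.306                  # two-sided 95%, 8 degrees of freedom (standard table value)

t_calc, dof = pooled_t(12.61, 0.52, 6, 12.53, 0.48, 4)
print(f"tCalc = {t_calc:.2f} (dof = {dof}), tTable = {T_TABLE_95_DOF8}")
print("significant" if t_calc > T_TABLE_95_DOF8 else "not significant")
```

Under that assumption Spooled is about 0.51% and tCalc is about 0.25, far below tTable, so no significant difference in ethanol content is detected. As noted earlier, this does not prove the barrel came from the chardonnay batch; it only means alcohol content cannot distinguish them.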
Case 3 t Test Example
• Case 3 t Test used when multiple
samples are analyzed by two different
methods (only once by each method)
• Useful for establishing if there is a
constant systematic error
• Example: Cl- in Ohio rainwater measured
by Dixon and PNL (14 samples)
Case 3 t Test Example –
Data Set and Calculations
Conc. of Cl- in Rainwater (Units = µM)

Sample #    Dixon Cl-    PNL Cl-    Difference
1           9.9          17.0       7.1
2           2.3          11.0       8.7
3           23.8         28.0       4.2
4           8.0          13.0       5.0
5           1.7          7.9        6.2
6           2.3          11.0       8.7
7           1.9          9.9        8.0
8           4.2          11.0       6.8
9           3.2          13.0       9.8
10          3.9          10.0       6.1
11          2.7          9.7        7.0
12          3.8          8.2        4.4
13          2.4          10.0       7.6
14          2.2          11.0       8.8

Calculations
Step 1 – Calculate difference for each sample (last column of the table)
Step 2 - Calculate
mean and standard
deviation in differences
ave d = (7.1 + 8.7 + ...)/14
ave d = 7.49
Sd = 2.44
Step 3 – Calculate t value:
tCalc = $\dfrac{\bar{d}}{S_d}\sqrt{n}$
tCalc = 11.5
Case 3 t Test Example –
Rest of Calculations
• Step 4 – look up tTable
– (t(95%, 13 degrees of freedom) = 2.17)
• Step 5 – Compare tCalc with tTable, draw
conclusion
– tCalc >> tTable so difference is significant
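Steps 1 to 5 can be reproduced directly from the tabulated concentrations, as in the sketch below (variable names are illustrative). Note that recomputing from the 14 values as listed gives a mean difference of about 7.03, Sd of about 1.72 and tCalc of about 15.3, which differ from the quoted 7.49, 2.44 and 11.5; the quoted summary values may come from a different version of the data set. Either way tCalc is far above tTable, so the conclusion is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

dixon = [9.9, 2.3, 23.8, 8.0, 1.7, 2.3, 1.9, 4.2, 3.2, 3.9, 2.7, 3.8, 2.4, 2.2]
pnl = [17.0, 11.0, 28.0, 13.0, 7.9, 11.0, 9.9, 11.0, 13.0, 10.0, 9.7, 8.2, 10.0, 11.0]

d = [p - x for p, x in zip(pnl, dixon)]       # Step 1: per-sample differences (PNL - Dixon)
d_bar, s_d = mean(d), stdev(d)                # Step 2: mean and sample SD of the differences
t_calc = abs(d_bar) / s_d * sqrt(len(d))      # Step 3: tCalc
T_TABLE = 2.17                                # Step 4: value quoted in the slide (95%, 13 dof)

print(f"mean d = {d_bar:.2f}, Sd = {s_d:.2f}, tCalc = {t_calc:.1f}")
print("significant" if t_calc > T_TABLE else "not significant")   # Step 5
```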
t-Tests
• Note: These (case 2 and 3) can be applied to
two different scenarios:
– samples (e.g. sample A and sample B, do they have
the same % Ca?)
– methods (analysis method A vs. analysis method B)
F-Test
• Similar methodology to t tests, but used to compare
standard deviations between two methods to
determine if there is a statistical difference in
precision between the two methods (or
variability between two sample sets)
FCalc = $\dfrac{S_1^2}{S_2^2}$, with S1 > S2
As with t tests, if FCalc > FTable,
difference is statistically significant
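A small sketch of the FCalc step follows. The function name is hypothetical, and the standard deviations are borrowed from the wine example purely for illustration; the slides do not apply the F-test to that data.

```python
def f_calc(s_a, s_b):
    """FCalc = S1^2 / S2^2 with the larger standard deviation in the numerator."""
    s1, s2 = max(s_a, s_b), min(s_a, s_b)
    return s1**2 / s2**2

# Illustration only: standard deviations borrowed from the wine example (0.52% and 0.48%).
print(f"FCalc = {f_calc(0.48, 0.52):.2f}")
```

Ordering the inputs inside the function guarantees FCalc ≥ 1 regardless of argument order. The result (about 1.17 here) would then be compared with FTable for the corresponding degrees of freedom (n - 1 for each set).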