Statistics for Analytical Chemistry

Reading: lots to revise and learn




Chapter 3
Chapter 4
Chapter 5-1 and 5-2
Chapter 5-3 will be necessary background
for the AA lab
 Chapter 5-4 we will use later
Data Analysis
 Most data quantitative - derived from
measurements
 Never really know error
 With more measurements you get a
better idea what it might be
 Don’t spend a lot of time on an answer
-where only 20% accuracy is required
-or where sampling error is big although you don’t want to make the
error worse
Significant Figure Convention
 Final answer should only contain figures
that are certain, plus the first uncertain
number
 eg 45.2%
 error less than 1% or we would only write
45%
 error larger than 0.05% or would write
45.23%
Remember




Leading zeros are not significant
Trailing zeros after the decimal point are significant
0.06037 - 4 significant figures
0.060370 - 5 significant figures
 1200 is ambiguous (2, 3 or 4 significant figures?)
 $12 \times 10^2$ - 2 significant figures
Rounding Off
 Round a 5 to nearest even number
 4.55 to 4.6
 Carry an extra figure all through calculations
 BUT NOT 6 EXTRA
 Just round off at the end
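The round-half-to-even rule can be tried out with Python's `decimal` module. This is a minimal sketch; the helper name `round_half_even` is made up for illustration, and values are passed as strings so that binary floating point does not disturb the trailing 5.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, places: str) -> Decimal:
    """Round so that an exact trailing 5 goes to the nearest even digit.

    `places` is a template such as "0.1" (one decimal place).
    """
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

print(round_half_even("4.55", "0.1"))  # 4.6 (6 is even)
print(round_half_even("4.45", "0.1"))  # 4.4 (4 is even)
```

Carrying an extra figure through the calculation and rounding once at the end corresponds to calling a helper like this only on the final result.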
Adding
 Absolute uncertainty of answer must not exceed
that of most uncertain number
 Simple rule: Decimal places in answer = decimal
places in number with fewest places
  12.2
+  0.365
+  1.04
= 13.605, which goes to 13.6
When errors are known
 $R \pm r = (A \pm a) + (B \pm b) + (C \pm c)$
 where $r^2 = a^2 + b^2 + c^2$
 Example: Calculate the error in the MW
of FeS from the following atomic
weights:
 Fe: 55.847 ± 0.004   S: 32.064 ± 0.003
 $r = (0.004^2 + 0.003^2)^{1/2}$
 MW = 87.911 ± 0.005
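The FeS calculation can be checked with a few lines of Python. This is a sketch only; the function name `add_with_uncertainty` is an illustrative choice, and it assumes the individual errors are independent so that their squares add.

```python
import math

def add_with_uncertainty(terms):
    """Sum (value, absolute_uncertainty) pairs.

    The uncertainty of the sum is the square root of the sum of the
    squared absolute uncertainties (r^2 = a^2 + b^2 + ...).
    """
    total = sum(value for value, _ in terms)
    r = math.sqrt(sum(u ** 2 for _, u in terms))
    return total, r

# Atomic weights and uncertainties from the FeS example above
mw, r = add_with_uncertainty([(55.847, 0.004), (32.064, 0.003)])
print(f"MW = {mw:.3f} +/- {r:.3f}")  # MW = 87.911 +/- 0.005
```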
Multiplication and Division
 Simplest rule: Sig figs in answer =
smallest number of sig figs in any value
used
 This can lead to problems - particularly
if the first digit of the number is 9.
 1.07400 x 0.993 = 1.07
 1.07400 x 1.002 = 1.076
 Error is ~ 1/1000 therefore 4 significant
figs in answer
Multiplication and Division
 The relative uncertainty of the answer must
fall between 0.2 and 2.0 times the largest
relative uncertainty in the data used in the
calculation.
 Unless otherwise specified, the absolute uncertainty in an experimental measurement is taken to be ±1 in the last digit
Multiplication and Division
 With known errors - add squares of
relative uncertainties
 $r/R = [(a/A)^2 + (b/B)^2 + (c/C)^2]^{1/2}$
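A short sketch of the relative-uncertainty rule for a product or quotient follows. The numbers for A, B and C are invented for illustration and do not come from these notes; the helper name `relative_uncertainty` is likewise hypothetical.

```python
import math

def relative_uncertainty(pairs):
    """Combine (value, absolute_uncertainty) pairs for multiplication/division.

    Returns r/R = sqrt((a/A)^2 + (b/B)^2 + ...).
    """
    return math.sqrt(sum((u / value) ** 2 for value, u in pairs))

# Hypothetical measurements: R = A * B / C
A, a = 1.76, 0.03
B, b = 1.89, 0.02
C, c = 0.59, 0.02

R = A * B / C
rel = relative_uncertainty([(A, a), (B, b), (C, c)])
r = rel * R  # convert back to an absolute uncertainty

print(f"R = {R:.3f}, r/R = {rel:.4f}, r = {r:.3f}")
```

The term with the largest relative uncertainty (here C) dominates the result, which is why improving the other measurements gains little.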
Logs
 Only figures in the mantissa (after the decimal
point) are significant figures
 Use as many places in mantissa as there are
significant figures in the corresponding number
 pH = 2.45
has 2 sig figs
Definitions
 Arithmetic mean, (average)
 Median -middle value
 for N=even number, use average of central
pair
Accuracy
 Deviation from true answer
 Difficult to know
 Best way is to use Reference standards
 National Bureau of Standards (now NIST)
 Traceable Standards
Precision
 Describes reproducibility of results
 What is used to calculate the confidence
limit
 Can use deviation from mean
 or relative deviation
 0.1/5 x 1000 = 20ppt (parts per thousand)
 0.1/5 x 100% = 2%
Precision of Analytical Methods






Absolute standard deviation, $s$ or sd
Relative standard deviation (RSD)
Standard deviation of the mean, $s_m = s/\sqrt{N}$
Coefficient of variation (CV), $(s/\bar{x}) \times 100\%$
Variance, $s^2$
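These precision measures can be computed with Python's standard `statistics` module. The sketch below borrows the five replicate calcium results from the worked example later in these notes; the variable names are illustrative.

```python
import math
import statistics

# Replicate results (% Ca) from the limestone example in these notes
results = [14.35, 14.41, 14.40, 14.32, 14.37]

mean = statistics.mean(results)
s = statistics.stdev(results)            # absolute standard deviation (N-1)
rsd = s / mean                           # relative standard deviation
cv_percent = 100 * rsd                   # coefficient of variation, %
s_m = s / math.sqrt(len(results))        # standard deviation of the mean
variance = statistics.variance(results)  # s squared

print(f"mean = {mean:.2f}, s = {s:.3f}, CV = {cv_percent:.2f}%")
print(f"s_m = {s_m:.3f}, variance = {variance:.5f}")
```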
Standard Curve
Not necessarily linear. Linear is mathematically easier to deal with.
[Figure: calibration plot of Readout versus [Ca] (mg/L), with fitted line $y = 1.9311x + 1.1127$, $R^2 = 0.9888$]
Correlation coefficients
 Show how good a fit you have.
 $R$ or $R^2$
 For perfect correlation, $R = 1$, $R^2 = 1$
$R^2 = \frac{[\sum (x_i - \bar{x})(y_i - \bar{y})]^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$
LINEST
 Calculates slope and intercept
 Calculates the uncertainty in the slope and
the intercept
 Calculates $R^2$
 Calculates s.d. of the population of y values
 See pp. 68-72, Harris.
Use these values to determine the number of sig figs
for the slope and intercept
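The quantities listed for LINEST can also be worked out by hand. The following is a sketch of an unweighted least-squares fit in plain Python, not a reproduction of the spreadsheet function itself; the function name `fit_line` and the four data points are invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Unweighted least-squares line y = m*x + b with standard uncertainties."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx ** 2                      # common denominator
    m = (n * sxy - sx * sy) / d                # slope
    b = (sxx * sy - sxy * sx) / d              # intercept
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - sy / n) ** 2 for y in ys)
    s_y = math.sqrt(ss_res / (n - 2))          # s.d. of y about the line
    return {
        "slope": m,
        "intercept": b,
        "u_slope": s_y * math.sqrt(n / d),
        "u_intercept": s_y * math.sqrt(sxx / d),
        "r_squared": 1 - ss_res / ss_tot,
        "s_y": s_y,
    }

# Hypothetical calibration points (concentration, readout)
fit = fit_line([1, 3, 4, 6], [2, 3, 4, 5])
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

The uncertainties `u_slope` and `u_intercept` are what decide how many figures of the slope and intercept are worth quoting.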
Dealing with Random Errors
Indeterminate Error
 Repeating a coarse measurement gives the
same result
 eg weighing 50 g object to nearest g - only
error would be determinate - such as there
being a fault in the balance
 If same object was weighed to several
decimal places -get random errors
How many eggs in a dozen?
 How wide is your desk?
 Will everyone get the same answer?
 What does this depend on?
With a few measurements, the mean won’t reflect the true mean as well as if you take a lot of measurements
Random errors
 With many measurements, more will be
close to the mean
 Various little errors add in different ways
 Some cancel - sometimes will all be one
way
 A plot of frequency versus value gives a bell
curve or Gaussian curve or normal error
curve
 Errors in a chemical analysis will fit this
curve
Equation for Gaussian Curve
$y = \frac{e^{-(x_i - \mu)^2 / 2\sigma^2}}{\sigma\sqrt{2\pi}}$
Let $z = \frac{x_i - \mu}{\sigma}$
Then $y = \frac{e^{-z^2/2}}{\sigma\sqrt{2\pi}}$
If z is abscissa (x axis)
 Same curve is always obtained
as
z expresses the deviation from the mean in
units of standard deviation
Statistics
 Statistics apply to an infinite number of
results
 Often we only do an analysis 2 or 3 times
and want to use the results to estimate the
mean and the precision
68.3%: ±1σ, 95.4%: ±2σ, 99.7%: ±3σ
Standard deviation
 68.3% of area is within ± 1σ of mean
 95.5% of area is within ± 2σ of mean
 99.7% of area is within ± 3σ of mean
 For any analysis, chances are 95.5 in 100 that error is ± 2σ
 Can say answer is within µ ± 2σ with 95.5% confidence
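The areas under the Gaussian curve can be checked numerically. This is a small sketch using the error function from Python's `math` module; the helper name `area_within` is an illustrative choice.

```python
import math

def area_within(z: float) -> float:
    """Fraction of a Gaussian's area within +/- z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(f"+/-{z} sigma: {100 * area_within(z):.2f}% of the area")
```

Printing two decimal places shows where the rounded percentages quoted for ±1σ, ±2σ and ±3σ come from.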
For a large data set
 Get a good estimate of the mean, µ
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
 Know this formula -but use a calculator
 $\sigma^2$ = variance
 Useful because additive
Small set of data
 Average ($\bar{x}$) ≠ µ
 An extra uncertainty
 The standard deviation calculated will differ for each small set of data used
 It will be smaller than the value calculated over the larger set
 Could call that a negative bias
$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
 For σ use N in denominator
 For s use N-1 in denominator (we have one less degree of freedom - don’t know µ)
 At end, round s to 2 sig figs or less if there are not enough sig figs in data
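The effect of the denominator can be seen by computing both versions on the same small data set. The sketch below writes the formula out explicitly rather than calling a library routine; the function name `std_dev` and its `population` flag are illustrative, and the data are the calcium replicates used in the example later in these notes.

```python
import math

def std_dev(values, population=False):
    """Standard deviation with N (population) or N-1 (sample) in the denominator."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n if population else n - 1))

data = [14.35, 14.41, 14.40, 14.32, 14.37]

with_n = std_dev(data, population=True)  # N in the denominator
s = std_dev(data)                        # N-1 in the denominator

print(f"N denominator:   {with_n:.3f}")
print(f"N-1 denominator: {s:.3f}")
```

Dividing by N-1 gives the larger value, which compensates for using the sample mean in place of the unknown true mean.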
Confidence Interval
 We are doing an analysis to find the true mean µ - it is unknown
 What we measure is $\bar{x}$ but it may not be the same as µ
 Set a confidence limit eg 4.5 ± 0.3 g
 The mean of the measurements was
4.5 g
 The true mean is in the interval 4.2-4.8
with some specified degree of
confidence
Confidence limit
 A measure of the reliability (Re)
 The reliability of a mean (x ) increases as
more measurements are taken
 $Re = k\sqrt{n}$
 Reliability increases with square root of
number of measurements
 Quickly reach a condition of limiting return
Reliability
 Would you want a car that is 95% reliable?
 How often would that break down?
Confidence Interval
 For 100 % confidence - need a huge interval
 Often use 95 %
 The confidence level chosen can change
with the reason for the analysis
Confidence Interval when s ~ σ
 $\mu = x_i \pm 1.96\sigma$ for 95 % confidence
 $z = (x_i - \mu)/\sigma = 1.96$
 Appropriate z values are given as a table
 This applies to a single measurement
 The confidence limit decreases as $1/\sqrt{N}$ as more measurements are taken: $\mu = \bar{x} \pm z\sigma/\sqrt{N}$
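How the confidence limit narrows with more measurements can be sketched as follows, assuming σ is already well known for the method. The value of `sigma` is invented for illustration, and the helper names are hypothetical; `NormalDist` from the standard library stands in for the z table.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided z value, e.g. about 1.96 for 95 % confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def half_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the interval: z * sigma / sqrt(N)."""
    return z_for_confidence(confidence) * sigma / math.sqrt(n)

sigma = 0.10  # hypothetical, well-established standard deviation
for n in (1, 4, 16):
    print(f"N = {n:2d}: +/- {half_width(sigma, n):.3f}")
```

Quadrupling the number of measurements only halves the interval, which is the limiting return mentioned under reliability.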
Confidence Interval
 In the lab this year I will make you go home before you can get enough data for s to = σ
 Therefore we will have to do a different kind
of calculation to estimate the precision.
Student’s t-test
The Student's t-Test was formulated by W. Gosset in the early 1900s. His employer (a brewery) had regulations concerning trade secrets that prevented him from publishing his discovery, but in light of the importance of the t distribution, Gosset was allowed to publish under the pseudonym "Student".
The t-Test is typically used to compare the means of two populations
t-test
$t = \frac{x_i - \mu}{s}$
 t depends on desired confidence limit
 degrees of freedom (N-1)
Values of t for various degrees of freedom and probabilities

Degrees of    Probability
Freedom       80%     90%     95%     99.9%
1             3.08    6.31    12.7    637
2             1.89    2.92    4.30    31.6
3             1.64    2.35    3.18    12.9
4             1.53    2.13    2.78    8.60
5             1.48    2.02    2.57    6.86
6             1.44    1.94    2.45    5.96
7             1.42    1.90    2.36    5.40
8             1.40    1.86    2.31    5.04
∞             1.28    1.64    1.96    3.29
For practical purposes
 Assume σ = s if you have made 20 measurements
 Sometimes σ can be evaluated for a particular technique rather than for each sample
 Usually too time consuming to do 20
replicate measurements on each sample
CONFIDENCE
$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$
Example
 Cal Culator obtained the following results for
replicate determinations of calcium in limestone
 14.35%, 14.41%, 14.40%, 14.32%, 14.37%
 each is $x_i$
 Calculate the confidence interval
Answer








Average = 14.37 %
s = 0.037%
Choose a 95 % confidence limit
Degrees of freedom = N-1 = 5-1 = 4
From t-table, t = 2.78
14.37% ± $ts/\sqrt{N}$
14.37% ± 2.78 x 0.037% / $\sqrt{5}$
14.37 ± 0.05 %
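The same answer can be reproduced in a few lines of Python. This is a sketch; the t value is typed in from the table (95 %, 4 degrees of freedom) rather than computed, and the variable names are illustrative.

```python
import math
import statistics

results = [14.35, 14.41, 14.40, 14.32, 14.37]  # % Ca, from the example above
t_95_4df = 2.78  # looked up in the t table: 95 %, N-1 = 4

mean = statistics.mean(results)
s = statistics.stdev(results)  # N-1 in the denominator
half_width = t_95_4df * s / math.sqrt(len(results))

print(f"{mean:.2f} +/- {half_width:.2f} %")  # 14.37 +/- 0.05 %
```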
Significant figures
 I say: Use two or fewer significant figures in a confidence limit. Then use the same number of decimal places in both the mean and the CL (guided by the CL)
 When fewer than two sig figs in the CL?
 When using two would require you to have more decimal places than were in the actual data.
Pooled standard deviation
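$s_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1) + \dots}{N - n_s}}$
$N$ = total no. of measurements, $n_s$ = no. of groups of samples
For two groups:
$s_p = \sqrt{\frac{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_i - \bar{x}_2)^2}{n_1 + n_2 - 2}}$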
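A pooled standard deviation can be computed directly from the raw replicate groups. The sketch below sums the squared deviations of each group about its own mean and divides by the total number of measurements minus the number of groups; the function name `pooled_sd` and the replicate values are invented for illustration.

```python
import math

def pooled_sd(groups):
    """Pooled standard deviation over several groups of replicate results."""
    total_n = sum(len(group) for group in groups)
    sum_sq = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        sum_sq += sum((x - group_mean) ** 2 for x in group)
    # Degrees of freedom: N minus one per group
    return math.sqrt(sum_sq / (total_n - len(groups)))

# Hypothetical replicate results for two samples
sample_1 = [12.60, 12.65, 12.58]
sample_2 = [12.50, 12.55]
print(f"s_pooled = {pooled_sd([sample_1, sample_2]):.3f}")
```

Pooling assumes every group has the same underlying precision, which is reasonable when the same method is used on similar samples.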
Comparison of Means
 We analyze several samples and want
to know if they are the same or different
 For each sample we take several
measurements and obtain a mean
If $|\bar{x}_1 - \bar{x}_2| < t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
there is no significant difference
Comparing two means
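Compare $|\bar{x}_1 - \bar{x}_2|$ to $\frac{t s_1}{\sqrt{n_1}} + \frac{t s_2}{\sqrt{n_2}}$
If s is a pooled sd
$t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$
If $t_{calc} < t_{table}$ then the difference is not significant at the chosen CL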
Comparing two means
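$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
If s is the pooled s
$t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$
If $t_{calc} < t_{table}$ then the difference is not significant at the chosen CL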
Example
 Two barrels of wine were analyzed for their
alcohol content to determine whether or not
they were from different sources:
 12.61% (6 analyses),
 12.53% (4 analyses)
 Pooled standard deviation = 0.07 %
$t = \frac{12.61\% - 12.53\%}{0.07\%}\sqrt{\frac{6 \times 4}{6 + 4}} = 1.77$




Degrees of freedom = 6+4-2 = 8
t at 95% CL for 8 deg of freedom = 2.31
$t_{calc} < t_{table}$
therefore the difference is not significant at the 95% CL: the two samples cannot be distinguished at the 95% CL
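The wine comparison can be reproduced with a short function. This is a sketch under the assumption that a pooled standard deviation is available; the function name `t_calc` is illustrative, and the tabulated t is entered by hand from the t table (95 %, 8 degrees of freedom).

```python
import math

def t_calc(mean_1, mean_2, s_pooled, n_1, n_2):
    """t for the difference between two means using a pooled standard deviation."""
    return abs(mean_1 - mean_2) / s_pooled * math.sqrt(n_1 * n_2 / (n_1 + n_2))

t = t_calc(12.61, 12.53, 0.07, 6, 4)
t_table = 2.31  # 95 % CL, 6 + 4 - 2 = 8 degrees of freedom

print(f"t_calc = {t:.2f}")  # t_calc = 1.77
if t < t_table:
    print("difference is not significant at the 95% CL")
else:
    print("difference is significant at the 95% CL")
```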
Rejection of data- Q Test
 $Q_{exp}$ = |questionable value - nearest numerical value| / range
 Look up Table of $Q_{critical}$
 If $Q_{exp} < Q_{critical}$, keep the point
 If more observations are taken it is easier to
determine if a point is an outlier
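A sketch of the Q test follows. The function name `q_exp` is illustrative and it checks whichever end of the sorted data has the larger gap. The five results are invented, and the critical value is an assumption: 0.64 is the commonly tabulated 90 % value for five observations, but the value should be taken from the Q table actually in use.

```python
def q_exp(values):
    """Q = gap between the suspect value and its nearest neighbour, divided by the range.

    The suspect value is taken to be whichever extreme (lowest or highest)
    lies further from its neighbour.
    """
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    return max(gap_low, gap_high) / data_range

results = [12.47, 12.48, 12.53, 12.56, 12.67]  # hypothetical replicates
q_critical = 0.64  # assumed table value (90 % confidence, 5 observations)

q = q_exp(results)
print(f"Q_exp = {q:.2f}")
print("keep the point" if q < q_critical else "reject the point")
```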
Calibration Sensitivity
 The slope of the calibration curve at the
concentration of interest
 Doesn’t take precision into account
Analytical Sensitivity
 Slope/s.d. = m/s
 Where m = slope and s = standard deviation of the signal
 Analytical sensitivity is independent of gain,
but can vary with the concentration as s can
depend on concentration
Limit of detection
 The minimum concentration detectable
at a known confidence level
 Is the concentration corresponding to
the lowest usable reading (LUR)
 LUR = average blank + k × s.d.(blank)
 k determines the confidence level
 We use k = 3 for a 95% C.L.
 Do not confuse LOD and LUR
Harris page 103
 LUR corresponds to Signal detection limit
 LOD corresponds to Concentration detection limit
 When doing this in lab WE CHEAT
 We should have 20 measurements of the blank
and we never do because of time constraints. To
publish a result or for a paying client, we would
need 20.
[Figure: Readout versus concentration (x axis 0 to 4), with fitted line $y = 1.9311x + 1.1127$, $R^2 = 0.9888$]
 Ideally, the average blank = b (the
intercept)
 However, if b > average blank, then recalculate LUR using LUR = b + k × s.d.(blank)
 Usually say LUR = b + 3 sd
 LOD = 5.2 mg/L (k = 3)
 Note the 2 significant figures
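The step from the lowest usable reading to a concentration can be sketched as below. The slope and intercept are those of the fitted line shown above, but the blank readings are invented, so the result is not the 5.2 mg/L quoted on the slide. The function names are hypothetical, and converting the LUR through the calibration line is an assumption about how "the concentration corresponding to the LUR" is obtained.

```python
import statistics

def lowest_usable_reading(blank_readings, intercept, k=3.0):
    """LUR = baseline + k * s.d.(blank).

    The baseline is the average blank, or the intercept b if b is larger.
    """
    baseline = max(statistics.mean(blank_readings), intercept)
    return baseline + k * statistics.stdev(blank_readings)

def detection_limit(lur, slope, intercept):
    """Concentration that corresponds to the LUR on the line y = m*x + b."""
    return (lur - intercept) / slope

slope, intercept = 1.9311, 1.1127   # from y = 1.9311x + 1.1127
blanks = [1.0, 1.2, 1.1, 0.9, 1.3]  # hypothetical blank readings

lur = lowest_usable_reading(blanks, intercept)
lod = detection_limit(lur, slope, intercept)
print(f"LUR = {lur:.2f} (signal), LOD = {lod:.2f} mg/L (concentration)")
```

Keeping the two functions separate mirrors the distinction between the signal detection limit (LUR) and the concentration detection limit (LOD).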
Quality Assurance




Begins with sampling
Calibration Check
Run standards every few samples.
Reference standards are of known
concentration. Do you get the right answer?
 Include in Table of Results.
 SOPs are very important
SOP (Standard operating
procedure)
 Set of written instructions that document a
routine or repetitive activity which is followed
by employees in an organization.
 The development and use of SOPs is an
integral part of a successful quality system.
 Provides information to perform a job
properly and consistently in order to achieve
pre-determined specifications and quality.
 http://people.stfx.ca/tsmithpa/Chem361/
Numerical Criteria for Selecting
Analytical Methods






Precision
Bias
Sensitivity
Detection Limit
Concentration Range
Selectivity
Other characteristics to be
considered





Speed
Ease and convenience
Skill required of operator
Cost and availability of equipment
Per-sample cost
Criterion                Figure of Merit
Precision                Absolute sd, relative sd, coefficient of variation, variance
Bias                     Absolute systematic error, relative systematic error
Sensitivity              Calibration sensitivity, analytical sensitivity
Limit of detection       Av. blank + 3 sd blank
Concentration range      LOQ to LOL (limit of linearity)
Selectivity              Coefficient of selectivity