Stat 20: Intro to Probability and Statistics

Lecture 7: Measurement Error
Tessa L. Childers-Day
UC Berkeley
2 July 2014
Today’s Goals
Repeated Measurements
Outliers
Errors
By the end of this lecture...
You will be able to:
Explain why we measure repeatedly
Construct and interpret a boxplot
Recognize outliers, and make an informed decision about how
to treat them
Differentiate between chance and systematic error
Example 1: Shelf building
Let’s say you are building a shelf. You know that you want it to be
3 feet long. You measure the board to 36 inches and cut it. You
try to attach it to the brackets, but find that it is too short to fit.
What went wrong? How could you have avoided the problem?
Better to be safe than sorry
Humans aren’t perfect
Perform measurements more than once
Multiple results yield greater confidence/reliability
Keep conditions/procedure consistent
Example 2: Surfactants in Jet Fuel
Surfactants are soaps that form from combinations of acids. The
combination can be a natural by-product of a chemical compound
(like fuel), or can be a goal by itself (making soap).
Surfactants are dangerous when they appear in jet fuel, because
their presence can set off a chain reaction that can cause corrosion
in airplane engines, leading to expensive repairs or dangerous
accidents.
Thus, it is important that jet fuel be carefully tested for the
presence of surfactants.
Suppose that concentrations of surfactants in jet fuel of 2 ppm are
considered “safe”.
Example 2: Surfactants in Jet Fuel (cont.)
We measure the concentration in a batch of fuel 500 times.
obs    ppm
1      1.7292
2      1.8573
3      1.7864
4      1.8658
5      1.7084
···    ···
How should we display the data?
Example 2: Surfactants in Jet Fuel (cont.)
[Figure: Concentration of Surfactants in Jet Fuel. Two panels: one showing Density against Parts Per Million, and one showing Parts Per Million against Measurement Number (0 to 500).]
Example 2: Surfactants in Jet Fuel (cont.)
Boxplot:
Another way of displaying quantitative information
Middle line is median (50th percentile of data)
Hinges are at 25th and 75th percentiles of data
Whiskers are the most extreme points less than 1.5×IQR from each hinge
What are the dots?
[Figure: boxplot of Concentration of Surfactants in Jet Fuel (Parts Per Million), with the Upper Whisker, Upper Hinge, Median, Lower Hinge, and Lower Whisker labeled, and individual dots plotted beyond the whiskers.]
Outliers
Extreme measurements/data points
Far away from bulk of data
Could make an explicit definition/rule
What should we do about them?
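One explicit rule, borrowed from the boxplot construction above, is to flag any point lying more than 1.5×IQR beyond a hinge, where the IQR is the distance between the hinges. The sketch below is only an illustration under stated assumptions: the function name `boxplot_summary`, the sample readings, and the choice of the "inclusive" percentile method from Python's `statistics` module are not from the lecture, and other software may compute hinges slightly differently.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Median, hinges, whiskers, and flagged points under the k*IQR rule."""
    data = sorted(values)
    # Assumption: hinges are the 25th/75th percentiles with inclusive interpolation.
    lower_hinge, median, upper_hinge = statistics.quantiles(
        data, n=4, method="inclusive"
    )
    iqr = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * iqr
    high_fence = upper_hinge + k * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "median": median,
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }

# Hypothetical readings in ppm, made up for illustration only.
sample = [1.71, 1.73, 1.79, 1.80, 1.81, 1.86, 1.87, 2.40]
summary = boxplot_summary(sample)
print(summary)
```

A rule like this only flags points for attention. Whether to keep, correct, or drop a flagged measurement still calls for judgment about how it arose.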
Two Types of Errors
All measurements are subject to errors/mistakes
1. Chance Error (Idiosyncratic)
2. Systematic Error (Bias)
Chance Error
Error due simply to chance/randomness
Due purely to chance
Can’t explain or eliminate
Different for each measurement
Can go in either direction
Can measure/estimate
Measuring Chance Error
An observation is the true value plus an individual chance error:
measurement_j = true value + chance error_j
So the observations spread around the true value. Use the SD to measure the spread:
\[ \mathrm{SD} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(X_j - \bar{X}\right)^2}, \qquad \text{where } \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j = \text{avg} \]
The X_j are the observed measurements. The SD is, roughly, the typical distance from an observation to the mean (strictly, the root-mean-square distance); it estimates the size of the chance error in a single measurement.
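As a rough illustration of the SD calculation, the sketch below applies it to the five jet fuel readings listed in the earlier table. The helper names `average` and `sd` are assumptions chosen for this example, and the divisor is taken to be n rather than n - 1. Five readings are far too few to reproduce the figures for the full 500 measurements; the point is only to show the arithmetic.

```python
import math

def average(xs):
    """Arithmetic mean of the measurements."""
    return sum(xs) / len(xs)

def sd(xs):
    """Root-mean-square deviation from the average (divides by n)."""
    avg = average(xs)
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / len(xs))

# First five readings (ppm) from the table of jet fuel measurements.
ppm = [1.7292, 1.8573, 1.7864, 1.8658, 1.7084]
print(round(average(ppm), 4), round(sd(ppm), 4))
```

Python's built-in `statistics.pstdev` computes the same divide-by-n quantity, while `statistics.stdev` divides by n - 1 and so gives a slightly larger value.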
Example 2: Surfactants in Jet Fuel (cont.)
Chance error is responsible for the spread of the data around the average ppm
If there is no chance error, then all of our measurements are the same
[Figure: Concentration of Surfactants in Jet Fuel. Parts Per Million plotted against Measurement Number (0 to 500), annotated with Avg = 1.80 ppm and SD = 0.12 ppm.]
Systematic Error (Bias)
Somewhat similar to bias in experiments and surveys
Error that applies to all measurements
Affects all data equally
Pushes observation in a particular direction
Hard to measure/estimate
measurement_j = true value + bias + chance error_j
How can we look for evidence of bias?
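The model above suggests why repeated measurements alone cannot reveal bias: adding the same bias to every measurement shifts the average but leaves the spread unchanged. The simulation below is a minimal sketch of that idea. The true value (1.80), bias (0.10), chance-error SD (0.12), normal distribution, and the helper `simulate` are all hypothetical choices for illustration and are not taken from the jet fuel data.

```python
import random
import statistics

def simulate(true_value, bias, chance_errors):
    """measurement_j = true value + bias + chance error_j"""
    return [true_value + bias + e for e in chance_errors]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical chance errors: mean 0, SD 0.12, one per measurement.
chance_errors = [rng.gauss(0, 0.12) for _ in range(500)]

unbiased = simulate(1.80, 0.0, chance_errors)
biased = simulate(1.80, 0.10, chance_errors)

# The averages differ by the bias; the SDs are the same.
shift = statistics.fmean(biased) - statistics.fmean(unbiased)
sd_unbiased = statistics.pstdev(unbiased)
sd_biased = statistics.pstdev(biased)
print(round(shift, 4), round(sd_unbiased, 4), round(sd_biased, 4))
```

Because the biased series looks just as tightly clustered as the unbiased one, detecting bias requires outside information, such as a second apparatus or a reference standard.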
Example 2: Surfactants in Jet Fuel (cont.)
Examine the jet fuel measurements for systematic error/bias:
Use a new apparatus to measure
Observe the experimenter
Look at the procedure carefully
[Figure: Concentration of Surfactants in Jet Fuel. Parts Per Million for two sets of measurements, labeled First and Second.]
Example 3: Using Repeated Measurements
You send a yardstick to a local laboratory for calibration, asking
that the procedure be repeated three times. They report the
following values:
35.96 inches
36.01 inches
36.03 inches
If you send the yardstick back for a fourth calibration, you would
expect to get _____ inches, give or take _____ inches. Why
are the measurements varying? What are some possible sources of
error?
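One possible way to approach the blanks, assuming the intended answer uses the average of the three readings as the expected value and the SD (with divisor n, as defined earlier in the lecture) as the give-or-take figure, is sketched below. The variable names are illustrative, and with only three readings the result is a rough estimate at best.

```python
import math

# The three calibration results reported by the laboratory, in inches.
readings = [35.96, 36.01, 36.03]

# Best guess for a fourth reading: the average of the first three.
center = sum(readings) / len(readings)

# Likely size of the chance error: the SD of the readings (divides by n).
spread = math.sqrt(sum((x - center) ** 2 for x in readings) / len(readings))

print(f"expect about {center:.2f} inches, give or take about {spread:.2f} inches")
```

This calculation speaks only to chance error. If the laboratory's procedure were biased, all three readings would be shifted together and the SD would not show it.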
Important Takeaways
Repeated measurements:
Give us confidence in our estimates
Help us quantify chance error
Don’t help us quantify systematic error
Outliers look different from the rest of the data; dealing with
them requires judgment
Boxplots can show the spread of the data
Next time: Bivariate Data and Correlation