Stat 20: Intro to Probability and Statistics
Lecture 7: Measurement Error
Tessa L. Childers-Day
UC Berkeley
2 July 2014

Sections: Today’s Goals, Repeated Measurements, Outliers, Errors

By the end of this lecture...
You will be able to:
- Explain why we measure repeatedly
- Construct and interpret a boxplot
- Recognize outliers, and make an informed decision about how to treat them
- Differentiate between chance and systematic error

Example 1: Shelf building
Let’s say you are building a shelf. You know that you want it to be 3 feet long. You measure the board to 36 inches and cut it. You try to attach it to the brackets, but find that it is too short to fit.
What went wrong? How could you have avoided the problem?

Better to be safe than sorry
- Humans aren’t perfect
- Perform measurements more than once
- Multiple results yield greater confidence/reliability
- Keep conditions/procedure consistent

Example 2: Surfactants in Jet Fuel
Surfactants are soaps that form from combinations of acids. The combination can be a natural by-product of a chemical compound (like fuel), or can be a goal by itself (making soap).
Surfactants are dangerous when they appear in jet fuel, because their presence can set off a chain reaction that can cause corrosion in airplane engines, leading to expensive repairs or dangerous accidents.
Thus, it is important that jet fuel be carefully tested for the presence of surfactants. Suppose that concentrations of surfactants in jet fuel of 2 ppm are considered “safe”.

Example 2: Surfactants in Jet Fuel (cont.)
We measure the concentration in a batch of fuel 500 times.

obs   ppm
1     1.7292
2     1.8573
3     1.7864
4     1.8658
5     1.7084
···   ···

How should we display the data?

Example 2: Surfactants in Jet Fuel (cont.)
[Figure: Concentration of Surfactants in Jet Fuel, two panels. One panel plots Density against Parts Per Million; the other plots Parts Per Million against Measurement Number (0 to 500).]

Example 2: Surfactants in Jet Fuel (cont.)
Boxplot: Concentration of Surfactants in Jet Fuel
Another way of displaying quantitative information
[Figure: boxplot of Parts Per Million, annotated with Upper Whisker, Upper Hinge, Median, Lower Hinge, and Lower Whisker; individual dots appear beyond the whiskers.]
- Middle line is the median (50th percentile of data)
- Hinges are at the 25th and 75th percentiles of data
- Whiskers are the most extreme points less than 1.5×IQR from each hinge
- What are the dots?

Outliers
- Extreme measurements/data points
- Far away from bulk of data
- Could make an explicit definition/rule
- What should we do about them?
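
One explicit rule, matching the boxplot's whiskers, flags any point more than 1.5×IQR beyond a hinge. The sketch below is a minimal illustration, not the lecture's own procedure. It assumes Python's `statistics.quantiles` with `method="inclusive"` as the percentile convention (other conventions give slightly different hinges), and it uses a made-up sample rather than the 500 measurements from the lecture. The name `boxplot_summary` is hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return the quantities a boxplot draws, using the 1.5 x IQR rule."""
    data = sorted(values)
    # Hinges and median taken as the 25th, 50th and 75th percentiles.
    lower_hinge, median, upper_hinge = statistics.quantiles(
        data, n=4, method="inclusive"
    )
    iqr = upper_hinge - lower_hinge
    lower_fence = lower_hinge - 1.5 * iqr
    upper_fence = upper_hinge + 1.5 * iqr
    # Whiskers end at the most extreme points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Everything outside the fences is drawn as a separate dot.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "lower_whisker": min(inside),
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Hypothetical ppm readings (not the lecture's data), with two extreme values.
sample_ppm = [1.73, 1.86, 1.79, 1.87, 1.71, 1.80,
              1.82, 1.76, 1.84, 1.78, 1.25, 2.35]
summary = boxplot_summary(sample_ppm)
for name, value in summary.items():
    print(name, value)
```

The rule only flags points; whether a flagged point is a recording mistake or a genuine measurement still calls for judgement.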

Two Types of Errors
All measurements are subject to errors/mistakes
1. Chance Error (Idiosyncratic)
2. Systematic Error (Bias)

Chance Error
Error due simply to chance/randomness
- Due purely to chance
- Can’t explain or eliminate
- Different for each measurement
- Can go in either direction
- Can measure/estimate

Measuring Chance Error
An observation is the true value plus an individual chance error:

$$\text{measurement}_j = \text{true value} + \text{chance error}_j$$

So the observations spread around the true value. Use the SD to measure the spread:

$$\mathrm{SD} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(X_j - \bar{X}\right)^2} \quad \text{where} \quad \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j = \text{avg}$$

The $X_j$ are the observed measurements.
The SD is the typical (root-mean-square) distance from an observation to the mean; it estimates the size of the error in 1 measurement.

Example 2: Surfactants in Jet Fuel (cont.)
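
As a small worked illustration of the SD formula, the sketch below applies it to only the first five measurements listed in the data table (the remaining 495 are not reproduced in these notes, so the result will not match the full-data summary). The function name `sd_of_measurements` is an assumption made for this example.

```python
import math


def sd_of_measurements(xs):
    """SD with the 1/n divisor, as in the formula for measuring chance error."""
    n = len(xs)
    avg = sum(xs) / n
    # Root of the average squared deviation from the mean.
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / n)


# First five observations from the jet fuel table (ppm).
first_five = [1.7292, 1.8573, 1.7864, 1.8658, 1.7084]
avg_five = sum(first_five) / len(first_five)
sd_five = sd_of_measurements(first_five)
print(f"avg = {avg_five:.4f} ppm, SD = {sd_five:.4f} ppm")
```

If every measurement were identical, each deviation would be zero and the function would return 0, which is the "no chance error" case.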
[Figure: Concentration of Surfactants in Jet Fuel. Parts Per Million plotted against Measurement Number (0 to 500).]
- Chance error is responsible for the spread of the data around the average ppm
- If there is no chance error, then all of our measurements are the same
- Avg = 1.80 ppm
- SD = 0.12 ppm

Systematic Error (Bias)
Somewhat similar to bias in experiments and surveys
- Error that applies to all measurements
- Affects all data equally
- Pushes observations in a particular direction
- Hard to measure/estimate

$$\text{measurement}_j = \text{true value} + \text{bias} + \text{chance error}_j$$

How can we look for evidence of bias?

Example 2: Surfactants in Jet Fuel (cont.)
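
The model measurement = true value + bias + chance error can be simulated to see why bias is hard to detect from one set of repeated measurements. The sketch below rests on assumptions that are not in the lecture: chance errors are drawn from a normal distribution, and the true value (1.80 ppm), chance SD (0.12 ppm), and bias (0.10 ppm) are hypothetical numbers chosen for illustration. The function name `simulate` is likewise hypothetical.

```python
import random
import statistics


def simulate(true_value, bias, chance_sd, n, rng):
    """Draw n measurements = true value + bias + chance error."""
    return [true_value + bias + rng.gauss(0.0, chance_sd) for _ in range(n)]


rng = random.Random(20)  # fixed seed so the run is repeatable

# Hypothetical setup: an unbiased apparatus and one that reads 0.10 ppm high.
first = simulate(1.80, 0.00, 0.12, 500, rng)
second = simulate(1.80, 0.10, 0.12, 500, rng)

mean_first, mean_second = statistics.fmean(first), statistics.fmean(second)
sd_first, sd_second = statistics.pstdev(first), statistics.pstdev(second)

# The spread looks alike in both sets; only the center moves with the bias.
print(f"first:  avg = {mean_first:.3f}, SD = {sd_first:.3f}")
print(f"second: avg = {mean_second:.3f}, SD = {sd_second:.3f}")
print(f"difference in averages = {mean_second - mean_first:.3f}")
```

Within either set alone, the SD describes only the chance error. The shift shows up only when the two sets are compared, which is one reason measuring again with a different apparatus can reveal bias.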
Examine the jet fuel for systematic error/bias:
- Look at the procedure carefully
- Observe the experimenter
- Use a new apparatus to measure
[Figure: Concentration of Surfactants in Jet Fuel. Side-by-side boxplots of Parts Per Million, labeled First and Second.]

Example 3: Using Repeated Measurements
You send a yardstick to a local laboratory for calibration, asking that the procedure be repeated three times. They report the following values:
- 35.96 inches
- 36.01 inches
- 36.03 inches
If you send the yardstick back for a fourth calibration, you would expect to get ____ inches, give or take ____ inches.
Why are the measurements varying? What are some possible sources of error?

Important Takeaways
- Repeated measurements:
  - Give us confidence in our estimates
  - Help us quantify chance error
  - Don’t help us quantify systematic error
- Outliers look different from the rest of the data; dealing with them requires judgement
- Boxplots can show the spread of the data

Next time: Bivariate Data and Correlation
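
For Example 3, one way to fill in the blanks is to use the average of the three calibrations as the expected value and their SD as the "give or take" amount. This is a sketch of that arithmetic, not an official answer key. It uses `statistics.pstdev`, which divides by n as in the lecture's SD formula (`statistics.stdev` divides by n − 1 and would give a slightly larger number).

```python
import statistics

# The three calibration values reported by the laboratory (inches).
calibrations = [35.96, 36.01, 36.03]

expected = statistics.fmean(calibrations)        # average of the measurements
give_or_take = statistics.pstdev(calibrations)   # SD with the 1/n divisor

print(f"expect about {expected:.2f} inches, "
      f"give or take about {give_or_take:.2f} inches")
```

With only three measurements the SD is a rough estimate of the chance error, and it says nothing about any bias shared by all three calibrations.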