Which of these two relationships is “tighter”?

Left
Factor:   10  9  6  1  -2  3  2  5  10  7  8  2  5  1  8  8  2  10  7  10
Outcome:  11  9  6  1  -2  4  2  3  9  7  10  1  4  3  7  9  4  11  6  9

Right
Factor:   5  5  10  2  6  1  6  1  1  9  6  7  9  5  3  8  3  2  7  9
Outcome:  10  11  -5  20  8  23  7  22  21  -3  8  4  -3  9  17  2  17  20  5  -2

The relationship on the left appears “tighter” for three reasons:
1. Cognition bias. Simple linear relationships are easier to “eyeball” than complex relationships.
2. Information bias. Rounding masks information.
3. Confirmation bias. Tendency to focus on observations that confirm beliefs and ignore observations that contradict beliefs.

[Figure: scatter plot of Outcome against Factor, left data set]

[Figure: scatter plot of Outcome against Factor, right data set]

Lesson #1
Never trust your eyes.

Corollary
Don’t trust summary statistics either.

Anscombe’s quartet
Four data sets that yield identical summary statistics.

Anscombe's quartet

                 I             II            III             IV
                x      y      x      y      x      y      x      y
               10   8.04     10   9.14     10   7.46      8   6.58
                8   6.95      8   8.14      8   6.77      8   5.76
               13   7.58     13   8.74     13  12.74      8   7.71
                9   8.81      9   8.77      9   7.11      8   8.84
               11   8.33     11   9.26     11   7.81      8   8.47
               14   9.96     14   8.10     14   8.84      8   7.04
                6   7.24      6   6.13      6   6.08      8   5.25
                4   4.26      4   3.10      4   5.39     19  12.50
               12  10.84     12   9.13     12   8.15      8   5.56
                7   4.82      7   7.26      7   6.42      8   7.91
                5   5.68      5   4.74      5   5.73      8   6.89
Mean         9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
Stdev        3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Corr                0.82          0.82          0.82          0.82
alpha hat           3.00          3.00          3.00          3.00
beta hat            0.50          0.50          0.50          0.50

Lesson #1
Never trust your eyes. (Don’t trust summary statistics either.)

Lesson #2
Always employ sanity checks.
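The claim that the four Anscombe data sets share the same summary statistics can be checked directly. The following is a minimal sketch in Python using only the standard library; the function name `summarize` and the dictionary layout are illustrative choices, not part of the slides. It computes the mean and sample standard deviation of x and y, the correlation, and the least-squares intercept (alpha hat) and slope (beta hat) for each data set.

```python
from statistics import mean, stdev

# Anscombe's quartet, as listed in the slide above.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return (mean x, stdev x, mean y, stdev y, corr, alpha hat, beta hat), rounded."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta_hat = sxy / sxx              # least-squares slope
    alpha_hat = my - beta_hat * mx    # least-squares intercept
    corr = sxy / (sxx * syy) ** 0.5   # Pearson correlation
    stats = (mx, stdev(x), my, stdev(y), corr, alpha_hat, beta_hat)
    return tuple(round(s, 2) for s in stats)


for name, (x, y) in QUARTET.items():
    print(name, summarize(x, y))
```

Each data set should print the same rounded tuple, which is the point of the slide: identical summaries say nothing about whether the underlying scatter plots look alike.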
[Figure: Conventional Mortgage Rates and Mystery Variable from 2 Years Prior, 1991 to 2002]

Mystery variable explains 57% of the variation in mortgage rates.
Relationship is: $\text{Rate} = 0.03 + 0.02\,(\text{Mystery Variable})$

Mystery variable is Algeria’s GDP-relative-to-Trade.

Spurious Results
An infinite number of factors can attempt to explain a given outcome.
Look hard enough and you are guaranteed to find a perfect predictor.
If the factor is “spurious,” what you are observing is random chance.

By random chance, the mystery variable predicts mortgage rates over this period.

[Figure: Conventional Mortgage Rates and Mystery Variable from 2 Years Prior, 1977 to 2003]

If you wait long enough, randomness will tell you anything you want to hear.

Letters sent each round:

Round   “DJIA will be down tomorrow!”   “DJIA will be up tomorrow!”
1       200,000 letters                 200,000 letters
2       100,000 letters                 100,000 letters
3       50,000 letters                  50,000 letters
4       25,000 letters                  25,000 letters

Each round, half of the letters predict a fall and half predict a rise. The next round goes only to the recipients of the prediction that turned out to be correct.
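The arithmetic of the letter scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's own model: the starting audience of 400,000 (200,000 “up” plus 200,000 “down”) is read off the slide's diagram, the rule that only recipients of the correct call get the next letter is an interpretation of that diagram, and the function name `run_campaign` is hypothetical.

```python
def run_campaign(initial_recipients, market_moves):
    """Simulate the mailing scheme and return one summary row per round.

    market_moves is a sequence of "up" / "down" outcomes, one per round.
    Each row is (letters saying up, letters saying down, recipients who
    have so far received only correct predictions).
    """
    audience = initial_recipients
    rounds = []
    for move in market_moves:
        letters_up = audience // 2            # told "DJIA will be up tomorrow!"
        letters_down = audience - letters_up  # told "DJIA will be down tomorrow!"
        # Assumption: only recipients of the correct call get the next letter.
        audience = letters_up if move == "up" else letters_down
        rounds.append((letters_up, letters_down, audience))
    return rounds


# The sender needs no forecasting skill: any sequence of market moves
# leaves the same number of recipients with a perfect track record.
for up, down, convinced in run_campaign(400_000, ["down", "up", "up", "down"]):
    print(f"{up:>7,} up | {down:>7,} down | {convinced:>7,} have seen only correct calls")
```

Whatever the market does, four rounds leave 25,000 recipients who have seen four correct predictions in a row, purely by construction.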
[Figure: Number of Sunspots in the Current Year (left axis) and Number of Republicans in the Senate 1 Year in the Future (right axis), 1960 to 1980]
Source: ftp.ngdc.noaa.gov/stp/solar_data/sunspot_numbers/yearly
        www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm

Counterargument: Spurious or not, sunspots would have been useful at predicting Republicans in the Senate.
Fallacy: We see the correlation in hindsight. To be useful, we need to detect the correlation before it ceases to exist.

[Figure: two panels, 1960 to 1980 and 1981 to 2005, each showing Number of Sunspots in the Current Year (left axis) and Number of Republicans in the Senate 1 Year in the Future (right axis)]

Lesson #1
Never trust your eyes. (Don’t trust summary statistics either.)

Lesson #2
Always employ sanity checks.

Lesson #3
An observation is meaningless.

Corollary
An anecdote is both meaningless and dangerous.

Left half of room: Don’t look.
Right half of room: Write what you read.

The average person in Benin earns an annual income of $750 (in U.S. dollars).

Right half of room: Don’t look.
Left half of room: Write what you read.

The average person in Andorra earns an annual income of $40,000 (in U.S. dollars).

The average person on planet Earth earns what annual income (in U.S. dollars)?

Anchoring
When we see a piece of information, we evaluate subsequent information in light of the first piece of information.
Information: News interview of a single mother working three jobs to support her family.
Policy Question: Do we need welfare reform?
Problem: How common is this example?

Left half of room: Don’t look.
Right half of room: Read and answer.

Should we require school districts to pay to install seat belts on school buses?
1 (Definitely not!)   2   3   4   5 (Absolutely!)

Right half of room: Don’t look.
Left half of room: Read and answer.

Every year in the U.S., 17,000 children are treated for injuries sustained in school bus accidents. Most of these injuries could have been avoided had the children been wearing seat belts.
Should we require school districts to pay to install seat belts on school buses?
1 (Definitely not!)   2   3   4   5 (Absolutely!)

Availability
It’s easier to see what’s in front of us than it is to see what isn’t.
Information: News report showing the benefit of school bus seat belts.
Policy Question: Should we require seat belts in school buses?
Problem: What is the expected benefit and what are the tradeoffs?

Lesson #1
Never trust your eyes.

Lesson #2
Always employ sanity checks.

Lesson #3
An observation is meaningless.

Corollary
An anecdote is both meaningless and dangerous.

Lesson #4
Not everything that appears random is.
$y = \alpha + \beta X_1 + u$
$\hat{\alpha} = 50.01$ (8.65), $\hat{\beta} = 0.11$ (0.14), $R^2 = 0.01$
[Figure: plot with axes Y, X1, X2]

$y = \alpha + \beta X_2 + u$
$\hat{\alpha} = 1.18$ (7.56), $\hat{\beta} = 0.50$ (0.06), $R^2 = 0.55$
[Figure: plot with axes Y, X1, X2]

$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u$
$\hat{\alpha} = 0.00$ (0.00), $\hat{\beta}_1 = 1.00$ (0.00), $\hat{\beta}_2 = 1.00$ (0.00), $R^2 = 1.00$
[Figure: plot with axes Y, X1, X2]

Regression
Why do we do this?

A trucking company wants to be able to predict the round-trip travel time of its trucks. Use the data below to predict the round-trip travel time for a truck that will be travelling 200 miles and making 3 deliveries.
Miles Traveled   Deliveries   Travel Time (hours)
           500            4                  11.3
           250            3                   6.8
           500            4                  10.9
           500            2                   8.5
           250            2                   6.2
           400            2                   8.2
           375            3                   9.4
           325            4                   8.0
           450            3                   9.6
           450            2                   8.1

Approach #1: Calculate Average Time per Mile
Trucks in the data set required a total of 87 hours to travel a total of 4,000 miles. Dividing hours by miles, we find an average of about 0.02 hours per mile journeyed.
(0.02 hours per mile) (200 miles) = 4 hours

Approach #2: Calculate Average Time per Delivery
Trucks in the data set required a total of 87 hours to make 29 deliveries. Dividing hours by deliveries, we find an average of 3 hours per delivery.
(3 hours per delivery) (3 deliveries) = 9 hours

Approach #3: Combine Average Time per Mile and Average Time per Delivery
Trucks in the data set required about 0.02 hours per mile journeyed and 3 hours per delivery.
(0.02 hours per mile) (200 miles) + (3 hours per delivery) (3 deliveries) = 13 hours
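The three averaging approaches are plain arithmetic on the column totals, so they are easy to reproduce. The sketch below assumes the ten rows of the trucking data as listed on the slides; the variable names are illustrative. It also shows the effect of rounding the per-mile rate to 0.02 before multiplying, as the slides do.

```python
# Trucking data from the slides: one entry per round trip.
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

total_hours = sum(hours)            # 87 hours in total
total_miles = sum(miles)            # 4,000 miles in total
total_deliveries = sum(deliveries)  # 29 deliveries in total

hours_per_mile = total_hours / total_miles           # 0.02175 before rounding
hours_per_delivery = total_hours / total_deliveries  # 3.0

trip_miles, trip_deliveries = 200, 3

# Approaches #1 to #3 using the rates rounded to two decimals, as on the slides.
slide_1 = round(hours_per_mile, 2) * trip_miles
slide_2 = round(hours_per_delivery, 2) * trip_deliveries
slide_3 = slide_1 + slide_2

# The same approaches without rounding the rates first.
exact_1 = hours_per_mile * trip_miles
exact_2 = hours_per_delivery * trip_deliveries
exact_3 = exact_1 + exact_2

print(f"Approach #1: {slide_1:.2f} hours (unrounded rate: {exact_1:.2f})")
print(f"Approach #2: {slide_2:.2f} hours (unrounded rate: {exact_2:.2f})")
print(f"Approach #3: {slide_3:.2f} hours (unrounded rate: {exact_3:.2f})")
```

Approach #3 simply adds the first two estimates, so every hour in the data is attributed once to miles and once again to deliveries.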
Problems
1. Combining average time per delivery and average time per mile will double-count time if deliveries and miles are correlated.
2. We have ignored a possible fixed effect: an amount of “overhead” time that is required regardless of the number of miles and deliveries.

$\text{Time}_i = \beta_0 + \beta_1 (\text{deliveries}_i) + u_i$
$\hat{\beta}_0 = 5.38$, $\hat{\beta}_1 = 1.14$
5.38 hours + (1.14 hours per delivery) (3 deliveries) = 8.8 hours

$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + u_i$
$\hat{\beta}_0 = 3.27$, $\hat{\beta}_1 = 0.01$
3.27 hours + (0.01 hours per mile) (200 miles) = 5.27 hours

$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
$\hat{\beta}_0 = 1.13$, $\hat{\beta}_1 = 0.01$, $\hat{\beta}_2 = 0.92$
1.13 hours + (0.01 hours per mile) (200 miles) + (0.92 hours per delivery) (3 deliveries) = 5.89 hours

$\text{Time}_i = \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
$\hat{\beta}_1 = 0.01$, $\hat{\beta}_2 = 1.07$
(0.01 hours per mile) (200 miles) + (1.07 hours per delivery) (3 deliveries) = 5.21 hours

Summary of estimates for a trip of 200 miles and 3 deliveries:

Method                                         Hours per Mile   Hours per Delivery   Fixed Hours   Estimated Hours
Approach #1 (average per mile)                           0.02                    -             -              4.00
Approach #2 (average per delivery)                          -                 3.00             -              9.00
Approach #3 (both averages)                              0.02                 3.00             -             13.00
Regression on deliveries                                    -                 1.14          5.38              8.80
Regression on miles                                      0.01                    -          3.27              5.27
Regression on miles and deliveries                       0.01                 0.92          1.13              5.89
Regression on miles and deliveries, no fixed             0.01                 1.07             -              5.21
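The four regressions can be reproduced with ordinary least squares. The sketch below is one way to do it in pure Python by solving the normal equations; the helper names `solve`, `ols`, and `predict` are illustrative and not from the slides, and a statistics package would normally be used instead. Note that the slides multiply by coefficients rounded to two decimals (for example 0.01 hours per mile), so the sketch reports both the slide-style prediction and the prediction from unrounded coefficients, which should come out nearer 6 hours for the models that include miles.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (normal equations)."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def predict(coefs, values):
    """Dot product of coefficients and regressor values for one trip."""
    return sum(c * v for c, v in zip(coefs, values))


# Trucking data from the slides.
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]
ones = [1.0] * len(hours)  # column of ones gives the intercept ("fixed hours")

# Each model: (regressor columns, regressor values for 200 miles and 3 deliveries).
models = {
    "deliveries only": ([ones, deliveries], [1, 3]),
    "miles only": ([ones, miles], [1, 200]),
    "miles and deliveries": ([ones, miles, deliveries], [1, 200, 3]),
    "miles and deliveries, no intercept": ([miles, deliveries], [200, 3]),
}

results = {}
for name, (columns, trip) in models.items():
    coefs = ols(columns, hours)
    rounded = [round(c, 2) for c in coefs]
    # Store: rounded coefficients, slide-style prediction, unrounded prediction.
    results[name] = (rounded, predict(rounded, trip), predict(coefs, trip))
    print(f"{name}: coefficients {rounded}, "
          f"slide-style {results[name][1]:.2f} h, unrounded {results[name][2]:.2f} h")
```

Dropping the column of ones is what removes the fixed-hours term in the last model, which forces the fitted relationship through zero hours at zero miles and zero deliveries.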