Technology, Data Collection, and Analysis
Association of Private Enterprise Education, April 6-8, 2008

Most people understand that data can be misrepresented via visual sleight-of-hand.

[Figure: Price of Gas as % of Average Hourly Earnings, 1981 vs. 2008; vertical axis from 16.5% to 18.5%]

[Figure: Price of Gas as % of Average Hourly Earnings, 1981 vs. 2008; vertical axis from 0.0% to 18.0%]

[Figure: Price of 1-Mile's Worth of Gas as % of Average Disposable Hourly Earnings, 1981 vs. 2008; vertical axis from 0.0% to 18.0%]

Similar misrepresentation occurs when not enough data is collected, the wrong type of data is collected, or the data is aggregated.

Lesson #1: A single observation is meaningless.
Corollary: An anecdote is both meaningless and dangerous.

On January 25, 1994, Bill Clinton gave his first State of the Union Address. The next day, the Dow-Jones Industrial Average rose. Pundits took this as evidence of the market's approval of the policies Clinton outlined in the Address.

[Figure: Growth on 01/26/94, DJIA; vertical axis from 0.00% to 0.35%]

A single data point contains no meaning. A mean is what you get when you collect a bunch of individual data points. Since

$$\frac{\sum^{N} \text{nothing}}{N} = \text{nothing},$$

Lesson #2: A mean is meaningless.
Corollary: A mean is dangerous because obtaining it involves simple math and people trust math they can do.

[Figure: Growth on 01/26/94, DJIA vs. Expected; vertical axis from 0.00% to 0.35%]

If a single data point is meaningless, then comparing a single data point to a mean is meaninglessness wrapped in the illusion of meaning.

Comparing a single observation to a time series reveals information because a time series reveals variance.

Variance over time:

$$s^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( x_t - \frac{1}{T} \sum_{s=1}^{T} x_s \right)^2$$

Lesson #3: A variance is meaningful.
Corollary: A variance is dangerous because obtaining it involves complicated math and people don't trust math they can't do.
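To make the variance-over-time idea concrete, here is a minimal Python sketch of the sample variance (squared deviations from the series mean, divided by T - 1) and of how far a single observation sits from the mean of a series, measured in standard deviations. The function names and the `history` numbers are made up for illustration; they are not the DJIA data from the slides.

```python
def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by T - 1."""
    T = len(xs)
    if T < 2:
        # A single observation carries no information about variance.
        raise ValueError("need at least two observations to measure variance")
    x_bar = sum(xs) / T
    return sum((x - x_bar) ** 2 for x in xs) / (T - 1)


def standardized_distance(observation, history):
    """Distance of the observation from the history's mean, in standard deviations."""
    x_bar = sum(history) / len(history)
    return (observation - x_bar) / sample_variance(history) ** 0.5


# Made-up daily growth rates in percent (illustrative only, not market data).
history = [0.5, -0.5, 1.0, -1.0, 0.0]
z = standardized_distance(0.25, history)
```

With these invented numbers, a 0.25% move sits well within one standard deviation of the series mean, which is the sense in which variance tells us whether one day's growth stands out.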
[Figure: Daily Growth from 01/26/93 to 01/26/94; horizontal axis in monthly ticks from 26-Jan-93 to 26-Jan-94, vertical axis from -3.00% to 2.50%]

Variance over time reveals the significance of a single observation.

Comparing a single observation to a cross-section reveals information because a cross-section reveals variance.

Lesson #4: Variance can be revealed both in time series and in cross-sectional data.

[Figure: Growth on 1/26/94 for DJIA, Nikkei 225, DAX, CAC 40, Hang Seng; vertical axis from 0.00% to 2.50%]

Comparing a single observation to both a time series and a cross-section reveals a lot of information because the two dimensions give different information on variance. This is panel data.

Lesson #5: Panel data is extremely meaningful.
Corollary: If you thought variances were dangerous, panel data is downright witchcraft.

[Figure: Range of Random Variation in Daily Growth for DJIA, Nikkei, DAX, CAC 40, Hang Seng; vertical axis from -3.00% to 4.00%]

Variance over time reveals significance relative to the past.
Variance over cross-section reveals significance relative to others.

This is too complicated. Why not just use time series?

Lesson #6: A time series with few observations is as meaningless as a single observation.

[Figure: Unemployment Rate (axis from 4.5% to 5.0%) and Trade as % of GDP (axis from 0% to 30%), 1973 vs. 2005]

A comparison of two points in time suggests that greater trade is associated with greater unemployment.

[Figure: Unemployment Rate (vertical axis, 0% to 12%) plotted against Trade (imports plus exports) as % of GDP (horizontal axis, 10% to 30%)]

The fact that trade reduces unemployment is revealed only after examining many observations.

Lesson #7: Even with many observations, time does not cure all ills.
[Figure: Mali, 1975 to 1995; Human Development Index (axis from 0.20 to 0.32) and Govt Spending as % of GDP (axis from 0% to 25%)]

Twenty years' worth of data reveal a positive relationship between government spending and the HDI.

[Figure: Austria, 1975 to 2001; Human Development Index (axis from 0.80 to 0.96) and Govt Spending as % of GDP (axis from 15% to 21%)]

Twenty years' worth of data reveal a positive relationship between government spending and the HDI.

Recall Lesson #1: A single observation is meaningless.

If a single observation is meaningless, then perhaps so too is a single time series. Let's look at the average time series across countries…

[Figure: Mean Over All Countries, 1975 to 2001; two vertical axes, 0.54 to 0.74 and 14.0% to 19.0%]

The apparent relationship between HDI and the size of government is seen in a different light after examining many time series.

Recall Lesson #2: A mean is meaningless.

How are all the individual countries behaving?

[Figure: Standard Errors of Means Over All Countries, 1975 to 2001; two vertical axes, 0.54 to 0.74 and 14.0% to 19.0%]

We get more information from looking at many individual countries than from looking at means.

Panel data does not lend itself well to graphing. But panel data contains rich information that is found in neither time series nor cross-sectional data. Econometric techniques can extract that information.

[Figure: Government Spending that Maximizes HDI]

Panel data enables us to filter out noise that occurs across time and across countries to see underlying relationships.

Panel data can be visualized, but doing so requires animation.
gapminder.org

Moral of the Story

Data yields the greatest information when the data is:
• Disaggregated: reporting averages hides information
• Time series: reporting a snapshot hides trends
• Cross-sectional: reporting one instance of a time series hides atypical trends

For discerning truth from noise, disaggregated panel data is the tool of choice.