Technology, Data Collection, and Analysis

Association of Private Enterprise Education
April 6-8, 2008
Most people understand that data can be misrepresented via visual sleight-of-hand.
[Figure: Price of Gas as % of Average Hourly Earnings, 1981 vs. 2008; vertical axis truncated to run from 16.5% to 18.5%]
[Figure: Price of Gas as % of Average Hourly Earnings, 1981 vs. 2008; vertical axis runs from 0.0% to 18.0%]
[Figure: Price of 1-Mile's Worth of Gas as % of Average Disposable Hourly Earnings, 1981 vs. 2008; vertical axis runs from 0.0% to 18.0%]
Similar misrepresentation occurs when not enough
data is collected, the wrong type of data is collected,
or the data is aggregated.
Lesson #1: A single observation is meaningless.
Corollary: An anecdote is both meaningless and
dangerous.
On January 25, 1994, Bill Clinton gave his first
State of the Union Address.
The next day, the Dow Jones Industrial Average
rose. Pundits took this as evidence of the market’s
approval of the policies Clinton outlined in the Address.
[Figure: Growth on 01/26/94, DJIA]
A single data point contains no meaning.
A mean is what you get when you collect a bunch
of individual data points.
Since N × nothing = nothing, N …
Lesson #2: A mean is meaningless.
Corollary: A mean is dangerous because obtaining
it involves simple math and people trust
math they can do.
[Figure: Growth on 01/26/94, DJIA vs. Expected]
If a single data point is meaningless, then comparing a
single data point to a mean is meaninglessness wrapped
in the illusion of meaning.
Comparing a single observation to a time series
reveals information because a time series reveals
variance.


Variance over time:

$$\sigma^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( x_t - \frac{1}{T} \sum_{s=1}^{T} x_s \right)^2$$
Lesson #3: A variance is meaningful.
Corollary: A variance is dangerous because obtaining
it involves complicated math and people
don’t trust math they can’t do.
[Figure: Daily Growth from 01/26/93 to 01/26/94]
Variance over time reveals the significance of a single observation.
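One way to make this concrete is to express a single day's growth as a distance from the historical mean, measured in sample standard deviations. The sketch below assumes made-up growth figures (they are not actual DJIA data) and uses the same T − 1 denominator as the variance formula.

```python
from statistics import mean, stdev

# Hypothetical daily growth rates in percent; NOT actual DJIA data.
history = [0.10, -0.45, 0.30, 0.75, -0.20, -0.60, 0.05, 0.40, -0.35, 0.25]
observation = 0.31  # hypothetical single-day growth to be judged


def z_score(x, series):
    """Distance of x from the series mean, in sample standard deviations."""
    # stdev() divides by T - 1, matching the sample variance formula.
    return (x - mean(series)) / stdev(series)


z = z_score(observation, history)
print(f"mean = {mean(history):.3f}%, sd = {stdev(history):.3f}%, z = {z:.2f}")
```

With these invented numbers the observation lies within one standard deviation of the mean, so it would not stand out against ordinary day-to-day variation.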
Comparing a single observation to a cross-section
reveals information because a cross-section
reveals variance.
Lesson #4: Variance can be revealed both in time
series and in cross-sectional data.
[Figure: Growth on 1/26/94 for DJIA, Nikkei 225, DAX, CAC 40, and Hang Seng]
Comparing a single observation to both a time series
and a cross-section reveals a lot of information
because the two dimensions give different information
on variance. Data with both dimensions is panel data.
Lesson #5: Panel data is extremely meaningful.
Corollary: If you thought variances were
dangerous, panel data is downright
witchcraft.
[Figure: Range of Random Variation in Daily Growth for DJIA, Nikkei, DAX, CAC 40, and Hang Seng]
Variance over time reveals significance relative to the past.
Variance over cross-section reveals significance relative to others.
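The two comparisons can be sketched side by side. The example below assumes a small invented panel (five hypothetical markets labeled A to E, six days each, none of it real index data) and scores the last day for one market twice: once against its own past, and once against the other markets on the same day.

```python
from statistics import mean, stdev

# Hypothetical daily growth rates (percent) for five made-up markets.
# The last entry in each list is the day being judged.
panel = {
    "A": [0.2, -0.4, 0.1, 0.5, -0.3, 0.3],
    "B": [0.6, -1.1, 0.9, -0.7, 1.2, 1.5],
    "C": [0.1, 0.0, -0.2, 0.2, -0.1, 0.9],
    "D": [-0.5, 0.8, 0.4, -0.9, 0.6, 1.1],
    "E": [0.3, -0.2, 0.0, 0.4, -0.6, 1.3],
}


def z_score(x, sample):
    """Distance of x from the sample mean, in sample standard deviations."""
    return (x - mean(sample)) / stdev(sample)


market = "A"
*past, today = panel[market]

# Significance relative to the past: compare today with the market's own history.
z_time = z_score(today, past)

# Significance relative to others: compare today with all markets on the same day.
same_day = [series[-1] for series in panel.values()]
z_cross = z_score(today, same_day)

print(f"{market}: z over time = {z_time:.2f}, z over cross-section = {z_cross:.2f}")
```

In this invented panel, market A's last day looks ordinary against its own history but weak against the other markets on that day, which is the kind of difference neither dimension shows alone.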
This is too complicated. Why not just use time series?
Lesson #6: A time series with few observations
is as meaningless as a single
observation.
[Figure: Unemployment Rate and Trade as % of GDP, 1973 vs. 2005]
A comparison of two points in time reveals that greater
trade is associated with greater unemployment.
[Figure: Unemployment Rate vs. Trade (imports plus exports) as % of GDP]
The fact that trade reduces unemployment is only
revealed after examining many observations.
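A two-point comparison and a fit over all observations can point in opposite directions. The sketch below assumes invented trade and unemployment figures (not the data behind the charts) and compares the slope between the first and last observations with an ordinary least squares slope over the whole series.

```python
# Hypothetical (trade share %, unemployment %) pairs in time order; invented data.
trade = [10, 12, 14, 16, 18, 20, 22, 24, 26]
unemp = [4.9, 8.5, 7.8, 7.2, 6.5, 6.0, 5.6, 5.2, 5.0]


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Slope implied by comparing only the first and last observations.
two_point = (unemp[-1] - unemp[0]) / (trade[-1] - trade[0])

# Slope implied by all observations.
full_slope = ols_slope(trade, unemp)

print(f"two-point slope = {two_point:+.4f}, full-sample slope = {full_slope:+.4f}")
```

Here the endpoints alone suggest that unemployment rises with trade, while the full series slopes downward.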
Lesson #7: Even with many observations, time
does not cure all ills.
[Figure: Mali, 1975 to 1995: Human Development Index and Govt Spending as % of GDP]
Twenty years’ worth of data reveal a positive relationship
between government spending and the HDI.
[Figure: Austria, 1975 to 2001: Human Development Index and Govt Spending as % of GDP]
Twenty years’ worth of data reveal a positive relationship
between government spending and the HDI.
Recall
Lesson #1: A single observation is meaningless.
If a single observation is meaningless, then
perhaps so too is a single time series. Let’s look at
the average time series across countries…
[Figure: Mean Over All Countries, 1975 to 2001]
The apparent relationship between HDI and the size
of government is seen in a different light after
examining many time series.
Recall
Lesson #2: A mean is meaningless.
How are all the individual countries behaving?
[Figure: Standard Errors of Means Over All Countries, 1975 to 2001]
We get more information from looking at many
individual countries than from looking at means.
Panel data does not lend itself well to graphing.
But, panel data contains rich information that is
found in neither time series nor cross-sectional
data.
Econometric techniques can extract that information.
[Figure: Government Spending that Maximizes HDI]
Panel data enables us to filter out noise that occurs across
time and across countries to see underlying relationships.
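The slides do not say which econometric technique was used. As one common possibility, the sketch below assumes a two-way fixed effects ("within") transformation: subtract each country's mean and each year's mean, then fit a slope on what is left. The panel is invented and built with a known slope of -0.5, so the result can be checked; `country_effect`, `year_effect`, and the country labels are all hypothetical.

```python
from statistics import mean

# Hypothetical balanced panel built so the true underlying slope is known.
TRUE_SLOPE = -0.5
country_effect = {"P": 10.0, "Q": 20.0, "R": 30.0}  # noise across countries
year_effect = [0.0, 1.0, 2.0, 3.0]                  # noise across time
x = {"P": [10, 12, 11, 13], "Q": [20, 21, 23, 22], "R": [30, 33, 31, 32]}
y = {
    c: [country_effect[c] + year_effect[t] + TRUE_SLOPE * x[c][t]
        for t in range(len(year_effect))]
    for c in x
}


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sum((a - mx) ** 2 for a in xs)


def two_way_demean(panel):
    """Remove country means and year means (adding back the grand mean)."""
    countries = list(panel)
    periods = range(len(panel[countries[0]]))
    grand = mean(v for c in countries for v in panel[c])
    c_mean = {c: mean(panel[c]) for c in countries}
    t_mean = [mean(panel[c][t] for c in countries) for t in periods]
    return [panel[c][t] - c_mean[c] - t_mean[t] + grand
            for c in countries for t in periods]


# Pooled fit ignores the panel structure; within fit filters out both effects.
pooled = slope([v for c in x for v in x[c]], [v for c in x for v in y[c]])
within = slope(two_way_demean(x), two_way_demean(y))

print(f"pooled slope = {pooled:+.3f}, within slope = {within:+.3f}")
```

In this constructed example the pooled slope is positive because countries with larger x also have larger fixed effects, while the within slope recovers the -0.5 used to build the data.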
Panel data can be visualized, but doing so requires
animation (see gapminder.org).
Moral of the Story
Data yields the greatest information when the data is:
• Disaggregated: reporting averages hides information
• Time series: reporting a snapshot hides trends
• Cross-sectional: reporting one instance of a time series hides atypical trends
For discerning truth from noise, disaggregated panel data is
the tool of choice.