Paul’s memo

Superpopulation: A Statistical, Philosophical, and Historical Motivation
You may have heard once or twice that frequentist statistics (i.e. the statistics commonly taught in an introductory statistics course) are designed to draw inferences from a random sample drawn from a larger population.1 For many social scientists, this idea should give pause. Most social scientists work with
observational data, not data drawn from a survey or an experiment. Thus, our data is not a random
sample, but is instead the actual population of events. For example, if one has the Correlates of War
(COW) dataset of all wars from 1816 to 2010, this is not a sample of wars, but is instead the population
of all known wars. Moreover, there is probably nothing random about the onset of war or the factors
that caused the wars.
That the factors explaining war onset (our `independent variables’) were not randomly assigned is not a
huge concern (this is the rationale for conditioning on covariates, whether through regression adjustment, matching techniques, or a combination of the two). Instead, we should be concerned that the observations themselves were not randomly drawn from a larger population. Without this larger population, the entire exercise of null hypothesis testing is meaningless!
Let’s reflect on this point for a moment. We are taught to use statistics as a way of determining if some
parameter of interest differs in a `meaningful’ way from a null hypothesis value (which is typically
assumed to be zero).2 By `meaningful’, what we really mean is `statistically significant’: the observed (estimated) value of the parameter is unlikely to differ from the null value (usually zero) due to `chance’ alone (where the threshold for not being due to chance is a pre-defined significance level
chosen by the analyst). Stated differently, we usually ask, ``If we assume that the null hypothesized
value of the statistic (zero) is true, then what is the probability of observing a non-zero value for the
statistic that is at least as extreme as the non-zero value that we actually observed?’’ In other words, is
the fact that we observed a non-zero value for some statistic simply explained by random error or did it
have a cause?
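To make this logic concrete, here is a short sketch in Python (my own illustration, not anything from the memo’s sources; the estimate, sample sizes, and noise level are all made up). It assumes the null value of zero is true, simulates the sampling distribution of a simple difference in means, and counts how often chance alone produces a value at least as extreme as the one we `observed’:

    # Hypothetical illustration: a two-group difference in means.
    # Under the null, the true difference is zero; we ask how often
    # random noise alone produces an estimate at least as extreme as
    # the (made-up) observed difference of 0.35.
    import random

    random.seed(42)

    observed_diff = 0.35    # hypothetical estimate from our data
    n_per_group = 50        # hypothetical sample size in each group
    noise_sd = 1.0          # hypothetical spread of the outcome
    n_simulations = 20_000

    def simulated_null_diff():
        """Difference in group means when the true difference is zero."""
        group_a = [random.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        group_b = [random.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        return sum(group_a) / n_per_group - sum(group_b) / n_per_group

    extreme = sum(abs(simulated_null_diff()) >= abs(observed_diff)
                  for _ in range(n_simulations))
    print(f"two-sided p-value: about {extreme / n_simulations:.3f}")

With these invented numbers the estimate sits a little less than two standard errors from zero, so the simulated p-value should land near 0.08, which brings us to the question of thresholds.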
When answering such a question, we commonly apply a threshold of 1 out of 20: if the probability of observing a value of the statistic at least as extreme as the one we actually observed, assuming the null value is true (this probability is the p-value), is equal to or less than 1 out of 20, then it is unlikely that the observed value of the statistic differs from the null (zero) value by random chance alone.3 This gives us `confidence’ that the estimated value may, in actuality, capture the true value. In this case, we make statements such as ``the relationship between X and Y is statistically significant at the 0.95 confidence level’’, which is another way of saying ``the difference between the estimated relationship between X and Y and the true relationship between X and Y is just the product of random, imperfect sampling, so my estimate is a fair representation of the actual relationship.’’ If, however, the probability is greater than 1 out of 20 (or even 1 out of 10), then we commonly presume that the difference between the observed value and the null value is due to random chance.
1 See Casella and Berger, Statistical Inference (2001). For a brief discussion of the origins of the frequentist school of statistics, also referred to as the hypothesis testing school, see Gill’s discussion of the historical development of significance testing in Political Research Quarterly (September 1999).
2 Keep in mind that the error term in statistical models contains two components: that which we don’t observe (uncertainty or omitted variables) and that which we can’t observe (randomness).
3 To put statistical significance in perspective (and to see how arbitrary the standard 1 out of 20 chance is), consider the following example from Mlodinow (2008): ``In a 7-game series there is a sizable chance that the inferior team will be crowned champion. For instance, if one team is good enough to warrant beating another in 55 percent of its games, the weaker team will nevertheless win a 7-game series about 4 times out of 10. And if the superior team could be expected to beat its opponent, on average, 2 out of 3 times they meet, the inferior team will still win a 7-game series about once every 5 matchups. There really is no way for sports leagues to change this. In the lopsided 2/3 probability case, for example, you’d have to play a series consisting of at minimum the best of 23 games to determine the winner with what is called statistical significance, meaning the weaker team would be crowned champion 5 percent or less of the time. And in the case of one team’s having only a 55-45 edge, the shortest statistically significant `world series’ would be the best of 269 games!’’ (p. 71).
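The arithmetic in the Mlodinow footnote is easy to check. The sketch below (again my own, not Mlodinow’s code) treats each game as an independent coin flip with the weaker team’s per-game win probability p and computes the chance that the weaker team wins a best-of-n series:

    # Probability that the weaker team wins a best-of-n series, with
    # independent games and per-game win probability p for that team.
    from math import comb

    def weaker_team_wins_series(p, n_games):
        wins_needed = n_games // 2 + 1
        return sum(comb(n_games, k) * p**k * (1 - p)**(n_games - k)
                   for k in range(wins_needed, n_games + 1))

    def shortest_significant_series(p):
        """Shortest odd-length series holding the weaker team to a 5% chance."""
        n = 1
        while weaker_team_wins_series(p, n) > 0.05:
            n += 2
        return n

    print(weaker_team_wins_series(0.45, 7))      # ~0.39: "about 4 times out of 10"
    print(weaker_team_wins_series(1 / 3, 7))     # ~0.17: roughly once every 5 matchups
    print(shortest_significant_series(1 / 3))    # 23: the best-of-23 series in the footnote
    print(weaker_team_wins_series(0.45, 269))    # roughly 0.05: the best-of-269 'world series'

Even a full best-of-7 `sample’ of games, in other words, is far too small to separate a 55 percent team from its opponent at the conventional threshold.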
But words such as `chance’ or `probability’ suggest uncertainty: we don’t know exactly what the true
value is, so we place a probability on observing it. This is a problem because, if we have the population
(such as, in the case of COW, the population of wars), then notions of `chance’ or `probability’ are, in a
sense, meaningless.4 Mind you, we still can obtain, conditional on a host of control variables, the
parameter relating X to Y (perhaps the frequency that democracies are involved in wars, conditional on
the military capabilities of the countries and their distance from one another). However, without an
element of randomness (meaning the data were drawn from a larger population), stating that something is
statistically significant is just plain silly.
Or is it? While we might, for instance, be able to observe all wars that occurred over the past 200 years and we might be able to tally the casualties of those wars, one would be mistaken to say that those figures were not generated by some degree of randomness. For example, consider fatalities from the American Civil War. For nearly 150 years, estimates of the war’s casualties were placed at 620,000.
However, recent research says the tally could be closer to 850,000 (see Hacker in Civil War History
2012). While this discrepancy is due to measurement error (which is not what we mean by
randomness), it does highlight a larger point: assuming there is no measurement error, why did the
American Civil War produce 620,000 casualties? Why couldn’t there have been 621,000 casualties,
610,000 casualties, or even 675,000 casualties? A lot of this has to do with randomness. Indeed, as
many a military historian (or veteran of war) will tell you, why one individual is struck by a bullet and not
another is often complete chance.
The world is full of randomness and contingency. I have always found it interesting how St. Thomas Aquinas, in
his Summa Theologica, when addressing the presence of evil in the world and whether it can generate
good outcomes, ultimately relies on the notion of randomness. When asked if good outcomes can result
from evil actions, Aquinas states that ``evil is not of itself ordered to good, but accidentally. For it is
beside the intention of the sinner that any good should follow from his sin’’ (Summa Theologica, Vol 1,
Q19, A9). In other words, evil sometimes accidentally (i.e. randomly) results in good and
sometimes…well…it doesn’t.
My favorite book on the subject of contingency and randomness in the observable world is not a work of
statistics, philosophy, or theology, but the volume What If? (Berkley Trade Publishing, 2000). This is a
collection of counterfactual histories by eminent historians. Upon reading these highly entertaining and
enlightening essays, one might become unsettled by the extent to which major events in world history
were conditional on seemingly trivial and uncontrollable factors (e.g. the weather and tides prior to the
D-Day invasion of Normandy).
4 It is at this point that some people turn to Bayesian statistics (Gill, Bayesian Methods, 2003), which have the nice property of assuming that the data are fixed and that the relationship between the variables (i.e. the coefficient) is a random variable (rather than assuming that the relationship is fixed, but the data are random). Whether this is a `better’ way of conceptualizing the use of statistics to systematically evaluate evidence is a discussion for another time and place.
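As a minimal illustration of the footnote’s point (my own sketch, not Gill’s, with invented counts): the data stay fixed while the unknown quantity, here a simple proportion, gets a probability distribution that we update.

    # Hypothetical sketch of the Bayesian view in the footnote above:
    # the data (12 'events' in 95 observations) are held fixed, while
    # the unknown proportion theta is treated as a random variable.
    events, trials = 12, 95                  # invented counts, held fixed
    grid = [i / 1000 for i in range(1, 1000)]

    # Flat prior times binomial likelihood, evaluated on a grid of theta values
    unnormalized = [t**events * (1 - t)**(trials - events) for t in grid]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]

    # Posterior mean and a rough 90% credible interval for theta
    mean = sum(t * p for t, p in zip(grid, posterior))
    cum, lower, upper = 0.0, None, None
    for t, p in zip(grid, posterior):
        cum += p
        if lower is None and cum >= 0.05:
            lower = t
        if upper is None and cum >= 0.95:
            upper = t
    print(f"posterior mean {mean:.3f}, 90% interval ({lower:.3f}, {upper:.3f})")

Nothing here settles which school is `better’; it only shows the contrast in mechanics: the counts never change, while the distribution over the unknown quantity does.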
To further emphasize the role of randomness and contingency in the world, consider the view of former
movie studio executive David Picker (former President of United Artists, Paramount, and Columbia
Pictures). He states ``if I had said yes to all the projects I turned down, and no to all the other ones I
took, it would have worked out about the same.’’5 The physicist Leonard Mlodinow may have said it
best in his book The Drunkard’s Walk: ``When we look at extraordinary accomplishments in sports – or
elsewhere – we should keep in mind that extraordinary events can happen without extraordinary
causes. Random events often look like nonrandom events, and in interpreting human affairs we must
take care not to confuse the two’’ (Mlodinow 2008, 20).
These examples highlight the notion that the world we observe is just one realization of what could (quite easily) have been. This brings us to our core idea: superpopulations. A superpopulation is a hypothetical infinite population from which the finite observed population is a sample (see Hartley and Sielken in Biometrics, June 1975).6 For example, while COW identifies 95 wars, some crises might have
become wars (but, of course, did not). Hence, these 95 wars come from a larger `population’ of possible
(possible in the past and in the future) and realized wars.7
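To see what this framing implies in practice, here is a small simulation (my own construction; the number of potential crises and the escalation probability are made up, chosen only so that a typical run produces something like 95 wars). Each run of `history’ draws a different set of realized wars from the same underlying process, so even a complete record of `all’ wars in a given history is just one realization:

    # Hypothetical superpopulation sketch: the same underlying process,
    # rerun many times, yields a different realized 'population' of wars
    # each time, so a statistic computed on the full record still varies.
    import random

    random.seed(7)

    n_crises = 400       # invented number of potential conflicts per history
    p_escalate = 0.24    # invented chance that a crisis escalates into war

    def one_history():
        """Number of realized wars in one run of history."""
        return sum(random.random() < p_escalate for _ in range(n_crises))

    realizations = [one_history() for _ in range(10_000)]
    mean_wars = sum(realizations) / len(realizations)
    print(f"average wars per history: {mean_wars:.1f}")
    print(f"range across alternate histories: {min(realizations)}-{max(realizations)}")

Each simulated history yields a complete count of `all’ its wars, yet the count still bounces around from one realization to the next; that variability is precisely the uncertainty the superpopulation framing asks us to take seriously.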
5 Quoted in Mlodinow 2008, p. 12.
6 For the philosophical foundation of superpopulation, see Alvin Plantinga’s discussion of chance and alternate worlds in his The Nature of Necessity. Also, look up Nozick’s ``principle of fecundity’’, which holds that the world in which we live (and, hence, observe) is just one of a nearly innumerable number of universes. In short, both ideas hold that the events we observe are just the realizations of what might have been.
7 The idea that ``future’’ events are included in the superpopulation is an important one. Andrew Gelman makes this point in his contribution to Field Experiments and Their Critics: ``Textbook presentations often imply that the goal of causal inference is to learn about the units who happen to be in the study. Invariably, though, these are a sample from a larger population of interest. Even when the study appears to include the entire population – for example, an analysis of all 50 [US] states – the ultimate questions apply to a superpopulation such as these same states in future years.’’