Superpopulation: A Statistical, Philosophical, and Historical Motivation

You may have heard once or twice that frequentist statistics (i.e. the statistics commonly taught in an introductory statistics course) are designed to draw inferences from a random sample taken from a larger population.[1] For many social scientists, this idea should give pause. Most social scientists work with observational data, not data drawn from a survey or an experiment. Thus, our data are not a random sample, but are instead the actual population of events. For example, if one has the Correlates of War (COW) dataset of all wars from 1816 to 2010, this is not a sample of wars, but is instead the population of all known wars. Moreover, there is probably nothing random about the onset of war or the factors that caused the wars. That the factors explaining war onset (our 'independent variables') were not randomly assigned is not a huge concern (this is the rationale for conditioning on covariates, whether through regression adjustment, matching techniques, or a combination of the two). Instead, we should be concerned that the observations themselves were not randomly drawn from a larger population. Without this larger population, the entire exercise of null hypothesis testing is meaningless!

Let's reflect on this point for a moment. We are taught to use statistics as a way of determining whether some parameter of interest differs in a 'meaningful' way from a null hypothesis value (which is typically assumed to be zero).[2] By 'meaningful', what we really mean is 'statistical': the observed (estimated) value of the parameter is unlikely to differ from the null value (usually zero) due to 'chance' alone (where the threshold for not being due to chance is a pre-defined significance level chosen by the analyst). Stated differently, we usually ask, "If we assume that the null hypothesized value of the statistic (zero) is true, then what is the probability of observing a non-zero value for the statistic that is at least as extreme as the non-zero value we actually observed?" In other words, is the fact that we observed a non-zero value for some statistic simply explained by random error, or did it have a cause? When answering such a question we commonly apply a threshold of 1 out of 20, meaning that if the probability of witnessing a value at least as extreme as the one observed (usually referred to as a p-value) is equal to or less than 1 out of 20, then it is unlikely that the observed value differs from the null (zero) value by random chance alone.[3] This gives us 'confidence' that the estimated value may, in actuality, capture the true value. In this case, we make statements such as "the relationship between X and Y is statistically significant at the 0.95 confidence level", which is another way of saying "the difference between the estimated relationship between X and Y and the true relationship between X and Y is random and imperfect, so my estimate of X and Y is a true representation of the actual relationship." If, however, the probability is greater than 1 out of 20 (or even 1 out of 10), then we commonly presume that the difference between the observed value and the null value is due to random chance.
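To make this logic concrete, the question above can be phrased as a short simulation: generate many datasets in which the null value of zero really is true, and ask how often chance alone produces an estimate at least as extreme as the one we observed. The sketch below (in Python) is purely illustrative; the sample size, observed estimate, and normal data-generating process are hypothetical choices, not anything taken from the COW data.

    # A minimal sketch of the null-hypothesis question posed above:
    # "If the true value is zero, how often would chance alone produce
    # an estimate at least as extreme as the one we observed?"
    # All numbers here are hypothetical illustrations.
    import numpy as np

    rng = np.random.default_rng(42)

    n = 100                # hypothetical sample size
    observed_mean = 0.21   # hypothetical estimate from our data
    n_sims = 100_000       # number of simulated 'chance-only' datasets

    # Simulate a world in which the null is true: the outcome has mean zero
    # (unit variance here), and record the sample mean each time.
    null_means = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n)).mean(axis=1)

    # Two-sided p-value: the share of chance-only estimates at least as
    # extreme as the estimate we actually observed.
    p_value = np.mean(np.abs(null_means) >= abs(observed_mean))

    print(f"simulated p-value: {p_value:.3f}")
    # If p_value <= 0.05 (the conventional 1-in-20 threshold), we would call
    # the observed estimate 'statistically significant'.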
But words such as 'chance' or 'probability' suggest uncertainty: we don't know exactly what the true value is, so we place a probability on observing it. This is a problem because, if we have the population (such as, in the case of COW, the population of wars), then notions of 'chance' or 'probability' are, in a sense, meaningless.[4] Mind you, we can still obtain, conditional on a host of control variables, the parameter relating X to Y (perhaps the frequency with which democracies are involved in wars, conditional on the military capabilities of the countries and their distance from one another). However, without an element of randomness (meaning the data were drawn from a larger population), stating that something is statistically significant is just plain silly.

Or is it? While we might, for instance, be able to observe all wars that occurred over the past 200 years, and we might be able to tally the casualties of those wars, one would be remiss to say that those figures were not generated by some degree of randomness. For example, consider fatalities from the American Civil War. For nearly 150 years, estimates of the war's casualties were placed at 620,000. However, recent research says the tally could be closer to 850,000 (see Hacker in Civil War History 2012). While this discrepancy is due to measurement error (which is not what we mean by randomness), it does highlight a larger point: assuming there is no measurement error, why did the American Civil War produce 620,000 casualties? Why couldn't there have been 621,000 casualties, 610,000 casualties, or even 675,000 casualties? A lot of this has to do with randomness. Indeed, as many a military historian (or veteran of war) will tell you, why one individual is struck by a bullet and not another is often complete chance.

[1] See Casella and Berger, Statistical Inference (2001). For a brief discussion of the origins of the frequentist school of statistics, also referred to as the hypothesis-testing school, see Gill's discussion of the historical development of significance testing in Political Research Quarterly, September 1999.

[2] Keep in mind that the error term in statistical models contains two components: that which we don't observe (uncertainty or omitted variables) and that which we can't observe (randomness).

[3] To put statistical significance in perspective (and to show how arbitrary the standard 1-out-of-20 threshold is), consider the following example from Mlodinow (2008): "In a 7-game series there is a sizable chance that the inferior team will be crowned champion. For instance, if one team is good enough to warrant beating another in 55 percent of its games, the weaker team will nevertheless win a 7-game series about 4 times out of 10. And if the superior team could be expected to beat its opponent, on average, 2 out of 3 times they meet, the inferior team will still win a 7-game series about once every 5 matchups. There really is no way for sports leagues to change this. In the lopsided 2/3 probability case, for example, you'd have to play a series consisting of at minimum the best of 23 games to determine the winner with what is called statistical significance, meaning the weaker team would be crowned champion 5 percent or less of the time. And in the case of one team's having only a 55-45 edge, the shortest statistically significant 'world series' would be the best of 269 games!" (p. 71).
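The series-length figures quoted in note 3 are easy to check. The sketch below (again Python, and again only a rough illustration) assumes each game is independent and that the stronger team wins any given game with a fixed probability p; it computes the chance that the weaker team takes a best-of-n series and searches for the shortest series that holds that chance to 5 percent or less.

    # Rough check of the figures quoted in note 3, assuming independent games
    # in which the stronger team wins each game with fixed probability p.
    from math import comb

    def weaker_team_wins(p_strong, n_games):
        """Probability the weaker team wins a best-of-n_games series
        (equivalently, takes a majority if all n_games were played)."""
        p_weak = 1.0 - p_strong
        need = n_games // 2 + 1  # wins required to take the series
        return sum(comb(n_games, k) * p_weak**k * p_strong**(n_games - k)
                   for k in range(need, n_games + 1))

    def shortest_significant_series(p_strong, alpha=0.05, max_games=501):
        """Smallest odd series length at which the weaker team wins
        no more than alpha of the time."""
        for n in range(1, max_games + 1, 2):
            if weaker_team_wins(p_strong, n) <= alpha:
                return n
        return None

    for p in (0.55, 2 / 3):
        print(f"p = {p:.3f}: weaker team wins a best-of-7 series "
              f"{weaker_team_wins(p, 7):.2f} of the time; "
              f"shortest 'significant' series: best of {shortest_significant_series(p)}")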
The world is full of randomness and contingency. I have always found it interesting that St. Thomas Aquinas, in his Summa Theologica, when addressing the presence of evil in the world and whether it can generate good outcomes, ultimately relies on the notion of randomness. When asked whether good outcomes can result from evil actions, Aquinas states that "evil is not of itself ordered to good, but accidentally. For it is beside the intention of the sinner that any good should follow from his sin" (Summa Theologica, Vol. 1, Q19, A9). In other words, evil sometimes accidentally (i.e. randomly) results in good, and sometimes...well...it doesn't.

My favorite book on the subject of contingency and randomness in the observable world is not a work of statistics, philosophy, or theology, but the volume What If? (Berkley Trade Publishing, 2000). This is a collection of counterfactual histories by eminent historians. Upon reading these highly entertaining and enlightening essays, one might become unsettled by the extent to which major events in world history were conditional on seemingly trivial and uncontrollable factors (e.g. the weather and tides prior to the D-Day invasion of Normandy).

To further emphasize the role of randomness and contingency in the world, consider the view of former movie studio executive David Picker (former president of United Artists, Paramount, and Columbia Pictures). He states, "if I had said yes to all the projects I turned down, and no to all the other ones I took, it would have worked out about the same."[5] The physicist Leonard Mlodinow may have said it best in his book The Drunkard's Walk: "When we look at extraordinary accomplishments in sports – or elsewhere – we should keep in mind that extraordinary events can happen without extraordinary causes. Random events often look like nonrandom events, and in interpreting human affairs we must take care not to confuse the two" (Mlodinow 2008, 20). These examples highlight the notion that the world we observe is simply one realization of what could (quite easily) have been.

This brings us to our core idea: superpopulations. A superpopulation is a hypothetical infinite population from which the finite observed population is a sample (see Hartley and Sielken in Biometrics, June 1975).[6] For example, while COW identifies 95 wars, some crises might have become wars (but, of course, did not). Hence, these 95 wars come from a larger 'population' of possible wars (possible in the past and in the future) and realized wars.[7]
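One way to internalize the superpopulation idea is with a small simulation: treat the observed record as a single draw from an underlying data-generating process and watch how a quantity of interest varies across the alternative histories that process could have produced. The sketch below (Python) is purely illustrative; the escalation probabilities, number of crises, and democratic/non-democratic framing are hypothetical stand-ins, not estimates from COW.

    # Illustrative sketch of the superpopulation idea: the observed record of
    # wars is treated as ONE realization of an underlying process, and
    # re-running that process shows how much a quantity of interest could
    # have varied by chance alone. All numbers are hypothetical.
    import numpy as np

    rng = np.random.default_rng(7)

    n_crises = 500           # hypothetical number of crises that could become wars
    p_war_democracy = 0.15   # hypothetical escalation probability, democratic dyads
    p_war_other = 0.20       # hypothetical escalation probability, other dyads
    n_realizations = 10_000  # alternative 'histories' drawn from the superpopulation

    is_democracy = rng.random(n_crises) < 0.4   # hypothetical mix of dyad types

    def one_history():
        """Draw one realization of which crises escalate to war and return
        the difference in observed war rates between the two groups."""
        p = np.where(is_democracy, p_war_democracy, p_war_other)
        war = rng.random(n_crises) < p
        return war[is_democracy].mean() - war[~is_democracy].mean()

    diffs = np.array([one_history() for _ in range(n_realizations)])

    print(f"true difference in escalation probabilities:  {p_war_democracy - p_war_other:+.3f}")
    print(f"average observed difference across histories: {diffs.mean():+.3f}")
    print(f"spread (std) of the difference across histories: {diffs.std():.3f}")
    # The history we actually observe corresponds to just one of these draws;
    # statements about 'chance' refer to this variation across possible histories.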
[4] It is at this point that some people turn to Bayesian statistics (Gill, Bayesian Methods, 2003), which have the nice properties of assuming that the data are fixed and that the relationship between the variables (i.e. the coefficient) is a random variable (rather than assuming that the relationship is fixed but the data are random). Whether this is a 'better' way of conceptualizing the use of statistics to systematically evaluate evidence is a discussion for another time and place.

[5] Quoted in Mlodinow 2008, p. 12.

[6] For the philosophical foundation of superpopulation, see Alvin Plantinga's discussion of chance and alternate worlds in his The Nature of Necessity. Also, look up Nozick's 'principle of fecundity', which holds that the world in which we live (and, hence, observe) is just one of a nearly innumerable number of universes. In short, both ideas hold that the events we observe are just the realizations of what might have been.

[7] The idea that "future" events are included in the superpopulation is an important one. Andrew Gelman makes this point in his contribution to Field Experiments and Their Critics: "Textbook presentations often imply that the goal of causal inference is to learn about the units who happen to be in the study. Invariably, though, these are a sample from a larger population of interest. Even when the study appears to include the entire population – for example, an analysis of all 50 [US] states – the ultimate questions apply to a superpopulation such as these same states in future years."