Missing Voters, Missing Data: Using Multiple Imputation to Estimate the Effects of Low Turnout

17 May 2010

Patrick Bernhagen
Department of Politics and International Relations
University of Aberdeen

Michael Marsh
Department of Political Science
Trinity College Dublin

Address for correspondence: Patrick Bernhagen, University of Aberdeen, Department of Politics and International Relations, Edward Wright Building, Dunbar Street, Aberdeen, AB24 3QY, United Kingdom. E-mail: p.bernhagen@abdn.ac.uk.

ABSTRACT: In recent years, different methods have been proposed to estimate the political effects of low voter turnout. This article contributes to the discussion by assessing the performance of multiple imputation in estimating the partisan effects of low turnout. Using the 2002 Irish General Election as a case study, we demonstrate how multiple imputation can be used to fill in the vote choices of non-voters. We verify simulations and reported turnout against official data and compare the results to those from alternative, maximum likelihood methods. While the methods differ in their ability to simulate vote choice correctly, these differences are generally not large enough to affect the counterfactual estimation of election results under universal turnout. To assess the generality of this finding, we also compare the different methods across 30 elections in the Comparative Study of Electoral Systems dataset. On average, multiple imputation produces larger turnout effects than multinomial logit methods, and the differences increase as turnout declines. System variables such as the number of parties do not affect the differences in results between methods.
Acknowledgements

Previous versions of this article were presented at the annual conference of the Political Studies Association Specialist Group on Elections, Public Opinion and Parties (EPOP), 8-10 September 2006, Nottingham, and at the 65th Annual National Conference of the Midwest Political Science Association, April 12-15, 2007, Chicago, IL. We would like to thank Brad Gomez and three anonymous reviewers for the Journal of Elections, Public Opinion and Parties for very helpful comments. Patrick Bernhagen would like to acknowledge grant support from the British Academy.

In his 1996 presidential address to the American Political Science Association, Arend Lijphart (1997) drew attention to the problem that less than full turnout might affect election results and lead to the under-representation of the political interests of some groups of citizens, such as ethnic minorities or low-income groups. Recent elections in Germany and the USA have revived the notion in public discourse that declining turnout hurts social democratic parties, or that higher turnout would benefit the Democratic candidate. The political consequences might extend well beyond the realm of electoral representation. For example, class differences in voter mobilization have been shown to affect welfare spending (Hill, Leighley and Hinton-Andersson, 1995). Numerous attempts have been made to assess the veracity and relative importance of this claim: If turnout went up, would it affect the election result? Which parties would gain and which would lose? Different methods have been proposed to estimate the political effects of low turnout, often yielding different answers to these questions.1 This article contributes to the methodological discussion by assessing the performance of multiple imputation in the estimation of the partisan effects of low voter turnout.
We proceed as follows. After a brief overview of the main strategies for detecting turnout effects, we introduce the idea of treating abstainers as missing data. Using the 2002 Irish General Election as a case study, we first demonstrate how a statistical model of multiple imputation can be used to fill in the vote choices of non-voters. We compare the results to those generated by more traditional vote-propensity methods and demonstrate the validity of the multiple imputation method by verifying estimates and reported turnout against official data. Both the Irish polity and the Irish National Election Study have a number of useful properties that enable us to verify our estimates against professed vote choices and official records and to evaluate the performance of different methods for ascertaining turnout effects. Furthermore, we assess the generality of our results by comparing turnout effects at 30 elections in 25 countries in the Comparative Study of Electoral Systems dataset and identifying the institutional and structural circumstances in which the results from the different simulation methods diverge. We conclude by evaluating the utility of multiple imputation vis-à-vis traditional methods of estimating turnout effects and discuss the implications for political scientists’ efforts to assess the political consequences of low turnout.

1 A recent special issue of Electoral Studies contains studies of turnout effects in the context of large numbers of national and supranational elections as well as referendums. Lutz and Marsh (2007) provide a thorough review of the previous literature on turnout effects.

Does Turnout Matter? An Overview of Methods and Findings

Political preference revelation at elections and other polls will be biased whenever turnout is short of 100 percent and abstainers’ preferences are non-randomly distributed among the electorate. When everyone votes there can be no bias in the representation of party preferences.
And if non-voters are a representative cross-section of the electorate, their abstention will not increase the influence of any one group of voters at the expense of another. But vote abstention seems far from random, and for some time now political scientists have been able to identify those segments of the electorate that are least likely to vote (Brady, Verba and Schlozman, 1995; Norris, 2002, 83-100; Rosenstone and Hansen, 1993). At the same time, our ability to estimate the size and direction of the political bias that results from non-random voter abstention is severely limited. We frequently know how voters differ from non-voters with respect to various sociodemographic characteristics or political attitudes. We can also determine the preferences of those voters who resemble non-voters in many important social and political respects, and in that way estimate the hypothetical vote choices of nonvoters. Finally, we can ask non-voters how they would vote if they voted. But we cannot know with certainty what the vote choice of non-voters would be if they actually turned into voters. Together, these hypothetical vote choices determine what we refer to as turnout effects: the effects that higher or lower turnout would have on an election outcome, where election outcome is defined as the vote shares of the parties (cf. Lutz and Marsh, 2007: 541). Any attempt to estimate such turnout effects is subject to the fundamental methodological problem that demonstrating political inequalities resulting from unequal turnout requires knowledge of how non-voters might behave were they not non-voters. The question is, how can we know the parameters that govern this counterfactual state of the world? Given these limitations, three strategies can be distinguished among efforts to estimate counterfactual voter behaviour and its effects on election outcomes.
One approach uses opinion polls to consider whether voters and nonvoters differ in any significant way on the dimension of partisan identification or with respect to various policy-related issues. Some studies use survey data to compare the attitudes of voters and non-voters with respect to social and economic policy issues or general attitudinal dispositions (e.g., Bennett and Resnick, 1990), while others ask more specifically about partisan identifications and preferences (e.g., Highton and Wolfinger, 2001). But asking nonvoters whether their attitudes differ from those of voters is not the same as estimating how these people would behave in actual elections: the propensity to vote is part of an individual’s attitudinal make-up, and it is possible that, as this aspect changes, other elements such as party or policy preferences change too. A second approach examines election studies and official election results for evidence of turnout effects. This involves regressing the vote share of certain types of parties and candidates (usually left-of-centre ones) on aggregate turnout and a variety of control variables (DeNardo, 1980; Radcliff, 1994; Pacek and Radcliff, 1995). But the use of aggregate data by this approach poses an ecological inference problem: the fact that turnout rates in a group of elections are similar does not imply that in each of the elections the types of individuals abstaining are the same (Herron, 1998: 6-7). A third approach to estimating turnout effects, finally, tackles these problems by analyzing individual-level behaviour: it uses reported electoral behaviour to estimate the propensity to vote for a particular party and then compares (self-reported) voters and non-voters (e.g., Brunell and DiNardo, 2004; Citrin et al., 2003; Martinez and Gill, 2005).
In this approach, used for various analyses of US elections, the vote choices of nonvoting individuals are predicted using coefficients from maximum likelihood regression analysis of the behaviour of voters who share similar sociodemographic and, sometimes, political characteristics.2 These votes are then added to the observed vote choices after a weighting has been applied to account for differential propensity to vote. This approach has its own drawbacks. Firstly, in common with the previous approach, it works on the assumption that nonvoters would behave like those voters with whom they share a set of sociodemographic and attitudinal characteristics. This assumption is problematic. Sociodemographic and attitudinal correlates of vote choice alike may provide a weak basis for estimating the preferences of abstainers, not least because abstainers might be those with preferences that run counter to the norm in their respective social group. Secondly, simulations of the vote choices of nonvoters are often based on a restricted set of variables, consisting mainly of demographics (e.g., Citrin et al., 2003; Brunell and DiNardo, 2004). Martinez and Gill (2005) improve on this by employing an expanded set of predictors of vote choice and turnout in their model. But attitudinal variables tend to have more missing values than demographics and, with listwise deletion as the default method of dealing with missing data, more variables usually mean fewer cases. Thus, the multinomial logit methods employed by these authors involve a trade-off between the richness of the model on the one hand and the number of cases available as a basis for prediction on the other.

2 This strategy is also followed by Tóka’s (2002) analysis of elections in the Comparative Study of Electoral Systems dataset, although he uses discriminant analysis to estimate his vote choice models.
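The final weighting step of this third approach can be sketched as follows. This is a minimal illustration, not the code of any study cited here: it assumes that each nonvoter's predicted party probabilities have already been obtained from a vote-choice model fitted to voters (for example, a multinomial logit), and it weights the observed and predicted components by the turnout rate, as in the replication reported later in this article. The function name, party labels and numbers are hypothetical.

```python
def full_turnout_shares(observed, nonvoter_probs, turnout):
    """Hypothetical helper: combine voters' observed shares with nonvoters'
    predicted shares, weighting each group by its share of the electorate.

    observed       -- dict party -> vote share among voters (sums to 1)
    nonvoter_probs -- list of dicts, one per nonvoter, party -> predicted
                      probability of voting for that party
    turnout        -- proportion of the electorate that voted (0 < turnout < 1)
    """
    n = len(nonvoter_probs)
    # Expected nonvoter share of each party = mean predicted probability.
    nonvoter_share = {p: sum(d[p] for d in nonvoter_probs) / n for p in observed}
    # Voters make up `turnout` of the electorate, nonvoters the remainder.
    return {p: turnout * observed[p] + (1 - turnout) * nonvoter_share[p]
            for p in observed}


# Illustrative inputs only (not survey results).
observed = {"A": 0.50, "B": 0.30, "C": 0.20}
nonvoter_probs = [{"A": 0.40, "B": 0.40, "C": 0.20},
                  {"A": 0.20, "B": 0.50, "C": 0.30}]
shares = full_turnout_shares(observed, nonvoter_probs, turnout=0.63)
print({p: round(s, 4) for p, s in shares.items()})
```

Because the nonvoter component is an average of predicted probabilities, the simulated shift for each party depends on how far the nonvoters' predictions depart from the voters' observed choices, scaled by the abstention rate (1 - turnout).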
This is a choice between a loss of information and potential bias incurred through an underspecified model and a loss of information and potential bias stemming from the deletion of a considerable number of cases available for fitting the model.3 By opting for the former, Citrin et al. (2003) avoided data attrition at the expense of a richly specified model for predicting vote choice.4 By contrast, Martinez and Gill (2005) use a more fully specified vote model but at the cost of losing a considerable amount of data. At best, this leads to a loss of information and an increase in variance around the predictions. At worst, missingness is correlated with either vote choice or vote propensity, or both. Hence, Martinez and Gill’s (2005) model for predicting vote choice may also be mis-specified as it omits the selection variable, and their estimates of turnout effects may be incorrect. In the following section we describe an alternative method that addresses these limitations.

3 In fact, because the problem of sample selection is one of omitted variable bias (Heckman, 1979; Dubin and Rivers, 1989), the loss of observations is, strictly speaking, also a problem of model specification.

4 This is not to insinuate that Citrin et al. made a deliberate decision in favor of a larger N and against a fully specified model. Rather, as Citrin et al. analyze state-level elections in the USA, their data restrictions result from the census data they used, which are the best data available for their purpose but do not contain attitudinal variables.

Multiple Imputation

Voting is an individual choice, which makes the sample of voters a self-selected one (Dubin and Rivers, 1989). We lack data on how some citizens voted simply because they chose not to vote. This means that the problem of less-than-full turnout is analogous to the problem of missing observations in any statistical analysis.
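The cost of listwise deletion for the richer vote-choice models discussed in the previous section can be illustrated with a small simulation. The variable names and item-missingness rates below are hypothetical assumptions chosen only to mimic the pattern described there, in which attitudinal items are missing more often than demographics; they are not drawn from any of the surveys analysed in this article.

```python
import random

random.seed(1)

# Hypothetical item-missingness rates: demographics rarely missing,
# attitudinal items missing more often (illustrative values only).
MISSING_RATE = {
    "gender": 0.00, "age": 0.01, "income": 0.08,                        # demographics
    "econ_eval": 0.12, "leader_thermometer": 0.15, "knowledge": 0.10,  # attitudes
}


def simulate_survey(n):
    """Rows of independent random values, with None marking a missing item."""
    return [{v: (None if random.random() < rate else random.random())
             for v, rate in MISSING_RATE.items()} for _ in range(n)]


def listwise_n(rows, variables):
    """Number of cases left after dropping rows missing on any of `variables`."""
    return sum(all(row[v] is not None for v in variables) for row in rows)


rows = simulate_survey(2000)
demographic = ["gender", "age", "income"]
extended = demographic + ["econ_eval", "leader_thermometer", "knowledge"]
print("demographic model N:", listwise_n(rows, demographic))
print("extended model N:", listwise_n(rows, extended))
```

Missingness in this toy example is unrelated to everything else, so the shrinking sample costs only precision; if missingness were related to vote choice or turnout, the reduced sample could also be biased.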
If missing data points are uncorrelated with the error term or with the variables of interest, they pose few problems beyond reducing the number of observations and thereby increasing the variance of the estimates of parameters or quantities of interest. But if the sampling fractions are correlated with the errors or the dependent variable, estimates may be biased. In the case of voter turnout, we have reason to suspect such correlation: while studies of turnout effects may have failed to establish a robust and directional link between turnout and vote choice, their findings on the whole suggest that abstention is not random in relation to vote choice (e.g., Norris, 2002, 83-100). Treating the unobserved vote choices of non-voters as missing data points means assuming that the vote choices of non-voters constitute data that actually exist but have not been observed or recorded because the potential voters’ propensity to turn out has not exceeded a certain threshold. Those who find this assumption difficult to accept might find it more agreeable to think in terms of preferences instead of vote choices: if we assume that individual preferences over parties and candidates are measured by votes, data on these preferences are missing for non-voters (Dubin and Rivers, 1989, p. 383). The question of turnout effects is whether or not the decision to vote and the decision about which party to vote for are related. This question can be tackled using available techniques for the imputation of missing data. “Ad-hoc” methods of addressing missing data, such as filling in means or imputing predicted values from regression analysis based on the observed data points (“conditional means imputation”), at best understate variability and at worst induce bias (Allison, 2002; Horton and Kleinman, 2007).
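Why mean imputation understates variability can be seen in a short simulation on made-up data. Values are deleted completely at random here, so the mean itself is not biased; the point is only that filling every gap with the observed mean adds cases without adding any spread, which shrinks the estimated standard deviation and, with it, any standard errors computed from the filled-in data.

```python
import random
import statistics

random.seed(42)

# Hypothetical complete data: 1,000 draws from a normal distribution.
full = [random.gauss(50.0, 10.0) for _ in range(1000)]

# Delete roughly 30 percent of values completely at random (None = missing).
observed = [x if random.random() > 0.3 else None for x in full]
present = [x for x in observed if x is not None]

# Ad-hoc mean imputation: every gap receives the mean of the observed values.
mean_obs = statistics.fmean(present)
mean_imputed = [mean_obs if x is None else x for x in observed]

# The filled-in values sit exactly at the mean, so they add cases but no spread.
print("complete data SD:", round(statistics.stdev(full), 2))
print("observed cases SD:", round(statistics.stdev(present), 2))
print("mean-imputed SD:", round(statistics.stdev(mean_imputed), 2))
```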
Imputations arrived at by these methods will be unbiased only if the probability of missing data on any variable is unrelated to the values of the variable itself or the values of any other variables in the data. In the context of turnout effects, predictions from regression analysis therefore run the risk of producing wrong estimates of nonvoters’ party choices while at the same time conveying a false sense of accuracy of these estimates. The risk of biased estimates can be greatly reduced by the use of maximum likelihood estimation. But maximum likelihood predictions of vote choice such as those reported by Citrin et al. (2003) or Martinez and Gill (2005) may still be biased if the probability of any data point missing depends on both the dependent and independent variables, which is of course precisely the conjunction that is suspected to render turnout problematic for election outcomes. Given the robust findings in the literature on the sociodemographic and attitudinal determinants of vote choice (Nie, Verba and Petrocik, 1980) and equally robust evidence concerning the individual-level correlates of turnout (Wolfinger and Rosenstone, 1980), the possibility of missingness being jointly contingent on both dependent and independent variables cannot be dismissed. In this situation, multiple imputation (MI) is an efficient method of arriving at estimates of non-recorded vote choices. Bernhagen and Marsh (2007) use this method to estimate turnout effects at 30 elections in 25 countries from the Comparative Study of Electoral Systems (Module I). While their model specifications closely resemble Tóka’s (2002) and Martinez and Gill’s (2005) models, their reported changes in the parties’ fractions of the vote are generally higher than those generated by the multinomial logit or discriminant analyses of these studies.
They find the change in parties’ vote share before and after imputation is mainly a function of turnout, that is, of the scope for change, which increases with declining turnout. At the level of parties, Bernhagen and Marsh found no evidence that left, right, or centre parties gain from full turnout scenarios. However, they did find that non-governing parties typically benefit from full turnout, as was originally suggested by DeNardo (1980), and that smaller parties would gain from full turnout. More generally, Bernhagen and Marsh found that full turnout would on average reduce the gap between the strongest and second-strongest party. In one case, the US Congressional election of 1996, this would have led to a different party coming in first in the election. Originally proposed by Rubin (1976), MI involves three steps. First, plausible values for missing observations are created that reflect uncertainty about the nonresponse model. These values are used to “fill in” the missing data points. This process is repeated, resulting in the creation of a number (usually 5-10) of “completed” datasets. Second, each of these datasets is analyzed using standard methods. In the case of estimating turnout effects, this simplifies to estimating the vote proportions of the different parties. Third, the results are combined, which allows the uncertainty regarding the imputation in step one to enter the final result. To apply this method to the problem of low turnout, we have to assume that the probability of an individual’s vote choice remaining unrecorded may depend on the observed values of other variables but, after controlling for these variables, is independent of any other missing information. In the terminology of Rubin’s (1976) classification of missing data, this amounts to assuming that the data are missing at random (MAR), i.e., that missingness is random once the observed quantities are controlled for.
Formally, if there are two variables X and Y, where X is always observed and Y is sometimes missing, MAR means that

$$\Pr(Y \text{ missing} \mid Y, X) = \Pr(Y \text{ missing} \mid X).$$

Of course, most multivariate datasets will contain missing values on several, perhaps even on most, variables. Moreover, it is impossible to test whether the MAR condition is actually satisfied (Allison, 2002: 4). However, if at least one element in a vector of independent variables X is fully observed, we can assume that the data are MAR, conditional on the imputation model (King et al., 2001: 53). Furthermore, the MAR assumption can be made more realistic by including more informative variables in the imputation process (Collins, Schafer and Kam, 2001). To estimate turnout effects by way of multiple imputation we use the Amelia II program written by Honaker, King and Blackwell.5 Amelia II uses an expectation-maximization (EM) algorithm to generate values in place of missing observations. While the likelihood conditional on the observed (but incomplete) data cannot be easily constructed, the likelihood of a rectangularized data set (i.e., one for which all cells are treated as observed) is easy to construct and maximize, especially under the assumption of multivariate normality. The EM algorithm rectangularizes the data set by filling in estimates of the missing elements, generated from the observed data. In the E-step, missing data points are filled in using linear regression, with their expected values conditional on the current estimate of the sufficient statistics and the observed data. In the M-step, a new estimate of the sufficient statistics is computed from the current version of the completed data (see Honaker, James and King, 2006 for a detailed exposition). The multiple imputation procedure is not intended to provide causal explanations or interpretable parameters (King et al., 2001).
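The E- and M-steps can be illustrated for the simplest case of two jointly normal variables, where X is always observed and Y is missing at random given X. This is a minimal textbook sketch under that bivariate-normal assumption, not the Amelia II implementation, which handles many variables and produces multiple imputations; the data and the missingness mechanism are simulated for illustration. Note that the E-step must carry the residual variance of Y given X into the expected value of Y^2, not only the regression fill-in for Y itself.

```python
import random
import statistics

random.seed(2010)

# Simulated data (hypothetical): X is always observed; Y is missing at random
# (MAR) because the chance that Y is missing depends only on the observed X.
n = 5000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1.0 + 0.8 * x + random.gauss(0.0, 0.6) for x in xs]  # true mean of Y is 1.0
ys_obs = [None if random.random() < (0.6 if x > 0 else 0.1) else y
          for x, y in zip(xs, ys)]


def em_bivariate_normal(xs, ys_obs, iterations=50):
    """EM estimates of (mu_x, mu_y, s_xx, s_xy, s_yy) when only Y has gaps."""
    present = [y for y in ys_obs if y is not None]
    # X is fully observed, so its mean and variance need no updating.
    mu_x = statistics.fmean(xs)
    s_xx = statistics.pvariance(xs, mu_x)
    # Starting values for Y from the complete cases; covariance starts at 0.
    mu_y = statistics.fmean(present)
    s_yy = statistics.pvariance(present, mu_y)
    s_xy = 0.0
    for _ in range(iterations):
        # E-step: expected values of Y and Y^2 given X and current parameters.
        beta = s_xy / s_xx                    # slope of the regression of Y on X
        resid_var = s_yy - beta * s_xy        # Var(Y | X)
        ey, ey2 = [], []
        for x, y in zip(xs, ys_obs):
            if y is None:
                m = mu_y + beta * (x - mu_x)  # regression fill-in E[Y | x]
                ey.append(m)
                ey2.append(m * m + resid_var)  # E[Y^2 | x] adds residual variance
            else:
                ey.append(y)
                ey2.append(y * y)
        # M-step: maximum likelihood moments from the expected sufficient statistics.
        mu_y = statistics.fmean(ey)
        s_yy = statistics.fmean(ey2) - mu_y ** 2
        s_xy = statistics.fmean(x * e for x, e in zip(xs, ey)) - mu_x * mu_y
    return mu_x, mu_y, s_xx, s_xy, s_yy


complete_case_mean = statistics.fmean(y for y in ys_obs if y is not None)
em_mu_y = em_bivariate_normal(xs, ys_obs)[1]
print(round(complete_case_mean, 3), round(em_mu_y, 3))
```

Because high values of X (and hence of Y) are more often missing, the complete-case mean of Y is pulled downward, whereas the EM estimate, which conditions on the fully observed X, recovers a value close to the true mean of 1.0.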
The algorithm imputes values for all empty cells in the dataset and does not discriminate between variables according to their status as dependent or independent variables in a regression model. Thus, MI imputes values on the independent variables as much as on the dependent variable, and the imputed values of the dependent variable are then used to improve the imputation of independent variables, and vice versa. EM always starts with the full covariance matrix, which means that it uses all the available variables as predictors for imputing the missing data (Allison, 2002: 20). Thus, the MI model uses more information, both in terms of more variables and more observations, than the approaches of Citrin et al. (2003) and Martinez and Gill (2005), each of which deletes considerable amounts of existing data. This may well alter the simulation of non-voters’ vote choices and lead to different simulations of election outcomes. Used on its own, however, a single EM-based imputation produces standard errors that are generally too low, as the estimator assumes that there are complete data for all cases. The solution to this problem is to repeat the EM-based imputation process m times to produce m “complete” datasets. If random draws from the residual distribution of each imputed variable are made and added to the imputed values, estimates of the parameters of interest will differ slightly depending on which imputed dataset is used. This variability can then be used to adjust the standard errors upward: the parameters of interest are averaged across datasets and their standard errors combined according to a formula devised by Rubin (1987). For the analysis of turnout effects, this involves obtaining the probability of voting for a particular candidate or party for each imputed dataset (j = 1, ..., m) and averaging the m values.

5 The software, Amelia II: A Program for Missing Data (version 1.1-6 beta, July 18, 2006), is freely available at <http://gking.harvard.edu/amelia/>.
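The combination step can be sketched in a few lines. Both functions are hypothetical helpers, not part of Amelia II: `vote_share` computes one party's share of an imputed vote-choice dummy in a single completed dataset, with a simple binomial standard error that assumes simple random sampling, and `rubin_pool` applies Rubin's (1987) combining rules: the pooled estimate is the mean of the m estimates, and the total variance is the average squared standard error plus the between-imputation variance multiplied by (1 + 1/m). The three tiny datasets are invented for illustration.

```python
import math
import statistics


def vote_share(dummy):
    """Share of 1s in one completed dataset's dummy column for a party,
    with a binomial standard error (assumes simple random sampling)."""
    n = len(dummy)
    p = sum(dummy) / n
    return p, math.sqrt(p * (1 - p) / n)


def rubin_pool(estimates, std_errors):
    """Combine m point estimates and their standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                      # pooled estimate
    within = statistics.fmean(se ** 2 for se in std_errors)  # mean squared SE
    between = statistics.variance(estimates)                 # spread across datasets
    total = within + (1 + 1 / m) * between                   # finite-m correction
    return q_bar, math.sqrt(total)


# Hypothetical party dummy from m = 3 completed datasets of 8 respondents.
completed = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
]
results = [vote_share(d) for d in completed]
share, se = rubin_pool([r[0] for r in results], [r[1] for r in results])
print(round(share, 3), round(se, 3))
```

The pooled standard error exceeds each single-dataset standard error because the disagreement between the imputed datasets is added to the ordinary sampling variance.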
The standard error for the estimate is obtained in three steps: (1) the standard errors for the m point estimates are squared and then averaged; (2) the sample variance in the point estimates across the data sets is calculated; (3) the variance from (2) is multiplied by a factor of (1 + 1/m), which corrects for the bias resulting from m < ∞, and added to the average from (1), and the square root of the sum is taken (Allison, 2002: 29-30). Because vote choice in most elections is a categorical variable measuring choices among k candidates or parties, we generate multiple imputations for k dummy variables created from the categories of the original vote choice variable.

Data: The 2002 Irish National Election Study and the Comparative Study of Electoral Systems

The first data we use to demonstrate the MI method of simulating higher voter turnout are from the 2002 Irish National Election Study.6 Several aspects make Ireland an important case study for the assessment of the political effects of low voter turnout. Firstly, among European countries, the Republic of Ireland has fared notably badly in terms of election turnout. Average turnout at general elections has been only slightly above 70 percent since the 1970s, hitting a low of 63 percent at the 2002 election. Secondly, Ireland’s low turnout facilitates the estimation of tangible turnout effects, as any bias will be larger the bigger the share of abstainers in the electorate. Together with the previous point, this means there is both more chance of observing a bias in the first place and more chance that such a bias can have a significant impact on the result (Kohler and Rose, 2008). Thirdly, the country’s STV electoral system leads to a fairly proportional translation of votes into seats, which reduces the amount of strategic voting among the observed votes that provide much of the information for simulating the behaviour of nonvoters.
There are also a number of desirable properties of this particular election study that facilitate our efforts to estimate turnout effects and to gain a sense of how valid our method is. The Irish election study contains official data on individual turnout, allowing reported votes to be validated and weights to be constructed that correct for over-reporting. Furthermore, it contains the stated preferences of those who reported a vote, even if the official record indicated that the vote was not cast, as well as the hypothetical preferences of many who said they had not voted but told the interviewers how they would have voted. This will be useful in evaluating the MI method for assessing the impact of low turnout on the outcome. The dataset contains a range of demographic and political variables for 2,663 individuals. In total, 2,391 respondents reported their vote choices at the 2002 election, either by agreeing to fill out a ballot paper during the interview or simply by reporting their first preference vote.7 Of these, 1,835 have been officially validated as having voted. The MI procedure can make use of an extensive set of variables that can plausibly be suspected to be related to vote choice. We include gender and age, as these variables are often found to be influential in determining candidate or party preference. Additional sociodemographic variables, such as union membership, education, income, urban versus rural residence, religious denomination and language, are also included, because it has been conjectured that these sociodemographic variables influence either turnout or vote choice or both. Beyond these sociodemographic characteristics, we include respondents’ evaluations of the economy and of other policy areas (health and housing), as well as a measure of political knowledge.

6 The 2002 Irish National Election Study was funded under the PRTLI/National Development Plan: see www.tcd.ie/ines.
Above all, however, we are able to impute missing data points based on the party and party leader preferences of voters and non-voters, as recorded by probability to vote (PTV) and thermometer scales. Thus, the extensive set of data employed here enables us to go well beyond the sociodemographic correlates of turnout. The inclusion of party and leader preferences and evaluations at least allows for the possibility that abstainers might be those with preferences that run counter to the norm in their social groups, thus substantially improving on simulations based solely on demographics. Furthermore, the political variables add to the overall richness of the set of predictors in the model, thus making the MAR assumption more realistic. A list of the variables included in the multiple imputation model is provided in Appendix A.

7 Respondents were asked to fill out a ballot paper for the relevant constituency. Those who said they voted in May 2002 were asked, “Please fill it in as you did on polling day – as best you can remember.” Respondents who said they did not vote were asked, “Suppose you had voted in the May 2002 election, how would you have filled in the ballot paper on election day?” Respondents who declined to fill out the ballot paper were asked, “Could I ask which party you gave your first preference vote to?”.

The particular properties of the Irish election study may mean that findings based on this case are not necessarily generalizable to other contexts with different characteristics, such as political systems with fewer parties or larger incentives for strategic voting. To better judge the generality of our findings from the Irish case, we therefore replicate Bernhagen and Marsh’s (2007) analysis of turnout effects at 30 elections in 25 countries from the Comparative Study of Electoral Systems (Module I).
These surveys were conducted between 1996 and 2002, each at a functionally equivalent point in time: when a national election was taking place. This maximizes unit homogeneity across elections that differ in respect of important variables such as the number of parties or extent of voter participation.

Analysis and Results

Existing theories and evidence about the reasons why people do not vote suggest a number of patterns that can be expected. A well-established argument claims that non-voters are nonvoters because they have little contact with agencies of mobilization (Rosenstone and Hansen, 1993; Brady, Verba and Schlozman, 1995). Following this logic, we might expect the larger Irish parties, Fianna Fáil (FF) and Fine Gael (FG), to be more attractive to potential supporters than smaller parties, as well as more effective at mobilizing any latent support. Consequently, those who stay at home might be more likely to have a preference for smaller parties, and smaller parties such as the Greens, Labour, and the Progressive Democrats (PD) should benefit most from “complete” turnout.8 A second argument sees non-voting as a sign of disaffection (Crozier, Huntington and Watanuki, 1975; Gurr, 1970). Non-voters are more detached from the established political system, and if they did vote it would tend to be for more radical parties, both right and left. Applying this logic to the 2002 Irish case, we might expect more radical parties, such as the Greens or Sinn Féin (SF), to benefit from 100 percent turnout. Furthermore, we expect parties with a strong working class support base to benefit most from complete turnout.

8 We do not expect Sinn Féin to fall within the domain of this argument because this party is renowned for its well-resourced and resourceful local campaigns that match those of the larger parties.
In many European countries these will be left-wing, socialist or Labour parties, but given what we know about party affiliations in Ireland, we expect Sinn Féin to benefit most from higher working class turnout, followed by Fianna Fáil. While we have referred so far to voters and non-voters as separate groups of people, people may move both into and out of the electorate over time. For example, habitual voters may abstain because they are acutely unhappy with the performance in office of their traditional party. This may be due to the state of the economy or the likely failure of any government to deliver on some of its election pledges. At any rate, we know that governing parties tend to lose votes (Nannestad and Paldam, 2002), and some of that loss will be due to abstention by erstwhile supporters. This would imply that incumbent parties (Fianna Fáil, Progressive Democrats) benefit most from “complete” turnout.9

[Figure 1 about here]

Figure 1 displays the vote shares of each party and the residual group of independents/others as quadruplets of bars. For each party, the first bar represents the party’s vote share as recorded from the 1,835 validated voters in the dataset, weighted by demographic corrections for survey bias. This is contrasted with the second bar based on counterfactual full (100 percent) turnout. The full turnout figures are composed of the 1,835 vote choices of validated voters, the 148 imputed vote choices of people who did vote but did not divulge their choice, and the imputed vote choices of 680 non-voters.

9 We should also allow for the opposite effect of turnout increases identified by DeNardo (1980) for a series of US Congressional elections in the 1960s and 1970s. According to DeNardo, turnout boosts tend to harm the incumbent party as they involve the mobilization of “peripheral” voters, who respond in a rather fickle fashion to short-term campaign effects.
The imputations have been obtained using information on the full range of variables listed in Appendix A.10 The error bars represent 95 percent confidence intervals. To assess the robustness of the imputations to different imputation models, we also obtained imputations based only on, firstly, socio-demographic variables and, secondly, utility and thermometer scales of parties and party leaders. When either of these subsets of variables is used in the imputation process, the imputed votes remain closer to the observed votes; they differ more from the observed votes when the maximum set of information is used (see Appendix B). In order to compare the MI results to maximum likelihood simulations of full turnout, we replicate the procedures employed by Citrin et al. (2003) and Martinez and Gill (2005). This involves modelling vote choice at the 2002 election and using the coefficients from multinomial logit estimation to calculate, for each non-voter, the predicted probabilities of voting for the various parties. To obtain the parties’ vote shares under a full-turnout scenario, we then add the estimated vote choices of non-voters to the observed vote proportions, weighting the components by the actual turnout rate of 63 percent.11 Following Citrin et al. (2003), the first vote choice model includes gender, education, income, age, and urban-versus-rural residence.12 In a second step, we add a comprehensive set of non-demographic variables to specify a model analogous to Martinez and Gill’s (2005). This list of variables is identical to the one used for the MI simulation procedure above. Thus, the third bar represents the simulated vote share based on predictions from a multinomial logit on demographics only (Citrin et al.’s approach). The fourth bar represents the simulated vote share based on predictions from a multinomial logit using the full set of information available in the election study, for all cases with no missing data on variables other than vote choice (Martinez and Gill’s approach). The most notable result is that the simulated increase in turnout from 69 percent (in the survey) to 100 percent does not lead to any radical changes in the parties’ vote shares, regardless of the simulation method used. The single biggest change under full turnout is a loss of about two percentage points for Fianna Fáil, matched by a gain of roughly one percentage point each for Fine Gael and Sinn Féin. Even the two-point drop in the Fianna Fáil vote share, however, is well within the five-percent margin of uncertainty indicated by the error bars.

10 The models employed here replicate those used in Bernhagen and Marsh (2007), where they are fully justified. We do so because a later stage of this analysis uses the same CSES data sets as that study, and we want to maintain comparability. We could have used a different model for the Irish analysis here but chose not to do so on grounds of consistency. In doing so, we ensure that our imputation model corresponds with previous research on Irish election turnout. There are very few models of Irish turnout. Lyons and Sinnott (2002) and Marsh et al. (2008) detail mobilization and resource effects as well as socio-demographics, all of which are key elements of the model used here. While these studies make the point that non-voting is heterogeneous, with some abstaining for circumstantial reasons and some more deliberately, this argument, if correct, is not specific to Ireland (Blondel et al., 1998). In any case, the CSES data sets do not include the variable operationalizing this categorization.

11 As we use survey data, the “observed” vote choices have not been observed in the strict sense. For clarity and ease of presentation, we use the following nomenclature: “observed” vote choice is the vote choice reported by respondents in a national election study who have been verified as having turned out to vote according to official records. “Professed” vote choice is the vote choice reported by respondents who said they cast a vote in the actual election but did not actually do so. This latter category also includes the choices of non-voters indicating how they believe they would have voted had they gone to the polls.

12 Citrin et al. also include race as a key variable. However, race is not a noteworthy issue in Irish electoral behavior, and the Irish election study contains no information on ethnicity or race. Instead, we include urban-versus-rural residence, as this variable is a key correlate of Irish voter behavior (cf. Marsh et al., 2008: 164–79).

Thus, while the literature offers many reasons to expect the costs and benefits of full turnout to be unevenly distributed across parties, this analysis of the 2002 Irish General Election suggests that the impact would be marginal and, indeed, that we cannot be sure it would have any differential impact at all. Comparing the MI simulations with simulated election results based on predictions using multinomial logit coefficients from a demographic vote choice model, we can see that the latter are quite similar to those generated by MI. Full-turnout simulations using this method are slightly less “full” (N=2,570) than those arrived at by MI. This is due to small amounts of missing data on several predictor variables, which led to the loss of almost 100 cases through listwise deletion. For some parties, the demographics-based multinomial logit simulations lie between the MI simulations and the actual result; for others, they are slightly above or below one or the other.
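The multinomial logit route to full-turnout vote shares described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ code: the party groups, coefficients and covariates are hypothetical; the non-voters’ predicted share for each party is assumed to be the mean of their individual predicted probabilities; and survey weights are ignored. Only the final step, weighting the observed shares by the turnout rate and the non-voters’ shares by the abstention rate, follows the procedure stated in the text.

```python
import math

def mnl_probabilities(x, coefs):
    """Predicted choice probabilities for one respondent from a multinomial logit.

    x     -- covariate values, with a leading 1.0 for the intercept
    coefs -- dict: party -> coefficient list; the baseline party has all zeros
    """
    eta = {p: sum(b * v for b, v in zip(beta, x)) for p, beta in coefs.items()}
    top = max(eta.values())  # subtract the maximum for numerical stability
    expd = {p: math.exp(e - top) for p, e in eta.items()}
    total = sum(expd.values())
    return {p: e / total for p, e in expd.items()}

def nonvoter_shares(nonvoters, coefs):
    """Expected non-voter shares: mean of individual probabilities (an assumption)."""
    shares = {p: 0.0 for p in coefs}
    for x in nonvoters:
        for p, prob in mnl_probabilities(x, coefs).items():
            shares[p] += prob / len(nonvoters)
    return shares

def full_turnout_shares(observed, predicted, turnout):
    """Weight observed shares by turnout and predicted non-voter shares by abstention."""
    return {p: turnout * observed[p] + (1 - turnout) * predicted[p] for p in observed}

# Hypothetical example: three party groups, an intercept plus one covariate.
coefs = {"FF": [0.0, 0.0], "FG": [-0.8, 0.3], "Other": [-0.2, -0.1]}
nonvoters = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
predicted = nonvoter_shares(nonvoters, coefs)
observed = {"FF": 0.45, "FG": 0.20, "Other": 0.35}
print(full_turnout_shares(observed, predicted, turnout=0.63))
```

With an actual turnout of 63 percent, each party’s full-turnout share is 0.63 times its observed share plus 0.37 times its predicted non-voter share.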
While the turnout effects suggested by MI are overall slightly more pronounced than those picked up by the different multinomial logit models, the deviations from either the actual or the MI results do not follow a systematic pattern. If anything, the demographics-based multinomial logit simulations suggest slight gains for smaller parties from full turnout, at the expense of the larger parties. These observations also hold for the richer multinomial logit simulations suggested by Martinez and Gill. Because of the larger number of independent variables in this vote model, many of which have missing observations, listwise deletion leads to the loss of over 400 cases.

Just how much faith can we have in the MI method, and how good is it at estimating the vote choices of non-voters? We cannot answer that question definitively, for the same reasons that make this or any other simulation strategy necessary in the first place. To begin with, the MI results are broadly corroborated by multinomial logit estimates following the strategies of Citrin et al. (2003) and Martinez and Gill (2005). However, as the multinomial logit estimator is no more efficient than MI, agreement between the two does not by itself validate the imputations, and different standards have to be applied. We therefore conduct two further tests to explore the validity of the imputations. First, the 2002 Irish Election Study records how non-voters said they had voted or would have voted. Thus, we know their survey responses to the question of how they voted (if they falsely claimed to have voted) or how they think they would have voted had they actually voted. After verifying that these respondents really are non-voters, we can compare their professed vote choices with the simulated votes of the same individuals obtained through MI. The subsample for this analysis comprises the 556 respondents who are known not to have voted but who reported a vote choice during the interview. The results are displayed in Figure 2.
The general picture that emerges is consistent with the differences between the observed and simulated full-turnout results. As before, the simulations from multinomial logit generally follow the prevailing pattern. For the larger parties, the demographics-based predictions fall “between” the professed and the imputed vote shares, while they tend to give larger gains to Sinn Féin and independent candidates than either what respondents claim or what the imputation algorithm fills in. The simulations based on the full multinomial logit model show no systematic pattern, being at times closer to respondents’ assertions than either MI or the predictions from the demographic multinomial logit model, at times further away, and sometimes in between. Even when comparing purely simulated voting behaviour, we find that the differences between MI and either of the multinomial logit alternatives are on the whole not significant. The only exception is the simulated vote share of the Progressive Democrats, which according to the simulations based on Martinez and Gill’s model is 5.8 percent and lies outside the 95 percent margin of error of the MI estimate (3.4 percent +/- 2 percentage points).

[Figure 2 about here]

It might be argued that the reported vote of actual non-voters is not necessarily a reliable indication of how they would have voted had they done so. There may be a bias in favour of the winners, for instance, and against the most obvious losers. Indeed, this appears to be the case with respect to Fianna Fáil and Fine Gael respectively. It might also be that these were people who paid less attention to the campaign and gave less thought to their choice than actual voters. In this respect the test is not an ideal one, as it cannot tell us whether it is the MI procedure that was unreliable or the imagination of the respondents (Karp and Brockington, 2005).
We can gain a better idea of the performance of the imputation procedure by artificially creating a further set of missing values on the vote variable and using MI to re-estimate the votes of these fake “non-voters”. To do so, we first truncate the dataset to one that contains only confirmed voters (N=1,835). We then impose the survey turnout rate of 69 percent anew by removing observations at random, before re-imputing the deleted observations using Amelia. By cross-tabulating the distributions of observed and re-imputed preferences, we can gauge how close the individual imputations are to the recorded votes. Lastly, we obtain similar cross-tabulations for the imputed and professed votes reported in Figure 2 above and compare the closeness of the two sets of cross-tabulations. This indicates how well MI approximates actual vote choices as opposed to professed votes. The results are displayed in Table 1. The general picture that emerges from panel (a) is that the distributions of observed and re-imputed preferences are quite similar. This is all the more significant as the re-imputations of artificial non-voters have to make do with a much smaller number of cases. However, we are primarily interested in the accuracy of imputation within each party group: to what extent does the imputation procedure correctly identify those who vote Fianna Fáil, Fine Gael, and so on?

[Table 1 about here]

The table can be read in two directions: down and across. Reading across, the accuracy seems poor. Except in the case of Fianna Fáil and Fine Gael, fewer than half of the voters for any party are correctly identified in either panel of the table. However, this may partly be an artefact of the EM algorithm, which is likely to be biased towards the larger parties.
It is more appropriate to read downwards: it can then be seen that even in the case of the smaller parties the imputations are much more likely to predict the true choice than to make any other prediction. Lastly, we might expect the imputations to be closer to the “real” choices in the case of the artificially created non-voters than in the case of the actual non-voters (panel b), on the basis that the choice made by the former was a realized intention, whereas that of the latter is at best an intention and at worst something respondents make up. This difference in closeness is shown in panel (c): the bigger the positive difference, the more our re-imputation of real vote choices outperforms our imputation of the professed choices of non-voters. Negative figures indicate that our imputations are closer to the professed vote choices than to the actual ones. The expectation that MI simulations resemble actual votes more than professed ones is borne out for most parties, but not for Fianna Fáil or the Progressive Democrats.13 As the “artificial” abstainers were chosen at random from among actual voters, the MAR assumption is arguably more likely to be satisfied than in the context of the other MI simulations reported in this article.14 The fact that our re-imputation of the vote choices of artificial non-voters outperforms our simulation of the professed votes of actual non-voters (who have been presented with the problematic task of answering counterfactual questions) further validates the imputation model. While potentially extraneous information about socioeconomic determinants of the turnout decision and vote choice may not help the imputation algorithm make the right imputations, it appears at worst irrelevant.

13 These were the two government parties both before and after the 2002 election. It is unclear to what extent this pattern reflects respondents’ impulse to side with the winners.

14 However, as these are random draws from a subsample that cannot itself be considered random with respect to vote choice, the “missing” vote choices of the newly created non-voters are not missing completely at random.

[Table 2 about here]

How do the multinomial logit strategies of estimating turnout effects perform in this respect? Table 2 repeats the cross-tabulations for predictions based on Citrin et al.’s model. Here, no more than half of the voters for any party are correctly identified in either panel of the table. And only in the case of Sinn Féin are the simulations notably more likely to predict the true choice than to make any other prediction (panel a). Furthermore, for most parties the simulations are closer to the professed votes of non-voters than to the real votes of the artificially created non-voters, although the differences are very small (panel c). In other words, while MI obtains its best results when estimating what people really did, multinomial logit estimation using demographics is hardly better at replicating what people did than at replicating what they say they did, and it matches both sets of data rather poorly.

[Table 3 about here]

This does not necessarily mean that MI per se performs better than multinomial logit. While the model replicating Citrin et al.’s method had to make do with a much smaller number of predictor variables, relying on demographics only, a fairer comparison might be one between MI and multinomial logit based on the full set of demographic and political variables, as used by Martinez and Gill (2005). The predicted vote choices of real and artificially created non-voters from multinomial logit estimation of this comprehensive model are presented in Table 3. Despite the inevitably considerable attrition, these multinomial logit-based simulations of vote choice perform well, attributing the correct vote choice more than two thirds of the time for all parties (but only about half the time for independent candidates).
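The re-imputation test and the two ways of reading Tables 1 to 3 can be sketched as follows. This is an illustrative Python sketch, not the authors’ code: the re-imputation itself (done with Amelia in the article) is not reproduced, so imputed choices are passed in as input; and the layout assumed here, with rows holding the recorded (or professed) choice and columns the imputed choice, is our assumption. Under that layout, reading across gives the share of a party’s voters whose choice is recovered, reading down gives the share of imputations for a party that are correct, and the panel (c)-style gap is assumed to be the difference between two panels’ across-readings.

```python
import random
from collections import Counter

def artificial_nonvoters(n_voters, turnout, seed=0):
    """Randomly pick the validated voters whose vote is deleted, so that
    only a `turnout` share of vote choices remains observed."""
    rng = random.Random(seed)
    n_deleted = round(n_voters * (1 - turnout))
    return sorted(rng.sample(range(n_voters), n_deleted))

def crosstab(recorded, imputed):
    """Counts of (recorded choice, imputed choice) pairs."""
    return Counter(zip(recorded, imputed))

def read_across(table, party):
    """Share of the party's recorded voters whose choice is imputed correctly."""
    row = sum(c for (r, _), c in table.items() if r == party)
    return table[(party, party)] / row if row else float("nan")

def read_down(table, party):
    """Share of imputations for the party that match the recorded choice."""
    col = sum(c for (_, i), c in table.items() if i == party)
    return table[(party, party)] / col if col else float("nan")

def closeness_gap(table_real, table_professed, parties):
    """Assumed panel (c) definition: positive values mean the imputations
    match real votes better than professed votes."""
    return {p: read_across(table_real, p) - read_across(table_professed, p)
            for p in parties}

# 1,835 validated voters, with the survey turnout of 69 percent re-imposed at random.
deleted = artificial_nonvoters(1835, 0.69)
print(len(deleted))  # 569

# Hypothetical toy data for the cross-tabulation.
recorded = ["FF", "FF", "FG", "FG", "SF"]
imputed = ["FF", "FG", "FG", "FF", "FF"]
table = crosstab(recorded, imputed)
print(read_across(table, "FF"), read_down(table, "FF"))
```

In the toy data, half of the recorded FF voters are imputed as FF (reading across), while one in three FF imputations is correct (reading down), which shows how the two readings can diverge when imputations drift towards a large party.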
And while the simulations are also closely correlated with non-voters’ professed and hypothetical votes, the match is again best for re-estimation of the deleted votes of actual voters.15 Indeed, the match for re-estimation of the deleted votes of actual voters is overall better for multinomial logit than it is for MI, although the latter has been able to make use of a much larger number of observations. Despite these differences, the main finding that emerges from this analysis of the 2002 Irish election is that the three methods for simulating the vote choices of non-voters lead to substantially similar results.

15 Again, we find that the edge of re-estimating the deleted votes of actual voters over simulating professed votes is somewhat reduced in the cases of the two government parties.

To gauge the extent to which this finding is specific to the Irish case, we also use the three methods to simulate full turnout at 30 elections in 25 countries from the Comparative Study of Electoral Systems (Module I).16 These elections capture the full variation of systems of government, electoral systems, party systems and voter turnout. In order to express the net differences between the observed and the hypothetical vote at the national level, we use the Gallagher Index of Disproportionality. Originally designed to measure the difference between the distributions of votes and seats in an election, this index uses squared differences of the proportions, thereby avoiding the problem that changes to party vote shares cancel each other out, while giving larger weight to big vote share changes for individual parties (Gallagher, 1991).17 The index provides an ideal tool for comparing the observed-versus-imputed vote bias across diverse elections.

16 The study contains data on two successive elections in Mexico (1997 and 2000) and Spain (1996 and 2000). To control for distinct electoral cleavages, data on Wallonia, East Germany, and Scotland are treated as separate elections.

17 The index is calculated as

$$\sqrt{\frac{1}{2}\sum_{i=1}^{n}\left(v_{c,i}-v_{o,i}\right)^{2}}$$

where $v_{c,i}$ denotes the vote share of party $i$ based on 100 percent (“complete”) turnout and $v_{o,i}$ denotes its observed vote share, for each of the $n$ competing parties.

Figure 3 shows the distribution of the resulting disproportionality scores across the 30 elections in the CSES data as well as the 2002 Irish election. Across this range of elections from different countries, many of the simulated election results under full turnout are less similar across the three methods than in the Irish case. MI leads to bigger turnout effects on average. The mean disproportionality score using MI is 4.25 (S.D. = 2.25), compared to 1.95 (S.D. = 1.33) when the Martinez and Gill method is used and .9 (S.D. = .74) in the case of the method used by Citrin et al. Bivariate correlations are moderate and statistically significant between the disproportionality scores based on MI and on Martinez and Gill’s method (r = .41, p = .02) and between those based on Martinez and Gill’s and Citrin et al.’s methods (r = .37, p = .04), but not between the MI and Citrin et al. based scores (r = .27, p = .14). This suggests that the model used by Martinez and Gill occupies a middle position between simulation based on demographic logit predictions and MI.

[Figure 3 about here]

Looking at the differences in correspondence of the three disproportionality scores across elections, few systematic sources of variation can be discerned. Instances in which election results amended by MI simulations differ starkly from simulations using multinomial logit predictions include proportional representation systems such as Spain’s as well as first-past-the-post systems such as Canada’s.
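The index in footnote 17 can be computed directly. The Python sketch below is illustrative only: the function name is ours, and as an example it uses the rounded full-sample shares reported in Appendix B (observed vote, MI with the full set of variables, and multinomial logit with all variables), so the values it returns only approximate the index for the unrounded Irish data. The same function serves to measure the distance between any two simulated distributions, such as MI versus multinomial logit.

```python
import math

def gallagher_index(complete, observed):
    """Least-squares index: sqrt(0.5 * sum of squared share differences).

    Both arguments map party -> vote share in percent and must cover the same parties.
    """
    if complete.keys() != observed.keys():
        raise ValueError("distributions must cover the same parties")
    return math.sqrt(0.5 * sum((complete[p] - observed[p]) ** 2 for p in complete))

# Rounded full-sample shares (percent) from Appendix B.
observed = {"FF": 45, "FG": 20, "Greens": 4, "Labour": 10, "PD": 3, "SF": 6, "Ind": 11}
mi_full = {"FF": 43, "FG": 21, "Greens": 4, "Labour": 10, "PD": 3, "SF": 7, "Ind": 11}
mlogit_all = {"FF": 43, "FG": 20, "Greens": 4, "Labour": 10, "PD": 4, "SF": 8, "Ind": 11}

print(round(gallagher_index(mi_full, observed), 2))    # 1.73
print(round(gallagher_index(mi_full, mlogit_all), 2))  # 1.22
```

Squaring the differences means that the two-point FF loss contributes four times as much as each one-point gain, which is the weighting property the text attributes to the index.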
However, cases like the Spanish elections or New Zealand in 1996 suggest that discrepancies between the different simulation methods might vary with the number of parties. To investigate this possibility, we regress the differences between MI and each of the two multinomial logit-based simulations on the number of parties fielded in each election (Figure 4). As before, the measure for the difference between any two simulations is Gallagher’s Index of Disproportionality. The mild positive association between the number of parties and the discrepancy between disproportionality scores is not statistically significant, suggesting that the differences between MI and multinomial logit predictions pertain as much to elections in two-party systems as to those in multi-party systems.

[Figure 4 about here]

The only systematic pattern that seems to appear is that the three disproportionality scores differ less in the context of elections with high turnout. We test this more directly by regressing the scores expressing the difference between MI and each of the two multinomial logit simulations, respectively, on turnout. Figure 5 shows that this expectation is borne out: a ten-percentage-point increase in turnout reduces the difference between MI and multinomial logit-based simulations by almost a full unit on both disproportionality scores (which range from .66 to 9.28 in the demographics-only case and from .5 to 10.23 in the case of the full-information model). Thus, the differences between MI simulations of counterfactual voting behaviour and multinomial logit predictions of the same matter most in the context of low-turnout elections.

[Figure 5 about here]

Discussion and Conclusions

The question of whether turnout matters for election results can only be answered if we can say with some degree of certainty how the abstainers would have voted had they voted.
In this article we have assessed and compared multiple imputation as a method of ascertaining the impact of turnout on election results. Our findings suggest that we can have reasonable confidence in the MI method of estimating turnout effects: two validity tests produced good results. The MI results are also partly matched by simulations using multinomial logit models as proposed by Citrin et al. (2003) and Martinez and Gill (2005). Besides its relative similarity to the results from multinomial logit estimation of full-turnout election results, MI simulation has a number of advantages. Firstly, it provides us with a measure of confidence reflecting the uncertainty of the imputation method as well as the uncertainty inherent in the world. Secondly, simulation by MI rests on assumptions about the relationship between turnout and vote choice that are less demanding, and therefore more realistic, than those underlying simulations based on multinomial logit predictions. Thirdly, multinomial logit strategies face a trade-off between the richness of the model on the one hand and the number of cases available as a basis for prediction on the other. In the cases of the Citrin et al. and Martinez and Gill studies, this amounts to a trade-off between a model based on demographics only, which tend to have fewer missing values than attitudinal variables, and a fully specified model. This trade-off forces a choice between the loss of information and potential bias incurred through an underspecified model and the loss of information and potential bias stemming from the deletion of a considerable number of cases. By imputing missing data on all variables, including the vote choice variable, MI simulation avoids this trade-off.
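The measure of confidence that MI provides comes from pooling the results of the separate imputed datasets. The Python sketch below applies Rubin’s (1987) combining rules to one party’s vote share. It is a minimal illustration, not the Amelia or Stata implementation: the input shares are hypothetical, the within-imputation variance is assumed to be the simple binomial sampling variance, and the interval uses a normal 1.96 multiplier rather than the t reference distribution with Rubin’s degrees of freedom.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, within_variances):
    """Pool m completed-data estimates with Rubin's rules.

    Returns the pooled estimate and its standard error, where the total
    variance is W + (1 + 1/m) * B.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (denominator m - 1)
    total = w + (1 + 1 / m) * b
    return q_bar, math.sqrt(total)

def normal_interval(q_bar, se, z=1.96):
    """Approximate 95 percent interval under a normal approximation."""
    return q_bar - z * se, q_bar + z * se

# Hypothetical: one party's share in five imputed datasets of 2,663 respondents.
shares = [0.034, 0.031, 0.037, 0.033, 0.035]
variances = [s * (1 - s) / 2663 for s in shares]  # binomial sampling variance
q, se = rubin_pool(shares, variances)
print(q, se, normal_interval(q, se))
```

The between-imputation term B is what a single set of multinomial logit predictions cannot supply: it widens the interval in proportion to how much the imputed vote choices disagree across datasets.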
The MI approach is therefore likely to be more efficient than multinomial logit to the extent that it can utilize a wider range of variables, including, where available, measures of voter utilities for parties such as party and candidate thermometers and propensity-to-vote (PTV) scores, without incurring a loss of observations. Comparing the predictions from models following Citrin et al.’s and Martinez and Gill’s specifications, respectively, the lesson is to prioritize a full model over the need to maximize observations if that is the only choice available. MI ensures that a third option is available.18

18 The latest version of Stata (release 11) provides MI routines that offer a range of multiple imputation methods. For imputation of missing values on only a single variable, multinomial logit regression is offered. This method enables researchers to simulate higher turnout using the method proposed by Martinez and Gill (2005) but with the added benefit of estimating the uncertainty of the imputations based on averaging across a number (e.g. five) of sets of imputed values. To the extent that this remedies the biggest weakness of the Martinez and Gill method, namely the absence of a measure of uncertainty, Stata 11 can implement an “enriched” version of their method. Alternatively, an iterative Markov chain Monte Carlo method is available to impute missing values on all variables in the dataset, resembling the procedure implemented by Amelia and evaluated in this article. By iteratively improving imputations on all (dependent and independent) variables, this option retains the added benefit of using all the information available in the data.

In line with recent studies on turnout effects at elections around the world (cf. Lutz and Marsh, 2007), the main finding from the Irish case study is that the fortunes of the Irish political parties at the 2002 election would have remained virtually unaffected by universal turnout. One might speculate about the difference that a few more votes might have made to the distribution of seats: Fine Gael might not have suffered quite such a meltdown (the party incurred by far the largest losses of all parties relative to the previous election), and Fianna Fáil might have needed support from independent TDs to form a government with the Progressive Democrats. But while the empirical differences between MI and the two sets of multinomial logit simulations are negligible in the case of the Irish election, an extension of the comparison to the first CSES module suggests that discrepancies are the norm rather than the exception. Differences become more marked the more non-voters there are whose behaviour has to be simulated in order to detect the political effects of high and low election turnout.

Finally, a caveat. What we have done here is simply to assess the impact of full turnout, which in practice can only ever be approximated, even through the use of compulsory voting rules. Our analysis does not say what would happen if turnout rose by 5, 10 or 15 percentage points, and there are good reasons to expect that this would depend on the agency of mobilization. The simulation of election results under full turnout is nonetheless important: this is, after all, at the heart of the argument in Lijphart’s (1997) seminal article on the topic. Future analyses should examine how MI performs relative to other approaches in simulating incremental increases or decreases in turnout.

References

Allison, Paul D. (2002) Missing Data (Thousand Oaks, CA: Sage).
Bennett, Stephen Earl, and David Resnick (1990) The Implications of Non-Voting for Democracy in the United States, American Journal of Political Science 34: 771–802.
Bernhagen, Patrick, and Michael Marsh (2007) The Partisan Effects of Low Turnout: Analyzing Vote Abstention as a Missing Data Problem, Electoral Studies 26 (3): 548–60.
Blondel, Jean, Richard Sinnott, and Palle Svensson (1998) People and Parliament in the European Union (Oxford: Oxford University Press).
Brady, Henry E., Sidney Verba, and Kay L. Schlozman (1995) Beyond SES: A Resource Model of Political Participation, American Political Science Review 89: 271–94.
Brunell, Thomas L., and John DiNardo (2004) A Propensity Score Reweighting Approach to Estimating the Partisan Effects of Full Turnout in American Presidential Elections, Political Analysis 12: 28–45.
Citrin, Jack, Eric Schickler, and John Sides (2003) What if Everyone Voted? Simulating the Impact of Increased Turnout in Senate Elections, American Journal of Political Science 47: 75–90.
Collins, Linda M., Joseph L. Schafer, and Chi-Ming Kam (2001) A Comparison of Inclusive and Restrictive Strategies in Modern Missing Data Procedures, Psychological Methods 6 (4): 330–51.
Crozier, Michel, Samuel P. Huntington, and Joji Watanuki (1975) The Crisis of Democracy: Report on the Governability of Democracies to the Trilateral Commission (New York, NY: New York University Press).
DeNardo, James (1980) Turnout and the Vote: The Joke’s on the Democrats, American Political Science Review 74: 406–20.
Dubin, Jeffrey A., and Douglas Rivers (1989) Selection Bias in Linear Regression, Logit and Probit Models, Sociological Methods and Research 18: 360–90.
Gurr, Ted R. (1970) Why Men Rebel (Princeton, NJ: Princeton University Press).
Heckman, James J. (1979) Sample Selection Bias as a Specification Error, Econometrica 47: 153–61.
Herron, Michael C. (1998) The Presidential Election of 1988: Low Voter Turnout and the Defeat of Michael Dukakis, Political Methodology Working Paper.
Highton, Benjamin, and Raymond E. Wolfinger (2001) The Political Implications of Higher Turnout, British Journal of Political Science 31 (1): 179–223.
Hill, Kim Quaile, Jan E. Leighley, and Angela Hinton-Andersson (1995) Lower Class Mobilization and Policy Linkage in the U.S. States, American Journal of Political Science 39: 75–86.
Honaker, James, and Gary King (2006) What to Do About Missing Values in Time Series Cross-Section Data, unpublished manuscript, available at http://gking.harvard.edu/.
Horton, Nicholas J., and Ken P. Kleinman (2007) Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models, The American Statistician 61 (1): 79–90.
Karp, Jeffrey A., and David Brockington (2005) Social Desirability and Response Validity: A Comparative Analysis of Overreporting Voter Turnout in Five Countries, Journal of Politics 67 (3): 825–40.
King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve (2001) Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation, American Political Science Review 95: 49–69.
Kohler, Uli, and Richard Rose (2008) Election Outcomes and Maximizing Turnout: Modelling the Effect, WZB Discussion Paper (Berlin: Social Science Research Centre Berlin).
Lijphart, Arend (1997) Unequal Participation: Democracy’s Unresolved Dilemma, American Political Science Review 91: 1–14.
Little, R. J. A., and Donald B. Rubin (2002) Statistical Analysis with Missing Data (Hoboken, NJ: Wiley).
Lyons, Pat, and Richard Sinnott (2002) Voter Turnout in 2002 and Beyond, pp. 159–76 in M. Gallagher, P. Mitchell and M. Marsh (eds.) How Ireland Voted 2002 (Basingstoke: Palgrave).
Lutz, Georg, and Michael Marsh (2007) Introduction: Consequences of Low Turnout, Electoral Studies 26 (3): 539–47.
Marsh, Michael, Richard Sinnott, John Garry and Fiachra Kennedy (2008) The Irish Voter: The Nature of Electoral Competition in the Republic of Ireland (Manchester: Manchester University Press).
Martinez, Michael D., and Jeff Gill (2005) The Effects of Turnout on Partisan Outcomes in U.S. Presidential Elections 1960–2000, Journal of Politics 67 (4): 1248–74.
Nannestad, Peter, and Martin Paldam (2002) The Cost of Ruling: A Foundation Stone for Two Theories, in H. Dorussen, H. D. Palmer and M. Taylor (eds.) Economic Voting (London: Routledge).
Nie, Norman H., Sidney Verba and John R. Petrocik (1979) The Changing American Voter (Cambridge, MA: Harvard University Press).
Pacek, Alexander, and Benjamin Radcliff (1995) Turnout and the Left-Party Vote, British Journal of Political Science 25 (1): 137–53.
Radcliff, Benjamin (1994) Turnout and the Democratic Vote, American Politics Quarterly 22: 259–76.
Rosenstone, Steven J., and John Mark Hansen (1993) Mobilization, Participation, and Democracy in America (New York, NY: Macmillan).
Rubin, Donald B. (1976) Inference and Missing Data, Biometrika 63: 581–92.
Rubin, Donald B. (1987) Multiple Imputation for Nonresponse in Surveys (New York: John Wiley).
Studlar, Donley T., and Susan Welch (1986) The Policy Opinions of British Non-voters: A Research Note, European Journal of Political Research 14: 139–48.
Tóka, Gábor (2002) Voter Inequality, Turnout and Information Effects in a Cross-National Perspective, Working Paper no. 297, Helen Kellogg Institute, University of Notre Dame.
Wolfinger, Raymond E., and Steven J. Rosenstone (1980) Who Votes? (New Haven, CT: Yale University Press).

Appendix A. Variables in Imputation Model for the 2002 Irish General Election

Variable | N | Mean | S.D. | Min. | Max.
Number of FF candidates in constituency | 2663 | 2.63 | 0.62 | 2 | 4
Number of FG candidates in constituency | 2663 | 2.14 | 0.7 | 1 | 4
Number of Green candidates in constituency | 2663 | 0.78 | 0.41 | 0 | 1
Number of Labour candidates in constituency | 2663 | 1.11 | 0.58 | 0 | 3
Number of PD candidates in constituency | 2663 | 0.49 | 0.65 | 0 | 3
Number of SF candidates in constituency | 2663 | 0.86 | 0.48 | 0 | 2
Number of independent candidates in constituency | 2663 | 3.47 | 1.8 | 0 | 7
How likely ever to vote for Fianna Fáil | 2625 | 6.74 | 3.2 | 1 | 10
How likely ever to vote for Fine Gael | 2603 | 5.11 | 3.05 | 1 | 10
How likely ever to vote for Green Party | 2586 | 4.69 | 2.81 | 1 | 10
How likely ever to vote for Labour | 2602 | 4.81 | 2.78 | 1 | 10
How likely ever to vote for Progressive Democrats | 2592 | 4.76 | 2.75 | 1 | 10
How likely ever to vote for Sinn Féin | 2595 | 3.37 | 2.83 | 0 | 10
How likely ever to vote for an independent candidate | 2599 | 5.68 | 2.97 | 0 | 10
Thermometer degree, Bertie Ahern | 2612 | 65.55 | 24.24 | 0 | 100
Thermometer degree, Mary Harney | 2595 | 51.07 | 23.54 | 0 | 100
Thermometer degree, Ruairi Quinn | 2558 | 42.87 | 20.71 | 0 | 100
Thermometer degree, Trevor Sargent | 2419 | 42.18 | 21.87 | 0 | 100
Thermometer degree, Michael Noonan | 2562 | 36.74 | 23.07 | 0 | 100
Thermometer degree, Gerry Adams | 2579 | 38.85 | 26.6 | 0 | 100
Thermometer degree, Fianna Fáil | 2591 | 63.92 | 25.6 | 0 | 100
Thermometer degree, Green Party | 2543 | 47.71 | 21.92 | 0 | 100
Thermometer degree, Fine Gael | 2567 | 47.03 | 23.34 | 0 | 100
Thermometer degree, Labour | 2557 | 45.29 | 20.97 | 0 | 100
Thermometer degree, Progressive Democrats | 2553 | 47.22 | 22.34 | 0 | 100
Thermometer degree, Sinn Féin | 2538 | 33.36 | 25.8 | 0 | 100
Evaluation of economy over last 5 years | 2657 | 1.9 | 1.08 | 1 | 6
Evaluation of health services over last 5 years | 2655 | 3.5 | 1.26 | 1 | 6
Evaluation of housing situation over last 5 years | 2651 | 3.13 | 1.48 | 1 | 6
Age | 2640 | 46.9 | 17.12 | 18 | 100
Female | 2663 | 0.52 | - | 0 | 1
Urban | 2592 | 0.29 | - | 0 | 1
Class | 2498 | 2.53 | 1.7 | 1 | 5
Education | 2654 | 3.84 | 1.37 | 1 | 6
Union member | 2326 | 0.35 | 0.48 | 0 | 1
Left/right self placement | 2347 | 2.1 | 0.61 | 1 | 4
Satisfaction with democracy | 2341 | 6.91 | 2.82 | 0 | 11
Efficacy | 2660 | 2.54 | 1.66 | 1 | 7
Frequency of attending religious service | 2393 | 3.09 | 1.84 | 1 | 8
Political knowledge | 2663 | 3.39 | 1.28 | 0 | 5
Party identification | 2663 | 0.28 | - | 0 | 1
Did Fianna Fáil contact? | 2663 | 0.33 | - | 0 | 1
Did Fine Gael contact? | 2663 | 0.24 | - | 0 | 1
Did Greens contact? | 2663 | 0.02 | - | 0 | 1
Did Labour contact? | 2663 | 0.12 | - | 0 | 1
Did Progressive Democrats contact? | 2663 | 0.04 | - | 0 | 1
Did Sinn Féin contact? | 2663 | 0.06 | - | 0 | 1

Appendix B.
Party vote shares under validated and full turnout using alternative imputation models and multinomial logit predictions

Columns:
(1) Observed vote (N=1,835)
(2) Professed vote (N=556)
Simulation of full turnout:
(3) MI, full set of variables (N=2,663)
(4) MI, demographics only (N=2,663)
(5) MI, preferences only (N=2,663)
(6) M-logit, demographics only (N=2,570)
(7) M-logit, all variables (N=2,240)

Full sample
Party         (1) N   (1) Prop.   (3) Prop.  S.E.    (4) Prop.  S.E.    (5) Prop.  S.E.    (6) Prop.  (7) Prop.
FF            820     45%         43%        0.012   44%        0.012   44%        0.012   44%        43%
FG            406     20%         21%        0.011   20%        0.009   21%        0.010   20%        20%
Greens        80      4%          4%         0.006   5%         0.006   4%         0.006   5%         4%
Labour        173     10%         10%        0.007   10%        0.007   10%        0.007   10%        10%
PD            58      3%          3%         0.004   3%         0.004   3%         0.005   3%         4%
SF            90      6%          7%         0.006   6%         0.007   6%         0.006   7%         8%
Independent   208     11%         11%        0.008   12%        0.008   11%        0.007   11%        11%

Nonvoters only (N=556 for columns (2) to (5); N=496 for (6); N=305 for (7))
Party         (2) N   (2) Prop.   (3) Prop.  S.E.    (4) Prop.  S.E.    (5) Prop.  S.E.    (6) Prop.  (7) Prop.
FF            261     45%         42%        0.036   43%        0.032   43%        0.033   43%        40%
FG            107     19%         23%        0.033   20%        0.025   24%        0.025   19%        21%
Greens        30      7%          4%         0.016   5%         0.011   4%         0.017   5%         5%
Labour        63      13%         10%        0.020   10%        0.016   10%        0.019   10%        11%
PD            22      3%          3%         0.010   3%         0.011   3%         0.013   3%         6%
SF            22      4%          7%         0.012   6%         0.019   6%         0.012   8%         10%
Independent   51      10%         11%        0.025   13%        0.018   10%        0.017   12%        8%

Figure 1. Observed versus full (100%) turnout vote
[Bar chart of vote shares (0 to 60%) for FF, FG, Greens, Labour, PD, SF and Ind. Series: Actual turnout (N=1,835); Full turnout (MI, N=2,663); Full turnout (Citrin et al., N=2,570); Full turnout (Martinez and Gill, N=2,240).]

Figure 2. Professed versus simulated vote
[Bar chart of vote shares (0 to 60%) for FF, FG, Greens, Labour, PD, SF and Ind. Series: Professed vote (N=556); MI (N=556); Citrin et al. predictions (N=496); Martinez and Gill predictions (N=305).]

Figure 3. Differences between observed and simulated vote shares based on MI and multinomial logit predictions (Gallagher Index)
[Bar chart of disproportionality (0 to 10) for each CSES election study (country and year). Series: MI; Martinez and Gill; Citrin et al.]

Figure 4. Differences between MI and multinomial logit simulations by number of parties
[Two scatter plots with fitted lines, points labelled by election study. Panel (a): Difference between MI and Citrin et al.; panel (b): Difference between MI and Martinez and Gill. Y-axis: disproportionality; x-axis: number of parties. Reported fits: b = .081 (se = .085; R-sq = .03) and b = .092 (se = .082; R-sq = .04); N = 31.]

Figure 5. Differences between MI and multinomial logit simulations by turnout
[Two scatter plots with fitted lines, points labelled by election study. Panel (a): Difference between MI and Citrin et al.; panel (b): Difference between MI and Martinez and Gill. Y-axis: disproportionality; x-axis: actual turnout. Reported fits: b = -.084 (se = .030; R-sq = .21) and b = -.087 (se = .029; R-sq = .24); N = 31.]

Table 1. MI Vote by Observed and Professed Vote

(a) Observed vote (re-imputed, N=569 of 1,835)
Observed      FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.64    0.17    0.02    0.03    0.03    0.00    0.11
FG            0.17    0.58    0.03    0.06    0.04    0.00    0.12
Greens        0.39    0.18    0.31    0.10    -0.10   0.01    0.11
Labour        0.17    0.20    0.05    0.36    0.02    0.02    0.18
PD            0.37    0.35    0.01    0.04    0.10    0.09    0.05
SF            0.21    0.09    0.09    0.12    -0.04   0.32    0.21
Independent   0.25    0.31    0.09    0.06    0.03    0.02    0.23
Total         0.40    0.30    0.05    0.08    0.03    0.02    0.14

(b) Professed vote (imputed, N=556 of 2,663)
Professed     FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.70    0.12    0.02    0.04    0.03    0.02    0.07
FG            0.28    0.47    0.03    0.06    0.03    0.01    0.11
Greens        0.27    0.21    0.19    0.15    0.05    0.03    0.11
Labour        0.30    0.16    0.05    0.27    0.01    0.07    0.13
PD            0.33    0.23    0.00    0.14    0.17    0.02    0.11
SF            0.24    0.23    0.04    0.06    0.02    0.30    0.12
Independent   0.33    0.18    0.04    0.12    0.01    0.07    0.23
Total         0.49    0.21    0.04    0.09    0.03    0.04    0.10

(c) Difference in party match (percentage points)
              FF      FG      Greens  Labour  PD      SF      Independ.
              -6      11      12      9       -7      2       0

Note: Cell entries are the average probabilities of a vote for the column party. Imputations are based on MI model including all available variables related to vote choice.

Table 2. Multinomial logit-simulated Vote by Observed and Professed Vote (Citrin et al. model)

(a) Observed vote (re-simulated, N=513 of 1,835)
Observed      FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.46    0.24    0.03    0.08    0.02    0.04    0.12
FG            0.44    0.26    0.04    0.09    0.02    0.04    0.12
Greens        0.41    0.26    0.05    0.09    0.03    0.03    0.13
Labour        0.45    0.25    0.04    0.09    0.02    0.04    0.12
PD            0.40    0.28    0.04    0.10    0.03    0.03    0.12
SF            0.45    0.20    0.03    0.08    0.02    0.08    0.13
Independent   0.44    0.25    0.04    0.09    0.02    0.04    0.12
Total         0.45    0.25    0.03    0.08    0.02    0.04    0.12

(b) Professed vote (simulated, N=496 of 2,663)
Professed     FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.44    0.21    0.04    0.10    0.03    0.06    0.12
FG            0.44    0.24    0.04    0.09    0.03    0.05    0.12
Greens        0.36    0.19    0.09    0.12    0.05    0.07    0.11
Labour        0.42    0.20    0.06    0.11    0.04    0.06    0.12
PD            0.40    0.20    0.07    0.12    0.05    0.06    0.10
SF            0.43    0.16    0.04    0.10    0.04    0.11    0.11
Independent   0.42    0.20    0.05    0.10    0.03    0.08    0.12
Total         0.43    0.21    0.05    0.10    0.03    0.06    0.12

(c) Difference in party match (percentage points)
              FF      FG      Greens  Labour  PD      SF      Independ.
              2       2       -4      -2      -2      -3      0

Note: Cell entries are the average probabilities of a vote for the column party. Simulations are based on multinomial logit model including only demographics.

Table 3. Multinomial logit-simulated Vote by Observed and Professed Vote (Martinez and Gill model)

(a) Observed vote (re-simulated, N=313 of 1,835)
Observed      FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.77    0.11    0.01    0.02    0.00    0.01    0.08
FG            0.15    0.72    0.00    0.03    0.01    0.01    0.08
Greens        0.10    0.01    0.70    0.09    0.00    0.02    0.08
Labour        0.14    0.07    0.00    0.66    0.01    0.00    0.12
PD            0.16    0.10    0.02    0.00    0.67    0.00    0.05
SF            0.20    0.07    0.00    0.02    0.00    0.66    0.06
Independent   0.20    0.22    0.03    0.04    0.01    0.01    0.49
Total         0.43    0.28    0.02    0.08    0.02    0.03    0.14

(b) Professed vote (simulated, N=305 of 2,663)
Professed     FF      FG      Greens  Labour  PD      SF      Independ.
FF            0.70    0.09    0.03    0.04    0.05    0.04    0.06
FG            0.21    0.54    0.01    0.07    0.03    0.06    0.09
Greens        0.18    0.16    0.39    0.07    0.04    0.09    0.08
Labour        0.13    0.10    0.05    0.56    0.03    0.05    0.07
PD            0.22    0.12    0.03    0.06    0.52    0.01    0.05
SF            0.07    0.06    0.05    0.04    0.06    0.69    0.04
Independent   0.25    0.16    0.07    0.09    0.04    0.09    0.30
Total         0.42    0.20    0.05    0.11    0.06    0.08    0.09

(c) Difference in party match (percentage points)
              FF      FG      Greens  Labour  PD      SF      Independ.
              7       18      31      10      15      -3      19

Note: Cell entries are the average probabilities of a vote for the column party. Simulations are based on multinomial logit model including all available variables related to vote choice.
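Appendix B reports standard errors for the MI vote shares, and Rubin (1987) gives the usual rules for pooling an estimate across m imputed datasets. The Python sketch below assumes those rules together with a binomial within-imputation variance p(1 - p)/n for a vote share. It is not a record of how the Appendix B figures were actually produced. The function name `rubin_combine` is hypothetical, and the five FF shares in the usage lines are invented for illustration.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Pool m completed-data estimates with Rubin's (1987) combining rules.

    Returns (pooled estimate, standard error of the pooled estimate).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)              # average of the m point estimates
    w_bar = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, math.sqrt(total)


# Hypothetical FF shares among n = 556 non-voters in five imputed datasets.
shares = [0.42, 0.44, 0.43, 0.41, 0.45]
n = 556
# Assumed binomial sampling variance of each completed-data share.
within = [p * (1 - p) / n for p in shares]
pooled, se = rubin_combine(shares, within)
print(f"pooled share = {pooled:.3f}, S.E. = {se:.3f}")  # pooled share = 0.430, S.E. = 0.027
```

`statistics.variance` uses the m - 1 divisor that Rubin's between-imputation term calls for. The (1 + 1/m) factor inflates the total variance to reflect uncertainty about the imputed values themselves.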
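Figure 3 measures the gap between observed and simulated vote shares with the Gallagher index. The following is a minimal Python sketch that assumes the standard least-squares form of the index, applied here to two vote-share vectors rather than to votes and seats. The helper name `gallagher_index` is hypothetical. The inputs are the rounded full-sample shares from Appendix B (observed vote and MI with the full set of variables), so the result only illustrates the arithmetic and is not a value plotted in Figure 3.

```python
import math


def gallagher_index(observed, simulated):
    """Least-squares (Gallagher) index between two share vectors in percent.

    LSq = sqrt(0.5 * sum_i (observed_i - simulated_i) ** 2)
    """
    if len(observed) != len(simulated):
        raise ValueError("share vectors must have the same length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(0.5 * squared)


# Full-sample shares (%) from Appendix B, parties ordered
# FF, FG, Greens, Labour, PD, SF, Independent.
observed = [45, 20, 4, 10, 3, 6, 11]
mi_full = [43, 21, 4, 10, 3, 7, 11]

print(round(gallagher_index(observed, mi_full), 2))  # 1.73
```

Because the differences are squared before summing, a few large party-level gaps move the index more than many small ones.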
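In Tables 1 to 3, each row groups respondents by observed or professed party. Each cell gives the average probability of a vote for the column party, and panel (c) is the diagonal of panel (a) minus the diagonal of panel (b), in percentage points. The sketch below uses Python and NumPy and assumes that a probability vector is available for each respondent. For MI, this could be imputed values averaged over imputations, but that is an assumption, not the documented procedure. Both function names are hypothetical. The final lines reproduce the Table 1 party-match row from the diagonals of panels (a) and (b).

```python
import numpy as np


def average_probability_table(probs, groups, n_groups):
    """Average vote probabilities by group (rows) and party (columns).

    probs:  (n_respondents, n_parties) per-respondent probabilities
    groups: length-n array of group indices 0..n_groups-1 (e.g. observed party)
    Assumes every group contains at least one respondent.
    """
    probs = np.asarray(probs, dtype=float)
    groups = np.asarray(groups)
    rows = [probs[groups == k].mean(axis=0) for k in range(n_groups)]
    return np.vstack(rows)


def party_match_difference(table_a, table_b):
    """Diagonal of table_a minus diagonal of table_b, in percentage points."""
    diff = np.diag(np.asarray(table_a)) - np.diag(np.asarray(table_b))
    return np.round(100 * diff).astype(int)


# Toy check of the averaging step: three respondents, two parties.
toy = average_probability_table([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]], [0, 0, 1], 2)

# Diagonals of Table 1, panels (a) and (b), parties ordered
# FF, FG, Greens, Labour, PD, SF, Independent.
observed_diag = [0.64, 0.58, 0.31, 0.36, 0.10, 0.32, 0.23]
professed_diag = [0.70, 0.47, 0.19, 0.27, 0.17, 0.30, 0.23]
match = party_match_difference(np.diag(observed_diag), np.diag(professed_diag))
print(match.tolist())  # [-6, 11, 12, 9, -7, 2, 0]
```

Only the diagonals enter panel (c), so a method can score well on party match while still misallocating the off-diagonal probability mass among the other parties.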