Missing Voters, Missing Data

Missing Voters, Missing Data: Using Multiple Imputation to Estimate the
Effects of Low Turnout
17 May 2010
Patrick Bernhagen
Department of Politics and International Relations
University of Aberdeen
Michael Marsh
Department of Political Science
Trinity College Dublin
Address for correspondence:
Patrick Bernhagen, University of Aberdeen, Department of Politics and International
Relations, Edward Wright Building, Dunbar Street, Aberdeen, AB24 3QY, United Kingdom.
E-mail: p.bernhagen@abdn.ac.uk.
ABSTRACT: In recent years, different methods have been proposed to estimate the political
effects of low voter turnout. This article contributes to the discussion by assessing the
performance of multiple imputation in estimating the partisan effects of low turnout. Using
the 2002 Irish General Election as a case study, we demonstrate how multiple imputation can
be used to fill in the vote choices of non-voters. We verify simulations and reported turnout
against official data and compare the results to those from alternative, maximum likelihood
methods. While the methods differ in their ability to simulate vote choice correctly, these
differences are generally not large enough to affect the counterfactual estimation of election
results under universal turnout. To assess the generality of this finding, we also compare the
different methods across 30 elections in the Comparative Study of Electoral Systems dataset.
Multiple imputation produces on average higher turnout effects than multinomial logit
methods and the differences increase as turnout goes down. System variables such as the
number of parties do not affect the differences in results between methods.
Acknowledgements
Previous versions of this article were presented at the annual conference of the Political
Studies Association Specialist Group on Elections, Public Opinion and Parties (EPOP), 8-10
September 2006, Nottingham, and at the 65th Annual National Conference of the Midwest
Political Science Association, April 12-15, 2007, Chicago, IL. We would like to thank Brad
Gomez and three anonymous reviewers for the Journal of Elections, Public Opinion and
Parties for very helpful comments. Patrick Bernhagen would like to acknowledge grant
support from the British Academy.
In his 1996 presidential address to the American Political Science Association, Arend
Lijphart (1997) drew attention to the problem that less than full turnout might affect election
results and lead to the under-representation of the political interests of some groups of
citizens, such as ethnic minorities or low-income groups. Recent elections in Germany and
the USA revived the notion in public discourse that decreasing turnout is bad for social
democratic parties or that higher turnout would benefit the Democratic candidate. The
political consequences might extend well beyond the realm of electoral representation. For
example, class differences in voter mobilization have been shown to affect welfare spending
(Hill, Leighley and Hinton-Andersson, 1995). Numerous attempts have been made to assess
the veracity and relative importance of this claim: If turnout went up, would it affect the
election result? Which parties would gain and which would lose? Different methods have
been proposed to estimate the political effects of low turnout, often yielding different answers
to these questions.1 This article contributes to the methodological discussion by assessing the
performance of multiple imputation in the estimation of the partisan effects of low voter
turnout.
We proceed as follows: After a brief overview of the main strategies for detecting
turnout effects, we introduce the idea of treating vote abstainers as missing data. Using the
2002 Irish General Election as a case study, we first demonstrate how a statistical model of
multiple imputation can be used to fill in the vote choices of non-voters. We compare the
results to those generated by more traditional vote-propensity methods and demonstrate the
1
A recent special issue of Electoral Studies contains studies of turnout effects in the context of large numbers of
national and supranational elections as well as referendums. Lutz and Marsh (2007) provide a thorough review
of the previous literature on turnout effects.
validity of the multiple imputation method by verifying estimates and reported turnout
against official data. Both the Irish polity and the Irish National Election Study have a
number of useful properties that enable us to verify our estimates against professed vote
choices and official records and evaluate the performance of different methods for
ascertaining turnout effects. Furthermore, we assess the generality of our results by comparing
turnout effects at 30 elections in 25 countries in the Comparative Study of Electoral Systems
dataset and identifying the institutional and structural circumstances in which the results from
the different simulation methods diverge. We conclude by evaluating the utility of multiple
imputation vis-à-vis traditional methods of estimating turnout effects and discuss the
implications for political scientists’ efforts to assess the political consequences of low
turnout.
Does Turnout Matter? An Overview of Methods and Findings
Political preference revelation at elections and other polls will be biased whenever turnout is
short of 100 percent and abstainers’ preferences are non-randomly distributed among the
electorate. When everyone votes there can be no bias in the representation of party
preferences. And if non-voters are a representative cross-section of the electorate, their
abstention will not increase the influence of any one group of voters at the expense of
another. But vote abstention seems far from random and for some time now political
scientists have been able to identify those segments of the electorate that are least likely to
vote (Brady, Verba and Schlozman, 1995; Norris, 2002, 83-100; Rosenstone and Hansen,
1993). At the same time, the ability to estimate the size and direction of the political bias that
results from non-random voter abstention is severely hampered. We frequently know how
voters differ from non-voters with respect to various sociodemographic characteristics or
political attitudes. We can also determine the preferences of those voters that are in many
important social and political respects like non-voters and in that way estimate the
hypothetical vote choices of nonvoters. Finally, we can ask non-voters how they would vote
if they voted. But we cannot know with certainty what the vote choice of non-voters would
be if they actually turned into voters. Together, these hypothetical vote choices make up what
we refer to as turnout effects – the effects that higher or lower turnout would have on an
election outcome, where election outcome is defined as the vote shares of the parties (cf. Lutz
and Marsh, 2007: 541). Any attempt to estimate such turnout effects is subject to the
fundamental methodological problem that the demonstration of political inequalities resulting
from unequal turnout requires knowledge of how non-voters might behave were they not
non-voters. The question is, how can we know the parameters that govern this counterfactual
state of the world?
In the presence of these limitations, three strategies can be distinguished among
efforts to estimate counterfactual voter behaviour and its effects on election outcomes. One
approach uses opinion polls to consider whether voters and nonvoters differ in any significant
way on the dimension of partisan identification or with respect to various policy-related
issues. Some studies use survey data to compare the attitudes of voters and non-voters with
respect to social and economic policy issues or general attitudinal dispositions (e.g., Bennett
and Resnick, 1990), while others ask more specifically about partisan identifications and
preferences (e.g., Highton and Wolfinger, 2001). But querying nonvoters as to whether their
attitudes differ from those of voters is not the same as estimating how these people would
behave in actual elections: the propensity to vote is part of an individual’s attitudinal make-up
and it is possible that, as this aspect changes, other elements such as party or policy
preferences change too.
A second approach examines election studies and official election results for evidence
of turnout effects. This involves regressing the vote share of certain types of parties and
candidates (usually left-of-centre ones) on aggregate turnout and a variety of control variables
(DeNardo, 1980; Radcliff, 1994; Pacek and Radcliff, 1995). But the use of aggregate data by
this approach poses an ecological inference problem: the fact that turnout rates in a group of
elections are similar does not imply that in each of the elections the types of individuals
abstaining are the same (Herron, 1998: 6-7).
A third approach to estimating turnout effects, finally, tackles these problems by
analyzing individual-level behaviour using the information contained in reported electoral
behaviour to estimate the propensity to vote for a particular party and then compare (self-reported) voters and non-voters (e.g., Brunell and DiNardo, 2004; Citrin et al., 2003;
Martinez and Gill, 2005). In this approach, used for various analyses of US elections, the vote
choices of nonvoting individuals are predicted using coefficients from maximum likelihood
regression analysis of the behaviour of voters who share similar sociodemographic and,
sometimes, political characteristics.2 These votes are then added to the observed vote choices
after a weighting has been applied to account for differential propensity to vote.
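
The simulation step of this third approach can be sketched as follows. The sketch assumes that multinomial logit coefficients have already been estimated from reported voters; the party labels, coefficient values and covariates are hypothetical, and the weighting for differential propensity to vote is left out for brevity.

```python
import math

def predict_probs(coefs, x):
    """Multinomial-logit choice probabilities for one respondent.

    coefs maps each party to [intercept, slope_1, slope_2, ...];
    the baseline party carries all-zero coefficients.
    """
    scores = {p: b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
              for p, b in coefs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    expd = {p: math.exp(s - top) for p, s in scores.items()}
    total = sum(expd.values())
    return {p: e / total for p, e in expd.items()}

def full_turnout_shares(observed_counts, coefs, nonvoters):
    """Add each non-voter's predicted probabilities to the observed vote counts."""
    totals = {p: float(c) for p, c in observed_counts.items()}
    for x in nonvoters:
        for party, prob in predict_probs(coefs, x).items():
            totals[party] = totals.get(party, 0.0) + prob
    n = sum(totals.values())
    return {p: v / n for p, v in totals.items()}

# Hypothetical inputs: three parties, two covariates per respondent.
coefs = {"A": [0.0, 0.0, 0.0], "B": [0.2, -0.5, 0.1], "C": [-0.3, 0.4, 0.0]}
observed = {"A": 50, "B": 30, "C": 20}
nonvoters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shares = full_turnout_shares(observed, coefs, nonvoters)
```

Adding predicted probabilities rather than a single most-likely party keeps the uncertainty of each prediction in the aggregate shares, which is one common design choice for this kind of simulation.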
This approach has its own drawbacks. Firstly, in common with the previous approach
it works on the assumption that nonvoters would behave like those voters with whom they
share a set of sociodemographic and attitudinal characteristics. This assumption is
problematic. Sociodemographic and attitudinal correlates of vote choice alike may provide a
weak basis for estimating the preferences of abstainers, not least because abstainers might be
those with preferences that run counter to the norm in their respective social group. Secondly,
simulations of the vote choices of nonvoters are often based on a restricted set of variables,
consisting mainly of demographics (e.g., Citrin et al., 2003; Brunell and DiNardo 2004).
Martinez and Gill’s study (2005) improves on this by employing an expanded set of predictor
2
This strategy is also followed by Tóka’s (2002) analysis of elections in the Comparative Study of Electoral
Systems dataset, although he uses discriminant analyses to estimate his vote choice models.
variables of vote choice and turnout in their model. But attitudinal variables tend to have
more missing values than demographics and, with listwise deletion as the default method of
dealing with missing data, more variables usually mean fewer cases. Thus, the multinomial
logit methods employed by these authors involve a trade-off between the richness of the
model on one hand and the number of cases to be used as a basis for prediction on the other.
This is a choice between a loss of information and potential bias incurred through an
underspecified model and a loss of information and potential bias stemming from deletion of
a considerable number of cases available for fitting the model.3 By opting for the former,
Citrin et al. (2003) avoided data attrition at the expense of a richly specified model for
predicting vote choice.4 By contrast, Martinez and Gill (2005) use a more fully specified vote
model but at the cost of losing a considerable amount of data. At best, this leads to a loss of
information and an increase in variance around the predictions. At worst, missingness is
correlated with either vote choice or vote propensity, or both. Hence, Martinez and Gill’s
(2005) model for predicting vote choice may also be mis-specified as it omits the selection
variable, and their estimates of turnout effects may be incorrect. In the following section we
describe an alternative method that addresses these limitations.
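
The trade-off between model richness and the number of usable cases under listwise deletion can be illustrated with a small sketch. The respondents, variable names and values below are hypothetical and serve only to show how adding attitudinal variables shrinks the set of complete cases.

```python
def complete_cases(rows, variables):
    """Return the rows that have no missing value (None) on any listed variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]

# Hypothetical survey records; None marks item non-response.
rows = [
    {"age": 34, "income": 3, "econ_eval": 2, "party_therm": 60},
    {"age": 51, "income": None, "econ_eval": 4, "party_therm": 45},
    {"age": 27, "income": 2, "econ_eval": None, "party_therm": None},
    {"age": 66, "income": 1, "econ_eval": 3, "party_therm": None},
    {"age": 45, "income": 4, "econ_eval": 1, "party_therm": 80},
]

demographics = ["age", "income"]
full_model = demographics + ["econ_eval", "party_therm"]

n_demo = len(complete_cases(rows, demographics))  # cases left for the sparse model
n_full = len(complete_cases(rows, full_model))    # cases left for the rich model
```

In this toy example the demographic model retains four of five respondents, while the richer model retains only two.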
Multiple Imputation
Voting is an individual choice, which makes the sample of voters a self-selected one (Dubin
and Rivers, 1989). The reason why we do not have data on how some citizens voted is simply
3
In fact, because the problem of sample selection is one of omitted variable bias (Heckman, 1979; Dubin and
Rivers, 1989), the loss of observations is strictly speaking also a problem of model specification.
4
This is not to insinuate that Citrin et al. made a deliberate decision in favor of larger N and against a fully
specified model. Rather, as Citrin et al. analyze state-level elections in the USA, their data restrictions result
from the census data they used, which are the best data available for their purpose but do not contain attitudinal
variables.
because they chose not to vote. This means that the problem of less-than-full turnout is
analogous to the problem of missing observations in any statistical analysis. If missing data
points are uncorrelated with the error term or with the variables of interest, they pose few
problems beyond reducing the number of observations and thereby increasing the variance of
the estimates of parameters or quantities of interest. But if the sampling fractions are
correlated with the errors or the dependent variable, estimates may be biased. In the case of
voter turnout, we have reason to suspect such correlation: while studies of turnout effects
may have failed to establish a robust and directional link between turnout and vote choice,
their findings on the whole suggest that abstention is not random in relation to vote choice
(e.g., Norris, 2002, 83-100).
Treating the unobserved vote choices of non-voters as missing data points means that
we assume the vote choices of non-voters constitute data that actually exist but have not been
observed or recorded because the potential voters’ proclivity to turn out the vote has not
exceeded a certain threshold. Those who find this assumption difficult to accept might find it
more agreeable to think in terms of preferences instead of vote choices: if we assume that
individual preferences over parties and candidates are measured by votes, data on these
preferences are missing for non-voters (Dubin and Rivers, 1989, p. 383). The question of
turnout effects is whether or not the decision to vote and the decision which party to vote for
are related. This question can be tackled using available techniques for the imputation of
missing data. “Ad-hoc” methods of addressing missing data such as filling in means or
imputing predicted values from regression analysis based on the observed data points
(“conditional means imputation”) at best understate variability and at worst induce bias
(Allison, 2002; Horton and Kleinman, 2007). Imputations arrived at by these methods will be
unbiased only if the probability of missing data on any variable is unrelated to the values of
the variable itself or the values of any other variables in the data. In the context of turnout
effects, predictions from regression analysis therefore run the risk of producing wrong
estimates of nonvoters’ party choices while at the same time suggesting a false sense of
accuracy of these estimates.
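
A minimal sketch of why filling in means understates variability, using hypothetical thermometer-style scores: every imputed case sits exactly at the observed mean, so the completed data look less dispersed than the observed data.

```python
import statistics

# Hypothetical observed scores and number of respondents with a missing score.
observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 4

# Mean imputation: every missing score is replaced by the observed mean.
mean_value = statistics.mean(observed)
completed = observed + [mean_value] * n_missing

var_observed = statistics.variance(observed)    # sample variance of real data
var_completed = statistics.variance(completed)  # sample variance after filling in
```

The sum of squared deviations is unchanged while the number of cases doubles, so the variance of the completed data is less than half that of the observed data, and any standard error computed from it would be correspondingly too small.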
The risk of biased estimates can be greatly reduced by the use of maximum likelihood
estimation. But maximum likelihood predictions of vote choice such as those reported by
Citrin et al. (2003) or Martinez and Gill (2005) may still be biased if the probability of any
data point missing depends on both the dependent and independent variables – which is of
course precisely the conjunction that is suspected to render turnout problematic for election
outcomes. Given the robust findings in the literature on the sociodemographic and attitudinal
determinants of vote choice (Nie, Verba and Petrocik, 1980) and equally robust evidence
concerning the individual-level correlates of turnout (Wolfinger and Rosenstone, 1980), the
possibility of missingness being jointly contingent on both dependent and independent
variables cannot be dismissed.
In this situation, multiple imputation (MI) is an efficient method of arriving at
estimates of non-recorded vote choices. Bernhagen and Marsh (2007) use this method to
estimate turnout effects at 30 elections in 25 countries from the Comparative Study of
Electoral Systems (Module I). While their model specifications closely resemble Tóka’s
(2002) and Martinez and Gill’s (2005) models, their reported changes in the parties’ fractions
of the vote are generally higher than those generated by the multinomial logit or discriminant
analyses of these studies. They find the change in parties’ vote share before and after
imputation is mainly a function of turnout, that is, of the scope for change, which increases
with declining turnout. At the level of parties, Bernhagen and Marsh found no evidence for
either left, right, or centre parties gaining from full turnout scenarios. However, they did find
that non-governing parties typically benefit from full turnout, as was originally suggested by
DeNardo (1980), and that smaller parties would gain from full turnout. More generally,
Bernhagen and Marsh found that full turnout would on average reduce the gap between the
strongest and second-strongest party. In one case, the US Congressional election of 1996, this
would have led to a different party coming in first in the election.
Originally proposed by Rubin (1976), MI involves three steps. First, plausible values
for missing observations are created that reflect uncertainty about the nonresponse model.
These values are used to “fill-in” the missing data points. This process is repeated, resulting
in the creation of a number (usually 5-10) of “completed” datasets. In a second step, each of
these datasets is analyzed using standard methods. In the case of estimating turnout effects,
this simplifies to estimating the vote proportions of the different parties. Thirdly, the results
are combined, which allows the uncertainty regarding the imputation in step one to enter the
final result. To apply this method to the problem of low turnout, we have to assume that the
probability of an individual’s vote choice remaining unrecorded may depend on the observed
values of other variables, but, after controlling for these variables, is independent of any other
missing information. In the terminology established by Rubin’s (1976) classification of data
missingness, that is to assume the data are missing at random (MAR), i.e., missingness is
random after controlling for missingness due to observed quantities. Formally, if there are
two variables X and Y, where X is always observed and Y is sometimes missing, MAR
means,
$$\Pr(Y \text{ missing} \mid Y, X) = \Pr(Y \text{ missing} \mid X).$$
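
For orientation, the MAR condition can be set beside the two neighbouring cases in the standard classification of missingness. The block below is a sketch in the same notation; the labels MCAR and MNAR are the conventional textbook names and are an addition here, not terms used in the text.

```latex
% MCAR: missingness is unrelated to both Y and X
\Pr(Y \text{ missing} \mid Y, X) = \Pr(Y \text{ missing})
% MAR: missingness may depend on X, but not on Y once X is controlled for
\Pr(Y \text{ missing} \mid Y, X) = \Pr(Y \text{ missing} \mid X)
% MNAR (non-ignorable): missingness still depends on Y after conditioning on X
\Pr(Y \text{ missing} \mid Y, X) \neq \Pr(Y \text{ missing} \mid X)
```

In the turnout setting, MAR amounts to assuming that, among respondents with the same observed characteristics X, abstainers and voters do not differ systematically in their vote choice Y.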
Of course, most multivariate datasets will contain missing values on several, perhaps
even on most, variables. Moreover, it is impossible to test whether the MAR condition is
actually satisfied (Allison, 2002: 4). However, if at least one element in a vector of
independent variables X is fully observed, we can assume that the data are MAR, conditional
on the imputation model (King et al., 2001: 53). Furthermore, the MAR assumption can be
made more realistic by including more informative variables in the imputation process
(Collins, Schafer and Kam, 2001).
To estimate turnout effects by way of multiple imputation we use the Amelia II
program written by Honaker, King and Blackwell.5 Amelia II uses an expectation-maximization (EM) algorithm to generate values in place of missing observations. While the
likelihood conditional on the observed (but incomplete) data cannot be easily constructed, the
likelihood of a rectangularized data set (i.e., one for which all cells are treated as observed) is
easy to construct and maximize, especially under the assumption of multivariate normality.
The EM algorithm rectangularizes the data set by filling in estimates of the missing elements,
generated from the observed data. In the E-step, missing data points are filled-in using linear
regression, with their expected values conditional on the current estimate of the sufficient
statistics and the observed data. In the M-step, a new estimate of the sufficient statistics is
computed from the current version of the completed data (see Honaker, James and King,
2006 for a detailed exposition).
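
The E-step and M-step can be sketched for the simplest case: two jointly normal variables, with x always observed and y sometimes missing. This is an illustrative toy under those assumptions, not the Amelia II implementation; the function name and the data are hypothetical.

```python
def em_bivariate(x, y, n_iter=200):
    """EM estimates for a bivariate normal; x is complete, y has gaps (None)."""
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Starting values for the y-related parameters come from the observed y only.
    obs = [yi for yi in y if yi is not None]
    mu_y = sum(obs) / len(obs)
    s_yy = sum((yi - mu_y) ** 2 for yi in obs) / len(obs)
    s_xy = 0.0
    for _ in range(n_iter):
        # E-step: expected y and y^2 for missing cases via linear regression on x.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                fit = mu_y + beta * (xi - mu_x)
                ey.append(fit)
                ey2.append(fit ** 2 + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi ** 2)
        # M-step: recompute the sufficient statistics from the completed data.
        mu_y = sum(ey) / n
        s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * mu_y
        s_yy = sum(ey2) / n - mu_y ** 2
    return {"mu_x": mu_x, "mu_y": mu_y, "s_xx": s_xx, "s_xy": s_xy, "s_yy": s_yy}

# Hypothetical data: y is missing for every second case.
estimates = em_bivariate([1, 2, 3, 4, 5, 6], [2.0, None, 6.5, None, 9.5, None])
```

With this missingness pattern the iterations settle on the regression line fitted to the complete cases, evaluated at the mean of all x values, which gives a convenient check on the sketch.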
The multiple imputation procedure is not intended to create causal explanation or
parameter interpretation (King et al. 2001). The algorithm imputes values to all empty cells in
a dataset loaded by it and does not discriminate between variables according to their status as
dependent or independent variables in a regression model. Thus, MI imputes values on the
independent variables as much as on the dependent variable, and the imputed values of the
dependent variable are then used to improve the imputation of independent variables, and
vice versa.
EM always starts with the full covariance matrix, which means that it uses all the
available variables as predictors for imputing the missing data (Allison, 2002: 20). Thus, the
5
The software, Amelia II: A Program for Missing Data (version 1.1-6 beta, July 18, 2006) is freely available at
<http://gking.harvard.edu/amelia/ >.
MI model uses more information, in terms of both variables and observations, than
the approaches of Citrin et al. (2003) and Martinez and Gill (2005), each of which deletes
considerable amounts of existing data. This may well alter the simulation of non-voters’ vote
choices and lead to different simulations of election outcomes. However, analysing a single
EM-completed dataset would generally yield standard errors that are too low, as the estimator
treats the imputed values as if they were observed data. The solution to this problem is to repeat the EM-based
imputation process m times to produce m “complete” datasets. If random draws from the
residual distribution of each imputed variable are made and added to the imputed values,
estimates of the parameters of interest will be slightly different depending on which imputed
dataset is used. This variability can be used to adjust the standard errors upward by averaging
the parameters of interest and combining their standard errors according to a formula devised
by Rubin (1987). For the analysis of turnout effects, this involves obtaining the probability of
voting for a particular candidate or party for each imputed dataset (j = 1, ..., m) and averaging
the m values. The standard error for the estimate is obtained in three steps: (1) the standard
errors for the m point estimates are squared and then averaged; (2) the sample variance in the
point estimates across the data sets is calculated; (3) the results from (1) and (2) are added
together, weighted by a factor that corrects for the bias resulting from m < ∞, and the square
root is taken (Allison, 2002: 29-30). As vote choice is a categorical variable measuring
choices among k candidates, as is the case in most elections, we generate multiple
imputations for k dummy variables created from the categories of the original vote choice
variable.
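
The three-step combination of estimates described above can be sketched as a short function. The function name and the example numbers are hypothetical; the weighting factor used for the between-imputation variance is 1 + 1/m, the usual finite-m correction in Rubin's combining rules.

```python
import math
import statistics

def pool_estimates(point_estimates, standard_errors):
    """Combine m estimates of one quantity (e.g. a party's vote share)."""
    m = len(point_estimates)
    pooled = statistics.mean(point_estimates)
    # (1) average of the squared standard errors (within-imputation variance)
    within = statistics.mean([se ** 2 for se in standard_errors])
    # (2) sample variance of the point estimates (between-imputation variance)
    between = statistics.variance(point_estimates)
    # (3) add both, inflating (2) by the finite-m correction, and take the root
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical vote shares for one party from m = 5 completed datasets.
shares = [0.40, 0.42, 0.41, 0.39, 0.43]
ses = [0.02, 0.02, 0.02, 0.02, 0.02]
pooled_share, pooled_se = pool_estimates(shares, ses)
```

In this example the pooled standard error is larger than any of the five individual standard errors, reflecting the additional uncertainty introduced by the imputation.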
Data: The 2002 Irish National Election Study and the Comparative Study of Electoral
Systems
The first data we use to demonstrate the MI method of simulating higher voter turnout are
from the 2002 Irish National Election Study.6 Several aspects make Ireland an important case
study for the assessment of the political effects of low voter turnout. Firstly, among European
countries, the Republic of Ireland has fared notably badly in terms of election turnout.
Average turnout at general elections has been only slightly above 70 percent since the 1970s,
hitting a low at the 2002 election with 63 percent. Secondly, Ireland’s low turnout facilitates
the estimation of tangible turnout effects, as any biases will be larger the bigger the share of
vote abstainers among the electorate. Together with the previous point this means there is
both more chance of observing a bias in the first place and more chance that such a bias can
have a significant impact on the result (Kohler and Rose, 2008). Thirdly, the country’s STV
electoral system leads to a fairly proportional translation of votes into seats, which reduces
the amount of strategic voting among the observed votes that provide much of the
information for simulating the behaviour of nonvoters.
There are also a number of desirable properties of this particular election study that
facilitate our efforts of estimating turnout effects and gaining a sense of how valid our
method is. The Irish election study contains official data on individual turnout, allowing the
validation of voters and weight corrections for over-reporting. Furthermore, it contains the
stated preferences of those who reported a vote, even if the official record indicated that the
vote was not cast, as well as the hypothetical preferences of many who said they had not
voted but told the interviewers how they would have voted. This will be useful in evaluating
the MI method for assessing the impact of low turnout on the outcome. The dataset contains a
range of demographic and political variables for 2,663 individuals. 2,391 respondents have
reported their vote choices at the 2002 election by either agreeing to fill out a ballot paper
6
The 2002 Irish National Election Study was funded under the PRTLI/National Development Plan: see
www.tcd.ie/ines.
during the interview or simply reporting their first preference vote.7 Of these, 1,835 have
been officially validated as having voted.
The MI procedure can make use of an extensive set of variables that can plausibly be
suspected to be related to vote choice. We include gender and age, as these variables are often
found to be influential in determining candidate or party preference. Additional
sociodemographic variables, such as union membership, education, income, urban versus
rural residence, religious denomination and language are also included, because it has been
conjectured that these sociodemographic variables influence either turnout or vote choice or
both. Beyond these sociodemographic characteristics, we include respondents’ evaluations of
the economy and of other policy areas (health and housing), as well as a measure of political
knowledge. Above all, however, we are able to impute missing data points based on the
reported party and party leader preferences of voters and non-voters as recorded by reported
probability to vote (PTV) and thermometer scales. Thus, the extensive set of data employed
here enables us to go well beyond the sociodemographic correlates of turnout. The inclusion
of party and leader preferences and evaluations at least allows for the possibility that
abstainers might be those with preferences that run counter to the norm in their social groups,
thus providing a significant improvement over simulations based solely on demographics.
Furthermore, the political variables add to the overall richness of the set of predictors in the
model, thus making the MAR assumption more realistic. A list of the variables included in
the multiple imputation model is provided in Appendix A.
7
Respondents were asked to fill out a ballot paper for the relevant constituency. Those who said they voted in
May 2002 were asked, “Please fill it in as you did on polling day – as best you can remember.” Respondents
who said they did not vote were asked, “Suppose you had voted in the May 2002 election, how would you have
filled in the ballot paper on election day?” Respondents who declined to fill out the ballot paper were asked,
“Could I ask which party you gave your first preference vote to?”.
The particular properties of the Irish election study may mean that findings based on
this case are not necessarily generalizable to other contexts with different characteristics,
such as political systems with fewer parties or larger incentives for strategic voting. To better
judge the generality of our findings from the Irish case, we therefore replicate Bernhagen and
Marsh’s (2007) analysis of turnout effects at 30 elections in 25 countries from the
Comparative Study of Electoral Systems (Module I). These surveys were conducted between
1996 and 2002 and at a functionally equivalent point in time: when a national election was
taking place. This maximizes unit homogeneity across elections that differ in respect of
important variables such as the number of parties or extent of voter participation.
Analysis and Results
Existing theories and evidence about the reasons why people do not vote suggest a number of
patterns that can be expected. A well-established argument claims that non-voters are non-voters because they have little contact with agencies of mobilization (Rosenstone and
Hansen, 1993; Brady, Verba and Schlozman, 1995). Following this logic, we might expect
the larger Irish parties, Fianna Fáil (FF) and Fine Gael (FG), to be more attractive to potential
supporters than smaller parties, as well as being more effective at mobilizing any latent
support. Consequently, those who stay at home might be more likely to have a preference for
smaller parties, and smaller parties such as Greens, Labour, and Progressive Democrats (PD)
should benefit most from “complete” turnout.8 A second argument sees non-voting as a sign
of disaffection (Crozier, Huntington and Watanuki, 1975; Gurr, 1970). Non-voters are more
8
We do not expect Sinn Féin to fall within the domain of this argument because this party is renowned for its
resourced and resourceful local campaigns that match those of the larger parties.
detached from the established political system, and if they did vote it would tend to be for
more radical parties, both right and left. Applying this logic to the 2002 Irish case we might
expect more radical parties, such as the Greens or Sinn Féin (SF) to benefit from 100 percent
turnout. Furthermore, we expect parties with a strong working class support base to benefit
most from complete turnout. In many European countries these will be left-wing, socialist or
Labour parties, but given what we know about party affiliations in Ireland, we expect Sinn
Féin to benefit most from higher working class turnout, followed by Fianna Fáil.
While we have referred so far to voters and non-voters as separate groups of people,
people may move both into and out of the electorate over time. For example, habitual voters
may abstain because they are acutely unhappy with the incumbent performance of their
traditional party. This may be due to the state of the economy or the likely failure by any
government to deliver on some of their election pledges. At any rate we know that governing
parties tend to lose votes (Nannestad and Paldam, 2002), and some of that loss will be due to
abstention by erstwhile supporters. This would imply that incumbent parties (Fianna Fáil,
Progressive Democrats) benefit most from “complete” turnout.9
9
We should also allow for the opposite effect of turnout increases identified by DeNardo (1980) for a series of
US Congressional elections in the 1960s and 1970s. According to DeNardo, turnout boosts tend to harm the
incumbent party as they involve the mobilization of “peripheral” voters, who respond in a rather fickle fashion
to short term campaign effects.
[Figure 1 about here]
Figure 1 displays the vote shares of each party and the residual group of
independents/others as quadruplets of bars. For each party, the first bar represents the party’s
vote share as recorded from the 1,835 validated voters in the dataset, weighted by
demographic corrections for survey bias. This is contrasted with the second bar based on
counterfactual full (100 percent) turnout. The full turnout figures are composed of the 1,835
vote choices of validated voters, the 148 imputed vote choices of people that did vote, but did
not divulge their choice, and the imputed vote choices of 680 non-voters. The imputations
have been obtained using information on the full range of variables listed in Appendix A.10
The error bars represent 95 percent confidence intervals. To assess the robustness of the
imputations to different imputation models, we also obtained imputations based only on,
firstly, socio-demographic variables and, secondly, utility and thermometer scales of parties
and party leaders. Imputations on the vote variable are influenced by the observed votes when
either of these subsets of variables is used in the imputation process, while imputed votes
differ more from the observed votes when the maximum set of information is used (see
Appendix B).
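One common way to turn several imputed datasets into a single estimate with an interval is Rubin's (1987) combining rules. The following is a minimal sketch under stated assumptions, not the routine used to produce Figure 1: the five vote shares are made-up numbers, `pool_share` is a hypothetical helper, the within-imputation variance uses a simple binomial formula that ignores the survey weights, and the interval relies on a normal approximation.

```python
import math


def pool_share(estimates, variances):
    """Pool m estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                # pooled point estimate
    u_bar = sum(variances) / m                                # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b                       # total variance
    return q_bar, math.sqrt(total_var)


# Hypothetical vote shares of one party in m = 5 imputed datasets.
shares = [0.40, 0.42, 0.41, 0.39, 0.43]
n = 2663  # 1,835 validated voters + 148 undisclosed choices + 680 non-voters
variances = [p * (1 - p) / n for p in shares]  # assumption: unweighted binomial variance

estimate, se = pool_share(shares, variances)
lower, upper = estimate - 1.96 * se, estimate + 1.96 * se     # approximate 95 percent interval
print(f"share = {estimate:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

The pooled standard error exceeds the naive sampling error because it also carries the disagreement between imputations, which is the sense in which the error bars reflect imputation uncertainty.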
10
The models employed here replicate those used in Bernhagen and Marsh (2007) and are fully justified there.
This is because in a later stage of this analysis we use the same CSES data sets as used by that study and want to
maintain comparability. We could have used a different model for the Irish analysis here but chose not to do so
on grounds of consistency. In doing so, we ensure that our imputation model corresponds with previous
research on Irish election turnout. There are very few models of turnout. Lyons and Sinnott (2002) and Marsh et
al. (2008) detail mobilization and resource effects as well as socio-demographics – all key elements of the
model used here. While these studies make the point that non-voting is heterogeneous, with some abstaining for
circumstantial reasons and some more deliberately, this argument, if correct, is not only appropriate to Ireland
(Blondel et al., 1998). In any case, the CSES data sets do not include the variable operationalizing this
categorization.
In order to compare the MI results to maximum likelihood simulations of full turnout,
we replicate the procedures employed by Citrin et al. (2003) and Martinez and Gill (2005).
This involves modelling vote choice at the 2002 election and using the estimated coefficients
from multinomial logit estimation to calculate for each non-voter the predicted probabilities
of vote choice for the various parties. To obtain the parties’ vote shares under a full turnout
scenario, we then add the estimated vote choice of non-voters to the observed vote
proportions, weighting the components by the actual turnout rate of 63 percent.11 Following
Citrin et al. (2003), the first vote choice model includes gender, education, income, age, and
urban-versus-rural residence.12 In a second step, we add a comprehensive set of non-demographic
variables to specify a model analogous to Martinez and Gill’s (2005). This list
of variables is identical with the one used for the MI simulation procedure above. Thus, the
third bar represents the simulated vote share based on predictions from multinomial logit on
demographics only (Citrin et al.’s approach). The fourth bar represents the simulated vote
share based on predictions from multinomial logit using the full set of information available
in the election study for all cases with no missing data on variables other than vote choice
(Martinez and Gill’s approach).
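The weighting step of the multinomial logit approach can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the party labels, the observed shares, and the predicted probabilities are invented, and `full_turnout_shares` is a hypothetical helper. In practice the per-respondent probabilities would come from a fitted multinomial logit model.

```python
def full_turnout_shares(observed_shares, nonvoter_probs, turnout):
    """Blend observed shares with mean predicted probabilities of non-voters.

    observed_shares: dict party -> share among voters (sums to 1)
    nonvoter_probs:  list of dicts, one per non-voter, party -> predicted probability
    turnout:         actual turnout rate, used as the weight for voters
    """
    n = len(nonvoter_probs)
    predicted = {
        party: sum(row[party] for row in nonvoter_probs) / n
        for party in observed_shares
    }
    return {
        party: turnout * observed_shares[party] + (1 - turnout) * predicted[party]
        for party in observed_shares
    }


# Invented example with three parties and two non-voters.
observed = {"A": 0.50, "B": 0.30, "Other": 0.20}
probs = [
    {"A": 0.40, "B": 0.40, "Other": 0.20},
    {"A": 0.20, "B": 0.50, "Other": 0.30},
]
simulated = full_turnout_shares(observed, probs, turnout=0.63)
print({party: round(share, 4) for party, share in simulated.items()})
```

Averaging predicted probabilities, rather than assigning each non-voter to the single most likely party, keeps small parties from being rounded away in the simulated result.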
11
As we use survey data, the “observed” vote choices have not been observed in the strict sense. For clarity and
ease of presentation, we will use the following nomenclature: “observed” vote choice is the vote choice reported
by respondents in a national election study who have been verified as having turned out to vote according to
official records. “Professed” vote choice is the vote choice reported by respondents who said they cast a vote in
the actual election but did not actually do so. This latter category also includes the choice of non-voters
indicating how they believe they would have voted had they gone to the polls.
12
Citrin et al. also include race as a key variable. However, race is not a noteworthy issue in Irish electoral
behavior and the Irish election study contains no information on ethnic or race variables. Instead, we include
urban-versus-rural residence, as this variable is a key correlate of Irish voter behavior (cf. Marsh et al., 2008:
164-79).
The most notable result is that the simulated increase in turnout from 69 (in the
survey) to 100 percent does not lead to any radical changes in the vote shares of the parties –
regardless of the simulation method used. The single biggest change through full turnout is a
loss of about two percentage points for Fianna Fáil, matched by an increase of roughly one
percentage point each for Fine Gael and Sinn Féin. Even the two-percentage-point drop in the Fianna
Fáil vote share, however, is well within the five-percent margin of uncertainty indicated by
the error bars. Thus, while the literature offers many reasons to expect the costs and benefits
of full turnout to be unevenly distributed across parties, this analysis of the 2002 Irish
General Election suggests the impact would be marginal, and that, indeed, we cannot be sure
that it would have any differential impact at all.
Comparing the MI simulations with simulated election results based on predictions
using multinomial logit coefficients from a demographic turnout model, we can see that the
latter are quite similar to those generated by MI. Full turnout simulations using this method
are slightly less “full” (N=2,570) than those arrived at by MI. This is due to small amounts of
missing observations on several predictor variables, which led to the loss of almost 100 cases
by way of listwise deletion. For some parties, the demographics-based multinomial logit
simulations are between the MI simulations and the actual result; sometimes they are a little
above or below the one or the other. While the turnout effects suggested by MI are overall
slightly more pronounced than those picked up by the different multinomial logit models, the
deviations from either the actual or the MI results do not follow a systematic pattern. If
anything, the demographics-based multinomial logit simulations suggest slight gains for
smaller parties from full turnout, at the expense of the larger parties. These observations also
hold for the richer multinomial logit simulations as suggested by Martinez and Gill. Because
of the expanded number of independent variables in this vote model, many of which have
missing observations, listwise deletion leads to the loss of over 400 cases.
Just how much faith can we have in the MI method and how good is it at estimating
the vote choices of non-voters? We cannot answer that question definitively, for the same
reasons that make this or any other simulation strategy necessary in the first place. But, to
begin with, the MI results are confirmed by multinomial logit estimates following the
strategies of Citrin et al. (2003) and Martinez and Gill (2005). However, as the multinomial
logit estimator is no more efficient than MI, different standards have to be applied. Therefore,
we conduct two further tests to explore the validity of the imputations. First, the 2002 Irish
Election Study records how non-voters said they had voted or said they would have voted.
Thus, we know their survey responses to the question of how they voted (if they falsely claim
they had voted) or how they think they would have voted had they actually voted. After
verifying that these respondents really are non-voters, we can compare their professed vote
choices with the simulated votes of the same individuals obtained through MI. The
subsample for this analysis comprises those 556 respondents who are known not to have voted,
but who reported a vote choice during the interview.
The results are displayed in Figure 2. The general picture that emerges is consistent
with the differences between the observed and simulated full turnout. As before, the
simulations from multinomial logit generally trail the prevailing pattern. For the larger parties
the demographics-based predictions are “between” the professed and the imputed vote shares,
while they tend to give larger gains to Sinn Féin and independent candidates than either what
respondents would claim or what the imputation algorithm would fill in. The simulations
based on the full multinomial logit model follow that random pattern, being at times closer to
respondents’ assertions than either MI or predictions from the demographic multinomial logit
model, at times further away, and sometimes in between. Even when comparing purely
simulated voting behaviour we find that the differences between MI and either of the
multinomial logit alternatives are not on the whole significant. The only exception is the
simulated vote share of the Progressive Democrats, which according to the simulations based
on Martinez and Gill’s model is 5.8 percent and lies outside the 95 percent margin of error of
the MI estimate (3.4 percent +/- 2 percentage points).
[Figure 2 about here]
It might be argued that the reported vote of actual non-voters is not necessarily a reliable
indication of how they would have voted had they done so. There may be a bias in favour of
the winners, for instance, and against the most obvious losers. Indeed, this appears to be the
case with respect to Fianna Fáil and Fine Gael respectively. It might also be that these were
people who paid less attention to the campaign and gave less thought to their choice than
actual voters. In this respect the test is not an ideal one, as it cannot tell us whether it is the
MI procedure that was unreliable or the imagination of the respondents (Karp and
Brockington 2005). We can gain a better idea of the performance of the imputation procedure
by artificially creating a further set of missing values on the vote variable and using MI to
re-estimate the votes of these fake “non-voters”. To do so, we first truncate the dataset to one that
contains only confirmed voters (N=1,835). We then impose the survey turnout rate of 69
percent anew by removing observations at random, before re-imputing the deleted
observations using Amelia. By cross-tabulating the distributions of observed and re-imputed
preferences we can gauge how close the individual imputations are to the recorded votes.
Lastly, we can obtain similar cross-tabulations for imputed and professed votes reported in
Figure 2 above and compare the closeness of the two sets of cross-tabulations. This will
indicate how MI performs as an approximation of actual vote choices vis-à-vis professed
votes.
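The design of this test (delete known votes at random, re-impute them, cross-tabulate) can be sketched in a few lines. The sketch rests on loud assumptions: the voter list is invented, `mask_votes` and `crosstab` are hypothetical helpers, and the imputation step is a deliberately naive stand-in that draws from the retained votes. It is not the EM-based procedure implemented in Amelia.

```python
import random
from collections import Counter


def mask_votes(votes, keep_rate, rng):
    """Pick indices to delete at random so that roughly keep_rate stay observed."""
    n_delete = round(len(votes) * (1 - keep_rate))
    return set(rng.sample(range(len(votes)), n_delete))


def crosstab(true_votes, imputed_votes):
    """Count (recorded vote, re-imputed vote) pairs."""
    return Counter(zip(true_votes, imputed_votes))


rng = random.Random(2002)
votes = ["FF"] * 45 + ["FG"] * 25 + ["Lab"] * 15 + ["SF"] * 15   # invented confirmed voters

deleted = mask_votes(votes, keep_rate=0.69, rng=rng)             # impose 69 percent "turnout"
retained = [v for i, v in enumerate(votes) if i not in deleted]

# Stand-in imputer (assumption): draw each deleted vote from the retained distribution.
imputed = {i: rng.choice(retained) for i in deleted}

order = sorted(deleted)
table = crosstab([votes[i] for i in order], [imputed[i] for i in order])
for (true_vote, imputed_vote), count in sorted(table.items()):
    print(true_vote, imputed_vote, count)
```

Because the deleted votes are known, every cell of the resulting table can be checked against the truth, which is not possible for real abstainers.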
The results are displayed in Table 1. The general picture that emerges from panel (a)
is that the distributions of observed and re-imputed preferences are quite similar. This is all
the more significant as the re-imputations of artificial non-voters have to make do with much
smaller numbers of cases. However, we are primarily interested in the accuracy of imputation
within each party group: to what extent does the imputation procedure correctly identify
those who vote Fianna Fáil, or Fine Gael and so on?
[Table 1 about here]
The table can be read in two directions: down and across. Reading across, the
accuracy seems poor. Except in the case of Fianna Fáil and Fine Gael, less than half of the
voters for any party are correctly identified in either panel of the table. However, this may
partly be an artefact of the EM algorithm, which will be biased towards the larger parties.
More appropriate is to read downwards: it can then be seen that even in the case of the
smaller parties the imputations are much more likely to predict the true choice with accuracy
than they are to make any other prediction. Lastly, we might expect the imputations to be closer
to the “real” ones in the case of the artificially created non-voters than in the case of the
actual non-voters (panel b), on the basis that the choice made by the former was a realized
intention and that of the latter at best an intention and at worst something respondents make
up. This difference in closeness is shown in panel (c): The bigger the positive difference, the
more our re-imputation of real vote choices outperforms our imputation of the professed
choices of non-voters. Negative figures indicate that our implications are closer to the
professed vote choices than they are to the actual ones. The expectation that MI simulations
resemble actual votes more than professed ones bears out for most parties, but not for Fianna
Fáil or the Progressive Democrats.13
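The two ways of reading such a cross-tabulation correspond to two different accuracy measures. The sketch below assumes that rows hold the recorded vote and columns the imputed vote (the orientation of Table 1 is not reproduced here), uses invented counts for one large and one small party, and introduces `row_and_column_accuracy` as a hypothetical helper.

```python
def row_and_column_accuracy(table, parties):
    """Return (row accuracy, column accuracy) per party.

    table maps (recorded vote, imputed vote) -> count.
    Row accuracy: share of a party's true voters who are imputed correctly.
    Column accuracy: share of imputations for a party that are correct.
    """
    row_acc, col_acc = {}, {}
    for p in parties:
        row_total = sum(table.get((p, q), 0) for q in parties)
        col_total = sum(table.get((q, p), 0) for q in parties)
        hits = table.get((p, p), 0)
        row_acc[p] = hits / row_total if row_total else float("nan")
        col_acc[p] = hits / col_total if col_total else float("nan")
    return row_acc, col_acc


# Invented counts: a large party (FF) and a small party (SF).
counts = {("FF", "FF"): 80, ("FF", "SF"): 5, ("SF", "FF"): 12, ("SF", "SF"): 8}
row_acc, col_acc = row_and_column_accuracy(counts, ["FF", "SF"])
print(row_acc)
print(col_acc)
```

In this made-up example only 40 percent of the small party's voters are recovered, yet most imputations for that party are correct, which mirrors the contrast between reading across and reading down.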
As the “artificial” abstainers were chosen at random from among actual voters, the
MAR assumption is arguably more likely to be satisfied than in the context of the other MI
simulations reported in this article.14 The fact that our re-imputation of the vote choices of
artificial non-voters outperforms our simulation of the professed votes of actual non-voters
(who have been presented with the problematic task of answering counterfactual questions)
further validates the imputation model. While potentially extraneous information about
socioeconomic determinants of turnout decision and vote choice may not aid the imputation
algorithm in making the right imputations, it appears at worst irrelevant.
13
These were the two government parties both before and after the 2002 election. It is unclear to what extent
this pattern reflects respondents’ impulse to side with the winners.
14
However, as these are random draws from a subsample that can itself not be considered random with respect
to vote choice, the “missing” vote choices of the newly created non-voters are not missing completely at
random.
[Table 2 about here]
How do the multinomial logit strategies of estimating turnout effects perform in this
respect? Table 2 repeats the cross-tabulations for predictions based on Citrin et al.’s model.
Here never more than half of the voters for any party are correctly identified in either panel of
the table. And only in the case of Sinn Féin are the simulations notably more likely to predict
the true choice with accuracy than to make any other prediction (panel a). Furthermore, for
most parties the simulations are closer to the professed votes of non-voters than they are to
the real votes of the artificially created non-voters, although the differences are very small
(panel c). In other words, while MI obtains its best results when estimating what people really
did, multinomial logit estimation using demographics is hardly better at replicating what
people did rather than what they say they did, and it matches either data rather poorly.
[Table 3 about here]
This does not necessarily mean that MI performs per se better than multinomial logit.
While the model replicating Citrin et al.’s method had to make do with a significantly smaller
number of predictor variables, relying on demographics only, a fairer comparison might be
one between MI and multinomial logit based on the full set of demographic and political
variables as used by Martinez and Gill (2005). The predicted vote choices of real and
artificially created non-voters from multinomial logit estimation of a comprehensive model
are presented in Table 3. Despite the inevitably considerable attrition, these multinomial
logit-based simulations of vote choice perform well, attributing the correct vote choice more
than two thirds of the time for all parties (but only about half the time for the independent
candidate). And while the simulations are also closely correlated to non-voters’ professed and
hypothetical votes, the match is again best for re-estimation of the deleted votes of actual
voters.15 Indeed, the match for re-estimation of the deleted votes of actual voters is overall
better for multinomial logit than it is for MI, although the latter has been able to make use of
a significantly larger number of observations.
Despite these differences, the main finding that emerges from this analysis of the
2002 Irish election is that the three methods for simulating the vote choices of non-voters
lead to substantially similar results. To gauge to what extent this finding is particular to this
case, we also use the three methods to simulate full turnout at 30 elections in 25 countries
from the Comparative Study of Electoral Systems (Module I).16 These elections capture the
full variation of systems of government, electoral systems, party systems and voter turnout. In
order to express the net differences between the observed and the hypothetical vote at the
national level, we use the Gallagher Index of Disproportionality. Originally designed to
measure the difference between the distributions of votes and seats in an election, this index
uses squared differences of the proportions, thereby avoiding the problem that changes to
party vote share cancel each other out while giving larger weight to the big vote share
changes for individual parties (Gallagher, 1991).17 The index provides an ideal tool for
comparing the observed-versus-imputed vote bias across diverse elections.
15
Again, we find that the edge of re-estimation of the deleted votes of actual voters over simulating professed
votes is somewhat reduced in the cases of the two government parties.
16
The study contains data on two successive elections in Mexico (1997 and 2000) and Spain (1996 and 2000).
To control for distinct electoral cleavages, data on Wallonia, East Germany, and Scotland are treated as separate
elections.
17
The index is calculated as
$$\sqrt{\frac{1}{2}\sum_{i=1}^{n}\left(v_{c,i} - v_{o,i}\right)^{2}},$$
where $v_{c,i}$ denotes the vote share of party $i$ based on 100 percent (“complete”) turnout and $v_{o,i}$ denotes its
observed vote share, for the $n$ competing parties.
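As a worked illustration of the index, the following sketch computes it for an invented three-party election. The vote shares are made up and `gallagher_index` is a hypothetical helper name; shares are entered in percentage points, which is assumed to be the scale behind the scores reported in the text.

```python
import math


def gallagher_index(complete, observed):
    """Square root of half the summed squared differences in party vote shares."""
    if len(complete) != len(observed):
        raise ValueError("both share lists must cover the same parties")
    return math.sqrt(0.5 * sum((c - o) ** 2 for c, o in zip(complete, observed)))


observed_shares = [50.0, 30.0, 20.0]   # invented observed result, in percent
complete_shares = [48.0, 31.0, 21.0]   # invented full-turnout simulation, in percent

score = gallagher_index(complete_shares, observed_shares)
print(round(score, 3))
```

Squaring means the two-point loss of the largest party contributes four times as much as each one-point gain, so gains and losses cannot cancel out.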
Figure 3 shows the distribution of the resulting disproportionality scores across the 30
elections in the CSES data as well as the 2002 Irish election. Looking at this range of
elections from different countries, many of the simulated election results under full turnout
are less similar across the three methods than in the Irish case. Firstly, MI leads to bigger
turnout effects on average. The mean disproportionality score using MI is 4.25 (S.D. = 2.25),
compared to 1.95 (S.D. = 1.33) when the Martinez and Gill method is used and .9 (S.D. =
.74) in the case of the method used by Citrin et al. Bivariate correlations are moderately high
and statistically significant between disproportionality scores based on MI and Martinez and
Gill’s method (r = .41, p = .02) and between the disproportionality scores using Martinez and
Gill’s and Citrin et al.’s methods, respectively (r = .37, p = .04), but not between the MI and
Citrin et al. based disproportionality scores (r = .27, p = .14). This suggests that the model
used by Martinez and Gill assumes a middle position between simulation on logit predictions
and MI.
[Figure 3 about here]
Looking at the differences in correspondence of the three disproportionality scores
across elections, few systematic sources of variation can be discerned. Instances in which
election results amended by MI simulations differ starkly from simulations using multinomial
logit predictions include proportional representation systems such as the Spanish one as well
as first-past-the-post systems like in Canada. However, cases like the Spanish elections or
New Zealand in 1996 suggest that discrepancies between the different simulation methods
might vary with the number of parties. To investigate this possibility, we regress differences
between MI and each of the two multinomial logit-based simulations on the number of
parties fielded in each election (Figure 4). The measure for the difference between any two
disproportionality scores is again Gallagher’s Index of Disproportionality. The mild positive
association between the number of parties and discrepancy between disproportionality scores
is not statistically significant, suggesting that the differences between MI and multinomial
logit predictions pertain as much to elections in two-party systems as to those in multi-party
systems.
[Figure 4 about here]
The only systematic pattern that seems to appear is that the three disproportionality
scores differ less in the context of elections with high turnout. We test this more directly by
regressing the disproportionality scores expressing the difference between MI and each of the
two multinomial logit disproportionality scores, respectively, on turnout. Figure 5 shows that
this expectation bears out: A ten percent increase in turnout reduces the difference between
MI and multinomial logit based simulations by almost a full unit on both disproportionality
scores (which range from .66 to 9.28 in the demographics only case and from .5 to 10.23 in
the case of the full information model). Thus, the differences between MI simulations of
counterfactual voting behaviour and multinomial logit predictions of the same matter most in
the context of low turnout elections.
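The bivariate regression behind this kind of statement can be sketched with the closed-form least squares solution. The five data points below are invented purely to illustrate the arithmetic and are not taken from the CSES elections; `ols_slope_intercept` is a hypothetical helper.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


turnout = [50, 60, 70, 80, 90]            # invented turnout rates, in percent
difference = [6.0, 5.0, 4.5, 3.0, 2.5]    # invented MI-versus-logit discrepancy scores

slope, intercept = ols_slope_intercept(turnout, difference)
print(f"change per ten points of turnout: {10 * slope:.2f}")
```

With these made-up numbers the slope is -0.09, so ten additional points of turnout lower the discrepancy by 0.9 units, the same order of magnitude as the effect described above.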
[Figure 5 about here]
Discussion and Conclusions
The question of whether turnout matters for election results can only be answered if we can
say with some degree of certainty how the abstainers would have voted had they voted. In
this article we have assessed and compared multiple imputation as a method of ascertaining
the impact of turnout on election results. Our findings suggest that we can have reasonable
confidence in the MI method of estimating turnout effects: two validity tests produced good
results. The MI results are also partly matched by simulations using multinomial logit models
as proposed by Citrin et al. (2003) and Martinez and Gill (2005).
Besides a relative similarity to the results from multinomial logit estimation of full
turnout election results, MI simulation has a number of advantages. Firstly, it provides us
with a measure of confidence reflecting the uncertainty of the imputation method as well as
uncertainty fundamentally inherent in the world. Secondly, simulation by MI rests on
assumptions about the relationship between turnout and vote choice that are less demanding
and therefore more realistic than those underlying simulations based on multinomial logit
predictions. Thirdly, multinomial logit strategies face a trade-off between the richness of the
model on the one hand and the number of cases to be used as a basis for prediction on the other. In the cases
of the Citrin et al. and Martinez and Gill studies, this amounts to a trade-off between a model
based on demographics only, which tend to have fewer missing values than attitudinal
variables, and a fully specified model. This trade-off necessitates a choice between the loss of
information and potential bias incurred through an underspecified model or the loss of
information and potential bias stemming from deletion of a considerable number of cases. By
imputing missing data among all variables including the vote choice variable, MI simulation
avoids this trade-off. The MI approach is therefore likely to be more efficient than
multinomial logit to the extent that it can utilize a wider range of variables, including, where
available, party measures of voter utilities as given by party and candidate thermometers and
PTV scores, without incurring a loss of observations. Comparing the predictions from models
following Citrin et al., and Martinez and Gill’s specifications, respectively, the lesson is to
prioritize a full model over the need to maximize observations if that is the only choice
available. MI ensures that a third option is available.18
18
The latest version of Stata (release 11) provides MI routines that offer a range of multiple imputation
methods. For imputation of missing values on only a single variable, multinomial logit regression is offered.
This method enables researchers to simulate higher turnout using the method proposed by Martinez and Gill
(2005) but with the added benefit of estimating the uncertainty of the imputations based on averaging across a
number (e.g. five) of sets of imputed values. To the extent that this remedies the biggest weakness of the Martinez
and Gill method – the absence of a measure of uncertainty – Stata 11 can implement an “enriched” version of
their method. Alternatively, an iterative Markov chain Monte Carlo method is available to impute missing
values on all variables in the dataset, resembling the procedure implemented by Amelia and evaluated in this
article. Through iteratively improving imputations on all (dependent and independent) variables, this option
continues to have the added benefit of using all the information available in the data.
In line with recent studies on turnout effects at elections around the world (cf. Lutz
and Marsh 2007), the main finding from the Irish case study is that the fortunes of the Irish
political parties at the 2002 election would have remained virtually unaffected by universal
turnout. One might speculate about the difference that a few more votes might have made to
the distribution of seats: Fine Gael might not have suffered quite such a meltdown (the
party incurred by far the largest losses of all parties from the previous election), and Fianna
Fáil might have needed support from independent TDs to form a government with the
Progressive Democrats. But while the empirical differences between MI and the two sets of
multinomial logit simulations are negligible in the case of the Irish election, an extension of
the comparison to the first CSES module suggests that discrepancies are the norm rather than
the exception. Differences become more marked the more non-voters there are whose
behaviour has to be simulated in order to detect the political effects of high and low election
turnout.
Finally, a caveat. What we have done here is simply to assess the impact of full
turnout, which in practice can only ever be approximated even through the use of compulsory
voting rules. Our analysis does not say what would happen if turnout rose by 5, 10 or 15
percentage points, and there are good reasons to expect that it would depend on what the
agency of mobilization is. The simulation of election results under full turnout is nonetheless
important – this is after all at the heart of the argument in Lijphart’s (1997) seminal article on
the topic. Future analyses should examine how MI performs relative to other approaches in
simulating incremental increases or decreases in turnout.
References
Allison, Paul D. (2002) Missing Data (Thousand Oaks, CA: Sage).
Bennett, Stephen Earl, and David Resnick (1990) The Implications of Non-Voting for
Democracy in the United States, American Journal of Political Science 34: 771-802.
Bernhagen, Patrick, and Michael Marsh (2007) The Partisan Effects of Low Turnout:
Analyzing Vote Abstention as a Missing Data Problem, Electoral Studies 26(3): 548-60.
Blondel, Jean, Richard Sinnott, and Palle Svensson (1998) People and Parliament in the
European Union (Oxford: Oxford University Press).
Brady, Henry E., Sidney Verba, and Kay L. Schlozman (1995) Beyond SES: A Resource
Model of Political Participation, American Political Science Review 89: 271–294.
Brunell, Thomas L., and John DiNardo (2004) A Propensity Score Reweighting Approach to
Estimating the Partisan Effects of Full Turnout in American Presidential Elections,
Political Analysis 12: 28–45.
Citrin, Jack, Eric Schickler, and John Sides (2003) What if Everyone Voted? Simulating the
Impact of Increased Turnout in Senate Elections, American Journal of Political
Science 47: 75–90.
Collins, Linda M., Joseph L. Schafer, and Chi-Ming Kam (2001) A Comparison of Inclusive
and Restrictive Strategies in Modern Missing Data Procedures, Psychological
Methods 6 (4): 330–51.
Crozier, Michael, Samuel P. Huntington, and Joji Watanuki (1975) The Crisis of Democracy:
Report on the Governability of Democracies to the Trilateral Commission (New
York, NY: New York University Press).
DeNardo, James (1980) Turnout and the Vote: The Joke’s on the Democrats, American
Political Science Review 74: 406–20.
Dubin, Jeffrey A., and Douglas Rivers (1989) Selection Bias in Linear Regression, Logit and
Probit Models, Sociological Methods and Research 18: 360–90.
Gurr, Ted R. (1970) Why Men Rebel (Princeton, NJ: Princeton University Press).
Heckman, James J. (1979) Sample selection bias as a specification error, Econometrica 47,
153-61.
Herron, Michael C. (1998) The Presidential Election of 1988: Low Voter Turnout and the
Defeat of Michael Dukakis, Political Methodology Working Paper.
Highton, Benjamin, and Raymond E. Wolfinger (2001) The Political Implications of Higher
Turnout, British Journal of Political Science 31 (1): 179-223.
Hill, Kim Quaile, Jan E. Leighley, and Angela Hinton-Andersson (1995) Lower Class
Mobilization and Policy Linkage in the U.S. States, American Journal of Political
Science 39: 75–86.
Honaker, James, and Gary King (2006) What to do About Missing Values in Time Series
Cross-Section Data, Unpublished manuscript, available at http://gking.harvard.edu/.
Horton, Nicholas J., and Ken P. Kleinman (2007) Much Ado About Nothing: A Comparison
of Missing Data Methods and Software to Fit Incomplete Data Regression Models,
The American Statistician, 61, 1 (February): 79-90.
Karp, Jeffrey A., and David Brockington (2005) Social Desirability and Response Validity: A
Comparative Analysis of Overreporting Voter Turnout in Five Countries, Journal of
Politics 67, No. 3 (August), 825-40.
King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve (2001) Analyzing Incomplete
Political Science Data: An Alternative Algorithm for Multiple Imputation, American
Political Science Review 95: 49–69.
Kohler, Uli, and Richard Rose (2008) Election Outcomes and Maximizing Turnout:
Modelling the Effect, WZB Discussion Paper (Berlin: Social Science Research Centre
Berlin).
Lijphart, Arend (1997) Unequal Participation: Democracy’s Unresolved Dilemma, American
Political Science Review 91: 1–14.
Little, R. J. A., and Donald B. Rubin (2002) Statistical Analysis with Missing Data
(Hoboken: Wiley).
Lyons, Pat, and Sinnott, Richard (2002) Voter turnout in 2002 and beyond, pp. 159-76 in M.
Gallagher, P. Mitchell and M. Marsh eds., How Ireland Voted 2002 (Basingstoke:
Palgrave).
Lutz, Georg, and Michael Marsh (2007) Introduction: Consequences of Low Turnout,
Electoral Studies 26 (3): 539–547.
Marsh, Michael, Richard Sinnott, John Garry and Fiachra Kennedy (2008) The Irish voter:
the nature of electoral competition in the Republic of Ireland (Manchester:
Manchester University Press).
Martinez, Michael D. and Jeff Gill (2005) The Effects of Turnout on Partisan Outcomes in
U.S. Presidential Elections 1960-2000, Journal of Politics 67 (4), 1248-74.
Nannestad, P., Paldam, M. (2002) The Cost of Ruling. A Foundation Stone for Two Theories,
In: Dorussen, H., Palmer, H. D., Taylor, M. (eds.) Economic Voting (London:
Routledge).
Nie, Norman H., Sidney Verba and John R. Petrocik (1979) The Changing American Voter
(Cambridge, MA: Harvard University Press).
Pacek, Alexander, and Benjamin Radcliff (1995) Turnout and the Left - Party Vote, British
Journal of Political Science 25 (1): 137-153.
Radcliff, Benjamin (1994) Turnout and the Democratic Vote, American Politics Quarterly
22: 259-276.
Rosenstone, S. J., Hansen, J.M. (1993) Mobilization, Participation, and Democracy in
America. (New York, NY: Macmillan).
Rubin, Donald B. (1976) Inference and Missing Data, Biometrika 63: 581-592.
Rubin, Donald B. (1987) Multiple Imputation for Nonresponse in Surveys (New York: John
Wiley).
Studlar, Donley T., and Susan Welch (1986) The Policy Opinions of British Non-voters: A
Research Note, European Journal of Political Research 14: 139-48.
Tóka, Gábor (2002) Voter inequality, turnout and information effects in a cross-national
perspective. Working paper no. 297, Helen Kellogg Institute, Notre Dame University.
Wolfinger R.E. and S.J. Rosenstone (1980) Who Votes? (New Haven, CT: Yale University
Press).
Appendix A. Variables in Imputation Model for the 2002 Irish General Election
Variable                                              N     Mean   S.D.   Min.  Max.

Number of FF candidates in constituency               2663  2.63   0.62   2     4
Number of FG candidates in constituency               2663  2.14   0.7    1     4
Number of Green candidates in constituency            2663  0.78   0.41   0     1
Number of Labour candidates in constituency           2663  1.11   0.58   0     3
Number of PD candidates in constituency               2663  0.49   0.65   0     3
Number of SF candidates in constituency               2663  0.86   0.48   0     2
Number of independent candidates in constituency      2663  3.47   1.8    0     7

How likely ever to vote for Fianna Fáil               2625  6.74   3.2    1     10
How likely ever to vote for Fine Gael                 2603  5.11   3.05   1     10
How likely ever to vote for Green Party               2586  4.69   2.81   1     10
How likely ever to vote for Labour                    2602  4.81   2.78   1     10
How likely ever to vote for Progressive Democrats     2592  4.76   2.75   1     10
How likely ever to vote for Sinn Féin                 2595  3.37   2.83   0     10
How likely ever to vote for an independent candidate  2599  5.68   2.97   0     10

Thermometer degree, Bertie Ahern                      2612  65.55  24.24  0     100
Thermometer degree, Mary Harney                       2595  51.07  23.54  0     100
Thermometer degree, Ruairi Quinn                      2558  42.87  20.71  0     100
Thermometer degree, Trevor Sargent                    2419  42.18  21.87  0     100
Thermometer degree, Michael Noonan                    2562  36.74  23.07  0     100
Thermometer degree, Gerry Adams                       2579  38.85  26.6   0     100
Thermometer degree, Fianna Fáil                       2591  63.92  25.6   0     100
Thermometer degree, Green Party                       2543  47.71  21.92  0     100
Thermometer degree, Fine Gael                         2567  47.03  23.34  0     100
Thermometer degree, Labour                            2557  45.29  20.97  0     100
Thermometer degree, Progressive Democrats             2553  47.22  22.34  0     100
Thermometer degree, Sinn Féin                         2538  33.36  25.8   0     100

Evaluation of economy over last 5 years               2657  1.9    1.08   1     6
Evaluation of health services over last 5 years       2655  3.5    1.26   1     6
Evaluation of housing situation over last 5 years     2651  3.13   1.48   1     6


Variable                                            N      Mean    S.D.    Min.   Max.
Age                                                 2640   46.9    17.12   18     100
Female                                              2663   0.52            0      1
Urban                                               2592   0.29            0      1
Class                                               2498   2.53    1.7     1      5
Education                                           2654   3.84    1.37    1      6
Union member                                        2326   0.35    0.48    0      1
Left/right self placement                           2347   2.1     0.61    1      4
Satisfaction with democracy                         2341   6.91    2.82    0      11
Efficacy                                            2660   2.54    1.66    1      7
Frequency of attending religious service            2393   3.09    1.84    1      8
Political knowledge                                 2663   3.39    1.28    0      5
Party identification                                2663   0.28    -       0      1

Variable                                            N      Mean    S.D.    Min.   Max.
Did Fianna Fáil contact?                            2663   0.33    -       0      1
Did Fine Gael contact?                              2663   0.24            0      1
Did Greens contact?                                 2663   0.02            0      1
Did Labour contact?                                 2663   0.12            0      1
Did Progressive Democrats contact?                  2663   0.04            0      1
Did Sinn Féin contact?                              2663   0.06            0      1
Appendix B. Party vote shares under validated and full turnout using alternative imputation models and multinomial logit predictions
Column key. Observed: observed vote. Professed: professed vote. Simulation of full turnout: MI-all = MI (full set of variables); MI-dem = MI (demographics only); MI-pref = MI (preferences only); ML-dem = M-logit (demographics only); ML-all = M-logit (all variables).

Full sample
             Observed      MI-all         MI-dem         MI-pref        ML-dem     ML-all
             (N=1,835)     (N=2,663)      (N=2,663)      (N=2,663)      (N=2,570)  (N=2,240)
Party        N     Prop.   Prop.  S.E.    Prop.  S.E.    Prop.  S.E.    Prop.      Prop.
FF           820   45%     43%    0.012   44%    0.012   44%    0.012   44%        43%
FG           406   20%     21%    0.011   20%    0.009   21%    0.010   20%        20%
Greens       80    4%      4%     0.006   5%     0.006   4%     0.006   5%         4%
Labour       173   10%     10%    0.007   10%    0.007   10%    0.007   10%        10%
PD           58    3%      3%     0.004   3%     0.004   3%     0.005   3%         4%
SF           90    6%      7%     0.006   6%     0.007   6%     0.006   7%         8%
Independent  208   11%     11%    0.008   12%    0.008   11%    0.007   11%        11%

Nonvoters only
             Professed     MI-all         MI-dem         MI-pref        ML-dem     ML-all
             (N=556)       (N=556)        (N=556)        (N=556)        (N=496)    (N=305)
Party        N     Prop.   Prop.  S.E.    Prop.  S.E.    Prop.  S.E.    Prop.      Prop.
FF           261   45%     42%    0.036   43%    0.032   43%    0.033   43%        40%
FG           107   19%     23%    0.033   20%    0.025   24%    0.025   19%        21%
Greens       30    7%      4%     0.016   5%     0.011   4%     0.017   5%         5%
Labour       63    13%     10%    0.020   10%    0.016   10%    0.019   10%        11%
PD           22    3%      3%     0.010   3%     0.011   3%     0.013   3%         6%
SF           22    4%      7%     0.012   6%     0.019   6%     0.012   8%         10%
Independent  51    10%     11%    0.025   13%    0.018   10%    0.017   12%        8%
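
The MI columns in Appendix B pair each proportion with a standard error. The appendix does not state how these were pooled across imputations, so the following is only a minimal sketch of the standard combining rules from Rubin (1987), which add the average within-imputation variance to an inflated between-imputation variance. The function name `pool_rubin` and the three input values are hypothetical and are not taken from the study's data.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)       # combined point estimate
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between     # total variance
    return pooled, math.sqrt(total)


# Hypothetical shares for one party in three completed datasets,
# each with a squared standard error of 0.0001.
share, se = pool_rubin([0.42, 0.44, 0.43], [0.0001, 0.0001, 0.0001])
print(round(share, 3), round(se, 4))
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the imputed vote choices themselves, which a single filled-in dataset would understate.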
Figure 1. Observed versus full (100%) turnout vote
[Figure: vote share (0% to 60%) by party (FF, FG, Greens, Labour, PD, SF, Ind). Series: Actual turnout (N=1,835); Full turnout (MI, N=2,663); Full turnout (Citrin et al., N=2,570); Full turnout (Martinez and Gill, N=2,240).]
Figure 2. Professed versus simulated vote
[Figure: vote share (0% to 60%) by party (FF, FG, Greens, Labour, PD, SF, Ind). Series: Professed vote (N=556); MI (N=556); Citrin et al. predictions (N=496); Martinez and Gill predictions (N=305).]
Figure 3. Differences between observed and simulated vote shares based on MI and multinomial logit predictions (Gallagher Index)
[Figure: vertical axis, Disproportionality (0 to 10); horizontal axis, elections labelled by country and year. Series: MI; Martinez and Gill; Citrin et al.]
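
Figure 3 summarises the gap between observed and simulated vote shares with the Gallagher index. As a rough illustration only, the sketch below assumes the standard least-squares form of that index (the square root of half the sum of squared percentage-point differences) applied to two sets of vote shares. The function name `gallagher_index` is hypothetical, and the inputs are the rounded full-sample percentages from Appendix B (observed vote and MI with the full set of variables), so the result need not match the value plotted for Ireland 2002.

```python
import math


def gallagher_index(shares_a, shares_b):
    """Least-squares index between two dicts of percentage vote shares."""
    parties = set(shares_a) | set(shares_b)
    squared = sum(
        (shares_a.get(p, 0.0) - shares_b.get(p, 0.0)) ** 2 for p in parties
    )
    return math.sqrt(0.5 * squared)


# Rounded percentages from Appendix B, full sample.
observed = {"FF": 45, "FG": 20, "Greens": 4, "Labour": 10,
            "PD": 3, "SF": 6, "Independent": 11}
mi_full = {"FF": 43, "FG": 21, "Greens": 4, "Labour": 10,
           "PD": 3, "SF": 7, "Independent": 11}

print(round(gallagher_index(observed, mi_full), 2))  # 1.73
```

Because the differences are squared before summing, one large shift for a single party raises the index more than several small shifts of the same total size.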
Figure 4. Differences between MI and multinomial logit simulations by number of parties
[Figure: two panels plotting elections labelled by country and year. Panel (a): Difference between MI and Citrin et al. Panel (b): Difference between MI and Martinez and Gill. Vertical axis: Disproportionality (0 to 10); horizontal axis: Number of Parties (5 to 25). Fit statistics printed in the panels: b = .081; se = .085; R-sq = .03; N = 31, and b = .092; se = .082; R-sq = .04; N = 31.]
Figure 5. Differences between MI and multinomial logit simulations by turnout
[Figure: two panels plotting elections labelled by country and year. Panel (a): Difference between MI and Citrin et al. Panel (b): Difference between MI and Martinez and Gill. Vertical axis: Disproportionality (0 to 10); horizontal axis: Actual Turnout (40 to 90). Fit statistics printed in the panels: b = -.084; se = .030; R-sq = .21; N = 31, and b = -.087; se = .029; R-sq = .24; N = 31.]
Table 1. MI Vote by Observed and Professed Vote
(a) Observed Vote: Re-imputed, N=569 (of 1,835)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.64    0.17    0.02    0.03    0.03    0.00    0.11
FG             0.17    0.58    0.03    0.06    0.04    0.00    0.12
Greens         0.39    0.18    0.31    0.10    -0.10   0.01    0.11
Labour         0.17    0.20    0.05    0.36    0.02    0.02    0.18
PD             0.37    0.35    0.01    0.04    0.10    0.09    0.05
SF             0.21    0.09    0.09    0.12    -0.04   0.32    0.21
Independent    0.25    0.31    0.09    0.06    0.03    0.02    0.23
Total          0.40    0.30    0.05    0.08    0.03    0.02    0.14

(b) Professed Vote: Imputed, N=556 (of 2,663)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.70    0.12    0.02    0.04    0.03    0.02    0.07
FG             0.28    0.47    0.03    0.06    0.03    0.01    0.11
Greens         0.27    0.21    0.19    0.15    0.05    0.03    0.11
Labour         0.30    0.16    0.05    0.27    0.01    0.07    0.13
PD             0.33    0.23    0.00    0.14    0.17    0.02    0.11
SF             0.24    0.23    0.04    0.06    0.02    0.30    0.12
Independent    0.33    0.18    0.04    0.12    0.01    0.07    0.23
Total          0.49    0.21    0.04    0.09    0.03    0.04    0.10

(c) Difference in Party Match
               FF      FG      Greens  Labour  PD      SF      Independ.
               -6      11      12      9       -7      2       0

Note: Cell entries are the average probabilities of a vote for the column party. Imputations are based on MI model
including all available variables related to vote choice.
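
The table does not define the "Difference in Party Match" row, but its values are consistent with the diagonal of panel (a) minus the diagonal of panel (b), expressed in percentage points. The sketch below checks that reading against Table 1. The function name `party_match_difference` is hypothetical, and the interpretation is an inference from the printed values rather than a definition given by the authors.

```python
PARTIES = ["FF", "FG", "Greens", "Labour", "PD", "SF", "Independent"]

# Diagonal entries of Table 1: the average probability that the simulated
# vote matches the observed (a) or professed (b) vote for each party.
OBSERVED_DIAGONAL = [0.64, 0.58, 0.31, 0.36, 0.10, 0.32, 0.23]
PROFESSED_DIAGONAL = [0.70, 0.47, 0.19, 0.27, 0.17, 0.30, 0.23]


def party_match_difference(observed_diag, professed_diag):
    """Return panel (a) minus panel (b) match rates in percentage points."""
    return [
        round(100 * (obs - prof))
        for obs, prof in zip(observed_diag, professed_diag)
    ]


differences = party_match_difference(OBSERVED_DIAGONAL, PROFESSED_DIAGONAL)
for party, diff in zip(PARTIES, differences):
    print(f"{party}: {diff}")
```

Under this reading, a positive entry means the simulation recovers a party's voters more often among validated voters than among non-voters who professed a vote.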
Table 2. Multinomial logit-simulated Vote by Observed and Professed Vote (Citrin et al. model)
(a) Observed Vote: Re-simulated, N=513 (of 1,835)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.46    0.24    0.03    0.08    0.02    0.04    0.12
FG             0.44    0.26    0.04    0.09    0.02    0.04    0.12
Greens         0.41    0.26    0.05    0.09    0.03    0.03    0.13
Labour         0.45    0.25    0.04    0.09    0.02    0.04    0.12
PD             0.40    0.28    0.04    0.10    0.03    0.03    0.12
SF             0.45    0.20    0.03    0.08    0.02    0.08    0.13
Independent    0.44    0.25    0.04    0.09    0.02    0.04    0.12
Total          0.45    0.25    0.03    0.08    0.02    0.04    0.12

(b) Professed Vote: Simulated, N=496 (of 2,663)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.44    0.21    0.04    0.10    0.03    0.06    0.12
FG             0.44    0.24    0.04    0.09    0.03    0.05    0.12
Greens         0.36    0.19    0.09    0.12    0.05    0.07    0.11
Labour         0.42    0.20    0.06    0.11    0.04    0.06    0.12
PD             0.40    0.20    0.07    0.12    0.05    0.06    0.10
SF             0.43    0.16    0.04    0.10    0.04    0.11    0.11
Independent    0.42    0.20    0.05    0.10    0.03    0.08    0.12
Total          0.43    0.21    0.05    0.10    0.03    0.06    0.12

(c) Difference in Party Match
               FF      FG      Greens  Labour  PD      SF      Independ.
               2       2       -4      -2      -2      -3      0

Note: Cell entries are the average probabilities of a vote for the column party. Simulations are based on multinomial
logit model including only demographics.
Table 3. Multinomial logit-simulated Vote by Observed and Professed Vote (Martinez and Gill model)
(a) Observed Vote: Re-simulated, N=313 (of 1,835)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.77    0.11    0.01    0.02    0.00    0.01    0.08
FG             0.15    0.72    0.00    0.03    0.01    0.01    0.08
Greens         0.10    0.01    0.70    0.09    0.00    0.02    0.08
Labour         0.14    0.07    0.00    0.66    0.01    0.00    0.12
PD             0.16    0.10    0.02    0.00    0.67    0.00    0.05
SF             0.20    0.07    0.00    0.02    0.00    0.66    0.06
Independent    0.20    0.22    0.03    0.04    0.01    0.01    0.49
Total          0.43    0.28    0.02    0.08    0.02    0.03    0.14

(b) Professed Vote: Simulated, N=305 (of 2,663)
               FF      FG      Greens  Labour  PD      SF      Independ.
FF             0.70    0.09    0.03    0.04    0.05    0.04    0.06
FG             0.21    0.54    0.01    0.07    0.03    0.06    0.09
Greens         0.18    0.16    0.39    0.07    0.04    0.09    0.08
Labour         0.13    0.10    0.05    0.56    0.03    0.05    0.07
PD             0.22    0.12    0.03    0.06    0.52    0.01    0.05
SF             0.07    0.06    0.05    0.04    0.06    0.69    0.04
Independent    0.25    0.16    0.07    0.09    0.04    0.09    0.30
Total          0.42    0.20    0.05    0.11    0.06    0.08    0.09

(c) Difference in Party Match
               FF      FG      Greens  Labour  PD      SF      Independ.
               7       18      31      10      15      -3      19

Note: Cell entries are the average probabilities of a vote for the column party. Simulations are based on multinomial
logit model including all available variables related to vote choice.