Factors Affecting Polling Accuracy in the 2004 U.S. Presidential Election

Wayne Wanta, School of Journalism, University of Missouri, Columbia, MO 65211-1200, 573-884-9689, wantaw@missouri.edu
Hans K. Meyer, School of Journalism, University of Missouri, Columbia, MO 65211-1200, hanskmeyer@gmail.com
Antonie Stam, College of Business, University of Missouri, Columbia, MO 65211, 573-882-6286, stama@missouri.edu

** Paper presented at the annual conference of the World Association for Public Opinion Research, Berlin, Germany, September 2007. **

Wanta is a professor in the School of Journalism and executive director of the Center for the Digital Globe, Meyer is a doctoral student in the School of Journalism, and Stam is the Leggett & Platt Distinguished Professor of MIS in the College of Business, all at the University of Missouri.

Abstract

Using polling data from the 50 states and the District of Columbia during the 2004 U.S. presidential election, this study examined four factors that could potentially affect how accurately a poll predicts eventual voting results. Findings show that polls were most accurate for states in which the election was very closely contested and in which the poll was conducted close to election day. Neither the number of respondents in a poll nor the number of undecided respondents played a role in polling accuracy.

Every election, scores of polls are conducted, aimed at finding trends among respondents that could lead to predictions of eventual winners. These polls are conducted at the local, regional, state and national levels. Not all polls are created equal, however. Some past polls have been inaccurate enough to predict the wrong winner, as in 1936, when the Literary Digest poll predicted Alf Landon would defeat Franklin Roosevelt.
In this case, survey methods over-selected high-income individuals, leading to more Republicans answering the survey questions. Certainly, survey methodology has made great strides since 1936. Pre-election public opinion polls in the 2004 U.S. presidential election, in fact, were exceptionally accurate. As Traugott (2005) notes, based on three different polling accuracy measures, pollsters had their best year since 1956. According to Shelley and Hwang (1991), a number of factors influence how accurately a poll can gauge public opinion. In a study of all the polls of five major news organizations from Jan. 1 to election day in 1988, they describe how polling trends and the impact of events contributed to the accuracy and consistency of poll numbers. They found that the average deviation from President George H. Bush's final election result was 4 percentage points, while the average deviation for Michael Dukakis was 4.4. This is consistent with Crespi's (1989) finding that final local, state and presidential primary and general election polls have had an average absolute deviation of 5.7 percentage points.

This study looks at the effects of four factors on polling accuracy for individual states during the 2004 presidential campaign: the timing of a poll, the closeness of an election, the number of respondents in a poll and the number of undecided voters in a poll. Each of these variables could affect how accurate a poll would be in predicting the eventual winner of the U.S. presidential race. Polls were identified from the Los Angeles Times' national election website, which reported the most recent polls available from each individual state. Given that the 2004 polling was very accurate, this study hopes to identify factors that contributed to these accurate poll results. By identifying factors affecting accuracy, researchers can take these variables into consideration when examining future polling results.
Literature Review

Since the early 1800s, journalists have done all they can to predict elections (Frankovic, 2003). "They have used many available tools, from carrier pigeons to the Internet, to tell the important story of the peaceful transfer of power in a democratic process" (p. 19). One of the most common tools has been the public opinion poll, which Crespi (1980) says can be traced back to the 1820s. In fact, he writes, newspapers have chased public opinion through polls because no greater news scoop exists than predicting the winner of an election before it has even happened. "The expression of public opinion in the voting booth has an immediate, direct and powerful political effect; therefore, it is news. Moreover, being able to anticipate correctly the outcome of an election before anyone else is a form of news scoop" (p. 465).

Newspaper organizations might have had their best year in predicting the 2004 election (Traugott, 2005). Of the 19 polls he studied, Traugott (2005) found that all but one were within the range of plus or minus 4 percentage points, with 13 showing Bush ahead, two showing ties and five showing Kerry ahead (p. 645). One reason for their overall accuracy is that pollsters expected a close race and altered their mix of study designs. Even their overall success in predicting the 2004 election, however, did not inoculate them from criticism, especially in how they defined likely voters. "As the voting procedures and devices change and more opportunities to vote early arise, pre-election and exit pollsters will face new challenges … More public disclosure of methods and their consequences will be required to maintain public confidence in the profession, as well as in the basic foundation of the American electoral system and its transparency to the public" (Traugott, 2005: 653). Before examining one poll in particular, the L.A.
Times' online compendium of state pre-election polls, it is important to understand the thinking and theory behind opinion polls, why the media participate so actively and what measures influence a poll's ability to correctly predict an election.

Polling and Democracy

In virtually every democracy in the world, Gallup, or one of its rivals, conducts and publishes public opinion surveys of voter intentions prior to a major election (Lewis-Beck, 2005). Researchers continue to disagree on how polls impact the political process. Sidney Verba called public opinion polling "the most significant development in promoting the democratization of the American regime" (cited in Pearson, 2004). George Gallup, founder of the poll that bears his name, called polls a "continuous referendum by which majority will could be made known" (cited in Meyer, 1990). By providing information about a likely outcome, polls allow leaders and followers alike to make the adjustments they feel are necessary (Lewis-Beck, 2005). Good election forecasts are information "that is intrinsically interesting in a healthy democracy" (p. 146). While polls may promote democracy by enabling political leaders to know what their constituents think, the picture that public opinion polls create is illusory and "is typically purchased at the price of moral reasoning, which tends not to be quantifiable in its very nature" (Pearson, 2004: 71). Polling needs to work with democratic theory to encourage participation, but it cannot make science the solution to all problems. "This makes it all the more imperative that in a democracy such as ours it is the philosophical problems that are most in need of clarity because our culture is already predisposed to believe that science is the answer to whatever problem we face" (p. 71). This illusion of "certainty" about who will win has the potential to affect the outcome of elections.
Critics say voters are less likely to participate if they think their vote will not make a difference in the final tally. Polls that only predict winners, or horse-race polls, also lead to voters knowing less about the issues and caring less about the election (Meyer & Potter, 1998). So many public opinion polls exist today that voters have neither the time, nor the desire, nor the ability to sort through them all (Lewis-Beck, 2005). Defenders counter that this glut of opinion polls actually works to empower voters with better information on which to base their decision. It can also have a "catalytic effect" that makes the contest more interesting to citizens. "As citizens follow the candidate's progress or lack of it, they become interested in the race and why it is developing that way, and they start to look at issues" (Meyer & Potter, 1998: 36).

The key to ensuring that polls work with and not against democracy is for pollsters to disclose all of the research biases and limitations that can push their findings away from reality (Rotfield, 2007). Even though news organizations have become increasingly reliant on polls as the basis for most of their political news, they have failed to adequately address their limitations. "Small shifts of public opinion have become the lead for all major news programs with reporters giving the statistical sampling error as if that and that alone would explain any limitations to the data" (p. 187). Newspapers especially should recognize the need to report limitations because, from their beginnings, they have always placed truth above entertainment (Rotfield, 2007).

Polls and Journalism

A newspaper's commitment to truth, however, presents a conundrum in reporting on and participating in public opinion polling. Traditionally, newspapers have shied away from making news, even as they have embraced opinion polls more and more.
News organizations should not feel guilty, because public opinion polling represents the kind of "precision journalism" news organizations should practice (Meyer, 1990). In fact, it has become a "vital component of the picture journalists draw of a political campaign" (Stovall & Solomon, 1984). Being able to predict the outcome of an election can only strengthen a news organization's credibility because it moves journalism closer to science, which states its propositions in forms that can be tested (Meyer, 1990: 456). Polls also enhance credibility because they remain one of the few journalistic offerings whose accuracy can be quickly and decisively tested, "and for that reason, they tend to keep us honest" (Meyer, 1990: 454). While these predictions may affect the outcome of elections, at least they are more likely to be accurate than data based on rumor, speculation and the "musings of spin doctors" (p. 458). Besides, predicting who will win an election is the bread and butter of journalism. "The most interesting fact about an election is who wins … Yes, of course, you want to know about the dynamics of the campaign and what put the front-runner where he or she is … But none of that interesting and useful information is going to make much sense unless you can identify the front runner" (Meyer, 1990: 455).

Journalism and Poll Accuracy

The solution to ensuring polls serve democracy is to provide more information, not less. "By keeping track of the polls' successes and failures at predicting elections, (news organizations) can help the information marketplace sort the good from the bad" (Meyer, 1990). While news organizations have a long tradition of using polls, they have not always been either the most accurate or the most forthcoming in how they obtained the results. The initial acceptance of public opinion polls as a credible source of information about public opinion rested primarily on the belief that pre-election polls predict elections accurately.
Three components underlie this belief: 1) polls have generally been very accurate, 2) pre-election polls predict how people will vote, and 3) measuring voting behavior is comparable to measuring opinions on issues (Crespi, 1989). News organizations did not begin to seriously question what polls and polling procedures meant until pollsters inaccurately predicted Dewey would beat Truman in 1948. The debacle, in fact, hastened "the wholesale adoption of probability sampling" (Bogart, 1998).

Most of the media's navel gazing about polls relates to their impact. As early as the 1980 election (Stovall & Solomon, 1984), researchers examined whether newspapers focused too much on the horse-race aspects of polls instead of the issues. What they found is that newspapers are actually less likely to play a poll story as prominently as other campaign stories (p. 622). Meyer and Potter (1998) agreed that media may have hurt themselves by minimizing coverage of horse-race polls. "Without the horserace to arouse interest, citizen attention to the expanded issue coverage was reduced – perhaps by an amount sufficient to wash out the effect of that coverage" (p. 42). The role the accuracy of these predictions played, however, was more specifically examined in the 1988 presidential election. In a study of all the polls from Jan. 1 to election day in 1988 of five major news organizations (New York Times/CBS News, Wall Street Journal/NBC News, The Washington Post/ABC News, Newsweek/Gallup, and Time/Yankelovich Clancy Shulman), Shelley and Hwang (1991) suggest accurate polls do a good job of pulling people into the election. They defined accuracy as "the difference between the poll predictions and the actual election outcome," and found that the average deviation from the final election result for both the first President Bush and Michael Dukakis was between 4 and 4.4 percentage points.
Their findings suggest that the dynamic of public opinion in the 1988 presidential campaign was more a process of initially or temporarily undecided voters coming to a decision and less a matter of prospective voters switching from one candidate to the other. Timing remained critical for the ability of a poll to predict accurately because intervening events dramatically shifted poll numbers.

Accurate election prediction was most questioned during the 2000 presidential election, when Republican George W. Bush won the presidency despite losing the popular vote to Democrat Al Gore. For the most part, news organizations and pollsters succeeded in calling elections before and even after 2000, but much of that success came from elections where the outcome was relatively clear cut (Konner, 2003). "Polls are statistical calculations, not factual realities. They are imperfect measures of voter intent and actual voting, and their inaccuracies are especially perilous in close elections" (p. 16). The best thing to come from the 2000 election might be that news organizations began to understand they needed to make clear they were reporting only projections and to explain more clearly how calls were made (Frankovic, 2003). "Calling elections is not magic, but too often it is presented as such. Consequently, many reporters and many viewers (including the candidates themselves) have held a mistaken belief in the news media's election omniscience" (p. 30). In studying Israeli polls and elections, Weimann (1990) found those polls that carefully detail methodological problems, "thus limiting the value of predictions based on the poll," are more accurate in their predictions (p. 404). News organizations were less likely to do this as the election drew nearer.
"However, when this growing reliance on surveys and polls is not accompanied by increasing familiarity and understanding of the statistical and methodological problems involved in polling, and when standards for reporting polls are non-existent or poorly observed, the results of such 'precision journalism' would emerge as far from accurate and valid" (p. 406).

Accuracy Measures

Describing statistical procedures and methodological problems is not as easy as listing a few formulas. Polling is a "complex human enterprise that requires many different steps, some mechanical, some judgmental," Bogart (1998) writes (p. 12). All survey statistics, he adds, arise from a series of professional judgments. One of the most important, and one of the least commonly identified, is how to decide who will participate. All polls must weight results to conform to the population characteristics identified in the U.S. Census. Shoehorning data into demographic proportions, however, has never been simple, as it has never been easy to "ascertain the psychological attributes of the many people with whom interviews are never completed" (Bogart, 1998). In predicting a presidential election, meeting census estimates is not enough, because the population does not choose the president. The Electoral College does. To predict the Electoral College vote, one simply uses the predicted statewide popular vote to project a winner for each state, and DeSart and Holbrook (2003) found that statewide trial-heat polls taken in September "predicted the outcome of the 2000 election well. In fact, they did a better job of predicting the 2000 election than they did the previous two elections." It is not enough just to find the demographic groups. As early as 1945, researchers said polls must focus on those who intend to vote because it ensures a better estimate of actual voting behavior (Lazarsfeld & Franzen, 1945).
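The state-by-state projection logic that DeSart and Holbrook describe, calling each state for whichever candidate leads its trial-heat poll and summing that state's electoral votes, can be sketched as follows. This is an illustrative sketch, not the authors' code; the states, percentages and electoral-vote counts below are hypothetical.

```python
# Hypothetical sketch: project an Electoral College total by awarding each
# state's electoral votes to whichever candidate leads its trial-heat poll.

def project_electoral_college(state_polls):
    """state_polls maps state name -> (dem_pct, rep_pct, electoral_votes)."""
    totals = {"DEM": 0, "REP": 0}
    for state, (dem_pct, rep_pct, ev) in state_polls.items():
        leader = "DEM" if dem_pct > rep_pct else "REP"
        totals[leader] += ev
    return totals

# Hypothetical trial-heat percentages for three states
polls = {
    "Ohio": (48.0, 50.0, 20),
    "Florida": (47.0, 52.0, 27),
    "Wisconsin": (50.0, 49.0, 10),
}
print(project_electoral_college(polls))  # -> {'DEM': 10, 'REP': 47}
```

In a full projection the dictionary would hold all 50 states and the District of Columbia, and the decided shares would first need the undecided respondents handled in some way, a problem taken up below.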
Finding people who are likely to vote is not as easy as polling those who are registered, because many who are registered to vote do not. There is no easy way of determining who will vote (Bogart, 1998: 11), and it is only getting more difficult as voting procedures and devices change and more opportunities to vote early arise (Traugott, 2005). "More public disclosure of methods and their consequences will be required to maintain public confidence in the profession, as well as in the basic foundation of the American electoral system and its transparency to the public."

To compensate, many polling companies alter whom they sample to cover all the bases. The Harris Poll, for example, began its polling for the 1996 presidential election in 1995 by reporting the opinions of the general public, then switched to sampling registered voters in the summer (Bogart, 1998). But Harris' switch underscores the importance timing plays in the efficacy of pre-election polling. Generally speaking, the closer to the election a poll is conducted, the more accurate it is, but Wolfers and Leigh (2002) found that polls taken one month prior to the election also have substantial predictive power (p. 226). Timing also tends to lessen the number of undecided voters, who can add another wrinkle to election prediction. Panagakis (1999) said that undecided voters choose challengers more often than incumbents because it takes less time to decide to vote for a known incumbent than an unknown challenger. In addition, the less interested people were, the later they made up their minds. Undecided voters present an additional challenge in determining a poll's accuracy because some debate exists over how they are counted in the final tally. The single most important statistic that determines which candidate will win an election, and the one most commonly reported by the media, is the margin between the top two candidates (Mitofsky, 1999).
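The two ideas just discussed, the top-two margin statistic and the proportional treatment of undecided respondents, can be sketched as follows. This is an illustrative sketch under hypothetical candidate shares, not a reconstruction of any pollster's actual code.

```python
# Hypothetical sketch: allocate the undecided share proportionally to each
# candidate's decided support, then score the poll by the error in its
# margin between the top two candidates.

def allocate_undecided(poll):
    """Distribute the undecided percentage in proportion to decided support."""
    decided = sum(poll.values())
    undecided = 100.0 - decided
    return {cand: pct + undecided * pct / decided for cand, pct in poll.items()}

def margin_error(poll, vote, top_two=("Bush", "Kerry")):
    """Absolute difference between predicted and actual top-two margins."""
    a, b = top_two
    return abs((poll[a] - poll[b]) - (vote[a] - vote[b]))

poll = allocate_undecided({"Bush": 49.0, "Kerry": 47.0})  # 4 points undecided
vote = {"Bush": 51.0, "Kerry": 48.0}                      # hypothetical result
print(round(margin_error(poll, vote), 2))  # -> 0.92
```

Proportional allocation keeps the poll's relative standing intact while forcing the shares to sum to 100, which is why the allocated margin (about 2.08 points here) differs only slightly from the raw 2-point margin.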
The most common ways to measure this statistic come from the systematic evaluation of polling accuracy conducted by Mosteller et al. after the 1948 presidential election debacle (Martin, Traugott & Kennedy, 2005). Mitofsky (1999) has repeatedly supported two of Mosteller's measures as the most viable for predicting poll accuracy: Measure 3 computes the average error across all major candidates between the prediction and the actual results, while Measure 5 examines only the difference between the top two candidates. He also questions how to handle the percentages of undecided voters. Panagakis (1999), and much of the literature, agree that undecided voters must be dealt with in some way besides simply excluding their percentages. He argues for allocating the undecided voters in proportion to each candidate's pre-election poll support. Mitofsky (1999) found more consistency among different measures when the undecided voters were allocated proportionally. To account for both accuracy and bias, as well as to compare elections across time, Martin, Traugott and Kennedy (2005) proposed a new measure based on the odds ratio: it compares the ratio of Republican to Democratic support among the poll respondents who favor either major-party candidate with the same ratio in the official vote.

In examining the L.A. Times' decision to aggregate polls for each state on its website during the 2004 election, this study looks most closely at the difference between each major candidate's predicted and actual totals and the proportion of undecided voters. It also looks at how the timing and the number of respondents affected the polls' accuracy.

Method

Polling data were gathered from an election website produced by the Los Angeles Times. The Times website reported polling results for all 50 states and the District of Columbia throughout the 2004 U.S. election campaign, updating the site nearly daily. Poll results came from several sources, including the news media of each state.
The analysis examined one dependent variable: polling accuracy. The variable was determined by taking the margin of victory predicted by an individual poll from a state and comparing it to the actual vote totals for the state. Thus, the variable had 51 observations (one for each of the 50 states and the District of Columbia). The variable ranged from zero (Connecticut, Delaware and New Hampshire, the three states where the poll results and election results differed by less than one-half percentage point) to 14 (District of Columbia).

Four independent variables were included in the analysis.

Timing: The number of days between the poll and the date of the election. This variable ranged from 2 (Ohio, New Jersey, Florida) to 53 (Idaho). Logically, a poll conducted closer to an election should be more accurate than a poll conducted earlier in a campaign (Crespi, 1989).

Closeness: The difference, in percentage points, between the winner's and the loser's vote totals in a state. This variable ranged from 1 (Iowa, New Hampshire, New Mexico, Wisconsin) to 81 (District of Columbia). How close an election was could influence how accurate a poll is. Voters in states with a wider margin of victory may have fluctuated more in their voting patterns, with some not bothering to vote because they felt their vote would not matter.

Number: The number of respondents in a poll. It ranged from 400 (South Dakota) to 1,345 (Wisconsin). Logically, the more respondents polled, the more accurate the poll should be.

Undecided: The percentage of undecided respondents in a poll. It ranged from 2 (New Hampshire) to 16 (Delaware). Undecided voters could be swayed either way in an election; the more undecided voters, the less accurate a poll might be.

The data were analyzed through a path analysis model.
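A minimal sketch of how the dependent variable described above appears to be constructed: the absolute gap, in percentage points, between a state poll's predicted margin of victory and the state's actual margin. The figures below are hypothetical, not taken from the study's data.

```python
# Hypothetical sketch of the polling-accuracy dependent variable: the
# deviation of a state poll's predicted margin from the certified margin.

def polling_error(poll_margin, actual_margin):
    """Absolute difference, in percentage points, between the two margins."""
    return abs(poll_margin - actual_margin)

# Hypothetical state: the poll showed a 2-point lead; the final margin was 3
print(polling_error(2.0, 3.0))  # -> 1.0
```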
Path coefficients from each of the four independent variables were examined leading to the dependent variable of polling accuracy.

Results and Discussion

Overall, the polls were very accurate. More than half of the polls (54.9 percent) were within 3 percentage points of the actual vote count. As mentioned above, three polls (Connecticut, Delaware and New Hampshire) matched the actual vote results, differing by less than one-half percentage point. Ten other polls were off by just one percentage point. Table 1 shows the path analysis coefficients for all of the variables in the study. As the table shows, two factors were able to predict the accuracy of the polls: the timing of the poll and the closeness of the race. Timing was a significant predictor: polling error grew as the number of days before the election increased, meaning the closer a poll was conducted to the election date, the more accurate it was. The closeness of the race was also significant: error grew with the margin of victory, meaning the closer a race was, the more accurate the poll was. The number of respondents in a poll and the number of undecided voters in a poll were not related to polling accuracy.

The lack of statistical significance of the variable measuring the number of respondents points to the overall accuracy of polling methods. Results were accurate even with a relatively small number of respondents. The lack of significance for the variable measuring the number of undecided voters in a state was surprising; apparently, undecided voters were distributed between Democrats and Republicans in a proportion similar to the eventual voting results. In other words, undecided respondents voted similarly to the respondents who had made their choice for president much earlier.
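In a one-predictor model, the standardized coefficient equals the Pearson correlation between predictor and outcome, which is consistent with Table 1, where each variable's R equals the absolute value of its standardized beta. A minimal sketch of that computation, using made-up timing and error values rather than the study's data:

```python
# Hypothetical sketch: the standardized coefficient of a single-predictor
# path is the Pearson correlation between the predictor and the outcome.

def standardized_beta(x, y):
    """Pearson correlation, i.e., the standardized slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    sd_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (sd_x * sd_y)

days_before_election = [2, 5, 10, 30, 53]      # made-up timing values
polling_error_pts = [1.0, 1.5, 3.0, 6.0, 9.0]  # made-up error values
print(round(standardized_beta(days_before_election, polling_error_pts), 3))
```

With the made-up values above the correlation is strongly positive (about .99), mirroring in exaggerated form the study's finding that error rises with the number of days before the election.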
The two significant variables showed that (1) polling results were more accurate the closer in time they came to an election – later polls captured the trends of voters more accurately than polls conducted earlier in a campaign; and (2) it is easier to track voters' opinions in a closer election – landslide elections could produce wilder swings among voters because of lower turnout or other factors.

Both of the significant predictors of polling accuracy are good news for survey researchers. Timing of polls is important. Early polls cannot capture voting trends as well as later polls, suggesting that polls late in a campaign can accurately gauge voter intentions. In addition, hotly contested states would be more important to track than landslide states. Pollsters are less concerned about states in which one candidate has a sizable margin. The winner can be easily predicted, though the margin of victory is somewhat less clear. Pollsters are much more interested in attempting to predict the winner in the hotly contested states. The results here show that they were successful in gauging voter intentions.

Overall, the findings point to the successes of public opinion polling during the 2004 U.S. presidential election. Future studies should examine other factors that could play a role in polling accuracy, as well as investigate whether these same factors affect polling accuracy in other election settings – such as local elections – or in other cultures and other countries.

Table 1: Path coefficients examining factors related to polling accuracy.

              Beta    Standardized Beta    Sig.
Timing        .061    .288                 .041
              (R: .288; R-square: .083; Adjusted R-square: .064)
Closeness     .074    .301                 .032
              (R: .301; R-square: .090; Adjusted R-square: .072)
Respondents   .000    -.024                .869
              (R: .024; R-square: .001; Adjusted R-square: -.020)
Undecided     -.061   -.052                .715
              (R: .052; R-square: .003; Adjusted R-square: -.018)

References

Bogart, L. (1998). "Politics, Polls, and Poltergeists." Society, 35(4), 8-16.
Crespi, I. (1980). "Polls as Journalism." Public Opinion Quarterly, 44(4), 462-476.

Crespi, I. (1989). Public Opinion, Polls, and Democracy. Boulder, Colo.: Westview Press.

Frankovic, K. A. (2003). "News Organizations' Responses to the Mistakes of Election 2000." Public Opinion Quarterly, 67(1), 19-31.

Konner, J. (2003). "The Case for Caution." Public Opinion Quarterly, 67(1), 5-18.

Lazarsfeld, P. F., & Franzen, R. H. (1945). "Prediction of Political Behavior in America." American Sociological Review, 10(2), 261-273.

Martin, E. A., Traugott, M. W., & Kennedy, C. (2005). "A Review and Proposal for a New Measure of Poll Accuracy." Public Opinion Quarterly, 69(3), 342-369.

Meyer, P. (1990). "Polling as Political Science and Polling as Journalism." Public Opinion Quarterly, 54(3), 451-459.

Meyer, P., & Potter, D. (1998). "Preelection Polls and Issue Knowledge in the 1996 U.S. Presidential Election." Harvard International Journal of Press/Politics, 3(4), 35.

Mitofsky, W. J. (1999). "The Polls - Reply." Public Opinion Quarterly, 63(2), 282-284.

Panagakis, N. (1999). "The Polls - Response." Public Opinion Quarterly, 63(2), 278-281.

Shelley, M. C., II, & Hwang, H.-D. (1991). "The Mass Media and Public Opinion Polls in the 1988 Presidential Election: Trends, Accuracy, Consistency, and Events." American Politics Quarterly, 19(1), 59-79.

Traugott, M. W. (2005). "The Accuracy of the National Preelection Polls in the 2004 Presidential Election." Public Opinion Quarterly, 69(5), 642-654.

Tsfati, Y. (2001). "Why Do People Trust Media Pre-Election Polls? Evidence from the Israeli 1996 Elections." International Journal of Public Opinion Research, 13(4), 433-441.

Weimann, G. (1990). "The Obsession to Forecast: Pre-election Polls in the Israeli Press." Public Opinion Quarterly, 54(3), 396-408.

Wolfers, J., & Leigh, A. (2002). "Three Tools for Forecasting Federal Elections: Lessons from 2001." Australian Journal of Political Science, 37(2), 223-240.