The Efficacy of Social Media Measures: Assessing Association Between Fast Food Consumption and Political Preference Using Twitter Data and Spatial Analysis
Ara Cho
Quantitative Methods in the Social Sciences (QMSS)
Graduate School of Arts and Sciences
Columbia University
Advisor: Gregory Eirich, Director of QMSS
May 2015

This paper was completed as part of the requirement for the QMSS program. I am indebted to Dr. Gregory M. Eirich for his invaluable advice and direction, as well as the many specific suggestions he offered that are now an integral part of this thesis. I am also grateful to Dr. Cho Sung-Woo for important pointers throughout the thesis class. All errors are my own.

CHO, Ara (UNI: ac3772)

1. Abstract
With the ever-growing availability of online data, social scientists are trying to understand how it compares with more traditional measures of public opinion and behavior (Anderson et al., 2014; Lazer et al., 2013). This thesis investigates how social media measures compare with more standard ones in one particular context: the relationship between fast food consumption and political preferences at the state level. This paper finds that while (a) there are relationships between where Pizza Huts are located and where fast food establishments are more generally; (b) geographic locations where there are more followers of Pizza Hut per capita correspond with where Pizza Huts are located; (c) fast food prevalence does predict obesity prevalence; and (d) obesity prevalence does predict Republican voting, there is no discernible relationship at the state level between the number of Pizza Hut followers per capita and whether the state is Republican-voting.
Ultimately, the relationships among these variables are sufficiently weak, the measures are sufficiently coarse, and the units of analysis (states) are large enough that a clear relationship between the number of Pizza Hut followers per capita and whether the state is Republican-voting is not found. That said, this paper did establish that the number of Twitter followers of Pizza Hut is a good proxy for how many Pizza Huts are located in a state, which is a promising development and one that could be extended to other domains where social media data can help to capture hard-to-measure reality.

2. Introduction
The following events are all part of a bigger pattern in which obesity and nutrition have become deeply partisan in recent years: states where high Body Mass Indices are more prevalent are likelier to vote Republican, and this rightward lean is more pronounced in counties in the South that make up what the Centers for Disease Control (CDC) calls the “diabetes belt” (Krugman, 2015 March 6); the biggest pushback against First Lady Michelle Obama’s healthy lunch initiative came from leadership in the above-mentioned “diabetes belt” area (Pianin, 2014); and fast food corporations in recent years have been allotting an overwhelming majority of their political contributions to the Republican Party (Martin, 2015). Given the apparent correlation between fast food consumption and political preferences, this study aims to investigate how well social media measures capture that relationship compared with more traditional ones. To this end, I look at Twitter data from a randomly drawn sample of Twitter followers. Approximately 5,300 Twitter followers were culled from the more than 10 million followers of a major pizza franchise, Pizza Hut (Twitter ID: @pizzahut), to help determine whether states with more Twitter followers of Pizza Hut are more likely to be Republican-voting.
As a proxy for voting behavior, this research utilizes the 2012 presidential election results to categorize the 50 U.S. states and one federal district as either a “red” or “blue” state or district. In order to assess the efficacy of Twitter data, the results of the analyses (regression and spatial) are compared against outcomes generated using a more traditional dataset, such as the number of Pizza Huts per capita in each state, as well as against other studies that measured the relationship using more standard methods.

Using the popular microblogging site and spatial analytic techniques, as well as looking at other related studies, has allowed me to capture the association between politics and obesity as seen in the following relationships: there are discernible relationships between where Pizza Huts are located and where fast food establishments are more generally (Fraser et al., 2010); there are more Twitter followers of Pizza Hut per capita where there are also more Pizza Hut outlets per capita; fast food prevalence does predict obesity prevalence (Currie et al., 2009; Jeffrey et al., 2006); and obesity prevalence does predict Republican voting (Shin and McCarthy, 2013). The thesis does not find a discernible relationship, at the state level, between the number of Pizza Hut followers per capita and whether the state is Republican-voting.

Finally, in addition to the fast food component of the analysis, I will employ additional proxies, the well-known liberal and conservative media pundits Stephen Colbert and Bill O’Reilly, to assess whether randomly generated follower samples reflect partisan identification geographically. This is a supplementary exercise to test the efficacy of Twitter-generated data in providing an adequate sample of the population it purports to represent.

3.
Literature Review

3.1 Politicization of Obesity
The unrelenting rise in obesity rates across the United States has prompted the need for more effective policy responses (Kersh, 2009). More than a third of all adults and 17 percent of young people are obese, and many of them have been consigned to a whole host of related health problems like diabetes, coronary heart disease, stroke, hypertension and even cancer (Pianin & Ehley, 2014). Without a major intervention or a sea change in many Americans’ unhealthy eating habits, the adult obesity rate could reach a whopping 50 percent in less than two decades (Pianin & Ehley, 2014). The associated cost to society in terms of damaged lives, health care costs and diminished productivity and economic growth is staggering: $305 billion (Pianin & Ehley, 2014).

The health issue, which has been billed as the “new national epidemic,” is an oft-discussed topic in public discourse, and it is discussed using mainly two “frames,” personal responsibility and environmental, which lend themselves to very different voter reactions and policy responses (Kersh, 2009; Satcher, 2001). For instance, when obesity is discussed as a “personal responsibility” problem, the dialogue is often shaped in such a way that the government’s interventions to address the issue are characterized and perceived as an infringement upon personal freedom, more specifically the freedom to make lifestyle choices. The same can be said about personal freedom in raising one’s own children, at least as it relates to policy choices by schools (Have et al., 2011). Since Republican voters tend to prefer a small government, the appearance of a federal or state infringement is likely unwelcome and likely one of the reasons behind the pushback against the federal initiative to have schools serve healthier lunches.
Indeed, conservatives have framed limits on the freedom to eat whatever food one chooses, however unhealthy, as government overreach and an attack on personal freedom (Wieder, 2013; Trinko, 2013). Public opinion researchers have previously demonstrated that Americans are indeed divided in their views about the role of government in obesity prevention and about specific obesity prevention policies such as market regulation, junk food taxes, and school food restrictions (Barry et al., 2009).

Among the different policy responses are calorie menu labeling and reducing unhealthy foods at school (Satcher, 2001). At issue is the first lady’s latest campaign to get schools to serve healthier lunches to schoolchildren, as part of a law passed in 2010 that sets nutrition benchmarks for public schools. Many Grand Old Party lawmakers are pursuing a measure that would exempt many schools from the new regulation requiring school lunches to have more fresh fruit, whole grains, and vegetables. The reason they gave for their action was that schools are simply wasting money trying to impose the new standard, as many students are resisting the new healthier fare in favor of more popular junk foods like pizza (Pianin, 2014).

Pizza’s dominance in the food supply to school cafeterias was made especially vulnerable by the new federal nutrition standards, but the toll on pizza makers does not stop there: the labeling laws require that pizzerias post calories for a whole pie, rather than a single slice, which pizza makers say would drive away consumers due to what they call “sticker shock.” Complicating the labeling issue, they say, is the sheer variety of topping choices, which would make it simply “impossible” for pizza makers to post accurate calorie information for all possible combinations of pies (Martin, 2015).
These conflicts have led to the current partisan landscape, in which a rare coalition of competing pizza purveyors banded together to form a lobby to advocate for the dish and allotted up to a remarkable 99 percent of their political contributions to the GOP (Krugman, 2015).

Further delineating the party divide on the issue of obesity are studies that help show an association between county-level political inclination and obesity. According to a study that combined a more traditional methodology, a telephone survey asking for self-reported weight and height information, with county-level voter preference in the 2012 presidential election, there was a modest but positive association between county-level support for the 2012 Republican presidential candidate and county-level obesity prevalence (Shin and McCarthy, 2013).

3.2 Pizza as Proxy for Fast Food
I will choose Pizza Hut as my proxy for fast food makers for a couple of reasons. As I mentioned briefly before, pizza makers have proven unique in the so-called war of nutrition. In the pushback against government intervention, pizza makers have taken a unique stance among major fast food chains in that they formed a specialized lobby rather than backing down and complying with federally mandated nutrition standards. While other purveyors of fast food like McDonald’s and Wendy’s have been nudged along by new laws to voluntarily remove soda from their children’s menus and honor new labeling regulations, advocates for pizza have taken a more confrontational stance and formed their own lobbying force. U.S. pizza companies made political contributions totaling $1.5 million in the 2012 and 2014 elections, of which some 88 percent went to Republican candidates and groups.
For instance, industry leader Pizza Hut allotted a whopping 99 percent to the Republicans, Papa John’s assigned some 87 percent, Domino’s gave some 80 percent, and Little Caesars spent 73 percent on supporting campaigns of GOP candidates and lawmakers (Martin, 2015). In addition, among the major pizza makers, Pizza Hut has the largest number of stores nationwide, nearly twice as many as Domino’s, which is the second most prevalent pizza chain in the U.S. (Cosper, 2013).

The second reason is quite simple: pizza is a dish widely preferred by Americans across the nation. Pizza would serve as a sound proxy for studying people’s preference for fast food, a leading cause of obesity (Duffey, 2007), for the following reasons. Some 41 million Americans, more than the population of California, eat a slice of pizza on any given day, and more than one in four young males consumed pizza on a given day (Rhodes et al., 2014). The voluntary expression of enthusiasm for pizza via Twitter (whether by Tweeting about it, following pizza franchises or re-Tweeting statuses about pizza) cannot supplant measured health information like BMI, but it may still serve as a compelling indicator of lifestyle choice when it comes to affinity for, and habitual consumption of, fast food.

3.3 Twitter: Limitations and Strengths
Data from Twitter, understandably, have limitations. Only 23 percent of online adults use Twitter (Pew Research, 2014). But they still account for 19 percent of the entire adult population (Pew Research, 2015). Twitter is particularly popular among those under the age of 50 and among those who are college-educated. However, Twitter has seen increases in usership across a variety of demographic groups; for instance, some 21 percent of white online adults used Twitter, compared with 27 percent for black online adults, and 25 percent for Hispanic online adults (Pew Research, 2015).
With regard to education, some 16 percent of online adults with a high school degree or less use Twitter, compared with 24 percent of online adults with some college experience, and 30 percent of online adults with at least a college degree. In terms of types of locality, some 25 percent of online adults who live in urban areas use Twitter, compared with 23 percent for online adults in suburban areas and 17 percent for rural areas. The comparable proportion extends to income levels as well; some 20 percent of online adults who make less than $30,000 a year use Twitter, compared with 21 percent for those who make between $30,000 and $49,999, and 27 percent each for those making between $50,000 and $74,999 and those making $75,000 or more (Pew Research, 2015).

Many scholars who have written about Twitter data are also forthright about its limitations. Because it tends to be data-driven rather than question-driven, much of the current quantitative research on the popular microblogging service is centered on measuring specific structural parameters in large data samples, sometimes at the expense of the theoretical salience of those parameters (Weller et al., 2014). Further, there is no way of checking how completely a given data set captures what transpired on Twitter at the time of data collection. Without what Twitter calls “firehose access,” researchers rely on the service to provide a “representative sample” of what is there (Weller et al., 2014). This, Weller et al. say, is due in part to the unique challenge of building an infrastructure powerful enough to store vast quantities of information in real time. It also limits the reproducibility of research results, a serious hindrance to advancing research goals (Weller et al., 2014).

Furthermore, according to 2013 research, reaction on Twitter to major political events and policy decisions often differs a great deal from public opinion as measured by surveys.
The yearlong Pew Research Center study compared the results of national polls to the tone of Tweets in response to eight major news events, including the outcome of the presidential election, the first presidential debate and major speeches by the president (Mitchell and Hitlin, 2013).

That said, Twitter data is not without its strengths. Twitter is a cost-effective research tool in that collecting data is inexpensive compared with traditional data collection methods such as surveys. One method of inquiry that might benefit most from Twitter-generated data is text mining and text analysis. In addition to culling information about followers, followees or friends, the Twitter API (Application Programming Interface) allows us to collect statuses as well, providing a rich body of text information with which to conduct either a sentiment analysis or simple text visualization exercises, such as word clouds (Weller et al., 2014).

With those limitations and strengths in mind, it is still worthwhile to explore Twitter data in a variety of novel ways, for example by examining what type of information can be gleaned from looking at someone’s, or some corporate entity’s, follower list and their followers’ geographic locations.

3.4 Association Between Politics and Obesity
Researchers Michael Shin and William McCarthy suggest in their study that residents in counties that vote Republican are electing legislators who embrace a political philosophy that makes them less likely to promote environmental and policy strategies for obesity prevention (2013). However, a critique of the study states that its authors would benefit from considering a possible bias in the measurement of local political inclination, because the propensity to turn out to vote in the 2012 election may be associated with individual-level health characteristics.
Furthermore, Shin and McCarthy’s study fails to establish causal ordering between political preferences and dietary preferences (Gollust, 2013). Some observers have raised different hypotheses to attempt to explain this association. For instance, one hypothesis is that neighborhoods in Democratic-tilting counties are more likely to have sidewalks, facilitating physical activity (Lindeke, 2012); another is that partisan preferences and food preferences are related in that partisanship is “passed on” within families in the same way food and exercise preferences are shaped by familial context (Kinder, 2006). Yet another is that there are identifiable personality traits and psychological differences between liberals and conservatives, such that liberals are more likely to seek novelty and change while conservatives tend to be happier with the status quo (Carney et al., 2008). Despite the vibrant discussion aimed at understanding the causal association between politics and obesity, existing data on health behaviors and health status is paltry and limiting (Gollust, 2013), prompting the need for creative data linking and collection.

On a related note, there is also a growing body of research investigating the effect of the geography of fast food outlets on obesity. Most studies have found a positive association between proximity and density of fast food outlets and increasing deprivation. This may be due to a business strategy of targeting more deprived areas where real estate costs are lower or due to market research showing that demand in these areas is greater (Fraser et al., 2010). However, some studies using traditional methods such as a telephone survey failed to show that proximity to fast food restaurants was associated with obesity (Jeffrey et al., 2006).
3.5 Pundits as Proxy
A Pew Research Center report found that the media outlets people named as their main sources of news about politics are strongly correlated with their political views. In other words, the Pew study can be interpreted to mean that people are living in partisan and ideological echo chambers. This correlation can likely be extrapolated to the Twitter realm in that people likely follow news outlets or pundits whose political ideology they agree with. Another study, mentioned in a New York Times article, gave a more nuanced interpretation of the Pew study, saying that partisans from both sides of the ideological aisle were most likely to consume news from outlets that were estimated to be relatively centrist, such as network morning shows and evening news broadcasts. This pattern was more or less consistent in an analysis of online news consumption as well (Nyhan, 2014).

Social media, too, appeared somewhat encouraging of ideological echo chamber behavior, a phenomenon where individuals are primarily exposed to like-minded views. People tend to follow like-minded individuals on Twitter: about two-thirds of the people followed by the median Twitter user in the United States share the user’s political leanings, a study showed (Nyhan, 2014).

Stephen Colbert is an American political satirist and host of a popular Comedy Central show called The Colbert Report. Colbert assumes an eponymous character that embodies the most outlandish traits associated with the right wing, attacking them in the process (West, 2014). On the opposite extreme is Bill O’Reilly, American political commentator and host of the Fox News Channel political commentary program The O’Reilly Factor. O’Reilly is one of the more visible conservative political pundits, so he could serve as a proxy to capture right-wing Twitter users.
For this reason, using characteristically liberal or conservative media pundits as proxies on Twitter may yield illuminating results about the partisan landscape of Twitter followers across the continental U.S.

4. Data

4.1 Twitter Data
Twitter is a social-networking, microblogging site that allows its users to post messages online in real time, which are called Tweets. Tweets are restricted to 140 characters in length. Users of Twitter are sometimes called Tweeps, and because of the limitation in length of the message they can upload at a time, Tweeps use acronyms and emoticons in expressing their thoughts, conveying feelings or relaying information. Tweeps use the “@” (called “at”) symbol to indicate an account, and “#” (called a hashtag) to allow others to find statuses related to a topic (Agarwal et al., 2011).

Data collection via Twitter for research purposes usually entails one of three Application Programming Interfaces (APIs): the Streaming API, the REST API and the Search API (Weller et al., 2014). The Streaming API is likely the most widely used data source for Twitter research, and it works as a stream that continually provides data in real time (Weller et al., 2014).

More specifically, I will be drawing a random sample of around 7,500 Twitter users who follow Pizza Hut, which has a little over a million followers worldwide. Of those randomly drawn accounts, only those with U.S. state locations specified (e.g. California, New York, etc.) on their user profiles will be retained to form a data set with which regression analyses will be conducted. There are other ways of measuring Twitter user location, namely looking for Tweets that are geocoded.
Those Tweets, however, cannot at the same time account for whom their authors follow, and because those Tweets can only be geocoded to predesignated locations, that sample would be restrictive when it comes to investigating the relationship between geographic locations and penchant for a particular fast food maker. Instead, I will filter the randomly drawn sample of some 5,000 Twitter followers of a particular fast food franchise using text mining techniques in order to retain accounts associated with any one of the 50 U.S. states or the federal district. By subsequently weighting the outcome by the states’ population, that is, controlling for population, I hope to be able to show, with some accuracy, that Twitter-harvested data, or analysis done using that data, can replicate a correlation between obesity-related lifestyle choice (as operationalized by expression of enthusiasm via Twitter following) and voting behavior (as operationalized by voting results from the 2012 Presidential Election).

Table 1 in the Appendix shows an example of a randomly generated, and subsequently cleaned, data set organized by state and the number of Twitter followers of Pizza Hut culled. Of the 7,500 users who follow Pizza Hut on Twitter, roughly 5,300 have indicated on their profile a geographic location that matches the name of any one of the 50 U.S. states and 1 federal district. The initial 7,500 users, randomly generated using a statistical software command, are drawn from the more than 10 million followers of Pizza Hut worldwide. The regression and spatial analyses will be done using this sample. The first column, named “abbr,” refers to abbreviations of state names, and the second column, dubbed “follow,” refers to the number of Twitter followers of Pizza Hut.
This is a randomly generated sample and may not proportionally represent the actual number of people who follow Pizza Hut in each state, a limitation of using Twitter-generated data that was mentioned earlier in the paper. Figure 1, below, shows the summary statistics of this randomly culled sample of some 5,535 Twitter users who follow Pizza Hut. As you can see, the minimum value is zero, associated with the state of Vermont, and the maximum is 318, associated with Oregon. Bias exists within these data, as each state has different levels of exposure to Pizza Hut and the Internet (and social networking services like Twitter); therefore, the numbers will be weighted with differences in state population in mind. The median number of followers is 56, and the average is around 80 per state.

Figure 1: Summary Statistics of Example Sample
Minimum        0
1st Quartile   13
Median         56
Mean           79.27
3rd Quartile   116
Maximum        318

Another potential method to measure Twitter users’ penchant for fast food, and therefore their correlation to obesity-inclined lifestyle choices, is to look at users’ Tweeting history and see what they have been saying about the above-mentioned fast food makers. There is but one problem with that approach. A recent report found that unlike Facebook, Twitter has many users who are not active: some 44 percent of all people who signed up have never even sent a single Tweet (Sherman, 2014). Furthermore, Twitter said that in the last three months of 2013, less than a quarter of its total users logged in at least once a month, according to Sherman (2014). Because the Tweets a user sends may reveal even less about that user than the accounts they follow, I will not be utilizing users’ Tweets, or reTweets for that matter.
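The location filtering described in this section can be illustrated with a minimal sketch. It is not the procedure used in the thesis, which is described only as text mining of profile locations; the lookup table, the function names `match_state` and `tally_by_state`, and the matching rules are all assumptions made for illustration. One design choice is worth noting: two-letter abbreviations such as "OR" and "IN" are also common English words, so this sketch accepts an abbreviation only when it is upper-case and follows a comma.

```python
import re
from collections import Counter

# Hypothetical lookup table: only a few entries are shown here; a full
# version would list all 50 states and the District of Columbia.
STATES = {
    "California": "CA",
    "New York": "NY",
    "Oregon": "OR",
    "Indiana": "IN",
    "District of Columbia": "DC",
}

# Full names match case-insensitively. Abbreviations must be upper-case and
# follow a comma (e.g. "Portland, OR"), so that ordinary words such as
# "or" and "in" are not counted as Oregon and Indiana.
NAME_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in STATES) + r")\b",
    re.IGNORECASE,
)
ABBR_RE = re.compile(r",\s*(" + "|".join(STATES.values()) + r")\b")
NAME_TO_ABBR = {name.lower(): abbr for name, abbr in STATES.items()}


def match_state(location):
    """Return a state abbreviation for a free-text profile location, or None."""
    if not location:
        return None
    found = NAME_RE.search(location)
    if found:
        return NAME_TO_ABBR[found.group(1).lower()]
    found = ABBR_RE.search(location)
    if found:
        return found.group(1)
    return None


def tally_by_state(locations):
    """Count matched accounts per state, dropping unmatched locations."""
    counts = Counter()
    for location in locations:
        abbr = match_state(location)
        if abbr is not None:
            counts[abbr] += 1
    return counts
```

Applied to the profile locations of a follower sample, `tally_by_state` would yield a two-column table of the "abbr" and "follow" kind described above. Stricter or looser matching rules would change how many accounts are retained, so the rules themselves are a source of measurement error worth reporting.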
4.2 Comparison with Standard Methods
In order to assess the efficacy of social media measures, I will discuss at length a traditional dataset used in this study, as well as other papers that employ standard methodology while covering the same topic of obesity and partisan identification.

First, this study looks at the state-level data set on the number of Pizza Huts in each state. The number of Pizza Huts per state will also be weighted for population size, that is to say, expressed as the number of Pizza Huts per capita. The reason for this is to contrast the results of analysis using Twitter data with those obtained through a more traditional research method. Depending on the result, this could show whether Twitter data, however rudimentary in some aspects, may be used to complement traditional data analysis or whether it still needs further study before it can be considered a reliable method for social research. The dataset on Pizza Hut outlets has over 6,000 rows of data, with each row comprising information including the state in which the branch is located, the longitude and latitude, the official name of the branch, the address and the phone number. I will only use the state identification of data from the U.S., which will leave me with roughly 5,800 branches of Pizza Hut, even though another data set states there are over 7,000 stores in the U.S. (Cosper, 2013). The data was collected in 2011 and includes Pizza Hut locations in Canada as well, which have been removed (Bulmer, 2013). The data not being more current is not likely to be a problem, as the collection date of 2011 is close to the time of the election results (2012) and the Census data from 2010, which will be elaborated on later.

Secondly, there are many other studies that looked at the relationship of obesity and partisanship, as well as obesity and proximity to fast food outlets, while using a more standard data set and traditional methodology.
In a provocative but illuminating article in the journal Preventive Medicine, researchers Shin and McCarthy showed that a county’s political inclination, as measured by the share of votes for the Republican candidate for president in 2012, is associated with county-level obesity prevalence. This study features age-adjusted county-level prevalence estimates for the percentage of adults who were obese, which is defined as those with BMI over 30. This data was based on an ongoing, state-based telephone survey using random-digit dialing of adults in the U.S., in which the estimates were derived from self-reported weight and height from responders. With regard to partisan identification, the county-specific percentage of votes obtained by the 2012 Republican Party candidate was used as a proxy for local political inclination, which was defined in the study as established and stable county-level voter preferences. Correlation analyses showed that county-level support for the Republican candidate closely followed patterns of support for Republican candidates in 2008 and 2004. The biggest difference between this author’s study and the research by Shin and McCarthy was that the latter used county-level data for obesity and partisan identification, whereas the former analyzed the data on a state level. The results of the research showed that higher county-level obesity prevalence rates were associated with higher levels of support for the Republican candidate in the 2012 presidential election (Shin and McCarthy, 2013).

4.3 State-level Partisan Identification Data
Both the Twitter-generated data and traditionally tallied information about the number of Pizza Hut stores will be used in the regression analysis along with state-level partisan identification data, or presidential election data. The data will first be visualized on maps for spatial analysis.
Whether a state identifies as Republican or Democratic depends on a variety of factors including the distribution of registered voters, voting records at either the state or federal level, stances on social issues, etc. In other words, there isn’t one agreed-upon way to decide a state’s partisan identity, but I have decided to take into account voting records of the 2012 Presidential Election in determining the state political identity (President Map, 2012). The year 2012 is relatively close in time to when the Pizza Hut store data was collected (2011), and it is only two years after the state population data was collected in 2010 during the decennial Census. Because of the two-party nature of the political landscape in the U.S., the partisan identity of the states will be coded as binary, and the regression using the Twitter follower figure will be logistic, for enhanced interpretability.

The whole data set comprising Twitter data (numeric), state-level Pizza Hut store information (numeric), and state-level political identification data (binary) will be used for spatial analysis. Spatial analysis will help to visualize the figures and subsequently compare them in case the regression analyses do not yield compelling comparative results. Summary statistics of the combined data set are shown in Table 2 in the Appendix. In the summary statistics, there are a total of 51 rows, for all 50 states and the District of Columbia, and 12 columns. The first column, “abbr,” refers to abbreviations for the state names, which were used for merging data frames. The second, third and eighth columns were created for similar purposes as the first column in that they are needed to merge data frames and later merge them with shapefiles of the U.S. states to visually represent findings.
The column named the number of followers refers to the number of Twitter users who follow Pizza Hut on Twitter and who also indicated their geographic location as one of the 50 states or the federal district. The column the number of Pizza Huts contains information about the number of Pizza Hut locations in each state or district as of 2011. Though the figures are neither complete nor up-to-date, they offer an accurate estimate as of 2011. Columns referred to as Republican tendency and Republican tendency represent results from the Presidential Election of 2012. The Twitter follower figure and Pizza Hut figures will be divided by each state’s population, to control for population. The resulting variables will be called per capita number of followers (the Twitter follower count divided by the population of each respective state) and per capita number of Pizza Huts (the number of Pizza Huts divided by the population of each respective state). In other words, variables weighted for population are per capita figures.

4.4 State-level Population Data
One of the challenges of comparing one state with another by simply using the raw figure collected in the random sample is the difference in population of the states. California, with its population of 37 million, is likely going to have more Twitter users in the random sample, and more Pizza Huts, than, say, North Dakota, with some 673,000. To compare the two, or any other states with a wide population gap for that matter, would not be conducive to correctly understanding the relation between lifestyle choices (when it comes to obesity-inducing diet), which are operationalized as Twitter followers of Pizza Hut or the number of Pizza Huts in a given state, and partisan identification. In order to overcome this problem, the figures will be population “weighted”; that is, the analysis will be done using per-capita figures for Twitter followers and Pizza Hut outlets.
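The population weighting described in this section (dividing each state's count by its 2010 Census population, scaling the result, and rounding up) could be sketched as follows. This is an illustration under stated assumptions rather than the code used for the thesis: the function name `per_capita` and its parameters are hypothetical, the populations approximate the figures quoted in the text, and the follower counts are made-up values. The scale factor is kept as a parameter because a sample of a few thousand accounts is tiny relative to state populations.

```python
import math


def per_capita(count, population, scale=1_000, round_up=True):
    """Return count per `scale` residents, optionally rounded up to a whole number."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point error ahead of the ceiling.
    rate = count * scale / population
    return math.ceil(rate) if round_up else rate


# Populations approximate the 2010 Census figures quoted in the text;
# the follower counts are invented for illustration only.
population = {"CA": 37_000_000, "ND": 673_000}
followers = {"CA": 250, "ND": 20}

for abbr in population:
    per_thousand = per_capita(followers[abbr], population[abbr])
    per_million = per_capita(
        followers[abbr], population[abbr], scale=1_000_000, round_up=False
    )
    print(abbr, per_thousand, round(per_million, 2))
```

With counts this small, rounding up at a scale of 1,000 maps both example states to the same whole number, while the unrounded per-million rates still differ, so the choice of scale and of whether to round affects how much state-to-state variation survives.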
The number of Twitter followers of Pizza Hut and the number of Pizza Hut stores in each of the 51 states or district will be divided by the respective state’s population, as stated in the 2010 Census data. The quotient will then be multiplied by 1,000 and rounded up so that the outcome is a whole number, for ease of interpretation.

The state-level population data was drawn from the 2010 U.S. Census, which was taken as of April 1, 2010, as part of the decennial count stipulated by the U.S. Constitution. The 2010 data also roughly coincides with the Pizza Hut data, which dates from 2011, and precedes the 2012 presidential election by two years.

4.5 Media Pundit Follower Data

With regard to Twitter followers of Stephen Colbert, some 4,241 users out of the randomly generated sample of 7,500 followers of Colbert (@StephenAtHome) specified geographic locations that could be identified as one of the 50 U.S. states or the federal district. There are roughly 7.64 million followers of @StephenAtHome. As shown in Figure 2A, the lowest number of followers was attributed to South Dakota, followed by Louisiana and West Virginia. The highest number of followers was found in Indiana, with 305 Twitter users, followed by Oregon with 299 and Alabama with 261; California had 245.
Figure 2A: Stephen Colbert’s Twitter Follower Data (footnote 1)

abbr     AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow   60  261   60   41  245  152   11   18  114   77   98  138

abbr     IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow  240   71  142  305   23   21    2  133   16   63  111   21   75

abbr     MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow   12    3   70  199  219   14   22   17   36   70   36   32  299

abbr     PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow  100  193   54    0   13   70   68   71    4   90   45    2    4

Footnote 1: States with more than 100 followers are colored blue, and the rest are colored red.

With regard to conservative pundit Bill O’Reilly, some 6,170 users out of the randomly generated sample of 7,500 followers of O’Reilly (@oreillyfactor) specified geographic locations that could be identified as one of the 50 U.S. states or the federal district (Figure 2B). There are roughly 737,000 followers of @oreillyfactor. South Dakota and Vermont had the fewest followers, at zero, and the largest numbers of followers were found in Oregon, with 474, and Indiana, with 450.

Figure 2B: Bill O’Reilly’s Twitter Follower Data (footnote 2)

abbr     AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow   57  326   65   65  303  192   28   30  130  150  152  234

abbr     IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow  292  141  210  450   59   34    7  169   16  115  170   24   97

abbr     MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow   26    7  110  273  272   12   33    9   56  103   74   45  474

abbr     PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow  151  288   69    0   48  177   90  112    0  133   78    6    8

Footnote 2: States with more than 100 followers are colored red, while the rest are colored blue.

5. Methodology

The methodology will be discussed in three main parts:

a. Twitter-generated data on followers of Pizza Hut (@pizzahut), Stephen Colbert (@StephenAtHome), and Bill O’Reilly (@oreillyfactor) in each of the 50 U.S. states and the federal district will be regressed against the respective states’ political identification, using linear regression analyses, to show their correlation.

b. Data on the number of Pizza Hut locations in each state, controlling for state population, will be regressed against the respective states’ political identification, using linear regression analyses, to show their correlation.

c. The statistics from the random sample, along with the linear regression results, will be visually represented on a map of the continental U.S. for geographic comparison, to see whether what regression analysis could not capture may be gleaned using data visualization.

5.1 Twitter Analysis

Using R and the Twitter application programming interface (API), I will cull a random sample of around 5,000 followers of Pizza Hut using R commands (Table 3 in the Appendix), then clean the data and categorize it to tally the followers by state. Followers of the well-known liberal and conservative media pundits Colbert and O’Reilly, respectively, will be culled in the same way. Cleaning the data includes stripping from the location field of each follower’s profile any foreign-language text, punctuation marks and other special symbols that offer no valuable information about where the user is geographically located. The text cleaning will be done using the programming language R, and is demonstrated in the Appendix. After the data have been cleaned, the users will be counted based on which U.S. state they listed as their location. The tally will then be “weighted” by each state’s total population. As mentioned before, this process involves a simple arithmetic operation: dividing the number of Twitter followers and the number of Pizza Hut stores by each respective state’s population.
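The cleaning and tallying steps described in Section 5.1 can be sketched roughly as below. This is a Python illustration under stated assumptions, not the R cleaning code shown in the Appendix: the function names clean_location and to_state, the three-state lookup table, and the matching rules (a full state name anywhere in the text, or a trailing two-letter code in capitals) are all hypothetical choices made to keep the example short.

```python
import re
from collections import Counter

# Hypothetical lookup covering three states only; a full version would list
# all 50 states and the District of Columbia.
STATE_NAMES = {"oregon": "OR", "indiana": "IN", "south dakota": "SD"}
STATE_ABBRS = set(STATE_NAMES.values())

def clean_location(raw):
    """Drop non-ASCII characters and punctuation, then collapse whitespace."""
    ascii_only = raw.encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^A-Za-z\s]", " ", ascii_only)
    return re.sub(r"\s+", " ", letters_only).strip()

def to_state(raw):
    """Return a state abbreviation for a profile location, or None if no match."""
    cleaned = clean_location(raw)
    lowered = cleaned.lower()
    for name, abbr in STATE_NAMES.items():
        if re.search(rf"\b{name}\b", lowered):
            return abbr
    # Fall back to a trailing two-letter code in capitals, e.g. "Portland, OR".
    words = cleaned.split()
    if words and words[-1] in STATE_ABBRS:
        return words[-1]
    return None

# Made-up profile locations, including ones that should be discarded.
profiles = ["Portland, OR", "indiana!!", "Sioux Falls, South Dakota ★", "서울", "or maybe not"]
tally = Counter(s for s in map(to_state, profiles) if s is not None)
print(tally)
```

Requiring the two-letter code to be capitalized and at the end of the string is a deliberately conservative rule: it avoids counting ordinary words such as “in” or “or” as Indiana or Oregon, at the cost of missing some genuine locations.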
The divided figures will be multiplied by 1,000 and rounded up to yield whole numbers so that the outcome can be visualized without difficulty in spatial analytic tools. After the figures have been “weighted” by state population (i.e., controlled for population), I will use both regression analysis and spatial analysis to contrast the Twitter follower and Pizza Hut figures against each state’s political identification. The correlation between a state’s follower figure (weighted by population) and that state’s partisan identity, based on voting records from the 2012 Presidential Election, will be shown via a linear regression (President Map, 2012). The same methodology of culling, cleaning and categorizing will be applied to the data sets of Twitter followers of the liberal and conservative political show hosts Colbert and O’Reilly, respectively.

5.2 Spatial Analysis

Using spatial analysis to show the correlation between the number of followers, the number of Pizza Huts, and political identification may help to explore and present their relationship effectively. Should spatial analysis prove insufficient to fully illustrate the association between politics and obesity, additional research methods such as regression analysis will be employed. Using the data set that combines all three data frames (Twitter followers, Pizza Hut stores, and the Presidential Election map) and matching it with a state-level shape file, I can color-code the values of the numeric variables and compare them visually. The process will involve exporting the complete data frame from R, importing it into QGIS, a spatial analysis software package, matching it with the states, and visualizing how the states compare with one another in terms of the number of Twitter followers, Pizza Hut stores, and political landscape.
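The export step can be pictured as writing the combined state table to a delimited text file keyed by state abbreviation, which QGIS can then join to the attribute table of a state-level shape file. The Python sketch below shows one way to produce such a file; it is an illustration rather than the export actually used (which was done from R). The helper name to_csv and the column names are assumptions, and the join presumes that the shape file carries a matching abbreviation field. The follower counts are those listed in Figures 2A and 2B.

```python
import csv
import io

def to_csv(rows, columns):
    """Serialize {abbr: {column: value}} as CSV text with an 'abbr' key column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["abbr", *columns])
    for abbr in sorted(rows):
        writer.writerow([abbr, *(rows[abbr][c] for c in columns)])
    return buffer.getvalue()

# Follower counts from Figures 2A and 2B for two states.
# Dummy for the 2012 presidential result: 1 = Republican, 0 = Democratic.
table = {
    "OR": {"colbert": 299, "oreilly": 474, "republican": 0},
    "IN": {"colbert": 305, "oreilly": 450, "republican": 1},
}
csv_text = to_csv(table, ["colbert", "oreilly", "republican"])
print(csv_text)
```

Keeping one row per state and one shared key column is what makes the later match with the shape file a simple one-to-one join.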
5.3 Traditional Analysis

Traditional analysis in this study refers to linear and logistic regression analyses of a data set in a more traditional format and scope, or simply non-Twitter-generated data. I will use a data set on the number of Pizza Hut stores in each state, controlling for state population, and regress it against the partisan identity of each respective state. Because Twitter data has many limitations, as discussed before, it is important to have a reference point for assessing the efficacy of Twitter data as a prudent complement to traditional research methods. It can also help to assess whether using Twitter requires further study and experimentation before being applied to complement traditional research. It is not certain whether this traditional method will be able to effectively capture the correlation between politics and obesity, as shown by Shin and McCarthy in their research using standard research methodology.

6. Results and Analysis

6.1 Spatial Analysis

First, I will investigate the relationship between politics and obesity using the proxies of Twitter following, Pizza Hut store information, and the 2012 election outcome. This analytic component comprises visualizations of the above statistics in a manner that could reveal the trends that the author sets out to capture using Twitter data and traditional data on Pizza Huts, as well as randomly generated samples of Twitter followers of Stephen Colbert (@StephenAtHome) and Bill O’Reilly (@oreillyfactor), which are the Twitter-verified, authenticated accounts belonging to these celebrities. The supplementary analysis using political pundits as proxies will be visualized in the same manner. The visualizations take the form of state-level shape files with a gradient range of colors from blue to red, with blue representing Democratic states and red representing Republican ones.
With regard to the Colbert and O’Reilly data sets, states with higher numbers of Colbert followers will be represented in blue, while states with higher numbers of O’Reilly followers will be shown in red.

Using a combined data set of the above-mentioned variables, matched with a state-level shape file of the U.S., the values of the numeric variables can be represented using different colors for comparison. The process involves importing the data into QGIS, a spatial analysis software package, and visualizing the values of the variables to compare the states in terms of Twitter followers, Pizza Hut stores, and political landscape, as well as the Twitter following of the conservative and liberal political commentators and TV show hosts O’Reilly and Colbert, respectively.

First and foremost, Map 1 features a U.S. map with the election outcome from 2012, which serves as a reference point in determining a state’s partisan identification. Red represents Republican states and blue represents states that voted for the Democratic Party. The West Coast states are largely Democratic, while the Midwest and many Southern states are mainly Republican.

Map 1: Presidential Election of 2012 Voting Outcome. Data Source: The New York Times

In Map 2, the number of Pizza Hut locations per state is divided into five quintiles, and the unit is the number of stores. States in the South, including Georgia, the Carolinas, and Alabama, are on the higher side and therefore red. But this Pizza Hut figure has not been weighted by state population, so states with large populations, like California, are shown as having more than 133 stores and therefore appear red. This can pose a problem, as differences in state population size can obscure any correlation between having more Pizza Huts and voting more Republican.
California is shown here as being very red, but, as will be shown in other visualizations, California proves to be an exception in terms of having characteristics that are expected to be associated with Republican states.

Map 2: The number of Pizza Huts in each state is shown using variations of color from blue to red, with different shades representing first through fifth quintiles. The specific number of stores is shown in the legend.

In Map 3, the number of Twitter followers is also divided into five quintiles. California and Oregon are well represented in terms of Twitter-using population; Oregon is especially well represented, along with Indiana. Other highly represented states (in the random sample) are North Dakota, Nebraska, Iowa, Alabama and New York. This indicates that, at least in this sample, there does not seem to be a correlation between having more Twitter followers of Pizza Hut and the state having voted either Democratic or Republican.

Map 3: The number of Twitter followers of Pizza Hut by state, represented with varying gradients from blue to red representing first through fifth quintiles of the number of followers by state.

In Map 4, the per capita number of Pizza Huts per state is represented. This map, too, divides the figures into five quintiles. The weighted figures were derived by dividing the number of Pizza Huts in each state by that state’s population. For ease of visual representation, the figures were multiplied by 1,000 and rounded up to yield whole numbers.

Map 4: The number of Pizza Huts per capita in each state is shown using variations of color from blue to red, with different shades representing first through fifth quintiles.
The figures in the legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.

In Map 5, population “weighted”, or per capita, Twitter follower figures are represented in five quintiles. The numbers were derived by dividing the number of Twitter followers of Pizza Hut by each respective state’s population. The outcome was subsequently multiplied by 1,000 and rounded up to yield whole numbers. Some states feature a high number of Twitter followers relative to population, and that pattern shares similarities with the partisan map (Figure 3). Texas, where Pizza Hut’s headquarters is based (Texas Wide Open for Business, 2014), is understandably red, which means many Twitter followers of Pizza Hut were from the Lone Star State. There are also discernible pink areas in the central part of the continental U.S., such as Nebraska, Kansas, Missouri, Arkansas and Louisiana, that feature a prominent number of Twitter followers of Pizza Hut relative to population. There were exceptions, too, such as Minnesota and Florida, although Florida’s voting record was more traditionally conservative in the elections before those won by the incumbent President Obama.

Map 5: The number of Twitter followers of Pizza Hut per capita in each state is shown using variations of color from blue to red, with different shades representing first through fifth quintiles. The figures in the legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.

In Maps 6 and 7, the natural logs of the per capita figures for Pizza Hut stores and Twitter followers of Pizza Hut are represented in quintiles. The reason for taking the log is to assess whether a more symmetric distribution of figures may yield information about any correlation between voting patterns and having high numbers of Pizza Hut stores and followers. Some similarities can be seen between Maps 6 and 7.
For instance, moving from the west, Idaho, Utah, New Mexico, Texas, Arkansas, Illinois and Ohio have high figures on both maps, with the remainder of the Midwest following a similar pattern. However, in Map 6, some Southern states like the Carolinas, Georgia and Alabama, and Midwestern states like the Dakotas, are seen as having fewer Pizza Huts than other states, controlling for population, even though they are traditionally Republican states. Using only Pizza Hut as a proxy for fast food consumption has limitations, including not being able to capture states where Pizza Hut may not be a popular brand of choice.

Map 6: The log of the number of Pizza Huts per capita in each state is shown using variations of color from blue to red, with different shades representing first through fifth quintiles. The figures in the legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.

In Map 7, the log of the number of Twitter followers of Pizza Hut per capita is represented in five quintiles, and the map shows similarity to the map of the log of Pizza Hut data (Map 6). This suggests that while there is some evidence of correlation with the political identification map, it is not consistent or widespread enough to reliably use Twitter data to estimate correlation with voting behavior. There is, however, an apparent correlation between having more Pizza Huts and having more Twitter followers of Pizza Hut, when controlling for population.

Map 7: The log of the number of Twitter followers of Pizza Hut per capita in each state is shown using variations of color from blue to red, with different shades representing first through fifth quintiles. The figures in the legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.
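The two operations behind Maps 2 through 7, sorting state figures into five quintile classes and taking the natural log of the per capita figures, can be sketched in Python as follows. This is an illustration, not the QGIS procedure itself: QGIS may place its class breaks slightly differently, the function names are hypothetical, the ten input values are made up, and returning None for non-positive values is an assumption (the text does not say how zero counts were treated before taking logs).

```python
import math
from bisect import bisect_left
from statistics import quantiles

def quintile_classes(values):
    """Assign each value a class from 1 (lowest fifth) to 5 (highest fifth)."""
    cuts = quantiles(values, n=5)          # four cut points
    return [bisect_left(cuts, v) + 1 for v in values]

def log_values(values):
    """Natural log of each value; non-positive values become None (an assumption)."""
    return [math.log(v) if v > 0 else None for v in values]

# Hypothetical per capita figures for ten states.
per_capita = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(quintile_classes(per_capita))

# For this toy data the classes come out the same after logging,
# because the log preserves the order of the values.
logged = [x for x in log_values(per_capita) if x is not None]
print(quintile_classes(logged))
```

Because quintile classes depend on the rank order of the values, the log transformation mainly changes the numbers shown in the legend and how states with a value of zero are handled, rather than which fifth a state with a positive value falls into.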
In conclusion, spatial analysis was helpful in showing some correlation between the number of Twitter followers and the number of Pizza Huts, controlling for population. There is observable correlation between voting behavior (i.e., Democratic or Republican in the 2012 presidential election) and having a high number of Twitter followers of Pizza Hut (Map 5), though there were some noticeable exceptions, including Florida. In summary, a couple of the visualizations that control for state population show some correlation with each state’s voting behavior in the 2012 presidential election, but the rest of the maps do not offer compelling reasons to accept them as credible evidence of correlation.

In Map 8, the number of Twitter followers of Colbert is divided into five quintiles, with higher numbers represented in blue and lower numbers in red. California, as is often the case, was among the states with the most followers, along with Oregon, North Dakota, Nebraska, Iowa, Illinois, Indiana, and Alabama. Red states, or states with the smallest followings, were Montana, Wyoming, South Dakota, Minnesota, Louisiana, Mississippi, etc. Many of those states voted for the GOP candidate in 2012, but other than that, the correlation is not clear.

Map 8: The number of Twitter followers of Stephen Colbert by state, represented with varying gradients from blue to red representing first through fifth quintiles of the number of followers by state.

In Map 9, the number of Twitter followers of Colbert, divided by the state population and subsequently multiplied by 1,000 to yield whole numbers, is divided into five quintiles, with higher numbers represented in blue and lower numbers in red. Texas, Utah, and Florida, among others, are shown to have high numbers of followers of Colbert. Similar to Map 10, the correlation is not apparent.
Map 9: The number of Twitter followers of Colbert per capita by state, represented with varying gradients from blue to red representing first through fifth quintiles of the number of followers by state.

In Map 10, the number of Twitter followers of O’Reilly is divided into five quintiles, with higher numbers represented in red and lower numbers in blue. Again, California and Oregon are shown in red, for they have high numbers of Twitter followers of O’Reilly, along with North Dakota, Nebraska, Iowa, Illinois, Indiana, and Alabama. Blue states, or states with fewer followers in the randomly generated sample, were Montana, Wyoming, New Mexico, South Dakota, Minnesota, Louisiana, West Virginia, etc. Though many of these states are traditionally Republican in their voting patterns, they are shown as having fewer O’Reilly followers. This goes against the expectation that a conservative pundit will likely have more followers in traditionally conservative states, as people tend to follow ideologically like-minded people on Twitter (Nyhan, 2014).

Map 10: The number of Twitter followers of Bill O’Reilly by state, represented with varying gradients from blue to red representing first through fifth quintiles of the number of followers by state.

In Map 11, the per capita number of Twitter followers of O’Reilly (the number of followers divided by the state population and multiplied by 1,000 to yield whole numbers) is divided into five quintiles, with higher numbers represented in red and lower numbers in blue. After controlling for population, California is not shown as belonging to the top quintile of O’Reilly followers. Some of the red states on this map, such as Utah, North Dakota, Texas and Indiana, are traditionally conservative, while other “red” states, like Illinois and Ohio, are not.
States that have a small number of O’Reilly followers and also voted for Obama in 2012 are Nevada, New Mexico, Colorado, Michigan, Delaware, Connecticut, Maryland, and Maine. That is eight of the 12 states that are tagged as blue. Other than that, there was no substantial correlation between O’Reilly followers’ geographic locations and traditionally conservative states.

Map 11: The number of Twitter followers of O’Reilly per capita by state, represented with varying gradients from blue to red representing first through fifth quintiles of the number of followers by state. The figures were multiplied by 1,000 to be represented as whole numbers.

In conclusion, using a seemingly more direct proxy of conservative and liberal icons did not result in a clear correlation between the political identification of a geographic location and Twitter following. Some correlation could be found in the visualizations of the O’Reilly Twitter data: when controlling for population, states that are traditionally Democratic belonged to the lowest quintile of O’Reilly’s Twitter follower data, while many traditionally red states, or states that vote Republican, were not as widely represented. The effect was minimal in the Colbert data. In other words, the figures associated with the conservative pundit were more consistent with expectations. Another aspect to note is that certain states, like California and Oregon, had unusually large numbers of Twitter followers in every iteration, from Pizza Hut to Colbert to O’Reilly; those two states always had follower figures far larger than other states with similar population sizes. This may be due to those states being more social media friendly, but that is conjecture for the time being.
6.2 Linear Regression

A sample of roughly 5,300 Twitter followers of Pizza Hut was culled from some 10 million followers of the fast food pizza franchise. The collected data was then separated by U.S. state or district, depending on the geographic location each user specified on their public account profile. This data was subsequently regressed with other variables, such as the number of Pizza Huts in each state and the political identification of each state, and the results were then visually represented using spatial analytic tools.

First, I ran a linear regression of the number of followers (the number of Twitter followers of Pizza Hut in the sample) on the number of Pizza Huts in each state, as illustrated in Table 4 in the Appendix. This exercise was aimed at showing that there is a positive relationship between having more Pizza Huts and having more Twitter followers of Pizza Hut by state. Each additional Pizza Hut outlet is associated with a 0.118-point increase in the number of followers. The relationship is positive and consistent with the expectation that there are likely more followers of Pizza Hut in states where there are more Pizza Hut outlets, but the finding is not statistically significant.

After observing the positive relationship between the number of Twitter followers and the number of Pizza Hut stores, I then try to capture their correlation with each state’s political identification, as demonstrated in Tables 5 and 6 in the Appendix.

In Table 5, a positive correlation can be observed between the number of Pizza Huts and the dependent variable, the number of followers. In other words, a unit increase in the number of Pizza Huts is associated with a 0.11-point increase in “follower.” A positive correlation can also be observed between the variable Republican tendency and the dependent variable.
When a state is “red,” or a Republican state, it is associated with a 6.4-point increase in the number of followers. This means that states that voted for the Republican Party in the 2012 presidential election are likely to have about 6.36 more followers than “blue” states, which voted for the Democratic Party. Again, the positive signs of the regression coefficients support the idea that the variables are positively correlated.

In Table 6, the dependent variable is the number of Pizza Huts. The positive correlation between the number of followers and the number of Pizza Huts remains, but the sign is reversed for the variable Republican tendency. Being a red state is associated with a 0.27 decrease in the number of Pizza Huts. This reversal is a bit problematic in that it goes against the hypothesis of this study, namely that there is a positive correlation between lifestyles associated with obesity and a voting preference leaning toward the Republican Party.

As mentioned before, one of the challenges of comparing one state with another is the difference in population. To account for this difference, I divided the number of Twitter followers and Pizza Hut stores by each respective state’s population and regressed the results with the variable Republican tendency. Because the state population is far larger than the Twitter follower or Pizza Hut store figures, I used a thousandth of the population figures, in addition to multiplying the outcome by a thousand, in order to yield coefficients that are more easily interpretable, that is, in whole numbers. These “weighted” figures, or figures controlling for population, also paint a more accurate picture of how well represented each state is when it comes to Twitter followers and Pizza Huts.
The resulting population “weighted” variables are per capita number of followers, the number of Twitter followers divided by state population, and per capita number of Pizza Huts, the number of Pizza Huts in each state divided by that state’s population. A linear regression of the dependent variable per capita number of Twitter followers on the independent variables per capita number of Pizza Huts and Republican tendency yields results (Table 7) similar to the findings in Table 6: the positive correlation between the number of followers and the number of Pizza Huts is maintained, while the opposite is true for the variable Republican tendency. One notable change, however, is in statistical significance: the p-value associated with the coefficient for per capita number of Pizza Huts has improved dramatically and is significant at the p < 0.0001 level.

Table 7: Linear regression of the per capita number of Twitter followers of Pizza Hut (DV) on the per capita number of Pizza Hut outlets per state (IV) and partisan identification.

Coefficients:                   Estimate  Std. Error  t-value   Pr(>|t|)
(Intercept)                     463.7313    511.1573    0.907   0.369
Per capita Pizza Huts             0.6524      0.1246    5.234   3.61e-06 ***
Republican Tendency             -23.2977    626.4094   -0.037   0.970

Notes: These are linear regression results (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The adjusted R-squared is 0.351. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

This means that, if there were no true relationship, there would be less than a 0.1 percent chance of obtaining a t-value of 5.234 or larger.
The regression coefficient associated with per capita number of Pizza Huts shows that a one-unit increase in per capita number of Pizza Huts is associated with a 0.65 increase in per capita number of followers, which means that where there are more Pizza Huts per capita, there are also more Twitter followers of Pizza Hut in that state. Going from a Democratic state to a Republican state makes per capita number of Twitter followers go down by 23.30 points, which is not supportive of the hypothesis. This coefficient, however, is not statistically significant. The same pattern was observed in the linear regression of per capita number of Pizza Huts on per capita number of Twitter followers and Republican tendency (Table 8).

Table 8: Linear regression results of the per capita number of Pizza Huts (DV) and the per capita number of Twitter followers of Pizza Hut (IV) and partisan identification.

Coefficients:                   Estimate  Std. Error  t-value   Pr(>|t|)
(Intercept)                    1243.9241    441.2010    2.819   0.00697 **
Per capita No. of followers       0.5570      0.1064    5.234   3.61e-06 ***
Republican Tendency            -726.7353    569.2182   -1.277   0.20784

Notes: These are linear regression results of the per capita number of Pizza Huts (DV) and the per capita number of Twitter followers of Pizza Hut per state (IV) and partisan identification (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The adjusted R-squared is 0.3723. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

With regard to the Pizza Hut data, the linear regression method was inadequate in capturing a correlation between Twitter following, the number of Pizza Hut stores, and voting behavior. This is likely due to the limitations of Twitter data and its inability to capture meaningful trends found in real-life settings.
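As a quick consistency check on Tables 7 and 8, each reported t-value should equal the coefficient estimate divided by its standard error. The short Python sketch below recomputes the ratios from the figures printed in the two tables; the dictionary labels and the function name t_value are illustrative, and small differences in the third decimal place reflect rounding in the reported estimates.

```python
# Estimates, standard errors and t-values as reported in Tables 7 and 8.
reported = {
    "Table 7, per capita Pizza Huts": (0.6524, 0.1246, 5.234),
    "Table 7, Republican tendency": (-23.2977, 626.4094, -0.037),
    "Table 8, per capita followers": (0.5570, 0.1064, 5.234),
    "Table 8, Republican tendency": (-726.7353, 569.2182, -1.277),
}

def t_value(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error

for label, (estimate, std_error, reported_t) in reported.items():
    t = t_value(estimate, std_error)
    print(f"{label}: computed t = {t:.3f}, reported t = {reported_t}")
```

The check also makes the interpretation concrete: the Republican tendency estimates are large in absolute terms, but their standard errors are as large or larger, which is why neither is statistically significant.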
With regard to the supplementary analysis using media pundits as proxies, there was an observable correlation between political identification (represented by the variable Republican tendency in both the Colbert and the O’Reilly data) and the number of Twitter followers in the O’Reilly data, but not in the Colbert data. As seen in Table 9, a one-unit increase in the number of followers was associated with a positive 0.06-point increase in the binary outcome (in the direction from Democrat to Republican). The regression coefficient is significant at the p < 0.5 level. The regression does not reveal much information about the correlative relationship between the two.

In Table 10, a linear regression of the variable Republican tendency on the number of followers, per capita number of followers and state population shows, again, no significant relationship between the said variables. The relationship is positive, though, which means that increases in the number of followers (before and after controlling for population) are positively correlated with Republican tendency, or with being a Republican state. This is the opposite of the expected relationship between a liberal political commentator and their followers.

Table 10: Linear regression of partisan identification (DV) and the number of Twitter followers of Colbert, the per capita number of Twitter followers of Colbert, and the size of the population.

Coefficients:                   Estimate  Std. Error  t-value   Pr(>|t|)
(Intercept)                     1.016977    0.223765    4.545   3.85e-05 ***
No. of Twitter Followers        0.008731    0.005936    1.471   0.148
Per capita No. of Followers     0.002747    0.005146    0.534   0.596
Population Per state            0.006856    0.004842    1.416   0.163

Notes: These are linear regression results of partisan identification (DV) and the number of Twitter followers of Stephen Colbert per state (IV), the per capita number of Twitter followers of Colbert, and state population. The asterisks indicate statistical significance.
The adjusted R-squared is 0.04279. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In Table 11, a linear regression of the number of followers on per capita number of followers and Republican tendency shows a statistically significant relationship between the number of followers and per capita number of followers. This indicates that the number of Twitter followers reflects the population of the state, in that the increases are proportional. This is supportive evidence that Twitter-harvested data on the list of followers may be representative of the geographic distribution of the population they purport to represent.

Table 11: Linear regression of the number of Twitter followers of Stephen Colbert (DV) and the per capita number of Twitter followers of Colbert per state (IV), with covariates partisan identification both as factor and numeric variables.

Coefficients:                   Estimate  Std. Error  t-value   Pr(>|t|)
(Intercept)                      14.9242      3.5984    4.147   0.000136 ***
Per capita No. of Followers         0.23      0.1143    2.012   0.049806 *
Republican Tendency (factor)      5.4134      3.3139    1.634   0.108890
Republican Tendency (numeric)         NA          NA       NA   NA

Notes: Linear regression of the number of Twitter followers of Stephen Colbert (DV) and the per capita number of Twitter followers of Colbert per state (IV), with covariates partisan identification both as factor and numeric variables (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The adjusted R-squared is 0.09808. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

A different outcome was shown in the regression analysis using the O’Reilly data. As seen in Table 12, a one-unit increase in the number of followers was associated with a positive 0.06-point increase in the binary outcome (in the direction from Democrat to Republican).
The regression coefficient is significant at the p < 0.05 level (p = 0.022), which means that there would be only about a 2 percent chance of observing a z-value at least as extreme as 2.29 if there were no true relationship. The regression thus reveals a statistically significant relationship between the two variables and, further, meets the expectation that states with a greater number of Twitter followers of conservative darling O’Reilly will be more inclined to be Republican, that is, to have voted for the GOP in the 2012 Presidential Election. This is in contrast to what was seen in the Colbert data. The reason may be the size of the samples, or the fact that Colbert is considered a comedian and an entertainer, whereas O’Reilly hosts a political commentary program.

Table 12: Generalized Linear Model of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly, and state population.

Coefficients:                  Estimate   Std. Error   z-value   Pr(>|z|)
(Intercept)                    -3.1502    1.16605      -2.702    0.0069 **
No. of Twitter followers       0.05608    0.02449      2.290     0.0220 *
Per capita no. of followers    0.02243    0.02301      0.975     0.3295
State population               0.03929    0.02325      1.69      0.0910 .

Notes: Generalized Linear Model of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly, and state population. The asterisks indicate statistical significance. The associated AIC is 67.431. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.
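As a rough arithmetic check on the Table 12 figures, the sketch below recomputes the z-value and two-sided p-value from the reported estimate and standard error, and converts the coefficient to an odds ratio. It assumes that the Generalized Linear Model is a logistic (binomial, logit-link) regression and that the reported p-value comes from a standard normal (Wald) test; neither is stated explicitly in the table, and the function name `wald_summary` is purely illustrative.

```python
import math


def wald_summary(estimate, std_error):
    """Return (z, two-sided p, odds ratio) for one logit coefficient.

    Assumes a Wald test against the standard normal distribution
    (an assumption; the thesis does not name the test used).
    """
    z = estimate / std_error
    # Two-sided tail probability of the standard normal: P(|Z| >= |z|)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Exponentiating a logit coefficient gives the multiplicative change in odds
    odds_ratio = math.exp(estimate)
    return z, p_two_sided, odds_ratio


# Values reported in Table 12 for "No. of Twitter followers"
z, p, odds = wald_summary(0.05608, 0.02449)
print(f"z = {z:.3f}, p = {p:.4f}, odds ratio = {odds:.3f}")
```

Under these assumptions, each additional follower in the sample multiplies the odds of a state being Republican-voting by roughly 1.06, and the p-value of about 0.022 falls below 0.05 but not below 0.01.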
In Table 13, a linear regression of Republican tendency on the number of followers, the per capita number of followers, and state population shows a statistically significant relationship between the number of followers and Republican tendency. This shows that a one-unit increase in the number of followers is positively correlated with Republican tendency, which is coded as “0” for a Democratic state and “1” for a Republican state. The regression coefficient of 0.012 indicates that a state with a greater number of Twitter followers of O’Reilly in the random sample is more likely to have voted for the GOP in the 2012 Presidential Election, controlling for population. Again, this is a statistically significant finding, at the p < 0.05 level (p = 0.018).

Table 13: Linear Regression of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with the per capita number of Twitter followers of O’Reilly, and state population.

Coefficients:                  Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)                    0.857048   0.201669     4.250     0.000101 ***
No. of Twitter followers       0.011877   0.004853     2.447     0.018190 *
Per capita no. of followers    0.004698   0.004839     0.971     0.336607
State population               0.007704   0.004447     1.733     0.089717 .

Notes: Linear regression of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly, and state population. The asterisks indicate statistical significance. The associated adjusted R-squared is 0.1482. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In Table 14, a linear regression of the number of followers on the per capita number of followers and Republican tendency shows a statistically significant relationship between the number of followers and Republican tendency as a factor variable.
If a state voted Republican in the most recent presidential election, as opposed to voting Democratic, that state has on average about 8.85 more followers of O’Reilly in the randomly generated sample. The regression coefficient is statistically significant at the p < 0.05 level (p = 0.023). There is a more moderate, only marginally significant correlation (p = 0.057) between the population-controlled Twitter follower data and the raw figures for Twitter followers of O’Reilly: a one-unit increase in the per capita follower figure is associated with an increase in the raw follower figure. This can be interpreted to mean that the Twitter-generated sample is somewhat proportional to, and therefore representative of, the total population of the states.

Table 14: Linear regression of the number of Twitter followers of O’Reilly (DV) and the per capita number of Twitter followers and partisan identification, both as factor and numeric.

Coefficients:                    Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)                      13.9541    3.8963       3.581     0.000795 ***
Per capita no. of followers      0.257      0.1318       1.949     0.057133 .
Republican tendency (factor)     8.8544     3.7752       2.345     0.023186 *
Republican tendency (numeric)    NA         NA           NA        NA

Notes: Linear regression of the number of Twitter followers of O’Reilly (DV) and the per capita number of Twitter followers and partisan identification, both as factor and numeric (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The associated adjusted R-squared is 0.1661. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In conclusion, using well-known political commentators as proxies was somewhat illuminating when it comes to the efficacy of Twitter in generating a representative sample of the U.S. population. With regard to investigating the relationship between political identification of the states (i.e.
voting behavior data from the 2012 presidential election) and the size of the Twitter following, there was an observable, positive correlation in the O’Reilly data, whereas there was no statistically significant correlation in the Colbert data. The author is uncertain as to the exact reason for this, as there are many factors that might lead to such an outcome: differences in the size of the randomly generated samples, the nature of the TV shows hosted by each political pundit (i.e. one is a satirical comedy show with political content while the other is a political commentary show on Fox News), etc.

7. Limitation

Murmurs in the tech industry that Twitter is on its way out have been around for over a year now (LaFrance and Meyer, 2014). Media pundits say that an Internet publishing platform that carried us into the mobile age is receding, and that what was once a sounding board for people to organize ideas around different opinions has become a place where narratives have taken a turn for the predictable in a cruel, petty and facetious way (LaFrance and Meyer, 2014). This waning influence of Twitter, though its impact on publishing will likely remain, is just one of many limitations of using Twitter-generated data.

Furthermore, there are other, more obvious limitations to using Twitter-generated data, as it has not been proven to be highly representative of the actual population it purports to reflect. It is with that limitation in mind that this study has been conducted: its aim is to determine to what degree utilizing Twitter data, or data from other types of social network services, can be helpful in research.

The spatial analysis component was included in this study as part of an effort to capture a correlation between partisan identification and dietary preference that was more or less elusive in the regression analyses.
However, the spatial component still relied on the same figures that were used in the regression analyses, and neither the linear nor the logistic regression yielded meaningful information about how correlated the above-mentioned variables were. This study is aware of the many limitations that each step of the analysis and the dataset had, but also that many of them were due to the novel characteristics and size of the data.

8. Conclusion

Research efforts to unpack and ultimately shift the determinants of health outcomes are often hampered by deficits in existing data sources. For instance, national health surveys do not include information on individuals’ party identification or other political beliefs. Similarly, national sources of political data offer paltry measures of health behaviors or status (Gollust, 2013). Thus, creative data linking and collection is critical, and this study is just one example of such an attempt.

More specifically, this thesis set out to investigate how social media measures compare with more standard ones in the context of examining the association between politics and obesity. This study employed two main methods, regression and spatial analyses, to see whether a correlation between politics and obesity can be replicated. Spatial analysis was marginally more helpful in replicating the correlation. One map, of the per capita number of Twitter followers of Pizza Hut per state (Map 5), showed more similarity with the presidential election map (Map 1). Another map, showing the per capita number of Pizza Huts per state (Map 4), also shared similarities with the 2012 election map. Those maps, however, had enough exceptions to the pattern that they are not above scrutiny. Twitter data analysis, at least when using data on geographic locations of followers, is a research method that requires more careful calibration and scrutiny for its results to complement a traditional research method.
This paper found no discernible relationship at the state level between the number of Pizza Hut followers per capita and whether the state is Republican-voting, despite the many patterns that suggested otherwise. The relationships among the many variables that serve as proxies for partisan identification and measures of obesity are sufficiently weak, the measures are sufficiently coarse, and the units of analysis are large enough that a convincing relationship between the number of Pizza Hut followers per capita and whether the state is Republican-voting is not found. That said, this paper did establish that the number of Twitter followers of Pizza Hut serves as a good predictor of how many Pizza Hut outlets there are in each state, a promising development and one that could be extended to other domains where social media data can help to capture hard-to-measure reality.

References

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment Analysis of Twitter Data. Proceedings of the Workshop on Languages in Social Media, 11, 30-38. Retrieved March 11, 2015, from http://dl.acm.org/citation.cfm?id=2021114
Anderson, A., Goel, S., Huber, G., Malhotra, N., and Watts, D. (2014). Political Ideology and Racial Preferences in Online Dating. Sociological Science 1 (February 18, 2014), 28-40. Retrieved on April 26, 2015, from http://tinyurl.com/o92nkyv
Barry, C.L., Brescoll, V.L, Brownell, K.D., and Schlesinger, M. (2009). Obesity Metaphors: How Beliefs About the Causes of Obesity Affect Support for Public Policy. Milbank Quarterly, 87 (2009), 7-47.
Bulmer, J. (2013). Fast Food Map. Retrieved March 29, 2015 from https://github.com/hardwork-aunts/james-bulmer
Carney, D.R., Jost, J.T., Gosling, S.D., and Potter, J. (2008). The Secret Lives of Liberals and Conservatives: Personality Profiles, Interaction Styles, and the Things They Leave Behind.
Political Psychology, 29 (2008), 807-840.
Cosper, D. (2013, May 13). 10 Biggest Fast Food Chains in the U.S. (By Location). Retrieved May 30, 2015, from http://ezlocal.com/blog/post/10-largest-fast-food-chains-2013.aspx
Currie, J., DellaVigna, S., Moretti, E., and Pathania, V. (2009). The Effect of Fast Food Restaurants on Obesity. Retrieved on April 27, 2015, from http://eml.berkeley.edu/~sdellavi/wp/fastfoodJan09.pdf
Duffey, K. (2007). Differential Associations of Fast Food and Restaurant Food Consumption with 3-y Change in Body Mass Index: The Coronary Artery Risk Development in Young Adults Study. The American Journal of Clinical Nutrition, 85(1), 201-201. Retrieved March 9, 2015.
Fraser, L., Edwards, K., Cade, J., and Clarke, G. (2010). The Geography of Fast Food Outlets: A Review. The International Journal of Environmental Research and Public Health, 7(5), 2,290-2,308. Retrieved April 26, 2015, from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898050
Gollust, S. (2013). Obesity in Red and Blue: Understanding the Associations Between Politics and Obesity. Preventive Medicine, 57:5 (November 2013), 436-437. Retrieved on April 28, 2015, from http://www.sciencedirect.com/science/article/pii/S0091743513003071
Have, M., de Beauford, D., Teixeira, P.J., Mackenbach, J.P., van der Heide, A (2011). Ethics and Prevention of Overweight and Obesity: An Inventory. Obesity Reviews, 12(9), 1. Retrieved April 2, 2015.
Jeffrey, R., Baxter, J., McGuire, M., and Linde, J. (2006). International Journal of Behavioral Nutrition and Physical Activity, 3:2. Retrieved on April 27, 2015, from http://www.ijbnpa.org/content/3/1/2
Kersh, R. (2009). The Politics of Obesity: A Current Assessment and Look Ahead. The Milbank Quarterly, 87(1), 295-316. Retrieved March 8, 2015, from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2879181
Kinder, D.R. (2006). Politics and the Life Cycle. Science, 312 (2006), 1,905-1,908.
Krugman, P. (2015, March 5).
The Conscience of a Liberal. The New York Times. Retrieved March 8, 2015, from http://krugman.blogs.nytimes.com/2015/03/05/heavypolitics
Krugman, P. (2015, March 6). Pepperoni Turns Partisan. The New York Times. Retrieved March 8, 2015, from http://www.nytimes.com/2015/03/06/opinion/paul-krugmanpepperoni-turns-partisan.html
Kumar, S., Morstatter, F., and Liu, H. (2014). 1. Introduction. In Twitter Data Analytics. New York, New York: Springer.
LaFrance, A. and Meyer, R. (2014, April 30). A Eulogy for Twitter. The Atlantic. Retrieved on April 4, 2015, from http://www.theatlantic.com/technology/archive/2014/04/a-eulogyfor-twitter/361339
Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science 343 (March 14), 1,203-1,205. Retrieved on April 26, 2015, from http://gking.harvard.edu/publications/parable-google-flu%C2%A0traps-bigdata-analysis
Lindeke, B. (2012, November 6). Do Sidewalks Make You Vote Democratic? Streets.mn. Retrieved on April 28, 2015, from http://streets.mn/2012/11/06/do-sidewalks-make-youvote-democratic
Makice, K. (2009). Twitter API: Up and running. Sebastopol, California: O'Reilly.
Martin, A. (2015, March 3). Inside the Powerful Lobby Fighting for Your Right to Eat Pizza. Bloomberg Business. Retrieved March 8, 2015, from http://www.bloomberg.com/news/features/2015-03-03/junk-food-s-last-stand-the-pizzalobby-is-not-backing-down
Mitchell, A, and Hitlin, P. (2013, March 4). Twitter Reaction to Events Often at Odds with Overall Public Opinion. Pew Research. Retrieved on April 26, 2015, from http://www.pewresearch.org/2013/03/04/twitter-reaction-to-events-often-at-odds-withoverall-public-opinion
Nyhan, B. (2014, October 24). Americans Don’t Live in Information Cocoons. The New York Times. Retrieved on April 13, 2015, from http://www.nytimes.com/2014/10/25/upshot/americans-dont-live-in-informationcocoons.html
Obesity Prevalence Maps. (2014, September 9).
Retrieved March 8, 2015, from http://www.cdc.gov/obesity/data/prevalence-maps.html
OpenSecrets (2015, February 2) Restaurants & Drinking Establishments: Top Contributors, 2013-2014. Retrieved March 8, 2015, from http://www.opensecrets.org/industries/contrib.php?cycle=2014&ind=G2900
Pew Research. (2014, September 1). Social Networking Fact Sheet. Retrieved March 1, 2015, from http://www.pewinternet.org/fact-sheets/social-networking-fact-sheet
Pew Research. (2015, January 9). Social Media Update 2014. Retrieved March 9, 2015, from http://www.pewinternet.org/2015/01/09/social-media-update-2014
Pianin, E. (2014, June 19). Michelle Obama Battles Over School Lunch Nutrition Standards. The Fiscal Times. Retrieved March 9, 2015, from http://www.openculture.com/2014/02/kurt-vonnegut-masters-thesis-rejected-by-uchicago.html
Pianin, E., & Ehley, B. (2014, June 19). Budget Busting U.S. Obesity Costs Climb Past $300 Billion a Year. The Fiscal Times. Retrieved March 9, 2015, from http://www.thefiscaltimes.com/Articles/2014/06/19/Budget-Busting-US-Obesity-CostsClimb-Past-300-Billion-Year
President Map (2012, November 29). Election 2012. The New York Times. Retrieved March 30, 2015, from http://elections.nytimes.com/2012/results/president
Rhodes, D., Adler, M., Clemens, J., LaComb, R., & Moshfegh, A. (2014, February 1). Consumption of Pizza: What We Eat in America, NHANES 2007-2010. Retrieved March 9, 2015, from http://www.ars.usda.gov/SP2UserFiles/Place/80400530/pdf/DBrief/11_consumption_of_pizza_0710.pdf
Russell, M. (2011). 21 recipes for mining Twitter. Sebastopol, Calif.: O'Reilly Media.
Satcher, D. (2001, October 1). The Surgeon General's Call to Action to Prevent and Decrease Obesity. Retrieved March 9, 2015, from http://www.cdc.gov/nccdphp/dnpa/pdf/CalltoAction.pdf
Sherman, Erik. (2014, April 14). Many Twitter Users Don’t Tweet, Finds Report. CBS Money Watch.
Retrieved March 29, 2015, from http://www.cbsnews.com/news/many-twitterusers-dont-Tweet-finds-report
Shin, M., and McCarthy, W (2013). The Association Between County Political Inclination and Obesity: Results from the 2012 Presidential Election in the United States. Preventive Medicine, 57:5 (November 2013), 721-724. Retrieved on April 27, 2015, from http://www.sciencedirect.com/science/article/pii/S0091743513003046
Swanson, A. (2015, April 7). Map: The Most Liberal and Conservative Towns in Each State. The Washington Post. Retrieved April 9, 2015, from http://www.washingtonpost.com/blogs/wonkblog/wp/2015/04/07/map-the-most-liberaland-conservative-towns-in-each-state/
Texas Wide Open For Business (2014) Fortune 500 Companies in Texas: The Lone Star State Is Home to 52 Fortune 500 Corporate Headquarters. Retrieved on April 4, 2015, from https://texaswideopenforbusiness.com/sites/default/files/11/14/14/texasfortune500.pdf
Trinko, K. (2014, March 10). Soda Ban? What About Personal Choice? Column. USA Today. Retrieved April 2, 2015, from http://www.usatoday.com/story/opinion/2013/03/10/sodaban-what-about-personal-choice-column/1977091
Weller, K., Bruns, A., Burgess, J., Mahrt, M., & Puschmann, C. (Eds.). (2014). Twitter and Society. New York: Peter Lang Publishing.
West, J. (2014, December 18). Listen to the Real Stephen Colbert Explain How He Maintained His Flawless Character for 9 Years. Mother Jones. Retrieved on April 13, 2015, from http://www.motherjones.com/mixed-media/2014/12/stephen-colbert-character-podcastartist-farewell
Wieder, R. (2013, March 21). Selling Unhealthy Food as “Freedom.” CalorieLab. Retrieved April 2, 2015, from http://calorielab.com/news/2013/03/21/selling-unhealthy-food-asfreedom

Appendix

Table 1: Twitter followers of Pizza Hut (@pizzahut) by U.S. state. A randomly generated sample of 7,500 yielded some 5,535 Twitter users. States with 100 or more followers are colored red (see footnote 3), and the rest in blue.
          1    2    3    4    5    6    7    8    9   10   11   12
abbr     AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow   58  217   59   21  226  131   18    4  125   54   99  147

         13   14   15   16   17   18   19   20   21   22   23   24   25
abbr     IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow  209   67  158  303   38   15    8  133    5  103   96    7   85

         26   27   28   29   30   31   32   33   34   35   36   37   38
abbr     MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow   19    6   56  227  175   12   12    9   13   51   47   31  318

         39   40   41   42   43   44   45   46   47   48   49   50   51
abbr     PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow  107  187   41    0   13   72   67   68    0   70   50    4    2

Footnote 3: California and Oregon yielded unusually large numbers for most iterations. This may be because these states have larger young populations that are active on social media platforms.

Table 2: Summary statistics of combined data

Table 3: R commands (a selection) to cull randomly selected sample of Twitter followers

Table 4: Linear regression of the number of Twitter followers of Pizza Hut (DV) and the number of Pizza Huts per state (IV), with no covariates

Coefficients:        Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)          20.1528    3.6783       5.479     1.47e-06 ***
No. of Pizza Huts    0.1184     0.1486       0.797     0.429

Notes: These are linear regression results of the number of Twitter followers of Pizza Hut (DV) and the number of Pizza Hut outlets per state (IV), with no covariates. The asterisks indicate statistical significance. The associated adjusted R-squared is -0.007352. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

Table 5: Linear regression of the number of Twitter followers of Pizza Hut (DV) with independent variables the number of Pizza Huts and partisan identification

Coefficients:          Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)            17.2459    4.0087       4.302     8.27e-05 ***
No. of Pizza Huts      0.1143     0.1460       0.783     0.437
Republican tendency    6.3618     3.8027       1.673     0.101

Notes: These are linear regression results of the number of Twitter followers of Pizza Hut (DV) and the number of Pizza Hut outlets per state (IV) and partisan identification (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The associated adjusted R-squared is 0.02832. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

Table 6: Linear regression of the number of Pizza Huts (DV) and the number of Twitter followers of Pizza Hut per state (IV) and partisan identification

Coefficients:               Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)                 18.6860    3.7705       4.956     9.37e-06 ***
No. of Twitter followers    0.1103     0.1409       0.783     0.437
Republican tendency         -0.2677    3.8431       -0.070    0.945

Notes: These are linear regression results of the number of Pizza Huts (DV) and the number of Twitter followers of Pizza Hut per state (IV) and partisan identification (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The associated adjusted R-squared is -0.02823. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

Table 9: Generalized Linear Model of partisan identification (DV) and the number of Twitter followers of Stephen Colbert, the per capita number of Twitter followers and the size of the population per state.

Coefficients:                  Estimate    Std. Error   z-value   Pr(>|z|)
(Intercept)                    -0.930653   1.117349     -0.833    0.4049
No. of Twitter followers       0.062072    0.032648     1.901     0.0573 .
Per capita no. of followers    -0.398678   0.419067     -0.951    0.3414
Population by state            -0.002284   0.032286     -0.071    0.9436

Notes: These are Generalized Linear Model results of partisan identification (DV; Dummy; 1=Republican, 0=Democratic) and the number of Twitter followers of Stephen Colbert per state (IV), the per capita number of Twitter followers and the size of the population per state.
The asterisks indicate statistical significance. The associated AIC is 71.167. Significance codes: ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.
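The significance markers printed in the tables, including the "." marker, appear to match the default cut-offs used in R's regression summaries, although the thesis does not name the software convention. The sketch below encodes that assumed mapping so that any reported p-value can be checked against its marker; the function name `signif_marker` is hypothetical.

```python
def signif_marker(p_value):
    """Map a p-value to a marker using R's default cut-offs
    (0.001, 0.01, 0.05, 0.1); assumed here, not stated in the thesis."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


# p-values as reported in Tables 9, 12, and 13
reported = [
    ("Table 9, no. of followers", 0.0573),
    ("Table 12, intercept", 0.0069),
    ("Table 12, no. of followers", 0.0220),
    ("Table 13, intercept", 0.000101),
]
for label, p in reported:
    print(f"{label}: p = {p} -> '{signif_marker(p)}'")
```

Read this way, the Colbert coefficient in Table 9 (p = 0.0573) clears only the 0.1 threshold, while the O’Reilly coefficient in Table 12 (p = 0.0220) clears the 0.05 threshold.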