 The Efficacy of Social Media Measures:
Assessing Association Between
Fast Food Consumption and Political Preference
Using Twitter Data and Spatial Analysis
Ara Cho
Quantitative Methods in the Social Sciences (QMSS)
Graduate School of Arts and Sciences
Columbia University
Advisor: Gregory Eirich, Director of QMSS
May 2015
This paper was completed as part of the requirement for the QMSS program. I am indebted to Dr. Gregory M. Eirich for his invaluable advice and direction, as well as the many specific suggestions he offered that are now an integral part of this thesis. I am also grateful to Dr. Cho Sung-Woo for important pointers throughout the thesis class. All errors are my own.
CHO, Ara (UNI: ac3772)
1. Abstract
With the ever-growing availability of online data, social scientists are trying to
understand how it compares with more traditional measures of public opinion and behavior
(Anderson et al., 2014; Lazer et al., 2013). This thesis investigates how social media measures
compare with more standard ones in one particular context: the relationship of fast food
consumption and political preferences, at the state level. This paper finds that while (a) there are
relationships between where Pizza Huts are located and where fast food establishments are more
generally; that (b) geographic locations where there are more followers of Pizza Hut per capita
correspond with where Pizza Huts are located; that (c) fast food prevalence does predict obesity
prevalence; and that (d) obesity prevalence does predict Republican voting—there is no
discernible relationship at the state level, between the number of Pizza Hut followers per
capita and whether the state is Republican-voting. Ultimately, the relationships among these
variables are sufficiently weak, the measures are sufficiently coarse, and the units of analysis
(states) are large enough that a clear relationship between the number of Pizza Hut followers
per capita and whether the state is Republican-voting is not found. That said, this paper did
establish that the number of Twitter followers of Pizza Hut is a good proxy for how many Pizza
Huts are located in a state, which is a promising development and one that could be extended to
other domains where social media data can help to capture hard-to-measure reality.
2. Introduction
The following developments are all part of a bigger pattern in which obesity and nutrition have
become deeply partisan in recent years: states where high Body Mass Indices are more prevalent
are likelier to vote Republican, and this rightward lean is more pronounced in counties in the
South that make up what the Centers for Disease Control (CDC) calls the “diabetes belt”
(Krugman, 2015 March 6); the biggest pushback against First Lady Michelle Obama’s healthy
lunch initiative came from leadership in the above-mentioned “diabetes belt” area (Pianin, 2014);
and fast food corporations in recent years have been allotting an overwhelming majority of their
political contributions for the Republican Party (Martin, 2015).
Given the apparent correlation between fast food consumption and political preferences,
this study aims to investigate how well social media measures capture the said relationship
compared with more traditional ones.
To this end, I look at Twitter data of a randomly drawn sample of Twitter followers.
Approximately 5,300 Twitter followers were culled from more than 10 million followers of a
major pizza franchise Pizza Hut (Twitter ID: @pizzahut) to help determine whether states with
more Twitter followers of Pizza Hut are more likely to be Republican-voting. As a proxy for
voting behavior, this research utilizes the 2012 presidential election results to categorize the 50
U.S. states and one federal district as either a “red” or “blue” state or district. In order to assess
the efficacy of Twitter data, the results of the analyses—regression and spatial—are compared
against outcomes generated using a more traditional dataset, such as the number of Pizza Huts per
capita in each state, as well as against other studies that measured the relationship using more
standard methods.
Using the popular microblogging site and spatial analytic techniques, as well as looking
at other related studies, has allowed me to capture the association between politics and obesity as
seen in the following relationships: there are discernible relationships between where Pizza Huts
are located and where fast food establishments are more generally (Fraser et al., 2010); there are
more Twitter followers of Pizza Hut per capita where there are also more Pizza Hut outlets per
capita; fast food prevalence does predict obesity prevalence (Currie et al., 2009; Jeffrey et al.,
2006); and obesity prevalence does predict Republican voting (Shin and McCarthy, 2013). The
thesis does not find a discernible relationship, at the state level, between the number of Pizza
Hut followers per capita and whether the state is Republican-voting.
Finally, in addition to the fast food component of the analysis, I will employ additional
proxies (well-known liberal and conservative media pundits Stephen Colbert and Bill O’Reilly)
to assess whether randomly generated follower samples reflect partisan identification
geographically. This is a supplementary exercise to test the efficacy of Twitter-generated data in
providing an adequate sample of the population it purports to represent.
3. Literature Review
3.1 Politicization of Obesity
The unrelenting rise in obesity rates across the United States has prompted the need for
more effective policy responses (Kersh, 2009). More than a third of all adults and 17 percent of
young people are obese, and many of them have been consigned to a whole host of related health
problems like diabetes, coronary heart disease, stroke, hypertension and even cancer (Pianin &
Ehley, 2014). Without a major intervention or a sea change in many Americans’ unhealthy eating
habits, the adult obesity rate could reach a whopping 50 percent in less than two decades (Pianin
& Ehley, 2014). The associated cost to society in terms of damaged lives, health care costs and
diminished productivity and economic growth is staggering: $305 billion (Pianin & Ehley,
2014).
The health issue, which has been billed as the “new national epidemic,” is an oft talked
about topic in public discourse, and it is discussed using mainly two “frames”— personal
responsibility and environmental—lending itself to very different voter reactions and policy
responses (Kersh, 2009; Satcher, 2001).
The variance in how people talk about the obesity issue prompts different voter reactions.
For instance, when it is discussed as a “personal responsibility” problem, the dialogue is often
shaped in such a way that the government’s interventions to address the issue are characterized
and perceived as an infringement upon personal freedom, more specifically freedom of making
lifestyle choices. The same can be said about personal freedom in raising one’s own children, at
least as it relates to policy choices by schools (Have et al., 2011). Since Republican voters tend
to prefer a small government, the appearance of a federal or state infringement is likely
unwelcome and likely one of the reasons behind the pushback against the federal initiative to
have schools serve healthier lunches. Indeed, discussion surrounding the freedom to eat whatever
food — however unhealthy — has been framed by conservatives as government overreach and
attack on personal freedom, a type of rhetoric typically attributed to the politically conservative
(Wieder, 2013; Trinko, 2013). Public opinion researchers have previously demonstrated that
Americans are indeed divided in their views about the role of government in obesity prevention
and for specific obesity prevention policies such as market regulation, junk food taxes, and
school food restrictions (Barry et al., 2009).
Among the different policy responses are calorie menu labeling and reducing unhealthy
foods at school (Satcher, 2001). At issue is the first lady’s latest campaign to get schools to serve
healthier lunches to schoolchildren, as part of a law passed in 2010 that sets nutrition
benchmarks for public schools. Many Grand Old Party lawmakers are pursuing a measure that
would exempt many schools from the new regulation of requiring school lunches to have more
fresh fruit, whole grains, and vegetables. The reason they gave for their action was that schools
are simply wasting money trying to impose the new standard, as many students are resisting the
new healthier fare in favor of more popular junk foods like pizza (Pianin, 2014).
Pizza’s dominance in food supply to school cafeterias was especially made vulnerable by
the new federal nutrition standards, but the toll it has taken on pizza makers does not stop there:
the labeling laws require that pizzerias post calories for a whole pie, rather than a single slice,
which pizza makers say would drive away consumers due to what they call a “sticker shock.”
Complicating the labeling issue is the sheer variety and options involved in topping choices, they
say, and that it would simply be “impossible” for pizza makers to post accurate calorie
information for all possible combinations of pies (Martin, 2015).
These conflicts have led to the current partisan landscape, in which a rare coalition of competing
pizza purveyors banded together to form a lobby to advocate for the special dish and allot up to a
remarkable 99 percent of their political contributions to the GOP (Krugman, 2015).
Further delineating the party divide on the issue of obesity are studies that help show an
association between county-level political inclination and obesity. According to a study that
utilized a more traditional methodology of a telephone survey asking for self-reported weight
and height information and the county-level voter preference in the 2012 presidential election,
there was a modest but positive association between county-level support for the 2012
Republican presidential candidate and county-level obesity prevalence (Shin and McCarthy,
2013).
3.2 Pizza as Proxy for Fast Food
I choose Pizza Hut as my proxy for fast food makers for a couple of reasons. First, as I
mentioned briefly before, pizza makers have proven unique in the so-called war of nutrition. In
the pushback against government intervention, pizza makers have taken the most distinctive stance
among major fast food chains in that they formed a specialized lobby rather than backing down
and complying with federally mandated nutrition standards. While other purveyors of fast food like
McDonald’s and Wendy’s have been nudged along by new laws to voluntarily remove soda from
their children’s menus and honor new labeling regulations, advocates for pizza have taken a more
confrontational stance and formed their own lobbying force. U.S. pizza companies made political
contributions totaling $1.5 million in the 2012 and 2014 elections, of which some 88 percent
went to Republican candidates and groups. For instance, industry leader Pizza Hut allotted a
whopping 99 percent to the Republicans, Papa John’s assigned some 87 percent, Domino’s gave
some 80 percent, and Little Caesars spent 73 percent on supporting campaigns of GOP
candidates and lawmakers (Martin, 2015).
In addition, among the major pizza makers, Pizza Hut has the largest number of stores
nationwide, nearly twice as many as the stores of Domino’s, which is the second most prevalent
pizza chain in the U.S. (Cosper, 2013).
The second reason is quite simple: pizza is a widely preferred dish among Americans
across the nation. Pizza therefore serves as a sound proxy for studying people’s preference for fast
food, a leading cause of obesity (Duffey, 2007). Some 41 million
Americans—more than the population of California—eat a slice of pizza on any given day, and
more than one in four young males consumed pizza on a given day (Rhodes, et al., 2014). As
mentioned before, the voluntary expression of enthusiasm for pizza via Twitter—whether by
Tweeting about it, following pizza franchises or re-Tweeting statuses about pizza—cannot
supplant measured health information like BMI, but it may still serve as a compelling indicator
of lifestyle choice when it comes to affinity for, and habitual consumption of, fast food.
3.3 Twitter: Limitations and Strengths
Data from Twitter, understandably, have limitations. Only 23 percent of online adults use
Twitter (Pew Research, 2014), although they still account for 19 percent of the entire adult population
(Pew Research, 2015). Twitter is particularly popular among those under the age of 50 and
among those who are college-educated. However, Twitter has seen increases in usership across a
variety of demographic groups; for instance, some 21 percent of white online adults used
Twitter, compared with 27 percent for black online adults, and 25 percent for Hispanic online
adults (Pew Research, 2015). With regard to education, some 16 percent of online adults with
a high school degree or less use Twitter, compared with 24 percent of online adults with some
college experience, and 30 percent for online adults with at least a college degree. In terms of
types of locality, some 25 percent of online adults who live in urban areas use Twitter, compared
with 23 percent for online adults in suburban areas and 17 percent for rural areas. A
comparable pattern extends to income levels as well; some 20 percent of online adults who
make less than $30,000 a year use Twitter, compared with 21 percent for those who make
between $30,000 and $49,999, and 27 percent each for those making between $50,000 and
$74,999 and those making $75,000 or more (Pew Research, 2015).
Many scholars who wrote about Twitter data are also forthright about the limitations of
Twitter data. Due to its tendency to be data-driven—rather than question-driven, per se—much
of the current quantitative research on the popular microblogging service is centered on
measuring specific structural parameters in large data samples, sometimes at the expense of
theoretical salience of said parameters (Weller et al., 2014). Further, there is no way of checking
how completely a given data set captures what transpired on Twitter at the time of data
collection. Without what Twitter calls “firehose access,” researchers rely on the service to
provide a “representative sample” of what is there (Weller et al., 2014). This, Weller says, is due
in part to the unique challenge of building an infrastructure powerful enough to store vast
quantities of information in real time. Further, it limits the reproducibility of research
results, a serious hindrance to advancing research goals (Weller et al., 2014).
Furthermore, according to a 2013 study, reaction on Twitter to major political events
and policy decisions often differs a great deal from public opinion as measured by surveys. The
yearlong Pew Research Center study compared the results of national polls to the tone of Tweets
in response to eight major news events, including the outcome of the presidential election, the first
presidential debate and major speeches by the president (Mitchell and Hitlin, 2013).
That said, Twitter data is not without its strengths. Twitter is a cost-effective research tool
in that collection of data is relatively costless compared with traditional data collection methods
such as surveys. One method of inquiry that might most benefit from Twitter-generated data may
be text mining and text analysis. In addition to culling information about followers, followees or
friends, the Twitter API (Application Programming Interface) allows us to collect statuses as well,
providing a rich body of text information with which to either conduct a sentiment analysis or
simple text visualization exercises, such as word clouds (Weller et al., 2014).
With those aforementioned limitations and strengths in mind, it is still worthwhile to
explore Twitter data in a variety of novel ways—by examining what type of information can be
gleaned from looking at someone’s, or some corporate entity’s, follower list and their followers’
geographic locations.
3.4 Association Between Politics and Obesity
Researchers Michael Shin and William McCarthy suggest in their study that residents in
counties that vote Republican are electing legislators who embrace a political philosophy that
makes them less likely to promote environmental and policy strategies for obesity prevention
(2013).
However, a critique of the study states that the authors would benefit from
considering a possible bias in the measurement of local political inclination, due to the fact that
propensity to turn out to vote in the 2012 election may be associated with individual-level health
characteristics. Furthermore, Shin and McCarthy’s study fails to establish causal ordering
between political preferences and dietary preferences (Gollust, 2013).
Some observers have raised different hypotheses to attempt to explain this association.
For instance, one hypothesis is that neighborhoods in Democratic-tilting counties are
more likely to have sidewalks, facilitating physical activity (Lindeke, 2012), or that partisan
preferences and food preferences are related in that partisanship is “passed on” among families in
the same way food and exercise preferences are shaped by familial context (Kinder, 2006).
Another example would be whether there are identified personality traits and psychological
differences between liberals and conservatives, such that liberals are more likely to seek novelty
and change while conservatives tend to be happier with the status quo (Carney et al., 2008).
Despite the vibrant discussion aimed at understanding the causal association between
politics and obesity, existing data on health behaviors and health status is paltry and limiting
(Gollust, 2013), prompting the need for creative data linking and collection.
On a related note, there is also a growing body of research investigating the effect of
geography of fast food outlets on obesity. Most studies have found a positive association
between proximity and density of fast food outlets and increasing deprivation. This may be due
to a business strategy of targeting more deprived areas where real estate costs are lower or due to
market research showing that demand in these areas is greater (Fraser et al., 2010). However,
some studies using traditional methods such as a telephone survey failed to show that proximity
to fast food restaurants was associated with obesity (Jeffrey et al., 2006).
3.5 Pundits as Proxy
A Pew Research Center report found that the media outlets people named as their main
sources of news about politics are strongly correlated with their political views. In other words,
the Pew study can be interpreted to mean that people are living in partisan and ideological echo
chambers. This correlation likely can be extrapolated to apply to the Twitter realm in that people
likely follow news outlets or pundits whose political ideology they agree with. Another
study, mentioned in a New York Times article, gave a more nuanced interpretation of
the Pew study, saying that partisans from both sides of the ideological aisle were most likely to
consume news from outlets that were estimated to be relatively centrist, such as network
morning shows and evening news broadcasts. This ideological echo chamber behavior was more
or less consistent when looking at an analysis of online news consumption as well (Nyhan,
2014).
Social media, too, appeared somewhat encouraging of this ideological echo chamber
behavior, a phenomenon where individuals are primarily exposed to like-minded views. People
tend to follow like-minded individuals on Twitter—about two-thirds of the people followed by
the median Twitter user in the United States share the user’s political leanings, a study showed
(Nyhan, 2014). Stephen Colbert is an American political satirist and host of a popular Comedy
Central show called The Colbert Report. Colbert assumes an eponymous character that embodies
the most outlandish traits associated with the right wing, attacking them in the process (West,
2014). On the opposite extreme is Bill O’Reilly, an American political commentator and host of the Fox
News Channel political commentary program The O’Reilly Factor. O’Reilly is one of the more
visible conservative political pundits, so he could serve as a proxy to capture right-wing Twitter
users. For this reason, using characteristically liberal or conservative media pundits as proxies on
Twitter may yield illuminating results about the partisan landscape of Twitter followers across
the continental U.S.
4. Data
4.1 Twitter Data
Twitter is a social-networking, microblogging site that allows its users to post messages
online in real-time, which are called Tweets. Tweets are restricted to 140 characters in length.
Users of Twitter are sometimes called Tweeps, and because of the limitation in length of the
message they can upload at a time, Tweeps use acronyms and emoticons in expressing their
thoughts, conveying feelings or relaying information. Tweeps use the “@” (called “at”) symbol
to indicate an account, and “#” (called a hashtag) to allow others to find statuses related to a
topic (Agarwal et al., 2011).
Data collection via Twitter for research purposes usually entails one of three Application
Programming Interfaces (APIs): the Streaming API, the REST API and the Search API (Weller et
al., 2014). The Streaming API is likely the most widely used data source for Twitter research,
and it works as a stream that continually provides data in real time (Weller et al., 2014).
More specifically, I will be drawing a random sample of around 7,500 Twitter users who
follow Pizza Hut, which has more than 10 million followers worldwide. Of those randomly
drawn accounts, only those with U.S. state locations specified (e.g. California, New York, etc.)
on their user profiles will be retained to form a data set with which regression analyses will be
conducted.
There are other ways of measuring Twitter user location, namely looking for Tweets that
are geocoded. Those Tweets, however, cannot at the same time account for whom they follow,
and because those Tweets can only be geocoded to predesignated locations, that sample would
be restrictive when it comes to investigating the relationship between geographic locations and
penchant for a particular fast food maker.
Instead, I will be filtering the randomly drawn sample of some 5,000 Twitter followers of
a particular fast food franchise using text mining techniques in order to retain accounts associated
with any one of the 50 U.S. states and one federal district. By subsequently weighting the outcome by the
states’ population, or controlling for population, I hope to be able to show, with some accuracy,
that Twitter-harvested data, or analysis done using that data, can replicate a correlation between
obesity-related lifestyle choice (as operationalized by expression of enthusiasm via Twitter-following)
and voting behavior (as operationalized by voting results from the 2012 Presidential
Election).
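The location-matching step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in this thesis, which relied on a statistical software package; the state dictionary is a deliberately small subset, and the function names and sample profile strings are hypothetical. The sketch assumes that a profile counts toward a state only when the full state name appears in the free-text location field.

```python
import re
from collections import Counter

# Illustrative subset only; a full run would list all 50 states plus DC.
STATE_ABBR = {
    "California": "CA",
    "New York": "NY",
    "Oregon": "OR",
    "Vermont": "VT",
    "District of Columbia": "DC",
}

# One case-insensitive pattern; longer names are tried first.
_NAMES = sorted(STATE_ABBR, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {name.lower(): abbr for name, abbr in STATE_ABBR.items()}


def match_state(location):
    """Return the abbreviation of the first state name found, or None."""
    if not location:
        return None
    found = _PATTERN.search(location)
    return _LOOKUP[found.group(1).lower()] if found else None


def count_followers_by_state(locations):
    """Tally matched profiles per state, keeping zero counts for unmatched states."""
    counts = Counter({abbr: 0 for abbr in STATE_ABBR.values()})
    for location in locations:
        abbr = match_state(location)
        if abbr is not None:
            counts[abbr] += 1
    return counts


# Hypothetical profile location strings, not drawn from the thesis sample.
profiles = [
    "Portland, Oregon",
    "new york",
    "Somewhere on Earth",
    "",
    "Los Angeles, California",
    "Albany, New York",
]
counts = count_followers_by_state(profiles)
print(dict(counts))
```

Matching on full state names is simple but imperfect: a profile that lists only a city, or a name shared by a city and a state, will be missed or misassigned, which is one reason the resulting counts should be read as approximate.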
Table 1 in the Appendix shows an example of a randomly generated, and subsequently
cleaned, data set organized by state and the number of Twitter followers of Pizza Hut culled. Of
the 7,500 users who follow Pizza Hut on Twitter, roughly 5,300 indicated on their
profiles a geographic location that matches the name of any one of 50 U.S. states and 1 federal
district. The initial 7,500 users, randomly generated using a statistical software command, were drawn out
of the more than 10 million followers worldwide of Pizza Hut. So the regression and spatial
analyses will be done using this sample.
The first column, named “abbr,” refers to abbreviations of state names, and the second
column, dubbed “follow,” refers to the number of Twitter followers of Pizza Hut. This is a
randomly generated sample and may not proportionally represent the actual number of people
who follow Pizza Hut in each state, a limitation of using Twitter-generated data that was
mentioned earlier in the paper.
Figure 1, below, shows the summary statistics of this randomly culled sample of some
5,535 Twitter users who follow Pizza Hut. The minimum value is zero,
associated with the state of Vermont, and the maximum is 318, associated with Oregon. Bias
exists within these data, as each state has different levels of exposure to Pizza Hut and the
Internet (and social networking services like Twitter); therefore, the numbers will be weighted
with differences in state population in mind. The median number of followers is 56, and the
average is around 80 per state.
Figure 1: Summary Statistics of Example Sample
Minimum        0
1st Quartile   13
Median         56
Mean           79.27
3rd Quartile   116
Maximum        318
Another potential method to measure Twitter users’ penchant for fast food, and therefore
their correlation to obesity-inclined lifestyle choices, is to look at Tweeting history, in other
words, to look at users’ Tweeting history and see what they have been saying about the
above-mentioned fast food makers. There is but one problem with that approach. A recent report
found that unlike Facebook, Twitter has many users who are not active: some 44 percent of
all people who signed up have never even sent a single Tweet (Sherman, 2014). Furthermore,
Twitter said that in the last three months of 2013, fewer than a quarter of its total users logged in
at least once a month, according to Sherman (2014). Because Tweets may reveal even less
about a user than whom they follow, I will not be utilizing users’ Tweets or re-Tweets for that matter.
4.2 Comparison with Standard Methods
In order to assess the efficacy of social media measures, I will discuss at length a
traditional dataset used in this study, as well as other papers employing standard methodology
while covering the same topic of obesity and partisan identification.
First, this study looks at the state-level data set on the number of Pizza Huts in each state.
The number of Pizza Huts per state will also be weighted for population size, that is to say, the
number of Pizza Huts per capita. The reason for this is to contrast the results of analysis using
Twitter data with those obtained through a more traditional research method. Depending on the
result, this could show whether Twitter data, however rudimentary in some aspects, may be used
to complement traditional data analysis or rather that it still needs further study before it can be
considered a reliable method for social research.
The dataset on Pizza Hut outlets has over 6,000 rows of data, with each row comprising
information ranging from the state in which the branch is located, the longitude and latitude, the
official name of the branch, address and phone number. I will only use the state identification of
data from the U.S., which will leave me with roughly 5,800 branches of Pizza Hut, even though
another data set states there are over 7,000 stores in the U.S. (Cosper, 2013).
The data was collected in 2011 and includes Pizza Hut locations in Canada as well, which
have been removed (Bulmer, 2013). The data not being more current is not likely to be a
problem, as the collection date of 2011 is close to the time of the election results (2012) and the
Census data from 2010, which will be elaborated on later.
Secondly, there are many other studies that looked at the relationship of obesity and
partisanship, as well as obesity and proximity to fast food outlets, while using a more standard
data set and traditional methodology. In a provocative but illuminating article in the journal
Preventive Medicine, researchers Shin and McCarthy showed that a county’s political
inclination, as measured by the share of votes for the Republican candidate for president in 2012,
is associated with county-level obesity prevalence. This study features age-adjusted county-level
prevalence estimates for the percentage of adults who were obese, which is defined as those with
BMI over 30. This data was based on an ongoing, state-based telephone survey using random-digit dialing of adults in the U.S., in which the estimates were derived from self-reported weight
and height from responders. With regard to partisan identification, the county-specific
percentage of votes obtained by the 2012 Republican Party candidate was used as a proxy for
local political inclination, which was defined in the study as established and stable county-level
voter preferences. Correlation analyses showed that county-level support for the Republican
candidate closely followed patterns of support for Republican candidates in 2008 and 2004. The
biggest difference between this author’s study and the research by Shin and McCarthy was that
the latter used county-level data for obesity and partisan identification, whereas the former
analyzed the data on a state level. The results of the research showed that higher county-level
obesity prevalence rates were associated with higher levels of support for the Republican
candidate in the 2012 presidential election (Shin and McCarthy, 2013).
4.3 State-level Partisan Identification Data
Both the Twitter-generated data and traditionally tallied information about the number of
Pizza Hut stores will be used in the regression analysis along with state-level partisan
identification data, or presidential election data. The data will first be visualized on maps for
spatial analysis. Whether a state identifies as Republican or Democratic depends on a variety of
factors including issues, the distribution of registered voters, voting records on either the state or
federal level, stance on social issues, etc. In other words, there isn’t one agreed-upon way that
decides a state’s partisan identity, but I have decided to take into account voting records of the
2012 Presidential Election in determining the state political identity (President Map, 2012). The
year 2012 is relatively close in time to when the Pizza Hut store data was collected (2011), and
it is also only two years after the state population data was collected in 2010 during the
decennial Census. Due to the nature of the two-party political landscape in the U.S., the partisan
identity of the states will be coded as binary, and regression using the Twitter follower figure
will be logistic, for enhanced interpretability.
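As a rough illustration of the binary coding and the logistic specification, the following Python sketch codes a state as 1 ("red") when the Republican vote share exceeds the Democratic share and fits a one-predictor logistic model by plain gradient ascent. It is a sketch under stated assumptions, not the estimation routine used in this thesis: the function names, the learning rate, and every data value below are hypothetical toy numbers chosen only to make the example run.

```python
import math


def code_red_state(rep_share, dem_share):
    """Return 1 if the Republican share exceeds the Democratic share, else 0."""
    return 1 if rep_share > dem_share else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # residual on the probability scale
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical toy data: (Republican share, Democratic share) for six made-up units.
vote_shares = [(0.40, 0.58), (0.45, 0.53), (0.55, 0.43),
               (0.47, 0.51), (0.60, 0.38), (0.57, 0.41)]
red = [code_red_state(r, d) for r, d in vote_shares]

# Hypothetical followers per 1,000 residents for the same six units.
followers_per_1000 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

b0, b1 = fit_logistic(followers_per_1000, red)
print("intercept:", b0, "slope:", b1)
```

In this toy setup a positive slope would mean that units with more followers per capita are more likely to be coded red; with real data the coefficient would of course need a standard error before any such reading is warranted.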
The whole data set comprising Twitter data (numeric), state-level Pizza Hut store
information (numeric), and state-level political identification data (binary) will be used for
spatial analysis. Spatial analysis will help to visualize the figures and subsequently compare
them, in case the regression analyses do not yield compelling comparative results.
Summary statistics of the combined data set are shown in Table 2 in the Appendix
portion. In the summary statistics, there are a total of 51 rows, for all 50 states and the District of
Columbia, and 12 columns.
The first column – “abbr” – refers to abbreviations for the state names, which were used
for merging data frames. The second, third and eighth columns were created for similar purposes
as the first column, in that they are needed to merge data frames and later merge them with shape
files of the U.S. states to visually represent findings. The column named the number of followers
refers to the number of Twitter users who follow Pizza Hut on Twitter and those who also
indicated their geographic locations as one of the 51 states or district. The column the number of
Pizza Huts contains information about the number of Pizza Hut locations in each state or district
as of 2011. Though they are neither complete nor up-to-date, they offer accurate estimate as of
2011. Columns referred to as Republican tendency and Republican tendency represent results
from the Presidential Election of 2012. The Twitter follower figure and Pizza Hut figures will be
divided by each state’s population, to control for population, and they will be called per capita
number of followers for Twitter follower data divided by the number of population of each
respective state, and per capita number of Pizza Huts for the number Pizza Huts divided by the
number of population for each respective state. In other words, variables weighted for population
are per capita figures.
4.4 State-level Population Data
One of the challenges of comparing one state with another by simply using the raw figures
collected in the random sample is the difference in population among the states. California, with its
population of 37 million, is likely to have more Twitter users in the random sample and more
Pizza Huts than, say, North Dakota, with some 673,000. Comparing the two, or any other states
with a wide population gap for that matter, would not be conducive to correctly understanding
the relation between lifestyle choices (when it comes to obesity-inducing diet), which are
operationalized as Twitter followers of Pizza Hut or the number of Pizza Huts in a given state,
and partisan identification.
In order to overcome this problem, the figures will be population “weighted”; that is, the
analysis will be done using per-capita figures for Twitter followers and Pizza Hut outlets. The
number of Twitter followers of Pizza Hut and the number of Pizza Hut stores in each of the 51 states
or district will be divided by each respective state’s population, as stated in the 2010 Census
data. The quotient will then be multiplied by 1,000 and rounded up so that the outcome
will be in whole figures for ease of interpretability. The state-level population data was culled
from the 2010 U.S. Census, which counted the population as of April 1, 2010; the decennial count
itself is required by the U.S. Constitution. The 2010 data also roughly coincides with the Pizza Hut
data, which dates from 2011, and precedes the 2012 presidential election outcome by two years.
4.5 Media Pundit Follower Data
With regard to Twitter followers of Stephen Colbert, some 4,241 users out of the
randomly generated sample of 7,500 followers of Colbert (@StephenAtHome) specified
geographic locations that could be identified as one of the 50 U.S. states or the federal district.
There are roughly 7.64 million followers of @StephenAtHome.
As shown in Figure 2A, the lowest number of followers was attributed to South Dakota,
followed by Louisiana and West Virginia. The highest number of followers was found in
Indiana, with 305 Twitter users, followed by Oregon with 299 and California at 245.
Figure 2A: Stephen Colbert’s Twitter Follower Data
abbr     AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow   60  261   60   41  245  152   11   18  114   77   98  138

abbr     IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow  240   71  142  305   23   21    2  133   16   63  111   21   75

abbr     MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow   12    3   70  199  219   14   22   17   36   70   36   32  299

abbr     PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow  100  193   54    0   13   70   68   71    4   90   45    2    4

1 States in which the number of followers exceeds 100 are colored blue, and the rest are colored red.

With regard to conservative pundit Bill O’Reilly, some 6,170 users out of the randomly
generated sample of 7,500 followers of O’Reilly (@oreillyfactor) specified geographic
locations that could be identified as one of the 50 U.S. states or the federal district (Figure 2B).
There are roughly 737,000 followers of @oreillyfactor. The states with the fewest
followers were South Dakota and Vermont, at zero, and the largest numbers of followers were
found in Oregon, with 474, and Indiana, with 450.
Figure 2B: Bill O’Reilly’s Twitter Follower Data
abbr     AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow   57  326   65   65  303  192   28   30  130  150  152  234

abbr     IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow  292  141  210  450   59   34    7  169   16  115  170   24   97

abbr     MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow   26    7  110  273  272   12   33    9   56  103   74   45  474

abbr     PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow  151  288   69    0   48  177   90  112    0  133   78    6    8

2 States in which the number of followers exceeds 100 are colored red, while the rest are colored blue.

5. Methodology
The methodology will be discussed in three main parts:
a. Twitter-generated data on followers of Pizza Hut (@pizzahut), Stephen Colbert
(@StephenAtHome), and Bill O’Reilly (@oreillyfactor) in each of the 50 U.S. states and
one federal district will be regressed to show the correlation with the respective states’
political identification using linear regression analyses.
b. Data on the number of Pizza Hut locations in each state, controlling for state population,
will be regressed to show the correlation with the respective states’ political identification
using linear regression analyses.
c. The statistics from the random sample, along with the linear regression results, will be
visually represented on a continental U.S. map for geographic comparison, to see whether what
regression analysis could not capture may be gleaned using data visualization.
5.1 Twitter Analysis
Using R and the Twitter application programming interface (API), I will cull a random sample of
around 5,000 followers of Pizza Hut using an R command (Table 3 in Appendix), then clean
the data and categorize it to tally up the followers by state. Also culled will be
followers of the well-known liberal and conservative media pundits Colbert and O’Reilly, respectively.
Cleaning the data includes stripping from the location information on the Twitter followers’
profile pages any foreign-language text, punctuation marks and other special symbols that
do not offer valuable information regarding where those Twitter users are geographically
located. The text cleaning process will be done using the R programming language, and is
demonstrated in the Appendix.
After the data have been cleaned, the users will be counted based on which U.S. state
they listed as their location. The tally will then be “weighted” by each state’s total
population.
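The cleaning and tallying steps described above can be sketched as follows. This is only an
illustrative Python sketch, not the R code shown in the Appendix: the helper names
(clean_location, match_state, tally_by_state), the abbreviated state lookup, and the matching
rules are all hypothetical assumptions about one reasonable way to map free-text profile
locations to states.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical, abbreviated lookup; a full version would list all 50 states and DC.
STATE_NAMES = {
    "california": "CA",
    "oregon": "OR",
    "indiana": "IN",
    "south dakota": "SD",
}
STATE_ABBRS = set(STATE_NAMES.values())


def clean_location(raw: str) -> str:
    """Drop non-ASCII text, punctuation and digits, then collapse whitespace."""
    ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
    letters_only = re.sub(r"[^A-Za-z\s]", " ", ascii_only)
    return re.sub(r"\s+", " ", letters_only).strip()


def match_state(raw: str) -> Optional[str]:
    """Return a state abbreviation for a free-text location, or None if unmatched."""
    cleaned = clean_location(raw)
    lowered = cleaned.lower()
    # Full state names are matched case-insensitively as whole words.
    for name, abbr in STATE_NAMES.items():
        if re.search(rf"\b{name}\b", lowered):
            return abbr
    # Abbreviations are matched only in upper case, so that ordinary words
    # such as "in" or "or" are not mistaken for Indiana or Oregon.
    for token in cleaned.split():
        if token in STATE_ABBRS:
            return token
    return None


def tally_by_state(locations: Iterable[str]) -> Counter:
    """Count the users whose location could be matched to a state."""
    matched = (match_state(location) for location in locations)
    return Counter(state for state in matched if state is not None)
```

Users whose location cannot be matched to a state are simply dropped from the tally, which
mirrors the way the follower samples shrink once unidentifiable locations are excluded.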
As mentioned before, this process involves simple arithmetic: dividing the
number of Twitter followers and the number of Pizza Hut stores by each respective state’s
population. The resulting figures will be multiplied by 1,000 and rounded up to yield whole
numbers so that the outcome can be visualized without difficulty in spatial analytic tools.
After the figures have been “weighted” by state population (i.e. controlled for
population), I will use both regression analysis and spatial analysis to compare the Twitter
follower and Pizza Hut figures against a state’s political identification. The correlation between a
state’s follower figure (weighted by population) and that state’s partisan identity, based on voting
records from the 2012 Presidential Election, will be shown via a linear regression (President Map,
2012).
The same methodology of culling, cleaning and categorizing will be applied to the
datasets containing information about the Twitter followers of the liberal and conservative political
show hosts Colbert and O’Reilly.
5.2 Spatial Analysis
Using spatial analysis to show the correlation between the number of followers, the
number of Pizza Huts, and political identification may help to explore and present their
relationship effectively. Should spatial analysis prove insufficient to fully illustrate the
association between politics and obesity, additional research methods such as regression analysis
will be employed. Using the data set that combines all three data frames (Twitter followers,
Pizza Hut stores, and the Presidential Election map) and matching it with a state-level shape
file, I can color-code the magnitudes of the numeric variables and compare them visually. The process
will involve exporting the complete data frame from R, importing it into QGIS, a spatial
analysis software package, matching it with the states, and visualizing how the states compare
with one another in terms of the number of Twitter followers, Pizza Hut stores, and political
landscape.
5.3 Traditional Analysis
Traditional analysis in this study refers to linear and logistic regression analyses of a
dataset in a more traditional format and scope, or simply non-Twitter-generated data. I will
use a dataset on the number of Pizza Hut stores in each state, controlling for state population,
and regress it against the partisan identity of each respective state. Because Twitter data has
many limitations, as discussed before, it is important to have a reference point for assessing the
efficacy of Twitter data as a prudent complement to traditional research methods. Such a reference
point can also help to assess whether using Twitter requires further study and experimentation before
being applied to complement traditional research. It is not certain whether this traditional
method will be able to effectively capture the correlation between politics and obesity shown
by Shin and McCarthy in their research using standard research methodology.
6. Results and Analysis
6.1 Spatial Analysis
First, I will investigate the relationship between politics and obesity using the proxies of
Twitter following, Pizza Hut store information, and the election outcome in 2012.
This analytic component will comprise visualizations of the above statistics in a manner that could
reveal the trends that I set out to capture using Twitter data and traditional data on Pizza
Huts, as well as randomly generated samples of Twitter followers of Stephen Colbert
(@StephenAtHome) and Bill O’Reilly (@oreillyfactor), which are the Twitter-verified,
authenticated accounts belonging to the aforementioned celebrities. The supplementary analysis
using political pundits as proxies will be visualized in the same manner. The
visualization will take the form of state-level shape files, with a gradient range of colors from
blue to red, with blue representing Democratic states and red representing Republican ones.
With regard to the Colbert and O’Reilly data sets, states with higher numbers of followers of
Colbert will be represented as blue, while states with higher numbers of O’Reilly followers will be
shown in red.
Using a combined data set of the above-mentioned variables, and matching it with a
state-level shape file of the U.S., the magnitudes of the numeric variables can be represented using
different colors for comparison. The process will involve importing the data into QGIS, a
spatial analysis software package, and visualizing the values associated with the variables to
compare the states in terms of Twitter followers, Pizza Hut stores, and political landscape, as
well as Twitter following information relating to the conservative and liberal political commentators
and TV show hosts O’Reilly and Colbert, respectively.
First and foremost, Map 1 will feature a U.S. map with the election outcome from 2012,
which will serve as a reference point in determining a state’s partisan identification. The red
represents Republican states and the blue represents states that voted for the Democratic Party.
The West coast states are largely Democratic, while the Midwest and many Southern states are
mainly Republican.
Map 1: Presidential Election of 2012 Voting Outcome
Data Source: The New York Times
In Map 2, the number of Pizza Hut locations per state is divided into quintiles,
and the unit is the number of stores. The states in the South, including Georgia, the Carolinas, and
Alabama, are on the higher side and therefore red. But this Pizza Hut figure has not been weighted by
state population, so populous states like California are shown as having more
than 133 stores and therefore appear red. This can pose a problem, as differences in
state population can obscure any correlation between having more Pizza Huts and voting
more Republican. California is shown here as being very red, but as other visualizations
will show, California proves to be an exception in terms of having characteristics that are
expected to be associated with Republican states.
Map 2: The number of Pizza Huts in each state is shown using variations of color from blue to red,
with different shades representing first through fifth quintiles. The specific number of stores is
shown in the legend.
In Map 3, the number of Twitter followers is also divided into quintiles, showing that
California and Oregon are well represented in terms of Twitter-using population. Oregon
in particular is well represented, along with Indiana. Other states highly
represented among Twitter users (from the random sample) are North Dakota,
Nebraska, Iowa, Alabama and New York. At least in this sample, then, having more Twitter followers
of Pizza Hut does not seem to be correlated with whether a state
voted Democratic or Republican.
Map 3: The number of Twitter followers of Pizza Hut by state, represented with varying gradients
from blue to red representing first through fifth quintiles of the number of followers by state.
In Map 4, per capita number of Pizza Huts per state is represented. This map, too,
features a division of the figures into five quintiles. The weighted figures were derived using a
simple arithmetic function of dividing the number of Pizza Huts in each state by that respective
state’s population. For ease of visual representation, the figures were multiplied by 1,000 and
rounded up to yield whole numbers.
Map 4: The number of Pizza Huts per capita in each state is shown using variations of color from
blue to red, with different shades representing first through fifth quintiles. The figures in the
legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.
In Map 5, population “weighted”, or per capita, Twitter follower figures are represented
in quintiles. The numbers were derived by dividing the number of Twitter followers
of Pizza Hut by each respective state’s population. The outcome was subsequently multiplied by 1,000
and rounded up to yield whole numbers. Some states feature a high number of Twitter followers
relative to population, and that trend shares similarities with the partisan map (Figure 3). Texas,
where Pizza Hut’s headquarters is based (Texas Wide Open for Business, 2014), is understandably red,
which means many Twitter followers of Pizza Hut were from the Lone Star State. There are also
discernible pink areas in the central part of the continental U.S., such as Nebraska, Kansas,
Missouri, Arkansas and Louisiana, which featured prominent numbers of Twitter followers of Pizza
Hut relative to population. There were, however, exceptions, such as Minnesota and Florida,
although previous election outcomes showed Florida as being more traditionally
conservative in its voting record prior to the elections won by the incumbent President Obama.
Map 5: The number of Twitter followers of Pizza Huts per capita in each state is shown using
variations of color from blue to red, with different shades representing first through fifth quintiles.
The figures in the legend have been multiplied by 1,000 to show the scale of the figures in whole
numbers.
In Maps 6 and 7, the natural logs of the per-capita figures of Pizza Hut stores and Twitter
followers of Pizza Hut are represented in quintiles. The reason for taking the log is to
assess whether a more symmetric distribution of figures may yield information about any
correlation between voting pattern and having high numbers of Pizza Hut stores and followers.
Some similarities can be seen between Maps 6 and 7. For instance, moving from the
west, Idaho, Utah, New Mexico, Texas, Arkansas, Illinois and Ohio have high
figures on both maps, with the remainder of the Midwest following a similar pattern. However, in
Map 6, some Southern states like the Carolinas, Georgia and Alabama, and
Midwestern states like the Dakotas, are seen as having fewer Pizza Huts than other states,
controlling for population, even though they are traditionally Republican states. Using only Pizza
Hut as a proxy for consumption of fast food has limitations, including the inability to capture
fast food consumption in states where Pizza Hut may not be a popular brand of choice.
Map 6: The log of the number of Pizza Huts per capita in each state is shown using variations of
color from blue to red, with different shades representing first through fifth quintiles. The figures
in the legend have been multiplied by 1,000 to show the scale of the figures in whole numbers.
In Map 7, the log of the number of Twitter followers of Pizza Hut per capita is
represented in quintiles, and the map shows similarity to the map of the log of Pizza Hut
data (Map 6). This suggests that while there is some evidence of correlation with the political
identification map, it is not consistent or widespread enough to reliably use Twitter data to
estimate correlation with voting behavior. There is, however, an apparent correlation between
having more Pizza Huts and having more Twitter followers of Pizza Hut, when controlling for
population.
Map 7: The log of the number of Twitter followers of Pizza Huts per capita in each state is shown
using variations of color from blue to red, with different shades representing first through fifth
quintiles. The figures in the legend have been multiplied by 1,000 to show the scale of the figures
in whole numbers.
In conclusion, spatial analysis was helpful in showing some correlation between the
number of Twitter followers and the number of Pizza Huts, controlling for population. There is
an observable correlation between voting behavior (i.e. Democratic or Republican in the 2012
presidential election) and having a high number of Twitter followers of Pizza Hut (Map 5),
though there were some noticeable exceptions, including Florida.
In summary, a couple of the visualizations with figures controlling for state population
show some correlation with each state’s voting behavior in the 2012 presidential
election, but the rest of the maps do not offer compelling reasons to accept them as
credible evidence of correlation.
In Map 8, the number of Twitter followers of Colbert is divided into quintiles,
with higher numbers represented as blue and lower numbers as red. California,
as is often the case, had one of the largest numbers of followers, along with Oregon, North Dakota,
Nebraska, Iowa, Illinois, Indiana, and Alabama. Red states, or states with the smallest followings,
were Montana, Wyoming, South Dakota, Minnesota, Louisiana, Mississippi, etc. Many of those
states voted for the GOP candidate in 2012, but other than that, the correlation is not clear.
Map 8: The number of Twitter followers of Stephen Colbert by state, represented with varying
gradients from blue to red representing first through fifth quintiles of the number of followers by
state.
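As a rough check on the quintile grouping, the follower counts reported in Figure 2A can be
binned in a few lines of Python. This is a sketch under stated assumptions rather than the
procedure used to draw the map: the cut points come from Python's statistics.quantiles, whose
interpolation may differ from the quantile classification that QGIS applies, so the bin
boundaries on the published map may not match exactly. The function name quintile_bins is
hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles

# Follower counts for Stephen Colbert by state, as reported in Figure 2A.
COLBERT_FOLLOWERS = {
    "AK": 60, "AL": 261, "AR": 60, "AZ": 41, "CA": 245, "CO": 152, "CT": 11,
    "DC": 18, "DE": 114, "FL": 77, "GA": 98, "HI": 138, "IA": 240, "ID": 71,
    "IL": 142, "IN": 305, "KS": 23, "KY": 21, "LA": 2, "MA": 133, "MD": 16,
    "ME": 63, "MI": 111, "MN": 21, "MO": 75, "MS": 12, "MT": 3, "NC": 70,
    "ND": 199, "NE": 219, "NH": 14, "NJ": 22, "NM": 17, "NV": 36, "NY": 70,
    "OH": 36, "OK": 32, "OR": 299, "PA": 100, "RI": 193, "SC": 54, "SD": 0,
    "TN": 13, "TX": 70, "UT": 68, "VA": 71, "VT": 4, "WA": 90, "WI": 45,
    "WV": 2, "WY": 4,
}


def quintile_bins(counts):
    """Map each state to a quintile from 1 (lowest) to 5 (highest)."""
    # quantiles(..., n=5) returns the four cut points between the five groups.
    cuts = quantiles(counts.values(), n=5)
    # A value's bin is one more than the number of cut points strictly below it,
    # so a value equal to a cut point falls into the lower bin.
    return {state: bisect_left(cuts, value) + 1 for state, value in counts.items()}


bins = quintile_bins(COLBERT_FOLLOWERS)
```

With these assumed cut points, South Dakota (0 followers) falls in the first quintile and
Indiana (305 followers) in the fifth.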
In Map 9, the number of Twitter followers of Colbert, divided by the state population
and subsequently multiplied by 1,000 to yield whole numbers, is distributed into five quintiles,
with higher numbers being represented as blue and lower numbers attributed to red. Texas, Utah,
and Florida, among others, are shown to have high numbers of followers for Colbert. Similar to
Map 10, the correlation is not apparent.
Map 9: The number of Twitter followers of Colbert per capita by state, represented with varying
gradients from blue to red representing first through fifth quintiles of the number of followers by
state.
In Map 10, the number of Twitter followers of O’Reilly is divided into quintiles,
with higher numbers represented as red and lower numbers as blue. Again,
California and Oregon are shown as red, for they have high numbers of Twitter followers
of O’Reilly, along with North Dakota, Nebraska, Iowa, Illinois, Indiana, and Alabama. Blue
states, or states with fewer followers in the randomly generated sample, were Montana,
Wyoming, New Mexico, South Dakota, Minnesota, Louisiana, Alabama, West Virginia, etc.
Though many of these states are traditionally Republican in their voting pattern, they are shown as
having fewer O’Reilly followers. This goes against the expectation that a conservative
pundit will likely have more followers in traditionally conservative states, as people tend to
follow ideologically like-minded people on Twitter (Nyhan, 2014).
Map 10: The number of Twitter followers of Bill O’Reilly by state, represented with varying
gradients from blue to red representing first through fifth quintiles of the number of followers by
state.
In Map 11, the per capita number of Twitter followers of O’Reilly, obtained by dividing by
the state population and multiplying by 1,000 to yield whole numbers, is divided into
quintiles, with higher numbers represented as red and lower numbers as blue.
After controlling for population, California is not shown as
belonging to the top quintile of O’Reilly’s Twitter followers.
Red states such as Utah, North Dakota, Texas and Indiana are traditionally conservative
states, while other “red” states like Illinois and Ohio are not. States that have small numbers of
followers of O’Reilly and also voted for Obama in 2012 are Nevada, New Mexico, Colorado,
Michigan, Delaware, Connecticut, Maryland, and Maine. These account for eight of the 12 states that
are tagged as blue. Other than that, there was no substantial correlation between
O’Reilly followers’ geographic location and traditionally conservative states.
Map 11: The number of Twitter followers of O’Reilly per capita by state, represented with varying
gradients from blue to red representing first through fifth quintiles of the number of followers by
state. The figures were multiplied by 1,000 to be represented using whole figures.
In conclusion, using a seemingly more direct proxy of conservative and liberal icons did
not result in a clear correlation between the political identification of a geographic location and
Twitter following. Some correlation could be found in the visualizations of the O’Reilly
Twitter data, in that, when controlling for population, states that are traditionally Democratic
belonged to the lowest quintile of O’Reilly’s Twitter follower data, while many traditionally red
states, or states that vote Republican, were not as widely represented. The effect was minimal in
the Colbert data. In other words, the figures associated with the conservative pundit were more
consistent with expectations. Another aspect to note is that certain states, like California and
Oregon, had unusually large numbers of Twitter followers in every iteration, from Pizza Hut to
Colbert to O’Reilly; those two states always had follower figures far larger than other states
with similar population size. This may be due to those states being more friendly to digital
social media, but that is conjecture for the time being.
6.2 Linear Regression
A sample of roughly 5,300 Twitter followers of Pizza Hut was culled from some 10
million followers of the fast food pizza franchise. The collected data was then separated by
U.S. state or district, depending on which geographic location each user specified on
their public account profile. This data was subsequently regressed with other variables, such as
the number of Pizza Huts in each state and the political identification of each state, and the
results were then visually represented using spatial analytic tools.
First, I ran a linear regression of the variable the number of followers (the number of
Twitter followers of Pizza Hut in the sample) on the number of Pizza Huts in each state, as illustrated
in Table 4 in the Appendix. This exercise was aimed at showing that there is a positive relationship
between having more Pizza Huts and having more Twitter followers of Pizza Hut by state.
Each additional Pizza Hut outlet is associated with a 0.118-point
increase in the number of followers. The relationship is positive and consistent with
the expectation that there are likely more followers of Pizza Hut in states where there are more
Pizza Hut outlets, but the finding is not statistically significant.
After affirming that there is a positive correlation between the number of Twitter
followers and number of Pizza Hut stores, I will then try to capture their correlation with respect
to each state’s political identification, as demonstrated in Tables 5 and 6 in the Appendix
portion.
In Table 5, a positive correlation can be observed between the variable the number of
Pizza Huts and the dependent variable, the number of followers. In other words, a unit increase
in the number of Pizza Huts is associated with a 0.11-point increase in “follower.” A positive
correlation can also be observed between the variable Republican tendency and the dependent variable.
When a state is “red,” or a Republican state, it is associated with a 6.4-point increase in the
variable the number of followers. This means that states that voted for the Republican Party in the
presidential election in 2012 are likely to have 6.36 more followers
than a blue state, which voted for the Democratic Party. Again, the positive sign of
the regression coefficients supports the idea that the variables are positively correlated.
In Table 6, the dependent variable is the number of Pizza Huts. The positive correlation
between the number of followers and the number of Pizza Huts remains, but the relationship is
reversed for the variable Republican tendency. Being a red state is associated with
a 0.27 decrease in the number of Pizza Huts. This reversal of the correlation is
problematic in that it goes against the hypothesis of this study, namely that there is a positive
correlation between lifestyles that are in alignment with obesity and voting preference leaning
toward the Republican Party.
As mentioned before, one of the challenges of comparing one state with another is the
difference in population. To account for this difference in state population, I have divided the
number of Twitter followers and Pizza Hut stores by each respective state’s population and
regressed the results with the variable Republican tendency.
Because the value of the state population is far bigger than the value of the Twitter
follower or Pizza Hut store figures, I have used a thousandth of the population figures, in
addition to multiplying the outcome by a thousand, in order to yield coefficients that are more
easily interpretable, that is, in whole numbers. These “weighted” figures, or figures controlling
for population, will also paint a more accurate picture of how well represented each state is when
it comes to Twitter followers and Pizza Huts.
The outcomes of the population “weighting” are the variables per capita number
of followers, or the number of Twitter followers divided by state population, and per capita
number of Pizza Huts, or the number of Pizza Huts in each state divided by the
population of each respective state.
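The scaling described above can be illustrated with a short Python sketch. It is an assumption
about how the arithmetic fits together, not the R code used for this thesis: the function name
per_capita_scaled is hypothetical, the populations are the rounded figures quoted in Section 4.4
(37 million for California, 673,000 for North Dakota), and the counts are the Colbert follower
figures from Figure 2A, borrowed here only because they are printed in the text.

```python
import math


def per_capita_scaled(count: int, population: int) -> int:
    """Divide a count by population in thousands, multiply by 1,000, round up."""
    population_in_thousands = population / 1000  # "a thousandth of the population"
    scaled = count / population_in_thousands * 1000
    return math.ceil(scaled)  # rounded up to a whole number


# Rounded populations from Section 4.4; follower counts from Figure 2A.
california = per_capita_scaled(245, 37_000_000)
north_dakota = per_capita_scaled(199, 673_000)
print(california, north_dakota)  # prints "7 296"
```

Although California has more followers in the raw sample (245 against 199), North Dakota's
scaled figure is far larger, which is the distortion that the population weighting is meant
to correct.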
A linear regression between the dependent variable per capita number of Twitter
followers and independent variables per capita number of Pizza Huts and Republican tendency
yields results (Table 7) similar to the findings in Table 6, in which a positive correlation between
the number of followers and the number of Pizza Huts was maintained, while the opposite was
true for the variable Republican tendency. However, one notable change is in the statistical
significance: the p-value associated with the coefficient for per capita number of Pizza Huts has
dramatically improved, yielding statistical significance at the p < 0.0001 level.
Table 7: Linear regression of the per capita number of Twitter followers of Pizza Hut (DV) and the
per capita number of Pizza Hut outlets per state (IV) and partisan identification.

Coefficients:             Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)               463.7313     511.1573     0.907   0.369
Per capita Pizza Huts       0.6524       0.1246     5.234   3.61e-06 ***
Republican Tendency       -23.2977     626.4094    -0.037   0.970

Notes: These are linear regression results (Dummy; 1=Republican, 0=Democratic). The asterisks
indicate statistical significance. The associated adjusted R-squared is 0.351. ***: p-value < 0.01,
**: p-value < 0.05, *: p-value < 0.1.

This means that, if there were no underlying relationship, there would be less than a 0.1 percent
chance of obtaining a t-value of 5.234 or larger in absolute value. The regression coefficient
associated with per capita number of Pizza Huts shows that a unit
increase in per capita number of Pizza Huts is associated with a 0.65 increase in per capita
number of followers, which means that where there are more Pizza Huts per capita, there are also
more Twitter followers of Pizza Hut in that state. Going from being a Democratic state to a
Republican state makes per capita number of Twitter followers go down by 23.30 points, which
is not supportive of the hypothesis. This coefficient, however, is not statistically significant.
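The t-values in Table 7 can be checked against the reported estimates and standard errors, since
each t-value is the estimate divided by its standard error. The following Python sketch uses only
the figures printed in Table 7; the names TABLE_7 and t_statistic are hypothetical, and small
differences from the printed t-values are expected because the table entries are rounded.

```python
# (estimate, standard error, reported t-value) for each row of Table 7.
TABLE_7 = {
    "(Intercept)": (463.7313, 511.1573, 0.907),
    "Per capita Pizza Huts": (0.6524, 0.1246, 5.234),
    "Republican Tendency": (-23.2977, 626.4094, -0.037),
}


def t_statistic(estimate: float, std_error: float) -> float:
    """t-value for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


for name, (estimate, std_error, reported) in TABLE_7.items():
    recomputed = t_statistic(estimate, std_error)
    # Each line shows the recomputed value next to the one printed in the table.
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

The near-zero t-value for Republican Tendency reflects a standard error many times larger than
the estimate itself, which is why that coefficient is not statistically significant.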
The same pattern was observed in the linear regression between per capita number of
Pizza Huts and per capita number of Twitter followers and Republican tendency (Table 8).
Table 8: Linear regression results of the per capita number of Pizza Huts (DV) and the per capita
number of Twitter followers of Pizza Hut (IV) and partisan identification.

Coefficients:               Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)                1243.9241     441.2010     2.819   0.00697  **
Per capita No. of follower    0.5570       0.1064     5.234   3.61e-06 ***
Republican Tendency        -726.7353     569.2182    -1.277   0.20784

Notes: These are linear regression results of the per capita number of Pizza Huts (DV) and the per
capita number of Twitter followers of Pizza Hut per state (IV) and partisan identification
(Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. The associated
adjusted R-squared is 0.3723. ***: p-value < 0.01, **: p-value < 0.05, *: p-value < 0.1.

With regard to the Pizza Hut data, the linear regression method was inadequate in representing
the correlation between Twitter following, the number of Pizza Hut stores, and voting behavior. This
is likely due to the limitations of Twitter data and its inability to capture meaningful trends found
in real-life settings.
With regard to the supplementary analysis using media pundits as proxies, there was an
observable correlation between political identification (represented by Republican tendency in
both the Colbert and the O’Reilly data) and the number of Twitter followers in the
O’Reilly data, but not in the Colbert data. As seen in Table 9, a one-unit increase in the
number of Colbert followers is associated with a positive 0.06 point increase in the binary outcome
(in the direction from Democrat to Republican). The regression coefficient is only marginally
significant (p = 0.057, i.e., at the p < 0.1 level), so the regression reveals little about the
correlative relationship between the two.
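As a rough check on the Table 9 coefficient for the number of Colbert followers, the sketch below recomputes the test statistic and a two-sided p-value from the reported estimate and standard error. It assumes the generalized linear model is a binomial model whose coefficients are tested against a standard normal distribution; the function name `wald_z_test` is illustrative and not part of the original analysis.

```python
import math

def wald_z_test(estimate, std_error):
    """Return (z, two-sided p-value) using a standard normal reference.

    Assumption: the coefficient comes from a binomial GLM, where the
    ratio estimate / std_error is compared with N(0, 1).
    """
    z = estimate / std_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Values copied from Table 9 (No. of Twitter followers row).
z, p = wald_z_test(0.062072, 0.032648)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Under these assumptions the recomputed p-value falls between 0.05 and 0.1, in line with the value reported in Table 9.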
In Table 10, a linear regression of Republican tendency on the number of
followers, the per capita number of followers, and state population shows, again, no significant
relationship between the said variables. The relationship is positive, though, which means that
increases in the number of followers (before and after controlling for population) are positively
correlated with Republican tendency, that is, with being a Republican state. This is the opposite of the
expected relationship between a liberal political commentator and their followers.
Table 10: Linear regression of partisan identification (DV) and the number of Twitter followers of
Colbert, the per capita number of Twitter followers of Colbert, and the size of the population.

Coefficients:                  Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                     1.016977     0.223765     4.545   3.85e-05 ***
No. of Twitter Followers        0.008731     0.005936     1.471   0.148
Per capita No. of Followers     0.002747     0.005146     0.534   0.596
Population Per state            0.006856     0.004842     1.416   0.163

Notes: These are linear regression results of partisan identification (DV) and the number of Twitter followers of Stephen Colbert per state (IV), the per capita number of Twitter followers of Colbert, and state population. The asterisks indicate statistical significance. Adjusted R-squared associated is 0.04279. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In Table 11, a linear regression of the number of followers on the per capita number of
followers and Republican tendency shows a statistically
significant relationship between the number of followers and the per capita number of followers
(p = 0.0498). This indicates that the number of Twitter followers reflects the population of the
state, in that the increases are proportional. This is supportive evidence that Twitter-harvested
data on the list of followers may be representative of the geographic distribution of the
population they purport to represent.
Table 11: Linear regression of the number of Twitter followers of Stephen Colbert (DV) and the per
capita number of Twitter followers of Colbert per state (IV), with covariates partisan identification
both as factor and numeric variables.

Coefficients:                   Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                       14.9242       3.5984     4.147   0.000136 ***
Per capita No. of Followers        0.23         0.1143     2.012   0.049806 *
Republican Tendency (factor)       5.4134       3.3139     1.634   0.108890
Republican Tendency (numeric)      NA           NA         NA      NA

Notes: Linear regression of the number of Twitter followers of Stephen Colbert (DV) and the per capita number of Twitter followers of Colbert per state (IV), with covariates partisan identification both as factor and numeric variables (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. Adjusted R-squared associated is 0.09808. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

A different outcome was shown in the regression analysis using the O’Reilly data. As seen
in Table 12, a one-unit increase in the number of followers is associated with a positive 0.06
point increase in the binary outcome (in the direction from Democrat to Republican). The regression
coefficient is significant at the p < 0.05 level (p = 0.022), which means that a z-value at least as
large in magnitude as the observed 2.29 would be expected about 2 percent of the time if there
were no underlying relationship.
The regression reveals a statistically significant relationship between the two said variables,
and further, meets the expectation that states with a greater number of Twitter followers of
conservative darling O’Reilly will be more inclined to be Republican, or to have voted for the
GOP in the 2012 Presidential Election. This is in contrast to what was seen in the Colbert data.
The reason may be the size of the samples, or the fact that Colbert is considered a
comedian and an entertainer, whereas O’Reilly takes on the role of host of a political
commentary program.
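To make the size of the Table 12 coefficient more concrete, the sketch below converts it into an odds ratio. This assumes the generalized linear model is a logistic regression with a logit link, which the tables do not state explicitly; the helper name `odds_ratio` and the choice of a ten-follower increment are illustrative only.

```python
import math

def odds_ratio(log_odds_coefficient, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase.

    Assumption: the coefficient is on the log-odds (logit) scale.
    """
    return math.exp(log_odds_coefficient * delta)

# Coefficient for No. of Twitter Followers copied from Table 12.
coef_followers = 0.05608

per_one_follower = odds_ratio(coef_followers)             # one more follower in the sample
per_ten_followers = odds_ratio(coef_followers, delta=10)  # hypothetical ten-follower increase

print(f"odds ratio per follower:     {per_one_follower:.3f}")
print(f"odds ratio per 10 followers: {per_ten_followers:.3f}")
```

Read this way, each additional sampled follower multiplies the odds of a state being Republican-voting by a factor slightly above one, holding the other covariates fixed; the coefficient is not a change in probability points.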
Table 12: Generalized Linear Model of partisan identification (DV) and the number of Twitter
followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly,
and state population.

Coefficients:                  Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                     -3.1502      1.16605     -2.702   0.0069 **
No. of Twitter Followers         0.05608     0.02449      2.290   0.0220 *
Per capita No. of Followers      0.02243     0.02301      0.975   0.3295
Population Per Population        0.03929     0.02325      1.69    0.0910 .

Notes: Generalized Linear Model of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly, and state population. The asterisks indicate statistical significance. The AIC associated is 67.431. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In Table 13, a linear regression of Republican tendency on the number of
followers, the per capita number of followers, and state population shows a statistically
significant relationship between the number of followers and Republican tendency. This shows
that a one-unit increase in the number of followers is positively correlated with Republican
tendency, which is coded as “0” for a Democratic state and “1” for a Republican state. The regression
coefficient of 0.012 indicates that a state with a greater number of Twitter followers of O’Reilly
in the random sample is more likely to have voted for the GOP in the 2012 Presidential Election,
controlling for population. Again, this is a statistically significant finding, at the p < 0.05
level (p = 0.018).
Table 13: Linear Regression of partisan identification (DV) and the number of Twitter followers of
Bill O’Reilly, with the per capita number of Twitter followers of O’Reilly, and state population.

Coefficients:                  Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                     0.857048     0.201669     4.250   0.000101 ***
No. of Twitter Followers        0.011877     0.004853     2.447   0.018190 *
Per capita No. of Followers     0.004698     0.004839     0.971   0.336607
Population Per Population       0.007704     0.004447     1.733   0.089717 .

Notes: Linear regression of partisan identification (DV) and the number of Twitter followers of Bill O’Reilly, with covariates the per capita number of Twitter followers of O’Reilly, and state population. The asterisks indicate statistical significance. The adjusted R-squared associated is 0.1482. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.
In Table 14, a linear regression of the number of followers on the per capita number of
followers and Republican tendency shows a statistically
significant relationship between the number of followers and Republican tendency as a factor
variable. If a state voted Republican in the most recent presidential election, as opposed to
voting Democrat, that state has, on average, about 8.85 more
followers of O’Reilly in the randomly generated sample. The
regression coefficient is statistically significant at the p < 0.05 level (p = 0.023). There is a
more moderate, statistically less significant correlation (p = 0.057) between population-controlled
Twitter follower data and the raw figures for Twitter followers of O’Reilly; a one-unit increase in
the Twitter follower figure, controlling for population, is correlated with an increase in the
Twitter follower figure without controlling for population. This can be interpreted to mean that the
Twitter-generated sample is somewhat proportional to, and therefore representative of, the total
population of the states.
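The dummy-variable reading of Table 14 can be sketched as a small prediction function. The coefficients are copied from Table 14; the function name `predicted_followers` and the per capita value of 20.0 are hypothetical, chosen only to illustrate that the gap between an otherwise identical Republican and Democratic state equals the Republican Tendency coefficient.

```python
def predicted_followers(per_capita_followers, republican):
    """Fitted number of sampled O'Reilly followers from the Table 14 model."""
    intercept = 13.9541          # (Intercept)
    slope_per_capita = 0.257     # Per capita No. of Followers
    republican_shift = 8.8544    # Republican Tendency (factor), 1=Republican
    return (intercept
            + slope_per_capita * per_capita_followers
            + republican_shift * int(republican))

# Hypothetical per capita value; any value gives the same gap,
# because the model has no interaction term.
x = 20.0
gap = predicted_followers(x, True) - predicted_followers(x, False)
print(f"Republican minus Democratic prediction: {gap:.4f}")
```

Because the model is additive, the gap does not depend on the per capita value chosen.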
Table 14: Linear regression of the number of Twitter followers of O’Reilly (DV) and the per capita
number of Twitter followers and partisan identification, both as factor and numeric.

Coefficients:                   Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                       13.9541       3.8963     3.581   0.000795 ***
Per capita No. of Followers        0.257        0.1318     1.949   0.057133 .
Republican Tendency (factor)       8.8544       3.7752     2.345   0.023186 *
Republican Tendency (numeric)      NA           NA         NA      NA

Notes: Linear regression of the number of Twitter followers of O’Reilly (DV) and the per capita number of Twitter followers and partisan identification, both as factor and numeric (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. Adjusted R-squared associated is 0.1661. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

In conclusion, using well-known political commentators as proxies was somewhat
illuminating when it comes to the efficacy of Twitter in generating a representative sample of the
U.S. population. With regard to investigating the relationship between the political identification of
the states (i.e. voting behavior data from the 2012 presidential election) and the size of the
Twitter following, there was an observable, positive correlation in the O’Reilly data, whereas there
was no statistically significant correlation in the Colbert data. The author is uncertain as to the
exact reason for this, as there are many factors that might lead to such an outcome: differences in
the size of the randomly generated samples, the nature of the TV shows hosted by each political
pundit (i.e. one is a satirical comedy show with political content while the other is a political
commentary show on Fox News), etc.
7. Limitation
Murmurs in the tech industry that Twitter is on its way out have been around for over a
year now (LaFrance and Meyer, 2014). Media pundits say an Internet publishing platform that
carried us into the mobile age is receding, and that what was once a sounding board for people to
organize ideas around different opinions has become a place where narratives have taken a turn for
the predictable in a cruel, petty and facetious way (LaFrance and Meyer, 2014). This waning
influence of Twitter, though its impact on publishing will likely remain, is just one of many
limitations of using Twitter-generated data.
Furthermore, there are other, more obvious limitations to using Twitter-generated data, as
it has not been proven to be highly representative of the actual population it purports to reflect. It is
with that limitation in mind that this study has been conducted; its aim is to determine to what degree
utilizing Twitter data, or data from other types of social network services, can be helpful in
research.
The spatial analysis component was included in this study as part of an effort to capture a
correlation between partisan identification and dietary preference that was more or less elusive in
the regression analyses. However, the spatial component still relied on the same figures that were
used in the regression analyses, and neither the linear nor the logistic regression yielded meaningful
information about how correlated the above-mentioned variables were. This study is aware of the
many limitations that each step of the analysis and the dataset had, but also that much of this was due to
the novel character and size of the data.
8. Conclusion
Research efforts to unpack and ultimately shift the determinants of health outcomes are
often hampered by deficits in existing data sources. For instance, national health surveys do not
include information on individuals’ party identification or other political beliefs. Similarly,
national sources of political data offer paltry measures of health behaviors or status (Gollust,
2013). Thus, creative data linking and collection are critical, and this study is just one example of
such an attempt.
More specifically, this thesis has set out to investigate how social media measures
compare with more standard ones in the context of examining the association of politics and
obesity. This study has employed two main methods, regression and spatial analyses, to see
whether a correlation between politics and obesity can be replicated. Spatial analysis was
marginally more helpful in replicating the correlation. One map, of the per capita number of
Twitter followers of Pizza Hut per state (Map 5), showed more similarity with the presidential
election map (Map 1). Another map, showing the per capita number of Pizza Huts per state
(Map 4), also shared similarities with the 2012 election map. Those maps, however, had enough
exceptions to the pattern that they are not above scrutiny.
Twitter data analysis, at least when using data on geographic locations of followers, is a
research method that requires a more careful calibration and scrutiny for its results to
complement a traditional research method.
This paper found that there is no discernible relationship, at the state level,
between the number of Pizza Hut followers per capita and whether the state is Republican-voting,
despite many patterns to the contrary. The relationships among the many variables that
serve as proxies for partisan identification and measures of obesity are sufficiently weak, the
measures are sufficiently coarse, and the units of analysis are large enough that a convincing
relationship between the number of Pizza Hut followers per capita and whether the state is
Republican-voting is not found. That said, this paper did establish that the number of Twitter
followers of Pizza Hut serves as a good predictor of how many Pizza Hut outlets there are in
each state, a promising development and one that could be extended to other domains where
social media data can help to capture hard-to-measure reality.
References
Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment Analysis of
Twitter Data. Proceedings of the Workshop on Languages in Social Media, 11, 30-38.
Retrieved March 11, 2015, from http://dl.acm.org/citation.cfm?id=2021114
Anderson, A., Goel, S., Huber, G., Malhotra, N., and Watts, D. (2014). Political Ideology and
Racial Preferences in Online Dating. Sociological Science 1 (February 18, 2014), 28-40.
Retrieved on April 26, 2015, from http://tinyurl.com/o92nkyv
Barry, C.L., Brescoll, V.L, Brownell, K.D., and Schlesinger, M. (2009). Obesity Metaphors:
How Beliefs About the Causes of Obesity Affect Support for Public Policy. Milbank
Quarterly, 87 (2009), 7-47.
Bulmer, J. (2013). Fast Food Map. Retrieved March 29, 2015 from https://github.com/hardwork-aunts/james-bulmer
Carney, D.R., Jost, J.T., Gosling, S.D., and Potter, J. (2008). The Secret Lives of Liberals and
Conservatives: Personality Profiles, Interaction Styles, and the Things They Leave
Behind. Political Psychology, 29 (2008), 807-840.
Cosper, D. (2013, May 13). 10 Biggest Fast Food Chains in the U.S. (By Location). Retrieved
May 30, 2015, from http://ezlocal.com/blog/post/10-largest-fast-food-chains-2013.aspx
Currie, J., DellaVigna, S., Moretti, E., and Pathania, V. (2009). The Effect of Fast Food
Restaurants on Obesity. Retrieved on April 27, 2015, from
http://eml.berkeley.edu/~sdellavi/wp/fastfoodJan09.pdf
Duffey, K. (2007). Differential Associations of Fast Food and Restaurant Food
Consumption with 3-y Change in Body Mass Index: The Coronary Artery Risk
Development in Young Adults Study. The American Journal of Clinical Nutrition, 85(1),
201-201. Retrieved March 9, 2015.
Fraser, L., Edwards, K., Cade, J., and Clarke, G. (2010). The Geography of Fast Food Outlets: A
Review. The International Journal of Environmental Research and Public Health, 7(5),
2,290-2,308. Retrieved April 26, 2015, from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898050
Gollust, S. (2013). Obesity in Red and Blue: Understanding the Associations Between Politics
and Obesity. Preventive Medicine, 57:5 (November 2013), 436-437. Retrieved on April
28, 2015, from http://www.sciencedirect.com/science/article/pii/S0091743513003071
Have, M., de Beauford, D., Teixeira, P.J., Mackenbach, J.P., van der Heide, A (2011). Ethics and
Prevention of Overweight and Obesity: An Inventory. Obesity Reviews, 12(9), 1.
Retrieved April 2, 2015.
Jeffrey, R., Baxter, J., McGuire, M., and Linde, J. (2006). International Journal of Behavioral
Nutrition and Physical Activity, 3:2. Retrieved on April 27, 2015, from
http://www.ijbnpa.org/content/3/1/2
Kersh, R. (2009). The Politics of Obesity: A Current Assessment and Look Ahead. The
Milbank Quarterly, 87(1), 295-316. Retrieved March 8, 2015, from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2879181
Kinder, D.R. (2006). Politics and the Life Cycle. Science, 312 (2006), 1,905-1,908.
Krugman, P. (2015, March 5). The Conscience of a Liberal. The New York Times.
Retrieved March 8, 2015, from http://krugman.blogs.nytimes.com/2015/03/05/heavypolitics
Krugman, P. (2015, March 6). Pepperoni Turns Partisan. The New York Times. Retrieved
March 8, 2015, from http://www.nytimes.com/2015/03/06/opinion/paul-krugmanpepperoni-turns-partisan.html
Kumar, S., Morstatter, F., and Liu, H. (2014). 1. Introduction. In Twitter Data Analytics.
New York, New York: Springer.
LaFrance, A. and Meyer, R. (2014, April 30). A Eulogy for Twitter. The Atlantic. Retrieved on
April 4, 2015, from http://www.theatlantic.com/technology/archive/2014/04/a-eulogyfor-twitter/361339
Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps
in Big Data Analysis. Science, 343 (March 14), 1,203-1,205. Retrieved on April 26,
2015, from http://gking.harvard.edu/publications/parable-google-flu%C2%A0traps-bigdata-analysis
Lindeke, B. (2012, November 6). Do Sidewalks Make You Vote Democratic? Streets.mn.
Retrieved on April 28, 2015, from http://streets.mn/2012/11/06/do-sidewalks-make-youvote-democratic
Makice, K. (2009). Twitter API: Up and running. Sebastopol, California: O'Reilly.
Martin, A. (2015, March 3). Inside the Powerful Lobby Fighting for Your Right to Eat
Pizza. Bloomberg Business. Retrieved March 8, 2015, from
http://www.bloomberg.com/news/features/2015-03-03/junk-food-s-last-stand-the-pizzalobby-is-not-backing-down
Mitchell, A, and Hitlin, P. (2013, March 4). Twitter Reaction to Events Often at Odds with
Overall Public Opinion. Pew Research. Retrieved on April 26, 2015, from
http://www.pewresearch.org/2013/03/04/twitter-reaction-to-events-often-at-odds-withoverall-public-opinion
Nyhan, B. (2014, October 24). Americans Don’t Live in Information Cocoons. The New York
Times. Retrieved on April 13, 2015, from
http://www.nytimes.com/2014/10/25/upshot/americans-dont-live-in-informationcocoons.html
Obesity Prevalence Maps. (2014, September 9). Retrieved March 8, 2015, from
http://www.cdc.gov/obesity/data/prevalence-maps.html
OpenSecrets (2015, February 2) Restaurants & Drinking Establishments: Top Contributors,
2013-2014. Retrieved March 8, 2015, from
http://www.opensecrets.org/industries/contrib.php?cycle=2014&ind=G2900
Pew Research. (2014, September 1). Social Networking Fact Sheet. Retrieved March 1, 2015,
from http://www.pewinternet.org/fact-sheets/social-networking-fact-sheet
Pew Research. (2015, January 9). Social Media Update 2014. Retrieved March 9, 2015, from
http://www.pewinternet.org/2015/01/09/social-media-update-2014
Pianin, E. (2014, June 19). Michelle Obama Battles Over School Lunch Nutrition
Standards. The Fiscal Times. Retrieved March 9, 2015, from
http://www.openculture.com/2014/02/kurt-vonnegut-masters-thesis-rejected-by-uchicago.html
Pianin, E., & Ehley, B. (2014, June 19). Budget Busting U.S. Obesity Costs Climb Past $300
Billion a Year. The Fiscal Times. Retrieved March 9, 2015, from
http://www.thefiscaltimes.com/Articles/2014/06/19/Budget-Busting-US-Obesity-CostsClimb-Past-300-Billion-Year
President Map (2012, November 29). Election 2012. The New York Times. Retrieved March 30,
2015, from http://elections.nytimes.com/2012/results/president
Rhodes, D., Adler, M., Clemens, J., LaComb, R., & Moshfegh, A. (2014, February 1).
Consumption of Pizza: What We Eat in America, NHANES 2007-2010. Retrieved March
9, 2015, from
http://www.ars.usda.gov/SP2UserFiles/Place/80400530/pdf/DBrief/11_consumption_of_
pizza_0710.pdf
Russell, M. (2011). 21 recipes for mining Twitter. Sebastopol, Calif.: O'Reilly Media.
Satcher, D. (2001, October 1). The Surgeon General's Call to Action to Prevent and Decrease
Obesity. Retrieved March 9, 2015, from
http://www.cdc.gov/nccdphp/dnpa/pdf/CalltoAction.pdf
Sherman, Erik. (2014, April 14). Many Twitter Users Don’t Tweet, Finds Report. CBS Money
Watch. Retrieved March 29, 2015, from http://www.cbsnews.com/news/many-twitterusers-dont-Tweet-finds-report
Shin, M., and McCarthy, W. (2013). The Association Between County Political Inclination and
Obesity: Results from the 2012 Presidential Election in the United States. Preventive
Medicine, 57:5 (November 2013), 721-724. Retrieved on April 27, 2015, from
http://www.sciencedirect.com/science/article/pii/S0091743513003046
Swanson, A. (2015, April 7). Map: The Most Liberal and Conservative Towns in Each State. The
Washington Post. Retrieved April 9, 2015:
http://www.washingtonpost.com/blogs/wonkblog/wp/2015/04/07/map-the-most-liberaland-conservative-towns-in-each-state/
Texas Wide Open For Business (2014) Fortune 500 Companies in Texas: The Lone Star State Is
Home to 52 Fortune 500 Corporate Headquarters. Retrieved on April 4, 2015, from
https://texaswideopenforbusiness.com/sites/default/files/11/14/14/texasfortune500.pdf
Trinko, K. (2014, March 10). Soda Ban? What About Personal Choice? Column. USA Today.
Retrieved April 2, 2015, from http://www.usatoday.com/story/opinion/2013/03/10/sodaban-what-about-personal-choice-column/1977091
Weller, K., Bruns, A., Burgess, J., Mahrt, M., & Puschmann, C. (Eds.). (2014). Twitter
and Society. New York: Peter Lang Publishing.
West, J. (2014, December 18). Listen to the Real Stephen Colbert Explain How He Maintained
His Flawless Character for 9 Years. Mother Jones. Retrieved on April 13, 2015, from
http://www.motherjones.com/mixed-media/2014/12/stephen-colbert-character-podcastartist-farewell
Wieder, R. (2013, March 21). Selling Unhealthy Food as “Freedom.” CalorieLab. Retrieved
April 2, 2015, from http://calorielab.com/news/2013/03/21/selling-unhealthy-food-asfreedom
Appendix
Table 1: Twitter followers of Pizza Hut (@pizzahut) by U.S. state. A randomly generated
sample of 7,500 yielded some 5,535 Twitter users. States with 100 or more followers are colored
red (see footnote 3), and the rest blue.

No.        1    2    3    4    5    6    7    8    9   10   11   12
abbr      AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI
follow    58  217   59   21  226  131   18    4  125   54   99  147

No.       13   14   15   16   17   18   19   20   21   22   23   24   25
abbr      IA   ID   IL   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO
follow   209   67  158  303   38   15    8  133    5  103   96    7   85

No.       26   27   28   29   30   31   32   33   34   35   36   37   38
abbr      MS   MT   NC   ND   NE   NH   NJ   NM   NV   NY   OH   OK   OR
follow    19    6   56  227  175   12   12    9   13   51   47   31  318

No.       39   40   41   42   43   44   45   46   47   48   49   50   51
abbr      PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY
follow   107  187   41    0   13   72   67   68    0   70   50    4    2

3 California and Oregon yielded unusually large numbers in most iterations. This may be because
these states have younger populations that are active on social media platforms.

Table 2: Summary statistics of combined data
Table 3: R commands (a selection) to cull randomly selected sample of Twitter followers
Table 4: Linear regression of the number of Twitter followers of Pizza Hut (DV) and the number of
Pizza Huts per state (IV), with no covariates

Coefficients:         Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)             20.1528       3.6783     5.479   1.47e-06 ***
No. of Pizza Huts        0.1184       0.1486     0.797   0.429

Notes: These are linear regression results of the number of Twitter followers of Pizza Hut (DV) and the number of Pizza Hut outlets per state (IV), with no covariates. The asterisks indicate statistical significance. Adjusted R-squared associated is -0.007352. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

Table 5: Linear regression of the number of Twitter followers of Pizza Hut (DV) with independent
variables the number of Pizza Huts and partisan identification

Coefficients:          Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)              17.2459       4.0087     4.302   8.27e-05 ***
No. of Pizza Huts         0.1143       0.1460     0.783   0.437
Republican Tendency       6.3618       3.8027     1.673   0.101

Notes: These are linear regression results of the number of Twitter followers of Pizza Hut (DV) and the number of Pizza Hut outlets per state (IV) and partisan identification (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. Adjusted R-squared associated is 0.02832. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.
Table 6: Linear regression of the number of Pizza Huts (DV) and the number of Twitter followers of
Pizza Hut outlets per state (IV) and partisan identification

Coefficients:               Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                   18.6860       3.7705     4.956   9.37e-06 ***
No. of Twitter Followers       0.1103       0.1409     0.783   0.437
Republican Tendency           -0.2677       3.8431    -0.070   0.945

Notes: These are linear regression results of the number of Pizza Huts (DV) and the number of Twitter followers of Pizza Hut outlets per state (IV) and partisan identification (Dummy; 1=Republican, 0=Democratic). The asterisks indicate statistical significance. Adjusted R-squared associated is -0.02823. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.

Table 9: Generalized Linear Model of the partisan identification (DV) and the number of Twitter
followers of Stephen Colbert, the per capita number of Twitter followers and the size of the
population per state.

Coefficients:                  Estimated   Std. Error   t-value   Pr(>|t|)
(Intercept)                    -0.930653     1.117349    -0.833   0.4049
No. of Twitter followers        0.062072     0.032648     1.901   0.0573 .
Per capita No. of followers    -0.398678     0.419067    -0.951   0.3414
No. of population By state     -0.002284     0.032286    -0.071   0.9436

Notes: These are generalized linear model results of partisan identification (DV; Dummy; 1=Republican, 0=Democratic) and the number of Twitter followers of Stephen Colbert per state (IV), the per capita number of Twitter followers and the size of the population per state. The asterisks indicate statistical significance. AIC associated is 71.167. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05, .: p-value < 0.1.