BELIEFS, COORDINATION AND MEDIA FOCUS

KRISTOFFER P. NIMARK AND STEFAN PITSCHNER
Abstract. News media provide an editorial service for their audiences by monitoring a
large number of events and by selecting the most newsworthy of these to report. Using a
Latent Dirichlet Allocation topic model to classify news articles we document the editorial
function of US newspapers.
We find that, while different newspapers tend to report on
different topics to different degrees, news coverage becomes more homogenous across newspapers
after major events. We present a simple theoretical model that can match this fact
and then use it to analyze the implications of the editorial function of news providers for
agents' beliefs and their ability to coordinate. We show that, compared to a setting where
agents choose ex ante what to get information about, their actions become more correlated
when they rely on the editorial function of news providers. Information about large events is
closer to common knowledge than information about small events. As a consequence, agents
respond more than proportionally to large events, and in expectations, do not respond at
all to events that are small enough.
1. Introduction
Every day, a vast number of events occur, each of them potentially relevant for the decisions
of firms and households. However, no individual firm or household has the resources to
observe all of these events. Instead, many rely on news media to monitor the world on
their behalf. One important function that news media perform is thus editorial. Among
all potential stories that occur, only those that are deemed most newsworthy are reported.
In this paper, we analyze how such a news-selection mechanism can affect the beliefs of
economic agents and their ability to coordinate.
Strategic decisions based on imperfect information are pervasive in economics. Producers
in oligopolistic markets need to predict the output of their competitors, speculators need to
predict whether other speculators plan to attack a currency, and price setters need to predict
the pricing decisions of other firms. In such settings, it is well known that public signals are
disproportionately influential as they tend to be particularly useful for agents that need to
predict the actions of other agents, e.g. Morris and Shin (2002). Arguably, everything that
is reported by news media is public in the sense that it is available to those who care to read
it. However, in reality, not all of this information is common knowledge. That is, not all
information that is publicly available is also observed by everybody, and not all information
that is observed by everybody is also known to be observed by everybody, and so on. In
this paper, we argue that understanding the editorial role played by news media is central
to understanding what determines the amount of news coverage a particular event receives
and the degree to which knowledge about an event is common among agents.

Date: December 6, 2015. Nimark: Economics Department, Cornell University, e-mail: pkn8@cornell.edu,
webpage: www.kris-nimark.net. Pitschner: Universitat Pompeu Fabra, e-mail: pitschner@gmail.com,
webpage: www.stefanpitschner.com. The authors thank Ed Green, Karel Mertens and conference and seminar
participants at Penn State, Sveriges Riksbank, Stockholm University, Helsinki University, SED 2015 and
Cornell University for useful comments and suggestions.

We begin by estimating a Latent Dirichlet Allocation (LDA) topic model based on texts
from almost 15,000 archived newspaper stories from 17 US newspapers. The newspaper
stories are from two periods that we know a priori contained major news events, namely
the 90 day period around the 9/11 terrorist attacks on New York and the Pentagon and
the 90 day period around the Lehman bankruptcy that signaled the start of the financial
crisis. We use the model to document three stylized facts of news coverage. First, different
newspapers specialize in different topics. For example, the Wall Street Journal allocated
more news coverage to the financial crisis than the average newspaper and the New York
Times allocated more than average coverage to presidential politics. Second, the extent
of total news coverage allocated to different topics varies over time and depends on what
has happened. Third, major events make news coverage more homogenous across different
outlets. The September 11 terrorist attacks, the 2008 political party conventions, the Lehman
bankruptcy and the failed bailout package proposed by then Secretary of the Treasury Hank
Paulson, were all events that resulted in a majority of newspapers devoting more coverage to
these events than to any other. Together, these facts suggest that information about major
events is closer to common knowledge than information about minor events.
In order to analyze how the documented editorial behaviour of news media affects agents'
beliefs and decisions, we propose a theoretical model with incomplete information that can
replicate the stylized facts described above. The model is a beauty contest game in which an
agent's pay-off depends on the distance of his action from an agent-specific latent variable
and the distance of his action from the action taken by other agents. This heterogeneity
in agents' pay-off functions is taken as given but could arise for various reasons, such as
differences in geographical location or sector affiliation. A basic premise of our model is
that the dimensionality of the state of the world is too high for individual agents to monitor
it on their own. Therefore, they rely on information providers that do so on their behalf.
Furthermore, because agents are heterogeneous in terms of what information they find most
useful, news providers specialize and cater to their different interests. However, because of
a strategic motive, the agents in our model also have an indirect interest in events that are
only important for predicting the actions of others. As a result, in some states of the world
all information providers report on the same events.
Agents in our model delegate the decision of what to get information about to specialized
information providers. These information providers can monitor a larger set of events than
they eventually end up reporting.
Because their decision about what to report depends
on the relative newsworthiness of the realized events, what agents get information about
depends on what has happened. The model presented here formalizes this editorial function
of news media and thus provides a theory of how and why news media focus changes over
time. While the model is abstract, it offers several insights that we believe are general.
One consequence of a state-dependent news selection is that reported news stories can
be informative about more than the events they actually cover. More precisely, we derive
formal conditions for when news reports also reveal information about those events that are
not reported. To see this, consider a person who opens a San Francisco newspaper and
finds that it only contains stories about New York. If this person knows that the paper
always covers all important San Francisco events, the lack of stories on such events reveals
to him that none have actually taken place. Therefore, even though he only reads about New
York, he can also update his beliefs about San Francisco. Moreover, because this type of
information transmission results directly from the systematic news-selection, it also occurs
if the realizations of the reported and unreported events are unconditionally independent.
The systematic selection of what gets reported also affects the degree to which knowledge
about an event is common among agents. In the existing imperfect information literature,
signals are typically assumed to be either private or common knowledge, e.g. Morris and
Shin (2002), Angeletos and Pavan (2007), Angeletos, Hellwig and Pavan (2007), Hellwig and
Veldkamp (2009), Amador and Weill (2010, 2012), Cespa and Vives (2012) and Edmond
(2013). In our model, information about a particular event is typically neither private nor
common knowledge. Instead, the degree to which knowledge about an event is common
among agents is endogenous and depends probabilistically on agents' preferences and the
distribution of events. Because news selection is state-dependent, what agents get information
about also influences how probable they think it is that other agents read about the same
event. As an example, consider again a person living in San Francisco. If the San Francisco
newspaper reports about some event in Manhattan that normally would be of more interest
to a reader from New York, the reader in San Francisco can infer that New Yorkers are
probably also reading about that event. However, even though both San Franciscans and
New Yorkers are reading about the same event, this event may not be common knowledge:
While the San Franciscan can be sure that the New Yorker is also reading about the event
in Manhattan, the New Yorker cannot draw a corresponding conclusion.
When extreme events such as large terrorist attacks or major financial crises occur, they
tend to be reported on the front page of almost all major newspapers. In the model, events of
low probability and large magnitude are important for all agents even if only a subset of them
has a direct interest in these events. Because individual agents care about the strong actions
that some agents will take in response to the extreme event, such events tend to be reported
by all information providers. Moreover, because agents understand this, information about
more extreme events also tends to be closer to being common knowledge. So, when a person
in New York reads about a major financial crisis on Wall Street, he can be almost sure that
people in San Francisco are reading about the same event.
The agents in our model cannot directly observe the entire state of the world. This makes them
similar to the rationally inattentive agents in Sims (2003), Mackowiak and Wiederholt (2009,
2010), Alvarez, Lippi and Paciello (2011), Matejka (forthcoming), Matejka and McKay (2015)
and Stevens (2014) as well as to agents that need to pay a cost to observe a signal about
pay-off relevant latent variables such as those in Grossman and Stiglitz (1980), Veldkamp
(2006a, 2006b) and Van Nieuwerburgh and Veldkamp (2009, 2010). The key difference of our
setup relative to such existing endogenous information choice models is that the agents in our
model rely on specialists that monitor the entire set of realized events before deciding what
to report. That is, our agents make ex ante choices of which newspapers to read while each
newspaper makes its editorial decision of what to report ex post, i.e. after the state of the
world has realized. This setup captures the fact that the decision to acquire information
often is a decision about which information provider to use, rather than a decision about
what variable or event to get information about.¹
Two papers that are closely related to this one are Veldkamp (2006b) and Hellwig and
Veldkamp (2009). Veldkamp studies a model in which ex ante identical agents choose asset
portfolios and signals simultaneously. If different agents hold different portfolios, they prefer
to observe signals about the pay-offs of different assets. However, due to increasing returns
to scale in information production, agents tend to purchase similar signals and hold similar
portfolios. Apart from the delegated information acquisition decision described above,
the main difference between our model and Veldkamp's is that our agents are intrinsically
heterogeneous in a way that is not directly affected by the information they observe.
In the coordination game in Hellwig and Veldkamp (2009), ex ante identical agents can
choose to observe different combinations of private and public signals about a single latent
variable of common interest. Hellwig and Veldkamp show that in such a setting, information
acquisition inherits the strategic properties of the coordination game. Thus, if there is a
strategic complementarity in actions, agents also want to buy the same signals as other
agents. The main difference between that paper and ours is that the agents in Hellwig and
Veldkamp's model do not choose what to get information about, but whether the noise in
their signals is common to all agents or idiosyncratic.
Gentzkow and Shapiro (2006, 2008), like our paper, study the editorial function of news
media, but are primarily focused on identifying and analyzing the causes and consequences
of ideologically slanted reporting. The political science literature has also studied the role
of news journalists and newspaper editors as "gatekeepers" that decide what information
gets reported, e.g. Soroka (2006, 2012) and Soroka, Stecula and Wlezien (2014). Again, this
literature focuses primarily on analyzing and documenting ideologically biased reporting.
The rest of the paper is structured as follows. In the next section we document several
stylized facts about news coverage using a statistical topic model applied to US newspaper
data. Section 3 presents the basic set up of a beauty contest-style model in which agents
have heterogenous interests that can match the documented facts. Section 4 presents formal results based on discrete distributions of events and Section 5 extends the analysis to
continuous distributions. Section 6 concludes.
2. Three Stylized Facts of News Coverage
In this section, we estimate a Latent Dirichlet Allocation (LDA) topic model based on texts
from a large number of archived newspaper articles. We then use this model to document
three stylized facts about news coverage. In particular, we show that different newspapers
specialize in different topics, that the weights they assign to topics depend on what has
happened, and that major events make news coverage more homogeneous across papers.
2.1. The News Data. Our empirical analysis focuses on two 3-month periods that
we know contained major news events. The first period covers the months August
to October of 2001, encompassing the terrorist attacks on the World Trade Center and the
Pentagon on September 11. The second period runs from August to October of 2008 and
thus encompasses the Lehman Brothers bankruptcy as well as the outbreak of the financial
crisis.

¹ The motive of our agents is well captured by Marschak (1960), who writes that "The man who buys a
newspaper does not know beforehand what will be in the news. He acquires access to potential messages
belonging to a set called news."

Table 1. Newspapers in Database

Newspaper Full Name             Short Name    Newspaper Full Name             Short Name
Atlanta Journal                 AJ            The Las Vegas Review-Journal    LVR
Charleston Gazette              CG            The New York Times              NYT
Pittsburgh Post-Gazette         PPG           The Pantagraph                  PG
Portland Press Herald           PPH           The Philadelphia Inquirer       PI
Sarasota Herald-Tribune         SHT           The Wall Street Journal         WSJ
St. Louis Post-Dispatch         SLP           The Washington Post             WP
Telegram & Gazette Worcester    TGW           USA Today                       UT
The Boston Globe                BG            Winston-Salem Journal           WiSJ
The Evansville Courier          EC

Notes: The table shows the full names of the newspapers whose front-page articles are in our text corpus. It
also shows corresponding short names used in the empirical analysis below. Newspapers that have changed
their names over time or have merged are combined into one entry.

The data we use are parts of news articles obtained from the Dow Jones Factiva database.
Factiva contains historical content from more than 30,000 newspapers, wire services and
online sources beginning in 1970. We exclude content from wire services since their main
audiences are other news organizations. We also limit our data set to articles that appeared
either on front pages of newspapers or on the first pages of their general interest sections.
In total, we obtain data from 14,817 articles reported by 17 different US newspapers.
Each of these articles is stored in our data set in the form of a text snippet that typically
comprises its first one or two sentences. The selection of newspapers includes all those for
which we are able to reliably identify the stories that appeared on their front pages or the
first pages of their general interest sections. Table 1 contains an overview of the newspapers
in our database as well as corresponding short names that we use in the analysis below. To
illustrate the type of information that the text snippets contain, Table 2 shows a number of
examples.
2.2. Latent Dirichlet Allocation. To extract topics from our text corpus, we estimate
a Latent Dirichlet Allocation (LDA) topic model. Introduced in Blei et al (2003), Latent
Dirichlet Allocation is now one of the most widely applied tools in machine learning and
natural language processing. Variants of it have been used, for example, to identify scientific
topics (Griffiths and Steyvers, 2004) and to classify micro blogs (Ramage et al, 2010). The
first application to economics or finance that we are aware of is Mahajan, Dey and Haque
(2008), who use it to classify financial news articles. More recently it has also been used by
Bao and Datta (2014) to discover risk factors disclosed in annual corporate filings. Furthermore,
Fligstein, Brundage and Schultz (2014) as well as Hansen, McMahon and Prat (2015)
have used Latent Dirichlet Allocation to analyze FOMC transcripts.

Table 2. Sample Text Snippets of Newspaper Articles in the Database

Text Snippet                                      Newspaper                    Publication Date
An 18-year-old student who wounded five           The New York Times           2001/10/31
people at his suburban San Diego high
school earlier this year committed suicide,
hanging himself with a sheet in his jail cell.
The student, Jason Anthony Hoffman,
pleaded guilty last month in the ...

Passengers returned to US airports in             The Washington Post          2001/09/15
increasing numbers yesterday to find long
lines, layers of new security and limited
service. But many travelers were able to
reach their destinations as more than a
third of the usual number of ...

A day after dividing their votes on a failed      Portland Press Herald        2008/10/01
proposal for a 700 billion Wall Street
bailout, Maine's two US House members
agreed Tuesday that it's vital for lawmakers
to pass a relief bill for credit markets.

In a case that could have dramatic                The Philadelphia Inquirer    2008/09/10
consequences for school districts and towns
across Pennsylvania, the state Supreme
Court will hear arguments today on the
constitutionality of the commonwealth's
property-tax system, which raises more ...

Notes: The table shows examples of the text snippets used to estimate the LDA topic model below. The text
snippets were extracted from the Dow-Jones Factiva database. The dates shown are those on which the articles
were originally published in the print-editions of the respective newspapers.

Using LDA allows us to discover and quantify the topics of a very large number of news
texts without relying on manual classifications or pre-defined categories. Moreover, because
LDA defines articles as mixtures of different topics, it can accommodate the fact that many
news stories talk about more than one specific issue. For example, it can capture that an
article about a government bailout package may discuss both politics and financial markets.
The main parameter of choice researchers need to set before estimating an LDA model
is the number of topics. Once this number has been set, the actual topics are formed
endogenously and are thus outputs of the estimated model. Relative to approaches that use
word counts to measure news coverage, e.g. Baker, Bloom and Davis (2013), LDA
therefore does not require researchers to pre-specify words or topics of interest. Another desirable
property of LDA is that it captures not only changes in the importance of a topic over time,
but also how important that topic is in an absolute sense.
The text data or corpus used for estimating an LDA topic model is described by a
vocabulary, which is a list of all words that it contains, and documents, which are partitions of the
text corpus into separate "pieces". In our case, each text snippet from a news article is one
document, and all text snippets together form the corpus. Generally speaking, an LDA topic
model can be thought of as a latent structure that could have generated the observed text
corpus following probabilistic rules. It is parameterized by (i) a distribution over topics that
determines the probability that a document belongs to a topic and (ii) a distribution over
the words in the vocabulary that defines each of the topics. In the LDA framework, each
document in a corpus can be thought of as having been generated by the following steps:
(1) Draw a set of topic weights from the distribution over topics.
(2) Draw N topics from this document-specific topic distribution, with N being the
number of words in the document.
(3) Draw one word from each of these N topics.
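The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word vocabulary, the two topic-word distributions in `beta`, the symmetric Dirichlet parameter `alpha` and the function names are all hypothetical and are not taken from the paper's estimated model.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet distribution with parameters alpha."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(topic_word_probs, vocabulary, n_words, alpha, rng):
    """Generate one document following the three LDA steps."""
    n_topics = len(topic_word_probs)
    # Step 1: draw the document-specific topic weights.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    # Step 2: draw one topic per word position from these weights.
    z = rng.choices(range(n_topics), weights=theta, k=n_words)
    # Step 3: draw one word from each of the drawn topics.
    words = [rng.choices(vocabulary, weights=topic_word_probs[t], k=1)[0] for t in z]
    return theta, z, words


# Hypothetical toy inputs: two topics over a four-word (stemmed) vocabulary.
rng = random.Random(0)
vocabulary = ["bank", "market", "vote", "senat"]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # a "finance"-like topic
    [0.0, 0.0, 0.5, 0.5],  # a "politics"-like topic
]
theta, z, words = generate_document(beta, vocabulary, n_words=8, alpha=0.5, rng=rng)
print(theta, z, words)
```

Because each word position gets its own topic draw, a single document can mix words from several topics, which is the mixed membership property discussed in this section.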
To describe the LDA model more formally, we index topics by $k \in \{1, 2, ..., K\}$, documents
by $d \in \{1, 2, ..., D\}$, the words in the vocabulary by $v \in \{1, 2, ..., V\}$, and words in a document
by $n \in \{1, 2, ..., N\}$. The probability of a specific text corpus being generated is then given
by the distribution
$$p(\beta, \theta, z, w) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} p(\theta_d) \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d) \, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right) \qquad (2.1)$$
where $\beta$, $\theta$ and $z$ are unobserved parameters. The rows of the $K \times V$ matrix $\beta$ contain the
word distribution $\beta_k$ for topic $k$, the $K \times D$ matrix $\theta$ contains the topic proportions $\theta_d$ of
document $d$ so that $\theta_{d,k}$ is the proportion of words in document $d$ drawn from topic $k$. The
topic assignment of document $d$ is $z_d$ so that word $n$ in document $d$ is drawn from topic $z_{d,n}$.
The text corpus enters the distribution (2.1) through the matrix $w$, defined so that the words
observed in document $d$ form the vector $w_d$ and $w_{d,n}$ is word $n$ in document $d$.
There are two underlying properties that are particularly important here. First, LDA
is a mixed membership model. This implies that each document may belong to different
topics to different degrees. As discussed above, this is helpful for our application as it
allows newspaper articles to be treated as belonging to several topics at the same time.
For example, an article could be classified as belonging to the topics congressional politics
and financial crisis with topic weights 0.4 and 0.6, respectively. Second, the order and
grammatical structure of words within documents is assumed to be irrelevant. While this
so-called bag-of-words assumption makes LDA inappropriate for the extraction of detailed
grammatical relationships between words in a text, it is useful for discovering the topics
these texts generally describe.
In order to apply Latent Dirichlet Allocation to an observed text corpus, the generative
process described above needs to be inverted. The posterior distribution for the latent
parameters conditional on the text corpus can be formed by dividing the likelihood function
(2.1) by the probability of observing that corpus
$$p(\beta, \theta, z \mid w) = \frac{p(\beta, \theta, z, w)}{p(w)}. \qquad (2.2)$$
Evaluating the denominator in (2.2) is computationally infeasible as it entails integrating
over the distributions of the latent parameters. However, there are several methods that
can be used to approximate the distribution, see Asuncion, Welling, Smyth and Teh (2009).
Here, we rely on the collapsed Gibbs sampling algorithm of Griffiths and Steyvers (2004) to
estimate $\beta$, $\theta$ and $z$. Both the limited number of discretionary decisions required for the
LDA estimation and the fact that topics emerge from the analysis without having to pre-define
them are particularly attractive for our application. These properties allow us to
analyze the thousands of documents in our database in an objective and replicable manner.
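To make the estimation idea concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA, using only the Python standard library. It is not the authors' code. The hyperparameters `alpha` and `eta`, the number of iterations, the toy corpus and the use of only the final sweep for the point estimates are all illustrative assumptions. Documents are represented as lists of vocabulary indices, and `theta` is stored with one row per document rather than as the K × D matrix used in the text.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Estimate LDA by resampling each word's topic given all other assignments."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                              # total words assigned to topic k
    z = []
    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment of this word from the counts.
                k = z[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta)
                    / (topic_total[j] + vocab_size * eta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights, k=1)[0]
                # Record the new assignment.
                z[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Point estimates from the final state of the sampler.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    beta = [
        [(c + eta) / (topic_total[k] + vocab_size * eta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, beta, z


# Hypothetical toy corpus: three documents over a four-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
theta, beta, z = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4, n_iter=50)
```

The sampler never evaluates p(w) in (2.2). Each step only needs the probability of one topic assignment conditional on all the others, which depends on simple counts.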
2.3. Estimation. To be able to estimate the LDA model using the approach described
above, we first have to translate the raw newspaper texts into a vector-space representation
that captures their word frequencies. For this, we break the text down into single words
and remove a number of very common terms that have little informative value in bag-of-words
models, see Blei et al (2009). Then, we remove word suffixes using the Porter (1980)
stemming algorithm. This step allows us to group closely related words such as "presidential"
and "president" or "worker" and "workers" and thus reduces the size of the resulting vector
space. For computational reasons, we also limit our vector space to words that occur at least
200 times.
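The preprocessing pipeline described above can be sketched as follows. The sketch makes several assumptions that should not be read as the authors' implementation: the stop-word list is a tiny illustrative set, `crude_stem` is a toy suffix stripper that merely stands in for the Porter (1980) stemmer (it is not the Porter algorithm), and the frequency threshold is lowered from 200 to 2 so that the toy corpus is not empty.

```python
import re
from collections import Counter

# Illustrative stop words only; not the list used in the paper.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "that"}


def crude_stem(word):
    """Toy suffix stripper standing in for the Porter stemmer (NOT the Porter algorithm)."""
    for suffix in ("ial", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_term_counts(texts, min_count):
    """Tokenize, drop stop words, stem, filter rare terms and count words per document."""
    tokenized = []
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        tokenized.append([crude_stem(t) for t in tokens if t not in STOP_WORDS])
    # Keep only stems that occur at least min_count times in the whole corpus.
    totals = Counter(t for doc in tokenized for t in doc)
    vocabulary = sorted(t for t, c in totals.items() if c >= min_count)
    index = {t: i for i, t in enumerate(vocabulary)}
    # Vector-space representation: one row of word counts per document.
    counts = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for t in doc:
            if t in index:
                row[index[t]] += 1
        counts.append(row)
    return vocabulary, counts


texts = ["The president met workers.", "A presidential worker and the president."]
vocabulary, counts = build_term_counts(texts, min_count=2)
print(vocabulary)  # ['president', 'worker']
print(counts)      # [[1, 1], [2, 1]]
```

In this toy run "presidential" and "president" collapse to one stem, as do "workers" and "worker", while "met" is dropped because it occurs only once.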
The number of topics in the benchmark model is set to 10. While choosing a larger number
can generally result in more of the topics having a clear interpretation, it can also yield a
classification that is too fine for subsequent analyses. We estimate a single LDA model using
the texts from both 2001 and 2008 jointly. This allows for the possibility that some topics
may have a timeless dimension. For instance, the vocabulary used in sports related articles
may change little over time and form a topic that is present in news articles in both 2001
and 2008.²

2.4. Estimated LDA Topics. Table 3 shows the topics identified by our estimated LDA
model in terms of their highest-probability words. We find that several of the topics that
emerge from our estimation are intuitively meaningful. For example, Topic 1 relates to the
war in Afghanistan, Topic 2 relates to the candidates of the 2008 US presidential elections,
and Topic 9 covers the September 11 terrorist attacks. Furthermore, a relatively clear
interpretation can also be attached to Topic 5, which seems to capture both the financial crisis
and the reactions of the US government to it.

² If no topic occurs in both periods and the number of documents is approximately the same for the
two periods, estimating a joint LDA model for both time periods with 10 topics should yield the same topics
and assigned topic weights as if we were to estimate two separate models with 5 topics for each period.

Table 3. Estimated LDA Topics: High-Probability Words

Topic    Words with the highest assigned probabilities (in descending order)
1        presid bush afghanistan washington unit today state militari taliban said
2        john democrat obama republican mccain presidenti campaign barack sen candid
3        school citi new counti student high year univers worcest state
4        year two old ago day like today aug just bank
5        financi washington bush billion presid hous plan market bank wall
6        year state million new cut percent month rate price compani
7        mail west state daili virginia staff report get new work
8        state yesterday offici anthrax feder court said offic investig washington
9        attack terrorist new world sept york center trade airport washington
10       year polic old said man review offic counti two journal

Notes: For each of the 10 topics estimated using Latent Dirichlet Allocation, the table shows the 10 words with
the highest probabilities of occurring in that topic. The order of words is descending in terms of the probabilities
assigned to them in the given topic. All words have been stemmed using the Porter (1980) stemmer.

To get a more complete understanding of these four topics and their associated word
probabilities, we also plot them in the form of word clouds (Figure 1). These graphical
representations show a larger number of words for each topic, reflecting their probabilities
within a given topic in terms of the sizes at which they are displayed. The interpretations
of the four topics that we derive based on the word clouds reinforce the ones obtained from
the high-probability words shown in Table 3.
2.5. Different Newspapers Specialize in Different Topics. The first specific aspect of
newspaper coverage that we assess using the estimated LDA model concerns the extent to
which newspapers are specialized. In other words, we investigate the extent to which different
newspapers tend to over- or underweight different topics relative to the overall average. For
this purpose, Figure 2 plots normalized deviations of newspaper-specific topic probabilities
for the same four topics discussed above.³
The plots document that there is a large amount of variation in terms of which newspapers
tend to cover which topics. For example, the financial crisis as captured by Topic 5 received
more than twice as much coverage in the Wall Street Journal as it did in the hypothetical
average outlet. Similarly, both the New York Times and USA Today allocated a larger
fraction of their news coverage to the September 11 terror attacks than the average newspaper
in our sample. These deviations suggest that newspapers do indeed specialize, resulting in
coverage that is heterogeneous in the cross-section of outlets.

³ We calculate these normalized deviations as $d_{i,j} = \frac{p_{i,j} - p_j}{p_j}$, with $p_{i,j}$ denoting the probability that newspaper
$i$ reports on topic $j$ and $p_j = \frac{1}{I} \sum_{i=1}^{I} p_{i,j}$ being the corresponding average across all $I$ newspapers.
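
The normalized deviation defined in this footnote can be computed directly, as in the short sketch below. The probabilities for two newspapers and two topics are made-up numbers for illustration only, and the function name is hypothetical.

```python
def normalized_deviations(p):
    """Return d[i][j] = (p[i][j] - p_j) / p_j, where p_j averages p[i][j] over newspapers i."""
    n_papers = len(p)
    n_topics = len(p[0])
    averages = [sum(p[i][j] for i in range(n_papers)) / n_papers for j in range(n_topics)]
    return [
        [(p[i][j] - averages[j]) / averages[j] for j in range(n_topics)]
        for i in range(n_papers)
    ]


# Hypothetical topic probabilities: rows are newspapers, columns are topics.
p = [
    [0.6, 0.4],
    [0.2, 0.8],
]
d = normalized_deviations(p)
print(d)  # first entry is 0.5 up to rounding: newspaper 0 covers topic 0 at 1.5 times the average
```

Under this definition a value of 0 means average coverage, a value of 1 means twice the average, and -1 means no coverage at all.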

Figure 1. Estimated LDA Topics: Word Clouds of Selected Topics
[Word clouds for four topics. Panels: Topic 1: Afghanistan; Topic 2: 2008 Presidential Candidates;
Topic 5: Financial Crisis and Bailouts; Topic 9: Terror Attacks.]
Notes: The word-clouds illustrate the probabilities associated with specific words in the topics estimated using
Latent Dirichlet Allocation. Words with higher probabilities are shown in a larger size. All words were stemmed
using the Porter (1980) stemmer. The topics correspond to those shown in Table 3.

2.6. Major Events Shift News Focus and Increase the Homogeneity of News.
Next, we wish to assess how major events affect news coverage along two specific dimensions:
the average emphasis specific topics receive, as well as the homogeneity of news coverage
in the cross-section of outlets. For this, we explore time variation in the estimated topic
probabilities as well as their distribution across newspapers. If major events do indeed
affect the focus of news coverage and its cross-sectional homogeneity, we would expect the
September 11 terrorist attacks, the nominations of presidential candidates and the outbreak
of the financial crisis to be associated with such behavior.
Figure 2. Newspaper Specialization: Probabilities of Selected Topics
[Four bar-chart panels: Topic 1: Afghanistan; Topic 2: 2008 Presidential Candidate Nominations; Topic 5: Financial Crisis and Bailouts; Topic 9: Terror Attacks. Horizontal axis: newspapers by short name (AJ, CG, PPG, PPH, SHT, SLP, TGW, BG, EC, LVR, NYT, PG, PI, WSJ, WP, UT, WiSJ). Vertical axis: normalized deviation of news focus.]
Notes: The figure illustrates the specialization of newspapers on different topics. The topics correspond to those shown in figure 1 and table 3. The short names of newspapers correspond to those in table 1. The normalized topic-specific deviations of news focus are calculated as $d_{i,j} = (p_{i,j} - p_j)/p_j$, with $p_{i,j}$ denoting the probability that newspaper $i$ reports on topic $j$ and $p_j = \frac{1}{I}\sum_{i=1}^{I} p_{i,j}$ being the corresponding average across all $I$ newspapers.
To assess if this is the case, we use two different measures. First, we calculate overall topic probabilities at a daily frequency by averaging the estimated topic probabilities of all stories in our database for a given day. The fraction $F_{t,k}$ of total news devoted to topic $k$ at date $t$ is thus given by
$$F_{t,k} \equiv \frac{\sum_{d} \theta_{t,d,k}}{D_t} \tag{2.3}$$
Figure 3. 2001 Terror Attacks: Time-Variation of Average Topic Probabilities and Homogeneity of Coverage Across Newspapers
[Plot a: average topic probabilities. Plot b: homogeneity of news coverage across outlets, with the events "Terror Attacks" and "Beginning of Afghanistan War" marked. Vertical axes: fraction, from 0 to 1. Horizontal axes: date, from 08/07/01 to 10/26/01.]
Notes: The figure illustrates time-series variation in the probabilities assigned to the estimated topics and the cross-sectional homogeneity in newspaper coverage. The time horizon shown is 08/01/2001 to 10/31/2001. Only days with coverage of at least 10 newspapers are shown. The topics correspond to those shown in table 3. The topic probabilities for a specific day shown in plot a are defined as the simple average of the corresponding probabilities of all articles in the database for that day. The homogeneity measure shown in plot b is defined as the fraction of newspapers for which the highest-probability topic is the same one that also carries the highest probability across all articles published on that day.
where $D_t$ is the total number of articles in the sample from day $t$.
Second, to assess homogeneity in news coverage across newspapers, we consider to what extent the outlets agree on which topic is most important on a given day. For this, we first identify the topic that has the highest probability across all articles of a given day. Then, we calculate the fraction of newspapers that assign the highest weight to that same topic,
Figure 4. 2008 Financial Crisis: Time-Variation of Average Topic Probabilities and Homogeneity of Coverage Across Newspapers
[Plot a: average topic probabilities. Plot b: homogeneity of news coverage across outlets, with the events "Presidential Candidate Nominations", "Lehman Bankruptcy" and "Failed Bailout" marked. Vertical axes: fraction, from 0 to 1. Horizontal axes: date, from 08/10/08 to 10/29/08.]
Notes: The figure illustrates time-series variation in the probabilities assigned to the estimated topics and the cross-sectional homogeneity in newspaper coverage. The time horizon shown is 08/01/2008 to 10/31/2008. Only days with coverage of at least 10 newspapers are shown. The topics correspond to those shown in table 3. The topic probabilities for a specific day shown in plot a are defined as the simple average of the corresponding probabilities of all articles in the database for that day. The homogeneity measure shown in plot b is defined as the fraction of newspapers for which the highest-probability topic is the same one that also carries the highest probability across all articles published on that day.
i.e. homogeneity $H_t$ of news coverage on day $t$ is defined as
$$H_t \equiv \frac{\sum_{m} I\left(\arg\max_k F_{t,m,k} = \arg\max_k F_{t,k}\right)}{M}$$
where $F_{t,m,k}$ is the fraction of news coverage devoted to topic $k$ by newspaper $m$ at time $t$, $I$ is an indicator function that takes the value 1 when the equality in brackets holds, and $M$ is the total number of newspapers. The range of $H_t$ is thus between 0 and 1, with 1 indicating that all newspapers agree on which topic is the most important one.
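As an illustration of the two measures, the following minimal Python sketch computes $F_{t,k}$ as in (2.3) and the homogeneity measure $H_t$ for a single day. The toy topic probabilities, the newspaper labels and the function names are hypothetical and are not taken from the data set; the newspaper-level shares $F_{t,m,k}$ are assumed here to be simple averages over each paper's articles.

```python
from collections import defaultdict

def daily_topic_shares(articles):
    """F_{t,k}: average topic probability over all articles of one day.

    `articles` is a list of (newspaper, theta) pairs, where theta is the
    list of topic probabilities of one article (hypothetical toy input).
    """
    n_topics = len(articles[0][1])
    return [sum(theta[k] for _, theta in articles) / len(articles)
            for k in range(n_topics)]

def homogeneity(articles):
    """H_t: share of newspapers whose top topic equals the overall top topic."""
    by_paper = defaultdict(list)
    for paper, theta in articles:
        by_paper[paper].append((paper, theta))
    overall = daily_topic_shares(articles)
    top_overall = overall.index(max(overall))
    agree = 0
    for paper_articles in by_paper.values():
        # F_{t,m,k}, assumed to be a simple average over the paper's articles.
        shares = daily_topic_shares(paper_articles)
        if shares.index(max(shares)) == top_overall:
            agree += 1
    return agree / len(by_paper)

# Toy day with three hypothetical newspapers and three topics.
day = [
    ("NYT", [0.7, 0.2, 0.1]),
    ("NYT", [0.5, 0.3, 0.2]),
    ("WSJ", [0.6, 0.3, 0.1]),
    ("WP",  [0.2, 0.7, 0.1]),
]
F = daily_topic_shares(day)
H = homogeneity(day)
print(F, H)
```

In this toy day, two of the three papers put most weight on the same topic as the pooled sample, so the homogeneity measure is two thirds.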
Figure 3 shows the evolution of both of these measures for the period August to October 2001. The top panel illustrates that on September 12 and the following days newspapers assigned very high weights to the terrorism topic (topic 9), displayed in light red. Furthermore, a second pronounced change in the average topic probabilities occurs on October 8, the day after the war in Afghanistan began (topic 1). In addition to these changes in average topic probabilities, we can see from the bottom panel that the same two days are also associated with pronounced increases in topic homogeneity. That is, both the terror attacks and the beginning of the Afghanistan war caused coverage to become more similar across newspapers.
For the second period used in our analysis, i.e. August to October 2008, the same exercise is repeated in Figure 4. Here, too, several events stand out in the sense that they seem to affect both the focus of news coverage and its cross-sectional homogeneity. First, the presidential nomination conventions received high levels of attention and caused an increase in homogeneity. Then, the Lehman Brothers bankruptcy on September 15 caused another spike. Finally, a last spike occurs on September 30, the day after the Emergency Economic Stabilization Act of 2008 failed to pass the US House of Representatives.
3. A Beauty Contest Model with State Dependent News Selection
Above we documented some stylized facts about news coverage that can be attributed to the editorial decisions of newspapers. Below, we present a theoretical model that can explain these facts and help us understand the role that the editorial decisions of newspapers play in determining agents' beliefs and actions. The model is an abstract beauty contest game in the spirit of Morris and Shin (2002) in which agents' pay-offs depend on the distance of their actions from a latent variable as well as the distance of their actions from other agents' actions. However, we depart from the original model in several important ways.
First, agents have heterogeneous interests in the sense that different agents want their actions to be close to different latent variables. Second, agents are constrained in the number of stories that they can read about and therefore delegate the information acquisition to specialist information providers that can monitor a large set of events on the agents' behalf. Each information provider is characterized by a news selection function, which is a mapping from states of the world to a set of reported events. The news selection functions formalize the editorial decisions of newspapers, and below we will analyze in detail how they affect agents' beliefs and ability to coordinate.
3.1. Information consumers with heterogeneous interests.
Our model is populated by the two information consumers Alice and Bob. They live in a world with two potential stories, $X_a$ and $X_b$. A potential story $X_i$ is a random variable that takes values in $\mathcal{X}$ and an event $x_i$ is a particular realization of $X_i$. The state of the world is described by the pair $(x_a, x_b) \in \Omega$ where $\Omega = \mathcal{X} \times \mathcal{X}$ is the set of all (joint) events. An event is of interest to Alice or Bob if their utility increases as a result of knowing about it.4
3.1.1. Utility and heterogeneous interests.
Alice and Bob have different interests and this heterogeneity is introduced via their utility functions. The basic set-up is a two-person beauty contest game in which Alice wants to take an action $y_a$ that is close to both the latent variable $x_a$ and Bob's action $y_b$. This is formalized by the following utility function for Alice
$$U_a = -(1-\lambda)(y_a - x_a)^2 - \lambda (y_a - y_b)^2. \tag{3.1}$$
If Bob also wanted to take an action that was close to $x_a$ and close to Alice's action $y_a$, this setup would be a two-person version of the beauty contest in Morris and Shin (2002). However, we introduce heterogeneity among the agents by making Bob want to take an action that is close to $x_b$. Bob's utility function $U_b$ is symmetric to Alice's and given by
$$U_b = -(1-\lambda)(y_b - x_b)^2 - \lambda (y_b - y_a)^2 \tag{3.2}$$
where $y_a$ is the action taken by Alice. We say that Alice has a direct interest in $X_a$ because her utility depends directly on the realized value of $x_a$. Symmetrically, Bob has a direct interest in $X_b$. The parameter $\lambda$ governs the strength of the strategic motive. Because of this strategic motive, Alice has an indirect interest in knowing about $X_b$ since that may help her better predict Bob's action. Symmetrically, Bob has an indirect interest in knowing about $X_a$.
Alice's optimal action $y_a$ is given by the first order condition
$$y_a = (1-\lambda) E_a[x_a] + \lambda E_a[y_b] \tag{3.3}$$
where $E_a$ denotes the expectations operator conditional on Alice's information set. (A symmetric expression describes Bob's optimal action.) If agents could observe both $x_a$ and $x_b$ directly, the equilibrium decision would be described by
$$y_i = \frac{1}{1+\lambda} x_i + \frac{\lambda}{1+\lambda} x_j : i, j \in \{a, b\}, \; i \neq j \tag{3.4}$$
However, Alice and Bob observe neither $X_a$ nor $X_b$ directly and instead have to rely on information providers who monitor the state of the world on their behalf.
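For completeness, the steps from the utility function (3.1) to the first order condition (3.3) and the full-information solution (3.4) can be spelled out as follows. This is a sketch of the standard algebra, in which expectations coincide with realized values once both $x_a$ and $x_b$ are observed.

```latex
\begin{align*}
\frac{\partial E_a[U_a]}{\partial y_a}
  &= -2(1-\lambda)\left(y_a - E_a[x_a]\right) - 2\lambda\left(y_a - E_a[y_b]\right) = 0
  \;\Longrightarrow\; y_a = (1-\lambda)E_a[x_a] + \lambda E_a[y_b], \\
\intertext{and, when both $x_a$ and $x_b$ are observed, substituting Bob's symmetric condition gives}
y_a &= (1-\lambda)x_a + \lambda\left[(1-\lambda)x_b + \lambda y_a\right], \\
(1-\lambda^2)\,y_a &= (1-\lambda)x_a + \lambda(1-\lambda)x_b, \\
y_a &= \frac{1}{1+\lambda}\,x_a + \frac{\lambda}{1+\lambda}\,x_b .
\end{align*}
```

The last line uses $1-\lambda^2 = (1-\lambda)(1+\lambda)$ and is (3.4) with $i = a$ and $j = b$.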
3.1.2. Information constraints.
News stories are to some extent indivisible in the sense that reading one word about many different stories is less useful than reading a full paragraph about fewer stories. It is also not feasible for an individual to read all stories that are reported by all newspapers. To capture these constraints, Alice and Bob are restricted to read about only one event, though Alice and Bob may not always read about the same event.
4 Here we use the word event to mean a specific story that a newspaper might report about, i.e. a realized outcome of $X_i$, and not in the more general sense of any collection of outcomes that we can assign a probability to.
3.2. Information providers.
There are two information providers, Paper $A$ and Paper $B$. To capture that individuals cannot read all available information, Alice and Bob are restricted to reading only one paper each.5 Alice reads Paper $A$ because it reports those stories that she finds most interesting. Similarly, Bob reads Paper $B$ because it reports those stories that he finds most interesting. While not modeled explicitly here, this is a simple way of capturing that newspapers compete for readers/subscribers by offering specialized content.
We formalize the editorial decision of a newspaper by defining its news selection function as a mapping from the realized state of the world into a discrete decision of what to report.
Definition 1. The news selection function $S_i : \Omega \to \{0, 1\}$ is an indicator function that takes the value 1 when paper $i$ reports the realized value of $X_i$ and 0 otherwise.
Depending on the state of the world, Alice observes either $X_a$ or $X_b$. Both newspapers make their editorial decisions in order to maximize the expected utility of their readers. In doing so, they take the news selection function of the other newspaper as given. The news selection functions are thus determined by
$$S_i(x_i, x_j) = \arg\max_{S_i} E\left[U_i(S_i, S_j, U_j)\right] \tag{3.5}$$
where the expression makes it clear that the expected utility of an agent depends not only on the news selection of the paper that he or she reads but also on the utility function of, and the news selection function of the paper read by, the other agent.
3.3. News selection and beliefs.
Reading a news report about either $X_a$ or $X_b$ is always immediately informative about that specific variable. However, one implication of non-random news selection is that whether an event is reported or not is informative by itself. We can state this result more formally in the following proposition.
Proposition 1. Posterior beliefs about the unreported story $X_j$ coincide with the conditional distribution $p(x_j \mid x_i)$ only if the probability of reporting about $X_i$ is conditionally independent of the unreported variable, i.e.
$$p(x_j \mid S_i = 1, x_i) = p(x_j \mid x_i) \tag{3.6}$$
only if
$$p(S_i = 1 \mid x_i) = p(S_i = 1 \mid x_j, x_i). \tag{3.7}$$
Proof. By Bayes' rule we can express the posterior about the unreported variable as
$$p(x_j \mid S_i = 1, x_i) = \frac{p(S_i = 1 \mid x_j, x_i)}{p(S_i = 1 \mid x_i)} \, p(x_j \mid x_i). \tag{3.8}$$
It then follows immediately that (3.6) holds only if
$$\frac{p(S_i = 1 \mid x_j, x_i)}{p(S_i = 1 \mid x_i)} = 1 \tag{3.9}$$
which completes the proof.
5 It would be straightforward to endogenize the decision of how many newspapers each agent chooses to read. A fixed cost of reading a newspaper that is large enough to discourage Alice and Bob from reading both newspapers while not being so large as to make it prohibitively expensive to read one newspaper would result in the same outcome.
Consider a set up where $X_a$ and $X_b$ are independent so that $p(x_j \mid x_i) = p(x_j)$. Knowing $x_i$ is then by itself uninformative about $x_j$. But Proposition 1 states that if the probability of reporting about $x_i$ depends on the realized value of $X_j$, the fact that $x_i$ was reported is informative about $x_j$. The implication of Proposition 1 is starkest if the support of $X_j$ contains states of the world where paper $i$ would report $X_j$ instead of $X_i$. Since paper $i$ did not report $x_j$, these states can then be ruled out, i.e. these states are associated with a zero probability conditional on $S_i = 1$ and $x_i$. As an example, consider somebody reading the Wall Street Journal. If there is no report about a stock market crash, the reader can infer that no stock market crash has occurred since the Wall Street Journal would for sure have reported such an event, had it occurred.
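The updating in (3.8) can be illustrated numerically. The Python sketch below is a hypothetical example in the spirit of the Wall Street Journal case: the prior probabilities, the two reporting rules and all names are assumptions chosen for illustration. A paper reports its regular story $X_i$ unless the other variable $X_j$ takes an extreme value (a crash), so a report about $X_i$ rules out the crash state.

```python
def posterior_unreported(prior_j, report_prob, x_i):
    """Posterior p(x_j | S_i = 1, x_i) via Bayes' rule, as in (3.8).

    prior_j     : dict mapping x_j to p(x_j | x_i); independence is assumed,
                  so the prior does not vary with x_i
    report_prob : function (x_i, x_j) -> p(S_i = 1 | x_j, x_i)
    """
    # p(S_i = 1 | x_i) is the reporting probability averaged over x_j.
    p_report = sum(report_prob(x_i, x_j) * p for x_j, p in prior_j.items())
    return {x_j: report_prob(x_i, x_j) * p / p_report
            for x_j, p in prior_j.items()}

# Hypothetical prior over the other story: a crash is rare.
prior = {"no crash": 0.95, "crash": 0.05}

def state_dependent(x_i, x_j):
    # The paper drops its regular story whenever a crash occurs.
    return 0.0 if x_j == "crash" else 1.0

def state_independent(x_i, x_j):
    # Reporting does not depend on x_j, so condition (3.7) holds.
    return 1.0

post_dep = posterior_unreported(prior, state_dependent, x_i=0)
post_ind = posterior_unreported(prior, state_independent, x_i=0)
print(post_dep)
print(post_ind)
```

Under the state-dependent rule the posterior puts zero probability on a crash, while under the state-independent rule the posterior equals the prior, as Proposition 1 requires.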
4. News selection, public information and correlated actions
To investigate the implications of news selection, we here first specify a discrete state space for the random variables $X_a$ and $X_b$. This allows us to derive explicit expressions for optimal actions, the publicness of information as well as how the correlation of agents' actions is affected by the editorial decisions of newspapers.
4.1. Discrete states of the world.
In this section, the potential stories $X_a$ and $X_b$ are discrete random variables that can take the values $-1$, $0$, or $1$. The different states occur with probabilities given by
$$p_i(-1) = \frac{1}{4}, \quad p_i(0) = \frac{1}{2}, \quad p_i(1) = \frac{1}{4} : i \in \{a, b\} \tag{4.1}$$
where $p_i(x_i)$ is the pmf of $x_i$. The random variables $X_a$ and $X_b$ are thus identically and symmetrically distributed around zero. We also assume that $X_a$ and $X_b$ are independent of one another so that
$$p_i(x_i \mid x_j) = p_i(x_i) : i \neq j, \; i, j \in \{a, b\}. \tag{4.2}$$
Neither the symmetry nor the independence of the distributions for $X_a$ and $X_b$ is necessary for what follows, but both help simplify the presentation.
4.2. Optimal news selection functions.
Each information provider chooses what to report in order to maximize the expected utility of its respective reader. Because of the strategic motive in agents' utility, what information will be most useful to Alice depends on Bob's action. Since Bob's action in turn depends on what information he has available, the news selection function of Paper $A$ depends on the news selection function of Paper $B$. A Nash equilibrium in the news selection game is a fixed point at which neither newspaper wants to change its selection function, taking the other paper's selection function as given.
Because the optimal news selection functions depend on how agents respond to information, we cannot fully characterize them before we have derived the agents' optimal actions. However, these actions depend on the news selection functions. We therefore first state the equilibrium news selection functions without proof. Below, we will derive the optimal actions of the agents, taking the conjectured news selection functions as given. It is then straightforward to verify that the news selection functions postulated here do indeed constitute a Nash equilibrium.
4.2.1. No strategic motive.
As a benchmark, consider first the case in which there is no strategic motive in the agents' actions. When $\lambda = 0$, it is optimal for Paper $A$ to always report $X_a$ since Alice's utility then neither directly nor indirectly depends on $X_b$. Symmetrically, it will always be optimal for Paper $B$ to report $X_b$. (Alice would be indifferent to reading about $X_a$ or $X_b$ when $X_a = 0$ and the same holds for Bob and $X_b$.) In the absence of a strategic motive in actions, the news selection functions are thus simply described by
$$S_i = 1 \quad \forall \, \{x_i, x_j\} \in \Omega. \tag{4.3}$$
The news selection functions for Paper $A$ and Paper $B$ when $\lambda = 0$ are also given in tabular form in the top row of Table 1.
4.2.2. Strategic complementarities.
When agents have an incentive to take actions that are close to the action of the other agent, i.e. when $\lambda > 0$, the equilibrium news selection function is described by
$$S_i = \begin{cases} 0 & \text{if } x_i = 0 \text{ and } x_j \in \{-1, 1\} \\ 1 & \text{otherwise} \end{cases} \tag{4.4}$$
Paper $A$ will then report $X_a$ when $X_a$ equals $-1$ or $1$ but report $X_b$ if $X_a = 0$ and $X_b$ equals $-1$ or $1$. Again, the news selection functions are given in tabular form in the second row of Table 1.6
As in the case with no strategic motives, Paper $A$ will report about $X_a$ most of the time. However, when Alice wants to take an action that is close to Bob's action, it is optimal for Paper $A$ to report about $X_b$ in states of the world when $x_a = 0$ and $x_b \neq 0$. The intuition is simple. When the realized value of $X_a$ is zero, it is more important for Alice to know whether Bob will take a positive or negative action.7 Knowing the realized value of $X_b$ is then more useful to Alice since she can then better predict Bob's action.
6 Sufficiently strong complementarities result in multiple equilibria in news selection strategies. This case is discussed in the Appendix.
7 In fact, given the news selection function (4.4), Alice can infer that if she reads about $X_b$, then $x_a = 0$ with probability 1. However, that Alice can infer the realized value of the unreported variable with certainty is to some degree an artefact of the low dimensional state space. (Proposition 1 above provided a more general characterization of the information about the unreported event, conditional on what was reported.)
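The selection rule (4.4) is simple enough to enumerate. The following Python sketch, in which all function and variable names are illustrative assumptions, lists which story each paper reports in every state. It can be used to check that Paper $A$ reports $X_b$ only in states where $x_a = 0$, and that in those states Paper $B$ reports $X_b$ as well.

```python
from itertools import product

VALUES = (-1, 0, 1)

def selects_own(x_own, x_other):
    """S_i from (4.4): 0 if x_i = 0 and x_j is -1 or 1, otherwise 1."""
    return 0 if (x_own == 0 and x_other in (-1, 1)) else 1

def reported_story(paper, x_a, x_b):
    """Return 'A' if the paper reports X_a and 'B' if it reports X_b."""
    if paper == "A":
        return "A" if selects_own(x_a, x_b) == 1 else "B"
    return "B" if selects_own(x_b, x_a) == 1 else "A"

states = list(product(VALUES, VALUES))  # all (x_a, x_b) pairs
table = {(x_a, x_b): (reported_story("A", x_a, x_b),
                      reported_story("B", x_a, x_b))
         for x_a, x_b in states}

# States in which Paper A reports X_b instead of X_a.
a_reports_b = [s for s in states if table[s][0] == "B"]

for state in states:
    print(state, "Paper A ->", table[state][0], "| Paper B ->", table[state][1])
```

The list `a_reports_b` contains only the states $(0, -1)$ and $(0, 1)$, which is the inference described in footnote 7.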
Table 1: News selection functions
An entry A (B) indicates that the paper reports $X_a$ ($X_b$) in that state.

No strategic motive, λ = 0

Paper A      Xa = -1   Xa = 0   Xa = 1
Xb = -1         A         A        A
Xb = 0          A         A        A
Xb = 1          A         A        A

Paper B      Xa = -1   Xa = 0   Xa = 1
Xb = -1         B         B        B
Xb = 0          B         B        B
Xb = 1          B         B        B

Complementarities in actions, λ > 0

Paper A      Xa = -1   Xa = 0   Xa = 1
Xb = -1         A         B        A
Xb = 0          A         A        A
Xb = 1          A         B        A

Paper B      Xa = -1   Xa = 0   Xa = 1
Xb = -1         B         B        B
Xb = 0          A         B        A
Xb = 1          B         B        B

4.3. News selection and higher order beliefs.
Public signals that are commonly known to be observed by all agents are particularly influential when privately informed agents interact strategically because such signals are particularly useful for agents that want to predict other agents' actions (e.g. Morris and Shin 2002). Arguably, everything that is reported by newspapers is public in the sense that it is available for those who care to look for it. However, not all information that is printed in a newspaper is observed by everybody, and even when an event is widely reported, it may not be known to readers of all newspapers how widely reported it is. In the model above with strategic complementarities, there are states of the world where Alice and Bob read about the same event. Yet, this event may not be common knowledge.
Consider first the case when Paper $A$ reports about $X_b$. This only happens in the states $(0, 1)$ and $(0, -1)$, i.e. only in states of the world where Paper $B$ also reports about $X_b$. This is natural since Alice has no direct interest in $X_b$ and finds it useful to know about $X_b$ only to the extent that it helps her predict the action of Bob. Because Alice understands that Bob will read about $X_b$ for sure whenever she does, she knows that $X_b = x_b$ and that Bob knows this as well. Yet, this fact will not be common knowledge. Bob knows that he observes $X_b$ in the states $(-1, 1)$, $(1, 1)$, $(-1, -1)$, $(1, -1)$, $(0, -1)$ and $(0, 1)$. But since Alice observes $X_b$ in only the latter two states and because Bob attaches positive probability to the states where Alice does not observe $X_b$, the fact that $X_b = x_b$ is not common knowledge even though Alice and Bob both know this to be true.8 As we will now demonstrate, the probability with which Bob believes that Alice observes $X_b$ when he does affects how strongly he responds to the realized value $x_b$.
4.4. Equilibrium actions.
Alice and Bob's equilibrium actions depend on the degree of strategic complementarities both directly and through the effect the strategic motive has on the equilibrium news selection functions. Here, we derive the optimal actions taking the news selection functions described by (4.3) and (4.4) as given.
8 In fact, in the simple discrete example here, the only state in which any event is common knowledge is $(0, 0)$ since it is only in this state that Alice or Bob reads a report stating that the variable they have a direct interest in equals zero.
4.4.1. No strategic motive.
With no strategic motive, Alice always observes $X_a$ and Bob always observes $X_b$. Alice and Bob's equilibrium actions are then trivially given by $y_a = x_a$ and $y_b = x_b$. Since $X_a$ and $X_b$ are independent random variables, Alice's and Bob's actions are also independent.
4.4.2. Strategic complementarities.
With a strategic motive, Alice sometimes observes $X_b$ and Bob sometimes observes $X_a$. Bob knows that he only observes $X_a$ when Alice does so as well. Bob can thus infer Alice's action with certainty when he observes $X_a$. Furthermore, since Bob only observes $X_a$ when $x_b = 0$, Bob's optimal action when he observes $X_a$ is simply given by
$$y_b(x_a, S_b = 0) = \lambda \, y_a(x_a, S_a = 1) \tag{4.5}$$
When Alice observes $X_a$ she does not know with certainty whether Bob does so as well. If $x_a \in \{-1, 1\}$, Bob also observes $x_a$ if $x_b = 0$, which happens with probability $\frac{1}{2}$. Alice's optimal action when $x_a \in \{-1, 1\}$ is then given by
$$\begin{aligned} y_a(x_a, S_a = 1) = {} & (1-\lambda) x_a \\ & + \lambda \tfrac{1}{2} \, y_b(x_a, S_b = 0) \\ & + \lambda \tfrac{1}{2} \, E\left[y_b(x_b, S_b = 1) \mid x_a, S_a = 1, S_b = 1\right] \end{aligned} \tag{4.6}$$
Because of the symmetry, the expectation on the third line equals zero. Substituting (4.5) into (4.6), simplifying and switching to general indices gives
$$y_i(x_i, S_i = 1) = \frac{(1-\lambda)}{1 - \frac{1}{2}\lambda^2} \, x_i \tag{4.7}$$
and
$$y_i(x_j, S_i = 0) = \lambda \frac{(1-\lambda)}{1 - \frac{1}{2}\lambda^2} \, x_j \tag{4.8}$$
We can see from (4.7) - (4.8) that regardless of whether Alice observes $X_a$ or $X_b$, the magnitude of her response depends on the probability $p(S_j = 0 \mid x_i, S_i = 1)$. When Alice observes $X_a$ this is the probability she attaches to the event that Bob also observes $X_a$. When Alice observes $X_b$ this is the probability that Alice believes Bob attaches to the event that she observes $X_b$. Thus, the higher this probability is, the stronger will the response of both agents be. The degree to which information about an event is common among agents thus matters for the strength of their responses, even when an event is mutual knowledge.
Incidentally, the expression (4.7) also describes the optimal action when agents observe that the variable they have a direct interest in equals zero, since the state $(0, 0)$ is common knowledge. It is then optimal for both agents to take a zero action.
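The substitution that leads from (4.5) and (4.6) to (4.7) and (4.8) can be written out explicitly. The following lines are a sketch of that algebra for Alice, using the fact that the expectation in the last term of (4.6) is zero.

```latex
\begin{align*}
y_a(x_a, S_a = 1)
  &= (1-\lambda)\,x_a + \tfrac{1}{2}\lambda \cdot \lambda\, y_a(x_a, S_a = 1) + \tfrac{1}{2}\lambda \cdot 0, \\
\left(1 - \tfrac{1}{2}\lambda^2\right) y_a(x_a, S_a = 1) &= (1-\lambda)\,x_a, \\
y_a(x_a, S_a = 1) &= \frac{1-\lambda}{1 - \tfrac{1}{2}\lambda^2}\, x_a, \\
y_b(x_a, S_b = 0) &= \lambda\, y_a(x_a, S_a = 1) = \lambda\, \frac{1-\lambda}{1 - \tfrac{1}{2}\lambda^2}\, x_a .
\end{align*}
```

For example, with $\lambda = 0.5$ the coefficient on $x_a$ in the third line is $0.5 / 0.875 \approx 0.571$.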
4.5. Verifying the optimality of the conjectured news selection functions.
Given the optimal actions derived above, it is straightforward to verify by direct computation that neither Paper $A$ nor Paper $B$ has an incentive to deviate from the conjectured news selection functions described by (4.3) and (4.4). The Appendix describes an operational algorithm for doing so.
4.6. Correlation of actions with and without delegated news selection.
To isolate the implications of the editorial function of the newspapers for agents' actions we now compare the predictions of the model with a natural alternative. In the alternative model Alice and Bob are, as in the benchmark model, restricted to observing only one out of the two realized events. However, instead of delegating the news selection to a newspaper that can condition on ex post outcomes, Alice and Bob have to make a decision ex ante about which variable to observe.
Without the possibility of delegating the selection of what to observe, Alice will always choose to observe $X_a$ and Bob will always choose to observe $X_b$.9 Since $X_a$ and $X_b$ are independent, observing $x_a$ is then uninformative about $x_b$ and vice versa. The conditional expectation of the unobserved variable is then equal to its unconditional mean and the optimal action with ex ante story choice is given by
$$y_i = (1-\lambda) \, x_i : i \in \{a, b\} \tag{4.9}$$
Clearly, if $X_a$ and $X_b$ are independent, Alice and Bob's actions are uncorrelated in this alternative model.
Proposition 2. Delegated news selection introduces positive correlation between Alice and Bob's actions.
Proof. Direct computation of the correlation of Alice and Bob's actions gives
$$\frac{\sum_{\omega \in \Omega} p(x_a, x_b) \, y_a(x_a, x_b) \, y_b(x_a, x_b)}{\sqrt{var(y_a)} \sqrt{var(y_b)}} = 2\lambda \frac{(1-\lambda)^2}{(2-\lambda^2)^2} \, var(y_i)^{-1} > 0 \tag{4.10}$$
Here, the terms associated with the states $(1, 1)$ and $(-1, -1)$ will cancel against the terms associated with the equally probable states $(-1, 1)$ and $(1, -1)$. The term associated with the state $(0, 0)$ is zero. The terms associated with the states $(0, 1)$, $(0, -1)$, $(1, 0)$ and $(-1, 0)$ are all positive and, when weighted by their probabilities, sum up to the term multiplying the reciprocal of the variance in (4.10). Under ex ante information choice, these terms would all be zero. The editorial function of newspapers thus introduces correlation in agents' actions that is absent if agents choose ex ante what variable to get information about.
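The computation behind (4.10) can be checked numerically. The Python sketch below enumerates the nine states, applies the selection rule (4.4) together with the actions (4.7) and (4.8), and computes the covariance and correlation of the two actions. The function names and the value $\lambda = 0.5$ are illustrative assumptions.

```python
from itertools import product
from math import sqrt

PMF = {-1: 0.25, 0: 0.5, 1: 0.25}  # distribution (4.1)

def action(x_own, x_other, lam):
    """Equilibrium action of an agent, using (4.4), (4.7) and (4.8)."""
    c = (1 - lam) / (1 - 0.5 * lam ** 2)
    if x_own == 0 and x_other != 0:
        return lam * c * x_other  # agent reads about the other story, (4.8)
    return c * x_own              # agent reads about own story, (4.7)

def moments(lam):
    """Return (covariance, correlation) of Alice's and Bob's actions."""
    e_ab = e_aa = e_bb = 0.0
    for x_a, x_b in product(PMF, PMF):
        p = PMF[x_a] * PMF[x_b]  # independence, (4.2)
        y_a = action(x_a, x_b, lam)
        y_b = action(x_b, x_a, lam)
        e_ab += p * y_a * y_b
        e_aa += p * y_a ** 2
        e_bb += p * y_b ** 2
    # Actions have mean zero by symmetry, so second moments are (co)variances.
    return e_ab, e_ab / (sqrt(e_aa) * sqrt(e_bb))

lam = 0.5
cov, corr = moments(lam)
closed_form_cov = 2 * lam * (1 - lam) ** 2 / (2 - lam ** 2) ** 2
print(cov, closed_form_cov, corr)
```

The enumerated covariance coincides with the factor $2\lambda (1-\lambda)^2 / (2-\lambda^2)^2$ in (4.10), and the resulting correlation is positive for $\lambda > 0$.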
5. Extreme events and approximate common knowledge
The model above allowed us to analyze how state dependent news selection affects agents' beliefs and actions and we demonstrated that agents' preferences and the distribution of events influence the degree to which an event is commonly known. In the data, we saw that the 9/11 attacks and the Lehman Brothers bankruptcy made news coverage more homogenous across news outlets. Arguably, what made these events special and so widely reported was their magnitude, as both bank failures and terrorist attacks happen frequently on a smaller scale. The simple discrete state space set up above did not allow us to capture the notion of a large magnitude event. In this section we therefore extend the model above to allow for continuously distributed events so that we can meaningfully analyze the implications of large magnitude events.
9 With $\lambda$ close enough to 1, it is optimal for Alice and Bob to coordinate on both always observing either $X_a$ or $X_b$.
5.1. Optimal simple news selection functions.
With continuous distributions of the potential stories $X_a$ and $X_b$, the optimal news selection functions are infinite dimensional objects and do not in general have known functional forms. While it is possible to show that the optimal news selection functions will take the form of a threshold function that determines how large (or how negative) $X_b$ needs to be for Paper $A$ to report about it, the thresholds will in general be specific to the realized value of $X_a$. Here, we restrict the news selection functions to belong to a simple parametric class of the form
$$S_i = \begin{cases} 1 & \text{if } |x_i| \geq \alpha + \beta |x_j|^{\gamma} \\ 0 & \text{otherwise} \end{cases} \tag{5.1}$$
The threshold function (5.1) is symmetric around zero, and symmetric across Paper $A$ and Paper $B$. Subject to these constraints, the parameters $\alpha$, $\beta$ and $\gamma$ are again chosen so that Paper $A$ maximizes the expected utility of Alice and Paper $B$ does the same thing for Bob.
5.2. Conditional actions.
When $X_a$ and $X_b$ are continuously distributed, the conditional expectations in the first order condition (3.3) can be expressed as
$$y_i(x_k, S_i) = (1-\lambda) \frac{\iint x_i \, p(x_i, x_j, S_i) \, dx_i}{p(x_k, S_i)} + \lambda \frac{\iint y_j(x_i, x_j) \, p(x_i, x_j, S_i) \, dx_i \, dx_j}{p(x_k, S_i)} : i, j, k \in \{a, b\}, \; i \neq j \tag{5.2}$$
For independent, symmetric, zero mean distributions of $X_a$ and $X_b$, the news selection function (5.1) implies that the expected value of $x_i$ conditional on $x_j$ and $S_j = 0$ is also zero.10 As in the previous section, it will again be optimal for Paper $B$ to report about $X_a$ only when Paper $A$ does so as well. That is, if $S_j = 0$, then $S_i = 1$. This allows us to simplify the expression (5.2) to
$$y_i(x_i, S_i = 1) = \frac{(1-\lambda)}{1 - \lambda^2 \, p(S_j = 0 \mid x_i, S_i = 1)} \, x_i \tag{5.3}$$
and
$$y_i(x_j, S_i = 0) = \lambda \frac{(1-\lambda)}{1 - \lambda^2 \, p(S_i = 0 \mid x_j, S_j = 1)} \, x_j \tag{5.4}$$
As in the discrete states model above, the strength of the responses to $x_i$ depends on the degree to which the realized value of $x_i$ is common knowledge. However, here the probability with which an agent that observes $x_i$ believes that the other agent also observes $x_i$ varies continuously with the realized state. Given the news selection function (5.1), the probability in the denominator of (5.3) and (5.4) is increasing in the absolute realized value of $x_i$ and $x_j$ so agents respond more than proportionally to large magnitude events.
10 One way to think about this is that $p(x_i, x_j, S_i = 0)$ is simply $p(x_i)$ with symmetrically truncated tails at $|x_i| = \alpha_i + \beta_i |x_j|^{\gamma_i}$.
With continuous distributions, we need to solve the model numerically. A solution can be found by letting Paper $A$ choose $\alpha$, $\beta$ and $\gamma$ in order to maximize Alice's utility, taking the news selection function of Paper $B$ and Bob's actions as given. Paper $B$ then chooses $\alpha$, $\beta$ and $\gamma$ in order to maximize Bob's expected utility, taking the Paper $A$ news selection function from the first step as given. Iterating between these two steps until convergence yields a solution.
Figure 5 illustrates several model outputs. The left column corresponds to $X_i \sim U(-1, 1)$ and the right column to $X_i \sim N(0, \frac{1}{3})$. To facilitate comparison, the variance of the Gaussian distribution is chosen so that most of its probability mass lies within the support of the uniform distribution.
5.3. News selection, the strategic motive and publicness of information.
As in the discrete state model above, when $\lambda = 0$ Paper $A$ always reports $X_a$ and Paper $B$ always reports $X_b$. When $\lambda > 0$, Paper $A$ will sometimes find it optimal to report $X_b$ and Paper $B$ will sometimes find it optimal to report $X_a$. Clearly, Alice's expected loss of not knowing $X_a$ is increasing in the absolute realized value of $X_a$. When $\lambda > 0$, it is also increasingly costly for Bob to not know about Alice's action as the absolute magnitude of her action increases. The probability that Alice and Bob observe $X_a$ is thus increasing in the absolute value of $x_a$.
5.3.1. No strategic motive.
The second row of Figure 5 illustrates the probability that Alice (solid lines) and Bob (dashed lines) observe $X_a$ conditional on the realized value $x_a$. When $\lambda = 0$ (blue and purple lines), Alice always observes $X_a$ and Bob never observes $X_a$ so for all values of $X_a$ the associated probabilities are 1 and 0 respectively.
5.3.2. Moderate strategic motive.
When λ = 0.3 (red and green lines) and the absolute value of x_a is small, Paper A is more
likely to report X_b. However, the probability that Alice observes X_a tends to 1 rapidly as
|x_a| increases. Bob is also more likely to observe X_a as |x_a| increases, but since the states
in which Bob observes X_a are a subset of the states where Alice observes X_a, Bob's probability
of observing X_a is lower than the probability that Alice does so for every value of x_a and it
is increasing at a slower rate in |x_a|.
By Bayes' rule, the probability that Bob observes X_a conditional on Alice doing so is given by
\[
p(S_b = 0 \mid S_a = 1, x_a) = \frac{p(S_a = 1 \mid S_b = 0, x_a)\, p(S_b = 0 \mid x_a)}{p(S_a = 1 \mid x_a)} \tag{5.5}
\]
Since Bob only observes X_a when Alice does, we have that p(S_a = 1 | S_b = 0, x_a) = 1, so that
(5.5) simplifies to
\[
p(S_b = 0 \mid S_a = 1, x_a) = \frac{p(S_b = 0 \mid x_a)}{p(S_a = 1 \mid x_a)}. \tag{5.6}
\]
Alice knows that Bob is more likely to observe X_a as the absolute value of x_a increases, so
larger magnitude events tend to be closer to common knowledge. For the uniform distribution,
Alice attaches about a 30 per cent probability to the event that Bob observes X_a when X_a is
close to −1 or 1. When the events are normally distributed, the same probability is just above
60 per cent. The difference is explained by the fact that with normally distributed variables,
the probability mass is more concentrated around the means, so conditional on the realized
value of X_a, it is then less likely that X_b has a large enough absolute value to make Paper B
report about that instead. While not shown in the graph, if we extended the x-axis, the
probability that Bob observes X_a would tend to 1 as |x_a| grows arbitrarily large.
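As a small numerical illustration of (5.5) and (5.6), the sketch below computes the probability that Bob observes X_a conditional on Alice doing so from the two marginal probabilities. The function name and the input numbers are hypothetical and chosen only for illustration; they are not values produced by the model.

```python
def prob_bob_given_alice(p_bob: float, p_alice: float) -> float:
    """p(S_b = 0 | S_a = 1, x_a) as in (5.6).

    p_bob   : p(S_b = 0 | x_a), probability that Bob observes X_a
    p_alice : p(S_a = 1 | x_a), probability that Alice observes X_a
    """
    if not 0.0 < p_alice <= 1.0:
        raise ValueError("p_alice must lie in (0, 1]")
    if not 0.0 <= p_bob <= p_alice:
        # Bob observes X_a only in states where Alice does (subset property).
        raise ValueError("p_bob must lie in [0, p_alice]")
    likelihood = 1.0  # p(S_a = 1 | S_b = 0, x_a) = 1 by the subset property
    return likelihood * p_bob / p_alice  # Bayes' rule (5.5)


# Hypothetical marginal probabilities for a small and a large |x_a|.
small = prob_bob_given_alice(p_bob=0.05, p_alice=0.50)
large = prob_bob_given_alice(p_bob=0.30, p_alice=1.00)
print(small, large)
```

The subset check mirrors the argument in the text: because Bob's observing states are contained in Alice's, the ratio in (5.6) can never exceed 1.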
5.3.3. Strong strategic motive.
As the strategic motive is strengthened the cost of not observing the event that the other
agent observes increases. When λ = 0.6 (yellow and turquoise lines) both Paper A and Paper B
will simply report the variable that has had the largest absolute realization. Since the news
selection functions are known to both agents, Alice can then infer that if she observes X_a
then |x_a| > |x_b| so that Paper B will also report X_a. With sufficiently strong
complementarities in actions, both papers will always report the same event and the reported
event will be common knowledge.
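Under the rule described above, where both papers report whichever variable has the largest absolute realization, the probability that X_a is reported given x_a equals the probability that |X_b| < |x_a|. The sketch below evaluates this for the two distributions used in Figure 5. It assumes that X_a and X_b are independent and that the 1/3 in N(0, 1/3) denotes the variance; both are assumptions made for this illustration, and the function names are hypothetical.

```python
import math


def p_report_uniform(x_a: float) -> float:
    """P(|X_b| < |x_a|) for X_b ~ U(-1, 1)."""
    return min(abs(x_a), 1.0)


def p_report_normal(x_a: float, variance: float = 1.0 / 3.0) -> float:
    """P(|X_b| < |x_a|) for X_b ~ N(0, variance)."""
    sigma = math.sqrt(variance)
    # P(|Z| < z) = erf(z / sqrt(2)) for a standard normal Z.
    return math.erf(abs(x_a) / (sigma * math.sqrt(2.0)))


for x in (0.1, 0.5, 1.0):
    print(x, round(p_report_uniform(x), 3), round(p_report_normal(x), 3))
```

Because both papers apply the same rule in this case, the single probability applies to Alice and Bob alike, which is why the reported event is common knowledge.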
[Figure 5 panels: columns U(−1, 1) and N(0, 1/3). Top row: p(x). Second row: p(S | x_i), showing p(S_i = 1 | x_i) and p(S_j = 0 | x_i) for λ = 0, 0.3 and 0.6. Bottom row: E[y_a + y_b | x_i] for λ = 0, 0.3 and 0.6.]
Figure 5. The top row illustrates the pdfs of the U(−1, 1) and N(0, 1/3) distributions. The
second row illustrates the probability that paper A (solid lines) and paper B (dashed lines)
reports X_i conditional on X_i. The bottom row illustrates the expected aggregate response
conditional on x_i.
5.4. Non-linear aggregate responses.
The expected aggregate response conditional on x_i depends on how likely it is that agents
observe X_i. When Alice observes X_a, the strength of her response depends on how likely she
thinks it is that Bob also observes X_a, and the probability she believes that Bob attaches to
the event that Alice also observes X_a, and so on.
Since it is more likely that Alice and Bob observe X_a when it has a large absolute
realization, and because Alice and Bob know that it is then more likely that they observe X_a,
and so on, their expected responses conditional on observing X_a are non-linear.
This nonlinearity is illustrated in the bottom row of Figure 5. For realizations of X_a close
to zero, the probability that Alice or Bob observes X_a is also close to zero. In the limit,
the expected response is then also zero since if no agent observes X_a their expectation of
X_a equals its unconditional mean. With a moderate strategic motive, even if Alice were to
observe X_a she knows that there is only a small probability that Bob also observes X_a. This
makes Alice's response, conditional on observing a small realization of X_a, weaker as well.
As the realized absolute value of X_a increases, the probability that Alice and Bob read
about it increases, and conditional on Alice observing X_a the magnitude of her response
increases more than proportionally in |x_a| as the probability that Bob also observes X_a
increases.
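One source of the non-linearity can be seen in a stripped-down calculation. Suppose, purely for illustration, that an agent who observes X_a responds one-for-one to it, that an agent who does not observe it does not respond, and that X_a is observed with probability |x_a| (the U(−1, 1) case in which papers report the largest absolute realization, with X_b independent of X_a). These are simplifying assumptions rather than the model's equilibrium response, and the function name is hypothetical. The expected response is then |x_a| x_a, which grows more than proportionally in |x_a|.

```python
def expected_response(x_a: float) -> float:
    """Expected response to x_a under the illustrative assumptions above."""
    p_observe = min(abs(x_a), 1.0)  # assumed probability that X_a is observed
    response_if_observed = x_a      # assumed one-for-one response
    return p_observe * response_if_observed


# Doubling the realization more than doubles the expected response.
small, large = expected_response(0.2), expected_response(0.4)
print(small, large, large / small)
```

This sketch captures only the selection channel. In the model the response conditional on observing X_a also strengthens with |x_a|, because Alice then attaches a higher probability to Bob observing X_a as well.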
6. Conclusions
News media are an important source of information for a large part of society. In this paper
we have argued that in order to understand how news media affect decisions, we need to first
understand how they select what stories to report. We therefore obtained text fragments
from a large number of news stories published in US newspapers during the months around
the September 11 terrorist attacks and the Lehman bankruptcy in 2008. We then used a
Latent Dirichlet Allocation statistical topics model to document three stylized facts about
newspaper coverage. First, different newspapers provide specialized content and tend to cover
different topics to different degrees. For example, the financial crisis received particularly
large amounts of coverage by the Wall Street Journal, and both the New York Times and
USA Today assigned above-average weights to the September 11 attacks. Second, major
events such as terrorist attacks or financial crises result in a high fraction of news content
being devoted to the topics associated with these events. As an example, the LDA model
attributes more than 50 per cent of the total news coverage during the days following the
September 11 attacks to the topic associated with the attacks. Third, major events make
news coverage more homogenous across newspapers. The September 11 terrorist attacks,
the 2008 political party conventions, the Lehman bankruptcy and the failed bailout package
proposed by then Secretary of the Treasury Hank Paulson, were events that all resulted in
a majority of newspapers devoting more coverage to these events than to any other.
Motivated by these stylized facts about news coverage, we proposed a theoretical model
that can match these facts. We used the model to argue that, in order to understand
how agents respond to particular events, one has to distinguish between information that is
publicly available, in the sense of being reported by at least one newspaper, mutual
information, in the sense of being reported by all newspapers, and information that is common
knowledge, i.e. information that all agents know and that all agents know that all agents
know, and so on.
In the model, how widely reported an event becomes is endogenous and depends on agents'
preferences and what other events have occurred that compete for the available news coverage
space. The probability with which agents believe that other agents observe the same
information as they do depends on the news selection functions of the information providers.
In general, information in the model is neither purely private nor common knowledge but
varies probabilistically.
We demonstrated that, in the model, agents' actions tend to be more correlated when
they delegate the news selection to newspapers that can condition on ex post events before
deciding what to report, compared to a setting where agents decide ex ante what to get
information about. We also showed that large events make information closer to common
knowledge.
With strategic complementarities in actions, agents then respond more than
proportionally to large events.
That the editorial role of information providers facilitates coordination in some states of
the world has implications for the large existing literature proposing that business cycles are
(at least partly) caused by agents coordinating on either pure sun-spot shocks, e.g. Cass
and Shell (1983), on noisy public signals e.g. Lorenzoni (2009) and Nimark (2014), or on
"sentiment" shocks e.g. Angeletos and La'O (2013) and Angeletos, Collard and Dellas (2014).
One feature these papers have in common is that the coordination of actions cannot rely solely
on the information that is transmitted through prices. Blinder and Krueger (2004) report
that a majority of households get most of their economic information from newspapers. Since
business cycles require that millions of households and firms take correlated actions, it seems
plausible that coordination must then partly work through mass media. The argument we
make in this paper is that, to the extent that coordination works via news media, coordination
is facilitated in those states of the world where news coverage is more uniform across different
news providers.
In the theoretical model proposed here, we took a very benevolent view of how news
media select what to report. While truthful and unbiased reporting that aims to maximize
the utility of the reader may or may not be a good approximation of reality, the mechanism
that we have described in this paper will be at work as long as any biases in news reporting
are systematic and understood by the agents in the model. For instance, if newspapers tend to
be more likely to report bad news than good news, then bad news events will be more widely
reported, closer to common knowledge and provoke stronger responses than good news of
similar magnitude. A benevolent and accurate news media is also a natural benchmark to
start from and we think that it is interesting that even under such ideal assumptions, news
media can have important effects on agents' decisions and beliefs.
References
[1] Alvarez, F., F. Lippi and L. Paciello, 2011, "Optimal Price Setting With Observation and Menu Costs", Quarterly Journal of Economics 126, pp. 1909-1960.
[2] Angeletos, G.M., F. Collard and H. Dellas, 2014, "Quantifying Confidence", working paper, MIT.
[3] Angeletos, G.M. and J. La'O, 2013, "Sentiments", Econometrica 81, pp. 739-779.
[4] Angeletos, G.M., L. Iovino and J. La'O, 2015, "Real Rigidity, Nominal Rigidity, and the Social Value of Information", mimeo, MIT.
[5] Asuncion, A., M. Welling, P. Smyth and Y.W. Teh, 2009, "On smoothing and inference for topic models", Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 27-34, AUAI Press.
[6] Baker, S.R., N. Bloom and S.J. Davis, 2013, "Measuring economic policy uncertainty", Chicago Booth research paper 13-02.
[7] Bao, Y. and A. Datta, 2014, "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures", Management Science 60, pp. 1371-1391.
[8] Blei, D.M. and J.D. Lafferty, 2009, "Topic Models", in A. Srivastava and M. Sahami, editors, Text Mining: Classification, Clustering, and Applications, Chapman and Hall CRC Data Mining and Knowledge Discovery Series.
[9] Cass, D. and K. Shell, 1983, "Do Sunspots Matter?", Journal of Political Economy 91, pp. 193-227.
[10] Doms, M. and N. Morin, 2004, "Consumer Sentiment, the Economy, and the News Media", Federal Reserve Bank of San Francisco Working Paper 2004-9.
[11] Fogarty, B.J., 2005, "Determining Economic News Coverage", International Journal of Public Opinion Research 17, pp. 149-172.
[12] Gentzkow, M. and J. Shapiro, 2006, "Media Bias and Reputation", Journal of Political Economy 114, pp. 280-316.
[13] Gentzkow, M. and J. Shapiro, 2008, "Competition and Truth in the Market for News", Journal of Economic Perspectives, pp. 133-154.
[14] Griffiths, T. and M. Steyvers, 2004, "Finding scientific topics", Proceedings of the National Academy of Sciences 101, pp. 5228-5235.
[15] Hellwig, C. and L. Veldkamp, 2009, "Knowing what others know", Review of Economic Studies, pp. 223-251.
[16] Jaimovich, N. and S. Rebelo, 2009, "Can News about the Future Drive the Business Cycle?", American Economic Review 99(4), pp. 1097-1118.
[17] Kajii, A. and S. Morris, 1997, "Common p-Belief: The General Case", Games and Economic Behavior 18, pp. 73-82.
[18] Lorenzoni, G., 2009, "A Theory of Demand Shocks", American Economic Review 99(5), pp. 2050-84.
[19] Mackowiak, B. and M. Wiederholt, 2009, "Optimal Sticky Prices under Rational Inattention", American Economic Review 99(3), pp. 769-803.
[20] Mackowiak, B. and M. Wiederholt, 2010, "Business Cycle Dynamics under Rational Inattention", Review of Economic Studies, pp. 1502-1532.
[21] Matejka, F., forthcoming, "Rationally Inattentive Seller: Sales and Discrete Pricing", Review of Economic Studies.
[22] Matejka, F. and A. McKay, 2015, "Rational inattention to discrete choices: A new foundation for the multinomial logit model", American Economic Review 105, pp. 272-98.
[23] Mahajan, A., L. Dey and S.M. Haque, 2008, "Mining financial news for major events and their impacts on the market", in Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 423-426, IEEE.
[24] Monderer, D. and D. Samet, 1989, "Approximating common knowledge with common beliefs", Games and Economic Behavior, pp. 170-190.
[25] Morris, S. and H.S. Shin, 1997, "Approximate Common Knowledge and Co-ordination: Recent Lessons from Game Theory", Journal of Logic, Language, and Information 6, pp. 171-190.
[26] Morris, S. and H.S. Shin, 2002, "The social value of public information", American Economic Review 92, pp. 1521-1534.
[27] Paciello, L. and M. Wiederholt, 2013, "Exogenous Information, Endogenous Information and Optimal Monetary Policy", Review of Economic Studies.
[28] Porter, M.F., 1980, "An algorithm for suffix stripping", Program 14, pp. 130-137.
[29] Ramage, D., S. Dumais and D. Liebling, 2010, "Characterizing Microblogs with Topic Models", Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media.
[30] Soroka, S.N., 2006, "Good news and bad news: Asymmetric responses to economic information", Journal of Politics 68, pp. 372-385.
[31] Soroka, S.N., 2012, "The gatekeeping function: Distributions of information in media and the real world", Journal of Politics 74, pp. 514-528.
[32] Soroka, S.N., D.A. Stecula and C. Wlezien, 2015, "It's (Change in) the (Future) Economy, Stupid: Economic Indicators, the Media, and Public Opinion", American Journal of Political Science 59, pp. 457-474.
[33] Stevens, L., 2014, "Coarse Pricing Policies", working paper, University of Maryland.
[34] Van Nieuwerburgh, S. and L. Veldkamp, 2009, "Information immobility and the home bias puzzle", Journal of Finance 64, pp. 1187-1215.
[35] Van Nieuwerburgh, S. and L. Veldkamp, 2010, "Information acquisition and under-diversification", Review of Economic Studies 77, pp. 779-805.
[36] Veldkamp, L., 2006a, "Information markets and the comovement of asset prices", Review of Economic Studies 73, pp. 823-845.
[37] Veldkamp, L., 2006b, "Media Frenzies in Markets for Financial Information", American Economic Review 96, pp. 577-601.