BELIEFS, COORDINATION AND MEDIA FOCUS

KRISTOFFER P. NIMARK AND STEFAN PITSCHNER

Abstract. News media provide an editorial service for their audiences by monitoring a large number of events and by selecting the most newsworthy of these to report. Using a Latent Dirichlet Allocation topic model to classify news articles, we document the editorial function of US newspapers. We find that, while different newspapers tend to report on different topics to different degrees, news coverage becomes more homogeneous across newspapers after major events. We present a simple theoretical model that can match this fact and then use it to analyze the implications of the editorial function of news providers for agents' beliefs and their ability to coordinate. We show that, compared to a setting where agents choose ex ante what to get information about, their actions become more correlated when they rely on the editorial function of news providers. Information about large events is closer to common knowledge than information about small events. As a consequence, agents respond more than proportionally to large events, and in expectations, do not respond at all to events that are small enough.

1. Introduction

Every day, a vast number of events occur, each of them potentially relevant for the decisions of firms and households. However, no individual firm or household has the resources to observe all of these events. Instead, many rely on news media to monitor the world on their behalf. One important function that news media perform is thus editorial. Among all potential stories that occur, only those that are deemed most newsworthy are reported. In this paper, we analyze how such a news-selection mechanism can affect the beliefs of economic agents and their ability to coordinate.

Strategic decisions based on imperfect information are pervasive in economics.
Producers in oligopolistic markets need to predict the output of their competitors, speculators need to predict whether other speculators plan to attack a currency, and price setters need to predict the pricing decisions of other firms. In such settings, it is well known that public signals are disproportionately influential as they tend to be particularly useful for agents that need to predict the actions of other agents, e.g. Morris and Shin (2002). Arguably, everything that is reported by news media is public in the sense that it is available to those who care to read it. However, in reality, not all of this information is common knowledge. That is, not all information that is publicly available is also observed by everybody, and not all information that is observed by everybody is also known to be observed by everybody, and so on. In this paper, we argue that understanding the editorial role played by news media is central to understanding what determines the amount of news coverage a particular event receives and the degree to which knowledge about an event is common among agents.

Date: December 6, 2015. Nimark: Economics Department, Cornell University. E-mail: pkn8@cornell.edu, webpage: www.kris-nimark.net. Pitschner: Universitat Pompeu Fabra. E-mail: pitschner@gmail.com, webpage: www.stefanpitschner.com. The authors thank Ed Green, Karel Mertens and conference and seminar participants at Penn State, Sveriges Riksbank, Stockholm University, Helsinki University, SED 2015 and Cornell University for useful comments and suggestions.

We begin by estimating a Latent Dirichlet Allocation (LDA) topic model based on texts from almost 15,000 archived newspaper stories from 17 US newspapers.
The newspaper stories are from two periods that we know a priori contained major news events, namely the 90-day period around the 9/11 terrorist attacks on New York and the Pentagon and the 90-day period around the Lehman bankruptcy that signaled the start of the financial crisis. We use the model to document three stylized facts of news coverage. First, different newspapers specialize in different topics. For example, the Wall Street Journal allocated more news coverage to the financial crisis than the average newspaper and the New York Times allocated more than average coverage to presidential politics. Second, the extent of total news coverage allocated to different topics varies over time and depends on what has happened. Third, major events make news coverage more homogeneous across different outlets. The September 11 terrorist attacks, the 2008 political party conventions, the Lehman bankruptcy and the failed bailout package proposed by then Secretary of the Treasury Hank Paulson were all events that resulted in a majority of newspapers devoting more coverage to these events than to any other. Together, these facts suggest that information about major events is closer to common knowledge than information about minor events.

In order to analyze how the documented editorial behaviour of news media affects agents' beliefs and decisions, we propose a theoretical model with incomplete information that can replicate the stylized facts described above. The model is a beauty contest game in which an agent's pay-off depends on the distance of his action from an agent-specific latent variable and the distance of his action from the action taken by other agents. This heterogeneity in agents' pay-off functions is taken as given but could arise for various reasons, such as differences in geographical location or sector affiliation.

A basic premise of our model is that the dimensionality of the state of the world is too high for individual agents to monitor it on their own.
Therefore, they rely on information providers that do so on their behalf. Furthermore, because agents are heterogeneous in terms of what information they find most useful, news providers specialize and cater to their different interests. However, because of a strategic motive, the agents in our model also have an indirect interest in events that are only important for predicting the actions of others. As a result, in some states of the world all information providers report on the same events.

Agents in our model delegate the decision of what to get information about to specialized information providers. These information providers can monitor a larger set of events than they eventually end up reporting. Because their decision about what to report depends on the relative newsworthiness of the realized events, what agents get information about depends on what has happened. The model presented here formalizes this editorial function of news media and thus provides a theory of how and why news media focus changes over time. While the model is abstract, it offers several insights that we believe are general.

One consequence of a state-dependent news selection is that reported news stories can be informative about more than the events they actually cover. More precisely, we derive formal conditions for when news reports also reveal information about those events that are not reported. To see this, consider a person who opens a San Francisco newspaper and finds that it only contains stories about New York. If this person knows that the paper always covers all important San Francisco events, the lack of stories on such events reveals to him that none have actually taken place. Therefore, even though he only reads about New York, he can also update his beliefs about San Francisco.
Moreover, because this type of information transmission results directly from the systematic news selection, it also occurs if the realizations of the reported and unreported events are unconditionally independent.

The systematic selection of what gets reported also affects the degree to which knowledge about an event is common among agents. In the existing imperfect information literature, signals are typically assumed to be either private or common knowledge, e.g. Morris and Shin (2002), Angeletos and Pavan (2007), Angeletos, Hellwig and Pavan (2007), Hellwig and Veldkamp (2009), Amador and Weill (2010, 2012), Cespa and Vives (2012) and Edmond (2013). In our model, information about a particular event is typically neither private nor common knowledge. Instead, the degree to which knowledge about an event is common among agents is endogenous and depends probabilistically on agents' preferences and the distribution of events.

Because news selection is state-dependent, what agents get information about also influences how probable they think it is that other agents read about the same event. As an example, consider again a person living in San Francisco. If the San Francisco newspaper reports about some event in Manhattan that normally would be of more interest to a reader from New York, the reader in San Francisco can infer that New Yorkers are probably also reading about that event. However, even though both San Franciscans and New Yorkers are reading about the same event, this event may not be common knowledge: While the San Franciscan can be sure that the New Yorker is also reading about the event in Manhattan, the New Yorker cannot draw a corresponding conclusion.

When extreme events such as large terrorist attacks or major financial crises occur, they tend to be reported on the front page of almost all major newspapers.
In the model, events of low probability and large magnitude are important for all agents even if only a subset of them has a direct interest in these events. Because individual agents care about the strong actions that some agents will take in response to the extreme event, such events tend to be reported by all information providers. Moreover, because agents understand this, information about more extreme events also tends to be closer to being common knowledge. So, when a person in New York reads about a major financial crisis on Wall Street, he can be almost sure that people in San Francisco are reading about the same event.

The agents in our model cannot directly observe the entire state of the world. This makes them similar to the rationally inattentive agents in Sims (2003), Mackowiak and Wiederholt (2009, 2010), Alvarez, Lippi and Paciello (2011), Matejka (forthcoming), Matejka and McKay (2015) and Stevens (2014) as well as to agents that need to pay a cost to observe a signal about pay-off relevant latent variables such as those in Grossman and Stiglitz (1980), Veldkamp (2006a, 2006b), Van Nieuwerburgh and Veldkamp (2009, 2010). The key difference of our set up relative to such existing endogenous information choice models is that the agents in our model rely on specialists that monitor the entire set of realized events before deciding what to report. That is, our agents make ex ante choices of which newspapers to read while each newspaper makes its editorial decision of what to report ex post, i.e. after the state of the world has realized. This setup captures the fact that the decision to acquire information often is a decision about which information provider to use, rather than a decision about what variable or event to get information about.¹

Two papers that are closely related to this one are Veldkamp (2006b) and Hellwig and Veldkamp (2009).
Veldkamp studies a model in which ex ante identical agents choose asset portfolios and signals simultaneously. If different agents hold different portfolios, they prefer to observe signals about the pay-offs of different assets. However, due to increasing returns to scale in information production, agents tend to purchase similar signals and hold similar portfolios. Apart from the delegated information acquisition decision described above, the main difference between our model and Veldkamp's is that our agents are intrinsically heterogeneous in a way that is not directly affected by the information they observe. In the coordination game in Hellwig and Veldkamp (2009), ex ante identical agents can choose to observe different combinations of private and public signals about a single latent variable of common interest. Hellwig and Veldkamp show that in such a setting, information acquisition inherits the strategic properties of the coordination game. Thus, if there is a strategic complementarity in actions, agents also want to buy the same signals as other agents. The main difference between that paper and ours is that the agents in Hellwig and Veldkamp's model do not choose what to get information about, but whether the noise in their signals is common to all agents or idiosyncratic.

Gentzkow and Shapiro (2006, 2008), like our paper, study the editorial function of news media, but are primarily focused on identifying and analyzing the causes and consequences of ideologically slanted reporting. The political science literature has also studied the role of news journalists and newspaper editors as "gatekeepers" that decide what information gets reported, e.g. Soroka (2006, 2012) and Soroka, Stecula and Wlezien (2014). Again, this literature focuses primarily on analyzing and documenting ideologically biased reporting.

The rest of the paper is structured as follows.
In the next section we document several stylized facts about news coverage using a statistical topic model applied to US newspaper data. Section 3 presents the basic set up of a beauty contest-style model in which agents have heterogeneous interests that can match the documented facts. Section 4 presents formal results based on discrete distributions of events and Section 5 extends the analysis to continuous distributions. Section 6 concludes.

¹ The motive of our agents is well-captured by Marschak (1960) who writes that "The man who buys a newspaper does not know beforehand what will be in the news. He acquires access to potential messages belonging to a set called news."

2. Three Stylized Facts of News Coverage

In this section, we estimate a Latent Dirichlet Allocation (LDA) topic model based on texts from a large number of archived newspaper articles. We then use this model to document three stylized facts about news coverage. In particular, we show that different newspapers specialize in different topics, that the weights they assign to topics depend on what has happened, and that major events make news coverage more homogeneous across papers.

2.1. The News Data. Our empirical analysis focuses on two 3-month periods for which we know that they contained major news events. The first period covers the months August to October of 2001, encompassing the terrorist attacks on the World Trade Center and the Pentagon on September 11. The second period runs from August to October of 2008 and thus encompasses the Lehman Brothers bankruptcy as well as the outbreak of the financial crisis.

Table 1. Newspapers in Database

| Newspaper Full Name | Short Name |
|---|---|
| Atlanta Journal | AJ |
| Charleston Gazette | CG |
| Pittsburgh Post-Gazette | PPG |
| Portland Press Herald | PPH |
| Sarasota Herald-Tribune | SHT |
| St. Louis Post-Dispatch | SLP |
| Telegram & Gazette Worcester | TGW |
| The Boston Globe | BG |
| The Evansville Courier | EC |
| The Las Vegas Review-Journal | LVR |
| The New York Times | NYT |
| The Pantagraph | PG |
| The Philadelphia Inquirer | PI |
| The Wall Street Journal | WSJ |
| The Washington Post | WP |
| USA Today | UT |
| Winston-Salem Journal | WiSJ |

Notes: The table shows the full names of the newspapers whose front-page articles are in our text corpus. It also shows corresponding short names used in the empirical analysis below. Newspapers that have changed their names over time or have merged are combined into one entry.

The data we use are parts of news articles obtained from the Dow Jones Factiva database. Factiva contains historical content from more than 30,000 newspapers, wire services and online sources beginning in 1970. We exclude content from wire services since their main audiences are other news organizations. We also limit our data set to articles that appeared either on front pages of newspapers or on the first pages of their general interest sections. In total, we obtain data from 14,817 articles reported by 17 different US newspapers. Each of these articles is stored in our data set in the form of a text snippet that typically comprises its first one or two sentences.

The selection of newspapers includes all those for which we are able to reliably identify the stories that appeared on their front pages or the first pages of their general interest sections. Table 1 contains an overview of the newspapers in our database as well as corresponding short names that we use in the analysis below. To illustrate the type of information that the text snippets contain, Table 2 shows a number of examples.

2.2. Latent Dirichlet Allocation. To extract topics from our text corpus, we estimate a Latent Dirichlet Allocation (LDA) topic model.
Introduced in Blei et al (2003), Latent Dirichlet Allocation is now one of the most widely applied tools in machine learning and natural language processing. Variants of it have been used, for example, to identify scientific topics (Griffiths and Steyvers, 2004) and to classify micro blogs (Ramage et al, 2010). The first application to economics or finance that we are aware of is Mahajan, Dey and Haque (2008), who use it to classify financial news articles. More recently it has also been used by Bao and Datta (2014) to discover risk-factors disclosed in annual corporate filings. Furthermore, Fligstein, Brundage and Schultz (2014) as well as Hansen, McMahon and Prat (2015) have used Latent Dirichlet Allocation to analyze FOMC transcripts.

Table 2. Sample Text Snippets of Newspaper Articles in the Database

| Text Snippet | Newspaper | Publication Date |
|---|---|---|
| An 18-year-old student who wounded five people at his suburban San Diego high school earlier this year committed suicide, hanging himself with a sheet in his jail cell. The student, Jason Anthony Hoffman, pleaded guilty last month in the ... | The New York Times | 2001/10/31 |
| Passengers returned to US airports in increasing numbers yesterday to find long lines, layers of new security and limited service. But many travelers were able to reach their destinations as more than a third of the usual number of ... | The Washington Post | 2001/09/15 |
| A day after dividing their votes on a failed proposal for a 700 billion Wall Street bailout, Maines two US House members agreed Tuesday that its vital for lawmakers to pass a relief bill for credit markets. | Portland Press Herald | 2008/10/01 |
| In a case that could have dramatic consequences for school districts and towns across Pennsylvania, the state Supreme Court will hear arguments today on the constitutionality of the commonwealths property-tax system, which raises more ... | The Philadelphia Inquirer | 2008/09/10 |

Notes: The table shows examples of the text snippets used to estimate the LDA topic model below. The text snippets were extracted from the Dow-Jones Factiva database. The dates shown are those on which the articles were originally published in the print-editions of the respective newspapers.

Using LDA allows us to discover and quantify the topics of a very large number of news texts without relying on manual classifications or pre-defined categories. Moreover, because LDA defines articles as mixtures of different topics, it can accommodate the fact that many news stories talk about more than one specific issue. For example, it can capture that an article about a government bailout package may discuss both politics and financial markets.

The main parameter of choice researchers need to set before estimating an LDA model is the number of topics. Once this number has been set, the actual topics are formed endogenously and are thus outputs of the estimated model. Relative to approaches that use word counts to measure news coverage, e.g. Baker, Bloom and Davis (2013), LDA therefore does not require researchers to pre-specify words or topics of interest. Another desirable property of LDA is that it captures not only changes in the importance of a topic over time, but also how important that topic is in an absolute sense.

The text data or corpus used for estimating an LDA topic model is described by a vocabulary, which is a list of all words that it contains, and documents, which are partitions of the text corpus into separate "pieces". In our case, each text snippet from a news article is one document, and all text snippets together form the corpus. Generally speaking, an LDA topic model can be thought of as a latent structure that could have generated the observed text corpus following probabilistic rules.
It is parameterized by (i) a distribution over topics that determines the probability that a document belongs to a topic and (ii) a distribution over the words in the vocabulary that defines each of the topics. In the LDA framework, each document in a corpus can be thought of as having been generated by the following steps:

(1) Draw a set of topic weights from the distribution over topics.
(2) Draw N topics from this document-specific topic distribution, with N being the number of words in the document.
(3) Draw one word from each of these N topics.

To describe the LDA model more formally, we index topics by $k \in \{1, 2, ..., K\}$, documents by $d \in \{1, 2, ..., D\}$, the words in the vocabulary by $v \in \{1, 2, ..., V\}$, and words in a document by $n \in \{1, 2, ..., N\}$. The probability of a specific text corpus being generated is then given by the distribution

\[
p(\beta, \theta, z, w) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} p(\theta_d) \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d) \, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right) \tag{2.1}
\]

where $\beta$, $\theta$ and $z$ are unobserved parameters. The rows of the $K \times V$ matrix $\beta$ contain the word distribution $\beta_k$ for topic $k$, and the $K \times D$ matrix $\theta$ contains the topic proportions $\theta_d$ of document $d$, so that $\theta_{d,k}$ is the proportion of words in document $d$ drawn from topic $k$. The topic assignment of document $d$ is $z_d$, so that word $n$ in document $d$ is drawn from topic $z_{d,n}$. The text corpus enters the distribution (2.1) through the matrix $w$, defined so that the words observed in document $d$ form the vector $w_d$ and $w_{d,n}$ is word $n$ in document $d$.

There are two underlying properties that are particularly important here. First, LDA is a mixed membership model. This implies that each document may belong to different topics to different degrees. As discussed above, this is helpful for our application as it allows newspaper articles to be treated as belonging to several topics at the same time. For example, an article could be classified as belonging to the topics financial crisis and congressional politics with topic weights 0.4 and 0.6, respectively. Second, the order and grammatical structure of words within documents is assumed to be irrelevant. While this so-called bag-of-words assumption makes LDA inappropriate for the extraction of detailed grammatical relationships between words in a text, it is useful for discovering the topics these texts generally describe.

In order to apply Latent Dirichlet Allocation to an observed text corpus, the generative process described above needs to be inverted. The posterior distribution for the latent parameters conditional on the text corpus can be formed by dividing the likelihood function (2.1) by the probability of observing that corpus

\[
p(\beta, \theta, z \mid w) = \frac{p(\beta, \theta, z, w)}{p(w)}. \tag{2.2}
\]

Evaluating the denominator in (2.2) is computationally infeasible as it entails integrating over the distributions of the latent parameters. However, there are several methods that can be used to approximate the distribution, see Asuncion, Welling, Smyth and Teh (2009). Here, we rely on the collapsed Gibbs sampling algorithm of Griffiths and Steyvers (2004) to estimate $\beta$, $\theta$ and $z$.

Both the limited number of discretionary decisions required for the LDA estimation and the fact that topics emerge from the analysis without having to predefine them are particularly attractive for our application. These properties allow us to analyze the thousands of documents in our database in an objective and replicable manner.

2.3. Estimation. To be able to estimate the LDA model using the approach described above, we first have to translate the raw newspaper texts into a vector-space representation that captures their word frequencies. For this, we break the text down into single words and remove a number of very common terms that have little informative value in bag-of-words models, see Blei et al (2009). Then, we remove word-suffixes using the Porter (1980) stemming algorithm.
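The translation of raw snippets into word-count vectors can be illustrated with a minimal sketch. It is not the code used for the estimation: the stop-word list `STOP_WORDS` and the helper names `tokenize` and `to_count_vectors` are hypothetical, and the Porter stemming step is left out to keep the example self-contained (in practice a library implementation of the stemmer would be applied to each token before counting).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the actual list of
# removed terms is not given in the text.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "at"}


def tokenize(snippet):
    """Lower-case a snippet, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", snippet.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def to_count_vectors(snippets):
    """Return the vocabulary and one bag-of-words count vector per snippet."""
    tokenized = [tokenize(s) for s in snippets]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        vectors.append(row)
    return vocabulary, vectors


# Two shortened, illustrative snippets (word order is discarded by design).
snippets = [
    "Passengers returned to US airports to find long lines.",
    "The airports added layers of new security.",
]
vocabulary, vectors = to_count_vectors(snippets)
```

Each row of `vectors` is one document in the vector-space representation. Only word frequencies survive, which is exactly the information a bag-of-words model uses.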
Stemming allows us to group closely related words such as presidential and president or worker and workers and thus reduces the size of the resulting vector space. For computational reasons, we also limit our vector-space to words that occur at least 200 times.

The number of topics in the benchmark model is set to 10. While choosing a larger number can generally result in more of the topics having a clear interpretation, it can also yield a classification that is too fine for subsequent analyses. We estimate a single LDA model using the texts from both 2001 and 2008 jointly. This allows for the possibility that some topics may have a timeless dimension. For instance, the vocabulary used in sports related articles may change little over time and form a topic that is present in news articles in both 2001 and 2008.²

² If no topic occurs in both periods and when the number of documents is approximately the same for the two periods, estimating a joint LDA model for both time periods with 10 topics should yield the same topics and assigned topic weights as if we were to estimate two separate models with 5 topics for each period.

2.4. Estimated LDA Topics. Table 3 shows the topics identified by our estimated LDA model in terms of their highest-probability words. We find that several of the topics that emerge from our estimation are intuitively meaningful. For example, Topic 1 relates to the war in Afghanistan, Topic 2 relates to the candidates of the 2008 US presidential elections, and Topic 9 covers the September 11 terrorist attacks. Furthermore, a relatively clear interpretation can also be attached to Topic 5, which seems to capture both the financial crisis and the reactions of the US government to it.

Table 3. Estimated LDA Topics: High-Probability Words

| Topic | Words with the highest assigned probabilities (in descending order) |
|---|---|
| 1 | presid bush afghanistan washington unit today state militari taliban said |
| 2 | john democrat obama republican mccain presidenti campaign barack sen candid |
| 3 | school citi new counti student high year univers worcest state |
| 4 | year two old ago day like today aug just bank |
| 5 | financi washington bush billion presid hous plan market bank wall |
| 6 | year state million new cut percent month rate price compani |
| 7 | mail west state daili virginia staff report get new work |
| 8 | state yesterday offici anthrax feder court said offic investig washington |
| 9 | attack terrorist new world sept york center trade airport washington |
| 10 | year polic old said man review offic counti two journal |

Notes: For each of the 10 topics estimated using Latent Dirichlet Allocation, the table shows the 10 words with the highest probabilities of occurring in that topic. The order of words is descending in terms of the probabilities assigned to them in the given topic. All words have been stemmed using the Porter (1980) stemmer.

To get a more complete understanding of these four topics and their associated word probabilities, we also plot them in the form of word clouds (Figure 1). These graphical representations show a larger number of words for each topic, reflecting their probabilities within a given topic in terms of the sizes at which they are displayed. The interpretations of the four topics that we derive based on the word clouds reinforce the ones obtained from the high-probability words shown in Table 3.

2.5. Different Newspapers Specialize in Different Topics. The first specific aspect of newspaper coverage that we assess using the estimated LDA model concerns the extent to which newspapers are specialized. In other words, we investigate the extent to which different newspapers tend to over- or underweight different topics relative to the overall average.
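One way to make over- and underweighting relative to the overall average concrete is a normalized deviation of the form $d_{i,j} = (p_{i,j} - \bar{p}_j)/\bar{p}_j$, where $p_{i,j}$ is the probability that newspaper $i$ reports on topic $j$ and $\bar{p}_j$ is its average across newspapers. The sketch below assumes this form; the function name and the numbers are made up for illustration and are not estimates from our data.

```python
def normalized_deviations(p):
    """p[i][j] is the probability that newspaper i reports on topic j.

    Returns d[i][j] = (p[i][j] - mean_j) / mean_j, where mean_j is the
    simple average of p[i][j] across newspapers (assumed form).
    """
    n_papers = len(p)
    n_topics = len(p[0])
    means = [sum(p[i][j] for i in range(n_papers)) / n_papers
             for j in range(n_topics)]
    return [[(p[i][j] - means[j]) / means[j] for j in range(n_topics)]
            for i in range(n_papers)]


# Made-up probabilities for two newspapers and two topics.
p = [[0.30, 0.70],
     [0.10, 0.90]]
d = normalized_deviations(p)
```

Under this normalization a value of 1 means that a newspaper gives a topic twice the average coverage, and the smallest possible value is -1 (no coverage at all).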
For this purpose, Figure 2 plots normalized deviations of newspaper-specific topic probabilities³ for the same four topics discussed above. The plots document that there are large amounts of variation in terms of which newspapers tend to cover which topics. For example, the financial crisis as captured by topic 5 received more than twice as much coverage in the Wall Street Journal than it did in the hypothetical average outlet. Similarly, both the New York Times and USA Today allocated a larger fraction of their news coverage to the September 11 terror attacks than the average newspaper in our sample. These deviations suggest that newspapers do indeed specialize, resulting in coverage that is heterogeneous in the cross-section of outlets.

³ We calculate these normalized deviations as $d_{i,j} = \frac{p_{i,j} - \bar{p}_j}{\bar{p}_j}$, with $p_{i,j}$ denoting the probability that newspaper $i$ reports on topic $j$ and $\bar{p}_j = \frac{1}{I} \sum_{i=1}^{I} p_{i,j}$ being the corresponding average across all $I$ newspapers.

Figure 1. Estimated LDA Topics: Word Clouds of Selected Topics

[Word clouds not reproduced. Panels: Topic 1: Afghanistan; Topic 2: 2008 Presidential Candidates; Topic 5: Financial Crisis and Bailouts; Topic 9: Terror Attacks.]

Notes: The word-clouds illustrate the probabilities associated with specific words in the topics estimated using Latent Dirichlet Allocation. Words with higher probabilities are shown in a larger size. All words were stemmed using the Porter (1980) stemmer. The topics correspond to those shown in Table 3.

2.6. Major Events Shift News Focus and Increase the Homogeneity of News.
Next, we wish to assess how major events affect news coverage along two specific dimensions: the average emphasis that specific topics receive, and the homogeneity of news coverage in the cross-section of outlets. For this, we explore time variation in the estimated topic probabilities as well as their distribution across newspapers. If major events do indeed affect the focus of news coverage and its cross-sectional homogeneity, we would expect the September 11 terrorist attacks, the nominations of presidential candidates and the outbreak of the financial crisis to be associated with such behavior.

Figure 2. Newspaper Specialization: Probabilities of Selected Topics
[Bar charts omitted. Panels: Topic 1: Afghanistan; Topic 2: 2008 Presidential Candidate Nominations; Topic 5: Financial Crisis and Bailouts; Topic 9: Terror Attacks. Horizontal axis: newspapers (AJ, CG, PPG, PPH, SHT, SLP, TGW, BG, EC, LVR, NYT, PG, PI, WSJ, WP, UT, WiSJ). Vertical axis: normalized topic-specific deviation of news focus.]

Notes: The figure illustrates the specialization of newspapers on different topics. The topics correspond to those shown in figure 1 and table 3. The short names of newspapers correspond to those in table 1. The normalized topic-specific deviations of news focus are calculated as $d_{i,j} = \frac{p_{i,j} - p_j}{p_j}$, with $p_{i,j}$ denoting the probability that newspaper $i$ reports on topic $j$ and $p_j = \frac{1}{I}\sum_{i=1}^{I} p_{i,j}$ being the corresponding average across all $I$ newspapers.

To assess if this is the case, we use two different measures. First, we calculate overall topic probabilities at a daily frequency by averaging the estimated topic probabilities of all stories in our database for a given day. The fraction $F_{t,k}$ of total news devoted to topic $k$ at date $t$ is thus given by

$$F_{t,k} \equiv \frac{\sum_{d} \theta_{t,d,k}}{D_t} \tag{2.3}$$

where $D_t$ is the total number of articles in the sample from day $t$. Second, to assess homogeneity in news coverage across newspapers, we consider to what extent the outlets agree on which topic is most important on a given day. For this, we first identify the topic that has the highest probability across all articles of a given day. Then, we calculate the fraction of newspapers that assign the highest weight to that same topic, i.e. homogeneity $H_t$ of news coverage on day $t$ is defined as

$$H_t \equiv \frac{\sum_{m} \mathcal{I}\left(\arg\max_k F_{t,m,k} = \arg\max_k F_{t,k}\right)}{M}$$

where $\mathcal{I}$ is an indicator function that takes the value 1 when the equality in brackets holds, $F_{t,m,k}$ is the fraction of news coverage devoted to topic $k$ by newspaper $m$ at time $t$ and $M$ is the total number of newspapers. The range of $H_t$ is thus between 0 and 1, with 1 indicating that all newspapers agree on which topic is the most important one.

Figure 3. 2001 Terror Attacks: Time-Variation of Average Topic Probabilities and Homogeneity of Coverage Across Newspapers
[Plots omitted. Plot a: average topic probabilities (vertical axis: fraction, 0 to 1). Plot b: homogeneity of news coverage across outlets (vertical axis: fraction, 0 to 1), with the annotations "Terror Attacks" and "Beginning of Afghanistan War". Horizontal axis: dates from 08/07/01 to 10/26/01.]

Notes: The figure illustrates time-series variation in the probabilities assigned to the estimated topics and the cross-sectional homogeneity in newspaper coverage. The time horizon shown is 08/01/2001 to 10/31/2001. Only days with coverage of at least 10 newspapers are shown. The topics correspond to those shown in table 3. The topic probabilities for a specific day shown in plot a are defined as the simple average of the corresponding probabilities of all articles in the database for that day. The homogeneity measure shown in plot b is defined as the fraction of newspapers for which the highest-probability topic is the same one that also carries the highest probability across all articles published on that day.

Figure 4. 2008 Financial Crisis: Time-Variation of Average Topic Probabilities and Homogeneity of Coverage Across Newspapers
[Plots omitted. Plot a: average topic probabilities (vertical axis: fraction, 0 to 1). Plot b: homogeneity of news coverage across outlets (vertical axis: fraction, 0 to 1), with the annotations "Presidential Candidate Nominations", "Lehman Bankruptcy" and "Failed Bailout". Horizontal axis: dates from 08/10/08 to 10/29/08.]

Notes: The figure illustrates time-series variation in the probabilities assigned to the estimated topics and the cross-sectional homogeneity in newspaper coverage. The time horizon shown is 08/01/2008 to 10/31/2008. Only days with coverage of at least 10 newspapers are shown. The topics correspond to those shown in table 3. The topic probabilities for a specific day shown in plot a are defined as the simple average of the corresponding probabilities of all articles in the database for that day. The homogeneity measure shown in plot b is defined as the fraction of newspapers for which the highest-probability topic is the same one that also carries the highest probability across all articles published on that day.

Figure 3 shows the evolution of both of these measures for the period August to October 2001. The top panel illustrates that on September 12 and the following days newspapers assigned very high weights to the terrorism topic (topic 9) as displayed in light red.
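As a rough illustration of the two measures defined above, the following Python sketch computes the daily topic shares $F_{t,k}$ from equation (2.3) and the homogeneity measure $H_t$ for a single day. The input layout (a list of `(newspaper, theta)` pairs), the function names and the toy numbers are assumptions made for illustration; they are not taken from the authors' code or data. The sketch also assumes that $F_{t,m,k}$ is the simple average over the articles of newspaper $m$ on that day, and that $M$ counts only the newspapers with at least one article on that day.

```python
from collections import defaultdict


def topic_shares(thetas):
    """Average per-article topic probabilities, as in equation (2.3)."""
    n_articles = len(thetas)
    n_topics = len(thetas[0])
    return [sum(theta[k] for theta in thetas) / n_articles
            for k in range(n_topics)]


def argmax(values):
    """Index of the largest entry (the first one if there are ties)."""
    return max(range(len(values)), key=values.__getitem__)


def homogeneity(articles):
    """Fraction of newspapers whose top topic equals the overall top topic.

    `articles` is a list of (newspaper, theta) pairs for ONE day, where
    theta is that article's vector of estimated topic probabilities.
    """
    overall_top = argmax(topic_shares([theta for _, theta in articles]))
    by_paper = defaultdict(list)
    for paper, theta in articles:
        by_paper[paper].append(theta)
    agree = sum(argmax(topic_shares(thetas)) == overall_top
                for thetas in by_paper.values())
    return agree / len(by_paper)


# Hypothetical toy day: three newspapers, two topics.
day = [("A", [0.7, 0.3]), ("A", [0.6, 0.4]),
       ("B", [0.8, 0.2]), ("C", [0.2, 0.8])]
print(topic_shares([theta for _, theta in day]))
print(homogeneity(day))
```

In the toy data, two of the three hypothetical newspapers put most weight on the topic that also dominates the pooled articles, so the homogeneity measure lies strictly between 0 and 1.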
Furthermore, a second pronounced change in the average topic probabilities occurs on October 8, the day after the war in Afghanistan began (topic 1). In addition to these changes in average topic probabilities, we can see from the bottom panel that the same two days are also associated with pronounced increases in topic homogeneity. That is, both the terror attacks and the beginning of the Afghanistan war caused coverage to become more similar across newspapers.

For the second period used in our analysis, i.e. August to October 2008, the same exercise is repeated in Figure 4. Here, too, several events stand out in the sense that they seem to affect both the focus of news coverage and its cross-sectional homogeneity. First, the presidential nomination conventions received high levels of attention and caused an increase in homogeneity. Then, the Lehman Brothers bankruptcy on September 15 caused another spike. Finally, a last spike occurs on September 30, the day after the Emergency Economic Stabilization Act of 2008 failed to pass the US House of Representatives.

3. A Beauty Contest Model with State Dependent News Selection

Above we documented some stylized facts about news coverage that can be attributed to the editorial decisions of newspapers. Below, we present a theoretical model that can explain these facts and help us understand the role the editorial decisions of newspapers play in determining agents' beliefs and actions. The model is an abstract beauty contest game in the spirit of Morris and Shin (2002) in which agents' pay-offs depend on the distance of their actions from a latent variable as well as the distance of their action from other agents' actions. However, we depart from the original model in several important ways. First, agents have heterogeneous interests in the sense that different agents want their actions to be close to different latent variables.
Second, agents are constrained in the number of stories that they can read about and therefore delegate the information acquisition to specialist information providers that can monitor a large set of events on the agents' behalf. Each information provider is characterized by a news selection function, which is a mapping from states of the world to a set of reported events. The news selection functions formalize the editorial decisions of newspapers, and below we will analyze in detail how they affect agents' beliefs and ability to coordinate.

3.1. Information consumers with heterogeneous interests. Our model is populated by the two information consumers Alice and Bob. They live in a world with two potential stories, $X_a$ and $X_b$. A potential story $X_i$ is a random variable that takes values in $\mathcal{X}$ and an event $x_i$ is a particular realization of $X_i$. The state of the world is described by the pair $(x_a, x_b) \in \Omega$ where $\Omega = \mathcal{X} \times \mathcal{X}$ is the set of all (joint) events. An event is of interest to Alice or Bob if their utility increases as a result of knowing about it.[4]

[4] Here we use the word event to mean a specific story that a newspaper might report about, i.e. a realized outcome of $X_i$, and not in the more general way as meaning any collection of outcomes that we can assign a probability to.

3.1.1. Utility and heterogeneous interests. Alice and Bob have different interests and this heterogeneity is introduced via their utility functions. The basic set-up is a two-person beauty contest game in which Alice wants to take an action $y_a$ that is close to both the latent variable $x_a$ and Bob's action $y_b$. This is formalized by the following utility function for Alice

$$U_a = -(1 - \lambda)(y_a - x_a)^2 - \lambda (y_a - y_b)^2. \tag{3.1}$$

If Bob also wanted to take an action that was close to $x_a$ and close to Alice's action $y_a$, this setup would be a two-person version of the beauty contest in Morris and Shin (2002). However, we introduce heterogeneity among the agents by making Bob want to take an action that is close to $x_b$. Bob's utility function $U_b$ is symmetric to Alice's and given by

$$U_b = -(1 - \lambda)(y_b - x_b)^2 - \lambda (y_b - y_a)^2 \tag{3.2}$$

where $y_a$ is the action taken by Alice. We say that Alice has a direct interest in $X_a$ because her utility depends directly on the realized value of $x_a$. Symmetrically, Bob has a direct interest in $X_b$. The parameter $\lambda$ governs the strength of the strategic motive. Because of this strategic motive, Alice has an indirect interest in knowing about $X_b$ since that may help her better predict Bob's action. Symmetrically, Bob has an indirect interest in knowing about $X_a$.

Alice's optimal action $y_a$ is given by the first order condition

$$y_a = (1 - \lambda) E_a[x_a] + \lambda E_a[y_b] \tag{3.3}$$

where $E_a$ denotes the expectations operator conditional on Alice's information set. (A symmetric expression describes Bob's optimal action.) If agents could observe both $x_a$ and $x_b$ directly, the equilibrium decision would be described by

$$y_i = \frac{1}{1+\lambda} x_i + \frac{\lambda}{1+\lambda} x_j : i, j \in \{a, b\},\ i \neq j \tag{3.4}$$

However, Alice and Bob observe neither $X_a$ nor $X_b$ directly and instead have to rely on information providers who monitor the state of the world on their behalf.

3.1.2. Information constraints. News stories are to some extent indivisible in the sense that reading one word about many different stories is less useful than reading a full paragraph about fewer stories. It is also not feasible for an individual to read all stories that are reported by all newspapers. To capture these constraints, Alice and Bob are restricted to read about only one event, though Alice and Bob may not always read about the same event.

3.2. Information providers. There are two information providers, Paper A and Paper B. To capture that individuals cannot read all available information, Alice and Bob are restricted to reading only one paper each.[5] Alice reads Paper A because it reports those stories that she finds most interesting. Similarly, Bob reads Paper B because it reports those stories that he finds most interesting. While not modeled explicitly here, this is a simple way of capturing that newspapers compete for readers/subscribers by offering specialized content.

[5] It would be straightforward to endogenize the decision of how many newspapers each agent chooses to read. A fixed cost of reading a newspaper that is large enough to discourage Alice and Bob from reading both newspapers while not being so large as to make it prohibitively expensive to read one newspaper would result in the same outcome.

We formalize the editorial decision of a newspaper by defining its news selection function as a mapping from the realized state of the world into a discrete decision of what to report.

Definition 1. The news selection function $S_i : \Omega \to \{0, 1\}$ is an indicator function that takes the value 1 when paper $i$ reports the realized value of $X_i$ and 0 otherwise.

Depending on the state of the world, Alice observes either $X_a$ or $X_b$. Both newspapers make their editorial decisions in order to maximize the expected utility of their readers. In doing so, they take the news selection function of the other newspaper as given. The news selection functions are thus determined by

$$S_i(x_i, x_j) = \arg\max_{S_i} E\left[U_i(S_i, S_j, U_j)\right] \tag{3.5}$$

where the expression makes it clear that the expected utility of an agent depends not only on the news selection of the paper that he or she reads but also on the utility function of, and the news selection function of the paper read by, the other agent.

3.3. News selection and beliefs. Reading a news report about either $X_a$ or $X_b$ is always immediately informative about that specific variable. However, one implication of non-random news selection is that whether an event is reported or not is informative by itself. We can state this result more formally in the following proposition.

Proposition 1. Posterior beliefs about the unreported story $X_j$ coincide with the conditional distribution $p(x_j \mid x_i)$ only if the probability of reporting about $X_i$ is conditionally independent of the unreported variable, i.e.

$$p(x_j \mid S_i = 1, x_i) = p(x_j \mid x_i) \tag{3.6}$$

only if

$$p(S_i = 1 \mid x_i) = p(S_i = 1 \mid x_j, x_i). \tag{3.7}$$

Proof. By Bayes' rule we can express the posterior about the unreported variable as

$$p(x_j \mid S_i = 1, x_i) = \frac{p(S_i = 1 \mid x_j, x_i)}{p(S_i = 1 \mid x_i)}\, p(x_j \mid x_i). \tag{3.8}$$

It then follows immediately that (3.6) holds only if

$$\frac{p(S_i = 1 \mid x_j, x_i)}{p(S_i = 1 \mid x_i)} = 1 \tag{3.9}$$

which completes the proof.

Consider a set-up where $X_a$ and $X_b$ are independent so that $p(x_j \mid x_i) = p(x_j)$. Knowing about $x_i$ is then by itself uninformative about $x_j$. But Proposition 1 states that if the probability of reporting $x_i$ depends on the realized value of $X_j$, the fact that $x_i$ was reported is informative about $x_j$. The implication of Proposition 1 is starkest if the support of $X_j$ contains states of the world where paper $i$ would report $X_j$ instead of $X_i$. Since paper $i$ did not report $x_j$, these states can then be ruled out, i.e. these states are associated with a zero probability conditional on $S_i = 1$ and $x_i$. As an example, consider somebody reading the Wall Street Journal. If there is no report about a stock market crash, the reader can infer that no stock market crash has occurred since the Wall Street Journal would for sure have reported such an event, had it occurred.

4. News selection, public information and correlated actions

To investigate the implications of news selection, we here first specify a discrete state space for the random variables $X_a$ and $X_b$. This allows us to derive explicit expressions for optimal actions, the publicness of information as well as how the correlation of agents' actions is affected by the editorial decisions of newspapers.

4.1. Discrete states of the world.
In this section, the potential stories $X_a$ and $X_b$ are discrete random variables that can take the values $-1$, $0$, or $1$. The different states occur with probabilities given by

$$p_i(-1) = \frac{1}{4},\quad p_i(0) = \frac{1}{2},\quad p_i(1) = \frac{1}{4} : i \in \{a, b\} \tag{4.1}$$

where $p_i(x_i)$ is the pmf of $x_i$. The random variables $X_a$ and $X_b$ are thus identically and symmetrically distributed around zero. We also assume that $X_a$ and $X_b$ are independent of one another so that

$$p_i(x_i \mid x_j) = p_i(x_i) : i \neq j,\ i, j \in \{a, b\}. \tag{4.2}$$

Neither the symmetry nor the independence of the distributions for $X_a$ and $X_b$ is necessary for what follows, but both help simplify the presentation.

4.2. Optimal news selection functions. Each information provider chooses what to report in order to maximize the expected utility of its respective reader. Because of the strategic motive in agents' utility, what information will be most useful to Alice depends on Bob's action. Since Bob's action in turn depends on what information he has available, the news selection function of Paper A depends on the news selection function of Paper B. A Nash equilibrium in the news selection game is a fixed point at which neither newspaper wants to change its selection function, taking the other paper's selection function as given.

Because the optimal news selection functions depend on how agents respond to information, we cannot fully characterize them before we have derived the agents' optimal actions. However, these actions depend on the news selection functions. We therefore first state the equilibrium news selection functions without proof. Below, we will derive the optimal actions of the agents, taking the conjectured news selection functions as given. It is then straightforward to verify that the news selection functions postulated here do indeed constitute a Nash equilibrium.

4.2.1. No strategic motive.
As a benchmark, consider first the case in which there is no strategic motive in the agents' actions. When $\lambda = 0$, it is optimal for Paper A to always report $X_a$ since Alice's utility then neither directly nor indirectly depends on $X_b$. Symmetrically, it will always be optimal for Paper B to report $X_b$. (Alice would be indifferent to reading about $X_a$ or $X_b$ when $X_a = 0$ and the same holds for Bob and $X_b$.) In the absence of a strategic motive in actions, the news selection functions are thus simply described by

$$S_i = 1 \quad \forall \{x_i, x_j\} \in \Omega. \tag{4.3}$$

The news selection functions for Paper A and Paper B when $\lambda = 0$ are also given in tabular form in the top row of Table 1.

4.2.2. Strategic complementarities. When agents have an incentive to take actions that are close to the action of the other agent, i.e. when $\lambda > 0$, the equilibrium news selection function is described by

$$S_i = \begin{cases} 0 & \text{if } x_i = 0 \text{ and } x_j \in \{-1, 1\} \\ 1 & \text{otherwise} \end{cases} \tag{4.4}$$

Paper A will then report $X_a$ when $X_a$ equals $-1$ or $1$ but report $X_b$ if $X_a = 0$ and $X_b$ equals $-1$ or $1$. Again, the news selection functions are given in tabular form in the second row of Table 1.[6] As in the case with no strategic motives, Paper A will report about $X_a$ most of the time. However, when Alice wants to take an action that is close to Bob's action, it is optimal for Paper A to report about $X_b$ in states of the world when $x_a = 0$ and $x_b \neq 0$. The intuition is simple. When the realized value of $X_a$ is zero, it is more important for Alice to know whether Bob will take a positive or negative action. Knowing the realized value of $X_b$ is then more useful to Alice since she can then better predict Bob's action.[7]

[6] Sufficiently strong complementarities result in multiple equilibria in news selection strategies. This case is discussed in the Appendix.

[7] In fact, given the news selection function (4.4), Alice can infer that if she reads about $X_b$, then $x_a = 0$ with probability 1. However, that Alice can infer the realized value of the unreported variable with certainty is to some degree an artefact of the low dimensional state space. (Proposition 1 above provided a more general characterization of the information about the unreported event, conditional on what was reported.)

Table 1: News selection functions (A: the paper reports $X_a$; B: the paper reports $X_b$)

No strategic motive ($\lambda = 0$)

Paper A      X_a = -1   X_a = 0   X_a = 1
X_b = -1        A          A         A
X_b = 0         A          A         A
X_b = 1         A          A         A

Paper B      X_a = -1   X_a = 0   X_a = 1
X_b = -1        B          B         B
X_b = 0         B          B         B
X_b = 1         B          B         B

Complementarities in actions ($\lambda > 0$)

Paper A      X_a = -1   X_a = 0   X_a = 1
X_b = -1        A          B         A
X_b = 0         A          A         A
X_b = 1         A          B         A

Paper B      X_a = -1   X_a = 0   X_a = 1
X_b = -1        B          B         B
X_b = 0         A          B         A
X_b = 1         B          B         B

4.3. News selection and higher order beliefs. Public signals that are commonly known to be observed by all agents are particularly influential when privately informed agents interact strategically because such signals are particularly useful for agents that want to predict other agents' actions (e.g. Morris and Shin 2002). Arguably, everything that is reported by newspapers is public in the sense that it is available for those who care to look for it. However, not all information that is printed in a newspaper is observed by everybody, and even when an event is widely reported, it may not be known to readers of all newspapers how widely reported it is.

In the model above with strategic complementarities, there are states of the world where Alice and Bob read about the same event. Yet, this event may not be common knowledge. Consider first the case when Paper A reports about $X_b$. This only happens in the states $(0, 1)$ and $(0, -1)$, i.e. only in states of the world where Paper B also reports about $X_b$. This is natural since Alice has no direct interest in $X_b$ and finds it useful to know about $X_b$ only to the extent that it helps her predict the action of Bob.
Because Alice understands that Bob will read about $X_b$ for sure whenever she does, she knows that $X_b = x_b$ and that Bob knows this as well. Yet, this fact will not be common knowledge. Bob knows that he observes $X_b$ in the states $(-1, 1)$, $(1, 1)$, $(-1, -1)$, $(1, -1)$, $(0, -1)$ and $(0, 1)$. But since Alice observes $X_b$ in only the latter two states and because Bob attaches positive probability to the states where Alice does not observe $X_b$, the fact that $X_b = x_b$ is not common knowledge even though Alice and Bob both know this to be true.[8] As we will now demonstrate, the probability with which Bob believes that Alice observes the realized value of $X_b$ when he does affects how strongly he responds to $x_b$.

[8] In fact, in the simple discrete example here, the only state in which any event is common knowledge is $(0, 0)$ since it is only in this state that Alice or Bob reads a report stating that the variable they have a direct interest in equals zero.

4.4. Equilibrium actions. Alice and Bob's equilibrium actions depend on the degree of strategic complementarities both directly and through the effect the strategic motive has on the equilibrium news selection functions. Here, we derive the optimal actions taking the news selection functions described by (4.3) and (4.4) as given.

4.4.1. No strategic motive. With no strategic motive, Alice always observes $X_a$ and Bob always observes $X_b$. Alice and Bob's equilibrium actions are then trivially given by $y_a = x_a$ and $y_b = x_b$. Since $X_a$ and $X_b$ are independent random variables, Alice's and Bob's actions are also independent.

4.4.2. Strategic complementarities. With a strategic motive, Alice sometimes observes $X_b$ and Bob sometimes observes $X_a$. Bob knows that he only observes $X_a$ when Alice does so as well. Bob can thus infer Alice's action with certainty when he observes $X_a$. Furthermore, since Bob only observes $X_a$ when $x_b = 0$, Bob's optimal action when he observes $X_a$ is simply given by

$$y_b(x_a, S_b = 0) = \lambda\, y_a(x_a, S_a = 1) \tag{4.5}$$

When Alice observes $X_a$ she does not know with certainty whether Bob does so as well. If $x_a \in \{-1, 1\}$, Bob also observes $x_a$ if $x_b = 0$, which happens with probability $\frac{1}{2}$. Alice's optimal action when $x_a \in \{-1, 1\}$ is then given by

$$\begin{aligned} y_a(x_a, S_a = 1) ={} & (1 - \lambda)\, x_a \\ & + \lambda \tfrac{1}{2}\, y_b(x_a, S_b = 0) \\ & + \lambda \tfrac{1}{2}\, E\left[y_b(x_b, S_b = 1) \mid x_a, S_a = 1, S_b = 1\right] \end{aligned} \tag{4.6}$$

Because of the symmetry, the expectation on the third line equals zero. Substituting (4.5) into (4.6), simplifying and switching to general indices gives

$$y_i(x_i, S_i = 1) = \frac{(1 - \lambda)}{1 - \frac{1}{2}\lambda^2}\, x_i \tag{4.7}$$

and

$$y_i(x_j, S_i = 0) = \lambda \frac{(1 - \lambda)}{1 - \frac{1}{2}\lambda^2}\, x_j \tag{4.8}$$

We can see from (4.7) - (4.8) that regardless of whether Alice observes $X_a$ or $X_b$, the magnitude of her response depends on the probability $p(S_j = 0 \mid x_i, S_i = 1)$, which equals $\frac{1}{2}$ here. When Alice observes $X_a$ this is the probability she attaches to the event that Bob also observes $X_a$. When Alice observes $X_b$ this is the probability that Alice believes Bob attaches to the event that she observes $X_b$. Thus, the higher this probability is, the stronger the response of both agents will be. The degree to which information about an event is common among agents thus matters for the strength of their responses, even when an event is mutual knowledge. Incidentally, the expression (4.7) also describes the optimal action when agents observe that the variable they have a direct interest in equals zero, since the state $(0, 0)$ is common knowledge. It is then optimal for both agents to take a zero action.

4.5. Verifying the optimality of the conjectured news selection functions. Given the optimal actions derived above, it is straightforward to verify by direct computation that neither Paper A nor Paper B has an incentive to deviate from the conjectured news selection functions described by (4.3) and (4.4). The Appendix describes an operational algorithm for doing so.

4.6.
Correlation of actions with and without delegated news selection. To isolate the implications of the editorial function of the newspapers for agents' actions we now compare the predictions of the model with a natural alternative. In the alternative model Alice and Bob are, as in the benchmark model, restricted to observing only one out of the two realized events. However, instead of delegating the news selection to a newspaper that can condition on ex post outcomes, Alice and Bob have to make a decision ex ante about which variable to observe. Without the possibility of delegating the selection of what to observe, Alice will always choose to observe $X_a$ and Bob will always choose to observe $X_b$.[9] Since $X_a$ and $X_b$ are independent, observing $x_a$ is then uninformative about $x_b$ and vice versa. The conditional expectation of the unobserved variable is then equal to its unconditional mean and the optimal action with ex ante story choice is given by

$$y_i = (1 - \lambda)\, x_i : i \in \{a, b\} \tag{4.9}$$

Clearly, if $X_a$ and $X_b$ are independent, Alice and Bob's actions are uncorrelated in this alternative model.

Proposition 2. Delegated news selection introduces positive correlation between Alice and Bob's actions.

Proof. Direct computation of the correlation of Alice and Bob's actions gives

$$\frac{\sum_{\omega \in \Omega} p(x_a, x_b)\, y_a(x_a, x_b)\, y_b(x_a, x_b)}{\sqrt{\mathrm{var}(y_a)}\sqrt{\mathrm{var}(y_b)}} = 2\lambda \frac{(1 - \lambda)^2}{(2 - \lambda^2)^2}\, \mathrm{var}(y_i)^{-1} > 0 \tag{4.10}$$

Here, the terms associated with the states $(1, 1)$ and $(-1, -1)$ cancel against the terms associated with the equally probable states $(-1, 1)$ and $(1, -1)$. The term associated with the state $(0, 0)$ is zero. The terms associated with the states $(0, 1)$, $(0, -1)$, $(1, 0)$ and $(-1, 0)$ are all positive and, when weighted by their probabilities, sum up to the term multiplying the reciprocal of the variance in (4.10). Under ex ante information choice, these terms would all be zero.
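The claim in Proposition 2 can be checked numerically by enumerating the nine states of the discrete model. The Python sketch below is an illustration and not the authors' code: it takes the news selection rule (4.4) and the actions (4.7) - (4.8) as given, computes the covariance and correlation of the two actions, and compares them with the ex ante benchmark (4.9). The function names and the value $\lambda = 0.5$ are assumptions chosen for illustration, and the sketch presumes $0 < \lambda < 1$.

```python
from itertools import product
from math import sqrt

# pmf (4.1) of each potential story
PMF = {-1: 0.25, 0: 0.5, 1: 0.25}


def reports_own(x_own, x_other):
    """Rule (4.4): S_i = 0 only if x_i = 0 and x_j is -1 or 1."""
    return not (x_own == 0 and x_other != 0)


def action(x_own, x_other, lam):
    """Equilibrium action under delegated news selection."""
    scale = (1 - lam) / (1 - 0.5 * lam ** 2)
    if reports_own(x_own, x_other):
        return scale * x_own           # equation (4.7)
    return lam * scale * x_other       # equation (4.8)


def action_moments(lam, delegated=True):
    """Return (covariance, correlation) of y_a and y_b.

    Both actions have mean zero by symmetry, so raw second moments
    equal the covariance and the variances.
    """
    cov = var_a = var_b = 0.0
    for xa, xb in product(PMF, PMF):
        prob = PMF[xa] * PMF[xb]       # independence, equation (4.2)
        if delegated:
            ya, yb = action(xa, xb, lam), action(xb, xa, lam)
        else:
            ya, yb = (1 - lam) * xa, (1 - lam) * xb   # equation (4.9)
        cov += prob * ya * yb
        var_a += prob * ya * ya
        var_b += prob * yb * yb
    return cov, cov / sqrt(var_a * var_b)


lam = 0.5  # illustrative value only
print(action_moments(lam, delegated=True))
print(action_moments(lam, delegated=False))
```

Under these assumptions, the covariance with delegated selection coincides with the term $2\lambda(1-\lambda)^2/(2-\lambda^2)^2$ in (4.10), while the ex ante benchmark yields a zero covariance.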
The editorial function of newspapers thus introduces correlation in agents' actions that is absent if agents choose ex ante what variable to get information about.

[9] With $\lambda$ close enough to 1, it is optimal for Alice and Bob to coordinate on both always observing either $X_a$ or $X_b$.

5. Extreme events and approximate common knowledge

The model above allowed us to analyze how state dependent news selection affects agents' beliefs and actions and we demonstrated that agents' preferences and the distribution of events influence the degree to which an event is commonly known. In the data, we saw that the 9/11 attacks and the Lehman Brothers bankruptcy made news coverage more homogenous across news outlets. Arguably, what made these events special and so widely reported was their magnitude, as both bank failures and terrorist attacks happen frequently on a smaller scale. The simple discrete state space set-up above did not allow us to capture the notion of a large magnitude event. In this section we therefore extend the model above to allow for continuously distributed events so that we can meaningfully analyze the implications of large magnitude events.

5.1. Optimal simple news selection functions. With continuous distributions of the potential stories $X_a$ and $X_b$, the optimal news selection functions are infinite dimensional objects and do not in general have known functional forms. While it is possible to show that the optimal news selection functions will take the form of a threshold function that determines how large (or how negative) $X_b$ needs to be for Paper A to report about it, the thresholds will in general be specific to the realized value of $X_a$. Here, we restrict the news selection functions to belong to a simple parametric class of the form

$$S_i = \begin{cases} 1 & \text{if } |x_i| \geq \alpha + \beta |x_j|^{\gamma} \\ 0 & \text{otherwise} \end{cases} \tag{5.1}$$

The threshold function (5.1) is symmetric around zero, and symmetric across Paper A and Paper B. Subject to these constraints, the parameters $\alpha$, $\beta$ and $\gamma$ are again chosen so that Paper A maximizes the expected utility of Alice and Paper B does the same thing for Bob.

5.2. Conditional actions. When $X_a$ and $X_b$ are continuously distributed, the conditional expectations in the first order condition (3.3) can be expressed as

$$y_i(x_k, S_i) = (1 - \lambda) \frac{\int x_i\, p(x_i, x_j, S_i)\, dx_i}{p(x_k, S_i)} + \lambda \frac{\iint y_j(x_i, x_j)\, p(x_i, x_j, S_i)\, dx_i\, dx_j}{p(x_k, S_i)} : i, j, k \in \{a, b\},\ i \neq j \tag{5.2}$$

For independent, symmetric, zero mean distributions of $X_a$ and $X_b$, the news selection function (5.1) implies that the expected value of $x_i$ conditional on $x_j$ and $S_j = 0$ is also zero.[10] As in the previous section, it will again be optimal for Paper B to report about $X_a$ only when Paper A does so as well. That is, if $S_j = 0$, then $S_i = 1$. This allows us to simplify the expression (5.2) to

$$y_i(x_i, S_i = 1) = \frac{(1 - \lambda)}{1 - \lambda^2\, p(S_j = 0 \mid x_i, S_i = 1)}\, x_i \tag{5.3}$$

and

$$y_i(x_j, S_i = 0) = \lambda \frac{(1 - \lambda)}{1 - \lambda^2\, p(S_i = 0 \mid x_j, S_j = 1)}\, x_j \tag{5.4}$$

[10] One way to think about this is that $p(x_i, x_j, S_i = 0)$ is simply $p(x_i)$ with symmetrically truncated tails at $|x_i| = \alpha_i + \beta_i |x_j|^{\gamma_i}$.

As in the discrete states model above, the strength of the responses to $x_i$ depends on the degree to which the realized value of $x_i$ is common knowledge. However, here the probability with which an agent that observes $x_i$ believes that the other agent also observes $x_i$ varies continuously with the realized state. Given the news selection function (5.1), the probability in the denominator of (5.3) and (5.4) is increasing in the absolute realized value of $x_i$ and $x_j$, respectively, and so agents respond more than proportionally to large magnitude events.

With continuous distributions, we need to solve the model numerically. A solution can be found by letting Paper A choose $\alpha$, $\beta$ and $\gamma$ in order to maximize Alice's utility, taking the news selection function of Paper B and Bob's actions as given. Paper B then chooses $\alpha$, $\beta$ and $\gamma$ in order to maximize Bob's expected utility, taking the Paper A news selection function from the first step as given. Iterating between these two steps until convergence yields a solution.

Figure 5 illustrates several model outputs. The left column corresponds to $X_i \sim U(-1, 1)$ and the right column to $X_i \sim N(0, \frac{1}{3})$. To facilitate comparison, the variance of the Gaussian distribution is chosen so that most of its probability mass lies within the support of the uniform distribution.

5.3. News selection, the strategic motive and publicness of information. As in the discrete state model above, when $\lambda = 0$ Paper A always reports $X_a$ and Paper B always reports $X_b$. When $\lambda > 0$, Paper A will sometimes find it optimal to report $X_b$ and Paper B will sometimes find it optimal to report $X_a$. Clearly, Alice's expected loss of not knowing $X_a$ is increasing in the absolute realized value of $X_a$. When $\lambda > 0$, it is also increasingly costly for Bob to not know about Alice's action as the absolute magnitude of her action increases. The probability that Alice and Bob observe $X_a$ is thus increasing in the absolute value of $x_a$.

5.3.1. No strategic motive. The second row of Figure 5 illustrates the probability that Alice (solid lines) and Bob (dashed lines) observe $X_a$ conditional on the realized value $x_a$. When $\lambda = 0$ (blue and purple lines), Alice always observes $X_a$ and Bob never observes $X_a$, so for all values of $X_a$ the associated probabilities are 1 and 0 respectively.

5.3.2. Moderate strategic motive. When $\lambda = 0.3$ (red and green lines) and the absolute value of $x_a$ is small, Paper A is more likely to report $X_b$. However, the probability that Alice observes $X_a$ tends to 1 rapidly as $|x_a|$ increases. Bob is also more likely to observe $X_a$ as $|x_a|$ increases, but since the states in which Bob observes $X_a$ are a subset of the states where Alice observes $X_a$, Bob's probability of observing $X_a$ is lower than the probability that Alice does so for every value of $x_a$ and it is increasing at a slower rate in $|x_a|$. By Bayes' rule, the probability that Bob observes $X_a$ conditional on Alice doing so is given by

$$p(S_b = 0 \mid S_a = 1, x_a) = \frac{p(S_a = 1 \mid S_b = 0, x_a)\, p(S_b = 0 \mid x_a)}{p(S_a = 1 \mid x_a)} \tag{5.5}$$

Since Bob only observes $X_a$ when Alice does, we have that $p(S_a = 1 \mid S_b = 0, x_a) = 1$ so that (5.5) simplifies to

$$p(S_b = 0 \mid S_a = 1, x_a) = \frac{p(S_b = 0 \mid x_a)}{p(S_a = 1 \mid x_a)}. \tag{5.6}$$

Alice knows that Bob is more likely to observe $X_a$ as the absolute value of $x_a$ increases, so larger magnitude events tend to be closer to common knowledge. For the uniform distribution, Alice attaches about a 30 per cent probability to the event that Bob observes $X_a$ when $X_a$ is close to $-1$ or $1$. When the events are normally distributed, the same probability is just above 60 per cent. The difference is explained by the fact that with normally distributed variables, the probability mass is more concentrated around the means, so conditional on the realized value of $X_a$, it is then less likely that $X_b$ has a large enough absolute value to make Paper B report about that instead. While not shown in the graph, if we extended the x-axis, the probability that Bob observes $X_a$ would tend to 1 as the absolute value of $x_a$ grows arbitrarily large.

5.3.3. Strong strategic motive. As the strategic motive is strengthened the cost of not observing the event that the other agent observes increases. When $\lambda = 0.6$ (yellow and turquoise lines) both Paper A and Paper B will simply report the variable that has had the largest absolute realization. Since the news selection functions are known to both agents, Alice can then infer that if she observes $X_a$ then $|x_a| > |x_b|$ so that Paper B will also report $X_a$. With sufficiently strong complementarities in actions, both papers will always report the same event and the reported event will be common knowledge.
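The conditional probability in (5.6) and the limiting case of Section 5.3.3 can be illustrated with a small numerical sketch. The Python code below assumes $X_b \sim U(-1, 1)$ and the parametric threshold rule (5.1). The parameter values used ($\alpha = 0$, $\gamma = 1$ and $\beta$ equal to 0.5 or 1) are hypothetical choices for illustration and are not the optimized values behind Figure 5. The function approximates $p(S_b = 0 \mid S_a = 1, x_a)$ by applying both papers' rules on an evenly spaced grid of values for $x_b$.

```python
def reports_own(x_own, x_other, alpha, beta, gamma):
    """Threshold rule (5.1): S_i = 1 iff |x_i| >= alpha + beta * |x_j|**gamma."""
    return abs(x_own) >= alpha + beta * abs(x_other) ** gamma


def prob_bob_sees_xa_given_alice_does(xa, alpha, beta, gamma, n_grid=2001):
    """Approximate p(S_b = 0 | S_a = 1, x_a) when X_b ~ U(-1, 1).

    Equal-weight grid midpoints on (-1, 1) stand in for the uniform
    density of x_b. Returns NaN if Paper A never reports x_a on the grid.
    """
    alice = both = 0
    for k in range(n_grid):
        xb = -1.0 + (2.0 * k + 1.0) / n_grid      # midpoint of cell k
        if reports_own(xa, xb, alpha, beta, gamma):           # S_a = 1
            alice += 1
            if not reports_own(xb, xa, alpha, beta, gamma):   # S_b = 0
                both += 1
    return both / alice if alice else float("nan")


# Hypothetical parameter values, for illustration only.
for xa in (0.25, 0.5, 1.0):
    moderate = prob_bob_sees_xa_given_alice_does(xa, 0.0, 0.5, 1.0)
    largest = prob_bob_sees_xa_given_alice_does(xa, 0.0, 1.0, 1.0)
    print(xa, moderate, largest)
```

With $\alpha = 0$ and $\beta = \gamma = 1$ the rule reduces to "report the variable with the largest absolute realization", so whenever Alice observes $x_a$ Bob does too, which is the common knowledge case described in Section 5.3.3. With a smaller $\beta$ the conditional probability stays below one, so a reported event is mutual knowledge less often.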
[Figure 5: two columns of panels, $U(-1, 1)$ (left) and $N(0, 1/3)$ (right), each plotted against $x_i \in [-1, 1]$. Rows: $p(x)$; $p(S \mid x_i)$, with legend $p(S_i = 1 \mid x_i)$ and $p(S_j = 0 \mid x_i)$ for $\lambda = 0$, $0.3$ and $0.6$; and $E[y_a + y_b \mid x_i]$ for $\lambda = 0$, $0.3$ and $0.6$.]

Figure 5. The top row illustrates the pdfs of the $U(-1, 1)$ and $N(0, 1/3)$ distributions. The second row illustrates the probability that paper A (solid lines) and paper B (dashed lines) report $X_i$ conditional on $x_i$. The bottom row illustrates the expected aggregate response conditional on $x_i$.

5.4. Non-linear aggregate responses. The expected aggregate response conditional on $x_i$ depends on how likely it is that agents observe $X_i$. When Alice observes $X_a$, the strength of her response depends on how likely she thinks it is that Bob also observes $X_a$, on the probability she believes that Bob attaches to the event that Alice also observes $X_a$, and so on. Since it is more likely that Alice and Bob observe $X_a$ when it has a large absolute realization, and because Alice and Bob know that it is then more likely that they observe $X_a$, and so on, their expected responses conditional on observing $X_a$ are non-linear.

This nonlinearity is illustrated in the bottom row of Figure 5. For realizations of $X_a$ close to zero, the probability that Alice or Bob observes $X_a$ is also close to zero. In the limit, the expected response is then also zero, since if no agent observes $X_a$ their expectation of $X_a$ equals its unconditional mean. With a moderate strategic motive, even if Alice were to observe $X_a$, she knows that there is only a small probability that Bob also observes $X_a$. This makes Alice's response, conditional on observing a small realization of $X_a$, weaker as well. As the realized absolute value of $X_a$ increases, the probability that Alice and Bob read about it increases, and conditional on Alice observing $X_a$ the magnitude of her response increases more than proportionally in $|x_a|$ as the probability that Bob also observes $X_a$ increases.

6. Conclusions

News media are an important source of information for a large part of society. In this paper we have argued that in order to understand how news media affect decisions, we need to first understand how they select what stories to report. We therefore obtained text fragments from a large number of news stories published in US newspapers during the months around the September 11 terrorist attacks and the Lehman bankruptcy in 2008. We then used a Latent Dirichlet Allocation statistical topic model to document three stylized facts about newspaper coverage.

First, different newspapers provide specialized content and tend to cover different topics to different degrees. For example, the financial crisis received particularly large amounts of coverage by the Wall Street Journal, and both the New York Times and USA Today assigned above-average weights to the September 11 attacks. Second, major events such as terrorist attacks or financial crises result in a high fraction of news content being devoted to the topics associated with these events. As an example, the LDA model attributes more than 50 per cent of the total news coverage during the days following the September 11 attacks to the topic associated with the attacks. Third, major events make news coverage more homogenous across newspapers. The September 11 terrorist attacks, the 2008 political party conventions, the Lehman bankruptcy and the failed bailout package proposed by then Secretary of the Treasury Hank Paulson were events that all resulted in a majority of newspapers devoting more coverage to these events than to any other.

Motivated by these stylized facts about news coverage, we proposed a theoretical model that can match them.
We used the model to argue that, in order to understand how agents respond to particular events, one has to distinguish between information that is publicly available, in the sense of being reported by at least one newspaper; mutual information, in the sense of being reported by all newspapers; and information that is common knowledge, i.e. information that all agents know, that all agents know that all agents know, and so on.

In the model, how widely reported an event becomes is endogenous and depends on agents' preferences and on what other events have occurred that compete for the available news coverage space. The probability with which agents believe that other agents observe the same information as they do depends on the news selection functions of the information providers. In general, information in the model is neither purely private nor common knowledge but varies probabilistically.

We demonstrated that, in the model, agents' actions tend to be more correlated when they delegate the news selection to newspapers that can condition on ex post events before deciding what to report, compared to a setting where agents decide ex ante what to get information about. We also showed that large events make information closer to common knowledge. With strategic complementarities in actions, agents then respond more than proportionally to large events.

That the editorial role of information providers facilitates coordination in some states of the world has implications for the large existing literature proposing that business cycles are (at least partly) caused by agents coordinating on either pure sunspot shocks, e.g. Cass and Shell (1983), on noisy public signals, e.g. Lorenzoni (2009) and Nimark (2014), or on "sentiment" shocks, e.g. Angeletos and La'O (2013) and Angeletos, Collard and Dellas (2014).
One feature these papers have in common is that the coordination of actions cannot rely solely on the information that is transmitted through prices. Blinder and Krueger (2004) report that a majority of households get most of their economic information from newspapers. Since business cycles require that millions of households and firms take correlated actions, it seems plausible that coordination must then partly work through mass media. The argument we make in this paper is that, to the extent that coordination works via news media, coordination is facilitated in those states of the world where news coverage is more uniform across different news providers.

In the theoretical model proposed here, we took a very benevolent view of how news media select what to report. While truthful and unbiased reporting that aims to maximize the utility of the reader may or may not be a good approximation of reality, the mechanism that we have described in this paper will be at work as long as any biases in news reporting are systematic and understood by the agents in the model. For instance, if newspapers tend to be more likely to report bad news than good news, then bad news events will be more widely reported, closer to common knowledge and provoke stronger responses than good news of similar magnitude. A benevolent and accurate news media is also a natural benchmark to start from, and we think that it is interesting that even under such ideal assumptions, news media can have important effects on agents' decisions and beliefs.

References

[1] Alvarez, F., F. Lippi and L. Paciello, 2011, "Optimal Price Setting With Observation and Menu Costs", Quarterly Journal of Economics 126, pp. 1909-1960.
[2] Angeletos, G.M., F. Collard and H. Dellas, 2014, "Quantifying Confidence", working paper, MIT.
[3] Angeletos, G.M. and J. La'O, 2013, "Sentiments", Econometrica 81, pp. 739-779.
[4] Angeletos, G.M., L. Iovino and J. La'O, 2015, "Real Rigidity, Nominal Rigidity, and the Social Value of Information", mimeo, MIT.
[5] Asuncion, A., M. Welling, P. Smyth and Y.W. Teh, 2009, "On smoothing and inference for topic models", Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 27-34, AUAI Press.
[6] Baker, S.R., N. Bloom and S.J. Davis, 2013, "Measuring economic policy uncertainty", Chicago Booth research paper 13-02.
[7] Bao, Y. and A. Datta, 2014, "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures", Management Science 60, pp. 1371-1391.
[8] Blei, D.M. and J.D. Lafferty, 2009, "Topic Models", in A. Srivastava and M. Sahami, editors, Text Mining: Classification, Clustering, and Applications, Chapman and Hall CRC Data Mining and Knowledge Discovery Series.
[9] Cass, D. and K. Shell, 1983, "Do Sunspots Matter?", Journal of Political Economy 91, pp. 193-227.
[10] Doms, M. and N. Morin, 2004, "Consumer Sentiment, the Economy, and the News Media", Federal Reserve Bank of San Francisco Working Paper 2004-9.
[11] Fogarty, B.J., 2005, "Determining Economic News Coverage", International Journal of Public Opinion Research 17, pp. 149-172.
[12] Gentzkow, M. and J. Shapiro, 2006, "Media Bias and Reputation", Journal of Political Economy 114, pp. 280-316.
[13] Gentzkow, M. and J. Shapiro, 2008, "Competition and Truth in the Market for News", Journal of Economic Perspectives, pp. 133-154.
[14] Griffiths, T. and M. Steyvers, 2004, "Finding scientific topics", Proceedings of the National Academy of Sciences 101, pp. 5228-5235.
[15] Hellwig, C. and L. Veldkamp, 2009, "Knowing what others know", Review of Economic Studies, pp. 223-251.
[16] Jaimovich, N. and S. Rebelo, 2009, "Can News about the Future Drive the Business Cycle?", American Economic Review 99(4), pp. 1097-1118.
[17] Kajii, A. and S. Morris, 1997, "Common p-Belief: The General Case", Games and Economic Behavior 18, pp. 73-82.
[18] Lorenzoni, G., 2009, "A Theory of Demand Shocks", American Economic Review 99(5), pp. 2050-84.
[19] Mackowiak, B. and M. Wiederholt, 2009, "Optimal Sticky Prices under Rational Inattention", American Economic Review 99(3), pp. 769-803.
[20] Mackowiak, B. and M. Wiederholt, 2010, "Business Cycle Dynamics under Rational Inattention", Review of Economic Studies, pp. 1502-1532.
[21] Matejka, F., forthcoming, "Rationally Inattentive Seller: Sales and Discrete Pricing", Review of Economic Studies.
[22] Matejka, F. and A. McKay, 2015, "Rational inattention to discrete choices: A new foundation for the multinomial logit model", American Economic Review 105, pp. 272-98.
[23] Mahajan, A., L. Dey and S.M. Haque, 2008, "Mining financial news for major events and their impacts on the market", in Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 423-426, IEEE.
[24] Monderer, D. and D. Samet, 1989, "Approximating common knowledge with common beliefs", Games and Economic Behavior, pp. 170-190.
[25] Morris, S. and H.S. Shin, 1997, "Approximate Common Knowledge and Co-ordination: Recent Lessons from Game Theory", Journal of Logic, Language, and Information 6, pp. 171-190.
[26] Morris, S. and H.S. Shin, 2002, "The social value of public information", American Economic Review 92, pp. 1521-1534.
[27] Paciello, L. and M. Wiederholt, 2013, "Exogenous Information, Endogenous Information and Optimal Monetary Policy", Review of Economic Studies.
[28] Porter, M.F., 1980, "An algorithm for suffix stripping", Program 14, pp. 130-137.
[29] Ramage, D., S. Dumais and D. Liebling, 2010, "Characterizing Microblogs with Topic Models", Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media.
[30] Soroka, S.N., 2006, "Good news and bad news: Asymmetric responses to economic information", Journal of Politics 68, pp. 372-385.
[31] Soroka, S.N., 2012, "The gatekeeping function: Distributions of information in media and the real world", Journal of Politics 74, pp. 514-528.
[32] Soroka, S.N., D.A. Stecula and C. Wlezien, 2015, "It's (Change in) the (Future) Economy, Stupid: Economic Indicators, the Media, and Public Opinion", American Journal of Political Science 59, pp. 457-474.
[33] Stevens, L., 2014, "Coarse Pricing Policies", working paper, University of Maryland.
[34] Van Nieuwerburgh, S. and L. Veldkamp, 2009, "Information immobility and the home bias puzzle", Journal of Finance 64, pp. 1187-1215.
[35] Van Nieuwerburgh, S. and L. Veldkamp, 2010, "Information acquisition and under-diversification", Review of Economic Studies 77, pp. 779-805.
[36] Veldkamp, L., 2006a, "Information markets and the comovement of asset prices", Review of Economic Studies 73, pp. 823-845.
[37] Veldkamp, L., 2006b, "Media Frenzies in Markets for Financial Information", American Economic Review 96, pp. 577-601.