Multiproduct Firms, Information, and Loyalty∗ Bharat N. Anand† Ron Shachar‡ Harvard Business School Tel-Aviv University Abstract The information set of consumers who are uncertain about product attributes is a central ingredient in the analysis of marketing researchers and the strategies of marketing practitioners. Here, we suggest that the profile of a multiproduct firm is an important element in the information set about each one of its products. As an example of such profiles, Honda is known to produce gasoline-efficient cars, whereas Volvo is perceived to focus on safety. The model includes product differentiation and heterogeneous consumers who are uncertain about product attributes. Like previous studies, a consumer’s information set for a product includes noisy signals she obtains about the product’s attributes, and the parameters of her prior distribution. Unlike previous studies, her prior distribution of a product’s attributes depends on the firm’s profile (which is defined as the distribution of attributes across the firm’s products). We show that by revising the information set in this way, one can explain interesting empirical regularities concerning consumer behavior. For example, the consumer is loyal to a multiproduct firm even at times that it does not offer a product that matches her preferences better than competing firms. Our model also offers a parsimonious way of thinking about brand extension strategies and maps new channels of spillovers within a multiproduct firm. We estimate the model and test its implications using panel data on television viewing choices. The estimated model also allows for state dependence parameters and unobserved heterogeneity. Our structural estimates come from maximizing a simulated likelihood function and using importance sampling. The empirical results support the model and its implications. We find that the profile of a multiproduct firm is an important element in the information set of consumers. Finally, we illustrate the empirical bias in standard choice models that do not structure the information set as suggested here. ∗ We are grateful to Dmitri Byzalov for outstanding research assistance, and to Luis Cabral, Zvi Eckstein, Tulin Erdem, Barry Nalebuff, Ariel Pakes, Peter Reiss, Manuel Trajtenberg, and seminar participants at Econometrics in Tel-Aviv, Marketing Science 2001, Harvard, and Yale for helpful comments. † Soldiers Field Road, Boston, MA 02163. Phone: (617) 495-5082; Fax: (617) 495-0355; email: banand@hbs.edu. ‡ Faculty of Management, Tel Aviv University, Tel Aviv, Israel, 69978. Phone: 972-3-640-6311; Fax: 972-3-6405621; email: rroonn@post.tau.ac.il. 1 Key words: multiproduct firms, incomplete information, consumer loyalty, unobserved heterogeneity, brand extensions, television networks. 2 1 Introduction The information set of consumers who are uncertain about product attributes is a central ingredient in the analysis of marketing researchers and the strategies of marketing practitioners. Here, we suggest that the profile of a multiproduct firm is an important element in the information set for each one of its products.1 For example, consumers are likely to expect that any Volvo automobile is safer than most cars. We show that by revising the information set to include firms’ profiles, one can explain interesting empirical regularities concerning consumer behavior. We test our model using a dataset on television viewing choices and find that the informational role of multiproduct firms is both statistically and behaviorally significant. Outline of the Model In today’s economy, virtually no firm produces one product only.2 In many cases, these firms have distinct profiles. Honda, for example, is known to produce gasolineefficient cars whereas Volvo is perceived to focus on safety. Including a firm’s profile in the information set is ignored by previous studies of brand choice. Yet, these studies reveal key elements of the information set, such as word of mouth, media coverage, advertising, and previous experience with a product.3 One avenue through which information flows within a multiproduct firm is revealed by Erdem (1998). She shows that consuming a product affects a consumer’s information set about other products of the same firm.4 Although our model differs from previous ones in the structure of the information set, our setting is somewhat similar to many of these studies. Specifically, we model product differentiation and heterogeneous consumers who are uncertain about product attributes. Their utility is a function of the match between their tastes and product attributes. For example, families like vans more than singles do. In previous studies of multiproduct firms, each firm offers a portfolio of products at each point in time. In contrast, in the model studied here, at each point in time t there are J multiproduct firms, and each of them offers a new product. Thus, each product is only consumed once in the time frame of our study.5 1 pro·file (prfl) n. A set of data portraying the significant features of something; a concise biographical sketch. (Merriam-Webster’s Collegiate Dictionary, 2001). 2 See Ogiba (1988) and Kesler (1989). 3 See, for example, Eckstein, Horsky, and Raban (1998), Erdem and Keane (1996), Crawford and Shum (2000), Shachar and Anand (1998), and Anand and Shachar (2001). 4 We highlight the differences between Erdem’s approach and ours later on. 5 Multiple appearances of a product serve studies that focus on dynamic learning from experience, but not ours which is focusing on the prior distribution. These differences express themselves in the data sets as well. We use data on purchases of multiple products of the same firm, while previous studies have used data on multiple purchases of the same product. 3 Following previous studies, we assume that consumers receive unbiased noisy signals on the attributes of each product. These signals come from previous experience with the good, media coverage, and other sources. Under these assumptions, a consumer’s expected utility from a product is a function of these product-specific signals and her prior beliefs about product attributes. We depart from previous studies by hypothesizing that these prior beliefs depend on the profile of the multiproduct firm. A firm’s profile is characterized by (a) the mean attributes across all products that it offers and (b) by the variance of these attributes. Hereafter, we refer to the first characteristic as the firm’s “image” and to the inverse of the variance as the firm’s “precision.” For example, if automobiles were to have only one attribute–miles per gallon–then the profile of any automaker would be completely characterized by the average miles per gallon across all its models and by the variance of this variable. For any given product, the consumer’s prior distribution of its attributes will then depend on this overall profile. In particular, the expectation of the prior is equal to the match between the consumer’s taste and the mean attribute of the firm. We show that the purchase probability of a product is a function of (a) the match between the consumer’s taste and the product attributes observed by the researcher, and (b) the match between the consumer’s tastes and the firm’s image. Notice that the purchase decision depends on the firm profile even if the individual previously did not consume any of this firm’s products. The prediction that the purchase probability depends on the firm’s profile differentiates the suggested model from previous ones. The model, and other testable implications, are presented in section 3 where we show that the effect of a firm’s profile on the purchase probability is not fixed. This effect should be smaller for consumers who are relatively well-informed about a specific product. It is also expected to be smaller for better-known products. Finally, the effect should be smaller for firms with a diverse product profile. Implications The behavioral and managerial implications of the model are presented in detail in sections 3 and 6, and briefly here. First, the model introduces a new source of consumer loyalty. As mentioned above, the purchase probability depends on product attributes and on a firm’s profile. While the first element is product specific, the second is common to all products of a given firm. This common element generates loyalty to multiproduct firms. This loyalty expresses itself in an individual’s tendency to purchase a product from the firm whose image best fits her taste even when the specific product does not match her preferences better than the products of competing firms. (We call this phenomenon “excess loyalty.”) For Although the setting of a firm offering a single product in any period might seem restrictive, it can be reinterpreted as a multiproduct firm offering a portfolio of products at a point in time, with t indexing the product categories. 4 example, since it is virtually impossible to know whether you like a book before you read it, individuals frequently make decisions based on a writer’s style. Thus, a consumer who likes Ernest Hemingway’s no-nonsense writing style–on themes such as war, blood sports, and crime, and about heroes who confront tremendous odds with “grace under pressure”–is likely to purchase his book Green Hills of Africa. This consumer would turn out to be surprised. Notice that although such behavior leads the individual to suboptimal choices in some cases, it is still optimal behavior given her uncertainty about product attributes. A consumer’s excess loyalty is explained in previous studies by including an unobserved individual-firm match parameter. Thus, while the traditional approach leaves the explanation of loyalty in a “black box”, we present a behavioral foundation for such loyalty. Our explanation, based on the informational role of multiproduct firms, thus contrasts with the traditional statistical solution (unobserved heterogeneity) to explain excess loyalty. The behavioral and statistical sources of loyalty have different managerial implications. Note that our model includes both sources of loyalty (behavioral and statistical) as well as state dependence parameters. Each of these sources can generate consumer loyalty. In section 4, we demonstrate how these different sources can be distinguished using a panel dataset. The second implication of the model relates to the issues of brand extensions and brand alliances. In section 6, we illustrate the consequences of extension decisions on a firm’s market share. When considering an extension, a brand manager is concerned typically about the spillover effects on the firm’s other products. It turns out that these spillover effects are richer than one might expect. Consider the following case. A firm adds a new product that does not change its image. While it may seem that such a change should not affect the market share of the firm’s other products, we show that it does. The reason is that such a change increases the firm’s precision, and as a result, consumers place less weight on the signals they receive on each product. Thus, our model offers a parsimonious way of thinking about brand extensions and brand alliances–in terms of their effect on the mean and variance of a firm’s profile–and maps new channels of spillovers. Third, we also illustrate the empirical bias in standard choice models that results from not including the firms’ profiles in the information set. Our Monte Carlo experiments, using a specific and reasonable set of parameters, reveal that this bias is significant. The estimate of the consumer taste parameter is downward biased by about 40%. Empirical Application Even though our model is relevant to several markets, it describes the media and entertainment industries particularly well. In the last decade the television, movie, music, 5 publishing, and new media industries have been growing rapidly.6 The relevant characteristics of these industries, as described in detail later, are: (1) consumers are bound to be uncertain about product attributes (because of constant changes in product attributes); (2) there is high product differentiation; (3) heterogeneity in consumer preferences is large; and (4) multiproduct firms with distinct profiles happen to be the norm in these markets.7 Thus, the setting of our model, while relevant to many industries, is especially germane here, and might be useful to researchers and practitioners seeking to understand consumer choices and firms’ strategies. Since the model is especially relevant to the entertainment industry, we test the model based on data from the television industry. By using a panel dataset on television viewing choices from 1995 and data on show attributes, we estimate the model and test its implications. In our application, the products are television shows and the major multiproduct firms are the four national television networks.8 We describe the datasets in section 2 and estimate the model and test its implications in section 4. The relevance of the model and of the various applications detailed above depends on its empirical validity. The results, presented in section 5, show that the data support the model. Specifically, our non-structural estimation finds that firms’ images affect product viewing choices. It turns out that this effect is as important as the effect of product attributes themselves. This evidence suggests that the new source of information introduced in this study is both statistically significant and behaviorally important. Our structural estimation, based on maximum simulated likelihood methods, reveals that these results hold even when all the restrictions of the model are imposed. For example, one restriction concerns the distribution of the unobserved signals on show attributes. This and other restrictions require multivariate integration of the likelihood function. To deal with the resulting complexity, we employ importance sampling. The structural estimation reveals substantial heterogeneity across viewers and networks in 6 See the survey of technology and entertainment in the November 21, 1998 issue of The Economist. These include, among others, Disney, AOL, NBC, Penguin Books, Julia Roberts, and Harrison Ford. Disney, for example, is family-oriented and thus avoids controversial shows even on ABC, the television network that it acquired. 8 Mankiw (1998) presents the network television industry as a good example of an industry with established multi-product firms. He, like others, refers to such firms as “brands”, and writes: “Establishing a brand name–and ensuring that it conveys the right information–is an important strategy for many businesses, including TV networks.” Furthermore, he reproduces a New York Times article (September 20, 1996) that reads: “In television, an intrinsic part of branding is selecting shows that seem related and might appeal to a particular certain audience segment. It means ‘developing an overall packaging of the network to build a relationship with viewers, so they will come to expect certain things from us,’ said Alan Cohen, executive vice-president for the ABC-TV unit of the Walt Disney Company in New York.” These quotes suggest that the television industry seems to be appropriate to examine the empirical questions at hand. 7 6 the precision and/or number of signals they obtain about product attributes. For example, we find that people are better informed about shows on ABC and NBC. Our estimation also reveals that the informative role of firms’ profiles is substantial. Specifically, when forming their expected utilities, individuals place the same weight on this information as on all the signals they receive about the products. Notice that this is consistent with the non-structural results. Using our structural estimates and two different measures of loyalty, we find that the informational source of loyalty is more important than that due to unobserved heterogeneity. Related Literature One of the closest studies to ours is Erdem (1998). She examines the extent to which a consumer’s perception of product quality is affected by his experience with other products of the same firm. Her paper follows theoretical studies that have focused on the informational role of multiproduct firms, referred to there as “umbrella brands” (Wernerfelt [1988], Montgomery and Wernerfelt [1992], and Cabral [2001]). While these studies focus on experience goods, our model is relevant mostly to products that have at least one non-experience attribute.9 A consumer who knows the attributes of such a product is assumed to be certain about her utility from the good. For example, many consumers buy a computer without testing it first.10 The difference between the focus of previous studies and ours expresses itself in the modeling approaches. Erdem shows that a consumer’s perception of product quality depends on his experience with other products of the same firm. Our model predicts that the purchase decision depends on the firm’s image even if the individual did not consume any of this firm’s products previously. Indeed, in the relevant markets for our model, consumers often have an informed perception of a firm’s offering even without direct experience. They may have gained this information through word of mouth or media coverage. For example, most consumers are aware of the differences between Japanese and Swedish auto makers even without having driven any such cars. The second, and related, difference between Erdem’s study and ours is that her study focuses on the role of firms as signals of product quality. In our setting, the signals convey information both on the vertical attributes of a product as well as on its horizontal characteristics.11 Over 9 These types of products are sometimes called “search goods”. The products in our data set, television shows, cannot be categorized as pure “search goods”, because even a full description of the products cannot resolve all of the uncertainty that an individual has about her utility. However, it is clearly the case that information about the products’ attributes decreases consumers’ uncertainty. For example, the uncertainty that an individual faces about a show’s attributes decreases when she knows its cast demographics. These attributes allow us to apply our model to the television data. 11 The terms vertical and horizontal differentiation are often used in the industrial organization literature. While all consumers have either positive or negative tastes for the vertical attributes of a product, some have positive, and others negative, tastes for horizontal attributes. 10 7 time, product diversity has been dramatically increasing in many markets. This factor is likely to intensify consumer uncertainty about the horizontal attributes of products. As a result, the informational role of firms as signals of these attributes has increased as well. Sullivan (1990) is the first to present non-experimental evidence for spillovers in umbrellabranded products. She provides a methodology to analyze spillovers and uses it to measure two instances of spillover effects. Her study shows that negative spillovers resulted from the Audi 5000’s problems with sudden acceleration and that positive spillovers resulted from Jaguar’s first major model change in 17 years. Sullivan’s approach elegantly exploits cases in which one can easily identify a unique event. Our methodology does not require such natural experiments. Furthermore, while she uses aggregate data, we use individual level panel data that allows us to account for individual-specific effects. The flow of information between a firm and its products has been examined using experimental data. Morrin (1999) shows that brand extensions can modify the perceived profile of a multiproduct firm. Simonin and Ruth (1998) and Park and Srinivasan (1994) demonstrate a similar phenomenon with respect to brand alliances. In another study somewhat similar to ours, Moshkin and Shachar (2001) show that state dependence can be due to asymmetric information and search cost. The main difference between their study and the one presented here is in the aim of the study: they focus on state dependence, which is a specific aspect of loyalty. State dependence is defined as repeat purchase in sequential periods. We, on the other hand, focus on loyalty across any two periods (not necessarily sequential ones). This difference then expresses itself in the modeling approaches.12 A brief description of the datasets in section 2 is used to motivate the description of the model in section 3. There, we describe the structure of the consumer’s information set and the implications of the model. Section 4 discusses estimation issues, and the results are presented in section 5. In section 6, we examine several applications of the model. Section 7 concludes. 2 Data Although the setting of the model is not industry-specific, we start by presenting the data. This approach is intended to make the presentation of the model intuitive. Our data include television viewing choices, viewers’ demographic characteristics, and show 12 For example, while the information set in Moshkin and Shachar (2001) depends on the previous purchase, ours does not. Also, their model focus on asymmetric information, while ours on the informational role of multi-product firms. 8 (product) attributes. The first two, from Nielsen Media Research, are described first. Show attribute data, created specially for this study, are described subsequently. 2.1 The Nielsen Data We obtained data on individuals’ viewing choices and characteristics from Nielsen Media Research, which maintains a sample of over 5,000 households nationwide.13 Nielsen installs a People Meter (NPM) for each television set in the household. The NPM uses a special remote-control to record arrivals and departures of individual viewers, as well as the channel being watched on each television set. Although criticized frequently by the networks, Nielsen data still provide the standard measure of ratings for both network executives and advertising agencies. Although the NPM is calibrated for measurements each minute, the data available to us provide quarter-hour viewing decisions, measured as the channel being watched at the midpoint of each quarter-hour block. The Nielsen data set records specific viewing choices for the four major networks only: ABC, CBS, NBC, and FOX. We focus on viewing choices for network television during prime time, 8:00 to 11:00 PM, using Nielsen data from the week starting Monday, November 6, 1995. Thus, we observe viewers’ choices in 60 time slots. Figure 1 provides the prime-time schedule for the four networks over this week. This study confines itself to East coast viewers, to avoid problems arising from ABC’s Monday night programming.14 Finally, we eliminate from the sample viewers who never watched television during weeknight prime time and those younger than six years of age. From this group, we randomly selected individuals with a probability of 0.5, giving us 1675 viewers. The other 1556 are kept for predictive validation. Nielsen also reports the age and the gender of each individual, and the income, education, cable subscription, and county size for each household. The definitions and summary statistics of the variables that we create appear in table 1. 2.1.1 Show Attributes We have coded the show attributes for the relevant week based on prior knowledge, publications about the shows, and viewing each one of them. Following previous studies, we categorize shows 13 Using 1990 Census data, the sample is designed to reflect the demographic composition of viewers nationwide. The sample is revised regularly, ensuring, in particular, that no single household remains in the sample for more than two years. 14 ABC features Monday Night Football, broadcast live across the country; depending on local starting and ending times of the football game, ABC affiliates across the country fill their Monday night schedule with a variety of other shows. Adjusting for these programming differences by region would unnecessarily complicate this study. 9 based on their genre and their cast demographics. Rust and Alpert (1984) present five show categories–action drama, psychological drama, comedies, movies, and sport–and show that viewers differ in their preferences over these categories. We use the following categories: situation comedies, also called sitcoms (31 shows fall into this category), action dramas (10 shows), and romantic dramas (7 shows). The base group includes news magazines and sports events, which Goettler and Shachar (2002) found to be similar. Shows also were characterized by their cast demographics. Shachar and Emerson (2000) demonstrate that the demographic match between an individual and a show’s cast plays an important role in determining viewing choices. For example, younger viewers tend to watch shows with a young cast; older viewers prefer an older cast. We use the following categories: Generation-X, if the main characters in a show are older than 18 and younger than 34 (21 shows fall into this category); Baby Boomer, if the main show characters are older than 35 and younger than 50 (12 shows); Family, if the show is centered around a family (11 shows); African-American (7 shows); Female (15 shows); and Male (22 shows). Table 2 and figure 2 illustrate the differences in show attributes across each of the four networks. One can see, for example, that FOX is more likely to air romantic dramas centered around Generation-X characters, whereas ABC is more likely to offer male-starring or family sitcoms catering to baby boomers. These statistics highlight the differences in the network profiles. 3 The Model We start by describing the setting of the model. Then, we introduce the utility function and the information set of the individual. Last, we present several implications of the model setup. 3.1 The setup There are J multiproduct firms. In each period t, each firm offers a single product. Each product of firm j is offered only once within the studied time frame. In the empirical example are 4 television networks, with each network broadcasting only one show within each time slot t. Thus, in the empirical example J = 4; a period, t, is also called a time-slot and lasts 15 minutes; and a product is a television show. There are I individuals who are indexed by i. They face J + 1 mutually exclusive and exhaustive alternatives. The (J + 1)’th alternative is the no-purchase option. In each period t, individual i makes a choice, Ci,t , from among these J + 1 options indexed by j. Thus, Ci,t = j 10 when individual i chooses alternative j at time t. Since each product is offered only once, individuals cannot learn from experience about product attributes within the studied time frame.15 Although the setting–of a firm offering a single product in any period–might seem restrictive, it can be reinterpreted as a multiproduct firm offering a portfolio of products at a point in time, with t indexing the product categories. 3.2 The utility The utility from the first J alternatives is: Ui,j,t = Xj,t β i + (ηj,t + εi,j,t ) + αi,j + δ i,j,t I{Ci,t−1 = j} (1) The first element of the utility represents the match between the product’s attributes, Xj,t , and the individual’s preferences, β i .16 The parameter vector β i is a function of observable and unobservable individual characteristics. The specifics of Xj,t and β i for the television example are presented in detail in section 4.1. For example, the interaction Xj,t β i includes the element ActionDramaj,t · (β Male AD · Malei ), where ActionDramaj,t is a binary variable equal to one if the show on network j in period t is an action drama, and zero otherwise; and the binary variable Malei is equal to one if individual i is male, and zero otherwise. If men like action dramas, the parameter β Male AD would be positive. The utility is also a function of the products’ attributes not observed by the researcher. These are represented by the second element of the utility–(η j,t + εi,j,t ). Common unobserved effects are captured by the parameter η j,t , while transitory and personal effects are represented by the random variable εi,j,t . Specifically, since some of the attributes are unobserved by the researcher, some of the match elements will be unobserved as well. The parameter ηj,t can be thought of as the mean (across individuals) of these unobserved matches and εi,j,t can be thought of as the deviations from that mean. Another interpretation of the parameter ηj,t is the quality of the product. Since the term “quality” might be misleading when describing television shows, we henceforth use the term “unexplained popularity” to describe this parameter.17 15 Consequently, our setting is different from previous studies which examined the role of uncertainty on product attributes (Eckstein, Horsky, and Raban 1988, Erdem and Keane 1996, Erdem 1998, and Crawford and Shum 2000). In these studies, each product was offered multiple times within the studied time frame, therefore the researchers focused on the dynamic learning of consumers through their experience with the product. Here, instead, consumers can rely on the profile of the multiproduct firm to resolve their uncertainty on product attributes. 16 The variable Xj,t is a K-dimensional row-vector; and the parameter β i is a K-dimensional vector. 17 In the industrial organization literature the element Xj,t β i is called “the horizontal dimension of utility”, and η j,t “the vertical dimension”. 11 The other elements in the utility concern its dynamic features. Specifically, the third element of the utility, αi,j , represents the unobserved match between individual i and firm j. This parameter is one of the sources of consumer loyalty to a multiproduct firm. It is the only element in the utility that does not change over time (and thus does not have an index t). It appears in individual i’s utility with each of the products offered by firm j. A positive αi,j increases individual i’s propensity to purchase each one of firm j’s products. Earlier, we referred to this unobserved heterogeneity parameter as a “black box” explanation for loyalty because it represents a statistical (not behavioral) solution to account for consumer loyalty to a firm. The fourth and last component of utility, δi,j,t I{Ci,t−1 = j}, represents the state dependence in choices. The indicator function I{·} is equal to one if the individual purchased the product offered by firm j in the previous period. The parameter δ i,j,t is a function of observable and unobservable individual characteristics, product attributes, and time. There are various reasons for state dependence. For example, after getting used to a firm’s product, an individual might find it easier or cheaper to use the subsequent product offered by the same firm. Previous studies of television viewing choices find strong evidence of state dependence even when unobserved heterogeneity is accounted for.18 Thus, we include this element in order to avoid misspecification of the utility. The specifics of δ i,j,t for the television example are presented in detail in section 4.1. For example, δ i,j,t includes the element δ F emale (1 − Malei ). If the tendency of women to flip channels were less than for men, the parameter δF emale would be positive. Both αi,j and δ i,j,t leads to consumer loyalty. However, while the effect of state dependence is limited to two sequential periods, the unobserved individual-firm match leads to loyalty in any two periods. The utility from the outside (non-purchase) alternative is: γ γ + (η J+1,t + αi,J+1 + εi,J+1,t ) + δ out,i I{Ci,t−1 = (J + 1)} Ui,J+1,t = Yi,t (2) γ is a vector of individual i’s observable characteristics, and the remaining elements are where Yi,t 18 In the television industry this well known phenomenon is called the “lead-in effect.” Darmon (1976) introduces the concept of channel loyalty and Horen (1980) estimates a lead-in effect, both using aggregate ratings models. Rust and Alpert (1984) use individual-level data to estimate an audience flow model, in which viewers are described as being in one of five states according to: whether the television was previously on or off; if it was on, whether it was tuned to the same channel as the current viewing option; and whether this option is the start or continuation of a show. Shachar and Emerson (2000) allow state dependence to vary across shows and across demographically defined viewer segments. Goettler and Shachar (2002) demonstrate that the cost of switching remains when unobserved heterogeneity is accounted for. 12 analogous to those defined earlier for the J alternatives.19 Thus, the outside utility is a function of the individual’s observed and unobserved characteristics and state dependence. 3.3 Information set The individual is uncertain about Xj,t and η j,t . The dynamic nature of the modern economy makes it difficult for consumers to stay informed about the attributes of all the alternatives. Today, product diversity is constantly increasing, and new products are introduced often.20 The television industry experiences these changes as well. Each fall, the networks introduce new shows and change the time at which many veteran shows are aired. This, obviously, makes it difficult for viewers to stay informed about the shows that appear on a specific time slot for all the networks. While some information on the attributes of television shows is available in daily newspapers, many other show attributes remain unclear. Uncertainty about Xj,t and ηj,t leads to uncertainty about (η j,t +Xj,t β i ). Since this expression represents the contribution of product attributes to utility, we term it “attribute utility”. We denote this element as ξ i,j,t ≡ ηj,t + Xj,t β i . A consumer’s information set includes (a) a prior distribution of products’ attributes, and (b) product-specific signals such as media coverage, word-of-mouth, and previous experience with the good. We depart from previous studies by hypothesizing that the prior beliefs depend on the profile of the multiproduct firm. Specifically, the prior distribution of individual i on ξ i,j,t is:21 ξ i,j,t ∼ N(µi,j , 1 ς µi,j ) (3) where, by definition,22 µi,j ≡ Et [ξ i,j,t ] = Et [η j,t ] + Et [Xj,t ]β i (4) and 1 = Vt [ξ i,j,t ] ς µi,j The multiproduct firm’s profile for individual i is characterized by two parameters: (1) µi,j 19 These characteristics might change over time for reasons that would become clear in Section 4. For example, even in a small country like Israel, there are 3000 new products in the supermarkets every year (Ha’aretz, November 1998). 21 We actually assume that both η j,t and Xj,t follow a normal distribution, and present the resulting distribution for the “attributes utility”. 22 Et [·] is the expected value across time slots. 20 13 and (2) ς µi,j . We assume that Et [ηj,t ], Et [Xj,t ] and Vt [ξ i,j,t ] are known to all consumers. In other words, while the individual is uncertain about product attributes, she knows the profile of each firm. For example, while most consumers do not know exactly the attributes of each automobile offered by Honda, they know that Honda tends to produce gasoline-efficient cars.23 Consumers also get product-specific signals through word-of-mouth, media coverage, previous experience with the good, and advertising. Since we do not observe these signals, we can model them as a single signal instead of multiple signals without loss of generality. Specifically, the signal that individual i receives on the product offered by firm j at period t is: Si,j,t = ξ i,j,t + ωi,j,t (5) ω i,j,t ∼ N(0, ς ω1 ) (6) where i,j The unbiased signal is assumed to be noisy, because none of these sources of information is precise. For example, even viewing previous episodes of a show does not provide exact information on the current one, because the focus of the show varies from week to week. While in one episode the plot is centered on romantic issues, the next one might be focused on career matters. We allow the precision of the signal, ς ωi,j , to differ across individuals, i, and firms, j. Differences in ς ωi,j across the firms may result from differences in the number of signals individuals receive, since the products of some firms simply may be more well-known than of others. Intuitively, if 1 ςω i,j = 0 (i.e., the signal is not noisy), then exposure to a signal resolves all uncertainty about product attributes. If ς ωi,j > 0, by contrast, an individual who is exposed to such a signal would still not be sure about what the product attributes are.24 23 Similarly, consider the attributes of newspapers (these may include, for example, the amount of space devoted to sensational news items or “trash”, and the amount of investigative reporting). While the level of each attribute may differ with each daily edition of the newspaper, it is still the case that The New York Times has an image that is quite different from The New York Post. In this example, a newspaper is the multiproduct firm and each issue of the newspaper is a product. Similarly, consider movie stars (where the movie star is the analog of the multiproduct firm, and each movie is a product). Although most movie stars have acted in movies of different genres, it is still the case that most movies in which Bill Murray participates are quite different from Sylvester Stallone’s, resulting in distinct images of actors. 24 For example, an individual may know that E.R. is a popular drama set in the emergency room of a hospital, but may know nothing else about the show (e.g., the cast demographics, the show’s time slot, the degree of action or romantic content, etc.). 14 3.4 Expected utility Having described the setup of the model, the utility function, and the consumer’s information set, we are ready to solve the expected utility. Since the only element in the utility that the individual is uncertain about is her “attribute utility”, ξ i,j,t , we start by solving the expected attribute utility. Individual i updates her prior using the signal to form a posterior distribution of the attributeutility, ξ pi,j,t ∼ N(µpi,j,t , ς p1 ). The mean, µpi,j,t , and precision, ς pi,j , of this posterior distribution are i,j given by:25 µpi,j,t = 1 ς pi,j h i 0 ς µi,j µi,j + ς ωi,j Si,j,t ς pi,j = ς µi,j + ς ωi,j (7) (8) 0 where Si,j,t is the realization of the signal. The precision of the posterior, ς pi,j , is the sum of the precision of each source of information. More importantly, the expected attribute-utility, µpi,j,t , is 0 , and the mean of the firm’s a weighted combination of the product-specific signal realization, Si,j,t profile for individual i, µi,j . The weight on each element is a positive function of its precision. For example, the weight on µi,j , which we denote by θi,j , is ςµ i,j ω ςµ i,j +ς i,j θi,j ≡ (9) The more precise is the product-specific signal, the less important is a firm’s profile in determining the expected attribute-utility from the product. 0 = ξ i,j,t + ω0i,j,t , where ω 0i,j,t is the realization of ω i,j,t , we can rewrite equation (7) Since Si,j,t as: µpi,j,t = where £ ¤ θi,j µi,j + (1 − θ i,j ) ξ i,j,t + ω 0i,j,t ω i,j,t ≡ · ςω i,j µ ς i,j +ς ω i,j ¸ (10) ω̃ i,j,t The researcher does not observe ω 0i,j,t . It is distributed normally with mean 0 and variance σ 2ω0 ,i,j ≡ ςω i,j µ 2 (ς i,j +ς ω i,j ) 25 . See DeGroot [1989]. 15 Both θi,j and σ2ω0 ,i,j can be considered as measures of how ill-informed the individual is. Each of these parameters is a negative function of the precision of the signal, ς ωi,j . For example, whenever the signal is noisy ( ς ω1 > 0), we get that θi,j > 0, and thus, the individual relies on the firm’s i,j profile when forming her expected attribute-utility. In contrast, whenever the signal is not noisy ( ς ω1 = 0), it follows that θi,j = σ2ω0 ,i,j = 0 and thus, µpi,j,t = ξ i,j,t . In other words, when the signal is i,j not noisy, the individual is fully informed, and thus, her expected attribute utility is equal to her actual attribute utility. Thus, the full information model is nested within our model. Using (4), we can rewrite the individual’s expected attribute-utility as: ­ ® µpi,j,t = θi,j Et [ηj,t ] + (1 − θi,j ) ηj,t + hθ i,j Et [Xj,t ] + (1 − θi,j ) Xj,t i β i + ω 0i,j,t (11) and her expected utility as: ® ­ E[Ui,j,t ] = θi,j Et [ηj,t ] + (1 − θi,j ) ηi,j + hθi,j Et [Xj,t ] + (1 − θ i,j ) Xj,t i β i (12) + ω 0i,j,t + εi,j,t + αi,j + δ i,j,t I{Ci,t−1 = j} In the next subsection, we derive the implications of this model. In order to assess their novelty, we compare them to the implications of a model that differs from the suggested model in one aspect–a consumer’s information set is not a function of multiproduct firms’ profiles. In other 1 ), where µ0i,j and ς µ,0 words, the prior distribution is ξ i,j,t ∼ N(µ0i,j , ς µ,0 i,j are not a function of i,j the firm’s profile. It is easy to show that in such a case, the expected utility is: E[Ui,j,t ] = ηj,t + Xj,t β i + α1i,j + εi,j,t + ω 1i,j,t + δ i,j,t I{Ci,t−1 = j} where α1i,j 3.5 = αi,j + µ0i,j and ω 1i,j,t = " ς ωi,j ω ς µ,0 i,j + ς i,j # (13) ω i,j,t Implications We describe below the implications for an individual’s purchase probability, which is directly related to her expected utility. The first implication is that the purchase probability of a product is a function of the multiproduct firm’s profile. Since the profile is a function of the set of products offered by the firm, 16 this means that the attributes of any product offered by a firm will affect the demand for any other product produced by this firm.26 This first implication highlights the spillover effects within a multiproduct firm. The magnitude of the spillover effects is determined by θi,j (see equation 10). Notice that the spillover effect varies across individuals and firms. Specifically, the effect is a negative function of the precision of the product-specific signals, ς ωi,j , and a positive function of the precision of the firm, ς µi,j . In other words, the spillover effects are large for firms that offer a homogenous line of products and small for (a) individuals who received many signals and (b) products that are well known. The second implication is that the inclusion of the multiproduct firm profile in the information set leads to consumer loyalty. The expected utility (equation 12) includes the element θi,j Et [Xj,t ]β i that does not vary over time (since Et [Xj,t ] is the same in every period). It appears in individual i’s utility for each product offered by firm j. Thus, a positive match between the firms’ image and the consumers’ taste (Et [Xj,t ]β i > 0) increases her propensity to purchase each one of this firm’s products. We term this source of loyalty “informational attachment”. This loyalty expresses itself in an individual’s tendency to purchase a product from the firm whose image best fits her taste even when the specific product does not match her preferences better than the products of the competing firms. (One might also term this phenomenon “excess loyalty”27 ). Notice that the benchmark model seems to include a similar attachment element, αi,j + µ0i,j . However, while in the benchmark model the α and µ cannot be distinguished empirically, one can separate these two effects using the approach presented here (as illustrated in section 4.4). Thus, while the benchmark model leaves the explanation of loyalty in a “black box,” we present a behavioral foundation for such loyalty. Our explanation, based on the informational role of multiproduct firms, contrasts with the traditional statistical solution (unobserved heterogeneity) to explain excess loyalty. 4 Estimation and identification issues We start by specifying the exact functional forms used in the estimation. Then we construct the likelihood function and, finally, we discuss the sources of identification of the model’s parameters. 26 Thus, when a firm replaces some of its products with new ones that differ in their attributes from the previous ones, it is altering its image and, therefore, affecting the demand to each one of its existing products. As an illustration, Julia Roberts’s image as an actress in “light” movies may have changed after Mary Reilly. 27 Recall that the model includes additional sources of loyalty: (a) the unobserved individual-firm match, αi,j , and (b) the state dependence parameters, δ i,j,t . 17 4.1 4.1.1 Functional form and structure Utilities The following equations present each of the elements in the utility for the television example and specify its full structure. Attribute utility We formulate the attribute utility as: ηj,t + Xj,t β i = η j,t + β Gender I{the gender of viewer iand show j, t is the same} +β Age0 I{the age group of viewer i and show j, t is the same} +β Age1 I{the distance between the age group of viewer i and the cast of show j, t is one} +β Age2 I{the distance between the age group of viewer i and the cast of show j, t is two} +β F amily I{ilives with her family and show j, t is about family matters} +β Race Incomei I{one of the main characters in show j, t is African American} +Sitcomj,t (β Sitcom Yiβ + viSitcom ) +ActionDramaj,t (β AD Yiβ + viAD ) +RomanticDramaj,t (β RD Yiβ + viRD ) where Yiβ is a vector of individuals’ observable characteristics that include T eensi , GenerationXi , BabyBoomeri , Olderi , F emalei , Incomei , Educationi , F amilyi , and Urbani . These variables are defined in the appendix. The parameters viSitcom , viAD , and viRD capture unobserved individual tastes for sitcoms, action dramas, and romantic dramas, respectively. The first six β parameters capture the effect of cast demographics on choices. As mentioned above, previous studies demonstrated that viewers have a higher utility from shows whose cast demographics are similar to their own. Thus, we expect to find that: (1) β Age0 > β Age1 > β Age2 , (2) β Gender > 0, (3) β F amily > 0 and (4) β Race < 0. We use individuals’ income as a proxy for her race, since information on race is not included in our data set.28 The taste for different show categories is a function of observed (Yiβ ) and unobserved (viSitcom , viAD , viRD ) individual 28 The proportion of African-Americans in the highest income category is disproportionately low, while it is disproportionately high in the lowest income category. This relationship persists for all income categories in between as well (U.S. Census Bureau 1995). Nielsen designed the sample to reflect the demographic composition of viewers nationwide and used 1990 Census data to achieve the desired result. We found that the income categories and the proportion of African-Americans in the Nielsen data closely match those in the U.S. population (National Reference Supplement 1995). Although our data set does not include information about race, Nielsen has it and reports its aggregate levels. 18 characteristics. Each of these interactions between show category and individual characteristics is captured through a unique parameter. For example, the interaction between an Action Drama emale . All other parameters are denoted accordingly. show and a female viewer is captured via β FAD While we can, theoretically, estimate an η j,t for each time slot-alternative combination (subject to one normalization), we choose to fix the “unexplained popularity” parameter (η) for the duration of each show. Consequently, a half-hour show and a two hour movie both have one η parameter. Given our intent in uncovering fundamental attributes of the shows, this is a natural restriction. State dependence parameters The utility presented in equation (1) included a state dependence element, δ i,j,t I{Ci,t−1 = j}. Here we specify the structure of δ i,j,t and extend the state dependence to include another element. Specifically, we formulate the state dependence in the network utility as: δ i,j,t I{Ci,t−1 = j} + δ InP rogress I{Ci,t−1 6= j}I{The show on j started at least 15 minutes ago} where δ δ X + vδ Yiδ δ Y + Xj,t i +δ F irst15 I{The show on j started within the past 15 minutes} δ i,j,t = +δ Last15 I{The show on j is at least one hour long and will end within 15 minutes} +δ Continuation I{The show on j started at least 15 minutes ago} where the observed variables included in Yiδ are T eensi , GenerationXi , BabyBoomeri , Olderi , δ inF emalei , F amilyi , and cable subscription status (Basici and P remiumi ), and the vector Xj,t cludes the following show categories: Sitcomj,t , ActionDramaj,t , RomanticDramaj,t , NewsMagazinej,t and Sportj,t . The binary variable Basici is equal to one for the one third of the population that has access to basic cable offerings only, and the binary variable P remiumi is equal to one for the one third of the population that has both basic and premium cable offerings. The binary variable Sportj,t is equal to one for the sport shows (Monday Night Football on ABC and Ice Wars on CBS), and the binary variable NewsMagazinej,t is equal to one for news magazines (e.g., 48 Hours on CBS). We allow the state dependence parameters to vary across age groups, gender, and family status because previous studies (Bellamy and Walker 1996) find preliminary evidence suggesting differences in the use of the remote control across these groups. We also allow these parameters to 19 differ across individuals for unobserved reasons through viδ . These parameters can differ across show types as well. For example, we may expect δ to be smaller for sports shows, since there is no clear plot in these shows when compared with dramas. Finally, we allow the state dependence parameters to depend on time. Specifically, we expect δ to be low in the first 15 minutes of a show, since the viewers have not had enough time to “get hooked” by the show. For the same reason, we expect the state dependence to be high during the last 15 minutes of a show. Furthermore, we expect that the state dependence is higher during a show than between shows (δ Continuation > 0). Last, δ InP rogress applies to individuals who were not watching network j in the previous time slot. Since the tendency to tune into a network to watch a show that has already been running for at least 15 minutes should be lower than for a show which has been on the air for less than 15 minutes, δInP rogress is expected to be negative. We formulate the state dependence parameters in the outside utility as: δ out,i I{Ci,t−1 = (J + 1)} = Yiδ δ Y + viδ +δout + δ Hour I{The time is either 9:00 PM or 10:00 PM} +δF OX10:00 I{Ci,t−1 = F OX}I{The time is 10:00 PM} I{Ci,t−1 = (J + 1)} Individual characteristics (Yiδ δ Y + viδ ) are included in exactly the same way for the outside alternative because they are meant to represent behavioral attributes intrinsic to individuals. Since the outside alternative includes the option to watch non-network shows, we allow the state dependence parameters to change “on the hour”. Notice that many non-network shows end on the hour, and thus we expect the δ to be lower at that time (δ Hour < 0). Furthermore, since FOX ends its national broadcasting at 10:00 PM our data cannot distinguish between viewers who stayed with FOX thereafter and those who chose the outside alternative. Thus, we expect δ F OX10:00 to be positive. Outside alternative Other than the standard observable individual characteristics (age, gender, income, education, family status, and area of residence), the vector Yiγ includes the variables Basici , P remiumi , Alli , and Samei,t . The cable subscription status is included here since the outside alternative includes the option of watching non-network shows–viewers with basic or premium cable have a larger variety of choices, which can lead to a higher utility. The variable Alli is equal to the average time the individual watched television during the previous days of the 20 week, and Samei,t is equal to the average time she watched television in the corresponding time slot, t, during the previous days of the week. Individuals’ tendencies to watch television cannot be fully explained by their demographic characteristics. Thus, their prior viewing habits (Alli , and Samei,t ) and the personal unobserved parameters αi,J+1 are designed to capture other sources of such differences. We allow αi,J+1 to differ across different hours of the night. Furthermore, since our data set starts on Monday, the variables Alli , and Samei,t , based on viewing habits in the previous days of the week, have missing values for this day. We include specific parameters to account for this: γ Monday8:00 is added to the outside utility for the first hour on Monday night prime time, and γ Monday9:00 and γ Monday10:00 are defined analogously. Finally, although we can, in principle, estimate η J+1,t for each of the 60 time slots of the week, we impose the following restriction: η J+1,t = η J+1,t+12 = η J+1,t+24 = ηJ+1,t+36 = ηJ+1,t+48 for t = 1, ..., 12. This implies that the outside utility between 8:00 and 8:15, for example, is the same across all the nights of the week. This allows us to identify the expected increase in the outside utility during the night, but with 12 parameters instead of 60. 4.1.2 The profile of a multiproduct firm The prior distribution was defined in (4) by two parameters µi,j ≡ Et [ξ i,j,t ] and 1 ςµ i,j = Vt [ξ i,j,t ]. T T P P (η̂j,t + Xj,t · β̂ i ) and ς̂ µi,j = These parameters are estimated as follows: µ̂i,j = T1 ξ̂ i,j,t = T1 t=1 t=1 µ i2 ¶−1 T h P 1 . In other words, we use the empirical moments of ξ̂ i,j,t to estimate the ξ̂ i,j,t − µ̂i,j T −1 t=1 expectation and variance of ξ i,j,t . It is known in the television industry that programming between 10:00 and 11:00 PM is somewhat different than that between 8:00 and 10:00 PM. For example, the networks do not schedule any sitcoms after 10:00 PM. Since this strategy is well-known, viewers are likely to have different prior beliefs about the scheduling for these two parts of the night. Therefore, we set two prior distributions for each network. Each one of these priors depends only on the shows scheduled during the relevant part of the night.29 29 That is, for example, for shows aired between 8:00 to 10:00 PM, µ̂8−10 = i,j all the time slots between 8:00 and 10:00 PM during the week. 21 1 40 P t²t8−10 ξ̂ i,j,t , where t8−10 is the set of 4.1.3 Density functions We assume that the εi,j,t are drawn from independent and identical Weibull (i.e., independent type I extreme value) distributions. As McFadden (1973) illustrates, under these conditions the viewing choice probability is multinomial logit. Define the vector vi as (αi,ABC , αi,CBS , αi,NBC , αi,F OX , αi,OU T,8:00 , αi,OU T,9:00 , αi,OU T,10:00 , viSitcom , m m m viAD , viRD , viδ , ς m i,ABC , ς i,CBS , ς i,NBC , ς i,F OX ). That is, vi represents all the unobserved personal parameters of the model. Define the joint density function of this vector by φ(v). Following Kamakura and Russell (1989), this function is assumed to be discrete. Specifically, vi = vk with probability PKexp(λk ) k=1 exp(λk ) for all k. This means that we allow the population to be divided into K different unobserved segments. The number of types K is determined using various information criteria. 4.1.4 Normalizations Since we have five alternatives in each time slot, we can only identify four αs. Thus, we normalize αk,ABC = 0 for each type k. Since we include all the age categories in Yiβ , one needs to normalize the viSitcom , viAD , viRD for one of the unobserved types, therefore we set vkSitcom =vkAD = vkRD = 0 for k = 1. Also, since all the age categories are included in Yiγ , one needs to normalize the αi,J+1 for one of the types; we do so by setting αk,J+1,8:00 =αk,J+1,9:00 =αk,J+1,10:00 = 0 for k = 1. Similarly, since all the age categories are included in Yiδ , one needs to normalize the viδ for one of the types; to do so, we set vkδ = 0 for k = 1. To normalize one of the ηJ+1,t , we set η J+1,8:00 = 0. Finally, since we include all the show δ , we need to normalize one of them, therefore we set δ categories in Xj,t NewsMagazine = 0. Since all the age categories are included in Yiγ , one needs to normalize the η j,t of one of the shows, therefore we set η of FOX’s X Files to be equal to zero. Furthermore, since we estimate the ηj,t for each show, we are required to normalize one of the β Age parameters, and the β Sitcom , β AD eens eens eens and β RD for one age group. We do so by setting β Age2 = β TSitcom = β TAD = β TRD = 0. Finally, we cannot estimate all K values of λk (which determine the size of the types). Therefore, we set λk = 0 for k = 1. 4.2 The likelihood function For the econometrician, the viewing choice is probabilistic, since we do not observe εi,j,t . Furthermore, since the εi,j,t are independent over time, the conditional probability of each viewer’s 22 history of choices for the entire week, Ci = (Ci1 , . . . , CiT ), is simply the product of the conditional probabilities of each of the choices made at each quarter-hour. That is, for a type k person, f1 (Ci |Wi ; vk , ω 0i , Ω) = J P [I{Ci,t = j} exp(U i,j,t (Ci,t−1 , Wi,j,t , vk , ω 0i,j,t , Ω))] T Y j=1 t=1 J P j=1 (14) exp(U i,j,t (Ci,t−1 , Wi,j,t , vk , ω 0i,j,t , Ω)) where Wi,j,t is a vector of all the observable variables in the model (that is, Xj,t and Yi ); Wi = S j,t Wi,j,t ; Ω is the vector of the parameters common to all individuals (that is η j,t , β Gender , β Age0 , β Age1 , β Age2 , β F amily , β Race , β Sitcom , β AD , β RD , δ Y , δ X , δ F irst15 , δ Last15 , δ Continuation , δ InP rogress , S δ out , δ Hour , δ F OX10:00 , and γ); ω0i = j,t ω 0i,j,t and U i,j,t (·) ≡ Ui,j,t (·) − εi,j,t . Specifically, U i,j,t (·) = 1 ς pi,j h i ς µi,j µi,j + ς ωi,j ξ i,j,t + ω 0i,j,t + αi,j + δ i,j,t I{Ci,t−1 = j} (15) + I{Ci,t−1 6= j}I{The show on j started at least 15 minutes ago}δ InP rogress Since we do not observe each viewer’s type and signals, we integrate out unobservable parameters and variables. The resulting marginal distribution is 0 f2 (Ci |Wi ; Ω , Pω ) = Z X K exp(λk ) pω (z)dz f1 (Ci |Wi ; vk , z, Ω) PK k=1 exp(λk ) k=1 (16) where Ω0 includes Ω, the vk s, and the λs; and pω (z) is the density function of the true distribution of the ω 0i,j,t , Pω . Notice that for individual i the variable ω 0i,j,t is constant for the duration of each show. Thus, the integral over z represents 64 integrals for the 64 shows in the week for each individual. Finally, notice that since the ω 0i,j,t are show specific, each one of them appears in at most 8 time slots. Thus, none of the integrals should include the entire history of viewing choices. For example, on Wednesday between 10:00 and 11:00 PM, each of the three major networks airs a one-hour show. Thus, the integrals of the ω 0i,j,t of these shows apply only to the choice probabilities of that hour. We rewrite the likelihood following this simplification in the most compact way. 23 4.3 Simulating the marginal probability Since ω 0i,j,t is normally distributed, the integral in equation (16) does not have a closed-form solution. A consistent and differentiable simulation estimator of f2 (·) is R K 1 XX exp(λk ) fˆ2 (Ci |Wi ; Ω0 , PωR ) = f1 (Ci |Wi ; vk , zr , Ω) PK , R r=1 k=1 exp(λk ) (17) k=1 where the z’s are randomly drawn from the population density, PωR . The Maximum Simulated Likelihood (MSL) estimator is Ω̂MSL = argmax I X i=1 h i log fˆ2 (Ci |Wi,j,t ; Ω0 , PωR ) , (18) where I denotes the number of individuals. As explained in McFadden (1989) and Pakes and Pollard (1989), the R variates for each individual’s ω must be independent and remain constant throughout the estimation procedure. A drawback of using MSL is the bias of Ω̂MSL due to the logarithmic transformation of f2 (·). Despite this bias, the estimator obtained by MSL is consistent if R → ∞ as I → ∞, as detailed in Proposition 3 of Hajivassiliou and Ruud (1994). To attain negligible inconsistency, Hajivassiliou (1997) suggests increasing R until the expectation of the score function is zero at Ω̂MSL .30 In our case, this is achieved at R = 400. In order to reduce the variance of fˆ2 (·), we employ importance sampling as described in the MC literature (see Rubinstein 1981). Our importance sampler is similar to the one used in Berry, Levinsohn, and Pakes (1994). We draw the z’s from a multivariate normal approximation of each person’s posterior distribution of z, given some preliminary MSL estimate of Ω, and appropriately weight the conditional probabilities to account for the oversampling from regions of z that lead to higher probabilities of i’s actual choices. 4.4 Identification We start by considering the identification of a model with full information (that is, under the assumption that 1 ςω i,j = 0 for all i and j). This discussion illustrates which parameters can be identified without the restrictions and additional variables of the partial information model. 30 We simulate all stochastic components of the model to construct an empirical distribution of the score function at Ω̂MSL . A quadratic form of this score function is asymptotically distributed χ2 with degrees of freedom equal to the number of parameters estimated. 24 4.4.1 Identification of a model with full information The parameters β Gender , β Age0 , β Age1 , β Age2 , β F amily , β Race , β Sitcom , β AD , β RD are identified by the correlation between Xj,t · Yiβ and viewer choices. For example, if (conditional on all other elements of the model) the proportion of females, among viewers who watch romantic dramas, is emale higher than the proportion of males, then the estimate of females’ taste for such shows, β FRD , would be higher than males’, β Male RD . The unobserved tastes for show categories (the parameters viSitcom , viAD , viRD ) are identified by viewer choice histories over show types. To clarify, assume there are no observed individual characteristics, Yi . Now, consider the case when, on a particular evening, a network switches from a sitcom to a drama (as CBS does at 9:00 PM on Wednesday). Then, if (conditional on all other elements of the model) a large proportion of viewers that were watching CBS were found to switch to a different network that airs sitcoms (indeed, 32 percent of the CBS viewers switched to the ABC sitcom), then we identify an unobserved segment who likes sitcoms. The δ parameters are identified by the conditional state dependence–that is, by the share of viewers who remain with an alternative over two sequential time slots, conditioning on Xj,t . For example, we find that female viewers like to watch romantic shows. If one of the networks switched from a romantic show to an action drama (e.g., CBS on Wednesday at 10:00 PM)–which female viewers dislike–and many female viewers did not switch to the network that aired romantic shows, then that would be an indication of a positive δ. The parameter αi,j is identified by the history of viewer choices of networks. That is, if a segment of viewers spent most of its time watching the shows on network j independent of the characteristics of these shows, then we identify an unobserved segment whose αi,j with this network is positive.31 Finally, the parameter η j,t is identified by the aggregate ratings, conditional on the show’s characteristics. 4.4.2 Identification of a model with partial information The partial information model imposes some restrictions on the parameters and introduces new explanatory variables. These identify the information set parameters. We start by discussing the 31 As discussed in the literature, there are various sources of identifying δ separate from α. See Chamberlain (1985) and Shachar (1994). The outside alternative provides us with an additional identifying source. When turning on the television, the individual’s “state” (lagged choices) does not attach her to any network. Thus, her viewing choice is influenced by α (and show characteristics), but not by δ. 25 estimation of the prior distribution parameters and proceed (in the next subsection) by presenting the identification of the signals’ parameters. Recall that the estimation of the prior distribution parameters was already discussed in subsection 4.1.2. Once the parameters β and η are identified, we also have all the variables which are a function of them, i.e., ξ i,j,t , µi,j and ς µi,j . Since one of the advantages of our model is the empirical distinction between αi,j and µi,j , it T P is worth clarifying the identifying source of this distinction. In the model we set µ̂i,j = T1 ξ̂ i,j,t = t=1 ¶ µ T T 1 P 1 P η̂ + Xj,t β̂ i ). Thus, the identification of µi,j is based on the introduction of a new j,t T T t=1 t=1 P explanatory variable–the mean offering of each network (for example, T1 t Xj,t for network j). The following exercise might shed some additional light on the distinction between the two sources of loyalty, αi,j and µi,j . Assume that we estimate the benchmark model (that does not include the multiproduct firm profile in the information set). Since we have a long panel for each viewer, we can estimate an individual—firm match parameter for each combination of individual and firm. We denote this “fixed effect” as Li,j and note that it has 6700 different values (i.e., JI). After we estimate Li,j using the benchmark model, we can run the following regression: P Li,j = aj +b( T1 t Xj,t )+εL i,j . We estimate the parameters aj and b and then calculate the predicted value of the error term, ε̂L i,j . We can now separate between the observed individual—network match, 1 P b̂( T t Xj,t ), and the unobserved match, ε̂L i,j . In the estimation of our model (versus the benchmark model) this separation between the observed match, µi,j , and the unobserved match, αi,j , is built into the likelihood function, because the model includes the multiproduct firms’ profiles in the information set. In other words, µi,j is identified by the loyalty of viewer i to network j correlated with the attributes of i and j, whereas αi,j is identified by non-systematic (random) loyalty. Precision of information signals We are now ready to discuss the identification of the precision of the product-specific signals, ζ ωi,j . Notice that ζ ωi,j enters the likelihood via U i,j,t in (15) in two places. First, it determines the relative weight on ξ i,j,t versus µi,j . Second, it determines the variance of ω 0i,j,t (via pω (•) in equation (16)). It follows that ζ ωi,j has two sources of identification. The first is the effect of firms’ profiles on product choices. If the individual were fully informed, she would base her product choices on show attributes only, and place zero weight on the network images. When the individual places zero weight on a network’s image, θi,j = 0 and from equation (9) we get that 1 ζω i,j = 0, that is, the signal is not noisy. Conversely, the stronger the effect of the networks’ image on product choices, 26 the smaller the estimate of ζ ωi,j . Notice that this source of identification relies on observed product and firm attributes. The second source of identification relies on the variance of ω 0 . In the model, the larger the variance of ω 0i,j,t , the smaller the correlation between product attributes and choices.32 Since the variance ω 0i,j,t is a function of ζ ωi,j , the correlation between choices and product attributes assists us in identifying the precision of the product-specific signal. 5 Results Our model has various implications (for example, relating to the variance of the unobservables) which assist in its empirical identification. Before estimating the structural model that incorporates all these implications, we first examine a simpler version of the model that embodies a key implication–specifically, that product purchase probabilities depend not only on product attributes but also on the profile of firms. Recall that we have offered two measures of how ill-informed the individual is: θ and the variance of ω0 . In the non-structural estimation, we show only that θ is positive and statistically different from zero.33 5.1 Non-structural estimation We estimate a simple version of the model described earlier in order to focus on the role of firm profiles on product choices. This model differs from the one presented earlier in the following ways: (1) the structure of the δ and γ parameters is simpler, and there are no show-specific η 34 (2) ω 0i,j,t = 0, and (3) instead of imposing any structure on θi,j as in (9), we estimate it directly as a free parameter θ. Thus, the mean posterior is: µpi,j,t = ¶ µ T P θ[ T1 Xj,t ] + (1 − θ)Xj,t β (19) t=1 As demonstrated in section 3, the full information model predicts that θ = 0, whereas the partial information model implies that θ 6= 0. 32 An easy way to think about this effect is by analogy with a simple regression model. Consider the case where Yi = α + βXi + εi . We know that the larger the variance of εi , the smaller the observed correlation between Yi and Xi . ω 0i,j,t plays a role in our model similar to the role of εi in this simple regression. 33 (1) We estimate our model using Gauss 4.0 on a Pentium Pro processor. (2) The number of unobserved segments K–determined by minimizing the Bayes Information Criterion (BIC)–in both the structural and non-structural model is 5. (3) All estimated parameters which are not reported here are posted on: http://www.tau.ac.il/~rroonn/Vitae.html 34 Specifically, we estimate only three δ parameters (δ 0 , δ Continuation , and δOutside ) and we do not estimate a separate γ for every 15 minutes (η J+1,t = η for every t). 27 Our estimate of θ is 0.56 and is significantly different from zero (at the 0.01 significance level), easily rejecting the hypothesis that individuals are fully informed. This result implies that individuals use the image of the networks to resolve some of their uncertainty about product attributes. In fact, the coefficient estimates suggest that the effect of the firm attributes is larger than the effect of show attributes on utilities. The following simple example illustrates the magnitude of the impact of network profiles on choices. For individuals who most prefer sitcoms in our sample, the probability of watching the ABC sitcom Grace Under Fire is 0.11. If ABC were to replace all its action dramas with sitcoms, that individual’s probability of watching this same show would instead have been 0.16.35 Notice that the parameter θ is identified directly from the correlation between viewing choices and the average firm µi,j . It therefore serves as a simple test that can be conducted by any researcher to assess the role of uncertainty and the importance of firm profiles on product choices. The structural estimation that follows identifies the parameters of the information set by exploiting all the implications of the model rather than a single one. That is, it ensures that additional restrictions imposed by the theory are satisfied (for example, θ depends on the variance of the underlying show attributes). Furthermore, the estimates of the utility parameters and information set allows us to conduct various counterfactual simulations as part of several applications. 5.2 Structural estimation In this section, we first present the estimates of the parameters of the utility. Following that, we present the estimates of the parameters of the information set. The integral in equation (17) is evaluated numerically using importance sampling with 400 points from a Sobol sequence, as detailed in section 4.3. The (asymptotic) standard errors are derived from the inverse of the simulated information matrix.36 We report the results for a model with 5 segments (K = 5) in tables 3(a)-3(f). The number of unobserved segments was determined by minimizing the Bayes Information Criterion (BIC). The largest segment consists of about 41% of the population, while the proportion of the smallest segment is about 13%. The sizes of the other segments are 0.18, 0.14, and 0.14. 35 Specifically, the viewing probabilities are those of an individual in the largest segment (that with the highest β Sitcom ) for the first timeslot of Grace Under Fire, conditional on her actual choices in the previous time slot. The µi,j of ABC is computed as the average of individuals’ µi,j for ABC (i.e., j = 1), for individuals of this segment. This average µi,j is 0.33 for ABC using its actual programming lineup, and 1.31 if it replaced all its action dramas with sitcoms. 36 The reported standard errors, therefore, neglect any additional variance due to simulation error in the numerical integration. 28 5.2.1 Show attributes (βs and ηs) A meaningful way to illustrate the effects of all the parameters is in terms of probabilities. To do so, we define a baseline viewer–she is 30 years old, lives alone, in an urban area, and has median income and education level. Furthermore, she is part of the largest unobserved segment. Viewers prefer shows whose cast demographic is similar to their own (see table 3(a)). The age of the cast has the largest effect. For example, the probability of the baseline viewer to watch a show whose cast demographic (Generation-X) matches hers is more than twice as large as that of a viewer who is 55 years old but similar in other respects.37 Viewers differ in both observed and unobserved ways in their taste for particular show types. For example, Generation-X viewers like psychological dramas more than action dramas and sitcoms. Coupled with the results above, this might explain why the networks choose a Generation-X cast for such shows.38 After controlling for differences in choices based on observed characteristics of individuals and shows, there are large differences in the “unexplained popularity” of shows ( see table 3(b)). The show with the highest “unexplained popularity” is CBS drama Chicago Hope; the one with the lowest ηj,t is ABC’s drama The Marshal. 5.2.2 State dependence parameters (δ) Our results are consistent with previous findings that document state dependence in television viewing. We find that state dependence is the most important source for observed network loyalty within a night. For example, our model predicts that the probability of watching a sitcom conditional on the viewer having watched that sitcom in the previous time slot is 90%. The state dependence parameters (δs) are presented in Table 3(c). The predicted persistence for watching sitcoms is based on the average δ across viewer types (δ = 1.94) and the additional persistence specific to sitcoms (δ sitcom = 0.8556).39 The analogous conditional probability for watching an action drama is 89%, for a romantic drama is 90%; whereas for sports shows it is 82%, and for 37 The probabilities are 0.083 and 0.040 respectively, and are calculated from 100 simulations of choice sequences, averaged over all timeslots of Generation-X shows. We also find that viewers living with their family like to watch shows about families; viewers prefer to watch shows with a cast of the same gender as their own, and low income people are more likely to watch shows whose cast is African-American. 38 Older viewers and “baby boomers” prefer action dramas to psychological dramas or sitcoms. Women prefer psychological dramas over the other show categories. This finding is consistent with common beliefs about the viewing habits of women. People living in urban areas tend to like action dramas less than those living in rural areas. Finally, we find that psychological dramas tend to be preferred by low-income and less-educated viewers. 39 Specifically, the persistence rate is calculated using viewer probabilities of watching the current timeslot of a sitcom, conditional on viewing the previous timeslot of the same show. We use 100 simulations of choice sequences and posterior segment probabilities for individuals. 29 news-magazines 81%. State dependence appears to be higher for shows where there is a plot line of some kind that can hook viewers, as is the case with sitcoms and romantic dramas. State dependence varies across viewer types. The conditional probabilities above (of watching a sitcom on any network, having watched it in the previous time slot) range from 73% to 93% across viewer types. Similarly, for viewers with access to cable channels, hence more program offerings on TV, state dependence is lower (the conditional probability above for watching a sitcom on a network channel reduces to 88% for viewers with cable).State dependence is, as is commonly thought, higher for female viewers compared to males, although, somewhat surprisingly, there is not much difference across age groups. Additional δ parameters help explain when viewers get hooked on a show. We find that state dependence is highest in the last fifteen minutes of a drama and lowest in the first fifteen minutes. Since many non-network shows end on the hour, one should expect to find weaker state dependence for the outside alternative at that time. Indeed, we find that viewers are most likely to tune in to network TV at the beginning of each hour (δ hour = −0.32). And, since FOX programming switches to affiliates at 10 PM (therefore FOX viewers included in outside option after that), we treat FOX viewers at 9:45 separately (δF OX10:00 = 0.69). Finally, we find that, as expected, switching into the middle of a show without watching its beginning is difficult (δ InP rogress = −0.49). 5.3 Preference for the outside option (γ) Viewers with cable access tend to watch less network TV (see table 3(d) for this and other γ parameters). The unconditional probability of watching any network is 6.5%, compared to 8% for those with no cable access. There appear to be clear patterns in the times at which viewers tune in to watch network TV. First, some viewers simply can be categorized as network TV lovers (γ all = −0.768)–the probability of these viewers tuning in to network TV during any time slot of the week is higher than for others. Second, viewers tend to watch at particular times in the evening–the “same time slot” effect is large (γ same = −0.862) and highly significant.40 There are clear differences in the preference for network TV across age groups. Young viewers watch it the least, older viewers the most. This may reflect the existence of popular cable channels, like MTV and ESPN, for young viewers, and the fact that young people have a higher utility from non television activities. Viewers with higher incomes watch more TV. But, there is not much 40 Since the variables Alli,j,t and Samei,j,t have missing values on Monday (the beginning of the sample), it is not surprising that we find an additional negative effect for this day (γ Mon_ 8 , γ Mon_ 9 and γ Mon_ 10 are all negative). 30 difference across education, family, gender, and urban groups. While this may seem surprising at first, recall that we analyze only prime time TV viewing habits here: it is likely that there might be significant differences across viewer demographic groups in their tendency to watch daytime programs. We also find that the utility from the outside alternative increases during the evening.41 5.4 Individual-firm match (α) Loyalty is not fully explained by the observed variables. The unobserved individual-firm match parameters, αi,j (that appear in table 3(e)), capture the unexplained portion of loyalty. Specifically, we find that the largest segment of viewers dislikes CBS,42 whereas the second largest segment likes CBS the most. The third largest segment likes CBS and FOX, and dislikes NBC. The fourth largest segment prefers NBC, and dislikes CBS, and the smallest segment likes CBS and dislikes FOX. 5.5 Information Parameters (ς) The parameter ς measures the precision of information signals on product attributes. Informally, it measures how informed individuals are about shows. The estimates in table 3(f) reflect significant heterogeneity in this parameter across individuals and networks. We start by describing the average differences in this parameter across viewers and networks, and then proceed to describe the pattern of heterogeneity across viewer types. The clarity of a network’s profile (ς µi,j ) is an inverse function of the diversity in product attribute-utilities (ξ i,j,t ) for that network. Specifically, since our estimate of the attribute-utility is · ³ ´2 ¸−1 1 P 1 P − . ξ̂ i,j,t = η̂ j,t +Xj,t β̂ i , the clarity of each network’s profile is then ς̂ µi,j = T −1 ξ̂ ξ̂ i,j,t t t i,j,t T When we average ς̂ µi,j across individuals in each viewer segment, we find that ABC is the “clear- est” brand for all viewer segments, and FOX is the least clear brand (the average of ς̂ µi,j across viewer types is: ς̂ µi,ABC = 1.45, ς̂ µi,CBS = 1.36, ς̂ µi,NBC = 1.21 and ς̂ µi,F OX = 0.86).The finding for FOX may seem surprising since this network offers the most homogeneous profile of shows: many Generation-X dramas and no sitcoms or news magazines.However, recall that the attribute-utility is, ceteris paribus, a function of both vertical attributes η̂j,t and horizontal ones, Xj,t . While the variation in XF OX,t is indeed lower than the other networks, the variance of vertical attributes 41 It seems from the estimates that the utility from the outside alternative between 8:15 and 10:45 is lower than the one at 8:00. This results from the bias of our γ 8:00 estimate. This parameter is biased, since we are missing the lagged choice of 7:45. Most viewers (about 75% of them) have their television off at 7:45, and thus they have a large switching cost to turn on the television. Since we do not include the lagged choice from 7:45, our estimate of γ 8:00 is upward biased. Notice, though, that this bias does not affect the parameters of interest in this study. 42 This is based on the average of the shows’ “unexplained popularity”, η j,t . 31 (η̂ j,t ) is the highest for FOX shows. In other words, while viewers may know what type of show and cast demographic to expect when they tune into FOX, they are much less certain about show quality. Recall that the weight the individual places on a network’s profile is a positive function of its clarity. Thus, the clearer the profile, the larger its effect on choices and the higher is viewer loyalty. In reality, of course, not all things are equal. An individual relies on a network’s profile when she is uncertain about the attributes of a product. Conversely, when she is well informed about such attributes–because of word-of-mouth, previous experience, or advertising–she would place less weight on a network’s profile even if it is clear. Next, we discuss these miscellaneous sources of information and examine their effect on θi,j,t . m The parameter ς m i,j represents the precision of the miscellaneous signals. Thus, a high ς i,j indicates that viewer i is very familiar with the shows of network j. We find that on average viewers m are most familiar with shows on ABC (average ς m i,ABC = 1.50, average ς i,NBC = 1.05, average m ςm i,CBS = 0.71, average ς i,F OX = 1.38). These findings pass a reality check quite easily. To see this, note that one would expect ς m i,j to be a function of the ratings of the shows and their age (i.e., the number of seasons the shows were on the air before 1995). There are two reasons to find a positive relationship between ratings and ς m i,j : (1) people chat more about successful shows, thus creating “word of mouth” information, and (2) viewers are more likely to have previously watched such popular shows. Even though NBC enjoyed the highest average rating (followed by ABC in second place) during the fall season of 1995, it was only third in the ratings race during the 1994 season (behind ABC and CBS). Moreover, while several of NBC’s highest rated shows in 1995 were in their first year of airing, the successful ABC shows were veterans. For example, one of ABC’s highest rated shows is Monday Night Football, which was in its 25th season. The low ς m i,j for CBS is not surprising as well–their average rating lagged that of the other networks, and CBS had additionally introduced many new shows in the fall of 1995. The heterogeneity in the precision of information obtained from miscellaneous sources across m viewer segments is shown in table 3(f). ς m i,j varies from ς̂ i,CBS = 0.50 for the third segment to ς̂ m i,ABC = 2.33 for the fourth segment. The third segment is the best informed about television shows on average. The estimates of ς̂ µi,j (network profile clarity) and ς̂ m i,j (precision of miscellaneous information about shows) can be used to calculate the weight θi,j that individuals place on network attributes. This is done in table 4. The first column in the table presents the values of θ for each segment 32 and network for every i, j, and t.43 Although ABC has the clearest image, viewers tend to place a low weight on its image since they are well-informed about its shows. In contrast, the image of CBS is much more important in guiding viewers’ choices for that network since viewers are not well-informed about its shows. The highest θ is 0.631 for the third viewer segment with respect to CBS, and the lowest θ is 0.268 for the third type with respect to ABC. These results mean that network profiles affect viewers’ choices over each show. To see this easily, consider the following hypothetical example. There are two networks A and B, two types of products (sitcoms and non-sitcoms), and two types of viewers–those whose β i for sitcoms is equal to 1 and those whose β i for sitcoms is equal to −1, with the opposite preferences for nonsitcoms.44 Further, assume that three-quarters of the shows aired by A are sitcoms, whereas B broadcasts non-sitcoms three-quarters of the time. Using our estimates, we can characterize the viewing propensities of the two types of viewers under different scenarios. We find that during time slots when both networks air sitcoms, viewers who like sitcoms watch A 48% of the time and B only 29% of the time. Conversely, viewers who dislike sitcoms have viewing propensities for the two networks of 21% and 37% in these time slots.45 Fully informed consumers should be equally likely to watch the two networks in each of these cases. Thus, these differences illustrate the consumer loyalty that results from their uncertainty about product attributes. 5.6 Goodness of fit Table 5 presents the predictive power of our model as compared to the benchmark model described in Section 3.4. Specifically, the prior distribution of the benchmark model does not depend on the multiproduct firms’ profiles. Instead, we estimate the mean and variance of the prior distribution for each network and each unobserved segment. As a result, the number of parameters of the benchmark model is larger (221 compared with 202 in our model). Although our model has fewer parameters, its predictive power is stronger. We measure predictive power in two different ways: (a) the root mean squared error (RMSE), 43 µ To do this, we use the estimates of ζ m i,j and ζ i,j , and the fact that the relative weight placed on the network image by viewers who have not been exposed to any ads is given by θ i,j = 44 ζµ i,j µ ζ i,j +ζ m i,j . These are approximately the estimates that one finds in the data when comparing those who like and dislike sitcoms. 45 The probabilities are based on 100 Monte Carlo simulations for 1000 individuals. In each case, the networks air twenty shows, each spanning one time slot. An individual i’s utility from network j is: Ui,j = Xj β i + εi,j , where Xj = 1 for a sitcom, 0 otherwise. Her utility from the outside option is Ui,out = εi,out . εi,j and εi,out are drawn from a type-1 extreme value distribution. Last, each individual receives one unbiased miscellaneous signal whose precision equals 1. Her expected utility from any show is based on the network image and the miscellaneous signals. 33 and (b) the average χ2 measure of goodness of fit, time slot by time slot (the precise definitions are presented in table 5). Using these measures, we examine the “out-of-sample” quality of prediction–of our model and the benchmark model–using the primary sample of viewers and (for cross-validation) the holdout sample. In addition to assessing the fit measures for all the time slots in the week, the table also presents the predictive power of the models when focusing only on the first time slot of each night (8:00 PM). In all these comparisons (but one—holdout sample, RMSE, all time slots), our model performs better than the benchmark one. 6 Applications In this section, we examine the implications of our model for consumer and firm behavior and for empirical work. Specifically, we assess how uncertainty increases consumer loyalty, and compare the empirical importance of µi,j and αi,j in building loyalty. We then demonstrate the empirical bias of ignoring the role of firm profiles in consumers’ prior distribution, and illustrate the implications of our model for competition in markets with multiproduct firms. 6.1 6.1.1 Implications for Consumer Loyalty Decomposing Loyalty: Approach I As stated above the model introduced in this study presents a new source of consumer loyalty to a multiproduct firm. As preliminary evidence of such loyalty, we calculate two variables: Currenti,j,n and Othersi,j,n . The first variable, Currenti,j,n , stands for the proportion of time that individual i was watching network j during night n (out of all the time slots that the TV was on). The second variable, Othersi,j,n , represents the percent of time spent watching that channel during all the other nights of the week. The correlation between these two variables (across all four networks) is 0.21, and it is significantly different from zero at better than the 0.1% level. This evidence points to the fact that viewing choices persist not just within a night, but across nights.46 46 We find that this result cannot be explained by “remote control persistence” (i.e., individuals tend to tune into the network that they were “watching last”). For example, of the 488 individuals who were watching ABC in the last (10:45 pm) time slot of a given night , the probabilities of watching each of the four networks in the first (8 pm) time slot of the subsequent night, conditional on watching broadcast TV, are 25.8% (ABC), 21.2% (CBS), 31.2% (NBC), and 21.2% (FOX). By comparison, the probability of watching the same network in the first time slot on successive nights is much higher. 34 In section 2, we described the various factors that result in consumer loyalty to a firm and focused on two in particular: the unexplained loyalty that is measured by αi,j and the informationbased loyalty as captured by µi,j . Using the structural estimates, we can now examine the relative importance of these two factors on loyalty. To do this, we first construct a measure of loyalty. Then, we use the model parameters to perform various counterfactual simulations that aim to examine the relative importance of the different sources of loyalty. Our loyalty measure Ψ is bounded between 0 and 1. The measure equals 0 if each individual spends the same amount of time watching each network, and equals 1 if each individual watches only one network. Furthermore, the larger this measure, the more loyal are individuals to networks. Specifically, Ψ is given by: b vb) = Ψ(Ω, I X 5 ³ ¯ ¯´ X 2¯ 1¯ b ( Ω, v b ) − φ 3 ¯ i,j 4¯ i=1 j=2 b b v) is the predicted share of viewing time spent by individual i on network j.47,48 where φi,j (Ω, To assess the relative importance of µi,j versus αi,j in affecting loyalty, we perform the following counterfactual experiments. In each experiment, we cancel out one or more of the sources of loyalty in the model and examine the resulting decrease in loyalty. Thus, the larger the decrease, the more important that element is as a source of loyalty. We calculate Ψ for four cases: (1) using the estimated parameters (this benchmark case provides a measure of the actual loyalty of consumers), (2) setting αj,k = 0 for all j and k (i.e., here, we ignore the loyalty arising from unobserved individual-firm match), (3) setting µi,j and ς µi,j to be the same for all the networks (i.e., ignoring the informational role of brands), and (4) 47 Specifically, b b φi,j (Ω, v) = 1 T 5 T X X t=1 k=1 b exp(U i,j,t (Ci,t−1 ,Wi,j,t , v bk , Ω) pbi,k 5 P b exp(U i,j,t (Ci,t−1 ,Wi,j,t , v b k , Ω) j=2 This expression is simplified for presentation. In fact, we are not conditioning on the actual lagged choice, but rather integrating over all possible histories starting at 8:00 PM of each night. Simulations were performed with 1000 runs. Finally, notice that pbi,k is the posterior type probability for each individual based on her demographic characteristics and viewing history, and using Bayes’ rule. 48 For an individual who spends the same amount of time watching each network, φi,j = 14 . Thus, for this individual, 5 ¯ ¯´ ³ X 2 ¯ b vb) − 1 ¯¯ = 0. On the other hand, for an individual who watches only one network, φ (Ω, b vb) = 1 for · ( Ω, ¯φ i,j i,j 3 4 j=2 ¯ ¯ ¯ b vb) − 1 ¯¯ = one of that networks, and 0 for the others. Thus, ¯φi,j (Ω, 4 three networks. Multiplying each one of these values by 2 3 gives us 3 4 for one of the networks, 5 ³ ¯ ¯´ X 2 ¯ b b) − 1 ¯¯ = 3 · ¯φi,j (Ω, v 4 j=2 bounded by 0 and 1. And, the more loyal the individual to any network, the higher is Ψ. 35 and 1 4 for the other b b 1. Thus, Ψ(Ω, v ) is a benchmark case where neither unobserved heterogeneity nor information is accounted for (i.e., setting both αj,k = 0 for all j and k, and µi,j and ς µi,j equal for all j ). We assess the importance of the unexplained loyalty (due to αi,j ) by the decrease in Ψ from case (1) to case (2). Similarly, we assess the importance of information-based loyalty as the decrease in Ψ from case (1) to case (3). The fourth case gives us a sense of the combined importance of these sources compared to other sources of loyalty. The values of Ψ in the four cases are (1) 0.270, (2) 0.214, (3) 0.196, and (4) 0.135. Thus, the decrease in loyalty between cases (1) and (2) is 0.056 (or 21%) and between cases (1) and (3) is 0.074 (or 27%). Based on our discussion above, this suggests that informational attachment is more important than unobserved heterogeneity in creating loyalty. Finally, comparing cases (1) and (4) reveals that information and “emotion” (unexplained loyalty) together account for more than 50% of viewer loyalty to networks. 6.1.2 Decomposing Loyalty: Approach II Here, we offer a different approach to assess the relative importance of incomplete information in creating loyalty. This approach is simple, intuitive, and can shed additional light on the identification of the structural model. The basic principle of this exercise is the following. Using our estimate of model 3 (the full information model), we calculate an individual-network variable Li,j . This variable represents the unexplained individual-network match as in a standard full information model. Notice that this model does not include any network-specific observables (for example, the share of network shows that are sitcoms, or the proportion of its shows that have an African-American cast). We then regress Li,j against those observable individual-network match variables. Our model suggests that at least part of the variation across individuals in Li,j should be attributed to these observables. Thus, the R2 of this regression can serve as a simple measure of the importance of uncertainty in creating loyalty. We calculate Li,j as the mean of the posterior distribution; that is, b i,j = L X k α b k,j pbi,k (Yi , Ci ) where pbi,k (Yi , Ci ) is the estimated posterior probability of viewer i belonging to type k based on her demographic characteristics and viewing history, and using Bayes rule.49 Notice that since 49 Notice that there is another, more demanding, way to estimate Li,j –as a fixed individual-network effect (for each individual-network) as part of the full information model. This would involve estimating 5025 parameters (I ∗ (J − 1) fixed effects). 36 b i,j can take on a pbik (Yi , Ci ) is a function of both individual ı́’s characteristics and her choices, L large number of distinct values. Now we have a vector of 5025 observations that describe the unexplained match between individuals and networks.50 The next step is to regress this vector against the observable individual-network variables suggested by our model, denoted as Mi,j . For example, one of the variables, MBoom−Boom , is the interaction between (a) the average time that network j is airing Baby-Boomers shows, and (b) a binary variable equal to one for Baby-Boomer viewers, and zero otherwise (Boomi ). The results of this regression are presented in table 6. Almost all the coefficients have the expected sign and are statistically significant from zero. For example, the coefficient of the variable MBoom−Boom described above is 2.75.51 Our main interest in this regression is the goodness of fit measure. It turns out that the R2 of this regression is 0.35. In other words, at least 35% of the unexplained heterogeneity is explained by these variables. Notice that it is not the case that only 35% of the individual-network heterogeneity is explained by the informational variables. The reason is that Mi,j includes only those multiproduct firm-image variables observed by the researcher. For example, another characteristic of cast and individual demographics might be blue-collar versus white-collar occupations. It might be the case that viewers engaged in blue-collar professions prefer to watch shows with similar characters. Then, if one of the networks, say ABC, were more likely than the others to air shows focused on blue-collar occupations, then our model would predict that blue-collar individuals should be loyal to ABC for informational reasons. However, since we did not include these variables in our dataset, this type of loyalty would remain unexplained. That said, the fact that 35% of the variance is being explained by our model is another indication that standard models may be missing an important variable in their regression. This regression also highlights the fact that our identification of the role of firm profiles is based on simple correlations in the data. 50 We have α b 0ij for three networks only, since in the estimation we have to normalize α b 0ij for one of the networks. Thus, we have 3 ∗ 1675 = 5025 observations. 51 We estimate a “fixed effect” for each network and find that ABC is the most popular, while CBS is the least popular. We find that while teens dislike networks that air many “Baby Boomer” shows (γ teen∗boom = −2.55, standard error= 0.44), baby boomers are attracted to such networks (γ boom∗boom = 2.75, with a standard error of 0.37). Female viewers like networks that air many shows about females, while male viewers are attracted to networks that air many sports shows. The viewers who most like “generation-X networks” are generation-X viewers. As expected, such networks are liked less by baby boomers, and least by older viewers. Older viewers like “baby boomer networks” the most. We use the variables Income and Urban as a proxy for African-American viewers and the find that the Income variable is not significant, while Urban has the expected sign. As expected, younger viewers like “sitcom networks” more than older ones do. Female viewers are not significantly more likely to prefer networks that air more romance dramas. 37 6.2 Biases in Standard Choice Models We have shown empirically that the information set of consumers includes the profile of multiproduct firms. It follows that most brand choice models misspecify the information set of consumers. This is likely to have effects on the unbiasedness of their estimates. The results in table 7 indicate that such misspecification leads to a significant bias in the estimated parameters of these models. The table is based on 100 Monte Carlo experiments. In each case, we simulate the simplest model that can be used to examine the question of bias. Specifically, there are two firms (j = 1, 2) which offer 10 products, one at each point in time t. The products have one continuous attribute, Xj,t . We select each product of firm A as a random draw from a normal distribution with a mean of −0.5 and a variance of 1. This means that the profile of firm A is characterized by these two parameters. The analogous parameters for firm B are 0.5 (mean) and 1 (variance). The utility of individuals is Xj,t Yi β, where Yi = 1 for 50% of the consumers and equal to −1 for the other 50%. The taste parameter β is equal to 1. The εi,j,t comes from an i.i.d. extreme value distribution. Individuals are uncertain about Xj,t , but each gets a noisy unbiased signal from miscellaneous sources about product attributes. This signal follows a normal distribution, and the variance of the signal, σ2s , is 1. As in our model, individuals form their expected utility from each product based on the signal and the firm’s image. The researcher observes Yi , Xj,t , and consumer choices in 20 time slots (two per product), but does not observe the signals. The profile of each firm, characterized by µi,j and 1 , σ 2b,i,j are calculated as in our model, i.e., they are based on the empirical distribution of Xj,t . There are two researchers. Researcher 1 follows the traditional approach to estimation. That is, he assumes that consumers are uncertain and receive noisy unbiased signals as described earlier. However, he does not include the profile of the firms in the information set. Researcher 2 adopts the estimation approach presented in our study. The results in the first line of table 7 present the estimates of the model by researcher 1. As can be seen, whereas the estimates of researcher 2 are unbiased, those of researcher 1 are biased. It turns out that the estimate of β is downward biased (0.635 instead of 1), while the estimate of the variance of signals is upward biased.52 The direction of the bias in β in the traditional model used by researcher 1 should not come as a surprise. Consumer i’s expected utility is equal to 52 We have also done 100 Monte Carlo experiments in which it is assumed that consumers ignore the firm profile (i.e., the information set does not include the multiproduct firm’s image). We found that in this case, as expected, both researchers obtain unbiased estimates of the parameters. 38 à σ2s σ2s + σ 2b,i,j ! à σ2 µi,j Yi β + 1 − 2 s 2 σ s + σb,i,j ! Si,j,t Yi β + εi,j,t (20) which, substituting for Si,j,t , can be written as: à σ2s σ2s + σ2b,i,j ! à σ2 µi,j Yi β + 1 − 2 s 2 σs + σ b,i,j ! ¡ ¢ Xj,t + εsi,j,t Yi β + εi,j,t (21) From researcher 1’s point of view, the expected utility is instead: ¡ ¢ Xj,t + εsi,j,t Yi β + εi,j,t (22) In other words, while the coefficient on the observable variables Xj,t Yi is actually θβ, the researcher does not account for θ in the estimation. This leads the researcher to obtain a downward-biased estimate of β (which in our case would be estimated to be 0.5 instead of 1). If this were the only effect of misspecifying the information set in the utility, there may not be a problem since the estimated effect of Xj,t Yi on utility is correct. However, the consequences of misspecifying the information set does not end here, as can be seen in the comparison of (21) and (22). Researcher 1 also is missing a relevant variable, µi,j Yi . Since this variable is positively correlated (across individuals) with the variable (Xj,t Yi ) included in the estimation, the estimate of the effect of (Xj,t Yi ) on the utility is upward biased. Thus, β is estimated to be 0.63 instead of 0.5. 6.3 6.3.1 Managerial Implications Brand extensions There are three parameters in the model–the mean parameter of the firm’s profile; the precision parameter of the firm’s profile; and the precision of the miscellaneous signals–that can be used to characterize some key choices facing brand managers when evaluating brand extensions. Thus, the framework helps trace out the consequences of these choices.53 Specifically, consider a manager who is considering a brand extension. We can characterize her actions in terms of their impact on the firm’s profile as follows: (1) a mean-preserving brand 53 Brand extensions have been studied by several authors. We do not elaborate on the organizational consequences of spillover effects here. These have been noted by, for example, Aaker and Joachimsthaler (2000): “Looking at brands as stand-alone silos is a recipe for confusion and inefficiency....A portfolio can sometimes be strengthened by the addition of brands, but such additions should always be made or approved by a person or group with a portfolio perspective. Decentralized groups that have little feel for (or incentive to care about) the total brand portfolio may damage it by adding brands too promiscuously.” 39 extension, (2) a precision-preserving brand extension, and (3) extensions that affect both the mean and the precision. Some brand extensions might change the precision but not the mean. For example, by adding a product that is identical in its characteristics to the mean, the precision of the firm’s profile increases without affecting the mean. On the other hand, a decrease in the precision without affecting the mean can result from adding two products that are located symmetrically and far enough on both sides of the mean. These kinds of changes, which would affect the weight that consumers put on the firm’s image, would have the following consequences. An increase in the precision would increase the firm’s market share among consumers who generally like the firm’s image. However, it should decrease its market share among consumers who dislike the firm’s image even for products which fit their preferences well. In a sense, our model magnifies the effects of brand extensions.54 Now consider the effect of adding products which alter the mean without affecting its precision. Products with attributes that are now more similar to the firm’s image than before would benefit from such an addition. But, the market share for the other products of this firm should now decrease. As an example, when CBS introduced Central Park West (a drama with a young cast) in 1995, its market share among older viewers for its traditional shows (which catered to an older audience) fell. Similarly, FOX’s introduction of the NFL in 1993 should be expected not just to have increased market share of FOX’s traditional shows among males, but also to have decreased the share among females for its traditional shows. In practise, most brand extensions will change both the mean and the precision parameters of the firm’s profile. In this case, the discussion above illustrates the various effects to consider with such a change. 6.3.2 Other consequences of informational spillovers Hits NBA Commissioner, David Stern, has been desperately searching for a new Michael Jordan– and for good reason. The original Michael Jordan drew attention not only to the Chicago Bulls, but for the entire league. The spillover effects of stars and hits is well known within the entertainment 54 As an example of this effect, consider that Sharon Stone’s image changed after the movie Basic Instinct in 1992, and the impact this would be expected to have on video rentals of her older movies. If consumers did not know what these earlier movies were about, their decision to to watch one of her older movies would likely be influenced by their image of the character she played in Basic Instinct. Consequently, viewers who like this image should now have been more likely to view her other movies even if the characters she played were different from her new image. On the other hand, consumers to whom her older movies might have appealed well, but to whom her new image did not, should be less likely to watch such movies now. 40 industries. Other examples are Tiger Woods and golf, Who Wants to be a Millionaire and ABC, Pretty Woman and Julia Roberts, etc. As the following examples illustrate, assessing the value of stars or hits is a topic of debate. In 1998, NBC renegotiated a new contract for the show E.R. with its producer, Warner Brothers. The new price per episode increased by more than 500%, from $2 million to $13 million.55 In 1993, FOX’s offer to the NFL for the television rights to its games more than doubled the price paid by the other networks in 1989.56 In those two cases, many observers criticized the high prices paid. Indeed, the rating ratio between E.R. and any other show on NBC was significantly smaller than the price ratio between these shows. NBC and Rupert Murdoch, respectively, explained their high offers by the spillover effects of the products. The problem is that it is hard to measure these spillover effects. The model presented in this study can assist in identifying and measuring the magnitude of these spillovers.57 For example, one can calculate the effect of replacing a show not only on the rating in that specific time slot, but also on the ratings of the entire week. Notice that a hit show improves the firm’s image for all the consumers. That P is, a show with a high η j,t obviously increases the mean, T1 t η j,t . Spillovers have an additional consequence that is illustrated in the following example. In June 2000, the leading television network in India was Zee Television, with 25 of its shows ranked in the top 30. Six months later, this advantage had not just vanished, but indeed reversed. By then, Star Television, its primary competitor, had accumulated a comparable number of shows in the top 30, building around a single hit show that it had introduced.58 Such dramatic reversals in market share are not uncommon to markets where spillovers are important. In such markets, since a single product can undermine the advantage of a leader across multiple product categories, its competitive position will be more fragile than is apparent from market share rankings. This should influence resource allocation and investment criteria as well. Firms competing in such markets might search for the “one big hit”, in contrast to industries where spillovers are not important, where firms are likely to adopt a more conventional approach to investment decisions. 55 Steve Hall, “Spiraling Cost of Shows Really Commercializing TV,” The Augusta Chronicle, January 29, 1998. See Anand (2001). 57 Recently, there has been interest in how to obtain sensible estimates of the true value of stars like Tiger Woods or Michael Jordan. See “A Tiger Economy”, in The Economist, April 14, 2001. 58 The show, Kaun Banega Crorepati, was a quiz show modeled on ABC’s Who Wants to be a Millionaire. 56 41 7 Conclusion Americans spend, on average, more than four hours per day watching television.59 Accounting for the opportunity costs of leisure, consumer spending on television is higher than spending on most other products. Other products within the entertainment industries are similar in this respect. It takes hours to read a book, to watch a movie, etc. The rapid growth of these industries in the last few decades has been stimulated by the increased leisure time and by technological improvements in the production and provision of these products. These industries have very clear and distinct characteristics. First, consumer products are heterogeneous, and products are differentiated. Some people would watch any movie with Robin Williams, while others would avoid a movie with this star. Some people adore country music, others detest it. Another characteristic is the uncertainty that consumers face about product attributes. The uncertainty emerges from the dynamic nature of these industries. New products appear frequently, and established products are changing their attributes occasionally. It is extremely difficult to follow these changes without spending one’s entire day reading about them. These characteristics call for a mechanism that supplies information to consumers easily. In this study, we have shown that multiproduct firms serve this role.60 This new role of multiproduct firms has interesting implications for marketing researchers and practitioners. Preliminary light has been shed on some of these here. We have shown that the informational role of multiproduct firms is a significant contributor to consumer loyalty. We speculated on the use of this concept in considering brand extensions. And we have hinted about possibilities of measuring and evaluating the value of hits via this model. We believe that further development of these applications can be fruitful. For example, it would be interesting to see if one can identify rules of thumb for brand managers that build on the logic of this model. While this study demonstrates the informational role of multiproduct firms, previous studies have documented the flow of information from firms to consumers through advertising.61 Thus, advertising and branding are substitutes when it comes to supplying consumers with information about products. This relationship might have various managerial implications that can be explored. 59 Anderson and Coate (2000) cite data from the Television Advertising Bureau that the average adult man in the U.S. spent 4 hours and 2 minutes watching television per day, and the average woman spent 4 hours and 40 minutes per day. 60 Multiproduct firms obviously have other roles. They decrease production costs via economies of scale and scope, and they signal quality for experience goods. These roles have been extensively studied elsewhere. 61 See Shachar and Anand (1998) and Anand and Shachar (2001). 42 References [1] Aaker, D. and Erich Joachimsthaler (1999). “The Lure of Global Branding,” Harvard Business Review, Nov.-Dec., 137-146. [2] Anand, Bharat (2001). “Fox Broadcasting Company–1993.” Harvard Business School Case No. N9799-042. [3] Anand, Bharat and Ron Shachar (2001) “Advertising, the Matchmaker,” http://www.tau.ac.il/ ~rroonn/matchmaker.pdf [4] Anderson, Simon and Stephen Coate, “Market Provision of Public Goods: The Case of Broadcasting,” Working Paper. [5] Berry, Steven, James Levinsohn, and Ariel Pakes (1995), “Automobile Prices in Market Equilibrium,” Econometrica, 63, 841-890. [6] Cabral, L. (1998). “Stretching Firm and Brand Reputation.” London Business School Working Paper. [7] Chamberlain, Gary (1985) “Heterogeneity, Ommited Variable Bias, and Duration Dependence,” in James Heckman and Burton Singer (eds.), Longitudinal Analysis of Labor Market Data, (Cambridge, Cambridge University Press). [8] Crawford, Gregory and Matthew Shum (2000). “Uncertainty and Learning in Pharmaceutical Demand.” Working Paper. [9] Darmon, R.Y. (1976) “Determinants of TV Viewing,” Journal of Advertising Research, 16, 17-24. [10] DeGroot, M. (1989). Probability and Statistics, (2nd edition), Addison Wesley. [11] Eckstein, Z., D. Horksy, and Y. Raban (1988). “An Empirical Dynamic Model of Optimal Brand Choice”. Foerder Institute of Economic Research, Working Paper no. 88. [12] Erdem, T. and M. Keane (1996). “Decision Making Under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets”, Marketing Science, 15 (1), 1-20. [13] Erdem, T. (1998). “An Empirical Analysis of Umbrella Branding”, Journal of Marketing Research, 35, 339-351. [14] Goettler, Ronald and Ron Shachar (2002), “An Empirical Analysis of Spatial Competition,” Rand Journal of Economics (forthcoming). [15] Hajivassiliou, Vassilis (1997), “Some Practical Issues in Simulated Maximum Likelihood,” London School of Economics Working Paper No. EM/97/340. [16] Hajivassiliou, Vassilis and Paul Ruud (1994), “Classical Estimation Methods for LDV Models Using Simulation,” in R.F. Engel and D.L. McFadden (eds.), Handbook of Econometrics, Volume IV, (Amsterdam, Elsevier Science). [17] Horen, Jeffrey H. (1980), “Scheduling of Network Television Programs,” Management Science, 26 (4), 354-370. [18] Kamakura, Wagner A. and Gary J. Russell (1989), “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure,” Journal of Marketing Research, 26 (4), 379-390. [19] Kesler, Lori (1989) “Extensions Leave Brands in New Area,” Advertising Age, June 1, 1987, S1. [20] Mankiw, Gregory, (1998). Principles of Economics. New York: The Dryden Press. 43 [21] McFadden, D. (1978). “Modelling the Choice of Residential Location,” in A. Karlquist et al., eds., Spatial Interaction Theory and Residential Location, North-Holland, Amsterdam. [22] Montgomery, C. and B. Wernerfelt (1992), “Risk Reduction and Umbrella Branding”, Journal of Business, 65 (1), 31-50. [23] Morrin, Maureen (1999) “The Impact of Brand Extensions on Parent Brand Memory Structures and Retrieval Processes,” Journal of Marketing Research, 36 (4), 517-525. [24] Moshkin, Nickolay and Ron Shachar (2001) “The Asymmetric Information Model of State Dependence,” http://www.tau.ac.il/~rroonn/Search.pdf [25] Ogiba, Edward (1988) “The Danger of Leveraging,” Adweek, January 4, 1988, page 42. [26] Pakes, Ariel and David Pollard (1989), “Simulation and the Asymptotics of Optimization Estimators.” Econometrica Vol. 57(5), 1027-57. [27] Park, Chan Su and V. Srinivasan (1994) “A Survey-Based Method for Measuring and Understanding Brand Equity and Its Extendibility,” Journal of Marketing Research, 31 (2), 271-288. [28] Rust, R. and M. Alpert (1984), “An Audience Flow Model of Television Viewing Choice”. Marketing Science, Vol. 3, Spring 1984, pp. 113—127. [29] Shachar, R. (1994), “A Diagnostic Test for the Sources of Persistence in Individuals’ Decisions,” Economics Letters, Vol. 45(1), 7-13. [30] Shachar, R. and B. Anand (1998). “The Effectiveness and Targeting of Television Advertising.” Journal of Economics and Management Strategy 7(3), 363-396. [31] Shachar, R. and J. Emerson (2000), “Cast Demographics, Unobserved Segments, and Heterogeneous Switching Costs in a TV Viewing Choice Model”, Journal of Marketing Research, Vol. XXXVII, 173-186. [32] Simonin, Bernard and Julie Ruth (1998) “Is a Company Known by the Company it Keeps? Assessing the Spillover Effects of Brand Alliances on Consumer Brand Attitudes,” Journal of Marketing Research, 35 (1), 30-42. [33] Sullivan, Mary (1990) “Measuring Image Spillovers in Umbrella-Branded Products,” Journal of Business, 63 (3), 309-329. [34] Wernerfelt, B. (1988). “Umbrella Branding as a Signal of New Product Quality: An Example of Signalling by Posting a Bond.” Rand Journal of Economics 19, 458-466. 44 Table 1 Individual Observable Characteristics: Definitions and Summary Statistics Definition Variable Teens Gen − X Boom Older Female Male Family Income Education Urban Basic Pr emium Between 6 and 17 years old at November 1995 Between 18 and 34 years old at November 1995 Between 35 and 49 years old at November 1995 Older than 50 years Female viewer Male viewer Viewer living in a household with (a) “woman of the house” present (over the age of 18) and (b) children On unit interval, where the limits are: zero if the income is less than $10,000, and one if the income is $40,000 and over On unit interval, where the limits are: zero if the years of school are less than 8, and one if the it is 4 or more years college Viewer lives in one of the 25 largest cities in U.S Viewer has basic cable service Viewer has basic and premium cable service Show Characteristic GenerationX Baby − Boom Family African − American Male Female Sitcom Action − Drama Romantic − Drama Mean 0.0627 0.2400 0.2764 0.4191 0.5319 0.4681 0.4304 Standard Deviation 0.2425 0.4272 0.4474 0.4936 0.4991 0.4991 0.4953 0.8333 0.2259 0.7421 0.2216 0.4149 0.3642 0.3588 0.4929 0.4813 0.4798 Table 2 Attribute Profile of the Networks: Definitions and Averages Definition ABC CBS NBC FOX The main characters in the show are between the ages of 18 and 34 The main characters in the show are between the ages of 35 and 49 The main characters in the show are members of a family The main characters in the show are African-American The main characters in the show are male The main characters in the show are female The show is a situation comedy The show is an action drama The show is a romantic drama 15 25 35 90 25 25 15 10 25 25 10 0 10 0 10 20 70 5 60 20 20 50 15 15 60 30 0 40 30 20 50 20 10 10 40 40 Numbers indicate percentage of all shows on the network with the corresponding show characteristic. The averages reported in this table are time-weighted; that is, are based not on the proportion of shows in each category, but rather on the screen time of each category. This is the appropriate representation of the probability of watching a show type when the individual is turning on the TV. 44 Parameter β Gender β Age 0 Table 3(a) Preferences for Show Attributes ( β ) Coefficient Standard Parameter Estimate Error Teens 0.2381 0.0443 0.0000 β PD Standard Error ---- 1.2785 0.1078 GenX β PD -0.3554 0.2162 β Age1 0.9045 0.0805 BabyBoomer β PD -0.6078 0.2087 β Age 2 0.0000 ---- Older β PD -0.6845 0.2320 β Family 0.4599 0.1216 Female β PD 0.8392 0.1209 β Race -1.3234 0.2843 -1.0972 0.2547 Teens β Sitcom 0.0000 ---- Income β PD Education β PD -0.6086 0.2499 GenX β Sitcom -0.9816 0.1884 Family β PD 0.1915 0.1205 BabyBoomer β Sitcom -0.9678 0.1831 Urban β PD 0.1760 0.1051 Older β Sitcom -1.2030 0.1978 k =1 β Sitcom 0.0000 ---- Female β Sitcom 0.3486 0.0833 k =1 β AD 0.0000 ---- Income β Sitcom -0.3466 0.2215 k =1 β PD 0.0000 ---- Education β Sitcom -0.4092 0.2089 k =2 β Sitcom 1.0881 0.2401 Family β Sitcom 0.1737 0.1127 k =2 β AD -0.9447 0.2218 Urban β Sitcom -0.0988 0.0856 k =2 β PD -1.8983 0.3471 Teens β AD 0.0000 ---- k =3 β Sitcom 0.4261 0.2573 GenX β AD BabyBoomer β AD Older β AD -0.5869 0.1868 -0.2408 0.2479 -0.3836 0.1815 -0.4298 0.3139 -0.1296 0.1904 k =3 β AD k =3 β PD k =4 β Sitcom 1.1141 0.2701 Female β AD Income β AD Education β AD 0.3953 0.0890 1.0342 0.2316 -0.6925 0.2403 -0.0156 0.3168 -0.1506 0.2197 k =4 β AD k =4 β PD k =5 β Sitcom -1.0337 0.3327 -0.0153 0.1028 -0.1564 0.2885 -0.1479 0.0931 -0.4185 0.3489 Family β AD Urban β AD k =5 β AD k =5 β PD 45 Network Show The Marshal Monday Night Football Roseanne ABC Ice Wars -0.2166 -0.9932 1.2204 -1.0446 0.1370 -0.7822 -0.7867 -0.0911 -0.9908 -0.2635 -0.3365 -1.5238 1.2611 -0.0438 -0.2770 0.8821 0.4133 0.4389 0.3883 0.3928 0.3341 0.4200 0.4426 0.3799 0.4069 0.4333 0.2868 0.3399 0.4625 0.4644 0.3900 0.4408 0.2155 -1.6185 0.4160 0.6834 E.R. 0.7243 0.6359 0.6503 0.5169 Dateline NBC (F) Homicide High Society Chicago Hope -1.8575 0.2989 -0.0667 2.2951 The Client Nothing Lasts Forever -1.2269 0.3128 0.3593 0.3534 Bless This House Dave's World Central Park West Courthouse Murder, She Wrote New York News -0.7359 0.4522 -1.2996 -1.4280 -0.1025 -1.7372 0.4567 0.5187 0.4185 0.4355 0.3688 0.4181 Hudson Street Home Improvement Coach NYPD Blue Ellen The Drew Carey Show Grace Under Fire The Naked Truth Prime Time Live Columbo Murder One Family Matters Boy Meets World Step by Step Hangin' With Mr. Cooper 20/20 The Nanny Can't Hurry Love Murphy Brown CBS Table 3(b) Unobserved "Show Quality" Parameters ( η ) Estimate Standard Network Show Error 48 Hours -2.7641 0.5039 CBS Here Comes The Bride 1.6575 0.4998 NBC Fresh Prince In The House She Fought Alone Wings NewsRadio Frasier Pursuit of Happiness Dateline NBC (T) Seaquest 2032 Dateline NBC (W) Law and Order Friends The Single Guy Seinfeld Caroline in the City Unsolved Mysteries FOX 46 Melrose Place Beverly Hills 90210 (M) Bram Stoker's Dracula Beverly Hills 90210 (W) Party of Five Living Single The Crew New York Undercover Strange Luck X-Files Estimate -0.9887 -0.9241 Standard Error 0.4683 0.4552 -0.5644 -0.2856 -0.5789 2.0990 -0.3116 -0.5322 0.1176 -1.9322 -0.0981 -1.3892 -0.1426 -0.7551 0.8398 -0.0754 0.6351 -0.4467 0.4091 0.6646 0.6973 0.4620 0.4248 0.4572 0.3929 0.4546 0.4369 0.3777 0.3895 0.3350 0.4335 0.4314 0.3812 0.3988 1.7731 0.2314 0.3545 0.3422 -0.5560 -1.3795 -1.0843 1.3881 0.3967 0.3522 0.6595 0.5142 -0.9617 0.8766 0.3269 0.3846 -1.0605 0.2313 0.5836 -0.3871 -0.9613 0.0000 0.4344 0.5021 0.5669 0.4835 0.3626 ---- Table 3(c) State Dependence Parameters ( δ ) Parameter Estimate Standard Error 0.8556 0.0917 δ Sitcom δ ActionDrama 0.5200 0.0955 δ PsychDrama 0.7345 0.0988 δ News δ Sport 0.0000 ---- -0.6744 0.1175 δ k =1 1.0356 0.1190 δ k =2 δ k =3 δ k =4 δ k =5 δ Basic δ Pr emium δ Female δ Family 1.6021 0.1105 2.2765 0.1137 2.1966 0.1192 2.0392 0.1245 -0.3622 0.0416 -0.2465 0.0442 0.0862 0.0345 -0.0167 0.0417 0.0000 ---- -0.0561 0.0670 -0.0437 0.0620 -0.0331 0.0692 1.6480 0.0959 0.8889 0.0977 -0.4637 0.0777 0.5040 0.1290 -0.3162 0.0829 0.6871 0.1507 -0.4877 0.0712 δ Teens δ GenerationX δ BabryBoomer δ Older δ Continuation δ Out δ First15 δ Last15 δ Hour δ FOX 10:00 δ In Pr ogress 47 Table 3(d) Preference for Outside Alternatives ( γ and α Out ) Parameter Estimate γ Basic 0.2172 Standard Error 0.0374 Parameter γ Pr emium 0.2704 0.0392 α k =1,Out ,9−10 PM 0.0000 ---- γ All -0.7679 0.0918 α k =1,Out ,10−11PM 0.0000 ---- γ Same -0.8622 0.0586 α k = 2,Out ,8−9 PM -0.3941 0.2202 γ Teens 3.4826 0.3172 α k = 2,Out ,9−10 PM 0.1576 0.1546 γ GenerationX 2.8265 0.3397 α k = 2,Out ,10−11PM 0.0238 0.2072 γ BabyBoomer 2.6805 0.3370 α k =3,Out ,8−9 PM 0.0236 0.2217 γ Older 2.2479 0.3421 α k =3,Out ,9−10 PM 0.0961 0.1502 γ Female 0.1516 0.0666 α k =3,Out ,10−11PM 0.2319 0.1914 γ Income -0.5046 0.1688 α k = 4,Out ,8−9 PM 0.6731 0.2342 γ Education -0.2100 0.1500 α k = 4,Out ,9−10 PM -0.3384 0.1712 γ Family 0.1174 0.0786 α k = 4,Out ,10−11PM -1.3440 0.1988 γ Urban -0.0451 0.0639 α k =5,Out ,8−9 PM -0.2483 0.2732 γ Monday8:00 -1.2678 0.2683 α k =5,Out ,9−10 PM -0.1942 0.1860 γ Monday 9:00 -0.0058 0.2418 α k =5,Out ,10−11PM 0.5934 0.2122 γ Monday10:00 0.2908 0.2754 γ 8:00 γ 8:15 γ 8:30 γ 8:45 γ 9:00 γ 9:15 γ 9:30 γ 9:45 γ 10:00 γ 10:15 γ 10:30 γ 10:45 0.0000 ---- -0.9489 0.0955 -0.7935 0.0986 -0.9526 0.1293 -0.8572 0.1607 -0.8748 0.1782 -0.8235 0.1724 -0.7556 0.1851 -0.3076 0.2724 -0.2622 0.2898 0.0772 0.2871 -0.0674 0.2927 48 Estimate α k =1,Out ,8−9 PM 0.0000 Standard Error ---- Table 3(e) Individual-Brand Match Parameters ( α ) Viewer Parameter Estimate Standard Segment Error 0 ---α ABC 1 2 3 4 5 α CBS α NBC α FOX 0 ---- 0 ---- 0 ---- α ABC α CBS α NBC α FOX 0.0000 ---- 0.3677 0.1513 -0.0437 0.1249 -0.6671 0.2618 0.0000 ---- -0.5129 0.1766 -0.2351 0.1240 0.2394 0.1966 0.0000 ---- 0.4457 0.1637 0.3244 0.1260 0.0796 0.2180 0.0000 ---- 1.3947 0.2009 -0.3953 0.1927 -0.9725 0.2887 α ABC α CBS α NBC α FOX α ABC α CBS α NBC α FOX α ABC α CBS α NBC α FOX Table 3(f) Miscellaneous Variance Parameters ( σ m ) Viewer Segment 1 2 3 4 5 49 Parameter Estimate Standard Error 0.4545 0.2501 σ ABC σ CBS σ NBC σ FOX 1.9940 0.8991 0.8252 0.3993 0.8778 0.5394 σ ABC σ CBS σ NBC σ FOX 1.9885 0.7486 1.6394 0.6523 0.6567 0.3320 1.2702 0.9277 σ ABC σ CBS σ NBC σ FOX 0.5978 0.2973 1.3236 0.7162 1.3314 0.5749 0.5104 0.3624 σ ABC σ CBS σ NBC σ FOX 0.4298 0.2436 0.9290 0.4564 1.5358 0.6452 0.7357 0.5181 σ ABC σ CBS σ NBC σ FOX 1.4538 0.6754 1.7258 0.7280 0.6292 0.4867 1.4721 1.6237 Table 4 across the different types and networks θ ABC CBS NBC FOX 0.2679 0.6314 0.4094 0.5329 1 0.4781 0.4400 0.3247 0.5034 2 0.3038 0.5258 0.5477 0.4113 3 0.3388 0.4514 0.5786 0.4694 4 0.4854 0.5693 0.3038 0.6235 5 Fit Measure Relevant time slots χ2 RMSE All 8:00 PM only All 8:00 PM only Table 5 Goodness of fit measures Primary Sample Benchmark Our Model model 0.001807 0.002024 0.002220 0.002439 0.005764 0.006072 0.007146 Holdout Sample Benchmark Our Model model 0.012895 0.013246 0.012058 0.014204 0.016177 0.015998 0.007196 Our measures of fit are defined as follows: [ 2 RMSE = mean j ,t ( pˆ j ,t − p j ,t ) 0.014511 0.015260 ] ˆ j ,t is the model's predicted proportion of viewers choosing alternative j and time t, and p j ,t is the where p corresponding observed proportion of viewers. The χ2 measure is equal to the sum of the squared difference between the expected and observed counts divided by the expected counts, for each time slot, averaged over all time slots. 50 Table 6 Decomposing Loyalty: Approach II Variable Estimate Standard Error Fixed network effect 0.5606 0.1484 ABC -0.3264 0.1189 CBS 0.2405 0.0715 FOX Interaction effects (The first expression is for the individual and the second for the network) -2.5592 0.4455 Teen-Boom 2.7544 0.3714 Boom-Boom 6.9070 0.3545 Older-Boom -0.1058 0.0442 GenX-GenX -0.5405 0.0431 Boom-GenX -1.200 0.0418 Older-GenX 0.5347 0.0789 Female-Female 0.3222 0.2277 Income-AfricanAmerican 0.9699 0.1032 Urban-AfricanAmerican -2.0348 0.3853 GenX-Sitcom -4.0883 0.3757 Boom-Sitcom -6.7533 0.3633 Old-Sitcom 0.0104 0.1096 Female-Romance 0.4696 0.0656 Male-Action&Sport 0.35 5025 R2 Number of observations Table 7 Biases in Standard Choice Models Parameters True parameter value Researcher 1: Traditional approach Researcher 2: Muli-product approach Mean estimate s.d. of the mean Mean estimate s.d. of the mean 51 β σs ln( L) 1 0.6371 0.0054 0.9832 0.0203 1 0.7489 0.0273 1.0303 0.0581 ---19.9773 ---19.9237 --- Figure 1 Television Schedule (8-10 pm), November 6-10, 1995 Day Network Mon. ABC 8:30 The Nanny Can’t Hurry Love NBC Fresh Prince of Bel-Air In the House ABC CBS NBC Thu. Hudson Street Roseanne Murphy Brown High Society Wings Beverly Hills 90210 Home Improvement Coach Pursuit of Happiness Frasier CBS Bless This House Dave’s World Chicago Hope Affiliate Programming: News NYPD Blue Grace Under Fire Dateline NBC Affiliate Programming: News Movie: Bram Stoker’s Dracula The Drew Carey Show 10:30 Movie: Nothing Lasts Forever News Radio Ellen 10:00 Movie: She Fought Alone The Client ABC The Naked Truth Prime Time Live Central Park West Courthouse NBC Seaquest 2032 Dateline NBC Law & Order FOX Beverly Hills 90210 Party of Five Affiliate Programming: News ABC CBS Fri. 9:30 Pro Football: Philadelphia at Dallas Melrose Place FOX Wed. 9:00 The Marshal CBS FOX Tue. 8:00 Movie: Columbo: It’s All in the Game Murder, She Wrote New York News Caroline in the City Murder One 48 Hours NBC Friends The Single Guy FOX Living Single The Crew ABC Family Matters Boy Meets World CBS Here Comes the Bride NBC Unsolved Mysteries Dateline NBC Homicide: Life on the Street FOX Strange Luck X-Files Affiliate Programming: News Seinfeld New York Undercover Step by Step Hangin’ With Mr. Cooper E.R. Affiliate Programming: News 20/20 Ice Wars: USA vs. The World 52 NBC 80 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 53 Romance 90 80 Romance 90 Action Fox Action 0 Sitcom 10 0 Sitcom 20 10 Female 30 20 Female 40 30 Male ABC Male 50 40 AfricanAmerican 60 50 AfricanAmerican 70 60 Family 80 70 Family 90 80 Boom Generation X Romance Action Sitcom Female Male AfricanAmerican Family Boom Generation X 90 Boom Generation X Romance Action Sitcom Female Male AfricanAmerican Family Boom Generation X Figure 2 Attribute Profile of the Networks CBS Vertical axis indicates percentage of all prime-time slots on the network that have the corresponding show characteristic.