Structural Analysis of Multi-Channel Demand Scott Shriver and Bryan Bollinger ⇤ April 1, 2015 Abstract In this paper, we propose a structural framework to study multi-channel, multi-category demand. We model demand at the level of individual purchase occasions, the timing of which is driven by a stochastic arrival process. Conditional upon the arrival of a brand consideration event, consumers sequentially set expenditure levels for the brand’s channels and product categories by maximizing a direct utility function subject to a stochastic budget constraint. Our formulation incorporates both fixed and variable channel shopping costs, as well as channel/category specific unobservables that capture differences in product assortment information. We estimate the model using the purchase histories of approximately 10,000 randomly selected customers from a firm that uses both online and retail channels to sell directly to consumers. The firm doubled its retail footprint over our two year observation window, providing a rich source of customer-specific variation in retail store distance. We leverage this variation to identify the effects of retail proximity on purchase frequency and channel expenditure patterns. Our estimates imply a 10% reduction in retail store distance increases existingcustomer total revenues by 0.7%, a result of increased brand consideration and partial recapture of consumer transportation cost savings. Channel switching patterns imply retail revenues increase by 2.1% while online revenues decrease by 0.5%. In a series of counterfactual experiments, we demonstrate how our model can be used as a decision tool for managers to explore channel-based pricing policies and to identify promising locations for new physical stores. ⇤ Shriver: Columbia Business School, Columbia University, ss4127@columbia.edu. Bollinger: The Fuqua School of Business, Duke University, bryan.bollinger@duke.edu. We thank the Wharton Customer Analytics Initiative (WCAI) for access to the data and seminar participants at the Marketing Science conference, Duke University, the University of California at San Diego, the University of Southern California, the University of Rochester, and the University of Toronto for their comments. We are also grateful to Avi Goldfarb and Kitty Wang for useful discussions regarding the paper. All errors are our own. 1 Introduction Increasingly, major brands sell directly to customers through a combination of digital (web, mobile, etc.) and physical (retail) channels. While offering multiple channels entails greater operational costs, the potential benefits to firms are many. Even when product assortments and prices are held fixed, consumer valuations of channel options will differ based on the transaction (e.g. shipping or transportation) costs they incur, which shopping formats they natively prefer, and how much product information (e.g., fit and feel considerations) can be ascertained through the various channels. Offering more (or lower cost) channel choices expands demand by increasing the likelihood that a channel will match the consumer’s shopping needs for any given shopping occasion. In addition, a brand’s presence in physical space or in digital media can influence consumer awareness of the brand, effectively serving as a form of advertising. The importance of the retail channel in this capacity has been recently reaffirmed, as once decidedly pure-play e-tailers such as Amazon.com have opened retail stores, citing their ability to attract a broader set of consumers and raise brand awareness.1 A final important benefit to firms is that they can use channel-based price discrimination policies to extract greater surplus from consumers. Understanding how and to what extent the mix of channels (and their corresponding transaction costs) influences patterns of consumer demand is therefore of mutual interest to marketing researchers, who wish to quantify the economic contribution of channels, and practitioners, who face the daunting problem of optimally locating physical stores and setting channel-specific prices. These tasks require empirical demand systems that can consistently capture demand expansion effects and patterns of substitution across the firm’s channels and product categories. In this paper, we develop such a framework and demonstrate how it can inform key aspects of channel strategy. Our model explains a comprehensive set of demand outcomes as a function of prices and retail store distance, including the frequency with which consumers shop, how much they spend per purchase occasion, whether they buy from the online (web) or retail channel, and how they allocate expenditures among multiple product categories. To achieve this, we model demand at the level of individual purchase occasions, the timing of which is driven by a stochastic arrival process that is linked to brand consideration. Conditional upon the arrival of a brand consideration event, consumers sequentially set expenditure levels for the brand’s channels and product categories (two-stage budgeting), by maximizing a direct utility function subject to a stochastic budget constraint. Channel expenditures are further constrained such that only one channel (at most) is chosen per consideration event.2 Our formulation incorporates both fixed and variable channel shopping costs, as well as channel/category 1 http://www.wsj.com/articles/amazon-rips-page-from-rivals-offline-playbook-1454708011?mod=djem_jiewr_ES_domainid 2 This assumption can be relaxed but then it would be necessary to aggregate shopping trips, which has many undesirable implications, 1 specific unobservables that capture differences in product assortment information. Unlike analyses of aggregate channel revenues, our comprehensive, individual-level approach can reveal the specific mechanisms by which channels influence demand, providing insights that may be used to fine-tune elements of the marketing mix. Our model is also quite general and adaptable to any setting where historical customer transaction data are available. Our model is structural in the sense that consumer channel and category expenditure choices are derived from a common underlying direct utility framework. This approach facilitates prediction of the demand response to retail store configurations and pricing regimes not directly observed in data. A further benefit noted by Chintagunta (1993) is that observed consumer (category, channel) decisions jointly maximize consumer utility, which would not be the case if such choices were modeled independently. Relatedly, Dubé (2004) remarks that demand derived from the joint utility model is well behaved when aggregated across consumers or markets. The benefits of structural estimation come at the expense of several computational challenges. In order to accommodate prediction of expenditures in multiple categories, we employ a direct utility specification as in the literature on multiple-discreteness (simultaneous choice of multiple alternatives) and marketing models of demand for variety (e.g., Wales and Woodland, 1983; Hanemann, 1984; Kim et al., 2002; Bhat, 2008).3 Although direct utility specifications are convenient for formulating a model of multiple-discreteness, no closed form expressions are available for the corresponding indirect utility (direct utility evaluated at the optimal expenditure level) – an object which is needed to jointly estimate the model in the form of a multi-stage budgeting problem. Using insights from Pinjari and Bhat (2010), we devise a highly efficient algorithm to numerically solve for indirect utility values. In our context, this innovation enables us to jointly estimate our hierarchical, multi-stage demand system using Markov Chain Monte Carlo (MCMC) methods, allowing for rich continuous heterogeneity at the individual level. We estimate the model using the purchase histories of approximately 10,000 randomly selected customers from a firm that sells directly to consumers though online and retail channels. The firm doubled its retail footprint over our two year observation window, providing a rich source of customer-specific variation in retail store distance. We leverage this variation to identify the effects of retail proximity on purchase frequency and channel expenditure patterns. Our estimates imply a 10% reduction in retail store distance increases existing-customer total revenues by 0.7%, a result of increased brand consideration and partial recapture of consumer transportation the foremost being that channel transation costs (an ecoonmic primative) are no longer directly linked to purchase occasions. 3 Broadly, direct utility models are distinguished from indirect utility models by the treatment of purchase quantities. By definition, the indirect utility for an alternative is the utility obtained at the consumer’s optimal consumption quantity for that alternative. An indirect utility model assumes the functional form of utility at optimal purchase quantity levels, whereas in a direct utility setup, optimal purchase quantities must be derived from the consumer’s optimization problem. Models of pure discrete choice (i.e., single-discreteness) typically abstract from purchase quantities altogether and adopt an indirect utility approach. A direct utility specification is often convenient in multiple-discreteness settings, where indirect utility functions may be difficult to specify. 2 cost savings. Channel switching patterns imply retail revenues increase by 2.1% while online revenues decrease by 0.5%. In a series of counterfactual experiments, we demonstrate how our model can be used as a decision tool for managers to explore channel-based pricing policies and to identify promising locations for new physical stores. The rest of the paper is organized as follows: in Section 1.1 we briefly discuss the related empirical literature. In Section 2 we describe the data used for the study and explore variation in key relationships. In Section 3 we develop our structural model. Section 4 describes our estimation method and model estimates are presented in Section 5. We further explore the implications of our estimates and perform counterfactual experiments in Section 6. We summarize our findings and propose further avenues of research in Section 7. 1.1 Related literature The economic contribution of channels has primarily been explored within the literature on new channel introduction.4 Researchers have studied the effect of channel introduction in multiple industries including newspapers (Deleersnyder et al., 2002; Gentzkow, 2007), music (Biyalogorsky and Naik, 2003) and apparel (Ansari et al., 2008). Most extant studies source data from the period of rapid e-commerce expansion (late 1990s to early 2000s) and consequently investigate the impact of adding an online (website) channel to existing brick and mortar or catalog channels (Ansari et al., 2008; Biyalogorsky and Naik, 2003; Deleersnyder et al., 2002; Geyskens et al., 2002; Van Nierop et al., 2011). Generally, these papers find that addition of the website does not cannibalize existing channels, due to positive spillovers across channels. With most firms now operating websites, channel strategy is increasingly focused on when and where to introduce retail outlets. To our knowledge, only three other recent studies have analyzed the impact of a firm adding physical stores to its existing online and catalog channels.5 Avery et al. (2012) use market-level panel data from an apparel/home furnishings retailer to measure the impact of opening a new physical store on net catalog sales, net online sales, sales from new customers and sales from existing customers. They use a differences-indifferences methodology, where control markets are identified using a propensity scoring algorithm. Avery et al. (2012) find that both catalog and online sales are cannibalized in the short run but tend to recover over time. Pauwels and Neslin (2011) use vector autoregression to analyze similar data (also apparel categories) in an 4 See Neslin et al. (2006) for a comprehensive survey of the literature on multi-channel customer management. et al. (2009) also study the impact of retail stores on online sales, but in the context of a competitive environment. They use a differences-in-differences approach and find significant sales declines on Amazon.com when local bookstores open. It is useful to consider why our results may be less pronounced. One possible explanation is that sales differences are driven by consumer brand preferences (Amazon vs. local store) rather than channel formats per se. Also, the results of our analysis are not not directly comparable because Forman et al. (2009) focus only on top-selling books and measure sales by the book’s sales ranking in the local market – thus, the relationship to total shopping trip expenditures (our measure of sales) is unclear. 5 Forman 3 aggregate time series format. The VAR approach permits Pauwels and Neslin (2011) to simultaneously model multiple demand outcomes, including the frequency and size (in $) of orders, returns and exchanges by channel (online, catalog, store) as well as the total number of customers in the market. The results are generally consistent with Avery et al. (2012) in that the authors find physical store introduction: i) cannibalizes catalog sales but not online sales and ii) increases the rate of new customer acquisition. The third paper is Wang and Goldfarb (2014), which uses some of the same data we analyze. Similar to Avery et al. (2012), Wang and Goldfarb (2014) use a differences-in-differences methodology to investigate the impact of retail entry on online and retail sales in markets defined by Census tracts. Wang and Goldfarb (2014) find that the retail channel can serve as a marketing communications device for the online channel, increasing sales through new customer acquisition. The authors find that in areas with a weak brand presence, defined as whether the Census tract has no sales in the three months prior to entry, this effect dominates as there are no pre-existing online sales to cannibalize. In contrast, areas with a prior brand presence also exhibit channel cannibalization effects. Our empirical findings are broadly consistent with those of Wang and Goldfarb (2014), in that both studies find evidence of demand expansion and channel cannibalization. Differences in context and data across studies makes a comparison of results and methods difficult. We note that ours is the only analysis conducted at the individual level, which allows us to exploit within-customer variation in behavior after the introduction of a nearby store. We conjecture that our operationalization of retail channel effects using individual customer retail store distances (rather than counts of stores within a market) provides a richer source of identifying variation for the effects of interest. A related point of differentiation with Wang and Goldfarb (2014), Pauwels and Neslin (2011) and Avery et al. (2012) is our ability to control for unobserved consumer heterogeneity and channel-specific product prices. Finally, only the structural model offers rich decision support capabilities to consistently forecast demand in markets under alternative channel availability or price regimes. The setup of our model also links it to the the literature on store choice, which similarly considers shopping format choices and the purchase of baskets of goods. Bell and Lattin (1998) study the choice of retail store format in an environment with different levels of price uncertainty since some retailers use EDLP and some use HILO. The choice of channels in our context has a similar character, in that channel formats differ with respect to their ability to provide information about product attributes. The ability to assess product fit has been shown to be an important factor in channel format choices in several recent marketing studies (e.g., Bell et al., 2013; Soysal and Zentner, 2014; Dzyabura et al., 2015). The store choice literature has also emphasized the importance of planned expenditure levels (or “basket size” in the language of Bell and Lattin (1998)) and shopping trip 4 fixed costs (e.g. Bell, Ho, and Tang, 1998) as determinants of shopping format choices. We similarly model channel choice as a function of the trip budget and channel transaction costs. The standard practice in the store choice literature has been to treat expenditure levels as exogenously determined – for example, Bell and Lattin (1998) use a pre-estimation calibration to obtain the probability that consumers are either “large basket” or “small basket” shoppers, and Bell, Ho, and Tang (1998) assume budgets arise implicitly from an unobserved shopping list. By contrast, we endogenize the channel expenditure decision, allowing us to infer how expenditures respond to changes in prices or retail store proximity. 2 Data Data for the study comes from a North American speciality retailer that sells to customers exclusively through its e-commerce website (the online channel) and network of retail stores (the retail channel).6 The brand sells a variety of apparel products including clothing, footwear and accessories. As is typical in the apparel industry, product assortments vary over time, reflecting fashion trends and seasonal patterns of demand (e.g., warmer clothes in winter). However, there is limited variation in the product assortment across channels – the firm’s product database indicates that fewer than 0.1% of products are offered exclusively via one channel.7 Since our objective is to predict channel-specific demand both in and out of sample, the perpetual introduction and retirement of goods makes individual SKUs both conceptually and computationally impractical as the unit of analysis (this would involve more than 70,000 unique SKUs, most of which would not enter future assortments). Therefore, for our empirical work we aggregate demand into six distinct product categories, which correspond to the top level of the brand’s product hierarchy. These category definitions are sufficiently broad so as not to be season-specific (e.g., a footwear category would encompass winter boots as well as summer sandals) but narrow enough to share common requirements regarding fit assessment (e.g., the need to try on shoes does not vary greatly across shoe styles but is presumably different from the need to try on pants). Given this aggregation scheme, our analytical framework will characterize consumer demand for product categories and explore how the mixture of available channels influences that demand. With approximately 87% of the US population now using the Internet,8 the mixture of channels available to a consumer is effectively determined by the consumer’s proximity to the retail outlets operated by the brand. 6 Confidentiality agreements preclude us from disclosing the identity of the brand or revealing precise descriptions of the products in their portfolio. 7 Operational issues such as product delivery schedules and stock-outs may result in temporary assortment variation across the channels, at least with respect to some retail stores. Lacking data on store and online channel inventories, we cannot characterize such assortment variation precisely. 8 Source: http://www.internetlivestats.com/internet-users. 5 Observing variation in retail outlet proximity is therefore critical to identifying the different mechanisms (purchase frequency, expenditure levels, basket composition, etc.) by which channel availability influences demand. Our dataset provides a rich source of such variation. The firm provided us customer home locations by Census block as well as the coordinates (latitude, longitude) and opening dates of all its retail outlets. These data allow us to calculate a customer’s proximity to a retail store at any point in time with high precision (roughly 1/3 of a mile accuracy). In addition to the cross-sectional variation in retail proximity derived from the initial spatial distribution of customers and stores, we observe significant within-subject variation due to the entry of new stores – during our two-year period of study, the brand expanded its retail footprint from 37 retail outlets to 75. We explore the effect of retail distance on various demand outcomes though a series of descriptive regressions in Section 2.2. 2.1 Preparation and summary We observe complete transaction-level purchase histories for a sample of 10,242 customers residing in the continental United States, which is randomly drawn from the firm’s database. The window of observation spans July 2010 through June 2012. Over this period, we observe a total of 29,098 transactions, each corresponding to a separate purchase occasion. Transaction records indicate the date and channel format, the quantities and prices of individual SKUs purchased, as well as a return indicator for each SKU. As our principal interest lies in the net economic contribution of channels to demand, we omit returned items from our computation of trip expenditures and aggregation of category demand. To facilitate our empirical analysis, we structure the data as a sequence of customer purchase occasions (including the possibility of zero purchases) within a time period (quarter).9 We first discuss variation in the data at the customer/quarter panel level, and then turn to discussion of variation at the level of individual purchase occasions. Table 1 summarizes variables with panel-level variation (number of purchases, retail store distance) and with cross-sectional variation (demographics, taxes). The panel summarized in Table 1 is unbalanced panel across the 8 quarters of study and 10,238 individuals, reflecting the observed acquisition of new customers. The table indicates that on average customers make 2 purchases per year (1/2 purchase per quarter) and live approximately 43 miles from the nearest store. As seen in the full distribution of purchases per quarter (Figure 1 below), some customers are very frequent shoppers, with more than 10 purchases per quarter. In Figure 2, we plot the distributions of customer retail distance (in log scale) for the initial and final periods, which makes clear that retail distance declines appreciably over time. The within-customer variation induced by observed retail 9 This approach allows us to associate continuously-varying price and store distance information with summary discrete-time measures. 6 entry events is substantial, with a standard deviation of 28.4 miles. The cross-sectional data include directly observed demographic variables (age and gender), demographic variables imputed from matching Census 2010 block groups, and customer location-specific sales tax rates for online and retail purchases. While the means strongly suggest the firm’s core market is affluent, middle-aged women, there is also substantial variance in the demographic variables. We report online and retail sales tax information linked to the customer’s home location, as these costs may influence the consumer’s channel choice.10 As the equivalence of the online and retail mean tax rates suggests, for the vast majority (97%) of consumers, online and retail tax rates are equivalent. Differential rates generally apply for those individuals living near state boundaries, where their home and closest retail outlet are located in different tax jurisdictions. Panel variables observations mean std dev min quarter 57,553 5.13 2.21 1 8 purchases 57,553 0.51 1.22 0 36 retail distance (mi) 57,553 42.67 81.49 0.1 499.74 Cross sectional variables Customer characteristics Market characteristics max observations mean std dev min max age 10,236 40.46 11.76 18 94 male 10,236 0.08 0.27 0 1 rural 10,236 0.11 0.28 0 1 white 10,236 0.80 0.19 0 1 college 10,236 0.55 0.21 0 1 commute time (min) 10,236 28.14 7.51 8.63 61.75 median HH income (k) 10,236 97.60 49.65 2.50 250.00 online tax rate 10,236 0.07 0.02 0 0.10 retail tax rate 10,236 0.07 0.01 0 0.10 Table 1: Customer/quarter panel summary statistics .8 .3 Initial period final period .6 density Density .2 .4 .1 .2 0 0 0 2 4 6 8 −2 10+ trips per quarter 0 2 4 6 log(retail distance) Figure 1: Frequency of purchase occasions per quarter Figure 2: Customer retail (log) distance distributions in initial and final periods 10 Matching is done at the zip code level using retail sales tax rates downloaded from https://www.avalara.com/products/taxrates/. The customer’s retail rate is the prevailing rate at the nearest retail outlet. For the online tax rate, we use information from the firm’s website that specifies from which states they collect online sales tax. If the customer’s home zip code is in one of these states, the online rate equals the retail rate in that zip code (and is equal to zero otherwise). 7 Price indices, which are not specific to individuals but vary by time, category and channel, are summarized graphically in Figure 3. Similar to previous treatments in the marketing and economics literatures (e.g., Gordon et al., 2013; Chevalier et al., 2003), we compute category (k) and channel (c) specific price indices for a quarter (t) as geometric expenditure-weighted means of SKU-level prices: 0 1  w j log(p jct ) ÂÂe jct B j2Jt ,k C c t pkct =exp @ A where : w j =  wj ÂÂÂe jct (1) j c t j2Jt ,k In (1) above, p jct is the average (across consumers) price paid for SKU j in channel c and quarter t, while e jct is the total expenditure (summed over consumers) for the same SKU. We use Jt to represent the set of SKUs offered in quarter t. The applied weights, which are the fraction of total observed expenditures attributable to a given SKU, naturally reflect more popular products in the resulting index. As is apparent from Figure 3, for most categories, prices are slightly lower in the retail channel. A SKU-level analysis of prices reveals that these differences stem from more aggressive discounting in the retail channel. Category 1 Category 2 Category 3 Category 4 Category 5 Category 6 150 100 Price index 50 0 150 100 50 0 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 Quarter Online Retail Figure 3: Channel-specific category price indices Next we turn to data on individual purchase occasions. We summarize observed purchases in terms of the selected channel (where we code online as 1 and retail as 2), the total expenditure, and the expenditure shares for each of the six product categories. The purchase transaction data summarized in Table 2 indicates that the average per-purchase expenditure is approximately $127 and that the 57% of observed purchases are in the retail channel. Expenditure shares tend to be highest for category 1 and lowest for category 6, which is the highest priced category. 8 observations mean std dev min max channel 29,098 1.57 0.50 1 2 expenditure 29,098 126.96 125.28 1 1245.95 share 1 29,098 0.32 0.41 0 1 share 2 29,098 0.12 0.29 0 1 share 3 29,098 0.13 0.30 0 1 share 4 29,098 0.20 0.35 0 1 share 5 29,098 0.18 0.34 0 1 share 6 29,098 0.05 0.21 0 1 Table 2: Purchase occasion summary statistics We examine the relationship between expenditures and channel formats in Figure 4, which plots kernel density estimates of trip expenditures by channel. It is clear that customers spend more per trip in the online channel (average values are $138.01 for online and $118.51 for retail), and that there are a large number of small expenditure transactions in the retail channel. We explore variation in category shares by channel in Figure 5. The bulk of the expenditure share variation across channels is linked to categories 1, 5 and 6, where there is a clear preference for category 6 in the online channel and for categories 1 and 5 in the retail channel. .008 Online Retail Online Retail share of basket expenditure .3 density .006 .004 .002 .2 .1 0 0 200 400 600 800 0 1000 trip expenditure 2 3 4 5 6 category Figure 4: Expenditure distributions by channel 2.2 1 Figure 5: Average expenditure shares by category and channel Descriptive analyses In this section, through descriptive regressions we explore how distance to the retail outlet affects demand along four dimensions: purchase frequency, the total expenditure per purchase, the channel chosen for purchase, and new customer acquisition. Our findings here guide the formulation of our structural model in Section 3 and related extensions in Section 6.2. Throughout this section and in our model development, we define the variable d as the distance to the nearest retail outlet. 9 2.2.1 Purchase frequency We model the number of purchase occasions for customer i in quarter t, Lit , as a Poisson arrival process and use the customer/quarter panel data to estimate the model. The specification includes bi-level (individual, quarter) fixed effects (di , µt ) to control for unobservables – the effect of retail store distance (a) is thus identified by deviations from individual-specific average purchase incidence rates, after controlling for time trends common to all individuals. Formally, the specification is: Lit ⇠ Poisson(rit ), log (r jt ) = a log(dit ) + di + µt The estimate of a from this regression is â = (2) 0.104 with a standard error of 0.016, which is significant below the 1% level. Given the log-log specification for r and d, we may interpret â as an elasticity, so that a 10% decrease in retail store distance corresponds to a 1.0% increase in the mean number of purchases per quarter. We conclude that the increased proximity to a retail outlet (smaller store distance) has a significant positive impact on shopping incidence rates, presumably due in part to increased top of mind awareness of the brand. 2.2.2 Expenditure level Although retail outlet proximity increases customer purchase frequency, to infer the net effect on total expenditure levels with the brand, we must also consider the potential impact of retail distance on the expenditure per purchase. To explore this issue, we use the trip-level data and model the expenditure for the l’th purchase by customer i in quarter t, eitl , using a log-log model with bi-level (individual, quarter) fixed effects (di , µt ): log(eitl ) = a log(dit ) + di + µt + eitl (3) The estimate of a from this regression is â = 0.003 with a standard error of 0.014. The sign of â is consistent with expenditures increasing at closer retail distances; however, the effect is not significant at any conventional level. Similar results hold if we condition upon only online or only retail purchases. We infer that point estimates are directionally consistent with expenditures increasing in decreasing distance, but we cannot conclusively reject the null of no distance effect on per-purchase expenditure levels. 2.2.3 Channel choice Next we analyze the effect of retail distance on channel format choices. We denote customer i’s channel choice for the l’th purchase in quarter t as citl , and following our previous convention, associate a channel choice of 2 with the retail channel. We proceed by modeling the probability of a retail channel choice using a binary logit 10 model, again including bi-level fixed effects: Pr(citl = 2) = The estimated a is â = exp (a log(dit ) + di + µt ) 1 + exp (a log(dit ) + di + µt ) (4) 0.531 with a standard error of 0.064, which is significant below the 1% level. Unsur- prisingly, proximity to a retail outlet has a strong effect on channel choices – the closer to the retail outlet, the higher the probability of a retail (vs. online) purchase. The intuitive interpretation of this result is that the lower transportation costs due to store entry increase use of the retail channel. 2.2.4 New customer acquisition Finally, we consider the impact of retail distance on new customer acquisition. Although our structural framework is principally focused on existing customer demand, in Section 6.2 we demonstrate how it may be extended to incorporate new customer acquisition. To assess how retail store distance influences new customer acquisition, we construct an auxiliary dataset organized as a balanced panel of 8 quarterly observations on 216,291 Census block groups, summarized in Table 3. Variables include the distance from the market centroid to the nearest retail store (d), the number of new (N) and existing (E) customers and the market population (pop). In total, 7641 customers are acquired over the course of the observation window (75% of individuals in our sample), leading to substantial panel-level variation in the variable N. observations mean std dev min max retail distance (mi) (d) 1,730,328 147.758 163.653 0.01 1025.9 new customers (N) 1,730,328 0.004 0.067 0 3 existing customers (E) 1,730,328 0.020 0.150 0 4 population (pop) 1,730,328 1407.82 811.03 1 36,880 Table 3: New customer data (Census block group/quarter panel) To assess how the presence of a retail outlet may influence new customer acquisition, we model the arrival of new customers (Nmt ) as a Poisson process: Nmt ⇠ Poisson(rmt ), log (rmt ) = a log(dmt ) + dm + µt + log(popm Emt ) (5) As in Section 2.2, the specification includes bi-level (market, quarter) fixed effects (dm , µt ) to control for unobservables. The level of the arrival rate is normalized in proportion to the number of consumers in a market who are not yet customers of the brand (popm Emt ). The estimate of a from this regression is â = 0.356 with a standard error of 0.022 (highly significant). Thus, a 10% decrease in retail store distance corresponds to a 3.6% 11 increase in the mean number of new customers per quarter. The relatively large magnitude of this effect indicates that new customer acquisition is a critical economic benefit of operating a retail channel. The finding supports the notion that retail stores are effective in boosting initial brand awareness among prospective customers. In summary, the descriptive analysis suggests two key effects of retail proximity on existing customer demand: an effect on the overall shopping frequency and an effect driving channel choice. We therefore include direct dependencies on store distance for these factors when specifying in our structural model in Section 3 below. 3 Model Our modeling objective is to explain and predict a comprehensive set of demand outcomes: how frequently consumers purchase, what channel they choose to purchase, how much they spend, and how they allocate expenditures across multiple product categories. In accordance with this objective, we model demand at the level of individual purchase occasions. To clarify the exposition, we first develop the model assuming consumers have homogeneous preferences. We subsequently incorporate heterogeneous preferences into our empirical specifications, as discussed in Section 3.6. Figure 6 below outlines our proposed model, which conceptualizes demand as originating with a brand consideration arrival process. Specifically, the number of times a consumer considers purchasing from the focal brand in a given period (Y ) is assumed to follow a Poisson distribution with rate parameter µ. The consideration arrival process may be conceptualized as reflecting: (i) Poisson process arrivals of the consumer’s need for product, and (ii) the conditional (Bernoulli) probability of considering the focal brand given product need.11 In Section 3.5, we explain how we relate the observed purchase frequency to the consideration process rate parameter µ. Conditional upon a brand consideration event, the consumer allocates the total shopping budget (b) among channels, product categories and the outside option in a two-stage budgeting decision.12 In the first stage, the consumer chooses the expenditure level in dollars for each channel (ec ) such that, at most, one channel has positive expenditure for a given consideration event. This choice includes the possibility of zero expenditures in all channels (a no-purchase condition). Channel expenditure levels are constrained such that the total shopping budget is equal to the sum of: (i) channel expenditures (plus applicable sales taxes), (ii) the outside good expenditure, 11 If N~Poisson(l ) and C~Bernoulli(z ), Y⌘N|C=1~Poisson(l z ). In our setting, both N (need arrival) and C (consideration) are unobserved and hence not separately identified. Our model effectively recovers the compound rate parameter µ = l z . 12 We warrant that in some settings consumers may shop in sequences other than the one we describe. For example, consumers may first determine the set of products to be purchased in the form of a shopping list (e.g. Bell, Ho, and Tang, 1998) and subsequently choose the channel from which to shop. In most cases (including ours), the consumer’s decision sequence cannot be identified from observable data, requiring any complete model of multi-channel, multi-product shopping behavior to employ comparable assumptions. In this sense, our model serves as a baseline against which other approaches may be compared. 12 Stage 0: Brand consideration arrival ! Consideration events arrive via Poisson process Arrivals ~ Poisson(µ) Stage 1: Channel expenditure (e) decision Channel utility, income shocks (ξ,ν) realized Expenditures maximize expected utility, assuming optimal 2nd stage category choices Stage 2: Category share (s) decision Category/channel-specific shocks (ε,η) realized Shares chosen to maximize utility Figure 6: Model overview and (iii) channel fixed costs. Channel fixed costs include observable factors such as shipping costs (online) and transportation costs (retail). In the second stage, the consumer allocates the channel expenditure among the K categories by setting expenditure shares sk , where the sk sum to one. In the following subsections, we first formulate the consumer’s decision problem (Section 3.1) and then discuss the empirical specifications we use during model estimation (Section 3.2). Next, we solve the model and derive likelihood expressions for each stage in Sections 3.3, 3.4 and 3.5. Finally, we introduce our persistent heterogeneity specification in Section 3.6. 3.1 Consumer’s problem: stages 1 and 2 We collect a consumer’s channel expenditure choices in a Cx1 vector (~e) and her category share choices in a Kx1 vector (~s). Conditional upon a brand consideration event, the consumer’s problem is to select the channel expenditure levels and category shares that maximize her utility, subject to a budget constraint. If the consumer were to make these decisions jointly, her problem can be formulated as follows: ✓ ◆ C C K sk ec max U(~e,~s) =   Yck gk log + 1 +  Wc I(ec > 0) + log (z) pck gk ~e,~s c=1 k=1 c=1 sub ject to : ec 0, C  I (ec > 0) 2 {0, 1}, z > 0 (6a) (6b) c=1 sk 0, K  sk = k=1 C C c=1 c=1 C  I (ec > 0) (6c) c=1  ec (1 + rc ) +  fc I(ec > 0) + z = b (6d) The first term in the specification in (6a) above captures the utility consumers get from the purchase of products from the focal brand. Note that sk ec /pck ⌘ qck is a quantity index for purchases in category k and channel c, so that consumers have decreasing marginal utility in the quantity of each category (through the log(·) 13 expression). The addition of one to sk ec /pck gk inside the log(·) expression enforces a lower bound of zero for each of the additively separable category/channel sub-utilities. The role of the g parameters are to alter the rates at which the consumer’s utility satiates in purchase quantity – a higher g implies a lower rate of satiation. The Y terms are the so-called “baseline” marginal utilities, because Yck is the marginal utility obtained in the limit of ∂U ). qck !0 ∂ qck consuming zero quantity ( lim The second term in (6a) captures utility that stems from the shopping activity itself. That is, Wc is the utility obtained from the act of shopping in channel c. We include these terms to reflect the fact that some consumers have strong preferences over channel formats, independent of how much they spend or what products they choose to buy. The indicator function I(ec > 0) reflects that shopping utility is only incurred for a chosen channel. The final term in (6a) captures the consumer’s utility for the outside good, where z is the expenditure level for the outside good. Some expenditure on the numeraire outside good z is essential (z > 0) for each consideration event. z is unobserved but, as will be seen, its value may be inferred from model assumptions. Equation (6b) contains constraints on the channel expenditure levels. The first expression implies expenditures must be non-negative, while the second enforces that at most one channel may be chosen for each consideration event. Equation (6c) provides constraints on category expenditure shares: these too must be non-negative and sum to one if there is a positive expenditure in some category (and must be identically zero otherwise). The final relation (6d) is the overall shopping budget for the consideration event: the budget (b) equals channel expenditures (marked up by the channel sales tax rate, rc ) plus the outside good expenditure (z) and channel fixed costs ( fc ). As with the channel shopping utility, channel fixed costs are only incurred when channel expenditures are positive. An empirical specification of the consumer’s decision model imposes functional form and stochastic assumptions on {Yck , Wc , fc , b}. Under standard assumptions for these terms, computing the optimal channel expenditure levels and category shares in the joint decision model is prohibitively expensive. To simplify computation and to reflect a more realistic information revelation process, we assume that consumers sequentially make channel expenditure and category share allocation decisions. Shocks to channel utilities (contained in Wc ) and the trip budget (contained in b) are realized at the time of the first stage channel expenditure decision, while shocks to product category utilities (contained in Yck ) are realized at the time of the second stage category share decision. In the first stage, we assume consumers have rational expectations with respect to product prices in both channels and know the distribution of 2nd stage unobservables. Under these assumptions, expectations are taken ⇣⇤ ⌘ K s (Y )e with respect to Yc . Let vc (ec ) = E  Yck gk log k pckcgk c + 1 be the expected utility from expenditure ec in Yc k=1 channel c, assuming categories are optimally chosen (denoted by the star superscript on sk ). Then, the two stage 14 budgeting decision may be written as: Stage 1: Channel expenditure decision max V (~e) = ~e C C c=1 c=1  vc (ec ) +  Wc I(ec > 0) + log (z) sub ject to : ec C  I (ec > 0) 2 {0, 1}, z > 0 0, C c=1 C c=1 c=1  ec (1 + rc ) +  fc I(ec > 0) + z = b " (7b) (7c) ✓ K (7a) s⇤ (Yc )ec where : vc (ec ) = E  Yck gk log k +1 Yc pck gk k=1 ◆# (7d) Stage 2: Category share decision max ~s u(~s | e⇤c > 0) = K  Yck gk log k=1 sub ject to : sk 0, K ✓ ◆ sk e⇤c +1 pck gk  sk = 1 (8a) (8b) k=1 A complication that arises when computing the expectations in (7d) is that no closed form expression is available – vc (ec ) must be computed by taking a large number of draws of the unobservables, solving for the optimal shares associated with each draw, evaluating the utility at the optimal shares, and then averaging the result. We devise a highly efficient algorithm to perform these computations, which we discuss in Section 4.2 and Appendix A. 3.2 Empirical specifications In order to solve the model and derive the corresponding likelihood function, it is necessary to impose functional form and stochastic assumptions on {Yck , Wc , fc , b, µ}. We summarize our empirical specifications for these components in Table 4 below. Functional form Category utilities Yck = exp (yk + eck + hck ) Channel utilities Wc = wc + xc Channel fixed costs* f1 = 7.95, f2 = ad k Trip budget b = exp (t + b t X t + n) Consideration arrival* µ = exp d + b d X d d z * d = retail store distance Distributional assumptions eck ⇠ EV (0, se ) , hck ⇠ N 0, shck xc ⇠ EV 0, sx n ⇠ N (0, sn ) Table 4: Empirical specifications for {Yck , Wc , fc , b, µ} 15 Category marginal utilities Yck are constrained to be positive through the exp(·) operator, and are comprised of a category intercept (y) and two stochastic terms (h and e). The h terms are normally distributed channel/category specific shocks that capture product fit and assortment information. By assumption, the extreme value shocks (e) are iid across categories and channels. This compound shock specification is used to allow for a common channel-category component and it additionally provides computational convenience when evaluating the likelihood function.13 The interpretation of this structural shock is that it is the information realized upon visiting the channel. Given the functional form in Table 4, a higher shock variance yields a higher expected utility because the potential utility gain from a positive shock is larger than that of a negative shock. This is a desirable feature because the more potential for good fit products, the higher the expected utility. We would expect that product SKU variety, the ability to try the product in the retail channel, and the information provided in the online channel all could contribute to the variance of the shock distribution. The channel utility Wc is comprised of an extreme value shock (xc ) and an intercept (wc ) that captures channel format preferences. For channel fixed costs, we set f1 = 7.95 because the firm charges a flat rate shipping fee of $7.95 per online order. Retail channel fixed (transportation) costs are not directly observed. We capture these costs using a flexible function of the consumer’s distance to the nearest retail store (d): f2 = ad k . We assume a log-normal distribution for the trip budget (b). In addition to an intercept (t), the trip budget specification incorporates additional controls through the shifter matrix X t . Our specification of X t includes combined statistical area (CSA) time trends to control for common unobserved market factors that potentially affect both demand (expenditure patterns) and firm retail entry (and thus, retail store distance). Finally, the consideration arrival process rate parameter µ, is specified as a flexible function of retail store distance (d z ), an intercept (d ), and additional controls (X d ). Distance is included to capture the effect of retail store proximity on a consumer’s awareness of and hence propensity to consider the brand. In X µ , we include quarter of year dummies to account for seasonal patterns of shopping frequency. 3.3 Second stage optimality conditions and likelihood Optimality conditions For the second-stage sub-problem, both the objective (utility) function and the con- straint equation are continuously differentiable in the control variables (~s), so the Kuhn-Tucker theorem may be ✓ ◆ K ⇤ applied to solve the model. We first form the Lagrangian for the problem: L = u(~s | ec ) l  sk 1 . Optimal shares will then satisfy the Kuhn-Tucker (KT) conditions: ∂L ∂ sk 0, sk 0, ∂L ∂ sk sk k=1 = 0 for all k = 1, ..., K. 13 In deriving the likelihood function, integration over the iid extreme value shocks (e) may be done analytically. Monte Carlo integration is performed over the heteroskedastic normal shocks (h), using draws that serve the dual purpose of computing the expected product utility function vc (ec ). 16 Without loss of generality, assume that the first good is chosen. In the Technical Appendix, we show that the KT conditions for the consumer’s problem reduce to: 8 > > <= gc1 + ec1 gck + eck > > :< gc1 + ec1 where : gck = yk + hck if s⇤k > 0 (9a) s⇤k if = 0 ✓ ⇤ ⇤ ◆ sk ec log +1 pck gk log (pck ) (9b) Likelihood The Kuhn-Tucker conditions are used to form the likelihood for the observed share vector. Without loss of generality, let the first M (M 1) categories be chosen. Conditional upon ec1 and hck , chosen categories generate equality constraints that provide a 1:1 mapping from observed (optimal) shares sk to the shocks eck . Through a change of variables calculus (Jacobian term), this mapping produces a probability density for non-zero shares. Non-chosen categories generate inequalities that provide an upper bound on the eck . Integrating the joint probability density over the allowed regions for eck produces a probability for zero share categories. The unconditional likelihood is then given by integrating over ec1 and hck . We partition the iid channel/category shocks ~e in two parts, with the Mx1 vector ~eM corresponding to chosen alternatives and the (K M)x1 vector ~eMe corresponding to non-chosen alternatives. With this convention, the probability of observing expenditure share pattern ~s may be written: `(~s =~s⇤ |e⇤c ) = J~eM !~s⇤M ˆ ˆ• h e1 = • gc1 gˆ c,M+1 +e1 eM+1 = • ··· gc1 ˆ gcK +e1 eK = • f (~e) f (~h )d~eMe d~h (10) where f (·) represents a probability density function (extreme value for e, normal for h) and g is defined in equation (9b). In the Technical Appendix, we show the resulting likelihood reduces to: ! !ˆ ✓ ◆! M s⇤ e⇤ + g p M M ⇤ g (M 1)! e j c j c j c j `(~s =~s⇤ |e⇤c ) = ’ s⇤ e⇤c +cg j pc j  ’ exp sec M 1 e⇤c sec j=1 j j=1 j=1 h K ✓ gc j  exp sec j=1 ◆! M f (~h )d~h (11) where f (· ) is the normal pdf. The integration over h has no closed form solution – we employ simulation methods to compute this integral in our estimation procedure. 17 3.4 First stage optimality conditions and likelihood Optimality conditions In the first-stage sub-problem, the Kuhn-Tucker theorem does not apply because neither the objective function nor the budget constraint is continuously differentiable in the control variables ~e.14 Fortunately, the structure of the problem presents a viable solution procedure. To fix ideas, consider our empirical setting where there are two channels. Note that because at most one expenditure, the 8channel 9 0 1 0will1receive 0 1positive > > ⇤ < = B0C Be1 C B 0 C ⇤ ⇤ optimal expenditure vector ~e must be restricted as follows: ~e 2 @ A , @ A , @ A ⌘ {~e⇤0 ,~e⇤1 ,~e⇤2 }, where > > : 0 0 e⇤2 ; e⇤1 and e⇤2 are scalar values that represent the optimal expenditure levels for channels 1 and 2, assuming they were the chosen channel. More generally, in a slight abuse of notation, let: ~e⇤c ⌘ Cx1 vector with jth element e⇤j I(c = j) , ~e⇤0 ⌘ 0Cx1 (12a) With this convention, an observation of ~e⇤c corresponds to the following 2 optimality conditions: 1. e⇤c solves ∂V (~ec ) ∂ ec = 0 : optimal trade-off between expenditures in chosen channel and the outside good h ⇣ ⌘i 2. V (~e⇤c ) = max V ~e⇤j : chosen channel utility exceeds (expenditure optimal) counterfactual channel, j=0,..,C outside good only utilities Optimality condition 1 Given that channel c is chosen, terms from the alternative channels drop out of the utility and budget constraint equations. By solving the budget constraint for z and substituting into the utility function V , the first optimality condition may be evaluated as: ∂V (~ec ) ∂ ec = ∂ ∂ ec (vc (ec ) + Wc + log (b ec (1 + rc ) fc )) = 0. Solving the resulting first order condition for the budget b gives: b = fc + (1 + rc )e⇤c + 1 + rc v0c (e⇤c ) (13) where v0c (ec ) = ∂∂ vecc . Comparing equations (13) and (7c) reveals that the first order condition pins down the outside good expenditure, z = 1+rc v0c (e⇤c ) , and hence the trip budget b, as a function of the model parameters and observable data. Using our empirical specification for b from Table (4), taking logs and rearranging gives an optimality condition for the shock to the trip budget, n: ✓ ✓ ⇤ n = h (ec ) = log fc + (1 + rc ) e⇤c + 14 Indicator 1 0 vc (e⇤c ) ◆◆ t b bX b (14) functions in the channel utility and budget constraint equations produce discontinuities in the derivatives when moving from zero to positive expenditure. 18 Optimality condition 2 First, for notational convenience, we let V̄c⇤ represent the consumer’s total expected utility at her optimal expenditure level in channel c: V̄c⇤ where we use z = ⌘ E [V 1+rc v0c (e⇤c ) (~e⇤c )] = vc (e⇤c ) + wc + log (z) = vc (e⇤c ) + wc + log ✓ 1 + rc v0c (e⇤c ) ◆ (~e⇤c ) from optimality condition 1 for the second relation. For V (15) h ⇣ ⌘i = max V ~e⇤j to j=0,..,C hold, the utility of consuming in channel c must exceed the utility of consuming the optimal amount in any other channel and that of allocating the entire budget to the outside good: V (~e⇤c ) =V̄c⇤ + xc V (~e⇤0 ) = log (b) (16a) V (~e⇤c ) =V̄c⇤ + xc V ~e⇤j = V̄ j⇤ + xc , j 6= c (16b) Both V̄c⇤ and b in condition (16a) are expressible in terms of model parameters and observable data. However, the optimal expenditure levels in the unchosen channels, e⇤j , must be computed in order to generate V̄ j⇤ = V̄ j⇤ (e⇤j ). ⇣ ⌘ 1 To do this, we find the e⇤j that solves the budget constraint for channel j: f j + (1 + r j ) e j + v0 (e = b, where j) j the value of b is substituted from equation (13) above. e⇤j may be found numerically using bisection algorithms or other root finding techniques.15 With the e⇤j values in hand, the V̄ j⇤ terms may be computed using equation (15). Likelihood As with category shares, the joint likelihood of the expenditure vector is formed by integrating the joint density of the budget and channel shocks over the region consistent with the optimality conditions: ` (~e =~e⇤c ) = Jn!e⇤c fn (h(e⇤c )) ◆ ˆ• ✓ ⇥ ⇤ ⇤ ⇤ I V (~ec ) = max V ~e j f (~x )d~x • = Jn!e⇤c f ✓ h(e⇤c ) sn ◆ (17a) j=0,1,..,C 0 ˆ• xc =log(b) V̄c⇤ V̄c⇤ V̄ j⇤ +xc B @’ j6=c ˆ V̄ j⇤ l sx x j= • ! 1 C dx j A l ✓ V̄c⇤ sx ◆ dxc (17b) where l (· ) is the extreme value pdf and equation (17b) makes use of our empirical specification distributional assumptions on n and xc . In the Technical Appendix, we show the integration in equation (17b) may be performed 15 The solution for e⇤j , if it exists, is unique. To see this, first rewrite the budget constraint as e j = b fj 1+r j 1+r j v0j (e j ) and note that v0j (e j ) is 1+r ⇣ b fj ⌘ . The latter condition is both positive and decreasing in e j . Therefore, the left hand side of this equation is monotonically increasing in e j while the right hand side is monotonically decreasing: if such functions intersect, they do so at a single point, implying any solution e⇤j must be unique. As e j b f is bounded on [0, 1+rcj ), sufficient conditions for existence are that 0 b fj 1+r j 1+r j v0j (0) and b fj 1+r j ⇣ b f > ⌘ b fj 1+r j v0j j 1+r j trivially satisfied while the first holds when the residual budget after fixed costs and taxes 1+r jj exceeds the outside good expenditure at ⇣ ⌘ 1+r zero channel expenditure v0 (0)j . The condition fails, for example, when counterfactual channel fixed costs exceed the budget. In such j cases, we set V̄ j⇤ = • when evaluating the model likelihood, to reflect the fact the channel is not a viable choice at this budget level. 19 analytically, resulting in the following expression for the likelihood of the expenditure vector:16 0 ⇣ ⇤⌘ 1 !! V̄c ✓ ⇤ ◆ ⇤ C exp log (b) V̄ s h(e ) B C j x c ` (~e =~e⇤c ) = Jn!e⇤c f ⇣ V̄ ⇤ ⌘ A 1 ’ L @ sn sx j=1 ÂCj=1 exp sxj (18) where h (· ), V̄c⇤ (· ) and b(· ) are defined above in (14), (15) and (13) and L (·) is the extreme value cdf. The final term in equation (18) corresponds to the probability of purchase (not allocating all expenditure to the outside good), while the middle term corresponds to the probability of the channel being chosen conditional on purchase. 3.5 Consideration arrival/purchase incidence likelihood 3.5.1 No purchase probability Following the logic of optimality condition 2 above and given the independence of the channel shocks, the probability of no purchase given consideration (at a fixed budget level b) is: ✓ ◆ C log (b) V̄c⇤ (b) ⇤ ⇤ Pr (V0 > Vc , 8c | b) = ’ L sx c=1 (19) To compute this probability for a given budget, one must sequentially solve for the optimal expenditure level in each channel to evaluate V̄c⇤ (as in optimality condition 2 above). Then, the unconditional probability of no purchase is given by integrating over the budget distribution: ` (~e =~e⇤0 ) = ˆ• • ◆ log (b(n)) V̄c⇤ (b(n)) f (n)dn ’L sx c=1 C ✓ (20) This integral may be computed by Gauss-Hermite quadrature. 3.5.2 Purchase incidence likelihood Let L be the number of observed purchase occasions in the period, and let Y be the number of brand consideration events so that Y ⇠ Poisson(µ). Conditional upon Y , L follows a binomial distribution: it is equal to the number of successes (purchases) in Y Bernoulli trials. The probability of success (purchase given consideration), r, follows from the expenditure model: r = 1 ` (~e =~e⇤0 ). To form the likelihood of L, we integrate over the distribution of unobserved consideration events that can rationalize an observation of L purchase occasions, as follows: 0 1 • • Y µY e r µ (r µ)L B Y C L Y Le ` (L) =  Pr[L |Y ]Pr[Y ] =  @ = A r (1 r) Y! L! Y =L Y =L L 16 The Jacobian term is: Jn!e⇤c = ∂n ∂ e⇤c = ∂ h(e⇤c ) ∂ e⇤c ✓ = (1 + rc ) 1 v00c (e⇤c ) (v0c (e⇤c ))2 20 ◆⇣ ⇣ ⌘⌘ 1 fc + (1 + rc ) e⇤c + v0 (e ⇤) c c 1 (21) Thus, the observed number of purchase occasions is also Poisson distributed, with rate parameter r µ. This compound rate parameter provides a linkage between the purchase incidence rate and the structural parameters governing channel and product utility. 3.6 Heterogeneity specification We allow for heterogeneous product and channel preferences by incorporating individual-specific heterogeneity in the category (y) and channel (w) utility parameters. We further allow for heterogeneous shopping budgets (t) and brand consideration rates (d ) via individual-specific parameters. We collect the heterogeneous parameters in the vector qi1 ⌘ {yi , wi , ti , di }. Similarly, we collect the homogeneous parameters in the vector q 2 ⌘ g, s , a, k, b t , b d . We assume a hierarchical model such that: qi1 ⇠ N (Zi °, S) ⇣ ⌘ q 2 ⇠ N q 2 , Aq 21 vec(°) ⇠ N vec ° , S ✏ A° 1 S ⇠ IW (u,W ) Heterogeneous parameters qi1 are thus projected onto demographic variables Zi using the hyperparameters °. In Zi , we include the customer and market demographic variables from Table 1. For the conjugate prior distributions, we assume that the A matrices are 0.0001I, where I is the identity matrix. We also set u = length(qi1 ) and W = uI. 4 Estimation Here we describe our Markov Chain Monte Carlo (MCMC) estimation procedure and briefly discuss identification issues. 4.1 Joint likelihood We begin construction of the full model likelihood function with the probability of observed outcomes for a single purchase occasion. We use l to index the l th purchase by customer i in quarter t. Using equations (18) and (11), the likelihood for a single purchase occasion may be written: Litl = `(~e⇤itl ,~s⇤itl ) = ` (~eitl =~e⇤itl ) ` ~sitl = ~s⇤ itl |~e⇤itl 21 (22) Given the independence across trips and using equation (21), the likelihood of the collection of trips for customer i in period t is: Lit = Pr(Lit ) Lit ’ Litl l=1 !(Lit >0) (23) The likelihood of the Ti period observations for customer i and the total likelihood as a function of all model parameters (Q) are then given by: Ti Li = ’Lit t=1 4.2 I , L(Q) = ’Li (24) i=1 Estimation algorithm Given our hierarchical model specification, we can use the following Gibbs sampler to simulate from the posterior distribution of the model parameters: 1. Draw qi1 for each household using a random walk Metropolis-Hastings step with candidate density: Li (qi1 |q 2 , °, S, Zi , {Lit } , {eitl } , {sitl }) = ’t Lit fq 1 (qi1 |°, S) 2. Draw q 2 using a random walk Metropolis-Hastings step with candidate density: L q 2 | qi1 , °, S, Zi , {Lit } , {eitl } , {sitl } = ’i ’t Lit fq 2 (q 2 ) 3. Draw °| qi1 , S from the posterior (assuming conjugate prior) 4. Draw S| qi1 from the posterior (assuming conjugate prior) 5. Repeat steps 1-4 To speed convergence of the algorithm, we obtain starting values from the maximum likelihood estimates, assuming homogenous parameters. We run the chain for 50,000 draws and discard the first 40,000 as burn-in. Reported estimates of the posterior distribution in the next section are based on the final 10,000 draws, thinned by retaining every 5th draw. 4.2.1 Computational issues The primary computational challenge for estimation relates to repeatedly solving for the optimal expenditure levels in unchosen channels (including the no-purchase probability in equation (20)). This process requires solving the budget constraint numerically (by observation), which requires repeated evaluation of expected product utility, vc (ec ) , which in turn requires repeated computation of optimal category shares (for many draws of the 22 channel/category shocks). To overcome the computational burden, we develop an algorithm that: (a) efficiently simulates optimal category shares at given expenditure levels, and (b) accurately approximates the expected product utility as a function of expenditure using a polynomial interpolant. This approach vastly reduces the number of times expected utility must be simulated when solving the budget constraint equations. For each draw of the parameters, we first obtain a Chebyshev polynomial approximation of vc (ec ) for each channel and (customer/quarter) observation. With the (analytically differentiable) polynomial approximations in hand, the budget constraint equations are efficiently solved using a vectorized bisection algorithm. Our algorithm to compute optimal category shares, which follows Pinjari and Bhat (2010), is explained in detail in Appendix A. The key feature of the algorithm is that it solves for optimal shares via a fixed number of matrix operations, rather than requiring a constrained nonlinear search. For the Chebyshev approximation, we use a 15th order polynomial.17 As the expected utility function is smooth and monotonically increasing in expenditure, these polynomials provide an excellent fit to the simulated values, with absolute deviations averaging less than 0.1%. When simulating vc (ec ) and when performing the the Monte Carlo integration over n in equation (11), we use 1000 draws generated from a Halton sequence. 4.3 Identification In a non-linear model such as the one presented here, functional form and distributional assumptions inevitably contribute to parameter identification. Nevertheless, key patterns of variation in the data link unambiguously to certain parameters. We summarize these relationships in Table 5. Product utility Channel utility Parameter Key identifying variation Normalizations y category incidence | expenditure y1 = 0 g category share | category incidence snck category incidence/share price response se normalized se = 0.2 wc channel share | purchase w1 = 0 sx channel share expenditure response Channel cost a,k channel share distance response Budget t, b t , sn empirical expenditure distribution, X t Consideration arrival d,bd purchase incidence, X d z purchase incidence distance response f1 = 7.95 Additional controls CSA time trends CSA time trends quarter fixed effects Table 5: Identification summary In the table, category incidence refers to the frequency of purchases involving the category, i.e., the discrete 17 The lower bound of the interpolation domain is zero expenditure. For the upper bound, we use the largest value at which the polynomial will be evaluated, which is the largest quadrature node evaluated in equation (20). We use the extended Chebyshev grid, where interpolation limits are also quadrature nodes. 23 portion of the discrete/continuous share allocation. Given expenditure levels, individual-specific category incidence rates can identify yi , even when categories are purchased discretely (single category baskets) and no price variation is present. Share variation (multiple category baskets) reflects cross-category quantity trade-offs that identify the satiation parameters g. Adding price variation allows the variance of category/channel shocks to be identified.18 Since at most CxK different variance terms are identified and the dual-shock formulation contains CxK+1 free parameters, we must impose a normalization restriction. We choose to estimate the shck and normalize the extreme value variance se to a sufficiently small value.19 We found that normalizing se = 0.20 was sufficiently small in our setting. The level of product utility must also be normalized, which we impose by restricting y1 = 0. In the expenditure choice model, the level of (additive) channel utility is normalized by setting online utility (w1 ) to zero. Average retail channel incidence rates given purchase then identify the w2i . Expenditure variation in the expected utility function vc (ec ) identifies the channel shock variance sx . The shopping budget and channel fixed costs also require a normalization, as the levels are not separately identified. Our approach is to estimate the budget level (t) and normalize the level of fixed costs. Our setting provides a natural normalization for online fixed costs, as the firm charges $7.95 in shipping fees per order, and thus we set f1 = 7.95. For retail fixed costs, we restrict retail fixed costs to be zero at zero distance. The retail fixed cost parameters (a, k) are then identified by changes in channel choice with retail store distance, after controlling for individual-specific retail utility intercepts and common unobserved budget trends (via CSA time trends) – principally, within-customer distance variation due to store entry. With the normalizations on fixed costs, individual specific mean expenditure levels identify ti . We impose a common variance assumption on shopping budgets, which is identified by aggregate within-subject expenditure variation. Finally, the consideration arrival rate intercepts di are identified by customer-specific average purchase frequencies, while the b d are identified by common seasonal changes in purchase rates. The effect of distance on consideration (z ) is identified by changes in purchase frequency with retail store distance, after controlling for individual-specific purchase rates and seasonal effects. 18 To 1 2 +s 2 snck e see this, note that in equation (9b), the price term has no coefficient. When shock variances are freely estimated, p effectively becomes the price coefficient for category k in channel c, since utility (g function) is scaled by the compound shock variance when entering the likelihood. 19 As a practical matter, this translates to setting s small enough to get significant estimates for the s e nck . See Bhat (2008) for an extended discussion of alternative normalizing procedures. 24 5 5.1 Results Model estimates heterogeneity mean std dev Product utility parameters Category 1 Category 2 Category 3 Category 4 Category 5 Category 6 satiation g1 N 6.2702 0.0201 online shock variance sn11 N 0.3239 0.0035 retail shock variance sn21 N 0.4268 0.0029 satiation g2 N 5.5732 0.0905 baseline utility intercept y2 Y -0.4446 0.3718 online shock variance sn12 N 0.6303 0.0163 retail shock variance sn22 N 0.4481 0.0011 satiation g3 N 8.1092 0.0907 baseline utility intercept y3 Y -0.2298 0.3758 online shock variance sn13 N 0.7758 0.0072 retail shock variance sn23 N 0.5869 0.0144 satiation g4 N 3.0852 0.0116 baseline utility intercept y4 Y -0.7035 0.4251 online shock variance sn14 N 0.8731 0.0019 retail shock variance sn24 N 0.6708 0.0065 satiation g5 N 2.0362 0.0204 baseline utility intercept y5 Y -1.0222 0.4524 online shock variance sn15 N 1.0979 0.0018 retail shock variance sn25 N 0.9644 0.0131 satiation g6 N 12.0165 0.0535 baseline utility intercept y6 Y -0.6312 0.5803 online shock variance sn16 N 1.2149 0.0166 retail shock variance sn26 N 0.1359 0.0015 Channel utility parameters Retail intercept w2 Y 0.7743 1.1541 Common channel shock variance sx N 0.8409 0.0242 distance scale a N 2.0816 0.1801 distance power k N 0.5873 0.0170 intercept t Y 4.9662 0.4294 shock variance sh N 0.4885 0.0026 intercept d Y -1.0963 0.8227 I(1st quarter) b1d b2d b3d N 0.0512 0.0008 N 0.2641 0.0017 N 0.3596 0.0100 N -0.0237 0.0024 Channel cost parameters Retail Expenditure parameters Budget* Consideration arrival parameters I(3rd quarter) I(4th quarter) distance power z * CSA time trends included in budget specification (via b t X t ) not reported for brevity Table 6: Model estimates 25 We report posterior means and standard deviations of our parameter estimates in Table 6. For heterogeneous parameters, the reported means average across customers in addition to posterior draws. Among the homogeneous product utility parameters, we note that satiation rates (g) vary widely, with category 6 exhibiting the least satiation in quantity (largest g) and category 5 exhibiting the most. Also, for 5 categories (all but category 1), the online shock variance sh1k exceeds the retail shock variance sh2k . This result is consistent with the notion that the online channel generally provides more product information and thus has larger variance shocks; category 1 is the one which we would expect the most need to try the products, which explains why the variance is higher in the retail channel for just this category. Category 6 has a much higher variance in the online channel, which is likely due to the fact that this is the one category with a substantial discrepency in the number of styles and colors available in the different channels, with many more SKUs available online. (a) Category 2 utility (y2 ) (c) Category 4 utility (y4 ) (e) Category 6 utility (y6 ) (g) Budget (t) (b) Category 3 utility (y3 ) (d) Category 5 utility (y5 ) (f) Retail channel utility (w2 ) (h) Consideration arrival (d ) Figure 7: Heterogeneous parameter frequency distributions The sample means of the heterogeneous category intercepts (yk ) are all negative, implying category 1 (normalized to zero) is the most preferred category, followed by category 3, etc. We summarize the distribution of the heterogeneous category intercepts (y) and other heterogeneous parameters (w2 , t, d ) in Figure 7. Figure 7 makes clear that valuations are more dispersed for categories 4,5, and 6 than for categories 2 and 3. There is also evidence of some bimodality in category preferences, that may be correlated with the obvious bimodality in retail channel utility intercept (w2 ). To investigate this issue, we report the correlation pattern among the heterogeneous parameters in Table 7. Indeed, we see that customers with a high preference for retail generally prefer categories 2-6 less relative to category 1 (corr(yk , w2 ) < 0). We also see that customers who consider the brand (hence, shop) frequently tend to have smaller budgets and spend less (corr(d , t) = 26 .34). Higher shopping frequency is also associated with higher retail preference (corr(d , w2 ) = .11). In Table 12 of Appendix B we report report estimates of the hyperparameters °, which capture the projection of heterogeneous parameters onto demographic variables. These results suggest older age also correlates with higher retail preference and relative preference for category 1. Demographic factors other than age have limited significance for the product and channel utility intercepts (yk ). In addition to age, shopping budgets (t) have significant positive associations with education, white race and median household income, and a negative association with rurality. Consideration rates (d ), which drive purchase frequency, are generally not well explained by demographic factors. category 2 utility y2 category 3 utility y3 category 4 utility y4 category 5 utility y5 y2 y3 y4 y5 y6 w2 t d 1 -0.05 0.20 0.10 0.21 -0.14 -0.06 0.07 -0.05 1 0.26 0.04 0.16 0.01 -0.01 -0.02 0.20 0.26 1 -0.07 0.15 -0.22 -0.09 0.12 0.10 0.04 -0.07 1 0.31 -0.56 -0.16 -0.03 -0.03 0.21 0.16 0.15 0.31 1 -0.30 0.07 -0.14 0.01 -0.22 -0.56 -0.30 1 0.01 0.11 budget t -0.06 -0.01 -0.09 -0.16 0.07 0.01 1 -0.34 consideration arrival d 0.07 -0.02 0.12 -0.03 -0.03 0.11 -0.34 1 category 6 utility y6 retail channel utility w2 Table 7: Correlations among heterogeneous parameters We now turn to distance-related effects. The consideration arrival parameter z captures the effect of retail distance on brand consideration. The negative (highly significant) coefficient for z implies that as distance decreases, consideration arrival rates (and hence purchase frequency) will go up. At the same time, the magnitude of retail fixed costs decreases in accordance with the estimated cost function f2 = 2.08d 0.59 , which we plot in Figure 8. The retail cost function is concave such that the marginal cost of traveling to the store declines with distance.20 On a purely fixed cost basis, consumers would be indifferent between the channels at a distance of approximately 11 miles from the store. However, fixed costs are only one consideration for consumers: individual category/channel preferences and product prices also factor into channel choice trade-offs. In Sections 5.2 and 6.1, we simulate from the model to derive different measures of retail distance effects on quarterly channel expenditure patterns. 20 For example, transportation costs may decline in distance due to greater use of highways (and hence better fuel milage) when traveling longer distances. Such effects are magnified to the extent that estimated fixed costs capture the opportunity cost of time spent traveling. 27 Figure 8: Channel fixed costs as function of retail distance 5.2 Price and retail distance elasticities We derive price and distance elasticity measures from the model estimates by computing (changes in) expected channel expenditures and category shares. When computing expectations, we aggregate over purchase occasions in the quarter. For example, expected quarterly channel expenditures (ec ) are calculated by customer/quarter Y observation using: ec = E  e⇤c,l = µE [e⇤c ]. For unit demand price elasticities, we work with category/channel hl=1 i s⇤ e⇤ quantity indices, qck = µE pk ckc , and compute ∂∂ qpckck qpckck by observation. Sample average demand elasticities are reported in the first two columns of Table 8. With the exception of category 1, unit demand is more responsive to price in the retail channel, consistent with the pattern of smaller shock variances (shck ) in retail. Cross-channel differences are moderate, with the exception of category 6, which is approximately twice as price responsive in retail. In the last two columns of Table 8, we report total (online + retail) expenditure price elasticities. The total expenditures effectively capture the net effect of price changes on substitution with the outside good. That the last two columns are inelastic as compared to the first two reflects that most price-induced switching occurs among the inside categories. Using total expenditures as a baseline, retail price changes in category 6 have limited effect, despite the large unit demand elasticity. 28 demand (quantity) total expenditures category online retail online retail 1 -2.84 -2.83 -0.77 -0.65 2 -3.10 -3.58 -0.30 -0.22 3 -2.92 -3.25 -0.54 -0.31 4 -2.59 -2.50 -0.49 -0.29 5 -2.21 -2.02 -0.38 -0.30 6 -2.81 -7.02 -0.56 -0.01 Table 8: Elasticities of category demand (quantity) and total expenditures with respect to price Of particular interest are retail distance effects on total expenditures and expenditures by channel. In Table 9 we report expected quarterly channel expenditure (ec ) elasticities with respect to retail store distance. For the empirical distribution of customer/store locations, the total expenditure elasticity implies a 10% decrease in retail distance will on average increase total expenditures by 0.7% among existing customers. These revenue gains result from increased purchase frequency (which increases by 0.3%) and partial recapture of consumer transportation cost savings. Channel switching patterns imply retail revenues increase by 2.1% while online revenues decrease by 0.5%. outcome elasticity total expenditures -0.07 online expenditures 0.05 retail expenditures -0.21 purchase frequency -0.03 Table 9: Elasticities with respect to retail store distance 5.3 Model fit To assess the fit of the model, we simulate purchase occasions keeping panel variables the same as in our estimation sample. In Appendix C, we provide distribution plots for: a) the number of simulated purchase occasions (corresponding to Figure 1 in the sample data), b) kernel density plots of purchase expenditure levels by channel (corresponding to Figure 4) and c) category average shares by channel (corresponding to Figure 5). In addition, we compare channel choice frequencies in the simulation and data. The distribution of simulated outcomes closely replicates the distribution of observed outcomes. 6 Applications In this section, we demonstrate the application of our results and further explore their implications. 29 6.1 Channel revenue retail distance dependence Our distance elasticities reported in Section 5.2 measure the average expenditure response to changes in retail distance, evaluated at the empirical distribution of customer/store locations. When evaluating revenue effects, other distance distributions may be of interest. In particular, fixing retail distance to a specific value for all customers in the sample provides a measure of how the firm’s customer base would respond to a uniform retail distance treatment effect (in essence, reflecting a representative consumer for the heterogeneous sample). We simulate a series of such datasets where distance treatments are varied across datasets. To derive a summary view of the experiments, we focus on expected quarterly revenues by channel as a function of retail store distance. We average over individuals in the sample to obtain per capita per quarter revenue estimates. In the left hand panel of Figure 9, we plot these expected quarterly revenues by channel and total quarterly expected revenue versus retail distance. In the right hand panel of Figure 9, we plot point elasticities corresponding to the curves in the left hand panel. In the left panel of Figure 9, total revenues increase monotonically as distance decreases, from approximately $47 per customer per quarter at 100 miles retail distance to $59 at one mile, figures that capture the net economic benefit of shifting retail availability. It is clear that gains in total revenue track increases in retail revenue, which are only partially offset by more modest declines in online revenue. Preferences for the online channel are sufficiently high among some customers that online revenues remain appreciable even as retail distance approaches zero miles. Retail and online channels contribute equally to firm revenues at a distance of approximately 25 miles from the store. Given our previous finding that costs equate at approximately 11 miles, the fact that online revenues are expected to be about 20% at this distance demonstrates the added utility many consumers get from shopping in the retail channel. Outside this radius, the curvature of the revenue functions is moderate and well approximated by a linear function. Linearizing the curves in this range and comparing slopes reveals that as distance decreases, retail revenues increase at approximately four times the rate that online revenues decline. We emphasize that the preceding results reflect demand among existing customers. In the next application, we expand our analysis to include new customer acquisition, including its dependence on retail distance. 30 (a) Expected revenue (b) Expected revenue elasticities Figure 9: Expected quarterly revenue by channel with equidistant customers, as a function of retail store distance 6.2 Retail location selection The choice of where to locate retail stores is a difficult problem that many firms face. In this section, we demonstrate how our model may inform such decisions. We abstract from cost concerns by assuming the firm’s objective is to maximize its short-run (5 year/20 quarter) expected revenue. Our method is to generate a comprehensive set of potential entry locations and to calculate the expected incremental revenue generated from locating a store at each of these locations. To maintain high spatial resolution yet keep the exercise tractable, we discretize the set of potential entry locations as Census 2010 block group centroids in the continental United States, and thus evaluate approximately 216,000 candidate locations. For each candidate location, we compute a revenue index (defined as G) using a two step process. In the first step, we determine the set of markets (other block groups) that would be impacted by entry at the candidate location. This set is generated by identifying block groups for which the distance to the nearest retail outlet would be reduced as a result of entry at the candidate location. In the second step, for each market in the impacted set, we forecast demand under two conditions: first assuming no entry were to occur, and then assuming it does occur. Taking the difference in these forecasts generates a prediction of the incremental contribution of entry. Our demand forecasts hold prices and existing store locations fixed at their final period values. These forecasts incorporate revenue contributions from both new and existing customers. We extend our framework to include new customer acquisition by assuming that: i) an exogenous market-level Poisson process drives new customer arrivals, and ii) the preferences of new customers are drawn from the (market specific) heterogeneity distribution 31 described in Section 3.6. In Appendix D, we describe how we build upon the results of Section 2.2.4 and incorporate demographic variables into our prediction of new customer arrivals. Let Imt be the set of existing and simulated new customers in market m and simulated future period t, and let eitc be the expected expenditure for 2 20 customer i in period t and channel c. Our revenue index is then given by: Gm =    eitc . c=1t=1i2Imt Experiment results Since neighboring block groups within a metro region often yield similar revenue indices, it is difficult to discern the relative desirability of broad regions (e.g., cities) using an ungrouped ranking of potential entry locations. We thus summarize the results of the experiment at the MSA (Metropolitan Statistical Area) level. To do this, we assign a MSA-level index using the maximal revenue index among the block groups in that MSA. These indices are reported in Table 10. Rank MSA Revenue index ($MM) 1 Minneapolis-St. Paul, MN-WI 2.07 2 San Francisco-Oakland-San Jose, CA (C) 1.73 3 St. Louis, MO-IL 1.42 4 Miami-Fort Lauderdale, FL (C) 1.29 5 Columbus, OH 1.20 6 Orlando, FL 1.20 7 Cincinnati-Hamilton, OH-KY-IN (C) 1.02 8 West Palm Beach-Boca Raton, FL 1.02 9 Washington-Baltimore, DC-MD-VA-WV (C) 1.00 10 New York, Northern New Jersey, Long Island, NY-NJ-CT-PA (C) 0.97 Table 10: Top 10 model-predicted entry locations (MSA level) Comparing the recommended entry MSAs in Table 10 to the brand’s store locations operative during our study reveals that three of these MSAs have existing retail stores. The experiment thus suggests that desirable locations may be found in markets with an existing brand presence as well as in “virgin territory” (although greater weight is placed on the latter). Lending some face validity to our experiment, we note that since the end of the observation window, the brand has opened store locations in three of the seven “new territory” locations identified in Table 10. To demonstrate that the counterfactual results can help inform both local and widegeography choices, we generate revenue heat maps for the top ranked MSA, Minneapolis-St. Paul, which are provided in Figure 10. The map identifies the most desirable location by the darkest shade of red, as well as close substitutes in the event entry is not possible in that specific location (due, for example, to zoning restrictions or lack of available rental space). 32 Revenue index: Revenue index: (2.08,2.10] (1.96,2.08] (1.83,1.96] (1.71,1.83] [1.58,1.71] (1.58,2.09] (1.33,1.58] (1.16,1.33] (0.99,1.16] [0.52,0.99] (a) Entire MSA (b) Inset detail Figure 10: Revenue generation index by Census block group, Minneapolis-St. Paul MSA 6.3 Eliminating online shipping costs Another important issue for multi-channel firms is how to design channel-specific pricing policies. Our model can also be used to assess the demand (and with cost information, profit) implications of various pricing policies. Broadly, the firm can influence variable shopping costs by setting channel-specific category prices, or the firm can influence fixed shopping costs by varying online shipping fees (the case of shifting retail fixed costs was considered in the previous two applications). In this application, we consider the impact of eliminating online shipping fees, i.e., we compare simulated outcomes under observed shipping fees ( f1 = 7.95) and a counterfactual scenario where the fee is eliminated ( f1 = 0). In the next section, we consider the case of systematically varying category prices across channels. We summarize the experiment in Table 11 below. For the baseline ( f1 = 7.95) and no shipping ( f1 = 0) scenarios, we report quarterly average revenues, number of purchases, and average revenue per purchase for the online channel, retail channel, and both channels combined (total). We then report the change in these measures going from the baseline to no shipping scenarios, in absolute (D ) and percentage (D%) terms. Eliminating shipping fees increases total revenue by 3.1%, driven by a 1% increase in overall purchase frequency and a 2.1% increase in expenditures per purchase. While online expenditures per purchase increase (1.5%) when shipping fees are eliminated, there is limited influence on retail expenditures per purchase (0.3% increase). The percentage change in channel purchase incidence rates is more pronounced due to switching from retail to online: online purchases increase by 11.3% while retail purchases decline by 6.4%. Conclusively determining whether this policy is profitable for the firm requires knowledge of shipping costs 33 and category margins. However, under fairly conservative assumptions, some back of the envelope calculations suggest it will not be. If we assume that shipping costs are the same as the shipping fee the firm charges to consumers (complete pass through), then we can evaluate the expected cost of implementing the policy and compare it to the expected incremental revenue the policy brings to the firm. If costs exceed revenues, the policy is not predicted to be profitable. If revenues exceed costs, a lower bound on average margins that makes the policy profitable could be compared to actual average margins if such information is available. Here, when the firm eliminates online shipping fees, it must pay shipping costs on all online purchases, meaning the expected cost of the policy is 0.211*7.95=$1.68 per customer per quarter. At the same time, the net revenue gain is $1.66 per customer per quarter, implying an expected net loss of $0.02 per customer per quarter. While fully eliminating shipping fees appears not to be profitable, there may be scope for reducing shipping fees without incurring losses. Moreover, such reductions could potentially have benefits for customer acquisition that are not accounted for here. Total (Online + Retail) Online Retail Quarterly Quarterly Rev per Quarterly Quarterly Rev per Quarterly Quarterly Rev per revenue purchases purchase revenue purchases purchase revenue purchases purchase baseline 53.12 0.455 116.71 25.63 0.190 134.90 27.48 0.265 103.67 no shipping 54.77 0.460 119.16 28.96 0.211 136.93 25.82 0.248 104.01 D 1.66 0.005 2.45 3.32 0.021 2.04 -1.67 -0.017 0.34 D% 3.1% 1.0% 2.1% 13.0% 11.3% 1.5% -6.1% -6.4% 0.3% Table 11: No online shipping costs counterfactual (per-customer averages) 6.4 Channel-based category pricing In addition to shifting customer fixed transaction costs as in the previous examples, firms can set channel-specific category prices (channel based price discrimination) to increase revenues and profits. In principle, prices can be optimized by freely varying all category prices by channel. Fully optimizing prices in this manner is a formidable computational task, as it would involve determining the best price for each product category in the online channel and potentially for each retail location. Rather than fully solve this high-dimension problem, our purpose in this section is to demonstrate how the model estimates can be used to solve a simplified price discrimination problem, where the firm’s online category prices are taken as given and an optimal (uniform) scaling factor for comparable retail categories is sought. Optimizing prices requires a profit objective function, which in turn requires that we make assumptions about category costs. For the purpose of this exercise, we assume that, in the data, the firm sets product category prices 34 using the “keystone” markup rule of 100% over wholesale price.21 This assumption allows us to back into category costs by observation. Formally, letting wck represent the marginal cost of category k in channel c for a given customer/quarter observation, and using the superscript 0 to denote the baseline (original data) scenario: wck = 12 p0ck . Since we assume the firm wishes to maintain its current online pricing policy and seeks a simple markup or markdown rule to generate retail prices from online prices, our experiment will attempt to maximize profits by varying the ratio of retail to online prices (denoted by l ), holding online prices fixed. To conduct this search, we generate a series of datasets where retail prices are altered from those observed in the data, ranging from 0.65 to 1.5 times online prices. Formally, we generate datasets indexed by price ratio l jp such that (using a 1 superscript to indicate counterfactual prices): p11k = p01k , p12k = l jp p01k . Next, for each dataset we compute 2 6 expected profits by observation as: p =   qck p1ck c=1k=1 p1ck wck . Optimal prices are selected by comparing per-customer average (or equivalently, total) quarterly profits across the l jp . The procedure outlined above, in which factors other than price are held fixed, uses the empirical distribution of retail store distance. Also of interest here is how optimal prices change with retail distance. To explore this issue, we repeat the search over l jp under two alternate retail distance distributions. In particular, we simulate profits assuming customers are located at a distance 0.5 times those observed in the data, and at 2.0 times those observed in the data. We summarize the results of our experiment graphically in Figure 11 below, where we plot expected percustomer quarterly profits as a function of the retail/online price ratio l . The three curves in Figure 11 correspond to the three distance distributions (0.5x, 1.0x, 2.0x empirical distribution). For each distance distribution, the optimal price ratio is indicated by an ’x’ on the corresponding curve in Figure 11. Optimal retail prices are approximately 105%, 110% and 115% of online prices for 0.5x, 1.0x and 2.0x distances, respectively – i.e., lower markups correspond to closer average distances. As may be seen, while total expected profits are higher, the firm’s ability to use channels as a price discrimination mechanism is reduced when customers are located closer to retail store (channel price ratios approach 1). The optimal channel price ratio for the empirical distance distribution is considerably larger than the average retail/online price ratio observed in the data (approximately 83%), which, if our cost assumptions are correct, would reflect sub-optimal pricing by the firm. 21 An alternative approach might assume we observe optimal pricing by the firm, which would allow us to back out marginal costs from our demand estimates and a supply model. Apart from the fact that our discussions with the firm suggest that a markup rule is a closer approximation of their price setting behavior, assuming optimality here would leave little scope for the counterfactual of interest. The firm of course could employ more accurate estimates of category-specific costs and thereby obtain more informative predictions of the counterfactual pricing policy. 35 Figure 11: Pricing counterfactual 7 Conclusion Our intended contributions from this research are threefold. Substantively, the paper adds to the literature that examines the demand implications of operating a mixture of online and retail channels. In addition to quantifying retail store distance effects on channel choice, our structural model estimates and descriptive regressions provide evidence of a promotional effect of retail proximity – i.e., retail presence raising brand awareness, leading to increased purchase frequency. We document evidence of similar, but larger, distance awareness effects with respect to new customer acquisition. Among existing customers, our structural estimates imply a 10% reduction in retail store distance increases total quarterly revenues by 0.7%, a result of increased brand consideration and partial recapture of consumer transportation cost savings. Channel switching patterns imply retail revenues increase by 2.1% while online revenues decrease by 0.5%. From a methodological standpoint, we develop an integrated, utility-based model that jointly predicts purchase incidence and expenditure patterns in multiple channels and product categories. In the context of our application, this formulation allows us to draw inference on the mechanisms by which channels contribute to observed patterns of demand. Our model demonstrates how to endogenize channel expenditure levels through a two-stage budgeting process, wherein first stage channel expenditure decisions maximize the expected (second stage, category share optimal) utility subject to a stochastic shopping budget constraint, which includes both fixed and variable channel transaction costs. Finally, we develop a computationally efficient algorithm to jointly estimate the multi-stage model. While our application is to apparel categories, our model and estimation method- 36 ology can be applied in any multi-channel context where the analyst has access to historical customer transaction data. We also contribute to managerial practice by providing a set of tools to address challenging problems such as where to locate new retail stores and how to optimize product prices in a multi-channel environment. Our retail entry experiment demonstrates the computational feasibility of exhaustively exploring potential entry locations at high spatial resolution, a process that could be applied iteratively to obtain even more precise predictions of optimal locations. There are several potential avenues to extend the current work. One approach might compare our model performance to those that impose alternative assumptions about the consumer’s shopping decision sequence. Researchers with access to firm inventory data could investigate cross-channel assortment depth and stock-out effects. Those with panel data containing high purchase frequencies or long time dimensions could tackle dynamic considerations such as state dependence in channel choices or learning about product categories over time. Relatedly, the role of channel interactions in the consumer’s search process could be explored. While we believe that our focus on apparel categories to some extent limits the scope for dynamics in our analysis (e.g., the ability to learn is restricted by the fact that products within the observed categories are perpetually changing), accounting for these aspects would be an interesting and challenging direction for future work. References Ansari, A., C. Mela, and S. Neslin (2008). Customer channel migration. Journal of Marketing Research 45(1), 60–76. Avery, J., T. J. Steenburgh, J. Deighton, and M. Caravella (2012). Adding bricks to clicks: Predicting the patterns of cross-channel elasticities over time. Journal of Marketing 76(3), 96–111. Bell, D. R., S. Gallino, and A. Moreno (2013). Inventory showrooms and customer migration in omni-channel retail: The effect of product information. Technical report, University of Pennsylvania. Bell, D. R., T.-H. Ho, and C. S. Tang (1998). Determining where to shop: Fixed and variable costs of shopping. Journal of Marketing Research 35(3), 352–369. Bell, D. R. and J. M. Lattin (1998). Shopping behavior and consumer preference for store price format: Why large basket shoppers prefer EDLP. Marketing Science 17(1), 66–88. Bhat, C. R. (2008). The multiple discrete-continuous extreme value (MDCEV) model: Role of utility function parameters, identification considerations, and model extensions. Transportation Research Part B: Methodological 42(3), 274–303. Biyalogorsky, E. and P. Naik (2003). Clicks and mortar: The effect of on-line activities on off-line sales. Marketing Letters 14(1), 21–32. Chevalier, J. A., A. K. Kashyap, and P. E. Rossi (2003). Why don’t prices rise during periods of peak demand? Evidence from scanner data. American Economic Review 93(1), 15–37. Chintagunta, P. K. (1993). Investigating purchase incidence, brand choice and purchase quantity decisions of households. Marketing Science 12(2), 184–208. 37 Deleersnyder, B., I. Geyskens, K. Gielens, and M. Dekimpe (2002). How cannibalistic is the internet channel? A study of the newspaper industry in the United Kingdom and the Netherlands. International Journal of Research in Marketing 19(4), 337–348. Dubé, J.-P. (2004). Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Science 23(1), 66–81. Dzyabura, D., S. Jagabathula, and E. Muller (2015). Using online preference measurement to infer offline purchase behavior. Technical report, New York University. Forman, C., A. Ghose, and A. Goldfarb (2009). Competition between local and electronic markets: How the benefit of buying online depends on where you live. Management Science 55(1), 47–57. Gentzkow, M. (2007). Valuing new goods in a model with complementarity: Online newspapers. American Economic Review 97(3), 713–744. Geyskens, I., K. Gielens, and M. Dekimpe (2002). The market valuation of internet channel additions. Journal of Marketing 66(2), 102–119. Gordon, B. R., A. Goldfarb, and Y. Li (2013). Does price elasticity vary with economic growth? A cross-category analysis. Journal of Marketing Research 50(1), 4–23. Hanemann, W. M. (1984). Discrete/continuous models of consumer demand. Econometrica 54(3), 541–561. Kim, J., G. M. Allenby, and P. E. Rossi (2002). Modeling consumer demand for variety. Marketing Science 21(3), 229–250. Neslin, S. A., D. Grewal, R. Leghorn, V. Shankar, M. L. Teerling, J. S. Thomas, and P. C. Verhoef (2006). Challenges and opportunities in multichannel customer management. Journal of Service Research 9(2), 95–112. Pauwels, K. and S. Neslin (2011). Building with bricks and mortar: The revenue impact of opening physical stores in a multichannel environment. Technical report, Marketing Science Institute. Pinjari, A. R. and C. Bhat (2010). An efficient forecasting procedure for Kuhn-Tucker consumer demand model systems. Technical report, University of Texas. Soysal, G. and A. Zentner (2014). Measuring e-commerce concentration effects in the apparel industry. Technical report, University of Texas at Dallas. Van Nierop, J., P. Leeflang, M. Teerling, and K. Huizingh (2011). The impact of the introduction and use of an informational website on offline customer buying behavior. International Journal of Research in Marketing 28(2), 155–165. Wales, T. J. and A. D. Woodland (1983). Estimation of consumer demand systems with binding non-negativity constraints. Journal of Econometrics 21(3), 263–285. Wang, K. and A. Goldfarb (2014). Can offline stores drive online sales? Technical report, University of Tornoto. 38 Appendices A Computation of optimal shares and expected utilities We first demonstrate the calculation of optimal shares when the set of chosen categories is known and then describe the procedure to determine the set of chosen categories. For a given draw of the category shocks (e, h), without loss of generality assume the first M categories are chosen. For chosen categories, the Kuhn-Tucker condition ∂L ∂ sk = 0 implies e⇤c ◆ ✓Yck ec⇤ s⇤ pck p gk +1 = l , which may be solved for the optimal share: s⇤k = gk Yck l ck k we use the budget constraint equation to eliminate the Lagrange multiplier l : 1 = M Solving for l gives l = above gives: e⇤c  gk Yck k=1 M e⇤c +  gk pck M  s⇤k k=1 M =  k=1 ⇣ gk pck e⇤c . gk Yck l Next, gk pck e⇤c ⌘ . . Then, substituting this expression for l back into the optimal share equation k=1 ✓ ◆ M ⇤ Yck ec +  gk pck gk B k=1 ⇤ B sk = ⇤ @ M ec  gk Yck 0 1 C pck C A k=1 To find the optimal set of chosen categories, we use the “enumerative” algorithm of Pinjari and Bhat (2010). ⇣ ⌘ The method is based on the insight that the price normalized baseline utilities Ypckck of chosen (non-chosen) goods are greater than or equal to (less than) the Lagrange multiplier, which is an implicit function of the set of chosen categories. The algorithm to predict quantity choices thus proceeds as follows: 1. Take a draw of the the product shocks (e, h) Yck pck 2. Compute the price normalized baseline utilities for all K categories: °k = 3. Sort the categories from highest to lowest °; denote quantities sorted in this order with a tilde, e.g. °̃ m 4. Iteratively compute the Lagrange multiplier, assuming the first m categories are chosen: l m = e⇤c  g̃ j Ỹc j j=1 m e⇤c +  g̃ j p̃c j j=1 K 5. Determine the number of chosen categories by the relation: M =  I °̃m m=1 6. Compute the optimal shares for the chosen categories as s̃⇤m = s̃⇤m = 0 for m > M g̃m e⇤c 0 @ ✓ ◆ M Ỹck e⇤c +  g̃m p̃cm M k=1  g̃m Ỹcm m=1 7. Invert the sort order to restore the original category ordering, yielding s⇤k 39 lm 1 p̃cm A for m M and This procedure may be vectorized so that solutions may be sought for simultaneously for all the draws (across all observations), yielding a highly efficient polynomial time algorithm (proportional to N · D · K 2 operations, where N, D and K are respectively the number of observations, draws and categories). Once optimal shares are in hand, it is a simple matter to evaluate the utility by substituting optimal shares into equation (7d). B Hyperparameter results Hyper-parameters (°) Intercept Age Rural White College Male Commute HH Income Product utility parameters Category 2 Category 3 Category 4 utility intercept utility intercept utility intercept y2 y3 y4 Category 5 utility intercept y5 Category 6 utility intercept y6 -0.2174 -0.0050 0.0113 0.0247 0.1336 0.0154 -0.0015 -0.0008 (0.2219) (0.0026) (0.1255) (0.1843) (0.1879) (0.1106) (0.0044) (0.0008) -0.0451 -0.0038 -0.0131 -0.0455 0.1158 0.0281 -0.0008 -0.0004 (0.2304) (0.0027) (0.1184) (0.1936) (0.1977) (0.1172) (0.0045) (0.0008) -0.4170 -0.0058 0.0139 0.0161 0.0385 0.0361 -0.0010 -0.0006 (0.2078) (0.0024) (0.0992) (0.1667) (0.1700) (0.1075) (0.0039) (0.0008) -0.5142 -0.0036 -0.0331 -0.1260 -0.0630 0.0213 -0.0033 -0.0014 (0.2055) (0.0024) (0.1031) (0.1616) (0.1735) (0.1029) (0.0040) (0.0008) 0.0479 -0.0064 -0.0674 -0.2565 0.0272 0.0707 -0.0035 -0.0013 (0.1762) (0.0021) (0.0933) (0.1511) (0.1766) (0.0964) (0.0030) (0.0007) 4.4921 0.0077 0.0640 0.1404 -0.0102 0.0163 -0.0012 0.0008 (0.1833) (0.0020) (0.0918) (0.1492) (0.1575) (0.0907) (0.0035) (0.0007) 0.0651 0.0049 -0.2680 0.2483 0.7774 -0.0986 -0.0066 0.0011 (0.1419) (0.0019) (0.0873) (0.1163) (0.1217) (0.0698) (0.0024) (0.0005) -0.8729 -0.0008 -0.0229 -0.0256 -0.1494 -0.0993 -0.0030 0.0001 (0.1038) (0.0013) (0.0520) (0.0848) (0.0854) (0.0523) (0.0021) (0.0004) Channel utility parameters Retail utility intercept w2 Budget intercept t Consideration arrival parameters intercept d Table 12: Model hyperparameter estimates 40 C Model fit exhibits .8 Density .6 .4 .2 0 0 2 4 6 8 10+ trips per quarter Figure 12: Frequency of simulated purchase occasions per quarter channel online retail data 43.3% 56.7% D -1.4% +1.4% simulation 41.9% 58.1% Table 13: Empirical and simulated channel choice frequencies .008 Online Retail share of basket expenditure Online Retail density .006 .004 .002 .3 .2 .1 0 0 200 400 600 800 0 1000 1 trip expenditure 3 4 5 6 category Figure 13: Simulated expenditure distributions by channel D 2 Figure 14: Simulated average category shares by channel Demographic effects on new customer acquisition For the purpose of evaluating our entry counterfactual, it is useful to ascertain the influence of market demographics on new customer acquisition because the counterfactual involves simulating acquisition rates for markets not observed in the data. To obtain estimates, we extend the results of the Poisson fixed effects regression in Section 2.2.4. To obtain these estimates without weakening the controls identifying the distance effect (a), we perform a separate Poisson regression without market fixed effects, but where we constrain a to be as estimated in Section 2.2.4. To that specification, we add demographic information obtained from Census 2010 block group records 41 (zm ) and period fixed effects (µt ): Nmt ⇠ Poisson(rmt ), log (rmt ) = a log(dmt ) + b zm + µt + log(popm Emt ) (25) The b parameter estimates are reported in Table 14 below – all effects are significant at the 1% level. The results suggest strong positive correlation with market average age, median income, fraction of population white and fraction of population with college degrees. There is a negative correlation associated with rural markets. regressor age income rural white college quarter fixed effects estimate 0.0173 0.0015 -0.1930 0.7892 3.4707 Y Table 14: Estimates of demographic effects on new customer acquisition 42