Entry and Competition in the Retail and Service Industries

A Dissertation Presented to the Faculty of the Graduate School of Yale University in Candidacy for the Degree of Doctor of Philosophy

by Panle Jia

Dissertation Directors: Professor Steven Berry and Professor Penny Goldberg

December 2006

Contents

Acknowledgments
1 Introduction
2 What Happens When Wal-Mart Comes to Town: An Empirical Analysis of the Discount Retailing Industry
  2.1 Introduction
  2.2 Industry background
  2.3 Data
    2.3.1 Data sources
    2.3.2 Market definition and data description
  2.4 Modeling
    2.4.1 Model setup
    2.4.2 The profit function
  2.5 Solution algorithm
    2.5.1 The best response function
    2.5.2 The maximization problem with two competing chains
    2.5.3 Adding small firms
  2.6 Empirical implementation
    2.6.1 Estimation
    2.6.2 Discussion: a closer look at the assumptions
  2.7 Results
    2.7.1 Parameter estimates
    2.7.2 The competition effect and the chain effect
    2.7.3 The impact of Wal-Mart’s expansion and related policy issues
  2.8 Conclusion and future work
  2.9 Appendix A: definitions and proofs
    2.9.1 Verification of the necessary condition (2.3)
    2.9.2 The set of fixed points of an increasing function that maps a lattice into itself
    2.9.3 A tighter lower bound and upper bound for the optimal solution vector D∗
    2.9.4 Verification that the chains’ profit functions are supermodular with decreasing differences
    2.9.5 Multiple maximizers to the chains’ optimization problem
    2.9.6 Computational issues
  2.10 Appendix B: data
3 Semi-Parametric Estimation of the Distribution of Fixed Costs in Entry Models
  3.1 Introduction
  3.2 Model
  3.3 Estimation Method
  3.4 Kernel Density Estimation
  3.5 Identification Assumptions and Properties of the Quasi Maximum Likelihood Estimator
  3.6 Differences between my model and Klein and Spady (1993)
  3.7 Data
  3.8 Results
  3.9 Conclusion
4 Figures
5 Tables
Bibliography

To my dad, my mom, and my sister’s family

ABSTRACT

Entry and Competition in the Retail and Service Industries
Panle Jia
2006

My thesis studies entry and competition in the retail and service industries. It builds on the entry literature and seeks to contribute in two ways. First, it extends the existing methodology by relaxing several commonly used assumptions. Second, it applies these extensions to analyze policy issues in the retail and service industries.

Most entry models assume that entry decisions in different markets are independent. While this assumption simplifies estimation, it is ill-suited to study retail chains that can exploit the scale economies arising from clustering stores in nearby markets. The first chapter formulates and estimates a model that captures the dependence of entry decisions induced by the scale economies. The model is applied to the discount retail industry to quantify the impact of chain stores on small retailers. The results indicate that entry by either a Kmart or a Wal-Mart store displaces forty to fifty percent of small discount retailers, and that Wal-Mart’s expansion in the 90s explains fifty to seventy percent of the net change in the number of these small discount retailers. Direct subsidies to either chains or small firms are not likely to be cost effective in increasing the number of firms or the level of employment. Finally, scale economies were important for both Kmart and Wal-Mart, but the magnitude did not grow proportionately with the chains’ sizes.

The second chapter relaxes the distributional assumption for the error term in entry models. This is motivated by the observation that the maximum likelihood estimator (MLE) is inconsistent if the assumed distribution is wrong. Klein and Spady (1993) proposed a consistent and efficient semi-parametric estimator that does not require the error term’s distribution to be specified.
Their approach is designed for single-index models. I extend their method to an entry model with multiple indices and apply this extension to study the entry cost in the dry cleaning industry. I find evidence that the distribution of the entry cost is asymmetric with a long tail, and that this asymmetry drives the difference between the MLE estimates and the semi-parametric estimates. The proposed method can therefore be used as a robustness check for the sensitivity of the parameter estimates to the distributional assumption.

© 2007 by Panle Jia. All rights reserved.

Acknowledgments

I am deeply indebted to all of my committee members: Steven Berry, Penny Goldberg, Hanming Fang, and Philip Haile. I would not have come to where I am without their constant guidance and support. Writing the thesis is like climbing a mountain. Little did I know what it was like when I first started. It is not always easy to find the right trail, and I certainly would have got lost many times groping in the jungles if I had not had their careful guidance. Those frequent office visits always cleared my cloudy thoughts and lit the road ahead of me. There were a few times when I had major setbacks. It felt like standing in front of a giant cliff after sweating for months and only realizing afterwards that the object ahead was insurmountable. As my confidence dwindled to the size of a sesame kernel, they poured out their help and encouragement, and found with me the unnoticeable path that wound along the cliff. If I have achieved anything, it is as much theirs as it is mine.

I owe many. Don Brown was the cheerleader as I ran the marathon of the Ph.D. program; Don Andrews showed me the right equipment before the journey; Pat Bayer spent uncountable days helping me to stay in focus; Alvin Klevorick instructed me in the right technique as I breathed for the final sprint.
Oh, how can I forget my buddies, those who went through the same journey: Shamena Anwar, Rossella Argenziano, Erik and Randi Hjalmarsson, Deran Ozmen, Philipp Schmidt-Dengler, Henry Schneider, Tavneet Suri, Feng Zhu... I am so lucky to get to know all of them. The list is incomplete without the ones that have become so dear to me: Pam O’Donnell, Susan Olmsted, Pat Brown, and Dorothy Ovelar... Thank you all!

Chapter 1

Introduction

Entry has long been an interesting topic for IO economists. Among the early papers, Bain (1956) focused on “determinants of entry barriers”, such as economies of scale, product differentiation and cost asymmetry. With the application of game theory to economics in the 1970s and 1980s, IO economists started to model how strategic behavior among firms influences firms’ entry decisions. This literature tends to show that strategic behavior in many cases is more important than technology and demand factors.

The empirical work that directly models strategic interaction among firms did not start until the late 1980s. Among the earliest works are Bresnahan and Reiss (1987, 1990, and 1991), and Berry (1992). The most recent ones include Berry and Waldfogel (1999), Mazzeo (2002), Seim (2006), and Toivanen and Waterson (2005), among many others. These structural models rely on the fact that firms’ discrete entry decisions reveal information about their underlying profit. By observing how firms’ entry or exit decisions change when market conditions evolve, researchers can make inferences about how market conditions affect firms’ profit and possibly learn about the nature of the competition among firms.

My thesis builds on the entry literature and seeks to contribute in two ways. First, it extends the existing methodology by relaxing a couple of commonly used assumptions. Second, it applies these extensions to analyze policy issues in the retail and service industries.
Most entry models assume that entry decisions in different markets are independent. While this assumption simplifies the estimation, it is clearly problematic in the case of retail chains that operate multiple stores in several markets. There have been many suggested explanations for the success of chains, and almost all of them point to some kind of spillovers among stores operated by the same chain. For example, there can be significant scale economies in the distribution system; stores close by can split the operation cost and advertising expenses; stores also frequently share their private information about local markets and learn from one another’s managerial practices. All these factors suggest that entry decisions are dependent across markets as chains exploit various spillover effects to maximize their total profit.

The first chapter formulates and estimates a model that explicitly captures the spatial correlation of entry decisions. The model is applied to the discount retail industry to quantify the impact of chain stores on small retailers. There are three major challenges in this exercise: the dimensionality problem, the existence of multiple equilibria, and the invalidity of the standard inference method in GMM when the data are spatially dependent.

The dimensionality problem arises because there are a large number of markets that chains have entered or potentially could have entered but chose not to. As a consequence, modeling chains’ decisions necessitates solving a profit maximization problem defined on a lattice space with a large dimension. The solution method borrows from the insight embodied in many recent papers on incomplete models (for example, Haile and Tamer (2003), Ciliberto and Tamer (2006)). In these papers, researchers exploit necessary conditions for modeling or identification, either because the research goal is to study a class of models, or because sufficient conditions are too difficult and sometimes infeasible to characterize.
Similarly, I start with a set of necessary conditions, and transform the profit maximization problem into a search for the fixed points of the necessary conditions. To solve the game with two competing chains, I make use of the supermodularity property of the game.

Like most static entry models, the current one also has multiple equilibria. The presence of multiple equilibria poses considerable challenges to estimation, as there is no longer a one-to-one mapping between regions of the unobservables and the observed equilibrium outcomes. There has been some progress in estimating models with multiple equilibria, but none of the existing methods are suitable for complicated models like the one studied here. Therefore I make the model ‘complete’ by choosing an equilibrium that seems reasonable a priori. The parameter estimates are stable across several very different equilibria.

To address the issue of inference in GMM with spatially correlated data, I adopt the non-parametric covariance matrix estimator proposed by Conley (1999).

The second chapter relaxes the distributional assumption for the error term in entry models. This is motivated by the observation that the maximum likelihood estimator (MLE) is inconsistent if the assumed distribution is wrong. Klein and Spady (1993) proposed a consistent and efficient semi-parametric estimator that does not require the error term’s distribution to be specified. Their approach is designed for single-index models. Entry models naturally have multiple indices or cutoff values: observing M firms is defined by the profit with M − 1 competitors and the profit with M competitors. With N potential entrants, there are N cutoff values. However, a simple trick restores the single-index feature and I apply the model to study the entry cost in the dry cleaning industry. The conclusion of this chapter is more qualitative than quantitative.
I find evidence that the error term has an asymmetric distribution, although I cannot reject the hypothesis that the MLE estimates and the semi-parametric estimates are the same. Nevertheless, the method proposed in this chapter can be used for robustness checks. The absence of significant differences between the two groups of estimates gives us confidence in the validity of the distributional assumption, while the finding of a significant difference calls for attention to possible specification errors.

Chapter 2

What Happens When Wal-Mart Comes to Town: An Empirical Analysis of the Discount Retailing Industry

“Bowman’s (in a small town in Georgia) is the eighth ‘main street’ business to close since Wal-Mart came to town.... For the first time in seventy-three years the big corner store is empty.”
Archer and Taylor, Up against the Wal-Mart.

“There is ample evidence that a small business need not fail in the face of competition from large discount stores. In fact, the presence of a large discount store usually acts as a magnet, keeping local shoppers... and expanding the market....”
Morrison Cain, Vice president of International Mass Retail Association.

2.1 Introduction

The landscape of the U.S. retail industry has changed considerably over the past few decades as the result of two closely related trends. One is the rise of discount retailing; the other is the increasing prevalence of large retail chains. In fact, the discount retailing sector is almost entirely controlled by chains. In 1997, the top three chains (Wal-Mart, Kmart, and Target) accounted for 72.7% of total sector sales and 54.3% of the discount stores.

Discount retailing is a fairly new concept, with the first discount stores appearing in the 1950s.
The leading magazine for the discount industry, Discount Merchandiser, defines a modern discount store as a departmentalized retail establishment that makes use of self-service techniques to sell a large variety of hard goods and soft goods at uniquely low margins.[1,2] Over the span of several decades, the sector has emerged from the fringe of the retail industry and become part of the mainstream.[3] From 1960 to 1997, the total sales revenue of discount stores, in real terms, increased 15.6 times, compared with an increase of 2.6 times for the entire retail industry.

[1] See the annual report “The True Look of the Discount Industry” in the June issue of Discount Merchandiser for the definition of the discount retailing, the sales and store numbers for the top 30 largest firms, as well as the industry sales and total number of discount stores.
[2] According to Annual Benchmark Report for Retail Trade and Food Services published by the Census Bureau, from 1993 to 1997, the average markup for regular department stores was 27.9%, while the average markup for discount stores was 20.9%. Both markups increased slightly from 1998 to 2000.
[3] In 1997, the discount retailing sector accounted for 15% of total retail sales. The other retail sectors are: building materials, food stores, automotive dealers, apparel, furniture, eating and drinking places, and miscellaneous retail.

As the discount retailing sector continues to grow, opposition from other retailers, especially small ones, begins to mount. The critics tend to associate discounters and other big retailers with small-town problems caused by the closing of small firms, such as the decline of downtown shopping districts, eroded tax bases, decreased employment, and the disintegration of closely knit communities. Partly because tax money is used to restore the blighted downtown business districts and to lure the business of big retailers with various forms of economic development subsidies, the effect of big retailers on small firms and local communities has become a matter of public concern.[4] My first goal in this chapter is to quantify the impact of national discount chains on the profitability and entry and exit decisions of small retailers from the late 1980s to the late 1990s.

[4] See The Shils Report (1997): Measuring the Economic and Sociological Impact of the Mega-Retail Discount Chains on Small Enterprises in Urban, Suburban and Rural Communities.

The second salient feature of retail development in the past several decades, including in the discount sector, is the increasing dominance of large chains. In 1997, retail chains with a hundred or more stores accounted for 0.07% of the total number of firms, yet they controlled 21% of the establishments and accounted for 37% of sales and 46% of retail employment.[5] Since the late 1960s, their share of the retail market more than doubled. In spite of the dominance of chain stores, few empirical studies (except Holmes (2005) and Smith (2004)) have quantified the potential advantages of chains over single-unit firms, in part because of the modeling difficulties.[6] In entry models, for example, the store entry decisions of multi-unit chains are related across markets. Most of the literature assumes that entry decisions are independent across markets and focuses on competition among firms within each local market. My second objective here is to extend the entry literature by relaxing the independence assumption, and to quantify the advantage of operating multiple units by explicitly modeling chains’ entry decisions in a large number of markets.

[5] See the 1997 Economic Census Retail Trade subject series Establishment and Firm Size (Including Legal Form of Organization), published by the US Census Bureau.
[6] I discuss Holmes (2005) in detail below. Smith (2004) estimates the demand cross-elasticities between stores of the same firm and finds that mergers between the largest retail chains increase the price level by up to 7.4%.

The model has two key features. First, it allows for flexible competition patterns among all retailers. Second, it incorporates the potential benefits of locating multiple stores near one another. Such benefits, which I group as “the chain effect,” can arise through several different channels. For example, there may be significant scale economies in the distribution system. Stores located near each other can split advertising costs or employee training costs, or they can share knowledge about the specific features of local markets.

The chain effect causes profits of stores in the same chain to be spatially related. As a result, choosing store locations to maximize total profit is complicated, since with N markets there are $2^N$ possible location choices. In the current application, there are more than 2,000 markets and the number of possible location choices exceeds $10^{300}$. When several chains compete against each other, solving for the Nash equilibrium becomes further involved, as firms balance the gains from the chain effect against competition from rivals.

I tackle this problem in several steps. First, I transform the profit maximization problem into a search for the fixed points of the necessary conditions. This transformation shifts the focus of the problem from a set with $2^N$ elements to the set of fixed points of the necessary conditions. The latter has a much smaller dimension, and is well-behaved with easy-to-locate minimum and maximum points. Having dealt with the problem of dimensionality, I take advantage of the supermodularity property of the game to search for the Nash equilibrium.
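The first step above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the profit function (a market-level term `X[m]` plus a spillover `delta / Z[m][l]` from every other store of the same chain), the parameter values, and all function names are hypothetical and introduced here for exposition; they are not the specification estimated in this chapter. With a non-negative `delta`, the necessary condition "open a store in market m if and only if its marginal contribution to total profit is non-negative" defines an increasing map `V` on location vectors, so iterating `V` from "stores everywhere" and from "no stores" yields the greatest and least fixed points, which bound any profit-maximizing vector.

```python
import itertools
import random

def marginal_gain(m, D, X, Z, delta):
    """Change in total chain profit from opening a store in market m, others held fixed."""
    spill = sum(D[l] / Z[m][l] for l in range(len(D)) if l != m)
    # Each store pair appears in both stores' profits, hence the factor 2.
    return X[m] + 2.0 * delta * spill

def V(D, X, Z, delta):
    """Necessary-condition map: enter market m iff its marginal gain is non-negative."""
    return tuple(1 if marginal_gain(m, D, X, Z, delta) >= 0 else 0
                 for m in range(len(D)))

def iterate_to_fixed_point(D, X, Z, delta):
    """Iterate V; monotone (hence terminating) when started from all ones or all zeros."""
    while True:
        D_next = V(D, X, Z, delta)
        if D_next == D:
            return D
        D = D_next

def total_profit(D, X, Z, delta):
    """Hypothetical chain profit: own-market term plus spillovers from the chain's other stores."""
    n = len(D)
    return sum(D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(n) if l != m))
               for m in range(n))

# Size of the choice set with 2,065 markets: number of decimal digits of 2^2065.
num_digits = len(str(2 ** 2065))

# Toy example (all values made up): 8 markets on a line, so brute force is feasible.
random.seed(0)
N = 8
X = [random.uniform(-1.0, 1.0) for _ in range(N)]
pos = [random.uniform(0.0, 10.0) for _ in range(N)]
Z = [[abs(pos[m] - pos[l]) + 1.0 for l in range(N)] for m in range(N)]  # symmetric, positive
delta = 0.3  # assumed non-negative chain effect

upper = iterate_to_fixed_point((1,) * N, X, Z, delta)  # greatest fixed point
lower = iterate_to_fixed_point((0,) * N, X, Z, delta)  # least fixed point
best = max(itertools.product((0, 1), repeat=N),
           key=lambda D: total_profit(D, X, Z, delta))  # exhaustive search over 2^8 vectors

print("lower bound:", lower)
print("optimum:    ", best)
print("upper bound:", upper)
```

In this toy setting the exhaustive search only has to be run over the markets where `lower` and `upper` disagree, which is the sense in which the fixed-point step shrinks the problem.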
Finally, in estimating the parameters, I adopt the econometric technique proposed by Conley (1999) to address the issue of cross-sectional dependence.

The analysis exploits a unique data set I collected that covers the entire discount retailing industry from 1988 to 1997, during which the two major national chains were Kmart and Wal-Mart.[7] The results indicate that Wal-Mart’s expansion from the late 1980s to the late 1990s explains about fifty to seventy percent of the net change in the number of small discount retailers. Unobserved market-level profit shocks induce a positive correlation among the entry decisions of chains and small firms; failure to address this endogeneity issue would underestimate Wal-Mart’s impact on small firms by fifty to sixty percent. Scale economies were important to both Wal-Mart and Kmart, but their importance did not grow proportionately with the size of the chains. Finally, government subsidies to either chains or small firms in this industry are not likely to be effective in increasing the number of firms or the level of employment.

[7] During the sample period, Target was a regional store that competed mostly in the big metropolitan areas in the Midwest with few stores in the sample. See the data section for more details.

The results in this chapter complement a recent study by Holmes (2005), which analyzes the diffusion process of Wal-Mart stores. Holmes quantifies the economies of density, defined as the cost savings from locating stores close to one another, a concept similar to the chain effect in this chapter. The central insight in his paper is that markets vary in quality; in the absence of economies of density, Wal-Mart would open stores in the most profitable markets first and gradually expand to less profitable ones. Since profitable markets do not necessarily cluster, Wal-Mart should open stores erratically across regions. The actual opening process, however, displayed a regular pattern of diffusion from the South, where Wal-Mart’s headquarters are, to other regions. Due to the complexity of the dynamics, with the state space growing exponentially with the number of markets and time periods, it is extremely difficult to solve Wal-Mart’s optimization problem. By abstracting from competition and focusing on Wal-Mart’s single-agent maximization problem, Holmes is able to exploit a perturbation approach to estimate the economies of density. The findings suggest that these economies of density are important.

Holmes’ approach is appealing because he derives the magnitude of the economies of density from the dynamic expansion process. In contrast, I identify the chain effect from the stores’ geographic clustering pattern. My approach abstracts from a number of important dynamic considerations. For example, it does not allow firms to delay store openings because of credit constraints, nor does it allow for any preemption motive as the chains compete and make simultaneous entry decisions. A dynamic model that incorporates both the competition effects and the chain effect would be ideal. However, given the great difficulty of estimating the economies of density in a single agent dynamic model, as Holmes (2005) shows, it is infeasible to estimate a dynamic model that also incorporates the strategic interactions within chains and between chains and small retailers. Since one of my main goals is to analyze the competition effects and perform policy evaluations, I adopt a two-stage model in which all players make a once-and-for-all decision, with chains moving first and small retailers moving second. I estimate the model separately for 1988 and 1997, and exploit the coefficient estimates from both years to analyze the impact of chains on small retailers.
The extension of the current framework to a dynamic model is left for future research.

This chapter contributes to the entry literature initiated by Bresnahan and Reiss (1990, 1991) and Berry (1992), where researchers infer the firms’ underlying profit functions by observing their equilibrium entry decisions across a large number of markets. To the extent that retail chains can be treated as multi-product firms whose differentiated products are stores with different locations, this chapter relates to several recent empirical entry papers that endogenize firms’ product choices upon entry. For example, Mazzeo (2002) considers the quality choices of highway motels, and Seim (2005) studies how video stores soften competition by choosing different locations. Unlike these studies, in which each firm chooses only one product, I analyze the behavior of multi-product firms whose product spaces are potentially large.

This chapter is also related to a large literature on spatial competition in retail markets, for example, Pinkse et al. (2002), Smith (2004), and Davis (2005). All of these models take the firms’ locations as given and focus on price or quantity competition. I adopt the opposite approach. Specifically, I assume a parametric form for the firms’ reduced-form profit functions from the stage competition, and examine how they compete spatially by balancing the chain effect against the competition effect of rivals’ actions on their own profits.

Finally, the chapter is part of the growing literature on Wal-Mart, which includes Stone (1995), Basker (2005a, 2005b), Hausman and Leibtag (2005), and Neumark et al. (2005).

The remainder of the chapter is structured as follows. Section 2 provides background information about the discount retailing sector. Section 3 describes the data set, and section 4 discusses the model. Section 5 proposes a solution algorithm for the game between chains and small firms when there is a large number of markets.
Section 6 explains the estimation approach. Section 7 presents the results. Section 8 concludes. The appendix outlines the technical details not covered in section 5.

2.2 Industry background

Discount retailing is one of the most dynamic sectors in the retail industry. Table 1 (A) displays some statistics for the industry from 1960 to 1997. The sales revenue for this sector, in 2004 US dollars, skyrocketed from 12.8 billion in 1960 to 198.7 billion in 1997. In comparison, the sales revenue for the entire retail industry increased only modestly from 511.2 billion to 1313.3 billion during the same period. The number of discount stores multiplied from 1329 to 9741, while the number of firms dropped from 1016 to 230.

Chain stores dominate the discount retailing sector, as they do other retail sectors. In 1970, the 39 largest discount chains, with twenty-five or more stores each, operated 49.3% of the discount stores and accounted for 41.4% of total sales. By 1989, both shares had increased to roughly 88%. In 1997, the top 30 chains controlled about 94% of total stores and sales.

The principal advantages of chain stores include the central purchasing unit’s ability to buy on favorable terms and to foster specialized buying skills; the possibility of sharing operating and advertising costs among multiple units; and the freedom to experiment in one selling unit without risk to the whole operation. Stores also frequently share their private information about local markets and learn from one another’s managerial practices. Finally, chains can achieve economies of scale by combining wholesaling and retailing operations within the same business unit.

Until the late 1990s, the two most important national chains were Kmart and Wal-Mart. Each firm opened its first store in 1962. The first Kmart was opened by the variety-chain Kresge.
Kmart stores were a new experiment that provided consumers with quality merchandise at prices considerably lower than those of regular retail stores. To reduce advertising costs and to minimize customer service, these stores emphasized nationally advertised brand-name products. Consumer satisfaction was guaranteed, and all goods could be returned for a refund or an exchange (see Vance and Scott (1994), p. 32). These practices were an instant success, and Kmart grew rapidly in the 1970s and 1980s. By the early 1990s, the firm had more than 2200 stores nationwide. In the late 1980s, Kmart tried to diversify and pursued various forms of specialty retailing in pharmaceutical products, sporting goods, office supplies, building materials, etc. The attempt was unsuccessful, and Kmart eventually divested itself of these interests by the late 1990s. Struggling with its management failures throughout the 1990s, Kmart maintained roughly the same number of stores; the opening of new stores offset the closing of existing ones.

Unlike Kmart, which was initially supported by an established retail firm, Wal-Mart started from scratch and grew relatively slowly in the beginning. To avoid direct competition with other discounters, it focused on small towns in southern states where there were few competitors. Starting in the early 1980s, the firm began its aggressive expansion process that averaged 140 store openings per year. In 1991, Wal-Mart replaced Kmart as the largest discounter. By 1997, Wal-Mart had 2362 stores (not including the wholesale clubs) in all states, including Alaska and Hawaii.

As the discounters continue to grow, small retailers start to feel their impact. There are extensive media reports on the controversies associated with the impact of large chains on small retailers and on local communities in general.
As early as 1994, the United States House of Representatives convened a hearing titled “The Impact of Discount Superstores on Small Businesses and Local Communities.” Witnesses from mass retail associations and small retail councils testified, but no legislation followed, partly due to a lack of concrete evidence. In April 2004, the University of California, Santa Barbara, held a conference that centered on the cultural and social impact of the leading discounter, Wal-Mart. In November 2004, both CNBC and PBS aired documentaries that displayed the changes Wal-Mart had brought to the society.

2.3 Data

The available data sets dictate the modeling approach used in this chapter. Hence, I discuss them before introducing the model.

2.3.1 Data sources

There are three main data sources. The data on discount chains come from an annual directory published by Chain Store Guide Inc. The directory covers all operating discount stores of more than ten thousand square feet. For each store, the directory lists its name, size, street address, telephone number, store format, and firm affiliation.[8] The U.S. industry classification system changed from the Standard Industrial Classification System (SIC) to the North American Industry Classification System (NAICS) in 1998. To avoid potential inconsistencies in the industry definition, I restrict the sample period to the ten years before the classification change. As first documented in Basker (2005), the directory was not fully updated for some years. Fortunately, it was fairly accurate for the years used in this study. See appendix 2.10 for details.

[8] The directory stopped providing store size information in 1997 and changed the inclusion criterion to 20,000 square feet in 1998. The store formats include membership stores, regional offices, and in later years distribution centers.
The second data set, the County Business Patterns, tabulates at the county level the number of establishments by employment size category for very detailed industry classifications. However, data disaggregated at the three-digit or finer SIC levels are unusable because of data suppression due to confidentiality requirements.9 There are eight retail sectors at the two-digit SIC level: building materials and garden supplies, general merchandise stores (or discount stores), food stores, automotive dealers and service stations, apparel and accessory stores, furniture and home-furnishing stores, eating and drinking places, and miscellaneous retail. I focus on small general merchandise stores with nineteen or fewer employees, which are the direct competitors of the discount chains.

9 Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential and that no estimates be published that would disclose the operations of an individual firm.

Data on county level population are downloaded from the websites of the U.S. Census Bureau (before 1990) and the Missouri State Census Data Center (after 1990). Other county level demographic and retail sales data are from various years of the decennial census and the economic census.

2.3.2 Market definition and data description

In this chapter, a market is defined as a county. Although the Chain Store Guide publishes the detailed street addresses for the discount stores, information about small firms is available only at the county level. Many of the market size variables, like retail sales, are also available only at the county level. I focus on counties with an average population between 5,000 and 64,000 from 1988 to 1997. There are 2065 such counties among a total of 3140 in the U.S.
According to Vance and Scott (1994), the minimum county population for a Wal-Mart store was 5,000 in the 1980s, while Kmart concentrated in places with a much larger population. Nine percent of all U.S. counties had a population below 5,000 and were unlikely to be a potential market for either chain, while 25% of them were large metropolitan areas with an average population of 64,000 or more. These big counties typically included multiple self-contained shopping areas, and consumers were unlikely to travel across the entire county to shop. The market configuration in these big counties tended to be very complex with a large number of competitors and many market niches. For example, in the early 1990s, there were more than one hundred big discounters and close to four hundred small general merchandise stores in Los Angeles County, one of the largest counties. Modeling firms’ strategic behavior in these markets requires geographic information more detailed than that provided by the county level data.

During the sample period, there were two national chains: Kmart and Wal-Mart. The third largest chain, Target, had 340 stores in 1988 and about 800 stores in 1997. Most of them were located in metropolitan areas in the Midwest, with on average fewer than twenty stores in the counties studied here. I do not include Target in the analysis.10

10 The rest of the discount chains are much smaller and are all regional. They are not included in the analysis.

In the sample, only eight counties had two Kmart stores and forty-nine counties had two Wal-Mart stores in 1988; the figures were eight and sixty-six counties, respectively, in 1997. The current specification abstracts from the choice of the number of opening stores and considers only market entry decisions, as there is not enough variation in the data to identify the profit for the second store in the same market.
The algorithm proposed in this chapter can be applied with little modification to models that also incorporate the store-number choice.

Table 1 (B) presents summary statistics of the sample for the years 1988 and 1997. The average county population grew from 22,470 to 24,270, an increase of 8%. Retail sales per capita, in 1984 dollars, rose 10%, from $3,690 to $4,050. The average percentage of urban population was 30% in 1988 and increased to 33% in 1997. About one quarter of the counties were primarily rural with a small urban population, which is why the average across the counties seems somewhat low. 41% of the counties were in the Midwest (which includes the Great Lakes region, the Plains region, and the Rocky Mountain region, as defined by the Bureau of Economic Analysis), and 50% of the counties were in the southern regions (including the Southeast region and the Southwest region), with the rest in the Far West and the Northeast regions. Kmart had stores in 21% of the counties at the beginning of the sample period, and the number dropped slightly to 19% at the end. In comparison, Wal-Mart had stores in 32% of the counties in 1988 and in 48% of them in 1997. The average number of small firms decreased quite a bit over the same period, from 3.86 to 3.49. The median was three, with a maximum of twenty-five small firms in 1987, and nineteen in 1997. The percentage of counties with six or more small firms dropped from 22% to 18%, while the percentage of counties with at most one small firm increased from 18% to 22% over the sample period. Figure 1 and Figure 2 plot the Kmart and Wal-Mart stores that were located in the sample counties in 1988 and 1997.

2.4 Modeling

2.4.1 Model setup

The model I develop is a two-stage game with complete information. In stage one, Kmart and Wal-Mart simultaneously choose store locations to maximize their total profits in all markets.
In stage two, small firms observe Kmart’s and Wal-Mart’s choices and decide whether to enter the market.11 Once the entry decisions are made, firms compete and profits are realized. All firms are fully rational with perfect knowledge of their rivals’ profitability and the payoff structure. When Kmart and Wal-Mart make location choices in the first stage, they take into consideration the small retailers’ reaction. There are no entry barriers; small firms enter the market until profit for an extra entrant becomes negative.

In reality, small retailers existed long before the era of the discount chains. As the chains emerge in the retail industry, small firms either continue their operations and compete with the chains or exit the market. This might suggest a three-stage model, with one stage for each of these events. However, given the nature of the retail industry, the sunk entry cost is unlikely to be significant for the small firms. In other words, their first stage decisions are irrelevant, and the small retailers respond to chain stores’ entry decisions in the third stage.12 In contrast, I implicitly assume that chains can commit to their entry decisions and do not further adjust after small firms enter. This is based on the observation that most chain stores enter with a long-term lease of the rental property, and in many cases they invest considerably in the infrastructure construction associated with establishing a big store.

11 I have implicitly assumed that small firms, which are firms with one to nineteen employees, are single-unit stores.

12 It is possible that small firm owners have invested considerably in their communities to establish a customer base and a good reputation. These kinds of investments are sunk if stores are closed. However, these investments can be partially recovered when the firm owners switch to other retail sectors to make use of the consumer goodwill.
2.4.2 The profit function

One way to obtain the profit function is to start from primitive assumptions of supply and demand in the retail markets, and derive the profit functions from the equilibrium conditions. With no price, quantity, or sales data, and with very limited information on store characteristics, I do not pursue this approach: it is extremely demanding on data and relies heavily on the primitive assumptions. Instead, I follow the convention in the entry literature and assume that firms’ profit functions from the stage competition take a linear form and that profits decline in the presence of rivals.

Let $D_{i,m} \in \{0, 1\}$ stand for chain $i$’s strategy in market $m$, where $D_{i,m} = 1$ if chain $i$ operates a store in market $m$ and $D_{i,m} = 0$ otherwise. $D_i = \{D_{i,1}, ..., D_{i,M}\}$ is a vector indicating chain $i$’s location choices for the entire set of markets. Let $D_{j,m}$ denote rival $j$’s strategy in market $m$, and $N_{s,m}$ the number of small firms in market $m$. $X_m$, $\varepsilon_m$, and $\eta_{i,m}$ stand for a vector of observed market size variables, the market level profit shock, and firm $i$’s private profit shock in market $m$, respectively. Finally, let $Z_{ml}$ designate the distance from market $m$ to market $l$ in miles, and $Z_m = \{Z_{m1}, ..., Z_{mM}\}$.

The profit function for chain $i$ in market $m$ takes the following form:

\[
\Pi_{i,m}(D_i, D_{j,m}, N_{s,m}; X_m, \varepsilon_m, \eta_{i,m}, Z_m) = D_{i,m} * \Big[ X_m \beta_i + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1) + \delta_{ii} \sum_{l \neq m} \frac{D_{i,l}}{Z_{ml}} + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{i,m} \Big] \tag{2.1}
\]

where $i, j \in \{k, w\}$, with “k” for Kmart and “w” for Wal-Mart. Note the presence of $D_i$ in $\Pi_{i,m}(\cdot)$: profit in market $m$ depends on the number of stores chain $i$ has in other markets. Profit from staying outside the market is normalized to 0. Chains maximize their total profits in all markets, $\sum_m \Pi_{i,m}$.

In equilibrium, the number of small firms is a function of Kmart’s and Wal-Mart’s first stage decisions: $N_{s,m}(D_{k,m}, D_{w,m})$.
When making location choices, the chains take into consideration the impact of small firms’ reactions on their own profits.

There are several components in chain $i$’s profit $\Pi_{i,m}$ in market $m$: the observed market size $X_m \beta_i$ that is parameterized by demand shifters, like population, the extent of urbanization, etc.; the unobserved profit shock $\sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{i,m}$, known to the firms but unknown to the econometrician; the competition effects $\delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1)$, as well as the chain effect $\delta_{ii} \sum_{l \neq m} D_{i,l}/Z_{ml}$. Notice that the observed market size component $X_m \beta_i$ is allowed to differ for different players. $X_m$ includes all factors that influence profits, and $\beta_i$ picks up the factors that are relevant for firm $i$. For example, Kmart might have some advantage in the Midwest, while Wal-Mart stores might be more profitable in markets close to their headquarters.

The unobserved profit shock has two elements: $\varepsilon_m$, the market-level profit shifter that affects both chains and small firms, and $\eta_{i,m}$, a firm-specific profit shock. $\varepsilon_m$ is assumed to be i.i.d. across markets, while $\eta_{i,m}$ is assumed to be i.i.d. across both firms and markets. $\sqrt{1 - \rho^2}$ (with $0 \leq \rho \leq 1$) measures how important the market component is. In principle, it can differ for each chain and for small firms. For example, the market specific business environment (how developed the infrastructure is, whether the market has sophisticated shopping facilities, and the stance of the local community toward large corporations including big retailers) might matter more to chains than to small firms. In the baseline specification, I restrict $\rho$ to be the same across all players. Relaxing it does not improve the fit much. $\eta_{i,m}$ incorporates the unobserved store level heterogeneity, including the management ability, the display style and shopping environment, employees’ morale or skills, etc.
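The components of equation (2.1) can be checked numerically with a small sketch. The function below is only an illustration of the formula, not code from the dissertation: the function name, argument names, and all numerical values are hypothetical, and the market-size term $X_m \beta_i$ is passed in as a single scalar.

```python
import math

def chain_profit_in_market(m, d_own, d_rival_m, n_small_m, xb_m, dist,
                           eps_m, eta_im, delta_ij, delta_is, delta_ii, rho):
    """Evaluate equation (2.1) for chain i in market m.

    d_own : list of 0/1 entry decisions of chain i over all markets
    dist  : dist[m][l] is the distance in miles between markets m and l
    xb_m  : the scalar X_m * beta_i (observed market-size component)
    """
    if d_own[m] == 0:
        return 0.0  # staying outside the market is normalized to zero profit
    # Chain effect: own stores in other markets, discounted by distance.
    chain_effect = sum(d_own[l] / dist[m][l]
                       for l in range(len(d_own)) if l != m)
    # Unobserved shock: market-level part plus firm-specific part.
    shock = math.sqrt(1.0 - rho ** 2) * eps_m + rho * eta_im
    return (xb_m + delta_ij * d_rival_m + delta_is * math.log(n_small_m + 1)
            + delta_ii * chain_effect + shock)

# Hypothetical numbers, chosen only to exercise the formula.
dist = [[0.0, 10.0], [10.0, 0.0]]
example_profit = chain_profit_in_market(
    m=0, d_own=[1, 1], d_rival_m=1, n_small_m=0, xb_m=1.0, dist=dist,
    eps_m=0.0, eta_im=0.0, delta_ij=-0.5, delta_is=-0.3, delta_ii=2.0, rho=0.5)
```

With these made-up inputs the rival's presence lowers profit by 0.5, while the chain's own store ten miles away adds 2.0/10 = 0.2.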
As is standard in discrete choice models, the scale of the parameter coefficients and the variance of the error term are not separately identified. I normalize the variance of the error term to 1 by assuming that both $\varepsilon_m$ and $\eta_{i,m}$ are standard normal random variables.

The competition effect from the rival chain is captured by $\delta_{ij} D_{j,m}$, where $D_{j,m}$ is one if rival $j$ operates a store in market $m$. $\delta_{is} \ln(N_{s,m} + 1)$ denotes the effect of small firms on chain $i$’s profit. The addition of 1 in $\ln(N_{s,m} + 1)$ is used to avoid $\ln 0$ for markets without any small firms. The log form allows the incremental competition effect to taper off when there are many small firms.

The last term in the bracket, $\delta_{ii} \sum_{l \neq m} D_{i,l}/Z_{ml}$, captures the chain effect, the benefit that having stores in other markets generates for the profitability in market $m$. $\delta_{ii}$ is assumed to be non-negative. Stores split the costs of operation, delivery, and advertising among nearby ones to achieve scale economies. They also share knowledge of the localized markets and learn from one another’s managerial success. All these factors suggest that having stores nearby benefits the operation in market $m$, and that the benefit declines with the distance. Following Bajari and Fox (2005), I divide the spillover effect by the distance between the two markets $Z_{ml}$, so that profit in market $m$ is increased by $\delta_{ii} D_{i,l}/Z_{ml}$ if there is a store in market $l$ that is $Z_{ml}$ miles away. This simple formulation serves two purposes. First, it is a parsimonious way to capture the fact that it might be increasingly difficult to benefit from stores that are farther away. Second, the econometric technique exploited in the estimation requires the dependence among the observations to die away sufficiently fast. I also assume that the chain effect takes place among counties whose centroids are within fifty miles, or roughly an area that expands seventy-five miles in each direction.
Including counties within a hundred miles increases substantially the computing time with little change in the parameters.

This chapter focuses on the chain effect that is “localized” in nature. Some chain effects are “global”: for example, the gain that arises from a chain’s ability to buy a large volume at a discount. The latter benefits affect all stores the same, and cannot be separately identified from the constant of the profit function. Hence, the estimates $\delta_{ii}$ should be interpreted as a lower bound to the actual advantages enjoyed by a chain.

Profit for a small firm that operates in market $m$ is:

\[
\Pi_{s,m}(D_{k,m}, D_{w,m}, N_{s,m}; X_m, \varepsilon_m, \eta_{s,m}) = X_m \beta_s + \sum_{i=k,w} \delta_{si} D_{i,m} + \delta_{ss} \ln(N_{s,m}) + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{s,m} \tag{2.2}
\]

All small firms are symmetric with the same profit function $\Pi_{s,m}(\cdot)$. For markets with no small firms, the entry condition implies that profit for a single small firm is negative: $X_m \beta_s + \sum_{i=k,w} \delta_{si} D_{i,m} + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{s,m} < 0$. For markets with $N_{s,m}$ small firms, $\Pi_{s,m}(N_{s,m}) \geq 0$ and $\Pi_{s,m}(N_{s,m} + 1) < 0$. The term $\delta_{ss} \ln(N_{s,m})$ captures the competition among small firms, while $\sum_{i=k,w} \delta_{si} D_{i,m}$ denotes the impact of Kmart and Wal-Mart on small firms. The static nature of the model does not allow separate identification of the different channels through which the competition effect takes place. For example, one cannot tell how much of the competition effect is due to the forced exit of small firms, and how much is due to the preemption that reduces entry of small firms.

The market-level error term $\varepsilon_m$ makes the location choices of the chain stores $D_{k,m}$ and $D_{w,m}$, and the number of small firms $N_{s,m}$, endogenous in the profit functions, since a large $\varepsilon_m$ leads to more entry of both chains and small firms. I explicitly address the issue of endogeneity by solving chains’ and small firms’ entry decisions simultaneously within the model.
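The free-entry condition for small firms, $\Pi_{s,m}(N_{s,m}) \geq 0$ and $\Pi_{s,m}(N_{s,m} + 1) < 0$, pins down $N_{s,m}$ once the chains' choices and the shocks are fixed. The following sketch computes that number by direct search. It is an illustration only: the function name is hypothetical, `base_profit` stands for everything in (2.2) except the $\delta_{ss} \ln(N_{s,m})$ term, and the sketch assumes $\delta_{ss} < 0$ so that profit falls as more small firms enter.

```python
import math

def equilibrium_small_firms(base_profit, delta_ss):
    """Number of small firms implied by free entry under equation (2.2).

    base_profit : X_m*beta_s + sum_i delta_si*D_im + sqrt(1-rho^2)*eps_m + rho*eta_sm
    delta_ss    : competition effect among small firms (assumed negative here)
    """
    if delta_ss >= 0:
        raise ValueError("this sketch assumes delta_ss < 0")
    if base_profit < 0:
        return 0  # even a single small firm would lose money
    n = 1  # ln(1) = 0, so a lone entrant earns base_profit >= 0
    # Keep adding firms while the next entrant would still earn non-negative profit.
    while base_profit + delta_ss * math.log(n + 1) >= 0:
        n += 1
    return n

# Hypothetical values: with base profit 1 and delta_ss = -1, entry stops once ln(N+1) > 1.
n_example = equilibrium_small_firms(1.0, -1.0)
```

A loop is used instead of the closed form $\lfloor \exp(-\text{base}/\delta_{ss}) \rfloor$ to avoid floating-point rounding at the integer boundary.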
To estimate only the competition effect of big retailers on small firms $\delta_{si}$, without analyzing the equilibrium consequences of policy changes, it suffices to regress the number of small stores on market size variables, together with the number of chain stores, and to use instruments to correct the OLS bias of the competition effect. However, valid instruments for the presence of each of the rivals are difficult to find. Researchers have experimented with distance to headquarters or stores’ planned opening dates to instrument for Wal-Mart’s entry decisions.13 It is much more difficult to find good instruments for Kmart. The $R^2$ of regressing Kmart stores’ locations on their distance to headquarters is less than 0.005. Another awkward feature of the linear IV regression is that the predicted number of small firms can be negative. Limited dependent variable estimation avoids this problem, but accounting for endogeneity in the discrete games requires strong assumptions about the nature of the endogeneity that are not satisfied by the current model.

13 See Neumark (2005) and Basker (2005a).

Perhaps the best argument for the current approach, besides the possibility of analyzing policy experiments and studying the spillover among the chain stores, is that the structural model can exploit the chain effect to help with identification. The chain effect drives entry decisions of chain stores, but is not related to small firms’ entry decisions, and serves as a natural excluded variable in the identification of the chains’ competition effects on small firms.

Note that the above specification allows very flexible competition patterns among all the possible firm-pair combinations. The parameters to be estimated are $\{\beta_i, \delta_{ij}, \delta_{ii}, \rho\}$, $i, j \in \{k, w, s\}$, and the central parameters are the competition effects $\delta_{ij}$, $i, j \in \{k, w, s\}$, $i \neq j$, and the chain effects $\delta_{ii}$, $i \in \{k, w\}$.
2.5 Solution algorithm

The unobserved market level profit shock $\varepsilon_m$, together with the chain effect $\delta_{ii} \sum_{l \neq m} D_{i,l}/Z_{ml}$, renders all of the discrete variables $D_{i,m}$, $D_{j,m}$, $D_{i,l}$, and $N_{s,m}$ endogenous in the profit functions (2.1) and (2.2). Finding the Nash equilibrium of this game is complicated. I take several steps to address this problem. Section 2.5.1 explains how to find each chain’s best response conditioning on rivals’ choices, section 2.5.2 derives the solution algorithm for the game between two chains, and section 2.5.3 adds the small retailers and solves for the Nash equilibrium of the full model.

2.5.1 The best response function

In this subsection, let us focus on the chain’s single-agent problem and abstract from competition. In the next two subsections I incorporate competition and solve the model for all players. For notational simplicity, I have suppressed the firm subscript $i$ and used $X_m$ instead of $X_m \beta_i + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{i,m}$ in the profit function throughout this subsection. Let $M$ denote the total number of markets, and let $\mathbf{D} = \{0, 1\}^M$ denote the choice set. An element of the set $\mathbf{D}$ is an $M$-coordinate vector $D = \{D_1, ..., D_M\}$. The profit-maximization problem is:

\[
\Pi = \max_{D_1, ..., D_M \in \{0,1\}} \sum_{m=1}^{M} \Big[ D_m * \Big( X_m + \delta \sum_{l \neq m} \frac{D_l}{Z_{ml}} \Big) \Big]
\]

The choice variable $D_m$ appears in the profit function in two ways. First, it directly determines profit in market $m$: the firm earns $X_m + \delta \sum_{l \neq m} D_l/Z_{ml}$ if $D_m = 1$, and zero if $D_m = 0$. Second, the decision to open a store in market $m$ increases the profits in other markets through the chain effect.

The complexity of this maximization problem is twofold: first, it is a discrete problem of large dimension. In the current application, with $M = 2065$ and two choices for each market (enter or stay outside), the number of possible elements in the choice set $\mathbf{D}$ is $2^{2065}$, or roughly $10^{600}$. The naive approach that evaluates all of them to find the profit-maximizing vector(s) is infeasible.
Second, the profit function is irregular: it is neither concave nor convex. Consider the relaxed function where $D_m$ takes real values, rather than integers $\{0, 1\}$. The Hessian of this function is indefinite, and the usual first-order condition does not guarantee an optimum.14 Even if one could exploit the first-order condition, the search with a large number of choice variables is a daunting task.

14 A symmetric matrix is positive (negative) semidefinite iff all the eigenvalues are non-negative (non-positive). The Hessian of the profit function (2.1) is a symmetric matrix with zero for all the diagonal elements. Its trace, which is equal to the sum of the eigenvalues, is zero. If the Hessian matrix has a positive eigenvalue, it has to have a negative one as well. There is only one possibility for the Hessian to be positive (or negative) semidefinite, which is that all the eigenvalues are 0. This is true only for the zero matrix $H = 0$.

Instead of solving the problem directly, I transform it into a search for the fixed points of the necessary conditions for profit maximization. In particular, I exploit the lattice structure of the set of fixed points of an increasing function and propose an algorithm that obtains an upper bound $D^U$ and a lower bound $D^L$ for the profit-maximizing vector(s). With these two bounds at hand, I evaluate all vectors that lie between them to find the profit-maximizing ones.

The set of profit maximizing vectors may not be a singleton. For example, in the case of two markets with $X_1 = -1$, $X_2 = -1$, $\delta = 1$, and $Z_{1,2} = Z_{2,1} = 1$, both $D^* = \{0, 0\}$ and $D^{**} = \{1, 1\}$ maximize the total profit. Here I assume there is only one solution. In appendix 2.9.5, I show that allowing multiple optimal solutions is a straightforward extension.

Throughout this chapter, the comparison between vectors is coordinate-wise. A vector $D$ is bigger than vector $D'$ if and only if every element of $D$ is weakly bigger: $D \geq D'$ if and only if $D_m \geq D'_m$ $\forall m$. $D$ and $D'$ are unordered if neither $D \geq D'$ nor $D \leq D'$. They are the same if both $D \geq D'$ and $D \leq D'$.

Let the profit maximizer be denoted $D^* = \arg\max_{D \in \mathbf{D}} \Pi(D)$. The optimality of $D^*$ implies that profit at $D^*$ must be (weakly) higher than the profit at any one-market deviation:

\[
\Pi(D_1^*, ..., D_m^*, ..., D_M^*) \geq \Pi(D_1^*, ..., D_m, ..., D_M^*), \quad \forall m
\]

which leads to:

\[
D_m^* = 1\Big[ X_m + 2\delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}} \geq 0 \Big], \quad \forall m \tag{2.3}
\]

The derivation of equation (2.3) is left to appendix 2.9.1. These conditions have the usual interpretation that $X_m + 2\delta \sum_{l \neq m} D_l^*/Z_{ml}$ is market $m$’s marginal contribution to total profit. This equation system is not definitional; it is a set of necessary conditions for the optimal vector $D^*$. Not all vectors that satisfy (2.3) maximize profit, but if $D^*$ maximizes profit, it must satisfy these constraints.

Define $V_m(D) = 1[X_m + 2\delta \sum_{l \neq m} D_l/Z_{ml} \geq 0]$, and $V(D) = \{V_1(D), ..., V_M(D)\}$. $V(\cdot)$ is a vector function that maps from $\mathbf{D}$ into itself: $V : \mathbf{D} \to \mathbf{D}$. It is an increasing function: $V(D') \geq V(D'')$ whenever $D' \geq D''$. By construction, the profit maximizer $D^*$ is one of $V(\cdot)$’s fixed points. The following theorem, proved by Tarski (1955), states that the set of fixed points of an increasing function that maps from a lattice into itself is a lattice and has a greatest point and a least point. Appendix 2.9.2 describes the basic lattice theory.

Theorem 2.1. Suppose that $Y(X)$ is an increasing function from a nonempty complete lattice $\mathbf{X}$ into $\mathbf{X}$.
(a) The set of fixed points of $Y(X)$ is nonempty, $\sup_{\mathbf{X}}(\{X \in \mathbf{X}, X \leq Y(X)\})$ is the greatest fixed point, and $\inf_{\mathbf{X}}(\{X \in \mathbf{X}, Y(X) \leq X\})$ is the least fixed point.
(b) The set of fixed points of $Y(X)$ in $\mathbf{X}$ is a nonempty complete lattice.

A lattice in which each nonempty subset has a supremum and an infimum is complete. Any finite lattice is complete. A nonempty complete lattice has a greatest and a least element.
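To make the operator $V(\cdot)$ concrete, the sketch below evaluates it on the two-market example given earlier ($X_1 = X_2 = -1$, $\delta = 1$, unit distance) and compares its fixed points with the profit maximizers found by enumeration. The function names are hypothetical and the enumeration is feasible only because $M = 2$; it is meant as a check of the definitions, not as the method used in the chapter.

```python
from itertools import product

def total_profit(D, X, delta, Z):
    """Single-chain total profit: sum_m D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    M = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(M) if l != m))
        for m in range(M))

def V(D, X, delta, Z):
    """Necessary-condition operator: V_m(D) = 1[X_m + 2*delta*sum_{l != m} D_l / Z_ml >= 0]."""
    M = len(D)
    return tuple(
        int(X[m] + 2 * delta * sum(D[l] / Z[m][l] for l in range(M) if l != m) >= 0)
        for m in range(M))

# The two-market example from the text.
X = [-1.0, -1.0]
delta = 1.0
Z = [[0, 1], [1, 0]]  # diagonal entries are never used because l != m

all_vectors = list(product((0, 1), repeat=2))
best = max(total_profit(D, X, delta, Z) for D in all_vectors)
maximizers = [D for D in all_vectors if total_profit(D, X, delta, Z) == best]
fixed_points = [D for D in all_vectors if V(D, X, delta, Z) == D]
```

In this example every maximizer is a fixed point of `V`, as the necessary conditions require. The converse need not hold in general, which is why the bounds are followed by an exhaustive search.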
Since the choice set $\mathbf{D}$ is a finite lattice, it is complete, and Theorem 2.1 can be directly applied. Several points are worth mentioning. First, $\mathbf{X}$ can be a closed interval or it can be a discrete set, as long as the set includes the greatest lower bound and the least upper bound for any of its nonempty subsets. That is, it is a complete lattice. Second, the set of fixed points is itself a nonempty complete lattice, with a greatest and a smallest point. Third, the requirement that $Y(X)$ is “increasing” is crucial; it can’t be replaced by assuming that $Y(X)$ is a monotone function. Appendix 2.9.2 provides a counterexample where the set of fixed points for a decreasing function is empty.

Now I outline the algorithm that delivers the greatest and the least fixed point of $V(D)$, which are, respectively, an upper bound and a lower bound for the optimal solution vector $D^*$. To find $D^*$, I rely on an exhaustive search among the vectors lying between these two bounds.

Start with $D^0 = \sup(\mathbf{D}) = \{1, ..., 1\}$. The supremum exists because $\mathbf{D}$ is a complete lattice. Define a sequence $\{D^t\}$: $D^1 = V(D^0)$, and $D^{t+1} = V(D^t)$. By the construction of $D^0$, we have: $D^0 \geq V(D^0) = D^1$. Since $V(\cdot)$ is an increasing function, $V(D^0) \geq V(D^1)$, or $D^1 \geq D^2$. Iterating this process several times generates a decreasing sequence: $D^0 \geq D^1 \geq ... \geq D^t$. Given that $D^0$ has only $M$ distinct elements and at least one element of the $D$ vector is changed from 1 to 0 in each iteration, the process converges within $M$ steps: $D^{T-1} = D^T$, $T \leq M$. Let $D^U$ denote the convergent vector. $D^U$ is a fixed point of the function $V(\cdot)$: $D^U = V(D^U)$. To show that $D^U$ is indeed the greatest element of the set of fixed points, note that $D^0 \geq D'$, where $D'$ is an arbitrary element of the set of fixed points. Applying the function $V(\cdot)$ to the inequality $T$ times, we have $D^U = V^T(D^0) \geq V^T(D') = D'$.
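The iteration described above, together with the dual iteration from $\{0, ..., 0\}$ and the exhaustive search between the two bounds, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the four-market example are hypothetical, the sketch assumes $\delta \geq 0$ (so that $V$ is increasing), and it returns a single maximizer, in line with the uniqueness assumption in the text.

```python
from itertools import product

def total_profit(D, X, delta, Z):
    M = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(M) if l != m))
        for m in range(M))

def V(D, X, delta, Z):
    M = len(D)
    return tuple(
        int(X[m] + 2 * delta * sum(D[l] / Z[m][l] for l in range(M) if l != m) >= 0)
        for m in range(M))

def fixed_point_from(D_start, X, delta, Z):
    """Iterate D <- V(D) until it stops changing (monotone when delta >= 0)."""
    D = tuple(D_start)
    while True:
        D_next = V(D, X, delta, Z)
        if D_next == D:
            return D
        D = D_next

def maximize_profit(X, delta, Z):
    if delta < 0:
        raise ValueError("this sketch assumes delta >= 0")
    M = len(X)
    D_upper = fixed_point_from((1,) * M, X, delta, Z)  # greatest fixed point
    D_lower = fixed_point_from((0,) * M, X, delta, Z)  # least fixed point
    # Only coordinates where the bounds disagree need to be searched.
    free = [m for m in range(M) if D_lower[m] != D_upper[m]]
    best, best_value = None, float("-inf")
    for choice in product((0, 1), repeat=len(free)):
        D = list(D_lower)
        for m, value in zip(free, choice):
            D[m] = value
        profit = total_profit(D, X, delta, Z)
        if profit > best_value:
            best, best_value = tuple(D), profit
    return D_lower, D_upper, best, best_value

# Hypothetical example: four markets, all pairwise distances equal to 2 miles.
X = [0.5, -1.0, -1.0, -10.0]
delta = 0.5
Z = [[0.0 if m == l else 2.0 for l in range(4)] for m in range(4)]
D_lower, D_upper, D_star, profit_star = maximize_profit(X, delta, Z)
```

In this example the bounds fix two of the four coordinates, so the search covers 4 vectors instead of 16. The savings in an application depend on how tight the bounds turn out to be.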
Using the dual argument, one can show that the convergent vector derived from $D^0 = \inf(\mathbf{D}) = \{0, ..., 0\}$ is the least element in the set of fixed points. Denote it by $D^L$. In appendix 2.9.3, I show that starting from the solution to a constrained version of the profit maximization problem yields a tighter lower bound. There I also illustrate how a tighter upper bound can be obtained by starting with a vector $\tilde{D}$ such that $\tilde{D} \geq D^*$ and $\tilde{D} \geq V(\tilde{D})$.

With the two bounds $D^U$ and $D^L$ at hand, I evaluate all vectors that lie between them and find the profit-maximizing vector $D^*$.

2.5.2 The maximization problem with two competing chains

The discussion in the previous subsection abstracts from rival-chain competition and considers only the chain effect. With the competition effect from the rival chain, the profit function for chain $i$ becomes: $\Pi_i(D_i, D_j) = \sum_{m=1}^{M} [D_{i,m} * (X_{im} + \delta_{ii} \sum_{l \neq m} D_{i,l}/Z_{ml} + \delta_{ij} D_{j,m})]$, where $X_{im}$ contains $X_m \beta_i + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho\, \eta_{i,m}$.

To address the interaction between the chain effect and the rival-chain competition effect, I invoke the following theorem from Topkis (1978), which states that the best response function is decreasing in the rival’s strategy when the payoff function is supermodular and has decreasing differences. Specifically:15

Theorem 2.2. If $\mathbf{X}$ is a lattice, $K$ is a partially ordered set, $Y(X, k)$ is supermodular in $X$ on $\mathbf{X}$ for each $k$ in $K$, and $Y(X, k)$ has decreasing differences in $(X, k)$ on $\mathbf{X} \times K$, then $\arg\max_{X \in \mathbf{X}} Y(X, k)$ is decreasing in $k$ on $\{k : k \in K, \arg\max_{X \in \mathbf{X}} Y(X, k) \text{ is nonempty}\}$.

15 The original theorem is stated in terms of $\Pi(D, t)$ having increasing differences in $(D, t)$, and $\arg\max_{D \in \mathbf{D}} \Pi(D, t)$ increasing in $t$. Replacing $t$ with $-t$ yields the version of the theorem stated here.

$Y(X, k)$ has decreasing differences in $(X, k)$ on $\mathbf{X} \times K$ if $Y(X, k'') - Y(X, k')$ is decreasing in $X \in \mathbf{X}$ for all $k' \leq k''$ in $K$. Intuitively, $Y(X, k)$ has decreasing differences in $(X, k)$ if $X$ and $k$ are substitutes.
In appendix 2.9.4, I verify that the profit function $\Pi_i(D_i, D_j) = \sum_{m=1}^{M} [D_{i,m} * (X_{im} + \delta_{ii} \sum_{l \neq m} D_{i,l}/Z_{ml} + \delta_{ij} D_{j,m})]$ is supermodular in its own strategy $D_i$ and has decreasing differences in $(D_i, D_j)$. From Theorem 2.2, chain $i$’s best response correspondence $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j)$ decreases in rival $j$’s strategy $D_j$. The same holds for chain $j$’s best response to $i$’s strategy.

As the simple example in section 2.5.1 illustrates, given the rival’s strategy $D_j$, $\arg\max_{D_i \in \mathbf{D}} \Pi_i(D_i, D_j)$ can contain more than one element. For the moment, assume that $\arg\max_{D_i \in \mathbf{D}} \Pi_i(D_i, D_j)$ is a singleton for any given $D_j$. Appendix 2.9.5 discusses the case in which $\arg\max_{D_i \in \mathbf{D}} \Pi_i(D_i, D_j)$ has multiple elements. The extension involves the concepts of set ordering and increasing (decreasing) selection, but is fairly straightforward.

The set of Nash equilibria of a supermodular game is nonempty and it has a greatest element and a least element.16,17 The current entry game is not supermodular, as the profit function has decreasing differences in the joint strategy space $\mathbf{D} \times \mathbf{D}$. This leads to a non-increasing joint best response function, and we know from the discussion after Theorem 2.1 that a non-increasing function on a lattice can have an empty set of fixed points. A simple transformation, however, restores the supermodularity property of the game. The trick is to define a new strategy space for one player (for example, Kmart) to be the negative of the original space. Let $\tilde{D}_k = -D_k$. The profit function can be re-written as:

\[
\Pi_k(-D_k, D_w) = \sum_m (-D_{k,m}) * \Big[ -X_{km} + \delta_{kk} \sum_{l \neq m} \frac{-D_{k,l}}{Z_{ml}} + (-\delta_{kw}) D_{w,m} \Big]
\]

\[
\Pi_w(D_w, -D_k) = \sum_m D_{w,m} * \Big[ X_{wm} + \delta_{ww} \sum_{l \neq m} \frac{D_{w,l}}{Z_{ml}} + (-\delta_{wk})(-D_{k,m}) \Big]
\]

16 See Topkis (1978) and Zhou (1994).

17 A game is supermodular if the payoff function $\Pi_i(D_i, D_{-i})$ is supermodular in $D_i$ for each $D_{-i}$ and each player $i$, and $\Pi_i(D_i, D_{-i})$ has increasing differences in $(D_i, D_{-i})$ for each $i$.

It is easy to verify that the game defined on the new strategy space $(\tilde{D}_k, D_w)$ is supermodular; therefore a Nash equilibrium exists. Using the transformation $\tilde{D}_k = -D_k$, one can find the corresponding equilibrium in the original strategy space. In the following paragraphs, I explain how to find the desired Nash equilibrium directly in the space of $(D_k, D_w)$ using the “Round-Robin” algorithm, where each player proceeds in turn to update its own strategy.18

18 See Topkis (1998) for a detailed discussion.

To obtain the equilibrium most profitable for Kmart, start with the smallest vector in Wal-Mart’s strategy space: $D_w^0 = \inf(\mathbf{D}) = \{0, ..., 0\}$. Derive Kmart’s best response $K(D_w^0) = \arg\max_{D_k \in \mathbf{D}} \Pi_k(D_k, D_w^0)$ given $D_w^0$, using the method outlined in section 2.5.1, and denote it by $D_k^1 = K(D_w^0)$. Similarly, find Wal-Mart’s best response $W(D_k^1) = \arg\max_{D_w \in \mathbf{D}} \Pi_w(D_w, D_k^1)$ given $D_k^1$, again using the method in section 2.5.1, and denote it by $D_w^1$. Note that $D_w^1 \geq D_w^0$, by the construction of $D_w^0$. This finishes the first iteration $\{D_k^1, D_w^1\}$.

Fix $D_w^1$ and solve for Kmart’s best response $D_k^2 = K(D_w^1)$. By Theorem 2.2, Kmart’s best response decreases in the rival’s strategy, so $D_k^2 = K(D_w^1) \leq D_k^1 = K(D_w^0)$. The same argument shows that $D_w^2 \geq D_w^1$. Iterating the process generates two monotone sequences: $D_k^1 \geq D_k^2 \geq ... \geq D_k^t$, $D_w^1 \leq D_w^2 \leq ... \leq D_w^t$. In every iteration, at least one element of the $D_k$ vector is changed from 1 to 0, and one element of the $D_w$ vector is changed from 0 to 1, so the algorithm converges within $M$ steps: $D_k^T = D_k^{T-1}$, $D_w^T = D_w^{T-1}$, $T \leq M$.
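The Round-Robin iteration just described can be sketched in a few lines. This is a toy illustration, not the dissertation's implementation: the function names and the three-market parameter values are hypothetical, each best response is found by brute-force enumeration rather than by the bounds method of section 2.5.1, and the sketch assumes each best response is unique, as the text does.

```python
from itertools import product

def chain_total_profit(D_own, D_rival, X, d_own, d_rival, Z):
    """Pi_i(D_i, D_j) = sum_m D_im * (X_im + d_own * sum_{l != m} D_il / Z_ml + d_rival * D_jm)."""
    M = len(D_own)
    return sum(
        D_own[m] * (X[m]
                    + d_own * sum(D_own[l] / Z[m][l] for l in range(M) if l != m)
                    + d_rival * D_rival[m])
        for m in range(M))

def best_response(D_rival, X, d_own, d_rival, Z):
    """Brute-force argmax over {0,1}^M; feasible only for very small M."""
    M = len(X)
    return max(product((0, 1), repeat=M),
               key=lambda D: chain_total_profit(D, D_rival, X, d_own, d_rival, Z))

def round_robin(Xk, Xw, d_kk, d_kw, d_ww, d_wk, Z):
    """Start from Wal-Mart's smallest strategy; alternate best responses until both stop changing."""
    M = len(Xk)
    Dw = (0,) * M
    Dk = best_response(Dw, Xk, d_kk, d_kw, Z)
    for _ in range(2 * M + 2):  # generous cap; the text bounds convergence by M steps
        Dw_new = best_response(Dk, Xw, d_ww, d_wk, Z)
        Dk_new = best_response(Dw_new, Xk, d_kk, d_kw, Z)
        if Dk_new == Dk and Dw_new == Dw:
            return Dk, Dw
        Dk, Dw = Dk_new, Dw_new
    raise RuntimeError("no convergence; the monotonicity assumptions may fail")

# Hypothetical three-market example with all pairwise distances equal to 4 miles.
Z = [[0.0 if m == l else 4.0 for l in range(3)] for m in range(3)]
Xk = [1.0, 0.4, -2.0]
Xw = [0.8, 1.0, 0.3]
Dk_eq, Dw_eq = round_robin(Xk, Xw, d_kk=0.4, d_kw=-0.7, d_ww=0.4, d_wk=-0.7, Z=Z)
```

In this example Kmart first enters markets 1 and 2, Wal-Mart responds by entering all three markets, and Kmart then retreats from market 2, after which neither chain changes its choice.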
The convergent vectors (DkT , Dw w w k k Furthermore, this equilibrium gives Kmart the highest profit among the set of all equilibria. T ) obtained using D0 = {0, ..., 0} to all That Kmart prefers the equilibrium (DkT , Dw w T ≤ D∗ for any D ∗ that belongs to an other equilibria follows from two results: first, Dw w w equilibrium; second, Πk (K(Dw ), Dw ) decreases in Dw , where K(Dw ) denotes Kmart’s best T ) ≥ Π (D∗ , D ∗ ), ∀ {D ∗ , D ∗ } that response function. Together they imply that Πk (DkT , Dw k w w k k belongs to the set of Nash equilibria. 0 ≤ D ∗ , by the construction of D 0 . To show the first result, note that Dw w w Since 0 ) ≥ K(D∗ ) = D ∗ . Similarly, D1 = W (D 1 ) ≤ K(Dw ) decreases in Dw , Dk1 = K(Dw w w k k ∗ . Repeating this process T times leads to DT = K(DT ) ≥ K(D ∗ ) = D∗ , W (Dk∗ ) = Dw w w k k T = W (D T ) ≤ W (D ∗ ) = D ∗ . The second result follows from Π (K(D∗ ), D∗ ) ≤ and Dw k w w w k k ∗ ), DT ) ≤ Π (K(D T ), D T ). The first inequality holds because Kmart’s profit funcΠk (K(Dw k w w w tion decreases in its rival’s strategy, while the second inequality follows from the definition of the best response function K(Dw ). By the dual argument, starting with Dk0 = inf(D) = {0, ..., 0} delivers the equilibrium that is most preferred by Wal-Mart. To search for the equilibrium that favors Wal-Mart in the southern region and Kmart in the rest of the country, one uses the same algorithm to solve the game separately for the south and the other regions. 2.5.3 Adding small firms Incorporating small firms into the game is a straightforward application of backward induction, since the number of small firms in the second stage is a well-defined function 2.6 Empirical implementation 33 Ns (Dk , Dw ). Chain i’s profit function now becomes Πi (Di , Dj ) = ΣM m=1 [Di,m ∗ (Xim + D i,l δ ii Σl6=m Zml + δ ij Dj,m + δ is ln(Ns (Di,m , Dj,m ) + 1)], where Xim is defined in the previous subsection. 
The profit function $\Pi_i(D_i, D_j)$ remains supermodular in $D_i$ with decreasing differences in $(D_i, D_j)$ under a minor assumption, which essentially requires that the net competition effect of rival $D_j$ on chain $i$’s profit is negative.19 The main computational burden in solving the full model with both chains and small retailers is the search for the best responses $K(D_w)$ and $W(D_k)$. In appendix 2.9.6, I discuss a few technical details related to the implementation.

19 If we ignore the integer problem and approximate $\ln(N_s + 1)$ by $-(X_{sm} + \delta_{sk} D_k + \delta_{sw} D_w)/\delta_{ss}$, then the assumption is: $\delta_{kw} - \delta_{ks} \frac{\delta_{sw}}{\delta_{ss}} < 0$, $\delta_{wk} - \delta_{ws} \frac{\delta_{sk}}{\delta_{ss}} < 0$. Essentially, these two conditions imply that when there are small stores, the ‘net’ competition effect of Wal-Mart (its direct impact, together with its indirect impact working through small stores) on Kmart’s profit and that of Kmart on Wal-Mart’s profit are still negative.

2.6 Empirical implementation

2.6.1 Estimation

The model does not yield a closed form solution to firms’ location choices conditioning on market size observables and a given vector of parameter values. Hence I turn to simulation methods. The ones most frequently used in the I.O. literature are the method of simulated log-likelihood (MSL) and the method of simulated moments (MSM). Implementing MSL is difficult because of the complexities in obtaining an estimate of the log-likelihood of the observed sample. The cross-sectional dependence among the observed outcomes in different markets implies that the log-likelihood of the sample is no longer the sum of the log-likelihoods of the individual markets, and one needs an exceptionally large number of simulations to get a reasonable estimate of the sample’s likelihood. Thus I adopt the MSM method to estimate the parameters in the profit functions $\theta_0 = \{\beta_i, \delta_{ii}, \delta_{ij}, \rho\}_{i=k,w,s} \in \Theta \subset \mathbb{R}^P$.
The following moment condition is assumed to hold at the true parameter value $\theta_0$:

$$E[g(X_m, \theta_0)] = 0$$

where $g(X_m, \cdot) \in \mathbb{R}^L$ with $L \geq P$ is a vector of moment functions that specifies the differences between the observed equilibrium market structures and those predicted by the model.

An MSM estimator $\hat{\theta}$ minimizes a weighted quadratic form in $\sum_{m=1}^{M} \hat{g}(X_m, \theta)$:

$$\min_{\theta \in \Theta} \frac{1}{M} \left[ \sum_{m=1}^{M} \hat{g}(X_m, \theta) \right]' \Omega \left[ \sum_{m=1}^{M} \hat{g}(X_m, \theta) \right] \quad (2.4)$$

where $\hat{g}(\cdot)$ is a simulated estimate of the true moment function, and $\Omega$ is an $L \times L$ positive semidefinite weighting matrix. Assume $\Omega \xrightarrow{p} \Omega_0$, an $L \times L$ positive definite matrix. Define the $L \times P$ matrix $G_0 = E[\nabla_\theta g(X_m, \theta_0)]$. Under some mild regularity conditions, Pakes and Pollard (1989) and McFadden (1989) show that:

$$\sqrt{M}(\hat{\theta} - \theta_0) \xrightarrow{d} \text{Normal}\left(0, (1 + R^{-1}) A_0^{-1} B_0 A_0^{-1}\right) \quad (2.5)$$

where $R$ is the number of simulations, $A_0 \equiv G_0' \Omega_0 G_0$, $B_0 = G_0' \Omega_0 \Lambda_0 \Omega_0 G_0$, and $\Lambda_0 = E[g(X_m, \theta_0) g(X_m, \theta_0)'] = \text{Var}[g(X_m, \theta_0)]$. If a consistent estimator of $\Lambda_0^{-1}$ is used as the weighting matrix, the MSM estimator $\hat{\theta}$ is asymptotically efficient, with its asymptotic variance being $\text{Avar}(\hat{\theta}) = (1 + R^{-1}) (G_0' \Lambda_0^{-1} G_0)^{-1} / M$.

The obstacle in using this standard MSM method is that the moment functions $g(X_m, \cdot)$ are no longer independent across markets when the chain effect induces spatial correlation in the equilibrium outcome. For example, Wal-Mart’s entry decision in Benton County, Arkansas directly relates to its entry decision in Carroll County, Arkansas, Benton’s neighbor. In fact, any two entry decisions, $D_{i,m}$ and $D_{i,l}$, are correlated because of the chain effect, although the dependence becomes very weak when market $m$ and market $l$ are far apart, since the benefit $D_{i,l}/Z_{ml}$ evaporates with distance. The MSM estimator remains consistent with such dependent data, but the covariance matrix needs to be corrected.
In particular, the asymptotic covariance matrix of the moment functions $\Lambda_0$ in equation (2.5) should be replaced by $\Lambda_0^d = \sum_{s \in M} E[g(X_m, \theta_0) g(X_s, \theta_0)']$. Conley (1999) proposes a nonparametric covariance matrix estimator formed by taking a weighted average of spatial autocovariance terms, with zero weights for observations farther than a certain distance. The method requires the underlying data generating process to satisfy a mixing condition that the dependence among observations dies away quickly as the distance increases. Following Conley (1999) and Conley & Ligon (2002), the estimator of $\Lambda_0^d$ is:

$$\hat{\Lambda} \equiv \frac{1}{M} \sum_m \sum_{s \in B_m} \left[ \hat{g}(X_m, \theta) \hat{g}(X_s, \theta)' \right] \quad (2.6)$$

where $B_m$ is the set of markets whose centroid is within fifty miles of market $m$, including market $m$. I have also estimated the variance of the moment functions $\hat{\Lambda}$ summing over markets within a hundred miles. All of the parameters that are significant with the smaller set of $B_m$ remain significant, and the changes in the t-statistics are very small.

The estimation procedure is as follows. Start from some initial guess of the parameter values, and draw from the normal distribution four independent vectors: a vector of the market level errors $\{\varepsilon_m\}_{m=1}^{M}$ and three vectors of firm-specific errors $\{\eta_{k,m}\}_{m=1}^{M}$, $\{\eta_{w,m}\}_{m=1}^{M}$, and $\{\eta_{s,m}\}_{m=1}^{M}$. Obtain the simulated profits $\hat{\Pi}_i$, $i = k, w, s$ and solve for $\hat{D}_k$, $\hat{D}_w$, $\hat{N}_s$. Repeat the simulation $R$ times and formulate $\hat{g}(X_m, \theta)$. Search for parameter values that minimize the objective function (2.4), while using the same set of simulation draws for all values of $\theta$. To implement the two-step efficient estimator, I use the identity weighting matrix to find a preliminary estimate $\tilde{\theta}$, which is then substituted in equation (2.6) to compute the optimal weight matrix $\hat{\Lambda}^{-1}$ for the second step.
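To make the objective (2.4) and the spatial covariance estimator (2.6) concrete, here is a minimal pure-Python sketch. The function names, the list-of-lists layout of the simulated moments, and the two-market toy data are assumptions for illustration; the indicator weighting (every market within the cutoff gets weight one) follows equation (2.6) as written.

```python
def msm_objective(g_hat, Omega):
    """Objective (2.4): (1/M) * s' Omega s, where s = sum_m g_hat[m].
    g_hat is a list of M moment vectors of length L; Omega is L x L."""
    M, L = len(g_hat), len(g_hat[0])
    s = [sum(g[j] for g in g_hat) for j in range(L)]
    return sum(s[i] * Omega[i][j] * s[j] for i in range(L) for j in range(L)) / M

def conley_covariance(g_hat, dist, cutoff=50.0):
    """Estimator (2.6): (1/M) * sum_m sum_{s in B_m} g_m g_s',
    where B_m holds the markets within `cutoff` miles of m (m included)."""
    M, L = len(g_hat), len(g_hat[0])
    Lam = [[0.0] * L for _ in range(L)]
    for m in range(M):
        for s in range(M):
            if dist[m][s] <= cutoff:
                for i in range(L):
                    for j in range(L):
                        Lam[i][j] += g_hat[m][i] * g_hat[s][j] / M
    return Lam

# Hypothetical toy input: two markets, two moments each.
g = [[1.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
far = [[0.0, 100.0], [100.0, 0.0]]    # markets 100 miles apart: no cross terms
near = [[0.0, 10.0], [10.0, 0.0]]     # markets 10 miles apart: cross terms enter

print(msm_objective(g, identity))     # first-step objective with identity weights
print(conley_covariance(g, far))
print(conley_covariance(g, near))
```

In a two-step procedure of the kind described above, the inverse of the matrix returned by `conley_covariance` at the first-step estimate would serve as the second-step weighting matrix; the inversion itself is left out of this sketch.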
Instead of the usual machine-generated pseudo-random draws, I use Halton draws, which have better coverage properties and smaller simulation variances.20 According to Train (2000), 100 Halton draws achieve greater accuracy in his mixed logit estimation than 1000 pseudo-random draws. The parameter estimation exploits 150 Halton simulation draws while the variance is calculated with 300 Halton draws.

20 A Halton sequence is defined in terms of a given number, usually a prime. As an illustration, consider the prime 3. Divide the unit interval evenly into three segments. The first two terms in the Halton sequence are the two break points: 1/3 and 2/3. Then divide each of these three segments into thirds, and add the break points for these segments into the sequence in a particular way: 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9. Note that the lower break points in all three segments (1/9, 4/9, 7/9) are entered in the sequence before the higher break points (2/9, 5/9, 8/9). Then each of the 9 segments is divided into thirds, and the break points are added to the sequence: 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27, 10/27, 19/27, 4/27, 13/27, 22/27, and so on. This process is continued for as many points as the researcher wants to obtain. See chapter 9 of “Discrete Choice Methods with Simulation (2003)” by Kenneth Train for an excellent discussion of the Halton draws.

There are twenty-six parameters with the following set of moments that match the predicted and the observed values of a) numbers of Kmart stores, Wal-Mart stores, and small firms; b) various kinds of market structures (for example, only a Wal-Mart store but no Kmart stores); c) the number of chain stores in the nearby markets; and d) the interaction between the market size variables and the above items.
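The construction described in footnote 20 can be sketched in a few lines. The function below computes the first terms of a Halton sequence with exact fractions so that the output can be compared directly with the footnote. The helper `halton_normal`, which maps the points to standard normal draws through the inverse normal CDF, is an assumption about how such points would typically be converted; the text only states that the errors are drawn from the normal distribution.

```python
from fractions import Fraction
from statistics import NormalDist

def halton(n, base=3):
    """First n terms of the Halton sequence for `base`:
    term i is the base-`base` digits of i mirrored around the radix point."""
    seq = []
    for i in range(1, n + 1):
        x, f, k = Fraction(0), Fraction(1, base), i
        while k > 0:
            x += (k % base) * f   # lowest digit gets the largest weight
            k //= base
            f /= base
        seq.append(x)
    return seq

def halton_normal(n, base=3):
    # Assumed conversion: push each uniform point through the inverse normal CDF.
    std_normal = NormalDist()
    return [std_normal.inv_cdf(float(u)) for u in halton(n, base)]

print([str(u) for u in halton(14, 3)])
print(halton_normal(4, 3))
```

Using a different prime for each of the four error vectors is a common way to keep the dimensions from moving together, though the text does not say which primes were used.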
2.6.2 Discussion: a closer look at the assumptions

Now I discuss several assumptions of the model: the game’s information structure and issues of multiple equilibria, the symmetry assumption for small firms, and the non-negativity of the chain effect.

Information structure and multiple equilibria

In the empirical entry literature, a common approach is to assume complete information and simultaneous entry. One problem with this approach is the presence of multiple equilibria, which has posed considerable challenges to estimation. Some researchers look for features that are common among different equilibria. For example, Bresnahan and Reiss (1990 and 1991) and Berry (1992) point out that although firm identities differ across different equilibria, the number of entering firms might be unique. Grouping different equilibria by their common features leads to a loss of information and less efficient estimates. Further, common features are increasingly difficult to find when the model becomes more realistic. Others give up point identification of parameters and search for bounds, as in Andrews, Berry and Jia (2004), Chernozhukov, Hong, and Tamer (2004), Pakes, Porter, Ho, and Ishii (2005), and Shaikh (2005). However, a meaningful bound might be difficult to obtain in complicated models such as the one employed here, which involves three sets of profit functions with twenty-six parameters.

Given the above considerations, I choose an equilibrium that seems reasonable a priori. In the baseline specification, I estimate the model using the equilibrium that is most profitable for Kmart because Kmart derives from an older entity and historically might have had a first-mover advantage. As a robustness check, I experiment with two other cases. The first one chooses the equilibrium that is most profitable for Wal-Mart.
This is the direct opposite of the baseline specification and is inspired by the hindsight of Wal-Mart’s success. The second one selects the equilibrium that is most profitable for Wal-Mart in the south and most profitable for Kmart in the rest of the country. This is based on the observation that the northern regions had been Kmart’s backyard until recently while Wal-Mart started its business from the south and has expertise in serving the southern population. The estimated parameters for the different cases are very similar to one another, which provides evidence that the results are robust to the equilibrium choice.

The symmetry assumption for small firms

I have assumed that all small firms are symmetric with the same profit function. The assumption is necessitated by data availability, since I do not observe any firm characteristics for small firms. Making this assumption greatly reduces the complexity of the model with asymmetric competition effects, as it guarantees that in the second stage the equilibrium number of small firms in each market is unique.

The chain effect $\delta_{ii}$

The assumption that $\delta_{ii} \geq 0$, $i \in \{k, w\}$ is crucial to the solution algorithm, since it implies that the function $V(D)$ defined by the necessary condition (2.3) is increasing, and that the profit function (2.1) is supermodular in chain $i$’s own strategy. These results allow me to employ two powerful theorems, Tarski’s fixed point theorem and Topkis’s theorem, to solve a complicated problem that is otherwise unmanageable. The parameter $\delta_{ii}$ does not have to be a constant. It can be region specific, or it can vary with the size of each market (for example, interacting with population), as long as it is weakly positive. However, the algorithm breaks down if either $\delta_{kk}$ or $\delta_{ww}$ becomes negative, and it excludes scenarios where the chain effect is positive in some regions and negative in others.
The discussion so far has focused on the beneficial aspect of locating stores close to each other. In practice, stores begin to compete for consumers when they get too close. As a result, chains face two opposing forces when making location choices: the chain effect and the business stealing effect. It is conceivable that in some areas stores are so close that the business stealing effect outweighs the gains and $\delta_{ii}$ becomes negative.

Holmes (2005) estimates that for places with a population density of 20,000 people per five-mile radius (which is comparable to an average city in my sample counties), 89% of average consumers visit a Wal-Mart right nearby.21 When the distance increases to 5 miles, 44% of the consumers visit the store. The percentage drops to 7% if the store is 10 miles away. Survey studies also show that few consumers drive farther than 10-15 miles for general merchandise shopping. In my sample, the median distance to the nearest store is 21 miles for Wal-Mart stores, and 27 miles for Kmart stores. It seems reasonable to think that the business stealing effect, if it exists, is small.

21 This is the result from a simulation exercise where the distance is set to 0 miles.

2.7 Results

2.7.1 Parameter estimates

The sample includes 2065 small- and medium-sized counties with populations between 5,000 and 64,000. Even though I do not model Kmart’s and Wal-Mart’s entry decisions in other counties, I incorporate into the profit function the spillover from stores outside the sample. This is especially important for Wal-Mart, as the number of Wal-Mart stores in big counties doubled over the sample period. Table 1 (C) displays the summary statistics of the distance weighted numbers of adjacent Kmart stores $\sum_{l \neq m, l \in B_m} D_{k,l}/Z_{ml}$ and Wal-Mart stores $\sum_{l \neq m, l \in B_m} D_{w,l}/Z_{ml}$, which measure the spillover from nearby stores (including stores outside the sample).
In 1997, the Kmart spillover variable remained roughly the same as in 1988, but the Wal-Mart spillover variable was almost twice as big as in 1988.

The profit functions of all retailers share three common explanatory variables: log of population, log of real retail sales per capita, and the percentage of population that is urban. Many studies have found a pure size effect: there tend to be more stores in a market as the population increases. Retail sales per capita capture the “depth” of a market and explain firm entry behavior better than personal income does. The percentage of urban population measures the degree of urbanization. It is generally believed that urbanized areas have more shopping districts that attract big chain stores.

For Kmart, the profit function includes a dummy variable for the Midwest regions. Kmart’s headquarters are located in Troy, Michigan. Until the mid 1980s, this region had always been the “backyard” of Kmart stores. Similarly, Wal-Mart’s profit function includes a southern dummy, as well as the log of distance in miles to its headquarters in Benton County, Arkansas. This distance variable turns out to be a useful predictor for Wal-Mart stores’ location choices. For small firms, everything else equal, there are more small firms in the southern states. It could be that there have always been fewer big retail stores in the southern regions and that people rely on neighborhood small firms for day-to-day shopping.

All coefficients (the market size coefficients $\beta_i$, the competitive effects $\delta_{ij}$, and the chain effect $\delta_{ii}$) are allowed to be firm-specific. I report the probit coefficients (for Kmart and Wal-Mart) and the ordered probit coefficients (for small retailers) in Table 2 and the coefficients from the full model in Table 3. There are six probit (ordered probit) regressions, one for each player in each year.
The market size coefficients have the same signs as those from the full model, but the competition effects are biased toward zero and the chain effects have the wrong signs.

Table 3 lists the parameter estimates from the full model for 1988 and 1997. In this subsection I focus on the $\beta$’s; in the next one I discuss $\delta_{ij}$ and $\delta_{ii}$ in detail. Coefficients for market size variables are highly significant and intuitive, with the exception of the urban variable in the small firms’ profit function, which suggests fewer small firms locate in more urbanized areas. $\rho$ is much smaller than 1, indicating the importance of the market level error terms and the necessity of controlling for endogeneity of all firms’ entry decisions.

The model is estimated three times, each time with a different equilibrium. Tables 4 (A) and 4 (B) present the three sets of estimates for 1988 and 1997, respectively. In both tables, column one corresponds to the equilibrium most preferred by Kmart; column two uses the equilibrium most preferred by Wal-Mart; column three chooses the one that grants Wal-Mart an advantage in the southern regions and Kmart an advantage in the rest of the country. The estimates are very similar across the different equilibria.

Tables 5(A) and 5(B) display the model’s goodness of fit. In Table 5(A), columns one and three display the sample averages, while the other two columns list the model’s predicted averages. The model matches exactly the actual average numbers of Kmart and Wal-Mart stores for 1988, and comes very close to them for 1997. The number of small firms is a noisy variable and is much harder to predict. Its sample median was 3, but the maximum was 25 in 1988 and 19 in 1997. The model does a decent job of fitting the data: the sample average was 3.86 per county in 1988 and 3.49 per county in 1997; the model’s predictions are 3.79 and 3.43, respectively. Such results might be expected as the parameters are chosen to match these moments.
In Table 5(B), I report the correlations between the predicted and observed numbers of Kmart stores, Wal-Mart stores, and small firms in each market. The correlations are between 0.63 and 0.75. These correlations are not included in the set of moment functions, and a high correlation indicates a good fit. Overall, the model explains the data well.

To check whether the estimates are reasonable, Table 6 lists the model’s predicted profits and compares them with the accounting profits documented in Kmart’s and Wal-Mart’s SEC 10-K annual reports. According to the model, the average profit of Wal-Mart stores grew 51% over the sample period, which is consistent with the recorded increase of 41% in Wal-Mart’s annual reports. Kmart’s accounting profit in 1997 was substantially smaller than that in 1988, due to the financial obligations of divesting several specialized retailing businesses that were overall a financial disappointment. The average real sales per Kmart store increased 2.6% over the ten-year period. Considering the various increases in the operating costs documented in Kmart’s annual report, the change in its store sales revenue is compatible with the 8% decrease in the store profit predicted by the model.22,23

22 The model predicted profit for Kmart (Wal-Mart) is the average of the equilibrium profit over all Kmart (Wal-Mart) stores. There are two caveats in the comparison of the profit growth predicted by the model and the reported growth of the accounting profit. First, there is no real unit for the profit derived from the structural profit function: it is scaled by one standard deviation of the unobserved error term in the profit function. The 51% increase in the model predicted profit from 1988 to 1997 assumes that the standard deviation of the error term did not change over this sample period. Second, the 41% growth in the accounting profit is averaged over all stores, including stores in counties outside the sample. The calculation here assumes that profit growth is roughly the same for stores in the sample counties and stores outside the sample counties.

23 Wal-Mart’s 1988 and 1997 annual reports do not separate the profit of Wal-Mart stores from the profit of Sam’s clubs. Since the gross markup in Sam’s clubs is half of that in the regular Wal-Mart discount stores, two dollars of sales from a Sam’s club are assumed to contribute to the total profit the same as one dollar of sales from a Wal-Mart discount store.

To understand the magnitudes of the market size coefficients, I report in Table 7 the changes in the number of each type of stores when each market size variable changes. For example, to derive the effect of population change on the number of small firms, I fix Kmart’s and Wal-Mart’s profits, increase small retailers’ profit in accordance with a ten percent increase in population, and re-solve for the new equilibrium number of small stores.

The market size variables have a relatively modest impact on the number of small businesses. In 1988, a 10% increase in population attracts 8.7% more firms. The same increase in real retail sales per capita draws 5.4% more firms. The number of small firms declines by about 0.7% when the percentage of urban population goes up by 10%. In comparison, the regional dummy is much more important: everything else equal, changing the southern dummy from 1 for all counties to 0 for all counties leads to 37.2% fewer small firms (6136 small stores vs. 9775 small stores).

Market size variables seem to matter more for big chains. In 1988, a 10% growth in population induces Kmart to enter 12.1% more markets and Wal-Mart 10.3% more markets. A similar increment in retail sales attracts entry of Kmart and Wal-Mart stores in 17.6% and 10.5% more markets, respectively. The results are similar for 1997.
These differences indicate that Kmart is much more likely to locate in bigger markets, while Wal-Mart thrives in smaller markets. Perhaps not surprisingly, the regional advantage is substantial for both chains: controlling for the market size, changing the Midwest regional dummy from 1 to 0 for all counties leads to 28% fewer Kmart stores, and changing the Southern regional dummy from 1 to 0 for all counties leads to 52.5% fewer Wal-Mart stores. When distance increases by 10%, the number of Wal-Mart stores drops by 8.4%. Wal-Mart’s “home advantage” is much smaller in 1997: everything else the same, changing the south dummy from 1 to 0 for all counties leads to 22% fewer Wal-Mart stores. The regional dummies and the distance variable provide a reduced-form way for the static model to capture the path-dependence of the expansion of Wal-Mart stores.

2.7.2 The competition effect and the chain effect

As shown in Table 3, all of the competition effects and the chain effects, with the exception of the impact of small firms on chain stores, are precisely estimated. The estimates display several noticeable features.

First, the negative impact of Kmart on Wal-Mart’s profit $\delta_{wk}$ in absolute value is much smaller in 1997 than in 1988, while the opposite is true for Wal-Mart’s impact on Kmart’s profit $\delta_{kw}$.24 Both a Cournot model and a Bertrand model with differentiated products predict that a reduction in rivals’ marginal costs drives down a firm’s own profit. I do not observe firms’ marginal costs, but these parameter estimates are consistent with evidence that Wal-Mart’s marginal cost was declining relative to Kmart’s over the sample period. Wal-Mart is famous for its cost-sensitive culture; it is also keen on technology advancement. Holmes (2001) cites evidence that Wal-Mart has been a leading investor in information technology. In contrast, Kmart struggled with management failures that resulted in stagnant sales revenue, and it either delayed or abandoned store renovation plans throughout the 1990s.

24 Due to the high variance of the estimates, the difference is not statistically significant.

Second, it is somewhat surprising that the negative impact of Kmart on small firms’ profit $\delta_{sk}$ is comparable to Wal-Mart’s impact $\delta_{sw}$, considering the controversies and media reports generated by Wal-Mart. The outcry about Wal-Mart was probably because Wal-Mart had more stores in small- to medium-sized markets where the effect of a big store entry was felt more acutely, and because Wal-Mart kept expanding, while Kmart was consolidating its existing stores with few net openings in these markets over the sample period.

Third, the coefficient for Wal-Mart’s chain effect $\delta_{ww}$ is smaller in 1997, although the overall effect is bigger, as $\sum_{l \neq m, l \in B_m} D_{w,l}/Z_{ml}$ is almost twice as large in 1997 as in 1988.25 The decline in $\delta_{ww}$ suggests that the benefit of scale economies does not grow proportionally. In fact there are good reasons to believe it might not be monotone because, as discussed in section 2.6.2, when chains grow bigger and saturate the area, cannibalization among stores becomes a stronger concern.

25 The difference between $\delta_{ww}$ in 1988 and $\delta_{ww}$ in 1997 is not statistically significant.

To better assess the magnitude of the competition effects, Table 8(A) and Table 8(B) re-solve the model for different market structures. The results from Table 8(A) suggest that chains have a substantial competition impact on small firms. In 1988, compared with the scenario where there are neither Kmart nor Wal-Mart stores, adding a Kmart store to each market reduces the number of small firms by 46%, or 2.69 firms per county; adding a Wal-Mart store reduces it by 43.3%, or 2.53 firms per county. When both a Kmart and a Wal-Mart store enter, the number of small firms plummets by 71.4%, a reduction of 4.17 firms per county.
If Wal-Mart takes over Kmart, the number of small firms is 12.3% higher than that observed in the sample when Wal-Mart and Kmart compete against each other.26 This is due to a business stealing effect between the competing chains: when the two chains merge or one chain takes over the other, the joint-profit-maximizing number of chain stores is smaller, which in turn leads to a larger number of small firms.

26 In this counter-factual exercise, Wal-Mart becomes the monopoly chain and competes with small retailers. The total number of chain stores is just the total number of Wal-Mart stores in the new equilibrium, and is smaller than the sum of Wal-Mart stores and Kmart stores before the take-over.

The patterns are quite similar in 1997: compared with the case of no chain stores, adding a Kmart store to each market decreases the number of small firms by 36.2%, or 1.92 per county; adding a Wal-Mart, 37%, or 1.96 per county; adding both a Kmart and a Wal-Mart store, 61.6%, or 3.27 per county. Even with the conservative estimate that one Kmart or Wal-Mart store displaces 40% of the small firms, the competition effect of chains on small retailers is sizable, especially since the small discount firms form only a segment of the retailers affected by the entry of chain stores. The combined effect on all small retailers and local communities in general can be much larger.

Table 8 (B) illustrates the competition effect between Kmart and Wal-Mart. Consistent with the changes in $\delta_{kw}$ and $\delta_{wk}$ from 1988 to 1997, the effect of Kmart’s presence on Wal-Mart’s profit is much stronger in 1988, while the effect of Wal-Mart’s presence on Kmart’s profit is stronger in 1997. For example, in 1988, Wal-Mart would only enter 318 markets if there were a Kmart store in every county. When Kmart ceases to exist as a competitor, the number of markets with Wal-Mart stores rises to 846, a net increase of 166%.
The same experiment in 1997 leads Wal-Mart to enter 39.3% more markets, from 728 to 1014. The pattern is reversed for Kmart. In 1988, Kmart would enter 42.6% more markets when there are no Wal-Mart stores compared with the case of one Wal-Mart store in every county (479 Kmart stores vs. 336 Kmart stores); in 1997, Kmart would enter 87.1% more markets for the same experiment (610 Kmart stores vs. 326 Kmart stores).27

27 In solving for the number of Wal-Mart (Kmart) stores when Kmart (Wal-Mart) exits, I allow the small firms to compete with the remaining chain.

To examine the importance of the chain effect for both chains, consider Table 8 (C). The first row reports the percentage of store profit due to the chain effect; this is the average of $\delta_{ii} \sum_{l \neq m, l \in B_m} D_{i,l}/Z_{ml}$ divided by the average store profit (reported in Table 6). For both chains, the chain effect contributes more than 10% of the store profit. For example, it accounts for 10.2% of Wal-Mart’s profit in 1988, and 12.3% of its profit in 1997. To derive the equilibrium number of stores when there is no chain effect, I set $\delta_{ii} = 0$ for the targeted chain, but keep the rival’s $\delta_{jj}$ unchanged and re-solve the model. In 1988, without the chain effect, the number of Kmart stores would have decreased by 40, and Wal-Mart would have entered 125 fewer markets. The numbers are comparable for 1997. This result is consistent with Holmes (2005), who also found scale economies to be important. Given the magnitude of these spillover effects, further research that explains their mechanism will help improve our understanding of the retail industry, in particular its productivity gains over the past several decades.28

28 See Foster et al (2002).
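The percentage changes quoted for the Table 8 (B) counterfactuals follow directly from the store counts given in the text. The short check below recomputes them; the helper name `pct_increase` and the dictionary layout are illustrative only, while the store counts are those reported above.

```python
def pct_increase(new, old):
    """Percentage increase from `old` to `new`."""
    return 100.0 * (new - old) / old

# (markets entered without the rival, markets entered with the rival in every county)
counterfactuals = {
    "Wal-Mart 1988": (846, 318),
    "Wal-Mart 1997": (1014, 728),
    "Kmart 1988": (479, 336),
    "Kmart 1997": (610, 326),
}

changes = {name: round(pct_increase(new, old), 1)
           for name, (new, old) in counterfactuals.items()}
for name, change in changes.items():
    print(f"{name}: {change}% more markets when the rival is absent")
```

The recomputed figures agree with the 166%, 39.3%, 42.6%, and 87.1% reported in the text.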
2.7.3 The impact of Wal-Mart’s expansion and related policy issues

Consistent with media reports about Wal-Mart’s impact on small retailers, the model predicts that Wal-Mart’s expansion contributes to a large percentage of the net decline in the number of small firms over the sample period. The first row in Table 9 (A) records the net decrease of 748 small firms observed over the sample period, or 0.36 per market. To evaluate the impact of Wal-Mart’s expansion on small firms separately from other factors (e.g., the change in market sizes or the change in Kmart stores), I re-solve the model using the 1988 coefficients for Kmart’s and small firms’ profit functions and the 1988 market size variables, but the 1997 coefficients for Wal-Mart’s profit function. The experiment corresponds to holding everything the same as in 1988, but allowing Wal-Mart to become more efficient and expand. The predicted number of small firms falls by 558 from the model prediction using the 1988 coefficients for Wal-Mart’s profit function. This accounts for 75% of the observed decrease in the number of small firms. Conducting the same experiment but using the 1997 coefficients for Kmart’s and small firms’ profit functions, the 1997 market size variables, and the 1988 coefficients for Wal-Mart’s profit function, I find that Wal-Mart’s expansion accounts for 383 stores, or 51% of the observed decrease in the number of small firms.

If we ignore the endogeneity of chains’ entry decisions and regress the number of small firms on the number of chains together with the market size variables, we would underestimate the impact of Wal-Mart’s expansion on small retailers by a large amount.
For example, using the coefficients from an ordered probit model applied to the 1988 data, the difference between the expected number of small firms using Wal-Mart’s 1988 store number and the expected number of small firms using Wal-Mart’s 1997 store number explains only 33% of the observed decline in the number of small firms. Using the coefficients from the same ordered probit model applied to the 1997 data, Wal-Mart’s expansion between 1988 and 1997 accounts for only 20% of the observed decline in the number of small firms.29 Overall, ignoring the endogeneity of chains’ entry decisions underestimates the competition effect by fifty to sixty percent.

Using the conservative figure of 383 stores, the absolute impact of Wal-Mart’s entry seems modest. However, the exercise here includes only small firms in the discount sector. Both Kmart and Wal-Mart carry a large assortment of products and compete with a variety of stores, like hardware stores, houseware stores, apparel stores, etc., so that their impact on local communities is conceivably much larger. To examine the overall impact of Wal-Mart’s expansion, one needs to include a separate profit function for firms in each of these other categories and estimate the system of profit functions jointly.

Government subsidy has long been a policy instrument to encourage firm investment and to create jobs. To evaluate the effectiveness of this policy in the discount retailing sector, I simulate the equilibrium numbers of stores when various firms are subsidized. The results in Table 10 indicate that direct subsidies do not seem to be effective in generating jobs.
In 1988, subsidizing Wal-Mart stores 10% of their average profit, which amounts to one million dollars, increases the number of Wal-Mart stores per county only from 0.31 to 0.34.30 With the average Wal-Mart store hiring fewer than 300 full and part-time employees, 29 The ordered probit regressions use the same right-hand side variables as the structural model. See Table 2 for the coefficients from the ordered probit regressions. 30 The average Wal-Mart store’s net income in 1988 is about one million in 2004 dollars (see Table 6). Using a discount rate of 10%, the discounted present value of a store’s lifetime profit is about ten million. 2.8 Conclusion and future work 50 the additional number of stores translates to at most nine new jobs.31 Similarly, subsidizing all small firms by 100% of their average profit increases their number from 3.79 to 4.61, and generates eight jobs if on average a small firm hires ten employees. Together, these exercises suggest that a direct subsidy should be used with caution if it is designed to increase employment in this industry. 2.8 Conclusion and future work I have examined the competition effect of chain stores on small firms and the role of the chain effect in firms’ entry decisions. The results support the anecdotal evidence that “big drives out small.” On average, entry by either a Kmart or a Wal-Mart store displaces forty to fifty percent of the small discount firms. Wal-Mart’s expansion from the late 1980s to the late 1990s explains fifty to seventy percent of the net change in the number of small discount firms. Failure to address the endogeneity of firms’ entry decisions would result in underestimating this impact by fifty to sixty percent. Furthermore, direct subsidies to either chains or small firms are not likely to be effective in creating jobs and should be used with caution. These results reinforce the concerns raised by many policy observers regarding the subsidies directed to big retail corporations. 
Perhaps less obvious is the conclusion that subsidies toward small retailers should also be designed carefully.

Like Holmes (2005), I find that scale economies, as captured by the chain effect, generate substantial benefits. Studying these scale economies in more detail is useful for helping firms exploit such advantages and for guiding merger policies or other regulations that affect chains. A better understanding of the mechanism underlying these spillover effects will also help us to gain insight into the productivity gains in the retail industry over the past several decades.

Finally, the algorithm used in this chapter can be applied to many industries where scale economies are important. One application is the airline industry, where the network of flight routes exhibits a type of spillover effect similar to the one described here. For example, adding a route from New York to Boston directly affects profits of flights that either originate from or end in Boston and New York. The tools proposed in this chapter can be deployed to extend current models of strategic interaction among airlines to incorporate such network effects. Another possible application is to industries with cost complementarity among different products. The algorithm here is particularly suitable for modeling firms’ product choices when the product space is large.

[30, continued] A subsidy of 10% is equivalent to one million dollars.

[31] The equilibrium numbers of Kmart stores and small firms decrease slightly when Wal-Mart is subsidized, but the implied change in employment is tiny.

2.9 Appendix A: definitions and proofs

2.9.1 Verification of the necessary condition (2.3)

Let $D^* = \arg\max_{D \in \mathbf{D}} \Pi(D)$. The optimality of $D^*$ implies the following set of necessary conditions:

\[
\Pi(D_1^*, \ldots, D_{m-1}^*, D_m^*, D_{m+1}^*, \ldots, D_M^*) \ge \Pi(D_1^*, \ldots, D_{m-1}^*, D_m, D_{m+1}^*, \ldots, D_M^*), \quad \forall m,\; D_m \ne D_m^*
\]

Let $\hat{D} = \{D_1^*, \ldots, D_{m-1}^*, D_m, D_{m+1}^*, \ldots, D_M^*\}$. $\Pi(D^*)$ differs from $\Pi(\hat{D})$ in two parts: the profit in market $m$, and the profit in all other markets through the chain effect:

\[
\begin{aligned}
\Pi(D^*) - \Pi(\hat{D}) &= (D_m^* - D_m)\Big[X_m + \delta \sum_{l \ne m} \frac{D_l^*}{Z_{ml}}\Big] + \delta \sum_{l \ne m} D_l^* \Big(\frac{D_m^*}{Z_{lm}}\Big) - \delta \sum_{l \ne m} D_l^* \Big(\frac{D_m}{Z_{lm}}\Big) \\
&= (D_m^* - D_m)\Big[X_m + 2\delta \sum_{l \ne m} \frac{D_l^*}{Z_{ml}}\Big]
\end{aligned}
\]

where $Z_{ml} = Z_{lm}$ due to the symmetry. Since $\Pi(D^*) - \Pi(\hat{D}) \ge 0$ for any $D_m \ne D_m^*$, we have $D_m^* = 1, D_m = 0$ if and only if $X_m + 2\delta \sum_{l \ne m} \frac{D_l^*}{Z_{ml}} \ge 0$; and $D_m^* = 0, D_m = 1$ if and only if $X_m + 2\delta \sum_{l \ne m} \frac{D_l^*}{Z_{ml}} \le 0$. Together they imply $D_m^* = 1\big[X_m + 2\delta \sum_{l \ne m} \frac{D_l^*}{Z_{ml}} \ge 0\big]$.

2.9.2 The set of fixed points of an increasing function that maps a lattice into itself

All of the definitions in this appendix (the definitions of a lattice, a complete lattice, supermodular functions, increasing differences, as well as induced set ordering) are taken from Topkis (1998).

The definition of a lattice involves the concepts of a partially ordered set, a join, and a meet. A partially ordered set is a set $\mathbf{X}$ on which there is a binary relation $\preceq$ that is reflexive, antisymmetric, and transitive. If two elements, $X'$ and $X''$, of a partially ordered set $\mathbf{X}$ have a least upper bound (greatest lower bound) in $\mathbf{X}$, it is their join (meet) and is denoted $X' \vee X''$ ($X' \wedge X''$).

Definition 2.1. A partially ordered set that contains the join and the meet of each pair of its elements is a lattice.

Definition 2.2. A lattice in which each nonempty subset has a supremum and an infimum is complete.

Any finite lattice is complete. A nonempty complete lattice has a greatest element and a least element. Tarski’s fixed point theorem, stated in the main body of the chapter as Theorem 2.1, establishes that the set of fixed points of an increasing function that maps from a nonempty complete lattice into itself is a nonempty complete lattice with a greatest element and a least element.
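The contrast between increasing and decreasing maps can be checked by brute force on a small strategy space. The sketch below is only an illustration: the values of `X`, `Z`, and `delta` are hypothetical, and the decreasing map encodes the three-firm example discussed next, under the assumption that the small firm's profit depends on $D_k$ and $D_w$ (which is what the listed values of $V$ imply).

```python
from itertools import product

import numpy as np


def fixed_points(V, n):
    """Enumerate every D in {0,1}^n that satisfies V(D) == D."""
    return [D for D in product((0, 1), repeat=n) if tuple(V(D)) == D]


# Increasing map from the necessary condition (2.3), with hypothetical numbers:
# V_m(D) = 1[X_m + 2*delta*sum_{l != m} D_l / Z_ml >= 0], delta >= 0.
X = np.array([-0.5, -0.5, -0.5])
Z = np.array([[1.0, 2.0, 4.0],
              [2.0, 1.0, 2.0],
              [4.0, 2.0, 1.0]])   # symmetric; the diagonal is not used
delta = 0.6
A = 1.0 / Z
np.fill_diagonal(A, 0.0)          # drop the l == m terms


def V_increasing(D):
    D = np.asarray(D, dtype=float)
    return tuple(int(b) for b in (X + 2 * delta * (A @ D) >= 0))


# Decreasing joint best response of the three-firm entry example.
def V_decreasing(D):
    Dk, Dw, Ds = D
    return (int(0.5 - Dw - 0.25 * Ds > 0),
            int(1 - 0.5 * Dk - 1.1 * Ds > 0),
            int(0.6 - 0.5 * Dw - 0.7 * Dk > 0))


print(fixed_points(V_increasing, 3))   # nonempty, with a least and a greatest element
print(fixed_points(V_decreasing, 3))   # empty
```

With these particular numbers the increasing map has exactly two fixed points, the all-zero and the all-one vector, which are also the least and greatest elements of the fixed-point set.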
For a counterexample where a decreasing function’s set of fixed points is empty, consider the following simplified entry model where three firms compete with each other and decide simultaneously whether to enter the market. Their joint strategy space is $\mathbf{D} = \{0, 1\}^3$. The profit functions are as follows:

\[
\begin{cases}
\Pi_k = D_k (0.5 - D_w - 0.25 D_s) \\
\Pi_w = D_w (1 - 0.5 D_k - 1.1 D_s) \\
\Pi_s = D_s (0.6 - 0.5 D_w - 0.7 D_k)
\end{cases}
\]

Let $D = \{D_k, D_w, D_s\} \in \mathbf{D}$, let $D_{-i}$ denote rivals’ strategies, let $V_i(D_{-i})$ denote the best response function for player $i$, and let $V(D) = \{V_k(D_{-k}), V_w(D_{-w}), V_s(D_{-s})\}$ denote the joint best response function. It is easy to show that $V(D)$ is a decreasing function that takes the following values:

\[
\begin{cases}
V(0,0,0) = \{1,1,1\}; \quad V(0,0,1) = \{1,0,1\}; \quad V(0,1,0) = \{0,1,1\}; \quad V(0,1,1) = \{0,0,1\} \\
V(1,0,0) = \{1,1,0\}; \quad V(1,0,1) = \{1,0,0\}; \quad V(1,1,0) = \{0,1,0\}; \quad V(1,1,1) = \{0,0,0\}
\end{cases}
\]

The set of fixed points for $V(D)$ is empty, since there does not exist a $D \in \mathbf{D}$ such that $V(D) = D$.

2.9.3 A tighter lower bound and upper bound for the optimal solution vector $D^*$

In section 2.5.1 I have shown that using $\inf(\mathbf{D})$ and $\sup(\mathbf{D})$ as starting points yields, respectively, a lower bound and an upper bound to $D^* = \arg\max_{D \in \mathbf{D}} \Pi(D)$. Here I introduce two bounds that are tighter. The lower bound builds on the solution to a constrained maximization problem:

\[
\max_{D_1, \ldots, D_M \in \{0,1\}} \Pi = \sum_{m=1}^{M} \Big[ D_m \Big(X_m + \delta \sum_{l \ne m} \frac{D_l}{Z_{ml}}\Big) \Big]
\quad \text{s.t. if } D_m = 1, \text{ then } X_m + \delta \sum_{l \ne m} \frac{D_l}{Z_{ml}} > 0
\]

The solution to this constrained maximization problem belongs to the set of fixed points of the vector function $\hat{V}(D) = \{\hat{V}_1(D), \ldots, \hat{V}_M(D)\}$, where $\hat{V}_m(D) = 1\big[X_m + \delta \sum_{l \ne m} \frac{D_l}{Z_{ml}} > 0\big]$. The function $\hat{V}(\cdot)$ is increasing and maps from $\mathbf{D}$ into itself: $\hat{V}: \mathbf{D} \to \mathbf{D}$. Using arguments similar to those in section 2.5.1, one can show that the convergent vector $\hat{D}$ using $\sup(\mathbf{D})$ as the starting vector is the greatest element of the set of fixed points.
Further, $\hat{D}$ achieves a higher profit than any other fixed point of $\hat{V}(\cdot)$, since by construction each non-zero element of the vector $\hat{D}$ adds to the total profit. Changing any non-zero element(s) of $\hat{D}$ to zero reduces the total profit.

To show that $\hat{D} \le D^*$, where $D^*$ is the solution to the original unconstrained maximization problem, we argue by contradiction. Since the maximum of an unconstrained problem is always at least as large as that of the corresponding constrained problem, we have $\Pi(D^*) \ge \Pi(\hat{D})$. Therefore, $D^*$ cannot be strictly smaller than $\hat{D}$, because any vector strictly smaller than $\hat{D}$ delivers a lower profit. Suppose $D^*$ and $\hat{D}$ are unordered. Let $D^{**} = D^* \vee \hat{D}$ (where “$\vee$” denotes the element-by-element max operation). The change from $D^*$ to $D^{**}$ increases total profit, because profits at markets with $D_m^* = 1$ do not decrease after the change, and profits at markets with $D_m^* = 0$ but $\hat{D}_m = 1$ increase from 0 to a positive number after the change. This contradicts the definition of $D^*$, so $\hat{D} \le D^*$.

Note that $V(\hat{D}) \ge \hat{D}$, where $V(\cdot)$ is as defined in section 2.5.1. This follows from

\[
V_m(\hat{D}) = 1\Big[X_m + 2\delta \sum_{l \ne m} \frac{\hat{D}_l}{Z_{ml}} \ge 0\Big] \ge 1\Big[X_m + \delta \sum_{l \ne m} \frac{\hat{D}_l}{Z_{ml}} > 0\Big] = \hat{D}_m, \quad \forall m,
\]

where the last equality holds because $\hat{D}$ is a fixed point of $\hat{V}(\cdot)$. The monotonicity of $V(\cdot)$, together with $\hat{D} \le D^*$ and $V(\hat{D}) \ge \hat{D}$, implies that the iteration process starting from $\hat{D}$ converges, and that the convergent vector (denoted $\hat{D}^T$) is a lower bound to $D^*$. $\hat{D}^T$ is a tighter lower bound than $D^L$ (discussed in section 2.5.1) because $\hat{D} \ge \inf(\mathbf{D})$, so $\hat{D}^T = V^{TT}(\hat{D}) \ge V^{TT}(\inf(\mathbf{D})) = D^L$, with $TT = \max\{T, T'\}$, where $T$ is the number of iterations from $\hat{D}$ to $\hat{D}^T$ and $T'$ is the number of iterations from $\inf(\mathbf{D})$ to $D^L$.
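The lower-bound construction above can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the implementation used in the chapter: the market characteristics `X`, the coordinates used to build the distances `Z`, and the value of `delta` are all simulated, and the optimum `D_star` is found by brute force, which is only feasible for a handful of markets.

```python
from itertools import product

import numpy as np


def profit(D, X, A, delta):
    """Pi(D) = sum_m D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    return float(D @ (X + delta * (A @ D)))


def iterate(step, D):
    """Apply a monotone map repeatedly until the vector stops changing."""
    while True:
        D_next = step(D)
        if np.array_equal(D_next, D):
            return D
        D = D_next


def lower_bounds(X, A, delta):
    """Return (tighter lower bound D_hat_T, lower bound D_L from inf(D))."""
    M = len(X)
    V_hat = lambda D: (X + delta * (A @ D) > 0).astype(float)      # constrained problem
    V = lambda D: (X + 2 * delta * (A @ D) >= 0).astype(float)     # necessary condition
    D_hat = iterate(V_hat, np.ones(M))        # greatest fixed point of V_hat
    return iterate(V, D_hat), iterate(V, np.zeros(M))


# Hypothetical data: eight markets with random profitability and locations.
rng = np.random.default_rng(0)
M, delta = 8, 0.3
X = rng.normal(-0.5, 1.0, size=M)
coords = rng.uniform(0.0, 10.0, size=(M, 2))
Z = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(Z, np.inf)                   # 1/inf = 0 removes the l == m terms
A = 1.0 / Z

D_hat_T, D_L = lower_bounds(X, A, delta)

# Brute-force optimum over all 2^M vectors, for comparison only.
D_star = max((np.array(D, dtype=float) for D in product((0, 1), repeat=M)),
             key=lambda D: profit(D, X, A, delta))

print("D_L     =", D_L.astype(int))
print("D_hat_T =", D_hat_T.astype(int))
print("D_star  =", D_star.astype(int))
```

The ordering $D^L \le \hat{D}^T \le D^*$ should hold element by element for any draw with a non-negative `delta`; the brute-force search is included only to make that comparison visible.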
Since the chain effect is bounded by zero (when there are no other stores anywhere) and $\delta \sum_{l \ne m} \frac{1}{Z_{ml}}$ (when there is a store in every market), we can find a tighter upper bound to $D^*$ by starting from the vector $\tilde{D} = \{\tilde{D}_m : \tilde{D}_m = 0 \text{ if } X_m + 2\delta \sum_{l \ne m} \frac{1}{Z_{ml}} < 0;\ \tilde{D}_m = 1 \text{ otherwise}\}$. The markets with $\tilde{D}_m = 0$ contribute a negative element to total profit even with the largest conceivable chain effect, so it is never optimal to enter these markets, i.e., $\tilde{D} \ge D^*$. It is straightforward to show that the iteration converges ($V^T(\tilde{D}) = \tilde{D}^T$, $T \le M$) and that the convergent vector $\tilde{D}^T$ is a tighter upper bound to $D^*$ than $D^U$.[32]

[32] $\tilde{D}^T \le D^U$ because $\tilde{D}^T = V^{TT}(\tilde{D}) \le V^{TT}(\sup(\mathbf{D})) = D^U$, with $TT = \max\{T, T'\}$, where $T$ is the number of iterations from $\tilde{D}$ to $\tilde{D}^T$ and $T'$ is the number of iterations from $\sup(\mathbf{D})$ to $D^U$.

2.9.4 Verification that the chains’ profit functions are supermodular with decreasing differences

Definition 2.3. Suppose that $Y(X)$ is a real-valued function on a lattice $\mathbf{X}$. If

\[
Y(X') + Y(X'') \le Y(X' \vee X'') + Y(X' \wedge X'') \tag{2.7}
\]

for all $X'$ and $X''$ in $\mathbf{X}$, then $Y(X)$ is supermodular on $\mathbf{X}$. If $Y(X') + Y(X'') < Y(X' \vee X'') + Y(X' \wedge X'')$ for all unordered $X'$ and $X''$ in $\mathbf{X}$, then $Y(X)$ is strictly supermodular on $\mathbf{X}$. If $-Y(X)$ is (strictly) supermodular, then $Y(X)$ is (strictly) submodular.

Definition 2.4. Suppose that $\mathbf{X}$ and $K$ are partially ordered sets and $Y(X, k)$ is a real-valued function on $\mathbf{X} \times K$. If $Y(X, k'') - Y(X, k')$ is increasing, decreasing, strictly increasing, or strictly decreasing in $X$ on $\mathbf{X}$ for all $k' \prec k''$ in $K$, then $Y(X, k)$ has, respectively, increasing differences, decreasing differences, strictly increasing differences, or strictly decreasing differences in $(X, k)$ on $\mathbf{X}$.

Now let us verify that chain $i$’s profit function (2.1) is supermodular in its own strategy $D_i \in \mathbf{D}$. For ease of notation, the firm subscript $i$ is omitted, and $X_m \beta_i + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1) + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho \eta_{i,m}$ is absorbed into $X_m$. Chain $i$’s profit function is simplified to:

\[
\Pi = \sum_{m=1}^{M} \Big[ D_m \Big(X_m + \delta \sum_{l \ne m} \frac{D_l}{Z_{ml}}\Big) \Big].
\]

First it is easy to show that $D' \vee D'' = (D' - \min(D', D'')) + (D'' - \min(D', D'')) + \min(D', D'')$ and $D' \wedge D'' = \min(D', D'')$. Let $D' - \min(D', D'')$ be denoted $D_1$, $D'' - \min(D', D'')$ be denoted $D_2$, and $\min(D', D'')$ be denoted $D_3$. The left-hand side of the inequality (2.7) is:

\[
\begin{aligned}
\Pi(D') + \Pi(D'') &= \sum_m D'_m \Big(X_m + \delta \sum_{l \ne m} \frac{D'_l}{Z_{ml}}\Big) + \sum_m D''_m \Big(X_m + \delta \sum_{l \ne m} \frac{D''_l}{Z_{ml}}\Big) \\
&= \sum_m \big[(D'_m - \min(D'_m, D''_m)) + \min(D'_m, D''_m)\big] \Big[X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} \big[(D'_l - \min(D'_l, D''_l)) + \min(D'_l, D''_l)\big]\Big] \\
&\quad + \sum_m \big[(D''_m - \min(D'_m, D''_m)) + \min(D'_m, D''_m)\big] \Big[X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} \big[(D''_l - \min(D'_l, D''_l)) + \min(D'_l, D''_l)\big]\Big] \\
&= \sum_m (D_{1,m} + D_{3,m}) \Big(X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} (D_{1,l} + D_{3,l})\Big) + \sum_m (D_{2,m} + D_{3,m}) \Big(X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} (D_{2,l} + D_{3,l})\Big)
\end{aligned}
\]

Similarly, the right-hand side of the inequality (2.7) is:

\[
\begin{aligned}
\Pi(D' \vee D'') + \Pi(D' \wedge D'') &= \sum_m (D'_m \vee D''_m) \Big[X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} (D'_l \vee D''_l)\Big] + \sum_m (D'_m \wedge D''_m) \Big[X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} (D'_l \wedge D''_l)\Big] \\
&= \sum_m (D_{1,m} + D_{2,m} + D_{3,m}) \Big[X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} (D_{1,l} + D_{2,l} + D_{3,l})\Big] + \sum_m D_{3,m} \Big(X_m + \delta \sum_{l \ne m} \frac{1}{Z_{ml}} D_{3,l}\Big) \\
&= \Pi(D') + \Pi(D'') + \delta \Big(\sum_m \sum_{l \ne m} \frac{D_{2,m} D_{1,l} + D_{1,m} D_{2,l}}{Z_{ml}}\Big)
\end{aligned}
\]

The profit function is supermodular in its own strategy if the chain effect $\delta$ is non-negative.

The verification of decreasing differences is also straightforward (here $\delta_{ij} D_{j,m}$ is spelled out, rather than absorbed into $X_{im}$):

\[
\begin{aligned}
\Pi_i(D_i, D''_j) - \Pi_i(D_i, D'_j) &= \sum_m \Big[ D_{i,m} \Big(X_{im} + \delta_{ii} \sum_{l \ne m} \frac{D_{i,l}}{Z_{ml}} + \delta_{ij} D''_{j,m}\Big) \Big] - \sum_m \Big[ D_{i,m} \Big(X_{im} + \delta_{ii} \sum_{l \ne m} \frac{D_{i,l}}{Z_{ml}} + \delta_{ij} D'_{j,m}\Big) \Big] \\
&= \delta_{ij} \sum_{m=1}^{M} D_{i,m} (D''_{j,m} - D'_{j,m})
\end{aligned}
\]

The difference is decreasing in $D_i$ for all $D'_j < D''_j$ as long as $\delta_{ij} \le 0$.
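The supermodularity identity above lends itself to a quick numerical check. The sketch below draws hypothetical values for `X`, a symmetric `Z`, and a non-negative `delta` (none of them taken from the data), and compares the gap between the two sides of (2.7) with the closed-form term $\delta \sum_m \sum_{l \ne m} (D_{2,m} D_{1,l} + D_{1,m} D_{2,l}) / Z_{ml}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, delta = 6, 0.4                      # hypothetical size and chain effect
X = rng.normal(size=M)
Z = rng.uniform(1.0, 50.0, size=(M, M))
Z = (Z + Z.T) / 2                      # impose the symmetry Z_ml = Z_lm
A = 1.0 / Z
np.fill_diagonal(A, 0.0)               # drop the l == m terms


def profit(D):
    """Pi(D) = sum_m D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    return float(D @ (X + delta * (A @ D)))


def supermodularity_gap(Dp, Dpp):
    """Right-hand side of (2.7) minus its left-hand side."""
    join, meet = np.maximum(Dp, Dpp), np.minimum(Dp, Dpp)
    return profit(join) + profit(meet) - profit(Dp) - profit(Dpp)


def closed_form_gap(Dp, Dpp):
    """delta * sum_m sum_{l != m} (D2_m * D1_l + D1_m * D2_l) / Z_ml."""
    D3 = np.minimum(Dp, Dpp)
    D1, D2 = Dp - D3, Dpp - D3
    return float(delta * (D2 @ A @ D1 + D1 @ A @ D2))


# Compare the two expressions on random pairs of entry vectors.
pairs = [(rng.integers(0, 2, M).astype(float), rng.integers(0, 2, M).astype(float))
         for _ in range(200)]
gaps = np.array([supermodularity_gap(a, b) for a, b in pairs])
closed = np.array([closed_form_gap(a, b) for a, b in pairs])
print("max abs difference:", np.abs(gaps - closed).max())
print("smallest gap:", gaps.min())
```

Because `delta` and every entry of `A` are non-negative, the gap is non-negative for each pair, which is the supermodularity condition.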
2.9.5 Multiple maximizers to the chains’ optimization problem

In the main body of the chapter, I have assumed that the optimal solution to the profit maximization problem $D^*$ is unique. To accommodate multiple solutions in the algorithm discussed in subsections 2.5.1 and 2.5.2, I need to introduce the definition of the induced set ordering.

Definition 2.5. The induced set ordering $\sqsubseteq$ is defined on the collection of nonempty members of the power set $\mathcal{P}(X) \setminus \{\emptyset\}$ such that $\mathbf{X}' \sqsubseteq \mathbf{X}''$ in $\mathcal{P}(X) \setminus \{\emptyset\}$ if $X'$ in $\mathbf{X}'$ and $X''$ in $\mathbf{X}''$ imply that $X' \wedge X''$ is in $\mathbf{X}'$ and $X' \vee X''$ is in $\mathbf{X}''$. The power set $\mathcal{P}(X)$ of a set $X$ is the set of all subsets of $X$.

Definition 2.6. A function whose range is included in the collection of all subsets of some set is a correspondence. A correspondence $S_k$ is increasing (decreasing) in $k$ on $K$ if the domain $K$ is a partially ordered set, the range $\{S_k : k \in K\}$ is in $\mathcal{L}(X)$ where $X$ is a lattice and $\mathcal{L}(X)$ is a partially ordered set with the ordering relation $\sqsubseteq$, and $S_k$ is an increasing (decreasing) correspondence from $K$ into $\mathcal{L}(X)$ (so $k' \preceq k''$ in $K$ implies $S_{k'} \sqsubseteq S_{k''}$ ($S_{k''} \sqsubseteq S_{k'}$) in $\mathcal{L}(X)$).

In stating that $S_k$ is increasing (decreasing) in $k$, it is implicit that each $S_k$ is a nonempty sublattice of $X$ and that $\sqsubseteq$ is the ordering relation on the sets $S_k$ in $\mathcal{L}(X)$. The following theorem, discussed in Topkis (1998), states that if $S_k$ is decreasing in $k$ on $K$ and $S_k$ is finite, then $S_k$ has a greatest element and a least element, both of which decrease in $k$.[33]

Theorem 2.3. Suppose that $X$ is a lattice, $K$ is a partially ordered set, $S_k$ is a subset of $X$ for each $k$ in $K$, and $S_k$ is decreasing in $k$ on $K$. If $S_k$ has a greatest (least) element for each $k$ in $K$, then the greatest (least) element is a decreasing function of $k$ from $K$ into $X$. Hence, if $S_k$ is finite for each $k$ in $K$, or $X$ is a subset of $R^n$ and $S_k$ is a compact subset of $R^n$ for each $k$ in $K$, then $S_k$ has a greatest element and a least element for each $k$ in $K$ and the greatest (least) element is a decreasing function of $k$.

According to Theorem 2.2 in the main body of the chapter, $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j)$ is decreasing in $D_j$. Since $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j) \subset \mathbf{D}_i$ is finite, Theorem 2.3 implies that the set $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j)$ has a greatest element and a least element for each $D_j$, both of which decrease in $D_j$.

The solution algorithm that accommodates multiple solutions to the profit-maximizing problem is as follows: to search for the equilibrium most profitable for Kmart, if there are multiple elements in Kmart’s best response correspondence, choose the greatest element; if there are multiple elements in Wal-Mart’s best response correspondence, choose the least element. To search for the equilibrium most profitable for Wal-Mart, choose the greatest element in Wal-Mart’s best response correspondence and the least element in Kmart’s best response correspondence.

[33] The original theorem is stated in terms of $S_k$ increasing in $k$. Replacing $k$ with $-k$ delivers the version of the theorem stated here.

2.9.6 Computational issues

The main computational burden of this exercise is the search for the best responses $K(D_w)$ and $W(D_k)$. In section 2.5.1, I have proposed two bounds $D^U$ and $D^L$ that help to reduce the number of profit evaluations. Appendix 2.9.3 introduces a tighter upper bound and a tighter lower bound that work well in the empirical implementation. When the chain effect $\delta_{ii}$ is sufficiently big, it is conceivable that the upper bound and lower bound are far apart from each other. If this happens, computational burden once again becomes an issue, as there will be many vectors between these two bounds. Two observations work in favor of the algorithm.
First, recall that the chain effect is assumed to take place among counties whose centroids are within fifty miles. Markets that are farther away are not directly connected. Conditioning on the entry decisions in other markets, the entry decisions in group A do not depend on the entry decisions in group B if all markets in group A are at least fifty miles away from any market in group B. Therefore, what matters is the size of the largest connected set of markets that differ between $D^U$ and $D^L$, rather than the total number of elements that differ between $D^U$ and $D^L$. To illustrate this point, suppose there are ten markets, numbered 1 through 10.

[Figure: spatial arrangement of the ten markets]

Suppose the upper bound $D^U$ and the lower bound $D^L$ are the same for markets 2, 6, 9, and 10, but differ for the remaining six markets: $D^U_m = 1$ and $D^L_m = 0$ for $m \in \{1, 3, 4, 5, 7, 8\}$, while $D^U_m = D^L_m = D_m$ for $m \in \{2, 6, 9, 10\}$. If markets 1, 4, and 5 (group A) are at least fifty miles away from markets 3, 7, and 8 (group B), one only needs to evaluate $2^3 + 2^3 = 16$ vectors, rather than $2^6 = 64$ vectors, to find the profit maximizing vector.

The second observation is that even with a sizable chain effect, the event of having $D^U$ and $D^L$ different in a large connected area is extremely unlikely. Let $N$ denote the size of such an area. Both $D^U$ and $D^L$ are fixed points of $V(\cdot)$, so $D^U_m = 1\big[X_m + 2\delta \sum_{l \ne m, l \in B_m} \frac{D^U_l}{Z_{ml}} + \xi_m \ge 0\big]$ and $D^L_m = 1\big[X_m + 2\delta \sum_{l \ne m, l \in B_m} \frac{D^L_l}{Z_{ml}} + \xi_m \ge 0\big]$, where I have grouped $X_m \beta_i + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1)$ into $X_m$, and $\sqrt{1 - \rho^2}\, \varepsilon_m + \rho \eta_{i,m}$ into $\xi_m$. ($B_m$ denotes the set of markets that are within fifty miles of market $m$, including $m$.) The probability of $D^U_m = 1, D^L_m = 0$ for every market $m$ in the size-$N$ connected area $C_N$ is:

\[
\begin{aligned}
\Pr(D^U_m = 1, D^L_m = 0, \forall m \in C_N) &\le \Pr\Big(X_m + \xi_m < 0,\; X_m + \xi_m + 2\delta \sum_{l \ne m, l \in B_m} \frac{1}{Z_{ml}} \ge 0,\; \forall m \in C_N\Big) \\
&= \prod_{m=1}^{N} \Pr\Big(X_m + \xi_m < 0,\; X_m + \xi_m + 2\delta \sum_{l \ne m, l \in B_m} \frac{1}{Z_{ml}} \ge 0\Big)
\end{aligned}
\]

where $\prod_{m=1}^{N}$ denotes the product of the $N$ elements. The equality follows from the i.i.d. assumption of $X_m + \xi_m$. As $\delta$ goes to infinity, the probability approaches $\prod_{m=1}^{N} \Pr(X_m + \xi_m < 0)$ from below. How fast it decreases when $N$ increases depends on the distribution of $\xi_m$ as well as the distribution of $X_m$. If $\xi_m$ is i.i.d. normally distributed and $X_m$ is linearly distributed between $[-a, a]$, with $a$ a finite positive number, on average the probability is on the magnitude of $(\frac{1}{2})^N$. To show this, note that:

\[
E\Big(\prod_{m=1}^{N} \Pr(X_m + \xi_m < 0)\Big) = E\Big(\prod_{m=1}^{N} (1 - \Phi(X_m))\Big) = \prod_{m=1}^{N} [1 - E(\Phi(X_m))] = \Big(\frac{1}{2}\Big)^N
\]

Therefore, even in the worst scenario in which the chain effect $\delta$ approaches infinity, the probability of having a large connected area that differs between $D^U$ and $D^L$ decreases exponentially with the size of the area. In the current application, the size of the largest connected area that differs between $D^L$ and $D^U$ is seldom bigger than seven or eight markets.

2.10 Appendix B: data

I took great care in cleaning the data from the Directory of Discount Stores. After the manually entered data were checked repeatedly against the hard copy, the stores’ cities were matched to their counties using census data.[34] Some city names listed in the directory contained typos, so I first found possible spellings using the census data, then inspected the stores’ street addresses and zip codes using various web sources to confirm the right city name spelling. The final data set appears to be quite accurate. I compared it with Wal-Mart’s firm data and found the difference to be quite small.[35] For the sample counties, only thirty to sixty stores were not matched between these two sources for either 1988 or 1997.

[34] Marie Pees kindly provided these data.

[35] I am very grateful to Emek Basker for sharing the Wal-Mart firm data with me.

Chapter 3

Semi-Parametric Estimation of the Distribution of Fixed Costs in Entry Models

3.1 Introduction

Entry has long been an interesting topic for IO economists.
Among the early papers, Bain (1956) focused on “determinants of entry barriers”, such as economies of scale, product differentiation and cost asymmetry. With the application of game theory to economics in the 1970s and 1980s, IO economists started to model how strategic behavior among firms influences firms’ entry decisions. This literature tends to show that strategic behavior in many cases is more important than technology and demand factors.

The empirical work that directly models strategic interaction among firms did not start until the late 1980s. Among the earliest works are Bresnahan and Reiss (1987, 1990, and 1991), and Berry (1992). The most recent ones include Berry and Waldfogel (1999), Mazzeo (2002), Seim (2006), and Toivanen and Waterson (2005), among many others. These structural models rely on the fact that firms’ discrete entry decisions reveal some information about their underlying profit. By observing how firms’ entry or exit decisions change when market conditions evolve, we can make inferences about how market conditions affect firms’ profit and possibly learn about the nature of the competition among firms.

In most of these models, profit in a particular market (usually a geographic region) is a reduced form equation of market observables (like income, population, demographics, and possibly the number of firms in the market, or some measure of market structure) plus an error term. A distribution assumption of the fixed cost, plus some assumptions of the entry game structure, allows researchers to express the probability of observing a certain market structure as a function of profit parameters. Maximum likelihood or simulated maximum likelihood (in the cases when it is difficult to calculate the probabilities directly) can then be used to estimate parameters in the profit function.

Besides the choice of profit functional forms, researchers also have to assume a distribution function for the unobserved fixed cost.
Most of the time it is chosen for computational simplicity or tractability reasons. This raises the following question: what if the distribution assumption is wrong? What if the true distribution has a long tail or is multi-modal when it is assumed to be normal? Will this lead to biased estimates, overestimating or underestimating the competitive interactions among firms? As we all know, the maximum likelihood estimator under mis-specification is typically inconsistent, converging in probability to the value that minimizes the Kullback-Leibler Information Criterion. The direction of the asymptotic bias is generally difficult to tell. Since we typically do not have any information about the unobserved fixed cost, it would be imprudent to report the parameter estimates without carrying out any robustness check on the distribution assumptions. In this chapter I will explain how we can get a consistent (and efficient under certain conditions) estimator without imposing any distribution assumptions on the fixed cost, and propose some simple specification tests.

In some cases, we might be interested in the distribution of the fixed cost itself. We might want to know whether the distribution has shifted after some policy reforms or regulation changes. For example, one might be interested in examining whether the average entry barriers due to fixed cost shifted dramatically after the Telecommunications Act of 1996. Unlike some other semi-parametric estimation methods (for example, the average derivative method), the estimation method used in this chapter (proposed by Klein and Spady (1993)) allows one to back out the distribution itself.

The chapter is organized as follows. The second section explains the setup of the model. The third to the fifth sections elaborate the estimation methods and properties of the estimators. The sixth section describes the differences between the model used in Klein and Spady (1993) and the one in this chapter.
The last two sections discuss the data source and the results. Section nine concludes.

3.2 Model

The entry game studied here has a very stylized setting with homogeneous firms (similar to Bresnahan and Reiss (1991)). The game has two stages. In the first stage, firms decide whether to enter or not. Once the entry decisions have been made, firms compete in the second stage in which payoffs are determined. It is a game with complete information: every firm knows the level of market demand, the variable costs of production, as well as the fixed cost (the error term that is not observed by econometricians). In equilibrium, all of the firms produce the same amount of output, charge the same price and have the same market shares. The number of firms in equilibrium is the maximum number of firms that can be supported by a market, i.e., all the incumbent firms earn non-negative profits and the entry of an additional firm would push the profits earned by all firms in the market below zero.

Admittedly no industry fits the above description. After all, firms differ in many aspects: products produced by different firms are usually not the same, firms’ services differ, and their reputation is generally very different. This most stylized model only serves as a starting point for analysis and helps to shed light on the structure of the underlying profit functions firms face (which we never observe) by analyzing the relationship between market conditions and firms’ entry decisions. Another motivation for this type of model is the scarcity of firm level information. As a matter of fact, many of the empirical entry models have focused on industries that are relatively homogeneous (so the homogeneity assumption is not too bad) and have a large number of well defined market areas (so we can make inferences by using cross sectional data).
For example, Mazzeo (2002) analyzed the entry of motels in small towns along the interstate highways, and Seim (2006) studied the video retailers in medium sized areas.

I also assume that the total demand function is well behaved and firm level profit decreases in the number of firms operating in the market. Let $\Pi_j$ denote the profit each firm earns when there are $j$ firms; then $\Pi_1 \ge \Pi_2 \ge \cdots \ge \Pi_J$. Profit depends on the number of firms, market observables and fixed cost in the following way:

\[
\Pi_j = V(X, j, \theta) - F
\]

where $X$ is a vector of market observables, $j$ is the number of firms in the market, $\theta$ is the parameter vector, and $F$ is the fixed cost. In equilibrium, there are $j$ firms when $\Pi_j$ is positive (or zero) while $\Pi_{j+1}$ is negative. This translates to the following ordered response model:

\[
Y = \begin{cases}
0 & \text{if } \Pi_1 < 0 \\
1 & \text{if } \Pi_1 \ge 0 \ \&\ \Pi_2 < 0 \\
2 & \text{if } \Pi_2 \ge 0 \ \&\ \Pi_3 < 0 \\
\cdots \\
J & \text{if } \Pi_J \ge 0
\end{cases}
\]

3.3 Estimation Method

Before I turn to the semi-parametric estimation procedure proposed by Klein and Spady (1993), it is perhaps helpful to review the commonly used parametric estimation approach. Notice that if we fix a distribution assumption on the fixed cost, it is straightforward to write down the likelihood of the sample and proceed with maximum likelihood estimation. Specifically, the probability of observing $j$ firms in market $i$ conditioning on the market observables $X_i$ is (assuming the fixed cost $\{F_i\}_{i=1}^{N}$ has an i.i.d. normal distribution):

\[
\begin{aligned}
\Pr(Y_i = j \mid X_i = x_i) &= \Pr(\Pi_j \ge 0, \Pi_{j+1} < 0 \mid X_i = x_i) \\
&= \Pr(V(X_i, j+1, \theta) < F_i < V(X_i, j, \theta) \mid X_i = x_i) \\
&= \Phi(V(x_i, j, \theta)) - \Phi(V(x_i, j+1, \theta)) \\
&= P_j(\theta)
\end{aligned}
\]

where $\Phi$ denotes the c.d.f. of a normal distribution, and I have suppressed the subscript $i$ in $P_j(\theta)$ to simplify notation. The log likelihood function for the sample is therefore the following:

\[
L_N = \sum_{i=1}^{N} \sum_{j=0}^{J} \{Y_i = j\} \ln[\Pr(Y_i = j \mid X_i = x_i)] = \sum_{i=1}^{N} \sum_{j=0}^{J} \{Y_i = j\} \ln P_j(\theta)
\]

where $\{\cdot\}$ is the indicator function.
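The parametric log likelihood above can be sketched in a few lines. The functional form used here, $V(x, j, \theta) = x'\beta - \gamma \ln j$ with a standard normal fixed cost, is an assumption made purely for illustration (the chapter does not commit to it at this point), and the data are simulated; the end cases use $\Phi(V(x, 0, \theta)) = 1$ and $\Phi(V(x, J+1, \theta)) = 0$.

```python
from math import erf, sqrt

import numpy as np

# Standard normal c.d.f., vectorized without external dependencies.
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))


def choice_probs(X, beta, gamma, J):
    """P_j = Phi(V(x, j)) - Phi(V(x, j+1)) for j = 0..J, one row per market.

    Assumed profit index (hypothetical): V(x, j) = x'beta - gamma * ln(j).
    """
    index = X @ beta
    j = np.arange(1, J + 1)
    cdf = Phi(index[:, None] - gamma * np.log(j)[None, :])    # Phi(V(x, j)), j = 1..J
    n = len(X)
    upper = np.hstack([np.ones((n, 1)), cdf])      # Phi(V(x, j)),   with j = 0 set to 1
    lower = np.hstack([cdf, np.zeros((n, 1))])     # Phi(V(x, j+1)), with j = J set to 0
    return upper - lower


def log_likelihood(Y, X, beta, gamma, J):
    """L_N = sum_i sum_j 1{Y_i = j} ln P_j(theta)."""
    P = choice_probs(X, beta, gamma, J)
    return float(np.sum(np.log(P[np.arange(len(Y)), Y])))


# Simulated markets under the assumed model.
rng = np.random.default_rng(2)
N, J = 500, 4
beta_true, gamma_true = np.array([1.0, 0.5]), 1.0
X = rng.normal(size=(N, 2))
F = rng.normal(size=N)                                         # fixed cost
V = (X @ beta_true)[:, None] - gamma_true * np.log(np.arange(1, J + 1))[None, :]
Y = (V >= F[:, None]).sum(axis=1)                              # number of firms, 0..J

P = choice_probs(X, beta_true, gamma_true, J)
ll = log_likelihood(Y, X, beta_true, gamma_true, J)
print("log likelihood at the true parameters:", ll)
```

In practice one would pass the negative of `log_likelihood` to a numerical optimizer over `beta` and `gamma`; that step is omitted here.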
It might seem impossible to come up with a probability measure if we don’t impose any distribution assumption on $F$ at all. The cornerstone of the Klein and Spady (1993) approach is to use Bayes’ rule to translate the unobserved conditional probability into a combination of three components that we can easily estimate:

\[
\Pr(Y_i = j \mid X_i = x_i) = \frac{p_{Y_i, X_i}(Y_i = j, x_i)}{p_{X_i}(x_i)} = \frac{p_{X_i}(x_i \mid Y_i = j) \cdot P(Y_i = j)}{p_{X_i}(x_i)}
\]

where $p_{X_i}(x_i \mid Y_i = j)$ is the conditional density of $X_i$, $p_{X_i}(x_i)$ is the unconditional density of $X_i$, and $P(Y_i = j)$ is the unconditional probability of $Y_i$ equal to $j$. Both $p_{X_i}(x_i \mid Y_i = j)$ and $p_{X_i}(x_i)$ can be estimated directly from data using nonparametric density estimation. $P(Y_i = j)$ can be consistently estimated by the sample frequency of $\{Y_i = j\}$. A simple transformation therefore enables us to back out the unobserved probability measure using data alone.

In a nutshell, the differences between parametric estimation and the semi-parametric approach can be summarized as follows: parametric estimation starts with a distribution assumption, and calculates the choice probability $\Pr(Y = j \mid X = x)$ based on the distribution assumption; the semi-parametric estimation reverses the procedure in the sense that it estimates the choice probabilities first and then derives the distribution of the error term (or the fixed cost) based on the choice probability estimates.

Let $P_j(\theta)$ denote the conditional choice probability $\Pr(Y_i = j \mid X_i = x_i)$. In Klein and Spady (1993), $P_j(\theta)$ is estimated semiparametrically using a two-step bias reducing kernel (also called adaptive kernel). The estimated $\hat{P}_j(\theta)$ is substituted for $P_j(\theta)$ in the log likelihood function. Maximizing the log likelihood function with respect to $\theta$ then gives us the quasi-maximum likelihood estimator $\hat{\theta}(\hat{P})$ (“quasi” because we use the estimated choice probability instead of the “true” choice probability in the maximization routine).
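The Bayes' rule decomposition above can be illustrated with a short sketch. It departs from Klein and Spady (1993) in two labeled ways: it uses a plain fixed-bandwidth Gaussian kernel rather than the two-step adaptive kernel, and it conditions on a scalar index `v` rather than on the full regressor vector. The data, the bandwidth `h`, and the thresholds that generate `Y` are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde(points, sample, h):
    """Fixed-bandwidth kernel density estimate of `sample`, evaluated at `points`."""
    u = (points[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h


def choice_prob_hat(v, Y, j, h):
    """Estimate Pr(Y = j | index = v_i) as f(v | Y = j) * P(Y = j) / f(v)."""
    mask = (Y == j)
    if not mask.any():
        return np.zeros_like(v)
    f_cond = kde(v, v[mask], h)      # density of the index among markets with Y = j
    p_j = mask.mean()                # sample frequency of {Y = j}
    f_marg = kde(v, v, h)            # unconditional density of the index
    return f_cond * p_j / f_marg


# Hypothetical sample: a scalar index and an ordered outcome 0..3.
rng = np.random.default_rng(3)
N, J = 400, 3
v = rng.normal(size=N)
F = rng.standard_t(df=4, size=N)                 # deliberately non-normal fixed cost
cutoffs = np.array([0.0, 0.7, 1.4])              # index lost per additional firm
Y = (v[:, None] - cutoffs[None, :] >= F[:, None]).sum(axis=1)

P_hat = np.column_stack([choice_prob_hat(v, Y, j, h=0.3) for j in range(J + 1)])
print(P_hat[:5].round(3))
```

With a common bandwidth the estimated probabilities sum to one across outcomes at every observation. A leave-one-out version and the trimming used in the original estimator are omitted to keep the sketch short.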
Klein and Spady (1993) showed that $\hat{\theta}(\hat{P})$ behaves asymptotically like the MLE $\hat{\theta}(P)$. Under mild regularity conditions, $\hat{\theta}(\hat{P})$ is consistent and asymptotically normally distributed. When the error term is independent of the regressors, $\hat{\theta}(\hat{P})$ is efficient, in that it attains the efficiency bound among the class of semi-parametric estimators. (Roughly speaking, among the class of all estimators that don’t impose distribution assumptions for the error term, $\hat{\theta}(\hat{P})$ has the smallest variance.)

3.4 Kernel Density Estimation

To understand how one can estimate the choice probability nonparametrically, a brief review of density estimation might be helpful. Suppose $\theta$ is known in the above model. Suppose also that the correct functional form of profit $V(X, j, \theta)$ is known and the focus is to estimate the distribution of the random scalar variable $V(X, j, \theta)$. The data is a sample of $N$ realizations of $V$: $v_1, v_2, \cdots, v_N$. A naive estimator would be to distribute equal mass on all observations, each with mass $1/N$. To do this, we draw a box with width $2h$ ($h$ is some small positive number) and height $(2hN)^{-1}$ centered around each observation and sum over all the boxes. Intuitively, if there are many observations centered around value $v$, summing over lots of boxes leads to a high value of $\hat{f}_V(v)$, which is what we would expect. For an example of density estimation using seven data points, see Figure 3(A).

As simple and appealing as this approach is, it has a couple of drawbacks. Most importantly, $\hat{f}_V(v)$ is not a continuous function: it has jumps at $\{v_i \pm h\}_{i=1}^{N}$ ($v_i$ is the sample observation) and has zero derivatives everywhere else. A straightforward extension of the above method is to replace the box function with a function $K(v)$ that is smooth and satisfies the “weight” requirement:

\[
\int_{-\infty}^{\infty} K(v)\, dv = 1
\]

Typically, $K(v)$ is chosen to be a symmetric probability function, like the normal density function.
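The naive box estimator and its smooth extension described above can be sketched as follows. The seven data points and the bandwidth are hypothetical (they are not the values behind Figure 3), and the Gaussian density stands in for a generic symmetric kernel.

```python
import numpy as np


def box_kernel(u):
    """Box of width 2 and height 1/2: the naive estimator's weight function."""
    return 0.5 * (np.abs(u) < 1)


def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def density_estimate(grid, sample, h, kernel):
    """Sum a box (or bump) of width governed by h over every observation."""
    u = (grid[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (len(sample) * h)


sample = np.array([1.0, 1.5, 2.2, 2.4, 3.9, 4.1, 6.0])    # hypothetical data
h = 0.5
grid = np.arange(-3.0, 10.0, 0.001)
dx = grid[1] - grid[0]

f_box = density_estimate(grid, sample, h, box_kernel)
f_smooth = density_estimate(grid, sample, h, gaussian_kernel)

# Both estimates carry total mass (approximately) one.
print("mass under the box estimate:   ", (f_box * dx).sum())
print("mass under the smooth estimate:", (f_smooth * dx).sum())
# The box estimate takes only a handful of distinct heights (a step function),
# while the smooth estimate varies continuously.
print("distinct heights, box estimate:", len(np.unique(np.round(f_box, 12))))
```

Each box has height $(2hN)^{-1}$, so the box estimate is a step function with jumps at $v_i \pm h$, as noted in the text.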
By analogy with the definition of the naive estimator, the density estimator with kernel K is: N 1 X v − vi ]/h fˆV (v) = K[ N h i=1 where {vi }N i=1 are the sample points, h is the window size, or bandwidth. Just as the naive estimator can be considered as a sum of “boxes” centered around the observations, the kernel estimator is a sum of bumps placed at the sample points. The kernel K determines the shape of the bumps, while the window size h determines the width of the bumps. A big bandwidth allows observations far away from v to contribute to the estimate of fˆV (v), leading to a smoother density estimate, while a small bandwidth focuses more on the observations immediately around v and the density estimates is bumpier. Some elementary properties of kernel estimators follow directly from the definition. 3.4 Kernel Density Estimation 72 First, if the kernel function K is a density function, then fˆV (v) is also a density function (non-negative everywhere and integrating to one). Further, if K is smooth and differentiable to the rth order, so is fˆV (v). See Figure 3(B) for an example of kernel density estimation, taken from Silverman (1986), page 14. The kernel estimator defined above applies one window size to the entire sample and doesn’t handle very well the tail part of the distribution where there are fewer observations. Adaptive kernel estimation copes with this problem by a two-step procedure. An initial kernel estimate (usually called the pilot estimate in the literature of semi-parametric estimation) is used to get an idea of the density at each observation. This preliminary density estimate then yields an optimal window size used to construct the adaptive estimator itself. The general strategy of the adaptive kernel estimation is the following: Step 1: find a pilot estimate fˆV (vi ) that satisfies fˆV (vi ) > 0, ∀i. ˆ Step 2: define the local bandwidth factor λi by λi = ( fV g(vi ) )−α , where g is the geometric mean of all the fˆV (vi ). 
$\alpha$ is a constant between zero and one. (Abramson (1982) showed that under some regularity conditions, the optimal $\alpha$ is 1/2.)

Step 3: define the adaptive kernel estimator $\tilde f_V(v)$ by:

$$\tilde f_V(v) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{h\lambda_i}\, K\!\left(\frac{v - v_i}{h\lambda_i}\right)$$

where $K$ is the kernel function and $h$ is the normal bandwidth (or window size) that is held constant across the sample points. The general view in the kernel estimation literature is that the adaptive kernel estimate is insensitive to the fine details of the pilot estimate, and therefore any convenient estimate suffices for that purpose.

Adaptive kernel estimation is important because the bias of the density estimate must be sufficiently small to establish the consistency and normality of the quasi-maximum likelihood estimator $\hat\theta(\hat P)$. Intuitively, we want the estimated choice probability $\hat P_j(\theta)$ (which is based on the estimated densities) to approach the "true" choice probability $P_j(\theta)$ reasonably fast. In fact, the adaptive kernel combined with optimal window sizes attains a uniform bias of order $h^4$. (See Silverman (1986) for further explanation of the properties of adaptive kernel estimation.)

3.5 Identification Assumptions and Properties of the Quasi Maximum Likelihood Estimator

The identification assumptions for $\hat\theta(\hat P)$ are fairly general and less restrictive than those under the parametric estimation procedure. They are:

Assumption 3.1. Parameter space: the true parameter $\theta_0$ is an interior point of $\Theta$, a compact subset of $\mathbb{R}^K$.

Assumption 3.2. Data: $\{Y_i, X_i\}_{i=1}^{N}$ are i.i.d. observations. In addition, each $X_i$ is independent of the error term $\varepsilon_i$.

Assumption 3.3.
Index restriction: there exists a scalar index $V(X, \theta_0)$ such that

$$\Pr(Y = j \mid X = x) = \Pr\big(Y = j \mid V(X, \theta_0) = V(x, \theta_0)\big)$$

In the above model, $V(X, \theta_0)$ is $V(X, j, \theta_0)$, the non-random part of the profit. The index restriction is very important in that it reduces the dimension of the conditioning variables from $K$ (assuming there are $K$ non-constant regressors) to 1. Instead of estimating the distribution of $K$ variables, whose precision decreases exponentially as $K$ increases, we only need to estimate the distribution function of one variable (the index).

Among the set of regressors $X_1, \ldots, X_K$, at least one regressor is continuous. Without loss of generality, let this variable be $X_1$. Write $f$ for the conditional density of $X_1$ given the rest of the regressors and $Y$.

Assumption 3.4. Conditional density: $f$ is strictly positive and has bounded derivatives up to the 4th order.

The smoothness and boundedness help to control the bias in establishing the limiting distribution of $\hat\theta(\hat P)$. They ensure that the densities underlying the choice probability $P_j(\theta)$ are sufficiently smooth and can be approximated by a smooth function of $\theta$.

Klein and Spady (1993) showed that the quasi-likelihood estimator explained above is consistent at the $\sqrt{N}$ rate, asymptotically normal, and efficient under the assumption of independent errors.

Theorem 3.1 (Consistency). Under Assumptions 3.1 - 3.4:

$$\hat\theta \equiv \arg\sup_{\theta} \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\,\ln \hat P_j(\theta) \;\xrightarrow{\;p\;}\; \theta_0$$

Proof: See Klein and Spady (1993).

Theorem 3.2 (Normality). Under Assumptions 3.1 - 3.4, the asymptotic distribution of $N^{1/2}(\hat\theta - \theta_0)$ is $N(0, \Sigma)$, where

$$\Sigma \equiv E\left\{\left[\frac{\partial P}{\partial\theta}\right]\left[\frac{\partial P}{\partial\theta}\right]'\left[\frac{1}{P(1-P)}\right]\right\}^{-1}_{\theta=\theta_0}$$

Proof: See Klein and Spady (1993).

See Klein and Spady (1993) for further explanation of the asymptotic variance of $\hat\theta$.
Here $P = \Pr(Y \mid \theta)$ is viewed as a function of $\theta$ (rather than fixed at the true value $\theta_0$) when the derivative is taken.

Intuitively, when the estimated choice probability $\hat P_j(\theta)$ converges to the true choice probability $P_j(\theta)$ fast enough, the quasi likelihood function $\hat L_N = \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\ln \hat P_j(\theta)$ converges uniformly to the true likelihood function $L_N = \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\ln P_j(\theta)$. Therefore, the quasi likelihood estimator $\hat\theta$, which maximizes the quasi likelihood function, converges in probability to the true parameter value $\theta_0$ that maximizes the true likelihood function. The distribution of $N^{1/2}(\hat\theta - \theta_0)$ inherits its normality from the properties of the score function, which can be shown to converge to a normalized sum of i.i.d. elements that has an asymptotically normal distribution.

3.6 Differences between my model and Klein and Spady (1993)

The model considered by Klein and Spady (1993) is a traditional ordered response model with a single index and multiple cutoff values:

$$Y = \begin{cases} 0 & \text{if } X\beta + \varepsilon < t_0 \\ 1 & \text{if } t_0 \le X\beta + \varepsilon < t_1 \\ 2 & \text{if } t_1 \le X\beta + \varepsilon < t_2 \\ \cdots & \\ J & \text{if } X\beta + \varepsilon \ge t_{J-1} \end{cases}$$

where $X\beta$ is the single index ($X$ is a row vector of $K$ regressors), and $t_0, t_1, \cdots, t_{J-1}$ are the cutoff values. In my model, all of the cutoff values are zero. Instead of a single index, there are multiple indices:

$$Y = \begin{cases} 0 & \text{if } V(X, 1, \theta) - F < 0 \\ 1 & \text{if } V(X, 1, \theta) - F \ge 0,\; V(X, 2, \theta) - F < 0 \\ 2 & \text{if } V(X, 2, \theta) - F \ge 0,\; V(X, 3, \theta) - F < 0 \\ \cdots & \\ J & \text{if } V(X, J, \theta) - F \ge 0 \end{cases}$$

where the indices are $V_1$ (which is $V(X, 1, \theta)$), $V_2, \cdots, V_J$. While Klein and Spady (1993) only need to condition on one index $X\beta$, I have to condition on all the indices $V_1, V_2, \cdots, V_J$ to calculate the choice probability $\Pr(Y = j \mid V)$. Under the assumption that $V_1 \ge V_2 \ge \cdots \ge V_J$, the outcome of $Y$ (say $Y = j$) is determined by the relative positions of two indices: $V_j \ge F$ and $V_{j+1} < F$.
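The mapping from the ordered indices and the fixed cost to the observed number of firms can be sketched as follows. The function name `market_outcome` and the numerical values are hypothetical and chosen only for illustration; the rule itself ($Y = j$ when $V_j \ge F$ and $V_{j+1} < F$, with $V_1 \ge \cdots \ge V_J$) is the one stated above.

```python
def market_outcome(indices, fixed_cost):
    """Return Y, the number of firms, given indices V_1 >= ... >= V_J and fixed cost F.

    Y is the largest j such that V_j - F >= 0 (and Y = 0 if V_1 - F < 0).
    """
    # The ordering assumption V_1 >= V_2 >= ... >= V_J is required by the rule
    if any(a < b for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must satisfy V_1 >= V_2 >= ... >= V_J")
    y = 0
    for v in indices:
        if v - fixed_cost >= 0:
            y += 1      # the (y+1)-th firm still earns a non-negative profit
        else:
            break       # by monotonicity, no later index can cover F either
    return y

# Hypothetical indices for J = 3
example_indices = [5.0, 3.0, 1.0]
outcome_mid = market_outcome(example_indices, fixed_cost=4.0)   # V_1 >= F, V_2 < F
```

Because the indices are ordered, only the two neighbouring indices around $F$ matter for the outcome, which is the point the next paragraph builds on.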
In some sense, $V_j$ and $V_{j+1}$ are sufficient statistics for the conditional choice probability. To be more specific,

$$\Pr(Y = j \mid V_1, \cdots, V_J) = \Pr(Y = j \mid V_j, V_{j+1}) = \frac{p_{V_j, V_{j+1}}(v_j, v_{j+1} \mid Y = j)\,P(Y = j)}{p_{V_j, V_{j+1}}(v_j, v_{j+1})}$$

To obtain the joint density estimate of $V_j$ and $V_{j+1}$, bivariate density estimation is needed. However, the accuracy of density estimates decreases exponentially as the number of dimensions increases. A trick to go back to the single index model is to change the structure of $Y$. Instead of using:

$$Y = j \iff \Pi_j \ge 0,\; \Pi_{j+1} < 0$$

we use:

$$Y \le j \iff \Pi_{j+1} < 0 \iff V_{j+1} < F$$

which leads to:

$$\Pr(Y \le j \mid V_1, \cdots, V_J) = \Pr(Y \le j \mid V_{j+1}) = \frac{p_{V_{j+1}}(v_{j+1} \mid Y \le j)\,P(Y \le j)}{p_{V_{j+1}}(v_{j+1})}$$

The estimated conditional choice probability $\Pr(Y = j \mid V)$ is the difference between $\Pr(Y \le j \mid V)$ and $\Pr(Y \le j - 1 \mid V)$.

3.7 Data

The data set is composed of all small towns across the U.S. with a population of less than 10,000. They are at least 10 miles away from any town with a population bigger than 1,000 and 20 miles away from any town with a population bigger than 5,000. Together 684 towns satisfy these restrictions. Geographical isolation makes it relatively easy to define a market, since the population of the town will be a good proxy for local demand.

The industry I focus on is the dry cleaning industry, for several reasons. First, people are not very likely to travel long distances for dry cleaning services, so the leakage of demand should not be a serious problem. Second, it is a fairly homogeneous industry with standard cleaning technology. Third, the sizes of the firms in my sample are very similar, with 90% of them employing fewer than 10 employees.

Town population, median household income and population age structure come from Census CD, while firm information comes from American Business Disc, 2002.

3.8 Results

Since not many towns have three or more firms, I group them with the towns that have two firms.
Among the 684 towns in the sample, 354 have no dry cleaners, 219 have one dry cleaner, and the remaining 111 towns have two or more dry cleaners.

Monopoly profit and duopoly profit take the following forms:

$$\Pi_1 = S\,[a_1 + b \cdot Inc] - F$$

$$\Pi_2 = \frac{S}{2}\,[a_2 + b \cdot Inc] - F$$

where $S$ is town population and $Inc$ is the median household income. $S$ (or $S/2$ in the duopoly profit equation) is the per firm market size, which measures the quantity of demand facing each firm; $a_1 + b \cdot Inc$ is a first order approximation of monopoly variable profit per unit of demand, and $a_2 + b \cdot Inc$ is the corresponding approximation of duopoly variable profit per unit of demand. Here I assume that total variable profit can be separated into two multiplicative elements: unit variable profit and the total units of demand. In particular, profit per unit does not depend on the size of the market. I also assume that the entry of another firm changes the unit variable profit only by a constant. Admittedly, this is a very restrictive assumption, but it is a parsimonious way to model the competition effect of entry and it allows me to focus on the parameter estimation.

The parameters are $a_1$, $a_2$, $b$, and the mean or median of the fixed cost $F$. I impose restrictions on the parameters to guarantee that $\Pi_1 \ge \Pi_2$. Notice that there is no constant in the model: it is absorbed into the mean of the fixed cost. The approach proposed by Klein and Spady (1993) cannot identify the constant term since there is no restriction imposed on the error term distribution. In particular, we cannot impose the zero mean assumption on the error term during estimation. The mean or median of the fixed cost can only be recovered after obtaining the estimate of its distribution. Also, as is common to all ordered response models, the parameters are only identified up to a scale factor. Here I normalize $b$ to be one.
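The scale normalization can be seen directly from the entry conditions. The short derivation below is an added illustration, not part of the original argument; the constant $c$ is introduced only for this purpose, and the last step assumes $b > 0$.

```latex
% For any constant c > 0, rescaling (a_1, a_2, b, F) by c leaves the sign of profit unchanged:
\begin{align*}
\Pi_1 \ge 0
  &\iff S\,[a_1 + b \cdot Inc] - F \ge 0 \\
  &\iff S\,[c\,a_1 + c\,b \cdot Inc] - c\,F \ge 0, \\
\Pi_2 \ge 0
  &\iff \tfrac{S}{2}\,[a_2 + b \cdot Inc] - F \ge 0 \\
  &\iff \tfrac{S}{2}\,[c\,a_2 + c\,b \cdot Inc] - c\,F \ge 0 .
\end{align*}
% Hence (a_1, a_2, b, F) and (c a_1, c a_2, c b, c F) generate the same outcomes Y.
% Choosing c = 1/b (assuming b > 0) gives the normalization b = 1.
```

Since the observed outcome depends only on the signs of $\Pi_1$ and $\Pi_2$, the data cannot distinguish between the original and the rescaled parameters, which is why one coefficient has to be fixed.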
Recall that the choice probabilities are functions of the fixed cost distribution evaluated at the sample points:

$$\Pr\big(Y_i \le j \mid V(X_i, j+1, \theta) = V(x_i, j+1, \theta)\big) = \Pr\big(F > V(x_i, j+1, \theta)\big) = 1 - P_F\big(V(x_i, j+1, \theta)\big)$$

where $P_F(V(x_i, j+1, \theta))$ is the distribution of $F$ evaluated at the sample point $V(x_i, j+1, \theta)$. Figure 4 displays the estimated distributions and Figure 5 shows the density estimates of the fixed cost. There are four different estimates, each using a different window size, holding the parameter estimates $\hat\theta$ fixed at the values obtained using the optimal window size $N^{-1/6}$. The window sizes for the first graph to the last graph are $N^{-1/6}$, $N^{-1/7}$, $N^{-1/8}$ and $N^{-1/9}$ respectively. (The estimation procedure requires that the window size be between $N^{-1/6}$ and $N^{-1/9}$ for a reasonable bias and variance of the parameter estimates.) As the figures show, the curves get smoother as the window size gets bigger. Note that the density is not symmetric: it has a long tail.

I repeat the above exercise while fixing the parameters at the parametric estimate values. The distribution estimates look very similar to the above figures. It seems that this feature of the distribution (asymmetry) is not an artifact of the semiparametric estimation, but is inherent to the data. The boundary of the distribution is not very precisely estimated.

There are many ways to estimate the distribution of the error term, yet there has not been any work on which one is most efficient. For example, we can estimate the distribution from the probability estimates $\{\Pr(Y_i \le 0 \mid V(x_i, 1, \theta))\}_{i=1}^{N}$, which give us 684 data points, or we can estimate the distribution from any other choice probabilities, like $\{\Pr(Y_i \le j \mid V(x_i, j+1, \theta))\}_{i=1}^{N}$; or maybe we can use a mixture of both.
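As a rough illustration of how such a distribution estimate can be built, the sketch below estimates $\Pr(Y \le j \mid V_{j+1} = v)$ as a kernel-weighted share of observations with $Y \le j$, which is the ratio of kernel density estimates implied by the formula in Section 3.6, and then sets $P_F(v) = 1 - \Pr(Y \le 0 \mid V_1 = v)$. Several things here are assumptions made for the example and not taken from the chapter: the data are simulated, a Gaussian kernel with one fixed bandwidth is used instead of the adaptive kernel, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def prob_at_most(j, v, index_values, outcomes, h):
    """Kernel estimate of Pr(Y <= j | index = v) with a fixed bandwidth h."""
    index_values = np.asarray(index_values, dtype=float)
    outcomes = np.asarray(outcomes)
    weights = gaussian_kernel((v - index_values) / h)
    # Ratio of (density given Y <= j) * Pr(Y <= j) to the unconditional density
    return float(np.sum(weights * (outcomes <= j)) / np.sum(weights))

def fixed_cost_cdf(v, first_index, outcomes, h):
    """Estimate of P_F(v) = 1 - Pr(Y <= 0 | V_1 = v)."""
    return 1.0 - prob_at_most(0, v, first_index, outcomes, h)

# Simulated example (not the dry cleaning data): F is standard normal
rng = np.random.default_rng(0)
n = 5000
first_index = rng.uniform(-3.0, 3.0, size=n)          # plays the role of V_1
fixed_cost = rng.normal(size=n)                        # unobserved F
outcomes = (first_index >= fixed_cost).astype(int)     # 0 = no firm, 1 = at least one
h = n ** (-1.0 / 6.0)                                  # window size of order N^(-1/6)

cdf_at_zero = fixed_cost_cdf(0.0, first_index, outcomes, h)
```

In this simulation the true value of $P_F(0)$ is 0.5, so `cdf_at_zero` should be close to that; evaluating `fixed_cost_cdf` on a grid traces out an estimated distribution of the kind plotted in Figure 4.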
In plotting the graphs, I have chosen to use the first category $\Pr(Y \le 0 \mid V_1)$, since the group $\{Y_i = 0\}$ has by far the most observations (more than half of the sample) and the choice probability $\Pr(Y \le 0 \mid V_1)$ should be estimated more accurately. I did a rough comparison among distribution estimates based on different groups of choice probabilities. They overlap in the middle part, where there are more observations, and differ somewhat at the boundaries.

Table 11 shows both the semiparametric estimates and the parametric estimates. As can be seen from the table, all of the estimates are significant at the 1% level. The parametric estimates of the slope coefficients are 12% larger than the semi-parametric ones; the parametric estimate of the constant is around 20% to 30% larger. These differences translate into a bigger competitive effect of entry as suggested by the parametric estimates. This will become clearer later when I discuss entry threshold ratios.

In the parametric estimation, the mean and median of the fixed cost happen to be the same number because the assumed distribution is symmetric. I would argue that in the current context, the median is more meaningful and certainly more robust than the mean of the distribution. The median informs us of the "typical" fixed costs incurred by firms, while the mean is more sensitive to the tail properties of the fixed cost. A technical reason for using the median is that I cannot estimate the boundary of the distribution very precisely, so a reliable estimate of the mean is hard to obtain.

Perhaps not surprisingly, the log likelihood of the semiparametric estimation is much bigger than that of the parametric estimation, suggesting a better fit to the data.

A central concept in Bresnahan and Reiss (1991) is the entry threshold $S_k^{TH}$, the smallest size of the market that can support $k$ firms, once we fix the fixed cost at its mean (or median) level.
Mathematically, they are:

$$S_1^{TH} = \frac{F}{a_1 + b \cdot Inc}, \qquad S_2^{TH} = \frac{F}{a_2 + b \cdot Inc}, \qquad \frac{S_2^{TH}}{S_1^{TH}} = \frac{a_1 + b \cdot Inc}{a_2 + b \cdot Inc}$$

The threshold values are displayed in Table 12. According to the parametric estimates, in a town with average median household income, it takes about 2.7 thousand people to support one dry cleaner and an extra 4.5 thousand people to support the second firm. The semi-parametric estimates suggest that the second firm will enter when the town's population increases by 4,000. This is a 13% difference. A more rigorous test of the differences is discussed later.

To show that the asymmetry of the fixed cost distribution drives the differences between the parametric estimates and the semi-parametric estimates, I did the following exercise. I truncated the observations corresponding to the tail of the fixed cost distribution, which changed the sample distribution of the fixed cost into a more or less symmetric one. I repeated both the parametric and the semiparametric estimation using the new sample. Table 13 displays the new estimates. Now the differences between the two groups of estimates become much smaller, in the range of 1% to 5%.

A natural choice for testing whether the two groups of estimates are significantly different from each other is the Hausman test. Recall that the formula for the Hausman test is:

$$H = (\hat\theta - \tilde\theta)'\, Var(\hat\theta - \tilde\theta)^{-1}\, (\hat\theta - \tilde\theta)$$

where $\tilde\theta$ is the efficient estimator under the null hypothesis, but is inconsistent under the alternative, and $\hat\theta$ is consistent under both the null and the alternative. A typical trick to calculate the variance of $(\hat\theta - \tilde\theta)$ is to use the following:

$$Var(\hat\theta - \tilde\theta) = Var(\hat\theta) - Var(\tilde\theta)$$

if $\tilde\theta$ is the efficient estimator under the null. Here I run into a technical problem: as shown in Table 11, the estimated variances of the semiparametric estimates are much smaller than the estimated variances of the parametric ones.
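The technical problem can be made concrete with a small sketch of the Hausman statistic. The numbers below are made up for illustration and are not the estimates in Table 11; the function name `hausman_statistic` is likewise hypothetical. The sketch computes $H$ from the formula above and reports whether the variance difference is positive definite, which is exactly what fails when the supposedly less efficient estimator has the smaller estimated variance.

```python
import numpy as np

def hausman_statistic(theta_consistent, theta_efficient, var_consistent, var_efficient):
    """Return (H, is_positive_definite) for the Hausman test.

    theta_consistent: estimator consistent under both null and alternative
    theta_efficient:  estimator efficient under the null
    Uses Var(difference) = Var(consistent) - Var(efficient).
    """
    diff = np.asarray(theta_consistent, dtype=float) - np.asarray(theta_efficient, dtype=float)
    var_diff = np.asarray(var_consistent, dtype=float) - np.asarray(var_efficient, dtype=float)
    # The test is only well behaved when the variance difference is positive definite
    is_positive_definite = bool(np.all(np.linalg.eigvalsh(var_diff) > 0))
    statistic = float(diff @ np.linalg.solve(var_diff, diff))
    return statistic, is_positive_definite

# Made-up two-parameter example
theta_hat = [1.0, 2.0]                       # consistent estimator
theta_tilde = [0.9, 2.2]                     # efficient estimator under the null
var_hat = np.diag([0.04, 0.09])
var_tilde = np.diag([0.01, 0.05])

h_stat, well_behaved = hausman_statistic(theta_hat, theta_tilde, var_hat, var_tilde)

# Swapping the variance matrices mimics the pattern described in the text
h_swapped, swapped_ok = hausman_statistic(theta_hat, theta_tilde, var_tilde, var_hat)
```

In the swapped case the variance difference is negative definite and the statistic itself turns negative, so it can no longer be compared with a chi-squared critical value in the usual way.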
I repeated the estimation of the variances using different window sizes, including the one that is optimal for estimating the $\partial P / \partial\theta$ element in the variance of the semiparametric $\hat\theta$ (see Theorem 3.2 for details). However, they all produce the same pattern: the variances of the semi-parametric estimates are smaller. Interestingly, Horowitz and Hardle (1996) also found the same pattern, namely that the variances of the parametric estimates are much bigger than those of the semiparametric estimates (the average derivative estimators). This opens the following discussion: is it a small sample problem, in that we cannot obtain a very accurate estimate of the true variances, or does it suggest some sort of mis-specification (for example, the symmetry assumption on the error distribution is not valid)? The answer to this question is beyond the scope of this chapter and is left for future research.

Using the difference of the estimated variance matrices as the weighting matrix of the test statistic, we fail to reject, at the 5% level, the null hypothesis that the two sets of estimates are equal.

3.9 Conclusion

The conclusion of this chapter is more qualitative than quantitative. I seem to find evidence that the error term has an asymmetric distribution, yet I cannot reject the hypothesis that the two estimates are the same. The literature does not provide researchers with good guidance as to how to obtain the optimal distribution estimate among the many possible choices.

Despite the above limitations, the method proposed in this chapter can still be used as a robustness check. Failure to find significant differences between the two groups of estimates brings more confidence in the validity of the distributional assumption, while the finding of a significant difference calls for attention to possible specification errors.

4. FIGURES

Figure 1: Wal-Mart and Kmart Stores in 1988. Legend:
Wal-Mart store; Kmart store; small and medium counties.

Figure 2: Wal-Mart and Kmart Stores in 1997. Legend: Wal-Mart store; Kmart store; small and medium counties.

Figure 3(A): Naïve Estimate. Window Size: 0.4

Figure 3(B): Kernel Estimate Showing Individual Kernels. Window Size: 0.4

Figure 4: CDF of the Error Term: Different Window Sizes

Figure 5: PDF of the Error Term: Different Window Sizes

5. TABLES

Table 1 (A): The Discount Industry from 1960 to 1997

Year    Number of          Total Sales      Average Store      Number
        Discount Stores    (2004 $bill.)    Size (thou ft²)    of Firms
1960    1329               12.8             38.4               1016
1980    8311               119.4            66.8               584
1989    9406               123.4            66.5               427
1997    9741               198.7            79.2               230

Source: various issues of Discount Merchandiser. The numbers only include traditional discount stores. Wholesale clubs, super-centers, and special retailing stores are all excluded.

Table 1 (B): Summary Statistics for the Data Set

                                                         1988              1997
Variable                                             Mean     Std.     Mean     Std.
Population (thou.)                                   22.47    14.12    24.27    15.67
Per Capita Retail Sales (1984 $thou.)                 3.69     1.44     4.05     2.02
Percentage of Urban Population                        0.30     0.23     0.33     0.24
Midwest (1 if in the Great Lakes, Plains,
  or Rocky Mountain Region)                           0.41     0.49     0.41     0.49
South (1 if Southwest or Southeast)                   0.50     0.50     0.50     0.50
Distance to Benton, AR (100 miles)                    6.14     3.88     6.14     3.88
% of Counties with Kmart Stores                       0.21     0.42     0.19     0.41
% of Counties with Wal-Mart Stores                    0.32     0.53     0.48     0.57
Number of Firms with 1-19 Employees                   3.86     2.84     3.49     2.61
Number of Counties                                    2065

Source: the 1988 population is from the U.S. Census Bureau's website; the 1997 population is from the website of the Missouri State Census Data Center. Retail sales are from the 1987 and 1997 Economic Census, respectively. The percentage of urban population is from the 1990 and 2000 decennial census, respectively.
Region dummies and the distance variable are from the 1990 census. The numbers of Kmart and Wal-Mart stores are from the Directory of Discount Stores, and the number of small stores is from the County Business Patterns. See section 4.2 for the definition of the chain effect.

Table 1 (C): Summary Statistics for the Distance Weighted Number of Adjacent Stores

                                                         1988              1997
Variable                                             Mean     Std.     Mean     Std.
Distance Weighted Number of Adjacent
  Kmart Stores within 50 Miles                        0.11     0.08     0.13     0.11
Distance Weighted Number of Adjacent
  Wal-Mart Stores within 50 Miles                     0.10     0.08     0.19     0.19
Number of Counties                                    2065

Source: the annual reference "Directory of Discount Department Stores" by Chain Store Guide, Business Guides, Inc., New York.

Table 2: Parameter Estimates from Probit (Kmart and Wal-Mart) and Ordered Probit (Small Firms)

Kmart's Profit
Variable              1988               1997
Log Population        1.68*   (0.12)     1.75*   (0.12)
Log Retail Sales      1.69*   (0.16)     1.64*   (0.14)
Urban Ratio           1.56*   (0.25)     1.20*   (0.24)
Midwest               0.43*   (0.09)     0.33*   (0.10)
Constant            -20.55*   (1.36)   -20.59*   (1.28)
δ kw                 -0.21*   (0.09)    -0.66*   (0.11)
δ kk                 -1.38*   (0.67)    -1.15†   (0.64)
δ ks                 -0.17†   (0.09)     0.01    (0.09)
Observation Number    2065               2065
Log Likelihood       -583.64            -567.77

Wal-Mart's Profit
Variable              1988               1997
Log Population        1.27*   (0.10)     2.23*   (0.11)
Log Retail Sales      1.32*   (0.14)     1.58*   (0.13)
Urban Ratio           1.27*   (0.21)     0.99*   (0.21)
Log Distance         -1.63*   (0.09)    -1.11*   (0.08)
South                 1.08*   (0.09)     0.87*   (0.09)
Constant             -5.75*   (1.11)   -12.88*   (1.08)
δ wk                 -0.57*   (0.11)    -0.73*   (0.11)
δ ww                  2.88*   (0.79)    -3.41*   (0.63)
δ ws                 -0.20*   (0.08)    -0.34*   (0.09)
Observation Number    2065               2065
Log Likelihood       -680.25            -669.71

Small Firms' Profit
Variable              1988               1997
Log Population        1.24*   (0.05)     1.18*   (0.05)
Log Retail Sales      0.76*   (0.07)     0.64*   (0.06)
Urban Ratio          -1.10*   (0.12)    -0.99*   (0.12)
South                 0.62*   (0.05)     0.78*   (0.05)
δ sk                 -0.34*   (0.07)    -0.11    (0.07)
δ sw                 -0.32*   (0.06)    -0.21*   (0.06)
Observation Number    2065               2065
Log Likelihood       -4226              -4035

Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence
level. Standard errors are in parentheses. The cutoff values for the small firms' regressions are omitted. Midwest and South are regional dummies, with the Great Lakes region, the Plains region, and the Rocky Mountain region grouped as the "Midwest", the Southwest region and the Southeast region grouped as the "South". δ ij, i, j ∈ {k, w, s}, i ≠ j, denotes the competition effect, while δ ii, i ∈ {k, w}, denotes the chain effect. "k" stands for Kmart, "w" stands for Wal-Mart, and "s" stands for small stores.

Table 3: Parameter Estimates from the Full Model

Kmart's Profit
Variable              1988               1997
Log Population        1.49*   (0.09)     1.84*   (0.13)
Log Retail Sales      2.19*   (0.25)     2.01*   (0.23)
Urban Ratio           2.07*   (0.42)     1.55*   (0.29)
Midwest               0.40*   (0.12)     0.32*   (0.14)
Constant            -24.60*   (2.07)   -24.08*   (2.07)
δ kw                 -0.48*   (0.22)    -0.93*   (0.29)
δ kk                  0.63*   (0.20)     0.75*   (0.36)
δ ks                 -0.07    (0.05)    -0.02    (0.05)

Wal-Mart's Profit
Variable              1988               1997
Log Population        1.54*   (0.15)     2.16*   (0.15)
Log Retail Sales      1.56*   (0.35)     1.85*   (0.25)
Urban Ratio           2.19*   (0.35)     1.73*   (0.40)
Log Distance         -1.31*   (0.16)    -1.01*   (0.14)
South                 0.94*   (0.13)     0.61*   (0.11)
Constant            -10.90*   (2.98)   -16.37*   (2.17)
δ wk                 -1.54*   (0.21)    -1.13*   (0.30)
δ ww                  1.22*   (0.43)     0.89*   (0.39)
δ ws                 -0.03    (0.11)    -0.03    (0.12)

Small Firms' Profit
Variable              1988               1997
Log Population        1.65*   (0.18)     1.90*   (0.26)
Log Retail Sales      1.04*   (0.12)     1.17*   (0.16)
Urban Ratio          -0.46*   (0.21)    -0.78*   (0.24)
South                 0.88*   (0.14)     1.03*   (0.17)
Constant            -10.22*   (0.98)   -11.89*   (1.56)
δ sk                 -1.20*   (0.23)    -1.00*   (0.20)
δ sw                 -1.11*   (0.16)    -1.03*   (0.21)
δ ss                 -2.14*   (0.28)    -2.41*   (0.35)

ρ                     0.53*   (0.11)     0.65*   (0.10)
Function Value        62.84              30.80
Observation Number    2065               2065

Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence level. Standard errors are in parentheses.
Midwest and South are regional dummies, with the Great Lakes region, the Plains region, and the Rocky Mountain region grouped as the 'Midwest', the Southwest region and the Southeast region grouped as the 'South'. δ ij, i, j ∈ {k, w, s}, i ≠ j, denotes the competition effect, while δ ii, i ∈ {k, w}, denotes the chain effect. 'k' stands for Kmart, 'w' stands for Wal-Mart, and 's' stands for small stores. 1 − ρ² measures the importance of the market-level profit shocks.

Table 4 (A): Parameter Estimates Using Different Equilibria (1988)

                      Favors Kmart       Favors Wal-Mart    Favors Wal-Mart in South
Kmart's Profit
Log Population        1.49*   (0.09)     1.53*   (0.12)     1.46*   (0.13)
Log Retail Sales      2.19*   (0.25)     2.16*   (0.23)     2.18*   (0.30)
Urban Ratio           2.07*   (0.42)     2.15*   (0.36)     2.23*   (0.36)
Midwest               0.40*   (0.12)     0.32*   (0.10)     0.27*   (0.09)
Constant            -24.60*   (2.07)   -24.20*   (1.83)   -24.30*   (2.51)
δ kw                 -0.48*   (0.22)    -0.51†   (0.29)    -0.46†   (0.25)
δ kk                  0.63*   (0.20)     0.71†   (0.36)     0.72*   (0.16)
δ ks                 -0.07    (0.05)    -0.13    (0.15)    -0.09    (0.09)

Wal-Mart's Profit
Log Population        1.54*   (0.15)     1.52*   (0.21)     1.43*   (0.13)
Log Retail Sales      1.56*   (0.35)     1.52*   (0.25)     1.55*   (0.17)
Urban Ratio           2.19*   (0.35)     2.31*   (0.34)     2.45*   (0.38)
Log Distance         -1.31*   (0.16)    -1.29*   (0.17)    -1.34*   (0.12)
South                 0.94*   (0.13)     1.03*   (0.13)     1.00*   (0.21)
Constant            -10.90*   (2.98)   -10.88*   (1.69)   -10.58*   (1.46)
δ wk                 -1.54*   (0.21)    -1.58*   (0.32)    -1.51*   (0.32)
δ ww                  1.22*   (0.43)     1.32*   (0.24)     1.25*   (0.47)
δ ws                 -0.03    (0.11)    -0.03    (0.09)    -0.03    (0.07)

Small Firms' Profit
Log Population        1.65*   (0.18)     1.64*   (0.26)     1.59*   (0.13)
Log Retail Sales      1.04*   (0.12)     1.04*   (0.16)     1.04*   (0.12)
Urban Ratio          -0.46*   (0.21)    -0.53†   (0.32)    -0.46*   (0.19)
South                 0.88*   (0.14)     0.85*   (0.11)     0.93*   (0.09)
Constant            -10.22*   (0.98)   -10.21*   (1.40)   -10.18*   (1.05)
δ sk                 -1.20*   (0.23)    -1.14*   (0.19)    -1.12*   (0.18)
δ sw                 -1.11*   (0.16)    -1.08*   (0.15)    -1.12*   (0.13)
δ ss                 -2.14*   (0.28)    -2.14*   (0.37)    -2.10*   (0.16)

ρ                     0.53*   (0.11)     0.54*   (0.12)     0.54*   (0.08)
Function Value        62.84              62.87              71.30
Number of Observations 2065              2065               2065

Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence level. Standard errors are in parentheses. See Table 3 for the explanation of the variables and parameters.

Table 4 (B): Parameter Estimates Using Different Equilibria (1997)

                      Favors Kmart       Favors Wal-Mart    Favors Wal-Mart in South
Kmart's Profit
Log Population        1.84*   (0.13)     1.66*   (0.13)     1.70*   (0.15)
Log Retail Sales      2.01*   (0.23)     1.91*   (0.34)     1.83*   (0.22)
Urban Ratio           1.55*   (0.29)     1.57*   (0.20)     1.74*   (0.50)
Midwest               0.32*   (0.14)     0.26*   (0.10)     0.22    (0.14)
Constant            -24.08*   (2.07)   -22.52*   (3.16)   -22.18*   (1.73)
δ kw                 -0.93*   (0.29)    -0.94*   (0.23)    -0.92*   (0.29)
δ kk                  0.75*   (0.36)     0.82*   (0.33)     0.74*   (0.36)
δ ks                 -0.02    (0.05)    -0.02    (0.07)    -0.01    (0.14)

Wal-Mart's Profit
Log Population        2.16*   (0.15)     2.05*   (0.14)     2.03*   (0.17)
Log Retail Sales      1.85*   (0.25)     1.72*   (0.15)     1.74*   (0.20)
Urban Ratio           1.73*   (0.40)     1.72*   (0.53)     1.90*   (0.51)
Log Distance         -1.01*   (0.14)    -1.04*   (0.10)    -1.04*   (0.10)
South                 0.61*   (0.11)     0.74*   (0.08)     0.55*   (0.10)
Constant            -16.37*   (2.17)   -14.80*   (1.30)   -14.87*   (1.26)
δ wk                 -1.13*   (0.30)    -1.26*   (0.21)    -1.15*   (0.55)
δ ww                  0.89*   (0.39)     0.91*   (0.23)     0.87*   (0.27)
δ ws                 -0.03    (0.12)    -0.10    (0.12)    -0.05    (0.10)

Small Firms' Profit
Log Population        1.90*   (0.26)     1.92*   (0.15)     1.95*   (0.17)
Log Retail Sales      1.17*   (0.16)     1.20*   (0.21)     1.15*   (0.12)
Urban Ratio          -0.78*   (0.24)    -0.80*   (0.21)    -0.73†   (0.39)
South                 1.03*   (0.17)     1.00*   (0.12)     0.97*   (0.07)
Constant            -11.89*   (1.56)   -12.18*   (1.82)   -11.84*   (1.21)
δ sk                 -1.00*   (0.20)    -1.07*   (0.18)    -1.09*   (0.30)
δ sw                 -1.03*   (0.21)    -1.03*   (0.15)    -1.04*   (0.23)
δ ss                 -2.41*   (0.35)    -2.39*   (0.20)    -2.40*   (0.17)

ρ                     0.65*   (0.10)     0.67*   (0.08)     0.63*   (0.14)
Function Value        30.80              32.53              37.70
Number of Observations 2065              2065               2065

Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence level. Standard errors are in parentheses.
See Table 3 for the explanation of the variables and parameters.

Table 5 (A): Model's Goodness of Fit

                       1988                     1997
Number of:        Sample Mean  Model Mean   Sample Mean  Model Mean
Kmart                0.21        0.21          0.19        0.20
Wal-Mart             0.32        0.32          0.48        0.49
Small Firms          3.86        3.79          3.49        3.43

Note: the model matches the sample means of Kmart and Wal-Mart in 1988 almost exactly. The average number of small discount stores in each county is 3.86, while the model's prediction is 3.79. The results are similar for 1997.

Table 5 (B): Correlation between Model Prediction and Sample Observation

Number of:        1988     1997
Kmart             0.66     0.64
Wal-Mart          0.73     0.75
Small Firms       0.63     0.64

Note: the correlation between the predicted and the observed number of Kmart stores is 0.66 in 1988, and 0.64 in 1997. The correlation between the predicted and the observed numbers of Wal-Mart stores (and small stores) is also very high. Overall, the model fits the sample variation fairly well.

Table 6: Model Predicted Profit vs. Accounting Profit

                  Model Average                    Average Accounting Profit
                  1988     1997     1997/1988      1988 ($mill.)   1997 ($mill.)   1997/1988
Kmart             0.80     0.74     0.92           0.56            0.14*           0.25
Wal-Mart          1.03     1.55     1.51           0.95            1.34            1.41

Source: Kmart's and Wal-Mart's SEC 10-K annual report. *: Kmart's accounting profit fluctuated dramatically in the 1990s, due to the financial obligations of the various divested businesses. A better indicator of its store profit is probably the average store sales, which remained stagnant throughout the 90s.
Table 7 (A): Number of Kmart Stores When the Market Size Changes

                                   1988                 1997
                               Percent    Total     Percent    Total
Base Case                      100.0%     431       100.0%     408
Population Up 10%              112.1%     483       115.7%     472
Retail Sales Up 10%            117.6%     507       117.4%     479
Urban Ratio Up 10%             107.0%     461       106.1%     433
Midwest=0 for All Counties      86.1%     371        87.7%     358
Midwest=1 for All Counties     119.3%     514       115.4%     471

Table 7 (B): Number of Wal-Mart Stores When the Market Size Changes

                                   1988                 1997
                               Percent    Total     Percent    Total
Base Case                      100.0%     658       100.0%     1014
Population Up 10%              110.3%     726       108.3%     1098
Retail Sales Up 10%            110.5%     727       107.1%     1086
Urban Ratio Up 10%             105.3%     693       102.5%     1039
Distance Up 10%                 91.6%     603        96.2%     975
South=0 for All Counties        64.1%     422        88.4%     896
South=1 for All Counties       135.1%     889       113.2%     1148

Table 7 (C): Number of Small Firms When the Market Size Changes

                                   1988                 1997
                               Percent    Total     Percent    Total
Base Case                      100.0%     7831      100.0%     7090
Population Up 10%              108.7%     8511      109.0%     7727
Retail Sales Up 10%            105.4%     8253      105.4%     7474
Urban Ratio Up 10%              99.3%     7773       98.7%     7000
South=0 for All Counties        78.4%     6136       76.2%     5404
South=1 for All Counties       124.8%     9775      125.0%     8860

Note: for each of these simulation exercises in all three panels, I fix other firms' profits and only change the profit of the target firm in accordance with the change in the market size. I re-solve the model to obtain the equilibrium numbers of firms. For example, in the second row of Table 7 (A), I increase Kmart's profit according to a ten percent increase in population while holding Wal-Mart and small firms' profits constant. Using this new set of profits, the equilibrium number of Kmart stores is 12.1% higher than in the base case in 1988.
Table 8 (A): Number of Small Firms with Different Market Structure

                                   1988                 1997
                               Percent    Total     Percent    Total
No Kmart or Wal-Mart           100.0%     12070     100.0%     10946
Only Kmart in Each Market       54.0%     6519       63.8%     6985
Only Wal-Mart in Each Market    56.7%     6849       63.0%     6898
Both Kmart and Wal-Mart         28.6%     3457       38.4%     4198
Wal-Mart Competes with Kmart    64.9%     7831       64.8%     7090
Wal-Mart Takes Over Kmart       72.9%     8796       72.3%     7918

Table 8 (B): Competition Effect for Kmart and Wal-Mart

                                   1988                 1997
Number of Kmart Stores         Percent    Total     Percent    Total
Base Case                      100.0%     431       100.0%     408
Wm in Each Market               78.0%     336        79.9%     326
Wm Exits Each Market           111.1%     479       149.5%     610
Not Compete with Small         108.1%     466       102.7%     419

                                   1988                 1997
Number of Wal-Mart Stores      Percent    Total     Percent    Total
Base Case                      100.0%     658       100.0%     1014
Km in Each Market               48.3%     318        71.8%     728
Km Exits Each Market           128.6%     846       108.6%     1101
Not Compete with Small         102.6%     675       101.5%     1029

Table 8 (C): Chain Effect for Kmart and Wal-Mart

                                                          Kmart              Wal-Mart
                                                      1988     1997      1988     1997
Percentage of Profit Explained by Chain Effect        14.0%    17.4%     10.2%    12.3%
Reduction in Number of Stores with No Chain Effect    40       46        125      109

Note: for the first four rows in Table 8(A), I fix the number of Kmart and Wal-Mart stores as specified and solve for the equilibrium number of small stores. For the last two rows in Table 8(A) and all rows (except for the rows of 'Base Case') in Table 8(B), I re-solve the full model using the specified assumptions. 'Base Case' in Table 8(B) is what we observe in the data when Kmart competes with Wal-Mart. Table 8(C) explains the importance of the chain effect for both Kmart and Wal-Mart. Overall, the benefit from the chain effect is 10-17% of a chain store's profit.
Table 9: The Impact of Wal-Mart's Expansion on Small Stores

                                                     1988     1997
Observed Decrease in the Number of Small Stores      748      748
Predicted Decrease from the Full Model               558      383
Percentage Explained                                 75%      51%
Predicted Decrease from Ordered Probit               247      149
Percentage Explained                                 33%      20%

Note: for the full model, the predicted 558 store exits in 1988 are obtained by simulating the change in the number of small stores using the 1988 coefficients for Kmart's and the small stores' profit functions, but the 1997 coefficients for Wal-Mart's profit function. The column of 1997 uses the 1997 coefficients for Kmart's and small stores' profit functions, but the 1988 coefficients for Wal-Mart's profit function. For the ordered probit model, the predicted store exits are the difference between the expected number of small stores using Wal-Mart's 1988 store number and the expected number of small stores using Wal-Mart's 1997 store number, both of which are calculated using the probit coefficient estimates for the indicated year.

Table 10: The Impact of Government Subsidies

                                          Average Number       Changes in the Number of Stores
                                          of Stores            Compared to the Base Case
                                          1988     1997        1988      1997
Base Case
  Kmart                                   0.21     0.20
  Wal-Mart                                0.32     0.49
  Small Firms                             3.79     3.43
Subsidize Kmart's Profit by 10%
  Kmart                                   0.22     0.21         0.01      0.01
  Wal-Mart                                0.31     0.49        -0.01      0.00
  Small Firms                             3.77     3.41        -0.03     -0.02
Subsidize Wal-Mart's Profit by 10%
  Kmart                                   0.21     0.19         0.00     -0.01
  Wal-Mart                                0.34     0.52         0.02      0.03
  Small Firms                             3.74     3.39        -0.05     -0.04
Subsidize Small Firms' Profit by 100%
  Kmart                                   0.21     0.20         0.00      0.00
  Wal-Mart                                0.32     0.49         0.00      0.00
  Small Firms                             4.61     4.23         0.81      0.80

Note: for each of these counter-factual exercises, I incorporate the change in the subsidized firm's profit and re-solve the model to obtain the equilibrium numbers of stores.
Table 11: Estimated Coefficients (Standard Errors)

                          Semi-Parametric     Parametric
                               Model             Model
b                                1                 1
                               (NA)              (NA)
a1                            -4.94**           -5.58**
                              (0.12)            (0.55)
a2                            -4.21**           -4.70**
                              (0.11)            (0.47)
Median of Fixed Cost          -8.26             -9.93**
                               (NA)             (1.53)
Log Likelihood               -527.6            -548.7

Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.

Table 12: Entry Threshold

                                                        Semi-Parametric     Parametric
                                                             Model             Model
Monopoly Threshold (Per Firm Market Size)                    2.76**            2.73**
                                                            (0.11)            (0.08)
Duopoly Threshold (Per Firm Market Size)                     3.41**            3.61**
                                                            (0.16)            (0.22)
Total Duopoly Market Size                                    6.81**            7.21**
                                                            (0.23)            (0.43)
Change in Market Size from Monopoly to Duopoly               4.05              4.48
Threshold Ratio (S_2 / S_1)                                  1.23              1.32

Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.

Table 13: Comparison between Parametric and Semi-Parametric Estimates with the Truncated Sample

                                                        Semi-Parametric     Parametric
                                                             Model             Model
b                                                              1                 1
                                                             (NA)              (NA)
a1                                                          -5.24**           -5.32**
                                                            (0.13)            (0.52)
a2                                                          -4.39**           -4.57**
                                                            (0.11)            (0.44)
Median of Fixed Cost                                        -8.65             -9.07**
                                                             (NA)             (1.45)
Log Likelihood                                             -493.3            -511.9
Monopoly Threshold (Per Firm Market Size)                    2.63**            2.69**
                                                            (0.10)            (0.08)
Duopoly Threshold (Per Firm Market Size)                     3.54**            3.46**
                                                            (0.16)            (0.24)
Change in Market Size from Monopoly to Duopoly               4.45              4.23
Threshold Ratio (S_2 / S_1)                                  1.35              1.29

Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.

Bibliography

Abramson, Ian S. (1982), “On Bandwidth Variation in Kernel Estimates - A Square Root Law,” The Annals of Statistics, Vol. 10, No. 4, pp. 1217-1223.

Andrews, Donald W.K., Steven Berry, and Panle Jia (2004), “Confidence Regions for Parameters in Discrete Games with Multiple Equilibria,” Yale University working paper.

Bain, Joe S. (1956), “Barriers to New Competition,” Harvard University Press, Cambridge.

Bajari, Patrick, and Jeremy Fox (2005), “Complementarities and Collusion in an FCC Spectrum Auction,” working paper.
Bajari, Patrick, Han Hong, and Stephen Ryan (2004), “Identification and Estimation of Discrete Games of Complete Information,” Duke University working paper.

Basker, Emek (2005a), “Job Creation or Destruction? Labor-Market Effects of Wal-Mart Expansion,” The Review of Economics and Statistics, Vol. 87, No. 1, pp. 174-183.

Basker, Emek (2005b), “Selling a Cheaper Mousetrap: Wal-Mart’s Effect on Retail Prices,” Journal of Urban Economics, Vol. 58, No. 2, pp. 203-229.

Berry, Steven (1992), “Estimation of a Model of Entry in the Airline Industry,” Econometrica, Vol. 60, No. 4, pp. 889-917.

Berry, Steven, and Joel Waldfogel (1999), “Free Entry and Social Inefficiency in Radio Broadcasting,” RAND Journal of Economics, Vol. 30, pp. 397-420.

Bresnahan, Timothy F., and Peter C. Reiss (1987), “Do Entry Conditions Vary across Markets?” Brookings Papers on Economic Activity, 1987, pp. 833-871.

Bresnahan, Timothy, and Peter Reiss (1990), “Entry into Monopoly Markets,” Review of Economic Studies, Vol. 57, No. 4, pp. 531-553.

Bresnahan, Timothy, and Peter Reiss (1991), “Entry and Competition in Concentrated Markets,” Journal of Political Economy, Vol. 99, No. 5, pp. 977-1009.

Chernozhukov, Victor, Han Hong, and Elie Tamer (2004), “Inference on Parameter Set in Econometric Models,” Princeton University working paper.

Ciliberto, Federico, and Elie Tamer (2006), “Market Structure and Multiple Equilibria in Airline Markets,” Northwestern University working paper.

Committee on Small Business. House (1994), “Impact of Discount Superstores on Small Business and Local Communities,” Committee Serial No. 103-99. Congressional Information Services, Inc.

Conley, Timothy (1999), “GMM Estimation with Cross Sectional Dependence,” Journal of Econometrics, Vol. 92, pp. 1-45.

Conley, Timothy, and Ethan Ligon (2002), “Economic Distance and Cross Country Spillovers,” Journal of Economic Growth, Vol. 7, pp. 157-187.
Davis, Peter (2005), “Spatial Competition in Retail Markets: Movie Theaters,” forthcoming RAND Journal of Economics.

Directory of Discount Department Stores (1988-1997), Chain Store Guide, Business Guides, Inc., New York.

Discount Merchandiser (1988-1997), Schwartz Publications, New York.

Foster, Lucia, John Haltiwanger, and C.J. Krizan (2002), “The Link Between Aggregate and Micro Productivity Growth: Evidence from Retail Trade,” NBER working paper, No. 9120.

Gowrisankaran, Gautam, and Joanna Stavins (2004), “Network Externalities and Technology Adoption: Lessons from Electronic Payments,” RAND Journal of Economics, Vol. 35, No. 2, pp. 260-276.

Haile, Philip, and Elie Tamer (2003), “Inference with an Incomplete Model of English Auctions,” Journal of Political Economy, Vol. 111, pp. 1-51.

Hausman, Jerry, and Ephraim Leibtag (2005), “Consumer Benefits from Increased Competition in Shopping Outlets: Measuring the Effect of Wal-Mart,” NBER working paper, No. 11809.

Holmes, Thomas (2001), “Barcodes Lead to Frequent Deliveries and Superstores,” RAND Journal of Economics, Vol. 32, No. 4, pp. 708-725.

Holmes, Thomas (2005), “The Diffusion of Wal-Mart and Economies of Density,” University of Minnesota working paper.

Klein, Roger W., and Robert P. Sherman (2002), “Shift Restrictions and Semiparametric Estimation in Ordered Response Models,” Econometrica, Vol. 70, pp. 663-691.

Klein, Roger W., and Richard H. Spady (1993), “An Efficient Semiparametric Estimator for Binary Response Models,” Econometrica, Vol. 61, pp. 387-421.

Kmart Inc. (1988-2000), Annual Report.

Mazzeo, Michael (2002), “Product Choice and Oligopoly Market Structure,” RAND Journal of Economics, Vol. 33, No. 2, pp. 221-242.

McFadden, Daniel (1989), “A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration,” Econometrica, Vol. 57, No. 5, pp. 995-1026.
Neumark, David, Junfu Zhang, and Stephen Ciccarella (2005), “The Effects of Wal-Mart on Local Labor Markets,” NBER working paper, No. 11782.

Pakes, Ariel, and David Pollard (1989), “Simulation and the Asymptotics of Optimization Estimators,” Econometrica, Vol. 57, No. 5, pp. 1027-1057.

Pakes, Ariel, Jack Porter, Kate Ho, and Joy Ishii (2005), “Moment Inequalities and Their Application,” Harvard working paper.

Pinkse, Joris, Margaret Slade, and Craig Brett (2002), “Spatial Price Competition: A Semiparametric Approach,” Econometrica, Vol. 70, No. 3, pp. 1111-1153.

Seim, Katja (2006), “An Empirical Model of Firm Entry with Endogenous Product-Type Choices,” forthcoming RAND Journal of Economics.

Shaikh, Azeem (2005), “Inference for Partially Identified Econometric Models,” Stanford University working paper.

Shils, Edward B. (1997), “The Shils Report: Measuring the Economic and Sociological Impact of the Mega-Retail Discount Chains on Small Enterprise in Urban, Suburban and Rural Communities,” online: http://www.lawmall.com/rpa/rpashils.htm.

Silverman, B.W. (1986), “Density Estimation for Statistics and Data Analysis,” Chapman and Hall.

Smith, Howard (2004), “Supermarket Choice and Supermarket Competition in Market Equilibrium,” Review of Economic Studies, Vol. 71, No. 1, pp. 235-263.

Stone, Kenneth (1995), “Impact of Wal-Mart Stores On Iowa Communities: 1983-93,” Economic Development Review, Vol. 13, No. 2, pp. 60-69.

Tarski, Alfred (1955), “A Lattice-Theoretical Fixpoint Theorem and Its Applications,” Pacific Journal of Mathematics, Vol. 5, pp. 285-309.

Taylor, Don, and Jeanne Archer (1994), “Up Against the Wal-Marts (How Your Business Can Survive in the Shadow of the Retail Giants),” American Management Association, New York.

Toivanen, Otto, and Michael Waterson (2005), “Market Structure and Entry: Where is the Beef?” RAND Journal of Economics, Vol. 36, pp. 680-699.

Topkis, Donald (1978), “Minimizing a Submodular Function on a Lattice,” Operations Research, Vol. 26, pp. 305-321.
Topkis, Donald (1979), “Equilibrium Points in Nonzero-Sum n-Person Submodular Games,” SIAM Journal on Control and Optimization, Vol. 17, pp. 773-787.

Topkis, Donald (1998), “Supermodularity and Complementarity,” Princeton University Press, New Jersey.

Train, Kenneth (2000), “Halton Sequences for Mixed Logit,” UC Berkeley working paper.

Train, Kenneth (2003), “Discrete Choice Methods with Simulation,” Cambridge University Press, Cambridge, UK.

Vance, Sandra S., and Roy V. Scott (1994), “A History of Sam Walton’s Retail Phenomenon,” Twayne Publishers, New York.

Wal-Mart Stores, Inc. (1970-2000), Annual Report.

Zhou, Lin (1994), “The Set of Nash Equilibria of a Supermodular Game Is a Complete Lattice,” Games and Economic Behavior, Vol. 7, pp. 295-300.