Entry and Competition in the Retail and Service Industries

A Dissertation
Presented to the Faculty of the Graduate School
of
Yale University
in Candidacy for the Degree of
Doctor of Philosophy
by
Panle Jia
Dissertation Directors: Professor Steven Berry and Professor Penny Goldberg
December 2006
Contents
Acknowledgments
1 Introduction
2 What Happens When Wal-Mart Comes to Town: An Empirical Analysis of the Discount Retailing Industry
2.1 Introduction
2.2 Industry background
2.3 Data
2.3.1 Data sources
2.3.2 Market definition and data description
2.4 Modeling
2.4.1 Model setup
2.4.2 The profit function
2.5 Solution algorithm
2.5.1 The best response function
2.5.2 The maximization problem with two competing chains
2.5.3 Adding small firms
2.6 Empirical implementation
2.6.1 Estimation
2.6.2 Discussion: a closer look at the assumptions
2.7 Results
2.7.1 Parameter estimates
2.7.2 The competition effect and the chain effect
2.7.3 The impact of Wal-Mart’s expansion and related policy issues
2.8 Conclusion and future work
2.9 Appendix A: definitions and proofs
2.9.1 Verification of the necessary condition (2.3)
2.9.2 The set of fixed points of an increasing function that maps a lattice into itself
2.9.3 A tighter lower bound and upper bound for the optimal solution vector D*
2.9.4 Verification that the chains’ profit functions are supermodular with decreasing differences
2.9.5 Multiple maximizers to the chains’ optimization problem
2.9.6 Computational issues
2.10 Appendix B: data
3 Semi-Parametric Estimation of the Distribution of Fixed Costs in Entry Models
3.1 Introduction
3.2 Model
3.3 Estimation Method
3.4 Kernel Density Estimation
3.5 Identification Assumptions and Properties of the Quasi Maximum Likelihood Estimator
3.6 Differences between my model and Klein and Spady (1993)
3.7 Data
3.8 Results
3.9 Conclusion
4 Figures
5 Tables
Bibliography
To my dad, my mom, and my sister’s family
ABSTRACT
Entry and Competition in the Retail and Service Industries
Panle Jia
2006
My thesis studies entry and competition in the retail and service industries. It builds
on the entry literature and seeks to contribute in two ways. First, it extends the existing
methodology by relaxing several commonly used assumptions. Second, it applies these
extensions to analyze policy issues in the retail and service industries.
Most entry models assume that entry decisions in different markets are independent.
While this assumption simplifies estimation, it is ill-suited to study retail chains that can
exploit the scale economies arising from clustering stores in nearby markets. The first
chapter formulates and estimates a model that captures the dependence of entry decisions
induced by the scale economies. The model is applied to the discount retail industry to
quantify the impact of chain stores on small retailers. The results indicate that entry by
either a Kmart or a Wal-Mart store displaces forty to fifty percent of small discount retailers,
and that Wal-Mart’s expansion in the 90s explains fifty to seventy percent of the net change
in the number of these small discount retailers. Direct subsidies to either chains or small
firms are not likely to be cost effective in increasing the number of firms or the level of
employment. Finally, scale economies were important for both Kmart and Wal-Mart, but
the magnitude did not grow proportionately with the chains’ sizes.
The second chapter relaxes the distributional assumption for the error term in entry
models. This is motivated by the observation that the maximum likelihood estimator (MLE)
is inconsistent if the assumed distribution is wrong. Klein and Spady (1993) proposed a
consistent and efficient semi-parametric estimator that does not require the error term’s
distribution to be specified. Their approach is designed for single-index models. I extend
their method to an entry model with multiple indices and apply this extension to study
the entry cost in the dry cleaning industry. I find evidence that the distribution of the
entry cost is asymmetric with a long tail, and that this asymmetry drives the difference
between the MLE estimates and the semi-parametric estimates. The proposed method can
therefore be used as a robustness check for the sensitivity of the parameter estimates to the
distributional assumption.
© 2007 by Panle Jia
All rights reserved.
Acknowledgments
I am deeply indebted to all of my committee members: Steven Berry, Penny Goldberg,
Hanming Fang, and Philip Haile. I would not have come to where I am without their
constant guidance and support.
Writing the thesis is like climbing a mountain. Little did I know what it was like
when I first started. It is not always easy to find the right trail, and I certainly would
have got lost many times groping in the jungles if I had not had their careful guidance.
Those frequent office visits always cleared my cloudy thoughts and lit the road ahead of
me. There were a few times when I had major setbacks. It felt like standing in front of
a giant cliff after sweating for months and only realizing afterwards that the object ahead
was insurmountable. As my confidence dwindled to the size of a sesame kernel, they poured
out their help and encouragement, and found with me the unnoticeable path that wound
along the cliff. If I have achieved anything, it is as much theirs as it is mine.
I owe many. Don Brown was the cheerleader as I ran the marathon of the Ph.D.
program; Don Andrews showed me the right equipment before the journey; Pat Bayer
spent uncountable days helping me to stay in focus; Alvin Klevorick instructed me in the
right technique as I breathed for the final sprint.
Oh, how can I forget my buddies — those who went through the same journey: Shamena
Anwar, Rossella Argenziano, Erik and Randi Hjalmarsson, Deran Ozmen, Philipp Schmidt-Dengler, Henry Schneider, Tavneet Suri, Feng Zhu... I am so lucky to get to know all
of them. The list is incomplete without the ones that have become so dear to me: Pam
O’Donnell, Susan Olmsted, Pat Brown, and Dorothy Ovelar... Thank you all!
Chapter 1
Introduction
Entry has long been an interesting topic for IO economists. Among the early papers, Bain
(1956) focused on “determinants of entry barriers”, such as economies of scale, product
differentiation and cost asymmetry. With the application of game theory to economics in
the 1970s and 1980s, IO economists started to model how strategic behavior among firms
influences firms’ entry decisions. This literature tends to show that strategic behavior in
many cases is more important than technology and demand factors. The empirical work
that directly models strategic interaction among firms did not start until the late 1980s.
Among the earliest works are Bresnahan and Reiss (1987, 1990, and 1991), and Berry (1992).
The most recent ones include Berry and Waldfogel (1999), Mazzeo (2002), Seim (2006), and
Toivanen and Waterson (2005), among many others.
These structural models rely on the fact that firms’ discrete entry decisions reveal information about their underlying profit. By observing how firms’ entry or exit decisions
change when market conditions evolve, researchers can make inferences about how market
conditions affect firms’ profit and possibly learn about the nature of the competition among
firms.
My thesis builds on the entry literature and seeks to contribute in two ways. First, it
extends the existing methodology by relaxing a couple of commonly used assumptions. Second, it applies these extensions to analyze policy issues in the retail and service industries.
Most entry models assume that entry decisions in different markets are independent.
While this assumption simplifies the estimation, it is clearly problematic in the case of
retail chains that operate multiple stores in several markets. There have been many suggested explanations for the success of chains, and almost all of them point to some kind
of spillovers among stores operated by the same chain. For example, there can be significant scale economies in the distribution system; stores close by can split the operation cost
and advertising expenses; stores also frequently share their private information about local
markets and learn from one another’s managerial practices. All these factors suggest that
entry decisions are dependent across markets as chains exploit various spillover effects to
maximize their total profit.
The first chapter formulates and estimates a model that explicitly captures the spatial
correlation of entry decisions. The model is applied to the discount retail industry to
quantify the impact of chain stores on small retailers. There are three major challenges in
this exercise: the dimensionality problem, the existence of multiple equilibria, and the invalidity
of the standard inference method in GMM when the data are spatially dependent.
The dimensionality problem arises because there are a large number of markets that
chains have entered or potentially could have entered but chose not to. As a consequence,
modeling chains’ decisions necessitates solving a profit maximization problem defined on a
lattice space with a large dimension. The solution method borrows from the insight embodied in many recent papers on incomplete models (for example, Haile and Tamer (2003),
Ciliberto and Tamer (2006)). In these papers, researchers exploit necessary conditions for
modeling or identification, either because the research goal is to study a class of models,
or because sufficient conditions are too difficult and sometimes infeasible to characterize.
Similarly, I start with a set of necessary conditions, and transform the profit maximization
problem into a search for the fixed points of the necessary conditions. To solve the game
with two competing chains, I make use of the supermodularity property of the game.
Like most static entry models, the current one also has multiple equilibria. The presence of multiple equilibria poses considerable challenges to estimation, as there is no longer
a one-to-one mapping between regions of the unobservables and the observed equilibrium
outcomes. There has been some progress in estimating models with multiple equilibria, but
none of the existing methods are suitable for complicated models like the one studied here.
Therefore I make the model ‘complete’ by choosing an equilibrium that seems reasonable a
priori. The parameter estimates are stable across several very different equilibria. To address the issue of inference in GMM with spatially correlated data, I adopt the non-parametric
covariance matrix estimator proposed by Conley (1999).
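The idea behind a spatial covariance estimator of this kind can be illustrated with a minimal sketch. It assumes a scalar moment condition, planar market coordinates, and a product Bartlett window in the spirit of Conley (1999); the function names, the cutoff parameters, and the toy numbers are hypothetical and are not taken from the thesis.

```python
def bartlett_weight(dx, dy, cutoff_x, cutoff_y):
    """Product Bartlett window: declines linearly to zero at the cutoff in each direction."""
    if abs(dx) >= cutoff_x or abs(dy) >= cutoff_y:
        return 0.0
    return (1.0 - abs(dx) / cutoff_x) * (1.0 - abs(dy) / cutoff_y)


def spatial_variance(moments, coords, cutoff_x, cutoff_y):
    """Kernel-weighted average of cross products of per-market moment contributions.

    moments: scalar moment contributions g_i, evaluated at the parameter estimates
    coords:  (x, y) location of each market, in hypothetical distance units
    """
    n = len(moments)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            weight = bartlett_weight(dx, dy, cutoff_x, cutoff_y)
            total += weight * moments[i] * moments[j]
    return total / n


# Toy example (made-up numbers): three markets, two of them close neighbors.
g = [1.0, 1.0, -0.5]
xy = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
v_spatial = spatial_variance(g, xy, cutoff_x=1.0, cutoff_y=1.0)

# Variance estimate that ignores spatial dependence (own terms only).
v_independent = sum(gi * gi for gi in g) / len(g)
```

In this toy case the two neighboring markets have positively correlated moment contributions, so the spatially weighted estimate is larger than the one that treats markets as independent.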
The second chapter relaxes the distributional assumption for the error term in entry
models. This is motivated by the observation that the maximum likelihood estimator (MLE)
is inconsistent if the assumed distribution is wrong. Klein and Spady (1993) proposed a
consistent and efficient semi-parametric estimator that does not require the error term’s
distribution to be specified. Their approach is designed for single-index models. Entry
models naturally have multiple indices or cutoff values: observing M firms is defined by
the profit with M − 1 competitors and the profit with M competitors. With N potential
entrants, there are N cutoff values. However, a simple trick restores the single-index feature
and I apply the model to study the entry cost in the dry cleaning industry.
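The cutoff structure described here can be written out as a short worked equation. The notation is illustrative and not taken from the thesis: let $v(k)$ denote the deterministic part of a firm's profit when it faces $k$ competitors, assumed decreasing in $k$, and let $\varepsilon$ be a market-level fixed-cost shock with continuous distribution function $F$, so that profit is $v(k) - \varepsilon$.

```latex
\Pr(\text{$M$ firms})
  = \Pr\bigl( v(M-1) - \varepsilon \ge 0 \ \text{ and } \ v(M) - \varepsilon < 0 \bigr)
  = F\bigl(v(M-1)\bigr) - F\bigl(v(M)\bigr),
  \qquad 1 \le M < N .
```

Under this notation the $N$ cutoffs are $v(0), \ldots, v(N-1)$, which matches the count stated in the text. The boundary cases are one-sided: no firm enters with probability $1 - F(v(0))$, and all $N$ potential entrants enter with probability $F(v(N-1))$.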
The conclusion of this chapter is more qualitative than quantitative. I find evidence
that the error term has an asymmetric distribution, although I cannot reject the hypothesis
that the MLE estimates and the semi-parametric estimates are the same. Nevertheless,
the method proposed in this chapter can be used for robustness checks. The absence
of significant differences between the two groups of estimates lends confidence in the
validity of the distributional assumption, while the finding of a significant difference calls for
attention to possible specification errors.
Chapter 2
What Happens When Wal-Mart Comes to Town: An Empirical Analysis of the Discount Retailing Industry
“Bowman’s (in a small town in Georgia) is the eighth ‘main street’ business to
close since Wal-Mart came to town. ... For the first time in seventy-three years
the big corner store is empty.” Archer and Taylor, Up against the Wal-Mart.
“There is ample evidence that a small business need not fail in the face of competition from large discount stores. In fact, the presence of a large discount
store usually acts as a magnet, keeping local shoppers ... and expanding the
market ...” Morrison Cain, Vice president of International Mass Retail Association.
2.1 Introduction
The landscape of the U.S. retail industry has changed considerably over the past few
decades as the result of two closely related trends. One is the rise of discount retailing; the
other is the increasing prevalence of large retail chains. In fact, the discount retailing sector
is almost entirely controlled by chains. In 1997, the top three chains (Wal-Mart, Kmart,
and Target) accounted for 72.7% of total sector sales and 54.3% of the discount stores.
Discount retailing is a fairly new concept, with the first discount stores appearing in
the 1950s. The leading magazine for the discount industry, Discount Merchandiser, defines
a modern discount store as a departmentalized retail establishment that makes use of self-service techniques to sell a large variety of hard goods and soft goods at uniquely low
margins.1,2 Over the span of several decades, the sector has emerged from the fringe of the
retail industry and become part of the mainstream.3 From 1960 to 1997, the total sales
revenue of discount stores, in real terms, increased 15.6 times, compared with an increase
of 2.6 times for the entire retail industry.
1 See the annual report “The True Look of the Discount Industry” in the June issue of Discount Merchandiser for the definition of discount retailing, the sales and store numbers for the 30 largest firms, as well as the industry sales and total number of discount stores.
2 According to the Annual Benchmark Report for Retail Trade and Food Services published by the Census Bureau, from 1993 to 1997, the average markup for regular department stores was 27.9%, while the average markup for discount stores was 20.9%. Both markups increased slightly from 1998 to 2000.
3 In 1997, the discount retailing sector accounted for 15% of total retail sales. The other retail sectors are: building materials, food stores, automotive dealers, apparel, furniture, eating and drinking places, and miscellaneous retail.
As the discount retailing sector continues to grow, opposition from other retailers, especially small ones, begins to mount. The critics tend to associate discounters and other
big retailers with small-town problems caused by the closing of small firms, such as the
decline of downtown shopping districts, eroded tax bases, decreased employment, and the
disintegration of closely knit communities. Partly because tax money is used to restore the
blighted downtown business districts and to lure the business of big retailers with various
forms of economic development subsidies, the effect of big retailers on small firms and local communities has become a matter of public concern.4 My first goal in this chapter is
to quantify the impact of national discount chains on the profitability and entry and exit
decisions of small retailers from the late 1980s to the late 1990s.
The second salient feature of retail development in the past several decades, including in
the discount sector, is the increasing dominance of large chains. In 1997, retail chains with a
hundred or more stores accounted for 0.07% of the total number of firms, yet they controlled
21% of the establishments and accounted for 37% of sales and 46% of retail employment.5
Since the late 1960s, their share of the retail market more than doubled. In spite of the
dominance of chain stores, few empirical studies (except Holmes (2005) and Smith (2004))
have quantified the potential advantages of chains over single-unit firms, in part because
of the modeling difficulties.6 In entry models, for example, the store entry decisions of
multi-unit chains are related across markets. Most of the literature assumes that entry
decisions are independent across markets and focuses on competition among firms within
each local market. My second objective here is to extend the entry literature by relaxing
the independence assumption, and to quantify the advantage of operating multiple units by
explicitly modeling chains’ entry decisions in a large number of markets.
4 See The Shils Report (1997): Measuring the Economic and Sociological Impact of the Mega-Retail Discount Chains on Small Enterprises in Urban, Suburban and Rural Communities.
5 See the 1997 Economic Census Retail Trade subject series Establishment and Firm Size (Including Legal Form of Organization), published by the US Census Bureau.
6 I discuss Holmes (2005) in detail below. Smith (2004) estimates the demand cross-elasticities between stores of the same firm and finds that mergers between the largest retail chains increase the price level by up to 7.4%.
The model has two key features. First, it allows for flexible competition patterns among
all retailers. Second, it incorporates the potential benefits of locating multiple stores near
one another. Such benefits, which I group as “the chain effect,” can arise through several
different channels. For example, there may be significant scale economies in the distribution
system. Stores located near each other can split advertising costs or employee training costs,
or they can share knowledge about the specific features of local markets.
The chain effect causes profits of stores in the same chain to be spatially related. As a
result, choosing store locations to maximize total profit is complicated, since with N markets
there are $2^N$ possible location choices. In the current application, there are more than 2,000
markets and the number of possible location choices exceeds $10^{300}$. When several chains
compete against each other, solving for the Nash equilibrium becomes further involved, as
firms balance the gains from the chain effect against competition from rivals. I tackle this
problem in several steps. First, I transform the profit maximization problem into a search
for the fixed points of the necessary conditions. This transformation shifts the focus of the
problem from a set with $2^N$ elements to the set of fixed points of the necessary conditions.
The latter has a much smaller dimension, and is well-behaved with easy-to-locate minimum
and maximum points. Having dealt with the problem of dimensionality, I take advantage
of the supermodularity property of the game to search for the Nash equilibrium. Finally, in
estimating the parameters, I adopt the econometric technique proposed by Conley (1999)
to address the issue of cross-sectional dependence.
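The first step can be illustrated with a minimal sketch for a single chain. The profit function used here is an assumption made only for illustration, since the actual specification appears later in the chapter: total profit is $\sum_m D_m (X_m + \delta \sum_{l \neq m} W_{ml} D_l)$ with a symmetric, non-negative spillover matrix $W$ and $\delta \ge 0$. All names and numbers below are hypothetical. With a non-negative chain effect the necessary-condition map is increasing, so iterating it from the all-zeros and all-ones vectors yields the smallest and largest fixed points, which bracket any profit-maximizing location vector (ruling out exact ties in marginal profit).

```python
from itertools import product


def marginal_profit(m, D, X, W, delta):
    """Change in total profit from opening a store in market m, other stores held fixed."""
    spillover = sum(W[m][l] * D[l] for l in range(len(D)) if l != m)
    return X[m] + 2.0 * delta * spillover


def necessary_condition(D, X, W, delta):
    """Map D to V(D): open a store wherever its marginal profit is non-negative."""
    return tuple(1 if marginal_profit(m, D, X, W, delta) >= 0 else 0
                 for m in range(len(D)))


def fixed_point_from(start, X, W, delta):
    """Iterate V from a starting vector until it stops changing."""
    D = tuple(start)
    while True:
        D_next = necessary_condition(D, X, W, delta)
        if D_next == D:
            return D
        D = D_next


def total_profit(D, X, W, delta):
    """Chain profit under the assumed (hypothetical) functional form."""
    n = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(W[m][l] * D[l] for l in range(n) if l != m))
        for m in range(n)
    )


# Toy data (made up): six markets on a line, spillovers decay with distance.
X = [-1.5, 0.5, -0.3, -2.0, 0.4, -0.6]
n = len(X)
W = [[0.0 if m == l else 1.0 / abs(m - l) for l in range(n)] for m in range(n)]
delta = 0.2

lower = fixed_point_from([0] * n, X, W, delta)  # smallest fixed point
upper = fixed_point_from([1] * n, X, W, delta)  # largest fixed point

# Brute force over all 2^n vectors is feasible only because n is tiny here.
best = max(product((0, 1), repeat=n), key=lambda D: total_profit(D, X, W, delta))

# With roughly 2,000 markets, enumeration is hopeless: 2**2000 already exceeds 10**300.
enumeration_is_infeasible = 2 ** 2000 > 10 ** 300
```

Each iteration costs on the order of $N^2$ operations and the monotone sequence can change at most $N$ coordinates, so the bounds are cheap to compute even when enumeration is impossible. Only the markets where `lower` and `upper` disagree remain to be searched.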
The analysis exploits a unique data set I collected that covers the entire discount retailing industry from 1988 to 1997, during which the two major national chains were Kmart
and Wal-Mart.7 The results indicate that Wal-Mart’s expansion from the late 1980s to
the late 1990s explains about fifty to seventy percent of the net change in the number of
small discount retailers. Unobserved market-level profit shocks induce a positive correlation
among the entry decisions of chains and small firms; failure to address this endogeneity issue would underestimate Wal-Mart’s impact on small firms by fifty to sixty percent. Scale
economies were important to both Wal-Mart and Kmart, but their importance did not grow
proportionately with the size of the chains. Finally, government subsidies to either chains
or small firms in this industry are not likely to be effective in increasing the number of firms
or the level of employment.
The results in this chapter complement a recent study by Holmes (2005), which analyzes
the diffusion process of Wal-Mart stores. Holmes quantifies the economies of density, defined
as the cost savings from locating stores close to one another, a concept similar to the chain
effect in this chapter. The central insight in his paper is that markets vary in quality; in
the absence of economies of density, Wal-Mart would open stores in the most profitable
markets first and gradually expand to less profitable ones. Since profitable markets do
not necessarily cluster, Wal-Mart should open stores erratically across regions. The actual
opening process, however, displayed a regular pattern of diffusion from the South, where
Wal-Mart’s headquarters are, to other regions. Due to the complexity of the dynamics,
with the state space growing exponentially with the number of markets and time periods,
it is extremely difficult to solve Wal-Mart’s optimization problem. By abstracting from
competition and focusing on Wal-Mart’s single-agent maximization problem, Holmes is
able to exploit a perturbation approach to estimate the economies of density. The findings
suggest that these economies of density are important.
7 During the sample period, Target was a regional store that competed mostly in the big metropolitan areas in the Midwest with few stores in the sample. See the data section for more details.
Holmes’ approach is appealing because he derives the magnitude of the economies of
density from the dynamic expansion process. In contrast, I identify the chain effect from
the stores’ geographic clustering pattern. My approach abstracts from a number of important dynamic considerations. For example, it does not allow firms to delay store openings
because of credit constraints, nor does it allow for any preemption motive as the chains
compete and make simultaneous entry decisions. A dynamic model that incorporates both
the competition effects and the chain effect would be ideal. However, given the great difficulty of estimating the economies of density in a single-agent dynamic model, as Holmes
(2005) shows, it is infeasible to estimate a dynamic model that also incorporates the strategic interactions within chains and between chains and small retailers. Since one of my main
goals is to analyze the competition effects and perform policy evaluations, I adopt a two-stage model in which all players make a once-and-for-all decision, with chains moving first
and small retailers moving second. I estimate the model separately for 1988 and 1997, and
exploit the coefficient estimates from both years to analyze the impact of chains on small
retailers. The extension of the current framework to a dynamic model is left for future
research.
This chapter contributes to the entry literature initiated by Bresnahan and Reiss (1990,
1991) and Berry (1992), where researchers infer the firms’ underlying profit functions by
observing their equilibrium entry decisions across a large number of markets. To the extent
that retail chains can be treated as multi-product firms whose differentiated products are
stores with different locations, this chapter relates to several recent empirical entry papers
that endogenize firms’ product choices upon entry. For example, Mazzeo (2002) considers
the quality choices of highway motels, and Seim (2005) studies how video stores soften
competition by choosing different locations. Unlike these studies, in which each firm chooses
only one product, I analyze the behavior of multi-product firms whose product spaces are
potentially large.
This chapter is also related to a large literature on spatial competition in retail markets,
for example, Pinkse et al. (2002), Smith (2004), and Davis (2005). All of these models
take the firms’ locations as given and focus on price or quantity competition. I adopt the
opposite approach. Specifically, I assume a parametric form for the firms’ reduced-form
profit functions from the stage competition, and examine how they compete spatially by
balancing the chain effect against the competition effect of rivals’ actions on their own
profits.
Finally, the chapter is part of the growing literature on Wal-Mart, which includes Stone
(1995), Basker (2005a, 2005b), Hausman and Leibtag (2005), and Neumark et al. (2005).
The remainder of the chapter is structured as follows. Section 2 provides background
information about the discount retailing sector. Section 3 describes the data set, and section
4 discusses the model. Section 5 proposes a solution algorithm for the game between chains
and small firms when there is a large number of markets. Section 6 explains the estimation
approach. Section 7 presents the results. Section 8 concludes. The appendix outlines the
technical details not covered in section 5.
2.2 Industry background
Discount retailing is one of the most dynamic sectors in the retail industry. Table 1 (A)
displays some statistics for the industry from 1960 to 1997. The sales revenue for this
sector, in 2004 US dollars, skyrocketed from 12.8 billion in 1960 to 198.7 billion in 1997.
In comparison, the sales revenue for the entire retail industry increased only modestly from
511.2 billion to 1313.3 billion during the same period. The number of discount stores
multiplied from 1329 to 9741, while the number of firms dropped from 1016 to 230.
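As a quick arithmetic check, the growth multiples implied by the figures quoted in this paragraph can be computed directly. The values are copied from the text as rounded there, and the variable names are illustrative.

```python
# Figures quoted in the text: sales in billions of 2004 US dollars, plus store and firm counts.
discount_sales_1960, discount_sales_1997 = 12.8, 198.7
retail_sales_1960, retail_sales_1997 = 511.2, 1313.3
stores_1960, stores_1997 = 1329, 9741
firms_1960, firms_1997 = 1016, 230

discount_growth = discount_sales_1997 / discount_sales_1960  # roughly 15.5
retail_growth = retail_sales_1997 / retail_sales_1960        # roughly 2.6
store_growth = stores_1997 / stores_1960                     # roughly 7.3

# Average number of stores per firm, a rough indicator of consolidation into chains.
stores_per_firm_1960 = stores_1960 / firms_1960              # roughly 1.3
stores_per_firm_1997 = stores_1997 / firms_1997              # roughly 42
```

The sales multiple computed from these rounded figures is about 15.5, close to the 15.6 quoted in Section 2.1; the small gap presumably reflects rounding in the reported values.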
Chain stores dominate the discount retailing sector, as they do other retail sectors. In
1970, the 39 largest discount chains, with twenty-five or more stores each, operated 49.3%
of the discount stores and accounted for 41.4% of total sales. By 1989, both shares had
increased to roughly 88%. In 1997, the top 30 chains controlled about 94% of total stores
and sales.
The principal advantages of chain stores include the central purchasing unit’s ability
to buy on favorable terms and to foster specialized buying skills; the possibility of sharing
operating and advertising costs among multiple units; the freedom to experiment in one
selling unit without risk to the whole operation. Stores also frequently share their private
information about local markets and learn from one another’s managerial practices. Finally,
chains can achieve economies of scale by combining wholesaling and retailing operations
within the same business unit.
Until the late 1990s, the two most important national chains were Kmart and Wal-Mart. Each firm opened its first store in 1962. The first Kmart was opened by the variety chain Kresge. Kmart stores were a new experiment that provided consumers with quality
merchandise at prices considerably lower than those of regular retail stores. To reduce
advertising costs and to minimize customer service, these stores emphasized nationally
advertised brand-name products. Consumer satisfaction was guaranteed, and all goods
could be returned for a refund or an exchange (see Vance and Scott (1994), p. 32). These
practices were an instant success, and Kmart grew rapidly in the 1970s and 1980s. By the
early 1990s, the firm had more than 2200 stores nationwide. In the late 1980s, Kmart tried
to diversify and pursued various forms of specialty retailing in pharmaceutical products,
sporting goods, office supplies, building materials, etc. The attempt was unsuccessful, and
Kmart eventually divested itself of these interests by the late 1990s. Struggling with its
management failures throughout the 1990s, Kmart maintained roughly the same number of
stores; the opening of new stores offset the closing of existing ones.
Unlike Kmart, which was initially supported by an established retail firm, Wal-Mart
started from scratch and grew relatively slowly in the beginning. To avoid direct competition
with other discounters, it focused on small towns in southern states where there were few
competitors. Starting in the early 1980s, the firm began its aggressive expansion process
that averaged 140 store openings per year. In 1991, Wal-Mart replaced Kmart as the largest
discounter. By 1997, Wal-Mart had 2362 stores (not including the wholesale clubs) in all
states, including Alaska and Hawaii.
As the discounters continue to grow, small retailers start to feel their impact. There are
extensive media reports on the controversies associated with the impact of large chains on
small retailers and on local communities in general. As early as 1994, the United States
House of Representatives convened a hearing titled “The Impact of Discount Superstores
on Small Businesses and Local Communities.” Witnesses from mass retail associations and
small retail councils testified, but no legislation followed, partly due to a lack of concrete
evidence. In April 2004, the University of California, Santa Barbara, held a conference that
centered on the cultural and social impact of the leading discounter, Wal-Mart. In November
2004, both CNBC and PBS aired documentaries that displayed the changes Wal-Mart had
brought to society.
2.3 Data
The available data sets dictate the modeling approach used in this chapter. Hence, I discuss
them before introducing the model.
2.3.1 Data sources
There are three main data sources. The data on discount chains come from an annual
directory published by Chain Store Guide Inc. The directory covers all operating discount
stores of more than ten thousand square feet. For each store, the directory lists its name,
size, street address, telephone number, store format, and firm affiliation.8 The U.S. industry
classification system changed from the Standard Industrial Classification System (SIC) to
the North American Industry Classification System (NAICS) in 1998. To avoid potential
inconsistencies in the industry definition, I restrict the sample period to the ten years before
8 The directory stopped providing store size information in 1997 and changed the inclusion criterion to
20,000 square feet in 1998. The store formats include membership stores, regional offices, and in later years
distribution centers.
the classification change. As first documented in Basker (2005), the directory was not fully
updated for some years. Fortunately, it was fairly accurate for the years used in this study.
See appendix 2.10 for details.
The second data set, the County Business Patterns, tabulates at the county level the
number of establishments by employment size category for very detailed industry classifications. However, data disaggregated at the three-digit or finer SIC levels are unusable
because of data suppression due to confidentiality requirements.9 There are eight retail sectors at the two-digit SIC level: building materials and garden supplies, general merchandise
stores (or discount stores), food stores, automotive dealers and service stations, apparel
and accessory stores, furniture and home-furnishing stores, eating and drinking places, and
miscellaneous retail. I focus on small general merchandise stores with nineteen or fewer
employees, which are the direct competitors of the discount chains.
Data on county-level population are downloaded from the websites of the U.S. Census Bureau
(before 1990) and the Missouri State Census Data Center (after 1990). Other county level
demographic and retail sales data are from various years of the decennial census and the
economic census.
2.3.2 Market definition and data description
In this chapter, a market is defined as a county. Although the Chain Store Guide publishes
the detailed street addresses for the discount stores, information about small firms is available only at the county level. Many of the market size variables, like retail sales, are also
9 Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys.
Section 9 of the same Title requires that any information collected from the public under the authority of
Title 13 be maintained as confidential and that no estimates be published that would disclose the operations
of an individual firm.
available only at the county level.
I focus on counties with an average population between 5,000 and 64,000 from 1988
to 1997. There are 2065 such counties among a total of 3140 in the U.S. According to
Vance and Scott (1994), the minimum county population for a Wal-Mart store was 5,000
in the 1980s, while Kmart concentrated in places with a much larger population. 9% of all
U.S. counties were smaller than 5,000 and were unlikely to be a potential market for either
chain, while 25% of them were large metropolitan areas with an average population of 64,000
or more. These big counties typically included multiple self-contained shopping areas, and
consumers were unlikely to travel across the entire county to shop. The market configuration
in these big counties tended to be very complex with a large number of competitors and
many market niches. For example, in the early 1990s, there were more than one hundred
big discounters and close to four hundred small general merchandise stores in Los Angeles
County, one of the largest counties. Modeling firms’ strategic behavior in these markets
requires geographic information more detailed than that provided by the county level data.
During the sample period, there were two national chains: Kmart and Wal-Mart. The
third largest chain, Target, had 340 stores in 1988 and about 800 stores in 1997. Most of
them were located in metropolitan areas in the Midwest, with on average fewer than twenty
stores in the counties studied here. I do not include Target in the analysis.10
In the sample, only eight counties had two Kmart stores and forty-nine counties had
two Wal-Mart stores in 1988; the figures were eight and sixty-six counties, respectively, in
1997. The current specification abstracts from the choice of the number of opening stores
10 The rest of the discount chains are much smaller and are all regional. They are not included in the
analysis.
and considers only market entry decisions, as there is not enough variation in the data to
identify the profit for the second store in the same market. The algorithm proposed in
this chapter can be applied with little modification to models that also incorporate the
store-number choice.
Table 1 (B) presents summary statistics of the sample for the years 1988 and 1997.
The average county population grew from 22,470 to 24,270, an increase of 8%. Retail sales
per capita, in 1984 dollars, rose 10%, from $3,690 to $4,050. The average percentage of
urban population was 30% in 1988 and increased to 33% in 1997. About one quarter of the
counties were primarily rural with a small urban population, which is why the average across
the counties seems somewhat low. 41% of the counties were in the Midwest (which includes
the Great Lakes region, the Plains region, and the Rocky Mountain region, as defined by
the Bureau of Economic Analysis), and 50% of the counties were in the southern regions
(including the Southeast region and the Southwest region), with the rest in the Far West
and the Northeast regions. Kmart had stores in 21% of the counties at the beginning of
the sample period, and the number dropped slightly to 19% at the end. In comparison,
Wal-Mart had stores in 32% of the counties in 1988 and in 48% of them in 1997. The
average number of small firms decreased quite a bit over the same period, from 3.86 to 3.49.
The median was three, with a maximum of twenty-five small firms in 1988 and nineteen in
1997. The percentage of counties with six or more small firms dropped from 22% to 18%,
while the percentage of counties with at most one small firm increased from 18% to 22%
over the sample period. Figure 1 and Figure 2 plot out the Kmart and Wal-Mart stores
that were located in the sample counties in 1988 and 1997.
2.4 Modeling
2.4.1 Model setup
The model I develop is a two-stage game with complete information. In stage one, Kmart
and Wal-Mart simultaneously choose store locations to maximize their total profits in all
markets. In stage two, small firms observe Kmart’s and Wal-Mart’s choices and decide
whether to enter the market.11 Once the entry decisions are made, firms compete and
profits are realized. All firms are fully rational with perfect knowledge of their rivals’
profitability and the payoff structure. When Kmart and Wal-Mart make location choices in
the first stage, they take into consideration the small retailers’ reaction. There are no entry
barriers; small firms enter the market until profit for an extra entrant becomes negative.
In reality, small retailers existed long before the era of the discount chains. As the chains
emerge in the retail industry, small firms either continue their operations and compete with
the chains or exit the market. This might suggest a three-stage model, with each stage
corresponding to each of these events. However, given the nature of the retail industry,
the sunk entry cost is unlikely to be significant for the small firms. In other words, their
first stage decisions are irrelevant, and the small retailers respond to chain stores’ entry
decisions in the third stage.12 In contrast, I implicitly assume that chains can commit to
their entry decisions and do not further adjust after small firms enter. This is based on the
11 I have implicitly assumed that small firms, which are firms with one to nineteen employees, are single-unit
stores.
12 It is possible that small firm owners have invested considerably in their communities to establish a
customer base and a good reputation. These kinds of investment are sunk if stores are closed. However, these
investments can be partially recovered when the firm owners switch to other retail sectors to make use of the
consumer goodwill.
observation that most chain stores enter with a long-term lease of the rental property, and
in many cases they invest considerably in the infrastructure construction associated with
establishing a big store.
2.4.2 The profit function
One way to obtain the profit function is to start from primitive assumptions of supply
and demand in the retail markets, and derive the profit functions from the equilibrium
conditions. With no price, quantity, or sales data, and with very limited information
on store characteristics, this approach is impractical here: it is extremely demanding on data and relies heavily
on the primitive assumptions. Instead, I follow the convention in the entry literature and
assume that firms’ profit functions from the stage competition take a linear form and that
profits decline in the presence of rivals.
Let $D_{i,m} \in \{0, 1\}$ stand for chain i’s strategy in market m, where $D_{i,m} = 1$ if chain i
operates a store in market m and $D_{i,m} = 0$ otherwise. $D_i = \{D_{i,1}, ..., D_{i,M}\}$ is a vector
indicating chain i’s location choices for the entire set of markets. Let $D_{j,m}$ denote rival j’s
strategy in market m, and $N_{s,m}$ the number of small firms in market m. $X_m$, $\varepsilon_m$, and $\eta_{i,m}$
stand for a vector of observed market size variables, the market level profit shock, and firm
i’s private profit shock in market m, respectively. Finally, let $Z_{ml}$ designate the distance
from market m to market l in miles, and $Z_m = \{Z_{m1}, ..., Z_{mM}\}$.
The profit function for chain i in market m takes the following form:
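$$
\Pi_{i,m}(D_i, D_{j,m}, N_{s,m}; X_m, \varepsilon_m, \eta_{i,m}, Z_m) = D_{i,m} * \Big[ X_m\beta_i + \delta_{ij} D_{j,m} + \delta_{is}\ln(N_{s,m}+1) + \delta_{ii}\sum_{l\neq m}\frac{D_{i,l}}{Z_{ml}} + \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m} \Big] \tag{2.1}
$$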
where i, j ∈ {k, w}, with “k” for Kmart and “w” for Wal-Mart. Note the presence of Di in
Πi,m (·): profit in market m depends on the number of stores chain i has in other markets.
Profit from staying outside the market is normalized to 0. Chains maximize their total
profits in all markets Σm Πi,m . In equilibrium, the number of small firms is a function of
Kmart’s and Wal-Mart’s first stage decisions: Ns,m (Dk,m , Dw,m ). When making location
choices, the chains take into consideration the impact of small firms’ reactions on their own
profits.
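As a rough illustration of how equation (2.1) is evaluated for one chain in one market, the following minimal sketch computes the bracketed terms directly. The function name, argument names, and numerical values are hypothetical and chosen only for illustration. The sketch sums the chain effect over all other markets and so ignores the distance cutoff discussed later in this section.

```python
import math

def chain_profit_in_market(m, D_own, D_rival_m, n_small_m, xb_m,
                           eps_m, eta_im, rho,
                           delta_ij, delta_is, delta_ii, Z):
    """Profit of chain i in market m, in the form of equation (2.1).

    D_own      : list of 0/1 entry decisions of chain i over all markets
    D_rival_m  : 0/1, whether rival j has a store in market m
    n_small_m  : number of small firms in market m
    xb_m       : the index X_m * beta_i, assumed already computed
    Z          : Z[m][l] is the distance in miles between markets m and l
    """
    if D_own[m] == 0:
        return 0.0  # profit from staying out is normalized to zero
    # Chain effect: sum of D_{i,l} / Z_{ml} over all other markets l.
    chain_effect = sum(D_own[l] / Z[m][l]
                       for l in range(len(D_own)) if l != m)
    # Composite unobserved shock with unit variance.
    shock = math.sqrt(1.0 - rho ** 2) * eps_m + rho * eta_im
    return (xb_m + delta_ij * D_rival_m
            + delta_is * math.log(n_small_m + 1)
            + delta_ii * chain_effect + shock)

# Illustrative numbers only (not estimates from this chapter).
Z = [[0.0, 10.0], [10.0, 0.0]]
profit = chain_profit_in_market(
    m=0, D_own=[1, 1], D_rival_m=1, n_small_m=0, xb_m=1.0,
    eps_m=0.0, eta_im=0.0, rho=0.5,
    delta_ij=-0.5, delta_is=-0.3, delta_ii=2.0, Z=Z)
```

With these made-up inputs the rival's presence lowers profit by 0.5, while the chain's own store ten miles away adds 2.0/10 = 0.2.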
There are several components in chain i’s profit $\Pi_{i,m}$ in market m: the observed market
size $X_m\beta_i$ that is parameterized by demand shifters, like population, the extent of urbanization, etc.; the unobserved profit shock $\sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m}$, known to the firms but unknown
to the econometrician; the competition effects $\delta_{ij} D_{j,m} + \delta_{is}\ln(N_{s,m}+1)$, as well as the
chain effect $\delta_{ii}\sum_{l\neq m} D_{i,l}/Z_{ml}$. Notice that the observed market size component $X_m\beta_i$ is allowed
to differ for different players. $X_m$ includes all factors that influence profits, and $\beta_i$ picks up
the factors that are relevant for firm i. For example, Kmart might have some advantage
in the Midwest, while Wal-Mart stores might be more profitable in markets close to their
headquarters.
The unobserved profit shock has two elements: εm , the market-level profit shifter that
affects both chains and small firms, and $\eta_{i,m}$, a firm-specific profit shock. $\varepsilon_m$ is assumed to
be i.i.d. across markets, while $\eta_{i,m}$ is assumed to be i.i.d. across both firms and markets.
$\sqrt{1-\rho^2}$ (with $0 \le \rho \le 1$) measures how important the market component is. In principle,
it can differ for each chain and for small firms. For example, the market specific business
environment — how developed the infrastructure is, whether the market has sophisticated
shopping facilities, and the stance of the local community toward large corporations including big retailers — might matter more to chains than to small firms. In the baseline
specification, I restrict ρ to be the same across all players. Relaxing it does not improve
the fit much. ηi,m incorporates the unobserved store level heterogeneity, including the management ability, the display style and shopping environment, employees’ morale or skills,
etc. As is standard in discrete choice models, the scale of the parameter coefficients and
the variance of the error term are not separately identified. I normalize the variance of the
error term to 1 by assuming that both εm and η i,m are standard normal random variables.
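To see why this normalization delivers a unit variance, the short calculation below may help. It is a sketch that assumes, in addition to the standard normal assumption stated above, that $\varepsilon_m$ and $\eta_{i,m}$ are independent of each other.

```latex
\operatorname{Var}\!\left(\sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m}\right)
  = (1-\rho^2)\operatorname{Var}(\varepsilon_m) + \rho^2\operatorname{Var}(\eta_{i,m})
  = (1-\rho^2) + \rho^2 = 1 .
% For two different firms i and j in the same market m, the composite shocks
% share only the market component, so under the same independence assumption
\operatorname{Cov}\!\left(\sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m},\;
                          \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{j,m}\right)
  = 1-\rho^2 .
```

Under these assumptions, a smaller $\rho$ means that the shocks of different firms in the same market are more strongly correlated.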
The competition effect from the rival chain is captured by δ ij Dj,m , where Dj,m is one if
rival j operates a store in market m. δ is ln(Ns,m + 1) denotes the effect of small firms on
chain i’s profit. The addition of 1 in ln(Ns,m + 1) is used to avoid ln 0 for markets without
any small firms. The log form allows the incremental competition effect to taper off when
there are many small firms.
The last unexplained term in the bracket, $\delta_{ii}\sum_{l\neq m} D_{i,l}/Z_{ml}$, captures the chain effect, the
benefit that having stores in other markets generates for the profitability in market m. δ ii
is assumed to be non-negative. Stores split the costs of operation, delivery, and advertising
among nearby ones to achieve scale economies. They also share knowledge of the localized
markets and learn from one another’s managerial success. All these factors suggest that
having stores nearby benefits the operation in market m, and that the benefit declines with
the distance. Following Bajari and Fox (2005), I divide the spillover effect by the distance
between the two markets $Z_{ml}$, so that profit in market m is increased by $\delta_{ii} D_{i,l}/Z_{ml}$ if there is
a store in market l that is Zml miles away. This simple formulation serves two purposes.
First, it is a parsimonious way to capture the fact that it might be increasingly difficult
to benefit from stores that are farther away. Second, the econometric technique exploited
in the estimation requires the dependence among the observations to die away sufficiently
fast. I also assume that the chain effect takes place among counties whose centroids are
within fifty miles, or roughly an area that expands seventy-five miles in each direction.
Including counties within a hundred miles increases substantially the computing time with
little change in the parameters.
This chapter focuses on the chain effect that is “localized” in nature. Some chain effects
are “global”: for example, the gain that arises from a chain’s ability to buy a large volume at
a discount. The latter benefits affect all stores the same, and cannot be separately identified
from the constant of the profit function. Hence, the estimates of $\delta_{ii}$ should be interpreted as
a lower bound to the actual advantages enjoyed by a chain.
Profit for a small firm that operates in market m is:
$$
\Pi_{s,m}(D_{k,m}, D_{w,m}, N_{s,m}; X_m, \varepsilon_m, \eta_{s,m}) = X_m\beta_s + \sum_{i=k,w}\delta_{si} D_{i,m} + \delta_{ss}\ln(N_{s,m}) + \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{s,m} \tag{2.2}
$$
All small firms are symmetric with the same profit function Πs,m (·). For markets with no
small firms, the entry condition implies that profit for a single small firm is negative: $X_m\beta_s + \sum_{i=k,w}\delta_{si} D_{i,m} + \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{s,m} < 0$. For markets with $N_{s,m}$ small firms, $\Pi_{s,m}(N_{s,m}) \ge 0$
and $\Pi_{s,m}(N_{s,m}+1) < 0$. The term $\delta_{ss}\ln(N_{s,m})$ captures the competition among small firms,
while Σi=k,w δ si Di,m denotes the impact of Kmart and Wal-Mart on small firms. The static
nature of the model does not allow separate identification of the different channels through
which the competition effect takes place. For example, one can’t tell how much of the
competition effect is due to the forced exit of small firms, and how much is due to the
preemption that reduces entry of small firms.
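The free-entry condition for small firms can be illustrated with a minimal sketch. Here `base_profit` is a hypothetical name for the part of equation (2.2) that does not depend on the number of small firms (market size, the chains' competition effects, and the shocks), and the sketch assumes a strictly negative $\delta_{ss}$ so that entry eventually stops. All numbers are made up.

```python
import math

def free_entry_small_firms(base_profit, delta_ss, n_max=10_000):
    """Number of small firms N with profit(N) >= 0 and profit(N + 1) < 0.

    base_profit : X_m*beta_s + sum_i delta_si*D_{i,m} + composite shock
    delta_ss    : competition effect among small firms (assumed < 0 here)
    n_max       : safety cap on the search, an assumption of this sketch
    """
    if delta_ss >= 0:
        raise ValueError("this sketch assumes delta_ss < 0")
    n = 0
    # Keep admitting entrants while the (n + 1)-th firm earns non-negative profit.
    while n < n_max and base_profit + delta_ss * math.log(n + 1) >= 0:
        n += 1
    return n

# Illustrative numbers only.
n_without_chain = free_entry_small_firms(base_profit=1.0, delta_ss=-1.0)
n_with_chain = free_entry_small_firms(base_profit=1.0 - 0.5, delta_ss=-1.0)
n_unprofitable = free_entry_small_firms(base_profit=-0.1, delta_ss=-1.0)
```

In this toy example a chain store that lowers small-firm profit by 0.5 reduces the equilibrium number of small firms from two to one.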
The market-level error term εm makes the location choices of the chain stores Dk,m and
Dw,m , and the number of small firms Ns,m endogenous in the profit functions, since a large
εm leads to more entries of both chains and small firms. I explicitly address the issue of
endogeneity by solving chains’ and small firms’ entry decisions simultaneously within the
model. To estimate only the competition effect of big retailers on small firms δ si , without
analyzing the equilibrium consequences of policy changes, it suffices to regress the number
of small stores on market size variables, together with the number of chain stores, and to use
instruments to correct the OLS bias of the competition effect. However, valid instruments
for the presence of each of the rivals are difficult to find. Researchers have experimented
with distance to headquarters or stores’ planned opening dates to instrument for Wal-Mart’s
entry decisions.13 It is much more difficult to find good instruments for Kmart. The $R^2$
of regressing Kmart stores’ locations on their distance to headquarters is less than 0.005.
Another awkward feature of the linear IV regression is that the predicted number of small
13 See Neumark (2005) and Basker (2005a).
firms can be negative. Limited dependent variable estimation avoids this problem, but
accounting for endogeneity in the discrete games requires strong assumptions about the
nature of the endogeneity that are not satisfied by the current model. Perhaps the best
argument for the current approach, besides the possibility of analyzing policy experiments
and studying the spillover among the chain stores, is that the structural model can exploit
the chain effect to help with identification. The chain effect drives entry decisions of chain
stores, but is not related to small firms’ entry decisions, and serves as a natural excluded
variable in the identification of the chains’ competition effects on small firms.
Note that the above specification allows very flexible competition patterns among all the
possible firm-pair combinations. The parameters to be estimated are $\{\beta_i, \delta_{ij}, \delta_{ii}, \rho\}$, $i, j \in \{k, w, s\}$,
and the central parameters are the competition effects $\delta_{ij}$, $i, j \in \{k, w, s\}$, $i \neq j$
and the chain effects $\delta_{ii}$, $i \in \{k, w\}$.
2.5 Solution algorithm
The unobserved market level profit shock $\varepsilon_m$, together with the chain effect $\delta_{ii}\sum_{l\neq m} D_{i,l}/Z_{ml}$,
renders all of the discrete variables Di,m , Dj,m , Di,l , and Ns,m endogenous in the profit
functions (2.1) and (2.2). Finding the Nash equilibrium of this game is complicated. I take
several steps to address this problem. Section 2.5.1 explains how to find each chain’s best
response conditioning on rivals’ choices, section 2.5.2 derives the solution algorithm for the
game between two chains, and section 2.5.3 adds the small retailers and solves for the Nash
equilibrium of the full model.
2.5.1 The best response function
In this subsection, let us focus on the chain’s single-agent problem and abstract from competition. In the next two subsections I incorporate competition and solve the model for all
players.
For notational simplicity, I have suppressed the firm subscript i and used $X_m$ instead of
$X_m\beta_i + \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m}$ in the profit function throughout this subsection. Let M denote
the total number of markets, and let $\mathbf{D} = \{0, 1\}^M$ denote the choice set. An element of the
set $\mathbf{D}$ is an M-coordinate vector $D = \{D_1, ..., D_M\}$. The profit-maximization problem is:
$$
\max_{D_1, ..., D_M \in \{0,1\}} \Pi = \sum_{m=1}^{M} \Big[ D_m * \Big( X_m + \delta \sum_{l\neq m} \frac{D_l}{Z_{ml}} \Big) \Big]
$$
The choice variable $D_m$ appears in the profit function in two ways. First, it directly determines profit in market m: the firm earns $X_m + \delta\sum_{l\neq m} D_l/Z_{ml}$ if $D_m = 1$, and zero if $D_m = 0$.
Second, the decision to open a store in market m increases the profits in other markets
through the chain effect.
The complexity of this maximization problem is twofold: first, it is a discrete problem
of large dimension. In the current application, with M = 2065 and two choices for each
market (enter or stay outside), the number of possible elements in the choice set $\mathbf{D}$ is $2^{2065}$, or
roughly $10^{600}$. The naive approach that evaluates all of them to find the profit-maximizing
vector(s) is infeasible. Second, the profit function is irregular: it is neither concave nor
convex. Consider the relaxed function where Dm takes real values, rather than integers
{0, 1}. The Hessian of this function is indefinite, and the usual first-order condition does
not guarantee an optimum.14 Even if one could exploit the first-order condition, the search
with a large number of choice variables is a daunting task.
Instead of solving the problem directly, I transform it into a search for the fixed points
of the necessary conditions for profit maximization. In particular, I exploit the lattice
structure of the set of fixed points of an increasing function and propose an algorithm that
obtains an upper bound DU and a lower bound DL for the profit-maximizing vector(s) .
With these two bounds at hand, I evaluate all vectors that lie between them to find the
profit-maximizing ones.
The set of profit maximizing vectors may not be a singleton. For example, in the case
of two markets with $X_1 = -1$, $X_2 = -1$, $\delta = 1$, and $Z_{1,2} = Z_{2,1} = 1$, both $D^* = \{0, 0\}$
and $D^{**} = \{1, 1\}$ maximize the total profit. Here I assume there is only one solution.
In appendix 2.9.5, I show that allowing multiple optimal solutions is a straightforward
extension.
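The two-market example can be checked by brute force. The sketch below enumerates every element of the choice set, which is feasible only for a handful of markets; the helper names `total_profit` and `brute_force_maximizers` are hypothetical.

```python
from itertools import product

def total_profit(D, X, delta, Z):
    """Total profit: sum over m of D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    M = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(M) if l != m))
        for m in range(M))

def brute_force_maximizers(X, delta, Z, tol=1e-12):
    """All profit-maximizing vectors, found by evaluating the 2^M candidates."""
    M = len(X)
    profits = {D: total_profit(D, X, delta, Z)
               for D in product((0, 1), repeat=M)}
    best = max(profits.values())
    return sorted(D for D, p in profits.items() if abs(p - best) <= tol)

# The example from the text: X_1 = X_2 = -1, delta = 1, Z_12 = Z_21 = 1.
# Diagonal entries of Z are never used because l != m in the inner sum.
maximizers = brute_force_maximizers(X=[-1.0, -1.0], delta=1.0,
                                    Z=[[0.0, 1.0], [1.0, 0.0]])
```

Entering neither market and entering both yield zero profit, while entering only one market yields a profit of -1, so the maximizer is not unique in this example.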
Throughout this chapter, the comparison between vectors is coordinate-wise. A vector
$D$ is bigger than vector $D'$ if and only if every element of $D$ is weakly bigger: $D \ge D'$ if
and only if $D_m \ge D'_m$ $\forall m$. $D$ and $D'$ are unordered if neither $D \ge D'$ nor $D \le D'$. They
are the same if both $D \ge D'$ and $D \le D'$.
14 A symmetric matrix is positive (negative) semidefinite iff all the eigenvalues are non-negative (nonpositive). The Hessian of the profit function (2.1) is a symmetric matrix with zero for all the diagonal
elements. Its trace, which is equal to the sum of the eigenvalues, is zero. If the Hessian matrix has a positive
eigenvalue, it has to have a negative one as well. There is only one possibility for the Hessian to be positive
(or negative) semidefinite, which is that all the eigenvalues are 0. This is true only for the zero matrix H=0.
Let the profit maximizer be denoted $D^* = \arg\max_{D\in\mathbf{D}} \Pi(D)$. The optimality of $D^*$
implies that profit at $D^*$ must be (weakly) higher than the profit at any one-market deviation:
$$
\Pi(D_1^*, ..., D_m^*, ..., D_M^*) \ge \Pi(D_1^*, ..., D_m, ..., D_M^*), \quad \forall m
$$
which leads to:
$$
D_m^* = 1\Big[X_m + 2\delta\sum_{l\neq m}\frac{D_l^*}{Z_{ml}} \ge 0\Big], \quad \forall m \tag{2.3}
$$
The derivation of equation (2.3) is left to appendix 2.9.1. These conditions have the usual
interpretation that $X_m + 2\delta\sum_{l\neq m} D_l^*/Z_{ml}$ is market m’s marginal contribution to total profit.
This equation system is not definitional; it is a set of necessary conditions for the optimal
vector D∗ . Not all vectors that satisfy (2.3) maximize profit, but if D∗ maximizes profit, it
must satisfy these constraints.
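The factor of 2 in these conditions can be seen from a short calculation. The following is only a sketch, not the derivation in appendix 2.9.1, and it assumes that distances are symmetric, $Z_{ml} = Z_{lm}$.

```latex
% Collect the terms of total profit that involve D_m:
%   (i)  market m's own profit, and
%   (ii) the chain effect that D_m contributes to every other market l.
\Pi(D) = D_m\Big(X_m + \delta\sum_{l\neq m}\frac{D_l}{Z_{ml}}\Big)
       + \sum_{l\neq m} D_l\,\delta\,\frac{D_m}{Z_{lm}}
       + (\text{terms without } D_m).
% With Z_{ml} = Z_{lm}, switching D_m from 0 to 1 changes total profit by
\Pi(D_m = 1, D_{-m}) - \Pi(D_m = 0, D_{-m})
  = X_m + 2\delta\sum_{l\neq m}\frac{D_l}{Z_{ml}} .
```

Entry in market m is therefore consistent with optimality only if this difference, evaluated at $D^*$, is non-negative.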
Define $V_m(D) = 1[X_m + 2\delta\sum_{l\neq m} D_l/Z_{ml} \ge 0]$, and $V(D) = \{V_1(D), ..., V_M(D)\}$. V(·) is a
vector function that maps from $\mathbf{D}$ into itself: $V: \mathbf{D} \to \mathbf{D}$. It is an increasing function:
$V(D') \ge V(D'')$ whenever $D' \ge D''$. By construction, the profit maximizer $D^*$ is one of
V(·)’s fixed points. The following theorem, proved by Tarski (1955), states that the set of
fixed points of an increasing function that maps from a lattice into itself is a lattice and has
a greatest point and a least point. Appendix 2.9.2 describes the basic lattice theory.
Theorem 2.1. Suppose that Y (X) is an increasing function from a nonempty complete
lattice X into X.
(a) The set of fixed points of Y (X) is nonempty, supX ({X ∈ X, X ≤ Y (X)}) is the
greatest fixed point, and inf X ({X ∈ X, Y (X) ≤ X}) is the least fixed point.
(b) The set of fixed points of Y (X) in X is a nonempty complete lattice.
A lattice in which each nonempty subset has a supremum and an infimum is complete.
Any finite lattice is complete. A nonempty complete lattice has a greatest and a least
element. Since the choice set D is a finite lattice, it is complete, and Theorem 2.1 can be
directly applied. Several points are worth mentioning. First, X can be a closed interval or
it can be a discrete set, as long as the set includes the greatest lower bound and the least
upper bound for any of its nonempty subsets. That is, it is a complete lattice. Second,
the set of fixed points is itself a nonempty complete lattice, with a greatest and a smallest
point. Third, the requirement that Y (X) is “increasing” is crucial; it can’t be replaced by
assuming that Y (X) is a monotone function. Appendix 2.9.2 provides a counterexample
where the set of fixed points for a decreasing function is empty.
Now I outline the algorithm that delivers the greatest and the least fixed point of V (D),
which are, respectively, an upper bound and a lower bound for the optimal solution vector
D∗ . To find D∗ , I rely on an exhaustive search among the vectors lying between these two
bounds.
Start with $D^0 = \sup(\mathbf{D}) = \{1, ..., 1\}$. The supremum exists because $\mathbf{D}$ is a complete
lattice. Define a sequence $\{D^t\}$: $D^1 = V(D^0)$, and $D^{t+1} = V(D^t)$. By the construction of
$D^0$, we have: $D^0 \ge V(D^0) = D^1$. Since V(·) is an increasing function, $V(D^0) \ge V(D^1)$,
or $D^1 \ge D^2$. Iterating this process several times generates a decreasing sequence: $D^0 \ge D^1 \ge ... \ge D^t$. Given that $D^0$ has only M distinct elements and at least one element of the
D vector is changed from 1 to 0 in each iteration, the process converges within M steps:
$D^{T-1} = D^T$, $T \le M$. Let $D^U$ denote the convergent vector. $D^U$ is a fixed point of the
function V(·): $D^U = V(D^U)$. To show that $D^U$ is indeed the greatest element of the set of
fixed points, note that $D^0 \ge D'$, where $D'$ is an arbitrary element of the set of fixed points.
Applying the function V(·) to the inequality T times, we have $D^U = V^T(D^0) \ge V^T(D') = D'$.
Using the dual argument, one can show that the convergent vector derived from $D^0 = \inf(\mathbf{D}) = \{0, ..., 0\}$ is the least element in the set of fixed points. Denote it by $D^L$. In
appendix 2.9.3, I show that starting from the solution to a constrained version of the profit
maximization problem yields a tighter lower bound. There I also illustrate how a tighter
upper bound can be obtained by starting with a vector D̃ such that D̃ ≥ D∗ and D̃ ≥ V (D̃).
With the two bounds DU and DL at hand, I evaluate all vectors that lie between them
and find the profit-maximizing vector D∗ .
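The procedure of this subsection can be sketched in a few lines. The sketch below is illustrative rather than the implementation used for estimation: it assumes $\delta \ge 0$ (so that V is increasing), uses the basic starting points $\{1, ..., 1\}$ and $\{0, ..., 0\}$ rather than the tighter bounds of appendix 2.9.3, and all function names and numbers are hypothetical.

```python
from itertools import product

def total_profit(D, X, delta, Z):
    """Sum over m of D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    M = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(M) if l != m))
        for m in range(M))

def V(D, X, delta, Z):
    """Necessary conditions (2.3): enter m iff its marginal contribution >= 0."""
    M = len(D)
    return tuple(
        1 if X[m] + 2 * delta * sum(D[l] / Z[m][l]
                                    for l in range(M) if l != m) >= 0 else 0
        for m in range(M))

def iterate_to_fixed_point(D, X, delta, Z):
    """Apply V repeatedly until the vector stops changing."""
    while True:
        D_next = V(D, X, delta, Z)
        if D_next == D:
            return D
        D = D_next

def maximize_profit(X, delta, Z):
    M = len(X)
    D_upper = iterate_to_fixed_point((1,) * M, X, delta, Z)  # greatest fixed point
    D_lower = iterate_to_fixed_point((0,) * M, X, delta, Z)  # least fixed point
    # Only markets where the two bounds disagree remain to be searched.
    free = [m for m in range(M) if D_lower[m] != D_upper[m]]
    best, best_profit = None, float("-inf")
    for values in product((0, 1), repeat=len(free)):
        D = list(D_lower)
        for m, v in zip(free, values):
            D[m] = v
        p = total_profit(D, X, delta, Z)
        if p > best_profit:
            best, best_profit = tuple(D), p
    return D_lower, D_upper, best, best_profit

# Illustrative three-market example with all pairwise distances equal to 2.
Z3 = [[0.0, 2.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
lower, upper, D_star, profit_star = maximize_profit([1.0, -0.5, -3.0], 1.0, Z3)
```

In this example the two bounds coincide, so no exhaustive search is needed. Market 2 is unprofitable on its own but is entered because of the chain effect from market 1.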
2.5.2 The maximization problem with two competing chains
The discussion in the previous subsection abstracts from rival-chain competition and considers only the chain effect. With the competition effect from the rival-chain, the profit
function for chain i becomes: $\Pi_i(D_i, D_j) = \sum_{m=1}^{M} [D_{i,m} * (X_{im} + \delta_{ii}\sum_{l\neq m} D_{i,l}/Z_{ml} + \delta_{ij} D_{j,m})]$,
where $X_{im}$ contains $X_m\beta_i + \sqrt{1-\rho^2}\,\varepsilon_m + \rho\,\eta_{i,m}$.
To address the interaction between the chain effect and the rival-chain competition effect,
I invoke the following theorem from Topkis (1978), which states that the best response
function is decreasing in the rival’s strategy when the payoff function is supermodular and
has decreasing differences. Specifically:15
Theorem 2.2. If X is a lattice, K is a partially ordered set, Y (X, k) is supermodular in
X on X for each k in K, and Y (X, k) has decreasing differences in (X, k) on X × K, then
15 The original theorem is stated in terms of $\Pi(D, t)$ having increasing differences in $(D, t)$, and
$\arg\max_{D\in\mathbf{D}} \Pi(D, t)$ increasing in t. Replacing t with −t yields the version of the theorem stated here.
arg maxX∈X Y (X, k) is decreasing in k on {k : k ∈ K, arg maxX∈X Y (X, k) is nonempty}.
$Y(X, k)$ has decreasing differences in $(X, k)$ on $\mathbf{X} \times K$ if $Y(X, k'') - Y(X, k')$ is decreasing in $X \in \mathbf{X}$ for all $k' \le k''$ in K. Intuitively, $Y(X, k)$ has decreasing differences
in $(X, k)$ if X and k are substitutes. In appendix 2.9.4, I verify that the profit function
$\Pi_i(D_i, D_j) = \sum_{m=1}^{M} [D_{i,m} * (X_{im} + \delta_{ii}\sum_{l\neq m} D_{i,l}/Z_{ml} + \delta_{ij} D_{j,m})]$ is supermodular in its own strategy
$D_i$ and has decreasing differences in $(D_i, D_j)$. From Theorem 2.2, chain i’s best response
correspondence $\arg\max_{D_i\in\mathbf{D}_i} \Pi_i(D_i, D_j)$ decreases in rival j’s strategy $D_j$. Similarly for
chain j’s best response to i’s strategy.
As the simple example in section 2.5.1 illustrates, given the rival’s strategy Dj ,
arg maxDi ∈D Πi (Di , Dj ) can contain more than one element. For the moment, assume
that arg maxDi ∈D Πi (Di , Dj ) is a singleton for any given Dj . Appendix 2.9.5 discusses the
case in which arg maxDi ∈D Πi (Di , Dj ) has multiple elements. The extension involves the
concepts of set ordering and increasing (decreasing) selection, but is fairly straightforward.
The set of Nash equilibria of a supermodular game is nonempty and it has a greatest
element and a least element.16,17 The current entry game is not supermodular, as the profit
function has decreasing differences in the joint strategy space D × D. This leads to a nonincreasing joint best response function, and we know from the discussion after Theorem 2.1
that a non-increasing function on a lattice can have an empty set of fixed points. A simple
transformation, however, restores the supermodularity property of the game. The trick is
to define a new strategy space for one player (for example, Kmart) to be the negative of
16 See Topkis (1978) and Zhou (1994).
17 A game is supermodular if the payoff function $\Pi_i(D_i, D_{-i})$ is supermodular in $D_i$ for each $D_{-i}$ and each
player i, and $\Pi_i(D_i, D_{-i})$ has increasing differences in $(D_i, D_{-i})$ for each i.
the original space. Let $\tilde{D}_k = -D_k$. The profit function can be re-written as:
$$
\Pi_k(-D_k, D_w) = \sum_m (-D_{k,m}) * \Big[-X_{km} + \delta_{kk}\sum_{l\neq m}\frac{-D_{k,l}}{Z_{ml}} + (-\delta_{kw}) D_{w,m}\Big]
$$
$$
\Pi_w(D_w, -D_k) = \sum_m D_{w,m} * \Big[X_{wm} + \delta_{ww}\sum_{l\neq m}\frac{D_{w,l}}{Z_{ml}} + (-\delta_{wk})(-D_{k,m})\Big]
$$
It is easy to verify that the game defined on the new strategy space $(\tilde{D}_k, D_w)$ is supermodular, therefore a Nash equilibrium exists. Using the transformation $\tilde{D}_k = -D_k$, one can find
the corresponding equilibrium in the original strategy space. In the following paragraphs, I
explain how to find the desired Nash equilibrium directly in the space of $(D_k, D_w)$ using the
“Round-Robin” algorithm, where each player proceeds in turn to update its own strategy.18
18 See Topkis (1998) for a detailed discussion.
To obtain the equilibrium most profitable for Kmart, start with the smallest vector in Wal-Mart’s strategy space: $D_w^0 = \inf(\mathbf{D}) = \{0, ..., 0\}$. Derive Kmart’s best response $K(D_w^0) = \arg\max_{D_k\in\mathbf{D}} \Pi_k(D_k, D_w^0)$ given $D_w^0$, using the method outlined in section 2.5.1, and denote it by $D_k^1 = K(D_w^0)$. Similarly, find Wal-Mart’s best response $W(D_k^1) = \arg\max_{D_w\in\mathbf{D}} \Pi_w(D_w, D_k^1)$ given $D_k^1$, again using the method in section 2.5.1, and denote it by $D_w^1$. Note that $D_w^1 \ge D_w^0$, by the construction of $D_w^0$. This finishes the first iteration $\{D_k^1, D_w^1\}$.
Fix $D_w^1$ and solve for Kmart’s best response $D_k^2 = K(D_w^1)$. By Theorem 2.2, Kmart’s best response decreases in the rival’s strategy, so $D_k^2 = K(D_w^1) \le D_k^1 = K(D_w^0)$. The same argument shows that $D_w^2 \ge D_w^1$. Iterating the process generates two monotone sequences: $D_k^1 \ge D_k^2 \ge ... \ge D_k^t$, $D_w^1 \le D_w^2 \le ... \le D_w^t$. In every iteration, at least one element of the $D_k$ vector is changed from 1 to 0, and one element of the $D_w$ vector is changed from 0 to 1, so the algorithm converges within M steps: $D_k^T = D_k^{T-1}$, $D_w^T = D_w^{T-1}$, $T \le M$. The convergent vectors $(D_k^T, D_w^T)$ constitute an equilibrium: $D_k^T = K(D_w^T)$, $D_w^T = W(D_k^T)$. Furthermore, this equilibrium gives Kmart the highest profit among the set of all equilibria.
That Kmart prefers the equilibrium $(D_k^T, D_w^T)$ obtained using $D_w^0 = \{0, ..., 0\}$ to all other equilibria follows from two results: first, $D_w^T \le D_w^*$ for any $D_w^*$ that belongs to an equilibrium; second, $\Pi_k(K(D_w), D_w)$ decreases in $D_w$, where $K(D_w)$ denotes Kmart’s best response function. Together they imply that $\Pi_k(D_k^T, D_w^T) \ge \Pi_k(D_k^*, D_w^*)$, $\forall \{D_k^*, D_w^*\}$ that belongs to the set of Nash equilibria.
To show the first result, note that $D_w^0 \le D_w^*$, by the construction of $D_w^0$. Since $K(D_w)$ decreases in $D_w$, $D_k^1 = K(D_w^0) \ge K(D_w^*) = D_k^*$. Similarly, $D_w^1 = W(D_k^1) \le W(D_k^*) = D_w^*$. Repeating this process T times leads to $D_k^T = K(D_w^T) \ge K(D_w^*) = D_k^*$, and $D_w^T = W(D_k^T) \le W(D_k^*) = D_w^*$. The second result follows from $\Pi_k(K(D_w^*), D_w^*) \le \Pi_k(K(D_w^*), D_w^T) \le \Pi_k(K(D_w^T), D_w^T)$. The first inequality holds because Kmart’s profit function decreases in its rival’s strategy, while the second inequality follows from the definition
of the best response function $K(D_w)$.
By the dual argument, starting with $D_k^0 = \inf(\mathbf{D}) = \{0, ..., 0\}$ delivers the equilibrium
that is most preferred by Wal-Mart. To search for the equilibrium that favors Wal-Mart in
the southern region and Kmart in the rest of the country, one uses the same algorithm to
solve the game separately for the south and the other regions.
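The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the best response is found by brute force over all $2^M$ entry vectors, which stands in for the method of section 2.5.1 and is feasible only for tiny M; the three-market numbers, the distance matrix `z`, and the function names are hypothetical and are not taken from the data.

```python
from itertools import product

def profit(own, rival, x, d_own, d_rival, z):
    """Chain profit: sum over entered markets of x_m + chain effect + competition effect."""
    n = len(own)
    total = 0.0
    for m in range(n):
        if own[m]:
            # Distance-weighted count of the chain's own stores in other markets.
            spill = sum(own[l] / z[m][l] for l in range(n) if l != m)
            total += x[m] + d_own * spill + d_rival * rival[m]
    return total

def best_response(rival, x, d_own, d_rival, z):
    # Brute force over all 2^M entry vectors (a stand-in for section 2.5.1).
    return max(product((0, 1), repeat=len(rival)),
               key=lambda own: profit(own, rival, x, d_own, d_rival, z))

def preferred_equilibrium(x_a, x_b, d_aa, d_ab, d_bb, d_ba, z):
    """Equilibrium most profitable for chain a: chain b starts at the zero vector."""
    n = len(x_a)
    D_a = best_response((0,) * n, x_a, d_aa, d_ab, z)
    for _ in range(n + 1):
        D_b = best_response(D_a, x_b, d_bb, d_ba, z)
        next_a = best_response(D_b, x_a, d_aa, d_ab, z)
        if next_a == D_a:          # fixed point: D_a = K(D_b), D_b = W(D_a)
            return D_a, D_b
        D_a = next_a
    raise RuntimeError("no convergence; check the sign restrictions")

# Hypothetical three-market example (all numbers are made up).
z = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]          # pairwise distances
x_k, x_w = (1.0, 0.5, -2.0), (0.4, 1.0, 0.3)   # market-level profit shifters
kmart_best = preferred_equilibrium(x_k, x_w, 0.2, -1.5, 0.2, -1.5, z)    # (D_k, D_w)
walmart_best = preferred_equilibrium(x_w, x_k, 0.2, -1.5, 0.2, -1.5, z)  # (D_w, D_k)
```

In this toy example the two starting points lead to different equilibria: Kmart enters the first two markets in its preferred equilibrium, whereas Wal-Mart enters all three markets and Kmart none in the equilibrium preferred by Wal-Mart.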
2.5.3 Adding small firms
Incorporating small firms into the game is a straightforward application of backward induction, since the number of small firms in the second stage is a well-defined function $N_s(D_k, D_w)$. Chain i’s profit function now becomes
$$\Pi_i(D_i, D_j) = \sum_{m=1}^{M} \Big[ D_{i,m} * \Big( X_{im} + \delta_{ii} \sum_{l \neq m} \frac{D_{i,l}}{Z_{ml}} + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_s(D_{i,m}, D_{j,m}) + 1) \Big) \Big],$$
where $X_{im}$ is defined in the previous subsection. The profit function $\Pi_i(D_i, D_j)$ remains
supermodular in $D_i$ with decreasing differences in $(D_i, D_j)$ under a minor assumption, which
essentially requires that the net competition effect of rival $D_j$ on chain i’s profit is negative.19
The main computational burden in solving the full model with both chains and small
retailers is the search for the best responses K(Dw ) and W (Dk ). In appendix 2.9.6, I discuss
a few technical details related to the implementation.
2.6 Empirical implementation
2.6.1 Estimation
The model does not yield a closed form solution to firms’ location choices conditioning on
market size observables and a given vector of parameter values. Hence I turn to simulation
methods. The ones most frequently used in the I.O. literature are the method of simulated
log-likelihood (MSL) and the method of simulated moments (MSM).
19 If we ignore the integer problem and approximate $\ln(N_s + 1)$ by $-(X_{sm} + \delta_{sk} D_k + \delta_{sw} D_w)$, then the
assumption is: $\delta_{kw} - \delta_{ks} \frac{\delta_{sw}}{\delta_{ss}} < 0$, $\delta_{wk} - \delta_{ws} \frac{\delta_{sk}}{\delta_{ss}} < 0$. Essentially, these two conditions imply that when there
are small stores, the ‘net’ competition effect of Wal-Mart (its direct impact, together with its indirect impact
working through small stores) on Kmart’s profit and that of Kmart on Wal-Mart’s profit are still negative.
Implementing MSL is difficult because of the complexities in obtaining an estimate of the
log-likelihood of the observed sample. The cross-sectional dependence among the observed
outcomes in different markets indicates that the log-likelihood of the sample is no longer
the sum of the log-likelihood of each market, and one needs an exceptionally large number
of simulations to get a reasonable estimate of the sample’s likelihood. Thus I adopt the
MSM method to estimate the parameters in the profit functions
$\theta_0 = \{\beta_i, \delta_{ii}, \delta_{ij}, \rho\}_{i=k,w,s} \in \Theta \subset R^P$. The following moment condition is
assumed to hold at the true parameter value $\theta_0$:
$$E[g(X_m, \theta_0)] = 0$$
where $g(X_m, \cdot) \in R^L$ with $L \geq P$ is a vector of moment functions that specifies the differences between the observed equilibrium market structures and those predicted by the
model.
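An MSM estimator $\hat{\theta}$ minimizes a weighted quadratic form in $\sum_{m=1}^{M} \hat{g}(X_m, \theta)$:
$$\min_{\theta \in \Theta} \frac{1}{M} \left[ \sum_{m=1}^{M} \hat{g}(X_m, \theta) \right]' \Omega \left[ \sum_{m=1}^{M} \hat{g}(X_m, \theta) \right] \qquad (2.4)$$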
where $\hat{g}(\cdot)$ is a simulated estimate of the true moment function, and $\Omega$ is an $L \times L$ positive
semidefinite weighting matrix. Assume $\Omega \overset{p}{\to} \Omega_0$, an $L \times L$ positive definite matrix. Define
the $L \times P$ matrix $G_0 = E[\nabla_\theta g(X_m, \theta_0)]$. Under some mild regularity conditions, Pakes and
Pollard (1989) and McFadden (1989) show that:
$$\sqrt{M} (\hat{\theta} - \theta_0) \overset{d}{\to} \text{Normal}(0, (1 + R^{-1}) * A_0^{-1} B_0 A_0^{-1}) \qquad (2.5)$$
where R is the number of simulations, $A_0 \equiv G_0' \Omega_0 G_0$, $B_0 = G_0' \Omega_0 \Lambda_0 \Omega_0 G_0$, and
$\Lambda_0 = E[g(X_m, \theta_0) g(X_m, \theta_0)'] = \text{Var}[g(X_m, \theta_0)]$. If a consistent estimator of $\Lambda_0^{-1}$ is used as the
weighting matrix, the MSM estimator $\hat{\theta}$ is asymptotically efficient, with its asymptotic
variance being $\text{Avar}(\hat{\theta}) = (1 + R^{-1}) * (G_0' \Lambda_0^{-1} G_0)^{-1} / M$.
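As a small illustration of the objective in (2.4) and of the factor $(1 + R^{-1})$ in (2.5), the following sketch evaluates the quadratic form for a made-up set of simulated moments. The function names and the numbers are assumptions chosen for illustration; nothing here reproduces the actual moment functions of the model.

```python
def msm_objective(moments, weight):
    """Objective (2.4): (1/M) * [sum_m g_m]' * weight * [sum_m g_m]."""
    M, L = len(moments), len(moments[0])
    # Sum each moment function over the M markets.
    g_sum = [sum(row[j] for row in moments) for j in range(L)]
    quad = sum(g_sum[i] * weight[i][j] * g_sum[j]
               for i in range(L) for j in range(L))
    return quad / M

def simulation_inflation(R):
    """Variance inflation factor (1 + 1/R) that appears in equation (2.5)."""
    return 1.0 + 1.0 / R

# Made-up simulated moments for M = 3 markets and L = 2 moment functions.
g_hat = [[1.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
value = msm_objective(g_hat, identity)   # summed moments are [2, 1], so value = 5 / 3
factor_150 = simulation_inflation(150)   # 1 + 1/150
```

With 150 simulation draws the factor is about 1.007, so under (2.5) simulation noise adds less than one percent to the asymptotic variance.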
The obstacle in using this standard MSM method is that the moment functions $g(X_m, \cdot)$
are no longer independent across markets when the chain effect induces spatial correlation
in the equilibrium outcome. For example, Wal-Mart’s entry decision in Benton County,
Arkansas directly relates to its entry decision in Carroll County, Arkansas, Benton’s neighbor. In fact, any two entry decisions, $D_{i,m}$ and $D_{i,l}$, are correlated because of the chain
effect, although the dependence becomes very weak when market m and market l are far
apart, since the benefit $\frac{D_{i,l}}{Z_{ml}}$ evaporates with distance.
The MSM estimator remains consistent with such dependent data, but the covariance
matrix needs to be corrected. In particular, the asymptotic covariance matrix of the moment
functions $\Lambda_0$ in equation (2.5) should be replaced by $\Lambda_0^d = \sum_{s \in M} E[g(X_m, \theta_0) g(X_s, \theta_0)']$.
Conley (1999) proposes a nonparametric covariance matrix estimator formed by taking a
weighted average of spatial autocovariance terms, with zero weights for observations farther
than a certain distance. The method requires the underlying data generating process to
satisfy a mixing condition that the dependence among observations dies away quickly as
the distance increases. Following Conley (1999) and Conley & Ligon (2002), the estimator
of $\Lambda_0^d$ is:
$$\hat{\Lambda} \equiv \frac{1}{M} \sum_m \sum_{s \in B_m} \left[ \hat{g}(X_m, \theta) \hat{g}(X_s, \theta)' \right] \qquad (2.6)$$
where Bm is the set of markets whose centroid is within fifty miles of market m, including
market m. I have also estimated the variance of the moment functions Λ̂ summing over
markets within a hundred miles. All of the parameters that are significant with the smaller
set of Bm remain significant, and the changes in the t-statistics are very small.
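A minimal sketch of the estimator in (2.6) follows. It assumes, purely for illustration, that market centroids are given as planar coordinates in miles and that distance is Euclidean; the function name, the coordinates, and the moment values are hypothetical. Unit weights are used inside the cutoff, as (2.6) is written.

```python
from math import hypot

def spatial_covariance(moments, coords, cutoff=50.0):
    """(1/M) * sum_m sum_{s in B_m} g_m g_s', with B_m = markets within `cutoff` of m."""
    M, L = len(moments), len(moments[0])
    lam = [[0.0] * L for _ in range(L)]
    for m in range(M):
        for s in range(M):
            dist = hypot(coords[m][0] - coords[s][0], coords[m][1] - coords[s][1])
            if dist <= cutoff:                 # B_m includes market m itself
                for i in range(L):
                    for j in range(L):
                        lam[i][j] += moments[m][i] * moments[s][j]
    return [[v / M for v in row] for row in lam]

# Made-up example: three markets on a line, one moment function each.
coords = [(0.0, 0.0), (30.0, 0.0), (200.0, 0.0)]
g_hat = [[1.0], [2.0], [-1.0]]
with_neighbors = spatial_covariance(g_hat, coords, cutoff=50.0)  # markets 0 and 1 interact
own_terms_only = spatial_covariance(g_hat, coords, cutoff=0.0)   # ignores spatial dependence
```

Here the first two markets are within fifty miles of each other, so their cross products enter the sum and the estimate is 10/3, compared with 2 when only the own terms are kept.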
The estimation procedure is as follows. Start from some initial guess of the parameter
values, and draw from the normal distribution four independent vectors: a vector of the
market level errors $\{\varepsilon_m\}_{m=1}^{M}$ and three vectors of firm-specific errors $\{\eta_{k,m}\}_{m=1}^{M}$, $\{\eta_{w,m}\}_{m=1}^{M}$,
and $\{\eta_{s,m}\}_{m=1}^{M}$. Obtain the simulated profits $\hat{\Pi}_i$, $i = k, w, s$ and solve for $\hat{D}_k$, $\hat{D}_w$, $\hat{N}_s$. Repeat
the simulation R times and formulate $\hat{g}(X_m, \theta)$. Search for parameter values that minimize
the objective function (2.4), while using the same set of simulation draws for all values of $\theta$.
To implement the two-step efficient estimator, I use the identity weighting matrix to find a
preliminary estimate $\tilde{\theta}$, which is then substituted in equation (2.6) to compute the optimal
weight matrix $\hat{\Lambda}^{-1}$ for the second step.
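The role of holding the simulation draws fixed across parameter values can be illustrated with a deliberately simple stand-in model. The sketch below is not the model of this chapter: it assumes a hypothetical one-firm entry rule with a single parameter, one scalar moment, pseudo-random rather than Halton draws, and a grid search in place of a numerical optimizer.

```python
import random

def simulate_entry(theta, x, draws):
    """Hypothetical one-firm rule: enter market m when theta * x_m + error_m > 0."""
    return [1 if theta * xm + e > 0 else 0 for xm, e in zip(x, draws)]

def objective(theta, x, observed, all_draws):
    """Scalar version of (2.4) with identity weight: (1/M) * (sum_m g_m)^2."""
    M, R = len(x), len(all_draws)
    predicted = [0.0] * M
    for draws in all_draws:                  # the same R draw vectors for every theta
        for m, d in enumerate(simulate_entry(theta, x, draws)):
            predicted[m] += d / R
    g_sum = sum(obs - pred for obs, pred in zip(observed, predicted))
    return g_sum * g_sum / M

rng = random.Random(0)
M, R = 200, 50
x = [rng.uniform(0.0, 2.0) for _ in range(M)]                            # made-up market size
observed = simulate_entry(1.0, x, [rng.gauss(0, 1) for _ in range(M)])   # "data" at theta = 1
all_draws = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(R)]      # drawn once, then reused
grid = [i / 10 for i in range(-20, 21)]
theta_hat = min(grid, key=lambda t: objective(t, x, observed, all_draws))
```

Because `all_draws` is generated once and reused, the objective is a deterministic function of the parameter, so differences in its value across candidate parameters reflect the parameters rather than fresh simulation noise.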
Instead of the usual machine-generated pseudo-random draws, I use Halton draws, which
have better coverage properties and smaller simulation variances.20 According to Train
(2000), 100 Halton draws achieve greater accuracy in his mixed logit estimation than 1000
pseudo-random draws. The parameter estimation exploits 150 Halton simulation draws
while the variance is calculated with 300 Halton draws.
There are twenty-six parameters. They are estimated with a set of moments that match the
predicted and the observed values of a) numbers of Kmart stores, Wal-Mart stores, and
small firms; b) various kinds of market structures (for example, only a Wal-Mart store
but no Kmart stores); c) the number of chain stores in the nearby markets; and d) the
interaction between the market size variables and the above items.
20 A Halton sequence is defined in terms of a given number, usually a prime. As an illustration, consider
the prime 3. Divide the unit interval evenly into three segments. The first two terms in the Halton sequence
are the two break points: 1/3 and 2/3. Then divide each of these three segments into thirds, and add the
break points for these segments into the sequence in a particular way: 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9. Note that the
lower break points in all three segments (1/9, 4/9, 7/9) are entered in the sequence before the higher break points
(2/9, 5/9, 8/9). Then each of the 9 segments is divided into thirds, and the break points are added to the sequence:
1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27, 10/27, 19/27, 4/27, 13/27, 22/27, and so on. This process is continued for as many points as the
researcher wants to obtain. See chapter 9 of “Discrete Choice Methods with Simulation (2003)” by Kenneth
Train for an excellent discussion of the Halton draws.
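The construction in the footnote can be reproduced with the standard radical-inverse formula, sketched below with exact fractions. The function name is an illustrative choice; how the draws are transformed into normal errors for the estimation is not shown here.

```python
from fractions import Fraction

def halton(index, base=3):
    """Return the index-th term (index >= 1) of the Halton sequence for the given base."""
    result = Fraction(0)
    scale = Fraction(1, base)
    while index > 0:
        # Peel off the base-`base` digits of index and mirror them around the radix point.
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

first_terms = [str(halton(i)) for i in range(1, 15)]
```

For base 3, `first_terms` lists the fourteen terms spelled out in the footnote, from 1/3 through 22/27, in the same order.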
2.6.2 Discussion: a closer look at the assumptions
Now I discuss several assumptions of the model: the game’s information structure and issues
of multiple equilibria, the symmetry assumption for small firms, and the non-negativity of
the chain effect.
Information structure and multiple equilibria
In the empirical entry literature, a common approach is to assume complete information and
simultaneous entry. One problem with this approach is the presence of multiple equilibria,
which has posed considerable challenges to estimation. Some researchers look for features
that are common among different equilibria. For example, Bresnahan and Reiss (1990
and 1991) and Berry (1992) point out that although firm identities differ across different
equilibria, the number of entering firms might be unique. Grouping different equilibria by
their common features leads to a loss of information and less efficient estimates. Further,
common features are increasingly difficult to find when the model becomes more realistic.
Others give up point identification of parameters and search for bounds, as in Andrews,
Berry and Jia (2004), Chernozhukov, Hong, and Tamer (2004), Pakes, Porter, Ho, and
Ishii (2005), and Shaikh (2005). However, a meaningful bound might be difficult to obtain in
complicated models such as the one employed here, which involves three sets of profit functions
with twenty-six parameters.
Given the above considerations, I choose an equilibrium that seems reasonable a priori.
In the baseline specification, I estimate the model using the equilibrium that is most profitable for Kmart because Kmart derives from an older entity and historically might have
had a first-mover advantage. As a robustness check, I experiment with two other cases. The
first one chooses the equilibrium that is most profitable for Wal-Mart. This is the direct
opposite of the baseline specification and is inspired by the hindsight of Wal-Mart’s success.
The second one selects the equilibrium that is most profitable for Wal-Mart in the south
and most profitable for Kmart in the rest of the country. This is based on the observation
that the northern regions had been Kmart’s backyard until recently while Wal-Mart started
its business from the south and has expertise in serving the southern population. The estimated parameters for the different cases are very similar to one another, which provides
evidence that the results are robust to the equilibrium choice.
The symmetry assumption for small firms
I have assumed that all small firms are symmetric with the same profit function. The
assumption is necessitated by data availability, since I do not observe any firm characteristics
for small firms. Making this assumption greatly simplifies the complexity of the model with
asymmetric competition effects, as it guarantees that in the second stage the equilibrium
number of small firms in each market is unique.
The chain effect $\delta_{ii}$
The assumption that $\delta_{ii} \geq 0$, $i \in \{k, w\}$ is crucial to the solution algorithm, since it implies
that the function V (D) defined by the necessary condition (2.3) is increasing, and that the
profit function (2.1) is supermodular in chain i’s own strategy. These results allow me to
employ two powerful theorems — Tarski’s fixed point theorem and Topkis’s theorem — to
solve a complicated problem that is otherwise unmanageable. The parameter $\delta_{ii}$ does not
have to be a constant. It can be region specific, or it can vary with the size of each market
(for example, interacting with population), as long as it is weakly positive. However, the
algorithm breaks down if either $\delta_{kk}$ or $\delta_{ww}$ becomes negative, and it excludes scenarios
where the chain effect is positive in some regions and negative in others.
The discussion so far has focused on the beneficial aspect of locating stores close to each
other. In practice, stores begin to compete for consumers when they get too close. As a
result, chains face two opposing forces when making location choices: the chain effect and
the business stealing effect. It is conceivable that in some areas stores are so close that the
business stealing effect outweighs the gains and $\delta_{ii}$ becomes negative.
Holmes (2005) estimates that for places with a population density of 20,000 people per
five-mile radius (which is comparable to an average city in my sample counties), 89% of
average consumers visit a Wal-Mart located right nearby.21 When the distance increases to 5
miles, 44% of consumers visit the store. The percentage drops to 7% if the store is 10
miles away. Survey studies also show that few consumers drive farther than 10-15 miles for
general merchandise shopping. In my sample, the median distance to the nearest store is
21 miles for Wal-Mart stores, and 27 miles for Kmart stores. It seems reasonable to think
that the business stealing effect, if it exists, is small.
2.7 Results
2.7.1 Parameter estimates
The sample includes 2065 small- and medium-sized counties with populations between 5,000
and 64,000. Even though I do not model Kmart’s and Wal-Mart’s entry decisions in other
counties, I incorporate into the profit function the spillover from stores outside the sample.
This is especially important for Wal-Mart, as the number of Wal-Mart stores in big counties doubled over the sample period. Table 1 (C) displays the summary statistics of the
distance weighted numbers of adjacent Kmart stores $\sum_{l \neq m, l \in B_m} \frac{D_{k,l}}{Z_{ml}}$ and Wal-Mart stores
$\sum_{l \neq m, l \in B_m} \frac{D_{w,l}}{Z_{ml}}$, which measure the spillover from nearby stores (including stores outside the
sample). In 1997, the Kmart spillover variable remained roughly the same as in 1988, but
the Wal-Mart spillover variable was almost twice as big as in 1988.
21 This is the result from a simulation exercise where the distance is set to 0 miles.
The profit functions of all retailers share three common explanatory variables: log of
population, log of real retail sales per capita, and the percentage of population that is urban.
Many studies have found a pure size effect: there tend to be more stores in a market as the
population increases. Retail sales per capita capture the “depth” of a market and explain
firm entry behavior better than personal income does. The percentage of urban population
measures the degree of urbanization. It is generally believed that urbanized areas have more
shopping districts that attract big chain stores.
For Kmart, the profit function includes a dummy variable for the Midwest regions.
Kmart’s headquarters are located in Troy, Michigan. Until the mid 1980s, this region had
always been the “backyard” of Kmart stores. Similarly, Wal-Mart’s profit function includes
a southern dummy, as well as the log of distance in miles to its headquarters in Bentonville,
Arkansas. This distance variable turns out to be a useful predictor for Wal-Mart stores’
location choices. For small firms, everything else equal, there are more small firms in the
southern states. It could be that there have always been fewer big retail stores in the
southern regions and that people rely on neighborhood small firms for day-to-day shopping.
All coefficients (the market size coefficients $\beta_i$, the competitive effects $\delta_{ij}$, and the chain
effect $\delta_{ii}$) are allowed to be firm-specific. I report the probit coefficients (for Kmart and Wal-Mart) and the ordered probit coefficients (for small retailers) in Table 2 and the coefficients
from the full model in Table 3. There are six probit (ordered probit) regressions, one for
each player in each year. The market size coefficients have the same signs as those from the
full model, but the competition effects are biased toward zero and the chain effects have
the wrong signs.
Table 3 lists the parameter estimates from the full model for 1988 and 1997. In this
subsection I focus on the $\beta$’s; in the next one I discuss $\delta_{ij}$ and $\delta_{ii}$ in detail. Coefficients for
market size variables are highly significant and intuitive, with the exception of the urban
variable in the small firms’ profit function, which suggests fewer small firms locate in more
urbanized areas. ρ is much smaller than 1, indicating the importance of the market level
error terms and the necessity of controlling for endogeneity of all firms’ entry decisions.
The model is estimated three times, each time with a different equilibrium. Tables 4
(A) and 4 (B) present the three sets of estimates for 1988 and 1997, respectively. In both
tables, column one corresponds to the equilibrium most preferred by Kmart; column two
uses the equilibrium most preferred by Wal-Mart; column three chooses the one that grants
Wal-Mart an advantage in the southern regions and Kmart an advantage in the rest of the
country. The estimates are very similar across the different equilibria.
Tables 5(A) and 5(B) display the model’s goodness of fit. In Table 5(A), columns one
and three display the sample averages, while the other two columns list the model’s predicted
averages. The model matches exactly the actual average numbers of Kmart and Wal-Mart
stores for 1988, and comes very close to them for 1997. The number of small firms is a noisy
variable and is much harder to predict. Its sample median was 3, but the maximum was 25
in 1988 and 19 in 1997. The model does a decent job of fitting the data: the sample average
was 3.86 per county in 1988 and 3.49 per county in 1997; the model’s predictions are 3.79
and 3.43, respectively. Such results might be expected as the parameters are chosen to
match these moments. In Table 5(B), I report the correlations between the predicted and
observed numbers of Kmart stores, Wal-Mart stores, and small firms in each market. The
correlations are between 0.63 and 0.75. These correlations are not included in the set of
moment functions, and a high correlation indicates a good fit. Overall, the model explains
the data well.
To check whether the estimates are reasonable, Table 6 lists the model’s predicted
profits and compares them with the accounting profits documented in Kmart’s and Wal-Mart’s SEC 10-K annual reports. According to the model, the average profit of Wal-Mart
stores grew 51% over the sample period, which is consistent with the recorded increase of
41% in Wal-Mart’s annual reports. Kmart’s accounting profit in 1997 was substantially
smaller than that in 1988, due to the financial obligations of divesting several specialized
retailing businesses that were overall a financial disappointment. The average real sales
per Kmart store increased 2.6% over the ten-year period. Considering the various increases
in the operating costs documented in Kmart’s annual report, the change in its store sales
revenue is compatible with the 8% decrease in the store profit predicted by the model.22,23
To understand the magnitudes of the market size coefficients, I report in Table 7 the
changes in the number of each type of store when each market size variable changes. For
example, to derive the effect of population change on the number of small firms, I fix
Kmart’s and Wal-Mart’s profits, increase small retailers’ profit in accordance with a ten
percent increase in population, and re-solve for the new equilibrium number of small stores.
The market size variables have a relatively modest impact on the number of small
businesses. In 1988, a 10% increase in population attracts 8.7% more firms. The same
increase in real retail sales per capita draws 5.4% more firms. The number of small firms
declines by about 0.7% when the percentage of urban population goes up by 10%. In
comparison, the regional dummy is much more important: everything else equal, changing
the southern dummy from 1 for all counties to 0 for all counties leads to 37.2% fewer small
firms (6136 small stores vs. 9775 small stores).
22 The model predicted profit for Kmart (Wal-Mart) is the average of the equilibrium profit over all Kmart
(Wal-Mart) stores. There are two caveats in the comparison of the profit growth predicted by the model
and the reported growth of the accounting profit. First, there is no real unit for the profit derived from
the structural profit function; it is scaled by one standard deviation of the unobserved error term in
the profit function. The 51% increase in the model predicted profit from 1988 to 1997 assumes that the
standard deviation of the error term did not change over this sample period. Second, the 41% growth in the
accounting profit is averaged over all stores, including stores in counties outside the sample. The calculation
here assumes that profit growth is roughly the same for stores in the sample counties and stores outside the
sample counties.
23 Wal-Mart’s 1988 and 1997 annual reports do not separate the profit of Wal-Mart stores from the profit
of Sam’s clubs. Since the gross markup in Sam’s clubs is half of that in the regular Wal-Mart discount stores,
two dollars of sales from a Sam’s club are assumed to contribute to the total profit the same as one dollar
of sales from a Wal-Mart discount store.
Market size variables seem to matter more for big chains. In 1988, a 10% growth in
population induces Kmart to enter 12.1% more markets and Wal-Mart 10.3% more markets.
A similar increment in retail sales attracts entry of Kmart and Wal-Mart stores in 17.6%
and 10.5% more markets, respectively. The results are similar for 1997. These differences
indicate that Kmart is much more likely to locate in bigger markets, while Wal-Mart thrives
in smaller markets. Perhaps not surprisingly, the regional advantage is substantial for both
chains: controlling for the market size, changing the Midwest regional dummy from 1 to 0
for all counties leads to 28% fewer Kmart stores, and changing the Southern regional dummy
from 1 to 0 for all counties leads to 52.5% fewer Wal-Mart stores. When distance increases
by 10%, the number of Wal-Mart stores drops by 8.4%. Wal-Mart’s “home advantage” is
much smaller in 1997: everything else the same, changing the south dummy from 1 to 0 for
all counties leads to 22% fewer Wal-Mart stores. The regional dummies and the distance
variable provide a reduced-form way for the static model to capture the path-dependence
of the expansion of Wal-Mart stores.
2.7.2 The competition effect and the chain effect
As shown in Table 3, all of the competition effects and the chain effects, with the exception
of the impact of small firms on chain stores, are precisely estimated.
The estimates display several noticeable features. First, the negative impact of Kmart
on Wal-Mart’s profit $\delta_{wk}$ in absolute value is much smaller in 1997 than in 1988, while the
opposite is true for Wal-Mart’s impact on Kmart’s profit $\delta_{kw}$.24 Both a Cournot model and a
Bertrand model with differentiated products predict that a reduction in rivals’ marginal costs
drives down a firm’s own profit. I do not observe firms’ marginal costs, but these parameter
estimates are consistent with evidence that Wal-Mart’s marginal cost was declining relative
to Kmart’s over the sample period. Wal-Mart is famous for its cost-sensitive culture; it
is also keen on technology advancement. Holmes (2001) cites evidence that Wal-Mart
has been a leading investor in information technology. In contrast, Kmart struggled with
management failures that resulted in stagnant sales revenue, and it either delayed or
abandoned store renovation plans throughout the 1990s.
24 Due to the high variance of the estimates, the difference is not statistically significant.
Second, it is somewhat surprising that the negative impact of Kmart on small firms’
profit $\delta_{sk}$ is comparable to Wal-Mart’s impact $\delta_{sw}$, considering the controversies and media
reports generated by Wal-Mart. The outcry about Wal-Mart was probably because Wal-Mart had more stores in small- to medium-sized markets where the effect of a big store entry
was felt more acutely, and because Wal-Mart kept expanding, while Kmart was consolidating
its existing stores with few net openings in these markets over the sample period.
Third, the coefficient for Wal-Mart’s chain effect $\delta_{ww}$ is smaller in 1997, although the
overall effect is bigger, as $\sum_{l \neq m, l \in B_m} \frac{D_{w,l}}{Z_{ml}}$ is almost twice as large in 1997 as in 1988.25 The
decline in $\delta_{ww}$ suggests that the benefit of scale economies does not grow proportionally.
In fact there are good reasons to believe it might not be monotone because, as discussed in
section 2.6.2, when chains grow bigger and saturate the area, cannibalization among stores
becomes a stronger concern.
25 The difference between $\delta_{ww}$ in 1988 and $\delta_{ww}$ in 1997 is not statistically significant.
To better assess the magnitude of the competition effects, Table 8(A) and Table 8(B)
re-solve the model for different market structures. The results from Table 8(A) suggest
that chains have a substantial competition impact on small firms. In 1988, compared with
the scenario where there are neither Kmart nor Wal-Mart stores, adding a Kmart store to
each market reduces the number of small firms by 46%, or 2.69 firms per county; adding a
Wal-Mart store reduces it by 43.3%, or 2.53 firms per county. When both a Kmart and a
Wal-Mart store enter, the number of small firms plummets by 71.4%, a reduction of 4.17
firms per county. If Wal-Mart takes over Kmart, the number of small firms is 12.3% higher
than that observed in the sample when Wal-Mart and Kmart compete against each other.26
This is due to a business stealing effect between the competing chains: when the two
chains merge or one chain takes over the other, the joint-profit-maximizing number of chain
stores is smaller, which in turn leads to a larger number of small firms. The patterns are
quite similar in 1997: compared with the case of no chain stores, adding a Kmart store to
each market decreases the number of small firms by 36.2%, or 1.92 per county; adding a
Wal-Mart, 37%, or 1.96 per county; adding both a Kmart and a Wal-Mart store, 61.6%, or
3.27 per county.
Even with the conservative estimate that one Kmart or Wal-Mart store displaces 40%
of the small firms, the competition effect of chains on small retailers is sizable, especially
since the small discount firms form only a segment of the retailers affected by the entry of
chain stores. The combined effect on all small retailers and local communities in general
can be much larger.
26 In this counter-factual exercise, Wal-Mart becomes the monopoly chain and competes with small retailers. The total number of chain stores is just the total number of Wal-Mart stores in the new equilibrium,
and is smaller than the sum of Wal-Mart stores and Kmart stores before the take-over.
Table 8 (B) illustrates the competition effect between Kmart and Wal-Mart. Consistent
with the changes in $\delta_{kw}$ and $\delta_{wk}$ from 1988 to 1997, the effect of Kmart’s presence on Wal-Mart’s profit is much stronger in 1988, while the effect of Wal-Mart’s presence on Kmart’s
profit is stronger in 1997. For example, in 1988, Wal-Mart would only enter 318 markets if
there were a Kmart store in every county. When Kmart ceases to exist as a competitor, the
number of markets with Wal-Mart stores rises to 846, a net increase of 166%. The same
experiment in 1997 leads Wal-Mart to enter 39.3% more markets, from 728 to 1014. The
pattern is reversed for Kmart. In 1988, Kmart would enter 42.6% more markets when there
are no Wal-Mart stores compared with the case of one Wal-Mart store in every county (479
Kmart stores vs. 336 Kmart stores); in 1997, Kmart would enter 87.1% more markets for
the same experiment (610 Kmart stores vs. 326 Kmart stores).27
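The percentage changes quoted from Table 8 (B) follow directly from the market counts in the text. The short check below recomputes them; the helper name is an illustrative choice and the inputs are the counts reported above.

```python
def pct_increase(base, new):
    """Percentage increase from `base` to `new`."""
    return 100.0 * (new - base) / base

# Markets entered with a rival store in every county vs. with no rival stores.
walmart_1988 = pct_increase(318, 846)    # Wal-Mart, 1988
walmart_1997 = pct_increase(728, 1014)   # Wal-Mart, 1997
kmart_1988 = pct_increase(336, 479)      # Kmart, 1988
kmart_1997 = pct_increase(326, 610)      # Kmart, 1997
```

Rounded to one decimal place these are 166.0, 39.3, 42.6, and 87.1 percent, matching the figures in the paragraph above.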
To examine the importance of the chain effect for both chains, consider Table 8 (C). The
first row reports the percentage of store profit due to the chain effect; this is the average
of $\delta_{ii} \sum_{l \neq m, l \in B_m} \frac{D_{i,l}}{Z_{ml}}$ divided by the average store profit (reported in Table 6). For both
chains, the chain effect contributes to more than 10% of the store profit. For example, it
accounts for 10.2% of Wal-Mart’s profit in 1988, and 12.3% of its profit in 1997. To derive
the equilibrium number of stores when there is no chain effect, I set $\delta_{ii} = 0$ for the targeted
chain, but keep the rival’s $\delta_{jj}$ unchanged and re-solve the model. In 1988, without the chain
effect, the number of Kmart stores would have decreased by 40, and Wal-Mart would have
entered 125 fewer markets. The numbers are comparable for 1997. This result is consistent
with Holmes (2005), who also found scale economies to be important. Given the magnitude
of these spillover effects, further research that explains their mechanism will help improve
our understanding of the retail industry, in particular its productivity gains over the past
several decades.28
27
In solving for the number of Wal-Mart (Kmart) stores when Kmart (Wal-Mart) exits, I allow the small
firms to compete with the remaining chain.
28
See Foster et al (2002).
2.7.3 The impact of Wal-Mart’s expansion and related policy issues
Consistent with media reports about Wal-Mart’s impact on small retailers, the model predicts that Wal-Mart’s expansion contributes to a large percentage of the net decline in the
number of small firms over the sample period. The first row in Table 9 (A) records the net
decrease of 748 small firms observed over the sample period, or 0.36 per market. To evaluate
the impact of Wal-Mart’s expansion on small firms separately from other factors (e.g., the
change in market sizes or the change in Kmart stores), I re-solve the model using the 1988
coefficients for Kmart’s and small firms’ profit functions and the 1988 market size variables,
but the 1997 coefficients for Wal-Mart’s profit function. The experiment corresponds to
holding everything the same as in 1988, but allowing Wal-Mart to become more efficient
and expand. The predicted number of small firms falls by 558 from the model prediction
using the 1988 coefficients for Wal-Mart’s profit function. This accounts for 75% of the
observed decrease in the number of small firms. Conducting the same experiment but using
the 1997 coefficients for Kmart’s and small firms’ profit functions, the 1997 market size
variables, and the 1988 coefficients for Wal-Mart’s profit function, I find that Wal-Mart’s
expansion accounts for 383 stores, or 51% of the observed decrease in the number of small
firms.
If we ignore the endogeneity of chains’ entry decisions and regress the number of small
firms on the number of chains together with the market size variables, we would underestimate the impact of Wal-Mart’s expansion on small retailers by a large amount. For
example, using the coefficients from an ordered probit model applied to the 1988 data, the
difference between the expected number of small firms using Wal-Mart’s 1988 store number
and the expected number of small firms using Wal-Mart’s 1997 store number explains only
33% of the observed decline in the number of small firms. Using the coefficients from the
same ordered probit model applied to the 1997 data, Wal-Mart’s expansion between 1988
and 1997 accounts for only 20% of the observed decline in the number of small firms.29
Overall, ignoring the endogeneity of chains’ entry decisions underestimates the competition
effect by fifty to sixty percent.
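The shares reported in this subsection can be recomputed from the counts in the text. The snippet below is a plain arithmetic check; the variable names are illustrative, and the inputs (748, 558, 383, 2065 markets, and the 33% and 20% ordered probit shares) are the figures stated above.

```python
observed_decline = 748                                # net decrease in small firms
per_market = observed_decline / 2065                  # decline per sample county
share_1988_base = 100.0 * 558 / observed_decline      # holding other factors at 1988
share_1997_base = 100.0 * 383 / observed_decline      # holding other factors at 1997

# Shortfall of the ordered probit shares relative to the structural shares.
shortfall_1988 = 100.0 * (1 - 33 / 75)
shortfall_1997 = 100.0 * (1 - 20 / 51)
```

The structural shares round to 75% and 51%, and the ordered probit figures fall short of them by roughly 56% and 61%, in line with the statement that ignoring endogeneity understates the effect by about fifty to sixty percent.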
Using the conservative figure of 383 stores, the absolute impact of Wal-Mart’s entry
seems modest. However, the exercise here includes only small firms in the discount sector.
Both Kmart and Wal-Mart carry a large assortment of products and compete with a variety
of stores, like hardware stores, houseware stores, apparel stores, etc., so that their impact on
local communities is conceivably much larger. To examine the overall impact of Wal-Mart’s
expansion, one needs to include a separate profit function for firms in each of these other
categories and estimate the system of profit functions jointly.
Government subsidy has long been a policy instrument to encourage firm investment
and to create jobs. To evaluate the effectiveness of this policy in the discount retailing
sector, I simulate the equilibrium numbers of stores when various firms are subsidized. The
results in Table 10 indicate that direct subsidies do not seem to be effective in generating
jobs. In 1988, subsidizing Wal-Mart stores 10% of their average profit, which amounts to
one million dollars, increases the number of Wal-Mart stores per county only from 0.31 to
0.34.30 With the average Wal-Mart store hiring fewer than 300 full and part-time employees,
the additional number of stores translates to at most nine new jobs.31 Similarly, subsidizing
all small firms by 100% of their average profit increases their number from 3.79 to 4.61,
and generates eight jobs if on average a small firm hires ten employees. Together, these
exercises suggest that a direct subsidy should be used with caution if it is designed to
increase employment in this industry.
29
The ordered probit regressions use the same right-hand side variables as the structural model. See Table
2 for the coefficients from the ordered probit regressions.
30
The average Wal-Mart store’s net income in 1988 is about one million in 2004 dollars (see Table 6).
Using a discount rate of 10%, the discounted present value of a store’s lifetime profit is about ten million.
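The job figures in the subsidy experiment follow from simple arithmetic on the simulated store counts. A minimal sketch, using only the numbers quoted in the text (the function name is illustrative, and the employment figures of 300 and 10 per store are the text's own assumptions):

```python
def jobs_created(stores_before, stores_after, employees_per_store):
    """Implied change in employment per county from a change in store counts."""
    return (stores_after - stores_before) * employees_per_store


# Wal-Mart: 0.31 -> 0.34 stores per county, fewer than 300 employees per store.
walmart_jobs = jobs_created(0.31, 0.34, 300)

# Small firms: 3.79 -> 4.61 stores per county, ten employees per store.
small_firm_jobs = jobs_created(3.79, 4.61, 10)

print(round(walmart_jobs, 1), round(small_firm_jobs, 1))
```

Because 300 is an upper bound on employment per Wal-Mart store, the first figure is an upper bound on the jobs created.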
2.8 Conclusion and future work
I have examined the competition effect of chain stores on small firms and the role of the
chain effect in firms’ entry decisions. The results support the anecdotal evidence that “big
drives out small.” On average, entry by either a Kmart or a Wal-Mart store displaces forty
to fifty percent of the small discount firms. Wal-Mart’s expansion from the late 1980s to
the late 1990s explains fifty to seventy percent of the net change in the number of small
discount firms. Failure to address the endogeneity of firms’ entry decisions would result
in underestimating this impact by fifty to sixty percent. Furthermore, direct subsidies to
either chains or small firms are not likely to be effective in creating jobs and should be used
with caution.
These results reinforce the concerns raised by many policy observers regarding the subsidies directed to big retail corporations. Perhaps less obvious is the conclusion that subsidies
toward small retailers should also be designed carefully.
Like Holmes (2005), I find that scale economies, as captured by the chain effect, generate
substantial benefits. Studying these scale economies in more detail is useful for helping firms
exploit such advantages and for guiding merger policies or other regulations that affect
chains. A better understanding of the mechanism underlying these spillover effects will also
help us to gain insight into the productivity gains in the retail industry over the past several
decades.
30 (continued)
A subsidy of 10% is equivalent to one million dollars.
31
The equilibrium numbers of Kmart stores and small firms decrease slightly when Wal-Mart is subsidized,
but the implied change in employment is tiny.
Finally, the algorithm used in this chapter can be applied to many industries where
scale economies are important. One application is the airline industry, where the network of
flight routes exhibits a type of spillover effect similar to the one described here. For example,
adding a route from New York to Boston directly affects profits of flights that either originate
from or end in Boston and New York. The tools proposed in this chapter can be deployed to
extend current models of strategic interaction among airlines to incorporate such network
effects. Another possible application is to industries with cost complementarity among
different products. The algorithm here is particularly suitable for modeling firms’ product
choices when the product space is large.
2.9 Appendix A: definitions and proofs
2.9.1 Verification of the necessary condition (2.3)
Let $D^* = \arg\max_{D \in \mathbf{D}} \Pi(D)$. The optimality of $D^*$ implies the following set of necessary
conditions:
\[
\Pi(D_1^*, \ldots, D_{m-1}^*, D_m^*, D_{m+1}^*, \ldots, D_M^*) \geq \Pi(D_1^*, \ldots, D_{m-1}^*, D_m, D_{m+1}^*, \ldots, D_M^*), \quad \forall m,\ D_m \neq D_m^*
\]
Let $\hat{D} = \{D_1^*, \ldots, D_{m-1}^*, D_m, D_{m+1}^*, \ldots, D_M^*\}$. $\Pi(D^*)$ differs from $\Pi(\hat{D})$ in two parts: the
profit in market $m$, and the profit in all other markets through the chain effect:
\[
\begin{aligned}
\Pi(D^*) - \Pi(\hat{D}) &= (D_m^* - D_m)\Big[X_m + \delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}}\Big] + \delta \sum_{l \neq m} D_l^* \Big(\frac{D_m^*}{Z_{lm}}\Big) - \delta \sum_{l \neq m} D_l^* \Big(\frac{D_m}{Z_{lm}}\Big) \\
&= (D_m^* - D_m)\Big[X_m + 2\delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}}\Big]
\end{aligned}
\]
where $Z_{ml} = Z_{lm}$ due to the symmetry. Since $\Pi(D^*) - \Pi(\hat{D}) \geq 0$ for any $D_m \neq D_m^*$, we
have $D_m^* = 1, D_m = 0$ if and only if $X_m + 2\delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}} \geq 0$; and $D_m^* = 0, D_m = 1$ if and only
if $X_m + 2\delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}} \leq 0$. Together they imply $D_m^* = 1[X_m + 2\delta \sum_{l \neq m} \frac{D_l^*}{Z_{ml}} \geq 0]$.
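The necessary condition can be checked numerically on a small example. The sketch below is illustrative only: the values of X, Z, and δ are random draws chosen for the example (not estimates from the chapter), and the function names `profit` and `V` are assumptions. It finds the profit-maximizing vector by brute force and confirms that it satisfies the fixed-point condition derived above.

```python
import itertools
import random


def profit(D, X, Z, delta):
    """Chain profit: sum_m D_m * (X_m + delta * sum_{l != m} D_l / Z_ml)."""
    M = len(D)
    return sum(
        D[m] * (X[m] + delta * sum(D[l] / Z[m][l] for l in range(M) if l != m))
        for m in range(M)
    )


def V(D, X, Z, delta):
    """Necessary-condition map: V_m(D) = 1[X_m + 2*delta*sum_{l != m} D_l / Z_ml >= 0]."""
    M = len(D)
    return tuple(
        int(X[m] + 2 * delta * sum(D[l] / Z[m][l] for l in range(M) if l != m) >= 0)
        for m in range(M)
    )


random.seed(0)                      # illustrative draws, not estimates
M, delta = 6, 0.3
X = [random.uniform(-1.0, 1.0) for _ in range(M)]
Z = [[1.0] * M for _ in range(M)]   # diagonal entries are never used
for m in range(M):
    for l in range(m + 1, M):
        Z[m][l] = Z[l][m] = random.uniform(1.0, 5.0)   # symmetric: Z_ml = Z_lm

# Brute-force maximizer over all 2^M entry vectors.
D_star = max(itertools.product((0, 1), repeat=M), key=lambda D: profit(D, X, Z, delta))
satisfies_condition = V(D_star, X, Z, delta) == D_star
print(D_star, satisfies_condition)
```

The condition is necessary but not sufficient: other vectors may also be fixed points of V without maximizing profit, which is why the algorithm in the chapter uses the fixed points only to bound the search.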
2.9.2 The set of fixed points of an increasing function that maps a lattice into itself
All of the definitions in this appendix (the definitions of a lattice, a complete lattice,
supermodular functions, increasing differences, as well as induced set ordering) are taken
from Topkis (1998). The definition of a lattice involves the concepts of a partially ordered
set, a join, and a meet. A partially ordered set is a set $\mathbf{X}$ on which there is a binary relation
$\preceq$ that is reflexive, antisymmetric, and transitive. If two elements, $X'$ and $X''$, of a partially
ordered set $\mathbf{X}$ have a least upper bound (greatest lower bound) in $\mathbf{X}$, it is their join (meet)
and is denoted $X' \vee X''$ ($X' \wedge X''$).
Definition 2.1. A partially ordered set that contains the join and the meet of each pair of
its elements is a lattice.
Definition 2.2. A lattice in which each nonempty subset has a supremum and an infimum
is complete.
Any finite lattice is complete. A nonempty complete lattice has a greatest element and
a least element. Tarski’s fixed point theorem, stated in the main body of the chapter as
Theorem 2.1, establishes that the set of fixed points of an increasing function that maps
from a nonempty complete lattice into itself is a nonempty complete lattice with a greatest
element and a least element.
For a counterexample where a decreasing function’s set of fixed points is empty, consider
the following simplified entry model where three firms compete with each other and decide
simultaneously whether to enter the market. Their joint strategy space is $\mathbf{D} = \{0, 1\}^3$. The
profit functions are as follows:
\[
\begin{cases}
\Pi_k = D_k (0.5 - D_w - 0.25 D_s) \\
\Pi_w = D_w (1 - 0.5 D_k - 1.1 D_s) \\
\Pi_s = D_s (0.6 - 0.5 D_w - 0.7 D_k)
\end{cases}
\]
Let $D = \{D_k, D_w, D_s\} \in \mathbf{D}$, $D_{-i}$ denote rivals’ strategies, $V_i(D_{-i})$ denote the best
response function for player $i$, and $V(D) = \{V_k(D_{-k}), V_w(D_{-w}), V_s(D_{-s})\}$ denote the joint
best response function. It is easy to show that $V(D)$ is a decreasing function that takes the
following values:
\[
\begin{cases}
V(0,0,0) = \{1,1,1\};\ V(0,0,1) = \{1,0,1\};\ V(0,1,0) = \{0,1,1\};\ V(0,1,1) = \{0,0,1\} \\
V(1,0,0) = \{1,1,0\};\ V(1,0,1) = \{1,0,0\};\ V(1,1,0) = \{0,1,0\};\ V(1,1,1) = \{0,0,0\}
\end{cases}
\]
The set of fixed points for $V(D)$ is empty, since there does not exist a $D \in \mathbf{D}$ such that
$V(D) = D$.
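The counterexample can be verified by enumerating all eight strategy profiles. In this sketch the function name `best_response` is illustrative, and the small firm's profit is taken to depend on Kmart's entry with coefficient 0.7, which is the reading consistent with the tabulated values of V(D).

```python
import itertools


def best_response(D):
    """Joint best response V(D): each firm enters iff its profit from entering is positive."""
    Dk, Dw, Ds = D
    Vk = int(0.5 - Dw - 0.25 * Ds > 0)        # Kmart
    Vw = int(1 - 0.5 * Dk - 1.1 * Ds > 0)     # Wal-Mart
    Vs = int(0.6 - 0.5 * Dw - 0.7 * Dk > 0)   # small firm (assumed to depend on Dk)
    return (Vk, Vw, Vs)


# Tabulate V over the whole strategy space {0,1}^3.
table = {D: best_response(D) for D in itertools.product((0, 1), repeat=3)}
fixed_points = [D for D, VD in table.items() if VD == D]

for D, VD in table.items():
    print(D, "->", VD)
print("fixed points:", fixed_points)
```

No profile maps to itself, so the set of fixed points is empty, in contrast to the increasing case covered by Tarski's theorem.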
2.9.3 A tighter lower bound and upper bound for the optimal solution vector $D^*$
In section 2.5.1 I have shown that using inf(D) and sup(D) as starting points yields, respectively, a lower bound and an upper bound to D∗ = arg maxD∈D Π(D). Here I introduce
two bounds that are tighter. The lower bound builds on the solution to a constrained
maximization problem:
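\[
\max_{D_1, \ldots, D_M \in \{0,1\}} \Pi = \sum_{m=1}^{M} \Big[ D_m \Big( X_m + \delta \sum_{l \neq m} \frac{D_l}{Z_{ml}} \Big) \Big]
\quad \text{s.t. if } D_m = 1, \text{ then } X_m + \delta \sum_{l \neq m} \frac{D_l}{Z_{ml}} > 0
\]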
The solution to this constrained maximization problem belongs to the set of fixed points of
the vector function $\hat{V}(D) = \{\hat{V}_1(D), \ldots, \hat{V}_M(D)\}$, where $\hat{V}_m(D) = 1[X_m + \delta \sum_{l \neq m} \frac{D_l}{Z_{ml}} > 0]$.
The function $\hat{V}(\cdot)$ is increasing and maps from $\mathbf{D}$ into itself: $\hat{V} : \mathbf{D} \rightarrow \mathbf{D}$. Using arguments
similar to those in section 2.5.1, one can show that the convergent vector $\hat{D}$ using $\sup(\mathbf{D})$ as
the starting vector is the greatest element of the set of fixed points. Further, $\hat{D}$ achieves a
higher profit than any other fixed point of $\hat{V}(\cdot)$, since by construction each non-zero element
of the vector $\hat{D}$ adds to the total profit. Changing any non-zero element(s) of $\hat{D}$ to zero
reduces the total profit.
To show that $\hat{D} \leq D^*$, the solution to the original unconstrained maximization problem,
we construct a contradiction. Since the maximum of an unconstrained problem is always
at least as large as that of a corresponding constrained problem, we have: $\Pi(D^*) \geq \Pi(\hat{D})$. Therefore,
$D^*$ cannot be strictly smaller than $\hat{D}$, because any vector strictly smaller than $\hat{D}$ delivers
a lower profit. Suppose $D^*$ and $\hat{D}$ are unordered. Let $D^{**} = D^* \vee \hat{D}$ (where “$\vee$” defines
the element-by-element Max operation). The change from $D^*$ to $D^{**}$ increases total profit,
because profits at markets with $D_m^* = 1$ do not decrease after the change, and profits at
markets with $D_m^* = 0$ but $\hat{D}_m = 1$ increase from 0 to a positive number after the change.
This contradicts the definition of $D^*$, so $\hat{D} \leq D^*$.
Note that $V(\hat{D}) \geq \hat{D}$, where $V(\cdot)$ is as defined in section 2.5.1. This follows from
$V_m(\hat{D}) = 1[X_m + 2\delta \sum_{l \neq m} \frac{\hat{D}_l}{Z_{ml}} \geq 0] \geq 1[X_m + \delta \sum_{l \neq m} \frac{\hat{D}_l}{Z_{ml}} > 0] = \hat{D}_m$, $\forall m$, where the last
equality holds because $\hat{D}$ is a fixed point of $\hat{V}(\cdot)$. The monotonicity of $V(\cdot)$, together with
$\hat{D} \leq D^*$ and $V(\hat{D}) \geq \hat{D}$, implies that the iteration process starting from $\hat{D}$ converges, and
that the convergent vector (denoted as $\hat{D}^T$) is a lower bound to $D^*$.
$\hat{D}^T$ is a tighter lower bound than $D^L$ (discussed in section 2.5.1) because $\hat{D} \geq \inf(\mathbf{D})$,
so $\hat{D}^T = V^{TT}(\hat{D}) \geq V^{TT}(\inf(\mathbf{D})) = D^L$, with $TT = \max\{T, T'\}$, where $T$ is the number of
iterations from $\hat{D}$ to $\hat{D}^T$ and $T'$ is the number of iterations from $\inf(\mathbf{D})$ to $D^L$.
Since the chain effect is bounded by zero (when there are no other stores anywhere)
and $\delta \sum_{l \neq m} \frac{1}{Z_{ml}}$ (when there is a store in every market), we can find a tighter upper bound
to $D^*$ by starting from the vector $\tilde{D} = \{\tilde{D}_m : \tilde{D}_m = 0 \text{ if } X_m + 2\delta \sum_{l \neq m} \frac{1}{Z_{ml}} < 0;\ \tilde{D}_m = 1
\text{ otherwise}\}$. The markets with $\tilde{D}_m = 0$ contribute a negative element to total profit even
with the largest conceivable chain effect, so it is never optimal to enter these markets, i.e.,
$\tilde{D} \geq D^*$. It is straightforward to show that the iteration converges ($V^T(\tilde{D}) = \tilde{D}^T$, $T \leq M$)
and that the convergent vector $\tilde{D}^T$ is a tighter upper bound to $D^*$ than $D^U$.32
32
$\tilde{D}^T \leq D^U$ because $\tilde{D}^T = V^{TT}(\tilde{D}) \leq V^{TT}(\sup(\mathbf{D})) = D^U$, with $TT = \max\{T, T'\}$, where $T$ is the
number of iterations from $\tilde{D}$ to $\tilde{D}^T$ and $T'$ is the number of iterations from $\sup(\mathbf{D})$ to $D^U$.
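The ordering of the bounds in this subsection can be illustrated on a small simulated example. The sketch below is not the dissertation's implementation: X, Z, and δ are random draws chosen for illustration, all function names are assumptions, and the true maximizer is found by brute force only because M is small.

```python
import itertools
import random


def chain(D, Z, m):
    """sum_{l != m} D_l / Z_ml for market m."""
    return sum(D[l] / Z[m][l] for l in range(len(D)) if l != m)


def profit(D, X, Z, delta):
    return sum(D[m] * (X[m] + delta * chain(D, Z, m)) for m in range(len(D)))


def V(D, X, Z, delta):
    """Unconstrained necessary-condition map (uses 2*delta, weak inequality)."""
    return tuple(int(X[m] + 2 * delta * chain(D, Z, m) >= 0) for m in range(len(D)))


def V_hat(D, X, Z, delta):
    """Map for the constrained problem (uses delta, strict inequality)."""
    return tuple(int(X[m] + delta * chain(D, Z, m) > 0) for m in range(len(D)))


def iterate(f, D, X, Z, delta):
    """Apply f repeatedly until the vector stops changing."""
    while True:
        nxt = f(D, X, Z, delta)
        if nxt == D:
            return D
        D = nxt


def leq(a, b):
    """Element-by-element a <= b."""
    return all(x <= y for x, y in zip(a, b))


random.seed(1)                      # illustrative draws, not estimates
M, delta = 8, 0.2
X = [random.uniform(-1.0, 1.0) for _ in range(M)]
Z = [[1.0] * M for _ in range(M)]
for m in range(M):
    for l in range(m + 1, M):
        Z[m][l] = Z[l][m] = random.uniform(1.0, 5.0)

zeros, ones = (0,) * M, (1,) * M
D_L = iterate(V, zeros, X, Z, delta)            # lower bound from inf(D)
D_U = iterate(V, ones, X, Z, delta)             # upper bound from sup(D)
D_hat = iterate(V_hat, ones, X, Z, delta)       # greatest fixed point of V_hat
D_hat_T = iterate(V, D_hat, X, Z, delta)        # tighter lower bound
D_tilde = tuple(int(X[m] + 2 * delta * chain(ones, Z, m) >= 0) for m in range(M))
D_tilde_T = iterate(V, D_tilde, X, Z, delta)    # upper bound started from D_tilde
D_star = max(itertools.product((0, 1), repeat=M), key=lambda D: profit(D, X, Z, delta))

print(D_L, D_hat_T, D_star, D_tilde_T, D_U, sep="\n")
```

In an application the brute-force step is replaced by a search restricted to vectors between the lower and upper bounds.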
2.9.4 Verification that the chains’ profit functions are supermodular with decreasing differences
Definition 2.3. Suppose that $Y(X)$ is a real-valued function on a lattice $\mathbf{X}$. If
\[
Y(X') + Y(X'') \leq Y(X' \vee X'') + Y(X' \wedge X'') \qquad (2.7)
\]
for all $X'$ and $X''$ in $\mathbf{X}$, then $Y(X)$ is supermodular on $\mathbf{X}$. If
\[
Y(X') + Y(X'') < Y(X' \vee X'') + Y(X' \wedge X'')
\]
for all unordered $X'$ and $X''$ in $\mathbf{X}$, then $Y(X)$ is strictly supermodular on $\mathbf{X}$. If $-Y(X)$ is
(strictly) supermodular, then $Y(X)$ is (strictly) submodular.
Definition 2.4. Suppose that $\mathbf{X}$ and $\mathbf{K}$ are partially ordered sets and $Y(X, k)$ is a real-valued
function on $\mathbf{X} \times \mathbf{K}$. If $Y(X, k'') - Y(X, k')$ is increasing, decreasing, strictly increasing,
or strictly decreasing in $X$ on $\mathbf{X}$ for all $k' \prec k''$ in $\mathbf{K}$, then $Y(X, k)$ has, respectively, increasing
differences, decreasing differences, strictly increasing differences, or strictly decreasing
differences in $(X, k)$ on $\mathbf{X}$.
Now let us verify that chain $i$’s profit function (2.1) is supermodular in its own strategy
$D_i \in \mathbf{D}$. For ease of notation, the firm subscript $i$ is omitted, and
$X_m \beta_i + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1) + \sqrt{1 - \rho^2}\, \varepsilon_m + \rho \eta_{i,m}$ is absorbed into $X_m$. Chain $i$’s profit function is
simplified to: $\Pi = \sum_{m=1}^{M} \big[ D_m (X_m + \delta \sum_{l \neq m} \frac{D_l}{Z_{ml}}) \big]$. First it is easy to show that
$D' \vee D'' = (D' - \min(D', D'')) + (D'' - \min(D', D'')) + \min(D', D'')$, $D' \wedge D'' = \min(D', D'')$. Let
$D' - \min(D', D'')$ be denoted $D_1$, $D'' - \min(D', D'')$ as $D_2$, and $\min(D', D'')$ as $D_3$. The
left-hand side of the inequality (2.7) is:
\[
\begin{aligned}
\Pi(D') + \Pi(D'') &= \sum_m D'_m \Big(X_m + \delta \sum_{l \neq m} \frac{D'_l}{Z_{ml}}\Big) + \sum_m D''_m \Big(X_m + \delta \sum_{l \neq m} \frac{D''_l}{Z_{ml}}\Big) \\
&= \sum_m \big[(D'_m - \min(D'_m, D''_m)) + \min(D'_m, D''_m)\big] \Big[X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} \big[(D'_l - \min(D'_l, D''_l)) + \min(D'_l, D''_l)\big]\Big] \\
&\quad + \sum_m \big[(D''_m - \min(D'_m, D''_m)) + \min(D'_m, D''_m)\big] \Big[X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} \big[(D''_l - \min(D'_l, D''_l)) + \min(D'_l, D''_l)\big]\Big] \\
&= \sum_m (D_{1,m} + D_{3,m}) \Big(X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} (D_{1,l} + D_{3,l})\Big) + \sum_m (D_{2,m} + D_{3,m}) \Big(X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} (D_{2,l} + D_{3,l})\Big)
\end{aligned}
\]
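Similarly, the right-hand side of the inequality (2.7) is:
\[
\begin{aligned}
\Pi(D' \vee D'') + \Pi(D' \wedge D'') &= \sum_m (D'_m \vee D''_m) \Big[X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} (D'_l \vee D''_l)\Big] + \sum_m (D'_m \wedge D''_m) \Big[X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} (D'_l \wedge D''_l)\Big] \\
&= \sum_m (D_{1,m} + D_{2,m} + D_{3,m}) \Big[X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} (D_{1,l} + D_{2,l} + D_{3,l})\Big] + \sum_m D_{3,m} \Big(X_m + \delta \sum_{l \neq m} \frac{1}{Z_{ml}} D_{3,l}\Big) \\
&= \Pi(D') + \Pi(D'') + \delta \Big(\sum_m \sum_{l \neq m} \frac{D_{2,m} D_{1,l} + D_{1,m} D_{2,l}}{Z_{ml}}\Big)
\end{aligned}
\]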
The profit function is supermodular in its own strategy if the chain effect $\delta$ is non-negative.
The verification of decreasing differences is also straightforward (here $\delta_{ij} D_{j,m}$ is spelled out,
rather than absorbed into $X_{im}$):
\[
\begin{aligned}
\Pi_i(D_i, D''_j) - \Pi_i(D_i, D'_j) &= \sum_m \Big[ D_{i,m} \Big(X_{im} + \delta_{ii} \sum_{l \neq m} \frac{D_{i,l}}{Z_{ml}} + \delta_{ij} D''_{j,m}\Big) \Big] - \sum_m \Big[ D_{i,m} \Big(X_{im} + \delta_{ii} \sum_{l \neq m} \frac{D_{i,l}}{Z_{ml}} + \delta_{ij} D'_{j,m}\Big) \Big] \\
&= \delta_{ij} \sum_{m=1}^{M} D_{i,m} (D''_{j,m} - D'_{j,m})
\end{aligned}
\]
The difference is decreasing in $D_i$ for all $D'_j < D''_j$ as long as $\delta_{ij} \leq 0$.
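The supermodularity argument can be checked numerically. The sketch below uses random illustrative values of X, Z, and a non-negative δ (none of them from the chapter) and, for every pair of entry vectors on a four-market example, compares the gap between the two sides of inequality (2.7) with the closed-form cross term derived above. The names `chain` and `profit` are assumptions.

```python
import itertools
import random


def chain(D, Z, m):
    """sum_{l != m} D_l / Z_ml for market m."""
    return sum(D[l] / Z[m][l] for l in range(len(D)) if l != m)


def profit(D, X, Z, delta):
    return sum(D[m] * (X[m] + delta * chain(D, Z, m)) for m in range(len(D)))


random.seed(2)                      # illustrative draws, not estimates
M, delta = 4, 0.5                   # delta >= 0 is required for supermodularity
X = [random.uniform(-1.0, 1.0) for _ in range(M)]
Z = [[1.0] * M for _ in range(M)]
for m in range(M):
    for l in range(m + 1, M):
        Z[m][l] = Z[l][m] = random.uniform(1.0, 5.0)

max_identity_error = 0.0            # |gap - closed-form cross term|
min_gap = float("inf")              # smallest value of RHS - LHS in (2.7)
vectors = list(itertools.product((0, 1), repeat=M))
for Dp in vectors:
    for Dpp in vectors:
        join = tuple(max(a, b) for a, b in zip(Dp, Dpp))
        meet = tuple(min(a, b) for a, b in zip(Dp, Dpp))
        gap = (profit(join, X, Z, delta) + profit(meet, X, Z, delta)
               - profit(Dp, X, Z, delta) - profit(Dpp, X, Z, delta))
        D1 = tuple(a - c for a, c in zip(Dp, meet))    # D' - min(D', D'')
        D2 = tuple(b - c for b, c in zip(Dpp, meet))   # D'' - min(D', D'')
        cross = delta * sum(
            (D2[m] * D1[l] + D1[m] * D2[l]) / Z[m][l]
            for m in range(M) for l in range(M) if l != m
        )
        max_identity_error = max(max_identity_error, abs(gap - cross))
        min_gap = min(min_gap, gap)

print(max_identity_error, min_gap)
```

With a negative δ the cross term changes sign for unordered pairs, and the inequality fails.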
2.9.5 Multiple maximizers to the chains’ optimization problem
In the main body of the chapter, I have assumed that the optimal solution to the profit
maximization problem D∗ is unique. To accommodate multiple solutions in the algorithm
discussed in subsections 2.5.1 and 2.5.2, I need to introduce the definition of the induced
set ordering.
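Definition 2.5. The induced set ordering $\sqsubseteq$ is defined on the collection of nonempty members
of the power set $\mathcal{P}(\mathbf{X}) \setminus \{\emptyset\}$ such that $\mathbf{X}' \sqsubseteq \mathbf{X}''$ in $\mathcal{P}(\mathbf{X}) \setminus \{\emptyset\}$ if $X'$ in $\mathbf{X}'$ and $X''$ in $\mathbf{X}''$
imply that $X' \wedge X''$ is in $\mathbf{X}'$ and $X' \vee X''$ is in $\mathbf{X}''$.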
The power set P(X) of a set X is the set of all subsets of X.
Definition 2.6. A function whose range is included in the collection of all subsets of some
set is a correspondence. A correspondence $S_k$ is increasing (decreasing) in $k$ on $\mathbf{K}$ if the
domain $\mathbf{K}$ is a partially ordered set, the range $\{S_k : k \in \mathbf{K}\}$ is in $\mathcal{L}(\mathbf{X})$ where $\mathbf{X}$ is a lattice
and $\mathcal{L}(\mathbf{X})$ is a partially ordered set with the ordering relation $\sqsubseteq$, and $S_k$ is an increasing
(decreasing) correspondence from $\mathbf{K}$ into $\mathcal{L}(\mathbf{X})$ (so $k' \preceq k''$ in $\mathbf{K}$ implies $S_{k'} \sqsubseteq S_{k''}$ ($S_{k''} \sqsubseteq
S_{k'}$) in $\mathcal{L}(\mathbf{X})$).
In stating that $S_k$ is increasing (decreasing) in $k$, it is implicit that each $S_k$ is a nonempty
sublattice of $\mathbf{X}$ and that $\sqsubseteq$ is the ordering relation on the sets $S_k$ in $\mathcal{L}(\mathbf{X})$. The following
Theorem, discussed in Topkis (1998), states that if $S_k$ is decreasing in $k$ on $\mathbf{K}$ and $S_k$ is
finite, then $S_k$ has a greatest element and a least element, both of which decrease in $k$.33
Theorem 2.3. Suppose that X is a lattice, K is a partially ordered set, Sk is a subset of
X for each k in K, and Sk is decreasing in k on K. If Sk has a greatest (least) element for
each k in K, then the greatest (least) element is a decreasing function of k from K into X.
Hence, if Sk is finite for each k in K, or X is a subset of Rn and Sk is a compact subset
of Rn for each k in K, then Sk has a greatest element and a least element for each k in K
and the greatest (least) element is a decreasing function of k.
According to Theorem 2.2 in the main body of the chapter, $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j)$ is
decreasing in $D_j$. Since $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j) \subset \mathbf{D}_i$ is finite, Theorem 2.3 implies that
the set $\arg\max_{D_i \in \mathbf{D}_i} \Pi_i(D_i, D_j)$ has a greatest element and a least element for each $D_j$, both
of which decrease in $D_j$. The solution algorithm that accommodates multiple solutions to
the profit-maximizing problem is as follows: to search for the equilibrium most profitable for
Kmart, if there are multiple elements in Kmart’s best response correspondence, choose the
greatest element; if there are multiple elements in Wal-Mart’s best response correspondence,
choose the least element. To search for the equilibrium most profitable for Wal-Mart, choose
the greatest element in Wal-Mart’s best response correspondence and the least element in
Kmart’s best response.
33
The original theorem is stated in terms of Sk increasing in k. Replacing k with −k delivers the version
of the theorem stated here.
2.9.6 Computational issues
The main computational burden of this exercise is the search for the best responses K(Dw )
and W (Dk ). In section 2.5.1, I have proposed two bounds DU and DL that help to reduce
the number of profit evaluations. Appendix 2.9.3 illustrates a tighter upper and lower bound
that work well in the empirical implementation.
When the chain effect δ ii is sufficiently big, it is conceivable that the upper bound and
lower bound are far apart from each other. If this happens, computational burden once
again becomes an issue, as there will be many vectors between these two bounds.
Two observations work in favor of the algorithm. First, recall that the chain effect is
assumed to take place among counties whose centroids are within fifty miles. Markets that
are farther away are not directly connected. Conditioning on the entry decisions in other
markets, the entry decisions in group A do not depend on the entry decisions in group B if
all markets in group A are at least fifty miles away from any market in group B. Therefore,
what matters is the size of the largest connected group of markets that differ between $D^U$ and $D^L$,
rather than the total number of elements that differ between $D^U$ and $D^L$. To illustrate this
point, suppose there are ten markets, numbered 1 through 10.
[Diagram: layout of the ten markets]
Suppose the upper bound $D^U$ and the lower bound $D^L$ are the same
for markets 2, 6, 9, and 10, but differ for the remaining six markets: $D^U_m = 1$ and $D^L_m = 0$ for
$m \in \{1, 3, 4, 5, 7, 8\}$, while $D^U_m = D^L_m = D_m$ for $m \in \{2, 6, 9, 10\}$. If markets 1, 4, and 5 (group A) are at least fifty miles
away from markets 3, 7, and 8 (group B), one only needs to evaluate $2^3 + 2^3 = 16$ vectors,
rather than $2^6 = 64$ vectors to find the profit maximizing vector.
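The saving from splitting the undetermined markets into disconnected groups can be sketched as follows. The neighbor list below is hypothetical (the original diagram of the ten markets is not recoverable here); it is chosen so that markets 1, 4, 5 and markets 3, 7, 8 form two groups that are linked only through markets whose entry decisions are already pinned down by the bounds.

```python
def connected_components(nodes, edges):
    """Connected components of the graph restricted to `nodes` (depth-first search)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbors and b in neighbors:   # ignore links through fixed markets
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(neighbors[n] - comp)
        seen |= comp
        components.append(comp)
    return components


# Markets where the upper and lower bounds disagree (from the example in the text).
undetermined = {1, 3, 4, 5, 7, 8}

# Hypothetical "within fifty miles" links; markets 2 and 6 are already determined.
edges = [(1, 4), (4, 5), (3, 7), (7, 8), (1, 2), (2, 3), (5, 6), (6, 7)]

groups = connected_components(undetermined, edges)
evaluations_by_group = sum(2 ** len(g) for g in groups)
evaluations_brute_force = 2 ** len(undetermined)
print(groups, evaluations_by_group, evaluations_brute_force)
```

The number of profit evaluations grows with the size of the largest group rather than with the total number of undetermined markets.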
The second observation is that even with a sizable chain effect, the event of having
$D^U$ and $D^L$ different in a large connected area is extremely unlikely. Let $N$ denote the
size of such an area. Both $D^U$ and $D^L$ are the fixed points of $V(\cdot)$, so:
$D^U_m = 1[X_m + 2\delta \sum_{l \neq m, l \in B_m} \frac{D^U_l}{Z_{ml}} + \xi_m \geq 0]$ and $D^L_m = 1[X_m + 2\delta \sum_{l \neq m, l \in B_m} \frac{D^L_l}{Z_{ml}} + \xi_m \geq 0]$, where I have
grouped $X_m \beta_i + \delta_{ij} D_{j,m} + \delta_{is} \ln(N_{s,m} + 1)$ into $X_m$, and $\sqrt{1 - \rho^2}\, \varepsilon_m + \rho \eta_{i,m}$ into $\xi_m$. ($B_m$
denotes the set of markets that are within fifty miles from market $m$, including $m$.) The
probability of $D^U_m = 1$, $D^L_m = 0$ for every market $m$ in the size-$N$ connected area $C_N$ is:
\[
\begin{aligned}
\Pr(D^U_m = 1, D^L_m = 0, \forall m \in C_N) &\leq \Pr\Big(X_m + \xi_m < 0,\ X_m + \xi_m + 2\delta \sum_{l \neq m, l \in B_m} \frac{1}{Z_{ml}} \geq 0,\ \forall m \in C_N\Big) \\
&= \prod_{m=1}^{N} \Pr\Big(X_m + \xi_m < 0,\ X_m + \xi_m + 2\delta \sum_{l \neq m, l \in B_m} \frac{1}{Z_{ml}} \geq 0\Big)
\end{aligned}
\]
where $\prod_{m=1}^{N}$ denotes the product of the $N$ elements. The equality follows from the i.i.d.
assumption of $X_m + \xi_m$. As $\delta$ goes to infinity, the probability approaches $\prod_{m=1}^{N} \Pr(X_m + \xi_m < 0)$
from below. How fast it decreases when $N$ increases depends on the distribution of $\xi_m$
as well as the distribution of $X_m$. If $\xi_m$ is i.i.d. normally distributed and $X_m$ is linearly
distributed between $[-a, a]$, with $a$ a finite positive number, on average the probability is
on the magnitude of $(\frac{1}{2})^N$.
To show this, note that:
\[
\begin{aligned}
E\Big(\prod_{m=1}^{N} \Pr(X_m + \xi_m < 0)\Big) &= E\Big(\prod_{m=1}^{N} (1 - \Phi(X_m))\Big) \\
&= \prod_{m=1}^{N} [1 - E(\Phi(X_m))] \\
&= \Big(\frac{1}{2}\Big)^N
\end{aligned}
\]
Therefore, even in the worst scenario that the chain effect δ approaches infinity, the probability of having a large connected area that differs between DU and DL decreases exponentially
with the size of the area. In the current application, the size of the largest connected area
that differs between DL and DU is seldom bigger than seven or eight markets.
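The key step in this calculation is that $E(\Phi(X_m)) = 1/2$ whenever $X_m$ is distributed symmetrically around zero. A minimal numerical check, assuming that "linearly distributed" means uniform on $[-a, a]$ and that $\xi_m$ is standard normal (both are assumptions of this sketch, as are the function names):

```python
import math


def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_Phi_uniform(a, n=2000):
    """Midpoint-rule approximation of E[Phi(X)] for X uniform on [-a, a]."""
    h = 2.0 * a / n
    return sum(Phi(-a + (k + 0.5) * h) for k in range(n)) / n


means = {a: mean_Phi_uniform(a) for a in (0.5, 2.0, 10.0)}
print(means)                        # each value equals 0.5 up to rounding

N = 7                               # a connected area of seven markets
bound = 0.5 ** N                    # average probability as delta goes to infinity
print(bound)
```

The result does not depend on $a$: the symmetry $\Phi(x) + \Phi(-x) = 1$ makes the average exactly one half, so the bound shrinks geometrically in the size of the connected area.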
2.10 Appendix B: data
I cleaned the data from the Directory of Discount Stores by hand. After the manually
entered data were checked repeatedly against the hard copy, the
stores’ cities were matched to their counties using census data.34 Some city names
listed in the directory contained typos, so I first found possible spellings using the census
data, then inspected the stores’ street addresses and zipcodes using various web sources to
confirm the right city name spelling. The final data set appears to be quite accurate. I
compared it with Wal-Mart’s firm data and found the difference to be quite small.35 For
the sample counties, only thirty to sixty stores were not matched between these two sources
for either 1988 or 1997.
34
Marie Pees kindly provided these data.
35
I am very grateful to Emek Basker for sharing the Wal-Mart firm data with me.
Chapter 3
Semi-Parametric Estimation of the Distribution of Fixed Costs in Entry Models
3.1 Introduction
Entry has long been an interesting topic for IO economists. Among the early papers, Bain
(1956) focused on “determinants of entry barriers”, such as economies of scale, product
differentiation and cost asymmetry. With the application of game theory to economics in
the 1970s and 1980s, IO economists started to model how strategic behavior among firms
influences firms’ entry decisions. This literature tends to show that strategic behavior in
many cases is more important than technology and demand factors. The empirical work
that directly models strategic interaction among firms did not start until the late 1980s.
Among the earliest works are Bresnahan and Reiss (1987, 1990, and 1991), and Berry (1992).
The most recent ones include Berry and Waldfogel (1999), Mazzeo (2002), Seim (2006), and
Toivanen and Waterson (2005), among many others.
These structural models rely on the fact that firms’ discrete entry decisions reveal some
information about their underlying profit. By observing how firms’ entry or exit decisions
change when market conditions evolve, we can make inferences about how market conditions affect firms’ profit and possibly learn about the nature of the competition among firms.
In most of these models, profit in a particular market (usually a geographic region) is a
reduced form equation of market observables (like income, population, demographics, and
possibly the number of firms in the market, or some measure of market structure) plus an
error term. A distribution assumption of the fixed cost, plus some assumptions of the entry
game structure, allows researchers to express the probability of observing a certain market
structure as a function of profit parameters. Maximum likelihood or simulated maximum
likelihood (in the cases when it is difficult to calculate the probabilities directly) can then
be used to estimate parameters in the profit function.
Besides the choice of profit functional forms, researchers also have to assume a distribution function for the unobserved fixed cost. Most of the time it is chosen for computational
simplicity or tractability reasons. This raises the following question: what if the distribution assumption is wrong? What if the true distribution has a long tail or is multi-modal
when it is assumed to be normal? Will this lead to biased estimates, overestimating or
underestimating the competitive interactions among firms? As we all know, the maximum
likelihood estimator under mis-specification is typically inconsistent, converging in probability to the value that minimizes the Kullback-Leibler Information Criterion. The direction
of the asymptotic bias is generally difficult to tell. Since we typically do not have any information about the unobserved fixed cost, it would be imprudent to report the parameter
estimates without carrying out any robustness check on the distribution assumptions. In
this chapter I will explain how we can get a consistent (and efficient under certain conditions) estimator without imposing any distribution assumptions on the fixed cost, and
propose some simple specification tests.
In some cases, we might be interested in the distribution of the fixed cost itself. We
might want to know whether the distribution has shifted after some policy reforms or regulation changes. For example, one might be interested in examining whether the average
entry barriers due to fixed cost shifted dramatically after the Telecommunications Act of 1996.
Unlike some other semi-parametric estimation methods (for example, the average
derivative method), the estimation method used in this chapter (proposed by Klein and
Spady (1993)) allows one to back out the distribution itself.
The chapter is organized as follows. The second section explains the setup of the model.
The third to the fifth sections elaborate the estimation methods and properties of the estimators. The sixth section describes the differences between the model used in Klein and
Spady (1993) and the one in this chapter. The next two sections discuss the data source and
the results. Section nine concludes.
3.2 Model
The entry game studied here has a very stylized setting with homogeneous firms (similar
to Bresnahan and Reiss (1991)). The game has two stages. In the first stage, firms decide
whether to enter or not. Once the entry decisions have been made, firms compete in the
second stage in which payoffs are determined. It is a game with complete information: every
firm knows about the level of market demand, the variable costs of production, as well as
the fixed cost (the error term that is not observed by econometricians). In equilibrium, all
of the firms produce the same amount of output, charge the same price and have the same
market shares. The number of firms in equilibrium is the maximum number of firms that
can be supported by a market, i.e., all the incumbent firms earn non-negative profits and
the entry of an additional firm would drive the profits earned by all firms in the market
below zero.
Admittedly, no industry fits the above description. After all, firms differ in many aspects: products produced by different firms are usually not the same, firms’ services differ,
and their reputation is generally very different. This most stylized model only serves as a
starting point for analysis and helps to shed light on the structure of the underlying profit
functions firms face (which we never observe) by analyzing the relationship between market
conditions and firms’ entry decisions. Another motivation for this type of model is
the scarcity of firm level information. As a matter of fact, many of the empirical entry
models have focused on the industries that are relatively homogeneous (so the homogeneity
assumption is not too bad) and have a large number of well defined market areas (so we
can make inferences by using cross sectional data). For example, Mazzeo (2002) analyzed
the entry of motels in small towns along the interstate highways, and Seim (2006) studied
the video retailers in medium sized areas.
I also assume that total demand function is well behaved and firm level profit decreases
in the number of firms operating in the market. Let Πj denote the profit each firm earns
when there are j firms, then Π1 ≥ Π2 ≥ · · · ≥ ΠJ . Profit depends on the number of firms,
market observables and fixed cost in the following way:
Πj = V (X, j, θ) − F
where X is a vector of market observables, j is the number of firms in the market, θ is
the parameter vector, and F is the fixed cost. In equilibrium, there are j firms when Πj is
positive (or zero) while Πj+1 is negative. This translates to the following ordered response
model:
\[
Y = \begin{cases}
0 & \text{if } \Pi_1 < 0 \\
1 & \text{if } \Pi_1 \geq 0 \ \&\ \Pi_2 < 0 \\
2 & \text{if } \Pi_2 \geq 0 \ \&\ \Pi_3 < 0 \\
\cdots & \\
J & \text{if } \Pi_J \geq 0
\end{cases}
\]
3.3 Estimation Method
Before I jump into the semi-parametric estimation procedure proposed by Klein and Spady
(1993), it is perhaps helpful to review the commonly used parametric estimation approach.
Notice that if we fix a distribution assumption on the fixed cost, it is straightforward to
write down the likelihood of the sample and proceed with maximum likelihood estimation.
Specifically, the probability of observing $j$ firms in market $i$ conditioning on the market
observables $X_i$ is (assuming the fixed cost $\{F_i\}_{i=1}^{N}$ has an i.i.d. normal distribution):
\[
\begin{aligned}
\Pr(Y_i = j | X_i = x_i) &= \Pr(\Pi_j \geq 0, \Pi_{j+1} < 0 | X_i = x_i) \\
&= \Pr(V(X_i, j+1, \theta) < F_i < V(X_i, j, \theta) | X_i = x_i) \\
&= \Phi(V(x_i, j, \theta)) - \Phi(V(x_i, j+1, \theta)) \\
&= P_j(\theta)
\end{aligned}
\]
where $\Phi$ denotes the c.d.f. of a normal distribution, and I have suppressed the subscript
$i$ in $P_j(\theta)$ to simplify notation. The log likelihood function for the sample is therefore the
following:
\[
\begin{aligned}
L_N &= \sum_{i=1}^{N} \sum_{j=0}^{J} \{Y_i = j\} \ln[\Pr(Y_i = j | X_i = x_i)] \\
&= \sum_{i=1}^{N} \sum_{j=0}^{J} \{Y_i = j\} \ln P_j(\theta)
\end{aligned}
\]
where $\{\cdot\}$ is the indicator function. It might seem impossible to come up with a probability
measure if we do not impose any distribution assumption on $F$ at all. The cornerstone of
the Klein and Spady (1993) approach is to translate the unobserved conditional probability into
a product of three components that we can easily estimate, using Bayes’ rule:
\[
\Pr(Y_i = j | X_i = x_i) = \frac{p_{Y_i, X_i}(Y_i = j, x_i)}{p_{X_i}(x_i)} = \frac{p_{X_i}(x_i | Y_i = j) \cdot P(Y_i = j)}{p_{X_i}(x_i)}
\]
where $p_{X_i}(x_i | Y_i = j)$ is the conditional density of $X_i$, $p_{X_i}(x_i)$ is the unconditional density
of $X_i$, and $P(Y_i = j)$ is the unconditional probability of $Y_i$ equal to $j$. Both $p_{X_i}(x_i | Y_i = j)$
and $p_{X_i}(x_i)$ can be estimated directly from data using nonparametric density estimation.
$P(Y_i = j)$ can be consistently estimated by the sample frequency of $\{Y_i = j\}$. A simple
transformation therefore enables us to back out the unobserved probability measure using
data alone!
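The Bayes' rule decomposition can be sketched with a plain fixed-bandwidth Gaussian kernel and a scalar regressor. This is a simplification for illustration, not the estimator of Klein and Spady (1993), which works with an index of the regressors and a bias-reducing adaptive kernel; the data, the bandwidth, and the function names below are all hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x from `sample` with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def cond_prob(j, x, xs, ys, h):
    """Estimate Pr(Y = j | X = x) as p(x | Y = j) * P(Y = j) / p(x)."""
    sub = [xi for xi, yi in zip(xs, ys) if yi == j]
    if not sub:
        return 0.0
    p_j = len(sub) / len(xs)                  # sample frequency of {Y = j}
    return kde(x, sub, h) * p_j / kde(x, xs, h)


# Hypothetical data: a scalar market characteristic and the number of firms.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.3, 2.8, 3.0]
ys = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
h = 0.5

probs_at_1 = [cond_prob(j, 1.0, xs, ys, h) for j in (0, 1, 2)]
print(probs_at_1, sum(probs_at_1))
```

Using the same bandwidth in the numerator and the denominator makes the estimated probabilities sum to one across outcomes at every point x.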
In a nutshell, the differences between parametric estimation and the semiparametric
approach can be summarized as the following: parametric estimation starts with a distribution assumption, and calculates the choice probability P r(Y = j|X = x) based on the
distribution assumption; the semi-parametric estimation reverses the procedure in the sense
that it estimates the choice probabilities first and then derives the distribution of the error
term (or the fixed cost) based on the choice probability estimates.
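For comparison, the parametric side of this contrast can be sketched as an ordered-probit likelihood. The profit function `V` below is a hypothetical specification that decreases in the number of firms, the fixed cost is assumed to be standard normal, and the parameter values and data are made up for illustration.

```python
import math


def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def V(x, j, theta):
    """Hypothetical variable profit per firm, decreasing in the number of firms j."""
    a, b, c = theta
    return a + b * x - c * math.log(j)


def choice_probs(x, theta, J):
    """P(Y=0) = 1 - Phi(V_1); P(Y=j) = Phi(V_j) - Phi(V_{j+1}); P(Y=J) = Phi(V_J)."""
    cdf = [Phi(V(x, j, theta)) for j in range(1, J + 1)]   # Phi(V_1), ..., Phi(V_J)
    probs = [1.0 - cdf[0]]
    probs += [cdf[j - 1] - cdf[j] for j in range(1, J)]
    probs.append(cdf[J - 1])
    return probs


def log_likelihood(theta, xs, ys, J):
    """Sum over markets of the log probability of the observed number of firms."""
    return sum(math.log(choice_probs(x, theta, J)[y]) for x, y in zip(xs, ys))


theta = (0.2, 1.0, 1.5)             # illustrative parameter values
probs = choice_probs(0.8, theta, J=4)
ll = log_likelihood(theta, xs=[0.8, -0.5, 1.5], ys=[1, 0, 3], J=4)
print(probs, ll)
```

Here the normal c.d.f. is imposed at the outset; the semi-parametric approach instead estimates the choice probabilities first and recovers the distribution of the fixed cost from them.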
Let $P_j(\theta)$ denote the conditional choice probability $\Pr(Y_i = j | X_i = x_i)$. In Klein and
Spady (1993), $P_j(\theta)$ is estimated semiparametrically using a two-step bias reducing kernel (also called adaptive kernel). The estimated $\hat{P}_j(\theta)$ is substituted for $P_j(\theta)$ in the log
likelihood function. Maximizing the log likelihood function with respect to $\theta$ then gives us
the quasi-maximum likelihood estimator $\hat{\theta}(\hat{P})$ (“quasi” because we use the estimated choice
probability instead of the “true” choice probability in the maximization routine). Klein and
Spady (1993) showed that $\hat{\theta}(\hat{P})$ behaves asymptotically like the MLE $\hat{\theta}(P)$. Under mild
regularity conditions, $\hat{\theta}(\hat{P})$ is consistent and asymptotically normally distributed. When the error term is independent of the regressors, $\hat{\theta}(\hat{P})$ is efficient, in that it attains the efficiency
bound among the class of semi-parametric estimators. (Roughly speaking, among the class
of all estimators that do not impose distribution assumptions for the error term, $\hat{\theta}(\hat{P})$ has
the smallest variance.)
3.4 Kernel Density Estimation
A brief review of density estimation helps to explain how the choice probability can be estimated nonparametrically.
Suppose θ is known in the above model. Suppose also that the correct functional form of profit V(X, j, θ) is known and the goal is to estimate the distribution of the random scalar variable V(X, j, θ). The data are a sample of N realizations of V: $v_1, v_2, \cdots, v_N$. A naive estimator distributes equal mass, 1/N, on each observation. To do this, we draw a box with width 2h (h is some small positive number) and height $(2hN)^{-1}$ centered around each observation and sum over all the boxes. Intuitively, if there are many observations clustered around a value v, summing over many boxes leads to a high value of $\hat f_V(v)$, which is what we would expect. For an example of density estimation using
seven data points, see Figure 3(A).
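The naive estimator described above can be written as a short Python sketch. The function name `naive_density` is hypothetical and chosen only for illustration.

```python
import numpy as np

def naive_density(sample, v, h):
    """Naive density estimate at v: sum of boxes of width 2h and height 1/(2hN)."""
    sample = np.asarray(sample, dtype=float)
    inside = np.abs(v - sample) < h          # observations whose box covers v
    return inside.sum() / (2 * h * len(sample))
```

Each observation within distance h of v contributes one box of height 1/(2hN), so the estimate is a step function that jumps whenever v crosses the edge of a box.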
As simple and appealing as this approach is, it has a couple of drawbacks. Most importantly, $\hat f_V(v)$ is not a continuous function: it jumps at the points $\{v_i \pm h\}_{i=1}^{N}$ (where $v_i$ is a sample observation) and has zero derivative everywhere else. A straightforward extension of the above method is to replace the box function with a function K(v) that is smooth and satisfies the “weight” requirement:
$$\int_{-\infty}^{\infty} K(v)\,dv = 1$$
Typically, K(v) is chosen to be a symmetric probability density function, such as the normal density. By analogy with the definition of the naive estimator, the density estimator with kernel K is:
$$\hat f_V(v) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{h}\, K\!\left[\frac{v - v_i}{h}\right]$$
where $\{v_i\}_{i=1}^{N}$ are the sample points and h is the window size, or bandwidth. Just as the naive estimator can be considered as a sum of “boxes” centered around the observations, the kernel estimator is a sum of bumps placed at the sample points. The kernel K determines the shape of the bumps, while the window size h determines their width. A large bandwidth allows observations far away from v to contribute to the estimate of $\hat f_V(v)$, leading to a smoother density estimate, while a small bandwidth focuses on the observations immediately around v and yields a bumpier estimate.
Some elementary properties of kernel estimators follow directly from the definition.
First, if the kernel function K is a density function, then fˆV (v) is also a density function
(non-negative everywhere and integrating to one). Further, if K is smooth and differentiable
to the rth order, so is fˆV (v). See Figure 3(B) for an example of kernel density estimation,
taken from Silverman (1986), page 14.
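A minimal Python sketch of the fixed-bandwidth kernel estimator follows, assuming a standard normal kernel. The function name `kernel_density` and the evaluation grid are hypothetical choices for illustration.

```python
import numpy as np

def kernel_density(sample, grid, h):
    """Fixed-bandwidth kernel density estimate on `grid` with a Gaussian kernel."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - sample[None, :]) / h          # (v - v_i) / h for every pair
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # one bump per observation
    return bumps.sum(axis=1) / (len(sample) * h)
```

Since the Gaussian kernel is itself a density, the estimate is non-negative everywhere and integrates to one, which can be checked numerically by summing the estimate over a fine grid that covers the sample.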
The kernel estimator defined above applies one window size to the entire sample and does not handle the tails of the distribution well, where there are fewer observations. Adaptive kernel estimation copes with this problem by a two-step procedure. An initial kernel estimate (usually called the pilot estimate in the semiparametric estimation literature) is used to get an idea of the density at each observation. This preliminary density estimate then yields an optimal window size used to construct the adaptive estimator itself.
The general strategy of adaptive kernel estimation is the following:
Step 1: find a pilot estimate $\hat f_V(v_i)$ that satisfies $\hat f_V(v_i) > 0$ for all i.
Step 2: define the local bandwidth factor $\lambda_i$ by $\lambda_i = \left(\hat f_V(v_i)/g\right)^{-\alpha}$, where g is the geometric mean of all the $\hat f_V(v_i)$ and α is a constant between zero and one.¹
Step 3: define the adaptive kernel estimator $\tilde f_V(v)$ by:
$$\tilde f_V(v) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{h\lambda_i}\, K\!\left[\frac{v - v_i}{h\lambda_i}\right]$$
where K is the kernel function and h is the normal bandwidth (or window size) that is held constant across the sample points. The general view in the kernel estimation literature is that the adaptive kernel estimate is insensitive to the fine details of the pilot estimate, and therefore any convenient estimate suffices for that purpose.
¹ Abramson (1982) showed that under some regularity conditions, the optimal α is 1/2.
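The three steps can be sketched in Python as follows. This is an illustration, not the estimation code used in the chapter: it assumes a Gaussian kernel, uses the fixed-bandwidth estimate as the pilot, and sets α = 1/2 by default. The names `gauss` and `adaptive_kernel_density` are hypothetical.

```python
import numpy as np

def gauss(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def adaptive_kernel_density(sample, grid, h, alpha=0.5):
    """Adaptive kernel estimate on `grid`; also returns the local factors lambda_i."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(sample)
    # Step 1: pilot estimate at each observation (strictly positive for a Gaussian kernel).
    pilot = gauss((sample[:, None] - sample[None, :]) / h).sum(axis=1) / (n * h)
    # Step 2: local bandwidth factors, scaled by the geometric mean g of the pilot values.
    g = np.exp(np.log(pilot).mean())
    lam = (pilot / g) ** (-alpha)
    # Step 3: kernel estimate with observation-specific bandwidths h * lambda_i.
    widths = h * lam
    u = (grid[:, None] - sample[None, :]) / widths[None, :]
    density = (gauss(u) / widths[None, :]).sum(axis=1) / n
    return density, lam
```

Observations in sparse regions receive a low pilot density and therefore a factor above one, so their bumps are wider; the factors have geometric mean one by construction.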
Adaptive kernel estimation is important because the bias of the density estimate must be sufficiently small to establish the consistency and asymptotic normality of the quasi-maximum likelihood estimator $\hat\theta(\hat P)$. Intuitively, we want the estimated choice probability $\hat P_j(\theta)$ (which is based on the estimated densities) to approach the “true” choice probability $P_j(\theta)$ reasonably fast. In fact, the adaptive kernel combined with optimal window sizes attains a uniform bias of order $h^4$. (See Silverman (1986) for further explanation of the properties of adaptive kernel estimation.)
3.5 Identification Assumptions and Properties of the Quasi Maximum Likelihood Estimator
The identification assumptions for $\hat\theta(\hat P)$ are fairly general and less restrictive than those required under the parametric estimation procedure. They are:
Assumption 3.1. Parameter space: the true parameter $\theta_0$ is an interior point of $\Theta$, a compact subset of $\mathbb{R}^K$.
Assumption 3.2. Data: $\{Y_i, X_i\}_{i=1}^{N}$ are i.i.d. observations. In addition, each $X_i$ is independent of the error term $\epsilon_i$.
Assumption 3.3. Index restriction: there exists a scalar index V (X, θ0 ) such that
P r(Y = j|X = x) = P r(Y = j|V (X, θ0 ) = V ( x, θ0 ))
In the above model, $V(X, \theta_0)$ is $V(X, j, \theta_0)$, the nonrandom part of the profit. The index restriction is very important in that it reduces the dimension of the conditioning variables from K (assuming there are K non-constant regressors) to 1. Instead of estimating the joint distribution of K variables, the precision of which deteriorates exponentially as K increases, we only need to estimate the distribution function of one variable (the index).
Among the set of regressors, X1 , ..., Xk , at least one regressor is continuous. Without
loss of generality, let this variable be X1 . Write f as the conditional density of X1 given
the rest of the regressors and Y .
Assumption 3.4. Conditional density: f is strictly positive and has bounded derivatives
up to the 4th order.
The smoothness and boundedness conditions help to control the bias in establishing the limiting distribution of $\hat\theta(\hat P)$. They ensure that the densities underlying the choice probability $P_j(\theta)$ are sufficiently smooth and can be approximated by a smooth function of θ.
Klein and Spady (1993) showed that the quasi-likelihood estimator explained above is consistent at the $\sqrt{N}$ rate, asymptotically normal, and efficient under the assumption of independent errors.
Theorem 3.1 (Consistency). Under Assumptions 3.1-3.4:
$$\hat\theta \equiv \arg\sup_{\theta} \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\,\ln \hat P_j(\theta) \xrightarrow{\;p\;} \theta_0$$
Proof: See Klein and Spady (1993).
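Theorem 3.2 (Normality). Under Assumptions 3.1-3.4, the asymptotic distribution of $N^{1/2}(\hat\theta - \theta_0)$ is $N(0, \Sigma)$, where
$$\Sigma \equiv E\left\{\left[\frac{\partial P}{\partial \theta}\right]\left[\frac{\partial P}{\partial \theta}\right]'\left[\frac{1}{P(1-P)}\right]\right\}^{-1}_{\theta=\theta_0}$$
Proof: See Klein and Spady (1993), which also gives further explanation of the asymptotic variance of $\hat\theta$. Here $P = \Pr(Y \mid \theta)$ is viewed as a function of θ rather than fixed at the true value $\theta_0$.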
Intuitively, when the estimated choice probability $\hat P_j(\theta)$ converges to the true choice probability $P_j(\theta)$ fast enough, the quasi likelihood function $\hat L_N = \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\ln \hat P_j(\theta)$ converges uniformly to its expectation $L_N = \sum_{i=1}^{N}\sum_{j=0}^{J} \mathbf{1}\{Y_i = j\}\ln P_j(\theta)$. Therefore, the quasi likelihood estimator $\hat\theta$, which maximizes the quasi likelihood function, converges to the true parameter value $\theta_0$ that maximizes the true likelihood function. The distribution of $N^{1/2}(\hat\theta - \theta_0)$ inherits its normality from the properties of the score function, which can be shown to converge to a normalized sum of i.i.d. elements that has an asymptotically normal distribution.
3.6 Differences between my model and Klein and Spady (1993)
The model considered by Klein and Spady (1993) is a traditional ordered response model
with a single index and multiple cutoff values:
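$$Y = \begin{cases}
0 & \text{if } X\beta + \epsilon < t_0 \\
1 & \text{if } t_0 \le X\beta + \epsilon < t_1 \\
2 & \text{if } t_1 \le X\beta + \epsilon < t_2 \\
\cdots & \\
J & \text{if } X\beta + \epsilon \ge t_{J-1}
\end{cases}$$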
where Xβ is the single index (X is a row vector of K regressors), and t0 , t1 , · · · , tJ−1 are
the cutoff values. In my model, all of the cutoff values are zero. Instead of a single index,
there are multiple indices:
$$Y = \begin{cases}
0 & \text{if } V(X, 1, \theta) - F < 0 \\
1 & \text{if } V(X, 1, \theta) - F \ge 0,\; V(X, 2, \theta) - F < 0 \\
2 & \text{if } V(X, 2, \theta) - F \ge 0,\; V(X, 3, \theta) - F < 0 \\
\cdots & \\
J & \text{if } V(X, J, \theta) - F \ge 0
\end{cases}$$
where the indices are V1 (which is V (X, 1, θ)), V2 , · · · , VJ .
While Klein and Spady (1993) need to condition on only one index, Xβ, I have to condition on all the indices $V_1, V_2, \cdots, V_J$ to calculate the choice probability Pr(Y = j|V). Under the assumption that $V_1 \ge V_2 \ge \cdots \ge V_J$, the outcome of Y (say Y = j) is determined by the positions of two indices relative to F: $V_j \ge F$ and $V_{j+1} < F$. In some sense, $V_j$ and $V_{j+1}$
are sufficient statistics for the conditional choice probability. To be more specific,
$$\Pr(Y = j \mid V_1, \cdots, V_J) = \Pr(Y = j \mid V_j, V_{j+1}) = \frac{p_{V_j,V_{j+1}}(v_j, v_{j+1} \mid Y = j)\,P(Y = j)}{p_{V_j,V_{j+1}}(v_j, v_{j+1})}$$
To obtain the joint density estimate of Vj and Vj+1 , bivariate density estimation is
needed. However, the accuracy of density estimates decreases exponentially as the number
of dimensions increases. A trick to go back to the single index model is to change the
structure of Y . Instead of using:
Y = j ⇔ Πj ≥ 0, Πj+1 < 0
we use:
Y ≤ j ⇔ Πj+1 < 0 ⇔ Vj+1 < F
which leads to:
$$\Pr(Y \le j \mid V_1, \cdots, V_J) = \Pr(Y \le j \mid V_{j+1}) = \frac{p_{V_{j+1}}(v_{j+1} \mid Y \le j)\,P(Y \le j)}{p_{V_{j+1}}(v_{j+1})}$$
The estimated conditional choice probability P r(Y = j|V ) is the difference between P r(Y ≤
j|V ) and P r(Y ≤ j − 1|V ).
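The single-index trick can be illustrated with a short Python sketch. It assumes a Gaussian kernel with a fixed bandwidth `h`; the function names `cumulative_probability` and `cell_probabilities` are hypothetical. With a common bandwidth, the Bayes' rule ratio of kernel densities reduces to a ratio of kernel weights, because the normalizing constants and the sample frequency cancel.

```python
import numpy as np

def cumulative_probability(index, y, j, v, h):
    """Estimate Pr(Y <= j | V_{j+1} = v) from a single index via Bayes' rule.

    `index` holds the sample values of V_{j+1}; `y` holds the observed outcomes.
    """
    index = np.asarray(index, dtype=float)
    y = np.asarray(y)
    weights = np.exp(-0.5 * ((v - index) / h) ** 2)   # unnormalized Gaussian kernel weights
    return weights[y <= j].sum() / weights.sum()

def cell_probabilities(cum_probs):
    """Turn [Pr(Y<=0|V), ..., Pr(Y<=J-1|V)] into [Pr(Y=0|V), ..., Pr(Y=J|V)]."""
    cum = np.concatenate(([0.0], np.asarray(cum_probs, dtype=float), [1.0]))
    return np.diff(cum)
```

Because each cumulative probability is estimated from a different index, the estimated differences are not guaranteed to be non-negative in finite samples; the sketch does not correct for this.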
3.7 Data
The data set is composed of all small towns in the U.S. with a population of less than 10,000 that are at least 10 miles away from any town with a population greater than 1,000 and at least 20 miles away from any town with a population greater than 5,000. Altogether, 684 towns satisfy these restrictions. Geographical isolation makes it relatively easy to define a market, since the population of the town is a good proxy for local demand. I focus on the dry cleaning industry for several reasons. First, people are unlikely to travel long distances for dry cleaning services, so leakage of demand should not be a serious problem. Second, it is a fairly homogeneous industry with standard cleaning technology. Third, the firms in my sample are very similar in size, with 90% of them having fewer than 10 employees. Town population, median household income, and population age structure come from Census CD, while firm information comes from American Business Disc, 2002.
3.8 Results
Since not many towns have three or more firms, I group them with the towns that have two firms. Among the 684 towns in the sample, 354 have no dry cleaners, 219 have one dry cleaner, and the remaining 111 have two or more dry cleaners. Monopoly profit and duopoly profit take the following forms:
$$\Pi_1 = S\,[a_1 + b \cdot Inc] - F$$
$$\Pi_2 = \frac{S}{2}\,[a_2 + b \cdot Inc] - F$$
where S is town population and Inc is the median household income. S (or S/2 in the duopoly profit equation) is the per-firm market size, which measures the quantity of demand facing each firm; $a_1 + b \cdot Inc$ is a first-order approximation of monopoly variable profit per unit of demand, and $a_2 + b \cdot Inc$ is the corresponding approximation of duopoly variable profit per unit of demand. Here I assume that total variable profit can be separated into two multiplicative elements: unit variable profit and the total units of demand. In particular, profit per unit does not depend on the size of the market. I also assume that the entry of another firm changes the unit variable profit only by a constant. Admittedly, this is a very restrictive assumption, but it is a parsimonious way to model the competition effect of entry and it allows me to focus on the parameter estimation. The parameters are $a_1$, $a_2$, $b$, and the mean or median of the fixed cost F. I impose restrictions on the parameters to guarantee that $\Pi_1 \ge \Pi_2$.
Notice that there is no constant in the model: it is absorbed into the mean of the fixed cost. The approach proposed by Klein and Spady (1993) cannot identify the constant term since no restriction is imposed on the distribution of the error term. In particular, we cannot impose a zero-mean assumption on the error term during estimation. The mean or median of the fixed cost can only be recovered after obtaining the estimate of its distribution. Also, as is common to all ordered response models, the parameters are only identified up to a scale factor. Here I normalize b to be one.
Recall that the choice probabilities are functions of the fixed cost distribution evaluated
at the sample points:
$$\Pr\big(Y_i \le j \mid V_{j+1}(X_i, j+1, \theta) = V_{j+1}(x_i, j+1, \theta)\big) = \Pr\big(F > V_{j+1}(x_i, j+1, \theta)\big) = 1 - P_F\big(V_{j+1}(x_i, j+1, \theta)\big)$$
where $P_F(V_{j+1}(x_i, j+1, \theta))$ is the distribution function of F evaluated at the sample point $V_{j+1}(x_i, j+1, \theta)$. Figure 4 displays the estimated distributions and Figure 5 shows the density estimates of the fixed cost. There are four different estimates, each using a different window size, holding the parameter estimates $\hat\theta$ fixed at the values obtained using the optimal window size $N^{-1/6}$. The window sizes for the first through the last graph are $N^{-1/6}$, $N^{-1/7}$, $N^{-1/8}$, and $N^{-1/9}$, respectively. (The estimation procedure requires that the window size be between $N^{-1/6}$ and $N^{-1/9}$ for a reasonable bias and variance of the parameter estimates.) As the figures show, the curves get smoother as the window size gets bigger. Note that the density is not symmetric: it has a long tail. I repeat the above exercise while fixing the parameters at the parametric estimates. The distribution estimates look very similar to the above figures. It seems that this feature of the distribution (asymmetry) is not an artifact of the semiparametric estimation, but is inherent to the data. The boundary of the distribution is not very precisely estimated.
There are many ways to estimate the distribution of the error term, yet there has not been any work on which one is most efficient. For example, we can estimate the distribution from the probability estimates $\{\Pr(Y_i \le 0 \mid V_1(x_i, 1, \theta))\}_{i=1}^{N}$, which give us 684 data points, or we can estimate the distribution from any other choice probabilities, like $\{\Pr(Y_i \le j \mid V_{j+1}(x_i, j+1, \theta))\}_{i=1}^{N}$; or maybe we can use a mixture of both. In plotting the graphs, I have chosen to use the first category, $\Pr(Y \le 0 \mid V_1)$, since the group $\{Y_i = 0\}$ has by far the most observations (more than half of the sample) and the choice probability $\Pr(Y \le 0 \mid V_1)$ should be estimated more accurately. I did a rough comparison among different distribution estimates using different groups of choice probabilities. They overlap in the middle part, where there are more observations, and differ somewhat at the boundaries.
Table 11 shows both the semiparametric estimates and the parametric estimates. As can be seen from the table, all of the estimates are significant at the 1% level. The parametric estimates of the slope coefficients are 12% larger than the semiparametric ones; the parametric estimate of the constant is around 20% to 30% larger. These differences translate into a bigger competitive effect of entry under the parametric estimates, which will become clearer later when I discuss entry threshold ratios. In the parametric estimation, the mean and median of the fixed cost are the same number because the assumed distribution is symmetric. I would argue that in the current context, the median is more meaningful and certainly more robust than the mean of the distribution. The median informs us of the “typical” fixed cost incurred by firms, while the mean is more sensitive to the tail properties of the fixed cost distribution. A technical reason for using the median is that I cannot estimate the boundary of the distribution very precisely, so a reliable estimate of the mean is hard to obtain. Perhaps not surprisingly, the log likelihood of the semiparametric estimation is much bigger than that of the parametric estimation, suggesting a better fit to the data.
A central concept in Bresnahan and Reiss (1991) is the entry threshold $S_k^{TH}$, the smallest size of the market that can support k firms, once we fix the fixed cost at its mean (or median) level. Mathematically, the thresholds are:
$$S_1^{TH} = \frac{F}{a_1 + b \cdot Inc}$$
$$S_2^{TH} = \frac{F}{a_2 + b \cdot Inc}$$
$$\frac{S_2^{TH}}{S_1^{TH}} = \frac{a_1 + b \cdot Inc}{a_2 + b \cdot Inc}$$
The threshold values are displayed in Table 12. According to the parametric estimates,
in a town with average median household income, it takes about 2.7 thousand people to
support one dry cleaner and an extra 4.5 thousand people to support the second firm. The
semi-parametric estimates suggest that the second firm will enter when the town’s population increases by 4,000. This is a 13% difference. A more rigorous test of the differences is
discussed later.
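As a worked illustration of the threshold formulas, the following Python sketch computes the two thresholds and their ratio. The function name `entry_thresholds` and any parameter values passed to it are hypothetical; they are not the estimates reported in Tables 11 and 12. The sketch follows the formulas as printed, which I read as the per-firm market size at which profit is exactly zero.

```python
def entry_thresholds(a1, a2, b, inc, fixed_cost):
    """Return (S1_TH, S2_TH, S2_TH / S1_TH) for given parameters and fixed cost.

    Each threshold is the per-firm market size s that solves
    s * (a_k + b * inc) - fixed_cost = 0.
    """
    s1 = fixed_cost / (a1 + b * inc)   # monopoly threshold
    s2 = fixed_cost / (a2 + b * inc)   # duopoly threshold
    return s1, s2, s2 / s1
```

The ratio does not depend on the fixed cost: it equals (a1 + b * inc) / (a2 + b * inc), so it isolates how much entry lowers unit variable profit.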
To show that the asymmetry of the fixed cost distribution drives the differences between
the parametric estimates and the semi-parametric estimates, I did the following exercise.
I truncated those observations corresponding to the tail part of the fixed cost distribution
and changed the sample distribution of the fixed cost into a more or less symmetric one.
I repeated both the parametric and the semiparametric estimation using the new sample.
Table 13 displays the new estimates. Now the differences between the two groups of estimates become much smaller, in the range of 1% to 5%!
A natural choice to test whether the two groups of estimates are significantly different
from each other is the Hausman test. Recall that the formula for the Hausman test is:
$$H = (\hat\theta - \tilde\theta)'\,\mathrm{Var}(\hat\theta - \tilde\theta)^{-1}\,(\hat\theta - \tilde\theta)$$
where $\tilde\theta$ is the efficient estimator under the null hypothesis, but is inconsistent under the alternative. $\hat\theta$ is consistent under both the null and the alternative. A typical trick to calculate the variance of $(\hat\theta - \tilde\theta)$ is to use the following:
$$\mathrm{Var}(\hat\theta - \tilde\theta) = \mathrm{Var}(\hat\theta) - \mathrm{Var}(\tilde\theta)$$
if $\tilde\theta$ is the efficient estimator under the null. Here I run into a technical problem: as shown in Table 11, the estimated variances of the semiparametric estimates are much smaller than the estimated variances of the parametric ones. I repeated the estimation of the variances using different window sizes, including the one that is optimal for estimating the $\partial P/\partial\theta$ element in the variance of the semiparametric $\hat\theta$ (see Theorem 3.2 for details). However, they all produce the same pattern: the variances of the semiparametric estimates are smaller. Interestingly, Horowitz and Härdle (1996) also found that the variances of the parametric estimates are much bigger than those of the semiparametric estimates (the average derivative estimators). This raises the following question: is it a small sample problem, in that we cannot obtain a very accurate estimate of the true variances, or does it suggest some sort of misspecification (for example, the symmetry assumption on the error distribution is not valid)? The answer to this question is beyond the scope of this chapter and is left for future research. Using the difference of the estimated variance matrices as the variance matrix in the test statistic, we fail to reject the null hypothesis (at the 5% level) that the two sets of estimates are the same.
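The Hausman statistic with the variance-difference shortcut can be computed as in the following Python sketch. The function name `hausman_statistic` is hypothetical, and the inputs are assumed to be parameter vectors with their estimated variance matrices.

```python
import numpy as np

def hausman_statistic(theta_hat, theta_tilde, var_hat, var_tilde):
    """H = d' [Var(theta_hat) - Var(theta_tilde)]^{-1} d, with d = theta_hat - theta_tilde."""
    d = np.asarray(theta_hat, dtype=float) - np.asarray(theta_tilde, dtype=float)
    v = np.asarray(var_hat, dtype=float) - np.asarray(var_tilde, dtype=float)
    return float(d @ np.linalg.solve(v, d))
```

Under the null, H is compared with a chi-squared critical value whose degrees of freedom equal the number of parameters. If the variance difference is not positive definite, which is what happens when the consistent estimator has the smaller estimated variances, the statistic can be negative and the comparison loses its usual interpretation.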
3.9 Conclusion
The conclusion of this chapter is more qualitative than quantitative. I find some evidence that the error term has an asymmetric distribution, yet I cannot reject the hypothesis that the two sets of estimates are the same. The literature does not provide researchers with good guidance on how to obtain the optimal distribution estimate among the many possible choices. Despite these limitations, the method proposed in this chapter can still be used as a robustness check. Failure to find significant differences between the two groups of estimates gives us more confidence in the validity of the distributional assumption, while the finding of a significant difference calls attention to possible specification errors.
4. FIGURES
Figure 1: Wal-Mart and Kmart Stores in 1988 (map; legend: Wal-Mart store, Kmart store, small and medium counties)
Figure 2: Wal-Mart and Kmart Stores in 1997 (map; legend: Wal-Mart store, Kmart store, small and medium counties)
Figure 3(A): Naïve Estimate. Window Size: 0.4 (plot; horizontal axis from -5 to 4, vertical axis from 0 to 0.5)
Figure 3(B): Kernel Estimate Showing Individual Kernels. Window Size: 0.4 (plot; horizontal axis from -5 to 4, vertical axis from 0 to 0.4)
Figure 4: CDF of the Error Term: Different Window Sizes
Figure 5: PDF of the Error Term: Different Window Sizes
5. TABLES
Table 1 (A): The Discount Industry from 1960 to 1997
Year   Number of Discount Stores   Total Sales (2004 $bill.)   Average Store Size (thou ft²)   Number of Firms
1960   1329                        12.8                        38.4                            1016
1980   8311                        119.4                       66.8                            584
1989   9406                        123.4                       66.5                            427
1997   9741                        198.7                       79.2                            230
Source: various issues of Discount Merchandiser. The numbers only include traditional discount stores. Wholesale clubs, super-centers, and special retailing stores are all excluded.
Table 1 (B): Summary Statistics for the Data Set
                                                                       1988             1997
Variable                                                               Mean    Std.     Mean    Std.
Population (thou.)                                                     22.47   14.12    24.27   15.67
Per Capita Retail Sales (1984 $thou.)                                  3.69    1.44     4.05    2.02
Percentage of Urban Population                                         0.30    0.23     0.33    0.24
Midwest (1 if in the Great Lakes, Plains, or Rocky Mountain Region)    0.41    0.49     0.41    0.49
South (1 if Southwest or Southeast)                                    0.50    0.50     0.50    0.50
Distance to Benton, AR (100 miles)                                     6.14    3.88     6.14    3.88
% of Counties with Kmart Stores                                        0.21    0.42     0.19    0.41
% of Counties with Wal-Mart Stores                                     0.32    0.53     0.48    0.57
Number of Firms with 1-19 Employees                                    3.86    2.84     3.49    2.61
Number of Counties                                                     2065
Source: the 1988 population is from the U.S. Census Bureau’s website; the 1997 population is from the website of the Missouri State Census Data Center. Retail sales are from the 1987 and 1997 Economic Census, respectively. The percentage of urban population is from the 1990 and 2000 decennial census, respectively. Region dummies and the distance variable are from the 1990 census. The numbers of Kmart and Wal-Mart stores are from the Directory of Discount Stores, and the number of small stores is from the County Business Patterns. See section 4.2 for the definition of the chain effect.
Table 1 (C): Summary Statistics for the Distance Weighted Number of Adjacent Stores
                                                                          1988            1997
Variable                                                                  Mean    Std.    Mean    Std.
Distance Weighted Number of Adjacent Kmart Stores within 50 Miles         0.11    0.08    0.13    0.11
Distance Weighted Number of Adjacent Wal-Mart Stores within 50 Miles      0.10    0.08    0.19    0.19
Number of Counties                                                        2065
Source: the annual reference “Directory of Discount Department Stores” by Chain Store Guide, Business Guides, Inc., New York.
Table 2: Parameter Estimates from Probit (Kmart and Wal-Mart) and Ordered Probit (Small Firms)

Kmart's Profit
Variable              1988               1997
Log Population        1.68* (0.12)       1.75* (0.12)
Log Retail Sales      1.69* (0.16)       1.64* (0.14)
Urban Ratio           1.56* (0.25)       1.20* (0.24)
Midwest               0.43* (0.09)       0.33* (0.10)
Constant              -20.55* (1.36)     -20.59* (1.28)
δ kw                  -0.21* (0.09)      -0.66* (0.11)
δ kk                  -1.38* (0.67)      -1.15† (0.64)
δ ks                  -0.17† (0.09)      0.01 (0.09)
Observation Number    2065               2065
Log Likelihood        -583.64            -567.77

Wal-Mart's Profit
Variable              1988               1997
Log Population        1.27* (0.10)       2.23* (0.11)
Log Retail Sales      1.32* (0.14)       1.58* (0.13)
Urban Ratio           1.27* (0.21)       0.99* (0.21)
Log Distance          -1.63* (0.09)      -1.11* (0.08)
South                 1.08* (0.09)       0.87* (0.09)
Constant              -5.75* (1.11)      -12.88* (1.08)
δ wk                  -0.57* (0.11)      -0.73* (0.11)
δ ww                  2.88* (0.79)       -3.41* (0.63)
δ ws                  -0.20* (0.08)      -0.34* (0.09)
Observation Number    2065               2065
Log Likelihood        -680.25            -669.71

Small Firms' Profit
Variable              1988               1997
Log Population        1.24* (0.05)       1.18* (0.05)
Log Retail Sales      0.76* (0.07)       0.64* (0.06)
Urban Ratio           -1.10* (0.12)      -0.99* (0.12)
South                 0.62* (0.05)       0.78* (0.05)
δ sk                  -0.34* (0.07)      -0.11 (0.07)
δ sw                  -0.32* (0.06)      -0.21* (0.06)
Observation Number    2065               2065
Log Likelihood        -4226              -4035

Note: * denotes significance at the 5% level, and † denotes significance at the 10% level. Standard errors are in parentheses. The cutoff values for the small firms’ regressions are omitted. Midwest and South are regional dummies, with the Great Lakes region, the Plains region, and the Rocky Mountain region grouped as the “Midwest”, and the Southwest region and the Southeast region grouped as the “South”. δ ij, i, j ∈ {k, w, s}, i ≠ j, denotes the competition effect, while δ ii, i ∈ {k, w}, denotes the chain effect. “k” stands for Kmart, “w” stands for Wal-Mart, and “s” stands for small stores.
Table 3: Parameter Estimates from the Full Model

Kmart's Profit
Variable              1988               1997
Log Population        1.49* (0.09)       1.84* (0.13)
Log Retail Sales      2.19* (0.25)       2.01* (0.23)
Urban Ratio           2.07* (0.42)       1.55* (0.29)
Midwest               0.40* (0.12)       0.32* (0.14)
Constant              -24.60* (2.07)     -24.08* (2.07)
δ kw                  -0.48* (0.22)      -0.93* (0.29)
δ kk                  0.63* (0.20)       0.75* (0.36)
δ ks                  -0.07 (0.05)       -0.02 (0.05)

Wal-Mart's Profit
Variable              1988               1997
Log Population        1.54* (0.15)       2.16* (0.15)
Log Retail Sales      1.56* (0.35)       1.85* (0.25)
Urban Ratio           2.19* (0.35)       1.73* (0.40)
Log Distance          -1.31* (0.16)      -1.01* (0.14)
South                 0.94* (0.13)       0.61* (0.11)
Constant              -10.90* (2.98)     -16.37* (2.17)
δ wk                  -1.54* (0.21)      -1.13* (0.30)
δ ww                  1.22* (0.43)       0.89* (0.39)
δ ws                  -0.03 (0.11)       -0.03 (0.12)

Small Firms' Profit
Variable              1988               1997
Log Population        1.65* (0.18)       1.90* (0.26)
Log Retail Sales      1.04* (0.12)       1.17* (0.16)
Urban Ratio           -0.46* (0.21)      -0.78* (0.24)
South                 0.88* (0.14)       1.03* (0.17)
Constant              -10.22* (0.98)     -11.89* (1.56)
δ sk                  -1.20* (0.23)      -1.00* (0.20)
δ sw                  -1.11* (0.16)      -1.03* (0.21)
δ ss                  -2.14* (0.28)      -2.41* (0.35)

ρ                     0.53* (0.11)       0.65* (0.10)
Function Value        62.84              30.80
Observation Number    2065               2065

Note: * denotes significance at the 5% level, and † denotes significance at the 10% level. Standard errors are in parentheses. Midwest and South are regional dummies, with the Great Lakes region, the Plains region, and the Rocky Mountain region grouped as the ‘Midwest’, and the Southwest region and the Southeast region grouped as the ‘South’. δ ij, i, j ∈ {k, w, s}, i ≠ j, denotes the competition effect, while δ ii, i ∈ {k, w}, denotes the chain effect. ‘k’ stands for Kmart, ‘w’ stands for Wal-Mart, and ‘s’ stands for small stores. 1 − ρ² measures the importance of the market-level profit shocks.
Table 4 (A): Parameter Estimates Using Different Equilibria (1988)
Favors Kmart
Favors Wal-Mart
Favors Wal-Mart in South
Kmart's Profit
Log Population
1.49*
1.53*
(0.09)
(0.12)
(0.13)
Log Retail Sales
2.19*
2.16*
2.18*
(0.25)
(0.23)
(0.30)
Urban Ratio
2.07*
2.15*
2.23*
(0.42)
(0.36)
(0.36)
Midwest
0.40*
0.32*
0.27*
(0.12)
(0.10)
(0.09)
Constant
-24.60*
-24.20*
-24.30*
δ kw
(2.07)
(1.83)
(2.51)
-0.48*
-0.51†
-0.46†
δ kk
(0.22)
(0.29)
(0.25)
0.63*
0.71†
0.72*
δ ks
(0.20)
(0.36)
(0.16)
-0.07
-0.13
-0.09
(0.05)
(0.15)
(0.09)
1.54*
1.52*
1.43*
(0.15)
(0.21)
(0.13)
Log Retail Sales
1.56*
1.52*
1.55*
(0.35)
(0.25)
(0.17)
Urban Ratio
2.19*
2.31*
2.45*
(0.35)
(0.34)
(0.38)
Log Distance
-1.31*
-1.29*
-1.34*
(0.16)
(0.17)
(0.12)
South
0.94*
1.03*
1.00*
Wal-Mart's Profit
Log Population
1.46*
(0.13)
(0.13)
(0.21)
Constant
-10.90*
-10.88*
-10.58*
δ wk
(2.98)
(1.69)
(1.46)
-1.54*
-1.58*
-1.51*
δ ww
(0.21)
(0.32)
(0.32)
1.22*
1.32*
1.25*
δ ws
(0.43)
(0.24)
(0.47)
-0.03
-0.03
-0.03
(0.11)
(0.09)
(0.07)
1.65*
1.64*
1.59*
(0.18)
(0.26)
(0.13)
Log Retail Sales
1.04*
1.04*
1.04*
(0.12)
(0.16)
(0.12)
Urban Ratio
-0.46*
-0.53†
-0.46*
(0.21)
(0.32)
(0.19)
South
0.88*
0.85*
0.93*
Small Firms' Profit
Log Population
(0.14)
(0.11)
(0.09)
Constant
-10.22*
-10.21*
-10.18*
δ sk
(0.98)
(1.40)
(1.05)
-1.20*
-1.14*
-1.12*
δ sw
(0.23)
(0.19)
(0.18)
-1.11*
-1.08*
-1.12*
δ ss
(0.16)
(0.15)
(0.13)
-2.14*
-2.14*
-2.10*
ρ
(0.28)
(0.37)
(0.16)
0.53*
0.54*
0.54*
(0.11)
(0.12)
(0.08)
Function Value
Number of Observations
62.84
2065
62.87
2065
71.30
2065
Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence
level. Standard errors are in parentheses. See Table 3 for the explanation of the variables and parameters.
Table 4 (B): Parameter Estimates Using Different Equilibria (1997)
Favors Kmart
Favors Wal-Mart Favors Wal-Mart in South
Kmart's Profit
Log Population
1.84*
1.66*
(0.13)
(0.13)
(0.15)
Log Retail Sales
2.01*
1.91*
1.83*
(0.23)
(0.34)
(0.22)
Urban Ratio
1.55*
1.57*
1.74*
(0.29)
(0.20)
(0.50)
Midwest
0.32*
0.26*
0.22
(0.14)
(0.10)
(0.14)
-24.08*
-22.52*
-22.18*
δ kw
(2.07)
(3.16)
(1.73)
-0.93*
-0.94*
-0.92*
δ kk
(0.29)
(0.23)
(0.29)
0.75*
0.82*
0.74*
δ ks
(0.36)
(0.33)
(0.36)
-0.02
-0.02
-0.01
(0.05)
(0.07)
(0.14)
2.16*
2.05*
2.03*
(0.15)
(0.14)
(0.17)
Log Retail Sales
1.85*
1.72*
1.74*
(0.25)
(0.15)
(0.20)
Urban Ratio
1.73*
1.72*
1.90*
(0.40)
(0.53)
(0.51)
Log Distance
-1.01*
-1.04*
-1.04*
(0.14)
(0.10)
(0.10)
South
0.61*
0.74*
0.55*
Constant
Wal-Mart's Profit
Log Population
1.70*
(0.11)
(0.08)
(0.10)
Constant
-16.37*
-14.80*
-14.87*
δ wk
(2.17)
(1.30)
(1.26)
-1.13*
-1.26*
-1.15*
δ ww
(0.30)
(0.21)
(0.55)
0.89*
0.91*
0.87*
δ ws
(0.39)
(0.23)
(0.27)
-0.03
-0.10
-0.05
(0.12)
(0.12)
(0.10)
1.90*
1.92*
1.95*
(0.26)
(0.15)
(0.17)
Log Retail Sales
1.17*
1.20*
1.15*
(0.16)
(0.21)
(0.12)
Urban Ratio
-0.78*
-0.80*
-0.73†
(0.24)
(0.21)
(0.39)
South
1.03*
1.00*
0.97*
Small Firms' Profit
Log Population
(0.17)
(0.12)
(0.07)
Constant
-11.89*
-12.18*
-11.84*
δ sk
(1.56)
(1.82)
(1.21)
-1.00*
-1.07*
-1.09*
δ sw
(0.20)
(0.18)
(0.30)
-1.03*
-1.03*
-1.04*
δ ss
(0.21)
(0.15)
(0.23)
-2.41*
-2.39*
-2.40*
ρ
(0.35)
(0.20)
(0.17)
0.65*
0.67*
0.63*
(0.10)
(0.08)
(0.14)
Function Value
Number of Observations
30.80
2065
32.53
2065
37.70
2065
Note: * denotes significance at the 5% confidence level, and † denotes significance at the 10% confidence
level. Standard errors are in parentheses. See Table 3 for the explanation of the variables and parameters.
Table 5 (A): Model's Goodness of Fit
                 1988                          1997
Number of:       Sample Mean    Model Mean     Sample Mean    Model Mean
Kmart            0.21           0.21           0.19           0.20
Wal-Mart         0.32           0.32           0.48           0.49
Small Firms      3.86           3.79           3.49           3.43
Note: the model matches the sample mean of Kmart and Wal-Mart in 1988 almost exactly. The average number of small discount stores in each county is 3.86, while the model’s prediction is 3.79. The results are similar for 1997.
Table 5 (B): Correlation between Model Prediction and Sample Observation
Number of:       1988    1997
Kmart            0.66    0.64
Wal-Mart         0.73    0.75
Small Firms      0.63    0.64
Note: the correlation between the predicted and the observed number of Kmart stores is 0.66 in 1988, and 0.64 in 1997. The correlation between the predicted and the observed numbers of Wal-Mart stores (and small stores) is also very high. Overall, the model fits the sample variation fairly well.
Table 6: Model Predicted Profit vs. Accounting Profit
             Model Average                  Average Accounting Profit
             1988    1997    1997/1988      1988 ($mill.)    1997 ($mill.)    1997/1988
Kmart        0.80    0.74    0.92           0.56             0.14*            0.25
Wal-Mart     1.03    1.55    1.51           0.95             1.34             1.41
Source: Kmart’s and Wal-Mart’s SEC 10-K annual reports.
*: Kmart’s accounting profit fluctuated dramatically in the 1990s, due to the financial obligations of the various divested businesses. A better indicator of its store profit is probably the average store sales, which remained stagnant throughout the 90s.
Table 7 (A): Number of Kmart Stores When the Market Size Changes
                               1988                1997
                               Percent    Total    Percent    Total
Base Case                      100.0%     431      100.0%     408
Population Up 10%              112.1%     483      115.7%     472
Retail Sales Up 10%            117.6%     507      117.4%     479
Urban Ratio Up 10%             107.0%     461      106.1%     433
Midwest=0 for All Counties     86.1%      371      87.7%      358
Midwest=1 for All Counties     119.3%     514      115.4%     471
Table 7 (B): Number of Wal-Mart Stores When the Market Size Changes

                                      1988                1997
                                Percent    Total    Percent    Total
Base Case                       100.0%      658     100.0%     1014
Population Up 10%               110.3%      726     108.3%     1098
Retail Sales Up 10%             110.5%      727     107.1%     1086
Urban Ratio Up 10%              105.3%      693     102.5%     1039
Distance Up 10%                  91.6%      603      96.2%      975
South=0 for All Counties         64.1%      422      88.4%      896
South=1 for All Counties        135.1%      889     113.2%     1148
Table 7 (C): Number of Small Firms When the Market Size Changes

                                      1988                1997
                                Percent    Total    Percent    Total
Base Case                       100.0%     7831     100.0%     7090
Population Up 10%               108.7%     8511     109.0%     7727
Retail Sales Up 10%             105.4%     8253     105.4%     7474
Urban Ratio Up 10%               99.3%     7773      98.7%     7000
South=0 for All Counties         78.4%     6136      76.2%     5404
South=1 for All Counties        124.8%     9775     125.0%     8860
Note: for each of these simulation exercises in all three panels, I fix other firms’ profits and only change the
profit of the target firm in accordance with the change in the market size. I re-solve the model to obtain the
equilibrium numbers of firms. For example, in the second row of Table 7 (A), I increase Kmart’s profit
according to a ten percent increase in population while holding Wal-Mart and small firms’ profits constant.
Using this new set of profits, the equilibrium number of Kmart stores is 12.1% higher than in the base case
in 1988.
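The Percent columns in Table 7 express each simulated store total relative to the base-case total. A minimal sketch of that arithmetic, using the 1988 Kmart totals from Table 7 (A); the function name `percent_of_base` is illustrative and does not come from the original text.

```python
def percent_of_base(total, base_total):
    """Express a simulated store count as a percentage of the base-case count."""
    return round(100.0 * total / base_total, 1)


# 1988 Kmart store totals reported in Table 7 (A).
base_case = 431
scenarios = {
    "Population Up 10%": 483,
    "Retail Sales Up 10%": 507,
    "Urban Ratio Up 10%": 461,
}

kmart_1988_percent = {
    name: percent_of_base(total, base_case) for name, total in scenarios.items()
}
```

For the population scenario this gives 483 / 431, or 112.1% of the base case, which is the 12.1% increase mentioned in the note.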
Table 8 (A): Number of Small Firms with Different Market Structure

                                      1988                1997
                                Percent    Total    Percent    Total
No Kmart or Wal-Mart            100.0%    12070     100.0%    10946
Only Kmart in Each Market        54.0%     6519      63.8%     6985
Only Wal-Mart in Each Market     56.7%     6849      63.0%     6898
Both Kmart and Wal-Mart          28.6%     3457      38.4%     4198
Wal-Mart Competes with Kmart     64.9%     7831      64.8%     7090
Wal-Mart Takes Over Kmart        72.9%     8796      72.3%     7918
Table 8 (B): Competition Effect for Kmart and Wal-Mart

                                      1988                1997
                                Percent    Total    Percent    Total
Number of Kmart Stores
  Base Case                     100.0%      431     100.0%      408
  Wm in Each Market              78.0%      336      79.9%      326
  Wm Exits Each Market          111.1%      479     149.5%      610
  Not Compete with Small        108.1%      466     102.7%      419
Number of Wal-Mart Stores
  Base Case                     100.0%      658     100.0%     1014
  Km in Each Market              48.3%      318      71.8%      728
  Km Exits Each Market          128.6%      846     108.6%     1101
  Not Compete with Small        102.6%      675     101.5%     1029
Table 8 (C): Chain Effect for Kmart and Wal-Mart

                                                          Kmart            Wal-Mart
                                                      1988    1997      1988    1997
Percentage of Profit Explained by Chain Effect       14.0%   17.4%     10.2%   12.3%
Reduction in Number of Stores with No Chain Effect      40      46       125     109
Note: for the first four rows in Table 8(A), I fix the number of Kmart and Wal-Mart stores as specified and
solve for the equilibrium number of small stores. For the last two rows in Table 8(A) and all rows (except
for the rows of ‘Base Case’) in Table 8(B), I re-solve the full model using the specified assumptions. ‘Base
Case’ in Table 8(B) is what we observe in the data when Kmart competes with Wal-Mart. Table 8(C)
reports the importance of the chain effect for both Kmart and Wal-Mart. Overall, the benefit from the chain
effect is 10-17% of a chain store’s profit.
Table 9: The Impact of Wal-Mart's Expansion on Small Stores

                                                     1988    1997
Observed Decrease in the Number of Small Stores       748     748
Predicted Decrease from the Full Model                558     383
  Percentage Explained                                75%     51%
Predicted Decrease from Ordered Probit                247     149
  Percentage Explained                                33%     20%
Note: for the full model, the predicted 558 store exits in 1988 are obtained by simulating the change in the
number of small stores using the 1988 coefficients for Kmart’s and the small stores’ profit functions, but
the 1997 coefficients for Wal-Mart’s profit function. The 1997 column uses the 1997 coefficients for
Kmart’s and the small stores’ profit functions, but the 1988 coefficients for Wal-Mart’s profit function. For the
ordered probit model, the predicted store exits are the difference between the expected number of small
stores using Wal-Mart’s 1988 store number and the expected number of small stores using Wal-Mart’s
1997 store number, both of which are calculated using the probit coefficient estimates for the indicated year.
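The "Percentage Explained" rows in Table 9 are the predicted decrease divided by the observed decrease of 748 small stores. A minimal sketch of that arithmetic with the numbers from the table; the variable names are illustrative and not from the original.

```python
# Observed decrease in the number of small stores, from Table 9.
observed_decrease = 748

# Predicted decreases from Table 9, keyed by (model, coefficient year).
predicted_decrease = {
    ("full model", 1988): 558,
    ("full model", 1997): 383,
    ("ordered probit", 1988): 247,
    ("ordered probit", 1997): 149,
}

# Share of the observed decrease that each prediction accounts for, in percent.
percentage_explained = {
    key: round(100.0 * value / observed_decrease)
    for key, value in predicted_decrease.items()
}
```

The full model accounts for 75% and 51% of the observed decrease, against 33% and 20% for the ordered probit.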
Table 10: The Impact of Government Subsidies

                                         Average Number       Changes in the Number of Stores
                                           of Stores             Compared to the Base Case
                                         1988     1997             1988        1997
Base Case
  Kmart                                  0.21     0.20
  Wal-Mart                               0.32     0.49
  Small Firms                            3.79     3.43
Subsidize Kmart's Profit by 10%
  Kmart                                  0.22     0.21             0.01        0.01
  Wal-Mart                               0.31     0.49            -0.01        0.00
  Small Firms                            3.77     3.41            -0.03       -0.02
Subsidize Wal-Mart's Profit by 10%
  Kmart                                  0.21     0.19             0.00       -0.01
  Wal-Mart                               0.34     0.52             0.02        0.03
  Small Firms                            3.74     3.39            -0.05       -0.04
Subsidize Small Firms' Profit by 100%
  Kmart                                  0.21     0.20             0.00        0.00
  Wal-Mart                               0.32     0.49             0.00        0.00
  Small Firms                            4.61     4.23             0.81        0.80
Note: for each of these counter-factual exercises, I incorporate the change in the subsidized firm’s profit
and re-solve the model to obtain the equilibrium numbers of stores.
Table 11: Estimated Coefficients (Standard Errors)

                          Semi-Parametric    Parametric
                               Model           Model
b                               1                1
                              (NA)             (NA)
a1                            -4.94**          -5.58**
                              (0.12)           (0.55)
a2                            -4.21**          -4.70**
                              (0.11)           (0.47)
Median of Fixed Cost          -8.26            -9.93**
                              (NA)             -1.53
Log Likelihood               -527.6           -548.7
Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.
Table 12: Entry Threshold

                                                       Parametric    Parametric
                                                         Model         Model
Monopoly Threshold (Per Firm Market Size)                2.76**        2.73**
                                                         (0.11)        (0.08)
Duopoly Threshold (Per Firm Market Size)                 3.41**        3.61**
                                                         (0.16)        (0.22)
Total Duopoly Market Size                                6.81**        7.21**
                                                         (0.23)        (0.43)
Change in Market Size from Monopoly to Duopoly           4.05          4.48
Threshold Ratio (S_2 / S_1)                              1.23          1.32
Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.
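The last two rows of Table 12 follow from the threshold estimates above them. The reported entries are consistent with the change in market size being the total duopoly market size minus the monopoly threshold, and with S_2 / S_1 being the per-firm duopoly threshold divided by the monopoly threshold. A minimal sketch of that arithmetic, using the rounded point estimates from the two columns of the table; the dictionary keys and the function name `derived_rows` are illustrative.

```python
# Rounded point estimates from Table 12, listed in the table's column order.
table_12 = [
    {"monopoly": 2.76, "duopoly_per_firm": 3.41, "total_duopoly": 6.81},
    {"monopoly": 2.73, "duopoly_per_firm": 3.61, "total_duopoly": 7.21},
]


def derived_rows(column):
    """Return (change in market size, threshold ratio S_2 / S_1) for one column."""
    change = round(column["total_duopoly"] - column["monopoly"], 2)
    ratio = column["duopoly_per_firm"] / column["monopoly"]
    return change, ratio


results = [derived_rows(column) for column in table_12]
```

Because the inputs are already rounded to two decimals, the recomputed ratio can differ from the reported value in the last digit. A ratio above one means that the second entrant requires a larger per-firm market than the first.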
Table 13: Comparison between Parametric and Semi-Parametric Estimates with the Truncated Sample

                                                     Semi-Parametric    Parametric
                                                          Model           Model
b                                                          1                1
                                                         (NA)             (NA)
a1                                                       -5.24**          -5.32**
                                                         (0.13)           (0.52)
a2                                                       -4.39**          -4.57**
                                                         (0.11)           (0.44)
Median of Fixed Cost                                     -8.65            -9.07**
                                                         (NA)             (1.45)
Log Likelihood                                          -493.3           -511.9
Monopoly Threshold (Per Firm Market Size)                 2.63**           2.69**
                                                         (0.10)           (0.08)
Duopoly Threshold (Per Firm Market Size)                  3.54**           3.46**
                                                         (0.16)           (0.24)
Change in Market Size from Monopoly to Duopoly            4.45             4.23
Threshold Ratio (S_2 / S_1)                               1.35             1.29

Note: ** denotes significance at the 1% level, and * denotes significance at the 5% level.