Applying spatial techniques: What can we learn about theory?
Henry G. Overman
LSE, CEP & CEPR
Lecture for the 19th Advance Summer School in Regional Science

Publishing papers in spatial economics
• Types of paper:
  – Methodological
  – Applied
• For applied papers the key question is: do we learn anything new about
  – Theory
  – Policy

Some casual empiricism
• Based on a spatial econ workshop (Kiel ’05)
• 60 papers at the conference
  – 12 methodological
  – 48 empirical
    • 10 growth in EU regions
• Theoretical and empirical issues
  – Econometric theory and empirical work
  – Economic theory and empirical work
• What do we learn from spatial econometric papers about theories of economic growth and location?

Some less casual empiricism
• Abreu, de Groot and Florax, ‘Space and growth’
  – 63 papers between 1995 and 2004
  – Data
    • 68% EU
    • 11% country
    • 8% US/Canada
  – Relationship to theory
    • 63% standard spatial
    • 11% derive explicit models from theory

Lessons from less casual empiricism
1. Spatial econometrics literature should think about underlying reasons for spatial dependence
2. Non-spatial literature should worry about spatial dependence of residuals
3. Spatial economics literature unduly concentrated on methodological issues
HGO: What new things do we learn about growth?

Space as nuisance
• “For better or worse, spatial correlation is often ignored in applied work because correcting the problem can be difficult” Wooldridge, p. 7
• Key assumption
  – We know the relationship we want to estimate
• Conclusion
  – We should use the spatial econometric toolbox to correct residuals where appropriate

An analogy
• The returns to education
  – Wage = f(ability, education)
  – Ability unobserved but correlated with education
  – Fixed/random effects estimation to get the coefficient on education
• Slightly unfair comparison because dealing with spatial correlation is harder
  – FE/RE maintains the i.i.d. assumption
  – Need different asymptotic theory etc.

The challenge
• The problem
  – Way too many papers focus on space as nuisance
  – Standard spatial techniques to correct the coefficient estimates (63%)
  – Important to understand these techniques but …
  – … revised coefficient estimates often do not tell us anything new!
• How can we use spatial data or spatial techniques to learn something new?

The empirics of location
• Four types of papers on the location of economic activity (or people):
  – Descriptive papers
  – Empirical models
  – Class of model approaches
  – Structural approaches

Descriptive work
• Good descriptive work should
  – Give us a feel for the data
  – Give us a feel for patterns in the data …
  – … without getting too hung up on the details
  – Hopefully tell us something about theory …
  – … without claiming to tell us lots about theory
  – Give us a feel for how we might best analyse the data

Location patterns
• For concreteness consider something specific: the spatial location of economic activity.
• First important point: define your terms
  – Are places specialised in particular activities?
  – Are activities localised in particular places?
• Second important point: plot the data (GIS)
  – Cross-check from statistical results to data plot
Source: Duranton and Overman, Review of Economic Studies (2005)

First generation – location measures
• Typical way to proceed is to calculate some summary statistic for each industry/location
  – Specialisation: Is the production structure of a particular region similar to or different from that of other regions? How different is it?
  – Localisation: Is a particular activity distributed broadly in line with overall economic activity, or is it concentrated in a few regions? How concentrated is it?
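
The two first-generation questions above can be made concrete with a small sketch. It assumes a toy employment table (the region and industry names and all counts are hypothetical) and uses two common summary statistics that the slides do not name: a Krugman-style specialisation index per region and a dissimilarity-type localisation index per industry. It is one possible illustration, not the measure used in any of the papers cited.

```python
# Hypothetical employment counts: outer keys are regions, inner keys industries.
employment = {
    "North": {"cutlery": 90, "cars": 10, "retail": 100},
    "South": {"cutlery": 10, "cars": 90, "retail": 100},
}


def specialisation_index(emp):
    """Per region: sum of absolute gaps between the region's industry
    shares and the economy-wide industry shares (0 = same structure)."""
    industries = sorted({i for row in emp.values() for i in row})
    total = sum(sum(row.values()) for row in emp.values())
    overall = {
        i: sum(row.get(i, 0) for row in emp.values()) / total for i in industries
    }
    out = {}
    for region, row in emp.items():
        region_total = sum(row.values())
        out[region] = sum(
            abs(row.get(i, 0) / region_total - overall[i]) for i in industries
        )
    return out


def localisation_index(emp):
    """Per industry: half the sum of absolute gaps between the industry's
    regional shares and each region's share of all employment
    (0 = spread like overall activity, values near 1 = highly localised)."""
    industries = sorted({i for row in emp.values() for i in row})
    total = sum(sum(row.values()) for row in emp.values())
    region_share = {r: sum(row.values()) / total for r, row in emp.items()}
    out = {}
    for i in industries:
        industry_total = sum(row.get(i, 0) for row in emp.values())
        out[i] = 0.5 * sum(
            abs(row.get(i, 0) / industry_total - region_share[r])
            for r, row in emp.items()
        )
    return out


print(specialisation_index(employment))
print(localisation_index(employment))
```

Both indices are single numbers per region or per industry, which is exactly the kind of one-dimensional summary the following slides go on to criticise.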

A typical paper
• Variety of measures to capture spatial location patterns
• Discussion of why some measures are better than others
• But no systematic attempt to outline criteria by which to assess these methods
• Arguments usually statistical and one-dimensional

Measuring localisation: 5 key properties
1. Comparable across industries
  • (e.g. can Lorenz curves be compared?)
2. Conditioning on overall agglomeration
3. Spatial vs. industrial concentration
  • (The lumpiness problem)
  • Ellison and Glaeser (JPE, 1997) dartboard approach; Maurel and Sedillot (RSUE, 1999); Devereux et al (RSUE, 2005)

Measuring localisation
4. Scale and aggregation
  • Dots on a map to units in a box
  • Problem I – scale of localisation
    • Cutlery in Sheffield versus motor cars in Thames valley
  • Problem II – size of units
    • California 150 x Rhode Island
  • Problem III – MAUP
    • Spurious correlations across aggregated variables
  • Problem IV – Downward bias
    • Treat boxes separately
    • Border problems
5. Significance
  • Null hypothesis of randomness

Spatial point pattern techniques solve these problems …
1. Select relevant establishments
2. Density of bilateral distances between all pairs of establishments (4)
3. Construct counterfactuals
  • Same number of establishments (3)
  • Randomly allocate across existing sites (2)
  • Local and global confidence intervals (5)

… and we learn something
• Excess localisation not as frequent as in previous studies
  – Significance versus border bias
• Highly skewed
  – Some sectors very localised
  – Others weakly
  – Many not significantly
• Scale of localisation
  – Urban/metropolitan
  – Regional for 3-digit industries
• Broad sector effects
  – 4-digit industries behave similarly within 3-digit sectors
• Size of localised establishments
  – Big or small depending on industry

1st generation: Concentration regressions
• Get measures of industry characteristics and run a “concentration regression”
• CONC(s) = a + b TRCOSTS(s) + c IRS(s) + d LINKAGES(s) + e RESOURCE(s) + f HIGH_TECH(s)

Conceptual limitations
• Theory tells us nothing about the relationship between indices and industry characteristics when there are more than two regions
• Given availability of shares, why throw away lots of information by calculating only one summary statistic?

Using industry shares
• Harrigan (1997): classical trade theory + simple translog revenue function + Hicks-neutral technology
\[ s_g^c = \sum_{k=1}^{G} a_{gk} \ln \theta_k^c + \sum_{f=1}^{F} r_{gf} \ln v_f^c \]
  – a and r vary across industries, technologies and factors

Location theory
• Ellison and Glaeser (1999): sequential plant choice + expected profits depend on location-specific factors and spillovers
• Expected shares a non-linear function of:
F F f 1 f 1 c c r p y p gf f f g f
  – Interaction of industry/country characteristics
  – No theoretical justification for using intensities

Industry intensities
• Midelfart et al (2002): CRS + CES preferences + differentiated goods + Armington + transport costs + # of industries proportional to country size
F F f 1 f 1 c c r p y p gf f f g f
• Another interaction model

Some comments
• Number of firms in industry s, region r as a function of interaction between industry and regional characteristics
• E.g.
first expression interacts vertical linkages intensity (mu), sectoral labour intensity (phi) with regional wages
• Problems
  – Hardly any data available
  – No firm movement (short run)
  – End up estimating sectoral transport variable

An improvement over first generation?
• A much clearer link from theory to the empirical specification that is estimated
• Spatial interactions modelled explicitly
• But there could still be spatial correlation in the residuals
  – Get out the spatial econometrics toolbox?
  – 2nd order issue relative to first order issue of identification

What do we learn about theory?
• Harrigan is a straightforward neo-classical trade model
• E&G is a very stylised geography model with black box assumptions to get to functional form
• Midelfart et al. has some geography effects but no IRS
• Gaigne et al. have a functional form that is very far from what they estimate

An alternative strategy
• Take one particular class of models and test whether the data are consistent with the model
• Even better: nest one class of models within another class of models and test whether the data allow us to reject the implied restrictions

Testing agglomeration
• Agglomeration has two senses:
  – A process by which things come together
  – A pattern in which economic activity is spatially concentrated
• Two paths approach
  – Test mechanisms
  – Test predictions
• We will consider NEG models

Defining and delimiting NEG
• NEG (here) = theories that follow the approach put forward by Krugman’s 1991 JPE article
• Five key ingredients
  1. IRS internal to the firm; no tech externalities
  2. Imperfect competition (Dixit-Stiglitz)
  3. Trade costs (iceberg)
  4. Endogenous firm locations
  5. Endogenous location of demand
    • Mobile workers
    • I/O linkages

Antecedents & Novelties
• Ingredients 1-4 all appeared in the New Trade Theory (NTT) literature: home market effects in Krugman (1980)
• Key innovation of NEG relative to NTT is assumption 5
• With all 5 assumptions, initial symmetry can be broken and agglomeration can form through circular causation

Testing NEG predictions
• Leamer and Levinsohn (1995): “Estimate, don’t test”
• Empiricists need to take theory seriously, but not too seriously
• False confirmation – housing prices very expensive in areas with concentrated activity
• False rejection – Krugman’s prediction of complete concentration

NEG predictions
1. Access advantages raise factor prices
2. Access advantages induce factor inflows
3. Home market / magnification effects
  1. Lower trade costs (t.c.) increase the home market effect (HME)
  2. More product differentiation (IRS? – same parameter) increases HME
4. Trade induces agglomeration
  1. Increases for high IRS, high differentiation
  2. t.c. inverted U?
5. Catastrophe
  1. Small change in t.c., large change in location
  2. Temporary shocks can have permanent effects

Strategy
• Take these predictions to the data
• Empirical specifications that are “close” to the underlying theory
• Allows us to assess whether these mechanisms and predictions are consistent with data (not prove that these are the mechanisms)

Empirical NEG
• Papers that model spatial linkages explicitly are consistent with the “class of models” approach
  – Redding and Venables (2004): income across countries
  – Davis and Weinstein (2004): testing for home market effect
  – Davis and Weinstein (2005): catastrophe for location of Japanese industry

Lessons from NEG work
• Methods should connect closely to theory but not be reliant upon features introduced for tractability or clarity rather than realism
• Better to have a limited number of parameters to distinguish models?
  – e.g.
beta/sigma convergence
• Much more work needed on observational equivalence
  – 1st order issue
• A more accurate estimate of (say) a beta coefficient?
• Discriminating between alternative models of differences across space?

Structural estimation
• Estimation of a specification directly derived from the theoretical model without any further simplifying/functional form assumptions
• Clear identification of which variables are endogenous
• Interpretations easier?
• Computation of the model parameters: possible simulation of the model on real data

Lessons from structural models?
• Endogeneity
  – Structural econometric specification identifies precisely which variables are endogenous
  – In simpler situations (e.g. neighbourhood effects) may get through on intuition
• Which variables should be on RHS/LHS
  – Working with structural theory suggests these are more complicated than expected
• Structural identification of parameters

The downside
• Do we really believe that the world looks like a NEG model plus some random shocks?
• Two issues here
  – Is the world NEG?
  – What are the shocks?

Estimation versus testing
• Estimation: assume the NEG model is valid and estimate its parameters under this assumption
• Need to be confident that the model is true before estimating it
  – A crazy model (D-S) might not be so bad an approximation
  – Models place restrictions on parameters
  – Reality checks with parameter values
• Testing requires nested structural models

An alternative approach
• Structural estimation works well in simple situations where we can observe agents’ actions and where the real world is close to the model (e.g. some IO situations)
• A bounds approach can work well in situations which are very complicated, but where different classes of models consistently place restrictions on the relationships between variables (Sutton)

Lessons
1. Mainstream economics increasingly recognising importance of space
2. Huge scope for geo-referenced data to increase our understanding of socioeconomic processes
3. Spatial econometrics providing a rapidly expanding toolbox for dealing with some problems encountered with spatial data

Lessons (cont)
4. Too much emphasis on application of methods [cf. heteroscedastic robust errors]
5. Too little attention to the role of theory and the importance of identification
  a. Why include a spatial lag?
  b. If answer to (a) is
    • “robustness for particular parameter estimate” see (4)
    • “spatial interactions” then identification is everything
6. Class of models approaches to identification may be better than structural
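
As a sketch of the residual check implied by the lessons (worry about spatial dependence in residuals before deciding why a spatial lag is needed), the block below computes Moran's I with a permutation p-value. Moran's I is not named in the slides; it is used here as one common diagnostic. The residuals and the contiguity matrix are hypothetical, and the function names are illustrative only.

```python
import random


def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix
    (weights[i][j] > 0 when units i and j are neighbours, zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_p_value(values, weights, draws=999, seed=0):
    """Two-sided pseudo p-value: share of random reshuffles of the values
    across locations whose |I| is at least as large as the observed |I|."""
    rng = random.Random(seed)
    observed = abs(morans_i(values, weights))
    shuffled = list(values)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights)) >= observed:
            extreme += 1
    return (extreme + 1) / (draws + 1)


# Hypothetical example: four regions on a line, neighbours share a border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
residuals = [1.0, 1.0, -1.0, -1.0]  # made-up regression residuals

print(morans_i(residuals, chain))
print(permutation_p_value(residuals, chain))
```

A significant statistic only says that the residuals are spatially correlated. It does not say whether the cause is an omitted variable, a nuisance process, or a genuine spatial interaction, which is the identification question raised in the lessons.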