Introduction to Spatial Regression Analysis
ICPSR Summer Program 2010

Paul R. Voss (1) and Katherine J. Curtis (2)
(1) University of North Carolina at Chapel Hill
(2) University of Wisconsin-Madison

(1) Odum Institute for Research in Social Science
    Manning Hall, CB #3355
    University of North Carolina at Chapel Hill
    Chapel Hill, NC 2759
    paul_voss@unc.edu

(2) Department of Community & Environmental Sociology
    1450 Linden Drive
    University of Wisconsin-Madison
    Madison WI 53706
    kcurtis@ssc.wisc.edu

Objectives

The goal of this five-day course is to provide an overview of applied spatial regression analysis (spatial econometrics) that will enable participants to effectively incorporate these tools into their own empirical research. The course will introduce the broader field of spatial data analysis and the range of issues that generally must be dealt with when analyzing georeferenced data on a lattice. Census-type data are among the most commonly encountered data that conform to this description, although the course acknowledges the wider range of data appropriate for spatial regression analysis. In general, this is NOT a course where significant attention can be given to spatial analyses involving so-called geostatistical data or point pattern data. It also is not a GIS course.

Course Materials and Organization

The course will convene each day from 9:00 a.m. until approximately 4:30 p.m., except for the last day (Friday), when the course likely will wind down earlier to enable participants who must meet Friday evening flights to do so. The course is organized into a format that includes morning lectures (theoretical and conceptual underpinnings) and afternoon computing lab sessions (hands-on applications). We will attempt to set aside the last half hour or more of each day for group discussion of the topics introduced that day.

Course materials are organized such that the readings supplement and provide greater detail on the topics covered in the classroom. Many more topics are introduced in the course lectures (assisted by PowerPoint) than can reasonably be absorbed in five intensive days, so the readings provide a point of return for review and deeper understanding of the topics covered, as well as a source of references for further reading. The lab exercises are guided by written, step-by-step tutorial instructions so that they can be repeated (and more fully absorbed) at a later time. Recommended readings and lab exercises are available on-line.

Software

The course will use primarily the spatial analysis package GeoDa™ and the open source programming application, R.

OUTLINE OF COURSE

Day 1 Morning:
1. Welcome and introductions
2. Review of objectives and overview of plan for the week
3. Goal and overview for Day 1
4. Motivational example
5. Understanding spatial data
   a. Overview of spatial data and spatial data analysis
   b. Spatial analysis vs. spatial data analysis
   c. Classes of problems in spatial data analysis
   d. Spatial vs. non-spatial data analysis
6. Why spatial is special
   a. Characteristics of spatial data
   b. Problems often caused by spatial data
7. Review OLS estimation
   a. Assumptions of the classical linear regression model
   b. Consequences of violation of the assumptions
8. The importance of exploratory data analysis (EDA) and exploratory spatial data analysis (ESDA)
9. Orientation to afternoon lab: Introduction to shapefiles (attribute data and digital map married up) and elementary ESDA using GeoDa™; orientation to univariate EDA using R

Day 1 Afternoon:
1. Introduction to “our” shapefile; your task: Begin thinking now about hypotheses, models, and analyses
2. Reading a shapefile into GeoDa™
3. Simple mapping operations using GeoDa™
4. Univariate EDA using R

Day 1 Readings:
1. [Older reading but nice perspective] Anselin, Luc. 1989. “What is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis.” NCGIA Technical Paper 89-4.
2. [Recent & highly accessible reading] Ward, Michael D., & Kristian Skrede Gleditsch. 2008. Spatial Regression Models. Quantitative Applications in the Social Sciences, No. 155. Thousand Oaks, CA: Sage. Chapter 1. [May be found as a downloadable PDF at: http://www.duke.edu/web/methods/ ]
3. [Together with the following reading, a nice motivational example] Loftin, Colin, & Sally K. Ward. 1983. “A Spatial Autocorrelation Model of the Effects of Population Density on Fertility.” American Sociological Review 48(1):121-128.
4. Galle, Omer R., Walter R. Gove, & J. Miller McPherson. 1972. “Population Density and Pathology: What Are the Relations for Man?” Science (new series) 176:23-30.

Day 2 Morning:
1. Q&A from readings or 1st day lecture or lab
2. Goal for Day 2: ESDA & spatial autocorrelation
3. Data exploration:
   a. Distribution aspects of dependent variable
   b. QQ plots
   c. Linearity between dependent variable and independent variables
   d. Variable transformations; Box-Cox transformations
4. Global spatial autocorrelation & weights matrices
   a. What it is
   b. How it arises; spatial processes
      i. Spatial heterogeneity
      ii. Spatial dependence
   c. Consequences of spatial autocorrelation
   d. How to measure it
      i. Weights matrices
      ii. Global measures of spatial autocorrelation
         a. Global Moran statistic
         b. Global Geary statistic
         c. Problems with global measures
5. Local measures of spatial autocorrelation
   a. Local Moran
   b. Moran scatterplot
   c. LISA mapping
6. Orientation to afternoon lab: ESDA and spatial autocorrelation with GeoDa™ and similar work in R

Day 2 Afternoon:
1. Introduction to ESDA
2. ESDA with GeoDa™ and R
3. Creating and comparing weights matrices
4. Global spatial autocorrelation in GeoDa™ and R
5. Local spatial autocorrelation in GeoDa™ and R

Day 2 Readings:
1. [Introduction to a key diagnostic tool in spatial data analysis] Anselin, Luc. 1996. “The Moran Scatterplot as an ESDA Tool to Assess Local Instability in Spatial Association.” Pp. 111-125 in Fischer, Manfred, Henk J. Scholten, & David Unwin (eds.) Spatial Analytical Perspectives on GIS: GISDATA 4 (London: Taylor & Francis).
2. [The foundational reading for LISA statistics] Anselin, Luc. 1995. “Local Indicators of Spatial Association – LISA.” Geographical Analysis 27(2):93-115.
3. [Nice example of ESDA] Messner, Steven F., et al. 1999. “The Spatial Patterning of County Homicide Rates: An Application of Exploratory Spatial Data Analysis.” Journal of Quantitative Criminology 15(4):423-450.

Day 3 Morning:
1. Q&A from 2nd day lecture, lab, or readings
2. Goal for Day 3: Understanding spatial regression
3. Spatial processes
   a. Spatial heterogeneity
      i. Define
      ii. Causes of
      iii. Problems arising from
      iv. Correcting for spatial heterogeneity
      v. GWR preview
   b. Spatial dependence
      i. Define
      ii. Causes of
         a. True contagion vs. false contagion
      iii. Expressions of
         a. Lagged dependent variable
         b. Unresolved heterogeneity; error lag
      iv. Corrections for
         a. Spatial lag model
         b. Spatial error model
         c. What these models mean/imply
         d. Relationship between the two models
         e. Higher order models
4. Common modeling strategy
   a. Specify and estimate OLS model
   b. Analyze the regression diagnostics
   c. Specify spatial model
   d. MLE fundamentals
5. Understanding the regression diagnostics provided by GeoDa™
   a. Information criteria statistics
   b. Normality of errors
   c. Heteroskedasticity
   d. Lagrange multiplier statistics
6. Orientation to afternoon lab: OLS & spatial regression modeling with GeoDa™ and R

Day 3 Afternoon:
1. OLS regression in GeoDa™ and R
2. GeoDa™ diagnostics and implications of these
3. Spatial regression models in GeoDa™ and R

Day 3 Readings:
1. [A very strong foundational reading] Anselin, Luc, & Anil Bera. 1998. “Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics.” Chapter 7 (pp. 237-289) in Aman Ullah and David Giles (eds.) Handbook of Applied Economic Statistics (New York: Marcel Dekker).
2. [Overview of spatial econometric regression models] Anselin, Luc. 2002. “Under the Hood: Issues in the Specification and Interpretation of Spatial Regression Models.” Agricultural Economics 27(3):247-267.
3. [Good reading for understanding of the dataset used in this course] Voss, Paul R., David D. Long, Roger B. Hammer, & Samantha Friedman. 2006. “County Child Poverty Rates in the U.S.: A Spatial Regression Approach.” Population Research and Policy Review 25:369-391.
4. [Terrific example of grounding a spatial data analysis in theory] Baller, Robert D., & Kelly K. Richardson. 2002. “Social Integration, Imitation, and the Geographic Patterning of Suicide.” American Sociological Review 67(6):873-888.
5. [Wonderful overview of spatial error and spatial lag regression models] Ward, Michael D., & Kristian Skrede Gleditsch. 2008. Spatial Regression Models. Quantitative Applications in the Social Sciences, No. 155. Thousand Oaks, CA: Sage. Chapters 2 & 3.

Day 4 Morning:
1. Q&A from 3rd day lecture, lab, or readings
2. Goal for Day 4: Understanding spatial heterogeneity in relationships
3. Brief digression to examine spatial smoothing using Empirical Bayes approach
4. Introduction to GWR
   a. Theory and concept
   b. Local multivariate methods for spatial data analysis
      i. Spatial expansion model
      ii. Spatial adaptive filtering
      iii. Multilevel modeling
      iv. Random coefficient models
5. GWR analytical steps
6. What it means
   a. Spatial regime analysis
   b. GWR as a specification tool (interaction effects)
   c. GWR as a tool for policy analysis and decision making
7. Cautions with GWR
8. Orientation to afternoon lab: GWR

Day 4 Afternoon:
1. GWR in R
2. Spatial regime analysis in R

Day 4 Readings:
1. [Understanding GWR] Fotheringham, A. Stewart, & Chris Brunsdon. 1999. “Local Forms of Spatial Analysis.” Geographical Analysis 31(4):340-358.
2. [GWR has its critics] Wheeler, David, & Michael Tiefelsdorf. 2005. “Multicollinearity and Correlation among Local Regression Coefficients in Geographically Weighted Regression.” Journal of Geographical Systems 7:161-187.
3. [Excellent example of regime analysis] O’Loughlin, John, Colin Flint, & Luc Anselin. 1994. “The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930.” Annals of the Association of American Geographers 84(3):351-380.
4. [How to for R] Anselin, Luc. 2007. “Spatial Regimes.” Pp. 107-115 in Spatial Regression Analysis in R: A Workbook. (CSISS)

Day 5 Morning:
1. Q&A from 4th day lecture, lab, or readings
2. Goal for the day: Dealing with spatial autocorrelation using multilevel analysis
3. Defining “place” and “space”
4. Conceptual motivations for multilevel modeling
5. Statistical motivations
6. Basic two-level multilevel model (continuous outcome)
7. Generalized multilevel model (binary outcome)
8. Orientation to afternoon: Multilevel analysis in R

Day 5 Afternoon:
1. Multilevel analysis in R
2. Final questions & consultations regarding student data analyses and plans

Day 5 Readings:
1. [Classical, early approach to context] Entwisle, Barbara, John B. Casterline, & Hussein A.A. Sayed. 1989. “Villages as Contexts for Contraceptive Behavior in Rural Egypt.” American Sociological Review 54(6):1019-1034.
2. [More contemporary example] Baumer, Eric P., Steven F. Messner, & Richard Rosenfeld. 2003. “Explaining Spatial Variation in Support for Capital Punishment: A Multilevel Analysis.” American Journal of Sociology 108(4):844-875.
3. [Summary/Introduction] Teachman, Jay, & Kyle Crowder. 2002. “Multilevel Models in Family Research: Some Conceptual and Methodological Issues.” Journal of Marriage and Family 64(2):280-294.
4. [Bringing in space] Goldstein, H., J. Rasbash, W. Browne, et al. 2000. “Multilevel Models in the Study of Dynamic Household Structures.” European Journal of Population 16:373–88.
5. Chaix, Basile, Juan Merlo, S.V. Subramanian, John Lynch, & Pierre Chauvin. 2005. “Comparison of a Spatial Perspective with the Multilevel Analytical Approach in Neighborhood Studies: The Case of Mental and Behavioral Disorders due to Psychoactive Substance Use in Malmö, Sweden, 2001.” American Journal of Epidemiology 162(2):171-182.
6. [How to for R] Bliese, Paul. 2009. “Multilevel Modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.” http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf
7. [Primer] Luke, Douglas. 2004. Multilevel Modeling. Thousand Oaks, CA: Sage.