Tom Kiley
Intro to GIS
Final Paper
December 14, 2012

PROJECT DESCRIPTION

The goal of this project was to visualize population change in Springfield, Massachusetts, and surrounding communities over the last 60 years. Springfield’s peak decennial Census population (in 1960) was approximately 174,000. Since then, the city has lost more than 21,000 inhabitants; its population in 2010 was approximately 153,000. I wanted to see whether population was simply shifting outward from Springfield and, if so, how densely it was settled in the surrounding towns.

There are several methods for mapping population density that are intended to overcome the limitations of simply dividing population by the area of the corresponding city or town. (The limitations of that approach become apparent when you look at Chicopee, Springfield’s northern neighbor, which is relatively densely settled on its west side but has an airport, and therefore few inhabitants, on its east side.) One method is dasymetric mapping, which ESRI describes this way:

A technique in which attribute data that is organized by a large or arbitrary area unit is more accurately distributed within that unit by the overlay of geographic boundaries that exclude, restrict, or confine the attribute in question. For example, a population attribute organized by census tract might be more accurately distributed by the overlay of water bodies, vacant land, and other land-use boundaries within which it is reasonable to infer that people do not live.[1]

I had briefly considered attempting a dasymetric analysis, but such an analysis is not necessary if population data are available at a fine-grained geographic level, such as Census blocks. Census block data are available from both MassGIS and NHGIS for 1990, 2000, and 2010, so as my thinking evolved, I decided to limit my analysis to those years, using blocks, along with land use data from MassGIS to show residential development by lot size.
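The ESRI definition can be made concrete with a small example. The sketch below is a minimal binary dasymetric method: each subzone of a tract is treated as either habitable or not, and the tract’s population is spread over the habitable subzones in proportion to their area. It is an illustration only, not part of the analysis in this paper; the tract, subzone names, and acreages are hypothetical values loosely modeled on the Chicopee example, and published dasymetric methods often use several weighted land-use classes rather than a simple habitable/uninhabitable split.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Subzone:
    name: str
    area_acres: float
    habitable: bool  # False for water bodies, airports, vacant land, etc.


def binary_dasymetric(tract_population: float, subzones: list[Subzone]) -> dict[str, float]:
    """Spread one tract's population over its habitable subzones by area."""
    habitable_area = sum(z.area_acres for z in subzones if z.habitable)
    if habitable_area == 0:
        raise ValueError("tract has no habitable area to receive population")
    # Uninhabitable subzones receive no one; habitable ones share by area.
    return {
        z.name: tract_population * z.area_acres / habitable_area if z.habitable else 0.0
        for z in subzones
    }


# Hypothetical tract: two settled subzones and an airport (all values invented).
tract = [
    Subzone("west residential", 600.0, True),
    Subzone("center residential", 200.0, True),
    Subzone("east airport", 1200.0, False),
]
estimates = binary_dasymetric(3000, tract)
for name, pop in estimates.items():
    print(f"{name}: {pop:.0f} people")
```

Dividing 3,000 people by all 2,000 acres gives 1.5 persons per acre, while the 800 habitable acres in this example carry 3.75 persons per acre, which is the kind of distortion the Chicopee airport illustrates.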
DATA SOURCES

To begin, I downloaded select population and housing (100%) data for 1990, 2000, and 2010 from NHGIS at the Census block level, selecting Massachusetts as my geographic extent. The screenshot on the next page shows my selections. This request produces three CSV files, one for each year, along with metadata in a text document (.txt). The first column of each CSV file is the join column, labeled GISJOIN. I later downloaded Census block boundary files from NHGIS and joined them to the NHGIS tables in ArcMap.

[1] http://support.esri.com/en/knowledgebase/GISDictionary/term/dasymetric%20mapping

I later discovered that MassGIS has already produced shapefiles for Census blocks from 1990,[2] 2000,[3] and 2010,[4] each with an attribute table that already includes population. Because these shapefiles took up less space, were easier to work with, and were already projected in the Massachusetts State Plane system, I ultimately used them instead of the NHGIS data. For a user working on an analysis of another state, however, the NHGIS data are extremely useful.

After trying various options, I concluded that a dot density map offered the best representation of the way population is distributed in Springfield and surrounding communities. I ultimately set each dot equal to 20 people.

For land use, MassGIS has a datalayer (LANDUSE_POLY) that classifies land use into either 21 or 37 categories based on aerial photography.[5] Because the 21-category classification system was available for both 1971 (the earliest year of data) and 1999 (the latest year), I relied exclusively on it. Of those 21 categories, four are residential:

10  Multi-family
11  Smaller than 1/4 acre lots
12  1/4–1/2 acre lots
13  Larger than 1/2 acre lots

I began by creating a new field in the LANDUSE_POLY layer, labeled “Change.” Any polygon whose land use in 1999 had changed since 1971 was assigned -1; polygons with no change were assigned 0. (The expression was simply CHANGE = “LU21_1999” <> “LU21_1971”.)
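The -1/0 coding is consistent with VBScript’s convention, used by ArcMap’s VB Script parser in the Field Calculator, in which a true comparison converts to -1 and a false one to 0. The sketch below mimics that flag in plain Python and then applies the residential selection (codes 10–13 in 1999) described next. Dictionaries stand in for LANDUSE_POLY records; the assumption that LU21_1971 and LU21_1999 hold integer codes, and the non-residential codes 3 and 15, are placeholders for illustration only.

```python
# Residential codes in the 21-category MassGIS scheme, as listed above.
RESIDENTIAL_CODES = {
    10: "Multi-family",
    11: "Smaller than 1/4 acre lots",
    12: "1/4-1/2 acre lots",
    13: "Larger than 1/2 acre lots",
}


def change_flag(record: dict) -> int:
    """Mimic CHANGE = LU21_1999 <> LU21_1971 under VBScript rules
    (True stored as -1, False as 0)."""
    return -1 if record["LU21_1999"] != record["LU21_1971"] else 0


def new_residential(records: list) -> list:
    """Keep polygons that changed since 1971 and are residential in 1999."""
    selected = []
    for rec in records:
        rec = {**rec, "CHANGE": change_flag(rec)}  # copy; don't mutate input
        if rec["CHANGE"] == -1 and rec["LU21_1999"] in RESIDENTIAL_CODES:
            selected.append(rec)
    return selected


# Hypothetical polygons; codes 3 and 15 are non-residential placeholders.
polygons = [
    {"id": 1, "LU21_1971": 3, "LU21_1999": 13},   # changed, now large-lot residential
    {"id": 2, "LU21_1971": 11, "LU21_1999": 11},  # unchanged residential
    {"id": 3, "LU21_1971": 3, "LU21_1999": 15},   # changed, not residential
    {"id": 4, "LU21_1971": 12, "LU21_1999": 10},  # changed between residential classes
]

for rec in new_residential(polygons):
    print(rec["id"], RESIDENTIAL_CODES[rec["LU21_1999"]])
```

Polygon 4, which moved from one residential class to another (12 to 10), is also selected: under this selection, any change in code that ends in a residential class counts as new residential development.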
I then selected all polygons with a Change value of -1 and exported them to a new datalayer. From this datalayer, I displayed only residential land uses (codes 10–13). The result was a map showing new residential development between 1971 and 1999 by lot size.

For the base map, I used the MassGIS datalayer TOWNS_POLY, which I clipped to include only Springfield and the eight cities and towns that are contiguous with it (Agawam, Chicopee, East Longmeadow, Hampden, Longmeadow, Ludlow, West Springfield, and Wilbraham). I also used MassGIS datalayers for hydrography and roads. I then used the clipped TOWNS_POLY to clip all of my other datalayers to the study area.

Finally, I felt it was critically important to show population change from 1960 to 2010, not just from 1990 to 2010, because Springfield’s most dramatic population loss occurred in the first 20 years of that period and moderated thereafter. NHGIS provides Census tract boundary files back to 1960, and Social Explorer provides historical population data by Census tract. I had serious concerns about these data, however, which I discuss below.

[2] 1990 metadata: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/census1990.html
[3] 2000 metadata: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/census2000.html
[4] 2010 metadata: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/census2010.html
[5] Land use metadata are available from http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/lus.html.
As a result, I simply downloaded decennial Census reports (in PDF format) from the Census website and created an Excel spreadsheet in which I entered the population counts for Springfield and the eight surrounding cities and towns for 1960, 1970, and 2010 (1980, 1990, and 2000 population counts are already included in TOWNS_POLY). Ultimately, I needed only 1960 and 2010, since I looked only at change over that entire period. Either way, it was a small file to create manually.

I then joined my Excel file to my clipped towns datalayer and created a new field, Acres, in order to show population density by town (persons per acre) and population change by town between 1960 and 2010. Finally, I laid my dot density map (described above) over the population change map. The result, I hope, is a clear representation of the fact that the fastest growth since 1960 has occurred in the lowest-density places.

DIFFICULTIES ENCOUNTERED

I had significant concerns about the data during this assignment, but the truth is that I do not know whether there were issues with data quality or issues with my manipulation of the data (or both), particularly when joining the data with shapefiles. My concerns are as follows.

As I’ve mentioned, both NHGIS and MassGIS make Census block data available for 1990, 2000, and 2010, but the data appear to me to differ substantially depending on which source you use. I don’t know how to explain this. I noticed that the NHGIS data are projected in a different system (Albers Conical) than the MassGIS data, but while I understand that this could affect how the data appear on a map, I don’t see why it would change the values in an attribute table. After I downloaded the NHGIS data, I deleted all the rows in the three CSV files that were not from Hampden County (where Springfield is located), in order to make the files more manageable. But I also don’t see why this should have affected the data for Hampden County.
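The Acres field and the persons-per-acre and 1960–2010 change figures described under Data Sources reduce to simple arithmetic. The sketch below assumes polygon areas in square meters, the linear unit of the MassGIS State Plane layers, and converts them using the international acre (4,046.8564224 square meters). Springfield’s rounded 1960 and 2010 populations come from the Project Description; the area value is a hypothetical placeholder, not Springfield’s measured area.

```python
SQ_METERS_PER_ACRE = 4046.8564224  # international acre


def acres_from_sq_meters(area_m2: float) -> float:
    """Convert a polygon area in square meters to acres."""
    return area_m2 / SQ_METERS_PER_ACRE


def persons_per_acre(population: int, acres: float) -> float:
    """Simple (non-dasymetric) density for one town polygon."""
    return population / acres


def percent_change(pop_start: int, pop_end: int) -> float:
    """Percent change between two Census counts."""
    return (pop_end - pop_start) / pop_start * 100


# Rounded counts from the text; the area below is hypothetical.
spring_1960, spring_2010 = 174_000, 153_000
area_m2 = 80_000_000  # hypothetical placeholder, not a measured value

acres = acres_from_sq_meters(area_m2)
print(f"{acres:,.0f} acres")
print(f"{persons_per_acre(spring_2010, acres):.2f} persons/acre in 2010")
print(f"{percent_change(spring_1960, spring_2010):.1f}% change, 1960-2010")
```

With these rounded counts, Springfield’s 1960–2010 change works out to about -12.1%, regardless of the area value used.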
I don’t have an explanation for the discrepancy, but as I’ve said, I ultimately chose to use the MassGIS files for my analysis. Another benefit of doing this is that I trust MassGIS far more than I trust myself!

I also had concerns about the data that resulted when I joined NHGIS Census tract boundary files with Social Explorer population data from 1960 and 1980. When I did this, a number of tracts in the most densely settled part of Springfield appeared to have no population whatsoever (zero people) in both years, though 1980 was worse. I suspect the population fields for those tracts were simply missing values. In the map of 1980 Census tracts on the next page, the lightest shade indicates zero population, which is obviously not possible.

NOTE ABOUT PROJECTION

Because all of the data I used ultimately came from MassGIS (with the exception of data I created myself in Excel and then joined to MassGIS shapefiles), all of the data were projected in NAD 1983 State Plane Massachusetts Mainland FIPS 2001.

CONCLUDING THOUGHTS

My biggest frustration was that Census block-level data were not available before 1990, but the most dramatic change occurred between 1960 and 1980. I could have remedied this somewhat with Census tract data, but within my timeframe I was unable to overcome serious concerns about the data I found from NHGIS and Social Explorer. With more time, I would investigate the tract data issues and also explore applications based on some of the readings I summarize below.

Nevertheless, I think the land use maps and dot density map make a persuasive case that Springfield has suffered greatly from suburbanization, and that virtually all residential growth over the last several generations has occurred in the low-density areas surrounding Springfield. This has serious implications.
As Peter Calthorpe and others have pointed out, carbon emissions associated with households in low-density, single-use districts are much higher than those associated with households in high-density, mixed-use districts. If we are going to address climate change satisfactorily, then we must alter the residential land use patterns that have predominated across the country over the last 60 to 75 years.

ANNOTATED CITATIONS

1. Eidlin, Eric. “What Density Doesn’t Tell Us About Sprawl.” Access, No. 37, Fall 2010, pp. 2–9.

2. Ewing, Reid, Rolf Pendall, and Don Chen. “Measuring Sprawl and its Impact, Volume 1.” Smart Growth America report, date unknown. Available for download from http://www.smartgrowthamerica.org/documents/MeasuringSprawlTechnical.pdf.

I made a decision fairly early on not to do a sprawl analysis. According to Ewing, Pendall, and Chen, a sprawl analysis should go beyond density to also look at “neighborhood mix of homes, jobs, and services,” “strength of centers, such as business districts,” and “accessibility via the street network.” I think a full sprawl analysis would be an important way to extend my analysis of population density and distribution.

In his paper, however, Eric Eidlin does describe methods for dealing with limitations in assessing population density. These methods include measuring variation in the distribution of the population of a city or town using a Gini coefficient (a measure from zero to one, where one indicates that 100% of the population resides in a single Census tract). Standard measures of density indicate that the Los Angeles area is denser than the New York City area, because the LA suburbs are denser than the New York suburbs. But the Gini coefficient for LA is 0.65, while it is 0.77 for New York, reflecting the high density of New York City itself. Given my small analysis area and the availability of block-level data, however, calculating Gini coefficients is not necessary here.

3. Holt, James, C.P. Lo, and Thomas Hodler. “Dasymetric estimation of population density and areal interpolation of census data.” Cartography and Geographic Information Science, Vol. 31, No. 2, 2004, pp. 103–121.

Using the Atlanta metropolitan area as an example, this analysis demonstrates how dasymetric estimation of population density can be used to overcome the limitations of choropleth estimation, namely the Modifiable Areal Unit Problem (MAUP) that we discussed in class.

4. Wu, Shuo-sheng, Xiaomin Qiu, and Le Wang. “Population Estimation Methods in GIS and Remote Sensing: A Review.” GIScience and Remote Sensing, Vol. 42, No. 1, 2005, pp. 58–74.

This paper assesses two broad sets of methods for mapping population. The first set, areal interpolation, applies when Census data are available. Here, “source zone” data (where the source zone might be Census tracts, for example) are disaggregated to “finer-scale” raster data and then re-aggregated to “target zones,” that is, the spatial units the researcher wishes to analyze. One of my biggest disappointments was that I ran out of time before I was able to figure out how to rasterize my data successfully. This is something I’d like to return to. The second set of methods the authors discuss comprises statistical methods used to model population when Census data are not available (say, in years between decennial Censuses). I’m not sure these would have had applications for my particular analysis.

5. Wu, Shuo-sheng, Le Wang, and Xiaomin Qiu. “Incorporating GIS Building Data and Census Housing Statistics for Sub-Block-Level Population Estimation.” The Professional Geographer, Vol. 60, No. 1, 2008, pp. 121–135.

This article notes that the sizes of Census blocks, “the most fine-grained census population data available to the public,” can vary greatly. This is absolutely true; in my analysis, Census blocks in low-density areas were sometimes not much smaller than Census tracts.
The authors describe a model for using GIS building data and Census housing data to estimate populations at sub-block levels. Their formula looks like this:

Population = BuildingVolume / HousingUnitSpace * OccupancyRate * HouseholdSize

I would be interested in gaining better familiarity with this approach and applying it in future analyses.
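The formula can be read as a chain of three steps: building volume divided by the space per housing unit gives a number of housing units, the occupancy rate converts that to occupied units, and household size converts occupied units to people. The sketch below is one reading of the formula as summarized above; how Wu, Wang, and Qiu derive HousingUnitSpace and the other inputs from Census housing statistics is not reproduced here, and the input values and units (cubic meters, a 0–1 rate, persons per household) are hypothetical.

```python
def estimate_population(building_volume: float,
                        housing_unit_space: float,
                        occupancy_rate: float,
                        household_size: float) -> float:
    """Population = BuildingVolume / HousingUnitSpace * OccupancyRate * HouseholdSize."""
    housing_units = building_volume / housing_unit_space  # estimated housing units
    occupied_units = housing_units * occupancy_rate       # units that are occupied
    return occupied_units * household_size                # persons in occupied units


# Hypothetical inputs for one residential building.
estimate = estimate_population(building_volume=3600.0,
                               housing_unit_space=300.0,
                               occupancy_rate=0.9,
                               household_size=2.5)
print(round(estimate, 1))
```

With these inputs, 3,600 cubic meters at 300 cubic meters per unit gives 12 units, 10.8 occupied units, and an estimate of about 27 people; summing such estimates over buildings yields sub-block totals.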