
Pergamon. Comput., Environ. and Urban Systems, Vol. 21, No. 3/4, pp. 227-244, 1997. © 1998 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0198-9715/98 $19.00 + 0.00. PII: S0198-9715(97)01005-3

MODELING POPULATION DENSITY WITH NIGHT-TIME SATELLITE IMAGERY AND GIS

Paul Sutton¹
Department of Geography, University of California at Santa Barbara, Santa Barbara, California 93106, U.S.A.

ABSTRACT. Night-time satellite imagery, as provided by the Defense Meteorological Satellite Program's Operational Linescan System (DMSP OLS), shows promise as a proxy measurement of urban extent. Earlier efforts have shown that the areas of contiguous saturated pixels in DMSP OLS images correlate strongly with the total population living in those areas. This paper describes efforts at modeling the population density within the urban areas identified within the continental United States. These efforts build upon the previous efforts of Clark, Berry, Nordbeck, Tobler and others to describe the variation of population density within cities. The method described herein differs from the aforementioned theories because it operates from the edges of the urban areas rather than attempting to identify a "center" of the urban cluster. By measuring distance from the edge rather than distance from the center, this method allows for the "multiple nuclei" of urban clustering that have clearly manifested as a result of the conurbation of urban centers within the U.S.A. This paper describes the methods used to allocate population to one, two, three, five, and ten square kilometer pixels for the continental U.S.A. Several urban population decay functions are applied and evaluated. In addition, an empirical urban population density decay function is derived for all the urban clusters defined by the DMSP imagery.

INTRODUCTION

The growth in human population has profound social, economic, and environmental consequences.
The urbanization of the world's land surface is one facet of population growth that may have deleterious economic and environmental impacts. Identifying and anticipating the location, size, and growth rate of the urbanized areas of the planet promises to be an important component of understanding, adapting to, and possibly mitigating many facets of global change. While urbanized land cover presently accounts for only about 6% of the world's land area, this proportion is growing (Meyer, 1996). Furthermore, in terms of human population, urban areas are growing in both total and percentage terms and now account for over 50% of the human beings on the planet. Increased global urbanization may significantly alter local to regional climates while contributing directly to increased emissions of greenhouse gases, land degradation, and the loss of productive cropland (Berry, 1990).

Accurate data on the spatial distribution of human population are critical in addressing the causes and impacts of global environmental change. High quality data on the size and distribution of the human population over the whole planet are needed in order to monitor, understand, respond to, and perhaps even prevent environmental degradation, loss of biodiversity, and resource depletion in many parts of the world. Fine resolution population density data have been used to measure changes in land-use patterns within the United States (Hitt, 1994). An increased understanding of how the density and distribution of human population vary within and between urban areas could improve the ability to monitor and predict the distribution of the human population. This work utilizes the U.S.A. as a regional study in order to inform further studies in other parts of the world.

¹ E-mail: sutton@geog.ucsb.edu
Identifying relationships between the total population of urbanized areas and the size of urbanized areas using information such as gross domestic product (GDP) per capita, most common means of transportation, distribution of wealth characteristics, and energy consumption per capita could prove useful in predicting future rates of urbanization. This work may be effectively incorporated into dynamic historical models of human transformation processes such as urbanization described by Acevedo, Foresman, and Buchanan (1996).

Two aspects of urbanization are very important with respect to land-use and land-cover change. First, most urbanization is unidirectional, culminating in the virtually permanent conversion of productive agricultural land into settlements (Meyer, 1996). Urbanization as a threat to agricultural lands has been demonstrated using DMSP imagery in the U.S.A. (Imhoff & Lawrence, 1997). Second, most of the urban population of the planet lives in under-developed nations that are striving for increased economic development and technological progress. Historically, the cities of the developed countries have gone through a process of counter-urbanization in which the size of urban areas has spread dramatically (Keyfitz, 1990).

China is notable as a country in which the parallel demographic, economic, and environmental developments it is undergoing are likely to have substantial global repercussions. China represents about one-fifth of the world's population and its economy has been growing at staggering rates for over ten years. China's cities have higher population densities than those of the U.S.A.; however, if growing wealth changes the nature of urbanization in China, we may see unprecedented rates of urbanization that consume agricultural lands that are vital to world food supplies (Brown, 1995).
If the cities of the developing world continue to follow historical patterns of urban growth, we are certain to see unprecedented changes in land use/land cover that will undoubtedly have profound impacts on cropland and the environment.

This paper describes a means of modeling human population density within urban clusters as defined by the Defense Meteorological Satellite Program's Operational Linescan System (DMSP OLS) night-time satellite imagery. The ground truth for measuring the accuracy of these models was a 1 km² resolution grid of the population density derived from the 1990 U.S. decennial census. The urban density models used were parameterized with only two pieces of information: (a) the size and shape of the urban clusters defined by the DMSP OLS imagery and (b) a log-log relationship between the size of an urban cluster and its total population that is described in another paper (Sutton, Roberts, Elvidge, & Meij, 1997).

METHODS

The datasets used to perform these analyses were continental coverages of the U.S.A. at a resolution of one square kilometer. The DMSP OLS data is a stable city lights image produced by Elvidge, Baugh, Kihn, and Davis (1995). This image used 232 orbits of the DMSP OLS data archives. Multiple orbits were needed in order to obtain a composite image of stable light sources. Clouds, lightning, and the phase of the moon, among other things, can cause significant variation in DMSP OLS imagery from one orbit to the next. The second dataset was an image or "grid" of the continental U.S. population density derived from 1990 census data at the block group administrative boundary level (see Figure 1). The grid was derived from the block group layer of the Bureau of the Census' Topologically Integrated Geographic Encoding and Referencing (TIGER) system, and proportionally allocated to 1 km² cells.
This dataset was developed by the Socioeconomic Data and Applications Center (SEDAC) at CIESIN (Meij, 1995). It should be noted that this dataset was used only as a reference for the models developed from the DMSP OLS dataset.

Previous analyses showed that the saturated DMSP pixels capture over 80% of the population on only 10% of the land; however, these efforts did not reallocate population density back to the one square kilometer pixel (Sutton et al., 1997). The earlier effort verified the DMSP OLS data as a feasible means of determining the areal extent of urban areas. It showed that the areas of these urban clusters are strongly related to their corresponding total populations. This relationship is very similar to those described in the works of Clark, Stewart, and Welch (Clark, 1951; Stewart & Warntz, 1958; Welch, 1980). The methods described here explain various means of disaggregating the total city population back to the one square kilometer pixels that make up the city cluster.

The methods adopted take advantage of the spatial nature of the DMSP data. The DMSP image used was simply a binary image with saturated values and dark values (see insets of Figure 2). The saturated pixels of the DMSP OLS image were grouped into urban clusters based on their adjacency to other saturated pixels (Figure 2). This resulted in over 5,000 distinct clusters. Each saturated pixel in the DMSP OLS image was classed with both a number uniquely identifying the cluster to which it belonged and a number representing the distance of that pixel to the edge of that cluster. The distance is defined as the shortest distance from the pixel in question to a non-saturated or dark pixel. One additional manipulation was incorporated to account for coastal and border cities. This was done to account for the fact that the densest parts of cities such as Chicago, Los Angeles, etc. are often right on the coast.
Consequently, the pixels of the ocean, lakes, and Mexican and Canadian borders were treated as if they were saturated pixels. This resulted in coastal and border cities having the highest distances to the edge at the center of their coastal or border contacts (Figure 3 shows the distribution of distance values and the data structure for three fictitious clusters of different shape but equal area, including a coastline cluster).

FIGURE 1. Grid of population density of the Continental United States derived from 1990 U.S. census block groups. [Map with two inset charts: a histogram of cell values by population density (persons/km²), in which the higher values have frequencies too low to show at the plotted scale, and a cumulative chart showing the proportion of the population of the Continental United States that lives at or below the corresponding population density.]

The following is a description of how the grid with cluster identification information is combined with the distance to the edge of cluster grid. The data structure of grids in Arc/INFO includes a table called the value attribute table (VAT), which contains two items: a VALUE and a COUNT. The value represents the value assigned to the pixel or cell. The count is the number of pixels that have that particular value. In order to produce a grid that incorporated both the distance to edge information and cluster identification, the following manipulation was performed. First, the grid which had unique values for each of the clusters (which had a VAT whose COUNT represented the area of each urban cluster in square kilometers) was multiplied by a thousand.
This would result in cluster number 1 becoming cluster number 1,000. The distance to edge grid was multiplied by ten and truncated to integers. This resulted in each pixel having a value that represented its distance to dark continental U.S. land in tenths of kilometers. At this point, however, a pixel on the edge of the New York cluster is the same as a pixel on the edge of any other urban cluster, including the Los Angeles, Phoenix, and Tucumcari clusters. The values of all these pixels are less than 100 km; consequently, adding the cluster ID grid to the distance to edge grid results in a grid in which each pixel has a value whose last three digits (0 to 999) represent the pixel's distance to the edge of its cluster, and whose digits from the thousands place up uniquely identify the cluster in which that pixel occurs (e.g., the pixels on the edge of cluster number 1 have values of 1010, 10 being the smallest distance value). The VAT that Grid produces conveniently supplies the number of occurrences of each unique value. Thus, the VAT for this grid has values which contain both the distance to edge information and count information that indicates the number of pixels that are a distance "x" from the edge of "Cluster Y" (Figure 3).

FIGURE 2. DMSP OLS night-time satellite image showing saturated pixels in nominal colors identifying adjacent pixels which form urban clusters. [Nominally classified urban clusters with insets of the Los Angeles and Baltimore-Washington corridor shown in the binary state of the DMSP image. Figures 7 and 8 show the models applied to these binary insets. A scatter plot against Ln(Cluster Area) shows the regression, whose points are defined by the clusters in the image. The regression is weighted by the total population of the clusters. Without weighting, a small town in Nevada would count as much as the New York City cluster. The regression parameters are: Ln(Cluster population) = 3.353 + 1.359*Ln(Cluster Area). It is worth noting that the L.A. cluster had the largest absolute error of all the clusters in this study.]

FIGURE 3. Three sample urban clusters of equal area with VAT structure. [Maps of sample urban clusters; values in cells represent distances to edge. Each cluster has a total area of 29 pixels.]

Cluster #11 (fairly round in shape)
VALUE    COUNT   Distance to edge
11010    16      1.0 km
11014    4       1.4 km
11020    4       2.0 km
11022    4       2.2 km
11032    1       3.2 km

Cluster #309
VALUE    COUNT   Distance to edge
309010   19      1.0 km
309014   6       1.4 km
309020   2       2.0 km
309022   2       2.2 km

Cluster #711 (coastline cluster bordering ocean)
VALUE    COUNT   Distance to edge
711010   11      1.0 km
711014   4       1.4 km
711020   4       2.0 km
711022   3       2.2 km
711028   2       2.8 km
711030   1       3.0 km
711032   2       3.2 km
711036   1       3.6 km
711040   1       4.0 km

The question then becomes: How does one apply the traditional models based on a circular city to the irregularly shaped clusters identified from the DMSP night-time satellite imagery? The next stage of this analysis involves the manipulation of the VAT for the grid just described. The manipulation involves adding new columns to the table which are used to model the population density within each cluster according to any integrable function describing the decay of population density from an urban center. However, this model turns the distance function around and uses distance from the edge of an urban center instead.
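The grid preparation described above (grouping saturated pixels into clusters, measuring distance to the edge, and packing both into one cell value summarized by a VAT) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Arc/INFO procedure itself: the function names are hypothetical, 8-neighbor adjacency is an assumption (the text does not state the adjacency rule), distance is computed by brute force between cell centers, and distances are rounded to six decimals before truncation only to suppress floating-point noise. The treatment of water and border cells as saturated is not included.

```python
import math
from collections import Counter, deque

def label_clusters(lit):
    """Give each group of adjacent saturated cells a unique ID (8-neighbor rule assumed)."""
    rows, cols = len(lit), len(lit[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if lit[r][c] and labels[r][c] == 0:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one cluster
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and lit[ny][nx] and labels[ny][nx] == 0):
                                labels[ny][nx] = next_id
                                queue.append((ny, nx))
    return labels

def distance_to_edge(lit):
    """Distance in km (1 km cells) from each saturated cell to the nearest dark cell."""
    dark = [(r, c) for r, row in enumerate(lit) for c, v in enumerate(row) if not v]
    return [[min(math.hypot(r - dr, c - dc) for dr, dc in dark) if v else 0.0
             for c, v in enumerate(row)]
            for r, row in enumerate(lit)]

def build_vat(labels, dist):
    """Composite value = cluster ID * 1000 + distance in tenths of km; return (VALUE, COUNT) rows."""
    values = [labels[r][c] * 1000 + int(round(dist[r][c] * 10, 6))
              for r in range(len(labels)) for c in range(len(labels[0]))
              if labels[r][c]]
    return sorted(Counter(values).items())

def cluster_areas(vat):
    """Sum COUNT by cluster ID (VALUE // 1000); with 1 km cells this is area in km2."""
    areas = Counter()
    for value, count in vat:
        areas[value // 1000] += count
    return dict(areas)

# A 3 x 3 saturated block surrounded by dark cells.
lit = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
vat = build_vat(label_clusters(lit), distance_to_edge(lit))
```

Dividing a VALUE by 1,000 and truncating recovers the cluster ID, and the remainder is the distance in tenths of kilometers, so a single sorted table carries everything the later steps need.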
Nonetheless, the decay functions used will be the traditional functions proposed to describe population density as a function of distance from the urban center. This was done via manipulation of the VALUE and COUNT items of the VAT for this grid. A simple program was written that calculated the following: (a) cluster ID, (b) cluster area, (c) radius of equivalent circle, (d) total population of cluster, (e) upper limit of integration for equivalent circle at distance "D", (f) lower limit of integration for equivalent circle at distance "D", and (g) population density estimate. Most of these values can be found for the clusters shown in Figure 3 in the table at the bottom of Figure 5. This table has population density estimates for all of the urban decay functions described in Figure 4.

Cluster ID was obtained simply by dividing by 1,000 and truncating. Cluster area was obtained by summing the count values by cluster ID. The radius of the equivalent circle was obtained by solving the equation $\text{cluster area} = \pi R^2$ for $R$. Total population of the cluster was obtained by using the formula from previous work describing the linear relationship between the natural log of the population of urban areas and the natural log of the area of those clusters: $\text{total population of cluster} = \exp[3.353 + 1.359 \ln(\text{cluster area})]$ (Sutton et al., 1997). The R-square for the weighted linear regression of this log-log relationship is 0.97.

The limits of integration warrant further explanation. It is important to know that the VAT file is sorted in ascending order. The limits of integration are calculated in the following manner. The urban population density functions all begin at a distance of 0 and end at a distance of 1, with the highest or central population densities at 0 (Figure 4 describes the urban decay functions used). The limits of integration are merely the radii of circles nested inside a circle of radius 1.
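Steps (c) through (g) can be sketched as follows. This is a minimal illustration under stated assumptions, not the original program: the function names are hypothetical, the input is a list of (distance, count) rows for one cluster sorted by ascending distance, pixels are 1 km² so the pixel count equals the area, and each decay function is assumed to be rotated about the vertical axis and normalized so that its integral over the unit circle is 1. Under that assumption the cumulative population share inside radius $r$ is $r^2$ for the uniform function and $(1 - e^{-r}(1 + r)) / (1 - 2/e)$ for the exponential function $e^{-r}$.

```python
import math

# Regression coefficients for ln(population) on ln(area), as quoted in the text.
INTERCEPT, SLOPE = 3.353, 1.359

def equivalent_radius(area_km2):
    """Radius of the circle whose area equals the cluster area."""
    return math.sqrt(area_km2 / math.pi)

def cluster_population(area_km2):
    """Total cluster population predicted by the log-log regression."""
    return math.exp(INTERCEPT + SLOPE * math.log(area_km2))

def integration_limits(vat_rows):
    """Nested unit-circle radii for each distance class.

    vat_rows: [(distance_km, pixel_count), ...] in ascending distance,
    so the edge pixels come first and receive the outermost ring.
    Returns [(distance_km, pixel_count, lower, upper), ...].
    """
    total = sum(count for _, count in vat_rows)
    remaining, upper, rows = total, 1.0, []
    for distance, count in vat_rows:
        remaining -= count
        lower = math.sqrt(remaining / total)  # radius enclosing the deeper pixels
        rows.append((distance, count, lower, upper))
        upper = lower
    return rows

def uniform_share(r):
    """Share of population inside radius r for a constant density."""
    return r * r

def exponential_share(r):
    """Share inside radius r for density exp(-r), normalized on the unit circle (assumed)."""
    return (1.0 - math.exp(-r) * (1.0 + r)) / (1.0 - 2.0 / math.e)

def allocate(vat_rows, share=uniform_share):
    """Return [(distance_km, persons per pixel), ...] for one cluster."""
    area = sum(count for _, count in vat_rows)
    population = cluster_population(area)
    return [(distance, population * (share(upper) - share(lower)) / count)
            for distance, count, lower, upper in integration_limits(vat_rows)]

# A cluster of 29 pixels: 19 at 1.0 km, 6 at 1.4 km, 2 at 2.0 km, 2 at 2.2 km.
rows = [(1.0, 19), (1.4, 6), (2.0, 2), (2.2, 2)]
densities = allocate(rows, exponential_share)
```

For a 29 km² cluster, `equivalent_radius(29)` is about 3.04 km and `cluster_population(29)` is about 2,777 persons. Passing a different cumulative share function to `allocate` swaps the decay model without touching the limits of integration.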
An example is provided both here and in more graphic detail in Figures 4 and 5. Suppose a cluster has 100 one square kilometer pixels in it, and 35 of these pixels are on the outer edge of the cluster with a distance value of 1 km from the edge. The upper limit of integration for these pixels would be 1.0 and the lower limit of integration would be the radius of a circle that contains 65% of the area of the unit circle (65% comes from the 100 total pixels minus the 35 in question). The unit circle has an area of $\pi$; thus, by solving the equation $0.65\pi = \pi R^2$ for $R$, the lower limit of integration is obtained, i.e., 0.806. These radii are solved cumulatively within each cluster ID, resulting in steadily decreasing limits of integration as distance from the edge increases to its maximum, at which point the last lower limit of integration is 0. These limits of integration are then used in the definite integral associated with the various urban population density decay functions shown in Figure 4. The value of this definite integral is the proportion of the total population that lives in the pixels which are that distance from the edge of the cluster. This fraction is multiplied by the estimated total population of the cluster and divided by the number of pixels at that distance to produce an estimate of the population density of those pixels. It may be important to note here that all the pixels at a distance "x" from the edge of a particular cluster will be assigned the same population density value. The integrals solved are the urban population density decay functions rotated about the y axis. For a graphic description of this method of estimating population density see Figure 5.

FIGURE 4. Graphic and algebraic representations of urban population density decay functions. [For each function the figure gives its name, the 2-D curve of the function on the 0 to 1 interval, the algebraic expression of the 2-D curve, the definite integral (from x1 to x2) of the function rotated about the Z-axis, and a 3-D representation of the rotated function. Functions shown: Uniform (Y = 1), Linear/Conic (Y = 1 - x), Parabolic, Exponential (Y = e^(-x)), and Gaussian.]

Several different functions were used to model the population density within each urban cluster (Figure 4). The simplest was a uniform model which simply assigned each pixel the same average population density on a per cluster basis. The second was a simple linear function that decreased from 1 to 0 over the range 0 to 1. The third was a parabolic function that decreased from 1 to 0 over the same 0 to 1 range.

FIGURE 5. The three urban clusters of Figure 3 with an equivalent circle and table with population density estimates. [The three urban clusters are shown with their proportional equivalent circles (darkness of shading proportional to population density) and a table denoting the various estimates of population density within them based on the five population density decay functions described in Figure 4. Each cluster has an area of 29 km², which is modeled as a circle with a radius of 3.038 km. The estimated population for each of the clusters is 2,777 persons: Total Population = Exp[3.353 + 1.359*ln(Area)], where Area = 29 km².]

Radii calculations for Cluster #11:
R1 = 3.038 x 0.1857 = 0.564
R2 = 3.038 x 0.4152 = 1.261
R3 = 3.038 x 0.5571 = 1.692
R4 = 3.038 x 0.6695 = 2.034
R5 = 3.038 x 1.000 = 3.038

Population densities (persons/km²) allocated to regions of Cluster #309 using the exponential function:
Area "A": D = 1.0 km @ 81.1
Area "B": D = 1.4 km @ 111.5
Area "C": D = 2.0 km @ 131.9
Area "D": D = 2.2 km @ 152.8

The fourth was an exponential decay which ranges from 1 to exp(-1) as distance varies from 0 to 1.
The fifth was the standard Gaussian distribution, for which the limits of integration were bumped up to 3 to include the first three standard deviations. An appropriate multiplicative constant was used on each of the definite integrals to ensure that they all integrated to unity over the limits of integration: 0 to 2π and 0 to 1.

In addition to applying these theoretical models to estimate the population density inside these urban clusters, it was also possible to determine empirical population density decay functions from the actual population density data. The grid or image that contained the distance to edge information in all the saturated pixels was overlaid on the actual population density grid. An average population density was calculated for each distance value. In addition, a standard deviation was determined for each distance value. Plots of distance to edge of cluster vs. average population density and standard deviation of population density can be found in Figure 6. The left-most point in both of these plots is based on calculating the average population density and the standard deviation of the population density for all of the pixels that are on the edge of any cluster. As the distance to the edge of the cluster increases, the sample size (i.e., the number of pixels used to calculate these values) drops significantly. As one moves to the right in these plots, to greater distances, one is moving into the hearts of these urban clusters. The first plot shows that population density does indeed increase with distance to the edge of the urban cluster. In fact, close inspection of the point pattern suggests a Gaussian-like curve for distances from 0 to about 15 km. Nonetheless, a simple linear fit on this curve produces an R-square of 0.87. This linear fit was used as an empirical model for the population density. It will be compared to the theoretical models in the Results section.

FIGURE 6. Two graphs showing the average population density of cluster pixels and standard deviation of population density as a function of distance from edge of cluster. [Left panel: Average Pop Density vs. Distance to Edge of Cluster, R² = 0.87, with the fitted line Population Density = 25.5 + 75.0 x (Distance to Edge of Cluster). Right panel: Std. Dev. of Avg. Pop Den vs. Distance to Edge of Cluster, R² = 0.79. Horizontal axes: distance from edge of cluster (1/10 km). The standard deviation is almost directly proportional to the mean as a function of distance. This is true when all the clusters are considered at once and when they are classified by total area as in Figure 9.]

Clearly the plot looks as if there are some problems with heteroscedasticity. Some of this appearance is undoubtedly due to the declining sample size as distance increases; however, it is still likely that there is heteroscedasticity. The second plot is the standard deviation of the population density for the pixels with increasing distance to the edge of these clusters. It is interesting to note that the standard deviation increases with distance from the edge and is directly proportional to the mean for which it is measured. This does not bode well for the models described here.
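The empirical decay curve described above (an average and a standard deviation of population density for each distance value, followed by a linear fit) can be sketched as below. The function names are hypothetical, and two choices are assumptions because the text does not specify them: the population form of the standard deviation is used, and the line is fitted by unweighted ordinary least squares to the per-distance averages.

```python
from collections import defaultdict
from statistics import mean, pstdev

def empirical_decay(distances, densities):
    """Group pixels by distance to edge.

    Returns [(distance, mean density, std of density, pixel count), ...]
    sorted by distance.
    """
    groups = defaultdict(list)
    for d, p in zip(distances, densities):
        groups[d].append(p)
    return [(d, mean(v), pstdev(v), len(v)) for d, v in sorted(groups.items())]

def linear_fit(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Toy data: six pixels at three distances (km) with made-up densities.
pixel_distance = [1, 1, 2, 2, 3, 3]
pixel_density = [90, 110, 160, 190, 250, 250]
curve = empirical_decay(pixel_distance, pixel_density)
intercept, slope = linear_fit([row[0] for row in curve], [row[1] for row in curve])
```

Keeping the pixel count in each row makes it easy to drop distance classes with very small samples before fitting.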
The proportionality between the standard deviation and the mean indicates that any model that assigns a uniform estimate of population density to all pixels at a given distance from the edge of a cluster will be unable to capture the inherent variability in population density at constant distances to the edge of urban clusters. In other words, all the pixels on the edge of an urban cluster have a standard deviation of population density that is almost as large as their average population density. It should also be noted that values for distances greater than 35 km were available but were not included in these plots. They were based on extremely small sample sizes because they were pixels deep in the heart of only the largest urban clusters. One possible means of reducing the increasing variance of the average population density would be to first classify these urban clusters based on either their absolute area or total population and then produce these kinds of curves for clusters of approximately the same size.

RESULTS

The theoretical models for estimating population density were applied to every urban cluster of the continental U.S.A. with each of the urban decay functions described in Figure 4 at spatial resolutions of 1, 2, 3, 5, and 10 square kilometer pixels. Table 1 lists all the cross-correlations between the model and the actual population density. The correlations were obtained by comparing and cross-correlating only those pixels in the urban clusters. (The figures would be a little higher if the dark or "source" areas were used with low or zero estimates of population density.) It is worth noting that the actual urban population is about 194 million whereas the model predicts 214 million. This total bias of 20 million across the nation is a result of using the regression parameters from the log transformation of the population and area of the clusters. There are 591,351 saturated pixels in the DMSP OLS city lights image, which have an average corresponding population density of 321 persons/km².
All of the models have an overall average population density estimate of about 362 persons/km². This kind of bias could easily be corrected for by using nationally aggregated population figures and reducing all estimates by such a proportion.

Table 1. Correlation (R) Between Model and Actual Population Density as a Function of Spatial Resolution and Population Density Decay

Resolution   Uniform   Linear/Conic   Parabolic   Exponential   Gaussian
1 km²        0.293     0.518          0.546       0.428         0.541
2 km²        0.319     0.562          0.59        0.464         0.584
3 km²        0.339     0.594          0.62        0.489         0.614
5 km²        0.364     0.629          0.652       0.516         0.647
10 km²       0.414     0.691          0.714       0.566         0.709

Another issue is the question of how well these two datasets are registered spatially. The tables below represent the cross-correlations between the model and the population density dataset for the Gaussian model at the five different spatial resolutions. The middle value is the correlation obtained at what is believed to be the true registration. The values in the surrounding cells are the correlations obtained when the image derived from the model has been shifted by the corresponding number of cells. In essence these figures show the influence of registration error on correlation at a range of scales. These tables suggest that the registration for these datasets is probably quite good because the correlation is highest at the preferred registration at all scales.
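The registration check described above can be sketched as a small routine that shifts the modeled grid by a number of cells and recomputes the correlation with the actual grid. This is an illustrative sketch only: the function names are hypothetical, grids are plain lists of rows, cells outside the urban clusters are marked `None` in the model grid (an assumed convention), and only cells that overlap after the shift are compared.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def shifted_correlation(model, actual, dx, dy):
    """Correlate the model grid, shifted dx columns and dy rows, with the actual grid."""
    rows, cols = len(actual), len(actual[0])
    xs, ys = [], []
    for r in range(rows):
        for c in range(cols):
            src_r, src_c = r - dy, c - dx  # model cell that lands on (r, c)
            if (0 <= src_r < rows and 0 <= src_c < cols
                    and model[src_r][src_c] is not None):
                xs.append(model[src_r][src_c])
                ys.append(actual[r][c])
    return pearson_r(xs, ys)

def shift_table(model, actual, max_shift=1):
    """Square table of correlations; the middle entry is the unshifted case."""
    span = range(-max_shift, max_shift + 1)
    return [[shifted_correlation(model, actual, dx, dy) for dx in span] for dy in span]

# Toy example: a single peaked surface compared with itself.
actual = [[4 - abs(r - 2) - abs(c - 2) for c in range(5)] for r in range(5)]
table = shift_table(actual, actual, max_shift=1)
```

If the two grids are well registered, the middle entry of the table should be the largest, which is the pattern reported for all five resolutions.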
10 km x 10 km pixels
0.55   0.58   0.41
0.60   0.71   0.47
0.53   0.56   0.43

5 km x 5 km pixels
0.55   0.57   0.48
0.58   0.65   0.52
0.55   0.56   0.48

3 km x 3 km pixels
0.54   0.56   0.50
0.57   0.61   0.53
0.53   0.55   0.50

2 km x 2 km pixels
0.47   0.49   0.50   0.47   0.42
0.49   0.53   0.55   0.51   0.44
0.50   0.55   0.58   0.52   0.46
0.49   0.52   0.54   0.50   0.45
0.46   0.48   0.48   0.46   0.42

1 km x 1 km pixels
0.47   0.49   0.49   0.48   0.45
0.48   0.51   0.52   0.50   0.47
0.49   0.52   0.54   0.51   0.47
0.48   0.50   0.51   0.49   0.46
0.47   0.48   0.48   0.47   0.45

Figures 7 and 8 are representations of the parabolic model applied to the Los Angeles cluster and the Baltimore-Washington DC cluster, respectively. The uniform gray area is not saturated in the DMSP OLS image and is consequently not modeled at all. Superimposed on the gray-scaled model is a contour map of the actual population density of these areas. The contour map is a gross generalization of the actual 1 km² population density grid of these areas, but it does give an overall feel for the actual population density. The grid of population density has a much higher degree of variability than is suggested by the contour lines. Nonetheless, these images suggest a correlation between the model and the population. Surprisingly, the correlations for these clusters have R² values of only 0.22 and 0.25, respectively (Figures 7 and 8). Yet these figures are representative of the model's general application because the R² values are typical of the model for the whole continental U.S.A. These R² values are low because of the model's inability to identify variability of population density at constant distance to edge. The standard deviation vs. distance curve of Figure 6 clearly shows one of the reasons that these R² values are as low as they are. In fact, it is surprising that the models capture as much of the variation in population density as they do, considering that they are derived from a binary image.
FIGURE 7. A representation of the parabolic model applied to the Los Angeles cluster with a population density contour map superimposed over the model. [The parabolic population density decay function applied to the Los Angeles cluster and environs. The contour lines are the actual population density and are at intervals of 500 persons/km². The correlation (R) between the actual population density and the modeled values for just those pixels in the L.A. cluster is 0.51. The smaller inset is an image of the actual population density minus the predicted population density. A distribution of these values is to the right of this inset.]

FIGURE 8. A representation of the parabolic model applied to the Boston-Washington DC corridor cluster with a population density contour map superimposed over the model.

An empirical model derived from the linear regression shown in Figure 6 was also developed. In this model the estimated population density was simply given by the parameters of the linear regression: estimated pop. den. = 25 + 75 x (distance to edge in kilometers). This model overestimated the total population of the urban clusters of the U.S.A. by about 11%. The mean value of the empirical model was 358 persons/km², whereas the actual average population density of the cluster pixels was 321 persons/km². The empirical model did have the best correlation with the actual population density, but not by a large margin. The correlation (R) between the empirical model and actual population density on the one square kilometer scale was 0.578. Recall that the one kilometer parabolic and Gaussian models had correlations of 0.546 and 0.541, respectively. One means of improving the empirical model would be to classify the cities into subgroups based on either their areal extent or total population and then determine the empirical population density decay function.
This would eliminate the averaging of pixels on the edge of large clusters such as New York with pixels on the edge of small clusters like Santa Barbara. Figure 9 shows the plots of average population density as a function of distance for six different size classes of urban cluster. As cluster size diminishes the greatest depth also diminishes. Plots of standard deviation of population density are not shown but all of them show the same pattern as described in Figure 6. It is interesting to note that the slope parameter for the two large classes of urban cluster is lower than the slope of the four smaller classes. The population density gradient for the large cities is about 75 persons/km² per kilometer. It is about 110 for the four smaller classes. This suggests that, on average, the population density increases more rapidly with depth for small urban clusters than it does for large urban clusters. This may be due to the fact that the large urban clusters are primarily composed of the conurbations of one or more separate cities. The San Francisco cluster includes Oakland, San Jose, and other cities. The New York cluster reaches into Connecticut and New Jersey. Conurbation may result in deflating the gradients as measured in this way. Nonetheless, the empirical curves are interesting and linear regressions are an oversimplification of the gradients. All of the curves show population density increasing with distance in ways reminiscent of the theoretical exponential decays of traditional urban geographic theory.

[Figure 9 panel text] The urban clusters classified by size. Plots below show empirical measurements of population density as a function of distance to edge of cluster for each size class. Linear regression data is provided for each size category. Each point in the plots is the average of all the pixels at that distance from the edge of a cluster in the cities of that size. As depth into the cluster increases, the number of pixels used to calculate the average population density diminishes. About 2% of the data come from pixels that are very deep (at large distances) in the cluster. These were not included in the plots if they were derived from samples of fewer than 30 pixels.

Panels: Average Population Density (y) vs. Distance to Edge of Cluster (x), with regression annotations where legible:
Megacities (11 clusters with area > 5000 km²): R² = 0.94, slope 80 (persons/km²)/km
Very Large Cities: R² = 0.93, slope 73 (persons/km²)/km
Medium Cities: R² = 0.98, slope 108 (persons/km²)/km
Small Cities
Tiny Cities: R² = 0.92, slope 118 (persons/km²)/km

FIGURE 9. Plots of the empirical population density decay functions for six different sizes of urban cluster.

A model was built in which each cluster was classified by size and then modeled based on the corresponding regression shown in Figure 9. This model showed no significant improvement on the single empirical model of Figure 6. This shows that in order to improve correlations it is necessary for future models to include information that accounts for the variability in population density that occurs at constant distances to the edge of these clusters.

DISCUSSION

The method of modeling population density presented here requires very little input information. The only information used is a binary DMSP image and a relationship between area and population for urban clusters. The model is then developed using the spatial information in the image and applying various urban population density decay functions to the urban clusters.
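The workflow summarized above can be sketched in a few lines, assuming a binary grid stored as nested lists. This is an illustration and not the implementation used for the paper. Only the intercept 25 and slope 75 come from the text (the empirical linear regression). The function names, the use of 4-connected steps, the convention that an edge pixel sits one pixel from the edge, and the rescaling helper are all assumptions made here.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected steps (assumed)


def distance_to_edge(mask):
    """Steps from each lit pixel to the nearest unlit pixel or grid border.

    Unlit pixels get 0; lit pixels touching the edge get 1 (assumed convention).
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every lit pixel on the edge of a cluster.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    dist[r][c] = 1
                    queue.append((r, c))
                    break
    # Multi-source breadth-first search inward from the edge pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and dist[nr][nc] == 0):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def empirical_density(mask, pixel_km=1.0, intercept=25.0, slope=75.0):
    """Apply density = intercept + slope * (distance in km) to lit pixels."""
    dist = distance_to_edge(mask)
    return [
        [intercept + slope * d * pixel_km if d > 0 else 0.0 for d in row]
        for row in dist
    ]


def rescale_to_total(density, target_total, pixel_km=1.0):
    """Scale a density grid so its implied population equals target_total
    (hypothetical helper; the total would come from an independent source)."""
    area = pixel_km ** 2
    modeled_total = sum(v * area for row in density for v in row)
    factor = target_total / modeled_total
    return [[v * factor for v in row] for row in density]


# A 5 x 5 fully lit cluster: the centre pixel is 3 steps from the edge,
# so its estimate is 25 + 75 * 3 = 250 persons per square kilometer.
cluster = [[1] * 5 for _ in range(5)]
density = empirical_density(cluster)
print(density[2][2])  # 250.0
```

Because the distance is measured inward from the cluster boundary, no city center has to be identified, and a cluster with several nuclei is handled the same way as a single city.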
These theoretical models explain almost as much variation as empirically derived population density decay functions that depend on distance alone. These rather simple beginnings may prove to be a good starting point for the development of a more accurate method of modeling population density that uses additional independent sources of information such as those produced by the USGS at the EROS data center, which include the NDVI greenness index from AVHRR, digital elevation data, climate, and more (Loveland, Merchant, Ohlen, & Brown, 1991). In addition, a low-gain version of the DMSP imagery will soon be available. It may prove to be quite useful because it shows much more variation within the urban clusters (i.e., the images are not saturated).

The concept of population density is an abstract one. Consequently, attempting to model it raises many questions. The ground truth data used in this paper was derived from a vector dataset of the block group polygons of the 1990 U.S. census. Assumptions were made to convert this dataset to a grid. One assumption is the uniformity of population density within the block group polygons. Using this dataset as the ground truth may not be entirely appropriate. Alternative ground truthing methods may show that the correlations obtained here are actually underestimates. The census is also a measure of the night-time population location. Although the DMSP imagery is also produced at night, it may be providing clues as to where the population is in the day as well. The downtown areas of many cities show up as very bright even though they often have lower night-time population densities. These population densities are much higher during the working hours of the day. How much does the population density of an urban center vary on a temporal basis?
Clearly it is difficult to measure the error inherent in a model of this nature because the truth is an elusive quantity that varies temporally and spatially in complex ways that are difficult if not impossible to measure for appreciably sized areas.

Future work will focus on identifying additional information that can augment that captured by this model. In addition, these models will be run on other countries of the world to determine whether aggregate national figures such as GDP, GDP per capita, percent of population in rural areas, and other such figures can explain any of the expected variability between nations in the relationships between population and night-time satellite imagery. Earlier work has shown that the relationship between settlement size and settlement population varies dramatically at some regional and international scales (Stewart & Warntz, 1958; Nordbeck, 1965; Tobler, 1969). Explaining this variation with aggregate national economic, demographic, and political data could greatly improve our ability to predict rates of urbanization in parts of the world where good population data does not exist.

CONCLUSION

The model described in this paper accounts for 25% of the variation in the population density of the urban areas in the continental U.S.A. from information contained in a binary image derived from DMSP OLS imagery and some relatively simple spatial analysis. This may prove to be a good foundation for developing a model of population density estimation for other parts of the world where census data is not available. Future efforts will focus on improving the correlation between the model and reality via the inclusion of additional information from the AVHRR and DMSP OLS low gain satellites. It is hoped that this information will account for variability in population density that occurs at constant depth within the urban clusters defined here.
Any overall bias in these models can easily be corrected with aggregate national and/or sub-national population totals. Additional ground truth data will be obtained in countries with varying levels of economic development, rural/urban population ratios, and predominant means of transportation in an attempt to explain large scale variation in population patterns that is expected across these variables.

ACKNOWLEDGEMENTS

This research was supported in part by a SUCSB California Space Grant Fellowship, a NASA Earth System Science Global Change Fellowship, The National Center for Geographic Information and Analysis, and The UCSB Geography Department. This assistance is gratefully acknowledged. In addition I would like to thank all the reviewers of the article for their helpful comments and references.

REFERENCES

Acevedo, W., Foresman, T. W., & Buchanan, J. T. (1996). Origins and philosophy of building a temporal database to examine human transformation processes. In ASPRS/ACSM Annual Convention and Exposition, 62d/56th, Baltimore, Maryland, April 22-25, 1996, Proceedings: Bethesda, Maryland. American Society for Photogrammetry and Remote Sensing. American Congress on Surveying and Mapping, 1(1), 148-161.

Berry, B. (1990). Urbanization. In B. L. Turner (Ed.), The Earth as transformed by human action (pp. 103-120). Cambridge: Cambridge University Press.

Brown, L. (1995). Who will feed China? New York: W.W. Norton & Company.

Clark, C. (1951). Urban population densities. Journal of the Royal Statistical Society, CXIV(IV), 490-496.

Elvidge, C. D., Baugh, K. E., Kihn, E. A., & Davis, E. R. (1995). Mapping city lights with nighttime data from the DMSP operational linescan system. Photogrammetric Engineering and Remote Sensing, LXIII(6), 727-734.

Hitt, K. J. (1994). Refining 1970's land-use data with 1990 population data to indicate new residential development. U.S. Geological Survey Water-Resources Investigations Report 94-4250.

Imhoff, M. L., & Lawrence, W. T. (1997). Using nighttime DMSP/OLS images of city lights to estimate the impact of urban land use on soil resources in the United States. Remote Sensing of Environment, 59(1), 105-117.

Keyfitz, N. (1990). The growing human population. In J. Piel (Ed.), Managing planet Earth (pp. 61-72). New York: Scientific American.

Loveland, T., Merchant, J., Ohlen, D., & Brown, J. (1991). Development of a land-cover characteristics database for the conterminous U.S. Photogrammetric Engineering and Remote Sensing, 57(11), 1453-1463.

Meij, H. (1995). Integrated datasets for the USA, Consortium for International Earth Science Information Network.

Meyer, W. (1996). Human impact on the Earth. Cambridge: Cambridge University Press.

Nordbeck, S. (1965). The law of allometric growth. Michigan Inter-University Community of Mathematical Geographers, Paper 7.

Stewart, J., & Warntz, W. (1958). Physics of population distribution. Journal of Regional Science, 1, 99-123.

Sutton, P., Roberts, D., Elvidge, C. D., & Meij, H. (1997). A comparison of nighttime satellite imagery and population density for the continental United States. Photogrammetric Engineering and Remote Sensing, 63(11), 1303-1313.

Tobler, W. (1969). Satellite confirmation of settlement size coefficients. Area, 1, 30-34.

Welch, R. (1980). Monitoring urban population and energy utilization patterns from satellite data. Remote Sensing of Environment, 9(1), 1-9.
