Beyond City Size: Characterizing and predicting the location of urban amenities

by

Elisa Castaner Ensenat

B.S., M.I.T (2014)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Masters of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

© Massachusetts Institute of Technology 2015. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 8, 2015

Certified by: Cesar A. Hidalgo, Associate Professor, Thesis Supervisor

Accepted by: Prof. Albert R. Meyer, Chairman, Masters of Engineering Thesis Committee

Abstract

Intercity studies have shown that a city's characteristics, ranging from infrastructure to crime, scale as a power of its population. These studies, however, have not been extended to the intra-city scale, leaving open the question of how urban characteristics are distributed within a city. Here we study the spatial organization of one important urban characteristic: its amenities, such as restaurants, cafes, and libraries. We use a dataset summarizing the position of more than 1.2 million amenities disaggregated into 74 distinct categories and covering 47 U.S.
cities to show that: (i) the spatial distribution of amenities within a city is characterized by dense agglomerations of amenities (which we call micro-clusters), (ii) unlike in the intercity case, size is a poor predictor of the amenities of each type that locate in each micro-cluster, and (iii) the number of amenities of each type in a micro-cluster is better predicted using information on the collocation of amenities observed across all micro-clusters than using the micro-cluster's size. Finally, we use these findings to create a recommendation algorithm that suggests amenities that are missing in a micro-cluster and can inform the efforts of developers and planners looking to construct and regulate the development of new and existing neighborhoods.

Thesis Supervisor: Cesar A. Hidalgo
Title: Associate Professor

Acknowledgments

I would like to thank my supervisor, Cesar Hidalgo, for the patient guidance, encouragement, and advice he has provided me throughout my time as his student. I have been extremely lucky to have a supervisor who cared so much about my work, and who responded to my questions and queries promptly. I would also like to thank the rest of the Macro Connections group at the MIT Media Lab for the support and feedback they have given me throughout the year. The completion of this project wouldn't have been possible without their help.

Contents

1 Introduction
1.1 Multi-Centers
2 Data
3 Results
3.1 From the intercity to the intra-city scale
3.2 Micro-clusters
3.3 Intra-city scaling
3.4 Recommender System
4 Discussion
A Supplementary Material
A.1 Data
A.2 Intercity Scaling
A.3 Clustering
A.3.1 Effective number of amenities
A.3.2 Identifying cluster centers
A.3.3 Assigning points to clusters
A.4 Collocation of amenities
A.5 Predictions
B Cities Administrative Units and Populations

List of Figures

3-1 Intercity Scaling Relations
3-2 Clustering algorithm
3-3 Intercity Scaling Relations
3-4 City micro-agglomerations
3-5 Prediction of amenities in Boston's micro-clusters
A-1 Clustering Algorithm: Boston, SF, NY
A-2 Amenities correlations matrix

List of Tables

A.1 Merged amenity types
A.2 Total amenity count and amenity categories
A.3 Cities population and amenity count
A.4 Intercity Scaling parameters per amenity type
A.5 $R^2$s of intercity and intra-city models
A.6 AIC and BIC values of intercity and intra-city models
B.1 Administrative units included in each city

Chapter 1 Introduction

During the last decade the empirical study of cities has been characterized by a strong emphasis on scaling relationships connecting the size of a city, measured by its population, with attributes ranging from the availability of infrastructure to the presence of crime [1, 2].
This growing literature has shown that these scaling relationships hold across cities from different cultures and time periods [2, 3]. Yet, these intercity relationships teach us little about the way in which these attributes are spatially distributed within a city. In fact, one could easily construct a model where attributes follow a random spatial distribution within a city and that also satisfies the intercity scaling relationships documented in the literature. In this paper we add to this literature by bringing the quantitative study of cities to the intra-city scale and by showing the statistical principles that explain the frequency, composition, and location of amenities within a city.

But why is the intra-city scale important? On the one hand, understanding the distribution of amenities is important for the planners and developers who shape cities. Planners need to create urban designs that stimulate the virtuous social interactions that encourage economic activity, reduce levels of crime, and lower traffic congestion [4, 5, 6, 7, 8]. Developers, who construct buildings looking for profits, need to create buildings and units that are attractive to residents and shop owners, and hence need to understand which types of buildings and units are better pre-adapted to the uses that a neighborhood might require. On the other hand, a city's citizens and visitors can benefit from maps representing the city at a meso-scale. These meso-scale maps can focus on clusters of amenities instead of individual units, helping uncover the presence of neighborhoods with an active urban life.
Finally, small business owners may also benefit from a statistical understanding of cities at the intra-city scale, as the empirical laws describing the location of amenities in neighborhoods can be used to uncover instances of unsatisfied demand that shop owners can use to identify new business locations (information that is now only available to large franchising operations, such as Starbucks [9]). So a better understanding of cities at the intra-city scale can benefit both the planners and developers that shape a city and the citizens and visitors who use a city's streets.

Here, we move beyond the intercity scale and studies focused on the size of cities by looking at data summarizing the precise location of amenities (such as restaurants, cafes, and libraries) within a city. Our contribution consists of two parts. First, we introduce a clustering algorithm to show that the spatial organization of cities is based on hundreds of highly localized micro-clusters of urban activity, and that the size of a micro-cluster is a poor predictor of the number of amenities of a certain type that are present in it. This suggests that, to recover the predictability of the intercity studies (which we reproduce with our data), we need to use information on the types of amenities that are present in each micro-cluster. Our second contribution involves the development of a simple prediction algorithm that exploits information on the patterns of collocation of amenities observed across thousands of micro-clusters. We use this algorithm to identify anomalies in the data, which can represent instances of unsatisfied demand, and use these anomalies to suggest both new amenities for each cluster and the clusters that are in the direst need of a specific amenity. Together these results help extend the study of cities to the intra-city scale, and also open new avenues of research that focus on the composition of amenities at the neighborhood scale.
1.1 Multi-Centers

The idea that highly localized clusters of economic activity characterize the distribution of amenities in a city has a long academic tradition. On the one hand, scholars have conducted empirical studies looking to identify and characterize micro-clusters (or neighborhood scale agglomerations), and on the other hand, models have been used to explain why economic and social activity agglomerates. On the empirical side, researchers have used employment densities [10], commuting patterns [11], the floor space used by businesses [10], mobile phone and social media activity [12, 13, 14], and the spatial collocation of commercial units [15] to identify micro-clusters of urban activity. On the theoretical side, researchers have developed supply side and demand side theories to explain agglomeration. Supply side theories of agglomeration focus on externalities, such as knowledge spillovers [16], shared capacities [17], and transportation costs [18], to explain the collocation of businesses and/or manufacturing activities. Demand side theories focus on the ability of agglomerations to attract shared customers. The quintessential demand side model of agglomeration is Hotelling's 1929 model, which predicts that similar businesses would collocate to maximize their catchment area. These demand side stories, of course, also apply to businesses that are not necessarily similar, but complementary, such as shoe stores and clothing stores, and also explain why businesses that are not closely related, such as car repair shops and ice-cream stores, tend not to collocate. The rise of novel high-resolution data sources summarizing the location of urban amenities, however, allows us to explore the collocation of amenities empirically, helping us both validate these theories and provide new empirical facts that we could use to test new theories and models.
Chapter 2 Data

We collect data from the Google Places API containing the latitude, longitude, and type of amenity (e.g. cafe, restaurant, library), for more than 1.26 million amenities across 47 US cities (see SM for details). Additionally, we collect data on the population of each of these cities by identifying all of the administrative units contained within the area of our amenities data (see SM). For instance, in the case of Boston, our amenities data includes the areas of Cambridge, Somerville, and Brookline, so we estimate the population of the larger city of Boston by summing the populations of these and other administrative units (see SM). Going forward we use the word city to refer to the naturally occurring urban agglomerations that people refer to colloquially as 'cities', and not to the narrowly defined administrative units that exist within them (i.e. we use Boston to refer to the union of the administrative units of Boston, Cambridge, Brookline, Somerville, Newton, etc., and not to the administrative area controlled by Boston's City Hall). We adopt this use of the word city because our data involves contiguous areas that transcend individual administrative units.

Certainly the data from the Google Places API is not free of biases and limitations. The amenities data registered in the Google Places API focuses on customer facing businesses and places of interest (from hair salons and bakeries to airports and cemeteries). Therefore, the Google Places API data fails to include information on other forms of economic activity, such as manufacturing or business-to-business activities. Also, the data might have coding issues, such as having a restaurant registered as a bar. Moreover, businesses that shut down, either because they went broke or relocated, might not be removed from Google Maps, and therefore, the data can contain outdated information.
Yet, despite these limitations, the Google Places API is accurate enough to be the backbone of the world's most used mapping service (Google Maps) and is used daily by millions of individuals to find the location of businesses. This makes the Google Places API data an imperfect, yet attractive dataset to study the spatial organization of amenities at the intra-city scale.

Finally, we remind the reader that any results derived in this paper should be interpreted in the narrow context of the data from which these results were derived. This is data from an online mapping service and for U.S. cities only. The question of whether the results presented below can be generalized to other locations, and also of whether these results hold for other datasets, is beyond the scope of this paper.

Chapter 3 Results

3.1 From the intercity to the intra-city scale

We begin by reproducing the well-known intercity scaling laws of Bettencourt et al. [19, 20, 21] using our urban amenities data. By reproducing these laws we validate our data in the context of intercity research before presenting our intra-city contributions. Figure 3-1 shows the total number of amenities $N_c$ in a city as a function of its population. The total number of amenities in a city ($N_c$) scales sub-linearly with a city's population $P_c$ as $N_c = N_0 P_c^{\beta}$, with $N_0 = 2.03$ and $\beta = 0.68$ (Fig. 3-1a, $R^2 = 90\%$, p-value $\ll 1 \times 10^{-5}$), matching the scaling laws of Bettencourt et al. [1, 19, 20, 21]. The sub-linearity of this scaling law indicates the presence of scale economies, since it means that the number of per capita amenities in a city decreases with a city's total population.

Next, we explore the exponent of this scaling relationship for amenities of different types. We find that some amenities, such as museums, religious centers, and art galleries, scale slowly with a city's population (roughly as the square root of a city's total population ($\beta \approx 0.5$)).
Other amenities, such as restaurants, bakeries, and dentists, scale almost linearly with a city's population ($\beta > 0.8$) (for a summary of all exponents, see SM Table A.4). This diversity of scaling relationships tells us that the composition of amenities in a city changes with a city's population. For instance, in a city with a population of only half a million people we expect to find, on average, 46 restaurants per museum, but in a city with ten times that population we expect to find almost double that (76 restaurants per museum), because the ratio grows as population raised to the difference of the two exponents ($0.81 - 0.59 = 0.22$, and $10^{0.22} \approx 1.66$). These different ratios are direct expressions of the difference in scaling exponents characterizing the dependence of restaurants and museums on a city's size (Figure 3-1b).

But not all amenities correlate strongly with a city's population. In fact, the relationship between the number of amenities and a city's population is noisy for many amenities. To distinguish the amenities that correlate strongly with a city's population from those that don't we use the $R^2$ statistic of the scaling relationship connecting a city's population with the number of amenities of each type (Figure 3-1c). A high $R^2$ ($R^2 > 0.5$), such as that characterizing the scaling of restaurants, schools, and shoe stores, means that a city's population is a strong predictor of the number of amenities of that type in a city. A low $R^2$ ($R^2 < 0.5$), such as that characterizing the scaling of museums, embassies, and universities, means that a city's population is an incomplete predictor of the number of amenities of that type in a city. Note that the observed $R^2$s, but not the scaling exponents, would be almost the same if we were to use the total number of amenities instead of population as a measure of city size, since a city's population correlates almost perfectly with that city's total number of amenities (Fig. 3-1a, $R^2 = 90\%$).

Figure 3-1: Intercity Scaling Relations.
a Scaling of the total number of amenities in a city $c$ ($N_c$), as a function of a city's population ($P_c$). The total number of amenities in a city scales as $N_c = N_0 P_c^{\beta}$ with $N_0 = 2.03$, $\beta = 0.68$, and $R^2 = 0.90$. Each point represents one of the 47 US cities in our dataset. b Scaling of the total number of restaurants and museums in a city as a function of a city's population. Each point represents the number of restaurants (yellow) or museums (blue) in a different city. The figure shows that the scaling exponent of restaurants ($\beta = 0.81$) is larger than the scaling exponent of museums ($\beta = 0.59$), meaning that the number of restaurants per museum increases with a city's population. c The scaling exponent (horizontal axis) and the goodness of fit ($R^2$, vertical axis) of the scaling relationship of each amenity type. The horizontal dashed line separates amenities whose number correlates strongly with a city's population ($R^2 > 0.5$) from those characterized by a milder correlation ($R^2 < 0.5$). The vertical dashed line separates amenities that scale with population faster than the total number of amenities in a city ($\beta > 0.68$) from those that scale slower than that.

3.2 Micro-clusters

But do these scaling relationships hold at the intra-city scale? To explore the intra-city scale we first need to divide the city into meaningful intra-city units. To perform this division we introduce a clustering algorithm that splits cities into micro-clusters, which are spatially localized and bounded agglomerations of amenities. Then, we study the city at the intra-city scale by using micro-clusters as our unit of study. As a measure of the size of a micro-cluster we use the total number of amenities present in it. Switching from cities to micro-clusters as our unit of study will reveal that the size of a micro-cluster, unlike that of a city, is a poor predictor of the number of amenities of each type present in it.
Yet, as we will show, we can recover some of the predictability lost when moving to the intra-city scale by using data on the types of amenities that are present in each micro-cluster (and controlling for over-fitting by using both Akaike's and the Bayesian Information Criteria).

We begin the spatial clustering of urban amenities by calculating the effective number of amenities that are present in each location $i$. We define the effective number of amenities in location $i$ ($\alpha_i$) as the number of amenities that can be reached by walking from that location. Formally, the effective number of amenities in location $i$ is the scalar function $\alpha_i$:

$$\alpha_i = \sum_{j=1}^{N_c} e^{-\gamma d_{ij}}$$

where $d_{ij}$ is the distance between amenity $i$ and amenity $j$, $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$, and $N_c$ is the total number of amenities in city $c$. To interpret the values of $\alpha$ it is useful to note that an amenity at the location where the measurement is taking place (i.e. with $d_{ij} = 0$) contributes one to the effective number of amenities in that location. An amenity $j$ at distance $d_{ij} = 1/\gamma$, which would imply walking $1/\gamma$ kilometers from amenity $i$, will contribute only $1/e$ to location $i$'s effective number of amenities ($\alpha_i$). Our algorithm finds meaningful clusters when we set $\gamma = 16$ (per kilometer), which implies that the contribution of an amenity to the effective number of amenities of a location falls by a factor of $e$ (to roughly 37% of its value) every 62.5 meters and becomes negligible at about 500 meters (for reference, the short side of a Manhattan block is 80 meters long).

Figure 3-2 illustrates our clustering algorithm using the city of Boston as an example. The bottom layer (Fig. 3-2a) is a map of Boston used for spatial reference. The center layer (Fig. 3-2b) shows Boston's effective number of amenities ($\alpha$) for all the locations where an amenity is present. The top layer (Fig. 3-2c) shows the clusters identified using our algorithm (with different colors).
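The effective number of amenities defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which distance metric is used, so the great-circle (haversine) distance is assumed here, and the function names `haversine_km` and `effective_amenities` are hypothetical. The decay parameter defaults to 16 per kilometer, the value reported in the text.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (assumed metric, not stated in the text)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def effective_amenities(points, gamma=16.0):
    """Return, for each amenity, the sum over all amenities of exp(-gamma * distance).

    points: list of (latitude, longitude) pairs, one per amenity in a city.
    The sum includes the amenity itself, which contributes exactly 1.
    """
    result = []
    for lat_i, lon_i in points:
        total = 0.0
        for lat_j, lon_j in points:
            total += math.exp(-gamma * haversine_km(lat_i, lon_i, lat_j, lon_j))
        result.append(total)
    return result
```

The double loop is quadratic in the number of amenities, which is fine for illustration; for a city with tens of thousands of amenities a spatial index restricted to neighbors within roughly 500 meters would be a natural optimization.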
To identify the amenities belonging to each cluster we begin by identifying each local peak on the effective number of amenities landscape defined by $\alpha$ (Fig. 3-2b) as the center of a potential micro-cluster. We identify these local peaks by searching for locations that have an effective number of amenities $\alpha$ larger than their $k$ nearest neighbors (using a functional heuristic to find the $k$ that works best for each $\alpha$; see SM). Then, we assign amenities to a micro-cluster by using the following greedy algorithm: (i) We initialize clusters by assigning to each cluster center all amenities that are in close proximity to it (less than 0.5 km). (ii) We calculate the distance between each unassigned amenity and the amenities that have been assigned to a cluster. (iii) We assign to a cluster only the amenity that is closest to an amenity that has already been assigned to a cluster. (iv) We recalculate the distance between assigned and unassigned amenities and repeat steps (iii) and (iv) until all amenities have been assigned to a cluster. An example of the clusters found for the city of Boston is shown in Figure 3-2c (see SM for more examples).

Figure 3-2: Clustering algorithm. a Map of Boston. b The number of effective amenities ($\alpha$) at each location where an amenity is present in Boston. Peaks represent locations with a high number of effective amenities and valleys represent locations with a low number of effective amenities. The black dots represent the local maxima identified by our clustering algorithm. These points represent the centers of a micro-cluster (for example, Kendall/MIT or the North End). c Clusters identified using our clustering algorithm. Each cluster is expressed as a set of dots of the same color, each dot representing an amenity. The center of each cluster is marked using a black dot.

Overall, we find that the clusters identified using this algorithm correspond to well-known centers of urban activity.
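The greedy assignment in steps (i) to (iv) can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes amenities are given as planar coordinates in kilometers, that the cluster centers have already been identified, and that an amenity within 0.5 km of several centers goes to the nearest one (the text does not specify how such ties are resolved). The function name `assign_to_clusters` is hypothetical.

```python
import math

def assign_to_clusters(points, centers, radius_km=0.5):
    """Greedy cluster assignment.

    points:  list of (x, y) coordinates in kilometers, one per amenity.
    centers: non-empty list of indices into points marking the cluster centers.
    Returns a list giving, for each amenity, the index of its cluster center.
    """
    labels = [None] * len(points)

    # Step (i): seed each cluster with the amenities close to its center.
    for idx, p in enumerate(points):
        nearest = min(centers, key=lambda c: math.dist(p, points[c]))
        if math.dist(p, points[nearest]) < radius_km:
            labels[idx] = nearest

    unassigned = {i for i, lab in enumerate(labels) if lab is None}
    assigned = [i for i, lab in enumerate(labels) if lab is not None]

    # Steps (ii)-(iv): repeatedly attach the unassigned amenity that is
    # closest to any already-assigned amenity, inheriting that amenity's cluster.
    while unassigned:
        u, a = min(
            ((u, a) for u in unassigned for a in assigned),
            key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
        )
        labels[u] = labels[a]
        unassigned.remove(u)
        assigned.append(u)
    return labels
```

Because each round attaches exactly one amenity and then recomputes the closest pair, clusters grow outward along chains of nearby amenities rather than as fixed-radius disks, which is what lets them follow streets and irregular commercial strips.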
In the case of Boston these clusters include Harvard Square and Central Square in Cambridge and the North End and Coolidge Corner in Boston, among others.

We also note that the distribution of the effective number of amenities in a city is characterized by some universal properties. Figure 3-3a shows the distribution of the effective number of amenities ($\alpha$) for every city in our dataset, while Figure 3-3b shows the same distribution after normalizing the effective number of amenities in a city by that city's average ($\langle \alpha \rangle = \frac{1}{N_c} \sum_i \alpha_i$). For comparison, we also show the same distributions for an ensemble of cities where the location of each amenity has been randomized. These randomized cities are characterized by a narrow distribution for their effective number of amenities, meaning that these random cities lack the high concentrations of amenities that indicate the presence of micro-clusters in real cities. More importantly, Figure 3-3b shows that once we normalize the effective number of amenities in a city by that city's average, all cities follow the same lognormal distribution

$$P\left(\frac{\alpha_i}{\langle \alpha \rangle} = x\right) = \log\mathcal{N}(\mu, \sigma)$$

with $\mu = -0.404$ and $\sigma = 0.89$. The existence of a universal distribution for the effective number of amenities across all cities in our sample means that all of these cities have the same share of peaks and valleys of a given magnitude when the magnitude of these peaks and valleys is measured in units of that city's average.

Figure 3-3: Intercity Scaling Relations. a The distribution of the effective number of amenities in each US city. Blue lines show the distribution observed in our urban amenities data and orange lines show the distribution observed after randomizing the location of amenities for each city. b The distribution of the effective number of amenities in each US city normalized by the average effective number of amenities in that city.
Blue lines show the distribution observed in the cities data and orange lines show the distribution observed in the same cities but after randomizing the location of amenities.

3.3 Intra-city scaling

Now that we have identified micro-clusters for all cities in our data we analyze whether the scaling relationships that hold at the intercity scale also hold at the scale of micro-clusters (i.e. we test whether the number of amenities of each type in a cluster scales with the size of that cluster). Figure 3-4a compares the scaling relationships observed at the intercity scale with the scaling relationships observed at the intra-city scale for a subset of amenities and two different models (for all amenities see SM Table A.5). In light colors (light blue and vermillion) we show the accuracy of models predicting the number of amenities of a given type in a city or a micro-cluster using only information on that city or cluster's size. The dark bars (navy and crimson) show the accuracy of a model using information on the composition of amenities in a city or micro-cluster (which we will explain later). The comparison between the size-based models shows that amenities such as schools, doctors, and shoe stores, which correlate strongly with the total number of amenities in a city (average intercity scaling $R^2 > 70\%$), do not scale well with the total number of amenities in a micro-cluster (average intra-city $R^2 < 18\%$). This indicates that the scaling laws observed at the intercity scale fail to hold, for most amenities, at the intra-city scale. Next, we try to recover some of the predictability lost at the intra-city scale by introducing a model based on the composition of a micro-cluster, that is, the types of amenities present in it.

3.4 Recommender System

We begin the construction of the composition-based model by studying the collocation of pairs of amenities across all clusters.
Figure 3-4b shows the network of correlations between pairs of amenities calculated using Spearman's rank correlation across all clusters. We build the skeleton of this network using a Maximum Spanning Tree algorithm and then add edges between amenities that have a pairwise correlation equal to or larger than 0.3 (see SM for the full correlations matrix) [22]. The network shows that amenities tend to collocate with other amenities of similar types. For example, car repair shops collocate with car dealers (Spearman's $\rho = 0.45$), religious centers collocate with schools (Spearman's $\rho = 0.46$), and nightclubs collocate with bars (Spearman's $\rho = 0.36$). Also, the network shows that amenities sometimes tend to collocate with amenities from different categories. For instance, clothing stores collocate with restaurants and beauty salons (Spearman's $\rho = 0.52$ and $\rho = 0.45$, respectively). What is more important, however, is that these patterns of collocation suggest that it is possible to create a parsimonious model to predict the number of amenities of a type in a cluster using information on the presence of other amenities in it, since the network indicates that the presence of a set of amenities in a cluster carries information about the presence of other amenities.

Finally, we use the collocation of amenities in a cluster to create an algorithm that we can use to predict the number of amenities that should locate in each micro-cluster and create a recommender system that we can use to identify micro-clusters where particular amenities are over- or under-supplied. To create this algorithm we need to go beyond pairwise correlations, as the high clustering of the network of collocations (Fig. 3-4b) indicates that the information about the presence of an amenity in a cluster carried by the presence of other amenities is likely to have some redundancy.
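The pairwise collocation measure discussed above can be illustrated with a short, dependency-free Python sketch. It computes Spearman's rank correlation (Pearson correlation of average ranks, so ties are handled) between the per-cluster counts of every pair of amenity types and keeps the pairs at or above the 0.3 threshold mentioned in the text. The maximum spanning tree skeleton is deliberately omitted, the input layout (a dictionary of per-cluster count lists) is an assumption, and the function names are hypothetical. Each amenity type is assumed to vary across clusters, since a constant column has no defined correlation.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def collocation_edges(counts, threshold=0.3):
    """counts maps amenity type -> list of counts, one entry per micro-cluster.

    Returns {(type_a, type_b): rho} for pairs with rho >= threshold.
    """
    types = sorted(counts)
    edges = {}
    for a_idx, a in enumerate(types):
        for b in types[a_idx + 1:]:
            rho = spearman(counts[a], counts[b])
            if rho >= threshold:
                edges[(a, b)] = rho
    return edges
```

In practice a library routine such as SciPy's `spearmanr` would be the usual choice; the hand-written version is shown only to make the rank-then-correlate logic explicit.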
Going forward, we go beyond pairwise correlations by using a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of the presence of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (see SM). In addition, we validate the models resulting from this forward selection algorithm by using both Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). By using AIC and BIC we ensure that the models that we obtain are not better than the models using size simply because they include more variables.

Figure 3-4: Micro-Cluster Composition. a Light blue and light red bars, respectively, correspond to the $R^2$ of the predictions obtained using the size of a city (left) and the size of each micro-cluster (right). The dark blue and dark red bars correspond, respectively, to the $R^2$ of the predictions obtained using the composition of cities (left) and the composition of micro-clusters (right) (for all amenities see SM). b The nodes in the network represent different types of amenities and the edges connect amenities that are likely to collocate in a micro-cluster (see SM). The width of the edges connecting a pair of nodes is proportional to the Spearman correlation obtained from the collocation of the two types of amenities across all micro-clusters. The size of a node is proportional to the number of times that an amenity is present in our data set. The color of each node represents the category that the amenity belongs to.

The red bars of Figure 3-4a (vermillion and crimson) compare the $R^2$ of the models constructed using the size of micro-clusters with the $R^2$ of the models constructed using the composition of micro-clusters.
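To make the role of the information criteria concrete, the following sketch compares two fitted models by their AIC and BIC. It uses the standard Gaussian-error forms, $n \ln(\mathrm{RSS}/n) + 2k$ and $n \ln(\mathrm{RSS}/n) + k \ln n$, which drop an additive constant that is the same for both models; whether the thesis uses exactly these forms is an assumption. The function names and the toy numbers are hypothetical and are not taken from the data.

```python
import math

def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted counts."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def aic(rss, n_obs, n_params):
    """Akaike's Information Criterion for Gaussian errors (up to a constant).

    Requires rss > 0; a perfect fit would make the logarithm undefined.
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params

def bic(rss, n_obs, n_params):
    """Bayesian Information Criterion for Gaussian errors (up to a constant)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

# Toy example: counts of one amenity type in four micro-clusters.
observed = [2, 4, 6, 8]
size_prediction = [3, 3, 7, 7]          # hypothetical size-only model, 2 parameters
composition_prediction = [2, 4, 6, 7]   # hypothetical composition model, 3 parameters

n = len(observed)
bic_size = bic(residual_sum_of_squares(observed, size_prediction), n, 2)
bic_composition = bic(residual_sum_of_squares(observed, composition_prediction), n, 3)

# The model with the lower criterion value is preferred; the extra parameter
# of the composition model is only justified if it lowers the RSS enough.
preferred = "composition" if bic_composition < bic_size else "size"
```

Both criteria reward goodness of fit and penalize the number of parameters; BIC penalizes additional parameters more heavily than AIC once the sample has more than about seven observations, since $\ln n > 2$ there.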
In most cases (66/74 = 89%), the BIC test chooses the regression using the composition of a micro-cluster over the regression using its size (the exceptions are airports, aquariums, bus stations, car rentals, casinos, convenience stores, gas stations, and zoos). Also, we note that these results are not just statistically significant, but characterized by large effect sizes. On average, for the 66 amenity types in which the composition model works better, the $R^2$ of the composition model is twice that of the model using size only ($R^2 = 17\%$ on average using size vs. $R^2 = 35\%$ on average using composition), meaning that the increase in predictive power obtained by considering the composition of amenities in a cluster is not only statistically significant, but also substantial.

Finally, we use the composition model described above to create a recommender system [22, 24] to suggest amenities that might be missing in an urban cluster. We predict missing amenities by calculating the difference between the number of amenities of a given type in a cluster predicted by the composition model and the number of amenities of that type observed in that cluster. Figure 3-5 compares the number of car parks, hotels, and beauty salons, observed and predicted, for each micro-cluster in Boston. Points above the lines, such as Harvard Square in car parks (Figure 3-5a), the North End in hotels (Figure 3-5b), and Central Square in beauty salons (Figure 3-5c), suggest instances of unsatisfied demand. Points below the lines, such as Boston's Theatre District in car parks, Coolidge Corner in hotels, and Winthrop in beauty salons, suggest instances of excess supply, since more amenities are observed than the model predicts. Of course, these suggestions should not be taken literally. For instance, a decision to build new parking in Harvard Square is a decision that requires considering many aspects of Harvard Square that are not included in our model, such as the aesthetics of its architecture [25, 26] or the externalities caused by cars.
Nevertheless, this validation shows that our model automatically captures the under-supply of parking that characterizes Harvard Square (and that is well known to Cambridge residents). Figure 3-5b, on the other hand, shows that our model suggests a lack of hotels in the North End, a well-known tourist spot where only a handful of hotels are present. This could mean that there is great potential for new hotels to locate in Boston's North End, but once again, this is a decision that would need to incorporate other factors, such as the North End's famously idiosyncratic architecture and active resident community [4].

Figure 3-5: Prediction of amenities in Boston's micro-clusters. Observed vs. predicted number of a car parks, b hotels, and c beauty salons for each micro-cluster in Boston. Points above the lines represent micro-clusters where the predicted number of amenities is higher than the observed, suggesting instances of unsatisfied demand (or missing data). Points below the lines represent micro-clusters where the predicted number of amenities is lower than the observed, suggesting instances of excess supply.

Chapter 4
Discussion

During recent years the quantitative study of cities has focused extensively on intercity studies, and in particular, on intercity scaling laws. These intercity studies, however, do not tell us much about the spatial distribution of a city's characteristics. In this paper we extended this literature to the intra-city scale by focusing on micro-clusters of urban amenities and by showing that the scaling laws that hold at the intercity scale need to be replaced by multivariate statistical models that exploit information on the composition of micro-clusters to predict the number of amenities of each type that are present in each micro-cluster. Of course, our results and models are not free of biases and limitations.
Beyond the data biases described above, our model is limited by its simplicity, which bounds the total amount of variance in the presence of amenities that we can explain. Our statistical model predicts the number of amenities that locate in a micro-cluster using regressions without interaction terms. This means that the models could potentially be improved by using more complex functional forms, but also by adding information that is not expressed in the presence of amenities, such as the aesthetic appeal of a neighborhood's architecture [25, 26], its foot traffic as captured by mobile phone data [27], or the centrality of the urban micro-cluster in the context of the city.

Still, the results and methods presented here point to interesting new avenues of research. For example, time-resolved data sources for both amenities and streetscapes could be used to explore the interaction between the dynamics of the amenities that locate in a micro-cluster and the types of buildings being constructed in it. Also, these results could be used to help inform what types of business permits need to be given out to help balance the micro-clusters of a city's neighborhoods. On the computational side, the information uncovered here could be used to create new meso-scale city maps that help users understand a city's micro-clusters and that also deliver the recommendations for each micro-cluster uncovered by our algorithm or similar algorithms. Together, our results, and the new avenues of research they open, should help stimulate further quantitative study of the multivariate statistical laws that characterize cities at the intra-city scale.

Appendix A
Supplementary Material

A.1 Data

Amenities Data: We collected data from the Google Places API containing the latitude, longitude, and type (cafe, restaurant, library, etc.) of the urban amenities located in 47 US cities.
The original data set contains 95 different types of amenities, but we merged them into 74 categories by aggregating data on amenities that fulfill similar functions (Table A.1) and excluding amenities that are unspecific (such as the "store" category) or for which little data is available. The amenities we exclude are: taxi stand, campground, store, subway station, RV park, movie rental, and shopping mall. The resulting amenities are shown in Table A.2.

Population Data: We collect data on the population of each city from Wikipedia. Table B.1 shows all the administrative units in each city overlapping with our amenities data, and their population as indicated in Wikipedia. To obtain each city's population we aggregate the population of each of the administrative units that overlap with our amenities data for that city. The final population of cities and their total number of amenities are shown in Table A.3.

| Original Amenities | New Amenities |
|---|---|
| Hindu temple, Mosque, Place of worship, Synagogue, Church | Religious center |
| Meal delivery, Meal takeaway, Food, Restaurant | Restaurant |
| Health, Doctor | Doctor |
| Finance, Bank | Finance |
| Roofing Contractor, Electrician, Plumber, Painter, General Contractor | Construction contractor |

Table A.1: The left column shows the amenities that were merged into a new amenity type, shown in the right column.
Amenity Accounting Airport Points 17280 1535 Category Services Transportation Amusement park Aquarium Art gallery 1017 Entertainment 492 5358 Entertainment Entertainment ATM 30753 Services Bakery Bar Beauty salon Bicycle store Book store 9255 21506 41851 1409 3417 Food & Drinks Food & Drinks Services Shopping Shopping Amenity Gym Hardware store Home goods store Hospital Hotel and lodging Insurance agency Jewelry store Laundry Lawyer Library Liquor store 36 Points 5934 4595 Category Health Shopping 29537 Shopping 7942 11452 Health Services 27866 Services 6751 14391 37611 3466 7948 Shopping Services Services Education Shopping Bowling alley 366 Entertainment Government Services Services Services Entertainment Other Government Shopping Local Gov- 10081 ernment Office Locksmith 2182 Movie The- 1232 ater Moving 12744 Company Museum 2161 Night Club 5675 Park 25723 Parking 5527 Pet Store 2270 Pharmacy 15204 Physiotherapist 7929 Bus station Cafe 110642 9485 Transportation Food & Drinks Car dealer 11603 Services Car rental Car repair Car wash Casino Cemetery City hall Clothing store Construction contractor Convenience store Courthouse 2968 40215 3202 172 2386 140 29806 86044 Services Police 1613 Government 13818 Shopping Post Office 2723 Services 717 Government 39484 Services Dentist 26071 Health 58468 Other Department store Doctor Electronics store Embassy Finance Fire station Florist 3515 Shopping Real Estate Agency Religious Centers Restaurant 112430 Food & Drinks 153772 11876 Health Shopping School Shoe Store 46516 8612 Education Shopping 688 32221 2050 5102 Government Services Government Shopping 2843 1245 5849 1262 Health Entertainment Services Transportation Funeral home Furniture store Gas station 2761 Services 7394 Services 12379 Shopping Spa Stadium Storage Train Station Travel Agency University 6597 Education 2552 Services 5373 Services Grocery or supermarket 15206 Shopping Veterinary Care Zoo 114 Entertainment Total 1,262,374 Services Entertainment Services 
Entertainment Food & Drinks Other Transportation Shopping Shopping Health

Table A.2: Total number of amenities of each type in the Google Places data set in the 47 US cities in our study. The Category column shows the category we assign each amenity type to when we study the collocation of amenities.

| City | Population | Number of Amenities |
|---|---|---|
| Atlanta | 447,841 | 19,050 |
| Austin | 885,400 | 22,592 |
| Baltimore | 642,587 | 14,434 |
| Birmingham | 389,250 | 15,066 |
| Boston | 1,121,438 | 19,769 |
| Buffalo | 258,959 | 7,409 |
| Charlotte | 850,880 | 19,954 |
| Chicago | 3,618,465 | 64,531 |
| Cincinnati | 453,968 | 13,818 |
| Cleveland | 685,931 | 18,496 |
| Columbus | 1,128,075 | 27,854 |
| Dallas | 2,435,949 | 44,358 |
| Denver | 1,757,830 | 32,731 |
| Detroit | 973,284 | 21,776 |
| Houston | 3,362,560 | 80,011 |
| Indianapolis | 1,468,843 | 21,296 |
| Jacksonville | 1,007,094 | 20,466 |
| Las Vegas | 1,850,966 | 29,009 |
| Los Angeles | 6,428,879 | 114,002 |
| Louisville | 840,601 | 22,425 |
| Memphis | 832,803 | 21,350 |
| Miami | 800,216 | 13,403 |
| Milwaukee | 822,777 | 20,590 |
| Naples | 95,796 | 5,970 |
| Nashville | 737,796 | 21,619 |
| New Orleans | 570,943 | 14,607 |
| New York | 8,405,837 | 75,081 |
| Oklahoma | 922,506 | 21,010 |
| Orlando | 493,524 | 20,559 |
| Philadelphia | 1,945,795 | 40,410 |
| Phoenix | 2,046,991 | 39,354 |
| Pittsburgh | 466,879 | 15,714 |
| Portland | 609,456 | 21,043 |
| Providence | 290,459 | 6,653 |
| Raleigh | 582,834 | 15,884 |
| Richmond | 262,944 | 9,437 |
| Sacramento | 767,408 | 20,372 |
| Salt Lake | 210,806 | 9,444 |
| San Antonio | 1,511,307 | 35,255 |
| San Diego | 2,297,970 | 46,614 |
| San Francisco | 837,442 | 18,984 |
| San Jose | 1,472,951 | 30,868 |
| Seattle | 622,155 | 20,514 |
| St Louis | 361,273 | 12,125 |
| Tampa | 742,583 | 25,285 |
| Virginia Beach | 448,479 | 10,619 |
| Washington | 1,267,943 | 20,310 |
| Total | 60,940,877 | 1,236,151 |

Table A.3: Population and total number of amenities of each city.

A.2 Intercity Scaling

We explore the scaling exponent $\beta$ of the scaling relationship $A_{ic} = a_0 P_c^{\beta}$ between the number of amenities of type $i$ in a city $c$, $A_{ic}$, and the population of that city, $P_c$, finding that scaling exponents vary greatly across amenity types. Table A.4 shows $a_0$, $\beta$, and $R^2$ of the scaling relationship of each type of amenity.
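As an illustration of how the parameters of such a scaling relationship can be estimated, the sketch below fits a power law by ordinary least squares on log-transformed values. This is one common estimation approach and is an assumption here: the text does not state which fitting procedure was used. The function name `fit_power_law` and the synthetic inputs are hypothetical, and cities with zero amenities of a type would need to be handled separately because their logarithm is undefined.

```python
import math

def fit_power_law(populations, counts):
    """Fit counts = a0 * populations**beta by least squares in log-log space.

    Returns (a0, beta, r2), where r2 is computed on the log-transformed values.
    Assumes every population and count is strictly positive.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the log-log line
    log_a0 = mean_y - beta * mean_x       # intercept of the log-log line
    ss_res = sum((y - (log_a0 + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(log_a0), beta, r2

# Synthetic example (not thesis data): counts generated exactly from a power law.
synthetic_pop = [1e5, 5e5, 1e6, 5e6]
synthetic_counts = [0.01 * p ** 0.8 for p in synthetic_pop]
a0, beta, r2 = fit_power_law(synthetic_pop, synthetic_counts)
```

On the synthetic input, the fit recovers the generating parameters up to floating-point error, which is a quick sanity check before applying the function to real counts.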
38 Amenity Accounting Airport Amusement park Aquarium Art gallery π0 0.014 0.003 0.000 π½ 0.727 0.655 0.769 π 2 0.751 0.362 0.309 π0 0.002 0.009 0.028 π½ 0.797 0.658 0.710 π 2 0.869 0.783 0.709 0.006 0.003 0.725 0.797 0.837 0.712 0.013 0.763 0.582 0.003 0.003 0.536 0.005 0.001 0.094 0.766 0.814 0.520 0.681 0.897 0.553 0.833 0.775 0.627 0.755 0.775 0.780 0.000 0.001 0.009 0.861 0.704 0.734 0.526 0.776 0.544 0.681 0.613 0.601 0.035 0.124 0.285 0.846 0.620 Amenity Gym Hardware store Home goods store Hospital Hotel and lodging Insurance agency Jewelry store Laundry Lawyer Library Liquor store Local Government Office Locksmith Movie Theater Moving Company Museum Night Club Park Parking Pet Store Pharmacy Physiotherapist Police 0.000 0.010 0.765 0.653 0.621 0.742 Atm 0.041 0.690 0.695 Bakery Bar Beauty salon Bicycle store Book store Bowling alley 0.001 0.016 0.025 0.001 0.002 0.002 0.847 0.728 0.745 0.733 0.735 0.589 0.936 0.799 0.816 0.699 0.829 0.421 Bus station cafe Car dealer 18.313 0.004 0.015 0.320 0.767 0.684 0.250 0.756 0.420 Car rental Car repair Car wash Casino Cemetery City hall Clothing store Construction contractor Convenience store Courthouse 0.000 0.024 0.001 0.004 0.006 0.004 0.012 0.179 0.839 0.743 0.790 0.461 0.634 0.465 0.772 0.656 0.010 0.003 0.014 0.010 0.000 0.007 0.008 0.001 0.592 0.762 0.751 0.667 0.891 0.763 0.706 0.737 0.692 0.852 0.736 0.765 0.890 0.908 0.830 0.744 0.054 0.611 0.462 Post Office 0.002 0.736 0.888 0.001 0.710 0.717 0.041 0.704 0.710 Dentist 0.002 0.902 0.823 0.156 0.639 0.748 Department store Doctor Electronics store Embassy Finance Fire station Florist Funeral home 0.005 0.680 0.519 Real Estate Agency Religious Centers Restaurant 0.024 0.817 0.945 0.205 0.003 0.689 0.803 0.818 0.790 School Shoe Store 0.015 0.002 0.790 0.827 0.925 0.887 0.000 0.024 0.011 0.002 0.018 0.838 0.729 0.583 0.766 0.569 0.153 0.798 0.371 0.843 0.564 Spa Stadium Storage Train Station Travel Agency 0.000 0.003 0.003 0.005 0.001 0.853 0.643 0.747 0.558 
0.856 0.687 0.460 0.454 0.133 0.857 Furniture store Gas station Grocery or supermarket 0.011 0.005 0.006 0.716 0.652 0.766 0.796 0.289 0.878 University Veterinary Care Zoo 0.149 0.002 0.009 0.482 0.764 0.398 0.225 0.648 0.364

Table A.4: Values of the parameters $a_0$, $\beta$, and $R^2$ of the scaling relationship between the total number of amenities of each type in a city, $A_{ic}$, and that city's population, $P_c$, expressed as $A_{ic} = a_0 P_c^{\beta}$.

A.3 Clustering

A.3.1 Effective number of amenities

We begin our clustering procedure by calculating the effective number of amenities at each location. The effective number of amenities, $\alpha_i$, in a location $i$ represents the number of amenities that can be reached by walking from that location. We define $\alpha_i$ as:

$$\alpha_i = \sum_{j=1}^{N_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{n} e^{-\gamma d_{ij}} + \sum_{j=n+1}^{N_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{n} e^{-\gamma d_{ij}} + \epsilon$$

where $d_{ij}$ is the distance (in km) between amenity $i$ and amenity $j$, amenities are ordered by increasing distance from $i$, and $N_c$ is the total number of amenities in a city $c$. $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$. We set $\gamma = 16$, meaning that the contribution of an amenity to the effective number of amenities at a location falls by a factor of $e$ (to about 37%) every 62.5 meters, which corresponds to halving roughly every 43 meters, and becomes negligible ($e^{-8} \approx 0.03\%$) at about 500 meters.

To simplify the calculation of the effective number of amenities in a location we use $n$ amenities instead of $N_c$. Theoretically, all of the amenities in a city should contribute to a location's effective number of amenities, but since amenities that are far from a location are discounted by an exponential factor, considering the contribution of the $n$ closest amenities already gives a good approximation. In general, we find that the effective number of amenities for a location does not change after considering the first few hundred amenities, indicating that $n = 2,000$ provides a set that is large enough to give a good estimate of a location's effective number of amenities.
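The truncated exponential sum described above can be sketched as follows. This is a minimal brute-force illustration, not the implementation used in the study. It assumes that amenity positions are already projected to planar coordinates in kilometers, and that a location counts itself in the sum (at distance zero); neither point is specified in the text. The function name `effective_amenities` is hypothetical.

```python
import math

GAMMA = 16.0       # decay parameter per km, as stated in the text
N_NEAREST = 2000   # number of closest amenities kept in the sum, as stated in the text

def effective_amenities(points, gamma=GAMMA, n_nearest=N_NEAREST):
    """Return the effective number of amenities at each point.

    points: list of (x_km, y_km) tuples (planar coordinates are an assumption).
    Each value is the sum of exp(-gamma * distance) over the n_nearest closest
    points, including the point itself (an assumption).
    """
    values = []
    for xi, yi in points:
        # Brute-force distances to every point, sorted so the closest come first.
        distances = sorted(math.hypot(xi - xj, yi - yj) for xj, yj in points)
        values.append(sum(math.exp(-gamma * d) for d in distances[:n_nearest]))
    return values

# Tiny synthetic example: two amenities 62.5 m apart and one 10 km away.
example = effective_amenities([(0.0, 0.0), (0.0625, 0.0), (10.0, 0.0)])
```

The brute-force loop is quadratic in the number of amenities; for a city-sized data set a spatial index (for example a k-d tree) would be the natural replacement for the sort.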
A.3.2 Identifying cluster centers

We continue our clustering procedure by identifying the centers of each micro-cluster as the local peaks on the landscape defined by the effective number of amenities. We identify local peaks by searching for locations that have an effective number of amenities, $\alpha_i$, larger than that of their $N_i$ nearest neighbors. We define $N_i$ as:

$$N_i = 3\alpha_i + 50,$$

i.e., as a function of the effective number of amenities at location $i$, so that the centers of very dense clusters are required to have a larger $\alpha_i$ than a large number of neighboring amenities, while the centers of very sparse clusters are required to have a larger $\alpha_i$ than only a small number of neighboring amenities. By making $N_i$ grow with $\alpha_i$ we avoid assigning multiple cluster centers to areas with a high density of amenities, and we avoid leaving areas with a low density of amenities without any cluster center.

A.3.3 Assigning points to clusters

Finally, we assign points to micro-clusters using the cluster centers we obtained. First, we remove the 10% of the points in each city with the lowest effective number of amenities, to eliminate isolated amenities that are not part of a micro-cluster. After that, we assign all amenities that are within a distance of 0.5 km of a cluster center to that cluster center. Then, we calculate the distance from each unassigned point to each assigned point, and iterate the following steps:

1. Choose the unassigned point, $u$, which is closest to an assigned point, $a$.
2. Assign point $u$ to the cluster that point $a$ belongs to.
3. Calculate the distance from each unassigned point to the newly assigned point $u$.

The algorithm terminates once all points have been assigned to a cluster. Figure A-1 shows the effective number of amenities in the cities of Boston, San Francisco, and New York (left figures), and the corresponding assignments of amenities to clusters (right figures).
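The peak detection and the iterative assignment described in this section can be sketched as below. This is a simplified illustration under stated assumptions rather than the original implementation: coordinates are assumed planar and in kilometers, the neighbor count is rounded to the nearest integer, a point within 0.5 km of several centers is given to the nearest one, and at least one point is assumed to lie within 0.5 km of a center. The names `find_centers` and `assign_clusters` are hypothetical.

```python
import math

def dist(p, q):
    """Planar distance in km between two (x_km, y_km) points (an assumption)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def find_centers(points, effective):
    """Indices whose effective number exceeds that of their N_i nearest neighbors."""
    centers = []
    for i, p in enumerate(points):
        n_i = int(round(3 * effective[i] + 50))  # rounding is an assumption
        neighbors = sorted((j for j in range(len(points)) if j != i),
                           key=lambda j: dist(p, points[j]))
        if all(effective[i] > effective[j] for j in neighbors[:n_i]):
            centers.append(i)
    return centers

def assign_clusters(points, effective, centers, radius_km=0.5, drop_fraction=0.10):
    """Return {point index: center index} for the points kept after filtering."""
    # Drop the points with the lowest effective number of amenities.
    ranked = sorted(range(len(points)), key=lambda i: effective[i])
    kept = ranked[int(drop_fraction * len(points)):]
    # Seed each cluster with the points within radius_km of a center.
    label = {}
    for i in kept:
        nearest = min(centers, key=lambda c: dist(points[i], points[c]))
        if dist(points[i], points[nearest]) <= radius_km:
            label[i] = nearest
    # best[u] = (distance to the closest assigned point, index of that point)
    best = {u: min((dist(points[u], points[a]), a) for a in label)
            for u in kept if u not in label}
    while best:
        u = min(best, key=lambda k: best[k][0])   # step 1: closest unassigned point
        _, a = best.pop(u)
        label[u] = label[a]                       # step 2: inherit the cluster of a
        for v in best:                            # step 3: update distances to u
            d = dist(points[v], points[u])
            if d < best[v][0]:
                best[v] = (d, u)
    return label
```

Keeping, for every unassigned point, only its current closest assigned point avoids recomputing all pairwise distances at each iteration, which mirrors step 3 of the procedure.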
A.4 Collocation of amenities

To study the collocation patterns of amenities, we calculate the Spearman correlation between all pairs of amenities across clusters. We show the resulting correlations in the form of a network, where nodes represent amenity types and edges connect amenities that are highly correlated across micro-clusters. To construct this network we first create the Maximum Spanning Tree (MST) of the correlation network and then add only those edges that connect amenities with a pairwise correlation equal to or larger than 0.3. We also show the values of all Spearman correlations between amenities across clusters in the form of a matrix (Figure A-2), in which amenities are clustered using Ward linkage.

Figure A-1: Clustering Algorithm: Boston, SF, NY. The figures on the right show the effective number of amenities at each location in the cities of a Boston, b San Francisco, and c New York. Red lines correspond to areas with a high effective number of amenities and blue lines correspond to areas with a low effective number of amenities. The black dots represent the locations we assign as cluster centers. The figures on the left show the corresponding assignment of amenities to micro-clusters. Each dot represents an amenity, and sets of dots of the same color constitute a micro-cluster.

Figure A-2: Amenities correlations matrix. Matrix showing the Spearman correlation between each pair of amenities. Amenities are clustered using Ward linkage.

A.5 Predictions

We construct four regression models to predict each type of amenity at the intercity and intra-city scales using two different metrics: size and composition. At the intercity scale, we predict the number of each type of amenity in a city using the total number of amenities in the city and the composition of amenities in the city. At the intra-city scale, we predict the number of each type of amenity in a micro-cluster using the size of the micro-cluster and the composition of amenities in the micro-cluster.
The size model uses the total number of amenities in a micro-cluster to predict the number of each type of amenity in that micro-cluster. To construct the composition models we use a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (below we explain how we use AIC and BIC to verify our model selection). Table A.5 shows the $R^2$ obtained for each of these models.

Given that these four models use a different number of samples and parameters, we calculate the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of each of the models. These criteria allow us to compare the models: the lower the AIC and BIC values, the more desirable the model (better fit and less overfitting). The AIC and BIC values obtained for each model are summarized in Table A.6.

| Amenity | Intercity: Size | Intercity: Composition | Intra-city: Size | Intra-city: Composition |
|---|---|---|---|---|
| Accounting | 0.946 | 0.985 | 0.291 | 0.448 |
| Airport | 0.575 | 0.816 | 0.016 | 0.114 |
| Amusement Park | 0.382 | 0.724 | 0.002 | 0.005 |
| Aquarium | 0.709 | 0.880 | 0.014 | 0.028 |
| Art Gallery | 0.603 | 0.930 | 0.114 | 0.271 |
| ATM | 0.911 | 0.967 | 0.320 | 0.465 |
| Bakery | 0.777 | 0.980 | 0.364 | 0.543 |
| Bar | 0.649 | 0.966 | 0.462 | 0.750 |
| Beauty Salon | 0.952 | 0.989 | 0.449 | 0.615 |
| Bicycle Store | 0.594 | 0.919 | 0.080 | 0.183 |
| Book Store | 0.878 | 0.980 | 0.245 | 0.344 |
| Bowling Alley | 0.478 | 0.702 | 0.004 | 0.014 |
| Bus Station | 0.242 | 0.431 | 0.023 | 0.237 |
| Cafe | 0.649 | 0.956 | 0.505 | 0.670 |
| Car Dealer | 0.608 | 0.850 | 0.003 | 0.231 |
| Car Rental | 0.831 | 0.942 | 0.042 | 0.118 |
| Car Repair | 0.867 | 0.976 | 0.016 | 0.437 |
| Car Wash | 0.828 | 0.970 | 0.005 | 0.071 |
| Casino | 0.016 | 0.000 | 0.002 | 0.008 |
| Cemetery | 0.126 | 0.585 | 0.001 | 0.015 |
| City Hall | 0.379 | 0.449 | 0.031 | 0.151 |
| Clothing Store | 0.884 | 0.993 | 0.298 | 0.718 |
| Construction Contractor | 0.824 | 0.978 | 0.135 | 0.456 |
| Convenience Store | 0.629 | 0.928 | 0.042 | 0.134 |
| Courthouse | 0.676 | 0.738 | 0.088 | 0.446 |
| Dentist | 0.954 | 0.974 | 0.262 | 0.439 |
| Department Store | 0.673 | 0.945 | 0.016 | 0.200 |
| Doctor | 0.957 | 0.986 | 0.408 | 0.694 |
| Electronics Store | 0.924 | 0.966 | 0.224 | 0.355 |
| Embassy | 0.102 | 0.419 | 0.046 | 0.114 |
| Finance | 0.953 | 0.983 | 0.424 | 0.610 |
| Fire Station | 0.490 | 0.632 | 0.018 | 0.058 |
| Florist | 0.889 | 0.981 | 0.207 | 0.259 |
| Funeral Home | 0.476 | 0.787 | 0.018 | 0.146 |
| Furniture Store | 0.912 | 0.980 | 0.173 | 0.444 |
| Gas Station | 0.443 | 0.777 | 0.000 | 0.028 |
| Grocery or Supermarket | 0.791 | 0.955 | 0.116 | 0.377 |
| Gym | 0.911 | 0.984 | 0.229 | 0.339 |
| Hardware Store | 0.896 | 0.953 | 0.020 | 0.194 |
| Home Goods Store | 0.908 | 0.986 | 0.213 | 0.517 |
| Hospital | 0.958 | 0.979 | 0.096 | 0.546 |
| Hotel and Lodging | 0.795 | 0.824 | 0.250 | 0.435 |
| Insurance Agency | 0.825 | 0.981 | 0.234 | 0.433 |
| Jewelry Store | 0.902 | 0.978 | 0.208 | 0.352 |
| Laundry | 0.933 | 0.984 | 0.180 | 0.354 |
| Lawyer | 0.871 | 0.894 | 0.359 | 0.570 |
| Library | 0.610 | 0.937 | 0.180 | 0.416 |
| Liquor Store | 0.753 | 0.815 | 0.175 | 0.301 |
| Local Government Office | 0.901 | 0.937 | 0.181 | 0.567 |
| Locksmith | 0.671 | 0.752 | 0.033 | 0.053 |
| Movie Theater | 0.780 | 0.952 | 0.125 | 0.190 |
| Moving Company | 0.721 | 0.931 | 0.012 | 0.131 |
| Museum | 0.499 | 0.951 | 0.221 | 0.412 |
| Night Club | 0.735 | 0.957 | 0.326 | 0.606 |
| Park | 0.669 | 0.745 | 0.149 | 0.320 |
| Parking | 0.666 | 0.938 | 0.374 | 0.610 |
| Pet Store | 0.812 | 0.943 | 0.077 | 0.192 |
| Pharmacy | 0.878 | 0.949 | 0.169 | 0.371 |
| Physiotherapist | 0.863 | 0.931 | 0.081 | 0.260 |
| Police | 0.681 | 0.866 | 0.052 | 0.201 |
| Post Office | 0.859 | 0.964 | 0.090 | 0.130 |
| Real Estate Agency | 0.835 | 0.952 | 0.381 | 0.513 |
| Religious Centers | 0.744 | 0.868 | 0.171 | 0.430 |
| Restaurant | 0.921 | 0.995 | 0.659 | 0.826 |
| School | 0.948 | 0.976 | 0.251 | 0.438 |
| Shoe Store | 0.916 | 0.966 | 0.153 | 0.648 |
| Spa | 0.784 | 0.940 | 0.182 | 0.297 |
| Stadium | 0.613 | 0.749 | 0.010 | 0.107 |
| Storage | 0.632 | 0.912 | 0.010 | 0.123 |
| Train Station | 0.099 | 0.414 | 0.047 | 0.087 |
| Travel Agency | 0.813 | 0.931 | 0.292 | 0.402 |
| University | 0.238 | 0.351 | 0.020 | 0.328 |
| Veterinary Care | 0.814 | 0.966 | 0.020 | 0.115 |
| Zoo | 0.343 | 0.680 | 0.001 | 0.011 |

Table A.5: $R^2$ of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city).
Accounting Airport Amusement Park Aquarium Art Gallery Atm Bakery Bar Beauty Salon Bicycle Store Book Store Bowling Alley Bus Station Cafe Car Dealer Car Rental Car Repair Car Wash Casino Cemetery City Hall Clothing Store Intercity Scale Size AIC BIC 387.1 389.0 283.6 285.4 252.4 254.3 Comp. AIC 610.2 534.2 492.8 160.8 404.3 458.2 437.6 508.4 471.4 268.8 278.1 132.9 667.5 443.1 461.3 290.1 521.0 293.8 216.3 341.0 76.2 500.4 395.3 600.4 614.1 479.8 724.1 625.0 283.5 510.0 414.8 735.0 461.9 714.4 443.9 731.4 460.0 443.5 586.3 293.2 592.9 162.6 406.2 460.1 439.5 510.2 473.3 270.6 280.0 134.8 669.3 445.0 463.1 291.9 522.8 295.6 218.2 342.9 78.0 502.2 Scale BIC 615.7 536.0 496.5 Intra-City Size AIC 7233.6 -14564.4 -6923.4 BIC 7240.7 -14557.4 -6916.4 Comp. AIC 5467.9 -14565.1 -6940.3 BIC 5630.0 -14480.5 -6933.2 399.0 605.9 617.8 483.5 731.5 628.7 285.4 517.4 418.5 736.9 465.6 716.2 449.4 738.8 467.4 443.5 590.0 295.0 600.3 -24141.2 15448.2 14260.7 2507.6 20416.0 19820.0 -16203.0 -7584.9 -28639.3 34335.6 4533.6 10190.0 -3146.7 22908.0 -11654.7 -35421.5 -21285.1 -37437.7 31184.7 -24134.2 15455.3 14267.7 2514.7 20423.0 19827.0 -16196.0 -7577.8 -28632.3 34342.7 4540.6 10197.1 -3139.7 22915.1 -11647.6 -35414.5 -21278.0 -37430.7 31191.8 -23744.2 13780.7 12446.0 -349.5 14125.5 16895.3 -17083.1 -8461.1 -28784.8 34766.9 1134.4 8802.5 -3181.3 18230.6 -11747.7 -35127.2 -21423.3 -38618.9 23911.6 -23709.0 13893.4 12664.5 -208.5 14358.1 17113.8 -16970.3 -8313.1 -28756.6 34936.1 1338.8 8873.0 -3110.8 18371.6 -11663.2 -35113.1 -21402.2 -38562.5 24024.4 47 Construction Contractor Convenience Store Courthouse Dentist Department Store Doctor Electronics Store Embassy Finance Fire Station Florist Funeral Home Furniture Store Gas Station Grocery or Supermarket Gym Hardware Store Home Goods Store Hospital Hotel and Lodging Insurance Agency Jewelry Store Laundry Lawyer Library Liquor Store Local Government Office Locksmith Movie Theater Moving Company Museum Night Club Park Parking 591.7 593.6 
574.4 578.1 23556.0 23563.1 20067.1 20243.3 455.2 457.1 570.9 572.8 1726.9 1733.9 2246.0 2394.0 160.9 436.7 327.4 162.8 438.6 329.2 404.9 714.8 510.3 406.8 718.5 514.0 -13810.3 19519.7 -6448.4 -13803.3 19526.7 -6441.4 -17901.4 17163.9 -7219.3 -17788.6 17311.9 -7064.3 578.7 386.1 580.5 388.0 607.2 587.0 614.6 590.7 42180.3 3688.0 42187.3 3695.0 36517.8 2189.0 36672.9 2322.9 329.3 440.6 293.9 332.2 336.3 389.5 331.1 442.5 295.8 334.1 338.1 391.4 632.8 615.9 600.6 524.5 568.3 441.7 634.7 623.3 602.4 530.0 570.1 447.2 -2578.3 20231.0 -17404.5 -4705.5 -10028.7 10673.1 -2571.2 20238.1 -17397.5 -4698.5 -10021.7 10680.1 -3268.8 16860.1 -17771.8 -5303.1 -10844.2 7599.0 -3205.4 17036.3 -17722.5 -5197.4 -10703.3 7697.7 333.2 467.7 335.0 469.6 543.6 592.6 545.5 596.3 -13926.6 9280.9 -13919.5 9288.0 -11923.4 6500.8 -11867.0 6677.1 321.2 299.0 323.1 300.9 463.7 516.4 469.2 522.0 -2721.9 -8239.8 -2714.9 -8232.8 -4013.8 -9960.1 -3837.6 -9854.4 469.2 471.0 645.5 651.0 17169.9 17177.0 13170.9 13290.7 310.3 413.6 312.1 415.5 428.4 734.4 433.9 736.3 11386.1 12585.9 11393.2 12592.9 5907.4 10292.7 6027.2 10483.0 496.8 498.7 692.3 697.9 14861.7 14868.7 12397.0 12538.0 353.3 400.6 479.9 343.4 405.4 331.1 355.1 402.4 481.8 345.3 407.2 332.9 444.2 566.7 728.9 429.1 632.6 481.5 447.9 570.4 730.8 432.8 634.4 487.0 14860.2 4144.1 38846.6 -5993.1 -1736.3 12849.6 14867.3 4151.2 38853.6 -5986.1 -1729.2 12856.6 13143.0 2146.9 35662.3 -8949.8 -2355.6 7505.0 13269.9 2316.0 35831.4 -8808.9 -2242.8 7638.9 309.2 223.5 443.6 311.0 225.4 445.5 542.0 352.4 628.4 543.9 356.1 632.1 -14495.9 -16822.2 300.1 -14488.8 -16815.1 307.2 -14640.9 -17422.3 -457.0 -14591.5 -17337.7 -372.4 318.3 377.0 504.8 372.0 320.1 378.9 506.7 373.9 385.0 545.3 683.4 596.1 392.4 550.8 685.3 599.8 -4793.4 6321.7 11027.2 5373.6 -4786.3 6328.7 11034.2 5380.6 -6985.0 1774.4 10194.1 1963.1 -6872.2 1922.5 10363.3 2153.4 48 Pet Store Pharmacy Physiotherapist Police Post Office Real Estate Agency Religious Centers Restaurant School Shoe 
Store Spa Stadium Storage Train Station Travel Agency University Veterinary Care Zoo 285.8 429.4 353.1 252.8 269.4 515.7 287.6 431.2 355.0 254.6 271.3 517.6 452.0 643.7 667.6 349.2 504.4 668.7 455.7 647.4 673.1 352.9 508.1 672.4 -13001.7 7366.7 791.0 -15255.4 -12492.7 20820.6 -12994.6 7373.7 798.0 -15248.4 -12485.7 20827.7 -14005.4 5035.9 -544.4 -16701.6 -12755.2 19273.1 -13885.6 5176.9 -459.9 -16602.9 -12670.6 19449.4 565.0 566.9 729.1 732.8 24793.2 24800.3 21728.3 21883.4 602.3 488.4 363.4 307.5 225.0 394.6 334.7 398.7 403.4 336.8 604.2 490.3 365.2 309.4 226.8 396.4 336.6 400.6 405.3 338.6 745.4 741.3 607.3 456.4 553.0 582.3 545.1 545.0 557.5 619.2 752.8 745.0 611.0 460.1 554.9 586.0 547.0 548.7 559.3 624.8 32182.1 15330.2 18001.5 -8683.6 -13695.6 -6999.5 -10105.8 3926.0 24047.2 -3348.5 32189.1 15337.2 18008.6 -8676.6 -13688.5 -6992.4 -10098.8 3933.1 24054.3 -3341.5 26651.6 13283.1 11416.1 -9852.6 -13931.8 -7697.7 -10424.3 2523.5 21500.1 -3679.7 26912.4 13445.2 11528.9 -9732.7 -13875.4 -7606.1 -10389.0 2671.5 21627.0 -3538.8 70.8 72.7 144.4 148.1 -43402.7 -43395.6 -42907.2 -42879.0 Table A.6: AIC and BIC values of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city). 
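To make the role of AIC and BIC in this comparison concrete, the sketch below computes both criteria for a least-squares regression from its residual sum of squares. It assumes Gaussian errors and drops additive constants that are identical for models fitted to the same data; the exact formula used to produce Table A.6 is not stated in the text, so the values produced here are not expected to match the table. The function names and the numbers in the example are hypothetical.

```python
import math

def aic_bic(rss, n_samples, n_params):
    """Gaussian-likelihood AIC and BIC of a least-squares fit (constants dropped).

    rss: residual sum of squares, n_samples: number of observations,
    n_params: number of fitted coefficients (including the intercept).
    """
    fit_term = n_samples * math.log(rss / n_samples)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_samples)
    return aic, bic

def bic_prefers_composition(rss_size, k_size, rss_comp, k_comp, n_samples):
    """True when the composition model has the lower BIC on the same data."""
    _, bic_size = aic_bic(rss_size, n_samples, k_size)
    _, bic_comp = aic_bic(rss_comp, n_samples, k_comp)
    return bic_comp < bic_size

# Hypothetical numbers: a composition model with 12 parameters must reduce the
# residuals enough to offset its larger penalty relative to a 2-parameter size model.
large_gain = bic_prefers_composition(500.0, 2, 400.0, 12, 1000)
small_gain = bic_prefers_composition(500.0, 2, 499.0, 12, 1000)
```

In the first hypothetical case the drop in residuals outweighs the penalty for ten extra parameters, while in the second it does not, which is the behavior that guards against preferring a model only because it has more variables.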
49 50 Appendix B Cities Administrative Units and Populations City Administrative District Population Atlanta Atlanta 447,841 Total City Population 447,841 Austin Austin 885,400 Baltimore Baltimore Arbutus Halethorpe 622,104 20,483 N/A Birmingham Birmingham Vestavia Hills Mountain Brook Homewood Bessemer Fultondale Gardendale Tarrant Center Point Chalkville Trussville 212,237 34,018 20,359 25,750 27,053 8,752 13,735 6,285 16,864 3,829 20,368 Boston Boston Quincy Milton 645,966 92,271 27,003 885,400 642,587 389,250 51 Dedham Brookline Somerville Cambridge Watertown Chelsea Belmont 24,729 58,732 75,754 105,162 31,915 35,177 24,729 Buffalo Buffalo 258,959 Charlotte Charlotte Mint Hill Matthews Pineville 792,862 23,341 27,198 7,479 Chicago Chicago Lincolnwood Park Ridge Rosemont Schiller Park Norridge Hardwood Heights Bensenville Franklin Park River Groove Elmwood Park Northlake Stone Park Melrose Park River Forest Oak Park Maywood Bellwood Berkeley Hillside Forest Park Broadview Westchester North Riverside Berwyn Cicero La Grange Park Riverside Brookfield Lyons Stickney 2,718,782 12,590 37,480 4,202 11,793 14,572 8,612 18,352 18,333 10,227 24,883 12,323 4,946 25,411 11,172 51,878 24,090 19,071 5,209 8,193 14,167 7,932 16,718 6,672 56,800 84,103 13,579 8,875 18,978 10,729 6,786 1,121,438 258,959 850,880 52 Forest View La Grange Western Springs Hinsdale Mc Cook Summit Countryside Indian Head Park Hodgkins Burr Ridge Palos Park Palos Heights Crestwood Willow Springs Justice Bedford Park Bridgeview Hickory Hills Palos Hills Chicago Ridge Worth Hometown Oak Lawn Evergreen Park Alsip Merrionette Park Robbins Blue Island 698 15,550 12,975 16,816 228 11,054 5,895 3,809 1,897 10,559 4,847 12,515 10,950 5,524 12,926 580 16,446 14,049 17,484 14,305 10,789 4,349 56,690 19,852 19,277 1,900 5,337 23,706 Cincinnati Delhi Covedale Mack Bridgetown North Dent Cheviot Monfort Heights White Oak North College Hill Groesbeck Finneytown Amberley Deer Park Kenwood Fairfax Mariemont 296,943 
29,510 6,447 11,585 12,569 10,497 8,375 11,948 19,167 9,397 6,788 12,741 3,585 5,736 6,981 1,699 3,618,465 Cincinnati 453,968

Cleveland (total: 685,931)
    Cleveland: 396,815
    Cleveland Heights: 46,121
    University Heights: 13,539
    Shaker Heights: 28,448
    Maple Heights: 23,138
    Garfield Heights: 28,849
    Parma: 81,601
    Brook Park: 19,212
    Brooklyn: 11,169
    Rocky River: 20,213
    Fairview Park: 16,826

Columbus (total: 1,128,075)
    Columbus: 787,033
    Westerville: 36,120
    Huber Ridge: 4,883
    Worthington: 13,575
    Dublin: 41,751
    Hilliard: 28,435
    Upper Arlington: 33,771
    Marble Cliff: 573
    Grandview Heights: 6,536
    Lincoln Village: 9,482
    Urbancrest: 960
    Grove City: 36,832
    Obetz: 4,628
    Groveport: 5,540
    Blacklick Estates: 9,518
    Reynoldsburg: 36,347
    Bexley: 13,057
    Whitehall: 18,062
    Gahanna: 33,248
    New Albany: 7,724

Dallas (total: 2,435,949)
    Dallas: 1,197,816
    Richardson: 103,297
    Garland: 226,876
    Farmers Branch: 28,616
    Carrollton: 126,700
    Irving: 228,653
    University Park: 23,068
    Highland Park: 8,564
    Grand Prairie: 175,396
    Duncanville: 38,524
    Hutchins: 5,338
    Seagoville: 14,835
    Balch Springs: 23,728
    Mesquite: 139,824
    Sunnyvale: 5,130
    Rowlett: 56,199
    Sachse: 20,329
    Addison: 13,056

Denver (total: 1,757,830)
    Denver: 649,495
    Glendale: 4,184
    Englewood: 30,255
    Sheridan: 5,664
    Cherry Hills Village: 5,987
    Greenwood Village: 13,925
    Littleton: 41,737
    Lakewood: 142,980
    Edgewater: 5,170
    Wheat Ridge: 30,166
    Arvada: 111,707
    Berkley: 11,207
    Twin Lakes: 171
    Westminster: 106,114
    Sherrelwood: 18,287
    Welby: 14,846
    Commerce City: 45,913
    Derby: 7,685
    Thornton: 118,772
    Federal Heights: 11,973
    Northglenn: 35,789
    Aurora: 345,803

Detroit (total: 973,284)
    Detroit: 681,090
    Lincoln Park: 38,144
    Dearborn: 95,884
    Melvindale: 10,525
    Dearborn Heights: 57,774
    Highland Park: 11,629
    Hamtramck: 22,423
    Grosse Pointe Woods: 15,838
    Harper Woods: 13,990
    Grosse Pointe Farms: 9,316
    Grosse Pointe: 5,326
    Grosse Pointe Park: 11,345

Houston (total: 3,362,560)
    Houston: 2,195,914
    Seabrook: 11,952
    Kemah: 3,334
    Webster: 10,400
    Friendswood: 35,805
    Pearland: 108,715
    Fresno: 19,069
    Sugar Land: 83,860
    Greatwood: 6,640
    Rosenberg: 31,676
    Richmond: 11,081
    Pecan Grove: 15,881
    Mission Bend: 36,501
    Cinco Ranch: 18,274
    Katy: 14,102
    Cypress: 122,803
    Jersey Village: 7,620
    Hunters Creek Village: 4,367
    Bellaire: 16,855
    Spring: 54,298
    Aldine: 15,869
    Tomball: 10,753
    Humble: 15,133
    Porter: 25,627
    Atascocita: 65,844
    Huffman: 12,116
    Crosby: 2,299
    Highlands: 7,522
    Channelview: 38,289
    Jacinto City: 10,553
    Galena Park: 10,887
    Deer Park: 32,010
    La Porte: 33,800
    Pasadena: 149,043
    South Houston: 16,983
    Sheldon: 1,990
    Barrett: 3,199
    Cloverleaf: 22,942
    Four Corners: 2,954
    Meadows Place: 4,660
    Missouri City: 67,358
    Fifth Street: 2,059
    Brookside Village: 1,523

Indianapolis (total: 1,468,843)
    Indianapolis: 843,393
    Lawrence: 46,001
    Beech Grove: 14,192
    Warren: 1,239
    Franklin Township: 54,594
    Perry Township: 108,972
    Decatur: 9,362
    Speedway: 11,930
    Wayne: 136,828
    Camby: 32,388
    Pike Township: 77,895
    Washington Township: 132,049

Jacksonville (total: 1,007,094)
    Jacksonville: 821,784
    Lakeside: 30,943
    Orange Park: 8,412
    Oakleaf Plantation: 20,315
    Bellair-Meadowbrook Terrace: 13,343
    Atlantic Beach: 12,895
    Neptune Beach: 7,124
    Jacksonville Beach: 21,823
    Ponte Vedra Beach: 37,924
    Sawgrass: 4,942
    Palm Valley: 19,860
    Baldwin: 1,430
    Nassau Village-Ratliff: 5,337
    Callahan: 962

Las Vegas (total: 1,850,966)
    Las Vegas: 583,736
    North Las Vegas: 216,961
    Whitney: 38,585
    Winchester: 27,978
    Paradise: 223,167
    Henderson: 257,729
    Spring Valley: 178,395
    Summerlin South: 24,085
    Enterprise: 108,481
    Nellis AFB: 2,187
    Sunrise Manor: 189,372
    Blue Diamond: 290

Los Angeles (total: 6,428,879)
    Los Angeles: 3,884,307
    Santa Monica: 89,736
    Marina del Rey: 8,866
    Beverly Hills: 34,290
    Culver City: 38,883
    Inglewood: 109,673
    Burbank: 103,340
    La Crescenta-Montrose: 19,653
    La Canada Flintridge: 20,246
    Glendale: 196,021
    Pasadena: 137,122
    East Los Angeles: 126,496
    South Pasadena: 25,619
    San Marino: 13,147
    Vernon: 112
    Huntington Park: 58,114
    Bell: 35,477
    Bell Gardens: 42,072
    Florence-Graham: 63,387
    South Gate: 94,396
    Lynwood: 69,772
    Compton: 96,455
    Willowbrook: 35,983
    Long Beach: 462,257
    Carson: 91,714
    West Carson: 21,699
    View Park-Windsor Hills: 11,075
    Westmont: 31,853
    Lennox: 22,753
    Hawthorne: 84,293
    Gardena: 58,829
    El Segundo: 16,654
    Manhattan Beach: 35,135
    Redondo Beach: 66,748
    Torrance: 147,478
    Lomita: 20,256
    Rolling Hills: 1,860
    Palos Verdes Peninsula: (no value listed)
    Rancho Palos Verdes: 41,643
    Signal Hill: 11,465

Louisville (total not listed)
    Louisville: 609,893
    New Albany: 36,372
    Clarksville: 21,724
    Jeffersonville: 44,953
    Oak Park: 5,379
    Buckner: 4,000
    Crestwood: 1,999
    Mt Washington: 9,117
    Hillview: 8,172
    Brooks: 2,401
    Shepherdsville: 11,222
    Shively: 15,157
    St Matthews: 15,852
    Lyndon: 11,002
    Northfield: 970
    Rolling Hills: 907
    Anchorage: 2,264
    Middletown: 7,218
    Hurstbourne: 3,884

Memphis (total: 832,803)
    Memphis: 653,450
    West Memphis: 26,245
    Bartlett: 55,055
    Lakeland: 12,430
    Germantown: 39,161
    Collierville: 46,462

Miami (total: 800,216)
    Miami: 419,777
    Coral Gables: 49,631
    Coral Terrace: 24,380
    West Miami: 5,965
    Miami Springs: 13,809
    Gladeview: 14,468
    Hialeah: 224,669
    West Little River: 34,699
    El Portal: 2,325
    Miami Shores: 10,493

Milwaukee (total: 822,777)
    Milwaukee: 599,164
    Shorewood: 13,162
    Whitefish Bay: 14,137
    Glendale: 12,872
    Brown Deer: 12,088
    Bayside: 4,411
    Wauwatosa: 47,068
    West Allis: 60,732
    Greenfield: 37,072
    Greendale: 14,325
    Hales Corners: 7,746

Naples (total: 95,796)
    Naples: 19,537
    Vineyards: 3,375
    Golden Gate: 23,961
    Lely: 3,451
    Naples Manor: 5,562
    Lely Resort: 4,646
    Pelican Bay: 6,346
    Naples Park: 5,967
    East Naples: 22,951

Nashville (total: 737,796)
    Nashville: 626,681
    Ashland City: 4,541
    Millersville: 7,471
    Goodlettsville: 16,813
    Hendersonville: 54,068
    Mt Juliet: 28,222

New Orleans (total: 570,943)
    New Orleans: 378,715
    Marrero: 33,141
    Harvey: 20,348
    Gretna: 17,736
    Terrytown: 23,319
    Timberlane: 10,243
    Arabi: 8,093
    Chalmette: 17,119
    Meraux: 10,192
    Violet: 8,555
    St Bernard: 43,482

New York (total: 8,405,837)
    New York: 8,405,837

Oklahoma (total: 922,506)
    Oklahoma: 610,613
    Mustang: 17,395
    Yukon: 22,709
    Bethany: 19,563
    Piedmont: 5,720
    Edmond: 81,405
    The Village: (no value listed)
    Nichols Hills: 3,710
    Moore: 55,081
    Del City: 21,332
    Midwest City: 54,371
    Spencer: 3,746
    Jones: 2,517
    Choctaw: 15,205
    Harrah: 5,095
    McLoud: 4,044

Orlando (total: 493,524)
    Orlando: 244,483
    Clarcona: 2,990
    Pine Hills: 60,076
    Orlovista: 6,123
    Doctor Phillips: 10,981
    Williamsburg: 7,646
    Hunters Creek: 14,321
    Oak Ridge: 22,685
    Pine Castle: 10,805
    Conway: 13,467
    Belle Isle: 5,988
    Taft: 2,205
    Meadow Woods: 25,558
    Azalea Park: 12,556
    Winter Park: 29,203
    Goldenrod: 12,039
    Eatonville: 2,159
    Fairview Shores: 10,239

Philadelphia (total: 1,945,795)
    Philadelphia: 1,553,165
    Westville: 4,288
    Gloucester City: 11,402
    Mt Ephraim: 4,676
    Bellmawr: 11,540
    Barrington: 6,983
    Haddonfield: 11,507
    Collingswood: 13,850
    Camden: 76,903
    Cherry Hill: 71,722
    Pennsauken Township: 35,830
    Maple Shade Township: 19,043
    Riverton: 2,779
    Cinnaminson: 16,763
    Cheltenham: 4,810
    Glenside: 8,384
    Abington: 55,234
    Wyncote: 3,044
    Jenkintown: 4,422
    Rockledge: 2,550
    Flourtown: 4,538
    Wyndmoor: 5,498
    Plymouth Meeting: 6,177
    Darby: 10,687

Phoenix (total: 2,046,991)
    Phoenix: 1,445,632
    Tolleson: 6,756
    Glendale: 226,721
    Peoria: 162,592
    Sun City: 37,499
    Tempe: 161,719
    Guadalupe: 6,072

Pittsburgh (total: 466,879)
    Pittsburgh: 305,841
    Homestead: 3,165
    Whitaker: 1,271
    Munhall: 11,380
    Brentwood: 9,643
    Whitehall: 13,938
    Castle Shannon: 8,316
    Mt Oliver: 3,403
    Dormont: 8,593
    Scott Township: 17,024
    Green Tree: 4,431
    Carnegie: 7,972
    Ingram: 3,330
    Crafton: 5,951
    Rosslyn Farms: 427
    McKees Rocks: 6,104
    Stowe Township: 6,362
    Avalon: 4,705
    Bellevue: 8,370
    Reserve Township: 3,333
    Millvale: 3,744
    Sharpsburg: 3,446
    Aspinwall: 2,801
    Wilkinsburg: 15,930
    Edgewood: 3,118
    Rankin: 2,122
    Braddock: 2,159

Portland (total: 609,456)
    Portland: 609,456

Providence (total: 290,459)
    Providence: 177,994
    North Providence: 32,078
    Cranston: 80,387

Raleigh (total: 582,834)
    Raleigh: 431,746
    Cary: 151,088

Richmond (total: 262,944)
    Richmond: 214,114
    Bon Air: 16,366
    Bensley: 5,819
    East Highland Park: 14,796
    Lakeside: 11,849

Sacramento (total: 767,408)
    Sacramento: 475,122
    Rio Linda: 15,106
    North Highlands: 42,694
    Arden-Arcade: 92,186
    La Riviera: 10,802
    Rosemont: 22,681
    Parkway-South Sacramento: 36,468
    Florin: 47,513
    Vineyard: 24,836

Salt Lake (total: 210,806)
    Salt Lake: 186,440
    South Salt Lake: 24,366

San Antonio (total: 1,511,307)
    San Antonio: 1,409,019
    Somerset: 1,550
    Macdona: 559
    Helotes: 7,341
    Leon Valley: 10,151
    Terrell Hills: 4,878
    Castle Hills: 4,116
    Kirby: 8,673
    Shavano Park: 3,035
    Windcrest: 5,364
    Converse: 18,198
    Live Oak: 9,156
    Universal City: N/A
    Adkins: N/A
    Cibolo: 19,580
    Northcliff: 1,819
    Garden Ridge: 1,882
    Fair Oaks Ranch: 5,986

San Diego (total: 2,297,970)
    San Diego: 1,345,895
    Chula Vista: 243,916
    National City: 58,582
    Bonita: 12,538
    La Presa: 34,126
    Coronado: 24,697
    Spring Valley: 28,205
    La Mesa: 57,065
    Rancho San Diego: 21,208
    El Cajon: 99,478
    Santee: 53,413
    Granite Hills: 3,035
    Winter Gardens: 20,631
    Lakeside: 20,648
    Poway: 47,811
    Fairbanks Ranch: 3,148
    Rancho Santa Fe: 3,117
    Encinitas: 59,518
    Solana Beach: 12,867
    Del Mar: 4,161
    Escondido: 143,911

San Francisco (total: 837,442)
    San Francisco: 837,442

San Jose (total: 1,472,951)
    San Jose: 1,000,536
    Sunnyvale: 140,081
    Santa Clara: 116,468
    Fruitdale: 935
    Campbell: 39,349
    Saratoga: 29,926
    Los Gatos: 29,413
    Morgan Hill: 37,882
    East Foothills: 8,269
    Milpitas: 70,092

Seattle (total: 622,155)
    Seattle: 608,660
    White Center: 13,495

St Louis (total: 361,273)
    St Louis: 319,294
    Castle Point: 3,962
    Bellefontaine Neighbors: 10,828
    Jennings: 14,712
    Normandy: 5,008
    Northwoods: 4,208
    Pine Lawn: 3,261

Tampa (total: 742,583)
    Tampa: 347,645
    Town ’N’ Country: 78,442
    Egypt Lake-Leto: 35,282
    Greater Carrollwood: N/A
    Lake Magdalene: 28,509
    Cheval: 10,702
    Greater Northdale: 22,079
    Lutz: 19,344
    Thonotosassa: 13,014
    Temple Terrace: 24,541
    Del Rio: N/A
    Mango: 11,313
    Seffner: 7,579
    Brandon: 103,483
    Palm River-Clair Mel: 21,024
    Progress Village: 5,392
    Gibsonton: 14,234

Virginia Beach (total: 448,479)
    Virginia Beach: 448,479
    Greenbrier East: N/A

Washington (total: 1,267,943)
    Washington: 658,893
    Bethesda: 63,374
    Silver Spring: 76,716
    Friendship Village: 4,512
    Takoma Park: 16,715
    Hyattsville: 17,865
    Coral Hills: 9,895
    Suitland-Silver Hill: 33,515
    Hillcrest Heights: 16,469
    Marlow Heights: 5,618
    Temple Hills: 7,852
    Alexandria: 148,892
    Arlington: 207,627

Table B.1: Administrative units that overlap with our amenities data and their respective populations, taken from Wikipedia.