Spatial/temporal mismatch: a conflation protocol for Canada Census spatial files

NADINE SCHUURMAN
Department of Geography, Simon Fraser University, RCB 7123, Burnaby, BC, Canada V5A 1S6 (e-mail: suzanad@sfu.ca)

DARRIN GRUND
Faculty of Health Sciences, Simon Fraser University, WMC 2812, Burnaby, BC, Canada V5A 1S6 (e-mail: dmgrund@sfu.ca)

MICHAEL HAYES
Faculty of Health Sciences, Simon Fraser University, WMC 2812, Burnaby, BC, Canada V5A 1S6

SUZANA DRAGICEVIC
Department of Geography, Simon Fraser University, RCB 7123, Burnaby, BC, Canada V5A 1S6 (e-mail: suzanad@sfu.ca)

The Canadian Geographer / Le Géographe canadien 50, no 1 (2006) 74–84
© / Canadian Association of Geographers / L’Association canadienne des géographes

The Canada census is one of the chief sources of demographic and socio-economic data for researchers in this country. Census variables are linked to geography files that allow researchers using geographic information systems (GIS) to view and analyze spatial data. Some of the most useful analysis, however, is based on changes in attribute values over time and space. Analysis of spatio-temporal events such as shifting migration patterns or changes in the distribution of health status permits a more dimensioned perspective than the viewing of static spatial phenomena. The analysis of spatio-temporal phenomena is limited by major changes in the spatial framework (e.g., location of road networks and other spatial entities) between national censuses. This paper addresses this limitation by (i) illustrating the extent of spatial mismatch between the 1996 and the 2001 census; (ii) examining attempts to rectify this problem in other jurisdictions and (iii) presenting a ‘made-in-Canada’ solution for conflation of census geometries. We believe that this solution will enhance the ability of Canadian researchers to describe and analyze socio-economic, health and demographic shifts across time and space. The research is supported by an ftp site for downloading the census geography rectification software presented in this paper.

Résumé (translated from the French)
The census is a major source of demographic and socio-economic data for researchers in Canada. Census variables are linked to geographic files and allow researchers using geographic information systems to view and analyze spatial data. One of the most useful kinds of analysis, however, is based on changes in certain attributes over time and space. Spatio-temporal analysis of events, such as changing immigration patterns or changes in health status, permits a more dimensioned perspective than the viewing of static spatial phenomena alone. The analysis of spatio-temporal phenomena is limited by major changes in the spatial framework (e.g., the location of road networks and other spatial entities) between censuses. This article addresses this limitation by (1) illustrating the extent of the spatial disparities between the 1996 and 2001 censuses; (2) examining attempts to rectify this problem in other jurisdictions and (3) presenting a ‘made-in-Canada’ solution for the conflation of census geometries. We believe that this solution will increase the ability of Canadian researchers to describe and analyze socio-economic, health and demographic shifts across time and space. This research is supported by an ftp site for downloading the census geographic data rectification software presented in this article.

Introduction: The Canadian Census and the Pesky Problem of Static Cling

Geographic information system (GIS) analysis has historically been constrained by static data, that is, data captured at a single point in time. To understand how people and events shift over space and through time (for example, changing settlement patterns of recent immigrants or changes in the spatial distribution of health disparities), it is necessary to have spatial data for more than one period. Understanding how spatial phenomena change through time, in turn, permits understanding of the dynamism of geography, which is important to researchers and policy makers alike. To compare temporal changes, however, the sets of spatial data must coincide precisely in their geometry. That is, static features such as roads, lakes, buildings and administrative units must coincide for the time periods used.

The Canadian census is the most important source of socio-economic data in the country. It is used to study immigrant populations, the health of distinct clusters of Canadians, patterns of home ownership and a myriad of other spatial and statistical phenomena. The ability to study change over time for spatial events such as immigrant resettlement or the spatial distribution of health disparities depends on equivalent data between census years. For non-spatial attribute data such as age, education and income, effort is made to match categories across multiple censuses, although there are frequent changes to categories which confound comparisons. It is perhaps more important that the census geography (or the spatial reference system) remains comparable. In the absence of closely aligned spatial files including street networks, municipal boundaries and census units, it is difficult to understand spatio-temporal phenomena, and we are reduced instead to the study of static (non-dynamic) spatial events.

Unfortunately, the spatial unit used to report attributes is not consistent between the 1996 and the 2001 censuses. In 1996, the enumeration area (EA) was the smallest reporting unit, while in 2001, dissemination areas (DAs) were used. The latter are approximately 10 times smaller than EAs. Moreover, the spatial geometry of the two census periods is non-congruent. This paper addresses these incongruities and introduces a solution for their reconciliation, one that has enormous potential benefit to researchers and governments in Canada.

Differences in Spatial Geometry between 1996 and 2001 Street Network Files

The EA was the primary census spatial collection and dissemination unit until 1996. Until 2001, the EA was the smallest census unit available and offered the highest spatial and attribute resolution. EAs are composed of one or more neighbouring blocks. Blocks, in turn, are composed of block faces, defined as one side of a street between two consecutive features intersecting that street; the intersecting features can be other streets, geographic boundaries or limits of map tiles. Block-face representative points can be generated from these linear features and are commonly placed midway between the features intersecting the street and set back a distance of 22, 11, 5 or 1 m from the street centre line (Statistics Canada 2003). The block-face points are nodes with attribute values for population and dwelling counts on that block.

Street network files (SNF) were originally created in the early 1970s and served to delineate data collection (block faces) as well as to define EA boundaries. Figure 1 demonstrates this relationship. Using the same unit (e.g., EAs based on the SNF) for both collection and high-resolution dissemination created privacy and confidentiality issues and resulted in conflicts between the optimization of data collection and reporting.

Figure 1 Spatial relationship between features. Enumeration area boundaries are shown in relation to the street network and block-face points. (Legend: enumeration area boundary, block-face point, street; areas shown: EA 991, EA 992, EA 993 and EA 994.)

While the SNF is an integral part of defining EAs, its underlying weakness is the information used to create the data file. Original SNFs were created from disparate data sets at various scales using NAD27 as their datum. Their geometry was captured by digitizing paper maps, and is subject to all of the accuracy problems associated with manual digitizing (Martin 1996).

The 2001 census introduced new digital cartographic files for the generation of separate areas for data collection and dissemination and to address the accuracy problems associated with SNFs. While EAs remain the primary unit of collection, DAs are now the smallest geography for which census data are disseminated. DAs are composed of one or more city blocks within urban areas and primarily use the road network data file to define their spatial boundaries. The census continues to use road data to define the DA boundaries for 2001; however, the SNF was replaced by an updated road network file (RNF). The RNF contains more roads and has increased positional accuracy (large portions of the road network have been re-aligned to match the National Topographic Data Base), and the datum is now NAD83 (Statistics Canada 2001).

Statistics Canada created DAs in response to suggestions from the research community that they use spatial units that are compact, uniform and remain relatively stable over time (Purderer 2001). The design criteria for the generation of the DA boundaries were
• Temporal stability
• Reduced area suppression (minimum population)
• Uniformity (maximum population)
• Intuitive boundaries (visible)
• Compact shape (footnote 1).
Temporal stability was the DA feature most requested by the user community.
This feature requires that the DA boundaries respect the boundaries of both census tracts and census subdivisions. As census tract and census subdivision boundaries remain relatively stable over time, so in theory will the DA boundaries in the future. Uniformity between DAs is achieved by setting a target population of 500 people per DA (thereby violating the homogeneity factor). Setting this target population also helps to avoid potential data suppression for purposes of privacy. The irony remains that, to secure future temporal stability for the DAs, the existing EAs were discarded.

The shift from EAs to DAs is associated, however, with difficulties in performing comparisons between the 2001 census data and other historical census data, because the spatial frameworks are no longer equivalent. The problem of non-aligned census geographies is further compounded by significant differences in the SNF from 1996 to 2001. The differences are illustrated in Figure 2(a, b). The discrepancies between EAs and DAs are far more significant in urban/census metropolitan areas where street networks and EA/DA densities are the greatest. The enormity of the EA/DA problem in urban areas is illustrated by the fact that only 198 of more than 3300 DAs in the Greater Vancouver Regional District (GVRD) correspond to 1996 EAs on a one-to-one basis (Statistics Canada 2003). This low correspondence means that an EA to DA statistical correspondence file cannot be used for historical spatial analysis. This limitation has important consequences for Canadian geographers who seek to understand spatio-temporal shifts in socio-economic phenomena.

Figure 2 (a) Spatial shift of street features. The spatial shift between the 1996 street network and the 2001 road network can exceed 32 m in certain areas; in the example shown, the offset between nodes A and B is approximately 32 m. (b) Spatial mismatch of enumeration area (EA) and dissemination area (DA) boundaries. The spatial shift in the street networks results in the DA boundaries not lining up with their corresponding EA boundaries. Prior to conflation, DA 551 is not spatially congruent with the boundaries of EA 991, EA 992, EA 993 and EA 994; offset distances of 20 m or more and 30 m or more are marked.

1 Homogeneity in terms of population size was another factor requested by the user community, with the suggestion that dwelling type be used as a basis for homogeneity. To enlist the homogeneity factor as a design criterion would have required dwelling type counts to be generated by block from the 1996 EA-level census data. It was concluded, however, that the quality of such dwelling type estimates would be inadequate, and the homogeneity criterion was dropped (Purderer 2001).

Geographical Dimensions and Historical Comparisons: The Need for EA to DA Spatial Correspondence

The authors are part of a team at the Institute for Health Research and Education (IHRE) at Simon Fraser University, investigating ways of characterizing population health spatially using high-resolution (cadastral level) spatial data and a range of detailed attributes associated with this degree of granularity. The IHRE group was confronted with the EA/DA correspondence failure in the fall of 2002 when it attempted to characterize changes in the socio-economic conditions across the GVRD between 1996 and 2001. The temporal component of this analysis is a means of teasing out patterns in health outcomes related to social status that might not be evident in a static temporal frame. Upon discovering the lack of spatial correspondence between 1996 and 2001 SNF, IHRE initiated discussions with Statistics Canada.
Several meetings ensued during which it became evident that the spatial dimensions of correspondence had not been accounted for by Statistics Canada. The agency had done a good job of ensuring attribute correspondence, but it did not have a stock algorithm or the internal capability to reconcile the spatial geometries of the 1996 and 2001 censuses.

Discussions between Statistics Canada and the research group identified two possible methods of reconciliation between the 1996 and the 2001 SNF. The first method involved the use of an EA/DA correspondence file, while the second involved using the block-face points generated for the 1996 street network as a link between the two data sets. As the 1996 EAs and the 2001 DAs show a low one-to-one correspondence, the first method was discarded. Thus, it was decided to use the block-face points generated for the 1996 SNF as a link between the two data sets.

As the 1996 EAs are partially delineated from the street network, it is easy to see that a hierarchical relationship exists between the EAs (and DAs) and the block-face representative points. Data collected at the EA level are aggregations of data at the block-face level. Once the spatial shift between the 1996 EAs and the 2001 DAs is corrected, the data at the block-face level can be re-aggregated to match the 2001 DA geography. Figure 3 illustrates this process.

Although Statistics Canada maintains population and dwelling count data at the block-face level, data are no longer disseminated for individual block faces because of confidentiality concerns. Nor does Statistics Canada have a methodology for reconciling the 1996 and 2001 block-face data.
To reconcile attributes from the two censuses, the agency requested that our research team provide it with the adjusted DA boundaries, so that it could perform a custom tabulation of the 1996 census data (at block-face level) to match the 2001 DA geography. The process of this tabulation involves a point-in-polygon overlay, which assigns each block-face point to the appropriate DA (T. Brown, Statistics Canada 2003; Personal communication).

Figure 3 Re-tabulating the block-face points. Re-aggregating the block-face points from enumeration area (EA) boundaries to the re-aligned dissemination area (DA) boundaries. The re-aligned DA 551 is overlaid onto the EA geographical areas (EA 991, EA 992, EA 993 and EA 994), and census data for DA 551 are re-tabulated using the block-face points falling inside the DA.

In summary, the research established a process of census conflation based upon the following procedures.
• As the EA and DA boundaries do not share a one-to-one geometric correspondence, a concordance file between the two cannot be used.
• The 1996 block-face point was identified as a viable link between the two geographies.
• To be able to use the block-face points, we first had to align the 2001 DA boundaries to the 1996 street network from which the 1996 EAs were delineated. Otherwise, the DAs would include some block-face points they should not and exclude others that they should include.
• Once the boundaries have been aligned, 1996 census data held at the block-face level can be re-tabulated by Statistics Canada to reflect the 2001 DA geography (termed custom geography tabulation).
This custom geography tabulation can then be used to perform a historical analysis between the 1996 and the 2001 census data sets.
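The point-in-polygon re-tabulation described above can be illustrated with a minimal sketch. It is not the procedure used by Statistics Canada, whose block-face data are confidential; the ray-casting test, the function names, the DA identifier and the sample coordinates and counts are all assumptions made for illustration.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    `ring` is a list of (x, y) vertices; the last vertex is implicitly
    joined to the first. Points lying exactly on an edge are not treated
    specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def retabulate(block_face_points, da_polygons):
    """Sum block-face counts by the DA polygon each point falls inside.

    Points that fall in no polygon are collected under the key None.
    """
    totals = defaultdict(lambda: {"population": 0, "dwellings": 0})
    for p in block_face_points:
        da_id = None
        for candidate, ring in da_polygons.items():
            if point_in_polygon(p["x"], p["y"], ring):
                da_id = candidate
                break
        totals[da_id]["population"] += p["population"]
        totals[da_id]["dwellings"] += p["dwellings"]
    return dict(totals)


# Hypothetical data: one square, already re-aligned DA and three points.
da_polygons = {"DA551": [(0, 0), (100, 0), (100, 100), (0, 100)]}
block_face_points = [
    {"x": 10, "y": 50, "population": 40, "dwellings": 15},
    {"x": 90, "y": 20, "population": 25, "dwellings": 10},
    {"x": 130, "y": 50, "population": 60, "dwellings": 22},  # outside DA551
]
result = retabulate(block_face_points, da_polygons)
```

In this sketch the first two points are aggregated into the hypothetical DA551 and the third is left unassigned, which mirrors why the DA boundaries must be aligned to the 1996 street network before the overlay: a boundary offset of a few tens of metres moves points across the polygon edge.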
Once the prerequisite of reconciling the 1996 and 2001 SNF had been established, our research team set about developing a conflation methodology. The first step was to investigate other approaches to conflation.

Conflation for the Masses: Multiple Approaches and Dimensions of the Geometry Problem

Reconciling two geographical frameworks that represent the same spatial phenomena is referred to as conflation. There are a number of technical definitions for conflation (GIS/Trans Ltd 1995; Yuan and Tao 1999; Kang 2001; Veltkamp 2001; Rahimi et al. 2002), but, in its most general form, conflation is the reconciliation of different geometric descriptions of the same feature. Conflation must also account for inconsistencies between data sets. In the simplest of cases, it involves a one-to-one matching between features in two data sets where both the coverages host the same features. In other cases, there is a one-to-none or one-to-many correspondence. This complicates the process as decisions must be made about which map source should be treated as reliable for feature matching. In many cases, such decisions must be made on a feature-by-feature basis.

Theory and prototypes for map conflation were developed between 1983 and 1985 at the United States Bureau of the Census. These involved protocols for point and feature mapping as well as a number of computer science algorithms that drew heavily from mathematical theories of generalization and numerical and geometric theory (Saalfeld 1993). Early algorithms focused on point-based conflation, whereas most map features are polygons, necessitating shape analysis (Doytsher et al. 2001). Shape matching enables registration, approximation and simplification of linear and polygon features (Veltkamp 2001). Current conflation reflects the larger GIScience field in its experimentation with fuzzy logic, rough sets, component tool kits and agent-based solutions for conflation (Rahimi et al. 2002).
Recent commercial efforts to incorporate artificial intelligence in conflation (as opposed to rule-based expert systems) are reflected in programs like CONFLEX developed by Digital Corporation (2004). In each case, however, the goal of conflation is to reconcile similar but non-coincident spatial files corresponding to the same area of the earth’s surface.

Conflation is not a one-step process; it involves multiple tasks, including reconciling feature location discrepancies (such as sliver polygons and edge matching), adding new features to a coverage where they were not previously represented and integrating new attributes into a spatial data set (Yuan and Tao 1999). The algorithmic steps involved in map conflation have been characterised by Saalfeld (cited in Kang 2001) as
• identify potential matching pairs of point features
• rubber sheet the first map to align with the second, based on the pairs identified in step one
• repeat until no new matches are found.

Despite automated approaches to many conflation problems, human intervention remains a reality of conflation. This is despite the optimistic development of software-based agents and numerous automated routines (Yuan and Tao 1999). Conflation is like automated generalization (elimination of map detail as scale decreases) in this respect. There have been numerous attempts to automate generalization, but none of the technical solutions have been able to fully incorporate the flexibility of human judgement (Schuurman 1999). Operator intervention in conflation is, however, costly and time consuming, accounting for over 90 percent of time to conflate for only 5 percent of matches on a given project (Yuan and Tao 1999).

Conflation is not a fully fledged automated procedure nor is it a purely technical endeavour. There are numerous institutional dimensions of conflation.
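The iterative match, rubber-sheet and repeat loop attributed to Saalfeld above can be sketched in a few lines. This is a simplified illustration and not Saalfeld's algorithm or any commercial implementation: matching is assumed to be nearest-neighbour within a distance tolerance, and true rubber sheeting (usually a piecewise transformation over a triangulation) is replaced here by an inverse-distance-weighted blend of the link displacements. All names, coordinates and the tolerance are hypothetical.

```python
import math


def match_pairs(source, target, tolerance):
    """Pair each source point with its nearest target point within tolerance."""
    pairs = []
    for i, (sx, sy) in enumerate(source):
        best_j, best_d = None, tolerance
        for j, (tx, ty) in enumerate(target):
            d = math.hypot(tx - sx, ty - sy)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs


def warp(source, target, pairs, power=2.0):
    """Shift every source point by an inverse-distance-weighted blend of the
    displacement vectors (links) defined by the matched pairs.
    Assumption: a stand-in for triangulation-based rubber sheeting."""
    links = [(source[i], (target[j][0] - source[i][0],
                          target[j][1] - source[i][1])) for i, j in pairs]
    warped = []
    for x, y in source:
        wsum = dx = dy = 0.0
        exact = None
        for (lx, ly), (ldx, ldy) in links:
            d = math.hypot(x - lx, y - ly)
            if d == 0.0:
                exact = (ldx, ldy)  # a linked point moves exactly onto its match
                break
            w = 1.0 / d ** power
            wsum += w
            dx += w * ldx
            dy += w * ldy
        if exact is not None:
            dx, dy = exact
        elif wsum > 0.0:
            dx, dy = dx / wsum, dy / wsum
        warped.append((x + dx, y + dy))
    return warped


def conflate(source, target, tolerance, max_rounds=10):
    """Match, warp and repeat until a round produces no new matches."""
    source = list(source)
    matched = set()
    for _ in range(max_rounds):
        pairs = match_pairs(source, target, tolerance)
        new = set(pairs) - matched
        if not new:
            break
        matched |= new
        source = warp(source, target, pairs)
    return source, sorted(matched)


# Hypothetical nodes: four corners offset by (20, 10) and a centre node
# that is too far from its counterpart to match in the first round.
source = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
target = [(20, 10), (120, 10), (120, 110), (20, 110), (75, 60)]
aligned, links = conflate(source, target, tolerance=25.0)
```

In this example the centre node only comes within tolerance of its counterpart after the first warp has dragged it along with the corners, which is the reason the steps are repeated rather than applied once.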
Creating correspondences between different spatial descriptions of the same features is the basis for extending the resolution and scope of spatial data and their attributes. The larger goal of our population health research program is to develop a high-resolution basis for investigating subtle shifts in health characteristics of population. It is an example of research that requires co-operation between multiple institutions to build the appropriate spatial data set. The development of integrated data sets for specific analyses is part of a broader trend away from proprietary, single-use data that are owned by institutions and agencies (Schuurman 2002). This trend is opposed by a long tradition of institutional ownership and jurisdiction over data and is countered in Canada by prevailing traditions of cost-recovery for data (Klinkenberg 2003). Disjoint spatial databases such as municipal cadastre systems are often isolated pockets of potential analysis, but their true power in aiding spatial and temporal analysis remains to be realized. In the absence of map conflation, such analysis is stymied by curtailed spatial data with no temporal depth.

Despite recognition of conflation as a GIS process for over two decades, the tendency to work in local jurisdictional environments with little data sharing has, to date, forestalled geographical research (Rahimi et al. 2002). Depth and extent of spatial data are achieved through multiple individual conflation and integration efforts. As in Canada, many local governments in the US maintain their own large-scale data sets in the form of digital orthophotos, parcel maps and road networks. There are discrepancies, however, between these data and the Census Bureau’s map data sets. One example is Vermont’s efforts to develop spatial and attribute correspondence between the state's data collection, held by the Vermont Center for Geographic Information, and the Topologically Integrated Geographic Encoding and Referencing (TIGER) files used by the US census (Sperling and Sharp 1999).

Another state conflation project that shares similarity to Canada’s EA/DA mismatch is a map conflation procedure developed to correct discrepancies between the US Census Bureau and local government data in Delaware County, Ohio (Kang 2001). A data integration system, based on geometric principles as opposed to attribute information matching, was implemented in ESRI’s ARCVIEW as an interactive cartographic system and is similar to the program developed by the authors in the ARCGIS environment (described below). This tool enabled the administrators of Delaware County, Ohio, to update 2000 collection blocks, correct inaccurate addresses and identify missing housing units in multiple locations (Kang 2001).

Few people realize, however, the degree to which clerical and GIS resources are strained by map conflation projects (Sperling and Sharp 1999). Frequently, there is no extra budget to extend data sets and insufficient expertise. Add to this the cost of purchasing powerful conflation software, and conflation may remain on the back burner. To add insult to injury, when organizations do invest in appropriate software, it often requires customization by skilled programmers to suit local needs. On the other hand, the need for conflation is burgeoning as local governments and businesses want to add global positioning system and other detailed information to existing infrastructural coverages such as that provided by the Canada census. Map conflation is a precursor to statistical, epidemiological and sociological analysis of many spatial phenomena (Kang 2001).
Frequently, conflation involves integration of two geospatial data sets, in which one is acknowledged to be the superior source, but the other contains valuable attribute and/or spatial information. This is not necessarily the case with the 1996 and 2001 census, although the geometric descriptions of the latter are currently accepted as more reliable. Conflation of census to census data encompasses many traditional aspects of conflation. It is simpler, however, in that semantic conflation is easier, because similarities between attributes from different years have a limited range of associated meaning. The category (females zero to fourteen years) has, after all, only so many interpretations. The challenges of geometric conflation were, however, made evident, as we struggled to match the 1996 and 2001 geometries.

A Made-in-Canada Solution: Census-Specific Conflation

Our group thus began an investigation into how best to reconcile the spatial definitions used in the two recent censuses. This was critical to the research project; without such correspondence, it would be impossible to superimpose more detailed (cadastre level) spatial data and their associated variables (e.g., property tax assessment) with more aggregated socio-economic variables from the two censuses to illustrate trends over five years. The first step towards conflating the two geometries entailed a search for off-the-shelf software that addressed the problem. There are a number of expensive and encompassing software packages marketed for conflation. Like most university research groups, however, we wanted to avoid paying for software features that were not specific to our particular problem. Moreover, we were concerned that other Canadian researchers would be restricted in their analysis without the identification or development of a mechanism for census conflation.
We also recognized that a Canada-wide standard for rectification of census geography is imperative for developing comparable research in different parts of the country, especially in an area like spatialized health research because of the possible sensitivity of research results.

Conflation software capable of automatic boundary alignment comes at a hefty price. Comprehensive products include GIS/TRANS GIS/T-CONFLATE software and LAND BASE SYSTEMS TOTAL FIT software. A decision was made instead to develop an in-house solution at IHRE that could be used by other members of the Canadian research community interested in comparing 1996 and 2001 spatial data. The development of a custom conflation software tool for EA/DA alignment was possible, because it represents an isolated and limited geometrical problem rather than a host of one-to-one, one-to-many, many-to-one and many-to-many spatial and attribute non-correspondences. The focus was to create a simple program that allows the user to automatically or manually create links between two data sets and then aligns one data set to the other based upon these links.

Many university researchers in Canada (and the United States) use ESRI’s software products for GIS analysis, a product of very effective market seeding based on academic discounts. Development of a conflation mechanism for census geometries in ESRI’s newest environment, ARCGIS 9.0, was therefore the obvious choice. The ARCGIS 9.0 software package provides numerous editing tools designed to overlay and match one data set over another. However, an editing tool that automatically adds links between the edit coverage (the coverage the user is adjusting) and the snap coverage (the data set to which the user is adjusting the edit coverage) is not available with the current version of the software.
Therefore, the research team developed code and an application tool in the ARCGIS environment that would automate this as part of the data conflation process. The interface to the census conflation tool is shown in Figure 4. Additional information and the application tool can be downloaded from the http://www.gis.sfu.ca website.

Figure 4 The census conflation tool: autolink form interface

Validation: Ensuring That the Dots Line Up

Substantiation of the process and determination of its limitations were important aspects of the project. Validation focused on (i) how well the EA/DA conversion was performed and (ii) whether the 1996 block-face points were attributed to the correct corresponding DAs. The team investigated the number of block-face points that had been assigned to new DAs. These numbers were obtained by overlaying the 1996 block-face point coverage with that of the DA boundaries at several stages of the rectification process. First, the block-face point coverage was overlaid with the DA boundaries before any adjustment was performed. The same overlay was then performed with the DA coverage subsequent to the autolink adjustment, the manual adjustment and finally after the quality control adjustment. After each overlay, the attribute for the DA identifier (DAUID) was renamed to produce multiple fields for the DA ID. This sequence of layers was then analyzed to determine how membership changed at various stages. For example, after all overlays were completed, the final block-face point coverage contained fields named DAUID_BEFORE, DAUID_ALINK and DAUID_MANUAL. Block-face point features could then be selected according to whether their DAUID attribute had changed after each adjustment process. The efficiency of the autolink feature could be further assessed by a series of more complex queries.
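As an illustration of how the three DAUID fields can be compared, the following sketch classifies each block-face point by how its DA membership changed between stages. It is not the authors' tool, which ran as attribute queries inside ARCGIS; the category labels and the sample DAUID values are hypothetical, and the DAUID after manual adjustment is assumed to be the correct assignment.

```python
from collections import Counter


def classify(point):
    """Classify one block-face point from its DA identifier at three stages.

    Assumption: DAUID_MANUAL (after manual adjustment) is the reference.
    """
    before = point["DAUID_BEFORE"]
    alink = point["DAUID_ALINK"]
    manual = point["DAUID_MANUAL"]
    if before == alink == manual:
        return "unchanged"
    if before == alink:
        # Autolink left the point where it was, manual editing moved it.
        return "missed by autolink"
    if alink == manual:
        # Autolink moved the point and manual editing agreed.
        return "corrected by autolink"
    if before == manual:
        # Autolink moved a correctly placed point, manual editing restored it.
        return "wrongly moved by autolink"
    # All three identifiers differ.
    return "moved by autolink to another wrong DA"


# Hypothetical records; the identifiers are placeholders, not real DAUIDs.
points = [
    {"DAUID_BEFORE": "D1", "DAUID_ALINK": "D1", "DAUID_MANUAL": "D1"},
    {"DAUID_BEFORE": "D1", "DAUID_ALINK": "D2", "DAUID_MANUAL": "D2"},
    {"DAUID_BEFORE": "D1", "DAUID_ALINK": "D1", "DAUID_MANUAL": "D2"},
    {"DAUID_BEFORE": "D1", "DAUID_ALINK": "D2", "DAUID_MANUAL": "D1"},
    {"DAUID_BEFORE": "D1", "DAUID_ALINK": "D2", "DAUID_MANUAL": "D3"},
]
counts = Counter(classify(p) for p in points)
```

Because the five branches are mutually exclusive and cover every combination of equal and unequal identifiers, the counts for a coverage sum to the number of block-face points, which gives a simple consistency check on any tabulation built from them.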
For example, points which had not been corrected by the autolink feature, or which it had moved into an incorrect DA, were found by the following queries.

For those points which autolink failed to correct:
    "DAUID_BEFORE" = "DAUID_ALINK" AND "DAUID_BEFORE" <> "DAUID_MANUAL"

For those which were initially assigned to the wrong DA and were moved by autolink to another DA which was incorrect:
    "DAUID_BEFORE" <> "DAUID_ALINK" AND "DAUID_BEFORE" <> "DAUID_MANUAL" AND "DAUID_ALINK" <> "DAUID_MANUAL"

And for those which had been incorrectly moved out of their correct DA:
    "DAUID_BEFORE" <> "DAUID_ALINK" AND "DAUID_BEFORE" = "DAUID_MANUAL"

The resulting numbers were broken down by municipality and are displayed in Table 1. Summarization by municipality was achieved by overlaying the point coverage with the available GVRD land-use coverage, which contained the attribute for each municipality. The table shows that the autolink program was able to match up some of the boundaries correctly. More than half of the adjustments (4,607 of 8,044), however, were ultimately performed manually to compensate for autolink errors and for boundaries missed by the automated process.

Table 1 Statistics of adjusted block-face points for municipalities within the Greater Vancouver Regional District (GVRD)

Columns: (1) Total block-face points; (2) AUTO LINK corrections; (3) Manual corrections missed by AUTO LINK; (4) Manual corrections of AUTO LINK A*; (5) Manual corrections of AUTO LINK B**; (6) Manual quality check corrections; (7) Total manual corrections; (8) Total corrections

Municipality                |    (1) |   (2) |   (3) | (4) |   (5) | (6) |   (7) |   (8)
Bowen Island Municipality   |    463 |     0 |     4 |   0 |     0 |     |     4 |     4
City of Burnaby             |  8,088 |   337 |   241 |   8 |    96 |  10 |   355 |   692
City of Coquitlam           |  5,690 |   236 |   266 |   8 |    57 |     |   331 |   567
City of Langley             |  1,061 |    58 |    74 |   1 |    17 |     |    92 |   150
City of New Westminster     |  2,229 |   105 |    88 |   4 |    42 |   2 |   136 |   241
City of North Vancouver     |  1,658 |    53 |    33 |   0 |    16 |   7 |    56 |   109
City of Port Coquitlam      |  1,399 |    44 |    64 |   1 |    13 |     |    78 |   122
City of Port Moody          |  1,302 |    56 |    99 |   0 |    18 |     |   117 |   173
City of Richmond            |  5,176 |   205 |   293 |  10 |   135 |     |   438 |   643
City of Surrey              | 16,974 |   699 |   757 |  15 |   250 |  61 | 1,083 | 1,782
City of Vancouver           | 18,150 |   842 |   438 |  33 |   187 |   4 |   662 | 1,504
City of White Rock          |    163 |     5 |     2 |   1 |     1 |     |     4 |     9
District of Delta           |  5,380 |   141 |   161 |   6 |   104 |     |   271 |   412
District of Maple Ridge     |  3,329 |   170 |    77 |   3 |    36 |     |   116 |   286
District of North Vancouver |  4,204 |   139 |   108 |   4 |    61 |     |   173 |   312
District of Pitt Meadows    |    782 |    12 |    30 |   1 |    12 |     |    43 |    55
District of West Vancouver  |  3,246 |    96 |   169 |  10 |    44 |  17 |   240 |   336
Electoral Area A            |  3,313 |   114 |   123 |   0 |    16 |     |   139 |   253
Township of Langley         |  5,147 |   124 |   188 |   4 |    70 |     |   262 |   386
Village of Anmore           |    108 |     0 |     0 |   0 |     0 |     |     0 |     0
Village of Belcarra         |     73 |     0 |     0 |   0 |     0 |     |     0 |     0
Village of Lions Bay        |    130 |     1 |     4 |   0 |     3 |     |     7 |     8
N/A                         |     69 |       |       |     |       |     |       |
GVRD                        | 88,134 | 3,437 | 3,219 | 109 | 1,178 | 101 | 4,607 | 8,044

*A refers to those points which were originally allocated to the wrong dissemination area (DA) but were allocated by AUTO LINK to another DA that was still incorrect.
**B refers to those points which were originally allocated to the correct DA but were subsequently assigned to another DA by AUTO LINK.

One of These Things Is Not the Same: The Problem with Matching Attributes

It may be evident to some readers that a problem of some magnitude remains after the spatial geometry is reconciled: re-distributing the attributes from 1996 (disseminated at the EA level) to match the 2001 DAs. The districting matches, but the attributes are still being reported using different spatial units. We contacted Statistics Canada about this and found that their office is happy to do this, but at a cost. It is compulsory, however, that Statistics Canada do the attribute reconciliation between the newly convergent EA/DA geometries, because they are the sole owners of the block-face-level attribute data, which are protected for privacy reasons. Data are collected at the block-face level but distributed at the EA/DA level.
The tabular data to which university and government researchers have access do not contain a link down to the block-face level, for reasons of confidentiality. Statistics Canada can provide a custom tabulation of the 1996 block-face points to the rectified DA boundaries for 2001. Obtaining the custom tabulation allows direct linkage between attributes of the 1996 and 2001 censuses. This, in turn, enables the researcher to compare data at a higher resolution (DA vs. EA).

An example is provided by the analysis of health status based on mortality statistics. At a higher resolution, even small errors in assigning deaths to specific locations (arising from spatial mismatch of underlying frameworks) can cause large increases in estimated mortality rates, particularly when the aim is to aggregate small areas into income quintiles. There is no reason to assume that such errors would be randomly distributed within an urban area, particularly as re-development in inner city locations typically results in the displacement of lower-income groups by higher-income ones, as in the case of Vancouver’s Concord Pacific and Yaletown developments.

From Principle to Practice: Implementation of Census Conflation for Spatial Analysis

The utility of this software and methodology warrants substantiation, and this can be achieved by visiting the context for this tool kit. The IHRE research group is interested in examining gradients of social inequality and ultimately linking these to sentinel health conditions. Assessment of ‘at-risk’ areas from a socio-economic perspective using existing census data requires working at as high a resolution as possible. Thus, census tracts, although they have matching geometries, are not appropriate for this analysis.
Integration of high-resolution socio-economic data from the 1996 and 2001 Canada censuses allows researchers to conduct historical analyses of economic and social conditions as well as to take into account external conditions such as inward and outward migration. These data can be used to portray changes in the health of the population more explicitly than a static picture. This dynamic framing of health indicators can also be used to measure the influence of federal and provincial policies on a population’s social and economic conditions and can be correlated with health outcomes.

Conclusion: Assessing the Functionality and Weighing the Benefits

There are two components for assessing the functionality of our proposed solution to the EA/DA conflation problem. The first is technical and involves determination of the extent to which the labour involved was warranted by the results and whether the software is usable. The second is more abstract and involves an assessment of the degree to which conflation of the 1996 and 2001 Canada census data will improve the quality and extent of socio-demographic and health research in Canada.

The first of these is easier to assess, as it can be described using metrics. The adjustment of the DA boundaries to agree with the 1996 street network data for the GVRD census geographies took 70–75 hours of work; this included the manual adjustments that were required. The autolink function was found to operate best in areas which had a grid system street configuration, such as the City of Vancouver. It did not, however, function optimally in dense urban areas with non-uniform configuration (crescent streets, etc.) such as those found in the cities of Surrey and Richmond. In these areas, there was a significant number of cases in which the autolink feature allocated the wrong DA IDs to the block-face points.

The usability of the program is not in question. It is simple to use and operate.
Processing time may be a concern when using very large data sets, although projects of such scope would probably have the resources to purchase and implement large-scale conflation software. Future improvements to this software might include the ability to add and delete nodes. It would also be salutary to be able to measure the overall degree to which geometric accuracy is compromised by the re-allocation of DAs based upon the 1996 SNF, bearing in mind that the accuracy of the street files is in question anyway given the degree of variation between the 1996 and 2001 files. Analytical accuracy is, of course, greatly enhanced, given that spatio-temporal comparisons between 1996 and 2001 are not otherwise possible.

The second axis of assessment, contribution, is more difficult to analyze. It depends not only on the extent of uptake and the technical satisfaction of future users but also on the resources necessary to create correspondence between DAs and EAs and on the impact of the resulting analysis. The issue of resources is a subtle one, because a research group that is already taxed in terms of technical expertise will not appreciate the incursion of this exercise to the tune of 70 hours. The impact can, however, be priceless. As the examples above illustrate, there are a number of spatial analyses that would otherwise never be accomplished, with an associated cost to knowledge in areas as diverse as immigration studies, population health and economics.

The chief contribution of this methodology and software is the ability to incorporate a spatial dimension into temporal analysis of Canada census data. It thus permits the transformation of static spatial features and events into spatio-temporal entities, with the greater dimensionality that such entities encompass.
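The technical side of this assessment can be made a little more concrete with simple arithmetic on the GVRD totals reported in Table 1. The sketch below is illustrative only: the variable and function names are hypothetical, and the percentages are derived here rather than stated in the text.

```python
# GVRD totals as reported in Table 1.
TOTAL_POINTS = 88_134          # block-face points overlaid
AUTOLINK_CORRECTIONS = 3_437   # corrections made by autolink
MANUAL_CORRECTIONS = {
    "missed_by_autolink": 3_219,
    "autolink_error_a": 109,    # wrong before, moved to another wrong DA
    "autolink_error_b": 1_178,  # correct before, moved out by autolink
    "quality_check": 101,
}

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

manual_total = sum(MANUAL_CORRECTIONS.values())
all_corrections = AUTOLINK_CORRECTIONS + manual_total

print("manual corrections:", manual_total)
print("all corrections:", all_corrections)
print("share of points corrected (%):", percent(all_corrections, TOTAL_POINTS))
print("share of corrections done manually (%):", percent(manual_total, all_corrections))
```

On these totals, roughly 9% of the block-face points required a correction, and somewhat more than half of the corrections were made manually.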
Acknowledgements

This research was made possible through the support of the Canadian Institute for Health Information, Canadian Population Health Initiative research project ‘Urban Structures, Population Health and Public Policy’. We thank Ted Brown, Regional Advisor, Statistics Canada Pacific Region, for his assistance with this endeavour. Luan Vo, research assistant, provided invaluable assistance.