Spatial/temporal mismatch: a conflation protocol
for Canada Census spatial files
NADINE SCHUURMAN
Department of Geography, Simon Fraser University, RCB 7123, Burnaby, BC, Canada V5A 1S6 (e-mail: suzanad@sfu.ca)
DARRIN GRUND
Faculty of Health Sciences, Simon Fraser University, WMC 2812, Burnaby, BC, Canada V5A 1S6 (e-mail: dmgrund@sfu.ca)
MICHAEL HAYES
Faculty of Health Sciences, Simon Fraser University, WMC 2812, Burnaby, BC, Canada V5A 1S6
SUZANA DRAGICEVIC
Department of Geography, Simon Fraser University, RCB 7123, Burnaby, BC, Canada V5A 1S6 (e-mail: suzanad@sfu.ca)
The Canada census is one of the chief sources of demographic and socio-economic data for researchers in this country. Census variables are linked to geography files that allow researchers using geographic information systems (GIS) to view and analyze spatial data. Some of the most useful analysis, however, is based on changes in attribute values over time and space. Analysis of spatio-temporal events such as shifting migration patterns or changes in the distribution of health status permits a more dimensioned perspective than the viewing of static spatial phenomena. The analysis of spatio-temporal phenomena is limited by major changes in the spatial framework (e.g., location of road networks and other spatial entities) between national censuses. This paper addresses this limitation by (i) illustrating the extent of spatial mismatch between the 1996 and the 2001 census; (ii) examining attempts to rectify this problem in other jurisdictions and (iii) presenting a ‘made-in-Canada’ solution for conflation of census geometries. We believe that this solution will enhance the ability of Canadian researchers to describe and analyze socio-economic, health and demographic shifts across time and space. The research is supported by an ftp site for downloading the census geography rectification software presented in this paper.
The Canadian Geographer / Le Géographe canadien 50, no 1 (2006) 74–84
© / Canadian Association of Geographers / L’Association canadienne des géographes
Introduction: The Canadian Census and the Pesky Problem of Static Cling
Geographic information system (GIS) analysis has historically been constrained by static data, that is, data captured at a single point in time. To understand how people and events shift over space and through time (for example, changing settlement patterns of recent immigrants or changes in the spatial distribution of health disparities), it is necessary to have spatial data for more than one period. Understanding how spatial phenomena change through time, in turn, permits understanding of the dynamism of geography, which is important to researchers and policy makers alike. To compare temporal changes, however, the sets of spatial data must coincide precisely in their geometry. That is, static features such as roads, lakes, buildings and administrative units must coincide for the time periods used.
The Canadian census is the most important source of socio-economic data in the country. It is used to study immigrant populations, the health of distinct clusters of Canadians, patterns of home ownership and a myriad of other spatial and statistical phenomena. The ability to study change over time for spatial events such as immigrant resettlement or the spatial distribution of health disparities depends on equivalent data between census years. For non-spatial attribute data such as age, education and income, effort is made to match categories across multiple censuses, although frequent changes to categories confound comparisons. It is perhaps more important that the census geography (or the spatial reference system) remains comparable. In the absence of closely aligned spatial files including street networks, municipal boundaries and census units, it is difficult to understand spatio-temporal phenomena, and we are reduced instead to the study of static (non-dynamic) spatial events. Unfortunately, the spatial unit used to report attributes is not consistent between the 1996 and the 2001 censuses. In 1996, the enumeration area (EA) was the smallest reporting unit, while in 2001, dissemination areas (DAs) were used. The latter are approximately 10 times smaller than EAs. Moreover, the spatial geometry of the two census periods is non-congruent. This paper addresses these incongruities and introduces a solution for their reconciliation, one that has enormous potential benefit to researchers and governments in Canada.
Differences in Spatial Geometry between 1996 and 2001 Street Network Files
The EA was the primary census spatial collection and
dissemination unit until 1996. Until 2001, the EA was
the smallest census unit available and offered the
highest spatial and attribute resolution. EAs are composed of one or more neighbouring blocks. Blocks, in
turn, are composed of block faces, defined as one
side of a street between two consecutive features
intersecting that street; the intersecting features
can be other streets, geographic boundaries or limits
of map tiles. Block-face representative points can be
generated from these linear features and are commonly placed midway between the features intersecting the street and set back a distance of 22, 11,
5 or 1 m from the street centre line (Statistics Canada
2003). The block-face points are nodes with attribute
values for population and dwelling counts on that
block. Street network files (SNF) were originally created in the early 1970s and served to delineate data
collection (block faces) as well as to define EA boundaries. Figure 1 demonstrates this relationship.
Using the same unit (e.g., EAs based on the SNF) for both collection and high-resolution dissemination created privacy and confidentiality issues and resulted in conflicts between the optimization of data collection and reporting. While the SNF is an integral part of defining EAs, its underlying weakness is the information used to create the data file. Original SNFs were created from disparate data sets at various scales using NAD27 as their datum. Their geometry was captured by digitizing paper maps, and is subject to all of the accuracy problems associated with manual digitizing (Martin 1996).
Figure 1
Spatial relationship between features. Enumeration area boundaries (EA 991 to EA 994) are shown in relation to the street network and block-face points
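As a rough illustration of how a block-face representative point of the kind described above could be derived, the following Python sketch places a point midway along a street segment and sets it back perpendicular to the centre line. The function name, the `side` convention and the treatment of the segment as a straight line in projected metre coordinates are assumptions made for this example; they are not taken from the Statistics Canada procedure.

```python
import math


def block_face_point(start, end, setback_m=22.0, side="left"):
    """Return an illustrative representative point for one block face.

    start, end -- (x, y) positions, in metres, of the two features that
                  intersect the street and bound the block face
    setback_m  -- perpendicular offset from the street centre line
                  (the text mentions 22, 11, 5 or 1 m)
    side       -- 'left' or 'right' of the direction start -> end
                  (a convention assumed for this sketch)
    """
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("start and end must be distinct points")
    # Midpoint of the street segment.
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Unit normal pointing to the left of the direction of travel.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    sign = 1.0 if side == "left" else -1.0
    return (mx + sign * setback_m * nx, my + sign * setback_m * ny)


# A 100 m east-west segment with the point set back 22 m on either side.
left_point = block_face_point((0.0, 0.0), (100.0, 0.0), setback_m=22.0, side="left")
right_point = block_face_point((0.0, 0.0), (100.0, 0.0), setback_m=22.0, side="right")
print(left_point, right_point)
```

Each side of the street yields its own point, which is consistent with a block face being defined as one side of a street between two intersecting features.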
The 2001 census introduced new digital cartographic files for the generation of separate areas
for data collection and dissemination and to
address the accuracy problems associated with
SNFs. While EAs remain the primary unit of collection, DAs are now the smallest geography for which
census data are disseminated. DAs are composed of
one or more city blocks within urban areas and,
primarily, use the road network data file to define
their spatial boundaries. The census continues to
use road data to define the DA boundaries for
2001; however, the SNF was replaced by an updated
road network file (RNF). The RNF contains more
roads and has increased positional accuracy (large
portions of the road network have been re-aligned
to match the National Topographic Data Base), and
the datum is now NAD83 (Statistics Canada 2001).
Statistics Canada created DAs in response to
suggestions from the research community that
they use spatial units that are compact, uniform
and remain relatively stable over time (Purderer
2001). The design criteria for the generation of
the DA boundaries were:
• Temporal stability
• Reduced area suppression (minimum population)
• Uniformity (maximum population)
• Intuitive boundaries (visible)
• Compact shape (see note 1).
Temporal stability was the DA feature most requested by the user community. This feature requires that the DA boundaries respect the boundaries of both census tracts and census subdivisions. As census tract and census subdivision boundaries remain relatively stable over time, so in theory would the DA boundaries in the future. Uniformity between DAs is achieved by setting a target population of 500 people per DA (i.e., at the expense of the homogeneity factor). Setting this target population also helps to avoid potential data suppression for purposes of privacy. The irony remains that, to secure future temporal stability for the DAs, the existing EAs were discarded.
The shift from EAs to DAs is associated, however, with difficulties in performing comparisons between the 2001 census data and other historical census data, because the spatial frameworks are no longer equivalent. The problem of non-aligned census geographies is further compounded by significant differences in the SNF from 1996 to 2001. The differences are illustrated in Figure 2(a, b). The discrepancies between EAs and DAs are far more significant in urban/census metropolitan areas where street networks and EA/DA densities are the greatest. The enormity of the EA/DA problem in urban areas is illustrated by the fact that only 198 of more than 3300 DAs in the Greater Vancouver Regional District (GVRD) correspond to 1996 EAs on a one-to-one basis (Statistics Canada 2003). This low correspondence means that an EA to DA statistical correspondence file cannot be used for historical spatial analysis. This limitation has important consequences for Canadian geographers who seek to understand spatio-temporal shifts in socio-economic phenomena.
Figure 2
(a) Spatial shift of street features. The spatial shift between the 1996 street network and the 2001 road network can exceed 32 m in certain areas; in the example shown, the offset between nodes A and B is approximately 32 m. (b) Spatial mismatch of enumeration area (EA) and dissemination area (DA) boundaries. The spatial shift in the street networks results in the DA boundaries not lining up with their corresponding EA boundaries; prior to conflation, DA 551 is not spatially congruent with the boundaries of EA 991 to EA 994, with annotated offset distances of 20 m or more and 30 m or more
1 Homogeneity in terms of population size was another factor requested by the user community with the suggestion that dwelling type be used as a basis for homogeneity. To enlist the homogeneity factor as a design criterion would have required dwelling type counts to be generated by block from the 1996 EA-level census data. It was concluded, however, that the quality of such dwelling type estimates would be inadequate, and the homogeneity criterion was dropped (Purderer 2001).
Geographical Dimensions and Historical Comparisons: The Need for EA to DA Spatial Correspondence
The authors are part of a team at the Institute
for Health Research and Education (IHRE) at
Simon Fraser University, investigating ways of
characterizing population health spatially using
high-resolution (cadastral level) spatial data and
a range of detailed attributes associated with this
degree of granularity. The IHRE group was confronted with the EA/DA correspondence failure in
the fall of 2002 when it attempted to characterize
changes in the socio-economic conditions across the
GVRD between 1996 and 2001. The temporal component of this analysis is a means of teasing out
patterns in health outcomes related to social status
that might not be evident in a static temporal frame.
Upon discovering the lack of spatial correspondence between 1996 and 2001 SNF, IHRE initiated
discussions with Statistics Canada. Several meetings
ensued during which it became evident that the spatial dimensions of correspondence had not been
accounted for by Statistics Canada. The agency had
done a good job of ensuring attribute correspondence, but it had neither a stock algorithm nor
the internal capability to reconcile the spatial geometries of the 1996 and 2001 censuses. Discussions
between Statistics Canada and the research group
identified two possible methods of reconciliation
between the 1996 and the 2001 SNF. The first
method involved the use of an EA/DA correspondence file, while the second involved using the
block-face points generated for the 1996 street network as a link between the two data sets. As the 1996
EAs and the 2001 DAs show a low one-to-one correspondence, the first method was discarded. Thus, it
was decided to use the block-face points generated
for the 1996 SNF as a link between the two data sets.
As the 1996 EAs are partially delineated from the
street network, it is easy to see that a hierarchical
relationship exists between the EAs (and DAs) and
the block-face representative points. Data collected at the EA level are aggregations of data at
the block-face level. Once the spatial shift between
the 1996 EAs and the 2001 DAs is corrected, the
data at the block-face level can be re-aggregated to
match the 2001 DA geography. Figure 3 illustrates this process.
Figure 3
Re-tabulating the block-face points. Re-aggregating the block-face points from enumeration area (EA) boundaries to the re-aligned dissemination area (DA) boundaries. The re-aligned DA 551 is overlaid onto the EA geographical areas (EA 991 to EA 994), and census data for DA 551 are re-tabulated using the block-face points falling inside the DA
Although Statistics Canada maintains population and dwelling count data at the block-face level, data are no longer disseminated for individual block-faces because of confidentiality concerns. Nor does Statistics Canada have a methodology for reconciling the 1996 and 2001 block-face data. To reconcile attributes from the two censuses, the agency requested that our research team provide it with the adjusted DA boundaries, so that it could perform a custom tabulation of the 1996 census data (at block-face level) to match the 2001 DA geography. This tabulation involves a point-in-polygon overlay, which assigns each block-face point to the appropriate DA (T. Brown, Statistics Canada 2003; Personal communication). In summary, the research established a process of census conflation based upon the following observations and procedures.
• As the EA and DA boundaries do not share a one-to-one geometric correspondence, a concordance file between the two cannot be used.
• The 1996 block-face point was identified as a viable link between the two geographies.
• To be able to use the block-face points, we first had to align the 2001 DA boundaries to the 1996 street network from which the 1996 EAs were delineated. Otherwise, the DAs would include some block-face points they should not and exclude others that they should include.
• Once the boundaries have been aligned, 1996 census data held at the block-face level can be re-tabulated by Statistics Canada to reflect the 2001 DA geography (termed custom geography tabulation).
This custom geography tabulation can then be
used to perform a historical analysis between the
1996 and the 2001 census data sets. Once the
prerequisite of reconciling the 1996 and 2001
SNF had been established, our research team set
about developing a conflation methodology. The
first step was to investigate other approaches to
conflation.
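The point-in-polygon re-tabulation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the custom tabulation procedure used by Statistics Canada: the DA boundaries are simplified to squares, the identifier "552" is hypothetical, and all coordinates and population and dwelling counts are invented for the example.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    ring is a list of (x, y) vertices; the last vertex is joined back to
    the first. Points lying exactly on a boundary are not treated
    specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def retabulate(block_face_points, da_polygons):
    """Sum block-face counts into the DA polygon that contains each point."""
    totals = defaultdict(lambda: {"population": 0, "dwellings": 0})
    unassigned = []
    for pt in block_face_points:
        for dauid, ring in da_polygons.items():
            if point_in_polygon(pt["x"], pt["y"], ring):
                totals[dauid]["population"] += pt["population"]
                totals[dauid]["dwellings"] += pt["dwellings"]
                break
        else:
            unassigned.append(pt)  # the point falls outside every DA supplied
    return dict(totals), unassigned


# Hypothetical, already re-aligned DA boundaries (metres) and invented counts.
da_polygons = {
    "551": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "552": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
block_face_points = [
    {"x": 50, "y": 22, "population": 40, "dwellings": 15},
    {"x": 50, "y": 78, "population": 35, "dwellings": 12},
    {"x": 150, "y": 50, "population": 60, "dwellings": 20},
    {"x": 250, "y": 50, "population": 10, "dwellings": 4},
]
totals, unassigned = retabulate(block_face_points, da_polygons)
print(totals)
print(len(unassigned), "point(s) not assigned to any DA")
```

The sketch also shows why the alignment step matters: if a DA boundary were shifted by 20 to 30 m, block-face points set back only a few metres from the street centre line could fall on the wrong side of it and be counted in a neighbouring DA.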
Conflation for the Masses: Multiple Approaches and Dimensions of the Geometry Problem
Reconciling two geographical frameworks that
represent the same spatial phenomena is referred
to as conflation. There are a number of technical
definitions for conflation (GIS/Trans Ltd 1995;
Yuan and Tao 1999; Kang 2001; Veltkamp 2001;
Rahimi et al. 2002), but, in its most general form,
conflation is the reconciliation of different geometric descriptions of the same feature. Conflation
must also account for inconsistencies between data
sets. In the simplest of cases, it involves a one-to-one matching between features in two data sets
where both coverages host the same features. In
other cases, there is a one-to-none or one-to-many
correspondence. This complicates the process as
decisions must be made about which map source
should be treated as reliable for feature matching.
In many cases, such decisions must be made on a
feature-by-feature basis.
Theory and prototypes for map conflation were
developed between 1983 and 1985 at the United
States Bureau of the Census. These involved
protocols for point and feature mapping as well
as a number of computer science algorithms that
drew heavily from mathematical theories of
generalization and numerical and geometric
theory (Saalfeld 1993). Early algorithms focused on point-based conflation, whereas most map features are polygons, which necessitates shape analysis (Doytsher et al. 2001). Shape matching enables registration, approximation and simplification of linear and polygon features (Veltkamp 2001). Current conflation research reflects the larger GIScience field in its experimentation with fuzzy logic, rough sets, component tool kits and agent-based solutions (Rahimi et al. 2002).
Recent commercial efforts to incorporate artificial
intelligence in conflation (as opposed to rule-based
expert systems) are reflected in programs like
CONFLEX developed by Digital Corporation (2004). In
each case, however, the goal of conflation is to
reconcile similar but non-coincident spatial files
corresponding to the same area of the earth’s
surface.
Conflation is not a one-step process; it involves multiple tasks, including reconciling feature location discrepancies (including sliver polygons and edge matching), adding new features to a coverage where they were not previously represented and integrating new attributes into a spatial data set (Yuan and Tao 1999). The algorithmic steps involved in map conflation have been characterised by Saalfeld (cited in Kang 2001) as
• identify potential matching pairs of point features
• rubber sheet the first map to align with the second based on the pairs identified in step one
• repeat until no new matches are found.
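The three steps above can be sketched as a small iterative loop in Python. This is only an illustration of the match, rubber-sheet and repeat structure. The nearest-neighbour matching within a tolerance and the inverse-distance-weighted displacement are simple stand-ins chosen for brevity, not the procedure described by Saalfeld, and all function names and coordinates are hypothetical.

```python
import math


def match_pairs(src, dst, tol):
    """Step 1: pair each source point with its nearest target within tol."""
    pairs = {}
    for i, (sx, sy) in enumerate(src):
        best_j, best_d = None, tol
        for j, (tx, ty) in enumerate(dst):
            d = math.hypot(tx - sx, ty - sy)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs[i] = best_j
    return pairs


def rubber_sheet(points, links, power=2.0):
    """Step 2: move every point by a distance-weighted blend of link shifts.

    links is a list of ((from_x, from_y), (to_x, to_y)) displacement links.
    """
    moved = []
    for (x, y) in points:
        exact, wsum, dx, dy = None, 0.0, 0.0, 0.0
        for (fx, fy), (tx, ty) in links:
            d = math.hypot(x - fx, y - fy)
            if d == 0.0:
                exact = (tx, ty)  # a linked point snaps onto its target
                break
            w = 1.0 / d ** power
            wsum += w
            dx += w * (tx - fx)
            dy += w * (ty - fy)
        if exact is not None:
            moved.append(exact)
        elif wsum > 0.0:
            moved.append((x + dx / wsum, y + dy / wsum))
        else:
            moved.append((x, y))
    return moved


def conflate(src, dst, tol, max_iter=10):
    """Step 3: repeat matching and rubber sheeting until no new matches."""
    current, matched = list(src), {}
    for _ in range(max_iter):
        pairs = match_pairs(current, dst, tol)
        if set(pairs) <= set(matched):
            break  # no source point gained a match in this pass
        matched = pairs
        links = [(current[i], dst[j]) for i, j in pairs.items()]
        current = rubber_sheet(current, links)
    return current, matched


# Hypothetical node coordinates (metres) for two versions of a street network.
old_nodes = [(0, 0), (100, 0), (50, 0)]
new_nodes = [(10, 0), (110, 0), (64, 0)]
aligned, matched = conflate(old_nodes, new_nodes, tol=12)
print(aligned, matched)
```

In this example the third node lies 14 m from its counterpart and is not matched on the first pass; it comes within tolerance only after the first rubber-sheeting step has pulled it towards the target network, which is the reason for iterating.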
Despite automated approaches to many conflation problems, and despite the optimistic development of software-based agents and numerous automated routines, human intervention remains a reality of conflation (Yuan and Tao 1999).
Conflation is like automated generalization (elimination of map detail as scale decreases) in this
respect. There have been numerous attempts to
automate generalization, but none of the technical
solutions have been able to fully incorporate the
flexibility of human judgement (Schuurman 1999).
Operator intervention in conflation is, however, costly and time consuming: it can account for over 90 percent of the time spent on a conflation project while resolving only 5 percent of the matches (Yuan and Tao 1999).
Conflation is not a fully fledged automated procedure nor is it a purely technical endeavour.
There are numerous institutional dimensions of
conflation. Creating correspondences between different spatial descriptions of the same features
is the basis for extending the resolution and
scope of spatial data and their attributes. The
larger goal of our population health research
program is to develop a high-resolution basis for
investigating subtle shifts in health characteristics
of population. It is an example of research that
requires co-operation between multiple institutions to build the appropriate spatial data set.
The development of integrated data sets for
specific analyses is part of a broader trend away
from proprietary, single-use data that are owned
by institutions and agencies (Schuurman 2002).
This trend is opposed by a long tradition of institutional ownership and jurisdiction over data and
is countered in Canada by prevailing traditions of
cost-recovery for data (Klinkenberg 2003). Disjoint
spatial databases such as municipal cadastre systems are often isolated pockets of potential analysis, but their true power in aiding spatial and
temporal analysis remains to be realized. In the
absence of map conflation, such analysis is stymied by curtailed spatial data with no temporal
depth. Despite recognition of conflation as a GIS
process for over two decades, the tendency to
work in local jurisdictional environments with little data sharing has, to date, forestalled geographical research (Rahimi et al. 2002).
Depth and extent of spatial data are achieved
through multiple individual conflation and integration efforts. As in Canada, many local governments in the US maintain their own large-scale
data sets in the form of digital orthophotos, parcel
maps and road networks. There are discrepancies, however, between these data and the Census
Bureau’s map data sets. One example is Vermont’s efforts to develop spatial and attribute correspondence between the state data collection, held by the Vermont Center for Geographic Information, and the Topologically Integrated Geographic Encoding and Referencing (TIGER) files used by the US census
(Sperling and Sharp 1999). Another state conflation project that shares similarity to Canada’s EA/
DA mismatch is a map conflation procedure developed to correct discrepancies between the US
Census Bureau and local government data in
Delaware County, Ohio (Kang 2001). A data integration system, based on geometric principles as
opposed to attribute information matching, was
implemented in ESRI’s ARCVIEW as an interactive
cartographic system and is similar to the program
developed by the authors in the ARCGIS environment (described below). This tool aided the administrators of Delaware County, Ohio, to successfully
update 2000 collection blocks, correct inaccurate
addresses and identify missing housing units in
multiple locations (Kang 2001).
Few people realize, however, the degree to which
clerical and GIS resources are strained by map
conflation projects (Sperling and Sharp 1999).
Frequently, there is no extra budget to extend data
sets and insufficient expertise. Add to this the cost
of purchasing powerful conflation software, and
conflation may remain on the back burner. To add
insult to injury, when organizations do invest in
appropriate software, it often requires customization by skilled programmers to suit local needs. On
the other hand, the need for conflation is burgeoning as local governments and businesses want to
add global positioning system and other detailed
information to existing infrastructural coverages
such as that provided by the Canada census.
Map conflation is a precursor to statistical, epidemiological and sociological analysis of many
spatial phenomena (Kang 2001). Frequently, conflation involves integration of two geospatial data
sets, in which one is acknowledged to be the
superior source, but the other contains valuable
attribute and/or spatial information. This is not
necessarily the case with the 1996 and 2001 census, although the geometric descriptions of the
latter are currently accepted as more reliable.
Conflation of census to census data encompasses
many traditional aspects of conflation. It is simpler, however, in that semantic conflation is
easier, because similarities between attributes
from different years have a limited range of associated meaning. The category (females zero to
fourteen years) has, after all, only so many interpretations. The challenges of geometric conflation
were, however, made evident, as we struggled to
match the 1996 and 2001 geometries.
A Made-in-Canada Solution: Census-Specific Conflation
Our group thus began an investigation into how
best to reconcile the spatial definitions used in
the two recent censuses. This was critical to the
research project; without such correspondence, it
would be impossible to superimpose more
detailed (cadastre level) spatial data and their
associated variables (e.g., property tax assessment) with more aggregated socio-economic variables from the two censuses to illustrate trends
over five years. The first step towards conflating the two geometries entailed a search for off-the-shelf software that addressed the problem. There are a number of expensive and encompassing software packages marketed for conflation. Like most university research groups, however, we wanted to avoid paying for software features that were not specific to our particular problem.
Moreover, we were concerned that other Canadian
researchers would be restricted in their analysis
without the identification or development of a
mechanism for census conflation. We also recognized that a Canada-wide standard for rectification of census geography is imperative for
developing comparable research in different
parts of the country—especially in an area like
spatialized health research because of the possible sensitivity of research results.
Conflation software capable of automatic
boundary alignment comes at a hefty price.
Comprehensive products include GIS/TRANS GIS/
T-CONFLATE software and LAND BASE SYSTEMS TOTAL FIT
software. A decision was made instead to develop
an in-house solution at IHRE that could be used by
other members of the Canadian research community
interested in comparing 1996 and 2001 spatial
data. The development of a custom conflation
software tool for EA/DA alignment was possible,
because it represents an isolated and limited geometrical problem rather than a host of one-to-one,
one-to-many, many-to-one and many-to-many spatial and attribute non-correspondences. The focus
was to create a simple program that allows the
user to automatically or manually create links
between two data sets and then aligns one data set
to the other based upon these links.
Many university researchers in Canada (and the
United States) use ESRI’s software products for GIS
analysis—a product of very effective market seeding based on academic discounts. Development
of a conflation mechanism for census geometries
in ESRI’s newest environment, ARCGIS 9.0, was
therefore the obvious choice. The ARCGIS 9.0 software package provides numerous editing tools
designed to overlay and match one data set over
another. However, an editing tool that automatically adds links to the data set, enabling linkages
between the edit coverage (coverage the user is
adjusting) and the snap coverage (the data set
the user is adjusting the edit data set to) is not
available with the current version of the software.
Therefore, the research team developed a coding
and application tool in the ARCGIS environment
that would automate this as part of the data conflation process. The interface to the census conflation tool is shown below in Figure 4. Additional
information and the application tool can be downloaded from the http://www.gis.sfu.ca website.
Figure 4
The census conflation tool: autolink form interface
Validation: Ensuring That the Dots Line Up
Substantiation of the process and determination of its limitations were important aspects of the project. Validation focused on (i) how well the EA/DA conversion was performed and (ii) whether the 1996 block-face points were attributed to the correct corresponding DAs. The team investigated the number
of block-face points that had been assigned to new
DAs. These numbers were obtained by overlaying
the 1996 block-face point coverage with that of the
DA boundaries at several stages of the rectification
process. First, the block-face point coverage was overlaid with the DA boundaries before any adjustment was performed. The same overlay was then repeated with the DA coverage after the autolink adjustment, after the manual adjustment and finally after the quality control adjustment.
After each overlay, the attribute for the DA identifier
(DAUID) was renamed to produce multiple fields for
the DA ID. This sequence of layers was then analyzed to determine how membership changed at
various stages. For example, after all overlays were
completed, the final block-face point coverage contained fields named DAUID_BEFORE, DAUID_ALINK
and DAUID_MANUAL. Block-face point features
could then be selected by those which had a changed
DAUID attribute after each adjustment process.
The efficiency of the autolink feature could be
further assessed by a series of more complex
queries. For example, points, which had been
moved into the incorrect DA by the autolink feature, were found by the following queries:
For those points which autolink failed to correct:
"DAUID_BEFORE" = "DAUID_ALINK" AND "DAUID_BEFORE" <> "DAUID_MANUAL"
For those which were initially assigned to the wrong DA and were moved by autolink to another DA which was incorrect:
"DAUID_BEFORE" <> "DAUID_ALINK" AND "DAUID_BEFORE" <> "DAUID_MANUAL" AND "DAUID_ALINK" <> "DAUID_MANUAL"
And for those which had been incorrectly moved out of their correct DA:
"DAUID_BEFORE" <> "DAUID_ALINK" AND "DAUID_BEFORE" = "DAUID_MANUAL"
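The same three selections can be expressed as a small classification routine, sketched here in Python for readers working outside the GIS query builder. The field names come from the text. The category labels, the sample records and the two extra categories ("corrected_by_autolink" and "unchanged") are assumptions added so that every record receives a label, and the manual result is taken as the correct assignment.

```python
from collections import Counter


def classify(before, alink, manual):
    """Label one block-face point from its three DA identifier fields."""
    if before == alink and before != manual:
        return "missed_by_autolink"             # first query in the text
    if before != alink and before != manual and alink != manual:
        return "moved_wrong_to_wrong"           # second query in the text
    if before != alink and before == manual:
        return "moved_out_of_correct"           # third query in the text
    if before != alink and alink == manual:
        return "corrected_by_autolink"          # assumed label, not in the text
    return "unchanged"                          # all three fields agree


# Invented sample records; the DA identifiers are placeholders.
records = [
    {"DAUID_BEFORE": "551", "DAUID_ALINK": "551", "DAUID_MANUAL": "552"},
    {"DAUID_BEFORE": "551", "DAUID_ALINK": "552", "DAUID_MANUAL": "553"},
    {"DAUID_BEFORE": "551", "DAUID_ALINK": "552", "DAUID_MANUAL": "551"},
    {"DAUID_BEFORE": "551", "DAUID_ALINK": "552", "DAUID_MANUAL": "552"},
    {"DAUID_BEFORE": "551", "DAUID_ALINK": "551", "DAUID_MANUAL": "551"},
]
counts = Counter(
    classify(r["DAUID_BEFORE"], r["DAUID_ALINK"], r["DAUID_MANUAL"])
    for r in records
)
print(counts)
```

Because the five branches are mutually exclusive and cover every combination of equal and unequal fields, the per-category counts always sum to the number of points examined.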
The resulting numbers were broken down by municipality and are displayed in Table 1. Summarization
by municipality was achieved by overlaying the point
coverage with the available GVRD land-use coverage,
which contained the attribute for each municipality.
Table 1 shows that the autolink program accounted for 3,437 of the 8,044 corrections made to block-face point assignments. More than half of the adjustments (4,607), however, were ultimately performed manually to compensate for autolink errors and for mismatches missed by the automated process.
Table 1
Statistics of adjusted block-face points for municipalities within the Greater Vancouver Regional District (GVRD)
Municipality                        Total    AUTO LINK       Manual       Manual       Manual       Manual        Total        Total
                               block-face  corrections  corrections  corrections  corrections      quality       manual  corrections
                                   points                 missed by      of AUTO      of AUTO        check  corrections
                                                          AUTO LINK      LINK A*     LINK B**  corrections
Bowen Island Municipality             463            0            4            0            0                         4            4
City of Burnaby                     8,088          337          241            8           96           10          355          692
City of Coquitlam                   5,690          236          266            8           57                       331          567
City of Langley                     1,061           58           74            1           17                        92          150
City of New Westminster             2,229          105           88            4           42            2          136          241
City of North Vancouver             1,658           53           33            0           16            7           56          109
City of Port Coquitlam              1,399           44           64            1           13                        78          122
City of Port Moody                  1,302           56           99            0           18                       117          173
City of Richmond                    5,176          205          293           10          135                       438          643
City of Surrey                     16,974          699          757           15          250           61        1,083        1,782
City of Vancouver                  18,150          842          438           33          187            4          662        1,504
City of White Rock                    163            5            2            1            1                         4            9
District of Delta                   5,380          141          161            6          104                       271          412
District of Maple Ridge             3,329          170           77            3           36                       116          286
District of North Vancouver         4,204          139          108            4           61                       173          312
District of Pitt Meadows              782           12           30            1           12                        43           55
District of West Vancouver          3,246           96          169           10           44           17          240          336
Electoral Area A                    3,313          114          123            0           16                       139          253
Township of Langley                 5,147          124          188            4           70                       262          386
Village of Anmore                     108            0            0            0            0                         0            0
Village of Belcarra                    73            0            0            0            0                         0            0
Village of Lions Bay                  130            1            4            0            3                         7            8
N/A                                    69
GVRD                               88,134        3,437        3,219          109        1,178          101        4,607        8,044
*A refers to those points which were originally allocated to the wrong dissemination area (DA) and were then moved by AUTO LINK to another DA that was still incorrect.
**B refers to those points which were originally allocated to the correct DA but were subsequently assigned to another DA by AUTO LINK.
One of These Things Is Not the
Same: The Problem with Matching
Attributes
It may be evident to some readers that a problem
of some magnitude remains after the spatial geometry is reconciled: re-distributing the attributes
from 1996 (disseminated at the EA level) to match
the 2001 DAs. The districting matches, but the
attributes are still being reported using different
spatial units. We contacted Statistics Canada
about this and found that their office is willing to do
the work, but at a cost. It is necessary, however, that
Statistics Canada do the attribute reconciliation
between the newly convergent EA/DA geometries,
because they are the sole owners of the block-face-level attribute data, which are protected for privacy
reasons. Data are collected at the block-face level
but distributed at the EA/DA level. The tabular
data to which university and government researchers
have access do not contain a link down to the
block-face level, for reasons of confidentiality.
Statistics Canada can provide a custom tabulation
of the 1996 block-face points to the rectified DA
boundaries for 2001.
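Conceptually, such a custom tabulation re-aggregates block-face counts to the rectified DA boundaries. The Python sketch below illustrates only that idea and is not Statistics Canada's procedure: the DA identifiers, the population counts and the function name tabulate_by_da are all hypothetical, since the real block-face attributes are confidential.

```python
from collections import defaultdict

# Hypothetical 1996 block-face records: (rectified 2001 DAUID, population).
block_faces = [
    ("DA-001", 42),
    ("DA-001", 58),
    ("DA-002", 37),
    ("DA-002", 0),
    ("DA-003", 91),
]


def tabulate_by_da(records):
    """Sum a block-face attribute over the DA each point now falls in."""
    totals = defaultdict(int)
    for dauid, population in records:
        totals[dauid] += population
    return dict(totals)


da_totals = tabulate_by_da(block_faces)
print(da_totals)
```

Only the aggregated totals per DA would be released, which is how the 1996 attributes can be reported on 2001 geography without exposing block-face-level values.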
Obtaining the custom tabulation allows direct
linkage between attributes of the 1996 and 2001
census. This, in turn, enables the researcher to
compare data at a higher resolution (DA vs. EA).
An example is provided by the analysis of health
status based on mortality statistics. At a higher
resolution, even small errors in assigning deaths
to specific locations (arising from spatial mismatch of underlying frameworks) can cause large
increases in estimated mortality rates, particularly
when the aim is to aggregate small areas into
income quintiles. There is no reason to assume that such errors would be randomly
distributed within an urban area, particularly as
re-development in inner city locations typically
results in the displacement of lower-income groups by
higher-income ones, as in the case of Vancouver’s
Concord Pacific and Yaletown developments.
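The sensitivity of small-area rates to misassigned deaths can be shown with a short calculation. In the Python sketch below every number is hypothetical and chosen only for illustration: a small area of 500 residents with 4 deaths, and a larger aggregate of 5,000 residents with 40 deaths, each wrongly credited with 2 additional deaths from a neighbouring area.

```python
def rate_per_1000(deaths, population):
    """Crude mortality rate per 1,000 residents."""
    return 1000 * deaths / population


def relative_increase(true_deaths, misassigned, population):
    """Proportional change in the rate caused by misassigned deaths."""
    true_rate = rate_per_1000(true_deaths, population)
    observed_rate = rate_per_1000(true_deaths + misassigned, population)
    return (observed_rate - true_rate) / true_rate


# Hypothetical areas with the same true rate (8 deaths per 1,000).
small_area = relative_increase(true_deaths=4, misassigned=2, population=500)
large_area = relative_increase(true_deaths=40, misassigned=2, population=5000)

print(f"small area: {small_area:.0%} increase")
print(f"large area: {large_area:.0%} increase")
```

Under these assumptions the same two misplaced deaths inflate the small-area rate by half but the larger aggregate's rate by only about one-twentieth, which is why positional errors matter more as resolution increases.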
From Principle to Practice:
Implementation of Census Conflation
for Spatial Analysis
The utility of this software and methodology warrants substantiation, and this can be achieved by
visiting the context for this tool kit. The IHRE
research group is interested in examining gradients
of social inequality and ultimately linking these to
sentinel health conditions. Assessment of ‘at-risk’
areas from a socio-economic perspective using
existing census data requires working at as high a
resolution as possible. Thus, census tracts, which
have matching geometries but are coarser than EAs and DAs, are not appropriate for
this analysis. Integration of high-resolution socio-economic data from the 1996 and 2001 Canada censuses
allows researchers to conduct historical analyses of
economic and social conditions and to take into
account external conditions such as inward and outward migration. These data can be used to portray
changes in the health of the population more explicitly
than a static picture can. This dynamic framing of
health indicators can also be used to measure the
influence of federal and provincial policies on a
population’s social and economic conditions and
be correlated with health outcomes.
Conclusion: Assessing the
Functionality and Weighing
the Benefits
There are two components for assessing the
functionality of our proposed solution to the
EA/DA conflation problem. The first is technical
and involves determination of the extent to which
the labour involved was warranted by the results
and whether the software is usable. The second is
more abstract and involves an assessment of the
degree to which conflation of the 1996 and 2001
Canada census data will improve the quality and
extent of socio-demographic and health research
in Canada. The first of these is easier to assess, as
it can be described using metrics.
The adjustment of the DA boundaries to concord
with the 1996 street network data for the GVRD
census geographies took 70–75 h of work; this
included manual adjustments that were required.
The autolink function was found to operate best in
areas which had a grid system street configuration,
such as the City of Vancouver. It did not, however,
function optimally in dense urban areas with
non-uniform configuration (crescent streets, etc.)
such as those found in the cities of Surrey and
Richmond. In these areas, there was a significant
number of cases in which the autolink feature allocated the wrong DA IDs to the block-face points.
The usability of the program is not in question.
It is simple to use and operate. Processing time
may be a concern when using very large data
sets, although projects of such scope would
probably have the resources to purchase and
implement large-scale conflation software. Future
improvements to this software might include
the ability to add and delete nodes. It would also
be salutary to be able to measure the overall
degree to which geometric accuracy is compromised by the re-allocation of DAs based upon
the 1996 SNF, bearing in mind that the accuracy
of the street files is in question anyway given the
degree of variation between the 1996 and 2001 files.
Analytical accuracy is, of course, greatly enhanced,
given that spatio-temporal comparisons between
1996 and 2001 are not otherwise possible.
The second axis of assessment, contribution, is
more difficult to analyze. It depends not only upon the
extent of uptake and the technical satisfaction of
future users but also upon the resources
necessary to create correspondence between DAs
and EAs and upon the impact of the resulting analysis.
The issue of resources is a subtle one, because a
research group that is already taxed in terms of
technical expertise will not appreciate the incursion
of this exercise to the tune of 70 h. The impact can,
however, be priceless. As the examples above illustrate, there are a number of spatial analyses that
would otherwise never be accomplished, at a cost to knowledge in areas as diverse as
immigration studies, population health and economics. The chief contribution of this methodology
and software is an ability to incorporate a spatial
dimension to temporal analysis of Canada census
data. It thus permits the transformation of static
spatial features and events to spatial temporal
entities—with the greater dimensionality that such
entities encompass.
Acknowledgements
This research was made possible through the support of the
Canadian Institute for Health Information, Canadian Population
Health Initiative research project ‘Urban Structures, Population
Health and Public Policy’. We thank Ted Brown, Regional
Advisor, Statistics Canada Pacific Region for his assistance
with this endeavour. Luan Vo, research assistant, provided
invaluable assistance.