Spatially-Aware Information Retrieval with
Graph-Based Qualitative Reference Models
Thomas Vögele(1), Christoph Schlieder(2)
(1) TZI, University of Bremen, PO-Box 330440, 28334 Bremen, Germany
(2) Bamberg University, 96045 Bamberg, Germany
vogele@tzi.de | christoph.schlieder@wiai.uni-bamberg.de
Abstract
Geo-referenced information is used by a growing number of
“spatially-aware” tools in different application areas,
including tourism, marketing, environmental management,
and mobile location based services. To support such
applications, methods for “spatially-aware” information
retrieval are needed that consider not only the thematic but
also the spatial relevance of information items.
In addition to information “directly” geo-referenced with
the help of coordinates, there exist large amounts of
information that is geo-referenced “indirectly” through
place names. Place name lists, or gazetteers, link place
names to coordinate space, but offer only limited options
for the spatial representation of place names and for
reasoning about spatial relevance.
In this paper, we outline the concept of qualitative spatial
reference models that use regional approximations of place
names in support of reasoning about spatial relevance. The
core components of such reference models are graph-based
abstractions of polygonal standard reference tessellations,
together with their intrinsic decomposition hierarchies.
Reasoning about spatial relevance is based on a metric that
evaluates the vertical and horizontal proximity of spatial
entities.
Introduction
As growing stores of geospatial information are being
collected world wide, the management of and access to
this information becomes more and more important.
Several ongoing initiatives try to set standards and to
establish infrastructures for the exchange of geospatial
data, both on a national and international level (OGC
1999), (OGC 2001), (ISO/TC-211 2000), (ISO/TC-211
2000), (Kuhn, Basedow et al. 2000).
Because most of these efforts are rooted in the GI
community, the term “geospatial information” is used
mainly for data sources such as digital cartographic
products, surveys, satellite images, aerial photographs, and
data from ground-based and atmospheric monitoring
stations. These resources use geographic coordinates to
locate their footprints on the Earth’s surface and can thus
be categorized as directly geo-referenced geospatial data
(Goodchild 1999).
However, many geospatial data also have spatial
relations to named geographic features such as cities,
parks, and biogeographic regions. Named geographic
features, or place names, are managed with the help of
gazetteers, and digital geo-referenced gazetteers are used
in a number of digital library projects (e.g., the Alexandria
Digital Library (Hill, Frew et al. 1999)). They link place
names to geographic footprints and provide an indirect
geo-referencing of geospatial information.
We claim that an integrated view on directly and
indirectly geo-referenced data can be the basis for a
number of new “geographically aware” applications. These
may be found in, but will not be confined to, the fields of
intelligent information retrieval and spatial metadata
(Schlieder and Vögele 2002), the application of ad-hoc
networks for exchange of geospatial information (Vögele
and Schlieder 2002), as well as mobile and location-based
services. These increasingly user-centered applications will
have to rely on “personalized” gazetteers that are based on
standardized place names provided by “official” gazetteers,
but can be customized by the user for special purposes
(Brandt, Hill et al. 1999).
One of the major advantages of gazetteers is the fact
that they provide a “parsimonious representation of
geographic space that combines a rich set of place name
data with only limited locational data” (Jones, Alani et al.
2001). However, this is also one of their main weaknesses:
By drastically reducing the complexity of the spatial
representations, gazetteers offer only very limited support
for spatial reasoning. A lot of the spatial knowledge that is
implicit in representations like the ones typically used by a
GIS has to be explicitly encoded into a gazetteer. There are
approaches which extend the spatial reasoning capabilities
of gazetteers with the help of Voronoi polygons based on
coordinate points (Alani, Jones et al. 2001), or the use of
spatial indices based on uniform grids (Riekert 1999).
In this paper we will outline an approach that uses
graph-based spatial reference models as the basis for
qualitative spatial footprints of place names. We will show
how these representations can be used for reasoning about
spatial relevance.
Components of a Spatial Reference Model
Place Names and Spatial Footprints
One of the primary tasks of a gazetteer is to geospatially
reference place names, i.e. to provide a common frame of
reference for the geographic positioning and
disambiguation of place names. Most gazetteers
approximate the regional extent of geographic objects
through a spatial footprint in the form of a geographic
(point) coordinate, or a rectangular bounding box (defined
by two point coordinates). Only in some cases are more
complex geographic representations, such as polygons,
used.
Copyright © 2003, American Association for Artificial Intelligence (www.aaai.org). All rights reserved. FLAIRS 2003.
All footprint representations described so far rely on a
geographic coordinate system and the use of geometric
algorithms to evaluate spatial relations among place
names. An alternative approach is to abstract from
geographic coordinates and to use spatial indices. Some
gazetteers represent place names as a set of spatial indices
that is obtained by projecting the regional extent of a place
name onto a uniform orthogonal reference grid (Angrick,
Bös et al. 2002).
The concept of a qualitative spatial footprint presented
in this paper relies on spatial indices as well. However,
instead of a uniform (or other) regular reference grid, we
use polygonal standard reference tessellations as a frame
of reference.
Polygonal Standard Reference Tessellations
In a GIS, polygons are frequently used to represent
geographic objects in 2-dimensional Euclidean space E². A
(simple) polygon in E² can be defined as an area that is
enclosed by a simple closed polyline, which represents the
boundary of the polygon (Worboys 1995). The polyline
consists of a finite set of line segments (edges). The
endpoints of the edges are called vertices. In this paper we
apply a definition where a polygon is a closed set of
points, i.e. edges and vertices belong to the polygon.
Polygons can be arranged in different ways in E²: If we
consider that polygons P1,...,Pn are contained in a part
of the plane bounded by a polygon P, two types of
arrangements of the polygons within the containing
polygon P can be distinguished (Schlieder, Vögele et al.
2001):
• A polygonal covering, where P = P1 ∪ ... ∪ Pn. The
polygons cover the containing polygon and, in general,
they will overlap.
• A polygonal patchwork, where interior(Pi ∩ Pj) = ∅ for
all i ≠ j from {1,...,n}. In a patchwork, the polygons
are either disjoint or intersect only in edges and/or
vertices.
A special, but very common type of arrangement is the
polygonal tessellation, which can be defined as a
polygonal covering that also forms a polygonal patchwork.
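The three arrangement types can be illustrated with a small sketch. It is not part of the original work and makes a strong simplifying assumption: polygons are replaced by axis-aligned rectangles, so that containment and interior overlap reduce to coordinate comparisons. The names Rect, is_covering, is_patchwork and is_tessellation are hypothetical.

```python
from itertools import combinations

# Assumption of this sketch: an axis-aligned rectangle (xmin, ymin, xmax, ymax)
# stands in for a general polygon.
Rect = tuple[float, float, float, float]

def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies completely within outer (boundaries included)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def interiors_overlap(a: Rect, b: Rect) -> bool:
    """True if the open interiors of a and b share at least one point."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def is_patchwork(container: Rect, parts: list[Rect]) -> bool:
    """Parts lie in the container and meet at most in edges or vertices."""
    return (all(contains(container, p) for p in parts)
            and not any(interiors_overlap(a, b)
                        for a, b in combinations(parts, 2)))

def is_covering(container: Rect, parts: list[Rect]) -> bool:
    """The union of the parts equals the container; overlaps are allowed."""
    if not all(contains(container, p) for p in parts):
        return False
    xs = sorted({container[0], container[2],
                 *(v for p in parts for v in (p[0], p[2]))})
    ys = sorted({container[1], container[3],
                 *(v for p in parts for v in (p[1], p[3]))})
    # Every elementary cell of the induced grid must lie inside some part.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if not any(contains(p, (x0, y0, x1, y1)) for p in parts):
                return False
    return True

def is_tessellation(container: Rect, parts: list[Rect]) -> bool:
    """A tessellation is a covering that is also a patchwork."""
    return is_covering(container, parts) and is_patchwork(container, parts)

P = (0, 0, 2, 2)
halves = [(0, 0, 1, 2), (1, 0, 2, 2)]          # covering and patchwork
overlapping = [(0, 0, 1.5, 2), (1, 0, 2, 2)]   # covering only
corners = [(0, 0, 1, 1), (1, 1, 2, 2)]         # patchwork only
```

For real pSRT data the two predicates would have to be evaluated with a polygon geometry library; the rectangle case only serves to make the definitions concrete.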
If we decompose a polygonal covering, patchwork, or
tessellation P into its components P1,...,Pn, we obtain a
decomposition that can be represented by a hierarchical
data structure encoding the spatial part-of relation together
with the type of arrangement of the parts (Schlieder,
Vögele et al. 2001). For the special case of a polygonal
decomposition by tessellation, we can define the relation
tess ⊆ Π × 2^Π, where Π denotes the set of polygons in the
plane, and tess(P,{P1,…,Pk}) iff {P1,…,Pk} is a
tessellation of P. Using this relation, we can say that a
polygon P1 is spatially part-of a polygon P2, written
P1 ⊑ P2, if P1 is part of the decomposition by tessellation
of P2, i.e. P1 ⊑ P2 iff tess(P2,{…,P1,…}). Applied to the
decomposition hierarchy shown in Figure 1, we can say for
example that AA ⊑ A, and tess(A,{AA,AB}).
Figure 1: Decomposition hierarchy of a polygonal
tessellation
Many artificial, man-made subdivisions of geographic
space do form polygonal tessellations. Typical examples
are administrative units, postal code areas, and census
districts. Because they represent “official” and
standardized spatial models, we refer to them as polygonal
Standard Reference Tessellations, or pSRTs. For a number
of reasons, pSRTs are well suited to provide a reference
for spatial indices:
• Many pSRTs offer reference units that can be
addressed in an intuitive way through well-known
descriptors (which are in fact place names
themselves), like for example the names of
administrative units in a tessellation of administrative
subdivisions. Human users can relate to such entities
much better than to arbitrarily created and cryptically
named uniform grid cell rasters. For example, it is
much easier to refer to a polygon called Contra Costa
County than to a grid cell descriptor like CA1089.
• Many organizations use pSRTs of administrative units
or postal code areas to geo-reference their (spatial)
data holdings. As a result, digital versions of pSRTs
are typically easy to obtain.
• Administrative units and other pSRTs are organized as
hierarchical partonomic structures. A nation may
decompose into states, each of which decomposes into
counties and so on. We will show below how these
decomposition hierarchies can be used in support of
spatial relevance reasoning.
Graph-Based Spatial Reference Models
About the Need for Qualitative Abstraction
Quantitative (i.e., coordinate-based) representations of
pSRTs are often rather complex and bulky. A polygonal
tessellation of the counties of the contiguous United States,
for example, may easily exceed 9 MB (ESRI shapefile
format). However, for the type of qualitative spatial
relevance reasoning which is the focus of this paper, most
of the information content of such quantitative
representations is redundant. In our approach, we use
qualitative, graph-based abstractions of pSRTs. They
capture the topological relations between polygons needed
for qualitative spatial relevance reasoning. At the same
time, they are the basis for light-weight and highly
exchangeable spatial models.
Connection Graphs and Decomposition Trees
There are a number of approaches that use graph structures
to represent topological relations between regional
geographic objects (Molenaar 1998), (Kuijpers, Paredaens
et al. 1995). Based on this work, we introduced connection
graphs (Schlieder, Vögele et al. 2001) to encode
topological neighbourhood relations between polygons in a
tessellation. Connection graphs encode neighbourhood
relations together with their ordering and, if applicable, the
identification of an external area.
By recursively decomposing a pSRT, we can analyse
its hierarchical structure and represent it as a
decomposition tree. The recursive decomposition of the
pSRT depicted in Figure 1, for example, yields the
decomposition tree shown in Figure 2. Formally, the
decomposition hierarchy of a reference tessellation is a
directed acyclic graph (DAG). Out of the set R of all
reference units in a pSRT D, each node in the graph
represents a reference unit r ∈ R, while the edges between
two nodes ri and rj denote spatial part-of relations
between the reference units, i.e. ri ⊑ rj (ri is spatially
part-of rj). Reference units can be grouped into partonomic
sets S, where S = {ri,…,rn}, r ∈ R. A partonomic set S of
reference units is called non-redundant if none of the
reference units in S is spatially part-of another reference
unit in S, i.e.
S is non-redundant iff ∀ri ∈ S, ¬∃rj ∈ S : ri ⊑ rj.
A partonomic set S of reference units is normalized if
all reference units r ∈ S have the same graph-theoretical
depth de, i.e. the same distance from the root:
S is normalized iff ∀ri ∈ S, ∀rj ∈ S : de(ri) = de(rj).
An important property of a pSRT is that it can be
decomposed into normalized partonomic sets of reference
units, with each set representing a specific level of the
partonomic hierarchy of the pSRT. Because each level of
the hierarchy represents a specific granularity of the pSRT,
we can also speak of the levels-of-detail L of the
decomposition tree. The decomposition tree in Figure 2, for
example, has four levels-of-detail, with L0 being the least
and L3 the most detailed representation.
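The notions of depth, non-redundancy, normalization and level-of-detail can be made concrete with a short sketch. The tree below is hypothetical: only the unit names that appear in the text (A, AA, AB, AAA, CBA and so on) are taken from the paper, while the overall shape of the tree, the node name "root" and all function names are assumptions. The sketch also reads "spatially part-of" transitively, which is one possible interpretation.

```python
# Hypothetical decomposition tree stored as child -> parent.
PARENT = {
    "A": "root", "B": "root", "C": "root",
    "AA": "A", "AB": "A", "BA": "B", "BB": "B", "CA": "C", "CB": "C",
    "AAA": "AA", "AAB": "AA", "ABA": "AB", "ABB": "AB",
    "BAA": "BA", "BAB": "BA", "BBA": "BB", "BBB": "BB",
    "CAA": "CA", "CAB": "CA", "CBA": "CB", "CBB": "CB",
}

def ancestors(unit):
    """Yield the proper ancestors of a unit, nearest first."""
    while unit in PARENT:
        unit = PARENT[unit]
        yield unit

def depth(unit):
    """Graph-theoretical depth de: distance of the unit from the root."""
    return sum(1 for _ in ancestors(unit))

def part_of(ri, rj):
    """True if ri is (transitively) spatially part-of rj."""
    return rj in ancestors(ri)

def is_non_redundant(units):
    """No unit of the set is part-of another unit of the set."""
    return not any(part_of(ri, rj) for ri in units for rj in units)

def is_normalized(units):
    """All units of the set have the same depth."""
    return len({depth(r) for r in units}) <= 1

def level_of_detail(k):
    """All reference units at depth k, i.e. the level-of-detail Lk."""
    return {u for u in set(PARENT) | {"root"} if depth(u) == k}

print(is_non_redundant({"A", "CBA"}), is_normalized({"A", "CBA"}))
print(sorted(level_of_detail(1)))
```

In this toy tree the set {A, CBA} is non-redundant but not normalized, because A sits at depth 1 and CBA at depth 3.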
Qualitative Spatial Footprints
Using the qualitative spatial reference model described
above, we can approximate the regional extent pn of a
place name in terms of a set of reference units
Spn = {ri,…,rn}, ri ∈ R, where R is the set of all reference
units in a spatial reference model. To map pn to R, we use
an is-defined-as function L: P → 2^R, where P is the set of all
place name regions, and 2^R is the power set of all reference
units. We call Spn the qualitative spatial footprint of the
place name region pn.
Figure 2: A decomposition tree
L is based on an evaluation of the topological relation
between pn and ri ∈ R. Because both pn and ri represent
regions in 2-D Euclidean space, we can use the region
connection calculus RCC-8 (Randell et al. 1992) to
describe the topological relations between them. For the
case where pn is equal to or a proper part of a single
reference unit ri, its spatial footprint Spn contains only
one element ri. If pn overlaps or contains multiple
reference units, Spn consists of more than one reference
unit. If pn cannot be mapped onto a reference unit, i.e. pn
and ri are disconnected or externally connected, Spn is the
empty set.
\[
L(p) =
\begin{cases}
\{r_i\} & \text{if } EQ(p,r_i) \lor PP(p,r_i) \\
\{\ldots,r_i,\ldots\} & \text{if } PO(p,r_i) \lor PP^{-1}(p,r_i) \\
\{\} & \text{if } DC(p,r_i) \lor EC(p,r_i)
\end{cases}
\]
Applied to the decomposition hierarchy depicted in Figure
1, the normalized spatial footprints for the three place
names shown in Figure 3 can be defined as SPN1={AAA},
SPN2={AAA,AAB,ABA,ABB,CBA},
SPN3={ABA,ABB,BAA,BAB,BBA}.
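As a minimal sketch of the mapping L, assume that the topological relation between a place name region and every reference unit has already been determined by some geometric test. The footprint is then simply the set of units whose relation is not DC or EC. The relation tables below are hypothetical (Figure 3 is not reproduced here), "PPi" is used as a plain-text label for the inverse proper-part relation, and the function name footprint is an assumption.

```python
# Relation labels follow the case distinction in the text.
IN_FOOTPRINT = {"EQ", "PP", "PO", "PPi"}   # unit belongs to the footprint
OUTSIDE = {"DC", "EC"}                     # unit does not belong to it

def footprint(relations):
    """Map {reference unit: relation of the place name region to it}
    to the qualitative spatial footprint (a set of reference units)."""
    unknown = set(relations.values()) - IN_FOOTPRINT - OUTSIDE
    if unknown:
        raise ValueError(f"unknown relation labels: {sorted(unknown)}")
    return {unit for unit, rel in relations.items() if rel in IN_FOOTPRINT}

# Hypothetical relation tables, chosen to reproduce SPN1 and SPN2.
pn1 = {"AAA": "PP", "AAB": "EC", "ABA": "DC", "ABB": "DC", "CBA": "DC"}
pn2 = {"AAA": "PPi", "AAB": "PPi", "ABA": "PO", "ABB": "PO",
       "CBA": "PO", "BAA": "EC", "BAB": "DC"}

print(sorted(footprint(pn1)))
print(sorted(footprint(pn2)))
```

The interesting work in practice lies in computing the relation tables from polygon geometry; once they exist, the footprint itself is a simple filter.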
As long as Spn remains non-redundant, the spatial
footprint may consist of reference units taken from
different levels of the decomposition hierarchy. For
example, SPN2 in Figure 3 could be defined in a
non-normalized form as a combination of reference units from
L1 and L3, i.e. SPN2={A,CBA}. This has significant
practical implications, as it eliminates the need to go to the
highest level of detail to define footprints for place names
with a large spatial extension. However, non-normalized
spatial footprints have to be normalized if we want to
evaluate neighborhood relations and compute spatial
relevance (see below). This is achieved with the help of a
normalization operator η that recursively decomposes all
reference units ri,…,rj in the spatial footprint until
∀ri ∈ S, ∀rj ∈ S : de(ri) = de(rj).
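One possible reading of the normalization operator η is sketched below: every unit of the footprint is replaced by its descendants at the depth of the deepest unit in the footprint. The tree is hypothetical (only the unit names mentioned in the text are taken from the paper), and the names CHILDREN, compute_depths and normalize are assumptions of this sketch.

```python
# Hypothetical decomposition tree stored as parent -> children.
CHILDREN = {
    "root": ["A", "B", "C"],
    "A": ["AA", "AB"], "B": ["BA", "BB"], "C": ["CA", "CB"],
    "AA": ["AAA", "AAB"], "AB": ["ABA", "ABB"],
    "BA": ["BAA", "BAB"], "BB": ["BBA", "BBB"],
    "CA": ["CAA", "CAB"], "CB": ["CBA", "CBB"],
}

def compute_depths(children, root="root"):
    """Map every unit to its distance from the root (the depth de)."""
    depths = {root: 0}
    stack = [root]
    while stack:
        unit = stack.pop()
        for child in children.get(unit, []):
            depths[child] = depths[unit] + 1
            stack.append(child)
    return depths

DEPTH = compute_depths(CHILDREN)

def normalize(units):
    """Decompose units until all units of the footprint share one depth."""
    if not units:
        return set()
    target = max(DEPTH[r] for r in units)

    def expand(unit):
        if DEPTH[unit] == target:
            return {unit}
        kids = CHILDREN.get(unit)
        if not kids:
            raise ValueError(f"{unit} cannot be decomposed to depth {target}")
        result = set()
        for kid in kids:
            result |= expand(kid)
        return result

    normalized = set()
    for r in units:
        normalized |= expand(r)
    return normalized

print(sorted(normalize({"A", "CBA"})))  # ['AAA', 'AAB', 'ABA', 'ABB', 'CBA']
```

Applied to the non-normalized footprint {A, CBA}, the sketch yields the normalized footprint given for PN2 in the text.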
Reasoning about Spatial Relevance
A central idea behind “spatially-aware” information
retrieval is to provide access to information items based
not just on thematic, but also on spatial relevance. This
raises the problem of defining the term “spatially relevant”,
and finding an appropriate metric for its computation.
Figure 3: Three placenames projected onto a
polygonal reference tessellation
In geographic space, “everything is related to everything
else, but near things are more related than distant things”
(Tobler 1970). Therefore, we set up the hypothesis that the
spatial relevance σ(rq,ri) of a location ri with respect to
a query location rq increases with decreasing distance D
between ri and rq (Schlieder, Vögele et al. 2001). In the
simplest case σ(rq,ri)=1/D(rq,ri). However, the
concept of spatial relevance is only useful in a comparative
approach. If we consider two locations ri and rj, we can
say that a location ri is spatially more relevant than rj to a
query location rq if σ(rq,ri) > σ(rq,rj).
In a graph-based spatial reference model we can easily
compute the graph-theoretical distances (and therefore
values for spatial relevance) between nodes. However,
because the spatial reference model combines
neighbourhood- (i.e. connection-) graphs with a
hierarchical decomposition tree, two types of distances
interact:
• The neighborhood (or horizontal) distance ν(rq,ri)
of two nodes rq and ri that are part of the same
connection graph. If ν is low, ri is spatially relevant to
rq. With increasing neighborhood distance, the spatial
relevance between rq and ri decreases. In this paper
we focus on the simplest case, where horizontal
proximity can be seen as a rough abstraction of
Euclidean distance between the centers of the spatial
entities. Obviously, this notion will have to be refined,
and parameters like the connectivity or the distribution
of relative sizes of the spatial entities will have to be
addressed as well.
• Secondly, there is the hierarchical (or vertical)
distance δ(rq,ri) of two reference units rq and ri
with respect to the underlying decomposition
hierarchy. The semantics of vertical distance are more
difficult to grasp as they depend heavily on the
semantics of the pSRT and the resulting
decomposition hierarchy. Given a pSRT of
administrative units, a low δ(rq,ri) means that rq
and ri belong to the same administrative super-unit. A
high δ(rq,ri) indicates that the reference units are
“administratively” far apart. The level of vertical
distance between two units in different branches of the
decomposition tree increases with the total depth of
the hierarchical decomposition.
Intuitively, this makes sense: with respect to administrative
issues, a county in California is much further away from a
(comparable) district in Mexico than it is from a county in
Arizona because California and Arizona at least belong to
the same nation (USA). In summary, these two criteria for
spatial relevance lead to a spatial relevance metric that is
based on a combination of the “branch distance” (i.e. the
number of nodes between the start node and the first
common parent it has with the target node) in the DAG
representing the decomposition hierarchy, and the shortest
path distances in the connection graph representing a
reference tessellation at a specific level of detail.
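The branch distance can be sketched as follows. The wording "number of nodes between the start node and the first common parent" leaves some room for interpretation; this sketch counts the part-of steps from the start node up to the first common ancestor, which is an assumption. The small administrative hierarchy is purely illustrative and not taken from the prototype.

```python
# Illustrative administrative hierarchy stored as child -> parent.
PARENT = {
    "USA": "World", "Mexico": "World",
    "California": "USA", "Arizona": "USA", "Baja California": "Mexico",
    "Imperial": "California", "San Diego": "California",
    "Yuma": "Arizona", "Mexicali": "Baja California",
}

def path_to_root(unit):
    """Return the unit followed by all of its ancestors up to the root."""
    path = [unit]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def branch_distance(rq, ri):
    """Steps from rq up to the first node that is also an ancestor of ri
    (or ri itself)."""
    targets = set(path_to_root(ri))
    for steps, node in enumerate(path_to_root(rq)):
        if node in targets:
            return steps
    raise ValueError("units do not belong to the same hierarchy")

print(branch_distance("Imperial", "San Diego"))  # 1: same state
print(branch_distance("Imperial", "Yuma"))       # 2: same nation
print(branch_distance("Imperial", "Mexicali"))   # 3: different nation
```

Under this reading the county in California is vertically closer to the county in Arizona than to the district in Mexico, in line with the example above.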
In a prototypical implementation, Dijkstra’s shortest-path
algorithm was used to compute the horizontal distance
ν(rq,ri) in the connection graph. The vertical distance
δ(rq,ri) was computed by recursively traversing the
decomposition tree until the first common parent of rq and
ri was reached. The total distance D(rq,ri) was obtained
by a linear combination of ν(rq,ri) and δ(rq,ri):
D(rq,ri) = αν(rq,ri) + (1-α)δ(rq,ri), and
σ(rq,ri) = 1/D(rq,ri)
The term α is a weighting factor with a range between 0
and 1. By manipulating α, a spatial query can be fine-tuned
to favour either locations that are spatially close to the
location of interest (α=1), or locations that belong to the
same part of a hierarchical partonomy (α=0). As an
example, we used a pSRT of US counties and US states to
compute σi(rq,{ri,…,rn}) for different values of α
(Figure 4). As query location rq, we chose a county close
to a state boundary. Obviously, this location has the
maximum spatial relevance (σ(rq,rq)=1). For α=1, a
quasi-circular region with counties of decreasing spatial
relevance was computed (Figure 4-a). This region depends
only on geographic neighbourhood, without taking into
account state boundaries. For α=0, all counties in the state
which contains the query location were assigned the same
spatial relevance because they all belong to the same
administrative super-unit (Figure 4-b). All counties in the
neighbouring states are uniformly assigned a lower spatial
relevance. To demonstrate the effect of the depth of the
decomposition hierarchy, we introduced a (somewhat
artificial) subdivision into regions. As a result, the states to
the north are assigned an even lower σ because they belong
to a different region. Finally, for α=0.5, two semi-circles
of decreasing spatial relevance were drawn around rq
(Figure 4-c). They are separated by the state boundary, the
counties in the neighbouring state generally showing a
lower spatial relevance than the counties in the “target”
state.
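The interplay of ν, δ and α can be reproduced on a toy example. The sketch below is not the prototype: the five-county connection graph, the two-level hierarchy and all function names are hypothetical. Horizontal distance is computed with Dijkstra's algorithm on unit edge weights, and vertical distance is taken as the number of steps up to the first common parent. Since 1/D is undefined for D = 0 while the text reports σ(rq,rq)=1, the query location is treated as a special case, which is an assumption of this sketch.

```python
import heapq

# Hypothetical connection graph: a1..a3 lie in state A, b1..b2 in state B;
# a3 and b1 are neighbours across the state boundary.
NEIGHBOURS = {
    "a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "b1"],
    "b1": ["a3", "b2"], "b2": ["b1"],
}
STATE = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

def horizontal_distances(graph, source):
    """Dijkstra with unit edge weights: hops from source to every node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour in graph[node]:
            candidate = d + 1
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

def vertical_distance(rq, ri):
    """Steps from rq to the first common parent (county, state, nation)."""
    if rq == ri:
        return 0
    return 1 if STATE[rq] == STATE[ri] else 2

def total_distance(nu, delta, alpha):
    """D = alpha * nu + (1 - alpha) * delta, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * nu + (1.0 - alpha) * delta

def relevance(distance):
    """sigma = 1 / D; the query location itself (D = 0) is assigned 1."""
    return 1.0 if distance == 0 else 1.0 / distance

def relevance_map(rq, alpha):
    """Spatial relevance of every county with respect to rq."""
    nu = horizontal_distances(NEIGHBOURS, rq)
    return {ri: relevance(total_distance(nu[ri], vertical_distance(rq, ri), alpha))
            for ri in NEIGHBOURS}

for alpha in (1.0, 0.0, 0.5):
    print(alpha, relevance_map("a3", alpha))
```

With α=1 the neighbour b1 across the boundary is as relevant as the in-state neighbour a2; with α=0 all counties of state A share one value and those of state B a lower one; with α=0.5 the in-state neighbour a2 ranks above b1, which qualitatively matches the three cases described above.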
In an information retrieval task we try to solve queries
of the type concept@location, i.e. we try to find
information sources that are relevant with respect to a
specific thematic concept at or close to a specific location,
or place name. In such a context, the value of α depends
very much on what we try to find, and why we try to find
it. If, for example, we are looking for a vacation home in or
close to a specific county, all neighbouring counties are
relevant, no matter if they belong to another state or not.
On the other hand, if we are searching for suitable property
to set up our business, it may be very important which state
we are in due to different tax laws and other state-specific
legislation.
Figure 4: Computed spatial relevance for α=1 (a),
α=0 (b), and α=0.5 (c)
As a result, the value of α depends on the context of the
query, i.e. both on user intent and the semantics of the
thematic concept. In our prototype, we assigned a default
value of α=0.5 for all queries and left it to the user to
adjust this parameter as needed. In future implementations
it may be worthwhile to evaluate options for including
concept-specific default values for α in the formal
description of the query concepts. Future improvements of
the system will also include algorithms to cope with
multi-unit footprints, the evaluation of relations between
(overlapping) place names, and the evaluation of spatially
relevant regions between a set of spatially disconnected
query locations.
Results and Discussion
In this paper, we showed that polygonal Standard
Reference Tessellations can be used to build qualitative,
graph-based spatial reference models. These models retain
enough spatial information to support the type of reasoning
about spatial relevance needed in information retrieval
applications. Based on the definition of spatial footprints in
terms of the reference units of such standard reference
models, spatial relevance reasoning can be extended to
place names.
Compared to polygonal GIS data, qualitative reference
models are light-weight and highly interoperable. This
makes them useful for a number of applications, including
machine-readable indices of digital maps (Schlieder and
Vögele 2002), applications as metadata in highly
distributed and ad-hoc networks (Vögele and Schlieder
2002), as well as mobile and location-based services.
In this paper, we outlined the basic concepts of graph-based
spatial reference models, qualitative spatial
footprints, and reasoning about spatial relevance. These
concepts will be extended and discussed in more detail in
papers to come.
References
Alani, H., C. B. Jones and D. Tudhope 2001. Voronoi-Based
Region Approximation for Geographical
Information Retrieval with Gazetteers. International
Journal of Geographical Information Science 15(4):
287-306.
Angrick, M., R. Bös and T. Bandholtz 2002. Semantic
Network Services (SNS). Proceedings of the 16th
conference "Environmental Informatics 2002"
(EnviroInfo'2002), Vienna.
Brandt, L., L. L. Hill and M. F. Goodchild 1999. Digital
Gazetteer Information Exchange (DGIE) - Final Report.
Digital Gazetteer Information Exchange Workshop.
Goodchild, M. 1999. The Future of the Gazetteer. Digital
Gazetteer Information Exchange Workshop.
Hill, L. L., J. Frew and Q. Zheng 1999. Geographic
names: The implementation of a gazetteer in a
georeferenced digital library. D-Lib Magazine 5(1).
ISO/TC-211 2000. ISO/DIS 19119 - Geographic
Information - Services, Norwegian Technology Centre.
ISO/TC-211 2000. ISO/FDIS 19115 - Geographic
Information - Metadata, Norwegian Technology Centre.
Jones, C., H. Alani and D. Tudhope 2001. Geographical
Information Retrieval with Ontologies of Place. COSIT
2001, Morro Bay, California.
Kuhn, W., S. Basedow, C. Brox, C. Riedemann, H. Rossol,
K. Senkler and K. Zens 2000. Geospatial Data
Infrastructure (GDI) North-Rhine Westfalia - Reference
Model 3.0. Münster, Geoinformatik Münster: 46.
OGC 1999. The OpenGIS Abstract Specification, Open
GIS Consortium.
OGC 2001. OpenGIS Consortium Discussion Paper Basic Services Model 0.0.7.
Riekert, W.-F. 1999. Erschließung von Fachinformationen
im Internet mit Hilfe von Thesauri und Gazetteers.
Management von Umweltinformationen in vernetzten
Umgebungen, 2nd workshop HMI, Nürnberg.
Schlieder, C. and T. Vögele 2002. Indexing and Browsing
Digital Maps with Intelligent Thumbnails. In:
Proceedings of the International Symposium on Spatial
Data Handling (SDH) 2002, Ottawa, Canada, Springer.
Schlieder, C., T. Vögele and U. Visser 2001. Qualitative
Spatial Reasoning for Information Retrieval by
Gazetteers. Conference on Spatial Information Theory
(COSIT) 2001, Morro Bay, California.
Tobler, W. 1970. A Computer Movie Simulating Urban
Growth in the Detroit Region. Economic Geography. 46:
360-371.
Vögele, T. and C. Schlieder 2002. The Use of Spatial
Metadata for Information Retrieval in Peer-to-Peer
Networks. AGILE2002, Palma de Mallorca, Spain.
Worboys, M. F. 1995. GIS - A Computing Perspective.
London, Philadelphia. Taylor & Francis.