Spatially-Aware Information Retrieval with Graph-Based Qualitative Reference Models

Thomas Vögele(1), Christoph Schlieder(2)
(1) TZI, University of Bremen, PO-Box 330440, 28334 Bremen, Germany
(2) Bamberg University, 96045 Bamberg, Germany
vogele@tzi.de | christoph.schlieder@wiai.uni-bamberg.de

Abstract

Geo-referenced information is used by a growing number of “spatially-aware” tools in different application areas, including tourism, marketing, environmental management, and mobile location-based services. To support such applications, methods for “spatially-aware” information retrieval are needed that consider not only the thematic but also the spatial relevance of information items. In addition to information that is “directly” geo-referenced with the help of coordinates, there exist large amounts of information that is geo-referenced “indirectly” through place names. Place name lists, or gazetteers, link place names to coordinate space, but offer only limited options for the spatial representation of place names and for reasoning about spatial relevance. In this paper, we outline the concept of qualitative spatial reference models that use regional approximations of place names in support of reasoning about spatial relevance. The core components of such reference models are graph-based abstractions of polygonal standard reference tessellations, together with their intrinsic decomposition hierarchies. Reasoning about spatial relevance is based on a metric that evaluates the vertical and horizontal proximity of spatial entities.

Introduction

As growing stores of geospatial information are being collected worldwide, the management of and access to this information becomes more and more important. Several ongoing initiatives try to set standards and to establish infrastructures for the exchange of geospatial data, both on a national and on an international level (OGC 1999), (OGC 2001), (ISO/TC-211 2000), (ISO/TC-211 2000), (Kuhn, Basedow et al. 2000).
Because most of these efforts are rooted in the GI community, the term “geospatial information” is used mainly for data sources such as digital cartographic products, surveys, satellite images, aerial photographs, and data from ground-based and atmospheric monitoring stations. These resources use geographic coordinates to locate their footprints on the Earth’s surface and can thus be categorized as directly geo-referenced geospatial data (Goodchild 1999).

However, many geospatial data also have spatial relations to named geographic features such as cities, parks, and biogeographic regions. Named geographic features, or place names, are managed with the help of gazetteers, and digital geo-referenced gazetteers are used in a number of digital library projects (e.g., the Alexandria Digital Library (Hill, Frew et al. 1999)). They link place names to geographic footprints and provide an indirect geo-referencing of geospatial information.

We claim that an integrated view on directly and indirectly geo-referenced data can be the basis for a number of new “geographically aware” applications. These may be found in, but will not be confined to, the fields of intelligent information retrieval and spatial metadata (Schlieder and Vögele 2002), the application of ad-hoc networks for the exchange of geospatial information (Vögele and Schlieder 2002), as well as mobile and location-based services. These increasingly user-centered applications will have to rely on “personalized” gazetteers that are based on standardized place names provided by “official” gazetteers, but can be customized by the user for special purposes (Brandt, Hill et al. 1999).

One of the major advantages of gazetteers is the fact that they provide a “parsimonious representation of geographic space that combines a rich set of place name data with only limited locational data” (Jones, Alani et al. 2001).
However, this is also one of their main weaknesses: by drastically reducing the complexity of the spatial representations, gazetteers offer only very limited support for spatial reasoning. Much of the spatial knowledge that is implicit in representations like the ones typically used by a GIS has to be explicitly encoded into a gazetteer. There are approaches which extend the spatial reasoning capabilities of gazetteers with the help of Voronoi polygons based on coordinate points (Alani, Jones et al. 2001), or with spatial indices based on uniform grids (Riekert 1999).

In this paper we outline an approach that uses graph-based spatial reference models as the basis for qualitative spatial footprints of place names. We show how these representations can be used for reasoning about spatial relevance.

FLAIRS 2003. Copyright © 2003, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Components of a Spatial Reference Model

Place Names and Spatial Footprints

One of the primary tasks of a gazetteer is to geospatially reference place names, i.e. to provide a common frame of reference for the geographic positioning and disambiguation of place names. Most gazetteers approximate the regional extent of geographic objects through a spatial footprint in the form of a geographic (point) coordinate or a rectangular bounding box (defined by two point coordinates). Only in some cases are more complex geographic representations, such as polygons, used.

All footprint representations described so far rely on a geographic coordinate system and on geometric algorithms to evaluate spatial relations among place names. An alternative approach is to abstract from geographic coordinates and to use spatial indices. Some gazetteers represent a place name as a set of spatial indices that is obtained by projecting the regional extent of the place name onto a uniform orthogonal reference grid (Angrick, Bös et al. 2002).
The concept of a qualitative spatial footprint presented in this paper relies on spatial indices as well. However, instead of a uniform (or other) regular reference grid, we use polygonal standard reference tessellations as a frame of reference.

Polygonal Standard Reference Tessellations

In a GIS, polygons are frequently used to represent geographic objects in 2-dimensional Euclidean space E². A (simple) polygon in E² can be defined as an area that is enclosed by a simple closed polyline, which represents the boundary of the polygon (Worboys 1995). The polyline consists of a finite set of line segments (edges). The endpoints of the edges are called vertices. In this paper we apply a definition where a polygon is a closed set of points, i.e. edges and vertices belong to the polygon.

Polygons can be arranged in different ways in E². If we consider polygons P1,...,Pn that are contained in a part of the plane bounded by a polygon P, two types of arrangements of the polygons within the containing polygon P can be distinguished (Schlieder, Vögele et al. 2001):

• A polygonal covering, where P = P1 ∪ ... ∪ Pn. The polygons cover the containing polygon and, in general, they will overlap.
• A polygonal patchwork, where interior(Pi ∩ Pj) = ∅ for all i ≠ j from {1,...,n}. In a patchwork, the polygons are either disjoint or intersect only in edges and/or vertices.

A special, but very common type of arrangement is the polygonal tessellation, which can be defined as a polygonal covering that also forms a polygonal patchwork. If we decompose a polygonal covering, patchwork, or tessellation P into its components P1,...,Pn, we obtain a decomposition that can be represented by a hierarchical data structure encoding the spatial part-of relation together with the type of arrangement of the parts (Schlieder, Vögele et al. 2001).
For the special case of a polygonal decomposition by tessellation, we can define the relation tess ⊆ Π × 2^Π, where Π denotes the set of polygons in the plane, and tess(P,{P1,…,Pk}) iff {P1,…,Pk} is a tessellation of P. Using this relation, we can say that a polygon P1 is spatially part-of a polygon P2 if P1 is part of the decomposition by tessellation of P2, i.e. P1 ⊏ P2 iff tess(P2,{…,P1,…}). Applied to the decomposition hierarchy shown in Figure 1, we can say for example that AA ⊏ A, and tess(A,{AA,AB}).

Figure 1: Decomposition hierarchy of a polygonal tessellation

Many artificial, man-made subdivisions of geographic space form polygonal tessellations. Typical examples are administrative units, postal code areas, and census districts. Because they represent “official” and standardized spatial models, we refer to them as polygonal Standard Reference Tessellations, or pSRTs. For a number of reasons, pSRTs are well suited to provide a reference for spatial indices:

• Many pSRTs offer reference units that can be addressed in an intuitive way through well-known descriptors (which are in fact place names themselves), such as the names of administrative units in a tessellation of administrative subdivisions. Human users can relate to such entities much better than to arbitrarily created and cryptically named uniform grid cell rasters. For example, it is much easier to refer to a polygon called Contra Costa County than to a grid cell descriptor like CA1089.
• Many organizations use pSRTs of administrative units or postal code areas to geo-reference their (spatial) data holdings. As a result, digital versions of pSRTs are typically easy to obtain.
• Administrative units and other pSRTs are organized as hierarchical partonomic structures. A nation may decompose into states, each of which decomposes into counties, and so on. We will show below how these decomposition hierarchies can be used in support of spatial relevance reasoning.
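The definitions of covering, patchwork, tessellation, and part-of given above can be illustrated with a minimal Python sketch. It is not the authors' implementation. As a simplifying assumption, each polygon is replaced by a set of unit grid cells, so that two regions share interior exactly when they share a cell; the function names and the cell coordinates are hypothetical. Only tess(A,{AA,AB}) is taken from the text.

```python
from itertools import combinations

# Assumption: a "polygon" is approximated by a frozenset of unit grid cells.
# Two regions then overlap in their interiors iff they share at least one cell.

def is_covering(parent, parts):
    """Covering: the union of the parts equals the containing region."""
    return frozenset().union(*parts) == parent

def is_patchwork(parts):
    """Patchwork: no two parts share interior (here: no shared cell)."""
    return all(not (a & b) for a, b in combinations(parts, 2))

def tess(parent, parts):
    """tess(P, {P1..Pk}): a covering that also forms a patchwork."""
    return is_covering(parent, parts) and is_patchwork(parts)

def part_of(p1, p2, decompositions):
    """P1 part-of P2 iff P1 occurs in the recorded tessellation of P2."""
    return p1 in decompositions.get(p2, ())

# Hypothetical 2x2 region A, split into a left half AA and a right half AB.
A = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
AA = frozenset({(0, 0), (0, 1)})
AB = frozenset({(1, 0), (1, 1)})
# A variant of AA that spills into AB: still a covering, no longer a patchwork.
AA_wide = AA | {(1, 0)}

# Decomposition recorded by name, as stated in the text: tess(A, {AA, AB}).
DECOMPOSITIONS = {"A": ("AA", "AB")}

print(tess(A, [AA, AB]))        # covering and patchwork
print(tess(A, [AA_wide, AB]))   # covering only
print(part_of("AA", "A", DECOMPOSITIONS))
```

With real polygon data the cell sets would be replaced by geometric tests on polygon interiors; the logical structure of the three predicates would stay the same.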
Graph-Based Spatial Reference Models

About the Need for Qualitative Abstraction

Quantitative (i.e., coordinate-based) representations of pSRTs are often rather complex and bulky. A polygonal tessellation of the counties of the contiguous United States, for example, may easily exceed 9 MB (ESRI shape format). However, for the type of qualitative spatial relevance reasoning which is the focus of this paper, most of the information content of such quantitative representations is redundant. In our approach, we use qualitative, graph-based abstractions of pSRTs. They capture the topological relations between polygons needed for qualitative spatial relevance reasoning. At the same time, they are the basis for light-weight and highly exchangeable spatial models.

Connection Graphs and Decomposition Trees

There are a number of approaches that use graph structures to represent topological relations between regional geographic objects (Molenaar 1998), (Kuijpers, Paredaens et al. 1995). Based on this work, we introduced connection graphs (Schlieder, Vögele et al. 2001) to encode topological neighbourhood relations between polygons in a tessellation. Connection graphs encode neighbourhood relations together with their ordering and, if applicable, the identification of an external area.

By recursively decomposing a pSRT, we can analyse its hierarchical structure and represent it as a decomposition tree. The recursive decomposition of the pSRT depicted in Figure 1, for example, yields the decomposition tree shown in Figure 2.

Formally, the decomposition hierarchy of a reference tessellation is a directed acyclic graph (DAG). Let R be the set of all reference units in a pSRT D. Each node in the graph represents a reference unit r ∈ R, while an edge between two nodes ri and rj denotes a spatial part-of relation between the reference units, i.e. ri ⊏ rj. Reference units can be grouped into partonomic sets S, where S = {ri,…,rn}, ri ∈ R.
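As a rough illustration of the two graph structures just described, the following Python sketch stores a connection graph as ordered neighbour lists with an explicit marker for the external area, and the decomposition hierarchy as a set of part-of edges. The layout of the three units A, B, and C, the `EXT` marker, and all function names are assumptions made for the example; the source does not specify a concrete encoding.

```python
EXT = "EXT"  # hypothetical marker for the external area

# Connection graph for one level of detail: each unit lists its neighbours
# in a fixed (e.g. clockwise) order. The adjacency shown here is invented.
CONNECTION = {
    "A": [EXT, "B", "C"],
    "B": [EXT, "C", "A"],
    "C": [EXT, "A", "B"],
}

# Decomposition hierarchy as directed part-of edges (child, parent).
# Only AA part-of A and AB part-of A are stated in the text.
PART_OF = {("A", "ROOT"), ("B", "ROOT"), ("C", "ROOT"), ("AA", "A"), ("AB", "A")}

def neighbours(unit):
    """Neighbouring reference units in stored order, without the external area."""
    return [n for n in CONNECTION[unit] if n != EXT]

def borders_exterior(unit):
    """True if the unit touches the external area."""
    return EXT in CONNECTION[unit]

def is_symmetric():
    """Neighbourhood is mutual: if v is a neighbour of u, u is one of v."""
    return all(u in CONNECTION[v] for u in CONNECTION for v in neighbours(u))

def parents(unit):
    """Units that the given unit is directly part-of."""
    return {p for c, p in PART_OF if c == unit}

print(neighbours("A"), borders_exterior("A"), parents("AA"))
```

Keeping the neighbour order in a list rather than a set is what distinguishes this from a plain adjacency structure; whether that order is clockwise or counter-clockwise is left open here.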
A partonomic set S of reference units is called non-redundant if none of the reference units in S is spatially part-of another reference unit in S, i.e. S is non-redundant iff ∀ri ∈ S, ¬∃rj ∈ S : ri ⊏ rj. A partonomic set S of reference units is normalized if all reference units r ∈ S have the same graph-theoretical depth de, i.e. the same distance from the root: S is normalized iff ∀ri ∈ S, ∀rj ∈ S : de(ri) = de(rj).

An important property of a pSRT is that it can be decomposed into normalized partonomic sets of reference units, with each set representing a specific level of the partonomic hierarchy of the pSRT. Because each level of the hierarchy represents a specific granularity of the pSRT, we can also speak of the levels-of-detail L of the decomposition tree. The decomposition tree in Figure 2, for example, has four levels-of-detail, with L0 being the least and L3 the most detailed representation.

Figure 2: A decomposition tree

Qualitative Spatial Footprints

Using the qualitative spatial reference model described above, we can approximate the regional extent pn of a place name in terms of a set of reference units Spn = {ri,…,rn}, ri ∈ R, where R is the set of all reference units in a spatial reference model. To map pn to R, we use an is-defined-as function L: P → 2^R, where P is the set of all place name regions, and 2^R is the power set of all reference units. We call Spn the qualitative spatial footprint of the place name region pn.

L is based on an evaluation of the topological relation between pn and ri ∈ R. Because both pn and ri represent regions in 2-D Euclidean space, we can use the region connection calculus RCC-8 (Randell et al. 1992) to describe the topological relations between them. For the case where pn is equal to or a proper part of a single reference unit ri, its spatial footprint Spn contains only one element ri. If pn overlaps or contains multiple reference units, Spn consists of more than one reference unit.
If pn cannot be mapped onto a reference unit, i.e. pn and ri are disconnected or externally connected, Spn is an empty set:

L(p) = {ri}        if EQ(p,ri) ∨ PP(p,ri)
       {…,ri,…}    if PO(p,ri) ∨ PP⁻¹(p,ri)
       {}          if DC(p,ri) ∨ EC(p,ri)

Applied to the decomposition hierarchy depicted in Figure 1, the normalized spatial footprints for the three place names shown in Figure 3 can be defined as SPN1 = {AAA}, SPN2 = {AAA,AAB,ABA,ABB,CBA}, SPN3 = {ABA,ABB,BAA,BAB,BBA}.

Figure 3: Three place names projected onto a polygonal reference tessellation

As long as Spn remains non-redundant, the spatial footprint may consist of reference units taken from different levels of the decomposition hierarchy. For example, SPN2 in Figure 3 could be defined in a non-normalized form as a combination of reference units from L1 and L3, i.e. SPN2 = {A,CBA}. This has significant practical implications, as it eliminates the need to go to the highest level of detail to define footprints for place names with a large spatial extent. However, non-normalized spatial footprints have to be normalized if we want to evaluate neighbourhood relations and compute spatial relevance (see below). This is achieved with the help of a normalization operator η that recursively decomposes all reference units ri,…,rj in the spatial footprint until ∀ri ∈ S, ∀rj ∈ S : de(ri) = de(rj).

Reasoning about Spatial Relevance

A central idea behind “spatially-aware” information retrieval is to provide access to information items based not just on thematic, but also on spatial relevance. This raises the problem of defining the term “spatially relevant”, and of finding an appropriate metric for its computation.

In geographic space, “everything is related to everything else, but near things are more related than distant things” (Tobler 1970). Therefore, we hypothesize that the spatial relevance σ(rq,ri) of a location ri with respect to a query location rq increases with decreasing distance D between ri and rq (Schlieder, Vögele et al. 2001).
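Returning to the footprint definitions, the mapping L, the non-redundancy and normalization conditions, and the operator η can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The subtree below A follows from the SPN2 example in the text; the branch C, CB, CBA is inferred from the label naming, and all other branches are omitted. The relation labels `"PP"` and `"PPi"` stand in for the proper-part relations and their inverses, the part-of relation is treated transitively when checking redundancy, and η is assumed to decompose down to the deepest level present in the footprint.

```python
# Partial, partly hypothetical decomposition tree (children per unit).
CHILDREN = {
    "ROOT": ["A", "C"],
    "A": ["AA", "AB"],
    "AA": ["AAA", "AAB"],
    "AB": ["ABA", "ABB"],
    "C": ["CB"],
    "CB": ["CBA"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

def ancestors(unit):
    """All units that the given unit is (transitively) part-of."""
    chain = []
    while unit in PARENT:
        unit = PARENT[unit]
        chain.append(unit)
    return chain

def depth(unit):
    """Graph-theoretical depth de: number of edges up to the root."""
    return len(ancestors(unit))

def footprint(relations):
    """L: keep every reference unit whose relation to the place name region
    is EQ, PP, PO or the inverse of PP; drop DC and EC."""
    return {u for u, rel in relations.items() if rel in {"EQ", "PP", "PO", "PPi"}}

def is_non_redundant(units):
    """No unit in the set is part-of another unit in the set."""
    return all(not (set(ancestors(u)) & set(units)) for u in units)

def is_normalized(units):
    """All units lie at the same depth."""
    return len({depth(u) for u in units}) <= 1

def normalize(units):
    """Eta: decompose shallower units until all share the deepest depth."""
    target = max((depth(u) for u in units), default=0)
    result, stack = set(), list(units)
    while stack:
        u = stack.pop()
        if depth(u) < target and CHILDREN.get(u):
            stack.extend(CHILDREN[u])  # replace the unit by its parts
        else:
            result.add(u)
    return result

print(sorted(normalize({"A", "CBA"})))
```

Applied to the non-normalized footprint {A, CBA} from the text, `normalize` yields the five level-L3 units listed for SPN2. A unit without recorded children that lies above the target depth is left unchanged, which is a design choice of this sketch.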
In the simplest case σ(rq,ri) = 1/D(rq,ri). However, the concept of spatial relevance is only useful in a comparative approach. If we consider two locations ri and rj, we can say that ri is spatially more relevant than rj to a query location rq if σ(rq,ri) > σ(rq,rj).

In a graph-based spatial reference model we can easily compute the graph-theoretical distances (and therefore values for spatial relevance) between nodes. However, because the spatial reference model combines neighbourhood (i.e. connection) graphs with a hierarchical decomposition tree, two types of distances interact:

• The neighbourhood (or horizontal) distance ν(rq,ri) of two nodes rq and ri that are part of the same connection graph. If ν is low, ri is spatially relevant to rq. With increasing neighbourhood distance, the spatial relevance between rq and ri decreases. In this paper we focus on the simplest case, where horizontal proximity can be seen as a rough abstraction of the Euclidean distance between the centers of the spatial entities. Obviously, this notion will have to be refined, and parameters like the connectivity or the distribution of relative sizes of the spatial entities will have to be addressed as well.
• The hierarchical (or vertical) distance δ(rq,ri) of two reference units rq and ri with respect to the underlying decomposition hierarchy. The semantics of vertical distance are more difficult to grasp, as they depend heavily on the semantics of the pSRT and the resulting decomposition hierarchy. Given a pSRT of administrative units, a low δ(rq,ri) means that rq and ri belong to the same administrative super-unit. A high δ(rq,ri) indicates that the reference units are “administratively” far apart. The vertical distance between two units in different branches of the decomposition tree increases with the total depth of the hierarchical decomposition.
Intuitively, this makes sense: with respect to administrative issues, a county in California is much further away from a (comparable) district in Mexico than it is from a county in Arizona, because California and Arizona at least belong to the same nation (USA).

In summary, these two criteria for spatial relevance lead to a spatial relevance metric that is based on a combination of the “branch distance” (i.e. the number of nodes between the start node and the first common parent it has with the target node) in the DAG representing the decomposition hierarchy, and the shortest-path distances in the connection graph representing a reference tessellation at a specific level of detail.

In a prototypical implementation, Dijkstra’s shortest-path algorithm was used to compute the horizontal distance ν(rq,ri) in the connection graph. The vertical distance δ(rq,ri) was computed by recursively traversing the decomposition tree until the first common parent of rq and ri was reached. The total distance D(rq,ri) was obtained by a linear combination of ν(rq,ri) and δ(rq,ri):

D(rq,ri) = αν(rq,ri) + (1-α)δ(rq,ri), and σ(rq,ri) = 1/D(rq,ri)

The term α is a weighting factor with a range between 0 and 1. By manipulating α, a spatial query can be fine-tuned to favour either locations that are spatially close to the location of interest (α=1), or locations that belong to the same part of a hierarchical partonomy (α=0).

As an example, we used a pSRT of US counties and US states to compute σ(rq,{ri,…,rn}) for different values of α (Figure 4). As query location rq, we chose a county close to a state boundary. Obviously, this location has the maximum spatial relevance (σ(rq,rq)=1). For α=1, a quasi-circular region with counties of decreasing spatial relevance was computed (Figure 4-a). This region depends only on geographic neighbourhood, without taking state boundaries into account.
For α=0, all counties in the state which contains the query location were assigned the same spatial relevance because they all belong to the same administrative super-unit (Figure 4-b). All counties in the neighbouring states are uniformly assigned a lower spatial relevance. To demonstrate the effect of the depth of the decomposition hierarchy, we introduced a (somewhat artificial) subdivision into regions. As a result, the states to the north are assigned an even lower σ because they belong to a different region. Finally, for α=0.5, two semi-circles of decreasing spatial relevance were drawn around rq (Figure 4-c). They are separated by the state boundary, with the counties in the neighbouring state generally showing a lower spatial relevance than the counties in the “target” state.

Figure 4: Computed spatial relevance for α=1 (a), α=0 (b), and α=0.5 (c)

In an information retrieval task we try to solve queries of the type concept@location, i.e. we try to find information sources that are relevant with respect to a specific thematic concept at or close to a specific location, or place name. In such a context, the value of α depends very much on what we try to find, and why we try to find it. If, for example, we are looking for a vacation home in or close to a specific county, all neighbouring counties are relevant, regardless of whether they belong to another state. On the other hand, if we are searching for suitable property to set up a business, it may be very important which state we are in, due to different tax laws and other state-specific legislation.

As a result, the value of α depends on the context of the query, i.e. both on user intent and on the semantics of the thematic concept. In our prototype, we assigned a default value of α=0.5 to all queries and left it to the user to adjust this parameter as needed.
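The combined metric D(rq,ri) = αν(rq,ri) + (1-α)δ(rq,ri) discussed above can be tried out on a toy model with the following Python sketch. It rests on several assumptions and is not the prototype described in the text: the five counties `c1` to `c5`, the two states `S1` and `S2`, and their adjacency are invented; the horizontal distance is computed with a breadth-first search, which gives the same hop counts as Dijkstra's algorithm when all edges have unit weight; and the vertical distance counts the steps from rq up to the first common parent, which is one possible reading of the “branch distance”.

```python
from collections import deque

# Hypothetical model: counties c1..c3 in state S1, c4..c5 in state S2,
# arranged in a row so that c3 and c4 meet at the state boundary.
PARENT = {"S1": "NATION", "S2": "NATION",
          "c1": "S1", "c2": "S1", "c3": "S1", "c4": "S2", "c5": "S2"}
NEIGHBOURS = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"],
              "c4": ["c3", "c5"], "c5": ["c4"]}

def horizontal(rq, ri):
    """nu: hop count between two units in the connection graph (BFS)."""
    hops = {rq: 0}
    queue = deque([rq])
    while queue:
        u = queue.popleft()
        if u == ri:
            return hops[u]
        for v in NEIGHBOURS[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    raise ValueError("units are not connected")

def ancestors(unit):
    """Chain of super-units from the direct parent up to the root."""
    chain = []
    while unit in PARENT:
        unit = PARENT[unit]
        chain.append(unit)
    return chain

def vertical(rq, ri):
    """delta: steps from rq up to the first parent it shares with ri."""
    if rq == ri:
        return 0
    shared = set(ancestors(ri))
    for steps, unit in enumerate(ancestors(rq), start=1):
        if unit in shared:
            return steps
    raise ValueError("units have no common parent")

def distance(rq, ri, alpha):
    """D = alpha * nu + (1 - alpha) * delta."""
    return alpha * horizontal(rq, ri) + (1 - alpha) * vertical(rq, ri)

def relevance(rq, ri, alpha):
    """sigma = 1 / D; left undefined here for D == 0 (raises ZeroDivisionError)."""
    return 1.0 / distance(rq, ri, alpha)

def rank(rq, alpha):
    """Other counties ordered from most to least relevant (ties by name)."""
    others = [u for u in NEIGHBOURS if u != rq]
    return sorted(others, key=lambda u: (distance(rq, u, alpha), u))

for alpha in (1.0, 0.0, 0.5):
    print(alpha, rank("c3", alpha))
```

For the query county `c3`, α=1 ranks the two direct neighbours `c2` and `c4` first regardless of the state boundary, while α=0 ranks both counties of the same state ahead of the counties across the boundary, which mirrors the qualitative behaviour reported for Figure 4. The relevance of the query location to itself is deliberately left out, since the text does not state how D=0 is handled.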
In future implementations it may be worthwhile to evaluate options for including concept-specific default values for α in the formal description of the query concepts. Future improvements of the system will also include algorithms to cope with multi-unit footprints, the evaluation of relations between (overlapping) place names, and the evaluation of spatially relevant regions between a set of spatially disconnected query locations.

Results and Discussion

In this paper, we showed that polygonal Standard Reference Tessellations can be used to build qualitative, graph-based spatial reference models. These models retain enough spatial information to support the type of reasoning about spatial relevance needed in information retrieval applications. Based on the definition of spatial footprints in terms of the reference units of such standard reference models, spatial relevance reasoning can be extended to place names.

Compared to polygonal GIS data, qualitative reference models are light-weight and highly interoperable. This makes them useful for a number of applications, including machine-readable indices of digital maps (Schlieder and Vögele 2002), applications as metadata in highly distributed and ad-hoc networks (Vögele and Schlieder 2002), as well as mobile and location-based services.

In this paper, we outlined the basic concepts of graph-based spatial reference models, qualitative spatial footprints, and reasoning about spatial relevance. These concepts will be extended and discussed in more detail in papers to come.

References

Alani, H., C. B. Jones and D. Tudhope 2001. Voronoi-Based Region Approximation for Geographical Information Retrieval with Gazetteers. International Journal of Geographical Information Science 15(4): 287-306.
Angrick, M., R. Bös and T. Bandholtz 2002. Semantic Network Services (SNS). Proceedings of the 16th conference "Environmental Informatics 2002" (EnviroInfo'2002), Vienna.
Brandt, L., L. L. Hill and M. F. Goodchild 1999.
Digital Gazetteer Information Exchange (DGIE) - Final Report. Digital Gazetteer Information Exchange Workshop.
Goodchild, M. 1999. The Future of the Gazetteer. Digital Gazetteer Information Exchange Workshop.
Hill, L. L., J. Frew and Q. Zheng 1999. Geographic names: The implementation of a gazetteer in a georeferenced digital library. D-Lib Magazine 5(1).
ISO/TC-211 2000. ISO/DIS 19119 - Geographic Information - Services, Norwegian Technology Centre.
ISO/TC-211 2000. ISO/FDIS 19115 - Geographic Information - Metadata, Norwegian Technology Centre.
Jones, C., H. Alani and D. Tudhope 2001. Geographical Information Retrieval with Ontologies of Place. COSIT 2001, Morro Bay, California.
Kuhn, W., S. Basedow, C. Brox, C. Riedemann, H. Rossol, K. Senkler and K. Zens 2000. Geospatial Data Infrastructure (GDI) North-Rhine Westfalia - Reference Model 3.0. Münster, Geoinformatik Münster: 46.
OGC 1999. The OpenGIS Abstract Specification, Open GIS Consortium.
OGC 2001. OpenGIS Consortium Discussion Paper Basic Services Model 0.0.7.
Riekert, W.-F. 1999. Erschließung von Fachinformationen im Internet mit Hilfe von Thesauri und Gazetteers. Management von Umweltinformationen in vernetzten Umgebungen, 2nd workshop HMI, Nürnberg.
Schlieder, C. and T. Vögele 2002. Indexing and Browsing Digital Maps with Intelligent Thumbnails. In: Proceedings of the International Symposium on Spatial Data Handling (SDH) 2002, Ottawa, Canada, Springer.
Schlieder, C., T. Vögele and U. Visser 2001. Qualitative Spatial Reasoning for Information Retrieval by Gazetteers. Conference on Spatial Information Theory (COSIT) 2001, Morro Bay, California.
Tobler, W. 1970. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography 46: 360-371.
Vögele, T. and C. Schlieder 2002. The Use of Spatial Metadata for Information Retrieval in Peer-to-Peer Networks. AGILE2002, Palma de Mallorca, Spain.
Worboys, M. F. 1995. GIS - A Computing Perspective. London, Philadelphia: Taylor & Francis.