This article was originally published in International Encyclopedia of Human Geography, published by Elsevier.

Schuurman N. 2009. Spatial Ontologies. In Kitchin R, Thrift N (eds) International Encyclopedia of Human Geography, Volume 1, pp. 377–383. Oxford: Elsevier. ISBN: 978-0-08-044911-1. © Copyright 2009 Elsevier Ltd.

Author's personal copy

Spatial Ontologies

N. Schuurman, Simon Fraser University, Burnaby, BC, Canada
© 2009 Elsevier Ltd. All rights reserved.

Glossary

Formalization It is the process through which ideas, text, and spatial entities are translated into digital objects so that they can be manipulated in a computer. It is the process of rendering the nondigital world into code so that it can be computed.

Interoperability It is the process of exchanging data and code between diverse computing systems and databases. It frequently involves conflation of near but nonidentical categories or computing protocols.

Ontology It means something very different in philosophy than it does in information systems. In the former, it means the question of being.
In computing and information sciences, an ontology is a formal universe in which each entity is precisely defined and its relationship with every other entity in the specific categorical or computing realm is also predetermined. Ontologies in this context are the range of what is possible. They can be thought of as simply a classification system or a data dictionary.

Definition: Spatial Ontologies

The term ‘spatial ontologies’ is an amalgamation of two words that point to very different concepts. Spatial can be distinguished from geography as the former refers to space while the latter refers specifically to the Earth’s surface. Space is a concept that has many manifestations at scales from the interior of cells to entire galaxies and ultimately the universe. Space often refers to two dimensions but can be extended to three and even four dimensions. Space can be mathematical (e.g., a set) or philosophical (e.g., epistemological implications of a conceptual framework). Space and the adjective spatial are in effect complex descriptions of a concept that embodies area at a range of scales and epistemologies.

Adding the noun ontology to the adjective spatial is a complicating strategy. Ontology is a multifaceted term, as it is understood very differently across disciplines. Human geographers and social scientists have interpreted ontology in an Aristotelian manner. In this guise, ontology signifies a foundational reality and is often referred to as the essence of an object or phenomenon. Alternatively, ontologies answer the question: what must the world be like – or what conditions must exist – for us to perceive and know something? Computing scientists and systems theorists, on the other hand, use an engineering interpretation of the word ontology. In this view, an ontology is a complete and internally logical system (e.g., a classification system). In cartography and geographic information systems (GISs), ontology is interpreted in the information science tradition.
Each ontology in this guise is a unique statement of logic, a way of describing spatial entities from one perspective or knowledge system. Formal ontologies comprise a logical universe, and an ontology in a formal environment is equivalent to a logical theory – but is often expressed as a classification system in databases. Formal or systems ontologies have three levels of convention: representational, communication protocol, and content specification. The realm of existence is limited to that which can be represented (often in a database). Domain knowledge must be formally declared and this becomes its own world of discourse. Ontologies thus described appear circumscribed. They lack the mystery of philosophical ontologies yet this requirement of exact specification is a means of communicating different epistemologies within an information system. Each spatial ontology is a product of an epistemology – or a way of knowing the world. There is a tradition of essentialism in information sciences which argues that the world is divided into natural classes that reflect an essential reality and these classes can be discerned. This epistemological view relies on the power of human observation to divide artifacts into natural classes. There is increasing recognition, however, that scientific classification is a mixture of metaphysics and epistemology and that the classification systems we use to represent the world are dependent on perspective, measurement techniques, and agenda. In geographic information science (GIScience), many scholars now acknowledge that context shapes the formation and selection of categories. Changes in context – or point of view – lead to a shift in perceived and/or enumerated categories. Each representation of categories leads potentially to an alternative ontology that could be implemented in a computational environment. 
Though there are GIScientists who would argue that classification is based on naturally occurring, stable categories, most people agree that GIS must be flexible enough to represent more than one system of classification. Multiple ontologies permit multiple stakeholders with divergent agendas to be represented in a single GIS. In a world with multiple spatial ontologies, scientific classification is just as vulnerable to error as lay classification. Increased recognition of the role of epistemology in creating ontologies is a radical departure from objectivism, which is premised on one ‘real world’ independent of the human mind and social context. When we talk of multiple spatial ontologies, we are incorporating recognition that each of those ontologies represents an epistemology and a point of view.

What Do Spatial Ontologies Look Like?

A spatial ontology then is a formal list of objects or terms that are permitted within a specified range set by a group. Medical data registries, for instance, contain fields (columns in the database) that are strictly controlled. Using the example of a Trauma Registry, we find a number of fields including injury date, ICD 10 codes, postal code of patient’s usual residence, injury severity score (ISS), external cause code, etc. In combination, these database fields comprise a classification system. Such a classification system, like many in common use, implicitly describes the developers’ epistemology – or ways of knowing and state of understanding – in that discipline. The ontology can be considered spatial because even though it describes nonspatial attributes (such as patient’s date of birth), the injury location is a critical element. A Trauma Registry from a different country might include quite different fields or, more commonly, the interpretation or scale of the data in individual fields may vary widely.
In one registry, location might refer to postal code while in another it may be entered using Global Positioning System (GPS) coordinates. Likewise, injuries may be described using a different coding system (e.g., ICD 9). One of the challenges of working with multiple ontologies is that comparing and integrating them may be difficult because each is developed based on differing assumptions about the world. And as the state of understanding in a field changes, ontologies change in subtle and often imperceptible ways.

A reasonable question at this point might revolve around the difference between a simple classification system and an ontology. In a computational interpretation of ontology, the only discernible product is the database. But before an ontology is fully developed, the stakeholders might consider the skeletal classification system the ontology. Moreover, geologists consider a map legend a fully fledged ontology while a GIScientist might consider a range of spatial entities (e.g., mountain, volcano, valley, and foothills) an ontology. And a database analyst may refer to a data dictionary as an ontology. The truth is that there is a range of acceptable interpretations of spatial ontology in GIScience, as illustrated in Figure 1. But every ontology is linked by its relationship to a particular epistemology and its role in determining the range of acceptable categories in a classification.

Moreover, ontologies in information science are ultimately translated into formal terms that can be implemented computationally. Formalization is the process through which ideas and objects are described in terms that can be encoded digitally so that the objects can be manipulated in a computer. It is this process of formalization that separates organized but unstructured thoughts and conceptual frameworks from computational data structures populated with data. A necessary stage of formalization is the creation of succinct coherent ontologies.
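The registry comparison described above can be sketched in code. The Python below is a minimal illustration, assuming two hypothetical registries whose data dictionaries are held as simple records; the field names, encodings, and the `compare_dictionaries` helper are invented for this example and do not describe any real registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One data-dictionary entry: a permitted field and how it is encoded."""
    name: str
    encoding: str    # coding system or format the values are recorded in
    definition: str


# Two hypothetical registries; every entry here is illustrative only.
REGISTRY_A = {
    "injury_date": FieldSpec("injury_date", "DDMMYYYY", "Date the injury occurred"),
    "diagnosis": FieldSpec("diagnosis", "ICD-10", "Coded injury description"),
    "location": FieldSpec("location", "postal code", "Patient's usual residence"),
    "iss": FieldSpec("iss", "integer score", "Injury severity score"),
}
REGISTRY_B = {
    "injury_date": FieldSpec("injury_date", "DDMMYYYY", "Date the injury occurred"),
    "diagnosis": FieldSpec("diagnosis", "ICD-9", "Coded injury description"),
    "location": FieldSpec("location", "GPS coordinates", "Place the injury occurred"),
}


def compare_dictionaries(a, b):
    """Return fields unique to each side and shared fields whose encoding differs."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = sorted(
        name for name in set(a) & set(b) if a[name].encoding != b[name].encoding
    )
    return only_a, only_b, mismatched


only_a, only_b, mismatched = compare_dictionaries(REGISTRY_A, REGISTRY_B)
print("Only in A:", only_a)
print("Only in B:", only_b)
print("Same field name, different encoding:", mismatched)
```

Even in this toy case, a shared field name such as `location` hides two different meanings, which is the kind of difference that has to be resolved before the registries can be integrated.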
Formalization and Ontologies

GIScientists frequently refer to ‘formalization’ as the process of rendering concepts into a form that can be ultimately represented in a digital environment. This is more easily said than done. There is a great difference between the abstract concepts that we routinely use to describe the world and the formal language – and computing code – that is required to implement technology. GIS, like all information sciences, is based on a hierarchical relationship between conceptualization and formalization. If we think of conceptualization as a step toward understanding spatial processes and relationships, there remains a pressing need to express these relationships in a mathematical or formal notation as a precursor to coding them. The trade-off, however, is information loss associated with formalization. There is a danger that we suppress detail in favor of abstract and formal relationships. On the other hand, abstraction – based on information loss – is a powerful way of understanding structure. The process of abstraction assists us in making sense of the physical and human world and, in GIS, we use abstractions to model reality.

The process of formalization is based on abstraction of concepts into a symbolic form. A mechanism for moving between cognitive impressions of reality and database representations of it is required, however, to achieve this level of abstraction. In a seminal paper, Jean Brodeur and others describe five ontological stages associated with geospatial data and analysis. At the top level is reality (R), followed by cognitive models of reality (R1). The third stage consists of conceptual representations and is referred to as R2. The fourth ontological stage (R3) converts concepts to database representations. Finally, the fifth ontological level (R4) is the set of spatial concepts that can be produced from the database. Figure 2 illustrates the conceptual model of ontologies.
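The step from R3 to R4 can be illustrated with a short sketch. The Python below loosely transcribes the attribute table shown in Figure 2; which length or area belongs to which feature is an assumption made here, and the three query functions are hypothetical examples of R4 concepts, not part of Brodeur's model.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    id: str
    name: str
    geometry: str            # "Point", "Line", or "Polygon"
    length: Optional[float]  # assumed to be recorded for lines only
    area: Optional[float]    # assumed to be recorded for polygons only
    man_made: bool


# R3: the database representation of the scene (values after Figure 2;
# the pairing of lengths and areas with features is an assumption).
TABLE = [
    Feature("0001", "Cabin", "Point", None, None, True),
    Feature("0002", "Storage", "Point", None, None, True),
    Feature("0003", "Road", "Line", 50, None, True),
    Feature("0004", "Bridge", "Line", 7, None, True),
    Feature("0005", "Lake", "Polygon", None, 500, False),
    Feature("0006", "Stream", "Line", 70, None, False),
    Feature("0007", "Woods 1", "Polygon", None, 500, False),
    Feature("0008", "Woods 2", "Polygon", None, 300, False),
    Feature("0009", "Bush 1", "Point", None, None, False),
    Feature("0010", "Bush 2", "Point", None, None, False),
    Feature("0011", "Bush 3", "Point", None, None, False),
]


# R4: spatial concepts that can be produced from the database.
def man_made_features(table):
    """Names of all features flagged as man-made."""
    return [f.name for f in table if f.man_made]


def total_polygon_area(table):
    """Sum of the recorded areas of polygon features."""
    return sum(f.area for f in table if f.geometry == "Polygon" and f.area is not None)


def geometry_counts(table):
    """Number of features per geometry type."""
    return dict(Counter(f.geometry for f in table))


print(man_made_features(TABLE))
print(total_polygon_area(TABLE))
print(geometry_counts(TABLE))
```

Only questions that the columns can answer are available at R4. Anything the viewer perceived but the schema did not record, such as how the cabin is used, cannot be recovered from the table.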
To human geographers, such a taxonomy might seem mechanistic, but it is essential in GIS to be able to express cognitive models in a formal language as a precursor to coding them. R2 or the set of conceptual representations marks the fuzzy dividing line between the conceptual and implementational realms. A structured vocabulary must be used to describe objects and object classes. Moreover, database representation requires a method of embedding context. For example, we need to know whether the word ‘crown’ refers to the degree to which a tree blocks light in a forest or a metal adornment intended to rest on a queen’s head. In everyday (nondigital) life, such meaning is extracted from verbal or environmental context. In a digital environment, it must be supplied. Context is necessary for understanding implied use, background, and common meaning of spatial entities. Ontologies are a way of encoding context.

Ontologies can exist, be described and understood, but this does not mean that they have been formalized. Formalization is the critical step referred to by Brodeur’s R3 and R4. Formalization allows us to take descriptive concepts and render them digital; it makes them real in computational terms. GIScientists are currently developing methods of formalizing and encoding diverse ways of looking at the spatial world – a critical component of ontology research.

[Figure 1 panels: (a) a map legend with the entries River/Lake, Private Land, Farming, Park, Sightings, Food/Gas, Camping, Protected, Roads; (b) data models; (c) a taxonomy of the order Primates; (d) a data dictionary entry for ‘Date of diagnosis’. Source for (d): Australian Institute of Health and Welfare (2006). National Health Data Dictionary, Volume 2.]

Figure 1 Multiple interpretations of ontology. (a) Illustrates a map legend. For many geologists and cartographers, the legend represents the entire formal universe of an ontology. In (b), data models represent different ontologies. (c) Is a classification system or taxonomy – another interpretation of an ontology. (d) Illustrates a data dictionary that defines exactly what terms can be represented spatially and how they are defined. Each of these is an alternative iteration of ontology.

[Figure 2 panels: R: Reality; R1: Cognitive models of reality; R2: Conceptual representations of reality; R3: Database representations of reality; R4: Spatial concepts that can be produced from the database. The attribute table shown for R3 and R4:]

Feature   ID    Type     Length  Area  Man-made
Cabin     0001  Point    -       -     Yes
Storage   0002  Point    -       -     Yes
Road      0003  Line     50      -     Yes
Bridge    0004  Line     7       -     Yes
Lake      0005  Polygon  -       500   No
Stream    0006  Line     70      -     No
Woods 1   0007  Polygon  -       500   No
Woods 2   0008  Polygon  -       300   No
Bush 1    0009  Point    -       -     No
Bush 2    0010  Point    -       -     No
Bush 3    0011  Point    -       -     No

Figure 2 Five levels of abstraction associated with computational ontologies. The highest level of reality is the world as it is – a traditional philosophical interpretation of ontology. R1 represents the way that humans perceive the world – transformed through cognition. When we conceptualize reality, we alter it again (R2). Database representations of reality (R3) introduce another profound transformation and the spatial concepts that can be produced from a database (R4) are further restricted. In this illustration, we see how at each step, representations of the world are altered (and limited) as we move through different levels of abstraction. For example, the first viewer conceives the scene of the forest, house, pond, and bridge differently than she is able to convey it to the interlocutor. And representations of reality are again transformed when articulated in databases and re-articulated as spatial concepts that can be computationally displayed.

Ontology Research in GIScience

Ontology research in GIScience began in the mid-1990s when a number of scientific journal papers first explored the issue. Predictably, there were many variations in interpretation of ontology but they were linked by the assumption that (1) we need to represent multiple epistemologies; and (2) formalization is a constraint in representation at the computational level. There is a common understanding that concepts must be filtered through a formalization process. The process of formalization is the designated arena for fixing meaning – for creating a fixed universe of discourse.

Representing multiple ways of seeing and understanding the world in a computer has been tackled by a number of researchers in different ways. One way is to use semantic networks that employ node–arc structures from graph theory. Nodes are equated with spatial entities or concepts and arcs with relationships between the entities. Close concepts are physically close on the graph structure. The goal of the semantic proximity approach is to identify close or near concepts and label database elements (e.g., mountains or valleys) so that proximate concepts and entities can be linked. Other researchers have developed complex sets of inclusion rules that must be incorporated during classification of entities. Whether or not a spatial entity is included in a future classification can be theoretically determined by the rules. Each of these methods is linked by emphasis on including context in the development of databases – as a means of ensuring that the terms of classifications can be recognized in different settings.

The goal of maintaining ontological context is consistent with one of the chief objectives of present-day GIScience: interoperability. There is a trend toward distributed digital knowledge with different institutions housing different databases that are occasionally integrated for comparative or global analyses. Such institutions, including businesses, government agencies, and university researchers, need ways to integrate spatial information even when it is described from different epistemologies at different levels of abstraction using multiple schematics (e.g., different database organization). Sharing spatial data and attributes requires stability of language across culture and geography – assumptions that are seldom true. Researchers are forced to develop methods of delivering intended meaning along with field names in databases in order to allow ‘intelligent’ semantic sharing. Semantic interoperability implies translating concepts into code and back again. The work of semantic interoperability is enhanced by thinking of each unique set of spatial objects as part of an ontology – with its own context.

Spatial Ontologies of Land Use: An Example

Land-use categories are used by municipal planners to organize and categorize public and private lands under their purview. Land use refers to the activities occurring on a particular piece of property or area of public land. Land-use classification is intended to include both the intent and the reality of how a given land area within a metropolitan area may be altered by habitation and human use. Land-use categories, however, vary widely among communities and most have different taxonomies (i.e., ontologies). Semantic heterogeneity complicates the integration of land-use taxonomies. Each municipality collects its land-use data using different assumptions about categories. These epistemological and ontological differences become encoded in the data – often with no record. Though the distinctions are often subtle (e.g., skateboard parks might be included as ‘green space’ in one community but excluded in another because they are concrete), they affect the reliability of the data.

Data always contain gray zones of meaning. While they are being used in the context in which they are collected, this is usually not a problem. The meaning of green space in the dataset is implicit, for example, in the offices of individual municipal governments. But if that database is moved to a different municipality, workers there might interpret it differently – most likely in the same way that their community presently understands the term. When data are integrated in broader datasets, however, or sold as secondary datasets, their contextual meaning can be lost, resulting in the propagation of uncertainty in the attendant analysis.
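The skateboard park case can be made concrete with a small sketch. The Python below is illustrative only: the two towns, their parcels, and the written definitions are invented, and the `surface` attribute is an assumed stand-in for whatever context a real dataset would need to carry.

```python
# Hypothetical parcels from two municipalities; all values are invented.
TOWN_A = [
    {"parcel": "A1", "use": "green space", "surface": "grass", "area_ha": 2.0},
    # Town A counts its skateboard park as green space.
    {"parcel": "A2", "use": "green space", "surface": "concrete", "area_ha": 0.5},
]
TOWN_B = [
    {"parcel": "B1", "use": "green space", "surface": "grass", "area_ha": 1.5},
    # Town B files its skateboard park under a different category.
    {"parcel": "B2", "use": "recreation", "surface": "concrete", "area_ha": 0.5},
]

# Context that usually lives only in each municipal office.
CONTEXT = {
    "Town A": {"green space": "parks, including paved recreation areas"},
    "Town B": {"green space": "vegetated parks only"},
}


def naive_green_area(*datasets):
    """Sum every parcel labelled 'green space', trusting the label alone."""
    return sum(p["area_ha"] for d in datasets for p in d if p["use"] == "green space")


def vegetated_green_area(*datasets):
    """Sum 'green space' parcels under one explicit definition (grass only)."""
    return sum(
        p["area_ha"]
        for d in datasets
        for p in d
        if p["use"] == "green space" and p["surface"] == "grass"
    )


def conflicting_labels(ctx_a, ctx_b):
    """Labels used by both sources but defined differently."""
    return sorted(k for k in set(ctx_a) & set(ctx_b) if ctx_a[k] != ctx_b[k])


print(naive_green_area(TOWN_A, TOWN_B))
print(vegetated_green_area(TOWN_A, TOWN_B))
print(conflicting_labels(CONTEXT["Town A"], CONTEXT["Town B"]))
```

The two totals differ by exactly the parcel whose meaning depends on local context. Checking the recorded definitions before merging is one way of flagging such labels, provided the definitions travel with the data.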
Once subtle distinctions are encapsulated with the data, users make assumptions about their meaning. For example, neighboring municipalities might integrate their land-use categories to develop a regional classification. Superficial integration of the datasets based on the assumption that categories such as ‘green space’ or ‘recreation centre’ mean the same thing in every jurisdiction embeds error into the resulting merged data. The problem, in effect, is one of different ontologies.

Comparing urban land use across municipal lines in the Greater Vancouver metropolitan area of British Columbia (BC), Canada, is a telling example of the problems associated with different spatial ontologies that affect everyday GIS and cartographic practices. The Greater Vancouver Regional District (GVRD), located within the province of BC, houses over 2 million people residing within 21 municipalities. It constitutes a single contiguous metropolitan area. Land-use classification is a municipal government responsibility in the GVRD. There is no regional land-use classification standard, and resulting taxonomies of land use are disparate and thereby incommensurable. For example, one community may identify ‘single family’ as a land-use designation; this designation may include two or three different single-family types in a competing classification (see Table 1).

Municipalities characterize land use by employing codes to describe differences. Table 2 illustrates the superclasses used by the neighboring cities of Vancouver and Burnaby. The superclasses differ somewhat as would be expected – but the differences are more profound than might first appear. The city of Burnaby uses detailed codes to identify land use at the scale of individual properties. They give each property outline a very specific code. For example, the numerical code range 000–099 constitutes the ‘residential series’ and describes various housing types and their associated forms of tenure.
The city of Vancouver, by contrast, operates with many fewer subclasses – and their database is much more general as a result. The general categories may be similar, but the problem is that there is no way of digging deeper. Fundamental ontological distinctions are a product of epistemological differences in data collection as well as varying institutional cultures in the individual municipalities. When these ontological differences are accounted for, one realizes that a trivial integration of the superclasses contains uncertainty. While the mapping presented in Table 2 is schematically legitimate, it fails to convey the true (lack of) symmetry between the land-use classes.

Table 1 Number of different zoning designations for various categories including industrial, low-rise residential, high-rise residential, and single-family residential land use
Municipalities: City of Coquitlam; City of North Vancouver; City of Port Coquitlam; City of Richmond; City of Surrey
Categories: Industrial; Res low; Res high; Single family
Values in extracted order: 9 7 2 1 2 1 5 3 5 2 2 1 7 5 1 2 1 2 1 3
The table illustrates that the five sample communities have a varying range of subclassifications. Each land-use category and its subclassifications constitute an ontology.
From Schuurman, N., Leszczynski, A., Fiedler, R., Grund, D. and Bell, N. (2006). Building an integrated cadastral fabric for higher resolution socioeconomic spatial data analysis. In Riedl, W., Kainz, A. & Elmes, G. (eds.) Progress in Spatial Data Handling, 12th International Symposium on Spatial Data Handling. Berlin: Springer.

Table 2 Cadastral land-use classification for the cities of Vancouver and Burnaby
Column headings: Vancouver ‘District schedules’; Burnaby Land-use ‘series’
Class labels in extracted order: One-family dwelling; Two-family dwelling; Multiple dwelling; Limited agricultural; Industrial; Commercial; - no corresponding class -; Residential series; Cultural and recreation (a subclass); Historic area; - no corresponding class -; Comprehensive development; Farm series; Industrial series; Commercial series; Utility series (includes railways, pipelines, etc.); Institutional series (government buildings, etc.); - no corresponding class -
Note that the superclasses are not commensurate but can be manually synced.
From Schuurman, N. and Leszczynski, A. (2006). Ontology-based metadata. Transactions in Geographic Information Science 10(5), 709–726.

The city of Vancouver bases its land-use coding on municipal ‘zoning’, a legal classification encoded into, and mandated by, city policy (bylaws). Zones are determined for large areas at a time, and they encompass many individual properties. Each property is assigned a land-use ‘code’ based on the zone in which it lies. Land-use information for Vancouver is based on the ‘what should be there’ principle. By contrast, the city of Burnaby determines land use based on the activities associated with individual cadastral parcels – as determined by the taxation authorities. The city of Burnaby also determines its own municipal zoning. Unlike Vancouver, it does not consider zoning and land use as equivalent. These epistemological differences in how land use is assigned in two different communities reveal ontological differences. The differences can be summarized as the distinction between ‘what should be there’ versus ‘what is actually there’. As a result, it is difficult to directly compare land-use categories between Burnaby and Vancouver. The two classification schemes are irreconcilable at a very basic level because they emerge from two different institutional contexts – in other words, they represent two separate, though overlapping, ontologies.

The example of land use reveals the impact that different ontologies have on spatial data – such as land use – that may otherwise seem unproblematic. If data users are to understand their data from a qualitative perspective, they need to think about ontologies.
Are the two ontologies comparable based on their classification systems; attribute characteristics; terminology; naming conventions; rationale for collection; scale; and measurement systems? Collectively, such parameters constitute data ‘context’. Context is likewise the basis for understanding how different spatial ontologies shape data.

Spatial Ontologies: Present and Future

In the past decade, GIScience researchers have become increasingly concerned with optimal methods for representing multiple epistemological perspectives about the same spatial events and entities. At the same time, there is a pressing need to attach contextual information to database elements in order to identify different ontologies in the interest of enhancing interoperability of data across institutions and geography. Spatial ontologies enable multiple stakeholders to represent different scenarios, agendas, and interpretations of the geographic world. Formal ontologies are a means of implementing these diverse views in a structured digital environment. Acknowledging and understanding alternate perspectives on the same geographical phenomena or spatial framework is a means of preserving heterogeneity of perspective with implications for democratic policy and governance. Spatial ontologies are a means of achieving these goals in a computational environment.

See also: Critical GIS; Critical Realism/Critical Realist Geographies; Feminism, Maps and GIS; GIS and Cartography; Maps.

Further Reading

Agarwal, P. (2005). Ontological considerations in GIScience. International Journal of Geographical Information Science 19(5), 501–536.
Bowker, G. (1996). The history of information infrastructures: The case of the international classification of diseases. Information Processing and Management 32(1), 49–61.
Bowker, G. C. (2000). Biodiversity datadiversity. Social Studies of Science 30(5), 643–683.
Gregory, D. (2000). Ontology. In Johnston, R. J., Gregory, D., Pratt, G. & Watts, M. (eds.) The Dictionary of Human Geography. Oxford: Blackwell.
Harvey, F., Werner, K., Hardy, P. and Yaser, B. (1999). Semantic interoperability: A central issue for sharing geographic information. Annals of Regional Science 33, 213–232.
Kuhn, W. (2001). Ontologies in support of activities in geographical space. International Journal of Geographical Information Science 15(7), 613–631.
Schuurman, N. (2006). Formalization matters: Critical GIS and ontology research. Annals of the Association of American Geographers 96(4), 726–739.
Schuurman, N. and Leszczynski, A. (2006). Ontology-based metadata. Transactions in Geographic Information Science 10(5), 709–726.
Schuurman, N., Leszczynski, A., Fiedler, R., Grund, D. and Bell, N. (2006). Building an integrated cadastral fabric for higher resolution socioeconomic spatial data analysis. In Riedl, W., Kainz, A. & Elmes, G. (eds.) Progress in Spatial Data Handling, 12th International Symposium on Spatial Data Handling. Berlin: Springer.
Smith, B. and Mark, D. M. (1998). Ontology and geographic kinds. Paper read at The 8th International Symposium on Spatial Data Handling. Vancouver, BC, Canada.
Winter, S. (2001). Ontology: Buzzword or paradigm shift in GI science? International Journal of Geographical Information Science 15(7), 587–590.