2. Metadata - International Cartographic Association

advertisement
Emmanuel Stefanakis
Dr. Emmanuel Stefanakis is an Assistant Professor at Harokopio
University of Athens in the Department of Geography. His main
research interests include geographic information systems, spatial
decision support systems, knowledge and database systems, spatial data
infrastructures, cartography, geovisualization, and web technology.
Poulicos Prastacos
Dr. Poulicos Prastacos is Director of Research in the Regional Analysis
Division of the Institute of Applied and Computational Mathematics,
FORTH. His main research interests include GIS applications for
transportation, environment, regional planning; statistical, geographic
and environmental databases; web GIS and applications on the internet;
development of land use, transportation, and regional economic models;
implementation of decision support systems.
SEMANTIC-BASED SPATIAL INFORMATION
INFRASTRUCTURES: INTEGRATING DATA AND
SERVICES INTO A SINGLE COLLECTION
Emmanuel Stefanakis1 and Poulicos Prastacos2
1
Department of Geography, Harokopio University, 70, El. Venizelou Ave., 17671 Athens,
Greece, estef@hua.gr
2
Institute of Applied and Computational Mathematics, Foundation for Research and
Technology, Vassilika Vouton, P.O. Box 1527, 71110 Heraklion, Greece,
poulicos@iacm.forth.gr
Abstract. Nowadays, special attention is given towards the development of effective
spatial data infrastructures (SDI’s) at regional, national or international level. The main
scope of such an infrastructure is to promote the accessibility and usability of geospatial
data. In parallel, there is an urgent need to share geospatial services. This leads to the
development of spatial information infrastructures or service-driven infrastructures. The
paper provides an overview of the role of semantics in a spatial data infrastructure or
service-driven infrastructure (we adopt a common acronym, i.e., SDI), and the alternative
architectures of the middleware between the clients (users and applications) and the servers
(data and services).
1
Keywords. Semantics, Interoperability, Spatial Data Infrastructures (SDI), Spatial
Information Infrastructures.
1. INTRODUCTION
Nowadays special attention is given towards the development of effective spatial data
infrastructures (SDIs) at regional, national or international level (Williamson et al., 2004).
SDIs are frameworks of policies, institutional arrangements, data, services, technologies,
and people with a common scope, i.e., to promote the accessibility and usability of
geospatial content (data and services).
The construction of an SDI presumes that the participating organizations have agreed on
the adoption of common vocabularies, practices, standards, technical specifications and
operational components (Nebert, 2004). Hence, an SDI is not a simple data set or
repository. Instead, an SDI hosts: (a) geographic content (data and services); (b) sufficient
description of this content (metadata); (c) effective methods to discover and evaluate this
content (data catalogues); (d) tools to visualize the data (e.g., web mapping); and (e)
services and software tools to support specific application domains.
An SDI architecture is usually conceptualized as a three-tier architecture (Evans, 2003).
At the top layer (the client) reside the users and applications; at the bottom layer (the
server) reside the geospatial data; and at the middle layer (the middleware) reside all
services that assist the accessibility to the data repositories (Figure 1). Obviously, the
middle layer is the one that facilitates the worldwide access to and the use of online
geospatial data.
Users and Applications
top layer
Discovery & retrieval services
middle layer
Geospatial data
bottom layer
Figure 1. The three-tier architecture of current SDI.
2
Currently, SDI paradigm focuses on the exchange of geospatial data; i.e., the middle layer
supports the discovery and retrieval of data available at the bottom layer; and this is
consistent to the term spatial data infrastructure.
On the other hand, it is widely recognized (e.g., Bernard et al., 2005), that future SDI
research must focus more on the users’ needs and the design of appropriate geoinformation
services and service architectures to satisfy these needs. In other words, the development
of service-driven infrastructures; or spatial information infrastructures will enhance
infrastructures capabilities.
This may be accomplished by enriching both the bottom and middle layers (Figure 2). The
bottom layer will also accommodate geospatial services (i.e., analytical software tools)
along with data. As regards to the middle layer, existing discovery and retrieval services
will be extended to apply to both the data and services available at the bottom layer.
Additionally, the middle layer will incorporate software tools and services that may be
used by the top layer (users and applications) and hence enhance the whole SDI
functionality.
In both architectures the middle layer plays the role of the middleware between the clients
(users and applications) and the servers (data and services). Its role is supported by a set of
mechanisms, as follows (Nebert, 2004): (a) sufficient descriptions of the geospatial content
(both data and services), i.e., the metadata; (b) discovery of distributed collections of
geospatial content through their metadata descriptions, i.e., the catalogue gateway; and (c)
metaphors to discover and access the geospatial content, i.e., the interfaces.
Users and Applications
top layer
Geoprocessing services
middle layer
Geospatial content
bottom layer
3
Figure 2. The three-tier architesture of future SDI.
All mechanisms above must be constructed in such a way that enable the effective sharing
and use of geospatial content. The incorporation of rich semantics at all these components
affects definitely the efficiency of the middleware (Nebert, 2004). Semantics enable the
interoperability between different data collections and services. The scope of this paper is
to highlight the presence of semantics into these components, and discuss the alternative
architectures of the middleware.
In the sequel, we use the single acronym SDI to represent both spatial data and information
infrastructures. Although, several other acronyms can be found in the literature (such as SII
standing for spatial information infrastructures) we maintain the widely used acronym SDI,
which can also stand for service-driven infrastructures.
The discussion is organized as follows. Sections 2 to 4 comment the role of semantics and
related methods in the components of an SDI and how they assist the role of the
middleware. Specifically, Section 2 focuses on metadata; Section 3 on the catalogue
gateway; and Section 4 on the interfaces. Section 5 presents briefly the efforts and products
of the organizations developing standards for SDIs. Then Section 6 shows the alternative
architectures for SDIs. Finally, Section 7 concludes the discussion.
2. METADATA
As regards to the metadata, they convey – by nature – the semantics of the SDI content
(Nogueras-Iso et al. 2005). Metadata helps people who use geospatial content find the data
and services they need and determine how best to use it. Ideally, the metadata structures
and definitions should be referenced to a standard. Metadata standards (Nogueras-Iso et al.,
2005) for geospatial content – although in use – are currently under continuous revision
and testing.
Professional communities have developed metadata standards for geospatial content. These
standards describe both the structure of data and services and the intended semantic
content for specific data themes. Currently, the most prominent metadata standards are
(Nebert, 2004): the Content Standard for Digital Geospatial Metadata – CSDGM (FGDC,
1998); and the ISO 19115 (ISO-TC211, 2003).
4
These standards define the schema for describing geographic information and services and
provide information about the identification, the extent, the quality, the spatial and
temporal schema, the spatial reference, and the distribution of digital geographic data.
Additionally, both standards contain guidelines to develop metadata profiles, which allow
their customization to specific community needs.
CSDGM has been adopted and implemented in the U.S., Canada, U.K. and other countries
around the world. ISO 19115 has been expressed through its companion specification, ISO
19139, into eXtensible Markup Language (XML) and is currently used for the creation of a
North American Profile of Metadata for the U.S., Canada, and Mexico. The same standard
will also be used by the INfrastructure for SPatial Information in Europe (INSPIRE, 2004).
3. CATALOGUE GATEWAY
As regards to the catalogue gateway, semantics play again a key role, provided that they
assist and guide the discovery and evaluation of useful content from an SDI. The catalogue
exploits the metadata of the multiple resources hosted in an SDI.
Obviously, catalogues can be achieved through the use of common descriptive
vocabularies (metadata), a common discovery and retrieval protocol and a registration
system for metadata collections (Nebert, 2004). At this point the issue of interoperability
between metadata schemas emerges. According to the ISO, interoperability is defined as
the capability to communicate, execute programs, or transfer data among various
functional units in a manner that requires the user to have limited or no knowledge of the
characteristics of those units.
The catalogue gateway handles multiple metadata collections, which are distributed and
heterogeneous. The heterogeneity of the metadata collections may be considered as
syntactic, schematic or semantic heterogeneity (Bishr, 1998). Syntactic heterogeneity
refers to the differences in the logical data models (e.g., relational versus object-oriented)
or in spatial representations (e.g., raster versus vector). Schematic heterogeneity refers to
the differences in the conceptual data models (the data are organized in different schemas
and structures). Semantic heterogeneity refers to the differences in meaning, interpretation
or usage of the same or related data. Semantic heterogeneity is further divided into naming
5
(i.e., synonyms and homonyms) and cognitive (i.e., different conceptualizations)
heterogeneity.
Semantic heterogeneity between metadata collections causes most integration problems.
These problems refer to (Bernard et al., 2003): (a) metadata interpretation, and (b) data
interpretation. As regards to metadata interpretation, different providers may use different
terms. This makes difficult for both the humans and the computers to recognize the
coherence between similar terms. As regards to data interpretation, the properties of the
entities involved in an application domain are expressed based in various standardized
(e.g., CORINE vocabulary for land cover classification; EAA, 2000) or not vocabularies.
Data and metadata interpretation problems must be overcome when data from different
sources have to be integrated into a single collection. This requires the adoption of
common terms, which are usually provided by a standardized vocabulary with unchanged
terms. All terms coming from other vocabularies will be translated in the adopted
vocabulary using tools for semantic translation.
The use of ontologies (Fonseca and Egenhofer, 1999; Guarino, 1998; Wache, 2001) as
semantic translators is a possible approach to overcome the problem of semantic
heterogeneity (Bernard et al. 2003). Ontologies play a critical role in associating meaning
with data such as computers can understand and process data automatically. Ontologies
may be used (a) to assist the translation of a term (i.e., a concept) from one vocabulary
(i.e., one ontology) into a term from another vocabulary; and (b) to derive super concept
and sub-concept relationships.
However, currently ontologies are at the best able to define data semantics under very
controlled situations. They cannot yet fully support semantic translation. They fail to cope
with the semantics of services (Bernard et al., 2005). Nowadays GI community addresses
the following three research issues (Fonseca and Sheth, 2003): (a) the creation and
management of domain specific geo-ontologies for both data and services; (b) the
matching geo-concepts available in the web to geo-ontologies; and (c) the integration of
different ontologies as regards to both the geographic dimension and the non-geographic
domain.
6
4. INTERFACES
As regards to the interfaces, geospatial applications have special requirements at the user
interface level. This is because the appropriate visualization of geospatial data may assist
people to discover or evaluate the geospatial content (Adrienko and Adrienko, 1999; Kraak
and Brown, 2000). Hence, the quality of geovisualization products must be promoted.
Semantics associated to geospatial content may be exploited at this stage and lead to
effective geo-visualizations. Current research on the development of metaphors and GUI’s
as well as on issues related to multiple representation and interoperability of geospatial
data (e.g., ISRPS, 2006) may lead to enhanced visualizations.
Last, but not least, the services and software tools hosted by an SDI must be semantic
based in order to be more efficient and assist advanced application domains. The
development of semantic-based software for geospatial applications is an emerging field of
research. Although many semantic-based tools have been developed in the past to assist
geospatial applications, these tools have been designed and implemented patchily without
big interference with other tools or an established research agenda. Currently this situation
changes dramatically, with several workshops aligned to this new trend, e.g., the
Geovisiualization Conference (EURESCO, 2004), the SeBGIS Workshop (OTM, 2005),
etc.
5. DEVELOPING STANDARDS AND SPECIFICATIONS
Open Geospatial Consortium (OGC) in parallel with World Wide Web Consortium (W3C)
and ISO develop standards and specifications to support the interoperability between
repositories
with
geospatial
content.
Table
1
provides
standards/specifications related to information content.
Table 1. Standards and Specifications for geospatial content.
Organization Standard/Specification name
W3C
HyperText Transfer Protocol
(HTTP)
W3C
Simple Object Access Protocol
(SOAP)
W3C
Web
Services
Description
Language (WSDL)
OGC
Web Map Server (WMS)
7
some
representative
OGC
OGC
OGC
OGC
ISO-TC211
ISO-TC211
ISO-TC211
ISO-TC211
ISO-TC211
Web Coverage Server (WCS)
Web Feature Service (WFS)
Geography Markup Language
(GML)
Catalog Service (CS)
Rules for Application Schema
(DIS19109)
Methodology
for
Feature
Cataloguing (DIS19110)
Spatial
Referencing
by
Geographical
Identifiers
(CD19112)
Metadata (IS19115)
Metadata
Implementation
Specification (DTS19139)
The scope of these standards and specifications is to enable an application developer to use
any geospatial content (data and services) available on “the net” within a single
environment and a single workflow (McKee and Buehler, 1998). However, standardization
may not solve the problem of interoperability by itself, because of the following reasons
(Stoimenov and Dordevic-Kajan, 2002):
 The construction and maintenance of a single and integrated model is a hard task.
 The requirement to communicate with geospatial sources that do not conform the
adopted standard will be always present.
 Existing geospatial sources have their own models which may not always be mapped to
the common model without information loss.
 The standards/specifications are subject to continuous change, but systems will not all
simultaneously change to conform.
6. ESTABLISHING THE SYNERGY OF GEOSPATIAL CONTENT SOURCES
The problem of interoperability may be overcome by the adoption of one of the wellestablished methods that allow the synergy of heterogeneous information sources (Garcia
Molina et al, 2000) (Figure 3): (a) federated databases; (b) data warehouses; and (c)
mediators.
Federated databases adopt the simplest architecture. Information sources are independent
to each other. The synergy is accomplished by a separate middleware component
(connector) per source pair (Figure 3a). This component is responsible for the
8
interoperability between the two sources. This architecture is ideal when a limited number
of sources are involved; otherwise numerous connectors must be built and being
maintained. Notice that for N sources, Nx(N-1) connectors are required for a full network.
In data warehouses, the contents of two or more sources are collected and combined into a
common schema. Then a single repository, called warehouse, accommodates all
information content (Figure 3b). The population of the warehouse is accomplished by an
individual content extractor per source and a combiner, which maps the content of each
source into the common schema.
Mediators, similarly to warehouses, provide a unified view to the contents of two or more
sources. However, a mediator does not accommodate any content. Key component of
mediator architecture (Figure 3c) is the middleware, which is called wrapper. A wrapper
plays the role of the translator between the mediator and an individual source. Notice that
the mediator may be able to judge if a query posed by the user may find useful content to
an individual source. This is accomplished through the use of appropriate metadata and
meta-services at the mediator level. Mediator technology is more flexible and would be
ideal for the development of an SDI (Stoimenov and Derdevic-Kajan, 2002). Obviously, it
implementation raises all issues related to semantic heterogeneity as presented in the
previous Sections.
query
result
Warehouse
Source A
Source B
Combiner
Source C
Extractor
Extractor
Source A
Source B
Source D
(a)
(b)
9
result
query
Mediator
query
result
query
result
Wrapper
query
Wrapper
result
result
query
Source B
Source A
(c)
Figure 3. Alternative methods to allow the synergy of heterogeneous information sources:
(a) federated databases (the arrows represent the connectors); (b) data warehouses (the
arrows represent content flow); and (c) mediators (the arrows represent content flow).
7. DISCUSSION
The paper provides an overview of the role of semantics in a spatial data infrastructure or
service-driven infrastructure (SDI), and the alternative architectures of the middleware
between the clients (users and applications) and the servers (data and services).
It is widely recognized, that the integration of semantics (of both data and services) at all
components of an SDI middleware (metadata, gateways and interfaces) may promote the
SDI functionality and usefulness. Currently, technology provides enhanced tools and
methods for implementing this task (models, repositories, s/w tools); however GI science is
not mature to make full use of them, provided that the issues of semantic heterogeneity
must be overcome.
REFERENCES
Adrienko, G., and Adrienko, N., 1999. Interactive maps for visual data exploration.
International Journal of Geographical Information Science, vol. 13 (4), pp.355-374.
Taylor-Francis.
Bernard, L., Craglia, M., Gould, M., and Kuhn, W., 2005. Towards an SDI research
agenta. In the Proceedings of the 11th EC-GI & GIS Workshop – ESDI: Setting the
Framework. Sardinia, Italy.
Bernard, L., Georgiadou, Y., and Wytzisk, A., (Eds) 2005. Position Papers. First Research
Workshop on Cross-learning on Spatial Data Infrastructures and Information
Infrastructures. ITC, The Netherlands.
10
Bernard, L., Einspanier, U., Haubrock, S., Hübner, S., Kuhn, W., Lessing, R., Lutz, M. and
Visser, U., 2003. Ontologies for Intelligent Search and Semantic Translation in
Spatial Data Infrastructures. Photogrammetrie - Fernerkundung - Geoinformation.
Ausgabe
6.
Seite(n)
451-462.
http://ifgi.unimuenster.de/~lutzm/pfg03_bernard_et_al.pdf
Bishr, Y., 1998. Overcoming the semantic and other barriers to GIS interoperability.
International Journal of Geographical Information Science. Special Issue on
Interoperability in GIS. Vol. 12, No 4. pp. 299-314, Taylor & Francis.
EAA, 2000. CORINE Land Cover: Technical Guide. European Environmental Agency.
EURESCO, 2004. Geovisualisation: EuroConference on Methods to Support Interaction in
Geovisualisation Environments. Kolymbari, Crete, Greece. ESF.
Evans, J.D., 2003. Geospatial Interoperability Reference Model. FGDC, WG GAI.
http://gai.fgdc.gov/girm/v1.0/
FGDC, 1998. Content Standard for Digital Geospatial Metadata. Version 2.0. Federal
Geographic Data Committee.
Fonseca, F., and Egenhofer, M., 1999. Ontology-driven information systems. In the
Proceedings of the 7th ACM Symposium on Advances in GIS. Kansas, MO, pp. 14-19.
ACM Press.
Fonseca, F., and Sheth, A., 2003. The GeoSpatial Semantic Web for UCGIS research
priorities.
UCGIS.
http://www.ucgis.org/
priorities/research/2002researchPDF/shortterm/e_geosemantic_web.pdf
Guarino, N., 1998. Formal ontology in information systems. In the Proceedings of the
Formal Ontology in Information Systems (FOIS). Trento, Italy, pp. 3-15. IOS Press.
INSPIRE, 2004. The INSPIRE Proposal. The INfrastructure for SPatial InfoRmation in
Europe. http://www.ec-gis.org/inspire/
ISO-TC211, 2003. Text for FDIS 19115 Geographic Information – Metadata. Final Draft
Version. International Organization for Standardization. http://www.isotc211.org/
ISPRS, 2006. Workshop on Multiple Representation and Interoperability of Spatial Data,
Hanover, Germany, February 22-24, 2006. ISPRS. http://www.ikg.unihannover.de/isprs/ workshop.htm
Kraak, M.J., and Brown, A., 2000. Web Cartography. CRC Press.
McKee, L., and Buehler, R., 1998. The Open GIS Guide. Open GIS Consortium, Inc.
http://www.OpenGIS.org/techno/ specs.htm
Garcia Molina, H., Ullman, J.D., and Widom, J., 2000. Database Systems Implementation.
Prentice Hall.
Nebert, D.D., (Ed.) 2004. Developing Spatial Data Infrastructures: The SDI Cookbook.
Version 2. GSDI Publications. http://www.gsdi.org/docs2004/Cookbook/cookbookV2.0.pdf
Nogueras-Iso, J., Zarazaga-Soria, F., and Muro-Medrano, P.R., 2005. Geographic
Information Metadata for Spatial Data Infrastructures: Resources, Interoperability
and Information Retrieval. Springer.
OTM, 2005. 1st International Workshop on Semantic-based GIS. Agia Napa, Cyprus,
November 3-4, 2005. http://www.cs.rmit.edu.au/fedconf/sebgis2005cfp.html
Stoimenov, L., and Derdevic-Kajan, S., 2002. Framework for semantic GIS
interoperability. Facta Universitatis. Ser. Math. Inform. 17, pp. 107-125.
Wache, H., Voegele, T., Visser, U., Stuckensmhmidt, H., Scuster, G., Neumann, H., and
Huebner, S., 2001. Ontology-based integration of information, a survey of existing
approaches. Workshop on Ontologies and Information Sharing, IJCAI’01. pp. 108117.
11
Williamson, I.P., Rajabifard, A., and Feeney, M.E.F, (Eds) 2004. Developing Spatial Data
Infrastructures: From Concept to Reality. Taylor & Francis Group.
12
Download