Harnessing the Geospatial Semantic Web: Toward Place-Based Information
Organization and Access
Marcy Bidney, Pennsylvania State University (maplibrarian@psu.edu) 1 Central Pattee
Library, University Park, PA 16802
Marcy is Head of the Donald W. Hamer Maps Library at Penn State University and serves as
the Geospatial Information Librarian. She was the Chair of the Map and Geospatial
Information Roundtable of ALA in 2010. Her research interests include alternative access
modes for map and data collections, geographic education/literacy, the history of
cartography and the spatial humanities.
Kevin Clair, Pennsylvania State University (kmc35@psu.edu) 126 Paterno Library,
University Park, PA 16802.
Kevin is the Metadata Librarian at the Penn State University Libraries, a position he has
held since 2007. His research interests include the uses of linked open data in libraries, the
economics of metadata creation in a research library, and the uses of digital collections in
local and community history.
Abstract
The geospatial semantic web’s primary use to date has been the creation of map mashups,
collaborative mapping projects, and other research functions. The power of the geospatial
semantic web can also be harnessed for the development of place-based access points to
further the use of information collections – digital and print. Creating a geographic search
interface for information collections allows users to search by location. The basic principles
of linked data, describing entities using unique identifiers and providing links between
related objects, tie into the desire of libraries to link their own digital resources with
related materials held by other cultural institutions publishing content on the Web. This
paper provides an overview of linked data principles, discusses the benefits and
challenges of providing geographic information in metadata records, gives examples
of how location-based searches are valuable to users, and offers opportunities for future
research.
Introduction
As understanding and development of Web technology deepens, a variety of useful means
for information organization and access emerge. The development of a geospatial semantic
web utilizing linked data has been most promising in changing how information is
organized, displayed and accessed. While the geospatial semantic web’s primary use to
date has been the creation of map mashups, collaborative mapping projects, and other
research functions, the power of the geospatial semantic web can also be harnessed for the
development of place-based access points to further the use of information collections –
digital and print.
Organizing information based on location is a powerful idea – it has the capacity to bring
together information from diverse communities of practice that a researcher may never
have considered. While this is certainly not a new idea, the technology is only now reaching
the point where it can become a reality. “Place” is interdisciplinary, and the creation of
geographic search interfaces utilizing linked data principles enables access across a variety
of information collections in powerful and innovative ways. The developers of new-generation
library catalogs should think beyond traditional text display methods
and look toward more visual displays of search results. Developers have
much to learn from the rapid increase in the use of geospatial information to generate
mashups of data and information on the web and the display of this data on maps. The
creation of a geographic search interface for information collections allows users to search
for an item according to its location; simply by clicking on a given location on a map users
can explore library collections of all kinds of materials related by place. The geospatial
semantic web makes this idea more realistic through the use of linked data to expose
connections between bits of information that otherwise may not have been revealed
through a simple text search in a traditional library catalog. This method also exposes
relationships in a visual context that is often more meaningful than a purely textual
listing.
Tim Berners-Lee and his co-authors wrote in Scientific American (Berners-Lee et al. 2001)
about the Semantic Web describing it not as “a separate Web but an extension of the
current one, in which information is given well-defined meaning, better enabling
computers and people to work in cooperation.” In the same article the authors say in order
“…for the Semantic Web to function, computers must have access to structured collections
of information and sets of inference rules that they can use to conduct automated
reasoning.” It would stand to reason then, that one of the most useful places for Semantic
Web technologies, such as linked data, to take root would be in libraries where structured
collections of information have existed for centuries. Nearly a decade later, Gillian Byrne and Lisa
Goddard (Byrne and Goddard 2010) refer to the promise of semantic web capabilities as
“dazzling”. In making the case for linked data, they provide examples of how to implement
it successfully in the library environment. Writing about the uses of linked data for libraries,
they are able to state that technology is “no longer the major obstacle to
linked data implementation.”
Linked open data provides a powerful tool for enhancing access to library catalog records.
The purpose of linked data is to provide connections between related, but unconnected
data on the Web, in much the same way that Web pages are currently linked together. In
recent years librarians have seized upon the linked open data concept as a means of
sharing metadata about library collections and repurposing that content for new audiences
and applications. The basic principles of linked data—to describe entities using unique
identifiers, and provide links between related objects using these identifiers in order to
enrich them both—tie into the desire for libraries to link their own digital resources with
related materials held by other libraries, cultural institutions, or other entities publishing
content on the Web.
This paper will provide an overview of linked data principles, discuss the benefits and
challenges of providing geographic information in metadata records and the development of
standards for interface design, provide examples of how location-based searches are
valuable to users, and offer opportunities for future research.
Linked Data in Practice
Linked open data is built on two foundations. First is the "technology stack" of transmission
protocols (such as HTTP) and data markup and serialization standards (such as RDF and
OWL) that encode the data. Second are the controlled vocabularies governing the terms
that may be linked, provided through commonly utilized linked data thesauri such as
DBpedia. These provide not only the terms for describing objects, but also their associated
URIs, so that links to additional resources may be made. Much of the early work done by
cultural institutions to integrate themselves into the linked open data space has been
focused on publishing authority data in a form that may be linked by themselves and by
other domains. Examples of this work include the Library of Congress authority files, the
Virtual International Authority File, and the Rameau subject headings published by the
Bibliothèque Nationale de France.
For geospatial applications, the key linked open data hub is the GeoNames service,
http://www.geonames.org/, which provides persistent URIs and metadata about a variety
of geographic locations. One of the key features of GeoNames is that each entity described
in its database includes an RDF serialization, allowing for direct links from other linked
data-enabled services. In this way GeoNames has positioned itself as an essential data
source for any developer working on a linked data application with a geospatial component.
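To make the linking idea concrete, the following is a minimal sketch of how a catalog record might point at a GeoNames entity. It is an illustration under stated assumptions rather than a description of any production system: the record URI and the GeoNames identifier are hypothetical placeholders, the helper function names are invented for this example, and the use of the Dublin Core `spatial` term as the linking property is one plausible choice among several. The output is written in the N-Triples serialization of RDF.

```python
# Hypothetical identifiers: the catalog record URI and the GeoNames ID used
# below are placeholders for illustration, not real records.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_SPATIAL = "http://purl.org/dc/terms/spatial"


def geonames_uri(geoname_id: int) -> str:
    """Build a linked data URI for a GeoNames entity from its numeric ID."""
    return f"http://sws.geonames.org/{geoname_id}/"


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement whose object is another resource (a URI)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject: str, predicate: str, text: str) -> str:
    """Serialize one statement whose object is a plain string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


record = "http://example.org/catalog/record/123"  # hypothetical record URI
triples = [
    literal_triple(record, DCTERMS_TITLE, "Map of a hypothetical township"),
    # The link itself: this record is about the place GeoNames describes.
    uri_triple(record, DCTERMS_SPATIAL, geonames_uri(1234567)),
]
print("\n".join(triples))
```

Because the object of the second statement is a URI rather than a text string, any application that encounters the record can follow the link and retrieve whatever GeoNames publishes about that place, including its coordinates.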
There are many benefits to being able to ingest geospatial metadata from linked data
vocabularies using existing metadata in library catalog records. By integrating links to
widely-used linked data vocabularies into the catalog, libraries can take advantage of the
rich metadata encoded in those vocabularies and enrich their own metadata substantially.
By using this two-way relationship, they may build additional interfaces and improve the
coverage of their records, including support for temporal and spatial interfaces. From this
they may be able to provide map-based access to library resources of all kinds, or allow
users to create their own research guides to resources of interest, or a variety of other new
services or applications.
Allowing for encoding of coordinate points in catalog records would not necessarily require
manual revisiting of all of the records in every catalog. Many linked data services, including
DBpedia and GeoNames, deliver data through APIs, allowing for users to ingest substantial
amounts of new metadata with simple inputs of data they already have on hand. Given
the name of the location for which coordinate points are required, the GeoNames
API returns a variety of data, including the latitude and longitude points; from here it is
a matter of transforming the data from the web service's native export format into one
more suitable for ingest into a library catalog.
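The lookup-and-transform step described above might be sketched as follows. Several things here are assumptions made for illustration: the sample response is hand-written to resemble the JSON returned by the GeoNames search web service rather than captured from it, the place name and identifier in it are hypothetical, and the target format (the coordinate subfields of MARC field 034, in decimal degrees) is only one possible destination in a catalog record. The sketch builds the request URL but does not contact the service.

```python
import json
from urllib.parse import urlencode


def build_search_url(place_name: str, username: str) -> str:
    """Build a GeoNames search request; the service requires an account name."""
    params = urlencode({"q": place_name, "maxRows": 1, "username": username})
    return f"http://api.geonames.org/searchJSON?{params}"


def first_coordinates(response_text: str):
    """Return (lat, lng) of the first match, or None when nothing matched."""
    places = json.loads(response_text).get("geonames", [])
    if not places:
        return None
    top = places[0]
    return float(top["lat"]), float(top["lng"])


def marc_034_subfields(lat: float, lng: float) -> dict:
    """Express one point as 034 $d/$e/$f/$g values (hemisphere + decimal degrees).

    For a single point the west and east values coincide, as do north and south.
    Indicators and other subfields of the field are deliberately left out.
    """
    lng_text = f"{'E' if lng >= 0 else 'W'}{abs(lng):010.6f}"
    lat_text = f"{'N' if lat >= 0 else 'S'}{abs(lat):010.6f}"
    return {"d": lng_text, "e": lng_text, "f": lat_text, "g": lat_text}


# Hand-written sample shaped like a search response (hypothetical values).
sample_response = """
{"totalResultsCount": 1,
 "geonames": [{"name": "Example Town", "lat": "40.5", "lng": "-77.25",
               "geonameId": 1234567}]}
"""

coordinates = first_coordinates(sample_response)
if coordinates is not None:
    print(marc_034_subfields(*coordinates))
```

In a batch setting the same three functions could be applied to every place name already present in a set of catalog records, which is what makes the approach attractive compared with revisiting each record by hand.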
Sustained collaboration among cultural institutions will greatly enable the effort to include
coordinate metadata in library catalogs. Collaborative cataloging initiatives already exist in
North America, through efforts such as the Library of Congress’ Program for Cooperative
Cataloging (PCC) and its associated programs. These can provide a framework for
developing linked open data programs in libraries, and for communicating the work in this
space to other libraries. Just as the Library of Congress Subject Headings (LCSH) thesaurus
serves as a central hub for library metadata that may be linked to other linked open data
vocabularies to download coordinate points, LC may also play a key role in coordinating
the development of linked data initiatives and training of library cataloging staff in these
efforts, as well as continuing to connect LCSH to other linked data vocabularies such as
GeoNames.
Working within the linked open data network will allow connecting library metadata with
metadata created by other domains. This will enhance discovery layers, especially
geospatial ones, by allowing users to click on a location or point of interest and view
external metadata drawn from a linked data hub that goes well beyond that which libraries
have created (and vice-versa; other domains may take advantage of our data by the same
turn). Taking advantage of the linked open data cloud allows our metadata to be more
broadly accessed by a wider community of users. Additionally, because linked open data
operates at the network level by design, the cataloging work may be shared with other
metadata specialists in other communities. This network-level cooperative cataloging will
allow libraries to focus on "hidden collections," i.e., those special collections and unique
local holdings that are currently under-exposed. Connecting library metadata describing
these resources to the linked open data cloud, whether through producing our own linked
data or consuming that produced by others, improves access to library collections and
discovery layers and enriches the linked open data cloud on the Web even further
(Hannemann and Kett 2010).
User Experience Design
Because most of the early work on building a linked data ecosystem has been focused on
publishing and exposing vocabularies, end-user services are not as mature. Recently,
however, a handful of user-facing services using linked data have emerged.
One of these is DBpedia Mobile, a mobile device application developed by Becker and Bizer
at the Freie Universität Berlin. It is an innovative user interface driven by
linked open data, through which users may discover digital resources of which they were
previously unaware. This application takes as input the location of a mobile device as
communicated through its GPS coordinates, and returns a map with links to nearby points
of interest encoded within the DBpedia database, based on the latitude and longitude
points contained in the corresponding RDF-encoded metadata. From this interface, users
may follow these links and discover additional information about those points of interest.
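The core of such a location-driven lookup can be sketched as a distance filter over points that carry latitude and longitude. This is not the DBpedia Mobile implementation, which is not described here in detail; it is a generic illustration that uses the haversine great-circle formula, and the device position, point labels, and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(device, points, radius_km: float):
    """Return labels of points within radius_km of the device, nearest first."""
    hits = []
    for point in points:
        distance = haversine_km(device[0], device[1], point["lat"], point["lon"])
        if distance <= radius_km:
            hits.append((distance, point["label"]))
    return [label for _, label in sorted(hits)]


# Hypothetical device position and points of interest.
device = (52.5, 13.4)
points = [
    {"label": "Point B", "lat": 52.6, "lon": 13.6},
    {"label": "Point A", "lat": 52.51, "lon": 13.41},
    {"label": "Point C", "lat": 48.1, "lon": 11.6},
]
print(nearby(device, points, 5.0))
print(nearby(device, points, 20.0))
```

A linear scan like this is adequate for a small set of points; a service covering a large database would more plausibly rely on a spatial index or on a query language that supports geographic filters.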
In an article in Computers in Libraries, Anne-Lena Westrum (Westrum 2011) described the
challenges in developing an open library catalog for the Oslo Public Library, and how they
utilized linked data concepts to address these challenges in a way that made sense to end
users. By connecting the data in their local catalog to linked open data vocabularies, they
were able to enhance access to the items in their collection by providing contextual
information about authors derived from DBpedia and VIAF. Using linked open data
improved the library's discovery services in two ways. By consuming linked data, they were
able to allow for additional access points and advanced queries of their catalog data; by
producing it (i.e., by exposing it openly as RDF), they allowed third-party access to their
metadata, providing even more visualizations of their metadata that were not considered
during the project's implementation.
These examples demonstrate the power of using linked open data to improve access to
library catalogs, as well as making that linked library data open so that end users can
develop applications that sit atop that data, and take advantage of it to enhance access even
further. By applying these principles to geospatial access, libraries can provide vastly
improved service through their catalogs and discovery layers to users whose queries have
a geographic focus, whether they know it or not. Vocabularies such as GeoNames and
DBpedia, which provide coordinate metadata for place names, historical events, and so
forth, provide the back-end support for these services. Many popular websites, well known
to most users of online library catalogs, provide guideposts for the front-end services.
These include Google Maps itself, as well as sites such as Flickr, which use
free online map services to provide geospatial access points to cultural heritage resources.
They allow contributors to place photographs submitted to the sites on a map, and
users to browse or search for resources that interest them based on geographic entry
points. Taken together, these two threads of cultural institutional practice make apparent
the benefits of both increasing coordinate access to library metadata and utilizing linked
open data as a means to this end. The challenge, then, is getting there through a method that
scales upward.
The Future of Geographic Search Interfaces
The current state of search and discovery in libraries is bland at best, even with the
integration of second-generation discovery systems. Libraries have not yet found
dynamic, creative, interactive means of accessing their collections, and users are left with
the same text-driven search and results screens when searching an online catalog.
While providing a geographic search interface for cartographic collections makes sense,
one can argue that it also makes sense to do so for all library collections, as an
alternative to the traditional OPAC. Integrating geographic linked data vocabularies
into existing library metadata would allow libraries to create such search interfaces, not
only for their digital collections but also, and possibly more importantly, for their print
collections. Users would be able to explore entire library collections based on where an
item was written, where a story takes place, or where an author lived. From there,
the user could explore other material from the same area, as it
would also appear in the search results.
In 2010 Marcy Bidney (Bidney 2010) made the case for inclusion of geographic coordinates
in the catalog records of cartographic material to enhance the capabilities of providing
alternate access points for print and digital map collections, and called for continued
research on how the Semantic Web and linked data could further efforts to provide visual
geographic interface access points for library collections. Almost simultaneously Klokan
Technologies, based in Switzerland, was hard at work developing a new ranking algorithm,
which they named MapRank (Oehrli et al. 2011). The development of this algorithm
resulted in a successful geographic search interface for digital cartographic collections.
Implementation of this interface can be seen at the David Rumsey Map Collection site and
also at the Swiss Electronic Library. In both cases the user is exploring the digital
collections by zooming in and out of the map interface and seeing relevant search results
appear on the right-hand side of the screen. The search function of these sites follows
more traditional lines: it uses existing metadata records for digital objects that
include bounding box coordinate data, allowing the computer to search within a boxed area
on the map. Though it searches a specific set of metadata records and does not
utilize linked data technologies, it is the first successful implementation of this type of
search interface for digital cartographic collections and holds much promise for the
future of geospatial search interface development.
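Searching "within a boxed area" reduces to testing whether each record's bounding box intersects the box currently visible on the map. The sketch below illustrates that test with hypothetical record titles and coordinates. The ordering it applies, by the share of each record's box that falls inside the view, is a simple stand-in chosen for this example; it is not the MapRank algorithm, whose details are not given here. Boxes that cross the 180° meridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """A bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes share at least a boundary point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def overlap_area(a: BBox, b: BBox) -> float:
    """Area (in square degrees) of the region common to both boxes."""
    width = min(a.east, b.east) - max(a.west, b.west)
    height = min(a.north, b.north) - max(a.south, b.south)
    return max(width, 0.0) * max(height, 0.0)


def search(view: BBox, records):
    """Return titles of records intersecting the view, best-covered first."""
    scored = []
    for title, box in records:
        if intersects(view, box):
            area = (box.east - box.west) * (box.north - box.south)
            share = overlap_area(view, box) / area if area > 0 else 0.0
            scored.append((share, title))
    return [title for _, title in sorted(scored, key=lambda item: -item[0])]


# Hypothetical map view and catalog records with bounding boxes.
view = BBox(west=5.0, south=45.0, east=11.0, north=48.0)
records = [
    ("Sheet B", BBox(10.0, 47.0, 14.0, 49.0)),
    ("Sheet A", BBox(6.0, 46.0, 8.0, 47.0)),
    ("Sheet C", BBox(20.0, 50.0, 22.0, 52.0)),
]
print(search(view, records))
```

Each time the user pans or zooms, the view box changes and the same function can be run again, which is what produces the effect of results updating alongside the map.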
The Finnish Culture Sampo site is a rare example of how cultural institutions such as
museums, historical societies and libraries can begin to harness Semantic Web
technologies to create dynamic, interactive, visual search interfaces based on geography.
This project pulled information together from approximately twenty Finnish cultural
institutions and, among other means of access, provides access via geography using a
Google Maps interface. The Culture Sampo site brings together a variety of digital objects
such as maps, poems, books, and folk songs, along with tagged images from Panoramio and
Wikipedia, allowing a user to analyze Finnish culture in a way that creates a more complete
picture by combining information objects spread across the Web.
Both of these projects provide a glimpse of what is technologically possible in the
development of alternate search interface design. While neither is optimal, a combination
of the two, the clean, smooth-functioning interface of MapRank search together with the
Semantic Web technology of the Finnish Culture Sampo, would result in a compelling
geographic search and discovery experience.
Conclusion
There are many challenges to the creation of these interfaces, but as noted earlier, it is not
technology that stands in the way anymore. According to Byrne and Goddard (Byrne &
Goddard 2010) one of the biggest challenges is the change that will have to occur in
libraries to bring these projects to reality. Utilizing linked data vocabularies and integrating
them into the traditional controlled vocabularies to which librarians are accustomed
represents a major shift in thinking for many cataloging librarians and would require a
major shift in cataloging policies across the board. Another challenge mentioned by
Westrum is the need for “a modern metadata format that is open and flexible”. MARC does
not fit this description, and while RDA allows more flexibility and openness to library
metadata records, library metadata records have a long way to go before they can be
considered truly open and flexible. Other challenges exist in disseminating
knowledge about linked data and the Semantic Web. Byrne and Goddard note this after
reviewing library conference proceedings, commenting on the lack of published articles
on linked data but sensing that the tide is starting to turn toward increased
awareness. Despite these and other challenges, libraries should begin to shift their
thinking toward utilizing linked data and the Semantic Web, as these technologies are
proving to be powerful tools for the future of information search and discovery.
The future of alternate user interface design for accessing library collections is bright.
Much work remains to bring features such as a geographic search interface to
our users, but the technology now exists for this to happen. What remains
is to change the way we think about, create, and work with metadata.
References
Becker, C. & Bizer, C., DBpedia Mobile : A Location-Enabled Linked Data Browser. World,
pp.6–7.
Berners-Lee, T., Hendler, J. & Lassila, O., 2001. The Semantic Web. Scientific American,
284(5), pp.34–43.
Bibliotheque electronique Suisse, Kartenportal.CH: Geographical Search (beta). Available
at: http://kartenportal.mapranksearch.com/en/ [Accessed February 6, 2012].
Bibliothèque Nationale de France, RAMEAU : Accueil. Available at: http://rameau.bnf.fr/
[Accessed February 6, 2012].
Bidney, M.M., 2010. Can Geographic Coordinates in the Catalog Record Be Useful? Journal of
Map & Geography Libraries, 6(2), pp.140–150. Available at:
http://www.tandfonline.com/doi/abs/10.1080/15420353.2010.492304.
Bizer, C., Heath, T. & Berners-Lee, T., 2009. Linked Data - The Story So Far. International
Journal on Semantic Web and Information Systems, 5(3), pp.1–22.
Byrne, G. & Goddard, L., 2010. The Strongest Link: Libraries and Linked Data. D-Lib.
Chen, H., 2007. Geospatial Semantic Web. Image (Rochester, N.Y.), pp.272–275.
Library of Congress, Home - Authorities & Vocabularies (Library of Congress). Available at:
http://id.loc.gov/ [Accessed February 6, 2012].
CultureSampo, Kulttuurisampo - suomalainen kulttuuri semanttisessa web 2.0:ssa.
Available at: http://www.kulttuurisampo.fi/index.shtml [Accessed February 6,
2012].
Egenhofer, M.J., 2009. Toward the Semantic Geospatial Web *. Society, pp.5–8.
Flickr, Welcome to Flickr - Photo Sharing. Available at: http://www.flickr.com/ [Accessed
February 6, 2012].
Goddard, L. & Byrne, G., 2010. Linked Data tools: Semantic Web for the Masses. First
Monday, 15(1). Available at:
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3120/
2633.
Hannemann, J. & Kett, J., 2010. Linked Data for Libraries. German National Library.
Keller, M.A., 2001. Linked Data : A Way out. Director, pp.10–11.
Kuhn, W., Geospatial Semantics : Why , of What , and How ? Introduction : Why Semantics ?
Information Systems.
OCLC, The Virtual International Authority File (VIAF). Available at: http://viaf.org/
[Accessed February 6, 2012].
Oehrli, M. et al., 2011. MapRank: Geographical Search for Cartographic Materials in
Libraries. D-Lib, 17(9/10). Available at:
http://www.dlib.org/dlib/september11/oehrli/09oehrli.html.
Rumsey, D., David Rumsey Historical Map Collection | The Collection. Available at:
http://www.davidrumsey.com/ [Accessed February 6, 2012].
Scharl, A., 2007. Towards the Geospatial Web : Media Platforms for Managing Geotagged
Knowledge Repositories. Knowledge Creation Diffusion Utilization, pp.3–14.
Science, G.I., The University Consortium for. Scientific American.
Seikel, M. & Steele, T., 2011. How MARC Has Changed: The History of the Format and Its
Forthcoming Relationship to RDA. Technical Services Quarterly, 28(3), pp.322–334.
Available at:
http://www.tandfonline.com/doi/abs/10.1080/07317131.2011.574519.
Shadbolt, N., Berners-Lee, T. & Hall, W., 2006. The Semantic Web Revisited. IEEE Intelligent
Systems, 21(3), pp.96–101. Available at:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1637364.
Smits, J., 2011. Libraries Mapped: A Question of Research! Journal of Map & Geography
Libraries, 7(2), pp.220–244. Available at:
http://www.tandfonline.com/doi/abs/10.1080/15420353.2011.566844.
Söderbäck, A., Why libraries should embrace Linked Data Why LIBRIS likes Linked Data.
Westrum, A.-L., 2011. The Key to the Future of the Library Catalog is Openness. Computers
in Libraries, 31(3), pp.10-14.
Wick, M., GeoNames. Available at: http://www.geonames.org/ [Accessed February 6, 2012].