Harnessing the Geospatial Semantic Web: Toward Place-Based Information Organization and Access

Marcy Bidney, Pennsylvania State University (maplibrarian@psu.edu)
1 Central Pattee Library, University Park, PA 16802

Marcy is Head of the Donald W. Hamer Maps Library at Penn State University and serves as the Geospatial Information Librarian. She chaired the Map and Geospatial Information Round Table of ALA in 2010. Her research interests include alternative access modes for map and data collections, geographic education and literacy, the history of cartography, and the spatial humanities.

Kevin Clair, Pennsylvania State University (kmc35@psu.edu)
126 Paterno Library, University Park, PA 16802

Kevin is the Metadata Librarian at the Penn State University Libraries, a position he has held since 2007. His research interests include the uses of linked open data in libraries, the economics of metadata creation in a research library, and the uses of digital collections in local and community history.

Abstract

The geospatial semantic web’s primary use to date has been the creation of map mashups, collaborative mapping projects, and other research applications. Its power can also be harnessed to develop place-based access points that further the use of information collections, both digital and print. Creating a geographic search interface for information collections allows users to search by location. The basic principles of linked data, describing entities with unique identifiers and providing links between related objects, tie into libraries’ desire to link their own digital resources with related materials held by other cultural institutions that publish content on the Web.
This paper provides an overview of linked data principles, discusses the benefits and challenges of providing geographic information in metadata records, gives examples of how location-based searches are valuable to users, and identifies opportunities for future research.

Introduction

As our understanding and development of Web technology deepen, new and useful means of organizing and accessing information continue to emerge. The development of a geospatial semantic web built on linked data has shown particular promise for changing how information is organized, displayed, and accessed. While the geospatial semantic web’s primary use to date has been the creation of map mashups, collaborative mapping projects, and other research applications, its power can also be harnessed to develop place-based access points that further the use of information collections, both digital and print. Organizing information by location is a powerful idea: it can bring together information from diverse communities of practice that a researcher may never have considered. The idea is not new, but the technology has only recently reached the point where it can become a reality. “Place” is interdisciplinary, and geographic search interfaces built on linked data principles enable access across a variety of information collections in powerful and innovative ways. Developers of next-generation library catalogs should look beyond traditional text displays toward more visual displays of search results. They have much to learn from the rapid growth in the use of geospatial information to create mashups of data on the Web and to display those data on maps.
A geographic search interface for information collections allows users to search for items according to their location: simply by clicking a location on a map, users can explore library materials of all kinds that are related by place. The geospatial semantic web makes this idea more achievable by using linked data to expose connections between pieces of information that a simple text search in a traditional library catalog might never reveal. This method also presents relationships in a visual context, which is often more meaningful than a purely textual list of results. Tim Berners-Lee and his co-authors, writing in Scientific American (Berners-Lee et al. 2001), described the Semantic Web not as “a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” In the same article, the authors state that “…for the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.” It stands to reason, then, that libraries, where structured collections of information have existed for centuries, would be among the most useful places for Semantic Web technologies such as linked data to take root. Nearly a decade later, Gillian Byrne and Lisa Goddard (Byrne and Goddard 2010) described the promise of Semantic Web capabilities as “dazzling.” In making the case for linked data, they provide examples of how to implement it successfully in the library environment, and they can now state that technology is “no longer the major obstacle to linked data implementation.” Linked open data provides a powerful tool for enhancing access to library catalog records.
The purpose of linked data is to connect related but unconnected data on the Web, in much the same way that Web pages are currently linked together. In recent years, librarians have seized upon the linked open data concept as a means of sharing metadata about library collections and repurposing that content for new audiences and applications. The basic principles of linked data (describing entities with unique identifiers, and using those identifiers to link related objects so that each enriches the other) tie into libraries’ desire to link their own digital resources with related materials held by other libraries, cultural institutions, or other entities publishing content on the Web.

This paper will provide an overview of linked data principles; discuss the benefits and challenges of providing geographic information in metadata records, as well as the development of standards for interface design; give examples of how location-based searches are valuable to users; and identify opportunities for future research.

Linked Data in Practice

Linked open data is built on two foundations. The first is the "technology stack" of transmission protocols (such as HTTP) together with the data modeling standards (such as RDF and OWL) and serialization formats that encode the data. The second is the set of controlled vocabularies governing the terms that may be linked, provided through widely used linked data sources such as DBpedia. These supply not only the terms for describing objects but also their associated URIs, so that links to additional resources can be made. Much of the early work by cultural institutions to join the linked open data space has focused on publishing authority data in a form that they and other domains can link to. Examples include the Library of Congress authority files, the Virtual International Authority File (VIAF), and the RAMEAU subject headings published by the Bibliothèque nationale de France.
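As a rough illustration of the two principles above (a unique identifier for each entity, and links between entities expressed through those identifiers), the following Python sketch writes a catalog record as N-Triples, one of the standard RDF serializations. The record URI and title are hypothetical. The Dublin Core `dcterms:spatial` property and the `http://sws.geonames.org/<id>/` URI pattern used by GeoNames are real, but choosing that property for the link is an assumption made for this example, not a cataloging rule.

```python
# Hypothetical example: the record URI and title are invented for illustration.
DCTERMS = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    """Wrap an absolute URI as an N-Triples IRI reference."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one RDF statement as a single N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def describe_record(record_uri: str, title: str, place_uri: str) -> list[str]:
    """Describe a catalog record and link it to a place by URI rather than by name."""
    subject = iri(record_uri)
    return [
        triple(subject, iri(DCTERMS + "title"), literal(title)),
        # The object is another entity's URI, so any client can follow the link
        # and retrieve whatever the place's publisher says about it (e.g. coordinates).
        triple(subject, iri(DCTERMS + "spatial"), iri(place_uri)),
    ]


if __name__ == "__main__":
    for line in describe_record(
        "http://example.org/catalog/record/12345",  # hypothetical local record URI
        "Map of London",                            # hypothetical title
        "http://sws.geonames.org/2643743/",         # GeoNames URI for London
    ):
        print(line)
```

Because the place is identified by a GeoNames URI rather than a text string, the same statement can be consumed by any linked data application, and the library's record gains whatever GeoNames publishes about that place.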
For geospatial applications, the key linked open data hub is the GeoNames service (http://www.geonames.org/), which provides persistent URIs and metadata for a wide variety of geographic locations. A key feature of GeoNames is that every entity in its database has an RDF description, so other linked data-enabled services can link to it directly. GeoNames has thereby become an essential data source for any developer building a linked data application with a geospatial component.

Using the existing metadata in library catalog records to ingest geospatial metadata from linked data vocabularies has many benefits. By integrating links to widely used linked data vocabularies into the catalog, libraries can take advantage of the rich metadata encoded in those vocabularies and substantially enrich their own. Building on these links, they may develop additional interfaces, including temporal and spatial ones, and improve the coverage of their records. From there, they may provide map-based access to library resources of all kinds, allow users to create their own research guides to resources of interest, or offer a variety of other new services and applications. Encoding coordinate points in catalog records would not necessarily require manually revisiting every record in every catalog. Many linked data services, including DBpedia and GeoNames, deliver data through APIs, allowing users to ingest substantial amounts of new metadata from simple inputs they already have on hand. Given the name of a location, the GeoNames API returns a variety of data, including its latitude and longitude; from there, it is a matter of transforming the data from the web service's native export format into one more suitable for ingest into a library catalog.
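A minimal sketch of the lookup-and-transform workflow just described follows. It assumes the GeoNames `searchJSON` web service, which requires a registered `username` (`YOUR_USERNAME` below is a placeholder), and it parses an inline sample response instead of making a live network call. It then converts the top match's decimal degrees into the hdddmmss form (hemisphere, degrees, minutes, seconds) used by the coordinate subfields $d through $g of a MARC 034 field. Function names are illustrative, not part of any library.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the GeoNames full-text search web service (JSON variant).
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"

# Inline sample in the shape of a searchJSON reply, used instead of a live call.
SAMPLE_RESPONSE = """{"totalResultsCount": 1, "geonames": [
  {"name": "London", "geonameId": 2643743, "countryCode": "GB",
   "lat": "51.50853", "lng": "-0.12574"}]}"""


def build_search_url(place_name: str, username: str, max_rows: int = 1) -> str:
    """Build a GeoNames search request for a place name taken from a catalog record."""
    query = urlencode({"q": place_name, "maxRows": max_rows, "username": username})
    return f"{GEONAMES_SEARCH}?{query}"


def first_coordinates(response_text: str) -> tuple[float, float]:
    """Return (latitude, longitude) of the top-ranked match in a searchJSON reply."""
    matches = json.loads(response_text).get("geonames", [])
    if not matches:
        raise LookupError("GeoNames returned no match")
    top = matches[0]
    # GeoNames reports coordinates as decimal-degree strings.
    return float(top["lat"]), float(top["lng"])


def to_hdddmmss(value: float, positive: str, negative: str) -> str:
    """Convert decimal degrees to MARC hdddmmss: hemisphere, degrees, minutes, seconds."""
    hemisphere = positive if value >= 0 else negative
    total_seconds = round(abs(value) * 3600)  # round to the nearest whole second
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"


def marc_034_point(lat: float, lng: float) -> str:
    """Coordinate subfields ($d west, $e east, $f north, $g south) for a single point."""
    lon_code = to_hdddmmss(lng, "E", "W")
    lat_code = to_hdddmmss(lat, "N", "S")
    # A point is recorded as a degenerate box: west equals east, north equals south.
    return f"$d {lon_code} $e {lon_code} $f {lat_code} $g {lat_code}"


if __name__ == "__main__":
    print(build_search_url("London", "YOUR_USERNAME"))  # placeholder account name
    lat, lng = first_coordinates(SAMPLE_RESPONSE)
    print(marc_034_point(lat, lng))
```

For the sample response, the last line prints `$d W0000733 $e W0000733 $f N0513031 $g N0513031`. Recording a point as a degenerate bounding box is one common convention; a production workflow would also need to confirm that the top-ranked match is the intended place, since names such as "London" are ambiguous.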
Sustained collaboration among cultural institutions will greatly aid the effort to include coordinate metadata in library catalogs. Collaborative cataloging initiatives already exist in North America through efforts such as the Library of Congress’ Program for Cooperative Cataloging (PCC) and its associated programs. These can provide a framework for developing linked open data programs in libraries and for communicating work in this area to other libraries. Just as the Library of Congress Subject Headings (LCSH) thesaurus serves as a central hub for library metadata that may be linked to other linked open data vocabularies (for example, to obtain coordinate points), LC may also play a key role in coordinating linked data initiatives and training library cataloging staff for these efforts, as well as in continuing to connect LCSH to other linked data vocabularies such as GeoNames. Working within the linked open data network will allow libraries to connect their metadata with metadata created in other domains. This will enhance discovery layers, especially geospatial ones, by allowing users to click on a location or point of interest and view external metadata, drawn from a linked data hub, that goes well beyond what libraries have created (and vice versa: other domains may take advantage of our data in turn). Taking advantage of the linked open data cloud allows our metadata to reach a wider community of users. Additionally, because linked open data operates at the network level by design, cataloging work may be shared with metadata specialists in other communities. This network-level cooperative cataloging will allow libraries to focus on "hidden collections," i.e., the special collections and unique local holdings that are currently under-exposed.
Connecting the library metadata that describes these resources to the linked open data cloud, whether by producing our own linked data or by consuming that produced by others, will improve access to library collections and discovery layers and further enrich the linked open data cloud on the Web (Hannemann and Kett 2010).

User Experience Design

Because most of the early work on building a linked data ecosystem has focused on publishing and exposing vocabularies, end-user services are less mature. Recently, however, a handful of user-facing services built on linked data have emerged. One of these is DBpedia Mobile, an application for mobile devices developed by Becker and Bizer at the Freie Universität Berlin. It is an innovative user interface driven by linked open data, through which users may discover digital resources of which they were previously unaware. The application takes as input the device's location, as reported by its GPS coordinates, and returns a map with links to nearby points of interest in the DBpedia database, based on the latitude and longitude recorded in their RDF-encoded metadata. From this interface, users may follow the links to discover additional information about those points of interest. In an article in Computers in Libraries, Anne-Lena Westrum (Westrum 2011) described the challenges of developing an open library catalog for the Oslo Public Library (OPL) and how the library used linked data concepts to address those challenges in a way that made sense to end users. By connecting the data in their local catalog to linked open data vocabularies, they were able to enhance access to the items in their collection by providing contextual information about authors derived from DBpedia and VIAF. Using linked open data improved the OPL's discovery services in two ways.
By consuming linked data, they were able to offer additional access points and advanced queries of their catalog data; by producing it (i.e., by exposing it openly as RDF), they allowed third parties to access their metadata, resulting in further visualizations of it that had not been anticipated during the project's implementation. These examples demonstrate the value of using linked open data to improve access to library catalogs, and of opening that library data so that end users can develop applications on top of it and enhance access even further. By applying these principles to geospatial access, libraries can greatly improve the service their catalogs and discovery layers provide to users whose queries have a geographic focus, whether or not those users realize it. Vocabularies such as GeoNames and DBpedia, which provide coordinate metadata for place names, historical events, and so forth, supply the back-end support for these services. Many popular websites, well known to most users of online library catalogs, provide guideposts for the front-end services. These include Google Maps itself, as well as sites such as Flickr that use free online map services to provide geospatial access points to cultural heritage resources. Such sites allow contributors to place the photographs they submit on a map, and allow users to browse or search for resources that interest them through geographic entry points. Taken together, these two threads of cultural institutional practice make apparent the benefits both of increasing coordinate access to library metadata and of using linked open data as a means to that end. The challenge, then, is to get there through a method that scales.

The Future of Geographic Search Interface

The current state of search and discovery in libraries is bland at best, even with the integration of second-generation discovery systems.
Libraries have not yet found dynamic, creative, interactive means of accessing their collections, and we remain stuck with the same text-driven search and results screens when searching an online catalog. While providing a geographic search interface for cartographic collections makes obvious sense, one can argue that it makes equal sense for all library collections, as an alternative to the traditional OPAC. Integrating geographic linked data vocabularies into existing library metadata would allow libraries to create such search interfaces not only for their digital collections but also, perhaps more importantly, for their print collections. Users would be able to explore entire library collections based on where an item was written, where a story takes place, or where an author lived. From there, users could explore other material related to the same area, since those items would also appear in the search results. In 2010, Marcy Bidney (Bidney 2010) made the case for including geographic coordinates in the catalog records of cartographic materials to provide alternate access points for print and digital map collections, and called for continued research on how the Semantic Web and linked data could further efforts to provide visual geographic interface access points for library collections. At almost the same time, Klokan Technologies, based in Switzerland, was developing a new ranking algorithm, which it named MapRank (Oehrli et al. 2011). This work produced a successful geographic search interface for digital cartographic collections. Implementations of the interface can be seen on the David Rumsey Map Collection site and at the Swiss Electronic Library. In both cases, the user explores the digital collections by zooming in and out of the map interface while relevant search results appear on the right-hand side of the screen.
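Map-driven browsing of this kind rests on simple spatial matching. The sketch below shows two such strategies under stated assumptions: a bounding-box intersection test for the map window a user has zoomed to, and a great-circle radius search around a single point, as when a user clicks a location or a device reports its GPS position (the approach DBpedia Mobile takes with points of interest). It is not the MapRank algorithm, whose ranking is described by Oehrli et al. (2011). The `Record` class, the scoring rule, and the sample extents are all hypothetical; coordinates are treated as a flat plane for scoring, and boxes crossing the 180° meridian are not handled.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Record:
    """Hypothetical catalog record carrying a bounding box in decimal degrees."""
    title: str
    west: float
    east: float
    south: float
    north: float

    def area(self) -> float:
        # Planar "square degrees": good enough for ranking, not for measurement.
        return (self.east - self.west) * (self.north - self.south)

    def center(self) -> tuple[float, float]:
        return (self.south + self.north) / 2, (self.west + self.east) / 2


def search_viewport(records, west, east, south, north):
    """Records whose box intersects the map window, best-fitting extent first."""
    view_area = (east - west) * (north - south)
    if view_area <= 0:
        raise ValueError("viewport must have positive area")
    scored = []
    for rec in records:
        width = min(rec.east, east) - max(rec.west, west)
        height = min(rec.north, north) - max(rec.south, south)
        if width < 0 or height < 0:
            continue  # disjoint boxes
        # Shared area over the larger of the two boxes: penalizes both tiny
        # and continent-sized extents relative to the current view.
        scored.append((width * height / max(rec.area(), view_area), rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored]


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def search_near(records, lat, lon, radius_km):
    """Records whose box center lies within radius_km of a clicked point, nearest first."""
    hits = []
    for rec in records:
        distance = haversine_km(lat, lon, *rec.center())
        if distance <= radius_km:
            hits.append((distance, rec))
    hits.sort(key=lambda pair: pair[0])
    return [rec for _, rec in hits]


# Invented sample extents, for illustration only.
SAMPLE_RECORDS = [
    Record("Regional map", -80.5, -74.7, 39.7, 42.3),
    Record("County map", -78.4, -77.1, 40.7, 41.3),
    Record("World map", -180.0, 180.0, -90.0, 90.0),
    Record("City map elsewhere", -0.5, 0.3, 51.3, 51.7),
]

if __name__ == "__main__":
    print([r.title for r in search_viewport(SAMPLE_RECORDS, -79, -77, 40, 42)])
    print([r.title for r in search_near(SAMPLE_RECORDS, 40.8, -77.9, 100)])
```

With the sample data, the viewport query returns the regional, county, and world maps in that order, while a click near the county returns the county map before the regional one. The world map, whose center lies far from the click, is missed by the radius search, which is one reason a real interface might combine both tests.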
The search function of these sites follows more traditional lines: it uses existing metadata records for digital objects that include bounding box coordinates, allowing the system to search within a rectangular area of the map. Although it searches only a specific set of metadata records and does not use linked data technologies, it is the first successful implementation of this type of search interface for digital cartographic collections and holds much promise for the future of geospatial search interface development. The Finnish Culture Sampo site is a rare example of how cultural institutions such as museums, historical societies, and libraries can begin to harness Semantic Web technologies to create dynamic, interactive, visual search interfaces based on geography. The project pulled together information from approximately twenty Finnish cultural institutions and, among other means of access, provides access by geography through a Google Maps interface. The Culture Sampo site brings together a variety of digital objects such as maps, poems, books, and folk songs, along with tagged images from Panoramio and Wikipedia, allowing a user to analyze Finnish culture in a way that creates a more complete picture by combining information objects spread across the Web. Both projects offer a glimpse of what is technologically possible in alternate search interface design. While neither is ideal, combining the two (the clean, smoothly functioning interface of MapRank search and the Semantic Web technologies used by Finnish Culture Sampo) would produce an outstanding geographic search and discovery experience.

Conclusion

There are many challenges to creating these interfaces, but, as noted earlier, technology no longer stands in the way.
According to Byrne and Goddard (Byrne and Goddard 2010), one of the biggest challenges is the change that will have to occur within libraries to bring these projects to reality. Using linked data vocabularies, and integrating them with the traditional controlled vocabularies to which librarians are accustomed, represents a major shift in thinking for many cataloging librarians and would require an equally major shift in cataloging policies across the board. Another challenge mentioned by Westrum is the need for “a modern metadata format that is open and flexible.” MARC does not fit this description, and while RDA brings more flexibility and openness to library metadata, library metadata records have a long way to go before they can be considered truly open and flexible. Other challenges lie in disseminating knowledge about linked data and the Semantic Web. Byrne and Goddard note this after reviewing library conference proceedings, commenting on the lack of published articles on linked data while sensing that the tide is starting to turn toward greater awareness. Despite these and other challenges, libraries should begin to shift their thinking toward using linked data and the Semantic Web, as these technologies are proving to be powerful tools for information search and discovery.

The future of alternate user interface design for accessing library collections is bright. Much work remains to bring features such as a geographic search interface to our users, but the technology to make this happen finally exists; now we need to change the way we think about, create, and work with metadata.

References

Becker, C. & Bizer, C., DBpedia Mobile: A Location-Enabled Linked Data Browser. World, pp.6–7.
Berners-Lee, T., Hendler, J. & Lassila, O., 2001. The Semantic Web. Scientific American, 284(5), pp.34–43.
Bibliothèque électronique suisse, Kartenportal.CH: Geographical Search (beta). Available at: http://kartenportal.mapranksearch.com/en/ [Accessed February 6, 2012].
Bibliothèque nationale de France, RAMEAU: Accueil [Home]. Available at: http://rameau.bnf.fr/ [Accessed February 6, 2012].
Bidney, M.M., 2010. Can Geographic Coordinates in the Catalog Record Be Useful? Journal of Map & Geography Libraries, 6(2), pp.140–150. Available at: http://www.tandfonline.com/doi/abs/10.1080/15420353.2010.492304.
Bizer, C., Heath, T. & Berners-Lee, T., 2009. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3).
Byrne, G. & Goddard, L., 2010. The Strongest Link: Libraries and Linked Data. D-Lib Magazine, 16(11/12).
Chen, H., 2007. Geospatial Semantic Web. Image (Rochester, N.Y.), pp.272–275.
CultureSampo, Kulttuurisampo - suomalainen kulttuuri semanttisessa web 2.0:ssa [CultureSampo: Finnish culture on the semantic web 2.0]. Available at: http://www.kulttuurisampo.fi/index.shtml [Accessed February 6, 2012].
Egenhofer, M.J., 2009. Toward the Semantic Geospatial Web. Society, pp.5–8.
Flickr, Welcome to Flickr - Photo Sharing. Available at: http://www.flickr.com/ [Accessed February 6, 2012].
Goddard, L. & Byrne, G., 2010. Linked Data Tools: Semantic Web for the Masses. First Monday, 15(1). Available at: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3120/2633.
Hannemann, J. & Kett, J., 2010. Linked Data for Libraries. Knowledge Management, pp.1–12.
Keller, M.A., 2001. Linked Data: A Way Out. Director, pp.10–11.
Kuhn, W., Geospatial Semantics: Why, of What, and How? Information Systems.
Library of Congress, Authorities & Vocabularies. Available at: http://id.loc.gov/ [Accessed February 6, 2012].
OCLC, The Virtual International Authority File (VIAF). Available at: http://viaf.org/ [Accessed February 6, 2012].
Oehrli, M. et al., 2011. MapRank: Geographical Search for Cartographic Materials in Libraries. D-Lib Magazine, 17(9/10). Available at: http://www.dlib.org/dlib/september11/oehrli/09oehrli.html.
Rumsey, D., David Rumsey Historical Map Collection | The Collection. Available at: http://www.davidrumsey.com/ [Accessed February 6, 2012].
Scharl, A., 2007. Towards the Geospatial Web: Media Platforms for Managing Geotagged Knowledge Repositories. Knowledge Creation Diffusion Utilization, pp.3–14.
Seikel, M. & Steele, T., 2011. How MARC Has Changed: The History of the Format and Its Forthcoming Relationship to RDA. Technical Services Quarterly, 28(3), pp.322–334. Available at: http://www.tandfonline.com/doi/abs/10.1080/07317131.2011.574519.
Shadbolt, N., Berners-Lee, T. & Hall, W., 2006. The Semantic Web Revisited. IEEE Intelligent Systems, 21(3), pp.96–101. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1637364.
Smits, J., 2011. Libraries Mapped: A Question of Research! Journal of Map & Geography Libraries, 7(2), pp.220–244. Available at: http://www.tandfonline.com/doi/abs/10.1080/15420353.2011.566844.
Söderbäck, A., Why libraries should embrace Linked Data: Why LIBRIS likes Linked Data.
University Consortium for Geographic Information Science.
Westrum, A.-L., 2011. The Key to the Future of the Library Catalog is Openness. Computers in Libraries, 31(3), pp.10–14.
Wick, M., GeoNames. Available at: http://www.geonames.org/ [Accessed February 6, 2012].