Digital Curation Centre

Event Report for 27 October 2006 DCC Workshop: Maintaining Long-term Access to Geospatial Data

On Friday 27 October, the UK Digital Curation Centre (DCC) held a one-day workshop, hosted by the e-Science Institute (eSI) in Edinburgh. The workshop, on maintaining long-term access to geospatial data, was aimed at people already knowledgeable about geospatial data use and management. The theme reflects the missions of both the DCC and the eSI: to assist scientists and other researchers in actively managing and adding value to the digital data collections they work with, over long time periods.

The goal of the workshop was to bring together a group of people who could learn from and update each other on progress and best practices, through informal presentations and discussions, in the following areas:

• Citing and managing geospatial databases
• Geospatial data formats and metadata
• Geospatial repositories and storage

Roughly 35 participants from various UK organizations gathered to hear eight short talks about these topics, with an open discussion session featuring four speakers in the role of panel members at the close of the workshop.

This workshop was organized by Rajendra Bose, DCC Postdoctoral Researcher (School of Informatics, University of Edinburgh), Guy McGarva, DCC Geospatial Advisor (EDINA National Data Centre, Edinburgh), and Joy Davidson, DCC Training Coordinator (University of Glasgow).

The presentations and other material for download are available at: http://www.dcc.ac.uk/events/geospatial-2006/

The original workshop website is located at: http://www.nesc.ac.uk/esi/events/697/

A synopsis of the workshop presentations and discussions follows; for more details please refer to the original presentations or contact the speakers directly.

Although three discrete sessions were used to group the day’s talks, all the topics covered (repositories and archival storage, metadata, formats and citation) are really just parts of the same puzzle: how best to maintain long-term access to data. Guy McGarva (DCC) opened the event with a brief introduction to the DCC and the timeliness of the workshop topics.

Session 1: Citing and managing geospatial databases

Rajendra Bose (DCC), working jointly with Guy, spoke about nascent ideas for how to accomplish geospatial database citation. Most research papers conclude with a list of references (and, increasingly, associated web links) that allow people to access published journal articles and online data cited in the body of the paper. Access to cited work is crucial to verify or build on existing research. Achieving stable citations for online material, however, remains difficult. Published maps are also cited, but paper maps are being replaced by selections of features that reside in dynamic geospatial databases at a specific moment. Newer data collections like Ordnance Survey (OS) MasterMap are databases of geospatial features whose geometry or attributes may change from one version of a feature to the next.

Raj and Guy introduced the idea of a manifest, a simple list of feature IDs and versions, that could serve as an unambiguous citation for a set of geospatial features; an XML manifest represents a portable citation that could be easily supplied to others. Given stable feature IDs, geographic information systems (GIS) could generate XML manifests, while a complete archive of geospatial feature versions could accept XML manifests to regenerate a previously cited set of features.

Next, Chris Fleet (National Library of Scotland) and Kimberly Kowal (The British Library) discussed how current and historical OS data is managed in the six Legal Deposit Libraries throughout the UK and Ireland.
Kimberly summarized the legislative framework which constrains how the libraries can provide this data to patrons, and mentioned the topics of voluntary deposit and recent repository-related programmes. Chris commented on maintaining the continuity of user experience across the various OS data formats that the libraries manage; most patrons want to view the same place at various points in time. These issues have influenced the design of a system, currently nearing completion, for supplying the latest MasterMap incarnation of OS data to Legal Deposit Library users.

Session 2: Geospatial data formats and metadata

Steve Morris (North Carolina State University Libraries) provided an update on the ongoing North Carolina Geospatial Data Archiving Project (NCGDAP), one of two projects in the US Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) that involve preservation of geospatial data. He underscored the importance of the social and institutional aspect of the project, and presented preliminary survey results which hinted that many organizations in the state only occasionally (for example, annually) “capture” data sets for archiving. He mentioned emerging ideas for a “hub and spoke” model for metadata workflow: a group of core metadata elements (the hub) is captured during ingest, and these form the basis for more detailed metadata records for various systems and needs (the spokes). This fits into a “repository-agnostic” architecture that builds in the concept of using alternative repositories over time.

Following this, Tony Mathys (EDINA National Data Centre) presented an overview of geospatial metadata standards and described the development of metadata profiles, including the AGMAP (Academic Geospatial Metadata Application Profile), which includes elements from the ISO 19115 and UK GEMINI standards. Tony also described the Go-Geo!
website that provides geospatial resources and access to the AGMAP and supporting guidelines; a metadata creation and editing tool; and a portal for searching and discovering spatial datasets and other geospatial resources.

Session 3: Geospatial repositories and storage

Greg Janée (University of California, Santa Barbara) spoke about the National Geospatial Digital Archive (NGDA), the other NDIIPP project on geospatial data. He differentiated between preservation efforts for recently produced, contemporary information, and the concept of the "archive at mid-century," which acquires 50-year-old digital information to preserve until the next hand-off in the chain of stewardship. His project recognizes that ongoing, long-term preservation will involve multiple migrations and hand-offs over time at all levels. It also recognizes that some material may exist in a low-cost "fallback" state from which it may be resurrected in the future if both interest and resources for this exist. Accordingly, the NGDA architecture has two components. The first is a storage API that abstracts and places minimal requirements on the underlying storage system. The second is a data model that integrates description and structuring of complex digital objects with a web of semantics-defining information, the latter represented recursively using digital objects. NGDA archive content is being mapped to, and made available through, the Alexandria Digital Library, which supports geospatial search and access.

Sam Pepler then provided an overview of geospatial data management issues at the British Atmospheric Data Centre (BADC). He showed examples of the wide range of applications for which data sets held at the BADC are used (including non-atmospheric science projects in such areas as health and history), and spoke about topics including the current metadata system for data set searches, the NERC Data Grid effort and the Climate Science Markup Language (CSML).
Through CSML, the BADC hopes to move towards identifying and searching for “features” rather than files.

A brief introduction to the Scoping a Geospatial Repository for Academic Deposit and Extraction (GRADE) project was presented by James Reid (EDINA National Data Centre). GRADE is a JISC-funded repository project that is looking at the technical and cultural issues around the reuse of geospatial data. James highlighted some of the work done to date, including building a demonstrator repository with which to explore user requirements, looking at legal issues surrounding the use and reuse of geospatial data and identifying the main barriers to geospatial data sharing. One of the main conclusions so far is that there is a perceived need for a national geospatial data repository.

Finally, Humphrey Southall (Department of Geography, Portsmouth University) addressed the topic: Geo-spatial repositories – or the lack of them: recording Britain’s historical geography. His talk revolved around the effort to keep continuity regarding funding and curation for the Great Britain Historical GIS project over the past decade. He raised concerns for underappreciated historical data sets, such as census records, land utilization surveys, and farm boundary maps; the continuity of funding; representation of temporal data elements; and the situation of having numerous, small, distributed collections rather than one single repository. He brought up the important issue, also mentioned by James Reid, of what exactly constitutes geospatial data: for example, does the text description of historical administrative units fit in this category?

Discussion and conclusions

Steve Morris, Greg Janée, Humphrey Southall and James Reid formed a panel for the final half-hour informal discussion and wrap-up session at the end of the workshop. One area of questioning asked whether disciplinary experts (such as hydrologists for hydrology data sets, atmospheric scientists for atmospheric data sets, etc.)
should maintain separate archives or repositories, or whether one centralized repository should exist, supported by the government, for example. Another question involved the creation of a spatial data infrastructure (SDI): at what level is this best managed? A good set of notes on this topic and other threads of the workshop is provided by a public blog written by one of the workshop participants soon after the event; for more detail, see: http://archaeogeek.wordpress.com/2006/10/31/forget-long-term-access-were-struggling-with-the-short-term/

Special thanks go to the workshop participants and speakers, and to our hosts and event organizers at eSI for financial and logistical support.

Rajendra Bose and Guy McGarva, Digital Curation Centre
15 November 2006
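
As an illustration of the manifest idea summarized under Session 1, the following minimal Python sketch shows how a list of feature IDs and versions might be written to, and read back from, a small XML document. It is not the format presented at the workshop: the element and attribute names (`manifest`, `feature`, `id`, `version`, `source`), the feature IDs and the in-memory archive are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_manifest(features, source):
    """Serialize (feature_id, version) pairs as an XML manifest string.

    The element and attribute names are assumptions made for this sketch.
    """
    root = ET.Element("manifest", {"source": source})
    for feature_id, version in sorted(features):
        ET.SubElement(root, "feature", {"id": feature_id, "version": str(version)})
    return ET.tostring(root, encoding="unicode")


def read_manifest(xml_text):
    """Parse a manifest string back into a list of (feature_id, version) pairs."""
    root = ET.fromstring(xml_text)
    return [(f.get("id"), int(f.get("version"))) for f in root.findall("feature")]


def resolve(pairs, archive):
    """Fetch each cited feature version from an archive keyed by (id, version)."""
    return [archive[pair] for pair in pairs]


# Hypothetical feature IDs and versions standing in for a cited selection.
cited = [("example-0002", 1), ("example-0001", 3)]
manifest_xml = build_manifest(cited, source="hypothetical-dataset")

# Hypothetical archive holding every version of every feature.
archive = {
    ("example-0001", 2): "older geometry of feature 0001",
    ("example-0001", 3): "geometry of feature 0001 as cited",
    ("example-0002", 1): "geometry of feature 0002 as cited",
}
recovered = resolve(read_manifest(manifest_xml), archive)
```

Because each entry pins a feature to a specific version, the archive in this sketch returns the features as they stood when cited, even though a newer or older version of the same feature is also stored.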