Digital Curation Centre
Event Report for 27 October 2006 DCC Workshop:
Maintaining Long-term Access to Geospatial Data
On Friday 27 October, the UK Digital Curation Centre (DCC) held a one-day workshop,
hosted by the e-Science Institute (eSI) in Edinburgh. The workshop, on maintaining
long-term access to geospatial data, was aimed at people already knowledgeable about
geospatial data use and management. The theme reflects the missions of both the DCC and
the eSI: to assist scientists and other researchers in actively managing and adding value to the
digital data collections they work with, over long time periods. The goal of the workshop
was to bring together a group of people who could learn from and update each other on
progress and best practices, through informal presentations and discussions, about the
following areas:
• Citing and managing geospatial databases
• Geospatial data formats and metadata
• Geospatial repositories and storage
Roughly 35 participants from various UK organizations gathered to hear eight short talks
about these topics, with an open discussion session featuring four speakers in the role of
panel members at the close of the workshop.
This workshop was organized by Rajendra Bose, DCC Postdoctoral Researcher (School of
Informatics, University of Edinburgh), Guy McGarva, DCC Geospatial Advisor (EDINA
National Data Centre, Edinburgh), and Joy Davidson, DCC Training Coordinator (University
of Glasgow).
The presentations and other material for download are available at:
http://www.dcc.ac.uk/events/geospatial-2006/
The original workshop website is located at:
http://www.nesc.ac.uk/esi/events/697/
The following is a synopsis of the workshop presentations and discussions; for more details
please refer to the original presentations or contact the speakers directly. Although three
discrete sessions were used to group the day’s talks, all the topics covered—repositories and
archival storage, metadata, formats and citation—are really just parts of the same puzzle: how
best to maintain long-term access to data. Guy McGarva (DCC) opened the event with
a brief introduction to the DCC and to the timeliness of the workshop topics.
Session 1: Citing and managing geospatial databases
Rajendra Bose (DCC), working jointly with Guy, spoke about nascent ideas for how to
accomplish geospatial database citation. Most research papers conclude with a list of
references—and increasingly, associated web links—that allow people to access published
journal articles and online data cited in the body of the paper. Access to cited work is crucial
to verify or build on existing research. Achieving stable citations for online material,
however, remains difficult. Published maps are also cited, but paper maps are being replaced
by selections of features that reside in dynamic geospatial databases at a specific moment.
Newer data collections like Ordnance Survey (OS) MasterMap are databases of geospatial
features that may change geometry or attributes as versions of the feature progress over time.
Raj and Guy introduced the idea of a manifest or a simple list of feature IDs and versions that
could serve as an unambiguous citation for a set of geospatial features; an XML manifest
represents a portable citation that could be easily supplied to others. Given stable feature
IDs, geographic information systems (GIS) could generate XML manifests, while a complete
archive of geospatial feature versions could accept XML manifests to generate a previously
cited set of features.
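The manifest idea can be illustrated with a minimal sketch. This is not the format proposed in the talk: the element names (`manifest`, `feature`), the attribute names, and the feature IDs are all hypothetical, and the "archive" is just an in-memory dictionary keyed by feature ID and version.

```python
import xml.etree.ElementTree as ET


def build_manifest(features, source):
    """Serialise (feature_id, version) pairs as a small XML manifest.

    Element and attribute names are illustrative assumptions only.
    """
    root = ET.Element("manifest", {"source": source})
    for feature_id, version in sorted(features):
        ET.SubElement(root, "feature", {"id": feature_id, "version": str(version)})
    return ET.tostring(root, encoding="unicode")


def parse_manifest(xml_text):
    """Read a manifest back into a list of (feature_id, version) pairs."""
    root = ET.fromstring(xml_text)
    return [(f.get("id"), int(f.get("version"))) for f in root.findall("feature")]


def resolve(pairs, archive):
    """Regenerate a cited feature set from an archive of feature versions.

    `archive` is a hypothetical mapping of (feature_id, version) to a
    stored feature; a missing entry raises KeyError.
    """
    return [archive[pair] for pair in pairs]
```

In this sketch a GIS would call `build_manifest` when a map is produced, and a reader would later pass the parsed pairs to `resolve` against an archive that retains every feature version. The approach only works if feature IDs are stable and no version is ever discarded.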
Next, Chris Fleet (National Library of Scotland) and Kimberly Kowal (The British
Library) discussed how current and historical OS data is managed in the six Legal
Deposit Libraries throughout the UK and Ireland. Kimberly summarized the legislative
framework which constrains how the libraries can provide this data to patrons, and mentioned
the topics of voluntary deposit and recent repository-related programmes. Chris commented
on maintaining the continuity of user experience across the various OS data formats that the
libraries manage; most patrons want to view the same place at various points in time. These
issues have influenced the design of a system for supplying the latest MasterMap incarnation
of OS data to Legal Deposit Library users, currently nearing completion.
Session 2: Geospatial data formats and metadata
Steve Morris (North Carolina State University Libraries) provided an update on the
ongoing North Carolina Geospatial Data Archiving Project (NCGDAP), one of two projects in
the US Library of Congress National Digital Information Infrastructure and Preservation
Program (NDIIPP) that involve preservation of geospatial data. He underscored the
importance of the social and institutional aspect of the project, and presented preliminary
survey results which hinted that many organizations in the state only occasionally (for
example, annually) “capture” data sets for archiving. He mentioned emerging ideas for a
“hub and spoke” model for metadata workflow: a group of core metadata elements (the hub)
is captured during ingest and forms the basis for more detailed metadata records for various
systems and needs (the spokes). This fits into a “repository-agnostic” architecture that builds
in the concept of using alternative repositories over time.
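As a rough illustration of the hub-and-spoke idea, the sketch below captures a small core record at ingest and derives two system-specific records from it. The core field names, the output field names, and the two "spokes" are assumptions made for the example; they are not taken from the NCGDAP workflow.

```python
# Hypothetical core ("hub") elements captured once at ingest.
CORE_FIELDS = ("title", "creator", "date_captured", "bounding_box")


def capture_hub(raw):
    """Keep only the core elements, failing if any are missing."""
    missing = [name for name in CORE_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing core metadata: {missing}")
    return {name: raw[name] for name in CORE_FIELDS}


def to_catalogue_record(hub):
    """Spoke 1: a flat discovery record (field names are illustrative)."""
    return {
        "dc_title": hub["title"],
        "dc_creator": hub["creator"],
        "dc_date": hub["date_captured"],
    }


def to_repository_record(hub, repository_name):
    """Spoke 2: a record shaped for one particular repository."""
    west, south, east, north = hub["bounding_box"]
    return {
        "repository": repository_name,
        "label": hub["title"],
        "extent": {"west": west, "south": south, "east": east, "north": north},
    }
```

Because each spoke is derived from the hub alone, switching to a different repository later means writing a new spoke function without re-capturing metadata, which is one way to read the "repository-agnostic" goal.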
Following this, Tony Mathys (EDINA National Data Centre) presented an overview of
geospatial metadata standards and described the development of metadata profiles,
including the AGMAP (Academic Geospatial Metadata Application Profile), which includes
elements from the ISO 19115 and UK GEMINI standards. Tony also described the Go-Geo!
website, which provides geospatial resources and access to the AGMAP and supporting
guidelines; a metadata creation and editing tool; and a portal for searching and discovering
spatial datasets and other geospatial resources.
Session 3: Geospatial repositories and storage
Greg Janée (University of California, Santa Barbara) spoke about the National
Geospatial Digital Archive (NGDA), the other NDIIPP project on geospatial data. He
differentiated between preservation efforts for recently produced, contemporary information,
and the concept of the "archive at mid-century," which acquires 50-year-old digital
information to preserve until the next hand-off in the chain of stewardship. His project
recognizes that ongoing, long-term preservation will involve multiple migrations and
handoffs over time at all levels. It also recognizes that some material may exist in a low-cost
"fallback" state that may be resurrected in the future if both interest and resources for this
exist. Accordingly, the NGDA architecture has two components. The first is a storage API
that abstracts and places minimal requirements on the underlying storage system. The second
is a data model that integrates description and structuring of complex digital objects with a
web of semantics-defining information, the latter represented recursively using digital
objects. NGDA archive content is being mapped to, and made available through, the
Alexandria Digital Library, which supports geospatial search and access.
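To make the notion of a storage API with minimal requirements more concrete, here is a small sketch. It is not the NGDA interface: the class names, the three methods (`put`, `get`, `keys`), and the `migrate` helper are hypothetical, chosen only to show how little an archive needs from a storage system in order to be moved to another one.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """A deliberately small storage contract (hypothetical, not NGDA's)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store bytes under a key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under a key."""

    @abstractmethod
    def keys(self):
        """Return all stored keys."""


class MemoryStorage(Storage):
    """Simplest possible backend: a dictionary held in memory."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        self._items[key] = bytes(data)

    def get(self, key):
        return self._items[key]

    def keys(self):
        return list(self._items)


def migrate(source: Storage, target: Storage) -> int:
    """Copy every object from one backend to another; return the count."""
    count = 0
    for key in source.keys():
        target.put(key, source.get(key))
        count += 1
    return count
```

Any backend that can satisfy these three operations, whether a file system, a tape library, or a future system, could take part in a migration or hand-off, which is the point of keeping the requirements minimal.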
Sam Pepler then provided an overview of geospatial data management issues at the
British Atmospheric Data Centre (BADC). He showed examples of the wide range of
applications for which data sets held at the BADC are used (including non-atmospheric
science projects in such areas as health and history), and spoke about topics including the current
metadata system for data set searches, the NERC Data Grid effort and the Climate Science
Markup Language (CSML). Through CSML, the BADC hopes to move towards identifying
and searching for “features” rather than files.
A brief introduction to the Scoping a Geospatial Repository for Academic Deposit and
Extraction (GRADE) project was presented by James Reid (EDINA National Data
Centre). GRADE is a JISC-funded repository project that is looking at the technical and
cultural issues around the reuse of geospatial data. James highlighted some of the work done
to date, including building a demonstrator repository with which to explore user
requirements, looking at legal issues surrounding the use and reuse of geospatial data and
identifying the main barriers to geospatial data sharing. One of the main conclusions so far is
that there is a perceived need for a national geospatial data repository.
Finally, Humphrey Southall (Department of Geography, Portsmouth University)
addressed the topic: Geo-spatial repositories – or the lack of them: recording Britain’s
historical geography. His talk revolved around the effort to maintain continuity of
funding and curation for the Great Britain Historical GIS project over the past decade. He
raised concerns for underappreciated historical data sets, such as census records, land
utilization surveys, and farm boundary maps; the continuity of funding; representation of
temporal data elements; and the situation of having numerous, small, distributed collections
rather than one single repository. He brought up the important issue, also mentioned by
James Reid, of what exactly constitutes geospatial data: for example, does the text description
of historical administrative units fit in this category?
Discussion and conclusions
Steve Morris, Greg Janée, Humphrey Southall and James Reid formed a panel for the final
half-hour informal discussion and wrap-up session at the end of the workshop. One line of
questioning asked whether disciplinary experts (such as hydrologists for hydrology data sets,
atmospheric scientists for atmospheric data sets, etc.) should maintain separate archives or
repositories, or whether one centralized repository should exist, supported by the government, for
example. Another question involved the creation of a spatial data infrastructure (SDI): at what
level is this best managed? A good set of notes on this topic and other threads of the
workshop is provided by a public blog written by one of the workshop participants soon after
the event; for more detail, see:
http://archaeogeek.wordpress.com/2006/10/31/forget-long-term-access-were-struggling-withthe-short-term/
Special thanks go to the workshop participants and speakers, and our hosts and event
organizers at eSI for financial and logistical support.
Rajendra Bose and Guy McGarva, Digital Curation Centre
15 November 2006