Scoping a Geospatial Repository for Academic Deposit and Extraction
James Reid
EDINA National Data Centre, University of Edinburgh
October 2006
Geographic image: © 2005 Clark Labs.

JISC: Digital Repositories Programme
•June 2005: JISC £4m programme
•Aim of encouraging growth of repositories in UK universities and colleges
•Programme consists of 25 projects exploring role and operation of repositories
•Focus on how repositories can assist academic researchers both to do and share work more easily
•Open access is key driver, plus growing demand for outputs of publicly-funded research to be freely available on the web

Vision (aspirational)
•Reusability
•Managed, quality controlled
•Streamlined access
•Curation & preservation

Project Work Programme: June 2005 to April 2007

1 Formal Repositories
Establish user-based evidence for the requirements and functionality of a repository capable of managing licensed geospatial assets
•Automatic data validation
•More ‘traditional’ geo friendly views on to data
•Automated (partial) metadata completion

2 Informal geospatial data-sharing
Investigate and assess informal mechanisms for geospatial data sharing
•Informal geodata-sharing survey
•Compile use-cases of informal geo-data-sharing from sites
•Classification of existing informal repositories
•Informal ‘demonstrator’ setup

3 DRM
Digital rights issues: when we consider the reuse of derived geospatial data, concerns over data ownership, IPR and copyright are commonplace
•AHRC legal report and framework for geospatial data sharing

4 Institutional vs media-centric (geospatial) repositories
Debate over institutional repository – one size fits all?
Cultural aspects of allegiance to discipline not institution
•Audit & review of geospatial data within institutional repositories
•SWOT of institutional v media-centric repositories

5 Interoperability
Interoperability issues: how could a geospatial repository interact within the JISC IE, and how could it make its assets available to the Grid / eScience community?
•SWOT analysis of interoperability issues within repositories

Tentative Conclusions
There appears to be a genuine desire and demand, at least at an expressive level, for the establishment of a formal national geospatial data repository.
This demand, however, is not fully realised because of uncertainties over the IPR and digital rights (policy) issues that cloud all discussions of geospatial data reuse in the UK.
The main interim conclusion is that the legal uncertainties highlighted by the work of the AHRC partners need to be taken forward and used to improve the ability of the research community to fully exploit its prior endeavours.
Alternatively, existing licensing contexts can be exploited, but this will restrict the breadth of the audience that can be served.

Issues – Content Packaging
A geospatial data asset deposited into a repository is more than one file:
•GML and associated schema
•Proprietary vector format plus cartographic representation detail
•Geodatabase
•Raster with header file
•Data set metadata and IPR info
What is the best method to package data?
In the eLibrary world, the Metadata Encoding and Transmission Standard (METS), IMS Content Packaging (IMS CP) and MPEG-21 DIDL are used for repository objects
What direction is the GI industry taking with content packaging?

Issues – GML for archiving?
If content packaging asks for the ‘best’ method to package data, the next question is about the content being packaged.
•“Permanent access” requirements: profiles and application schemas widely understood and supported; avoid requiring “digital archaeology”
•Role of GML: current focus is as a transfer format
•Assessing formats for preservation: sustainability v. quality v. functionality
•How to handle proprietary formats?
•Spatial databases pose a special challenge

Issues – Persistent Identifiers
Once a geospatial data asset is deposited within a repository, it needs to be persistently identifiable
•Particular repository software uses particular schemes, e.g. Fedora uses the ‘info’ URI scheme
•Requirement to ensure identifier is actionable
•What about version management?
•OpenURL Resolvers?
•Digital Object Identifier (DOI) for handle schemes?
•UUIDs?
•URI, URN, URL, URC!!!!
•A N Other?
•Interoperability? Persistence?
What direction is the GI industry taking with persistent identifiers?

Issues – more!
•Data citation
•Data2article citation
•Data lifecycles
•Feature types
•… (add your own pet concern)
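
Illustration (not from the project): the content packaging and persistent identifier issues raised in the slides can be made concrete with a small sketch. It assumes a home-grown JSON manifest, which is not METS, IMS CP or MPEG-21 DIDL, and uses a UUID-based URN as one possible identifier scheme. All class, function and file names here are hypothetical.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class PackagedFile:
    name: str    # file name inside the package
    role: str    # e.g. "data", "schema", "metadata", "rights"
    sha256: str  # checksum, useful for later fixity checks


@dataclass
class AssetPackage:
    title: str
    version: int = 1
    # Assumption: a urn:uuid identifier. It is location-independent,
    # but only becomes actionable if some resolver service maps it to a location.
    identifier: str = field(default_factory=lambda: uuid.uuid4().urn)
    files: list = field(default_factory=list)

    def add_file(self, name: str, role: str, content: bytes) -> None:
        """Record one component file of the asset together with its checksum."""
        digest = hashlib.sha256(content).hexdigest()
        self.files.append(PackagedFile(name, role, digest))

    def new_version(self) -> "AssetPackage":
        """Return a successor package: same identifier, incremented version."""
        return AssetPackage(self.title, self.version + 1,
                            self.identifier, list(self.files))

    def manifest(self) -> str:
        """Serialise the package description as JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def build_example() -> AssetPackage:
    # Hypothetical deposit: GML data, its schema, metadata and rights info.
    pkg = AssetPackage(title="Example land-use survey")
    pkg.add_file("landuse.gml", "data", b"<gml/>")
    pkg.add_file("landuse.xsd", "schema", b"<xs:schema/>")
    pkg.add_file("metadata.xml", "metadata", b"<metadata/>")
    pkg.add_file("rights.txt", "rights", b"Licence terms go here")
    return pkg


if __name__ == "__main__":
    print(build_example().manifest())
```

Keeping one identifier across versions and tracking the version number separately is only one of several possible answers to the version management question. A repository could equally mint a fresh identifier per version and link versions in metadata.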