Grid-enabling Humanities Data
Mark Hedges
Arts and Humanities Data Service
King’s College London
Funded by: NeSC Theme, e-Science in the Arts and Humanities, 2nd July 2007 © AHDS

Overview
• What do we mean?
• Particular challenges of the humanities.
• Some approaches.

What do we mean by grid-enabling?
Grid-enabling a data resource
Making the resource available via a grid
Making the resource accessible via some form of grid middleware

What do we mean by grid-enabling?
• Shared discovery/access
• Virtualised access
• Data integration
• Collaborative working

Data on the grid
Implementation often driven by:
• Need for fast access to very large, distributed data sets.
• Dynamic data sets.
Issues:
• Replication, caching
• Data transfer, latency, data propagation
Data formats: flat files, databases, XML.

An oversimplification …
• “Hard” sciences – a driver for adopting grids was the need for fast, shared access to large, distributed datasets.
• Humanities – the driver is the essential nature of the research data, rather than data size or speed of access.

Challenges of humanities data
• Diverse and multi-media/multi-format
• Complex structure
• Highly contextual
• Fuzzy and uncertain
• Interpretative
• Dispersed and isolated
• Rigid yet diverse metadata (if any)

What would we like to do?
• Federation of the data resources
• Location-transparent resource discovery
• Delivery of (integrated) data resources
• Use of delivered data resources
• Semantics of data resources of prime importance

Digital repositories
• Want to represent digital resources so as to reflect complexity and context.
• So: Store using flexible digital repository systems (Fedora, www.fedora.info, at AHDS).
• Want seamless integration between these highly structured repositories.
• So: Integrate repository software with grid middleware.

Fedora data model
[Figure: RDF-style graph of three Fedora objects (info:fedora/demo:10, info:fedora/demo:11, info:fedora/demo:12). Edge labels: hasMember, hasRep, dc:creator, lastModDate. Representation nodes: info:fedora/demo:10/bdef:1/MEMBERS, info:fedora/demo:11/DC, info:fedora/demo:11/bdef:2/getHIGH, info:fedora/demo:12/DC, info:fedora/demo:12/bdef:2/getHIGH. Literal values: "Eddie Shin", "Chris Wilper", "Elly Cramer", "2005-01-01:10:00", "2005-01-10:11:02", "2005-02-01:12:05".]

Repository-Grid Integration
Two broad approaches:
• Grid as virtualised distributed storage.
• Repositories as data resources on grid.

Grid storage for Fedora
• Fedora has been integrated with SRB.
• Currently looking at iRODS (Integrated Rule-Oriented Data System) integration.
• Will be able to make use of the complex metadata stored within Fedora.
• Complex actions encoded as rules built up from atomic services
• Rules integrated with system, yet easily changeable

Fedora and iRODS
[Diagram: several Fedora repositories, each with its own data storage; the data storage sits in iRODS zones, each with its own rules; the iRODS zones are federated.]

Fedora repositories as grid resources
Use grid technologies to implement:
• Discovery of distributed Fedora repositories belonging to different administrative domains.
• Discovery of data resources across multiple Fedora repositories.
• Delivery and integration of data resources across multiple Fedora repositories.
• Grid AuthN and AuthZ mechanisms providing uniform access.

Resource Discovery
• Fedora resources all exposed as web services.
• Use service registries to publish metadata about repository resources.
• Populate registries by a number of means.

Delivery and Integration
• Discovered data needs to be retrieved, integrated and delivered to clients.
• Looking at use of OGSA-DAI for this task.
• OGSA-DAI has an extensible architecture to deal with multiple types of data resources.
• Initially relational and XML databases; projects looking at extending to other types of data resource: GEMS, SEE-GEO, GEESE.
• Highly varied nature of repository contents a significant challenge

[Architecture diagram. Surviving labels: Client; Search/Discovery; Delivery; Integration; registry with semantic annotations; RDF export; Text-mining & semantic annotation tools; User annotation tools; OGSA-DAI Data Services; data resources; Fedora repositories, each with local storage; "exposes" edges between the Fedora repositories and the data resources.]

Contact
mark dot hedges at ahds dot ac dot uk
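
Illustrative sketch: the relationships named on the "Fedora data model" slide (hasMember, hasRep) can be modelled as subject/predicate/object triples and queried to find the representations of a collection's members. This is a minimal sketch only, not Fedora's API. The `TRIPLES` list and the helper functions `build_index`, `objects_of` and `member_representations` are hypothetical names, and the assignment of hasMember edges to info:fedora/demo:10 is an assumption inferred from the identifiers on the slide. The dc:creator and lastModDate values are left out because the slide text does not show reliably which object each belongs to.

```python
from collections import defaultdict

# Hypothetical in-memory triples. The URIs and property names come from the
# slide; which subject owns each edge is inferred from the identifiers.
TRIPLES = [
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:11"),  # assumed edge
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:12"),  # assumed edge
    ("info:fedora/demo:10", "hasRep", "info:fedora/demo:10/bdef:1/MEMBERS"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/DC"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/bdef:2/getHIGH"),
    ("info:fedora/demo:12", "hasRep", "info:fedora/demo:12/DC"),
    ("info:fedora/demo:12", "hasRep", "info:fedora/demo:12/bdef:2/getHIGH"),
]


def build_index(triples):
    """Group objects by (subject, predicate) so lookups avoid a full scan."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def objects_of(index, subject, predicate):
    """Return all objects linked from `subject` by `predicate` (may be empty)."""
    return list(index.get((subject, predicate), []))


def member_representations(index, collection):
    """Map each member of `collection` to the representations it exposes."""
    return {
        member: objects_of(index, member, "hasRep")
        for member in objects_of(index, collection, "hasMember")
    }


INDEX = build_index(TRIPLES)

if __name__ == "__main__":
    for member, reps in member_representations(INDEX, "info:fedora/demo:10").items():
        print(member, "->", reps)
```

In a real deployment the triples would come from the repository rather than a hard-coded list; the point here is only that membership and representation are ordinary graph edges that can be followed in sequence.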