Grid-enabling Humanities Data
Mark Hedges
Arts and Humanities Data Service
King’s College London
Funded by:
© AHDS
Overview
• What do we mean?
• Particular challenges of the humanities.
• Some approaches.
NeSC Theme, e-Science in the Arts and Humanities, 2nd July 2007
What do we mean by grid-enabling?
Grid-enabling a data resource
Making the resource available via a grid
Making the resource accessible via
some form of grid middleware
What do we mean by grid-enabling?
• Shared discovery/access
• Virtualised access
• Data integration
• Collaborative working
Data on the grid
Implementation often driven by:
• need for fast access to very large,
distributed data sets.
• dynamic data sets
Issues:
• Replication, caching
• Data transfer, latency, data propagation
Data formats: Flat files, databases, XML.
An oversimplification …
• “Hard” sciences – a driver to adopting
grids was the need for fast, shared
access to large, distributed datasets.
• Humanities – driver is the essential
nature of the research data, rather than
data size or speed of access
Challenges of humanities data
• Diverse and multi-media/multi-format
• Complex structure
• Highly contextual
• Fuzzy and uncertain
• Interpretative
• Dispersed and isolated
• Rigid yet diverse metadata (if any)
What would we like to do?
• Federation of the data resources
• Location-transparent resource discovery
• Delivery of (integrated) data resources
• Use of delivered data resources
• Semantics of data resources of prime
importance
Digital repositories
• Want to represent digital resources so as to
reflect complexity and context.
• So: Store using flexible digital repository
systems (Fedora, www.fedora.info, at AHDS).
• Want seamless integration between these
highly structured repositories.
• So: Integrate repository software with grid
middleware.
Fedora data model
[Figure: RDF graph relating three Fedora objects: info:fedora/demo:10, info:fedora/demo:11 and info:fedora/demo:12.
Arc labels: hasMember, hasRep, dc:creator, lastModDate.
Representations: info:fedora/demo:10/bdef:1/MEMBERS, info:fedora/demo:11/DC, info:fedora/demo:11/bdef:2/getHIGH, info:fedora/demo:12/DC, info:fedora/demo:12/bdef:2/getHIGH.
Literal values: "Eddie Shin", "Chris Wilper", "Elly Cramer"; "2005-01-01:10:00", "2005-01-10:11:02", "2005-02-01:12:05".]
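A graph of this kind can be held as subject-predicate-object triples. The following is a minimal sketch in plain Python, not Fedora's own API. It assumes, from the identifiers in the figure rather than from any stated schema, that info:fedora/demo:10 is a collection whose members are demo:11 and demo:12; the helper names `objects_of` and `members_with_reps` are hypothetical.

```python
# Triples are (subject, predicate, object). The arcs below are an ASSUMED
# reading of the figure, not an export from a real Fedora repository.
TRIPLES = [
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:11"),
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:12"),
    ("info:fedora/demo:10", "hasRep", "info:fedora/demo:10/bdef:1/MEMBERS"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/DC"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/bdef:2/getHIGH"),
    ("info:fedora/demo:12", "hasRep", "info:fedora/demo:12/DC"),
    ("info:fedora/demo:12", "hasRep", "info:fedora/demo:12/bdef:2/getHIGH"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`, in input order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def members_with_reps(triples, collection):
    """Map each member of `collection` to the list of its representations."""
    return {
        member: objects_of(triples, member, "hasRep")
        for member in objects_of(triples, collection, "hasMember")
    }


result = members_with_reps(TRIPLES, "info:fedora/demo:10")
for member, reps in result.items():
    print(member, "->", reps)
```

Keeping relationships as triples is what lets structure and context be queried uniformly, whatever the underlying object contains.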
Repository-Grid Integration
Two broad approaches:
• Grid as virtualised distributed storage.
• Repositories as data resources on grid.
Grid storage for Fedora
• Fedora has been integrated with SRB.
• Currently looking at iRODS (Integrated Rule-Oriented
Data System) integration.
• Will be able to make use of the complex
metadata stored within Fedora.
• Complex actions encoded as rules built up
from atomic services
• Rules integrated with system, yet easily
changeable
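The idea of rules built up from atomic services can be illustrated as follows. This is a plain Python sketch of the composition pattern only; it is not the iRODS rule language or its microservice API, and every name in it (`compute_checksum`, `replicate`, `record_metadata`, `make_rule`, the `on_ingest` event, the store names) is hypothetical.

```python
import hashlib


# Atomic services: each takes a context dict, does one thing, returns it.
def compute_checksum(ctx):
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()
    return ctx


def replicate(ctx):
    # Pretend to copy the object to each configured store.
    ctx["replicas"] = [f"{store}/{ctx['name']}" for store in ctx["stores"]]
    return ctx


def record_metadata(ctx):
    ctx["metadata"] = {
        "name": ctx["name"],
        "checksum": ctx["checksum"],
        "copies": len(ctx["replicas"]),
    }
    return ctx


def make_rule(*services):
    """Compose atomic services into one rule that runs them in order."""
    def rule(ctx):
        for service in services:
            ctx = service(ctx)
        return ctx
    return rule


# Rules live in a table keyed by event, so a policy can be changed by
# replacing one entry rather than editing the services themselves.
RULES = {"on_ingest": make_rule(compute_checksum, replicate, record_metadata)}

outcome = RULES["on_ingest"](
    {"name": "demo11.xml", "data": b"<record/>", "stores": ["storeA", "storeB"]}
)
print(outcome["metadata"])
```

Because the rule table is separate from the services, changing a policy means swapping one composed entry, which is the sense in which rules are integrated with the system yet easily changeable.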
Fedora and iRODS
[Figure: Fedora repositories, each backed by data storage, with Rules, inside iRODS zones; the iRODS zones are federated.]
Fedora repositories as grid resources
Use grid technologies to implement:
• Discovery of distributed Fedora repositories
belonging to different administrative domains.
• Discovery of data resources across multiple
Fedora repositories
• Delivery and integration of data resources
across multiple Fedora repositories
• Grid AuthN and AuthZ mechanisms providing
uniform access.
Resource Discovery
• Fedora resources all exposed as web
services.
• Use service registries to publish
metadata about repository resources.
• Populate registries by a number of
means.
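The registry approach can be sketched as below. This is an illustrative in-memory registry in Python, assuming keyword metadata and two ways of populating it (direct publication and bulk harvesting); the class names, the `harvest` method and the example.org endpoints are hypothetical and do not correspond to any particular registry product.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    endpoint: str    # hypothetical web-service URL of a repository resource
    repository: str  # which repository published it
    keywords: set = field(default_factory=set)


class Registry:
    def __init__(self):
        self._entries = []

    def publish(self, entry):
        """Populate by direct publication of a single entry."""
        self._entries.append(entry)

    def harvest(self, repository, records):
        """Populate in bulk from (endpoint, keywords) records of one repository."""
        for endpoint, keywords in records:
            self.publish(ServiceEntry(endpoint, repository, set(keywords)))

    def find(self, keyword):
        """Location-transparent lookup: callers never name a repository."""
        return [e.endpoint for e in self._entries if keyword in e.keywords]


registry = Registry()
registry.publish(
    ServiceEntry("http://repo-a.example.org/demo:11", "repo-a", {"poetry", "manuscript"})
)
registry.harvest(
    "repo-b",
    [
        ("http://repo-b.example.org/demo:12", ["manuscript"]),
        ("http://repo-b.example.org/demo:13", ["map"]),
    ],
)
hits = registry.find("manuscript")
print(hits)
```

The client queries the registry by metadata alone and receives endpoints from both repositories, without knowing where either is hosted.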
Delivery and Integration
• Discovered data needs to be retrieved,
integrated and delivered to clients.
• Looking at use of OGSA-DAI for this task
• OGSA-DAI has an extensible architecture to
deal with multiple types of data resources.
• Initially relational and XML databases; projects
looking at extending to other types of data
resource: GEMS, SEE-GEO, GEESE.
• Highly varied nature of repository contents a
significant challenge
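The extensibility point, one interface over several kinds of data resource, can be sketched as follows. This is not the OGSA-DAI API; it is a small Python illustration of the adapter pattern, assuming a relational resource (an in-memory SQLite table) and an XML resource behind a common `search` method. All class and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataResource(ABC):
    """Common interface; a new resource type is added by subclassing."""

    @abstractmethod
    def search(self, term):
        """Return a list of {'title': ..., 'source': ...} records."""


class RelationalResource(DataResource):
    def __init__(self, titles):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (title TEXT)")
        self.db.executemany("INSERT INTO items VALUES (?)", [(t,) for t in titles])

    def search(self, term):
        cur = self.db.execute(
            "SELECT title FROM items WHERE title LIKE ? ORDER BY title",
            (f"%{term}%",),
        )
        return [{"title": title, "source": "relational"} for (title,) in cur]


class XmlResource(DataResource):
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def search(self, term):
        return [
            {"title": el.text, "source": "xml"}
            for el in self.root.iter("title")
            if term.lower() in (el.text or "").lower()
        ]


def integrate(resources, term):
    """Query every resource and deliver one merged result list."""
    merged = []
    for resource in resources:
        merged.extend(resource.search(term))
    return merged


resources = [
    RelationalResource(["Domesday survey", "Parish map"]),
    XmlResource("<c><title>Survey of London</title><title>Letters</title></c>"),
]
results = integrate(resources, "survey")
print(results)
```

The sketch merges records that already share a trivial common shape; with highly varied repository contents, agreeing on that shape is the hard part.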
Delivery
[Figure: delivery architecture.
Components shown: Client; Search/Discovery; Integration; OGSA-DAI Data Services; registry with semantic annotations; RDF export; text-mining & semantic annotation tools; user annotation tools; data resources; Fedora repositories with local storage.
Arc label: exposes.]
Contact
?
mark dot hedges at ahds dot ac dot uk