Data 2.0: towards collaborative data for arts and humanities Grid Enabling Humanities Datasets

e-Science Institute, 2nd July 2007
Neil Chue Hong
Summary
From Data Grids
To Data Services
The Rise of Web 2.0
Towards Data 2.0
Collaborative data for arts and humanities building on Grid middleware
Grid versus Users

Grid is about:
  sharing resources
  interoperable middleware
  allowing bigger problems
  integrating communities
  improving security
  bringing together data
Users want to:
  access more resources
  ignore middleware
  solve bigger problems
  form communities
  have simple security
  bring together data
Grid and Users want very similar things
  and yet there is still a “want-got-gap” between them
  how can this be bridged?
Data Grids

The first generation of Grids concentrated on Compute Grids
  harnessing capacity to improve capability
Then came the first Data Grids
  mechanisms for dealing with the large amounts of data generated by sensors and simulations
Data Challenges
Diversity
  of data resource types, vendors, middleware, schema, metadata
Scale
  of collections, formats, geographical, political and social distance
Ownership
  on individual, group, and organisation levels; intersecting yet independent
Security
  for client, service and data owner; at many levels, with many tradeoffs
Move towards data services

Defined interface to stored collection of data
 e.g. Google and Amazon

But the data could be:
 replicated
 shared
 federated
 virtual
 incomplete
Make access transparent
Make integration easy
Make management simple

Improve the ability to discover, reference, annotate, search, and provide provenance
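
As a rough illustration of a "defined interface to a stored collection of data", the sketch below shows one way federated access could be made transparent to the caller. All names here (DataService, InMemoryService, FederatedService, the sample records) are hypothetical and are not taken from any Grid middleware; a real service would wrap a database or remote endpoint rather than an in-memory list.

```python
from typing import Iterable, Protocol


class DataService(Protocol):
    """Hypothetical interface: every collection answers the same query call."""

    def query(self, term: str) -> list[dict]: ...


class InMemoryService:
    """Stand-in for one stored collection (illustrative only)."""

    def __init__(self, name: str, records: Iterable[dict]) -> None:
        self.name = name
        self.records = list(records)

    def query(self, term: str) -> list[dict]:
        # Case-insensitive substring match on the title, tagged with its source.
        term = term.lower()
        return [
            {**record, "source": self.name}
            for record in self.records
            if term in record["title"].lower()
        ]


class FederatedService:
    """Presents several services as one, so callers can ignore where data lives."""

    def __init__(self, members: list[DataService]) -> None:
        self.members = members

    def query(self, term: str) -> list[dict]:
        results: list[dict] = []
        for member in self.members:
            results.extend(member.query(term))
        return results


# Made-up sample collections.
edinburgh = InMemoryService("edinburgh", [{"title": "Roman writing tablet"}])
manchester = InMemoryService(
    "manchester",
    [{"title": "Papyrus fragment"}, {"title": "Roman coin hoard"}],
)
catalogue = FederatedService([edinburgh, manchester])
hits = catalogue.query("roman")
```

Because FederatedService exposes the same query method as its members, client code does not change when a collection is added, replicated or moved.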
Grid Data Services

Data middleware provides a way of publishing data in a uniform way
  accessible
  discoverable
  searchable
Provide tools such as
  registries
  replica catalogs
  mediators
Grid versus User: Round 2

Grids provide:
  data
  discovery services
  distributed queries
  basic provenance
  workflows to represent analysis process
Users want:
  information
  to find the right data
  cross-database searches
  sophisticated annotation
  to explore the information space
Data 2.0 must go beyond simple data access
  domain-specific vs generic data services
  composability, interoperability and ease of use
The Rise of Web 2.0

New sites allow non-technical users to share information and interact in programmable environments
  Social Networking: MySpace, Bebo, Facebook
  GIS: Google Maps, Google Earth
  Preference Matching: Amazon
  Meta-clustering: digg, del.icio.us
  Information Publishing: Flickr
An army of curators, a world of information
From DSs to VREs

Virtual Research Environments
  bridge gap between middleware and users
  integrate functionality and facilities
OMII-UK is working with projects to support and develop solutions
  projects: nanoCMOS, CARMEN, Documents and Manuscripts, VERA, SEEGEO, myExperiment, …
  software: portlets, OGSA-DAI, Taverna, BPEL
  solutions: campus data management, annotation
SEE-GEO: Geolinking
[Diagram: geolinking workflow built on OGSA-DAI]
Components shown: Portal; GLS; Census GDAS (DB); Borders WFS (DB); OGSA-DAI; Cache; FPS; Map Server
Labelled steps: send parameterised query; request attributes (getData); cache attributes; request features (getFeature); stream relevant annotated polygons; run algorithm (geoLink); feature portrayal (call out to existing FP service); store image on server; retrieve annotated image
Callouts: access domain-specific data sets; efficient delivery methods; concentrate on algorithm; utilise existing services
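
The core of the geolinking step, joining cached attributes to streamed boundary polygons, might look roughly like the sketch below. This is an assumption about the general technique, not the SEE-GEO implementation: the function names (stream_features, geolink), the area codes and the sample values are all invented for illustration, and no OGSA-DAI or WFS calls are made.

```python
from typing import Iterable, Iterator

# Hypothetical attribute data keyed by area code (stands in for a census query result).
ATTRIBUTES = {"S01": {"population": 1200}, "S02": {"population": 450}}

# Hypothetical boundary features: (area code, polygon as a list of (x, y) points).
BORDERS = [
    ("S01", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("S02", [(1, 0), (2, 0), (2, 1), (1, 1)]),
    ("S03", [(2, 0), (3, 0), (3, 1), (2, 1)]),  # no attributes held for this area
]


def stream_features(borders: Iterable[tuple]) -> Iterator[tuple]:
    """Yield features one at a time, as a streaming feature service might."""
    yield from borders


def geolink(attributes: dict, features: Iterable[tuple]) -> Iterator[dict]:
    """Join cached attributes to streamed polygons on the shared area code."""
    for code, polygon in features:
        if code in attributes:  # keep only polygons that have matching attributes
            yield {"code": code, "polygon": polygon, **attributes[code]}


linked = list(geolink(ATTRIBUTES, stream_features(BORDERS)))
```

Using generators means polygons are annotated as they arrive, so the full boundary set never has to be held in memory; only the attribute table is cached.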
Virtual Workspace for the Study of Ancient Documents

An interface allowing browsing and searching of multiple image collections, including tools to compare and annotate the researcher’s personal collection
Data 2.0: Grid Enabling Datasets

Many diverse data sources
Many diverse users
  each sharing and utilising multiple datasets
A personalised, virtual data warehouse
  independently owned and curated
  bring together many sources to appear as one
Allow shared, distributed, centralised, replicated annotation to build a community
Data 2.0: From Silos to Sharing

Choose data based on stored metadata
  bring together for each user
Build a community by providing tools to contribute back
[Diagram labels: Edin Data, Manc Data, Soton Data; OD (three times); VRE Portal; Amy Annot., Bob Annot., Central Annot.; actions: Choose Dataset, Dataset Annotation, Add Annotation]
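
One way to picture the move from silos to sharing is a per-user view that merges chosen datasets and overlays annotations held in separate stores, leaving the source data untouched. The sketch below is only an assumption about how that could be modelled; AnnotationStore, personal_view and all sample records are hypothetical and not part of any OMII-UK software.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationStore:
    """Hypothetical annotation store, kept separate from the datasets it describes."""

    owner: str
    notes: dict = field(default_factory=dict)  # record id -> list of note strings

    def add(self, record_id: str, text: str) -> None:
        self.notes.setdefault(record_id, []).append(text)


def personal_view(datasets: dict, chosen: list, stores: list) -> list:
    """Merge the chosen datasets and overlay annotations without altering the sources."""
    view = []
    for name in chosen:
        for record_id, content in datasets[name].items():
            annotations = [
                (store.owner, note)
                for store in stores
                for note in store.notes.get(record_id, [])
            ]
            view.append(
                {"dataset": name, "id": record_id,
                 "content": content, "annotations": annotations}
            )
    return view


# Made-up datasets standing in for independently owned collections.
datasets = {
    "edin": {"e1": "tablet image"},
    "manc": {"m1": "papyrus image"},
    "soton": {"s1": "inscription image"},
}
amy = AnnotationStore("amy")
central = AnnotationStore("central")
amy.add("e1", "compare with m1")   # a personal note
central.add("e1", "reviewed")      # a shared, central note

view = personal_view(datasets, ["edin", "manc"], [amy, central])
```

Keeping annotations outside the datasets lets each owner continue to curate their own collection while users contribute back through personal or central stores.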
Summary

Grids provide ways of making data more accessible
Users are looking for ways of making data more personal
Web 2.0 shows a new way of collaborative working enhanced by technology
OMII-UK is working with projects to help develop and support software and solutions to enhance collaborative data for humanities