Data 2.0: towards collaborative data for arts and humanities
Grid Enabling Humanities Datasets
e-Science Institute, 2nd July 2007
Neil Chue Hong


Summary

  - From Data Grids To Data Services
  - The Rise of Web 2.0
  - Towards Data 2.0
      - Collaborative data for arts and humanities building on Grid middleware


Grid versus Users

  Grid is about:
      - sharing resources
      - interoperable middleware
      - allowing bigger problems
      - integrating communities
      - improving security
      - bringing together data

  Users want to:
      - access more resources
      - ignore middleware
      - solve bigger problems
      - form communities
      - have simple security
      - bring together data

  Grid and Users want very similar things, and yet there is still a “want-got-gap” between them
      - how can this be bridged?


Data Grids

  - The first generation of Grids concentrated on Compute Grids
      - harnessing capacity to improve capability
  - Then came the first Data Grids
      - mechanisms for dealing with the large amounts of data generated by sensors and simulations


Data Challenges

  - Diversity
      - of data resource types, vendors, middleware, schema, metadata
  - Scale
      - of collections, formats, geographical, political and social distance
  - Ownership
      - on individual, group, and organisation levels; intersecting yet independent
  - Security
      - for client, service and data owner; at many levels, with many tradeoffs


Move towards data services

  - Defined interface to stored collection of data
      - e.g. Google and Amazon
  - But the data could be:
      - replicated
      - shared
      - federated
      - virtual
      - incomplete
  - Make access transparent
  - Make integration easy
  - Make management simple
  - Improve the ability to discover, reference, annotate, search, and provide provenance


Grid Data Services

  - Data middleware provides a way of publishing data in a uniform way
      - accessible
      - discoverable
      - searchable
  - Provide tools such as
      - registries
      - replica catalogs
      - mediators


Grid versus User: Round 2

  Grids provide:
      - data discovery services
      - distributed queries
      - basic provenance
      - workflows to represent analysis process

  Users want:
      - information to find the right data
      - cross-database searches
      - sophisticated annotation
      - to explore the information space

  Data 2.0 must go beyond simple data access
      - domain-specific vs generic data services
      - composability, interoperability and ease of use


The Rise of Web 2.0

  New sites allow non-technical users to share information and interact in programmable environments
      - Social Networking: MySpace, Bebo, Facebook
      - GIS: Google Maps, Google Earth
      - Preference Matching: Amazon
      - Meta-clustering: digg, del.icio.us
      - Information Publishing: Flickr

  An army of curators, a world of information


From DSs to VREs

  - Virtual Research Environments
      - bridge gap between middleware and users
      - integrate functionality and facilities
  - OMII-UK is working with projects to support and develop solutions
      - projects: nanoCMOS, CARMEN, Documents and Manuscripts, VERA, SEEGEO, myExperiment, …
      - software: portlets, OGSA-DAI, Taverna, BPEL
      - solutions: campus data management, annotation


SEE-GEO: Geolinking

  [Diagram. Components: GLS Portal; Census GDAS DB; OGSA-DAI (getData); geoLink; Borders WFS DB (getFeature); Feature Portrayal (FPS); Map Server.]
  [Interaction labels: Request attributes; Send parameterised query; Cache attributes; Run algorithm; Request features; Stream polygons; Stream relevant annotated polygons; Store image on server; Retrieve annotated image.]

  Callouts:
      - Access domain-specific data sets
      - Efficient delivery methods
      - Concentrate on algorithm
      - Utilise existing services
      - Call out to existing FP service


Virtual Workspace for the Study of Ancient Documents

  An interface allowing browsing and searching of multiple image collections, including tools to compare and annotate the researcher’s personal collection


Data 2.0: Grid Enabling Datasets

  - Many diverse data sources
  - Many diverse users
      - each sharing and utilising multiple datasets
  - A personalised, virtual data warehouse
      - independently owned and curated
      - bring together many sources to appear as one
  - Allow shared, distributed, centralised, replicated annotation
      - to build a community


Data 2.0: From Silos to Sharing

  [Diagram. Data sources: Edin Data, Manc Data, Soton Data, each behind an OD component. Annotation stores: Amy Annot., Bob Annot., Central Annot. User-facing: VRE Portal, with the actions Choose Dataset, Dataset Annotation and Add Annotation.]

  - Choose data based on stored metadata
      - bring together for each user
  - Build a community by providing tools to contribute back


Summary

  - Grids provide ways of making data more accessible
  - Users are looking for ways of making data more personal
  - Web 2.0 shows a new way of collaborative working enhanced by technology
  - OMII-UK is working with projects to help develop and support software and solutions to enhance collaborative data for humanities
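

Appendix: a sketch of the "From Silos to Sharing" idea

  The Data 2.0 slides describe choosing datasets by their stored metadata, bringing them together for each user so that many sources appear as one, and letting users contribute annotations back. A minimal Python sketch of that idea follows. It is illustrative only and is not how OGSA-DAI or any VRE portal is implemented: the classes Dataset, Annotation and Workspace, the "period" metadata key and the sample records are all hypothetical, and only the source names Edin, Manc and Soton come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One independently curated data source (hypothetical structure)."""
    name: str
    metadata: dict
    records: list


@dataclass
class Annotation:
    """A user's note attached to one record of one dataset."""
    dataset: str
    record_id: str
    author: str
    text: str


@dataclass
class Workspace:
    """A per-user view that makes several chosen datasets appear as one."""
    user: str
    datasets: list = field(default_factory=list)

    def choose(self, catalogue, **wanted):
        # Keep only datasets whose stored metadata matches every requested key.
        self.datasets = [
            d for d in catalogue
            if all(d.metadata.get(k) == v for k, v in wanted.items())
        ]
        return self

    def records(self):
        # Present all chosen sources as one stream, tagging each record with
        # its origin; the underlying sources are neither copied nor modified.
        for d in self.datasets:
            for r in d.records:
                yield {"source": d.name, **r}

    def annotate(self, store, dataset, record_id, text):
        # Contribute an annotation back to a shared (here: in-memory) store.
        note = Annotation(dataset, record_id, self.user, text)
        store.append(note)
        return note


# Made-up sample data, for illustration only.
catalogue = [
    Dataset("Edin", {"period": "roman"}, [{"id": "e1", "title": "Tablet A"}]),
    Dataset("Manc", {"period": "medieval"}, [{"id": "m1", "title": "Charter B"}]),
    Dataset("Soton", {"period": "roman"}, [{"id": "s1", "title": "Inscription C"}]),
]
central_annotations = []

amy = Workspace("Amy").choose(catalogue, period="roman")
merged = list(amy.records())
amy.annotate(central_annotations, "Edin", "e1", "Compare with Inscription C")
```

  Here the workspace holds references to the chosen sources and never copies them, which mirrors the "independently owned and curated" point on the slide. A real deployment would replace the in-memory lists with remote data services and a persistent annotation store.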