Re-use or Re-invention - a Roadmap for Data Integration
27th-28th November 2006
Prof. Jessie Kennedy, e-SI Research Theme Leader
e-SI Theme: Exploiting Diverse Sources of Scientific Data

e-SI Research Theme: Exploiting Diverse Sources of Scientific Data
    The aim of the theme is to investigate some of the issues in, and solutions to, exploiting diverse sources of scientific data.
    Theme Wiki: http://wiki.esi.ac.uk

Workshops hosted in theme:
    Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences, eSI, 1-2 November 2005
    Oracle Corporation and the e-Science Institute Seminar - Temporal Database in Depth: Time and the Data Warehouse, eSI, 3 November 2005
    The Second Workshop on Scientific Data Mining, Integration and Visualization (SDMIV2), eSI, 14 December 2005
    DIALOGUE meeting, 9-10 February 2006
    Integrated Health Records - Practice and Technology, 9-10 March 2006
    Taxonomic Databases Working Group (TDWG) Technical Architecture Group meeting, 11 April 2006
    Taxonomic Databases Working Group (TDWG) Core Ontology, 16-18 May
    RDF, Ontologies and Meta-Data Workshop, 7-9 June 2006
    Taxonomic Databases Working Group (TDWG) GUIDs-2, 10-12 June 2006
    The Closed World of Databases Meets the Open World of the Semantic Web, 12-13 October 2006

Recurring Issues Focus
    Architectures for Data Integration
        What strategies are used for data integration?
        Workflow architectures
        Grid architectures
    Globally Unique Identifiers
        What gets a GUID?
        Who issues them?
        What technology?
    Metadata, Terminologies and Ontologies needed…
        Data discovery for sharing/analysis (integrating)
        Understanding the content of data sets
        Automatic transformations (semantic mediation)
    Will these solve the problems?

Metadata and Ontologies Issues
    Standardisation of formats
    Creation of metadata and ontologies
        Manual and automatic
        For whom and by whom?
    Many or one ontology (granularity)?
    How do we integrate them or map between them?
    How do we choose which to use/reuse?
    How do we know data is suitable for our purpose?

Re-use or Re-invention - a Roadmap for Data Integration
    Investigate data integration issues in the Neuroscience community
        Data collection across sites and populations
        Integration of different types of data
        Integration of data at different levels of granularity
        Ontologies Harmonisation or Integration
    Day 1 - General issues: integration across sites and populations
    Day 2 - Special areas of interest: ontologies + information governance
    Format: Brief introductory talks followed by roundtable discussions

Enjoy the Workshop!