Re-use or Re-invention - a Roadmap for Data Integration, 27th-28th November 2006

Prof. Jessie Kennedy
e-SI Research Theme Leader
e-SI Theme:
Exploiting Diverse Sources of Scientific Data
e-SI Research Theme
Exploiting Diverse Sources of Scientific Data
The aim of the theme is to investigate some of the
issues in, and solutions to, exploiting diverse
sources of scientific data.
Theme Wiki
http://wiki.esi.ac.uk
Workshops hosted in theme:
• Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences, eSI, 1-2 November 2005
• Oracle Corporation and the e-Science Institute Seminar - Temporal Database in Depth: Time and the Data Warehouse, eSI, 3 November 2005
• The Second Workshop on Scientific Data Mining, Integration and Visualization (SDMIV2), eSI, 14 December 2005
• DIALOGUE meeting, 9-10 February 2006
• Integrated Health Records - Practice and Technology, 9-10 March 2006
• Taxonomic Databases Working Group (TDWG) Technical Architecture Group meeting, 11 April 2006
• Taxonomic Databases Working Group (TDWG) Core Ontology, 16-18 May
• RDF, Ontologies and Meta-Data Workshop, 7-9 June 2006
• Taxonomic Databases Working Group (TDWG) GUIDs-2, 10-12 June 2006
• The Closed World of Databases Meets the Open World of the Semantic Web, 12-13 October 2006
Recurring Issues → Focus
Architectures for Data Integration
What strategies are used for data integration?
workflow architectures
grid architectures
Globally Unique Identifiers
What gets a GUID? Who issues them? What technology?
Metadata, Terminologies and Ontologies needed…
Data discovery for sharing/analysis (integrating)
Understanding the content of data sets
Automatic transformations (semantic mediation)
Will these solve the problems?
Metadata and Ontologies
Issues
Standardisation of formats
Creation of metadata and ontologies
Manual and automatic
For whom and by whom?
Many or one ontology (granularity)?
How do we integrate them or map between them?
How do we choose which to use/reuse?
How do we know data is suitable for our purpose?
Re-use or Re-invention - a
Roadmap for Data Integration
Investigate data integration issues in the neuroscience community
Data
Collection across sites and populations
Integration of different types of data
Integration of data at different levels of granularity
Ontologies
Harmonisation or Integration
Day 1 - General issues
integration across sites and populations
Day 2 - Special areas of interest
ontologies + information governance
Format
Brief Introductory Talks followed by Roundtable discussions
Enjoy the Workshop!