e-Science Research
Jano van Hemert, PhD

Supporting the research life cycle
[Figure: the research life cycle, drawn as a cycle of six stages: Design experiments; Perform experiments & gather data; Analyse data & extract knowledge; Verify hypotheses; Generate hypotheses; Publish findings]
[Figure: two research life cycles linked by "Collaboration"]
[Figure: five research life cycles shown together]

Jano van Hemert, National e-Science Centre

Collaborative life cycles – issues to address
• Sharing results: data ownership; heterogeneous nature of data formats, access points and technologies
• Sharing knowledge: incompatibilities between methodologies
• Shared analyses: access to sufficient computational power; transparent access to analysis techniques; complexities of data integration
• Sharing hypotheses: inconsistent ontological basis; difficulties in capturing hypotheses formally
• Intrinsic difficulties: cycles run at different speeds; cycles are never perfect; humans exhibit unpredictable behaviour

Technologies to support the research life cycle
• Data integration technology – to allow integration across institutes
• Workflow technology – to make integration and analysis processes more transparent to the domain experts
• Grid technology – to meet the requirements for computational power, large storage and data transfers
• Portlet technology – to lower the Grid technology barrier for domain scientists
• Grid-enabled modelling and analysis tools – to allow use of these tailored tools by their respective domains
• Authentication technology – to realise the single sign-on principle

Example of a research life cycle
• FP6/EU-funded design study that aims to design and prototype a pan-European infrastructure for collaborative gene expression studies on early human development
• More information: http://www.dgemap.org
• Partners:
  ‣ Institute of Human Genetics, Centre for Life, Newcastle University
  ‣ Human Genetics Unit, Medical Research Council (UK), Edinburgh
  ‣ National e-Science Centre (UK), University of Edinburgh

From embryo to analysis
[Figure: stages labelled Section, ISH, Curate, Analyse, Reconstruct]

Collaborative data gathering
[Figure: original image showing expression of Hmgb1 in a mouse embryo at Theiler Stage 17; after curation, a standard embryo with mapped levels of expression (legend: Strong, Moderate, Not detected)]

Collaborative data access
• Standard web service architecture
• Extensive set of functions
• Returns a proprietary chunk of XML
• Identical interface exists for human
• Allows integration with third-party applications (e.g., Taverna)

Taverna workflow
Extracting all studies on the diencephalon of Mus musculus

Modelling gene interaction based on spatial data
[Figure: spatial regions s_i and g_j, annotated with the values 260 and 5]
$$\mathrm{similarity}(s_i, g_j) = \frac{S(s_i \cap g_j)}{S(s_i \cup g_j)}$$

Analyses of spatial data
Relating genes and spatial regions through association rule mining of expression patterns
Lhx4, Otx2 => Dmbx1
9830124H08Rik, Trim45 => Brap
J.I. van Hemert and R.A. Baldock. Mining spatial gene expression data for association rules. In S. Hochreiter and R. Wagner, editors, Proceedings of the 1st International Conference on BioInformatics Research and Development, Lecture Notes in Bioinformatics, pages 66–76. Springer Verlag, 2007.

Hierarchical clustering
Nine studies that form a cluster in the hierarchical clustering of all studies in the data set

Thanks for your attention
Jano van Hemert, Malcolm Atkinson, Adam Barker, Jos Koetsier, Yin Chen, Lihao Liang, Richard Baldock, Susan Lindsay, Yiya Yang, Xunxian Wang, Bill Hill, Alina Andreas, Carole Goble, Demetrius Vouyiouklis, Stuart Owen, Mark Scott, Carlos Ramos, Marie-Laure Muiras
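
Backup: computing the similarity measure
The similarity measure from the slide "Modelling gene interaction based on spatial data" divides the size of the intersection of two regions by the size of their union. The sketch below is one possible reading, not the implementation used in the study: it assumes that S(·) counts voxels and that a region is a set of (x, y, z) voxel coordinates. The function name, the voxel representation, the example regions and the choice to return 0.0 for two empty regions are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Assumption: a region is a set of integer (x, y, z) voxel coordinates.
Voxel = Tuple[int, int, int]


def similarity(s: Set[Voxel], g: Set[Voxel]) -> float:
    """Return S(s ∩ g) / S(s ∪ g), with S taken to be the voxel count."""
    union = s | g
    if not union:
        # Assumption: two empty regions are treated as having similarity 0.
        return 0.0
    return len(s & g) / len(union)


# Hypothetical toy regions, not data from the study.
s_i = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
g_j = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}

# Two shared voxels out of five distinct voxels in total.
print(similarity(s_i, g_j))  # 0.4
```

Under this reading the measure is 1 for identical non-empty regions and 0 for regions that do not overlap.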