e-Science Research
Jano van Hemert, PhD
The University of Edinburgh
Supporting the research life cycle
[Figure: the research life cycle, a cycle with six stages: Generate hypotheses, Design experiments, Perform experiments & gather data, Analyse data & extract knowledge, Verify hypotheses, Publish findings]
Jano van Hemert, National e-Science Centre
[Figure: two research life cycles linked by collaboration]
[Figure: five research life cycles shown together]
Collaborative life cycles – issues to address
• Sharing results: data ownership; heterogeneous nature of data formats,
access points and technologies
• Sharing knowledge: incompatibilities between methodologies
• Sharing analyses: access to sufficient computational power; transparent
access to analysis techniques; complexities of data integration
• Sharing hypotheses: inconsistent ontological basis; difficulties in
capturing hypotheses formally
• Intrinsic difficulties: different speed of cycles; cycles are never perfect;
humans exhibit unpredictable behaviour
Technologies to support the research life cycle
• Data integration technology – to allow integration across institutes
• Workflow technology – to make integration and analysis processes more
transparent to the domain experts
• Grid technology – to meet the requirements for computational power, large
storage and data transfers
• Portlet technology – to lower the Grid technology barrier for domain scientists
• Grid-enabled modelling and analysis tools – to allow use of these tailored
tools by their respective domains
• Authentication technology – to realise the single sign-on principle
Example of a research life cycle
• FP6/EU-funded design study that aims to design and prototype a pan-European
infrastructure for collaborative gene expression studies on early human development
• More information: http://www.dgemap.org
• Partners:
‣ Institute of Human Genetics, Centre for Life, Newcastle University
‣ Human Genetics Unit, Medical Research Council (UK), Edinburgh
‣ National e-Science Centre (UK), University of Edinburgh, UK
From embryo to analysis
[Figure: pipeline stages as labelled: Section, ISH (in situ hybridisation), Curate, Analyse, Reconstruct]
Collaborative data gathering
[Figure: curation maps an original image onto a standard embryo. Original image shows expression of Hmgb1 in a mouse embryo at Theiler Stage 17. Standard embryo shows mapped levels of expression: strong, moderate, not detected.]
Collaborative data access
• Standard web service architecture
• Extensive set of functions
• Returns a proprietary chunk of XML
• An identical interface exists for human data
• Allows integration with third party
applications (e.g., Taverna)
Taverna workflow
Extracting all studies on the diencephalon of Mus musculus
Modelling gene interaction based on spatial data
[Figure: a spatial region s_i and a gene expression pattern g_j]
\[
\mathrm{similarity}(s_i, g_j) = \frac{S(s_i \cap g_j)}{S(s_i \cup g_j)}
\]
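The similarity measure on this slide can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes that s_i and g_j are given as sets of voxel coordinates and that S(·) simply counts voxels. The function name and the toy regions are hypothetical.

```python
def similarity(s, g):
    """Ratio of the size of the overlap to the size of the union.

    Assumption: regions are sets of voxel coordinates and S(.) is a
    plain voxel count; the slide does not define S more precisely.
    """
    s, g = set(s), set(g)
    union = s | g
    if not union:
        # Two empty regions: define the similarity as 0 to avoid 0/0.
        return 0.0
    return len(s & g) / len(union)


# Toy 2D regions (illustrative only, not real expression data).
s_i = {(0, 0), (0, 1), (1, 0), (1, 1)}
g_j = {(1, 0), (1, 1), (2, 0), (2, 1)}

# Two shared voxels out of six in the union.
print(similarity(s_i, g_j))
```

The value ranges from 0 (no overlap) to 1 (identical regions), so it can be compared across pairs of regions of different sizes.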
Analyses of spatial data
Relating genes and spatial regions through association rule mining of expression patterns
• Lhx4, Otx2 => Dmbx1
• 9830124H08Rik, Trim45 => Brap
J.I. van Hemert and R.A. Baldock. Mining spatial gene expression data for association rules. In S. Hochreiter and R. Wagner, editors, Proceedings of the 1st International Conference on BioInformatics Research and Development, Lecture Notes in Bioinformatics, pages 66–76. Springer Verlag, 2007.
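A rule such as Lhx4, Otx2 => Dmbx1 is usually judged by its support and confidence. The sketch below shows those two standard measures under stated assumptions: each transaction is taken to be the set of genes expressed in one spatial region, which may differ from the encoding used in the cited paper, and the toy transactions are invented for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante


# Hypothetical data: genes expressed in each of five regions.
regions = [
    {"Lhx4", "Otx2", "Dmbx1"},
    {"Lhx4", "Otx2", "Dmbx1"},
    {"Lhx4", "Otx2"},
    {"Otx2"},
    {"Brap"},
]

print(support(regions, {"Lhx4", "Otx2"}))
print(confidence(regions, {"Lhx4", "Otx2"}, {"Dmbx1"}))
```

In this toy set the antecedent holds in three of five regions and the full rule in two, so the rule holds in two thirds of the regions where both Lhx4 and Otx2 are expressed.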
Hierarchical clustering
Nine studies that form a cluster
in the hierarchical clustering of all
studies in the data set
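The slide does not say which linkage or distance was used, so the following is only a sketch of the general technique: agglomerative clustering with single linkage, using one minus the overlap ratio as the distance between two studies. The study names, voxel sets, and threshold are hypothetical.

```python
from itertools import combinations


def overlap_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of voxels (assumed distance)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def single_linkage(items, distance, threshold):
    """Repeatedly merge the two closest clusters.

    `items` maps a study name to its voxel set. Merging stops once the
    closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[name] for name in items]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest members.
            d = min(distance(items[a], items[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical studies, each reduced to a set of voxel indices.
studies = {
    "study_a": {1, 2, 3, 4},
    "study_b": {2, 3, 4, 5},
    "study_c": {10, 11, 12},
    "study_d": {11, 12, 13},
}

print(single_linkage(studies, overlap_distance, threshold=0.6))
```

Cutting the merge process at a threshold yields flat groups such as the nine-study cluster shown on the slide; recording every merge instead would give the full hierarchy.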
Thanks for your attention
Jano van Hemert
Malcolm Atkinson
Adam Barker
Jos Koetsier
Yin Chen
Lihao Liang
Richard Baldock
Susan Lindsay
Yiya Yang
Xunxian Wang
Bill Hill
Alina Andreas
Carole Goble
Demetrius Vouyiouklis
Stuart Owen
Mark Scott
Carlos Ramos
Marie-Laure Muiras