Alun_eScienceSeminar | The University of Aberdeen

CS5547
e-Science & Grid Computing
- introduction -
What is e-Science? What is the Grid?
Grid middleware
Virtual Organisations - some issues
Data access & integration
Metadata
MSc in e-Science Technology at-a-glance
http://www.csd.abdn.ac.uk/teaching/levelfive/CS5547
Some definitions
e-Science
“The large scale science that will increasingly be carried out
through distributed global collaborations enabled by the Internet.
Typically, a feature of such collaborative scientific enterprises is
that they will require access to very large data collections, very
large scale computing resources and high performance
visualisation back to the individual user scientists.”
[nesc.ac.uk]
Grid
“An infrastructure that enables flexible, secure, coordinated
resource sharing among dynamic collections of individuals,
institutions and resources.”
[Foster & Kesselman, globus.org]
The Global Grid
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
UK SuperJANET 4/5
(Links up to 2.5 Gbit/s)
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
Scale, distribution, complexity
[Figure: multiscale modelling of the heart, from cell to person]
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
Multiscale modelling of cancer
Large Hadron Collider (LHC)
http://gridportal.hep.ph.ic.ac.uk/rtm/
http://www.nesc.ac.uk/events/ahm2004/presentations/BobJones.ppt
e-Science & engineering
[Figure: the DAME engine-diagnostics Grid. Components shown: airline office; London and New York airports; engine flight data; Diagnostics Centre; Maintenance Centre; American and European data centres; Engine Model, Case Based Reasoning, Signal Data Explorer and XTO services.]
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
“A significant factor in the success of the Rolls-Royce
campaign to power the Boeing 7E7 with the Trent 1000
was the emphasis on the new aftermarket support service
for the engines provided via DS&S. Boeing personnel
were shown DAME as an example of the new ways of
gathering and processing the large amounts of data that
could be retrieved from an advanced aircraft such as the
7E7, and they were very impressed” (DS&S, 2004)
e-Science workflows
[Figure: a three-stage bioinformatics workflow]
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
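The workflow above chains three analysis stages (A, B, C). A minimal sketch of running such a workflow as a dependency graph follows. It assumes, since the figure does not show the wiring, that B and C both consume A's output; the stage functions are hypothetical placeholders for remote sequence-analysis services. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


# Hypothetical stand-ins for the three services in the figure.
# Each stage receives every result produced so far, keyed by stage name.
def identify_overlaps(results):            # A
    return f"overlaps({results['input']})"


def characterise_nucleotides(results):     # B
    return f"nucleotide_report({results['A']})"


def characterise_protein(results):         # C
    return f"protein_report({results['A']})"


STAGES = {"A": identify_overlaps, "B": characterise_nucleotides, "C": characterise_protein}

# Assumed wiring: B and C both depend on A's output.
DEPENDS_ON = {"A": set(), "B": {"A"}, "C": {"A"}}


def run_workflow(stages, depends_on, initial):
    """Run each stage once all the stages it depends on have finished."""
    results = dict(initial)
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = stages[name](results)
    return results


results = run_workflow(STAGES, DEPENDS_ON, {"input": "seq1"})
print(results["B"])  # nucleotide_report(overlaps(seq1))
print(results["C"])  # protein_report(overlaps(seq1))
```

Because B and C depend only on A, a workflow engine could run them in parallel; `TopologicalSorter` supports that through its `prepare()`, `get_ready()` and `done()` methods.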
Grid middleware: Globus toolkit (GT)
The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke. Open Grid Service Infrastructure WG, Global Grid Forum, 2002.
The Anatomy of the Grid: Enabling Scalable Virtual Organizations. I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001.
http://www.globus.org
Grid & Web Services convergence
The definition of the WS-Resource Framework (WSRF) means that the Grid and Web services
communities can move forward on a common base.
http://www.globus.org
Web & Grid Services
[Figure: three tiers of Web & Grid service specifications, with the WS-I and ‘WS-I+’ profiles marked]
• Standards that have broad industry support and multiple interoperable implementations
• Specifications that are emerging from the standardisation process and are recognised as being ‘useful’
• Specifications that have entered, or will enter, a standardisation process but are not yet stable and are still experimental
http://www.globus.org
UK National Grid Service
[Figure: the UK National Grid Service, showing its interfaces, projects and users]
Projects: e-Minerals, e-Materials, Orbital Dynamics of Galaxies, Bioinformatics (using BLAST), GEODISE project, UKQCD Singlet meson project, Census data analysis, MIAKT project, e-HTPX project, RealityGrid (chemistry)
Interfaces: OGSI::Lite
Users: Leeds, Oxford, UCL, Cardiff, Southampton, Imperial, Liverpool, Sheffield, Cambridge, Edinburgh, QUB, BBSRC, CCLRC
http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt
Grid Virtual Organisations - some issues
Forming a VO dynamically
• partner identification
• Service Level Agreements (SLAs)
• QoS, trust, reputation
Operating a VO
• monitoring QoS
• perturbation: coping with failures, and new opportunities!
• policing: what went wrong? Who’s to blame?
www.conoise.org
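Monitoring QoS against SLAs and policing failures can be sketched as follows. This is a minimal illustration, not the CONOISE approach: the `SLA` fields, thresholds, partner names and the simple reputation update rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SLA:
    """Hypothetical SLA terms agreed with one VO partner."""
    provider: str
    max_response_s: float     # worst acceptable response time, in seconds
    min_availability: float   # fraction of requests that must succeed


def check_sla(sla, response_times, successes, attempts):
    """Monitoring: compare observed QoS with the agreed terms."""
    violations = []
    worst = max(response_times)
    if worst > sla.max_response_s:
        violations.append(f"{sla.provider}: response {worst:.1f}s > {sla.max_response_s:.1f}s")
    availability = successes / attempts
    if availability < sla.min_availability:
        violations.append(
            f"{sla.provider}: availability {availability:.2f} < {sla.min_availability:.2f}")
    return violations


def update_reputation(reputation, provider, violated, step=0.1):
    """Policing: lower a 0..1 reputation score after a violation, raise it otherwise."""
    score = reputation.get(provider, 0.5) + (-step if violated else step)
    reputation[provider] = min(1.0, max(0.0, score))


slas = [SLA("storage-partner", 2.0, 0.99), SLA("compute-partner", 5.0, 0.95)]
# Observed (response times in s, successful requests, total requests) per partner.
observed = {"storage-partner": ([0.8, 1.2, 3.5], 99, 100),
            "compute-partner": ([4.0, 4.5], 96, 100)}
reputation = {}
blamed = []
for sla in slas:
    times, ok, total = observed[sla.provider]
    problems = check_sla(sla, times, ok, total)
    update_reputation(reputation, sla.provider, bool(problems))
    if problems:
        blamed.append(sla.provider)  # candidate for renegotiation or replacement
    for p in problems:
        print(p)
# prints: storage-partner: response 3.5s > 2.0s
```

The reputation scores could then feed back into partner identification when the VO is next re-formed.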
Grid Data Service
[Figure: an OGSA-DAI Grid Data Service (GDS). The Engine receives a perform document and returns a response document. Activities such as a Query Activity (e.g. SqlQueryStatement), a Transform Activity and a Delivery Activity (e.g. DeliverToURL) pass data elements along a pipeline. A Role Mapper uses the caller's credentials to obtain a role for the connection to the Data Resource Implementation.]
http://www.ogsadai.org.uk/
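The Role Mapper sits between the caller's Grid credentials and the data resource's own access control. A minimal sketch, assuming a static lookup table from a certificate distinguished name (DN) plus resource name to database login details; the class, DN and credentials are hypothetical, and this is not OGSA-DAI's own role mapper API.

```python
class RoleMapper:
    """Hypothetical role mapper: (certificate DN, resource) -> database login."""

    def __init__(self, table):
        # table: {(dn, resource_name): (db_user, db_password)}
        self._table = dict(table)

    def role_for(self, dn, resource):
        """Return the database role for this Grid identity, or refuse access."""
        try:
            return self._table[(dn, resource)]
        except KeyError:
            raise PermissionError(f"no role for {dn!r} on {resource!r}") from None


ALICE = "/C=UK/O=eScience/OU=Aberdeen/CN=alice"  # hypothetical DN
mapper = RoleMapper({(ALICE, "myDB"): ("reader", "not-a-real-password")})

print(mapper.role_for(ALICE, "myDB"))  # ('reader', 'not-a-real-password')
try:
    mapper.role_for("/C=UK/O=eScience/CN=mallory", "myDB")
except PermissionError as err:
    print("refused:", err)
```

Keeping the mapping outside the database means each site can keep its existing database accounts while accepting Grid identities.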
GDS - pipeline example
<!-- query the data resource; rows go to the local stream "MyOutput" -->
<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
  <resultSetStream name="MyOutput"/>
</sqlQueryStatement>
<!-- read the stream "MyOutput" and deliver it to an FTP URL -->
<deliverToURL name="deliverOutput">
  <fromLocal from="MyOutput"/>
  <toURL>
    ftp://anon:frog@ftp.example.com/home
  </toURL>
</deliverToURL>
http://www.ogsadai.org.uk/
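To show how the two activities in the pipeline example are chained through the named stream `MyOutput`, here is a minimal Python sketch of an engine that walks the activities in document order. Assumptions: the activities are wrapped in a hypothetical `<perform>` root element, an in-memory SQLite table stands in for the data resource, and delivery to the URL is simulated by a callback rather than a real FTP upload. It is not OGSA-DAI's engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The two activities from the example, wrapped in a hypothetical <perform>
# root element so that they parse as a single XML document.
PERFORM_DOC = """<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="MyOutput"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="MyOutput"/>
    <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
  </deliverToURL>
</perform>"""


def run_perform(doc, conn, deliver):
    """Run activities in document order, linking them through named streams."""
    streams = {}  # e.g. "MyOutput" -> list of result rows
    for activity in ET.fromstring(doc):
        if activity.tag == "sqlQueryStatement":
            sql = activity.findtext("expression").strip()
            out = activity.find("resultSetStream").get("name")
            streams[out] = conn.execute(sql).fetchall()
        elif activity.tag == "deliverToURL":
            src = activity.find("fromLocal").get("from")
            deliver(activity.findtext("toURL").strip(), streams[src])
        else:
            raise ValueError(f"unsupported activity: {activity.tag}")
    return streams


# In-memory stand-in for the data resource.
conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (id integer, name text)")
conn.executemany("insert into myTable values (?, ?)", [(10, "alpha"), (11, "beta")])

delivered = {}


def record_delivery(url, rows):
    """Simulated delivery: record what would be sent where (no real FTP)."""
    delivered[url] = rows


run_perform(PERFORM_DOC, conn, record_delivery)
print(delivered)  # {'ftp://anon:frog@ftp.example.com/home': [(10, 'alpha')]}
```

Because activities only meet through named streams, a transform activity could be slotted between query and delivery without changing either of them.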
Grid data access & integration
Solutions in place to handle
• heterogeneous data storage
• pipelines / dataflows
• access control
• … within the Grid service architecture
Major issues remain, including
• provenance: where did it come from, who did what to it?
• data quality: living with variable-quality data (www.qurator.org)
http://www.ogsadai.org.uk/
Not specific to e-Science! E.g. see the FirstDIG project.
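Provenance asks where a data item came from and who did what to it. A minimal sketch of recording that as a derivation chain, assuming each item carries one hypothetical `Provenance` record naming the agent, the process and its input items; the item and agent names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record: who did what, to which inputs."""
    agent: str          # who performed the step
    process: str        # what was done
    inputs: tuple = ()  # DataItems this item was derived from


@dataclass(frozen=True)
class DataItem:
    name: str
    provenance: Provenance


def lineage(item, depth=0):
    """Trace an item back through its inputs, one indented line per step."""
    p = item.provenance
    lines = ["  " * depth + f"{item.name}: {p.process} by {p.agent}"]
    for parent in p.inputs:
        lines.extend(lineage(parent, depth + 1))
    return lines


raw = DataItem("raw_reads", Provenance("sequencer-01", "acquired"))
clean = DataItem("clean_reads", Provenance("alice", "filtered low-quality reads", (raw,)))
print("\n".join(lineage(clean)))
# clean_reads: filtered low-quality reads by alice
#   raw_reads: acquired by sequencer-01
```

Making the records immutable (`frozen=True`) reflects that a provenance trail should not be rewritten after the fact.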
Metadata in e-Science
Publications
• formal/reviewed
• “grey”
• associated artefacts
Experiment datasets
• formally curated
• raw/pre-processed
• in vivo / in vitro / in silico
People
• expert directories
• communities of practice
Scientific method
• experiment workflow
• knowledge roles: hypotheses, observations, predictions, deductions, …
• Discourse & natural arguments: proof, refutation, agreement, …
Projects
• formal/funded
• working groups
Managing scientific metadata
[Figure: scientific metadata as a network of Experiment, Evidence, Hypothesis and Publication nodes, linked by relations such as Agrees With and Disagrees With]
e-Science metadata management platform
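Relations like those in the figure can be held as subject-relation-object triples and queried by pattern, much as RDF stores do. A minimal sketch: the node identifiers and the `reports`, `providesEvidenceFor` and `proposes` relations are hypothetical; `agreesWith` and `disagreesWith` mirror the figure's labels.

```python
# Hypothetical metadata graph held as (subject, relation, object) triples.
GRAPH = {
    ("pub:P1", "reports", "exp:E1"),
    ("exp:E1", "providesEvidenceFor", "hyp:H1"),
    ("hyp:H2", "disagreesWith", "hyp:H1"),
    ("hyp:H3", "agreesWith", "hyp:H1"),
    ("pub:P2", "proposes", "hyp:H2"),
}


def match(graph, s=None, r=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (r is None or t[1] == r) and (o is None or t[2] == o)
    )


def challengers(graph, hypothesis):
    """Publications proposing a hypothesis that disagrees with the given one."""
    rivals = [s for s, _, _ in match(graph, r="disagreesWith", o=hypothesis)]
    return [s for h in rivals for s, _, _ in match(graph, r="proposes", o=h)]


print(match(GRAPH, r="agreesWith"))  # [('hyp:H3', 'agreesWith', 'hyp:H1')]
print(challengers(GRAPH, "hyp:H1"))  # ['pub:P2']
```

Chaining simple pattern matches like this is what lets a metadata platform answer questions that no single publication or dataset records directly.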
Fearlus-G pilot project
[Figure: Fearlus-G components: metadata schema (ontology), desktop client, metadata client, Globus client]
MSc e-Science Technologies: next…
CS5547 e-Science & Grid Computing
• Grid middleware, e-Science workflow, metadata
CS5553 Intelligent Architectures
• technologies for Virtual Organisations
CS5545 Data Interpretation & Communication
• technologies at the data/user-scientist interface
CS5544 E-Technology Workshop
• group project, with an e-Science application
CS5945 MSc Project in E-Technology
• potential to do a project with user-scientists