e-Science Institute & National e-Science Centre Malcolm

Progress with
UK e-Science
BCS
Anglia Ruskin University
Chelmsford
Malcolm Atkinson
Director e-Science Institute
& e-Science Envoy
www.nesc.ac.uk
20th February 2007
Overview
History of e-Science in UK
Three Significant Strengths Established
Projects
e-Infrastructure
Communities & Breadth
ESFRI, EGEE, et al. thriving in Europe
e-Science & Cyberinfrastructure everywhere
e-Science definition & history
Propose an e-Science Framework
Test drive framework on 3 UK projects
The framework in today’s technical context
Defining e-Science
e-Science: Systematic Support for
Collaborative Research
Multi-disciplinary, Multi-Site & Multi-National
All disciplines contribute & benefit
Enabling wider engagement
Building with and demanding advances in
Computing Science
Using advances in computing to support
research, design, diagnosis
Dates back 50 years
Prevalent in branches of biology 20 years
Prevalent in Engineering for >40 years
UK e-Science
e-Science and the Grid
‘e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.’
‘e-Science will change the dynamic of the
way science is undertaken.’
John Taylor
Director General of Research Councils
Office of Science and Technology
From presentation by Tony Hey
UK e-Science Budget
(2001-2006)
Total: £213M + £100M via JISC
By Research Council
EPSRC (£77.7M) 37%
PPARC (£57.6M) 27%
MRC (£21.1M) 10%
BBSRC (£18M) 8%
NERC (£15M) 7%
ESRC (£13.6M) 6%
CLRC (£10M) 5%
EPSRC Breakdown
Applied (£35M) 45%
Core (£31.2M) 40%
HPC (£11.5M) 15%
Staff costs: Grid Resources, Computers & Network funded separately
+ Industrial Contributions £25M
Source: Science Budget 2003/4 – 2005/6, DTI(OST)
Slide from Steve Newhouse
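As a quick arithmetic check on the budget slide, the sketch below recomputes each share from the quoted amounts. The dictionary names and the pairing of labels with amounts are read off the slide and are an assumption of this sketch; the recomputed shares agree with the slide's percentages to within one percentage point.

```python
# Amounts in £M as quoted on the slide (label/amount pairing is an assumption).
COUNCIL_BUDGET_M = {
    "EPSRC": 77.7, "PPARC": 57.6, "MRC": 21.1, "BBSRC": 18.0,
    "NERC": 15.0, "ESRC": 13.6, "CLRC": 10.0,
}
EPSRC_BREAKDOWN_M = {"Applied": 35.0, "Core": 31.2, "HPC": 11.5}

# Percentages as printed on the slide.
SLIDE_PERCENT = {
    "EPSRC": 37, "PPARC": 27, "MRC": 10, "BBSRC": 8,
    "NERC": 7, "ESRC": 6, "CLRC": 5,
}


def shares(amounts):
    """Return each entry's share of the total, rounded to a whole percent."""
    total = sum(amounts.values())
    return {name: round(100 * value / total) for name, value in amounts.items()}


def within_one_point(computed, quoted):
    """True if every computed share is within one point of the quoted one."""
    return all(abs(computed[name] - quoted[name]) <= 1 for name in quoted)


if __name__ == "__main__":
    print("Total (£M):", round(sum(COUNCIL_BUDGET_M.values()), 1))
    print("Council shares:", shares(COUNCIL_BUDGET_M))
    print("EPSRC breakdown:", shares(EPSRC_BREAKDOWN_M))
```

The council amounts sum to the quoted £213M, and the three EPSRC components sum to the EPSRC figure of £77.7M.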
UK e-Science Diversity
Thriving Community
All disciplines & all
Research Councils
Industry & Academia
Many universities &
research institutes
UK e-Science All
Hands Meetings
Productive
collaboration
e-Infrastructure
A shared resource
That enables science,
research, engineering,
medicine, industry, …
It will improve UK /
European / …
productivity


Lisbon Accord 2000
E-Science Vision SR2000 –
John Taylor
Commitment by UK
government

Sections 2.23-2.25
Always there

c.f. telephones, transport,
power
OSI report

www.nesc.ac.uk/documents/
OSI/index.html
National Centre for e-Social Science
Aberdeen
University of Manchester
University of Essex
Lancaster
Manchester Leeds
Nottingham
Oxford
Bristol
Colchester
London
Edinburgh
National Grid Service and partners
Edinburgh
CCLRC Rutherford
Appleton Laboratory
Lancaster
Manchester
Leeds
York
Sheffield
Cardiff Didcot
Westminster
Bristol
Slide: Neil Geddes
e-Science Centres in the UK
Coordinated by:
Directors’ Forum
Digital Curation Centre & NeSC
Edinburgh
White Rose Grid
Glasgow
Access Grid
Support Centre
Newcastle
Lancaster
Manchester
Leicester
Belfast
National Centre for
Text Mining
National Centre for
e-Social Science
CCLRC Daresbury
National Institute for
e-Science
Leeds
York Environmental
Sheffield
Cambridge
Birmingham
Oxford
National Grid
Service
Cardiff
Bristol
Reading
+2 years
CCLRC RAL
Open Middleware
Infrastructure Institute Southampton LeSC
UCL
OMII-UK nodes
EPCC & National e-Science Centre
School of Computer Science
University of Manchester
Edinburgh
School of Electronics and
Computer Science
University of Southampton
Manchester
Southampton
Digital Curation Centre and partners
Humanities Advanced
Technology and
Information Institute
Database Research Group,
School of Informatics
AHRC Research Centre for
Studies in Intellectual Property
and Technology Law
EDINA
National e-Science Centre
Edinburgh
Glasgow
Rutherford Appleton
(Didcot) and Daresbury
(Warrington)
Laboratories
UKOLN (formerly UK
Office for Library
Networking)
Warrington
Didcot
Bath
Achieving the CI Vision requires
synergy between 3 types of Foundation wide
activities
Transformative
Application - to
enhance discovery &
learning
Provisioning Creation, deployment
and operation of
advanced CI
R&D to enhance technical and
social dimensions of future CI
systems
Office of
Cyberinfrastructure
D. E.
Atkins
Framework for e-Science
Motivation for collaboration
Socio-economic value identified
Impediments recognised
All participants agree & cooperate
Challenge and Insights
Articulated & demanding challenge
Creative new approach
Potentially feasible
Technical advances
New models, new methods, collaboration support
Economic changes - e.g. shared computing
Cultural changes - e.g. shared information
The NERC Success
Professor Robert Gurney
Director, Environmental Systems Science
Centre, Reading
The NERC e-Science experience
11 papers in Nature
Enthusiastic uptake of ensemble methods
climateprediction.net
Predicting Climate Change
Through Volunteer Computing
University of Oxford
Department of
Atmospheric Physics
Slide: Robert Gurney
climateprediction.net Users Worldwide
>300,000 users total (90% MS Windows): >60,000 active
~17 million model-years simulated (as of September '06)
~180,000 completed simulations
Impact:
New Science
Understanding of science
Engaging schools
BBC follow on
The world's largest climate modelling supercomputer!
(NB: a black dot is one or more computers running climateprediction.net)
Slide: Robert Gurney
Climateprediction.net
– Volunteer computing
– Myles Allen, Atmospheric Physics
- More than 10 Million models calculated
- Uses BOINC – portal for broader community
- Used in schools
- Interesting distributed data analysis problems
Framework for e-Science
Motivation for collaboration
Socio-economic value?


Better global warming prediction
public understanding of GW
Impediments?


Reaching enough participants
Gaining attention & resources
Participants cooperate?






Volunteers “buy in”
BOINC culture helps
Good PR → media interest → BBC involved → more incentives
motivated by cause, by visualisation and by wiki
Global net of data collection centres needed - storage & compute!
Why should they contribute?
Framework for e-Science
Challenge and Insights
Challenge?


Explore effects of uncertainty in models & physics of climate
Infeasible amounts of supercomputing time
New approach?



Run simpler model
Use ensemble computing - Monte Carlo parameter exploration
Analyses and integration over all results
Feasible?



BOINC (from SETI@home) suggests the computation resource is feasible
But large volumes of data per model run
Needs to be stored and later analysed
http://climateprediction.net
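The "ensemble computing - Monte Carlo parameter exploration" idea on this slide can be illustrated with a toy sketch. Everything here is hypothetical: `toy_model`, the parameter range and the forcing value are stand-ins chosen for illustration, not the climateprediction.net model or its parameters. Each ensemble member is an independent run, which is what makes the approach suitable for distribution to volunteer machines.

```python
import random
import statistics


def toy_model(sensitivity, forcing=3.7):
    """Hypothetical stand-in for one model run.

    Returns a warming value as sensitivity * forcing. Both the formula
    and the default forcing are illustrative assumptions only.
    """
    return sensitivity * forcing


def run_ensemble(n_members, seed=0):
    """Run n_members independent toy simulations with sampled parameters."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_members):
        # Monte Carlo step: draw the uncertain parameter from an assumed range.
        sensitivity = rng.uniform(0.3, 1.2)
        results.append(toy_model(sensitivity))
    return results


def summarise(results):
    """Integrate over all runs: mean plus simple 5th and 95th percentiles."""
    ordered = sorted(results)
    n = len(ordered)
    return {
        "mean": statistics.mean(ordered),
        "p05": ordered[int(0.05 * (n - 1))],
        "p95": ordered[int(0.95 * (n - 1))],
    }


if __name__ == "__main__":
    print(summarise(run_ensemble(1000, seed=1)))
```

In a volunteer-computing setting each call to `toy_model` would run on a different machine, and only the analysis in `summarise` would need the collected results in one place.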
Framework for e-Science
Technical advances
New model?

Simplified Hadley + …
New method?




Ensemble methods
Distributed using BOINC
Distributed data collection
Distributed data integration and analysis
http://www.allhands.org.uk/2006/proceedings/papers/595.pdf
Collaboration support?


Built on BOINC collaboration support
Improved visualisation
Economic change?


Free model runs > 21 million model hours
How were the data centres financed?
Cultural change?


Explicit use of media
NERC support for community integration
NERC centres
National Institute for
Environmental e-Science,
University of Cambridge
Cambridge
University of Reading
6th September 2006
Swindon
Reading
In silico biology
http://www.mygrid.org.uk
Construct in silico experiments,
find and adapt others, manage
the experiment lifecycle
Taverna Workflow workbench
OGSA-DQP
Semantic Technologies
Williams-Beuren Syndrome,
Graves’ Disease,
Trypanosomiasis in cattle.
OMII-UK Node, GRIMOIRE Registry,
Taverna Workflow workbench
12000+ Downloads of Taverna
Wide transfer to BBSRC (e-Fungi,
ISPIDER, ComparaGrid) & MRC
projects (PsyGrid, CLEF, CLEFS)
Semantic Grid pioneer
WBS gene identification
Outstanding international links
Great deal of open source s/w
Links into BOSC & HGMP
KT to BT, ComparaGrid, OntoGrid,
BBSRC Systems Biology Centre,
MIASGrid, Rice Institute etc
Middleware for data intensive in
silico biology by bioinformaticians
• Carole Goble (Comp Sci, Manchester)
• 7 Universities and institutes (incl. EBI)
• 8 Companies
Slide: Carole Goble & Jim Fleming
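The idea of constructing an in silico experiment as a workflow and managing its lifecycle can be sketched as a chain of named steps with a provenance log. This is a minimal illustration in plain Python, not Taverna's API or its workflow language; the step names and the toy sequence operations are assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def transcribe(seq):
    """Replace thymine with uracil to give the RNA form of the sequence."""
    return seq.replace("T", "U")


def clean(seq):
    """Normalise raw input: strip whitespace and upper-case."""
    return seq.strip().upper()


def run_workflow(steps, data):
    """Apply each (name, function) step in order, recording provenance.

    The provenance list holds (step name, input, output) for every step,
    so a run can be inspected, repeated or adapted later.
    """
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append((name, data, result))
        data = result
    return data, provenance


# A hypothetical three-step workflow.
WORKFLOW = [
    ("clean", clean),
    ("reverse_complement", reverse_complement),
    ("transcribe", transcribe),
]

if __name__ == "__main__":
    output, log = run_workflow(WORKFLOW, " atgc\n")
    print(output)
    for entry in log:
        print(entry)
```

Keeping the workflow as data (a list of named steps) is what lets others find and adapt it: a step can be swapped or inserted without touching the runner.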
Framework for e-Science
Motivation for collaboration
Socio-economic value?
Impediments?
Participants cooperate?
Challenge and Insights
Challenge?
New approach?
Feasible?
Technical advances
New model? New method? Collaboration support?
Economic change?
Cultural change?
Taverna Workflow Workbench
Carole Goble
David De Roure
Slide: Dave De Roure & Jeremy Frey
CombeChem Semantic Datagrid
Video
Simulation
Diffractometer
Properties
Analysis
Structures
Database
X-Ray
e-Lab
Properties
e-Lab
Grid Middleware
Slide: Dave De Roure & Jeremy Frey
Presentation services: subject, media-specific, data, commercial portals
Data creation /
capture /
gathering:
laboratory
experiments,
Grids,
fieldwork,
surveys, media
Resource
discovery, linking,
embedding
Data analysis,
transformation,
mining, modelling
Searching,
harvesting,
embedding
Aggregator
services: national,
commercial
Resource
discovery,
linking,
embedding
Learning object
creation, re-use
Harvesting
metadata
Research &
e-Science
workflows
Deposit / self-archiving
Learning &
Teaching
workflows
Repositories :
institutional,
e-prints, subject,
data, learning objects
Validation
Deposit / self-archiving
Publication
Resource
discovery, linking,
embedding
The scholarly knowledge cycle.
Liz Lyon, Ariadne, July 2003.
© Liz Lyon (UKOLN, University of Bath), 2003
This work is licensed under a Creative Commons License
Attribution-ShareAlike 2.0
Peer-reviewed publications:
journals, conference
proceedings
Institutional
presentation
services: portals,
Learning
Management
Systems, u/g, p/g
courses, modules
Validation
Quality
assurance
bodies
Framework for e-Science
Motivation for collaboration
Socio-economic value?
Impediments?
Participants cooperate?
Challenge and Insights
Challenge?
New approach?
Feasible?
Technical advances
New model? New method? Collaboration support?
Economic change?
Cultural change?
Data capture
Slide: Dave De Roure & Jeremy Frey
Ingredient List
Fluorinated biphenyl: 0.9 g
Br11OCB: 1.59 g
Potassium Carbonate: 2.07 g
Butanone: 40 ml
Plan (To Do List)
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Cool and add Br11OCB
Heat at reflux until completion
Cool and add water (30ml)
Extract with DCM (3x40ml)
Combine organics, dry over MgSO4 & filter
Remove solvent in vacuo
Fuse compound to silica & column in ether/petrol
CombeChem Semantic Datagrid
Process Record
Processes: Weigh, Measure, Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner), Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography, Annotate
Inputs: Sample of 4-fluorinated biphenyl, Butanone, Sample of K2CO3 Powder, Sample of Br11OCB, Water, DCM, MgSO4, Silica, Ether/Petrol Ratio
Recorded values: 0.9031 g, 2.0719 g, 1.5918 g, 40 ml, 30 ml, 3 of 40, excess
Literal types: text, image
Annotations
Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.
Inorganics dissolve 2 layers. Added brine ~20ml.
Organics are yellow solution
Washed MgSO4 with DCM ~ 50ml
Key
Process, Input, Literal, Observation
Observation Types
weight - grammes
measure - ml, drops
annotate - text
temperature - K, °C
Future Questions
Whether to have many subclasses of processes or fewer with annotations
How to depict destructive processes
How to depict taking lots of samples
What is the observation/process boundary? e.g. MRI scan
Combechem
30 January 2004
gvh, hrm, gms
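The key on this slide (Process, Input, Literal, Observation) and its observation types suggest a simple record structure for captured lab data. The sketch below is one hypothetical way to model it with plain Python dataclasses; the class and field names are assumptions, the pairing of recorded values with particular steps is inferred for illustration, and it is not the CombeChem schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Observation types listed in the slide's key.
OBSERVATION_TYPES = {"weight", "measure", "annotate", "temperature"}


@dataclass
class Observation:
    kind: str                   # one of OBSERVATION_TYPES
    literal: str                # the recorded value, kept as written
    unit: Optional[str] = None  # e.g. "g", "ml"; None for free text

    def __post_init__(self):
        if self.kind not in OBSERVATION_TYPES:
            raise ValueError(f"unknown observation type: {self.kind}")


@dataclass
class Process:
    name: str                                   # e.g. "Weigh", "Reflux"
    inputs: List[str] = field(default_factory=list)
    observations: List[Observation] = field(default_factory=list)


def build_example_record():
    """Build a short, illustrative slice of the process record."""
    weigh = Process(
        "Weigh",
        inputs=["Sample of 4-fluorinated biphenyl"],
        observations=[Observation("weight", "0.9031", "g")],  # assumed pairing
    )
    measure = Process(
        "Measure",
        inputs=["Butanone"],
        observations=[Observation("measure", "40", "ml")],  # assumed pairing
    )
    reflux = Process(
        "Reflux",
        observations=[
            Observation("annotate", "Only reflux for 45min, next step 14:15.")
        ],
    )
    return [weigh, measure, reflux]


def observations_of_kind(record, kind):
    """Collect all observations of one type across the whole record."""
    return [obs for proc in record for obs in proc.observations if obs.kind == kind]


if __name__ == "__main__":
    for obs in observations_of_kind(build_example_record(), "weight"):
        print(obs)
```

Treating annotations as observations attached to a process is one answer to the slide's question about many process subclasses versus fewer processes with annotations; the other choice would add a subclass per process type.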
Slide: Dave De Roure & Jeremy Frey
ecrystals.chem.soton.ac.uk
Slide: Dave De Roure & Jeremy Frey
Timeline
Grunts and body language: 500,000 years
Speech: 300,000 years
Writing: 5,000 years
Printing: 600 years
Telecommunications: 170 years
Broadcasting: 100 years
Last 30 years: Home Computers; Internet and WWW; Mobile phones; Grid and Web 2.0; Web 3.0 and Ubiquitous connected devices
Today
“Wellbeing” the global-scale killer app., Sir Robin Saxby Oct. 2006
Healthcare @ Home
REFERRAL
GP
Home-mobile-clinic
via PDA-laptop-PC-Paper
REFERRAL
Diabetician
Home-mobile-clinic
via PDA-laptop-PC-Paper
Various Clinical Specialists (Distributed)
e.g. Ophthalmologist, Podiatrist, Vascular
Surgeons, Renal Specialists, Wound clinic,
Foot care clinic, Neurologists, Cardiologists
REFERRAL
VARIABLES
ACCESS
MATRIX
CASE
Patient
Home-mobile-clinic
via TV-PDA-laptop-PC-Paper
Dietitian
Biochemist
Diabetes Specialist / Other Specialist Nurses
Home-mobile-clinic
via TV-PDA-laptop-PC-Paper
Community Nurses / Health Visitors
DAME
http://www.cs.york.ac.uk/dame/
Aims to manage >1Tb per year of Aero
Engine vibration and maintenance data.
Interlinks with search and reasoning
services.
Defined and evaluated a distributed
search system.
GSI enabled secure engine
performance simulation
CBR advisor for diagnostic engineer
A data architecture defined based on
Globus and SRB.
BROADEN DTI Project (£3.9M)
Spun out technology exploited through
Cybula Ltd., Oxford Biosignals and
DS&S.
Successful mid-term demonstrator well
received by Rolls-Royce
White Rose Grid: experience of building
& using production Grids
In Grid Blue Print 2 edition 2
Aircraft healthcare diagnosis
• Jim Austin (Comp Sci, York)
• 4 Universities and institutes
• 3 Companies
Slide: Carole Goble, Jim Fleming & Jim Austin
Timeline (years ago)
Up to 1,500,000: Homo habilis existed between 2.4 and 1.5 million years ago and the species’ brain shape shows evidence that some speech had developed.
300,000: Homo erectus lived between 1.8 million and 300,000 years ago. It was a successful species for over a million years. The brain grew steadily during its reign. The species definitely had speech.
50,000: Arrival of ‘modern man’.
5,000: First ‘writing’ system developed in ancient Sumeria (cuneiform).
600: Johannes Gutenberg invented the first printing press in 1440.
170: The first commercial electrical telegraph was constructed and opened on 9 April 1839.
100: In the US, Charles Herrold sent out broadcasts as early as April 1909. In the UK, the first experimental broadcasts from Marconi’s factory began in 1920.
30: Home Computers; Internet and WWW; Mobile phones; Grid and Web 2.0; Web 3.0 and Ubiquitous connected devices
The Semantic Web layer
cake
User Interface and Applications
Trust
OWL
SPARQL
(queries)
Rules
RDF Schema
Signature
Proof
RDF
XML + Namespaces
URI
Encryption
Attribution
Explanation
Ontologies +
Inference
Metadata
Standard syntax
Unicode
Identity
S-OGSA Model
Semantic Grid
Diagram labels: Grid Entity; Grid Service; Grid Resource; Semantic Entity; Knowledge Entity; Knowledge Service; Knowledge Resource; Knowledge; Semantic Binding; Semantic Provisioning Service; Semantic Binding Provisioning Service; Semantic aware Grid Service; Annotation Tool/Service; Ontology Service; Reasoning Service; Metadata Store/Service; VO Manager; Intelligent Monitoring; File mgt; Policy; Ontology; Rule set; Satellite Image File; JSDL file
Relationships: Is-a; produce; consume (multiplicities 1..m and 0..m)
Carole Goble
What’s Web2.0 ?
“Web 2.0, a phrase coined
by O'Reilly Media in 2004,
refers to a supposed
second-generation of
Internet-based services
such as social networking
sites, wikis, communication
tools, and folksonomies
that let people collaborate
and share information
online in previously
unavailable ways.” Wikipedia
Pamela
Fox
So what’s a mashup
anyway?
“A mashup is a website or application that
combines content from more than one
source into an integrated experience.
Content used in mashups is typically
sourced from a third party via a public
interface or API. Other methods of sourcing
content for mashups include Web feeds (e.g.
RSS or Atom) and JavaScript.” – Wikipedia
A mashup is the ultimate user-generated
content: user likes data source A, data
source B, & puts them together how they
like.
* There are also music & video mashups
Pamela
Fox
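The mashup definition above (content from more than one source, combined into one view) can be sketched with the Python standard library. Both sources are made up for the example: `SAMPLE_RSS` stands in for a Web feed and `CITY_COORDS` stands in for what a mapping API might return, with approximate coordinates. No real service is called.

```python
import xml.etree.ElementTree as ET

# Source A: a hypothetical RSS feed, inlined so the sketch is self-contained.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Hypothetical e-Science events</title>
<item><title>All Hands Meeting</title><category>Nottingham</category></item>
<item><title>Workflow workshop</title><category>Edinburgh</category></item>
<item><title>Data curation tutorial</title><category>Bath</category></item>
</channel></rss>"""

# Source B: a made-up lookup table standing in for a mapping service
# (approximate latitude, longitude; illustrative values only).
CITY_COORDS = {
    "Nottingham": (52.95, -1.15),
    "Edinburgh": (55.95, -3.19),
}


def parse_feed(rss_text):
    """Extract title and city (taken from <category>) for each feed item."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "city": item.findtext("category")}
        for item in root.iter("item")
    ]


def mashup(items, coords):
    """Join feed items with coordinates; skip items with no known location."""
    combined = []
    for item in items:
        location = coords.get(item["city"])
        if location is None:
            continue
        combined.append({**item, "lat": location[0], "lon": location[1]})
    return combined


if __name__ == "__main__":
    for entry in mashup(parse_feed(SAMPLE_RSS), CITY_COORDS):
        print(entry)
```

The join key (here the city name) is the design decision that matters: a mashup works only where the two sources share some field that can be matched.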
Amazon Web Services
Web 2.0 APIs
http://www.programmableweb.com/apis
currently (Jan 10 2007) 356 Web 2.0 APIs with
GoogleMaps the most used in Mashups
This site acts as a “UDDI” for Web 2.0
Geoffrey Fox
Take Home
UK e-Science investment built 3
interdependent strengths:
Communities & collaboration
Projects delivering & demanding
e-Infrastructure: organisation, support &
technology
Three success factors for projects
Engagement & value for all participants
Creativity & insight addressing a well-posed
challenge
Technology adoption and innovation
Research, design or diagnosis is the driver
Integrate whatever technology you need
Invent new technology only if you have to