Research at the National e-Science Centre 6 November 2003

Research at the
National e-Science Centre
Dr. Dave Berry
Research Manager
www.nesc.ac.uk
6th November 2003
Three Pillars of e-Science Research
Foundations
- Edinburgh: Informatics; Physics & Astronomy
- Glasgow: Computing Science; Physics & Astronomy
Technology
- EPCC
- ETF & Testbeds
- edikt
- Repositories
- Computing Industry
Applications
- Research Departments
- Research Institutes
- Other Universities
- Commercial Customers
Links between the pillars: apply known results; focus for new work; enable new science; steering of development
Information Grids
Foundations
- Publishing Scientific Data (Peter Buneman's group)
Technology
- OGSA-DAI/DAIT (1,000th download, Sep 2003)
- edikt – eldas and BinX
Applications
- GridPP and ScotGrid (Tony Doyle & Steve Playfer)
- ODD-Genes
- AstroGrid
- BRIDGES
- FirstDIG
- QCDGrid (Richard Kenway)
- Biological SpatioTemporal Databases (Richard Baldock)
Computation Grids
Foundations
- Enhance (Murray Cole)
Technology
- SunDCG (> 3000 downloads)
Applications
- GridPP and ScotGrid
- ODD-Genes
- PGPGrid (Paul Cockshott)
- RealityGrid
Fabrics and Platforms
Foundations
- AMUSE (Joe Sventek)
- Dynamic Configuration of Grid Fabrics
- Dependable Grid Services (Stuart Anderson)
Technology
- MS.NETGrid
- GridWeaver (LCFG + SmartFrog)
Applications
- OGSA Test Grid
- IBM Grid Evaluation
More foundations
Service Composition
- Deductive Synthesis Techniques … (Alan Bundy)
- Inferring QoS Properties for Grid Applications
Mobile Code
- Mobile Resource Guarantees (Don Sannella, Stephen Gilmore)
IRCs
- CoAKTinG (Austin Tate)
- EQUATOR (Matthew Chalmers)
Security
- Technologies for Information Environment Security
More applications
Physics
- CDF Grid Development
NeuroInformatics
- Grid-enabled Modelling Tools and Databases for Neuroinformatics (David Willshaw)
BioInformatics
- e-Diamond (mammography) (Rob Procter)
http://www.nesc.ac.uk/projects/
Data Repositories
Medical Genetics
- Generation Scotland
Human Genetics Unit
- Mouse Atlas
- Nuclear Protein Database
Roslin Institute
- ArkDB
Informatics
- EUSTACE Corpus
- FlyTrap
GeoSciences
- Antarctic Survey data
- Continental seismic survey data
- BGS offshore survey
Example: ODD-Genes
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable e-Science,
accelerating scientific discovery
SunDCG’s TOG software allows for job submission on remote
compute resources
OGSA-DAI provides access, control and discovery of data resources
ODD-Genes used to investigate Wilms Tumour
- Routine statistical conditioning of microarray results
- Data-driven discovery of novel targets for investigation and potential therapy
"This project has demonstrated how Grid technologies can be used to enable true e-Science - discoveries that would not otherwise have been achieved without this infrastructure in place."
Professor Peter Ghazal, Director, GTI.
Collaborative project
- NeSC/EPCC
- Scottish Centre for Genomic Technology and Informatics (GTI)
- Human Genetics Unit at MRC, Western General Hospital (HGU)
SunDCG – Enabling Routine Statistical
Conditioning
Choose analysis to perform
Automates analysis process
Provides predetermined
workflow
Can run more than one
analysis at a time
Multiple reproducible avenues
for investigation
Reduces cost (human,
machine), increases availability
TOG enables this by allowing
access to HPC resources
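The workflow idea above can be sketched in Python: a predetermined list of analysis steps is applied to each dataset, and several datasets are processed at once. This is a minimal sketch under stated assumptions, not TOG or Grid Engine code; the step functions (a z-score normalisation and a hypothetical outlier flag) are illustrative stand-ins for real microarray conditioning, and local threads stand in for remote HPC resources.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev


def normalise(values):
    """Illustrative conditioning step: convert values to z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(zscores, threshold=2.0):
    """Hypothetical follow-up step: indices whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(zscores) if abs(z) > threshold]


# The predetermined workflow: steps are always applied in this order.
WORKFLOW = (normalise, flag_outliers)


def run_workflow(values, steps=WORKFLOW):
    result = values
    for step in steps:
        result = step(result)
    return result


def run_many(datasets, max_workers=4):
    """Run the same workflow over several datasets concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_workflow, datasets))


print(run_many([[1, 1, 1, 1, 1, 1, 1, 1, 1, 10], [5, 6, 5, 6]]))  # [[9], []]
```

Because every dataset passes through the same fixed steps, each run can be repeated exactly, which is what makes the multiple avenues of investigation reproducible.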
SunDCG Compute Scheduler
[Figure: User A and User B, two Grid Engine installations, and Globus 2 linking them]
Integrates Grid Engine and Globus 2
GE execution methods provide job submission/control
GE job context stores job specific information
Globus GSI for security
Globus GRAM enables interaction with remote resource
GASS for small data transfer, GridFTP for large datasets
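The last bullet amounts to a size-based choice of transfer mechanism. A minimal sketch, assuming a hypothetical 10 MiB cutoff (the slide gives no threshold) and returning only the mechanism's name rather than calling any Globus API:

```python
# Assumed cutoff: the slide does not say where "small" ends and "large" begins.
SMALL_TRANSFER_LIMIT = 10 * 1024 * 1024  # 10 MiB (hypothetical)


def choose_transfer(size_bytes, limit=SMALL_TRANSFER_LIMIT):
    """Return the name of the transfer mechanism for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return "GASS" if size_bytes <= limit else "GridFTP"


for size in (4 * 1024, 2 * 1024**3):
    print(size, choose_transfer(size))
```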
OGSA-DAI - Results Investigation
Multiple views of data
Raw
Heat Map
Cluster Map
Wilms Tumour study
takes a new direction
two genes appear
significant in early
development
Researchers would like
more info on these
genes…
OGSA-DAI - Data Resource
Discovery
OGSA-DAI uses keywords to locate
relevant data resources
May return data resources previously
unknown to researcher
Researcher selects most interesting
data resource to query for information
about gene
Researcher selects Mouse Atlas –
narrow, deep database of spatial gene
expression in mouse embryonic
development
Contrast with GTI database of broad,
shallow genome-wide gene expression
across multiple organisms, stages &
conditions
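The slides do not say how OGSA-DAI matches keywords. The following Python sketch only illustrates the general idea of keyword matching against a registry, ranking resources by how many query keywords they share. The `DataResource` class, the `discover` function and the registry contents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    name: str
    keywords: set = field(default_factory=set)


def discover(registry, query_keywords):
    """Return names of resources sharing at least one keyword, best match first."""
    wanted = {k.lower() for k in query_keywords}
    scored = []
    for resource in registry:
        hits = len(wanted & {k.lower() for k in resource.keywords})
        if hits:
            scored.append((hits, resource.name))
    # More shared keywords ranks higher; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical registry entries; the keywords are invented for illustration.
registry = [
    DataResource("Mouse Atlas", {"gene", "expression", "embryo", "spatial"}),
    DataResource("GTI database", {"gene", "expression", "microarray"}),
    DataResource("Antarctic Survey data", {"geology", "antarctic"}),
]

print(discover(registry, ["gene", "expression", "embryo"]))
```

A resource the researcher did not know about still surfaces as long as its keywords overlap with the query, which is the point the slide makes.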
OGSA-DAI - Data Resource Query
OGSA-DAI returns data from
query
Data and annotation displayed
Data contains references to
related images
Researcher rapidly moves from
numeric and textual description
to spatial representation of
relevant gene expression
These show that the genes
are stem cell markers
Targets for focussed
investigation, potential therapy
Data Access & Integration Services
1a. Client sends a request to the Registry for sources of data about “x”
1b. Registry responds with a Factory handle
2a. Client requests access to a database from the Factory
2b. Factory creates a GridDataService (GDS) to manage access
2c. Factory returns the handle of the GDS to the client
3a. Client queries the GDS with XPath, SQL, etc.
3b. GDS interacts with the database (XML / relational)
3c. Results of the query are returned to the client as XML
Diagram legend: SOAP/HTTP; service creation; API interactions
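The numbered interaction above can be sketched in Python, with plain classes standing in for the Registry, Factory and GridDataService roles and SQLite standing in for the relational database. This is a minimal in-process model of the pattern only: it uses no SOAP/HTTP, and none of the class or method names come from the OGSA-DAI API.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(columns, rows):
    """Encode result rows as a small XML document (step 3c)."""
    root = ET.Element("results")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


class GridDataService:
    """Stand-in for a GDS: manages access to one database (step 2b)."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql, params=()):
        # Steps 3a/3b: run the client's query against the database.
        cursor = self._connection.execute(sql, params)
        columns = [description[0] for description in cursor.description]
        return rows_to_xml(columns, cursor.fetchall())


class Factory:
    """Creates a GDS on request and hands back its handle (steps 2a-2c)."""

    def __init__(self, connect):
        self._connect = connect  # callable returning a database connection

    def create_gds(self):
        return GridDataService(self._connect())


class Registry:
    """Maps a topic to the factories that can serve it (steps 1a/1b)."""

    def __init__(self):
        self._entries = {}

    def register(self, topic, factory):
        self._entries.setdefault(topic, []).append(factory)

    def lookup(self, topic):
        return list(self._entries.get(topic, []))


# Hypothetical data: one in-memory table shared by every GDS the factory makes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE genes (name TEXT, stage TEXT)")
db.execute("INSERT INTO genes VALUES ('gene1', 'embryo')")

registry = Registry()
registry.register("x", Factory(lambda: db))
gds = registry.lookup("x")[0].create_gds()
print(gds.query("SELECT name, stage FROM genes"))
# <results><row><name>gene1</name><stage>embryo</stage></row></results>
```

Separating the Factory from the GDS lets each client get its own service instance for managing access, while the Registry only needs to know which factories exist.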
Example: Mobile Resource Guarantees
The MRG technology consists of programming
languages; type systems for the languages; logics for
expressing statements of resource consumption; and
proof technology for proving these statements.
Camelot, a high-level functional programming language with
objects and resource control;
Grail, a strongly-typed intermediate language which is the target
language of the Camelot compiler and is interconvertible with
Java byte code;
A cost model, a formal semantics for byte code execution which
tracks execution time and space allocation;
A byte code logic allowing the expression of costs, embedded in
a generic proof system (Isabelle).
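The idea of a cost model that tracks execution time and space can be illustrated on a toy stack machine. The sketch below is hypothetical and far simpler than Grail or the MRG byte code logic: it counts one time unit per instruction and one space unit per allocated heap cell, and compares the measured cost against a bound computed ahead of time for straight-line code. In MRG such bounds are proved in Isabelle, not computed by a function like `static_bound`.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    time: int = 0   # instructions executed
    space: int = 0  # heap cells allocated


def execute(program):
    """Run (opcode, arg) pairs on a toy stack machine, recording their cost."""
    stack, heap, cost = [], [], Cost()
    for op, arg in program:
        cost.time += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "new":
            stack.append(len(heap))   # reference to the first new cell
            heap.extend([0] * arg)    # allocate `arg` zeroed cells
            cost.space += arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack, cost


def static_bound(program):
    """Bound for straight-line code: one time unit per instruction,
    plus every cell any `new` instruction could allocate."""
    return Cost(time=len(program),
                space=sum(arg for op, arg in program if op == "new"))


program = [("push", 2), ("push", 3), ("add", None), ("new", 4)]
stack, used = execute(program)
bound = static_bound(program)
print(stack, used, bound)
assert used.time <= bound.time and used.space <= bound.space
```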
Resource-bounded mobile code
Relevance to Grids
Grid service providers need to schedule
competing requests for access to resources.
With 25 KB of code and 1 PB of sky survey data it
is infeasible to ship the data to the code; the code must travel to the data instead.
There are projects which have supported
scientific programming in functional languages
(e.g. Psicho).
An alternative would be to transfer the
MRG technology to Java or Java-like languages
(ESC/Java, SpecialJ, and Pizza).
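One way certified resource bounds could help a provider schedule competing requests is simple admission control: a request is accepted only if its stated bound fits the capacity still available. The greedy first-come policy, resource names and numbers below are hypothetical and are not part of MRG; the sketch assumes the bounds have already been checked.

```python
def admit(requests, capacity):
    """Accept requests in arrival order while their declared bounds still fit.

    requests: list of (name, {resource: amount}) pairs
    capacity: {resource: amount available}
    """
    remaining = dict(capacity)
    admitted = []
    for name, bound in requests:
        fits = all(amount <= remaining.get(resource, 0)
                   for resource, amount in bound.items())
        if fits:
            for resource, amount in bound.items():
                remaining[resource] = remaining.get(resource, 0) - amount
            admitted.append(name)
    return admitted, remaining


# Hypothetical requests with certified CPU-seconds and memory bounds.
requests = [
    ("q1", {"cpu_s": 60, "mem_mb": 256}),
    ("q2", {"cpu_s": 50, "mem_mb": 128}),
    ("q3", {"cpu_s": 30, "mem_mb": 200}),
]
print(admit(requests, {"cpu_s": 100, "mem_mb": 512}))
```

Here q2 is turned away because its CPU bound no longer fits after q1, while the smaller q3 still does; without trustworthy bounds the provider could only find this out by running the code.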
Example: AMUSE
Autonomic Management of Ubiquitous Systems
for e-Health
Automated management of complex distributed
application systems
Architectural pattern and prototype
implementations for closed-loop management of
such systems
Policy-based management
AMUSE will integrate these to address
automated management of e-Health applications
Closed-loop Management Pattern (Self-Managed Cell)
[Figure: Management (Measurement; Analysis, Simulation, Optimization; Provisioning; Policy Management) and the Application linked by an Event Bus, with Raw Measurement, Measurement Adapters, Trends & Prediction, Service Goals, System Policy, and the “System” Configuration (Topology, Other) of the “System” Under Test]
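The closed-loop pattern can be sketched as a single loop in Python: measurement adapters put events on a bus, an analysis step derives a trend, and policies decide what provisioning action to take on the managed configuration. This is a hypothetical minimal model, not AMUSE code; the latency metric, moving-average window, thresholds and replica-count configuration are all assumptions.

```python
from collections import defaultdict, deque


class SelfManagedCell:
    """Minimal closed loop: measure -> event bus -> analyse -> policy -> provision."""

    def __init__(self, policies, window=3):
        self.bus = deque()          # stand-in for the event bus
        self.policies = policies    # (condition, action) pairs
        self.windows = defaultdict(lambda: deque(maxlen=window))
        self.config = {"replicas": 1}  # configuration of the managed "system"

    def measure(self, metric, value):
        """Measurement adapter: turn a raw measurement into an event."""
        self.bus.append((metric, value))

    def step(self):
        """Process every pending event once."""
        while self.bus:
            metric, value = self.bus.popleft()
            recent = self.windows[metric]
            recent.append(value)
            trend = sum(recent) / len(recent)  # moving average as the "trend"
            for condition, action in self.policies:
                if condition(metric, trend, self.config):
                    action(self.config)  # provisioning changes the configuration


def scale_up(config):
    config["replicas"] += 1


def scale_down(config):
    config["replicas"] -= 1


# Hypothetical service goal: keep average latency between 50 and 200 ms.
POLICIES = [
    (lambda m, trend, cfg: m == "latency_ms" and trend > 200, scale_up),
    (lambda m, trend, cfg: m == "latency_ms" and trend < 50 and cfg["replicas"] > 1,
     scale_down),
]

cell = SelfManagedCell(POLICIES)
for latency in (250, 300):
    cell.measure("latency_ms", latency)
cell.step()
print(cell.config)  # {'replicas': 3}
```

Keeping policies as data, separate from the loop, mirrors the slide's split between the measurement and provisioning machinery and the Service Goals and System Policy that drive it.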
Two-level nesting
[Figure: the Self-Managed Cell nested across application levels n, n-1 and n-2; level n has its own Measurement; Analysis, Simulation, Optimization; Provisioning; Policy Management; Event Bus and Measurement Adapter, and level n-1 has Meas, Infer, Prov, Event Bus, Policy Agents and “System” Config, with the “System” Configuration (Topology, Other) beneath]
GGF: Standardisation
Grid Research Oversight Committee & Programme Committee
- Prof. Malcolm Atkinson
Data Access and Integration Services Working Group
- Dr Mario Antonioletti (Group Secretary & Editor), Dr Amy Krause (Editor)
- Prof. Malcolm Atkinson, Dr Martin Westhead, Neil Chue Hong (Authors)
- Dr Mike Jackson
Data Format Definition Language Working Group
- Dr Martin Westhead (Founder and Chair)
Job Submission Definition Language Working Group
- Dr Ali Anjomshoaa (Founder and Chair)
Open Grid Services Architecture Working Group
- Dr Dave Berry
Open Grid Services Infrastructure Working Group
- Dr Mike Jackson, Daragh Byrne
Data Services
GGF Data Access & Integration Services (DAIS)
OGSI-compliant interfaces to access relational and
XML databases
Needs to be generalized to encompass other data
sources (see next slide…)
Generalized DAIS becomes the foundation for:
Replication: Data located in multiple locations
Federation: Composition of multiple sources
Provenance: How was data generated?
Future DAI Services
1a. Request to the Data Registry for sources of data about “x” & “y”
1b. Registry responds with a Factory handle
2a. Request to the Factory for access and integration from resources Sx and Sy
2b. Factory creates a network of GridDataServices
2c. Factory returns the handle of the GDS to the client
3a. Client submits a sequence of scripts, each with a set of queries, to the GDS using XPath, SQL, etc.
3b. Client tells the analyst
3c. Sequences of result sets are returned to the analyst as formatted binary described in a standard XML notation
[Figure components: Data Registry; Data Access & Integration master; semantic metadata; GDS and GDTS nodes (GDS1, GDS2, GDS3, GDTS1, GDTS2); source Sx (XML database); source Sy (relational database); a “scientific” analyst with a Problem Solving Environment, Client Application and Application Code, coding scientific insights. Legend: SOAP/HTTP; service creation; API interactions]
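The integration in step 2a, drawing on Sx and Sy at once, can be sketched as a federated query: one query against an XML source, one against a relational source, joined on a shared key. This Python sketch is hypothetical throughout: the document, table, column names and values are invented, the join happens in the client process rather than in a network of GridDataServices, and results are plain tuples rather than the XML-described binary format in step 3c.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source Sx: an XML document of genes and developmental stages.
SX_XML = """<genes>
  <gene id="g1"><stage>embryo</stage></gene>
  <gene id="g2"><stage>adult</stage></gene>
</genes>"""


def query_sx(document, path):
    """Query the XML source; ElementTree supports only a subset of XPath."""
    root = ET.fromstring(document)
    return {gene.get("id"): gene.findtext("stage") for gene in root.findall(path)}


def query_sy(connection, sql):
    """Query the relational source; expects two-column (key, value) rows."""
    return dict(connection.execute(sql).fetchall())


def integrate(document, connection):
    """Join the two result sets on the shared gene identifier."""
    stages = query_sx(document, "./gene")
    levels = query_sy(connection, "SELECT gene_id, level FROM expression")
    shared = stages.keys() & levels.keys()
    return sorted((gene_id, stages[gene_id], levels[gene_id]) for gene_id in shared)


# Hypothetical source Sy: a relational table of expression levels.
sy = sqlite3.connect(":memory:")
sy.execute("CREATE TABLE expression (gene_id TEXT, level REAL)")
sy.executemany("INSERT INTO expression VALUES (?, ?)", [("g1", 2.5), ("g3", 0.7)])

print(integrate(SX_XML, sy))  # [('g1', 'embryo', 2.5)]
```

Only genes present in both sources survive the join; a generalised DAIS would push this composition into the service layer (federation) instead of leaving it to each client.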
Take Home Message
In addition to our national services, NeSC has a thriving
research programme
Foundation departments
Technology development (EPCC, NeSC, Globus Alliance)
Research scientists
There are many opportunities for collaboration
Wide breadth of interest
Particular focus on scientific data
OGSA-DAI is here now
Join in making better DAI services & standards
Bioinformatics and Astronomy are Priority Application Areas