Research at the National e-Science Centre 6 November 2003

Research at the
National e-Science Centre
Dr. Dave Berry
Research Manager
www.nesc.ac.uk
6th November 2003
Three Pillars of e-Science Research
Foundations
Edinburgh:
- Informatics
- Physics & Astronomy
Glasgow:
- Computing Science
- Physics & Astronomy
Technology
- EPCC
- ETF & Testbeds
- edikt
- Repositories
- Computing Industry
Applications
- Research Departments
- Research Institutes
- Other Universities
- Commercial Customers
[Diagram: arrows between the pillars labelled "Apply known results", "Enable new science", "Focus for new work" and "Steering of development"]
Information Grids
[Diagram: Foundations, Technology and Applications pillars, linked by "Apply known results", "Enable new science", "Focus for new work" and "Steering of development"]
Projects and people shown:
- Peter Buneman’s Publishing Scientific Data Group
- GridPP, ScotGrid
- Tony Doyle & Steve Playfer
- OGSA-DAI/DAIT (1,000th download, Sep 2003)
- AstroGrid
- edikt: eldas and BinX
- QCDGrid
- Richard Kenway
- ODD-Genes
- BRIDGES
- FirstDIG
- Richard Baldock
- Biological SpatioTemporal Databases
Computation Grids
[Diagram: Foundations, Technology and Applications pillars, linked by "Apply known results", "Enable new science", "Focus for new work" and "Steering of development"]
Projects and people shown:
- GridPP, ScotGrid
- SunDCG (> 3000 doc downloads)
- ODD-Genes
- PGPGrid
- RealityGrid
- Enhance
- Murray Cole
- Paul Cockshott
Fabrics and Platforms
[Diagram: Foundations, Technology and Applications pillars, linked by "Apply known results", "Enable new science", "Focus for new work" and "Steering of development"]
Projects and people shown:
- Joe Sventek
- AMUSE
- Dynamic Configuration of Grid Fabrics
- Dependable Grid Services
- MS.NETGrid
- GridWeaver
- OGSA Test Grid
- IBM Grid Evaluation
- Stuart Anderson
- LCFG + SmartFrog
More foundations
Service Composition
- Alan Bundy
- Deductive Synthesis Techniques …
- Inferring QoS Properties for Grid Applications
Mobile Code
- Don Sannella, Stephen Gilmore
- Mobile Resource Guarantees
IRCs
- CoAKTinG (Austin Tate)
- EQUATOR (Matthew Chalmers)
Security
- Technologies for Information Environment Security
More applications
Physics
CDF Grid Development
NeuroInformatics
David Wilshaw
Grid-enabled Modelling Tools and Databases for Neuroinformatics
BioInformatics
e-Diamond (mammography)
Rob Procter
http://www.nesc.ac.uk/projects/
Data Repositories
Medical Genetics
Generation Scotland
Human Genetics Unit
Mouse Atlas
Nuclear Protein Database
Roslin Institute
ArkDB,
Informatics
EUSTACE Corpus
FlyTrap
GeoSciences
Antarctic Survey data
Continental seismic survey data
BGS offshore survey
Example: ODD-Genes
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
SunDCG’s TOG software allows for job submission on remote compute resources
OGSA-DAI provides access, control and discovery of data resources
ODD-Genes used to investigate Wilms Tumour
Routine statistical conditioning of microarray results
Data-driven discovery of novel targets for investigation and potential therapy
Collaborative project
- NeSC/EPCC
- Scottish Centre for Genomic Technology and Informatics (GTI)
- Human Genetics Unit at MRC, Western General Hospital (HGU)
"This project has demonstrated how Grid technologies can be used to enable true e-Science - discoveries that would not otherwise have been achieved without this infrastructure in place."
Professor Peter Ghazal, Director, GTI.
SunDCG – Enabling Routine Statistical
Conditioning
Choose analysis to perform
Automates analysis process
Provides predetermined
workflow
Can run more than one
analysis at a time
Multiple reproducible avenues
for investigation
Reduces cost (human,
machine), increases availability
TOG enables this by allowing
access to HPC resources
SunDCG Compute Scheduler
[Diagram: User A and User B, two Grid Engine installations and Globus 2, with jobs labelled a to h]
Integrates Grid Engine and Globus 2
GE execution methods provide job submission/control
GE job context stores job specific information
Globus GSI for security
Globus GRAM enables interaction with remote resource
GASS for small data transfer, GridFTP for large datasets
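The last point, choosing a transfer mechanism by data size, can be sketched in a few lines. This is only an illustration: the slides do not say where "small" ends, so the 10 MiB cut-off and the function name `choose_transfer` are assumptions, and the function merely returns a label rather than calling any Globus tool.

```python
# Hypothetical cut-off: the slides give no figure for what counts as "small".
SMALL_TRANSFER_LIMIT_BYTES = 10 * 1024 * 1024


def choose_transfer(size_bytes, limit=SMALL_TRANSFER_LIMIT_BYTES):
    """Return the name of the mechanism the slide associates with this size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Small payloads go via GASS, large datasets via GridFTP.
    return "GASS" if size_bytes <= limit else "GridFTP"


small_choice = choose_transfer(4 * 1024)      # a small parameter file
large_choice = choose_transfer(50 * 1024**3)  # a large microarray archive
```

A real scheduler would also weigh link quality and staging costs; the single threshold here is the simplest rule consistent with the slide.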
OGSA-DAI - Results Investigation
Multiple views of data
Raw
Heat Map
Cluster Map
Wilms Tumour study
takes a new direction
two genes appear
significant in early
development
Researchers would like
more info on these
genes…
OGSA-DAI - Data Resource
Discovery
OGSA-DAI uses keywords to locate
relevant data resources
May return data resources previously
unknown to researcher
Researcher selects most interesting
data resource to query for information
about gene
Researcher selects Mouse Atlas –
narrow, deep database of spatial gene
expression in mouse embryonic
development
Contrast with GTI database of broad,
shallow genome-wide gene expression
across multiple organisms, stages &
conditions
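Keyword-based discovery of this kind can be sketched as a lookup over a small in-memory registry. The sketch is not OGSA-DAI code: the `DataResource` class, the `discover` function and the keyword sets attached to each resource are all made up for illustration, and ranking by keyword overlap is an assumed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    name: str
    keywords: frozenset  # lower-case descriptive keywords (illustrative only)


def discover(registry, query_keywords):
    """Return names of resources sharing a keyword with the query, best match first."""
    wanted = {k.lower() for k in query_keywords}
    scored = []
    for resource in registry:
        overlap = len(wanted & resource.keywords)
        if overlap:
            scored.append((overlap, resource.name))
    # More shared keywords first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


REGISTRY = [
    DataResource("Mouse Atlas", frozenset({"mouse", "embryo", "spatial", "gene expression"})),
    DataResource("GTI database", frozenset({"genome-wide", "gene expression", "multi-organism"})),
    DataResource("Seismic survey", frozenset({"geoscience", "seismic"})),
]

matches = discover(REGISTRY, ["gene expression", "spatial"])
```

The researcher would then pick one of the returned resources to query, which may include resources they had not known about.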
OGSA-DAI - Data Resource Query
OGSA-DAI returns data from
query
Data and annotation displayed
Data contains references to
related images
Researcher rapidly moves from
numeric and textual description
to spatial representation of
relevant gene expression
These show that the genes
are stem cell markers
Targets for focussed
investigation, potential therapy
Data Access & Integration Services
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
[Diagram: Client, Registry, Factory, Grid Data Service and XML / Relational database; legend: SOAP/HTTP, service creation, API interactions]
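The registry, factory and service steps above can be sketched as three small in-process classes. This is not the OGSA-DAI API: the class and method names are assumptions, an in-memory SQLite table stands in for the database, plain method calls replace SOAP/HTTP, and the table contents are placeholders.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Manages access to one database and answers queries as XML (steps 3a-3c)."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql, params=()):
        cursor = self._conn.execute(sql, params)  # 3b. GDS interacts with database
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(root, encoding="unicode")  # 3c. results as XML


class Factory:
    def __init__(self, connection):
        self._conn = connection

    def create_gds(self):
        # 2b. create a service to manage access; 2c. hand it back to the client.
        return GridDataService(self._conn)


class Registry:
    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        return self._factories[topic]  # 1b. respond with a factory handle


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT, note TEXT)")
conn.execute("INSERT INTO genes VALUES ('geneA', 'placeholder annotation')")
registry = Registry()
registry.register("genes", Factory(conn))

factory = registry.find("genes")  # 1a / 1b
gds = factory.create_gds()        # 2a - 2c
xml_result = gds.perform("SELECT symbol, note FROM genes WHERE symbol = ?", ("geneA",))
```

The client never touches the database connection directly; it only holds the handle that the factory returned.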
Example: Mobile Resource Guarantees
The MRG technology consists of programming
languages; type systems for the languages; logics for
expressing statements of resource consumption; and
proof technology for proving these statements.
Camelot, a high-level functional programming language with
objects and resource control;
Grail, a strongly-typed intermediate language which is the target
language of the Camelot compiler and is interconvertible with
Java byte code;
A cost model, a formal semantics for byte code execution which
tracks execution time and space allocation;
A byte code logic allowing the expression of costs, embedded in
a generic proof system (Isabelle).
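A toy interpreter gives a feel for a cost model that tracks time and space. It is not Grail or the MRG cost model: the three-instruction stack machine, the unit cost per instruction and the one-cell cost per `cons` are all assumptions. It also measures costs by running the program, whereas MRG aims to prove such bounds before the code runs.

```python
def run_with_costs(program):
    """Execute (opcode, argument) pairs; return (result, time_steps, heap_cells)."""
    stack = []
    time_steps = 0
    heap_cells = 0
    for opcode, argument in program:
        time_steps += 1  # assumed: every instruction costs one time unit
        if opcode == "push":
            stack.append(argument)
        elif opcode == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "cons":
            tail, head = stack.pop(), stack.pop()
            stack.append((head, tail))
            heap_cells += 1  # assumed: one heap cell per list cell allocated
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack[-1], time_steps, heap_cells


def within_bounds(program, max_time, max_space):
    """Check a claimed resource bound by running the program (not by proof)."""
    _, time_steps, heap_cells = run_with_costs(program)
    return time_steps <= max_time and heap_cells <= max_space


PROGRAM = [("push", 1), ("push", 2), ("add", None), ("push", None), ("cons", None)]
result, time_used, space_used = run_with_costs(PROGRAM)
```

A host could use a check like `within_bounds` to refuse code that exceeds its budget; a proof-carrying approach would let it accept the code without running it first.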
Resource-bounded mobile code
Relevance to Grids
Grid service providers need to schedule
competing requests for access to resources.
With 25Kb of code and 1Pb of sky survey data it
is infeasible to ship the data to the code.
There are projects which have supported
scientific programming in functional languages
(e.g. Psicho).
An alternative would be to transfer the
MRG technology to Java or Java-like languages
(ESC/Java, SpecialJ, and Pizza).
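The claim that shipping the data to the code is infeasible can be checked with rough arithmetic. The sketch assumes that "25Kb" and "1Pb" mean 25 kilobytes and 1 petabyte in decimal units, and that a dedicated 1 Gbit/s link is available; none of these figures come from the slides.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time: no protocol overhead, no contention."""
    return size_bytes * 8 / link_bits_per_second


LINK_BITS_PER_SECOND = 1e9  # assumed 1 Gbit/s link

code_seconds = transfer_seconds(25e3, LINK_BITS_PER_SECOND)  # 25 kB of code
data_seconds = transfer_seconds(1e15, LINK_BITS_PER_SECOND)  # 1 PB of survey data
data_days = data_seconds / 86400
```

Under these assumptions the code moves in a fraction of a millisecond while the data would take about three months, which is why moving the code to the data, with guarantees about its resource use, is the attractive direction.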
Example: AMUSE
Autonomic Management of Ubiquitous Systems
for e-Health
Automated management of complex distributed
application systems
Architectural pattern and prototype
implementations for closed-loop management of
such systems
Policy-based management
AMUSE will integrate these to address
automated management of e-Health applications
Closed-loop Management Pattern (Self-Managed Cell)
[Diagram components: Management; Application Measurement; Raw Measurement; Analysis, Simulation, Optimization; Provisioning; Trends & Prediction; Event Bus; Policy Management; Measurement Adapters; Service Goals; System Policy; “System” Configuration; Topology, Other; “System” Under Test]
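The closed-loop pattern can be sketched as handlers wired to an event bus: a measurement is published, an analysis step compares it with a policy, and a provisioning step changes the system. This is an illustration rather than AMUSE code; the topic names, the `ManagedSystem` class and the load-per-worker policy are assumptions.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus connecting the management components."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._subscribers[topic]):
            handler(payload)


class ManagedSystem:
    """Stand-in for the system under management."""

    def __init__(self, workers=1):
        self.workers = workers


def wire_cell(bus, system, max_load_per_worker):
    def analyse(measurement):
        # Analysis: compare the raw measurement against the policy.
        if measurement["load"] / system.workers > max_load_per_worker:
            bus.publish("policy.violation", {"load": measurement["load"]})

    def provision(event):
        # Provisioning: add workers until the policy holds again.
        needed = -(-event["load"] // max_load_per_worker)  # ceiling division
        system.workers = max(system.workers, int(needed))

    bus.subscribe("measurement.raw", analyse)
    bus.subscribe("policy.violation", provision)


bus = EventBus()
system = ManagedSystem(workers=1)
wire_cell(bus, system, max_load_per_worker=10)
bus.publish("measurement.raw", {"load": 35})  # a measurement adapter would emit this
```

Because every component talks only to the bus, a second cell could subscribe to the same events, which is one way to read the nesting shown on the following slide.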
Two-level nesting
[Diagram: the closed-loop management pattern nested across Level n, Level n-1 and Level n-2. Level n shows Management; Application Measurement; Raw Measurement; Analysis, Simulation, Optimization; Provisioning; Trends & Prediction; Event Bus; Measurement Adapter; Policy Management; Service Goals; System Policy. The nested level shows Meas, Infer, Prov, Event Bus, Policy Agents and “System” Config, above “System” Configuration; Topology, Other]
GGF: Standardisation
Grid Research Oversight Committee & Programme Committee
Prof. Malcolm Atkinson
Data Access and Integration Services Working Group
Dr Mario Antonioletti (Group Secretary & Editor), Dr Amy Krause
(Editor)
Prof. Malcolm Atkinson, Dr Martin Westhead, Neil Chue Hong (Authors)
Dr. Mike Jackson
Data Format Definition Language Working Group
Dr Martin Westhead (Founder and Chair)
Job Submission Definition Language Working Group
Dr Ali Anjomshoaa (Founder and Chair)
Open Grid Services Architecture Working Group
Dr Dave Berry
Open Grid Services Infrastructure Working Group
Dr Mike Jackson, Daragh Byrne
Data Services
GGF Data Access & Integration Services (DAIS)
OGSI-compliant interfaces to access relational and
XML databases
Needs to be generalized to encompass other data
sources (see next slide…)
Generalized DAIS becomes the foundation for:
Replication: Data located in multiple locations
Federation: Composition of multiple sources
Provenance: How was data generated?
Future DAI Services
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates network of GridDataServices
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts each has a set of queries to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
[Diagram: Analyst; Client; Problem Solving Environment; “scientific” Application coding scientific insights; Application Code; Data Registry; Data Access & Integration master; Semantic Meta data; GDS, GDS1, GDS2, GDS3; GDTS, GDTS1, GDTS2; Sx (XML database); Sy (Relational database); legend: SOAP/HTTP, service creation, API interactions]
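Integration across an XML source Sx and a relational source Sy can be sketched as two queries whose results are joined on a shared key. The data, the `gene_id` join key and the function names are made up for illustration, and the sketch returns Python tuples rather than the formatted binary with XML description that the slide envisages.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sx: a stand-in XML source (contents are placeholders).
SX_XML = """<genes>
  <gene id="g1"><name>geneA</name></gene>
  <gene id="g2"><name>geneB</name></gene>
</genes>"""


def query_sx(xpath):
    """Run a simple path query against Sx; return {gene id: name}."""
    root = ET.fromstring(SX_XML)
    return {gene.get("id"): gene.findtext("name") for gene in root.findall(xpath)}


def make_sy():
    """Sy: a stand-in relational source held in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expression (gene_id TEXT, level REAL)")
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?)", [("g1", 0.5), ("g2", 2.0)]
    )
    return conn


def integrate(xpath, sql, conn):
    """Join Sx names with Sy rows on the shared gene id."""
    names = query_sx(xpath)
    rows = conn.execute(sql).fetchall()
    return [(names[gene_id], level) for gene_id, level in rows if gene_id in names]


combined = integrate(
    "./gene",
    "SELECT gene_id, level FROM expression ORDER BY gene_id",
    make_sy(),
)
```

In the architecture on the slide this join would be performed by a network of data services created by the factory, not by client code.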
Take Home Message
In addition to our national services, NeSC has a thriving
research programme
Foundation departments
Technology development (EPCC, NeSC, Globus Alliance)
Research scientists
There are many opportunities for collaboration
Wide breadth of interest
Particular focus on scientific data
OGSA-DAI is here now
Join in making better DAI services & standards
Bioinformatics and Astronomy are Priority Application
Areas
Infrastructure Architecture
[Diagram: layered stack, top to bottom]
Data Intensive Users
Data Intensive Applications for Science X
Simulation, Analysis & Integration Technology for Science X
Generic Virtual Data Access and Integration Layer
OGSA: Job Submission, Brokering, Registry, Banking, Data Transport, Workflow, Structured Data Integration, Authorisation, Resource Usage, Transformation, Structured Data Access
OGSI: Interface to Grid Infrastructure
Compute, Data & Storage Resources
Structured Data: Relational, Distributed, XML, Semi-structured
Virtual Integration Architecture
ODD-Genes Caveats & Further Work
ODD-Genes is a demonstrator
Need to develop production applications for both routine
statistical processing and data resource discovery and query
Need to parameterise routine conditioning appropriately to
complete automation
ODD-Genes requires GRID infrastructure
Participating researchers need to partner with centres that host
application front-ends (or host the infrastructure themselves)
However, alternatives are often proprietary, expensive and less flexible
ODD-Genes requires registration by data-hosts
Critical mass of registered data sources.
SunDCG - Conditioning Results
Results of conditioning can
be analysed and investigated
Researcher has potentially several
views of data to explore, all
presented simultaneously in
parallel (cf. the traditional serialised,
manual process)
Researcher can reproduce this
initial condition for repeated
analyses
Researcher need not perform each
step manually and serially, or ask
dedicated statistician to do so.
“OGSA Data Services”
Foster, Tuecke, Unger, editors
Describes conceptual model for representing all manner
of data sources as Web services
Database, filesystems, devices, programs, …
Integrates WS-Agreement
Data service is an OGSI-compliant Web service that
implements one or more of base data interfaces:
DataDescription, DataAccess, DataFactory,
DataManagement
These would be extended and combined for specific
domains (including DAIS)
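The idea of a data service implementing one or more base interfaces can be sketched with abstract base classes. Only the interface names DataDescription and DataAccess come from the slide; the method names `describe` and `select`, the `TableDataService` class and its data are hypothetical and do not reflect the actual portType definitions.

```python
from abc import ABC, abstractmethod


class DataDescription(ABC):
    @abstractmethod
    def describe(self):
        """Return metadata about the data held by the service (hypothetical method)."""


class DataAccess(ABC):
    @abstractmethod
    def select(self, column):
        """Return the values stored under one column (hypothetical method)."""


class TableDataService(DataDescription, DataAccess):
    """A service that implements two of the base interfaces over in-memory rows."""

    def __init__(self, name, rows):
        self._name = name
        self._rows = list(rows)

    def describe(self):
        columns = sorted({key for row in self._rows for key in row})
        return {"name": self._name, "columns": columns, "rows": len(self._rows)}

    def select(self, column):
        return [row[column] for row in self._rows if column in row]


service = TableDataService(
    "demo", [{"gene": "geneA", "level": 0.5}, {"gene": "geneB", "level": 2.0}]
)
description = service.describe()
genes = service.select("gene")
```

A domain-specific service would extend these interfaces in the same way, adding operations without changing how generic clients describe or read the data.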
OGSA-DAI Approach
Reuse existing technologies and standards
OGSA, Query languages, Java, transport
Build portTypes and services which will enable:
controlled exposure of heterogeneous data resources on an OGSI-compliant grid
access to these resources via common interfaces using existing underlying
query mechanisms
(ultimately) data integration across distributed data resources
OGSA-DAI (the software) seeks to be a reference implementation of
the GGF DAIS WG standard
Can’t keep up with frequent standard changes, so software releases track
specific drafts
See http://www.ogsadai.org.uk/ for details.