National e-Science Centre Local Developments

Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
ros@dcs.gla.ac.uk

Dr Dave Berry
Research Manager, National e-Science Centre
University of Edinburgh
daveb@nesc.ac.uk
5th February 2004
Overview
NeSC Role in UK e-Science
NeSC Edinburgh developments
– e-Science Institute
– Infrastructure/set-up
– Projects
– Plans
NeSC Glasgow developments
– Infrastructure/set-up
– Projects
– Plans
Conclusions
NeSC’s Role
Help coordinate and lead the UK e-Science Programme
– Community building activities, regional support & outreach
– Grid building as a member of the Engineering Task Force
– Skill building through training events & support centre
Help establish the UK’s international role
– International meetings, standardisation work & presentations
Undertake R&D projects
– To deliver reliable middleware
– To engage industry
– To stimulate the uptake of e-Science technology and methods
Run the e-Science Institute
– Knowledge building through workshops and conferences
– Research visitors and events
NeSC at Edinburgh: Recent Developments
Globus Alliance
Digital Curation Centre
– Edinburgh, Glasgow, UKOLN, CCLRC
New e-Science Lecturer (Particle Physics)
Training Team
– PPARC and EGEE funding
– Manager + 4 trainers
– Europe-wide role
DAI Two (Extension of OGSA-DAI)
OGSA Test Grid
Digital Curation Centre
[Diagram: DCC organisation. Core functions: services; research; development (testbeds & tools); community support & outreach; management & coordination. External links: communities of practice (users, curation organisations); Collaborative Associates Network of Data Organisations; research collaborators; industry; standards bodies.]
e-Science Institute
A meeting place
The focus for presenting UK e-Science
Visiting researchers
– Collaborate in our research and development
– Engage in and develop our event programme
– Build bridges with their community
– Visits last between one week and six months
Research-oriented event programme
– e-Science research topics
– Training for e-Science research teams
eSI Workshops
Space for real work
Crossing communities
Creativity: new strategies and solutions
Written reports
Scientific Data Mining, Integration and Visualisation
Grid Information Systems
Portals and Portlets
Virtual Observatory as a Data Grid
Imaging, Medical Analysis and Grid Environments
Open Issues in Grid Scheduling
Data Provenance & Annotation
e-Science Workflow Services
GeoSciences & Scottish Bioinformatics Forum
Suggestions always welcome!
http://www.nesc.ac.uk/events/
Projects
OGSA-DAI/DAIT, MS.NETGrid, SunDCG, GridWeaver, BRIDGES, PGPGrid, FirstDIG, ODD-Genes
EGEE, NextGrid
OGSA Test Grid, IBM Early Evaluation
edikt
Publishing Scientific Data
GridPP, AstroGrid, QCDGrid, RealityGrid Portal
Biological Spatio-Temporal Databases
CoAKTinG, Grid-enabled Modelling Tools and Databases for Neuroinformatics, e-Diamond
Dynamic Configuration of Grid Fabrics, Dependable Grid Services, Deductive Synthesis Techniques, Inferring QoS Properties for Grid Applications, Mobile Resource Guarantees
TIES, TIES-II
The Virtual Observatory
International Virtual Observatory Alliance
– UK, Australia, EU, China, Canada, Italy, Germany, Japan, Korea, US, Russia, France, India
How to integrate many multi-TB collections of heterogeneous data distributed globally?
Sociological and technological challenges to be met
Data Services
GGF Data Access and Integration Services (DAIS)
– OGSI-compliant interfaces to access relational and XML databases
– Needs to be generalized to encompass other data sources (see next slide…)
Generalized DAIS becomes the foundation for:
– Replication: Data located in multiple locations
– Federation: Composition of multiple sources
– Provenance: How was data generated?
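To make the federation and provenance bullets concrete, here is a minimal Python sketch that sends one query to several independent data sources and tags each returned row with the source it came from. In-memory SQLite databases stand in for distributed data resources, and the names `make_source`, `federated_query`, the `observations` table and the sample values are all hypothetical; this is not the DAIS interface or the OGSA-DAI API.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one remote data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (topic TEXT, value REAL)")
    conn.executemany("INSERT INTO observations VALUES (?, ?)", rows)
    return conn


def federated_query(sources, sql, params=()):
    """Federation: run the same query against every source and merge the rows.

    Provenance (in its simplest form): each row is extended with the name of
    the source that produced it, so a client can see where the data came from.
    """
    merged = []
    for source_name, conn in sources.items():
        for row in conn.execute(sql, params):
            merged.append(row + (source_name,))
    return merged


# Hypothetical sample data held at two sites.
sources = {
    "site_a": make_source([("x", 1.5), ("y", 2.0)]),
    "site_b": make_source([("x", 4.0), ("z", 3.0)]),
}
rows = federated_query(
    sources, "SELECT topic, value FROM observations WHERE topic = ?", ("x",)
)
print(rows)
```

A real federation layer would also have to reconcile differing schemas, units and identifiers across sources, which this sketch ignores.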
Data Access & Integration Services
[Diagram: Client, Registry, Factory and Grid Data Service (GDS) in front of an XML / relational database; legend: SOAP/HTTP, service creation, API interactions]
1a. Client sends a request to the Registry for sources of data about “x”
1b. Registry responds with a Factory handle
2a. Client sends a request to the Factory for access to the database
2b. Factory creates a GridDataService to manage access
2c. Factory returns the handle of the GDS to the client
3a. Client queries the GDS with XPath, SQL, etc.
3b. GDS interacts with the database
3c. Results of the query are returned to the client as XML
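The three-phase interaction above can be sketched in-process in Python. The class and method names (`Registry.lookup`, `Factory.create_gds`, `GridDataService.perform`) are hypothetical stand-ins for the roles in the diagram: in OGSA-DAI these are separate Grid services addressed by handles and reached over SOAP/HTTP, whereas here they are plain objects, and an in-memory SQLite database plays the part of the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Manages access to one database on behalf of a client (step 2b)."""

    def __init__(self, conn):
        self._conn = conn

    def perform(self, sql):
        """Run a query (3a/3b) and return the results as an XML string (3c)."""
        cursor = self._conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Creates a GridDataService for its database and hands it back (2a-2c)."""

    def __init__(self, conn):
        self._conn = conn

    def create_gds(self):
        return GridDataService(self._conn)


class Registry:
    """Maps a topic to the factories that can supply data about it (1a/1b)."""

    def __init__(self):
        self._entries = {}

    def register(self, topic, factory):
        self._entries.setdefault(topic, []).append(factory)

    def lookup(self, topic):
        return self._entries.get(topic, [])


# Hypothetical set-up: one relational database holding data about "x".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'x')")
registry = Registry()
registry.register("x", Factory(conn))

factory = registry.lookup("x")[0]  # 1a/1b: obtain a factory handle
gds = factory.create_gds()         # 2a-2c: obtain a GDS handle
print(gds.perform("SELECT id, label FROM samples"))  # 3a-3c
```

Separating the factory from the data service lets each client get its own service instance with its own state, which is the point of step 2b.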
edikt
[Diagram: the edikt project at the centre, linked to Standards, E-Science Apps, CS Research and Commercial SW components and skills; activities shown: requirements analysis, technology matchmaking, gap filling, rigorous engineering; output: Grid Services for e-Science Data Management.]
The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
SHEFC-funded research and development grant
3 years of funding: May 2002 – 2005
A further 3 years of funding subject to project success and review
ELDAS – Data Access Service
[Diagram: grid users (via a Grid Proxy) and web users (via a Web Servlet) access the ELDAS EJB Data Access Service (DAS), a Java framework that uses one Data Access Component (DAC) per database: Xindice, MySQL, DB2 and Oracle 9i. ELDAS runs anywhere and is suitable for grid & web.]
Implemented using Enterprise Java Beans
Data Access Components interface to distinct DBMSs
Accessible as a grid data service or a web data service
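The idea of one Data Access Component per DBMS behind a single service can be sketched with an abstract base class. This is a Python illustration under stated assumptions, not ELDAS itself (which is built on Enterprise Java Beans): `DataAccessComponent`, `SQLiteDAC`, `CsvDAC` and `DataAccessService` are hypothetical names, SQLite stands in for MySQL, DB2 or Oracle, and the CSV component shows how a flat file could be given the same query interface.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DataAccessComponent(ABC):
    """Common interface: each DAC hides the details of one kind of data store."""

    @abstractmethod
    def query(self, sql):
        """Run an SQL query and return the rows as a list of tuples."""


class SQLiteDAC(DataAccessComponent):
    """DAC for a relational database (SQLite standing in for MySQL, DB2, Oracle...)."""

    def __init__(self, conn):
        self._conn = conn

    def query(self, sql):
        return self._conn.execute(sql).fetchall()


class CsvDAC(DataAccessComponent):
    """DAC for a flat CSV file: loads it into an in-memory table called 'data'.

    All values are kept as text, exactly as read from the file.
    """

    def __init__(self, text):
        header, *body = list(csv.reader(io.StringIO(text)))
        self._conn = sqlite3.connect(":memory:")
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        self._conn.execute(f"CREATE TABLE data ({columns})")
        self._conn.executemany(f"INSERT INTO data VALUES ({placeholders})", body)

    def query(self, sql):
        return self._conn.execute(sql).fetchall()


class DataAccessService:
    """Front-end-neutral service; a grid proxy or a web servlet would both call it."""

    def __init__(self):
        self._dacs = {}

    def add_database(self, name, dac):
        self._dacs[name] = dac

    def query(self, database, sql):
        if database not in self._dacs:
            raise ValueError(f"unknown database: {database}")
        return self._dacs[database].query(sql)


# Usage with hypothetical sample data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE strains (name TEXT)")
db.execute("INSERT INTO strains VALUES ('strain_a')")

service = DataAccessService()
service.add_database("relational", SQLiteDAC(db))
service.add_database("flatfile", CsvDAC("id,label\n1,a\n2,b\n"))
print(service.query("relational", "SELECT name FROM strains"))
print(service.query("flatfile", "SELECT label FROM data WHERE id = '2'"))
```

Adding support for another DBMS then means writing one more DAC class; the service and its grid and web front ends stay unchanged.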
BinX – accessing legacy binary data
The Problem:
– Many binary data files
– Applications must “know” the data format
– Binary data formats are machine-specific
The Solution:
– Write a “stand-aside” format description in XML
– Provide a library to:
  » Interpret the description
  » Provide file access across different machines
  » Build higher-level services
[Diagram: simulations produce binary data files; a BinX file describes the binary file structure, and the BinX Library uses it to give an e-Science application access to the data.]
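The stand-aside approach can be illustrated with a small Python sketch: an XML document records the field layout and byte order of a binary file, and a library function uses it to decode records on any machine. The element and attribute names (`binx`, `field`, `byteOrder`, `type`) and the supported types are invented for this sketch and do not follow the real BinX schema; only the standard-library `struct` and `xml.etree.ElementTree` modules are used.

```python
import struct
import xml.etree.ElementTree as ET

# Map description types and byte orders onto struct format codes.
TYPE_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}
BYTE_ORDERS = {"bigEndian": ">", "littleEndian": "<"}


def record_format(description_xml):
    """Turn the XML description into a struct format string and field names.

    An explicit byte-order prefix ('>' or '<') also means standard sizes and
    no alignment padding, so the layout is the same on every machine.
    """
    root = ET.fromstring(description_xml)
    fmt = BYTE_ORDERS[root.get("byteOrder")]
    names = []
    for field in root.findall("field"):
        fmt += TYPE_CODES[field.get("type")]
        names.append(field.get("name"))
    return fmt, names


def read_records(description_xml, data):
    """Decode a binary blob made of fixed-size records into dictionaries."""
    fmt, names = record_format(description_xml)
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, data)]


# Hypothetical description of a simulation output file.
DESCRIPTION = """<binx byteOrder="bigEndian">
  <field name="step" type="int32"/>
  <field name="temperature" type="float64"/>
</binx>"""

# Two records written big-endian, as another machine might have produced them.
data = struct.pack(">id", 1, 300.5) + struct.pack(">id", 2, 301.25)
print(read_records(DESCRIPTION, data))
```

The application code never hard-codes the layout: pointing it at a different description is enough to read a file produced on a little-endian machine.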
NeSC at Glasgow
e-Science Hub
Externally
Glasgow end of NeSC
– Involved in UK-wide activities
  » ETF: in May 2003 became the first UK e-Science Centre to run integration tests across every site of the UK (Level 2) Grid, giving 100% access to UK Grid resources at that time
– Public visibility of NeSC
  » Responsible for the NeSC web site
Internally
Focal point for e-Science research/activities at Glasgow
Work closely with foundation departments
– Department of Computing Science
– Department of Physics & Astronomy
Also working closely with other groups including
– Bioinformatics Research Centre
– Electronics and Electrical Engineering
– Biostatistics, …
Glasgow e-Science Investment
Major investment by the university
230 m² of newly renovated floor space in the Kelvin Building
– offices
– Access Grid facility
– training room, equipped with 20 PCs/server for training courses
Funding the Technical Director post
Resource Consolidation at Glasgow
Building around ScotGrid
– Providing a shared Grid resource for a wide variety of scientists inside/outside Glasgow
  » Particle physicists, computer scientists, electronic engineers, bioinformaticians, …
– Focal point, knowledge pool, primary resource for e-Science activity at Glasgow
Target shares: 60% PP, 20% Bioinf, 20% open share… [chart labels: CDF, BIO, LHC]
Shared resources: disk ~15 TB; CPU ~330 1 GHz
Hardware:
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2 GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2 GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2 GB memory and 100 + 1000 Mbit/s ethernet
• 1 TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5 TB FastT500 disk (70 x 73.4 GB IBM FC Hot-Swap HDD)
• eDIKT: 28 IBM blades dual 2.4 GHz Xeon with 1.5 GB memory
• eDIKT: 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5 GB memory
• CDF: 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5 GB memory
• CDF: 7.5 TB RAID disk
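As a worked illustration of the target shares, and assuming they are applied to the ~330 CPUs listed (the slide does not say which resource the shares govern or how a scheduler enforces them), the Python sketch below turns percentage shares into whole-CPU allocations using the largest-remainder method. The function name `allocate` and the share labels are hypothetical.

```python
from fractions import Fraction


def allocate(total, shares):
    """Split `total` whole units according to percentage `shares`.

    Uses the largest-remainder method: every group first gets the floor of its
    exact entitlement, then leftover units go to the largest fractional parts.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100%")
    exact = {group: Fraction(total * pct, 100) for group, pct in shares.items()}
    alloc = {group: int(value) for group, value in exact.items()}  # floor (values >= 0)
    leftover = total - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for group in by_remainder[:leftover]:
        alloc[group] += 1
    return alloc


target_shares = {"PP": 60, "Bioinf": 20, "open": 20}
print(allocate(330, target_shares))
```

With 330 units the split is exact (198/66/66). With a total that does not divide evenly, such as 7, each group first gets its floor (4/1/1) and the remaining unit goes to the largest fractional part, so allocations always add up to the total.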
Projects with NeSC Glasgow Involvement
DCC – National Digital Curation Centre
AMUSE – Autonomous Management of Ubiquitous Systems for e-Health
P2Popt – Performance measurement & management of 2-layer peer-to-peer networks…
PGPGrid – Peppers Ghost Productions
Equator – Environmental e-Science Interdisciplinary Research Project
BPS – Biochemical Pathway Simulator
BRIDGES
Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
– NeSC (Edinburgh and Glasgow) and IBM
– 2-year project, started 1st October 2003
Supporting project for the CFG project
– Generating data on hypertension
– Rat, Mouse, Human genome databases
Variety of tools used
– BLAST, FASTA, MPsrch, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
– Microarray data, genome DBs, project partner research data, medical records, …
Aim is an integrated infrastructure supporting
– Data federation
– Security
CFG Partner Distribution
[Diagram: CFG partner sites in Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, each holding private data, together with shared data and public curated data.]
Problems specific to the Bio-Community
[Chart: PDB content growth]
• DBs growing exponentially!
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)
• …
More genomes…
Yersinia pestis, Arabidopsis thaliana, Buchnera sp. APS, Aquifex aeolicus, Caenorhabditis elegans, Campylobacter jejuni, Chlamydia pneumoniae, Helicobacter pylori, Mycobacterium leprae, rat, mouse, Archaeoglobus fulgidus, Borrelia burgdorferi, Mycobacterium tuberculosis, Vibrio cholerae, Drosophila melanogaster, Escherichia coli, Thermoplasma acidophilum, Neisseria meningitidis Z2491, Plasmodium falciparum, Pseudomonas aeruginosa, Ureaplasma urealyticum, Rickettsia prowazekii, Saccharomyces cerevisiae, Salmonella enterica, Bacillus subtilis, Thermotoga maritima, Xylella fastidiosa
Complexity of Biological Data
[Diagram: organisms, physiology, tissues, protein-protein interaction (pathways), protein structures, gene expression, nucleotide structures]
BRIDGES Data Integration/Federation
Local repository being developed
– Populated with data that cannot be federated
  » e.g. public data sets with no programmatic interface
  » Shared data sets of CFG scientists
– Security through
  » X.509 PKI (authentication)
  » PERMIS (authorisation)
– Will make use of e-Science technologies (OGSA-DAI/DAIT, ELDAS, IBM’s DiscoveryLink)
  » Data automatically kept fresh and up to date
– Web (Grid) services offered that allow these local data sets to be used
  » For example for visualising, searching, querying, …
Example usage scenario …
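The split between authentication (X.509) and authorisation (PERMIS) can be illustrated with a toy role-based check in Python: the user is identified by the distinguished name taken from an already-validated certificate, and a policy decides, per site and per data set, what each role may do. This is not the PERMIS API (PERMIS has its own XML policy language and attribute-certificate machinery); the class `Policy`, the role names, the sites and the distinguished name below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy role-based access control policy (hypothetical, not PERMIS)."""

    # role -> set of (site, data set, action) triples that role may perform
    permissions: dict = field(default_factory=dict)
    # user distinguished name (from an authenticated X.509 certificate) -> roles
    roles: dict = field(default_factory=dict)

    def grant(self, role, site, dataset, action):
        self.permissions.setdefault(role, set()).add((site, dataset, action))

    def assign(self, user_dn, role):
        self.roles.setdefault(user_dn, set()).add(role)

    def is_allowed(self, user_dn, site, dataset, action):
        """Allow the request if any of the user's roles grants it; deny otherwise."""
        return any(
            (site, dataset, action) in self.permissions.get(role, set())
            for role in self.roles.get(user_dn, set())
        )


policy = Policy()
policy.grant("cfg_member", "Glasgow", "shared", "read")
policy.grant("glasgow_team", "Glasgow", "private", "read")

user = "/C=UK/O=eScience/OU=Glasgow/CN=example user"  # hypothetical DN
policy.assign(user, "cfg_member")

print(policy.is_allowed(user, "Glasgow", "shared", "read"))
print(policy.is_allowed(user, "Glasgow", "private", "read"))
```

Denying by default (an unknown user, or a role with no matching grant, gets nothing) is the usual safe choice when private partner data sits alongside shared data.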
System Usage Scenario
[Diagram: a CFG user at Site X, using a Java App downloaded via WebStart or browser-based clients through the BRIDGES Portal, has secure access for the CFG VO, subject to per-user, per-site authorisation. DiscoveryLink (DL) and OGSA-DAI, through DB wrappers, reach remote data in Oracle, DB2, Sybase, Excel, flat files, XML... and a Secure Data Repository of shared/private data sets kept up to date. BLAST results are input to the repository, and relevant data is pushed onto ScotGrid for BLAST’ing. Personalised services sit alongside generic services used by other projects.]
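One way the "push relevant data onto ScotGrid for BLAST’ing" step could work is to split a set of query sequences into batches, one per grid job. The Python sketch below parses FASTA text and deals the records round-robin into batches; the function names are hypothetical, job submission and the BLAST invocation itself are not shown, and the slide does not say how BRIDGES actually partitions the work.

```python
def parse_fasta(text):
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before the first header are ignored
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_for_jobs(records, n_jobs):
    """Deal records round-robin into at most n_jobs non-empty batches."""
    batches = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        batches[i % n_jobs].append(record)
    return [batch for batch in batches if batch]


def to_fasta(batch):
    """Serialise one batch back to FASTA, e.g. to stage it as a job's input file."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in batch)


queries = ">q1\nACGT\nACGT\n>q2\nGGCC\n>q3\nTTAA\n"
for batch in split_for_jobs(parse_fasta(queries), 2):
    print(to_fasta(batch), end="")
    print("---")
```

Round-robin keeps the number of sequences per job even but not the total sequence length; balancing batches by length would be a natural refinement when query sizes vary widely.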
Conclusions
NeSC continues to provide leadership in UK e-Science
– Difficult given the multitude of scientific research areas, heterogeneity of systems and fluidity of technologies
  » GT2, GT3, WSRF, GT4…?
Closer working with GridPP beneficial for everyone
– Move towards a Production Grid
– ScotGrid a good model for co-operation
Planning for a soft landing through diversification and more integration into the university
– MRC bids, BBSRC bids, EPSRC bids, …
– UK e-Science operating as a community for upcoming DTI funding opportunities
Plans for developing Grid Computing teaching modules as part of an advanced MSc
Website
National e-Science Centre
http://www.nesc.ac.uk/
Mission, Background, Foundation
Locations, Staff, Resources, Projects
Register interest, Mailing lists, NeSCForge
Regional associations and Collaborations
News, Notices
Presentations & Lectures http://www.nesc.ac.uk/presentations/
e-Science Institute
http://www.nesc.ac.uk/esi/
Mission, Events (Future and Past)
Register for Events, Visitor Programme
UK e-Science
Map and Index of Centres
http://www.nesc.ac.uk/centres/
Technical Papers
http://www.nesc.ac.uk/technical_papers/
Index of >100 Projects
http://www.nesc.ac.uk/projects/
Task Forces
http://www.nesc.ac.uk/teams/
General Information
Glossary, Bibliography, Who’s who
e-Science job vacancies
Questions…?