BRIDGES Dr Richard Sinnott

BRIDGES
Dr Richard Sinnott
Technical Director National e-Science Centre
Deputy Director (Technical) Bioinformatics Research Centre
University of Glasgow
26th April 2005
PRISM Forum,
26th April 2005
Grids? E-Science? E-Research?
methodologies transforming science, engineering, medicine
and business
driven by exponential growth in data, compute demands
enabling a whole-system approach
computers
software
Grid
sensor nets
instruments
colleagues
Shared data
archives
NeSC in the UK
[Map of UK e-Science centres and facilities. Labels: NeSC, HPC(x), Edinburgh, Glasgow, Belfast, Lancaster, Manchester, Daresbury Lab, Midlands, Newcastle, White Rose Grid, York, Leeds, Sheffield, Core National Grid Service, Leicester, Cambridge, Oxford, UCL, Hinxton, RAL, Bristol, Reading, Imperial, CSAR, Cardiff, Southampton]
Glasgow e-Science Hub
E-Science Hub
Externally
Glasgow end of NeSC
– Involved in UK wide activities
» ETF, STF, …
» Involved in numerous life science/security related projects (more later)
– Public visibility of NeSC
» responsible for NeSC web site
Internally
Focal point for e-Science research/activities at Glasgow
Work closely with foundation departments
– Department of Computing Science
» Offer full course on Grid Computing to advanced MSc students
» First batch of students completed in December 2004
– Department of Physics & Astronomy
Also working closely with other groups including
– Bioinformatics Research Centre
– Electronics and Electrical Engineering
– Biostatistics
– Sir Henry Wellcome Functional Genomics Facility
– Clinical/Medical
– …
Glasgow e-Science Infrastructure Now
Consolidation of resources
Story started with building around ScotGrid
Providing shared Grid resource for wide
variety of scientists inside/outside Glasgow
– HEP, CS, BRC, EEE, …
» Target shares established
» Non-contributing groups encouraged
Hardware
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet
• 1TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches
New..
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5TB FastT500 disk 70 x 73.4 GB IBM FC Hot-Swap HDD
• eDIKT 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory
• eDIKT 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory
• CDF 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory
• CDF 7.5TB Raid disk
ScotGrid [Disk ~15TB, CPU ~255 1GHz]
Over 2 million CPU hours completed (April 2005)
Over 200,000 jobs completed
Includes time out for major rebuilds
Typically running at ~90% usage
Glasgow e-Science Infrastructure Plans
But not enough…
Computer Services second HPC facility (128 processor)
being deployed
University SAN (50TB – 25TB mirrored across campus)
being deployed
– ~£850k investment
» Expected usage May 2005
Recent SMP donations to NeSC Glasgow by Sun
Access to campus wide resources
NeSC training lab condor pool, EEE condor pool, Physics & Astronomy, …
EEE compute clusters and larger SMP machines…
others…???
National Grid Service
SRDG proposals for Scottish Grid Service infrastructure… in progress
Scottish Bioinformatics Research Network equipment funds (more later)
…
Glasgow e-Science People Plans
E-Science Strategy/Business Plan
Widely supported by Senior Management Group
Underwriting E-Science applications co-ordinator
Underwriting of NeSC staff contracts
Underwriting existing Grid systems administrator positions
Future funds for NeSC running costs
… new positions under discussion (Research Computing Director)
To be funded through contributions from university wide e-Science
activities and university Strategic Investment Funds
Depends on university wide engagement and support of e-Science
– Helped through e-Science / e-Research applicable to all faculties
Life Sciences
Extensive Research Community
>1000 per research university
Extensive Applications
Many people care about them
Health, Food, Environment
Interacts with virtually every discipline
Physics, Chemistry, Maths/Stats, Nano-engineering, …
450+ databases relevant to bioinformatics (and growing!)
Heterogeneity, Interdependence, Complexity, Change, …
+ links to plant/crops,
environmental, health, …
information sources
Populations
Organisms
Physiology
Organs
Tissues
Cell signalling
Cell
Protein-protein interaction (pathways)
Protein functions
Protein Structures
Gene expressions
Nucleotide structures
Nucleotide sequences
Systems Biology?
More genomes…
Yersinia pestis
Arabidopsis thaliana
Buchnera sp. APS
Caenorhabditis elegans
Campylobacter jejuni
Chlamydia pneumoniae
Helicobacter pylori
Mycobacterium leprae
rat
mouse
Aquifex aeolicus
Man
Archaeoglobus fulgidus
Borrelia burgdorferi
Mycobacterium tuberculosis
Drosophila melanogaster
Escherichia coli
Thermoplasma acidophilum
Neisseria meningitidis Z2491
Plasmodium falciparum
Pseudomonas aeruginosa
Ureaplasma urealyticum
Rickettsia prowazekii
Salmonella enterica
Saccharomyces cerevisiae
Bacillus subtilis
Thermotoga maritima
Xylella fastidiosa
Distributed and Heterogeneous data
Structure
Sequence
LPSYVDWRSA ECGGCWAFSA TSGSLISLSE NTRGCDGGYI GGINTEENYP
GAVVDIKSQG IATVEGINKI QELIDCGRTQ TDGFQFIIND YTAQDGDCDV
Function
Gene expression
Morphology
Database Growth
•DBs growing rapidly!!!
•Bibliographic (MedLine, PubMed…)
•Amino Acid Seq (SWISS-PROT/UNI-PROT, …)
•3D Molecular Structure (PDB, …)
•Nucleotide Seq (GenBank, EMBL, …)
•Biochemical Pathways (KEGG, WIT…)
•Molecular Classifications (SCOP, CATH,…)
•Motif Libraries (PROSITE, Blocks, …)
http://www.genome.jp/dbget/db_growth.gif
Is Grid the Answer?
Some key problems to be addressed
Tools that simplify access to and usage of data
Internet hopping is not ideal!
Tools that simplify access to and usage of large scale HPC facilities
qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h]
[-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path]
[-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W
additional_attributes] [-z] [script]
Tools designed to aid understanding of complex data sets and
relationships between them
e.g. through visualisation
Make it all easy to use!
Scientists should not have to be Linux script experts,
…nor set up/configure complex Grid software or follow complex procedures for getting,
using Grid certificates,
…nor have detailed understanding of low level data schemas for all data sites,
… etc etc
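As one illustration of hiding the qsub options above from scientists, a thin wrapper could assemble the command line from a few named parameters. This is only a minimal sketch and not part of BRIDGES: the function name `build_qsub_command`, its parameters and the example values (script name, queue name, walltime) are hypothetical. Only the flags -N, -q, -l, -o and -e are taken from the synopsis above. The command is built but not executed.

```python
from typing import Dict, List, Optional


def build_qsub_command(
    script: str,
    name: Optional[str] = None,
    queue: Optional[str] = None,
    resources: Optional[Dict[str, str]] = None,
    stdout_path: Optional[str] = None,
    stderr_path: Optional[str] = None,
) -> List[str]:
    """Return an argv list for qsub (hypothetical helper, illustration only)."""
    argv = ["qsub"]
    if name:
        argv += ["-N", name]          # -N name
    if queue:
        argv += ["-q", queue]         # -q destination
    if resources:
        # -l resource_list, written here as comma-separated key=value pairs
        resource_list = ",".join(
            f"{key}={value}" for key, value in sorted(resources.items())
        )
        argv += ["-l", resource_list]
    if stdout_path:
        argv += ["-o", stdout_path]   # -o path
    if stderr_path:
        argv += ["-e", stderr_path]   # -e path
    argv.append(script)               # [script]
    return argv


if __name__ == "__main__":
    # Example values are assumptions, not taken from the slides.
    print(" ".join(build_qsub_command(
        "blast.sh", name="blast", queue="short",
        resources={"walltime": "01:00:00"},
    )))
```

A portal or client tool could call such a helper behind a form, so that the user only chooses a job and never sees the flags.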
Populations
Organisms
Physiology
Organs
Tissues
Cell signalling
Cell
Protein-protein interaction (pathways)
Protein functions
Protein Structures
Gene expressions
Nucleotide structures
Nucleotide sequences
BRIDGES
Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid
Enabled Services (BRIDGES)
NeSC (Edinburgh and Glasgow) and IBM
Two year project funded by DTI - started October 2003
Supporting project for the Cardiovascular Functional Genomics (CFG) project
Generating data on hypertension
Rat, Mouse, Human genome databases
Variety of tools used
BLAST, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
Microarray data, genome DBs, project partner research data, …
Aim is integrated infrastructure supporting
Data federation
Security
Bridges Project
[Architecture diagram. Labels:
CFG Virtual Organisation: Glasgow, Edinburgh, Oxford, London, Leicester, Netherlands, … (with private data)
Publicly Curated Data: Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD
VO Authorisation
Data Hub: Information Integrator, OGSA-DAI
Services: MagnaVista Service, Blast, Synteny Service]
Grid Security
Grid security
Single sign-on based on (X.509) digital certificates
CA in RAL
local CAs also possible
Services (and clients) have APIs for fine grained security
“I know who you are and here is your local account”
Provides for authentication but need authorisation
“I know who you are and here is what you are allowed to do on my
resource”
– Various technologies for authorisation including PERMIS, CAS, VOMS, …
PERMIS is leading implementation!
Led by Prof David Chadwick, University of Kent (www.permis.org)
– Supports GGF SAML AuthZ interface
» Generic way to link Grid services with authorisation infrastructure
Security Authorisation
PERMIS allows you to
Define roles for who can do what on what
Policy = { Role x Target x Action }
– Can user X invoke service Y and access or change data Z?
» Policies created with PERMIS PolicyEditor (output is XML based policy)
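The Role x Target x Action idea can be sketched in a few lines. This is a toy model for illustration only: it does not use the PERMIS API or its XML policy format, and the class names (`Rule`, `Policy`) and the role, target and action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """One permitted (role, target, action) triple."""
    role: str
    target: str
    action: str


class Policy:
    """A policy is modelled here as a plain set of permitted triples."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self._rules = frozenset(rules)

    def permits(self, roles: Iterable[str], target: str, action: str) -> bool:
        # Access is granted if any role held by the user is allowed
        # to perform the action on the target; everything else is denied.
        return any(Rule(role, target, action) in self._rules for role in roles)


if __name__ == "__main__":
    # Hypothetical roles, targets and actions.
    policy = Policy([
        Rule("cfg-scientist", "blast-service", "invoke"),
        Rule("cfg-scientist", "qtl-data", "read"),
        Rule("data-curator", "qtl-data", "write"),
    ])
    print(policy.permits({"cfg-scientist"}, "qtl-data", "read"))
    print(policy.permits({"cfg-scientist"}, "qtl-data", "write"))
```

Denying anything not explicitly listed is a design choice of this sketch; it keeps the question "can user X invoke service Y and access or change data Z?" down to a set lookup.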
Security Authorisation
PERMIS tools then used to associate roles with
specific users and sign policies
Policies stored as attribute certificates in LDAP server
BRIDGES authorisation
Authorisation of computational resources
Policies defined for:
– Condor pool (default for unknown users)
– ScotGrid (if local account exists for that trusted user)
– National Grid Service (if known and trusted user)
» Note solution does not require users have their own Grid certificates
Authorisation of data sets
Policies defined to restrict data sets scientists/others can get access to
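The compute-resource rules listed above might be expressed roughly as follows. This is a sketch under stated assumptions, not the BRIDGES implementation: the `User` fields and the function `allowed_resources` are hypothetical, and the reading of "known and trusted" as two separate flags is an interpretation of the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    """Hypothetical view of what the authorisation layer knows about a user."""
    name: str
    trusted: bool = False
    has_scotgrid_account: bool = False
    known_to_ngs: bool = False


def allowed_resources(user: User) -> List[str]:
    """Return the compute resources a job from this user may be sent to."""
    # The Condor pool is the default, also for unknown users.
    resources = ["Condor pool"]
    # ScotGrid only if a local account exists for that trusted user.
    if user.trusted and user.has_scotgrid_account:
        resources.append("ScotGrid")
    # National Grid Service only for known and trusted users.
    if user.trusted and user.known_to_ngs:
        resources.append("National Grid Service")
    return resources


if __name__ == "__main__":
    print(allowed_resources(User("anonymous")))
    print(allowed_resources(User("alice", trusted=True, has_scotgrid_account=True)))
```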
Other NeSC Glasgow Projects
Joint Data Standards Study
Public data resources openness
Often cannot query directly
Often not easy/possible to find schemas
Joint Data Standards Study investigating this
– Started on 1st June and involves
» Digital Archiving Consultancy
» Bioinformatics Research Centre (Glasgow)
» NeSC (Edinburgh and Glasgow)
– Look at technical, political, social, ethical etc issues involved in
accessing and using public life science resources
» Interview relevant scientists, data curators/providers
– 8 month project with final report due imminently
» Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI
DyVOSE Project
Dynamic Virtual Organisations for e-Science Education
(DyVOSE) project
Two year project started 1st May 2004 funded by JISC
Exploring advanced authorisation infrastructures for security
… in Grid Computing Module as part of advanced MSc at Glasgow
– Provide insight into rolling Grid out to the masses!
ScotGrid
GU Condor pool
Other (known!)
Grid resources
Education
VO policies
PERMIS based
Authorisation checks
Authorisation decisions
DyVOSE Phase 2/3
Glasgow
ScotGrid
Edinburgh
Condor pool
Blue Dwarf
Dynamically
established VO
resources/users
Delegated
VO policies
Glasgow
Education
VO policies
Shibboleth
PERMIS based
Authorisation
checks/decisions
Edinburgh
Education
VO policies
Scottish Bioinformatics Research Network
Four year proposal expected to start imminently
Funded (£2.4M) by Scottish Enterprise, Scottish Higher Education
Funding Council, Scottish Executive Environment and Rural Affairs
Department
Involves Glasgow, Dundee, Edinburgh, Scottish Bioinformatics Forum
Aim to provide bioinformatics infrastructure for Scottish health,
agriculture and industry
Infrastructure support at Dundee, Edinburgh and Glasgow to support first-rate
research in bioinformatics at each academic institute
Infrastructure support at three institutes, to support inter-institutional sharing of
compute and data resources through application of Grid computing
Outreach and training activities mediated by the Scottish Bioinformatics Forum
VOTES
Virtual Organisations for Trials and Epidemiological Studies
3 year MRC (£2.9M) funded project expected to start imminently
Plans to develop Grid infrastructure to address key components of
clinical trial/observational study
Recruitment of potentially eligible participants
Data collection during the study
Study administration and coordination
– Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
Clinical Virtual Organisation Framework
Used to realise
CVO-1
(e.g. for data
collection)
CVO-2
(e.g. for
recruitment)
LeiNott
GLA
Transfer
Grid
GPs
OX
IMP
Clinical trial
data sets
Disease
registries
Hospital
databases
Genetics and Healthcare Initiative
Five (2+3) year proposal (£4.4M) expected to start imminently
Funded by Health Department and Department for Enterprise and
Lifelong Learning
Involves Glasgow, Dundee, Edinburgh, Aberdeen
– focus on genetics as applied to healthcare
– first two years emphasis on providing a platform for research into the genetic
basis of common complex diseases in Scotland
» Mental health, cardiovascular, …
» Plan to establish 15,000 family-based intensively-phenotyped cohort recruited from
the East and West of Scotland
– basis for neutralising heritable (genetic) risk factors in disease surveillance,
treatment optimisation, avoidance of adverse drug events and prediction of
response to therapy, health care planning and drug discovery, …
BRIDGES
SBRN
JDSS
Populations
Organisms
Physiology
Organs
Tissues
Cell signalling
Cell
Protein-protein interaction (pathways)
Protein functions
Protein Structures
Gene expressions
Nucleotide structures
Nucleotide sequences
GHI
VOTES
DyVOSE
Numerous other proposals submitted looking
at other parts of this picture!
Systems Biology?
Once we have (securely) connected all relevant
data sets and simplified access to and usage of
HPC resources, wrapped your favourite
bioinformatics applications as Grid services...
what questions would you like to ask?
– How does a cell work?
– Why do people who eat less tend to live longer?
– How many people across Scotland who had a heart attack in the last 5
years took drug X, and of those that did, were genes A or B
influenced by this drug?
– Who has performed an experiment similar to mine and were their
results similar?
– …
www.nesc.ac.uk
Bridges Portal
MagnaVista
QTL upload
QTL browsing
Grid Blast Client
• Allows ‘genome scale’ blasting
• Uses ScotGrid and idle compute resources of training lab Condor pool
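One plausible way to spread a large BLAST run over several compute resources is to split the query sequences into batches and submit each batch as a separate job. The slides do not say how the Grid Blast Client divides its work, so the following is an assumption-laden sketch: the functions `parse_fasta` and `split_into_batches` and the batch size are hypothetical, and no job is actually submitted.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header: Optional[str] = None
    parts: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(parts)))
            header = line[1:]
            parts = []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_into_batches(records: List[Record], batch_size: int) -> List[List[Record]]:
    """Group records into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


if __name__ == "__main__":
    queries = ">q1\nLPSYVDWRSA\nECGGCWAFSA\n>q2\nTSGSLISLSE\n>q3\nNTRGCDGGYI\n"
    for number, batch in enumerate(split_into_batches(parse_fasta(queries), 2)):
        # Each batch would become one job on ScotGrid or the Condor pool.
        print(number, [header for header, _ in batch])
```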