BRIDGES

Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
26th April 2005
PRISM Forum, 26th April 2005

Grids? E-Science? E-Research?
Methodologies transforming science, engineering, medicine and business
  • driven by exponential growth in data, compute demands
  • enabling a whole-system approach
[Diagram: the Grid linking computers, software, sensor nets, instruments, colleagues and shared data archives]

NeSC in the UK
[Map labels: NeSC, HPC(x), Edinburgh, Glasgow, Belfast, Lancaster, Manchester, Daresbury Lab, Midlands, Newcastle, White Rose Grid (York, Leeds, Sheffield), Core National Grid Service, Leicester, Cambridge, Oxford, UCL, Hinxton, RAL, Bristol, Reading, Imperial, CSAR, Cardiff, Southampton]

Glasgow e-Science Hub
Externally
  • Glasgow end of NeSC
    – Involved in UK wide activities
      » ETF, STF, …
      » Involved in numerous life science/security related projects (more later)
    – Public visibility of NeSC
      » responsible for NeSC web site
Internally
  • Focal point for e-Science research/activities at Glasgow
  • Work closely with foundation departments
    – Department of Computing Science
      » Offer full course on Grid Computing to advanced MSc students
      » First batch of students completed in December 2004
    – Department of Physics & Astronomy
  • Also working closely with other groups including
    – Bioinformatics Research Centre
    – Electronics and Electrical Engineering
    – Biostatistics
    – Sir Henry Wellcome Functional Genomics Facility
    – Clinical/Medical
    – …

Glasgow e-Science Infrastructure Now
Consolidation of resources
Story started with building around ScotGrid
  • Providing shared Grid resource for wide variety of scientists inside/outside Glasgow
    – HEP, CS, BRC, EEE, …
      » Target shares established
      » Non-contributing groups encouraged
Hardware
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet
• 1TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches
New…
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5TB FastT500 disk, 70 x 73.4 GB IBM FC Hot-Swap HDD
• eDIKT: 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory
• eDIKT: 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory
• CDF: 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory
• CDF: 7.5TB RAID disk

ScotGrid [Disk ~15TB, CPU ~255 1GHz]
Over 2 million CPU hours completed (April 2005)
Over 200,000 jobs completed
Includes time out for major rebuilds
Typically running at ~90% usage

Glasgow e-Science Infrastructure Plans
But not enough…
Computer Services second HPC facility (128 processor)
  • being deployed
University SAN (50TB – 25TB mirrored across campus)
  • being deployed
    – ~£850k investment
      » Expected usage May 2005
Recent SMP donations to NeSC Glasgow by Sun
Access to campus wide resources
  • NeSC training lab Condor pool, EEE Condor pool, Physics & Astronomy, …
  • EEE compute clusters and larger SMP machines…
  • others…???
National Grid Service
SRDG proposals for Scottish Grid Service infrastructure… in progress
Scottish Bioinformatics Research Network equipment funds (more later)
…

Glasgow e-Science People Plans
E-Science Strategy/Business Plan
Widely supported by Senior Management Group
  • Underwriting E-Science applications co-ordinator
  • Underwriting of NeSC staff contracts
  • Underwriting existing Grid systems administrator positions
  • Future funds for NeSC running costs
  • … new positions under discussion (Research Computing Director)
To be funded through contributions from university wide e-Science activities and university Strategic Investment Funds
  • Depends on university wide engagement and support of e-Science
    – Helped through e-Science / e-Research applicable to all faculties

Life Sciences
Extensive Research Community
  • >1000 per research university
Extensive Applications
  • Many people care about them
  • Health, Food, Environment
Interacts with virtually every discipline
  • Physics, Chemistry, Maths/Stats, Nano-engineering, …
450+ databases relevant to bioinformatics (and growing!)
  • Heterogeneity, Interdependence, Complexity, Change, …

[Diagram: levels of biological data, + links to plant/crops, environmental, health, … information sources]
Populations, Organisms, Physiology, Organs, Tissues, Cell signalling, Cell, Protein-protein interaction (pathways), Protein functions, Protein structures, Gene expressions, Nucleotide structures, Nucleotide sequences
Systems Biology?

More genomes …...
[Figure labels: Yersinia pestis, Arabidopsis thaliana, Buchnera sp. APS, Caenorhabditis elegans, Campylobacter jejuni, Chlamydia pneumoniae, Helicobacter pylori, Mycobacterium leprae, rat, mouse, Aquifex aeolicus, Man, Archaeoglobus fulgidus, Borrelia burgdorferi, Mycobacterium tuberculosis, Drosophila melanogaster, Escherichia coli, Thermoplasma acidophilum, Neisseria meningitidis Z2491, Plasmodium falciparum, Pseudomonas aeruginosa, Ureaplasma urealyticum, Rickettsia prowazekii, Saccharomyces cerevisiae, Salmonella enterica, Bacillus subtilis, Thermotoga maritima, Xylella fastidiosa]

Distributed and Heterogeneous data
[Figure panels: Structure, Sequence, Function, Gene expression, Morphology]
Sequence: LPSYVDWRSA ECGGCWAFSA TSGSLISLSE NTRGCDGGYI GGINTEENYP GAVVDIKSQG IATVEGINKI QELIDCGRTQ TDGFQFIIND YTAQDGDCDV

Database Growth
• DBs growing rapidly!!!
• Bibliographic (MedLine, PubMed…)
• Amino Acid Seq (SWISS-PROT/UniProt, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT…)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)
http://www.genome.jp/dbget/db_growth.gif

Is Grid the Answer?
Some key problems to be addressed
Tools that simplify access to and usage of data
  • Internet hopping is not ideal!
Tools that simplify access to and usage of large scale HPC facilities
  • qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W additional_attributes] [-z] [script]
Tools designed to aid understanding of complex data sets and relationships between them
  • e.g. through visualisation
Make it all easy to use!
  • Scientists should not have to be Linux script experts,
  • …nor set up/configure complex Grid software or follow complex procedures for getting and using Grid certificates,
  • …nor have detailed understanding of low level data schemas for all data sites,
  • … etc etc

[Diagram: levels of biological data (Populations, Organisms, Physiology, Organs, Tissues, Cell signalling, Cell, Protein-protein interaction (pathways), Protein functions, Protein structures, Gene expressions, Nucleotide structures, Nucleotide sequences), with BRIDGES marked]

Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
  • NeSC (Edinburgh and Glasgow) and IBM
  • Two year project funded by DTI - started October 2003
Supporting project for Cardiovascular Functional Genomics (CFG) project
  • Generating data on hypertension
  • Rat, Mouse, Human genome databases
  • Variety of tools used
    – BLAST, BLAT, Gene Prediction, visualisation, …
  • Variety of data sources and formats
    – Microarray data, genome DBs, project partner research data, …
Aim is integrated infrastructure supporting
  • Data federation
  • Security

Bridges Project
[Architecture diagram. Elements shown: CFG Virtual Organisation sites (Glasgow, Edinburgh, Oxford, London, Leicester, Netherlands, …), each with private data; VO Authorisation; Data Hub (Information Integrator, OGSA-DAI); Publicly Curated Data (Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …); services: MagnaVista Service, Synteny Service, BLAST]

Grid Security
Single sign-on based on (X.509) digital certificates
  • CA in RAL
  • local CAs possible also
Services (and clients) have APIs for fine grained security
  • “I know who you are and here is your local account”
Provides for authentication but authorisation is still needed
  • “I know who you are and here is what you are allowed to do on my resource”
    – Various technologies for authorisation
      including PERMIS, CAS, VOMS, …
PERMIS is a leading implementation!
  • Led by Prof David Chadwick, University of Kent (www.permis.org)
    – Supports GGF SAML AuthZ interface
      » Generic way to link Grid services with authorisation infrastructure

Security Authorisation
PERMIS allows roles to be defined for who can do what on what
  • Policy = { Role × Target × Action }
    – Can user X invoke service Y and access or change data Z?
      » Policies created with PERMIS PolicyEditor (output is XML based policy)

Security Authorisation
PERMIS tools then used to associate roles with specific users and sign policies
  • Policies stored as attribute certificates in LDAP server

BRIDGES authorisation
Authorisation of computational resources
  • Policies defined for:
    – Condor pool (default for unknown users)
    – ScotGrid (if local account exists for that trusted user)
    – National Grid Service (if known and trusted user)
      » Note: solution does not require users to have their own Grid certificates
Authorisation of data sets
  • Policies defined to restrict the data sets that scientists/others can get access to

Other NeSC Glasgow Projects

Joint Data Standards Study
Public data resources openness
  • Often cannot query directly
  • Often not easy/possible to find schemas
  • Joint Data Standards Study investigating this
    – Started on 1st June and involves
      » Digital Archiving Consultancy
      » Bioinformatics Research Centre (Glasgow)
      » NeSC (Edinburgh and Glasgow)
    – Look at technical, political, social, ethical etc issues involved in accessing and using public life science resources
      » Interview relevant scientists, data curators/providers
    – 8 month project with final report due imminently
      » Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI

DyVOSE Project
Dynamic Virtual Organisations for e-Science Education (DyVOSE) project
Two year project started 1st May 2004 funded by JISC
Exploring
advanced authorisation infrastructures for security
  • … in Grid Computing Module as part of advanced MSc at Glasgow
    – Provide insight into rolling Grid out to the masses!
[Diagram elements: ScotGrid; GU Condor pool; Other (known!) Grid resources; Education VO policies; PERMIS based authorisation checks; authorisation decisions]

DyVOSE Phase 2/3
[Diagram elements: Glasgow; Edinburgh; ScotGrid; Condor pool; Blue Dwarf; Dynamically established VO resources/users; Delegated VO policies; Glasgow Education VO policies; Edinburgh Education VO policies; Shibboleth; PERMIS based authorisation checks/decisions]

Scottish Bioinformatics Research Network
Four year proposal expected to start imminently
Funded (£2.4M) by Scottish Enterprise, Scottish Higher Education Funding Council, Scottish Executive Environment and Rural Affairs Department
  • Involves Glasgow, Dundee, Edinburgh, Scottish Bioinformatics Forum
Aim to provide bioinformatics infrastructure for Scottish health, agriculture and industry
  • Infrastructure support at Dundee, Edinburgh and Glasgow to support first-rate research in bioinformatics at each academic institute
  • Infrastructure support at three institutes, to support inter-institutional sharing of compute and data resources through application of Grid computing
  • Outreach and training activities mediated by the Scottish Bioinformatics Forum

VOTES
Virtual Organisations for Trials and Epidemiological Studies
3 year MRC (£2.9M) funded project expected to start imminently
Plans to develop Grid infrastructure to address key components of clinical trial/observational study
  • Recruitment of potentially eligible participants
  • Data collection during the study
  • Study administration and coordination
    – Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
Clinical Virtual Organisation Framework
Used to realise CVO-1 (e.g. for data collection), CVO-2 (e.g. for recruitment)
[Diagram labels: LeiNott, GLA, OX, IMP, Transfer Grid, GPs, Clinical trial data sets, Disease registries, Hospital databases]

Genetics and Healthcare Initiative
Five (2+3) year proposal (£4.4M) expected to start imminently
Funded by Health Department and Department for Enterprise and Lifelong Learning
  • Involves Glasgow, Dundee, Edinburgh, Aberdeen
    – focus on genetics as applied to healthcare
    – first two years emphasis on providing a platform for research into the genetic basis of common complex diseases in Scotland
      » Mental health, cardiovascular, …
      » Plan to establish 15,000 family-based intensively-phenotyped cohort recruited from the East and West of Scotland
    – basis for neutralising heritable (genetic) risk factors in disease surveillance, treatment optimisation, avoidance of adverse drug events and prediction of response to therapy, health care planning and drug discovery, …

[Diagram: levels of biological data (Populations, Organisms, Physiology, Organs, Tissues, Cell signalling, Cell, Protein-protein interaction (pathways), Protein functions, Protein structures, Gene expressions, Nucleotide structures, Nucleotide sequences), with projects marked: BRIDGES, SBRN, JDSS, GHI, VOTES, DyVOSE]
Numerous other proposals submitted looking at other parts of this picture!

Systems Biology?
Once we have (securely) connected all relevant data sets, simplified access to and usage of HPC resources, and wrapped your favourite bioinformatics applications as Grid services... what questions would you like to ask?
    – How does a cell work?
    – Why do people who eat less tend to live longer?
    – How many people across Scotland who had a heart attack in the last 5 years took drug X, and of those that did, were genes A or B influenced by this drug?
    – Who has performed an experiment similar to mine, and were their results similar?
    – …

www.nesc.ac.uk

Bridges Portal

MagnaVista

QTL upload

QTL browsing

Grid Blast Client
• Allows ‘genome scale’ blasting
• Uses ScotGrid and idle compute resources of training lab Condor pool
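
The Grid Blast Client slide says only that genome-scale BLAST runs are spread across ScotGrid and the training lab Condor pool; it does not describe how the client divides the work. One common approach, shown here purely as an illustration and not as the BRIDGES implementation, is to split the query FASTA file into chunks and submit one BLAST job per chunk. The function names (`parse_fasta`, `chunk_records`, `to_fasta`) and the chunk size are hypothetical, and job submission itself is left out.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record], per_chunk: int) -> Iterator[List[Record]]:
    """Group records into lists of at most per_chunk entries (one list per job)."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def to_fasta(chunk: List[Record]) -> str:
    """Serialise a chunk back to FASTA text, ready to write to a job input file."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in chunk)


if __name__ == "__main__":
    # Hypothetical query set; the sequence fragments are taken from the slides.
    demo = [">q1", "LPSYVDWRSA", ">q2", "GAVVDIKSQG", "ECGGCWAFSA", ">q3", "IATVEGINKI"]
    for index, chunk in enumerate(chunk_records(parse_fasta(demo), per_chunk=2)):
        print(f"--- job {index}: {len(chunk)} query sequence(s) ---")
        print(to_fasta(chunk), end="")
```

Each chunk would then become the input of one independent job, which suits a pool of idle machines because no job depends on another; the per-job results would still need to be merged afterwards.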