BRIDGES Status Report

Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director Technical, Bioinformatics Research Centre
University of Glasgow
ros@dcs.gla.ac.uk
18th March 2004

Overview
- Review goals of Bridges project
- Briefly summarise technical approach
- Outline achievements thus far
- Demonstration
- Plans for the future

Bridges Goals
- High blood pressure affects 25% of adults in western societies
- Cardiovascular Functional Genomics (CFG) project investigating this through physiological models of hypertension in rat
- Bridges is a supporting project to CFG and will provide Grid infrastructure to facilitate scientific research
- CFG project partners are distributed but need to access and integrate various software and especially data resources
- Main aims of BRIDGES are to develop re-useable infrastructure to provide data federation incorporating appropriate security concerns

CFG Partner Distribution
[Diagram: partner sites at Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands; data labelled as private (at the sites), shared, and public curated]

Problems to be addressed
BRIDGES will address the following problems facing CFG biologists:
- How to integrate data with multiple levels of security including public data, project only data and private data?
- How to search multiple distributed databases through single optimised queries?
- How to use multiple tools in a coordinated (and automated) manner, e.g. how to develop re-useable workflows for the CFG scientists?
- Integration of a range of bioinformatics analysis and visualisation tools, e.g. BLAST, genome browsers, etc.
- How to deal with inconsistencies of online databases and possible “dirty data”?
- How to get more “up to date” data?
- Make it all user friendly… portals, hidden infrastructure, e.g.
  security authorisation

Planned Approach
BRIDGES will address these problems through:
- Development of re-useable Grid services based upon GT3 technologies
- Virtualisation of multiple distributed data sets to provide a single virtual data set for use by the biologists, exploiting IBM’s DiscoveryLink
- Developing a collection of data on a well-managed platform, including copies of extracts of relevant public data, all project data, and the required software tools (administered using DB2 and DiscoveryLink)
- Access to and integration of multiple distributed data sets in a Grid environment using results from the OGSA-DAI/DAIT projects
- A secure environment offering authentication and authorisation, building on results of the PERMIS security authorisation project

Bridges team
- Project Management: Richard Sinnott, Dave Berry
- Database Design/Development: Derek Houghton
- Grid Services Developer: Micha Bayer, Magnus Ferrier
- Technical Input: David White, Jean-Christophe Mestres, Andy Knox, Emmanuel Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow)
- Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak

Achievements
- Web site and project portal established: http://europa.nesc.gla.ac.uk/wps/portal
- Engaged with CFG consortia
- Staff trained in relevant technologies: GT3, DiscoveryLink, Condor
- Initial version of local repository developed
  - Populated with data that cannot be federated, e.g.
    public data sets with no programmatic interface: Ensembl/EMBL-EBI, NCBI (GenBank, RefSeq, Gene Expression Omnibus), UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UniGene, LocusLink, GenMAPP, OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, PRINTS, CATH, SCOP, PROSITE, Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, …
  - Includes shared data sets of CFG scientists: QTL DB, …

Achievements …ctd
- GT3-based Grid services offered that allow use of these local data sets
- Grid-enabled BLAST services produced
  - Offer access to large e-Science infrastructures at Glasgow (ScotGrid)
- SyntenyVista tool extended to allow Grid-enabled visual navigation of genomic data sets
  - Planned front end for many other tools
- Externally
  - Poster at AHM 2003
  - Tutorial submitted to ISMB/ECCB (the major bioinformatics conference)
  - Liaising with other projects: eDIKT, myGrid, GeneGrid, PERMIS, ...

Achievements …ctd
- Demonstration of some of the achievements

Plans
- Refine and extend requirements
  - Further refinement of use cases & scenarios
  - More data sets (public, shared, private, …)
- Implementation and realisation of further use cases
  - e.g. extended query services for microarray data interpretation, workflows for probe set mapping, …
- Security realisation and roll-out
  - We can only help share CFG data sets if we can get SECURE access to them; following up with CFG sites
  - Authorisation with PERMIS coming
  - GSI-based authentication
- Investigate application of replication manager (RLS)
  - Should support illusion of data from each site being available to all other sites
- Further Grid-based data visualisation services accessible via SyntenyVista
- Ensure that we keep track of relevant developments (WSRF, GT4, …)

Future Vision of Tools via Portal
DRILL-DOWN FUNCTIONS
- To tabular summaries
- To sequence
- To multiple alignment

Questions?