BRIDGES Status Report Dr Richard Sinnott

Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
ros@dcs.gla.ac.uk
18th March 2004
Overview
Review goals of Bridges project
Briefly summarise technical approach
Outline achievements thus far
Demonstration
Plans for the future
Bridges Goals
High blood pressure affects 25% of adults in western societies
Cardiovascular Functional Genomics (CFG) project investigating
this through physiological models of hypertension in rat
Bridges is a supporting project to CFG and will provide Grid
infrastructure to facilitate scientific research
CFG project partners are distributed but need to access and
integrate various software and especially data resources
The main aim of BRIDGES is to develop re-useable infrastructure
for data federation that addresses the relevant security
concerns
CFG Partner Distribution
[Figure: diagram of the CFG partner sites (Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands). Each site holds private data; shared data and public curated data are also shown.]
Problems to be addressed
BRIDGES will address the following problems facing CFG biologists
How to integrate data with multiple levels of security including public data,
project only data and private data?
How to search multiple distributed databases through single optimised
queries?
How to use multiple tools in a coordinated (and automated) manner, e.g.
how to develop re-useable workflows for the CFG scientists?
Integration of a range of bioinformatics analysis and visualisation tools, e.g.
BLAST, genome browsers, etc.
How to deal with inconsistencies of online databases and possible “dirty
data”?
How to get more “up to date” data?
Make it all user friendly…
portals,
hidden infrastructure, e.g. security authorisation
Planned Approach
BRIDGES will address these problems through
Development of re-useable Grid services based upon GT3 technologies
Virtualisation of multiple distributed data sets to provide a single virtual
data set for use by the biologists – exploiting IBM’s DiscoveryLink
Developing a collection of data on a well-managed platform, including
copies of extracts of relevant public data, all project data, and the required
software tools (administered using DB2 and DiscoveryLink)
Access to and integration of multiple distributed data sets in a Grid
environment using results from the OGSA-DAI/DAIT projects
A secure environment offering authentication and authorisation
will build on results of the PERMIS security authorisation project
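The federation and security aims above can be illustrated with a minimal sketch. It does not use DiscoveryLink, OGSA-DAI or PERMIS; the `Record`, `User`, `may_read` and `federated_query` names, the access policy and the sample gene name are all assumptions made for illustration, not the project's design.

```python
from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    """The three security levels named in the slides."""
    PUBLIC = 0
    PROJECT = 1
    PRIVATE = 2


@dataclass(frozen=True)
class Record:
    gene: str
    site: str          # hypothetical: partner site that owns the record
    level: Access


@dataclass(frozen=True)
class User:
    name: str
    site: str
    in_project: bool


def may_read(user: User, record: Record) -> bool:
    """Assumed policy: public data for everyone, project data for project
    members, private data only for project members at the owning site."""
    if record.level is Access.PUBLIC:
        return True
    if record.level is Access.PROJECT:
        return user.in_project
    return user.in_project and user.site == record.site


def federated_query(sources, user, gene):
    """Query every source and merge the records the user is allowed to see,
    so the caller works with one virtual data set."""
    return [
        rec
        for source in sources
        for rec in source
        if rec.gene == gene and may_read(user, rec)
    ]


# Illustrative data only: three sources standing in for distributed databases.
sources = [
    [Record("Ace", "public", Access.PUBLIC)],
    [Record("Ace", "Glasgow", Access.PROJECT),
     Record("Ace", "Glasgow", Access.PRIVATE)],
    [Record("Ace", "Oxford", Access.PRIVATE)],
]
glasgow_user = User("alice", "Glasgow", in_project=True)
outsider = User("bob", "elsewhere", in_project=False)
```

With this sample data a Glasgow project member sees the public record plus both Glasgow records, while a user outside the project sees only the public record.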
Bridges team
Project Management
Richard Sinnott
Dave Berry
Database Design/Development
Derek Houghton
Grid Services Developer
Micha Bayer
Magnus Ferrier
Technical Input
David White, Jean-Christophe Mestres, Andy Knox, Emmanuel
Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow)
Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak
Achievements
Web site and project portal established
http://europa.nesc.gla.ac.uk/wps/portal
Engaged with CFG consortia
Staff trained in relevant technologies
GT3, DiscoveryLink, Condor
Initial version of local repository developed
Populated with data that cannot be federated
e.g. public data sets with no programmatic interface
– Ensembl/EMBL-EBI, NCBI GenBank, RefSeq, Gene Expression Omnibus,
UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UniGene, LocusLink, GenMAPP,
OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, PRINTS, CATH, SCOP, PROSITE,
Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, …
Includes shared data sets of CFG scientists
QTL DB, …
Achievements …ctd
GT3-based Grid services offered that allow
use of these local data sets
Grid enabled BLAST services produced
Offer access to large e-Science infrastructures at Glasgow (ScotGrid)
SyntenyVista tool extended to allow Grid enabled
visual navigation of genomic data sets
Planned front end for many other tools
Externally
Poster at AHM 2003
Tutorial submitted to ISMB/ECCB (the major
bioinformatics conference)
Liaising with other projects
eDIKT, myGrid, GeneGrid, PERMIS, ...
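As an illustration of the Grid enabled BLAST services listed above: the slides do not say how the service distributes work, but one common approach is to split a set of query sequences into batches that can be run as independent jobs. The sketch below shows only that splitting step; `parse_fasta` and `split_into_batches` are hypothetical helper names, and job submission is deliberately left out.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(records, n_batches):
    """Deal records round-robin into at most n_batches non-empty batches."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [records[i::n_batches] for i in range(n_batches)]
    return [batch for batch in batches if batch]


# Illustrative input only: five short made-up query sequences.
example = """>q1
ACGT
ACGT
>q2
GGCC
>q3
TTAA
>q4
CCGG
>q5
ATAT
"""
queries = parse_fasta(example)
batches = split_into_batches(queries, 2)
```

Each batch could then be handed to a separate worker, with the per-batch results concatenated afterwards.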
Achievements …ctd
Demonstration of some of the achievements
Plans
Refine and extend requirements
Further refinement of use cases & scenarios
More data sets (public, shared, private, …)
Implementation and realisation of further use cases
e.g. extended query services for microarray data interpretation, workflows
for probe set mapping, …
Security realisation and roll-out
We can only help share CFG data sets if we can get SECURE access to
them; following up with CFG sites
Authorisation with PERMIS coming
GSI based authentication
Investigate application of replication manager (RLS)
Should support the illusion that data from each site is available to all other sites
Further Grid based data visualisation services accessible via
SyntenyVista
Ensure that we keep track of relevant developments (WSRF, GT4, …)
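The replication idea in the plans above (data from each site appearing to be available at every other site) rests on mapping a logical data set name to the physical copies that hold it. The sketch below is a toy in-memory catalogue that illustrates that mapping; it is not the Globus RLS API, and the class, method and data set names are assumptions.

```python
class ReplicaCatalogue:
    """Toy catalogue mapping a logical data set name to the sites holding a copy."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of site names

    def register(self, logical_name, site):
        """Record that `site` holds a copy of `logical_name`."""
        sites = self._replicas.setdefault(logical_name, [])
        if site not in sites:
            sites.append(site)

    def locate(self, logical_name, preferred_site=None):
        """Return a site holding the data set, preferring the caller's own site."""
        sites = self._replicas.get(logical_name)
        if not sites:
            raise KeyError(logical_name)
        if preferred_site in sites:
            return preferred_site
        return sites[0]


# Illustrative entries only; the data set name is made up.
catalogue = ReplicaCatalogue()
catalogue.register("qtl_db", "Glasgow")
catalogue.register("qtl_db", "Edinburgh")
```

A user at Edinburgh asking for `qtl_db` would be directed to the local copy, while a user at a site without a replica would be sent to the first registered one.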
Future Vision of Tools via Portal
[Figure: portal view with drill-down functions: to tabular summaries, to sequence, to multiple alignment]
Questions?