Macromolecular Structure Database Project
EMSD
Infrastructure Services for Europe
To develop an autonomous structural database capability in Europe
http://www.ebi.ac.uk/msd

[Diagram of projects, partners and funding; labels in extraction order:] Temblor Advanced search EMBL Wellcome Trust EBI-MSD EU Spine CCP4 Structural Genomics harvesting Oxford CLRC EU Autostruct EHTPX Validation E-science York Daresbury BBSRC EU Integration NMRQual SCOP CATH pfam Utrecht Sanger Inst EU MRC Data Exchange IIMS CCPN Electron Microscopy Cambridge EBI-MSD BBSRC EU USA Grant & co-ordinator Grant Funding RCSB Core Funding BMRB Data Exchange

E-MSD Provides
clean biological data
integrated data
a single web access point
query interfaces for different users
interconnected views of the data relating structure, sequence, text & experimental details
For: Biologist, Chemist, Structural Biologist, Teacher

Web Interface
[Diagram labels in extraction order:] Query Results and Search Query Interactive viewer Keyword Sequence Structure Active site Ligand PDB Atlas page Structure Secondary Struct Sequence Active Site Expt data Folds- Scop/Dali Ligands Active Sites Sorted Hit List Medline SwissProt Methods FastA SSM

Web services
[Diagram labels in extraction order:] PDB Secondary Struct Data APIs Folds- Scop/Dali Ligands Active Sites Methods - as web services Medline SwissProt Methods FastA SSM Web based pages Search interfaces Interactive Visualisation

DATA INTEGRATION
A Database for all?
MSD SEARCH DATABASE

Data integration
We want to include all types of biological data:
Structure, Sequence, Textual
Observed biochemistry (Brenda)
Sequence annotation (Prints)
DNA - ORFs, SNPs
[Diagram labels:] PDB Secondary Struct Folds- Scop/Dali Ligands Active Sites Medline SwissProt
But we can’t do everything!
So can the Grid allow the integration of data from other sources?

Problems for Grid (1 - Provenance)
We are a funded institute. We have to be seen to be useful or we do not get funded!
Industry needs to be seen too - shareholders.
Origin of the distributed information: user and funding body need to see who provided the information.
How do we retain and present the detail of this?

Problem for Grid (2)
We do not know “best practice” in much of biology.
There will be conflicting information:
Methods: structure alignment, secondary structure…
Data: multiple coordinates, multiple sequence data…
Data and methods have associated validity information; different data or methods may be inconsistent only in part.
How is conflicting information going to be presented to, and filtered for, a user?
Who is going to assign data validity?

Grid problem (3 - Data access control)
Bioinformatics is fashionable at the moment.
There is a “problem” when something is perceived to be useful, e.g. there are about 60,000 patents in the US for the ~30,000 human genes. Not a problem yet, but…
This is more than data security:
Will Grid employ some good lawyers?
Will Grid hide information on request? (cf. PDB has “hold” status)
Will Grid “modify” information on request? (cf. Google search result order has been “updated”)

Summary
We want to be able to provide a scientific service: Web pages and Web services.
We would like to be able to expand the results to include information from other data resources.
These 3 issues are only a few of many, but they represent fundamental problems.

CLEAN DATA: Quaternary structure
[Diagram labels in extraction order:] Assembly Sub-Assembly Chains Biology Xray Experiment Atoms Residues

CLEAN DATA: Example of experimental result
Authors would know the structure; we have to derive it at submission.
Asymmetric unit (M. Bochtler et al., Nature, 403, 800 (2000)): contains 3 separate molecules - 2 copies of a dodecamer and 1 hexamer.
[Figure labels:] Dodecamer Hexamer Assembly
http://pqs.ebi.ac.uk

Clean data: RESOLUTION SLIDING SCALE FOR RULES
Electron density at different resolutions (phenylalanine)
Correctly placed into the 1.2 Å data. This can still be done with confidence in the 2 Å case.
But at 3 Å we already observe a deviation of the centroid of the ring from the correct model.

Zscore = (Fit - <Fit>) / sigma
A large positive spike is indicative of a residue which is worse than the average for that residue type in structures of similar resolution.
[Plot labels:] 1qi3 1f83 1rmg Terrible Good

Geometric outliers
PHENYLALANINE

[Diagram labels in extraction order:] LIGAND DB Loader Site environment DB Covalent Bonds Coordinate bonds Hydrogen bonds Planes Non-bonding Electrostatics Di-Sulphide bonds N ASP PHE O S PHE VAL
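
The residue Z-score from the validation slide, Zscore = (Fit - <Fit>) / sigma, can be illustrated with a short sketch. This is not the MSD implementation. It assumes that Fit is a "badness of fit" value where larger means worse, that <Fit> and sigma are the mean and standard deviation of Fit for the same residue type in reference structures of similar resolution, and that the reference values are already available. The function names, the input layout and the default threshold are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscores(fits, reference_fits):
    """Compute Zscore = (Fit - <Fit>) / sigma for each residue.

    fits: list of (residue_type, fit_value) pairs for one structure.
    reference_fits: dict mapping residue_type to a list of fit values taken
        from reference structures of similar resolution (assumed input).
    """
    # Mean and standard deviation per residue type from the reference set.
    stats = {
        res_type: (mean(values), pstdev(values))
        for res_type, values in reference_fits.items()
    }

    zscores = []
    for res_type, fit in fits:
        mu, sigma = stats[res_type]
        # Guard against a degenerate reference set with no spread.
        zscores.append((fit - mu) / sigma if sigma > 0 else 0.0)
    return zscores


def flag_outliers(zscores, threshold=2.0):
    """Return indices of residues whose Z-score exceeds the threshold.

    The threshold is an illustrative choice, not a value from the slides.
    """
    return [i for i, z in enumerate(zscores) if z > threshold]
```

Calling fit_zscores with the residues of one structure and then flag_outliers on the result gives the positions of the "large positive spikes" described above, i.e. residues that fit worse than is typical for their type at that resolution.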