Protein Data Bank in Europe
Deposition, Validation, Search and Analysis Services
Gaurav Sahni, Ph.D.
EBI is an Outstation of the European Molecular Biology Laboratory.

Worldwide Protein Data Bank (wwPDB)
• Consists of four sites: RCSB (USA), PDBj (Japan), BMRB (USA) and PDBe.
• Single repository of macromolecular structures.
• Started in 1971; now holds ~61,000 entries and adds ~200 new entries per week.
• Structures are deposited by experimentalists and the contents are freely available.
• The archive is stored as flat files with a fixed-width line format, although an improved flat-file format (mmCIF) and XML are also available.

Protein Data Bank in Europe (PDBe) group
• Is one of the four sites around the world where 3D structures may be deposited.
• Provides a stable and clean repository of macromolecular structure data.
• Has services that allow users to access, search and retrieve structural data from a single web access point.

PDBe Tasks
• Deposition and Validation
• Database design and implementation
• Retrieve data
• Analysis tools & Services

Depositions and Curation
• Deposition via AutoDep4 (http://www.ebi.ac.uk/pdbe-xdep/autodep)
• Close collaboration with the other wwPDB members for a single unified archive.
• Depositions via EMDEP (http://www.ebi.ac.uk/pdbe-emdep/emdep)
• Depositions started June 2002

Validation of Structures
• Authentication of source
    That the protein is from human and not rabbit, for example!
• Authentication of structure
    Comparison of structure against raw data.
    Geometry and stereochemistry.
    Provide results back to depositor.
• Validation of correct methodology used
    Whether X-ray, NMR or EM.
• Conformity to standards
    Follows PDB format specifications.
• Error checks
• Consistency checks - to identify simple typos
    Homo sapiens and not Homo sapien (single human?).
• Outlier detection - to identify suspect records

Disadvantages of Flat files…
• Macromolecular structures are very complex.
• The existing PDB format cannot fully describe some of the structures already in the archive.
• The format is not readily extensible to cope with, for example, structural genomics data.
• The historical archive is non-uniform and poorly populated.
• Search and retrieval of flat files is difficult and/or inaccurate.

[Diagram: Uniform Data; PDBe Relational Database; Improved Query Functionality. User groups: Crystallographers, Biologists, Programmers, Bioinformaticians. Labels: Time, Effort, Usefulness, Usage.]

Some Implementation Issues
• The PDBe database is large and complex:
    ~61,000 PDB entries
    Cross-referenced against SwissProt, PubMed etc.
• Making data accessible without adding complexity.
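The contrast between fixed-format flat files and a relational database can be made concrete with a small sketch. It slices ATOM records by column position, following the PDB format's ATOM record layout, and loads them into an in-memory SQLite table so that a question such as "which residues have a CA atom?" becomes a one-line query. The helper names, the table schema and the coordinate values are illustrative assumptions and do not represent the actual PDBe schema.

```python
import sqlite3

# ATOM record fields as 0-based Python slices over the fixed-width line
# (the PDB format defines them as 1-based columns, e.g. x = columns 31-38).
ATOM_FIELDS = {
    "serial":   (slice(6, 11), int),
    "name":     (slice(12, 16), str),
    "res_name": (slice(17, 20), str),
    "chain":    (slice(21, 22), str),
    "res_seq":  (slice(22, 26), int),
    "x":        (slice(30, 38), float),
    "y":        (slice(38, 46), float),
    "z":        (slice(46, 54), float),
    "b_factor": (slice(60, 66), float),
}


def format_atom(serial, name, res_name, chain, res_seq, x, y, z, b_factor):
    """Build one fixed-width ATOM line (hypothetical helper).

    Assumes atom names of at most 3 characters and an occupancy of 1.00.
    """
    return (f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b_factor:6.2f}")


def parse_atom(line):
    """Slice one ATOM line into typed fields by column position."""
    return {key: cast(line[cols].strip())
            for key, (cols, cast) in ATOM_FIELDS.items()}


def load_atoms(lines):
    """Load ATOM records into an in-memory SQLite table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atom (serial INTEGER, name TEXT, res_name TEXT, "
        "chain TEXT, res_seq INTEGER, x REAL, y REAL, z REAL, b_factor REAL)"
    )
    rows = [tuple(parse_atom(line).values())
            for line in lines if line.startswith("ATOM")]
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


# Made-up records, generated so that the column alignment is explicit.
SAMPLE = [
    format_atom(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614, 9.67),
    format_atom(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842, 10.38),
    format_atom(3, "CA", "LYS", "A", 2, 26.850, 29.021, 3.898, 12.10),
]

conn = load_atoms(SAMPLE)
ca_residues = conn.execute(
    "SELECT res_name, res_seq FROM atom WHERE name = 'CA' ORDER BY res_seq"
).fetchall()
print(ca_residues)
```

With the flat file alone, the same question requires re-reading and re-slicing every line; once the records are in a table, the query engine handles filtering and ordering.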
Tools for different categories of end-user
• Simple - biobar
• Intermediate - PDBelite
• Advanced - PDBepro
• New - PDBeView

biobar
• A toolbar search application for Mozilla/Netscape or Firefox browsers
• http://biobar.mozdev.org/
• Simple and quick retrieval of data from PDBe and 45 other databases

PDBelite
• A simple form-based query system to search the PDBe databases

PDBelite Search Results

Features of Search Interface
• Strengths:
    • simple, easy-to-use form
    • allows multiple search fields to be combined
    • relatively fast, despite performing quite complex SQL queries
• Weaknesses:
    • does not expose the power of a relational database
    • limited logical operators between search fields:
        "name" AND "title" AND "keyword"
        "name" OR "title" OR "keyword"
        ( "name" OR "title" ) AND NOT "keyword"
    • the search form is defined by the authors of the search system, not the author of a query

PDBepro
• A Java-based flexible graphical search interface for advanced searching

Complex searches
• Users have comprehensive control of their query
• The applet provides a dynamic form, as compared to a static HTML form:
    • choose the fields to be searched
    • specify the relationships between search fields
    • choose the result fields and how results are presented
    • perform "complex" sub-queries, e.g. SSM, FASTA
• PDBepro uses an applet for constructing queries and a server to execute them
• The user describes their query entirely graphically, including logical operations such as AND, OR and NOT

PDBeView
Search result: The Atlas page
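The logical combinations of search fields discussed for PDBelite and PDBepro (AND, OR, NOT between "name", "title" and "keyword") can be sketched as a small query tree that is compiled into a parameterised SQL WHERE clause. This is only an illustration of the idea: the class names, the `entry` table and its rows are hypothetical and are not taken from the PDBe services.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Field:
    column: str   # which search field to match
    text: str     # substring to look for


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


@dataclass
class Not:
    operand: object


# Only these columns may appear in a query (guards against SQL injection
# through column names; values are always passed as parameters).
ALLOWED_COLUMNS = {"name", "title", "keyword"}


def to_sql(node):
    """Return (where_clause, parameters) for a query tree."""
    if isinstance(node, Field):
        if node.column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown search field: {node.column}")
        return f"{node.column} LIKE ?", [f"%{node.text}%"]
    if isinstance(node, Not):
        clause, params = to_sql(node.operand)
        return f"NOT ({clause})", params
    if isinstance(node, (And, Or)):
        operator = "AND" if isinstance(node, And) else "OR"
        left_clause, left_params = to_sql(node.left)
        right_clause, right_params = to_sql(node.right)
        return f"({left_clause} {operator} {right_clause})", left_params + right_params
    raise TypeError(f"unsupported node: {node!r}")


# Hypothetical table with made-up entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id TEXT, name TEXT, title TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("e1", "lysozyme", "Structure of hen lysozyme", "hydrolase"),
        ("e2", "kinase", "Lysozyme-like fold in a kinase", "transferase"),
        ("e3", "lysozyme", "Lysozyme mutant", "mutant"),
    ],
)

# ( "name" OR "title" ) AND NOT "keyword"
query = And(
    Or(Field("name", "lysozyme"), Field("title", "lysozyme")),
    Not(Field("keyword", "mutant")),
)
where, params = to_sql(query)
hits = [row[0] for row in
        conn.execute(f"SELECT id FROM entry WHERE {where} ORDER BY id", params)]
print(where)
print(hits)
```

Because the tree can be nested arbitrarily, the author of a query decides how fields are combined, which is the flexibility a fixed search form cannot offer.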
AstexViewer™: Visualization@PDBe
• View structures as wireframe, backbone or ribbons
• Built-in sequence viewer
• Calculate and display surfaces
• Various display options:
    • Ramachandran plots
    • Distance matrix
    • B-factors
Based on the AstexViewer™ from Astex Technology Limited and modified under licence by the PDBe group

PDBeChem
Ligand Database

PDBeSite
What is the environment around alpha-D-mannose and beta-D-mannose?

PDBeSite
What binds ASP ASP HIS LYS?

PDBeSite
How does ATP generally interact with LYS in all structures?

PDBeAnalysis
Assess Quality of a Structure
• Bond Distances
• Bond Angles
• Ramachandran Plot

PDBePisa
What assembly can my structure have?

PDBeFold
Discover unknown relationships…
• Are there any structures in the PDB that are similar to mine?
• What SCOP and/or CATH family could my structure belong to?
• Can I get some idea about the possible function of my protein from its structural similarity to others?
• Multiple alignment of many of my structures?

ChemSearch
Sub-structure based search of a million chemicals

PDBeAnalysis/PDBeValidate
Online PDB validation

PDBeStatus
PDB Deposition status search

PDBe provides…
• Clean biological data
• Integrated data
• A single web access point
• Query interfaces for different users (beginner, occasional or expert).
• Interconnected views of the data relating structure, sequence, text & experimental details.

[Diagram labels: PDBeChem ligand data; Sequence Mapping, SIFTS; Electron Density Visualisation; AstexViewer; PDBepro, PDBelite; Active sites; Linking to Domain data, eFamily; PISA biological assemblies; Fold matching; Surface Matching]