Introduction_02Nov2009

Protein Data Bank in Europe
Deposition, Validation, Search and Analysis Services
Gaurav Sahni, Ph.D.
EBI is an Outstation of the European Molecular Biology Laboratory.
worldwide Protein Data Bank (wwPDB)
• Consists of four sites: RCSB (USA), PDBj (Japan), BMRB (USA) and PDBe.
• Single repository of macromolecular structures.
• Started in 1971; now holds ~61,000 entries, adding ~200 new entries/week.
• Deposited by experimentalists; contents are freely available.
• The archive format is flat files with a fixed line format, although an improved flat-file format (mmCIF) and XML are also available.
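To illustrate what "fixed line format" means in practice, here is a minimal Python sketch that reads one ATOM record by column position. The column ranges follow the published PDB format for ATOM/HETATM records; the `Atom` class, the `parse_atom_line` helper and the example values are illustrative assumptions, not part of any PDBe tool.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by slicing its fixed columns."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record with made-up values, built from explicit field widths
# so that every field lands in its required column.
example = "%-6s%5d %-4s %3s %1s%4d%4s%8.3f%8.3f%8.3f" % (
    "ATOM", 1, " N", "MET", "A", 1, "", 27.340, 24.430, 2.614)

atom = parse_atom_line(example)
print(atom)
```

Because every field is located by position rather than by a delimiter, a value that outgrows its column width cannot be represented, which is one reason the format is hard to extend.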
Protein Data Bank in Europe (PDBe) group
• Is one of the four sites around the world where 3D structures
may be deposited.
• Provides a stable and clean repository of macromolecular structure
data.
• Has services that allow users to access, search and retrieve
structural data from a single web access point.
PDBe Tasks
Deposition and Validation
Database design and implementation
Retrieve data
Analysis tools & Services
Depositions and Curation
Deposition via AutoDep4
(http://www.ebi.ac.uk/pdbe-xdep/autodep)
Close collaboration with the other wwPDB
members to maintain a single unified archive.
Depositions via EMDEP
(http://www.ebi.ac.uk/pdbe-emdep/emdep)
Depositions started June 2002
Validation of Structures
• Authentication of source
  That the protein is from human and not rabbit, for example!
• Authentication of structure
  Comparison of structure against raw data. Geometry and stereochemistry.
  Provide results back to depositor.
• Validation of correct methodology used
  Whether X-ray, NMR or EM.
• Conformity to standards
  Follows PDB format specifications.
• Error checks
• Consistency checks - to identify simple typos
  Homo sapiens and not Homo sapien (single human?).
• Outlier detection - to identify suspect records
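The two kinds of check listed last can be sketched in Python as follows. This is only an illustration under stated assumptions: the organism list, the similarity cutoff, the MAD-based outlier rule and its threshold of 3.5 are choices made for this example, not the checks PDBe actually runs.

```python
import difflib
import statistics

# Hypothetical controlled vocabulary; a real one would be far larger.
KNOWN_ORGANISMS = [
    "Homo sapiens",
    "Mus musculus",
    "Oryctolagus cuniculus",
    "Escherichia coli",
]


def check_organism(name):
    """Consistency check: return (is_valid, suggested_correction)."""
    if name in KNOWN_ORGANISMS:
        return True, None
    matches = difflib.get_close_matches(name, KNOWN_ORGANISMS, n=1, cutoff=0.8)
    return False, (matches[0] if matches else None)


def find_outliers(values, threshold=3.5):
    """Outlier detection using a modified z-score based on the
    median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # all values (nearly) identical: nothing to flag
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


print(check_organism("Homo sapien"))
# Made-up bond lengths in angstroms; the last one is suspect.
print(find_outliers([1.52, 1.53, 1.51, 1.52, 1.54, 1.53, 2.10]))
```

A median-based rule is used here because a single extreme record would inflate a mean and standard deviation and could hide itself.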
PDBe Tasks
Deposition site
Database design and implementation
Retrieve data
Analysis tools & Services
Disadvantages of Flat files…
• Macromolecular structures are very complex.
• The existing PDB format cannot fully describe even some
existing structures.
• Format is not readily extensible, to cope, for example,
with structural genomics data.
• Historical archive is non-uniform and poorly populated.
• Search and retrieval of flat files is difficult and/or
inaccurate.
[Figure: uniform data in the PDBe relational database gives improved query functionality. User groups shown: crystallographers, biologists, programmers, bioinformaticians. Axis labels: time, effort, usefulness, usage.]
PDBe Tasks
Deposition site
Database design and implementation
Retrieve data
Analysis tools & Services
Some Implementation Issues
• The PDBe database is large and complex:
  • ~61,000 PDB entries
  • Cross-referenced against SwissProt, PubMed etc.
• Making data accessible without adding complexity.
• Tools for different categories of end-user:
  • Simple - biobar
  • Intermediate - PDBelite
  • Advanced - PDBepro
  • New - PDBeView
biobar
A toolbar search application for
Mozilla, Netscape or Firefox browsers
http://biobar.mozdev.org/
Simple and quick retrieval of data from PDBe and 45 other Databases
PDBelite
A simple form-based query system to search the PDBe
Databases
PDBelite Search Results
Features of Search Interface
• Strengths:
• simple, easy to use form
• allows multiple search fields to be combined
• relatively fast, despite performing quite complex SQL queries
• Weaknesses:
• not exposing the power of a relational database
• limited logical operators between search fields:
  • "name" AND "title" AND "keyword"
  • "name" OR "title" OR "keyword"
  • ( "name" OR "title" ) AND NOT "keyword"
• the search form is defined by the authors of the search system,
not the author of a query
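The three fixed operator patterns above can be pictured as three SQL WHERE clauses. The sketch below uses an in-memory SQLite table; the table name `entry`, its columns and the rows (with made-up identifiers) are hypothetical and do not reflect the real PDBe schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (pdb_id TEXT, name TEXT, title TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("0aaa", "lysozyme", "Structure of hen lysozyme", "hydrolase"),
        ("0bbb", "kinase", "Kinase bound to ATP", "transferase"),
        ("0ccc", "lysozyme", "Lysozyme mutant with ATP analogue", "hydrolase"),
    ],
)


def search(where, params):
    """Run one of the fixed WHERE patterns with bound parameters."""
    sql = "SELECT pdb_id FROM entry WHERE " + where + " ORDER BY pdb_id"
    return [row[0] for row in conn.execute(sql, params)]


# "name" AND "title" AND "keyword"
and_hits = search(
    "name LIKE ? AND title LIKE ? AND keyword LIKE ?",
    ("%lysozyme%", "%ATP%", "%hydrolase%"))

# "name" OR "title" OR "keyword"
or_hits = search(
    "name LIKE ? OR title LIKE ? OR keyword LIKE ?",
    ("%kinase%", "%hen%", "%nomatch%"))

# ( "name" OR "title" ) AND NOT "keyword"
and_not_hits = search(
    "(name LIKE ? OR title LIKE ?) AND NOT keyword LIKE ?",
    ("%lysozyme%", "%ATP%", "%hydrolase%"))

print(and_hits, or_hits, and_not_hits)
```

The form author chooses which of these patterns exist; a user who needs a different combination has no way to express it.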
PDBepro
A Java-based flexible
graphical search
interface for
advanced searching
Complex searches
• Users have comprehensive control of their query
• The applet provides a dynamic form, as compared to a static
HTML form:
  • choose the fields to be searched
  • specify the relationships between search fields
  • choose the result fields and how results are presented
  • perform "complex" sub-queries, e.g. SSM, FASTA
• PDBepro uses an applet for constructing queries and a
server to execute them
• The user describes their query entirely graphically,
including logical operations such as AND, OR and NOT
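One way to picture a user-built query is as a small tree of AND, OR and NOT nodes that a server turns into SQL. The sketch below is a generic illustration of that idea, not PDBepro's actual design; the class names, the `ALLOWED_COLUMNS` whitelist and the `to_sql` function are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical searchable columns; column names are checked against this
# whitelist because they cannot be passed as bound parameters.
ALLOWED_COLUMNS = {"name", "title", "keyword"}


@dataclass
class Field:
    column: str
    pattern: str


@dataclass
class And:
    parts: list


@dataclass
class Or:
    parts: list


@dataclass
class Not:
    part: object


def to_sql(node):
    """Return (where_clause, params) for a query tree."""
    if isinstance(node, Field):
        if node.column not in ALLOWED_COLUMNS:
            raise ValueError("unknown column: " + node.column)
        return node.column + " LIKE ?", [node.pattern]
    if isinstance(node, Not):
        sql, params = to_sql(node.part)
        return "NOT (" + sql + ")", params
    if isinstance(node, (And, Or)):
        joiner = " AND " if isinstance(node, And) else " OR "
        pieces, params = [], []
        for child in node.parts:
            sql, child_params = to_sql(child)
            pieces.append("(" + sql + ")")
            params.extend(child_params)
        return joiner.join(pieces), params
    raise TypeError("unsupported node: " + repr(node))


query = And([
    Or([Field("name", "%kinase%"), Field("title", "%ATP%")]),
    Not(Field("keyword", "%hydrolase%")),
])
where, params = to_sql(query)
print(where)
print(params)
```

Because the tree can nest to any depth, the user rather than the form author decides how the search fields relate to each other.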
PDBeView
Search result: The Atlas page
PDBe Tasks
Deposition site
Database design and implementation
Retrieve data
Analysis tools & Services
AstexViewer™: Visualization@PDBe
• View structures as wireframe,
backbone or ribbons
• Built-in sequence viewer
• Calculate and display surfaces
• Various display options:
• Ramachandran plots
• Distance matrix
• B-factors
Based on the AstexViewer™ from Astex Technology Limited
and modified under licence by the PDBe group
PDBeChem
Ligand Database
PDBeSite
What is the environment around alpha-D-mannose and beta-D-mannose?
PDBeSite
What binds ASP ASP HIS LYS ?
PDBeSite
How does ATP generally interact with LYS in all structures ?
PDBeAnalysis
Assess Quality of a Structure
Bond Distances
Bond Angles
Ramachandran Plot
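The three quality measures above all reduce to simple geometry on atomic coordinates. The following Python sketch shows the arithmetic; it is a generic illustration, not PDBeAnalysis code, and the function names are chosen for this example. A Ramachandran plot charts two torsion angles per residue: phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue).

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bond_distance(p0, p1):
    """Distance between two atoms."""
    return norm(sub(p1, p0))


def bond_angle(p0, p1, p2):
    """Angle in degrees at p1 between the bonds p1-p0 and p1-p2."""
    u, v = sub(p0, p1), sub(p2, p1)
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(b2, cross(n1, n2))
    x = norm(b2) * dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates chosen so the answers are easy to check by hand.
print(bond_distance((0, 0, 0), (3, 4, 0)))
print(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Validation then amounts to comparing such values against expected ranges and flagging those that fall far outside them.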
PDBePisa
What assembly can my structure have ?
PDBeFold
Discover unknown relationships…
• Are there any structures in the PDB that are similar to mine?
• What SCOP and/or CATH family could my structure belong to?
• Can I get some idea of the possible function of my protein
from its structural similarity to others?
• Multiple alignment of many of my structures?
ChemSearch
Sub-structure based search of a million chemicals
PDBeAnalysis/PDBeValidate
Online PDB validation
PDBeStatus
PDB Deposition status search
PDBe provides…
• Clean biological data
• Integrated data
• A single web access point
• Query interfaces for different users (beginner,
occasional or expert).
• Interconnected views of the data relating
structure, sequence, text & experimental
details.
[Diagram of linked PDBe services: PDBeChem ligand data; sequence mapping, SIFTS; electron density visualisation; AstexViewer; PDBepro, PDBelite; active sites; linking to domain data, eFamily; PISA biological assemblies; fold matching; surface matching; download]