PDBeFold (SSM)
http://www.pdbe.org/fold
A web-based service for protein structure comparison and
structure searches
EBI is an Outstation of the European Molecular Biology Laboratory.
Structure alignment
Structure alignment may be defined as identification of residues occupying “equivalent” geometrical positions
• Unlike in sequence alignment, residue type is neglected
• Used for
  • measuring the structural similarity
  • protein classification and functional analysis
  • database searches
Sequence and Structure Alignments
Sequence alignment
Based on residue identity, sometimes with a modified alphabet
--AARNEDDDGKMPSTF-L
E-AARNFG-DGK--STFIL
Used for:
• evolution studies
• protein function analysis
• guessing at structure similarity
Algorithms: Dynamic programming + heuristics
Applications: BLAST, FASTA, FLASH and others
Structure alignment
Based on geometrical equivalence of residue positions, residue type disregarded
Used for:
• protein function analysis
• some aspects of evolution studies
Algorithms: Dynamic programming, graph theory, Monte Carlo (MC), geometric hashing and others
Applications: DALI, VAST, CE, MASS, SSM and others
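As a small illustration of what "based on residue identity" means in practice, the sketch below computes the fraction of identical residues over the aligned (non-gap) columns of the two-row example shown on this slide. The function name and the choice of denominator (columns where neither row has a gap) are assumptions made for illustration; other tools may normalise differently.

```python
def sequence_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of identical residues among columns where neither row has a gap.

    The denominator used here is an assumption; other conventions exist.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# The two aligned rows from the slide.
ROW_1 = "--AARNEDDDGKMPSTF-L"
ROW_2 = "E-AARNFG-DGK--STFIL"

identity = sequence_identity(ROW_1, ROW_2)
print(f"sequence identity over aligned columns: {identity:.3f}")
```

A structure alignment would pair residues by position in space instead, so two rows could be structurally well aligned even where this number is low.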
Methods
• Many methods are known:
  • Distance matrix alignment (DALI, Holm & Sander, EBI)
  • Vector alignment (VAST, Bryant et al., NCBI)
  • Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
  • Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
  • Dynamic programming on Cα (Gerstein & Levitt)
  • Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
  • many more …
• SSM employs a 2-step procedure:
  A. Initial structure alignment and superposition using SSE graph matching
  B. Cα-alignment
Three-dimensional graph matching
• Protein secondary structure elements (SSEs) are natural and convenient objects for building three-dimensional graphs.
• Secondary structure provides most of the functionality and is conserved through evolution.
• Details of the protein fold are expressed in terms of two types of SSE: helices and strands.
SSE graph matching
[Figure: SSE graphs of two structures, A and B; vertices are labelled as helices (H1 to H6) and strands (S1 to S7).]
Matching the SSE graphs yields a correspondence between secondary structure elements, that is, groups of residues. The correspondence may be used as an initial guess for structure superposition and alignment of individual residues.
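The idea of matching SSE graphs can be sketched in a few lines. This is a deliberately simplified illustration, not the SSM algorithm itself: each SSE is reduced to a type ("H" or "S") plus a single representative point, edges carry only the distance between those points, and the largest consistent correspondence is found by plain backtracking. The function name, the data layout and the tolerance `tol` are all assumptions made for this example.

```python
import math


def match_sse_graphs(sses_a, sses_b, tol=1.0):
    """Largest set of (index_in_a, index_in_b) pairs with matching SSE types
    and pairwise distances that agree within `tol`.

    Each SSE is a tuple (kind, (x, y, z)); this reduced representation is an
    assumption made for illustration.
    """
    best = []

    def compatible(i, j, mapping):
        if sses_a[i][0] != sses_b[j][0]:  # helix must match helix, strand must match strand
            return False
        for p, q in mapping:  # every edge to already matched vertices must agree
            d_a = math.dist(sses_a[i][1], sses_a[p][1])
            d_b = math.dist(sses_b[j][1], sses_b[q][1])
            if abs(d_a - d_b) > tol:
                return False
        return True

    def extend(i, mapping, used_b):
        nonlocal best
        if len(mapping) > len(best):
            best = list(mapping)
        remaining = len(sses_a) - i
        if len(mapping) + remaining <= len(best):  # cannot beat the best found so far
            return
        for j in range(len(sses_b)):
            if j not in used_b and compatible(i, j, mapping):
                mapping.append((i, j))
                used_b.add(j)
                extend(i + 1, mapping, used_b)
                mapping.pop()
                used_b.remove(j)
        extend(i + 1, mapping, used_b)  # also try leaving SSE i unmatched

    extend(0, [], set())
    return best


# Hypothetical toy structures: B contains three SSEs of A, shifted by
# (10, 0, 0) and listed in a different chain order, plus an unrelated helix.
STRUCTURE_A = [("H", (0, 0, 0)), ("S", (5, 0, 0)), ("S", (5, 5, 0)), ("H", (0, 10, 0))]
STRUCTURE_B = [("S", (15, 5, 0)), ("H", (10, 0, 0)), ("S", (15, 0, 0)), ("H", (30, 30, 30))]

matching = match_sse_graphs(STRUCTURE_A, STRUCTURE_B)
print(matching)
```

Because only geometry is compared here, the match is found even though the SSEs appear in a different order along the two chains.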
What next?
• We have considered the three-dimensional arrangement of secondary structure elements (SSEs) regardless of their ordering in the protein chain.
• Connectivity of SSEs is significant (although it can be neglected when comparing mutated/engineered proteins).
• In previous methods connectivity was either preserved or neglected.
PDBeFold (SSM) Approach – a more flexible way
• There are three options:
1) Connectivity of SSEs neglected
[Figure: two structures with different SSE connectivity whose SSE graphs are geometrically identical]
2) Soft connectivity – the general order of SSEs along their protein chains is the same in both structures, BUT any number of missing/unmatched SSEs between matched ones is allowed
3) Strict connectivity – matched SSEs follow the same order along their protein chains and are separated only by an equal number of matched/unmatched SSEs in both structures
• To obtain a 3D alignment of individual residues, represent them by their C-alpha atoms and use the results of graph matching as a starting point
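The three connectivity options can be read as three different tests applied to a candidate SSE correspondence. The sketch below assumes a matching is given as pairs of SSE indices counted along each chain; the function name and the mode labels "none", "soft" and "strict" are hypothetical and chosen only for this illustration.

```python
def satisfies_connectivity(matching, mode):
    """Check a one-to-one SSE matching [(index_in_a, index_in_b), ...]
    against one of the three connectivity options.

    Indices are positions of SSEs along each chain. Mode labels are
    hypothetical names for the options described above.
    """
    pairs = sorted(matching)  # order by position along chain A
    # Step sizes between consecutive matched SSEs in chain A and in chain B.
    steps = [(i2 - i1, j2 - j1) for (i1, j1), (i2, j2) in zip(pairs, pairs[1:])]
    if mode == "none":
        return True  # connectivity neglected: any geometric match is accepted
    if mode == "soft":
        # Same general order in both chains; unmatched SSEs in between are allowed.
        return all(dj > 0 for _, dj in steps)
    if mode == "strict":
        # Same order, and the same number of SSEs between matched ones in both chains.
        return all(di == dj for di, dj in steps)
    raise ValueError(f"unknown connectivity mode: {mode!r}")


shuffled = [(0, 1), (1, 2), (2, 0)]     # same geometry, different chain order
gapped = [(0, 0), (1, 2), (3, 3)]       # same order, unequal gaps between matches
in_register = [(0, 1), (1, 2), (3, 4)]  # same order, equal gaps between matches

for name, m in [("shuffled", shuffled), ("gapped", gapped), ("in_register", in_register)]:
    print(name, [satisfies_connectivity(m, mode) for mode in ("none", "soft", "strict")])
```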
Cα-alignment
• SSE-alignment is used as an initial guess for Cα-alignment
• Cα-alignment is an iterative procedure based on the expansion of shortest contacts at best superposition of structures
[Figure: superposed chain A and chain B, with matched helices and matched strands highlighted]
• Cα-alignment is a compromise between the alignment length N_align and r.m.s.d. Longest contacts are unmapped in order to maximise the Q-score:
$$Q = \frac{N_{\mathrm{align}}^{2}}{\left(1 + \left(\mathrm{r.m.s.d.}/R_{0}\right)^{2}\right) N_{A} N_{B}}$$
Multiple structure alignment
• More than 2 structures are aligned simultaneously
• Multiple alignment is not equal to the set of all-to-all pairwise alignments
• Helps to identify common structure motifs for a whole family of structures
PDBeFold output
• Table of matched Secondary Structure Elements
• Table of matched backbone Cα-atoms with distances between them at best structure superposition
• Rotation-translation matrix of best structure superposition
• Visualisation in Jmol and RasMol
• r.m.s.d. of Cα-alignment
• Length of Cα-alignment N_align
• Number of gaps in Cα-alignment
• Quality score Q
• Statistical significance scores P(S), Z
• Sequence identity
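Several of these outputs are related: applying the rotation-translation matrix to one structure and measuring distances between matched Cα atoms gives the per-residue distance table and the r.m.s.d. The sketch below shows that relationship under an assumed layout, a 3x4 matrix whose rows are [r1, r2, r3, t]. The layout, the function names and the toy coordinates are assumptions for illustration and do not describe the actual PDBeFold output format.

```python
import math


def apply_transform(matrix, point):
    """Apply a rotation-translation given as 3 rows [r1, r2, r3, t] to a point.

    The 3x4 row layout is an assumption made for this example.
    """
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


def superposed_distances(matrix, fixed, moving):
    """Distances between matched atoms after moving the second structure."""
    return [math.dist(p, apply_transform(matrix, q)) for p, q in zip(fixed, moving)]


def rmsd(distances):
    """Root-mean-square of a list of distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical data: a 90-degree rotation about z followed by a shift of (1, 2, 3).
MATRIX = [
    [0, -1, 0, 1],
    [1, 0, 0, 2],
    [0, 0, 1, 3],
]
MOVING_CA = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # matched C-alpha atoms of the moving chain
FIXED_CA = [(1, 3, 3), (0, 2, 3), (1, 2, 5)]   # their partners in the fixed chain

distances = superposed_distances(MATRIX, FIXED_CA, MOVING_CA)
print(distances, rmsd(distances))
```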
The PDBeFold Search Interface
The Results Page For Pairwise Alignment
Analyzing the result from a particular pairwise alignment
Residue-by-Residue Structure alignment result
Multiple 3D alignment using PDBeFold
Results from multiple 3D alignment
Conclusion
• it is quite possible that residue identity plays a much less significant role in protein structure than often believed
• as a consequence, the role of residue identity in protein function may often be overestimated
• using sequence identity for the assessment of structural or functional features may give more false negatives than expected
• physical-chemical properties of residues should be given preference over residue identity in structure and function analysis
• modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies
If you have to ask…
• Are there any structures in the PDB that are similar to mine?
• What SCOP and/or CATH family could my structure belong to?
• Can I get some idea about the possible function of my protein based on structural similarity with others?
• Multiple alignment of many of my structures?
Use PDBeFold.
Upload your own PDB file for analysis!
31.10.07
Macromolecular Structure Database