Structural_compariso..

advertisement
By: Z. S. Rezaei
Structural comparison










Structural alignment
spectrum
of structural alignment methods
The properties of output
Types of comparison
Algorithmic complexity
Representation of structures
Distance matrix
Methods
Alignment of large RNA molecules
The classes of scoring
Structural alignment





homology between two or more polymer (2)
a window into the distant past of protein evolution(1)
identification homologous(1)
imply evolutionary relationships between proteins that share
very little common sequence(2)
prediction of the functions and the family of the query
protein(2)
rely
on information about conformations.( from X-ray
crystallography or NMR spectroscopy or structure
prediction methods)
for evaluating prediction methods
spectrum
of structural alignment methods(1)
The properties of out put



a superposition of the atomic coordinate sets and a
minimal RMSD.
existence of multiple protein domains complicates the
Structural alignment
a set of superposed three-dimensional coordinates for
each input structure(2)
the root mean square (RMS)
(3)
A geometrical
system
Determination
uniquely a
spatial element
Coordinate system

Spatial coordinate

Planar coordinate
Types of comparisons

Structural superposition
used to compare multiple
conformations of the same protein
uses a simple least-squares fitting
algorithm(2)
 Alignment Algorithms based on
multidimensional rotations and modified
quaternions (2)
a number system In mathematics
a quaternion as the quotient of two
directed lines in a three-dimensional
 represented as the sum of a scalar and
a vector
(6)


Definition of
quaternion
Algorithmic complexity


Optimal solution
Approximate solution(2)
Optimal solution

The optimal "threading" shown to be NP-complete
 Strictly speaking, an optimal solution is only known
for certain protein structure similarity measures
 the algorithm for optimal solution is not practical
(2)
Approximate solution
Approximate
polynomial-time
algorithms for
structural alignment
theoretically classify
the approximate
protein structure
alignment(2)
Representation of structures
Protein structurrepresented in
some coordinate-independent
space
by constructing series of
matrices (2)
distance matrix
a two-dimensional matrix
distances between some subset
of the atoms (such as the alpha
carbons)
Reducing the protein to a coarse
metric such as secondary
structure elements (SSEs)(2)
Methods(2)

DALI
 Combinatorial extension(CE)
 GANGSTA+
 MAMMOTH
 ProBiS
 RAPIDO
 SABERTOOTH
 SSAP
 Spalign
 TOPOFIT
 SSM
DALI





distance alignment matrix method
breaks the input structures into hexapeptide fragments and
calculates a distance matrix
Distance matrix has two diagonals
conducted via a series of overlapping submatrices of size
6x6
Submatrix matches are reassembled into a final alignment
DALI



The original version used a Monte Carlo simulation
The DALI method has also been used to construct a
database known as FSSP (Fold classification based on
Structure-Structure alignment of Proteins, or Families of
Structurally Similar Proteins)
There is an searchable database based on DALI as well as a
downloadable program and web search based on a
standalone version known as DaliLite.
Montecarlo method
a class of computational algorithms
relies on repeated random sampling to
compute their results
especially useful for simulating systems
with many coupled degrees of freedom,
such as fluids, disordered materials,
strongly coupled solids, and cellular
structures (4)
http://ehkinda.biocenter.helsinki.fi/dali_server/
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
Combinatorial extension(CE)
is similar
to DALI
uses AFPs
to define a
similarity
matrix
A number
of
similarity
metrics are
possible
Combinatorial extension(CE)




initial AFP pair that nucleates the alignment
proceed with the next AFP
The RCSB PDB has recently released an updated
version of CE and FATCAT as part of the RCSB
PDB Protein Comparison Tool
provides a new variation of CE that can detect
circular permutations in protein structures
Circular permutations

A circular
permutation is a
relationship between
proteins whereby the
proteins have a
changed order of
amino acids in their
peptide sequence.
The result is a protein
structure with different
connectivity, but
overall similar threedimensional (3D)
shape(7)
GANGSTA+



A combinatorial algorithm for non-sequential
structural alignment of proteins
searching for similarity in databases
(http://agknapp.chemie.fu-berlin.de/gplus/)
evaluates based on contact maps and secondry
structure
MAMMOTH






MAtching Molecular Models Obtained from Theory
For comparing models coming from structure prediction
decompose the protein structure into heptapeptides
The similarity score between two heptapeptides is calculated
using a unit-vector RMS (URMS) method
These scores are stored in a similarity matrix
Derived from the likelihood of obtaining a given structural
alignment by chance
MAMMOTH-mult




extension of the MAMMOTH algorithm
is very fast
produces consistent and high quality structural
alignments
produces structurally implied sequence alignments
that can be further used for multiple-template
homology modeling
ProBiS






Protein Binding Sites. ProBiS
detects structurally similar sites on protein
surfaces
compares the query protein to members of a
database of protein 3D structures
Using an efficient maximum clique algorithm
Structural similarity scores are calculated for the
query protein’s surface residues, and are
expressed as different colors
used successfully for the detection of protein–
protein, protein–small ligand and protein–DNA
binding sites
RAPIDO

Rapid Alignment of Proteins In terms of Domains
 a web server for the 3D alignment of crystal
 using an approach based on difference distance
matrices
 The Matching Fragment Pairs (MFPs) are then
represented as nodes in a graph
 nodes in graph are chained together to form an
alignment by means of an algorithm for the
identification of the longest path on a DAG
(Directed Acyclic Graph).
 The final step: improve the quality of the alignment
SABERTOOTH

structural profiles to perform structural alignments
 has favourable scaling of computation time with
chain length
 SABERTOOTH can be used online at
http://www.fkp.tu-darmstadt.de/sabertooth/
SSAP







Sequential Structure Alignment Program
uses double dynamic programming
constructs its vectors from the beta carbons for all residues
except glycine
A series of matrices are constructed
Dynamic programming applied to each resulting matrix
matrices are then summed into a "summary" matrix to
Final dynamic programming is applied again to determine
the overall structural alignment
SSAP

originally produced only pairwise alignments
 but has since been extended to multiple
alignments as well
 applied in an all-to-all fashion to produce a
hierarchical fold classification scheme known as
CATH (Class, Architecture, Topology, Homology)

construct the CATH Protein Structure
Classification database
SPalign

Based on a new size-independent score
called SPscore for
 The source code for SPalign and the
server are available at
http://sparks.informatics.iupui.edu/yueyang/
server/SPalign/
TOPOFIT

Based on Delaunay tessellation (DT)
 identifies a feature point on the RMSD/Ne curve
 topomax point
 to detect conformational changes, topological
differences in variable parts
SSM


Secondary Structure Matching (SSM), or
PDBeFold at the Protein Data Bank in
Europe
uses graph matching followed by calpha alignment to compute alignments
Recent Developments
Tmalign
uses a novel method
for weighting its
distance matrix
correct for effects arising
from alignment lengths
RNA structural alignment



large RNA molecules also form
characteristic tertiary structures
A recent method for pairwise structural
alignment of RNA sequences
implemented in the program
FOLDALIGN
In low sequence identity cases
(1)
References
1.
2.
3.
4.
5.
6.
7.
Hitomi Hasegawa and Liisa Holm: Advances and pitfalls of protein
structural alignment, Current Opinion in Structural Biology 2009,
19:341–348
en.wikipedia.org/wiki/structural_alignment software
Cartwright, Kenneth V (Fall 2007). "Determining the Effective or RMS
Voltage of Various Waveforms without Calculus". Technology Interface
8 (1): 20 pages
Anderson, H.L. (1986). "Metropolis, Monte Carlo and the MANIAC".
Los Alamos Science 14: 96–108
Weisstein, Eric W., "Coordinate System" from MathWorld
Boris Abramovich Rozenfelʹd (1988). The history of non-euclidean
geometry: evolution of the concept of a geometric space. Springer. p.
385
Cunningham, B. A.; Hemperly, J. J.; Hopp, T. P.; Edelman, G. M.
(1979). "Favin versus concanavalin A: Circularly permuted amino acid
sequences". Proceedings of the National Academy of Sciences of the
United States of America 76 (7): 3218–3222
I am ready to answer your
questions
Download