The Greek Key Motif - University of Waterloo

The Greek Key Motif
Shuo Xiang (Alex)
Dr. Ming Li
CS 882 Course Project
Presentation
Fall 2006
Outline
Introduction
  What is a Greek key?
  Where do Greek keys occur?
  History of the Greek key motif
Formal Definition
  Preparatory knowledge
  Formal definition of Greek key
  Classification of Greek key
Operational Definition and Machine recognition of Greek keys
  Motivation
  PDB
  DSSP
  PROMOTIF
  Greek key hunter
Greek key hunter in action
  Setup
  Results
Part 1
Introduction
What is a Greek Key
A Greek key is a
series of four
consecutive β strands
taking on the
conformations shown
to the right when
viewed in a topology
diagram (Branden
and Tooze, 1999)
What is a Greek key
Note, however, that topology
diagrams are a simplified
way of representing
proteins. In a real structure, a Greek
key looks more like the
object shown to the right.
The picture is generated by
PyMOL on PDB file 4GCR
for γ crystallin with residues
34-62 displayed and
everything else masked.
What is a Greek Key
Greek keys were so named because
of their visual resemblance to the decorative
patterns on ancient Greek vases,
shown below (Li, 2006)
Where do Greek keys occur?
Being a β-motif, Greek keys obviously
occur only in proteins having β-strands.
This means that α-only proteins such as
myoglobin and hemoglobin will not have
Greek keys
From Dr. Li’s lectures, we also know that
γ-crystallins are a very important class of
proteins whose Greek key motifs have
evolutionary significance
Where do Greek keys occur?
According to Dr. Hutchinson and Dr.
Thornton, (Hutchinson and Thornton,
1993), Greek key motifs could also be
found in the following proteins:





Trypsin
Haemagglutinin
Tumour necrosis factor (TNF)
Immunoglobulins
Azurin
Where do Greek keys occur?






Prealbumin
PapD (which is a chaperone)
Nitrite reductase
Insecticidal δ-endotoxin
Bacterial cellulase
Spherical virus capsid proteins
History of Greek key
The Greek key motif was first studied and
formally characterized by Dr Jane S.
Richardson in her paper “β-Sheet topology
and the relatedness of proteins”
(Richardson, 1977)
In (Richardson, 1977) Dr. Richardson
compared Greek key motifs to the Greek
keys found on a black Greek vase
History of Greek key
The earliest Greek key
containing protein whose
structure entered the
PDB is Immunoglobulin
FAB (7FAB)
Its structure was determined
by Dr. F.A. Saul and Dr.
R.J. Poljak using X-ray
diffraction on August 27,
1976 (Saul and Poljak,
1992)
Part 2
Formal Definition
Preparatory Knowledge
Three dimensional protein representations are
often too complex for any useful patterns to be
extracted.
Therefore, a simpler, two dimensional
abstraction of proteins, known as a “topology
diagram” is used.
In a topology diagram, α-helices and β-strands
are laid out across a row, with their spatial
orientations and connections (coils) preserved.
β-sheets are also preserved to a certain extent.
Preparatory Knowledge
It is when one lays out the topology
diagram for proteins that structural motifs
such as the Greek key become apparent.
Dr. Jane Richardson was the earliest
researcher to study topologies of β-structures.
During her study, she created a
nomenclature for β-strand topologies
(Richardson, 1981)
Preparatory Knowledge
Therefore, Dr. Richardson’s nomenclature of β-strand topologies may be summarized as:
“+y” : coil goes y β-strands to the right, starting β-strand and destination β-strand are anti-parallel to each other
“-y” : coil goes y β-strands to the left, starting β-strand and destination β-strand are anti-parallel to each other
“+yX” : coil goes y β-strands to the right, starting β-strand and destination β-strand are parallel to each other
“-yX” : coil goes y β-strands to the left, starting β-strand and destination β-strand are parallel to each other
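The four token shapes above can be read mechanically. The following is a minimal sketch, assuming each connection is available as a plain ASCII string such as "+1" or "-3X"; the names `Connection` and `parse_connection` are hypothetical and not part of PROMOTIF.

```python
import re
from typing import NamedTuple


class Connection(NamedTuple):
    offset: int     # signed number of strands moved: + is right, - is left
    parallel: bool  # True when the token carries the 'X' suffix


# Sign, digits, optional 'X' suffix, e.g. "+1", "-3", "+2X".
_TOKEN = re.compile(r"^([+-])(\d+)(X?)$")


def parse_connection(token: str) -> Connection:
    """Parse one Richardson connection token into (offset, parallel)."""
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"not a Richardson connection token: {token!r}")
    sign, digits, suffix = match.groups()
    magnitude = int(digits)
    offset = magnitude if sign == "+" else -magnitude
    return Connection(offset, suffix == "X")
```

For example, `parse_connection("-3")` describes an anti-parallel connection three strands to the left, while `parse_connection("+2X")` describes a parallel connection two strands to the right.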
Formal Definition of Greek key
With Dr. Richardson’s nomenclature, Greek keys
could now be formally defined as any set of 4
consecutive β-strands having the topology of “-3,
+1, +1” or “-1, -1, +3” (Hutchinson and Thornton,
1993)
Classification of Greek key
However, the four β-strands of a
Greek key do not always fall within the same β-sheet.
Hence there arises a need to classify
Greek key structures according to the
distribution of their β-strands amongst β-sheet(s).
Dr. Hutchinson and Dr. Thornton have given
such a classification in (Hutchinson and
Thornton, 1993)
Classification of Greek key
If all four β-strands of the Greek key lie in
the same β-sheet, then it is called a (4,0)
Greek key, meaning that there are four
strands in one β-sheet and zero strands in
the other β-sheet.
Note that β-strands of a Greek key can go
into at most two β-sheets. More than two
β-sheets would make it very hard to
decide whether a Greek key exists instead
of some other random β-structure.
Classification of Greek key
Furthermore, (4,0) Greek keys come in two
flavours — an “N” version where the N-end of
the Greek key is on the outside, and a “C”
version where the C-end of the Greek key is on
the outside. This is shown in the diagram below.
Classification of Greek key
Similarly, (Hutchinson and Thornton, 1993)
classified the following as (3,1)N and (3,1)C
Greek keys. Note that the green arrow
represents β-strands from a different β-sheet.
Classification of Greek key
(Hutchinson and Thornton, 1993) also classified the (2,2)
structures as having an “N” version and a “C” version
However, the “N” version can be rotated to produce the
“C” version, so the two versions are topologically
equivalent to each other. From this, and from an
examination of the PROMOTIF outputs (to be covered
later), I conclude that there is only one flavour of (2,2)
structure.
Classification of Greek key
For this project the classification of (Hutchinson and
Thornton, 1993) is extended to include the following
additional combinations of four β-strands from two
different β-sheets
Part 3
Operational definition and
machine recognition of Greek
keys
Motivation
In the previous part, we have developed a
“formal” definition of what Greek keys are
in terms of topological diagrams
But we need an “operational” definition of
what Greek keys are so that computers
will be able to identify them from PDB files
The “formal” definition, while fine for
humans, remains too sketchy and
ambiguous for computers to work with
Motivation
In this part, the software packages whose
output the “Greek key hunter” depends on
will be examined
I will then show the working principles of
“Greek key hunter”
With the “Greek key hunter” it will be
possible for computers to automatically
identify both the Hutchinson and Thornton
classification and the extended
classification of Greek keys for this project.
PDB
The Protein Data Bank (PDB) is a repository of
protein structures that have been obtained
through X-Ray crystallography or Nuclear
Magnetic Resonance (NMR).
Almost every structural bioinformatics project
makes use of the PDB in some way.
For this project, PDB data acts as input for the
DSSP algorithm.
Thanks go out to Gao, Xin and Sun, Yang for
giving me the PDB data so that I do not have to
download it myself.
DSSP
DSSP is the standard algorithm used in
structural bioinformatics to characterize
secondary structures of a protein
molecule.
It was written by Wolfgang Kabsch and Chris
Sander (Kabsch and Sander, 1983)
In this project DSSP processes PDB data
to produce output that will be worked on
by the PROMOTIF software.
PROMOTIF
PROMOTIF is one of the key pieces of software for this
project.
It takes the DSSP output and further refines
it to produce data that are more relevant to
motif analyses.
For this project, PROMOTIF produces the
Richardson topology information that will be vital
to the recognition of Greek keys.
PROMOTIF was written by Dr. Hutchinson and Dr.
Thornton using the programming language
FORTRAN. Fortunately it could be compiled on
Linux using the f77 compiler.
Greek key hunter
The PROMOTIF suite of software was
easy to use and its β-structure analyzer
worked efficiently with the PDB files to fully
characterize all the β-strands in the protein
of a given PDB file
Unfortunately there is a very important
component that is absent from the
PROMOTIF framework — (gasp) a Greek
key analyzer
Greek key hunter
This lack of a Greek key analyzer provides me
with an opportunity to write such an analyzer that
not only identifies the Greek key structures
classified by Drs. Hutchinson and Thornton, but also
the extended classification I have developed for
this project.
The objective is then to write a program that
could identify Greek keys from the β-structural
output of the PROMOTIF software and other
relevant data. In other words, a “Greek key
hunter”.
Greek key hunter
There is a first principle of Greek keys that
vastly simplifies their search in the
pdb#.str file — Greek keys always contain
“four sequential β-strands” (Hutchinson
and Thornton, 1993)
This means that only consecutive quartets
of lines need to be grouped and searched
for Greek keys in the pdb#.str file.
Greek key hunter
This is the PROMOTIF β-strand analyzer output
for 1FNB — Ferredoxin reductase
The first principle dictates that line n and the
next three lines comprise Greek key candidate n
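The candidate enumeration can be sketched as a sliding window of width four. This is only an illustration: it assumes the β-strand lines of pdb#.str have already been parsed into a sequence (the parsing is not shown because the file layout is not given here), and `candidate_quartets` is a hypothetical helper name.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def candidate_quartets(rows: Sequence[T]) -> Iterator[Tuple[int, Sequence[T]]]:
    """Yield (n, rows[n:n+4]) for every window of four sequential strand rows.

    Candidate n consists of row n and the next three rows; a file with
    fewer than four strand rows yields no candidates.
    """
    for n in range(len(rows) - 3):
        yield n, rows[n:n + 4]
```

A file with k β-strand lines therefore produces k - 3 candidates, each of which is then tested against the operational definitions.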
Greek key hunter
Once we have four lines representing a
potential Greek key candidate, how do we
develop the rules that would allow the
computer to judge these four lines as
representing either a valid or an invalid
instance of Greek key?
I have found that the best way to derive
these rules is through the pragmatic
approach of “learning by examples”.
Greek key hunter
In their paper, Drs. Hutchinson and Thornton have
listed Greek-key containing proteins for each
Greek key class defined in part 2.
By looking at the pdb#.str output file and the
PDB files (in PyMOL) of the representative
proteins of each Greek key class, I would be
able to derive the rules that characterize
different classes of Greek keys and differentiate
Greek keys from non-Greek keys.
These rules would then be coded into the Greek
key hunter
Greek key hunter
I have decided to start with the easiest
class of Greek keys to characterize — the
(4,0) class of Greek keys
The first entry for the (4,0) Greek keys in
Table II of (Hutchinson and Thornton,
1993) is “Ferredoxin reductase” (1FNB).
Before I looked into 1FNB.str, I expected
the “topology” column of that file to read
either “-3, +1, +1” or “+3, -1, -1”.
Greek key hunter
…But the actual topology read-out was
wildly different and came as a shock!
Greek key hunter
The residue columns indicated that these 4
sequential β-strands extended from residues 57
to 116, and (Hutchinson and Thornton, 1993) says
the Greek key extends from residue 56 to 117.
So I know I am looking at the right quartet.
But the topology column reads “+3, +1, -5” !!!
Greek key hunter
Upon examining
the PDB file for
ferredoxin
reductase in
PyMOL, it became
clear why the
assigned topology
of “+3, +1, -5” was
appropriate.
Greek key hunter
In fact, the topology
diagram for the Greek
key in residues 56-117
of ferredoxin
reductase looks like
this →
(the coils associated
with the middle two
strands are omitted for
clarity)
Greek key hunter
Therefore, to enable an operational
definition of the (4,0)C Greek key, we must
relax the original formal definition
somewhat
Operational definition of (4,0)C Greek key:
let the Richardson topology of four
sequential β-strands be (+x, +y, -z) or (-x,
-y, +z), where x, y and z are positive
integers. If (x + y) < z, then the β-strands
form a (4,0)C Greek key.
Greek key hunter
Assume that we are currently processing line i
to line i+3 of pdb#.str.
if not(sheet # of line i == sheet # of line i+1
== sheet # of line i+2 == sheet # of line i+3) then
print(“Not a (4,0)[C] Greek key!”)
return false
end if
x = Richardson topology number of line i
y = Richardson topology number of line i+1
z = Richardson topology number of line i+2
if any of x, y or z contain the suffix ‘X’ then
print(“Not a (4,0)[C] Greek key!”)
return false
end if
Greek key hunter
if not(x > 0 and y > 0 and z < 0) and not(x < 0 and
y < 0 and z > 0) then
print(“Not a (4,0)[C] Greek key!”)
return false
end if
x = absolute_value(x)
y = absolute_value(y)
z = absolute_value(z)
if x + y < z then
print(“(4,0)[C] Greek key found!”)
return true
else
print(“Not a (4,0)[C] Greek key!”)
return false
end if
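The (4,0)C test above can be written as a short runnable sketch. It assumes the sheet labels of the four lines and the topology tokens of lines i, i+1 and i+2 have already been extracted as strings; the function name and argument layout are hypothetical, not the actual Greek key hunter code.

```python
from typing import Sequence


def is_40c_greek_key(sheets: Sequence[str], topology: Sequence[str]) -> bool:
    """Apply the operational (4,0)C definition to one quartet.

    sheets:   sheet label of lines i, i+1, i+2, i+3
    topology: Richardson topology tokens of lines i, i+1, i+2
    """
    if len(sheets) != 4 or len(topology) != 3:
        raise ValueError("expected 4 sheet labels and 3 topology tokens")
    # All four strands must lie in the same sheet.
    if len(set(sheets)) != 1:
        return False
    # A parallel ('X') connection rules out a Greek key.
    if any(tok.strip().upper().endswith("X") for tok in topology):
        return False
    x, y, z = (int(tok) for tok in topology)
    # Signs must be (+, +, -) or (-, -, +).
    signs_ok = (x > 0 and y > 0 and z < 0) or (x < 0 and y < 0 and z > 0)
    return signs_ok and abs(x) + abs(y) < abs(z)
```

With the ferredoxin reductase read-out “+3, +1, -5” and all four strands in one sheet, the check passes because 3 + 1 < 5.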
Greek key hunter
Likewise, the operational definition of (4,0)N
Greek keys could be easily derived by
comparing them to the (4,0)C Greek keys.
Greek key hunter
Operational definition of (4,0)N Greek key:
let the Richardson topology of four
sequential β-strands be (+x, -y, -z) or (-x,
+y, +z), where x, y and z are positive
integers. If (y + z) < x, then the β-strands
form a (4,0)N Greek key.
And here is the pseudo-code for machine
recognition of a (4,0)N Greek key.
Greek key hunter
Assume that we are currently processing line i
to line i+3 of pdb#.str.
if not(sheet # of line i == sheet # of line i+1
== sheet # of line i+2 == sheet # of line i+3) then
print(“Not a (4,0)[N] Greek key!”)
return false
end if
x = Richardson topology number of line i
y = Richardson topology number of line i+1
z = Richardson topology number of line i+2
if any of x, y or z contain the suffix ‘X’ then
print(“Not a (4,0)[N] Greek key!”)
return false
end if
Greek key hunter
if not(x > 0 and y < 0 and z < 0) and not(x < 0 and
y > 0 and z > 0) then
print(“Not a (4,0)[N] Greek key!”)
return false
end if
x = absolute_value(x)
y = absolute_value(y)
z = absolute_value(z)
if y + z < x then
print(“(4,0)[N] Greek key found!”)
return true
else
print(“Not a (4,0)[N] Greek key!”)
return false
end if
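Because the N and C tests differ only in their sign pattern and inequality, one function can label a single-sheet quartet as either flavour. This is a sketch under the same assumptions as the pseudocode (sheet labels and topology tokens already extracted as strings); `classify_40` is a hypothetical name.

```python
from typing import Optional, Sequence


def classify_40(sheets: Sequence[str], topology: Sequence[str]) -> Optional[str]:
    """Return "(4,0)N", "(4,0)C" or None for one quartet of strand lines."""
    if len(sheets) != 4 or len(topology) != 3:
        raise ValueError("expected 4 sheet labels and 3 topology tokens")
    # Both flavours need a single sheet and no parallel ('X') connection.
    if len(set(sheets)) != 1:
        return None
    if any(tok.strip().upper().endswith("X") for tok in topology):
        return None
    x, y, z = (int(tok) for tok in topology)
    n_signs = (x > 0 and y < 0 and z < 0) or (x < 0 and y > 0 and z > 0)
    c_signs = (x > 0 and y > 0 and z < 0) or (x < 0 and y < 0 and z > 0)
    # N flavour: the first connection is the long one.
    if n_signs and abs(y) + abs(z) < abs(x):
        return "(4,0)N"
    # C flavour: the last connection is the long one.
    if c_signs and abs(x) + abs(y) < abs(z):
        return "(4,0)C"
    return None
```

The two sign patterns are mutually exclusive, so a quartet can never be reported as both flavours.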
Greek key hunter
Now we are ready to tackle the operational
definition of Greek keys that spread
across two β-sheets
Continuing our “learning by
examples”, let us study the
(3,1)C Greek key of γ crystallin,
whose evolutionary significance
has been discussed in Dr. Li’s
notes and lecture
Greek key hunter
As we can see, the 1st, 2nd
and 4th β-strands, counting
from the N-end entry point
of the Greek key found in
residues 41-81 of γ
crystallin, belong to one β-sheet, while the 3rd β-strand
belongs to another β-sheet.
Looking at the PyMOL
display, the 3rd β-strand is
antiparallel to the 2nd and 4th
β-strands, even though they
are in different β-sheets
Greek key hunter
Once again, I was shocked to see the number “99”. I
initially thought that maybe one β-strand had to skip over
99 other β-strands in order to reach the next β-strand in
the Greek key. Yet upon further consulting the
PROMOTIF documentation I realized “99” indicates that
the next β-strand is in a different β-sheet, and so its
Richardson topology in relation to the previous β-strand
is not given by PROMOTIF (i.e. PROMOTIF does not
attempt to characterize inter-sheet β-strand topologies)
Greek key hunter
Yet the PyMOL display clearly tells us that
the β-strands in question are anti-parallel
to each other. What could I do in order to
make the computer realize that too?
Interstrand angles to the rescue!
Greek key hunter
Dr. Hutchinson and Dr. Thornton had
the same problem in characterizing the
topology of inter-sheet β-strands, which is
not defined by Richardson topologies.
Instead, they used the concept of
interstrand angles to determine whether
sequential β-strands in different β-sheets
are antiparallel to each other.
Greek key hunter
In the inter-strand angle scheme, the N to
C orientation of β-strands in space could
be approximated by linear vectors.
Greek key hunter
The dot product of any pair of these vectors can then be
taken, regardless of whether the corresponding β-strands are in the same β-sheet or not. The dot
product divided by the product of the two vector lengths yields the cosine of the angle θ between the two β-strands; if θ > 120°, the two β-strands are taken to
be anti-parallel to each other (Hutchinson and Thornton,
1993).
Greek key hunter
With the definition of the inter-strand
angles it is now possible to characterize
the topology between β-strands in different
β-sheets.
Thus it is now possible to give an
operational definition of Greek key motifs
that span more than one β-sheet:
whenever we hit a “99” in the Richardson
topology column, we simply look up the
corresponding inter-strand angle in a
table.
Greek key hunter
Operational definition of (3,1)C Greek key:
let the four sequential β-strands be i to
i+3. They form a (3,1)C Greek key if all of the following are satisfied:
i, i+1 and i+3 belong to one β-sheet, and i+2 belongs to another β-sheet
Richardson topology of i is “+x” or “-x”, where x is a positive integer
θ between i+1 and i+2 is greater than 120°
θ between i+2 and i+3 is greater than 120°
θ between i+3 and i is greater than 120° (this is a sanity check)
i is hydrogen bonded to both i+1 and i+3
Greek key hunter
Assume that we are currently processing line i
to line i+3 of pdb#.str.
If not((sheet # of line i == sheet # of line i+1
== sheet # of line i+3) and sheet # of line
i+2 not= sheet # of line i and sheet # of line
i+2 not= sheet # of line i+1 and sheet # of
line i+2 not= sheet# of line i+3) then
print(“Not a (3,1)[C] Greek key!”)
return false
end if
x = Richardson topology number of line i
if (x contains the suffix ‘X’) or x==99 then
print(“Not a (3,1)[C] Greek key!”)
return false
end if
Greek key hunter
θ[1] = interstrand_angle(i+1, i+2)
θ[2] = interstrand_angle(i+2, i+3)
if not(θ[1] > 120° and θ[2] > 120°) then
print(“Not a (3,1)[C] Greek key!”)
return false
end if
if not(hydrogen_bonded(i, i+1) == true and
hydrogen_bonded(i, i+3) == true) then
print(“Not a (3,1)[C] Greek key!”)
return false
end if
print(“(3,1)[C] Greek key found!”)
return true
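A runnable sketch of the (3,1)C test follows. It assumes the interstrand angles (in degrees) and hydrogen-bond information are supplied by caller-provided functions indexed by strand offset 0..3 within the quartet; these callbacks and the name `is_31c_greek_key` are hypothetical. Unlike the pseudocode, the sketch also applies the sanity check on the angle between i+3 and i that the operational definition lists.

```python
from typing import Callable, Sequence


def is_31c_greek_key(
    sheets: Sequence[str],
    first_topology: str,
    angle: Callable[[int, int], float],
    bonded: Callable[[int, int], bool],
) -> bool:
    """Apply the operational (3,1)C definition to strands i..i+3.

    sheets:         sheet labels of strands i, i+1, i+2, i+3
    first_topology: Richardson topology token of line i
    angle(p, q):    interstrand angle in degrees between offsets p and q
    bonded(p, q):   True if the strands at offsets p and q are H-bonded
    """
    if len(sheets) != 4:
        raise ValueError("expected 4 sheet labels")
    s0, s1, s2, s3 = sheets
    # i, i+1, i+3 share one sheet; i+2 lies in a different sheet.
    if not (s0 == s1 == s3 and s2 != s0):
        return False
    token = first_topology.strip()
    # Reject parallel connections and the inter-sheet marker 99.
    if token.upper().endswith("X") or token.lstrip("+-") == "99":
        return False
    if not (angle(1, 2) > 120 and angle(2, 3) > 120):
        return False
    # Sanity check from the definition list (not in the pseudocode).
    if not angle(3, 0) > 120:
        return False
    return bonded(0, 1) and bonded(0, 3)
```

Passing the angle and bond lookups as functions keeps the rule itself independent of how the tables are stored.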
Greek key hunter
function interstrand_angle(strand1#, strand2#)
vector u = β_vector_table[strand1#]
vector v = β_vector_table[strand2#]
dotproduct = <u, v>
u_modulus = sqrt(u[x]*u[x] + u[y]*u[y] + u[z]*u[z])
v_modulus = sqrt(v[x]*v[x] + v[y]*v[y] + v[z]*v[z])
the_cosine = dotproduct / (u_modulus * v_modulus)
return arccos(the_cosine)
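The same function as a runnable sketch. Two details are assumptions added here and not part of the pseudocode: the result is converted to degrees so that it can be compared directly with the 120° threshold, and the cosine is clamped to [-1, 1] to guard against floating-point rounding. The vectors are passed in directly rather than looked up in β_vector_table.

```python
import math
from typing import Sequence


def interstrand_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two strand direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    u_modulus = math.sqrt(sum(a * a for a in u))
    v_modulus = math.sqrt(sum(b * b for b in v))
    if u_modulus == 0.0 or v_modulus == 0.0:
        raise ValueError("zero-length strand vector")
    # Clamp so that rounding error cannot push the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (u_modulus * v_modulus)))
    return math.degrees(math.acos(cosine))
```

Two strands pointing in exactly opposite directions give 180°, well above the 120° anti-parallel threshold.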
Greek key hunter
So the only remaining question is: what would
be the best way to approximate a β-strand by a
linear vector?
An intuitive approach would be to construct a
vector whose starting point is the N-end α-carbon and whose end point is the C-end α-carbon
However, if the β-strand is highly curved, this
might not work well
Dr. Hutchinson and Dr. Thornton proposed a
“best line of fit” approach to this problem
Greek key hunter
The diagram above shows a β-strand. Notice how the α-carbons zigzag up and down, so that the β-strand is
“pleated”.
(Hutchinson and Thornton, 1993) constructed the vector Vi = (Cα(i+1) − Cα(i)) + (Cα(i−1) − Cα(i))
and took the “X” point to be ¼ of the way from Cα(i) along Vi, that is, Xi = Cα(i) + Vi/4
Together, the set of “X” points provides a “smoothed out”
version of the originally pleated β-strand.
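The smoothing step can be sketched as follows, assuming the α-carbon coordinates of one strand are given in N-to-C order as (x, y, z) tuples; `smoothed_points` is a hypothetical helper name. The first and last α-carbons have only one neighbour and therefore produce no “X” point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def smoothed_points(ca: Sequence[Point]) -> List[Point]:
    """Return the "X" points X_i = Ca_i + V_i / 4 for interior residues,
    where V_i = (Ca_{i+1} - Ca_i) + (Ca_{i-1} - Ca_i)."""
    points: List[Point] = []
    for i in range(1, len(ca) - 1):
        prev, cur, nxt = ca[i - 1], ca[i], ca[i + 1]
        v = tuple((n - c) + (p - c) for p, c, n in zip(prev, cur, nxt))
        points.append(tuple(c + 0.25 * vi for c, vi in zip(cur, v)))
    return points
```

On an idealized zigzag such as (0,0,0), (1,1,0), (2,0,0), (3,1,0), (4,0,0), every “X” point lands at height 0.5, so the pleat is removed and only the strand direction remains.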
Greek key hunter
The line that “best fits” the set of “X” points is
then taken to be the vector approximating the β-strand
In (Hutchinson and Thornton, 1993), the line of
best fit was found via application of the “Principal
Component Analysis” method (Burkowski, 2006)
However, for Dr. Li’s project I have decided to
use the more straightforward “3D Orthogonal
Distance Regression (ODR)” method (George,
2005)
Greek key hunter
The ODR is basically an
extension of the least-squares method of linear
regression used for points
on a plane to points in
space.
ODR problem statement:
given a set of points in 3D
space, find a line such
that the sum of the
squares of the distances
of each point to the line is
minimized.
Greek key hunter
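Assume that the line of best fit has the
following parametric form:
x = x0 + at, y = y0 + bt, z = z0 + ct
where (x0, y0, z0) is a point on the line, (a, b, c) is the direction vector of the line and t is a real parameter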
Greek key hunter
Let the given set of points in 3D space be
S = {(x1, y1, z1), (x2, y2, z2), …, (xn, yn, zn)}
Then for any point P(x, y, z) on the line of
best fit for S, the sum of the squared
distances of all points in S to point P is
(expressed as a function of (x0, y0, z0)
which is a point on the line of best fit)
Greek key hunter
Therefore, the second derivative of f(x0, y0, z0) is:
According to (Weisstein, 2006), a second
derivative that is positive indicates the presence of
a minimum at points that satisfy the equation in
which the corresponding first derivative is set to
0
Greek key hunter
Thus, f(x0, y0, z0) is minimized with respect
to the x0 axis at the following value of t:
Greek key hunter
From multivariate
calculus, we know
that in order to
minimize a
multivariate function
f(x1, x2, …, xn), the
multivariate function
must be minimized
with respect to each
of its independent
variables, xi, for all i.
Greek key hunter
In the previous few slides we have derived a
minimization of f(x0, y0, z0) with respect to x0
In order to find the point that will minimize f(x0,
y0, z0), we must also minimize f(x0, y0, z0) with
respect to y0 and z0
Fortunately f is symmetric, therefore the
minimization of f with respect to y0 and z0 has
forms that are analogous to the minimization of f
with respect to x0
Greek key hunter
Mathematically speaking, this is expressed
as:
Greek key hunter
Since all three minimization equations
have “-t” on their right hand side, we may
combine them into the following equation
where (x̄, ȳ, z̄), the mean of the coordinates of the points in S,
is known as the centroid of point set S.
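The steps of this minimization can be written out as a short derivation. It is a sketch under two assumptions: the line is parametrized as x = x0 + at, y = y0 + bt, z = z0 + ct, and f is the sum of squared distances from the n points of S to the point of the line with parameter t.

```latex
\begin{align*}
f(x_0, y_0, z_0) &= \sum_{i=1}^{n} \left[ (x_i - x_0 - at)^2 + (y_i - y_0 - bt)^2 + (z_i - z_0 - ct)^2 \right] \\
\frac{\partial f}{\partial x_0} &= -2 \sum_{i=1}^{n} (x_i - x_0 - at) = 0
  \quad\Longrightarrow\quad \frac{x_0 - \bar{x}}{a} = -t,
  \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \\
\frac{\partial^2 f}{\partial x_0^2} &= 2n > 0 \\
\frac{x_0 - \bar{x}}{a} &= \frac{y_0 - \bar{y}}{b} = \frac{z_0 - \bar{z}}{c} \quad (= -t)
\end{align*}
```

The last line combines the three analogous conditions for x0, y0 and z0; the centroid satisfies it trivially because all three numerators vanish.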
Greek key hunter
Thus, for any triple (x0, y0, z0) that satisfies
the equation
f(x0, y0, z0), and hence the sum of squared
distances of points in set S to the line of
best fit, is minimized
Note that (x0, y0, z0) is taken to be a
variable point here
Greek key hunter
It is easy to see that the centroid itself is a point
that satisfies the equation in the previous slide
which will always be true
Therefore f is minimized at the centroid point,
and by definition centroid point is on the line of
best fit
Greek key hunter
Recall the original parametric equation for the
line of best fit
Now that we have found a suitable (x0, y0, z0) on
the line, all we need additionally is the vector (a,
b, c), and then we’ll be able to fully characterize
the line of best fit and solve the problem
Greek key hunter
Let C be the centroid
(which is an (x0, y0, z0) on
the line of best fit), let L be
the line of best fit,
and let P be the plane through
the centroid that is
perpendicular to L
(thus P is uniquely
determined). The direction of L
can then be seen as a normal
vector to the plane P
(George, 2005). The
setup is shown in the
diagram to the right
Greek key hunter
For any point xi in the set S,
we have the following
relationship through the
application of the Pythagorean
theorem:
d²(xi, L) = d²(xi, C) – d²(xi, P)
where d(xi, L) is the distance
from point xi to line L, d(xi, C)
is the distance from point xi to
the centroid, and d(xi, P) is the
distance from point xi to plane
P
Greek key hunter
Generalizing the previous formula to all points in
set S, we have the following formula:
Σ d²(xi, L) = Σ d²(xi, C) – Σ d²(xi, P), with each sum taken over all points xi in S
We notice right away that the summation on the
left hand side is the one we would like to
minimize. Since the sum of squared distances of
all the points in S to the centroid is a constant,
we could minimize the summation on the left
side by maximizing the summation of the
squared distances of all the points in S to the
plane P on the right side.
Greek key hunter
But how do we maximize the summation
of the squared distances of all the points in
S to the plane P?
We use a method called the “Orthogonal
Distance Regression Plane Analysis”
(ODR plane analysis)
Greek key hunter
Let the equation of plane
P (now we know that this
is the “orthogonal
distance regression
plane”) be
ax + by + cz + d = 0
Notice that the
coefficients of x, y and z
for the plane are (a, b, c),
which is the vector for the
line of best fit, since it is
normal to the plane.
Greek key hunter
Since the centroid is on plane P, we can substitute the
centroid into the equation for plane P to find out what d
is.
Thus, the equation for plane P could be rewritten as:
Greek key hunter
From analytic geometry we know that the
distance from a point (x0, y0, z0) to a plane (Ax +
By + Cz + D = 0) is:
|Ax0 + By0 + Cz0 + D| / sqrt(A² + B² + C²)
Therefore, the distance from a point (xi, yi, zi) in
set S to the ODR plane (ax+by+cz+d=0) is:
|axi + byi + czi + d| / sqrt(a² + b² + c²)
Greek key hunter
Using the alternative form of the equation
of plane P shown two slides ago, we have:
Hence, the function that relates (a, b, c) to
the sum of distances of all points in S to
plane P is as follows:
Greek key hunter
Since both the numerator and denominator of
f(a, b, c) are positive, let us square each term of
f to produce the function F, that is, let:
Greek key hunter
It is easy to see that f and F are
“optimizationally equivalent”, that is:
max f = max F
min f = min F
But F is in a form that is more
computationally tractable than f, and F
also allows us to develop a matrix notation
for f in order to perform further analysis
Greek key hunter
Let
then
Greek key hunter
And you must be screaming, “but how?!”
I don’t have the space to give a full proof,
but I’ll give an example for the case where
n = 2. (So two points in the set S)
Greek key hunter
We have
Greek key hunter
Greek key hunter
which is precisely the non-matrix form of
the F(a, b, c) function
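The n = 2 case can be written out explicitly. This is a sketch that assumes X is the 3×n matrix whose columns are the centered coordinates of the points in S and w = (a, b, c)ᵀ; the tilde notation for centered coordinates is introduced here for brevity.

```latex
\begin{align*}
\tilde{x}_i &= x_i - \bar{x}, \quad \tilde{y}_i = y_i - \bar{y}, \quad \tilde{z}_i = z_i - \bar{z},
\qquad
X = \begin{pmatrix} \tilde{x}_1 & \tilde{x}_2 \\ \tilde{y}_1 & \tilde{y}_2 \\ \tilde{z}_1 & \tilde{z}_2 \end{pmatrix},
\qquad
w = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \\
w^{T} X &= \begin{pmatrix} a\tilde{x}_1 + b\tilde{y}_1 + c\tilde{z}_1 & a\tilde{x}_2 + b\tilde{y}_2 + c\tilde{z}_2 \end{pmatrix} \\
w^{T} X X^{T} w &= (w^{T} X)(w^{T} X)^{T} = \sum_{i=1}^{2} \left( a\tilde{x}_i + b\tilde{y}_i + c\tilde{z}_i \right)^2 \\
F(a, b, c) &= \frac{w^{T} X X^{T} w}{w^{T} w}
  = \frac{\sum_{i=1}^{2} \left( a\tilde{x}_i + b\tilde{y}_i + c\tilde{z}_i \right)^2}{a^2 + b^2 + c^2}
\end{align*}
```

Each summand is the squared distance from one point to the plane P, so the same expansion goes through unchanged for any n.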
Greek key hunter
So the final question is: how do we
maximize F?
Remember that we need to maximize F in
order to maximize f, which in turn
minimizes the total squared distance of all
points to the line of best fit
Dr. Burkowski to the rescue!
Greek key hunter
This final set of slides on the math of Greek key
hunter is inspired by materials taught by Dr.
Burkowski for his Fall 2006 graduate course on
kernel methods (CS 898)
According to Dr. Burkowski,
is called a Rayleigh quotient (Burkowski, 2006)
Greek key hunter
The Courant-Fisher theorem states that “if A ∈ ℝ^(n×n) is
symmetric, then for k = 1, …, n, the kth eigenvalue λk(A) of
the matrix A satisfies
with the extrema achieved by the corresponding
eigenvector.” (Shawe-Taylor and Cristianini, 2004)
Let A = XXᵀ. The Courant-Fisher theorem can then be
seen essentially as saying that the extrema of the
Rayleigh quotient occur only for w = (a, b, c) that are
eigenvectors of XXᵀ (or scalar multiples of them). In other words,
if the Rayleigh quotient attains a maximum, that maximum
is one of the eigenvalues of A = XXᵀ, attained at a corresponding eigenvector
Greek key hunter
It is then immediately obvious that, in order to maximize
the Rayleigh quotient, we take w to be a vector that is a
scalar multiple of the eigenvector of XXᵀ that corresponds
to the largest eigenvalue of XXᵀ
This is because by definition, if λ is an eigenvalue of
matrix A, and vector ξ is an eigenvector corresponding to
the eigenvalue λ, then Aξ = λξ. Thus for λmax and ξmax
of the matrix A = XXᵀ, we have:
Greek key hunter
Another way of looking at this is by the analogy of “mountains”
Courant-Fisher is basically saying that all the peaks (and valleys) of
the Rayleigh quotient occur at and only at eigenvalues and their
corresponding eigenvectors. So what should you do in order to
globally maximize the Rayleigh quotient? — pick the globally highest
peak, of course!
Greek key hunter
At this point, the 3D orthogonal distance
regression problem has been satisfactorily
solved.
For more information regarding the Courant-Fisher theorem, please refer to p. 57 of “the
kernel book” (Shawe-Taylor and Cristianini,
2004)
Dr. Burkowski’s CS 898 course notes
(Burkowski, 2006) contain a proof of the
Courant-Fisher theorem
Greek key hunter
procedure compute_β_vector_table
for each β strand in PROMOTIF output table
m = starting residue number for β strand
n = ending residue number for β strand
pull out all Cα in PDB file with residue number
inside [m, n]
for each triple of (Cα[i+1], Cα[i], Cα[i-1])
compute V[i]=(Cα[i+1]-Cα[i])+(Cα[i-1]-Cα[i]);
quarter point of V[i] → {S}
end for
β_vector_table[current β strand] = ODR({S})
end for
Greek key hunter
function ODR({S}) returns vector (a, b, c)
sumx = sumy = sumz = 0
for each point p in {S}
sumx = sumx + p[x]
sumy = sumy + p[y]
sumz = sumz + p[z]
end for
centroid = (sumx/|{S}|, sumy/|{S}|, sumz/|{S}|)
for i = 1..|{S}|
M[1][i] = (x-coordinate of (i)th point of {S}) - centroid[x]
M[2][i] = (y-coordinate of (i)th point of {S}) - centroid[y]
M[3][i] = (z-coordinate of (i)th point of {S}) - centroid[z]
end for
Greek key hunter
A = M*transpose(M)
find eigenvector (a, b, c) corresponding to the
largest eigenvalue of A
return (a, b, c)
With the last major component of Greek
key hunter fully realized, the operational
definitions for the rest of the Greek key
classifications may now be given.
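The ODR procedure can be sketched in pure Python. Two choices here are assumptions rather than part of the pseudocode: the dominant eigenvector of the 3×3 matrix A is approximated by power iteration instead of a full eigen-decomposition, and the returned unit vector is flipped if necessary so that it points from the first point towards the last (N to C), because an eigenvector's sign is otherwise arbitrary.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def odr_direction(points: Sequence[Point], iterations: int = 200) -> Point:
    """Unit direction of the orthogonal-distance best-fit line through
    `points`, oriented from the first point towards the last."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in points]
    # A = M * transpose(M), with M holding the centered coordinates.
    a = [[sum(c[r] * c[s] for c in centered) for s in range(3)]
         for r in range(3)]
    # Start from the first-to-last chord, which is normally far from
    # orthogonal to the dominant eigenvector.
    chord = [points[-1][k] - points[0][k] for k in range(3)]
    norm = math.sqrt(sum(x * x for x in chord))
    if norm == 0.0:
        raise ValueError("first and last points coincide")
    w: List[float] = [x / norm for x in chord]
    for _ in range(iterations):
        nxt = [sum(a[r][s] * w[s] for s in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        w = [x / norm for x in nxt]
    # Orient the vector N to C.
    if sum(wi * ci for wi, ci in zip(w, chord)) < 0:
        w = [-x for x in w]
    return (w[0], w[1], w[2])
```

Fixing the orientation matters for the anti-parallel test: without it, two strands running in opposite directions could receive vectors of the same sign and an interstrand angle near 0° instead of near 180°.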
Greek key hunter
Operational definition of (3,1)N Greek key
β-strands i, i+2 and i+3 are in one β-sheet, β-strand i+1 is in another
Richardson topology of β-strand i+2 is “+x” or “-x”, where x is a positive integer
Interstrand angle between β-strands i and i+1 is greater than 120°
Interstrand angle between β-strands i+1 and i+2 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
β-strand i+3 is hydrogen bonded to β-strands i and i+2
Greek key hunter
Operational definition of (2,2) Greek key
β-strands i and i+3 are in one β-sheet, β-strands i+1 and i+2 are in another
Richardson topology of β-strand i+1 is either “+x” or “-x”, where x is a positive integer
Interstrand angle between β-strands i and i+1 is greater than 120°
Interstrand angle between β-strands i+2 and i+3 is greater than 120°
Interstrand angle between β-strands i and i+3 is greater than 120° (this is a sanity check)
Greek key hunter
Operational definition of (3,1)2C Greek key
β-strands i, i+2 and i+3 are in one β-sheet, β-strand i+1 is in another
Richardson topology of β-strand i+2 is “+x” or “-x”, where x is a positive integer
Interstrand angle between β-strands i and i+1 is greater than 120°
Interstrand angle between β-strands i+1 and i+2 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
β-strand i is hydrogen bonded to β-strands i+2 and i+3
Greek key hunter
Operational definition of (3,1)2N Greek key
β-strands i, i+1 and i+3 are in one β-sheet, β-strand i+2 is in another
Richardson topology of β-strand i is “+x” or “-x”, where x is a positive integer
Interstrand angle between β-strands i+1 and i+2 is greater than 120°
Interstrand angle between β-strands i+2 and i+3 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
β-strand i+3 is hydrogen bonded to β-strands i and i+1
Greek key hunter
Operational definition of (3,1)3C Greek key
β-strands i+1, i+2 and i+3 are in one β-sheet, β-strand i is in another
Richardson topologies of β-strands i+1 and i+2 are “-x, +y” or “+x, -y”, where x and y are positive integers
Interstrand angle between β-strands i and i+1 is greater than 120°
Interstrand angle between β-strands i and i+3 is greater than 120° (this is a sanity check)
Greek key hunter
Operational definition of (3,1)3N Greek key
β-strands i, i+1 and i+2 are in one β-sheet, β-strand i+3 is in another
Richardson topologies of β-strands i and i+1 are “-x, +y” or “+x, -y”, where x and y are positive integers
Interstrand angle between β-strands i+3 and i+2 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
Greek key hunter
Operational definition of (3,1)4C Greek key
β-strands i, i+1 and i+2 are in one β-sheet, β-strand i+3 is in another
Richardson topologies of β-strands i and i+1 are “-x, -y” or “+x, +y”, where x and y are positive integers
Interstrand angle between β-strands i+3 and i+2 is greater than 120°
Greek key hunter
Operational definition of (3,1)4N Greek key
β-strands i+1, i+2 and i+3 are in one β-sheet, β-strand i is in another
Richardson topologies of β-strands i+1 and i+2 are “-x, -y” or “+x, +y”, where x and y are positive integers
Interstrand angle between β-strands i and i+1 is greater than 120°
Greek key hunter
Operational definition of (2,2)2 Greek key
β-strands i and i+2 are in one β-sheet, β-strands i+1 and i+3 are in another
Interstrand angle between β-strands i and i+1 is greater than 120°
Interstrand angle between β-strands i+1 and i+2 is greater than 120°
Interstrand angle between β-strands i+2 and i+3 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
Greek key hunter
Operational definition of (2,2)3 Greek key
β-strands i and i+1 are in one β-sheet, β-strands i+2 and i+3 are in another
Richardson topologies of β-strands i and i+2 are “-x, +y” or “+x, -y”, where x and y are positive integers
Interstrand angle between β-strands i+1 and i+2 is greater than 120°
Interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
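Every operational definition starts with a sheet-membership condition, so a quartet can first be reduced to its sheet pattern and only the matching definitions need to be tested. The sketch below illustrates that idea and is not the actual Greek key hunter code; the table is transcribed from the definitions in this part, and the names `CANDIDATE_CLASSES`, `sheet_pattern` and `candidate_classes` are hypothetical.

```python
from typing import Dict, List, Sequence

# Sheet pattern of strands i..i+3 -> classes whose definition requires it.
CANDIDATE_CLASSES: Dict[str, List[str]] = {
    "AAAA": ["(4,0)N", "(4,0)C"],
    "AABA": ["(3,1)C", "(3,1)2N"],
    "ABAA": ["(3,1)N", "(3,1)2C"],
    "ABBB": ["(3,1)3C", "(3,1)4N"],
    "AAAB": ["(3,1)3N", "(3,1)4C"],
    "ABBA": ["(2,2)"],
    "ABAB": ["(2,2)2"],
    "AABB": ["(2,2)3"],
}


def sheet_pattern(sheets: Sequence[str]) -> str:
    """Relabel sheet ids by order of first appearance, e.g.
    ['2', '2', '5', '2'] becomes 'AABA'."""
    labels: Dict[str, str] = {}
    pattern = []
    for sheet in sheets:
        if sheet not in labels:
            labels[sheet] = "ABCD"[len(labels)]
        pattern.append(labels[sheet])
    return "".join(pattern)


def candidate_classes(sheets: Sequence[str]) -> List[str]:
    """Classes worth testing for a quartet with these sheet labels."""
    if len(sheets) != 4:
        raise ValueError("expected 4 sheet labels")
    return CANDIDATE_CLASSES.get(sheet_pattern(sheets), [])
```

Quartets spread over three or more sheets match no pattern and are rejected immediately, in line with the rule that the β-strands of a Greek key lie in at most two β-sheets.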
References
Burkowski, F. J. (2006) CS 898 Course Notes: Unit 5 —
Pattern Analysis and Eigen-Decompositions. Waterloo:
University of Waterloo, http://www.student.cs.uwaterloo.ca/~cs898/005_EigenDecompositions.pdf
George, M. (2005) “Line of Best Fit for Points in Three
Dimensional Space”, http://mathforum.org/library/drmath/view/69103.html
Hutchinson, E.G. and Thornton, J.M. (1993) The Greek key
motif: extraction, classification and analysis. Protein
Engineering, 6(3):233-245.
Kabsch, W. and Sander, C. (1983) Dictionary of protein
secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637
Li, M. (2006) CS 882 Course Notes. Waterloo:
University of Waterloo.
Richardson, J.S. (1981) The anatomy and
taxonomy of protein structure. Advances in
Protein Chemistry, 34, 167−339
Richardson, J.S. (1977) β-sheet topology and the
relatedness of proteins. Nature, 268, 495-500
Saul, F.A., Poljak, R.J. (1992) Crystal structure of
human immunoglobulin fragment Fab New
refined at 2.0 A resolution. Proteins, 14, 363-371
Shawe-Taylor, J. and Cristianini, N. (2004)
Kernel Methods for Pattern Analysis.
Cambridge: Cambridge University Press
Weisstein, Eric W. (2006) "Second
Derivative Test." From MathWorld--A
Wolfram Web Resource.
http://mathworld.wolfram.com/SecondDerivativeTest.html
Acknowledgements
Dr. Ming Li, for giving me the opportunity
to work on such a wonderful project
Dr. Gail Hutchinson and Dr. Janet
Thornton, whose PROMOTIF software
saved my life
Sun, Yang and Gao, Xin, for the PDB
database
And finally
A Big “Thank You!”
to the class of CS 882 for taking
the time to listen to my
presentation!