The Greek Key Motif
Shuo Xiang (Alex)
Dr. Ming Li
CS 882 Course Project Presentation, Fall 2006

Outline
- Introduction
  - What is a Greek key?
  - Where do Greek keys occur?
  - History of the Greek key motif
- Formal Definition
  - Preparatory knowledge
  - Formal definition of Greek key
  - Classification of Greek key
- Operational Definition and Machine Recognition of Greek Keys
  - Motivation
  - PDB
  - DSSP
  - PROMOTIF
  - Greek key hunter
- Greek key hunter in action
  - Setup
  - Results

Part 1: Introduction

What is a Greek key?
A Greek key is a series of four consecutive β-strands that take on the conformations shown to the right when viewed in a topology diagram (Branden and Tooze, 1999).

Note, however, that topology diagrams are a simplified way of representing proteins; in real life, Greek keys look more like the object shown to the right. The picture was generated by PyMOL from PDB file 4GCR (γ-crystallin), with residues 34-62 displayed and everything else masked.

Greek keys are so named because of their visual resemblance to the decorative patterns on ancient Greek vases, shown below (Li, 2006).

Where do Greek keys occur?
Being a β-motif, Greek keys occur only in proteins that have β-strands. This means that all-α proteins such as myoglobin and hemoglobin will not have Greek keys.

From Dr. Li’s lectures, we also know that γ-crystallins are a very important class of proteins whose Greek key motifs have evolutionary significance.

According to Dr. Hutchinson and Dr. Thornton (Hutchinson and Thornton, 1993), Greek key motifs can also be found in the following proteins:
- Trypsin
- Haemagglutinin
- Tumour necrosis factor (TNF)
- Immunoglobulins
- Azurin
- Prealbumin
- PapD (a chaperone)
- Nitrite reductase
- Insecticidal δ-endotoxin
- Bacterial cellulase
- Spherical virus capsid proteins

History of Greek key
The Greek key motif was first studied and formally characterized by Dr. Jane S.
Richardson in her paper “β-Sheet topology and the relatedness of proteins” (Richardson, 1977). In that paper, Dr. Richardson compared Greek key motifs to the Greek key patterns found on a black Greek vase.

The earliest Greek key-containing protein whose structure has entered the PDB is immunoglobulin Fab (7FAB). Its structure was determined by Dr. F.A. Saul and Dr. R.J. Poljak using X-ray diffraction on August 27, 1976 (Saul and Poljak, 1992).

Part 2: Formal Definition

Preparatory Knowledge
Three-dimensional protein representations are often too complex for useful patterns to be extracted from them. Therefore, a simpler, two-dimensional abstraction of proteins, known as a “topology diagram”, is used. In a topology diagram, α-helices and β-strands are laid out in a row, with their spatial orientations and connections (coils) preserved. β-sheets are also preserved to a certain extent.

It is when one lays out the topology diagram of a protein that structural motifs such as the Greek key become apparent. Dr. Jane Richardson was the first researcher to study the topologies of β-structures, and in doing so she created a nomenclature for β-strand topologies (Richardson, 1981).

Dr. Richardson’s nomenclature of β-strand topologies may be summarized as:
- “+y”: the coil goes y β-strands to the right; the starting and destination β-strands are antiparallel to each other
- “-y”: the coil goes y β-strands to the left; the starting and destination β-strands are antiparallel to each other
- “+yX”: the coil goes y β-strands to the right; the starting and destination β-strands are parallel to each other
- “-yX”: the coil goes y β-strands to the left; the starting and destination β-strands are parallel to each other

Formal Definition of Greek key
With Dr.
Richardson’s nomenclature, Greek keys can now be formally defined as any set of four consecutive β-strands having the topology “-3, +1, +1” or “-1, -1, +3” (Hutchinson and Thornton, 1993).

Classification of Greek key
However, the four β-strands of a Greek key do not always fall within the same β-sheet. Hence there is a need to classify Greek key structures according to how their β-strands are distributed among β-sheets. Dr. Hutchinson and Dr. Thornton gave such a classification in (Hutchinson and Thornton, 1993).

If all four β-strands of the Greek key lie in the same β-sheet, it is called a (4,0) Greek key, meaning that there are four strands in one β-sheet and zero strands in the other. Note that the β-strands of a Greek key may be spread over at most two β-sheets; with more than two β-sheets it would be very hard to decide whether a Greek key is present rather than some other, random β-structure.

Furthermore, (4,0) Greek keys come in two flavours: an “N” version, in which the N-end of the Greek key is on the outside, and a “C” version, in which the C-end of the Greek key is on the outside. This is shown in the diagram below.

Similarly, (Hutchinson and Thornton, 1993) classified the following as (3,1)N and (3,1)C Greek keys. Note that the green arrow represents a β-strand from a different β-sheet.

(Hutchinson and Thornton, 1993) also classified the (2,2) structures as having an “N” version and a “C” version. However, the “N” version can be rotated to produce the “C” version, so the two versions are topologically equivalent; together with an examination of the PROMOTIF outputs (covered later), this leads me to conclude that there is only one flavour of (2,2) structure.
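As a minimal sketch of how the formal definition could be checked by machine, the Python below parses Richardson topology tokens such as “-3”, “+1” or “+2X”, lays the strands out by sheet position, and tests a quartet against the two formal patterns. The token format, the treatment of “99” as PROMOTIF’s “next strand is in another sheet” marker (described in Part 3) and the helper names are assumptions for illustration. Accepting sign-mirrored patterns such as “+3, -1, -1” is also an assumption, based on the fact that viewing a sheet from its other face flips every sign.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Sketch only: token format and helper names are assumptions, not PROMOTIF's API.
# One Richardson topology token: optional sign, a number, optional 'X' (parallel).
_TOKEN = re.compile(r"^([+-]?\d+)(X?)$")

# Formal Greek key patterns (Hutchinson and Thornton, 1993).
FORMAL_PATTERNS = ([-3, 1, 1], [-1, -1, 3])


def parse_topology(token: str) -> Tuple[Optional[int], bool]:
    """Return (shift, parallel) for a token such as '-3', '+1' or '+2X'.

    shift is the signed number of strand positions the coil moves within the
    sheet; parallel is True when the 'X' suffix is present. The value 99
    (next strand lies in another sheet) is returned as shift=None.
    """
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised topology token: {token!r}")
    shift = int(match.group(1))
    if shift == 99:
        return None, False
    return shift, match.group(2) == "X"


def sheet_positions(shifts: Sequence[int]) -> List[int]:
    """Place the first strand at position 0 and apply each shift in turn."""
    positions = [0]
    for shift in shifts:
        positions.append(positions[-1] + shift)
    return positions


def is_formal_greek_key(tokens: Sequence[str]) -> bool:
    """Check three topology tokens (links i->i+1, i+1->i+2, i+2->i+3)."""
    parsed = [parse_topology(t) for t in tokens]
    if len(parsed) != 3 or any(shift is None or parallel for shift, parallel in parsed):
        return False
    shifts = [shift for shift, _ in parsed]
    mirrored = [-s for s in shifts]  # same motif seen from the other face of the sheet
    return any(p in (shifts, mirrored) for p in FORMAL_PATTERNS)
```

For example, sheet_positions([-3, 1, 1]) returns [0, -3, -2, -1]: strands 2, 3 and 4 sit side by side to the left of strand 1. The “+3, +1, -5” read-out for 1FNB discussed in Part 3 fails this strict test, which is why the operational definitions relax it.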

For this project, the classification of (Hutchinson and Thornton, 1993) is extended to include the additional combinations of four β-strands from two different β-sheets shown below.

Part 3: Operational Definition and Machine Recognition of Greek Keys

Motivation
In the previous part, we developed a “formal” definition of Greek keys in terms of topology diagrams. But we need an “operational” definition of Greek keys so that computers can identify them from PDB files: the formal definition, while fine for humans, is too sketchy and ambiguous for computers to work with.

In this part, the various programs whose output the “Greek key hunter” depends on will be examined, and I will then show the working principles of the Greek key hunter. With the Greek key hunter, computers can automatically identify both the Hutchinson and Thornton classification and the extended classification of Greek keys developed for this project.

PDB
The Protein Data Bank (PDB) is a repository of protein structures obtained through X-ray crystallography or nuclear magnetic resonance (NMR). Almost every structural bioinformatics project makes use of the PDB in some way. For this project, PDB data act as input to the DSSP algorithm. Thanks go out to Gao, Xin and Sun, Yang for giving me the PDB data so that I did not have to download it myself.

DSSP
DSSP is the standard algorithm used in structural bioinformatics to assign the secondary structure of a protein molecule. It was written by Wolfgang Kabsch and Chris Sander (Kabsch and Sander, 1983). In this project, DSSP processes PDB data to produce the output that is then used by the PROMOTIF software.

PROMOTIF
PROMOTIF is one of the key programs in this project. It takes the DSSP output and refines it further to produce data that are more relevant to motif analysis.
For this project, PROMOTIF produces the Richardson topology information that is vital to the recognition of Greek keys. PROMOTIF was written by Dr. Hutchinson and Dr. Thornton in FORTRAN; fortunately, it can be compiled on Linux using the f77 compiler.

Greek key hunter
The PROMOTIF suite was easy to use, and its β-structure analyzer worked efficiently with the PDB files to fully characterize all the β-strands of the protein in a given PDB file. Unfortunately, one very important component is absent from the PROMOTIF framework: (gasp) a Greek key analyzer.

This gap gave me the opportunity to write such an analyzer, one that identifies not only the Greek key structures classified by Drs. Hutchinson and Thornton but also the extended classification I developed for this project. The objective, then, is to write a program that identifies Greek keys from the β-structural output of PROMOTIF and other relevant data; in other words, a “Greek key hunter”.

There is a first principle of Greek keys that vastly simplifies their search in the pdb#.str file: Greek keys always contain “four sequential β-strands” (Hutchinson and Thornton, 1993). This means that only consecutive quartets of lines need to be grouped and searched for Greek keys in the pdb#.str file.

This is the PROMOTIF β-strand analyzer output for 1FNB (ferredoxin reductase). The first principle dictates that line n and the next three lines comprise Greek key candidate n.

Once we have four lines representing a potential Greek key candidate, how do we develop rules that allow the computer to judge whether these four lines represent a valid or an invalid instance of a Greek key? I have found that the best way to derive these rules is the pragmatic approach of “learning by examples”.

In their paper, Drs.
Hutchinson and Thornton listed Greek key-containing proteins for each Greek key class defined in Part 2. By looking at the pdb#.str output files and the PDB files (in PyMOL) of the representative proteins of each class, I would be able to derive rules that characterize the different classes of Greek keys and differentiate Greek keys from non-Greek keys. These rules would then be coded into the Greek key hunter.

I decided to start with the class of Greek keys that is easiest to characterize: the (4,0) class. The first entry for the (4,0) Greek keys in Table II of (Hutchinson and Thornton, 1993) is ferredoxin reductase (1FNB). Before I looked into 1FNB.str, I expected the “topology” column of that file to read either “-3, +1, +1” or “+3, -1, -1”.

…But the actual topology read-out was wildly different and came as a shock!

The residue columns indicated that these four sequential β-strands extend from residue 57 to 116, while (Hutchinson and Thornton, 1993) give the Greek key as extending from residue 56 to 117, so I knew I was looking at the right quartet. But the topology column read “+3, +1, -5”!

Upon examining the PDB file for ferredoxin reductase in PyMOL, it became clear why the assigned topology of “+3, +1, -5” was appropriate. In fact, the topology diagram for the Greek key in residues 56-117 of ferredoxin reductase looks like the one shown (the coils associated with the middle two strands are omitted for clarity).

Therefore, to obtain an operational definition of the (4,0)C Greek key, we must relax the original formal definition somewhat.

Operational definition of (4,0)C Greek key: let four sequential β-strands lie in the same β-sheet, with antiparallel connections (no “X” suffix), and let their Richardson topology be (+x, +y, -z) or (-x, -y, +z), where x, y and z are positive integers. If x + y < z, then the β-strands form a (4,0)C Greek key.
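A minimal Python sketch of the (4,0)C test follows. It assumes the sheet identifiers and the topology tokens have already been read from four consecutive lines of pdb#.str; the function name and argument layout are hypothetical, and parsing the file itself is not shown. The steps are the same as in the pseudocode for this test.

```python
from typing import Hashable, Sequence


def is_greek_key_40c(sheets: Sequence[Hashable], topology: Sequence[str]) -> bool:
    """(4,0)C test for the quartet of strands i..i+3 (hypothetical helper).

    sheets   -- sheet identifiers of strands i, i+1, i+2, i+3
    topology -- Richardson topology tokens on lines i, i+1, i+2, i.e. the
                links i->i+1, i+1->i+2 and i+2->i+3
    """
    if len(sheets) != 4 or len(topology) != 3:
        raise ValueError("expected four sheet ids and three topology tokens")
    # All four strands must lie in the same beta-sheet.
    if len(set(sheets)) != 1:
        return False
    tokens = [t.strip() for t in topology]
    # Parallel connections ('X' suffix) rule the quartet out.
    if any(t.endswith("X") for t in tokens):
        return False
    x, y, z = (int(t) for t in tokens)
    # 99 marks an inter-sheet link; it should not occur inside one sheet.
    if 99 in (x, y, z):
        return False
    # Sign pattern must be (+x, +y, -z) or (-x, -y, +z).
    if not ((x > 0 and y > 0 and z < 0) or (x < 0 and y < 0 and z > 0)):
        return False
    return abs(x) + abs(y) < abs(z)
```

With the 1FNB read-out, is_greek_key_40c("AAAA", ["+3", "+1", "-5"]) returns True, since 3 + 1 < 5. A mirror-image (4,0)N test would differ only in the accepted sign pattern and the inequality.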

Assume that we are currently processing line i to line i+3 of pdb#.str.

    if not(sheet # of line i == sheet # of line i+1 == sheet # of line i+2 == sheet # of line i+3) then
        print(“Not a (4,0)[C] Greek key!”)
        return false
    end if
    x = Richardson topology number of line i
    y = Richardson topology number of line i+1
    z = Richardson topology number of line i+2
    if any of x, y or z contains the suffix ‘X’ then
        print(“Not a (4,0)[C] Greek key!”)
        return false
    end if
    if not(x > 0 and y > 0 and z < 0) and not(x < 0 and y < 0 and z > 0) then
        print(“Not a (4,0)[C] Greek key!”)
        return false
    end if
    x = absolute_value(x)
    y = absolute_value(y)
    z = absolute_value(z)
    if x + y < z then
        print(“(4,0)[C] Greek key found!”)
        return true
    else
        print(“Not a (4,0)[C] Greek key!”)
        return false
    end if

Likewise, the operational definition of (4,0)N Greek keys can easily be derived by comparison with the (4,0)C Greek keys.

Operational definition of (4,0)N Greek key: let four sequential β-strands lie in the same β-sheet, with antiparallel connections (no “X” suffix), and let their Richardson topology be (+x, -y, -z) or (-x, +y, +z), where x, y and z are positive integers. If y + z < x, then the β-strands form a (4,0)N Greek key. Here is the pseudocode for machine recognition of a (4,0)N Greek key.

Assume that we are currently processing line i to line i+3 of pdb#.str.

    if not(sheet # of line i == sheet # of line i+1 == sheet # of line i+2 == sheet # of line i+3) then
        print(“Not a (4,0)[N] Greek key!”)
        return false
    end if
    x = Richardson topology number of line i
    y = Richardson topology number of line i+1
    z = Richardson topology number of line i+2
    if any of x, y or z contains the suffix ‘X’ then
        print(“Not a (4,0)[N] Greek key!”)
        return false
    end if
    if not(x > 0 and y < 0 and z < 0) and not(x < 0 and y > 0 and z > 0) then
        print(“Not a (4,0)[N] Greek key!”)
        return false
    end if
    x = absolute_value(x)
    y = absolute_value(y)
    z = absolute_value(z)
    if y + z < x then
        print(“(4,0)[N] Greek key found!”)
        return true
    else
        print(“Not a (4,0)[N] Greek key!”)
        return false
    end if

Now we are ready to tackle the operational definition of Greek keys that spread across two β-sheets. Continuing our “learning by examples”, let us study the (3,1)C Greek key of γ-crystallin, whose evolutionary significance has been discussed in Dr. Li’s notes and lectures.

As we can see, counting from the N-end entry point of the Greek key found in residues 41-81 of γ-crystallin, the 1st, 2nd and 4th β-strands belong to one β-sheet, while the 3rd β-strand belongs to another. The PyMOL display shows that the 3rd β-strand is antiparallel to the 2nd and 4th β-strands, even though it lies in a different β-sheet.

Once again, I was shocked, this time to see the number “99”. I initially thought that one β-strand might have to skip over 99 other β-strands to reach the next β-strand of the Greek key. But after further consulting the PROMOTIF documentation, I realized that “99” indicates that the next β-strand is in a different β-sheet, so its Richardson topology relative to the previous β-strand is not given by PROMOTIF (i.e. PROMOTIF does not attempt to characterize inter-sheet β-strand topologies).

Yet the PyMOL display clearly tells us that the β-strands in question are antiparallel to each other.
What could I do to make the computer realize that too? Interstrand angles to the rescue!

Dr. Hutchinson and Dr. Thornton faced the same problem in characterizing the topology of inter-sheet β-strands, which Richardson topologies do not define. Instead, they used interstrand angles to determine whether sequential β-strands in different β-sheets are antiparallel to each other.

In the interstrand angle scheme, the N-to-C orientation of each β-strand in space is approximated by a vector. The dot product of any pair of these vectors can then be taken, regardless of whether the corresponding β-strands are in the same β-sheet. Dividing the dot product by the product of the two vectors’ lengths yields the cosine of the angle θ between the two β-strands; if θ > 120°, the two β-strands are taken to be antiparallel to each other (Hutchinson and Thornton, 1993).

With interstrand angles defined, it is now possible to characterize the topology between β-strands in different β-sheets, and thus to give an operational definition of Greek key motifs that span more than one β-sheet: whenever we hit a “99” in the Richardson topology column, we simply look up the corresponding interstrand angle in a table.

Operational definition of (3,1)C Greek key: let the four sequential β-strands be i to i+3. They form a (3,1)C Greek key if all of the following are satisfied:
- i, i+1 and i+3 belong to one β-sheet, and i+2 belongs to another β-sheet
- the Richardson topology of i is “+x” or “-x”, where x is a positive integer
- θ between i+1 and i+2 is greater than 120°
- θ between i+2 and i+3 is greater than 120°
- θ between i+3 and i is greater than 120° (this is a sanity check)
- i is hydrogen bonded to both i+1 and i+3

Assume that we are currently processing line i to line i+3 of pdb#.str.

    if not((sheet # of line i == sheet # of line i+1 == sheet # of line i+3)
           and sheet # of line i+2 not= sheet # of line i) then
        print(“Not a (3,1)[C] Greek key!”)
        return false
    end if
    x = Richardson topology number of line i
    if (x contains the suffix ‘X’) or x == 99 then
        print(“Not a (3,1)[C] Greek key!”)
        return false
    end if
    θ[1] = interstrand_angle(i+1, i+2)
    θ[2] = interstrand_angle(i+2, i+3)
    θ[3] = interstrand_angle(i+3, i)      // sanity check from the definition
    if not(θ[1] > 120° and θ[2] > 120° and θ[3] > 120°) then
        print(“Not a (3,1)[C] Greek key!”)
        return false
    end if
    if not(hydrogen_bonded(i, i+1) and hydrogen_bonded(i, i+3)) then
        print(“Not a (3,1)[C] Greek key!”)
        return false
    end if
    print(“(3,1)[C] Greek key found!”)
    return true

    function interstrand_angle(strand1#, strand2#)
        vector u = β_vector_table[strand1#]
        vector v = β_vector_table[strand2#]
        dotproduct = <u, v>
        u_modulus = sqrt(u[x]*u[x] + u[y]*u[y] + u[z]*u[z])
        v_modulus = sqrt(v[x]*v[x] + v[y]*v[y] + v[z]*v[z])
        the_cosine = dotproduct / (u_modulus * v_modulus)
        return arccos(the_cosine), converted to degrees

So the only remaining question is: what is the best way to approximate a β-strand by a vector? An intuitive approach would be to construct a vector whose starting point is the N-end α-carbon and whose end point is the C-end α-carbon. However, if the β-strand is highly curved, this might not work well. Dr. Hutchinson and Dr. Thornton proposed a “line of best fit” approach to this problem.

The diagram above shows a β-strand; notice how the α-carbons zigzag up and down, so that the β-strand is “pleated”. (Hutchinson and Thornton, 1993) constructed the vector V[i] = (Cα[i+1] - Cα[i]) + (Cα[i-1] - Cα[i]) and took the “X” point to be one quarter of the way from Cα[i] along V[i]. Together, the set of “X” points provides a “smoothed out” version of the originally pleated β-strand.
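The smoothing step can be sketched in a few lines of Python, assuming the Cα coordinates of one strand are already available as (x, y, z) tuples in N-to-C order (the function name is hypothetical). Note that Cα[i] + V[i]/4 simplifies to (Cα[i-1] + 2·Cα[i] + Cα[i+1])/4, a weighted average of each Cα and its two neighbours, which is why the zigzag is flattened.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def smoothed_points(ca: Sequence[Point]) -> List[Point]:
    """Return the 'X' points of a beta-strand from its C-alpha coordinates.

    Hypothetical helper. For each interior residue i,
    V[i] = (Ca[i+1] - Ca[i]) + (Ca[i-1] - Ca[i]), and the 'X' point lies a
    quarter of the way from Ca[i] along V[i]. The two terminal residues
    have no such point.
    """
    if len(ca) < 3:
        raise ValueError("need at least three C-alpha atoms")
    points: List[Point] = []
    for prev, cur, nxt in zip(ca, ca[1:], ca[2:]):
        v = tuple((n - c) + (p - c) for p, c, n in zip(prev, cur, nxt))
        points.append(tuple(c + 0.25 * d for c, d in zip(cur, v)))
    return points
```

For an idealized zigzag in the xy-plane, smoothed_points([(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]) returns [(1.0, 0.5, 0.0), (2.0, 0.5, 0.0), (3.0, 0.5, 0.0)]: the pleat disappears and the points lie on a straight line.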

The line that best fits the set of “X” points is then taken as the vector approximating the β-strand. In (Hutchinson and Thornton, 1993), the line of best fit was found by applying Principal Component Analysis (Burkowski, 2006). For Dr. Li’s project, however, I decided to use the more direct “3D orthogonal distance regression (ODR)” method (George, 2005).

ODR extends the least-squares method of linear regression from points in a plane to points in space; unlike ordinary least squares, which minimizes vertical offsets, it minimizes the perpendicular distances from the points to the line.

ODR problem statement: given a set of points in 3D space, find a line such that the sum of the squares of the distances from each point to the line is minimized.

Assume that the line of best fit has the following parametric form:

$$ x = x_0 + a t, \qquad y = y_0 + b t, \qquad z = z_0 + c t $$

Let the given set of points in 3D space be S = {(x1, y1, z1), (x2, y2, z2), …, (xn, yn, zn)}. Then, for any point P(x, y, z) on the line of best fit for S, the sum of the squared distances from all points in S to P, expressed as a function of (x0, y0, z0), a point on the line of best fit, is:

$$ f(x_0, y_0, z_0) = \sum_{i=1}^{n} \left[ (x_i - x_0 - a t)^2 + (y_i - y_0 - b t)^2 + (z_i - z_0 - c t)^2 \right] $$

Therefore, the second derivative of f(x0, y0, z0) with respect to x0 is:

$$ \frac{\partial^2 f}{\partial x_0^2} = 2n > 0 $$

According to (Weisstein, 2006), a positive second derivative indicates a minimum at the points where the corresponding first derivative equals 0. Setting

$$ \frac{\partial f}{\partial x_0} = -2 \sum_{i=1}^{n} (x_i - x_0 - a t) = 0 $$

shows that f(x0, y0, z0) is minimized with respect to x0 at the following value of t:

$$ \frac{x_0 - \frac{1}{n}\sum_{i=1}^{n} x_i}{a} = -t $$

From multivariate calculus, we know that a minimum of a multivariate function f(x1, x2, …, xn) must also be a minimum with respect to each of its independent variables xi taken separately, i.e. every partial derivative must vanish there, for all i.
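The same conclusion can be reached more compactly in vector form. As a sketch, write the data points as vectors x_1, …, x_n, let a candidate line pass through the point p with unit direction u, and let Q = I − uuᵀ be the projection onto the plane perpendicular to u (these symbols are introduced here for illustration). D(p) is then the true sum of squared point-to-line distances for a fixed direction u:

```latex
D(\mathbf{p}) = \sum_{i=1}^{n} \bigl\| Q(\mathbf{x}_i - \mathbf{p}) \bigr\|^2,
\qquad Q = I - \mathbf{u}\mathbf{u}^{\mathsf T}, \quad Q^{\mathsf T} Q = Q

\nabla_{\mathbf{p}} D = -2\,Q \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{p})
                      = -2n\,Q(\bar{\mathbf{x}} - \mathbf{p}),
\qquad \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i

\nabla_{\mathbf{p}} D = \mathbf{0}
\iff \bar{\mathbf{x}} - \mathbf{p} = \bigl(\mathbf{u}^{\mathsf T}(\bar{\mathbf{x}} - \mathbf{p})\bigr)\,\mathbf{u}
\iff \mathbf{p} = \bar{\mathbf{x}} + s\,\mathbf{u} \ \text{for some } s \in \mathbb{R}
```

Since D is a sum of convex quadratics (its Hessian 2nQ is positive semidefinite), these stationary points are global minima, and every one of them puts the centroid on the line. This is the conclusion the following slides reach: the centroid lies on the line of best fit, whatever its direction.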

In the previous few slides we minimized f(x0, y0, z0) with respect to x0. To find the point that minimizes f(x0, y0, z0), we must also minimize it with respect to y0 and z0. Fortunately f is symmetric, so the minimizations with respect to y0 and z0 take forms analogous to the one with respect to x0. Writing x̄ = (1/n)·Σ xi, ȳ = (1/n)·Σ yi and z̄ = (1/n)·Σ zi, this is expressed mathematically as:

$$ \frac{x_0 - \bar{x}}{a} = -t, \qquad \frac{y_0 - \bar{y}}{b} = -t, \qquad \frac{z_0 - \bar{z}}{c} = -t $$

Since all three minimization equations have −t on their right-hand side, we may combine them into the following equation, where (x̄, ȳ, z̄) is known as the centroid of point set S:

$$ \frac{x_0 - \bar{x}}{a} = \frac{y_0 - \bar{y}}{b} = \frac{z_0 - \bar{z}}{c} $$

Thus, for any triple (x0, y0, z0) that satisfies this equation, f(x0, y0, z0), and hence the sum of squared distances from the points in S to the line of best fit, is minimized. Note that (x0, y0, z0) is taken to be a variable point here.

It is easy to see that the centroid itself satisfies the equation: substituting (x̄, ȳ, z̄) for (x0, y0, z0) gives 0 = 0 = 0, which is always true. Therefore f is minimized at the centroid, and by definition the centroid is on the line of best fit.

Recall the original parametric equation for the line of best fit. Now that we have found a suitable (x0, y0, z0) on the line, all we need additionally is the direction vector (a, b, c); we will then be able to fully characterize the line of best fit and solve the problem.

Let C be the centroid (which is an (x0, y0, z0) on the line of best fit), L be the actual line of best fit, and P be the plane perpendicular to L that contains the centroid (so P is uniquely determined); that is, the direction of L is a normal vector of the plane P (George, 2005).
The setup is shown in the diagram to the right.

For any point xi in the set S, applying the Pythagorean theorem gives:

    d²(xi, L) = d²(xi, C) − d²(xi, P)

where d(xi, L) is the distance from point xi to line L, d(xi, C) is the distance from xi to the centroid, and d(xi, P) is the distance from xi to plane P.

Summing this relationship over all points in S gives:

$$ \sum_{i=1}^{n} d^2(x_i, L) = \sum_{i=1}^{n} d^2(x_i, C) - \sum_{i=1}^{n} d^2(x_i, P) $$

The summation on the left-hand side is the one we would like to minimize. Since the sum of squared distances from the points in S to the centroid is a constant, we can minimize the left-hand side by maximizing the sum of the squared distances from the points in S to the plane P on the right-hand side.

But how do we maximize the sum of the squared distances from the points in S to the plane P? We use a method called “orthogonal distance regression plane analysis” (ODR plane analysis).

Let the equation of plane P (now known to be the “orthogonal distance regression plane”) be ax + by + cz + d = 0. Notice that the coefficients of x, y and z for the plane form the vector (a, b, c), which is also the direction vector of the line of best fit, since that line is normal to the plane.

Since the centroid is on plane P, we can substitute the centroid into the equation of plane P to find d.
Thus, d = −(a·x̄ + b·ȳ + c·z̄), and the equation of plane P can be rewritten as:

$$ a(x - \bar{x}) + b(y - \bar{y}) + c(z - \bar{z}) = 0 $$

From analytic geometry, we know that the distance from a point (x0, y0, z0) to a plane Ax + By + Cz + D = 0 is:

$$ \frac{|A x_0 + B y_0 + C z_0 + D|}{\sqrt{A^2 + B^2 + C^2}} $$

Therefore, the distance from a point (xi, yi, zi) in set S to the ODR plane ax + by + cz + d = 0 is:

$$ \frac{|a x_i + b y_i + c z_i + d|}{\sqrt{a^2 + b^2 + c^2}} $$

Using the alternative form of the equation of plane P given above, we have:

$$ \frac{|a(x_i - \bar{x}) + b(y_i - \bar{y}) + c(z_i - \bar{z})|}{\sqrt{a^2 + b^2 + c^2}} $$

Hence, the function relating (a, b, c) to the sum of the distances from all points in S to plane P is:

$$ f(a, b, c) = \sum_{i=1}^{n} \frac{|a(x_i - \bar{x}) + b(y_i - \bar{y}) + c(z_i - \bar{z})|}{\sqrt{a^2 + b^2 + c^2}} $$

Since the numerator and the denominator of each term of f(a, b, c) are non-negative, let us square each term of f to produce the function F:

$$ F(a, b, c) = \sum_{i=1}^{n} \frac{\left[a(x_i - \bar{x}) + b(y_i - \bar{y}) + c(z_i - \bar{z})\right]^2}{a^2 + b^2 + c^2} $$

F, rather than f, is the quantity we set out to maximize: it is exactly the sum of the squared distances from the points in S to plane P. F is also more computationally tractable than f, and it admits a matrix notation that supports further analysis.

Let

$$ X = \begin{pmatrix} x_1 - \bar{x} & \cdots & x_n - \bar{x} \\ y_1 - \bar{y} & \cdots & y_n - \bar{y} \\ z_1 - \bar{z} & \cdots & z_n - \bar{z} \end{pmatrix}, \qquad w = \begin{pmatrix} a \\ b \\ c \end{pmatrix} $$

then

$$ F(a, b, c) = \frac{w^{\mathsf T} X X^{\mathsf T} w}{w^{\mathsf T} w} $$

And you must be screaming, “but how?!” The key step is that wᵀXXᵀw = (Xᵀw)ᵀ(Xᵀw) = ‖Xᵀw‖², and the i-th entry of Xᵀw is a(xi − x̄) + b(yi − ȳ) + c(zi − z̄). So the numerator is the sum of the squared numerators of F, and wᵀw = a² + b² + c² is their common denominator: the matrix expression is precisely the non-matrix form of the F(a, b, c) function.

So the final question is: how do we maximize F? Remember that maximizing F is what minimizes the total squared distance from all points to the line of best fit. Dr. Burkowski to the rescue!

This final set of slides on the math of the Greek key hunter is inspired by material taught by Dr. Burkowski in his Fall 2006 graduate course on kernel methods (CS 898). According to Dr.
Burkowski, the expression

$$ \frac{w^{\mathsf T} X X^{\mathsf T} w}{w^{\mathsf T} w} $$

is called a Rayleigh quotient (Burkowski, 2006).

The Courant-Fisher theorem states that “if A (n×n) is symmetric, then for k = 1, …, n, the kth eigenvalue λk(A) of the matrix A satisfies

$$ \lambda_k(A) = \max_{\dim(T) = k} \; \min_{0 \neq v \in T} \frac{v^{\mathsf T} A v}{v^{\mathsf T} v} = \min_{\dim(T) = n - k + 1} \; \max_{0 \neq v \in T} \frac{v^{\mathsf T} A v}{v^{\mathsf T} v} $$

with the extrema achieved by the corresponding eigenvector” (Shawe-Taylor and Cristianini, 2004), where the eigenvalues are ordered λ1(A) ≥ λ2(A) ≥ … ≥ λn(A).

Letting A = XXᵀ, the Courant-Fisher theorem essentially says that the extrema of the Rayleigh quotient occur only for w = (a, b, c) equal to an eigenvector of XXᵀ or a multiple of one. In particular (k = 1), the maximum value of the Rayleigh quotient is the largest eigenvalue of A = XXᵀ.

It is then immediately clear that, to maximize the Rayleigh quotient, we take w to be a multiple of the eigenvector of XXᵀ that corresponds to the largest eigenvalue of XXᵀ. This is because, by definition, if λ is an eigenvalue of matrix A and ξ is an eigenvector corresponding to λ, then Aξ = λξ. Thus, for λmax and ξmax of the matrix A, we have:

$$ \frac{\xi_{\max}^{\mathsf T} A \, \xi_{\max}}{\xi_{\max}^{\mathsf T} \xi_{\max}} = \frac{\lambda_{\max} \, \xi_{\max}^{\mathsf T} \xi_{\max}}{\xi_{\max}^{\mathsf T} \xi_{\max}} = \lambda_{\max} $$

Another way of looking at this is the analogy of “mountains”: Courant-Fisher says that all the peaks (and valleys) of the Rayleigh quotient occur at, and only at, the eigenvectors, with heights given by the corresponding eigenvalues. So what should you do to globally maximize the Rayleigh quotient? Pick the globally highest peak, of course!

At this point, the 3D orthogonal distance regression problem has been satisfactorily solved. For more information on the Courant-Fisher theorem, please refer to p. 57 of “the kernel book” (Shawe-Taylor and Cristianini, 2004). Dr.
Burkowski’s CS 898 course notes (Burkowski, 2006) contain a proof of the Courant-Fisher theorem.

    procedure compute_β_vector_table
        for each β-strand in the PROMOTIF output table
            m = starting residue number of the β-strand
            n = ending residue number of the β-strand
            pull out all Cα atoms in the PDB file with residue number inside [m, n]
            {S} = empty set
            for each triple (Cα[i+1], Cα[i], Cα[i-1])
                compute V[i] = (Cα[i+1] - Cα[i]) + (Cα[i-1] - Cα[i])
                add the quarter point Cα[i] + V[i]/4 to {S}
            end for
            β_vector_table[current β-strand] = ODR({S})
        end for

    function ODR({S}) returns vector (a, b, c)
        sumx = sumy = sumz = 0
        for each point p in {S}
            sumx = sumx + p[x]
            sumy = sumy + p[y]
            sumz = sumz + p[z]
        end for
        centroid = (sumx/|{S}|, sumy/|{S}|, sumz/|{S}|)
        for i = 1..|{S}|
            M[1][i] = x-coordinate of (i)th point of {S} - centroid[x]
            M[2][i] = y-coordinate of (i)th point of {S} - centroid[y]
            M[3][i] = z-coordinate of (i)th point of {S} - centroid[z]
        end for
        A = M * transpose(M)
        find eigenvector (a, b, c) corresponding to the largest eigenvalue of A
        // eigenvectors are only defined up to sign: orient (a, b, c) from the N-end to the C-end
        if (a, b, c) · (last point of {S} - first point of {S}) < 0 then
            (a, b, c) = -(a, b, c)
        end if
        return (a, b, c)

With the last major component of the Greek key hunter in place, the operational definitions for the rest of the Greek key classifications can now be given.
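For readers who want to experiment, here is a minimal NumPy sketch of the ODR function together with the interstrand angle test. It assumes the smoothed points of one strand are given as an (n, 3) array-like in N-to-C order; the function names are hypothetical. Because the direction is the eigenvector of the largest eigenvalue of the centred scatter matrix, this line is the first principal component of the points, so it should agree with the PCA fit used by Hutchinson and Thornton.

```python
import numpy as np


def odr_direction(points) -> np.ndarray:
    """Unit direction (a, b, c) of the 3D orthogonal-distance-regression line.

    Hypothetical helper. points: array-like of shape (n, 3), ordered from the
    N-end to the C-end of the strand.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or len(pts) < 2:
        raise ValueError("expected an (n, 3) array with n >= 2")
    centred = pts - pts.mean(axis=0)   # subtract the centroid
    M = centred.T                      # 3 x n matrix, one column per point
    A = M @ M.T                        # 3 x 3 symmetric scatter matrix
    _, eigvecs = np.linalg.eigh(A)     # eigenvalues come back in ascending order
    w = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    # Eigenvectors are defined only up to sign; point the vector N -> C.
    if np.dot(w, pts[-1] - pts[0]) < 0:
        w = -w
    return w


def interstrand_angle(u, v) -> float:
    """Angle in degrees between two strand vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against round-off pushing the cosine outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))


def antiparallel(u, v, threshold: float = 120.0) -> bool:
    """Hutchinson and Thornton's criterion: angle greater than 120 degrees."""
    return interstrand_angle(u, v) > threshold
```

In this sketch, β_vector_table[strand] would hold odr_direction(...) of that strand’s smoothed points, and each “θ > 120°” condition in the operational definitions becomes a call to antiparallel(...).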

Operational definition of (3,1)N Greek key
- β-strands i, i+2 and i+3 are in one β-sheet; β-strand i+1 is in another
- the Richardson topology of β-strand i+2 is “+x” or “-x”, where x is a positive integer
- the interstrand angle between β-strands i and i+1 is greater than 120°
- the interstrand angle between β-strands i+1 and i+2 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
- β-strand i+3 is hydrogen bonded to β-strands i and i+2

Operational definition of (2,2) Greek key
- β-strands i and i+3 are in one β-sheet; β-strands i+1 and i+2 are in another
- the Richardson topology of β-strand i+1 is “+x” or “-x”, where x is a positive integer
- the interstrand angle between β-strands i and i+1 is greater than 120°
- the interstrand angle between β-strands i+2 and i+3 is greater than 120°
- the interstrand angle between β-strands i and i+3 is greater than 120° (this is a sanity check)

Operational definition of (3,1)2C Greek key
- β-strands i, i+2 and i+3 are in one β-sheet; β-strand i+1 is in another
- the Richardson topology of β-strand i+2 is “+x” or “-x”, where x is a positive integer
- the interstrand angle between β-strands i and i+1 is greater than 120°
- the interstrand angle between β-strands i+1 and i+2 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
- β-strand i is hydrogen bonded to β-strands i+2 and i+3

Operational definition of (3,1)2N Greek key
- β-strands i, i+1 and i+3 are in one β-sheet; β-strand i+2 is in another
- the Richardson topology of β-strand i is “+x” or “-x”, where x is a positive integer
- the interstrand angle between β-strands i+1 and i+2 is greater than 120°
- the interstrand angle between β-strands i+2 and i+3 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
- β-strand i+3 is hydrogen bonded to β-strands i and i+1

Operational definition of (3,1)3C Greek key
- β-strands i+1, i+2 and i+3 are in one β-sheet; β-strand i is in another
- the Richardson topologies of β-strands i+1 and i+2 are “-x, +y” or “+x, -y”, where x and y are positive integers
- the interstrand angle between β-strands i and i+1 is greater than 120°
- the interstrand angle between β-strands i and i+3 is greater than 120° (this is a sanity check)

Operational definition of (3,1)3N Greek key
- β-strands i, i+1 and i+2 are in one β-sheet; β-strand i+3 is in another
- the Richardson topologies of β-strands i and i+1 are “-x, +y” or “+x, -y”, where x and y are positive integers
- the interstrand angle between β-strands i+3 and i+2 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)

Operational definition of (3,1)4C Greek key
- β-strands i, i+1 and i+2 are in one β-sheet; β-strand i+3 is in another
- the Richardson topologies of β-strands i and i+1 are “-x, -y” or “+x, +y”, where x and y are positive integers
- the interstrand angle between β-strands i+3 and i+2 is greater than 120°

Operational definition of (3,1)4N Greek key
- β-strands i+1, i+2 and i+3 are in one β-sheet; β-strand i is in another
- the Richardson topologies of β-strands i+1 and i+2 are “-x, -y” or “+x, +y”, where x and y are positive integers
- the interstrand angle between β-strands i and i+1 is greater than 120°

Operational definition of (2,2)2 Greek key
- β-strands i and i+2 are in one β-sheet; β-strands i+1 and i+3 are in another
- the interstrand angle between β-strands i and i+1 is greater than 120°
- the interstrand angle between β-strands i+1 and i+2 is greater than 120°
- the interstrand angle between β-strands i+2 and i+3 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)

Operational definition of (2,2)3 Greek key
- β-strands i and i+1 are in one β-sheet; β-strands i+2 and i+3 are in another
- the Richardson topologies of β-strands i and i+2 are “-x, +y” or “+x, -y”, where x and y are positive integers
- the interstrand angle between β-strands i+1 and i+2 is greater than 120°
- the interstrand angle between β-strands i+3 and i is greater than 120° (this is a sanity check)
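Every definition above begins with how the four strands are distributed over β-sheets, so a practical first step is to compute that distribution once and run only the class-specific tests that match it. The Python sketch below encodes the sheet distributions listed in Parts 2 and 3 as a lookup table. The names are hypothetical, and the Richardson topology, interstrand angle and hydrogen-bond conditions would still have to be checked for each candidate class.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical lookup table. Key: sheet pattern of strands (i, i+1, i+2, i+3),
# with sheet ids relabelled in order of first appearance, so that
# ('B', 'B', 'A', 'B') becomes (0, 0, 1, 0).
CANDIDATES: Dict[Tuple[int, ...], Tuple[str, ...]] = {
    (0, 0, 0, 0): ("(4,0)C", "(4,0)N"),
    (0, 0, 1, 0): ("(3,1)C", "(3,1)2N"),   # i, i+1, i+3 | i+2
    (0, 1, 0, 0): ("(3,1)N", "(3,1)2C"),   # i, i+2, i+3 | i+1
    (0, 1, 1, 1): ("(3,1)3C", "(3,1)4N"),  # i+1, i+2, i+3 | i
    (0, 0, 0, 1): ("(3,1)3N", "(3,1)4C"),  # i, i+1, i+2 | i+3
    (0, 1, 1, 0): ("(2,2)",),              # i, i+3 | i+1, i+2
    (0, 1, 0, 1): ("(2,2)2",),             # i, i+2 | i+1, i+3
    (0, 0, 1, 1): ("(2,2)3",),             # i, i+1 | i+2, i+3
}


def sheet_pattern(sheets: Sequence[Hashable]) -> Tuple[int, ...]:
    """Relabel sheet ids by order of first appearance."""
    first_seen: Dict[Hashable, int] = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in sheets)


def candidate_classes(sheets: Sequence[Hashable]) -> Tuple[str, ...]:
    """Greek key classes whose sheet distribution matches strands i..i+3.

    Quartets spread over three or more sheets give an empty tuple.
    """
    if len(sheets) != 4:
        raise ValueError("expected the sheet ids of exactly four strands")
    return CANDIDATES.get(sheet_pattern(sheets), ())
```

For the γ-crystallin example in Part 3, whose 3rd strand lies in a second sheet, the lookup leaves (3,1)C and (3,1)2N to be tested; the remaining conditions, notably hydrogen bonding, then distinguish between them.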

References

Burkowski, F. J. (2006) CS 898 Course Notes: Unit 5, Pattern Analysis and Eigen-Decompositions. Waterloo: University of Waterloo. http://www.student.cs.uwaterloo.ca/~cs898/005_EigenDecompositions.pdf

George, M. (2005) “Line of Best Fit for Points in Three Dimensional Space”. http://mathforum.org/library/drmath/view/69103.html

Hutchinson, E.G. and Thornton, J.M. (1993) The Greek key motif: extraction, classification and analysis. Protein Engineering, 6(3), 233-245.

Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.

Li, M. (2006) CS 882 Course Notes. Waterloo: University of Waterloo.

Richardson, J.S. (1977) β-sheet topology and the relatedness of proteins. Nature, 268, 495-500.

Richardson, J.S. (1981) The anatomy and taxonomy of protein structure. Advances in Protein Chemistry, 34, 167-339.

Saul, F.A. and Poljak, R.J. (1992) Crystal structure of human immunoglobulin fragment Fab New refined at 2.0 Å resolution. Proteins, 14, 363-371.

Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis. Cambridge: Cambridge University Press.

Weisstein, E.W. (2006) “Second Derivative Test.” From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/SecondDerivativeTest.html

Acknowledgements
- Dr. Ming Li, for giving me the opportunity to work on such a wonderful project
- Dr. Gail Hutchinson and Dr. Janet Thornton, whose PROMOTIF software saved my life
- Sun, Yang and Gao, Xin, for the PDB database
- And finally, a big “Thank you!” to the CS 882 class for taking the time to listen to my presentation!