Structure Classification

Protein Structure

Reading: Lesk, chapter 5. Details on SCOP and CATH can be found in Structural Bioinformatics, Bourne/Weissig, chapters 12 and 13.
Michael Schroeder, BioTechnological Center (Biotec), TU Dresden

Folding
- Proteins are linear polymers: a mainchain carrying different amino acid side chains
- Proteins fold spontaneously, reaching a state of minimal energy
- Side chains and main chains interact with one another and with solvent
- Example movie: Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS, Suppl. 1, 185-191

Examining Proteins
- Specialised tools offer different views of a structure
- Corey, Pauling, Koltun (CPK)
  - Diameter of sphere ~ atomic radius
  - Hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow
- Cartoon
- Wire
- Balls

Protein Folding
- Conformation of a residue
  - Rotation around the N-Cα bond: φ (phi)
  - Rotation around the Cα-C bond: ψ (psi)
  - Rotation around the peptide bond: ω (omega)
- The peptide bond tends to be planar and in one of two states:
  - trans, ω = 180° (usually)
  - cis, ω = 0° (rarely, and mostly at proline)
Image taken from www.expasy.org/swissmod/course

Sasisekharan-Ramakrishnan-Ramachandran plot
- Solid line = energetically preferred
- Outside dotted line = disallowed
- Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta-strand)
- Glycine has additional conformations (e.g. left-handed alpha helix = αL region), including in the lower right panel
Image taken from www.expasy.org/swissmod/course

Ramachandran plot: Example for conformations
- Plot for a protein with mostly beta-sheets
Image taken from www.expasy.org/swissmod/course

Helices and Strands
- Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively
- Such secondary structure elements are stabilised by weak hydrogen bonds
- They are connected by turns or loops, regions in which the chain alters direction
- Turns are often surface exposed and tend to contain charged or polar residues

Alpha Helix
- Residue j is hydrogen-bonded to residue j+4
- 3.6 residues per turn
- 1.5 Å rise per residue
- Repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å
- φ = -60°, ψ = -45°
Image taken from www.expasy.org/swissmod/course

Beta strand
[Figure] Image taken from www.expasy.org/swissmod/course

Beta Sheets
[Figure] Image taken from www.expasy.org/swissmod/course

Turn
- Residue j is hydrogen-bonded to residue j+3
- Often contains proline and glycine
Image taken from www.expasy.org/swissmod/course

How to Fold a Structure
- All residues must have stereochemically allowed conformations
- Buried polar atoms must be hydrogen-bonded
  - If a few are missed, it might be energetically preferable to bond these to solvent
- Enough hydrophobic surface must be buried, and the interior must be sufficiently densely packed
- There is evidence that folding occurs hierarchically: first secondary structure elements, then supersecondary structure, and so on
- This justifies a hierarchical approach when simulating folding

Structure Alignment
[Figures: structures to be superimposed] Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Structure Alignment
- In the same way that we align sequences, we wish to align structures
- Let's start simple: how do we score an alignment?
  - Sequences: e.g. percentage of matching residues
  - Structures: rmsd (root mean square deviation)

Root Mean Square Deviation
- What is the distance between two points a with coordinates $(x_a, y_a, z_a)$ and b with coordinates $(x_b, y_b, z_b)$?
- Euclidean distance:
  $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$

Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as
  $\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$

Quality of Alignment and Example
- Unit of RMSD: e.g. Ångströms
- Identical structures: RMSD = 0
- Similar structures: RMSD is small (1-3 Å)
- Distant structures: RMSD > 3 Å
- Example: structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A

Pitfalls of RMSD
- All atoms are treated equally, although e.g. residues on the surface have a higher degree of freedom than those in the core
- The best alignment does not always mean minimal RMSD
- The significance of an RMSD value is size dependent
From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650

Alternative RMSDs
- aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms
- bRMSD = the RMSD over the highest scoring residue pairs
- wRMSD = weighted RMSD
Source: W. Taylor (1999), Protein Science, 8: 654-665. http://www.prosci.uci.edu/Articles/Vol8/issue3/8272/8272.html#relat
From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650

Computing Structural Alignments
- DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment
- How does it work?
- Atoms:
  - Given: the atomic coordinates of two structures
- Compute two distance matrices:
  - For each structure, compute all pairwise inter-atom distances
  - This step is done because the computed distances are independent of the coordinate system
  - The two original atomic coordinate sets cannot be compared directly; the two distance matrices can
- Align the two distance matrices:
  - Find small (e.g. 6x6) sub-matrices along the diagonal that match
  - Extend these matches to form the overall alignment
- This method is somewhat similar to how BLAST works
- SSAP (double dynamic programming) follows in term 3

DALI Example
- The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red)

Protein zinc finger (4znf)
[Figure] Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Superimposed 3znf and 4znf
- 30 CA atoms: RMS = 0.70 Å
- 248 atoms: RMS = 1.42 Å
- [Figure label: Lys30]
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Superimposed 3znf and 4znf backbones
- 30 CA atoms: RMS = 0.70 Å
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

RMSD vs. Sequence Similarity
- Even at low sequence identity, good structural alignments are possible
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt

Structure Classification

Why classify structures?
- Structure similarity is a good indicator of homology; therefore classify structures
- Classification at different levels:
  - Similar general folding patterns (structures not necessarily related)
  - Possibly low sequence similarity, but similar structure and function: very likely homology
  - High sequence similarity: similar structures and homology
- Classification can be used to investigate evolutionary relationships and possibly to infer function

Structure Classification
- SCOP: Structural Classification of Proteins
  - Hand curated (Alexei Murzin, Cambridge) with some automation
- CATH: Class, Architecture, Topology, Homology
  - Automated where possible, with some checks by hand
- FSSP: Fold classification based on Structure-Structure alignment of Proteins
  - Fully automated
- Reasonable correspondence between the three (>80%)

Evolutionary Relation
- Strong sequence similarity is assumed to be sufficient to infer homology
- Close structural and functional similarity together are also considered sufficient to infer homology
- Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity
- Similar function alone is not sufficient, as proteins may have developed it due to functional selection
- In general, structure is more conserved than sequence
- Beware: descendants of a common ancestor may differ in function, structure, and sequence. Such cases are difficult to detect

What is a domain? Single and Multi-Domain Proteins
[Figure]

What is a domain?
- Functional: a domain is an "independent" functional unit, which occurs in more than one protein
- Physicochemical: a domain has a hydrophobic core
- Topological: intra-domain distances between atoms are minimal, inter-domain distances maximal
- It is difficult to define a domain exactly
- It is difficult to agree on exact domain borders

Domains re-occur
- A domain re-occurs in different structures and possibly in the context of different other domains
- P-loop domain in
  - 1goj: Structure of a Fast Kinesin: Implications for ATPase Mechanism and Interactions with Microtubules (motor protein; single domain)
  - 1ii6: Crystal Structure of the Mitotic Kinesin Eg5 in Complex with Mg-ADP (cell cycle; two domains)

Domains re-occur
- 1in5: interaction of P-loop domain (green & orange) and winged helix DNA binding domain
- 1a5t: interaction of P-loop domain (green & orange) and DNA polymerase III domain

Domains have hydrophobic core
- Hydrophobicity scale: Kyte J., Doolittle R.F., J. Mol. Biol. 157:105-132 (1982)
- [Figure: Hydrophobicity Plot for 1GOJ Kinesin Motor; x-axis: Residue (1 to 301), y-axis: Hydrophobicity (-3 to 3)]
- Scale values:
  Ala  1.8    Arg -4.5    Asn -3.5    Asp -3.5    Cys  2.5
  Gln -3.5    Glu -3.5    Gly -0.4    His -3.2    Ile  4.5
  Leu  3.8    Lys -3.9    Met  1.9    Phe  2.8    Pro -1.6
  Ser -0.8    Thr -0.7    Trp -0.9    Tyr -1.3    Val  4.2

Intra-domain distances minimal
- Distances between atoms within a domain are minimal
- Distances between atoms of two different domains are maximal

PDB, Proteins, and Domains
- Ca. 20,000 structures in PDB
- 50% single domain
- 50% multiple domains
- 90% have fewer than 5 domains
- Distribution of the number of domains per structure:
  Dom#   Freq.
  1      8464
  2      4358
  3       926
  4      1888
  5       148
  6       624
  7        42
  8       491
  9        22
  10       58
  …
  30        7
  31        1
  32       16
  36        1
  40        8
  42        1
  48        3
  49        1
- [Figure: Distribution of Number of Domains; x-axis: Number of Domains, y-axis: Frequency]

A structure with 49 domains
- 1AON, Asymmetric Chaperonin Complex GroEL/GroES/(ADP)7

SCOP: Structural Classification of Proteins
Hierarchy from the top, with examples at each level:
- CLASS: All alpha (218), All beta (144), Alpha+Beta (279), Alpha/Beta (136)
- FOLD: Trypsin-like serine proteases (1), Immunoglobulin-like (23)
- SUPERFAMILY: Transglutaminase (1), Immunoglobulin (6)
- FAMILY: C1 set domains (antibody constant), V set domains (antibody variable)

Class
- All beta (possibly small alpha adornments)
- All alpha (possibly small beta adornments)

Class
- Alpha/beta (alpha and beta) = single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
  - subclass: beta sheet forming a barrel surrounded by alpha helices
  - subclass: central planar beta sheet
- Alpha+beta (alpha plus beta) = alpha and beta units are largely separated
  - Strands joined by hairpins, leading to antiparallel sheets

Class
- Multi-domain proteins
  - have domains placed in different classes
  - domains have not been observed elsewhere
  - E.g. 1hle

Class
- Membrane (few, and mostly unique) and cell surface proteins
  - E.g. Aquaporin, 1ih5

Class
- Small proteins
  - E.g. Insulin, 1pid

Class
- Coiled coil proteins
  - E.g. 1i4d, Arfaptin-Rac binding fragment

Class
- Low-resolution structures, peptides, designed proteins
  - E.g. 1cis, a designed protein: hybrid between chymotrypsin inhibitor CI-2 and helix E from subtilisin Carlsberg, from barley (Hordeum vulgare), hiproly strain

Fold, Superfamily, Family
- Fold
  - Common core structure, i.e. the same secondary structure elements in the same arrangement with the same topological structure
- Superfamily
  - Very similar structure and function
- Family
  - Sequence identity (>30%) or extremely similar structure and function

Distribution (2007)
  Class                       Fold   Superfamily   Family
  All alpha                    259           459      772
  All beta                     165           331      679
  Alpha/beta                   141           232      736
  Alpha+beta                   334           488      897
  Multidomain                   53            53       74
  Membrane and cell surface     50            92      104
  Small proteins                85           122      202
  Total                       1086          1777     3464

Uses of SCOP
- Automatic classification
- Understanding of protein enzymatic function
- Use superfamily and fold to study distantly related proteins
- Study sequence and structure variability
- Derive substitution matrices for sequence comparison
- Extract structural principles for design
- Study decomposition of multi-domain proteins
- Estimate total number of folds
- Derived databases

PDB, Proteins, Domains revisited
- 80% of PDB structures have only one type of SCOP superfamily
- 15% of PDB structures have two different SCOP superfamilies
- Frequency of the number of SCOP superfamilies per structure:
  sfNo   sfNoFreq
  1      13960
  2       2721
  3        495
  4        178
  5         33
  6         25
  7          1
  9          4
  20         9
  21         1
  22         1
  23         6
- [Figure: Frequency of Number of SCOP Superfamilies; x-axis: Number of Superfamilies, y-axis: Frequency]

A structure with 23 different superfamilies
- 1k9m: Co-crystal Structure of Tylosin Bound to the 50S Ribosomal Subunit of Haloarcula marismortui (ribosome)

The 20 Most Frequently Occurring Superfamilies
  Superfamily                                           SCOP ID   #PDB
  Immunoglobulin                                        b.1.1      823
  Lysozyme-like                                         d.2.1      777
  Trypsin-like serine proteases                         b.47.1     649
  P-loop containing nucleotide triphosphate hydrolases  c.37.1     521
  NAD(P)-binding Rossmann-fold domains                  c.2.1      384
  Globin-like                                           a.1.1      384
  (Trans)glycosidases                                   c.1.8      332
  Acid proteases                                        b.50.1     288
  Concanavalin A-like lectins/glucanases                b.29.1     230
  Thioredoxin-like                                      c.47.1     217
  EF-hand                                               a.39.1     212
  alpha/beta-Hydrolases                                 c.69.1     195
  Cupredoxins                                           b.6.1      178
  Ribonuclease H-like                                   c.55.3     178
  PLP-dependent transferases                            c.67.1     176
  Periplasmic binding protein-like II                   c.94.1     171
  Carbonic anhydrase                                    b.74.1     169
  Metalloproteases ("zincins"), catalytic domain        d.92.1     169
  FAD/NAD(P)-binding domain                             c.3.1      162
  Cytochrome c                                          a.3.1      161

CATH
- Class: secondary structure composition
- Architecture: orientation in 3D
- Topology: connectivity
- Homology: grouped by evidence for homology (sequence, structure and function)

Generating CATH
- 1. Identify close relatives by pairwise sequence alignment
- 2. Detect more distant relatives using
  - 2a. sequence profiles and
  - 2b. structure alignment
- 3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries
- 4. Try 2. and 3. again
- 5. If still unclassified, assign manually

CATH step 1: Sequence-based Identification of Homologous Structures
- > 30% sequence similarity implies similar structure
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
- Reminder…

Hierarchical Clustering
Worked example with five items; the merged distances follow single linkage (minimum).

Start:
       1   2   3   4   5
  1    0   2   6  10   9
  2        0   5   9   8
  3            0   4   5
  4                0   3
  5                    0

After merging 1 and 2:
         (1,2)   3   4   5
  (1,2)    0     5   9   8
  3              0   4   5
  4                  0   3
  5                      0

After merging 4 and 5:
         (1,2)   3   (4,5)
  (1,2)    0     5     8
  3              0     4
  (4,5)                0

After merging 3 and (4,5):
              (1,2)   (3,(4,5))
  (1,2)         0         5
  (3,(4,5))               0

[Figure: resulting dendrogram over the leaves 1 2 3 4 5]

Hierarchical Clustering: How to define distance between clusters?
Example distance matrix:
      A   B   C
  A   0   1   2
  B       0   1
  C           0

- Single linkage: minimum
  - Example: distance of (A,B) to C is 1
- Complete linkage: maximum
  - Example: distance of (A,B) to C is 2
- Average linkage: average
  - Example: distance of (A,B) to C is 1.5
- Are dendrograms always the same, independent of the linkage method?
[Figure: dendrograms over A, B, C]

Hierarchical Clustering: Chaining
- Beware of chaining when using single linkage
- Because the nearest neighbour is always selected, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different

      A   B   C   D   …   Z
  A   0   1   2   3   …  25
  B       0   1   2   …  24
  C           0   1   …  23
  D               0   …  22
  …
  Z                       0

[Figure: dendrogram chaining A, B, C, D, …, Z]

CATH and single linkage
- It is argued that
  - structural data is quite sparse,
  - hence it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other,
  - so that the chaining effect is even useful

CATH step 2a: Sequence profiles
- Profile-based methods such as PSI-BLAST are used to detect distant relatives
- Build profiles using all sequence data available (rather than only sequences for which a structure exists)
- This increases the quality of profiles dramatically:
  - 51% of distant relatives retrieved using profiles based on sequences with known structure only
  - 82% of distant relatives retrieved using profiles based on all sequences

CATH step 2b: Structure-based methods to detect distant relatives
- For ca. 15% of structures, the sequence-based method does not work
- Example: for globins, sequence similarity can fall below 10%, yet structure and function (oxygen-binding) are preserved
- Use SSAP, the Sequential Structure Alignment Program

Clustering Result of Structure Alignment
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage

Improving Efficiency: GRATH
- Screening large structures (>300 residues) against the database can take days
- Idea of GRATH (Graphical Representation of CATH):
  - Improve efficiency by filtering at a higher level before doing the detailed comparison
  - Represent a protein as a graph where
    - nodes are secondary structure elements, represented by their midpoint, tilt, and rotation
    - edges are distances between midpoints of secondary structure elements
  - Use an algorithm to determine subgraph isomorphism (i.e. does one graph occur in another one?)
  - If yes, do a detailed comparison using SSAP

Structure Prediction and Modelling

Structure Prediction: Four Main Problem Areas
Given a sequence with unknown structure, predict its structure:
- Secondary structure prediction
  - Predict regions of helices and strands
- Homology modelling
  - Predict structure from known structures of one or more related proteins
- Fold recognition
  - Given a library of structures, determine which one (if any) is the fold of the given sequence
- Prediction of novel folds: a priori and knowledge-based methods

Structure Prediction of Novel Folds: Two Approaches
- A priori:
  - Most approaches aim to reproduce inter-atomic interactions by defining an energy function and trying to find its global minimum
  - Problems: inadequacy of the energy function; algorithms get stuck in local minima
- Knowledge-based:
  - Find similarities to known structures or substructures
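
To illustrate the local-minimum problem named above, here is a minimal sketch. It assumes a hypothetical one-dimensional "energy" function with two minima (`toy_energy`, invented for illustration and unrelated to any real force field) and a simple fixed-step downhill search (`greedy_descent`, also hypothetical). The search ends in whichever minimum lies downhill of its starting point, so only one of the two starts below reaches the global minimum.

```python
def toy_energy(x: float) -> float:
    """Hypothetical 1-D energy surface with a global minimum near x = -1.30
    and a shallower local minimum near x = 1.13 (illustration only)."""
    return x**4 - 3 * x**2 + x


def greedy_descent(x: float, step: float = 0.01, max_iter: int = 10_000) -> float:
    """Walk downhill in fixed steps until neither neighbour has lower energy."""
    for _ in range(max_iter):
        # Listing x first means ties are resolved by staying put.
        best = min((x, x - step, x + step), key=toy_energy)
        if best == x:
            break
        x = best
    return x


x_from_right = greedy_descent(2.0)   # slides into the local minimum
x_from_left = greedy_descent(-2.0)   # slides into the global minimum

print(f"start  2.0 -> x = {x_from_right:.2f}, E = {toy_energy(x_from_right):.2f}")
print(f"start -2.0 -> x = {x_from_left:.2f}, E = {toy_energy(x_from_left):.2f}")
```

A purely downhill method has no way to leave the shallower well, which is why stochastic searches such as Monte Carlo simulation are used instead.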
Secondary Structure Prediction
- A successful tool for secondary structure prediction is PROF
- PROF uses a neural network to learn secondary structure from known structures
- About three quarters of PROF's predictions are correct
- At CASP 2000 it predicted e.g. the following (H = helix, E = strand, - = other):

  Residues 1-50
  Sequence    ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
  Prediction  HH------------EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH
  Experiment  -E-------------E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH

  Residues 51-100
  Sequence    IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
  Prediction  --EEEEEEEEEEEEEEEE-----------EEEEEEEE---EEEE-HHHHHH
  Experiment  ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH

  Residues 101-129
  Sequence    EREVYRLEALIRRREEEVFLEVRERAKRQ
  Prediction, Experiment (concatenated)
              HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-

PROF's prediction
- The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture

Homology modelling
- Define the model of an unknown structure by making minimal changes to a relative with known structure
- Align the amino acid sequences of the target and one or more known structures
- Insertions and deletions should be in loop regions
- Determine mainchain segments to represent the regions containing insertions and deletions, and stitch these into the known structure
- Replace the sidechains of the residues that have been mutated
- Examine the model (by hand and computationally) to detect collisions between atoms
- Refine the model by limited energy minimisation

Accuracy of Homology Modelling
- Works for >40-50% sequence similarity
- Example: SWISS-MODEL prediction of the neurotoxin of red scorpion (1DQ7) from the neurotoxin of yellow scorpion (1PTX)

Fold Recognition: 3D Profiles
- Given a sequence, determine which fold (if any) is most similar
- Can we build profiles to represent structures of similar fold (similar to sequence profiles)?
- 3D profiles:
  - Classify the environment of each residue
    - Secondary structure: is it part of a helix, a sheet, or other (determined by mainchain hydrogen-bonding interactions)?
    - Surface exposure: <40 Å², 40-114 Å², or >114 Å² accessible surface area
    - Polar or non-polar nature of the environment
  - This gives a total of 18 residue classes; each residue belongs to exactly one
  - The sequence of these residue classes is the 3D profile

3D Profiles and Alignments
- Structure-structure alignment:
  - 3D profiles of two known structures can be aligned against each other
- Sequence-structure alignment:
  - Based on existing 3D profiles, the probability of a residue occurring in a residue class can be determined
  - Using this probability, we can assign a 3D profile to a sequence
  - And hence align the sequence 3D profile to a structure 3D profile
- For correctly determined protein structures, the structure 3D profile fits the sequence 3D profile well
  - However, other proteins may score even better
- If a structure does not match its own 3D profile well, it is likely that there is an error in the structure determination

Threading
- Pull the query sequence through a known structure and rate the score
- Necessary:
  - A method to score the models, to select the best one
  - A method to calibrate the scores, to decide which of the best is correct

  Homology modelling            Threading
  Identify homologues           Try all possible parents
  Determine optimal alignment   Try many alignments
  Optimize one model            Evaluate many rough models

Scoring for Threading
- Empirical patterns of residue neighbours are derived from known structures
- Observe the distribution of inter-residue distances for all 20 x 20 residue pairs
- Derive a probability distribution as a function of distance in space and along the sequence
- The Boltzmann equation relates probability and energy
- Reverse this and derive an energy function from the probability distribution
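
The last two bullets can be made concrete with a small sketch. The Boltzmann relation $p \propto e^{-E/kT}$, when reversed, gives a pseudo-energy $E = -kT \ln(p_{\text{pair}} / p_{\text{ref}})$ for each distance bin. The function name `inverse_boltzmann`, the use of the all-pairs distribution as reference state, the pseudocount, the value of kT, and the counts below are all assumptions made for illustration. They are not taken from a specific threading program or database.

```python
import math

KT = 0.592  # kcal/mol at roughly 298 K (assumed; only sets the energy scale)


def inverse_boltzmann(pair_counts, reference_counts, kt=KT, pseudocount=1.0):
    """Convert observed distance-bin counts into pseudo-energies.

    For each bin: E = -kT * ln(p_pair / p_reference), where p_pair is the
    fraction of one residue pair type seen in that bin and p_reference is the
    fraction for the reference state (here: all pair types pooled).
    The pseudocount avoids log(0) for empty bins.
    """
    n_bins = len(pair_counts)
    pair_total = sum(pair_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for observed, reference in zip(pair_counts, reference_counts):
        p_pair = (observed + pseudocount) / pair_total
        p_ref = (reference + pseudocount) / ref_total
        energies.append(-kt * math.log(p_pair / p_ref))
    return energies


# Made-up counts per distance bin for one residue pair type and for all pairs.
pair = [5, 40, 30, 25]
reference = [100, 100, 100, 100]
energies = inverse_boltzmann(pair, reference)

for i, e in enumerate(energies):
    print(f"bin {i}: E = {e:+.2f} kcal/mol")
```

Bins in which the pair is seen more often than the reference get a negative (favourable) energy, and bins in which it is seen less often get a positive one. A threading score would then sum such terms over all residue pairs of a candidate model.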
Threading the sequence
[Figure: template and target] Slides from Hanekamp, University of Wyoming, www.uwyo.edu

"Threaded" sequence
- Yellow = adrenergic receptor sequence
- Blue = template structure, bovine rhodopsin (PDB 1F88)
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Modeled structure
[Figure: model with gaps] Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Corrected Model
[Figure] Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Ab initio Structure Prediction

Molecular dynamics
- Structure prediction = place atoms so that the interactions between them create a unique state of maximum stability
- Problems:
  - The model of inter-atomic distances is not complete
  - Computational scale:
    - Large number of variables and massive search space
    - Non-linearities
    - Rough energy surface with many local minima

Conformational energy calculations
- Bond stretching
- Bond angle bending
- Torsion angles (e.g. φ, ψ, ω)
- Van der Waals interactions
  - Short-range repulsion ~$R^{-12}$ and long-range attraction ~$R^{-6}$, where R is the inter-atom distance
- Hydrogen bond
  - Weak chemical/electrostatic interaction, ~$R^{-12}$ and ~$R^{-10}$
- Electrostatics
  - Charges on atoms
- Solvent
  - Interactions with water, salt, sugar, etc.

Rosetta
- Predicts structure by first generating structures of fragments (3-9 residues) using known structures
- Combines the fragments by Monte Carlo simulation, using an energy function with terms for
  - paired beta-sheets
  - burial of hydrophobic residues
- Carries out 1000 simulations
- Results are clustered, and the centre of the largest cluster is presented as the prediction
- Demo

ROSETTA
- The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database.
- Prediction by ROSETTA of an H. influenzae hypothetical protein. Black lines, experimental structure; red lines, prediction

Rosetta
- Prediction by ROSETTA of the N-terminal half of domain 1 of human DNA repair protein Xrcc4. The figure shows a selected substructure of Xrcc4 containing the N-terminal 55 out of 116 residues. Black lines, experimental structure; red lines, prediction

LINUS
- Another programme with a similar idea
- Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of rat endoplasmic reticulum protein ERp29. Black lines, experimental structure; red lines, prediction

Monte Carlo Simulation
- Objective: find the conformation with minimal energy
- Problem: avoid local minima
- Algorithm:
  - 1. Generate a random initial conformation x
  - 2. Perturb conformation x to generate a neighbouring conformation x'
  - 3. Calculate the energies E(x) and E(x') for conformations x and x'
  - 4. If E(x) > E(x') (i.e. x' is an improvement, we go downhill from x to x'), then accept x' as the new conformation and go to 2.
  - 5. If E(x) < E(x') (i.e. x' is no improvement, we go uphill from x to x'), then accept x' as the new conformation with probability p
  - 6. The probability p to accept uphill moves is reduced with every step
  - 7. Go to step 2.
- Steps 1.-4. make sure that we "walk" downhill towards a minimum
- Steps 5.-7. make sure that, if we are in a local minimum, there is a chance to get out of it by accepting an uphill move. It is important that this probability decreases, so that walking uphill becomes more and more unlikely

Summary
- You should know now
  - What helices, strands, and sheets are
  - What a Ramachandran plot is
  - How to score a structural alignment (rmsd)
  - How to compute a structural alignment
  - How a domain can be characterised
  - Why structure classification is useful
  - What the main structure classes are
  - How classifications can be generated automatically
  - What the problems are
  - What secondary structure prediction, homology modelling, threading, and ab initio and knowledge-based structure prediction of novel folds are
- Visit the PDB, SCOP and CATH websites
- Read chapter 5
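
The Monte Carlo procedure from this lecture (steps 1 to 7) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme used by any particular program. The "conformation" is a single number, `toy_energy` is a hypothetical two-minimum function, and the geometric decay of the acceptance probability p is just one possible way to "reduce p with every step". The sketch also starts from a chosen point inside the shallower minimum (instead of a random one, step 1) so that the escape is visible, and it remembers the best conformation seen, which the slides do not mention.

```python
import random


def toy_energy(x: float) -> float:
    """Hypothetical 1-D energy: global minimum near x = -1.30,
    local minimum near x = 1.13 (illustration only)."""
    return x**4 - 3 * x**2 + x


def monte_carlo_minimise(energy, x0, n_steps=20_000, step_size=0.5,
                         p_start=0.8, decay=0.9995, seed=0):
    """Monte Carlo search following steps 1-7 of the lecture."""
    rng = random.Random(seed)
    x = x0                    # step 1 (fixed start here, see lead-in)
    best = x                  # extra bookkeeping: best conformation seen so far
    p = p_start               # probability of accepting an uphill move
    for _ in range(n_steps):  # step 7: repeat from step 2
        x_new = x + rng.uniform(-step_size, step_size)  # step 2: perturb
        if energy(x_new) < energy(x):                   # steps 3-4: downhill
            x = x_new
        elif rng.random() < p:                          # step 5: uphill
            x = x_new
        p *= decay                                      # step 6: reduce p
        if energy(x) < energy(best):
            best = x
    return best


best = monte_carlo_minimise(toy_energy, x0=1.13)
print(f"best x = {best:.2f}, energy = {toy_energy(best):.2f}")
```

Because uphill moves are accepted early on, the walk can cross the barrier between the two wells, and as p shrinks the search turns into plain downhill refinement. Many programs use the Metropolis criterion, where the acceptance probability also depends on the size of the energy increase, instead of the energy-independent p used here.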
