Evolution of Protein Structure

Mario A. Fares
Evolutionary Genetics and Bioinformatics Laboratory
Department of Genetics, Smurfit Institute of Genetics, TCD
Phone: 8913521
Email: faresm@tcd.ie
http://bioinf.gen.tcd.ie/~faresm

Contents

1. In vivo, in vitro, in silico
• Proteins
• Protein structure and conformation: basic concepts
• Proteins in diseases

2. Patterns and forms in protein structure
• Helices and sheets
• The hierarchical nature of protein architecture
• Structure-based classification of proteins
• Protein folding: intra-cellular pathogens and the survival of the flattest
• Protein folding and disease: amyloidoses, Parkinson's, Huntington's, prion disease

3. Conformational changes in proteins
• Structural changes arising from changes in state of ligation
• Hinge motions in proteins
• Mechanisms of conformational changes (haemoglobin, serpins, muscle contraction)
• Higher-level structural changes (GroELS)

4. Protein structure prediction and determination
• Methods of protein structure determination
• Critical assessment of structure prediction
• Homology modelling
• Threading
• Prediction of novel folds
• Protein design

5. Evolution of protein structure and function
• Protein structure classification
• Structural relationships among homologous proteins
• Changes in proteins during evolution uncover functionally/structurally important amino acid sites
• Domain swapping
• Classification of protein folding patterns
• How do proteins evolve new functions?
• Classification of protein functions

6. Molecular evolution
• Evolution of globins
• Evolution of serine proteinases
• Evolution of visual pigments and related molecules

7. Molecular coevolution and epistatic effects of mutations on protein structure
• Defining molecular coevolution
• Non-parametric methods to detect coevolution
• Parametric methods to detect coevolution
• Intra-molecular coevolution and prediction of the three-dimensional proximity of amino acid sites
• Inter-protein coevolution and the identification of protein-protein interfaces

8. Some examples of the immune system
• Antibody structure
• Proteins of the Major Histocompatibility Complex
• T-cell receptors
• Cancer and protein structures

[Figure: scales of scientific research and discovery, from organisms through organs, cells, macromolecules and biopolymers down to atoms and molecules. Representative disciplines: anatomy, physiology, cell biology, proteomics, genomics, medicinal chemistry. Examples: MRI, heart, neuron, structure, sequence, protease inhibitor. Representative technologies: migratory sensors, ventricular modeling, electron microscopy, X-ray crystallography, protein docking. We will focus here on proteomics, genomics and medicinal chemistry: macromolecules, biopolymers, atoms and molecules.]

In vivo, in vitro, in silico

Proteins

Protein roles
• Structural proteins
  – Viral capsid proteins
  – Cytoskeleton
  – Epidermal keratin
• Catalytic proteins
• Regulatory proteins
  – Hormones
  – Transcription factors
• Other examples shown: haemoglobin, myoglobin, ferritin

[Figure: Estimated Functional Roles (by % of Proteins) of the Proteome in a Complex Organism]

[Figure: (a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome. Courtesy of David Goodsell, TSRI]

Step 3. What Can Be Got from Structure When You Have It?
From Structural Bioinformatics, ed. Bourne and Weissig, p. 394, Wiley, 2002

Step 4.
Proteins Do Not Function in Isolation but Are Part of Complex Interaction Networks
http://www.genome.jp/kegg/

• Proteins have the ability to organise themselves in three dimensions
• Proteins have the ability to evolve because variation in protein structure is heritable
• Proteins are directly responsible for cell viability under normal physiological conditions as well as under stressful conditions
• Projects in development:
  – Structural genomics
  – Proteome project

Fascinating projects and questions under study:

1. Interpretation of the mechanisms of function of individual proteins
2. Approaches to the protein folding problem
  – Dependence on environmental parameters
  – Protein structure prediction
3. Patterns of molecular evolution
  – Structural and functional selective constraints
  – Relaxed amino acid sequence evolution
  – Structure evolution
4. Prediction of the structure of closely related proteins
  – Homology methods
  – Functional convergence leads to structure convergence
5. Protein engineering
  – Modifications to probe mechanisms of protein function
  – Molecular manipulation to enhance thermostability
  – Clinical applications (therapeutic antibodies)
6. Drug design
  – Peptide inhibitors of HSP proteins
  – Inducers of the immune response

GENOME SEQUENCES

• The genomes of 100 prokaryotes, many viruses, organelles and plasmids, and over 12 eukaryotes, representing all major categories of living things

Completed eukaryotic nuclear genomes

Type of organism          Species            Genome size (10^6 base pairs)
Primitive microsporidian  E. cuniculi        2.5
Fungi                     S. cerevisiae      12.1
                          Sc. pombe          13.8
                          N. crassa          40
Nematode worm             C. elegans         100
Insect: fruit fly         D. melanogaster    180
Insect: mosquito          A. gambiae         278
Malarial parasite         P. falciparum      22.8
Plant: thale cress        A. thaliana        116.8
Plant: rice               O. sativa          400
Human                     H. sapiens         3400
Mouse                     M. musculus        3454
Rat                       R. norvegicus      2556
Chicken                   G. gallus          1200

ENCODE (Encyclopedia of DNA Elements)

• Aims to determine the function of all significant elements within a selected portion of the human genome
• For a selected 1% of the genome, the corresponding regions in 30 vertebrate genomes will be sequenced
• A variety of experimental and computational techniques will be applied, including comparative genomics
• The results will feed into and drive new initiatives aimed at developing models and computational tools to deal with the data in the human genome

Amino acid sequences determine protein structure: Protein structure and conformation

• Proteins are polymers of amino acids containing a constant main chain (backbone) of repeating units, with a variable side chain attached to each

    Residue i-1    Residue i    Residue i+1
      S(i-1)         S(i)         S(i+1)        Side chains: variable
    …N-C-C-        N-C-C-       N-C-C-…         Main chain: constant
         O              O            O

• The amino acid sequence of a protein, together with any post-translational modifications, specifies the primary structure of the protein, that is, the fixed chemical bonds
• Because the chain is flexible, the primary structure is compatible with a very large number of spatial conformations of the main chain and side chains

Sequence --> Structure

• Given the right conditions, the same protein sequence will always* fold up into the same structure, the native structure
• The conformation of the native state is thus determined by the amino acid sequence
• This routinely happens in the cell. However, for many proteins, it can also be made to happen in a solution that only somewhat resembles cellular conditions
• For a protein to take up a unique conformation, evolution must have produced a set of inter-residue interactions that stabilize the desired state, such that no alternative conformation has comparable stability

How is that achieved?
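To put a rough number on the "very large number of spatial conformations" mentioned above, here is a back-of-the-envelope sketch in Python. It assumes that each residue adopts one of a small, fixed number of main-chain states independently of its neighbours. The value of 3 states per residue and the sampling rate of 10^13 conformations per second are illustrative assumptions, not figures from these slides, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_conformations(n_residues, states_per_residue=3):
    """Count main-chain conformations, assuming independent residues.

    states_per_residue is an illustrative assumption, not a measured value.
    """
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to visit every conformation once at an assumed rate."""
    total = count_conformations(n_residues, states_per_residue)
    return total / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, count_conformations(n), exhaustive_search_years(n))
```

Under these assumptions the count grows exponentially with chain length, so a random search through all conformations is not a plausible route to the native state. That gap between an enormous search space and routine folding in the cell is what the question "How is that achieved?" points at.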
The origins of bioinformatics

• Figuring out that “how” was one of the origins of bioinformatics
• Structural biologists wanted larger and larger collections of structures so they could extract rules about sequence/structure relationships and apply them to predict structure
• Extracting rules from protein sequence/structure data predates the exponential growth of the PDB and GenBank by more than a decade

What’s interesting about proteins

• Sequence
• “Fold”
• 3D shape
  – Surface crevices, interior holes, channels
• Surface properties
• Conformational changes
• Effects of variability/mutation

What can we do with structures?

• Establish relationships between sequence patterns and structural features -- how have proteins evolved?
• Develop hypotheses about the function of a particular protein
• Predict how a sequence will fold
• Build a model by comparison with a known structure of similar sequence
• Study the effects of mutation on structure and function
• Predict the effects of a novel mutation on structure or function (protein engineering -- beginning)
• Design and build whole new proteins with novel functionality (protein engineering -- advanced)
• Design drugs to interact with particular protein active sites

Protein structure basics

• Building blocks of protein structure
  – Amino acids
  – Organic cofactors
  – Metal ions
• Levels of structural description
• Descriptors of protein geometry
• Form and content of structure data
• Visualization styles

Protein Secondary Structure

α-helix and β-sheet

These secondary structures are very common in proteins because they:
  – keep the main chain in an unstrained conformation
  – satisfy the hydrogen-bonding potential of the main-chain N-H and C=O groups

These secondary structures link together in specific ways, in different combinations, to form the final protein structure.

α-helices are formed from a single consecutive set of residues in the amino acid sequence. The H-bond links the C=O group of residue i with the H-N group of residue i + 4.

There are alternatives to the α-helix configuration, giving more constrained or less constrained structures:
  – 3₁₀ helices, in which hydrogen bonds form between residues i and i + 3
  – π-helices, in which hydrogen bonds form between residues i and i + 5

These configurations are much rarer because of the constraints they impose and their effects on protein stability.

β-sheets are formed by lateral interactions of several independent sets of residues. They can bring together sections of the chain that are widely separated in the amino acid sequence.

[Figure: β-sheets in which all the strands are anti-parallel]

Proteins in disease

Protein folding

Goal = lowest energy conformation

Problems of protein folding

Energy trap
• Energy landscape is rough
  – Red = slow track: peptide trapped in energy minimum (thermodynamically favorable)
  – Yellow = fast track
• Cell crowded
• Poor folding fidelity
• Local energy minimum = intermediate

Protein misfolding diseases = amyloidoses

Prions

• Prions = “protein-based infectious particles”
• Involved in multiple diseases
• Protein undergoes conformational switch
  – PrP^C = cellular isoform: membrane-bound, GPI-anchored glycoprotein
  – PrP^Sc = “scrapie” isoform: forms amyloid

[Figure: cellular isoform and “scrapie” isoform]
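The energy trap described under "Problems of protein folding" can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a folding simulation: the one-dimensional energy function (a tilted double well) and the names energy, gradient and descend are hypothetical choices made only to show how a purely downhill search on a rough landscape can stop in a local minimum instead of the lowest-energy state.

```python
def energy(x):
    """Toy 1D energy landscape (hypothetical): a tilted double well."""
    return x**4 - 2 * x**2 + 0.3 * x


def gradient(x):
    """Derivative of energy(x) with respect to x."""
    return 4 * x**3 - 4 * x + 0.3


def descend(x, step=0.01, n_steps=5000):
    """Move downhill from x in small steps; settles where the slope vanishes."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    native = descend(-1.5)   # starts on the side of the deeper (global) minimum
    trapped = descend(1.5)   # starts on the side of the shallower (local) minimum
    print("global minimum:", native, energy(native))
    print("local minimum: ", trapped, energy(trapped))
```

Both runs end at a point where the slope is zero, but only one reaches the lowest energy. In the toy model the other run plays the role of the trapped intermediate: leaving it requires going uphill first, which a strictly downhill search never does.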