V. Evolution of Protein Structure and Function
• Protein structure classification
• Structural relationships among homologous proteins
• Changes in proteins during evolution uncover functionally/structurally important amino acid sites
• Domain swapping
• Classification of protein folding patterns
• How do proteins evolve new functions?
• Classification of protein functions

Super Secondary Structures (I)
• Hairpins connect two antiparallel strands;
• Cross-overs connect two parallel β-strands, most commonly through an α-helix (β-α-β topology). Almost all cross-overs are right-handed: when the C-side strand is placed closer and pointing right, the connecting α-helix or loop lies on top of the sheet.
[Figure: right-handed and left-handed cross-overs between strands β1 and β2]

Super Secondary Structures (II)
• The coiled-coil is a common α-helical structure found in proteins that participate in protein folding and protein-protein interactions.
– (a-b-c-d-e-f-g)n, where a and d are nonpolar, which produces a hydrophobic side
• Helix bundles are three or more helices packed together;
– Knobs-into-holes packing: in both kinds of helix packing, slight distortion of the individual helices and the inclination of their axes with respect to each other allow the side chains of the nonpolar residues to mesh together

β-barrels
A β-barrel is like a sheet wrapped around a cylinder

The Hierarchical Nature of Protein Architecture
• Primary structure
– Proteins are first synthesized as linear sequences of amino acids
• Secondary structure
– The linear sequence can undergo simple packing in regions of local regularity
• i.e., α-helices, β-strands, β-sheets & β-turns
• Super-secondary structure
– the packing of secondary structure elements into stable units
• e.g., β-barrels, βαβ units, Greek keys, etc.
Most proteins with secondary structure fold further to protect their hydrophobic regions (tertiary structure)
• Tertiary structure
– The complex folding of packed secondary structures gives the tertiary structure of the protein
Some proteins work as multi-component machines and have to undergo a quaternary level of folding.
• Quaternary structure
– the arrangement of separate chains within a protein that has more than one subunit
• e.g., haemoglobin
The highest level of organisation is the quinternary structure
• Quinternary structure
– the arrangement of separate molecules, such as in protein-protein or protein-nucleic acid interactions

Protein Domains
• Protein domains are compact units within the folding pattern of a single chain that look as if they should have independent stability
• Modular proteins are multidomain proteins that often contain many copies of closely related domains
• A domain is a compact, semi-independent region of 100-150 amino acids that has a hydrophobic core and a hydrophilic exterior
• Domains can be structural and/or functional
[Figure: bundle structural domain; β-barrel structural domain]
• Glyceraldehyde-3-phosphate dehydrogenase has two functional domains:
– the glyceraldehyde-3-phosphate binding domain
– the NAD+ binding domain

Quaternary Structure
Spatial arrangement of protein subunits and the nature of their contacts.
[Figure: haemoglobin tetramer; immunoglobulin quaternary structure]

Evolutionary Changes in Protein Sequences
Events responsible for the generation of diversity:
- mutations
- insertions
- deletions
- transposition of large pieces of DNA
Selection acts on protein function, which is determined by protein structure
A mutant gene may encode a protein with:
- equivalent function (neutral mutation)
- new and optimised function (adaptive evolution)
- new and sub-optimised function (purifying selection)

Evolution and Proteins
• You can see the effects of evolution not only in the whole organism but also in its molecules: DNA and protein
• For a mutation to have an effect on the phenotype (and be subject to selection) it must (usually) affect the structure or function of a protein
• You can learn a lot about evolution by studying the structure of proteins
Evolution in a population may occur through positive or negative selection, or through the neutral fixation of protein-function variants
Proteins from different species have similar but not identical sequences. This implies that they have similar but not identical structures
Gilbert maintained that exons represent structural components of proteins that can be recombined in different contexts, as a mechanism for generating new protein folds. This suggestion could not be supported below the protein domain level
Tables of aligned amino acid sequences are a very useful tool for evolutionary studies and provide more information than structure does
The pattern of variation at the amino acid level gives clues about the selective constraints operating on the sequence, or even on the protein structure
It is possible to construct phylogenetic trees from tabulations of related sequences.
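As a minimal sketch of how a table of aligned sequences can be turned into the numbers that tree-building and constraint analysis start from, the example below computes pairwise p-distances (the fraction of differing aligned positions) and a simple per-column variability count. The three sequences and the names `species_A`, `species_B`, `species_C` are made-up toy data, and `p_distance` and `column_variability` are illustrative helper names, not part of any library; real analyses would typically also correct for multiple substitutions at the same site.

```python
from itertools import combinations

# Hypothetical toy alignment: made-up sequences of equal length, '-' marks a gap.
alignment = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIGKQR",
    "species_C": "MRTAYLGKQ-",
}


def p_distance(seq1, seq2):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def column_variability(seqs):
    """Number of distinct residues (gaps ignored) in each alignment column."""
    return [len({c for c in col if c != "-"}) for col in zip(*seqs)]


# Pairwise distance table: the raw material for a distance-based tree.
distances = {
    (n1, n2): p_distance(alignment[n1], alignment[n2])
    for n1, n2 in combinations(alignment, 2)
}

# Columns with a count of 1 are invariant in this toy alignment;
# higher counts mark positions that tolerate substitutions.
variability = column_variability(list(alignment.values()))

for pair, d in distances.items():
    print(pair, round(d, 3))
print(variability)
```

Invariant columns are candidates for strongly constrained sites, while the distance table could be passed to a clustering method such as UPGMA or neighbour joining to obtain a tree.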
Phylogenies derived from different families of proteins from the same range of species are mutually consistent in their branching order
To infer phylogenetic relationships between species through genes, it is important to choose functionally equivalent proteins
One hypothesis that has gained much attention is the molecular clock hypothesis, which suggests that amino acid substitutions proceed at a constant rate within protein families

A Molecular Clock
• Plot the number of amino acid changes between the same protein in different species (such as cytochrome c) against the time since the species diverged
• This gives a straight line, so the evolution of a protein sequence proceeds at a constant rate and can therefore be used as a clock
Calibrating the clock for specific protein families would allow the dating of biological events not present in the fossil record, and would imply that changes are non-adaptive because they are independent of selective constraints

Variability of Selective Constraints in Protein Molecules
Amino acid substitution rates vary between:
- different protein families
- proteins within the same family
- amino acid regions in the same protein
The main reason for the variability in substitution rates among amino acid regions is that different amino acids are under different functional and structural constraints
Amino acids playing a less important functional or structural role can fix a greater number of mutations, because these have a neutral effect on the biological fitness of the protein

Evolution of Protein Structure
In families of closely related proteins, mutations alter the specificity of proteins rather than changing their structure
- the family of serine proteinases
- the specificity of haemoglobin for other ligands
In very few cases point mutations alter the protein in such a way that novel functions arise; the chymotrypsin family of serine proteinases is a clear example of the emergence of novel functions:
- Haptoglobin =
chymotrypsin minus proteolytic activity. It acts as a chaperone, preventing protein aggregation
- Serine proteinases of rhinoviruses form the initiation complex of RNA synthesis

Neutral Evolution vs Selection
[Diagram: non-synonymous nucleotide substitution; amino acid replacement; change in protein function or structure. Outcomes by biological fitness (W): purifying selection, neutrality (neutral theory of molecular evolution), positive selection]

Selection: Positive & Negative
[Figure: one-sequence scenario and population scenario, with sites shown as A or C. One-sequence example: ThrSer ACGTCA; ThrPro ACGCCA]
The selection criteria could in principle be anything, but selection against amino acid changes is by far the most important.
[Figure labels: ArgSer AGGCCG; ThrSer ACGCCG; ThrSer ACTCTG; AlaSer GCTCTG; AlaSer GCACTG]
Certain events have functional consequences and will be selected out. The strength and localization of this selection are of great interest.

Domain Combination and Recombination
One mechanism that ensures the generation of different partners is gene duplication followed by divergence
In some cases catalytic domains can be formed by the contribution of both duplicates (paralogues)
- serine proteinase domains
In others, gene duplication provides an additional regulatory function through the development of an oligomeric protein
- mutations in the tetrameric structure of haemoglobin can alter the efficiency of the allosteric structure in transporting oxygen
Proteins can combine gene duplication or fusion with the generation of partners by domain swapping
[Figure: IL-5; two-domain monomer (domains A and B) and domain-swapped dimer]

Families of Related Proteins Tend to Retain Similar Folding Patterns
The general folding pattern of a protein is usually preserved even with amino acid substitutions.
The amount of structural distortion, however, increases locally as the amino acid sequence divergence between two proteins increases
These distortions are not uniformly distributed in the structure: the core seems to preserve the folding pattern within a family, while other parts of the structure suffer dramatic distortions
In the overwhelming majority of proteins, the core is formed by the main elements of secondary structure and the peptides flanking them, including active-site peptides
The fraction of identical residues in the core measures the amount of sequence divergence between two proteins
In proteins that share more than 60% of their amino acids, the core contains more than 90% of the residues; the refolding of the remaining 10% involves minor surface loops

[Figure: Pairwise Sequence Identities and Structure Similarity (SSAP) Scores in Domain Families. Axes: sequence identity (%) and structure similarity (SSAP) score; points marked as same function or different function]
[Figure: ATP Grasp superfamily: biotin carboxylase and D-Ala D-Ala ligase]
In distant homologues the structure can be embellished, but 50-60% of the structure in the core is highly conserved

Conservation of Protein Structure
• The cores of protein structures are very well conserved during evolution, even when their sequences have changed considerably
• Comparing protein structures allows us to identify more distant evolutionary relationships
• Structural genomics initiatives will give structures for most of the major protein families
• Related structures: RMSD usually < 3.5 Å

Evolution of New Protein Functions
• gene duplication
• incremental mutations
• gene fusion
• oligomerisation
Protein structures can accommodate many, but not all, single-site mutations
Some of these single mutations are very important from the medical point of view: SNPs can produce incorrect chain termination (some thalassaemias)
As a qualitative rule, single mutations on the surface of proteins are usually innocuous.
Mutations in important buried regions of the molecule will be lethal and removed by selection (we will never see them)
Natural protein variants are only a subset of all possible variants: those that have been subjected to natural selection
Artificial variants can extend our knowledge beyond the imaginable and show us the possible subsets of optimised proteins
The allumwandlung technique consists of substituting a single amino acid with each of the other 19, testing the functional properties of the variants, and solving their crystal structures
If we could predict the effect of single mutations on protein structure and function, that would be a first step towards designing better-optimised proteins with clear relevance to public health
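The substitution step of the allumwandlung technique described above can be sketched in a few lines: given a sequence and a site, list the 19 variants that replace the residue at that site with each other standard amino acid. The function name `single_site_variants`, the toy sequence, and the `K2A`-style labels are illustrative assumptions; the functional testing and crystal-structure steps are experimental and are not modelled here.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(sequence, position):
    """Return the 19 variants of `sequence` substituted at 0-based `position`.

    Keys are labels of the form <original><1-based position><new>, e.g. "K2A".
    """
    original = sequence[position]
    if original not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue {original!r} at position {position}")
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == original:
            continue  # skip the wild-type residue itself
        label = f"{original}{position + 1}{residue}"
        variants[label] = sequence[:position] + residue + sequence[position + 1:]
    return variants


# Hypothetical toy sequence, used only to illustrate the enumeration.
toy_sequence = "MKTAYIAKQR"
variants = single_site_variants(toy_sequence, 1)

for label, seq in variants.items():
    print(label, seq)
```

Each variant in the resulting table would then be expressed, assayed, and crystallised in the laboratory; the code only fixes the bookkeeping of which substitutions belong to the set.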