Jan. 8-10, 2003
Biochemistry 301
Principles of Protein Structure
Walter Chazin
5140 BIOSCI/MRBIII
E-mail: Walter.Chazin
http://structbio.vanderbilt.edu/chazin

Text Books
Branden and Tooze, Introduction to Protein Structure
Voet, Voet and Pratt, Fundamentals of Biochemistry
Stryer, Biochemistry

Proteins: Polymers of Amino Acids
• 20 different amino acids: many combinations
• Proteins are made in the RIBOSOME

Amino Acid Chemistry
20 different types
[Figure: a generic amino acid, NH2-Cα(H)(R)-COOH. Amino acids with side chains R1 and R2 join through a CO-NH bond to give a polypeptide, which forms the protein.]

Amino Acid Chemistry
The free amino and carboxylic acid groups have pKa's:
  NH3+ / NH2     pKa ~ 9.4
  COOH / COO-    pKa ~ 2.2
At physiological pH, amino acids are zwitterions: +NH3-Cα(H)(R)-COO-

Amino Acid Chemistry
[Figure: titration curve. Note the axes.]
Also titratable groups in side chain

Amino Acids with Aliphatic R-Groups
Name            Code      pKa (α-COOH)   pKa (α-NH3+)
Glycine         Gly - G   2.4            9.8
Alanine         Ala - A   2.4            9.9
Valine          Val - V   2.2            9.7
Leucine         Leu - L   2.3            9.7
Isoleucine      Ile - I   2.3            9.8

Amino Acids with Polar R-Groups

Non-Aromatic Amino Acids with Hydroxyl R-Groups
Name            Code      pKa (α-COOH)   pKa (α-NH3+)   pKa (R group)
Serine          Ser - S   2.2            9.2            ~13
Threonine       Thr - T   2.1            9.1            ~13

Amino Acids with Sulfur-Containing R-Groups
Name            Code      pKa (α-COOH)   pKa (α-NH3+)   pKa (R group)
Cysteine        Cys - C   1.9            10.8           8.3
Methionine      Met - M   2.1            9.3

Acidic Amino Acids and Amide Conjugates
Name            Code      pKa (α-COOH)   pKa (α-NH3+)   pKa (R group)
Aspartic Acid   Asp - D   2.0            9.9            3.9
Asparagine      Asn - N   2.1            8.8
Glutamic Acid   Glu - E   2.1            9.5            4.1
Glutamine       Gln - Q   2.2            9.1

Basic Amino Acids
Name            Code      pKa (α-COOH)   pKa (α-NH3+)   pKa (R group)
Arginine        Arg - R   1.8            9.0            12.5
Lysine          Lys - K   2.2            9.2            10.8
Histidine       His - H   1.8            9.2            6.0

Aromatic Amino Acids and Proline
Name            Code      pKa (α-COOH)   pKa (α-NH3+)   pKa (R group)
Phenylalanine   Phe - F   2.2            9.2
Tyrosine        Tyr - Y   2.2            9.1            10.1
Tryptophan      Trp - W   2.4            9.4
Proline         Pro - P   2.0            10.6

Hierarchy of Protein Structure
• 20 different amino acids: many combinations
Primary Structure
  The order of amino acids: Protein sequence
Secondary Structure
  Local conformation, depends on sequence
Tertiary/Quaternary Structure
  Overall structure of the chain(s) in full 3D

Beyond Primary Structure: The Peptide Bond
Peptide plane is flat: ω
angle ~180º
Partial double-bond character of the peptide bond
Resonance structures: O=C-N-H ↔ -O-C=N+-H

Implications of Peptide Planes
[Figure: consecutive Cα atoms joined by peptide planes, with the φ and ψ rotations marked at Cα]
Each residue is sandwiched between two planes
ω angle varies little, φ and ψ angles vary a lot
Many φ/ψ combinations cause atoms to collide

Polypeptide Backbone
[Figure: backbone of three residues (Cα, H, R) showing the φ and ψ rotations]
Backbone restricted: limited conformations
Collisions with side chain groups further limit φ/ψ combinations

Secondary Structure
Local Conformation of Consecutive Residues
• Three low energy backbone φ/ψ combinations
  1. Right-hand helix: α-helix (-40°, -60°)
  2. Extended: antiparallel β-sheet (140°, -140°)
  3. Left-hand helix (rare): α-helix (45°, 45°)
  Glycine: special, it has no side chain!
• Hydrogen bonds between backbone atoms provide stability to secondary structures
• Amino acids have specific preferences

Secondary Structure- α Helix
[Figure: α helix with backbone H-bonds]

Secondary Structure- β Sheet
[Figure: β sheet. Key: Oxygen, Nitrogen, Hydrogen, Carbonyl C, Carbon α, R Group, H Bond]

Secondary Structure- β Turn
[Figure: four-residue turn, residues numbered 1-4]
Reverses direction of the chain

Ribbon and Topology Diagrams
Representations of Secondary Structures
Sheets (arrows), Helices (cylinders)
B/T- Figure 2.17

Ribbon and Topology Diagrams
Organization of Secondary Structures
[Figure label: helix]
B/T- Figure 2.11

Beyond Secondary Structure
Supersecondary structure (motifs): small, discrete, commonly observed aggregates of secondary structures
  β sheet, helix-loop-helix, βαβ
Domains: independent units of structure
  β barrel, four-helix bundle
*Domains and motifs sometimes interchanged*

Protein Motifs
V/V/P- Figure 6.28

Hairpin Motif
B/T- Figure 2.14

Helix-Loop-Helix (H-L-H) Motif
B/T- Figure 2.12

EF-Hand H-L-H Motif
B/T- Figure 2.13

Greek Key Motif
B/T- Figure 2.15

Multi-Domain (Modular) Proteins
[Figure: modular proteins. Key to protein domains: EGF, Protease, Kringle, Ca-binding]

Tertiary Structure
Definition: Overall 3D form of a molecule
Organization of the secondary structures/motifs/domains
Optimization of interactions between residues
A specific 3D structure is formed
All
proteins have multiple secondary structures, almost always multiple motifs, and in some cases multiple domains

Tertiary Structure
Specific structures result from long-range interactions
  Electrostatic (charged) interactions
  Hydrogen bonds (O-H, N-H, S-H)
  Hydrophobic interactions
Soluble proteins have an inside (core) and outside
  Folding driven by water- hydrophilic/phobic
  Side chain properties specify core/exterior
  Some interactions inside, others outside

Tertiary Structure
I. Ionic Interactions (exterior)
Form between 2 charged side chains:
  1 Negative – Glu, Asp
  1 Positive – Lys, Arg, His
Also called “salt bridges”.
Ionic interactions are pH-dependent (pKa).
Occur at the exterior
NOTE: pKa's in the interior of a protein may be very different from those of the free amino acid.

Tertiary Structure
II. Hydrogen bonds (interior and exterior)
Form between side chains/backbone/water:
  Charged side chains: Glu, Asp, His, Lys, Arg
  Polar side chains: Ser, Thr, Cys, Asn, Gln, [Tyr, Trp]
Not a specific covalent bond – lower energy.
Occur inside, at the exterior, and with water.

Tertiary Structure
III. Hydrophobic Interactions (interior)
Form between side chains of non-polar residues:
  Aliphatic (Ala, Val, Leu, Ile, Pro, Met)
  Aromatic (Phe, Trp, [Tyr])
Clusters of side chains- but no requirement for a specific orientation like an H-bond
In the protein interior, away from water
Not pH dependent

Tertiary Structure
IV.
Disulfide Bonds (interior and exterior)
Form between Cys residues:
  Cys-SH + HS-Cys → Cys-S-S-Cys
Catalyzed by specific enzymes, oxidizing agents
Restricts flexibility of the protein
Usually within a protein, less for linking proteins

Disulfide Bonding
V/V/P- Figure 16.6

Quaternary Structure
Definition: Organization of multiple chain associations
Oligomerization- Homo (self), Hetero (different)
Used in organizing single proteins and protein machines
Specific structures result from long-range interactions
  Electrostatic (charged) interactions
  Hydrogen bonds (O-H, N-H, S-H)
  Hydrophobic interactions
  Disulfides only VERY infrequently

Quaternary Structure
The classic example- hemoglobin α2-β2
B/T- Figure 3.7

END OF PART 1

Protein Structure from Sequence
The pattern of amino acid side chains determines the local conformation and the global structure
*Pattern is more important than exact sequence*

Reporting/Comparing Protein Sequences
h-CaM   A T V R L L E W E D L
b-CaM   A T V R L L E Y K D L
5 conservative, 10 non-conservative

Proteins Fold To Their Native Structure
Folded proteins are only marginally stable!!
  ~0.4 kJ/mol per residue required to unfold (cf. ~20 kJ/mol per H-bond)
  Balance loss of entropy vs. stabilizing forces
Protein fold is specified by sequence
  Reversible reaction- denature (unfold)/renature
  Even single mutations can cause changes
  Recent discovery that amyloid diseases (e.g. CJD, Alzheimer's) are due to unstable protein folding

How Does a Protein Find Its Fold?
[Figure: chain running from the amino terminus (N) to the carboxyl terminus (C), residues numbered 1, 2, 3, 4, ...]
• 20 different amino acids: many combinations
A protein of n residues: 20^n possible sequences!
100 residue protein has 20^100 possibilities: 1.3 × 10^130!
The latest estimates indicate < 40,000 sequences in the human genome
THERE MUST BE RULES!

Limitations on Protein Sequence
*Length is generally 100-1000 residues*
Minimum length based on ability to perform a biochemical function: ~40 residues (e.g.
inhibitors)
Maximum length based on complexity of assembly: Conversion of DNA code and production of proteins is carried out by molecular machines that are not perfect. If the sequence gets too long, too many errors will build up.

Protein Folding
The hydrophobic effect is the major driving force
  Hydrophobic side chains cluster/exclude water
  Release of water cages in unfolded state
Other forces providing stability to the folded state
  Hydrogen bonds
  Electrostatic interactions
  Chemical cross links- Disulfides, metal ions

Protein Folding
Random folding has too many possibilities
• Backbone restricted but side chains not
• A 100 residue protein would require 10^87 s to search all conformations (age of universe < 10^18 s)
• Most proteins fold in less than 10 s!!
Proteins must fold along specific pathways!!

Protein Folding Pathways
Usual order of folding events
  Secondary structures formed quickly (local)
  Secondary structures aggregate to form motifs
  Hydrophobic collapse to form domains
  Coalescence of domains
Molecular chaperones assist folding in vivo
  Complexity of large chains/multi-domains
  Cellular environment is rich in interacting molecules
  Chaperones sequester proteins and allow time to fold

Progressive Folding of Proteins
From Disordered to Native State
Protein Folding Funnel
V/V/P- Figures 6.37/38

Functional Classes of Proteins
• Receptors- sense stimuli, e.g. in neurons
• Channels- control cell contents
• Transport- e.g. hemoglobin in blood
• Storage- e.g. ferritin in liver
• Enzymes- catalyze biochemical reactions
• Cell function- multi-protein machines
• Structural- collagen in skin
• Immune response- antibodies

Structural Classes of Proteins
1. Globular proteins (enzymes, molecular machines)
Variety of secondary structures
Approximately spherical shape
Water soluble
Function in dynamic roles (e.g.
catalysis, regulation, transport, gene processing)

Globular Proteins
[Figure: Hemoglobin α, Concanavalin A, Triose Phosphate Isomerase]
V/V/P- Figure 6.27

Structural Classes of Proteins
2. Fibrous Proteins (fibrils, structural proteins)
One dominating secondary structure
Typically narrow, rod-like shape
Poor water solubility
Function in structural roles (e.g. cytoskeleton, bone, skin)

Collagen: A Fibrous Protein
Triple Helix
Stabilizing Inter-strand H-bonds
Gly-Pro-Pro Repeat
V/V/P- Figures 6.17/18

Structural Classes of Proteins
3. Membrane Proteins (receptors, channels)
Inserted into (through) membranes
Multi-domain- membrane spanning, cytoplasmic, and extra-cellular domains
Poor water solubility
Function in cell communication (e.g. cell signaling, transport)

Photosynthetic Reaction Center
[Figure labels: Extracellular, Membrane-spanning, Intracellular (cytoplasmic)]
B/T Figure 13.6

In the physical sense, the progression of living organisms results from the communication between molecules.
Interaction between molecules is determined by binding affinities.

Binding Classification of Proteins
• Structural- other structural proteins
• Receptors- regulatory proteins, transmitters
• Toxins- receptors
• Transport- O2/CO2, cholesterol, metals, sugars
• Storage- metals, amino acids
• Enzymes- substrates, inhibitors, co-factors
• Cell function- proteins, RNA, DNA, metals, ions
• Immune response- foreign matter (antigens)

Surface Determines What Binds
1. Steric access
2. Shape
3. Hydrophobic accessible surface
4. Electrostatic surface
Sequence and structure optimized to generate surface properties for requisite binding event(s)

Determinants of Protein Surface
Function requires specific amino acid properties
Not all amino acids are equally useful
  Abundant: Leu, Ala, Gly, Ser, Val, Glu
  Rare: Trp, Cys, Met, His
Post-translational modifications
  Addition of co-factors- metals, hemes, etc.
  Chemical modification- phosphorylation, glycosylation, acetylation, ubiquitination, sumoylation

Binding Alters Protein Structure
Mechanisms of Achieving Functional Properties
1. Allosteric Control- binding at one site effects changes in conformation or chemistry at a point distant in space
2. Stimulation/inhibition by control factors- proteins, ions, metals control progression of a biochemical process (e.g. controlling access to active site)
3. Reversible covalent modification- chemical bonding, e.g. phosphorylation (kinase/phosphatase)
4. Proteolytic activation/inactivation- irreversible, involves cleavage of one or more peptide bonds

Calcium Signal Transduction
Allostery & Stimulation by Control Factor
[Figure labels: Calmodulin, Ca2+, Target]

Sequence → Structure → Function
Many sequences can give same structure
  Side chain pattern more important than sequence
When homology is high (>50%), likely to have same structure and function (Structural Genomics)
  Cores conserved
  Surfaces and loops more variable
*3-D shape more conserved than sequence*
*There are a limited number of structural frameworks*

Varied Relationships Between Sequence, Structure and Function
I. Homologous: similar sequence (cytochrome c)
  Same structure
  Same function
  Modeling structure from homology

C-Type Cytochromes
Same structure/function- Different Sequence
[Figure label: Heme]
Constant structural elements and basic architecture
V/V/P Figure 6.31

Varied Relationships Between Sequence, Structure and Function
I. Homologous: very similar sequence (cytochrome c)
  Same structure
  Same function
  Modeling structure from homology
II. Similar function- different sequence (dehydrogenases)
  One domain same structure
  One domain different

NAD-Binding Domains
Conserved Domains/Functional Elements
[Figure: Alcohol Dehydrogenase, Lactate Dehydrogenase]
B/T Figure 10.8

Varied Relationships Between Sequence, Structure and Function
I. Homologous: very similar sequence (cytochrome c)
  Same structure
  Same function
  Modeling structure from homology
II.
Similar function- different sequence (dehydrogenases)
  One domain same structure
  One domain different
III. Similar structure- different function (cf. thioredoxin)
  Same 3-D structure
  Not same function

NADH-Binding and Redox
Same structure- Different Function
[Figure: Thioredoxin, Alcohol Dehydrogenase, Lactate Dehydrogenase]
B/T Figures 10.8/2.7
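
Worked example: comparing two sequences
The ">50% homology" rule of thumb and the h-CaM/b-CaM comparison earlier in these slides can be made concrete with a percent-identity calculation. The Python sketch below is illustrative only. It assumes the two 11-residue fragments on the "Reporting/Comparing Protein Sequences" slide are already aligned with no gaps, and it uses percent identity as a simple stand-in for "homology"; the slides do not say how that percentage is computed. The function names percent_identity and mismatches are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Assumes the sequences are pre-aligned, gap-free, and equal in length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def mismatches(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, residue_a, residue_b) for differing positions (1-based)."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Fragments as read from the slide (assumed row assignment: h-CaM first).
h_cam = "ATVRLLEWEDL"
b_cam = "ATVRLLEYKDL"

print(f"identity: {percent_identity(h_cam, b_cam):.1f}%")  # identity: 81.8%
print(mismatches(h_cam, b_cam))  # [(8, 'W', 'Y'), (9, 'E', 'K')]
```

The two differences show why pattern matters more than exact sequence: W→Y swaps one aromatic side chain for another, while E→K replaces a negative charge with a positive one.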