Protein Structures from an NMR Perspective

Background
– We are using NMR information to “fold” the protein.
– We need to know how this NMR data relates to a protein structure.
– We need to know the specific details of properly folded protein structures to verify the accuracy of our own structures.
– We need to know how to determine what NMR experiments are required.
– We need to know how to use the NMR data to calculate a protein structure.
– We need to know how to use the protein structure to understand biological function.

Analyzing NMR Data is a Non-Trivial Task!
– There is an abundance of data that needs to be interpreted.
– Interpreting NMR data requires making informed “guesses” to move toward the “correct” fold.
– Not a direct path!
Figure (distance from correct structure versus NMR data analysis): initial rapid convergence to the approximately correct fold, then iterative “guesses” allow the “correct” fold to emerge.

What Do We Mean By Informed “Guesses”?
– As we will see in detail, analysis of NMR data is commonly ambiguous.
– But it represents a unique structure!
A simple illustration:
► Diagonal peak assigned to Ala 97 CαH
► NOE cross-peak assigned to Thr 17 CγH
► Chemical shift assignment of the ambiguous peak is consistent with: Ala 16 CβH, Thr 43 CγH, Ile 36 Cγ2H, etc.
Options:
1) Be conservative and leave the ambiguous peak unassigned.
2) Guess the assignment as Ala 16 CβH based on its proximity to the assigned Thr 17 CγH.

To progress to the correct protein fold, it is important to make limited “guesses”.
Do Not Be Afraid or Hesitant to Make Reasonable “Guesses”!
• If the “guess” is wrong:
  – within limits, the process is self-correcting
  – too many guesses are a problem
  – the structure, combined with the abundance of other correct data, will identify the wrong “guess”
• If the “guess” is correct:
  – the assignment will be consistent with the structure
  – more correct data!
  – may resolve other ambiguous data
  – allows for other “guesses” to further the structure analysis

What Information Do We Know at the Start of Determining A Protein Structure By NMR?
Amino Acids (building blocks of protein structures):
Important features of amino acids that impact the overall structure of a protein include:
– Size
– Charge
– Polarity
– Hydrophobicity
– Aromaticity
– Conformationally unusual side chains

Zwitterion (charge)
The chemistry of amino acids is complicated by the fact that the -NH2 group is a base and the -CO2H group is an acid.
At physiological pH (7.4), an H+ ion is transferred from one end of the molecule to the other to form a zwitterion or “salt-like” structure.

Illustration of Zwitterion Characteristics of Amino Acids from the pH Titration of Alanine

Polar Amino Acids:
Asparagine, ASN, N
Serine, SER, S
Cysteine, CYS, C
Threonine, THR, T
Glutamine, GLN, Q
Tryptophan, TRP, W
Histidine, HIS, H (depends on pH)
Tyrosine, TYR, Y
Atom colors: Carbon: gray; Oxygen: red; Hydrogen: white; Nitrogen: light blue; Sulfur: yellow

Hydrophobic Amino Acids:
Alanine, ALA, A
Phenylalanine, PHE, F
Isoleucine, ILE, I
Proline, PRO, P
Leucine, LEU, L
Valine, VAL, V
Methionine, MET, M
Glycine, GLY, G

Charged Amino Acids:
Positive: Arginine, ARG, R; Histidine, HIS, H (depends on pH); Lysine, LYS, K
Negative: Aspartate, ASP, D; Glutamate, GLU, E

Amino Acid Structures as Part of a Protein Structure:
Knowing the shape and composition of individual amino acids makes it easier to identify them as part of a more complex protein structure.

Venn diagram grouping amino acids according to their properties
Livingstone & Barton, CABIOS, 9, 745-756, 1993

Amino acid properties (class: C charged, P polar, H hydrophobic):
Name            3-Letter  Single  Relative   MW    pKa      Residue      Surface     Class
                Code      Code    Abundance                 Volume (Å3)  Area (Å2)
Alanine         ALA       A       13.0       89    -        88.6         115         H
Arginine        ARG       R       5.3        175   ~12      173.4        225         C+
Asparagine      ASN       N       9.9        132   -        111.1        150         P
Aspartate       ASP       D       9.9        132   4.5      114.1        160         C-
Cysteine        CYS       C       1.8        121   9.1-9.5  108.5        135         P
Glutamate       GLU       E       10.8       146   4.6      138.4        190         C-
Glutamine       GLN       Q       10.8       146   -        143.8        180         P
Glycine         GLY       G       7.8        75    -        60.1         75          -
Histidine       HIS       H       0.7        155   6.2      153.2        195         P, C+
Isoleucine      ILE       I       4.4        131   -        166.7        175         H
Leucine         LEU       L       7.8        131   -        166.7        170         H
Lysine          LYS       K       7.0        147   10.4     168.6        200         C+
Methionine      MET       M       3.8        149   -        162.9        185         H
Phenylalanine   PHE       F       3.3        165   -        189.9        210         H
Proline         PRO       P       4.6        115   -        112.7        145         H
Serine          SER       S       6.0        105   -        89.0         115         P
Threonine       THR       T       4.6        119   -        116.1        140         P
Tryptophan      TRP       W       1.0        204   -        227.8        255         P
Tyrosine        TYR       Y       2.2        181   9.7      193.6        230         P
Valine          VAL       V       6.0        117   -        140.0        155         H

Some General Rules Regarding the Distribution of Amino Acids in Proteins:
• Charged residues are hardly ever buried.
  ► If buried, they are generally involved in a “salt bridge”.
• Polar residues are usually found on the surface of the protein, but can be buried.
  ► If buried, they are generally involved in a hydrogen bond.
• The inside, or core, of a protein contains mostly non-polar residues.
• Non-polar residues are also found on the outside of proteins.

Energetic Cost of Putting an Amino Acid in the Interior or on the Surface of a Protein
Amino Acid Composition of Protein Interior and Surface
Residue   Total   Inside   Surface   -RT ln(surface/inside)
Ala       8.7     11.0     7.9        0.20
Arg       3.1     0.4      4.0       -1.34
Asn       5.2     2.0      6.3       -0.69
Asp       6.1     2.2      7.4       -0.72
Cys       2.7     5.4      1.8        0.67
Gln       3.6     1.3      4.5       -0.74
Glu       4.9     1.0      6.2       -1.09
Gly       9.0     9.7      8.6        0.06
His       2.3     2.4      2.2        0.04
Ile       4.9     10.5     3.0        0.74
Leu       6.5     12.8     4.3        0.65
Lys       6.7     0.3      8.9       -2.00
Met       1.5     3.0      0.9        0.71
Phe       3.6     7.7      2.5        0.67
Pro       4.0     2.2      4.7       -0.44
Ser       7.9     5.0      8.9       -0.34
Thr       6.4     4.6      7.1       -0.26
Trp       1.6     2.7      1.3        0.45
Tyr       4.4     3.3      4.8       -0.22
Val       6.6     12.7     4.6        0.61
Totals    5436    1396     4040

Kyte-Doolittle Hydropathy Ranking of Relative Amino Acid Hydrophobicity
- Does it make sense for the residue to be on the protein surface or buried in its core?
- Based on an amalgam of experimental observations derived from the literature.
- Web page to calculate hydrophobicity plots for a protein sequence: http://fasta.bioch.virginia.edu/o_fasta/grease.htm
- J. Mol. Biol. (1982) 157: 105-132.

Biological Base Hydrophobicity Scale (Nature (2005):433:377)
- based on the stability of a peptide sequence in a membrane
- figure labels: where n = 0-7; decreasing stability
- also, variable stability based on position

Consensus Hydrophobicity Scale (Journal of Chromatography A (2003):1000:637)
- Distribution of hydrophobicity rankings (figure panels: Ala, Arg, Asp, Asn, Cys, Gln, Glu, Gly, Met, Phe, His, Ile, Pro, Ser, Leu, Lys, Thr, Trp)
- Comparison of four commonly used oil partitioning scales to measure hydrophobicity: ethanol-dioxane, N-methylacetamide, octanol-water, water-cyclohexane

Some General Rules Regarding the Distribution of Amino Acids in Proteins:
• To bury charged or polar residues, the residues are probably involved in a “salt bridge” or hydrogen bond.
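The hydropathy plot mentioned for the Kyte-Doolittle scale can be sketched as a sliding-window average. This is a minimal illustration, not the algorithm of the web page cited above: the per-residue values are the published Kyte-Doolittle indices, while the function name `hydropathy_profile`, the default window of 9 residues, and the example sequence are assumptions made here for illustration.

```python
# Kyte-Doolittle hydropathy index per residue (one-letter codes),
# from J. Mol. Biol. (1982) 157:105-132.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every `window`-residue stretch.

    The window size is an assumption; it is a tunable parameter.
    Positive averages suggest a buried (hydrophobic) segment,
    negative averages suggest a surface-exposed segment.
    """
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    example = "MKTLLLAVAVIVLLAGSEDKRPE"  # hypothetical sequence
    for start, score in enumerate(hydropathy_profile(example), start=1):
        print(f"window starting at residue {start}: {score:+.2f}")
```

A wider window smooths the profile and highlights long hydrophobic stretches; a narrower one is noisier but localizes short buried segments.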
Figure panels: salt bridge; hydrogen bond
• This minimizes or eliminates the ΔG transfer energy needed to bury polar or charged residues.

Propensity of Amino Acids To Be Present In A Protein’s Active Site:
• probability of contact with a non-protein atom
  – positive number means higher than random: likely to be part of the active site
  – negative number means lower than random: unlikely to be part of the active site
• does not include protein-protein or protein-peptide interactions: roles for tryptophan and proline
HIS  0.360    ALA  0.025
CYS  0.210    MET  0.025
SER  0.130    ILE -0.005
LYS  0.100    TYR -0.040
THR  0.100    VAL -0.060
ASN  0.080    GLY -0.070
ARG  0.055    PHE -0.120
GLN  0.050    TRP -0.140
GLU  0.050    LEU -0.180
ASP  0.045    PRO -0.200
Holm & Sander, Intelligent Systems for Molecular Biology, 5, 140-146, 1997

All Amino Acids (except Gly) Have at Least One Chiral Center
• All amino acids in proteins are L-configuration
Gly Increases Main Chain Flexibility
• well-conserved during evolution
Branched Side Chains are Stiffer
• Val, Ile, Leu
• chain folding is facilitated (ΔS is small)
Pro is a Very Rigid Side-Chain
• Also fixes the backbone conformation
• Phi (φ) is constrained to about -60°
His is Suitable for Enzyme Catalytic Sites
• Commonly found in protein active sites
• pKa (6.0) near physiological pH
Cys Can Form Intra- or Inter-Strand Bonds
• formation of disulphide bonds between two spatially close Cys
• free Cys can cause problems by forming unwanted cross-links

pH Titration of Histidine Side Chain
• The observed pKa is very dependent on the local structure around the histidine.
• Experimentally measure the pKa of His by following the chemical shift difference of a His ring proton as a function of pH.
• Different pKa values will be observed for different His residues in a single protein, based on their local structure and involvement in the protein’s function/activity.
• pKa = pH where the observed chemical shift is half-way between the protonated and deprotonated states (figure labels: His fully protonated; pKa; His fully deprotonated)

pH Titration of Histidine Side Chain
• Experimental data for human myoglobin
• Similar titrations for other side chains (Tyr, Glu)
• Measure presence of salt bridges, hydrogen bonds, etc.

pH Titration of Histidine Side Chain
► Presence of a protonated side chain affects the local carbon chemical shifts
  – Unprotonated: Cα 54.3 ppm, Cβ 30.7 ppm
  – Protonated: Cα 53.3 ppm, Cβ 28.5 ppm

Spectral properties of amino acids:
• Trp, Tyr, and Phe contain conjugated aromatic rings and absorb UV light.
  ► Extinction coefficients are:
    Trp 5,050 M⁻¹cm⁻¹ (280 nm)
    Tyr 1,440 M⁻¹cm⁻¹ (274 nm)
    Phe 220 M⁻¹cm⁻¹ (257 nm)
  ► Extinction coefficients are additive. Therefore, if a protein contained 3 Tyr and 1 Trp, its extinction coefficient would be:
    ε = 3 × 1,440 + 1 × 5,050 = 9,370 M⁻¹cm⁻¹

Basic Amino Acid Nomenclature:

More Detailed Amino Acid Nomenclature:
Each atom is given a unique identifier. This includes equivalent methyl hydrogens.
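The histidine titration described above (pKa = pH at which the observed shift is half-way between the protonated and deprotonated values) can be sketched in a few lines. This assumes fast exchange, so the observed shift is the population-weighted average of the two states; the function names and the ring-proton shifts used in the usage example (8.6 and 7.7 ppm) are illustrative assumptions, not values from the slides.

```python
def observed_shift(ph, pka, delta_prot, delta_deprot):
    """Population-weighted chemical shift under fast exchange.

    The protonated fraction follows the Henderson-Hasselbalch
    relation: f_prot = 1 / (1 + 10**(pH - pKa)).
    """
    frac_prot = 1.0 / (1.0 + 10.0 ** (ph - pka))
    return frac_prot * delta_prot + (1.0 - frac_prot) * delta_deprot


def estimate_pka(ph_values, shifts):
    """Estimate the pKa as the pH where the shift crosses the midpoint.

    Assumes pH values are sorted, the curve is monotonic, and the
    first and last points sit on the two plateaus of the titration.
    Linear interpolation is used between the two bracketing points.
    """
    midpoint = 0.5 * (shifts[0] + shifts[-1])
    points = list(zip(ph_values, shifts))
    for (ph1, s1), (ph2, s2) in zip(points, points[1:]):
        if s1 != s2 and (s1 - midpoint) * (s2 - midpoint) <= 0:
            return ph1 + (midpoint - s1) * (ph2 - ph1) / (s2 - s1)
    raise ValueError("titration curve never crosses its midpoint")


if __name__ == "__main__":
    # Simulated titration with hypothetical plateau shifts.
    ph_grid = [3.0 + 0.25 * i for i in range(29)]  # pH 3 to 10
    curve = [observed_shift(ph, 6.0, 8.6, 7.7) for ph in ph_grid]
    print(f"estimated pKa: {estimate_pka(ph_grid, curve):.2f}")
```

With real data a non-linear fit of `observed_shift` to all points is usually preferable, since it does not require that both plateaus were reached.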
Two Versions of Naming Convention

Amino Acid 1H NMR Assignments:

Amino Acid 13C NMR Assignments:

NMR Chemical Shifts Exhibit Specific Amino-Acid Trends
– By combining 2 or more correlated chemical shifts

Local Protein Structure Affects NMR Chemical Shifts
– Significant deviations from random-coil chemical shifts are routinely observed
  ► Charge state, conformation, covalent modification, etc.
– Structure-based deviations may be larger than residue-based differences
  ► Ring current effect: proximity to aromatic rings will have a pronounced effect on NMR chemical shifts.
    - The effect also depends on spatial orientation: above/below the plane has a different impact than edge-on.
    - Which amino acids are next to aromatic rings depends on the overall fold of the protein.

Local Protein Structure Affects NMR Chemical Shifts
► Hydrogen bond
  – a dipole-dipole attraction
  – typical ranges: 2.4 Å < d < 4.5 Å; 90° < φ < 180°
HN Chemical Shifts and Hydrogen Bond Length
$\Delta\delta_N = 19.2\, d_N^{-3} - 2.3$
Wagner et al., JACS, 105, 5948, 1983

A Number of Amino Acid Hydrogens are Labile and Exchange Readily with Water
– Exchange rate is pH dependent
– As exchange increases, NMR lines broaden beyond detection
– Backbone NH is the critical hydrogen that exchanges with water
  ► Hydrogen bonds and buried NHs (protected from solvent) decrease the exchange rate
– This is the reason why NMR samples use low pH buffers (typically pH 5.0 to 6.5)
Figure labels: NMR line widths increase; exchange rate

Overview of Some Basic Structural Principles:
a) Primary Structure: the amino acid sequence of the polypeptide chain, arranged from the amino (N) terminus to the carboxyl (C) terminus
b) Secondary Structure: regular arrangements of the backbone of the polypeptide
chain without reference to the side chain types or conformation
c) Tertiary Structure: the three-dimensional folding of the polypeptide chain to assemble the different secondary structure elements in a particular arrangement in space.
d) Quaternary Structure: complexes of 2 or more polypeptide chains held together by noncovalent forces but in precise ratios and with a precise three-dimensional configuration.

Primary Structure: linear arrangement of the amino-acid sequence
N- Alanine – Glycine – Phenylalanine – … – Tyrosine – Serine – C
Three letter code: N-Ala-Gly-Phe- … -Tyr-Ser-C
Single letter code: AGF…YS

The amino acids in the linear arrangement are joined by the formation of peptide bonds.
The Peptide Bond: chemical linkage -CO-NH- formed by the condensation of the amino group and carboxyl group of a pair of amino acids to form an amide bond.

Important Features of the Peptide Bond:
1) The bond is always planar.
– Rotation about the peptide bond is inhibited
2) The bond is very stable
– Not generally pH, buffer or temperature labile
  • Boil the sample at very high or low pH to cleave
  • Cleavage is more efficient at high pH
– Exception: cleavage occurs at the Asp-Pro peptide bond at low pH and elevated temperatures
  • Half-life at pH 2.5 and 40 °C is ~50 hrs

Important Features of the Peptide Bond:
3) The bond is almost always trans; proline is the common exception
– Cis-proline and trans-proline exhibit unique H-H distances
  • Trans: the distance between Hα of the residue preceding proline and the proline Hδ is short (<2.5 Å)
  • Cis: the distance between Hα of the residue preceding proline and the proline Hα is short (<2.5 Å)

Important Features of the Peptide Bond:
4) Structural dimensions are well defined
– Bond lengths and bond angles of the peptide bond are known

General Polypeptide Nomenclature:

Amino Acid Structural Nomenclature:
- Definitions of Torsion Angles
► Backbone
  Phi (φ): C'(i-1) – N(i) – Cα(i) – C'(i)
  Psi (ψ): N(i) – Cα(i) – C'(i) – N(i+1)
  Omega (ω): Cα(i-1) – C'(i-1) – N(i) – Cα(i), constrained to 180°
► Side-chain
  Chi-1 (χ1): N(i) – Cα(i) – Cβ(i) – Cγ(i)
  Chi-2 (χ2): Cα(i) – Cβ(i) – Cγ(i) – Cδ1(i)
  Note: the last atom of χ1 is Ile: Cγ1, Ser: Oγ, Cys: Sγ, Thr: Oγ1, Val: Cγ1; the last atom of χ2 is His: Nδ1

Ramachandran Plot:
- Peptide conformation is defined by the φ,ψ dihedrals (ω is constrained)
- Steric constraints limit the range of φ,ψ dihedrals available to the amino acid.
  ► Pro is more restricted: φ is constrained to about -60°
  ► Gly is less restricted: wider range of φ,ψ dihedrals
Figure panels: allowable regions in φ,ψ space for non-Gly/Pro residues and for Gly. Dark gray corresponds to the most favorable regions. A significant region of φ,ψ space is disallowed.

Ramachandran Plot:
– If φ,ψ dihedral values were listed for every amino acid
  ► Protein topology is defined!
– Ramachandran considered what combinations of φ, ψ were favorable for each amino acid
  ► Only van der Waals forces were considered.
– How many backbone conformations of a 300-residue protein are possible?
  ► Only φ, ψ are important.
  ► φ, ψ need only be given to ±15°, i.e., sampled every 30°
  ► Consider only minima of the Ramachandran plot.
– Still encounter approximately 10^300 conformations!
– Levinthal paradoxes:
  ► How is the right conformation found?
  ► Why are there only ~5,000 protein folds?

Ramachandran Plot:
– Sensitivity of the protein structure to relatively small changes in φ, ψ
Figure (same number of amino acids in each helix): φ = -57°, ψ = -70°; φ = -57°, ψ = -47°; φ = -74°, ψ = -4°

Similar Issues For Side Chain Conformation:
- Steric considerations define the allowable χ
- Staggered configuration is lowest energy
  ► 60°, -60° or 180°
Figure: valine χ1 at 180°, 60°, -60°

Limited Number of Possible Conformers for χ1, χ2, χ3
- Not all conformers are equal in energy
- Different amino acids have different χ energy profiles and different populations
► Example potential energy surfaces for side chain dihedrals
  • Still combinations of 60°, 180°, or -60° (300°)
Figure panels: Gln/Glu χ1 χ2 map; Gln/Glu χ2 χ3 map; Leu χ1 χ2 map; Ile χ1 χ2 map
http://spin.niddk.nih.gov/clore/Software/Torsion_angles/protein-tor/protein_side.html

χ2 for Phe, Trp and Tyr are Restricted to 90° or -90°
- χ1 can still be 60°, 180° or -60°
Figure panels: Phe/Tyr χ1 χ2 map; Trp χ1 χ2 map

Primary Structure: Disulphide Bonds
► Distinct regions of the primary polypeptide sequence may be joined by the formation of a disulphide bond between two spatially adjacent cysteines.
► Disulphide bonds are formed by the oxidation of two cysteine residues to form a covalent sulphur-sulphur bond, which can be an intra- or inter-molecular bridge.
- Multiple disulphide bonds are possible in a protein structure.
- Presence of a disulphide bond(s) restricts the conformations available to the protein.
- Disulphide bonds stabilize the protein’s overall fold by 2.5 - 3.5 kcal/mol.
- The disulphide bond is present in both the folded and unfolded protein, so it probably contributes only entropically, not enthalpically.
Figure labels: cysteine; cysteine; disulphide bond

Primary Structure: Disulphide Bonds
► Restriction of conformational space is more apparent in small protein structures
► Presence of free cysteines in the protein structure may cause problems in NMR/X-ray structural work
  – May cause unwanted interstrand cross-linking, leading to aggregation/solubility issues
  – Use reducing agents (DTT, 2-mercaptoethanol) or mutate Cys to Ser.

Primary Structure: Disulphide Bonds
► Geometry of a disulphide bond
  – Sγ – Sγ covalent bond length of 2.08 Å
  – Defined by 5 dihedral angles
  – Two main types:
    Left-handed: χ1 -60°, χ2 -60°, χ3 -85°, χ2’ -60°, χ1’ -60°; Cα-Cα distance 5.88 ± 0.49 Å
    Right-handed: χ1 -60°, χ2 +120°, χ3 +99°, χ2’ -50°, χ1’ -60°; Cα-Cα distance 5.07 ± 0.73 Å

Primary Structure: Disulphide Bonds
► Presence of a disulphide bond affects the local carbon chemical shifts
  – Reduced: Cα 56.9 ppm, Cβ 28.9 ppm
  – Oxidized: Cα 54.05 ppm, Cβ 42.25 ppm

What Information Do We Know at the Start of Determining A Protein Structure By NMR?
Effectively everything we have discussed to this point!
► The primary amino acid sequence of the protein of interest.
► All the known properties and geometry associated with each amino acid and peptide bond within the protein.
► General NMR data and trends for the unstructured (random coil) amino acids in the protein.
► The number and location of disulphide bonds.
  – Not necessary; can be deduced from the structure.

Secondary Structure: regular arrangements of the backbone of the polypeptide chain without reference to the side chain types or conformation
Major Types of Secondary Structure Elements:
► Helices: α-helix, 3₁₀-helix, π-helix
► β-strands: parallel, anti-parallel
► Turns: β-turns (types I, I’, II, II’, III, III’, VIa, VIb); γ-turns (classical and inverse)
► Other or random coil
Assigning the secondary structure is the first stage of determining an NMR protein structure.

Secondary Structure: Helices
- Helix Nomenclature

Secondary Structure: Helices
- Secondary structures are typically distinguished by φ,ψ values and hydrogen bonding pattern
Structure     φ     ψ     Residues per   Helical rise   Helical pitch   Radius (Å)    Atoms in          H-bond pattern
                          helical turn   r (Å)          p (Å)           (backbone)    H-bonded loop     (CO, HN)
α-helix       -57   -47   3.6            1.5            5.4             2.3           13                i, i+4
3₁₀-helix     -74   -4    3.0            2.0            6.0             1.9           10                i, i+3
π-helix       -57   -70   4.4            1.1            5.0             2.8           16                i, i+5

Secondary Structure: Helices
– α-helix: most common helix found in protein structures
– most thermodynamically stable
  ► 31% of secondary structure elements
– Right-handed twist to helix.
– Helix dipole
– ~85% of helices are distorted (φ,ψ ≠ -60°)
– Amino-acid preference in α-helix
– Side chains on the surface of the helix

Secondary Structure: Helices
– Amino Acid Preference for α-Helix
α-Helix Propensity (larger number better)
Ala: 1.489   Arg: 1.224   Asn: 0.772   Asp: 0.924
Cys: 0.966   Gln: 1.164   Glu: 1.504   Gly: 0.510
His: 1.003   Ile: 1.003   Leu: 1.236   Lys: 1.172
Met: 1.363   Phe: 1.195   Pro: 0.492   Ser: 0.739
Thr: 0.785   Trp: 1.090   Tyr: 0.787   Val: 0.990
Protein Engineering 1:289-294 (1987). J. Mol. Biol. (2004) 337, 1195–1205

Secondary Structure: Helices
– Amphipathic α-helices have a polar and a non-polar side
  ► hydrophobic residues are regularly spaced three or four positions apart in the linear sequence
– They play a crucial role in
  ► helix-helix interactions
  ► interactions of small peptides that have a helical conformation
  ► interactions with membranes
  ► air-water interfaces
  ► self-assembly processes
– An amphipathic α-helix interacts with the membrane
Figure labels: helical wheel representation of an amphipathic α-helix; leucine zipper

Secondary Structure: Helix Dipole
► CO - HN H-bonds are almost parallel with the helix axis
► H-bond dipoles reinforce in the helix to form the helix dipole
► Helix dipole (+ end towards the N-terminus)
  – capping by hydrogen bonding to NH and CO groups at the N- and C-termini
  – charge-dipole interactions: charged side chains form stabilizing interactions with the helix dipole

Secondary Structure: Helix Dipole
► Residues preferred at the N- and C-terminus of an α-helix
Protein Science (1995), 4:1325-1336.
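The helical wheel used above to show an amphipathic α-helix follows directly from 3.6 residues per turn, i.e. 100° of rotation per residue. The sketch below places residues on the wheel and scores how tightly the hydrophobic ones cluster on one face. The set of residues treated as hydrophobic, the function names, and the scoring by mean resultant length are assumptions chosen for illustration; the two test sequences are hypothetical.

```python
import math

# Residues treated as hydrophobic for this illustration (an assumption).
HYDROPHOBIC = set("AVILMFWC")


def wheel_angles(sequence, residues_per_turn=3.6):
    """Angular position (degrees) of each residue on a helical wheel."""
    step = 360.0 / residues_per_turn  # 100 degrees for an alpha-helix
    return [(i * step) % 360.0 for i in range(len(sequence))]


def hydrophobic_face_score(sequence, residues_per_turn=3.6):
    """Mean resultant length of the hydrophobic positions on the wheel.

    Values near 1 mean the hydrophobic residues point the same way
    (amphipathic helix); values near 0 mean they are spread around.
    """
    angles = [math.radians(angle)
              for angle, aa in zip(wheel_angles(sequence, residues_per_turn),
                                   sequence.upper())
              if aa in HYDROPHOBIC]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


if __name__ == "__main__":
    spaced = "LKKLKKKLKKLKKKL"   # Leu every 3 or 4 positions (hypothetical)
    alternating = "LKLKLKLK"     # Leu every 2 positions (hypothetical)
    print(round(hydrophobic_face_score(spaced), 2))
    print(round(hydrophobic_face_score(alternating), 2))
```

The spacing of three or four positions keeps the hydrophobic residues within a narrow arc of the wheel, whereas a strict alternation (the pattern suited to a β-strand) scatters them around the helix.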
Secondary Structure: Helices
– ~85% of helices are distorted (φ,ψ ≠ -60°)
  ► radius of curvature > 90 Å
  ► deviation of the axis from a straight line is ≥ 0.25 Å
– Distortions are caused by:
  ► A substantial fraction of all 3₁₀-helices occur at the ends of α-helices.
  ► π-helices also occur at the ends of α-helices.
  ► Packing of buried helices against other secondary structural elements in the core of a protein can lead to distortions, since the side chains are on the surface of helices.
  ► Proline residues induce distortions of around 20° in the direction of a helix. Proline causes 2 hydrogen bonds in the helix to be broken. Helices containing proline are usually long because shorter helices would be destabilized.
  ► Exposed helices are often bent away from the solvent: CO groups form H-bonds with the solvent.

Secondary Structure: Helix Length
– Average length of an α-helix is 10 residues
– One helical turn requires ~4 residues, which defines the minimal length
– Helix nomenclature: ...-N''-N'-Ncap-N1-N2-N3-............-C3-C2-C1-Ccap-C'-C''-...
  ► Ncap: N-terminus of the helix; Ccap: C-terminus of the helix
– Stability of a given helix length depends on the relative spatial orientation of Ncap, Ccap, etc.
Figure: position of the C-cap relative to the N-cap as a function of length. Good lengths are black circles, bad lengths are white circles, and the N-cap is a cross.
Figure: position of C2 relative to the N-cap as a function of length. Good lengths are black circles, bad lengths are white circles, and the N-cap is a cross.

Secondary Structure: 3₁₀-helix and π-helix
– 3₁₀-helix is rare
  ► Only 3.4% of helical residues.
  ► Found at the end of an α-helix.
  ► Dipoles not aligned as in the α-helix.
  ► 3 residues per turn and 10 atoms enclosed in the ring formed by each hydrogen bond.
  ► CO forms an H-bond with the NH 3 residues along the chain (i, i+3)
– π-helix is extremely rare
  ► Found at the end of an α-helix
  ► φ,ψ at the edge of the allowed region of the Ramachandran plot
  ► τ (N-Cα-C') angle is 114.9°, larger than the standard 109.5°
  ► Larger radius causes an axial hole too small for solvent
  ► Side chains less staggered than in the α-helix

Secondary Structure: β-strands
– β-sheet is an abundant secondary structure: 25% of globular proteins
  ► β-strands adopt an extended structure with an average length of 6 residues
  ► Single β-strands are not stable.
  ► If the β-strand contains alternating polar and non-polar residues: amphipathic β-sheet.
  ► β-strands occur in association with other strands to form β-sheets.
  ► Strands can be parallel (N→C, N→C) or anti-parallel (N→C, C→N)
  ► β-strand has a right-handed twist (0-30° per residue)
  ► Hydrogen bonding occurs between strands
  ► H-bond geometry is different between parallel and anti-parallel strands
Structure                φ      ψ     Residues per repeat   Rise r (Å)   Pitch p (Å)
Parallel β-strand        -119   113   2.0                   3.2          6.4
Anti-parallel β-strand   -139   135   2.0                   3.4          6.8
Rise: distance between adjacent residues
Pitch: distance between repeats of the structure

Secondary Structure: β-Sheets
- Secondary structures are typically distinguished by φ,ψ values and hydrogen bonding pattern
Figure labels: β; α

Secondary Structure: β-strands
– anti-parallel β-sheet
  ► Left-handed twist (~25°)
  ► Majority of bulges occur in anti-parallel β-sheets
Figure notes: alternating spaced H-bonds; hydrogen bonds between NH (blue) and CO (red); β-strand I; β-strand II; N-terminus; C-terminus; H-bond length 2.9 ± 0.3 Å

Secondary Structure: β-strands
– β-bulge
  ► hydrogen bonding of two residues from one strand with one residue from another strand
Figure: bulge, with hydrogen bonds from residue 33 to both residues 41 and 42

Secondary Structure: β-strands
– parallel β-sheet
  ► Less twisted than anti-parallel β-sheets
  ► Less likely to have a bulge compared to anti-parallel β-sheets (only ~5%)
  ► Hydrogen bonds are not perpendicular to individual strands
  ► Has a macrodipole that is ~5 times smaller than the average α-helix dipole
Figure notes: hydrogen bonds between NH (blue) and CO (red); β-strand I; β-strand II; β-strand III; N-terminus; C-terminus
Note: individual strands that comprise a sheet do not need to be sequentially related or the same size

Secondary Structure: β-Sheets
– A β-sheet can continue in both directions.
  ► Most β-sheets have < 6 β-strands with an average of 6 residues per strand.
  ► H-bonds are 0.1 Å shorter than in the α-helix
– β-sheets can be all parallel, all anti-parallel or mixed.
– Formed from strands that are very often from distant portions of the polypeptide sequence.
– Lengths of individual strands can vary.
  ► Do not need to be of uniform length
– Most β-sheets exhibit a left-handed twist (~25°).
  ► results from a relative rotation of each residue in the strands by 30° per amino acid in a right-handed sense.
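Because the helix and β-strand tables above list canonical φ,ψ values, a rough first guess at the secondary structure type of a residue can be made by finding the nearest canonical pair. The sketch below does only that; the function names and the 40° cutoff are assumptions made here, and a real assignment also needs the hydrogen bonding pattern, as the slides point out.

```python
import math

# Canonical (phi, psi) values in degrees, taken from the tables above.
CANONICAL = {
    "alpha-helix": (-57.0, -47.0),
    "3-10 helix": (-74.0, -4.0),
    "pi-helix": (-57.0, -70.0),
    "parallel beta-strand": (-119.0, 113.0),
    "anti-parallel beta-strand": (-139.0, 135.0),
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_canonical(phi, psi, max_deviation=40.0):
    """Name of the closest canonical conformation, or "other".

    `max_deviation` (degrees) is an arbitrary cutoff chosen for this
    sketch; it is not a value from the lecture.
    """
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angular_diff(phi, ref_phi),
                          angular_diff(psi, ref_psi))

    best = min(CANONICAL, key=distance)
    return best if distance(best) <= max_deviation else "other"


if __name__ == "__main__":
    for phi, psi in [(-60, -45), (-120, 120), (60, 30)]:
        print((phi, psi), nearest_canonical(phi, psi))
```

Since the α-, 3₁₀- and π-helix values lie close together in φ,ψ space, small errors in the dihedrals can switch the label, which is the sensitivity illustrated on the Ramachandran slides.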
Secondary Structure: β-sheet
– Amino Acid Preference for β-Sheet
  ► Hydrophobic and steric effects are unimportant
  ► The inductive effect of the amino acid side chains (shielding of the Cα nucleus) largely determines the β-sheet propensities
  ► No capping preference has been identified to date
β-Sheet Propensity (larger number better)
Ala: 0.79   Arg: 0.94   Asn: 0.66   Asp: 0.66
Cys: 1.07   Gln: 1.00   Glu: 0.51   Gly: 0.87
His: 0.83   Ile: 1.57   Leu: 1.17   Lys: 0.73
Met: 1.01   Phe: 1.23   Pro: 0.62   Ser: 0.94
Thr: 1.33   Trp: 1.24   Tyr: 1.31   Val: 1.64

Secondary Structure: Turns
– Short and tight structural regions that connect other secondary structure elements
  ► Comprised of 3 to 5 residues
  ► Allow the peptide chain to reverse direction; therefore, proline and glycine are prevalent in turns
  ► Connect adjacent β-strands
  ► Reverse turns occur mainly on the surface; therefore, charged residues are prevalent in turns
  ► Two common turns:
    β-turns
    - More common turn
    - Four consecutive residues, two do not form H-bonds
    - Carbonyl of one residue is H-bonded to the amide proton of a residue three residues away
    - Distance < 7 Å between the Cα atoms of residues i and i+3
    - Nine types of β-turns differ by φ, ψ of the i+1 and i+2 residues
    - Types I’, II’, III’ are mirror images of Types I, II, III
    - Type III β-turns may be considered short regions of 3₁₀-helix
    γ-turns
    - Very tight turn
    - Three consecutive residues, one does not H-bond.
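The β-turn distance criterion listed above (less than 7 Å between the Cα atoms of residues i and i+3) is easy to test on a set of coordinates. The following is a minimal sketch under stated assumptions: the function name and the toy coordinates are hypothetical, and the distance test alone is only a necessary condition, because residues in helices also bring Cα(i) and Cα(i+3) close together, so φ,ψ values and hydrogen bonds must still be checked.

```python
import math


def candidate_turn_starts(ca_coords, cutoff=7.0):
    """Indices i where the Ca(i)-Ca(i+3) distance is below `cutoff` (Å).

    `ca_coords` is a list of (x, y, z) tuples, one per residue, in
    sequence order. Only the distance test is applied here.
    """
    return [i for i in range(len(ca_coords) - 3)
            if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff]


if __name__ == "__main__":
    # Toy coordinates (hypothetical): a chain that reverses direction.
    hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
               (0.0, 3.8, 0.0), (-3.8, 3.8, 0.0)]
    print(candidate_turn_starts(hairpin))
```

In practice the candidates would then be sorted into turn types by comparing the φ,ψ values of residues i+1 and i+2 with the tabulated values for each type.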
Secondary Structure: Turns
- Secondary structures are typically distinguished by φ,ψ values and hydrogen bonding pattern
- Some preferred residues are indicated (bold in the original table marks the most significant)
β-turns
Turn         φ(i+1)   ψ(i+1)   φ(i+2)   ψ(i+2)   Preferred residues R(i) / R(i+1) / R(i+2) / R(i+3)
Type I       -60      -30      -90      0        C, P, S, H, N, D / P, S, E / T, S, N, D / G
Type I’      60       30       90       0        Y / G, H, N, D / G / K
Type II      -60      120      80       0        Y, P / P, K / G / C, S, K
Type II’     60       -120     -80      0        G
Type III     -60      -30      -60      -30
Type III’    60       30       60       30
Type IV      Deviations of more than 40° from the above                G, G
Type V       -80      80       80       -80
Type V’      80       -80      -80      80
Type VIa1    -60      120      -90      0        cis-P
Type VIa2    -120     120      -60      0        cis-P
Type VIb     -135     135      -75      160      cis-P
Type VIII    -60      -30      -120     120
γ-turns
Turn           φ            ψ
Turn           70 to 85     -60 to -70
Inverse turn   -70 to -85   60 to 70

Secondary Structure: β-turns
– Illustration of the Type I & II β-turns and mirror images (figure label: hydrogen bond)

Secondary Structure: γ-turns
– Illustration of the classical and inverse γ-turn (figure label: hydrogen bond)

Secondary Structure: Turns
– Amino acid preferences for turns

SuperSecondary Structure:
– arrangements of two, three or more consecutive secondary structures
  ► α-helices or β-strands
  ► Common features in many different proteins
  ► Completely different amino acid sequences
(a) βαβ: two parallel strands of β-sheet connected by a stretch of α-helix
(b) αα: two anti-parallel α-helices
(c) β meander: an anti-parallel sheet formed by a series of tight reverse turns connecting stretches of a polypeptide chain
(d) Greek key: a repetitive super-secondary structure formed when an anti-parallel sheet doubles back on itself

SuperSecondary Structure:
– Coiled coils
  ► 2 or more α-helices
  ► Contains a heptad repeat (H: hydrophobic; P: polar): a b c d e f g = (H P P H P P P)n
    – Leucine zippers: leucine in d
position
– n is ≥ 3
– Knob (a and d) into hole interactions (figure label: Knobs)
Moutevelis and Woolfson (2009) J. Mol. Biol. 385:726

SuperSecondary Structure:
– Coiled coils
► Periodic Table
– Leucine zippers: leucine in d position
– n is ≥ 3
– Knob into hole interactions (KIH)
– Population and percentage of occupancy
► Number of coils increases to the right
– Circle: helix; lines: KIH; grey: hydrophobic core
– Population and percentage of occupancy below each architecture
► Complexity increases down the column
– Helix shared between two-helix coiled coils
– Interface between 2 or more coiled coils

SuperSecondary Structure:
– Coiled coils
– Diversity of structures
Kohn et al. (1997) J. Biol. Chem. 272:2583

SuperSecondary Structure:
– Coiled coils
► Packing angle (Ω) and axial separation
– Angle between two helices
– Shortest distance between the two helices
Walther et al. (1998) PROTEINS 33:457

SuperSecondary Structure:
– Coiled coils
► Average axial separation differs for transmembrane and soluble coiled coils
– Solution: 9.6 Å
– Transmembrane: 9.0 Å, with two clusters at 7.3 Å and 10.8 Å
– Transmembrane coiled coils are more compact and contain smaller amino acids (Gly)
Eilers et al.
(2000) PNAS 97:5796

SuperSecondary Structure:
– Coiled coils
► Average axial separation varies linearly with amino-acid volumes
– Size (volume) of residues at helix contact

SuperSecondary Structure:
– Coiled coils
► Packing angle (Ω) distribution
Preferential angles are: Ω ~ -45°, Ω ~ +23°, Ω ~ +75°
Bowie (1997) Nature Structural Biology 4:915

SuperSecondary Structure:
– Coiled coils
► Packing angle (Ω)
– Depends on geometry of hydrophobic residues
– Steric compatibility alone defines packing angle
Figure panels, in order: heptad repeat, side by side; 11-residue repeat, long; heptad repeat, face to face; 11-residue repeat, normal
Angles shown, in order: Ω ~ 20°; Ω ~ 20°; Ω ~ 0° to -10°; Ω ~ -30° to -40°
Efimov (1999) FEBS Letters 463:3

Tertiary Structure:
– The three-dimensional folding of the polypeptide chain to assemble the different secondary structure elements in a particular arrangement in space

Tertiary Structure:
– Periodic table of Protein Folds
– Set of idealized structures
Experimental structures are compared to the idealized set to find the best match and classification
Basis set: most biologically important protein structures are derived from these idealized structures
Figure key: α-helix, β-strand
Taylor (2002) Nature 416:657

Tertiary Structure:
– Periodic table of Protein Folds
– Set of idealized structures, looking edge on, 4 layers thick
Small circles: helix; bars: β-sheet; arc: curved β-sheet; open circle: β-barrel

Tertiary Structure:
► ~800 unique folds have been identified
► 1,000 to 5,000 protein folds are predicted
SCOP: Structural Classification of Proteins.
1.75 release http://scop.mrc-lmb.cam.ac.uk/scop/index.html
38221 PDB Entries (23 Feb 2009). 110800 Domains. 1 Literature Reference (excluding nucleic acids and theoretical models)

Class                                Folds  Superfamilies  Families
All α proteins                         284            507       871
All β proteins                         174            354       742
α and β proteins (α/β)                 147            244       803
α and β proteins (α+β)                 376            552      1055
Multi-domain proteins                   66             66        89
Membrane and cell surface proteins      58            110       123
Small proteins                          90            129       219
Total                                 1195           1962      3902

Tertiary Structure:
Family: Clear evolutionary relationship
Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
Superfamily: Probable common evolutionary origin
Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
Fold: Major structural similarity
Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure.
Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.

Tertiary Structure:
– Classifying protein structures is not straightforward or definitive
► Multiple equally valid approaches
CATH v3.2 http://www.cathdb.info/

Mainly α (1)                     5    386    875    2917   37038
Mainly β (2)                    20    229    520    2618   43881
Mixed α β (3)                   14    594   1113    6183   90029
Few Secondary Structures (4)     1    104    118     208    2588
Totals                          40   1313   2626   11926  173536

CATH assigns each protein domain a four-number code based on its class (C), architecture (A), topology (T), and homologous super family (H).
Example: chain A from PDB ID: 1kbl is assigned a CATH code of 1.20.80.30
class: 1 (mainly alpha)
architecture: 20 (Up-down bundle)
topology: 80 (Acyl-CoA Binding Protein)
homologous super family: 30 (no description)

CATH is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels:
Class (C), derived from secondary structure content, is assigned for more than 90% of protein structures automatically.
Architecture (A), which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually.
Topology (T) clusters structures according to their topological connections and numbers of secondary structures.
Homologous super family (H) clusters proteins with highly similar structures and functions.
Assignments to the topology and homologous super family levels are made by sequence and structure comparisons.
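A CATH code such as the 1.20.80.30 example above can be split into its four levels with a few lines of Python. This is only an illustrative sketch: the names `CathCode`, `parse_cath_code` and `deepest_shared_level` are hypothetical helpers invented here, not part of any CATH software, and the class names are copied from the table on this slide.

```python
from typing import NamedTuple

# Class names as they appear in the CATH table on the slide.
CATH_CLASSES = {
    1: "Mainly alpha",
    2: "Mainly beta",
    3: "Mixed alpha beta",
    4: "Few secondary structures",
}

LEVELS = ("class", "architecture", "topology", "homologous superfamily")


class CathCode(NamedTuple):
    """Four-number CATH code: C.A.T.H (hypothetical container)."""
    cath_class: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '1.20.80.30' into its four integers."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def deepest_shared_level(code_a: str, code_b: str) -> str:
    """Return the deepest CATH level at which two codes still agree."""
    shared = "none"
    for level, x, y in zip(LEVELS, parse_cath_code(code_a), parse_cath_code(code_b)):
        if x != y:
            break
        shared = level
    return shared


if __name__ == "__main__":
    code = parse_cath_code("1.20.80.30")
    print(code, CATH_CLASSES[code.cath_class])
    print(deepest_shared_level("1.20.80.30", "3.40.50.300"))
```

Because the hierarchy is read left to right, two domains are compared level by level and the comparison stops at the first mismatch; sharing a later number is meaningless once an earlier level differs.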
Other Levels:
Sequence Family: clusters proteins based on sequence identity ≥35%; nearly always have identical structure
Non-Identical: clusters proteins based on sequence identity ≥95%
Identical: numerous cases where the protein structure based on the identical sequence has been deposited into the PDB
Domain: semi-independent folding unit

Tertiary Structure: Some Common Examples
Mainly α (4-helix bundle)
Mixed α/β (α/β-barrel)
Mainly β (β-sandwich)
Minimal Secondary Structure (Kringle Domain)

Tertiary Structure: Continuity of Fold Space
Common structure core (dark grey) for 3.40.50.300
But different members also have different structural elements (clear)
Making a distinct fold classification is difficult and problematic because of fold overlap and divergence
Cuff et al. (2009) Structure 17:1051

Tertiary Structure:
– Similarity to an existing protein fold increases the likelihood of accuracy of structure determination
► Not all structural folds are known
► May have a novel fold
– Structures generally not observed
► Knot structures are highly unlikely
► Previously thought not to exist. View a knot in a structure as suspect; it needs to be indisputably verified by experimental data
Figure labels: Knot structure in acetohydroxy acid isomeroreductase; simple trefoil knots

Tertiary Structure:
– RCSB Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do)
► Database of all known NMR and X-ray protein structures (> 93,970 structures)
► Includes DNA, RNA, small molecule ligands and peptide complexes

Protein Stability and Folding:
– Levinthal paradox (Journal de Chimie Physique et de Physico-Chimie Biologique (1968) 65: 44–45)
► Chain with 101 amino acids
► Three permissible conformations per residue pair
► 3^100 or 5 × 10^47 configurations!
► Sampling new configurations at the rate of 10^13/sec, it will take 10^27 years
► Proteins do not fold by random search
– Anfinsen dogma (Science (1973) 181: 223)
► Thermodynamic hypothesis: the native structure is the one in which ΔG of the whole system is the lowest
► Native conformation is determined by the totality of interatomic interactions
Amino acid sequence determines native structure
Regions of native structure form that act as nucleation sites

Protein Stability and Folding:
– Apply Energy Bias to Search for Native Structure (PNAS (1992) 89: 20-22)
► N: number of amino acids; k0 & k1: rates to form incorrect and correct local conformation
► Rates ~ 10^9 s^-1
Simple model; depends on choice of magnitude and ratio of rates
Changes in k1 shift the graph vertically
ΔG of a typical globular protein is ~ -5 to -15 kcal/mol
1 kT = 0.593 kcal/mol

Protein Stability and Folding:
– Water is a Poor Solvent for Unfolded or Denatured Proteins
► If this weren’t true, proteins wouldn’t fold!
► Specifically, water is a poor solvent for protein backbone
– Protein Energetics
► Hydrophobic effect contributes ~ -8 kJ/mol per buried residue, or ~ -1.912 kcal/mol
► Hydrogen bonding contribution: controversial
Destabilizing: ΔG = +3.1 kcal/mol (Biochemistry (1990) 29: 7133)
Partially stabilizing and destabilizing: ΔG = -1 kcal/mol (J. Mol. Biol. (1999) 293:283)
Important driving force: 40 cal/mol per residue (J. Biol. Chem. (2003) 278:31790)
Intramolecular hydrogen bonds are marginally favored over water:backbone hydrogen bonds
► -0.9 kcal/mol per buried C atom
Generally accepted view: ΔG = -1 kcal/mol
Different dielectric values
Hydrogen bond energies are difficult to measure and heavily dependent on model.
Chem. Rev. (1997), 97:1251-1267.
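The back-of-the-envelope numbers quoted in this part of the lecture (the Levinthal estimate, 1 kT, and the kJ-to-kcal conversion) can be checked with a short Python sketch. The function names are hypothetical, and the temperature of 298.15 K is an assumption: the slides do not state the temperature behind the 0.593 kcal/mol figure.

```python
# Rough numerical check of figures quoted on the slides.
# Assumptions (not stated in the source): T = 298.15 K, a 365.25-day year.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
R_KCAL_PER_MOL_K = 1.987204e-3   # gas constant in kcal/(mol K)
KJ_PER_KCAL = 4.184              # thermochemical calorie


def levinthal_configurations(n_residues: int = 101, states: int = 3) -> int:
    """Number of chain configurations: `states` choices per residue pair."""
    return states ** (n_residues - 1)


def random_search_years(n_configurations: int, rate_per_second: float = 1e13) -> float:
    """Years needed to visit every configuration once at the given sampling rate."""
    return n_configurations / rate_per_second / SECONDS_PER_YEAR


def thermal_energy_kcal_per_mol(temperature_k: float = 298.15) -> float:
    """RT (equivalently kT per mole) in kcal/mol."""
    return R_KCAL_PER_MOL_K * temperature_k


def kj_to_kcal(energy_kj: float) -> float:
    """Convert kJ/mol to kcal/mol."""
    return energy_kj / KJ_PER_KCAL


if __name__ == "__main__":
    n = levinthal_configurations()
    print(f"configurations: {n:.2e}")
    print(f"search time:    {random_search_years(n):.2e} years")
    print(f"kT at 298 K:    {thermal_energy_kcal_per_mol():.3f} kcal/mol")
    print(f"-8 kJ/mol:      {kj_to_kcal(-8.0):.3f} kcal/mol")
```

Seen this way, a typical folding ΔG of -5 to -15 kcal/mol corresponds to only roughly 8 to 25 kT, which is why native proteins are described as marginally stable.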
System                Dielectric
Alkanes               1-2
Alkenes               2-3
Dry protein           2
Slightly wet protein  4
Water                 80

Protein Stability and Folding:
– Hydrogen bond contribution to native structure
► Osmolytes interact with protein backbone
Osmolytes shown: TMAO, betaine, sucrose, trehalose, sarcosine, sorbitol, proline, glycerol, urea, guanidine (colored by water polarity)
Larger energetic effect for transfer of backbone into stabilizing osmolyte
PNAS (2006), 103:13997-14002. Chem. Rev. (1997), 97:1251-1267.

Protein Stability and Folding:
– Buried Salt-Bridges in Folded Protein Structures
► On average, buried salt bridges are energetically favorable
Note: high desolvation energy and high standard deviation
dslv: unfavorable desolvation energies
brd: favorable bridge energy due to electrostatic interactions of side-chains
prt: favorable electrostatic interaction of salt-bridge with rest of protein
assoc: does not consider electrostatic interaction with rest of protein
66 buried salt bridges and 156 exposed salt bridges
190 out of 222 are stabilizing (55 buried salt bridges)
32 out of 222 are destabilizing (11 buried salt bridges)
J. Mol. Biol. (1999), 293:1241-1255.

Protein Stability and Folding:
– Salt Bridges Tend to be Tightly Clustered
► Most are separated by 5 or fewer residues
J. Mol. Biol. (1999), 293:1241-1255.

Protein Stability and Folding:
– Ionizable Residues Tend to be on the Outer Protein Surface
► Can interact with water
► Interior of the protein is hydrophobic
► Are rarely buried: high ΔG of solvation
2.8 times as many buried acids as bases
1%: ΔΔGrxn > 5 ΔpK
Defined as buried: 20% ASA
Accessible surface area (ASA) compared against ΔΔGrxn
6.8 kcal/mol ~ ΔpKa of 5 pH units
JMB (2005), 348:1283-1298.
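The equivalence quoted above, 6.8 kcal/mol for a ΔpKa of 5 pH units, follows from the standard relation ΔG = ln(10) · RT · ΔpKa. The sketch below checks it numerically; the temperature of 298.15 K and the function names are assumptions made here for illustration, not values taken from the cited paper.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift_to_free_energy(delta_pka: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) equivalent to a pKa shift: ln(10) * R * T * dpKa."""
    return math.log(10.0) * R_KCAL_PER_MOL_K * temperature_k * delta_pka


def free_energy_to_pka_shift(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: pKa shift equivalent to a free energy in kcal/mol."""
    return delta_g_kcal / (math.log(10.0) * R_KCAL_PER_MOL_K * temperature_k)


if __name__ == "__main__":
    print(f"{pka_shift_to_free_energy(5.0):.2f} kcal/mol for a shift of 5 pH units")
    print(f"{free_energy_to_pka_shift(6.8):.2f} pH units for 6.8 kcal/mol")
```

Each pH unit of pKa shift is worth about 1.36 kcal/mol at room temperature, so a five-unit shift costs several times the roughly -1 kcal/mol attributed to a single hydrogen bond earlier in the lecture.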
Protein Stability and Folding:
– Ionizable Residues Tend to be on the Outer Protein Surface
Distribution of buried charged residues decreases rapidly.
JMB (2005), 348:1283-1298.

Protein Stability and Folding:
– When an apolar residue is replaced by a charged residue
► Increased tendency to be exposed and solvated
► Involved in a salt-bridge or hydrogen bond interaction
► At pH 7, still ionized
JMB (2005), 348:1283-1298. PROTEINS (2003), 53:783-791.
Plot of average depth of buried apolar residues, averaged for every 1 Å interval of depth, and the normalized occurrence for the cases when they are substituted by buried charged residues and by exposed charged residues.

Protein Stability and Folding: (PNAS (2008), 105:17784-17788.)
– Recent Analysis Suggests Proteins Have a High Tolerance for Buried Ionizable Residues
► 25 internal positions in S. aureus nuclease were substituted with Arg, Lys, Glu & Asp
► No loss in activity or changes in structure
86 of 87 changes were destabilizing
pKa values of buried ionizable residues were shifted to be neutral at physiological pH
Decrease in stability of mutant versus wild-type protein
ΔG S. aureus nuclease – 12.5 kcal/mol
Difference in the above stability curves

Protein Stability and Folding:
– Recent Analysis Suggests Proteins Have a High Tolerance for Buried Ionizable Residues
► 25 internal positions in S.
aureus nuclease were substituted with Arg, Lys, Glu & Asp
► No loss in activity or changes in structure
86 of 87 changes were destabilizing
pKa values of buried ionizable residues were shifted to be neutral at physiological pH
Protein interior is less polarizable and less polar than water.
Difference in the thermodynamic stability curves
Similarity between the two curves implies a pKa shift of Lys
Thin lines are Lys titration curves for pKa of 6.0 and 10.4. Solid line is the area between the two titration curves

Protein Stability and Folding:
– Transfer of a non-polar compound into a polar solvent is highly unfavorable.
– Protein Thermodynamics Determined by Chain Length
► Surface area increases with number of residues
► Burying hydrophobic residues is favorable
Chem. Rev. (1997), 97:1251-1267. Biochemistry (1990), 29:7133-7155.

Protein Stability and Folding:
– Protein Thermodynamics Determined by Chain Length
► All thermodynamic parameters increase linearly with number of residues
► Protein stability increases with number of residues
Figure panels: Heat Capacity (Cp), Enthalpy (H), Entropy (S)
Chem. Rev. (1997), 97:1251-1267.

Protein Stability and Folding:
– Formation of Secondary Structure Elements is Favored as Compactness Increases
► Radius of gyration: a measure of structure compactness, defined in terms of N, the number of atoms, and r, the position vector; or, alternatively, in terms of N, the number of residues
– Unique Native Conformation Corresponds to Optimal Arrangement of Hydrogen Bonds
Biochemistry (1990), 29:7133-7155.

Protein Stability and Folding:
– Mechanism of Protein Folding
► Hydrophobic collapse
► Re-organization into native state to optimize alignment of hydrogen bonds
Figure stages: unfolded, structural collapse, native. Blue spheres indicate fully solvated regions (water molecules)
Curr. Opin. Struct. Biol. (2004), 14:70-75.
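The radius of gyration introduced above as a measure of compactness can be computed directly from atomic coordinates. The slide's own equation is not reproduced in this text, so the sketch below assumes the common unweighted definition, Rg = sqrt((1/N) Σ |r_i − r_mean|²), with N the number of atoms and r_i their position vectors; a mass-weighted form is also widely used. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Vec3]) -> float:
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # Centroid (geometric center) of the N position vectors.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Toy example: eight points packed on a cube versus strung out on a line.
    compact = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
    extended = [(2.0 * i, 0.0, 0.0) for i in range(8)]
    print(radius_of_gyration(compact), radius_of_gyration(extended))
```

The same eight points give a much smaller value when packed together than when extended, which is the sense in which a collapsing chain becomes more compact.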
Protein Stability and Folding:
– Mechanisms of Protein Folding, Protein Association and Aggregation are Closely Related
► Aggregation propensity is the best predictor of protein-protein interfaces
► Chaperones play an important role in folding a sub-class of proteins
Nat. Struct. Mol. Biol. (2009), 16:574-581.

Protein Stability and Folding:
– Aggregation propensity is the best predictor of protein-protein interfaces
► Amino acid physico-chemical properties influence aggregation
Hydrophobicity
Charge
α-helix & β-strand propensity
Nature (2003), 424:805-808.

Protein Stability and Folding:
– Aggregation propensity is the best predictor of protein-protein interfaces
► Sequence-based prediction of aggregation
Figure axis: increasing aggregation propensity
JMB (2005), 350:379-392.

Quaternary Structure:
– Complexes of 2 or more polypeptide chains held together by noncovalent forces but in precise ratios and with a precise three-dimensional configuration
► Homo-: multiple repeats of the same protein
► Hetero-: combinations of different proteins
Figure labels: Homo-multimers, Hetero-multimers