Computational Method for Predicting Amyloidogenic Sequences
Bill Welsh
UMDNJ-Robert Wood Johnson Medical School
welshwj@umdnj.edu

Amyloid Fibril Formation: A Common Mechanism for Protein Misfolding Diseases
• Numerous amyloid & misfolding diseases
• All of them are incurable at present
• Short list of more familiar examples
  – Alzheimer’s disease
  – Parkinson’s disease
  – Huntington’s disease
  – Creutzfeldt-Jakob disease (“Mad Cow”)
  – Familial Amyloidosis
  – Type II Diabetes
• Triggered by short sequences that convert from native α-helix or coil to β-strand
• We call this trait ‘hidden β-strand propensity’

Problems
1. No sequence specificities
2. Absence of detailed structural information on misfolded proteins (amyloid fibrils)

Our Solution
1. Misfolding process is triggered by short (5-7 residue) sequences
2. Redefine sequence-structure relationships in terms of tertiary context
3. Identify short sequences that exhibit non-native (hidden) β-strand propensity [HbP]

Intriguing Relationship Between Tertiary Contacts and Secondary Structure
Relative Occurrence of Secondary Structure Elements in Different Tertiary Contact States

Tertiary Contact (TC): two non-hydrogen atoms within 4 Å of each other, in residues separated by more than 4 positions in sequence

Tertiary contacts   Coil   α      β      Total sequences
Low                 38 %   59 %    3 %   191,300
Medium              47 %   37 %   16 %   112,199
High                39 %   11 %   50 %   150,288
All                 41 %   38 %   21 %   453,787

Based on SCOP20 v1.57

Striking Conclusion
• α-helix dominates in low-TC regions
• β-sheet dominates in high-TC regions

TC Influence on Secondary Structure Propensity
• α-helix propensity of β-strands increases sharply at low TCs
• β-strand propensity of helices increases sharply at high TCs
[Figure: two plots of propensity (y-axis, 0 to 0.9) vs. TC (x-axis, 0 to 2). Panel labels: "Fragments predicted as helix" and "Fragments predicted as beta-strand"; y-axes: helix propensity and β-strand propensity; legend entries: Helix, beta-strand, Coil]

Average Tertiary Contacts (TCs) in SCOP20

Residue   TC      Residue   TC
L          9.9    R         15.9
S          8.3    I         11.0
A          6.4    F         18.5
Q         10.2    T          9.3
E          8.6    M         11.9
K          8.0    P          7.0
D          9.1    G          6.5
N         10.6    W         25.9
C         12.7    V         10.1
Y         21.7    H         16.2

The CSSP Algorithm: Locating Sequences Exhibiting HbP
Database construction:
• SCOP20 sequences
• 3D structure from PDB
• DSSP secondary structure and tertiary contacts (TC)
• Database of >450,000 7-residue sequences with secondary structure & TCs
Query:
• Query sequence scanned with a sliding 7-residue window, e.g. -Q-E-V-L-I-R-L-
• Similar sequences retrieved from the database
• Low TC: P(α|low); High TC: P(β|high)
• Output: sequence of hidden β-propensity
Example: amyloid fibrils from myoglobin (Fandrich et al., Nature 2001)

PHD prediction of secondary structure
Amino acid |…AGHGQEVLIRLFTGHPETL…|
PHD        |…HHHHHHHHHHHHHH HHH…|
P(α)       |…7756899999999623469…|
P(β)       |…0000000000000000000…|

Sensitivity of the CSSP Method: Chameleon Sequences
Query local sequence: ASVKQVS
[Figure: ASVKQVS in β-sheet (1AMP, aminopeptidase) and ASVKQVS in α-helix (1GKY, guanylate kinase)]

Resident protein                                                  HbP prediction (0-10 scale)
PDB ID   Name               Native secondary structure   TC     P(α)   P(β)   P(Coil)
1AMP     Aminopeptidase     β-strand                     1.3    2      7      1
1GKY     Guanylate kinase   α-helix                      0.4    8      1      1

Hidden β-propensity in Alzheimer’s Disease
• KLVFF are key residues in amyloid fibril polymerization (Tjernberg et al., JBC 1996)
[Figure: HbP profiles for the amyloidogenic wild-type Aβ fragment and the non-amyloidogenic mutant Aβ fragment; tracks: Helix, Beta, Coil; propensity scale: Strong, Moderate, Weak, Very weak]
Yoon and Welsh, Protein Science (2004); ibid., Proteins (2005)

hIAPP sequence (Type 2 Diabetes)
• NFLVH, FLVHS: Mazor et al., JMB (2002)
• NFGAIL: Zanuy, Nussinov, et al., Biophysical Journal (2003)
[Figure: hIAPP sequence (4-34) associated with type II diabetes]

NAC sequence (Parkinson’s disease)
• VTNVGGAVVTGVTAVA
• VTGVTAVAQKTV
• GAVVTGVTAVA
Bodles et al., J Neurochem (2001)
[Figure: NAC sequence of α-synuclein associated with Parkinson’s disease]

Beta propensity of acetylcholinesterase (AChE) and its homolog butyrylcholinesterase (BuChE)
Cottingham et al., Biochemistry (2002); ibid., (2003): AChE586-599 and BuChE573-586
[Figure: amyloidogenic AChE586-599 fragment vs. non-amyloidogenic BuChE573-586 fragment]

Amyloid Formation by G334V Mutant p53 Associated with Lung Cancer
Higashimoto et al., Biochemistry 45, 1608-1619 (2006)

Amyloidogenic Sequence Knowledge Base (ASKB)
• CSSP algorithm that predicts “hidden” β-strand propensity in proteins & polypeptides
• Searchable peptide database
http://askb.umdnj.edu/askb/welcome.html

Estimating Free Energies
[Diagram labels: Unfolded; Partially Folded; α-helix (G_α); β-strand (G_β); random coil (G_coil); β-rich amyloid (G_amyloid); transitions ΔG_α→β, ΔG_coil→β, ΔG_amyloid, ΔG_hidden β]

\[ \Delta G_{\text{hidden}\,\beta} = \Delta G_{\alpha\to\beta} + \Delta G_{\text{coil}\to\beta} \]
\[ \Delta G_{\text{hidden}\,\beta} = -RT \log K_{\beta} = -RT \left( \log K_{\alpha\to\beta} + \log K_{\text{coil}\to\beta} \right) \]
\[ \Delta G_{\text{hidden}\,\beta} = -RT \left( \log \frac{P_{\beta}}{P_{\alpha}} + \log \frac{P_{\beta}}{P_{\text{coil}}} \right) = -RT \log \frac{P_{\beta}^{2}}{P_{\alpha} P_{\text{coil}}} \]

Predicted vs. Experimental β-Sheet Structure of Prion Protein Peptide
• Decatur and coworkers employed FTIR spectroscopy to determine % β-sheet structure for peptides based on residues 109-122 of the Syrian hamster prion protein (H1) substituted at position 117.
• We plotted our calculated HbP metrics for the sequences H1, A117G, A117V, A117L, and A117I vs. Decatur’s experimental values.
• Strong correlation (R² = 0.96) suggests that calculated HbP profiles are excellent predictors of β-sheet content.
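Worked example for the free-energy estimate: the relation from the "Estimating Free Energies" slide, ΔG_hidden β = -RT log(Pβ² / (Pα · Pcoil)), can be evaluated with a few lines of Python. This is only a minimal sketch, not the ASKB implementation. The function name `hidden_beta_free_energy` is hypothetical, "log" is assumed to mean the natural logarithm, and T = 298.15 K with R in kcal/(mol·K) are assumed values that the slides do not state. The inputs are the 0-10 HbP scores reported for the chameleon sequence ASVKQVS; because the ratio Pβ² / (Pα · Pcoil) is unchanged when all three scores are multiplied by a common factor, the scores can be used without converting them to probabilities.

```python
import math

# Gas constant in kcal/(mol*K); the choice of units is an assumption.
R_KCAL = 1.987e-3


def hidden_beta_free_energy(p_alpha, p_beta, p_coil, temperature=298.15):
    """Return -RT * ln(p_beta**2 / (p_alpha * p_coil)).

    Hypothetical helper: the natural log and the default temperature
    are assumptions, not values taken from the slides.
    """
    if min(p_alpha, p_beta, p_coil) <= 0:
        # The logarithm is undefined for zero scores (e.g. P(beta) = 0).
        raise ValueError("all propensities must be positive")
    ratio = p_beta ** 2 / (p_alpha * p_coil)
    return -R_KCAL * temperature * math.log(ratio)


# HbP scores (P(alpha), P(beta), P(coil)) for ASVKQVS from the
# chameleon-sequence table.
examples = {
    "1AMP (native beta-strand)": (2, 7, 1),
    "1GKY (native alpha-helix)": (8, 1, 1),
}

for name, (p_a, p_b, p_c) in examples.items():
    dg = hidden_beta_free_energy(p_a, p_b, p_c)
    print(f"{name}: dG_hidden_beta = {dg:.2f} kcal/mol")
```

Under these assumptions the estimate is negative when the β score dominates (the 1AMP context) and positive when the helix score dominates (the 1GKY context). A score of zero, as in the P(β) row of the PHD example, makes the logarithm undefined, so the sketch raises an error rather than guessing a floor value.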
SA Petty, T Adalsteinsson, & SM Decatur, Biochemistry 44:4720-4726 (2005)

General Observations and Implications
• The CSSP algorithm successfully pinpoints amyloidogenic sequences in numerous examples where experimental data are available
• These sequences possess hidden β-strand propensity
  – generally short sequences (4-7 residues) that serve as ‘core nucleation motifs’ to trigger amyloid fibril formation
  – adopt α-helix in low contact regions (low TC) and β-strand in high contact regions (high TC)
• These sequences are conformationally ambivalent
  – interconvertible between α-helix and β-strand
  – highly sensitive to tertiary environment
  – generally contain hydrophobic, aromatic residues (Phe, Trp, Tyr)
  – consistent with recent findings: Rojas Quijano et al., Biochemistry (2006)
• Ability to form amyloid is a generic trait of all proteins

Thank You!
welshwj@umdnj.edu