Secondary Structure Assignment from Structure
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
Reading: Chapter 19, Structural Bioinformatics
PHAR 201 Lecture 05, 2012

Agenda
• Why secondary structure assignment is important
• Hydrogen bonding models
• DSSP (Kabsch-Sander) and its impact
• Other methods
• Conclusions

Reminder - Dihedral Angles
• phi: dihedral angle about the N-Calpha bond
• psi: dihedral angle about the Calpha-C bond
• omega: dihedral angle about the C-N (peptide) bond
From http://www.imb-jena.de

Reminder - Helices
                            phi(deg)   psi(deg)   H-bond pattern
-----------------------------------------------------------------
right-handed alpha-helix     -57.8      -47.0      i+4
pi-helix                     -57.1      -69.7      i+5
3₁₀ helix                    -74.0       -4.0      i+3
-----------------------------------------------------------------
(omega is ~180 deg in all cases)
From http://www.imb-jena.de

Reminder - Beta Strands
               phi(deg)   psi(deg)   omega(deg)
-----------------------------------------------------------------
beta strand     -120       120        180
-----------------------------------------------------------------
Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet is drawn schematically which contains three antiparallel and one parallel strand. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and receptor oxygen.
From http://broccoli.mfn.ki.se/pps_course_96/

Why is consistent secondary structure assignment from structure important?
• Part of the fold and domain
• Useful conceptualization for understanding structure
• Influences the sequence alignment
• It is related to function
• It is useful as part of structure prediction – defines regions on the templates
• As a training set in machine learning algorithms
• Consistency of searching – authors' assignments differ

Example where secondary structure is important
[Figure: alignment of Ilk with the Src kinase structure 1fmk over positions 150-400. Rows: Ilk PSS, Ilk Seq, match line, 1fmk Seq, 1fmk SS. The catalytic loop (Cat. Loop) is labeled and selected positions are marked with asterisks.]
• "Integrin-linked kinase" (Ilk) is a novel protein kinase fold with strong sequence similarity to known structures (Hannigan et al. 1996 Nature 379, 91-96)
• Aligns to Src kinases with a BLAST e-value of 10^-19 and 27% identity (alignment shown is to a known Src kinase structure)
• Several key residues are conserved, but residues important to catalysis, including the catalytic Asp, are missing
• Recent experimental evidence suggests that Ilk lacks kinase activity (Lynch et al. 1999 Oncogene 18, 8024-8032)

History of Assignment
• Originally left to the interpretation of the structural biologist – inconsistent
• 1983 - the Kabsch-Sander algorithm was written as an aid in secondary structure prediction
  – the prediction program as such never emerged
  – what did emerge is perhaps the most consistent and accepted algorithm in all of structural bioinformatics
• Assignments are embodied in the DSSP algorithm and associated database of assignments

Inconsistent Author Assignment

Hydrogen Bonding is Key to Automated Methods
• Why?
  – ~90% of backbone donors (NH) and acceptors (C=O) form hydrogen bonds
  – 62% are intra-backbone
• Basic definition
  – Angle N-(H)-O greater than 120 degrees
  – H...O distance less than 2.5 Å
  – Note: H atoms are not usually identified directly

Hydrogen Bond - Definition

Coulomb Hydrogen Bond Calculation – used by DSSP
\[
E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{HO}} - \frac{1}{r_{NC'}}\right)
\]
• f is a constant, 332 Å kcal/e^2
• δ+ and δ− are the polar charges in electrons
• Weakest H-bond accepted by DSSP is –0.5 kcal/mole
• H positions are not given and require extrapolation – note this assumes planar geometry for the peptide bond

DSSP – Dictionary of Secondary Structures of Proteins
• Defined solely based on the H-bonds given – from the list of bonds and residues that form them; helix assignments are made as follows:
  – Alpha helix (H): start i -> i+4; end i-4 -> i
  – 3₁₀ helix (G): start i -> i+3; end i-3 -> i
  – Pi helix (I): start i -> i+5

DSSP – Dictionary of Secondary Structures of Proteins
• Similarly for beta sheets:
  – Residues (E) – have 2 H-bonds in the sheet or are surrounded by 2 H-bonds
  – Isolated residues (B) – beta bridge
  – Beta bulges also assigned E – may exist as up to 4 on one side of sheet and 1 on the other
[Figure label: 1GCS]

DSSP Nomenclature
• H = alpha helix
• G = 3₁₀ helix
• I = pi helix
• B = bridge – single residue sheet
• E = extended beta strand
• T = beta turn (example)
• S = bend
• C = coil

Converse Situation?
• In our discussions of structure comparison and alignment, structure classification and (soon) domain assignment we learnt there was not one generally accepted method
• DSSP has for a long time been a generally accepted method

DSSP as Implemented in the PDB
1ATP

STRIDE – Empirical Hydrogen Bond Calculation
\[
E_{hb} = E_r \cdot E_t \cdot E_p
\]
\[
E_r = \frac{4 E_m r_m^{6}}{r^{6}} - \frac{3 E_m r_m^{8}}{r^{8}}
\]
\[
E_p = \cos^{2}(\theta)
\]
\[
E_t = \begin{cases}
[0.9 + 0.1\sin(2 t_i)]\cos(t_o), & 0^\circ < t_i \le 90^\circ \\
K_1\,[K_2 - \cos^{2}(t_i)]^{3}\cos(t_o), & 90^\circ < t_i \le 110^\circ \\
0, & 110^\circ < t_i
\end{cases}
\]
- r_m (3.0 Å) and E_m (-2.8 kcal/mole) are derived from small molecule structures
- E_hb is the total energy

STRIDE – Empirical Hydrogen Bond Calculation
• Uses E_hb and phi-psi torsional angle criteria
• Torsional angles define secondary structures according to the regions of the Ramachandran plot in which they fall
• E_hb is ignored if phi and psi are unfavorable

Comparison DSSP & STRIDE

DSSP vs STRIDE
• STRIDE – added term in the expression of hydrogen bond energy
• STRIDE – selection of terminal residues through reliance on torsional angles
• STRIDE – stresses planarity of hydrogen bonds while allowing longer bonds

Other Methods
• DEFINE – uses a distance criterion between Calpha atoms which varies slightly for each secondary structure type; allows modifications for curvature
• P-Curve – analysis of protein curvature – compares to ideal motifs – unknown motif defined by tilt, roll etc. between peptide planes

Comparative Notes
• The last residues of a sheet or a helix are often still in the same conformation, although they no longer have hydrogen bonds in the structure. This translates to the observation that ends (caps) of regular secondary structure segments are not well defined.
• It seems that Calpha-distance criteria (applied in DEFINE) alone can accommodate considerable distortion of the backbone, giving an excess of secondary structure assignments despite having reduced e considerably.
• DSSP is the only assignment scheme with a large peak for alpha-helices of four residues, many of which constitute single helical turns.
• DEFINE assigns more than twice as many sheets of length four as the other methods.
• P-Curve has a tendency to assign overly long elements of regular secondary structure.

Amino Acid Propensities Indicate the Role of Side Chains in Defining Secondary Structure – Basis of Prediction Methods – Note that none of the assignment methods use this
• Alpha helices – rich in ALA, LEU; poor in PRO and GLY
• Beta sheets – rich in VAL, ILE; poor in GLY, ASP, PRO
• 3₁₀ helices – rich in PRO; poor in ALA, LEU
• Beta bridges – poor in VAL, ILE

Newer Methods - DSSPcont
• Use known alignments from multiple 3D structures or from multiple members of the NMR ensemble (DSSPcont)
• Consensus based approach

Supersecondary Structures
http://en.wikipedia.org/wiki/Meander_(art)
http://en.wikipedia.org/wiki/Zinc_finger
Zinc Finger Motif

I-sites (Baker)
• I-sites – specific segments with common amino acid propensities
• Used by Rosetta to predict structure – perhaps the most successful method thus far
• Note considers only main chain hydrogen bonds – much of the tertiary structure is associated with side chain interactions

Summary
• DSSP remains the first and most popular approach
• STRIDE may have been developed as part of the EMBL ….
• DSSP has been coded a number of times from the paper, often with different results – open source helps this today
• DSSP is perhaps the most accepted algorithm in all of structural bioinformatics
• It is not always clear whether the secondary structure assignments deposited with a structure are from DSSP or from the author's view
• Consistent searching requires that DSSP be used for all structures – early structures had no author assignments
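
The Coulomb hydrogen bond calculation used by DSSP, covered earlier in the lecture, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DSSP program itself. The partial charges (0.42e on C and O, 0.20e on N and H) come from the Kabsch-Sander paper rather than from these slides. The 1.0 Å N-H bond length, the function names (place_hydrogen, hbond_energy, is_hbond) and the example coordinates are hypothetical choices made for the sketch.

```python
import math

F = 332.0        # constant from the lecture, in Å kcal/e^2
Q_CO = 0.42      # assumed partial charge on C and O (Kabsch-Sander paper), in e
Q_NH = 0.20      # assumed partial charge on N and H (Kabsch-Sander paper), in e
CUTOFF = -0.5    # weakest accepted H-bond energy, kcal/mole (from the lecture)


def place_hydrogen(n, c_prev, o_prev, bond=1.0):
    """Extrapolate the amide H position, assuming a planar peptide bond.

    H is placed `bond` Å from N, along the direction pointing from the
    carbonyl O to the carbonyl C of the previous residue.
    """
    length = math.dist(c_prev, o_prev)
    return tuple(n[k] + bond * (c_prev[k] - o_prev[k]) / length for k in range(3))


def hbond_energy(n, h, c, o):
    """Electrostatic energy between an N-H donor and a C=O acceptor, kcal/mole."""
    return F * Q_CO * Q_NH * (
        1.0 / math.dist(n, o)
        + 1.0 / math.dist(h, c)
        - 1.0 / math.dist(h, o)
        - 1.0 / math.dist(n, c)
    )


def is_hbond(n, h, c, o, cutoff=CUTOFF):
    """True if the pair counts as a hydrogen bond under the energy cutoff."""
    return hbond_energy(n, h, c, o) < cutoff


if __name__ == "__main__":
    # Hypothetical collinear N-H ... O=C geometry with an N...O distance of 2.9 Å.
    n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
    print(round(hbond_energy(n, h, c, o), 2), is_hbond(n, h, c, o))
```

For the idealized collinear geometry in the demo the energy comes out close to -2.9 kcal/mole, well below the -0.5 kcal/mole cutoff. Moving the acceptor to about 6 Å from the nitrogen raises the energy above the cutoff, so the pair is no longer counted as a hydrogen bond.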