Secondary Structure Assignment from Structure PHAR 201/Bioinformatics I Philip E. Bourne

Secondary Structure
Assignment from Structure
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
Reading Chapter 19 Structural
Bioinformatics
PHAR 201 Lecture 05, 2012
Agenda
• Why secondary structure assignment is
important
• Hydrogen bonding models
• DSSP (Kabsch-Sander) and its impact
• Other methods
• Conclusions
Reminder - Dihedral Angles
From http://www.imb-jena.de
phi - dihedral angle about the N-Calpha bond
psi - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond
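A minimal sketch of how a dihedral angle can be computed from four atom positions. For phi the atoms would be C(i-1), N, Calpha, C; for psi N, Calpha, C, N(i+1); for omega Calpha, C, N(i+1), Calpha(i+1). The function name and the test coordinates are illustrative assumptions, not part of the lecture.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With toy coordinates, a cis arrangement gives 0 degrees and a trans arrangement gives a magnitude of 180 degrees, which is the value quoted for omega in the following slides.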
Reminder - Helices
                           phi(deg)   psi(deg)   H-bond pattern
-----------------------------------------------------------------
right-handed alpha-helix    -57.8      -47.0      i+4
pi-helix                    -57.1      -69.7      i+5
3₁₀ helix                   -74.0       -4.0      i+3
-----------------------------------------------------------------
(omega is ~180 deg in all cases)
From http://www.imb-jena.de
Reminder - Beta Strands
               phi(deg)   psi(deg)   omega(deg)
-----------------------------------------------------------------
beta strand     -120       120        180
-----------------------------------------------------------------
Hydrogen bond patterns in beta sheets. Here a four-stranded
beta sheet is drawn schematically which contains three
antiparallel and one parallel strand. Hydrogen bonds are
indicated with red lines (antiparallel strands) and green lines
(parallel strands) connecting the hydrogen and receptor oxygen.
From http://broccoli.mfn.ki.se/pps_course_96/
Why is consistent secondary
structure assignment from structure
important?
• Part of the fold and domain
• Useful conceptualization for understanding
structure
• Influences the sequence alignment
• It is related to function
• It is useful as part of structure prediction –
defines regions on the templates
• As a training set in machine learning algorithms
• Consistency of searching – author’s
assignments differ
Example where secondary structure is important
• “Integrin-linked kinase” (Ilk) is a novel protein kinase fold with strong
sequence similarity to known structures (Hannigan et al. 1996 Nature 379, 91-96)
• Aligns to Src kinases with BLAST e-value of 10^-19 and 27% identity
(alignment shown is to a known Src kinase structure)
• Several key residues are conserved, but residues important to catalysis,
including catalytic Asp, are missing
• Recent experimental evidence suggests that Ilk lacks kinase activity
(Lynch et al. 1999 Oncogene 18, 8024-8032)
[Figure: alignment of the Ilk sequence (Ilk_Seq) and its predicted secondary
structure (Ilk_PSS) against the sequence (1fmk_Seq) and secondary structure
(1fmk_SS) of structure 1fmk, in three states H/E/C; selected positions are
marked with * and the catalytic loop is labelled "Cat. Loop"]
History of Assignment
• Originally left to the interpretation of the
structural biologist – inconsistent
• 1983 - the Kabsch-Sander algorithm was written
as an aid in secondary structure prediction – the
prediction program as such never emerged – what did
emerge is perhaps the most consistent and
accepted algorithm in all of structural
bioinformatics
• Assignments are embodied in the DSSP
algorithm and associated database of
assignments
Inconsistent Author Assignment
Hydrogen Bonding is Key to
Automated Methods
• Why? - ~90% of backbone donors (NH)
and acceptors (C=O) form hydrogen
bonds
• 62% are intra-backbone
• Basic definition
– Angle N – (H) – O greater than 120 degrees
– H…O distance less than 2.5 Å
– Note: H atoms are not usually identified directly
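The basic geometric definition can be sketched as a small check on three atom positions. This is only an illustration of the two thresholds on the slide (angle greater than 120 degrees, H…O distance less than 2.5 Å); the function name and the coordinates in the usage note are assumptions, and the hydrogen position is taken as already known.

```python
import math

def is_geometric_hbond(n, h, o, min_angle=120.0, max_dist=2.5):
    """True if N-H...O satisfies the basic angle and distance criteria.

    n, h, o are (x, y, z) tuples in Angstroms; thresholds follow the slide.
    """
    d_ho = math.dist(h, o)
    if d_ho >= max_dist:
        return False
    # Angle at H between the H->N and H->O directions.
    hn = tuple(a - b for a, b in zip(n, h))
    ho = tuple(a - b for a, b in zip(o, h))
    cos_a = sum(a * b for a, b in zip(hn, ho)) / (math.dist(n, h) * d_ho)
    cos_a = max(-1.0, min(1.0, cos_a))        # guard against rounding error
    return math.degrees(math.acos(cos_a)) > min_angle
```

For example, a linear N-H…O arrangement with H…O = 1.9 Å passes, while the same donor with the oxygen at a 90 degree angle or 3.0 Å away does not.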
Hydrogen Bond - Definition
Coulomb Hydrogen Bond
Calculation – used by DSSP
$$E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{HO}} - \frac{1}{r_{NC'}}\right)$$
• f is a constant, 332 Å kcal/(mole e^2)
• delta+ and delta- are the polar charges in electrons
• Weakest H-bond: –0.5 kcal/mole in DSSP
• H not given – requires extrapolation – note assumes
planar geometry for peptide bond
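A minimal sketch of the Coulomb calculation, including the hydrogen extrapolation. The partial charges (0.42e and 0.20e) and the 1.0 Å N-H bond length come from the Kabsch-Sander paper rather than from the slide, and the function names are illustrative assumptions. This is not the DSSP source code.

```python
import math

F = 332.0        # Å kcal/(mole e^2), from the slide
Q1 = 0.42        # partial charge on C=O (assumed from Kabsch & Sander 1983)
Q2 = 0.20        # partial charge on N-H (assumed from Kabsch & Sander 1983)
CUTOFF = -0.5    # kcal/mole, weakest H-bond accepted by DSSP

def place_hydrogen(n, c_prev, o_prev, bond=1.0):
    """Extrapolate H: 1.0 Å from N, antiparallel to the preceding C=O.

    This assumes a planar (trans) peptide bond, as noted on the slide.
    """
    d = math.dist(c_prev, o_prev)
    return tuple(ni + bond * (ci - oi) / d
                 for ni, ci, oi in zip(n, c_prev, o_prev))

def dssp_hbond_energy(n, h, c, o):
    """Electrostatic energy (kcal/mole) between donor N-H and acceptor C=O."""
    return F * Q1 * Q2 * (1.0 / math.dist(n, o) + 1.0 / math.dist(h, c)
                          - 1.0 / math.dist(h, o) - 1.0 / math.dist(n, c))

def is_dssp_hbond(n, h, c, o):
    return dssp_hbond_energy(n, h, c, o) < CUTOFF
```

For a linear toy geometry with r_NO = 2.9 Å and r_HO = 1.9 Å the energy is about -2.9 kcal/mole, well below the -0.5 kcal/mole cutoff.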
DSSP – Dictionary of Secondary Structures
of Proteins
• Defined solely based on the H-bonds
given – from the list of bonds and residues
that form them; helix assignments are
made as follows:
– Alpha helix (H): start i -> i+4; end i-4 -> i
– 3₁₀ helix (G): start i -> i+3; end i-3 -> i
– Pi helix (I): start i -> i+5; end i-5 -> i
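A simplified sketch of turning a list of backbone H-bonds into helix labels. It assumes the Kabsch-Sander minimal-helix rule, where two consecutive n-turns at residues i-1 and i mark residues i to i+n-1 as helix. The input format (a set of (acceptor, donor) residue index pairs) and the per-residue priority H, then G, then I are assumptions made for this illustration, not a reproduction of the DSSP program.

```python
def assign_helices(n_res, hbonds):
    """Label residues 0..n_res-1 as H, G, I or C from backbone H-bonds.

    hbonds: set of (i, j) pairs meaning C=O of residue i bonds N-H of residue j.
    """
    ss = ["C"] * n_res
    for n, code in ((4, "H"), (3, "G"), (5, "I")):
        # An n-turn at i exists when C=O(i) is bonded to N-H(i+n).
        turn = [(i, i + n) in hbonds for i in range(n_res)]
        for i in range(1, n_res):
            if turn[i - 1] and turn[i]:
                for k in range(i, min(i + n, n_res)):
                    if ss[k] == "C":          # keep earlier (higher priority) labels
                        ss[k] = code
    return "".join(ss)
```

With i -> i+4 bonds from residues 1 through 5 of a 12-residue chain, this yields `CCHHHHHHHCCC`; a single isolated i -> i+4 bond yields no helix at all.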
DSSP – Dictionary of Secondary Structures
of Proteins
• Similarly for beta sheets:
– Residues (E) – have 2 H-bonds in the sheet or are
surrounded by 2 H-bonds
– Isolated residues (B) – beta bridge
– Beta bulges also assigned E – may exist as up to 4 on
one side of sheet and 1 on the other
[Figure: example from structure 1GCS]
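The "2 H-bonds in the sheet or surrounded by 2 H-bonds" rule can be sketched as a bridge test between residues i and j. The patterns below follow the Kabsch-Sander bridge definitions as generally described; the input format, the function name, and the minimum sequence separation of 3 are assumptions for this illustration.

```python
def bridge_type(hbonds, i, j):
    """Return 'parallel', 'antiparallel' or None for residues i and j.

    hbonds: set of (a, d) pairs meaning C=O of residue a bonds N-H of residue d.
    """
    if abs(i - j) < 3:                        # assumed minimum separation
        return None
    # Parallel: the partner residue is "surrounded" by two H-bonds.
    if ((i - 1, j) in hbonds and (j, i + 1) in hbonds) or \
       ((j - 1, i) in hbonds and (i, j + 1) in hbonds):
        return "parallel"
    # Antiparallel: either a reciprocal pair, or two bonds flanking the pair.
    if ((i, j) in hbonds and (j, i) in hbonds) or \
       ((i - 1, j + 1) in hbonds and (j - 1, i + 1) in hbonds):
        return "antiparallel"
    return None
```

A run of consecutive bridges would be labelled E, and a bridge with no neighbours would be labelled B.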
DSSP Nomenclature
• H = alpha helix
• G = 3₁₀ helix
• I = Pi helix
• B = bridge – single residue sheet
• E = extended beta strand
• T = beta turn (example)
• S = bend
• C = coil
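The codes above can be held in a small lookup table. The reduction to three states (H, E, C) shown here is one common convention and is an assumption of this sketch, not something stated in the lecture; other groupings are in use.

```python
DSSP_CODES = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "B": "bridge (single residue sheet)",
    "E": "extended beta strand",
    "T": "turn",
    "S": "bend",
    "C": "coil",
}

# Assumed 8-to-3 state grouping: helices -> H, strands/bridges -> E, rest -> C.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_three_states(ss):
    """Map a string of DSSP codes to H/E/C; unknown codes become C."""
    return "".join(THREE_STATE.get(code, "C") for code in ss)
```

For example, `reduce_to_three_states("HHGGIIEEBTSC")` gives `HHHHHHEEECCC`.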
Converse Situation?
• In our discussions of structure comparison
and alignment, structure classification and
(soon) domain assignment we learnt there
was not one generally accepted method
• DSSP has for a long time been a generally
accepted method
DSSP as Implemented in the PDB
1ATP
STRIDE – Empirical Hydrogen Bond
Calculation
$$E_{hb} = E_r \cdot E_t \cdot E_p$$
$$E_r = \left(\frac{4 r_m^6}{r^6} - \frac{3 r_m^8}{r^8}\right) E_m$$
$$E_p = \cos^2(\theta)$$
$$E_t = \begin{cases} [0.9 + 0.1\sin(2 t_i)]\cos(t_o) & 0^\circ < t_i \le 90^\circ \\ K_1 [K_2 - \cos^2(t_i)]\cos(t_o) & 90^\circ < t_i \le 110^\circ \\ 0 & 110^\circ < t_i \end{cases}$$
- r_m (3.0 Å) and E_m (-2.8 kcal/mole) are derived from small molecule structures
- Total energy E_hb
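A minimal sketch of the distance-dependent term E_r only, using the r_m and E_m values quoted above. The angular terms E_t and E_p are left out because the slide does not define their constants; the function name is an assumption.

```python
def stride_distance_energy(r, r_m=3.0, e_m=-2.8):
    """Distance term E_r (kcal/mole) of the empirical H-bond energy.

    r is the donor-acceptor distance in Angstroms; r_m and e_m are the
    optimal distance and energy quoted on the slide.
    """
    x = r_m / r
    return e_m * (4.0 * x ** 6 - 3.0 * x ** 8)
```

The term reaches its minimum of -2.8 kcal/mole at r = 3.0 Å and becomes less favourable on either side, turning repulsive at short distances.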
STRIDE – Empirical Hydrogen
Bond Calculation
• Uses Ehb and phi-psi torsional angle
criteria
• Torsional angles define secondary
structures according to the regions of the
Ramachandran plot in which they fall
• E is ignored if phi and psi are unfavorable
Comparison DSSP & STRIDE
DSSP vs STRIDE
• STRIDE – adds a term to the expression for
hydrogen bond energy
• STRIDE – selects terminal residues
through reliance on torsional angles
• STRIDE – stresses planarity of hydrogen
bonds while allowing longer bonds
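One simple way to quantify how far two assignment methods differ on the same structure is per-residue agreement. This sketch assumes both methods report one code per residue over the same residue range; the function name and the example strings are made up for illustration and are not real program output.

```python
def assignment_agreement(ss_a, ss_b):
    """Fraction of residues given the same code by two assignment methods."""
    if len(ss_a) != len(ss_b):
        raise ValueError("assignments must cover the same residues")
    if not ss_a:
        raise ValueError("assignments are empty")
    same = sum(a == b for a, b in zip(ss_a, ss_b))
    return same / len(ss_a)
```

For two hypothetical strings `HHHHCC` and `HHHCCC`, which differ only at one helix end, the agreement is 5/6.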
Other Methods
• DEFINE – uses distance criteria
between Calpha atoms which varies
slightly for each secondary structure type;
allows modifications for curvature
• P-Curve – analysis of protein curvature –
compares to ideal motifs – unknown motif
defined by tilt, roll etc between peptide
planes.
Comparative Notes
• The last residues of a sheet or a helix are often still in the same
conformation, although they no longer have hydrogen bonds in the
structure. This translates to the observation that ends (caps) of
regular secondary structure segments are not well defined.
• It seems that Calpha-distance criteria (applied in DEFINE) alone can
accommodate considerable distortion of the backbone, giving an
excess of secondary structure assignments despite having reduced
e considerably.
• DSSP is the only assignment scheme with a large peak for alpha-helices
of four residues, many of which constitute single helical turns.
• DEFINE assigns more than twice as many sheets of length four than
the other methods.
• P-Curve has a tendency to assign overly long elements of regular
secondary structure.
Amino Acid Propensities Indicate the Role of
Side Chains in Defining Secondary Structure
– Basis of Prediction Methods –
Note that none of the assignment methods
use this
• Alpha helices – rich in ALA, LEU; poor in
PRO and GLY
• Beta sheets – rich in VAL, ILE; poor in
GLY, ASP, PRO
• 3₁₀ helices – rich in PRO; poor in ALA, LEU
• Beta bridges – poor in VAL, ILE
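A sketch of how such propensities can be computed from a set of assigned structures, as the frequency of a residue type in a given state relative to the overall frequency of that state. Values above 1 mean "rich in", values below 1 mean "poor in". The function name and the toy input in the usage note are assumptions; real propensities need a large, non-redundant set of structures.

```python
from collections import Counter

def propensities(seq, ss, state):
    """Propensity of each residue type for one secondary structure state.

    seq and ss are equal-length strings (one-letter residues, one code each).
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and assignment lengths differ")
    n_state = ss.count(state)
    if n_state == 0:
        raise ValueError("state not present in assignment")
    state_fraction = n_state / len(ss)
    totals = Counter(seq)
    in_state = Counter(a for a, s in zip(seq, ss) if s == state)
    return {a: (in_state[a] / totals[a]) / state_fraction for a in totals}
```

With the made-up input `seq="AAAAGGGG"`, `ss="HHHHHCCC"`, the helix propensity is 1.6 for A and 0.4 for G.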
Newer Methods – DSSPcont
• Use known alignments from multiple 3D
structures or from multiple members of the
NMR ensemble (DSSPcont)
• Consensus based approach
Supersecondary Structures
http://en.wikipedia.org/wiki/Meander_(art)
http://en.wikipedia.org/wiki/Zinc_finger
Zinc Finger Motif
I-sites (Baker)
• I-sites – specific segments with common
amino acid propensities
• Used by Rosetta to predict structure –
perhaps the most successful method thus
far
• Note considers only main chain hydrogen
bonds – much of the tertiary structure is
associated with side chain interactions
Summary
• DSSP remains the first and most popular approach
• STRIDE may have been developed as part of the EMBL
….
• DSSP has been coded a number of times from the paper
often with different results – open source helps this today
• DSSP is perhaps the most accepted algorithm in all of
structural bioinformatics
• It is not always clear whether the secondary structure
assignments deposited with a structure are from DSSP
or from the authors' view
• Consistent searching requires that DSSP be used for all
structures – early structures had no author assignments