Structure and evolution of IDPs
Peter Tompa
Institute of Enzymology
Hungarian Academy of Sciences
Budapest, Hungary
Why do we want to characterize/predict IDPs?
1) Find new ones (460 in DisProt vs. tens of thousands)
2) Describe our protein
Why do we want to describe the structure of IDPs in detail?
Extend the structure-function paradigm
To characterize structure:
In the free state
In the bound state
Structural levels
Sequence (primary)
Local structure (secondary)
Global structure (tertiary)
1) Primary structure
Primary structure (sequence) of IDPs
Dunker et al. (2001) J. Mol. Graph. Model. 19, 26
Low-complexity regions in proteins
Wootton (1994) Comp. Chem. 18, 269
Low complexity: Drosophila mastermind
MDAGGLPVFQSASQAAAAAVAQQQQQQQQQQQQQQQQQQQHLNLQLHQQHLQQQQSLGIHLQQQQQLQLQQQQQHNAQAQQQ
QQLQVQQQQQQRQQQQQQQQQHSLYNANLAAAGGIVGGLVPGGNGAGGVALQQVFGGPNGNNNSNNNNNSNNNSININNGNI
SPGDGLPTKRQPILDRLRRRMENYRRRQTDCVPRYEQTFSTVCEQQNHETSALQKRFLESKNKRAAKKTEKKLPETQQQAQT
QMLAGQLQSSVHVQQKILKRPADDVDNGAENYEPPQKLPNNNNNNNNNNNNNNNSSSGVGGGSENLTKFSVEIVQQLEFTTS
AANSQPQQISTNVTVKALTNTSVKSEPGVGGGRGRHQQQQQHQQHQQQQHQQQQHQQHQQHQQQQQHQQQQHQQQQHQQQQQ
QHHHQQQQQQGGGLGGLGNNGRGGGGPGGGGHMATGPGGVGVGMGPNMMSAQQKSALGNLANLVECKREPDHDFPDLGSLAK
DGANGQFPGFPDLLGDDNSENNDTFKDLINNLHDFNPSFLDGFDEKPLLDIKTEDGIKVEPPNAQDLINSLNVKSETGLGHG
FGGFGVGLGLDPQSMKMRPGVGFQNGPNGNANAGNGGPTAGGGGGGNGPGGLMSEHSLAAQTLKQMAEQHQHKSAMGGMGGF
HVPPHGMQQQQPQQQQQAPQQQQQQHGQMMGGPGQGQQQQQQQQPRYNDYGGGFPNDFAMGPNPTQQQQQHLPPQFHQKAPG
GGPGMNVQQNFLDIKQELFYSSPNDFDLKHLQQQQAMQQQQQQQQQQQQQQQHHAQQQQQHPNGPNMGVPMGGAGNFAKQQQ
QQVPTPQQQQQQQLQQQQQQYSPFSNQNANANFLNCPPRGGPQGNQAPGNMPQQQQQQPQQQQQPPRGPQSNPNAVPGGNAA
NATQQQQQQQQQQQQQQQQQQQQQQQATTTTLQMKQTQQLHISQQGGGSHGIQVSAGQHLHLSSDMKSNVSVAAQQGVFFSQ
QQAAQQQQQQQQQPGNAGPNPQQQQQQPHGGNAGANGGGPNGPQQQQPNQNMNNSNVPSDGFSLSQSQSMNFTQQQQQQAAA
AAAAAAAAQQQQAAAAQQQQQQVPPNMRQRQTQAQAAAAAAAAAAAQAQAAANANGGPGGNVPLMQQQQQTPGGVPVGAGSG
NASVGVPVSAGGPNNGAMNQLGGPMGGMPGMQMGGPGGVPINPMQMNPNGGAPNAQMMMGGNGGGPVPAASQAKFLQQQQIM
RAQAMQHQQQVQQHMAGARPPPPEYNATKAQLMQAQMMQQTVGGGGGGGVGVGVGVGGGVGGGGGAGRFPNSAAQAAAMRRM
TQQPIPPSGPMMRPQHAAMYMQQHGGAGGGPRGGMGGPYGGGGVGGAGGPMGGGGGGQQQQQRPPNVQVTPDGMPMGSQQEW
RHMMMTQQQQQMGFGPGGPMRQGPGGFNGGNFMPNGAPNAPGNGPNGGGGGGMMPGPNGPQMQLTPAQMQQQHMRQQQQQQH
MGPGGGGGGGGGNMQMQQLLQQQQNAAAGGGGGMMATQMQMTSIHMSQTQQQQQLTMQQQQFVQSTSTTTTHQQQQQLQLQM
QSQSGGPGGNGPSNNNGANQAGGVGVGVGVGVGVGVVGSSATIASASSISQTINSVVANSNDLCLEFLDNLPDGNFSTQDLI
NSLDNDNFNIQDILQ
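A minimal sketch of how low-complexity stretches such as the Q, G and N runs above could be flagged. It uses sliding-window Shannon entropy as a simple stand-in and is not the procedure of Wootton (1994). The function names, the window of 12 residues and the 2.2-bit threshold are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(seq, window=12):
    """Return the Shannon entropy (in bits) of every window of `window` residues."""
    if window <= 0:
        raise ValueError("window must be positive")
    values = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        # Entropy is 0 for a homopolymer and log2(window) if all residues differ.
        h = -sum((n / window) * math.log2(n / window) for n in counts.values())
        values.append(h)
    return values


def low_complexity_fraction(seq, window=12, threshold=2.2):
    """Fraction of windows whose entropy is below `threshold` bits (assumed cutoff)."""
    values = window_entropy(seq, window)
    if not values:
        return 0.0
    return sum(h < threshold for h in values) / len(values)
```

Applied to the mastermind sequence (with line breaks removed), a high fraction would indicate that most windows are dominated by one or two residue types.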
2) Secondary structure
Structure in the free state (3 examples)
CREB-KID - CBP-KIX binding and NMR
Radhakrishnan et al. (1998) FEBS Lett. 430, 317
FlgM: evidence for disorder in vivo
Plaxco and Gross (1997) Nature, 386, 657
FlgM - sigma 28 binding and NMR
Sorenson et al. (2004) Mol. Cell 14, 127
p27 – CycA/Cdk2 binding (NMR, MD)
Sivakolundu et al. (2005) JMB 353, 1118
And a fourth: polyproline II helix
[Figure: SH3-PPII; image source: Wikipedia]
PPII helix conformation is common in IDPs
PPII dominates in:
α-casein
α-synuclein
tau
wheat gluten
Raman optical activity (ROA)
Syme et al. (2002) EJB 269, 148
2) Secondary structure
Structure in the bound state
Complexes of IDPs in PDB
IUP                    SP code       Length  Partner                       Method
Bob-1                  Q16633        22      Oct-1 POU/DNA                 X-ray
CREB                   P16220        28      CBP KIX                       NMR
DFF 45                 O00273        89      DFF 40                        NMR
E-cadherin             P09803        57      β-catenin                     X-ray
FCP1                   AAC64549.1*   21      TFIIF/RAP74                   NMR
FnBPA                  Q53971        24      Fibronectin                   NMR
IA3                    P01094        29      Proteinase A                  X-ray
Killer toxin           P19972        77      Killer toxin α chain          X-ray
MAP tau                P10636        13      Pin1 WW                       NMR
MAX                    P25912        86      DNA                           X-ray
p27Kip1                P46527        69      CycA-Cdk2                     X-ray
p53                    P04637        11      MDM 2                         X-ray
Phe-tRNA synthetase α  P27001        79      Phe-tRNA synthetase β + tRNA  X-ray
PKI                    P04541        20      PKA                           X-ray
RB3                    Q9H169        91      tubulin                       X-ray
RNA pol II             P04050        17      mRNA capping enzyme           X-ray
SNAP 25                P13795-2      77      neuronal fusion complex       X-ray
SV 40 virus coat       P03087        66      assembled coat                X-ray
TAFII230               P51123        67      TBP                           NMR
TBS virus coat         P11795        -       assembled coat                X-ray
Tcf3                   CAA67686*     41      β-catenin                     X-ray
Tcf4                   Q9NQB0        24      β-catenin                     X-ray
Troponin I             P19429        17      Troponin C                    NMR
Vitamin D3R            P11473        89      DNA                           X-ray
Additional labels on the slide (row assignment not recoverable): Cdk2, CycA, FnBP, fibronectin, Asp prot., 34
Secondary structural elements
[Figure: four panels (Helix, Turn, Coil, Extended) comparing globular proteins and IDPs; x-axis: % of secondary structure (0-100), y-axis: % of proteins (0-100). Percentages printed in the panels: 21.9 %, 10.9 %, 31.3 %, 44.8 %.]
Comparison of free and bound states: what does it tell us?
Local secondary structural elements in IDPs: molecular recognition
1) disorder pattern: molecular recognition element (MoRE, MoRF)
2) consensus sequence: linear motif (LM, ELM, SLiM)
3) local predictable structure: preformed structural element (PSE)
1) Disorder pattern: MoRE in tumor suppressor p53
Uversky et al. (2005) J. Mol. Recogn. 18, 343
2) Consensus sequences: ELMs
ELMs and local disorder
Fuxreiter et al. (2007) Bioinformatics 23, 950
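Linear motifs are short consensus patterns, so a scan for one can be sketched with a regular expression. The helper below is a hypothetical illustration, not the ELM server's method. The example pattern `P..P` (a PxxP-type motif of the kind bound by SH3 domains) is an assumption chosen only for demonstration.

```python
import re


def find_motif(seq, pattern):
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    # The zero-width lookahead lets consecutive matches overlap.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]


# Illustrative use on a made-up peptide:
hits = find_motif("AAPSGPAA", "P..P")
```

A match found this way is only a candidate site; whether the motif is functional depends on context, such as the local disorder discussed on this slide.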
3) Predictability of structure: preformed structural elements, PSEs
[Table repeated from the slide "Complexes of IDPs in PDB": IUP, SP code, length, partner, method]
PSE: predictability of secondary structure
[Figure: prediction accuracy (%, axis 0-80) by the Q3 and SOV measures]
Fuxreiter et al. (2004) JMB 338, 1015
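For orientation, Q3 is the percentage of residues whose three-state secondary structure (helix, strand, coil) is predicted correctly. The sketch below assumes both inputs are equal-length strings over the letters H, E and C; the function name is hypothetical. SOV, which scores segment overlap, is not implemented here.

```python
def q3(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```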
MoRE, LM, PSE: devices of effective recognition
Sequential mechanism of p27 binding
Lacy et al (2004) NSMB 11, 358
3) Tertiary structure
Structural ensemble of α-synuclein (NMR paramagnetic relaxation enhancement)
Dedmon et al. (2005) JACS 127, 476
SAXS distance-distribution function and topology of cellulase E
Von Ossowski et al. (2005) Biophys. J. 88, 2823
Global (tertiary) structure of IUPs
[Figure: hydrodynamic volume (Å³, logarithmic axis) against number of residues (logarithmic axis); curves labeled Native, MG, PMG, U (RC), IUP (PMG) and IUP (RC)]
Uversky (2002) Prot. Sci. 11, 739
A lesson from denatured states of globular proteins: spatial topology in the denatured state resembles the native structure (David Shortle)
p27
Gillespie et al (1997) JMB 268, 170
Models
Protein trinity (Dunker): ordered, molten globule, random coil
Protein quartet (Uversky): ordered, MG, PMG, RC
The evolution of protein disorder
Generation
Evolution
Disorder in complete genomes (PONDR)
Dunker et al. (2000) Genome Inf. 11, 161
Disorder in complete genomes (DISOPRED)
Ward et al. (2004) JMB 337, 635
coli
yeast
IDPs: high frequency in proteomes
Tompa et al. (2006) J. Prot. Res 5, 1996
Structural disorder: evolutionary success story
[Figure: percentage of proteins with a long disordered region (LDR, longer than 40 residues; axis 0-60) by domain of life: B, A, E]
Vucetic et al. (2002) Proteins 52, 573
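The quantity on the vertical axis, the percentage of proteins containing an LDR, can be sketched as follows. The input is assumed to be per-residue disorder calls (True = disordered) from any predictor. The function names and the data layout are hypothetical, and the cutoff follows the "40<" label on the slide.

```python
def longest_disordered_run(flags):
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for is_disordered in flags:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest


def percent_with_ldr(proteome, threshold=40):
    """Percentage of proteins whose longest disordered run exceeds `threshold`.

    `proteome` maps a protein name to its list of per-residue disorder calls.
    """
    if not proteome:
        return 0.0
    hits = sum(longest_disordered_run(flags) > threshold
               for flags in proteome.values())
    return 100.0 * hits / len(proteome)
```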
The evolution of protein disorder
Generation: de novo generation, gene duplication, lateral gene transfer (LGT)
Evolution
The evolution of protein disorder
Generation: de novo generation, gene duplication, lateral gene transfer (LGT)
Evolution: mutations (point mutation)
Rapid evolution by point mutations
[Figure: number of families (axis 0-20) in which the evolutionary variability of the IUP region is larger than, the same as, or smaller than that of the globular region]
Brown et al. (2002) J. Mol. Evol. 55, 104
Non-synonymous vs. synonymous substitutions
Point mutations: synonymous (Ks), non-synonymous (Ka), nonsense
Evolution (Ka/Ks):
0.1-0.2: "functional"
1.0: "neutral"
> 1.0: "adaptive"
Rapid evolution of SRY gene
SRY: sex-determining region on the Y chromosome (testis-determining factor)
The evolution of protein disorder
Generation: de novo generation, gene duplication, lateral gene transfer (LGT)
Evolution: mutations (point mutation, repeat expansion)
RNA polymerase II
RNAP II CTD: coordination of 5' capping, splicing and 3' polyadenylation of mRNA
[Figure labels: CTDK, TFs; initiation, elongation, termination]
Yeast RNAP II CTD
IGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGV
TPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMA
GGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSP
GFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSP
AYSPKQDEQKHNENENSR
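The heptad repeats in the sequence above can be counted with a regular expression. The sketch assumes a relaxed pattern, YSPxSPx, so that variants such as YSPTSPA and YSPTSPN are also counted; that relaxation and the function names are illustrative choices.

```python
import re

# Relaxed heptad pattern (assumption): positions 4 and 7 may be any residue.
HEPTAD = re.compile(r"YSP[A-Z]SP[A-Z]")


def count_heptads(seq, pattern=HEPTAD):
    """Count non-overlapping matches of the heptad pattern."""
    return len(pattern.findall(seq))


def longest_tandem_array(seq, pattern=HEPTAD):
    """Number of repeats in the longest uninterrupted, back-to-back run."""
    best = run = 0
    prev_end = None
    for match in pattern.finditer(seq):
        # A repeat extends the current run only if it starts where the last one ended.
        run = run + 1 if match.start() == prev_end else 1
        best = max(best, run)
        prev_end = match.end()
    return best
```

To use it on the slide's sequence, join the lines into one string first.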
RNAP II CTD evolution
[Figure: CTD repeat number (axis 10-60) against time (axis -2.5 to 0.0 GYr); repeat labeled -SPSYSPT-]
Repeats in IUPs and other datasets
[Figure: frequency (%, axis 0-40) of repeats, counted by proteins and by residues, in four protein datasets: Swiss-Prot, Yeast, Human, IUP]
Tompa (2003) BioEssays 25, 847
Functional microsatellites (short repeats) in IDPs
Protein (repeat region)  Repeat sequence       Repetition  Function                                               Type
Calreticulin             E/D2-8K/R1-3          6           weak, large-capacity calcium binding                   I
Cdk p57                  AP                    43          linker between domains                                 I
RS protein SC-35         (K)RS                 50          mRNA splicing                                          I
mastermind               G1-7V/A               7           linker/spacer                                          I
TF GAL11                 Q                     23          assembly of transcription preinitiation complex        I
CPEB                     Q1-13R/I/L/S          15          regulation of mRNA translation                         I
Sup35p                   Q2X2NN/Y              14          nonsense mutation suppression                          I
Sry                      (QQK)0,1Q2-13FHDH1-5  1-19        transactivator domain of sex-determining factor        I
SFRS6_HUMAN Splicing factor
MPRVYIGRLSYNVREKDIQRFFSGYGRLLEVDLKN
GYGFVEFEDSRDADDAVYELNGKELCGERVIVEHA
RGPRRDRDGYSYGSRSGGGGYSSRRTSGRDKYGPP
VRTEYRLIVENLSSRCSWQDLKDFMRQAGEVTYAD
AHKERTNEGVIEFRSYSDMKRALDKLDGTEINGRN
IRLIEDKPRTSHRRSYSGSRSRSRSRRRSRSRSRR
SSRSRSRSISKSRSRSRSRSKGRSRSRSKGRKSRS
KSKSKPKSDRGSHSHSRSRSKDEYEKSRSRSRSRS
PKENGKGDIKSKSRSRSQSRSNSPLPVPPSKARSV
SPPPKRATSRSRSRSRSKSRSRSRSSSRD
Mouse SRY (testis determining factor)
MEGHVKRPMNAFMVWSRGERHKLAQQNPSMQNTEI
SKQLGCRWKSLTEAEKRPFFQEAQRLKILHREKYP
NYKYQPHRRAKVSQRSGILQPAVASTKLYNLLQWD
RNPHAITYRQDWSRAAHLYSKNQQSFYWQPVDIPT
GHLQQQQQQQQQQQFHNHHQQQQQFYDHHQQQQQQ
QQQQQQFHDHHQQKQQFHDHHQQQQQFHDHHHHHQ
EQQFHDHHQQQQQFHDHQQQQQQQQQQQFHDHHQQ
KQQFHDHHHHQQQQQFHDHQQQQQQFHDHQQQQHQ
FHDHPQQKQQFHDHPQQQQQFHDHHHQQQQKQQFH
DHHQQKQQFHDHHQQKQQFHDHHQQQQQFHDHHQQ
QQQQQQQQQQQFHDQQLTYLLTADITGEHTYQEHL
STALWLAVS
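Single-residue repeats like the glutamine runs in this sequence can be listed with a short run-length scan. The function name and the minimum run length of 5 are assumptions made for this sketch.

```python
from itertools import groupby


def homopolymer_runs(seq, min_length=5):
    """Return (residue, 1-based start, length) for each run of at least `min_length`."""
    runs = []
    position = 0
    for residue, group in groupby(seq):
        length = len(list(group))
        if length >= min_length:
            runs.append((residue, position + 1, length))
        position += length
    return runs
```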
Functional minisatellites (long repeats) in IDPs
Protein (repeat region)                Repeat sequence                            Repetition  Function                                                      Type
fibronectin-binding protein A (Du-D4)  EDT/SX9,10GGX3,4I/VDF                      2-5         fibronectin binding                                           I
involucrin (Q-region)                  QEGQLK/EH/LL/PEQ                           24-63       transglutaminase cross-linking to form keratinocyte envelope  I
neurofilament-H (KSP domain)           XKSPY1-3K                                  42-55       entropic sidearm of neurofilaments                            I
prion protein (octarepeats)            PQ/HGGGWGQ                                 3-14        copper binding                                                III
RNA polymerase II (CTD)                YSPTSPS                                    11-52       coordination of transcription and mRNA processing             II
salivary PRPs                          PPPGKPQGPPPQGGNKPQGPP                      6-33        binding of polyphenolic plant compounds (tannins)             I
tau protein                            VQ/K/TSKI/CGSL/T/KD/E/GNI/LK/H/THV/KQPGGG  3-5         microtubule-binding, polymerization                           I
titin (PEVK)                           PEV/APKEVVPEKKA/VPVAPPKKPEV/APPVKV         5-60        providing entropic elasticity during sarcomere stretch        I
INVO_HUMAN Involucrin
MSQQHTLPVTLSPALSQELLKTVPPPVNTHQEQMK
QPTPLPPPCQKVPVELPVEVPSKQEEKHMTAVKGL
PEQECEQQQKEPQEQELQQQHWEQHEEYQKAENPE
QQLKQEKTQRDQQLNKQLEEEKKLLDQQLDQELVK
RDEQLGMKKEQLLELPEQQEGHLKHLEQQEGQLKH
PEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQE
GQLELPEQQEGQLELPQQQEGQLELSEQQEGQLEL
SEQQEGQLELSEQQEGQLKHLEHQEGQLEVPEEQM
GQLKYLEQQEGQLKHLDQQEQEGQLEQLEEQEGQL
KHLEQQEGQLEHLEHQEGQLGLPEQQVLQLKQLEK
QQGQPKHLEEEEGQLKHLVQQEGQLKHLVQQEGQL
EQQERQVEHLEQQVGQLKHLEEQEGQLKHLEQQQG
QLEVPEQQVGQPKNLEQEEKQLELPEQQEGQVKHL
EKQEAQLELPEQQVGQPKHLEQQEKHLEHPEQQDG
QLKHLEQQEGQLKDLEQQKGQLEQPVFAPAPGQVQ
DIQPALPTKGEVLLPVEHQQQKQEVQWPPKHK
PRIO_HUMAN major prion protein
................SDLGLCKKRPKPGGWNTGG
SRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHG
GGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKP
KTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIH
FGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNF
VHDCVNITIKQHTVTTTTKGENFTETDVKMMERVV
EQMCITQYERESQAYYQRGSSMVLFSSPPVILLIS
FLIFLIVG
IUPs often evolve by repeat expansion
Basic mechanisms of repeat expansion
Meiotic: replication slippage (micro)
Mitotic: unequal crossing over (mini)
Replication slippage
Wells (1996) JBC 271, 2875
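A toy illustration of how slippage could change repeat number over time: in each generation the tract gains or loses one repeat unit with a small probability. This stepwise model is a simplification assumed for the sketch and is not taken from the cited paper. The function name, the rate, the expansion bias and the seed are all hypothetical parameters.

```python
import random


def simulate_slippage(start_repeats, generations, rate=0.01, expand_bias=0.5, seed=0):
    """Return the repeat number after each generation (including the start).

    rate        -- assumed probability of a slippage event per generation
    expand_bias -- assumed probability that an event adds rather than removes a unit
    """
    rng = random.Random(seed)
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < rate:
            step = 1 if rng.random() < expand_bias else -1
            repeats = max(1, repeats + step)  # keep at least one repeat unit
        history.append(repeats)
    return history
```

With `expand_bias` above 0.5 the tract tends to grow, which is one simple way to picture repeat expansion.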
(Unequal) crossing over
Morgan 1916
Evolution of repetitive regions in IUPs
Type I
Type II
Type III
Tompa (2003) BioEssays 25, 847