Structure and evolution of IDPs
Peter Tompa
Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary

Why do we want to characterize/predict IDPs?
1) Find new ones (460 in DisProt vs. tens of thousands)
2) Describe our protein

Why do we want to describe the structure of IDPs in detail?
Extend the structure-function paradigm

To characterize…
Structure
  In the free state
  In the bound state

Structural levels
  Sequence (primary)
  Structure
    Local (secondary)
    Global (tertiary)

1) Primary structure

Primary structure (sequence) of IDPs
Dunker et al. (2001) J. Mol. Graph. Model. 19, 26

Low-complexity regions in proteins
Wootton (1994) Comp. Chem. 18, 269

Low complexity: Drosophila mastermind
MDAGGLPVFQSASQAAAAAVAQQQQQQQQQQQQQQQQQQQHLNLQLHQQHLQQQQSLGIHLQQQQQLQLQQQQQHNAQAQQQ
QQLQVQQQQQQRQQQQQQQQQHSLYNANLAAAGGIVGGLVPGGNGAGGVALQQVFGGPNGNNNSNNNNNSNNNSININNGNI
SPGDGLPTKRQPILDRLRRRMENYRRRQTDCVPRYEQTFSTVCEQQNHETSALQKRFLESKNKRAAKKTEKKLPETQQQAQT
QMLAGQLQSSVHVQQKILKRPADDVDNGAENYEPPQKLPNNNNNNNNNNNNNNNSSSGVGGGSENLTKFSVEIVQQLEFTTS
AANSQPQQISTNVTVKALTNTSVKSEPGVGGGRGRHQQQQQHQQHQQQQHQQQQHQQHQQHQQQQQHQQQQHQQQQHQQQQQ
QHHHQQQQQQGGGLGGLGNNGRGGGGPGGGGHMATGPGGVGVGMGPNMMSAQQKSALGNLANLVECKREPDHDFPDLGSLAK
DGANGQFPGFPDLLGDDNSENNDTFKDLINNLHDFNPSFLDGFDEKPLLDIKTEDGIKVEPPNAQDLINSLNVKSETGLGHG
FGGFGVGLGLDPQSMKMRPGVGFQNGPNGNANAGNGGPTAGGGGGGNGPGGLMSEHSLAAQTLKQMAEQHQHKSAMGGMGGF
HVPPHGMQQQQPQQQQQAPQQQQQQHGQMMGGPGQGQQQQQQQQPRYNDYGGGFPNDFAMGPNPTQQQQQHLPPQFHQKAPG
GGPGMNVQQNFLDIKQELFYSSPNDFDLKHLQQQQAMQQQQQQQQQQQQQQQHHAQQQQQHPNGPNMGVPMGGAGNFAKQQQ
QQVPTPQQQQQQQLQQQQQQYSPFSNQNANANFLNCPPRGGPQGNQAPGNMPQQQQQQPQQQQQPPRGPQSNPNAVPGGNAA
NATQQQQQQQQQQQQQQQQQQQQQQQATTTTLQMKQTQQLHISQQGGGSHGIQVSAGQHLHLSSDMKSNVSVAAQQGVFFSQ
QQAAQQQQQQQQQPGNAGPNPQQQQQQPHGGNAGANGGGPNGPQQQQPNQNMNNSNVPSDGFSLSQSQSMNFTQQQQQQAAA
AAAAAAAAQQQQAAAAQQQQQQVPPNMRQRQTQAQAAAAAAAAAAAQAQAAANANGGPGGNVPLMQQQQQTPGGVPVGAGSG
NASVGVPVSAGGPNNGAMNQLGGPMGGMPGMQMGGPGGVPINPMQMNPNGGAPNAQMMMGGNGGGPVPAASQAKFLQQQQIM
RAQAMQHQQQVQQHMAGARPPPPEYNATKAQLMQAQMMQQTVGGGGGGGVGVGVGVGGGVGGGGGAGRFPNSAAQAAAMRRM
TQQPIPPSGPMMRPQHAAMYMQQHGGAGGGPRGGMGGPYGGGGVGGAGGPMGGGGGGQQQQQRPPNVQVTPDGMPMGSQQEW
RHMMMTQQQQQMGFGPGGPMRQGPGGFNGGNFMPNGAPNAPGNGPNGGGGGGMMPGPNGPQMQLTPAQMQQQHMRQQQQQQH
MGPGGGGGGGGGNMQMQQLLQQQQNAAAGGGGGMMATQMQMTSIHMSQTQQQQQLTMQQQQFVQSTSTTTTHQQQQQLQLQM
QSQSGGPGGNGPSNNNGANQAGGVGVGVGVGVGVGVVGSSATIASASSISQTINSVVANSNDLCLEFLDNLPDGNFSTQDLI
NSLDNDNFNIQDILQ

2) Secondary structure
Structure in the free state (3 examples)

CREB-KID - CBP-KIX binding and NMR
Radhakrishnan et al. (1998) FEBS Lett. 430, 317

FlgM: evidence for disorder in vivo
Plaxco and Gross (1997) Nature, 386, 657

FlgM - sigma 28 binding and NMR
Sorenson (2004) Mol. Cell 14, 127

p27 – CycA/Cdk2 binding (NMR, MD)
Sivakolundu et al. (2005) JMB 353, 1118

And a fourth: polyproline II helix
SH3-PPII (Wikipedia)

PPII helix conformation is common in IDPs
PPII dominates in: α-casein, α-synuclein, tau, wheat gluten
Raman optical activity (ROA)
Syme et al. (2002) EJB 269, 148

2) Secondary structure
Structure in the bound state

Complexes of IDPs in PDB
IUP | SP code | Length | Partner | Method
CREB | P16220 | 28 | CBP KIX | NMR
DFF45 | O00273 | 89 | DFF40 | NMR
E-cadherin | P09803 | 57 | β-catenin | X-ray
FCP1 | AAC64549.1* | 21 | TFIIF/RAP74 | NMR
FnBPA | Q53971 | 24 | Fibronectin | NMR
IA3 | P01094 | 29 | Proteinase A | X-ray
Killer toxin | P19972 | 77 | Killer toxin α chain | X-ray
MAP tau | P10636 | 13 | Pin1 WW | NMR
MAX | P25912 | 86 | DNA | X-ray
Bob-1 | Q16633 | 22 | Oct-1 POU/DNA | X-ray
p27Kip1 | P46527 | 69 | CycA-Cdk2 | X-ray
p53 | P04637 | 11 | MDM2 | X-ray
Phe-tRNA synthetase α | P27001 | 79 | Phe-tRNA synthetase β + tRNA | X-ray
PKI | P04541 | 20 | PKA | X-ray
RB3 | Q9H169 | 91 | tubulin | X-ray
RNA pol II | P04050 | 17 | mRNA capping enzyme | X-ray
SNAP-25 | P13795-2 | 77 | neuronal fusion complex | X-ray
SV40 virus coat | P03087 | 66 | assembled coat | X-ray
TAFII230 | P51123 | 67 | TBP | NMR
TBS virus coat | P11795 | 34 | assembled coat | X-ray
Tcf3 | CAA67686* | 41 | β-catenin | X-ray
Tcf4 | Q9NQB0 | 24 | β-catenin | X-ray
Troponin I | P19429 | 17 | Troponin C | NMR
Vitamin D3R | P11473 | 89 | DNA | X-ray
Labels overlaid on the slide: p27Kip1, Cdk2, CycA; IA3, Asp prot.; Tcf3, β-catenin; FnBP, fibronectin

Secondary structural elements
[Figure: distribution of secondary structure content in globular proteins vs. IDPs. Panels: Helix, Turn, Coil, Extended. x-axis: % of secondary structure; y-axis: % of proteins. Values marked on the plots: 21.9 %, 10.9 %, 31.3 %, 44.8 %]

Comparison of free and bound states: what does it tell us?

Local secondary structural elements in IDPs: molecular recognition
1) disorder pattern: molecular recognition element (MoRE, MoRF)
2) consensus sequence: linear motif (LM, ELM, SLiM)
3) local predictable structure: preformed structural element (PSE)

1) Disorder pattern: MoRE in tumor suppressor p53
Uversky et al. (2005) J. Mol. Recogn. 18, 343

2) Consensus sequences: ELMs

ELMs and local disorder
Fuxreiter et al (2006) Bioinformatics, 23, 950

3) Predictability of structure: preformed structural elements, PSEs
(Same table of IDP complexes as under "Complexes of IDPs in PDB" above.)

PSE: predictability of secondary structure
[Figure: prediction accuracy (%), measured by Q3 and SOV]
Fuxreiter et al. (2004) JMB 338, 1015

MoRE, LM, PSE: devices of effective recognition
[Figure labels: MoRE, PSE]

Sequential mechanism of p27 binding
Lacy et al (2004) NSMB 11, 358

3) Tertiary structure

Structural ensemble of α-synuclein (NMR paramagnetic relaxation enhancement)
Dedmon et al. (2005) JACS 127, 476

SAXS distance-distribution function and topology of cellulase E
Von Ossowski et al. (2005) Biophys. J. 88, 2823

Global (tertiary) structure of IUPs
[Figure: hydrodynamic volume (Å³) vs. number of residues, logarithmic axes. Curves: Native, MG, PMG, U (RC); IUPs fall into two groups, IUP(PMG) and IUP(RC)]
Uversky (2002) Prot. Sci. 11, 739

A lesson from denatured states of globular proteins: spatial topology in denatured state resembles native structure (David Shortle)
p27
Gillespie et al (1997) JMB 268, 170

Models
Protein trinity: ordered, molten globule, random coil (Dunker)
Protein quartet: ordered, MG, PMG, RC (Uversky)

The evolution of protein disorder
  Generation
  Evolution

Disorder in complete genomes (PONDR)
Dunker et al. (2000) Genome Inf. 11, 161

Disorder in complete genomes (DISOPRED)
Ward et al. (2004) JMB 337, 635

IDPs: high frequency in proteomes
[Figure labels: coli, yeast]
Tompa et al. (2006) J. Prot. Res 5, 1996

Structural disorder: evolutionary success story
[Figure: % of proteins with a long disordered region (LDR, longer than 40 residues) by domain of life: B, A, E]
Vucetic et al. (2002) Proteins 52, 573

The evolution of protein disorder
  Generation
    de novo generation
    gene duplication
    lateral gene transfer, LGT
  Evolution

The evolution of protein disorder
  Generation
    de novo generation
    gene duplication
    lateral gene transfer, LGT
  Evolution
    Mutations
      Point mutation

Rapid evolution by point mutations
[Figure: number of families in which the evolutionary variability of the IUP region is larger than, the same as, or smaller than that of the globular region]
Brown et al. (2002) J. Mol. Evol. 55, 104

Non-synonymous vs. synonymous substitutions
Point mutations
  Synonymous (Ks)
  Non-synonymous (Ka)
  Nonsense
Evolution (Ka/Ks):
  0.1-0.2: "functional"
  ~1.0: "neutral"
  >1.0: "adaptive"

Rapid evolution of SRY gene
SRY: sex determining region on the Y chromosome (testis determining factor)

The evolution of protein disorder
  Generation
    de novo generation
    gene duplication
    lateral gene transfer, LGT
  Evolution
    Mutations
      Point mutation
      Repeat expansion

RNA polymerase II
RNAP II CTD: coordination of 5' capping, splicing, 3' polyadenylation of mRNA
[Figure labels: CTDK, TFs; Initiation, Elongation, Termination]

Yeast RNAP II CTD
IGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGV
TPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMA
GGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSP
GFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSP
AYSPKQDEQKHNENENSR

RNAP II CTD evolution
[Figure: repeat number (0 to 60) of the -SPSYSPT- heptad vs. time (GYr, -2.5 to 0.0)]

Repeats in IUPs and other datasets
[Figure: frequency (%) of repeats, counted by proteins and by residues, in the Swiss-Prot, Yeast, Human and IUP protein datasets]
Tompa (2003) BioEssays 25, 847

Functional microsatellites (short repeats) in IDPs
(Digits after a residue appear to be subscript repeat counts flattened by extraction.)
Protein (repeat region) | Repeat sequence | Repetition | Function | Type
Calreticulin | E/D2-8K/R1-3 | 6 | weak, large-capacity calcium binding | I
Cdk p57 | AP | 43 | linker between domains | I
RS protein SC-35 | (K)RS | 50 | mRNA splicing | I
mastermind | G1-7V/A | 7 | linker/spacer | I
TF GAL11 | Q | 23 | assembly of transcription preinitiation complex | I
CPEB | Q1-13R/I/L/S | 15 | regulation of mRNA translation | I
Sup35p | Q2X2NN/Y | 14 | nonsense mutation suppression | I
Sry | (QQK)0,1Q2-13FHDH1-5 | 1–19 | transactivator domain of sex-determining factor | I

SFRS6_HUMAN Splicing factor
MPRVYIGRLSYNVREKDIQRFFSGYGRLLEVDLKN
GYGFVEFEDSRDADDAVYELNGKELCGERVIVEHA
RGPRRDRDGYSYGSRSGGGGYSSRRTSGRDKYGPP
VRTEYRLIVENLSSRCSWQDLKDFMRQAGEVTYAD
AHKERTNEGVIEFRSYSDMKRALDKLDGTEINGRN
IRLIEDKPRTSHRRSYSGSRSRSRSRRRSRSRSRR
SSRSRSRSISKSRSRSRSRSKGRSRSRSKGRKSRS
KSKSKPKSDRGSHSHSRSRSKDEYEKSRSRSRSRS
PKENGKGDIKSKSRSRSQSRSNSPLPVPPSKARSV
SPPPKRATSRSRSRSRSKSRSRSRSSSRD

Mouse SRY (testis determining factor)
MEGHVKRPMNAFMVWSRGERHKLAQQNPSMQNTEI
SKQLGCRWKSLTEAEKRPFFQEAQRLKILHREKYP
NYKYQPHRRAKVSQRSGILQPAVASTKLYNLLQWD
RNPHAITYRQDWSRAAHLYSKNQQSFYWQPVDIPT
GHLQQQQQQQQQQQFHNHHQQQQQFYDHHQQQQQQ
QQQQQQFHDHHQQKQQFHDHHQQQQQFHDHHHHHQ
EQQFHDHHQQQQQFHDHQQQQQQQQQQQFHDHHQQ
KQQFHDHHHHQQQQQFHDHQQQQQQFHDHQQQQHQ
FHDHPQQKQQFHDHPQQQQQFHDHHHQQQQKQQFH
DHHQQKQQFHDHHQQKQQFHDHHQQQQQFHDHHQQ
QQQQQQQQQQQFHDQQLTYLLTADITGEHTYQEHL
STALWLAVS

Functional minisatellites (long repeats) in IDPs
(Digits after a residue appear to be subscript repeat counts flattened by extraction.)
Protein (repeat region) | Repeat sequence | Repetition | Function | Type
fibronectin-binding protein A (Du-D4) | EDT/SX9,10GGX3,4I/VDF | 2–5 | fibronectin binding | I
involucrin (Q-region) | QEGQLK/EH/LL/PEQ | 24–63 | transglutaminase cross-linking to form keratinocyte envelope | I
neurofilament-H (KSP domain) | XKSPY1-3K | 42–55 | entropic sidearm of neurofilaments | I
prion protein (octarepeats) | PQ/HGGGWGQ | 3–14 | copper binding | III
RNA polymerase II (CTD) | YSPTSPS | 11–52 | coordination of transcription and mRNA processing | II
salivary PRPs | PPPGKPQGPPPQGGNKPQGPP | 6–33 | binding of polyphenolic plant compounds (tannins) | I
tau protein | VQ/K/TSKI/CGSL/T/KD/E/GNI/LK/H/THV/KQPGGG | 3–5 | microtubule-binding, polymerization | I
titin (PEVK) | PEV/APKEVVPEKKA/VPVAPPKKPEV/APPVKV | 5–60 | providing entropic elasticity during sarcomere stretch | I

INVO_HUMAN Involucrin
MSQQHTLPVTLSPALSQELLKTVPPPVNTHQEQMK
QPTPLPPPCQKVPVELPVEVPSKQEEKHMTAVKGL
PEQECEQQQKEPQEQELQQQHWEQHEEYQKAENPE
QQLKQEKTQRDQQLNKQLEEEKKLLDQQLDQELVK
RDEQLGMKKEQLLELPEQQEGHLKHLEQQEGQLKH
PEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQE
GQLELPEQQEGQLELPQQQEGQLELSEQQEGQLEL
SEQQEGQLELSEQQEGQLKHLEHQEGQLEVPEEQM
GQLKYLEQQEGQLKHLDQQEQEGQLEQLEEQEGQL
KHLEQQEGQLEHLEHQEGQLGLPEQQVLQLKQLEK
QQGQPKHLEEEEGQLKHLVQQEGQLKHLVQQEGQL
EQQERQVEHLEQQVGQLKHLEEQEGQLKHLEQQQG
QLEVPEQQVGQPKNLEQEEKQLELPEQQEGQVKHL
EKQEAQLELPEQQVGQPKHLEQQEKHLEHPEQQDG
QLKHLEQQEGQLKDLEQQKGQLEQPVFAPAPGQVQ
DIQPALPTKGEVLLPVEHQQQKQEVQWPPKHK
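
The consensus patterns in the minisatellite table can be checked against a sequence with a regular expression. The following is a minimal sketch, not a method from the talk. It assumes that "Q/H" means either residue at that position and that "X" means any residue. The helper names (consensus_tokens, count_repeats, longest_tandem_run) are hypothetical, and subscript repeat counts such as "Y1-3" are not handled.

```python
import re


def consensus_tokens(consensus):
    """Split a slash-notation consensus (e.g. 'PQ/HGGGWGQ') into one regex token per position."""
    tokens = []
    i = 0
    while i < len(consensus):
        choices = consensus[i]
        i += 1
        # Collect alternatives written as 'Q/H' (either residue at this position).
        while i + 1 < len(consensus) and consensus[i] == "/":
            choices += consensus[i + 1]
            i += 2
        if choices == "X":
            tokens.append(".")  # assumption: X stands for any residue
        elif len(choices) == 1:
            tokens.append(re.escape(choices))
        else:
            tokens.append("[" + choices + "]")
    if not tokens:
        raise ValueError("empty consensus")
    return tokens


def count_repeats(sequence, consensus):
    """Count non-overlapping matches of the consensus in the sequence."""
    pattern = "".join(consensus_tokens(consensus))
    return len(re.findall(pattern, sequence))


def longest_tandem_run(sequence, consensus):
    """Return the largest number of back-to-back copies of the consensus."""
    tokens = consensus_tokens(consensus)
    unit = "".join(tokens)
    runs = re.finditer("(?:" + unit + ")+", sequence)
    return max((len(m.group(0)) // len(tokens) for m in runs), default=0)


if __name__ == "__main__":
    # One 35-residue row of the yeast RNAP II CTD shown earlier in the slides.
    ctd_fragment = "SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP"
    print("".join(consensus_tokens("PQ/HGGGWGQ")))       # P[QH]GGGWGQ
    print(count_repeats(ctd_fragment, "YSPTSPS"))         # 4
    print(longest_tandem_run(ctd_fragment, "YSPTSPS"))    # 4
```

Exact matching undercounts degenerate repeats: a copy that deviates from the consensus at a single position (for example the AYSPTSP or NYSPTSP variants in the CTD) is skipped, so the counts are a lower bound.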
PRIO_HUMAN major prion protein
................SDLGLCKKRPKPGGWNTGG
SRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHG
GGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKP
KTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIH
FGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNF
VHDCVNITIKQHTVTTTTKGENFTETDVKMMERVV
EQMCITQYERESQAYYQRGSSMVLFSSPPVILLIS
FLIFLIVG

IUPs often evolve by repeat expansion

Basic mechanisms of repeat expansion
  Meiotic: replication slippage (micro)
  Mitotic: unequal crossing over (mini)

Replication slippage
Wells RD (2001) JBC 271, 2875

(Unequal) crossing over
Morgan 1916

Evolution of repetitive regions in IUPs
Type I, Type II, Type III
Tompa (2003) BioEssays 25, 847
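
Regions that arose by repeat expansion can be located without knowing the consensus in advance by searching for a unit that is immediately followed by copies of itself. The following is a minimal sketch, not a method from the talk. The function names and the default thresholds (unit length 2 to 10, at least 3 copies) are assumptions chosen for illustration, and only perfect repeats are found.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter string (e.g. 'QQ')."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(sequence, min_unit=2, max_unit=10, min_copies=3):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    min_copies is assumed to be at least 2.
    """
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-residue unit followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(sequence):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Three perfect octarepeats flanked by unrelated residues (illustrative input).
    print(find_tandem_repeats("AAPHGGGWGQPHGGGWGQPHGGGWGQTT"))
    # One 35-residue row of the yeast RNAP II CTD shown earlier in the slides.
    print(find_tandem_repeats("SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP",
                              min_unit=7, max_unit=7))
```

The reported unit depends on where the match starts: the CTD row is reported as SYSPTSP rather than YSPTSPS, which is the same heptad in a different phase. Degenerate repeats such as those in SRY or involucrin need an approximate-matching method instead.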