MoRFs: A DATASET OF MOLECULAR RECOGNITION FEATURES

Amrita Mohan

Submitted to the faculty of the Bioinformatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University

December 2005

Accepted by the Faculty of Indiana University, in partial fulfillment of the requirements for the degree of Master of Science

Master's Thesis Committee
A. Keith Dunker, PhD, Chair
Vladimir Uversky, PhD
Predrag Radivojac, PhD
Narayanan B. Perumal, PhD

To my parents, in recognition of their worth

Acknowledgements

The completion of this thesis would not have been possible without the support of many people. Heartfelt thanks to my research advisor, Keith, for his support and guidance and, above all, for providing me with the financial means to complete this thesis and Master's degree. Many thanks to Volodya, who, besides guiding my research work, sportingly read numerous revisions of this thesis draft and helped make sense of confusion. I also thank my committee member, Peru, who has unfailingly boosted my morale throughout the duration of this study. Last but by no means least, sincere thanks to Pedja, for being an incredible scientific help and a great friend in an advisor's guise.

ABSTRACT

AMRITA MOHAN

MoRFs: A DATASET OF MOLECULAR RECOGNITION FEATURES

The last decade has witnessed numerous proteomic studies that have predicted and successfully confirmed the existence of extended, structurally flexible regions in protein molecules. Parallel to these advancements, the last five years of structural bioinformatics have also seen an explosion of results on molecular recognition and its importance in protein-protein interactions.
This work extends past and ongoing research efforts by looking specifically at the "flexibility and disorder" found in protein sequences involved in molecular recognition processes, known as Molecular Recognition Elements or Molecular Recognition Features (MoREs or MoRFs, as we call them). MoRFs are relatively short (10 to 70 residues), loosely structured protein regions within longer sequences that are largely disordered in nature. Interestingly, upon binding to other proteins, these MoRFs are able to undergo a disorder-to-order transition. Thus, in our interpretation, MoRFs could serve as potential binding sites, and this binding to another protein lends a functional advantage to the whole protein complex by enabling interaction with their physiological partner. There are at least three basic types of MoRFs: those that form α-helical structures upon binding, those that form β-strands (in which the peptide forms a β-sheet with additional β-strands provided by the protein partner), and those that form irregular structures when bound. Our proposed names for these structures are α-MoRF (also known as α-MoRE, alpha helical molecular recognition feature/element), β-MoRF (beta sheet molecular recognition feature/element), and I-MoRF (irregular molecular recognition feature/element), respectively. The results presented in this work suggest that functionally significant residual structure can exist in MoRF regions prior to the actual binding event. We also demonstrate pronounced conformational preferences for α-helices within MoRF regions. We believe that the results from this study will improve our understanding of protein-protein interactions, especially those related to molecular recognition, and may pave the way for future work on the prediction of protein binding sites.
We hope that the conclusions of this work demonstrate that, within only a few years of its conception, the concept of intrinsic protein disorder has gained wide-scale importance in the field of protein-protein interactions and can be strongly associated with molecular recognition.

Table of Contents

Acknowledgements
ABSTRACT
Introduction
  A. Introduction to subject
  B. Importance of this subject
  C. Knowledge Gap
Background
  A. Relevant research
  B. Goal to be tested
  C. Intended research goal
Materials & Methods
  A. Dataset of MoRFs
Results
  A. MoRFs dataset statistics & length distribution
  B. Secondary Structure Analysis
  C. Amino acid Composition, Charge & Aromatics in MoRFs
    (a) Amino acid compositions
    (b) Net & Total charges, Aromatics and Proline Content
  D. Order-Disorder Predictions and Functional classes
  E. Presence of poly-proline type II helices & Ramachandran Plot
Conclusions
Discussions
Appendix
  A. MoRFs (or MoREs) and their partners
  B. MoRF update
References
CURRICULUM VITAE

Introduction

A. Introduction to subject

The traditional understanding of the protein structure-function relationship holds that protein function is critically dependent on a well-defined three-dimensional protein structure. However, recent studies have revealed that the true functional state of many proteins and protein domains is an intrinsically unstructured conformation [1-14].
This phenomenon has been described for both partially and wholly disordered proteins. Since these first observations, the field of protein disorder, and of the protein functionality resulting from this disorder, has been steadily progressing.

Figure 1: Time-dependent increase in the number of PubMed hits dealing with intrinsically disordered proteins [x axis: years (1985-89, 1990-94, 1995-99, 2000-04); y axis: number of publications]. The following keywords were used to perform this search: intrinsically disordered, intrinsically unstructured, natively unfolded, intrinsically unfolded and intrinsically flexible.

Figure 1 reflects the rapidly growing interest in intrinsically disordered proteins. In fact, 110 papers discussing disordered proteins were published during 2004 (and as many as 50 such papers were published during the first quarter of 2005) [15].

The conformation of natively disordered proteins closely mimics the observed denatured states of structured proteins [16, 17]. Past efforts in structural biology have shown that disordered proteins are common in various proteomes and that their frequency increases with increasing complexity of the organism [18]. This increased prediction of disorder in eukaryotes compared with prokaryotes or archaea has been suggested to be a consequence of the increased need for cell signaling and regulation [19-21]. The functional importance of protein disorder is further emphasized by its role in various signal transduction processes, cell-cycle regulation, gene expression and molecular recognition [2-4, 15]. The widespread prevalence and importance of these proteins has called for a re-assessment of the classical protein structure-function paradigm [1-11, 21]. It has also long been recognized that the formation of protein-protein complexes is probably the most common phenomenon by virtue of which biological function is achieved.
In this report we discuss a specialized class of participants in these protein-protein interactions, "Molecular Recognition Elements" or "Molecular Recognition Features", which are protein regions that specifically participate in protein-protein interactions. Molecular recognition is defined as a process by which biological entities interact with each other, or with small molecules, to form specific complexes. In the case of proteins, this binding phenomenon enables a proteinaceous complex to participate in specialized activities and mediate select biochemical functions. Important aspects of signaling-related molecular recognition in comparison with other binding events are: (a) the unique combination of high specificity and low affinity; (b) binding diversity, in which one region specifically recognizes differently shaped partners by structural accommodation at the binding interface; (c) binding commonality, in which multiple, distinct sequences recognize a common binding site (with perhaps different folds). Another important feature of molecular recognition is that it coincides with more complex and functionally important mechanisms such as protein folding, signal transduction or the formation of multisubunit and supramolecular structures. Some special examples of molecular recognition have been reported where one or both of the partners are very flexible or wholly disordered prior to binding and their interaction results in the formation of a regularly structured protein complex. This phenomenon can only be explained by the complex being formed over a huge configurational space via binding-induced folding. In this case, each residue constituting the complex is under the influence of numerous attractive and repulsive forces, which may explain why experimental analysis and detection of such a phenomenon is hard to undertake. The complexity of this problem is comparable to that of the protein folding problem [5, 22].
One illustrative example of a functionally important disordered protein that participates in a molecular recognition process is considered below. In 2004 Callaghan et al. [23] showed, experimentally and by means of GlobPlot [24] and PONDR [52-54], that the C-terminal domain of full-length RNase E is predominantly disordered. Besides being highly enriched in the amino acids R, P, G and Q, this domain also includes three isolated stretches that show an increased propensity to be ordered when bound to RNA. It was also proposed that the second of these stretches is the one that actually interacts with the RNA.

B. Importance of this subject

The recognition of one protein molecule by another is an important phenomenon in all living systems. Enzymes are a good example of molecular recognition and substrate binding. This selective recognition process lies in the 'complementary' nature of the interacting surfaces and was initially termed the 'lock-and-key' concept by Fischer more than a hundred years ago [41]. However, a modern view of molecular recognition, called induced fit [42], takes into account that the interacting molecules are flexible and can adapt their shape during the recognition process. Induced fit has been observed experimentally for many protein-ligand interactions.

Our work aims to study proteins participating in molecular recognition. To this end, a dataset of Molecular Recognition Features or Elements (referred to as MoRFs from here onwards) was created and some characteristic features of MoRFs were described. We report and discuss the results from a few qualitative tests (such as amino acid compositions, order-disorder percentages etc.) performed on the MoRF dataset. These results are also compared to those from representative disordered and globular protein datasets.
We believe that this analysis will help us to better understand the physico-chemical and structural differences between molecular recognition elements and ordinary ordered and disordered datasets. Any notable differences may allow future characterization and prediction of MoRFs and subsequently improve our understanding of the structural changes that bring about the binding of a MoRF to its macromolecular target. A further benefit expected from this study is a more accurate estimation of protein binding sites within MoRFs. This could ultimately lead to the design and development of a predictor of MoRFs. On the commercial side, we expect that these results can facilitate the design of drug compounds that influence the process of molecular recognition.

C. Knowledge Gap

A large body of evidence now supports the functional importance of intrinsic disorder. However, there is an apparent lack of information on the various features and characteristics of MoRFs (such as amino acid composition, order-disorder predisposition, charge and aromaticity). Little is also known about the mechanisms underlying the structural changes in MoRFs during binding. In general, this problem has been difficult to approach experimentally, especially since studying the extremely flexible conformations of MoRFs poses a special challenge [40]. Computational approaches may help in such a situation. Taking such an approach, we compared the actual bound structures of MoRFs to their inherent structural preferences. For this purpose, we collected MoRFs from the PDB and determined their secondary structures in the bound state. It is our belief that the binding of MoRFs to their respective partners through a disorder-to-order transition is template-driven and not a chance event.

Background

A. Relevant research

Molecular Recognition Features (MoRFs) are common in various proteomes and occupy a unique structural and functional niche in which function is a direct consequence of intrinsic disorder. The evidence that intrinsically disordered proteins without a well-defined folded structure exist in vitro and in vivo is compelling and justifies considering them as a separate class within the protein universe. A number of reviews and papers have reported and discussed advances in the rapidly progressing field of intrinsically disordered proteins, with a major focus on gathering evidence for their unfolded nature prior to binding and on discussing the functional benefits that their malleable structural state provides [1-4, 12-18]. In their unbound form, many intrinsically disordered proteins have traditionally been considered to exist in a random coil state (non-alpha, non-beta conformations with aperiodic phi and psi angles), since their structures closely mimic the unfolded state of globular proteins in the presence of high concentrations of strong denaturants [19-21]. A closer look at natively unfolded proteins and some MoRFs, however, reveals that this view is not correct. To begin with, a truly irregular (random coil) conformation does not exist even under the harshest conditions [46, 47]. Hence, it is not surprising that many MoRFs have been reported to bear traces of residual structure [12, 17]. Upon interaction with their binding partners, MoRFs can undergo significant induced folding, or a disorder-to-order transition [1, 2, 12]. Such a molecular recognition mechanism, which is coupled to the folding process, has been noted to confer exceptional specificity and versatility [3, 26-28]. All these features explain the prevalence of structural disorder in signaling and regulatory proteins [28]. The interaction of MoRFs with their partners highlights the importance of understanding the mechanism of their induced folding.
Since effective functioning of MoRFs requires fast formation of the folded state [49], their template-induced folding represents a special and interesting case of protein folding. The advantages of this binding mode have been studied in detail in the case of the transcription factor GCN4, where binding strength correlates with the α-helicity of its critical DNA-binding segment [50, 51]. It has to be noted, however, that over-stabilization of a secondary structural element can also decrease the rate constant of complex formation, as was shown for the cyclin-dependent kinase inhibitor p27 (Kip1).

B. Goal to be tested

The goal of this work is to discover signs of inherent secondary structure preferences, if any, in MoRFs prior to binding, which could influence their final structure in the ordered complex. To this end, MoRF sequences are first assessed by a secondary structure predictor, PHD [34, 35], and the predictions are then compared to the bound structures.

C. Intended research goal

Our primary goal in this project is to design and develop a database of MoRFs and to study a few types and examples of molecular recognition elements from this database. We also carry out several qualitative tests on this database and compare the results with those from representative disordered (DISPROT [29]) and ordered datasets to look for physico-chemical differences between their members. Our ultimate goals are: (a) to facilitate future characterization and prediction of MoRFs; (b) to gain better knowledge of potential binding sites; and (c) to gain further insight into the structural changes that bring about the binding of a MoRF to its macromolecular target. We also hope that these analyses provide a basis for the future design of compounds that influence this process.
Eventually, the results from these tests will not only help associate disorder with MoRFs but also show that the structural disorder observed in MoRFs predisposes them for special functional modes, which are either a direct result of their fluctuating conformation or are realized via binding to one or several other proteins in a structurally adaptive process.

Materials & Methods

A. Dataset of MoRFs

Using the Seqres dataset available at the Protein Data Bank (PDB) [30], we collected protein segments shorter than 70 residues that are bound to other proteins with lengths of 100 residues or more. We chose protein chains shorter than 70 residues because such chains are less likely to form self-folding globular units before interacting with other proteins. In other words, such protein chains very likely do not have significant buried surface area of their own and participate in molecular recognition by forming parts of larger protein complexes. Using these criteria, we prepared a starting dataset of 2512 protein chains. The PDB files corresponding to these 2512 chains were downloaded to obtain sequences, secondary structure, and information on the Ramachandran phi and psi angles. The PDB Seqres dataset houses all the protein sequences available at the PDB. In addition to the residues observed in a protein crystal or in solution, these sequences include residues not present in the crystal model (i.e., disordered, lacking electron density, cloning artifacts, His-tags, etc.). The next step was to remove from our initial working dataset all chains with ambiguous sequence information (i.e., sequences containing X or Z annotations instead of real amino acids). We also removed protein chains with 10 or fewer residues, since such short peptides may not be specific to a single larger sequence, which would make the later step of identifying the sequences containing such MoRFs difficult.
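The selection criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used for this thesis: the function name `passes_filters`, the representation of each chain as a sequence plus a list of partner chain lengths, and the toy records are all hypothetical.

```python
from typing import Iterable

# Ambiguity codes named in the text; other non-standard codes are not handled here.
AMBIGUOUS_CODES = frozenset("XZ")


def passes_filters(sequence: str, partner_lengths: Iterable[int]) -> bool:
    """Return True if a chain satisfies the selection criteria in the text.

    Assumed reading of the criteria: more than 10 and fewer than 70 residues,
    no X or Z annotations, and at least one bound partner chain of 100
    residues or more.
    """
    seq = sequence.upper()
    if not (10 < len(seq) < 70):
        return False
    if AMBIGUOUS_CODES.intersection(seq):
        return False
    return any(length >= 100 for length in partner_lengths)


# Hypothetical toy records: (chain sequence, lengths of the chains it is bound to).
toy_chains = [
    ("ACDEFGHIKLMNPQR", [109]),  # kept
    ("ACDEFGHIK", [250]),        # dropped: 10 or fewer residues
    ("ACDEFGHIKLMXNPQ", [250]),  # dropped: contains X
    ("ACDEFGHIKLMNPQR", [60]),   # dropped: no partner of 100+ residues
]
kept = [seq for seq, partners in toy_chains if passes_filters(seq, partners)]
```

Keeping the criteria in one predicate makes the thresholds easy to vary when checking how sensitive the dataset size is to them.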
At the end of these data preprocessing steps, 1261 chains (approx. 55000 residues, with an average chain length of 44.9 residues) remained. After removing redundancy amongst these 1261 protein chains, we obtained 372 non-redundant MoRFs. Since these putative MoRFs were variable in length, we used Rost's formula [31] to dynamically calculate the sequence identity threshold based on each chain's length. A preliminary study of the results from the redundancy check showed that the smallest families had a single member, while the largest family had 177 members (Thrombin, Alpha-Thrombin). Figure 2 shows the distribution of cluster members within the MoRFs dataset.

Figure 2: Frequency distribution of the number of homologous MoRF sequences for the 372 non-redundant MoRFs [total # of clusters = 372; x axis: # of MoRF sequences (# of clusters); y axis: # of homologous members (cluster members)]

Using other database references (Swiss-Prot [32], PIR [33], and NCBI [71]) listed in the respective PDB files for each of the MoRFs, we were able to extract 301 sequences containing these 372 MoRF chains. All but 53 of the MoRFs were found to be fragments of larger sequences. A final task after collecting and processing these MoRFs was to design a database for them. For this task we used MySQL as the backend and Perl scripts to load the MoRFs and information about them, such as secondary structure, binding partner and length. Finally, secondary structure assignments for the MoRFs were made using the DSSP program [34] (results shown in Figure 5). Figure 3 presents several illustrative examples of MoRFs from our dataset.

Figure 3: Some examples of complexes between MoRFs and their binding partners. The structures (PDB code in parenthesis) shown are: (a) α-helical MoRF p53 attached to MDM2; (b) extended β-MoRF Grim attached to Apoptosis Inhibitor; (c) irregular MoRF p53 attached to Cyclin A2; (d) complex MoRF ovomucoid attached to Trypsin. The structures have been visualized with the Swiss-PDB viewer.

Results

A. MoRFs dataset statistics & length distribution

Table 3 lists the number of MoRFs obtained after each data pre-processing step to reach a final non-redundant working dataset.

Processing step                                                                       Number of MoRFs
Initial MoRFs obtained using PDB Seqres dataset (July 2004)                           2512
Filtering ambiguous data (X, Z), removal of sequences with less than 10 residues      1261
Sequence redundancy removal                                                           372

Table 3: Number of MoRFs after each data processing step

Analysis of the lengths of all MoRFs showed that as many as two-thirds of these features had lengths between 10 and 20 residues and were thus short in comparison to other proteins (Figure 4).

Figure 4: Length distribution in MoREs dataset [x axis: length of MoRE (residues); y axis: frequency]

B. Secondary Structure Analysis

We used the DSSP [34] program to determine the secondary structure assignments for each of the 372 MoRFs. The DSSP program was designed to standardize protein secondary structure assignments. It accepts a single PDB entry file as input and assigns a secondary structure type (viz., helix, sheet or irregular) to each residue of the protein's sequence. Of the approximately 9000 residues in this dataset, 27% had an α-helical conformation, 12% were β-sheet residues and approximately 48% had an irregular structure. The remaining 13% of the residues had missing coordinate data in the PDB files, consistent with their being disordered. We compared these results with those from a control dataset of similar size (approx. 9000 residues) consisting of single chain X-ray structures with a primitive space group (necessarily monomeric).
The structures in the control dataset had no missing residues. Results of this analysis are shown in Figure 5.

Figure 5: (a) Secondary structure distribution of residues in MoRFs: helices 27%, sheet/strand 12%, irregular 48%, disorder 13%. (b) Secondary structure distribution in monomers: helices 33%, sheet/strand 25%, irregular 42%, disorder 0%.

We observe a decreased overall preference for the extended β conformation in MoRFs. This can be explained by the abundance of hydrophilic side chains in them [2, 3, 12]. The most pronounced difference between the secondary structure distributions of bound MoRFs and monomeric proteins is seen in these extended β structural elements, or sheet motifs (~12% versus ~25%).

The possibility that interactions with the partner protein influence the native conformational preferences of MoRFs was studied by comparing the predicted secondary structure of each MoRF, that is, its predisposition to form a particular secondary structure, with the DSSP assignments of the bound structures. For the predictions we used the PHD algorithm [34, 35]. The PHD secondary structure algorithm uses a combination of multiple sequence alignments and neural networks to predict the secondary structure element for each residue of a given protein sequence. When a protein is input, this method finds its homologs and builds a profile from their multiple sequence alignment. It then feeds this profile into a series of neural networks to output the predictions. As mentioned earlier, the goal of this exercise was to test our original hypothesis that protein complex formation influences or modifies the disordered state of MoRFs. To estimate the effect of partner proteins in modifying the inherent structural preferences of MoRFs upon binding, predicted secondary structures were related to the observed conformations in the bound state.
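The comparison just described amounts to a row-normalized cross-tabulation of observed against predicted classes. The following Python sketch shows one way to compute it. It assumes that both the DSSP assignment and the prediction have already been reduced to one letter per residue (H, B, I, plus D for observed disorder); the function name `prediction_breakdown` and the toy strings are hypothetical and are not output of DSSP or PHD.

```python
from collections import Counter

PREDICTED_STATES = ("H", "B", "I")  # helix, extended beta, irregular


def prediction_breakdown(observed: str, predicted: str) -> dict:
    """For each observed class, give the percentage of its residues
    that were predicted as H, B and I."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    pair_counts = Counter(zip(observed, predicted))
    breakdown = {}
    for obs in sorted(set(observed)):
        total = sum(pair_counts[(obs, p)] for p in PREDICTED_STATES)
        if total:
            breakdown[obs] = {
                p: 100.0 * pair_counts[(obs, p)] / total for p in PREDICTED_STATES
            }
    return breakdown


# Hypothetical toy example: one letter per residue, aligned position by position.
observed = "HHHHBBIIIIDD"
predicted = "HHHIBIIIIHII"
result = prediction_breakdown(observed, predicted)
```

Each row of `result` sums to 100, so it can be read in the same way as a row of a table that lists, for one observed class, the share of residues predicted in each state.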
Results obtained from this experiment (Table 4) suggest that the inherent secondary structural features of MoRFs are well preserved in their bound state. This is similar to globular proteins, where non-local interactions were found to have a negligible effect on the predictability of secondary structures [68]. The most remarkable preference, seen in the case of helices, predicts a substantial stability for these motifs and points to them as preformed structural elements in the solution state. In contrast, coils can be produced by random sequences almost as well as by the MoRF chains themselves. The correlation of the secondary structure preferences of MoRFs with and without their binding partners can help in future analysis and probing of the role of these structural elements.

DSSP assignment   Residues   PHD prediction
α-helix           2469       H: 74%, B: 9%, I: 17%
β-sheet           1118       H: 11%, B: 55%, I: 34%
Irregular         4359       H: 21%, B: 15%, I: 64%
Disorder          1147       H: 18%, B: 10%, I: 72%

Table 4: PHD secondary structure prediction accuracies for MoREs

Results revealed that α-MoRFs were predicted with higher confidence than β-MoRFs or I-MoRFs. A high percentage of the residues originally assigned as disordered were also predicted to be irregular. Extended conformations can hardly be predicted from MoRF sequences, possibly because they are less structurally defined while still in solution and tend to become ordered only upon binding to the partner. As in the case of the secondary structure assignments, these results were also compared with those from our control dataset of monomeric proteins.

A region-wise analysis of the different structural types of MoRFs (Table 5 & Figure 6) showed that there were 1880 regions in all, of which 269 were helical and 381 were sheets. More than half of the regions (991) had an irregular conformation. The remaining 239 regions were disordered.
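The text does not spell out how a region is delimited. A minimal Python sketch, assuming that a region is a maximal run of consecutive residues sharing the same structural class and that region lengths are grouped into the bins 1-9, 10-19, 20-29 and 30-69, could look as follows. The function names and the toy chains are hypothetical.

```python
from itertools import groupby

# Length bins assumed for the region-wise analysis (inclusive bounds).
LENGTH_BINS = ((1, 9), (10, 19), (20, 29), (30, 69))


def split_regions(states: str) -> list:
    """Split a per-residue class string into (class, run length) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(states)]


def bin_label(length: int):
    """Return the label of the bin containing `length`, or None if out of range."""
    for low, high in LENGTH_BINS:
        if low <= length <= high:
            return f"{low}-{high}"
    return None


def region_length_table(chains) -> dict:
    """Count regions per structural class and length bin over all chains."""
    table = {}
    for states in chains:
        for state, length in split_regions(states):
            label = bin_label(length)
            if label is None:
                continue  # longer than the largest bin; not expected for MoRFs
            row = table.setdefault(
                state, {f"{low}-{high}": 0 for low, high in LENGTH_BINS}
            )
            row[label] += 1
    return table


# Hypothetical toy chains, one class letter (H, B, I or D) per residue.
toy_chains = ["II" + "H" * 12 + "II", "DDDBBBBIII"]
table = region_length_table(toy_chains)
```

Under this definition a single chain can contribute several regions of the same class, which is why the number of regions can far exceed the number of MoRFs.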
Region length (in residues)   1-9    10-19   20-29   30-69   Total
# of Disordered regions       205    26      5       3       239
# of Helical regions          167    76      17      9       269
# of Ext. Beta regions        376    5       0       0       381
# of Irregular regions        847    128     10      6       991

Table 5: Region-wise distribution in different structural types of MoRFs

Figure 6: Histogram of the region-wise distribution in MoRFs [x axis: region lengths (1-9, 10-19, 20-29, 30-69); y axis: number of regions; series: D, B, H and I regions]

C. Amino acid Composition, Charge & Aromatics in MoRFs

(a) Amino acid compositions

Comparisons between the amino acid compositions of monomers and MoRFs show that MoRFs have increased levels of C, R, S, P and K. On the other hand, they show a decreased content of amino acids important for the formation of strong β-sheets (with low α-helical propensity), such as L, V, F, I, Y and D (Figure 7(a)).

Figure 7: (a) Amino acid composition of MoRFs (MoREs) vs. Monomers [x axis: amino acids in the order W C F I Y V L H M A T R G Q S N P D E K; series: MoREs, Monomers]

The following histograms (Figure 7(b) & 7(c)) depict the relative composition of all MoRFs, as well as of the different structural categories of MoRFs, with respect to proteins from the control monomeric dataset. It is interesting to note the significant enrichment of cysteine in the MoRF dataset in general when compared to the control dataset (Figure 7(b)). Figure 7(c) also shows that cysteine seems to be more prevalent in β-MoRFs than in their α-helical and irregular counterparts. Based on these results, it might be interesting to probe further for the presence of disulfide bonds in MoRF interactions.

Figure 7: (b) Relative amino acid composition of MoRFs w.r.t. Monomers. The y axis shows the fractional difference between the amino acid compositions of MoRFs and Monomers, i.e., (MoRFs - Monomers)/Monomers [x axis: amino acids in the order W C F I Y V L H M A T R G Q S N P D E K]

Figure 7: (c) Relative amino acid composition of different structural types (α-helical, β, and irregular) of MoRFs w.r.t. Monomers [x axis: amino acids in the order W C F I Y V L H M A T R G Q S N P D E K; series: (α-MoRE - Monomeric helices)/Monomeric helices, (β-MoRE - Monomeric sheets)/Monomeric sheets, (i-MoRE - Monomeric irregulars)/Monomeric irregulars]

(b) Net & Total charges, Aromatics and Proline Content

Figure 8: Total and Net charges, Proline Percentage & Aromatics in MoRFs

Figure 8 compares features such as total charge (K + R + D + E), net charge (K + R - D - E), proline composition and aromatic content (F + W + Y) between MoRFs (MoREs) and monomeric chains. It is interesting to note that, despite comparable total charges in both classes of proteins, MoRFs tend to maintain a higher net charge than monomers. This is similar to what is found in disordered proteins [2]. The proline content observed in MoRFs also exceeds the proline percentage found in monomers. This result motivated us to explore the presence of polyproline II helices in MoRFs in later experiments. MoRFs also show higher proportions of aromatic amino acids than monomeric proteins. This is reasonable, since the side chains of aromatic amino acids tend to make strong and specific interactions [69], which would be expected in proteins involved in molecular recognition.

D. Order-Disorder Predictions and Functional classes

Order/disorder predictions using the VL-XT [52-53] and VL3 [54] predictors revealed that as much as 65% disorder was present in sequences containing MoRFs. It was also interesting to note that irregular residues predicted to be ordered accounted for as much as 30% (2723 residues) of all MoRF residues.
These data confirm the hypothesis that the presence of such recognition motifs may be a general feature of disordered proteins. Table 6 lists the percentage distribution of order/disorder with respect to the different secondary structure assignments for the 372 MoRFs.

                               α-residues   β-residues   ι-residues   PDB Disorder
Percent Predicted Disordered   9            2            18           7
Percent Predicted Ordered      18           10           30           7

Table 6: Order-Disorder statistics for different classes of MoRFs

Figure 9 shows the distribution of predicted disorder in sequences containing MoRFs using both the predictors VL-XT and VL3.

Figure 9: Disorder distribution in MoRFs using VL-XT & VL3 predictors (panel title: VL3 vs. VLXT predictions for MoREs; x axis: percent disordered, in bins from 0 - 10 to 91 - 100; y axis: # of sequences; series: VLXT and VL3)

Using the results of previous studies [13, 14] and a number of disorder prediction results from the MoRFs database, we conclude that MoRFs are primarily associated with signal transduction, cell-cycle regulation and gene expression and thus may often be implicated in various cancer types [15]. Recent studies have also helped unveil the high incidence and functional importance of disorder-to-order transitions in endocytosis [66] and in RNA and protein chaperones [67]. The disorder found in these sequences also strongly correlates with the sites of post-translational modification. A parallel PROSITE [37] search using these MoRFs showed that a third of them contained phosphorylation sites and as many as 14% displayed myristoylation sites.
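The per-sequence binning behind a histogram such as Figure 9 can be sketched as follows. This is only an illustrative sketch, not the procedure actually used in this work: it assumes each predictor returns one score per residue in the range 0 to 1 and that a residue is counted as disordered at a score of 0.5 or above (both assumptions), and the function names and toy scores are hypothetical.

```python
import math
from collections import Counter

# Assumption: residues scoring at or above 0.5 are counted as disordered.
DISORDER_THRESHOLD = 0.5

# Bin labels as they appear on the x axis of Figure 9.
BIN_LABELS = ["0 - 10"] + [f"{low} - {low + 9}" for low in range(11, 100, 10)]


def percent_disordered(scores, threshold=DISORDER_THRESHOLD):
    """Return the percentage of residues predicted to be disordered."""
    if not scores:
        raise ValueError("a sequence needs at least one residue score")
    disordered = sum(1 for score in scores if score >= threshold)
    return 100.0 * disordered / len(scores)


def bin_label(percent):
    """Map a percentage in [0, 100] to its bin label (upper edge inclusive)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    if percent <= 10.0:
        return BIN_LABELS[0]
    return BIN_LABELS[math.ceil(percent / 10.0) - 1]


def disorder_histogram(predictions):
    """Count sequences per bin.

    predictions maps a sequence identifier to its per-residue scores
    from a single predictor.
    """
    counts = Counter({label: 0 for label in BIN_LABELS})
    for scores in predictions.values():
        counts[bin_label(percent_disordered(scores))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy scores, not taken from the MoRF dataset.
    toy = {
        "seq_a": [0.9, 0.8, 0.2, 0.1],
        "seq_b": [0.7] * 9 + [0.3],
        "seq_c": [0.1, 0.2, 0.3, 0.4],
    }
    for label, count in disorder_histogram(toy).items():
        print(label, count)
```

Running the same function once on the VL-XT scores and once on the VL3 scores would give the two series that are plotted side by side.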
An important observation from these order-disorder predictions was that two of the well-known binding regions on p53 (one in the N-terminal domain, with MDM2 as the binding partner, and another in the C-terminal domain, with cyclin A2 as its recognition partner) coincide with dips in the VLXT order-disorder plots and are flanked by disordered regions on either side (Figure 10). Such examples from the MoRF dataset indicate the possibility of discovering novel binding regions in other proteins containing MoRFs.

Figure 10: VLXT disorder prediction in p53 protein: Residues 17-27 in the N-terminal region bind to MDM2 & residues 378-386 in the C-terminal region bind to cyclin A2. These regions also correspond to visible dips in the VLXT prediction plot, indicating the possibility of finding novel binding sites in other proteins by using knowledge of the MoRFs in them. (Plot annotations: α-helical MoRF, p53 bound to MDM2; irregular MoRF, p53 bound to cyclin A2)

Also, using 201 Swiss-Prot sequences containing 227 MoRFs, we were able to gather preliminary insights into the general nature and functional classes of MoRFs. The results of this analysis are listed in Table 7.

SW Keyword                   Frequency
3D-Structure                 174
Signal                       57
Glycoprotein                 41
Transmembrane                37
Alternative Splicing         35
Hydrolase                    25
DNA Binding                  24
Transcription Regulation     23
Serine Protease Inhibitor    21

Table 7: Top 10 Swiss Prot functional classes returned for MoRFs

The higher number of hits for keywords such as “Signal”, “Glycoprotein”, “Transmembrane” and “Alternative Splicing” in the MoRF dataset suggests that sequences containing MoRFs are more likely to be involved in signaling processes, to be transmembrane in nature or to be alternatively spliced. From these weak associations we tentatively conclude that MoRFs may share similar functional characteristics.
These functional capacities are exploited in many molecular settings, which suggests that MoRFs may fulfill many different functions. By considering unifying mechanistic details of their various modes of action, one could possibly better understand other novel functions of MoRFs.

E. Presence of polyproline type II helices & Ramachandran Plot

Using the algorithm of Sreerama et al. [56] to detect poly-proline type II helices, we obtained 53 such peptides (between 4 and 12 residues in length) in the MoRF dataset. The existence of such peptides in this dataset suggests that the extended and rather stiff poly-proline II helix conformation in MoRFs might explain why the interaction site is exposed. Also, by extracting phi and psi angles for each of the MoRFs from their respective DSSP outputs, we were able to draw the following Ramachandran plot [70]. The boxed region in the plot indicates the region where the incidence of poly-proline II helices is the highest.

Figure 9: Ramachandran Plot for MoRFs

Conclusions

Functional disorder has long been noted to be associated with molecular recognition elements (MoRFs) that can bind to RNA, DNA and other protein(s) (or sometimes even smaller ligands). Pertinent to this function is also the success of disorder-based prediction of phosphorylation sites. Furthermore, the function of many, or possibly all, of these MoRFs depends directly on disorder, in that the disordered segment serves to recognize, solubilize or loosen the structure of its binding partner. The multifarious functioning of MoRFs (as in the example of p53, which functions both as an α-helical and as an irregular MoRF; Figures 3(a) and 3(c)) suggests that the lack of an ordered structure contributes in many ways to their mechanisms of action. In fact, their highly malleable structure endows them with functional features unparalleled by ordered proteins.
This report presents novel examples and extensions of MoRFs and their features. The advantage of the great conformational freedom of intrinsically disordered proteins or protein fragments is most evident in entropic chains, which may exert a long-range entropic exclusion of other proteins or cellular constituents in spacer functions [57]. Another molecular setting where such regions abound is multidomain proteins, where globular domains are often separated by flexible linkers. These regions facilitate an easy orientational search and allow the recognition of distant and/or discontinuous determinants on the target [14]. Fully disordered MoRFs also exploit this unique feature. Their extended structure enables them to contact their partner(s) over a binding surface that is large for a protein of the given size, which allows the same interaction potential to be realized by shorter proteins overall, encoded by a more economical genome [26]. In addition to these advantages, flexibility is instrumental to the assembly process itself, as certain complexes may not be assembled successfully from rigid components.

Another unique consequence of the structural flexibility of MoRFs is their capacity to adapt to the structure of distinct partners, which enables an exceptional plasticity in cellular responses. An amply characterized case of this behavior is the Cdk inhibitor p21Cip1, which can interact with CycA-Cdk2, CycE-Cdk2 and CycD-Cdk4 complexes [58] and with apoptosis signal-regulating kinase 1 [59] under different conditions. The open, extended structure of MoRFs also enables an increased speed of interaction. It has been noted that macromolecular association rates are substantially improved by an initial, relatively non-specific association enabled by flexible (disordered) recognition segments, mechanistically formulated in the “fly-casting” [49] mechanism of molecular recognition.
Another prominent feature of MoRFs is their extreme proteolytic sensitivity, which in principle allows for effective control via rapid turnover. In fact, protein disorder prevails in signaling, regulatory and cancer-associated proteins, which are known to be short-lived proteins subject to rapid turnover [10, 11]. Furthermore, disorder itself constitutes an integral part of the proteasomal destruction signal in two distinct ways. On the one hand, non-ubiquitinated MoRFs may be directly degraded by the 20S proteasome, as shown for p21Cip1 [60] and for tau protein (implicated in Alzheimer’s disease) [61]. On the other hand, this mechanism may also play a more subtle regulatory role, by processing disordered segments in multidomain proteins and releasing the flanking, constitutively activated globular domains through the endoproteolytic activity of the proteasome [62]. Disorder may also constitute part of the signal to the ubiquitination system itself, as the regions of securin and cyclin B recognized by the ubiquitination machinery have recently been shown to be natively unfolded [63].

Discussions

Our observations suggest that MoRFs, in general, do not have to undergo extensive structural rearrangements to adapt to their partner, as their residual structure is germane to their final conformational state. The importance of such structure in the binding process has been proposed for some MoRFs, such as p27 (Kip1), p53 [58] and GCN4 [64]. The function of MoRFs is often realized via molecular recognition, in a process of binding to a protein, RNA or DNA partner via a disorder-to-order transition [2, 3, 9–11].
Based on this terminology we suggest that the binding process be considered a special type of protein folding and protein complex formation, since it includes the formation of intermolecular (tertiary structure) contacts between the MoRF and its binding partner and also enables the stabilization of the secondary structure elements. A physiologically effective action of MoRFs requires (i) specific and reversible interactions with the partner (for activation and deactivation of the whole complex) and (ii) the ability to fold quickly.

By analogy with folding models for globular proteins, two mechanisms for the formation of structure in MoRFs have been suggested. In the first mechanism, the MoRF is in a completely disordered state prior to binding and makes initial contacts at random almost anywhere along its sequence. Subsequently, these contact points serve as sites for folding, around which the formation of secondary structure elements occurs as dictated by the partner. In such a mechanism, the inherent conformational preferences of the intrinsically disordered protein itself may be overridden by interactions with the partner, resulting in significantly different secondary structure elements in its uncomplexed and bound states. This mechanism could be understood to invoke the a priori formation of long-range interactions that facilitate the formation of subsequent secondary structural elements.

The other mechanism involves the early formation of local secondary structure [43]. In this case, the structure of the MoRF is not entirely random and shows features that are also visible in the bound conformation. We believe that transiently or permanently ordered segment(s) present in MoRFs may serve as the binding sites for the partner proteins, around which the protein folds. Based on this, one can hypothesize that a MoRF complex which contains a multitude of contact points for its partner can be considered a transient state of folding.
Analysis of the distribution of secondary structure elements shows that MoRFs contain more irregular secondary structures, even in the bound state. The abundance of irregular motifs in the bound structures suggests that although their folding may be template-driven, MoRF partners do not impose large constraints on their structure. Helices were found with comparable frequencies in both MoRFs and monomeric proteins, whereas extended or sheet structures are less preferred in MoRFs. The prime cause of this deviation may be the different amino acid composition of MoRFs, with increased levels of C, R, S, P and K and decreased levels of L, V, F, I, Y and D.

From an evolutionary perspective, selection in monomeric proteins conserves an amino acid sequence that, after folding, yields a protein with a well-defined function. In the case of MoRFs, evolutionary pressure targeted the conservation of a sequence that initially lacks most signs of regular structure and yet is primed to assume order as soon as it encounters its macromolecular target(s). The strong conformational preference of MoRFs for helical structural elements also suggests that these structural elements could be transiently populated in the non-bound state. In other words, the conformational space actually available to MoRFs is limited rather than expansive, and the number of possible final structures is fairly small. This idea accords with previously reported observations that MoRFs display signs of residual structure. A restricted choice of available conformational states minimizes the entropic cost of binding. Also, the higher secondary structure prediction rates for MoRF structures indicate that partner proteins cause minimal disturbance to their pre-existing states.
In short, interactions between a MoRF and the contact sites of its partner lower the enthalpy of the reaction considerably, thereby leading to better stabilization of the protein complex. In summary, MoRFs can be regarded as “mixtures” of segments with strong and weak (negligible) secondary structure preferences. These results extend previous assertions that MoRFs possess structural features pertinent to their partner recognition and function.

Appendix A: Molecular Recognition Features (or Elements) and their partners

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1a02f 1a0rg 1a1rc 1a2cl 1a2xb 1a6ac 4aahb 1ab9a 1an1i 1aqdc c-Fos Transducin Ns4A Protein Alpha-Thrombin Troponin I Clip Methanol Dehydrogenase Gamma-Chymotrypsin Tryptase Inhibitor Hla-A2 P01100 P02698 P27958 P00734 P02643 P04233 AAA83766 P00766 P80424 P01892 SW SW SW SW SW SW GB SW SW SW 1-53 1-65 1-16 2-15 1-31 1-15 1-69 1-10 1-40 1-14 140-192 2-66 1678-1693 337-350 4-34 103-117 28-96 1-10 2-41 128-141 MoRE partner PDB ID 1ao2a 1aorp 1a1rb 1a2ch 1a2ca 1a6cb 4aaha 1a9ad 1an1e 1aqdd 1avfp 1avoa 1avpb 1avzc 1axcb 1b0nb 1b33n P20142 Q06323 P24937 P06241 P38936 P23308 P20116 SW SW SW SW SW SW SW 1-21 1-60 1-11 1-57 1-18 1-31 1-67 17-37 4-63 240-250 85-141 143-160 9-39 1-67 1avfa 1avob 1avpa 1avzb 1axcc 1bona 1b33a 1b41b 1b8hd Gastricsin 11S Regulator Adenoviral Proteinase Fyn Tyrosine Kinase P21/Waf1 Sini Protein Phycobilisome 7.8 Kd Linker Polypeptide Fasciculin-2 DNA Polymerase Fragment P01403 AAA93077 SW GB 1-61 1-11 1-61 893-903 1b41a 1b8hc 1be3k 1bqpb 2btci 1bunb 1bxlb Cytochrome Bc1 Complex Lectin Trypsin Inhibitor Beta2-Bungarotoxin Bak Peptide P07552 P02867 P10293 P00989 Q16611 SW SW SW SW SW 1-22 1-47 1-29 1-61 1-16 15-36 218-264 4-32 25-85 72-87 1be3f 1bqpa 2btce 1buna 1bxla 31 MoRE partner PDB Name N-Fat Phosducin Ns3 Alpha-Thrombin Troponin C Hla-Dr3 Methanol
Dehydrogenase Gamma-Chymotrypsin Trypsin Hla-Dr1 Class II Histocompatibility Protein Gastricsin 11S Regulator Adenoviral Proteinase Negative Factor Pcna Sinr Protein Allophycocyanin, Beta Chain Acetylcholinesterase DNA Polymerase Processivity Component Cytochrome Bc1 Complex Lectin Trypsin Beta2-Bungarotoxin Bcl-Xl Appendix A: Molecular Recognition Features (or Elements) and their partners 25 26 27 28 29 30 31 32 33 34 35 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1c04c 1c5wa Ribosomal Protein L11 Urokinase-Type Plasminogen Activator Ice Inhibitor Tnf-R2 Wiskott-Aldrich Syndrome Protein Wasp Activated P21Cdc42Hs Kinase Calcium Pump Alpha-Amylase Inhibitor Fragment Of Coat Protein Vp2 Ref-1 Peptide Cathepsin B Coagulation Factor Viia (Light Chain) (Des-Gl Signaling Lymphocytic Act. Molecule Bowman-Birk Proteinase Inhibitor Precursor K-Ras4B Peptide Substrate 50S Ribosomal Protein L7/L12 Immunoglobulin G Binding Protein A Smad Anchor For Receptor Activation Shiga Toxin B Subunit DNA Polymerase Proteinase Inhibitor Ia3 Casein Kinase, Beta Chain P56210 P00749 SW SW 1-67 1-9 63-129 164-172 MoRE partner PDB ID 1c04d 1c5wb P07385 P20333 A55197 SW SW SW 1-32 1-7 1-59 310-341 422-428 230-288 1c8oa 1ca9e 1ceea Q07912 AAA74511 P80403 P12908 P27695 P07858 P08709 SW GB SW SW SW SW SW 2-44 1-20 1-32 11-29 1-13 1-47 1-55 447-489 1100-1119 1-32 279-297 59-71 80-126 150-204 1c4fa 1cffa 1clva 1cn3e 1cqga 1csbb 1cvwh NP_003028 GB 1-11 276-286 1d4ta P01055 SW 1-58 45-102 1d6ra Cdc42 Homolog Calmodulin Alpha-Amylase Coat Protein Vp1 Thioredoxin Cathepsin B Coagulation Factor Viia (Heavy Chain) (Des-G T Cell Signal Transduction Molecule Sap Trypsinogen P01118 P29396 P02976 SW SW SW 1-11 1-32 1-51 178-188 1-32 103-153 1d8db 1dd3b 1deee Farnesyltransferase (Beta Subunit) 50S Ribosomal Protein L7/L12 Igm Rf 2A2 AAC99462 GB 1-41 669-709 1deva XVEBBD P07917 P01094 P13862 GB SW SW SW 1-69 1-36 1-29 1-16 21-89 1200-1235 2-30 188-203 1dm0a 1dmla 1dp6a 
1ds5d Mad (Mothers Against Decapentaplegic, Drosop Shiga Toxin A Subunit DNA Polymerase Processivity Factor Fixl Protein Casein Kinase, Alpha Chain 1c8ob 1ca9g 1ceeb 1cf4b 1cffb 1clvi 1cn3f 1cqgb 1csba 1cvwl 36 1d4tb 37 1d6ri 38 39 40 1d8dp 1dd3c 1deeg 41 1devb 42 43 44 45 46 1dm0b 1dmlb 1dp5b 1ds5e 32 MoRE partner PDB Name Ribosomal Protein L14 Urokinase-Type Plasminogen Activator Ice Inhibitor Tnf Receptor Associated Factor 2 GTP-Binding Rho-Like Protein Appendix A: Molecular Recognition Features (or Elements) and their partners 47 48 49 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1dtdb 1e0ab Metallocarboxypeptidase Inhibitor Serine/Threonine-Protein Kinase Pak-Alpha ATP Synthase Epsilon Chain Chymotrypsin/Elastase Isoinhibitor 1 Dihydrolipoamide Acetyltransferase Peptide Mtf-E (13N3E) P81511 P35465 SW SW 1-61 3-46 20-80 75-118 MoRE partner PDB ID 1dtda 1e0aa P05632 P07851 SW SW 1-47 1-61 2-48 1-61 1e79h 1eaib Lipase G25K GTP-Binding Protein, Placental Isoform ATP Synthase Delta Chain Elastase P11961 SW 1-41 130-170 1ebdb Dihydrolipoamide Dehydrogenase P05504 SW 1-13 29-41 1ed3d Nucleoplasmin Beta-Dystroglycan Eukaryotic Translation Initiation Factor 4E B Eukaryotic Initiation Factor 4Gii Fmdv Peptide P05221 Q14118 NP_004086 SW SW GB 1-19 1-13 1-14 153-171 882-894 51-64 1ee6a 1eg4a 1ej4a Class I Major Histocompatibility Antigen Rt1 Pectate Lyase Dystrophin Eukaryotic Initiation Factor 4E AAC02903 AAA42624 GB GB 1-14 1-13 622-635 136-148 1ejhc 1ejoh P98072 P25054 SW SW 1-7 1-16 788-794 2034-2049 1ekbb 1emua AAC53151 GB 1-12 687-698 3erdb Estrogen Receptor Alpha P22289 SW 4-58 1ezvx Heavy Chain (Vh) Of Fv-Fragment 1f02t 1f3jp Enteropeptidase Adenomatous Polyposis Coli Protein Glucocorticoid Receptor Interacting Protein 1 Ubiquinol-Cytochrome C Reductase Complex 7.3 Translocated Intimin Receptor Lysozyme C Eukaryotic Initiation Factor 4E Igg2A Monoclonal Antibody (Heavy Chain) Enteropeptidase Axin AAC38390 P00698 GB SW 2-66 
1-14 272-336 29-42 1f02i 1f3jd 1f47a 1f4vd 1f83b Cell Division Protein Zipa Flagellar Motor Switch Protein Synaptobrevin-II P06138 P06974 P19065 SW SW SW 1-17 1-16 1-24 367-383 1-16 53-76 1f47b 1f4vc 1f83a Intimin H-2 Class II Histocompatibility Antigen Cell Division Protein Ftsz Chemotaxis Chey Protein Botulinum Neurotoxin Type B 1e79i 1eaic 50 1ebdc 51 1ed3c 52 53 54 55 56 57 58 1ee5b 1eg4p 1ej4b 1ejhe 1ejop 1ekba 1emub 59 3erdc 60 1ezvi 61 62 63 64 65 66 33 MoRE partner PDB Name Appendix A: Molecular Recognition Features (or Elements) and their partners 67 68 69 70 71 72 73 74 75 76 77 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1f8vd 1f93e Mature Capsid Protein Gamma Hepatocyte Nuclear Factor 1Alpha Ribosomal Protein L24E Ecotin Beta-Acrosin Light Chain 30S Ribosomal Protein S14 30S Ribosomal Protein Thx Coagulation Factor Xa Elafin Retinal Rod Rhodopsin-Sensitive Cgmp 3',5'- C Cyclin A/Cdk2-Associated P19 Myelin Basic Protein AAF71693 P22361 GB SW 1-26 1-31 362-401 1-31 MoRE partner PDB ID 1f8va 1f93d P14116 AAA16410 P08001 P24320 P32193 P00742 P19957 P04972 SW GB SW SW SW SW SW SW 1-53 1-48 1-13 1-60 1-24 1-52 1-47 1-38 4-56 122-169 20-32 1-60 2-25 127-178 71-117 50-87 1ffkt 1fi8b 1fiza 1fjgm 1fjgt 1fjsa 1flee 1fqjd AAC50242 AAC41944 GB GB 1-41 1-20 109-149 111-130 1fs1d 1fv1d CAA67686 GB 1-41 2-52 1g3jc AAA64465 P01062 GB SW 1-24 1-22 140-163 10-31 1g5ja 1g9ie P03652 SW 1-12 14-25 1gff2 1gg6a 1gl0i 1gl1i 1gngx 1h15c Tcf3-Cbd (Catenin Binding Domain) Bad Protein Bowman-Birk Type Trypsin Inhibitor Bacteriophage G4 Capsid Proteins Gpf, Gpg, Gp Gamma Chymotrypsin Protease Inhibitor Lcmi I Protease Inhibitor Lcmi II Frattide DNA Polymerase P00766 P80060 P80060 Q92837 P03198 SW SW SW SW SW 1-10 1-32 1-34 1-26 1-14 1-10 22-53 58-91 198-223 628-641 1gg6b 1gl0e 1glle 1gngb 1h15d 1h2ls Hypoxia-Inducible Factor 1 Alpha Q16665 SW 1-22 795-822 1h21a 1ffkr 1fi8d 1fizl 1fjgn 1fjgv 1fjsl 1flei 1fqjc 1fs1a 1fv1c 78 1g3jb 79 80 1g5jb 
1g9ii 81 1gff3 82 83 84 85 86 87 88 34 MoRE partner PDB Name Mature Capsid Protein Beta Dimerization Cofactor Of Hepatocyte Nucl Ribosomal Protein L30 Natural Killer Cell Protease 1 Beta-Acrosin Heavy Chain 30S Ribosomal Protein S13 30S Ribosomal Protein S20 Coagulation Factor Xa Elastase Chimera Of Guanine NucleotideBinding Protei Cyclin A/Cdk2-Associated P45 Major Histocompatibility Complex Alpha Chain Beta-Catenin Armadillo Repeat Region Apoptosis Regulator Bcl-X Trypsinogen, Cationic Bacteriophage G4 Capsid Proteins Gpf, Gpg, G Gamma Chymotrypsin Chymotrypsinogen A Alpha-Chymotrypsin Glycogen Synthase Kinase-3 Beta Hla Class II Histocompatibility Antigen Factor Inhibiting Hif1 Appendix A: Molecular Recognition Features (or Elements) and their partners 89 90 91 92 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1h2sb 1h25e Sensory Rhodopsin II Transducer Retinoblastoma-Associated Protein Cellular Tumor Antigen P53 Retinoblastoma-Like Protein 1 Cyclin-Dependent Kinase Inhibitor 1B Cytotoxic T-Lymphocyte Protein 4 Bacteriophage T4 Short Tail Fibre Caat/Enhancer Binding Protein Beta P-Selectin Peptide P42259 P06400 SW SW 1-60 1-10 23-82 869-879 P04637 P28749 P46527 SW SW SW 1-9 1-10 1-6 P16410 SW P10930 P17676 Myelin Basic Protein Cytochrome C Oxidase Polypeptide IV Hirudin Variant-1 Melanoma-Associated Antigen 4 S-Adenosylmethionine Decarboxylase Beta Chain Mhc Class II I-Ak Cathepsin L: Light Chain DNA Ligase IV Ca2+/Calmodulin Dependent Kinase Kinase Importin Alpha-2 Subunit Epidermal Growth Factor Holliday Junction DNA Helicase Ruva 1h26e 1h28e 1h27e 93 1h6ep 94 95 1h6wb 1h89a 96 1hesp 97 98 99 100 101 102 103 104 105 106 107 108 109 1hqrc 1hr8o 1hxei 1i4fc 1i72b 1iakp 1icfb 1ik9c 1iq5b 1iq1a 1ivoc 1ixsa MoRE partner PDB ID 1h2sa 1h25a MoRE partner PDB Name 378-386 654-663 25-35 1h26d 1h28d Cyclin A2 Cyclin A2 Cyclin A2 1-10 197-206 1h6ea SW SW 1-10 1-64 518-527 273-336 1h6wa 1h89c Clathrin Coat Assembly Protein Ap50 
Bacteriophage T4 Short Tail Fibre Myb Proto-Oncogene Protein P16109 SW 1-9 814-822 1hesa AAC41944 P04037 GB SW 1-10 1-13 114-123 7-19 1hqrb 1hr8h P01050 P43358 P17707 SW SW SW 1-10 1-10 1-61 55-64 230-239 4-67 1hxeh 1i4fb 1i72a P24364 P07711 XP_007098 T37317 SW SW GB PIR 1-13 1-42 1-28 1-24 50-62 292-333 755-782 334-357 1iakb 1icfc 1ik9b 1iq5b Clathrin Coat Assembly Protein Ap50 Hla-Dr Beta Chain Mitochondrial Processing Peptidase Beta Subunit Thrombin Beta-2-Microglobulin S-Adenosylmethionine Decarboxylase Alpha Chain Mhc Class II I-Ak Cathepsin L: Heavy Chain DNA Repair Protein Xrcc4 Calmodulin P52293 P01133 Q9F1Q3 SW SW SW 1-7 1-47 1-50 47-53 975-1021 142-191 1iq1c 1ivob 1ixsb Importin Alpha-2 Subunit Epidermal Growth Factor Receptor Ruvb 35 Sensory Rhodopsin II Cyclin A2 Appendix A: Molecular Recognition Features (or Elements) and their partners 110 111 112 113 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1izlf 1izlk 1j5am 1jacb 1jb0i Photosystem II: Subunit Psbf Photosystem II: Subunit Psbk Ribosomal Protein L32 Jacalin Photosystem 1 Reaction Centre Subunit Viii Photosystem 1 Reaction Centre Subunit Xii Cell Death Protein Grim Head Involution Defective Protein C-Type Natriuretic Peptide NP_682332 Q9F1K9 P49228 P18671 P25900 GB SW SW SW SW 1-30 1-27 1-58 1-15 1-38 14-43 14-40 2-59 4-18 1-38 MoRE partner PDB ID 1izld 1izld 1j5al 1jaca 1jb0f P25903 SW 1-31 1-31 1jb0l AAC47727 AAA79985 P23582 GB GB SW 1-8 1-8 1-18 2-9 2-9 109-126 1jd5a 1jd6a 1jdpb Neuroserpin Neuroserpin Insulin B Peptide Mitogen-Activated Protein Kinase Kinase 2 Splicing Factor U2Af 65 Kda Subunit Protein Mu-1 Agglutinin Cation-Independent Mannose 6Phosphate Recept Adenomatous Polyposis Coli Protein Elongation Factor G Crk Cation-Independent Mannose-6Phosphate Recept O35684 O35684 CAA08766 NP_109587 SW SW GB GB 1-40 1-31 1-13 1-16 25-64 367-397 35-47 1-16 1jjoc 1jjoc 1jk8b 1jkya P26368 SW 1-23 90-112 1jmta AAA47236 P18676 P11717 GB SW SW 1-33 1-16 1-8 10-42 
2-17 2484-2491 1jmub 1jota 1jplb P25054 SW 1-11 1021-1031 1jppb P13551 Q64010 P11717 SW SW SW 1-32 1-12 1-7 220-251 217-228 2485-2491 1jqra 1ju5a 1jwgb 114 1jb0m 115 116 117 118 119 120 121 1jd5b 1jd6b 1jdph 1jjoa 1jjoe 1jk8c 1jkyb 122 1jmtb 123 124 125 1jmua 1jotb 1jple 126 1jppc 127 128 129 130 1jqsb 1ju5b 1jwgc 36 MoRE partner PDB Name Photosystem II: Subunit Psbd Photosystem II: Subunit Psbd Ribosomal Protein L22 Jacalin Photosystem 1 Reaction Centre Subunit III Photosystem 1 Reaction Centre Subunit Xi Apoptosis 1 Inhibitor Apoptosis 1 Inhibitor Atrial Natriuretic Peptide Clearance Recepto Neuroserpin Neuroserpin Mhc Class II Hla-Dq8 Lethal Factor Splicing Factor U2Af 35 Kda Subunit Protein Mu-1 Agglutinin ADP-Ribosylation Factor Binding Protein Beta-Catenin DNA Polymerase Beta-Like Crk ADP-Ribosylation Factor Binding Protein Gga1 Appendix A: Molecular Recognition Features (or Elements) and their partners MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1k2dp Myelin Basic Protein Peptide With 8 Residue Insulin Receptor Substrate 1 XP_040888 GB 1-8 2-9 MoRE partner PDB ID 1k2db P35568 SW 1-8 894-901 1k3aa Dipeptydil-Peptidase I Heavy Chain Steroid Receptor Coactivator-1 Peptide N-Y-C Ps1 Peptide Regulator Of G-Protein Signaling 14 Nuclear Receptor Co-Repressor 2 P53634 SW 1-69 395-463 1k3bb AAB50242 Q13291 AAK97192 O08773 GB SW GB SW 1-10 1-12 1-15 1-35 687-696 275-286 17-31 496-530 1k4wa 1ka7a 1kcrh 1kjyc Q9Y618 SW 1-19 2339-2357 1kkqd Nuclear Pore Complex Protein Nup98 Amphiphysin Traf Family Member-Associated Nf-Kappa-B Acti Traf Family Member-Associated Nf-Kappa-B Acti Outer Membrane Virulence Protein Yope General Control Protein Gcn4 Map Kinase Kinase 3B Eukaryotic Protein Synthesis Initiation Facto Plasma Serine Protease Inhibitor Fibrinogen Alpha/Alpha-E Chain Oligopeptide Substrate For The Protease P52948 SW 1-6 882-887 1ko6a P49418 Q92844 SW SW 1-9 1-11 322-330 177-187 1ky7a 1kzza Nuclear Receptor Ror-Beta Sh2 Domain 
Protein 1A Pc283 Immunoglobulin Guanine Nucleotide-Binding Protein G(I) Peroxisome Proliferator Activated Receptor Nuclear Pore Complex Protein Nup98 Alpha-Adaptin C Tnf Receptor Associated Factor 3 Q92844 SW 1-17 178-194 1l0aa Tnf Receptor Associated Factor 3 P08008 SW 1-57 22-78 1l2wh Yope Regulator P03069 AAB40652 AAC82471 SW GB GB 1-28 1-8 1-22 250-277 22-29 138-159 1ld4d 1leza 1lj2a P05154 P02671 P04517 SW SW SW 1-29 1-65 1-10 378-406 145-209 2785-2794 1lq8c 1lt9b 1lvba Coat Protein C Mitogen-Activated Protein Kinase 14 Nonstructural RNA-Binding Protein 34 Plasma Serine Protease Inhibitor Fibrinogen Beta Chain Catalytic Domain Of The Nuclear Inclusio 131 1k3ab 132 1k3bc 133 134 135 136 1k4wb 1ka7b 1kcrp 1kjyb 137 1kkqe 138 1ko6b 139 140 1ky7p 1kzzb 141 1l0ab 142 1l2wi 143 144 145 146 147 148 149 1ld4e 1lezb 1lj2c 1lq8b 1lt9a 1lvbc 37 MoRE partner PDB Name H-2 Class II Histocompatibility Antigen Insulin-Like Growth Factor 1 Receptor Dipeptydil-Peptidase I Light Chain Appendix A: Molecular Recognition Features (or Elements) and their partners 150 151 152 153 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1lw6i Subtilisin-Chymotrypsin Inhibitor2A Jacalin, Beta Chain Nuclear Receptor Coactivator 2 Integrin Beta3 Transforming Growth Factor Alpha Serine Proteinase Inhibitor (Serpin), Chain B P-Glycoprotein Target Sequence Of Rat Calmodulin-Dependent Protein Kinase I U4/U6 Snrnp 60Kda Protein Peptide Corresponding To The NTerminal Exten Pyruvoyl-Dependent Arginine Decarboxylase Bet Iq2 and Iq3 Motifs From Myo2P, A Class V Myos Enoyl-Acyl Carrier Reductase Transcription Initiation Factor Iia Large Chain Nitric-Oxide Synthase, Endothelial Retinoic Acid Receptor Steroid Receptor Coactivator-1 Nuclear Receptor Coactivator 1 Isoform 3 Glutamate Decarboxylase P01053 SW 1-63 AAA32678 Q15596 AAA67537 P01135 GB SW GB SW ZP_00059457 1m26b 1m2zb 1mk7a 1moxc 154 1mtpb 155 156 157 158 1mvup 1mxee 1mzwb 1n12b 159 1n13a 160 1n2dc 161 162 1nhdc 
1nh2b 163 1niwb 164 165 166 167 168 2nlla 1nq7b 1nrlc 1nwdb MoRE partner PDB Name 21-83 MoRE partner PDB ID 1lw6e 1-15 1-21 3-13 1-49 64-78 734-754 739-749 41-89 1m26c 1m2zd 1mk7b 1moxb Jacalin, Alpha Chain Glucocorticoid Receptor Talin Epidermal Growth Factor Receptor GB 1-35 385-419 1mtpa AAA37004 Q63450 GB SW 1-13 1-25 1210-1222 294-318 1mvub 1mxeb Serine Proteinase Inhibitor (Serpin), Chain Ig Vdj-Region (Heavy Chain) Calmodulin O43172 P42190 SW SW 1-31 1-11 107-137 22-32 1mzwa 1n12a U-Snrnp-Associated Cyclophilin Mature Fimbrial Protein Pape Q57764 SW 1-46 7-52 1n13b P19524 SW 1-48 806-853 1n2db Pyruvoyl-Dependent Arginine Decarboxylas Myosin Light Chain AAK25802 P32773 GB SW 1-60 1-46 366-425 3-48 1nhdb 1nh2a Enoyl-Acyl Carrier Reductase Transcription Initiation Factor Tfiid P29474 SW 1-19 492-510 1niwc Calmodulin P19793 AAB50242 NP_671766 SW GB GB 1-66 1-10 1-15 135-200 687-696 682-696 2nllb 1nq7a 1nrlb Thyroid Hormone Receptor Nuclear Receptor Ror-Beta Orphan Nuclear Receptor Pxr Q07346 SW 3-28 470-495 1nwda Calmodulin 38 Subtilisin Bpn Appendix A: Molecular Recognition Features (or Elements) and their partners MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1nx0c Calpastatin P49342 SW 1-11 230-240 MoRE partner PDB ID 1nw0b 1nx1c Calpastatin P49342 SW 1-11 230-240 1nx1b 1occm 1oc0b 1oj5b Cytochrome C Oxidase Vitronectin Signal Transducer and Activator Of Transcript Restricted Expression Proliferation Associate Splicing Factor Sf1 P10175 P04004 P42226 SW SW SW 1-43 1-37 1-14 25-67 22-58 795-808 1occn 1oc0a 1oj5a Calcium-Dependent Protease, Small Subunit Calcium-Dependent Protease, Small Subunit Cytochrome C Oxidase Plasminogen Activator Inhibitor-1 Steroid Receptor Coactivator 1A Q9ULW0 SW 1-30 7-43 1ol5a Serine/Threonine Kinase 6 CAA03883 GB 1-13 13-25 1opia Flagellin Nuclear Receptor Coactivator 2 Flavocytochrome B558 Alpha Polypeptide Chymotrypsinogen A Copii-Binding Peptide Of The Integral Membran Histone H3 
Retinoblastoma-Associated Protein Histone-Binding Protein N1/N2 Histone H3 Aspartate 1-Decarboxylase Beta Chain Outer Membrane Phospholipase (Ompla) Gh-Loop From Virus Capsid Protein Vp1 O67803 Q61026 NP_000092 SW SW GB 1-40 1-12 1-11 479-518 741-752 150-160 1orya 1osvb 1ov3b Splicing Factor U2Af 65 Kda Subunit Flagellar Protein Flis Bile Acid Receptor Neutrophil Cytosol Factor 1 P00766 Q01590 SW SW 1-14 1-10 16-29 201-210 1oxga 1pd0a Chymotrypsinogen A Protein Transport Protein Sec24 P02303 P06400 SW SW 1-7 3-19 8-14 860-876 1pega 1pjmb Histone H3 Methyltransferase Dim-5 Importin Alpha-2 Subunit P06180 P02303 P31664 SW SW SW 1-20 1-15 1-24 532-552 8-22 1-24 1pjnb 1pu9b 1pyub P00631 SW 1-13 33-45 1qd6c AAA42665 GB 1-24 133-156 1qgc4 Importin Alpha-2 Subunit Hat A1 Aspartate 1-Decarboxylase Alfa Chain Outer Membrane Phospholipase (Ompla) Inmunoglobuline 169 170 171 172 173 1ol5b 174 1opib 175 176 177 178 179 180 181 182 183 184 1oryb 1osvc 1ov3c 1oxgb 1pd0b 1pegp 1pjma 1pjna 1pu9b 1pyua 185 1qd6a 186 1qgc5 187 39 MoRE partner PDB Name Appendix A: Molecular Recognition Features (or Elements) and their partners 188 189 190 191 192 193 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1qgkb 1qled Importin Alpha-2 Subunit Ccytochrome C Oxidase P52292 P77921 SW SW 1-44 1-43 11-54 8-49 MoRE partner PDB ID 1qgka 1qlec 1qsnb 1r0tb 1r1rd Histone H3 Ovomucoid Ribonucleotide Reductase R2 Protein Nuclear Receptor Co-Repressor 2 Peroxisome Proliferator-Activated Receptor Bi Within The Bgcn Gene Intron Protein Flap Structure-Specific Endonuclease Rhodopsin Peptide From Collagen II Transcription Initiation Factor Iid 230K Chai Tissue Factor Pathway Inhibitor Heat Labile Enterotoxin Type Iib Nucleoporin Nup2 Potential Transcriptional Repressor Not4Hp Hirudin P53 Marcks Iq4 Motif From Myo2P, A Class V Myosin Parathyroid Hormone-Related Protein P02303 P01004 P00453 SW SW SW 1-11 1-62 1-16 10-20 65-126 361-375 1qsna 1r0ta 1rlrb Importin Beta Subunit 
Cytochrome C Oxidase Polypeptide III Tgcn5 Histone Acetyl Transferase Trypsin Ribonucleotide Reductase R1 Protein Q9Y618 Q15648 SW SW 1-17 1-11 1414-1430 640-650 1r2bb 1rk3a B-Cell Lymphoma 6 Protein Vitamin D3 Receptor P82804 SW 1-33 3-35 1rk8b Mago Nashi Protein O29975 SW 1-11 326-336 1rxza DNA Polymerase Sliding Clamp O62798 P02458 A47371 SW SW PIR 1-18 2-11 1-67 50-67 1169-1178 11-77 1ry1u 2sebd 1tbab Signal Recognition Particle Protein Enterotoxin Type B Transcription Initiation Factor Tfiid P10646 P43528 P32499 O95628 SW SW SW SW 1-58 1-36 1-16 1-52 121-178 215-250 36-51 12-63 1tfxb 1tiia 1un0b 1ur6a P28507 AAA59989 P26645 P19524 SW GB SW SW 1-15 1-11 1-18 1-25 51-65 17-27 148-165 854-878 1vith 1ycqa 1iwqa 1m46a Trypsin Heat Labile Enterotoxin Type Iib Importin Alpha Subunit Ubiquitin-Conjugating Enzyme E217 Kda 2 Alpha Thrombin Mdm2 Calmodulin Myosin Light Chain P12272 SW 1-28 103-130 1m5ns Importin Beta-1 Subunit 1r2bc 1rk3c 194 1rk8c 195 1rxzb 196 197 198 199 200 201 202 203 204 205 206 1ry1s 2sebe 1tbaa 1tfxc 1tiic 1un0c 1ur6b 1viti 1ycqb 1iwqb 1m46b 207 1m5nq 208 40 MoRE partner PDB Name Appendix A: Molecular Recognition Features (or Elements) and their partners 209 210 211 212 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1m93a 1mqsb 1n0wb Serine Proteinase Inhibitor 2 Integral Membrane Protein Sed5 Breast Cancer Type 2 Susceptibility Protein Transcription Factor E2F2 Glycogen Synthase Kinase-3 Beta P07385 Q01590 P51587 SW SW SW 1-46 1-21 1-33 1-46 1-21 1519-1551 MoRE partner PDB ID 1m93b 1mqsa 1n0wa Q14209 P49841 SW SW 1-18 1-10 410-427 3-12 1n4mb 1o6ka Genome Polyprotein Capsid Protein C 16-Mer Peptide From Intercellular Adhesion Mo E2F-1 Transcription Factor P29846 SW 1-16 25-40 1n64h Serine Proteinase Inhibitor 2 Sly1 Protein DNA Repair Protein Rad51 Homolog 1 Retinoblastoma Pocket Rac-Beta Serine/Threonine Protein Kinase Fab 19D9D6 Heavy Chain P35330 SW 1-16 253-268 1j19a Radixin Q01094 SW 1-18 409-426 1o9kh 
Axin Peptide ADP-Ribosylation Factor Binding Protein Gga1 Transcription Initiation Factor Iia Alpha Cha Tumor Necrosis Factor Receptor Superfamily Meber Tumor Necrosis Factor Receptor Superfamily Member Cbp/P300-Interacting Transactivator 2 Rabaptin-5 O15169 Q9UJY5 SW SW 1-18 1-41 383-400 168-208 1o9ua 1j2ja Retinoblastoma Tumour Suppressor Protein Glycogen Synthase Kinase-3 Beta ADP-Ribosylation Factor 1 P52655 SW 1-43 9-51 1nvpa Tata Box Binding Protein Q02223 SW 1-39 8-46 1oqdj Q96RJ3 SW 1-31 16-46 1oqej Q99967 SW 1-52 193-259 1p4qb Tumor Necrosis Factor Ligand Superfamily Member Tumor Necrosis Factor Ligand Superfamily Member E1A-Associated Protein P300 Q15276 SW 1-6 440-445 1p4ua Preprotein Translocase Seca Subunit Nuclear Receptor Coactivator 2 Bcl2-Like Protein 11 Large T Antigen P43803 SW 1-24 876-899 1ozbh ADP-Ribosylation Factor Binding Protein Gga3 Protein-Export Protein Secb Q15596 AAC40030 P03070 SW GB SW 1-9 1-33 1-7 743-751 83-115 127-133 1p93b 1pqla 1qltc Glucocorticoid Receptor Apoptosis Regulator Bcl-X Importin Alpha-2 Subunit 1n4mc 1o6kc 213 1n64p 214 1j19b 215 1o9kp 216 217 1o9ub 1j2jb 218 1nvpb 219 1oqdk 220 1oqek 221 1p4qa 222 1p4ub 223 1ozbi 224 225 226 227 1p93e 1pq1b 1q1ta 41 MoRE partner PDB Name Appendix A: Molecular Recognition Features (or Elements) and their partners 228 229 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1ujjc C-Terminal Peptide From BetaSecretase Nedd2-Like Caspase Cg8091-Pa Cytochrome B6-F Complex IronSulfur Subunit Golgi Autoantigen, Golgin Subfamily A Member R18 Peptide (Phcvprdlswldleanmclp) Peptide L-Pro10 P56817 SW 1-7 495-501 MoRE partner PDB ID 1ujjb NP_524017 P49728 GB SW 1-8 1-39 115-122 33-71 1q4qj 1q90d Q13439 SW 1-51 2172-2222 1r4aa NF00163012 PIR 1 - 20 1 - 20 1a38b ADP-Ribosylation Factor Binding Protein Gga1 Apoptosis 1 Inhibitor Cytochrome B6-F Complex Subunit 4 ADP-Ribosylation Factor-Like Protein 1 14-3-3 Protein Zeta O73683 Q9JJN2 SW SW 1 – 10 1- 10 1aqcb 1awib 
X11 Profilin Q8JNV2 SW 1 – 67 1aym3 Human Rhinovirus 16 Coat Protein CAB58569 NF00375716 NF00514021 P82107 NF00110208 EMBL PIR PIR SW PIR 1 - 10 1 - 10 1 - 10 1 - 59 1 - 44 459 - 468 1 - 10 1 - 10 1 - 59 1 - 44 1biib 1bjre 1bogb 1c9pa 1cqtb 1dxpc 1e0fi 1eb1a 1ebpc 2h1pp 1heze 1hh6c Human Rhinovirus 16 Coat Protein Decameric Peptide Lactoferrin Peptide Bdellastasin Pou Domain, Class 2, Associating Factor 1 Nonstructural Protein Ns4A (P4) Haemadin Peptide Inhibitor Epo Mimetics Peptide 1 Pa1 Protein L Pep-4 766 - 775 3098 - 3017, 3099 - 3018 2 - 69 NF00235394 Q25163 NF00866356 CAD13109 NF00522862 NF00429845 NF00505422 PIR SQ PIR EMBL PIR PIR PIR 1 - 16 1- 45 1 - 10 1 – 19 1 – 12 1 – 60 1 – 11 1 - 16 21 - 65 1 - 10 234 - 253 1 -12 1 - 60 1 -11 1dxpb 1eoff 1eb1h 1epbh 2h1ph 1hezd 1hh6b 2hrpp Hiv-1 Protease Peptide Q8ADZ9 SW 1 - 10 524 - 533 2hrpm Beta-2 Microglobulin Proteinase K Antibody (Cb 4-1) Trypsin Pou Domain, Class 2, Transcription Factor 1 Protease/Helicase Ns3 (P70) Thrombin Thrombin Heavy Chain Epo Receptor 2H1 Heavy Chain Of Ig Igg2A Kappa Antibody Cb41 (Heavy Chain) Monoclonal Antibody F11.2.32 1q4qk 1q90r 230 1r4ae 231 1a38p 232 233 1aqcc 1awip 234 1aym4 235 236 237 238 239 240 241 242 243 244 245 246 247 248 1biip 1bjri 1bogc 1c9pb 1cqti 42 MoRE partner PDB Name Appendix A: Molecular Recognition Features (or Elements) and their partners 249 250 251 252 253 254 255 256 257 258 259 260 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1ir3b 1juqe Peptide Substrate Cation-Dependent Mannose-6Phosphate Receptor Endo-1,4-Beta-Xylanase Y Gp120 Talin Strep-Tag II Peptide Two Chain Tissue Plasminogen Activator Sfti-1 Peptide Epitope Ste-20 Related Adaptor Orexin NF00103224 P20645 PIR SW 1 – 18 1 - 10 1 - 18 265 - 277 MoRE partner PDB ID 1ir3a 1juqa P16218 CAA00727 Q8AWI0 CAC22716 NF00107636 SW EMBL SW EMBL PIR 2 - 56 1 -16 1 -26 1 - 10 1 - 17 832 - 891 316 - 333 1944 - 1969 650 - 659 1 - 17 1ohza 1qnzh 1rkca 1rsub 1rtfb NF00227071 
NP_877418 GI:33303905 NF01572587 PIR GB GB PIR 1 - 14 1 – 13 1 - 10 1 - 33 1 - 14 179 - 191 391 - 402 32 - 63 1sfia 1sm3h 1upka 1uvqb 13-Mer Peptide Fusion Protein Consisting Of Transforming Pro Igg1 Fab Fragment (59.1) Complexed With Hiv-1 Alpha1 Antichymotrypsin - Chain B c-AMP-Dependent Protein Kinase (E.C. 2.7.1.37) C-Myc Tag and His Tag Modified Alpha=1=-Antitrypsin (Modified Alpha Tyrosine Phosphatase Syp (NTerminal Sh2 Domain) Tyrosine Phosphatase Syp (NTerminal Sh2 Domain) O73683 NF01479846 SW PIR 1 - 13 1 - 11 764 - 776 1 - 11 1x11a 1n4pl NF00927552 PIR 2 -25 1 - 23 1acyh Q9UNU9 SW 1 - 40 368 - 407 2acha GI:530223 GB 1 -20 7 - 26 1apme GI:28474948 GI:22207050 GB GB 1 -17 1 -36 364 - 380 613 - 648 2ap2d 7apia GI:189730 GB 1 - 11 1006 - 1016 1ayaa Q28224 SW 1 - 12 901 - 912 1ayba 1ohzb 1qnzp 1rkcb 1rsup 1rtfa 1sfii 1sm3p 1upkb 1uvqc 1x11c 1n4pm 261 1acyp 262 2achb 263 1apmi 264 265 2ap2e 7apib 266 1ayap 267 1aybp 268 43 MoRE partner PDB Name Insulin Receptor ADP-Ribosylation Factor Binding Protein Cellulosomal Scaffolding Protein A 0.5B Antibody (Heavy Chain) Vinculin Streptavidin Two Chain Tissue Plasminogen Activator Trypsin Sm3 Antibody Mo25 Protein Hla Class II Histocompatibility Antigen X11 Geranyltransferase Type-I Beta Subunit Igg1 Fab Fragment (59.1) Complexed With HivAlpha1 Antichymotrypsin - Chain A c-AMP-Dependent Protein Kinase (E.C. 2.7.1.3 Antibody (Heavy Chain) Modified Alpha=1=-Antitrypsin (Modified Alph Tyrosine Phosphatase Syp (NTerminal Sh2 Domain) Tyrosine Phosphatase Syp (NTerminal Sh2 Domain) Appendix A: Molecular Recognition Features (or Elements) and their partners 269 270 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1aycp Tyrosine Phosphatase Syp (NTerminal Sh2 Doma Cricket Paralysis Virus, Vp4 Calmodulin (Calcium-Bound) Complexed With Rab Black Beetle Virus Capsid Protein (Bbv) Compl Hirudin I Bacteriophage Phix174 Capsid Proteins Gpf, Gp Peptide Trypsin (E.C. 
3.4.21.4) Variant (D189G, G226D Grip1 Hla-Dr2 His Tag Calmodulin Complexed With Calmodulin-Binding Calmodulin Complexed With Calmodulin-Binding Antigen Bound Peptide NF00959209 PIR 1 - 11 1 - 11 MoRE partner PDB ID 1aycp Q9IJX3 XP_130630 SW GB 1 - 57 1 – 16 1 - 57 646 - 671 1b35c 2bbma P04329 SW 1 – 44 364 - 407 2bbvb GI:2297640 NF00701276 GB PIR 1 - 10 1 – 37 383 - 394 1 - 37 1bmmh 2bpa2 Q7KZ97 NF00704918 SW PIR 1 - 10 1 - 56 413 - 424 1 - 58 1br8i 1brbe NF00126022 NF00516257 GI:5359489 P11799 PIR PIR GB SW 1 - 13 1 -15 1 - 12 1 -20 1 - 13 240 - 254 1 - 12 1730 - 1749 1bsxb 1bx2d 1c3qa 1cdla Q9Y2H4 SW 1 - 25 339 - 363 1cdma NF00531311 PIR 1 - 11 1 - 11 1cfsb Alpha-Chymotrypsinogen Complex With Human Pan Alpha-Chymotrypsin (E.C. 3.4.21.1) Complex Wi Proline Peptide Ribonuclease S Rat Ca2+/Calmodulin Dependent Protein Kinase NF00086129 PIR 1 - 56 1 - 56 1cgie C31444 PIR 1 - 56 1 - 56 1choe Q9Y6V0 GI:15984328 Q64572 SW GB SW 1 - 14 1 - 15 1 - 16 2336 - 2350 173 - 187 438 - 463 1cjfa 1cjqb 1ckka 1b35d 2bbmb 271 2bbvd 272 273 274 275 276 277 278 279 1bmmi 2bpa3 1br8p 1brbi 1bsxx 1bx2c 1c3qx 1cdle 280 1cdmb 281 1cfsc 282 1cgii 283 1choi 284 285 286 287 1cjfc 1cjqa 1ckkb 44 MoRE partner PDB Name Tyrosine Phosphatase Syp (NTerminal Sh2 Dom Cricket Paralysis Virus, Vp3 Calmodulin (Calcium-Bound) Complexed With Ra Black Beetle Virus Capsid Protein (Bbv) Comp Alpha-Thrombin Bacteriophage Phix174 Capsid Proteins Gpf, G Antithrombin-III Trypsin (E.C. 3.4.21.4) Variant (D189G, G226 Thyroid Hormone Receptor Beta Hla-Dr2 Hydroxyethylthiazole Kinase Calmodulin Complexed With Calmodulin-Binding Calmodulin Complexed With Calmodulin-Binding Igg2A Kappa Antibody Cb41 (Heavy Chain) Alpha-Chymotrypsinogen Complex With Human Pa Alpha-Chymotrypsin (E.C. 
3.4.21.1) Complex W Human Platelet Profilin Ribonuclease S Calmodulin Appendix A: Molecular Recognition Features (or Elements) and their partners 288 289 290 291 292 293 294 295 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 2ck0p 2clrc 11-Mer Human Class I Histocompatibility Antigen (Hla Recognition Peptide N-Terminal Histidine Tag Conalbumin Peptide Numb Associate Kinase Factor Xiii Activation Peptide (28-37) 12-Mer Peptide Hla-Dr1 (Dra, Drb1 0101) Human Class II Histo Alpha-Thrombin (E.C. 3.4.21.5) Complex With ( Ba3-Type Cytochrome-C Oxidase Endothia Aspartic Proteinase (Endothiapepsin) Hiv-1 Gp120 NF01342382 P27797 PIR SW 1 - 11 1 - 10 1 -11 1 -10 MoRE partner PDB ID 2ck0h 2clrd NF00502045 NF00113306 NF00518514 Q9U485 GI:182837 PIR PIR PIR SW GB 1 - 10 1 - 14 1- 11 1 – 11 1 - 10 1 - 10 1 - 14 1 - 11 1439 - 1449 58-67 1cu4h 1d7qa 1d97k 1ddma 1de7k NF00691447 Q03909 PIR SW 1 - 12 1 - 13 1 - 12 327 - 339 1dkdb 1dlhb GI:2297640 GB 1 - 11 384 - 394 1dwbh P82543 NF00646473 SW PIR 1 - 33 1 - 10 2 - 34 1 - 10 1ehkb 4er4e NF00498472 PIR 1 - 11 1 - 11 2f58h Cyclic Peptide (Gp120) Immunoglobin Fc (Igg1) Complexed With Protein Beta-Acrosin Light Chain Igg2A Fab Fragment (C3) Complexed With Poliov Antagonist Peptide Af10847 Bisubstrate Peptide Inhibitor NF00528581 NF00155672 PIR PIR 1 - 10 1- 57 1 - 10 47-103 3f58h 1fcca Q9GL10 Q84865 SW SW 1 - 22 1 - 18 18 - 39 678 - 695 1fiwa 1fptl NF00130104 NF00138001 PIR PIR 1 - 21 1 - 13 1 - 21 1 - 13 1g0yr 1gaga Igg2A Fab Fragment (50.1) Complex With 16-Res NF00927547 PIR 2 - 17 2 - 17 1ggim 1cu4p 1d7qb 1d9kp 1ddmb 1de7a 1dkde 1dlhc 296 1dwbi 297 298 1ehkc 4er4i 299 2f58p 300 301 302 303 304 305 3f58p 1fccc 1fiwl 1fptp 1g0yi 1gagb 306 1ggip 307 45 MoRE partner PDB Name Immunoglobulin Human Class I Histocompatibility Antigen (Hl Fab Heavy Chain Translation Initiation Factor 1A Mhc I-Ak B Chain (Beta Chain) Numb Protein Alpha-Thrombin (Heavy Chain) Groel Hla-Dr1 (Dra, Drb1 0101) Human Class II Hist 
Alpha-Thrombin (E.C. 3.4.21.5) Complex With Ba3-Type Cytochrome-C Oxidase Endothia Aspartic Proteinase (Endothiapepsin Igg1 Fab 58.2 Antibody (Heavy Chaiin) Immunoglobulin Gamma I (58.2) Immunoglobin Fc (Igg1) Complexed With Protei Beta-Acrosin Heavy Chain Igg2A Fab Fragment (C3) Complexed With Polio Interleukin-1 Receptor, Type I Insulin Receptor, Tyrosine Kinase Domain Igg2A Fab Fragment (50.1) Complex With 16-Re Appendix A: Molecular Recognition Features (or Elements) and their partners MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db 1hagi GI:2297634 GB GI:50838420 GB P05619 1jgdc 1jn5c 1jpfc 1juip 1jycp 1jyip 1klqb Prethrombin2 (E.C. 3.4.21.5) Complexed With H Human Class I Histocompatibility Antigen (Hla Horse Leukocyte Elastase Inhibitor (Hlei) - C Mp-2 Alpha-Thrombin (E.C. 3.4.21.5) Complex With H Heat Labile Enterotoxin (Lt) Mutant With Val Hemagglutinin Ectodomain (Soluble Fragment, T Mp-2 Mp-1 Epidermal Growth Factor Receptor, Egfrviii Pe Igg1 Fab' Fragment (B13I2) Complex With Pepti Decameric Peptide Ligand From The Mart-1/Mela Peptide S10R Fg-Repeat Lcmv Peptidic Epitope Gp276 10-Mer Peptide 15-Mer Peptide 12-Mer Peptide Mad2-Binding Peptide 1klgc Triosephosphate Isomerase 308 1hhhc 309 1hleb 310 311 1hqqe 1hrti 312 1htlc 313 1htma 314 315 316 1hxlc 1hy2e 1i8ic 317 2igfp 318 1jf1c 319 320 321 322 323 324 325 326 327 MoRF start - end 389 - 398 MoRE partner PDB ID 1hage 1 - 10 237 - 246 1hhhb SW 1 - 31 349 - 379 1hlea NF01417697 GI:1568172 PIR GB 1 - 14 1 - 60 1 - 14 2 - 61 1hqqa 1hrth GI:412520 GB 1 -49 210 - 258 1htla GI:413463 GB 1 - 27 140 - 166 1htmb NF01324821 NF01324820 NF00926580 PIR PIR PIR 1 - 14 1 - 12 1 - 12 1 - 14 1 - 12 1 - 12 1hxlb 1hy2d 1i8ib P02247 SW 1 – 30 68 - 97 2igfh GI:32260204 GB 1 - 10 10 - 19 1jf1b NF01333403 Q86XD3 Q9WA79 NF01057159 NF01059711 NF01059710 NF00866633 PIR SW SW PIR PIR PIR PIR 1 - 10 1 - 10 1 - 11 1 - 10 1 - 15 1 - 12 1 - 12 1 - 10 1836 - 1847 281 - 291 1 - 10 1 - 15 1 - 12 1 - 12 1jgdb 1jn5b 1jpfb 1juid 1jycd 1jyid 
1klqa NF01019839 PIR 1 - 15 1 - 15 1klgd 46 Db match start - end MoRE partner PDB Name Prethrombin2 (E.C. 3.4.21.5) Complexed With Human Class I Histocompatibility Antigen (Hl Horse Leukocyte Elastase Inhibitor (Hlei Streptavidin Alpha-Thrombin (E.C. 3.4.21.5) Complex With Heat Labile Enterotoxin (Lt) Mutant With Val Hemagglutinin Ectodomain (Soluble Fragment, Streptavidin Streptavidin Epidermal Growth Factor Receptor Antibody Mr Igg1 Fab' Fragment (B13I2) Complex With Pept Beta-2-Microglobulin Beta-2-Microglobulin Tap Beta-2-Microglobulin Concanavalin A Concanavalin A Concanavalin A Mitotic Spindle Assembly Checkpoint Protein Enterotoxin Type C-3 Appendix A: Molecular Recognition Features (or Elements) and their partners MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end MoRE partner PDB ID MoRE partner PDB Name 1ktrm Peptide Peptide Linker GI:668327 GB 1 - 20 1ktrh Anti-His Tag Antibody 3D5 Variable Heavy Cha Minimized B-Domain Of Protein A Z34C Phosphopeptide Epq(Phospho)Yeeipiyl Myocyte-Specific Enhancer Factor 2A Hei-Toe I Thioredoxin Mutant With Cys 35 Replaced By Al Clip Peptide NF00945281 PIR 1 - 34 258 - 277 , 263 - 282, 268 287 1 - 34 116xa P03079 SW 1 - 11 321 - 331 1lcja Immunoglobulin Gamma-1 Heavy Chain Constant P56==Lck== Tyrosine Kinase Q03414 SW 1 - 12 308 - 319 1lewa Mitogen-Activated Protein Kinase 14 NF01188315 O13075 PIR SW 1 - 28 1 - 13 1 - 28 81 - 93 1mcva 1mdia NF01197858 PIR 1 - 36 1 - 36 1mujb P56488 SW 1 - 39 30 - 68 1nrnh XP_342878 GB 1 - 10 1078 - 1087 1ntva 1om9p Alpha-Thrombin (E.C. 3.4.21.5) Non-Covalently Apolipoprotein E Receptor-2 Peptide 15-Mer Peptide Fragment Of P56 Elastase 1 Thioredoxin Mutant With Cys 35 Replaced By A H-2 Class II Histocompatibility Antigen, A B Alpha-Thrombin (E.C. 
3.4.21.5) Non-Covalentl Disabled Homolog 1 Q9D8L5 SW 2 - 16 2 - 16 1om9a 1or8b Substrate Peptide XP_407876 GB 1 – 19 346 - 364 1or8a 1orhb Substrate Peptide Q7S480 SW 1 - 10 1orhb 1ou8c Synthetic Ssra Peptide NF01422527 PIR 1 – 11 466 - 475, 498 - 507 1 - 11 1ox1b 2pldb 11-Mer Peptide Phospholipase C-Gamma-1 (E.C. 3.1.4.11) (C-Te 50S Ribosomal Protein L9 NF01756611 GI:189730 PIR GB 1 - 11 1 - 10 1 - 11 918 - 1029 1ox1a 2plda NF01342110 PIR 1 - 52 1 - 52 1pnue 328 1l6xb 329 1lcjb 330 1lewb 331 332 1mcvi 1mdib 333 1mujc 334 1nrnr 335 1ntvb 336 337 338 339 340 341 342 343 1pnuf 47 1ou8b ADP-Ribosylation Factor Binding Protein Gga1 Protein Arginine NMethyltransferase 1 Protein Arginine NMethyltransferase 1 Stringent Starvation Protein B Homolog Trypsinogen, Cationic Phospholipase C-Gamma-1 (E.C. 3.1.4.11) (C-T 50S Ribosomal Protein L6 Appendix A: Molecular Recognition Features (or Elements) and their partners 344 345 346 347 348 349 MoRF PDB ID MoRF PDB Name MoRF Dbref MoRF Db MoRF start - end Db match start - end 1pnu4 1pwwc 1qc6c 50S Ribosomal Protein L36 Lf20 Phe-Glu-Phe-Pro-Pro-Pro-ProThr-Asp-Glu-Glu His Tag Fibrinopeptide B Consensus Fen-1 Peptide Alpha-I Gliadin Q9RSK0 NF01571451 S20887 SW PIR PIR 2 - 36 1 - 20 1 - 10 2 - 36 1 - 20 198-208 MoRE partner PDB ID 1pnu5 1pwwb 1qc6a GI:13275534 NF01479229 NF01571458 NF01683718 GI PIR PIR PIR 1 - 15 1 - 16 1 - 12 1 - 11 8 - 22 1 - 16 1 - 12 1 - 11 1qrjb 1r17b 1rxma 1s9vb Myosin (Regulatory Domain) Chain A Reaper Serine Proteinase B Complex With The Potato I Semisynthetic Ribonuclease A (RNase 1-118(Col Semisynthetic Ribonuclease A Mutant With Asp Semisynthetic Ribonuclease A Mutant With Asp Ribonuclease A (Residues 1 - 118) Complexed W Ribonuclease A (Semisynthetic) Crystallized F Igg1 Monoclonal Fab Fragment (Te33) Complex W Alpha-Thrombin (E.C. 
3.4.21.5) Complex With H Truncated Human Class I Histocompatibility An Theiler'S Murine Q17042 SW 1 -60 780 - 839 1scmb Q24475 P01080 SW SW 1 - 10 1 -51 2 - 11 55 - 105 1sdza 4sgbe GI:387884 GB 2 -15 143 - 156 1srna NF00159871 PIR 1 - 10 114 - 124 3srna NF00159749 PIR 1 - 10 114 - 124 3srna NF00945476 PIR 1 - 14 1 - 14 1ssaa GI:387884 GB 1 - 10 146 - 156 1ssca GI:209556 GB 1 - 15 78 - 92 1teth GI:490635 GB 1 - 13 67 - 79 1thrh Q99MQ0 SW 1 - 10 351 - 360 1tmcb NF01026525 PIR 1 - 31 1 - 31 1tmf3 1qrja 1r17c 1rxmb 1s9vc 350 1scma 351 352 1sdzb 4sgbi 353 1srnb 354 3srnb 355 4srnb 356 1ssab 357 1sscb 358 1tetp 359 1thri 360 1tmcc 361 362 1tmf4 48 MoRE partner PDB Name 50S Ribosomal Protein L1P Lethal Factor Evh1 Domain From Ena/Vasp-Like Protein Htlv-I Capsid Protein Fibrinogen-Binding Protein Sdrg DNA Polymerase Sliding Clamp Hla Class II Histocompatibility Antigen, Dq( Myosin (Regulatory Domain) - Chain B Apoptosis 1 Inhibitor Serine Proteinase B Complex With The Potato Semisynthetic Ribonuclease A (RNase 1-118(Co Semisynthetic Ribonuclease A Mutant With Asp Semisynthetic Ribonuclease A Mutant With Asp Ribonuclease A (Residues 1 - 118) Complexed Ribonuclease A (Residues 1 - 118) Complexed Igg1 Monoclonal Fab Fragment (Te33) Complex Alpha-Thrombin (E.C. 
3.4.21.5) Complex With Truncated Human Class I Histocompatibility A Theiler'S Murine Encephalomyelitis

 | MoRF PDB ID | MoRF PDB Name | MoRF Dbref | MoRF Db | MoRF start - end | Db match start - end | MoRE partner PDB ID | MoRE partner PDB Name
(continued) | | Encephalomyelitis Virus Coat | | | | | | Virus Coa
363 | 1vf5e | Protein Pet L | P83795 | SW | 1 - 32 | 1 - 32 | 1vf5d | Rieske Iron-Sulfur Protein
364 | 1vf5h | Protein Pet N | P83798 | SW | 1 - 29 | 1 - 29 | 1vf5n | Cytochrome B6
365 | 1vppx | Peptide V108 | NF00088534 | PIR | 1 - 20 | 1 - 20 | 1vppw | Vascular Endothelial Growth Factor
366 | 1m0fb | Scaffolding Protein B | NF01151402 | PIR | 1 - 60 | 1 - 60 | 1m0fg | Major Spike Protein G
367 | 1n6eb | Dqtqkaaaeltff | NF01138087 | PIR | 1 - 13 | 1 - 13 | 1n6ec | Tricorn Protease
368 | 1p4bp | Gcn4(7P-14P) Peptide | NF01743773 | PIR | 1 - 12 | 1 - 12 | 1p4bh | Antibody Variable Heavy Chain
369 | 1ow6d | Paxillin | XP_341094 | GB | 1 - 13 | 284 - 296 | 1ow6c | Focal Adhesion Kinase 1
370 | 1ow8d | Paxillin | XP_341094 | GB | 1 - 13 | 163 - 175 | 1ow6c | Focal Adhesion Kinase 1
371 | 1r5ve | Artificial Peptide | NF01572515 | PIR | 1 - 13 | 1 - 13 | 1r5vb | Mhc H2-Ie-Beta
372 | 1r5we | Artificial Peptide | NF01676397 | PIR | 1 - 13 | 1 - 13 | 1r5wd | Mhc H2-Ie-Beta

372 Non redundant MoRFs (Source: PDB Seqres July 2004)

Appendix B: MoRF Update

 | July 2004 | October 2005
MoRF chains from PDB Seqres | 2512 | 4410
Filtering (remove chains containing less than 10 residues, ambiguous amino acids etc) | 1261 | 1937
Non Redundant MoRFs | 372 | 486

References

1. Wright, P. E., and Dyson, H. J. (1999) Intrinsically unstructured proteins: Reassessing the protein structure-function paradigm, J. Mol. Biol. 293, 321-331.
2. Uversky VN, Gillespie JR, Fink AL. 2000. Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins 41: 415-427.
3. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, Obradovic Z. 2001.
Intrinsically disordered protein. J Mol Graph Model 19: 26-59.
4. Dunker AK, Obradovic Z. 2001. The protein trinity--linking function and disorder. Nat Biotechnol 19: 805-806.
5. Demchenko AP. 2001. Recognition between flexible protein molecules: induced and assisted folding. J Mol Recognit 14: 42-61.
6. Namba K. 2001. Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells 6: 1-12.
7. Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. 2002. Intrinsic disorder and protein function. Biochemistry 41: 6573-6582.
8. Dunker AK, Brown CJ, Obradovic Z. 2002. Identification and functions of usefully disordered proteins. Adv Protein Chem 62: 25-49.
9. Dunker, A. K., Obradovic, Z., Romero, P., Garner, E. C., and Brown, C. J. (2000) Intrinsic protein disorder in complete genomes, Genome Inform. Ser. Workshop Genome Inform. 11, 161-171.
10. Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. and Jones, D.T. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635–645.
11. Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK. 2002. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323: 573-584.
12. Tompa P. 2002. Intrinsically unstructured proteins. Trends Biochem Sci 27: 527-533.
13. Fink AL. 2005. Natively unfolded proteins. Curr Opin Struct Biol 15: 35-41.
14. Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6: 197-208.
15. Dunker A.K., Cortese M.S., Romero P., Iakoucheva L.M., Uversky V.N. (2005) Flexible nets: The roles of intrinsic disorder in protein interaction networks. FEBS Journal. (In press).
16. Uversky V.N., Oldfield, C., Dunker, A.K. (2005) Showing your ID: Intrinsic disorder as an ID for recognition, regulation and cell signalling. J. Mol. Recognition 18 (5) 343-384.
17. Uversky VN. 2002. Natively unfolded proteins: a point where biology waits for physics. Protein Sci 11: 739-756.
18. Uversky V.N. (2003) Protein folding revisited. A polypeptide chain at the folding – misfolding – non-folding crossroads: Which way to go? Cell. Mol. Life Sci. 60 (9) 1852-1871.
19. Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK. 2005. Comparing and combining predictors of mostly disordered proteins. Biochemistry 44: 1989-2000.
20. Liu, J. and Rost, B. (2001). Comparing function and structure between entire proteomes. Protein Sci 10: 1970-1979.
21. Vucetic S, Brown CJ, Dunker AK, Obradovic Z. 2003. Flavors of protein disorder. Proteins 52: 573-584.
22. Callaghan, A.J., Aurikko, J.P., Ilag, L.L., Gunter Grossmann, J., Chandran, V., Kuhnel, K., Poljak, L., Carpousis, A.J., Robinson, C.V., Symmons, M.F. 2004. Studies of the RNA degradosome-organizing domain of the Escherichia coli ribonuclease RNase E. J. Mol. Biol. 340: 965-979.
23. GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 2003, Vol. 31, No. 13.
24. Demchenko AP. 2001. Recognition between flexible protein molecules: induced and assisted folding. J Mol Recognit 14: 42-61.
25. Namba K. 2001. Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells 6: 1-12; Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. 2002. Intrinsic disorder and protein function. Biochemistry 41: 6573-6582.
26. Gunasekaran K, Tsai CJ, Kumar S, Zanuy D, Nussinov R. 2003. Extended disordered proteins: targeting function with less scaffold. Trends Biochem Sci 28: 81-85.
27. Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6: 197-208.
28. Dyson HJ, Wright PE. 2002. Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol 12: 54-60.
29. Vucetic S, Obradovic Z, Vacic V, Radivojac P, Peng K, Iakoucheva LM, Cortese MS, Lawson JD, Brown CJ, Sikes JG, Newton CD, and Dunker AK. 2005. "DisProt: A Database of Protein Disorder." Bioinformatics 21:137-140.
30. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242 (2000).
31. B Rost (1999) Twilight zone of protein sequence alignments. Protein Engineering, 12, 85-94.
32. Bairoch A., Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28:45-48 (2000).
33. Cathy H. Wu, Lai-Su L. Yeh, Hongzhan Huang, Leslie Arminski, Jorge Castro-Alvear, Yongxing Chen, Zhang-Zhi Hu, Robert S. Ledley, Panagiotis Kourtesis, Baris E. Suzek, C. R. Vinayaka, Jian Zhang, and Winona C. Barker. The Protein Information Resource. Nucleic Acids Research, 31: 345-347, 2003.
34. Kabsch W. & Sander C. (1983) Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22: 2577-2637.
35. Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than 70% accuracy. J. Mol. Biol., 1993, Vol. 232, pp. 584-599.
36. Rost, Burkhard; Sander, Chris: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 1994.
37. Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J, Hofmann K., Bairoch A. The PROSITE database, its status in 2002. Nucleic Acids Research. 30:235-238 (2002).
38. Garner, E., Cannon, P., Romero, P., Obradovic, Z. and Dunker, A. (1998) Predicting disordered regions from amino acid sequence: common themes despite differing structural characterization. Genome Inform Ser Workshop Genome Inform, 9, 201–213.
39. Garner, E., Romero, P., Dunker, A., Brown, C. and Obradovic, Z. (1999) Predicting binding regions within disordered proteins. Genome Inform Ser Workshop Genome Inform, 10, 41–50.
40. Dyson, H. J. & Wright, P. E. (2001). Nuclear magnetic resonance methods for elucidation of structure and dynamics in disordered states. Methods Enzymol. 339, 258–270.
41. Fischer E, "Einfluss der Configuration auf die Wirkung der Enzyme" Ber. Dt. Chem. Ges. 27, 2985-2993 (1894).
42. Koshland D.E. (1958). Application of a theory of enzyme specificity to protein synthesis. Proceedings of the National Academy of Sciences USA, 44(2), 98-104. Wootton, J. C. (1994) Sequences with “unusual” amino acid compositions, Curr. Opin. Struct. Biol. 4, 413-421.
43. Kim, T. D., Ryu, H. J., Cho, H. I., Yang, C. H., and Kim, J. (2000) Thermal behavior of proteins: Heat-resistant proteins and their heat-induced secondary structural changes, Biochemistry 39, 14839-14846.
44. Schweers, O., Schonbrunn-Hanebeck, E., Marx, A., and Mandelkow, E. (1994) Structural studies of tau protein and Alzheimer paired helical filaments show no evidence for β-structure, J. Biol Chem. 269, 24290-24297.
45. Gast, K., Damaschun, H., Eckert, K., Schulze-Forster, K., Maurer, H. R., Muller-Frohne, M., Zirwer, D., Czarnecki, J., and Damaschun, G. (1995) Prothymosin α: A biologically active protein with random coil conformation, Biochemistry 34, 13211-13218.
46. Shortle, D. & Ackerman, M. S. (2001). Persistence of native-like topology in a denatured protein in 8 M urea. Science, 293, 487–489.
47. Shortle, D. (1996) The denatured state (the other half of the folding equation) and its role in protein stability, FASEB J. 10, 27-34.
48. Tompa, P. (2003) The functional benefits of protein disorder, J. Mol. Struct. 666-667, 361-371.
49. Shoemaker, B. A., Portman, J. J. & Wolynes, P. G. (2000). Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc. Natl Acad. Sci. USA, 97, 8868–8873.
50. Zitzewitz, J. A., Ibarra-Molero, B., Fishel, D. R., Terry, K. L. & Matthews, C. R. (2000). Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system. J. Mol. Biol. 296, 1105–1116.
51. Hollenbeck, J. J., McClain, D. L. & Oakley, M. G. (2002). The role of helix stabilizing residues in GCN4 basic region folding and DNA binding. Protein Science. 11, 2740–2747.
52. Li X, Romero P, Rani M, Dunker AK, Obradovic Z: Predicting Protein Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop Genome Inform 1999, 10:30-40.
53. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK: Sequence complexity of disordered protein. Proteins 2001, 42:38-48.
54. Romero P, Obradovic Z, Dunker K: Sequence Data Analysis for Long Disordered Regions Prediction in the Calcineurin Family. Genome Inform Ser Workshop Genome Inform 1997, 8:110-124.
55. Obradovic Z., Peng K., Vucetic S., Radivojac P., Brown C. and Dunker A.K., Predicting intrinsic disorder from amino acid sequence (2003). Proteins 53 (S6); 566-572.
56. Sreerama, N., and Woody, R.W. (1994) Biochemistry 33, 10022-10025.
57. Mukhopadhyay, R., and Hoh, J. H. (2001) AFM force measurements on microtubule-associated proteins: The projection domain exerts a long-range repulsive force, FEBS Lett. 505, 374-378.
58. Sherr C J, Roberts J M. Inhibitors of mammalian G1 cyclin-dependent kinases. Genes Dev. 1995; 9:1149–1163.
59. Kanamoto T, Mota MA, Takeda K, Rubin LL, Miyazono K, Ichijo H, Bazenet CE: Role of apoptosis signal-regulating kinase in regulation of the c-Jun N-terminal kinase pathway and apoptosis in sympathetic neurons. Mol Cell Biol 2000, 20:196-204.
60. Sheaff, R.J., Singer, J.D., Swanger, J., Smitherman, M., Roberts, J.M. and Clurman, B.E. (2000) Proteasomal turnover of p21Cip1 does not require p21Cip1 ubiquitination. Mol. Cell 5, 403–410.
61. David, D.C., Layfield, R., Serpell, L., Narain, Y., Goedert, M. and Spillantini, M.G. (2002) Proteasomal degradation of tau protein. J. Neurochem. 83, 176–185.
62. Liu, C.W., Corboy, M.J., DeMartino, G.N. and Thomas, P.J. (2003) Endoproteolytic activity of the proteasome. Science 299, 408–411.
63. Cox, C.J., Dutta, K., Petri, E.T., Hwang, W.C., Lin, Y., Pascal, S.M. and Basavappa, R. (2002) The regions of securin and cyclin B proteins recognized by the ubiquitination machinery are natively unfolded. FEBS Lett.
527, 303–308.
64. Hinnebusch, A. G., and G. R. Fink. 1983. Positive regulation in the general control of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 80:5374-5378.
65. Kim, P. S. & Baldwin, R. L. (1982). Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459–489.
66. Dafforn, T.R. and Smith, C.J. (2004) Natively unfolded domains in endocytosis: hooks, lines and linkers. EMBO Rep. 5, 1046–1052.
67. Tompa, P. and Csermely, P. (2004) The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 18, 1169–1175.
68. Fiser, A., Dosztanyi, Z. & Simon, I. (1997). The role of long-range interactions in defining the secondary structure of proteins is overestimated. Comput. Appl. Biosci. 13, 297–301.
69. Burley, S.K. and Petsko, G.A. (1985) Aromatic-aromatic interaction: A mechanism of protein structure stabilization, Science, vol. 229, pp. 23-28.
70. G. N. Ramachandran and V. Sasisekharan (1968) Adv. Protein Chem. 23, 283-437.
71. The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2002 Oct. Available from

CURRICULUM VITAE

AMRITA MOHAN
ammohan@indiana.edu

EDUCATION

2005 – Present: PhD student, Informatics, Indiana University, Bloomington
2003 – 2005: Masters of Bioinformatics, Indiana University, IUPUI
1999 – 2003: Bachelor of Info. Technology, University of Delhi, India

RESEARCH/PROFESSIONAL EXPERIENCE

1. May ’05 – Aug ’05: Intern, Rosetta Inpharmatics - Merck, Seattle, WA, USA
2. Aug ’03 – Aug ’05: Research Assistant, Center for Computational Biology & Bioinformatics, IUPUI, IN, USA
3. Jun ’02 – Aug ’02: Intern, Institute of Advanced Biosciences, ‘E-Cell Lab’, Japan
4. Jun ’01 – May ’02: Project Trainee, Center for Biochemical Technology (under Council for Scientific & Industrial Research), New Delhi, India
5. Jul ’98 – Apr ’99: Experimental project, “Gene probe for detection of chronic myelogenous leukemia”, India
6. Jul ’98 – Apr ’99: Experimental project, “Gene expression for breast cancer”, India

POSTERS & RESEARCH PUBLICATIONS

1. Poster Presentation: First Annual Indiana Bioinformatics Conference, Department of Biochemistry & Molecular Biology Poster Session, IUPUI, May 27, 2004. Amrita Mohan, Predrag Radivojac and Keith Dunker, MoREs: Molecular Recognition Elements
2. Poster Presentation: Research Day, Department of Biochemistry & Molecular Biology Poster Session, IUPUI, September 30, 2005. Amrita Mohan, Predrag Radivojac and Keith Dunker, MoREs: Molecular Recognition Elements
3. (Publication in process) MoRFs: A dataset of Molecular Recognition Features. Amrita Mohan, Predrag Radivojac and Keith Dunker,