A study of Molecular Recognition Elements `MoREs`

MoRFs
A DATASET OF
MOLECULAR RECOGNITION FEATURES
Amrita Mohan
Submitted to the faculty of the Bioinformatics Graduate Program
in partial fulfillment of the requirements
for the degree
Master of Science
in the School of Informatics,
Indiana University
December 2005
Accepted by the Faculty of Indiana University, in partial
fulfillment of the requirements for the degree of Master of Science
Master’s Thesis Committee
A. Keith Dunker, PhD, Chair
Vladimir Uversky, PhD
Predrag Radivojac, PhD
Narayanan B. Perumal, PhD
To my parents,
in recognition of their worth
Acknowledgements
The completion of this thesis would not have been possible without the
support of many people. Heartfelt thanks to my research advisor, Keith, for
his support and guidance and, above all, for providing me with the financial
means to complete this thesis and my Master's degree. Many thanks to Volodya,
who, besides guiding my research work, sportingly read numerous revisions of
this thesis draft and helped make sense of confusion. I would furthermore like
to thank my committee member, Peru, who has unfailingly boosted my morale
throughout the duration of this study. Last but by no means least, sincere
thanks to Pedja, for being an incredible scientific help and a great friend in
an advisor’s guise.
ABSTRACT
AMRITA MOHAN
MoRFs
A DATASET OF MOLECULAR RECOGNITION FEATURES
The last decade has witnessed numerous proteomic studies that have
predicted and successfully confirmed the existence of extended, structurally
flexible regions in protein molecules. In parallel, the last five years of
structural bioinformatics have seen an explosion of results on molecular
recognition and its importance in protein-protein interactions. This work
extends past and ongoing research efforts by looking specifically at the
“flexibility and disorder” found in protein sequences involved in molecular
recognition processes, known as Molecular Recognition Elements or Molecular
Recognition Features (MoREs or MoRFs, as we call them). MoRFs are relatively
short (10-70 residues), loosely structured protein regions within longer
sequences that are largely disordered in nature. Interestingly, upon binding
to other proteins, these MoRFs are able to undergo a disorder-to-order
transition. Thus, in our interpretation, MoRFs could serve as potential
binding sites, and this binding to another protein lends a functional
advantage to the whole protein complex by enabling interaction with its
physiological partner.
There are at least three basic types of MoRFs: those that form α-helical
structures upon binding, those that form β-strands (in which the peptide forms
a β-sheet with additional β-strands provided by the protein partner), and those
that form irregular structures when bound. Our proposed names for these
structures are α-MoRF (also known as α-MoRE, alpha helical molecular
recognition feature/element), β-MoRF (beta sheet molecular recognition
feature/element), and I-MoRF (irregular molecular recognition
feature/element), respectively. The results presented in this work suggest that
functionally significant residual structure can exist in MoRF regions prior to
the actual binding event. We also demonstrate profound conformational
preferences within MoRF regions for α-helices.
We believe that the results from this study will improve our understanding of
protein-protein interactions, especially those related to molecular
recognition, and may pave the way for future work on the prediction of protein
binding sites.
We hope that the conclusions of this work demonstrate that, within only a few
years of its conception, intrinsic protein disorder has gained wide-scale
importance in the field of protein-protein interactions and can be strongly
associated with molecular recognition.
Table of Contents
Acknowledgements
ABSTRACT
Introduction
  A. Introduction to subject
  B. Importance of this subject
  C. Knowledge Gap
Background
  A. Relevant research
  B. Goal to be tested
  C. Intended research goal
Materials & Methods
  A. Dataset of MoRFs
Results
  A. MoRFs dataset statistics & length distribution
  B. Secondary Structure Analysis
  C. Amino acid Composition, Charge & Aromatics in MoRFs
    (a) Amino acid compositions
    (b) Net & Total charges, Aromatics and Proline Content
  D. Order-Disorder Predictions and Functional classes
  E. Presence of poly-proline type II helices & Ramachandran Plot
Conclusions
Discussions
Appendix
  A. MoRFs (or MoREs) and their partners
  B. MoRF update
References
CURRICULUM VITAE
Introduction
A. Introduction to subject
The traditional understanding of the protein structure-function relationship
holds that protein function critically depends on a well-defined
three-dimensional protein structure. However, recent studies have revealed
that the true functional state of many proteins and protein domains is an
intrinsically unstructured conformation [1-14]. This phenomenon has been
described for both partially and wholly disordered proteins. Since these first
observations, the study of protein disorder and of the functionality resulting
from it has progressed steadily.
[Figure 1 plot; x-axis: Years (1985-89, 1990-94, 1995-99, 2000-04), y-axis: Number of publications (0 to 300)]
Figure 1: Time-dependent increase in the number of PubMed hits dealing with
intrinsically disordered proteins. The following set of keywords has been used to
perform this search: intrinsically disordered, intrinsically unstructured, natively
unfolded, intrinsically unfolded and intrinsically flexible.
Figure 1 reflects the rapidly growing interest in intrinsically disordered
proteins. In fact, 110 papers discussing disordered proteins were published in
2004, and as many as 50 such papers were published during the first quarter of
2005 [15].
The conformation of natively disordered proteins closely mimics the observed
denatured states of structured proteins [16, 17]. Past work in structural
biology has shown that disordered proteins are common in various proteomes and
that their frequency increases with increasing complexity of the organism
[18]. The higher predicted disorder in eukaryotes compared with prokaryotes or
archaea has been suggested to be a consequence of the increased need for cell
signaling and regulation [19-21]. The functional importance of protein
disorder is further emphasized by its role in various signal transduction
processes, cell-cycle regulation, gene expression and molecular recognition
[2-4, 15]. The widespread prevalence and importance of these proteins has
called for a reassessment of the classical protein structure-function paradigm
[1-11, 21].
It has also long been recognized that the formation of protein-protein
complexes is probably the most common means by which biological function is
achieved. In this report we discuss a specialized subset of the regions
involved in these interactions, “Molecular Recognition Elements” or
“Molecular Recognition Features”, which are protein regions that specifically
participate in protein-protein interactions. Molecular recognition is defined
as a process by which biological entities interact with each other or with
small molecules to form specific complexes. In the case of proteins, this
binding enables a proteinaceous complex to participate in specialized
activities and mediate select biochemical functions. Important aspects of
signaling-related molecular recognition in comparison with other binding
events are: (a) the unique combination of high specificity and low affinity;
(b) binding diversity, in which one region specifically recognizes differently
shaped partners by structural accommodation at the binding interface; and (c)
binding commonality, in which multiple, distinct sequences recognize a common
binding site (with perhaps different folds). Another important feature of
molecular recognition is that it coincides with more complex and functionally
important mechanisms such as protein folding, signal transduction or the
formation of multisubunit and supramolecular structures.
Some special examples of molecular recognition have been reported in which
one or both of the partners are very flexible or wholly disordered prior to
binding and their interaction results in the formation of a regularly
structured protein complex. This phenomenon can only be explained by the
complex being formed over a huge configurational space via binding-induced
folding. In this case, each residue constituting the complex is under the
influence of numerous attractive and repulsive forces, which may explain why
experimental detection and analysis of such a phenomenon is hard to undertake.
The complexity of this problem is comparable to that of the protein folding
problem [5, 22].
One illustrative example of a functionally important disordered protein that
participates in a molecular recognition process is considered below. In 2004,
Callaghan et al. [23] showed experimentally, and by means of GlobPlot [24]
and PONDR [52-54], that the C-terminal domain of full-length RNase E is
predominantly disordered. Besides being highly enriched in the amino acids R,
P, G and Q, this domain also includes three isolated stretches that
demonstrate an increased propensity to be ordered when bound to RNA. It was
also proposed that the second of these stretches is the one that truly
interacts with the RNA.
B. Importance of this subject
The recognition of one protein molecule by another is an important
phenomenon in all living systems. Enzymes binding their substrates are a good
example of molecular recognition. This selective recognition process relies on
the ‘complementary’ nature of the interacting surfaces and was initially
termed the ‘lock-and-key’ concept by Fischer more than a hundred years ago
[41].
However, a more modern view of molecular recognition, called induced fit [42],
takes into account that the interacting molecules are flexible and can adapt
their shape during the recognition process. Induced fit has been observed
experimentally for many protein-ligand interactions.
Our work aims to study proteins participating in molecular recognition. To this
end, a dataset of Molecular Recognition Features or Elements (referred to as
MoRFs from here onwards) was created and some characteristic features of MoRFs
were described. We report and discuss the results of a few qualitative tests
(such as amino acid compositions and order-disorder percentages) performed on
the MoRF dataset. These results are also compared to those from representative
disordered and globular protein datasets. We believe that this analysis will
help us to better understand the physico-chemical and structural differences
between molecular recognition elements and ordinary ordered and disordered
datasets. Any notable differences may allow future characterization and
prediction of MoRFs and subsequently improve our understanding of the
structural changes that bring about the binding of a MoRF to its
macromolecular target. A parallel advantage foreseen from the results of this
study is a more accurate estimation of protein binding sites within MoRFs.
This could ultimately lead to the design and development of a predictor of
MoRFs. On the commercial side, we expect that these results can facilitate
simpler design of drug compounds that influence the process of molecular
recognition.
C. Knowledge Gap
There is now a large body of evidence supporting the functional importance of
intrinsic disorder. However, there is an apparent lack of information on the
various features and characteristics of MoRFs (such as amino acid composition,
order-disorder predisposition, charge and aromaticity). Little is also known
about the mechanisms underlying the structural changes in MoRFs during
binding. In general, this problem has been difficult to approach
experimentally, especially since studying the extremely flexible conformations
of MoRFs poses a special challenge [40]. Computational approaches may help in
such a situation. Here, we compare the actual bound structures of MoRFs to
their inherent structural preferences. For this purpose, we collected MoRFs
from the PDB and determined their secondary structures in the bound state. It
is our belief that the binding of MoRFs to their respective partners via a
disorder-to-order transition is template-driven and not a chance event.
Background
A. Relevant research
Molecular Recognition Features (MoRFs) are common in various proteomes
and occupy a unique structural and functional niche in which function is a
direct consequence of intrinsic disorder. The evidence that intrinsically
disordered proteins without a well-defined folded structure exist in vitro and
in vivo is compelling and justifies considering them as a separate class
within the protein universe. A number of reviews and papers have reported and
discussed advances in the rapidly progressing field of intrinsically
disordered proteins, with a major focus on gathering evidence for their
unfolded nature prior to binding and on discussing the functional benefits
that their malleable structural state provides [1-4, 12-18].
In their unbound form, many intrinsically disordered proteins have
traditionally been considered to exist in a random coil state (non-alpha,
non-beta conformations maintaining aperiodic phi and psi angles), since their
structures closely mimic the unfolded state of globular proteins in the
presence of high concentrations of strong denaturants [19-21].
A closer look at natively unfolded proteins and some MoRFs, however, reveals
that this view is not correct. To begin with, a truly irregular (random coil)
state does not exist even under the harshest conditions [46, 47]. Hence, it is
not surprising that many MoRFs have been reported to bear traces of residual
structure [12, 17]. Upon interaction with their binding partners, MoRFs have
the ability to undergo significant induced folding, that is, a
disorder-to-order transition [1, 2, 12]. Such a molecular recognition
mechanism, which is coupled to the folding process, has been noted to confer
exceptional specificity and versatility [3, 26-28]. All these features explain
the prevalence of structural disorder in signaling and regulatory proteins
[28]. The interaction of MoRFs with their partners highlights the importance
of understanding the mechanism of their induced folding. Since effective
functioning of MoRFs requires fast formation of the folded state [49], their
template-induced folding represents a special and interesting case of protein
folding. The advantages of this binding mode have been studied in detail in
the case of the transcription factor GCN4, where binding strength correlates
with the α-helicity of its critical DNA-binding segment [50, 51]. It has to be
noted, however, that over-stabilization of a secondary structural element can
also decrease the rate constant of complex formation, as was shown for the
cyclin-dependent kinase inhibitor p27 (Kip1).
B. Goal to be tested
The goal of this work is to discover signs of inherent secondary structure
preferences, if any, in MoRFs prior to binding that could influence their
final structure in the ordered complex. To this end, MoRF sequences are first
assessed by a secondary structure predictor, PHD [34, 35], and the predictions
are then compared to the bound structures.
C. Intended research goal
Our primary goal in this project is to design and develop a database of MoRFs
and to study a few types and examples of molecular recognition elements from
this database. We also carry out several qualitative tests on this database and
compare the results with those from representative disordered (DISPROT
[29]) and ordered datasets to look for physico-chemical differences between
their members. Our ultimate goals are: (a) to facilitate future
characterization and prediction of MoRFs; (b) to gain better knowledge of
potential binding sites; and (c) to gain further insight into the structural
changes that bring about the binding of a MoRF to its macromolecular target.
We also hope that these analyses provide a basis for the future design of
compounds that influence this process. Eventually, the results from these
tests should not only help associate disorder with MoRFs but also show that
the structural disorder observed in MoRFs predisposes them to special
functional modes, which either are a direct result of their fluctuating
conformation or are realized via binding to one or several other proteins in a
structurally adaptive process.
Materials & Methods
A. Dataset of MoRFs
Using the Seqres dataset available at the Protein Data Bank (PDB) [30], we
collected protein segments shorter than 70 residues that are bound to other
proteins with lengths of 100 residues or more. We selected protein chains
shorter than 70 residues because such proteins are less likely to form
self-folding globular units before interacting with other proteins. In other
words, such protein chains very likely do not have significant buried surface
area of their own and participate in molecular recognition by forming parts of
larger protein complexes. Using these criteria, we prepared a starting dataset
of 2512 protein chains. The PDB files corresponding to these 2512 chains were
downloaded to obtain sequences, secondary structure, and Ramachandran phi and
psi angles. The PDB Seqres dataset houses all the protein sequences available
at the PDB, including both the residues observed in the protein crystal or in
solution and the residues not present in the structural model (i.e.,
disordered residues lacking electron density, cloning artifacts, His-tags,
etc.). The next step was to remove from our initial working dataset all chains
with ambiguous sequence information (i.e., sequences containing X or Z
annotations instead of real amino acids). We also removed protein chains with
10 or fewer residues, since such short peptides may not be specific to a
single larger sequence, which would make the later step of identifying the
sequences containing such MoRFs difficult. After these preprocessing steps,
1261 chains remained (approx. 55000 residues, with an average chain length of
44.9 residues). After removing redundancy among these 1261 protein chains, we
obtained 372 non-redundant MoRFs.
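The length and ambiguity filters described above can be sketched as follows. This is a minimal illustration and not the original pipeline, which is not shown in this thesis: the function names, the input layout (chain id mapped to a sequence and the length of its bound partner) and the default cut-offs are assumptions chosen to match the text (chains of 11 to 69 residues, partners of at least 100 residues, no X or Z codes).

```python
from typing import Dict, Tuple

# Residue codes treated as ambiguous in the text.
AMBIGUOUS = set("XZ")


def is_candidate_morf(seq: str, partner_len: int,
                      min_len: int = 11, max_len: int = 69,
                      min_partner_len: int = 100) -> bool:
    """Return True if a chain passes the length and ambiguity filters.

    The default cut-offs are assumptions derived from the prose:
    "shorter than 70 residues", "10 or fewer residues removed",
    and partners "of 100 residues or more".
    """
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if partner_len < min_partner_len:
        return False
    # Reject chains containing any ambiguous residue code.
    return not (AMBIGUOUS & set(seq))


def filter_chains(chains: Dict[str, Tuple[str, int]]) -> Dict[str, str]:
    """Keep only chains that pass is_candidate_morf.

    `chains` maps a hypothetical chain id to (sequence, partner length).
    """
    return {cid: seq for cid, (seq, plen) in chains.items()
            if is_candidate_morf(seq, plen)}
```

Redundancy removal is a separate step and is not covered by this sketch.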
Since these putative MoRFs varied in length, we used Rost’s formula [31] to
calculate the sequence identity threshold dynamically from each chain’s
length. The redundancy check showed that the smallest families had a single
member, while the largest family had 177 members (Thrombin, Alpha-Thrombin).
Figure 2 shows the distribution of cluster members within the MoRFs dataset.
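A length-dependent identity threshold of this kind can be sketched as below. The sketch assumes that reference [31] refers to the HSSP-style curve published by Rost in 1999; the thesis does not restate the formula, so the constants, the function name and the offset parameter n are assumptions rather than a record of what was actually run.

```python
import math


def rost_identity_threshold(length: int, n: float = 0.0) -> float:
    """Percent sequence identity threshold for an alignment of `length` residues.

    Assumed form (Rost 1999 HSSP-curve):
      length <= 11        -> 100
      11 < length <= 450  -> n + 480 * length ** (-0.32 * (1 + exp(-length / 1000)))
      length > 450        -> n + 19.5
    `n` shifts the curve up or down; n = 0 is the curve itself.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return n + 480.0 * length ** exponent
    return n + 19.5
```

Under this assumed curve, short chains require a much higher identity to be grouped together than long ones, which is why a single fixed cut-off would be a poor fit for chains ranging from about 10 to 70 residues.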
[Figure 2 plot; annotation: Total # of clusters = 372]
Figure 2: Frequency distribution of the number of homologous MoRF sequences
for the 372 non-redundant MoRFs [x axis: # of MoRF sequences (# of clusters),
y axis: # of homologous members (cluster members)]
Using other database references (Swiss-Prot [32], PIR [33], and NCBI [71])
listed in the respective PDB files for each of the MoRFs, we were able to
extract 301 sequences containing these 372 MoRF chains. All but 53 of the
MoRFs were found to be fragments of larger sequences. The final task after
collecting and processing these MoRFs was to design a database for them. For
this task we used MySQL as the backend and Perl scripts to load the MoRFs and
associated information such as secondary structure, binding partner and
length. Finally, secondary structure assignments for the MoRFs were made using
the DSSP program [34] (results shown in Figure 5).
Figure 3 presents several illustrative examples of MoRFs from our dataset.
Figure 3: Some examples of complexes between MoRFs and their binding
partners. The structures (PDB code in parenthesis) shown are: (a) α-helical
MoRF p53 attached to MDM2 (b) extended β-MoRF Grim attached to
Apoptosis Inhibitor (c) irregular-MoRF p53 attached to Cyclin A2 (d)
Complex– MoRF ovomucoid attached to Trypsin. The structures have been
visualized by the Swiss-PDB viewer.
Results
A. MoRFs dataset statistics & length distribution
Table 3 lists the number of MoRFs obtained after each data pre-processing
step to reach a final non-redundant working dataset.
Data processing step                                           Number of MoRFs
Initial MoRFs obtained using PDB Seqres dataset (July 2004)    2512
Filtering ambiguous data (X, Z), removal of sequences
with less than 10 residues                                     1261
Sequence redundancy removal                                    372
Table 3: Number of MoRFs after each data processing step
Analysis of the lengths of all MoRFs showed that as many as two-thirds of
them were between 10 and 20 residues long, and thus short in comparison to
other proteins (Figure 4).
[Figure 4 plot; x-axis: Length of MoRE (10 to 65 residues), y-axis: Frequency (0 to 40)]
Figure 4: Length distribution in MoREs dataset.
B. Secondary Structure Analysis
We used the DSSP program [34] to determine the secondary structure
assignments for each of the 372 MoRFs. DSSP was designed to standardize
protein secondary structure assignments. It accepts a single PDB entry file as
input and assigns a secondary structure type (viz., helix, sheet or irregular)
to each residue of the protein’s sequence.
The results showed that 27% of the residues in this dataset (approximately
9000 residues in total) had an α-helical conformation, 12% were β-sheet
residues and approximately 48% had an irregular structure. The remaining 13%
of the residues had missing coordinate data in the PDB files and were
therefore classified as disordered. We compared these results with those from
a control dataset of similar size (approx. 9000 residues) consisting of
single-chain X-ray structures with a primitive space group (necessarily
monomeric). The structures in the control dataset had no missing residues. The
results of this analysis are shown in Figure 5.
Figure 5: (a) Secondary structure distribution of residues in MoRFs:
Helices 27%, Sheet/Strand 12%, Irregular 48%, Disorder 13%
(b) Secondary structure distribution in Monomers:
Helices 33%, Sheet/Strand 25%, Irregular 42%, Disorder 0%
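Percentages of this kind can be derived from per-residue DSSP codes roughly as sketched below. The mapping of DSSP's codes onto helix, sheet and irregular is a commonly used reduction and is an assumption here, since the exact grouping used in this work is not stated; the "?" marker for residues without coordinates and the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

HELIX = set("HGI")   # alpha, 3-10 and pi helix codes (assumed grouping)
STRAND = set("EB")   # extended strand and isolated bridge (assumed grouping)
MISSING = "?"        # hypothetical marker: residue absent from the coordinates


def reduce_state(code: str) -> str:
    """Map one DSSP code to helix, sheet, irregular or disorder."""
    if code == MISSING:
        return "disorder"
    if code in HELIX:
        return "helix"
    if code in STRAND:
        return "sheet"
    return "irregular"  # turns, bends and unassigned residues


def structure_fractions(dssp_strings: Iterable[str]) -> Dict[str, float]:
    """Fraction of residues in each class, pooled over all chains."""
    counts = Counter(reduce_state(c) for s in dssp_strings for c in s)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    return {k: counts[k] / total
            for k in ("helix", "sheet", "irregular", "disorder")}
```

Pooling residues over all chains, as done here, weights long chains more heavily than averaging per-chain fractions would; which of the two was used for Figure 5 is not specified.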
We observe a decreased overall preference for the extended β conformation in
MoRFs, which can be explained by the abundance of hydrophilic side chains in
them [2, 3, 12]. The most pronounced difference between the secondary
structure distributions of bound MoRFs and monomeric proteins is indeed seen
in the extended β structural elements or sheet motifs (~12% in MoRFs versus
~25% in monomers).
The possibility that interactions with the partner protein influence the
native conformational preferences of MoRFs was studied by comparing the
secondary structure that MoRF sequences are predisposed to form, as predicted
by the PHD algorithm [34, 35], with the DSSP assignments of the bound
structures.
The PHD secondary structure algorithm uses a combination of multiple
sequence alignments and neural networks to predict secondary structure
elements for each residue of a given protein sequence. When a protein is input,
this method finds all the homologs and builds a profile using multiple
sequence alignment. It then feeds this profile into a series of neural networks
to output the predictions.
As mentioned earlier, the goal of this exercise was to test our original
hypothesis that protein complex formation influences or modifies the
disordered state of MoRFs. To estimate the effect of partner proteins in
modifying the inherent structural preferences of MoRFs upon binding, predicted
secondary structures were related to the observed conformations in the bound
state. The results (Table 4) indicate that the inherent secondary structural
features of MoRFs were well preserved in their bound state. This is similar to
globular proteins, where non-local interactions were found to have a
negligible effect on the predictability of secondary structures [68]. The most
remarkable preference, seen in the case of helices, suggests a substantial
stability for these motifs and points to them as preformed structural elements
in the solution state. In contrast, coils can be produced by random sequences
almost as well as by the MoRF chains themselves. The correlation of the
secondary structure preferences of MoRFs with and without their binding
partners can help in future analysis and probing of the role of these
structural elements.
DSSP assignment   Residues   PHD prediction
α-helix           2469       H: 74%, B: 9%, I: 17%
β-sheet           1118       H: 11%, B: 55%, I: 34%
Irregular         4359       H: 21%, B: 15%, I: 64%
Disorder          1147       H: 18%, B: 10%, I: 72%
Table 4: PHD secondary structure prediction accuracies for MoREs
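A breakdown like the one in Table 4 amounts to a row-normalized confusion matrix between observed and predicted states. The sketch below shows one way to compute it, assuming both are available as equal-length strings over the letters H, B, I (and D for observed disorder); the function name and this string encoding are illustrative assumptions, not the format produced by DSSP or PHD.

```python
from collections import Counter, defaultdict
from typing import Dict


def prediction_breakdown(observed: str, predicted: str) -> Dict[str, Dict[str, float]]:
    """For each observed class, the percentage of its residues predicted in each class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    counts = defaultdict(Counter)
    for obs, pred in zip(observed, predicted):
        counts[obs][pred] += 1
    # Normalize each observed class (row) to percentages.
    return {
        obs: {pred: 100.0 * n / sum(row.values()) for pred, n in row.items()}
        for obs, row in counts.items()
    }
```

For example, `prediction_breakdown("HHHHBBID", "HHHIBIII")` reports that 75% of the observed helical residues were predicted as helix and 25% as irregular.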
The results revealed that α-MoRFs were predicted with higher confidence than
β-MoRFs or I-MoRFs. In addition, a high percentage of the residues originally
assigned as disordered were predicted to be irregular. Extended conformations
are predicted less reliably from MoRF sequences, possibly because they are
less structurally defined while still in solution and tend to become ordered
only upon binding to the partner. As with the secondary structure
assignments, these results were also compared with those from our control
dataset of monomeric proteins.
A region-wise analysis of the different structural types of MoRFs (Table 5 &
Figure 6) showed a total of 1880 regions, 269 of which were helical and 381 of
which were sheets. More than half of the regions (991) had an irregular
conformation. The remaining 239 regions were disordered.
Region Length   # of Disordered   # of Helical   # of Ext. Beta   # of Irregular
(in residues)   regions           regions        regions          regions
1-9             205               167            376              847
10-19           26                76             5                128
20-29           5                 17             0                10
30-69           3                 9              0                6
Total           239               269            381              991
Table 5: Region wise distribution in different structural types of MoRFs
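A region count of this kind can be obtained by splitting each per-residue state string into maximal runs of identical states and binning the run lengths. The sketch below assumes that a "region" is such a run and uses the length bins of Table 5; the one-letter state encoding (H, B, I, D) and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, List, Tuple

# Length bins taken from Table 5 (inclusive bounds, in residues).
BINS = [(1, 9), (10, 19), (20, 29), (30, 69)]


def regions(states: str) -> List[Tuple[str, int]]:
    """Split a state string into (state, run length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def bin_regions(state_strings: Iterable[str]) -> Counter:
    """Count regions per (state, length bin) over all chains.

    Runs longer than the last bin are ignored; they cannot occur for
    chains shorter than 70 residues.
    """
    table = Counter()
    for states in state_strings:
        for state, length in regions(states):
            for low, high in BINS:
                if low <= length <= high:
                    table[(state, (low, high))] += 1
                    break
    return table
```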
[Figure 6 plot; x-axis: Region Lengths (1-9, 10-19, 20-29, 30-69), y-axis: Number of Regions; series: Number of D, B, H and I regions]
Figure 6: Histogram for region wise distribution in MoRFs
C. Amino acid Composition, Charge & Aromatics in MoRFs
(a) Amino acid compositions
Comparisons between the amino acid compositions of monomers and MoRFs show
that MoRFs have increased levels of C, R, S, P and K. On the other hand, they
show a decreased content of amino acids important for the formation of strong
β-sheets (with low α-helical propensity) such as L, V, F, I, Y and D
(Figure 7(a)).
[Figure 7(a) plot; x-axis: amino acids in the order W, C, F, I, Y, V, L, H, M, A, T, R, G, Q, S, N, P, D, E, K; y-axis: 0 to 10; series: MoREs and Monomers]
Figure 7: (a) Amino acid composition of MoRFs (MoREs) vs. Monomers
The following histograms (Figure 7(b) & 7(c)) depict the composition of all
MoRFs, as well as of the different structural categories of MoRFs, relative to
proteins from the control monomeric dataset. It is interesting to note the
significant enrichment of cysteine in the MoRF dataset in general (Figure
7(b)) when compared to the control dataset. Figure 7(c) also shows that
cysteine seems to be more prevalent in β-MoRFs than in their α-helical and
irregular counterparts. Based on these results, it might be interesting to
probe further for the presence of disulfide bonds in MoRF interactions.
[Figure 7(b) plot; x-axis: amino acids in the order W, C, F, I, Y, V, L, H, M, A, T, R, G, Q, S, N, P, D, E, K; y-axis: (MoRFs - Monomers)/Monomers, from -0.60 to 1.60]
Figure 7: (b) Relative amino acid composition of MoRFs w.r.t. Monomers: the Y
axis shows the fractional difference between the amino acid compositions of
MoRFs and Monomers, i.e., (MoRFs - Monomers)/Monomers
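The fractional difference plotted in Figure 7(b) can be computed as sketched below. The function names are illustrative, and pooling all residues of a dataset before computing frequencies is an assumption; averaging per-chain compositions would give somewhat different values.

```python
from collections import Counter
from typing import Dict, Iterable

# Amino acids in the order used on the figure axes.
AMINO_ACIDS = "WCFIYVLHMATRGQSNPDEK"


def composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Fraction of each of the 20 amino acids, pooled over all sequences."""
    counts = Counter(aa for seq in sequences for aa in seq.upper()
                     if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues supplied")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_composition(sample: Iterable[str],
                         reference: Iterable[str]) -> Dict[str, float]:
    """(sample - reference) / reference for each amino acid.

    Amino acids absent from the reference are skipped to avoid
    division by zero.
    """
    s, r = composition(sample), composition(reference)
    return {aa: (s[aa] - r[aa]) / r[aa] for aa in AMINO_ACIDS if r[aa] > 0}
```

A value of 1.0 for an amino acid means it is twice as frequent in the sample as in the reference, and a value of -0.5 means it is half as frequent.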
[Figure 7(c) plot; x-axis: amino acids in the order W, C, F, I, Y, V, L, H, M, A, T, R, G, Q, S, N, P, D, E, K; y-axis: -1.30 to 8.70; series: (α-MoRE - Monomeric helices)/Monomeric helices, (β-MoRE - Monomeric sheets)/Monomeric sheets, (i-MoRE - Monomeric Irregulars)/Monomeric Irregulars]
Figure 7: (c) Relative amino acid composition of different structural types
(α-helical, β, and Irregular) of MoRFs w.r.t. Monomers.
(b) Net & Total charges, Aromatics and Proline Content
Figure 8: Total and Net charges, Proline Percentage & Aromatics in MoRFs
Figure 8 compares Total Charge (K + R + D + E), Net Charge (K + R - D - E),
Proline composition and Aromatic content (F + W + Y) between MoRFs (MoREs)
and monomeric chains. Although total charge is comparable in the two classes,
MoRFs tend to maintain a higher net charge than monomers, as has been
observed for disordered proteins [2]. Proline content in MoRFs also exceeds
that found in monomers, a result that motivated us to explore the presence of
polyproline II helices in MoRFs in later experiments. MoRFs also show a higher
proportion of aromatic amino acids than monomeric proteins. This is
reasonable, since the side chains of aromatic amino acids tend to make strong
and specific interactions [69], which would be expected in proteins involved
in molecular recognition.
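The four quantities compared in Figure 8 can be computed per sequence along these lines. This is a sketch only: it assumes the values are expressed as fractions of sequence length (multiply by 100 for percentages), which the text does not state explicitly, and the function name is hypothetical.

```python
def sequence_features(seq):
    """Charge, proline and aromatic content as fractions of sequence length."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def fraction(residues):
        # Fraction of positions occupied by any of the given residues.
        return sum(seq.count(r) for r in residues) / n

    positive = fraction("KR")
    negative = fraction("DE")
    return {
        "total_charge": positive + negative,  # K + R + D + E
        "net_charge": positive - negative,    # K + R - D - E
        "proline": fraction("P"),
        "aromatic": fraction("FWY"),          # F + W + Y
    }
```

Averaging these values over the MoRF set and over the monomer set gives the kind of class-level comparison described above.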
D. Order-Disorder Predictions and Functional classes
Order/disorder predictions using the VL-XT [52-53] and VL3 [54] predictors
revealed that as much as 65% disorder was present in sequences containing
MoRFs. It was also interesting to note that as many as 30% (2723 residues) of
the irregular residues were predicted to be ordered. These data support the
hypothesis that the presence of such recognition motifs may be a general
feature of disordered proteins. Table 6 lists the percentage distribution of
order/disorder with respect to the different secondary structure assignments
for the 372 MoRFs.
                             | α – residues | β – residues | ι – residues | PDB Disorder
Percent Predicted Disordered | 9            | 2            | 18           | 7
Percent Predicted Ordered    | 18           | 10           | 30           | 7
Table 6: Order-Disorder statistics for different classes of MoRFs
Figure 9 shows the distribution of predicted disorder in sequences containing
MoRFs using both the predictors VL-XT and VL3.
[Figure 9: histogram titled "VL3 Vs VLXT predictions for MoREs". Series:
VLXT: # of MoRE sequences; VL3: # of MoRE sequences. X axis: Percent
Disordered, in bins 0 - 10, 11 - 20, ..., 91 - 100. Y axis: # of sequences.]
Figure 9: Disorder distribution in MoRFs using VL-XT & VL3 predictors
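A distribution of this kind can be tabulated as sketched below. The sketch assumes each sequence comes with a list of per-residue disorder scores in [0, 1] and that a residue counts as disordered at a score of 0.5 or above; both the input format and the cutoff are assumptions made here for illustration, and the function names are hypothetical. The predictors themselves are not reimplemented.

```python
import math
from collections import Counter


def percent_disordered(scores, threshold=0.5):
    """Percentage of residues whose score is at or above the threshold."""
    if not scores:
        raise ValueError("empty score list")
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


def bin_label(percent):
    """Map a percentage to the bins '0 - 10', '11 - 20', ..., '91 - 100'."""
    if percent <= 10:
        return "0 - 10"
    low = 10 * ((math.ceil(percent) - 1) // 10) + 1
    return f"{low} - {low + 9}"


def disorder_histogram(score_lists, threshold=0.5):
    """Count sequences per percent-disorder bin."""
    return Counter(
        bin_label(percent_disordered(scores, threshold))
        for scores in score_lists
    )
```

Running `disorder_histogram` once on VL-XT scores and once on VL3 scores would give the two series of sequence counts that the figure places side by side.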
Using the results of previous studies [13, 14] together with a number of
disorder prediction results from the MoRFs database, we conclude that MoRFs
are primarily associated with signal transduction, cell-cycle regulation and
gene expression, and thus may often be implicated in various cancer types
[15]. Recent studies have also helped unveil the high incidence and functional
importance of disorder-to-order transitions in endocytosis [66] and in RNA
and protein chaperones [67]. The disorder found in these sequences also
strongly correlates with sites of post-translational modification. A parallel
PROSITE [37] search using these MoRFs showed that a third of them contained
phosphorylation sites and as many as 14% displayed myristoylation sites.
An important observation from these order-disorder predictions was that two
of the well-known binding regions of p53 (one in the N-terminal domain, with
MDM2 as the binding partner, and another in the C-terminal domain, with
Cyclin A2 as its recognition partner) coincide with dips in the VLXT
order-disorder plot and are flanked by disordered regions on either side
(Figure 10). Such examples from the MoRF dataset indicate the possibility of
discovering novel binding regions in other proteins containing MoRFs.
[Figure 10: VLXT prediction plot for p53. Annotations: α-helical MoRF, p53
bound to mdm2; irregular MoRF, p53 bound to cyclin A2.]
Figure 10: VLXT disorder prediction in p53 protein: Residues 17-27 in the
N-terminal region bind to MDM2 and residues 378-386 in the C-terminal region
bind to cyclin A2. These regions also correspond to visible dips in the VLXT
prediction plot, indicating the possibility of finding novel binding sites in
other proteins by using knowledge of the MoRFs in them.
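One way to search a disorder profile for such dips is sketched below. It is a simple heuristic written for illustration, not the procedure used in this study: it assumes per-residue scores in [0, 1], treats scores below 0.5 as predicted order, and reports short ordered runs that have a predicted-disordered residue on both sides. The cutoff, the maximum run length and the function name are all assumptions.

```python
def find_order_dips(scores, threshold=0.5, max_length=30):
    """Return (start, end) pairs, 1-based and inclusive, of short runs of
    predicted order that are flanked on both sides by predicted disorder."""
    dips = []
    start = None  # 0-based index where the current ordered run began
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        else:
            # A disordered residue closes any open ordered run.
            if start is not None:
                flanked_left = start > 0  # residue before the run is disordered
                if flanked_left and (i - start) <= max_length:
                    dips.append((start + 1, i))
                start = None
    # A run that reaches the end of the sequence has no right flank
    # and is therefore not reported.
    return dips
```

Candidate regions returned by such a scan would still need to be checked against structural or experimental evidence before being called binding sites.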
Also, using the 201 Swiss-Prot sequences containing 227 MoRFs, we were able
to gather preliminary insights into the general nature of MoRFs and the
functional classes they tend to form. The results of this analysis are listed
in Table 7.
SW Keyword                | Frequency
3D-Structure              | 174
Signal                    | 57
Glycoprotein              | 41
Transmembrane             | 37
Alternative Splicing      | 35
Hydrolase                 | 25
DNA Binding               | 24
Transcription Regulation  | 23
Serine Protease Inhibitor | 21
Table 7: Top 10 Swiss Prot functional classes returned for MoRFs
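A keyword tally of this kind can be produced along the following lines. The sketch assumes the keyword annotations have already been extracted into a mapping from accession to keyword list; that input format and the function name are hypothetical, and parsing of the Swiss-Prot records themselves is not shown.

```python
from collections import Counter


def keyword_frequencies(entry_keywords, top_n=10):
    """Count how many entries carry each keyword and return the top_n.

    entry_keywords maps an accession to its list of keywords. Each keyword
    is counted at most once per entry.
    """
    counts = Counter()
    for keywords in entry_keywords.values():
        counts.update(set(keywords))
    return counts.most_common(top_n)
```

Counting each keyword once per entry keeps a single heavily annotated sequence from inflating a keyword's frequency.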
The higher number of hits for keywords such as “Signal”, “Glycoprotein”,
“Transmembrane” and “Alternative Splicing” in the MoRF dataset suggests that
sequences containing MoRFs are more likely to be involved in signaling
processes, to be transmembrane proteins, or to be alternatively spliced. From
these weak associations we conclude that MoRFs may share similar functional
characteristics.
These functional capacities are exploited in many molecular settings, which
suggests that MoRFs may fulfill many different functions. By considering
unifying mechanistic details of their various modes of action, one could
possibly better understand other novel functions of MoRFs.
E. Presence of polyproline type II helices & Ramachandran Plot
Using the algorithm of Sreerama et al. [56] to detect polyproline type II
helices, we obtained 53 such peptides (between 4 and 12 residues in length)
in the MoRF dataset. The existence of such peptides in this dataset suggests
that the extended and rather stiff polyproline II helix conformation in MoRFs
might explain why the interaction site is exposed. Also, by extracting phi
and psi angles for each of the MoRFs from their respective DSSP outputs, we
were able to draw the following Ramachandran plot [70]. The boxed region in
the plot indicates the region where the incidence of polyproline II helices
is the highest.
Figure 9: Ramachandran Plot for MoRFs
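The idea of a boxed polyproline II region can be illustrated with a simple scan over backbone angles. This is not the algorithm of reference [56]; it is a rough sketch that assumes (phi, psi) pairs in degrees have already been read from the DSSP output, and it uses an illustrative box around phi = -75 and psi = 145. The box limits, the minimum run length and the function names are assumptions.

```python
def in_ppii_box(phi, psi, phi_range=(-110.0, -50.0), psi_range=(120.0, 180.0)):
    """True if (phi, psi) falls inside the assumed polyproline II box."""
    return (phi_range[0] <= phi <= phi_range[1]
            and psi_range[0] <= psi <= psi_range[1])


def ppii_runs(angles, min_length=4):
    """Return (first, last) 0-based index pairs for runs of at least
    min_length consecutive residues whose angles lie inside the box."""
    runs = []
    start = None
    for i, (phi, psi) in enumerate(angles):
        if in_ppii_box(phi, psi):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(angles) - start >= min_length:
        runs.append((start, len(angles) - 1))
    return runs
```

Undefined angles, which DSSP reports as 360.0, fall outside the box and therefore break a run.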
Conclusions
Functional disorder has long been noted to be associated with molecular
recognition elements (MoRFs) that can bind to RNA, DNA and other proteins (or
sometimes even smaller ligands). Pertinent to this function is also the
success of disorder-based prediction of phosphorylation sites. Furthermore,
the function of many, or possibly all, of these MoRFs depends directly on
disorder, in that the disordered segment serves to recognize, solubilize or
loosen the structure of its binding partner. The multifarious functioning of
MoRFs (as in the example of p53, which functions both as an α-helical and as
an irregular MoRF; Figures 3(a) and 3(c)) suggests that the lack of an
ordered structure contributes in many ways to their mechanisms of action. In
fact, their highly malleable structure endows them with functional features
unparalleled by ordered proteins.
In this report, novel examples and extensions of MoRFs and their features are
presented. A typical advantage of the great conformational freedom of
intrinsically disordered proteins or protein fragments is most evident in
entropic chains, which may exert a long-range, entropic exclusion of other
proteins or cellular constituents in spacer functions [57]. Another molecular
setting where such regions abound is multidomain proteins, where globular
domains are often separated by flexible linkers. These regions facilitate an
easy orientational search and allow the recognition of distant and/or
discontinuous determinants on the target [14]. Fully disordered MoRFs also
exploit this unique feature. Their extended structure enables them to contact
their partner(s) over a large binding surface for a protein of the given size,
which allows the same interaction potential to be realized by shorter proteins
overall, encoded by a more economical genome [26]. In addition to these
advantages, flexibility is instrumental to the assembly process itself, as
certain complexes may not be assembled successfully from rigid components.
Another unique consequence of the structural flexibility of MoRFs is their
capacity to adapt to the structure of distinct partners, which enables an
exceptional plasticity in cellular responses. An amply characterized case of
this behavior is the Cdk inhibitor p21Cip1, which can interact with
CycA-Cdk2, CycE-Cdk2 and CycD-Cdk4 complexes [58] and with apoptosis
signal-regulating kinase 1 [59] under different conditions. The open, extended
structure of MoRFs also enables an increased speed of interaction. It has been
noted that macromolecular association rates are substantially improved by an
initial, relatively non-specific association enabled by flexible (disordered)
recognition segments, mechanistically formulated in the "fly-casting" [49]
mechanism of molecular recognition. Another prominent feature of MoRFs is that
their extreme proteolytic sensitivity, in principle, allows for effective
control via rapid turnover. In fact, protein disorder prevails in signaling,
regulatory and cancer-associated proteins, which are known to be short-lived
proteins subject to rapid turnover [10, 11]. Furthermore, disorder itself
constitutes an integral part of the proteasomal destruction signal in two
distinct ways. On the one hand, non-ubiquitinated MoRFs may be directly
degraded by the 20S proteasome, as shown for p21Cip1 [60] and for tau protein
(implicated in Alzheimer's disease) [61]. On the other hand, this mechanism
may also play a more subtle regulatory role, by processing disordered segments
in multidomain proteins and releasing the flanking, constitutively activated
globular domains through the endoproteolytic activity of the proteasome [62].
Disorder may also constitute part of the signal to the ubiquitination system
itself, as the regions of securin and cyclin B recognized by the
ubiquitination machinery have recently been shown to be natively unfolded
[63].
Discussions
Our observations suggest that MoRFs, in general, do not have to undergo
extensive structural rearrangements to adapt to their partner, as their residual
structure is germane to their final conformational state. The importance of
such structure in the binding process has been proposed for some MoRFs,
such as p27 (Kip1), p53 [58] and GCN4 [64].
The function of MoRFs is often realized via molecular recognition, in a
process of binding to a protein, RNA or DNA partner via a disorder-to-order
transition [2, 3, 9–11]. On this basis, we suggest that the binding process
be considered a special type of protein folding and protein complex
formation, since it includes the formation of intermolecular (tertiary
structure) contacts between the MoRF and its binding partner and also enables
the stabilization of secondary structure elements.
A physiologically effective action of MoRFs requires (i) specific and
reversible interactions with the partner (for activation and deactivation of
the whole complex) and (ii) the ability to fold quickly. By analogy with
folding models for globular proteins, two mechanisms for the formation of
structure in MoRFs have been suggested. In the first, the MoRF is in a
completely disordered state prior to binding and makes initial contacts
randomly, almost anywhere along its sequence. Subsequently, these contact
points serve as folding sites around which secondary structure elements form
as dictated by the partner. In such a mechanism, the inherent conformational
preferences of the intrinsically disordered protein itself may be overridden
by interactions with the partner, resulting in significantly different
secondary structure elements in its uncomplexed and bound states. This
mechanism could be understood to invoke the a priori formation of long-range
interactions that facilitate the subsequent formation of secondary structural
elements. The other mechanism involves the early formation of local secondary
structure [43]. In this case, the structure of the MoRF is not entirely
random and shows features that are also visible in the bound conformation. We
believe that transiently or permanently ordered segment(s) present in MoRFs
may serve as the binding sites for the partner proteins, around which the
protein folds. Based on this, one can hypothesize that a MoRF complex which
contains a multitude of contact points for its partner can be considered a
transient state of folding.
Analysis of the distribution of secondary structure elements shows that MoRFs
contain more irregular secondary structure, even in the bound state. The
abundance of irregular motifs in the bound structures suggests that although
their folding may be template-driven, MoRF partners do not impose large
constraints on their structure. Helices were found with comparable frequencies
in both MoRFs and monomeric proteins, whereas extended or sheet structures
are less preferred in MoRFs. The prime cause of this deviation may be the
different amino acid composition of MoRFs, with increased levels of C, R, S,
P and K and decreased levels of L, V, F, I, Y and D.
From an evolutionary perspective, selection in monomeric proteins acts to
conserve an amino acid sequence that, after folding, yields a protein with a
well-defined function. In the case of MoRFs, evolutionary pressure has
targeted the conservation of a sequence that initially lacks most signs of
regular structure and yet is primed to assume order as soon as it encounters
its macromolecular target(s).
The strong conformational preference of MoRFs for helical structural
elements also suggests that these elements could be transiently populated in
the unbound state. In other words, the accessible conformational space of
MoRFs is limited rather than expansive, and the number of possible final
structures is relatively small. This idea is in accordance with previously
reported observations that MoRFs display signs of residual structure. A
restricted choice of available conformational states minimizes the entropic
cost of binding. Also, the higher secondary structure prediction rates for
MoRF structures indicate that partner proteins cause minimal disturbance to
their pre-existing states. In short, interactions between a MoRF and the
contact sites of the partner make a favorable enthalpic contribution to
binding, thereby leading to better stabilization of the protein complex.
In summary, MoRFs can be regarded as “mixtures” of segments with strong
and weak (negligible) secondary structure preferences. These results extend
previous assertions that MoRFs possess structural features pertinent to their
partner recognition and function.
Appendix A: Molecular Recognition Features (or Elements) and their partners
# | MoRF PDB ID | MoRF PDB Name | MoRF Dbref | MoRF Db | MoRF start - end | Db match start - end | MoRE partner PDB ID | MoRE partner PDB Name
1 | 1a02f | c-Fos | P01100 | SW | 1-53 | 140-192 | 1ao2a | N-Fat
2 | 1a0rg | Transducin | P02698 | SW | 1-65 | 2-66 | 1aorp | Phosducin
3 | 1a1rc | Ns4A Protein | P27958 | SW | 1-16 | 1678-1693 | 1a1rb | Ns3
4 | 1a2cl | Alpha-Thrombin | P00734 | SW | 2-15 | 337-350 | 1a2ch | Alpha-Thrombin
5 | 1a2xb | Troponin I | P02643 | SW | 1-31 | 4-34 | 1a2ca | Troponin C
6 | 1a6ac | Clip | P04233 | SW | 1-15 | 103-117 | 1a6cb | Hla-Dr3
7 | 4aahb | Methanol Dehydrogenase | AAA83766 | GB | 1-69 | 28-96 | 4aaha | Methanol Dehydrogenase
8 | 1ab9a | Gamma-Chymotrypsin | P00766 | SW | 1-10 | 1-10 | 1a9ad | Gamma-Chymotrypsin
9 | 1an1i | Tryptase Inhibitor | P80424 | SW | 1-40 | 2-41 | 1an1e | Trypsin
10 | 1aqdc | Hla-A2 | P01892 | SW | 1-14 | 128-141 | 1aqdd | Hla-Dr1 Class II Histocompatibility Protein
11 | 1avfp | Gastricsin | P20142 | SW | 1-21 | 17-37 | 1avfa | Gastricsin
12 | 1avoa | 11S Regulator | Q06323 | SW | 1-60 | 4-63 | 1avob | 11S Regulator
13 | 1avpb | Adenoviral Proteinase | P24937 | SW | 1-11 | 240-250 | 1avpa | Adenoviral Proteinase
14 | 1avzc | Fyn Tyrosine Kinase | P06241 | SW | 1-57 | 85-141 | 1avzb | Negative Factor
15 | 1axcb | P21/Waf1 | P38936 | SW | 1-18 | 143-160 | 1axcc | Pcna
16 | 1b0nb | Sini Protein | P23308 | SW | 1-31 | 9-39 | 1bona | Sinr Protein
17 | 1b33n | Phycobilisome 7.8 Kd Linker Polypeptide | P20116 | SW | 1-67 | 1-67 | 1b33a | Allophycocyanin, Beta Chain
18 | 1b41b | Fasciculin-2 | P01403 | SW | 1-61 | 1-61 | 1b41a | Acetylcholinesterase
19 | 1b8hd | DNA Polymerase Fragment | AAA93077 | GB | 1-11 | 893-903 | 1b8hc | DNA Polymerase Processivity Component
20 | 1be3k | Cytochrome Bc1 Complex | P07552 | SW | 1-22 | 15-36 | 1be3f | Cytochrome Bc1 Complex
21 | 1bqpb | Lectin | P02867 | SW | 1-47 | 218-264 | 1bqpa | Lectin
22 | 2btci | Trypsin Inhibitor | P10293 | SW | 1-29 | 4-32 | 2btce | Trypsin
23 | 1bunb | Beta2-Bungarotoxin | P00989 | SW | 1-61 | 25-85 | 1buna | Beta2-Bungarotoxin
24 | 1bxlb | Bak Peptide | Q16611 | SW | 1-16 | 72-87 | 1bxla | Bcl-Xl
# | MoRF PDB ID | MoRF PDB Name | MoRF Dbref | MoRF Db | MoRF start - end | Db match start - end | MoRE partner PDB ID | MoRE partner PDB Name
25 | 1c04c | Ribosomal Protein L11 | P56210 | SW | 1-67 | 63-129 | 1c04d | Ribosomal Protein L14
26 | 1c5wa | Urokinase-Type Plasminogen Activator | P00749 | SW | 1-9 | 164-172 | 1c5wb | Urokinase-Type Plasminogen Activator
27 | 1c8ob | Ice Inhibitor | P07385 | SW | 1-32 | 310-341 | 1c8oa | Ice Inhibitor
28 | 1ca9g | Tnf-R2 | P20333 | SW | 1-7 | 422-428 | 1ca9e | Tnf Receptor Associated Factor 2
29 | 1ceeb | Wiskott-Aldrich Syndrome Protein Wasp | A55197 | SW | 1-59 | 230-288 | 1ceea | GTP-Binding Rho-Like Protein
30 | 1cf4b | Activated P21Cdc42Hs Kinase | Q07912 | SW | 2-44 | 447-489 | 1c4fa | Cdc42 Homolog
31 | 1cffb | Calcium Pump | AAA74511 | GB | 1-20 | 1100-1119 | 1cffa | Calmodulin
32 | 1clvi | Alpha-Amylase Inhibitor | P80403 | SW | 1-32 | 1-32 | 1clva | Alpha-Amylase
33 | 1cn3f | Fragment Of Coat Protein Vp2 | P12908 | SW | 11-29 | 279-297 | 1cn3e | Coat Protein Vp1
34 | 1cqgb | Ref-1 Peptide | P27695 | SW | 1-13 | 59-71 | 1cqga | Thioredoxin
35 | 1csba | Cathepsin B | P07858 | SW | 1-47 | 80-126 | 1csbb | Cathepsin B
36 | 1cvwl | Coagulation Factor Viia (Light Chain) (Des-Gl | P08709 | SW | 1-55 | 150-204 | 1cvwh | Coagulation Factor Viia (Heavy Chain) (Des-G
37 | 1d4tb | Signaling Lymphocytic Act. Molecule | NP_003028 | GB | 1-11 | 276-286 | 1d4ta | T Cell Signal Transduction Molecule Sap
38 | 1d6ri | Bowman-Birk Proteinase Inhibitor Precursor | P01055 | SW | 1-58 | 45-102 | 1d6ra | Trypsinogen
39 | 1d8dp | K-Ras4B Peptide Substrate | P01118 | SW | 1-11 | 178-188 | 1d8db | Farnesyltransferase (Beta Subunit)
40 | 1dd3c | 50S Ribosomal Protein L7/L12 | P29396 | SW | 1-32 | 1-32 | 1dd3b | 50S Ribosomal Protein L7/L12
41 | 1deeg | Immunoglobulin G Binding Protein A | P02976 | SW | 1-51 | 103-153 | 1deee | Igm Rf 2A2
42 | 1devb | Smad Anchor For Receptor Activation | AAC99462 | GB | 1-41 | 669-709 | 1deva | Mad (Mothers Against Decapentaplegic, Drosop
43 | 1dm0b | Shiga Toxin B Subunit | XVEBBD | GB | 1-69 | 21-89 | 1dm0a | Shiga Toxin A Subunit
44 | 1dmlb | DNA Polymerase | P07917 | SW | 1-36 | 1200-1235 | 1dmla | DNA Polymerase Processivity Factor
45 | 1dp5b | Proteinase Inhibitor Ia3 | P01094 | SW | 1-29 | 2-30 | 1dp6a | Fixl Protein
46 | 1ds5e | Casein Kinase, Beta Chain | P13862 | SW | 1-16 | 188-203 | 1ds5d | Casein Kinase, Alpha Chain
# | MoRF PDB ID | MoRF PDB Name | MoRF Dbref | MoRF Db | MoRF start - end | Db match start - end | MoRE partner PDB ID | MoRE partner PDB Name
47 | 1dtdb | Metallocarboxypeptidase Inhibitor | P81511 | SW | 1-61 | 20-80 | 1dtda | Lipase
48 | 1e0ab | Serine/Threonine-Protein Kinase Pak-Alpha | P35465 | SW | 3-46 | 75-118 | 1e0aa | G25K GTP-Binding Protein, Placental Isoform
49 | 1e79i | ATP Synthase Epsilon Chain | P05632 | SW | 1-47 | 2-48 | 1e79h | ATP Synthase Delta Chain
50 | 1eaic | Chymotrypsin/Elastase Isoinhibitor 1 | P07851 | SW | 1-61 | 1-61 | 1eaib | Elastase
51 | 1ebdc | Dihydrolipoamide Acetyltransferase | P11961 | SW | 1-41 | 130-170 | 1ebdb | Dihydrolipoamide Dehydrogenase
52 | 1ed3c | Peptide Mtf-E (13N3E) | P05504 | SW | 1-13 | 29-41 | 1ed3d | Class I Major Histocompatibility Antigen Rt1
53 | 1ee5b | Nucleoplasmin | P05221 | SW | 1-19 | 153-171 | 1ee6a | Pectate Lyase
54 | 1eg4p | Beta-Dystroglycan | Q14118 | SW | 1-13 | 882-894 | 1eg4a | Dystrophin
55 | 1ej4b | Eukaryotic Translation Initiation Factor 4E B | NP_004086 | GB | 1-14 | 51-64 | 1ej4a | Eukaryotic Initiation Factor 4E
56 | 1ejhe | Eukaryotic Initiation Factor 4Gii | AAC02903 | GB | 1-14 | 622-635 | 1ejhc | Eukaryotic Initiation Factor 4E
57 | 1ejop | Fmdv Peptide | AAA42624 | GB | 1-13 | 136-148 | 1ejoh | Igg2A Monoclonal Antibody (Heavy Chain)
58 | 1ekba | Enteropeptidase | P98072 | SW | 1-7 | 788-794 | 1ekbb | Enteropeptidase
59 | 1emub | Adenomatous Polyposis Coli Protein | P25054 | SW | 1-16 | 2034-2049 | 1emua | Axin
60 | 3erdc | Glucocorticoid Receptor Interacting Protein 1 | AAC53151 | GB | 1-12 | 687-698 | 3erdb | Estrogen Receptor Alpha
61 | 1ezvi | Ubiquinol-Cytochrome C Reductase Complex 7.3 | P22289 | SW | 4-58 | | 1ezvx | Heavy Chain (Vh) Of Fv-Fragment
62 | 1f02t | Translocated Intimin Receptor | AAC38390 | GB | 2-66 | 272-336 | 1f02i | Intimin
63 | 1f3jp | Lysozyme C | P00698 | SW | 1-14 | 29-42 | 1f3jd | H-2 Class II Histocompatibility Antigen
64 | 1f47a | Cell Division Protein Zipa | P06138 | SW | 1-17 | 367-383 | 1f47b | Cell Division Protein Ftsz
65 | 1f4vd | Flagellar Motor Switch Protein | P06974 | SW | 1-16 | 1-16 | 1f4vc | Chemotaxis Chey Protein
66 | 1f83b | Synaptobrevin-II | P19065 | SW | 1-24 | 53-76 | 1f83a | Botulinum Neurotoxin Type B
67
68
69
70
71
72
73
74
75
76
77
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1f8vd
1f93e
Mature Capsid Protein Gamma
Hepatocyte Nuclear Factor 1Alpha
Ribosomal Protein L24E
Ecotin
Beta-Acrosin Light Chain
30S Ribosomal Protein S14
30S Ribosomal Protein Thx
Coagulation Factor Xa
Elafin
Retinal Rod Rhodopsin-Sensitive
Cgmp 3',5'- C
Cyclin A/Cdk2-Associated P19
Myelin Basic Protein
AAF71693
P22361
GB
SW
1-26
1-31
362-401
1-31
MoRE
partner
PDB ID
1f8va
1f93d
P14116
AAA16410
P08001
P24320
P32193
P00742
P19957
P04972
SW
GB
SW
SW
SW
SW
SW
SW
1-53
1-48
1-13
1-60
1-24
1-52
1-47
1-38
4-56
122-169
20-32
1-60
2-25
127-178
71-117
50-87
1ffkt
1fi8b
1fiza
1fjgm
1fjgt
1fjsa
1flee
1fqjd
AAC50242
AAC41944
GB
GB
1-41
1-20
109-149
111-130
1fs1d
1fv1d
CAA67686
GB
1-41
2-52
1g3jc
AAA64465
P01062
GB
SW
1-24
1-22
140-163
10-31
1g5ja
1g9ie
P03652
SW
1-12
14-25
1gff2
1gg6a
1gl0i
1gl1i
1gngx
1h15c
Tcf3-Cbd (Catenin Binding
Domain)
Bad Protein
Bowman-Birk Type Trypsin
Inhibitor
Bacteriophage G4 Capsid Proteins
Gpf, Gpg, Gp
Gamma Chymotrypsin
Protease Inhibitor Lcmi I
Protease Inhibitor Lcmi II
Frattide
DNA Polymerase
P00766
P80060
P80060
Q92837
P03198
SW
SW
SW
SW
SW
1-10
1-32
1-34
1-26
1-14
1-10
22-53
58-91
198-223
628-641
1gg6b
1gl0e
1glle
1gngb
1h15d
1h2ls
Hypoxia-Inducible Factor 1 Alpha
Q16665
SW
1-22
795-822
1h21a
1ffkr
1fi8d
1fizl
1fjgn
1fjgv
1fjsl
1flei
1fqjc
1fs1a
1fv1c
78
1g3jb
79
80
1g5jb
1g9ii
81
1gff3
82
83
84
85
86
87
88
MoRE partner
PDB Name
Mature Capsid Protein Beta
Dimerization Cofactor Of
Hepatocyte Nucl
Ribosomal Protein L30
Natural Killer Cell Protease 1
Beta-Acrosin Heavy Chain
30S Ribosomal Protein S13
30S Ribosomal Protein S20
Coagulation Factor Xa
Elastase
Chimera Of Guanine NucleotideBinding Protei
Cyclin A/Cdk2-Associated P45
Major Histocompatibility Complex
Alpha Chain
Beta-Catenin Armadillo Repeat
Region
Apoptosis Regulator Bcl-X
Trypsinogen, Cationic
Bacteriophage G4 Capsid Proteins
Gpf, Gpg, G
Gamma Chymotrypsin
Chymotrypsinogen A
Alpha-Chymotrypsin
Glycogen Synthase Kinase-3 Beta
Hla Class II Histocompatibility
Antigen
Factor Inhibiting Hif1
89
90
91
92
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1h2sb
1h25e
Sensory Rhodopsin II Transducer
Retinoblastoma-Associated
Protein
Cellular Tumor Antigen P53
Retinoblastoma-Like Protein 1
Cyclin-Dependent Kinase
Inhibitor 1B
Cytotoxic T-Lymphocyte Protein
4
Bacteriophage T4 Short Tail Fibre
Caat/Enhancer Binding Protein
Beta
P-Selectin Peptide
P42259
P06400
SW
SW
1-60
1-10
23-82
869-879
P04637
P28749
P46527
SW
SW
SW
1-9
1-10
1-6
P16410
SW
P10930
P17676
Myelin Basic Protein
Cytochrome C Oxidase
Polypeptide IV
Hirudin Variant-1
Melanoma-Associated Antigen 4
S-Adenosylmethionine
Decarboxylase Beta Chain
Mhc Class II I-Ak
Cathepsin L: Light Chain
DNA Ligase IV
Ca2+/Calmodulin Dependent
Kinase Kinase
Importin Alpha-2 Subunit
Epidermal Growth Factor
Holliday Junction DNA Helicase
Ruva
1h26e
1h28e
1h27e
93
1h6ep
94
95
1h6wb
1h89a
96
1hesp
97
98
99
100
101
102
103
104
105
106
107
108
109
1hqrc
1hr8o
1hxei
1i4fc
1i72b
1iakp
1icfb
1ik9c
1iq5b
1iq1a
1ivoc
1ixsa
MoRE
partner
PDB ID
1h2sa
1h25a
MoRE partner
PDB Name
378-386
654-663
25-35
1h26d
1h28d
Cyclin A2
Cyclin A2
Cyclin A2
1-10
197-206
1h6ea
SW
SW
1-10
1-64
518-527
273-336
1h6wa
1h89c
Clathrin Coat Assembly Protein
Ap50
Bacteriophage T4 Short Tail Fibre
Myb Proto-Oncogene Protein
P16109
SW
1-9
814-822
1hesa
AAC41944
P04037
GB
SW
1-10
1-13
114-123
7-19
1hqrb
1hr8h
P01050
P43358
P17707
SW
SW
SW
1-10
1-10
1-61
55-64
230-239
4-67
1hxeh
1i4fb
1i72a
P24364
P07711
XP_007098
T37317
SW
SW
GB
PIR
1-13
1-42
1-28
1-24
50-62
292-333
755-782
334-357
1iakb
1icfc
1ik9b
1iq5b
Clathrin Coat Assembly Protein
Ap50
Hla-Dr Beta Chain
Mitochondrial Processing Peptidase
Beta Subunit
Thrombin
Beta-2-Microglobulin
S-Adenosylmethionine
Decarboxylase Alpha Chain
Mhc Class II I-Ak
Cathepsin L: Heavy Chain
DNA Repair Protein Xrcc4
Calmodulin
P52293
P01133
Q9F1Q3
SW
SW
SW
1-7
1-47
1-50
47-53
975-1021
142-191
1iq1c
1ivob
1ixsb
Importin Alpha-2 Subunit
Epidermal Growth Factor Receptor
Ruvb
Sensory Rhodopsin II
Cyclin A2
110
111
112
113
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1izlf
1izlk
1j5am
1jacb
1jb0i
Photosystem II: Subunit Psbf
Photosystem II: Subunit Psbk
Ribosomal Protein L32
Jacalin
Photosystem 1 Reaction Centre
Subunit Viii
Photosystem 1 Reaction Centre
Subunit Xii
Cell Death Protein Grim
Head Involution Defective Protein
C-Type Natriuretic Peptide
NP_682332
Q9F1K9
P49228
P18671
P25900
GB
SW
SW
SW
SW
1-30
1-27
1-58
1-15
1-38
14-43
14-40
2-59
4-18
1-38
MoRE
partner
PDB ID
1izld
1izld
1j5al
1jaca
1jb0f
P25903
SW
1-31
1-31
1jb0l
AAC47727
AAA79985
P23582
GB
GB
SW
1-8
1-8
1-18
2-9
2-9
109-126
1jd5a
1jd6a
1jdpb
Neuroserpin
Neuroserpin
Insulin B Peptide
Mitogen-Activated Protein Kinase
Kinase 2
Splicing Factor U2Af 65 Kda
Subunit
Protein Mu-1
Agglutinin
Cation-Independent Mannose 6Phosphate Recept
Adenomatous Polyposis Coli
Protein
Elongation Factor G
Crk
Cation-Independent Mannose-6Phosphate Recept
O35684
O35684
CAA08766
NP_109587
SW
SW
GB
GB
1-40
1-31
1-13
1-16
25-64
367-397
35-47
1-16
1jjoc
1jjoc
1jk8b
1jkya
P26368
SW
1-23
90-112
1jmta
AAA47236
P18676
P11717
GB
SW
SW
1-33
1-16
1-8
10-42
2-17
2484-2491
1jmub
1jota
1jplb
P25054
SW
1-11
1021-1031
1jppb
P13551
Q64010
P11717
SW
SW
SW
1-32
1-12
1-7
220-251
217-228
2485-2491
1jqra
1ju5a
1jwgb
114
1jb0m
115
116
117
118
119
120
121
1jd5b
1jd6b
1jdph
1jjoa
1jjoe
1jk8c
1jkyb
122
1jmtb
123
124
125
1jmua
1jotb
1jple
126
1jppc
127
128
129
130
1jqsb
1ju5b
1jwgc
MoRE partner
PDB Name
Photosystem II: Subunit Psbd
Photosystem II: Subunit Psbd
Ribosomal Protein L22
Jacalin
Photosystem 1 Reaction Centre
Subunit III
Photosystem 1 Reaction Centre
Subunit Xi
Apoptosis 1 Inhibitor
Apoptosis 1 Inhibitor
Atrial Natriuretic Peptide Clearance
Recepto
Neuroserpin
Neuroserpin
Mhc Class II Hla-Dq8
Lethal Factor
Splicing Factor U2Af 35 Kda
Subunit
Protein Mu-1
Agglutinin
ADP-Ribosylation Factor Binding
Protein
Beta-Catenin
DNA Polymerase Beta-Like
Crk
ADP-Ribosylation Factor Binding
Protein Gga1
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1k2dp
Myelin Basic Protein Peptide
With 8 Residue
Insulin Receptor Substrate 1
XP_040888
GB
1-8
2-9
MoRE
partner
PDB ID
1k2db
P35568
SW
1-8
894-901
1k3aa
Dipeptydil-Peptidase I Heavy
Chain
Steroid Receptor Coactivator-1
Peptide N-Y-C
Ps1 Peptide
Regulator Of G-Protein Signaling
14
Nuclear Receptor Co-Repressor 2
P53634
SW
1-69
395-463
1k3bb
AAB50242
Q13291
AAK97192
O08773
GB
SW
GB
SW
1-10
1-12
1-15
1-35
687-696
275-286
17-31
496-530
1k4wa
1ka7a
1kcrh
1kjyc
Q9Y618
SW
1-19
2339-2357
1kkqd
Nuclear Pore Complex Protein
Nup98
Amphiphysin
Traf Family Member-Associated
Nf-Kappa-B Acti
Traf Family Member-Associated
Nf-Kappa-B Acti
Outer Membrane Virulence
Protein Yope
General Control Protein Gcn4
Map Kinase Kinase 3B
Eukaryotic Protein Synthesis
Initiation Facto
Plasma Serine Protease Inhibitor
Fibrinogen Alpha/Alpha-E Chain
Oligopeptide Substrate For The
Protease
P52948
SW
1-6
882-887
1ko6a
P49418
Q92844
SW
SW
1-9
1-11
322-330
177-187
1ky7a
1kzza
Nuclear Receptor Ror-Beta
Sh2 Domain Protein 1A
Pc283 Immunoglobulin
Guanine Nucleotide-Binding Protein
G(I)
Peroxisome Proliferator Activated
Receptor
Nuclear Pore Complex Protein
Nup98
Alpha-Adaptin C
Tnf Receptor Associated Factor 3
Q92844
SW
1-17
178-194
1l0aa
Tnf Receptor Associated Factor 3
P08008
SW
1-57
22-78
1l2wh
Yope Regulator
P03069
AAB40652
AAC82471
SW
GB
GB
1-28
1-8
1-22
250-277
22-29
138-159
1ld4d
1leza
1lj2a
P05154
P02671
P04517
SW
SW
SW
1-29
1-65
1-10
378-406
145-209
2785-2794
1lq8c
1lt9b
1lvba
Coat Protein C
Mitogen-Activated Protein Kinase 14
Nonstructural RNA-Binding Protein
34
Plasma Serine Protease Inhibitor
Fibrinogen Beta Chain
Catalytic Domain Of The Nuclear
Inclusio
131
1k3ab
132
1k3bc
133
134
135
136
1k4wb
1ka7b
1kcrp
1kjyb
137
1kkqe
138
1ko6b
139
140
1ky7p
1kzzb
141
1l0ab
142
1l2wi
143
144
145
146
147
148
149
1ld4e
1lezb
1lj2c
1lq8b
1lt9a
1lvbc
MoRE partner
PDB Name
H-2 Class II Histocompatibility
Antigen
Insulin-Like Growth Factor 1
Receptor
Dipeptydil-Peptidase I Light Chain
150
151
152
153
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1lw6i
Subtilisin-Chymotrypsin Inhibitor2A
Jacalin, Beta Chain
Nuclear Receptor Coactivator 2
Integrin Beta3
Transforming Growth Factor
Alpha
Serine Proteinase Inhibitor
(Serpin), Chain B
P-Glycoprotein
Target Sequence Of Rat
Calmodulin-Dependent Protein
Kinase I
U4/U6 Snrnp 60Kda Protein
Peptide Corresponding To The NTerminal Exten
Pyruvoyl-Dependent Arginine
Decarboxylase Bet
Iq2 and Iq3 Motifs From Myo2P,
A Class V Myos
Enoyl-Acyl Carrier Reductase
Transcription Initiation Factor Iia
Large Chain
Nitric-Oxide Synthase,
Endothelial
Retinoic Acid Receptor
Steroid Receptor Coactivator-1
Nuclear Receptor Coactivator 1
Isoform 3
Glutamate Decarboxylase
P01053
SW
1-63
AAA32678
Q15596
AAA67537
P01135
GB
SW
GB
SW
ZP_00059457
1m26b
1m2zb
1mk7a
1moxc
154
1mtpb
155
156
157
158
1mvup
1mxee
1mzwb
1n12b
159
1n13a
160
1n2dc
161
162
1nhdc
1nh2b
163
1niwb
164
165
166
167
168
2nlla
1nq7b
1nrlc
1nwdb
MoRE partner
PDB Name
21-83
MoRE
partner
PDB ID
1lw6e
1-15
1-21
3-13
1-49
64-78
734-754
739-749
41-89
1m26c
1m2zd
1mk7b
1moxb
Jacalin, Alpha Chain
Glucocorticoid Receptor
Talin
Epidermal Growth Factor Receptor
GB
1-35
385-419
1mtpa
AAA37004
Q63450
GB
SW
1-13
1-25
1210-1222
294-318
1mvub
1mxeb
Serine Proteinase Inhibitor (Serpin),
Chain
Ig Vdj-Region (Heavy Chain)
Calmodulin
O43172
P42190
SW
SW
1-31
1-11
107-137
22-32
1mzwa
1n12a
U-Snrnp-Associated Cyclophilin
Mature Fimbrial Protein Pape
Q57764
SW
1-46
7-52
1n13b
P19524
SW
1-48
806-853
1n2db
Pyruvoyl-Dependent Arginine
Decarboxylas
Myosin Light Chain
AAK25802
P32773
GB
SW
1-60
1-46
366-425
3-48
1nhdb
1nh2a
Enoyl-Acyl Carrier Reductase
Transcription Initiation Factor Tfiid
P29474
SW
1-19
492-510
1niwc
Calmodulin
P19793
AAB50242
NP_671766
SW
GB
GB
1-66
1-10
1-15
135-200
687-696
682-696
2nllb
1nq7a
1nrlb
Thyroid Hormone Receptor
Nuclear Receptor Ror-Beta
Orphan Nuclear Receptor Pxr
Q07346
SW
3-28
470-495
1nwda
Calmodulin
Subtilisin Bpn
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1nx0c
Calpastatin
P49342
SW
1-11
230-240
MoRE
partner
PDB ID
1nw0b
1nx1c
Calpastatin
P49342
SW
1-11
230-240
1nx1b
1occm
1oc0b
1oj5b
Cytochrome C Oxidase
Vitronectin
Signal Transducer and Activator
Of Transcript
Restricted Expression
Proliferation Associate
Splicing Factor Sf1
P10175
P04004
P42226
SW
SW
SW
1-43
1-37
1-14
25-67
22-58
795-808
1occn
1oc0a
1oj5a
Calcium-Dependent Protease, Small
Subunit
Calcium-Dependent Protease, Small
Subunit
Cytochrome C Oxidase
Plasminogen Activator Inhibitor-1
Steroid Receptor Coactivator 1A
Q9ULW0
SW
1-30
7-43
1ol5a
Serine/Threonine Kinase 6
CAA03883
GB
1-13
13-25
1opia
Flagellin
Nuclear Receptor Coactivator 2
Flavocytochrome B558 Alpha
Polypeptide
Chymotrypsinogen A
Copii-Binding Peptide Of The
Integral Membran
Histone H3
Retinoblastoma-Associated
Protein
Histone-Binding Protein N1/N2
Histone H3
Aspartate 1-Decarboxylase Beta
Chain
Outer Membrane Phospholipase
(Ompla)
Gh-Loop From Virus Capsid
Protein Vp1
O67803
Q61026
NP_000092
SW
SW
GB
1-40
1-12
1-11
479-518
741-752
150-160
1orya
1osvb
1ov3b
Splicing Factor U2Af 65 Kda
Subunit
Flagellar Protein Flis
Bile Acid Receptor
Neutrophil Cytosol Factor 1
P00766
Q01590
SW
SW
1-14
1-10
16-29
201-210
1oxga
1pd0a
Chymotrypsinogen A
Protein Transport Protein Sec24
P02303
P06400
SW
SW
1-7
3-19
8-14
860-876
1pega
1pjmb
Histone H3 Methyltransferase Dim-5
Importin Alpha-2 Subunit
P06180
P02303
P31664
SW
SW
SW
1-20
1-15
1-24
532-552
8-22
1-24
1pjnb
1pu9b
1pyub
P00631
SW
1-13
33-45
1qd6c
AAA42665
GB
1-24
133-156
1qgc4
Importin Alpha-2 Subunit
Hat A1
Aspartate 1-Decarboxylase Alfa
Chain
Outer Membrane Phospholipase
(Ompla)
Inmunoglobuline
169
170
171
172
173
1ol5b
174
1opib
175
176
177
178
179
180
181
182
183
184
1oryb
1osvc
1ov3c
1oxgb
1pd0b
1pegp
1pjma
1pjna
1pu9b
1pyua
185
1qd6a
186
1qgc5
187
MoRE partner
PDB Name
188
189
190
191
192
193
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1qgkb
1qled
Importin Alpha-2 Subunit
Cytochrome C Oxidase
P52292
P77921
SW
SW
1-44
1-43
11-54
8-49
MoRE
partner
PDB ID
1qgka
1qlec
1qsnb
1r0tb
1r1rd
Histone H3
Ovomucoid
Ribonucleotide Reductase R2
Protein
Nuclear Receptor Co-Repressor 2
Peroxisome Proliferator-Activated
Receptor Bi
Within The Bgcn Gene Intron
Protein
Flap Structure-Specific
Endonuclease
Rhodopsin
Peptide From Collagen II
Transcription Initiation Factor Iid
230K Chai
Tissue Factor Pathway Inhibitor
Heat Labile Enterotoxin Type Iib
Nucleoporin Nup2
Potential Transcriptional
Repressor Not4Hp
Hirudin
P53
Marcks
Iq4 Motif From Myo2P, A Class
V Myosin
Parathyroid Hormone-Related
Protein
P02303
P01004
P00453
SW
SW
SW
1-11
1-62
1-16
10-20
65-126
361-375
1qsna
1r0ta
1rlrb
Importin Beta Subunit
Cytochrome C Oxidase Polypeptide
III
Tgcn5 Histone Acetyl Transferase
Trypsin
Ribonucleotide Reductase R1 Protein
Q9Y618
Q15648
SW
SW
1-17
1-11
1414-1430
640-650
1r2bb
1rk3a
B-Cell Lymphoma 6 Protein
Vitamin D3 Receptor
P82804
SW
1-33
3-35
1rk8b
Mago Nashi Protein
O29975
SW
1-11
326-336
1rxza
DNA Polymerase Sliding Clamp
O62798
P02458
A47371
SW
SW
PIR
1-18
2-11
1-67
50-67
1169-1178
11-77
1ry1u
2sebd
1tbab
Signal Recognition Particle Protein
Enterotoxin Type B
Transcription Initiation Factor Tfiid
P10646
P43528
P32499
O95628
SW
SW
SW
SW
1-58
1-36
1-16
1-52
121-178
215-250
36-51
12-63
1tfxb
1tiia
1un0b
1ur6a
P28507
AAA59989
P26645
P19524
SW
GB
SW
SW
1-15
1-11
1-18
1-25
51-65
17-27
148-165
854-878
1vith
1ycqa
1iwqa
1m46a
Trypsin
Heat Labile Enterotoxin Type Iib
Importin Alpha Subunit
Ubiquitin-Conjugating Enzyme E2-17 Kda 2
Alpha Thrombin
Mdm2
Calmodulin
Myosin Light Chain
P12272
SW
1-28
103-130
1m5ns
Importin Beta-1 Subunit
1r2bc
1rk3c
194
1rk8c
195
1rxzb
196
197
198
199
200
201
202
203
204
205
206
1ry1s
2sebe
1tbaa
1tfxc
1tiic
1un0c
1ur6b
1viti
1ycqb
1iwqb
1m46b
207
1m5nq
208
MoRE partner
PDB Name
209
210
211
212
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1m93a
1mqsb
1n0wb
Serine Proteinase Inhibitor 2
Integral Membrane Protein Sed5
Breast Cancer Type 2
Susceptibility Protein
Transcription Factor E2F2
Glycogen Synthase Kinase-3 Beta
P07385
Q01590
P51587
SW
SW
SW
1-46
1-21
1-33
1-46
1-21
1519-1551
MoRE
partner
PDB ID
1m93b
1mqsa
1n0wa
Q14209
P49841
SW
SW
1-18
1-10
410-427
3-12
1n4mb
1o6ka
Genome Polyprotein Capsid
Protein C
16-Mer Peptide From Intercellular
Adhesion Mo
E2F-1 Transcription Factor
P29846
SW
1-16
25-40
1n64h
Serine Proteinase Inhibitor 2
Sly1 Protein
DNA Repair Protein Rad51 Homolog
1
Retinoblastoma Pocket
Rac-Beta Serine/Threonine Protein
Kinase
Fab 19D9D6 Heavy Chain
P35330
SW
1-16
253-268
1j19a
Radixin
Q01094
SW
1-18
409-426
1o9kh
Axin Peptide
ADP-Ribosylation Factor Binding
Protein Gga1
Transcription Initiation Factor Iia
Alpha Cha
Tumor Necrosis Factor Receptor
Superfamily Member
Tumor Necrosis Factor Receptor
Superfamily Member
Cbp/P300-Interacting
Transactivator 2
Rabaptin-5
O15169
Q9UJY5
SW
SW
1-18
1-41
383-400
168-208
1o9ua
1j2ja
Retinoblastoma Tumour Suppressor
Protein
Glycogen Synthase Kinase-3 Beta
ADP-Ribosylation Factor 1
P52655
SW
1-43
9-51
1nvpa
Tata Box Binding Protein
Q02223
SW
1-39
8-46
1oqdj
Q96RJ3
SW
1-31
16-46
1oqej
Q99967
SW
1-52
193-259
1p4qb
Tumor Necrosis Factor Ligand
Superfamily Member
Tumor Necrosis Factor Ligand
Superfamily Member
E1A-Associated Protein P300
Q15276
SW
1-6
440-445
1p4ua
Preprotein Translocase Seca
Subunit
Nuclear Receptor Coactivator 2
Bcl2-Like Protein 11
Large T Antigen
P43803
SW
1-24
876-899
1ozbh
ADP-Ribosylation Factor Binding
Protein Gga3
Protein-Export Protein Secb
Q15596
AAC40030
P03070
SW
GB
SW
1-9
1-33
1-7
743-751
83-115
127-133
1p93b
1pq1a
1q1tc
Glucocorticoid Receptor
Apoptosis Regulator Bcl-X
Importin Alpha-2 Subunit
1n4mc
1o6kc
213
1n64p
214
1j19b
215
1o9kp
216
217
1o9ub
1j2jb
218
1nvpb
219
1oqdk
220
1oqek
221
1p4qa
222
1p4ub
223
1ozbi
224
225
226
227
1p93e
1pq1b
1q1ta
MoRE partner
PDB Name
228
229
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1ujjc
C-Terminal Peptide From Beta-Secretase
Nedd2-Like Caspase Cg8091-Pa
Cytochrome B6-F Complex Iron-Sulfur Subunit
Golgi Autoantigen, Golgin
Subfamily A Member
R18 Peptide
(Phcvprdlswldleanmclp)
Peptide
L-Pro10
P56817
SW
1-7
495-501
MoRE
partner
PDB ID
1ujjb
NP_524017
P49728
GB
SW
1-8
1-39
115-122
33-71
1q4qj
1q90d
Q13439
SW
1-51
2172-2222
1r4aa
NF00163012
PIR
1 - 20
1 - 20
1a38b
ADP-Ribosylation Factor Binding
Protein Gga1
Apoptosis 1 Inhibitor
Cytochrome B6-F Complex Subunit
4
ADP-Ribosylation Factor-Like
Protein 1
14-3-3 Protein Zeta
O73683
Q9JJN2
SW
SW
1 - 10
1 - 10
1aqcb
1awib
X11
Profilin
Q8JNV2
SW
1 – 67
1aym3
Human Rhinovirus 16 Coat Protein
CAB58569
NF00375716
NF00514021
P82107
NF00110208
EMBL
PIR
PIR
SW
PIR
1 - 10
1 - 10
1 - 10
1 - 59
1 - 44
459 - 468
1 - 10
1 - 10
1 - 59
1 - 44
1biib
1bjre
1bogb
1c9pa
1cqtb
1dxpc
1e0fi
1eb1a
1ebpc
2h1pp
1heze
1hh6c
Human Rhinovirus 16 Coat
Protein
Decameric Peptide
Lactoferrin
Peptide
Bdellastasin
Pou Domain, Class 2, Associating
Factor 1
Nonstructural Protein Ns4A (P4)
Haemadin
Peptide Inhibitor
Epo Mimetics Peptide 1
Pa1
Protein L
Pep-4
766 - 775
3098 - 3017,
3099 - 3018
2 - 69
NF00235394
Q25163
NF00866356
CAD13109
NF00522862
NF00429845
NF00505422
PIR
SQ
PIR
EMBL
PIR
PIR
PIR
1 - 16
1 - 45
1 - 10
1 - 19
1 - 12
1 - 60
1 - 11
1 - 16
21 - 65
1 - 10
234 - 253
1 - 12
1 - 60
1 - 11
1dxpb
1eoff
1eb1h
1epbh
2h1ph
1hezd
1hh6b
2hrpp
Hiv-1 Protease Peptide
Q8ADZ9
SW
1 - 10
524 - 533
2hrpm
Beta-2 Microglobulin
Proteinase K
Antibody (Cb 4-1)
Trypsin
Pou Domain, Class 2, Transcription
Factor 1
Protease/Helicase Ns3 (P70)
Thrombin
Thrombin Heavy Chain
Epo Receptor
2H1
Heavy Chain Of Ig
Igg2A Kappa Antibody Cb41 (Heavy
Chain)
Monoclonal Antibody F11.2.32
1q4qk
1q90r
230
1r4ae
231
1a38p
232
233
1aqcc
1awip
234
1aym4
235
236
237
238
239
240
241
242
243
244
245
246
247
248
1biip
1bjri
1bogc
1c9pb
1cqti
MoRE partner
PDB Name
249
250
251
252
253
254
255
256
257
258
259
260
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1ir3b
1juqe
Peptide Substrate
Cation-Dependent Mannose-6-Phosphate Receptor
Endo-1,4-Beta-Xylanase Y
Gp120
Talin
Strep-Tag II Peptide
Two Chain Tissue Plasminogen
Activator
Sfti-1
Peptide Epitope
Ste-20 Related Adaptor
Orexin
NF00103224
P20645
PIR
SW
1 – 18
1 - 10
1 - 18
265 - 277
MoRE
partner
PDB ID
1ir3a
1juqa
P16218
CAA00727
Q8AWI0
CAC22716
NF00107636
SW
EMBL
SW
EMBL
PIR
2 - 56
1 -16
1 -26
1 - 10
1 - 17
832 - 891
316 - 333
1944 - 1969
650 - 659
1 - 17
1ohza
1qnzh
1rkca
1rsub
1rtfb
NF00227071
NP_877418
GI:33303905
NF01572587
PIR
GB
GB
PIR
1 - 14
1 – 13
1 - 10
1 - 33
1 - 14
179 - 191
391 - 402
32 - 63
1sfia
1sm3h
1upka
1uvqb
13-Mer Peptide
Fusion Protein Consisting Of
Transforming Pro
Igg1 Fab Fragment (59.1)
Complexed With Hiv-1
Alpha1 Antichymotrypsin - Chain
B
c-AMP-Dependent Protein Kinase
(E.C. 2.7.1.37)
C-Myc Tag and His Tag
Modified Alpha=1=-Antitrypsin
(Modified Alpha
Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain)
Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain)
O73683
NF01479846
SW
PIR
1 - 13
1 - 11
764 - 776
1 - 11
1x11a
1n4pl
NF00927552
PIR
2 -25
1 - 23
1acyh
Q9UNU9
SW
1 - 40
368 - 407
2acha
GI:530223
GB
1 -20
7 - 26
1apme
GI:28474948
GI:22207050
GB
GB
1 -17
1 -36
364 - 380
613 - 648
2ap2d
7apia
GI:189730
GB
1 - 11
1006 - 1016
1ayaa
Q28224
SW
1 - 12
901 - 912
1ayba
1ohzb
1qnzp
1rkcb
1rsup
1rtfa
1sfii
1sm3p
1upkb
1uvqc
1x11c
1n4pm
261
1acyp
262
2achb
263
1apmi
264
265
2ap2e
7apib
266
1ayap
267
1aybp
268
MoRE partner
PDB Name
Insulin Receptor
ADP-Ribosylation Factor Binding
Protein
Cellulosomal Scaffolding Protein A
0.5B Antibody (Heavy Chain)
Vinculin
Streptavidin
Two Chain Tissue Plasminogen
Activator
Trypsin
Sm3 Antibody
Mo25 Protein
Hla Class II Histocompatibility
Antigen
X11
Geranyltransferase Type-I Beta
Subunit
Igg1 Fab Fragment (59.1)
Complexed With Hiv-
Alpha1 Antichymotrypsin - Chain A
c-AMP-Dependent Protein Kinase
(E.C. 2.7.1.3
Antibody (Heavy Chain)
Modified Alpha=1=-Antitrypsin
(Modified Alph
Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain)
Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain)
269
270
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1aycp
Tyrosine Phosphatase Syp (N-Terminal Sh2 Doma
Cricket Paralysis Virus, Vp4
Calmodulin (Calcium-Bound)
Complexed With Rab
Black Beetle Virus Capsid Protein
(Bbv) Compl
Hirudin I
Bacteriophage Phix174 Capsid
Proteins Gpf, Gp
Peptide
Trypsin (E.C. 3.4.21.4) Variant
(D189G, G226D
Grip1
Hla-Dr2
His Tag
Calmodulin Complexed With
Calmodulin-Binding
Calmodulin Complexed With
Calmodulin-Binding
Antigen Bound Peptide
NF00959209
PIR
1 - 11
1 - 11
MoRE
partner
PDB ID
1aycp
Q9IJX3
XP_130630
SW
GB
1 - 57
1 – 16
1 - 57
646 - 671
1b35c
2bbma
P04329
SW
1 – 44
364 - 407
2bbvb
GI:2297640
NF00701276
GB
PIR
1 - 10
1 – 37
383 - 394
1 - 37
1bmmh
2bpa2
Q7KZ97
NF00704918
SW
PIR
1 - 10
1 - 56
413 - 424
1 - 58
1br8i
1brbe
NF00126022
NF00516257
GI:5359489
P11799
PIR
PIR
GB
SW
1 - 13
1 -15
1 - 12
1 -20
1 - 13
240 - 254
1 - 12
1730 - 1749
1bsxb
1bx2d
1c3qa
1cdla
Q9Y2H4
SW
1 - 25
339 - 363
1cdma
NF00531311
PIR
1 - 11
1 - 11
1cfsb
Alpha-Chymotrypsinogen
Complex With Human Pan
Alpha-Chymotrypsin (E.C.
3.4.21.1) Complex Wi
Proline Peptide
Ribonuclease S
Rat Ca2+/Calmodulin Dependent
Protein Kinase
NF00086129
PIR
1 - 56
1 - 56
1cgie
C31444
PIR
1 - 56
1 - 56
1choe
Q9Y6V0
GI:15984328
Q64572
SW
GB
SW
1 - 14
1 - 15
1 - 16
2336 - 2350
173 - 187
438 - 463
1cjfa
1cjqb
1ckka
1b35d
2bbmb
271
2bbvd
272
273
274
275
276
277
278
279
1bmmi
2bpa3
1br8p
1brbi
1bsxx
1bx2c
1c3qx
1cdle
280
1cdmb
281
1cfsc
282
1cgii
283
1choi
284
285
286
287
1cjfc
1cjqa
1ckkb
MoRE partner
PDB Name
Tyrosine Phosphatase Syp (N-Terminal Sh2 Dom
Cricket Paralysis Virus, Vp3
Calmodulin (Calcium-Bound)
Complexed With Ra
Black Beetle Virus Capsid Protein
(Bbv) Comp
Alpha-Thrombin
Bacteriophage Phix174 Capsid
Proteins Gpf, G
Antithrombin-III
Trypsin (E.C. 3.4.21.4) Variant
(D189G, G226
Thyroid Hormone Receptor Beta
Hla-Dr2
Hydroxyethylthiazole Kinase
Calmodulin Complexed With
Calmodulin-Binding
Calmodulin Complexed With
Calmodulin-Binding
Igg2A Kappa Antibody Cb41 (Heavy
Chain)
Alpha-Chymotrypsinogen Complex
With Human Pa
Alpha-Chymotrypsin (E.C. 3.4.21.1)
Complex W
Human Platelet Profilin
Ribonuclease S
Calmodulin
288
289
290
291
292
293
294
295
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
2ck0p
2clrc
11-Mer
Human Class I Histocompatibility
Antigen (Hla
Recognition Peptide
N-Terminal Histidine Tag
Conalbumin Peptide
Numb Associate Kinase
Factor Xiii Activation Peptide
(28-37)
12-Mer Peptide
Hla-Dr1 (Dra, Drb1 0101) Human
Class II Histo
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With (
Ba3-Type Cytochrome-C Oxidase
Endothia Aspartic Proteinase
(Endothiapepsin)
Hiv-1 Gp120
NF01342382
P27797
PIR
SW
1 - 11
1 - 10
1 -11
1 -10
MoRE
partner
PDB ID
2ck0h
2clrd
NF00502045
NF00113306
NF00518514
Q9U485
GI:182837
PIR
PIR
PIR
SW
GB
1 - 10
1 - 14
1- 11
1 – 11
1 - 10
1 - 10
1 - 14
1 - 11
1439 - 1449
58-67
1cu4h
1d7qa
1d97k
1ddma
1de7k
NF00691447
Q03909
PIR
SW
1 - 12
1 - 13
1 - 12
327 - 339
1dkdb
1dlhb
GI:2297640
GB
1 - 11
384 - 394
1dwbh
P82543
NF00646473
SW
PIR
1 - 33
1 - 10
2 - 34
1 - 10
1ehkb
4er4e
NF00498472
PIR
1 - 11
1 - 11
2f58h
Cyclic Peptide (Gp120)
Immunoglobin Fc (Igg1)
Complexed With Protein
Beta-Acrosin Light Chain
Igg2A Fab Fragment (C3)
Complexed With Poliov
Antagonist Peptide Af10847
Bisubstrate Peptide Inhibitor
NF00528581
NF00155672
PIR
PIR
1 - 10
1- 57
1 - 10
47-103
3f58h
1fcca
Q9GL10
Q84865
SW
SW
1 - 22
1 - 18
18 - 39
678 - 695
1fiwa
1fptl
NF00130104
NF00138001
PIR
PIR
1 - 21
1 - 13
1 - 21
1 - 13
1g0yr
1gaga
Igg2A Fab Fragment (50.1)
Complex With 16-Res
NF00927547
PIR
2 - 17
2 - 17
1ggim
1cu4p
1d7qb
1d9kp
1ddmb
1de7a
1dkde
1dlhc
296
1dwbi
297
298
1ehkc
4er4i
299
2f58p
300
301
302
303
304
305
3f58p
1fccc
1fiwl
1fptp
1g0yi
1gagb
306
1ggip
307
MoRE partner
PDB Name
Immunoglobulin
Human Class I Histocompatibility
Antigen (Hl
Fab Heavy Chain
Translation Initiation Factor 1A
Mhc I-Ak B Chain (Beta Chain)
Numb Protein
Alpha-Thrombin (Heavy Chain)
Groel
Hla-Dr1 (Dra, Drb1 0101) Human
Class II Hist
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With
Ba3-Type Cytochrome-C Oxidase
Endothia Aspartic Proteinase
(Endothiapepsin
Igg1 Fab 58.2 Antibody (Heavy
Chain)
Immunoglobulin Gamma I (58.2)
Immunoglobin Fc (Igg1) Complexed
With Protei
Beta-Acrosin Heavy Chain
Igg2A Fab Fragment (C3)
Complexed With Polio
Interleukin-1 Receptor, Type I
Insulin Receptor, Tyrosine Kinase
Domain
Igg2A Fab Fragment (50.1) Complex
With 16-Re
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
1hagi
GI:2297634
GB
GI:50838420
GB
P05619
1jgdc
1jn5c
1jpfc
1juip
1jycp
1jyip
1klqb
Prethrombin2 (E.C. 3.4.21.5)
Complexed With H
Human Class I Histocompatibility
Antigen (Hla
Horse Leukocyte Elastase
Inhibitor (Hlei) - C
Mp-2
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With H
Heat Labile Enterotoxin (Lt)
Mutant With Val
Hemagglutinin Ectodomain
(Soluble Fragment, T
Mp-2
Mp-1
Epidermal Growth Factor
Receptor, Egfrviii Pe
Igg1 Fab' Fragment (B13I2)
Complex With Pepti
Decameric Peptide Ligand From
The Mart-1/Mela
Peptide S10R
Fg-Repeat
Lcmv Peptidic Epitope Gp276
10-Mer Peptide
15-Mer Peptide
12-Mer Peptide
Mad2-Binding Peptide
1klgc
Triosephosphate Isomerase
308
1hhhc
309
1hleb
310
311
1hqqe
1hrti
312
1htlc
313
1htma
314
315
316
1hxlc
1hy2e
1i8ic
317
2igfp
318
1jf1c
319
320
321
322
323
324
325
326
327
MoRF
start - end
389 - 398
MoRE
partner
PDB ID
1hage
1 - 10
237 - 246
1hhhb
SW
1 - 31
349 - 379
1hlea
NF01417697
GI:1568172
PIR
GB
1 - 14
1 - 60
1 - 14
2 - 61
1hqqa
1hrth
GI:412520
GB
1 -49
210 - 258
1htla
GI:413463
GB
1 - 27
140 - 166
1htmb
NF01324821
NF01324820
NF00926580
PIR
PIR
PIR
1 - 14
1 - 12
1 - 12
1 - 14
1 - 12
1 - 12
1hxlb
1hy2d
1i8ib
P02247
SW
1 – 30
68 - 97
2igfh
GI:32260204
GB
1 - 10
10 - 19
1jf1b
NF01333403
Q86XD3
Q9WA79
NF01057159
NF01059711
NF01059710
NF00866633
PIR
SW
SW
PIR
PIR
PIR
PIR
1 - 10
1 - 10
1 - 11
1 - 10
1 - 15
1 - 12
1 - 12
1 - 10
1836 - 1847
281 - 291
1 - 10
1 - 15
1 - 12
1 - 12
1jgdb
1jn5b
1jpfb
1juid
1jycd
1jyid
1klqa
NF01019839
PIR
1 - 15
1 - 15
1klgd
Db match
start - end
MoRE partner
PDB Name
Prethrombin2 (E.C. 3.4.21.5)
Complexed With
Human Class I Histocompatibility
Antigen (Hl
Horse Leukocyte Elastase Inhibitor
(Hlei
Streptavidin
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With
Heat Labile Enterotoxin (Lt) Mutant
With Val
Hemagglutinin Ectodomain (Soluble
Fragment,
Streptavidin
Streptavidin
Epidermal Growth Factor Receptor
Antibody Mr
Igg1 Fab' Fragment (B13I2)
Complex With Pept
Beta-2-Microglobulin
Beta-2-Microglobulin
Tap
Beta-2-Microglobulin
Concanavalin A
Concanavalin A
Concanavalin A
Mitotic Spindle Assembly
Checkpoint Protein
Enterotoxin Type C-3
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
MoRE
partner
PDB ID
MoRE partner
PDB Name
1ktrm
Peptide
Peptide Linker
GI:668327
GB
1 - 20
1ktrh
Anti-His Tag Antibody 3D5 Variable
Heavy Cha
Minimized B-Domain Of Protein
A Z34C
Phosphopeptide
Epq(Phospho)Yeeipiyl
Myocyte-Specific Enhancer
Factor 2A
Hei-Toe I
Thioredoxin Mutant With Cys 35
Replaced By Al
Clip Peptide
NF00945281
PIR
1 - 34
258 - 277, 263 - 282, 268 - 287
1 - 34
1l6xa
P03079
SW
1 - 11
321 - 331
1lcja
Immunoglobulin Gamma-1 Heavy
Chain Constant
P56==Lck== Tyrosine Kinase
Q03414
SW
1 - 12
308 - 319
1lewa
Mitogen-Activated Protein Kinase 14
NF01188315
O13075
PIR
SW
1 - 28
1 - 13
1 - 28
81 - 93
1mcva
1mdia
NF01197858
PIR
1 - 36
1 - 36
1mujb
P56488
SW
1 - 39
30 - 68
1nrnh
XP_342878
GB
1 - 10
1078 - 1087
1ntva
1om9p
Alpha-Thrombin (E.C. 3.4.21.5)
Non-Covalently
Apolipoprotein E Receptor-2
Peptide
15-Mer Peptide Fragment Of P56
Elastase 1
Thioredoxin Mutant With Cys 35
Replaced By A
H-2 Class II Histocompatibility
Antigen, A B
Alpha-Thrombin (E.C. 3.4.21.5)
Non-Covalentl
Disabled Homolog 1
Q9D8L5
SW
2 - 16
2 - 16
1om9a
1or8b
Substrate Peptide
XP_407876
GB
1 – 19
346 - 364
1or8a
1orhb
Substrate Peptide
Q7S480
SW
1 - 10
1orhb
1ou8c
Synthetic Ssra Peptide
NF01422527
PIR
1 – 11
466 - 475, 498
- 507
1 - 11
1ox1b
2pldb
11-Mer Peptide
Phospholipase C-Gamma-1 (E.C.
3.1.4.11) (C-Te
50S Ribosomal Protein L9
NF01756611
GI:189730
PIR
GB
1 - 11
1 - 10
1 - 11
918 - 1029
1ox1a
2plda
NF01342110
PIR
1 - 52
1 - 52
1pnue
328
1l6xb
329
1lcjb
330
1lewb
331
332
1mcvi
1mdib
333
1mujc
334
1nrnr
335
1ntvb
336
337
338
339
340
341
342
343
1pnuf
1ou8b
ADP-Ribosylation Factor Binding
Protein Gga1
Protein Arginine N-Methyltransferase 1
Protein Arginine N-Methyltransferase 1
Stringent Starvation Protein B
Homolog
Trypsinogen, Cationic
Phospholipase C-Gamma-1 (E.C.
3.1.4.11) (C-T
50S Ribosomal Protein L6
344
345
346
347
348
349
MoRF
PDB ID
MoRF PDB
Name
MoRF
Dbref
MoRF
Db
MoRF
start - end
Db match
start - end
1pnu4
1pwwc
1qc6c
50S Ribosomal Protein L36
Lf20
Phe-Glu-Phe-Pro-Pro-Pro-Pro-Thr-Asp-Glu-Glu
His Tag
Fibrinopeptide B
Consensus Fen-1 Peptide
Alpha-I Gliadin
Q9RSK0
NF01571451
S20887
SW
PIR
PIR
2 - 36
1 - 20
1 - 10
2 - 36
1 - 20
198-208
MoRE
partner
PDB ID
1pnu5
1pwwb
1qc6a
GI:13275534
NF01479229
NF01571458
NF01683718
GI
PIR
PIR
PIR
1 - 15
1 - 16
1 - 12
1 - 11
8 - 22
1 - 16
1 - 12
1 - 11
1qrjb
1r17b
1rxma
1s9vb
Myosin (Regulatory Domain) Chain A
Reaper
Serine Proteinase B Complex
With The Potato I
Semisynthetic Ribonuclease A
(RNase 1-118(Col
Semisynthetic Ribonuclease A
Mutant With Asp
Semisynthetic Ribonuclease A
Mutant With Asp
Ribonuclease A (Residues 1 - 118)
Complexed W
Ribonuclease A (Semisynthetic)
Crystallized F
Igg1 Monoclonal Fab Fragment
(Te33) Complex W
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With H
Truncated Human Class I
Histocompatibility An
Theiler'S Murine
Q17042
SW
1 -60
780 - 839
1scmb
Q24475
P01080
SW
SW
1 - 10
1 -51
2 - 11
55 - 105
1sdza
4sgbe
GI:387884
GB
2 -15
143 - 156
1srna
NF00159871
PIR
1 - 10
114 - 124
3srna
NF00159749
PIR
1 - 10
114 - 124
4srna
NF00945476
PIR
1 - 14
1 - 14
1ssaa
GI:387884
GB
1 - 10
146 - 156
1ssca
GI:209556
GB
1 - 15
78 - 92
1teth
GI:490635
GB
1 - 13
67 - 79
1thrh
Q99MQ0
SW
1 - 10
351 - 360
1tmcb
NF01026525
PIR
1 - 31
1 - 31
1tmf3
1qrja
1r17c
1rxmb
1s9vc
350
1scma
351
352
1sdzb
4sgbi
353
1srnb
354
3srnb
355
4srnb
356
1ssab
357
1sscb
358
1tetp
359
1thri
360
1tmcc
361
362
1tmf4
MoRE partner
PDB Name
50S Ribosomal Protein L1P
Lethal Factor
Evh1 Domain From Ena/Vasp-Like
Protein
Htlv-I Capsid Protein
Fibrinogen-Binding Protein Sdrg
DNA Polymerase Sliding Clamp
Hla Class II Histocompatibility
Antigen, Dq(
Myosin (Regulatory Domain) - Chain
B
Apoptosis 1 Inhibitor
Serine Proteinase B Complex With
The Potato
Semisynthetic Ribonuclease A
(RNase 1-118(Co
Semisynthetic Ribonuclease A
Mutant With Asp
Semisynthetic Ribonuclease A
Mutant With Asp
Ribonuclease A (Residues 1 - 118)
Complexed
Ribonuclease A (Residues 1 - 118)
Complexed
Igg1 Monoclonal Fab Fragment
(Te33) Complex
Alpha-Thrombin (E.C. 3.4.21.5)
Complex With
Truncated Human Class I
Histocompatibility A
Theiler'S Murine Encephalomyelitis
363
364
365
366
367
368
369
370
371
372
MoRF
PDB ID
MoRF PDB
Name
1vf5e
1vf5h
1vppx
1m0fb
1n6eb
1p4bp
1ow6d
1ow8d
1r5ve
1r5we
Encephalomyelitis Virus Coat
Protein Pet L
Protein Pet N
Peptide V108
Scaffolding Protein B
Dqtqkaaaeltff
Gcn4(7P-14P) Peptide
Paxillin
Paxillin
Artificial Peptide
Artificial Peptide
MoRF
Dbref
P83795
P83798
NF00088534
NF01151402
NF01138087
NF01743773
XP_341094
XP_341094
NF01572515
NF01676397
MoRF
Db
SW
SW
PIR
PIR
PIR
PIR
GB
GB
PIR
PIR
MoRF
start - end
1 - 32
1 - 29
1 -20
1 - 60
1 - 13
1 - 12
1 – 13
1 - 13
1 - 13
1 - 13
372 Non redundant MoRFs
(Source: PDB Seqres July 2004)
Db match
start - end
1 - 32
1 - 29
1 - 20
1 - 60
1 - 13
1 - 12
284 - 296
163 - 175
1 - 13
1 - 13
MoRE
partner
PDB ID
1vf5d
1vf5n
1vppw
1m0fg
1n6ec
1p4bh
1ow6c
1ow6c
1r5vb
1r5wd
MoRE partner
PDB Name
Virus Coa
Rieske Iron-Sulfur Protein
Cytochrome B6
Vascular Endothelial Growth Factor
Major Spike Protein G
Tricorn Protease
Antibody Variable Heavy Chain
Focal Adhesion Kinase 1
Focal Adhesion Kinase 1
Mhc H2-Ie-Beta
Mhc H2-Ie-Beta
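The "MoRF start - end" and "Db match start - end" columns of Appendix A are written with inconsistent spacing and dash characters (for example "1-46", "1 - 20", "1 – 10"). The following is a minimal sketch, not part of the original analysis, of how such range strings could be normalized and how the length of a MoRF span could be compared with the length of its database match. The function names `parse_range`, `span_length` and `spans_agree` are hypothetical and introduced only for illustration.

```python
import re

# Accepts "1-46", "1 - 20", "1- 45", "1 -12" and the en-dash form "1 – 10".
_RANGE = re.compile(r"^\s*(\d+)\s*[-\u2013]\s*(\d+)\s*$")


def parse_range(text):
    """Return (start, end) as integers for a single 'start - end' string."""
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"not a single start-end range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        # Flag inverted ranges instead of silently accepting them.
        raise ValueError(f"end precedes start: {text!r}")
    return start, end


def span_length(text):
    """Number of residues covered by an inclusive 'start - end' range."""
    start, end = parse_range(text)
    return end - start + 1


def spans_agree(morf_range, db_range):
    """True when the MoRF span and the Db match span have equal length."""
    return span_length(morf_range) == span_length(db_range)
```

For the Calpastatin entry 1nx0c (P49342), `spans_agree("1-11", "230-240")` is true, since both spans cover 11 residues. Entries that list several alternative matches separated by commas are deliberately rejected by `parse_range` and would need to be split first.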
Appendix B: MoRF Update
                                                   July 2004    October 2005
MoRF chains from PDB Seqres                             2512            4410
Filtering (remove chains containing less than
10 residues, ambiguous amino acids etc)                 1261            1937
Non Redundant MoRFs                                      372             486
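The filtering step named in Appendix B can be sketched roughly as follows. This is only an illustration under stated assumptions: it covers just the two criteria the table names (chains shorter than 10 residues and chains with ambiguous amino acids), it assumes that "ambiguous" means any letter outside the 20 standard one-letter codes, and it assumes FASTA-formatted input. The "etc" in the table and the later non-redundancy reduction are not reproduced. The names `parse_fasta`, `keep_chain` and `filter_chains` are hypothetical.

```python
# Assumption: a residue is "unambiguous" if it is one of the 20 standard codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 10  # chains with fewer residues are removed


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_chain(sequence, min_length=MIN_LENGTH):
    """True if the chain is long enough and has only standard residues."""
    sequence = sequence.upper()
    return len(sequence) >= min_length and set(sequence) <= STANDARD_AA


def filter_chains(records):
    """Return the (header, sequence) records that pass keep_chain."""
    return [(header, seq) for header, seq in records if keep_chain(seq)]
```

A typical use would be `filter_chains(parse_fasta(open(path)))`, where `path` points to a local copy of a sequence file; the file name and location are left to the reader.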
References
1. Wright, P. E., and Dyson, H. J. (1999) Intrinsically unstructured proteins: Reassessing the protein structure-function paradigm, J. Mol. Biol. 293, 321-331.
2. Uversky VN, Gillespie JR, Fink AL. 2000. Why are "natively unfolded"
proteins unstructured under physiologic conditions? Proteins 41: 415-427
3. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield
CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R,
Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC,
Obradovic Z. 2001. Intrinsically disordered protein. J Mol Graph Model 19:
26-59
4. Dunker AK, Obradovic Z. 2001. The protein trinity--linking function and
disorder. Nat Biotechnol 19: 805-806
5. Demchenko AP. 2001. Recognition between flexible protein molecules:
induced and assisted folding. J Mol Recognit 14: 42-61
6. Namba K. 2001. Roles of partly unfolded conformations in macromolecular
self-assembly. Genes Cells 6: 1-12
7. Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. 2002.
Intrinsic disorder and protein function. Biochemistry 41: 6573-6582
8. Dunker AK, Brown CJ, Obradovic Z. 2002. Identification and functions of
usefully disordered proteins. Adv Protein Chem 62: 25-49
9. Dunker, A. K., Obradovic, Z., Romero, P., Garner, E. C., and Brown, C. J.
(2000) Intrinsic protein disorder in complete genomes, Genome Inform. Ser.
Workshop Genome Inform. 11, 161-171.
10. Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. and Jones, D.T. (2004)
Prediction and functional analysis of native disorder in proteins from the three
kingdoms of life. J. Mol. Biol. 337, 635–645.
11. Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK. 2002.
Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol
323: 573-584
12. Tompa P. 2002. Intrinsically unstructured proteins. Trends Biochem Sci 27:
527-533
13. Fink AL. 2005. Natively unfolded proteins. Curr Opin Struct Biol 15: 35-41
14. Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their
functions. Nat Rev Mol Cell Biol 6: 197-208
15. Dunker A.K., Cortese M.S., Romero P., Iakoucheva L.M., Uversky V.N.
(2005) Flexible nets: The roles of intrinsic disorder in protein interaction
networks. FEBS Journal. (In press).
16. Uversky V.N., Oldfield, C., Dunker, A.K. (2005) Showing your ID: Intrinsic
disorder as an ID for recognition, regulation and cell signalling. J. Mol.
Recognition 18 (5) 343-384.
17. Uversky VN. 2002. Natively unfolded proteins: a point where biology waits
for physics. Protein Sci 11: 739-756
18. Uversky V.N. (2003) Protein folding revisited. A polypeptide chain at the
folding – misfolding – non-folding crossroads: Which way to go? Cell. Mol.
Life Sci. 60 (9) 1852-1871.
19. Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK.
2005. Comparing and combining predictors of mostly disordered proteins.
Biochemistry 44: 1989-2000
20. Liu, J. and Rost, B. (2001). Comparing function and structure between entire
proteomes. Protein Sci 10: 1970-1979
21. Vucetic S, Brown CJ, Dunker AK, Obradovic Z. 2003. Flavors of protein
disorder. Proteins 52: 573-584
22. Callaghan, A.J., Aurikko, J.P., Ilag, L.L., Gunter Grossmann, J., Chandran, V.,
Kuhnel, K., Poljak, L., Carpousis, A.J., Robinson, C.V., Symmons, M.F. 2004.
Studies of the RNA degradosome-organizing domain of the Escherichia coli
ribonuclease RNase E. J. Mol. Biol. 340: 965-979
23. Linding R, Russell RB, Neduva V, Gibson TJ. 2003. GlobPlot: exploring
protein sequences for globularity and disorder. Nucleic Acids Res 31: 3701-3708
24. Demchenko AP. 2001. Recognition between flexible protein molecules:
induced and assisted folding. J Mol Recognit 14: 42-61
25. Namba K. 2001. Roles of partly unfolded conformations in macromolecular
self-assembly. Genes Cells 6: 1-12; Dunker AK, Brown CJ, Lawson JD,
Iakoucheva LM, Obradovic Z. 2002. Intrinsic disorder and protein function.
Biochemistry 41: 6573-6582
26. Gunasekaran K, Tsai CJ, Kumar S, Zanuy D, Nussinov R. 2003. Extended
disordered proteins: targeting function with less scaffold. Trends Biochem Sci
28: 81-85
27. Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their
functions. Nat Rev Mol Cell Biol 6: 197-208
28. Dyson HJ, Wright PE. 2002. Coupling of folding and binding for unstructured
proteins. Curr Opin Struct Biol 12: 54-60
29. Vucetic S, Obradovic Z, Vacic V, Radivojac P, Peng K, Iakoucheva LM,
Cortese MS, Lawson JD, Brown CJ, Sikes JG, Newton CD, and Dunker AK.
2005. "DisProt: A Database of Protein Disorder." Bioinformatics 21:137-140.
30. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig,
I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids
Research, 28, pp. 235-242 (2000).
31. B Rost (1999) Twilight zone of protein sequence alignments. Protein
Engineering, 12, 85-94
32. Bairoch A., Apweiler R. The SWISS-PROT protein sequence database and its
supplement TrEMBL in 2000. Nucleic Acids Research 28:45-48(2000).
33. Cathy H. Wu, Lai-Su L. Yeh, Hongzhan Huang, Leslie Arminski, Jorge
Castro-Alvear, Yongxing Chen, Zhang-Zhi Hu, Robert S. Ledley, Panagiotis
Kourtesis, Baris E. Suzek, C. R. Vinayaka, Jian Zhang, and Winona C. Barker.
The Protein Information Resource. Nucleic Acids Research, 31: 345-347,
2003.
34. Kabsch W. & Sander C. (1983) Dictionary of protein secondary structure:
Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.
35. Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than
70% accuracy. J. Mol. Biol., 1993, Vol. 232, pp. 584-599.
36. Rost, Burkhard; Sander, Chris: Combining evolutionary information and
neural networks to predict protein secondary structure. Proteins, 1994, Vol. 19, pp. 55-72.
37. Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J, Hofmann K., Bairoch A.
The PROSITE database, its status in 2002. Nucleic Acids Research. 30:235-238 (2002).
38. Garner,E., Cannon,P., Romero,P., Obradovic,Z. and Dunker,A. (1998)
Predicting disordered regions from amino acid sequence: common themes
despite differing structural characterization. Genome Inform Ser Workshop
Genome Inform, 9, 201–213.
39. Garner, E., Romero, P., Dunker, A., Brown, C. and Obradovic, Z. (1999)
Predicting binding regions within disordered proteins. Genome Inform Ser
Workshop Genome Inform, 10, 41–50.
40. Dyson, H. J. & Wright, P. E. (2001). Nuclear magnetic resonance methods for
elucidation of structure and dynamics in disordered states. Methods Enzymol.
339, 258–270.
41. Fischer E, "Einfluss der Configuration auf die Wirkung der Enzyme" Ber. Dt.
Chem. Ges. 27, 2985-2993 (1894).
42. Koshland D.E. (1958). Application of a theory of enzyme specificity to protein
synthesis. Proceedings of the National Academy of Sciences USA, 44(2), 98-104.
Wootton, J. C. (1994) Sequences with “unusual” amino acid
compositions, Curr. Opin. Struct. Biol. 4, 413-421.
43. Kim, T. D., Ryu, H. J., Cho, H. I., Yang, C. H., and Kim, J. (2000) Thermal
behavior of proteins: Heat-resistant proteins and their heat-induced secondary
structural changes, Biochemistry 39, 14839-14846.
44. Schweers, O., Schonbrunn-Hanebeck, E., Marx, A., and Mandelkow, E.
(1994) Structural studies of tau protein and Alzheimer paired helical filaments
show no evidence for β-structure, J. Biol Chem. 269, 24290-24297.
45. Gast, K., Damaschun, H., Eckert, K., Schulze-Forster, K., Maurer, H. R.,
Muller-Frohne, M., Zirwer, D., Czarnecki, J., and Damaschun, G. (1995)
Prothymosin α: A biologically active protein with random coil conformation,
Biochemistry 34, 13211-13218.
46. Shortle, D. & Ackerman, M. S. (2001). Persistence of native-like topology in a
denatured protein in 8 M urea. Science, 293, 487–489.
47. Shortle, D. (1996) The denatured state (the other half of the folding equation)
and its role in protein stability, FASEB J. 10, 27-34.
48. Tompa, P. (2003) The functional benefits of protein disorder, J. Mol. Struct.
666-667, 361-371.
49. Shoemaker, B. A., Portman, J. J. & Wolynes, P. G. (2000). Speeding
molecular recognition by using the folding funnel: the fly-casting mechanism.
Proc. Natl Acad. Sci. USA, 97, 8868–8873.
50. Zitzewitz, J. A., Ibarra-Molero, B., Fishel, D. R., Terry, K. L. & Matthews, C.
R. (2000). Preformed secondary structure drives the association reaction of
GCN4-p1, a model coiled-coil system. J. Mol. Biol. 296, 1105–1116.
51. Hollenbeck, J. J., McClain, D. L. & Oakley, M. G. (2002). The role of helix
stabilizing residues in GCN4 basic region folding and DNA binding. Protein
Science. 11, 2740–2747.
52. Li X, Romero P, Rani M, Dunker AK, Obradovic Z: Predicting Protein
Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop
Genome Inform 1999, 10:30-40.
53. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK: Sequence
complexity of disordered protein. Proteins 2001, 42:38-48.
54. Romero P, Obradovic Z, Dunker K: Sequence Data Analysis for Long
Disordered Regions Prediction in the Calcineurin Family. Genome Inform Ser
Workshop Genome Inform 1997, 8:110-124.
55. Obradovic Z., Peng K., Vucetic S., Radivojac P., Brown C. and Dunker A.K.,
Predicting intrinsic disorder from amino acid sequence (2003). Proteins 53
(S6); 566-572.
56. Sreerama, N., and Woody, R.W. (1994) Biochemistry 33, 10022-10025.
57. Mukhopadhyay, R., and Hoh, J. H. (2001) AFM force measurements on
microtubule-associated proteins: The projection domain exerts a long-range
repulsive force, FEBS Lett. 505, 374-378.
58. Sherr C J, Roberts J M. Inhibitors of mammalian G1 cyclin-dependent kinases.
Genes Dev. 1995; 9:1149–1163.
59. Kanamoto T, Mota MA, Takeda K, Rubin LL, Miyazono K, Ichijo H,
Bazenet CE: Role of apoptosis signal-regulating kinase in regulation of the
c-Jun N-terminal kinase pathway and apoptosis in sympathetic neurons. Mol
Cell Biol 2000, 20:196-204.
60. Sheaff, R.J., Singer, J.D., Swanger, J., Smitherman, M., Roberts, J.M. and
Clurman, B.E. (2000) Proteasomal turnover of p21Cip1 does not require
p21Cip1 ubiquitination. Mol. Cell 5, 403–410.
61. David, D.C., Layfield, R., Serpell, L., Narain, Y., Goedert, M. and Spillantini,
M.G. (2002) Proteasomal degradation of tau protein. J. Neurochem. 83, 176–
185.
62. Liu, C.W., Corboy, M.J., DeMartino, G.N. and Thomas, P.J. (2003)
Endoproteolytic activity of the proteasome. Science 299, 408–411.
63. Cox, C.J., Dutta, K., Petri, E.T., Hwang, W.C., Lin, Y., Pascal, S.M. and
Basavappa, R. (2002) The regions of securin and cyclin B proteins recognized
by the ubiquitination machinery are natively unfolded. FEBS Lett. 527, 303–
308.
64. Hinnebusch, A. G., and G. R. Fink. 1983. Positive regulation in the general
control of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 80:5374-5378.
65. Kim, P. S. & Baldwin, R. L. (1982). Specific intermediates in the folding
reactions of small proteins and the mechanism of protein folding. Annu. Rev.
Biochem. 51, 459–489.
66. Dafforn, T.R. and Smith, C.J. (2004) Natively unfolded domains in
endocytosis: hooks, lines and linkers. EMBO Rep. 5, 1046–1052.
67. Tompa, P. and Csermely, P. (2004) The role of structural disorder in the
function of RNA and protein chaperones. FASEB J. 18, 1169–1175.
68. Fiser, A., Dosztanyi, Z. & Simon, I. (1997). The role of long-range
interactions in defining the secondary structure of proteins is overestimated.
Comput. Appl. Biosci. 13, 297–301.
69. Burley, S.K. and Petsko, G.A. (1985) Aromatic-aromatic interaction: A
mechanism of protein structure stabilization. Science 229, 23–28.
70. Ramachandran, G.N. and Sasisekharan, V. (1968) Adv. Protein Chem. 23, 283–437.
71. The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine
(US), National Center for Biotechnology Information; 2002 Oct. Available
from
CURRICULUM VITAE
AMRITA MOHAN
ammohan@indiana.edu
EDUCATION
2005 – Present PhD student, Informatics, Indiana University, Bloomington
2003 – 2005 Masters of Bioinformatics, Indiana University, IUPUI
1999 – 2003 Bachelor of Info. Technology, University of Delhi, India
RESEARCH/PROFESSIONAL EXPERIENCE
1. May ’05 – Aug ’05
Intern, Rosetta Inpharmatics - Merck, Seattle, WA, USA
2. Aug ’03 – Aug ’05
Research Assistant, Center for Computational Biology & Bioinformatics,
IUPUI, IN, USA
3. Jun ’02 – Aug ’02
Intern, Institute of Advanced Biosciences - ‘E-Cell Lab’, Japan
4. Jun ’01 – May ’02
Project Trainee, Center for Biochemical Technology (under Council for
Scientific & Industrial Research), New Delhi, India
5. Jul ’98 – Apr ’99
Experimental project, “Gene probe for detection of chronic myelogenous
leukemia”, India
6. Jul ’98 – Apr ’99
Experimental project, “Gene expression for breast cancer”, India
POSTERS & RESEARCH PUBLICATIONS
1. Poster Presentation: First Annual Indiana Bioinformatics Conference,
Department of Biochemistry & Molecular Biology Poster Session, IUPUI,
May 27, 2004.
Amrita Mohan, Predrag Radivojac and Keith Dunker
MoREs: Molecular Recognition Elements
2. Poster Presentation: Research Day, Department of Biochemistry &
Molecular Biology Poster Session, IUPUI, September 30, 2005.
Amrita Mohan, Predrag Radivojac and Keith Dunker
MoREs: Molecular Recognition Elements
3. (Publication in process)
MoRFs: A dataset of Molecular Recognition Features
Amrita Mohan, Predrag Radivojac and Keith Dunker