Genome Informatics 14: 508–509 (2003)
Positive Charge Cores in DNA-Binding Proteins
Kimiko Horibe¹ (horibe@proteome.bio.tuat.ac.jp)
Shigeki Mitaku² (mitaku@nuap.nagoya-u.ac.jp)
¹ Tokyo University of Agriculture and Technology, 2-24-16 Nakacho, Koganei, Tokyo 184-8588, Japan
² Nagoya University, Furocho, Chikusa-ku, Nagoya 464-8603, Japan
Keywords: DNA-binding protein, positive charge core
1 Introduction
DNA-binding proteins, which play a central role in the regulation of gene expression, adopt many
structural motifs, for example helix-loop-helix (HLH), basic leucine zipper (bZIP), and zinc finger.
Because of the biological significance of DNA-binding proteins, the structures of DNA-protein complexes
have been studied extensively, and information about the DNA-binding regions is already available for
many DNA-binding proteins. It is now possible to identify the structural motifs from the corresponding
sequence motifs.
However, the characteristics of DNA-binding proteins that are common to all structural motifs have
hardly been studied. Brendel and Karlin (1989), analyzing a limited number of sequences, pointed out
that many DNA-binding proteins share multiple clusters of electric charges as a common feature [1].
Binding between the negative charges of DNA and a positive charge cluster of a protein is physically
sound, and if such charge clusters are truly common features irrespective of structural motif, they
would be very useful for annotating amino acid sequences obtained from genome analysis.
In this work, we analyzed 955 DNA-binding proteins whose binding regions are known and found that many
DNA-binding proteins of different structural motifs have dual clusters of positive charges separated by
20 to 30 residues.
2 Method
Amino acid sequences of DNA-binding proteins were obtained from SWISS-PROT (Release 41, March 2003).
Non-redundant sequences (<25% sequence similarity) with lengths between 100 and 1500 residues were
classified into 11 groups according to their structural motifs. The largest group was the homeo domain
group, with 299 sequences, and the smallest were the T-domain and ETS-domain groups, with 28 sequences.
The other groups are bZIP, HLH, fork head, HMG domain, MYB repeat, zinc finger (C4 type), zinc finger
(C6 Zn2 type), and miscellaneous. As references, we also constructed three datasets: RNA-binding
proteins, nuclear proteins that do not bind DNA, and cytoplasmic proteins. We counted Arg and Lys as
positive charges, Glu and Asp as negative charges, and all other residues as neutral; His was thus
treated as neutral, although it is often considered to carry a positive charge.
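The charge definition and length filter above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `charge_vector` and `length_filter` are hypothetical, and treating the 100 and 1500 residue bounds as inclusive is an assumption the text does not settle. The redundancy filter (<25% similarity) is not reproduced here.

```python
# Hypothetical sketch of the Method's charge definition and length filter.
# Arg (R) and Lys (K) count as +1, Glu (E) and Asp (D) as -1, everything else 0 (His included).
from typing import Iterable

POSITIVE = frozenset("RK")
NEGATIVE = frozenset("ED")


def charge_vector(seq: str) -> list[int]:
    """Map a one-letter amino acid sequence to per-residue charges (+1, -1 or 0)."""
    charges = []
    for aa in seq.upper():
        if aa in POSITIVE:
            charges.append(1)
        elif aa in NEGATIVE:
            charges.append(-1)
        else:
            charges.append(0)  # neutral, including His (H)
    return charges


def length_filter(seqs: Iterable[str], min_len: int = 100, max_len: int = 1500) -> list[str]:
    """Keep sequences whose length lies in [min_len, max_len]; inclusive bounds are an assumption."""
    return [s for s in seqs if min_len <= len(s) <= max_len]
```

For example, `charge_vector("RKHDE")` gives `[1, 1, 0, -1, -1]`, with His mapped to 0 as in the text.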
3 Results
Figure 1 shows the distribution of positive and negative charges, with the number of charges in every
five residues plotted as a function of residue number. Three examples are shown: a homeo domain protein
(HXC6 of sheep), a bZIP protein (ATF3 of mouse), and a fork head protein (FXL1 of mouse). Gray areas in
the charge plots mark the DNA-binding regions. The length of the DNA-binding region varies from motif
to motif. However, the distance between the dual peaks of positive charge was almost constant, 20 to
30 residues, as marked by boxes of broken lines. The DNA-binding region of the homeo domain protein
HXC6 of sheep is about 60 residues long and contains two regions with dual peaks of positive charge,
as shown in Fig. 1(a). The DNA-binding regions of bZIP proteins have a dual peak about 20 residues
long; an example of this structural motif, ATF3 of mouse, is shown in Fig. 1(b). Two dual peaks are
observed for the fork head protein FXL1 of mouse, but one of them lies outside the DNA-binding region
(Fig. 1(c)).
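The five-residue charge profile and the idea of two positive peaks 20 to 30 residues apart can be illustrated as follows. This is a sketch under stated assumptions, not the authors' detection procedure: the text does not say whether the five-residue windows overlap, so here they slide by one residue; the peak rule (a plateau strictly higher than both neighbours) and the minimum peak height of 3 are hypothetical choices; and all function names are illustrative.

```python
# Hypothetical sketch: five-residue charge profiles and a simple dual-peak test.
# Window step, peak rule and min_height are assumptions, not the method of the paper.

POSITIVE = "RK"  # Arg, Lys
NEGATIVE = "ED"  # Glu, Asp


def window_counts(seq: str, residues: str, width: int = 5) -> list[int]:
    """Count residues from `residues` in each window seq[i:i+width], sliding by one residue."""
    flags = [1 if aa in residues else 0 for aa in seq.upper()]
    if len(flags) < width:
        return []
    counts = [sum(flags[:width])]
    for i in range(width, len(flags)):
        # Running sum: add the residue entering the window, drop the one leaving it.
        counts.append(counts[-1] + flags[i] - flags[i - width])
    return counts


def local_peaks(profile: list[int], min_height: int) -> list[int]:
    """Return the centre index of every plateau that reaches min_height and is
    strictly higher than both of its neighbours."""
    peaks = []
    i, n = 0, len(profile)
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1  # extend over a run of equal values
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[j + 1] if j + 1 < n else float("-inf")
        if profile[i] >= min_height and profile[i] > left and profile[i] > right:
            peaks.append((i + j) // 2)
        i = j + 1
    return peaks


def has_dual_peak(seq: str, min_height: int = 3, min_sep: int = 20, max_sep: int = 30) -> bool:
    """True if two positive-charge peaks lie between min_sep and max_sep residues apart."""
    peaks = local_peaks(window_counts(seq, POSITIVE), min_height)
    return any(
        min_sep <= later - earlier <= max_sep
        for k, earlier in enumerate(peaks)
        for later in peaks[k + 1:]
    )
```

On a toy sequence `"A" * 10 + "KKKK" + "A" * 21 + "RRRR" + "A" * 10`, the positive profile peaks at window indices 9 and 34, 25 residues apart, so `has_dual_peak` returns `True`; widening the spacer to 40 residues pushes the peaks 44 apart and the test fails. The negative-charge curve of the plots corresponds to `window_counts(seq, NEGATIVE)`.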
Many other structural motifs also showed one or more dual peaks. However, this property was not
observed in some proportion of the DNA-binding proteins. There were also false positives: some
cytoplasmic proteins have dual positive charge cores. Nevertheless, dual peaks appeared much more
frequently in DNA-binding proteins than in cytoplasmic proteins.
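The comparison of how often dual peaks appear in each dataset amounts to computing, per dataset, the fraction of sequences flagged by a dual-peak test. A minimal sketch follows, assuming such a test is supplied as a predicate; `appearance_rate` and `rates_by_dataset` are hypothetical names, and the toy predicate in the usage note is for illustration only, not a real charge-core criterion.

```python
# Hypothetical sketch: fraction of sequences with a given property (e.g. dual
# positive charge cores) in each reference dataset.
from typing import Callable, Iterable, Mapping


def appearance_rate(seqs: Iterable[str], predicate: Callable[[str], bool]) -> float:
    """Fraction of sequences for which `predicate` returns True."""
    seqs = list(seqs)
    if not seqs:
        raise ValueError("empty dataset")
    return sum(1 for s in seqs if predicate(s)) / len(seqs)


def rates_by_dataset(
    datasets: Mapping[str, Iterable[str]], predicate: Callable[[str], bool]
) -> dict[str, float]:
    """Apply appearance_rate to every named dataset (e.g. DNA-binding, cytoplasmic)."""
    return {name: appearance_rate(seqs, predicate) for name, seqs in datasets.items()}
```

With the toy predicate `lambda s: "KK" in s`, the datasets `{"dna": ["KKA", "KAK"], "cyto": ["AAA", "AAA", "KKA", "AAA"]}` give rates of 0.5 and 0.25. Passing the predicate in keeps the rate computation independent of whichever dual-peak criterion is used.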
[Panels: (a) Homeo domain [HXC6_SHEEP]; (b) bZIP [ATF3_MOUSE]; (c) Fork head [FXL1_MOUSE]]
Figure 1: Plots of the positive and negative charge distributions of amino acid sequences. The number
of charges in every five residues is plotted as a function of residue number. Gray areas are
DNA-binding sites; dual positive cores are boxed by dotted lines.
4 Discussion
DNA-binding proteins share the property of binding to double-stranded DNA. However, their structural
motifs are quite diverse, and it is difficult to find a rule that applies to most DNA-binding proteins.
In this work, we found that dual positive charge cores are one of the common properties of DNA-binding
proteins. A previous study by Brendel and Karlin suggested that multiple clusters of charges are
observable in DNA-binding proteins and rare in other types of proteins [1]. The more detailed analysis
in this work showed that positive charge cores appear in most DNA-binding proteins as dual peaks, about
20 residues long, at the edges of the DNA-binding regions.
References
[1] Brendel V. and Karlin S., Association of charge clusters with functional domains of cellular
transcription factors, Proc. Natl. Acad. Sci. USA, 86(15):5698–5702, 1989.