Genome Informatics 14: 508–509 (2003)

Positive Charge Cores in DNA-Binding Proteins

Kimiko Horibe (1), horibe@proteome.bio.tuat.ac.jp
Shigeki Mitaku (2), mitaku@nuap.nagoya-u.ac.jp

(1) Tokyo University of Agriculture and Technology, 2-24-16 Nakacho, Koganei, Tokyo 184-8588, Japan
(2) Nagoya University, Furocho, Chikusa-ku, Nagoya 464-8603, Japan

Keywords: DNA-binding protein, positive charge core

1 Introduction

DNA-binding proteins, which play a central role in the regulation of gene expression, exhibit many structural motifs, for example helix-loop-helix (HLH), basic leucine zipper (bZIP), and zinc finger. Because of the biological significance of DNA-binding proteins, the structures of DNA-protein complexes have been extensively studied, and information about the DNA-binding regions is already available for many DNA-binding proteins. It is now possible to identify the structural motifs from the corresponding sequence motifs. However, characteristics common to DNA-binding proteins across all structural motifs have hardly been studied. Brendel and Karlin (1989) pointed out, from an analysis of a limited number of proteins, that many DNA-binding proteins share multiple clusters of electric charge as a common feature [1]. Binding between the negative charges of DNA and a positive charge cluster of a protein is physically reasonable, and if charge clusters are indeed a common feature irrespective of structural motif, they would be very useful for annotating amino acid sequences obtained from genome analysis. In this work, we analyzed 955 DNA-binding proteins whose binding regions are known and found that many DNA-binding proteins of different structural motifs have dual clusters of positive charges separated by 20 to 30 residues.

2 Method

Amino acid sequences of DNA-binding proteins were obtained from SWISS-PROT (Rel. 41; Mar. 2003).
Non-redundant data (<25% sequence similarity) with lengths between 100 and 1500 residues were classified into 11 groups according to structural motif. The largest group was the homeo domain, with 299 entries, and the smallest were the T- and ETS-domains, with 28 entries. The other groups are bZIP, HLH, fork head, HMG domain, MYB repeat, zinc finger (C4 type), zinc finger (C6 Zn2 type), and miscellaneous. As references, we also prepared three datasets: RNA-binding proteins, nuclear proteins that do not bind DNA, and cytoplasmic proteins. We defined Arg and Lys as positively charged, Glu and Asp as negatively charged, and all other residues as neutral, although His is usually considered to carry a positive charge.

3 Results

Figure 1 shows the distribution of positive and negative charges, in which the number of charges in every five residues is plotted as a function of sequence number. Three examples are shown: homeo domain (HXC6 of sheep), bZIP (ATF3 of mouse), and fork head (FXL1 of mouse). The gray areas in the charge plots represent the DNA-binding regions. The length of the DNA-binding region varies from motif to motif. However, the distance between the dual peaks of positive charge was almost constant, 20 to 30 residues, as marked by boxes of broken lines. The DNA-binding region of the homeo domain (HXC6 of sheep) is about 60 residues long and contains two regions with dual peaks of positive charge cores, as shown in Fig. 1(a). The DNA-binding regions of bZIP proteins have a dual peak about 20 residues long; an example of this structural motif, ATF3 of mouse, is shown in Fig. 1(b). Two dual peaks are observed for the fork head protein FXL1 of mouse, but one of them lies outside the DNA-binding region (Fig. 1(c)). Many other structural motifs also showed one or more dual peaks. This property was not observed, however, for a certain proportion of the DNA-binding proteins.
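As a rough illustration of the charge profile and the dual-peak idea described above, the following Python sketch assigns charges as defined in the Method section and counts them in bins of five residues. The non-overlapping binning, the function names `charge_profile` and `dual_positive_cores`, and the peak threshold `min_count` are assumptions made for this example only; the text does not specify how peaks were identified, so this is not the authors' procedure.

```python
from typing import List, Tuple

# Charge assignment as stated in the Method section:
# Arg (R) and Lys (K) are positive, Asp (D) and Glu (E) are negative,
# and every other residue (including His) is treated as neutral.
POSITIVE = set("RK")
NEGATIVE = set("DE")


def charge_profile(seq: str, window: int = 5) -> Tuple[List[int], List[int]]:
    """Count positive and negative residues in consecutive bins.

    Assumption: bins are non-overlapping blocks of `window` residues.
    The source only says charges are summed "in every five residues".
    """
    pos, neg = [], []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        pos.append(sum(aa in POSITIVE for aa in chunk))
        neg.append(sum(aa in NEGATIVE for aa in chunk))
    return pos, neg


def dual_positive_cores(
    seq: str,
    window: int = 5,
    min_count: int = 3,   # hypothetical threshold for calling a bin a "peak"
    min_gap: int = 20,    # separation range taken from the text (20 to 30 residues)
    max_gap: int = 30,
) -> List[Tuple[int, int]]:
    """Return pairs of 0-based bin start positions that look like dual peaks.

    A bin counts as a peak when it holds at least `min_count` positive
    residues; two peaks form a pair when their start positions are
    between `min_gap` and `max_gap` residues apart (inclusive).
    """
    pos, _ = charge_profile(seq, window)
    peaks = [i * window for i, count in enumerate(pos) if count >= min_count]
    return [
        (a, b)
        for i, a in enumerate(peaks)
        for b in peaks[i + 1:]
        if min_gap <= b - a <= max_gap
    ]
```

For a synthetic sequence such as `"KKRAA" + "A" * 20 + "RKKAA" + "A" * 10` (not a real protein), `dual_positive_cores` returns `[(0, 25)]`: two positively charged bins whose start positions are 25 residues apart. A sliding window or a different threshold would be equally defensible choices and would change which pairs are reported.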
There were also false positives: some cytoplasmic proteins have dual positive charge cores. However, dual peaks appeared at a much higher rate in DNA-binding proteins than in cytoplasmic proteins.

Figure 1: Plots of the positive and negative charge distributions of amino acid sequences. (a) Homeo domain [HXC6_SHEEP]; (b) bZIP [ATF3_MOUSE]; (c) fork head [FXL1_MOUSE]. The sum of the number of charges in every five residues is plotted as a function of the sequence number. Gray areas are DNA-binding sites; dual positive cores are boxed by dotted lines.

4 Discussion

DNA-binding proteins share the property of binding to double-stranded DNA. However, their structural motifs are quite diverse, and it is difficult to find a rule that applies to most DNA-binding proteins. In this work, we found that dual positive charge cores are one of the common properties of DNA-binding proteins. A previous study by Brendel and Karlin suggested that multiple clusters of charges are observable in DNA-binding proteins and rare in other types of proteins [1]. The more detailed analysis in this work showed that positive charge cores appear in most DNA-binding proteins as dual peaks about 20 residues long at the edges of the DNA-binding regions.

References

[1] Brendel, V. and Karlin, S., Association of charge clusters with functional domains of cellular transcription factors, Proc. Natl. Acad. Sci. USA, 86(15):5698–5702, 1989.