ANALYSIS OF P53 BINDING SITES USING CHIP-SEQ DATA
I.S. Yevshin1,2, Yu.V. Kondrakhin1,3, M. Turunen4, T. Kivioja4, F. Nikulenkov5, R.N. Sharipov1,6, J. Taipale4, G. Selivanova5, F.A. Kolpakov1,3,*
1 Institute of Systems Biology, Ltd, Novosibirsk, Russia; 2Novosibirsk State University, Novosibirsk, Russia; 3Design Technological Institute of Digital Techniques SB RAS, Novosibirsk, Russia; 4National Public Health Institute, Helsinki, Finland; 5Karolinska Institutet, Stockholm, Sweden; 6Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
e-mail: fedor@biouml.org
*Corresponding author
Key words: p53, binding site recognition, ChIP-seq, position weight matrix
Motivation and Aim: Transcription factor p53 is a well-known tumor suppressor. Mutations of p53 binding sites are a hallmark of many types of cancer. The canonical p53 binding site consists of two decameric half-sites, PuPuPuC(A/T)(T/A)GPyPyPy, separated by a spacer of 0–13 bp. Existing models for p53 site recognition were built from separate small training sets obtained with methods less comprehensive than ChIP-seq. Recently, five ChIP-seq data sets were obtained within the “Net2Drug” project, which made it possible to 1) compare methods for identifying transcription factor (TF)-binding fragments and 2) construct a more accurate method for predicting p53 binding sites.
Methods and Algorithms: ChIP-seq data were obtained from the breast cancer cell line MCF7 treated with the p53 activators Nutlin-3a (10 µM), RITA (Reactivation of p53 and Induction of Tumor cell Apoptosis; 0.1 and 1 µM) and 5-fluorouracil (100 µM). Three methods, SISSRs (Site Identification from Short Sequence Reads), MACS (Model-based Analysis of ChIP-Seq) and KOLI, were applied to identify p53-binding fragments. Potential binding sites were recognized with the extended position weight matrix (PWM) method [1].
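The canonical site structure above (two decameric half-sites with a 0–13 bp spacer) can be illustrated with a minimal PWM scan. This sketch assumes a plain log-odds matrix derived directly from the IUPAC consensus RRRCWWGYYY (Pu = R, Py = Y, (A/T) = W), a uniform background and an arbitrary pseudocount. It is neither the extended PWM method of [1] nor the ChIP-seq-derived PWMs of this work. The function names, pseudocount and score threshold are hypothetical.

```python
import math

# IUPAC codes for the canonical half-site PuPuPuC(A/T)(T/A)GPyPyPy.
IUPAC = {"R": "AG", "Y": "CT", "W": "AT", "A": "A", "C": "C", "G": "G", "T": "T"}
HALF_SITE = "RRRCWWGYYY"  # Pu = R, Py = Y, (A/T) = (T/A) = W
BASES = "ACGT"


def consensus_to_pwm(consensus, pseudo=0.1, background=0.25):
    """Illustrative log2-odds PWM derived from an IUPAC consensus (hypothetical).

    Each allowed base gets a count of 1, every base gets a small pseudocount,
    and the normalised frequencies are compared with a uniform background.
    """
    pwm = []
    for code in consensus:
        counts = {b: (1.0 if b in IUPAC[code] else 0.0) + pseudo for b in BASES}
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores; window length must equal len(pwm)."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan_p53(seq, pwm, max_spacer=13, threshold=15.0):
    """Score every pair of half-sites separated by 0..max_spacer bp.

    Assumes seq contains only A, C, G and T. Returns (start, spacer, score)
    for pairs scoring >= threshold; the threshold is a hypothetical choice.
    """
    seq = seq.upper()
    n = len(pwm)
    hits = []
    for start in range(len(seq) - 2 * n + 1):
        first = score(pwm, seq[start:start + n])
        for spacer in range(max_spacer + 1):
            second_start = start + n + spacer
            if second_start + n > len(seq):
                break  # the second half-site would run off the sequence
            total = first + score(pwm, seq[second_start:second_start + n])
            if total >= threshold:
                hits.append((start, spacer, total))
    return hits


pwm = consensus_to_pwm(HALF_SITE)
example = "TTTT" + "GGACATGCCC" + "AGACTTGTCT" + "TTTT"
best = max(scan_p53(example, pwm), key=lambda hit: hit[2])
print(best[:2])  # (4, 0): both half-sites match the consensus, no spacer
```

Because RRRCWWGYYY is its own reverse complement, this particular consensus-derived matrix scores both strands identically. A PWM learned from data generally is not symmetric and would also need a reverse-complement scan.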
Our new alignment and clustering methods adapted to ChIP-seq data (unpublished) were applied to construct a set of optimized PWMs.
Results and Conclusion: Different methods for identifying TF-binding fragments produce quite distinct sets of fragments; for example, with 1 µM RITA only 48.13% of the fragments were shared by SISSRs, MACS and KOLI. More effective PWMs for p53 were built from the analyzed ChIP-seq data, with sensitivity and false discovery rate (FDR) used as measures of recognition accuracy. According to our PWMs, the most typical p53-binding motif is (A/C/T)NN(A/G)(G/a)(A/G)CATG(C/T)CCA(G/a)(A/g)CATG(C/t)(C/t)(C/t)NN. Analysis of p53-binding sites showed that spacers of 6, 7 and 8 bp are more prominent than other spacer lengths. Analysis of human chromosome 6 showed that SINE/Alu and LINE/L1 repeats are also significantly enriched in p53 motifs. To counter this effect, we constructed additional PWMs specific to these repeats, which substantially reduced the number of false-positive p53-binding site predictions made by our method.
Acknowledgements: This work was supported by EU grant №037590 “Net2Drug”.
References
1. E.A. Ananko et al. (2007) Recognition of interferon-inducible sites, promoters, and enhancers. BMC Bioinformatics, 8:56, 1–14.
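Sensitivity and false discovery rate, the accuracy measures named in Results and Conclusion, follow their standard definitions: sensitivity = TP/(TP+FN) and FDR = FP/(TP+FP). The minimal sketch below evaluates one score threshold. It assumes every candidate site has already been labelled as a true or false binding site (for instance, by whether it lies in a ChIP-seq fragment; the abstract does not say how sites were labelled). The function name and the toy data are hypothetical.

```python
def sensitivity_fdr(scores, is_site, threshold):
    """Sensitivity and FDR of the decision rule 'score >= threshold'.

    scores  -- score of each candidate site
    is_site -- True where the candidate is a real binding site (assumed known)
    """
    tp = fp = fn = 0
    for s, real in zip(scores, is_site):
        predicted = s >= threshold
        if predicted and real:
            tp += 1  # true positive
        elif predicted:
            fp += 1  # false positive
        elif real:
            fn += 1  # real site that was missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr


# Hypothetical toy data, not results from this study.
scores = [9.0, 8.0, 7.0, 6.0, 5.0]
is_site = [True, True, False, True, False]
for t in (5.5, 6.5, 8.5):
    print(t, sensitivity_fdr(scores, is_site, t))
```

Sweeping the threshold in this way shows how both measures change as the prediction rule becomes stricter or looser.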