ANALYSIS OF P53 BINDING SITES BY USING CHIP-SEQ DATA
I.S. Yevshin1,2, Yu.V. Kondrakhin1,3, M. Turunen4, T. Kivioja4, F. Nikulenkov5,
R.N. Sharipov1,6, J. Taipale4, G. Selivanova5, F.A. Kolpakov1,3,*
1Institute of Systems Biology, Ltd, Novosibirsk, Russia; 2Novosibirsk State University,
Novosibirsk, Russia; 3Design Technological Institute of Digital Techniques SB RAS,
Novosibirsk, Russia; 4National Public Health Institute, Helsinki, Finland; 5Karolinska
Institutet, Stockholm, Sweden; 6Institute of Cytology and Genetics SB RAS,
Novosibirsk, Russia
e-mail: fedor@biouml.org
*Corresponding author
Key words: p53, binding site recognition, ChIP-seq, position weight matrix
Motivation and Aim: Transcription factor p53 is a well-known tumor suppressor.
Mutations of p53 binding sites are basic hallmarks of many types of cancer. The
canonical structure of a p53 binding site is described by two decameric sequences
PuPuPuC(A/T)(T/A)GPyPyPy separated by a spacer of 0–13 bp. Known models for
recognition of p53 sites were built earlier on the basis of separate small training sets
obtained by less comprehensive methods than ChIP-seq. Recently, five sets of ChIP-seq
data were obtained within the framework of the “Net2Drug” project, which made it
possible to: 1) compare methods for identifying transcription factor (TF)-binding
fragments; 2) construct a more accurate method for p53 binding site prediction.
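The canonical site structure described above can be written as a regular expression. The following is a minimal sketch, assuming Pu = A or G and Py = C or T (the standard purine/pyrimidine codes); the names `HALF_SITE`, `CANONICAL_SITE` and `find_canonical_sites` are illustrative and are not part of the authors' method, which relies on PWMs rather than exact pattern matching.

```python
import re

# One decameric half-site: PuPuPuC(A/T)(T/A)GPyPyPy
HALF_SITE = "[AG]{3}C[AT][AT]G[CT]{3}"

# Two half-sites separated by a spacer of 0-13 bp. The lookahead lets
# matches that start at different positions overlap; the lazy spacer
# reports only the shortest valid spacer for each start position.
CANONICAL_SITE = re.compile(
    f"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))"
)

def find_canonical_sites(sequence):
    """Return (start, spacer_length, site) for each canonical match."""
    sequence = sequence.upper()
    hits = []
    for match in CANONICAL_SITE.finditer(sequence):
        left, spacer, right = match.groups()
        hits.append((match.start(), len(spacer), left + spacer + right))
    return hits
```

Exact matching of this kind misses the many functional sites that deviate from the consensus, which is one reason weight-matrix models are preferred for recognition.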
Methods and Algorithms: ChIP-seq data were obtained from the breast cancer cell line
MCF7 treated with the p53 activators 10 µM Nutlin3a, 0.1 and 1 µM RITA (Reactivation
of p53 and Induction of Tumor cell Apoptosis), and 100 µM 5-fluorouracil. Three
methods, SISSRs (Site Identification from Short Sequence Reads), MACS (Model-based
Analysis of ChIP-Seq) and KOLI, were applied to identify p53-binding fragments. To
recognize potential binding sites, the extended position weight matrix (PWM)
method [1] was used. Our new alignment method and a clustering method adapted to
ChIP-seq data (unpublished) were applied to construct a set of better-optimized PWMs.
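For readers unfamiliar with PWMs, the basic idea can be sketched as follows. This is a plain log-odds PWM with a pseudocount and a uniform background, given only as an illustration; it is not the extended PWM method of [1], nor the unpublished alignment and clustering procedures. The function names, the pseudocount of 1.0 and the background frequency of 0.25 are all assumptions, and input sequences are assumed to contain only the uppercase letters A, C, G and T.

```python
import math

ALPHABET = "ACGT"

def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites."""
    pwm = []
    for pos in range(len(aligned_sites[0])):
        # Start each base count at the pseudocount to avoid log(0).
        counts = {base: pseudocount for base in ALPHABET}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2(counts[b] / total / background) for b in ALPHABET}
        )
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds weights for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

In practice a score threshold is chosen and every window above it is reported as a potential site, rather than only the best one.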
Results and Conclusion: Different methods for identifying TF-binding fragments
generate quite distinct sets of fragments. For example, in the case of 1 µM RITA, the
fragment sets produced by SISSRs, MACS and KOLI overlapped by only 48.13%. On
the basis of the analyzed ChIP-seq data, more effective PWMs for p53 were built.
Sensitivity and False Discovery Rate were selected as measures of the accuracy of the
recognition procedures. According to our PWMs, the most typical motif of p53-binding
sites is
(A/C/T)NN(A/G)(G/a)(A/G)CATG(C/T)CCA(G/a)(A/g)CATG(C/t)(C/t)(C/t)NN.
Analysis of p53-binding sites led us to conclude that spacers of 6, 7 and 8 bp are the
most prominent. Analysis of human chromosome 6 demonstrated that SINE/Alu and
LINE/L1 repeats are also significantly enriched in p53 motifs. To counter this effect, we
constructed additional PWMs specific to these repeats, which allowed us to substantially
decrease the number of false positives in p53-binding site prediction by our method.
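The two accuracy measures named above have standard definitions: sensitivity = TP / (TP + FN) and False Discovery Rate = FP / (TP + FP). The sketch below applies them to sets of site identifiers. How predicted and reference sites were actually matched in this work is not stated, so the set-based comparison and the function name `sensitivity_and_fdr` are assumptions made for illustration.

```python
def sensitivity_and_fdr(predicted, true_sites):
    """Return (sensitivity, FDR) for predicted vs. reference site sets."""
    predicted, true_sites = set(predicted), set(true_sites)
    tp = len(predicted & true_sites)   # predicted and real
    fp = len(predicted - true_sites)   # predicted but not real
    fn = len(true_sites - predicted)   # real but missed
    # Return NaN when a ratio is undefined (empty denominator).
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fdr = fp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, fdr
```

Under these definitions, removing false positives, for instance those arising in repeats, lowers the FDR without changing sensitivity, provided that no true sites are discarded.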
Acknowledgements: This work was supported by EU grant No. 037590 “Net2Drug”.
References.
1. E.A. Ananko et al. (2007) Recognition of interferon-inducible sites, promoters, and
enhancers, BMC Bioinformatics, 8:56, 1-14.