Abstract

Seminar in Computational Biology Research
Lior Harpaz
Ofer Shany
9.5.04
Identification of Transcription Factor Binding Sites
The unveiling of the human genome and other genomes has generated a whole new set of
challenges for molecular biologists. One of the greatest of these challenges is learning
more about gene regulation. Transcription factors (TFs) play a central role in gene
regulation, and thus identifying and predicting their binding sites (BSs) serves as a
valuable research tool [1].
Experimental methods, such as DNA footprinting and the electrophoretic mobility shift
assay (EMSA), have proved useful in identifying TFs and their BSs. However, these and
other experimental methods are usually time-consuming and cannot easily be scaled up to
whole genomes. As a result, computational methods have been developed to address the
issue [1].
A simple sequence search for TFBSs may not solve the problem: the binding sites are
short (5-15 bp), degenerate, and can appear in various locations in the genome, so a
naive search returns many matches that arise by chance alone [1].
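To illustrate why a simple search is problematic, here is a minimal sketch of such a search in Python. It assumes the degenerate site is written as an IUPAC consensus string; the function name `find_sites`, the 6-bp consensus, and the sequence are all made up for illustration and do not come from the cited work.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping matches be reported too.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

# Hypothetical degenerate site and a made-up sequence.
hits = find_sites("TGANTC", "ccTGAGTCaaTGACTCggTGATTCA")
print(hits)  # [2, 10, 18]
```

Even in this 25-bp toy sequence the short pattern matches three times; over a whole genome, most such matches would be expected by chance, which is why the enrichment criteria described next are needed.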
Various enrichment criteria are used to overcome this difficulty. A common denominator
of these methods is the assumption that functional importance is reflected in the
conservation of certain traits. For example, searching the upstream regions of genes with
similar mRNA expression patterns for regulatory sites is based on the notion that genes
that are expressed together are likely to share similar regulation [1].
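The co-expression idea can be sketched as a simple over-representation count. The code below is an illustrative simplification, not the method of any cited paper: it assumes the upstream sequences of a co-expressed cluster and of a background gene set are already available as strings, and it ranks k-mers by how much more often they occur in the cluster. The names `kmer_presence` and `enriched_kmers`, the pseudocount, and the toy sequences are assumptions made for this example.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """Count how many sequences contain each k-mer at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def enriched_kmers(cluster, background, k=6, pseudo=1.0):
    """Rank k-mers by the ratio of presence fractions, cluster vs. background."""
    fg, bg = kmer_presence(cluster, k), kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_frac = n / len(cluster)
        # The pseudocount avoids division by zero for k-mers absent from the background.
        bg_frac = (bg[kmer] + pseudo) / (len(background) + pseudo)
        scores[kmer] = fg_frac / bg_frac
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up upstream regions: the cluster shares a common core, the background does not.
cluster = ["AATGACTCATT", "GGTGACTCAGG", "CCTGACTCACC"]
background = ["AAAAAAAAAAA", "CCCCGGGGCCC", "ATATATATATA", "GCGCGCGCGCG"]
for kmer, score in enriched_kmers(cluster, background)[:2]:
    print(kmer, round(score, 2))
```

In practice a statistical test and a proper background model would replace the raw ratio; the sketch only shows the shape of the comparison.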
Phylogenetic footprinting [1,2] is another method for finding TFBSs. It is based on
cross-species comparisons of orthologous genes and on the assumption that orthologs are
under similar regulation. Conserved non-coding regions of orthologous genes presumably
have regulatory functionality, and candidate TFBSs are therefore searched for there. An
example of the method's success is the identification of a previously unknown TF, the
YijC protein, in E. coli.
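The core idea of phylogenetic footprinting can be sketched as a scan for conserved windows. This is a deliberately simplified illustration: it assumes the orthologous non-coding regions have already been aligned (equal length, gaps written as "-"), whereas published methods typically use more sophisticated motif-finding algorithms. The function name `conserved_windows`, its parameters, and the toy alignment are hypothetical.

```python
def conserved_windows(aligned, width=8, min_identity=1.0):
    """Return start columns of windows whose fraction of fully conserved
    (identical, ungapped) columns is at least min_identity."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # True where every sequence carries the same non-gap base.
    conserved = [
        len({s[i].upper() for s in aligned}) == 1 and aligned[0][i] != "-"
        for i in range(length)
    ]
    starts = []
    for start in range(length - width + 1):
        if sum(conserved[start:start + width]) / width >= min_identity:
            starts.append(start)
    return starts

# Made-up alignment of upstream regions from three species.
aligned = [
    "ACGTTGACTCAGGA",
    "TCGATGACTCATGC",
    "GCTATGACTCACGT",
]
print(conserved_windows(aligned, width=6))  # [4, 5]
```

The two reported windows cover columns 4 to 10, the block that is identical in all three toy sequences; in a real analysis such blocks would be the candidate binding sites.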
Network-level conservation [3] is based on the notion that transcription factors regulate
many genes, and that the conservation of global gene expression between related
organisms requires that most of these genes maintain their regulation. In this
approach, one searches for maximal overlap between groups of genes from different
organisms that contain similar candidate TFBSs. A significant overlap suggests
that the common motif is an actual functional binding site.
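One common way to judge whether such an overlap is meaningful is a hypergeometric tail probability; the sketch below assumes that scoring scheme for illustration, and the cited work may differ in its details. Here `n_orthologs` is the number of ortholog pairs considered, `n_a` and `n_b` are the numbers of those genes carrying the candidate motif in each organism, and `overlap` is the number carrying it in both. All names and numbers are hypothetical.

```python
from math import comb

def overlap_pvalue(n_orthologs, n_a, n_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(n_orthologs, n_a, n_b):
    the chance of seeing at least this overlap if the two gene sets
    were drawn independently."""
    total = comb(n_orthologs, n_b)
    upper = min(n_a, n_b)
    return sum(
        comb(n_a, x) * comb(n_orthologs - n_a, n_b - x)
        for x in range(overlap, upper + 1)
    ) / total

# Toy numbers: 10 ortholog pairs, 3 motif-bearing genes per organism, all 3 shared.
p = overlap_pvalue(10, 3, 3, 3)
print(p)
```

With these toy numbers the probability is 1/120, so a complete overlap would be unlikely under independence; a small value of this kind is what would mark a motif as a candidate functional site.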
[1] Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003; 5:201.
[2] McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001; 29:774-782.
[3] Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004; 14:99-108.