Seminar in Computational Biology Research
Lior Harpaz, Ofer Shany
9.5.04

Identification of Transcription Factor Binding Sites

The unveiling of the human genome and other genomes has generated a whole new set of challenges for molecular biologists. One of the greatest of these challenges is learning more about gene regulation. Transcription factors (TFs) are an important component of gene regulation, and thus identifying and predicting their binding sites (BSs) serves as a valuable research tool [1].

Experimental methods, such as DNA footprinting and EMSA, have proved useful in identifying TFs and their BSs. However, these and other experimental methods are usually time consuming and cannot easily be scaled up to whole genomes. As a result, computational methods have been developed to address the issue [1].

A simple sequence search for TFBSs may not solve the problem, as the binding sites are short (5-15 bp), degenerate, and can appear in various locations in the genome [1]. Various enrichment criteria are used to overcome this difficulty. A common denominator of these methods is the assumption that functional importance is reflected in the conservation of certain traits. For example, searching the upstream regions of genes with similar mRNA expression patterns for regulatory sites rests on the notion that genes expressed together are regulated similarly [1].

Phylogenetic footprinting [1,2] is another method for finding TFBSs. It is based on cross-species comparisons of orthologous genes and on the assumption that orthologs are under similar regulation. Conserved non-coding regions of orthologous genes presumably have regulatory function, so candidate TFBSs are sought there. An example of the method's success is the identification of a previously unknown TF, the YijC protein, in E. coli.
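The idea behind phylogenetic footprinting can be illustrated with a minimal sketch. It assumes that the upstream regions of the orthologs have already been aligned into equal-length strings (with '-' marking gaps), and it reports stretches where every species carries the same base. The function name conserved_windows, the min_len parameter, and the toy sequences are hypothetical and chosen only for illustration. Exact identity is a deliberately simplified stand-in for the more tolerant statistical models used in practice, not the procedure of the cited work.

```python
def conserved_windows(aligned, min_len=6):
    """Return (start, end) spans where all aligned sequences agree.

    aligned: list of equal-length strings from a multiple alignment
    ('-' marks a gap). A column counts as conserved only if every
    sequence has the same non-gap base there. Spans are half-open.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    spans = []
    run_start = None
    # Iterate one position past the end so a trailing run is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if conserved:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                spans.append((run_start, i))
            run_start = None
    return spans


# Toy alignment of three hypothetical orthologous upstream regions.
aligned = [
    "ACGTTGACATTTGCA",
    "TCGATGACATTAGCA",
    "GCGTTGACATCTGCT",
]
for start, end in conserved_windows(aligned, min_len=6):
    print(start, end, aligned[0][start:end])
```

With min_len set near the lower end of the 5-15 bp range quoted above, each reported span is a candidate site. Real data would require tolerance for mismatches, since binding sites are degenerate.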
Network-level conservation [3] is based on the notion that a transcription factor regulates many genes, and that conserving global gene expression between related organisms requires most of these genes to retain their regulation. This approach searches for maximal overlap between groups of genes from different organisms that contain similar candidate TFBSs. A significant overlap suggests that the shared motif is an actual functional binding site.

[1] Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201
[2] McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782
[3] Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108