# The protocol of BoBro2.0: BBR, BBS, BBC and BBA

Functions          BoBro2.0   Unique feature
Motif Refining     BBR        Strong ability to filter out noise at a genome scale
Motif Scanning     BBS        p-value assessment for all scanned candidate motifs
Motif Comparison   BBC        Use of weakly conserved signals in motifs' flanking regions when comparing motifs; a motif clustering algorithm
Motif Annotation   BBA        Motifs' co-occurrence annotation

Installation

Simply put "BoBro2.0.tar.gz" in any directory,

    $ tar zxvf BoBro2.0.tar.gz

enter the folder "BoBro2.0/" and type

    $ ./INSTALL

BBR usage

The software called 'BBR' has the following functions:

1. De novo motif finding in a FASTA-format promoter file.
2. De novo motif finding with background sequences (if available) in FASTA format.
3. Motif finding with a comparative genomic framework.

BBR: CMD line

Simply run the following command to get a brief guide:

    $ perl BBR.pl

To do de novo motif finding in a FASTA-format promoter file:

    $ perl BBR.pl 1 promoters

To do de novo motif finding with background sequences (if available) in FASTA format:

    $ perl BBR.pl 2 promoters background_file

To do motif finding with a comparative genomic framework:

    $ perl script/bbr_cmp_method.pl

BBR: Inputs and outputs

When used for de novo motif finding

The promoter file and background_file should be in standard FASTA format (see the promoter and background files in this folder for examples). The output file will be named promoters.closures (see promoters.closures for an example). It contains:

- an input data summary;
- a command line summary;
- detailed information for each motif candidate found:
  - motif seed: the seed sequence used to find this motif (a 'core' of the motif);
  - the motif position weight matrix and consensus;
  - a table showing all the aligned motif instances.
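The relationship between the aligned motif table and the matrix and consensus reported in promoters.closures can be illustrated with a short Python sketch. This is not BBR's own code: the function names are hypothetical, the matrix here holds raw per-column counts, and the consensus simply takes the most frequent base in each column (BBR may apply weighting or degenerate IUPAC codes that this sketch does not).

```python
def count_matrix(sites):
    """Return {base: [count per column]} for equal-length aligned sites."""
    if not sites:
        raise ValueError("no aligned sites given")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("aligned sites must all have the same length")
    matrix = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for col, base in enumerate(site.upper()):
            if base not in matrix:
                raise ValueError(f"unexpected character {base!r} in site {site!r}")
            matrix[base][col] += 1
    return matrix


def majority_consensus(matrix):
    """Pick the most frequent base per column (ties go to the first of A, C, G, T)."""
    width = len(matrix["A"])
    return "".join(
        max("ACGT", key=lambda base: matrix[base][col]) for col in range(width)
    )


# Toy aligned sites, invented for illustration only.
toy_sites = ["ACGT", "ACGA", "TCGT"]
toy_matrix = count_matrix(toy_sites)
for base in "ACGT":
    print(base, toy_matrix[base])
print("consensus:", majority_consensus(toy_matrix))
```

Each column of the count matrix sums to the number of aligned sites, which is a quick consistency check when comparing a matrix against an alignment.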
When used for motif finding with a comparative genomic framework

For the target genome and the reference genomes, three kinds of data are needed:

- the genome data (which can be downloaded from NCBI GenBank);
- the operon data (which can be downloaded or predicted from the DOOR database);
- the orthology relationship between the target and the references (which can be predicted using the RBH method or GOST).

NOTE: Please see the contents of the folder example for details. It takes E. coli as the target genome and two other species as references.

- target_list: a list of GIs from E. coli;
- Ecoli.opr: operon structure of E. coli;
- Escherichia_coli_K_12_substrMG1655_uid57779: E. coli data from NCBI;
- ncbi_data: the directory containing the reference species information downloaded from NCBI.

All the output files are stored in the folder example_output, which contains:

- result.txt: same as the output of a de novo prediction;
- motif.alignment: all alignments of the predicted motifs;
- motif.alignment.similarity: similarity scores between each pair of motifs (same as the output of BBC).

Logos for each motif are also given.

Data format

operon: operon structure from DOOR;

    1: 16077069
    2: 16077070
    3: 16077071 16077072 255767014
    4: 16077074

ortholog: orthology information between E. coli and a reference (standard output of GOST);

    145698239, 187933779   5e-90, 6e-90
    145698257, 187933775   2e-05, 3e-05
    145698262, 187935634   1e-27, 6e-27
    145698268, 187932476   2e-25, 9e-23
    145698269, 187933610   5e-05, 7e-05

BBS Usage

This software provides a BoBro Based Searching/Scanning (BBS) tool capable of searching for motifs in a set of sequences using known motif patterns.

BBS: Inputs and outputs

The main program in the provided package is *BBS*. It can search for motifs in a FASTA file using known motifs given as an alignment, a matrix, or a consensus. Example files are provided (example_alignment, example_matrix and example_consensus).

For basic usage of motif scanning

Use the packed Perl script BBS.pl.

Basic usage:

    $ perl BBS.pl

Search motif in alignment format:

    $ perl BBS.pl motif_alignment promoters 1

Search motif in matrix format:

    $ perl BBS.pl motif_matrix promoters 2

Search motif in consensus format:

    $ perl BBS.pl motif_consensus promoters 3

Search motif considering the background genome (the third argument is the format code 1, 2 or 3 that matches the motif file):

    $ perl BBS.pl motif_consensus promoters 1/2/3 background

For advanced usage of motif scanning

Use the program BBS.

Basic usage:

    $ ./BBS -h    (or simply ./BBS)

To search motifs based on an alignment:

    $ ./BBS -i example -j example_alignment
    $ ./BBS -i example -j example_alignment -D    (output seed only)

Note: the shortest sequence in example must not be shorter than the shortest motif in example_alignment.

To search motifs based on a frequency matrix:

    $ ./BBS -i example -m example_matrix

BBS generates an output file with the suffix '.motifinfo'. This file contains one closure for each alignment (matrix) in example_alignment (example_matrix), listed in increasing order of the closures' p-values.
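To make the consensus input format concrete, the Python sketch below expands a consensus written with IUPAC nucleotide codes (such as the AGGRKTTBCCGA example in the Input Formats section) into a regular expression and reports exact matches in a sequence. It is only an illustration of what a consensus pattern denotes: BBS itself assesses candidates with p-values and tunable thresholds, none of which is reproduced here. The function names and the test sequence are hypothetical, and only the given strand is checked.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    parts = []
    for code in consensus.upper():
        if code not in IUPAC:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    # A capturing group inside a lookahead lets finditer report overlapping matches.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_exact_matches(consensus, sequence):
    """Return (0-based start, matched text) for every exact consensus match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Invented sequence containing one occurrence of the consensus.
print(find_exact_matches("AGGRKTTBCCGA", "TTAGGAGTTCCCGATT"))
```

Exact matching is far stricter than a statistical scan, so a real BBS run should be expected to report sites that this sketch misses.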
To calculate the z-score compared to the background:

    $ ./BBS -i example -j example_matrix -z example -u .95

To convert consensus format to matrix format:

    $ ./BBS -p example_consensus -i example > example_consensus_matrix

To compare the similarity between any pair of input motifs:

    $ ./BBS -i example -j example_alignment -C
    $ ./BBS -i example -j motif.txt -C    (example with uninformative columns)

To convert an alignment to a matrix and a consensus:

    $ ./BBS -i example -j example_alignment -a

Furthermore, the output can be controlled mainly through four parameters:

    -e [1,3]      the larger the e value, the more TFBSs are found
    -t (0.3,0.9)  the smaller the t value, the more TFBSs are found
    -n (0,0.3]    when e > 1, the larger the n value, the stricter the search
    -E            add -E to find more TFBSs

BBS: Input Formats

Matrix

    A 5645410443512433003433333
    C 5022121510015000070004410
    G 0104652158680588227784448
    T 1450038110014200921000030

Alignment

    ATCAACTGAAACAAAACGAAAGATT
    GAAAACCATTATCTTTCGTTTTATT
    GACTTTCATTATGTTTCTTTTGTGA
    ACCAAGTGAAATGAAACGAAAGGCA
    AACTTTCAGTTTCTTTTCTATAGAT
    AAATTTCGTTTTATTTCTTTTTTCT
    GCAATCCCTTTTGCTTCCTTTATCT
    GCCTTTCTTTTTCTTTCGTTTTGAT
    CAGGGTCAATTAGCTTCGTTTTGAT
    GCAAAACGAAATGAAACGAAAGTTT
    AAGGTGGGCTTGCATTTGCTTAATA

Consensus

    AGGRKTTBCCGA

BBC Usage

The software called 'BBC' has the following functions:

1. Compare different motif profiles.
2. Cluster motif profiles.
3. Annotate motifs in aligned sequences.

BBC: Inputs and outputs

The input can be a sequence file alone, in standard FASTA format. BBC will first run BoBro to get the motif predictions, then cluster the motifs and annotate them back onto the aligned sequences.

    $ perl BBC.pl SequenceFile

The three result files will be in the folder 'SequenceFile.BBC':

- motif prediction: SequenceFile.BBC/SequenceFile.closures;
- motif comparison: SequenceFile.BBC/SequenceFile.similarity;
- motif clustering and annotation: SequenceFile.BBC/SequenceFile.BBC.
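When BBC is run from a script, it can help to know in advance which result files to look for. The Python sketch below derives the three expected paths from the naming pattern in this section, both for a run on the sequence file alone and for a run with a known motif file, where the motif file name is inserted after the sequence file name. The helper names are hypothetical, and the sketch assumes BBC is started in the directory that holds the sequence file and that the pattern generalizes from the documented examples.

```python
import os

# Suffixes of the three BBC result files, as named in this document.
RESULT_SUFFIXES = (
    ("prediction", "closures"),
    ("comparison", "similarity"),
    ("clustering", "BBC"),
)


def bbc_output_paths(sequence_file, motif_file=None):
    """Return {result kind: expected path} for a BBC run (naming is an assumption)."""
    stem = os.path.basename(sequence_file)
    if motif_file is not None:
        # With a known motif file, its name is inserted after the sequence file name.
        stem = f"{stem}.{os.path.basename(motif_file)}"
    out_dir = f"{sequence_file}.BBC"
    return {kind: f"{out_dir}/{stem}.{suffix}" for kind, suffix in RESULT_SUFFIXES}


def missing_outputs(paths):
    """List the result kinds whose files do not exist yet."""
    return [kind for kind, path in paths.items() if not os.path.exists(path)]


for kind, path in bbc_output_paths("SequenceFile", "motif_alignment").items():
    print(f"{kind:11s} {path}")
```

Calling missing_outputs on the returned dictionary after a run gives a simple check that all three files were written.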
Users can also input SequenceFile together with a known motif file in one of four formats:

    0: BoBro standard output
    1: alignment
    2: matrix
    3: consensus

Examples of the last three formats are provided in the current package (motif_alignment, motif_consensus, motif_matrix).

    $ perl BBC.pl SequenceFile motif_alignment [0/1/2/3]

The three result files will be in the folder 'SequenceFile.BBC':

- motif prediction: SequenceFile.BBC/SequenceFile.motif_alignment.closures;
- motif comparison: SequenceFile.BBC/SequenceFile.motif_alignment.similarity;
- motif clustering and annotation: SequenceFile.BBC/SequenceFile.motif_alignment.BBC.

BBC: Format of outputs

Result file about motif comparison

Here is an example:

    similarity  Motif-1      Motif-2      Motif-3      Motif-4
    Motif-1     0.00 (0-0)   0.18 (1-1)   0.15 (4-1)   0.24 (2-1)
    Motif-2     0.18 (1-1)   0.00 (0-0)   0.20 (2-1)   0.11 (3-1)
    Motif-3     0.15 (1-4)   0.20 (1-2)   0.00 (0-0)   0.27 (2-1)
    Motif-4     0.24 (1-2)   0.11 (1-3)   0.27 (1-2)   0.00 (0-0)

This is the comparison result for 4 motifs. The decimals in the 4*4 matrix (leading diagonal excluded) are the similarity scores of the corresponding motif pairs.

Result file about motif clustering and annotation

Here is an example of the head information of the output file:

    BOBRO-Based motif Comparison and Annotation (BBC) result:
    Input sequences: data.fasta;
    Predicted motifs: data.fasta.closures;
    Motifs with hierarchical clustering:

    Rank  Name     Length  M(Motif rank)-(1st level cluster)-(2nd level cluster)
    1     Motif-1  14      M1_1_1
    2     Motif-2  14      M2_2_2
    3     Motif-3  14      M3_3_3
    4     Motif-4  14      M4_2_4

Above is the clustering result for 4 motifs; it is followed by the annotation on the aligned sequences. A label such as 'M1_1_1' carries the cluster information of a motif. The first number is the original label of the motif (motifs are ranked in decreasing order of z-score). Motifs with the same second number are in the same cluster and have fair similarity. Motifs with the same third number share high similarity.
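The label scheme described above can be decoded mechanically. The Python sketch below parses labels of the form M(rank)_(1st level cluster)_(2nd level cluster) and groups motif ranks by cluster at either level. It assumes the labels have already been read out of the BBC result file, whose full layout is not reproduced here; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Label layout from the BBC header: M<rank>_<1st level cluster>_<2nd level cluster>
LABEL = re.compile(r"^M(\d+)_(\d+)_(\d+)$")


def parse_label(label):
    """Split a label such as 'M4_2_4' into (rank, first_level, second_level)."""
    match = LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a BBC cluster label: {label!r}")
    rank, first_level, second_level = (int(part) for part in match.groups())
    return rank, first_level, second_level


def group_by_cluster(labels, level=1):
    """Map cluster id -> list of motif ranks, at clustering level 1 or 2."""
    if level not in (1, 2):
        raise ValueError("level must be 1 or 2")
    groups = defaultdict(list)
    for label in labels:
        rank, first_level, second_level = parse_label(label)
        groups[first_level if level == 1 else second_level].append(rank)
    return dict(groups)


labels = ["M1_1_1", "M2_2_2", "M3_3_3", "M4_2_4"]
print(group_by_cluster(labels, level=1))  # Motif-2 and Motif-4 share a 1st level cluster
print(group_by_cluster(labels, level=2))  # every motif is alone at the 2nd level
```

For the four example motifs, Motif-2 and Motif-4 fall into the same first-level cluster (fair similarity), while no two motifs share a second-level cluster (high similarity).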
BBA Usage

The script called 'BBA.pl' is designed for motif co-occurrence analysis.

Note: R must be installed on the computer you use.

BBA: CMD line

Simply run the following command to get a brief guide:

    $ perl BBA.pl

BBA can do co-occurrence analysis for the file 'Input' with the command:

    $ perl BBA.pl Input SequenceNum

BBA: Inputs

The input file includes each TF name and the labels of the sequences that contain binding sites of this TF (see 'Motif_position_for_BBA' for an example). The format of the input file:

    >AcrR 2251 1650

This is the motif information for the TF AcrR. Its name must have the prefix '>'. The number 2251 means that there is a motif occurrence for AcrR in the 2251st promoter sequence.

BBA: Outputs

There are two output files, 'Input.BBA' and 'Input.BBA.all':

- the significantly co-occurring TF pairs are collected in "Input.BBA";
- the co-occurrence scores for all TF pairs are stored in "Input.BBA.all".

The data in the result files have 7 columns, with the following meanings:

1. TF1
2. TF2
3. Hypergeometric p-value
4. Total number of sequences
5. The number of sequences that contain binding sites of TF1
6. The number of sequences that contain binding sites of TF2
7. The number of sequences that contain binding sites of both TF1 and TF2

Contact

Questions, problem reports and bug reports are welcome and should be sent to

    Qin Ma <maqin2001@uga.edu>
    Bingqiang Liu <bingqiangsdu@gmail.com>
    Chuan Zhou <zhouchuan121@gmail.com>

Creation: June 27, 2012