#the protocol of BoBro2.0: BBR, BBS, BBC and BBA.

Functions           BoBro2.0   Unique feature
Motif Refining      BBR        Strong ability to filter out noise at a genome scale
Motif Scanning      BBS        p-value assessment for all the scanned candidate motifs
Motif Comparison    BBC        Utilization of weakly conserved signals in motifs' flanking regions
                               when comparing motifs; a motif clustering algorithm
Motif Annotation    BBA        Motifs' co-occurrence annotation
Installation
Put "BoBro2.0.tar.gz" in any directory and unpack it:

$ tar zxvf BoBro2.0.tar.gz
Then enter the folder "BoBro2.0/" and run

$ ./INSTALL
BBR usage
The software called 'BBR' has the following functions:

De novo motif finding in a promoter file in fasta format.

De novo motif finding with background sequences (if available) in fasta format.

Motif finding with a comparative genomic framework.
BBR: CMD line

Simply run the following cmd to get a brief guide.
$ perl BBR.pl

To do de novo motif finding in a promoter file in fasta format:
$ perl BBR.pl 1 promoters

To do de novo motif finding with background sequences (if available) in fasta format:
$ perl BBR.pl 2 promoters background_file

To do motif finding with a comparative genomic framework:
$ perl script/bbr_cmp_method.pl
BBR: Inputs and outputs
When used for de novo motif finding

The promoter file and background_file should be in standard fasta format (see the promoter
and background files in this folder for examples).

The output file will be named promoters.closures (see promoters.closures for an example).
It contains:

input data summary

command line summary

for each motif candidate found, detailed information:
motif seed: the seed sequence used to find this motif (the 'core' of the motif);
motif position weight matrix and consensus;
a table showing all the aligned motif instances.
When used for motif finding with a comparative genomic framework
For the target genome and the reference genomes, three kinds of data are needed:

the genome data (which can be downloaded from NCBI GenBank);

the operon data (which can be downloaded or predicted from the DOOR database);

the orthology relationship between the target and the references (which can be predicted using
the RBH method or GOST);
NOTE: Please see the contents of the folder example for details.
The example takes E. coli as the target genome and two other species as references.

target_list: a list of GI numbers from E. coli;

Ecoli.opr: operon structure of E. coli;

Escherichia_coli_K_12_substrMG1655_uid57779: E. coli data from NCBI;

ncbi_data: the directory containing the reference species information downloaded from
NCBI;
All the output files are stored in the folder example_output, which contains:
- result.txt: same as the output of a de novo prediction;
- motif.alignment: all alignments of the predicted motifs;
- motif.alignment.similarity: similarity scores between each pair of motifs (same as the
output of BBC);
- Logos for each motif are also given.
Data format

operon: operon structure from DOOR;
1: 16077069
2: 16077070
3: 16077071 16077072 255767014
4: 16077074

ortholog: orthology information between E. coli and a reference (standard output of GOST);
145698239, 187933779 5e-90, 6e-90
145698257, 187933775 2e-05, 3e-05
145698262, 187935634 1e-27, 6e-27
145698268, 187932476 2e-25, 9e-23
145698269, 187933610 5e-05, 7e-05
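The operon listing can be read into a lookup table with a few lines of Python. This is only a sketch: it assumes each operon sits on one line of the form "ID: gi gi ...", and the function name parse_operons is made up for illustration and is not part of BoBro2.0.

```python
def parse_operons(text):
    """Map each operon id to the list of gene GI numbers it contains.

    Assumes one operon per line, written as "ID: gi gi ...".
    """
    operons = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        operon_id, _, genes = line.partition(":")
        operons[int(operon_id)] = genes.split()
    return operons


example = """\
1: 16077069
2: 16077070
3: 16077071 16077072 255767014
4: 16077074
"""

operons = parse_operons(example)
for operon_id, genes in operons.items():
    print(operon_id, len(genes), "gene(s)")
```

Operons with more than one gene (operon 3 here) are the ones where the genes share a promoter.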
BBS Usage
This software provides a BoBro Based Searching/Scanning (BBS) tool capable of searching motifs
in a set of sequences using known motif patterns.
BBS: Inputs and outputs
The major program in the provided package is *BBS*. It can search motifs in a fasta file using known
motifs in alignment, matrix or consensus format, and example files are provided
(example_alignment, example_matrix and example_consensus).
For basic usage of motif scanning
Use the packaged Perl script BBS.pl:

basic usage:
$ perl BBS.pl

Search motif in alignment format:
$ perl BBS.pl motif_alignment promoters 1

Search motif in matrix format:
$ perl BBS.pl motif_matrix promoters 2

Search motif in consensus format:
$ perl BBS.pl motif_consensus promoters 3

Search motif considering background genome:
$ perl BBS.pl motif_consensus promoters 1/2/3 background
For advanced usage of motif scanning
Use the program BBS

basic usage:
$ ./BBS -h (or ./BBS)

To search motifs based on an alignment:
$ ./BBS -i example -j example_alignment
$ ./BBS -i example -j example_alignment -D (output seed only)
Note: the minimal length of the sequences in example should not be less than the minimal motif
length in example_alignment.

To search motifs based on a frequency matrix:
$ ./BBS -i example -m example_matrix
BBS generates an output file, the '.motifinfo' file. In the '.motifinfo' file, it reports one
closure for each alignment (matrix) in example_alignment (example_matrix), in increasing order
of the closures' p-values.

To calculate the z-score compared to the background:
$ ./BBS -i example -j example_matrix -z example -u .95

To convert consensus format to matrix:
$ ./BBS -p example_consensus -i example > example_consensus_matrix

To compare the similarity between any pair of input motifs:
$ ./BBS -i example -j example_alignment -C
$ ./BBS -i example -j motif.txt -C (uninformative column example)

To convert an alignment to matrix and consensus:
$ ./BBS -i example -j example_alignment -a
Furthermore, the output can be controlled mainly through four parameters:
-e   [1,3]       the larger the e value, the more TFBSs are found
-t   (0.3,0.9)   the smaller the t value, the more TFBSs are found
-n   (0,0.3]     when e>1, the larger the n value, the stricter the search
-E               add -E to get more TFBSs
BBS: Input Formats
Matrix
A
5645410443512433003433333
C
5022121510015000070004410
G
0104652158680588227784448
T
1450038110014200921000030
Alignment
ATCAACTGAAACAAAACGAAAGATT
GAAAACCATTATCTTTCGTTTTATT
GACTTTCATTATGTTTCTTTTGTGA
ACCAAGTGAAATGAAACGAAAGGCA
AACTTTCAGTTTCTTTTCTATAGAT
AAATTTCGTTTTATTTCTTTTTTCT
GCAATCCCTTTTGCTTCCTTTATCT
GCCTTTCTTTTTCTTTCGTTTTGAT
CAGGGTCAATTAGCTTCGTTTTGAT
GCAAAACGAAATGAAACGAAAGTTT
AAGGTGGGCTTGCATTTGCTTAATA
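The relationship between the alignment and matrix formats can be illustrated with a short Python sketch that counts, for each column of an alignment, how often each base occurs. It only illustrates the idea and is not the code BBS runs for its -a option; the function name count_matrix is made up for illustration.

```python
def count_matrix(sites):
    """Count A/C/G/T occurrences in every column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = {base: [0] * length for base in "ACGT"}
    for site in sites:
        for position, base in enumerate(site.upper()):
            counts[base][position] += 1
    return counts


# First three rows of the alignment example above.
sites = [
    "ATCAACTGAAACAAAACGAAAGATT",
    "GAAAACCATTATCTTTCGTTTTATT",
    "GACTTTCATTATGTTTCTTTTGTGA",
]

matrix = count_matrix(sites)
for base in "ACGT":
    print(base, " ".join(str(count) for count in matrix[base]))
```

Each column of the resulting matrix sums to the number of aligned sites.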
Consensus
AGGRKTTBCCGA
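The consensus uses IUPAC nucleotide codes, where for example R stands for A or G, K for G or T, and B for C, G or T. The Python sketch below expands a consensus into a 0/1 matrix marking which bases each position allows. It is an illustration only: the weights that BBS itself assigns when converting a consensus with -p are not described here, and the name consensus_to_matrix is made up.

```python
# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_matrix(consensus):
    """Return rows A/C/G/T with 1 where the base is allowed, 0 otherwise."""
    rows = {base: [] for base in "ACGT"}
    for symbol in consensus.upper():
        allowed = IUPAC[symbol]
        for base in "ACGT":
            rows[base].append(1 if base in allowed else 0)
    return rows


matrix = consensus_to_matrix("AGGRKTTBCCGA")
for base in "ACGT":
    print(base, "".join(str(flag) for flag in matrix[base]))
```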
BBC Usage
The software called 'BBC' has the following functions:
- Compare different motif profiles.
- Cluster motif profiles.
- Annotate motifs in aligned sequences.
BBC: Inputs and outputs
The input can be just a sequence file in standard fasta format.
In that case BBC first runs BoBro to get the motif predictions, then clusters them and annotates
them back onto the aligned sequences.

$ perl BBC.pl SequenceFile
The three result files will be in the folder 'SequenceFile.BBC':
- Result of motif prediction: SequenceFile.BBC/SequenceFile.closures;
- Result of motif comparison: SequenceFile.BBC/SequenceFile.similarity;
- Result of motif clustering and annotation: SequenceFile.BBC/SequenceFile.BBC.
Users can also input SequenceFile together with a known motif file in one of 4 formats:

BoBro standard output: 0;

alignment: 1;

matrix: 2;

consensus: 3.
Examples of the last three formats are provided in the current package (motif_alignment,
motif_consensus, motif_matrix).

$ perl BBC.pl SequenceFile motif_alignment [0/1/2/3]
The three result files will be in the folder 'SequenceFile.BBC':
- Result of motif prediction: SequenceFile.BBC/SequenceFile.motif_alignment.closures;
- Result of motif comparison: SequenceFile.BBC/SequenceFile.motif_alignment.similarity;
- Result of motif clustering and annotation: SequenceFile.BBC/SequenceFile.motif_alignment.BBC.
BBC: Format of outputs
Result file about motif comparison
Here is an example:
similarity   Motif-1      Motif-2      Motif-3      Motif-4
Motif-1      0.00 (0-0)   0.18 (1-1)   0.15 (4-1)   0.24 (2-1)
Motif-2      0.18 (1-1)   0.00 (0-0)   0.20 (2-1)   0.11 (3-1)
Motif-3      0.15 (1-4)   0.20 (1-2)   0.00 (0-0)   0.27 (2-1)
Motif-4      0.24 (1-2)   0.11 (1-3)   0.27 (1-2)   0.00 (0-0)
This is the comparison result for 4 motifs. The decimals in the 4*4 matrix (leading diagonal
excluded) are the similarity scores of the corresponding motif pairs.
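A similarity table like this can be loaded into Python for further filtering. The sketch below makes two assumptions that are not stated above: that the file has one table row per line with cells written as "score (a-b)", and that a higher score means a more similar pair. The function names are made up for illustration.

```python
import re

# One cell looks like "0.18 (1-1)": a score followed by a pair in parentheses.
CELL = re.compile(r"(\d+\.\d+)\s*\((\d+)-(\d+)\)")


def parse_similarity(text):
    """Return the motif names and a {(row, column): score} dictionary."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()[1:]  # skip the "similarity" corner label
    scores = {}
    for line in lines[1:]:
        row_name = line.split()[0]
        for column_name, (score, _, _) in zip(names, CELL.findall(line)):
            scores[(row_name, column_name)] = float(score)
    return names, scores


def most_similar_pair(names, scores):
    """Pick the off-diagonal pair with the highest score (assumed most similar)."""
    pairs = [
        (scores[(a, b)], a, b)
        for index, a in enumerate(names)
        for b in names[index + 1:]
    ]
    return max(pairs)


table = """\
similarity Motif-1 Motif-2 Motif-3 Motif-4
Motif-1 0.00 (0-0) 0.18 (1-1) 0.15 (4-1) 0.24 (2-1)
Motif-2 0.18 (1-1) 0.00 (0-0) 0.20 (2-1) 0.11 (3-1)
Motif-3 0.15 (1-4) 0.20 (1-2) 0.00 (0-0) 0.27 (2-1)
Motif-4 0.24 (1-2) 0.11 (1-3) 0.27 (1-2) 0.00 (0-0)
"""

names, scores = parse_similarity(table)
best = most_similar_pair(names, scores)
print(best)
```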
Result file about motif cluster and annotation
Here is an example of the header information of the output file:
BOBRO-Based motif Comparison and Annotation (BBC) result:
Input sequences: data.fasta;
Predicted motifs: data.fasta.closures;
Motifs with hierarchical clustering:
Rank   Name      Length   M(Motif rank)-(1st level cluster)-(2nd level cluster)
1      Motif-1   14       M1_1_1
2      Motif-2   14       M2_2_2
3      Motif-3   14       M3_3_3
4      Motif-4   14       M4_2_4
Above is the cluster information for 4 motifs; it is followed by the annotation on the aligned
sequences. A label like 'M1_1_1' encodes the cluster information of a motif: the first number is
the original label of the motif (ranked in decreasing order of z-score); motifs with the same
second number are in the same cluster with fair similarity; motifs with the same third number
share high similarity.
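As a small illustration of how the labels can be used, the Python sketch below splits labels such as 'M4_2_4' into their three numbers and groups motifs by their first-level cluster. The function names are made up for illustration and are not part of BBC.

```python
from collections import defaultdict


def parse_label(label):
    """Split a label like 'M4_2_4' into (motif rank, 1st level, 2nd level)."""
    rank, level1, level2 = label.lstrip("M").split("_")
    return int(rank), int(level1), int(level2)


def group_by_level1(labels):
    """Group motif names by their first-level cluster number."""
    clusters = defaultdict(list)
    for label in labels:
        rank, level1, _ = parse_label(label)
        clusters[level1].append(f"Motif-{rank}")
    return dict(clusters)


labels = ["M1_1_1", "M2_2_2", "M3_3_3", "M4_2_4"]
clusters = group_by_level1(labels)
print(clusters)
```

In the example above, Motif-2 and Motif-4 share the second number 2, so they fall into the same first-level cluster, while their different third numbers keep them apart at the second level.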
BBA Usage
The script called 'BBA.pl' is designed for motif co-occurrence analysis. Note: R must be
installed on the computer you use.
BBA: CMD line
Simply run the following cmd to get a brief guide.

$ perl BBA.pl
To run the co-occurrence analysis on the file 'Input':

$ perl BBA.pl Input SequenceNum
BBA: Inputs
The input file includes each TF name and the labels of the sequences that contain binding sites
of this TF (see 'Motif_position_for_BBA' for an example). The format of the input file:
>AcrR
2251
1650
This is the motif information for the TF AcrR: its name must have the prefix '>'; the number 2251
means there is a motif occurrence for AcrR in the 2251st promoter sequence.
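The input format is simple enough to check with a few lines of Python before running BBA. The sketch below collects, for each TF, the set of sequence labels listed under its '>' header. The function name parse_bba_input and the second TF name "TF_B" are hypothetical and used only for illustration.

```python
def parse_bba_input(text):
    """Map each TF name to the set of sequence labels listed under it."""
    sites = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            sites.setdefault(current, set())
        elif current is not None:
            sites[current].add(int(line))
    return sites


# "TF_B" is a made-up second entry so that the two TFs share one sequence.
example = """\
>AcrR
2251
1650
>TF_B
2251
"""

sites = parse_bba_input(example)
shared = sites["AcrR"] & sites["TF_B"]
print(sorted(shared))
```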
BBA: Outputs
There are two output files: 'Input.BBA' and 'Input.BBA.all'.
The significantly co-occurring TF pairs are collected in "Input.BBA"; the correlation scores for
all TF pairs are stored in "Input.BBA.all".
The data in the result files have 7 columns with the following meanings:
- TF1
- TF2
- Hyper-geometric p-value
- Total number of sequences
- The number of sequences containing binding sites of TF1
- The number of sequences containing binding sites of TF2
- The number of sequences containing binding sites of both TF1 and TF2
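The last four columns are exactly the quantities needed for a hypergeometric test. The Python sketch below computes an upper-tail p-value, that is, the probability of seeing at least the observed number of shared sequences by chance. Whether BBA uses exactly this tail definition is an assumption, and the function name cooccurrence_pvalue is made up for illustration.

```python
from math import comb


def cooccurrence_pvalue(total, n_tf1, n_tf2, n_both):
    """Upper-tail hypergeometric probability P(X >= n_both).

    total  : total number of sequences
    n_tf1  : sequences containing binding sites of TF1
    n_tf2  : sequences containing binding sites of TF2
    n_both : sequences containing binding sites of both TFs
    """
    denominator = comb(total, n_tf2)
    upper = min(n_tf1, n_tf2)
    numerator = sum(
        comb(n_tf1, shared) * comb(total - n_tf1, n_tf2 - shared)
        for shared in range(n_both, upper + 1)
    )
    return numerator / denominator


# Toy numbers: 10 sequences, each TF binds 3 of them, all 3 are shared.
p_all_shared = cooccurrence_pvalue(10, 3, 3, 3)
# With a threshold of 0 shared sequences the tail covers every outcome.
p_any = cooccurrence_pvalue(10, 3, 3, 0)
print(p_all_shared, p_any)
```

A small p-value means the two TFs share more sequences than expected if their binding sites were placed independently.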
Contact
Questions, problem reports and bug reports are welcome and should be sent to

Qin Ma <maqin2001@uga.edu>

Bingqiang Liu <bingqiangsdu@gmail.com>

Chuan Zhou <zhouchuan121@gmail.com>
Creation: June 27, 2012