CoBaltDB User Guide

v1.1

© B@SIC, UMR 6026 – December 2009

1. Introduction

1.1. Objectives

CoBaltDB, the Complete Bacterial and Archaeal Orfeomes Subcellular Localization Database, is a client-server application that presents the subcellular localization of all proteins of all sequenced prokaryotic genomes, as predicted by numerous bioinformatic localization tools. A list of all the tools used in CoBaltDB is given at the end of this guide.

1.2. Graphical User Interface (GUI)

CoBaltDB does more than supply the localization predictions given by the selected tools. It also aims to ease the work of the biologist or bioanalyst by providing a number of features, such as:

Giving the tools' predictions of the subcellular localizations of a list of proteins identified by their locus tags or by their genome name;

Giving the tools' predictions of the presence of localization features (Tat or Sec signal peptides, lipoproteins, transmembrane domains, beta barrels) for all considered proteins;

Providing the annotation information for all considered proteins, with links to the corresponding NCBI and KEGG web sites;

Sorting the proteins with respect to the presence of their localization features;

Providing the raw data of the tools used within CoBaltDB for the considered proteins;

Supplying a synopsis recapitulating all localization-related information;

Providing user-friendly graphs showing the positions of the signal peptides and transmembrane domains predicted by all considered localization tools;

Allowing the user to submit the protein sequence to another ~50 localization tools;

Allowing the user to save the tables and synopsis to xls and pdf files, respectively.

2. Installation & Launch of the Application

CoBaltDB is a client-server application. The server is installed and hosted at the Biogenouest Bioinformatics Platform and holds all the pre-computed data, while the CoBaltDB client (the GUI) is a Java application that communicates with the server via web services. The CoBaltDB client must be downloaded to your computer from the following web site: http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten

CoBaltDB requires the Java Runtime Environment (JRE) 5 or a more recent version. If Java is not already installed on your machine, it can be downloaded from the following address: http://java.sun.com/javase/downloads/index_jdk5.jsp

Once CoBaltDB has been downloaded, unzip the CoBaltDB.zip or CoBaltDB.tar.gz file by clicking on it, or by typing under Linux: tar -xzvf CoBaltDB.tar.gz

A CoBaltDB/ directory should appear. In order to launch CoBaltDB, no matter which platform, first go to the CoBaltDB/ directory.

On Windows, simply double-click on file:

StartCoBaltDB.bat

On Mac OS X, double-click on file:

StartCoBaltDB.command

On Linux, double-click on file:

StartCoBaltDB.sh

or in a terminal window, type:

./StartCoBaltDB.sh

Depending on the requests submitted, the CoBaltDB client may require large amounts of memory. The launch files above account for this by default. However, if your computer has less than 1 GB of available RAM, please replace the -Xmx1g option inside the corresponding launch file with -Xmx128m, -Xmx256m or -Xmx512m, according to your system's available RAM.
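The choice of heap option described above can be sketched as a small helper. This is only an illustration: the function name and the rule used (pick the largest listed value that does not exceed the available RAM) are assumptions, not part of CoBaltDB.

```python
def heap_option(available_ram_mb):
    """Return one of the -Xmx values named in this guide for the given RAM (in MB)."""
    # Heap sizes mentioned in the guide, largest first.
    candidates = [
        (1024, "-Xmx1g"),
        (512, "-Xmx512m"),
        (256, "-Xmx256m"),
        (128, "-Xmx128m"),
    ]
    for size_mb, option in candidates:
        # Assumed rule: take the largest heap that fits in the available RAM.
        if available_ram_mb >= size_mb:
            return option
    # Below 128 MB, fall back to the smallest value the guide mentions.
    return "-Xmx128m"


print(heap_option(600))
```

In practice you may prefer a value somewhat below the available RAM, so that the operating system and other programs keep some memory.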

The application CoBaltDB launches; after a few seconds, you should see the following window appear:

Figure 1. The Input tab at the beginning.

3. Description of the Application

3.1. The Input Tab

The Input tab, already shown in Figure 1, allows the biologist to express two kinds of requests to CoBaltDB: a) What are the localization predictions for all proteins of a given genome? b) What are the localization predictions for a list of proteins defined by their locus tags?

Depending on the question, the bioanalyst needs to select the appropriate radio button.

3.1.1. Requesting all proteins of a genome

The genome of interest must be selected. The biologist may use the editable text field to enter parts of the genome name, or simply browse through the genome names given in alphabetical order.

Once the genome is selected, the biologist may submit the request to CoBaltDB. This is performed by clicking on the button. The CoBaltDB server then receives the request, reads it, recognizes the submitted name of the pre-computed genome, and returns the desired data to the client. This data contains the localization information for all the genes belonging to the selected genome.

Once the data has returned from the server, the CoBaltDB client window switches to present a table showing the results of all feature tool boxes and associated databases, for all genes belonging to the selected genome (Figure 2):

Figure 2. The Feature Boxes and Localization Cards Table.

It can be seen that, in addition to the Input tab, new tabs have been included in the CoBaltDB client window:

A Specialized Tools (Feature Predictions) tab showing all proteins together with their corresponding annotation information, results of the feature box tools and databases, and links to their synopsis;

A Meta Tools (Localization Predictions) tab showing the localization predictions for each protein from all retained global tools and global databases;

An Additional Tools (Posts) tab enabling the submission of sequences to another 50 or so localization tools.

All these tabs will be described in detail in Sections 3.2 to 3.4.

3.1.2. Requesting a list of proteins designated by their locus tags

The alternative input consists of requesting a list of proteins. This is performed by selecting the appropriate radio button; a new panel appears on the Input tab:

A list of locus tags may be constructed from this panel: the biologist needs to enter the first locus tag in the text field, and then click on the button to add this locus tag to the list which is shown just below.

This step may be repeated several times, finally yielding the desired list of locus tags.

Alternatively, the list of locus tags may be loaded from a text file, which contains one locus tag per line, by clicking on the button. Selecting a particular locus tag within the list and then clicking on the button removes the selected locus tag from the list. The list is given the name appearing in the designation text field. Finally, clicking on the button submits the request to the server and the localization feature panel is displayed, showing the localization features of the proteins corresponding to the requested genes.
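A text file in the one-locus-tag-per-line layout described above could be prepared with a short script such as the following sketch. The function names and the example locus tags are illustrative assumptions; whether the client tolerates blank lines or duplicates is not stated in this guide, so the sketch simply removes them.

```python
from pathlib import Path


def clean_locus_tags(raw_lines):
    """Strip whitespace, drop empty lines and duplicates, keep first-seen order."""
    seen = set()
    tags = []
    for line in raw_lines:
        tag = line.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def write_locus_tag_file(tags, path):
    """Write one locus tag per line, the layout described in this guide."""
    Path(path).write_text("\n".join(tags) + "\n", encoding="utf-8")


# Illustrative locus tags only; replace them with tags from your own genome.
tags = clean_locus_tags(["b0001", " b0002 ", "", "b0001"])
print(tags)
```

Calling write_locus_tag_file(tags, "my_list.txt") would then produce a file that can be selected from the load dialog.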

3.2. The Specialized Tools (Feature Predictions) Tab

3.2.1. The Main Panel

This panel presents, for each gene in the replicon or genome, or for each gene whose locus tag belongs to the uploaded list, some annotation and localization information: the locus tag of the gene, its protein identifier (id.), gene name and description (as present in the annotation), replicon name, feature boxes and predictions of localization databases are shown on a single line.

Selecting a line (corresponding to a single gene) and clicking on the NCBI or KEGG button opens the default browser on the NCBI or KEGG information web page for that gene, respectively.

There are five feature boxes: the Lipo, Tat, Sec, Helix (transmembrane) and Barrel (outer membrane) boxes. These boxes gather the results from the different tools integrated in CoBaltDB and provide a prediction for the corresponding feature: for instance, the Tat box gathers the tools predicting the presence or absence of Tat signal peptides within each protein.

For each box and each protein, the percentage of tools predicting the presence of the considered feature can be visualized by clicking on the protein line. This percentage is also conveyed by the shade of cobalt blue used to colour the corresponding cell.
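As an illustration of how such a percentage and shade could be derived, here is a minimal sketch. The tool names, the linear blend from white to blue and the RGB value chosen for cobalt blue are all assumptions made for the example; the guide does not specify how the client computes its shades.

```python
WHITE = (255, 255, 255)
COBALT_BLUE = (0, 71, 171)  # assumed RGB value, not taken from CoBaltDB


def feature_percentage(predictions):
    """Percentage of tools in a feature box that predict the feature."""
    if not predictions:
        return 0.0
    positives = sum(1 for predicted in predictions.values() if predicted)
    return 100.0 * positives / len(predictions)


def shade(percentage):
    """Blend linearly from white (0 %) to cobalt blue (100 %)."""
    fraction = percentage / 100.0
    return tuple(round(w + (c - w) * fraction) for w, c in zip(WHITE, COBALT_BLUE))


# Hypothetical Tat box: three of four tools predict a Tat signal peptide.
tat_box = {"tool_a": True, "tool_b": True, "tool_c": False, "tool_d": True}
tat_percentage = feature_percentage(tat_box)
print(tat_percentage, shade(tat_percentage))
```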

Clicking on one of those cells will give the actual results of all tools belonging to the considered box for the corresponding protein. This capability will be further described below in section 3.2.2.

Clicking on the header of any column will sort the whole table according to the alphanumerical order of the information contained within that column.

Clicking again on the same header will sort the table in the inverse order. This capability is particularly useful when searching for certain kinds of proteins: for instance, a biologist who would like to find all transmembrane proteins with Sec signal peptides would just need to click (once or twice) on the Helix header and then on the Sec header; the proteins with these features will then be grouped together in the sorted table. Note that the order in which the columns are sorted affects the final result.
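The effect of the click order can be illustrated with successive stable sorts. This sketch assumes that each click keeps the previous ordering among tied rows; the guide does not state how the client breaks ties, and the rows and percentages below are invented for the example.

```python
# Invented rows: percentage of tools predicting each feature.
proteins = [
    {"locus_tag": "p1", "helix": 80, "sec": 10},
    {"locus_tag": "p2", "helix": 80, "sec": 90},
    {"locus_tag": "p3", "helix": 0, "sec": 90},
    {"locus_tag": "p4", "helix": 0, "sec": 10},
]


def click_headers(rows, columns):
    """Apply one descending sort per clicked column header, in click order."""
    result = list(rows)
    for column in columns:
        # sorted() is stable: tied rows keep the order left by the previous click.
        result = sorted(result, key=lambda row: row[column], reverse=True)
    return result


helix_then_sec = [r["locus_tag"] for r in click_headers(proteins, ["helix", "sec"])]
sec_then_helix = [r["locus_tag"] for r in click_headers(proteins, ["sec", "helix"])]
print(helix_then_sec)  # ['p2', 'p3', 'p1', 'p4']
print(sec_then_helix)  # ['p2', 'p1', 'p3', 'p4']
```

In both cases the protein with both features comes first, but the remaining rows are ordered by whichever header was clicked last.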

The interface provides other controls for the biologist: the Replicon combo-box allows viewing the proteins of the selected replicon only. The table can be saved as an xls file by clicking on the button and specifying the directory and file name. The table may also be searched (locus tag, protein id, annotation gene name or description, etc.) by entering the desired expression into the Search field and then clicking on the Search button to search from the beginning, or the Next button to look for the next occurrence.
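The Search and Next behaviour can be pictured with the following sketch. It assumes a case-insensitive substring match over every column, which the guide does not confirm; the function name and the sample rows are hypothetical.

```python
def find_next(rows, expression, start=0):
    """Return the index of the first row at or after `start` containing `expression`, or -1."""
    needle = expression.lower()
    for index in range(start, len(rows)):
        # Assumed matching rule: case-insensitive substring in any column.
        if any(needle in str(value).lower() for value in rows[index].values()):
            return index
    return -1


rows = [
    {"locus_tag": "p1", "description": "ABC transporter"},
    {"locus_tag": "p2", "description": "hypothetical protein"},
    {"locus_tag": "p3", "description": "abc transporter permease"},
]

first = find_next(rows, "transporter")             # like the Search button
following = find_next(rows, "transporter", first + 1)  # like the Next button
print(first, following)
```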

3.2.2. The Tools Raw Data Window

This window appears whenever a shaded cobalt blue cell, corresponding to a particular protein and localization feature box (Lipo, Tat, Sec, Helix or Barrel), is clicked. The window recalls the information relative to the gene (its genome, replicon and locus tag) and to the localization feature box (its name). It also displays a separate tab for each localization tool that belongs to the feature box and actually gave some results for that protein. In every tab (i.e. for every tool), all raw data specific to the considered tool, as recorded during the pre-computing process, is displayed in a table showing the name and value of each recorded property.

In this way, biologists or bioanalysts can understand the percentage value (or the cobalt blue shade given to the selected cell). They may also find the information they are used to seeing when working with their favourite tool, which may help them interpret the results and draw hypotheses regarding the actual features or localization of the considered protein.

3.2.3. The Synopsis

CoBaltDB provides a synopsis giving the results returned by the localization tools for each particular protein. Its upper panel presents the details of the protein (locus tag, protein id, gene name, position on the genome, organism, replicon name, annotation description and sequence). The lower panel displays more precise localization-oriented information. A Save to pdf button allows saving the whole window as a pdf file.

The synopsis gives all the details retained within CoBaltDB: whether the protein is predicted to be a lipoprotein or to have a signal peptide, together with the consensus position of that signal peptide; the possible transmembrane domains and their consensus positions; and the predictions of the global tools and databases concerning a precise localization. The raw data of the tools are not given here. The synopsis has been designed so as to fit onto a single A4 sheet.

The figure below shows the information given by the synopsis.

3.3. The Meta Tools (Localization Predictions) Tab

The third tab within CoBaltDB displays, for all considered proteins, the localization predictions of the considered global or meta-tools (they are called meta-tools because they integrate results from other tools) and of the global databases, which directly propose one or more predictions for the cellular localization.

A different colour is used for each localization prediction. As before, this table may be searched and saved in xls format.

3.4. The Additional Tools Submission Window

Finally, the fourth and last tab within CoBaltDB enables the user to submit any particular gene to another 50 or so localization tools, i.e. tools other than those whose results are displayed or used within CoBaltDB. The different tools are organized within feature panels. The gene must be selected from the other tabs by clicking on the corresponding line of the tables.

The selected gene is specified via its locus tag and amino-acid sequence. Below this information, a list of localization tools that have not been used to construct the CoBaltDB database appears in the form of check-boxes, organized in several panels gathering, respectively, additional lipoprotein tools, signal peptide prediction tools, transmembrane prediction tools, beta-barrel tools and, finally, global localization prediction meta-tools. Checking the desired tools and then clicking on the Launch button should open, for each selected web tool, the default browser on the corresponding web site, with the sequence and, where applicable, the Gram stain type filled in at the appropriate place. Only a few web tools, marked with an asterisk, will not have the sequence and Gram information filled in. The biologist should then only need to press the submit button within the web site to actually launch the web tool and eventually collect the results.

4. Contact

We hope you will find CoBaltDB useful and this guide helpful.

If you have any questions or suggestions, feel free to contact us at: stephane.avner@univ-rennes1.fr

Thank you,

The B@SIC team.
