Politecnico di Milano
School of Information Engineering
Master Degree in Information Engineering
Course “Bioinformatics and Computational Biology for Medicine”
Ontologizer Tutorial and Exercises
Arif Canakoglu
canakoglu@elet.polimi.it
(This tutorial was taken from the Ontologizer web application help.)
Ontologizer Tutorial
You can run the application from the link below (start it by clicking the Java Web Start
button):
http://compbio.charite.de/contao/index.php/ontologizer2.html
Setting Up the Ontologizer
1. Download the file http://compbio.charite.de/tl_files/ontologizer/examples/yeastSampleFiles.zip and unpack
it into a directory of your choice, e.g., the desktop. The archive contains sets of genes up- or down-regulated
following treatment with sulfometuron methyl, an inhibitor of amino acid biosynthesis. The data were taken
from Jia et al. (2000), "Global expression profiling of yeast treated with an inhibitor of amino acid
biosynthesis, sulfometuron methyl".
2. If you need a proxy to access the internet, please open the Preferences Window via the Window >
Preferences... menu entry within Ontologizer.
3. Enter your proxy configuration in the appropriate fields and then press OK.
File Sets

Each File Set contains a definition file (the ontology) and an association file (the gene annotations). Each
project uses a File Set to perform the analysis. This makes it easy to manage different file configurations,
allowing, for instance, different versions of the definition file to be used for different projects. This is useful
because the definition file is frequently updated on the GO website, and you may want to keep using the same
version throughout the development of a project.
User-Supplied Files


In order to analyze your experimental data, you need to prepare one file for each of the groups of interest in
your experiment (for instance, lists of genes differentially expressed at different time points). We refer to such
groups as study sets. Additionally, you need to provide the population set. In general, this will be a list of all
genes that were tested (for instance, all genes represented on a microarray). The genes should be listed one
per line in plain text. (Alternatively, FASTA files can be used, provided that the name of the gene directly
follows the '>' sign.)
For this tutorial, you can download the yeast study and population files from the Ontologizer website:
http://www.charite.de/ch/medgen/ontologizer/howto/index.html. Unpack these files before use.
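The study and population file format described above is simple enough to read with a few lines of code, for example to check a list before loading it. The sketch below is a hypothetical helper (`read_gene_list` is not part of Ontologizer) that accepts either format. It assumes the gene name is the first whitespace-delimited token on a line (or right after '>' in FASTA), ignores FASTA sequence lines, and drops duplicate names.

```python
def read_gene_list(text: str) -> list[str]:
    """Hypothetical parser for a study/population set: plain list or FASTA."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: a file containing any '>' header line is treated as FASTA.
    is_fasta = any(ln.startswith(">") for ln in lines)
    names = []
    for ln in lines:
        if is_fasta:
            if not ln.startswith(">"):
                continue  # sequence line: carries no gene name
            tokens = ln[1:].split()  # name directly follows '>'
        else:
            tokens = ln.split()  # one gene per line
        if tokens:
            names.append(tokens[0])  # assumption: extra columns are ignored
    return list(dict.fromkeys(names))  # drop duplicates, keep first-seen order


print(read_gene_list("YAL001C\nYAL002W\n\nYAL001C\n"))
print(read_gene_list(">YBR001C some description\nACGT\n>YBR002C\nTTGA\n"))
```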
Creating a New Project
1. In order to create the new project, press the New Project button within the toolbar of Ontologizer's main
window or select the Project > New > Project... menu entry.
2. This brings up the New Project Wizard. First enter a name for the project. For the tutorial, enter
sulfometuronMethyl into the Project Name text field, then press the Next button to proceed to the next page.
3. Here you need to indicate the definition file (via the Ontology text field) and the association file (via the
Association text field). The Ontologizer comes with predefined File Sets for frequently used species that can
be automatically downloaded. We have downloaded the File Set for Yeast above. If we hadn't, the Ontologizer
would now automatically download these files in the background. For this tutorial, click on the File Set combo
box and choose Yeast. Then press Next which brings you to the Population Edit page.
4. Now enter the genes of the population set. Use the study set/population set example files downloaded from
the Ontologizer homepage as described above. Drag & Drop the file called population.txt into the gene editor
field or use a File Selection Dialog by clicking on the Append Set... button. Notice that names of genes with GO
annotations are highlighted (you may have to wait for completion of downloads or parsing before seeing
highlighting). You can hover the mouse over these entries to see more information about the gene's
annotation. Proceed by clicking on the Next button.
5. Drag & Drop a study file into the editor area (again, you can alternatively use a file selection dialog by
clicking on the Append Set... button). Press Next and repeat the procedure for each study set (file).
6. Press Finish when you have added the last study set. The New Project Wizard window closes and you should
now see your new project sulfometuronMethyl in the main window.
Performing the Analysis
The Ontologizer offers multiple methods for detecting GO term overrepresentation and for multiple testing
correction. For more information on these topics, please consult the Ontologizer homepage, where you will also
find links to publications describing the Ontologizer. For the purposes of this tutorial, we will use the
Parent-Child-Union method with a Bonferroni multiple testing correction.
1. Within the main window, select the project sulfometuronMethyl.
2. From the combo boxes in the toolbar, choose Parent-Child-Union as the calculation method (first combo box)
and Bonferroni as the multiple testing correction (second combo box). Then press Analyze.
Exploring the Results
1. The Results Window now appears. Depending on the size and number of the study sets and the type of
multiple testing correction chosen, the analysis should complete within a few seconds to a few minutes. As
individual study sets are completed, new tabs with their results appear. If you have used all the files of this
tutorial, you should see seven tabs, one named after each study set, once the analysis is completed. The tab of
the first study set is active, and its results are presented in the form of a table.
2. Notice that the background of terms whose adjusted p-value falls below the significance level (as set by the
widget below the table) is colored according to the sub-ontology and the rank: significant terms from biological
process are shown in green, terms from molecular function in yellow and terms from cellular component in
magenta.
3. Now click on one of the terms, e.g., amino acid and derivative. This updates the browser in the bottom part
of the window to show information about the term, including its parents (more general terms), its children
(more specific terms) and the names of the genes annotated to it.
4. To get a graphical overview, press the Preview Graph button (the third from the left in the toolbar).
5. The graph consists of all active terms, as defined by the small checkboxes in front of every term; by default,
these are all significant terms.
6. The parameters of the graph view (e.g., the zoom factor and which part of the graph is displayed) can be
altered via the toolbar buttons or context menu commands.
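The parents, children and annotated genes shown in the term browser (step 3) follow from the structure of the GO: it is a directed acyclic graph, and by the true path rule a gene annotated to a term is implicitly annotated to all of that term's ancestors as well. The Python sketch below illustrates this with a simplified, hypothetical three-term toy ontology and made-up gene names; it is only an illustration, not Ontologizer's implementation.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents (more general terms).
PARENTS = {
    "metabolic process": [],
    "amino acid metabolic process": ["metabolic process"],
    "sulfur amino acid metabolic process": ["amino acid metabolic process"],
}


def children_of(parents):
    """Invert the parent map: term -> set of direct children (more specific terms)."""
    children = defaultdict(set)
    for term, term_parents in parents.items():
        for p in term_parents:
            children[p].add(term)
    return dict(children)


def ancestors(term, parents):
    """All terms reachable by following parent links (the term itself excluded)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def propagate(direct, parents):
    """Turn gene -> direct terms into term -> genes annotated to it or to a descendant."""
    genes_of = defaultdict(set)
    for gene, terms in direct.items():
        for t in terms:
            for u in {t} | ancestors(t, parents):
                genes_of[u].add(gene)
    return dict(genes_of)


# Made-up direct annotations for two hypothetical genes.
DIRECT = {"GENE_A": {"sulfur amino acid metabolic process"},
          "GENE_B": {"amino acid metabolic process"}}
genes_of = propagate(DIRECT, PARENTS)
print(sorted(genes_of["metabolic process"]))  # ['GENE_A', 'GENE_B']
```

Both genes appear under the root term even though neither is annotated to it directly, which is why more general terms always have at least as many annotated genes as their children.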
Question 1: Try the same analysis without any correction and compare the results.
Answer 1:
The results for the study set 15minSMinduced are as follows:
With Term-for-Term as the calculation method (first combo box) and Bonferroni as the multiple testing
correction, 47 GO terms were found (default threshold: 0.10).
Without any correction method, the same test yields 345 terms.
Question 2: Try the same analysis with different correction methods and explain why the number of terms
found differs between them.
Answer 2:
Again for the study set “15minSMinduced”, with Term-for-Term as the statistical analysis method and the
analysis run as given in class, the numbers of terms found are:
Bonferroni: 47
Bonferroni-Holm: 47
Westfall-Young-Single-Step: 59
Westfall-Young-Step-Down: 59
Benjamini-Hochberg: 163
None: 345
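So the results are as expected. Corrections that control the family-wise error rate (Bonferroni,
Bonferroni-Holm and the Westfall-Young procedures) are the most conservative: they accept more false
negatives and therefore report fewer terms. The Westfall-Young procedures take the dependence between terms
into account by resampling, which makes them slightly less conservative here (59 vs. 47). Benjamini-Hochberg
controls the false discovery rate, which is less stringent, and applying no correction is the least stringent of all:
accepting more false positives, they report more terms.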
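The same ordering can be reproduced on a toy example. The sketch below implements the standard Bonferroni, Bonferroni-Holm (step-down) and Benjamini-Hochberg (step-up) adjustments in plain Python and counts how many adjusted p-values fall below the 0.10 threshold. The ten raw p-values are invented for illustration only, and the resampling-based Westfall-Young procedures are not shown because they need the underlying gene-level data.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def bonferroni_holm(pvals):
    """Step-down: the i-th smallest p-value (0-based) is multiplied by m - i."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])  # keep monotone
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted, alpha=0.10):
    return sum(1 for p in adjusted if p < alpha)


raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.055, 0.074, 0.205, 0.212, 0.216]  # invented
counts = {
    "None": count_significant(raw),
    "Bonferroni": count_significant(bonferroni(raw)),
    "Bonferroni-Holm": count_significant(bonferroni_holm(raw)),
    "Benjamini-Hochberg": count_significant(benjamini_hochberg(raw)),
}
print(counts)  # {'None': 7, 'Bonferroni': 2, 'Bonferroni-Holm': 2, 'Benjamini-Hochberg': 6}
```

Holm's adjusted values are never larger than Bonferroni's, so it always finds at least as many terms; on this toy data, as in the real analysis above, the two happen to agree.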
Question 3: Use different Statistical Analysis methods and compare them.
Answer 3:
With Parent-Child-Union as the calculation method (first combo box) and Bonferroni as the multiple testing
correction, 24 GO terms were found (default threshold: 0.10).
Without any correction method, the same test yields 272 terms.
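Term-for-Term reports more terms than Parent-Child-Union. As explained in the note below, Parent-Child-Union
also uses the structure of the ontology: each term is tested against the genes annotated to its parents rather
than against the whole population, so a term is not reported merely because it inherits the overrepresentation
of a more general parent term.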
Note: The Statistical Analysis setting controls the method by which the annotated genes or gene products in the
study set are analyzed for GO term overrepresentation with respect to the population set. The standard method
has been to calculate the upper tail of the hypergeometric distribution (one-sided Fisher's exact test) for each
term separately. The Ontologizer also provides analysis by means of the parent-child approach, which has several
advantages compared to the standard approach (see the Ontologizer homepage for further details and
references).
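The two approaches differ mainly in which genes serve as the background for each term. The sketch below, using hypothetical names and invented toy data and assuming annotations have already been propagated up the ontology, computes the Term-for-Term p-value (upper tail of the hypergeometric distribution over the whole population) and a Parent-Child-Union p-value (the same test, restricted to the genes annotated to at least one parent of the term). Giving root terms the whole population is an assumption of this sketch, not necessarily what Ontologizer does.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): n genes drawn from N, of which K are annotated to the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_for_term_p(term, study, population, genes_of):
    """Standard test: the whole population is the background for every term."""
    pop = set(population)
    stu = set(study) & pop
    annotated = genes_of.get(term, set()) & pop
    return hypergeom_upper_tail(len(annotated & stu), len(pop), len(annotated), len(stu))


def parent_child_union_p(term, study, population, genes_of, parents):
    """Background = genes annotated to at least one parent of the term."""
    pop = set(population)
    term_parents = parents.get(term, [])
    if term_parents:  # assumption of this sketch: root terms keep the whole population
        pop &= set().union(*(genes_of.get(p, set()) for p in term_parents))
    stu = set(study) & pop
    annotated = genes_of.get(term, set()) & pop
    return hypergeom_upper_tail(len(annotated & stu), len(pop), len(annotated), len(stu))


# Invented toy data: 40 genes, the first 8 form the study set; R -> P -> C.
population = [f"g{i}" for i in range(40)]
study = population[:8]
parents = {"R": [], "P": ["R"], "C": ["P"]}
genes_of = {  # annotations already propagated up the ontology
    "R": set(population),
    "P": {f"g{i}" for i in [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]},
    "C": {"g0", "g1", "g2", "g20"},
}

for t in ["P", "C"]:
    print(t, term_for_term_p(t, study, population, genes_of),
          parent_child_union_p(t, study, population, genes_of, parents))
```

In this toy data the child term C looks enriched under Term-for-Term (p ≈ 0.02) only because its parent P is enriched; once the background is restricted to P's genes, C's p-value rises to 66/252 ≈ 0.26. This is the effect behind the smaller number of terms reported by Parent-Child-Union in Answer 3.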