0_GO - Theoretical Systems Biology

Gene Ontology
John Pinney
j.pinney@imperial.ac.uk
Gene annotation
Goal: transfer knowledge about the function of gene products from model organisms to other genomes
Problem: keyword systems are different between research communities
Solution: controlled vocabulary
Ontology
structured
controlled vocabulary
Ontology:
a collection of terms
and their definitions
and the logical relationships
between them
Gene Ontology (GO):
a collection of terms
describing gene products
and their definitions
and the logical relationships
between them
nucleus
GO:0005634
“A membrane-bounded organelle of eukaryotic
cells in which chromosomes are housed and
replicated. In most cells, the nucleus contains all
of the cell's chromosomes except the organellar
chromosomes, and is the site of RNA synthesis
and processing. In some species, or in specialized
cell types, RNA metabolism or DNA replication
may be absent.”
“part of”
cell
  nucleus
    nucleoplasm
    nuclear membrane
    nucleolus
“is a”
intracellular organelle, membrane-bounded organelle
  intracellular membrane-bounded organelle
    nucleus
      pronucleus
A term may have
more than one parent term
and
more than one child term.
=>
The gene ontology is not a tree
The gene ontology has a structure
known as a Directed Acyclic Graph
(DAG).
Directed: relationships are not symmetrical
Acyclic: there are no loops
Graph: mathematical term for a network
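The DAG idea can be sketched in a few lines of Python. This is only an illustration, not the file format used by geneontology.org: the `PARENTS` dictionary and the `ancestors` helper are hypothetical names, and the edges are just the "is a" and "part of" links shown for the nucleus, mixed together for simplicity.

```python
# Hypothetical toy graph: each term maps to its direct parent terms.
# Edges follow the "is a" / "part of" examples given for the nucleus.
PARENTS = {
    "pronucleus": ["nucleus"],
    "nucleus": ["intracellular membrane-bounded organelle", "cell"],
    "intracellular membrane-bounded organelle": [
        "intracellular organelle",
        "membrane-bounded organelle",
    ],
    "intracellular organelle": [],
    "membrane-bounded organelle": [],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in found:  # a term can be reached by several paths
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(ancestors("pronucleus")))
```

Because a term may have more than one parent, the traversal records visited terms in a set, so shared ancestors are not walked twice.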
GO is actually made up of 3 different
ontologies:
cellular component
molecular function
biological process
cellular component
“The part of a cell or its extracellular
environment in which a gene product is
located. A gene product may be located in
one or more parts of a cell.”
cellular component
examples:
cohesin core heterodimer
extracellular region
laminin-1 complex
replication fork
transcription factor complex
molecular function
“Elemental activities, such as catalysis or
binding, describing the actions of a gene
product at the molecular level. A given
gene product may exhibit one or more
molecular functions.”
molecular function
examples:
transcription factor binding
enzyme activator activity
3'-nucleotidase activity
metallopeptidase activity
hexokinase activity
biological process
“Those processes specifically pertinent to
the functioning of integrated living units:
cells, tissues, organs, and organisms. A
process is a collection of molecular events
with a defined beginning and end.”
biological process
examples:
para-aminobenzoic acid biosynthetic process
protein localization
establishment of blood-nerve barrier
circadian rhythm
posterior midgut development
geneontology.org
geneontology.org
search and browse the ontologies
geneontology.org
download ontologies
geneontology.org
download mappings from other
databases
enzyme functions
(EC, KEGG, MetaCyc)
protein domains
(Pfam, SMART, PRINTS,…)
other controlled vocabularies of functions
(E. coli functions, MIPS FunCat)
geneontology.org
download annotations for various
genomes
evidence codes
Allow curators to indicate the type of
evidence for each gene-term annotation.
experimental
e.g. IMP  Inferred from mutant phenotype
     IDA  Inferred from direct assay
computational
e.g. ISS  Inferred from sequence similarity
     IGC  Inferred from genome context
author statement
e.g. TAS  Traceable author statement
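As a rough sketch of how evidence codes might be used in practice, the Python below groups the five codes listed here by category and filters a list of annotations. The names `EVIDENCE_CATEGORY` and `filter_by_category` are hypothetical, and the second annotation tuple is made up for illustration.

```python
# The evidence codes named on this slide, grouped by kind of support.
EVIDENCE_CATEGORY = {
    "IMP": "experimental",
    "IDA": "experimental",
    "ISS": "computational",
    "IGC": "computational",
    "TAS": "author statement",
}


def filter_by_category(annotations, category):
    """Keep (gene, go_id, code) tuples whose evidence code is in `category`."""
    return [a for a in annotations if EVIDENCE_CATEGORY.get(a[2]) == category]


if __name__ == "__main__":
    annotations = [
        ("lolD", "GO:0043190", "ISS"),   # example annotation from these slides
        ("geneA", "GO:0005634", "IDA"),  # hypothetical annotation
    ]
    print(filter_by_category(annotations, "experimental"))
```

Codes that are not in the dictionary are simply dropped by the filter; a real analysis would need the full list of evidence codes.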
geneontology.org
download annotations for various
genomes
database: NCBI_NP
gene product ID: NP_354299.2
gene symbol: lolD
GO term ID: GO:0043190
evidence code: ISS
description: "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"
organism (taxon) ID: taxon:176299
date: 20070612
annotation project ID: PAMGO_GAT
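A record like this could be read into a small Python structure. The sketch assumes a simplified layout: one tab-separated line holding only the nine fields shown here, in this order. Annotation files downloaded from geneontology.org contain further columns, so the `Annotation` class and `parse_annotation` function are illustrative names rather than a parser for the real format.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    database: str
    gene_product_id: str
    gene_symbol: str
    go_term_id: str
    evidence_code: str
    description: str
    taxon: str
    date: str
    project: str


def parse_annotation(line):
    """Split one tab-separated line into the nine fields assumed above."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    return Annotation(*parts)


# The example record from the slide, joined with tabs (assumed layout).
EXAMPLE = "\t".join([
    "NCBI_NP", "NP_354299.2", "lolD", "GO:0043190", "ISS",
    "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)",
    "taxon:176299", "20070612", "PAMGO_GAT",
])

if __name__ == "__main__":
    print(parse_annotation(EXAMPLE))
```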
geneontology.org
repository of analysis tools that use
GO
search, edit and browse ontologies / annotations
software libraries
statistical analysis
text mining
protein interactions
enrichment analysis
Enrichment analysis
sources of a gene set (with example tools):
significant expression change in a microarray experiment (ArrayTrack)
cluster from a protein interaction network (BiNGO)
some other experiment / analysis (GOstat)
Which GO terms occur significantly more often than expected in this gene set?
compared against: whole genome (annotated)
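One common way to ask whether a term occurs "more often than expected" is a one-sided hypergeometric test, sketched below in Python. The slides do not say which test ArrayTrack, BiNGO or GOstat use, so treat this choice as an assumption; `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb  # available from Python 3.8


def enrichment_p_value(genome_size, genome_with_term, set_size, set_with_term):
    """Probability of seeing at least `set_with_term` annotated genes when
    `set_size` genes are drawn without replacement from the annotated genome."""
    total = comb(genome_size, set_size)
    upper = min(genome_with_term, set_size)
    tail = sum(
        comb(genome_with_term, k)
        * comb(genome_size - genome_with_term, set_size - k)
        for k in range(set_with_term, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 10 annotated genes, 5 carry the term,
    # and both genes in a gene set of size 2 carry it.
    print(enrichment_p_value(10, 5, 2, 2))
```

In a real analysis many GO terms are tested at once, so the resulting p-values would normally be corrected for multiple testing.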
Advantages of GO
single set of terms to describe the function of gene
products from all organisms.
DAG structure provides a logical framework to represent
knowledge at whatever level of detail is available.
continually revised to reflect the state of current
knowledge.
can quantify strength of relationships between terms
(semantic similarity).
many statistical analysis tools available.
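To give a feel for semantic similarity, the Python sketch below scores two terms by how much their ancestor sets overlap (a Jaccard index). This is one simple measure chosen for illustration, not the method prescribed by these slides; the `PARENTS` graph uses only the "is a" links from the nucleus example, and all function names are hypothetical.

```python
# Hypothetical toy graph built from the "is a" links in the nucleus example.
PARENTS = {
    "pronucleus": ["nucleus"],
    "nucleus": ["intracellular membrane-bounded organelle"],
    "intracellular membrane-bounded organelle": [
        "intracellular organelle",
        "membrane-bounded organelle",
    ],
    "intracellular organelle": [],
    "membrane-bounded organelle": [],
}


def term_and_ancestors(term):
    """Return the term itself plus every term above it in the graph."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def jaccard_similarity(term_a, term_b):
    """Shared ancestors divided by all ancestors of either term."""
    a = term_and_ancestors(term_a)
    b = term_and_ancestors(term_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard_similarity("nucleus", "pronucleus"))
```

Closely related terms share most of their ancestors and score near 1, while terms with no common ancestor in this toy graph score 0.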
Limitations of GO
GO is limited in scope: it does not cover
processes that are not normal functions of gene
products (e.g. oncogenesis).
sequence attributes (e.g. introns/exons)
protein structures or interactions
evolution
gene expression
Summary (1)
The gene ontology (GO) is a structured, controlled
vocabulary to describe the function of gene products.
Terms in GO have logical relationships (“is a”, “part of”)
with one another. Together these form a structure called a
Directed Acyclic Graph (DAG).
GO is formed of 3 separate ontologies describing different
aspects of gene function: cellular component, molecular
function and biological process.
Summary (2)
geneontology.org is the central resource for downloading
ontology, annotation and mapping files.
evidence codes are used in annotations to show the
experimental, computational or literature support for
each function.
Summary (3)
many software tools are available to support GO analysis
of experimental data, including enrichment analysis by
ArrayTrack (microarray expression data)
BiNGO (protein interaction clusters)
GOstat (any data in the form of gene sets)