Developing and Using the Gene Ontology
Midori A. Harris
EBI & GO Consortium
RDF, Ontologies and Meta-Data Workshop
7–9 June 2006
GO project aims
• Compile structured vocabularies describing aspects of molecular biology
• Describe gene products using vocabulary terms (annotation)
• Provide a public resource of data and tools:
  ‣ tools to query and modify the vocabularies and annotations
  ‣ annotation tools for curators
GO Scope: 3 domains
• Molecular Function: elemental activity or task
  ‣ nuclease, DNA binding, transcription factor
• Biological Process: broad objective or goal
  ‣ mitosis, signal transduction, metabolism
• Cellular Component: location or complex
  ‣ nucleus, ribosome, origin recognition complex
✴ ‘Normal’ functions and processes only:
  ‣ No pathological processes
  ‣ No experimental conditions
GO Structure: DAG
[Figure: example DAG of the terms cell, membrane, mitochondrion, lysosomal membrane and mitochondrial membrane, linked by is_a and part_of relationships]
GO is a directed acyclic graph (DAG): a term can have one or more parents.
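As a minimal sketch of what "directed acyclic graph" means in practice, the Python below stores illustrative edges among the slide's example terms (the exact edges are an assumption, not taken from GO), lists the terms that have more than one parent, and checks acyclicity with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque

# Illustrative (child, relationship, parent) edges among the slide's example
# terms. The exact edges are an assumption for this sketch, not GO data.
EDGES = [
    ("membrane", "part_of", "cell"),
    ("mitochondrion", "part_of", "cell"),
    ("lysosomal membrane", "is_a", "membrane"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("mitochondrial membrane", "part_of", "mitochondrion"),
]


def parents_of(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return dict(parents)


def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff all nodes can be removed
    in topological order (parents before children)."""
    nodes = {n for child, _, parent in edges for n in (child, parent)}
    indegree = dict.fromkeys(nodes, 0)
    children = defaultdict(list)
    for child, _, parent in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


if __name__ == "__main__":
    multi = {t: ps for t, ps in parents_of(EDGES).items() if len(ps) > 1}
    print(multi)               # only "mitochondrial membrane" has two parents
    print(is_acyclic(EDGES))   # True
```

A tree would forbid the second parent of "mitochondrial membrane"; the DAG allows it while still ruling out cycles.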
Applications of GO
• Gene product annotation
  ‣ model organism databases
  ‣ genome sequence analysis
• Expression data analysis
• Text mining
• More ...
GO Annotation
• What is GO annotation?
• An annotation is a statement that a gene product …
  ‣ … has a particular molecular function
  ‣ … is involved in a particular biological process
  ‣ … is located within a certain cellular component
• … as determined by a particular method
• … as described in a particular reference
• Its parts: gene product + GO terms + evidence + reference
GO Annotation
• Anatomy of an annotation:
  ‣ Database object (gene or gene product)
  ‣ GO term ID
  ‣ Reference ID
  ‣ Evidence code
    ‣ IDA, IPI, IMP, IEP, IGI, ISS, IEA, TAS, NAS, or IC
  ‣ Optional and supporting items
    ‣ e.g. term qualifiers, gene product names, etc.
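The four required parts can be sketched as a small Python record with basic checks. The class name, field names and the placeholder identifiers in the usage are hypothetical; the evidence-code set is the one listed on the slide, and the only ID format assumed is "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass, field

# Evidence codes listed on the slide.
EVIDENCE_CODES = {"IDA", "IPI", "IMP", "IEP", "IGI", "ISS", "IEA", "TAS", "NAS", "IC"}

# Assumed ID format: "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


@dataclass
class Annotation:
    """Hypothetical record holding the four required parts of an annotation."""
    db_object: str                               # gene or gene product ID
    go_id: str                                   # GO term ID
    reference: str                               # reference ID
    evidence: str                                # evidence code
    extras: dict = field(default_factory=dict)   # optional/supporting items

    def problems(self):
        """Return a list of problems; an empty list means the record passes."""
        issues = []
        if not self.db_object:
            issues.append("missing database object")
        if not GO_ID.fullmatch(self.go_id):
            issues.append(f"malformed GO ID: {self.go_id!r}")
        if not self.reference:
            issues.append("missing reference")
        if self.evidence not in EVIDENCE_CODES:
            issues.append(f"unknown evidence code: {self.evidence!r}")
        return issues


if __name__ == "__main__":
    # "DB:gene1" and "REF:1" are placeholder identifiers.
    ok = Annotation("DB:gene1", "GO:0005739", "REF:1", "IDA",
                    extras={"gene_product_name": "example protein"})
    bad = Annotation("DB:gene1", "GO:5739", "", "XYZ")
    print(ok.problems())   # []
    print(bad.problems())  # three problems
```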
GO Annotation
• Annotation methods
  ‣ Electronic annotation
    ‣ sequence similarity
    ‣ transitive annotation
    ‣ nomenclature, other text matching
    ➡ high quantity but low quality
  ‣ Manual annotation
    ‣ literature curation by biologists
    ➡ time-consuming but high quality
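Transitive annotation by sequence similarity can be sketched as copying GO terms from already-annotated genes to similar query genes. This is a simplified illustration, not any specific GO pipeline: the function name, the percent-identity threshold and the input layout are assumptions. Transferred annotations are tagged IEA because no curator has reviewed them.

```python
def transfer_annotations(hits, curated, min_identity=50.0):
    """Copy GO terms from similar, already-annotated genes to query genes.

    hits:    iterable of (query_gene, annotated_gene, percent_identity)
    curated: dict of annotated_gene -> set of GO IDs
    Returns: dict of query_gene -> set of (GO ID, evidence code, source gene)
    The 50% default threshold is an arbitrary assumption.
    """
    inferred = {}
    for query, subject, identity in hits:
        if identity < min_identity:
            continue  # too dissimilar to trust the transfer
        for go_id in curated.get(subject, ()):
            # IEA: inferred electronically, not reviewed by a curator.
            inferred.setdefault(query, set()).add((go_id, "IEA", subject))
    return inferred


if __name__ == "__main__":
    curated = {"geneA": {"GO:0005739"}}          # hypothetical curated data
    hits = [("new1", "geneA", 82.0), ("new2", "geneA", 31.0)]
    print(transfer_annotations(hits, curated))
    # {'new1': {('GO:0005739', 'IEA', 'geneA')}}
```

Lowering `min_identity` yields more annotations at the cost of more errors, which is the quantity/quality trade-off noted above.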
GO Annotation
[Figure: the cell / membrane / mitochondrion / lysosomal membrane / mitochondrial membrane DAG, with the gene products grx2 and mgr2 annotated to terms in it]
Annotate to any level within the GO DAG
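Because a gene product may be annotated to any level, analyses usually also count it under every ancestor of its term (GO's "true path rule"). A minimal Python sketch, assuming the illustrative parent links and hypothetical gene names below:

```python
from collections import defaultdict

# Illustrative child -> parents links among the slide's example terms
# (an assumption for this sketch; is_a and part_of are treated alike).
PARENTS = {
    "membrane": ["cell"],
    "mitochondrion": ["cell"],
    "lysosomal membrane": ["membrane"],
    "mitochondrial membrane": ["membrane", "mitochondrion"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents=PARENTS):
    """Turn gene -> directly annotated terms into term -> genes, counting
    each gene at its own term and at every ancestor of that term."""
    genes_by_term = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_by_term[t].add(gene)
    return dict(genes_by_term)


if __name__ == "__main__":
    # Hypothetical genes annotated at different depths.
    direct = {"gene1": {"mitochondrial membrane"}, "gene2": {"mitochondrion"}}
    for term, genes in sorted(propagate(direct).items()):
        print(term, sorted(genes))
```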
GO & gene expression
[Figure: expression data with genes grouped by GO biological process]
D.R. Williams et al. (2005) Physiol Genomics 24(1): 13–22 (Fig. 2)
Why GO Changes
• Advances in biology
• New groups join, requiring new terms or different relationships between terms
• Update legacy terms
• Improve logical consistency
How GO Changes:
Overview of the GO Editorial Procedure
• Requests come from curators, annotators and anyone else
• Requests enter the tracking system and the editorial process:
  ‣ checklist of steps; discussion
  ‣ document in tracker
  ‣ implement using OBO-Edit and CVS
• Changes made public
Sources of Ideas
• Curators – implement decisions from meetings; ongoing maintenance; specific areas of interest
• Annotators – using GO to describe gene products
• Interest groups – involving experts in their areas of expertise
• Computation – parsing, reasoning
GO Interest groups
• Cover specific topics within GO
• Include GO curators and outside experts
• Group members develop portions of the ontology, then report to the GO Consortium
• Sample topics:
  ‣ metabolism
  ‣ developmental biology
  ‣ plant biology
  ‣ transcription
Ontology Rules (examples)
• Univocity
  ‣ A term (or relationship) should mean the same thing every time it’s used
• Positivity
  ‣ Complements of classes are not themselves classes; i.e., terms such as ‘non-mammal’ or ‘nonmembrane’ do not designate genuine classes
• Objectivity
  ‣ Which classes exist does not depend on our biological knowledge; terms such as 'unknown', 'unclassified' or 'unlocalized' are thus unsuitable
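The positivity and objectivity rules lend themselves to simple automated checks on term names. The patterns below are hypothetical heuristics for illustration, not GO's actual quality-control rules; a real check would need an exception list (for example, for words that merely begin with "non", such as "nonsense").

```python
import re

# Hypothetical heuristics for illustration, not GO's actual QC rules.
NEGATED = re.compile(r"\bnon-?\w", re.IGNORECASE)      # 'non-mammal', 'nonmembrane'
EPISTEMIC = {"unknown", "unclassified", "unlocalized"}  # knowledge-state words


def rule_warnings(name):
    """Return warnings for a candidate term name (empty list if none)."""
    warnings = []
    if NEGATED.search(name):
        warnings.append("positivity: name looks like a complement")
    if EPISTEMIC & set(re.findall(r"[a-z]+", name.lower())):
        warnings.append("objectivity: name reflects the state of knowledge")
    return warnings


if __name__ == "__main__":
    for name in ["nonmembrane", "non-mammal", "unlocalized protein", "mitochondrion"]:
        print(name, rule_warnings(name))
```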
Univocity:
GO uses one term and many characterized synonyms
• Synonyms: Tactition, Taction, Tactile sense
• Term: perception of touch (GO:0050975)
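One way to picture "one term, many synonyms" is a label index in which every name and synonym resolves to a single term ID, and a label claimed by two different terms is rejected. The record layout and function name are assumptions for this sketch; the term and its synonyms come from the slide.

```python
# Term record from the slide; the dict layout is an assumption.
TERMS = [
    {"id": "GO:0050975", "name": "perception of touch",
     "synonyms": ["tactition", "taction", "tactile sense"]},
]


def build_label_index(terms):
    """Map every name and synonym (case-insensitively) to one term ID.

    Raises ValueError if a label would point to two different terms,
    i.e. if it is not univocal.
    """
    index = {}
    for term in terms:
        for label in [term["name"], *term["synonyms"]]:
            key = label.casefold()
            if index.get(key, term["id"]) != term["id"]:
                raise ValueError(f"label {label!r} is not univocal")
            index[key] = term["id"]
    return index


if __name__ == "__main__":
    index = build_label_index(TERMS)
    print(index["Tactile sense".casefold()])  # GO:0050975
```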
The Challenge of Univocity:
People use the same words to describe different things
[Figure: three different biological processes, each labelled "bud initiation"]
Bud initiation? How is a computer to know?
Univocity:
GO adds “sensu” descriptors to discriminate among organisms
• bud initiation sensu Metazoa
• bud initiation sensu Saccharomyces
• bud initiation sensu Viridiplantae
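The sensu qualifiers let software choose among same-named terms using the organism's taxonomy. In this sketch the lineages are deliberately truncated (an assumption for illustration), and the example organisms are chosen here, not taken from the slide.

```python
# Deliberately truncated lineages (an assumption for illustration).
LINEAGE = {
    "Saccharomyces cerevisiae": ["Eukaryota", "Fungi", "Saccharomyces"],
    "Arabidopsis thaliana": ["Eukaryota", "Viridiplantae"],
    "Mus musculus": ["Eukaryota", "Metazoa"],
}

# The three sensu variants from the slide, keyed by their taxon.
BUD_INITIATION = {
    "Metazoa": "bud initiation sensu Metazoa",
    "Saccharomyces": "bud initiation sensu Saccharomyces",
    "Viridiplantae": "bud initiation sensu Viridiplantae",
}


def resolve(organism, variants=BUD_INITIATION):
    """Pick the one variant whose taxon lies in the organism's lineage."""
    matches = [variants[t] for t in LINEAGE.get(organism, []) if t in variants]
    if len(matches) != 1:
        raise LookupError(f"no unique match for {organism!r}")
    return matches[0]


if __name__ == "__main__":
    for organism in LINEAGE:
        print(organism, "->", resolve(organism))
```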
The Challenge of Positivity
[Figure: centriole]
• Some organelles are membrane-bound.
• A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.
The Challenge of Positivity:
Sometimes absence is a distinction in a biologist’s mind
• centriole: non-membrane-bound organelle (GO:0043228)
• nucleus: membrane-bound organelle (GO:0043227)
Note the logical difference between “non-membrane-bound organelle” (GO:0043228) and “not a membrane-bound organelle”: the latter includes everything that is not a membrane-bound organelle!
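The difference can be made concrete with Python sets over a toy universe (the members and their classification are assumptions for illustration): the non-membrane-bound organelles form a subset of the organelles, while the complement of "membrane-bound organelle" also picks up cells, functions and processes.

```python
# Toy universe and classification (assumptions for illustration).
UNIVERSE = {"nucleus", "mitochondrion", "centriole", "ribosome",
            "T cell", "DNA binding", "mitosis"}
ORGANELLE = {"nucleus", "mitochondrion", "centriole", "ribosome"}
# A cell is bounded by a membrane too, but it is not an organelle.
MEMBRANE_BOUND = {"nucleus", "mitochondrion", "T cell"}

membrane_bound_organelle = ORGANELLE & MEMBRANE_BOUND
# A genuine class: organelles that lack a membrane.
non_membrane_bound_organelle = ORGANELLE - MEMBRANE_BOUND
# A mere complement: everything that is not a membrane-bound organelle.
not_a_membrane_bound_organelle = UNIVERSE - membrane_bound_organelle

if __name__ == "__main__":
    print(sorted(non_membrane_bound_organelle))
    print(sorted(not_a_membrane_bound_organelle))
```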
Ontology alignment
One of the current goals of GO is to align cell types in GO (left) with cell types in the Cell Ontology (right):
• keratinocyte differentiation ↔ keratinocyte
• fat cell differentiation (synonym: adipocyte differentiation) ↔ fat cell (synonym: adipocyte)
• dendritic cell activation ↔ dendritic cell
• lymphocyte proliferation ↔ lymphocyte
• T cell homeostasis ↔ T cell
• garland cell differentiation ↔ garland cell
• heterocyst differentiation ↔ heterocyst
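A naive first pass at such an alignment can match the start of each GO term name against Cell Ontology names and synonyms. This string heuristic is an assumption for illustration only and is not presented as the method GO actually used; real alignment needs curator review.

```python
# Cell Ontology names and synonyms from the slide, mapped to the
# preferred CL name (CL IDs omitted).
CL_LABELS = {
    "keratinocyte": "keratinocyte",
    "fat cell": "fat cell",
    "adipocyte": "fat cell",          # synonym
    "dendritic cell": "dendritic cell",
    "lymphocyte": "lymphocyte",
    "T cell": "T cell",
    "garland cell": "garland cell",
    "heterocyst": "heterocyst",
}


def align(go_name, cl_labels=CL_LABELS):
    """Return the CL cell type whose label is the longest whole-word prefix
    of the GO term name, or None. A naive heuristic for illustration."""
    candidates = [label for label in cl_labels if go_name.startswith(label + " ")]
    if not candidates:
        return None
    return cl_labels[max(candidates, key=len)]


if __name__ == "__main__":
    for name in ["fat cell differentiation", "adipocyte differentiation",
                 "T cell homeostasis", "heterocyst differentiation"]:
        print(name, "->", align(name))
```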
GO + CL:
Alignment will permit the generation of consistent and complete definitions
GO term + Cell Ontology cell type = new definition
Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
New definition:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
GO + CL:
Consistent and complete definitions
id: GO:0001649
name: osteoblast differentiation
synonym: osteoblast cell differentiation
genus: differentiation GO:0030154 (differentiation)
differentia: acquires_features_of CL:0000062 (osteoblast)
definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone.
Formal definitions with necessary and sufficient conditions, in both human-readable and computer-readable forms
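A genus-differentia definition can drive both the computer-readable and the human-readable form. The sketch below is hypothetical: the class name, the OBO 1.2-style `intersection_of` rendering and the text template (modelled on the slide's definition) are assumptions. `satisfied_by` expresses "necessary and sufficient": an instance qualifies exactly when it has both the genus and the differentia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentia:
    """Hypothetical genus-differentia definition:
    instances of `genus` that stand in `relation` to `target`."""
    genus: str      # e.g. the differentiation process GO:0030154
    relation: str   # e.g. "acquires_features_of"
    target: str     # e.g. the Cell Ontology term CL:0000062

    def obo_lines(self):
        """Computer-readable form, rendered as OBO 1.2-style intersection_of tags."""
        return [f"intersection_of: {self.genus}",
                f"intersection_of: {self.relation} {self.target}"]

    def text(self, target_phrase):
        """Human-readable form from a template modelled on the slide."""
        return ("Processes whereby a relatively unspecialized cell acquires "
                f"the specialized features of {target_phrase}.")

    def satisfied_by(self, types, links):
        """Necessary and sufficient conditions: True exactly when the
        instance has the genus among its types AND the required link."""
        return self.genus in types and (self.relation, self.target) in links


if __name__ == "__main__":
    osteo = GenusDifferentia("GO:0030154", "acquires_features_of", "CL:0000062")
    print("\n".join(osteo.obo_lines()))
    print(osteo.text("an osteoblast"))
```

Keeping the formal parts as data means the text definition cannot drift away from the logical one.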
Other Ontologies that can be aligned with GO
• Chemical ontologies
  ‣ 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
• Anatomy ontologies
  ‣ metanephros development
• GO itself
  ‣ mitochondrial inner membrane peptidase activity
Acknowledgements
• The GO Editorial Office
  ‣ Jane Lomax
  ‣ Amelia Ireland
  ‣ Jennifer Clark
• GO Consortium members
• Special thanks to David Hill (MGI) for slides
• Funding:
The Gene Ontology Consortium is supported by a P41 grant from the National Human Genome
Research Institute (NHGRI), and has been supported by grants from the European Union RTD
Programme "Quality of Life and Management of Living Resources." The Gene Ontology project
also thanks AstraZeneca for financial support.
The GO Consortium
www.geneontology.org