Representing common use cases in the Functional
Genomics Experiment Object Model (FuGE-OM)
Andrew Jones
Department of Computer Science, University of Manchester
Introduction
The Functional Genomics Experiment Object Model (FuGE-OM) is a model of the
laboratory techniques, biological samples, and data structures required to develop a
standard format for high-throughput ‘omics techniques. It is planned that FuGE-OM will
provide the basis from which the next version of the MAGE standard
(www.mged.org/mage) for microarray data will be produced. FuGE-OM will also be used
to describe biological samples, basic laboratory protocols and an overview of the
hypothesis within the General Proteomics Standard (GPS) format being
developed by the Proteomics Standards Initiative (PSI, http://psidev.sourceforge.net/).
By utilising FuGE as the core for both microarray and proteome standard formats, the
comparison between microarray and proteomics experiments should be facilitated.
There will also be advantages where the same biological samples have been analysed
with both techniques, because the sample processing details need only be created
once. We believe that model developers in other related domains would also benefit
from adopting FuGE as the basis for their standards. This
document describes how certain biological use cases could be reported in FuGE. The
source of several of the use cases is the Reporting Structures for Biological Information
project (RSBI, http://www.mged.org/Workgroups/rsbi/rsbi.html) which is a consortium of
experimentalists from the environmental genomics, nutrigenomics and toxicogenomics
fields. The aim of RSBI is to ensure that the emerging data standards for different
domains represent information in a way that is more intuitive for bench biologists and
that sufficient information is captured to allow new biological knowledge to be derived
from data. We also report use cases from laboratories in which several different
functional genomics techniques are performed on the same yeast samples (to
complete).
Representing Use Cases in FuGE
This section describes how several different use cases can be
expressed in FuGE in conjunction with suitable ontologies. At present the MGED
Ontology (MO, http://mged.sourceforge.net/ontologies/index.php) could supply many
relevant terms for describing the variables in the investigation and the descriptions of
biological samples. Efforts are underway to combine MO with ontologies from other
domains to create the Functional Genomics Ontology (FuGO) that ultimately will supply
terms to be used in FuGE. The complete specification for the current draft FuGE model
can be found at the website (fuge.sourceforge.net/dev/) along with descriptions of how
FuGE could be extended to other domains. There is also a document (web address to
come) that explains the basic components of FuGE, with which the reader should be
familiar before reading the use case description. Note that throughout the following text,
classes from FuGE are rendered in Courier font.
Toxicogenomics
Toxicogenomics is the study of the effects of toxic compounds on the global gene and
protein expression profiles. Toxicogenomics use cases have been provided by RSBI (the
complete use cases can be downloaded from fuge.sourceforge.net/rsbi/). The first study
aims to detect the toxicity caused by a particular drug at five different doses on rats and
mice. The animals are randomised into groups, divided by species, sex and drug dose.
The animals are killed at the end of the study period, organs are extracted and necropsy
performed. Tissue is extracted from the organs and undergoes transcript and
proteome profiling to measure gene and protein expression. Similar studies are
performed over periods of two weeks, 13 weeks and two years. Figure 1 represents how
the overall investigation could be summarised in the FuGE Investigation package.
The Investigation class captures the hypothesis and has an association to
OntologyTerm for the investigationType. Suitable ontology terms for this
investigation would be “perturbational_design” and “dose_response_design”. There are
three instances of ExperimentalDesign, one for each type of assay performed
(proteomics, microarrays and histopathology). ExperimentalDesign also has three
associations to Description for adding text, if appropriate, about quality control
measures, the number of replicates and the normalization procedure carried out for each
assay. There are three instances of ExperimentFactor (the variables) for sex,
species and dose, which are represented as OntologyTerms. Each of the
ExperimentalDesigns has associations to each of the ExperimentFactors
because each type of assay is used to measure each variable. The values for the
variables (FactorValue) are stored using OntologyTerms as shown. Each
FactorValue references the relevant Data that corresponds to this variable.
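The structure described above can be sketched as plain Python objects. This is a minimal illustration rather than the FuGE-OM specification: the class names follow the text, but the attribute names (hypothesis, investigation_types, factors, values, data_refs), the helper functions and the dose labels are assumptions made for the sketch. The five doses are given placeholder labels because the text does not state their values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyTerm:
    term: str


@dataclass
class FactorValue:
    value: OntologyTerm
    # Stand-in for references to Data objects (plain strings in this sketch).
    data_refs: List[str] = field(default_factory=list)


@dataclass
class ExperimentFactor:
    category: OntologyTerm
    values: List[FactorValue] = field(default_factory=list)


@dataclass
class ExperimentalDesign:
    assay: str
    factors: List[ExperimentFactor] = field(default_factory=list)


@dataclass
class Investigation:
    hypothesis: str
    investigation_types: List[OntologyTerm]
    designs: List[ExperimentalDesign] = field(default_factory=list)


def make_factor(category, values):
    """Hypothetical helper: build one ExperimentFactor with its FactorValues."""
    return ExperimentFactor(
        OntologyTerm(category),
        [FactorValue(OntologyTerm(v)) for v in values],
    )


def build_toxicogenomics_investigation():
    factors = [
        make_factor("sex", ["male", "female"]),
        make_factor("species", ["rat", "mouse"]),
        # Placeholder labels: the use case has five doses, values not given here.
        make_factor("dose", [f"dose_{i}" for i in range(1, 6)]),
    ]
    # One ExperimentalDesign per assay type; each design shares all three factors.
    designs = [
        ExperimentalDesign(assay, factors)
        for assay in ("proteomics", "microarrays", "histopathology")
    ]
    return Investigation(
        hypothesis="Detect the toxicity caused by a drug at five doses in rats and mice",
        investigation_types=[
            OntologyTerm("perturbational_design"),
            OntologyTerm("dose_response_design"),
        ],
        designs=designs,
    )


investigation = build_toxicogenomics_investigation()
for design in investigation.designs:
    print(design.assay, [f.category.term for f in design.factors])
```

Sharing the same ExperimentFactor objects between the three designs reflects the statement that each type of assay is used to measure each variable.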
Figure 1. A summary of the toxicogenomics use case represented in the FuGE
Investigation package. All classes can be linked to Description to allow a textual
description, auditing information, a URI, additional parameters and security information
to be attached.
Figure 2. The flow of samples (Materials) from the toxicogenomics use case, as
represented in FuGE.
The diagram in Figure 2 shows a summary of a workflow for the investigation
represented in the FuGE Material and Protocol package. The individual organisms
are represented as Materials (blue ovals), which can be linked to information about
the species and sex. If required, grouping Materials can be defined for the entire
population of male rats, female rats, male mice and female mice (far left of the diagram).
Each of the grouping structures is related to the individuals, using the self-association
(called “sub-components”) on Material in FuGE.
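The grouping of individuals through the self-association could be sketched as follows. The attribute names (name, species, sex, sub_components), the grouping helper and the animal identifiers are assumptions for illustration only; FuGE itself links a Material to species and sex information through the model rather than through plain string fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Material:
    name: str
    species: Optional[str] = None
    sex: Optional[str] = None
    # Self-association: a grouping Material lists its members here.
    sub_components: List["Material"] = field(default_factory=list)


def group_by_species_and_sex(individuals):
    """Hypothetical helper: create one grouping Material per (sex, species) pair."""
    groups = {}
    for animal in individuals:
        key = (animal.sex, animal.species)
        if key not in groups:
            groups[key] = Material(
                name=f"{animal.sex} {animal.species} population",
                species=animal.species,
                sex=animal.sex,
            )
        groups[key].sub_components.append(animal)
    return groups


# Illustrative identifiers only; the use case does not name individual animals.
animals = [
    Material("rat_1", species="rat", sex="male"),
    Material("rat_2", species="rat", sex="male"),
    Material("rat_3", species="rat", sex="female"),
    Material("mouse_1", species="mouse", sex="male"),
    Material("mouse_2", species="mouse", sex="female"),
]
groups = group_by_species_and_sex(animals)
for group in groups.values():
    print(group.name, [m.name for m in group.sub_components])
```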
Protocols for the different dosing and caging strategies are defined (green rounded
rectangles). For every individual, represented as a Material, a MaterialTreatment
is created that stores the date on which the Protocol was performed and the operator who performed it.
MaterialTreatment is a type of ProtocolApplication. As such, each
MaterialTreatment references the correct Protocol for the dose that was applied
to that individual. In the next stage, a Protocol for sacrificing the animals can be
defined, with one MaterialTreatment per animal to define by whom and when the
sacrifice was done. A further Protocol would be defined for the extraction of organs.
The MaterialTreatments that reference the “organ extraction” Protocol produce
several output Materials, corresponding to the different organs extracted. Protocols
are defined for the assays that are performed (microarray, histopathology and
proteomics) and one MaterialTreatment is created for every defined organ, for each
of the three assay types. These MaterialTreatments produce the corresponding
Data objects.
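The chain of treatments for a single animal can be sketched as below. This is a simplified illustration under stated assumptions: the attribute names, the process_animal helper, the organ names, the operator label and the date are all hypothetical, and every step is recorded with the same operator and date purely to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Material:
    name: str


@dataclass
class Data:
    name: str


@dataclass
class Protocol:
    name: str


@dataclass
class MaterialTreatment:
    """A type of ProtocolApplication: who applied which Protocol, and when."""
    protocol: Protocol
    operator: str
    performed_on: date
    inputs: List[Material] = field(default_factory=list)
    output_materials: List[Material] = field(default_factory=list)
    output_data: List[Data] = field(default_factory=list)


SACRIFICE = Protocol("sacrifice")
ORGAN_EXTRACTION = Protocol("organ extraction")
ASSAYS = [Protocol("microarray"), Protocol("histopathology"), Protocol("proteomics")]


def process_animal(animal, dose_protocol, organ_names, operator, day):
    """Hypothetical helper: build the MaterialTreatments for one individual."""
    treatments = [
        MaterialTreatment(dose_protocol, operator, day, inputs=[animal]),
        MaterialTreatment(SACRIFICE, operator, day, inputs=[animal]),
    ]
    # Organ extraction produces one output Material per organ.
    organs = [Material(f"{animal.name} {organ}") for organ in organ_names]
    treatments.append(
        MaterialTreatment(ORGAN_EXTRACTION, operator, day,
                          inputs=[animal], output_materials=organs)
    )
    # One MaterialTreatment per organ for each of the three assay types.
    for organ in organs:
        for assay in ASSAYS:
            result = Data(f"{assay.name} data for {organ.name}")
            treatments.append(
                MaterialTreatment(assay, operator, day,
                                  inputs=[organ], output_data=[result])
            )
    return treatments


# Organ names, operator and date are placeholders, not taken from the use case.
treatments = process_animal(
    Material("rat_1"), Protocol("dose level 1"),
    ["liver", "kidney"], "operator_A", date(2005, 1, 1),
)
for t in treatments:
    print(t.protocol.name, [m.name for m in t.inputs])
```

With two organs the sketch yields three treatments for dosing, sacrifice and extraction, plus six assay treatments, which shows how quickly the number of MaterialTreatments grows with the number of animals and organs.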
Nutrigenomics
Nutrigenomics is the study of the effects of dietary intake on health at a molecular level,
for example in terms of the changes in gene and protein expression in response to
particular dietary factors. A nutrigenomics use case has been produced by RSBI, the
exact specification of which can be found at fuge.sourceforge.net/rsbi. The
general goal of the study is to investigate the relationships between diet (in terms of fat
intake), genetic variation and responses in gene expression, metabolite and hormonal
composition. Two groups of subjects are selected (750 obese subjects and 115 lean
subjects) and they are scrutinized for lifestyle and dietary information. The subjects are
then placed on high fat or low fat hypo-calorific diets. At specified points in the diet
biometric measurements are taken and gene expression profiling is performed. The
genotypes of subjects are determined in relation to certain candidate genes and related
back to the measurements taken, and the gene expression profiles. Figure 3 displays
how a summary of this use case can be captured in the Investigation package.
There are two types of ExperimentFactor, for the lean or obese phenotype and for
the high or low fat diet. Two ExperimentalDesigns could be defined, one for the
biometric measures and one for the microarray assay. If additional tests were performed,
these could be captured as additional ExperimentalDesigns.
Figure 3. A summary of the nutrigenomics study in the FuGE Investigation
package.
Figure 4. A simple representation of the flow of materials and data from the nutrigenomics
use case.
Representing protocols
In the nutrigenomics use case, the flow of Materials could be represented with simple
protocols as shown in Figure 4. However, there is a requirement for more complex
protocol structures to represent certain phases of the investigation, such as “pre-intervention” and “post-intervention” as described in the use case document. Such a
protocol could be represented as shown in Figure 5. Protocols could be defined for
the “pre-intervention” and “post-intervention” stages. Each Protocol would comprise a
series of Action steps. There would be a set of Action steps containing text that
describes the conditions of the diet. Certain Action steps would reference the
Protocol for the biometric measures or microarray analysis. The Action steps that
reference another Protocol would be tagged with terms from an ontology
(ActionTerm) that give measurements for the time points at which the biometric
measures or microarray analysis are performed with respect to the parent
Protocol. There would be one Protocol for the low fat diet and one Protocol for
the high fat diet. There would be a single MaterialTreatment for every individual that
must be described (750 obese subjects and 115 lean subjects, totalling 865
MaterialTreatments). Each of the 865 MaterialTreatments will reference either
the Protocol for the high fat diet or the low fat diet, and a MaterialTreatment will
mirror the structure of the corresponding Protocol by having ActionApplication
steps that reference the corresponding Action. The inputs and outputs of each
ProtocolApplication are modelled as instances of Material and instances of
Data (after the biometric profile or microarray analysis).
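The relationship between Protocol, Action, MaterialTreatment and ActionApplication described above could be sketched as follows. The attribute names, the two helper functions, the subject identifiers and the time-point labels are assumptions for illustration. The alternating assignment of subjects to diets is also arbitrary, since the text does not say how subjects are allocated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    name: str
    actions: List["Action"] = field(default_factory=list)


@dataclass
class Action:
    text: str = ""
    # Set when this step references another Protocol.
    child_protocol: Optional[Protocol] = None
    # Stand-in for an ontology term giving the time point within the parent.
    action_term: Optional[str] = None


@dataclass
class ActionApplication:
    action: Action


@dataclass
class MaterialTreatment:
    subject: str
    protocol: Protocol
    action_applications: List[ActionApplication] = field(default_factory=list)


BIOMETRIC = Protocol("biometric measures")
MICROARRAY = Protocol("microarray analysis")


def make_diet_protocol(name):
    """Hypothetical helper: a diet Protocol with one text step and two referenced steps."""
    return Protocol(name, [
        Action(text=f"Conditions of the {name}"),
        Action(child_protocol=BIOMETRIC, action_term="time_point_placeholder_1"),
        Action(child_protocol=MICROARRAY, action_term="time_point_placeholder_2"),
    ])


def apply_protocol(subject, protocol):
    """Mirror the Protocol: one ActionApplication per Action, in the same order."""
    return MaterialTreatment(
        subject, protocol, [ActionApplication(a) for a in protocol.actions]
    )


high_fat = make_diet_protocol("high fat diet")
low_fat = make_diet_protocol("low fat diet")

# 750 obese and 115 lean subjects, as in the use case; identifiers are invented.
subjects = [f"obese_{i}" for i in range(1, 751)] + [f"lean_{i}" for i in range(1, 116)]

# Arbitrary alternating allocation, used only to give every subject a diet.
treatments = [
    apply_protocol(s, high_fat if i % 2 == 0 else low_fat)
    for i, s in enumerate(subjects)
]
print(len(treatments))
```

Each MaterialTreatment holds references to the Actions of its Protocol rather than copies, so the description of a diet is stored once and shared by all subjects placed on it.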
Figure 5. The representation of complex nutrigenomics protocols in FuGE.
Environmental genomics
To come
Multiple ‘omics techniques
To come
Discussion
To come