Representing common use cases in the Functional Genomics Experiment Object Model (FuGE-OM)

Andrew Jones
Department of Computer Science, University of Manchester

Introduction

The Functional Genomics Experiment Object Model (FuGE-OM) is a model of the laboratory techniques, biological samples, and data structures required to develop a standard format for high-throughput ‘omics techniques. It is planned that FuGE-OM will provide the basis from which the next version of the MAGE standard (www.mged.org/mage) for microarray data will be produced. FuGE-OM will also be used for describing biological samples, basic laboratory protocols and an overview of the hypothesis within the GPS format (General Proteomics Standard) being developed by the Proteomics Standards Initiative (PSI, http://psidev.sourceforge.net/). Using FuGE as the core of both the microarray and proteomics standard formats should facilitate comparison between microarray and proteomics experiments. There will also be advantages where the same biological samples have been analysed with both techniques, because the sample processing details need only be recorded once. We believe that model developers in other related domains would also benefit from adopting FuGE as the basis for creating standards.

This document describes how certain biological use cases could be reported in FuGE. The source of several of the use cases is the Reporting Structures for Biological Information project (RSBI, http://www.mged.org/Workgroups/rsbi/rsbi.html), which is a consortium of experimentalists from the environmental genomics, nutrigenomics and toxicogenomics fields. The aim of RSBI is to ensure that the emerging data standards for different domains represent information in a way that is more intuitive for bench biologists and that sufficient information is captured to allow new biological knowledge to be derived from data.
We also report use cases from laboratories in which several different functional genomics techniques are performed on the same yeast samples (to complete).

Representing Use Cases in FuGE

This section describes how several different use cases can be expressed in FuGE in conjunction with suitable ontologies. At present the MGED Ontology (MO, http://mged.sourceforge.net/ontologies/index.php) could supply many relevant terms for describing the variables in the investigation and the biological samples. Efforts are underway to combine MO with ontologies from other domains to create the Functional Genomics Ontology (FuGO), which will ultimately supply the terms to be used in FuGE. The complete specification for the current draft FuGE model can be found at the website (fuge.sourceforge.net/dev/) along with descriptions of how FuGE could be extended to other domains. There is also a document (web address to come) that explains the basic components of FuGE, with which the reader should be familiar before reading the use case descriptions. Note that throughout the following text, classes from FuGE are rendered in Courier font.

Toxicogenomics

Toxicogenomics is the study of the effects of toxic compounds on global gene and protein expression profiles. Toxicogenomics use cases have been provided by RSBI (the complete use cases can be downloaded from fuge.sourceforge.net/rsbi/). The first study aims to detect the toxicity caused by a particular drug at five different doses in rats and mice. The animals are randomised into groups, divided by species, sex and drug dose. The animals are killed at the end of the study period, organs are extracted and necropsy is performed. Tissue is extracted from the organs and undergoes transcript and proteome profiling to detect gene and protein expression. Similar studies are performed over periods of two weeks, 13 weeks and two years.
Figure 1 represents how the overall investigation could be summarised in the FuGE Investigation package. The Investigation class captures the hypothesis and has an association to OntologyTerm for the investigationType. Suitable ontology terms for this investigation would be “perturbational_design” and “dose_response_design”. There are three instances of ExperimentalDesign, one for each type of assay performed (proteomics, microarrays and histopathology). ExperimentalDesign also has three associations to Description for adding text, if appropriate, about quality control measures, the number of replicates and the normalization procedure carried out for each assay. There are three instances of ExperimentFactor (the variables) for sex, species and dose, which are represented as OntologyTerms. Each of the ExperimentalDesigns has associations to each of the ExperimentFactors because each type of assay is used to measure each variable. The values for the variables (FactorValue) are stored using OntologyTerms as shown. Each FactorValue references the relevant Data that corresponds to this variable.

Figure 1. A summary of the toxicogenomics use case represented in the FuGE Investigation package. All classes can be linked to Description to allow a textual description, auditing information, a URI, additional parameters and security information to be attached.

Figure 2. The flow of samples (Materials) from the toxicogenomics use case, as represented in FuGE.

The diagram in Figure 2 shows a summary of a workflow for the investigation represented in the FuGE Material and Protocol package. The individual organisms are represented as Materials (blue ovals), which can be linked to information about the species and sex. If required, grouping Materials can be defined for the entire population of male rats, female rats, male mice and female mice (far left of the diagram).
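
As a rough illustration of the Material side of Figure 2, the following Python sketch models individual animals as Materials and collects them under one grouping Material per population. The class, its fields (species, sex, sub_components), the helper group_population and the animal names are assumptions made for this example and are not the FuGE-OM definitions; in FuGE the species and sex would be supplied as ontology terms rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Material:
    """Hypothetical stand-in for a FuGE Material (not the real FuGE-OM class)."""
    name: str
    species: Optional[str] = None  # plain strings here; ontology terms in FuGE
    sex: Optional[str] = None
    sub_components: List["Material"] = field(default_factory=list)


def group_population(individuals, species, sex):
    """Build a grouping Material holding every individual of one species and sex."""
    members = [m for m in individuals if m.species == species and m.sex == sex]
    return Material(name=f"{sex}_{species}", species=species, sex=sex,
                    sub_components=members)


# Four made-up animals; a real study would have many more per group.
animals = [
    Material("rat_01", species="rat", sex="male"),
    Material("rat_02", species="rat", sex="female"),
    Material("mouse_01", species="mouse", sex="male"),
    Material("mouse_02", species="mouse", sex="female"),
]

# One grouping Material per population, as on the far left of Figure 2.
populations = [
    group_population(animals, species, sex)
    for species in ("rat", "mouse")
    for sex in ("male", "female")
]
```

Each grouping Material only holds references to its individuals, so details recorded on an individual remain in one place.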
Each of the grouping structures is related to its individuals using the self-association (called “sub-components”) on Material in FuGE. Protocols for the different dosing and caging strategies are defined (green rounded rectangles). For every individual, represented as a Material, a MaterialTreatment is created that stores the date on which the Protocol was performed and the operator who performed it. MaterialTreatment is a type of ProtocolApplication. As such, each MaterialTreatment references the correct Protocol for the dose that was applied to that individual. In the next stage, a Protocol for sacrificing the animals can be defined, with one MaterialTreatment per animal to record by whom and when the sacrifice was done. A further Protocol would be defined for the extraction of organs. The MaterialTreatments that reference the “organ extraction” Protocol produce several output Materials, corresponding to the different organs extracted. Protocols are defined for the assays that are performed (microarray, histopathology and proteomics) and one MaterialTreatment is created for every defined organ, for each of the three assay types. These MaterialTreatments produce the corresponding Data objects.

Nutrigenomics

Nutrigenomics is the study of the effects of dietary intake on health at a molecular level, for example in terms of the changes in gene and protein expression in response to particular dietary factors. A nutrigenomics use case has been produced by RSBI, the exact specification of which can be found at fuge.sourceforge.net/rsbi. The general goal of the study is to investigate the relationships between diet (in terms of fat intake), genetic variation and responses in gene expression, metabolite and hormonal composition. Two groups of subjects are selected (750 obese subjects and 115 lean subjects) and they are scrutinized for lifestyle and dietary information. The subjects are then placed on high fat or low fat hypo-calorific diets.
At specified points in the diet, biometric measurements are taken and gene expression profiling is performed. The genotypes of subjects are determined in relation to certain candidate genes and related back to the measurements taken and the gene expression profiles.

Figure 3 displays how a summary of this use case can be captured in the Investigation package. There are two ExperimentFactors, one for the lean or obese phenotype and one for the high or low fat diet. Two ExperimentalDesigns could be defined, one for the biometric measures and one for the microarray assay. If additional tests were performed, these could be captured as additional ExperimentalDesigns.

Figure 3. A summary of the nutrigenomics study in the FuGE Investigation package.

Figure 4. A simple representation of the flow of materials and data from the nutrigenomics use case.

Representing protocols

In the nutrigenomics use case, the flow of Materials could be represented with simple protocols as shown in Figure 4. However, there is a requirement for more complex protocol structures to represent certain phases of the investigation, such as “pre-intervention” and “post-intervention” as described in the use case document. Such a protocol could be represented as shown in Figure 5. Protocols could be defined for the “pre-intervention” and “post-intervention” stages. Each Protocol would comprise a series of Action steps. Some Action steps would contain text that describes the conditions of the diet. Others would reference the Protocol for the biometric measures or microarray analysis. The Action steps that reference another Protocol would be tagged with terms from an ontology (ActionTerm) that give the time points at which the biometric measures or microarray analysis are performed with respect to the parent Protocol. There would be one Protocol for the low fat diet and one Protocol for the high fat diet.
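
The nested protocol structure described above could be sketched roughly as follows. This is a minimal Python illustration rather than the FuGE-OM itself: the field names (text, child_protocol, action_term), the helper functions, the protocol names and the “day 0” time point are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    """Hypothetical stand-in for a FuGE Protocol: a named series of Action steps."""
    name: str
    actions: List["Action"] = field(default_factory=list)


@dataclass
class Action:
    """One step of a Protocol: free text, or a reference to another Protocol."""
    text: Optional[str] = None
    child_protocol: Optional[Protocol] = None
    action_term: Optional[str] = None  # time point relative to the parent Protocol


# Protocols for the assays, defined once and referenced from the diet Protocols.
biometrics = Protocol("biometric measures")
microarray = Protocol("microarray analysis")


def diet_protocol(name: str, conditions: str) -> Protocol:
    """Build a diet Protocol whose steps describe the diet or reference an assay."""
    return Protocol(name, actions=[
        Action(text=conditions),
        # The time point below is invented for illustration only.
        Action(child_protocol=biometrics, action_term="day 0"),
        Action(child_protocol=microarray, action_term="day 0"),
    ])


low_fat = diet_protocol("low fat diet", "low fat hypo-calorific diet")
high_fat = diet_protocol("high fat diet", "high fat hypo-calorific diet")


def referenced_protocols(protocol: Protocol) -> List[str]:
    """Names of the Protocols referenced by the Action steps of `protocol`."""
    return [a.child_protocol.name for a in protocol.actions if a.child_protocol]
```

In this sketch both diet Protocols point at the same assay Protocol objects, so an assay is described once and referenced wherever it is performed.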
There would be a single MaterialTreatment for every individual that must be described (750 obese subjects and 115 lean subjects, totalling 865 MaterialTreatments). Each of the 865 MaterialTreatments will reference the Protocol for either the high fat diet or the low fat diet, and each MaterialTreatment will mirror the structure of the corresponding Protocol by having ActionApplication steps that reference the corresponding Actions. The inputs and outputs of each ProtocolApplication are modelled as instances of Material and instances of Data (after the biometric profile or microarray analysis).

Figure 5. The representation of complex nutrigenomics protocols in FuGE.

Environmental genomics

To come

Multiple ‘omics techniques

To come

Discussion

To come