Gene Ontology - A Way Forwards

Ruth Lovering, Varsha Khodiyar, Pete Scambler, Mike Hubank, Rolf Apweiler and Philippa Talmud

Centre for Cardiovascular Genetics, UCL Department of Medicine, Rayne Institute, 5 University Street, London WC1E 6JF.
Molecular Medicine Unit, Institute of Child Health, 30 Guilford Street, London WC1N 1EH.
Molecular Hematology and Cancer Biology Unit, Institute of Child Health, 30 Guilford Street, London WC1N 1EH.
European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD.

[Figure: Inhibitory action of lipoxins on pro-inflammatory TNF-alpha signalling (label: TNF alpha)]

The UCL-based GO annotation team aims to work with bench scientists to improve the annotation of human proteins. Improving the GO annotation of your favourite protein will improve a public resource for everyone.

Proteomes and differentially regulated mRNAs can be analysed with GO data to provide an overview of the predominant activities in which the constituent proteins are involved, or of where they are normally located [1]. Furthermore, generating hypotheses to explain proteome-wide alterations in response to certain diseases, such as cardiac hypertrophy [2], or stress states, such as hypoxia [3], often relies on GO annotation data. The ability to review experimental results against known functional information has also proved useful when investigators need to select a subset of proteins to analyse in greater depth in order to identify new sets of disease biomarkers [4,5]. GO data are also an indispensable resource for assessing the success of subcellular enrichment strategies or large-scale confocal microscopy analyses [6,7]. Drug treatments are already being tailored to molecular pathway imbalances detected through individual-specific microarray or proteomic data.
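The use of GO data to find the predominant activities within a protein list, described above, is often carried out as a term over-representation test. The sketch below is illustrative only and is not the method used in the cited studies: it assumes a one-sided hypergeometric test, and the function name, parameter names and toy counts are all hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric tail probability P(X >= hits).

    population: number of proteins in the background set
    annotated:  background proteins carrying the GO term of interest
    sample:     proteins in the experimental list
    hits:       proteins in the list carrying the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    if not (0 <= hits <= min(annotated, sample)):
        raise ValueError("hits cannot exceed annotated or sample")

    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probability of seeing `hits` or more annotated proteins by chance.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (assumed, not taken from any study): 20000 background proteins,
    # 200 carry the term, and 10 of a 100-protein list carry it.
    p = enrichment_p_value(population=20000, annotated=200, sample=100, hits=10)
    print(f"P(X >= 10) = {p:.3g}")
```

A small p-value suggests the term is over-represented in the list. In practice the test is repeated for every GO term, so a multiple-testing correction would normally follow.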
For more information about contributing to the annotation of the human genome, contact GOannotation@UCL.ac.uk

Grant: SP/07/007/23671
www.cardiovasculargeneontology.com

Gene Ontology provides a systematic language for the description of gene product attributes in three key domains: Cellular Component, Molecular Function and Biological Process.

Annotation
GO terms are associated with gene products (proteins).

Distribution of Data
GO annotations are available through major biological databases and numerous GO tools for high-throughput analysis.

Large number of uses
• Biomarker discovery
• Enhancing annotation of any genome
• Validation of cell separation methodologies
• Identification of disease-associated processes
• Quick access to information about individual proteins
• Validation of automated ways of deriving gene information
• Drug therapies based on process variations between individuals
• Identification of predominant activities within a specific group of proteins
• Identification of common pathways targeted by different pathogens, proteins, etc.

[Figure labels: PTPN11, IKBKG, MAP3K14, CHUK, SFN and YWHA family, FOXO1, CDKN1B, CCNE1 (Cyclin E1), NFKB1, IL-6]
Figure source: MetaCore Map, GeneGO, www.genego.com

KEY
Activation; Inhibition; Unspecified
Cytoplasm; Extracellular; Plasma Membrane; Nucleus
B Binding; CR Class relation; CS Complex subunit; IE Influence on expression; +P Phosphorylation; TR Transcription regulation; Z Catalysis
Associated with Cardiovascular Disease
Kinase Phosphatase Phospholipase Protein Transfactor Molecule Phospholipid Ligand Binding protein Receptor GPCP Protein Family

Spot the Difference

Completing the annotation of every gene product using Gene Ontology (GO) is a substantial undertaking, especially for highly investigated genes. Consequently, there is at present wide variation in the quality and quantity of annotations associated with different proteins.
QuickGO (www.ebi.ac.uk/ego) views of the GO terms associated with TNF-alpha, IL-6 and CCNE1 (above) and the histogram (to the right) illustrate the variation in the number of unique GO terms associated with human proteins. This variation is not simply a reflection of current knowledge about these proteins. Thousands of publications describe TNF-alpha and IL-6, and yet there are over twice as many GO terms associated with TNF-alpha (68) as with IL-6 (28). This difference is due to the time constraints facing GO curators. At present only two projects (funding four curators) prioritise the comprehensive annotation of human genes. IL-1B, IL-6, PTPN11 and TNF-alpha have been prioritised for annotation by the Cardiovascular GO Annotation Initiative; however, of these, only TNF-alpha has been annotated by this project to date.

The quality of annotations also varies between proteins. Proteins annotated mostly through automated methods tend to have more general GO terms (see CCNE1), whereas proteins with annotations made by GO curators, based on published experimental evidence, tend to have more specific GO terms (see TNF-alpha).

[Histogram: Number of publications and GO terms associated with lipoxins/TNF-alpha signalling pathway proteins. Axes: Gene Symbol; Number of Unique GO Terms; Log Number of Publications. Series: Unique GO terms; Publications.]
Gene symbols shown: NFKBIE, ERLIN1, PPAPDC2, YWHAQ, YWHAZ, NFKBIB, PIK3R2, TNFRSF1, YWHAE, AKT2, MAP3K14, AKT3, CCNE1, YWHAB, YWHAG, FPRL1, PIK3CA, TRADD, FOXO1, YWHAH, PIK3CD, TRAF2, CDKN1B, CHUK, IKBKB, NFKB1, PDPK1, RIPK1, SOCS1, IKBKG, JAK1, PLD1, SFN, TNFRSF1, STAT3, PIK3CB, NFKBIA, CDK2, IL8, PRKCZ, PIK3R1, RELA, IL6, IL1B, PTPN11, AKT1, TNF.

High-throughput technologies and research into multi-factorial diseases are also highlighting how proteins that are highly investigated in one field of biology are relevant to processes associated with another. For example, in the central figure, several genes (IL-6, IL-8, STAT3 and TNF-alpha) are associated with the TNF-alpha pro-inflammatory signalling pathway and are also associated with cardiovascular disease.

[Panel headings: Molecular Function; Biological Process. Figure label: NFKB1A]

References
1. Pasini, E.M., Kirkegaard, M., Mortensen, P., et al. In-depth analysis of the membrane and cytosolic proteome of red blood cells, Blood, 2006, 108: 791-801.
2. Pan, Y., Kislinger, T., Gramolini, A.O., et al. Identification of biochemical adaptations in hyper- or hypocontractile hearts from phospholamban mutant mice by expression proteomics, Proc Natl Acad Sci U S A, 2004, 101: 2241-2246.
3. Boraldi, F., Annovi, G., Carraro, F., et al. Hypoxia influences the cellular cross-talk of human dermal fibroblasts. A proteomic approach, Biochim Biophys Acta, 2007, 1774: 1402-1413.
4. Shi, M., Jin, J., Wang, Y., et al. Mortalin: a protein associated with progression of Parkinson disease?, J Neuropathol Exp Neurol, 2008, 67: 117-124.
5. Perco, P., Wilflingseder, J., Bernthaler, A., et al. Biomarker candidates for cardiovascular disease and bone metabolism disorders in chronic kidney disease: A systems biology perspective, J Cell Mol Med, 2008.
6. Kislinger, T., Rahman, K., Radulovic, D., et al. PRISM, a generic large scale proteomic investigation strategy for mammals, Mol Cell Proteomics, 2003, 2: 96-106.
7. Barbe, L., Lundberg, E., Oksvold, P., et al. Toward a confocal subcellular atlas of the human proteome, Mol Cell Proteomics, 2008, 7: 499-508.

How is GO used?

Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. This resource is proving highly useful for researchers investigating complex phenotypes such as cardiovascular disease, as well as for those interpreting results from high-throughput methodologies.
By providing current functional knowledge in a format that can be exploited by high-throughput technologies, the Gene Ontology Consortium (GOC) provides a key, freely available public annotation resource that can help bridge the gap between data collation and data analysis (www.geneontology.org).
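The comparison of unique GO terms per protein made in the Spot the Difference section can be reproduced from an annotation file. The sketch below is a minimal illustration, not the procedure used for the poster. It assumes tab-separated rows in GAF 2.x column order (symbol in column 3, GO ID in column 5, evidence code in column 7) and treats the evidence code IEA as marking automated annotation. The function name and the toy rows (GENE_A, GENE_B, placeholder GO IDs and references) are hypothetical and are not real annotation records.

```python
from collections import defaultdict

# Toy rows in GAF 2.x column order; placeholders only, not real annotations.
TOY_GAF = "\n".join([
    "!gaf-version: 2.2",
    "\t".join(["UniProtKB", "ID_A", "GENE_A", "", "GO:0000001", "PMID:0", "IDA"]),
    "\t".join(["UniProtKB", "ID_A", "GENE_A", "", "GO:0000002", "PMID:0", "IMP"]),
    # Same term from a second reference: must not be counted twice.
    "\t".join(["UniProtKB", "ID_A", "GENE_A", "", "GO:0000001", "PMID:1", "IDA"]),
    "\t".join(["UniProtKB", "ID_B", "GENE_B", "", "GO:0000003", "GO_REF:0", "IEA"]),
])


def summarise_gaf(lines):
    """Return {symbol: (unique GO terms, unique terms with an IEA annotation)}."""
    terms = defaultdict(set)
    automated = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        symbol, go_id, evidence = cols[2], cols[4], cols[6]
        terms[symbol].add(go_id)
        if evidence == "IEA":  # Inferred from Electronic Annotation
            automated[symbol].add(go_id)
    return {s: (len(t), len(automated[s])) for s, t in terms.items()}


if __name__ == "__main__":
    for symbol, (n_terms, n_iea) in sorted(summarise_gaf(TOY_GAF.splitlines()).items()):
        print(f"{symbol}: {n_terms} unique GO terms, {n_iea} with an IEA annotation")
```

Counting terms in a set rather than counting rows matters here: one GO term is often supported by several publications, and the histogram compares unique terms, not annotation rows.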