Using Gene Ontology (GO) to Characterise Key Players in Parkinson’s Disease

Rebecca E. Foulger (1), Paul Denny (1), Claire O’Donovan (3), John Hardy (2) and Ruth C. Lovering (1)

1. Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, Rayne Building, 5 University Street, London, WC1E 6JF
2. Department of Molecular Neuroscience, Institute of Neurology, University College London, Queen Square, London, WC1N 3BG
3. European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD

Introduction to GO

• The Gene Ontology (GO) project is a collaborative effort to provide consistent descriptions of gene products across all kingdoms of life, and is a key resource for researchers wishing to understand the biological role of a gene product.
• GO contains three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular locations and molecular functions, in a species-independent manner.
• Originally developed in 1998, the ontologies have grown to include nearly 40,000 terms describing a wide range of concepts to differing levels of specificity.

Anatomy of a GO annotation

• GO annotation is the practice of capturing information about a gene product using terms from the Gene Ontology. GO terms are assigned to proteins based on different types of evidence, for example:
    IDA = inferred from direct assay
    IMP = inferred from mutant phenotype
    TAS = traceable author statement
    IC = inferred by curator
  Each annotation is attached to a reference for traceability.

Figure 1: Placement of ‘negative regulation of neuron apoptotic process’ (GO:0043524) in the Gene Ontology.
Figure 1 legend: A symbol in the figure denotes GO terms assigned by this project. Blue arrows represent is_a relationships between GO terms. Purple arrows represent regulates and negatively_regulates relationships between GO terms. Each GO term has a unique ID, name and definition. A GO term may also contain one or more synonyms to aid searching. Image taken from OBO-Edit, version 2.3.1.

Figure 2: Anatomy of an annotation: a subset of biological process GO annotations for human PARK7 (PARKIN-7, DJ-1). Displayed in the EBI GO browser (www.ebi.ac.uk/QuickGO).

Gene Ontology and Parkinson’s Disease

• The discovery of genes linked to familial forms of Parkinson’s Disease, including SNCA (α-synuclein), PARK2 (parkin), LRRK2 (PARK8), PARK7 (DJ-1) and PINK1 (PARK6), has yielded important insights into the pathogenesis of Parkinson’s Disease. Further elucidating the roles of these genes will help identify the cellular mechanisms and machinery underlying disease risk, onset and progression.
• Started in January 2014, the Parkinson’s UK GO annotation project is a collaboration between University College London (UCL) and the European Bioinformatics Institute (EMBL-EBI), and is funded by Parkinson’s UK. Our aim is to extend GO annotation into neurological areas and provide high-quality GO annotations to the products of genes relevant to Parkinson’s Disease.
• Previous annotation projects have demonstrated the effectiveness of topic-based GO curation (Alam-Faruque et al., 2011, 2014; Khodiyar et al., 2013). This is the first annotation effort to focus on a neurological disease, and working at UCL has enabled us to establish collaborations with local neurological researchers to guide and verify our annotations.
• Our work focuses on elucidating the ‘normal’ function of a Parkinson’s-associated gene product, providing an additional challenge for a disease-related project.
• Housing these annotations in the GO database allows researchers to find out more about their gene of interest, search for common processes within a gene list, or perform more complex queries on their data set. Our annotation efforts will therefore improve the analysis of high-throughput datasets, which rely on large numbers of high-quality annotations for correct interpretation.

Project aims, priorities and progress

• We have used multiple approaches to select our annotation priorities, including:
    • Parkinson’s risk genes: compiled from PDGene (database for Parkinson’s Disease genetic association studies, www.pdgene.org) and reviews on Parkinson’s Disease.
    • Interactors of risk genes: In collaboration with the IntAct Parkinson’s project funded by the MJ Fox Foundation, we are initially prioritising interactors of three proteins: LRRK2 (PARK8), α-SYNUCLEIN (SNCA/PARK1) and TAU (MAPT).
    • Processes that are often disrupted in cases of Parkinson’s: In consultation with UCL researchers, we have identified a set of processes that are of great interest in Parkinson’s research:
        • Mitophagy
        • Mitochondrial fusion and fission
        • Ubiquitination & protein degradation
        • Vesicular transport
        • Regulation of neuron death
        • Lysosomal pathways
        • Autophagy
        • Synaptic transmission
        • Unfolded protein response
        • Oxidative stress response
        • Dopamine transport
        • Wnt signaling
• We extract data from primary papers and reviews to attach GO terms to Parkinson’s-relevant proteins. Our primary focus is human, but we also capture information from model organisms including fly, rat and mouse.
• Our curation feeds back into development of the Gene Ontology itself as we expand and improve areas of the ontology relevant to Parkinson’s Disease, such as vesicle trafficking, regulation of neuron death and mitophagy. In addition to revision of existing terms, the project has so far led to the creation of over 100 new GO terms, including ‘L-dopa decarboxylase activity’ (GO:0036468), ‘synaptic vesicle recycling’ (GO:0036465) and ‘negative regulation of oxidative stress-induced neuron death’ (GO:1903204).
• We have so far created 1113 annotations to 274 distinct proteins (including 171 human proteins) from 94 papers (statistics correct as of September 2nd 2014). We aim to curate 520 Parkinson’s-relevant proteins by the end of 2015 and 800 in total by the end of 2016.
• To follow our progress, please ask to be added to our quarterly newsletter, or visit our project at www.ucl.ac.uk/functional-gene-annotation/neurological.

How YOU can help

• We are keen to hear from you about the genes and processes YOU think we should be annotating. Please speak to us or email rebecca.foulger@ucl.ac.uk or p.denny@ucl.ac.uk.
• Search the GO annotations associated with your favourite Parkinson’s gene and let us know if you think any annotations are missing.
• Send us your Parkinson’s-relevant papers to be annotated.

References and further reading

• Ten quick tips for using the Gene Ontology. Blake J.A. PLoS Comput Biol. 9(11):e1003343 (2013). PMID 24244145
• The Gene Ontology: enhancements for 2011. The Gene Ontology Consortium. Nucleic Acids Res. 40, D559-564 (2012). PMID 22102568
• The IntAct molecular interaction database in 2012. Kerrien et al. Nucleic Acids Res. 40, D841-846 (2012). PMID 22121220
• Representing kidney development using the Gene Ontology. Alam-Faruque et al. PLoS One. 9(6):e99864 (2014). PMID 24941002
• From zebrafish heart jogging genes to mouse and human orthologs: Using Gene Ontology to investigate mammalian heart development. Khodiyar et al. F1000Res, Nov 13 (2013). PMID 24627794
• The impact of focused Gene Ontology curation of specific mammalian systems. Alam-Faruque et al. PLoS One. 6(12):e27541 (2011). PMID 22174742

www.ucl.ac.uk/functional-gene-annotation/neurological
www.geneontology.org

The Parkinson’s UK annotation project is funded by Parkinson’s UK, grant G-1307.
Project members are part of the GO Consortium.
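
Illustrative sketch: the poster describes a GO annotation as a gene product linked to a GO term, backed by an evidence code and a reference, and notes that such annotations let researchers search for common processes within a gene list. The Python below is a minimal sketch of that idea and not the project's or the GO Consortium's actual data model. The class `GOAnnotation`, the function `shared_terms`, the gene names `GENE_A` and `GENE_B`, and the `REF:placeholder-*` references are all hypothetical; only the evidence codes and the GO IDs and term names are taken from the poster, and the pairings of genes with terms are invented for illustration.

```python
from dataclasses import dataclass

# Evidence codes exactly as listed on the poster.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IMP": "inferred from mutant phenotype",
    "TAS": "traceable author statement",
    "IC": "inferred by curator",
}


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical record: one gene product linked to one GO term."""
    gene_product: str
    go_id: str
    go_name: str
    evidence: str
    reference: str

    def __post_init__(self):
        # Reject evidence codes that are not in the poster's list.
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")
        # Each annotation must carry a reference for traceability.
        if not self.reference:
            raise ValueError("every annotation needs a reference")


def shared_terms(annotations, min_genes=2):
    """Map each GO ID to the sorted gene products annotated to it,
    keeping only terms shared by at least `min_genes` gene products."""
    genes_per_term = {}
    for ann in annotations:
        genes_per_term.setdefault(ann.go_id, set()).add(ann.gene_product)
    return {
        go_id: sorted(genes)
        for go_id, genes in genes_per_term.items()
        if len(genes) >= min_genes
    }


# Invented example records: gene names and references are placeholders.
EXAMPLE = [
    GOAnnotation("GENE_A", "GO:0043524",
                 "negative regulation of neuron apoptotic process",
                 "IDA", "REF:placeholder-1"),
    GOAnnotation("GENE_B", "GO:0043524",
                 "negative regulation of neuron apoptotic process",
                 "IMP", "REF:placeholder-2"),
    GOAnnotation("GENE_B", "GO:0036465",
                 "synaptic vesicle recycling",
                 "TAS", "REF:placeholder-3"),
]

if __name__ == "__main__":
    for go_id, genes in shared_terms(EXAMPLE).items():
        print(go_id, genes)
```

With the placeholder records above, `shared_terms(EXAMPLE)` keeps only GO:0043524, the one term annotated to both hypothetical genes. Real analyses of gene lists would normally also account for the ontology's is_a and regulates relationships, which this sketch ignores.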