GO-based tools for functional modeling
TAMU GO Workshop
17 May 2010

Functional Modeling
1. Grouping by function
   - GO Slim sets
   - GO browser tools
   - GOSlimViewer
2. Expression analysis
   - DAVID
   - EasyGO/agriGO
   - Onto-Express
   - Funcassociate 2.0
3. Pathway & network analysis
Workshop Part 2 contains functional modeling tutorials that use some of these tools.

1. Grouping by function

GO Slim Sets
- slim sets are abbreviated versions of the GO
- they contain broader functional terms
- they are made by different GO Consortium groups for different purposes (e.g. plant, yeast)
- you need to cite which one you used!
More information about the GO terms in each slim set can be found at EBI QuickGO: http://www.ebi.ac.uk/QuickGO/

GO Slim and Subset Guide
http://www.geneontology.org/GO.slims.shtml

QuickGO: Create your own subset/slim of GO terms
http://www.ebi.ac.uk/QuickGO/
A GO slims tutorial is available. It describes GO slims, what they are used for, and how to use QuickGO for:
- creating a custom GO slim
- using a pre-defined GO slim
- obtaining GO annotations to a GO slim
- customising a set of slimmed annotations
- using statistics calculated by QuickGO to generate graphical representations of the data

AmiGO: GO Slimmer
http://amigo.geneontology.org/cgi-bin/amigo/slimmer?session_id=4878amigo1273279396
Note: the AmiGO browser does not include IEA annotations.

GOSlimViewer input file
Input is a text file containing 3 tab-separated columns:
1. accession
2. GO:ID
3. aspect (P, F or C)
- the file is provided by GORetriever
- you can manually add to it from the GOanna Excel file, which allows you to include your additional GO annotations in the analysis

GOSlimViewer output

2. Expression analysis
Determining which classes of gene products are over-represented or under-represented.
http://www.geneontology.org/
However…
- many of these tools do not support agricultural species
- the tools have different computing requirements
A list of the tools that can be used for agricultural species is available on the workshop website at the “Summary of Tools for gene expression analysis” link.

Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?

Some useful expression analysis tools:
- Database for Annotation, Visualization and Integrated Discovery (DAVID)
  http://david.abcc.ncifcrf.gov/
- agriGO: GO Analysis Toolkit and Database for Agricultural Community
  http://bioinfo.cau.edu.cn/agriGO/
  - used to be EasyGO
  - chicken, cow, pig, mouse, cereals, dicots
  - includes Plant Ontology (PO) analysis
- Onto-Express
  http://vortex.cs.wayne.edu/projects.htm#Onto-Express
  - can provide your own gene association file
- Funcassociate 2.0: The Gene Set Functionator
  http://llama.med.harvard.edu/funcassociate/
  - can provide your own gene association file

http://david.abcc.ncifcrf.gov/
- functional grouping, including GO, pathways, gene-disease association
- ID conversion
- search for functionally related genes
- regular updates
- online support & publications

http://bioinformatics.cau.edu.cn/easygo/
May 2010: EasyGO replaced by agriGO

http://bioinfo.cau.edu.cn/agriGO/
- enrichment analysis using either GO or Plant Ontology (PO)
- 40 species: chicken, cow, pig, mouse, cereals, poplar, fruits
- GenBank, EMBL, UniProt
- Affymetrix, Operon, Agilent arrays

Onto-Express
http://vortex.cs.wayne.edu/projects.htm
Onto-Express analysis instructions are available in onto-express.ppt

Species represented in Onto-Express
- can upload your own annotations using OE2GO

http://llama.med.harvard.edu/funcassociate/

3. Pathway & network analysis

GO, Pathway, Network Analysis
Many GO analysis tools also include pathway & network analysis:
- Ingenuity Pathways Analysis (IPA) and Pathway Studio: commercial software
- DAVID: includes multiple functional categories
- Onto-Tools: includes the Pathway-Express tool

Pathways & Networks
- A network is a collection of interactions
- Pathways are a subset of networks: networks of interacting proteins that carry out biological functions such as metabolism and signal transduction
- All pathways are networks of interactions
- Not all networks are pathways

Pathways Resources
- KEGG: http://www.genome.jp/kegg/pathway.html/
- BioCyc: http://www.biocyc.org/
- Reactome: http://www.reactome.org/
- GenMAPP: http://www.genmapp.org/
- BioCarta: http://www.biocarta.com/
- Pathguide, the pathway resource list: http://www.pathguide.org/

Biological Networks
- Networks are often represented as graphs
- Nodes represent proteins or genes that code for proteins
- Edges represent the functional links between nodes (e.g. regulation)
- Small changes in a graph’s topology/architecture can result in the emergence of novel properties

Types of interactions
- protein (enzyme) – metabolite (ligand): metabolic pathways
- protein – protein: cell signaling pathways, protein complexes
- protein – gene: genetic networks

Network example: STRING Database
http://string.embl.de/
Sod1, Mus musculus

Database: URL/FTP
Source: PLoS Computational Biology, March 2007, Volume 3, e42
- DIP: http://dip.doe-mbi.ucla.edu
- BIND: http://bind.ca
- MPact/MIPS: http://mips.gsf.de/services/ppi
- STRING: http://string.embl.de
- MINT: http://mint.bio.uniroma2.it/mint
- IntAct: http://www.ebi.ac.uk/intact
- BioGRID: http://www.thebiogrid.org
- HPRD: http://www.hprd.org
- ProtCom: http://www.ces.clemson.edu/compbio/ProtCom
- 3did, Interprets: http://gatealoy.pcb.ub.es/3did/
- Pibase, Modbase: http://alto.compbio.ucsf.edu/pibase
- CBM: ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
- SCOPPI: http://www.scoppi.org/
- iPfam: http://www.sanger.ac.uk/Software/Pfam/iPfam
- InterDom: http://interdom.lit.org.sg
- DIMA: http://mips.gsf.de/genre/proj/dima/index.html
- Prolinks: http://prolinks.doe-mbi.ucla.edu/cgi-bin/functionator/pronav/
- Predictome: http://predictome.bu.edu/

Retrieval of interaction datasets
- Evaluate PPI resources such as Predictome or Prolinks for existence of species of interest
- If unavailable, find orthologous proteins in related species that have interactions!

4. Hypothesis testing using GO

http://www.genetools.microarray.ntnu.no/common/intro.php
- eGOn v2.0 can test statistical hypotheses of association between gene reporter lists in the Master-Target, Mutually Exclusive Target-Target, and Intersecting Target-Target situations
- statistical hypothesis testing using the GO
- allows addition of extra GO annotation

http://gdm.fmrp.usp.br/cgi-bin/gc/upload/upload.pl
- visualization for mapping SAGE data onto GO
- graphical visualization of the percentage of SAGE tags in each GO category, along with confidence intervals and hypothesis testing

http://www.agbase.msstate.edu/cgi-bin/tools/GOModeler.pl
- takes a user-generated list of hypothesis/GO term statements and tests the quantitative effect of gene expression values on these statements

Some comments on analysis tools:
- > 68 GO-based analysis tools are listed on the GO Consortium website (not a comprehensive list!)
- several tools combine GO, pathway and network functional analysis
- there are many different ways of visualizing the results
- the species supported by analysis tools are expanding; check with tool developers
- check for last updates & user support information

HOMEWORK!
For those in the afternoon session who have sequence files (e.g. RNA-Seq data, EST data, etc.): please prepare a sample (approx. 100 sequences) and send it through GOanna so that you can get your results emailed back for this afternoon.
- please try to use a species-specific database to improve run times
- see me if you have questions
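
Checking a GOSlimViewer-style input file (sketch)
The GOSlimViewer input described earlier is a text file with 3 tab-separated columns (accession, GO:ID, aspect P/F/C). If you add annotations by hand, a quick check of the file before uploading can catch formatting slips. The Python below is only a minimal sketch of such a check, not part of GOSlimViewer or GORetriever: the function names and the example accessions are made up for illustration, and the only rules assumed are the three columns listed on the input file slide.

```python
import csv
import io
from collections import Counter

# Aspect codes named on the input file slide: P, F or C.
VALID_ASPECTS = {"P", "F", "C"}


def read_annotations(handle):
    """Yield (accession, go_id, aspect) tuples from a 3-column tab-separated file.

    Hypothetical helper: raises ValueError on the first malformed line.
    """
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        accession, go_id, aspect = (cell.strip() for cell in row)
        if not go_id.startswith("GO:"):
            raise ValueError(f"line {line_no}: GO ID should start with 'GO:', got {go_id!r}")
        if aspect not in VALID_ASPECTS:
            raise ValueError(f"line {line_no}: aspect should be P, F or C, got {aspect!r}")
        yield accession, go_id, aspect


def count_by_aspect(annotations):
    """Count annotations per aspect (P, F, C)."""
    return Counter(aspect for _, _, aspect in annotations)


# Illustrative rows only; the accessions are placeholders, not workshop data.
EXAMPLE = (
    "ACC0001\tGO:0006950\tP\n"
    "ACC0001\tGO:0005515\tF\n"
    "ACC0002\tGO:0005634\tC\n"
)

counts = count_by_aspect(read_annotations(io.StringIO(EXAMPLE)))
print(dict(counts))
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `read_annotations(open("my_annotations.txt", newline=""))`, where the file name is a placeholder.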