Annotation for Gene Expression Analysis with Reactome.db Package

Utah State University, Spring 2012
STAT 6570: Statistical Bioinformatics
Cody Tramp


References

- Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package.
- www.reactome.org


Reactome.db Overview

- "Open source, open access, manually curated, and peer-reviewed pathway database" (www.reactome.org)
- Reactome.db is an R interface that allows queries to the SQL database containing pathway information
- Contains functions for converting between annotation IDs and names for GO, Entrez, and Reactome


Getting Help on Specific Reactome.db Functions

    #Load the Reactome.db package
    library(reactome.db)

    #Check for main manual pages
    ?reactome.db   #This won't get the actual manual

    #List all reactome.db objects
    ls("package:reactome.db")
    # [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"
    # [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
    # [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
    #[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"

    #Look up specific manual for an object
    ?reactome_dbInfo   #Still not very useful: poor documentation


How IDs and names are stored in Reactome.db

- The reactome.db package links to a SQL database
- Functions are interfaces to the database
- SQL databases are relational databases (think of Excel spreadsheets, but better)
- Data is stored as key:value pairs

    Key     Value
    15869   Homo sapiens: Metabolism of nucleotides
    68616   Homo sapiens: Assembly of the ORC complex at the origin of replication
    68689   Homo sapiens: CDC6 association with the ORC:origin complex
    68827   Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
    68867   Homo sapiens: Assembly of the pre-replicative complex


Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between Entrez and Reactome

- reactomeEXTID2PATHID = Entrez ID to Reactome.db ID
- reactomePATHID2EXTID = Reactome.db ID to Entrez ID

    > xx <- toTable(reactomeEXTID2PATHID)
    > head(xx)
      reactome_id gene_id
    1      168253   10898
    2      168254   10898
    3      168253    8106
    4      168254    8106
    5      168253    5610
    6      168254    5610

Use toTable() instead of the as.list() that is shown in the manuals.


Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between GO IDs and Reactome IDs

- reactomeREACTOMEID2GO = Reactome.db ID to GO IDs
- reactomeGO2REACTOMEID = GO ID to Reactome.db ID

    > xx <- toTable(reactomeGO2REACTOMEID)
    > head(xx)
      reactome_id      go_id
    1      168276 GO:0019054
    2      168276 GO:0019048
    3      168276 GO:0044068
    4      168276 GO:0022415
    5      168276 GO:0051701
    6      168276 GO:0044003


Reactome.db Function Uses (NOTE: all return a key:value list)

Retrieving Pathway Names from Reactome IDs

- reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID
- reactomePATHID2NAME = Reactome.db ID to Reactome.db Name

    > xx <- toTable(reactomePATHID2NAME)
    > head(xx)
      reactome_id path_name
    1       15869 Homo sapiens: Metabolism of nucleotides
    2       68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
    3       68689 Homo sapiens: CDC6 association with the ORC:origin complex
    4       68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
    5       68867 Homo sapiens: Assembly of the pre-replicative complex
    6       68874 Homo sapiens: M/G1 Transition


Reactome.db Function Uses (NOTE: all return a key:value list)

- reactomeMAPCOUNTS = shows the number of rows in each function's relational database (not very useful except for error checking)

    > xx <- as.list(reactomeMAPCOUNTS)
    > xx
    $reactomeEXTID2PATHID
    [1] 28363

    $reactomeGO2REACTOMEID
    [1] 3217

    $reactomePATHID2EXTID
    [1] 8320

    $reactomePATHID2NAME
    [1] 13778

    $reactomePATHNAME2ID
    [1] 13876

    $reactomeREACTOMEID2GO
    [1] 47575


Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)

    # Get data.frame summarizing all reactome.db pathways including a certain string
    xx <- toTable(reactomePATHNAME2ID)
    all.pathways <- xx$path_name        # get name of each reactome.db pathway
    t <- grep('apoptosis', all.pathways)   # get indices of names that include the string
    #use agrep() for approximate term searching
    reactome.Term <- unlist(all.pathways[t])
    reactome.IDs <- unlist(xx$reactome_id[t])
    reactome.frame <- data.frame(reactome.ID=reactome.IDs, reactome.Term=reactome.Term)
    rownames(reactome.frame) <- seq_along(reactome.IDs)
    reactome.frame   # 13 terms


Ex. Pathway Term Search Function

    ##Define function to search for pathways with a given key word
    ##agrep.bool is indicator to use agrep (TRUE) or grep (FALSE)
    searchPathways2REACTOMEID <- function(term, agrep.bool) {
      xx <- toTable(reactomePATHNAME2ID)
      all.pathways <- xx$path_name   # get name of each reactome.db pathway
      #get index where term is found
      if (agrep.bool == FALSE) {
        t <- grep(term, all.pathways)
      } else {
        t <- agrep(term, all.pathways)
      }
      unlist(xx$reactome_id[t])
    }

    apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
    length(apop.IDs)          #13 pathways matched

    #approximate matches are kept separately so apop.IDs stays the exact-match set
    apop.IDs.approx <- searchPathways2REACTOMEID("apoptosis", TRUE)
    length(apop.IDs.approx)   #85 pathways matched


Getting GO Terms from single Reactome ID

    ##Get list of GO Terms from Reactome ID
    xx <- toTable(reactomeGO2REACTOMEID)
    t <- xx$reactome_id == "15869"
    GOTerms <- xx$go_id[t]

    > GOTerms
     [1] "GO:0055086" "GO:0006139" "GO:0044281"
     [4] "GO:0034641" "GO:0044238" "GO:0008152"
     [7] "GO:0006807" "GO:0044237" "GO:0008150"
    [10] "GO:0009987"

    > xx <- toTable(reactomeGO2REACTOMEID)
    > head(xx)
      reactome_id      go_id
    1      168276 GO:0019054
    2      168276 GO:0019048
    3      168276 GO:0044068
    4      168276 GO:0022415
    5      168276 GO:0051701
    6      168276 GO:0044003


Getting GO Terms from list of Reactome IDs

    ##Define function to get all GO Terms for all Reactome IDs in a list
    getGOTerms <- function(list_reactome) {
      listGO <- list()
      xx <- toTable(reactomeGO2REACTOMEID)
      for (i in seq_along(list_reactome)) {
        t <- xx$reactome_id == list_reactome[i]   # rows for this Reactome ID
        temp_list <- xx$go_id[t]
        listGO <- c(listGO, temp_list)
      }
      unlist(listGO)
    }

    GOTerms.all <- getGOTerms(apop.IDs)   #the 13 exact-match IDs from the search function example
    length(GOTerms.all)                   #136 GO Terms from 13 apop.IDs

Should have yielded 169 terms (Notes 4.1
slide 10), so reactome.db might not be complete.


Reactome.org Online Tools


Pathway Viewer on reactome.org

http://www.reactome.org/userguide/Usersguide.html#Introduction

[Figure: Pathway Viewer, Details Panel]

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142


Reactome Pathway Symbols

[Figure: pathway diagram symbols, labeled "Upregulation and participating proteins" and "Inhibition"]

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142


Reactome Database Assignment Method

- Genes seem to be assigned to pathways in a similar manner to the GO database
- If a gene is up-regulated, it is included
- Genes that are down-regulated in a condition are NOT mapped to the condition/pathway
- No official response has been received from reactome.org yet, but from general browsing this seems to be the case


Pathway Analysis Tool

http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage


Expression Set Data Analysis

[Figure: Expression Set Data Analysis]


Summary

- Reactome.db provides an interface to the SQL database containing IDs
- Functions for converting between ID types
- No functionality for gene testing through R
- Online tools include pathway maps and ID lookup tables
- Some limited expression testing (with unknown statistical methods)


Questions?
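

Supplementary sketch: vectorized GO term lookup

As a follow-up to the GO term lookups in this deck, the per-ID loop could also be written as a single vectorized filter. This is only a sketch under stated assumptions: getGOTermsVectorized is a hypothetical helper (not a reactome.db function), and go.map is a small hand-built stand-in for toTable(reactomeGO2REACTOMEID), using a few rows copied from the outputs shown earlier so that the code runs without the package installed. Unlike the loop version, it drops duplicate GO IDs, so counts may differ from those reported above.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID). Assumption: the real table
# has the same two columns (reactome_id, go_id); only a few rows are copied here.
go.map <- data.frame(
  reactome_id = c("168276", "168276", "15869", "15869", "15869"),
  go_id       = c("GO:0019054", "GO:0019048",
                  "GO:0055086", "GO:0006139", "GO:0044281"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of reactome.db): return the unique GO IDs
# mapped to any of the given Reactome IDs, using %in% instead of a loop.
getGOTermsVectorized <- function(reactome.ids, map) {
  keep <- map$reactome_id %in% as.character(reactome.ids)
  unique(map$go_id[keep])
}

single <- getGOTermsVectorized("15869", go.map)
both   <- getGOTermsVectorized(c("15869", "168276"), go.map)
print(single)
print(length(both))
```

With the package loaded, the equivalent call would presumably be getGOTermsVectorized(apop.IDs, toTable(reactomeGO2REACTOMEID)), assuming the column names match those shown in the head(xx) output.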