Annotation for Gene Expression Analysis with Reactome.db Package
Utah State University – Spring 2012
STAT 6570: Statistical Bioinformatics
Cody Tramp
References

Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package.
www.reactome.org
Reactome.db Overview

“Open source, open access, manually curated, and peer-reviewed pathway database” (www.reactome.org)
Reactome.db is an R interface for querying the SQL (SQLite) database that contains the pathway information.
It contains functions for converting between annotation IDs and names for GO, Entrez, and Reactome.
Getting Help on Specific Reactome.db Functions
#Load the reactome.db package
library(reactome.db)
#Check for main manual pages
?reactome.db #This won't bring up the actual manual
#List all reactome.db objects
ls("package:reactome.db")
# [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"
# [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
# [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
#[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"
#Look up the manual page for a specific object
?reactome_dbInfo #Still not very useful; the documentation is poor
How IDs and names are stored in Reactome.db

The reactome.db package links to an SQL database.
Its functions are interfaces to that database.
SQL databases are relational databases (think of Excel spreadsheets, but better).
Data is stored as key:value pairs, for example:

Key     Value
15869   Homo sapiens: Metabolism of nucleotides
68616   Homo sapiens: Assembly of the ORC complex at the origin of replication
68689   Homo sapiens: CDC6 association with the ORC:origin complex
68827   Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
68867   Homo sapiens: Assembly of the pre-replicative complex
68874   Homo sapiens: M/G1 Transition
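In base R, a key:value store behaves much like a named vector. The sketch below is a toy stand-in, not the database itself: it hard-codes the six reactome_id/path_name pairs printed by head(toTable(reactomePATHID2NAME)) and shows a forward lookup (key to value) and a reverse lookup (value to key), the two directions that reactomePATHID2NAME and reactomePATHNAME2ID cover. The object names (path.names, path.ids, and so on) are hypothetical.

```r
# Toy key:value store: the six reactome_id -> path_name pairs printed by
# head(toTable(reactomePATHID2NAME)); a stand-in, not the real database.
path.names <- c(
  "15869" = "Homo sapiens: Metabolism of nucleotides",
  "68616" = "Homo sapiens: Assembly of the ORC complex at the origin of replication",
  "68689" = "Homo sapiens: CDC6 association with the ORC:origin complex",
  "68827" = "Homo sapiens: CDT1 association with the CDC6:ORC:origin complex",
  "68867" = "Homo sapiens: Assembly of the pre-replicative complex",
  "68874" = "Homo sapiens: M/G1 Transition"
)

# Forward lookup (key -> value), like reactomePATHID2NAME
cdt1.name <- path.names[["68827"]]

# Several keys at once; a key that is not stored gives NA instead of an error
two.names <- path.names[c("15869", "99999")]

# Reverse lookup (value -> key), like reactomePATHNAME2ID:
# swap names and values to build the inverse map
path.ids <- setNames(names(path.names), path.names)
mg1.id <- path.ids[["Homo sapiens: M/G1 Transition"]]  # "68874"
```

A named vector is enough for a handful of pairs; the real maps hold thousands of entries and one key can map to several values, which is why the package exposes them through toTable() instead.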
Reactome.db Function Uses
(NOTE: all return a key:value list)
Converting Between Entrez and Reactome
reactomeEXTID2PATHID = Entrez ID to Reactome.db ID
reactomePATHID2EXTID = Reactome.db ID to Entrez ID
> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610
(Use toTable() instead of the as.list() shown in the manuals.)
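Because toTable() returns an ordinary data.frame, both directions of the Entrez/Reactome mapping reduce to base R subsetting. The sketch below assumes the reactome_id and gene_id columns are character vectors and uses only the six rows printed above as a toy table; my.genes is a hypothetical query set.

```r
# Toy copy of the six rows printed by head(toTable(reactomeEXTID2PATHID))
xx <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106", "5610", "5610"),
  stringsAsFactors = FALSE
)

# Hypothetical query: Entrez IDs of interest
my.genes <- c("8106", "5610")

# Entrez -> Reactome: keep rows whose gene_id is in the query set
hits <- xx[xx$gene_id %in% my.genes, ]
paths.for.genes <- unique(hits$reactome_id)  # "168253" "168254"

# Reactome -> Entrez: group gene IDs by pathway ID in one pass
genes.by.path <- split(xx$gene_id, xx$reactome_id)
genes.by.path[["168253"]]  # "10898" "8106" "5610"
```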
Reactome.db Function Uses
(NOTE: all return a key:value list)
Converting Between GO IDs and Reactome IDs
reactomeREACTOMEID2GO = Reactome.db ID to GO IDs
reactomeGO2REACTOMEID = GO ID to Reactome.db ID
> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
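The GO/Reactome table can be queried the same way, and table() gives a quick count of how many GO terms each Reactome ID carries. This is a minimal sketch on a toy data.frame holding only the six rows printed above (all for Reactome ID 168276), assuming character columns named reactome_id and go_id as in that output.

```r
# Toy copy of the six rows printed by head(toTable(reactomeGO2REACTOMEID))
xx <- data.frame(
  reactome_id = rep("168276", 6),
  go_id = c("GO:0019054", "GO:0019048", "GO:0044068",
            "GO:0022415", "GO:0051701", "GO:0044003"),
  stringsAsFactors = FALSE
)

# GO -> Reactome: which Reactome IDs carry a given GO term?
ids.for.term <- unique(xx$reactome_id[xx$go_id == "GO:0022415"])  # "168276"

# Reactome -> GO: number of GO terms per Reactome ID
go.counts <- table(xx$reactome_id)
go.counts[["168276"]]  # 6
```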
Reactome.db Function Uses
(NOTE: all return a key:value list)
Retrieving Pathway Names from Reactome IDs
reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID
reactomePATHID2NAME = Reactome.db ID to Reactome.db Name
> xx <- toTable(reactomePATHID2NAME)
> head(xx)
reactome_id path_name
1 15869 Homo sapiens: Metabolism of nucleotides
2 68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3 68689 Homo sapiens: CDC6 association with the ORC:origin complex
4 68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5 68867 Homo sapiens: Assembly of the pre-replicative complex
6 68874 Homo sapiens: M/G1 Transition
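A common follow-up is attaching readable names to a vector of Reactome IDs, for example IDs returned by another mapping. The sketch below uses match() on a toy copy of the six rows printed above; ids is a hypothetical query vector, and the sub() call assumes every name starts with a species prefix ending in ": ", as these rows do.

```r
# Toy copy of the six rows printed by head(toTable(reactomePATHID2NAME))
paths <- data.frame(
  reactome_id = c("15869", "68616", "68689", "68827", "68867", "68874"),
  path_name = c(
    "Homo sapiens: Metabolism of nucleotides",
    "Homo sapiens: Assembly of the ORC complex at the origin of replication",
    "Homo sapiens: CDC6 association with the ORC:origin complex",
    "Homo sapiens: CDT1 association with the CDC6:ORC:origin complex",
    "Homo sapiens: Assembly of the pre-replicative complex",
    "Homo sapiens: M/G1 Transition"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical IDs to label, in the order we want them reported
ids <- c("68874", "15869")

# match() gives the row of each ID in paths (NA if the ID is absent)
named <- data.frame(
  reactome_id = ids,
  path_name = paths$path_name[match(ids, paths$reactome_id)],
  stringsAsFactors = FALSE
)

# Drop the leading "Species: " prefix; [^:]+ stops at the first colon,
# so later colons (e.g. "ORC:origin") are left alone
named$short_name <- sub("^[^:]+: ", "", named$path_name)
```

match() keeps the query order, which merge() would not guarantee.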
Reactome.db Function Uses
(NOTE: all return a key:value list)
reactomeMAPCOUNTS = shows the number of mapped keys for each map
(not very useful except for error checking)
> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomePATHID2NAME
[1] 13778
$reactomeEXTID2PATHID
[1] 28363
$reactomePATHNAME2ID
[1] 13876
$reactomeGO2REACTOMEID
[1] 3217
$reactomeREACTOMEID2GO
[1] 47575
$reactomePATHID2EXTID
[1] 8320
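In AnnotationDbi-style annotation packages, the MAPCOUNTS object is described as the number of mapped keys per map, which need not equal the number of rows toTable() returns, because one key can map to several values. The sketch below shows the difference on the six Entrez/Reactome rows printed earlier; it is a hand-rolled check, not a reactome.db function, and it assumes gene_id is the key column of reactomeEXTID2PATHID.

```r
# Toy copy of the six rows printed by head(toTable(reactomeEXTID2PATHID))
xx <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106", "5610", "5610"),
  stringsAsFactors = FALSE
)

# Rows in the table: one per key:value pair
n.rows <- nrow(xx)                     # 6

# Mapped keys: distinct Entrez IDs with at least one pathway
n.keys <- length(unique(xx$gene_id))   # 3
```

With the package loaded, the same distinct-key count on the full table could be compared against the corresponding MAPCOUNTS entry as a rough consistency check.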
Ex: Find apoptosis induction-related ID
(compare to Notes 6.1 slide 10)
# Get a data.frame summarizing all reactome.db pathways whose names include a certain string
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name # name of each reactome.db pathway
t <- grep('apoptosis', all.pathways) # indices of names that contain the term
# use agrep() for approximate term searching
reactome.Term <- all.pathways[t]
reactome.IDs <- xx$reactome_id[t]
reactome.frame <- data.frame(reactome.ID = reactome.IDs,
                             reactome.Term = reactome.Term)
rownames(reactome.frame) <- seq_len(nrow(reactome.frame))
reactome.frame # 13 terms
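grep() is case-sensitive by default, so a lowercase search term misses names that spell it with a capital letter (for example at the start of a pathway name). The sketch below shows the effect of ignore.case = TRUE on a few made-up pathway names; they are illustrative only, not taken from reactome.db, so whether the 13-term count changes on the real data is not shown here.

```r
# Made-up pathway names, for illustration only
toy.names <- c("Homo sapiens: Apoptosis",
               "Homo sapiens: Regulation of apoptosis",
               "Homo sapiens: Metabolism of nucleotides")

# Default grep() is case-sensitive: only the lowercase match is found
idx.exact <- grep("apoptosis", toy.names)                        # 2

# ignore.case = TRUE also catches "Apoptosis"
idx.any.case <- grep("apoptosis", toy.names, ignore.case = TRUE) # 1 2
```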
Ex. Pathway Term Search Function
##Define a function to search for pathways with a given keyword
##agrep.bool indicates whether to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool)
{
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name # name of each reactome.db pathway
  # get indices where the term is found
  if (agrep.bool) t <- agrep(term, all.pathways) else t <- grep(term, all.pathways)
  xx$reactome_id[t]
}
apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs) #13 pathways matched
apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs) #85 pathways matched
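The jump from 13 to 85 matches comes from agrep()'s tolerance. Its default max.distance = 0.1 is a fraction of the pattern length, rounded up, so for the 9-character pattern "apoptosis" one edit is allowed; every exact grep() match is therefore also an approximate match, plus near misses. A minimal sketch on made-up names (not from reactome.db) that checks this subset relationship:

```r
# Made-up pathway names, for illustration only
toy.names <- c("Homo sapiens: Apoptosis",
               "Homo sapiens: Regulation of apoptosis",
               "Homo sapiens: Metabolism of nucleotides")

# Exact (case-sensitive) substring search
exact <- grep("apoptosis", toy.names)

# Approximate search: one edit allowed for this pattern, so the capital
# "A" in "Apoptosis" (a single substitution) still matches
fuzzy <- agrep("apoptosis", toy.names)

# Every exact match is also an approximate match
all(exact %in% fuzzy)
```

Because approximate matches can include unrelated near misses, agrep() hits are worth inspecting by name before passing their IDs on.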
Getting GO Terms from a Single Reactome ID
##Get List of GO Terms from Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]
> GOTerms
[1] "GO:0055086" "GO:0006139" "GO:0044281"
[4] "GO:0034641" "GO:0044238" "GO:0008152"
[7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"
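The same lookup can be wrapped in a small helper that takes the table as an argument, which makes it easy to test and to reuse without rebuilding the table on every call. goTermsFor() and the toy table below are hypothetical; the toy rows reuse three of the GO IDs printed for Reactome ID 15869, and an ID with no GO annotation simply returns an empty character vector.

```r
# Hypothetical helper: GO IDs annotated to one Reactome ID in a
# toTable()-style data.frame with reactome_id and go_id columns
goTermsFor <- function(tab, id) {
  unique(tab$go_id[tab$reactome_id == id])
}

# Toy table reusing three of the GO IDs printed for Reactome ID 15869
tab <- data.frame(
  reactome_id = c("15869", "15869", "15869"),
  go_id = c("GO:0055086", "GO:0006139", "GO:0044281"),
  stringsAsFactors = FALSE
)

found <- goTermsFor(tab, "15869")  # the three GO IDs
none  <- goTermsFor(tab, "00000")  # character(0): no rows match
```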
Getting GO Terms from a List of Reactome IDs
##Define a function to get all GO terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome)
{
  listGO <- list(); xx <- toTable(reactomeGO2REACTOMEID)
  for (i in seq_along(list_reactome))
  {
    t <- xx$reactome_id == list_reactome[i]; temp_list <- xx$go_id[t]
    listGO <- c(listGO, temp_list)
  }
  unlist(listGO)
}
apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE) #the 13 grep matches
GOTerms.all <- getGOTerms(apop.IDs)
length(GOTerms.all) #136 GO terms from 13 apop.IDs
Should have yielded 169 terms (Notes 4.1 slide 10); reactome.db might not be complete.
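The loop in getGOTerms() grows a list one ID at a time. A vectorized alternative is a single %in% filter. Note that it returns terms in table-row order rather than query order and that, like getGOTerms(), it keeps duplicates when two Reactome IDs share a GO term, which matters when comparing term counts. getGOTermsVec() and the toy table (placeholder IDs r1 to r3 and go1 to go4, not real identifiers) are hypothetical.

```r
# Hypothetical vectorized version of getGOTerms(); takes the table as an argument
getGOTermsVec <- function(tab, list_reactome, unique.terms = FALSE) {
  terms <- tab$go_id[tab$reactome_id %in% list_reactome]
  if (unique.terms) unique(terms) else terms
}

# Toy table with placeholder IDs; r1 and r2 share the term go2
tab <- data.frame(
  reactome_id = c("r1", "r1", "r2", "r2", "r3"),
  go_id       = c("go1", "go2", "go2", "go3", "go4"),
  stringsAsFactors = FALSE
)

all.terms  <- getGOTermsVec(tab, c("r1", "r2"))        # go1 go2 go2 go3
uniq.terms <- getGOTermsVec(tab, c("r1", "r2"), TRUE)  # go1 go2 go3
```

On the real table, the call would pass toTable(reactomeGO2REACTOMEID) as tab, building it once instead of inside the function.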
Reactome.org Online Tools
Pathway Viewer on reactome.org
http://www.reactome.org/userguide/Usersguide.html#Introduction
Pathway Viewer on reactome.org
(Figure: Details Panel)
Pathway Viewer on reactome.org
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
Reactome Pathway Symbols
(Figure legend: symbols for upregulation and participating proteins, and for inhibition)
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
Reactome Database Assignment Method
Genes seem to be assigned to pathways in a manner similar to the GO database:
If a gene is up-regulated in a condition, it is mapped to that condition/pathway.
Genes that are down-regulated in a condition are NOT mapped to the condition/pathway.
Haven't received an official response from reactome.org, but from general browsing this seems to be the case.
Pathway Analysis Tool
http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage
Expression Set Data Analysis
Summary
Reactome.db provides an interface to the SQL database containing the IDs.
It has functions for converting between ID types.
It has no functionality for statistical testing of genes in R.
The online tools include pathway maps and ID lookup tables.
The online tools also offer some limited expression testing (with unknown statistical methods).
Questions?