Getting Started With Reactome

This tutorial provides an introduction to Reactome, the user interfaces and the database content. Exercises are provided to help you practice what you have learned; you will need to refer to the details and screenshots in this document. Further information can be found in the online Reactome user guide at http://www.reactome.org/userguide/Usersguide.html.

You will learn about:
  Pathway and reaction visualization in Reactome
  Querying Reactome via the web interface
  Overlays of interaction and expression data onto Reactome pathways
  Performing pathway enrichment analysis in Reactome
  Mapping gene expression data to Reactome pathways

Contents

Getting Started With Reactome
You will learn about:
What is Reactome?
What is this Tutorial For?
Exercises
The Reactome home page
  Exercise 1
The Pathway Browser
  The Pathways tab - pathway hierarchy
  Exercise 2
Pathway Diagrams
  Exercise 3
Navigating pathway diagrams
  Exercise 4
The Details Panel
  Exercise 5
Reactome Tools
  Pathway Analysis
  Exercise 6
Species Comparison
  Exercise 7
Expression Analysis
  Exercise 8
Molecular Interaction Overlay
  Exercise 9
Other Reactome Features
  Features and Tools in the Pathway Browser
Searching Reactome
  Simple text search
  Searching Pathway Diagrams
Details Views
  Pathway Details
  Reaction Details
  Complex or Set Details
  Protein/Small molecule Details
The BioMart interface
  Exercise 10
Answers to Exercises
  Exercise 1
  Exercise 2
  Exercise 3
  Exercise 4
  Exercise 5
  Exercise 6
  Exercise 7
  Exercise 8
  Exercise 9
  Exercise 10

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

What is Reactome?

Reactome is a curated database of pathways and reactions (pathway steps) in human biology.
The Reactome definition of a ‘reaction’ includes many events in biology that are changes in state, such as binding, activation, translocation and degradation, in addition to classical biochemical reactions. Information in the database is authored by expert biologist researchers, maintained by Reactome editorial staff, and extensively cross-referenced to other resources, e.g. NCBI, Ensembl, UniProt, UCSC Genome Browser, HapMap, KEGG (Gene and Compound), ChEBI, PubMed and GO. Inferred orthologous reactions are available for over 20 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast, rice, Arabidopsis and E. coli.

What is this Tutorial For?

This tutorial will introduce features of the Reactome website using a combination of short explanations and exercises. When completed, you will have learned enough to be able to search Reactome, understand the views, use the Tools and, if necessary, find documentation and obtain Help.

Exercises

The exercises will help you understand Reactome content. Some of the questions are not directly addressed in the tutorial but can be answered with a little deduction. If you get stuck, the answers are available at the end of this document in Answers to Exercises.

The Reactome home page

Reactome’s home page is http://www.reactome.org/. This page contains links, tools, search functions and documentation that will help you use Reactome. The home page, represented below, is divided into 3 main sections:

Navigation Bar – Across the top of the page, under the Reactome banner. Dropdown menus access all the data and functionality available on the Reactome website including:
  o Home. Link to Reactome home page.
  o About. Background information.
  o Content. Overview of the database, areas of biology we cover, plans for future coverage, database schema, statistics.
  o Documentation. Includes the User Guide and technical guides.
  o Tools.
  o Download. Links to downloadable Reactome data and code in various formats.
  o Contact Us.
    Launches your email program.
  o Outreach. Details of Reactome training, publications and representation at conferences.

Sidebar – The left Sidebar is divided into upper and lower sections. The upper section has buttons that launch:
  o Pathway Browser – the main pathway viewer
  o Pathway Analysis – map IDs to pathways and perform overrepresentation analysis
  o Species Comparison – compare curated human pathways to inferred model organism equivalents
  o Expression Analysis – overlay pathways with user-supplied expression data
The lower section includes data downloads and a ‘Your comments’ feedback tool.

Main text – Reactome description, Pathway of the month, Online tutorial and Reactome news.

Exercise 1

1. What’s the latest news item on the Reactome homepage?
2. How many human proteins are represented in Reactome?
3. What date is the next release? What is the first new topic that will be included?

The Pathway Browser

The Pathway Browser is the primary means of interacting with pathways in Reactome. It is a search tool, interactive pathway viewer and toolset for exploring specific pathways. These tools allow several types of analysis including:
  Comparison of a pathway with its equivalent in another species
  The overlay of user-supplied expression data onto a pathway
  The overlay of protein-protein or protein-compound data from external databases or user-supplied data onto a pathway

The Shortcut buttons on the Sidebar access extended versions of these tools that query across all Reactome pathways (covered later in this Tutorial, or see Reactome Tools in this document).

The Pathway Browser is launched by clicking the top button on the Sidebar. The Pathway Browser has 4 panels:

Search and Analyze bar – Includes a search tool for pathways and the proteins they contain (detailed in Searching Pathway Diagrams).
The Analyze, Annotate & Upload button opens a panel that controls the interactive tools associated with pathway diagrams (explained in detail in the section ‘The Analyze, Update and Annotate Button’ below). On the far left is a Home button that returns you to the homepage.

The Sidebar – At the top is a species selector, with a dropdown list of species. The default is Homo sapiens. N.B. Reactome data is human-centric; data for other species is inferred from human pathways, so pathway steps may be missing for other organisms if they are not identified by the inference process (described on the Homepage under Documentation, Orthology Prediction). Below the species selector the Sidebar has three tabs:
  o Search Results tab displays hits following a text search.
  o Pathways tab displays the hierarchy of Reactome pathways. Similar to the Windows File Manager, sub-pathways can be revealed by clicking on the + symbol to the left of the pathway name.
  o Help tab – help.
Click the blue arrowhead on the right edge of the sidebar to hide/reveal it.

The Pathway Diagram Panel – This is where pathway diagrams are displayed when selected in the Pathways tab or following a search. Note that this panel will be blank until you select a pathway! At the top left of this panel is a navigate/zoom tool. Click on the arrows to move across the diagram, and click on the circles to zoom in/out (the mouse wheel also zooms).

The Details Panel – Contains details of the pathway, reaction, complex, set or protein selected (see Details section). The blue arrowhead at the bottom of the Pathway Diagram Panel hides/reveals it (you may need to open it).

The Pathways tab - pathway hierarchy

Large topics such as apoptosis are too large to represent as a single pathway. Instead they are divided into sub-pathways. This can be seen on the Pathways tab as a pathway hierarchy. This view functions in a similar manner to the Windows File Manager; sub-pathways are revealed by clicking on the + symbol to the left of the pathway name, and hidden by clicking on the – symbol.
Pathway names in the pathway hierarchy are preceded by a pathway symbol. Sub-pathways frequently have their own sub-pathways; there is no limit to the number of levels. As sub-pathways are revealed, at some point a level is reached where the steps of the pathway, or ‘reactions’, are detailed. Reactions are represented by their own symbol. You may also see a further symbol representing “black-box” reactions, where complete details have been omitted as unnecessary or are not completely determined.

The order of reactions from top to bottom in the hierarchy usually follows their order in the pathway, so that preceding reactions are above the subsequent reaction, but note that this is not always the case. Reactome has a more formal way of identifying connected reactions called Preceding and Following Events, visible in the Reaction Details (see the Reaction Details section below).

Exercise 2

From the homepage, search for ‘Notch signaling’. Click on the top pathway hit. This will open it in the Pathway Browser. Ignoring the diagram for now, look at the Pathways tab on the left.
1. How many sub-pathways does this pathway have?
2. How many reactions are in the first of these sub-pathways?
3. What reaction follows Notch 2 precursor transport to Golgi? Hint: If it’s not visible, open the Details pane at the bottom of the page by clicking on the blue triangle.

Pathway Diagrams

Pathway diagrams represent the steps of a pathway as a series of interconnected pathway steps, known in Reactome as ‘reactions’. Cellular compartments are represented as pink boxes – a typical diagram has a box representing the cytosol, bounded by a double-line that represents the plasma membrane. The white background outside this represents the extracellular space. Other organelles are represented as additional labelled boxes within the cytosol. Molecules are represented in the physiologically correct cellular compartment, or lie on the boundary of a compartment to indicate they are in the corresponding membrane, e.g.
a molecule on the boundary of the cytosol is in the plasma membrane.

A reaction includes:
  Input and output molecules, and a catalyst if relevant (see A below). Inputs, outputs and catalysts are represented as boxes or ovals:
    Ovals (always green) are small molecules or sets of small molecules.
    Green boxes with rounded-off corners are individual proteins or sets of proteins, or sets that contain proteins and small molecules.
    Green boxes with square corners are proteins that have no associated UniProt accession (usually because the protein was not available in UniProt at the time the reaction was created).
    Blue boxes are complexes, i.e. proteins and/or small molecules that are bound to each other.
  Reaction input and output molecules are joined by lines to a central ‘reaction node’ (surrounded by a green box in Figure A). Clicking this selects the reaction.
  The outputs of a reaction have an arrowhead on the line connecting them to the reaction node.
  Numbered boxes on the line between an input/output and the reaction node indicate the number of molecules of this type in the reaction (when n > 1).
  Inputs/outputs are often connected by arrows to preceding or subsequent reactions (e.g. the preceding/subsequent steps in the pathway).
  Catalysts are connected to the reaction node by a line ending in a circle.
  Molecules that regulate a reaction are connected to the reaction node by a line ending in an open triangle for positive regulation or a ‘T’-shaped head for negative regulation (see B below).

[Figures A and B]

The reaction node has 5 subtypes, indicating subclasses of reaction:
  Open squares represent a ‘transition’.
  Filled circles represent ‘association’, i.e. binding.
  Double-circles represent ‘dissociation’.
  Squares with two slashes represent ‘omitted process’. This is used to denote a reaction where the full details have been deliberately omitted.
  This is most commonly used for events that include specific members of a protein family to illustrate the general behaviour of the larger group. It is used for reactions that occur with no fixed order or stoichiometry, and for degradation where the output is a random set of peptide fragments.
  Squares containing a question mark represent ‘uncertain process’, where some details of the reaction are known, but the process is thought to be more complex than it is represented. Explanatory details are typically included in the Description.

Exercise 3

From the Homepage, search for the pathway ‘Effects of PIP2 hydrolysis’ and open it in the Pathway Browser.
1. What symbol represents the reaction for ‘Binding of IP3 to the IP3 receptor’?
2. What symbol represents the reaction ‘Transport of Ca++ from platelet dense tubular system to cytoplasm’? What subtype of reaction is this?
3. What is the catalyst (descriptive name) for ‘2-AG hydrolysis to arachidonate by MAGL’? Can you find its UniProt ID and name the two outputs of this reaction?

Navigating pathway diagrams

Clicking on a name in the Pathways tab causes the corresponding pathway diagram to appear in the right-hand Pathway Diagram panel. For example, clicking on the pathway "Apoptosis" highlights the pathway name in green, and displays the pathway diagram. If the Details Panel is revealed, details of this pathway are shown (see diagram below - these details are explained in the Pathway Details section). Hovering the mouse over a sub-pathway name in the hierarchy causes the equivalent pathway diagram item(s) to be surrounded by a blue ‘preselection’ box.

Large topics such as apoptosis contain too much information to be displayed as a single pathway or diagram. Where this is the case, the topic is divided into sub-pathways in the pathway hierarchy, and sub-pathway diagrams in the Pathway Diagram Panel.
Pathways that only contain sub-pathway diagrams are displayed as an overview diagram, consisting of Sub-pathway Boxes with green borders. These boxes indicate that detailed diagrams are available but not displayed at this level.

There are two ways to access a sub-pathway diagram. Select the Sub-pathway Box and click on any of the highlighted sub-pathway names on the Pathways tab. In the example shown below, the Sub-pathway Box ‘Intrinsic Pathway for Apoptosis’ is selected (boxed in green), causing the corresponding sub-pathway to be highlighted yellow-green and the parent pathway highlighted green on the Pathways tab. The sub-pathway name has not been clicked. Clicking on the sub-pathway name in the hierarchy causes the corresponding pathway diagram to open in the Pathway Diagram Panel (see below). Alternatively, right-click on a Sub-pathway Box to produce a menu and select the option ‘Go To Pathway’.

Sub-pathways frequently have their own sub-pathways; there is no limit to the number of levels. In the example represented below, the sub-pathway FasL/ CD95L signaling has been selected on the Pathways tab, causing the reaction ‘nodes’ (squares) for all the steps of this sub-pathway to be surrounded by green squares on the Pathway Diagram panel.

Selecting a pathway step (reaction) in the pathway hierarchy causes the Details Panel to show the details of that reaction. If the reaction is not currently visible on the Pathway Diagram Panel, the view will re-centre on the selected reaction (this may take a few seconds).

You can zoom in or out of a diagram using the navigation tool at the top left of the Pathway Diagram panel, or use the mouse wheel. You can also click and drag the diagram.

Exercise 4

Open the pathway Apoptosis in the Pathway Browser. Select the sub-pathway box ‘Intrinsic Pathway for Apoptosis’.
1. How do you open the pathway diagram for this sub-pathway?
2. With this pathway diagram open, what happens if you click on the sub-pathway ‘BH3-only proteins associate with and inactivate antiapoptotic BCL-2 members’?
3. What happens if you click on the reaction ‘Sequestration of tBID by BCL-2’ in the pathway hierarchy?

The Details Panel

The Details Panel is at the bottom of the Pathway Browser. It gives details of the selected pathway, reaction, complex, set or proteins, when they are selected in the pathway diagram or Pathways tab. This panel can be revealed/hidden using the small blue triangle on the border with the Pathway Diagram panel. Below is a reaction details example (the other detail views are similar – see the sections for Pathway, Complex and Protein Details below).

Included in the details are a summary of the reaction and possibly a figure. Reactions must contain References that provide experimental data verifying the reaction, or an ‘Inferred From’ link if the reaction has been manually inferred from experimental data from model organisms. Other details within this section include:
  Authored - the expert biologists that contributed materials that allowed this pathway to be created in Reactome.
  Reviewed - the expert biologists that verified the content for this pathway.
  Input/Output – identifies the input/output molecules, sets or complexes for this reaction. Icons to the right of these named items link to further information in external resources, e.g. the red U on a grey background links to UniProt.
  Cellular Compartment – identifies the cellular compartment for this reaction, with the associated GO term.
  Preceding/Following Events – links to pathways that precede/follow this reaction.

Exercise 5

1. Find the reaction 'Activated type I receptor phosphorylates R-SMAD directly'. What pathway does it belong to?
2. In which cellular compartment does this reaction take place?
3. What is the associated GO molecular function?
4. What references verify this reaction?
5. Is this reaction predicted to occur in Canis familiaris? In Saccharomyces cerevisiae?

Reactome Tools

Pathway Analysis

The Pathway Analysis tool takes a user-supplied set of gene or protein identifiers and shows the Reactome pathways they match. It can also perform a statistical test to determine whether some pathways are overrepresented (enriched) in the submitted data, i.e. it answers the question ‘does the list represent the proteins within a specific pathway more than would be expected if the set were random?’

Pathway Analysis is launched by pressing the ‘Pathway Analysis’ button on the sidebar, on the left-hand side of the homepage. The tool opens as a data entry page with a box for pasting your list of protein identifiers (one per line). Alternatively, use the browse button to locate a file. Valid identifiers include UniProt accession numbers and IDs, GenBank/EMBL/DDBJ, RefPep, RefSeq, EntrezGene, MIM and InterPro IDs, Affymetrix, Illumina and Agilent probe IDs, and Ensembl protein, transcript and gene identifiers. Identifiers that contain only numbers, such as those from OMIM and EntrezGene, must be prefixed by the source database name and a colon, e.g. MIM:602544, EntrezGene:55718. Mixed identifier lists can be used. If a mixed species list is used, only identifiers from the most frequent species are used in the analysis.

Select one of the two radio buttons to determine whether simple ID matching or over-representation analysis will be performed. Click the Analyse button to begin the analysis. This may take several minutes, depending on the size of the data set submitted and load on the server. While the analysis is in progress a progress indicator is displayed.

ID mapping and pathway assignment results

If this option was selected, the results will appear something like the example below:

The table represents every protein identified from the submitted list. If the ID was not recognized, it is displayed in column 1 but all other columns will be blank.
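
Unrecognized IDs are often a formatting problem, so it can help to tidy a list before pasting it into the submission form. The following is a minimal Python sketch of the identifier rules given above for the form (one identifier per line, with purely numeric identifiers prefixed by the source database name and a colon). The function name format_identifiers is hypothetical, and it assumes that every numeric identifier in the list comes from the same source database; neither is part of Reactome.

```python
def format_identifiers(identifiers, numeric_source):
    """Return the identifiers as text, one per line.

    numeric_source is the database name placed in front of identifiers that
    contain only digits, e.g. "EntrezGene" or "MIM". Assumption for this
    sketch: all numeric identifiers come from that single database.
    """
    lines = []
    for raw in identifiers:
        ident = raw.strip()
        if not ident:
            continue  # skip blank entries
        if ident.isdigit():
            # Numbers-only identifiers need the "Database:" prefix
            ident = f"{numeric_source}:{ident}"
        lines.append(ident)
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; 55718 is the EntrezGene example used above.
    example = ["P01106", "55718", " ENSG00000136997 ", ""]
    print(format_identifiers(example, "EntrezGene"))
```

The resulting text can be pasted into the data entry box or saved to a file for upload. Lists that mix numeric identifiers from several databases would need the source recorded per identifier, which this sketch does not attempt.
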
The columns can be sorted. Click once on the title of the column. After a while, a small white arrow appears next to the column title, indicating the direction of the sort. Click the title a second time to sort in the opposite direction. N.B. Large datasets may take a couple of minutes to sort – be patient!

The columns represent:
1. The IDs you supplied.
2. The corresponding UniProt ID.
3. Species.
4. List of names of pathways in which this protein is found.

Clicking a pathway name opens the Pathway Browser (in a new tab or browser) and displays the appropriate pathway diagram. The protein that corresponds to the ID in the results table will be highlighted in the pathway diagram. See the section above on Navigating pathway diagrams to learn how to interpret and navigate these diagrams.

The table of results can be exported in several formats using the Download button at the top of the table.

Overrepresentation analysis

If this option was selected, the top of the Pathway Analysis results will look something like those below:

This uppermost results section, 'Statistically over-represented events in hierarchy', represents all Reactome pathways that contain proteins identified from the submitted list. N.B. Clicking on a pathway name navigates away from the results and opens the pathway diagram in the Pathway Browser. The colour used to highlight the pathway name indicates the level of overrepresentation, i.e. the bias in your dataset towards proteins in that pathway; the warmer the colour, the higher the level of over-representation in the pathway. To the right of the pathway name is the p-value, followed by the number of proteins from the submitted set that matched the pathway/the total number of proteins in the pathway.

The order of pathways is also determined by p-value, though this may not be immediately obvious. By default, over-representation analysis results display only 'top-level' pathways, i.e.
the list of pathways seen in the Table of Contents or Pathway Browser Pathways tab when the list is unexpanded. These pathways represent the top of a hierarchical tree, and may contain sub-pathways with lower p-values that might therefore be of greater interest. To draw attention to sub-pathways with highly significant scores, the list of top-level pathways is ordered by the most significant p-value within its hierarchy. In the image above, the pathway 'Cell Cycle, Mitotic' highlighted in dark blue contains the sub-pathway ‘G1/S transition’ highlighted in yellow, which has a considerably lower p-value and consequently has pushed ‘Cell Cycle, Mitotic’ up the list.

The top-level pathways, and any sub-pathways they contain, can be expanded using the + symbols, or alternatively, the entire tree can be expanded/compressed using the Open All and Close All buttons. For each represented pathway, there is a ‘Matching identifiers’ list of the identifiers and associated proteins that contributed to the overrepresentation score.

The second section, 'Statistically over-represented events as an ordered list', gives the same information in a downloadable tabular form.

The third section, 'Reactions coloured according to the number of genes or compounds (as specified by the submitted list of identifiers) participating in the given reaction', is a map of all reactions coloured by the number of participants in the reaction that were included in the submitted list. Reactions with no participants represented in the list are coloured grey.

Additional sections allow downloads of the graphics, and a breakdown of the mapping from submitted identifiers to reactions, including those that did not match – this is useful to identify problems with the list, or identify proteins that could not contribute to the over-representation scores.

Exercise 6

On the homepage, click the button Pathway Analysis. When the submission form appears click on the Example button.
Select the radio button for Overrepresentation analysis. Your submission page should look like this:

Click the button marked Analyse.
1. What is the most significantly over-represented top-level pathway for this dataset?
2. How many genes are in this pathway, and how many were represented in the dataset?
3. Why is the top-level pathway Chromosome Maintenance higher in the list than Signalling by Wnt when the latter has a more significant probability score? (Hint – use the Open All button)
4. Can you interpret these results in terms of the underlying biology? (Hint: good luck, there are many correct answers!)

Species Comparison

Reactome uses manually-curated human pathways to electronically ‘infer’ their equivalents in 19 other species. A full description of the inference process can be found on the Homepage under Documentation, Orthology Prediction. The Species Comparison tool allows you to compare human pathways with these predicted pathways, to see what is common to both or perhaps missing in the model organism.

Species Comparison is launched using a button on the sidebar, on the left side of the homepage. On the resulting page is a selection tool that reveals a dropdown list of species. Choose one and click the Apply button. It may take some minutes before the results appear. The results page will look something like this:

The table contains one row for each Reactome pathway. The columns represent:
1. Pathway name.
2. The species used for comparison with human.
3. Number of proteins in the human pathway.
4. Number of proteins inferred to exist in the comparison species.
5. A graphic representing the ratio of the values in columns 3 and 4.
6. A button that launches the Pathway Browser and displays the relevant pathway diagram.

These columns can be sorted by clicking on the title of the column at the top of the table. A small white arrow appears next to the title, indicating the direction in which the column's contents have been sorted.
Clicking the title a second time causes the column to be sorted in the opposite direction.

Clicking on a view button launches the Pathway Browser and displays the relevant pathway diagram (see example below). The nodes are colour coded:
  Yellow indicates that the protein's orthologue is present in the comparison species.
  Blue indicates that the protein is only known in human; no orthologue could be found in the comparison species.
  Grey indicates that inference was not possible; this is used for small molecules and genomic objects that have no UniProt entry (or did not at the time the pathway was constructed).
  Black means the entity is a complex. Right click on the complex to reveal a grid representing the proteins in that complex:

Within the grid each cell represents one of the proteins in that complex; hover the mouse over it to see its name. The cells are colour coded as described above. N.B. The grid is always 3 cells wide; pale grey is the background, not a protein ‘missing’ in the comparison species. Refer to the Navigating Pathway Diagrams section for more information on the diagrams.

Exercise 7

Launch the Species Comparison and select the species Rattus norvegicus. When the results are displayed, open the pathway Complement Cascade.
1. Find Complement factor B (bottom left in the diagram) - what colour is it? What does that mean?
2. What other species is this protein inferred to be present in? Hint: You can answer this question without rerunning Species comparison.
3. Find Complement factor 2 (top middle) – why is it blue?
4. Find C3b (top left corner) – Why is it black? How many proteins contribute to this object? Are they predicted to exist in Rattus?
5. Why is Calcium grey?

Expression Analysis

This tool allows you to visualize user-supplied expression data (or any other numeric value, e.g. differential expression, GWAS scores, etc.) superimposed onto Reactome pathways.
It is launched using the Expression Analysis button on the Sidebar, on the left-hand side of the homepage. A submission form will appear.

To see an example of the expected data format, or to try the tool without submitting your own data, click on the "Example" button. You can type or paste your data into the text area provided, or alternatively upload a file using the Browse button. Microsoft Excel and tab-separated value (TSV) files are both accepted. The first data column must identify the protein, ideally with UniProt IDs. Many other identifiers are supported, including Affymetrix, Agilent and Illumina probe IDs; refer to the User Guide for full details. The second and any subsequent columns should contain numbers (expression or other measure). Each column is treated as a new ‘sample’ and used to generate an independently-coloured pathway diagram. These images can be viewed sequentially; this is particularly useful for timepoints or disease progression.

Having identified the data to submit, click the "Apply" button. The results may take some minutes to appear. The results page will look something like this:

The table contains rows for each Reactome pathway. The columns represent:
1. Pathway name.
2. Species.
3. Total number of proteins in the pathway.
4. Proteins in the pathway represented in the submitted data.
5. A graphic representing the ratio of the values in columns 3 and 4.
6. A button that launches the Pathway Browser and displays the relevant pathway diagram.

The columns can be sorted by clicking on the title of the column at the top of the table. A small white arrow appears next to the title, indicating the direction in which the column's contents have been sorted. Clicking the title a second time causes the column to be sorted in the opposite direction.

Clicking on any of the view buttons launches the Pathway Browser and displays the relevant pathway diagram (see example below).

Proteins in the pathway diagram are coloured according to their values.
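
The values used for colouring come from the submitted file, so its layout matters. As a hedged illustration of the input format described above for the submission form (an identifier column followed by one numeric column per sample), here is a minimal Python sketch that builds tab-separated text. The function name build_expression_tsv, the header label "ID" and the identifiers and values are all hypothetical; compare the output with the data loaded by the "Example" button to confirm the exact header convention the tool expects.

```python
import csv
import io


def build_expression_tsv(sample_names, values_by_id):
    """Build tab-separated text: an ID column, then one numeric column per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header row; the "ID" label is an assumption made for this sketch.
    writer.writerow(["ID"] + list(sample_names))
    for identifier, values in values_by_id.items():
        if len(values) != len(sample_names):
            raise ValueError(
                f"{identifier}: expected {len(sample_names)} values, got {len(values)}"
            )
        # float() rejects non-numeric entries before they reach the file
        writer.writerow([identifier] + [float(v) for v in values])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up identifiers and values, for illustration only.
    data = {"P12345": [1.5, 2.0, 8.25], "Q67890": [0.5, 0.75, 0.1]}
    print(build_expression_tsv(["0h", "12h", "24h"], data), end="")
```

Each sample column then corresponds to one independently coloured diagram, so ordering the columns by timepoint keeps the sequence of diagrams in a sensible order.
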
The colours form a continuous spectrum from red for the highest values to dark blue for the lowest values. The scale automatically adjusts to fit the range represented in the dataset. The submitted identifier and value are overlaid onto the protein box. Grey boxes are proteins or small molecules with no associated values in the input data. Black nodes represent complexes that have values for at least one of the proteins.

The value associated with each component of a complex can be viewed; right click on the complex and select ‘Display Participating Molecules’. A popup is displayed representing the complex as a grid of coloured cells, each representing a protein within the complex. Grey cells represent proteins with no associated numeric values in the data. Hovering the mouse pointer over a cell displays the name of the protein. Clicking on the cell displays details of the protein in the Details pane. The grid can be closed by clicking the 'x' in the top right corner.

The Experiment Browser toolbar (pale blue, at the bottom of the Pathway Diagram) allows you to step through the columns of your data, e.g. time-points or disease progression. Move between them by pressing the arrow buttons. The header of the data column (if present) is displayed between the arrows. The pathway diagram will re-colour to reflect the new values.

Exercise 8
Launch Expression Analysis and load the example dataset. (The data was obtained from ArrayExpress and comes from the processed results of an experiment studying the differentiation of KG1 cells.) Click Analyse. When the results are displayed, find the pathway Nucleotide Excision Repair.
1. How many proteins are in this pathway?
2. How many had expression data?
Click on the View button to see this pathway in the Pathway Browser. Use the Experiment Browser toolbar to cycle through the timepoints.
3. Which protein has the greatest change of expression?
4.
Find the complex ‘Active Pol II complex with repaired DNA template:mRNA hybrid’ (top right of the diagram). Which component of the complex has the highest expression at 24h?
5. What was the probe ID used to measure expression of this component?

Molecular Interaction Overlay
Molecular Interaction (MI) overlay allows protein-protein or protein-compound interactions to be overlaid (superimposed) onto the pathway diagram. To add interactors, right click on the protein and select ‘Display Interactors’.

The default source of the interactions is IntAct, but this can easily be changed to other sources of interaction data (protein-protein and protein-compound), including a user-supplied list. A maximum of ten interactors are displayed as a ring of boxes connected by blue lines to the selected protein. A white box containing the total number of interactors is superimposed onto the selected protein, up to a maximum of 50. Above 50 the box will display 50+. Multiple proteins can have interactors displayed. Details of up to 50 interactors for every protein in the displayed pathway can be viewed as a table via the Analyze, Update and Annotate button. If you choose Export table, all available interactors from the selected data source are included, even when there are more than 50.

Hovering the mouse pointer over a protein interactor produces a pop-up containing the name and ID of the protein. Hovering over a chemical interactor displays the name of the chemical. Clicking on the line that connects the interactor to the pathway item links to details of the interaction at the source database.

While interactors are displayed, right clicking on the selected protein produces two new options: Hide Interactors, which removes them, and Export Interactors. Interactions are exported in PSI-MITAB format. Several items on the pathway diagram can be sequentially selected for interactor display. If the same interactor is connected to more than one item it is reused, i.e.
connected to all the selected items in the diagram.

The Analyze, Update and Annotate Button
This button, located on the Menu Bar, produces a control panel that has several functions. It has two tabs – one allows configuration of Molecular Interactions (the other contains tools for overlaying expression data onto the pathway diagram and for generating an alternative inter-species pathway-comparison view, explained in detail in the section ‘The Analyze, Update and Annotate Button’ below).

MI Overlay tab
This tab has the following features:
Interaction database – a dropdown list allows selection of a source for interactions data. The default is IntAct.
Upload a file – use this to submit your own list of interaction data. It accepts PSI-MITAB format; in its simplest form this is column 1 = accession numbers, column 2 = names. If you want to use the colour by confidence score feature, the scores need to be in column 3. If the upload is successful, the label you submit when prompted will appear on the Interaction Database dropdown and be selectable as a source of interactions. Submitted datasets persist for the session.
Clear overlay – removes all interactors from the pathway diagram.
Submit a New PSICQUIC service – can be used to add a new source interactions database, see the Reactome User Guide for details.
Set Confidence Level Threshold – allows the user to set a confidence threshold, used for colouring interactors. The default threshold is 0.5. Interactions (with confidence scores) either side of this are coloured according to the settings for Above and Below colours, when you press the 'Colour' button. The colours can be changed by clicking on the coloured squares.
Hide Table of all interactors for pathway – this button switches off the table of interactors. Clicking on the blue squares within this table will add/remove interactors from that protein.
Clicking on the protein name will cause the pathway diagram to centre on the protein represented in the first column.
Export all interactors for pathway - exports all interactors (no 50 limit) in PSI-MITAB format.

Exercise 9
Open the pathway diagram for Netrin-1 Signaling.
1. Find the protein SHP2 (top left of the cytosol). Right click on it and select Display Interactors. How many are there?
2. How many times has the interaction between SHP2 and Adapter protein GRB2 been documented? Hint: This detail is not in Reactome.
3. Find the protein SRC (to the right of SHP2). Display interactors for this protein. How many are there? Can you get a list of them?
4. Display interactors for UNC5B (bottom left of the cytosol). What happens and why?
5. What is the easiest way to remove interactors?

Other Reactome Features
The tutorial ends with the above exercise, but there are many other features that were either mentioned in brief or not mentioned at all – if you are interested, some of these are explained below. Full details of Reactome are available in the online User Guide.

Features and Tools in the Pathway Browser
Within pathway diagrams, right-clicking on the box representing a molecule, complex or reaction presents the user with a menu or list of features dependent on the nature of the item selected. Note that unavailable options do not appear, so very few items will have the full range of options.

Other Pathways
This option displays a list of the other pathways that include the selected item as a participant. Clicking on any of the pathway names will display that pathway in the Pathway Diagram Panel (see GRB2:SOS example below).

Display Interactors
Retrieves a list of molecules that are interactors of the selected item. The source depends on the currently selected interaction database. The default is IntAct; other sources of interaction data (protein-protein and protein-compound) can be selected using the Analyze, Update & Annotate button on the Menu Bar.
A maximum of ten interactors are displayed as a ring of boxes connected by blue lines to the selected item. If more than 10 are available, a white box is superimposed onto the selected item; the number inside is the total number of interactors. Clicking on an interactor box links to the interaction details at the source database. Once interactors are displayed, right clicking again produces two new options: Hide Interactors, which removes them, and Export Interactors. Interactions are exported in PSI-MITAB format. Several items can be sequentially selected for interactor display. If the same interactor is connected to more than one item it is re-used, i.e. connected to all the selected items in the diagram.

Participating Molecules
Lists the component molecules for a complex or set. Clicking on a molecule in the list opens a new window with details (example below for Activated Raf 1 complex:MEK2).

Go To Pathway
When a Subpathway Box is selected, this displays the associated Pathway Diagram.

Viewing the Equivalent Pathway in Another Species
Reactome is human-centric and aims to represent human biology. Pathways in other species are electronically inferred from curated human pathways – a description of the inference process can be found here. To view the predicted conservation of a pathway in another species, select the species of interest in the "Switch species" dropdown menu in the upper left corner of the window. The pathway diagram will re-draw in the Pathway Diagram Panel (a revolving arrow icon on the Pathways tab indicates the diagram is redrawing). Any reactions that were not inferred will be absent.

Molecular Interaction Overlay
Molecular Interaction (MI) overlay allows protein-protein or protein-compound interactions to be overlaid (superimposed) onto the pathway diagram. The source depends on the currently selected interaction database.
The default is IntAct; other sources of interaction data (protein-protein and protein-compound), including a user-supplied list, can be selected using the Analyze, Update & Annotate button on the Menu Bar. A maximum of ten interactors are displayed as a ring of boxes connected by blue lines to the selected protein. A white box is superimposed onto the selected protein; the number inside is the total number of interactors up to 50, and if more than 50 are available 50+ is displayed. Details of up to 50 interactors of all proteins in the pathway diagram can be viewed as a table accessed via the Analyze, Update & Annotate button. An extended version of this table with no 50 row limit can be exported.

Hovering the mouse pointer over a protein interactor produces a pop-up containing the name and UniProt accession of the protein. Hovering over a chemical interactor displays the formula and name of the chemical. Clicking on an interactor box links to details of the interaction at the source database.

While interactors are displayed, right clicking on the selected protein produces two new options: Hide Interactors, which removes them, and Export Interactors. Interactions are exported in PSI-MITAB format. Several items can be sequentially selected for interactor display. If the same interactor is connected to more than one item it is re-used, i.e. connected to all the selected items in the diagram.

The Analyze, Update and Annotate Button
This button, located on the Menu Bar, produces a control panel that has several functions. It has two tabs – one allows configuration of Molecular Interactions, the other contains tools for overlaying expression data onto the pathway diagram and for generating an alternative inter-species pathway-comparison view.

MI Overlay tab
This tab contains the following features:
Interaction database – a dropdown list allows selection of the source of interactors.
Upload a file – allows a user supplied list to be used as the source of interactors.
Data must be in PSI-MITAB format, though the only columns that need to be filled in are the accession number, gene name and confidence score columns (the confidence score is only necessary if you want to use the colour interactions feature). If the upload is successful, the label you submit when prompted will appear on the 'Interaction Database' dropdown and can be selected as a source of interactions. A data set submitted in this manner will persist for the user session.
Clear overlay – removes all interactors from the pathway diagram.
Submit a New PSICQUIC service – can be used to add a new source interactions database, see the Reactome User Guide for details.
Set Confidence Level Threshold – this allows the user to set a confidence threshold used for colouring interactors. The default threshold is 0.5. Interactions with a confidence score below this threshold will be coloured with the colour set as the 'Below' colour; interactions with confidence scores equal to or above the threshold will be coloured with the 'Above' colour. Pressing the 'Colour' button activates colouring of the pathway diagram - interactions will be coloured only if a confidence score is available at the source database. To switch off colouring mode press the 'Colouring Off' button. The colours used can be changed by clicking on the coloured squares for 'Above' and 'Below'. A dialog allows selection of an alternative colour; click 'Apply' to update the colours displayed.
Hide Table of all interactors for pathway – this button switches off the table of pathway item-interactor pairs that is otherwise displayed below. Clicking on the blue squares within this table will cause the pathway diagram to centre on the protein represented in the first column and display its interactors.
Export all interactors for pathway - exports in PSI-MITAB format.
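The colouring rule described under Set Confidence Level Threshold can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this tutorial, not Reactome's own code: the function name `interaction_colour` and the colour names "green" and "orange" are made up for the example, and only the 0.5 default threshold comes from the text.

```python
from typing import Optional

# Default threshold stated in the tutorial text.
DEFAULT_THRESHOLD = 0.5


def interaction_colour(score: Optional[float],
                       threshold: float = DEFAULT_THRESHOLD,
                       above: str = "green",    # hypothetical 'Above' colour
                       below: str = "orange"    # hypothetical 'Below' colour
                       ) -> Optional[str]:
    """Return the colour for one interaction, or None if it stays uncoloured.

    Scores equal to or above the threshold get the 'Above' colour,
    scores below it get the 'Below' colour, and interactions without
    a confidence score are not coloured at all.
    """
    if score is None:
        return None
    return above if score >= threshold else below


# Invented scores, including one interaction with no score at the source.
for score in (0.9, 0.5, 0.49, None):
    print(score, interaction_colour(score))
```

Treating a missing score as "leave uncoloured" follows the statement above that interactions are coloured only if a confidence score is available at the source database.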
The Expression & Species tab
This tab controls the colouring of pathways according to expression values or generation of an alternative species comparison view for the pathway.

Expression Analysis – expression painter
To activate colouring by expression values, the user must submit a file of protein identifiers and numeric values, typically expression levels. The first column must be a protein identifier, ideally a UniProt ID or another identifier that can be mapped to proteins, e.g. Affymetrix or Illumina probe IDs. Many ID types are acceptable; see the User Guide for a complete list. The second and any subsequent columns contain numeric (expression) values. Each column is treated as a new ‘sample’ and used to generate an independently-coloured pathway diagram. These images can be viewed sequentially as a ‘movie’. This is particularly useful for timepoints or disease progressions. Use the Browse button to identify the file, and once the name is displayed, click the Submit button to upload. To ‘paint’ the pathway diagram click the Expression Painting On checkbox (if the file did not upload this will produce a warning message). This will overlay colours representing the numeric values on the pathway diagram (see below).

Proteins in the pathway diagram are coloured according to their associated numeric values (typically expression levels, but could be differential expression, or any other measure). The colours form a continuous spectrum from red for the highest values to dark blue for the lowest values. The scale automatically adjusts to fit the range represented in the dataset. The protein identifier and numeric value are overlaid onto the protein box. Grey boxes are proteins (or small molecules) with no associated values in the input data. Black nodes are complexes that include at least one protein represented by numeric data.
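As a rough illustration of the input format and the colour scale described above, the Python sketch below reads a small tab-separated table (identifier in the first column, one numeric column per sample) and maps each value onto a scale that runs from dark blue at the lowest value in the dataset to red at the highest. The function names, the RGB values and the straight-line blend between the two end colours are assumptions made for the example; the tutorial does not say which intermediate colours Reactome uses. The numeric values are invented, and a header row is assumed to be present.

```python
import csv
import io

# Assumed RGB values for the ends of the scale and for "no data".
DARK_BLUE = (0, 0, 139)
RED = (255, 0, 0)
GREY = (128, 128, 128)

# Invented example: identifier column followed by two sample columns.
EXAMPLE = "ID\t0h\t24h\nP17480\t2.0\t8.0\nQ99685\t5.0\t3.5\n"


def parse_expression_table(text):
    """Return (sample names, {identifier: [values]}) from tab-separated text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    sample_names = rows[0][1:]          # header row, minus the identifier column
    values = {row[0]: [float(cell) for cell in row[1:]] for row in rows[1:]}
    return sample_names, values


def colour_for(value, lowest, highest):
    """Blend linearly from dark blue (lowest) to red (highest); grey if no value."""
    if value is None:
        return GREY
    if highest == lowest:
        fraction = 0.5                  # degenerate range: use the midpoint
    else:
        fraction = (value - lowest) / (highest - lowest)
    return tuple(round(low + fraction * (high - low))
                 for low, high in zip(DARK_BLUE, RED))


sample_names, values = parse_expression_table(EXAMPLE)

# The scale is fitted to the range of the whole dataset, across all samples.
all_values = [v for row in values.values() for v in row]
lowest, highest = min(all_values), max(all_values)

# One independently coloured "diagram" per sample column.
for column, name in enumerate(sample_names):
    for identifier, row in values.items():
        print(name, identifier, row[column],
              colour_for(row[column], lowest, highest))
```

Fitting the scale to the whole dataset rather than to each column keeps colours comparable when stepping from one sample to the next.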
The values associated with each component of a complex can be viewed; right click on the complex and select ‘Display Participating Molecules’. A popup is displayed representing the complex as a grid of coloured cells, each representing a protein within the complex. Grey cells represent proteins with no associated numeric values in the data. Hovering the mouse pointer over a cell displays the name of the protein. Clicking on the cell displays details of the protein in the Details pane. The grid can be closed by clicking the 'x' in the top right corner.

At the bottom of the diagram a pale blue Experiment Browser toolbar is displayed. This allows you to step through timepoints or experiments if more than one column of numeric values was included in the submitted data. Move between these by pressing the arrow buttons. The header of the data column is displayed between these arrows. The pathway diagram will re-colour to reflect the new numeric values.

To turn off expression painting mode, uncheck the box to the right side of the Analyze, Update and Annotate button.

Species comparison - Other species view
To produce the Other species pathway comparison view, select a species from the dropdown list. The pathway diagram will be coloured according to the success of Reactome’s pre-computed inference of equivalent reactions in the non-human species.

Objects on the pathway diagram are colour coded:
Yellow indicates that the protein's orthologue is present in the comparison species.
Blue indicates that the protein is only known in human; no orthologue could be found in the comparison species.
Grey indicates that the molecule was not inferred to exist in the comparison species, but is also used for small molecules where this comparison is not relevant.
Black means the entity is a complex.
Right click on the complex to reveal a grid representing the proteins in that complex:

Within the grid each cell represents one of the proteins in that complex; hover the mouse over a cell to see its name. The cells are colour coded as described above. N.B. The grid is always 3 cells wide; pale grey is the background, not a protein ‘missing’ in the comparison species.

Searching Reactome

Simple text search
The simple text search tool is located top left of the homepage. Many other Reactome pages include this search bar. To search Reactome for content related to protein kinases:

Click Go! to start the search. The results will be similar to those below:

Results have an associated type: Reaction, Pathway, Protein or Other. Type is indicated by an icon and a type name preceding the title of the result. Click on the title to go to the corresponding Reactome web page. Most results will have descriptive text details. Your search terms will be highlighted if they appear within the title or descriptive text.

At the top of the results page is a set of tick boxes, the Type Selector Bar. This includes a count by type (pathways, reactions, proteins, other) for the displayed results, and allows you to limit results by type. Uncheck boxes for type categories that you don’t want to see, e.g. if you only want results of type Proteins, uncheck Pathways, Reactions and Others, then click the Show button.

Searching Pathway Diagrams
The Pathway Browser has a diagram-specific search tool. This is located to the right side of the Menu Bar. When a pathway name or keyword is entered in the search box it auto-completes, offering a list of proteins or pathways which match the search term. Select one of these and press the Search button or Return key to perform the search. The protein, small compound or pathway matching the search term will be highlighted on the Search Results tab on the sidebar. If it is a protein or compound, click on the name to produce a list of the pathways that it is part of.
Details Views

Pathway Details
Pathway details appear in the Detail Panel at the bottom of the Pathway Browser when a pathway is selected in the Pathways tab:

This provides a summary of the pathway and may contain a figure and/or links to literature at PubMed that provide background for the pathway. Other details within this section include:
Authored - the expert biologists that contributed materials that allowed this pathway to be created in Reactome.
Reviewed - the expert biologists that verified the content for this pathway.
Represents GO Biological process – identifies the GO term for the biological process associated with this pathway.
Preceding/Following Events – links to pathways that precede/follow this pathway.
Equivalent event(s) in other organism(s) – identifies whether this pathway has been inferred in other species included in the Reactome inference process.
Download Pathway – links to download the pathway in several formats allowing reuse with other tools, or as a pdf for reference.

Reaction Details
Reaction details are displayed in the Detail Panel at the bottom of the Pathway Browser when a reaction is selected in the Pathways tab or a reaction node is selected on a Pathway Diagram:

This provides a summary of the reaction and may contain a figure. It will contain links to literature at PubMed that provide experimental data verifying the reaction, or an ‘Inferred From’ link. Other details within this section include:
Authored - the expert biologists that contributed materials that allowed this pathway to be created in Reactome.
Reviewed - the expert biologists that verified the content for this pathway.
Input/Output – identifies the input/output molecules, sets or complexes for this reaction. Icons to the right of these named items link to further information in external resources, e.g. the red U on a grey background links to UniProt.
Cellular Compartment – identifies the cellular compartment for this reaction, with the associated GO term.
Preceding/Following Events – links to pathways that precede/follow this reaction.

Complex or Set Details
Details of the members of the complex or set are displayed, similar to the details for reactions with some additional details:
Is represented by generalisation(s) – identifies supersets that include this set/complex as a member.
Catalyses events – is a catalyst for the events named on the right.
Produced by events – is an output of the reactions named on the right.
Hierarchical view of the components – identifies all the components of a complex in a hierarchical manner, identifying sub-complexes before splitting them into their components.

Protein/Small molecule Details
Clicking on a single protein or small molecule in the pathway diagram, or where identified in the detail pane as part of a reaction, set or complex, updates the Detail Panel with details of that protein/small molecule. The fields represented include:
Name – the name used in Reactome.
Links to corresponding entries in external databases – the identifiers used in external databases, which act as links to them.
Other identifiers related to this sequence – secondary identifiers from external databases.
Is represented by generalisation(s) – sets that include this protein/small molecule.
Component of – complexes that include this protein/small molecule.
Catalyses events – reactions where this protein acts as a catalyst.
Other forms of this molecule – representation in Reactome of post-translationally modified forms of this protein.

The BioMart interface
BioMart is the name of a project and a corresponding software suite that allows rapid bulk searches of large databases, and the ability to link between databases so that a single query can retrieve data from more than one database. There are several means to access BioMart data including a standardized web interface.
Databases available in BioMart include Reactome, UniProt, HGNC, HapMap and ENSEMBL genome databases. BioMart is accessible via the Navigation Bar. Click on Tools and select BioMart:

Once the BioMart page has loaded it should look like this:

Click on Choose Database to open the dropdown list, and select Reactome. At this point, you need to decide which Reactome dataset you want to query: interaction, pathway, reaction, or complex. Set your choice using the Choose Dataset dropdown list:

Now that a Dataset has been selected, the panel on the left-hand side of the page will open out and display Filters and Attributes that you can set to configure your query:

Clicking on “Filters” opens a list of options in the panel on the right-hand side of the page. Filters limit the information that is returned by your query; if you don’t set filters, BioMart will return the entire database. These terms are essentially the query terms, e.g. you can search using a list of Entrez Gene IDs by checking the box for Limit to Reactions containing these IDs, and selecting Entrez Sequence IDs from the dropdown list. Submit your IDs by pasting them into the box or browsing for a file that contains them. When you set a filter, its name appears underneath Filters in the left-hand panel.

Attributes determine the information that is returned by your query. Some are set to be returned by default; these are displayed under Attributes in the left-hand panel. Click on Attributes to specify additional attributes to be returned in the results of your query.

Once you have set up the Filters and Attributes, click on Results to run the query. Results are presented in a tabular form and by default, only the first 10 results are shown. You can display up to 200 result lines in a single page. In the panel above the results table, you can export the entire results set in a number of different formats:

TSV means tab-separated (delimited) values; XLS is the spreadsheet standard used by Excel.
Check the Unique results only box to remove duplicates from the results.

Exercise 10
Go to the Reactome BioMart page. How would you select the “pathway” dataset? If you were interested in the protein “Nucleolar transcription factor 1” (UniProt ID P17480), how would you identify pathways in Reactome involving that protein? How many pathways does your query find? How would you find the UniProt IDs of the other proteins in the first of the pathways that you discovered? Save the results list from 3) in tab-delimited format. How many lines of output are there, excluding column headers?

Answers to Exercises
The answers may change as more data is added to Reactome; these are based on the June 2010 Reactome release.

Exercise 1
1. On the homepage, Mar 20, 2011: Reactome will be taking part in GSoC 2011
2. Found under Content, Statistics: 5234
3. Found under Content, Editorial Calendar. March 2011 (needs updating!)
4. First topic is Mycobacterium tuberculosis biological processes (needs updating!)

Exercise 2
1. The top pathway hit is ‘Signaling by Notch’. This has 7 subpathways, revealed by clicking the + symbol to the left of the pathway name.
2. The first subpathway ‘Transport of Notch receptor precursor to golgi’ has 4 reactions.
3. It’s tempting to think it will be the next reaction down in the hierarchy, and often this is the case, but not for this reaction. The hint suggests opening the Details pane. One of the details is Following events: this will always correctly identify the next reaction step in a pathway. The answer is ‘Notch 2 precursor cleaved to form a heterodimer’. A visual way to identify preceding and subsequent reactions is to simply follow the connecting lines on the pathway diagram.

Exercise 3
1. The reaction node for this reaction is a solid circle, representing a binding reaction.
2. The reaction node symbol for this reaction is an open square, representing a ‘transition’ reaction.
3.
You could drag the diagram around to find this reaction, but it’s easier to use the pathway hierarchy. This reaction is in a subpathway of Effects of PIP2 hydrolysis, Arachidonate production from DAG, so first you have to open the subpathway with the + symbol. Click on the reaction name to find it in the diagram. The descriptive name for the catalyst is the label for the catalyst object, Monoglyceride lyase. To find the UniProt ID click on monoglyceride lyase to see details in the Details pane – the UniProt ID is listed as Reference entity – Q99685. Click on the reaction node to see details of the reaction in the Details pane; the outputs are Arachidonate and Glycerol.

Exercise 4
1. Left-clicking to select the sub-pathway object for ‘Intrinsic Pathway for Apoptosis’ does not open the diagram. To do this either a) right-click and select the option Go To Pathway, or b) click on the subpathway name, highlighted in the pathway hierarchy on the Pathways tab.
2. ‘Intrinsic Pathway for Apoptosis’ has sub-pathways; click on the + symbol to reveal them. When you click on the sub-pathway ‘BH3-only proteins associate with and inactivate anti-apoptotic BCL-2 members’ the 3 reactions in this sub-pathway are selected on the pathway diagram, indicated by green boxes around the reaction nodes.
3. Clicking on ‘Sequestration of tBID by BCL-2’ will select just this reaction and centre the pathway diagram on it (unless you are zoomed out – if you were, try zooming in and dragging the diagram so you can’t see this reaction, then click on it in the hierarchy again).

Exercise 5
1. ‘Activated type I receptor phosphorylates R-SMAD directly’ is part of the pathway ‘Signaling by TGF beta’.
2. The answer to this question is in the Details pane if you click on the reaction in the pathway hierarchy, under Cellular compartment – early endosome membrane.
3. The associated GO molecular function is ‘transmembrane receptor protein serine/threonine kinase activity’.
4.
The reference is Souchelnytski et al. 2002.
5. Yes for dogs, not for yeast; this can be answered by looking at the detail ‘Equivalent event(s) in other organism(s)’.

Exercise 6
1. Gene Expression
2. 413 genes in the pathway, 174 represented.
3. Chromosome Maintenance is higher in the list than Signaling by Wnt because the former has a sub-sub-sub-pathway, Telomere C-strand (Lagging Strand) Synthesis, that has a more significant probability score than that of Signaling by Wnt. The top-level pathway inherits this from the sub-pathway, pushing it up the list.
4. Any answer that suggests ‘something that increases cell growth and division’ is correct!

Exercise 7
1. Complement factor B is yellow, indicating that it has been inferred to exist in rat.
2. You can answer this by clicking on the box for Complement factor B and looking at its details – look for ‘Entities deduced on the basis of this entity’ – this lists the other species that have been inferred to have Complement factor B.
3. Complement factor 2 is blue, indicating that it has not been inferred to exist in rats.
4. C3b is black because it is a complex. Right-click on it and select the option ‘View participating molecules’ to see which are predicted to exist in rat. There are 2 components, both blue, so not predicted to exist in rat.
5. Calcium is grey because it is a small molecule; species comparison is not relevant. Other pathway objects that have no UniProt ID will also be grey.

Exercise 8
1. 49 proteins in the pathway.
2. 45 have expression values in the dataset.
3. It’s easier to see this if you zoom out so all the pathway is visible. I think it’s HR23B, top left of the diagram, but I have not checked all the complexes...
4. DNA-directed RNA polymerases I, II, and III 17.1 kDa polypeptide (RPB17)
5. The probe ID is displayed if you mouse over the cell; it was the identifier used in this dataset – 209302_at.

Exercise 9
Open the pathway diagram for Netrin-1 Signaling.
1. SHP2 has 6 interactors.
2.
Click on the line between SHP2 and Adapter protein GRB2 to open a page describing the interaction at the source database, IntAct. This interaction has been identified 3 times, but if you look at the associated PubMed IDs, there is only 1; the paper describes 3 different technologies that identify the same interaction.
3. SRC has more than 50 interactors, indicated by 50+ in the interaction count box (you need to be zoomed in to see this). You can see these 50 in the table within the Analyze, Update and Annotate panel; if you export the interactions you will get all interactions available from the selected source, in this case IntAct.
4. UNC5B has no interactors; a prompt tells you this.
5. There is a Clear button in the Analyze, Update and Annotate panel.

Exercise 10
Click on Choose Database and select Reactome, then click on Choose Dataset and select Pathway. Add a Filter, limiting to pathways containing a list of IDs, select Uniprot IDs from the dropdown list, and enter P17480 in the box. The query returns 9 pathways. Copy the Pathway Stable ID, REACT_2232, change the filter so it limits by Pathway Stable ID, and enter REACT_2232 in the box. Add the Attribute Protein UniProt ID. You should have 2 results returned. You will be asked to save your chosen download format to your computer. 2 lines of output.

Printable Key to Reactome Pathway Diagrams