Sandra Orchard and Samuel Kerrien (26/05/10)
May 2010

A tour of the IntAct Portal
http://www.ebi.ac.uk/intact

The IntAct curation policy is to provide the user with all the experimental detail described in the originating paper. All entries are fully IMEx- [1] and MIMIx-compliant [2] and provide extra levels of detail beyond these minimum requirements. To do so, IntAct makes extensive use of controlled vocabularies: primarily PSI-MI [3], to describe the technical details of the experiment, binding sites, protein tags and mutations, and Gene Ontology [4], to describe the subcellular location in which an interaction has been shown to occur or the function of an enzyme in an enzyme/substrate assay. Interacting molecules are systematically mapped to stable identifiers from public databases such as UniProtKB for proteins [5], ChEBI for small molecules [6], Ensembl for genes [7] and the DDBJ/EMBL/GenBank nucleotide databases for nucleic acids [8]. Features within a molecule, such as a binding site on a protein, are mapped to the sequence/structure given in the underlying database and remapped should a new version of the underlying sequence be released. Binding sites are also cross-referenced to the InterPro database [9] whenever possible.

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

1. Overview of the portal’s layout

This web site uses tabs to give you quick access to the various views of the data. We will now briefly review these tabs and how they relate to one another:

Home: presents the IntAct project and gives access to the documentation and to the download section of the project.

Search: allows users to search for interactions. Every search request opens the Interaction tab to show the results.
Interactions: shows the interactions selected by a user query. Binary interactions are shown in a table in which the user can configure the columns displayed. Various links allow the user to gather more information:
- about the interacting molecules, for instance by clicking on the Dasty2 icon, which opens the Molecule View tab,
- about the interaction, by clicking on the magnifying glass in the first column of each interaction displayed, which opens the Interaction Details tab.
This tab also allows users to download the result of their query in standard formats such as PSI-MI XML or PSIMITAB.

Browse: allows users to list the molecules interacting in the currently selected set of interactions. Users can narrow down the current dataset by browsing the GO ontology and applying a filter by selecting a term. Finally, one can link out to other resources using the current set of molecular interactions, for instance by sending the interacting proteins to the Reactome SkyPainter to see in which pathways the molecules are also known to play a role.

List: allows users to browse the list of interacting molecules by type (protein, small molecule and nucleic acid). A subset of these molecules can be selected and used to select a new set of interactions or to link out to other resources.

Interaction Details: only activates once an interaction has been selected from the Interaction tab. It shows the full details captured by our curators, such as textual annotation, additional cross-references, the complete list of participants (as opposed to the spoke-expanded interaction), binding domains and other experimental features such as sites of mutational analyses.

Molecule View: only activates when you click on the icon found in the Interaction and List tabs. It shows more information about the interacting molecules by gathering data from IntAct and the reference databases and aggregating them using a DAS client. Thus, the protein sequence can be annotated and the annotations overlaid.
Also, the 3D structure can be visualised when available from PDB, with the sequence annotation overlaid on it.

Graph: shows the current interaction network in our simple viewer and gives users the option to open their query in Cytoscape, thus allowing more interactive manipulation of interaction networks.

1. Searching Interactions (Search Tab)

As you can see, this tab offers a targeted choice of ways to perform your queries. Please note that the examples provided in this tab are live links, so you can simply click them to see the resulting interaction sets.

a. Using the Quick Search

In this search panel you are free to type anything that might relate to interactions, whether it is a property of the interactors (gene name, accession number, GO term…) or something more specific to the interaction, such as publication ID, authors, experimental detection method, …

Try the query: imatinib

This is a drug for which we have curated a number of interactions. Once you press the search button you should be taken to the Interaction tab, which lists 130 binary interactions.

Then try the query: lck

As you have not specified a species, you will get results for the human, mouse and rat proteins expressed from the gene with this name. You will also get additional results from both the PSICQUIC and IMEx services with this query.

Click on “Your query matches n interaction evidences from n other databases”. Select the results from ChEMBL; this is a drug-target database, and LCK has been a popular target for the pharmaceutical industry. Return to IntAct.

If you want to construct more complex queries, we recommend you take a look at the Molecular Interaction Query Language (MIQL), accessible from the quick search panel.

Try the query: species:yeast AND type:direct

This query selects all interactions involving yeast interactors that have been shown to be direct interactions.
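
For readers who prefer to assemble such queries in a script before pasting them into the Quick Search box, the following is a minimal Python sketch. The helper name build_miql is hypothetical (it is not part of IntAct or MIQL), and the rule of wrapping multi-word values in double quotes is an assumption based on the usual phrase syntax of field:value query languages; only the field names species and type are taken from the example above.

```python
from urllib.parse import quote


def build_miql(terms, operator="AND"):
    """Join (field, value) pairs into one field:value query string.

    Hypothetical helper for illustration only; it is not an IntAct API.
    """
    parts = []
    for field, value in terms:
        # Assumption: a value containing whitespace is wrapped in double
        # quotes so that it is treated as a single phrase.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return f" {operator} ".join(parts)


query = build_miql([("species", "yeast"), ("type", "direct")])
print(query)         # species:yeast AND type:direct
print(quote(query))  # percent-encoded form, suitable for embedding in a URL
```

The printed string is the same query as the one typed by hand above; the percent-encoded form is only useful if you want to embed the query in a link.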
If you customize the column display of the Interaction tab, you will see that not only “direct interaction” has been selected but also its child terms in the PSI-MI ontology. This type of search is difficult, however, if you do not have an understanding of MIQL, and it may be easier to perform using the Advanced Search (see later).

b. Using the Ontology Search

Open the Search Tab. This panel is specialised to give you easy access to ontology search. So far you can search on 4 ontologies:
- Gene Ontology
- InterPro
- PSI-MI
- ChEBI

Whenever you start typing a query in this search panel, the system will search as you type and propose a list of matching controlled vocabulary terms. You can then select one of them to retrieve the matching interactions.

Type: cancer in the Ontology Search box. You will be presented with a few choices; please note that each term is followed by the count of matching interactions in the IntAct database. Select a term using the keyboard cursor keys, complete the search and you will be taken to the Interaction tab.

Return to the Search Tab.

c. Limiting the scope of search results using the filter panel

Now that you know how to search for molecular interaction data, we will see how to limit the scope of your searches. After the preliminary search, your results are a mixture of experimental binary data and binary data derived by performing a spoke expansion of co-complex data. IntAct allows you to filter the expanded binaries out of your final dataset, should you wish. By ticking the corresponding boxes, you will only search within these categories.

Try the query: +species:yeast +type:direct

Under the heading tabs, there is a statement:

> n binary interactions were found. n of them are originated from spoke expanded cocomplexes and you may want to filter them.

Click on filter and see the effect on your final number of interactions.

d. Advanced Search

Clicking on the “Fields” button to the right of the Quick Search box opens the Advanced Search, which lets you specify one or more fields to search in and builds the query for you as you progress.

Click on “Fields” and select “Organism” from the pull-down menu, then type in Human as your organism. Further refine the search by adding “Detection method” as “Experimental”; you should see a slight drop in the number of interactions on the Interaction tab as some inferred data is filtered out. Finally, actively filter out all the two hybrid data by checking the NOT box and selecting “Detection method” as “two hybrid”; you should lose about 10,000 interactions as all the two hybrid data (and children thereof) is removed.

2. The new Interaction browser (Interaction Tab)

In this tab, we display the list of interactions that you have selected using one of our search features. Although our data are annotated to accurately reflect the interactions reported in the scientific literature, this view shows the data as binary interactions. Whenever the data was reported as a co-complex involving more than two molecules, we store it as such in the IntAct database and post-process it so that the portal can show it as binary interactions. This post-processing uses the Spoke Expansion model, which connects the bait to each of the preys. At any moment you can choose to display the expansion column in this view in order to see which interactions are spoke expanded and which are not.

Perform a Quick Search using the UniProtKB accession number O96017.

a. Features to Note

- Icons in the first two columns indicate the roles of the molecules in the interaction.
- Icons with hyperlinks lead to the reference databases of the interacting molecules, as well as to our molecule viewer, Dasty2, which will give you more information about the molecule (once clicked, it opens the Molecule View tab). Clicking the UniProt icon will open the entry on the UniProtKB website.
- Users can retrieve their interaction set in standard formats such as PSI-MI XML and PSIMITAB by using the “Export to” function.

Scroll down and see if you can find an example of an isoform (hint: look for O96017-*). Open this molecule in the UniProt view and find the sequence of this particular isoform. Return to the entry.

b. Configuring the view to your needs

In the header of the interaction table you will find a button, ‘Change Column Display’, that shows you all the columns available and allows you to update the currently selected set. Add “First author” to the view and then count how many of these entries come from Dozier et al. (2004).

c. Downloading the data in standard formats

In the header of the interaction table you will find a drop-down list that contains all the formats currently supported for downloading the interaction data. Select one of them and click the export button next to the list. Please note that PSI-MI XML is only available when the interaction set contains no more than 1000 interactions.

d. Opening the interaction details

Clicking on the magnifying glass in the first column of the interaction table will open the corresponding interaction in the Interaction Details tab, giving you access to more details of the manually curated record.

Open interaction EBI-1263190 (the interaction between Cds1 and RNF53). How many molecules are involved in this one-bait, multiple-prey interaction? What post-translational modification of O96017 is recorded as potentially playing a role in the interaction (note: check the experimental feature)?

3. Browsing (Browse Tab)

This tab gives you access to more content, based on the currently selected set of interactions. As you can see, in this tab we have grouped features under the interacting molecule types. Please note that some of these functionalities will only allow you to include up to 200 molecules; if you exceed this number you will see the warning icon.
Note: you will need to have pop-up windows enabled for this to work.

Now let’s look at the features available to you:

a. Listing the molecules involved by specific type

Clicking on ‘List All’ under each molecule type will open the corresponding list of molecules in the List tab.

b. Limiting the scope of the current dataset with the GO ontology

Allows users to browse the GO hierarchy as a tree and select terms in order to narrow down their dataset. Once a term is selected, you are taken back to the Interaction tab to review your dataset.

c. Limiting the scope of the current dataset with the ChEBI ontology

Allows users to browse the ChEBI hierarchy as a tree and select terms in order to narrow down their dataset. Once a term is selected, you are taken back to the Interaction tab to review your dataset.

d. Bulk linking to third party resources by using involved proteins

Proteins by InterPro domain: opens the InterPro domain search and shows in a single display the proteins interacting in your interaction set.

Proteins by Reactome pathway: sends your proteins to the Reactome SkyPainter, which will show you the pathways in which these molecules are known to play a role.

Proteins by Chromosomal location: sends your list of proteins to Ensembl’s Karyotype viewer and overlays the proteins on the chromosomes.

Proteins by mRNA expression: sends your set of proteins to the ArrayExpress Atlas, which will show the known gene expression based on experimental studies.

Select all the proteins in your current set, and visualise them using Reactome SkyPainter. Which pathways are most heavily represented in this interaction network?

Return to the Browse tab and use the Gene Ontology to see whether the Biological Process terms that these proteins are annotated to reflect the results you obtained from Reactome.

4. Using molecule Lists (List tab)

This tab shows the list of molecules involved in the currently selected set of interactions, by interactor type.
Further operations are available from each subtype’s table. However, should you need to display the whole list at once, you can do so by selecting the corresponding option in the drop-down list placed at the top right-hand side of the table.

a. Linking a selected set of molecules to third party resources

You can select a subset of the currently displayed molecules by using the tick boxes and then click one of the buttons placed in the table header to open third party resources (similar to the ones already shown in the Browse tab).

b. Building a new interaction set based on a selection of molecules

Furthermore, you can search for all interactions involving your selected molecules by clicking the ‘Search interactions’ button.

5. Visualizing your dataset (Graph Tab)

You can visualize the currently selected set of interactions by opening the Graph tab. So far this functionality is only available if you have up to 300 interactions, but we are working on extending (possibly lifting altogether) this limitation. You will find two ways to view your dataset:

a. Simplified network representation

This is shown by default when the tab opens. It shows a static representation of your interaction network. This obviously presents limitations when it comes to dealing with large datasets.

b. Taking it further using the Cytoscape integration

On the right-hand side panel you will find the Cytoscape icon that, once clicked, loads Cytoscape using Java Web Start technology (which must be enabled on your computer for this to work properly). Once Cytoscape has loaded, your dataset is imported and your network displayed. Please refer to the Cytoscape documentation for more information about using this powerful tool.

Open the O96017 interaction network in Cytoscape. On the left-hand side of the network are 4 tabs (Network, VizMapper, Editor and Filters). Select the VizMapper tab and, under the drop-down list ‘Current Visual Style’, choose ‘Sample 1’.
Hint: You can also beautify the network by applying a layout to it. Example: Layout > yFiles > Circular

In the example, the interaction edge has a different colour depending on the detection method. You can achieve this effect by following these steps:

Step 1. In the Visual Mapping Browser (left box), expand the Edge Color node.
Step 2. Select detection method as the value for the Edge Color node.
Step 3. Select discrete mapping as the value for the Mapping type. You should see the list of detection methods below.
Step 4. You could choose your favourite colours for the detection methods, but Cytoscape has an easy way to assign different colours to the values. To do so, right-click on top of discrete mapping and click on Generate Discrete Values > Rainbow 1.

There you have it. You can use this powerful system to show labels, change colours and generate a beautiful network.

References

1. Orchard, S., Kerrien, S., Jones, P., Ceol, A., Chatr-Aryamontri, A., Salwinski, L., Nerothin, J., Hermjakob, H. (2007) Submit your interaction data the IMEx way: a step by step guide to trouble-free deposition. 7 Suppl 1, 28-34
2. Orchard, S., Salwinski, L., Kerrien, S., Montecchi-Palazzi, L., Oesterheld, M., Stümpflen, V., Ceol, A., Chatr-aryamontri, A., Armstrong, J., Woollard, P., et al. (2007) The Minimum Information required for reporting a Molecular Interaction Experiment (MIMIx). Nat. Biotechnol, 25, 894-898
3. Kerrien, S., Orchard, S., Montecchi-Palazzi, L., Aranda, B., Quinn, A.F., Vinod, N., Bader, G.D., Xenarios, I., Wojcik, J., Sherman, D., et al. (2007) Broadening the horizon-level 2.5 of the HUPO-PSI format for molecular interactions. BMC biology, 5, 44
4. Blake, J.A., Harris, M.A. (2008) The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis. Current protocols in bioinformatics, 7, 7.2
5. The UniProt Consortium (2009) The Universal Protein Resource (UniProt) 2009. Nucleic acids research, 37, D169-174
6. Degtyarenko, K., Hastings, J., de Matos, P., Ennis, M. (2009) ChEBI: an open bioinformatics and cheminformatics resource. Current protocols in bioinformatics, 14, 14.9
7. Hubbard, T.J., Aken, B.L., Ayling, S., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Clarke, L., et al. (2009) Ensembl 2009. Nucleic acids research, 37, D690-7
8. Tateno, Y. (2008) International collaboration among DDBJ, EMBL Bank and GenBank. Tanpakushitsu kakusan koso. Protein, nucleic acid, enzyme, 53, 182-189
9. Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., et al. (2009) InterPro: the integrative protein signature database. Nucleic acids research, 37, D211-215

Further Reading

Kerrien, S., Alam-Faruque, Y., Aranda, B., Bancarz, I., Bridge, A., Derow, C., Dimmer, E., Feuermann, M., Friedrichsen, A., Huntley, R., Kohler, C., Khadake, J., Leroy, C., Liban, A., Lieftink, C., Montecchi-Palazzi, L., Orchard, S., Risse, J., Robbe, K., Roechert, B., Thorneycroft, D., Zhang, Y., Apweiler, R. and Hermjakob, H. (2007) IntAct--open source resource for molecular interaction data. Nucleic acids research, 35, D561-565