Interaction

IntAct
A database of Molecular Interactions
Steve Jupe
(sjupe@ebi.ac.uk)
What are protein-protein interactions?
What data are we dealing with?
Example technique: yeast two hybrid
Why are we interested in interactions?
1. As a means of precisely understanding a protein's role
inside a specific cell type
2. Guilt by association: it may be the only means of
predicting a protein's function
3. As building blocks for Systems Biology
The scope of IntAct data
Nucleic acids
Proteins
Transcriptomics
Small compounds
IntAct goals & achievements
1. Define a standard for the representation and
annotation of molecular interaction data
2. Provide a public repository
http://www.ebi.ac.uk/intact
ftp://ftp.ebi.ac.uk/pub/databases/intact
3. Populate the repository with experimental data from
project partners and curated literature data
4,200+ distinct publications, 228,000+ binary interactions,
68,000+ proteins imported from UniProt
4. Provide modular analysis tools
search & advanced search, hierarchView, pay-as-you-go, MiNe…
5. Provide portable versions of the software to allow
installation of local IntAct nodes.
Known installations: AstraZeneca, GSK, MERCK, MINT, Proteome Center of Shanghai
IntAct Curation
[Figure: "Lifecycle of an Interaction": publication (full text or abstract) → manual curation by a curator (curation manual, CVs) → nightly sanity checks with reports → review by a super curator (accept/reject) → public web site and FTP site; data shared with IMEx partners (MINT, DIP, MatrixDB)]
Public data
• All data is manually curated by expert curators
• Curation manual rigorously followed
• All curated data is reviewed by a senior curator
• All data is made available on the FTP site:
ftp://ftp.ebi.ac.uk/pub/databases/intact
(!) data updated every week
(!) format available:
Interaction space
Realistically, one publication per working day per curator
Only a fraction of all published interactions is captured in interaction databases
The end is not in sight: the interaction space is still vastly under-sampled
Multiple observations increase confidence
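As a rough illustration of why multiple observations matter, the sketch below counts how many distinct publications support each interactor pair. The tuple layout and identifiers are hypothetical, and this is a plain evidence count, not IntAct's own confidence score.

```python
from collections import defaultdict


def evidence_counts(evidences):
    """Count distinct publications supporting each unordered interactor pair.

    `evidences` is an iterable of (interactor_a, interactor_b, publication)
    tuples (a hypothetical layout). A and B are treated as unordered, so
    (P1, P2) and (P2, P1) count towards the same pair.
    """
    pubs_per_pair = defaultdict(set)
    for a, b, publication in evidences:
        pubs_per_pair[frozenset((a, b))].add(publication)
    return {pair: len(pubs) for pair, pubs in pubs_per_pair.items()}


# Hypothetical evidence records; identifiers are placeholders.
evidences = [
    ("P1", "P2", "pubmed:1"),
    ("P2", "P1", "pubmed:2"),  # same pair, new publication
    ("P1", "P2", "pubmed:2"),  # same pair, same publication: counted once
    ("P1", "P3", "pubmed:1"),
]
counts = evidence_counts(evidences)
print(counts[frozenset(("P1", "P2"))])  # 2
print(counts[frozenset(("P1", "P3"))])  # 1
```

Using a set per pair means repeated curation of the same publication does not inflate the count; only independent publications do.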
Christian Kohler
A very detailed data model
• Support for detailed features,
e.g. definition of the interaction interface, PTMs
• Remapped at every UniProt update
Interacting domains
Overlay of ranges on sequence:
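The idea of overlaying feature ranges on a sequence can be sketched as follows. The sketch assumes 1-based, inclusive start/end positions and uses a made-up sequence and ranges; it illustrates the overlay only, not how IntAct itself renders features.

```python
def overlay(sequence, ranges, mark="^"):
    """Return the sequence with a marker track underneath it.

    `ranges` is a list of (start, end) pairs, assumed 1-based and inclusive.
    """
    track = [" "] * len(sequence)
    for start, end in ranges:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"range {start}-{end} lies outside the sequence")
        for position in range(start - 1, end):  # convert to 0-based indices
            track[position] = mark
    return sequence + "\n" + "".join(track)


# Hypothetical sequence with two hypothetical feature ranges.
print(overlay("MKTAYIAKQR", [(2, 4), (8, 9)]))
```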
Controlled vocabularies
• Why do we use them?
• Many ways to write the same thing:
yeast two hybrid, Y2H, 2H, two-hybrid, …
• Full integration of the Proteomics Standards Initiative
Molecular Interactions (PSI-MI) ontology
• Over 1,500 terms defined and cross-referenced
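A minimal sketch of what a controlled vocabulary buys: many free-text spellings collapse onto one term identifier. The synonym table is a hypothetical subset written for illustration; in PSI-MI the term "two hybrid" is MI:0018, but the real ontology carries its own synonym list.

```python
from typing import Optional

# Hypothetical synonym table; the real PSI-MI ontology defines its own synonyms.
METHOD_SYNONYMS = {
    "two hybrid": "MI:0018",
    "yeast two hybrid": "MI:0018",
    "two-hybrid": "MI:0018",
    "y2h": "MI:0018",
    "2h": "MI:0018",
}


def normalize_method(label: str) -> Optional[str]:
    """Map a free-text method label to a PSI-MI identifier, if known."""
    key = " ".join(label.lower().split())  # case- and whitespace-insensitive
    return METHOD_SYNONYMS.get(key)


for label in ["Yeast two hybrid", "Y2H", "  two-hybrid ", "TAP"]:
    print(repr(label), "->", normalize_method(label))
```

Unknown labels return `None` rather than a guess, so they can be routed to a curator instead of being silently mis-annotated.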
How to deal with complexes
• Some experimental methods generate data for complexes,
e.g. tandem affinity purification (TAP)
• To convert a complex into sets of binary interactions, two
algorithms are available:
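The two algorithms are commonly called the spoke and matrix expansion models. The sketch below shows both under simplifying assumptions: participants are plain strings, each appears once, and the bait is known (as it is for TAP). It is an illustration, not IntAct's own implementation.

```python
from itertools import combinations


def spoke_expansion(bait, participants):
    """Spoke model: one binary interaction between the bait and each prey."""
    return [(bait, p) for p in participants if p != bait]


def matrix_expansion(participants):
    """Matrix model: one binary interaction for every pair of participants."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: bait A co-purifies with B, C and D.
complex_members = ["A", "B", "C", "D"]
print(spoke_expansion("A", complex_members))
# [('A', 'B'), ('A', 'C'), ('A', 'D')]
print(len(matrix_expansion(complex_members)))
# 6
```

For a complex with n members, the spoke model yields n - 1 pairs while the matrix model yields n(n - 1)/2, so neither should be read as direct evidence of a physical binary contact; this is why spoke-expanded interactions are worth filtering when only pairwise experiments are wanted.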
Data distribution: PSICQUIC
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise access and retrieval of data from
molecular interaction databases.
• Widely implemented by many interaction databases.
• Based on PSI standard formats (PSI-MI XML and MITAB)
• Not limited to protein-protein interactions, also drug-target
interactions and simple pathway data
• A server (web service) provides data for all members
• A registry lists the resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
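As a sketch of how a client might use a PSICQUIC service, the code below builds a REST query URL and splits a returned MITAB 2.5 line into its 15 columns. The base URL, the `/query/` path and the `format`, `firstResult` and `maxResults` parameters follow the PSICQUIC REST conventions as we understand them and should be checked against the documentation and registry; the MITAB line is a made-up example. No network request is made.

```python
from urllib.parse import quote, urlencode

# Assumption: REST base of IntAct's PSICQUIC service. Check the PSICQUIC
# registry for each provider's current address before relying on it.
INTACT_REST = ("https://www.ebi.ac.uk/Tools/webservices/psicquic/"
               "intact/webservices/current/search/")

# The 15 columns of MITAB 2.5, in order.
MITAB25_COLUMNS = [
    "idA", "idB", "altA", "altB", "aliasA", "aliasB",
    "detmethod", "pubauth", "pubid", "taxidA", "taxidB",
    "type", "sourcedb", "interaction_id", "confidence",
]


def query_url(base, miql, first=0, max_results=100, fmt="tab25"):
    """Build a PSICQUIC REST query URL for a MIQL query string."""
    params = urlencode({"format": fmt, "firstResult": first,
                        "maxResults": max_results})
    return f"{base}query/{quote(miql, safe='')}?{params}"


def parse_mitab25(line):
    """Split one MITAB 2.5 line into a dict of value lists ('-' = empty).

    Naive: splits multi-valued fields on '|' without handling quoting, and
    ignores any columns beyond the first 15 (as added in later MITAB versions).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, got {len(fields)}")
    return {name: ([] if value == "-" else value.split("|"))
            for name, value in zip(MITAB25_COLUMNS, fields)}


print(query_url(INTACT_REST, "ERK AND species:3702"))

# Hypothetical MITAB 2.5 line with placeholder identifiers.
row = "\t".join([
    "uniprotkb:P1", "uniprotkb:P2", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Author et al. (2001)", "pubmed:1",
    "taxid:3702", "taxid:3702", "-", "-", "-", "-",
])
record = parse_mitab25(row)
print(record["idA"], record["taxidB"], record["altA"])
```

Because every PSICQUIC provider answers the same URL pattern with the same MITAB columns, the same two functions can be pointed at any service listed in the registry by changing only the base URL.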
PSICQUIC: distributing data over multiple sources
PSI-MI XML format
• Community standard for Molecular Interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK,
HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U.
Bordeaux, U. Cambridge, and others
• Version 1.0 published in February 2004
The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data.
Henning Hermjakob et al., Nature Biotechnology 2004.
• Version 2.5 published in October 2007
Broadening the horizon - Level 2.5 of the HUPO-PSI format for molecular interactions.
Samuel Kerrien et al., BMC Biology 2007.
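A minimal sketch of reading PSI-MI XML with Python's standard library. The document below is a heavily simplified, hypothetical 2.5-style entry in which interactors are declared once and then referenced from participants; real files carry far more annotation, and files in the expanded form (interactors inlined in each participant) are not handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified PSI-MI XML 2.5-style document.
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry>
    <interactorList>
      <interactor id="1"><names><shortLabel>p1</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>p2</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="3">
        <participantList>
          <participant id="4"><interactorRef>1</interactorRef></participant>
          <participant id="5"><interactorRef>2</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>"""


def interaction_labels(xml_text):
    """Return, for each interaction, the short labels of its participants.

    '{*}' matches any XML namespace (ElementTree, Python 3.8+), so the
    sketch does not depend on the exact namespace URI.
    """
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.findall("{*}entry"):
        labels = {
            interactor.get("id"): interactor.findtext("{*}names/{*}shortLabel")
            for interactor in entry.findall("{*}interactorList/{*}interactor")
        }
        for interaction in entry.findall("{*}interactionList/{*}interaction"):
            refs = [participant.findtext("{*}interactorRef")
                    for participant in interaction.findall(
                        "{*}participantList/{*}participant")]
            result.append([labels.get(ref, ref) for ref in refs])
    return result


print(interaction_labels(SAMPLE))  # [['p1', 'p2']]
```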
MIMIx (Minimum Information about a Molecular Interaction eXperiment)
• Experiments
  • Interaction detection method (e.g. yeast two hybrid)
  • Participant detection method (e.g. mass spectrometry)
  • Host organism
• Interactions
  • Interactors
    • Identifiers from public databases
    • Species of origin
    • Biological/experimental roles (e.g. enzyme, target /
    bait, prey)
  • Confidence
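The MIMIx items above can be treated as a checklist. The sketch below uses a simplified, hypothetical data model that mirrors the bullets (it is neither the MIMIx specification nor an IntAct schema), treats every listed item as required, and reports which items a record leaves empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    interaction_detection_method: Optional[str] = None  # e.g. "MI:0018"
    participant_detection_method: Optional[str] = None
    host_organism: Optional[str] = None


@dataclass
class Interactor:
    identifier: Optional[str] = None  # identifier from a public database
    species: Optional[str] = None  # species of origin
    biological_role: Optional[str] = None
    experimental_role: Optional[str] = None


@dataclass
class Interaction:
    experiment: Experiment
    interactors: List[Interactor] = field(default_factory=list)
    confidence: Optional[str] = None


def missing_items(interaction: Interaction) -> List[str]:
    """List the checklist items left empty (None) in an interaction record."""
    missing = [f"experiment.{name}"
               for name, value in vars(interaction.experiment).items()
               if value is None]
    for index, interactor in enumerate(interaction.interactors):
        missing += [f"interactors[{index}].{name}"
                    for name, value in vars(interactor).items()
                    if value is None]
    if interaction.confidence is None:
        missing.append("confidence")
    return missing


# Hypothetical record with placeholder identifiers.
record = Interaction(
    experiment=Experiment("MI:0018", None, "taxid:4932"),
    interactors=[
        Interactor("uniprotkb:P1", "taxid:3702", "unspecified role", "bait"),
        Interactor("uniprotkb:P2", "taxid:3702", None, "prey"),
    ],
)
print(missing_items(record))
# ['experiment.participant_detection_method', 'interactors[1].biological_role', 'confidence']
```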
IMEx:
The International Molecular Exchange Consortium
• Group of major public interaction data
providers sharing curation effort: DIP, IntAct, MINT,
MPact, MatrixDB, Molecular Connections, InnateDB, MPIDB and
BioGRID
• Independent molecular interaction resources
• Common curation standards for detailed curation
• Common data formats (PSI-MI XML, PSICQUIC)
• Common accession number space
• Coordinated & non-redundant curation
• In production mode since February 2010
• Since March 2009 supported by the European
Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with
additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)
www.imexconsortium.org
Tutorial
http://www.ebi.ac.uk/intact
IntAct: Search and results
[Screenshot callouts: Other PSICQUIC services, IMEx data, Interaction details, Interactors, UniProt Taxonomy, PubMed, Method (PSI-MI CV), Complex?]
IntAct: Search and results
[Screenshot callouts: Export, Custom columns, Filters]
Exercise 1
• In the search panel, type the query: CDK8. How many binary interactions
are returned?
• Which species are present in the results? (hint: look at Browse by
taxonomy)
• How would you filter these results so that only experimentally determined
pairwise interactions are displayed?
• Type the query: “transcription factor”. What types of interactor does it find?
(hint: click on the Lists tab)
• In the search panel, type the query: chlorophyll. Click on “Change Columns
Displayed” and deselect the two Aliases columns, select the First Author
column, then click the Update button. What changes occur in the interactions
table? Who is the first author for the “Photosystem I subunit VII-ps1a1”
interaction?
Interaction details
Exercise 2
• In the search panel, type: ERK AND species:3702. Click on the
details symbol for interaction 1. What is the host organism for this
experiment?
• Which journal was it published in, and in what year?
• How many interactions in total does IntAct have from this
publication? (hint: look to the right of the Publications section)
The Browse tab
Exercise 3
• In the search panel, type: Phosphopentokinase, click on the
Browse tab, then click the By UniProt taxonomy link. How many
interactions are there involving only Arabidopsis proteins?
• Select the human interactions. Which interaction detection method
is used for the manually curated entry?
• What is the title of the publication for this entry?
• Click on the Browse tab again, then click Back to Browse Options. Click
By Gene Ontology. Where do these interactions occur (i.e. in which
compartment)?
Advanced search: Fields
Filtering options
Add more filtering options
Exercise 4
• In the search panel, type: starch. How many entries are returned?
• Click on the “Show Advanced Fields” button to the right of the Quick
Search box.
• Select the field Organism from the pull-down menu, type in 3702 as
your organism, click Add and search. How many entries are
returned?
• Further refine the search by adding Detection method as two hybrid.
Does this make a difference in the number of interactions found?
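The refinements in Exercise 4 can also be written as a single query string, in the same style as the `ERK AND species:3702` query used in Exercise 2. The helper below is hypothetical. The `species` field comes from that query, while `detmethod` as the field name for the detection method is an assumption based on the MIQL conventions used by PSICQUIC, so results may differ from the Advanced Fields panel.

```python
def miql(*terms, **fields):
    """Combine free-text terms and field:value clauses with AND.

    Values containing spaces are quoted so they are read as one phrase.
    """
    clauses = list(terms)
    for name, value in fields.items():
        value = str(value)
        clauses.append(f'{name}:"{value}"' if " " in value else f"{name}:{value}")
    return " AND ".join(clauses)


print(miql("starch", species=3702))
# starch AND species:3702
print(miql("starch", species=3702, detmethod="two hybrid"))
# starch AND species:3702 AND detmethod:"two hybrid"
```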
The List tab - Proteins
List tab - Compounds
Exercise 5
• In the search panel, type: mitosis and click on the Lists tab.
• How many proteins are found? How many small compounds?
• Click on the DASTY links for various proteins. Notice how it shows
features such as mutation sites, post-translational modifications and
binding sites.
• Return to the Lists tab. Click on the Compounds sub-tab.
• Click on the ChEBI link for gdp. Is its molecular mass below 500
daltons?
Viewing results in other resources
Exercise 6
• Search for: GPCR and click on the Lists tab, then click the Domains button.
You get an error. Why is this?
• Fix the cause of the error, and click the Domains button again. Which
domain is prominent in GPCRs?
• Click on the Pathways button. Which resource does this take you to? Which
pathways are overrepresented?
Ontology search I
Ontology search II
Exercise 7
• Click on the Search tab and scroll down to the ontology section. Start to
enter the word stamen slowly. What do you notice?
• How many different stamen processes does IntAct recognize?
• Which ontologies are supported by IntAct?
• Which of these ontologies know something about stamen processes?
Using PSICQUIC services
Other PSICQUIC services
IMEx data
Exercise 8
• In the search panel, type the query: arabidopsis. How many binary
interactions are returned?
• What is the total number of interaction evidences from other databases?
• How many interaction evidences come from IMEx databases?
• Click on the link to the IMEx hits. Which other database(s) has/have hits for
this query?
• Look at the interactions from the MINT database. What information is
available that is not available in IntAct?
Graph tab I
Graph tab II
Exercise 9
• In the search panel, type: O81905, click on the Graph tab, then click
the Cytoscape link. If Cytoscape does not start, ask your neighbour:
not all computers have the permissions to do this.
• On the left-hand side of the Cytoscape window, select the VizMapper
tab.
• Under the drop-down list ‘Current Visual Style’, choose ‘Sample 1’.
• Expand the edge color node and set detection method to discrete
mapping.
• To color interactions by detection method, right-click and choose
Generate discrete values → Rainbow 1.
• Now experiment with other features of this visualization tool!
Answers!
Exercise 1
• 57 interactions returned
• Browse by taxonomy lists the species in the results
• Filtering out spoke-expanded interactions leaves 12 results
• Finds proteins, chemical compounds and nucleic acids
• Naver et al. (2001)
Exercise 2
• Host organism is yeast
• Proc Natl Acad Sci USA, 2008
• 8 interactions
Exercise 3
• 2 interactions from Arabidopsis
• Enzymatic study
• PubMed 15352244: “New targets of Arabidopsis
thioredoxins revealed by proteomic analysis”
• The apoplast compartment
Exercise 4
• 334 entries
• Adding Arabidopsis (3702) as the organism leaves 20 results
• Adding two hybrid as the detection method leaves 15 entries
Exercise 5
• 5753 proteins
• 23 small compounds
• Mass 443.2 Da, i.e. below 500
Exercise 6
• Error: you need to select some or all of the list before clicking the button
• The 7TM domain is prominent in GPCRs
• Reactome
• GPCR signalling
Exercise 7
• The word stamen auto-completes
• 4 stamen processes are recognised
• Gene Ontology, PSI-MI, ChEBI, UniProt Taxonomy and InterPro
• GO
Exercise 8
• 231 interactions found
• 117,901 interaction evidences (PSICQUIC)
• 663 interaction evidences from 2 other IMEx databases
• DIP and MINT databases have hits for this query
• Confidence values in MINT
Download