Protein-protein Interactions
June 18, 2015

Why PPI?
• Protein-protein interactions determine the outcome of most cellular processes
• Proteins which are close homologues often interact in the same way
• Protein-protein interactions place evolutionary constraints on protein sequence and structural divergence
• Precursor to networks

PPI classification
• Strength of interaction
• Permanent or transient
• Specificity
• Location within polypeptide chain
• Similarity of partners
• Homo- or hetero-oligomers
• Direct (binary) or a complex
• Confidence score

Determining PPIs
Small-scale methods
• Co-immunoprecipitation
• Affinity chromatography
• Pull-down assays
• In vitro binding assays
• FRET, Biacore, AFM
• Structural (co-crystals)

PPIs by high-throughput methods
• Yeast two hybrid systems
• Affinity tag purification followed by mass spectrometry
• Protein microarrays
• Microarrays/gene co-expression: implied functional PPIs
• Synthetic lethality: genetic interactions, implied functional PPIs

Yeast two hybrid system
• Gal4 protein comprises DNA binding and activating domains
• Binding domain interacts with promoter
• Activating domain interacts with polymerase
• Measure reporter enzyme activity (e.g. blue colonies)

Yeast two hybrid system
• Gal4 protein: the two domains do not need to be part of a single protein
• If they come into close enough proximity to interact, they will activate the RNA polymerase
• Two other protein domains (A & B) interact
[Figure: A and B interact; binding domain interacts with promoter; activating domain interacts with polymerase; reporter enzyme activity is measured (e.g. blue colonies)]

Yeast two hybrid system
• This is achieved using gene fusion
• Plasmids carrying different constructs can be expressed in yeast
• Binding domain as a translational fusion with the gene encoding another protein (A) in one plasmid
• Activating domain as a translational fusion with the gene encoding a different protein (B) in a second plasmid
• If the two proteins interact, Gal4 activity is reconstituted, the reporter gene is expressed, and blue colonies form

Yeast two hybrid
Advantages
• Fairly simple, rapid and inexpensive
• Requires no protein purification
• No previous knowledge of proteins needed
• Scalable to high-throughput
• Is not limited to yeast proteins
Limitations
• Works best with cytosolic proteins
• Tendency to produce false positives

Mass spectrometry
• Need to purify protein or protein complexes
• Use an affinity-tag system
• Need efficient method of recovering fusion protein in low concentration

TAP (tandem affinity purification)
[Figure: a PCR product (Spacer, CBP, TEV site, Protein A) is inserted into the chromosome by homologous recombination, giving the fusion protein: Protein, Spacer, CBP, TEV site, Protein A. CBP = calmodulin binding peptide]

TAP process
[Figure: "Taptag simple" by Chandres - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons]

TAP
Advantages
• No prior knowledge of complex composition
• Two-step purification increases specificity of pull-down
Limitations
• Transient interactions may not survive 2 rounds of washing
• Tag may prevent interactions
• Tag may affect expression levels
• Works less efficiently in mammalian cells

Other tags
• HA, Flag and His
• Anti-tag antibodies can interfere with MS analysis
• Streptavidin binding peptide (SBP)
• High affinity for streptavidin beads
• 10-fold increase in efficiency of purification compared to conventional TAP tag
• Successfully used to identify components of complexes in the Wnt/β-catenin pathway
• Used Dsh-2 and Dsh-3 as bait proteins
The KLHL12-Cullin-3 ubiquitin ligase negatively regulates the Wnt-β-catenin pathway by targeting Dishevelled for degradation. Nature Cell Biology 8(4):348-357 (2006)

Binding partners of Bruton's tyrosine kinase
• Role in lymphocyte development & B-cell maturation
Protein Science 20:140-149 (2011)

Databases of protein-protein interactions
• MINT – Molecular Interaction Database
  >240,000 interactions with 35,000 proteins
  Covers multiple species
• DIP – Database of Interacting Proteins (UCLA)
  >79,000 interactions with >27,000 proteins
• CCSB – Proteomics
base interactomes (Harvard)
  Human, viruses, C. elegans
  Some S. cerevisiae unpublished data
• IntAct – EBI molecular interaction database
  Curated data from multiple sources

Integrated Databases of PPIs
MiMI: Michigan Molecular Interactions
• Data merged from several PPI databases; source provenance maintained
• Links to literature sources for the PPI
• Linked to Entrez Gene, InterPro, Gene Ontology
• Includes pathway data
• Various methods of viewing the data
• NOT CURATED: data only as good as source data
http://mimi.ncibi.org

MiMI database

MiMI search results

MiMI Gene Detail
• Gene Ontology
• Interactions
• Pathways

KEGG pathway
• Each protein name is a link to another page
• Arrows & lines provide information about the type of interaction

Other viewing options
• MeSH terms that involve this gene
• PPI with this gene in Cytoscape
• Adaptive PubMed search

• On average, two databases curating the same publication agree on 42% of their interactions.
• Discrepancies between sets of proteins annotated from the same publication are less pronounced, with an average agreement of 62%, but the overall trend is similar
• Better agreement on non-vertebrate model organism data sets than for vertebrates
• Isoform complexity is a major issue
Literature curation of protein interactions: measuring agreement across databases. Turinsky A.L. et al. Database, Vol. 2010, Article ID baq026

iRefWeb
• Web interface to integrated database of protein-protein interactions
• Better review of the records after pulling in the data from the various source databases
• Can search by gene name or various IDs, including batch searches.
• Does not have the pathway and other information, but has a better measure of confidence of PPI
http://wodaklab.org/iRefWeb/

iRefWeb search
• The search will try to match both name and species automatically.
• MI score: the (MINT-inspired) score is a measure of confidence in molecular interactions. For interactions between A and B:
1. Total number of unique PubMed publications that support the interactions
2.
Cumulative sum of weighted evidence from all
3. The cumulative sum of weighted evidence from all interologs, i.e. interactions containing homologous pairs A' and B'.

Interaction detail

STRING database
• Search Tool for the Retrieval of Interacting Genes
• Integrates information from existing PPI data sources
• Provides confidence scoring of the interactions
• Periodically runs interaction prediction algorithms on newly sequenced genomes
• v.10 covers >2000 organisms
http://string-db.org/

Networks in STRING database
• Starting protein
• Networks can be expanded
• 3 indirect interactions
• Information about the proteins

Transferring PPI annotation
• Most of the high-throughput PPI work is done in model organisms
• Can you transfer that annotation to a homologous gene in a different organism?

Defining homologs
• Orthologue of a protein is usually defined as the best-matching homolog in another species
• Candidates with significant BLASTP E-value ($E < 10^{-20}$)
• Having ≥80% of residues in both sequences included in BLASTP alignment
• Having one candidate as the best-matching homologue of the other candidate in corresponding organism

Interologs
• If two proteins, A and B, interact in one organism and their orthologs, A' and B', interact in another species, then the pair of interactions A-B and A'-B' are called interologs
• Align the homologs (A & A', B & B') to each other.
• Determine the percent identity and the E-value of both alignments
• Then calculate the joint identity and the joint E-value
  Joint identity: $J_I = I_{AA'} \times I_{BB'}$
  Joint E-value: $J_E = E_{AA'} \times E_{BB'}$

Transfer of annotation
• Compared interaction datasets between yeast, worm and fly
• Assessed chance that two proteins interact with each other based on their joint sequence identities
• Performed similar analysis based on joint E-values
• All protein pairs with $J_I \geq 80\%$ with a known interacting pair will interact with each other
• More than half of protein pairs with $J_E \leq 10^{-70}$ could be experimentally verified.
Yu, H. et al. (2004) Genome Res.
14: 1107-1118 PMID: 15173116

Examples of Protein-Protein Interologs
• In C. elegans, mpk-1 was experimentally shown to interact with 26 other proteins (by yeast 2-hybrid)
• Ste5 is the homolog of Mpk-1 in S. cerevisiae
• Based on the similarity between the interaction partners of mpk-1 and their closest homologs in S. cerevisiae, the interolog approach predicted 5 of the 6 subunits of the Ste5 complex in S. cerevisiae
• This paper has been cited >100 times

Why the interest in predicting protein-protein interactions?
• Determining protein-protein interactions is challenging and the high-throughput (genome-wide) methods are still difficult and expensive to conduct
• Identifying candidate interaction partners for a targeted pull-down assay is a more viable strategy for most labs

BIPS: BIANA Interolog Prediction Server
• Based on concept of interolog
• Pre-defined alignments
• Can submit list of proteins to get predicted interaction partners
• Can filter predicted list to increase confidence

Today in computer lab
• Tutorial on finding PPIs in your gene list using MiMI or iRefWeb
• Exploring a subset of PPIs using the STRING database
• Prediction of interactions via homologs using the BIPS server
• Exercise 4 on protein domain analysis
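
Worked example: joint identity and joint E-value

The joint scores from the Interologs slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Yu et al. or by BIPS. The `Alignment` record, the function names `joint_scores` and `transfer_confidence`, the "high/moderate/low" labels and the example numbers are all hypothetical. Identities are given as fractions rather than percentages, and the scores are computed as plain products, as written on the slide. Yu et al. (2004) appear to define the joint scores as geometric means (the square root of the product); the optional `geometric_mean` flag covers that reading and should be checked against the paper before use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Summary of one pairwise alignment, e.g. A vs A' (hypothetical record)."""
    identity: float  # fraction of identical residues, 0.0 to 1.0
    e_value: float   # BLASTP E-value reported for the alignment


def joint_scores(aa: Alignment, bb: Alignment, geometric_mean: bool = False):
    """Return (joint identity, joint E-value) for the alignments A-A' and B-B'.

    By default the scores are the products shown on the slide. With
    geometric_mean=True the square root of each product is returned instead
    (an assumption about the published definition).
    """
    j_i = aa.identity * bb.identity
    j_e = aa.e_value * bb.e_value
    if geometric_mean:
        j_i = math.sqrt(j_i)
        j_e = math.sqrt(j_e)
    return j_i, j_e


def transfer_confidence(j_i: float, j_e: float) -> str:
    """Label a candidate interolog with the two thresholds quoted on the slide.

    The label names are illustrative only.
    """
    if j_i >= 0.80:      # JI >= 80%
        return "high"
    if j_e <= 1e-70:     # JE <= 10^-70
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Made-up alignment summaries for A vs A' and B vs B'
    aa = Alignment(identity=0.92, e_value=1e-60)
    bb = Alignment(identity=0.90, e_value=1e-45)
    j_i, j_e = joint_scores(aa, bb)
    print(f"JI = {j_i:.3f}, JE = {j_e:.1e}, transfer = {transfer_confidence(j_i, j_e)}")
```

In practice the identity and E-value for each alignment would come from the BLASTP output for A vs A' and B vs B'. The thresholds are applied to whichever definition of the joint score is chosen, so the two definitions should not be mixed within one analysis.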