Chapter 4: Protein Interactions and Disease
Mileidy W. Gonzalez, Maricel G. Kann
Presented by Md Jamiul Jahid

What to learn in this chapter
• Experimental and computational methods to detect protein interactions
• Protein networks and disease
• Studying the genetic and molecular basis of disease
• Using protein interactions to understand disease

What is protein interaction?
• Proteins are the main agents of biological function
  – Proteins determine the phenotype of all organisms
• Proteins do not function alone; they interact
  – with other proteins
  – with other molecules (e.g. DNA, RNA)
• Protein interaction generally means physical contact between proteins and their interacting partners
• Proteins associate physically to create macromolecular structures of various complexities and heterogeneities
• Protein pairs can form dimers, multi-protein complexes or long chains
• But the interaction need not always be physical
• Besides physical interactions, protein interaction can also mean metabolic or genetic correlation, or co-localization
  – Metabolic -> in the same pathway
  – Genetically correlated -> co-expressed
  – Co-localization -> proteins in the same cellular compartment

PPI Network
• A PPI network represents the interactions among proteins
• Each node represents a protein
• Each link (edge) represents an interaction

Figure: A PPI network of the proteins encoded by radiation-sensitive genes in mouse, rat, and human, reproduced from [89].
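The node-and-link definition above maps directly onto an adjacency structure. The following is a minimal sketch, assuming interactions arrive as a plain list of protein pairs; the protein names are hypothetical placeholders, not real data.

```python
def build_ppi_network(interactions):
    """Undirected PPI network: each protein is a node, each interacting pair an edge."""
    network = {}
    for a, b in interactions:
        # Record the interaction in both directions, since a link has no direction.
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


# Hypothetical toy interactions with placeholder protein names (not real data).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
ppi = build_ppi_network(toy_edges)

print(len(ppi))                                # 4 proteins (nodes)
print(sum(len(p) for p in ppi.values()) // 2)  # 4 interactions (edges); assumes no self-interactions
print(sorted(ppi["P3"]))                       # ['P1', 'P2', 'P4']
print(max(ppi, key=lambda p: len(ppi[p])))     # P3, the best-connected protein
```

Storing neighbours as sets makes duplicate or reversed pairs collapse into one edge, which is usually what is wanted when merging interaction lists from several sources.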
PPI Network
• Some uses of a PPI network
  – To learn about the evolution of different proteins
  – To learn about the different systems they are involved in
  – To infer interactions for other species
  – To help identify the functions of uncharacterized proteins

Experimental Identification of PPIs
• Biophysical methods
• High-throughput methods
  – Direct high-throughput methods
  – Indirect high-throughput methods

Biophysical Methods
• Mainly biochemical, physical and genetic methods
  – X-ray crystallography
  – NMR spectroscopy
  – Fluorescence
  – Atomic force microscopy
• Biophysical methods identify interacting partners
• They also reveal the chemical features of the interaction
• Problems:
  – High cost in time and resources
  – Applicable only at small scale

High-Throughput Methods
• Direct high-throughput methods
• Indirect high-throughput methods

Direct high-throughput methods
• Yeast two-hybrid (Y2H)
  – The most common method
  – The two proteins are fused to the two parts of a transcription factor (its DNA-binding domain and its activation domain)
  – If the proteins interact -> the transcription complex is reconstituted and a reporter gene is activated

Figure: Y2H overview. Image courtesy Wikipedia.org

• Problems (yeast two-hybrid)
  – Cannot identify interactions that involve more than two proteins (complexes)
  – Proteins that can initiate transcription on their own give false positives

Indirect high-throughput methods
• Look at characteristics of the genes that encode the proteins
• Gene co-expression
  – Assumption: the genes of interacting proteins must be co-expressed so that both products are present to interact

Computational Predictions of PPIs
• Empirical predictions
• Theoretical predictions
  – Coevolution at the residue level
  – Coevolution at the full sequence level

Empirical predictions
• Based on
  – Relative frequency of interacting domains
  – Maximum likelihood estimation
  – Co-expression
• Disadvantages
  – Rely on an existing network
  – Propagate its inaccuracies

Theoretical Predictions of PPIs Based on Coevolution
• Coevolution at the residue level
• Coevolution at the full sequence level
• In biology, coevolution is "the change of a biological object triggered by the change of a related object."

Coevolution at the residue level
• Pairs of residues in the same protein can coevolve because of three-dimensional proximity or shared function
• A pair of proteins is assumed to interact if correlated mutations are enriched between them

Coevolution at the full sequence level
• Basic idea: changes in one protein are compensated by correlated changes in its interacting partner to preserve the interaction
• -> interacting proteins have phylogenetic trees whose topologies are more similar than expected by chance
• Mirrortree is the most accurate option for identifying interactions this way

Mirrortree
• Identify the orthologs of both proteins in the species they share
• Create a multiple sequence alignment (MSA) of the orthologs of each protein
• Compute a distance matrix from each MSA
• Calculate the correlation coefficient between the two distance matrices

Mirrortree

Different methods for computing PPI

Protein Networks and Disease
• Studying the genetic basis of disease
• Studying the molecular basis of disease

Studying the Genetic Basis of Disease
• After the rediscovery of Mendelian genetics in 1900, much effort went into cataloguing disease genes
• Positional cloning: isolating a gene based on its position on the chromosome
• Genes identified by this approach
  – cystic fibrosis, Huntington's disease (HD), breast cancer, etc.
  – still, mutations in these genes do not always correlate with symptoms
• Several reasons
  – pleiotropy
  – influence of other genes
  – environmental factors
• Pleiotropy: a single gene influences multiple phenotypic traits
• Problem: this complicates disease elucidation, because a mutation in such a gene can affect some, all or none of its traits
• That is, a mutation in a pleiotropic gene may cause multiple syndromes, or cause disease in only some of the biological processes the gene takes part in
• Influence of other genes
  – Genes can interact synergistically
  – Genes can modify one another
• Environmental factors
  – diet
  – infection, etc.
• Cancers are believed to be caused by several genes and affected by several environmental factors

Studying the Molecular Basis of Disease
• Identifying the genes associated with a disease is important
• Molecular details are also needed to identify the mechanisms that trigger, participate in and control the perturbed biological functions

The role of protein interaction in disease
• Protein interactions provide a vast source of molecular information because they are involved in
  – metabolic
  – signaling
  – immune
  – gene regulatory networks
• Protein interactions should therefore be a key target for understanding the molecular basis of disease
• Ways in which protein interactions are involved in disease:
  – Disruption of protein-DNA interactions
  – Protein misfolding
  – New, undesired protein interactions

Protein-DNA interaction disruption
• p53 tumor suppressor
• Mutations in the p53 DNA-binding domain destroy its ability to bind its target DNA sequences
• This prevents the several anticancer mechanisms that p53 mediates

Protein misfolding and undesired interaction
• Protein misfolding
  – Protein folding: the process by which a protein reaches its functional 3D shape
• New undesired protein interactions
• Misfolding and new undesired interactions are the main cause of several diseases, such as Huntington's disease, cystic fibrosis and Alzheimer's disease
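Among the empirical predictions listed earlier, the one based on the relative frequency of interacting domains can be sketched in a few lines. This is a minimal sketch, assuming each protein comes with a set of domain labels and that a simple ratio is used: for each domain pair, the fraction of protein pairs containing it that are known to interact. It does not implement the maximum likelihood variant. The domain annotations and interactions below are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations, product


def domain_pair_scores(protein_domains, interactions):
    """Relative frequency of each domain pair among interacting protein pairs.

    score(d1, d2) = (# interacting protein pairs that contain d1 and d2)
                  / (# protein pairs that contain d1 and d2)
    """
    interacting = {frozenset(pair) for pair in interactions}
    seen, hits = Counter(), Counter()
    for p, q in combinations(sorted(protein_domains), 2):
        # Count each unordered domain pair at most once per protein pair.
        domain_pairs = {tuple(sorted((d1, d2)))
                        for d1, d2 in product(protein_domains[p], protein_domains[q])}
        is_interacting = frozenset((p, q)) in interacting
        for dp in domain_pairs:
            seen[dp] += 1
            if is_interacting:
                hits[dp] += 1
    return {dp: hits[dp] / seen[dp] for dp in seen}


def predict_score(domains_a, domains_b, scores):
    """Score a candidate protein pair by its best-supported domain pair (0 if unseen)."""
    return max((scores.get(tuple(sorted((d1, d2))), 0.0)
                for d1, d2 in product(domains_a, domains_b)), default=0.0)


# Hypothetical toy data: domain annotations and interactions are illustrative only.
protein_domains = {
    "P1": {"SH3"},
    "P2": {"PRM"},
    "P3": {"SH3", "Kinase"},
    "P4": {"PRM"},
}
known_interactions = [("P1", "P2"), ("P3", "P4")]

scores = domain_pair_scores(protein_domains, known_interactions)
print(scores[("PRM", "SH3")])                   # 0.5
print(predict_score({"SH3"}, {"PRM"}, scores))  # 0.5
```

The scores are computed entirely from the known interactions, which illustrates the disadvantage noted on the slide: any missing or false interaction in the input network shifts every score that depends on it.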
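The Mirrortree steps listed earlier (orthologs in common species, one MSA per protein, a distance matrix per MSA, and a correlation between the matrices) can be sketched as follows. This is a simplified illustration, not the Mirrortree implementation: it assumes the ortholog search and alignment are already done, uses the plain p-distance (fraction of differing residues) as the distance measure, and the aligned sequences are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def p_distance(seq1, seq2):
    """Fraction of differing residues, skipping alignment columns with a gap in either sequence."""
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("sequences share no ungapped columns")
    return sum(a != b for a, b in columns) / len(columns)


def distance_vector(msa, species):
    """Upper triangle of the pairwise distance matrix, in a fixed species order."""
    return [p_distance(msa[s], msa[t]) for s, t in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("distances are constant; correlation is undefined")
    return cov / (sd_x * sd_y)


def mirrortree_score(msa_a, msa_b):
    """Correlate the distance matrices of two ortholog MSAs over their shared species."""
    species = sorted(set(msa_a) & set(msa_b))  # orthologs present for both proteins
    if len(species) < 3:
        raise ValueError("need orthologs in at least three common species")
    return pearson(distance_vector(msa_a, species), distance_vector(msa_b, species))


# Hypothetical, already-aligned ortholog sequences keyed by species (toy data).
msa_a = {"S1": "ACDEFGHIKL", "S2": "ACDEFGHIKM", "S3": "ACDEFGHQRM", "S4": "ACDWYGHQRM"}
msa_b = {"S1": "MKTAYIAKQR", "S2": "MKTAYIAKQW", "S3": "MKTAYIANPW", "S4": "MKTVLIANPW"}
msa_c = {"S1": "MKTAYIAKQR", "S2": "MKTVLIANPW", "S3": "MKTAYIAKQW", "S4": "MKTAYIANPW"}

print(round(mirrortree_score(msa_a, msa_b), 3))  # 1.0: same tree shape
print(round(mirrortree_score(msa_a, msa_c), 3))  # negative: different tree shape
```

Only the upper triangle of each matrix is correlated, because the matrices are symmetric with zero diagonals; including the full matrix would count every distance twice and inflate the correlation with the diagonal.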
Using PPI network to understand disease
• PPI networks can help identify novel pathways
• PPI networks can help explore differences between healthy and disease states
• Protein interaction studies play a major role in predicting genotype-phenotype associations
• New diagnostic tools can result from genotype-phenotype associations
• PPI networks can identify disease subnetworks
• Drug design

PPI Network can help identify novel pathway
• PPI network: maps the physical and functional interactions of protein pairs
• Pathway: represents genetic, metabolic, signaling or neural processes as a series of sequential biochemical reactions
• Pathways alone cannot uncover disease details
• When performing pathway analysis to study disease, differential expression is the key
• The majority of human genes have not been assigned to a pathway
• In this scenario, PPI networks can help identify novel pathways
• Some key findings
  – Disease genes generally occupy peripheral positions in the PPI network
  – Few cancer genes are hubs
  – Disease genes tend to cluster together
  – Proteins involved in similar phenotypes are highly connected

PPI networks can help explore differences between healthy and disease states
Source: Dynamic modularity in protein interaction networks predicts breast cancer outcome, Nature Biotechnology 27, 2009

Genotype-phenotype association and new disease genes
• New disease genes can be found among the interacting partners of already known disease genes
• Topological features can be used to predict disease genes
  – 970 of 5,000 genes are disease genes

Disease subnetwork identification

Drug design
• Hub nodes in PPI networks are not good drug targets
• Less connected nodes may be good drug targets

Exercise
• Objective: investigate Epstein-Barr virus (EBV) pathogenesis using PPI networks
• EBV is one of the most common human viruses
• About 95% of adults are infected with this virus
• EBV replicates in epithelial cells and establishes latency in B lymphocytes
  – causes mononucleosis in 35-50% of cases
  – sometimes causes cancer

Dataset
• Dataset S1: EBV interactome
• Dataset S2: EBV-Human interactome
• Software requirement:
  – Cytoscape (download link: www.cytoscape.org)

Questions
• How many nodes and edges are featured in this network?
• How many self-interactions does the network have?
• How many pairs are not connected to the largest connected component?
• Define the following topological parameters and explain how they might be used to characterize a protein-protein interaction network: node degree (or average number of neighbors), network heterogeneity, average clustering coefficient distribution, network centrality.
• How many unique proteins were found to interact in each organism?
• How many interactions are mapped?
• How many human proteins are targeted by multiple EBV proteins (i.e. how many individual human proteins interact with more than one EBV protein)?
• How does identifying the multi-targeted human proteins help you understand the pathogenicity of the virus?
  – Hint: speculate about the role of the multi-targeted human proteins in the virus life cycle.
• Based on the 'degree' property, what can you deduce about the connectedness of ET-HPs (EBV-targeted human proteins)? What does this tell you about the kind of proteins (i.e. what type of network component) EBV targets?
• What do the number and size of the largest components tell you about the inter-connectedness of the ET-HP subnetwork?
• Why is distance relevant to network centrality? What is unusual about the distance of ET-HPs to other proteins, and what can you deduce about the importance of these proteins in the Human-Human interactome?
• Based on your conclusions from questions i-iii, explain why EBV targets the ET-HP set over the other human proteins, and speculate on the advantages to virus survival that the protein set might confer.
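The slide on new disease genes notes that candidates can be found among the interacting partners of already known disease genes. A minimal sketch of that idea, assuming the simplest possible scheme (count how many known disease genes each other protein touches, sometimes called guilt by association), is shown below. This is not the chapter's specific method, and the gene names are hypothetical.

```python
def rank_candidates(interactions, known_disease_genes):
    """Rank proteins that are not yet disease genes by how many known disease genes they interact with."""
    partners = {}
    for a, b in interactions:
        if a == b:
            continue  # self-interactions carry no information about other genes
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    known = set(known_disease_genes)
    scores = {}
    for protein, neighbours in partners.items():
        if protein in known:
            continue
        hits = len(neighbours & known)
        if hits:
            scores[protein] = hits
    # Most disease-gene partners first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: D* are known disease genes, G* are unannotated genes.
toy_interactions = [("D1", "G1"), ("D2", "G1"), ("D2", "G2"), ("G2", "G3"), ("D1", "D2")]
print(rank_candidates(toy_interactions, {"D1", "D2"}))  # [('G1', 2), ('G2', 1)]
```

Raw neighbour counts favour highly connected proteins, so a hub can rank highly simply because it touches many genes. Dividing each count by the protein's degree is one common adjustment, worth considering given the slide's observation that disease genes tend to sit at the network periphery.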
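The exercise is meant to be done in Cytoscape, but the first few quantities it asks for (nodes, edges, self-interactions, connected components, average degree, average clustering coefficient, multi-targeted human proteins) are simple enough to sketch in plain Python. The sketch below assumes the interactomes are already loaded as lists of protein pairs; the file format of Datasets S1 and S2 is not described here, so no parsing is shown, and the protein names are hypothetical toy data, not EBV data. Nodes with fewer than two neighbours are given a clustering coefficient of 0 when averaging, which is one common convention.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map; self-interactions are recorded separately, not as neighbours."""
    adj, self_loops = {}, set()
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a == b:
            self_loops.add(a)
        else:
            adj[a].add(b)
            adj[b].add(a)
    return adj, self_loops


def connected_components(adj):
    """Connected components found by breadth-first search, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected (0 if degree < 2)."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def multi_targeted(virus_host_edges):
    """Human proteins that interact with more than one distinct viral protein."""
    viral_partners = {}
    for viral, human in virus_host_edges:
        viral_partners.setdefault(human, set()).add(viral)
    return sorted(h for h, v in viral_partners.items() if len(v) > 1)


# Hypothetical toy data standing in for the exercise datasets (not real EBV data).
human_edges = [("H1", "H2"), ("H2", "H3"), ("H1", "H3"), ("H3", "H4"), ("H5", "H6"), ("H7", "H7")]
virus_edges = [("V1", "H1"), ("V2", "H1"), ("V1", "H3"), ("V3", "H3"), ("V2", "H5")]

adj, self_loops = build_adjacency(human_edges)
n_nodes = len(adj)
n_edges = sum(len(n) for n in adj.values()) // 2  # unique edges, excluding self-interactions
components = connected_components(adj)
avg_degree = sum(len(n) for n in adj.values()) / n_nodes
avg_clustering = sum(clustering_coefficient(adj, v) for v in adj) / n_nodes

print(n_nodes, n_edges, len(self_loops))  # 7 5 1
print([len(c) for c in components])       # [4, 2, 1]
print(round(avg_clustering, 3))           # 0.333
print(multi_targeted(virus_edges))        # ['H1', 'H3']
```

Everything outside `components[0]` is disconnected from the largest connected component; in the toy data that is one two-protein pair and one isolated, self-interacting protein.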