Molecular interactions
Based on Chapter 4 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000

Central dogma: DNA -> RNA -> Protein
[Figure: levels from sequence, structure, and function to interaction and network function, across the genome, transcriptome, and proteome]

Network representation
A network (graph) consists of a set of elements (vertices) and a set of binary relations (edges). Biological knowledge and computational results are represented by different types of network data:
1) Element: molecule, gene
2) Binary relation: molecular interaction, genetic interaction, other types of relations
3) Network: pathway, assembly, genome, neighbour, cluster, hierarchical tree

Representation of the same graph by: (a) a drawing of nodes and edges, (b) a linked list, and (c) an adjacency matrix.
[Figure: a graph on nodes A to F shown as (a) a node-and-edge drawing, (b) a linked list, and (c) a 6 x 6 adjacency matrix]

Biological examples of network comparisons:
• Pathway vs. pathway
• Pathway vs. genome
• Genome vs. genome
• Cluster vs. pathway

Pathway alignment is a problem of graph isomorphism: finding (a) a maximum common induced subgraph of two pathways, which can be solved as finding (b) a maximum clique in the graph of node correspondences.
[Figure: (a) Pathway 1 (nodes A to E) and Pathway 2 (nodes a to f) with their common subgraph; (b) the correspondence graph, whose nodes are pairs such as (A, a), (B, b), ..., (E, f)]

A heuristic algorithm for biological graph comparison. It searches for clusters of correspondences, as shown in (a), which is similar in spirit to sequence alignment, shown in (b).
[Figure: (a) correspondences between Graph 1 (nodes A to K) and Graph 2 (nodes a to k) grouped by a clustering algorithm; (b) an analogous alignment of two sequences]
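For small pathways, the maximum-clique formulation of pathway alignment mentioned above can be sketched directly. The Python below is a minimal sketch, not Kanehisa's implementation: it stores each pathway as an undirected graph of adjacency sets (the adjacency-list idea from the representation slide), builds the correspondence graph whose nodes are pairs (u, v), joins two pairs when they preserve adjacency (edge to edge, non-edge to non-edge), and finds a maximum clique with a basic Bron-Kerbosch search. The two toy pathways and their node names are hypothetical. Exhaustive clique search grows exponentially with graph size, which is one reason a heuristic such as the correspondence-clustering approach above is needed for real pathways.

```python
from itertools import combinations


def undirected(edges):
    """Adjacency-set representation of an undirected graph from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def correspondence_graph(g1, g2):
    """Correspondence (modular product) graph of two undirected graphs.

    Nodes are pairs (u, v) with u from g1 and v from g2. Two pairs are joined
    when they use different nodes on both sides and agree on adjacency:
    u1-u2 is an edge in g1 exactly when v1-v2 is an edge in g2.
    """
    nodes = [(u, v) for u in g1 for v in g2]
    adj = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue
        if (u2 in g1[u1]) == (v2 in g2[v1]):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def max_clique(adj):
    """One maximum clique, found by Bron-Kerbosch search with pivoting."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r cannot be extended: it is a maximal clique.
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return best


# Hypothetical toy pathways, not the ones drawn in the figure.
pathway1 = undirected([("A", "B"), ("B", "C"), ("C", "D")])
pathway2 = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")])

alignment = max_clique(correspondence_graph(pathway1, pathway2))
print(sorted(alignment))
```

Each pair in the result maps a node of pathway 1 to a node of pathway 2; here all four nodes of pathway 1 are matched. Because there are ties (D can map to either d or e), the particular mapping returned may differ between runs.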
Examples of binary relations (type of relation; contents; examples):
• Factual relation
  – Links between database entities: factual data and its publication information; nucleotide sequence and translated amino acid sequence; protein sequence and 3D structure
• Similarity relation
  – Computed similarity: sequence similarity; 3D structural similarity
  – Computed complementarity: 3D structural complementarity
• Functional relation
  – Molecular reactions: substrate-product relations
  – Molecular interactions: molecular pathways; molecular assemblies
  – Genetic interactions: positively co-expressed genes; negatively co-expressed genes
  – Chromosomal relations: correlation of gene locations (operons)
  – Evolutionary relations: orthologous and paralogous genes

An example of computing possible reaction paths from pyruvate (C00022) to L-alanine (C00041), given a set of substrate-product binary relations or a given list of enzymes.
[Figure: reaction paths linking pyruvate (C00022), L-alanine (C00041), D-alanine (C00133), oxaloacetate (C00036), and L-aspartate (C00049), labelled with EC numbers 1.4.1.1, 2.6.1.21, 5.1.1.1, 4.1.1.3, 4.1.1.12, and 1.4.3.16]

Query relaxation. Nodes E and E' are considered to be equivalent according to the grouping G.
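The path computation from pyruvate to L-alanine described above can be sketched as an exhaustive search for simple paths in a graph of substrate-product relations. In the Python below, each relation is a compound-compound edge labelled with an EC number. The edge list is our reading of the figure, not a KEGG download. Treating every reaction as reversible and the `max_steps` cutoff are both assumptions. The optional `group` argument is a hypothetical stand-in for query relaxation: any compound in the same group as the target is accepted as reaching the target.

```python
from collections import defaultdict

# Substrate-product relations as (compound, compound, EC number), read from
# the figure: C00022 pyruvate, C00041 L-alanine, C00133 D-alanine,
# C00036 oxaloacetate, C00049 L-aspartate. Assumption: every relation is
# treated as reversible (an undirected edge).
RELATIONS = [
    ("C00022", "C00041", "1.4.1.1"),
    ("C00022", "C00133", "2.6.1.21"),
    ("C00133", "C00041", "5.1.1.1"),
    ("C00036", "C00022", "4.1.1.3"),
    ("C00049", "C00041", "4.1.1.12"),
    ("C00049", "C00036", "1.4.3.16"),
]


def build_graph(relations):
    """Adjacency lists: compound -> list of (neighbour compound, EC number)."""
    graph = defaultdict(list)
    for a, b, ec in relations:
        graph[a].append((b, ec))
        graph[b].append((a, ec))
    return graph


def reaction_paths(graph, source, target, max_steps=4, group=None):
    """Enumerate simple paths (no compound visited twice) from source to target.

    Returns each path as its list of EC numbers. `group` is an optional,
    hypothetical compound -> group-label mapping for query relaxation.
    """
    group = group or {}
    goal = group.get(target, target)
    paths = []

    def dfs(node, visited, ecs):
        if ecs and group.get(node, node) == goal:
            paths.append(list(ecs))
            return
        if len(ecs) == max_steps:
            return
        for nxt, ec in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                ecs.append(ec)
                dfs(nxt, visited, ecs)
                ecs.pop()
                visited.remove(nxt)

    dfs(source, {source}, [])
    return paths


graph = build_graph(RELATIONS)
for path in reaction_paths(graph, "C00022", "C00041"):
    print(" -> ".join(path))
# 1.4.1.1
# 2.6.1.21 -> 5.1.1.1
# 4.1.1.3 -> 1.4.3.16 -> 4.1.1.12
```

Because the edges are undirected, the third path runs 4.1.1.3 and 1.4.3.16 against their usual direction. A directed graph built from known reaction directions would be the stricter choice.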
Network data representation in KEGG (network type; KEGG data; content; representation):
• Pathway, assembly
  – KEGG data: pathway map
  – Content: metabolic pathway, regulatory pathway, and molecular assembly
  – Representation: GIF image map
• Genome
  – KEGG data: genome map; comparative genome map
  – Content: chromosomal location of genes
  – Representation: Java applet
• Cluster
  – KEGG data: expression map
  – Content: differential gene expression profile by microarrays
  – Representation: Java applet
• Neighbour
  – KEGG data: orthologue group table (pathway, assembly, genome)
  – Content: functional unit of genes in a pathway or assembly, together with the orthologous relation of genes and the chromosomal relation of genes
  – Representation: HTML table
• Hierarchical tree
  – KEGG data: gene catalogue; molecular catalogue; taxonomy; disease catalogue
  – Content: hierarchical classification of genes; of molecules; of organisms; of diseases
  – Representation: hierarchical text

Genome-pathway comparison, which reveals the correlation between the physical coupling of genes in the genome, such as operon structure (a), and the functional coupling of gene products in the pathway (b).
[Figure: (a) a region of the E. coli genome with the genes hisL, yefM, hisG, hisD, hisC, hisB, hisH, hisA, hisF, hisI, and yzzB; (b) the KEGG metabolic pathway map for histidine metabolism]
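One simple way to look for the correlation described in the genome-pathway comparison is to scan the gene order for maximal runs of adjacent genes whose products all belong to the same pathway. Such runs are operon candidates. The Python below is a minimal sketch under that assumption. The gene order is a hypothetical layout built from the gene names in the figure, and the pathway set assumes that hisG, hisD, hisC, hisB, hisH, hisA, hisF, and hisI are the histidine-biosynthesis genes. A real comparison would take both inputs from genome and pathway data.

```python
def coupled_runs(gene_order, pathway_genes, min_len=2):
    """Maximal runs of consecutive genes that all belong to the pathway.

    gene_order: genes in chromosomal order (one replicon assumed).
    pathway_genes: set of genes whose products appear in the pathway.
    """
    runs, current = [], []
    for gene in gene_order:
        if gene in pathway_genes:
            current.append(gene)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:  # close a run that reaches the end
        runs.append(current)
    return runs


# Hypothetical gene order based on the names shown in the figure.
order = ["yefM", "hisL", "hisG", "hisD", "hisC", "hisB",
         "hisH", "hisA", "hisF", "hisI", "yzzB"]
histidine_pathway = {"hisG", "hisD", "hisC", "hisB",
                     "hisH", "hisA", "hisF", "hisI"}

print(coupled_runs(order, histidine_pathway))
# [['hisG', 'hisD', 'hisC', 'hisB', 'hisH', 'hisA', 'hisF', 'hisI']]
```

Requiring that neighbouring gene products also be adjacent in the pathway graph, rather than merely members of the same pathway, would give a stricter variant of the same comparison.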
Hierarchy-pathway comparison, which reveals the correlation between the evolutionary coupling of genes (similar sequences or similar folds due to gene duplications) and the functional coupling of gene products in the pathway.
[Figure: (a) the SCOP hierarchical tree, with classes 1. All alpha; 2. All beta; 3. Alpha and beta (a/b), including folds 3.1 beta/alpha (TIM)-barrel, 3.2 Cellulases, ..., 3.74 Thiolase, and 3.75 Cytidine deaminase; 4. Alpha and beta (a+b); 5. Multi-domain (alpha and beta); 6. Membrane and cell surface proteins; 7. Small proteins; 8. Peptides; 9. Designed proteins; 10. Non-protein; (b) the KEGG map …NE, TYROSINE AND TRYPTOPHAN BIOSYNTHESIS, running through shikimate, chorismate, prephenate, and anthranilate to phenylalanine, tyrosine, and L-tryptophan]
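The hierarchy-pathway comparison can be sketched as follows. Each enzyme in a pathway is assigned to a node of a structural hierarchy such as SCOP. The sketch then reports pairs of enzymes that share a fold and flags those that catalyse consecutive steps as candidates for gene duplication. The linear tryptophan branch follows the compounds in the figure. The fold labels are illustrative assumptions, not data taken from SCOP: the PRA isomerase, the indole-3-glycerol-phosphate synthase, and the tryptophan synthase alpha subunit are commonly described as TIM-barrel enzymes.

```python
from itertools import combinations


def shared_fold_pairs(steps, fold_of):
    """Pairs of enzymes in a linear pathway branch that share a fold.

    steps: EC numbers in pathway order.
    fold_of: dict EC number -> fold label (enzymes without a label are skipped).
    Returns (ec1, ec2, fold, consecutive) tuples.
    """
    position = {ec: i for i, ec in enumerate(steps)}
    pairs = []
    for a, b in combinations(steps, 2):
        fold = fold_of.get(a)
        if fold is not None and fold == fold_of.get(b):
            pairs.append((a, b, fold, abs(position[a] - position[b]) == 1))
    return pairs


# Tryptophan branch: anthranilate -> N-(5-phosphoribosyl)anthranilate ->
# 1-(2-carboxyphenylamino)-1-deoxy-D-ribulose 5-phosphate ->
# (3-indolyl)glycerol phosphate -> L-tryptophan.
trp_steps = ["2.4.2.18", "5.3.1.24", "4.1.1.48", "4.2.1.20"]
# Assumed fold labels; 4.2.1.20 stands for the tryptophan synthase alpha subunit.
folds = {"5.3.1.24": "TIM barrel", "4.1.1.48": "TIM barrel",
         "4.2.1.20": "TIM barrel"}

for a, b, fold, adjacent in shared_fold_pairs(trp_steps, folds):
    print(a, b, fold, "consecutive" if adjacent else "non-consecutive")
# 5.3.1.24 4.1.1.48 TIM barrel consecutive
# 5.3.1.24 4.2.1.20 TIM barrel non-consecutive
# 4.1.1.48 4.2.1.20 TIM barrel consecutive
```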
Grand challenge problems (protein folding problem vs. organism reconstruction problem):
• Prediction
  – Protein folding: structure prediction, i.e. predicting protein 3D structure from the amino acid sequence
  – Organism reconstruction: network prediction, i.e. predicting the entire biochemical network from the complete genome sequence
• Knowledge
  – Protein folding: known protein 3D structures
  – Organism reconstruction: known biochemical pathways and assemblies
• Knowledge-based prediction
  – Protein folding: threading
  – Organism reconstruction: network reconstruction
• Ab initio prediction
  – Protein folding: energy minimization
  – Organism reconstruction: path computation
• Prediction of perturbed states
  – Protein folding: protein engineering
  – Organism reconstruction: pathway engineering

Glycolysis, the TCA cycle, and the pentose phosphate pathway, viewed as a network of chemical compounds. Each circle is a chemical compound with the number of carbons shown inside.
[Figure: compound network linking glycolysis (D-glucose to pyruvate), the citrate cycle (TCA cycle), and the pentose phosphate pathway, with cofactors such as CoA, NADH, NADPH, ATP, and GTP]
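Because every node in the compound network carries a carbon count, the counts can serve as a quick consistency check on substrate-product relations. A reaction conserves carbon only if its substrates and products carry the same total. The Python below sketches that check. The carbon counts come from the standard molecular formulas and agree with the figure (for example 21 for CoA and 23 for acetyl-CoA). The reaction list is a hand-picked subset. NAD+/NADH and water are deliberately left out because they do not exchange carbon in these steps. The last entry is a deliberately incomplete relation that shows what the check catches.

```python
# Carbon atoms per compound, from molecular formulas.
CARBONS = {
    "D-Fructose-1,6P2": 6, "Glycerone-P": 3, "Glyceraldehyde-3P": 3,
    "Pyruvate": 3, "CoA": 21, "Acetyl-CoA": 23, "CO2": 1,
    "Oxaloacetate": 4, "Citrate": 6, "Isocitrate": 6,
    "2-Oxoglutarate": 5, "Succinyl-CoA": 25,
}

# (name, substrates, products); NAD+/NADH and water omitted.
REACTIONS = [
    ("aldolase 4.1.2.13", ["D-Fructose-1,6P2"], ["Glycerone-P", "Glyceraldehyde-3P"]),
    ("pyruvate dehydrogenase", ["Pyruvate", "CoA"], ["Acetyl-CoA", "CO2"]),
    ("citrate synthase", ["Oxaloacetate", "Acetyl-CoA"], ["Citrate", "CoA"]),
    ("isocitrate dehydrogenase", ["Isocitrate"], ["2-Oxoglutarate", "CO2"]),
    ("2-oxoglutarate dehydrogenase", ["2-Oxoglutarate", "CoA"], ["Succinyl-CoA", "CO2"]),
    ("incomplete: CoA omitted", ["Pyruvate"], ["Acetyl-CoA", "CO2"]),
]


def carbon_balance(substrates, products):
    """Return (carbons in substrates, carbons in products)."""
    return (sum(CARBONS[c] for c in substrates),
            sum(CARBONS[c] for c in products))


for name, subs, prods in REACTIONS:
    left, right = carbon_balance(subs, prods)
    status = "balanced" if left == right else "UNBALANCED"
    print(f"{name}: {left} -> {right} {status}")
# aldolase 4.1.2.13: 6 -> 6 balanced
# pyruvate dehydrogenase: 24 -> 24 balanced
# citrate synthase: 27 -> 27 balanced
# isocitrate dehydrogenase: 6 -> 6 balanced
# 2-oxoglutarate dehydrogenase: 26 -> 26 balanced
# incomplete: CoA omitted: 3 -> 24 UNBALANCED
```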
Glycolysis viewed as a network of enzymes (gene products). Each box is an enzyme with its EC number inside.
[Figure: glycolysis from extracellular D-glucose to pyruvate and acetyl-CoA, with enzymes 2.7.1.69, 2.7.1.2, 5.3.1.9, 3.1.3.11, 2.7.1.11, 4.1.2.13, 5.3.1.1, 1.2.1.12, 2.7.2.3, 5.4.2.1, 4.2.1.11, 2.7.1.40, 1.2.4.1, 2.3.1.12, and 1.8.1.4, and links to the citrate cycle and the pentose phosphate cycle]

A generalized concept of protein-protein interactions:
• Direct protein-protein interaction: protein 1 acts on protein 2 by binding, modification, cleavage, etc.
• Indirect protein-protein interaction through an enzymatic reaction
• Indirect protein-protein interaction through gene expression, with the gene acting as a molecular template

A strategy for network reconstruction from genomic information.
[Figure: reference knowledge (e.g. KEGG) and the gene catalogue of a genome yield a predicted network by orthologue identification and a predicted network by path computation, using binary relations such as gene-gene (indirect) interactions, protein-protein (direct) interactions, substrate-product relations, and hierarchical relations, obtained from positional cloning, genome comparisons, DNA chips, protein chips, biochemical knowledge, and sequence analysis]
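The network reconstruction strategy above starts from reference knowledge such as KEGG and the gene catalogue of a genome. Its orthologue-identification branch can be sketched as projecting a reference network onto the genome. Each reference edge whose two enzymes both have a gene in the genome is kept. Edges with a missing enzyme are reported as gaps, which path computation or further annotation would then have to address. In the Python below, the reference edges follow the glycolysis enzyme order from the enzyme-network figure (enzymes linked when one's product is the next one's substrate). The gene identifiers and their EC assignments, standing in for the result of orthologue identification, are hypothetical.

```python
# Reference network: glycolytic enzymes (EC numbers) linked when the product
# of one is a substrate of the other, from glucose to pyruvate.
REFERENCE_EDGES = [
    ("2.7.1.2", "5.3.1.9"), ("5.3.1.9", "2.7.1.11"), ("2.7.1.11", "4.1.2.13"),
    ("4.1.2.13", "5.3.1.1"), ("4.1.2.13", "1.2.1.12"), ("5.3.1.1", "1.2.1.12"),
    ("1.2.1.12", "2.7.2.3"), ("2.7.2.3", "5.4.2.1"), ("5.4.2.1", "4.2.1.11"),
    ("4.2.1.11", "2.7.1.40"),
]


def reconstruct(reference_edges, gene_to_ec):
    """Project reference edges onto a genome via EC (orthologue) assignments.

    Returns (predicted gene-gene edges, reference edges with a missing enzyme).
    """
    genes_for = {}
    for gene, ec in gene_to_ec.items():
        genes_for.setdefault(ec, []).append(gene)
    predicted, gaps = [], []
    for ec1, ec2 in reference_edges:
        if ec1 in genes_for and ec2 in genes_for:
            predicted.extend((g1, g2) for g1 in genes_for[ec1]
                             for g2 in genes_for[ec2])
        else:
            gaps.append((ec1, ec2))
    return predicted, gaps


# Hypothetical genome: gene IDs with EC numbers from orthologue identification.
# No gene is assigned to 5.4.2.1 in this toy example.
genome = {
    "g01": "2.7.1.2", "g02": "5.3.1.9", "g03": "2.7.1.11", "g04": "4.1.2.13",
    "g05": "5.3.1.1", "g06": "1.2.1.12", "g07": "2.7.2.3", "g08": "4.2.1.11",
    "g09": "2.7.1.40",
}

edges, gaps = reconstruct(REFERENCE_EDGES, genome)
print(len(edges), "predicted edges")
print("gaps:", gaps)
# 8 predicted edges
# gaps: [('2.7.2.3', '5.4.2.1'), ('5.4.2.1', '4.2.1.11')]
```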
Genetic and chemical blueprints of life (blueprint; entity; information):
• Genetic blueprint of life: the genome; centralized and static
• Chemical blueprint of life: the network of interacting molecules in the cell; distributed and dynamic

Principles of the biochemical network encoded in the genome.
[Figure: Hierarchy (conservation and diversification): (a) a low-resolution network and (b) a high-resolution network, with divergent inputs, a conserved pathway, and divergent outputs. Duality (chemical logic and genetic logic): (c) chemical network + enzyme network = metabolic network; (d) protein-protein interaction network and gene regulatory network]

Biological examples of complex systems (system; node; edge):
• Protein 3D structure: atom; atomic interaction
• Organism: molecule; molecular interaction
• Brain: cell; cellular interaction
• Ecosystem: organism; organism interaction
• Civilization: human; human interaction

From Sequence to Function
Comparison of bioinformatics approaches for functional prediction (era; experiments; database; computational method):
• 1977: gene cloning and sequencing; sequence database; sequence similarity search
• 1995: whole genome sequencing; pathway database; pathway reconstruction and path computation
Pathway = wiring diagram

Functional Reconstruction Problem (sequence -> organism)
1. The genome is a blueprint of life (Dolly's cloning principle): genome + environment (nucleus).
2. The network of molecular interactions in the entire cell is a blueprint of life; the genome is only a warehouse of parts (principle of molecular interaction).
[Figure: germ cell line, pathway, assembly, DNA damage]

Suzie Grant's Study: Aims
• Examine the effects of oncostatin-M (OSM) in combination with epidermal growth factor (EGF)
• Delineate the signalling pathway responsible for the effects induced by OSM in breast cancer cells

IL-6 Cytokine Receptor Family
[Figure: IL-6, IL-11, and CNTF signal through their specific receptors (IL-6R, IL-11R, CNTFR) together with gp130; LIF, OSM, and CT signal through LIFRb and gp130; OSM also signals through OSMRb and gp130]

Physiological Functions of IL-6 family members (function: cytokines):
• Proliferation/maturation of megakaryocytes: OSM, LIF, IL-6, IL-11
• Expansion of hemopoietic progenitor cells in the AGM: OSM
• Induce terminal differentiation of M1 cells: OSM, LIF, IL-6, CT-1
• Inhibit differentiation of ES cells: OSM, LIF, CT-1, CNTF
• Stimulate proliferation of fibroblasts: OSM
• Increase expression of TIMP-1, ICAM-1 and VCAM-1: OSM, LIF, IL-6, IL-11
• Proliferation/differentiation of vascular endothelial cells: OSM
• Elevate LDL receptors in hepatocytes: OSM
• Induce synthesis of acute phase proteins in the liver: OSM, LIF, IL-6, IL-11, CNTF, CT-1
• Inhibit lipoprotein lipase, resulting in fat depletion: OSM, LIF, IL-6, IL-11
• Induce bone resorption, stimulate osteoblast activity: OSM, LIF, IL-6, IL-11
• Induce proliferation/differentiation of T-lymphocytes: OSM, LIF, IL-6
• Promote survival or differentiation of neurons: OSM, LIF, IL-6, IL-11, CNTF, CT-1

Effects of IL-6, LIF, OSM, CNTF and IL-11 on MCF-7 cell proliferation.
[Figure: bar chart of cell number (% control) for control, IL-6, LIF, OSM, CNTF, and IL-11; n = 9 experiments; * p < 0.001, # p < 0.01, + p < 0.02]

Effects of OSM on breast cancer cells.
• OSMRb and gp130 are expressed in breast cancer cell lines and primary tumour samples
• Inhibition of proliferation of ER+ and ER- breast cancer cell lines
• Decreased clonogenicity
• Inhibition of cell cycle progression
  – Reduced S phase fraction
  – Increased G0/G1 phase fraction
• Alterations in mRNA expression
  – Decreased ER and PRLR expression
  – Increased EGFR expression
• Phenotypic changes consistent with induction of differentiation
  – Morphology
  – Lipid accumulation
  – Apoptosis

OSM Signalling
[Figure: OSM signalling from OSMRb or LIFRb and gp130 at the cell membrane, through JAK1 and STAT3 and through SHC, GRB2, SOS, RAS, RAF, MEK, and MAPK (ERK1/2), to STAT3 and other transcription factors in the nucleus; one link is marked with a question mark]

Signalling by IL-6 Type Cytokines
• In M1 cells, STAT3 is critical for IL-6-induced growth regulation and differentiation. Nakajima et al., EMBO J, 15, 1996
• Growth inhibition of A375 cells by OSM/IL-6 is STAT3-dependent. Kortylewski et al., Oncogene, 18, 1999
• In myeloma cells, IL-6 upregulates mcl-1 through the JAK/STAT pathway, not the Ras/MAPK pathway. Puthier et al., Eur. J. Immunol., 29, 1999
• OSM activates STAT3 and ERK2 in GOS3 cells; blockade of MEK1 partially inhibits the effects of OSM on these cells. Halfter et al., MCBRC, 1, 1999
• In adipocytes, LIF induces differentiation via the MAPK pathway. Aubert et al., JBC, 274, 1999
• Growth of KS cells stimulated by OSM/IL-6 is mediated by ERK1/2 and negatively regulated by p38. Murakami-Mori et al., BBRC, 264, 1999
• OSM activates MAPK through a JAK1-dependent pathway in HeLa cells. Stancato et al., MCB, 17, 1997

EGF family of growth factors and receptors
• Epidermal growth factor (EGF) is a polypeptide growth factor
• Mitogenic for mammary epithelium and breast cancer cells
• Overcomes the effects of several growth inhibitors of breast cancer cells, such as tamoxifen and dexamethasone
• Binds EGFR/ErbB-1, a receptor with intrinsic tyrosine kinase activity
• Signals via an EGFR homodimer or an EGFR heterodimer with ErbB-2, -3 or -4
  – The EGFR/ErbB-2 heterodimer is preferred

EGF family of receptors
• EGFR/ErbB-1
  – Overexpressed in about 30% of breast tumours
  – Expression correlates inversely with ER
  – Predicts aggressive disease/poor prognosis
• ErbB-2 (HER2/neu)
  – Overexpressed in many types of cancer
  – Correlates with aggressive disease and shorter disease-free survival in breast cancer patients
  – Most oncogenic of all ErbB family members
  – Orphan receptor
• ErbB-3
  – Contains a non-functional kinase
  – No correlation between expression in tumours and prognosis
• ErbB-4
  – Few clinical studies

EGF signalling
[Figure: EGF binds EGFR paired with EGFR, ErbB-2, -3 or -4, activating PI3K, Ras, PLC-γ, Src, and MAPK and leading to cell proliferation]

Effects of OSM and EGF on proliferation of MCF-7 cells.
[Figure: bar chart of cell number (% control) for control, EGF, OSM, and OSM+EGF; N = 10; one bar marked *]

Summary of Suzie's work so far
• Effects of OSM on breast cancer cells are enhanced by EGF
  – Inhibition of proliferation
  – Decreased clonogenicity
  – Cell cycle suppression
  – Decreased ER expression
  – Differentiation
• Mechanism?