BioNet of Human Cancers

Representations of Protein Structures
a - full atom
b, c - strands / helices
d - topology diagrams

Multiple Sequence Alignment (MSA)

Protein Domains
“Independent Folding Units”
50 - 350 residues
Mean size - 125 residues
Alpha folds; Beta folds; Alpha+Beta folds; Alpha/Beta folds

Principal Protein Fold Classes
All alpha; alpha + beta; all beta; alpha / beta

COG 272, BRCT family (P. Bork et al)

Fold Classification
SCOP Database - manual curation
CATH Database - largely automated, manual refinement
Dali Database - fully automated

Structural Validation of Homology
Adenylate Kinase / Guanylate Kinase: 19% Seq ID, Z = 12.2

[Figure: Eukaryotes. Bar chart over the genomes sc, at, dm, ce, hs. Legend: Other families; dm+ce+hs: 45 families; at+dm+ce+hs: 56 families; All: 381 families. Courtesy of C. Chothia.]

Homology modeling
– Use structural information from experimentally determined protein structures to predict the structure of a similar (homologous) protein
– Servers: SwissModel, 3DJigSaw, EsyPred3D, MODELLER, HOMA
– Limitations in distinguishing between correct and wrong homology models

HOMA: homology modeling by satisfaction of spatial restraints
Li, H.; Tejero, R.; Monleon, D.; Bassolino-Klimas, D.; Abate-Shen, C.; Bruccoleri, R.E.; Montelione, G.T. Protein Science 1997, 6: 956 - 970. Homology modeling using simulated annealing of restrained molecular dynamics and conformational search calculations with CONGEN: Application in predicting the three-dimensional structure of murine homeodomain Msx-1.
Bhattacharya, A.; Wunderlich, Z.; Monleon, D.; Tejero, R.; Montelione, G.T. PROTEINS: Struct. Funct. Bioinformatics. 2007 70: 105 - 118. Assessing model accuracy using the homology modeling automatically (HOMA) software.
– Calculate inter-atomic distances between ‘homologous atoms’ in the template structure
– Random subset to generate distance constraints
– Refinement protocols
  • DYANA (Güntert P, et al, 1997 J Mol Biol 273: 283)
  • XPLOR (Brünger AT.
X-PLOR, Version 3.1, Schwieters et al, 2003 JMR 160: 65)
  • Hybrid DYANA / XPLOR

Test set to evaluate HOMA
• Proteins with available experimental structure
  – Filtered for quality of structure
• 24 groups of homologous (same SCOP family) proteins
  – 30% to 85% pairwise sequence identity within each group
  – Each protein modeled using structure of other proteins in group
  – 264 homology models generated
• Control sets
  – self-modeled: proteins modeled using own experimental structure as template (90)
  – wrongly-folded: proteins modeled with template from different SCOP family (246)

Accuracy assessment for HOMA models
• RMSD used to calculate accuracy of homology models
  – Backbone heavy atoms (N, Cα, C’)
[Figures: backbone atom RMSD to experimental structure, for the entire test set and in comparison with other methods.]

Cancer Pathways Visualization and Representation
Jason Lu and Mark Gerstein
Yale University

Legend and Arrow Ontology
Node types: Ligand, Receptor, Kinase, Adaptor, Enzymes, Transcription factor, Other Protein
Compartments: Plasma Membrane, Nuclear Membrane
P: Phosphate group
Arrow types: activates, inhibits, translocates, cleavage/cuts

Toll-like Receptor Pathway
[Figure: pathway diagram. Nodes: TLR, MyD88, IRAK1, IRAK4, TRAF6, TAB1, TAB2, TAK1, three IKK subunits, IκB, p50, p65. Links to the Interferon Gamma Pathway and the NF-κB pathway.]

Pathway in detail
• The innate immune system responds in a general manner to factors present in invading pathogens. Bacterial factors such as lipopolysaccharides (LPS, endotoxin), bacterial lipoproteins, peptidoglycans and also CpG nucleic acids activate innate immunity, stimulate the antigen-specific immune response and trigger the inflammatory response.
• Members of the toll-like receptor (TLR) gene family convey signals stimulated by these factors, activating signal transduction pathways that result in transcriptional regulation and stimulate immune function.
• The downstream signaling pathways used by these receptors activate the IL-1 receptor-associated kinase (IRAK) through the MyD88 adaptor protein, and signal through TRAF6 and protein kinase cascades to activate the NF-κB and MAPK pathways.
• NF-κB and other pathways then activate transcription of genes such as the proinflammatory cytokines IL-1 and IL-12.

Interferon-Gamma Pathway
[Figure: pathway diagram. Nodes: IFN-γ, IFN-γR, JAK2 (phosphorylated), TID1, HSP70, IKK; tumor suppressors RB, p53, WT1. Links to the JAK-STAT Pathway and the NF-κB pathway.]

Pathway in detail
• Signaling by interferon-gamma stimulates anti-viral responses and tumor suppression through the heterodimeric interferon-gamma receptor.
• Signaling is initiated by binding of interferon-gamma to its receptor, activating the receptor-associated JAK2 tyrosine kinase to phosphorylate STAT transcription factors that activate interferon-responsive genes.
• Molecular chaperones that modulate or alter protein folding interact with different components of the interferon signaling pathway. One chaperone that modulates interferon signaling is hTid-1, a member of the DnaJ family of chaperones and a cochaperone for the heat shock protein Hsp70, another molecular chaperone.
• Hsp70 holds JAK2 in an inactive conformation prior to ligand activation, and is released in the presence of agonist to allow the activation of JAK2 and downstream pathways.
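The legend (node types, arrow types) and the interferon-gamma description above can be read as a small typed, directed graph. The following is a minimal sketch of that reading, not the representation used by the authors: the role given to each protein, the ASCII node names, and the helper functions `downstream` and `inhibitors_of` are illustrative assumptions.

```python
from collections import deque

# Node roles follow the slide legend; the role assigned to each protein
# here is an illustrative assumption.
NODES = {
    "IFN-gamma": "ligand",
    "IFN-gammaR": "receptor",
    "JAK2": "kinase",
    "STAT": "transcription factor",
    "HSP70": "other protein",
}

# Each edge is (source, target, arrow type), using the legend's arrow types.
EDGES = [
    ("IFN-gamma", "IFN-gammaR", "activates"),  # ligand binds its receptor
    ("IFN-gammaR", "JAK2", "activates"),       # receptor-associated kinase
    ("JAK2", "STAT", "activates"),             # JAK2 phosphorylates STAT
    ("HSP70", "JAK2", "inhibits"),             # Hsp70 holds JAK2 inactive
]


def downstream(start, edges, arrow="activates"):
    """Return nodes reachable from `start` along edges of one arrow type,
    in breadth-first order."""
    adjacency = {}
    for src, dst, kind in edges:
        if kind == arrow:
            adjacency.setdefault(src, []).append(dst)
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def inhibitors_of(target, edges):
    """Return the sorted sources of all 'inhibits' arrows pointing at `target`."""
    return sorted(src for src, dst, kind in edges
                  if dst == target and kind == "inhibits")


if __name__ == "__main__":
    print(downstream("IFN-gamma", EDGES))
    print(inhibitors_of("JAK2", EDGES))
```

Keeping the arrow type on each edge lets activation cascades and inhibitory inputs be queried separately, which is the distinction the legend draws between "activates" and "inhibits".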
JAK-STAT Pathway
[Figure: pathway diagram. Nodes: cytokines, JAK2 and TYK2 (phosphorylated), p53 (phosphorylated), DNA transcription. Links to the mTOR Pathway and the MAPK Pathway.]

Pathway in detail
• The Janus kinase-signal transducer and activator of transcription (JAK-STAT) pathway transmits information from extracellular polypeptide signals through transmembrane receptors, directly from the cytoplasm to target gene promoters in the nucleus.
• Evolutionarily, the major components are conserved from slime molds to humans, but are absent from fungi and plants.
• This canonical pathway presents the major themes common to most systems that use JAK-STAT signaling.

TGF-beta Pathway
[Figure: pathway diagram. Nodes: TGF-beta R, SARA, R-SMAD, SMAD4, I-SMAD, Smurf, DNA-BP, p38, JNKs, ERKs, Target genes.]

Pathway in detail
• Members of the transforming growth factor beta (TGF-β) superfamily of ligands initiate signaling by binding to and inducing formation of heteromeric complexes of type I and type II Ser-Thr kinase receptors.
• The activated type I receptor then propagates the signal to members of the Smad family of intracellular mediators.
• Smad anchor for receptor activation (SARA) appears to be important for recruiting R-Smads to the TGF-β receptor complex.
• Once phosphorylated, R-Smads form heteromeric complexes with the common Smad (Co-Smad), Smad4. This heteromeric complex then translocates to the nucleus to modulate the activity of specific promoters through physical interactions with DNA-binding partners.
• Inhibitory Smads (I-Smads) antagonize signaling. Smurfs are E3 ubiquitin ligases that associate with certain R- and I-Smads to mediate ubiquitination and degradation of either Smads or Smad-associated proteins, including the receptor complex.
Regulation

Common Themes
• Common components found in all pathways: ligands, receptors, kinases and transcription factors. These correspond to the stages of initial signal/binding, signal transduction, amplification/cascade and final effect.
• Phosphorylation is the most commonly repeated step. Why? It is a rapid, reversible covalent modification that is easy to regulate reciprocally via kinases and phosphatases.
• Ubiquitination and cleavage/proteolysis are rare. Why? This may be due to the nature of the pathways, i.e. these steps may be more common in degradative/apoptotic pathways. Proteolysis is complete and irreversible, and it carries a high cellular energy cost. It makes more sense to use reversible phosphorylations.
• Loops: when they occur, they are usually negative feedback loops in which downstream proteins inhibit more upstream targets. Why? Negative feedback is used to maintain homeostasis and ensure a desirable level of cellular flux (think metabolism).

More on Regulation
• Regulation tends to occur at key steps (i.e. bottlenecks or checkpoints; not all steps are heavily regulated), usually before the amplification cascade.
• Why? It makes better sense to regulate at key control points (e.g. receptor binding) before the rapid cascading takes place.

Example: TLR Pathway Breakdown
[Figure: the Toll-like receptor pathway diagram divided into the stages Signal, Transduction, Amplification and Effect, with regulatory hubs marked.]

Pathway Crosstalks

Pathways are Interconnected
• Individual pathways interconnect at different points
• Some pathways are downstream targets regulated by others
• These ‘crosstalks’ form a ‘cancer pathways network’

Some Pathway Crosstalks
[Figure: crosstalk diagram linking MAPK (EGF), JAK-STAT, IFN-gamma, Toll, TGF-β and NF-κB (early phase and late phase), with TAK1 and regulation of Smads as connection points. Arrow types: activates, inhibits.]

BioNet of Human Cancers E.
White

Human Cancer Pathway Interaction Network (HCPIN)
• Cell cycle progression
• Apoptosis
• Toll-like receptor pathway
• Interferon alpha/beta
• JAK-STAT pathway
• TGF-beta pathway
• PI3K pathway
• MAPK pathway

BioNet – Biomedical target selection from interaction networks
Systematically complete structural coverage of pathways and interaction networks
Study structures of complexes

Pathway-Interaction Subnet

KEGG Pathway Database
Ogata, H. et al (1999). Nucleic Acids Res 27, 29-34.

HPRD
• The Human Protein Reference Database
  – All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.
Peri, S. et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research. 13:2363-2371.
http://www.hprd.org/

Human Cancer Pathway Interaction Network (HCPIN)
Proteins/Complexes (nodes): 2971 proteins (658 pathway proteins), 240 multiprotein complexes
Interactions (edges): 10583 (292 loops)
Diameter (longest distance): 11
Average distance (how closely nodes are connected): 4.143
Clustering coefficient (completeness of the network): 0.143
Legend: Pathway protein (658); Interaction protein (2313); Multiprotein complex, connecting at least three proteins (240)

Centrality (hub and bottleneck)
Hub: protein with a high number of interactions
Bottleneck: protein that occurs on many shortest paths
Top central proteins: P53, GRB2, EGF receptor (EGFR), RAF1, BRCA1, BCL2, SRC, RB1, PIK3R1, HDAC1, JUN, CREBBP

HCPIN Domains
Domain Name     Frequency   Molecular Function
Collagen        265         extracellular structural proteins
Pkinase         184         protein kinase
zf-C2H2         176         nucleic acid-binding
WD40            173         multi-protein complex assemblies
LRR_1           148         leucine rich repeat, protein-protein interaction
Ank             145         protein-protein interaction motif
fn3             134         cell surface binding, signaling
EGF             122         EGF-like domain
SH3_1           104         signal transduction related to cytoskeletal organization
Ldl_recept_b    101         low-density lipoprotein receptor repeat class B
TPR_1           99          protein-protein interaction
EGF_CA          94          calcium binding EGF domain
IQ              81          calmodulin-binding motif
efhand          79          calcium-binding domain
ig              77          immunoglobulin domain

[Figure: a structure coverage overview of the apoptosis pathway-interaction module. Legend: structure coverage; pathway protein; interaction protein; no SwissProt entry; multiprotein complex (connecting at least three proteins).]

Community Outreach
http://nmr.cabm.rutgers.edu:9090/HCPIN
Janet Huang, Dehua Hang
[Figure: HCPIN web site view. Labels: apoptosis, TLR, p53, IL21 (JAK).]

Target Selection
~1100 human proteins/domains are selected as NESG targets
[Figure: target selection diagram. Counts shown: 658, 2971, 2328, 506, 136, 1160.]
http://nmr.cabm.rutgers.edu:9090/PLIMS/

Community Outreach
SW name   NESG-id             PDB-id       Coverage   Method
TLR2      HC02                1fyw         19%        X-ray
NBEA      HC3                 1mi1         14%        X-ray
MYD88     HR2869A             2js7         52%        NMR
RNPC2     HR4730A             2jrs         18%        NMR
IF16      HR4626A, HR4626B    3b6y, 2oq0   51%        X-ray, X-ray
CUL7      HT1                 2jng         6%         NMR
DGKA      HR532               1tuz         16%        NMR
ZN363     HT2B                2jrj         20%        NMR
PARC      HR3443B             2juf         14%        NMR
RBBP9     HR2978              2qs9         100%       X-ray
Slide labels: HTP PSI-1; Toronto Group; HTP T1

Y. Xu
SW name   NESG-id   PDB-id   Coverage   Method
TLR2      HC02      1fyw     19%        X-ray
NBEA      HC3       1mi1     14%        X-ray
DGKA      HR532     1tuz     16%        NMR
G. Jogl, G. Liu, A. Lemak, P.
Rossi (2)
SW name   NESG-id    PDB-id   Coverage   Method
IF16      HR4626B    2oq0     26%        X-ray
MYD88     HR2869A    2js7     52%        NMR
RNPC2     HR4730A    2jrs     18%        NMR

Structure coverage of HCPIN

Medium-accuracy modeling level (Blast E_value < 10^-6): total structural coverage
                                No.     %SD (a)   %Res (b)
Pathway proteins                600     86        55
HCPIN – interaction proteins    1728    76        42

High-accuracy modeling level (Blast E_value < 10^-6 and >80% sequence identity): total structural coverage (after sw validation)
                                No.     %SD (a)   %Res (b)
Pathway proteins                600     52        23
HCPIN – interaction proteins    1728    44        18

[Figure: Single-Domain and Residue Coverage. Series: PDB hit, Total. Axes: Single-Domain Coverage (%), Residue Coverage (%).]

a. Single-Domain (SD) coverage: the percentage of pathway proteins with single-domain structural coverage.
b. Residue coverage: the number of residues covered by a PDB hit, divided by the total length of the proteins in the pathways. Residues predicted to be low complexity or coiled coil are not counted in the denominator.
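The two coverage definitions above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the HCPIN pipeline: the record layout (`length`, `masked`, `pdb_hits`), the two toy proteins, and the choice to also exclude masked residues from the numerator are all hypothetical, and single-domain coverage is approximated here as "the protein has at least one PDB hit".

```python
def residue_set(intervals):
    """Expand 1-based inclusive (start, end) residue ranges into a set."""
    residues = set()
    for start, end in intervals:
        residues.update(range(start, end + 1))
    return residues


def residue_coverage(proteins):
    """Percent of countable residues that are covered by a PDB hit.

    Residues masked as low complexity or coiled coil are left out of the
    denominator; as an assumption, they are left out of the numerator too.
    """
    covered = 0
    countable = 0
    for protein in proteins:
        masked = residue_set(protein["masked"])
        unmasked = set(range(1, protein["length"] + 1)) - masked
        covered += len(residue_set(protein["pdb_hits"]) & unmasked)
        countable += len(unmasked)
    return 100.0 * covered / countable if countable else 0.0


def single_domain_coverage(proteins):
    """Percent of proteins with at least one PDB hit (assumed proxy for
    single-domain structural coverage)."""
    if not proteins:
        return 0.0
    with_hit = sum(1 for protein in proteins if protein["pdb_hits"])
    return 100.0 * with_hit / len(proteins)


# Hypothetical toy data: residue ranges are 1-based and inclusive.
EXAMPLE_PROTEINS = [
    {"length": 100, "masked": [(91, 100)], "pdb_hits": [(1, 45)]},
    {"length": 110, "masked": [], "pdb_hits": []},
]

if __name__ == "__main__":
    print(residue_coverage(EXAMPLE_PROTEINS))
    print(single_domain_coverage(EXAMPLE_PROTEINS))
```

With the toy data, 45 of 200 countable residues are covered, so residue coverage is 22.5%, and one of two proteins has a hit, so single-domain coverage is 50%. The gap between the two numbers mirrors the table, where %SD is consistently higher than %Res.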