BioNet of Human Cancers

Representations of Protein Structures
a - full atom
b, c - strands / helices
d - topology diagrams
Multiple Sequence Alignment (MSA)
Protein Domains
“Independent Folding Units”
50 - 350 residues
Mean size - 125 residues
Alpha folds; Beta Folds;
Alpha+Beta Folds; Alpha/Beta Folds
Principal Protein Fold Classes
All alpha
alpha + beta
All beta
alpha / beta
COG 272, BRCT family
P. Bork et al
Fold Classification
SCOP Database - manual curation
CATH Database - largely automated,
manual refinement
Dali Database - fully automated
Structural Validation
of Homology
19% Seq ID
Z = 12.2
Adenylate Kinase
Guanylate Kinase
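The 19% value above is a pairwise sequence identity. A minimal sketch of how such a number can be computed from two aligned sequences is shown here; the function name, the toy sequences, and the choice of denominator (columns where neither sequence has a gap) are assumptions for illustration, not the procedure used for the slide.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are not counted (one of several conventions)
        aligned += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


# Toy alignment with hypothetical sequences (not the kinases on the slide)
print(percent_identity("ACDE-G", "ACNEFG"))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities are only comparable when the denominator is stated.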
Eukaryotes
[Chart: genomes sc, at, dm, ce, hs on one axis; scale 0 to 30000 on the other. Legend: dm+ce+hs: 45 families; at+dm+ce+hs: 56 families; All: 381 families; Other families]
courtesy of C. Chothia
Homology modeling
– Use structural information from experimentally determined
protein structures to predict structure of similar (homologous)
protein
– Servers: SwissModel, 3DJigSaw, EsyPred3D, MODELLER, HOMA
– Limitations in distinguishing between correct and wrong homology
models
HOMA: homology modeling by satisfaction
of spatial restraints
Li, H.; Tejero, R.; Monleon, D.; Bassolino-Klimas, D.; Abate-Shen, C.;
Bruccoleri, R.E.; Montelione, G.T. Homology modeling using simulated
annealing of restrained molecular dynamics and conformational search
calculations with CONGEN: Application in predicting the three-dimensional
structure of murine homeodomain Msx-1. Protein Science 1997, 6: 956 - 970.
Bhattacharya, A.; Wunderlich, Z.; Monleon, D.; Tejero, R.; Montelione, G.T.
Assessing model accuracy using the homology modeling automatically (HOMA)
software. PROTEINS: Struct. Funct. Bioinformatics 2007, 70: 105 - 118.
– Calculate inter-atomic distances between ‘homologous
atoms’, in template structure
– Random subset to generate distance constraints
– Refinement protocols
• DYANA (Güntert P, et al, 1997 J Mol Biol 273: 283)
• XPLOR (Brünger AT. X-PLOR, Version 3.1, Schwieters et al, 2003 JMR 160: 65)
• Hybrid DYANA / XPLOR
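The first two steps above (distances between homologous atoms in the template, then a random subset turned into constraints) can be sketched as follows. This is an illustrative sketch, not the HOMA implementation: the function name, the `fraction` and `tolerance` parameters, the atom identifiers, and the symmetric lower/upper bounds are all assumptions.

```python
import math
import random


def distance_restraints(template_coords, atom_map, fraction=0.5, tolerance=0.5, seed=0):
    """Build distance restraints for a target from a template structure.

    template_coords: dict of template atom id -> (x, y, z) in Angstrom
    atom_map: dict of template atom id -> homologous target atom id
    Returns a list of (target_atom_1, target_atom_2, lower, upper).
    """
    rng = random.Random(seed)
    ids = sorted(a for a in atom_map if a in template_coords)
    # All pairs of template atoms that have a homologous atom in the target
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    if not pairs:
        return []
    # Keep only a random subset of the pairs (hypothetical selection rule)
    k = max(1, round(fraction * len(pairs)))
    restraints = []
    for a, b in rng.sample(pairs, k):
        d = math.dist(template_coords[a], template_coords[b])
        restraints.append((atom_map[a], atom_map[b], max(0.0, d - tolerance), d + tolerance))
    return restraints


# Hypothetical four-atom template; ids map one-to-one onto the target
template = {"CA1": (0.0, 0.0, 0.0), "CA2": (3.8, 0.0, 0.0),
            "CA3": (3.8, 3.8, 0.0), "CA4": (0.0, 3.8, 0.0)}
mapping = {name: name for name in template}
for restraint in distance_restraints(template, mapping, fraction=0.5):
    print(restraint)
```

The resulting bounds would then be passed to a refinement program such as those listed above; the input format each program expects is not covered here.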
Test set to evaluate HOMA
• Proteins with available experimental structure
– Filtered for quality of structure
• 24 groups of homologous (same SCOP family) proteins
– 30 % to 85 % pairwise sequence identity within each group
– Each protein modeled using structure of other proteins in group
– 264 homology models generated
• Control sets
– self-modeled: Proteins modeled using own experimental
structure as template (90)
– wrongly-folded: Proteins modeled with template from different
SCOP family (246)
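The three kinds of modeling runs described above (homologous template, self template, wrong-fold template) can be enumerated with a short sketch. The family and protein names are hypothetical, and the slides do not say how the wrongly-folded pairs were chosen, so taking every cross-family pair is an assumption.

```python
from itertools import permutations


def homology_jobs(groups):
    """Each protein modeled on every other protein of the same family."""
    return [(target, template)
            for members in groups.values()
            for target, template in permutations(members, 2)]


def self_jobs(groups):
    """Control: each protein modeled on its own experimental structure."""
    return [(p, p) for members in groups.values() for p in members]


def wrong_fold_jobs(groups):
    """Control: target and template taken from different families."""
    jobs = []
    for fam_a, members_a in groups.items():
        for fam_b, members_b in groups.items():
            if fam_a != fam_b:
                jobs.extend((t, tpl) for t in members_a for tpl in members_b)
    return jobs


# Hypothetical groups, not the 24 SCOP families of the test set
groups = {"famA": ["a1", "a2", "a3"], "famB": ["b1", "b2"]}
print(len(homology_jobs(groups)), len(self_jobs(groups)), len(wrong_fold_jobs(groups)))
```

A family with n members contributes n(n - 1) ordered target/template pairs, which is why the number of homology models grows quickly with group size.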
Accuracy assessment for HOMA
models
• RMSD used to calculate accuracy of homology models
– Backbone heavy atoms (N, Cα, C’)
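For N matched backbone atoms, the RMSD is sqrt((1/N) * sum_i |r_i - r'_i|^2). A minimal sketch is given below; it assumes the model has already been superimposed on the experimental structure and that both coordinate lists contain the same backbone atoms (N, Cα, C') in the same order. The optimal superposition step itself is not shown.

```python
import math


def backbone_rmsd(model, reference):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))


# Hypothetical coordinates: the model is the reference shifted by 1 Angstrom in x
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.4, 1.2, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in reference]
print(backbone_rmsd(model, reference))
```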
[Figures: backbone atom RMSD to experimental structure; panels: entire test set, comparison with other methods]
Cancer Pathways
Visualization and Representation
Jason Lu and Mark Gerstein
Yale University
Legend and Arrow Ontology
Ligand
Receptor
Cleavage/cuts
translocates
Kinase
Adaptor
Enzymes
Transcription factor
Other Protein
Plasma Membrane
Nuclear Membrane
P Phosphate group
activates
inhibits
Toll-like Receptor Pathway
[Pathway diagram: Ligand, TLR, MyD88, IRAK1, IRAK4, TRAF6, TAB1, TAB2, TAK1, IKK subunits, IκB, p50/p65; connects to the Interferon Gamma Pathway and the NF-κB pathway]
Pathway in detail
• The innate immune response responds in a general manner to factors
present in invading pathogens. Bacterial factors such as
lipopolysaccharides (LPS, endotoxin), bacterial lipoproteins,
peptidoglycans and also CpG nucleic acids activate innate immunity as
well as stimulating the antigen-specific immune response and triggering
the inflammatory response.
• Members of the toll-like receptor (TLR) gene family convey signals
stimulated by these factors, activating signal transduction pathways that
result in transcriptional regulation and stimulate immune function.
• The downstream signaling pathways used by these receptors activate the
IL-1 receptor associated kinase (IRAK) through the MyD88 adaptor protein,
and signal through TRAF6 and protein kinase cascades to activate the NF-κB and MAPK pathways.
• NF-κB and other pathways then activate transcription of genes such as the
proinflammatory cytokines IL-1 and IL-12.
Interferon-Gamma Pathway
[Pathway diagram: IFN-γ, IFN-γR, JAK2 (phosphorylated), TID1, HSP70, IKK subunit; tumor suppressors RB, p53, WT1; connects to the JAK-STAT Pathway and the NF-κB pathway]
Pathway in detail
• Signaling by interferon-gamma stimulates anti-viral responses and tumor
suppression through the heterodimeric interferon-gamma receptor.
• Signaling is initiated by binding of interferon-gamma to its receptor,
activating the receptor-associated JAK2 tyrosine kinase to phosphorylate
STAT transcription factors that activate interferon responsive genes.
• Molecular chaperones that modulate or alter protein folding interact with
different components of the interferon signaling pathway. One chaperone
that modulates interferon signaling is hTid-1, a member of the DnaJ
family of chaperones and a cochaperone for the heat shock protein Hsp70,
another molecular chaperone.
• Hsp70 holds JAK2 in an inactive conformation prior to ligand activation,
and is released in the presence of agonist to allow the activation of JAK2
and downstream pathways.
JAK-STAT Pathway
[Pathway diagram: cytokines, receptor, JAK2 and TYK2 (phosphorylated), p53 (phosphorylated), DNA transcription; connects to the mTOR Pathway and the MAPK Pathway]
Pathway in detail
• The Janus kinase-signal transducer and activator of transcription
(JAK-STAT) pathway is capable of transmitting information from
extracellular polypeptide signals through transmembrane
receptors, directly from the cytoplasm to target gene promoters in
the nucleus.
• Evolutionarily, the major components are conserved from slime
molds to humans, but are absent from fungi and plants.
• This canonical pathway presents the major themes common to
most systems that use JAK-STAT signaling.
TGF-beta Pathway
[Pathway diagram: Ligand, TGF-beta R, SARA, R-SMAD, SMAD4, I-SMAD, Smurf, DNA-BP, target genes; p38, JNKs, ERKs]
Pathway in detail
• Members of the transforming growth factor beta (TGFb) superfamily of ligands
initiate signaling by binding to and inducing formation of heteromeric complexes of
type I and type II Ser-Thr kinase receptors.
• The activated type I receptor then propagates the signal to members of the Smad
family of intracellular mediators.
• Smad anchor for receptor activation (SARA) appears to be important for recruiting
R-Smads to the TGFb receptor complex.
• Once phosphorylated, R-Smads form heteromeric complexes with the common
Smad (Co-Smad), Smad4. This heteromeric complex then translocates to the
nucleus to modulate the activity of specific promoters through physical interactions
with DNA-binding partners.
• Inhibitory Smads (I-Smads) antagonize signaling. Smurfs are E3 ubiquitin ligases
that associate with certain R- and I-Smads to mediate ubiquitination and
degradation of either Smads or Smad-associated proteins, including the receptor
complex.
Regulation
Common Themes
• Common components found in all pathways: ligands,
receptors, kinases and transcription factors.
These correspond to the different stages of initial
signal/binding, signal transduction,
amplification/cascade and final effect.
• Phosphorylation is the most commonly repeated step.
Why? It is a rapid, reversible covalent modification that is
easy to regulate reciprocally via kinases and
phosphatases.
Common Themes
• Ubiquitination and cleavage/proteolysis are rare.
• Why? This may be due to the nature of the pathways, i.e.
these steps may be more common in degradative/apoptotic pathways.
• Proteolysis is complete and irreversible, and it carries a high
cellular energy cost. It makes more
sense to have reversible phosphorylations.
Common Themes
• Loops: when they occur, usually negative
feedback loops of downstream proteins
inhibiting more upstream targets.
• Why? Negative feedback is used to maintain
homeostasis and ensure a desirable level of
cellular flux. (think metabolism)
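The homeostasis argument can be illustrated with a toy model: one species x is produced in response to a stimulus and degraded at a constant rate, and with negative feedback x inhibits its own production. The model form, the parameter names, and all values are assumptions chosen for illustration; they are not taken from any pathway in these slides.

```python
def steady_state(stimulus, feedback, k_inhibit=1.0, degradation=1.0, dt=0.01, steps=5000):
    """Integrate dx/dt = production - degradation * x with forward Euler.

    With feedback, production = stimulus / (1 + x / k_inhibit);
    without feedback, production = stimulus.
    """
    x = 0.0
    for _ in range(steps):
        production = stimulus / (1.0 + x / k_inhibit) if feedback else stimulus
        x += dt * (production - degradation * x)
    return x


def fold_change(feedback):
    """Change in steady-state output when the stimulus is doubled."""
    return steady_state(2.0, feedback) / steady_state(1.0, feedback)


print("without feedback:", fold_change(False))
print("with feedback:   ", fold_change(True))
```

In this toy model a doubling of the stimulus doubles the output without feedback, while the feedback loop dampens the change, which is the buffering behavior the bullet describes.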
More on Regulation….
• Regulation tends to occur at key steps (i.e.
bottlenecks or checkpoints; not all steps are
heavily regulated), usually before the
amplification cascade.
• Why? Makes better sense to regulate at key
control points (e.g. receptor binding) before
the rapid cascading takes place.
Example: TLR Pathway Breakdown
[Pathway diagram: the Toll-like Receptor Pathway (Ligand, TLR, MyD88, IRAK1, IRAK4, TRAF6, TAB1, TAB2, TAK1, IKK subunits, IκB, p50/p65) annotated with the stages Signal, Transduction, Amplification and Effect, with regulatory hubs marked; connects to the Interferon Gamma Pathway and the NF-κB pathway]
Pathway Crosstalks
Pathways are Interconnected
• Individual pathways interconnect at different
points
• Some pathways are downstream targets
regulated by others
• These ‘crosstalks’ form a ‘cancer pathways
network’
Some Pathway Crosstalks
[Diagram: crosstalk among MAPK (EGF), JAK-STAT, IFN-gamma, Toll, TGF-β and NF-κB, drawn with "activates" and "inhibits" arrows. Edge labels: Regulates Smads (twice); TAK1 → MAPK; Early phase NF-κB; Late phase NF-κB]
BioNet of Human Cancers
E. White
Human Cancer Pathway Interaction Network
(HCPIN)
• Cell cycle progression
• Apoptosis
• Toll-like receptor pathway
• Interferon alpha/beta
• JAK-STAT pathway
• TGF-beta pathway
• PI3K pathway
• MAPK pathway
BioNet – Biomedical target selection from
interaction networks
Systematically complete
structural coverage of
pathways and interaction
networks
Study structures of
complexes
Pathway-Interaction Subnet
KEGG Pathway Database
Ogata, H. et al (1999). Nucleic Acids Res 27, 29-34.
HPRD
• The Human Protein Reference Database
– All the information in HPRD has been manually extracted from the
literature by expert biologists who read, interpret and analyze the
published data.
Peri, S. et al. (2003) Development of human protein reference database as an initial platform
for approaching systems biology in humans. Genome Research. 13:2363-2371.
http://www.hprd.org/
Human Cancer Pathway Interaction Network
(HCPIN)
Proteins/Complexes (nodes): 2971 proteins (658 pathway proteins, 2313 interaction proteins); 240 multiprotein complexes (each connecting at least three proteins)
Interactions (edges): 10583 (292 loops)
Diameter (longest distance): 11
Average distance (how closely nodes are connected): 4.143
Clustering coefficient (completeness of the network): 0.143
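The three network statistics reported above can be computed on a small undirected graph as follows. This is a sketch on a hypothetical five-node graph, not a reproduction of the HCPIN values; it assumes an unweighted, connected graph, ignores self-loops when computing clustering, and uses the average of per-node clustering coefficients, which may differ from the definition used for the slide.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_average_distance(adj):
    """Longest and mean shortest-path length over all reachable node pairs."""
    longest = total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links)."""
    coeffs = []
    for v, nbrs in adj.items():
        nbrs = nbrs - {v}  # ignore self-loops
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Hypothetical toy network
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(diameter_and_average_distance(adj), average_clustering(adj))
```

Running one breadth-first search per node is adequate for a network of a few thousand nodes and about ten thousand edges, the scale reported above.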
Centrality (hub and bottleneck)
Hub: protein with a high number of interactions
Bottleneck: protein that occurs on many shortest paths
Top central proteins: P53, GRB2, EGFR (EGF receptor), RAF1, BRCA1, BCL2, SRC, RB1, PIK3R1, HDAC1, JUN, CREBBP
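The two centrality notions can be made concrete with a small sketch: degree counts interactions (hubs), and betweenness counts the shortest paths passing through a node (bottlenecks). The betweenness routine below follows Brandes' algorithm for unweighted graphs; the node names and the toy network are hypothetical and unrelated to the proteins listed above.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of distinct interaction partners per node (self-loops ignored)."""
    return {v: len(nbrs - {v}) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized shortest-path betweenness for an undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):        # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical network: H has many partners, B bridges two modules
edges = [("H", "h1"), ("H", "h2"), ("H", "h3"), ("H", "h4"),
         ("H", "B"), ("B", "X"), ("X", "x1"), ("X", "x2")]
adj = build_adjacency(edges)
print(degree(adj)["H"], degree(adj)["B"])
print(betweenness(adj)["H"], betweenness(adj)["B"])
```

In this toy network node B has only two partners yet lies on many shortest paths, which is the distinction between a bottleneck and a hub.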
HCPIN Domains
[Figure labels: Pkinase, zf-C2H2, WD40, Collagen]
Domain Name     Frequency   Molecular Function
Collagen        265         extracellular structural proteins
Pkinase         184         protein kinase
zf-C2H2         176         nucleic acid-binding
WD40            173         multi-protein complex assemblies
LRR_1           148         leucine rich repeat, protein-protein interaction
Ank             145         protein-protein interaction motif
fn3             134         cell surface binding, signaling
EGF             122         EGF-like domain
SH3_1           104         signal transduction related to cytoskeletal organization
Ldl_recept_b    101         low-density lipoprotein receptor repeat class B
TPR_1           99          protein-protein interaction
EGF_CA          94          calcium binding EGF domain
IQ              81          calmodulin-binding motif
efhand          79          calcium-binding domain
ig              77          immunoglobulin domain
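A frequency table of this kind can be tallied from per-protein domain annotations. The sketch below uses hypothetical protein names and annotations; whether the frequencies above count domain occurrences or proteins containing the domain is not stated in the slides, so both options are shown.

```python
from collections import Counter


def domain_frequencies(annotations, per_protein=False):
    """Count domains across proteins.

    annotations: dict of protein name -> list of domain names (repeats allowed)
    per_protein: if True, count each domain at most once per protein
    """
    counts = Counter()
    for domains in annotations.values():
        counts.update(set(domains) if per_protein else domains)
    return counts


# Hypothetical annotations using domain names from the table
annotations = {
    "protA": ["Pkinase", "SH3_1"],
    "protB": ["WD40", "WD40", "WD40"],
    "protC": ["Pkinase"],
}
print(domain_frequencies(annotations).most_common())
print(domain_frequencies(annotations, per_protein=True).most_common())
```

Repeat domains such as WD40 rank very differently under the two conventions, so the counting rule should be reported alongside the table.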
A structure coverage overview of the apoptosis pathway-interaction module
[Figure legend: structure coverage; pathway protein; interaction protein; no SwissProt entry; multiprotein complex (connecting at least three proteins)]
Community Outreach
http://nmr.cabm.rutgers.edu:9090/HCPIN
Janet Huang
Dehua Hang
[Screenshot labels: apoptosis, TLR, p53, IL21(JAK)]
Target Selection
~1100 human proteins/domains are selected as NESG targets
[Figure values: 2971, 658, 2328, 506, 136, 1160]
http://nmr.cabm.rutgers.edu:9090/PLIMS/
Community Outreach
SW name   NESG-id            PDB-id       Coverage   Method
TLR2      HC02               1fyw         19%        X-ray
NBEA      HC3                1mi1         14%        X-ray
MYD88     HR2869A            2js7         52%        NMR
RNPC2     HR4730A            2jrs         18%        NMR
IF16      HR4626A, HR4626B   3b6y, 2oq0   51%        X-ray, X-ray
CUL7      HT1                2jng         6%         NMR
DGKA      HR532              1tuz         16%        NMR
ZN363     HT2B               2jrj         20%        NMR
PARC      HR3443B            2juf         14%        NMR
RBBP9     HR2978             2qs9         100%       X-ray
Other labels on slide: HTP PSI-1; Toronto Group; HTP T1
Names on slide: Y. Xu; G. Jogl; G. Liu; A. Lemak; P. Rossi (2)
SW name   NESG-id   PDB-id   Coverage   Method
TLR2      HC02      1fyw     19%        X-ray
NBEA      HC3       1mi1     14%        X-ray
DGKA      HR532     1tuz     16%        NMR
IF16      HR4626B   2oq0     26%        X-ray
MYD88     HR2869A   2js7     52%        NMR
RNPC2     HR4730A   2jrs     18%        NMR
Structure coverage of HCPIN
Total structural coverage (after sw validation)

Medium-accuracy modeling level (BLAST E-value < 10^-6)
                              No.    %SD (a)   %Res (b)
Pathway proteins              600    86        55
HCPIN interaction proteins    1728   76        42

High-accuracy modeling level (BLAST E-value < 10^-6 and >80% sequence identity)
                              No.    %SD (a)   %Res (b)
Pathway proteins              600    52        23
HCPIN interaction proteins    1728   44        18

a. Single-domain (SD) coverage:
- The percentage of pathway proteins with single-domain structural coverage.
b. Residue coverage:
- The number of residues covered by a PDB hit, divided by the total length of proteins in the
pathways. Residues predicted to be low complexity or coiled coil are not counted in the
denominator.
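The two modeling levels and the residue-coverage definition can be written out as a short sketch. The function names and the input layout are hypothetical, and excluding low-complexity and coiled-coil residues from the numerator as well as the denominator is an assumption made so that coverage cannot exceed 100%.

```python
def modeling_level(e_value, percent_identity):
    """Classify a BLAST hit against the PDB using the thresholds in the table."""
    if e_value < 1e-6 and percent_identity > 80:
        return "high"
    if e_value < 1e-6:
        return "medium"
    return "none"


def residue_coverage(proteins):
    """Percent of counted residues covered by a PDB hit.

    proteins: list of dicts with keys
      'length'   - sequence length
      'covered'  - set of residue indices covered by a PDB hit
      'excluded' - set of residue indices predicted low complexity or coiled coil
    """
    covered = counted = 0
    for p in proteins:
        counted += p["length"] - len(p["excluded"])
        covered += len(p["covered"] - p["excluded"])
    return 100.0 * covered / counted if counted else 0.0


# Hypothetical proteins, not HCPIN data
proteins = [
    {"length": 100, "covered": set(range(0, 50)), "excluded": set(range(40, 60))},
    {"length": 20, "covered": set(), "excluded": set()},
]
print(modeling_level(1e-30, 92.0), modeling_level(1e-10, 35.0), modeling_level(0.5, 95.0))
print(residue_coverage(proteins))
```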
Single-Domain and Residue Coverage
[Chart: PDB hit vs. total; axes: Single-Domain Coverage (%) and Residue Coverage (%)]