MAPK Signaling Pathway Analysis ESD.342 – Spring 2006

May 11th 2006
Presentation by:
Gergana Bounova, Michael Hanowsky and Nandan Sudarsanam
Faculty: Chris Magee
Agenda
• Background
• Network Statistics: the Usual Suspects
• Structural Analysis: Motifs and Communities
• Other Datasets, Benchmarking
• Future Directions and Scope
MAPK Signaling Pathway
• Cellular Level Biological Network
– Signaling pathways are used to respond to external stimuli and
regulate cellular activities
– MAPKs (Mitogen-Activated Protein Kinases) transfer
information through chemical reactions and mechanistic physical
interactions.
– One pathway, three species: does that give us any information?
• Literature and Data sources
– Literature sources in Protein network analysis
– The proteomics initiative
– Nature of data – an incomplete map
– KEGG database (Kyoto Encyclopedia of Genes and Genomes):
signaling transduction pathways
– DIP (Database of Interacting Proteins): all known interacting
proteins
Network Statistics - Comparisons
Statistic | Drosophila | Yeast | Human
# nodes | 19 | 56 | 148
# edges | 19 | 56 | 187
Edge/node | 1.000 | 1.000 | 1.264
Directed? | No | No | No
Connected? | Yes | No (5 comp) | No (16 comp)
Max,mean,min deg | 6,2,1 | 8,2.154,1 | 15,2.831,1
Max degree node | 9 | 6 | 98
Deg correlation | -0.630 | -0.146 | -0.306
Max,mean,min betw | 27,23.737,20 | 103,62.796,54 | 4639,1035.538,384
Max betw node | 3,14,15,16,17 | 9 | 12
Clust coeff C1,C2 | 0 | 0.064 | 0.008
Max clust coeff node | - | 1 | 1
# triangle loops | 0 | 3 | 3
Mean path length | 3.836 (20% n) | 6.370 (11% n) | 6.454 (4% n)
Network diameter | 8.000 | 16.000 | 17
Notes: edge/node consistently close to 1; degree correlation negative; clustering coefficient consistently close to 0; low reachability for small networks.
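The simpler statistics compared on this slide (node and edge counts, edge/node ratio, degree extremes, number of components) can be computed directly from an edge list. The following is a minimal sketch, assuming an undirected network with no self-loops or duplicate edges; the function names and the toy edge list are illustrative and do not come from the presentation.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def connected_components(adj):
    """Return the connected components as a list of node sets (BFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        components.append(comp)
    return components


def basic_stats(edges):
    """Node/edge counts, edge/node ratio, degree extremes, component count."""
    adj = build_adjacency(edges)
    degrees = [len(nbrs) for nbrs in adj.values()]
    n, m = len(adj), len(edges)
    return {
        "nodes": n,
        "edges": m,
        "edge_per_node": m / n,
        "max_deg": max(degrees),
        "mean_deg": sum(degrees) / n,
        "min_deg": min(degrees),
        "components": len(connected_components(adj)),
    }


# Hypothetical toy network: a 4-node path plus a separate 2-node pair.
toy_edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
stats = basic_stats(toy_edges)
```

Isolated nodes never appear in an edge list, so a network with degree-0 nodes would need its node set passed in separately.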
Network statistics - Drosophila
Drosophila | Undirected | Directed | Random – Preserved Degree Sequence
# nodes | 19 | 19 | 19
# edges | 19 | 19 | 19
Edge/node | 1.000 | 1.000 | 1.000
Directed | No | Yes | No
Connected | Yes | No | Yes
Max,mean,min deg | 6,2,1 | In: 3,1,0, out: 5,1,0 | 6,2,1
Max degree node | 9 | In: 3, out: 9 |
Deg correlation | -0.630 | | -0.203
Max,mean,min betw | 27,23.737,20 | 14,8,1 | 19, 19, 19
Max betw node | 3,14,15,16,17 | 10,11,12,18,19 | n/a
Clust coeff C1,C2 | 0, 0 | 0, 0 | 0.140
Max clust coeff node | - | |
# triangle loops | 0 | |
Mean path length | 3.836 | 4.075 | 3.427
Diameter | 8.000 | 8 | 7.000
[Figure: histograms of degree, betweenness, clustering coefficient, and path length; slide annotations: "as expected", "completely different!", "consistent"]
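The "Random – Preserved Degree Sequence" column compares the pathway with a random network that has the same degrees. The slides do not say how that network was generated; one common approach is repeated edge swapping, in the spirit of the rewiring used by Maslov and Sneppen (listed in the references). A minimal sketch under that assumption, with hypothetical function names and a toy ring network:

```python
import random
from collections import Counter


def degree_counts(edges):
    """Map each node to its degree in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph while keeping every node's degree.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), skipping swaps that would create a self-loop or a
    duplicate edge. Connectivity is NOT guaranteed to be preserved.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap could create a self-loop
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # the swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Hypothetical toy network: a ring of 8 nodes.
ring = [(i, (i + 1) % 8) for i in range(8)]
rewired = degree_preserving_rewire(ring, n_swaps=10, seed=1)
```

Because swaps can disconnect the graph, a comparison restricted to the giant component (as the "(GC)" labels on later slides suggest) would need a separate connectivity check.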
Network statistics - Yeast
Yeast | Undirected | Directed | Yeast – Random Generated, Preserved Degree Sequence (GC)
# nodes | 56 | 56 | 52 (GC)
# edges | 56 | 56 | 56
Edge/node | 1.000 | 1.000 | 1.077
Directed? | No | Yes | No
Connected? | No (5 comp) | No | Yes
Max,mean,min deg | 8,2.154,1 | 8,2,0 | 8, 2.154,1
Max degree node | 6 | In: 6, out: 45 |
Deg correlation | -0.146 | | -0.092
Max,mean,min betw | 103,62.796,54 | 28,12.125,1 | 85, 56.7, 52
Max betw node | 9 | 13, 21 | 8,19
Clust coeff C1,C2 | 0.0638, 0.0376 | 0.0202, 0.0119 | 0.000
Max clust coeff node | 1 | |
# triangle loops | 3 | |
Mean path length | 6.370 | 4.779 | 4.910
Network diameter | 16.000 | 11.000 | 10.000
[Figure: histograms of degree, betweenness, clustering coefficient, and path length]
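The "Deg correlation" row is negative for every pathway. The slides do not define the measure; a common choice is the Pearson correlation between the degrees found at the two ends of each edge (degree assortativity). A minimal sketch under that assumption, for an undirected edge list with illustrative names:

```python
from math import sqrt


def degree_correlation(edges):
    """Pearson correlation of end-point degrees over all edges.

    Each undirected edge contributes both (deg(u), deg(v)) and
    (deg(v), deg(u)), so the measure is symmetric in the two ends.
    Returns NaN when every node has the same degree.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy networks: a hub with three leaves, and a 4-node path.
star = [(0, 1), (0, 2), (0, 3)]
path = [(1, 2), (2, 3), (3, 4)]
r_star = degree_correlation(star)
r_path = degree_correlation(path)
```

A star is the extreme disassortative case: every edge joins the hub to a leaf, so the correlation reaches its minimum of -1.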
Network statistics - Human
Human | Undirected | Directed | Human undirected, randomly-generated (GC)
# nodes | 148 | 148 | 130
# edges | 187 | 187 | 184
Edge/node | 1.264 | 1.264 | 1.415
Directed? | No | Yes | No
Connected? | No (16 comp) | No | Yes
Max,mean,min deg | 15,2.831,1 | 15,2.527,0 | 15, 2.831, 1
Max degree node | 98 | In: 23, out: 98 | 82
Deg correlation | -0.306 | | -0.323
Max,mean,min betw | 4639,1035.538,384 | 383,37.669,1 | 362, 241.6, 160
Max betw node | 12 | 145 | 25,33,41,70,72,73,82
Clust coeff C1,C2 | 0.0075, 0.0045 | 0, 0 | 0.122
Max clust coeff node | 1 | - | 1
# triangle loops | 3 | 0 | 9
Mean path length | 6.454 | 3.931 | 4.286
Network diameter | 17 | 11.000 | 9.000
Becoming more evident: the real pathway has more built-in flexibility,
even though reachability remains low
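Mean path length and diameter on these slides are reported even for networks that are not connected. One way to handle that, assumed here because the slides do not state their convention, is to average shortest-path lengths over reachable ordered pairs only. A minimal breadth-first-search sketch with illustrative names:

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(adj):
    """Return (mean path length, diameter) over reachable ordered pairs.

    adj maps every node to an iterable of neighbours (successors for a
    directed network). Unreachable pairs are ignored rather than counted
    as infinite.
    """
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    mean = total / pairs if pairs else float("nan")
    return mean, diameter


# Hypothetical toy network: the undirected path 1-2-3-4.
toy_adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
mean_len, diam = path_stats(toy_adj)
```

The same function works for a directed network if `adj` lists successors only, which is one reason directed and undirected versions of the same pathway can report different path lengths.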
Motifs – Background
• Coarse Graining – an important bottom-up method of understanding
network structure, by uncovering global patterns.
– This helps us go beyond the global features and understand the relevance of certain
structural elements.
– Motifs are statistically significant patterns of connections that recur throughout the network.
They serve as the basic building blocks of the network.
– Studies have shown that each network motif performs a key information processing function
in biological networks.
• Examples of Motifs studied in Biological networks:
– Directed 3-Node Motifs
– Directed 4-Node Motifs
– Undirected 4-Node Motifs
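To make the idea of a directed 3-node motif concrete, the sketch below counts feed-forward loops (A→B, B→C, A→C), a well-known example from the network-motif literature. It is an illustration only: the slides do not say which motif each index refers to, and the function name and toy edges are hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c.

    This counts pattern occurrences; it does not check that the three
    nodes have no additional edges among them (induced subgraphs).
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    count = 0
    for a in succ:
        for b in succ[a]:
            if b == a:
                continue
            for c in succ[b]:
                if c != a and c != b and c in succ[a]:
                    count += 1
    return count


# Hypothetical toy networks.
ffl_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]  # one feed-forward loop: 1, 2, 3
cycle_edges = [(1, 2), (2, 3), (3, 1)]        # a 3-cycle is not a feed-forward loop
n_ffl = count_feed_forward_loops(ffl_edges)
n_cycle = count_feed_forward_loops(cycle_edges)
```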
Motif Occurrences
Motif index | D.Mel | D.Mel. rand | Yeast | Yeast rand | Human | Human rand
1 | 24 | 36 | 40 | 40 | 538 | 326
2 | 38 | 16 | 128 | 104 | 600 | 694
4 | 8 | 12 | 38 | 74 | 272 | 352
5 | - | 3 | 6 | 3 | 9 | 15
9 | - | - | 3 | - | - | 12
Motif index
D.Mel
D.Mel. rand
Yeast
Yeast rand
1
81
66
228
2
64
82
264
3
-
25
105
4
-
-
8
5
-
-
8
297 Motif index
4488
406
2996
1
150
2
400
4
3
24
4
Human
2
1
3
4
5
Human rand
D.Mel4269 D.Mel. rand
3334
545
28
30
12
Yeast
Yeast rand
Human
Human rand
4
-
204
4
4
-
164
16
-
-
-
12
15
66
1929
411
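Comparing a motif count in the real network with a single randomized network, as in the tables on this slide, gives only a rough impression. Statistical significance is usually judged against an ensemble of randomized networks through a z-score. A minimal sketch of that calculation; the ensemble counts below are made up for illustration and are not data from the presentation.

```python
from statistics import mean, stdev


def motif_z_score(real_count, random_counts):
    """Z-score of a motif count against counts from randomized networks.

    random_counts should hold the count of the same motif in each member
    of a randomized ensemble (at least two members).
    """
    mu = mean(random_counts)
    sigma = stdev(random_counts)  # sample standard deviation
    if sigma == 0:
        raise ValueError("random counts have zero variance; z-score undefined")
    return (real_count - mu) / sigma


# Hypothetical numbers only: a motif seen 10 times in the real network
# and 4 and 6 times in two randomized networks.
z = motif_z_score(10, [4, 6])
```

In practice the ensemble would contain many more than two networks, each generated with the same degree sequence as the real one.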
The Newman-Girvan View
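The slide itself carries only a figure. As background for community analysis in the Newman-Girvan sense, the sketch below computes the modularity Q of a given partition, the quantity commonly used to judge how good a community split is. This is an illustrative assumption about what such an analysis involves, not a reconstruction of the presenters' procedure; the function name and toy network are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected simple graph.

    Q = sum over communities c of [ l_c / m - (d_c / (2 m))**2 ], where
    m is the number of edges, l_c the number of edges inside c, and d_c
    the total degree of the nodes in c.
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}
q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Splitting the toy network at the bridge gives a clearly positive Q, while putting every node in one community gives Q = 0.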
All Protein Datasets
• Yeast original: 8992 nodes, 5952 links
– First refinement: 2554 nodes, 5728 edges
– Second refinement: 2408 nodes, 5668 edges
– Edge/node: 2.4, clust=0.294, meanL=5.197, diam=14
• Drosophila original: 28052 nodes, 22819 links
– First refinement: 7451 nodes, 22636 edges
– Second refinement: 7355 nodes, 22593 links
– Edge/node: 3.072, clust=0.016, meanL=8.009
• Human original: 28155 nodes, 1397 links
– First refinement: 1085 nodes, 1346 links
– Second refinement: 939 nodes, 1276 links
– Edge/node: 1.359, clust=0.235, meanL=6.822
[Figures: Yeast Core Proteins Main Component; Human Proteins Connected Component]
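The edge/node figures on this slide follow from the second-refinement counts. A short check, using only the node and edge numbers quoted above (the dictionary layout is just for illustration):

```python
# (nodes, edges) after the second refinement, as quoted on the slide.
second_refinement = {
    "Yeast": (2408, 5668),
    "Drosophila": (7355, 22593),
    "Human": (939, 1276),
}

# Edge/node ratio for each dataset, rounded to three decimals.
edge_per_node = {
    name: round(edges / nodes, 3)
    for name, (nodes, edges) in second_refinement.items()
}
```

The yeast ratio computed this way is about 2.354, which the slide reports rounded to 2.4.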
Conclusions & Future Work
• Analyzed three pathway sets structurally and detected
some similar patterns (modules, communities)
• Found betweenness as the best signature of pathways
and a sign of flexibility in biological networks
• Building blocks are preserved across pathways, and
certain motifs found by others (the bi-fan, for example)
do appear statistically significant
• Plan to perform coarse-graining on larger datasets to
look for further structural matching
• Benchmarking of motifs and other statistical measures
with whole known protein datasets
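Since betweenness is singled out as the best pathway signature, a sketch of how node betweenness can be computed may be useful. The version below uses Brandes' algorithm for an unweighted, undirected network and counts each unordered pair once, excluding endpoints. The presentation's betweenness values may follow a different convention (for example one that includes endpoints), so the numbers are not expected to match the tables; names and the toy network are illustrative.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness of every node (Brandes' algorithm).

    adj maps each node to an iterable of neighbours in an undirected,
    unweighted graph. Endpoints are excluded and each unordered pair of
    nodes is counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both ends, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: the path 1-2-3-4.
toy_adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
toy_bc = betweenness(toy_adj)
```

On the path, the two interior nodes each lie on the shortest paths of two node pairs, while the end nodes lie on none.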
References
• Itzkovitz, S., Levitt, R., Kashtan, N., Milo, R., Itzkovitz, M., and Alon, U.,
Coarse-Graining and Self-Dissimilarity of Complex Networks. arXiv:q-bio.MN/0405011 v1, May 2004.
• Jeong, H., Mason, S. P., Barabási, A.-L., and Oltvai, Z. N., Lethality and
centrality in protein networks. Nature 411, 41–42 (2001).
• Maslov, S., and Sneppen, K., Specificity and stability in topology of protein
networks. Science 296, 910–913 (2002).
• Milo, R., Shen-Orr, S. S., Itzkovitz, S., Kashtan, N., and Alon, U., Network
motifs: simple building blocks of complex networks. Science 298, 824–827
(2002).
• Sole, R. V., Pastor-Satorras, R., Smith, E., and Kepler, T. B., A model of
large-scale proteome evolution. Advances in Complex Systems 5, 43–54
(2002).
• Wuchty, S., and Almaas, E., Peeling the Yeast protein network.
Proteomics 5(2), 444–449 (2005).
Distribution Comparisons - Human
[Figure: two sets of histograms, each showing degree, betweenness, clustering coefficient, and path length]