Network_Lecture

Biological Networks
8 April 2015
Slides courtesy of Eric Franzosa, Kimberly Glass
Co-authorship of scientific articles
http://www.jeffkennedyassociates.com:16080/connections/concept/image.html
Networks in Molecular Biology
• Protein-Protein interactions
• Protein-DNA interactions
• Genetic interactions
• Metabolic reactions
• Co-expression interactions
• Text mining interactions
• Association Networks
• Etc.
Barabási & Oltvai, Nature Reviews Genetics, 2004
1. Network terminology and vocabulary
• Paths
• Motifs
• Node metrics
• Network metrics
2. Translation to biological networks
• Undirected interactions
• Functional networks
• Predicting PPI and inferring knowledge from them
• Directed networks
3. Activities
• Parsing network data;
• KEGG, STRING, and Cytoscape;
• Kevin Bacon
Introduction to networks
NETWORKS
A network is a collection of “things” connected by relationships
(in math language a network is called a graph).
It is a set of vertices V and edges E: G = (V, E).
Vocabulary
NETWORKS
The “things” being connected are called nodes
(or, in math language, vertices).
V = {v1, v2, v3, v4, v5, v6}
Vocabulary
NETWORKS
Relationships/connections between nodes are called edges
(the same term is used in math language).
E = {(v1, v2), (v1, v3), (v1, v4), (v1, v5) , (v1, v6)}
Vocabulary
NETWORKS
An edge is said to be incident to two nodes.
Two nodes are connected by an edge.
Vocabulary
NETWORKS
An edge can be undirected (“A and B do/are something”): A - B
or directed (“A does/is something to B”): A → B
Network examples
NETWORKS
Network        Node is...   Edge is...         Directed?
-              Person       Friendship         No
Politics       Politician   Shared project     No
The Internet   Website      Hyperlink          Yes
Family tree    Person       Descent/Marriage   Yes/No
Vocabulary
NETWORKS
The number of edges incident to a node is the node’s
degree. Nodes of high degree are called hubs.
[Figure: star network with node degrees 1, 1, 1, 5, 1, 1]
This hub is a degree-5 node
Vocabulary
NETWORKS
Degree in directed networks can be split into in-degree and
out-degree for number of incoming and outgoing edges
respectively.
[Figure: directed network with nodes labeled In-Degree/Out-Degree: 1/0, 0/1, 1/0, 2/3, 0/1, 1/0]
Graphs
• Graph G=(V,E) is a set of vertices V and edges E
– V = {v1, v2, v3, v4, v5}
– E = {(v1, v2), (v1, v3), (v2, v4), (v2, v5) , (v3, v5)}
• A subgraph G’ of G is induced by some V’ ⊆ V and E’ ⊆ E
– For example, V’ = {v1, v2, v3} and E’ = {(v1, v2), (v1, v3)}
[Figure: graph G on nodes v1 to v5 and the subgraph G’ on nodes v1, v2, v3]
Networks and Graphs: Terminology
• Formally, a network is a graph:
– G = (V, E), an ordered tuple of two sets
– V = {v1, …, vn}, a set of unique nodes, and
– E = {(vi, vj), …}, a set of (un)ordered node tuples
• Graph types illustrated: Undirected, Directed, Weighted (example edge weights: -2, 0.5, 6, 1.2), Bipartite, Multigraph, Loops (Self-connections), Cyclic, Acyclic (DAG)
Sparse vs Dense
• G(V, E), where |V| = n is the number of vertices and |E| = m is the
number of edges
• Graph is sparse if m ~ n
• Graph is dense if m ~ n²
• Complete graph when m = n(n-1)/2 (undirected, no self-loops)
Connected Components
• G(V,E)
• |V| = 69
• |E| = 71
• 6 connected components
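Counting connected components can be sketched as a repeated graph traversal: start from any unseen node, collect everything reachable from it, and repeat. The function name and the six-node toy network below are illustrative assumptions, not the 69-node network from the slide.

```python
def connected_components(nodes, edges):
    """Group the nodes of an undirected network into connected components."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical toy network: one chain, one pair, one isolated node.
toy_nodes = [1, 2, 3, 4, 5, 6]
toy_edges = [(1, 2), (2, 3), (4, 5)]
components = connected_components(toy_nodes, toy_edges)
print(len(components), components)
```

An isolated node counts as its own component, which is why node and edge counts alone do not determine the number of components.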
PATHS
Vocabulary
INTRO
End Node
Path
Start Node
Path length = 4
Vocabulary: Weighted Edges (distance)
INTRO
[Figure: network with weighted edges; a path from Start Node to End Node is highlighted]
Larger distance = weaker connection
Weighted path length = 9
Paths
A path is a sequence {x1, x2,…, xn} such that (x1,x2),
(x2,x3), …, (xn-1,xn) are edges of the graph.
A closed path xn=x1 on a graph is called a graph
cycle or circuit.
Shortest-Path between nodes
Longest Shortest-Path
Breadth First Search (BFS)
SHORTEST PATH
Goal: Search for a node j starting from a start node i
BFS Algorithm
- Begin at start node i
- Explore all neighbors
- For each neighbor, explore its neighbors
- Keep going till you find search node j
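The BFS steps above can be sketched in a few lines. This is a minimal version that assumes an unweighted network stored as an adjacency dictionary; the function name and the toy network are hypothetical. Recording each node's parent lets the path be rebuilt once node j is found.

```python
from collections import deque


def bfs_shortest_path(adjacency, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    parent = {start: None}  # doubles as the set of already-seen nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # goal is not reachable from start


# Hypothetical toy network (not taken from the slides).
toy = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(bfs_shortest_path(toy, "a", "e"))
```

Because neighbors are explored level by level, the first time the goal is dequeued its path has the minimum number of edges.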
BFS Train Problem
SHORTEST PATH
How do I get from Frankfurt to Munich with the fewest
number of connections?
FKM
Simple version does not account for weights
Dijkstra’s Algorithm
SHORTEST PATH
Finds shortest path in a network
Network can be weighted or unweighted (distances = 1)
Network can be directed or undirected
Widely used in computer network routing protocols
and transportation route calculations
Basic idea
Consider closest nodes (to start node) rather than
every neighbor. In unweighted case it is BFS.
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 1 (Initialization):
Sort neighbors by distance to start node. Mark all nodes
(except start) as unvisited.
[Figure: train network with nodes F, Ma, Kr, W, K, E, N, A, S, M and edge weights 85, 173, 217, 80, 186, 103, 502, 250, 183, 84, 167]
Node  Visited?  Distance
F     1         0
Ma    0         85
Kr    0         ???
K     0         173
W     0         217
N     0         ???
E     0         ???
A     0         ???
M     0         ???
S     0         ???
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 2 (Visit closest neighbors):
Visit closest node to start. Mark as visited and keep track of path.
Calculate/update neighbor distances to start node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    0         165
K     0         173
W     0         217
N     0         ???
E     0         ???
A     0         ???
M     0         ???
S     0         ???
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     0         173
W     0         217
N     0         ???
E     0         ???
A     0         415
M     0         ???
S     0         ???
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     0         217
N     0         ???
E     0         ???
A     0         415
M     0         675
S     0         ???
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     1         217
N     0         320
E     0         403
A     0         415
M     0         675
S     0         ???
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     1         217
N     1         320
E     0         403
A     0         415
M     0         675
S     0         503
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     1         217
N     1         320
E     1         403
A     0         415
M     0         675
S     0         503
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
Step 3 (Repeat):
Repeat Step 2 until you reach destination. Never revisit a node.
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     1         217
N     1         320
E     1         403
A     1         415
M     0         599
S     0         503
Dijkstra’s Algorithm: Train Problem F → M
SHORTEST PATH
DONE!!!!!
Node  Visited?  Distance
F     1         0
Ma    1         85
Kr    1         165
K     1         173
W     1         217
N     1         320
E     1         403
A     1         415
M     0         599
S     0         503
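The visit-closest-then-update loop of the train example can be sketched with a priority queue. This sketch includes only the nine edges whose endpoints can be read off the distance tables (for example, Kr = 85 + 80 implies an Ma-Kr edge of weight 80). The two remaining weights, 84 and 167, are left out because their endpoints are not recoverable from the tables alone, so in this reduced network M is reachable only through K. The function and variable names are illustrative assumptions.

```python
import heapq


def dijkstra(graph, start):
    """Return the shortest distance from start to every reachable node."""
    dist = {start: 0}
    visited = set()
    heap = [(0, start)]  # (distance from start, node)
    while heap:
        d, node = heapq.heappop(heap)  # closest unvisited candidate
        if node in visited:
            continue  # stale queue entry: never revisit a node
        visited.add(node)
        for neighbor, weight in graph[node].items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d  # update neighbor distance
                heapq.heappush(heap, (new_d, neighbor))
    return dist


# Edges inferred from the distance tables; undirected, weights as shown.
edges = [
    ("F", "Ma", 85), ("F", "K", 173), ("F", "W", 217),
    ("Ma", "Kr", 80), ("Kr", "A", 250), ("K", "M", 502),
    ("W", "N", 103), ("W", "E", 186), ("N", "S", 183),
]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

distances = dijkstra(graph, "F")
for node in ("Ma", "Kr", "K", "W", "N", "E", "A", "S"):
    print(node, distances[node])
```

Pushing a new queue entry on every improvement and skipping stale ones is a common substitute for a decrease-key operation, which Python's `heapq` does not provide.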
NODE METRICS
What is centrality?
CENTRALITY
Centrality is a measure of the relative importance of a
node within a network
Three typical measures of centrality:
1. Degree Centrality
2. Closeness Centrality
3. Betweenness Centrality
Degree Centrality
CENTRALITY
Degree centrality is the simplest measure and is equal
to the degree of the node
What are nodes with high degree centrality called?
Hubs
What are hubs?
Hubs are nodes with a “high” degree
Date Hubs:
Interact with many
at different times
Party Hubs:
Interact with many
all the time
Controversial: Is there a difference?
CENTRALITY
Removing hubs is bad for network integrity
Removing date hubs from
yeast PPI network results
in small subgraphs
CENTRALITY
Hubs are essential
CENTRALITY
Single knockouts of essential genes cause the organism
to die
Hubs are more likely to be essential than other genes
in the yeast protein-protein interaction (PPI) network
Knock-out lethality and connectivity
[Figure, two panels: degree distribution P(k) versus degree k on log-log axes, with power-law fit y = 1.2x^(-1.91); % essential genes versus degree k]
How do you determine degree cutoff?
CENTRALITY
A hub is a node with a “high” degree
Hub has degree > k
k = 5 or 8 or 12 or 20
Hub has degree > degree of x % of all nodes
x = 50 or 80 or 95 %
The degree cutoff is (typically) determined ad hoc
Degree centrality is normalized
CENTRALITY
CD(i) = Degree(i) / (N-1)
Degree of node divided by total possible nodes it could
connect to (ignoring self loop)
Normalized metric for comparing same node in
different networks
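As a quick check of the formula, the sketch below computes CD(i) for the six-node star from the vocabulary slides (v1 connected to v2 through v6). The function name is an assumption, and N is taken to be the number of nodes that appear in the edge list, so isolated nodes would have to be added separately.

```python
def degree_centrality(edges):
    """Normalized degree centrality: Degree(i) / (N - 1)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)  # assumption: every node appears in at least one edge
    return {node: d / (n - 1) for node, d in degree.items()}


star = [("v1", "v2"), ("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v1", "v6")]
centrality = degree_centrality(star)
print(centrality["v1"], centrality["v2"])
```

The hub reaches the maximum value of 1 because it is connected to every other node.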
Closeness Centrality
CENTRALITY
Closeness centrality measures how close a node is to
everything else
~ Inverse of the average shortest path length to all other nodes
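A minimal sketch of closeness for an unweighted, connected network: run BFS from the node, then divide the number of other nodes by the sum of their distances. The function names and the three-node path are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Number of edges on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adjacency, node):
    """(number of other reachable nodes) / (sum of distances to them)."""
    dist = bfs_distances(adjacency, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical three-node path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness_centrality(path, "b"), closeness_centrality(path, "a"))
```

The middle node b scores higher than the end nodes because it is one step from everything else.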
Betweenness Centrality
CENTRALITY
Betweenness centrality measures the number of times
a node is present in shortest paths between ALL pairs
of nodes
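One standard way to compute this is Brandes' algorithm, sketched here for an unweighted, undirected network. It is offered as an illustration, not as the method used for the slide. When a pair of nodes has several shortest paths, each path contributes a fraction, which is the usual convention. The function name is an assumption.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected network."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        preds = {v: [] for v in adjacency}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Six-node star: v1 connected to v2 ... v6.
star = {"v1": ["v2", "v3", "v4", "v5", "v6"]}
for leaf in star["v1"]:
    star[leaf] = ["v1"]
betweenness = betweenness_centrality(star)
print(betweenness["v1"], betweenness["v2"])
```

In the star, every shortest path between two leaves passes through the hub, so the hub's score equals the number of leaf pairs.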
Clustering Coefficient
CLUSTERING COEFFICIENT
Clustering coefficient of node i (Ci) measures how
close its neighbors are to being a clique (completely
connected subgraph)
Clique: All nodes interact
CA: # edges = 2, # max edges = 6, CA = 2/6 = 1/3
[Figure: example networks with labeled nodes A, B, C]
Clustering coefficient
The density of the network
surrounding node I, characterized as
the number of triangles through I.
Related to network modularity
$$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k(k-1)}$$
k: number of neighbors of I
The center node has 8 (grey) neighbors
nI: edges between
node I’s neighbors
There are 4 edges between the
neighbors
C = 2*4 /(8*(8-1)) = 8/56 = 1/7
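The worked example can be reproduced with a short sketch. The slide only states that the center node has 8 neighbors with 4 edges among them; which neighbors are linked is an assumption here, as are the node and function names.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """C = 2 * n / (k * (k - 1)), with n = edges among the node's k neighbors."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    n_links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2 * n_links / (k * (k - 1))


# Center node "I" with 8 neighbors; 4 neighbor-neighbor edges (placement assumed).
neighbor_names = [f"n{i}" for i in range(1, 9)]
adjacency = {"I": set(neighbor_names)}
for name in neighbor_names:
    adjacency[name] = {"I"}
for a, b in [("n1", "n2"), ("n3", "n4"), ("n5", "n6"), ("n7", "n8")]:
    adjacency[a].add(b)
    adjacency[b].add(a)

print(clustering_coefficient(adjacency, "I"))
```

The result depends only on how many neighbor pairs are linked, not on which ones.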
WHOLE NETWORK
METRICS
Node to Network Properties
NETWORK PROPERTIES
Simple set of properties come from averaging a given
property of all nodes:
Degreeavg , Cavg
Also you can average all distances (shortest paths)
Characteristic Path Length (CPL): Average distance
between all pairs of nodes
But averages are highly dependent on the number of
nodes. It is better to look at a distribution (more in 3
slides…)
RANDOM NETWORKS
Network properties can be compared against random
(and randomized) networks to assess significance
Diameter
NETWORK PROPERTIES
Diameter: Maximum distance between all pairs of
nodes
Network properties allow you to compare different
networks.
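Both whole-network distance metrics, characteristic path length and diameter, fall out of running BFS from every node. The sketch below assumes an unweighted, connected network with at least two nodes; the function names and the four-node chain are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Number of edges on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_length_stats(adjacency):
    """Return (characteristic path length, diameter)."""
    lengths = []
    for source in adjacency:
        dist = bfs_distances(adjacency, source)
        # Every pair is seen twice (once per endpoint); mean and max are unaffected.
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)


# Hypothetical chain a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cpl, diameter = path_length_stats(chain)
print(cpl, diameter)
```

The diameter is the longest of the shortest paths, here the distance between the two ends of the chain.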
Random Networks: ER Model
RANDOM NETWORKS
The Erdős–Rényi (ER) model is a method for generating a
random network
Algorithm:
- Loop through each pair of N nodes
-Randomly add an edge between
them with probability p
[Photos: Alfréd Rényi and Paul Erdős]
p = 0.01
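The ER algorithm above translates almost line for line into code. The function name, the `seed` parameter, and the choice of N = 100 are assumptions for illustration; p = 0.01 matches the value shown.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Loop through each pair of n nodes; add an edge with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi(100, 0.01, seed=1)
print(len(edges), "edges out of", 100 * 99 // 2, "possible pairs")
```

On average the model places p * N(N-1)/2 edges, so a small p produces a sparse network.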
Real vs. Random
NETWORK PROPERTIES
Real networks have different properties than random
networks
Real networks are small-world and scale-free
Small-world
NETWORK PROPERTIES
Small-world: Most nodes can be reached from every
other by a small number of steps
i.e. Small-world networks have small diameter
President Teddy Roosevelt has a Bacon number of 3
6° of separation
Randomizing Networks: Swapping
RANDOM NETWORKS
Some properties such as shortest path length are
heavily dependent on the size of the network AND the
degrees of the nodes
To avoid changing basic degree related properties, one
can randomize an existing real network by iteratively
swapping the ends of two edges
[Figure: edges X1-Y1 and X2-Y2 before the swap; edges X1-Y2 and X2-Y1 after the swap]
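A minimal sketch of degree-preserving randomization by edge swapping, assuming an undirected network without self-loops or duplicate edges. Swaps that would create a self-loop or a duplicate edge are rejected, which is one common convention. The function names and the ring network are hypothetical.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=None):
    """Swap the ends of random edge pairs: X1-Y1, X2-Y2 -> X1-Y2, X2-Y1."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (x1, y1), (x2, y2) = edges[i], edges[j]
        if len({x1, y1, x2, y2}) < 4:
            continue  # shared node: swap would create a self-loop or change nothing
        new1, new2 = frozenset((x1, y2)), frozenset((x2, y1))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {new1, new2}
        edges[i], edges[j] = (x1, y2), (x2, y1)
        done += 1
    return edges


def degrees(edges):
    """Degree of every node that appears in the edge list."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


# Hypothetical ring of eight nodes.
ring = [(i, (i + 1) % 8) for i in range(8)]
rewired = degree_preserving_rewire(ring, n_swaps=20, seed=3)
print(degrees(rewired) == degrees(ring))
```

Each accepted swap removes one edge end and adds one edge end at every node involved, so all node degrees stay fixed while the wiring changes.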
Scale-free
NETWORK PROPERTIES
Degree Distribution: Frequency of all
possible node degrees in a network
Scale-free: The degree distribution follows
a power-law
i.e. Most nodes have small degree, but
some have a very large degree
P(k) ~ kg
Motifs
NETWORK MOTIFS
Recurring pattern in network with a biological
significance
Pioneering work by Uri Alon
Biological function of motifs
NETWORK MOTIFS
Network motifs are considered the basic building
blocks of a network
Network motifs act as information processing circuits
Coherent FFL acts as a noise filter
X increases → Y increases
X and Y increase → Z increases
[Figure: feed-forward loop with TF X, TF Y, and Gene Z]
Time delay between X increasing
and Y increasing
3-node model and simulation
NETWORK MOTIFS
Biological Networks
Complexity comes from the set of parts...
INTRO
...and their connections (e.g., metabolism)
INTRO
How is biological data represented in networks?
Low
Correlation
High
Correlation
• Gene expression
• Physical PPIs
+
=
• Genetic interactions
• Colocalization
• Sequence
• Protein domains
• Regulatory binding sites
…
Building and Interpreting Biological Networks
• How we build a biological network depends on what data we
have AND what we want the edges in the network to
represent.
• The meaning of the edges in a biological network depends on
the method used to generate those edges.
→ This influences how we interpret the interactions in a network.
node:
an object in the network (e.g. genes)
edge:
indicates relationship between two nodes
Interpreting the “edges” in Biological Networks
Relational Networks
• Generally undirected (non-causal relationships)
• Nodes all of same “type”
• Generally no “signs” on edges
Example: Protein A is a dimerization partner with protein B.
Correlation Network
• Undirected (non-causal relationships)
• Nodes all of same “type”
• Edges can have “signs”
Example: When the expression of Gene A changes, so does the expression of Gene B.
Regulatory Network
• Directed network (causal relationships)
• Can have “types” of nodes
• Edges can have “signs”
Example: TF A regulates Gene B.
*Correlation is not causation.
Network examples (Molecular biology -omes)
NETWORKS
Network                   Node is...      Edge is...                   Directed?
Physical Interactome      Protein         Direct/indirect contact      No
Genetic Interactome       Gene            Epistatic relationship       No
Informatic Interactome    Various         Computed similarity          No
Regulatory Interactome 1  TF/gene         Transcriptional activation   Yes
Regulatory Interactome 2  Kinase/target   Phosphorylation              Yes
Metabolome 1              Reactant        Reaction                     Yes
Metabolome 2              Reaction        Reactant                     Yes
PHYSICAL INTERACTION
Physical interactions between proteins (protein-protein
interactions) are intuitive to think about.
Protein A makes direct physical contact with Protein B
in the cell; alternatively, A and B both interact with a
third (mediator) protein, C.
C
Examples
PHYSICAL INTERACTION
ATP synthase is a large, stable complex of physically
interacting proteins. These are permanent* interactions.
*also called
“obligate” or
“constitutive”
Examples
PHYSICAL INTERACTION
(1) Cyclin binds to CDK and (2) the Cyclin-CDK complex
binds to a target protein. These are transient interactions.
Detection
PHYSICAL INTERACTION
Some physical interactions are inferred from biochemical
activities (e.g., a kinase and its target) or from structures
(e.g., two chains in contact in the PDB).
There are many experimental techniques for validating or
screening for protein-protein interactions.
The most popular are affinity capture and two-hybrid.
Affinity capture
PHYSICAL INTERACTION
The cell’s contents are exposed to a surface engineered to
bind a particular protein (the bait, here A). This is often
done using an antibody specific to A or a tag fused to A.
Affinity capture
PHYSICAL INTERACTION
The bait protein binds to the surface, bringing its various
interaction partners along with it (called prey).
Affinity capture
PHYSICAL INTERACTION
The unbound cellular contents are then washed away.
Affinity capture
PHYSICAL INTERACTION
C
Prey proteins pulled down by the bait are identified using
prey-specific antibodies or by mass spectrometry.
Affinity capture
PHYSICAL INTERACTION
Method strengths:
Done well, co-immunoprecipitation is considered a gold
standard of protein-protein interaction.
Method weaknesses:
Can’t differentiate between direct and indirect (mediated)
contact; prey must bind bait tightly to be pulled down.
Two-hybrid
PHYSICAL INTERACTION
The two-hybrid method exploits the independent operation of the
DNA-binding (BD) and transcription-activating (AD) domains of
eukaryotic transcription factors to detect interactions.
[Figure: a transcription factor bound at the UAS switches transcription of the gene ON]
Two fusion proteins are made: BD-P1 (bait) and AD-P2 (prey).
Interaction of P1 and P2 is sufficient to initiate transcription.
[Figure: BD bound at the UAS; transcription of the gene is ON once P1 and P2 interact]
Two-hybrid
PHYSICAL INTERACTION
Method strengths:
Scales well to very high-throughput screens; can detect
transient interactions; reasonably specific to binary (A+B)
type interactions.
Method weaknesses:
High false positive and negative rates; fusion may affect
bait/prey proteins’ ability to fold or bind; bait/prey may
not be able to enter the nucleus (required for activation).
GENETIC INTERACTIONS
Genetic interactions are more abstract.
They go by many names, often recognized by the terms
phenotypic, synthetic, or dosage.
All are related to the concept of epistasis.
Epistasis
GENETIC INTERACTIONS
Let’s say there are two methods of recreating ATP from ADP and Pi:
one mediated by gene 1 (solid) and another by gene 2 (dashed).
gene 1
gene 2
Epistasis
GENETIC INTERACTIONS
If only one of the two pathways is lost, the redundant pathway
remains, the cell can still produce ATP, and therefore lives.
Phenotype = alive.
gene 1
gene 1
gene 2
gene 2
Epistasis
GENETIC INTERACTIONS
If both pathways are lost the cell cannot produce ATP and therefore
dies. Loss of both genes results in a new phenotype.
Phenotype = dead.
gene 1
gene 2
This notion, that a new phenotype can result from a combination of
changes at the genetic level, is epistasis. We report a genetic
interaction between genes 1 and 2 called synthetic lethality.
(Related terms: sick, phenotypic enhancement, rescue).
GENETIC INTERACTIONS
Genetic interactions can be
useful for identifying parallel
pathways and other subtle
(non-physical) interactions.
Complexes may also be
revealed if they are robust
against the removal of one,
but not two, components.
Common interaction databases
DATABASES
BioGRID (http://www.thebiogrid.org/)
Biological General Repository for Interaction Datasets. Comprehensive, especially
for yeast; includes high throughput and small-scale analyses; 250,000 interactions.
MINT (http://mint.bio.uniroma2.it/mint/)
Molecular Interaction database. Experimental interaction data manually curated
from literature. 80,000 interactions.
MIPS (http://mips.helmholtz-muenchen.de/)
Munich Information Center for Protein Sequences. Very well curated; often used as
a “gold standard” of protein-protein interaction.
HPRD (http://www.hprd.org/)
Human Protein Reference Database. Emphasis on human protein bioinformatics,
including 40,000 interactions.
Others…
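Single interaction report
DATABASES
YOR128C YCR066W ADE2 RAD18 Two-hybrid Uetz P (2000) 10688190
Fields, in order:
- Gene/Protein 1 and Gene/Protein 2 codes (YOR128C, YCR066W)
- Gene/Protein 1 and Gene/Protein 2 aliases (ADE2, RAD18)
- Experimental method (Two-hybrid)
- Reference, including PubMed ID (Uetz P (2000), 10688190)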
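Parsing such a record into a network edge might look like the sketch below. It assumes the fields are tab-separated and arrive in the order shown above. The actual delimiter and column layout should be checked against the downloaded file, and all names in the code are hypothetical.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    code_a: str
    code_b: str
    alias_a: str
    alias_b: str
    method: str
    reference: str
    pubmed_id: str


def parse_interaction(line, sep="\t"):
    """Split one interaction report into named fields (assumed 7-column layout)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return Interaction(*fields)


# The record from the slide, written with tabs between fields (an assumption).
record = "YOR128C\tYCR066W\tADE2\tRAD18\tTwo-hybrid\tUetz P (2000)\t10688190"
interaction = parse_interaction(record)
edge = (interaction.alias_a, interaction.alias_b)
print(edge, interaction.method, interaction.pubmed_id)
```

Splitting on tabs instead of spaces keeps multi-word fields such as the reference intact.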
Statistics from BioGRID (2009): Organisms
DATABASES
Species                                     Genes in Genome  Reported Interactions  % Confirmed  % Physical  % Genetic
Saccharomyces cerevisiae (Baker’s Yeast)    6,000            95,978                 25%          49%         54%
Homo sapiens (Human)                        25,000           26,864                 29%          100%        1%
Drosophila melanogaster (Fruitfly)          14,000           24,953                 11%          89%         11%
Schizosaccharomyces pombe (Fission yeast)   5,000            11,562                 11%          16%         88%
Caenorhabditis elegans (Nematode worm)      20,000           6,622                  2%           69%         31%
Arabidopsis thaliana (Mouse-ear cress)      25,000           2,611                  27%          97%         4%
Mus musculus (Mouse)                        24,000           894                    21%          99%         3%
Statistics from BioGRID (2009): Methods
DATABASES
Method Type  Method Name               Interactions Reported  Papers Using
Physical     Two-hybrid                48,192                 4,519
Physical     Affinity Capture-MS       31,258                 655
Genetic      Phenotypic Enhancement    30,807                 2,675
Physical     Affinity Capture-Western  16,524                 8,763
Genetic      Phenotypic Suppression    12,399                 1,936
Genetic      Synthetic Growth Defect   12,085                 980
Physical     Reconstituted Complex     11,782                 7,138
Genetic      Synthetic Lethality       11,666                 1,555
Physical     Biochemical Activity      6,657                  1,370
Genetic      Dosage Rescue             3,660                  1,736
Genetic      Synthetic Rescue          2,767                  1,277
Genetic      PCA                       2,685                  31
Physical     Co-purification           2,168                  615
Physical     Affinity Capture-RNA      1,209                  24
Physical     Co-fractionation          1,065                  444
Statistics from BioGRID (2009): Papers
DATABASES
Interactions Reported (≤)  Number of Papers
1                          9,639
10                         10,696
100                        1,049
1,000                      64
10,000                     25
100,000                    2
The vast majority of interaction-reporting papers (94.7%) report
10 or fewer interactions (99.6% for 100 or fewer).
About 20% of known interactions have only been observed in
studies reporting 10 or fewer interactions.
What are they?
FUNCTIONAL NETWORKS
Functional association network
or
Functional linkage network (FLN)
- Nodes are genes or proteins
- Edges are functional associations
What can we use to functionally link genes/proteins?
GO!
STRING
FUNCTIONAL NETWORKS
http://string.embl.de/
- Physical interactions
- Genomic context (e.g. gene fusion events)
- Coexpression (microarray)
- Literature co-occurrence
STRING
FUNCTIONAL NETWORKS
Functional association → Predicted physical interaction
Maybe?
Works because STRING includes additional information:
Species co-occurrence (630 organisms!!)
Homology based prediction
PPI PREDICTION
- Interacting proteins are more likely to co-evolve
- Interactions are transferred to corresponding orthologs
[Figure: a physical interaction between two proteins in mouse and their orthologs in human (labeled A, B and α, β); is the physical interaction conserved?]
“Interologs”: Interacting AND Homologous
HOLD YOUR HORSES!
Phylogenetic profiling
PPI PREDICTION
Ortholog interactions must be present across many species
Species   A-B interaction
Human     ? (predicted: Yes)
Mouse     Yes
Chicken   Yes
Yeast     Yes
Worm      No
Fly       Yes
Fugu      Yes
E. coli   No
5 out of 7
p-value = 0.0001
Phylogenetic tree similarity
PPI PREDICTION
- Entirely based on co-evolution
- A and B have similar trees → they are predicted to interact
[Figure: phylogenetic tree of Protein A ≈ phylogenetic tree of Protein B]
Structural patterns
PPI PREDICTION
- Identify interaction interfaces from structures
- Search for the same interface in other pairs of PDB
structures
A
B
Interface
Integrate all information
PPI PREDICTION
The best prediction algorithms integrate different
types of evidence using machine learning (like STRING)
Basic idea:
Step 1: Identify recurring evidence pattern in known
interactions – training
Step 2: Identify new interactions by searching for same
evidence pattern in unknown protein pairs – testing
How to use interactomes?
PPI ANALYSES
Remember: Network is undirected
Clustering
Find complexes
Protein neighborhoods – functional
Other
Inferring knowledge such as functional annotations
Clustering
PPI ANALYSES
Function Assignment
PPI ANALYSES
Guilt-by-association: Function is transferred from
neighbors
Interacting partner annotations: BLUE, GREEN
[Figure: network of proteins A to F with interacting partners annotated BLUE or GREEN; the “best” annotation is the one with the maximum count (“best” = max)]
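Guilt-by-association with “best” = max can be sketched as a majority vote over the annotations of a protein's interaction partners. The protein names and the BLUE/GREEN assignments below are hypothetical, and ties are not handled specially.

```python
from collections import Counter


def guilt_by_association(adjacency, annotations, protein):
    """Return (most common neighbor annotation, all neighbor annotation counts)."""
    votes = Counter(
        annotations[partner]
        for partner in adjacency[protein]
        if partner in annotations  # unannotated partners do not vote
    )
    if not votes:
        return None, votes
    best, _ = votes.most_common(1)[0]
    return best, votes


# Hypothetical unannotated protein "query" with five annotated partners.
adjacency = {"query": ["p1", "p2", "p3", "p4", "p5"]}
annotations = {"p1": "BLUE", "p2": "BLUE", "p3": "GREEN", "p4": "BLUE", "p5": "GREEN"}
best, votes = guilt_by_association(adjacency, annotations, "query")
print(best, dict(votes))
```

Returning the full vote counts alongside the winner also supports the alternative of transferring every neighbor annotation.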
Function Assignment
PPI ANALYSES
Guilt-by-association: Function is transferred from
neighbors
[Figure: the same network annotated using “all” neighbor annotations (BLUE and GREEN)]
McGary et al, Genome Biology, 2007
Correlation Networks
Correlation is the simplest metric for co-expression
[Figure: a genes × conditions expression matrix is converted into a genes × genes correlation matrix]
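A co-expression network can be sketched by computing the Pearson correlation for every gene pair and keeping pairs above a cutoff. The expression values, gene names, and the 0.9 threshold are made-up illustrations. Thresholding on the absolute value keeps both positive and negative (“signed”) edges.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_network(expression, threshold):
    """Edges (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical genes x conditions matrix (rows: genes, columns: conditions).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
network = correlation_network(expression, threshold=0.9)
for g1, g2, r in network:
    print(g1, g2, round(r, 2))
```

A profile with zero variance would cause a division by zero, so constant genes should be filtered out first.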
Mutual Information is a Measure of Non-linear Correlation
[Figure: example scatter plots labeled with their Pearson correlation values]
Source: http://en.wikipedia.org/wiki/Correlation_and_dependence
Mutual Information (MI)
Definition
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$$
Properties
$$MI = I(X;Y) = H(X) - H(X \mid Y)$$
• Measures how much knowing one of these variables reduces
uncertainty about the other
• Non-negative and symmetric
• Invariant under invertible (possibly nonlinear) transformations of either variable
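For discrete observations, the definition can be evaluated directly from counts. The sketch below uses base-2 logarithms (bits), which is an assumption since the slide does not fix the base. Continuous expression values would first need to be binned or handled with a dedicated estimator. The function name and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    count_xy = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, independent)
```

Two perfectly coupled binary variables share one bit, while independent ones share none.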
Network Reconstruction Algorithms that
use MI:
• ARACNE
• CLR
Regulatory Networks
DIRECTED NETWORKS
Signaling
Phosphorylation
Activation
Inhibition
Protein
A
Protein
B
Transcriptional
Regulation
TF
A
TF = Transcription Factor
Expression
Repression
Gene
B
Regulatory Networks
Regulatory Networks
DIRECTED NETWORKS
Signal at Cell Surface
Cascade to Nucleus
Activate Transcription
Factors
TF
Genes
Gene Expression
Transcriptional Regulatory Networks
TF NETWORKS
Identify genes where transcription factors bind
DNA binding sites
- Experimental techniques
- Computational prediction
Identifying DNA Binding Sites: Experiments
TF NETWORKS
ChIP-chip / ChIP-seq
Chromatin immunoprecipitation (ChIP) followed by
microarray analysis (chip) or sequencing (seq)
Identifying DNA Binding Sites: Computational
TF NETWORKS
Motif Scanning
Scan promoters using position weight matrices (PWM)
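Motif scanning can be sketched as sliding a log-odds matrix along a promoter and reporting windows that score above a cutoff. The count matrix, promoter sequence, pseudocount, uniform background, and threshold below are all made-up values for illustration, and the sequence is assumed to contain only A, C, G, and T.

```python
from math import log2


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log-odds scores versus background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 4-bp motif built from 10 aligned binding sites.
counts = [{"T": 10}, {"A": 10}, {"T": 8, "C": 2}, {"A": 10}]
promoter = "GGCTATAAGC"
hits = scan(promoter, pwm_from_counts(counts), threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

The pseudocount keeps bases that were never observed at a position from producing a log of zero.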
Yeast Transcriptional Regulatory Network
Rick Young dataset
TF NETWORKS
Yeast Transcriptional Regulatory Network
TF NETWORKS
TF – TF interactions only
Every edge can be an activation or an inhibition.
Overview
SIGNALING NETWORKS
Edges: activation or inhibition (multiple edge types!)
KEGG Pathways Database
SIGNALING NETWORKS
Edges: activation, inhibition, phosphorylation, etc.
KEGG Pathways Database
SIGNALING NETWORKS
Literature curated, manually drawn pathways
Groups of pathways
Metabolism
Genetic Information Processing
Environmental Information Processing
Cellular Processes
Human Diseases
Pathways are both species specific & cross-species (KO)
Other Pathway Databases
SIGNALING NETWORKS
KEGG (http://www.kegg.jp/kegg/pathway.html)
Great for metabolic pathways. Simple interface. Multiple
species including prokaryotes.
REACTOME (http://www.reactome.org/)
Supposedly the most comprehensive resource for signal
transduction pathways. Human only.
BIOCARTA (http://www.biocarta.com/genes/index.asp)
Pretty maps with lots of colors. Mammalian.
Experiments
SIGNALING NETWORKS
Decades of low throughput, painstaking experiments
- Stimulation
- Mutants
- Structure
- Context
No single experiment type to deduce signaling network
Directions = Pathways
DIRECTED NETWORKS
- Chain regulatory interactions
- Concept of pathway emerges from directions
- New analyses not possible with undirected networks
[Figure: two directed chains, TF A → TF B → TF C → Gene D and Receptor A → Kinase B → Signal Protein C → TF D]
Connect the dots
DIRECTED NETWORKS
Signal at Cell Surface
Cascade to Nucleus
Activate Transcription
Factors
TF
Genes
Gene Expression
Clustering
DIRECTED NETWORKS
Network Analysis and Visualization
http://www.cytoscape.org/
http://igraph.sourceforge.net/
http://www.graphviz.org/
SUMMARY / APPLICATION
Functional mapping: mining biological networks
[Figure: predicted relationships between genes, shaded from low confidence to high confidence; cell cycle genes and DNA replication genes are highlighted]
The strength of the relationships within a gene set (here, cell cycle genes) indicates how cohesive a process is.
The strength of the relationships between two gene sets (here, cell cycle genes and DNA replication genes) indicates how associated two processes are.
Predicting gene function
The edges linking a gene to the cell cycle genes provide a measure of how likely that gene is to specifically participate in the process of interest.
IMAGE SOURCES
Slide numbers are no longer correct due to rearrangement and slide deck merging, but consult these URLs for all otherwise
unattributed images
Slide
Source
1
http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
4
http://en.wikipedia.org/wiki/Leonhard_Euler
8
http://en.wikipedia.org/wiki/Breadth-first_search
31
http://www.ncbi.nlm.nih.gov/pubmed/15190252
34
http://genomebiology.com/content/figures/gb-2007-8-5-r95-3.gif
36
http://www.sgcity.org/airport/images/routemaplg.gif
38
http://en.wikipedia.org/wiki/Centrality
45
http://en.wikipedia.org/wiki/Paul_Erdős
http://en.wikipedia.org/wiki/Alfréd_Rényi
http://en.wikipedia.org/wiki/Erdős–Rényi_model
48
http://en.wikipedia.org/wiki/Six_degrees_of_separation
http://en.wikipedia.org/wiki/Theodore_roosevelt
http://oracleofbacon.org/images/Kevin_Bacon.jpg
49
http://network-science.org/fig_rand_versus_scalefree.html
50
http://www.weizmann.ac.il/mcb/UriAlon/
51
https://www.weizmann.ac.il/complex/tlusty/courses/InfoInBio/Papers/AlonMotifs2002.pdf
53
http://www.weizmann.ac.il/mcb/UriAlon/Papers/network_motifs_in_coli.pdf
IMAGE SOURCES
Slide
Source
1
http://en.wikipedia.org/wiki/Signal_transduction
17
http://interactome.dfci.harvard.edu/S_cerevisiae/S_images/Y2H_YI1.png
29
http://www.bcm.edu/molvir/images/faculty/Palzkill-Graphic.png
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1236910/figure/F3/
41
http://evolutionarygenomics.imim.es/pf/pf_documentation.php?WID=
http://ani.embl.de/trawler/result_paper/logo_mammals_trawler/nfkb_transfac.png
42
http://www.sciencemag.org/cgi/content/full/298/5594/799
43
http://www.biomedcentral.com/1471-2105/7/113/figure/F5?highres=y
45
http://www.kegg.jp/kegg/pathway/hsa/hsa04010.html
50
http://publications.nigms.nih.gov/computinglife/images/fuzzy_big.gif