Lecture 7

Topic 1: Graph spectral analysis/Graph spectral
clustering and its application to metabolic
networks
Topic 2: Different centrality measures of nodes
Graph spectral analysis/
Graph spectral clustering
PROTEIN STRUCTURE: INSIGHTS FROM
GRAPH THEORY
by
SARASWATHI VISHVESHWARA, K. V. BRINDA and N. KANNAN
Molecular Biophysics Unit, Indian Institute of Science
Bangalore 560012, India
Adjacency Matrix
Laplacian matrix L=D-A
Degree Matrix
Eigenvalues and eigenvectors
The eigenvalues of a matrix A are the roots of the equation
|A-λI|=0, where I is the identity matrix.
Let λ be an eigenvalue of A and let x be a nonzero vector such that
$$A x = \lambda x \qquad (1)$$
where A is N×N and x is N×1, so both sides are N×1.
Then x is an eigenvector of A corresponding to λ.
Node 1 has 3 edges, nodes 2, 3 and 4 have 2 edges each,
and node 5 has only one edge. The magnitudes of the components
of the eigenvector belonging to the largest eigenvalue of the
adjacency matrix reflect this observation.
The same holds for the Laplacian matrix: the magnitudes of the
components of the eigenvector belonging to its largest eigenvalue
also reflect the node degrees.
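The observation above can be checked numerically. The sketch below is only illustrative: the slide's figure is not reproduced here, so the edge list is an assumption, chosen as one graph whose degrees are 3, 2, 2, 2 and 1. Nodes are numbered from 0 in the code, so index 0 is node 1 of the slide.

```python
import numpy as np

# Hypothetical 5-node graph (0-based indices) with degrees 3, 2, 2, 2, 1.
# The edge list is an assumption; it is not taken from the slide's figure.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]

n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # Laplacian matrix


def top_eigvec_magnitudes(M):
    """Absolute components of the eigenvector of the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)   # eigenvalues come back in ascending order
    return np.abs(vecs[:, -1])


adj_mag = top_eigvec_magnitudes(A)
lap_mag = top_eigvec_magnitudes(L)

print("degrees      :", A.sum(axis=1).astype(int))
print("adjacency |x|:", np.round(adj_mag, 3))
print("Laplacian |x|:", np.round(lap_mag, 3))
```

For this assumed graph the largest component belongs to the degree-3 node and the smallest to the degree-1 node in both cases. The three degree-2 nodes do not all get the same value, so the eigenvector carries more information than the degree alone.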
The largest eigenvalue (lev) depends upon the highest degree
in the graph.
For any k-regular graph G (a graph in which every vertex has
degree k), the eigenvalue with the largest absolute value is k.
A corollary to this theorem is that the lev of a clique of n
vertices is n − 1.
In a general connected graph, the lev is always less than or
equal to (≤) the largest degree in the graph.
In a graph with n vertices, the absolute value of the lev decreases
as the degrees of the vertices decrease.
• The lev of a clique with 11 vertices is 10 and that of a linear
chain with 11 vertices is 1.932.
[Figure: a linear chain with 11 vertices]
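The two lev values quoted for 11 vertices can be reproduced with a few lines of NumPy. This is a minimal sketch; the helper name `largest_adjacency_eigenvalue` is made up for this example.

```python
import numpy as np


def largest_adjacency_eigenvalue(A):
    """Largest eigenvalue (lev) of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(A)[-1])   # eigvalsh sorts ascending


n = 11

# Clique: every pair of distinct vertices is connected.
clique = np.ones((n, n)) - np.eye(n)

# Linear chain: vertex i is connected to vertex i + 1.
chain = np.zeros((n, n))
for i in range(n - 1):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

lev_clique = largest_adjacency_eigenvalue(clique)
lev_chain = largest_adjacency_eigenvalue(chain)

print(round(lev_clique, 3), round(lev_chain, 3))
```

The clique is 10-regular, so its lev equals its degree, while the chain has highest degree 2 and its lev stays below 2, in line with the bound stated above.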
In graphs 5(a)-5(e), the highest degree is 6. In graphs 5(f)-5(i), the highest
degree is 5, 4, 3 and 2 respectively.
The lev is generally higher if the graph contains vertices of
high degree: it decreases gradually from the graphs with highest degree 6
to the one with highest degree 2. Graphs 5(a)-5(e) share one
vertex of degree 6 (the highest degree) but differ in the degrees of the other
vertices (less than 6 in all cases), and their lev values differ; i.e. the lev also
depends on the degrees of the vertices adjoining the highest-degree vertex.
This paper combines graph 4(a) and graph 4(b) and constructs a
Laplacian matrix with edge weights 1/dij, where dij is the
distance between vertices i and j. The distance between a
vertex of graph 4(a) and a vertex of graph 4(b) is taken to be very
large (say 100), so the corresponding matrix element
has a very small value of 0.01. The Laplacian matrix of the 8
vertices thus obtained is diagonalized; its eigenvalues
and the corresponding vector components are given in Table 3.
The components of the eigenvector corresponding to
the second smallest eigenvalue contain
the desired information about clustering:
the residues (nodes) that form a cluster have
identical values. In Fig. 4, nodes 1-5 form one
cluster (cluster 1) and nodes 6-8 form another
(cluster 2).
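The clustering step can be sketched as follows. The internal structure of graphs 4(a) and 4(b) is not given in these notes, so the sketch assumes that each cluster is fully connected at distance 1 (weight 1); only the cross-cluster weight 0.01 and the cluster sizes (5 and 3) come from the text. It is an illustration of the idea, not the paper's exact matrix.

```python
import numpy as np

# Assumption: inside each cluster every pair has distance 1 (weight 1/1).
# From the text: cross-cluster pairs have distance 100 (weight 1/100 = 0.01).
cluster_1 = [0, 1, 2, 3, 4]   # nodes 1-5 in the slide's numbering
cluster_2 = [5, 6, 7]         # nodes 6-8 in the slide's numbering

n = 8
W = np.full((n, n), 0.01)
for group in (cluster_1, cluster_2):
    for i in group:
        for j in group:
            W[i, j] = 1.0
np.fill_diagonal(W, 0.0)

# Weighted Laplacian: degree matrix (row sums of W) minus weight matrix.
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)   # ascending eigenvalues
second_vec = vecs[:, 1]          # eigenvector of the second smallest eigenvalue

# Nodes with the same sign (here: the same value) belong to the same cluster.
labels = (second_vec > 0).astype(int)
print(np.round(second_vec, 3))
print(labels)
```

The overall sign of an eigenvector is arbitrary, so which cluster is labelled 0 or 1 may differ between runs or libraries; only the grouping is meaningful.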
Metabolome Based Reaction Graphs of M.
tuberculosis and M. leprae: A Comparative
Network Analysis
by
Ketki D. Verkhedkar1, Karthik Raman2, Nagasuma R. Chandra2, Saraswathi
Vishveshwara1*
1 Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India, 2
Bioinformatics Centre, Supercomputer Education and Research Centre, Indian
Institute of Science, Bangalore, India
PLoS ONE | www.plosone.org
September 2007 | Issue 9 | e881
Construction of network
Stoichiometric matrix
Following this method, the networks of
metabolic reactions corresponding to
three organisms were constructed.
Analysis of network parameters
Giant component of the reaction network of E. coli
Giant components of the reaction networks of M.
tuberculosis and M. leprae
Analyses of sub-clusters in the giant component
Graph spectral analysis was performed to detect subclusters of reactions in the giant component.
To obtain the eigenvalue spectra of the graph, the
adjacency matrix of the graph is converted to a
Laplacian matrix (L), by the equation:
L=D-A
where D, the degree matrix of the graph, is a diagonal
matrix in which the ith element on the diagonal is equal
to the number of connections that the ith node makes in
the graph.
It is observed that reactions belonging to fatty acid
biosynthesis and the FAS-II cycle of the mycolic acid
pathway in M. tuberculosis form distinct, tightly
connected sub-clusters.
Identification of hubs in the reaction networks
In biological networks, the hubs are thought to
be functionally important and phylogenetically
oldest.
The largest component of the eigenvector belonging to the highest
eigenvalue of the Laplacian matrix of the graph
corresponds to a node with high degree as
well as low eccentricity. Two parameters, degree
and eccentricity, are thus involved in the identification
of graph spectral (GS) hubs.
Alternatively, hubs can be ranked based on their
connectivity alone (degree hubs).
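The difference between degree hubs and GS hubs can be illustrated on a toy graph. The sketch below uses a hypothetical 5-node edge list (an assumption, not one of the reaction networks) and ranks the nodes once by degree and once by the magnitude of the eigenvector component for the largest Laplacian eigenvalue. Eccentricity is computed by breadth-first search so it can be compared with the GS ranking.

```python
from collections import deque

import numpy as np

# Hypothetical graph (0-based indices); the edge list is an assumption.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

degree = A.sum(axis=1)


def eccentricity(A, source):
    """Largest shortest-path distance from source (connected graph assumed)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


ecc = [eccentricity(A, i) for i in range(n)]

# GS score: |component| of the eigenvector of the largest Laplacian eigenvalue.
L = np.diag(degree) - A
vals, vecs = np.linalg.eigh(L)
gs_score = np.abs(vecs[:, -1])

degree_ranking = [int(i) for i in np.argsort(-degree, kind="stable")]
gs_ranking = [int(i) for i in np.argsort(-gs_score, kind="stable")]

print("degree      :", degree.astype(int))
print("eccentricity:", ecc)
print("degree hubs :", degree_ranking)
print("GS hubs     :", gs_ranking)
```

In this assumed graph three nodes tie on degree 2, and the GS score places the one with the lower eccentricity ahead of the other two. That is the kind of distinction a pure degree ranking cannot make.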
It was observed that the top 50 degree hubs in the
reaction networks of the three organisms comprised
reactions involving the metabolite L-glutamate as well as
reactions involving pyruvate. However, the top 50 GS
hubs of M. tuberculosis and M. leprae exclusively
comprised reactions involving L-glutamate while the top
GS hubs in E. coli only consisted of reactions involving
pyruvate.
The difference in the degree and GS hubs suggests that
the most highly connected reactions are not necessarily
the most central reactions in the metabolome of the
organism
Centrality measures of nodes
Centrality measures
Within graph theory and network analysis, there are
various measures of the centrality of a vertex within a
graph that determine the relative importance of the
vertex within the graph.
We will discuss the following centrality measures:
• Degree centrality
• Betweenness centrality
• Closeness centrality
• Eigenvector centrality
• Subgraph centrality
Degree centrality
Degree centrality is defined as the number of links incident
upon a node, i.e. the degree of the node.
Degree centrality is often interpreted in terms of the
immediate risk of the node for catching whatever is flowing
through the network (such as a virus, or some information).
[Figure: the blue nodes have higher degree centrality.]
Betweenness centrality
The vertex betweenness centrality BC(v) of a vertex v is
defined as follows:
$$BC(v) = \sum_{u \neq v \neq w} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$$
Here σuw is the total number of shortest paths between
nodes u and w, and σuw(v) is the number of shortest paths
between nodes u and w that pass through node v. The sum runs
over all pairs of nodes u and w other than v.
Vertices that occur on many shortest paths between other
vertices have higher betweenness than those that do not.
[Figure: example graph with nodes a, b, c, d, e and f]
Betweenness centrality of node c = 6
Betweenness centrality of node a = 0

Calculation for node c (v = c):

Pair (u,w)   σuw   σuw(v)   σuw(v)/σuw
(a,b)        1     0        0
(a,d)        1     1        1
(a,e)        1     1        1
(a,f)        1     1        1
(b,d)        1     1        1
(b,e)        1     1        1
(b,f)        1     1        1
(d,e)        1     0        0
(d,f)        1     0        0
(e,f)        1     0        0
Sum                         6
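The calculation for node c can be reproduced by counting shortest paths with breadth-first search. The slide's figure is not available here, so the edge list below is an assumption: it is one graph on the nodes a-f that gives betweenness 6 for node c and 0 for node a, with every pair connected by a single shortest path. The function names are made up for this sketch.

```python
from collections import deque
from itertools import combinations

# Assumed adjacency lists (not taken from the slide's figure).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}


def shortest_path_counts(graph, source):
    """BFS from source: distance and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(graph, v):
    """Sum of sigma_uw(v) / sigma_uw over unordered pairs u, w other than v."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    dist_v, sigma_v = info[v]
    others = [x for x in graph if x != v]
    total = 0.0
    for u, w in combinations(others, 2):
        dist_u, sigma_u = info[u]
        # v is on a shortest u-w path iff d(u,v) + d(v,w) == d(u,w).
        if dist_u[v] + dist_v[w] == dist_u[w]:
            total += sigma_u[v] * sigma_v[w] / sigma_u[w]
    return total


for node in graph:
    print(node, betweenness(graph, node))
```

Each unordered pair is counted once, matching the table. This brute-force version is meant for small graphs and assumes the graph is connected; for large networks a dedicated algorithm would be preferable.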
• Nodes of high betweenness centrality are important for transport.
• If they are blocked, transport becomes less efficient; if, on the
other hand, their capacity is improved, transport becomes more efficient.
• Edge betweenness is calculated using a similar concept.
[Figure: hue (from red = 0 to blue = max) shows the node betweenness.]
http://en.wikipedia.org/wiki/Betweenness_centrality#betweenness
Closeness centrality
The farness of a vertex is the sum of the shortest-path
distances from the vertex to every other vertex in the graph.
The reciprocal of farness is the closeness centrality (CC):
$$CC(v) = \frac{1}{\sum_{t \in V \setminus v} d(v,t)}$$
Here, d(v,t) is the shortest-path distance between vertex v and
vertex t.
Closeness centrality can be viewed as the efficiency of a
vertex in spreading information to all other vertices.
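A minimal sketch of the closeness formula, assuming an unweighted, connected graph stored as adjacency lists. The three-node chain p - q - r is a made-up example.

```python
from collections import deque


def closeness_centrality(graph, v):
    """Reciprocal of the sum of shortest-path distances from v.

    Assumes an unweighted, connected graph with at least two nodes.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for t in graph[u]:
            if t not in dist:
                dist[t] = dist[u] + 1
                queue.append(t)
    farness = sum(dist.values())   # d(v, v) = 0 contributes nothing
    return 1.0 / farness


# Hypothetical example: a chain p - q - r.
chain = {"p": ["q"], "q": ["p", "r"], "r": ["q"]}

for node in chain:
    print(node, closeness_centrality(chain, node))
```

The middle node q has farness 1 + 1 = 2, while each end node has farness 1 + 2 = 3, so q gets the higher closeness.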
Eigenvector centrality
Let A be the adjacency matrix of a graph, λ the largest
eigenvalue of A and x the corresponding eigenvector; then
$$A x = \lambda x \qquad (1)$$
where A is N×N and x is N×1. The eigenvalues are the roots of
|A-λI|=0, where I is an identity matrix.
The ith component of the eigenvector x then gives the eigenvector
centrality score of the ith node in the network.
From (1),
$$x_i = \frac{1}{\lambda} \sum_{j=1}^{N} A_{i,j} x_j$$
• Therefore, for any node, the eigenvector centrality (EC) score is
proportional to the sum of the scores of all nodes that are
connected to it.
• Consequently, a node has a high value of EC either if it is
connected to many other nodes or if it is connected to others that
themselves have high EC.
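Equation (1) and its component form can be checked directly. The sketch below takes the eigenvector of the largest adjacency eigenvalue from `numpy.linalg.eigh` and verifies that each score equals the sum of the neighbours' scores divided by λ. The 5-node edge list is hypothetical.

```python
import numpy as np

# Hypothetical connected graph (0-based indices); the edge list is an assumption.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

vals, vecs = np.linalg.eigh(A)   # ascending eigenvalues
lam = vals[-1]                   # largest eigenvalue
x = vecs[:, -1]                  # corresponding eigenvector

# An eigenvector is defined up to sign; pick the non-negative version.
if x.sum() < 0:
    x = -x

# Component form of (1): x_i = (1 / lam) * sum_j A[i, j] * x[j]
neighbour_sum = A @ x / lam

print("lambda:", round(float(lam), 3))
print("EC    :", np.round(x, 3))
```

The scores are normalised to unit length by `eigh`; only their relative sizes matter for ranking nodes.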
Subgraph centrality
The number of closed walks of length k starting
and ending on vertex i in the network is given by
the local spectral moments μk(i), which
are simply defined as the ith diagonal entry of the
kth power of the adjacency matrix, A:
$$\mu_k(i) = (A^k)_{ii}$$
Source: Subgraph Centrality in Complex
Networks, Physical Review E 71,
056103 (2005)
Closed walks can be trivial or
nontrivial and are directly related to
the subgraphs of the network.
Adjacency matrix M: Muv = 1 if there is an edge between
nodes u and v and 0 otherwise.
M =
01000000000000
10110100000000
01011100000000
01101101000000
00110100000000
01111010000000
00000100001000
00010000100000
00000001010011
00000000101011
00000010010000
00000000000010
00000000110101
00000000110010
M2 =
10110100000000
04223211000000
12432311000000
12352310100000
03223211000000
12332501001000
01111020010000
01101102010011
00010000421122
00000011240122
00000100102011
00000000110101
00000001221042
00000001221123
(M2)uv for u ≠ v represents the
number of common neighbors of the
nodes u and v. The diagonal entry (M2)uu is the
local spectral moment μ2(u), which equals the degree of node u.
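The two readings of the squared matrix can be checked numerically. The sketch below copies the 14 rows of the adjacency matrix M from the slide, squares it, and inspects one diagonal entry and one off-diagonal entry. Indices in the code start at 0, so node 10 of the slide is index 9.

```python
import numpy as np

# Rows of the adjacency matrix M as given on the slide.
rows = [
    "01000000000000",
    "10110100000000",
    "01011100000000",
    "01101101000000",
    "00110100000000",
    "01111010000000",
    "00000100001000",
    "00010000100000",
    "00000001010011",
    "00000000101011",
    "00000010010000",
    "00000000000010",
    "00000000110101",
    "00000000110010",
]
M = np.array([[int(c) for c in row] for row in rows])

M2 = M @ M                 # second power of the adjacency matrix
degrees = M.sum(axis=1)

# Diagonal: closed walks of length 2, i.e. the local spectral moment mu_2.
print("mu_2 of node 10  :", M2[9, 9], "(degree:", degrees[9], ")")

# Off-diagonal: number of common neighbours of two nodes.
common = np.flatnonzero(M[8] & M[9]) + 1   # back to 1-based node numbers
print("nodes 9 and 10   :", M2[8, 9], "common neighbours", common.tolist())
```

Higher moments follow the same pattern, for example `np.linalg.matrix_power(M, 3)` for closed walks of length 3.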
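The subgraph centrality of the node i is given by
$$SC(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!}$$
Let λ be the main eigenvalue of the adjacency matrix A. It
can be shown that
$$\mu_k(i) \le \lambda^k \quad \Rightarrow \quad \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!} \le \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}$$
Thus, the subgraph centrality of any vertex i is
bounded above by
$$SC(i) \le e^{\lambda}$$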
Table 2. Summary of results of eight real-world complex networks.