Slides

advertisement
Graphs
Definitions, Measures, Graphology,
Pathology, Degree Distributions
Network Science: Graph Theory
2012
COMPONENTS OF A COMPLEX SYSTEM
 components: nodes, vertices
N
 interactions: links, edges
L
 system:
network, graph
(N,L)
Network Science: Graph Theory
2012
NETWORKS OR GRAPHS?
network often refers to real systems
•www,
•social network
•metabolic network.
Language: (Network, node, link)
graph: mathematical representation of a network
•web graph,
•social graph (a Facebook term)
Language: (Graph, vertex, edge)
We will try to make this distinction whenever it is appropriate,
but in most cases we will use the two terms interchangeably.
Network Science: Graph Theory
2012
A COMMON LANGUAGE
friend
Movie 1
Mary
Peter
brothers
friend
co-worker
Albert
Movie 3
Movie 2
Albert
Protein 1
Actor 2
Actor 1
Actor 4
Actor 3
Protein 2
Protein 5
Protein 9
N=4
L=4
Network Science: Graph Theory
2012
CHOOSING A PROPER REPRESENTATION
The choice of the proper network representation determines our
ability to use network theory successfully.
In some cases there is a unique, unambiguous representation.
In other cases, the representation is by no means unique.
For example,, the way we assign the links between a group of
individuals will determine the nature of the question we can study.
Network Science: Graph Theory
2012
UNDIRECTED VS. DIRECTED NETWORKS
Undirected
Directed
Links: undirected (symmetrical)
Links: directed (arcs).
Graph:
Digraph = directed graph:
L
A
M
F
D
B
C
I
D
G
B
E
G
H
A
C
Undirected links :
coauthorship links
Actor network
protein interactions
An undirected
link is the
superposition of
two opposite
directed links.
F
Directed links :
URLs on the www
phone calls
metabolic reactions
Network Science: Graph Theory
2012
NODE DEGREES
Directed
Undirected
Node degree: the number of links connected to the node.
A
kA =1
B
In directed networks we can define an in-degree and out-degree.
D
B
The (total) degree is the sum of in- and out-degree.
C
A
kB = 4
E
G
kCin  2
kCout  1
kC  3
F
Source: a node with kin= 0; Sink: a node with kout= 0.
Graph density
 No. of the connections that may exist between n nodes
 directed graph
emax = n*(n-1)
 undirected graph
emax = n*(n-1)/2
 What fraction are present?
 density = e/ emax
 For example, out of 12 possible connections,
this graph has 7, giving it a density of 7/12 = 0.583
8
Graph density
• Would this measure be useful for comparing
networks of different sizes (different numbers of
nodes)?
• As n → ∞, a graph whose density reaches
– 0 is a sparse graph
– a constant is a dense graph
9
AVERAGE DEGREE
Directed
Undirected
Recall: <x> = (x1 + x1 + ... + xN)/N = Σi xi /N
1
k 
N
j
i
N
k
i 1
k º
i
2L
N
N – the number of nodes in the graph
1 N
L   ki
2 i 1
L – the number of links in the graph
D
B
k
C
in
1

N
N
k
i 1
in
i
,
k
out
1

N
N
k
i 1
out
i
,
k in  k out
E
A
F
k º
L
N
Network Science: Graph Theory
2012
COMPLETE GRAPH
The maximum number of links a network
of N nodes can have is: Lmax = æç N ö÷ = N(N -1)
è2ø
2
A graph with degree L=Lmax is called a complete graph,
and its average degree is <k>=N-1
Network Science: Graph Theory
2012
REAL NETWORKS ARE SPARSE
Most networks observed in real systems are sparse:
L << Lmax
or
<k> <<N-1.
WWW (ND Sample):
Protein (S. Cerevisiae):
Coauthorship (Math):
Movie Actors:
N=325,729;
N= 1,870;
N= 70,975;
N=212,250;
L=1.4 106
L=4,470
L=2 105
L=6 106
Lmax=1012
Lmax=107
Lmax=3 1010
Lmax=1.8 1013
<k>=4.51
<k>=2.39
<k>=3.9
<k>=28.78
(Source: Albert, Barabasi, RMP2002)
Network Science: Graph Theory
2012
ADJACENCY MATRIX
4
4
3
2
3
2
1
1
Aij=1 if there is a link between node i and j
Aij=0 if nodes i and j are not connected to each other.
æ0
ç
1
ç
Aij =
ç0
ç
è1
1
0
0
0
0
0
1
1
1ö
÷
1÷
1÷
÷
0ø
Directed is symmetric.
0

0
Aij  
0

0

1
0
0
0
0
0
1
1
1

0
0

0 
Undirected is not symmetric.
Network Science: Graph Theory
2012
ADJACENCY MATRIX
a
e
h
b
f
g
d
a
b
c
d
e
f
g
h
a
0
1
0
0
1
0
1
0
b
1
0
1
0
0
0
0
1
c
0
1
0
1
0
1
1
0
d
0
0
1
0
1
0
0
0
e
1
0
0
1
0
0
0
0
f
0
0
1
0
0
0
0
0
g
1
0
1
0
0
1
0
0
h
0
1
0
0
0
0
0
0
c
Network Science: Graph Theory
2012
Directed
3
2
1
0
0
0
1
0
1
1ö
÷
1÷
1÷
÷
0ø
Aij = A ji
Aii = 0
4
2
1
3
æ0
ç
0
Aij = ç
ç0
ç
è0
1
0
0
0
0
1
0
1
Aij ¹ A ji
Aii = 0
ki =
N
åA
ij
j =1
N
k j = å Aij
i=1
N
N
1
1
L = å ki = å Aij
2 i=1
2 ij
1ö
÷
0÷
0÷
÷
0ø
N
4
1
0
k inj = å Aij
æ0
ç
1
Aij = ç
ç0
ç
è1
i=1
Undirected
ADJACENCY MATRIX AND NODE DEGREES
out
i
k
=
N
åA
ij
i =1
N
N
N
i=1
j=1
i, j
L = å kiin = å k out
= å Aij
j
Network Science: Graph Theory
2012
GRAPHOLOGY 1
Undirected
4
Directed
4
1
1
2
2
3
æ0
ç
1
ç
Aij =
ç1
ç
è0
3
1
1
0
1
1
0
1
0
Aii = 0
N
1
L = å Aij
2 i, j=1
0ö
÷
1÷
0÷
÷
0ø
Aij = A ji
2L
< k >=
N
Actor network, protein-protein interactions
æ0
ç
0
ç
Aij =
ç1
ç
è0
1
0
0
1
0
0
0
0
Aii = 0
L=
N
åA
ij
i, j=1
0ö
÷
1÷
0÷
÷
0ø
Aij ¹ A ji
L
< k >=
N
WWW, citation networks
Network Science: Graph Theory
2012
GRAPHOLOGY 2
Unweighted
Weighted
4
(undirected)
4
(undirected)
1
1
2
2
3
æ0
ç
1
Aij = ç
ç1
ç
è0
Aii = 0
N
1
L = å Aij
2 i, j=1
3
1
1
0
1
1
0
1
0
0ö
÷
1÷
0÷
÷
0ø
Aij = A ji
2L
< k >=
N
protein-protein interactions, www
æ 0
ç
2
Aij = ç
ç 0.5
ç
è 0
2
0.5
0
1
1
0
4
0
Aii = 0
N
1
L = å nonzero(Aij )
2 i, j=1
0ö
÷
4÷
0÷
÷
0ø
Aij = A ji
2L
< k >=
N
Call Graph, metabolic networks
Network Science: Graph Theory
2012
GRAPHOLOGY 3
Self-interactions
Multigraph
(undirected)
4
1
4
1
2
2
3
æ1
ç
1
Aij = ç
ç1
ç
è0
3
1
1
0
1
1
0
1
0
Aii  0
N
1 N
L
Aii
 Aij  
2 i , j 1,i  j
i 1
Protein interaction network, www
0ö
÷
1÷
0÷
÷
1ø
Aij  A ji
æ0
ç
2
Aij = ç
ç1
ç
è0
2
1
0
1
1
0
3
0
Aii = 0
N
1
L = å nonzero(Aij )
2 i, j=1
0ö
÷
3÷
0÷
÷
0ø
Aij = A ji
2L
< k >=
N
Social networks, collaboration networks
Network Science: Graph Theory
2012
GRAPHOLOGY 4
Complete Graph
(undirected)
4
1
2
3
æ0
ç
1
Aij = ç
ç1
ç
è1
1
1
0
1
1
0
1
1
Aii = 0
N(N -1)
L = Lmax =
2
1ö
÷
1÷
1÷
÷
0ø
Ai¹ j = 1
< k >= N -1
Actor network, protein-protein interactions
Network Science: Graph Theory
2012
GRAPHOLOGY: Real networks can have multiple characteristics
WWW >
Protein Interactions >
directed multigraph with self-interactions
undirected unweighted with self-interactions
Collaboration network >
Mobile phone calls >
Facebook Friendship links >
undirected multigraph or weighted.
directed, weighted.
undirected,
unweighted.
Network Science: Graph Theory
2012
BIPARTITE GRAPHS
bipartite graph (or bigraph) is a graph
whose nodes can be divided into two
disjoint sets U and V such that every link
connects a node in U to one in V; that is,
U and V are independent sets.
Examples:
Hollywood actor network
Collaboration networks
Disease network (diseasome)
Network Science: Graph Theory
2012
PATHS
A path is a sequence of nodes in which each node is adjacent to the next one
Pi0,in of length n between nodes i0 and in is an ordered collection of n+1 nodes and n links
Pn = {i0,i1,i2,...,in }
Pn = {(i0 ,i1),(i1,i2 ),(i2 ,i3 ),...,(in-1,in )}
B
•A path can intersect itself and pass through the same
link repeatedly. Each time a link is crossed, it is counted
separately
•A legitimate path on the graph on the right:
ABCBCADEEBA
A
E
C
D
• In a directed network, the path can follow only the
direction of an arrow.
Network Science: Graph Theory
2012
DISTANCE IN A GRAPH
Shortest Path, Geodesic Path
The distance (shortest path, geodesic path) between two
B
nodes is defined as the number of edges along the shortest
A
path connecting them.
C
D
*If the two nodes are disconnected, the distance is infinity.
In directed graphs each path needs to follow the direction of
B
the arrows.
A
Thus in a digraph the distance from node A to B (on an AB
path) is generally different from the distance from node B to A
D
C
(on a BCA path).
Network Science: Graph Theory
2012
NUMBER OF PATHS BETWEEN TWO NODES
Adjacency Matrix
Nij, number of paths between any two nodes i and j:
Length n=1: If there is a link between i and j, then Aij=1 and Aij=0 otherwise.
Length n=2: If there is a path of length two between i and j, then AikAkj=1, and AikAkj=0
otherwise.
The number of paths of length 2:
N
N ij( 2 )   Aik Akj
k 1
Length n: In general, if there is a path of length n between i and j, then Aik…Alj=1
and Aik…Alj=0 otherwise.
The number of paths of length n between i and j is
N ij = [A ]ij
(n)
n
Network Science: Graph Theory
2012
FINDING DISTANCES: BREADTH FIRST SEARCH
Distance between node 1 and node 4:
1.Start at 1.
3
3
4
3
4
2
4
2
3
1
1
1
2
3
3
4
4
4
1
3
4
2
4
2
3
Network Science: Graph Theory
2012
FINDING DISTANCES: BREADTH FIRST SEATCH
Distance between node 1 and node 4:
1.Start at 1.
2.Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.
3
3
4
3
4
2
4
2
3
1
1
1
2
3
3
4
4
4
1
3
4
2
4
2
3
Network Science: Graph Theory
2012
FINDING DISTANCES: BREADTH FIRST SEATCH
Distance between node 1 and node 4:
1.Start at 1.
2.Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.
3.Take the first node out of the queue. Find the unmarked nodes adjacent to it in the
graph. Mark them with the label of 2. Put them in the queue.
3
3
4
3
4
2
4
2
3
1
1
1
2
3
4
4
3
4
1
3
4
2
4
2
3
Network Science: Graph Theory
2012
FINDING DISTANCES: BREADTH FIRST SEATCH
Distance between node 1 and node 4:
1.Repeat until you find node 4 or there are no more nodes in the queue.
2.The distance between 1 and 4 is the label of 4 or, if 4 does not have a label, infinity.
3
3
4
3
4
2
4
2
3
1
1
1
2
3
3
4
4
4
1
3
4
2
4
2
3
Network Science: Graph Theory
2012
NETWORK DIAMETER AND AVERAGE DISTANCE
Diameter: dmax the maximum distance between any pair of nodes in the graph;
along the shortest path.
Average path length/distance, <d>, for a connected graph:
1
 d 
d ij where dij is the distance from node i to node j

No. of Links i , j i
Network Science: Graph Theory
2012
THE BRIDGES OF KONIGSBERG
1736: Konigsberg (Prussia)
7 bridges over river Pregel
connected 4 land masses.
Can one walk across the seven
bridges and never cross the
same bridge twice?
Euler PATH or CIRCUIT: return to the starting point by traveling each
link of the graph once and only once.
Network Science: Graph Theory
2012
EULERIAN GRAPH: it has an Eulerian path
Every vertex of this graph has an even degree, therefore this is an Eulerian graph.
Following the edges in alphabetical order gives an Eulerian circuit/cycle.
http://en.wikipedia.org/wiki/Euler_circuit
Network Science: Graph Theory
2012
EULER CIRCUITS IN DIRECTED GRAPHS
B
If a digraph is strongly connected and the indegree of each node is equal to its out-degree,
then there is an Euler circuit
D
A
C
E
F
G
Otherwise there is no Euler circuit.
in a circuit we need to enter each node as many
times as we leave it.
Network Science: Graph Theory
2012
PATHOLOGY: summary
Path
Shortest Path
1
1
2
5
2
5
3
4
3
4
A sequence of nodes such that each
node is connected to the next node
along the path by a link.
The path with the shortest
length between two nodes
(distance).
Network Science: Graph Theory
2012
PATHOLOGY: summary
Diameter
Average Path Length
1
1
2
5
2
5
3
4
3
4
The longest shortest path in
a graph
The average of the shortest paths for
all pairs of nodes.
Network Science: Graph Theory
2012
PATHOLOGY: summary
Cycle
Self-avoiding Path
1
1
2
5
2
5
3
4
3
4
A path with the same start
and end node.
A path that does not intersect
itself.
Network Science: Graph Theory
2012
PATHOLOGY: summary
Eulerian Path
Hamiltonian Path
1
1
2
5
2
5
3
4
3
4
A path that traverses each
link exactly once.
A path that visits each
node exactly once.
Network Science: Graph Theory
2012
CONNECTIVITY OF UNDIRECTED GRAPHS
Connected (undirected) graph: any two vertices can be joined by a path.
A disconnected graph is made up by two or more connected components.
B
B
A
D
F
C
D
F
F
G
Largest Component:
Giant Component
A
C
F
The rest: Isolates
G
Bridge: if we erase it, the graph becomes disconnected.
Network Science: Graph Theory
2012
CONNECTIVITY OF UNDIRECTED GRAPHS
Adjacency Matrix
The adjacency matrix of a network with several components can be written in a blockdiagonal form, so that nonzero elements are confined to squares, with all other elements
being zero:
Figure after Newman, 2010
Network Science: Graph Theory
2012
THREE CENTRAL QUANTITIES IN NETWORK SCIENCE
Degree distribution
pk
Average path length
<d>
Clustering coefficient
C
Network Science: Graph Theory
2012
STATISTICS REMINDER
We have a sample of values x1, ..., xN
Distribution of x (a.k.a. PDF): probability that a randomly chosen value is x
P(x) = (# values x) / N
Σi P(xi) = 1 always!
Histograms >>>
Network Science: Graph Theory
2012
pdf -- the continuous case
Will work on the board
Network Science: Graph Theory
2012
DEGREE DISTRIBUTION
Degree distribution P(k): probability that
a randomly chosen vertex has degree k
Nk = # nodes with degree k
P(k) = Nk / N  plot
P(k)
0.6
0.5
0.4
0.3
0.2
0.1
1
2
3
4
k
Network Science: Graph Theory
2012
DEGREE DISTRIBUTION
discrete representation: pk is the probability that a node has degree k.
continuum description: p(k) is the pdf of the degrees, where
k2
ò
p(k)dk
k1
represents the probability that a node’s degree is between k1 and k2.
Normalization condition:
¥
åp
k
0
=1
¥
ò p(k)dk = 1
K min
where Kmin is the minimal degree in the network.
Network Science: Graph Theory
2012
CLUSTERING COEFFICIENT
Clustering coefficient:
What portion of your neighbors are connected?
Captures the degree to which the neighbors of a given node link to each other.
Node i with degree Ki
ei is the number of links between the Ki neighbors of node i
Ci in [0,1]
Network Science: Graph Theory
2012
THREE CENTRAL QUANTITIES IN NETWORK SCIENCE
A. Degree distribution:
B. Path length:
C. Clustering coefficient:
pk
<d>
Network Science: Graph Theory
2012
Download