Weighted Graphs and
Disconnected Components
Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos
Carnegie Mellon University
School of Computer Science
McGlohon, Akoglu, Faloutsos KDD08
“Disconnected” components
● In graphs a largest connected component emerges.
● What about the smaller-size components?
● How do they emerge, and join with the large one?

Weighted edges
● Graphs have heavy-tailed degree distribution.
● What can we also say about these edges?
● How are they repeated, or otherwise weighted?

Our goals
● Observe “Next-largest connected components”
  Q1: How does the GCC emerge?
  Q2: How do NLCC’s emerge and join with the GCC?
● Find properties that govern edge weights
  Q3: How does the total weight of the graph relate to the number of edges?
  Q4: How do the weights of nodes relate to degree?
  Q5: Does this relation change with the graph?
● Q6: Can we produce an emergent, generative model?

Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Model
● Summary

Properties of networks
● Small diameter (“small world” phenomenon)
  – [Milgram 67] [Leskovec, Horvitz 07]
● Heavy-tailed degree distribution
  – [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
● Densification
  – [Leskovec, Kleinberg, Faloutsos 05]
● “Middle region” components as well as GCC and singletons
  – [Kumar, Novak, Tomkins 06]

Generative Models
● Erdos-Renyi model [Erdos, Renyi 60]
● Preferential Attachment [Barabasi, Albert 99]
● Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
● Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
● Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
● “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]

Diameter
● Diameter of a graph is the “longest shortest path”: the largest shortest-path distance between any two connected nodes.
● Effective diameter is the smallest distance within which 90% of connected node pairs can reach each other.
[Figure: example graph on nodes n1 to n7, diameter = 3]

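The two definitions above can be made concrete with a small sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbor lists, and it uses the integer form of the effective diameter (no interpolation between hop counts). The function names are illustrative and not taken from the talk.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj):
    """Return (diameter, effective diameter) over connected node pairs."""
    counts = {}  # distance -> number of ordered connected pairs at that distance
    for s in adj:
        for d in bfs_distances(adj, s).values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    if not counts:
        return 0, 0
    total = sum(counts.values())
    diameter = max(counts)
    reached = 0
    for d in sorted(counts):
        reached += counts[d]
        # Smallest d such that at least 90% of connected pairs are within d hops.
        if 10 * reached >= 9 * total:
            return diameter, d
    return diameter, diameter

# Usage: a 20-node clique with a two-node tail attached to node 0.
clique_with_tail = {i: [j for j in range(20) if j != i] for i in range(20)}
clique_with_tail[0].append(20)
clique_with_tail[20] = [0, 21]
clique_with_tail[21] = [20]
print(diameters(clique_with_tail))
```

Running one BFS per node costs O(N·E), which is fine for small examples; graphs with millions of nodes would need sampling or an approximate neighborhood function instead.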
Unipartite Networks
● Postnet: Posts in blogs, hyperlinks between posts
● Blognet: Aggregated Postnet, repeated edges
● Patent: Patent citations
● NIPS: Academic citations
● Arxiv: Academic citations
● NetTraffic: Packets, repeated edges
● Autonomous Systems (AS): Packets, repeated edges
[Figure: example unipartite graph on nodes n1 to n7, shown first with plain edges, then with a repeated edge, then with numeric edge weights]

Unipartite Networks
● (Nodes, Edges, Timestamps)
● Postnet: 250K, 218K, 80 days
● Blognet: 60K, 125K, 80 days
● Patent: 4M, 8M, 17 yrs
● NIPS: 2K, 3K, 13 yrs
● Arxiv: 30K, 60K, 13 yrs
● NetTraffic: 21K, 3M, 52 mo
● AS: 12K, 38K, 6 mo

Bipartite Networks
● IMDB: Actor-movie network
● Netflix: User-movie ratings
● DBLP: repeated edges
  – Author-Keyword
  – Keyword-Conference
  – Author-Conference
● US Election Donations: $ weights, repeated edges
  – Orgs-Candidates
  – Individuals-Orgs
[Figure: example bipartite graph between nodes n1 to n4 and m1 to m3, shown first with plain edges, then with repeated edges, then with numeric edge weights]

Bipartite Networks
● IMDB: 757K, 2M, 114 yr
● Netflix: 125K, 14M, 72 mo
● DBLP: 25 yr
  – Author-Keyword: 27K, 189K
  – Keyword-Conference: 10K, 23K
  – Author-Conference: 17K, 22K
● US Election Donations: 22 yr
  – Orgs-Candidates: 23K, 877K
  – Individuals-Orgs: 6M, 10M

Observation 1: Gelling Point
Q1: How does the GCC emerge?
● Most real graphs display a gelling point, or burning off period.
● The gelling point is marked by a spike in diameter. After it, graphs exhibit typical behavior.
[Figure: IMDB, diameter vs. time, gelling point at t=1914]

Observation 2: NLCC behavior
Q2: How do NLCC’s emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?
● After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.
[Figure: IMDB, connected component size vs. time]

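One way to produce a curve like the one described above is to replay the edges in timestamp order and record the sizes of the largest and next-largest components after each edge. The sketch below uses a union-find structure for this; it is an illustration under that assumption, not the measurement code behind the talk, and the class and function names are hypothetical.

```python
import heapq

class UnionFind:
    """Disjoint sets with component sizes stored at the root of each set."""

    def __init__(self):
        self.parent = {}
        self.size = {}  # root -> size of its component

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size.pop(rb)

def component_history(edges):
    """After each edge, record (largest size, second-largest size)."""
    uf = UnionFind()
    history = []
    for u, v in edges:
        uf.union(u, v)
        top = heapq.nlargest(2, uf.size.values())
        history.append((top[0], top[1] if len(top) > 1 else 0))
    return history

# Usage: three pairs form, then merge one by one into a single component.
print(component_history([(1, 2), (3, 4), (5, 6), (2, 3), (4, 5)]))
```

Scanning all component sizes after every edge is quadratic in the worst case; sampling the history at a fixed number of timestamps keeps the cost manageable on large graphs.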
Observation 3: Fortification Effect
Q3: How does the total weight of the graph relate to the number of edges?
● Is total $ simply proportional to # checks?
● Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w
  – W(t): total weight of graph at t
  – E(t): total edges of graph at t
  – w is PL exponent
  – 1.01 < w < 1.5 = super-linear!
  – (more checks, even more $)
[Figure: Orgs-Candidates, |$| vs. |Checks|, 1980 to 2004]

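Given snapshots of E(t) and W(t), an exponent like w can be estimated as the slope of a straight line fitted on log-log axes. The sketch below uses ordinary least squares in log space, which is one simple estimator and not necessarily the fitting procedure used for the talk; the function name and the sample numbers are made up for illustration.

```python
import math

def power_law_exponent(xs, ys):
    """Fit y = c * x**k by least squares on (log x, log y); return (k, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c

# Usage with synthetic snapshots that follow W = 2 * E**1.2 exactly.
edges_over_time = [10, 100, 1000, 10000]
weight_over_time = [2 * e ** 1.2 for e in edges_over_time]
w, c = power_law_exponent(edges_over_time, weight_over_time)
print(round(w, 3), round(c, 3))
```

An exponent above 1 from such a fit corresponds to the super-linear growth described on the slide.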
Observation 4 and 5
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?

Observation 4: Snapshot Power Law
● At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw. 1.01 < iw < 1.26, super-linear.
● More donors, even more $
[Figure: Orgs-Candidates, in-weights ($) vs. edges (# donors); e.g. John Kerry, $10M received from 1K donors]

Observation 5: Snapshot Power Law
● For a given graph, this exponent is constant over time.
[Figure: Orgs-Candidates, exponent vs. time]

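The per-node quantities behind a snapshot plot can be gathered in a single pass over a weighted edge list. The sketch below assumes edges arrive as (source, target, weight) triples and counts in-degree as the number of distinct sources, matching the "# donors" axis; both choices, and the function name, are assumptions made for illustration.

```python
from collections import defaultdict

def in_degree_and_weight(edges):
    """Map each target node to (number of distinct sources, total incoming weight)."""
    sources = defaultdict(set)
    weight = defaultdict(float)
    for src, dst, w in edges:
        sources[dst].add(src)  # repeated edges from one source count once
        weight[dst] += w       # but every repeated edge adds its weight
    return {node: (len(sources[node]), weight[node]) for node in weight}

# Usage with made-up donations: donor "a" writes two checks to candidate "k".
snapshot = in_degree_and_weight([
    ("a", "k", 100.0),
    ("b", "k", 50.0),
    ("a", "k", 25.0),
    ("c", "m", 10.0),
])
print(snapshot)
```

Plotting the resulting (in-degree, in-weight) pairs on log-log axes and fitting a line gives an estimate of iw for that snapshot; repeating this at several timestamps shows whether the exponent stays constant.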
Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Q6: Is there a generative, “emergent” model?
● Summary

Goals of model
● a) Emergent, intuitive behavior
● b) Shrinking diameter
● c) Constant NLCC’s
● d) Densification power law
● e) Power-law degree distribution
= “Butterfly” Model

Butterfly model in action
● A node joins a network, with own parameter pstep (“curiosity”).
● With (global) phost (“cross-disciplinarity”), chooses a random host.
  – With (global) plink (“friendliness”), creates link.
  – With pstep, travels to random neighbor. Repeat.
● Once there are no more “steps”, repeat “host” procedure:
  – With phost, choose new host, possibly link, etc.
  – Until no more steps, and no more hosts.
[Figure: new node n8 joining a graph on nodes n1 to n7; successive frames label the phost, plink and pstep moves]

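The procedure above can be sketched as a short simulation. The slides leave several details open, so the following choices are assumptions: every host round, including the first, happens with probability phost; at each visited node the newcomer first tries to link and then decides whether to step; hosts are picked uniformly among existing nodes; and the newcomer's own edges are added only after it finishes exploring. The function name and defaults are illustrative.

```python
import random

def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, seed=0):
    """Grow an undirected graph node by node; returns {node: neighbor list}."""
    rng = random.Random(seed)
    adj = {0: []}
    for new in range(1, n_nodes):
        p_step = rng.random()         # per-node parameter, drawn from U(0,1)
        linked = set()                # nodes the newcomer decides to link to
        while rng.random() < p_host:  # "host" procedure (assumed to gate every round)
            cur = rng.randrange(new)  # uniform random existing node as host
            while True:
                if cur not in linked and rng.random() < p_link:
                    linked.add(cur)
                if rng.random() >= p_step or not adj[cur]:
                    break             # no more steps from this host
                cur = rng.choice(adj[cur])
        # Add the newcomer's edges once its exploration is over.
        adj[new] = sorted(linked)
        for v in sorted(linked):
            adj[v].append(new)
    return adj

# Usage: a small run; edges are counted once per undirected pair.
g = butterfly_graph(1000, seed=42)
n_edges = sum(len(nbrs) for nbrs in g.values()) // 2
print(len(g), n_edges)
```

Under these assumptions a newcomer that never picks a host, or picks hosts but never links, stays isolated, and a newcomer that links from several hosts can merge previously separate components.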
a) Emergent, intuitive behavior
Novelties of model:
● Nodes link with probability
  – May choose host, but not link (start new component)
● Incoming nodes are “social butterflies”
  – May have several hosts (merges components)
● Some nodes are friendlier than others
  – pstep different for each node
  – This creates power-law degree distribution (theorem)

Validation of Butterfly
● Chose following parameters:
  – phost = 0.3
  – plink = 0.5
  – pstep ~ U(0,1)
● Ran 10 simulations
● 100,000 nodes per simulation

b) Shrinking diameter
● Shrinking diameter
  – In model, gelling usually occurred around N=20,000
[Figure: diameter vs. number of nodes, gelling point marked at N=20,000]

c) Oscillating NLCC’s
● Constant / oscillating NLCC’s
[Figure: NLCC size vs. number of nodes, N=20,000 marked]

d) Densification power law
● Densification: edges grow as a power of nodes, E ∝ N^a
  – Our datasets had a = (1.03, 1.7)
  – In [Leskovec+05KDD], a = (1.1, 1.7)
  – Simulation produced a = (1.1, 1.2)
[Figure: edges vs. nodes, N=20,000 marked]

e) Power-law degree distribution
● Power-law degree distribution
  – Exponents approx -2
[Figure: count vs. degree]

Summary
● Studied several diverse public graphs
  – Measured at many timestamps
  – Unipartite and bipartite
  – Blogs, citations, real-world, network traffic
  – Largest was 6 million nodes, 10 million edges

Summary
● Observations on unweighted graphs:
  A1: The GCC emerges at the “gelling point”
  A2: NLCC’s are of constant / oscillating size
● Observations on weighted graphs:
  A3: Total weight increases super-linearly with edges
  A4: Node’s weights increase super-linearly with degree, power law exponent iw
  A5: iw remains constant over time
● A6: Intuitive, emergent generative “butterfly” model, that matches properties

References
[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.
[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungar. Acad. Sci. 5, 17-61.
[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.
[Kumar+00] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', in 'Proceedings of the 41st FOCS', pp. 57-65.
[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.
[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.
[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', in 'ICML '07'.
[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.
[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), 'Winners don’t take all: Characterizing the competition for links on the web'.

Contact us
Leman Akoglu
www.andrew.cmu.edu/~lakoglu
lakoglu@cs.cmu.edu
Christos Faloutsos
www.cs.cmu.edu/~christos
christos@cs.cmu.edu
Mary McGlohon
www.cs.cmu.edu/~mmcgloho
mmcgloho@cs.cmu.edu
Entropy plots [Wang+2002]
● From time series data, begin with resolution r = T/2.
● Record entropy H_R.
● Recursively take finer resolutions.
[Figure: Δ weights vs. time; entropy vs. resolution]

Entropy Plots
● Self-similarity → Linear plot
  – Uniform: slope of plot s=1.
  – Point mass: s=0
● Bursty: 0.2 < s < 0.9
[Figure: entropy vs. resolution, fitted slope s = 0.59]

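The entropy-plot construction can be sketched as follows. The sketch assumes the series length T is a power of two, splits it into 2^k equal windows at level k (so level 1 corresponds to windows of width T/2), and uses Shannon entropy in bits of the share of total weight falling in each window. The function names are illustrative and not taken from [Wang+2002].

```python
import math

def entropy_plot(series):
    """Return [(k, H_k)] where H_k is the entropy over 2**k equal time windows."""
    T = len(series)
    levels = T.bit_length() - 1
    if T < 4 or 2 ** levels != T:
        raise ValueError("series length must be a power of two, at least 4")
    total = sum(series)
    if total <= 0:
        raise ValueError("series must have positive total weight")
    points = []
    for k in range(1, levels + 1):
        width = T >> k  # window width T / 2**k
        h = 0.0
        for start in range(0, T, width):
            p = sum(series[start:start + width]) / total
            if p > 0:
                h -= p * math.log2(p)
        points.append((k, h))
    return points

def plot_slope(points):
    """Least-squares slope of entropy against resolution level."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Usage: uniform traffic versus all weight arriving in a single tick.
uniform = [1.0] * 16
point_mass = [0.0] * 15 + [5.0]
print(plot_slope(entropy_plot(uniform)), plot_slope(entropy_plot(point_mass)))
```

A uniform series gains one bit of entropy per level, while a point mass stays at zero entropy at every level, which matches the two reference slopes on the slide; bursty series fall in between.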