Large Graph Mining Christos Faloutsos CMU

advertisement
CMU SCS
Large Graph Mining
Christos Faloutsos
CMU
CMU SCS
Thank you!
• Dr. Yan Liu
• Dr. Jimeng Sun
ICDM-LDMTA 2009
C. Faloutsos
2
CMU SCS
Outline
•
•
•
•
•
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Generators
Problem#3: Scalability
Conclusions
ICDM-LDMTA 2009
C. Faloutsos
3
CMU SCS
Graphs - why should we care?
Internet Map
[lumeta.com]
Food Web
[Martinez ’91]
Protein Interactions
[genomebiology.com]
Friendship Network
[Moody ’01]
ICDM-LDMTA 2009
C. Faloutsos
4
CMU SCS
Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
T1
D1
...
...
DN
TM
• web: hyper-text graph
• ... and more:
ICDM-LDMTA 2009
C. Faloutsos
5
CMU SCS
Graphs - why should we care?
• network of companies & board-of-directors
members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic
and anomaly detection
• ....
ICDM-LDMTA 2009
C. Faloutsos
6
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
• Problem#3: Scalability
• Conclusions
ICDM-LDMTA 2009
C. Faloutsos
7
CMU SCS
Problem #1 - network and graph
mining
•
•
•
•
ICDM-LDMTA 2009
How does the Internet look like?
How does the web look like?
What is ‘normal’/‘abnormal’?
which patterns/laws hold?
C. Faloutsos
8
CMU SCS
Graph mining
• Are real graphs random?
ICDM-LDMTA 2009
C. Faloutsos
9
CMU SCS
Laws and patterns
• Are real graphs random?
• A: NO!!
– Diameter
– in- and out- degree distributions
– other (surprising) patterns
• So, let’s look at the data
• (‘it is amazing what you hear, when you
listen’)
ICDM-LDMTA 2009
C. Faloutsos
10
CMU SCS
Solution# S.1
• Power law in the degree distribution
[SIGCOMM99]
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
ICDM-LDMTA 2009
C. Faloutsos
11
CMU SCS
Solution# S.2: Eigen Exponent E
Eigenvalue
Exponent = slope
E = -0.48
May 2001
Rank of decreasing eigenvalue
• A2: power law in the eigenvalues of the adjacency
matrix
ICDM-LDMTA 2009
C. Faloutsos
12
CMU SCS
Solution# S.2: Eigen Exponent E
Eigenvalue
Exponent = slope
E = -0.48
May 2001
Rank of decreasing eigenvalue
• [Mihail, Papadimitriou ’02]: slope is ½ of rank
exponent
ICDM-LDMTA 2009
C. Faloutsos
13
CMU SCS
But:
How about graphs from other domains?
ICDM-LDMTA 2009
C. Faloutsos
14
CMU SCS
More power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
log(count)
Zipf
``ebay’’
users
sites
log(in-degree)
ICDM-LDMTA 2009
C. Faloutsos
16
CMU SCS
epinions.com
• who-trusts-whom
[Richardson +
Domingos, KDD
2001]
count
trusts-2000-people user
(out) degree
ICDM-LDMTA 2009
C. Faloutsos
17
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
ICDM-LDMTA 2009
C. Faloutsos
18
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
ICDM-LDMTA 2009
C. Faloutsos
19
CMU SCS
Triangle ‘Laws’
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
ICDM-LDMTA 2009
C. Faloutsos
20
CMU SCS
Triangle Law: #S.3
[Tsourakakis ICDM 2008]
HEP-TH
ASN
Epinions
X-axis: # of Triangles
a node participates in
Y-axis: count of such nodes
ICDM-LDMTA 2009
C. Faloutsos
21
CMU SCS
Triangle Law: #S.4
[Tsourakakis ICDM 2008]
Reuters
SN
X-axis: degree
Y-axis: mean # triangles
Notice: slope ~ degree
exponent (insets)
Epinions
ICDM-LDMTA 2009
C. Faloutsos
22
CMU SCS
details
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos)
Q: Can we do that quickly?
ICDM-LDMTA 2009
C. Faloutsos
23
CMU SCS
details
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos)
Q: Can we do that quickly?
A: Yes!
#triangles = 1/6 Sum ( li3 )
(and, because of skewness, we only need
the top few eigenvalues!
ICDM-LDMTA 2009
C. Faloutsos
24
CMU SCS
details
Triangle Law: Computations
[Tsourakakis ICDM 2008]
1000x+ speed-up, high accuracy
ICDM-LDMTA 2009
C. Faloutsos
25
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
ICDM-LDMTA 2009
C. Faloutsos
26
CMU SCS
Large Human Communication Networks
Patterns and a Utility-Driven Generator
Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu
KDD 2009
CMU SCS
Cliques
• Clique is a complete subgraph.
• If a clique can not be
contained by any larger
clique, it is called the
maximal clique.
ICDM-LDMTA 2009
C. Faloutsos
0
2
1
3
4
28
CMU SCS
Clique
• Clique is a complete subgraph.
• If a clique can not be
contained by any larger
clique, it is called the
maximal clique.
ICDM-LDMTA 2009
C. Faloutsos
0
2
1
3
4
29
CMU SCS
Clique
• Clique is a complete subgraph.
• If a clique can not be
contained by any larger
clique, it is called the
maximal clique.
ICDM-LDMTA 2009
C. Faloutsos
0
2
1
3
4
30
CMU SCS
Clique
• Clique is a complete subgraph.
• If a clique can not be
contained by any larger
clique, it is called the
maximal clique.
• {0,1,2}, {0,1,3}, {1,2,3}
{2,3,4}, {0,1,2,3} are cliques;
• {0,1,2,3} and {2,3,4} are
the maximal cliques.
ICDM-LDMTA 2009
C. Faloutsos
0
2
1
3
4
31
CMU SCS
S.5: Clique-Degree Power-Law
• Power law:
Cd
# maximal
cliques of node i
a
degree
of node i
Dataset: who-calls-whom
anonymized,
Over several time units
ICDM-LDMTA 2009
More friends, even more
social circles !
C. Faloutsos
32
CMU SCS
S.5 Clique-Degree Power-Law
• Outlier Detection
ICDM-LDMTA 2009
C. Faloutsos
33
CMU SCS
1.5 Clique-Degree Power-Law
• Outlier Detection
ICDM-LDMTA 2009
C. Faloutsos
34
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
ICDM-LDMTA 2009
C. Faloutsos
35
CMU SCS
Observation W.1
Question : Nodes in a triangle are topologically
equivalent. Will they also give equal number of
phone calls to each other ?
ICDM-LDMTA 2009
C. Faloutsos
36
CMU SCS
Observation W.1:Triangle Weight Law
Max  medium
a
Max  min b
medium  min
c
Periods S1 – S3
ICDM-LDMTA 2009
C. Faloutsos
37
CMU SCS
Other observations on weighted
graphs?
• A: yes - even more ‘laws’!
M. McGlohon, L. Akoglu, and C. Faloutsos
Weighted Graphs and Disconnected
Components: Patterns and a Generator.
SIG-KDD 2008
ICDM-LDMTA 2009
C. Faloutsos
38
CMU SCS
Observation W.2: fortification
Q: How do the weights
of nodes relate to degree?
ICDM-LDMTA 2009
C. Faloutsos
39
CMU SCS
Observation W.2: fortification:
Snapshot Power Law
• weight of a node is super-linear on the in-degree with PL
exponent ‘iw’:
– i.e. 1.01 < iw < 1.26, super-linear
Orgs-Candidates
More donors,
even more $
$10
e.g. John Kerry,
$10M received,
from 1K donors
In-weights
($)
$5
Edges (# donors)
ICDM-LDMTA 2009
C. Faloutsos
40
CMU SCS
Are there deviations from P.L.?
• A: yes – but they also correspond to very
skewed distributions (‘black/gray swans’)
ICDM-LDMTA 2009
C. Faloutsos
41
CMU SCS
“Mobile
Call Graphs: Beyond Power Law
and Lognormal Distributions”
KDD’ 08
Mukund Seshadri , Sridhar Machiraju,
Ashwin Sridharan, Jean Bolot
Christos Faloutsos, Jure Leskovec
CMU SCS
Dataset
• Who-calls-whom (anonymized)
• Degree distribution, in green
....
+
--....
Observed Data
Observed Data
Power Law Fit
LogNormal Fit
Area S1
Time period T1
1 month
Count
Degree of Mobile-phone call graph
ICDM-LDMTA 2009
C. Faloutsos
43
CMU SCS
A poor fit?
• Power Laws: p(x) ~ x^(-a)
• Lognormal: log(x) is Normal
....
+
--....
Observed Data
Observed Data
Power Law Fit
LogNormal Fit
Area S1
Time period T1
1 month
Count
Degree of Mobile-phone call graph
ICDM-LDMTA 2009
C. Faloutsos
44
CMU SCS
Solution: DPLN
• Double Pareto Log Normal (Reed, 2003)
• 4 parameters: [α,β,ν,τ]
• Linear head and tail.
Area S1, Time Period T1
count
β
α
Degree
ICDM-LDMTA 2009
C. Faloutsos
“Rich get Richer”
BUT
Population lifetimes
NOT identical
45
CMU SCS
Datasets
• Anonymized monthly aggregates of call metrics
per user
– Collected at 4 coverage areas S1,S2,S3,S4
– Collected for two month-long periods T1 and
T2, separated by 6 months
– Total coverage:
• >1 million users
• >10 million calls
• ~7000 square miles.
ICDM-LDMTA 2009
C. Faloutsos
46
CMU SCS
Metrics
• Degree (No. of Call Partners)
• Calls
• Talk Time
Total over a month,
per user
Area S1
Time-period T1
Degree Distribution
ICDM-LDMTA 2009
C. Faloutsos
47
CMU SCS
Persistence –
Across Metric, Time,
Space
Calls
Area S1, Time T1
Partners
Partners
S1, T2
S2, T2
ICDM-LDMTA 2009
C. Faloutsos
48
CMU SCS
Applications
• Outlier detection
– Many un-answered calls => multiples
of 27 sec!
Per-user distribution
of total talk time
(@ T1, S1)
ICDM-LDMTA 2009
C. Faloutsos
49
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
• …
ICDM-LDMTA 2009
C. Faloutsos
50
CMU SCS
Problem: Time evolution
• with Jure Leskovec (CMU ->
Stanford)
• and Jon Kleinberg (Cornell –
sabb. @ CMU)
ICDM-LDMTA 2009
C. Faloutsos
51
CMU SCS
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints
at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
ICDM-LDMTA 2009
C. Faloutsos
52
CMU SCS
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints
at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
ICDM-LDMTA 2009
C. Faloutsos
53
CMU SCS
T.1 Diameter – “Patents”
• Patent citation
network
• 25 years of data
diameter
time [years]
ICDM-LDMTA 2009
C. Faloutsos
54
CMU SCS
T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
ICDM-LDMTA 2009
C. Faloutsos
55
CMU SCS
T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled!
– But obeying the ``Densification Power Law’’
ICDM-LDMTA 2009
C. Faloutsos
56
CMU SCS
T.2 Densification – Patent
Citations
• Citations among
patents granted E(t)
• 1999
1.66
– 2.9 million nodes
– 16.5 million
edges
• Each year is a
datapoint
ICDM-LDMTA 2009
N(t)
C. Faloutsos
57
CMU SCS
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
– Weighted graphs
– Time evolving graphs
• Problem#2: Generators
• …
ICDM-LDMTA 2009
C. Faloutsos
58
CMU SCS
More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos
Weighted Graphs and Disconnected
Components: Patterns and a Generator.
SIG-KDD 2008
ICDM-LDMTA 2009
C. Faloutsos
59
CMU SCS
Observation T.3: NLCC behavior
Q: How do NLCC’s emerge and join with
the GCC?
(``NLCC’’ = non-largest conn. components)
– Do they continue to grow in size?
– or do they shrink?
– or stabilize?
ICDM-LDMTA 2009
C. Faloutsos
60
CMU SCS
Observation T.4: NLCC behavior
• After the gelling point, the GCC takes off, but
NLCC’s remain ~constant (actually, oscillate).
IMDB
CC size
Time-stamp
ICDM-LDMTA 2009
C. Faloutsos
61
CMU SCS
T.5 How do new edges appear?
[LBKT’08] Microscopic Evolution of
Social Networks
Jure Leskovec, Lars Backstrom, Ravi
Kumar, Andrew Tomkins.
(ACM KDD), 2008.
ICDM-LDMTA 2009
C. Faloutsos
62
CMU SCS
T.5 How do edges appear in time?
[LBKT’08]
Edge gap δ(d):
inter-arrival
time between
dth and d+1st
edge
d
What is the PDF of d?
Poisson?
ICDM-LDMTA 2009
C. Faloutsos
63
CMU SCS
T.5 How do edges appear in time?
[LBKT’08]
LinkedIn
Edge gap δ(d):
inter-arrival
time between
dth and d+1st
edge

pg (d (d );  ,  )  d (d ) e
ICDM-LDMTA 2009
C. Faloutsos
 d ( d )
64
CMU SCS
Similarly for Blogs
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU->Stanford)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]
ICDM-LDMTA 2009
C. Faloutsos
65
CMU SCS
T.6 : popularity over time
# in links
1
2
3
lag: days after post
Post popularity drops-off – exponentially?
@t
@t + lag
ICDM-LDMTA 2009
C. Faloutsos
66
CMU SCS
T.6 : popularity over time
# in links
(log)
1
2
3
days after post
(log)
Post popularity drops-off – exponentially?
POWER LAW!
Exponent?
ICDM-LDMTA 2009
C. Faloutsos
67
CMU SCS
T.6 : popularity over time
# in links
(log)
-1.6
1
2
3
days after post
(log)
Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6
• close to -1.5: Barabasi’s stack model
• and like the zero-crossings of a random walk
ICDM-LDMTA 2009
C. Faloutsos
68
CMU SCS
Outline
•
•
•
•
•
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Generators
Problem#3: Scalability
Conclusions
ICDM-LDMTA 2009
C. Faloutsos
69
CMU SCS
Problem#2: Generation
Goal: Generate a realistic sequence of graphs
that
1. obey all the patterns
•
•
including weights on the edges and
timestamps on the edges
2. needs no randomness
3. needs no parameters (or as few as
possible)
ICDM-LDMTA 2009
C. Faloutsos
70
CMU SCS
PaC generator
• Why do we need generators?
– What-if scenarios
– How to attract/retain customers/members
– Stress-testing algorithms, when real
graphs are unavailable (eg., due to
privacy, national/corporate security, etc)
ICDM-LDMTA 2009
C. Faloutsos
71
CMU SCS
(NUMEROUS earlier generators)
•
•
•
•
•
•
preferential attachment [Barabasi+]
copying model [Kleinberg+]
winner does not take all [Giles+]
Kronecker graphs
...
(survey: [Chakrabarti+, ’06])
ICDM-LDMTA 2009
C. Faloutsos
72
CMU SCS
Problem#2: Generation
Goal: Generate a realistic sequence of graphs that
1. obey all the patterns
•
•
including weights on the edges and
timestamps on the edges
2. needs no randomness
3. needs no parameters (or as few as possible)
Proposed ‘PaC’ model – idea:
• design ‘agents’, who try to max utility of
contacts
ICDM-LDMTA 2009
C. Faloutsos
73
CMU SCS
PaC Model
• People benefit from calling each other.
• A Pay and Call game = PaC Model
• The payoffs are measured as “emotional
dollars”.
Friendliness 0<=Fi<=1
Fi  0
Fi  1
capital
agent
Prob pl to stay in the game
ICDM-LDMTA 2009
C. Faloutsos
74
CMU SCS
Outline of Agent Behavior
• Step 1: decide to stay (PL)
• Step 2: if stay, call the
most profitable person(s)
– Existing friend (‘exploit’)
– Stranger (‘explore’) or ask
for recommendation (if
available) to maximize
benefits
ICDM-LDMTA 2009
C. Faloutsos
Exponential lifetime
Rich get richer
Closing Triangle
75
CMU SCS
Validation of PaC
• Ran 35 simulations
• 100,000 agents per simulation
• Variation of the parameters does not change
the shape of the distribution
ICDM-LDMTA 2009
C. Faloutsos
77
CMU SCS
Goals of Validation
• G1: Skewed degree/node weight
distribution
• G2: Snapshot Power-Law
• G3: Skewed connected components
distribution
• G4: Clique-Degree Power-Law
• G5: Clique-Participation Law
• G6: Triangle Weight Law
ICDM-LDMTA 2009
C. Faloutsos
78
CMU SCS
Validation of PaC
• G1: Skewed Degree / Node Weight Distribution
Real Network
Synthetic Network
ICDM-LDMTA 2009
C. Faloutsos
79
CMU SCS
Validation of PaC
• G2: Snapshot Power Law [McGlohon, Akoglu,
Faloutsos 08] “more partners, even more
calls”
Real Network
ICDM-LDMTA 2009
Synthetic Network
C. Faloutsos
80
CMU SCS
Validation of PaC
• G3: Skewed distribution of the connected
components
Real Network
ICDM-LDMTA 2009
Synthetic Network
C. Faloutsos
81
CMU SCS
Validation of PaC
• G4: Clique Degree Power Law
Real Network
ICDM-LDMTA 2009
Synthetic Network
C. Faloutsos
82
CMU SCS
Validation of PaC
• G5: Clique Participation Law
Real Network
ICDM-LDMTA 2009
Synthetic Network
C. Faloutsos
83
CMU SCS
Validation of PaC
• G6: Triangle Weight Law
Real Network
Synthetic Network
ICDM-LDMTA 2009
C. Faloutsos
84
CMU SCS
Validation of PaC
G1: Skewed degree/node weight
distribution
G2: Snapshot Power-Law
G3: Skewed connected components
distribution
G4: Clique-Degree Power-Law
G5: Clique-Participation Law
G6: Triangle Weight Law
ICDM-LDMTA 2009
C. Faloutsos
85
CMU SCS
Outline
•
•
•
•
•
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Generators
Problem#3: Scalability
Conclusions
ICDM-LDMTA 2009
C. Faloutsos
86
CMU SCS
Scalability
• How about if graph/tensor does not fit in
core?
• How about handling huge graphs?
ICDM-LDMTA 2009
C. Faloutsos
87
CMU SCS
Scalability
• How about if graph/tensor does not fit in
core?
• [‘MET’: Kolda, Sun, ICMD’08, best paper
award]
• How about handling huge graphs?
ICDM-LDMTA 2009
C. Faloutsos
88
CMU SCS
Scalability
• Google: > 450,000 processors in clusters of ~2000
processors each [Barroso, Dean, Hölzle, “Web Search for
a Planet: The Google Cluster Architecture” IEEE Micro
2003]
•
•
•
•
Yahoo: 5Pb of data [Fayyad, KDD’07]
Problem: machine failures, on a daily basis
How to parallelize data mining tasks, then?
A: map/reduce – hadoop (open-source clone)
http://hadoop.apache.org/
ICDM-LDMTA 2009
C. Faloutsos
89
CMU SCS
details
2’ intro to hadoop
• master-slave architecture; n-way replication
(default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)
• e.g, find histogram of word frequency
– compute local histograms
– then merge into global histogram
select course-id, count(*)
from ENROLLMENT
group by course-id
ICDM-LDMTA 2009
C. Faloutsos
90
CMU SCS
details
2’ intro to hadoop
• master-slave architecture; n-way replication
(default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)
• e.g, find histogram of word frequency
– compute local histograms
– then merge into global histogram
select course-id, count(*)
from ENROLLMENT
group by course-id
ICDM-LDMTA 2009
C. Faloutsos
reduce
map
91
CMU SCS
details
User
Program
Input Data
(on HDFS)
Split 0 read
Split 1
Split 2
fork
fork
fork
assign
map
Master
assign
reduce
Mapper
Mapper
Reducer
local
write
Reducer
Mapper
write
Output
File 0
Output
File 1
remote read,
sort
By default: 3-way replication;
Late/dead machines: ignored, transparently (!)
ICDM-LDMTA 2009
C. Faloutsos
92
CMU SCS
PEGASUS: A Peta-Scale Graph Mining
System, ICDM’09
U Kang,
ICDM-LDMTA 2009
Charalampos Tsourakakis, C. Faloutsos
C. Faloutsos
93
CMU SCS
Analysis of Real Networks
YahooWeb: 1.4B nodes and 6.6B edges, 120Gb
(largest public graph analyzed)
M45 (Y!):
• 1.5Pb disk
• 3Tb RAM
• 4000 cores
• among top 500
the PageRank distribution: power-law
ICDM-LDMTA 2009
C. Faloutsos
94
CMU SCS
Analysis of Real Networks
YahooWeb: 1.4 B nodes and 6.6 B edges Connected components:
count
size
Spikes?
ICDM-LDMTA 2009
C. Faloutsos
95
CMU SCS
Analysis of Real Networks
YahooWeb: 1.4 B nodes and 6.6 B edges Connected components:
count
size
Spikes: due to replications of pages.
ICDM-LDMTA 2009
C. Faloutsos
96
CMU SCS
details
LinkedIn: 7.5M nodes and 58M edges
Evolution of connected components of LinkedIn:
stable, *after* the gelling point (@2004)
ICDM-LDMTA 2009
C. Faloutsos
97
CMU SCS
OVERALL CONCLUSIONS –
low level:
• Several new patterns (fortification, trianglelaws, etc)
• New generator: ‘PaC’ (agent based)
• Scalability / cloud computing -> PetaBytes
ICDM-LDMTA 2009
C. Faloutsos
98
CMU SCS
OVERALL CONCLUSIONS –
high level
• good news: unprecedented opportunities (huge
datasets, available on-line)
• better news: cross-disciplinary work:
– DB/sys -> scalability
– ML/stat -> tools
– Econ, Soc -> experience; what questions to ask
– phys, math -> phase transitions, tensors, eigenvalues
– …
ICDM-LDMTA 2009
C. Faloutsos
99
CMU SCS
References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu
Pan Fast Random Walk with Restart and Its
Applications ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos Center-Piece
Subgraphs: Problem Definition and Fast
Solutions, KDD 2006, Philadelphia, PA
• T. G. Kolda and J. Sun. Scalable Tensor
Decompositions for Multi-aspect Data Mining. In:
ICDM 2008, pp. 363-372, December 2008.
ICDM-LDMTA 2009
C. Faloutsos
100
CMU SCS
References
• Jure Leskovec, Jon Kleinberg and Christos
Faloutsos Graphs over Time: Densification Laws,
Shrinking Diameters and Possible Explanations
KDD 2005, Chicago, IL. ("Best Research Paper"
award).
• Jure Leskovec, Deepayan Chakrabarti, Jon
Kleinberg, Christos Faloutsos Realistic,
Mathematically Tractable Graph Generation and
Evolution, Using Kronecker Multiplication
(ECML/PKDD 2005), Porto, Portugal, 2005.
ICDM-LDMTA 2009
C. Faloutsos
101
CMU SCS
References
• Jure Leskovec and Christos Faloutsos, Scalable
Modeling of Real Graphs using Kronecker
Multiplication, ICML 2007, Corvallis, OR, USA
• Shashank Pandit, Duen Horng (Polo) Chau,
Samuel Wang and Christos Faloutsos NetProbe: A
Fast and Scalable System for Fraud Detection in
Online Auction Networks WWW 2007, Banff,
Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos
Beyond Streams and Graphs: Dynamic Tensor
Analysis, KDD 2006, Philadelphia, PA
ICDM-LDMTA 2009
C. Faloutsos
102
CMU SCS
References
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos
Faloutsos. Less is More: Compact Matrix
Decomposition for Large Sparse Graphs, SDM,
Minneapolis, Minnesota, Apr 2007. [pdf]
• Jimeng Sun, Spiros Papadimitriou, Philip S. Yu,
and Christos Faloutsos, GraphScope: Parameterfree Mining of Large Time-evolving Graphs ACM
SIGKDD Conference, San Jose, CA, August 2007
ICDM-LDMTA 2009
C. Faloutsos
103
CMU SCS
Contact info:
www. cs.cmu.edu /~christos
Thanks again to the organizers:
Jimeng Sun, Yan Liu
Funding sources:
• NSF IIS-0705359, IIS-0534205, DBI-0640543
• LLNL, PITA, IBM, INTEL, HP
ICDM-LDMTA 2009
C. Faloutsos
104
Download