Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Christos Faloutsos

advertisement
CMU SCS
Mining Large Graphs:
Patterns, Anomalies, and Fraud
Detection
Christos Faloutsos
CMU
CMU SCS
Thank you!
• Dr. Ping Luo
CAS
(c) 2015, C. Faloutsos
2
CMU SCS
Roadmap
• Introduction – Motivation
– Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
3
CMU SCS
Graphs - why should we care?
~1B nodes (web sites)
~6B edges (http links)
‘YahooWeb graph’
CAS
(c) 2015, C. Faloutsos
4
CMU SCS
Graphs - why should we care?
~1B nodes (web sites)
~6B edges (http links)
‘YahooWeb graph’
U Kang, Jay-Yoon Lee, Danai Koutra, and Christos
Faloutsos.
Net-Ray: Visualizing
and Mining Billion-Scale
CAS
(c) 2015, C. Faloutsos
5
Graphs PAKDD 2014, Tainan, Taiwan.
CMU SCS
Graphs - why should we care?
>$10B; ~1B users
CAS
(c) 2015, C. Faloutsos
6
CMU SCS
Graphs - why should we care?
Food Web
[Martinez ’91]
Internet Map
[lumeta.com]
CAS
(c) 2015, C. Faloutsos
7
CMU SCS
Graphs - why should we care?
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and
anomaly detection
• Recommendation systems
• ....
• Many-to-many db relationship -> graph
CAS
(c) 2015, C. Faloutsos
8
CMU SCS
Motivating problems
• P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs /
tensors
destination
time
CAS
(c) 2015, C. Faloutsos
9
CMU SCS
Motivating problems
• P1: patterns? Fraud detection?
Patterns
anomalies
• P2: patterns in time-evolving graphs /
tensors
destination
time
CAS
(c) 2015, C. Faloutsos
10
CMU SCS
Roadmap
• Introduction – Motivation
– Why study (big) graphs?
• Part#1: Patterns & fraud detection
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
11
CMU SCS
Part 1:
Patterns, &
fraud detection
CAS
(c) 2015, C. Faloutsos
12
CMU SCS
Laws and patterns
• Q1: Are real graphs random?
CAS
(c) 2015, C. Faloutsos
13
CMU SCS
Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
– Diameter (‘6 degrees’; ‘Kevin Bacon’)
– in- and out- degree distributions
– other (surprising) patterns
• So, let’s look at the data
CAS
(c) 2015, C. Faloutsos
14
CMU SCS
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3
SIGCOMM99]
internet domains
att.com
log(degree)
ibm.com
log(rank)
CAS
(c) 2015, C. Faloutsos
15
CMU SCS
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3
SIGCOMM99]
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
16
CMU SCS
Solution# S.1
• Q: So what?
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
17
CMU SCS
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs:
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
18
CMU SCS
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: 100^2 * N= 10 Trillion
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
19
CMU SCS
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: 100^2 * N= 10 Trillion
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
20
CMU SCS
Gaussian trap
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2
internet domains
att.com
~0.8PB ->
a data center(!)
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
DCO @ CMU
21
CMU SCS
Gaussian trap
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2
internet domains
att.com
~0.8PB ->
a data center(!)
log(degree)
ibm.com
-0.82
log(rank)
CAS
(c) 2015, C. Faloutsos
22
CMU SCS
Observation – big-data:
• O(N2) algorithms are ~intractable - N=1B
• N2 seconds = 31B years (>2x age of
universe)
1B
1B
CAS
(c) 2015, C. Faloutsos
23
CMU SCS
Observation – big-data:
• O(N2) algorithms are ~intractable - N=1B
31M
• N2 seconds = 31B years
• 1,000 machines
1B
CAS
(c) 2015, C. Faloutsos
24
CMU SCS
Observation – big-data:
• O(N2) algorithms are ~intractable - N=1B
31K
• N2 seconds = 31B years
• 1M machines
1B
CAS
(c) 2015, C. Faloutsos
25
CMU SCS
Observation – big-data:
• O(N2) algorithms are ~intractable - N=1B
3
• N2 seconds = 31B years
• 10B machines ~ $10Trillion
1B
CAS
(c) 2015, C. Faloutsos
26
CMU SCS
Observation – big-data:
• O(N2) algorithms are ~intractable - N=1B
And parallelism might not help
3
• N2 seconds = 31B years
• 10B machines ~ $10Trillion
1B
CAS
(c) 2015, C. Faloutsos
27
CMU SCS
Solution# S.2: Eigen Exponent E
Eigenvalue
Exponent = slope
E = -0.48
Ax=lx
Rank of decreasing eigenvalue
• A2: power law in the eigenvalues of the adjacency
matrix (‘eig()’)
CAS
(c) 2015, C. Faloutsos
28
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns: Degree; Triangles
– Anomaly/fraud detection
– Graph understanding
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
29
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
CAS
(c) 2015, C. Faloutsos
30
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles ?
CAS
(c) 2015, C. Faloutsos
31
CMU SCS
Triangle Law: #S.3
[Tsourakakis ICDM 2008]
Reuters
Epinions
CAS
SN
X-axis: degree
Y-axis: mean # triangles
n friends -> ~n1.6 triangles
(c) 2015, C. Faloutsos
32
CMU SCS
details
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos) – O(dmax2)
Q: Can we do that quickly?
A:
CAS
(c) 2015, C. Faloutsos
33
CMU SCS
details
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos) – O(dmax2)
Q: Can we do that quickly?
Ax=lx
A: Yes!
#triangles = 1/6 Sum ( li3 )
(and, because of skewness (S2) ,
we only need the top few eigenvalues! - O(E)
CAS
(c) 2015, C. Faloutsos
34
CMU SCS
Triangle counting for large graphs?
?
?
?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
CAS
(c) 2015, C. Faloutsos
35
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
CAS
(c) 2015, C. Faloutsos
36
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
CAS
(c) 2015, C. Faloutsos
37
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
CAS
(c) 2015, C. Faloutsos
38
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
CAS
(c) 2015, C. Faloutsos
39
CMU SCS
MORE Graph Patterns
✔
✔
✔
RTG:CAS
A Recursive Realistic
Graph Generator using Random
(c) 2015, C. Faloutsos
40
Typing Leman Akoglu and Christos Faloutsos. PKDD’09.
CMU SCS
MORE Graph Patterns
• Mary McGlohon, Leman Akoglu, Christos
Faloutsos. Statistical Properties of Social
Networks. in "Social Network Data Analytics” (Ed.:
Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos,
Graph Mining: Laws, Tools, and Case Studies Oct.
2012, Morgan Claypool.
CAS
(c) 2015, C. Faloutsos
41
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
Patterns
• Spectral methods (‘fBox’)
• Belief Propagation
anomalies
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
42
CMU SCS
Fraud
• Given
– Who ‘likes’ what page, and
when
• Find
– Suspicious users and suspicious
products
CopyCatch: Stopping Group Attacks by Spotting
Lockstep Behavior in Social Networks, Alex Beutel,
Wanhong
Xu, Venkatesan
Guruswami, Christopher Palow,
CAS
(c) 2015, C. Faloutsos
43
Christos Faloutsos WWW, 2013.
CMU SCS
Fraud
• Given
– Who ‘likes’ what page, and
when
• Find
– Suspicious users and suspicious
products
Likes
CopyCatch: Stopping Group Attacks by Spotting
Lockstep Behavior in Social Networks, Alex Beutel,
Wanhong
Xu, Venkatesan
Guruswami, Christopher Palow,
CAS
(c) 2015, C. Faloutsos
44
Christos Faloutsos WWW, 2013.
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
CAS
(c) 2015, C. Faloutsos
45
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
CAS
(c) 2015, C. Faloutsos
46
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
CAS
(c) 2015,Lockstep
C. Faloutsos
Suspicious
Behavior
47
CMU SCS
MapReduce Overview
▪
Use Hadoop to search for
many clusters in parallel:
1.
Start with randomly seed
2.
Update set of Pages and
center Like times for
each cluster
3.
Repeat until
convergence
Likes
CAS
(c) 2015, C. Faloutsos
48
CMU SCS
Deployment at Facebook
CopyCatch runs regularly (along with many other
security mechanisms, and a large Site Integrity
team)
3 months of CopyCatch @ Facebook
#users
caught
CAS
Number of users caught
▪
08/25 09/08 09/22 10/06 10/20 11/03 11/17 12/01
(c)of2015,
C. Faloutsos
Date
CopyCatch
run
time
49
CMU SCS
Deployment at Facebook
Fake acct
Manually labeled 22 randomly selected
clusters from February 2013
CAS
(c) 2015, C. Faloutsos
50
CMU SCS
Deployment at Facebook
Fake acct
Most clusters (77%) come from
real but compromised users
Manually labeled 22 randomly selected
clusters from February 2013
CAS
(c) 2015, C. Faloutsos
51
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods (‘fBox’)
• Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
52
CMU SCS
Problem: Social Network Link Fraud
Target: find “stealthy” attackers missed by other algorithms
Clique
41.7M nodes
1.5B edges
Bipartite
core
CAS
(c) 2015, C. Faloutsos
53
CMU SCS
Problem: Social Network Link Fraud
Target: find “stealthy” attackers missed by other algorithms
Takeaway: use reconstruction error
between true/latent representation!
Neil Shah, Alex Beutel, Brian Gallagher and Christos
Faloutsos.
Spotting Suspicious
Link Behavior with fBox: An
CAS
(c) 2015, C. Faloutsos
54
Adversarial Perspective. ICDM 2014, Shenzhen, China.
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods (‘fBox’, suspiciousness)
• Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
55
CMU SCS
Suspicious Patterns in Event Data
?
2-modes
?
?
n-modes
A General Suspiciousness Metric for Dense Blocks in
Multimodal
Data, Meng Jiang,
Alex Beutel, Peng Cui, Bryan
CAS
(c) 2015, C. Faloutsos
56
Hooi, Shiqiang Yang, and Christos Faloutsos, ICDM, 2015.
CMU SCS
ICDM 2015
Suspicious Patterns in Event Data
Which is more suspicious?
20,000 Users
Retweeting same 20 tweets
6 times each
All in 10 hours
CAS
vs.
225 Users
Retweeting same 1 tweet
15 times each
All in 3 hours
All from 2 IP addresses
(c) 2015, C. Faloutsos
Answer: volume
* DKL(p|| pbackground)
57
ymous CMU SCS
Anonymous
Anonymous
mous
ymous
Anonymous
Anonymous
Anonymous
Anonymous
ICDM 2015
CAS
..
.
15:37
l
l
:
l
l
l
:
l
:
:
l
l
:
l
:
l
l
:
21:13
11:04
:
:
:
:
l
l
l
10:37
:
:
l
l
:
:
l
l
:
l
l
l
10:32
10:24
···
l
l
l
:
:
:
l
10:19
27,313
R et weet s
“ John123”
“ Joe456”
“ Job789”
“ Jack135”
“ Jill246”
“ Jane314”
“ Jay357”
10:14
200 M inut es
225 U ser s
tweets from
m 500 user s
g and all in
t tr y to find
t no method
dense blocks
dr aw human
ng, typically
notewor thy
hat we show
ed answer to
st of axioms
e propose an
and is fast to
o spot dense
ness” ) or der.
it impr oves
nds retweetspanning 0.3
10:09
Suspicious Patterns in Event Data
...
l
l
..
.
Fig. 1. Density in multiple modes is suspicious. Left: A dense block of 225
users on Tencent Weibo (Chinese Twitter) retweeting one tweet 27,313 times
from 2 IP addresses over 200 minutes. Right: magnification of a subset of this
block. l and : indicate the two IP addresses used. Notice how synchronized
the behavior is, across several modes (IP-address, user-id, timestamp).
Retweeting: “Galaxy Note Dream Project:
Happy Happy Life Traveling the World”
I nfor mal Problem 1 (Suspiciousness scor e) Given a K mode dataset (tensor) X , with counts of events (that are
non-negative integer values), and two subtensors Y1 and Y2 ,
which is more suspicious and worthy of further investigation?
Why multimodal data (tensor ): Graphs and social networks
(c) 2015, C. Faloutsos
have attracted huge interest
- and they are perfectly modeled as
58
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods (‘fBox’)
• Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
59
CMU SCS
E-bay Fraud detection
w/ Polo Chau &
Shashank Pandit, CMU
[www’07]
CAS
(c) 2015, C. Faloutsos
60
CMU SCS
E-bay Fraud detection
CAS
(c) 2015, C. Faloutsos
61
CMU SCS
E-bay Fraud detection
CAS
(c) 2015, C. Faloutsos
62
CMU SCS
E-bay Fraud detection - NetProbe
CAS
(c) 2015, C. Faloutsos
63
CMU SCS
Popular press
And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of
your code?’)
CAS
(c) 2015, C. Faloutsos
64
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods (‘fBox’)
• Belief Propagation; fast computation & unification
• Part#2: time-evolving graphs; tensors
• Conclusions
CAS
(c) 2015, C. Faloutsos
76
CMU SCS
Unifying Guilt-by-Association Approaches:
Theorems and Fast Algorithms
Danai Koutra
U Kang
Hsing-Kuo Kenneth Pao
Tai-You Ke
Duen Horng (Polo) Chau
Christos Faloutsos
ECML PKDD, 5-9 September 2011, Athens, Greece
CMU SCS
Problem Definition:
GBA techniques
Given: Graph; &
few labeled nodes
Find: labels of rest
(assuming network
effects)
CAS
(c) 2015, C. Faloutsos
78
CMU SCS
Are they related?
• RWR (Random Walk with Restarts)
– google’s pageRank (‘if my friends are
important, I’m important, too’)
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you
believe about them
CAS
(c) 2015, C. Faloutsos
79
CMU SCS
Are they related?
YES!
• RWR (Random Walk with Restarts)
– google’s pageRank (‘if my friends are
important, I’m important, too’)
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you
believe about them
CAS
(c) 2015, C. Faloutsos
80
CMU SCS
Correspondence of Methods
Method
Matrix
RWR
SSL
FABP
[I – c AD-1]
[I + a(D - A)]
[I + a D - c’A]
CAS
1
1
1
Unknow
known
n
×
x
= (1-c)y
×
x
=
y
×
bh
=
φh
d1
0 1 0
d2
1 0 1
d3 0 1 0
adjacency
matrix
(c) 2015, C. Faloutsos
?
final
labels/
beliefs
0
1
1
prior
labels/
beliefs
81
CMU SCS
runtime (min)
Results: Scalability
# of edges (Kronecker
graphs)
FABP is linear on the number of edges.
CAS
(c) 2015, C. Faloutsos
82
CMU SCS
% accuracy
Results: Parallelism
runtime (min)
FABP ~2x faster
& wins/ties on accuracy.
CAS
(c) 2015, C. Faloutsos
83
CMU SCS
Summary of Part#1
• *many* patterns in real graphs
– Power-laws everywhere
– Gaussian trap
• Avg << Max
– Long (and growing) list of tools for
anomaly/fraud detection
Patterns
CAS
anomalies
(c) 2015, C. Faloutsos
84
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– P2.2: with side information (‘coupled’ M.T.F.)
– (Speed)
• Conclusions
CAS
(c) 2015, C. Faloutsos
85
CMU SCS
Part 2:
Time evolving
graphs; tensors
CAS
(c) 2015, C. Faloutsos
86
CMU SCS
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
smith
CAS
(c) 2015, C. Faloutsos
87
CMU SCS
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
CAS
(c) 2015, C. Faloutsos
88
CMU SCS
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
Tue
Mon
CAS
(c) 2015, C. Faloutsos
89
CMU SCS
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
caller
callee
CAS
(c) 2015, C. Faloutsos
90
CMU SCS
Graphs over time -> tensors!
• Problem #2.1’:
– Given author-keyword-date
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
author
keyword
CAS
(c) 2015, C. Faloutsos
91
CMU SCS
Graphs over time -> tensors!
• Problem #2.1’’:
– Given subject – verb – object facts
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
subject
object
CAS
(c) 2015, C. Faloutsos
92
CMU SCS
Graphs over time -> tensors!
• Problem #2.1’’’:
– Given <triplets>
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
(and 4, 5, etc modes)
mode1
mode2
CAS
(c) 2015, C. Faloutsos
93
CMU SCS
Graphs & side info
• Problem #2.2: coupled (eg., side info)
– Given subject – verb – object facts
• And voxel-activity for each subject-word
– Find patterns / anomalies
subject
object
CAS
`apple tastes sweet’
fMRI voxel activity
(c) 2015, C. Faloutsos
94
CMU SCS
Graphs & side info
• Problem #2.2: coupled (eg., side info)
– Given subject – verb – object facts
• And voxel-activity for each subject-word
– Find patterns / anomalies
‘apple’
‘sweet’
CAS
`apple tastes sweet’
‘apple’
fMRI voxel activity
(c) 2015, C. Faloutsos
95
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– P2.2: with side information (‘coupled’ M.T.F.)
– Speed
• Conclusions
CAS
(c) 2015, C. Faloutsos
96
CMU SCS
Answer to both: tensor
factorization
• Recall: (SVD) matrix factorization: finds
blocks
M
products
N
users
CAS
‘meat-eaters’ ‘vegetarians’
‘kids’
‘steaks’
‘cookies’
‘plants’
~
+
(c) 2015, C. Faloutsos
+
97
CMU SCS
Answer to both: tensor
factorization
• PARAFAC decomposition
artists
politicians
subject
=
+
athletes
+
object
CAS
(c) 2015, C. Faloutsos
98
CMU SCS
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
– 4M x 15 days
??
??
caller
=
+
??
+
callee
CAS
(c) 2015, C. Faloutsos
99
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
1 caller
5 receivers
4 days of activity
~200 calls to EACH receiver on EACH day!
CAS
(c) 2015, C. Faloutsos
100
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
1 caller
5 receivers
4 days of activity
~200 calls to EACH receiver on EACH day!
CAS
(c) 2015, C. Faloutsos
101
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann,
Christos Faloutsos, Prithwish Basu, Ananthram Swami,
Evangelos Papalexakis, Danai Koutra.
Com2: Fast
~200 calls to EACH receiver on EACH day!
Automatic
Discovery of (c) Temporal
(Comet) Communities.
CAS
2015, C. Faloutsos
102
PAKDD 2014, Tainan, Taiwan.
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network
– P2.2: Discoveries in neuro-semantics
– (Speed)
• Conclusions
CAS
(c) 2015, C. Faloutsos
103
CMU SCS
Coupled Matrix-Tensor Factorization
(CMTF)
c1
b1
Y
X
a1
d1
a1
CAS
cF
bF
+
aF
dF
+
aF
(c) 2015, C. Faloutsos
104
CMU SCS
Neuro-semantics
• Brain Scan Data*
• 9 persons
• 60 nouns
• Questions
• 218 questions
• ‘is it alive?’, ‘can
you eat it?’
*Mitchell et al. Predicting human brain activity
associated
with the meanings
of nouns. Science,2008.
CAS
(c) 2015, C. Faloutsos
105
Data@ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
CMU SCS
Neuro-semantics
• Brain Scan Data*
• 9 persons
• 60 nouns
• Questions
• 218 questions
• ‘is it alive?’, ‘can
you eat it?’
Patterns?
CAS
(c) 2015, C. Faloutsos
106
CMU SCS
Neuro-semantics
• Brain Scan Data*
• 9 persons
• 60 nouns
• Questions
• 218 questions
• ‘is it alive?’, ‘can
you eat it?’
Patterns?
questions
airplane
CAS
nouns
(c) 2015, C. Faloutsos
…
dog
voxels
107
CMU SCS
Neuro-semantics
CAS
(c) 2015, C. Faloutsos
=
108
CMU SCS
Neuro-semantics
=
Small items ->
Premotor cortex
CAS
(c) 2015, C. Faloutsos
109
CMU SCS
Neuro-semantics
Small items ->
Premotor cortex
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos,
Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy,
Turbo-SMT:
Accelerating(c) 2015,
Coupled
Sparse Matrix-Tensor
CAS
C. Faloutsos
110
Factorizations by 200x, SDM 2014
CMU SCS
Part 2: Conclusions
• Time-evolving / heterogeneous graphs ->
tensors
• PARAFAC finds patterns
• (GigaTensor/HaTen2 -> fast & scalable)
=
CAS
(c) 2015, C. Faloutsos
111
CMU SCS
Roadmap
• Introduction – Motivation
– Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Acknowledgements and Conclusions
CAS
(c) 2015, C. Faloutsos
112
CMU SCS
Thanks
Disclaimer: All opinions are mine; not necessarily reflecting
the opinions of the funding agencies
Thanks to: NSF IIS-1247489, IIS-0705359, IIS0534205, CTA-INARC; Yahoo (M45), LLNL, IBM,
CAS
(c) 2015, C. Faloutsos
113
SPRINT, Google, INTEL, HP, iLab
CMU SCS
Cast
Akoglu,
Leman
Kang, U
CAS
Araujo,
Miguel
Beutel,
Alex
Koutra, Papalexakis,
Danai
Vagelis
(c) 2015, C. Faloutsos
Chau,
Polo
Shah,
Neil
Hooi,
Bryan
Song,
Hyun Ah
114
CMU SCS
CONCLUSION#1 – Big data
• Patterns
Anomalies
• Large datasets reveal patterns/outliers that
are invisible otherwise
CAS
(c) 2015, C. Faloutsos
115
CMU SCS
CONCLUSION#2 – tensors
• powerful tool
=
CAS
(c) 2015, C. Faloutsos
116
CMU SCS
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S004
49ED1V01Y201209DMK006
CAS
(c) 2015, C. Faloutsos
117
CMU SCS
TAKE HOME MESSAGE:
Cross-disciplinarity
=
CAS
(c) 2015, C. Faloutsos
118
CMU SCS
TAKE
HOME you!
MESSAGE:
Thank
Cross-disciplinarity
=
CAS
(c) 2015, C. Faloutsos
119
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network
– P2.2: Discoveries in neuro-semantics
– Scalability & Speed
• Conclusions
CAS
(c) 2015, C. Faloutsos
121
CMU SCS
Q: spilling to the disk?
Reminder: tensor (eg., Subject-verb-object)
144M non-zeros
48M
NELL (Never Ending
Language Learner)
@CMU
26M
26M
CAS
(c) 2015, C. Faloutsos
122
CMU SCS
A: GigaTensor
Reminder: tensor (eg., Subject-verb-object)
26M x 48M x 26M, 144M non-zeros
NELL (Never Ending
Language
Learner)@CMU
U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos
Faloutsos,
GigaTensor: Scaling
Tensor Analysis Up By 100
CAS
(c) 2015, C. Faloutsos
123
Times - Algorithms and Discoveries, KDD’12
CMU SCS
A: GigaTensor
• GigaTensor solves 100x larger problem
(K)
(J)
GigaTensor
100x
Out of
Memory
CAS
(c) 2015, C. Faloutsos
(I)
Number of
nonzero
= I / 50
124
Download