Patterns, Anomalies, and Fraud Detection in Large Graphs Christos Faloutsos

advertisement
CMU SCS
Patterns, Anomalies, and
Fraud Detection in Large
Graphs
Christos Faloutsos
CMU
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– tools
• Part#2: time-evolving graphs
– Patterns
– Tools
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
2
CMU SCS
Graphs - why should we care?
• Financial (user-account)
• Social networks
• computer network security: email/IP traffic and
anomaly detection
• Recommendation systems
• ....
• Many-to-many db relationship -> graph
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
3
CMU SCS
Motivating problems
• P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs /
tensors
account
time
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
4
CMU SCS
Motivating problems
• P1: patterns? Fraud detection?
Patterns
anomalies
• P2: patterns in time-evolving graphs /
tensors
account
time
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
5
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– tools
• Part#2: time-evolving graphs
– Patterns
– Tools
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
6
CMU SCS
Part 1:
Static graphs
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
7
CMU SCS
Laws and patterns
• Q1: Are real graphs random?
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
8
CMU SCS
Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
– Diameter (‘6 degrees’; ‘Kevin Bacon’)
– in- and out- degree distributions
– other (surprising) patterns
• So, let’s look at the data
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
9
CMU SCS
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3
SIGCOMM99]
internet domains
att.com
log(degree)
ibm.com
log(rank)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
10
CMU SCS
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3
SIGCOMM99; + Siganos]
internet domains
att.com
log(degree)
ibm.com
-0.82
log(rank)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
11
CMU SCS
Solution# S.2: Eigen Exponent E
Eigenvalue
Exponent = slope
E = -0.48
May 2001
Ax=lx
Rank of decreasing eigenvalue
• A2: power law in the eigenvalues of the adjacency
matrix (‘eig()’)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
12
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
13
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles ?
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
14
CMU SCS
Triangle Law: #S.3
[Tsourakakis ICDM 2008]
Reuters
Epinions
Eurlion, Beijing 2015
SN
X-axis: degree
Y-axis: mean # triangles
n friends -> ~n1.6 triangles
(c) 2015, C. Faloutsos
15
CMU SCS
Triangle counting for large graphs?
?
?
?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
16
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
17
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
18
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
19
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
20
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns (binary; weighted)
– tools
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
21
CMU SCS
Observations on weighted
graphs?
• A: yes - even more ‘laws’!
M. McGlohon, L. Akoglu, and C. Faloutsos
Weighted Graphs and Disconnected
Components: Patterns and a Generator.
SIG-KDD 2008
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
22
CMU SCS
Observation W.1: Fortification
Q: How do the weights
of nodes relate to degree?
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
23
CMU SCS
Observation W.1: Fortification
Double the checks,
double the $ ?
$10
$5
$7
‘Reagan’
‘Clinton’
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
24
CMU SCS
Observation W.1: fortification:
Snapshot Power Law
• Weight: super-linear on in-degree
• exponent ‘iw’: 1.01 < iw < 1.26
Orgs-Candidates
Double the checks,
double the $ ?
$10
e.g. John Kerry,
$10M received,
from 1K donors
In-weights
($)
$5
Edges (# donors)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
25
CMU SCS
MORE Graph Patterns
✔
✔
✔
RTG:Eurlion,
A Recursive
Realistic
Graph Generator using Random
Beijing 2015
(c) 2015, C. Faloutsos
26
Typing Leman Akoglu and Christos Faloutsos. PKDD’09.
CMU SCS
MORE Graph Patterns
• Mary McGlohon, Leman Akoglu, Christos
Faloutsos. Statistical Properties of Social
Networks. in "Social Network Data Analytics” (Ed.:
Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos,
Graph Mining: Laws, Tools, and Case Studies Oct.
2012, Morgan Claypool.
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
27
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral
• Label propagation
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
28
CMU SCS
Overview of tools
✔ Triangle-degree
oddBall
eigenSpokes
fBox
netProbe+
tensors
Eurlion, Beijing 2015
Binary /
weights
B
W
W
W
W
W
Timeevolving
(c) 2015, C. Faloutsos
with classlabels
Y
Y
29
CMU SCS
OddBall: Spotting Anomalies
in Weighted Graphs
Leman Akoglu, Mary McGlohon, Christos
Faloutsos
Carnegie Mellon University
School of Computer Science
PAKDD 2010, Hyderabad, India
CMU SCS
Main idea
For each node,
• extract ‘ego-net’ (=1-step-away neighbors)
• Extract features (#edges, total weight, etc
etc)
• Compare with the rest of the population
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
31
CMU SCS
What is an egonet?
egonet
ego
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
32
CMU SCS
Selected Features




Ni: number of neighbors (degree) of ego i
Ei: number of edges in egonet i
Wi: total weight of egonet i
λw,i: principal eigenvalue of the weighted
adjacency matrix of egonet I
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
33
CMU SCS
Near-Clique/Star
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
34
CMU SCS
Near-Clique/Star
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
35
CMU SCS
Near-Clique/Star
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
36
CMU SCS
Near-Clique/Star
Andrew Lewis
(director)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
37
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (EigenSpokes, CopyCatch)
• Label propagation
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
38
CMU SCS
Overview of tools
✔ Triangle-degree
✔ oddBall
eigenSpokes
fBox
netProbe+
tensors
Eurlion, Beijing 2015
Binary /
weights
B
W
W
W
W
W
Timeevolving
(c) 2015, C. Faloutsos
with classlabels
Y
Y
39
CMU SCS
EigenSpokes
B. Aditya Prakash, Mukund Seshadri, Ashwin
Sridharan, Sridhar Machiraju and Christos
Faloutsos: EigenSpokes: Surprising
Patterns and Scalable Community Chipping
in Large Graphs, PAKDD 2010,
Hyderabad, India, 21-24 June 2010.
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
40
CMU SCS
EigenSpokes
• Eigenvectors of adjacency matrix
 equivalent to singular vectors
(symmetric, undirected graph)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
41
CMU SCS
details
EigenSpokes
• Eigenvectors of adjacency matrix
 equivalent to singular vectors
(symmetric, undirected graph)
N
N
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
42
CMU SCS
details
EigenSpokes
• Eigenvectors of adjacency matrix
 equivalent to singular vectors
(symmetric, undirected graph)
N
N
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
43
CMU SCS
details
EigenSpokes
• Eigenvectors of adjacency matrix
 equivalent to singular vectors
(symmetric, undirected graph)
N
N
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
44
CMU SCS
details
EigenSpokes
• Eigenvectors of adjacency matrix
 equivalent to singular vectors
(symmetric, undirected graph)
N
N
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
45
CMU SCS
EigenSpokes
2nd Principal
component
u2
• EE plot:
• Scatter plot of
scores of u1 vs u2
• One would expect
– Many points @
origin
– A few scattered
~randomly
Eurlion, Beijing 2015
u1
1st Principal
component
(c) 2015, C. Faloutsos
46
CMU SCS
EigenSpokes
• EE plot:
• Scatter plot of
scores of u1 vs u2
• One would expect
u2
– Many points @
origin
– A few scattered
~randomly
Eurlion, Beijing 2015
90o
u1
(c) 2015, C. Faloutsos
47
CMU SCS
EigenSpokes - pervasiveness
• Present in mobile social graph
 across time and space
• Patent citation graph
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
48
CMU SCS
EigenSpokes - explanation
Near-cliques, or nearbipartite-cores, loosely
connected
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
49
CMU SCS
EigenSpokes - explanation
Near-cliques, or nearbipartite-cores, loosely
connected
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
50
CMU SCS
EigenSpokes - explanation
Near-cliques, or nearbipartite-cores, loosely
connected
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
51
CMU SCS
EigenSpokes - explanation
Near-cliques, or nearbipartite-cores, loosely
connected
spy plot of top 20 nodes
So what?
 Extract nodes with high
scores
 high connectivity
 Good “communities”
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
52
CMU SCS
Bipartite Communities!
patents from
same inventor(s)
`cut-and-paste’
bibliography!
magnified bipartite community
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
53
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
54
CMU SCS
Fraud
• Given
– Who ‘likes’ what page, and
when
• Find
– Suspicious users and suspicious
products
CopyCatch: Stopping Group Attacks by Spotting
Lockstep Behavior in Social Networks, Alex Beutel,
Wanhong
Xu, Venkatesan
Guruswami, Christopher Palow,
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
55
Christos Faloutsos WWW, 2013.
CMU SCS
Fraud
• Given
– Who ‘likes’ what page, and
when
• Find
– Suspicious users and suspicious
products
Likes
CopyCatch: Stopping Group Attacks by Spotting
Lockstep Behavior in Social Networks, Alex Beutel,
Wanhong
Xu, Venkatesan
Guruswami, Christopher Palow,
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
56
Christos Faloutsos WWW, 2013.
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
57
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
58
CMU SCS
Graph Patterns and Lockstep
Our intuition Behavior
▪
Lockstep behavior: Same Likes, same time
Likes
Eurlion, Beijing 2015
(c) 2015,Lockstep
C. Faloutsos
Suspicious
Behavior
59
CMU SCS
MapReduce Overview
▪
Use Hadoop to search for
many clusters in parallel:
1.
Start with randomly seed
2.
Update set of Pages and
center Like times for
each cluster
3.
Repeat until
convergence
Likes
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
60
CMU SCS
Deployment at Facebook
CopyCatch runs regularly (along with many other
security mechanisms, and a large Site Integrity
team)
3 months of CopyCatch @ Facebook
#users
caught
Eurlion, Beijing 2015
Number of users caught
▪
08/25 09/08 09/22 10/06 10/20 11/03 11/17 12/01
(c)of2015,
C. Faloutsos
Date
CopyCatch
run
time
61
CMU SCS
Deployment at Facebook
Fake acct
Most clusters (77%) come from
real but compromised users
Manually labeled 22 randomly selected
clusters from February 2013
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
62
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (EigenSpokes, CopyCatch, fBox)
• Label propagation
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
63
CMU SCS
Problem: Social Network Link Fraud
Target: find “stealthy” attackers missed by other algorithms
Clique
41.7M nodes
1.5B edges
Bipartite
core
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
64
CMU SCS
Problem: Social Network Link Fraud
Target: find “stealthy” attackers missed by other algorithms
Takeaway: use reconstruction error
between true/latent representation!
Neil Shah, Alex Beutel, Brian Gallagher and Christos
Faloutsos.
Spotting Suspicious
Link Behavior with fBox: An
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
65
Adversarial Perspective. ICDM 2014, Shenzhen, China.
CMU SCS
Overview of tools
✔ Triangle-degree
✔ oddBall
✔ eigenSpokes
✔ fBox
netProbe+
tensors
Eurlion, Beijing 2015
Binary /
weights
B
W
W
W
W
W
Timeevolving
(c) 2015, C. Faloutsos
with classlabels
Y
Y
66
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation (NetProbe, Polonium, Snare)
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
67
CMU SCS
E-bay Fraud detection
w/ Polo Chau &
Shashank Pandit, CMU
[www’07]
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
68
CMU SCS
E-bay Fraud detection
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
69
CMU SCS
E-bay Fraud detection
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
70
CMU SCS
E-bay Fraud detection - NetProbe
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
71
CMU SCS
Popular press
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
72
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation (NetProbe, Polonium, Snare)
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
73
CMU SCS
Polonium: Tera-Scale Graph Mining and
Inference for Malware Detection
SDM 2011, Mesa, Arizona
Polo Chau
Carey Nachenberg
Machine Learning Dept Vice President & Fellow
Jeffrey Wilhelm
Principal Software Engineer
Adam Wright
Prof. Christos Faloutsos
Software Engineer
Computer Science Dept
CMU SCS
Polonium: The Data
60+ terabytes of data anonymously
contributed by participants of worldwide
Norton Community Watch program
50+ million machines
900+ million executable files
Constructed a machine-file bipartite graph
(0.2 TB+)
1 billion nodes (machines and files)
37 billion edges
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
75
CMU SCS
Polonium: Key Ideas
• Use Belief Propagation to propagate
domain knowledge in machine-file graph to
detect malware
• Use “guilt-by-association” (i.e., homophily)
– E.g., files that appear on machines with many
bad files are more likely to be bad
• Scalability: handles 37 billion-edge graph
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
76
CMU SCS
Polonium: One-Interaction Results
Ideal
84.9% True Positive Rate
1% False Positive Rate
True Positive Rate
% of malware
correctly identified
Eurlion, Beijing 2015
False Positive Rate
(c) 2015, C. Faloutsos
77
% of non-malware
wrongly labeled as malware
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation (NetProbe, Polonium, Snare)
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
78
CMU SCS
Network Effect Tools: SNARE
• Some accounts are sort-of-suspicious – how to combine weak
signals?
Before
Inventory
Revenue
L.A.
Acc.
payable
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
79
CMU SCS
Network Effect Tools: SNARE
• Some accounts are sort-of-suspicious – how to combine weak
signals?
Before
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
80
CMU SCS
Network Effect Tools: SNARE
• A: Belief Propagation.
Before
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
81
CMU SCS
Network Effect Tools: SNARE
• A: Belief Propagation.
Before
After
Mary McGlohon, Stephen Bay, Markus G. Anderle, David M.
Steier,
Christos Faloutsos:(c) 2015,
SNARE:
a link analytic system for
Eurlion, Beijing 2015
C. Faloutsos
graph labeling and risk detection. KDD 2009: 1265-1274
82
CMU SCS
Network Effect Tools: SNARE
• Produces improvement over simply using flags
– Up to 6.5 lift
– Improvement especially for low false positive rate
Results for accounts data (ROC Curve)
Ideal
True
positive
rate
Eurlion, Beijing 2015
SNARE
(c) 2015, C. Faloutsos
False positive rate
Baseline (flags
only)
83
CMU SCS
Network Effect Tools: SNARE
• Accurate- Produces large improvement over
simply using flags
• Flexible- Can be applied to other domains
• Scalable- One iteration BP runs in linear time
(# edges)
• Robust- Works on large range of parameters
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
84
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation (NetProbe,… , Unification)
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
85
CMU SCS
Unifying Guilt-by-Association Approaches:
Theorems and Fast Algorithms
Danai Koutra
U Kang
Hsing-Kuo Kenneth Pao
Tai-You Ke
Duen Horng (Polo) Chau
Christos Faloutsos
ECML PKDD, 5-9 September 2011, Athens, Greece
CMU SCS
Problem Definition:
GBA techniques
(Guilt By Association)
Given: Graph; &
few labeled nodes
Find: labels of rest
(assuming network
effects)
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
87
CMU SCS
Are they related?
• RWR (Random Walk with Restarts)
– google’s pageRank (‘if my friends are
important, I’m important, too’)
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you
believe about them
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
88
CMU SCS
Are they related?
YES!
• RWR (Random Walk with Restarts)
– google’s pageRank (‘if my friends are
important, I’m important, too’)
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you
believe about them
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
89
CMU SCS
DETAILS
Correspondence of Methods
Method
Matrix
RWR
SSL
FABP
[I – c AD-1]
[I + a(D - A)]
[I + a D - c’A]
1
Eurlion, Beijing 2015
Unknow
known
n
×
x
= (1-c)y
×
x
=
y
×
bh
=
φh
d1
0 1 0
1
d2
1 0 1
1
d3 0 1 0
adjacency
matrix
(c) 2015, C. Faloutsos
?
final
labels/
beliefs
0
1
1
prior
labels/
beliefs
90
CMU SCS
runtime (min)
Results: Scalability DETAILS
# of edges
FABP is linear on the number of edges.
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
91
CMU SCS
Overview of tools
✔ Triangle-degree
✔ oddBall
✔ eigenSpokes
✔ fBox
✔ netProbe+
tensors
Eurlion, Beijing 2015
Binary /
weights
B
W
W
W
W
W
Timeevolving
(c) 2015, C. Faloutsos
with classlabels
Y
Y
92
CMU SCS
Summary of Part#1
• *many* patterns in real graphs
– Power-laws everywhere
– Long (and growing) list of tools for
anomaly/fraud detection
…
oddBall
Patterns
Eurlion, Beijing 2015
fBox
NetProbe
anomalies
(c) 2015, C. Faloutsos
93
CMU SCS
Roadmap
• Introduction – Motivation
• Part#1: Static graphs
– Patterns
– Tools
• Local
• Spectral (eigenspokes, CopyCatch)
• Label propagation (NetProbe, Polonium, Snare)
• Part#2: time-evolving graphs
• Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
94
CMU SCS
Part 2:
Time evolving
graphs
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
95
CMU SCS
Graphs over time -> tensors!
• Problem #2:
– Given who calls whom, and when
– Find patterns / anomalies
smith
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
96
CMU SCS
Graphs over time -> tensors!
• Problem #2:
– Given who calls whom, and when
– Find patterns / anomalies
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
97
CMU SCS
Graphs over time -> tensors!
• Problem #2:
– Given who calls whom, and when
– Find patterns / anomalies
Tue
Mon
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
98
CMU SCS
Graphs over time -> tensors!
• Problem #2:
– Given who calls whom, and when
– Find patterns / anomalies
caller
callee
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
99
CMU SCS
Graphs over time -> tensors!
• Problem #2’:
– Given customer-account-date
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
customer
account
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
100
CMU SCS
Graphs over time -> tensors!
• Problem #2’’:
– Given author-keyword-date
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
author
keyword
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
101
CMU SCS
Graphs over time -> tensors!
• Problem #2’’’:
– Given subject – verb – object facts
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
subject
object
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
102
CMU SCS
Graphs over time -> tensors!
• Problem #2’’’’:
– Given <triplets>
– Find patterns / anomalies
MANY more settings,
with >2 ‘modes’
(and 4, 5, etc modes)
mode1
mode2
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
103
CMU SCS
Answer to all: tensor factorization
• Recall: (SVD) matrix factorization: finds
blocks
M
products
N
users
Eurlion, Beijing 2015
‘meat-eaters’ ‘vegetarians’
‘kids’
‘steaks’
‘cookies’
‘plants’
~
+
(c) 2015, C. Faloutsos
+
104
CMU SCS
Answer to all: tensor factorization
• PARAFAC decomposition
artists
politicians
subject
=
+
athletes
+
object
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
105
CMU SCS
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
– 4M x 15 days
??
??
caller
=
+
??
+
callee
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
106
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
1 caller
5 receivers
4 days of activity
~200 calls to EACH receiver on EACH day!
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
107
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
1 caller
5 receivers
4 days of activity
~200 calls to EACH receiver on EACH day!
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
108
CMU SCS
Anomaly detection in timeevolving graphs
=
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann,
Christos Faloutsos, Prithwish Basu, Ananthram Swami,
Evangelos Papalexakis, Danai Koutra.
Com2: Fast
~200 calls to EACH receiver on EACH day!
Automatic
Discovery of (c) Temporal
(Comet) Communities.
Eurlion, Beijing 2015
2015, C. Faloutsos
109
PAKDD 2014, Tainan, Taiwan.
CMU SCS
Roadmap
• Introduction – Motivation
– Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Resources and Conclusions
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
110
CMU SCS
Project info: PEGASUS
www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus +
hadoop + M45
Apache license
Code, papers, manual, video
Prof. U Kang
Eurlion, Beijing 2015
Prof. Polo Chau
(c) 2015, C. Faloutsos
111
CMU SCS
Cast
Akoglu,
Leman
Koutra,
Danai
Araujo,
Miguel
Beutel,
Alex
Lee,
Jay Yoon
Prakash,
Aditya
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
Chau,
Polo
Kang, U
Papalexakis,
Vagelis
Shah,
Neil
112
CMU SCS
CONCLUSION#0
• Patterns
Eurlion, Beijing 2015
Anomalies
(c) 2015, C. Faloutsos
113
CMU SCS
CONCLUSION#1
• Many, surprising patterns in real graphs
…
Degree-weight
Rank-degree
Degree-#triangles
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
114
CMU SCS
CONCLUSION#2 - tools
• Many, powerful tools
…
oddBall
Eurlion, Beijing 2015
fBox
tensors
(c) 2015, C. Faloutsos
115
CMU SCS
Overview of tools
Triangle-degree
oddBall
eigenSpokes
fBox
netProbe+
tensors
Eurlion, Beijing 2015
Binary /
weights
B
W
W
W
W
W
Timeevolving
(c) 2015, C. Faloutsos
with classlabels
Y
Y
116
CMU SCS
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S004
49ED1V01Y201209DMK006
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
117
CMU SCS
Thank you!
• Many, powerful tools
…
oddBall
fBox
tensors
christos@cs.cmu.edu
www.cs.cmu.edu/~christos
Eurlion, Beijing 2015
(c) 2015, C. Faloutsos
118
Download