CatchSync! - Meng Jiang

CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS
Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014 – NYC, USA
2
Fraud Detection: Graph Analysis Problem
[www.buyfollowz.org]
[buymorelikes.com]
3
Fraud Detection: Graph Analysis Problem
[buycheaplikes.com]
[reviewsteria.com]
4
Our Goals
• Given: A graph (large-scale, directed, etc.)
• Find: Frauds = Anomalous edges
• Goals:
• G1. Find patterns that distinguish fraudsters
from normal users
• G2. Design algorithms that catch fraudsters
5
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
6
Anomalies in Degree Distributions
• Power-law distribution
[Figure panels: DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom)]
[konect.uni-koblenz.de/networks/]
7
Anomalies in Degree Distributions
[Figure: degree distribution; annotations: 2009, 3.17M, 0.41M, 41M, d=20]
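A spike like the one annotated at d=20 can be located by comparing each degree's node count with the counts at the neighbouring degrees. The following is a minimal sketch, assuming the graph is given as a list of (follower, followee) pairs; the function names and the `ratio` threshold are illustrative and do not come from the talk.

```python
from collections import Counter

def out_degree_histogram(edges):
    """Map each out-degree d to the number of nodes that have it."""
    out_deg = Counter(src for src, _ in edges)
    return Counter(out_deg.values())

def degree_spikes(hist, ratio=3.0):
    """Return degrees whose node count exceeds `ratio` times the larger
    of the counts at the two neighbouring degrees (d-1 and d+1)."""
    spikes = []
    for d, count in sorted(hist.items()):
        # Floor of 1 so isolated degrees in the sparse tail need count > ratio.
        neighbours = max(hist.get(d - 1, 0), hist.get(d + 1, 0), 1)
        if count > ratio * neighbours:
            spikes.append(d)
    return spikes
```

On real power-law data the tail is noisy, so a smoothed or log-binned histogram would likely be needed before applying a rule this simple.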
8
Linear Classifier with “Degree”: Fails
[Diagram: out-degree ("=20?") feeds a classifier that outputs a label (+1 fraud, -1); annotations: 3.17M, 0.41M, d=20; outcome marked ×]
9
Graph Structure Distorted
[Figure annotations: 2011, 1.91M, 117M, 0.44M, d=64]
10
Traditional Fraud Detection
[Diagram: classifier inputs are out-degree and in-degree ("Big? Small?") plus content-based features: #tweet, #url in tweets, #hashtag in tweets (each "Big?"); output is a label (+1 fraud, -1)]
11
Empty Profile?
12
Few Followers?
13
Many Followings?
14
Content: Unavailable? Looks Normal?
[Diagram: classifier inputs are out-degree, in-degree, and content-based features (#tweet, #url in tweets, #hashtag in tweets); the content-based features read "0, 0, 0… sorry"; output is a label (+1, -1)]
15
Behavior is the Key
Monetary Incentive
Content: how they appear to behave
Behavior/Links: how they have to behave
16
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
17
Behavior-based Features
Follower behavior ≈ out-degree, 1st left singular vector (hubness), 2nd left singular vector, …
Followee behavior ≈ in-degree, 1st right singular vector (authoritativeness), 2nd right singular vector, …
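As a rough illustration of these features, the sketch below computes (out-degree, hubness) for each follower and (in-degree, authoritativeness) for each followee. It assumes an edge list of (follower, followee) pairs and uses HITS-style power iteration, which converges to the first left and right singular vectors of the adjacency matrix. The function name and the iteration count are hypothetical, and higher-order singular vectors are left out.

```python
import math
from collections import defaultdict

def behavior_features(edges, iterations=50):
    """Return (follower, followee) feature dicts.

    follower[u] = (out-degree, hubness score)
    followee[v] = (in-degree, authoritativeness score)
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)

    def normalize(scores):
        # Scale to unit Euclidean length so the iteration stays bounded.
        norm = math.sqrt(sum(x * x for x in scores.values()))
        return {k: x / norm for k, x in scores.items()}

    hub = normalize({u: 1.0 for u in out_nbrs})
    auth = {}
    for _ in range(iterations):
        # auth <- A^T hub, then hub <- A auth (one power-iteration step each).
        auth = normalize({v: sum(hub[u] for u in srcs)
                          for v, srcs in in_nbrs.items()})
        hub = normalize({u: sum(auth[v] for v in dsts)
                         for u, dsts in out_nbrs.items()})

    follower = {u: (len(dsts), hub[u]) for u, dsts in out_nbrs.items()}
    followee = {v: (len(srcs), auth[v]) for v, srcs in in_nbrs.items()}
    return follower, followee
```

For a graph with millions of nodes, a sparse SVD routine would be the more practical choice; the pure-Python loop here is only meant to make the definitions concrete.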
18
Behavior-based Feature Space
[Figure labels: follower behavior, followee behavior]
19
Fraudulent Behavior Patterns
• Synchronized
• Abnormal
25
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
26
Synchronicity and Normality
• Synchronicity
• Normality
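The formulas for the two measures did not survive in this text, so the sketch below rests on assumed definitions: each followee is a point in the feature space, the space is cut into grid cells, and two followees count as "close" when they fall in the same cell. Synchronicity of a follower is then taken as the average closeness among pairs of its own followees, and normality as the average closeness between its followees and all nodes. The names `to_cell`, `sync_and_norm` and `cell_size` are illustrative.

```python
from collections import Counter

def to_cell(point, cell_size=1.0):
    """Map a feature point to the grid cell that contains it."""
    return tuple(int(x // cell_size) for x in point)

def sync_and_norm(followees_of, features, cell_size=1.0):
    """Return {follower: (synchronicity, normality)}.

    followees_of: {follower: list of followee ids}
    features:     {followee id: feature point (tuple of numbers)}
    """
    cell_of = {v: to_cell(p, cell_size) for v, p in features.items()}
    background = Counter(cell_of.values())   # all nodes per cell
    total = len(cell_of)
    result = {}
    for u, followees in followees_of.items():
        if not followees:
            continue
        local = Counter(cell_of[v] for v in followees)
        d = sum(local.values())
        # Fraction of followee pairs that share a cell.
        sync = sum((n / d) ** 2 for n in local.values())
        # Fraction of (followee, any node) pairs that share a cell.
        norm = sum((n / d) * (background[c] / total)
                   for c, n in local.items())
        result[u] = (sync, norm)
    return result
```

Under these assumptions, a follower whose followees all sit in one sparsely populated cell gets synchronicity 1 and a low normality.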
28
Synchronicity-Normality Plot
29
Theorem
• For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.
• Proof. See our paper.
[Plot axes: synchronicity, normality]
30
CatchSync Algorithm
• Distance-based
anomaly detection
• Fraudsters
• Big synchronicity
• Small normality
• Away from the densest
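One way to make the distance-based idea concrete is sketched below. It is an illustration under stated assumptions, not the procedure from the paper: the dense centre is approximated by the per-axis median of all (synchronicity, normality) points, distances are scaled by the median absolute deviation, and a node is flagged when it lies far from the centre on the high-synchronicity side. The function name and the `threshold` value are hypothetical.

```python
import math
from statistics import median

def flag_suspicious(points, threshold=3.0):
    """points: {node: (synchronicity, normality)}.
    Returns the set of nodes far from the dense centre with above-median
    synchronicity."""
    syncs = [s for s, _ in points.values()]
    norms = [n for _, n in points.values()]
    c_s, c_n = median(syncs), median(norms)
    # Median absolute deviation as a robust scale; fall back to a tiny
    # value so a zero deviation does not cause a division by zero.
    mad_s = median([abs(s - c_s) for s in syncs]) or 1e-9
    mad_n = median([abs(n - c_n) for n in norms]) or 1e-9
    flagged = set()
    for node, (s, n) in points.items():
        dist = math.hypot((s - c_s) / mad_s, (n - c_n) / mad_n)
        if dist > threshold and s > c_s:
            flagged.add(node)
    return flagged
```

The median centre only stands in for "the densest" region when most nodes are normal; a density estimate would be a more faithful choice.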
31
OUTLINE
1. Background
2. Fraudulent Pattern Mining
3. The Algorithm
4. Experiments
32
Experiments
• Q1: Does CatchSync remove anomalies?
• Degree distribution
• Feature space
• Q2: Is CatchSync actually catching
fraudulent users?
• Q3: Is CatchSync robust?
33
Q1: Does CatchSync Remove Anomalies?
[Figure annotations: 2009, 3.17M, 41M, 0.41M, d=20]
34
Q1: Does CatchSync Remove Anomalies?
[Figure annotations: 2011, 117M]
35
Before CatchSync
[Figure labels: follower behavior, followee behavior]
36
After CatchSync
[Figure labels: follower behavior, followee behavior]
37
Q2: Is CatchSync Actually Catching
Fraudulent Users?
[Figure annotations: 173/1,000 and 237/1,000]
38
Q2: Is CatchSync Actually Catching
Fraudulent Users?
[Bar chart, axis 0 to 1: CatchSync+SPOT 0.813; CatchSync 0.751; SPOT 0.597; OutRank 0.412]
39
Q2: Is CatchSync Actually Catching
Fraudulent Users?
[Bar chart, axis 0 to 1: CatchSync+SPOT 0.785; CatchSync 0.694; SPOT 0.653; OutRank 0.377]
40
Q2: Is CatchSync Actually Catching
Fraudulent Users?
• Recall = 80%
• Precision in Twitter: 83.5%
• Precision in Tencent Weibo: 79.4%
41
Q3: Is CatchSync Robust to Camouflage?
[Diagram labels: target, popular camouflage, random camouflage]
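To experiment with the two camouflage types, extra edges can be injected from fraudsters into an edge list. The sketch below assumes that "popular camouflage" means following the most-followed nodes and "random camouflage" means following nodes picked uniformly at random; the slide does not define them, so both readings, along with the function name and parameters, are assumptions.

```python
import random
from collections import Counter

def add_camouflage(edges, fraudsters, per_fraudster, mode="random", seed=0):
    """Return a new edge list with camouflage edges appended.

    edges:         list of (follower, followee) tuples
    fraudsters:    iterable of follower ids that add camouflage
    per_fraudster: number of camouflage edges per fraudster
    mode:          "popular" (highest in-degree first) or "random"
    """
    rng = random.Random(seed)
    in_deg = Counter(dst for _, dst in edges)
    ranked = [v for v, _ in in_deg.most_common()]  # most followed first
    existing = set(edges)
    extra = []
    for f in fraudsters:
        # Skip self-loops and edges the fraudster already has.
        pool = [v for v in ranked if v != f and (f, v) not in existing]
        if mode == "popular":
            picks = pool[:per_fraudster]
        else:
            picks = rng.sample(pool, min(per_fraudster, len(pool)))
        extra.extend((f, v) for v in picks)
    return list(edges) + extra
```

Rerunning a detector on the returned edge list with increasing `per_fraudster` gives a simple robustness check.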
42
Q3: Is CatchSync Robust to Camouflage?
[Figure labels: popular camouflage, random camouflage]
45
Conclusion
• Goals
• G1. Find patterns that distinguish fraudulent
user behavior from normal behavior
• A1: Synchronized & Abnormal!
• G2. Design algorithms that catch fraudsters
• A2: CatchSync!
• Remove spikes
• Content free
• Robust to camouflage
46
Questions?
Meng Jiang
mjiang89@gmail.com
http://www.meng-jiang.com