CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS
Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014 – NYC, USA

Fraud Detection: Graph Analysis Problem
[Screenshots: www.buyfollowz.org, buymorelikes.com, buycheaplikes.com, reviewsteria.com]

Our Goals
• Given: A graph (large-scale, directed, etc.)
• Find: Fraud = anomalous edges
• Goals:
  • G1. Find patterns that distinguish fraudsters from normal users
  • G2. Design algorithms that catch fraudsters

OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

Anomalies in Degree Distributions
• Power-law distribution
[Figure: DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom); source: konect.uni-koblenz.de/networks/]
[Figure annotations: 2009, 41M, 3.17M, 0.41M, d=20]

Linear Classifier with “Degree”: Fail
[Diagram: a classifier on out-degree alone ("= 20?" gives +1, fraud; labels +1, -1) fails (×); plot annotations: 3.17M, 0.41M, d=20]

Graph Structure Distorted
[Figure annotations: 2011, 117M, 1.91M, 0.44M, d=64]

Traditional Fraud Detection
[Diagram: a classifier maps the features below to a label (+1 = fraud, -1)]
• Out-degree: big?
• In-degree: small?
• Content-based features:
  • #tweets: big?
  • #URLs in tweets: big?
  • #hashtags in tweets: big?

Empty Profile?
Few Followers?
Many Followings?

Content: Unavailable? Look Normal?
[Diagram: the same classifier (out-degree, in-degree, content-based features), but the content-based features (#tweets, #URLs in tweets, #hashtags in tweets) read 0, 0, 0… sorry]

Behavior is the Key
• Monetary incentive
• Content: how they appear to behave
• Behavior/links: how they have to behave

Behavior-based Features
• Follower behavior ≈
  • Out-degree
  • 1st left singular vector (hubness)
  • 2nd left singular vector
  • …
• Followee behavior ≈
  • In-degree
  • 1st right singular vector (authoritativeness)
  • 2nd right singular vector
  • …

Behavior-based Feature Space
[Figure labels: follower behavior, followee behavior]

Fraudulent Behavior Patterns
• Synchronized
• Abnormal

Synchronicity and Normality
• Synchronicity
• Normality

Synchronicity-Normality Plot

Theorem
• For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.
• Proof. See our paper.
[Plot axes: synchronicity, normality]

CatchSync Algorithm
• Distance-based anomaly detection
• Fraudsters
  • Big synchronicity
  • Small normality
  • Away from the densest region

Experiments
• Q1: Does CatchSync remove anomalies?
  • Degree distribution
  • Feature space
• Q2: Is CatchSync catching actually fraudulent users?
• Q3: Is CatchSync robust?

Q1: Does CatchSync Remove Anomalies?
[Figure annotations: 2009, 41M, 3.17M, 0.41M, d=20]
[Figure annotations: 2011, 117M]

Before CatchSync
[Figure labels: follower behavior, followee behavior]

After CatchSync
[Figure labels: follower behavior, followee behavior]

Q2: Is CatchSync Catching Actually Fraudulent Users?
[Figure annotations: 173/1,000, 237/1,000]
[Two bar charts, axis from 0 to 1]
  Method            Chart 1   Chart 2
  CatchSync+SPOT    0.813     0.785
  CatchSync         0.751     0.694
  SPOT              0.597     0.653
  OutRank           0.412     0.377
• At recall = 80%:
  • Precision in Twitter: 83.5%
  • Precision in Tencent Weibo: 79.4%

Q3: Is CatchSync Robust to Camouflage?
[Diagram labels: target, popular camouflage, random camouflage]

Conclusion
• Goals
  • G1. Find patterns that distinguish fraudulent user behavior from normal behavior
    • A1: Synchronized & Abnormal!
  • G2. Design algorithms that catch fraudsters
    • A2: CatchSync!
      • Removes spikes
      • Content-free
      • Robust to camouflage

Questions?
Meng Jiang
mjiang89@gmail.com
http://www.meng-jiang.com
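
Backup: a toy sketch of the synchronicity and normality scores
The slides name the two scores but do not show their formulas, so the definitions below are an assumption made for illustration only. The followee feature space is split into grid cells. Synchronicity is taken as the fraction of a user's followee pairs that fall in the same cell, and normality as the fraction of (followee, any node) pairs that fall in the same cell. The names `sync_norm`, `flag_suspicious`, `CELL_OF`, `EDGES` and both thresholds are hypothetical. The simple threshold rule stands in for the distance-based anomaly detection step and is not the method from the talk.

```python
from collections import Counter, defaultdict


def sync_norm(edges, cell_of):
    """Return {follower: (synchronicity, normality)}.

    edges   -- iterable of (follower, followee) pairs
    cell_of -- dict mapping every node in the followee feature space
               to its grid cell (assumed to be precomputed)
    """
    followees = defaultdict(set)
    for follower, followee in edges:
        followees[follower].add(followee)

    cell_size = Counter(cell_of.values())  # number of nodes per grid cell
    n_nodes = len(cell_of)

    scores = {}
    for follower, targets in followees.items():
        d = len(targets)
        per_cell = Counter(cell_of[t] for t in targets)
        # Assumed definition: share of followee pairs in the same cell.
        sync = sum(c * c for c in per_cell.values()) / (d * d)
        # Assumed definition: share of (followee, any node) pairs in the same cell.
        norm = sum(c * cell_size[g] for g, c in per_cell.items()) / (d * n_nodes)
        scores[follower] = (sync, norm)
    return scores


def flag_suspicious(scores, min_sync, max_norm):
    """Hypothetical rule: big synchronicity AND small normality."""
    return sorted(u for u, (s, n) in scores.items() if s >= min_sync and n < max_norm)


# Toy data: four popular accounts, two obscure ones.
CELL_OF = {"a": "popular", "b": "popular", "c": "popular", "d": "popular",
           "x": "obscure", "y": "obscure"}
EDGES = [("n1", "a"), ("n1", "x"),               # mixed tastes
         ("n2", "a"), ("n2", "b"), ("n2", "c"),  # follows only popular accounts
         ("f1", "x"), ("f1", "y"),               # synchronized on obscure accounts
         ("f2", "x"), ("f2", "y")]

scores = sync_norm(EDGES, CELL_OF)
for user, (sync, norm) in sorted(scores.items()):
    print(user, round(sync, 3), round(norm, 3))
print("flagged:", flag_suspicious(scores, min_sync=0.9, max_norm=0.5))
```

In this toy graph, n2 is as synchronized as f1 and f2, but its followees sit in the crowded cell, so its normality is high and it is not flagged. That is why the two scores are read together rather than thresholding synchronicity alone.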