CMU SCS Patterns, Anomalies, and Fraud Detection in Large Graphs Christos Faloutsos CMU CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – tools • Part#2: time-evolving graphs – Patterns – Tools • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 2 CMU SCS Graphs - why should we care? • Financial (user-account) • Social networks • computer network security: email/IP traffic and anomaly detection • Recommendation systems • .... • Many-to-many db relationship -> graph Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 3 CMU SCS Motivating problems • P1: patterns? Fraud detection? • P2: patterns in time-evolving graphs / tensors account time Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 4 CMU SCS Motivating problems • P1: patterns? Fraud detection? Patterns anomalies • P2: patterns in time-evolving graphs / tensors account time Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 5 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – tools • Part#2: time-evolving graphs – Patterns – Tools • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 6 CMU SCS Part 1: Static graphs Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 7 CMU SCS Laws and patterns • Q1: Are real graphs random? Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 8 CMU SCS Laws and patterns • Q1: Are real graphs random? • A1: NO!! – Diameter (‘6 degrees’; ‘Kevin Bacon’) – in- and out- degree distributions – other (surprising) patterns • So, let’s look at the data Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 9 CMU SCS Solution# S.1 • Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 10 CMU SCS Solution# S.1 • Power law in the degree distribution [Faloutsos x 3 SIGCOMM99; + Siganos] internet domains att.com log(degree) ibm.com -0.82 log(rank) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 11 CMU SCS Solution# S.2: Eigen Exponent E Eigenvalue Exponent = slope E = -0.48 May 2001 Ax=lx Rank of decreasing eigenvalue • A2: power law in the eigenvalues of the adjacency matrix (‘eig()’) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 12 CMU SCS Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 13 CMU SCS Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns? – 2x the friends, 2x the triangles ? Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 14 CMU SCS Triangle Law: #S.3 [Tsourakakis ICDM 2008] Reuters Epinions Eurlion, Beijing 2015 SN X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles (c) 2015, C. Faloutsos 15 CMU SCS Triangle counting for large graphs? ? ? ? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 16 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 17 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 18 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 19 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 20 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns (binary; weighted) – tools • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 21 CMU SCS Observations on weighted graphs? • A: yes - even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 22 CMU SCS Observation W.1: Fortification Q: How do the weights of nodes relate to degree? Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 23 CMU SCS Observation W.1: Fortification Double the checks, double the $ ? $10 $5 $7 ‘Reagan’ ‘Clinton’ Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 24 CMU SCS Observation W.1: fortification: Snapshot Power Law • Weight: super-linear on in-degree • exponent ‘iw’: 1.01 < iw < 1.26 Orgs-Candidates Double the checks, double the $ ? $10 e.g. John Kerry, $10M received, from 1K donors In-weights ($) $5 Edges (# donors) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 25 CMU SCS MORE Graph Patterns ✔ ✔ ✔ RTG:Eurlion, A Recursive Realistic Graph Generator using Random Beijing 2015 (c) 2015, C. Faloutsos 26 Typing Leman Akoglu and Christos Faloutsos. PKDD’09. CMU SCS MORE Graph Patterns • Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal) • Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool. Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 27 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral • Label propagation • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 28 CMU SCS Overview of tools ✔ Triangle-degree oddBall eigenSpokes fBox netProbe+ tensors Eurlion, Beijing 2015 Binary / weights B W W W W W Timeevolving (c) 2015, C. Faloutsos with classlabels Y Y 29 CMU SCS OddBall: Spotting Anomalies in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science PAKDD 2010, Hyderabad, India CMU SCS Main idea For each node, • extract ‘ego-net’ (=1-step-away neighbors) • Extract features (#edges, total weight, etc etc) • Compare with the rest of the population Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 31 CMU SCS What is an egonet? egonet ego Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 32 CMU SCS Selected Features Ni: number of neighbors (degree) of ego i Ei: number of edges in egonet i Wi: total weight of egonet i λw,i: principal eigenvalue of the weighted adjacency matrix of egonet I Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 33 CMU SCS Near-Clique/Star Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 34 CMU SCS Near-Clique/Star Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 35 CMU SCS Near-Clique/Star Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 36 CMU SCS Near-Clique/Star Andrew Lewis (director) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 37 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (EigenSpokes, CopyCatch) • Label propagation • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 38 CMU SCS Overview of tools ✔ Triangle-degree ✔ oddBall eigenSpokes fBox netProbe+ tensors Eurlion, Beijing 2015 Binary / weights B W W W W W Timeevolving (c) 2015, C. Faloutsos with classlabels Y Y 39 CMU SCS EigenSpokes B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010. Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 40 CMU SCS EigenSpokes • Eigenvectors of adjacency matrix equivalent to singular vectors (symmetric, undirected graph) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 41 CMU SCS details EigenSpokes • Eigenvectors of adjacency matrix equivalent to singular vectors (symmetric, undirected graph) N N Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 42 CMU SCS details EigenSpokes • Eigenvectors of adjacency matrix equivalent to singular vectors (symmetric, undirected graph) N N Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 43 CMU SCS details EigenSpokes • Eigenvectors of adjacency matrix equivalent to singular vectors (symmetric, undirected graph) N N Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 44 CMU SCS details EigenSpokes • Eigenvectors of adjacency matrix equivalent to singular vectors (symmetric, undirected graph) N N Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 45 CMU SCS EigenSpokes 2nd Principal component u2 • EE plot: • Scatter plot of scores of u1 vs u2 • One would expect – Many points @ origin – A few scattered ~randomly Eurlion, Beijing 2015 u1 1st Principal component (c) 2015, C. Faloutsos 46 CMU SCS EigenSpokes • EE plot: • Scatter plot of scores of u1 vs u2 • One would expect u2 – Many points @ origin – A few scattered ~randomly Eurlion, Beijing 2015 90o u1 (c) 2015, C. Faloutsos 47 CMU SCS EigenSpokes - pervasiveness • Present in mobile social graph across time and space • Patent citation graph Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 48 CMU SCS EigenSpokes - explanation Near-cliques, or nearbipartite-cores, loosely connected Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 49 CMU SCS EigenSpokes - explanation Near-cliques, or nearbipartite-cores, loosely connected Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 50 CMU SCS EigenSpokes - explanation Near-cliques, or nearbipartite-cores, loosely connected Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 51 CMU SCS EigenSpokes - explanation Near-cliques, or nearbipartite-cores, loosely connected spy plot of top 20 nodes So what? Extract nodes with high scores high connectivity Good “communities” Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 52 CMU SCS Bipartite Communities! patents from same inventor(s) `cut-and-paste’ bibliography! magnified bipartite community Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 53 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 54 CMU SCS Fraud • Given – Who ‘likes’ what page, and when • Find – Suspicious users and suspicious products CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 55 Christos Faloutsos WWW, 2013. CMU SCS Fraud • Given – Who ‘likes’ what page, and when • Find – Suspicious users and suspicious products Likes CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 56 Christos Faloutsos WWW, 2013. CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 57 CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 58 CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes Eurlion, Beijing 2015 (c) 2015,Lockstep C. Faloutsos Suspicious Behavior 59 CMU SCS MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with randomly seed 2. Update set of Pages and center Like times for each cluster 3. Repeat until convergence Likes Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 60 CMU SCS Deployment at Facebook CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team) 3 months of CopyCatch @ Facebook #users caught Eurlion, Beijing 2015 Number of users caught ▪ 08/25 09/08 09/22 10/06 10/20 11/03 11/17 12/01 (c)of2015, C. Faloutsos Date CopyCatch run time 61 CMU SCS Deployment at Facebook Fake acct Most clusters (77%) come from real but compromised users Manually labeled 22 randomly selected clusters from February 2013 Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 62 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (EigenSpokes, CopyCatch, fBox) • Label propagation • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 63 CMU SCS Problem: Social Network Link Fraud Target: find “stealthy” attackers missed by other algorithms Clique 41.7M nodes 1.5B edges Bipartite core Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 64 CMU SCS Problem: Social Network Link Fraud Target: find “stealthy” attackers missed by other algorithms Takeaway: use reconstruction error between true/latent representation! Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 65 Adversarial Perspective. ICDM 2014, Shenzhen, China. CMU SCS Overview of tools ✔ Triangle-degree ✔ oddBall ✔ eigenSpokes ✔ fBox netProbe+ tensors Eurlion, Beijing 2015 Binary / weights B W W W W W Timeevolving (c) 2015, C. Faloutsos with classlabels Y Y 66 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation (NetProbe, Polonium, Snare) • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 67 CMU SCS E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07] Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 68 CMU SCS E-bay Fraud detection Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 69 CMU SCS E-bay Fraud detection Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 70 CMU SCS E-bay Fraud detection - NetProbe Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 71 CMU SCS Popular press Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 72 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation (NetProbe, Polonium, Snare) • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 73 CMU SCS Polonium: Tera-Scale Graph Mining and Inference for Malware Detection SDM 2011, Mesa, Arizona Polo Chau Carey Nachenberg Machine Learning Dept Vice President & Fellow Jeffrey Wilhelm Principal Software Engineer Adam Wright Prof. Christos Faloutsos Software Engineer Computer Science Dept CMU SCS Polonium: The Data 60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program 50+ million machines 900+ million executable files Constructed a machine-file bipartite graph (0.2 TB+) 1 billion nodes (machines and files) 37 billion edges Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 75 CMU SCS Polonium: Key Ideas • Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware • Use “guilt-by-association” (i.e., homophily) – E.g., files that appear on machines with many bad files are more likely to be bad • Scalability: handles 37 billion-edge graph Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 76 CMU SCS Polonium: One-Interaction Results Ideal 84.9% True Positive Rate 1% False Positive Rate True Positive Rate % of malware correctly identified Eurlion, Beijing 2015 False Positive Rate (c) 2015, C. Faloutsos 77 % of non-malware wrongly labeled as malware CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation (NetProbe, Polonium, Snare) • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 78 CMU SCS Network Effect Tools: SNARE • Some accounts are sort-of-suspicious – how to combine weak signals? Before Inventory Revenue L.A. Acc. payable Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 79 CMU SCS Network Effect Tools: SNARE • Some accounts are sort-of-suspicious – how to combine weak signals? Before Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 80 CMU SCS Network Effect Tools: SNARE • A: Belief Propagation. Before Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 81 CMU SCS Network Effect Tools: SNARE • A: Belief Propagation. Before After Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos:(c) 2015, SNARE: a link analytic system for Eurlion, Beijing 2015 C. Faloutsos graph labeling and risk detection. KDD 2009: 1265-1274 82 CMU SCS Network Effect Tools: SNARE • Produces improvement over simply using flags – Up to 6.5 lift – Improvement especially for low false positive rate Results for accounts data (ROC Curve) Ideal True positive rate Eurlion, Beijing 2015 SNARE (c) 2015, C. Faloutsos False positive rate Baseline (flags only) 83 CMU SCS Network Effect Tools: SNARE • Accurate- Produces large improvement over simply using flags • Flexible- Can be applied to other domains • Scalable- One iteration BP runs in linear time (# edges) • Robust- Works on large range of parameters Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 84 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation (NetProbe,… , Unification) • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 85 CMU SCS Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra U Kang Hsing-Kuo Kenneth Pao Tai-You Ke Duen Horng (Polo) Chau Christos Faloutsos ECML PKDD, 5-9 September 2011, Athens, Greece CMU SCS Problem Definition: GBA techniques (Guilt By Association) Given: Graph; & few labeled nodes Find: labels of rest (assuming network effects) Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 87 CMU SCS Are they related? • RWR (Random Walk with Restarts) – google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) – minimize the differences among neighbors • BP (Belief propagation) – send messages to neighbors, on what you believe about them Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 88 CMU SCS Are they related? YES! • RWR (Random Walk with Restarts) – google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) – minimize the differences among neighbors • BP (Belief propagation) – send messages to neighbors, on what you believe about them Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 89 CMU SCS DETAILS Correspondence of Methods Method Matrix RWR SSL FABP [I – c AD-1] [I + a(D - A)] [I + a D - c’A] 1 Eurlion, Beijing 2015 Unknow known n × x = (1-c)y × x = y × bh = φh d1 0 1 0 1 d2 1 0 1 1 d3 0 1 0 adjacency matrix (c) 2015, C. Faloutsos ? final labels/ beliefs 0 1 1 prior labels/ beliefs 90 CMU SCS runtime (min) Results: Scalability DETAILS # of edges FABP is linear on the number of edges. Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 91 CMU SCS Overview of tools ✔ Triangle-degree ✔ oddBall ✔ eigenSpokes ✔ fBox ✔ netProbe+ tensors Eurlion, Beijing 2015 Binary / weights B W W W W W Timeevolving (c) 2015, C. Faloutsos with classlabels Y Y 92 CMU SCS Summary of Part#1 • *many* patterns in real graphs – Power-laws everywhere – Long (and growing) list of tools for anomaly/fraud detection … oddBall Patterns Eurlion, Beijing 2015 fBox NetProbe anomalies (c) 2015, C. Faloutsos 93 CMU SCS Roadmap • Introduction – Motivation • Part#1: Static graphs – Patterns – Tools • Local • Spectral (eigenspokes, CopyCatch) • Label propagation (NetProbe, Polonium, Snare) • Part#2: time-evolving graphs • Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 94 CMU SCS Part 2: Time evolving graphs Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 95 CMU SCS Graphs over time -> tensors! • Problem #2: – Given who calls whom, and when – Find patterns / anomalies smith Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 96 CMU SCS Graphs over time -> tensors! • Problem #2: – Given who calls whom, and when – Find patterns / anomalies Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 97 CMU SCS Graphs over time -> tensors! • Problem #2: – Given who calls whom, and when – Find patterns / anomalies Tue Mon Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 98 CMU SCS Graphs over time -> tensors! • Problem #2: – Given who calls whom, and when – Find patterns / anomalies caller callee Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 99 CMU SCS Graphs over time -> tensors! • Problem #2’: – Given customer-account-date – Find patterns / anomalies MANY more settings, with >2 ‘modes’ customer account Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 100 CMU SCS Graphs over time -> tensors! • Problem #2’’: – Given author-keyword-date – Find patterns / anomalies MANY more settings, with >2 ‘modes’ author keyword Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 101 CMU SCS Graphs over time -> tensors! • Problem #2’’’: – Given subject – verb – object facts – Find patterns / anomalies MANY more settings, with >2 ‘modes’ subject object Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 102 CMU SCS Graphs over time -> tensors! • Problem #2’’’’: – Given <triplets> – Find patterns / anomalies MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes) mode1 mode2 Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 103 CMU SCS Answer to all: tensor factorization • Recall: (SVD) matrix factorization: finds blocks M products N users Eurlion, Beijing 2015 ‘meat-eaters’ ‘vegetarians’ ‘kids’ ‘steaks’ ‘cookies’ ‘plants’ ~ + (c) 2015, C. Faloutsos + 104 CMU SCS Answer to all: tensor factorization • PARAFAC decomposition artists politicians subject = + athletes + object Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 105 CMU SCS Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when – 4M x 15 days ?? ?? caller = + ?? + callee Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 106 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 107 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 108 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast ~200 calls to EACH receiver on EACH day! Automatic Discovery of (c) Temporal (Comet) Communities. Eurlion, Beijing 2015 2015, C. Faloutsos 109 PAKDD 2014, Tainan, Taiwan. CMU SCS Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Resources and Conclusions Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 110 CMU SCS Project info: PEGASUS www.cs.cmu.edu/~pegasus Results on large graphs: with Pegasus + hadoop + M45 Apache license Code, papers, manual, video Prof. U Kang Eurlion, Beijing 2015 Prof. Polo Chau (c) 2015, C. Faloutsos 111 CMU SCS Cast Akoglu, Leman Koutra, Danai Araujo, Miguel Beutel, Alex Lee, Jay Yoon Prakash, Aditya Eurlion, Beijing 2015 (c) 2015, C. Faloutsos Chau, Polo Kang, U Papalexakis, Vagelis Shah, Neil 112 CMU SCS CONCLUSION#0 • Patterns Eurlion, Beijing 2015 Anomalies (c) 2015, C. Faloutsos 113 CMU SCS CONCLUSION#1 • Many, surprising patterns in real graphs … Degree-weight Rank-degree Degree-#triangles Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 114 CMU SCS CONCLUSION#2 - tools • Many, powerful tools … oddBall Eurlion, Beijing 2015 fBox tensors (c) 2015, C. Faloutsos 115 CMU SCS Overview of tools Triangle-degree oddBall eigenSpokes fBox netProbe+ tensors Eurlion, Beijing 2015 Binary / weights B W W W W W Timeevolving (c) 2015, C. Faloutsos with classlabels Y Y 116 CMU SCS References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/S004 49ED1V01Y201209DMK006 Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 117 CMU SCS Thank you! • Many, powerful tools … oddBall fBox tensors christos@cs.cmu.edu www.cs.cmu.edu/~christos Eurlion, Beijing 2015 (c) 2015, C. Faloutsos 118