CMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Christos Faloutsos CMU CMU SCS Thank you! • Dr. Ping Luo CAS (c) 2015, C. Faloutsos 2 CMU SCS Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 3 CMU SCS Graphs - why should we care? ~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’ CAS (c) 2015, C. Faloutsos 4 CMU SCS Graphs - why should we care? ~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’ U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale CAS (c) 2015, C. Faloutsos 5 Graphs PAKDD 2014, Tainan, Taiwan. CMU SCS Graphs - why should we care? >$10B; ~1B users CAS (c) 2015, C. Faloutsos 6 CMU SCS Graphs - why should we care? Food Web [Martinez ’91] Internet Map [lumeta.com] CAS (c) 2015, C. Faloutsos 7 CMU SCS Graphs - why should we care? • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • Recommendation systems • .... • Many-to-many db relationship -> graph CAS (c) 2015, C. Faloutsos 8 CMU SCS Motivating problems • P1: patterns? Fraud detection? • P2: patterns in time-evolving graphs / tensors destination time CAS (c) 2015, C. Faloutsos 9 CMU SCS Motivating problems • P1: patterns? Fraud detection? Patterns anomalies • P2: patterns in time-evolving graphs / tensors destination time CAS (c) 2015, C. Faloutsos 10 CMU SCS Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns & fraud detection • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 11 CMU SCS Part 1: Patterns, & fraud detection CAS (c) 2015, C. Faloutsos 12 CMU SCS Laws and patterns • Q1: Are real graphs random? CAS (c) 2015, C. Faloutsos 13 CMU SCS Laws and patterns • Q1: Are real graphs random? • A1: NO!! – Diameter (‘6 degrees’; ‘Kevin Bacon’) – in- and out- degree distributions – other (surprising) patterns • So, let’s look at the data CAS (c) 2015, C. Faloutsos 14 CMU SCS Solution# S.1 • Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) CAS (c) 2015, C. Faloutsos 15 CMU SCS Solution# S.1 • Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] internet domains att.com log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 16 CMU SCS Solution# S.1 • Q: So what? internet domains att.com log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 17 CMU SCS Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: internet domains att.com log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 18 CMU SCS Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: 100^2 * N= 10 Trillion internet domains att.com log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 19 CMU SCS Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: 100^2 * N= 10 Trillion internet domains att.com log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 20 CMU SCS Gaussian trap Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2 internet domains att.com ~0.8PB -> a data center(!) log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos DCO @ CMU 21 CMU SCS Gaussian trap Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2 internet domains att.com ~0.8PB -> a data center(!) log(degree) ibm.com -0.82 log(rank) CAS (c) 2015, C. Faloutsos 22 CMU SCS Observation – big-data: • O(N2) algorithms are ~intractable - N=1B • N2 seconds = 31B years (>2x age of universe) 1B 1B CAS (c) 2015, C. Faloutsos 23 CMU SCS Observation – big-data: • O(N2) algorithms are ~intractable - N=1B 31M • N2 seconds = 31B years • 1,000 machines 1B CAS (c) 2015, C. Faloutsos 24 CMU SCS Observation – big-data: • O(N2) algorithms are ~intractable - N=1B 31K • N2 seconds = 31B years • 1M machines 1B CAS (c) 2015, C. Faloutsos 25 CMU SCS Observation – big-data: • O(N2) algorithms are ~intractable - N=1B 3 • N2 seconds = 31B years • 10B machines ~ $10Trillion 1B CAS (c) 2015, C. Faloutsos 26 CMU SCS Observation – big-data: • O(N2) algorithms are ~intractable - N=1B And parallelism might not help 3 • N2 seconds = 31B years • 10B machines ~ $10Trillion 1B CAS (c) 2015, C. Faloutsos 27 CMU SCS Solution# S.2: Eigen Exponent E Eigenvalue Exponent = slope E = -0.48 Ax=lx Rank of decreasing eigenvalue • A2: power law in the eigenvalues of the adjacency matrix (‘eig()’) CAS (c) 2015, C. Faloutsos 28 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns: Degree; Triangles – Anomaly/fraud detection – Graph understanding • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 29 CMU SCS Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles CAS (c) 2015, C. Faloutsos 30 CMU SCS Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns? – 2x the friends, 2x the triangles ? CAS (c) 2015, C. Faloutsos 31 CMU SCS Triangle Law: #S.3 [Tsourakakis ICDM 2008] Reuters Epinions CAS SN X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles (c) 2015, C. Faloutsos 32 CMU SCS details Triangle Law: Computations [Tsourakakis ICDM 2008] But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax2) Q: Can we do that quickly? A: CAS (c) 2015, C. Faloutsos 33 CMU SCS details Triangle Law: Computations [Tsourakakis ICDM 2008] But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax2) Q: Can we do that quickly? Ax=lx A: Yes! #triangles = 1/6 Sum ( li3 ) (and, because of skewness (S2) , we only need the top few eigenvalues! - O(E) CAS (c) 2015, C. Faloutsos 34 CMU SCS Triangle counting for large graphs? ? ? ? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] CAS (c) 2015, C. Faloutsos 35 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] CAS (c) 2015, C. Faloutsos 36 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] CAS (c) 2015, C. Faloutsos 37 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] CAS (c) 2015, C. Faloutsos 38 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] CAS (c) 2015, C. Faloutsos 39 CMU SCS MORE Graph Patterns ✔ ✔ ✔ RTG:CAS A Recursive Realistic Graph Generator using Random (c) 2015, C. Faloutsos 40 Typing Leman Akoglu and Christos Faloutsos. PKDD’09. CMU SCS MORE Graph Patterns • Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal) • Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool. CAS (c) 2015, C. Faloutsos 41 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • CopyCatch Patterns • Spectral methods (‘fBox’) • Belief Propagation anomalies • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 42 CMU SCS Fraud • Given – Who ‘likes’ what page, and when • Find – Suspicious users and suspicious products CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, CAS (c) 2015, C. Faloutsos 43 Christos Faloutsos WWW, 2013. CMU SCS Fraud • Given – Who ‘likes’ what page, and when • Find – Suspicious users and suspicious products Likes CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, CAS (c) 2015, C. Faloutsos 44 Christos Faloutsos WWW, 2013. CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes CAS (c) 2015, C. Faloutsos 45 CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes CAS (c) 2015, C. Faloutsos 46 CMU SCS Graph Patterns and Lockstep Our intuition Behavior ▪ Lockstep behavior: Same Likes, same time Likes CAS (c) 2015,Lockstep C. Faloutsos Suspicious Behavior 47 CMU SCS MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with randomly seed 2. Update set of Pages and center Like times for each cluster 3. Repeat until convergence Likes CAS (c) 2015, C. Faloutsos 48 CMU SCS Deployment at Facebook CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team) 3 months of CopyCatch @ Facebook #users caught CAS Number of users caught ▪ 08/25 09/08 09/22 10/06 10/20 11/03 11/17 12/01 (c)of2015, C. Faloutsos Date CopyCatch run time 49 CMU SCS Deployment at Facebook Fake acct Manually labeled 22 randomly selected clusters from February 2013 CAS (c) 2015, C. Faloutsos 50 CMU SCS Deployment at Facebook Fake acct Most clusters (77%) come from real but compromised users Manually labeled 22 randomly selected clusters from February 2013 CAS (c) 2015, C. Faloutsos 51 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • CopyCatch • Spectral methods (‘fBox’) • Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 52 CMU SCS Problem: Social Network Link Fraud Target: find “stealthy” attackers missed by other algorithms Clique 41.7M nodes 1.5B edges Bipartite core CAS (c) 2015, C. Faloutsos 53 CMU SCS Problem: Social Network Link Fraud Target: find “stealthy” attackers missed by other algorithms Takeaway: use reconstruction error between true/latent representation! Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An CAS (c) 2015, C. Faloutsos 54 Adversarial Perspective. ICDM 2014, Shenzhen, China. CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • CopyCatch • Spectral methods (‘fBox’, suspiciousness) • Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 55 CMU SCS Suspicious Patterns in Event Data ? 2-modes ? ? n-modes A General Suspiciousness Metric for Dense Blocks in Multimodal Data, Meng Jiang, Alex Beutel, Peng Cui, Bryan CAS (c) 2015, C. Faloutsos 56 Hooi, Shiqiang Yang, and Christos Faloutsos, ICDM, 2015. CMU SCS ICDM 2015 Suspicious Patterns in Event Data Which is more suspicious? 20,000 Users Retweeting same 20 tweets 6 times each All in 10 hours CAS vs. 225 Users Retweeting same 1 tweet 15 times each All in 3 hours All from 2 IP addresses (c) 2015, C. Faloutsos Answer: volume * DKL(p|| pbackground) 57 ymous CMU SCS Anonymous Anonymous mous ymous Anonymous Anonymous Anonymous Anonymous ICDM 2015 CAS .. . 15:37 l l : l l l : l : : l l : l : l l : 21:13 11:04 : : : : l l l 10:37 : : l l : : l l : l l l 10:32 10:24 ··· l l l : : : l 10:19 27,313 R et weet s “ John123” “ Joe456” “ Job789” “ Jack135” “ Jill246” “ Jane314” “ Jay357” 10:14 200 M inut es 225 U ser s tweets from m 500 user s g and all in t tr y to find t no method dense blocks dr aw human ng, typically notewor thy hat we show ed answer to st of axioms e propose an and is fast to o spot dense ness” ) or der. it impr oves nds retweetspanning 0.3 10:09 Suspicious Patterns in Event Data ... l l .. . Fig. 1. Density in multiple modes is suspicious. Left: A dense block of 225 users on Tencent Weibo (Chinese Twitter) retweeting one tweet 27,313 times from 2 IP addresses over 200 minutes. Right: magnification of a subset of this block. l and : indicate the two IP addresses used. Notice how synchronized the behavior is, across several modes (IP-address, user-id, timestamp). Retweeting: “Galaxy Note Dream Project: Happy Happy Life Traveling the World” I nfor mal Problem 1 (Suspiciousness scor e) Given a K mode dataset (tensor) X , with counts of events (that are non-negative integer values), and two subtensors Y1 and Y2 , which is more suspicious and worthy of further investigation? Why multimodal data (tensor ): Graphs and social networks (c) 2015, C. Faloutsos have attracted huge interest - and they are perfectly modeled as 58 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • CopyCatch • Spectral methods (‘fBox’) • Belief Propagation • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 59 CMU SCS E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07] CAS (c) 2015, C. Faloutsos 60 CMU SCS E-bay Fraud detection CAS (c) 2015, C. Faloutsos 61 CMU SCS E-bay Fraud detection CAS (c) 2015, C. Faloutsos 62 CMU SCS E-bay Fraud detection - NetProbe CAS (c) 2015, C. Faloutsos 63 CMU SCS Popular press And less desirable attention: • E-mail from ‘Belgium police’ (‘copy of your code?’) CAS (c) 2015, C. Faloutsos 64 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Patterns – Anomaly / fraud detection • CopyCatch • Spectral methods (‘fBox’) • Belief Propagation; fast computation & unification • Part#2: time-evolving graphs; tensors • Conclusions CAS (c) 2015, C. Faloutsos 76 CMU SCS Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra U Kang Hsing-Kuo Kenneth Pao Tai-You Ke Duen Horng (Polo) Chau Christos Faloutsos ECML PKDD, 5-9 September 2011, Athens, Greece CMU SCS Problem Definition: GBA techniques Given: Graph; & few labeled nodes Find: labels of rest (assuming network effects) CAS (c) 2015, C. Faloutsos 78 CMU SCS Are they related? • RWR (Random Walk with Restarts) – google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) – minimize the differences among neighbors • BP (Belief propagation) – send messages to neighbors, on what you believe about them CAS (c) 2015, C. Faloutsos 79 CMU SCS Are they related? YES! • RWR (Random Walk with Restarts) – google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) – minimize the differences among neighbors • BP (Belief propagation) – send messages to neighbors, on what you believe about them CAS (c) 2015, C. Faloutsos 80 CMU SCS Correspondence of Methods Method Matrix RWR SSL FABP [I – c AD-1] [I + a(D - A)] [I + a D - c’A] CAS 1 1 1 Unknow known n × x = (1-c)y × x = y × bh = φh d1 0 1 0 d2 1 0 1 d3 0 1 0 adjacency matrix (c) 2015, C. Faloutsos ? final labels/ beliefs 0 1 1 prior labels/ beliefs 81 CMU SCS runtime (min) Results: Scalability # of edges (Kronecker graphs) FABP is linear on the number of edges. CAS (c) 2015, C. Faloutsos 82 CMU SCS % accuracy Results: Parallelism runtime (min) FABP ~2x faster & wins/ties on accuracy. CAS (c) 2015, C. Faloutsos 83 CMU SCS Summary of Part#1 • *many* patterns in real graphs – Power-laws everywhere – Gaussian trap • Avg << Max – Long (and growing) list of tools for anomaly/fraud detection Patterns CAS anomalies (c) 2015, C. Faloutsos 84 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors – P2.1: time-evolving graphs – P2.2: with side information (‘coupled’ M.T.F.) – (Speed) • Conclusions CAS (c) 2015, C. Faloutsos 85 CMU SCS Part 2: Time evolving graphs; tensors CAS (c) 2015, C. Faloutsos 86 CMU SCS Graphs over time -> tensors! • Problem #2.1: – Given who calls whom, and when – Find patterns / anomalies smith CAS (c) 2015, C. Faloutsos 87 CMU SCS Graphs over time -> tensors! • Problem #2.1: – Given who calls whom, and when – Find patterns / anomalies CAS (c) 2015, C. Faloutsos 88 CMU SCS Graphs over time -> tensors! • Problem #2.1: – Given who calls whom, and when – Find patterns / anomalies Tue Mon CAS (c) 2015, C. Faloutsos 89 CMU SCS Graphs over time -> tensors! • Problem #2.1: – Given who calls whom, and when – Find patterns / anomalies caller callee CAS (c) 2015, C. Faloutsos 90 CMU SCS Graphs over time -> tensors! • Problem #2.1’: – Given author-keyword-date – Find patterns / anomalies MANY more settings, with >2 ‘modes’ author keyword CAS (c) 2015, C. Faloutsos 91 CMU SCS Graphs over time -> tensors! • Problem #2.1’’: – Given subject – verb – object facts – Find patterns / anomalies MANY more settings, with >2 ‘modes’ subject object CAS (c) 2015, C. Faloutsos 92 CMU SCS Graphs over time -> tensors! • Problem #2.1’’’: – Given <triplets> – Find patterns / anomalies MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes) mode1 mode2 CAS (c) 2015, C. Faloutsos 93 CMU SCS Graphs & side info • Problem #2.2: coupled (eg., side info) – Given subject – verb – object facts • And voxel-activity for each subject-word – Find patterns / anomalies subject object CAS `apple tastes sweet’ fMRI voxel activity (c) 2015, C. Faloutsos 94 CMU SCS Graphs & side info • Problem #2.2: coupled (eg., side info) – Given subject – verb – object facts • And voxel-activity for each subject-word – Find patterns / anomalies ‘apple’ ‘sweet’ CAS `apple tastes sweet’ ‘apple’ fMRI voxel activity (c) 2015, C. Faloutsos 95 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors – P2.1: time-evolving graphs – P2.2: with side information (‘coupled’ M.T.F.) – Speed • Conclusions CAS (c) 2015, C. Faloutsos 96 CMU SCS Answer to both: tensor factorization • Recall: (SVD) matrix factorization: finds blocks M products N users CAS ‘meat-eaters’ ‘vegetarians’ ‘kids’ ‘steaks’ ‘cookies’ ‘plants’ ~ + (c) 2015, C. Faloutsos + 97 CMU SCS Answer to both: tensor factorization • PARAFAC decomposition artists politicians subject = + athletes + object CAS (c) 2015, C. Faloutsos 98 CMU SCS Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when – 4M x 15 days ?? ?? caller = + ?? + callee CAS (c) 2015, C. Faloutsos 99 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! CAS (c) 2015, C. Faloutsos 100 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks 1 caller 5 receivers 4 days of activity ~200 calls to EACH receiver on EACH day! CAS (c) 2015, C. Faloutsos 101 CMU SCS Anomaly detection in timeevolving graphs = • Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast ~200 calls to EACH receiver on EACH day! Automatic Discovery of (c) Temporal (Comet) Communities. CAS 2015, C. Faloutsos 102 PAKDD 2014, Tainan, Taiwan. CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors – P2.1: Discoveries @ phonecall network – P2.2: Discoveries in neuro-semantics – (Speed) • Conclusions CAS (c) 2015, C. Faloutsos 103 CMU SCS Coupled Matrix-Tensor Factorization (CMTF) c1 b1 Y X a1 d1 a1 CAS cF bF + aF dF + aF (c) 2015, C. Faloutsos 104 CMU SCS Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ *Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science,2008. CAS (c) 2015, C. Faloutsos 105 Data@ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html CMU SCS Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ Patterns? CAS (c) 2015, C. Faloutsos 106 CMU SCS Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ Patterns? questions airplane CAS nouns (c) 2015, C. Faloutsos … dog voxels 107 CMU SCS Neuro-semantics CAS (c) 2015, C. Faloutsos = 108 CMU SCS Neuro-semantics = Small items -> Premotor cortex CAS (c) 2015, C. Faloutsos 109 CMU SCS Neuro-semantics Small items -> Premotor cortex Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating(c) 2015, Coupled Sparse Matrix-Tensor CAS C. Faloutsos 110 Factorizations by 200x, SDM 2014 CMU SCS Part 2: Conclusions • Time-evolving / heterogeneous graphs -> tensors • PARAFAC finds patterns • (GigaTensor/HaTen2 -> fast & scalable) = CAS (c) 2015, C. Faloutsos 111 CMU SCS Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions CAS (c) 2015, C. Faloutsos 112 CMU SCS Thanks Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies Thanks to: NSF IIS-1247489, IIS-0705359, IIS0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, CAS (c) 2015, C. Faloutsos 113 SPRINT, Google, INTEL, HP, iLab CMU SCS Cast Akoglu, Leman Kang, U CAS Araujo, Miguel Beutel, Alex Koutra, Papalexakis, Danai Vagelis (c) 2015, C. Faloutsos Chau, Polo Shah, Neil Hooi, Bryan Song, Hyun Ah 114 CMU SCS CONCLUSION#1 – Big data • Patterns Anomalies • Large datasets reveal patterns/outliers that are invisible otherwise CAS (c) 2015, C. Faloutsos 115 CMU SCS CONCLUSION#2 – tensors • powerful tool = CAS (c) 2015, C. Faloutsos 116 CMU SCS References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/S004 49ED1V01Y201209DMK006 CAS (c) 2015, C. Faloutsos 117 CMU SCS TAKE HOME MESSAGE: Cross-disciplinarity = CAS (c) 2015, C. Faloutsos 118 CMU SCS TAKE HOME you! MESSAGE: Thank Cross-disciplinarity = CAS (c) 2015, C. Faloutsos 119 CMU SCS Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors – P2.1: Discoveries @ phonecall network – P2.2: Discoveries in neuro-semantics – Scalability & Speed • Conclusions CAS (c) 2015, C. Faloutsos 121 CMU SCS Q: spilling to the disk? Reminder: tensor (eg., Subject-verb-object) 144M non-zeros 48M NELL (Never Ending Language Learner) @CMU 26M 26M CAS (c) 2015, C. Faloutsos 122 CMU SCS A: GigaTensor Reminder: tensor (eg., Subject-verb-object) 26M x 48M x 26M, 144M non-zeros NELL (Never Ending Language Learner)@CMU U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos Faloutsos, GigaTensor: Scaling Tensor Analysis Up By 100 CAS (c) 2015, C. Faloutsos 123 Times - Algorithms and Discoveries, KDD’12 CMU SCS A: GigaTensor • GigaTensor solves 100x larger problem (K) (J) GigaTensor 100x Out of Memory CAS (c) 2015, C. Faloutsos (I) Number of nonzero = I / 50 124