Structure, Tie Persistence and Event Detection in Large Phone and SMS Networks
Leman Akoglu and Bhavana Dalvi
{lakoglu, bbd}@cs.cmu.edu
Carnegie Mellon University and iLab

Phone and SMS network
• Around 2M users, 50M edges, 500M phone calls/SMS
• 6 months of data

Structure Analysis
• If A calls/texts B n times, can we say anything about how many times B calls/texts A?
• Are a node's degree and its neighbors' degrees correlated?
• How do the total duration and the number of phone calls and SMSs grow with the number of contacts a user has?
• Does the tie strength between i and j depend on their neighborhood overlap?

Event Detection
• At which points in time does the behavior of the customers change considerably?
• Can the detected change-points be attributed to a set of nodes, i.e. can we characterize which customer(s) cause most of the change?

Methodology
(1) Feature extraction:
• Characterize nodes with 12 network features F: degree (number of contacts), total weight (phone call duration), …
• One T×N time-series matrix per feature, T = 183 days, N = 1.8M users

Tie Persistence
• Tie Persistence (TP): the stability of a tie across time, measured as the number of time-ticks in which the link is observed over the total number P of time-ticks.
• User Perseverance (UP): the average of the persistence of all of a user's ties.
• How can we predict whether a link will persist in the future?
• How are tie and node attributes correlated with each other and with TP and UP?
• Which link and node attributes are important in prediction?
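The TP and UP definitions can be made concrete with a minimal sketch. The record format `(tick, a, b)`, the function names, and the choice to treat ties as undirected are assumptions made for illustration; the poster does not specify them.

```python
from collections import defaultdict


def tie_persistence(events, num_ticks):
    """Return {tie: TP}, with TP = (# time-ticks in which the tie is observed) / P.

    events: iterable of (tick, a, b) records, one per call/SMS (hypothetical format).
    num_ticks: the total number P of time-ticks.
    Assumption: ties are undirected, so (a, b) and (b, a) are the same tie.
    """
    ticks_seen = defaultdict(set)
    for tick, a, b in events:
        tie = tuple(sorted((a, b)))
        ticks_seen[tie].add(tick)  # repeated contacts within one tick count once
    return {tie: len(ticks) / num_ticks for tie, ticks in ticks_seen.items()}


def user_perseverance(tp):
    """Return {user: UP}, the mean TP over all ties that the user takes part in."""
    per_user = defaultdict(list)
    for (a, b), value in tp.items():
        per_user[a].append(value)
        per_user[b].append(value)
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}


# Toy data: tie A-B is seen in ticks 0, 1 and 3; tie A-C only in tick 2.
events = [(0, "A", "B"), (1, "B", "A"), (1, "A", "B"), (3, "A", "B"), (2, "A", "C")]
tp = tie_persistence(events, num_ticks=4)
up = user_perseverance(tp)
```

With P = 4 time-ticks, tie A-B has TP = 3/4 and tie A-C has TP = 1/4, so user A has UP = 0.5.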
Tie Attributes
• Reciprocity (R): 1 if the tie is reciprocal in the time tick
• Topological Overlap (TO): # common neighbours / node degree

Node Attributes
• Degree (K)
• Cluster Coefficient (C): # triads in which the node is involved
• User reciprocity (r): fraction of ties containing both incoming and outgoing calls

Correlations among link attributes
          Delta_C  Delta_K  Delta_r  R      TO     TP
Delta_C   1
Delta_K   0.22     1
Delta_r   0.127    -0.13    1
R         -0.12    -0.4     0.08     1
TO        -0.07    -0.4     0.06     0.41   1
TP        -0.001   0.073    0.043    0.51   0.22   1

Correlations among node attributes
      C       K      r      UP
C     1
K     0.068   1
r     0.27    0.26   1
UP    0.0679  0.07   0.39   1

Logistic Regression Model
• Input: link A-B; link and node attributes at time t
• Prediction: logistic regression
• Output: will link A-B persist at time t+k?

Structure Analysis
Tie strength based on: (a) # SMS, (b) # phone calls, (c) duration of phone calls.
• Reciprocity patterns can be used to spot outliers.
• The degree of a node and the average degree of its neighbors exhibit assortative mixing.
• Total node strength grows superlinearly with increasing degree.
• Tie strength increases with increasing neighborhood overlap on average.

(2) Event Detection:
• Define a sliding window of size W (set to 5 days).
• Generate a correlation matrix C, with Cij being Pearson's correlation between the time series of the pair (i, j) over window W.
• The principal eigenvector of C gives the "activity" of each node.
• Compare "activity" vectors over time by taking the dot product score Z (1 if the same, 0 if perpendicular; flag small Z).

Figure: (left) Z score vs. time with W = 5 and F = inweight (number of calls received); the 10 days with the largest Z score are highlighted with red bars. (middle) u(t) vs. r(t-1) for each node at T = Dec 26th; the 5 nodes with the largest change are marked with red stars. (right) inweight vs. time for the 5 marked nodes; notice the change in calling behavior during the Christmas week.

Dataset used for this work was provided by iLab at Carnegie Mellon University.

• Local network attributes do help to predict tie persistence.
• Using both tie attributes and node attributes improves prediction accuracy.
• Regression techniques give better accuracy than rule-based techniques.
• Large changes in users' "eigen-behaviors" are flagged as change-points (events) in time.
• Our method detected "events" that coincide with major holidays and festivals in our data set.
• These results can be used to spot the top users who contribute most to the detected changes.
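The event-detection steps (sliding window, Pearson correlation matrix, principal eigenvector, dot-product score Z) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each window's activity vector is compared only with the previous window's vector, and the dense N×N correlation matrix is only practical for small N, not for the 1.8M users of the data set.

```python
import numpy as np


def activity_vector(window):
    """Principal eigenvector of the Pearson correlation matrix of one window.

    window: (W, N) array, one row per day and one column per node.
    """
    corr = np.corrcoef(window, rowvar=False)  # N x N correlation matrix
    # Constant series yield NaN correlations; treating them as 0 is an assumption.
    corr = np.nan_to_num(corr)
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues are returned in ascending order
    return eigvecs[:, -1]  # unit-norm eigenvector of the largest eigenvalue


def z_scores(series, w=5):
    """Dot-product score Z between activity vectors of consecutive windows.

    series: (T, N) time-series matrix for a single feature.
    Returns T - w scores: 1 means unchanged, 0 means perpendicular.
    """
    num_windows = series.shape[0] - w + 1
    vectors = [activity_vector(series[t:t + w]) for t in range(num_windows)]
    # The sign of an eigenvector is arbitrary, so compare absolute dot products.
    return np.array([abs(vectors[i] @ vectors[i - 1]) for i in range(1, num_windows)])


# Synthetic stand-in for one feature (e.g. inweight): 30 days, 8 nodes.
rng = np.random.default_rng(0)
series = rng.normal(loc=5.0, scale=1.0, size=(30, 8))
z = z_scores(series, w=5)
flagged = np.argsort(z)[:3]  # window transitions with the smallest Z
```

Time points with small Z are candidate change-points; the per-node difference between consecutive activity vectors could then be inspected to see which nodes contribute most to the change.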