Visual Analytic Techniques for Structural and Functional Discovery in Dynamic Graphs

Shawn Mankad, George Michailidis

• Shawn Mankad is with the Department of Statistics, University of Michigan, Ann Arbor, MI, 48109. E-mail: smankad@umich.edu
• George Michailidis is with the Departments of Statistics and Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, 48109.

Abstract: Time series of graphs are increasingly prevalent in modern data and pose unique challenges to visual exploration and pattern extraction. This paper describes the application of matrix factorizations that enhance existing visualization techniques for exploration and pattern detection in dynamic graph series. The combination of matrix factorization and visualizations allows the user to home in on and display interesting, underlying structure and its evolution over time. The methods are scalable to data sets with a large number of time points or nodes, and can accommodate sudden changes to graph topology. The contribution of our techniques to visual exploration is demonstrated with several dynamic graph series from both synthetic and real world data. The real graphs are obtained from citation and trade networks. These examples illustrate how users can steer the techniques and augment existing visualization methods to discover and display meaningful patterns in sizable graphs over many time points.

Index Terms: Visual analytics, dynamic graph filtering, structural graph discovery, graph visualization, matrix decomposition

1 INTRODUCTION

Visual analytics have become an important class of methods for extracting information from complex data. The combination of sophisticated analytical and visualization techniques allows users to explore and detect interesting patterns in large amounts of data, and to develop a deeper understanding of the underlying mechanisms. Due to advances in data collection technologies, time series of graphs (networks) are increasingly common in a variety of fields, such as sociology [1], biology [2], and finance [3], among others. The analysis of such data is challenging, since time dependent changes may simultaneously affect network topology and node/edge features.

Common visualization techniques for dynamic graphs enhance static methods with animations that move nodes as little as possible between time steps to facilitate readability. However, the effectiveness of these methods relies on the human ability to perceive and remember changes [4]. Moreover, experiments have found that the effectiveness of dynamic layouts is strongly predicted by node speed and target separation [5]. Thus, dynamic graph visualizations encounter difficulties when faced with a large number of time points, with larger graphs that feature abrupt, non-smooth changes, or when the user is interested in detailed analysis, especially at the individual node level (see Section 3.2 of [6], [7], [8]).

In this work, we present methods that address these challenges by utilizing visualizations that leverage matrix factorization to detect and display underlying structure. In particular, we are primarily interested in three tasks that necessitate an approach with mathematical foundations:
(i) Produce static displays of the evolving node connectivity, while explicitly incorporating the temporal dimension. This can be especially useful when one is given a large number of time points, or if the graph sequence contains sudden changes.
(ii) Assign time-varying, overlapping communities, which can be used for node coloring, aggregation, and so on to aid visual analysis.
(iii) Reduce clutter in graph layouts in a principled fashion by removing unimportant edges via a filter. This preprocessing can help facilitate visual exploration and information extraction.
The proposed methodology relies on examining how lower dimensional matrix representations evolve, and on controlling their evolution using a constrained optimization. The constraint strengths, which control how sensitive the matrix representations are to short term fluctuations, are set by the user to steer the analysis. The methodology is scalable to data sets with large numbers of time points or nodes, and can accommodate abrupt changes that are challenging for dynamic layouts.

Visual exploration of dynamic graphs based on matrix factorizations is an important approach to consider for several reasons. First, factorizations allow the user to see typical time-varying node behaviors. Some of these time-varying connectivity patterns may be expected, as in the rapid rise in connectivity of hub nodes in preferential attachment networks, but others may be genuinely new. As a consequence, matrix factorizations also allow the user to see a measure of the dynamic network's complexity, specifically, the number and types of evolving nodes in the data.

The remainder of this article is organized as follows: in the next section, we review related work on visualization of graphs. We provide background on matrix factorizations in Section 3, followed by a description of our approach (Section 4). Mathematical detail behind the proposed factorization is given in Section 5. The corresponding procedure to obtain estimates is discussed in Section 6. We then discuss different displays based on matrix factorizations and exemplify our approach on several simulated and real-world data sets in Sections 7 and 8, and close the article with a brief discussion (Section 9).

2 RELATED WORK ON NETWORK ANALYSIS AND VISUALIZATION

2.1 Static Graphs

There is a large literature focusing on discovering patterns within and among a set of static graphs, that is, graphs that are not changing over time.
Traditional examples of network analysis include discovering community structure, and using different connectivity measures, such as path length, degree, modularity, and so on, to characterize the relative importance of particular subsets of nodes [9], [10], [11]. These traditional tools can experience difficulties with a sequence of networks, as they do not explicitly incorporate any temporal information. Yet, this is a key aspect, as time dependent changes may simultaneously affect network topology and node/edge features.

Due to the challenges of analyzing such complex objects, graph layouts have become important for detecting meaningful structure. References [12], [13] are classical texts that discuss traditional graph layouts, which impose criteria such as display symmetry, minimal edge crossings, uniform edge lengths and so on for aesthetic reasons. A number of software packages have been developed for graph drawing of complex networks (see, for example, [14], [15], [16], [17], [18]).

More recent graph visualization techniques have been developed for graphs on the order of hundreds of thousands of nodes, and depend on the stage and goal of the analysis, and on attributes of the given graphs (e.g., static vs. dynamic, weighted vs. unweighted, etc.) [6], [7]. For example, an overview of large graphs can be performed efficiently using aggregation techniques to avoid drawing every node [19]. For social networks, graph drawing techniques based on semantic and structural abstraction can be used [20], and methods to visualize uncertainty have been developed [21], [22].

Other approaches similar in spirit to this work have been created for community assignment in large, static sparse graphs through computation of eigenvectors of graph related matrices [23]. These communities could be used for aggregation, node coloring, etc., to facilitate visual analysis. While we follow the same principles as these works, we offer analysis for a time series of graphs.

2.2 Dynamic Graphs

When given time-varying graphs, the majority of algorithms are designed for animated drawing of the graph sequence. For example, [24] propose an online algorithm that uses the Graphics Processing Unit to efficiently represent the main global structure of the graphs while preserving the layout's temporal stability. An important challenge for such an approach is to preserve the user's mental map [4], [25], [26]. Specifically, the same overall shape and attributes should be preserved and nodes moved as little as possible between time steps to facilitate readability.

An alternative is the small multiples approach [27], which allows the user to view all time periods simultaneously using a matrix of images. Numerous studies have shown the small multiples approach to be superior for some graph comprehension tasks [4], [28].

However, analyzing even a hundred time points can be challenging for either approach if the user is interested in detailed analysis. The authors of [5] find that the effectiveness of animation is strongly predicted by node speed and target separation. Thus, there exists a bottleneck stemming from the user's cognitive load, as the user must remember patterns over a large time span, or time points must be traversed quickly, increasing node speed. With small multiples, there is usually a scarcity of screen space for so many images. Both approaches can also struggle with larger graphs that feature non-smooth changes to their topology.

In this work, we address the challenge of scalability both in terms of time and network size by representing the network sequence with a set of time series for each node. The paths of each node over all time points can be displayed using static visualizations. Hence, this work is particularly suited for representing temporal changes at the individual node level, a task that can be difficult using graph drawing techniques [8].

Another recent approach, the so-called TimeMatrix [8], relies on viewing adjacency matrices at different levels of aggregation to help users gain insight about the temporal aspects of network sequences. Since the main display is of an adjacency matrix, TimeMatrix performs well when the user is especially interested in the evolution of edges. In contrast to TimeMatrix, our approach focuses on node dynamics. Both approaches complement graph drawing through matrix-based representations, and are designed for analytical tasks that may be difficult with pure drawing approaches.

3 BACKGROUND

In this section, we provide background information on two matrix factorizations and how they have been utilized to facilitate successful application of visualization techniques. The extension to dynamic networks is presented in Section 4.

3.1 Matrix Factorizations

The most common factorization is the Singular Value Decomposition (SVD), which has important connections to community detection [29], graph drawing [30], and areas of statistics and signal processing [31]. For instance, in classical spectral layout, the coordinates of each node are given by the SVD of graph related matrices, and can be calculated efficiently using tools in [30], [32].

The non-negative matrix factorization (NMF) is an alternative factorization that has been shown to be advantageous for visualization of non-negative data [33], [34], [35]. This is typically the case with networks, as edges commonly correspond to flows, capacity, or binary relationships, and hence are non-negative. Recently, theoretical connections between NMF and important problems in data mining have been developed [36], [37], and accordingly, NMF has been proposed for overlapping community detection on static networks [38], [39].

Both types of factorization approximate a given graph related matrix with an outer product

$$A \approx U V^T, \tag{1}$$

where $A$ is usually the $n \times n$ adjacency or Laplacian matrix, and $U$ and $V$ are both $n \times K$ matrices.
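As a concrete reading of Eq. (1), the following minimal Python sketch forms $UV^T$ from two $n \times K$ factors and measures the squared Frobenius error against $A$. The five-node star graph and the helper names `outer_product` and `frobenius_error` are illustrative assumptions, not part of the original method description.

```python
def outer_product(U, V):
    """Return U V^T for U and V stored as n x K lists of rows."""
    # Entry (i, j) is the dot product of row i of U with row j of V.
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V] for u_row in U]


def frobenius_error(A, U, V):
    """Squared Frobenius norm of A - U V^T."""
    R = outer_product(U, V)
    return sum((a - r) ** 2 for a_row, r_row in zip(A, R) for a, r in zip(a_row, r_row))


# Hypothetical 5-node star: node 0 sends an edge to every other node.
A = [[0, 1, 1, 1, 1]] + [[0] * 5 for _ in range(4)]
U = [[1.0], [0.0], [0.0], [0.0], [0.0]]  # n x K with K = 1
V = [[0.0], [1.0], [1.0], [1.0], [1.0]]

print(frobenius_error(A, U, V))  # 0.0, since this star graph has rank one
```

For graphs that are not exactly of rank $K$, the same error function gives a simple way to compare candidate values of $K$.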
The rank or dimension of the approximation $K$ is chosen to obtain a good fit to the data while achieving interpretability. The key difference between SVD and NMF lies in the constraints that are placed on $U$ and $V$.

SVD imposes a particular geometry on the factorization, so that $U$ and $V$ can each be viewed as coordinate systems that fit the data. In particular, each (eigen)vector $U_i$ is perpendicular to every other vector $U_j$, so that the collection $\{U_1, \ldots, U_K\}$ forms a lower dimensional orthonormal space in which the data can be visualized. In addition, $U$ satisfies $U^T U = I$ (orthonormality constraints). A similar characterization holds for $\{V_1, \ldots, V_K\}$.

In contrast, with NMF the orthogonality constraints are replaced with a restriction of non-negativity of the factorized matrices [33], [40]. That is, every element of $U$ and $V$ is greater than or equal to zero. The geometric characterization of SVD is traded for the enhanced interpretability that comes from strictly additive combinations. For instance, consider (1) in element form, $A_{ij} \approx \sum_{k=1}^{K} U_{ik} V_{jk}$. Each term of the sum can be thought of as the contribution of community $k$ to edge $A_{ij}$, since all terms are non-negative.

Both factorizations are useful for discovering interesting node connectivities in networks. The $U$ vectors score nodes by their "interestingness", or distance from the average outgoing connectivity. The $V$ vectors yield similar scores based on incoming connections. Together, $U$ and $V$ are useful for highlighting nodes by their importance to interconnectivity.

For illustration, consider the graph structures and matrix factorizations shown in Fig. 1. The $U$ vector for both NMF and SVD on the Star Network highlights the central node, which has the largest magnitude. The $V$ vectors show that all peripheral nodes are equal in terms of their incoming connections, and that the central node has no incoming connections.

An interesting fact about NMF is that the estimates are always rescalable (scale invariant). For example, with
the Star Network, we can multiply $U$ by 0.5 and $V$ by 2 to obtain

$$U = (1, 0, 0, 0, 0)^T, \quad V = (0, 1, 1, 1, 1)^T. \tag{2}$$

Their product $UV^T$ is unchanged with the rescaling. Thus when interpreting NMF estimates, one cannot compare the magnitudes of $U$ with $V$. Instead, $U$ and $V$ should be considered separately, with emphasis on the relative distributions of scores. For instance, NMF vectors of the Ring Network show each node with an equal score for incoming and outgoing connectivity. The fact that $U$ contains larger elements than $V$ is arbitrary. However, the assignment of equal values within $U$ and $V$ shows each node is equally important to interconnectivity.

The rescaling issue does not apply to SVD due to the imposed orthonormality constraints. Yet, the SVD magnitudes tend to fluctuate with the Ring Network and are harder to interpret given the network structure.

[Figure 1: adjacency matrices of a Star Network and a Ring Network on five nodes, with the corresponding rank-1 NMF and SVD factors $U$, $V$.]
Fig. 1. Rank 1 non-negative and singular value matrix factorizations. The SVD is computed with the Laplacian matrix.

With noise and time-varying evolution, connectivity patterns can be hidden from traditional approaches. Our approach utilizes matrix factorizations with additional constraints to filter or smooth out the noise, and provide a basis for visualization of node evolution and graph structure. Before moving to our proposed approach, we provide some background for matrix factorizations with additional constraints.

3.2 Penalized Matrix Factorizations

The use of additional constraints in matrix factorizations is a common technique to more fully reveal structure within the data.
We refer to this class of models as penalized matrix factorizations, since usually the constraints are represented as penalties using the Lagrangian form of an objective function. In penalized matrix decompositions, the factorized matrices are obtained through minimizing an objective function that consists of a goodness of fit component and a roughness penalty. The strength of the penalty is set by the user, where a larger penalty encourages smoother $U$ and $V$.

For instance, [41], [42], [43] add penalties for SVD on certain types of data. These penalties effectively relax the rigid orthogonality constraints of SVD, thus allowing the factorization to better represent the particular data structures. Penalties with NMF are also common (see [44], [45], [46], [47], [48] and references therein).

These previous works usually consider a static setting, that is, applying factorization to a single matrix. We use penalties as a way to extend matrix factorization to a dynamic sequence of graphs. Thus, our problem poses additional challenges because we observe a sequence of adjacency matrices, and does not directly fit into existing approaches due to either the time series component or multiple, correlated nodes at each time point.

In the next sections, we present the additional constraints to extend matrix factorizations for graph sequences, develop the constraints using a Lagrangian penalty in an optimization, and provide estimation algorithms.

4 OVERVIEW OF PROPOSED APPROACH

Given a time series of networks $\{G_t = (V_t, E_t)\}_{t=1}^{T}$ with corresponding adjacency matrices $\{A_t\}_{t=1}^{T}$, the goal is to produce a sequence of lower dimensional matrix factorizations. To enhance their visualization and interpretability, we impose certain constraints on the factorizations. In particular, the constraints aim to satisfy the following properties:
1) The evolving basis $U_t$ should exhibit temporal stability to preserve the mental map.
2) Nodes that are known to be similar at a particular time should be close together in $U_t$.
3) Insignificant nodes should be set exactly to zero in $V_t$ to enhance interpretability.

The first property aims to preserve the "mental map" in displays of $U_t$. This is a fundamental concern, as the effectiveness of such representations relies on the human ability to perceive and remember changes [4]. Node trajectories are visually smooth when this property is satisfied. As a consequence, time plots of each node become informative. The user can see typical time-varying node behaviors, and the number and types of behaviors in the data. The time plots form a set of static displays that explicitly incorporate the temporal dimension. These aspects help with detailed analysis and avoid difficulties with animated or dynamic graph layouts when given a large number of time points or nodes.

Whereas the first property deals with temporal structure, the second deals with "spatial" structure. In fact, the corresponding constraint encourages nodes in the same group or "cluster" to evolve together. Thus, this property is useful when incorporating prior knowledge of node groups at different points in time. If such information is unknown a priori, then this property/constraint can be omitted.

The third property deals with the time-varying factors ($V_t$). Setting unimportant nodes to zero improves overall interpretability of visuals and facilitates analysis by identifying important nodes at different points in time. A heatmap or display of the nonzero status of each element is appropriate, due to the penalty form on $V_t$.

In addition to the displays of $U_t$ and $V_t$, matrix factorization procedures can be useful for community detection and filtration of the observed graphs. With any estimation of $U_t$ and $V_t$, a reconstruction of the given graph is also obtained by taking the product $U_t V_t^T$.
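The reconstruction step can be sketched in a few lines of Python. The sketch assumes a simple thresholding rule: an observed edge is kept only if its reconstructed weight in $U_t V_t^T$ reaches a user-chosen threshold. The rule, the threshold value, the toy graph, and the helper names `reconstruct` and `filter_edges` are assumptions made for illustration; the text above does not fix a specific filter.

```python
def reconstruct(U, V):
    """Return the reconstruction U V^T for n x K factors stored as lists of rows."""
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V] for u_row in U]


def filter_edges(A, U, V, threshold):
    """Keep an observed edge only if its reconstructed weight reaches the threshold."""
    R = reconstruct(U, V)
    return [[a if r >= threshold else 0 for a, r in zip(a_row, r_row)]
            for a_row, r_row in zip(A, R)]


# Hypothetical 5-node graph: a star centred on node 0 plus one stray edge 3 -> 4.
A = [[0, 1, 1, 1, 1],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
U = [[1.0], [0.0], [0.0], [0.0], [0.0]]  # rank-1 factors capturing the star only
V = [[0.0], [1.0], [1.0], [1.0], [1.0]]

filtered = filter_edges(A, U, V, threshold=0.5)  # the stray edge 3 -> 4 is dropped
```

Here the rank-1 factors describe only the star, so the stray edge has reconstructed weight zero and is removed, while the four star edges are retained.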
With large and complex graphs, the filtered versions highlight important relations and reduce clutter in subsequent visualizations by selectively removing edges.

4.1 Illustrative Example

Before discussing mathematical and procedural details, we illustrate a main benefit of the proposed approach with simulated data. We consider a sequence of weighted, directed random graphs, with two embedded groups whose within-group connections are time-varying. Weighted directional networks with evolving groups are of interest in diverse areas, including economic, technology, and biological networks. Fig. 2 illustrates an important aspect of the simulation, namely that the within-group connections evolve according to the following functional shapes:

$$f_1(t) \propto \frac{t - 25}{\sqrt{1 + (t - 25)^2}}, \tag{3}$$

$$f_2(t) \propto I\{t > 33\}. \tag{4}$$

[Figure 2: schematic of the two embedded groups and their time-varying average edge weights.]
Fig. 2. Schematic of the illustrative example. Edges exist independently with some probability, with average weights given by the dashed line. The two within group connections have average weights that follow particular curves over time.

Nodes belonging to the first group connect to each other with weights following a sigmoid (growth) curve, which may be conceptually similar to countries in trading networks that experience persistent and rapid increases in trade. This group is composed of nodes 80 to 100. Nodes belonging to the second group trade with each other at a stable level only after a particular time. This type of pattern could be observed with citation networks, when papers (nodes) enter into the network and quickly reach their maximal number of connections. This group is composed of nodes 10 to 20. All other edges exist independently with a fixed probability, with average weights held fixed. There are 100 time points and 100 nodes in each time period.

We use this data to compare four models:
1) The direct NMF applies classical NMF to each data slice separately, without any additional smoothness constraints.
2) The penalized NMF applies NMF with our proposed additional constraints.
3) The direct SVD applies classical SVD to each data slice separately.
4) The penalized SVD applies SVD with our proposed additional constraints.

For each model, we fit a sequence of one dimensional factorizations to facilitate visualization. The parameters that control the temporal and sparsity constraint strengths are searched over a grid of values. Displays for each set of parameters are made, with the one that emphasizes the structure most shown in Fig. 3. This strategy is feasible, as the estimation procedure is computationally efficient.

For each estimate, we reorganize $\{\hat{U}_t\}$ into a multidimensional time series

$$\begin{pmatrix} (\hat{U}_1)_1 & (\hat{U}_1)_2 & \cdots & (\hat{U}_1)_n \\ (\hat{U}_2)_1 & (\hat{U}_2)_2 & \cdots & (\hat{U}_2)_n \\ \vdots & \vdots & & \vdots \\ (\hat{U}_T)_1 & (\hat{U}_T)_2 & \cdots & (\hat{U}_T)_n \end{pmatrix}, \tag{5}$$

where each row corresponds to a time point and columns index nodes. Larger rank factorizations would result in multiple time series for every node, one for each dimension. The same organizational scheme is used for $\{\hat{V}_t\}$.

Time series for the node positions in $\hat{U}_t$ and $\hat{V}_t$ are shown in Fig. 3. For all model specifications, one could identify the three node groups from the time plots. However, the penalized NMF is most representative of the actual functions the groups follow. Similarly, the heatmaps for the penalized factorizations are the most satisfactory, as the sparsity patterns complement the time plots by identifying when groups become distinguishable. Altogether, the smoothed NMF displays appear most informative, as the main structural patterns underlying the data are more clearly represented and match the true functions governing the data generation mechanism best. The displays combine to show typical time-varying node behaviors, the number and types of node evolution in the data, and when node groups become active.

5 OPTIMIZATION PROBLEM

Returning to our given time series of adjacency matrices $\{A_t\}_{t=1}^{T}$, the first component of the proposed objective function measures goodness of fit:

$$\min_{\{U_t, V_t\}_{t=1}^{T}} \sum_{t=1}^{T} \|A_t - U_t V_t^T\|_F^2. \tag{6}$$

After translating the properties above into constraints, we write them as penalties through Lagrange multipliers to facilitate estimation. Thus, the factorized matrices are obtained through minimizing an objective function that consists of a goodness of fit component and roughness penalties. The final, proposed objective function becomes

$$\begin{aligned} \min_{\{U_t, V_t\}_{t=1}^{T}} \; & \sum_{t=1}^{T} \|A_t - U_t V_t^T\|_F^2 \\ & + \lambda_t \sum_{t=1}^{T} \sum_{\tilde{t} = t - W/2}^{t + W/2} \|U_t - U_{\tilde{t}}\|_F^2 \\ & + \lambda_g \sum_{t=1}^{T} \sum_{i=1}^{n} \sum_{j \in N(i)} \|U_t(i,:) - U_t(j,:)\|_2^2 \\ & + \lambda_s \sum_{t=1}^{T} \sum_{i=1}^{n} \sum_{j=1}^{K} |V_t(i,j)|, \end{aligned} \tag{7}$$

where $W$ is a small integer representing a time window and $N(i)$ denotes the neighborhood or group that node $i$ belongs to.

The parameters $\lambda_t$, $\lambda_g$, $\lambda_s$ and $W$ are all non-negative numbers set by the user to steer the analysis. In many applications it is appropriate to use $W = 2$ (looking one time period ahead and before) and $\lambda_g = 0$, so that only $\lambda_t$ and $\lambda_s$ need to be defined.

The first penalty term controls for short term fluctuations in the evolving basis $U_t$. Hence, the visual effect of setting larger $\lambda_t$ is to create smoother paths over time for each node in $U_t$. Larger penalization levels force nodes to have similar positions as in neighboring time steps. In fact, if $\lambda_t$ is set to an extremely large number, then $U_t$ will be approximately constant for all time periods, i.e., $U_t \approx U_{\tilde{t}}$ for all $t, \tilde{t}$. This relation provides a useful guideline and upper bound when setting the penalty level. As shown in Fig. 5 in the supplemental material, if $\lambda_t$ is too large, the trajectories overlap and exhibit little variation due to over-smoothing. Setting $\lambda_t$ to zero is also possible, but then the time plots become difficult to interpret due to temporal instabilities.

$\lambda_g$ corresponds to the second (group) property in the previous section that nodes in the same group should evolve similarly.
The actual penalty is similar in spirit to the first penalty term. It controls the fluctuations of the group within the basis $U_t$ at each point in time. Without prior knowledge of group structure, $\lambda_g$ is set to zero so that the constraint is optional. Otherwise, if such structure is known, larger values of $\lambda_g$ more strongly encourage groups to evolve similarly in $U_t$.

$\lambda_s$ corresponds to the third property that unimportant nodes should be set exactly to zero in $V_t$. Appropriate $\lambda_s$ tends to emphasize the main patterns in the data. Very large $\lambda_s$ can result in numerical instability and degenerate solutions, e.g., force all values to zero. As a general guideline, small-to-moderate amounts of sparsity seem to improve interpretability; as a rough rule, we find that setting $\lambda_s$ five to ten times smaller than $\lambda_t$ yields interpretable displays. This will be discussed further and demonstrated in the applications.

The parameter $W$ controls the window width for smoothing, i.e., the number of neighboring time steps to average over. Larger values of $W$ mean that the model has more memory, so it incorporates more time points for estimation. This risks missing sharper changes in the data and only detecting the most persistent patterns. On the other hand, small values of $W$ make the fitting more sensitive to sharp changes, but increase short term fluctuations due to the smaller number of observations. We set $W = 2$ (looking one time period ahead and before) for all presented case studies. Larger values could be used in very noisy settings to further smooth results.

[Figure 3: panels for Direct NMF, Penalized NMF, Direct SVD, and Penalized SVD, showing $\hat{U}_t$ and $\hat{V}_t$.]
Fig. 3. Estimates for the illustrative example under different model specifications. Each line (trajectory) corresponds to a node.

6 OBTAINING ESTIMATES

In this section, we focus on the algorithmic aspects of NMF, since we find it is preferable to SVD for visual analysis. Additional details are provided in the supplemental material on the optimization and corresponding algorithm for SVD.

Below we give the algorithm accommodating temporal and sparsity penalties only, i.e., without the group penalty. Though the final algorithm with a group penalty is almost identical to the one presented below, there is some added algebraic complexity stemming from potentially arbitrary group structure. We avoid this difficulty by writing the temporal and group penalties with a Laplacian smoothing matrix. Details are provided in the supplemental material.

The benchmark algorithm for NMF was proposed by Lee and Seung [33], [40], and is known as "multiplicative updating". The algorithm can be viewed as an adaptive gradient descent, and was shown to find local minima of the objective function. It is relatively simple to implement, but can converge slowly due to its linear convergence rate [49]. In practice we find that after a handful of iterations, the algorithm results in visually meaningful factorizations.

The multiplicative updating algorithm is shown in Algo. 1. The updating rules are derived from standard arguments. Details are again given in the supplemental material.

Algorithm 1 Algorithm for penalized NMF with temporal and sparsity constraints
1: Set constants $\lambda_t$, $\lambda_s$, $W$.
2: Initialize $\{U_t\}$, $\{V_t\}$ as dense, positive random matrices.
3: repeat
4:   for $t = 1, \ldots, T$ do
5:     Set $(U_t)_{ij} \leftarrow (U_t)_{ij} \dfrac{\left(A_t V_t + \lambda_t \sum_{\tilde{t} = t - W/2}^{t-1} U_{\tilde{t}} + \lambda_t \sum_{\tilde{t} = t+1}^{t + W/2} U_{\tilde{t}}\right)_{ij}}{\left(U_t V_t^T V_t + W \lambda_t U_t\right)_{ij}}$.
6:     Set $(V_t)_{ij} \leftarrow (V_t)_{ij} \dfrac{(A_t^T U_t)_{ij}}{(V_t U_t^T U_t)_{ij} + \lambda_s}$.
7:   end for
8: until Convergence

7 VISUALIZING EVOLVING NODE CONNECTIVITIES

After estimating the matrix factors, we reorganize each dimension of $\{U_t\}$ and $\{V_t\}$ into multidimensional time series, as shown in (5). Time plots and heatmaps (or displays of non-zero entries) for each dimension of $U_t$ and $V_t$, respectively, are generated.
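The estimation and reorganization steps can be sketched end to end in plain Python. This is a minimal sketch of the updates in Algorithm 1 followed by the arrangement in (5), not the authors' implementation. Several choices are assumptions: matrices are stored as lists of rows, a fixed iteration count replaces the convergence check, a small `eps` guards against division by zero, and at the first and last time points the factor $W$ in the denominator is replaced by the number of neighboring time steps that actually exist. All function names are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def penalized_nmf(A, K=1, lam_t=1.0, lam_s=0.1, W=2, n_iter=100, seed=0, eps=1e-12):
    """Multiplicative updates for a list A of T non-negative n x n matrices."""
    T, n = len(A), len(A[0])
    rng = random.Random(seed)

    def init():
        # Dense, strictly positive random starting values.
        return [[[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(n)]
                for _ in range(T)]

    U, V = init(), init()
    half = W // 2
    for _ in range(n_iter):  # fixed iteration budget (assumption)
        for t in range(T):
            # Time steps inside the window that exist (boundary handling is an assumption).
            nbrs = [s for s in range(t - half, t + half + 1) if s != t and 0 <= s < T]
            AV = matmul(A[t], V[t])
            UVtV = matmul(U[t], matmul(transpose(V[t]), V[t]))
            for i in range(n):
                for k in range(K):
                    num = AV[i][k] + lam_t * sum(U[s][i][k] for s in nbrs)
                    den = UVtV[i][k] + len(nbrs) * lam_t * U[t][i][k] + eps
                    U[t][i][k] *= num / den
            AtU = matmul(transpose(A[t]), U[t])
            VUtU = matmul(V[t], matmul(transpose(U[t]), U[t]))
            for i in range(n):
                for k in range(K):
                    V[t][i][k] *= AtU[i][k] / (VUtU[i][k] + lam_s + eps)
    return U, V


def node_time_series(factors, k=0):
    """Arrange dimension k of {U_t} or {V_t} as a T x n table, as in (5)."""
    return [[row[k] for row in F_t] for F_t in factors]


# Hypothetical input: the same 5-node star graph observed at three time points.
star = [[0, 1, 1, 1, 1]] + [[0] * 5 for _ in range(4)]
U, V = penalized_nmf([star] * 3, K=1, lam_t=0.1, lam_s=0.01)
series = node_time_series(U)  # series[t][i] is the score of node i at time t
```

Each column of `series` is the trajectory of one node and can be passed directly to a line plot, while the zero pattern of `node_time_series(V)` supplies the on-off display.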
From the time plots of $U_t$, one can see typical time-varying node trajectories, and the number and types of nodes in the data. The $V_t$ are useful for identifying when particular nodes or groups of nodes become important from a connectivity perspective. In particular, the penalty on $V_t$ drives nodes to zero exactly when unimportant. As a consequence, we visualize the sparsity pattern with heatmaps.

When the factorization displays are combined, the user can discover potential groups that evolve similarly ($U_t$) and whether the trajectories are important ($V_t$). This provides analysts a way to uncover dynamic structure different from typical dense clumps on the network. Moreover, it provides an exploratory view of all time points and node connectivities simultaneously in a set of static displays.

7.1 Case Studies

7.1.1 Preferential Attachment Process

In this simulation, we observe 100 noisy snapshots of a preferential attachment graph as it forms. Nodes attach according to a preferential attachment model until 10000 nodes have "attached" to the embedding. We observe this growing process at 100 uniformly spaced time points. Thus, at each time point 100 new nodes attach to the graph. We use source code from a networks MATLAB toolbox [50] that generates preferential attachment graphs according to the standard model [10], [51].

In the preferential attachment model, $\Pi(i)$, which represents the probability that a new node connects to node $i$, depends on node $i$'s degree. Specifically, we have

$$\Pi(i) \propto d_i, \tag{8}$$

where $d_i$ is the degree of the $i$th node. This generating framework leads to large networks whose degree distribution follows a power-law distribution with parameter $\gamma = 3$. Graphs with heavy-tailed degree distributions are commonly observed in a variety of areas, such as the Internet, protein interactions, citation networks, among others [52].

[Figure 4: panels (a) No penalties, (b) $\lambda_t = 50$, $\lambda_s = 5$, (c) $\lambda_t = 100$, $\lambda_s = 5$.]
Fig. 4. Fitted values for $U$ and $V$ over time for the preferential attachment embedding. The left column shows a time plot of $U_t$ over different parameter values. Each line corresponds to a node on the graph. The right column identifies the nonzero elements of $V_t$. Each row corresponds to a node on the graph and time varies along the horizontal axis.

Given appropriate levels of penalization, NMF yields interpretable decompositions with just a sequence of one-dimensional ($K = 1$) approximations. In one-dimensional space, important nodes have distinct trajectories that indicate their relative importance to the network. In preferential attachment, nodes that acquire more connections will increase their degree at a higher rate as time goes on. This consequence of the generating process is reflected in the estimated $U_t$, shown in Fig. 4. We can see many nodes with trajectories near zero, or with trajectories that increase at a slow rate. It is difficult to discern by eye which of these curves are important. The displays of $V_t$ that indicate nonzero elements convey this information. We show the non-zero status to denote the on-off relationship clearly to the user, since some nodes have values very close, but not exactly equal, to zero. The connectivity pattern, such as attachment order, is clearly conveyed in the pseudo upper triangular form. Rows (nodes) are ordered according to their sums.

Fig. 4 shows the estimated factorizations for three different levels of penalization. Without a temporal penalty, the time plots emphasize only the most dominant, highest degree node. Setting non-zero $\lambda_t$ smooths the node trajectories and highlights important nodes as other vertices attach to them. Sparsity is then important to keep displays of $V_t$ uncluttered and hence interpretable.

7.1.2 arXiv Citations

We investigate a time series of citation networks provided as part of the 2003 KDD Cup [53]. The graphs are from the e-print service arXiv for the "high energy physics theory" section.
The data cover papers in the period from October 1993 to December 2002, and are organized into monthly networks. In particular, if paper $i$ cites paper $j$, then the graph contains a directed edge from $i$ to $j$. Any citations to or from papers outside the dataset are not included. We also choose to aggregate edges; that is, the citation graph for a given month contains all citations from the beginning of the data up to, and including, the current month. Altogether, there are 22750 papers (nodes) with 176602 edges over 112 months. Statistical properties of the data were discussed in [54], which found that the networks feature decreasing diameter over time and heavy-tailed degree distributions.

Fig. 5 compares the direct and penalized factorizations using a sequence of one-dimensional approximations (an inner rank of one, $K = 1$). As observed with the preferential attachment experiments, the time plots are difficult to read without penalties. Employing penalties smooths the paper trajectories effectively and highlights the important papers. Moreover, the displays of $V_t$ show behavior similar to the preferential attachment setting. A main difference is that papers do not appear uniformly throughout time. They ‘attach’ at a faster rate around the year 2000. Again, the displays show the non-zero status to denote the on-off relationship more clearly to the user, as some nodes have values very close, but not exactly equal, to zero.

As the penalized fits show, there are two important periods in the data. The first period covers 1996-1999 and features papers mostly on an extension of string theory called M-theory. M-theory was first proposed in 1995 and led to new research in theoretical physics. A number of scientists, including Witten, Sen, Polchinski, and Duff, were important to the historical development of the theory, and as seen in Table 1, our NMF approach identifies these important authors and their works.
From 1999-2000 the citations to these papers decreased, while focus shifted to other topics and subfields that M-theory gave rise to. The top papers from the year 2000 and after also include review papers on M-theory, signaling the maturity of the topic. Tables 1 and 2 show the top 10 papers in each time period.

Once again, we have provided a simple workflow that allows the user to visually uncover the patterns in the data. We first fit the penalized, rank 1 NMF for each graph. We then display the estimated components, and from these displays are able to identify the key papers and individuals that contributed to high energy and theoretical physics.

8 VISUALIZING COMMUNITY DETECTION AND EMBEDDED STRUCTURE

Next, we discuss interpretations and visualizations that can be facilitated by using matrix factorizations as a community detection and filtering mechanism.

8.1 Community Detection

In addition to visualizing evolving connectivity with $U_t$ and $V_t$, matrix factorization procedures are useful for overlapping community detection. In particular, we use the smooth $U_t$ to group nodes, resulting in a community structure that is temporally stable. The rank ($K$) of $U_t$ corresponds to the number of communities in the graph sequence. The contribution of each community to node $i$ is measured by the relative magnitude of the $i$-th element of each dimension of $U_t$, e.g., $(U_t)_{ik} / \sum_{k=1}^{K} (U_t)_{ik}$.

In the following case studies, we utilize a small extension to traditional graph drawing that effectively conveys the overlapping community structure to the user. Specifically, we color each node according to the relative contribution of each community with a pie chart.

In principle, one can also use $V_t$ to measure relative community contribution. However, this can sometimes be visually unsatisfactory due to transient community assignments.
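As a minimal sketch of the $U_t$-based relative contribution defined earlier in this subsection, the following Python snippet row-normalizes a nonnegative factor so that each row gives the pie-chart shares of one node. The function name `community_membership`, the tolerance `eps`, and the toy matrix are illustrative assumptions and are not taken from the paper or its software.

```python
import numpy as np

def community_membership(U, eps=1e-12):
    """Row-normalize a nonnegative (n x K) factor U_t.

    Entry (i, k) of the result is (U_t)_ik divided by the sum of row i,
    i.e. the share of community k in node i. Rows that are entirely zero
    are left as zeros (an assumption made here for illustration).
    """
    U = np.asarray(U, dtype=float)
    row_sums = U.sum(axis=1, keepdims=True)
    shares = np.zeros_like(U)
    # Divide only where the row sum is positive to avoid 0/0.
    np.divide(U, row_sums, out=shares, where=row_sums > eps)
    return shares

# Toy example with n = 3 nodes and K = 2 communities (hypothetical values).
U_t = np.array([[2.0, 2.0],
                [3.0, 1.0],
                [0.0, 0.0]])
shares = community_membership(U_t)
print(shares)
```

Each non-zero row of `shares` sums to one, so it can be passed directly to a pie-chart node glyph in a graph drawing tool.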
Previous work in a static setting [38] used the full product $(U_t)_{ik}(V_t)_{jk} / \sum_{k=1}^{K} (U_t)_{ik}(V_t)_{jk}$ to measure the relative contribution of each community to each edge $(A_t)_{ij}$. First, all edges are assigned to the community with the largest relative contribution. Then, for a given node, the pie chart displays the proportion of its edges that belong to each community. However, this can also be visually unsatisfactory due to unstable community assignments. Using the smooth $U_t$ and defining communities in terms of nodes ensures the stability of the community structure through time.

[Fig. 5 panels: No Penalty; $\lambda_s = 1$; $\lambda_s = 5$; $\lambda_s = 20$]
Fig. 5. Fitted values for $U_t$ and $V_t$ for the arXiv data with $\lambda_t = 100$. Each light gray line corresponds to a paper (node) on the graph. The bold lines show the average of the 10 papers with highest average $\hat{U}$ from 1996-1999, and 2001 onwards (dashed). Each row in the heatmaps corresponds to a paper and time varies along the horizontal axis.

TABLE 1
The top 10 papers with highest average $\hat{U}$ from 1996-1999. "# citations" counts all references to the work, including by papers outside of our data. These counts were obtained via Google.

Title | Authors | In-Degree | Out-Degree | # citations (Google)
--- | --- | --- | --- | ---
Evidence for F-Theory | Vafa | 135 | 4 | 993
Notes on D-Branes | Polchinski, et al. | 75 | 7 | 556
Harmonic superpositions of M-branes | Tseytlin | 54 | 4 | 374
Comments on String Dynamics in Six Dimensions | Seiberg and Witten | 44 | 2 | 285
Strings on Orientifolds | Dabholkar and Park | 39 | 2 | 195
M-Theory (the Theory Formerly Known as Strings) | Duff | 35 | 3 | 283
An Introduction to Non-perturbative String Theory | Sen | 34 | 21 | 189
BPS Quantization of the Five-Brane | Dijkgraaf, et al. | 33 | 3 | 156
Black Holes and Solitons in String Theory | Youm | 29 | 24 | 163
BPS Spectrum of the Five-Brane and Black Hole Entropy | Dijkgraaf, et al. | 28 | 4 | 181

TABLE 2
The top 10 papers with highest average $\hat{U}$ from 2001 onwards.

Title | Authors | In-Degree | Out-Degree | # citations (Google)
--- | --- | --- | --- | ---
Supergravity and a Confining Gauge Theory: Duality Cascades and χSB-Resolution of Naked Singularities | Klebanov and Strassler | 197 | 46 | 1271
The String Dual of a Confining Four-Dimensional Gauge Theory | Polchinski and Strassler | 179 | 83 | 435
Gravity Duals of Supersymmetric SU(N) x SU(N+M) Gauge Theories | Klebanov and Tseytlin | 112 | 40 | 453
Curvature Singularities: the Good, the Bad, and the Naked | Gubser | 102 | 70 | 211
M(atrix) Theory: Matrix Quantum Mechanics as a Fundamental Theory | Taylor | 60 | 274 | 232
TASI Lectures: Introduction to the AdS/CFT Correspondence | Klebanov | 25 | 74 | 137
Anatomy of Two Holographic Renormalization Group Flows | Bianchi, et al. | 21 | 55 | 54
D3-brane Holography | Danielsson, et al. | 12 | 69 | 26
The Holographic Renormalization Group | Boer | 6 | 46 | 34
Strings, Branes and Extra Dimensions | Forste | 5 | 302 | 56

8.2 Discovering and Visualizing Embedded Structure

The matrix factorizations also serve as a filter through the product $U_t V_t^T$. The filtered graphs, which selectively remove unimportant edges, can help highlight important node relations when used in combination with network visualization tools. The different penalty settings are used to control how edges are removed from the original graph. For instance, with a very large temporal penalty, $U_t$ becomes effectively constant. This forces the reconstruction to reflect an “average” community structure from the full time period.

There are additional issues to consider when one is primarily interested in filtering, which we highlight next. Unweighted graphs feature adjacency matrices that are defined by the location of zeros. However, Algo. 1 results in reconstructions $\hat{A}_t = U_t V_t^T$ that commonly contain elements close to, but not exactly equal to, zero. Hence, if graph filtering is the goal, one must also set a small threshold to define edges. In particular, if the estimated value $(\hat{A})_{ij}$ is less than the threshold, then $(\hat{A})_{ij}$ is set to zero.
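The thresholding rule just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `filter_graph` and the toy factors are hypothetical, and the factors $U_t$ and $V_t$ are assumed to have already been estimated (for example by Algo. 1).

```python
import numpy as np

def filter_graph(U, V, threshold):
    """Reconstruct A_hat = U V^T and zero out entries below `threshold`.

    U and V are (n x K) nonnegative factors for a single time point.
    Entries of the reconstruction that are strictly less than the
    threshold are set to zero; all other entries are kept unchanged.
    """
    A_hat = np.asarray(U, dtype=float) @ np.asarray(V, dtype=float).T
    return np.where(A_hat < threshold, 0.0, A_hat)

# Hypothetical factors for n = 3 nodes and K = 2.
U_t = np.array([[1.0, 0.0],
                [0.5, 0.1],
                [0.0, 1.0]])
V_t = np.array([[1.0, 0.0],
                [0.1, 0.3],
                [0.0, 1.0]])
A_filtered = filter_graph(U_t, V_t, threshold=0.2)
print(A_filtered)
```

For an unweighted graph, the surviving entries could additionally be set to one (for instance with `(A_filtered > 0).astype(int)`) before the result is handed to a layout routine; whether to binarize is a design choice left to the analyst.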
After this thresholding step, the filtered graph can be fed into traditional graph drawing tools.

Since undirected graphs feature symmetric adjacency matrices, it may seem natural to impose symmetry upon our matrix factorization:

$$A_t \approx U_t U_t^T \qquad (9)$$

We find that symmetric NMF is far more sensitive to penalization than its general counterpart. It exhibits less flexibility, since any additional constraint strongly influences the overall accuracy of the estimation. On the other hand, with general matrix factorization, as $V_t$ changes, $U_t$ compensates so that the final product reproduces the data as well as possible. For completeness, the updating rules for the symmetric matrix factorizations are provided in the supplemental material.

8.3 Case Studies

Below, two new case studies are presented. The arXiv citation data is also discussed in the supplemental material.

8.3.1 Catalano Communication Network

Fig. 6. The cell phone network from a day using a force directed layout algorithm in igraph. Node 200 is colored black. A filtered version of this graph is shown in Fig. 7.

We demonstrate the value of the proposed approach as a data filter by analyzing the Catalano social network, which was part of the VAST 2008 challenge [55]. The synthetic data consists of 400 unique cell phone IDs over a ten day period.
Altogether, there are 9834 phone records with the following fields: calling phone identifier, receiving phone identifier, date, time of day, call duration, and cell tower closest to the call origin. The purpose of the challenge was to characterize the social structure over time for a fictitious, controversial sociopolitical movement. In particular, the challenge requires identifying five key individuals that organize activities and communications for the network; a hint was given to challenge participants that node 200 is one of the persons of interest. We use the first seven days of data to illustrate our methodology, since there is a strong change in the connection patterns of node 200 during days 8-10 (see [55], [56] and references therein).

Directed networks were constructed daily by drawing an edge from the caller to the receiver. Fig. 6 shows an example of one day’s network. The graph is too cluttered to visually identify leaders of the network or to get a sense of the network structure. We apply the penalized NMF to filter the networks and highlight the main structural patterns. Specifically, we use a large temporal penalty to keep only the most persistent interactions while removing transient communications. Thus, the temporal aspect of the data is utilized through the penalization to discover important interactions.

The reconstructed network is shown in Fig. 7. The persons of interest and the hierarchical structure of the communication network are clearly shown. Node 200 relays information to his neighbors (1, 2, 3, 5), and each neighbor disseminates information to his respective subordinates. The pie chart on each node displays the contribution of each community measured with $U_t$. Fig. 7 shows that nodes higher up in the social hierarchy tend to belong to multiple communities, presumably since they disseminate information to different groups of subordinates.
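The daily network construction described above (an edge from the caller to the receiver for each day) can be sketched as follows. The record layout `(caller, receiver, day)` with zero-based integer indices, and the function name `daily_adjacency`, are assumptions made for illustration; the actual challenge files use their own field formats, and edges are taken to be binary here although call counts or durations could serve as weights.

```python
import numpy as np

def daily_adjacency(records, n_nodes, n_days):
    """Build one directed adjacency matrix per day from call records.

    `records` is an iterable of (caller, receiver, day) tuples with
    zero-based integer indices (a hypothetical layout). Repeated calls
    between the same pair on the same day yield a single binary edge.
    """
    A = np.zeros((n_days, n_nodes, n_nodes))
    for caller, receiver, day in records:
        A[day, caller, receiver] = 1.0  # edge points from caller to receiver
    return A

# Toy example: 3 phones, 2 days, one repeated call on day 0.
records = [(0, 1, 0), (0, 1, 0), (2, 0, 1)]
A = daily_adjacency(records, n_nodes=3, n_days=2)
print(A[0])
print(A[1])
```

The resulting array `A[t]` plays the role of the adjacency matrix $A_t$ that the penalized factorization takes as input.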
The filter was run using the following parameter settings: $K = 7$, $W = 2$, $\lambda_t = 10$, $\lambda_g = \lambda_s = 0$, and a threshold of 0.2 to discretize the reconstruction. Different values of $K$, $\lambda_t$ and the post-processing threshold were evaluated with a grid search. Parameter values were chosen to emphasize readability and interpretability of the filtered graph embeddings. We find that the key persons of interest and the hierarchical structure are robust to the number of communities (inner rank, $K$) of the reconstruction. However, the community assignments are not as interpretable with smaller $K$. Additional details are given in the supplemental material.

Fig. 7. The filtered network using Fig. 6 as input. A force directed layout in igraph was used to create this embedding. Nodes are colored by soft partitioning via the penalized NMF.

Since we apply the matrix factorization as a filter, accuracy in the reconstruction is important. We set $\lambda_s = 0$, as sparsity can cause the reconstruction to miss important edges. The results could be sharpened further by utilizing the additional information, such as geographic location and call duration. This could be done in our modeling approach by defining groups and employing a group penalty.

VAST never officially released correct answers for the challenge. However, our analysis closely matches winning entries [56], [57], [58]. Treating the conclusions of the entries as ground truth, we have provided a simple workflow that uncovers the patterns in the data. By plotting the filtered graphs, we are able to correctly identify the key individuals that organize activities and communications for the network. Moreover, the use of matrix factorizations as a community detection tool helps the graph layout communicate the organizational structure underlying the socio-political movement.

8.3.2 Global Trade Flows

In this example, the data consists of annual, total bilateral trade flows between 164 countries from 1980-1997 [59].
Thus, we observe a dynamic, weighted graph at 18 time points, where each directional edge denotes the total value of exports from one country to another. Since trade flows can differ in size by orders of magnitude, we work with trade values that are expressed in log dollars.

Fig. 7 in the supplemental material shows examples of the data. The representations are too dense to convey much information. In fact, a typical analysis primarily relies on non-visual approaches. Aggregate statistics, such as import and export totals or measures of centrality, are commonly used to identify important countries and flows [60].

Our approach additionally utilizes the time aspect of connections to uncover interesting patterns in the data that can be successfully visualized using traditional techniques. Specifically, we reduce clutter in the displays by removing unimportant flows, and produce a time-varying clustering structure that groups countries according to their trading activities. Even with no drawn edges, these time-varying communities visually convey trading partners and geopolitical alignment.

Fig. 8 displays the trade flows after applying penalized NMF, and a number of interesting insights become apparent. First, Europe and Japan seem to be trading hubs that connect other countries, such as the United States and Russia. The coloring scheme, shown without edges in the second row, denotes overlapping community structure. We use three underlying communities ($K = 3$), and let each be colored red, blue or green. The particular color for a country is given by the relative contribution of the three communities. For example, the US and western Europe are bright green, denoting that they belong to only one group. In contrast, Russia is bright red, indicating that it belongs to another group. Countries like China belong to multiple communities, and the composition changes over time.
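One simple way to realize such a color mixture, shown here only as a hedged sketch, is to use the three relative contributions directly as red, green and blue channel intensities. The function name `membership_to_hex`, the assignment of communities to the (red, green, blue) channels in that order, and the example values are assumptions for illustration; the paper does not specify how its maps were colored.

```python
def membership_to_hex(contributions):
    """Map K = 3 nonnegative community contributions to an RGB hex color.

    The contributions are rescaled to sum to one and then used as the
    red, green and blue intensities (channel order is an assumption).
    A node belonging to a single community gets a pure, bright color.
    """
    total = sum(contributions)
    if len(contributions) != 3 or total <= 0:
        raise ValueError("need three contributions with a positive sum")
    channels = [round(255 * c / total) for c in contributions]
    return "#{:02x}{:02x}{:02x}".format(*channels)

# Hypothetical contributions (red, green, blue communities).
print(membership_to_hex([0.0, 1.0, 0.0]))  # single community: pure green
print(membership_to_hex([3.0, 1.0, 0.0]))  # mostly red, some green
```

Because the shares sum to one, mixed memberships produce darker, blended colors, which visually separates single-community countries from those that straddle several communities.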
We can roughly interpret the green as Western economies and red as the communist economies, led by the former Soviet Union, which fell in 1989. A year after the Soviet Union fell, China, India and a number of Persian Gulf and Middle Eastern countries that were partially aligned with the Soviet Union began to trade more with Europe and the US. This behavior continued so that by 1997 green is the dominant color, even though in 1980 green and red are roughly equal. The third group, denoted by blue, tends to consist of countries that experienced average or lower growth rates in the data. It primarily consists of African and Central American countries, and is relatively stable.

The number of communities is chosen according to cross validation. This type of approach helps balance model complexity (number of communities) and accuracy. We find that an NMF model with $K = 3$ is best, since additional communities do not significantly improve the reconstruction or the interpretability of the clustering. Details on cross validation are given in the supplemental material.

The third row plots the networks using a force based graph drawing algorithm. These layouts provide additional insights. The importance of the United States to global trade increased over time and became more distinct; it is the most dominant node on the graph by 1990. European countries tend to form strong communities due to heightened intra-continental trade. However, by the 1990s there is an additional subcommunity, comprised mostly of the so-called 'Asian miracles': countries in east Asia that experienced persistent and rapid economic growth in the 1990s [61], [62]. Lastly, the homogeneity of country colors in the graphs supports the notion that the world trade network has become more interconnected. Thus, this analysis may foreshadow the benefits and potential risks to the global economy stemming from the modern day Euro crisis, in which multiple European countries may fail if a single country fails.
9 DISCUSSION

The main idea behind the approach presented in this paper is to abstract the network sequence to a sequence of lower dimensional spaces using matrix factorizations. Next, we highlight some of the strengths and weaknesses of this approach.

9.1 Strengths

An important benefit is the versatility and scalability of matrix factorization. Table 1 in the supplemental material shows Algo. 1 run times for all case studies. The computational speed is fast enough to use as a preprocessing step before applying traditional visualization tools. Moreover, it is reasonable to select parameters ($\lambda_t$, $\lambda_s$, $K$) with a grid search or cross validation.

Using the factorizations as a basis for an exploratory visual tool can help users uncover different connectivity patterns and evolution in the data. The displays of $U_t$ and $V_t$ can lead to group identification or a ranking of nodes by their importance to connectivity. They also give the user a sense of the data complexity through the types and numbers of trajectories.

Matrix factorizations are useful for community detection and for selectively removing edges, both of which facilitate visualizations of graphs. The use of temporal penalties allows the user to control how sensitive the recovered structure is to short term fluctuations. Smoothing out noise and highlighting the main temporal trends can facilitate other matrix-based representations, such as [8], for visualization of edge dynamics.

9.2 Weaknesses

The optimal choice of tuning parameters ($\lambda_t$, $\lambda_s$) depends on perception and on how the edge weights are scaled. Thus, the user will likely need to experiment with different parameters each time a sequence of networks is encountered. This can limit the benefits of the proposed approach when given a large number of network sequences.

Time plots and heatmaps to visualize each factor yield limited information about the underlying global topology. For example, one can see from Figs.
4 and 5 that there are dominant nodes, but in principle, there could be many topologies that feature dominant nodes. One cannot say for sure without additional analysis that the networks have an emergent hub structure or follow a particular model. Conveying global geometric structure is traditionally an area where graph drawing excels. Thus, combining the matrix factorization approach in this article with existing visualization tools can provide a more comprehensive view of the data.

9.3 Future Work

An important area of exploration would be to systematically compare penalized versions of NMF and SVD. In this work we chose to focus on NMF, since we find the corresponding displays preferable in terms of interpretability. This is generally consistent with existing literature on matrix factorization. However, the SVD of graph related matrices has deep connections to classical spectral layout and problems in community detection. There may be classes of graph topologies and particular visualization goals under which SVD is preferable.

There could also be other types and combinations of penalties that are useful in visualization and detection of graph structure. In this work, we are primarily interested in visualizing the evolution of node connectivity and extracting stable structure when given dynamic networks with short term fluctuations. If one is interested in evolution at the group or edge level, or in detecting particular topologies, other penalties may be more beneficial.

REFERENCES

[1] B. Skyrms and R. Pemantle, “A dynamic model of social network formation,” Proceedings of the National Academy of Sciences, vol. 97, no. 16, pp. 9340–9346, 2000. [Online]. Available: http://www.pnas.org/content/97/16/9340.abstract
[2] R. Opgen-Rhein and K. Strimmer, “Inferring gene dependency networks from genomic longitudinal data: a functional data approach,” REVSTAT, vol. 4, no. 1, pp. 53–65, 2006.
[3] L. Adamic, C. Brunetti, J. H. Harris, and A. A.
Kirilenko, “Trading Networks,” SSRN eLibrary, 2010.
[4] D. Archambault, H. Purchase, and B. Pinaud, “Animation, small multiples, and the effect of mental map preservation in dynamic graphs,” Visualization and Computer Graphics, IEEE Transactions on, vol. 17, no. 4, pp. 539–552, Apr. 2011.
[5] S. Ghani, N. Elmqvist, and J.-S. Yi, “Perception of animated node-link diagrams for dynamic graphs,” Computer Graphics Forum (Proc. EuroVis 2012), vol. 31, no. 3, pp. 1205–1214, 2012.
[6] T. von Landesberger, A. Kuijper, T. Schreck, J. Kohlhammer, J.-J. van Wijk, and D.-W. Fellner, “Visual analysis of large graphs,” EuroGraphics state of the art reports, 2010.
[7] T. von Landesberger, A. Kuijper, T. Schreck, J. Kohlhammer, J. van Wijk, J.-D. Fekete, and D. Fellner, “Visual analysis of large graphs: State-of-the-art and future research challenges,” Computer Graphics Forum, vol. 30, no. 6, pp. 1719–1749, 2011. [Online]. Available: http://dx.doi.org/10.1111/j.1467-8659.2011.01898.x
[8] J. S. Yi, N. Elmqvist, and S. Lee, “Timematrix: Analyzing temporal social networks using interactive matrix-based visualizations,” International Journal of Human-Computer Interaction, vol. 26, no. 11-12, pp. 1031–1051, 2010.
[9] M. Newman, Networks: An Introduction. Oxford University Press, 2010.
[10] M. Newman, A. Barabási, and D. Watts, The Structure And Dynamics of Networks, ser. Princeton Studies in Complexity. Princeton University Press, 2006.
[11] E. Bullmore and O. Sporns, “Complex brain networks: Graph theoretic analysis of structural and functional systems,” Nature Reviews, vol. 10, pp. 186–198, 2009.

[Fig. 8 panels: 1980; 1990; 1997]
Fig. 8. World maps over time, where countries are colored according to their NMF factorization with three communities. The particular color mixture indicates how much each component contributes to the country. The first row shows the most important reconstructed edges.

[12] M. Kaufmann and D. Wagner, Drawing Graphs: Methods and Models, ser.
Lecture Notes in Computer Science. Springer, 2001.
[13] G. Di Battista, Graph drawing: algorithms for the visualization of graphs, ser. An Alan R. Apt book. Prentice Hall, 1999.
[14] D. Auber, “Tulip a huge graph visualization framework,” in Graph Drawing Software, ser. Mathematics and Visualization, M. Jünger and P. Mutzel, Eds. Springer Berlin Heidelberg, 2004, pp. 105–126.
[15] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” 2009. [Online]. Available: http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154
[16] D. Auber, D. Archambault, R. Bourqui, A. Lambert, M. Mathiaut, P. Mary, M. Delest, J. Dubois, and G. Mélançon, “The Tulip 3 Framework: A Scalable Software Library for Information Visualization Applications Based on Relational Data,” INRIA, Research Report RR-7860, Jan. 2012. [Online]. Available: http://hal.inria.fr/hal-00659880
[17] W. De Nooy, A. Mrvar, and V. Batagelj, Exploratory Social Network Analysis With Pajek, ser. Structural Analysis in the Social Sciences. Cambridge University Press, 2011.
[18] U. Brandes and D. Wagner, “Visone – analysis and visualization of social networks,” in Graph Drawing Software. Springer-Verlag, 2003, pp. 321–340.
[19] N. Elmqvist and J.-D. Fekete, “Hierarchical aggregation for information visualization: Overview, techniques, and design guidelines,” Visualization and Computer Graphics, IEEE Transactions on, vol. 16, no. 3, pp. 439–454, May-June 2010.
[20] Z. Shen, K.-L. Ma, and T. Eliassi-Rad, “Visual analysis of large heterogeneous social networks by semantic and structural abstraction,” Visualization and Computer Graphics, IEEE Transactions on, vol. 12, no. 6, pp. 1427–1439, Nov.-Dec. 2006.
[21] C. Correa, Y.-H. Chan, and K.-L. Ma, “A framework for uncertainty-aware visual analytics,” in Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on, Oct. 2009, pp. 51–58.
[22] C. Correa, T. Crnovrsanin, and K.-L.
Ma, “Visual reasoning about social networks using centrality sensitivity,” Visualization and Computer Graphics, IEEE Transactions on, vol. 18, no. 1, pp. 106–120, Jan. 2012.
[23] I. Dhillon, Y. Guan, and B. Kulis, “Weighted graph cuts without eigenvectors: a multilevel approach,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 11, pp. 1944–1957, Nov. 2007.
[24] Y. Frishman and A. Tal, “Online dynamic graph drawing,” Visualization and Computer Graphics, IEEE Transactions on, vol. 14, no. 4, pp. 727–740, July-Aug. 2008.
[25] H. Purchase and A. Samra, “Extremes are better: Investigating mental map preservation in dynamic graphs,” in Diagrammatic Representation and Inference, ser. Lecture Notes in Computer Science, G. Stapleton, J. Howse, and J. Lee, Eds. Springer Berlin / Heidelberg, 2008, vol. 5223, pp. 60–73.
[26] P. Saffrey and H. Purchase, “The “mental map” versus “static aesthetic” compromise in dynamic graphs: a user study,” in Proceedings of the ninth conference on Australasian user interface, Volume 76, ser. AUIC ’08. Darlinghurst, Australia: Australian Computer Society, Inc., 2008, pp. 85–93.
[27] E. Tufte, Envisioning information. Graphics Press, 1990.
[28] M. Farrugia and A. Quigley, “Effective temporal graph layout: A comparative study of animation versus static display methods,” Information Visualization, vol. 10, no. 1, pp. 47–64, 2011.
[29] F. R. K. Chung, Spectral Graph Theory. Amer. Math. Soc., 1997.
[30] Y. Koren, “Drawing graphs by eigenvectors: theory and practice,” Computers & Mathematics with Applications, vol. 49, no. 11-12, pp. 1867–1888, 2005. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S089812210500204X
[31] T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations. New York: Springer-Verlag, 2001.
[32] U. Brandes, D. Fleischer, and T. Puppe, “Dynamic spectral layout of small worlds,” pp. 25–36, 2006, 10.1007/11618058_3. [Online]. Available: http://gdea.informatik.uni-koeln.de/677/
[33] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization,” Nature, vol. 401, pp. 788–791, Oct. 1999.
[34] P. Paatero and U. Tapper, “Positive matrix factorization: A nonnegative factor model with optimal utilization of error estimates of data values,” Environmetrics, vol. 5, no. 2, pp. 111–126, 1994. [Online]. Available: http://dx.doi.org/10.1002/env.3170050203
[35] K. Devarajan, “Nonnegative matrix factorization: An analytical and interpretive tool in computational biology,” PLoS Comput Biol, vol. 4, no. 7, p. e1000029, Jul. 2008. [Online]. Available: http://dx.doi.org/10.1371%2Fjournal.pcbi.1000029
[36] C. Ding, X. He, and H. D. Simon, “On the equivalence of nonnegative matrix factorization and spectral clustering,” in Proc. SIAM Data Mining Conf, 2005, pp. 606–610.
[37] C. Ding, T. Li, and W. Peng, “On the equivalence between nonnegative matrix factorization and probabilistic latent semantic indexing,” Comput. Stat. Data Anal., vol. 52, no. 8, pp. 3913–3927, Apr. 2008. [Online]. Available: http://dx.doi.org/10.1016/j.csda.2008.01.011
[38] I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon, “Overlapping community detection using bayesian non-negative matrix factorization,” Phys. Rev. E, vol. 83, p. 066114, Jun. 2011. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevE.83.066114
[39] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding, “Community discovery using nonnegative matrix factorization,” Data Min. Knowl. Discov., vol. 22, pp. 493–521, May 2011. [Online]. Available: http://dx.doi.org/10.1007/s10618-010-0181-y
[40] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” Advances in neural information processing systems, pp. 556–562, 2001.
[41] H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,” Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265–286, 2006.
[42] D. M. Witten, R. Tibshirani, and T. Hastie, “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,” Biostatistics, vol. 10, no. 3, pp. 515–534, 2009. [Online]. Available: http://biostatistics.oxfordjournals.org/content/10/3/515.abstract
[43] J. Guo, G. James, E. Levina, G. Michailidis, and J. Zhu, “Principal component analysis with sparse fused loadings,” Journal of Computational and Graphical Statistics, vol. 19, no. 4, pp. 930–946, 2010. [Online]. Available: http://pubs.amstat.org/doi/abs/10.1198/jcgs.2010.08127
[44] M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons, “Algorithms and applications for approximate nonnegative matrix factorization,” in Computational Statistics and Data Analysis, 2006, pp. 155–173.
[45] Z. Chen and A. Cichocki, “Nonnegative matrix factorization with temporal smoothness and/or spatial decorrelation constraints,” in Laboratory for Advanced Brain Signal Processing, RIKEN, Tech. Rep, 2005.
[46] P. O. Hoyer, “Non-negative sparse coding,” in Neural Networks for Signal Processing XII (Proc. IEEE Workshop on Neural Networks for Signal Processing), 2002, pp. 557–565.
[47] P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” J. Mach. Learn. Res., vol. 5, pp. 1457–1469, Dec. 2004. [Online]. Available: http://portal.acm.org/citation.cfm?id=1005332.1044709
[48] D. Cai, X. He, J. Han, and T. Huang, “Graph regularized nonnegative matrix factorization for data representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 8, pp. 1548–1560, Aug. 2011.
[49] M. Chu, F. Diele, R. Plemmons, and S. Ragni, “Optimality, computation, and interpretation of nonnegative matrix factorizations,” SIAM Journal on Matrix Analysis, pp. 4–8030, 2004.
[50] G. Bounova, “Matlab tools for network analysis,” Dec. 2011. [Online]. Available: http://strategic.mit.edu/downloads.php
[51] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999. [Online]. Available: http://www.sciencemag.org/content/286/5439/509.abstract
[52] A. Clauset, C. Rohilla Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” ArXiv e-prints, Jun. 2007.
[53] J. Gehrke, P. Ginsparg, and J. M. Kleinberg, “Overview of the 2003 KDD Cup,” in SIGKDD Explorations, vol. 5, no. 2, 2003, pp. 149–151.
[54] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ser. KDD ’05. New York, NY, USA: ACM, 2005, pp. 177–187. [Online]. Available: http://doi.acm.org/10.1145/1081870.1081893
[55] Proceedings of the IEEE Symposium on Visual Analytics Science and Technology, IEEE VAST 2008, Columbus, Ohio, USA, 19-24 October 2008. IEEE, 2008.
[56] A. A. Shaverdian, H. Zhou, G. Michailidis, and H. V. Jagadish, “Algebraic visual analysis: the Catalano phone call data set case study,” in Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration, ser. VAKD ’09. New York, NY, USA: ACM, 2009, pp. 74–82. [Online]. Available: http://doi.acm.org/10.1145/1562849.1562858
[57] Z. Shen and K.-L. Ma, “Mobivis: A visualization system for exploring mobile data,” in Visualization Symposium, 2008. PacificVIS ’08. IEEE Pacific, Mar. 2008, pp. 175–182.
[58] Q. Ye, B. Wu, D. Hu, and B. Wang, “Exploring temporal egocentric networks in mobile call graphs,” in Fuzzy Systems and Knowledge Discovery, 2009. FSKD ’09. Sixth International Conference on, vol. 2, Aug. 2009, pp. 413–417.
[59] R. C. Feenstra, R. E. Lipsey, H. Deng, A. C. Ma, and H. Mo, “World trade flows: 1962:2000,” NBER Working Paper no. 11040, 2004.
[60] L. De Benedictis and L. Tajoli, “The world trade network,” The World Economy, vol. 34, no. 8, pp. 1417–1454, 2011. [Online]. Available: http://dx.doi.org/10.1111/j.1467-9701.2011.01360.x
[61] J. E. Stiglitz, “Some lessons from the east asian miracle,” The World Bank Research Observer, vol. 11, no. 2, pp. 151–177, 1996. [Online]. Available: http://wbro.oxfordjournals.org/content/11/2/151.abstract
[62] R. R. Nelson and H. Pack, “The asian miracle and modern growth theory,” The World Bank, Policy Research Working Paper Series 1881, Feb. 1998. [Online]. Available: http://ideas.repec.org/p/wbk/wbrwps/1881.html

Shawn Mankad received a B.S. in Mathematics and Statistics from Carnegie Mellon University in 2008, and an M.A. in Statistics from the University of Michigan in 2012. He is working toward a Statistics PhD at the University of Michigan under the supervision of George Michailidis. His research interests include space-time models for information extraction and visualization, network estimation, and analytical techniques with applications to Economics, Finance, and complex systems, among others.

George Michailidis received his Ph.D. in Mathematics from UCLA in 1996. He was a postdoctoral fellow in the Department of Operations Research at Stanford University from 1996 to 1998. He joined the University of Michigan in 1998, where he is currently a Professor of Statistics, Electrical Engineering & Computer Science. He is a Fellow of the Institute of Mathematical Statistics and of the American Statistical Association and an Elected Member of the International Statistical Institute. He has served as Associate Editor of many statistics journals, including Journal of the American Statistical Association, Technometrics, and Journal of Computational and Graphical Statistics. His research interests are in the areas of analysis and inference of high dimensional data, networks and visualization.