Reconstruction from Randomized Graph via Low Rank Approximation
Leting Wu, Xiaowei Ying, Xintao Wu
Dept. of Software and Information Systems, Univ. of N.C. – Charlotte

Outline
- Background & Motivation
- Low Rank Approximation on Graph Data
- Reconstruction from Randomized Graph
- Evaluation
- Privacy Issue

Background & Motivation

Background
When network data is published or outsourced for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
Graph randomization/perturbation:
- Random Add/Del edges (number of edges unchanged)
- Random Switch edges (node degrees unchanged)
- Feature preserving randomization
  - Spectrum preserving randomization
  - Feature preserving via Markov-chain based graph generation
- Clustering: grouping subgraphs into supernodes

Motivation
We focus on whether we can reconstruct a graph from the randomized graph such that the reconstructed features are closer to the original ones than the randomized ones.

Low Rank Approximation on Graph Data

Adjacency Matrix & Its Eigen-Decomposition
Matrix representation of a network: adjacency matrix $A$ (symmetric)
Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$
Question: How do the eigen-pairs $(\lambda_i, x_i)$ relate to graph topology?

Leading Eigenpairs vs. Graph Topology
What is the role of positive and negative eigen-pairs in graph topology?
Without loss of generality, we partition the node set into two groups, so that the adjacency matrix can be partitioned as
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix}$
where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ represents the edges between the groups.
[Figures, three examples. Panels: Original, r=1, r=2; the last example also shows r=4]

Low Rank Approximation on Graph Data
Low rank approximation: $A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, using the $r$ leading eigen-pairs.
This provides the best rank-$r$ approximation to $A$.
To keep the structure of an adjacency matrix, the entries of $A_r$ are then discretized.

Reconstruction from Randomized Graph

Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)

Determine Number of Eigen-pairs
Question: How to choose an optimal rank r for reconstruction?
Solution: Use a single graph feature as the indicator, since it is closely related to the other features and has an explicit moment estimator in terms of m, the number of edges, and k, the number of edges added/deleted.

Algorithm

Evaluation

Effect of Noise (Political Blogs)
- The method works well up to a certain level of noise.
- Even with a high level of noise, the reconstructed features are still closer to the original ones than the randomized ones are.

Reconstructed Features on 3 Real Network Data Sets
Reconstruction quality: when the quality measure is positive, the reconstructed features are closer to the original ones than the randomized ones are. It is positive for all three data sets.

Privacy Issue
Question 1: Can this reconstruction be used by attackers?
Measure: the normalized Frobenius distance between $A$ and the reconstructed adjacency matrix.
[Figures: Normalized F Norm for Political Blogs, Enron, Political Books]
Question 2: Which type of graphs would have privacy breached?
For low rank graphs, the distance between the reconstructed graph and the original graph can be very small.

Synthetic Low Rank Graphs
For a set of synthetic low rank graphs generated from Political Blogs, the reconstruction works well in terms of both the distance and the features.

Conclusion
- We show the relationship between graph topological structure and the eigen-pairs of the adjacency matrix.
- We propose a low rank approximation based reconstruction algorithm with a novel solution for determining the optimal rank.
- For most social networks, our algorithm does not incur further disclosure risk to individual privacy, except for networks with low rank or a small number of dominant eigenvalues.

Thank You! Questions?

Acknowledgments
This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.
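
Supplementary sketch: low rank reconstruction in NumPy
The reconstruction idea in the slides (keep the leading eigen-pairs of the randomized adjacency matrix, discretize, then compare with the original by a normalized Frobenius distance) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `low_rank_reconstruct` and `normalized_frobenius`, the 0.5 threshold, the choice of the r eigenvalues of largest magnitude as "leading", and the normalization by the Frobenius norm of A are all assumptions made for illustration. The rank r is taken as given; the slides' rank-selection step is not reproduced.

```python
import numpy as np


def low_rank_reconstruct(A_tilde, r, threshold=0.5):
    """Rank-r approximation of a symmetric adjacency matrix, then 0/1 discretization.

    Assumptions (not from the slides): "leading" eigen-pairs are the r
    eigenvalues of largest magnitude, and entries >= threshold become edges.
    """
    A_tilde = np.asarray(A_tilde, dtype=float)
    # eigh is meant for symmetric matrices: real eigenvalues, orthonormal eigenvectors
    eigvals, eigvecs = np.linalg.eigh(A_tilde)
    # indices of the r eigenvalues with the largest absolute value
    lead = np.argsort(-np.abs(eigvals))[:r]
    X = eigvecs[:, lead]
    # A_r = sum_i lambda_i x_i x_i^T over the selected eigen-pairs
    A_r = (X * eigvals[lead]) @ X.T
    # remove floating-point asymmetry before thresholding
    A_r = (A_r + A_r.T) / 2.0
    # discretize to a 0/1 matrix (assumed rule) and forbid self-loops
    A_hat = (A_r >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_r, A_hat


def normalized_frobenius(A, A_hat):
    """||A - A_hat||_F / ||A||_F (the normalization by ||A||_F is an assumption)."""
    A = np.asarray(A, dtype=float)
    A_hat = np.asarray(A_hat, dtype=float)
    return np.linalg.norm(A - A_hat, "fro") / np.linalg.norm(A, "fro")
```

As a sanity check, a graph made of two disjoint 5-node cliques has only two dominant eigenvalues, so calling `low_rank_reconstruct(A, r=2)` on its unperturbed adjacency matrix returns the same graph and the distance is zero. This mirrors the low rank case the Privacy Issue slides warn about.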