Reconstruction from Randomized Graph via Low Rank Approximation
Leting Wu
Xiaowei Ying, Xintao Wu
Dept. Software and Information Systems
Univ. of N.C. – Charlotte
Outline
• Background & Motivation
• Low Rank Approximation on Graph Data
• Reconstruction from Randomized Graph
• Evaluation
• Privacy Issue
Background & Motivation
Background
• When network data are published or outsourced for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
• Graph randomization/perturbation:
  • Random Add/Del: randomly add and delete edges (number of edges unchanged)
  • Random Switch: randomly switch edges (node degrees unchanged)
  • Feature-preserving randomization
  • Spectrum-preserving randomization
  • Feature preservation via Markov-chain-based graph generation
  • Clustering: grouping subgraphs into supernodes
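The first two randomization schemes can be sketched in Python/NumPy as below. This is a minimal illustration under our own assumptions, not the authors' implementation: the graph is simple and undirected and stored as a dense symmetric 0/1 NumPy array, and `rand_add_del` and `rand_switch` are hypothetical helper names. Rand Add/Del deletes k existing edges and adds k node pairs that were not edges; Rand Switch repeatedly picks two edges (t, w) and (u, v) and rewires them to (t, v) and (u, w) when neither new edge exists yet, which leaves every node's degree unchanged.

```python
import numpy as np

def rand_add_del(A, k, rng):
    """Rand Add/Del sketch: delete k random edges, then add k random non-edges.
    A is a dense symmetric 0/1 array; the number of edges is unchanged."""
    B = A.copy()
    iu, ju = np.triu_indices(A.shape[0], k=1)   # every unordered node pair once
    pairs = A[iu, ju]
    drop = rng.choice(np.flatnonzero(pairs == 1), size=k, replace=False)
    add = rng.choice(np.flatnonzero(pairs == 0), size=k, replace=False)
    B[iu[drop], ju[drop]] = B[ju[drop], iu[drop]] = 0   # deletions
    B[iu[add], ju[add]] = B[ju[add], iu[add]] = 1       # additions (pairs that were not edges in A)
    return B

def rand_switch(A, k, rng, max_tries=100_000):
    """Rand Switch sketch: k successful switches (t,w),(u,v) -> (t,v),(u,w).
    Every node keeps its degree."""
    B = A.copy()
    done = tries = 0
    while done < k:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("no valid switch found")
        rows, cols = np.nonzero(np.triu(B, 1))   # current edges, each listed once
        a, b = rng.choice(len(rows), size=2, replace=False)
        t, w, u, v = rows[a], cols[a], rows[b], cols[b]
        # skip if endpoints repeat (self-loop) or a new edge already exists (multi-edge)
        if len({t, w, u, v}) < 4 or B[t, v] or B[u, w]:
            continue
        B[t, w] = B[w, t] = B[u, v] = B[v, u] = 0
        B[t, v] = B[v, t] = B[u, w] = B[w, u] = 1
        done += 1
    return B

# Usage on a random toy graph (hypothetical data).
rng = np.random.default_rng(0)
A = np.triu((rng.random((50, 50)) < 0.1).astype(int), 1)
A = A + A.T
B = rand_add_del(A, 20, rng)
C = rand_switch(A, 20, rng)
print(A.sum() // 2, B.sum() // 2, C.sum() // 2)   # edge counts of A, B, C
```

The dense matrix keeps the sketch short; for large networks an edge-list representation would be the more practical choice.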
Motivation
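• We focus on whether we can reconstruct a graph $\hat{A}$ from the randomized graph $\tilde{A}$ such that $\hat{A}$ is closer to the original graph $A$ than $\tilde{A}$ is.
[Figure: Our Focus]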
Low Rank Approximation on Graph Data
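Adjacency Matrix & Its Eigen-Decomposition
Matrix Representation of Network
• Adjacency matrix $A$ (symmetric) of a graph with $n$ nodes
• Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i \mathbf{x}_i \mathbf{x}_i^T$, where $\lambda_i$ are the eigenvalues and $\mathbf{x}_i$ the corresponding orthonormal eigenvectors
Questions:
• How do the eigenpairs relate to the graph topology?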
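A quick numerical check of the eigen-decomposition, as a minimal NumPy sketch on a hypothetical 5-edge toy graph (not one of the data sets in these slides). `numpy.linalg.eigh` handles symmetric matrices and returns real eigenvalues in ascending order, with orthonormal eigenvectors as columns. Two standard identities tie the spectrum to the topology: the eigenvalues sum to trace(A) = 0 (no self-loops), and their squares sum to trace(A²) = 2m, twice the number of edges.

```python
import numpy as np

# Hypothetical toy graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2 (m = 5 edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1

lam, X = np.linalg.eigh(A)   # real eigenvalues (ascending); column X[:, i] pairs with lam[i]

# Rebuild A as the sum of rank-one terms lambda_i * x_i x_i^T.
A_rebuilt = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(len(lam)))

print(np.round(lam, 3))
print(np.allclose(A, A_rebuilt))                     # True: the decomposition is exact
print(np.isclose(lam.sum(), 0.0))                    # True: trace(A) = 0
print(np.isclose((lam ** 2).sum(), 2 * len(edges)))  # True: trace(A^2) = 2m
```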
Leading Eigenpairs vs. Graph Topology
• What roles do the positive and negative eigenpairs play in the graph topology?
• Without loss of generality, we partition the node set into two groups, so the adjacency matrix can be partitioned as
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix}$,
where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ represents the edges between the groups.
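To make the question concrete, the sketch below (our illustration, not a result from these slides) builds the two extreme cases of this partition with two groups of equal size k: only within-group edges ($A_{12} = 0$, two cliques) and only between-group edges ($A_{11} = A_{22} = 0$, a complete bipartite graph). The helper `two_group_graph` is hypothetical. For two k-cliques the spectrum is k−1 (twice) and −1; for the complete bipartite graph it is ±k plus zeros. More generally, every bipartite graph has a spectrum symmetric about 0, so in this toy case the between-group structure is what produces a large negative eigenvalue.

```python
import numpy as np

def two_group_graph(k, within, between):
    """Adjacency matrix with two groups of k nodes each.
    within=True  -> each group is a clique (A11 = A22 = J - I)
    between=True -> every node is linked to every node of the other group (A12 = J)."""
    J, I, Z = np.ones((k, k)), np.eye(k), np.zeros((k, k))
    A11 = J - I if within else Z
    A12 = J if between else Z
    return np.block([[A11, A12], [A12.T, A11]])

cases = [("two cliques (A12 = 0)", two_group_graph(3, True, False)),
         ("complete bipartite (A11 = A22 = 0)", two_group_graph(3, False, True))]
for name, A in cases:
    lam = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
    print(name, np.round(lam, 6))
```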
[Figure: panels Original, r=1, r=2]
[Figure: panels Original, r=1, r=2]
[Figure: panels Original, r=1, r=2, r=4]
Low Rank Approximation on Graph Data
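• Low rank approximation: index the eigenpairs so that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$ and keep the first $r$:
$A_r = \sum_{i=1}^{r} \lambda_i \mathbf{x}_i \mathbf{x}_i^T$
This provides the best rank-$r$ approximation to $A$ in the Frobenius norm.
• To keep the structure of an adjacency matrix, $A_r$ is then discretized into a 0/1 matrix.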
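A minimal NumPy sketch of the low rank approximation and the discretization step. The eigenpairs are ranked by $|\lambda_i|$, matching the rank-$r$ approximation above; the discretization rule (threshold the symmetrized $A_r$ at 0.5 and clear the diagonal) is an assumption of this sketch, not necessarily the authors' rule. `low_rank_approx` and `discretize` are hypothetical names.

```python
import numpy as np

def low_rank_approx(A, r):
    """A_r = sum of the r eigenpairs of the symmetric matrix A with the largest |lambda_i|."""
    lam, X = np.linalg.eigh(A)
    top = np.argsort(-np.abs(lam))[:r]           # indices of the r largest |lambda_i|
    return (X[:, top] * lam[top]) @ X[:, top].T  # X_r diag(lambda_r) X_r^T

def discretize(A_r, threshold=0.5):
    """Map A_r back to a symmetric 0/1 adjacency matrix without self-loops.
    The 0.5 threshold is an assumption of this sketch."""
    S = (A_r + A_r.T) / 2                        # remove floating-point asymmetry
    A_hat = (S >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_hat

# Usage on a hypothetical toy graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = A[j, i] = 1

A_2 = low_rank_approx(A, 2)
A_hat = discretize(A_2)
print(np.round(A_2, 2))
print(A_hat)
```

With $r = n$ the sketch returns the original graph, and the Frobenius error of $A_r$ equals the square root of the sum of the discarded $\lambda_i^2$.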
Reconstruction from Randomized Graph
Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)
Determining the Number of Eigenpairs
Question:
• How should the optimal rank r for reconstruction be chosen?
Solution:
• Choose a single feature as the indicator: one that is closely related to the other features and for which there exists an explicit moment estimator (in terms of m, the number of edges, and k, the number of edges added/deleted)
Algorithm
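As a hedged sketch of the rank-selection idea above (not the authors' algorithm), one can scan candidate ranks and keep the r whose reconstruction brings the indicator feature closest to its estimated original value. Assumptions: the estimate `target` is supplied by the caller (for example from the moment estimator), the default indicator is the largest adjacency eigenvalue (our placeholder choice), reconstruction uses the |λ|-ranked low rank approximation followed by a 0.5 threshold, and all function names are hypothetical.

```python
import numpy as np

def reconstruct(A_tilde, r, threshold=0.5):
    """Rank-r reconstruction: keep the r eigenpairs with largest |lambda|, then threshold.
    The 0.5 threshold is an assumption of this sketch."""
    lam, X = np.linalg.eigh(A_tilde)
    top = np.argsort(-np.abs(lam))[:r]
    A_r = (X[:, top] * lam[top]) @ X[:, top].T
    A_hat = ((A_r + A_r.T) / 2 >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_hat

def largest_eigenvalue(A):
    """Placeholder indicator (our assumption): the largest adjacency eigenvalue."""
    return np.linalg.eigvalsh(A)[-1]

def choose_rank(A_tilde, target, ranks, indicator=largest_eigenvalue):
    """Return (r, A_hat) for the candidate rank whose reconstruction has an indicator
    value closest to `target`, an estimate of the indicator on the original graph."""
    best_gap, best_r, best_A = np.inf, None, None
    for r in ranks:
        A_hat = reconstruct(A_tilde, r)
        gap = abs(indicator(A_hat) - target)
        if gap < best_gap:
            best_gap, best_r, best_A = gap, r, A_hat
    return best_r, best_A

# Hypothetical usage: a toy graph, with its own lambda_1 standing in for the estimate.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = A[j, i] = 1
r, A_hat = choose_rank(A, largest_eigenvalue(A), ranks=range(1, 5))
print(r)
```

Passing the indicator as a function keeps the scan independent of which feature is used; on a real randomized graph, `target` would come from the estimator rather than from the graph itself.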
Evaluation
Effect of Noise (Political Blogs)
• The method works well up to a certain level of noise.
• Even with a high level of noise, the reconstructed features are still closer to the original ones than the randomized features are.
Reconstructed Features on Three Real Network Data Sets
• Reconstruction quality: when it is positive, the reconstructed features are closer to the original ones than the randomized ones are.
• It is positive for all three data sets.
Privacy Issue
• Question 1: Can this reconstruction be used by attackers?
• We measure the normalized Frobenius distance between $A$ and the reconstructed graph $\hat{A}$.
[Figure: Normalized F Norm; panels: Political Blogs, Enron, Political Books]
Privacy Issue
• Question 2: Which types of graphs would have their privacy breached?
• For low-rank graphs (e.g., graphs with a small number of dominant eigenvalues), the distance between the reconstructed graph and the original graph can be very small.
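A toy illustration of this point, as a hedged sketch rather than the experiments in these slides. It assumes the normalized Frobenius distance is $\|A - \hat{A}\|_F / \|A\|_F$ (for a 0/1 adjacency matrix, $\|A\|_F = \sqrt{2m}$), uses the complete bipartite graph $K_{20,20}$ as a low-rank graph (its adjacency matrix has rank 2), and perturbs it deterministically by deleting one edge and adding one edge. The reconstruction keeps the two eigenpairs with largest |λ| and thresholds at 0.5; these choices and all function names are assumptions.

```python
import numpy as np

def normalized_f_distance(A, B):
    """Assumed definition: ||A - B||_F / ||A||_F (||A||_F = sqrt(2m) for a 0/1 graph)."""
    return np.linalg.norm(A - B) / np.linalg.norm(A)

def reconstruct(A_tilde, r, threshold=0.5):
    """Keep the r eigenpairs with largest |lambda|, then threshold (0.5 is assumed)."""
    lam, X = np.linalg.eigh(A_tilde)
    top = np.argsort(-np.abs(lam))[:r]
    A_r = (X[:, top] * lam[top]) @ X[:, top].T
    A_hat = ((A_r + A_r.T) / 2 >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_hat

# Toy low-rank graph: complete bipartite K_{20,20}, whose adjacency matrix has rank 2.
k = 20
J = np.ones((k, k), dtype=int)
Z = np.zeros((k, k), dtype=int)
A = np.block([[Z, J], [J, Z]])

# Deterministic stand-in for randomization: delete one edge, add one (edge count unchanged).
A_tilde = A.copy()
A_tilde[0, k] = A_tilde[k, 0] = 0   # delete edge (0, k) between the groups
A_tilde[1, 2] = A_tilde[2, 1] = 1   # add edge (1, 2) inside the first group

A_hat = reconstruct(A_tilde, r=2)
d_tilde = normalized_f_distance(A, A_tilde)
d_hat = normalized_f_distance(A, A_hat)
print(f"randomized: {d_tilde:.4f}  reconstructed: {d_hat:.4f}")
```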
Synthetic Low Rank Graphs
• Here is a set of synthetic low-rank graphs generated from Political Blogs; the reconstruction works well in terms of both the distance and the features.
Conclusion
• We show the relationship between the topological structure of a graph and the eigenpairs of its adjacency matrix.
• We propose a low rank approximation based reconstruction algorithm with a novel solution for determining the optimal rank.
• For most social networks, our algorithm does not incur further disclosure risk to individual privacy, except for networks with low rank or a small number of dominant eigenvalues.
Thank You!
Questions?
Acknowledgments
This work was supported in part by U.S. National
Science Foundation IIS-0546027 and CNS-0831204.