Generating Uncertain Networks based on Historical Network Snapshots

Meng Han1, Mingyuan Yan1, Jinbao Li2, Shouling Ji1 and Yingshu Li1,2
1 Department of Computer Science, Georgia State University
2 Department of Computer Science and Technology, Heilongjiang University
Workshop on Computational Social Networks (CSoNet 2013)

OUTLINE
– Background
– Problem Definition
– Algorithms and Theoretical Analysis
– Experimental Evaluation
– Conclusion

Background

Various Networks
– Internet
– Molecule structure
– Social network
– Communication network
– Protein-protein interaction network
– Co-author network

Uncertainty Exists Everywhere
A large number of uncertain networks exist in real life.
– Protein-protein interaction network: edges carry the probability of protein interaction.
– Topological structure of a wireless sensor network: edges carry the probability of successful communication.
[Figure: two example uncertain networks. Node labels include TIF34, FET3, SMT3, NTG1, RAD59 and RPC40; edge labels include 0.75, 0.95, 0.651, 0.88, 0.92 and 0.69.]

Challenges
1. How to model and define real-life uncertainty?
   – No representative uncertain data set exists.
2. Managing and mining network uncertainty is expensive.
   – Structural data is much harder to manipulate.
3. Relationships among nodes are difficult to decide.
   – They are affected by many factors.

Contributions
– Model uncertainty
   – Approximate the dynamic features of a network by a static model endowed with some additional features.
– Lower the computation cost of managing and mining uncertainty
   – Employ sampling techniques.
– Detect relationships in uncertain networks
   – Serve as a framework for measuring the expected number of common neighbors in uncertain graphs.

Problem Definition
Generating uncertain networks based on historical network snapshots.
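As a rough illustration of the problem statement, the sketch below turns a list of historical snapshots into one uncertain graph by giving each edge a probability equal to the normalized total weight of the snapshots that contain it. The slides do not preserve the actual formulas, so everything here is an assumption: the normalization rule, the four weight functions (guessed only from the model names constant, linear, log and exponential), and the hypothetical names `WEIGHTS` and `edge_probabilities`.

```python
import math

# Hypothetical weight functions of the snapshot index t = 1..T (larger t = newer).
# These forms are guesses based on the model names; the deck does not give them.
WEIGHTS = {
    "constant": lambda t: 1.0,
    "linear": lambda t: float(t),
    "log": lambda t: math.log(1.0 + t),
    "exponential": lambda t: math.exp(t),  # overflows for very long histories
}


def edge_probabilities(snapshots, model="constant"):
    """Map each undirected edge to an assumed existence probability.

    snapshots: list of edge collections (pairs of node labels), oldest first.
    Assumed rule: p(e) = sum of weights of snapshots containing e / sum of all weights.
    """
    weight = WEIGHTS[model]
    weights = [weight(t) for t in range(1, len(snapshots) + 1)]
    total = sum(weights)
    probs = {}
    for w, edges in zip(weights, snapshots):
        # Normalize (u, v) and (v, u) to one key and ignore duplicates in a snapshot.
        for edge in {tuple(sorted(e)) for e in edges}:
            probs[edge] = probs.get(edge, 0.0) + w / total
    return probs


if __name__ == "__main__":
    history = [[("a", "b")], [("a", "b"), ("b", "c")], [("b", "c")]]
    for name in WEIGHTS:
        print(name, edge_probabilities(history, name))
```

Under this assumed rule, the non-constant models favor edges seen in recent snapshots, which is one plausible way to approximate a dynamic network by a static one.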
Algorithms and Theoretical Analysis

Weighting models
(M1) Constant model
(M2) Linear model
(M3) Log model
(M4) Exponential model
Each model provides:
– a function assigning a weight to each snapshot
– the existence probability assigned to edge e

For an uncertain graph G, there are 2^{|E|} possible worlds I_i (1 ≤ i ≤ 2^{|E|}).

Common neighbors are very important.

Enumerating all the possible worlds generated from an uncertain graph G is a #P-complete problem [6].
– We cannot enumerate all the possible worlds to calculate Endistance(u, v) in an uncertain graph!

If we sample at least a given number of possible worlds, we can guarantee:
– an upper bound on the relative error
– a bounded failure probability

Experimental Evaluation

Dataset
One typical dataset from SNAP (Stanford Large Network Dataset Collection) was used to evaluate our algorithm.

Effectiveness Evaluation

Efficiency Evaluation
Even for an extremely strict correctness requirement, e.g., both parameters set to 0.005, the number of samples needed is only 5.98 × 10^12 (the whole sampling space is 2^5000).
With only 1.4 × 10^10 samples, the two parameters are still guaranteed to stay below 0.08 and 0.05, respectively.

Conclusion
– A framework for generating uncertain networks based on historical network snapshots.
– Two uncertainty construction models to capture uncertainty from dynamic snapshots, and sampling techniques to improve the efficiency of the algorithm.

Thanks
Q&A
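Backup sketch: the sampling idea from the algorithms part can be illustrated with a plain Monte Carlo estimate of the expected number of common neighbors of two nodes. This is a minimal sketch, not the authors' algorithm. It assumes that edges exist independently with their given probabilities, and the names `sample_world` and `expected_common_neighbors` are hypothetical. The sample-size bound from the talk is not reproduced; `num_samples` is simply a free parameter.

```python
import random


def sample_world(edge_probs, rng):
    """Draw one possible world as an adjacency map.

    Assumption: each edge is kept independently with its own probability.
    """
    adj = {}
    for (u, v), p in edge_probs.items():
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def expected_common_neighbors(edge_probs, u, v, num_samples=10000, seed=0):
    """Estimate E[number of common neighbors of u and v] by averaging over sampled worlds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        adj = sample_world(edge_probs, rng)
        total += len(adj.get(u, set()) & adj.get(v, set()))
    return total / num_samples


if __name__ == "__main__":
    graph = {("a", "c"): 0.9, ("b", "c"): 0.8, ("a", "d"): 0.5, ("b", "d"): 0.5}
    print(expected_common_neighbors(graph, "a", "b"))
```

Each sampled world costs time linear in the number of edges, so the total cost grows with the number of samples rather than with the 2^{|E|} possible worlds.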