LINK PREDICTION-ENABLED NETWORK COMPLETION
Presenter: Cong Tran
Supervisor: Prof. Won-Yong Shin
Yonsei University

CONTENTS
- Introduction
  - Network analysis
  - Partially observable networks
  - Network completion problem
- Related work
  - DeepNC
- Proposed method
  - DeepNC (link prediction-enabled)
- Experimental evaluation
  - Experimental setup
  - Performance metric
  - Results
- Conclusion and future work

6/1/2021 MACHINE INTELLIGENCE AND DATA SCIENCE LAB. – YONSEI UNIVERSITY – SOUTH KOREA

NETWORK MODELING
Systems from biology, information and technology, and society can each be modeled as a network, which is represented by an adjacency matrix (A):

0 1 1 1 0 0 0 0 0
1 0 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0
1 0 1 0 1 1 0 0 0
0 0 0 1 0 1 1 1 0
0 0 0 1 1 0 1 1 0
0 0 0 0 1 1 0 1 1
0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 0 0

Source of pictures:
http://jonlieffmd.com/blog/how-many-different-kinds-of-neurons-are-there
http://jamsessiontopics.blogspot.kr/2014/08/social-networking.html
http://managementlearner.com/law-and-society/

PARTIALLY OBSERVABLE NETWORKS
- Acquiring a large amount of network data is often expensive and/or hard
- Even when your data is complete, you may not have the computational resources to examine all of the data
- Partially observable networks: both nodes and edges are missing
Picture: http://support.gnip.com/apis/firehose/

NETWORK COMPLETION
Network completion feeds downstream machine learning tasks:
- Community detection
- Influence maximization
- Multi-label graph classification (example labels: Breast Cancer, Lung Cancer, Melanoma, …, Leukemia)

RELATED WORK: NETWORK COMPLETION PROBLEM
Observable network → complete network. Entries marked "?" are unknown:

0 1 1 1 0 0 ? ?
1 0 1 0 0 0 ? ?
1 1 0 1 0 0 ? ?
1 0 1 0 1 1 ? ?
0 0 0 1 0 1 ? ?
0 0 0 1 1 0 ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?

There are 2 missing nodes, how to connect them?

RELATED WORK: DEEP GENERATIVE MODEL OF GRAPHS
- Model and efficiently sample complex distributions over graphs
- Learn generative latent variables from an observed set of graphs
- After learning, the model can generate graphs with similar properties based on the learned generative parameters
Figure: graphs with similar properties → learn → generative parameters Θ (deep generative model of graphs) → generate

RELATED WORK: DEEPNC
Cong Tran, Won-Yong Shin, Andreas Spitz, and Michael Gertz. "DeepNC: Deep Generative Network Completion". Submitted to TPAMI (in revision). https://arxiv.org/abs/1907.07381
Figure labels: Facebook in Vietnam; Facebook in Korea (partial observation, privacy issues); deep generative model of graphs; learned generative parameter Θ; DeepNC

RELATED WORK: DEEPNC
Given the observable graph $G_O$ and the generative parameter $\Theta$, each candidate completion $G_1$, $G_2$, $G_3$ is scored by its likelihood $P(G_1 \mid G_O, \Theta)$, $P(G_2 \mid G_O, \Theta)$, $P(G_3 \mid G_O, \Theta)$.
Objective function: $\hat{G} = \arg\max_{G} P(G \mid G_O, \Theta)$

NETWORK COMPLETION PROBLEM (EXTENDED)
- Limitation of DeepNC: it assumes that the observable graph is complete, i.e., no edge is missing between two observable nodes. The only question is: there are 2 missing nodes, how to connect them?
- Practical situation: there is a missing edge between two observable nodes, how to recover it?

LINK PREDICTION-ENABLED NETWORK COMPLETION
Figure labels: Facebook in Vietnam; Facebook in Korea (partial observation, privacy issues); deep generative model of graphs; learned generative parameter Θ; DeepNC (enhanced)
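The difference between the original and the extended problem can be made concrete on the 8-node example from the network completion problem slide. The sketch below is only an illustration, not part of DeepNC: the helper names `unknown_pattern` and `count_unknown` are hypothetical, and it assumes that observed edges (1s) and the diagonal are trusted, while in the extended setting every observed 0 between two observable nodes may hide a missing edge.

```python
UNKNOWN = None  # marker for an entry that the completion model has to infer

# Observable part of the slide example: 6 observed nodes, 2 missing nodes.
OBSERVED = [
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def unknown_pattern(observed, n_missing, observed_is_complete):
    """Embed the observed block into the full matrix and mark unknown entries.

    observed_is_complete=True  -> original setting: only entries that touch
                                  a missing node are unknown.
    observed_is_complete=False -> extended setting: off-diagonal zeros between
                                  observable nodes are unknown as well
                                  (assumption: observed 1s are trusted).
    """
    n_obs = len(observed)
    n = n_obs + n_missing
    full = [[UNKNOWN] * n for _ in range(n)]
    for i in range(n_obs):
        for j in range(n_obs):
            value = observed[i][j]
            trusted = observed_is_complete or value == 1 or i == j
            if trusted:
                full[i][j] = value
    return full


def count_unknown(matrix):
    """Number of entries still marked as UNKNOWN."""
    return sum(entry is UNKNOWN for row in matrix for entry in row)


original = unknown_pattern(OBSERVED, n_missing=2, observed_is_complete=True)
extended = unknown_pattern(OBSERVED, n_missing=2, observed_is_complete=False)
print(count_unknown(original), count_unknown(extended))
```

Under these assumptions the extended setting has strictly more unknown entries for the same observation, which is why link prediction among observable nodes is needed in addition to connecting the missing nodes.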
PROPOSED METHOD: DEEPNC (ENHANCED)
- 1st iteration: $G_O^{(1)}$ → network completion → $\hat{G}^{(1)}$
- Update the observable graph
- 2nd iteration: $G_O^{(2)}$ → network completion → $\hat{G}^{(2)}$
- ⋮
Objective function: $\hat{G} = \arg\max_{G} P(G \mid G_O, \Theta)$

EVALUATION
Four datasets: LFR, B-A, CiteSeer, and Protein
Details of datasets: https://arxiv.org/abs/1907.07381
Pipeline: ground-truth graph → node sampling (70%) → edge sampling (90%) → partially observable graph → DeepNC (enhanced) → compare with the ground-truth graph

EVALUATION
Performance metric: mean absolute error (MAE) [2] measures the difference between the ground-truth graph and the recovered graph; the lower, the better.
"Similarity" score = matching score: $\frac{1}{2} \min_{\mathbf{P}} \lVert \mathbf{A} - \mathbf{P}\hat{\mathbf{A}}\mathbf{P}^{\mathsf{T}} \rVert$
where $\mathbf{A}$ is the ground-truth graph, $\hat{\mathbf{A}}$ is the recovered graph, and $\mathbf{P}$ is the permutation matrix.
[2] A fast projected fixed-point algorithm for large graph matching, Pattern Recognition, 2016

EXPERIMENTAL RESULTS
In all four cases, the MAE of the enhanced version improves over that of the original DeepNC.
- The highest improvement rate (7.6%) is observed on the Protein dataset
- The experiment is conducted using only 1 iteration
Figure: bar chart of MAE (y-axis, 0 to 1.6) on LFR, B-A, CiteSeer, and Protein for DeepNC (original) and DeepNC (enhanced); a 7.6% gain is annotated.

CONCLUSION AND FUTURE WORK
- We introduce the partially observable network and our motivation
- We propose an enhanced version of DeepNC, where link prediction is enabled
- Future work: intensive experiments

Email: congtran@ieee.org
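The evaluation pipeline (node sampling, edge sampling, then a matching score between ground-truth and recovered adjacency matrices) can be sketched on a toy graph as follows. This is a rough illustration under several assumptions, not the experimental code: the function names are hypothetical, the sampling percentages are read as keep fractions, the norm is taken to be the entrywise L1 norm, and the minimum over permutations is found by brute force instead of the fast projected fixed-point algorithm, so it is only feasible for very small graphs. No completion model is run; the "recovered" graph is simply the sampled graph with its original node labels.

```python
import itertools
import random

# Toy ground-truth graph (6 nodes, 8 undirected edges).
TRUTH = [
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def sample_partial(adj, node_frac=0.7, edge_frac=0.9, seed=0):
    """Keep a fraction of the nodes, then a fraction of the surviving edges."""
    rng = random.Random(seed)
    n = len(adj)
    kept = sorted(rng.sample(range(n), max(1, round(node_frac * n))))
    edges = [(u, v) for i, u in enumerate(kept) for v in kept[i + 1:] if adj[u][v]]
    edges = rng.sample(edges, round(edge_frac * len(edges)))
    return kept, edges


def to_adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matching_score(a, b):
    """Brute-force min over node permutations p of
    0.5 * sum_ij |a[i][j] - b[p[i]][p[j]]|.

    For symmetric 0/1 matrices without self-loops this counts the
    undirected edges that differ under the best alignment.
    """
    n = len(a)
    best = None
    for p in itertools.permutations(range(n)):
        cost = sum(abs(a[i][j] - b[p[i]][p[j]])
                   for i in range(n) for j in range(n)) // 2
        if best is None or cost < best:
            best = cost
    return best


kept, edges = sample_partial(TRUTH)
# Stand-in for a completion model: unobserved nodes stay isolated.
recovered = to_adjacency(len(TRUTH), edges)
score = matching_score(TRUTH, recovered)
print("kept nodes:", kept, "observed edges:", len(edges), "score:", score)
```

Because the stand-in recovers nothing, its score equals the number of ground-truth edges that were lost by sampling; a completion method is useful to the extent that it pushes this score lower.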