Graph Matching: Simulation-Based Approach
Shang Zechao 1010161920

Introduction
• What is graph matching?
• When does one graph match another?

Introduction (cont.)
• Graphs: G = (V, E) and GQ = (VQ, EQ)
• Can easily be extended with labels.
• Exact matching: isomorphism
  – Find a bijection f between V and VQ
  – (u, v) in E iff (f(u), f(v)) in EQ

Introduction (cont.)
• Graph isomorphism
  – GI class
• Sub-graph isomorphism
  – NP-complete
• Too hard!

Simulation based approach [Henzinger95]
• Find a relation S ⊆ V × VQ
• (u, u’) in S if
  – u and u’ have the same label
  – for every child v’ of u’, there exists a v such that
    • v is a child of u
    • (v, v’) in S

Simulation based approach
• The major difference between graph simulation and graph isomorphism
  – Isomorphism requires a bijection (a one-to-one function)
  – Graph simulation is based on a relation (many-to-many)
• Simulation can be computed in polynomial time

An Example [Fan10]
• Drug dealer network
  – B: Boss
  – S: Secretary
  – AM: Assistant manager
  – FW: Field worker

An Example (cont.)
• In the real world
  – S and AM may be the same person
  – AM maps to multiple workers

Bounded Simulation [Fan10]
• Each edge in the pattern graph has a label
  – Either a positive integer k
  – Or * (unbounded)
• The label bounds the length of the path that connects the matches of the edge's two nodes

The Example (cont.)
• AM should be able to reach FW within 3 hops.

Matching Algorithm
• Similar to the EfficientSimilarity algorithm in [Henzinger95].
  – Pre-compute the distance matrix between all pairs of nodes in G.
• Complexity: O(|V||E| + |Ep||V|² + |Vp||V|)

Strong Simulation [Ma12]
• Recall the conditions under which two nodes match:
  – They have the same label
  – Their children can be matched by simulation
• Two issues
  – Parent information is not captured
  – The size of a match is not limited

An Example [Ma12]
• Bio can match Bio1, Bio2, Bio3, and Bio4
  – Actually only Bio4 makes sense

Strong Simulation
• Two nodes match if:
  – They have the same label
  – Their children can be matched by simulation
  – Their parents can be matched by simulation
• The matched sub-graph should be bounded by the diameter of the pattern graph

An Example (cont.)
• Bio only matches Bio4 in strong simulation

Comparison of different approaches

                           children   parents    connectivity   cycle
                           topology   topology                  info
simulation                 Y          N          N              N
with parent topology       Y          Y          Y              Y
with diameter constraint   Y          Y          Y              Y
isomorphism                Y          Y          Y              Y

Comparison of different approaches

                           locality   bounded    bisimulation   bounded
                                      matches                   cycle
simulation                 N          Y          N              N
with parent topology       N          N          N              N
with diameter constraint   Y          Y          N              N
isomorphism                Y          N          Y              Y

But
• The bounded cycle problem is intractable
  – NP-hard
• The bisimilar problem is intractable
  – coNP-hard

References
• [Henzinger95] M. R. Henzinger, T. A. Henzinger, and P. W. Kopke. 1995. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS '95). IEEE Computer Society, Washington, DC, USA, 453-.
• [Fan10] Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu. 2010. Graph pattern matching: from intractable to polynomial time. Proc. VLDB Endow. 3, 1-2 (September 2010), 264-275.
• [Ma12] Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, and Tianyu Wo. 2012. Capturing Topology in Graph Pattern Matching. PVLDB. To appear.
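
Appendix: a sketch of bounded simulation

The matching idea from the Bounded Simulation and Matching Algorithm slides can be illustrated with a naive fixpoint refinement in Python. This is a minimal sketch under stated assumptions, not the algorithm of [Fan10] nor EfficientSimilarity from [Henzinger95]: it precomputes hop distances with BFS and then repeatedly discards candidates that violate some pattern edge, without the bookkeeping those algorithms use to reach their stated complexity. The function names (`hop_distances`, `bounded_simulation`), the dictionary-based graph encoding, and the toy network are all hypothetical and chosen only for illustration. When every bound is 1, the procedure reduces to plain graph simulation.

```python
from collections import deque


def hop_distances(adj, source):
    """Length of the shortest nonempty path from source to each reachable node."""
    dist = {}
    queue = deque((child, 1) for child in adj.get(source, ()))
    while queue:
        node, d = queue.popleft()
        if node in dist:  # already reached by a path that is no longer
            continue
        dist[node] = d
        queue.extend((nxt, d + 1) for nxt in adj.get(node, ()) if nxt not in dist)
    return dist


def bounded_simulation(g_adj, g_label, q_edges, q_label):
    """Return the largest relation as a dict: pattern node -> set of data nodes.

    q_edges maps a pattern edge (u', v') to its bound: an int k, or None for '*'.
    An empty set for any pattern node means the pattern has no match.
    """
    # Distance "matrix": hop counts between all pairs of data nodes.
    dist = {u: hop_distances(g_adj, u) for u in g_label}
    # Start from all label-compatible candidates.
    sim = {uq: {u for u in g_label if g_label[u] == q_label[uq]} for uq in q_label}
    changed = True
    while changed:  # refine until no candidate can be removed
        changed = False
        for (uq, vq), bound in q_edges.items():
            for u in list(sim[uq]):
                # u needs some match v of v' reachable within the bound.
                witnessed = any(
                    v in dist[u] and (bound is None or dist[u][v] <= bound)
                    for v in sim[vq]
                )
                if not witnessed:
                    sim[uq].discard(u)
                    changed = True
    return sim


# Hypothetical toy network (not the figure from [Fan10]).
g_label = {"b": "B", "am": "AM", "s": "S", "w1": "FW", "am2": "AM"}
g_adj = {"b": ["am", "am2"], "am": ["s"], "s": ["w1"], "w1": [], "am2": []}
q_label = {"B": "B", "AM": "AM", "FW": "FW"}
q_edges = {("B", "AM"): 1, ("AM", "FW"): 3}  # AM must reach FW within 3 hops

result = bounded_simulation(g_adj, g_label, q_edges, q_label)
print(result)
```

On this toy network, `am2` is dropped because it reaches no field worker. Tightening the AM-to-FW bound to 1 leaves B and AM with no matches, since the only field worker is two hops away from `am`.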