Part I: Introductory Materials
Introduction to Graph Theory

Dr. Nagiza F. Samatova
Department of Computer Science, North Carolina State University
and
Computer Science and Mathematics Division, Oak Ridge National Laboratory

Graphs
Graph: $G = (V, E)$
Nodes / Vertices: $V = \{v_1, v_2, \ldots, v_n\}$
Edges: $E = \{e_k = (v_i, v_j) \mid v_i, v_j \in V,\ k = 1, \ldots, m\}$
Undirected: $(v_i, v_j) = (v_j, v_i)$
Directed: $(v_i, v_j) \neq (v_j, v_i)$
[Figure: graph with 7 nodes and 16 edges]

Types of Graphs
• Undirected vs. Directed
• Attributed/Labeled (e.g., vertex, edge) vs. Unlabeled
• Weighted vs. Unweighted
• General vs. Bipartite (Multipartite)
• Trees (no cycles)
• Hypergraphs
• Simple vs. w/ loops vs. w/ multi-edges

Labeled Graphs and Induced Subgraphs
[Figure: labeled graph w/ loops. Bold: a subgraph induced by vertices b, c and d]

Graph Isomorphism
[Figure: three graphs, panels (A), (B), and (C)]
Which graphs are isomorphic?

Graph Automorphism
Automorphism is isomorphism that preserves the labels.
[Figure: three graphs, panels (A), (B), and (C)]
Which graphs are automorphic?

Vertex degree, in-degree, out-degree
[Figure: directed edge from tail t to head h]
Directed:
• In-degree of the vertex is the number of incoming edges
• Out-degree of the vertex is the number of outgoing edges
Degree of the vertex is the number of edges (both in- & out-degree)

Graph Representation and Formats
• Adjacency Matrix (vertex vs. vertex)
• Incidence Matrix (vertex vs. edge)
• Sparse vs. Dense Matrices
• DIMACS file format
• In R: igraph object

Adjacency Matrix Representation
Representation is NOT unique. Algorithms can be order-sensitive.
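To make the non-uniqueness of the adjacency matrix concrete, here is a minimal Python sketch. The example graph (a path a - b - c - d), the two vertex orderings, and the helper names `adjacency_matrix` and `relabel` are all hypothetical choices made for illustration; they do not come from the slides.

```python
def adjacency_matrix(order, edges):
    """Build a 0/1 adjacency matrix for an undirected graph.

    `order` fixes which vertex owns which row/column;
    `edges` is a list of unordered vertex pairs.
    """
    index = {v: i for i, v in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: (u, v) = (v, u)
    return matrix


def relabel(matrix, old_order, new_order):
    """Permute the rows and columns of `matrix` from old_order to new_order."""
    pos = {v: i for i, v in enumerate(old_order)}
    return [[matrix[pos[u]][pos[v]] for v in new_order] for u in new_order]


# Hypothetical example graph: the path a - b - c - d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
order_1 = ["a", "b", "c", "d"]
order_2 = ["c", "a", "d", "b"]

m1 = adjacency_matrix(order_1, edges)
m2 = adjacency_matrix(order_2, edges)

print(m1 != m2)                             # True: same graph, different matrices
print(relabel(m1, order_1, order_2) == m2)  # True: they differ only by a permutation
```

An algorithm that scans rows in matrix order may therefore visit vertices in a different sequence for `m1` and `m2`, which is one way order sensitivity can arise.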
[Figure: two drawings of an eight-vertex graph, with the vertex labels A(1) to A(4) and B(5) to B(8) placed in different positions]

First adjacency matrix:
       A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)    1    1    1    0    1    0    0    0
A(2)    1    1    0    1    0    1    0    0
A(3)    1    0    1    1    0    0    1    0
A(4)    0    1    1    1    0    0    0    1
B(5)    1    0    0    0    1    1    1    0
B(6)    0    1    0    0    1    1    0    1
B(7)    0    0    1    0    1    0    1    1
B(8)    0    0    0    1    0    1    1    1

Second adjacency matrix:
       A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)    1    1    0    1    0    1    0    0
A(2)    1    1    1    0    0    0    1    0
A(3)    0    1    1    1    1    0    0    0
A(4)    1    0    1    1    0    0    0    1
B(5)    0    0    1    0    1    0    1    1
B(6)    1    0    0    0    0    1    1    1
B(7)    0    1    0    0    1    1    1    0
B(8)    0    0    0    1    1    1    0    1

Src: “Introduction to Data Mining” by Kumar et al.

Families of Graphs
• Cliques
• Path and simple path
• Cycle
• Tree
• Connected graphs
Read the book chapter for definitions and examples.

Complete Graph, or Clique
Each pair of vertices is connected by an edge.
[Figure: clique]

The CLIQUE Problem
$\mathrm{CLIQUE} = \{\langle G, k \rangle \mid G \text{ has a clique of size } k\}$
Clique: a complete subgraph
Maximal Clique: a clique that cannot be enlarged by adding any more vertices
Maximum Clique: the largest maximal clique in the graph
[Figure: maximum clique of size 5]

Does this graph contain a 4-clique?
[Figure: example graph]
Indeed it does! But if it had not, what evidence would have been needed?

Problem: Decision, Optimization or Search
Problem        Output
Decision       “Yes”/“No”; parameter k
Optimization   max/min
Search         Actual solution
Enumeration    All solutions
(self-reduction)
• Which problem is harder to solve?
• If we solve the Decision problem, can we use it for the others?
Formulate each version for the CLIQUE problem.

Refresher: Class P and Class NP
Definition: P (NP) is the class of languages/problems that are decidable in polynomial time on a (non-)deterministic single-tape Turing machine.
$P = \bigcup_k \mathrm{DTIME}(n^k)$
$NP = \bigcup_k \mathrm{NTIME}(n^k)$
NP = ???? Not “non-polynomial”: NP stands for nondeterministic polynomial time, equivalently the class of polynomially verifiable problems.
[Figure: classes P and NP]

P vs. NP
The Classic Complexity Theory View:
“easy”: P
“hard”: NP, $\Sigma_2^P$, …, PSPACE, … (“forget about it”)

“About ten years ago some computer scientists came by and said they heard we have some really cool problems.
They showed that the problems are NP-complete and went away!”

Classical Graph Theory Problems
CSC505: Algorithms, CSC707: Complexity Theory, CSC5??: Graph Theory
NP-hard Problems:
• Longest Path
• Maximum Clique
• Minimum Vertex Cover
• Hamiltonian Path/Cycle
• Traveling Salesman (TSP)
• Maximum Independent Set
• Minimum Dominating Set
• Graph/Subgraph Isomorphism
• Maximum Common Subgraph
• …

Graph Mining Problems
CSC 422/522 and Our Book
Many graph mining problems have to deal with classical graph problems as part of their data mining pipelines.
• Clustering (+ Maximal Clique Enumeration)
• Classification
• Association Rule Mining (+ Frequent Subgraph Mining)
• Anomaly Detection
• Similarity/Dissimilarity/Distance Measures
• Graph-based Dimension Reduction
• Link Analysis
• …

Dealing with Computational Intractability
• Exact Algorithms:
  – Small graph problems
  – Small parameters to graph problems
  – Special classes of graphs (e.g., bounded tree-width)
• Approximation Polynomial-Time Algorithms ($O(n^c)$)
  – Guaranteed error-bar on the solution
• Heuristic Polynomial-Time Algorithms (our focus)
  – No guarantee on the quality of the solution
  – Low-degree polynomial solutions
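The contrast between exact and heuristic approaches listed above can be illustrated with a minimal Python sketch for the clique problem. It is only one possible illustration, not a method taken from the slides: the function names `is_clique`, `has_clique`, and `greedy_clique`, the degree-based ordering, and the `example` graph are all assumptions made here.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Return True if every pair of `vertices` is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def has_clique(adj, k):
    """Exact decision version: does the graph contain a clique of size k?

    Tries every k-subset of vertices, so it is only practical
    for small graphs or small k.
    """
    return any(is_clique(adj, subset) for subset in combinations(adj, k))


def greedy_clique(adj):
    """Heuristic: scan vertices by decreasing degree and keep each vertex
    that is adjacent to everything kept so far.

    Runs in polynomial time and returns a maximal clique,
    but not necessarily a maximum one.
    """
    clique = []
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique


# Hypothetical example: a star centered at vertex 0 plus a separate triangle.
# Adjacency is stored as a dict mapping each vertex to its set of neighbors.
example = {
    0: {4, 5, 6, 7},
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {0}, 5: {0}, 6: {0}, 7: {0},
}

print(greedy_clique(example))   # [0, 4]
print(has_clique(example, 3))   # True
```

On this graph the heuristic starts from the highest-degree vertex and stops at a clique of size 2, while the exhaustive check confirms that a clique of size 3 (the triangle 1, 2, 3) exists. This is the "no guarantee on the quality of the solution" trade-off in miniature.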