Part I: Introductory Materials
Introduction to Graph Theory
Dr. Nagiza F. Samatova
Department of Computer Science
North Carolina State University
and
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Graphs
G  (V , E )
V  {v1 , v2 ,..., vn }
Graph with 7 nodes and 16 edges
Nodes / Vertices
E  {ek  (vi , v j ) | vi , v j V , k  1,..., m}
Undirected
Edges
(vi , v j )  (v j , vi )
Directed
(vi , v j )  (v j , vi )
2
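The definitions above can be sketched in Python. This is a minimal illustration, not part of the original slides: the four-vertex graph and the helper names `has_edge_undirected` and `has_edge_directed` are assumptions made up for the example.

```python
# Hypothetical 4-vertex example; vertex names are illustrative only.
V = {"v1", "v2", "v3", "v4"}
pairs = [("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v3", "v4")]

# Undirected edges: a frozenset ignores order, so (vi, vj) == (vj, vi).
undirected_E = {frozenset(p) for p in pairs}

# Directed edges: a tuple keeps order, so (vi, vj) != (vj, vi).
directed_E = set(pairs)


def has_edge_undirected(u, v):
    """True if the undirected graph contains the edge {u, v}."""
    return frozenset((u, v)) in undirected_E


def has_edge_directed(u, v):
    """True if the directed graph contains the edge u -> v."""
    return (u, v) in directed_E


print(has_edge_undirected("v2", "v1"))  # same edge as (v1, v2)
print(has_edge_directed("v2", "v1"))    # only v1 -> v2 was stored
```

Storing undirected edges as unordered sets and directed edges as ordered pairs mirrors the equality and inequality stated on the slide.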
Types of Graphs
• Undirected vs. Directed
• Attributed/Labeled (e.g., vertex, edge) vs. Unlabeled
• Weighted vs. Unweighted
• General vs. Bipartite (Multipartite)
• Trees (no cycles)
• Hypergraphs
• Simple vs. with loops vs. with multi-edges
Labeled Graphs and Induced Subgraphs
Figure: Labeled graph with loops
Bold: the subgraph induced by vertices b, c, and d
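A small sketch of how an induced subgraph can be computed: keep exactly those edges whose endpoints both lie in the chosen vertex set. The graph below is a made-up example with a loop at b, not the graph from the slide figure, and `induced_subgraph` is a hypothetical helper name.

```python
def induced_subgraph(edges, keep):
    """Return the edges of the subgraph induced by the vertex set `keep`."""
    keep = set(keep)
    return {(u, v) for (u, v) in edges if u in keep and v in keep}


# Hypothetical labeled graph; ("b", "b") is a loop.
edges = {("a", "b"), ("b", "b"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e")}

sub = induced_subgraph(edges, {"b", "c", "d"})
print(sorted(sub))
```

Edges touching a or e are dropped, while the loop at b survives because both of its endpoints are in the chosen set.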
Graph Isomorphism
Figure: three graphs, (A), (B), and (C)
Which graphs are isomorphic?
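One way to answer such a question for very small graphs is brute force: try every bijection between the vertex sets and check whether it maps edges onto edges. The sketch below assumes undirected graphs without multi-edges; the function name `are_isomorphic` and the example graphs are illustrative and are not the graphs (A), (B), (C) from the figure. The running time grows factorially, so this is for intuition only.

```python
from itertools import permutations


def are_isomorphic(vertices_g, edges_g, vertices_h, edges_h):
    """Brute-force isomorphism test for small undirected graphs."""
    vertices_g, vertices_h = list(vertices_g), list(vertices_h)
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(vertices_g) != len(vertices_h) or len(eg) != len(eh):
        return False
    for perm in permutations(vertices_h):
        f = dict(zip(vertices_g, perm))  # candidate bijection G -> H
        # With equal edge counts, mapping every edge of G into H is enough.
        if all(frozenset(f[x] for x in e) in eh for e in eg):
            return True
    return False


cycle_1 = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
cycle_2 = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
path = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
star = (["c", "x", "y", "z"], [("c", "x"), ("c", "y"), ("c", "z")])

print(are_isomorphic(*cycle_1, *cycle_2))  # two drawings of a 4-cycle
print(are_isomorphic(*path, *star))        # same sizes, different structure
```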
Graph Automorphism
An automorphism is an isomorphism that preserves the labels.
Figure: three graphs, (A), (B), and (C)
Which graphs are automorphic?
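A hedged sketch of the label-preserving condition: the code below takes the common reading of an automorphism as a mapping of a graph onto itself, and additionally requires that each vertex is mapped to a vertex with the same label, as the slide states. The helper name `automorphisms`, the path graph, and its labels are assumptions for illustration.

```python
from itertools import permutations


def automorphisms(vertices, edges, label):
    """List label-preserving edge-preserving bijections of a small graph."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(vertices):
        f = dict(zip(vertices, perm))
        if any(label[v] != label[f[v]] for v in vertices):
            continue  # some vertex would change its label
        if all(frozenset(f[x] for x in e) in edge_set for e in edge_set):
            found.append(f)
    return found


# Hypothetical path 1 - 2 - 3 with two different labelings.
vertices, edges = [1, 2, 3], [(1, 2), (2, 3)]
symmetric_labels = {1: "A", 2: "B", 3: "A"}
distinct_labels = {1: "A", 2: "B", 3: "C"}

print(len(automorphisms(vertices, edges, symmetric_labels)))
print(len(automorphisms(vertices, edges, distinct_labels)))
```

With matching end labels the path can be flipped end to end; with three distinct labels only the identity mapping remains.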
Vertex degree, in-degree, out-degree
Directed edge: runs from its tail t to its head h.
In-degree of a vertex is the number of incoming edges.
Out-degree of a vertex is the number of outgoing edges.
Degree of a vertex is the number of edges incident to it (in-degree plus out-degree).
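The three quantities can be computed directly from a list of directed edges written as (tail, head) pairs. This is a minimal sketch; `degree_table` and the example edges are hypothetical.

```python
from collections import Counter


def degree_table(vertices, directed_edges):
    """Return in-degree, out-degree, and total degree for each vertex."""
    in_deg = Counter(head for _, head in directed_edges)
    out_deg = Counter(tail for tail, _ in directed_edges)
    return {
        v: {"in": in_deg[v], "out": out_deg[v], "total": in_deg[v] + out_deg[v]}
        for v in vertices
    }


# Each pair is (tail, head): the edge leaves the tail and enters the head.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
table = degree_table(["a", "b", "c"], edges)
for v, d in table.items():
    print(v, d)
```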
Graph Representation and Formats
• Adjacency Matrix (vertex vs. vertex)
• Incidence Matrix (vertex vs. edge)
• Sparse vs. Dense Matrices
• DIMACS file format
• In R: igraph object
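To make the first, second, and fourth representations concrete, here is a small sketch in plain Python for an undirected graph. The function names are hypothetical, and the DIMACS output assumes the common edge format (a `p edge <vertices> <edges>` header followed by `e u v` lines with vertices numbered from 1).

```python
def adjacency_matrix(vertices, edges):
    """Vertex-by-vertex 0/1 matrix of an undirected graph."""
    index = {v: i for i, v in enumerate(vertices)}
    A = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1
    return A


def incidence_matrix(vertices, edges):
    """Vertex-by-edge 0/1 matrix: column k marks the endpoints of edge k."""
    index = {v: i for i, v in enumerate(vertices)}
    B = [[0] * len(edges) for _ in vertices]
    for k, (u, v) in enumerate(edges):
        B[index[u]][k] = 1
        B[index[v]][k] = 1
    return B


def to_dimacs(vertices, edges):
    """Serialize the graph in DIMACS edge format (assumed layout, 1-based)."""
    index = {v: i + 1 for i, v in enumerate(vertices)}
    lines = [f"p edge {len(vertices)} {len(edges)}"]
    lines += [f"e {index[u]} {index[v]}" for u, v in edges]
    return "\n".join(lines)


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
A = adjacency_matrix(vertices, edges)
B = incidence_matrix(vertices, edges)
print(A)
print(B)
print(to_dimacs(vertices, edges))
```

The edge list passed in is itself a sparse representation; the matrices are dense and store every zero explicitly.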
Adjacency Matrix Representation
Representation is NOT unique. Algorithms can be order-sensitive.
Figure: two labelings of a graph with vertices A(1) to A(4) and B(5) to B(8), each with its adjacency matrix.

First labeling:
      A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)   1    1    1    0    1    0    0    0
A(2)   1    1    0    1    0    1    0    0
A(3)   1    0    1    1    0    0    1    0
A(4)   0    1    1    1    0    0    0    1
B(5)   1    0    0    0    1    1    1    0
B(6)   0    1    0    0    1    1    0    1
B(7)   0    0    1    0    1    0    1    1
B(8)   0    0    0    1    0    1    1    1

Second labeling:
      A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)   1    1    0    1    0    1    0    0
A(2)   1    1    1    0    0    0    1    0
A(3)   0    1    1    1    1    0    0    0
A(4)   1    0    1    1    0    0    0    1
B(5)   0    0    1    0    1    0    1    1
B(6)   1    0    0    0    0    1    1    1
B(7)   0    1    0    0    1    1    1    0
B(8)   0    0    0    1    1    1    0    1
Source: “Introduction to Data Mining” by Tan, Steinbach, and Kumar
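The non-uniqueness can be reproduced on a tiny example: reordering the vertices permutes rows and columns together, which changes the matrix but not the graph. The sketch below uses a hypothetical 4-vertex path rather than the 8-vertex graph from the figure; `permute` is an illustrative helper.

```python
def permute(A, perm):
    """Reorder rows and columns so new position i holds old vertex perm[i]."""
    return [[A[pi][pj] for pj in perm] for pi in perm]


# Path a - b - c - d with vertex order (a, b, c, d).
A1 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Same graph with vertex order (b, d, a, c).
A2 = permute(A1, [1, 3, 0, 2])

print(A2)
print(A1 == A2)  # the matrices differ
# An order-insensitive summary, the sorted degree sequence, agrees.
print(sorted(map(sum, A1)) == sorted(map(sum, A2)))
```

An algorithm that scans the matrix row by row can therefore behave differently on the two inputs, which is the order-sensitivity the slide warns about.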
Families of Graphs
• Cliques
• Path and simple path
• Cycle
• Tree
• Connected graphs
Read the book chapter for definitions and examples.
Complete Graph, or Clique
Every pair of distinct vertices is connected by an edge.
Figure: Clique
The CLIQUE Problem
CLIQUE  { G, k | G has a clique of size k}
Clique: a complete subgraph
Maximal Clique: a clique
cannot be enlarged by
adding any more vertices
Maximum Clique: the largest
maximal clique in the
graph
Maximum Clique of Size 5
12
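The difference between a clique, a maximal clique, and a maximum clique can be checked on a small example. The sketch below is a brute-force illustration under assumed names (`build_adj`, `is_clique`, `is_maximal_clique`, `maximum_clique`); the graph is made up (a 4-clique sharing one vertex with a triangle) and is not the graph in the figure.

```python
from itertools import combinations


def build_adj(edges):
    """Adjacency sets of an undirected graph given as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_clique(adj, S):
    """Every pair of vertices in S is adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))


def is_maximal_clique(adj, S):
    """S is a clique and no outside vertex can be added to it."""
    S = set(S)
    return is_clique(adj, S) and not any(
        is_clique(adj, S | {w}) for w in adj if w not in S
    )


def maximum_clique(adj):
    """Largest clique, found by trying subset sizes from large to small."""
    for k in range(len(adj), 0, -1):
        for S in combinations(adj, k):
            if is_clique(adj, S):
                return set(S)
    return set()


# 4-clique on {1, 2, 3, 4} plus a triangle on {4, 5, 6}.
adj = build_adj(list(combinations([1, 2, 3, 4], 2)) + [(4, 5), (4, 6), (5, 6)])

print(is_maximal_clique(adj, {1, 2, 3}))  # clique, but vertex 4 can be added
print(is_maximal_clique(adj, {4, 5, 6}))  # maximal, yet not maximum
print(maximum_clique(adj))
```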
Does this graph contain a 4-clique?
Indeed it does!
But if it had not, what evidence would have been needed?
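A naive search makes the asymmetry visible: a "yes" answer comes with a short witness (the clique itself), while this brute-force approach can only justify a "no" answer by ruling out every k-subset of vertices. The function name `find_k_clique` and both example graphs are assumptions for illustration.

```python
from itertools import combinations
from math import comb


def find_k_clique(adj, k):
    """Return (witness, checked): a k-clique or None, and subsets examined."""
    checked = 0
    for S in combinations(sorted(adj), k):
        checked += 1
        if all(v in adj[u] for u, v in combinations(S, 2)):
            return set(S), checked
    return None, checked


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical "yes" instance: a 4-clique on {0, 1, 2, 3} plus the edge 3 - 4.
k4_plus = build_adj(list(combinations(range(4), 2)) + [(3, 4)])
# Hypothetical "no" instance for k = 3: a 5-cycle has no triangle.
c5 = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])

print(find_k_clique(k4_plus, 4))
print(find_k_clique(c5, 3), "out of", comb(5, 3), "subsets")
```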
Problem: Decision, Optimization or Search
Problem version     Output
Decision            “Yes”/“No”
Optimization        Parameter k → max/min
Search              Actual solution
Enumeration         All solutions
(self-reduction)
• Which problem is harder to solve?
• If we can solve the Decision problem, can we use it to solve the others?
Formulate each version for the CLIQUE problem.
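For CLIQUE, the optimization and search versions can be built on top of the decision version. The sketch below is one possible self-reduction: `has_clique` stands in for a decision oracle (implemented here by brute force purely so the example runs), and all function names and the example graph are assumptions. Enumeration is not covered.

```python
from itertools import combinations


def has_clique(adj, k):
    """Decision version: does the graph contain a clique of size k?"""
    if k == 0:
        return True
    return any(
        all(v in adj[u] for u, v in combinations(S, 2))
        for S in combinations(adj, k)
    )


def max_clique_size(adj):
    """Optimization via decision: the largest k with a "yes" answer."""
    k = 0
    while k < len(adj) and has_clique(adj, k + 1):
        k += 1
    return k


def remove_vertex(adj, x):
    return {u: nbrs - {x} for u, nbrs in adj.items() if u != x}


def find_max_clique(adj):
    """Search via decision: delete each vertex whose removal keeps a k-clique."""
    k = max_clique_size(adj)
    current = adj
    for x in list(adj):
        smaller = remove_vertex(current, x)
        if has_clique(smaller, k):
            current = smaller
    return set(current)  # the surviving vertices form a k-clique


# Hypothetical graph: 4-clique on {1, 2, 3, 4} plus a triangle on {4, 5, 6}.
adj = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {1, 2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(max_clique_size(adj))
print(find_max_clique(adj))
```

Both derived routines call the decision procedure only a polynomial number of times, so a fast decision algorithm would give fast optimization and search as well.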
Refresher: Class P and Class NP
Definition: P (NP) is the class of languages/problems that are decidable in
polynomial time on a (non-)deterministic single-tape Turing machine.
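Class P: $\mathrm{P} = \bigcup_{k} \mathrm{DTIME}(n^k)$
Class NP: $\mathrm{NP} = \bigcup_{k} \mathrm{NTIME}(n^k)$
NP does not stand for “non-polynomial”; it stands for nondeterministic polynomial, that is, polynomially verifiable.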
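"Polynomially verifiable" can be illustrated with CLIQUE: given a proposed set of k vertices as a certificate, checking it takes a number of adjacency lookups quadratic in k. This is a minimal sketch; `verify_clique_certificate` and the example graph are hypothetical.

```python
def verify_clique_certificate(adj, k, certificate):
    """Check in polynomial time that `certificate` is a clique of size k."""
    S = set(certificate)
    if len(S) != k or not S <= set(adj):
        return False  # wrong size or unknown vertices
    return all(v in adj[u] for u in S for v in S if u != v)


# Hypothetical graph: triangle {1, 2, 3} with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

print(verify_clique_certificate(adj, 3, [1, 2, 3]))  # valid certificate
print(verify_clique_certificate(adj, 3, [2, 3, 4]))  # 2 and 4 are not adjacent
```

Verifying a given certificate is cheap; the difficulty lies in finding one.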
P vs. NP
The Classic Complexity Theory View:
“easy”: P
“hard” (“forget about it”): NP, $\Sigma_2$, …, PSPACE, …
“About ten years ago some computer scientists came by and said they heard we have some really cool problems. They showed that the problems are NP-complete and went away!”
Classical Graph Theory Problems
CSC505: Algorithms, CSC707: Complexity Theory, CSC5??: Graph Theory
NP-hard Problems:
• Longest Path
• Maximum Clique
• Minimum Vertex Cover
• Hamiltonian Path/Cycle
• Traveling Salesman (TSP)
• Maximum Independent Set
• Minimum Dominating Set
• Graph/Subgraph Isomorphism
• Maximum Common Subgraph
• …
Graph Mining Problems
CSC 422/522 and Our Book
Many graph mining problems have to deal with classical graph problems as part of their data mining pipelines.
• Clustering + Maximal Clique Enumeration
• Classification
• Association Rule Mining + Frequent Subgraph Mining
• Anomaly Detection
• Similarity/Dissimilarity/Distance Measures
• Graph-based Dimension Reduction
• Link Analysis
• …
Dealing with Computational Intractability
• Exact Algorithms:
– Small graph problems
– Small parameters to graph problems
– Special classes of graphs (e.g., bounded tree-width)
• Polynomial-Time Approximation Algorithms ($O(n^c)$)
– Guaranteed error bound on the solution
• Heuristic Polynomial-Time Algorithms (our focus)
– No guarantee on the quality of the solution
– Low-degree polynomial running time
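As an illustration of the heuristic category, the sketch below grows a clique greedily by scanning vertices in order of decreasing degree. It runs in low-degree polynomial time and always returns a maximal clique, but it offers no guarantee of finding a maximum one. The function name `greedy_clique`, the tie-breaking rule, and the example graph are assumptions, not a method taken from the slides.

```python
from itertools import combinations


def greedy_clique(adj):
    """Heuristic: visit vertices by decreasing degree, keep the compatible ones."""
    clique = []
    for v in sorted(adj, key=lambda u: (-len(adj[u]), str(u))):
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return set(clique)


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical worst-case flavor: a hub "h" with five leaves has the highest
# degree, but the largest clique is the separate 4-clique {a, b, c, d}.
star_edges = [("h", f"l{i}") for i in range(1, 6)]
k4_edges = list(combinations(["a", "b", "c", "d"], 2))
adj = build_adj(star_edges + k4_edges)

print(greedy_clique(adj))  # the heuristic commits to the hub first
```

Here the heuristic returns a clique of size 2 while a clique of size 4 exists, which is the kind of quality gap the "no guarantee" bullet refers to.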