Algorithm Design and Analysis (ADA) 242-535, Semester 1 2014-2015

8. Introduction to Graphs

• Objective
  o introduce the main kinds of graphs, discuss two implementation approaches, and remind you about trees

242-535 ADA: 8. Intro. Graphs

Overview
1. Graphs
2. Graph Terminology
3. Implementing Graphs
   - adjacency matrix
   - adjacency list
4. Trees and Forests
5. Tree Terminology

1. Graphs
• A graph has two parts (V, E), where:
  o V is the set of nodes, called vertices
  o E is the set of links between vertices, called edges
• Example:
  o airports and the distances between them
[Figure: a graph of the airports SFO, PVD, ORD, LGA, HNL, LAX, DFW, and MIA]

1.1. Graph Types
• Directed graph
  o the edges are directed
  o e.g., bus cost network
• Undirected graph
  o the edges are undirected
  o e.g., road network

1.2. Examples
• Electronic circuits
  o Printed circuit board
  o Integrated circuit
• Transportation networks
  o Highway network
  o Flight network
• Computer networks
  o Local area network
  o Internet
  o Web
• Databases
  o Entity-relationship diagram
[Figure: example graphs with nodes cslab1a, cslab1b, math.brown.edu, cs.brown.edu, brown.edu, qwest.net, att.net, cox.net, John, Paul, and David]

Graphs are everywhere
• Transportation network (airline routes): nodes are airports; edges are nonstop flights
• Communication networks: nodes are computers, hubs, routers; edges are physical wires
• Information network (web): nodes are pages; edges are hyperlinks
• Information network (scientific papers): nodes are articles; edges are references
• Social networks: nodes are people; edges are “u is v’s friend”, “u sends email to v”, “u’s FaceBook links to v”
• Computer programs: nodes are functions (or modules), with edges “u calls v”; or nodes are statement blocks, with edges “v can follow u”

A Calling Graph
• A calling graph for a program:
[Figure: calling graph over main, makeList, printList, mergeSort, split, and merge, showing 4 examples of recursion]

Sheet Metal Hole Drilling
• Problem: minimise the moving time of the drill over a metal sheet.
A Weighted Graph Version
• Add edge numbers (weights) for the movement time between any two holes.
[Figure: weighted graph on the holes a, b, c, d, and e, with edge weights 8, 4, 6, 2, 6, 3, 5, 4, 9, and 12]

2. Graph Terminology
• End vertices (or endpoints) of an edge
  o U and V are the endpoints of a
• Edges incident on a vertex
  o a, d, and b are incident on V
• Adjacent vertices
  o U and V are adjacent
• Degree of a vertex
  o X has degree 5
• Parallel edges
  o h and i are parallel edges
• Self-loop
  o j is a self-loop
[Figure: graph with vertices U, V, W, X, Y, Z and edges a to j]

• Path
  o sequence of alternating vertices and edges
  o begins with a vertex
  o ends with a vertex
  o each edge is preceded and followed by its endpoints
• Simple path
  o path such that all its vertices and edges are distinct
• Examples
  o P1 = (V,b,X,h,Z) is a simple path
  o P2 = (U,c,W,e,X,g,Y,f,W,d,V) is a path that is not simple (W appears twice)

• Cycle
  o circular sequence of alternating vertices and edges
  o each edge is preceded and followed by its endpoints
• Simple cycle
  o cycle such that all its vertices and edges are distinct
• Examples
  o C1 = (V,b,X,g,Y,f,W,c,U,a) is a simple cycle
  o C2 = (U,c,W,e,X,g,Y,f,W,d,V,a) is a cycle that is not simple (W appears twice)

Connectivity
• A graph is connected if there is a path between every pair of vertices
[Figure: a connected graph, and a non-connected graph with two connected components]

Some Properties
• Notation
  o V: set of vertices
  o E: set of edges
  o |...|: the set size
  o degree(): degree of a vertex
• Property: $\sum_{v} \mathrm{degree}(v) = 2|E|$
  o Proof: each undirected edge is counted twice (called the handshaking lemma)
• Property: in an undirected graph with no self-loops and no multiple edges, $|E| \le |V|(|V| - 1)/2$
  o Proof: each vertex has degree at most (|V| - 1)
• Example (vertices a, b, c, d, with every pair joined by an edge)
  o |V| = 4, |E| = 6, degree(a) = 3
  o check: the four degrees sum to 4*3 = 12 = 2*|E|

3. Implementing Graphs
• We will typically express running times in terms of |E| and |V| (often dropping the |’s)
  o If $|E| \approx |V|^2$ the graph is dense
    • can also write this as |E| is $O(|V|^2)$
  o If $|E| \approx |V|$ the graph is sparse
    • or |E| is O(|V|)
• Dense and sparse graphs are best implemented using two different data structures:
  o Adjacency matrices: for dense graphs
  o Adjacency lists: for sparse graphs

Dense Big-Oh
• In the most dense graph, a graph of |V| vertices will have |V|(|V|-1)/2 edges.
• In that case, for large |V|, |E| is $O(|V|^2)$
• Example: |V| = 5, |E| = (5*4)/2 = 10

• Proof that the most dense graph of n nodes has n(n-1)/2 edges. Write as S(n) = n(n-1)/2
• Basis. S(2) = 1. True.
• Inductive Case.
  o assume S(n) = n(n-1)/2   (1)
  o try to show S(n+1) = (n+1)n/2   (2)
  o we know: S(n+1) = S(n) + n, since the new node is joined to each of the n existing nodes
  o which is S(n+1) = n(n-1)/2 + n, using (1)
  o which is S(n+1) = (n+1)n/2
  o which is (2)

3.1. Adjacency Matrix
[Figure: an undirected graph on the vertices a, b, c, d, e]
Its adjacency matrix:
      a  b  c  d  e
  a   0  1  0  0  1
  b   1  0  1  0  1
  c   0  1  1  0  1
  d   0  0  0  0  1
  e   1  1  1  1  0

Properties
• An adjacency matrix represents the graph as a |V| x |V| matrix A:
  o A[i, j] = 1 if edge (i, j) $\in$ E
  o A[i, j] = 0 if edge (i, j) $\notin$ E
• The degree of a vertex v (of a simple graph) = sum of row v or sum of column v
  o e.g. vertex a has degree 2 since it is connected to b and e
• An adjacency matrix can represent loops
  o e.g. vertex c in the matrix above (A[c, c] = 1)

• An adjacency matrix can represent parallel edges if non-negative integers are allowed as matrix entries
  o ijth entry = no. of edges between vertex i and j
• For an undirected graph, the matrix duplicates information around the main diagonal
  o the size can be easily reduced with some coding tricks
• Properties of graphs can be obtained using matrix operations
  o e.g. the no. of paths of a given length, and vertex degree

The No. of Paths of Length n
• If an adjacency matrix A is multiplied by itself repeatedly:
  o $A, A^2, A^3, \ldots, A^n$
  then the ijth entry in matrix $A^n$ is equal to the number of paths (not necessarily simple) from i to j of length n.

Example
[Figure: an undirected graph on the vertices a, b, c, d, e]
A =
      a  b  c  d  e
  a   0  1  0  1  0
  b   1  0  1  0  1
  c   0  1  0  1  1
  d   1  0  1  0  0
  e   0  1  1  0  0

$A^2 = A \times A$ =
      a  b  c  d  e
  a   2  0  2  0  1
  b   0  3  1  2  1
  c   2  1  3  0  1
  d   0  2  0  2  1
  e   1  1  1  1  2

Why it Works...
• Consider row a, column c in $A^2$:
  o row a of A is (0 1 0 1 0); column c of A is (0 1 0 1 1)
  o their product is 0*0 + 1*1 + 0*0 + 1*1 + 0*1 = 2
  o the two non-zero terms correspond to the paths a-b-c and a-d-c

• A non-zero product means there is at least one vertex connecting vertices a and c.
• The sum is 2 because of:
  o (a, b, c) and (a, d, c)
  o 2 paths of length two

The Degree of Vertices
• The entries on the main diagonal of $A^2$ give the degrees of the vertices (when A represents a simple graph).
• Consider vertex c:
  o degree of c == 3 since it is connected to the edges (c,b), (c,d), and (c,e).

• In $A^2$ these become paths of length 2:
  o (c,b,c), (c,d,c), and (c,e,c)
• So the number of paths of length 2 from c back to c = the degree of c
  o this is true for all vertices

Coding Adjacency Matrices
• #define NUMNODES n
  int arcs[NUMNODES][NUMNODES];
• arcs[u][v] == 1 if there is an edge (u,v); 0 otherwise
• Storage used: $O(|V|^2)$
• The implementation may also need a way to map node names (strings) to array indices.

• If n is large then the array will be very large, with almost half of it being unnecessary (for an undirected graph).
• If the nodes are lightly connected then most of the array will contain 0’s, which is a further waste of memory.
Representing Directed Graphs
• A directed graph:
[Figure: a directed graph on the vertices 0, 1, 2, 3, 4]

Its Adjacency Matrix
(rows = start vertex, columns = finish vertex)
      0  1  2  3  4
  0   1  1  1  0  0
  1   0  0  0  1  0
  2   1  1  0  0  1
  3   0  0  1  0  1
  4   0  1  0  0  0
• Not symmetric; all of the array may be necessary.
• Still a waste of space if nodes are lightly connected.

When to use an Adjacency Matrix
• The adjacency matrix is an efficient way to store dense graphs.
• But most large interesting graphs are sparse
  o e.g., planar graphs, in which no edges cross, have |E| = O(|V|) by Euler’s formula
  o For this reason the adjacency list is often a better representation than the adjacency matrix

Euler’s Formula
• Euler (1752) proved that for any connected planar graph, where:
  o F = no. of faces
  o E = no. of edges
  o V = no. of vertices/nodes
  then the formula holds: F = E - V + 2
• Example: F = 5; E = 9; V = 6 (check: 9 - 6 + 2 = 5)

3.2. Adjacency List
• Adjacency list: for each vertex v $\in$ V, store a list of vertices adjacent to v
• Example:
  o adj[0] = {0, 1, 2}
  o adj[1] = {3}
  o adj[2] = {0, 1, 4}
  o adj[3] = {2, 4}
  o adj[4] = {1}
• Can be used for directed and undirected graphs.
[Figure: the directed graph on the vertices 0, 1, 2, 3, 4]

• An implementation diagram:
  adj[0] -> 0 -> 1 -> 2 -> NULL
  adj[1] -> 3 -> NULL
  adj[2] -> 0 -> 1 -> 4 -> NULL
  adj[3] -> 2 -> 4 -> NULL
  adj[4] -> 1 -> NULL
  o size of array = no. of vertices (|V|)
  o no. of cells == no. of edges (|E|)

Data Structures
• struct cell {   /* for a linked list */
    Node nodeName;
    struct cell *next;
  };
  struct cell *adj[NUMNODES];
• adj[u] points to a linked list of cells which give the names of the nodes connected to u.

Storage Needs
• How much storage is required?
  o The degree of a vertex v == number of incident edges
    • directed graphs have in-degree, out-degree values
• For directed graphs, the number of items in the adjacency lists is $\sum_{v} \text{out-degree}(v) = |E|$
• This uses $\Theta(V + E)$ storage
• For undirected graphs, the number of items in the adjacency lists is $\sum_{v} \mathrm{degree}(v) = 2|E|$ (the handshaking lemma)
  o Why? If we mark every edge connected to every vertex, then by the end, every edge will be marked twice
• This also uses $\Theta(V + E)$ storage
• In summary, adjacency lists use $\Theta(V + E)$ storage

3.3. Running Time: Matrix or List?
• Which representation is better for graphs?
• The simple answer:
  o dense graph: use a matrix
  o sparse graph: use an adjacency list
• But a more accurate answer depends on the operations that will be applied to the graph.
• We will consider three operations:
  o is there an edge between u and v?
  o find the successors of u (in a directed graph)
  o find the predecessors of u (in a directed graph)

Is there an edge (u,v)?
• Adjacency matrix: O(1) to read arcs[u][v]
• Adjacency list: O(1 + E/V) (dropping the |...| notation)
  o O(1) to get to adj[u]
  o length of linked list is on average E/V
  o if a sparse graph ($E \approx V$): O(1 + E/V) => O(1)
  o if a dense graph ($E \approx V^2$): O(1 + E/V) => O(V)

Find u’s successors (u->v)
• Adjacency matrix: O(V) since must examine the entire row for vertex u
• Adjacency list: O(1 + E/V) since must look at entire list pointed to by adj[u]
  o if a sparse graph ($E \approx V$): O(1 + E/V) => O(1)
  o if a dense graph ($E \approx V^2$): O(1 + E/V) => O(V)

Find u’s predecessors (t->u)
• Adjacency matrix: O(V) since must examine the entire column for vertex u
  o a 1 in the row for ‘t’ means that ‘t’ is a predecessor
• Adjacency list: O(E) since must examine every list pointed to by adj[]
  o if a sparse graph ($E \approx V$): O(E) is fast
  o if a dense graph ($E \approx V^2$): O(E) is slow

Summary: which is faster?
• Operation     Dense Graph    Sparse Graph
  Find edge     Adj. Matrix    Either
  Find succ.    Either         Adj. list
  Find pred.    Adj. Matrix    Either
• As a graph gets denser, an adjacency matrix has better execution time than an adjacency list.

3.4. Storage Space: Matrix or List?
• The size of an adjacency matrix for a graph of V nodes is:
  o $V^2$ bits (assuming 0 and 1 are stored as bits)

• An adjacency list cell uses:
  o 32 bits for the integer, 32 bits for the pointer
  o so, cell size = 64 bits
• Total no. of cells = total no. of edges, E
  o so, total size of lists = 64*E bits
• The array of list heads, adj[], has V entries (for V vertices)
  o so, array size is 32*V bits
• Total size of an adjacency list data structure: 64*E + 32*V bits

Size Comparison
• An adjacency list will use less storage than an adjacency matrix when:
  $64E + 32V < V^2$
  which is:
  $E < V^2/64 - V/2$
  When V is large, ignore the V/2 term:
  $E < V^2/64$

• $V^2$ is (roughly) the maximum number of edges.
• So if the actual number of edges in a graph is less than 1/64 of the maximum number of edges, then an adj. list representation will be smaller than an adj. matrix coding
  o but the graph must be quite sparse

4. Trees and Forests
• A (free) tree is an undirected graph T such that
  o T is connected
  o T has no cycles
  This definition of a tree is different from that of a rooted tree
• A forest is an undirected graph without cycles
• The connected components of a forest are trees
[Figure: an example tree and an example forest]

Uses of Trees
[Figure: organisation chart containing President; Vice-President for Academics; Vice-President for Admin.; Dean of Engineering; Dean of Business; Head of CoE; Head of EE; Head of AC.; Planning Officer; Purchases Officer; ....]

Saturated Hydrocarbons
[Figure: the molecules butane and isobutane, drawn as graphs of C and H atoms]
• Non-rooted (free) trees
  o a free tree is a connected graph with no cycles
A Computer File System
[Figure: a directory tree rooted at /, containing the nodes usr, bin, ed, ad, vi, bin, spool, exs, opr, ls, mail, tmp, who, junk, uucp, and printer]

5. (Rooted) Tree Terminology
• e.g. Part of the ancient Greek god family:
  level 0: Uranus
  level 1: Aphrodite, Kronos, Atlas, Prometheus (children of Uranus)
  level 2: Eros (child of Aphrodite); Zeus, Poseidon, Hades, Ares (children of Kronos)
  level 3: Apollo, Athena, Hermes, Heracles (children of Zeus)

Some Definitions
• Let T be a tree with root v0.
• Suppose that x, y, z are vertices in T.
• (v0, v1, ..., vn) is a simple path in T (no loops).
• a) vn-1 is the parent of vn.
• b) v0, ..., vn-1 are ancestors of vn.
• c) vn is a child of vn-1.

• d) If x is an ancestor of y, then y is a descendant of x.
• e) If x and y are children of z, then x and y are siblings.
• f) If x has no children, then x is a terminal vertex (or a leaf).
• g) If x is not a terminal vertex, then x is an internal (or branch) vertex.

• h) The subtree of T rooted at x is the graph with vertex set V and edge set E
  o V contains x and all the descendants of x
  o E = {e | e is an edge on a simple path from x to some vertex in V}
• i) The length of a path is the number of edges it uses, not vertices.

• j) The level of a vertex x is the length of the simple path from the root to x.
• k) The height of a vertex x is the length of the simple path from x to the farthest leaf
  o the height of a tree is the height of its root
• l) A tree where every internal vertex has exactly m children is called a full m-ary tree.

Applied to the Example
• The root is Uranus.
• A simple path is (Uranus, Aphrodite, Eros).
• The parent of Eros is Aphrodite.
• The ancestors of Hermes are Zeus, Kronos, and Uranus.
• The children of Zeus are Apollo, Athena, Hermes, and Heracles.
• The descendants of Kronos are Zeus, Poseidon, Hades, Ares, Apollo, Athena, Hermes, and Heracles.
• The leaves (terminal vertices) are Eros, Apollo, Athena, Hermes, Heracles, Poseidon, Hades, Ares, Atlas, and Prometheus.
• The branches (internal vertices) are Uranus, Aphrodite, Kronos, and Zeus.

• The subtree rooted at Kronos:
  Kronos
    Zeus
      Apollo
      Athena
      Hermes
      Heracles
    Poseidon
    Hades
    Ares

• The length of the path (Uranus, Aphrodite, Eros) is 2 (not 3).
• The level of Ares is 2.
• The height of the tree is 3.