Fundamentals of Python: From First Programs Through Data Structures

Chapter 20
Graphs

Objectives
After completing this chapter, you will be able to:
• Use the relevant terminology to describe the difference between graphs and other types of collections
• Recognize applications for which graphs are appropriate
• Explain the structural differences between an adjacency matrix representation of a graph and the adjacency list representation of a graph
• Analyze the performance of basic graph operations using the two representations of graphs
• Describe the differences between a depth-first traversal and a breadth-first traversal of a graph
• Explain the results of the topological sort, minimum spanning tree, and single-source shortest path algorithms
• Develop an ADT and implementation of a directed graph using one or both of the graph representations

Graph Terminology
• Mathematically, a graph is a set V of vertices and a set E of edges, such that each edge in E connects two of the vertices in V
  – We use node as a synonym for vertex
• One vertex is adjacent to another vertex if there is an edge connecting the two vertices (the two vertices are then called neighbors)
• Path: Sequence of edges that allows one vertex to be reached from another vertex in a graph
• A vertex is reachable from another vertex if and only if there is a path between the two
• Length of a path: Number of edges on the path
• Degree of a vertex: Number of edges connected to it
  – In a complete graph, the degree of each vertex is the number of vertices minus one
• A graph is connected if there is a path from each vertex to every other vertex
• A graph is complete if there is an edge from each vertex to every other vertex
• Subgraph: Subset of the graph’s vertices and the edges connecting those vertices
• Connected component: Subgraph consisting of the set of vertices reachable from a given vertex
• Simple path: Path that does not pass through the same vertex more than once
• Cycle: Path that begins and ends at the same vertex
• Graphs can be undirected or directed (a directed graph is also called a digraph)
• A directed edge has a source vertex and a destination vertex
• Edges emanating from a given source vertex are called its incident edges
• To convert an undirected graph to an equivalent directed graph, replace each edge in the undirected graph with a pair of edges pointing in opposite directions
• A digraph that contains no cycles is known as a directed acyclic graph, or DAG
• Lists and trees are special cases of directed graphs
• A dense graph has relatively many edges
• A sparse graph has relatively few edges
• The number of edges in a complete directed graph with N vertices is N * (N – 1)
• The number of edges in a complete undirected graph with N vertices is N * (N – 1) / 2
• The limiting case of a dense graph has approximately N^2 edges
• The limiting case of a sparse graph has approximately N edges

Why Use Graphs?
• Graphs serve as models of a wide range of objects:
  – A roadmap
  – A map of airline routes
  – A layout of an adventure game world
  – A schematic of the computers and connections that make up the Internet
  – The links between pages on the Web
  – The relationship between students and courses
  – A diagram of the flow capacities in a communications or transportation network

Representations of Graphs
• To represent graphs, you need a convenient way to store the vertices and the edges that connect them
• Two commonly used representations of graphs:
  – The adjacency matrix
  – The adjacency list

Adjacency Matrix
• If a graph has N vertices labeled 0, 1, ..., N – 1:
  – The adjacency matrix for the graph is a grid G with N rows and N columns
  – Cell G[i][j] = 1 if there is an edge from vertex i to vertex j
    • Otherwise, there is no edge and that cell contains 0
• If the graph is undirected, then four more cells are occupied by 1, because each undirected edge is recorded in both directions
• If the vertices are labeled, then the labels can be stored in a separate one-dimensional array

Adjacency List
• If a graph has N vertices labeled 0, 1, ..., N – 1:
  – The adjacency list for the graph is an array of N linked lists
  – The ith linked list contains a node for vertex j if and only if there is an edge from vertex i to vertex j

Analysis of the Two Representations
• Two commonly used graph operations:
  – Determine whether or not there is an edge between two given vertices
    • The adjacency matrix supports this operation in constant time
    • The linked adjacency list requires time that is linear in the length of the list, on the average
  – Find all of the vertices adjacent to a given vertex
    • The adjacency list tends to support this operation more efficiently than the adjacency matrix
    • Both are O(N) in the worst case (i.e., a complete graph)
• For the case of insertions of edges into the lists:
  – Insertion into an array-based adjacency list takes linear time
  – Insertion into a linked adjacency list requires constant time
• Memory usage:
  – The adjacency matrix always requires N^2 cells
  – The adjacency list requires an array of N pointers and, in the case of an undirected graph, a number of nodes equal to twice the number of edges

Further Run-Time Considerations
• To iterate across all the neighbors of a given vertex (N = number of vertices, M = number of edges):
  – The adjacency matrix must traverse a row in a time that is O(N); to repeat this for all rows is O(N^2)
  – For the adjacency list, the time to traverse all neighbors depends on the number of neighbors
    • On the average, O(M/N); to repeat this for all vertices is O(max(M, N))
      – For a dense graph: O(N^2)
      – For a sparse graph: O(N)
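The two representations and the two operations analyzed above can be sketched in Python as follows. This is only an illustration under stated assumptions: vertices are the integers 0 to N – 1, edges are given as (i, j) pairs, and plain Python lists stand in for the linked lists of the adjacency list. The function names are hypothetical and do not come from the chapter.

```python
def build_matrix(n, edges):
    """Return an n x n adjacency matrix; cell [i][j] is 1 if edge i -> j exists."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
    return matrix


def build_lists(n, edges):
    """Return n neighbor lists; lists[i] contains j for every edge i -> j."""
    lists = [[] for _ in range(n)]
    for i, j in edges:
        lists[i].append(j)
    return lists


def matrix_has_edge(matrix, i, j):
    # Constant time: a single cell lookup.
    return matrix[i][j] == 1


def lists_has_edge(lists, i, j):
    # Linear in the length of vertex i's neighbor list.
    return j in lists[i]


def matrix_neighbors(matrix, i):
    # Always scans a full row of N cells.
    return [j for j, cell in enumerate(matrix[i]) if cell == 1]


def lists_neighbors(lists, i):
    # Touches only the neighbors that actually exist.
    return list(lists[i])


# A small directed graph used only for this example.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
matrix = build_matrix(4, edges)
lists = build_lists(4, edges)
```

With this example graph, both representations report the same neighbors for vertex 0, but the matrix version inspects four cells while the list version inspects two entries.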
Graph Traversals
• In addition to the insertion and removal of items, important graph-processing operations include:
  – Finding the shortest path to a given item in a graph
  – Finding all of the items to which a given item is connected by paths
  – Traversing all of the items in a graph
• One starts at a given vertex and, from there, visits all vertices to which it connects
• Different from tree traversals, which always visit all of the nodes in a given tree

A Generic Traversal Algorithm
traverseFromVertex(graph, startVertex):
    mark all vertices in the graph as unvisited
    insert the startVertex into an empty collection
    while the collection is not empty:
        remove a vertex from the collection
        if the vertex has not been visited:
            mark the vertex as visited
            process the vertex
            insert all adjacent unvisited vertices into the collection

Breadth-First and Depth-First Traversals
• A depth-first traversal uses a stack as the collection in the generic algorithm
• A breadth-first traversal uses a queue as the collection in the generic algorithm
• Recursive depth-first traversal:
traverseFromVertex(graph, startVertex):
    mark all vertices in the graph as unvisited
    dfs(graph, startVertex)

dfs(graph, v):
    mark v as visited
    process v
    for each vertex, w, adjacent to v:
        if w has not been visited:
            dfs(graph, w)

• To traverse all vertices (iterative version):
traverseAll(graph):
    mark all vertices in the graph as unvisited
    instantiate an empty collection
    for each vertex in the graph:
        if the vertex has not been visited:
            insert the vertex in the collection
        while the collection is not empty:
            remove a vertex from the collection
            if the vertex has not been visited:
                mark the vertex as visited
                process the vertex
                insert adjacent unvisited vertices into the collection

• To traverse all vertices (recursive version):
traverseAll(graph):
    mark all vertices in the graph as unvisited
    for each vertex, v, in the graph:
        if v is unvisited:
            dfs(graph, v)

dfs(graph, v):
    mark v as visited
    process v
    for each vertex, w, adjacent to v:
        if w is unvisited:
            dfs(graph, w)

Graph Components
partitionIntoComponents(graph):
    lyst = []
    mark all vertices in the graph as unvisited
    for each vertex, v, in the graph:
        if v is unvisited:
            s = set()
            lyst.append(s)
            dfs(graph, v, s)
    return lyst

dfs(graph, v, s):
    mark v as visited
    s.add(v)
    for each vertex, w, adjacent to v:
        if w is unvisited:
            dfs(graph, w, s)

Trees Within Graphs
• traverseFromVertex yields a tree that is rooted at the vertex from which the traversal starts and includes all the vertices reached during the traversal
  – The tree is just a subgraph of the graph being traversed
  – If dfs has been used, we build a depth-first search tree
  – It is also possible to build a breadth-first search tree

Spanning Trees and Forests
• A spanning tree has the fewest edges possible while still retaining a connection between all the vertices in the component
  – If the component contains n vertices, the spanning tree contains n – 1 edges
• When you traverse all the vertices of an undirected graph, you generate a spanning forest

Minimum Spanning Tree
• In a weighted graph, you can sum the weights for all edges in a spanning tree and attempt to find a spanning tree that minimizes this sum
  – There are several algorithms for finding a minimum spanning tree for a component
• Repeated application to all the components in a graph yields a minimum spanning forest
• Example: determining how an airline can service all of its cities while minimizing the total length of the routes it needs to support

Algorithms for Minimum Spanning Trees
• There are two well-known algorithms for finding a minimum spanning tree:
  – One developed by Robert C. Prim in 1957
  – The other by Joseph Kruskal in 1956
• Here is Prim’s algorithm:
minimumSpanningTree(graph):
    mark all vertices and edges as unvisited
    mark some vertex, say v, as visited
    for all the vertices:
        find the least weight edge from a visited vertex to an unvisited vertex, say w
        mark the edge and w as visited
  – Maximum running time is O(m * n)
• Improvement to the algorithm: use a heap of edges
  – Maximum running time: O(m log n) for the adjacency list representation

Topological Sort
• A directed acyclic graph (DAG) has an order among the vertices
• A topological order assigns a rank to each vertex such that the edges go from lower- to higher-ranked vertices
• Topological sort: process of finding and returning a topological order of vertices in a graph
• One topological sort algorithm is based on a graph traversal
  – Performance is O(m) when stack insertions are O(1)

topologicalSort(graph):
    stack = LinkedStack()
    mark all vertices in the graph as unvisited
    for each vertex, v, in the graph:
        if v is unvisited:
            dfs(graph, v, stack)
    return stack
dfs(graph, v, stack):
    mark v as visited
    for each vertex, w, adjacent to v:
        if w is unvisited:
            dfs(graph, w, stack)
    stack.push(v)

The Shortest-Path Problem
• The single-source shortest path problem asks for a solution that contains the shortest paths from a given vertex to all of the other vertices
  – Has a widely used solution by Dijkstra
    • Is O(n^2) and assumes that all weights are positive
• The all-pairs shortest path problem asks for the set of all the shortest paths in a graph
  – A widely used solution by Floyd is O(n^3)

Dijkstra’s Algorithm
• Inputs: a weighted directed graph with edge weights > 0 (the graph need not be acyclic) and the source vertex
• Computes the distances of the shortest paths from the source vertex to all other vertices in the graph
• Output: a two-dimensional grid, results
  – N rows, where N is the number of vertices
• Uses a temporary list, included, of N Booleans to track whether the shortest path has already been determined for a vertex
• Steps: initialization step and computation step

The Initialization Step
for each vertex in the graph
    Store vertex in the current row of the results grid
    If vertex = source vertex
        Set the row’s distance cell to 0
        Set the row’s parent cell to undefined
        Set included[row] to True
    Else if there is an edge from source vertex to vertex
        Set the row’s distance cell to the edge’s weight
        Set the row’s parent cell to source vertex
        Set included[row] to False
    Else
        Set the row’s distance cell to infinity
        Set the row’s parent cell to undefined
        Set included[row] to False
    Go to the next row in the results grid

The Computation Step
Do
    Find the vertex F that is not yet included and has the minimal distance in the results grid
    Mark F as included
    For each other vertex T not included
        If there is an edge from F to T
            Set new distance to F’s distance + edge’s weight
            If new distance < T’s distance in the results grid
                Set T’s distance to new distance
                Set T’s parent in the results grid to F
While at least one vertex is not included

Analysis
• The initialization step must process every vertex, so it is O(n)
• The outer loop of the computation step also iterates through every vertex
  – The inner loop of this step iterates through every vertex not included thus far
  – Hence, the overall behavior of the computation step resembles that of other O(n^2) algorithms
• Dijkstra’s algorithm is O(n^2)

Developing a Graph ADT
• The graph ADT shown here creates weighted directed graphs with an adjacency list representation
• In the examples, the vertices are labeled with strings and the edges are weighted with numbers
• The implementation of the graph ADT shown here consists of the classes:
  – LinkedDirectedGraph
  – LinkedVertex
  – LinkedEdge

Example Use of the Graph ADT
• Assume that you want to create the following weighted directed graph:
  [Figure: weighted directed graph]
• To display the neighboring vertices and the incident edges of the vertex labeled A:
  [Code listing]
• Output:
  [Output listing]

The Class LinkedDirectedGraph
[Code listing: class LinkedDirectedGraph]

The Class LinkedVertex
[Code listing: class LinkedVertex]
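The classes LinkedDirectedGraph, LinkedVertex, and LinkedEdge are named in this section, but their methods are not spelled out in the text. The sketch below is one plausible way the three could fit together in an adjacency list representation with string labels and numeric weights. Every attribute and method name (`add_vertex`, `add_edge`, `get_vertex`, `add_edge_to`, `neighbors`, `edges`) is an assumption made for illustration, and the small graph at the end is invented; neither is taken from the original listings or figure.

```python
class LinkedEdge:
    """A weighted directed edge between two vertices (hypothetical layout)."""

    def __init__(self, source, destination, weight):
        self.source = source
        self.destination = destination
        self.weight = weight


class LinkedVertex:
    """A labeled vertex that keeps a list of its outgoing (incident) edges."""

    def __init__(self, label):
        self.label = label
        self.edges = []  # outgoing LinkedEdge objects

    def add_edge_to(self, destination, weight):
        self.edges.append(LinkedEdge(self, destination, weight))

    def neighbors(self):
        # The vertices reachable by following one outgoing edge.
        return [edge.destination for edge in self.edges]


class LinkedDirectedGraph:
    """A weighted directed graph stored as a mapping of label -> vertex."""

    def __init__(self):
        self.vertices = {}

    def add_vertex(self, label):
        self.vertices[label] = LinkedVertex(label)

    def add_edge(self, from_label, to_label, weight):
        source = self.vertices[from_label]
        source.add_edge_to(self.vertices[to_label], weight)

    def get_vertex(self, label):
        return self.vertices[label]


# An invented three-vertex graph, used only to exercise the sketch.
graph = LinkedDirectedGraph()
for label in "ABC":
    graph.add_vertex(label)
graph.add_edge("A", "B", 2.5)
graph.add_edge("A", "C", 1.0)
graph.add_edge("B", "C", 4.0)

vertex_a = graph.get_vertex("A")
neighbor_labels = [vertex.label for vertex in vertex_a.neighbors()]
incident_weights = [edge.weight for edge in vertex_a.edges]
```

Storing the outgoing edges on each vertex keeps the lookup of a vertex's neighbors and incident edges proportional to the number of edges leaving that vertex, which matches the adjacency list analysis earlier in the chapter.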
The Class LinkedEdge
[Code listing: class LinkedEdge]

Case Study: Testing Graph Algorithms
• Request:
  – Write a program that allows the user to test some graph-processing algorithms
• Analysis:
  – The program consists of two main classes, GraphDemoView and GraphDemoModel
• The Classes GraphDemoView and GraphDemoModel:
  [Figure: the classes GraphDemoView and GraphDemoModel]
• Implementation (Coding):
  – The view class includes methods for displaying the menu and getting a command that are similar to methods in other case studies
    • The other two methods get the inputs from the keyboard or a file
  – The model class includes methods to create a graph and run a graph-processing algorithm
  – The functions defined in the algorithms module accept two arguments: a graph and a start label
    • When the start label is not used, it can be defined as an optional argument

Summary
• Graphs have many applications
• A graph consists of one or more vertices (items) connected by zero or more edges
• Path: Sequence of edges that allows one vertex to be reached from another vertex in the graph
• A graph is connected if there is a path from each vertex to every other vertex
• A graph is complete if there is an edge from each vertex to every other vertex
• A subgraph consists of a subset of a graph’s vertices and a subset of its edges
• Connected component: a subgraph consisting of the set of vertices reachable from a given vertex
• Directed graphs allow travel along an edge in just one direction
• Edges can be labeled with weights, which indicate the cost of traveling along them
• Graphs have two common implementations:
  – Adjacency matrix and adjacency list
• Graph traversals explore tree-like structures within a graph, starting with a distinguished start vertex
  – Examples: depth-first traversal and breadth-first traversal
• A minimum spanning tree is a spanning tree whose edge weights have the minimum possible sum
• A topological sort generates a sequence of vertices in a directed acyclic graph
• The single-source shortest path problem asks for a solution that contains the shortest paths from a given vertex to all of the other vertices
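The initialization and computation steps of Dijkstra’s algorithm described earlier in the chapter can be sketched in Python as follows. This is a minimal illustration, not the chapter’s own implementation: it assumes the graph is a dictionary mapping every vertex to a dictionary of {neighbor: positive weight}, it stores the results grid as per-vertex dictionaries rather than a two-dimensional grid, and it tests the loop condition before each pass (instead of after) so that a one-vertex graph is handled. The function name `shortest_paths` is hypothetical.

```python
import math


def shortest_paths(graph, source):
    """Return {vertex: (distance, parent)} for every vertex in graph.

    graph maps each vertex to a dict of {neighbor: positive edge weight};
    every vertex, including those with no outgoing edges, must be a key.
    """
    distance, parent, included = {}, {}, {}

    # Initialization step: one "row" per vertex.
    for vertex in graph:
        if vertex == source:
            distance[vertex], parent[vertex], included[vertex] = 0, None, True
        elif vertex in graph[source]:
            distance[vertex] = graph[source][vertex]
            parent[vertex], included[vertex] = source, False
        else:
            distance[vertex] = math.inf
            parent[vertex], included[vertex] = None, False

    # Computation step: repeat until every vertex is included.
    while not all(included.values()):
        # Find the vertex F that is not yet included and has minimal distance.
        f = min((v for v in graph if not included[v]), key=lambda v: distance[v])
        included[f] = True
        # Try to improve the distance of each not-yet-included neighbor T.
        for t, weight in graph[f].items():
            if not included[t]:
                new_distance = distance[f] + weight
                if new_distance < distance[t]:
                    distance[t] = new_distance
                    parent[t] = f

    return {v: (distance[v], parent[v]) for v in graph}


# An invented four-vertex graph, used only to exercise the sketch.
example = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
results = shortest_paths(example, "A")
```

In this example the shortest path to B goes through C (distance 3) and the shortest path to D goes through B (distance 4). The linear search for F inside the loop is what gives the O(n^2) behavior noted in the analysis.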