Datalog Cycle-inclusive Execution-order Sequence
CS236 Project Three

Directed Graphs

When executing a Datalog program, queries can depend on rules, which can depend on other rules, all of which eventually depend on facts. A query q depends on a rule r if and only if the predicateName of q is equal to the predicateName of the simple predicate at the beginning (the head) of r. A rule r1 depends on a rule r2 if and only if there exists a predicate p on the right-hand side of r1 such that the predicateName of p is equal to the predicateName of the simple predicate at the head of r2. Note that since more than one rule can have the same predicateName for the simple predicate at the head of the rule, a query can depend on more than one rule. Similarly, a rule with m predicates on the right-hand side may depend on more than m rules. Note that a rule may depend on itself. The dependency of queries and rules on facts is not defined since it is not germane to this project.

We can model these depends-on relationships as a directed graph, an abstract (discrete-math) structure that we can use to determine the order in which we should evaluate the rules when processing a Datalog query. A directed graph is a pair (N, E), where N is a set of nodes and E is a set of directed edges (also called arcs). Each arc in the directed graph is an ordered pair (N1, N2), denoting that the arc goes from N1 to N2. (For our Datalog application the nodes are the queries and rules, and the edges are the depends-on relationships. We will denote the n rules in the given Datalog program by rule numbers, R1, …, Rn, and the m queries by query numbers, Q1, …, Qm.)

For our application, we will be interested in paths and cycles in directed graphs. A path is a sequence of nodes such that each successive pair is an arc (e.g., <Q3, R7, R3, R5> is a path in a graph G if (Q3, R7), (R7, R3), and (R3, R5) are each arcs of G).
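
As a minimal sketch of the depends-on relation defined above (not a required design), the query-to-rule and rule-to-rule dependencies can be computed by comparing predicate names. The `Rule` struct and the function names `rulesDefining` and `ruleDependencies` are hypothetical stand-ins for whatever your Project 2 parser produces; rule Rk is assumed to be stored at index k-1.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified parse result: only predicate names matter here.
struct Rule {
    std::string head;               // predicateName of the head predicate
    std::vector<std::string> body;  // predicateNames on the right-hand side
};

// Indices (0-based) of every rule whose head has the given predicateName.
// A query with this predicateName depends on exactly these rules.
std::set<std::size_t> rulesDefining(const std::string& name,
                                    const std::vector<Rule>& rules) {
    std::set<std::size_t> result;
    for (std::size_t j = 0; j < rules.size(); ++j)
        if (rules[j].head == name) result.insert(j);
    return result;
}

// deps[i] holds the indices j such that rule i depends on rule j.
std::vector<std::set<std::size_t>> ruleDependencies(const std::vector<Rule>& rules) {
    std::vector<std::set<std::size_t>> deps(rules.size());
    for (std::size_t i = 0; i < rules.size(); ++i) {
        for (const std::string& name : rules[i].body) {
            std::set<std::size_t> found = rulesDefining(name, rules);
            deps[i].insert(found.begin(), found.end());
        }
    }
    assert(deps.size() == rules.size());  // one dependency set per rule
    return deps;
}
```

A rule whose head predicateName also appears in its own body ends up in its own dependency set, and a body with two predicates can yield more than two dependencies when several rules share a head predicateName.
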
A cycle is a path that starts and stops at the same node (e.g., both <R2, R14, R7, R2> and <R9, R9> are cycles of G if they are paths of G).

Project Description

The given queries and rules of a Datalog program (as obtained by your parser in Project 2) induce a directed graph in which the rules and queries are the nodes and each (N1, N2) is an arc from node N1 to node N2 if N1 depends on N2 when the Datalog program is executed. For each query, construct a cycle-inclusive depends-on sequence over the rules the query depends on. For a directed graph G, a depends-on sequence is a linear ordering of nodes such that if there is a directed edge from A to B, denoting a dependency of A on B, B appears before A in the linear ordering. A depends-on sequence is cycle-inclusive if, for each cycle C in G (if any), there is a directed edge from the last node L of cycle C in the depends-on sequence to the first node F of C, and the subsequence of nodes from F to L includes all the nodes of C.

For the output, denote rules by Rn, where n is the number of the rule in the list of rules in the given Datalog program, and queries by Qn, where n is the number of the query in the list of queries. Output the depends-on sequence for a query by listing the query followed by the rules of the sequence, in order, one per line. For each cycle, append the first rule of the cycle in the sequence to the line of the last rule of the cycle in the sequence.

Combine Projects 1, 2, and 3 so that your program takes as input a Datalog program, which may or may not be syntactically correct, and, for a syntactically correct Datalog program, lists a depends-on sequence for each query.

Requirements (Get the Design Right)

Getting the design right for this project is about appropriately creating an abstract data type, or ADT, for a directed graph. An abstract data type consists of an ADT specification and an ADT implementation.
The ADT specification, which is viewable by the users of the ADT, consists of a domain specification and public operation specifications. A C++ operation is a constructor, destructor, or method. The ADT implementation consists of implementations for the domain and all public, protected, and private operations and should be hidden, as much as possible, from the users of the ADT.

Conceptually we would like to use C++ “.h” files to define ADT specifications. Unfortunately, there are problems we must overcome. First, you cannot write a domain specification using C++ syntax. Instead, we use comments in a class's “.h” file to specify the ADT domain, usually as a more abstract, math-like definition. A second problem is the C++ requirement that all method specifications, whether public, protected, or private, must appear in the “.h” file. There is not much we can do about this other than organize and properly label the public, protected, and private sections of the “.h” file. A third problem arises because C++ does not provide a means to completely specify a method, constructor, or destructor: we can define signatures but not semantics. For semantics, we add comments before or after a public signature in the “.h” file to provide a complete, precise, unambiguous definition of what the operation does. Though not required in this class, professional developers use pre- and post-conditions to define semantics and provide them for public, private, and protected operations.

While operation implementations can be put in “.cpp” files, where they can be hidden from the users of a class, C++ requires that data structures used to implement a domain be placed in “.h” files, where they are visible to all users. The best we can do is to properly label the ADT domain implementation in the “.h” file as an implementation and not a specification.

To get the design right, you are to create a class representing the cycle-inclusive execution-order sequence for this project.
It could be named DependsOnGraph. Its domain specification (located in the “.h” file) would be similar to the first two sentences of the second paragraph of the Directed Graphs section above. At a minimum this class should have a constructor, destructor, and a method named produceAllCycleinclusiveDepends-onSequences. Provide complete specifications (signatures and semantics) for these three operations. The input for your constructor should be the parse results for the rules and queries from Project 2. The domain implementation can be any data structure you choose to represent a directed graph. The operation implementations should be placed in the “.cpp” file. You may use any algorithm of your choice.

Examples

These are not sufficient to completely test your program. You must have output formatted exactly like the example outputs below:

Example 1 Input

  Schemes:
    employee(N,A,D)
    Old(N)
    GT(x,y)
  Facts:
    employee('Dilbert','51','Custotial').
    employee('Dilbert','51','Marketing').
    employee('Dogbert','27','Engineering').
    employee('PHB','30','Pain Management').
  Rules:
    Old(N) :- employee(N,A,J),GT(A, '30').
  Queries:
    Employee('Dilbert')?
    Old(P)?

Example 2 Input

  Schemes:
    Parent(p,c)
    Sibling(a,b)
    Ancestor(x,y)
    GPA(x,y)
    SmartAncestor(x,y)
    GT(x,y)
  Facts:
  Rules:
    Ancestor(x,y):-Parent(x,y).
    Ancestor(x,y):- Ancestor(x,z),Parent(z,y).
    Ancestor(x,y):-SmartAncestor(y,x).
    Sibling(x,y):-Sibling(y,x).
    Smart(x):-GPA(x,y),GT(y,'3.7').
    SmartAncestor(x,y):-Smart(x),Ancestor(x,y).
  Queries:
    SmartAncestor(x,'me')?
    Sibling('me', 'Rachel')?

Example 1 Output

  Employee('Dilbert')?
  Old(P)?
  R1

Example 2 Output

  SmartAncestor(x,'me')?
  R5 R1 R3 R6 R2 R2 R6
  Sibling('me', 'Rachel')?
  R4 R4
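
One way the ADT described under Requirements could be laid out is sketched below: a commented domain specification, public operations with their semantics in comments, and a clearly labeled implementation section. Everything here is an assumption made for illustration. `ParsedRule`, `Step`, and `sequenceFor` are hypothetical names (a hyphen is not legal in a C++ identifier, so the sketch does not reuse the method name given above), the class handles one query predicate at a time, and the depth-first post-order it uses is only one possible algorithm. In a submission the class declaration would sit in the “.h” file and the operation bodies in the “.cpp” file; they share one block here so the sketch compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Project 2 parse result of one rule.
struct ParsedRule {
    std::string head;               // predicateName of the head predicate
    std::vector<std::string> body;  // predicateNames on the right-hand side
};

// ---- ADT specification ----
// Domain: a directed graph (N, E). N is the set of rules R1..Rn and
// (Ri, Rj) is in E exactly when rule Ri depends on rule Rj.
class DependsOnGraph {
public:
    // One entry of a sequence: a rule plus the rules it closes a cycle with.
    struct Step {
        std::size_t rule;                  // 1-based rule number
        std::vector<std::size_t> cycleTo;  // targets of back edges leaving this rule
    };

    // Builds the graph from the parsed rules; rule Rk is rules[k-1].
    explicit DependsOnGraph(const std::vector<ParsedRule>& rules);

    // Releases the graph; the members clean up after themselves.
    ~DependsOnGraph() = default;

    // Returns every rule the query predicate depends on, directly or
    // indirectly, in depth-first post-order. Each rule appears once.
    std::vector<Step> sequenceFor(const std::string& queryName) const;

private:
    // ---- ADT implementation (not part of the specification) ----
    enum class Mark { Unseen, Active, Done };
    std::vector<std::string> heads_;             // heads_[i] is the head of R(i+1)
    std::vector<std::vector<std::size_t>> adj_;  // 0-based adjacency lists
    void visit(std::size_t r, std::vector<Mark>& mark, std::vector<Step>& out) const;
};

DependsOnGraph::DependsOnGraph(const std::vector<ParsedRule>& rules)
    : adj_(rules.size()) {
    for (const ParsedRule& r : rules) heads_.push_back(r.head);
    for (std::size_t i = 0; i < rules.size(); ++i)
        for (const std::string& name : rules[i].body)      // body order
            for (std::size_t j = 0; j < rules.size(); ++j)  // then rule order
                if (heads_[j] == name &&
                    std::find(adj_[i].begin(), adj_[i].end(), j) == adj_[i].end())
                    adj_[i].push_back(j);
}

void DependsOnGraph::visit(std::size_t r, std::vector<Mark>& mark,
                           std::vector<Step>& out) const {
    assert(r < adj_.size());
    mark[r] = Mark::Active;
    Step step{r + 1, {}};
    for (std::size_t d : adj_[r]) {
        if (mark[d] == Mark::Unseen) visit(d, mark, out);
        else if (mark[d] == Mark::Active) step.cycleTo.push_back(d + 1);  // back edge
    }
    mark[r] = Mark::Done;
    out.push_back(step);  // post-order: emitted after everything it reached
}

std::vector<DependsOnGraph::Step> DependsOnGraph::sequenceFor(
        const std::string& queryName) const {
    std::vector<Mark> mark(heads_.size(), Mark::Unseen);
    std::vector<Step> out;
    for (std::size_t r = 0; r < heads_.size(); ++r)
        if (heads_[r] == queryName && mark[r] == Mark::Unseen) visit(r, mark, out);
    return out;
}
```

On the rules of Example 2, `sequenceFor("SmartAncestor")` yields the steps R5, R1, R3 (cycle to R6), R2 (cycle to R2), R6, and `sequenceFor("Sibling")` yields R4 (cycle to R4). Printing each query and formatting the lines as the assignment requires is left out of the sketch.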