Datalog Cycle-inclusive Execution-order Sequence
CS236 Project Three
Directed Graphs
When executing a Datalog program, queries can depend on rules, which can depend on other rules, all of
which eventually depend on facts. A query q depends on a rule r if and only if the predicateName of the
query q is equal to the predicateName of the simple predicate at the beginning or head of the rule r. A rule
r1 depends on a rule r2 if and only if there exists a predicate, p, on the right-hand side of r1 such that the
predicateName of p is equal to the predicateName of the simple predicate at the beginning or head of the
rule r2. Note that since more than one rule can have the same predicateName for the simple predicate at the
beginning of the rule, a query can depend on more than one rule. Similarly, a rule with m predicates on the
right-hand side may depend on more than m rules. Note that a rule may depend on itself. The dependency
of queries and rules on facts is not defined since it is not germane to this project.
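The depends-on definitions above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the Rule struct and the functions rulesDefining and ruleDependsOn are hypothetical names, not part of any required interface, and a rule is reduced to the predicateNames that matter for dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified rule: only predicate names matter for dependencies.
struct Rule {
    std::string headName;                // predicateName of the head predicate
    std::vector<std::string> bodyNames;  // predicateNames on the right-hand side
};

// Returns the 1-based numbers of all rules whose head has the given
// predicateName. A query with that predicateName depends on exactly these.
std::vector<std::size_t> rulesDefining(const std::vector<Rule>& rules,
                                       const std::string& name) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (rules[i].headName == name) result.push_back(i + 1);
    }
    return result;
}

// Returns the 1-based numbers of the rules that rule number r depends on,
// in ascending order and without duplicates.
std::vector<std::size_t> ruleDependsOn(const std::vector<Rule>& rules,
                                       std::size_t r) {
    assert(r >= 1 && r <= rules.size());
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        for (const std::string& name : rules[r - 1].bodyNames) {
            if (rules[i].headName == name) {
                result.push_back(i + 1);
                break;  // record each rule once, even if several predicates match
            }
        }
    }
    return result;
}
```

Applied to the six rules of Example 2 in the Examples section, ruleDependsOn(rules, 6) gives rules 1, 2, 3, and 5: a rule with two right-hand-side predicates that depends on four rules.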
We can model these depends-on relationships as a directed graph—an abstract (discrete-math) structure
that we can use to determine the order in which we should evaluate the rules when processing a Datalog
query.
A directed graph is a pair (N, E), where N is a set of nodes and E is a set of directed edges (also called
arcs). Each arc in the directed graph is an ordered pair (N1, N2), denoting that the arc goes from N1 to N2. (For
our Datalog application, the nodes are the queries and rules, and the edges are the depends-on
relationships. We will denote the n rules in the given Datalog program by rule numbers, R1, …, Rn, and
the m queries by query numbers, Q1, …, Qm.)
For our application, we will be interested in paths and cycles in directed graphs. A path is a sequence of
nodes such that each successive pair is an arc (e.g., <Q3, R7, R3, R5> is a path in a graph G if (Q3, R7),
(R7, R3), and (R3, R5) are each arcs of G). A cycle is a path that starts and stops at the same node (e.g.,
both <R2, R14, R7, R2> and <R9, R9> are cycles of G if they are paths of G).
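The path and cycle definitions can be checked mechanically. The sketch below is one possible way to do so, assuming a graph stored as a map from each node to the set of nodes it has an arc to; the names Graph, isPath, and isCycle are illustrative only, and the requirement that a path contain at least one arc is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Node = std::string;
// Maps each node N1 to the set of nodes N2 such that (N1, N2) is an arc.
using Graph = std::map<Node, std::set<Node>>;

// True if every successive pair of nodes in seq is an arc of g.
// Assumption: a path must contain at least one arc (two or more nodes).
bool isPath(const Graph& g, const std::vector<Node>& seq) {
    if (seq.size() < 2) return false;
    for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
        auto it = g.find(seq[i]);
        if (it == g.end() || it->second.count(seq[i + 1]) == 0) return false;
    }
    return true;
}

// True if seq is a path of g that starts and stops at the same node.
bool isCycle(const Graph& g, const std::vector<Node>& seq) {
    if (!isPath(g, seq)) return false;
    assert(!seq.empty());  // guaranteed by isPath
    return seq.front() == seq.back();
}
```

For a graph containing the arcs (Q3, R7), (R7, R3), (R3, R5), and (R9, R9), the sequence <Q3, R7, R3, R5> is a path but not a cycle, and <R9, R9> is a cycle.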
Project Description
The given queries and rules of a Datalog program (as obtained by your parser in Project 2) induce a
directed graph in which the rules and queries are the nodes and each (N1, N2) is an arc from node N1 to node
N2 if N1 depends on N2 when the Datalog program is executed. For each query, construct a cycle-inclusive
depends-on sequence over the rules the query depends on. For a directed graph G, a depends-on sequence
is a linear ordering of nodes such that if there is a directed edge from A to B, denoting a dependency of A
on B, B appears before A in the linear ordering. A depends-on sequence is cycle-inclusive if for each cycle
C in G (if any), there is a directed edge from the last node L of cycle C in the depends-on sequence to the
first node F of C and the subsequence of nodes from F to L includes all the nodes of C.
For the output, denote rules by Rn, where n is the number of the rule in the list of rules in the given Datalog
program, and queries by Qn, where n is the number of the query in the list of queries. Output the depends-on
sequence for a query by listing the query followed by the rules of the sequence, in order, one per line.
For each cycle, add to the line of the last rule of the cycle in the sequence the rule designating the first rule
of the cycle in the sequence. Combine Projects 1, 2, and 3 together so that your program takes, as input, a Datalog
program, which may or may not be syntactically correct, and, for syntactically correct Datalog programs,
lists a depends-on sequence for each query.
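One way to produce such a sequence is a depth-first search that emits each rule after the rules it depends on and notes an arc back to a rule that is still being explored. The sketch below is an assumption, not a required algorithm: the names Adjacency, visit, and dependsOnSequence are hypothetical, rules are stored with 0-based indices, and each rule's dependencies are assumed to be listed in right-hand-side order and then by rule number. The annotation rule was chosen because it reproduces the Example 2 output in the Examples section; the text above may admit other readings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// dependsOn[r] lists the 0-based indices of the rules that rule r depends on.
using Adjacency = std::vector<std::vector<std::size_t>>;

enum class State { Unvisited, InProgress, Done };

// Depth-first visit: emits every rule that r depends on before r itself.
// An arc to a rule that is still InProgress closes a cycle, so the target
// of that arc is appended to r's output line.
void visit(std::size_t r, const Adjacency& dependsOn,
           std::vector<State>& state, std::vector<std::string>& lines) {
    assert(r < dependsOn.size());
    state[r] = State::InProgress;
    std::string cycleTargets;
    for (std::size_t next : dependsOn[r]) {
        if (state[next] == State::Unvisited) {
            visit(next, dependsOn, state, lines);
        } else if (state[next] == State::InProgress) {
            cycleTargets += " R" + std::to_string(next + 1);
        }
    }
    state[r] = State::Done;
    lines.push_back("R" + std::to_string(r + 1) + cycleTargets);
}

// Returns the output lines (without the query line) for one query, given the
// 0-based indices of the rules the query depends on directly.
std::vector<std::string> dependsOnSequence(
        const std::vector<std::size_t>& queryRules, const Adjacency& dependsOn) {
    std::vector<State> state(dependsOn.size(), State::Unvisited);
    std::vector<std::string> lines;
    for (std::size_t r : queryRules) {
        if (state[r] == State::Unvisited) visit(r, dependsOn, state, lines);
    }
    return lines;
}
```

The visit state is rebuilt for every query, so each query gets its own complete sequence. A query that depends on no rules yields no rule lines at all.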
Requirements (Get the Design Right)
Getting the design right for this project is about appropriately creating an abstract data type or ADT for a
directed graph. An abstract data type consists of an ADT specification and an ADT implementation. The
ADT specification, which is viewable by the users of the ADT, consists of a domain specification and
public operation specifications. A C++ operation is a constructor, destructor, or method. The ADT
implementation consists of implementations for the domain and all public, protected, and private operations
and should be hidden, as much as possible, from the users of the ADT.
Conceptually we would like to use C++ “.h” files to define ADT specifications. Unfortunately, there are
problems we must overcome. You cannot write a domain specification using C++ syntax. We use
comments in a class’ “.h” file to specify the ADT domain. It is usually a more abstract, math-like
definition. Another problem is the C++ requirement that all method specifications, whether they are public,
protected, or private, must appear in the “.h” file. There is not much we can do to solve this problem except
organize and properly label the public, protected, and private sections of the “.h” file. A third problem arises
because C++ does not provide a means to completely specify a method, constructor, or destructor. We can
only define signatures but not semantics. For semantics we add comments before or after a public
signature in the “.h” file to provide a complete, precise, unambiguous definition of what the operation does.
Though not required in this class, professional developers use pre- and post-conditions to define semantics
and provide them for public, private, and protected operations.
While operation implementations can be put in “.cpp” files where they can be hidden from the users of a
class, C++ requires that data structures used to implement a domain must be placed in “.h” files and are
thus available to all users. The best we can do is to properly label the ADT domain implementation in the
“.h” file as an implementation and not a specification.
To get the design right, you are to create a class representing the cycle-inclusive execution-order sequence
for this project. It could be named DependsOnGraph. Its domain specification (located in the “.h” file)
would be similar to the first two sentences of the second paragraph of the Directed Graphs section above.
At a minimum this class should have a constructor, destructor, and a method named produceAllCycleinclusiveDepends-onSequences. Provide complete specifications (signatures and semantics) for these three
operations. The input for your constructor should be the parse results for the rules and queries from Project
2. The domain implementation can be any data structure you choose to represent a directed graph.
The operation implementations should be placed in the “.cpp” file. You may use any algorithm of your
choice.
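The layout described in this section might look like the sketch below. It is a hedged illustration only: the operations addArc, hasArc, and nodeCount and the member successors_ are hypothetical, the constructor does not take Project 2 parse results, and the required sequence-producing method is deliberately left out. Note also that a hyphen is not legal in a C++ identifier, so the method name has to be adapted when it is declared. For brevity the sketch shows the contents of the “.h” file and the “.cpp” file in one block.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// ===== ADT SPECIFICATION =====
// Domain: a directed graph (N, E), where N is a set of nodes and E is a set
// of ordered pairs (N1, N2) of nodes, each denoting an arc from N1 to N2.
class DependsOnGraph {
public:
    // Pre:  none.
    // Post: this graph is (N, E) with N and E both empty.
    DependsOnGraph();

    // Post: all storage held by this graph is released.
    ~DependsOnGraph();

    // Pre:  from and to are non-empty node names.
    // Post: N' = N union {from, to} and E' = E union {(from, to)}.
    void addArc(const std::string& from, const std::string& to);

    // Post: returns true if and only if (from, to) is in E; graph unchanged.
    bool hasArc(const std::string& from, const std::string& to) const;

    // Post: returns the number of nodes in N; graph unchanged.
    std::size_t nodeCount() const;

private:
    // ===== ADT DOMAIN IMPLEMENTATION (not part of the specification) =====
    // Maps each node to the set of nodes it has an arc to.
    std::map<std::string, std::set<std::string>> successors_;
};

// ===== OPERATION IMPLEMENTATIONS (these would live in the ".cpp" file) =====
DependsOnGraph::DependsOnGraph() = default;
DependsOnGraph::~DependsOnGraph() = default;

void DependsOnGraph::addArc(const std::string& from, const std::string& to) {
    assert(!from.empty() && !to.empty());
    successors_[from].insert(to);
    successors_[to];  // make sure the target is recorded as a node
}

bool DependsOnGraph::hasArc(const std::string& from,
                            const std::string& to) const {
    auto it = successors_.find(from);
    return it != successors_.end() && it->second.count(to) > 0;
}

std::size_t DependsOnGraph::nodeCount() const {
    return successors_.size();
}
```

The comments on the public operations carry the semantics that the signatures alone cannot express, and the private section is labeled as implementation so that readers do not mistake the chosen data structure for part of the specification.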
Examples
These are not sufficient to completely test your program. You must have output formatted exactly
like the example outputs below:
Example 1 Input
Schemes:
employee(N,A,D)
Old(N)
GT(x,y)
Facts:
employee('Dilbert','51','Custotial').
employee('Dilbert','51','Marketing').
employee('Dogbert','27','Engineering').
employee('PHB','30','Pain Management').
Rules:
Old(N) :- employee(N,A,J),GT(A, '30').
Queries:
Employee('Dilbert')?
Old(P)?
Example 2 Input
Schemes:
Parent(p,c)
Sibling(a,b)
Ancestor(x,y)
GPA(x,y)
SmartAncestor(x,y)
GT(x,y)
Facts:
Rules:
Ancestor(x,y):-Parent(x,y).
Ancestor(x,y):- Ancestor(x,z),Parent(z,y).
Ancestor(x,y):-SmartAncestor(y,x).
Sibling(x,y):-Sibling(y,x).
Smart(x):-GPA(x,y),GT(y,'3.7').
SmartAncestor(x,y):-Smart(x),Ancestor(x,y).
Queries:
SmartAncestor(x,'me')?
Sibling('me', 'Rachel')?
Example 1 Output
Employee('Dilbert')?
Old(P)?
R1
Example 2 Output
SmartAncestor(x,'me')?
R5
R1
R3 R6
R2 R2
R6
Sibling('me', 'Rachel')?
R4 R4