----------------------------------------------------------------------
External Joins : Plan Construction and Evaluation
------------------------------------------------
CS784 Project Report
-------------------
Ravi Murthy
-------------

1. External joins and their use
------------------------------

A. Description
-------------

A new builtin operator called ext_join is now supported by Coral. Its
parameters are the relations to be joined, and its output is a new
relation formed by joining them. The input relations must be persistent,
i.e. they must be EXODUS relations. The ext_join operator can be used in
any way that a normal relation can: it can be queried on its own or can
form part of a rule. The columns to be joined are represented in the
typical logic programming style, i.e. by using the same variable in the
corresponding argument positions.

Since the relations are in EXODUS, rigorous type checking is possible,
and is performed, to ensure that the columns being joined are of the same
type. Currently, ext_join supports Grace Join and Nested Loops Join. By
default, the first two relations specified are joined using the Grace
Join algorithm and the remaining joins (if any) are done using the Nested
Loops algorithm.

Apart from type checking, other checks are made on the input parameters
to ext_join to ensure the correctness of the operation. In particular, it
is checked that the given relations exist in EXODUS and that their
arities are correct. The restriction that the input relations be in
EXODUS exists to enable type checking of the join attributes. This cannot
be done if in-memory relations are permitted, because in-memory relations
are untyped in Coral.

B. Examples
----------

> ?ext_join(rel1(X,Y),rel2(Y,Z)).

This is an example where ext_join is used as a separate command. It
performs the join of rel1 and rel2, where the second field of rel1 is
joined with the first field of rel2. Of course, this operation will
succeed only if rel1 and rel2 are both EXODUS relations with arity 2.
Further, the second field of rel1 must have the same type as the first
field of rel2.

> module test.
  export rel3(ff).
  rel3(X,Y) :- ext_join(rel1(X,Z), rel2(Z,Y)).
  end_module.
> ?rel3(X,Y).

This is an example where ext_join is used within a rule. The rule is
declared as part of a module, which is compiled first; the rule is then
evaluated when the final query is issued.

2. Design
--------

a. Query compilation
-------------------

The ext_join operator can basically be viewed as a select-project-join
query. The relations to be joined are explicitly specified. The
environment that is passed into the operator by the run-time system will
have some variables bound. These bindings constitute the selection to be
done on the relations. A Selinger-style cost based optimization could be
done to deduce the best join order. The result would be a join tree where
each node specifies a particular 2-relation join using a specified
algorithm.

In the current implementation, no cost based analysis is done in
generating the join tree. A default left-deep join tree, based on the
order in which the relations are specified, is generated. Further, each
node represents a particular join algorithm. We now support the nested
loops join and grace join methods. Since grace join requires that its
input relations be materialized, the default tree invokes grace join for
the first join (between two base relations) and nested loops join for
every other join (which is between a base relation and the result of a
join).

b. Query Execution
-----------------

The join tree generated by compiling the query is passed as input to the
execution engine. The execution engine evaluates the query in a pipelined
fashion. The root of the tree represents the final result. When
get_next_tuple is called on the root, it may result in corresponding
calls to its children. These may cause further calls down the tree. The
result tuples move back up the tree and finally, one result tuple is
returned by the root.
Thus the evaluation is done in a pipelined fashion, i.e. a lazy execution
scheme in which intermediate temporary relations are not materialized
unless necessary. Since the root of the tree (and in fact, every node of
the tree) supports the same interface as any base relation, ext_join can
simply be treated as an ordinary relation by the rule evaluator.

3. Implementation
----------------

a. Builtin mechanism
-------------------

Coral supports builtin relations, which are a set of predefined
"relations" with specific actions. A query on a builtin relation causes
the corresponding solver to be invoked. ext_join is constructed as a
builtin relation. The solver for ext_join is called whenever this
"relation" is queried. The solver constructs the join tree, or access
plan, when it is invoked for the first time. At this time, it checks the
input parameters for validity and constructs the default join tree. Every
other invocation of the solver causes the next result tuple to be
returned.

b. Class Hierarchy
-----------------

The following is a description of the classes that implement external
joins. ExtJoin is the base class for all external join methods. Any join
method is derived from this class. The NLExtJoin class handles the nested
loops join while the GraceJoin class implements the grace hash join
algorithm.

i) NLExtJoin : The implementation is fairly straightforward and involves
no further classes.

ii) GraceJoin : The first step in this algorithm is to find out the
columns that have to be joined. Note that there can be multiple join
columns. The class JoinAttrs maintains this information. To hash a tuple
of the relation on the join columns, the values in these fields are
extracted from the tuple and concatenated, and the resulting aggregate
field is hashed.

The next step is to split both the outer and the inner relations into
partitions. The class Partition performs a scan on the relation, hashes
the tuples and forms partitions as temporary EXODUS relations.
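
The hashing and partitioning steps just described can be sketched as
follows. This is only an illustration in Python, not the Coral code:
tuples are plain Python tuples, partitions are in-memory lists rather
than temporary EXODUS relations, and the names join_key, partition and
NUM_PARTITIONS (as well as the partition count itself) are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical; the report does not state a partition count


def join_key(tup, join_cols):
    """Concatenate the join-column values of a tuple into one aggregate key."""
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    return "\x1f".join(repr(tup[c]) for c in join_cols).encode()


def partition(relation, join_cols, num_partitions=NUM_PARTITIONS):
    """Scan a relation once and split it into hash partitions on the join columns."""
    parts = [[] for _ in range(num_partitions)]
    for tup in relation:
        h = zlib.crc32(join_key(tup, join_cols))  # deterministic hash of the key
        parts[h % num_partitions].append(tup)
    return parts


# rel1(X,Y) joined with rel2(Y,Z): column 1 of rel1 against column 0 of rel2.
rel1 = [(1, "a"), (2, "b"), (3, "a")]
rel2 = [("a", 10), ("b", 20), ("c", 30)]
outer_parts = partition(rel1, [1])
inner_parts = partition(rel2, [0])
```

Because both relations are hashed with the same function, tuples with
equal join values can only appear in partitions with the same index,
which is why only corresponding partitions need to be joined.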
Then, the corresponding partitions of the outer and the inner relations
are joined in turn. The class HashJoin handles the join between a single
partition of the outer relation and the corresponding inner partition.
The tuples of the inner partition are first inserted into an in-memory
hash table, and the tuples of the outer partition are then used to probe
the hash table and produce result tuples.

4. Extensibility and possible extensions
---------------------------------------

The ext_join operator has been designed and implemented to be extensible
in several respects. New join methods can be incorporated by simply
deriving a new class from ExtJoin and providing the get_next_tuple
method. A non-trivial generation of the join tree can be done by
providing a new implementation of the plan_create function. The structure
of the access plan would still remain the same, i.e. a tree of ExtJoin
nodes. Thus, the basic framework has been provided for future
improvements and extensions.

Some of the possible directions for extensions are as follows. A cost
based generation of the join tree could take the statistics of the input
relations into account and produce a least cost ordering of the joins.
New join methods like sort-merge can be implemented.

5. Source files and test programs
--------------------------------

This section contains pointers to the various source files and
information on what they contain. It also gives details of some test
programs that use the ext_join operator.

Currently, the following files reside in the /src/coral/extjoin
directory.

a. Plan.Ch : ext_join solver and plan_create function.
b. ExtJoin.Ch : implementation of the class ExtJoin.
c. NLExtJoin.Ch : implementation of the class NLExtJoin.
d. GraceJoin.Ch, Partition.Ch, HashJoin.Ch, JoinAttrs.Ch :
   implementation of the classes related to the grace hash join
   algorithm.

The example test programs reside in the coral/bin directory.

a. load.P : definition of the EXODUS relations.
b. tests.P : some tests based on ext_join.
c. tests.info : several examples on the use of ext_join.
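
As a rough illustration of how the pieces described in Sections 2 and 3
fit together, the following Python sketch builds a left-deep plan and
evaluates it in a pipelined fashion through get_next_tuple. It is not the
Coral implementation: the Scan class, the plan_create signature, the use
of Python lists in place of EXODUS relations, and the use of None to
signal the end of the stream are all assumptions. For brevity every join
here uses nested loops, whereas the default plan described above would
use GraceJoin for the first join.

```python
class ExtJoin:
    """Base interface: every plan node yields tuples via get_next_tuple()."""

    def get_next_tuple(self):
        raise NotImplementedError


class Scan(ExtJoin):
    """Leaf node over a base relation (a Python list stands in for EXODUS)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def get_next_tuple(self):
        if self.pos >= len(self.rows):
            return None  # None marks the end of the stream (assumption)
        row = self.rows[self.pos]
        self.pos += 1
        return row


class NLExtJoin(ExtJoin):
    """Nested loops join: pull one outer tuple, then scan the inner relation."""

    def __init__(self, outer, inner_rows, outer_col, inner_col):
        self.outer, self.inner_rows = outer, inner_rows
        self.outer_col, self.inner_col = outer_col, inner_col
        self.cur = None  # current outer tuple
        self.idx = 0     # position in the inner relation

    def get_next_tuple(self):
        while True:
            if self.cur is None:
                self.cur = self.outer.get_next_tuple()  # call down the tree
                self.idx = 0
                if self.cur is None:
                    return None
            while self.idx < len(self.inner_rows):
                row = self.inner_rows[self.idx]
                self.idx += 1
                if row[self.inner_col] == self.cur[self.outer_col]:
                    return self.cur + row  # one result tuple moves up the tree
            self.cur = None  # inner relation exhausted; advance the outer


def plan_create(relations, join_cols):
    """Build a left-deep tree in the order in which the relations are given."""
    node = Scan(relations[0])
    for rows, (outer_col, inner_col) in zip(relations[1:], join_cols):
        node = NLExtJoin(node, rows, outer_col, inner_col)
    return node


rel1 = [(1, "a"), (2, "b")]
rel2 = [("a", 10), ("b", 20), ("a", 11)]
rel3 = [(10, "x"), (20, "y")]
# Join rel1.1 = rel2.0, then column 3 of the intermediate result = rel3.0.
root = plan_create([rel1, rel2, rel3], [(1, 0), (3, 0)])

results = []
t = root.get_next_tuple()
while t is not None:
    results.append(t)
    t = root.get_next_tuple()
```

Since every node exposes the same get_next_tuple interface, a new join
method only needs to supply that one method, and a different plan_create
can rearrange the nodes without changing how the tree is evaluated.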