extjoin.doc

----------------------------------------------------------------------
External Joins : Plan Construction and Evaluation
------------------------------------------------
CS784 Project Report
-------------------
Ravi Murthy
-------------

1. External joins and their use
------------------------------

A. Description
-------------
A new builtin operator called ext_join is now supported by Coral. The
parameters to the operator are the relations that are to be joined,
and the output is a new relation formed by joining the input
relations. The input relations are persistent, i.e. they are EXODUS
relations. The ext_join operator can be used in any of the ways a
normal relation can be: it can be queried separately or can form a
part of some rule. The columns to be joined are represented in the
typical logic programming style. Since the relations are in EXODUS,
rigorous type checking can be, and is, done to ensure that the columns
being joined are of the same type. Currently, ext_join supports Grace
Join and Nested Loops Join. By default, the first two relations
specified are joined using the Grace Join algorithm and the remaining
joins (if any) are done using the Nested Loops algorithm.

Apart from type checking, other checks are made on the input
parameters to ext_join to ensure the correctness of the operation. In
particular, it is checked that the given relations do exist in EXODUS
and that their arities are correct. The restriction that the input
relations be in EXODUS is to enable type checking of join attributes.
This cannot be done if in-memory relations are permitted, because
in-memory relations are untyped in Coral.
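The checks described above can be sketched roughly as follows. This is
only an illustration in Python, not the Coral code: the CATALOG
dictionary, the type names and the check_ext_join function are all
hypothetical stand-ins for the EXODUS schema information and the real
validation routine.

```python
# Hypothetical catalog: relation name -> tuple of column types.
# In the report this information would come from EXODUS.
CATALOG = {
    "rel1": ("int", "string"),
    "rel2": ("string", "float"),
}

def check_ext_join(args, catalog=CATALOG):
    """Validate the arguments of an ext_join call.

    args is a list of (relation_name, [variable names]) pairs.
    Raises ValueError if a relation is unknown, an arity is wrong,
    or one variable is used at columns of different types.
    """
    var_types = {}  # variable -> type of the first column it appeared in
    for name, variables in args:
        if name not in catalog:
            raise ValueError(f"{name} is not a persistent relation")
        col_types = catalog[name]
        if len(variables) != len(col_types):
            raise ValueError(
                f"{name} has arity {len(col_types)}, got {len(variables)}")
        for var, col_type in zip(variables, col_types):
            seen = var_types.setdefault(var, col_type)
            if seen != col_type:
                raise ValueError(
                    f"type mismatch on {var}: {seen} vs {col_type}")
    return True
```

With this catalog, ext_join(rel1(X,Y),rel2(Y,Z)) passes because Y is a
string column in both relations, while reusing X in the first column of
rel2 would be rejected.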
B. Examples
----------
> ?ext_join(rel1(X,Y),rel2(Y,Z)).

This is an example where ext_join is used as a separate command. It
performs the join of rel1 and rel2, where the second field of rel1 is
joined with the first field of rel2. Of course, this operation will
succeed only if rel1 and rel2 are both EXODUS relations with arity 2.
Further, the second field of rel1 must have the same type as the first
field of rel2.
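To make the meaning of the shared variable Y concrete, here is a small
Python sketch of what such a query computes. It assumes the result
carries one value per distinct variable (X, Y, Z), and the function
names join_columns and join are made up for this illustration. It uses
plain in-memory lists, whereas ext_join itself works on EXODUS
relations.

```python
def join_columns(left_vars, right_vars):
    """Pairs (i, j) where left column i and right column j share a variable."""
    return [(i, j)
            for i, v in enumerate(left_vars)
            for j, w in enumerate(right_vars)
            if v == w]

def join(left, left_vars, right, right_vars):
    """Naive join of two lists of tuples on their shared variables."""
    pairs = join_columns(left_vars, right_vars)
    joined_right = {j for _, j in pairs}
    # Right-hand columns that are not join columns are appended to the result.
    right_keep = [j for j in range(len(right_vars)) if j not in joined_right]
    out = []
    for l in left:
        for r in right:
            if all(l[i] == r[j] for i, j in pairs):
                out.append(l + tuple(r[j] for j in right_keep))
    return out

rel1 = [(1, "a"), (2, "b")]
rel2 = [("a", 10), ("c", 30)]
result = join(rel1, ["X", "Y"], rel2, ["Y", "Z"])  # bindings for (X, Y, Z)
```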
> module test.
export rel3(ff).
rel3(X,Y) :- ext_join(rel1(X,Z), rel2(Z,Y)).
end_module.
> ?rel3(X,Y).

This is an example where ext_join is used within a rule. The rule is
declared as part of a module, which is compiled first; the rule is
then actually executed by the last statement, the query on rel3.
2. Design
--------

a. Query compilation
-------------------
The ext_join operator can basically be viewed as a select-project-join
query. The relations to be joined are explicitly specified. The
environment that is passed into the operator by the run-time system
will have some variables bound; these bindings constitute the
selection to be done on the relations. A Selinger-style cost-based
optimization can be done to deduce the best join order. The result
would be a join tree where each node specifies a particular
2-relation join using a specified algorithm.

In the current implementation, no cost-based analysis is done in
generating the join tree. A default left-deep join tree, based on the
order in which the relations are specified, is generated. Further,
each node represents a particular join algorithm; nested loops join
and grace join are now supported. Since grace join requires that its
input relations be materialized, the default tree invokes grace join
for the first join (between two base relations) and nested loops join
for every other join (which is between a base relation and the result
of a join).
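The default plan described above can be sketched as follows. The name
plan_create comes from this report, but its signature here, along with
the Base and Join classes, is an assumption made for illustration; the
actual Coral data structures are not shown in the report.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Base:
    """Leaf of the join tree: a base (EXODUS) relation."""
    name: str

@dataclass
class Join:
    """Inner node: a 2-relation join using a named algorithm."""
    method: str                  # "grace" or "nested_loops"
    left: Union["Join", Base]    # a base relation or the result of a join
    right: Base                  # always a base relation (left-deep tree)

def plan_create(relations):
    """Build the default left-deep tree in the order the relations are given."""
    if len(relations) < 2:
        raise ValueError("a join needs at least two relations")
    # First join: both inputs are base relations, so grace join is usable.
    tree = Join("grace", Base(relations[0]), Base(relations[1]))
    # Every later join consumes a pipelined join result, so use nested loops.
    for name in relations[2:]:
        tree = Join("nested_loops", tree, Base(name))
    return tree
```

For three relations r1, r2, r3 this yields a nested loops node whose
left child is the grace join of r1 and r2 and whose right child is r3.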
b. Query Execution
-----------------
The join tree generated by compiling the query is passed as input to
the execution engine, which evaluates the query in a pipelined
fashion. The root of the tree represents the final result. When
get_next_tuple is called on the root, it may result in corresponding
calls to its children. These may cause further calls down the tree.
The result tuples move back up the tree and, finally, one result
tuple is returned by the root. Evaluation is thus lazy: intermediate
temporary relations are not materialized unless necessary.

Since the root of the tree (and in fact, every node of the tree)
supports the same interface as any base relation, ext_join can simply
be treated as an ordinary relation by the rule evaluator.
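A minimal Python sketch of this pull-based evaluation is given below.
Only the method name get_next_tuple is taken from the report; the Scan
and NestedLoops classes, the reset method and the single-column join
condition are simplifying assumptions, and base relations are modelled
as in-memory lists.

```python
class Scan:
    """Leaf node: returns the tuples of a base relation one at a time."""
    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def get_next_tuple(self):
        if self.pos >= len(self.tuples):
            return None          # end of relation
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def reset(self):
        self.pos = 0

class NestedLoops:
    """Join node: outer[outer_col] == inner[inner_col], evaluated lazily."""
    def __init__(self, outer, outer_col, inner, inner_col):
        self.outer, self.outer_col = outer, outer_col
        self.inner, self.inner_col = inner, inner_col
        self.current = None      # outer tuple currently being matched

    def get_next_tuple(self):
        while True:
            if self.current is None:
                # Pull one tuple from the child; this may recurse down the tree.
                self.current = self.outer.get_next_tuple()
                if self.current is None:
                    return None  # outer exhausted: no more results
                self.inner.reset()
            r = self.inner.get_next_tuple()
            if r is None:
                self.current = None   # inner exhausted: advance the outer
                continue
            if self.current[self.outer_col] == r[self.inner_col]:
                return self.current + r

r1 = Scan([(1, "a"), (2, "b")])
r2 = Scan([("a", 10), ("b", 20), ("a", 11)])
r3 = Scan([(10, "x"), (20, "y")])
# Left-deep tree: (r1 join r2) join r3; no intermediate result is stored.
root = NestedLoops(NestedLoops(r1, 1, r2, 0), 3, r3, 0)
```

Because a join node exposes the same get_next_tuple call as a scan, the
caller does not need to know whether it is reading a base relation or
the root of a join tree.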
3. Implementation
----------------

a. Builtin mechanism
-------------------
Coral supports builtin relations, which are a set of predefined
"relations" with specific actions. A query on a builtin relation
causes the corresponding solver to be invoked.

ext_join is constructed as a builtin relation. The solver for ext_join
is called whenever this "relation" is queried. The solver constructs
the join tree, or access plan, when it is invoked for the first time.
At this time, it checks the input parameters for validity and
constructs the default join tree. Every other invocation of the solver
causes the next result tuple to be returned.
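The "build the plan once, then hand out tuples" behaviour can be
sketched as below. The class ExtJoinSolver and its make_plan argument
are hypothetical; in particular, this sketch assumes the first call
also returns the first result tuple, which the report does not state
explicitly.

```python
class ExtJoinSolver:
    """Illustrative solver: builds its plan lazily on the first call."""
    def __init__(self, make_plan):
        # make_plan is a hypothetical callback that validates the
        # arguments and returns an iterable of result tuples.
        self.make_plan = make_plan
        self.plan = None

    def solve(self):
        if self.plan is None:
            # First invocation: validate inputs and construct the plan.
            self.plan = iter(self.make_plan())
        # Every invocation: return the next result tuple, or None at the end.
        return next(self.plan, None)
```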
b. Class Hierarchy
-----------------
The following is a description of the classes that implement external
joins. ExtJoin is the base class for all external join methods. Any
join method is derived from this class. NLExtJoin class handles the
nested loops join while the GraceJoin class implements the grace hash
join algorithm.
i) NLExtJoin : The implementation is fairly straightforward and
involves no further classes.
ii) GraceJoin : The first step in this algorithm is to find out the
columns that have to be joined. Note that there can be multiple join
columns. The class JoinAttrs maintains this information. To hash a
tuple of the relation on the join columns, the values in these fields
are extracted from the tuple and concatenated, and the resulting
aggregate field is hashed.

The next step is to split both the outer and the inner relations into
partitions. The class Partition performs a scan on the relation,
hashes the tuples and forms partitions as temporary EXODUS relations.

Then, the corresponding partitions of the outer and the inner
relations are joined in turn. The class HashJoin handles the join
between a single partition of the outer relation and the corresponding
inner partition. The tuples of the inner partition are first inserted
into an in-memory hash table, and the tuples of the outer partition
are then used to probe the hash table and produce result tuples.
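The three steps above (composite key, partitioning, per-partition
build and probe) can be sketched in Python as follows. The function
names are invented for this illustration and only loosely mirror the
roles of JoinAttrs, Partition and HashJoin. Two simplifications are
assumed: partitions are in-memory lists rather than temporary EXODUS
relations, and the join values are combined into a tuple instead of
being concatenated.

```python
def join_key(tup, cols):
    """Extract the join-column values of a tuple as one composite key."""
    return tuple(tup[c] for c in cols)

def partition(relation, cols, n):
    """Scan a relation and split it into n partitions by hashing the key."""
    parts = [[] for _ in range(n)]
    for tup in relation:
        parts[hash(join_key(tup, cols)) % n].append(tup)
    return parts

def hash_join(outer_part, outer_cols, inner_part, inner_cols):
    """Join one outer partition with the corresponding inner partition."""
    table = {}
    for tup in inner_part:                       # build phase (inner)
        table.setdefault(join_key(tup, inner_cols), []).append(tup)
    for o in outer_part:                         # probe phase (outer)
        for i in table.get(join_key(o, outer_cols), []):
            yield o + i

def grace_join(outer, outer_cols, inner, inner_cols, n=4):
    """Partition both inputs, then join matching partitions in turn."""
    outer_parts = partition(outer, outer_cols, n)
    inner_parts = partition(inner, inner_cols, n)
    for op, ip in zip(outer_parts, inner_parts):
        yield from hash_join(op, outer_cols, ip, inner_cols)
```

Tuples with equal join keys always hash to the same partition number,
so only partitions with the same index need to be joined with each
other.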
4. Extensibility and possible extensions
---------------------------------------
The ext_join operator has been designed and implemented with the
intent that it be extensible, and it is so in several respects. New
join methods can be incorporated by simply deriving a new class from
ExtJoin and providing the get_next_tuple method. A non-trivial
generation of the join tree can be done by providing a new
implementation for the plan_create function. The structure of the
access plan would still remain the same, i.e. a tree of ExtJoin
nodes. Thus, the basic framework has been provided for future
improvements and extensions.

Some of the possible directions for extensions are as follows. A
cost-based generation of the join tree could take the statistics of
the input relations into account and produce a least-cost ordering of
the joins. New join methods like sort-merge can be implemented.
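As an illustration of adding a join method, the following Python
sketch derives a sort-merge join from a stand-in ExtJoin base class.
The class names ExtJoin and get_next_tuple follow the report, but
everything else is assumed: the constructor arguments, the
single-column join condition and the use of in-memory lists that are
sorted up front. It is not part of the described implementation.

```python
class ExtJoin:
    """Hypothetical stand-in for the base class of all join methods."""
    def get_next_tuple(self):
        raise NotImplementedError

class SortMergeJoin(ExtJoin):
    """Sketch of a sort-merge join on left[left_col] == right[right_col]."""
    def __init__(self, left, left_col, right, right_col):
        # Both inputs are sorted on their join column before merging.
        self._results = self._merge(
            sorted(left, key=lambda t: t[left_col]), left_col,
            sorted(right, key=lambda t: t[right_col]), right_col)

    @staticmethod
    def _merge(left, lc, right, rc):
        i = j = 0
        while i < len(left) and j < len(right):
            lk, rk = left[i][lc], right[j][rc]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Find the run of right tuples that share this key.
                j_end = j
                while j_end < len(right) and right[j_end][rc] == lk:
                    j_end += 1
                # Pair every left tuple with this key with the whole run.
                while i < len(left) and left[i][lc] == lk:
                    for r in right[j:j_end]:
                        yield left[i] + r
                    i += 1
                j = j_end

    def get_next_tuple(self):
        # Same interface as every other node: one tuple, or None at the end.
        return next(self._results, None)
```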
5. Source files and test programs
--------------------------------
This section contains pointers to the various source files and
information on what they contain. Also, there are details regarding
some test programs that use the ext_join operator.

Currently, the following files reside in the /src/coral/extjoin
directory.

a. Plan.Ch : ext_join solver and plan_create function.
b. ExtJoin.Ch : implementation of the class ExtJoin
c. NLExtJoin.Ch : implementation of the class NLExtJoin
d. GraceJoin.Ch, Partition.Ch, HashJoin.Ch, JoinAttrs.Ch :
   implementation of classes related to grace hash join algorithm.

The example test programs reside in coral/bin directory.

a. load.P : definition of the EXODUS relations.
b. tests.P : some tests based on ext_join.
c. tests.info : several examples on the use of ext_join.