OGSA-DAI DQP
A Developer’s View
Bartosz Dobrzelecki
Applications Consultant, EPCC
bartosz@epcc.ed.ac.uk
+44 131 650 5137
User’s View
DQPConfiguration.xml
<DQPConfiguration xmlns="http://ogsadai.org.uk/dqp/namespaces/2008/12">
<dataResources>
<resource
url="http://localhost:8080/dai/services"
dsos="DataSourceService"
drerID="DataRequestExecutionResource"
resourceID="MySQLResource"
isLocal="true"/>
<resource url="http://localhost:8090/dai/services"
dsos="DataSourceService"
drerID="DataRequestExecutionResource"
resourceID="Resource2"/>
<resource url="http://localhost:8095/dai/services"
dsos="DataSourceService"
drerID="DataRequestExecutionResource"
resourceID="MySQLResource"
alias="MySQL"/>
</dataResources>
<evaluationResources>
<resource url="http://localhost:8085/dai/services"
drerID="DataRequestExecutionResource"/>
</evaluationResources>
</DQPConfiguration>
MySQLResource_employee
MySQL_employee
Query processing steps
[Diagram: SQL query expression → SQL Parser → Abstract Syntax Tree → LQP Builder → Logical Query Plan → Optimisers → Optimised LQP → Partitioner → Partitioned LQP → Workflow Builder → OGSA-DAI Requests and Sub-workflows → Execute → Results]
Query execution
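[Diagram: a SQL Query is sent as an OGSA-DAI Request to the DQP Coordinator (OGSA-DAI-DQP, on OGSA-DAI Data Node 3), which sends SubWorkflows as OGSA-DAI Requests to OGSA-DAI Data Nodes 1 and 2, receives data back from them, and returns the Result. Databases shown: DB1, DB2, DB3.]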
Producing Abstract Syntax Tree (AST)
• First step: parse SQL and generate AST.
• We use ANTLR 3 to generate code from grammars.
• Two grammars:
– SQL to AST
– AST to SQL (tree grammar)
• The tree grammar is used in our OGSA-DAI Views product,
which implements read-only SQL Views by rewriting the AST.
• In DQP the tree grammar is used to generate string
representations for column definitions, conditions, etc.
AST is a contract
• We do not expect AST to be changed.
• However, we do provide a mechanism for exposing new
operators to the language surface.
SELECT A.aname AS name
FROM aircraft A, certified C
WHERE A.aid = C.aid
Relation valued functions
SELECT A.aname AS name
FROM outerUnion(
(SELECT * FROM aircraft A),
(SELECT * FROM certified C), 'ALL') A
Logical Query Plan
• Second step: translate AST to a logical query plan.
SELECT aname AS name
FROM aircraft
WHERE aid = 10
• Operator anatomy: each Operator has an OperatorID, a parent, children, operator-specific internals and a Heading (a list of Attributes); each Attribute is (name, source, type).
Operators
• Behaviour defined in the Operator interface
– Validation – checks if operator gets all the input data it needs,
detects missing attributes, ambiguities, deals with correlation,
performs type checking.
– Update – updates operator internals after it was (re)connected.
• Operator, Heading, Attribute objects can be annotated with
arbitrary annotations (key :String -> value :Object)
– Sample uses:
– Attribute is sorted, correlated, temporary
– Which physical algorithm for join operator
– Estimated cardinality
– There will be a set of default annotations
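The operator anatomy and the Validation / Update behaviour can be pictured with a small Java sketch. All class, field and method names below (Attribute, Heading, SketchOperator, requiredAttributes, ...) are hypothetical stand-ins chosen for illustration; they are not the OGSA-DAI DQP API, and only the operator carries annotations here to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes for illustration; not the OGSA-DAI DQP API.
final class Attribute {
    final String name;
    final String source;
    final String type;

    Attribute(String name, String source, String type) {
        this.name = name;
        this.source = source;
        this.type = type;
    }
}

final class Heading {
    final List<Attribute> attributes = new ArrayList<>();

    boolean contains(String attributeName) {
        for (Attribute a : attributes) {
            if (a.name.equals(attributeName)) {
                return true;
            }
        }
        return false;
    }
}

final class SketchOperator {
    final String operatorId;
    SketchOperator parent;
    final List<SketchOperator> children = new ArrayList<>();
    Heading heading = new Heading();
    // Operator-specific internals, reduced here to the attribute names it reads.
    final List<String> requiredAttributes = new ArrayList<>();
    private final Map<String, Object> annotations = new HashMap<>();

    SketchOperator(String operatorId) {
        this.operatorId = operatorId;
    }

    void connectChild(SketchOperator child) {
        children.add(child);
        child.parent = this;
    }

    // Update: rebuild the heading from the children after (re)connection.
    void update() {
        if (children.isEmpty()) {
            return; // a leaf such as TABLE_SCAN keeps its own heading
        }
        Heading rebuilt = new Heading();
        for (SketchOperator child : children) {
            rebuilt.attributes.addAll(child.heading.attributes);
        }
        heading = rebuilt;
    }

    // Validation: report the required attributes that the input does not supply.
    List<String> validate() {
        List<String> missing = new ArrayList<>();
        for (String required : requiredAttributes) {
            if (!heading.contains(required)) {
                missing.add(required);
            }
        }
        return missing;
    }

    // Arbitrary annotations: key (String) -> value (Object).
    void annotate(String key, Object value) {
        annotations.put(key, value);
    }

    Object annotation(String key) {
        return annotations.get(key);
    }
}
```

In this sketch a SELECT over a TABLE_SCAN that exposes only aid and aname would report cruisingrange as missing, and an annotation such as an estimated cardinality is just another entry in the map.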
Operator family
• Unary:
– SELECT
– PROJECT
– RENAME
– DUPLICATE ELIMINATION
– SORT
– GROUP BY
– SCALAR GROUP BY
– ONE ROW ONLY
– TABLE SCAN
– EXCHANGE
• Binary:
– INNER JOIN
– PRODUCT
– UNION
– INTERSECTION
– DIFFERENCE
– FULL OUTER JOIN
– [LEFT][RIGHT] OUTER JOIN
– [ANTI] SEMI JOIN
– APPLY
– [UNARY][BINARY][SCAN] REL_FUNCTION
Data Dictionary
• Data Dictionary provides information about federated
data resources, available evaluators (DRERs), logical
and physical table schemas.
• It is populated when the resource is initialised.
• Most of the entries can be annotated
– you can plug in your own code to be executed on
initialisation
– you may want to annotate attributes with histograms.
• TABLE_SCAN operator builds its Heading using data
from Data Dictionary (on update).
• After assembling, LQP is validated.
Optimisation
• After successful validation, LQP is optimised by a chain of
optimisers.
• This chain is defined as part of the Compiler configuration.
• Optimisers need to implement a single method:
Operator optimise(Operator lqpRoot,
DataDictionary dataDictionary,
CompilerConfiguration compilerConfiguration)
throws LQPException;
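A minimal sketch of how such a chain might be run is given below. The optimise signature is the one quoted above; everything else is an assumption made so the example compiles on its own: the interface name Optimiser, the stand-in Operator, DataDictionary, CompilerConfiguration and LQPException types, the OptimisationChain runner and the example optimiser are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types so the sketch compiles on its own; they are not the real DQP classes.
class Operator {
    final String name;
    final Operator child; // unary plans only, to keep the sketch short

    Operator(String name, Operator child) {
        this.name = name;
        this.child = child;
    }

    String describe() {
        return child == null ? name : name + "(" + child.describe() + ")";
    }
}

class DataDictionary { }

class CompilerConfiguration { }

class LQPException extends Exception {
    LQPException(String message) {
        super(message);
    }
}

// Interface name is an assumption; the method signature is the one quoted in the text.
interface Optimiser {
    Operator optimise(Operator lqpRoot,
                      DataDictionary dataDictionary,
                      CompilerConfiguration compilerConfiguration)
        throws LQPException;
}

// Hypothetical optimiser: collapses directly nested DUPLICATE_ELIMINATION operators.
class CollapseDuplicateElimination implements Optimiser {
    @Override
    public Operator optimise(Operator lqpRoot,
                             DataDictionary dataDictionary,
                             CompilerConfiguration compilerConfiguration)
            throws LQPException {
        if (lqpRoot == null) {
            return null;
        }
        Operator child = optimise(lqpRoot.child, dataDictionary, compilerConfiguration);
        boolean redundant = lqpRoot.name.equals("DUPLICATE_ELIMINATION")
            && child != null
            && child.name.equals("DUPLICATE_ELIMINATION");
        return redundant ? child : new Operator(lqpRoot.name, child);
    }
}

// Hypothetical runner: each optimiser receives the plan produced by the previous one.
class OptimisationChain {
    private final List<Optimiser> optimisers = new ArrayList<>();

    OptimisationChain add(Optimiser optimiser) {
        optimisers.add(optimiser);
        return this;
    }

    Operator run(Operator lqpRoot, DataDictionary dd, CompilerConfiguration cc)
            throws LQPException {
        Operator current = lqpRoot;
        for (Optimiser optimiser : optimisers) {
            current = optimiser.optimise(current, dd, cc);
        }
        return current;
    }
}
```

Because every optimiser takes an LQP root and returns an LQP root, the order of the chain is the only coupling between transformations, which is what makes it easy to reorder or extend from configuration.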
Default optimisers
• Query normalisation + heuristics
– Remove redundant operators
– Select Push Down + implicit join detection
– Rename Pull Up
– Project Pull Up
• Join ordering
• Partitioning – finding best places for EXCHANGE operators
• TABLE_SCAN implosion – pushing as much processing as
we can to the RDBMS
Normalisation
AST to LQP translator is not trying to be smart - it takes it easy.
SELECT Temp.name, Temp.AvgSalary
FROM (
SELECT A.aid, A.aname AS name,
AVG(E.salary) AS AvgSalary
FROM aircraft A, certified C,
employees E
WHERE A.aid = C.aid AND C.eid = E.eid
AND A.cruisingrange > 1000
GROUP BY A.aid, A.aname
) AS Temp
LQP is then normalised by a chain of optimisers.
Join Ordering
• Not there yet.
• Will be based on the same cost model as in OGSA-DQP.
• We will also reuse the same algorithm that produces left
deep trees.
• More sophisticated models and algorithms (considering
bushy trees, semi joins, etc.) will be implemented later on.
• You can always implement your own and replace the default.
Partitioning optimiser
• Pluggable optimiser decides how to split LQP into
partitions by inserting the EXCHANGE operator.
• Default optimiser will put most load on the “local”
evaluator (DRER) – otherwise it will choose randomly.
TABLE_SCAN Implosion
• Not there yet.
• We will always try to push as much processing as
we can to the RDBMS.
• TABLE_SCAN “eats” as much of a tree as it can
and builds up an equivalent SQL query.
SELECT * FROM (
SELECT * FROM aircraft
WHERE aircraft.cruisingrange>1000
) aircraft
JOIN (
SELECT * FROM certified
) certified
ON aircraft.aid=certified.aid
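One way to picture how a sub-tree could be folded into a single SQL string is sketched below. Since the feature is described as not implemented yet, this is purely illustrative: the PlanNode, TableScan, Select and Join classes are hypothetical, and the sketch ignores the checks a real implementation would need (for example that both scans live on the same resource, and what level of SQL that resource supports).

```java
// Hypothetical plan nodes; an illustration of the idea, not the DQP implementation.
abstract class PlanNode {
    abstract String alias();  // name used when this node is wrapped as a sub-query
    abstract String toSql();  // SQL equivalent of the sub-tree rooted at this node
}

final class TableScan extends PlanNode {
    private final String table;

    TableScan(String table) {
        this.table = table;
    }

    @Override
    String alias() {
        return table;
    }

    @Override
    String toSql() {
        return "SELECT * FROM " + table;
    }
}

final class Select extends PlanNode {
    private final PlanNode child;
    private final String predicate;

    Select(PlanNode child, String predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    @Override
    String alias() {
        return child.alias();
    }

    @Override
    String toSql() {
        if (child instanceof TableScan) {
            // A filter directly over a scan becomes a WHERE clause.
            return child.toSql() + " WHERE " + predicate;
        }
        return "SELECT * FROM (" + child.toSql() + ") " + alias() + " WHERE " + predicate;
    }
}

final class Join extends PlanNode {
    private final PlanNode left;
    private final PlanNode right;
    private final String condition;

    Join(PlanNode left, PlanNode right, String condition) {
        this.left = left;
        this.right = right;
        this.condition = condition;
    }

    @Override
    String alias() {
        return left.alias();
    }

    @Override
    String toSql() {
        // Each input is wrapped as an aliased sub-query, as in the example query.
        return "SELECT * FROM (" + left.toSql() + ") " + left.alias()
            + " JOIN (" + right.toSql() + ") " + right.alias()
            + " ON " + condition;
    }
}
```

Building a Join of a filtered aircraft scan with a certified scan and calling toSql() yields a query of the same shape as the one shown above, up to whitespace.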
SQL support level of a relational resource
• TABLE_SCAN implosion needs to know what level of SQL is supported
by the underlying resource.
– fully featured RDBMS
– simple SQL interface for CSV files supporting only simple filtering of records
– a web service wrapper
• Relational resources will expose a resource property – a serialised object
implementing the SQLSupportLevel interface, similar to that defined by JDBC:
java.sql.DatabaseMetaData
public boolean supportsColumnAliasing()
public boolean supportsCorrelatedSubqueries()
public boolean supportsSubqueriesInComparisons()
public boolean supportsSubqueriesInExists()
...
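As a rough sketch of what such a property object might look like: only the name SQLSupportLevel and the four method names come from the text (they mirror real java.sql.DatabaseMetaData methods). The FixedSupportLevel class, its fromJdbc factory and the ImplosionGuard check are assumptions added for illustration.

```java
import java.io.Serializable;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

// Hypothetical shape of the interface; only its name appears in the text.
interface SQLSupportLevel extends Serializable {
    boolean supportsColumnAliasing();
    boolean supportsCorrelatedSubqueries();
    boolean supportsSubqueriesInComparisons();
    boolean supportsSubqueriesInExists();
}

// Hypothetical immutable snapshot that can be serialised as a resource property.
final class FixedSupportLevel implements SQLSupportLevel {
    private static final long serialVersionUID = 1L;

    private final boolean columnAliasing;
    private final boolean correlatedSubqueries;
    private final boolean subqueriesInComparisons;
    private final boolean subqueriesInExists;

    FixedSupportLevel(boolean columnAliasing,
                      boolean correlatedSubqueries,
                      boolean subqueriesInComparisons,
                      boolean subqueriesInExists) {
        this.columnAliasing = columnAliasing;
        this.correlatedSubqueries = correlatedSubqueries;
        this.subqueriesInComparisons = subqueriesInComparisons;
        this.subqueriesInExists = subqueriesInExists;
    }

    // Copies the answers of a live JDBC connection's metadata into a snapshot.
    static FixedSupportLevel fromJdbc(DatabaseMetaData metaData) throws SQLException {
        return new FixedSupportLevel(
            metaData.supportsColumnAliasing(),
            metaData.supportsCorrelatedSubqueries(),
            metaData.supportsSubqueriesInComparisons(),
            metaData.supportsSubqueriesInExists());
    }

    @Override public boolean supportsColumnAliasing() { return columnAliasing; }
    @Override public boolean supportsCorrelatedSubqueries() { return correlatedSubqueries; }
    @Override public boolean supportsSubqueriesInComparisons() { return subqueriesInComparisons; }
    @Override public boolean supportsSubqueriesInExists() { return subqueriesInExists; }
}

// Hypothetical check an implosion step could make before pushing a sub-query down.
final class ImplosionGuard {
    static boolean canPushSubqueryInComparison(SQLSupportLevel level, boolean correlated) {
        return level.supportsSubqueriesInComparisons()
            && (!correlated || level.supportsCorrelatedSubqueries());
    }
}
```

A fully featured RDBMS would typically answer true to all four questions, while a simple CSV wrapper would answer false, so the same plan could be pushed down in one case and kept in the evaluator in the other.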
Executing the plan
• Build phase
– Each LQP Operator has an associated Activity Pipeline Builder class
which takes in an Operator and returns an Activity Output.
– Most operators can be mapped directly to a single Activity.
– Some operators may have different implementations (for example the join
operator); the builder chooses the default one or is guided by an Annotation.
– The Operator -> Builder class mapping is configurable.
• Setup phase
– For each EXCHANGE a Data Source Resource is created.
• Execution phase
– All workflows (partitions) are submitted.
– Coordinator always executes a sub-workflow (with at least the
EXCHANGE_CONSUMER operator).
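The build-phase choice between a default builder and an annotation-selected alternative can be sketched as a small registry. The BuilderRegistry class and the "implementation" annotation key are hypothetical; the operator and builder class names in the usage note are taken from the CompilerConfiguration.xml example in this deck.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: operator name -> default builder class, plus named alternatives.
class BuilderRegistry {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, Map<String, String>> alternatives = new HashMap<>();

    void registerDefault(String operator, String builderClass) {
        defaults.put(operator, builderClass);
    }

    void registerAlternative(String operator, String name, String builderClass) {
        alternatives.computeIfAbsent(operator, k -> new HashMap<>()).put(name, builderClass);
    }

    // Returns the builder class name for an operator, honouring an annotation if present.
    String resolve(String operator, Map<String, Object> annotations) {
        Object requested = annotations.get("implementation"); // assumed annotation key
        if (requested != null) {
            Map<String, String> named = alternatives.getOrDefault(operator, Map.of());
            String alternative = named.get(requested.toString());
            if (alternative != null) {
                return alternative;
            }
        }
        String fallback = defaults.get(operator);
        if (fallback == null) {
            throw new IllegalArgumentException("No builder configured for " + operator);
        }
        return fallback;
    }
}
```

With INNER_THETA_JOIN registered with default uk.org.ogsadai.dqp.execute.workflow.ProductSelect and an alternative HASH_JOIN mapped to uk.org.ogsadai.dqp.execute.workflow.HashJoin, an operator annotated with implementation = HASH_JOIN resolves to the hash join builder, and an unannotated one falls back to the default.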
Extensibility points
• New Operator can be introduced by mapping a relation valued function to an
Operator, and the Operator to an Activity Pipeline Builder.
• New Operator can be included in the default query normalisation by
providing strategies for SELECT push down, RENAME/PROJECT pull
up.
• Optimisation chain is configurable – it is easy to plug in new LQP
transformations.
• Alternative physical operator implementations can be introduced by
replacing default Activity Pipeline Builders – annotations can be used to
choose between several implementations.
• Scalar, aggregate and relation valued User Defined Functions will be
supported.
Introducing a new operator
SELECT A.aname AS name
FROM outerUnion(
(SELECT * FROM aircraft A),
(SELECT * FROM certified C), 'ALL') A
• LQP Builder will check if there is a mapping from outerUnion -> Operator
and use the Operator object in LQP.
• If there is no mapping – look for a relation valued function outerUnion in
the Function Repository and connect the generic RELVAL_FUNCTION
operator.
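The two-step lookup described in these bullets can be sketched as follows. The FunctionResolver class and its methods are hypothetical, and the name used for the generic operator is an assumption about the intended spelling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver for relation valued functions appearing in a FROM clause.
class FunctionResolver {
    // Assumed spelling of the generic operator name.
    static final String GENERIC_OPERATOR = "RELVAL_FUNCTION";

    private final Map<String, String> functionToOperator = new HashMap<>();
    private final Set<String> functionRepository = new HashSet<>();

    void mapToOperator(String functionName, String operatorName) {
        functionToOperator.put(functionName, operatorName);
    }

    void registerFunction(String functionName) {
        functionRepository.add(functionName);
    }

    // Step 1: a dedicated operator wins. Step 2: fall back to the generic operator.
    String resolve(String functionName) {
        String operator = functionToOperator.get(functionName);
        if (operator != null) {
            return operator;
        }
        if (functionRepository.contains(functionName)) {
            return GENERIC_OPERATOR;
        }
        throw new IllegalArgumentException(
            "Unknown relation valued function: " + functionName);
    }
}
```

Mapping outerUnion to OUTER_UNION makes the LQP Builder use the dedicated operator; a function that exists only in the Function Repository is wrapped in the generic operator instead.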
CompilerConfiguration.xml
<LQPCompilerConfiguration xmlns="http://ogsadai.org.uk/dqp/namespaces/2008/12">
<builders operator="GROUP_BY"
default="uk.org.ogsadai.dqp.execute.workflow.GroupBy"/>
<builders operator="INNER_THETA_JOIN"
default="uk.org.ogsadai.dqp.execute.workflow.ProductSelect">
<builder name="HASH_JOIN"
class="uk.org.ogsadai.dqp.execute.workflow.HashJoin"/>
</builders>
<relationFunction name="outerUnion" operator="OUTER_UNION"/>
<operator name="OUTER_UNION"
class="uk.org.ogsadai.dqp.lqp.operators.extra.OuterUnionOperator"/>
<builders operator="OUTER_UNION"
default="uk.org.ogsadai.dqp.execute.workflow.OuterUnion"/>
<optimisationChain>
<optimiser class="uk.org.ogsadai.dqp.lqp.optimiser.QueryNormaliser" />
<optimiser class="uk.org.ogsadai.dqp.lqp.optimiser.SelectPushDown" />
</optimisationChain>
</LQPCompilerConfiguration>
User Defined Functions
• Three types
– Scalar
SELECT editDistance(a.name, 'John') FROM a
– Aggregate
SELECT * FROM a HAVING a.age<median(a.age)
– Relation valued
– Unary
SELECT * FROM sample(a, 0.75)
– Binary
SELECT * FROM fuse((SELECT * FROM a), (SELECT * FROM b))
– Scan (tuple producing) SELECT * FROM randomInt(0, 10, 1000)
• Implementations of sub-interfaces of the Function interface.
• Function Repository is part of the Data Dictionary.
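To make the scalar case concrete, here is a sketch of what an editDistance function might look like. The Function and ScalarFunction interfaces and their methods are hypothetical shapes, not the real DQP interfaces; the body is a standard Levenshtein distance, which is one plausible reading of "edit distance".

```java
// Hypothetical interfaces; the real Function hierarchy may look different.
interface Function {
    String getName();
}

interface ScalarFunction extends Function {
    // Evaluated once per tuple with the argument values taken from that tuple.
    Object evaluate(Object... args);
}

// Levenshtein distance between two strings (insert, delete, substitute cost 1 each).
final class EditDistance implements ScalarFunction {
    @Override
    public String getName() {
        return "editDistance";
    }

    @Override
    public Object evaluate(Object... args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("editDistance expects two arguments");
        }
        String a = String.valueOf(args[0]);
        String b = String.valueOf(args[1]);

        // prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }
}
```

Aggregate and relation valued functions would need richer contracts (accumulating over many tuples, or producing a heading and a stream of tuples), which is why they are modelled as separate sub-interfaces.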
Discovering Evaluator Capabilities
• We assume that every evaluation resource has the same set
of activities and UDFs.
• Checking if activities are supported is quite easy
– Get list of supported activities from each evaluation resource (DRER)
– Ask Activity Pipeline Builder for a list of required activities
• Checking for UDF availability is more tricky
– Introduce UDF Resource + “GetUDFSchemas” activity
– Match by name and parameter list, types, return type
– Relation valued functions are problematic – they need to validate
themselves inside LQP and provide headings – this is dynamic –
function schema as a script?
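The activity check and the UDF matching rule (name, parameter types, return type) can be sketched as plain set and list comparisons. All names here (CapabilityCheck, UdfSchema, and the activity and evaluator names in the usage note) are hypothetical; the sketch does not attempt the harder relation valued case raised in the last bullet.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical description of a UDF as it might be returned by a schema query.
final class UdfSchema {
    final String name;
    final List<String> parameterTypes;
    final String returnType;

    UdfSchema(String name, List<String> parameterTypes, String returnType) {
        this.name = name;
        this.parameterTypes = parameterTypes;
        this.returnType = returnType;
    }

    // Match by name, parameter list (order and types) and return type.
    boolean matches(UdfSchema other) {
        return name.equals(other.name)
            && parameterTypes.equals(other.parameterTypes)
            && returnType.equals(other.returnType);
    }
}

final class CapabilityCheck {
    // For each evaluator, the required activities it does not support (omitted if none).
    static Map<String, Set<String>> missingActivities(
            Set<String> required, Map<String, Set<String>> supportedByEvaluator) {
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : supportedByEvaluator.entrySet()) {
            Set<String> gap = new TreeSet<>(required);
            gap.removeAll(entry.getValue());
            if (!gap.isEmpty()) {
                missing.put(entry.getKey(), gap);
            }
        }
        return missing;
    }
}
```

If the required set is {SQLQuery, TupleHashJoin} and one evaluator only supports SQLQuery, the result names that evaluator together with the missing TupleHashJoin; an empty result means the plan can run on any evaluator.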