Lecture 9
Query processing and optimization
Lena Strömbäck
september 2007

For the final lecture:
• There will be some time for clarification of hard topics.
• Please send me examples, topics etc. that you want me to address.
• Email: lestr@ida.liu.se

Today's lecture
• Query processing
• Semantic query trees and canonical form
• Heuristic optimisation
• Query plans and code generation

[Figure: Users 1 to 4 send queries and updates to the database management system and receive answers. The database is a model of the real world. The database management system handles processing of queries and updates and access to stored data in the physical database.]

Query processing steps
1. Parsing & Validating: the SQL query is checked against the application schema (naming & structure information) in the System Catalog / DD with Meta Data. Output: intermediate form of query.
2. Database Query Optimizer. Output: Execution Plan (Access plan).
3. Query Code Generator. Output: code to execute the query.
4. Runtime DB processor: runs the code against the Stored Database with Application Data. Output: query result.

Parsing and Validation

SELECT ORDER_ID, ENTRY_DATE
FROM ORDER
WHERE ENTRY_DATE > '2001-08-30'

Intermediate form of the query:
πORDER_ID,ENTRY_DATE
  σENTRY_DATE>2001-08-30
    ORDER

<< RESULT TABLE >>

Example

StarsIn( movieTitle, movieYear, starName )
MovieStar( name, address, gender, birthdate )

SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960');

Grammar

<Query> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
<SelList> ::= <Attribute>, <SelList>
<SelList> ::= <Attribute>
<FromList> ::= <Relation>, <FromList>
<FromList> ::= <Relation>
<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN (<Query>)
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>
<Tuple> ::= <Attribute>

Syntax tree

<Query>
  SELECT <SelList>
    <Attribute> movieTitle
  FROM <FromList>
    <RelName> StarsIn
  WHERE <Condition>
    <Tuple>
      <Attribute> starName
    IN ( <Query> )
      SELECT <SelList>
        <Attribute> name
      FROM <FromList>
        <RelName> MovieStar
      WHERE <Condition>
        <Attribute> birthdate
        LIKE <Pattern> '%1960'

Semantic control
1. Control of used relations
   • Have to be declared in FROM
   • Must exist in the database
2. Control and resolve attributes
   • Attributes must exist in the relations
3. Type checking
   • Attributes that are compared must be of the same type
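
The three semantic checks can be sketched in a few lines of Python. This is only an illustration: the `CATALOG` dictionary, its type names and the function `check_query` are assumptions made for the example, not part of any real DBMS. The ambiguity test is an extra that follows from "resolve attributes".

```python
# Hypothetical catalog: relation name -> {attribute name: type}.
# The type names are assumptions made for this sketch.
CATALOG = {
    "StarsIn": {"movieTitle": "string", "movieYear": "int", "starName": "string"},
    "MovieStar": {"name": "string", "address": "string",
                  "gender": "string", "birthdate": "string"},
}


def check_query(from_list, attributes, comparisons):
    """Return a list of semantic errors for one query block.

    from_list   -- relation names in the FROM clause
    attributes  -- attribute names used in the block
    comparisons -- pairs of attribute names that are compared
    """
    errors = []
    # 1. Control of used relations: they must exist in the database.
    for rel in from_list:
        if rel not in CATALOG:
            errors.append(f"unknown relation {rel}")
    known = [rel for rel in from_list if rel in CATALOG]

    def resolve(attr):
        # 2. Resolve an attribute to the FROM relations that contain it.
        return [rel for rel in known if attr in CATALOG[rel]]

    for attr in attributes:
        owners = resolve(attr)
        if not owners:
            errors.append(f"unknown attribute {attr}")
        elif len(owners) > 1:
            errors.append(f"ambiguous attribute {attr}")
    # 3. Type checking: compared attributes must have the same type.
    for left, right in comparisons:
        lo, ro = resolve(left), resolve(right)
        if len(lo) == 1 and len(ro) == 1:
            if CATALOG[lo[0]][left] != CATALOG[ro[0]][right]:
                errors.append(f"type mismatch: {left} vs {right}")
    return errors
```

For the subquery of the example, `check_query(["MovieStar"], ["name", "birthdate"], [])` reports no errors, while using `name` in a block that only declares `StarsIn` is reported as an unknown attribute.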

Semantic tree/Relational algebra

πmovieTitle
  ⋈ starName=name
    StarsIn
    πname
      σbirthdate LIKE '%1960'
        MovieStar

Execution plan/Access plan

one-pass hash-join, 102 buffers
  IndexScan(StarsIn, IndexR)
  Filter(birthdate LIKE '%1960')
    TableScan(MovieStar)

Generated code
(very very simplified)

…
for i=1 to nTuples(MovieStar)
    tuple = read(MovieStar, i)
    if tuple.birthdate LIKE "%1960"
        add tuple to iresult
…
…
for i=1 to nTuples(iresult)
    tuple = read(iresult, i)
    if tuple.name = IStarsIn[starName]
        add tuple to result
…
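
A runnable version of the same idea, as a minimal sketch. The function name `run_plan`, the use of Python dictionaries for tuples, and the dictionary `stars_in_index` (standing in for the index on StarsIn.starName and mapping a star name to movie titles) are assumptions made for the illustration.

```python
def run_plan(movie_star, stars_in_index):
    """Evaluate the example plan on in-memory data.

    movie_star     -- list of dicts with at least 'name' and 'birthdate'
    stars_in_index -- dict: starName -> list of movieTitle
                      (hypothetical stand-in for the index on StarsIn)
    """
    # Table scan of MovieStar with the filter birthdate LIKE '%1960'.
    iresult = []
    for t in movie_star:
        if t["birthdate"].endswith("1960"):
            iresult.append(t)

    # For every remaining star, look up the matching StarsIn entries
    # through the index and output the movie titles.
    result = []
    for t in iresult:
        for movie_title in stars_in_index.get(t["name"], []):
            result.append(movie_title)
    return result
```

The pattern `'%1960'` has a single leading wildcard, so `str.endswith` is enough here. A general LIKE needs a real pattern matcher.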

Query trees and canonical form

Basic Relational Algebra
• Select, σ
  • Selects the tuples of a relation that satisfy a condition
  • σ<selection condition>(R)
• Project, π
  • Projects a list of attributes from a relation
  • π<attribute list>(R)
• Join operations
  • R ⋈ S
  • R × S

Write as relational algebra:

SELECT COURSE.NAME, TEACHES.NAME
FROM COURSE, TEACHES
WHERE COURSE.CODE=TEACHES.COURSE
AND COURSE.PERIOD=VT2

Canonical form

The easiest way of generating a query tree from an SQL query:
1. Make a large table of all tables in the join using cross product.
2. On this table, use the where clause to make a selection.
3. On this result, make a projection to pick out the attributes named in the select clause of the query.
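
The three steps can be followed literally in code. The sketch below applies them to the COURSE/TEACHES query. The function name `canonical_query` and the representation of tuples as Python dictionaries are assumptions made for the example.

```python
from itertools import product


def canonical_query(course, teaches, period):
    """Evaluate the COURSE/TEACHES query in canonical form.

    course  -- list of dicts with keys CODE, NAME, PERIOD
    teaches -- list of dicts with keys NAME, COURSE
    """
    # 1. One large table: the cross product of all tables in FROM.
    big_table = product(course, teaches)
    # 2. Selection with the complete WHERE clause.
    selected = [
        (c, t) for c, t in big_table
        if c["CODE"] == t["COURSE"] and c["PERIOD"] == period
    ]
    # 3. Projection on the attributes of the SELECT clause.
    return [(c["NAME"], t["NAME"]) for c, t in selected]
```

The cross product has |COURSE| × |TEACHES| tuples even when only a handful survive the selection, which is why the canonical form is a starting point for optimisation rather than a plan to execute.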

Heuristic query optimization

Cost Components
• Access cost to secondary storage
  • access structure, ordering of blocks
• Storage cost
  • Storing intermediate results on disk
• Computation cost
  • in-memory searching, sorting, computation
• Memory usage cost
  • memory buffers needed in the server
• Communication cost
  • remote connection cost, network transfer cost

Cost estimation:
• Disc accesses are expensive
• Estimate the disc accesses by estimating the amount of data that needs to be handled when computing the query

Sample Query Tree Execution - projection first

σENTRY_DATE>2001-08-30( πORDER_ID,ENTRY_DATE( ORDER ) )

σENTRY_DATE>2001-08-30     n = 2 tuples of 4+27 (=31) bytes, total: 62 bytes
  πORDER_ID,ENTRY_DATE     n = 6 tuples of 4+27 (=31) bytes, total: 186 bytes
    ORDER                  n = 6 tuples of 4+4+27 (=35) bytes, total: 210 bytes

Sample Query Tree Execution - selection first

πORDER_ID,ENTRY_DATE( σENTRY_DATE>2001-08-30( ORDER ) )

πORDER_ID,ENTRY_DATE       n = 2 tuples of 4+27 (=31) bytes, total: 62 bytes
  σENTRY_DATE>2001-08-30   n = 2 tuples of 4+4+27 (=35) bytes, total: 70 bytes
    ORDER                  n = 6 tuples of 4+4+27 (=35) bytes, total: 210 bytes

JOIN with selection example

SELECT *
FROM ol_order_line, it_item
WHERE ol_item_id = it_item_id
AND ol_order_id = 1001

1) Selection after the join: σol_order_id=1001( ol_order_line ⋈ol_item_id = it_item_id it_item )

σol_order_id=1001
  ⋈ ol_item_id = it_item_id
    ol_order_line
    it_item

2) Selection before the join: σol_order_id=1001( ol_order_line ) ⋈ol_item_id = it_item_id it_item

⋈ ol_item_id = it_item_id
  σol_order_id=1001
    ol_order_line
  it_item
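
The byte counts for the two ORDER query trees above can be recomputed with a few lines of Python. The tuple counts and widths come from the slides. The helper names `sizes` and `total_bytes`, and the idea of summing the sizes of all steps into one figure, are assumptions made for the comparison.

```python
FULL = 4 + 4 + 27       # bytes per complete ORDER tuple (35)
PROJECTED = 4 + 27      # bytes per tuple after the projection (31)

# Each step: (operator, number of tuples produced, bytes per tuple).
projection_first = [
    ("ORDER", 6, FULL),
    ("project", 6, PROJECTED),
    ("select", 2, PROJECTED),
]
selection_first = [
    ("ORDER", 6, FULL),
    ("select", 2, FULL),
    ("project", 2, PROJECTED),
]


def sizes(steps):
    """Size in bytes of the output of every step."""
    return [(name, n * width) for name, n, width in steps]


def total_bytes(steps):
    """Total amount of data handled by the plan."""
    return sum(n * width for _, n, width in steps)
```

With these numbers the projection-first plan handles 210 + 186 + 62 = 458 bytes and the selection-first plan 210 + 70 + 62 = 342 bytes, so reducing the number of tuples early pays off.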

Heuristic optimisation

Idea: Do selection and projection first, join as late as possible

Algorithm:
• Break up conjunctive select into cascades
• Move down select as far as possible in the tree
• Rearrange select operations - most restrictive first
• Convert cross product to join with the appropriate join condition from a selection
• Move down project operations as far as possible in the tree
• Identify subtrees that can be executed by a single algorithm

Example:

STUDENT relation 5000 tuples, COURSE relation 200 tuples, STUDENTCOURSE relation 100 000 tuples.

STUDENT
Attribute:  Pnum  Name  Address  Phone  Email  Program  Enrollment
Size:       10    30    30       20     20     5        6

COURSE
Attribute:  Code  Department  Examiner  Description  Period
Size:       6     5           10        200          5

STUDENTCOURSE
Attribute:  SPNum  Ccode
Size:       10     6

SELECT name,pnum,examiner
FROM student, course, studentcourse
WHERE code = 'tddb38' and code=ccode and spnum=pnum

400 students have taken the course.
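
To see why the heuristics matter for this example, the number of tuples in the intermediate results can be estimated. The relation sizes and the 400 enrolled students come from the slide. That `code` identifies exactly one COURSE tuple and that every `spnum` matches exactly one STUDENT tuple are assumptions, and the function names are made up for the sketch.

```python
from math import prod

SIZES = {"student": 5000, "course": 200, "studentcourse": 100_000}


def cross_product_tuples(sizes):
    """Tuples in the cross product used by the canonical form."""
    return prod(sizes.values())


def heuristic_plan_tuples(matching_courses=1, enrolled=400):
    """Tuples after each step when selections are done first.

    matching_courses -- assumption: code is a key of COURSE
    enrolled         -- from the slide: 400 students took the course
    """
    steps = {}
    steps["select course"] = matching_courses
    steps["join studentcourse"] = matching_courses * enrolled
    # Assumption: every spnum matches exactly one student.
    steps["join student"] = steps["join studentcourse"]
    return steps
```

The canonical form builds 5000 × 200 × 100 000 = 10^11 tuples before selecting, while no intermediate result of the selection-first plan has more than 400 tuples under these assumptions.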

The System Catalog
• Contains useful information to predict which selections to move down in the tree.

Catalog columns: REL_NAME, ATTR_NAME, ATTR_TYPE, DATA_LEN, NUM_DIST, MEMB_PK, MEM_FK, FK_REL, LOW_VAL, HIGH_VAL

Transformation of algebra expressions
1. Conjunctive selection can be broken up into a sequence.
2. Selection is commutative.
3. Only the last projection in a sequence is necessary.
4. Projection commutes with selection (when the selection condition only uses attributes in the projection list).
5. Join (and cross product) are commutative.
6. a. If all the attributes in a selection involve only one relation in a join, then the select can be pushed into the join.
   b. If the selection condition can be written c1 AND c2 where each of the conditions only concerns one relation, c1 and c2 can be pushed down.
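
Rules 1 to 6 can also be written as equivalences. The notation below (R and S for relations, c for conditions, L for attribute lists) is chosen for this summary and is not taken from the slides.

```latex
\begin{align*}
\text{1.}\quad & \sigma_{c_1 \land c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R)) \\
\text{2.}\quad & \sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R)) \\
\text{3.}\quad & \pi_{L_1}(\pi_{L_2}(R)) \equiv \pi_{L_1}(R)
   && \text{if } L_1 \subseteq L_2 \\
\text{4.}\quad & \pi_{L}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{L}(R))
   && \text{if } c \text{ only uses attributes in } L \\
\text{5.}\quad & R \bowtie S \equiv S \bowtie R, \qquad R \times S \equiv S \times R \\
\text{6a.}\quad & \sigma_{c}(R \bowtie S) \equiv \sigma_{c}(R) \bowtie S
   && \text{if } c \text{ only uses attributes of } R \\
\text{6b.}\quad & \sigma_{c_1 \land c_2}(R \bowtie S) \equiv \sigma_{c_1}(R) \bowtie \sigma_{c_2}(S)
   && \text{if } c_1 \text{ concerns only } R,\ c_2 \text{ only } S
\end{align*}
```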

Transformation of algebra expressions
7. Projection operations can be pushed into join, each attribute to the relation it concerns. If the join condition contains additional attributes, these attributes must be added to the join expression's children in the tree.
8. Union and intersection are commutative. Set difference is not.
9. Join, cross product, union and intersection are associative.
10. Selection commutes with union, intersection and set difference.
11. Projection commutes with union.
12. Combinations of selection and cross product can be converted into join operations.

Relational algebra

πmovieTitle
  ⋈ starName=name
    StarsIn
    πname
      σbirthdate LIKE '%1960'
        MovieStar

Execution plan

one-pass hash-join, 102 buffers
  IndexScan(StarsIn, IndexR)
  Filter(birthdate LIKE '%1960')
    TableScan(MovieStar)

Some Heuristics

Algorithms and code generation

Basic Algorithms for Executing Query Operations
(Primitives in node operations of query trees)
• External Sorting - (ORDER BY, pre-processing for efficient joins)
  • sort-merge strategy
• The Select Operation
  • Data scan: Linear search, binary search
  • Index: Primary on =, Primary on range, Secondary (B+tree index)
  • Conjunctive selections: Index+test, composite index, record pointer intersection
• The JOIN Operation
  • Nested-loop join, Single-loop join, Sort-merge join, Hash join
• PROJECT and set operations
  • π : straightforward, + duplicate elimination
  • Union, Intersection, Difference : sort-merge + duplicate elimination

Sort-Merge
• Sorting algorithm suitable for files that do not fit in memory
• Sorting is divided into two phases:
  • Sorting
    • File is divided into "runs" that can fit into available buffers.
    • nr_of_initruns = ceiling(blocks / blocks_in_buffer)
  • Merging
    • The sorted runs are merged during one or several "passes".
    • The degree of merging is the number of runs that can be merged in each pass.
    • degree_of_merging = min(blocks_in_buffer - 1, nr_of_initruns)
    • number of passes = ceiling(log_degree_of_merging(nr_of_initruns))
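
The formulas for runs and passes can be checked with a short Python sketch. The function name `sort_merge_plan` is made up for the illustration. The cost line assumes that every block is read once and written once in the sorting phase and again in each merge pass, and the passes are counted by simulating the merge instead of evaluating the logarithm.

```python
def sort_merge_plan(blocks, blocks_in_buffer):
    """Return (nr_of_initruns, degree_of_merging, runs_per_pass, passes, cost)."""
    # Sorting phase: ceiling(blocks / blocks_in_buffer) initial runs.
    nr_of_initruns = -(-blocks // blocks_in_buffer)
    degree_of_merging = min(blocks_in_buffer - 1, nr_of_initruns)
    if nr_of_initruns > 1 and degree_of_merging < 2:
        raise ValueError("need at least 3 buffer blocks to merge runs")

    # Merging phase: every pass merges degree_of_merging runs into one.
    runs = [nr_of_initruns]
    while runs[-1] > 1:
        runs.append(-(-runs[-1] // degree_of_merging))
    passes = len(runs) - 1

    # Assumed cost model: one read and one write of every block in the
    # sorting phase and in each merge pass.
    cost = 2 * blocks + 2 * blocks * passes
    return nr_of_initruns, degree_of_merging, runs, passes, cost
```

For `blocks = 1024` and `blocks_in_buffer = 5` this gives 205 initial runs, a degree of merging of 4, the run counts 205, 52, 13, 4, 1 and therefore 4 merge passes, with a cost of 10240 block accesses.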

Sort-Merge

Example:
blocks_in_buffer = 5, blocks = 1024 → nr_of_initruns = 205
degree_of_merging = 4
Pass 0: 205 runs
Pass 1: 52 runs
Pass 2: 13 runs
Pass 3: 4 runs
Pass 4: 1 run
Four merge passes are needed to sort-merge the file.
cost = (2*blocks) + (2*blocks*ceiling(log_degree_of_merging(nr_of_initruns)))
Example cost = 2*1024 + 2*1024*4 = 10240

Select Operation
• Linear Search
  • Retrieve and test every record
• Binary Search
  • If the selection involves an equality comparison on a key attribute used for file ordering.
• Primary or Secondary Index
  • Use the index, possibly for several elements in an interval.
• Index + Test
• Composite Index
• Record Pointer Intersection
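
The difference between the first two selection methods can be illustrated on an in-memory list of records. This is a sketch only: the function names are made up, records are plain dictionaries, and the list passed to `binary_search` is assumed to be ordered on a key attribute, like a file ordered on that key.

```python
def linear_search(records, attr, value):
    """Retrieve and test every record."""
    return [r for r in records if r[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality selection on the key attribute the records are ordered on.

    At most one record matches because key_attr is a key.
    """
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return [sorted_records[mid]]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return []
```

Linear search inspects all n records, binary search about log2(n) of them, but binary search only works when the file is ordered on the attribute in the selection condition.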

Implementing Joins
• Nested-loop
  • For every record t in R, retrieve every record s from S and test the join condition.
• Single-loop
  • For every record t in R, retrieve all matching records s from S using an index.
• Sort-Merge
  • Both files are sorted on the join attributes, and each sorted file is then scanned once.
• Hash
  • Hash the records of the smaller file R into buckets. Then hash the records of S and combine each record with all matching records from R in the bucket.
  • Must be able to fit file in memory!

Implementing "Project"
• If the attribute list contains the key
  • No problem, duplicates will not occur
• Otherwise
  • Must remove duplicates
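
A minimal sketch of two of the join methods for an equality join, assuming that both files fit in memory and that records are Python dictionaries. The function names are made up, and a dictionary keyed on the join value stands in for the hash buckets.

```python
from collections import defaultdict


def nested_loop_join(R, S, r_attr, s_attr):
    """For every record t in R, test every record s in S."""
    return [(t, s) for t in R for s in S if t[r_attr] == s[s_attr]]


def hash_join(R, S, r_attr, s_attr):
    """Hash the smaller file R, then probe with the records of S."""
    buckets = defaultdict(list)
    for t in R:
        buckets[t[r_attr]].append(t)

    result = []
    for s in S:
        # Only records of R in the matching bucket are combined with s.
        for t in buckets.get(s[s_attr], []):
            result.append((t, s))
    return result
```

Both functions return the same pairs, possibly in a different order. The nested loop compares |R| × |S| record pairs, while the hash join reads each file once.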

Summary
• Query processing steps
• Relational algebra
• Heuristic optimization
• Basic algorithms for executing query operations
• Cost components

Heuristic Optimization
SQL-example query

SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = 'Aquarius'
AND P.PNUMBER = W.PNO
AND W.ESSN = E.SSN
AND E.BDATE > '1957-12-31'

Heuristic Optimization - Canonical Form

πLNAME
  σPNAME='Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE>'1957-12-31'
    X
      X
        EMPLOYEE
        WORKS_ON
      PROJECT

Heuristic Optimization - Move Select Down

πLNAME
  σPNUMBER=PNO
    X
      σESSN=SSN
        X
          σBDATE>'1957-12-31'
            EMPLOYEE
          WORKS_ON
      σPNAME='Aquarius'
        PROJECT

Heuristic Optimization - Apply Most Restrictive Select First

πLNAME
  σESSN=SSN
    X
      σPNUMBER=PNO
        X
          σPNAME='Aquarius'
            PROJECT
          WORKS_ON
      σBDATE>'1957-12-31'
        EMPLOYEE

Heuristic Optimization - Replace Cartesian Product/Select with Join

πLNAME
  ⋈ ESSN=SSN
    ⋈ PNUMBER=PNO
      σPNAME='Aquarius'
        PROJECT
      WORKS_ON
    σBDATE>'1957-12-31'
      EMPLOYEE

Heuristic Optimization - Move Projections Down the Tree

πLNAME
  ⋈ ESSN=SSN
    πESSN
      ⋈ PNUMBER=PNO
        πPNUMBER
          σPNAME='Aquarius'
            PROJECT
        πESSN,PNO
          WORKS_ON
    πSSN,LNAME
      σBDATE>'1957-12-31'
        EMPLOYEE
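
For comparison, the first and the last query tree of the example can be written as algebra expressions. The expressions below are a transcription of the trees into LaTeX and add nothing new.

```latex
% Canonical form
\pi_{LNAME}\Big(
  \sigma_{PNAME='Aquarius' \,\land\, PNUMBER=PNO \,\land\, ESSN=SSN \,\land\, BDATE>'1957\text{-}12\text{-}31'}
  \big((EMPLOYEE \times WORKS\_ON) \times PROJECT\big)
\Big)

% After all heuristic steps
\pi_{LNAME}\Big(
  \pi_{ESSN}\big(
    \pi_{PNUMBER}(\sigma_{PNAME='Aquarius'}(PROJECT))
    \bowtie_{PNUMBER=PNO}
    \pi_{ESSN,PNO}(WORKS\_ON)
  \big)
  \bowtie_{ESSN=SSN}
  \pi_{SSN,LNAME}\big(\sigma_{BDATE>'1957\text{-}12\text{-}31'}(EMPLOYEE)\big)
\Big)
```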