Lecture 9
Query processing and optimization
Lena Strömbäck
October 2008

Announcements
• Remember to register for the exam!
• Friday 11/10 and Monday 13/10 teachers will be available to answer questions. See the home page for more information.

For the final lecture:
• There will be some time for clarification of hard topics.
• Please send me examples, topics etc. that you want me to address by Monday.
• Email: lestr@ida.liu.se

[Figure: Users 1-4 send queries and updates to the database management system and receive answers. The system processes the queries and updates and accesses the stored data in the physical database, which is a model of the real world.]

Today's lecture
• Query processing
• Semantic query trees and canonical form
• Heuristic optimisation
• Query plans and code generation

[Figure: An SQL query passes through Parsing & Validating, which uses application schema naming & structure information from the System Catalog / DD with Meta Data, and yields an intermediate form of the query. The Database Query Optimizer produces an Execution Plan (Access plan), the Query Code Generator produces code to execute the query, and the Runtime DB processor runs it against the Stored Database with Application Data to give the query result.]

SQL query:
SELECT ORDER_ID, ENTRY_DATE
FROM ORDER
WHERE ENTRY_DATE > '2001-08-30'

Intermediate form and execution plan (tree, root first):
πORDER_ID,ENTRY_DATE
  σENTRY_DATE>2001-08-30
    ORDER

Query result: << RESULT TABLE >>

Example
StarsIn( movieTitle, movieYear, starName )
MovieStar( name, address, gender, birthdate )

SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960');

Semantic control
1. Control of used relations
   • Have to be declared in FROM
   • Must exist in the database
2. Control and resolve attributes
   • Attributes must exist in the relations
3. Type checking
   • Attributes that are compared must be of the same type

Semantic tree/Relational algebra
πmovieTitle
  ⋈ starName=name
    StarsIn
    πname
      σbirthdate LIKE '%1960'
        MovieStar

Execution plan/Access plan
one-pass hash-join (102 buffers)
  IndexScan(StarsIn, IndexR)
  Filter(birthdate LIKE '%1960')
    TableScan(MovieStar)

Generated code (very very simplified)
…
for i=1 to nTuples(MovieStar)
    tuple = read(MovieStar, i)
    if tuple.birthdate LIKE '%1960'
        add tuple to iresult
…
…
for i=1 to nTuples(iresult)
    tuple = read(iresult, i)
    if tuple.name = IStarsIn[starName]
        add tuple to result
…
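
The two loops of the simplified generated code can be tried out in Python. This is only a sketch under stated assumptions: the sample rows are invented, `build_index` is a hypothetical stand-in for the index written as IStarsIn on the slide, and the LIKE '%1960' test is approximated with a suffix check.

```python
# Invented sample data; only the attributes the query touches are included.
movie_star = [
    {"name": "Star A", "birthdate": "12/03/1960"},
    {"name": "Star B", "birthdate": "01/01/1955"},
]
stars_in = [
    {"movieTitle": "Movie 1", "movieYear": 1990, "starName": "Star A"},
    {"movieTitle": "Movie 2", "movieYear": 1991, "starName": "Star B"},
    {"movieTitle": "Movie 3", "movieYear": 1995, "starName": "Star A"},
]


def build_index(relation, attribute):
    """Map each attribute value to the tuples holding it (hypothetical IStarsIn)."""
    index = {}
    for row in relation:
        index.setdefault(row[attribute], []).append(row)
    return index


def run_query(movie_star, stars_in):
    # First loop: scan MovieStar and keep the tuples whose birthdate ends in 1960.
    iresult = [t for t in movie_star if t["birthdate"].endswith("1960")]
    # Second loop: for each kept tuple, look up matching StarsIn tuples by starName.
    index = build_index(stars_in, "starName")
    result = []
    for t in iresult:
        for match in index.get(t["name"], []):
            result.append(match["movieTitle"])
    return result


print(run_query(movie_star, stars_in))
```

The sketch assumes star names are unique in MovieStar; otherwise the lookup could return a title more than once, which the IN subquery would not.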

Query trees and canonical form
Institutionen för datavetenskap (IDA), Linköpings universitet, 2008-10-03

Relational algebra
• Selection, σ
  • Selects tuples from a relation
  • σ<selection condition>(R)
  • SELECT * FROM R WHERE <selection condition>
• Projection, π
  • Selects attributes from a relation
  • π<attribute list>(R)
  • SELECT <attribute list> FROM R

Relational algebra
• Cross product
  • R × S
  • SELECT * FROM R, S
• Join
  • R ⋈<condition> S
  • SELECT * FROM R, S WHERE <condition>

Relational algebra
• Sets
  • R ∪ S
  • R − S
  • R ∩ S
  • R and S have the same attributes and arity
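
The operators introduced so far (selection, projection, cross product and join) can be sketched in Python on relations stored as lists of dictionaries. The function names and sample relations are assumptions made for illustration, and the join is deliberately written as a selection over the cross product rather than as an efficient algorithm.

```python
from itertools import product


def select(relation, condition):
    """sigma: keep the tuples for which condition(tuple) is true."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def cross(r, s):
    """R x S: every tuple of r combined with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]


def join(r, s, condition):
    """Join expressed as a selection over the cross product."""
    return select(cross(r, s), condition)


# Invented sample relations with disjoint attribute names.
R = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
S = [{"c": 1, "d": "p"}, {"c": 3, "d": "q"}]

print(len(cross(R, S)))
print(join(R, S, lambda t: t["a"] == t["c"]))
print(project(R, ["b"]))
```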

Relational algebra
• Combine
  • πFNAME, LNAME, SALARY(σDNO=5(EMPLOYEE))
  • SELECT FNAME, LNAME, SALARY FROM EMPLOYEE WHERE DNO=5
• Rename
  • ρS(B1,B2,…,Bn)(R)
  • ρS(R)
  • ρ(B1,B2,…,Bn)(R)
  • FROM R AS S(B1,B2,…,Bn)

Relational algebra
• Aggregates
  • <group attributes> ℱ <functions>(R)
  • ex: DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)
  • SELECT COUNT(SSN), AVG(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO
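
The grouping aggregate can be illustrated with a short Python sketch that groups EMPLOYEE on DNO and computes a count and an average per group. The rows are invented and `group_count_avg` is a hypothetical helper; NULL handling is left out for brevity.

```python
from collections import defaultdict

# Invented sample rows; only the attributes used by the example are included.
employee = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 40000},
    {"SSN": "3", "DNO": 4, "SALARY": 25000},
]


def group_count_avg(relation, group_attr, count_attr, avg_attr):
    """Group on group_attr, then return (count, average) for each group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t)
    result = {}
    for key, rows in groups.items():
        count = len([r for r in rows if r[count_attr] is not None])
        average = sum(r[avg_attr] for r in rows) / len(rows)
        result[key] = (count, average)
    return result


print(group_count_avg(employee, "DNO", "SSN", "SALARY"))
```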

Write as relational algebra:
SELECT COURSE.NAME, TEACHES.NAME
FROM COURSE, TEACHES
WHERE COURSE.CODE=TEACHES.COURSE
AND COURSE.PERIOD='VT2'

Canonical form
The easiest way of generating a query tree from an SQL query:
1. Make a large table of all tables in the join using cross product.
2. On this table, use the WHERE clause to make a selection.
3. On this result, make a projection to pick out the attributes pointed out by the SELECT clause of the query.
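
As a worked illustration, the three canonical-form steps applied to the COURSE/TEACHES query give one possible relational algebra expression: cross product innermost, then the selection, then the projection. The notation is a sketch; VT2 is assumed to be a string constant.

```latex
\pi_{\text{COURSE.NAME},\ \text{TEACHES.NAME}}\Big(
  \sigma_{\text{COURSE.CODE} = \text{TEACHES.COURSE}\ \wedge\ \text{COURSE.PERIOD} = \text{'VT2'}}
  \big(\text{COURSE} \times \text{TEACHES}\big)
\Big)
```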

Heuristic query optimization

Cost Components
• Access cost to secondary storage
  • access structure, ordering of blocks
• Storage cost
  • Storing intermediate results on disk
• Computation cost
  • in-memory searching, sorting, computation
• Memory usage cost
  • memory buffers needed in the server
• Communication cost
  • remote connection cost, network transfer cost

Cost estimation:
• Disk accesses are expensive
• Estimate the disk accesses by estimating the amount of data that needs to be handled when computing the query

Sample Query Tree Execution - projection first
σENTRY_DATE>2001-08-30( πORDER_ID,ENTRY_DATE( ORDER ) )

σENTRY_DATE>2001-08-30    n = 2 tuples of 4+27 (=31) bytes each, total: 62 bytes
  πORDER_ID,ENTRY_DATE    n = 6 tuples of 4+27 (=31) bytes each, total: 186 bytes
    ORDER                 n = 6 tuples of 4+4+27 (=35) bytes each, total: 210 bytes

Sample Query Tree Execution - selection first
πORDER_ID,ENTRY_DATE( σENTRY_DATE>2001-08-30( ORDER ) )

πORDER_ID,ENTRY_DATE      n = 2 tuples of 4+27 (=31) bytes each = 62 bytes
  σENTRY_DATE>2001-08-30  n = 2 tuples of 4+4+27 (=35) bytes each = 70 bytes
    ORDER                 n = 6 tuples of 4+4+27 (=35) bytes each = 210 bytes

JOIN with selection example
SELECT *
FROM ol_order_line, it_item
WHERE ol_item_id = it_item_id
AND ol_order_id = 1001

σol_order_id=1001( ol_order_line ⋈ ol_item_id=it_item_id it_item )

1) Selection after the join:
σol_order_id=1001
  ⋈ ol_item_id=it_item_id
    ol_order_line
    it_item

2) Selection before the join:
⋈ ol_item_id=it_item_id
  σol_order_id=1001
    ol_order_line
  it_item
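
The byte counts in the two sample query tree executions (projection first and selection first) follow from simple multiplication, which the Python sketch below reproduces. The constants come from the slides: 6 tuples in ORDER, 4+4+27 bytes per full tuple, 4+27 bytes after projection, and 2 tuples passing the selection. The function names are assumptions made for illustration.

```python
# Figures taken from the sample executions.
ORDER_TUPLES = 6
FULL_WIDTH = 4 + 4 + 27        # bytes per ORDER tuple
PROJECTED_WIDTH = 4 + 27       # bytes per tuple after projecting ORDER_ID, ENTRY_DATE
SELECTED_TUPLES = 2            # tuples with ENTRY_DATE > 2001-08-30


def projection_first():
    """Bytes produced by each step when the projection runs before the selection."""
    after_projection = ORDER_TUPLES * PROJECTED_WIDTH
    after_selection = SELECTED_TUPLES * PROJECTED_WIDTH
    return after_projection, after_selection


def selection_first():
    """Bytes produced by each step when the selection runs before the projection."""
    after_selection = SELECTED_TUPLES * FULL_WIDTH
    after_projection = SELECTED_TUPLES * PROJECTED_WIDTH
    return after_selection, after_projection


print("ORDER:", ORDER_TUPLES * FULL_WIDTH, "bytes")
print("projection first:", projection_first())
print("selection first:", selection_first())
```

Both orders end with the same 62-byte result, but the intermediate result is smaller when the selection is applied first.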

Heuristic optimisation
Idea: Do selection and projection first, join as late as possible.

Algorithm:
• Break up conjunctive select into cascades
• Move down select as far as possible in the tree
• Rearrange select operations, most restrictive first
• Convert cross product to join with the appropriate join condition from a selection
• Move down project operations as far as possible in the tree
• Identify subtrees that can be executed by a single algorithm

Example:
STUDENT (5000 tuples)
Attribute:  Pnum  Name  Address  Phone  Email  Program  Enrollment
Size:       10    30    30       20     20     5        6

COURSE (200 tuples)
Attribute:  Code  Department  Examiner  Description  Period
Size:       6     5           10        200          5

STUDENTCOURSE (100 000 tuples)
Attribute:  SPNum  Ccode
Size:       10     6

SELECT name, pnum, examiner
FROM student, course, studentcourse
WHERE code = 'tddb38' AND code = ccode AND spnum = pnum

400 students have taken the course.
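
To see why the heuristic matters for this example, the number of tuples handled can be estimated for the canonical form and for a plan that selects and joins step by step. This is a rough sketch: the relation sizes and the 400 enrolled students come from the slide, while the assumptions that code identifies exactly one course and that pnum is a key of STUDENT are mine.

```python
STUDENT = 5000
COURSE = 200
STUDENTCOURSE = 100_000
TOOK_COURSE = 400  # students who have taken tddb38


def canonical_tuples():
    """Tuples in the cross product of all three relations."""
    return STUDENT * COURSE * STUDENTCOURSE


def heuristic_tuples():
    """Tuples after each step when selecting first and joining late."""
    selected_courses = 1           # assumption: code identifies one course
    enrolled = TOOK_COURSE         # join with STUDENTCOURSE on code = ccode
    with_students = TOOK_COURSE    # join with STUDENT on spnum = pnum (pnum assumed key)
    return [selected_courses, enrolled, with_students]


print(canonical_tuples())
print(heuristic_tuples())
```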

The System Catalog
• Contains useful information to predict which selections to move down in the tree.
Catalog attributes: REL_NAME, ATTR_NAME, ATTR_TYPE, MEMB_PK, MEM_FK, FK_REL, DATA_LEN, NUM_DIST, LOW_VAL, HIGH_VAL

Transformation of algebra expressions
1. Conjunctive selection can be broken up into a sequence.
2. Selection is commutative.
3. Only the last projection in a sequence is necessary.
4. Projection commutes with selection.
5. Join (and cross product) are commutative.
6. a. If all the attributes in a selection involve only one relation in a join, then the select can be pushed into the join.
   b. If the selection condition can be written c1 AND c2 where each of the conditions only concerns one relation, c1 and c2 can be pushed down.
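
Transformation rules 1 to 6 can also be written as equivalences. The notation below is one possible formalisation; the symbols c, c1, c2 (conditions) and L, L1, L2 (attribute lists) are introduced here for illustration, and the side conditions are the usual ones under which the rules hold.

```latex
\begin{align*}
\text{1. }  & \sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R)) \\
\text{2. }  & \sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R)) \\
\text{3. }  & \pi_{L_1}(\pi_{L_2}(R)) \equiv \pi_{L_1}(R), \quad L_1 \subseteq L_2 \\
\text{4. }  & \pi_{L}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{L}(R)), \quad \text{if } c \text{ uses only attributes in } L \\
\text{5. }  & R \bowtie_{c} S \equiv S \bowtie_{c} R, \qquad R \times S \equiv S \times R \\
\text{6a. } & \sigma_{c}(R \bowtie S) \equiv \sigma_{c}(R) \bowtie S, \quad \text{if } c \text{ uses only attributes of } R \\
\text{6b. } & \sigma_{c_1 \wedge c_2}(R \bowtie S) \equiv \sigma_{c_1}(R) \bowtie \sigma_{c_2}(S), \quad c_1 \text{ on } R,\ c_2 \text{ on } S
\end{align*}
```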

Transformation of algebra expressions
7. Projection operations can be pushed into join, each attribute to the relation it concerns. If the join condition contains additional attributes, these attributes must be added to the projections on the join's children in the tree.
8. Union and intersection are commutative. Set difference is not.
9. Join, cross product, union and intersection are associative.
10. Selection commutes with union, intersection and set difference.
11. Projection commutes with union.
12. Combinations of selection and cross product can be converted into join operations.

Relational algebra
πmovieTitle
  ⋈ starName=name
    StarsIn
    πname
      σbirthdate LIKE '%1960'
        MovieStar

Execution plan
one-pass hash-join (102 buffers)
  IndexScan(StarsIn, IndexR)
  Filter(birthdate LIKE '%1960')
    TableScan(MovieStar)

Some Heuristics
Algorithms and code generation

Basic Algorithms for Executing Query Operations
(Primitives in node operations of query trees)
• External Sorting (ORDER BY, pre-processing for efficient joins)
  • sort-merge strategy
• The Select Operation
  • Data scan: Linear search, binary search
  • Index: Primary on =, Primary on range, Secondary (B+tree index)
  • Conjunctive selections: Index+test, composite index, record pointer intersection
• The JOIN Operation
  • Nested-loop join, Single-loop join, Sort-merge join, Hash join
• PROJECT and set operations
  • π: straightforward, + duplicate elimination
  • Union, Intersection, Difference: sort-merge + duplicate elimination

Sort-Merge
• Sorting algorithm suitable for files that do not fit in memory
• Sorting is divided into two phases:
  • Sorting
    • File is divided into "runs" that can fit into available buffers.
    • nr_of_initruns = ceiling(blocks/blocks_in_buffer)
  • Merging
    • The sorted runs are merged during one or several "passes".
    • The degree of merging is the number of runs that can be merged in each pass.
    • degree_of_merging = min(blocks_in_buffer − 1, nr_of_initruns)
    • number of passes = ceiling( log_degree_of_merging(nr_of_initruns) )
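
The sort-merge formulas can be checked with a small Python sketch. The function name and return layout are assumptions made for illustration. The number of passes is found by repeatedly dividing the run count, which avoids floating-point logarithms, and the cost assumes each block is read and written once in the sorting phase and once in every merge pass.

```python
def sort_merge_plan(blocks, blocks_in_buffer):
    """Return initial runs, degree of merging, runs after each pass, passes and cost."""
    if blocks_in_buffer < 3:
        raise ValueError("need at least 3 buffer blocks to merge 2 runs at a time")
    # Sorting phase: ceiling(blocks / blocks_in_buffer) initial runs.
    nr_of_initruns = -(-blocks // blocks_in_buffer)
    # Merging phase: one buffer block is reserved for output.
    degree_of_merging = min(blocks_in_buffer - 1, nr_of_initruns)
    runs_per_pass = [nr_of_initruns]
    while runs_per_pass[-1] > 1:
        runs_per_pass.append(-(-runs_per_pass[-1] // degree_of_merging))
    passes = len(runs_per_pass) - 1
    # Assumed cost model: read + write of every block per phase and per merge pass.
    cost = 2 * blocks + 2 * blocks * passes
    return nr_of_initruns, degree_of_merging, runs_per_pass, passes, cost


print(sort_merge_plan(1024, 5))
```

With 1024 blocks and 5 buffer blocks the sketch reports 205 initial runs, a degree of merging of 4, four merge passes and a cost of 10240 block transfers.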

Sort-Merge
Example:
blocks_in_buffer = 5, blocks = 1024 → nr_of_initruns = 205
degree_of_merging = 4
Pass 0: 205 runs
Pass 1: 52 runs
Pass 2: 13 runs
Pass 3: 4 runs
Pass 4: 1 run
Four passes are needed to sort-merge the file.
cost = (2*blocks) + (2*(blocks*ceiling(log_degree_of_merging(nr_of_initruns))))
Example cost = 2*1024 + 2*1024*4 = 10240

Select Operation
• Linear Search
  • Retrieve and test every record
• Binary Search
  • If the selection involves an equality comparison on a key attribute used for file ordering.
• Primary or Secondary Index
  • Use the index, possibly for several elements in an interval.
• Index + Test
• Composite Index
• Record Pointer Intersection

Implementing Joins
• Nested-loop
  • For every record t in R, retrieve every record s from S and test the join condition.
• Single-loop
  • For every record t in R, retrieve all matching records s from S using an index.
• Sort-Merge
  • Each sorted file of records is scanned once.
• Hash
  • Hash the records of the smaller file R into buckets. Then hash the records of S and combine each record with all records from R in the bucket.
  • Must be able to fit the file in memory!

Implementing "Project"
• If the attribute list contains the key
  • No problem, duplicates will not occur
• Otherwise
  • Must remove duplicates
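
The nested-loop and hash join strategies described above can be compared in a short Python sketch. Function names and sample rows are invented for illustration (the attribute names are borrowed from the ol_order_line/it_item example), both relations are assumed to fit in memory, and only equality join conditions are handled.

```python
def nested_loop_join(r, s, r_attr, s_attr):
    """For every record t in r, test every record u in s against the join condition."""
    return [{**t, **u} for t in r for u in s if t[r_attr] == u[s_attr]]


def hash_join(r, s, r_attr, s_attr):
    """Hash the smaller relation r into buckets, then probe with each record of s."""
    buckets = {}
    for t in r:  # build phase
        buckets.setdefault(t[r_attr], []).append(t)
    result = []
    for u in s:  # probe phase
        for t in buckets.get(u[s_attr], []):
            result.append({**t, **u})
    return result


# Invented sample rows.
it_item = [{"it_item_id": 1}, {"it_item_id": 2}]
ol_order_line = [
    {"ol_order_id": 1001, "ol_item_id": 1},
    {"ol_order_id": 1001, "ol_item_id": 2},
    {"ol_order_id": 1002, "ol_item_id": 1},
]

nl = nested_loop_join(it_item, ol_order_line, "it_item_id", "ol_item_id")
hj = hash_join(it_item, ol_order_line, "it_item_id", "ol_item_id")
print(len(nl), len(hj))
```

Both functions return the same set of joined tuples, possibly in a different order. The nested loop compares every pair of records, while the hash join reads each relation once.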

Summary
• Query processing steps
• Relational algebra
• Heuristic optimization
• Basic algorithms for executing query operations
• Cost components

Heuristic Optimization
SQL example query
SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = 'Aquarius'
AND P.PNUMBER = W.PNO
AND W.ESSN = E.SSN
AND E.BDATE > '1957-12-31'

Heuristic Optimization – Canonical Form
πLNAME
  σPNAME='Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE>'1957-12-31'
    ×
      ×
        EMPLOYEE
        WORKS_ON
      PROJECT

Heuristic Optimization – Move Select Down
πLNAME
  σPNUMBER=PNO
    ×
      σESSN=SSN
        ×
          σBDATE>'1957-12-31'
            EMPLOYEE
          WORKS_ON
      σPNAME='Aquarius'
        PROJECT

Heuristic Optimization – Apply Most Restrictive Select First
πLNAME
  σESSN=SSN
    ×
      σPNUMBER=PNO
        ×
          σPNAME='Aquarius'
            PROJECT
          WORKS_ON
      σBDATE>'1957-12-31'
        EMPLOYEE

Heuristic Optimization – Convert Cartesian Product/Select with Join
πLNAME
  ⋈ ESSN=SSN
    ⋈ PNUMBER=PNO
      σPNAME='Aquarius'
        PROJECT
      WORKS_ON
    σBDATE>'1957-12-31'
      EMPLOYEE

Heuristic Optimization – Move Projections Down the Tree
πLNAME
  ⋈ ESSN=SSN
    πESSN
      ⋈ PNUMBER=PNO
        πPNUMBER
          σPNAME='Aquarius'
            PROJECT
        πESSN,PNO
          WORKS_ON
    πSSN,LNAME
      σBDATE>'1957-12-31'
        EMPLOYEE