Slides 19 in PPT - TAMU Computer Science Faculty Pages

advertisement
CSCE 310 / 608
Database Systems
Chapter 15: Query Execution
1
Index-Based Algorithms

The existence of an index is especially helpful
for selection, and helps others

Clustered relation: tuples are packed into the
minimum number of blocks
 Clustering index: all tuples with the same
value for the index's search key are packed
into the minimum number of blocks
2
Index-Based Selection

Without an index, selection takes B(R), or
even T(R), disk I/O's.
 To select all tuples with attribute a equal to
value v, when there is an index on a:


search the index for value v and get pointers to
exactly the blocks containing the desired tuples
If index is clustering, then number of disk I/O's
is about B(R)/V(R,a)
3
Examples

Suppose B(R) = 1000, T(R) = 20,000, there is an
index on a and we want to select all tuples with a
= 0.





If R is clustered and don't use index: 1000 disk I/O's
If R is not clustered and don't use index: 20,000 disk
I/O's
If V(R,a) = 100, index is clustering, and use index:
1000/100 = 10 disk I/O's (on average)
If V(R,a) = 10, R is not clustered, index is nonclustering, and use index: 20,000/10 = 2000 disk
I/O's (on average)
If V(R,a) = 20,000 (a is a key) and use index: 1 disk
I/O
4
Using Indexes in Other Operations
1.
2.
If the index is a B-tree, can efficiently select
tuples with indexed attribute in a range
If selection is on a complex condition such
as "a = v AND …", first do the index-based
algorithm to get tuples satisfying "a = v".

Such splitting is part of the job of the query
optimizer
5
Index-Based Join Algorithm

Consider natural join of R(X,Y) and S(Y,Z).
 Suppose S has an index on Y.
for each block of R
for each tuple t in the current block
use index on S to find tuples of S that match t in
the attribute(s) Y
output the join of these tuples
6
Analysis of Index-Based Join

To get all the blocks of R, either B(R) or T(R)
disk I/O's are needed
 For each tuple of R, there are on average
T(S)/V(S,Y) matching tuples of S



T(R)*T(S)/V(S,Y) disk I/O's if index is not clustering
T(R)*B(S)/V(S,Y) disk I/O's if index is clustering
This method is efficient if R is much smaller than
S and V(S,Y) is large (i.e., not many tuples of S
match)
7
Join Using a Sorted Index
Suppose we want to join R(X,Y) and
S(Y,Z).
 Suppose we have a sorted index (e.g.,
B-tree) on Y for R and S:

do sort-join but
 no need to sort the indexed relations first

8
Buffer Management




The availability of blocks (buffers) of main memory is
controlled by buffer manager.
When a new buffer is needed, a replacement policy is
used to decide which existing buffer should be
returned to disk.
If the number of buffers available for an operation
cannot be predicted in advance, then the algorithm
chosen must degrade gracefully as the number of
buffers shrinks.
If the number of buffers available is not large enough
for a two-pass algorithm, then there are
generalizations to algorithms that use three or more
passes.
9
CSCE 608 - 600
Database Systems
Chapter 16: Query Compiler
10
Query Compiler
Parsing
Logical Query Plan
11
SQL query
parse
parse tree
convert
answer
logical query plan
apply laws
“improved” l.q.p
estimate result sizes
l.q.p. +sizes
consider physical plans
{P1,P2,…..}
execute
statistics
Pi
pick best
{(P1,C1),(P2,C2)...}
estimate costs
Outline

Convert SQL query to a parse tree


Convert to a logical query plan (relational algebra
expression)


deal with subqueries
Improve the logical query plan




Semantic checking: attributes, relation names, types
use algebraic transformations
group together certain operators
evaluate logical plan based on estimated size of relations
Convert to a physical query plan



search the space of physical plans
choose order of operations
complete the physical query plan
13
Parsing

Goal is to convert a text string containing a
query into a parse tree data structure:



leaves form the text string (broken into lexical
elements)
internal nodes are syntactic categories
Uses standard algorithmic techniques from
compilers

given a grammar for the language (e.g., SQL),
process the string and build the tree
14
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
Assume we have a simplified grammar for SQL.
15
Example: Parse Tree
<Query>
<SFW>
SELECT <SelList>
FROM
<FromList>
<Attribute>
<RelName>
title
StarsIn
SELECT
<SelList>
FROM
WHERE
<Condition>
<Tuple> IN <Query>
<Attribute>
<FromList>
<Attribute>
<RelName>
name
MovieStar
( <Query> )
starName
<SFW>
WHERE
<Condition>
<Attribute> LIKE <Pattern>
birthDate
‘%1960’
16
The Preprocessor

replaces each reference to a view with a parse
(sub)-tree that describes the view (i.e., a query)
 does semantic checking:



are relations and views mentioned in the schema?
are attributes mentioned in the current scope?
are attribute types correct?
17
Outline

Convert SQL query to a parse tree


Convert to a logical query plan (relational algebra
expression)


deal with subqueries
Improve the logical query plan




Semantic checking: attributes, relation names, types
use algebraic transformations
group together certain operators
evaluate logical plan based on estimated size of relations
Convert to a physical query plan



search the space of physical plans
choose order of operations
complete the physical query plan
18
Convert Parse Tree to Relational Algebra
Complete algorithm depends on specific
grammar, which determines forms of the
parse trees
 Here give a flavor of the approach

19
Conversion

Suppose there are no subqueries.

SELECT att-list FROM rel-list WHERE cond
is converted into
PROJatt-list(SELECTcond(PRODUCT(rel-list))), or
att-list(cond( X (rel-list)))
20
SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960';
<Query>
<SFW>
SELECT <SelList>
<Attribute>
movieTitle
FROM
<FromList>
WHERE
<Condition>
<RelName> , <FromList>
StarsIn
AND <Condition>
<RelName>
<Attribute> LIKE <Pattern>
MovieStar
birthdate
<Condition>
<Attribute> = <Attribute>
starName
name
'%1960'
Equivalent Algebraic Expression Tree
movieTitle
 starname = name AND birthdate LIKE '%1960'
X
StarsIn
MovieStar
22
Handling Subqueries

Recall the (equivalent) query:
SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE ‘%1960’
);

Use an intermediate format called twoargument selection
23
Example: Two-Argument Selection
title

StarsIn
<condition>
<tuple>
<attribute>
starName
IN
name
birthdate LIKE ‘%1960’
MovieStar
24
Converting Two-Argument Selection

To continue the conversion, we need rules for
replacing two-argument selection with a
relational algebra expression
 Different rules depending on the nature of the
subquery
 Here show example for IN operator and
uncorrelated query (subquery computes a
relation independent of the tuple being tested)
25
Rules for IN

R
C
<Condition>
t
IN
S
X
R

S
C is the condition that equates
attributes in t with corresponding
attributes in S
26
Example: Logical Query Plan
title
starName=name

StarsIn

name
birthdate LIKE ‘%1960’
MovieStar
27
What if Subquery is Correlated?

Example is when subquery refers to the
current tuple of the outer scope that is being
tested
 More complicated to deal with, since subquery
cannot be translated in isolation
 Need to incorporate external attributes in the
translation
 Some details are in textbook
28
Outline

Convert SQL query to a parse tree


Convert to a logical query plan (relational algebra
expression)


deal with subqueries
Improve the logical query plan




Semantic checking: attributes, relation names, types
use algebraic transformations
group together certain operators
evaluate logical plan based on estimated size of relations
Convert to a physical query plan



search the space of physical plans
choose order of operations
complete the physical query plan
29
Improving the Logical Query Plan



There are numerous algebraic laws
concerning relational algebra operations
By applying them to a logical query plan
judiciously, we can get an equivalent query
plan that can be executed more efficiently
Next we'll survey some of these laws
30
Associative and Commutative Operations

product
 natural join
 set and bag union
 set and bag intersection

associative: (A op B) op C = A op (B op C)
 commutative: A op B = B op A
31
Laws Involving Selection
Selections usually reduce the size of the
relation
 Usually good to do selections early, i.e.,
"push them down the tree"
 Also can be helpful to break up a
complex selection into parts

32
Selection Splitting

C1 AND C2

C1 OR C2

C1
(R) =  C1 (  C2 (R))
(R) = ( C1 (R)) Uset ( C2 (R))
if R is a set
(  C2 (R)) =  C2 (  C1 (R))
33
Selection and Binary Operators

Must push selection to both arguments:
  C (R U S) =  C (R) U  C (S)
 Must push to first arg, optional for 2nd:
  C (R - S) =  C (R) - S
  C (R - S) =  C (R) -  C (S)
 Push to at least one arg with all attributes
mentioned in C:

product, natural join, theta join, intersection

e.g.,  C (R X S) =  C (R) X S, if R has all the atts
in C
34
Pushing Selection Up the Tree

Suppose we have relations



and a view


StarsIn(title,year,starName)
Movie(title,year,len,inColor,studioName)
CREATE VIEW MoviesOf1996 AS
SELECT *
FROM Movie
WHERE year = 1996;
and the query

SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
35
The Straightforward Tree

starName,studioName
year=1996
Movie
StarsIn
Remember the rule
C(R S) = C(R) S ?
36
The Improved Logical Query Plan

starName,studioName

starName,studioName

starName,studioName
year=1996
year=1996
StarsIn
year=1996 year=1996
Movie
Movie
push selection
up tree
StarsIn
Movie
StarsIn
push selection
down tree
37
Laws Involving Projections


Consider adding in additional projections
Adding a projection lower in the tree can improve
performance, since often tuple size is reduced


If a projection is inserted in the tree, then none of the
eliminated attributes can appear above this point in
the tree


Usually not as helpful as pushing selections down
Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is
all attributes of R (resp. S) that are used in L
Another example:

L(R Ubag S) = L(R) Ubag L(S)
But watch out for set union!
38
Push Projection Below Selection?
Rule: L(C(R)) = L(C(M(R)))
where M is all attributes used by L or C
 But is it a good idea?

SELECT starName FROM StarsIn WHERE movieYear =
1996;
 starName
 movieYear=1996
StarsIn
 starName
 movieYear=1996
 starName,movieYear
StarsIn
39
Joins and Products

Recall from the definitions of relational
algebra:



R
C
S = C(R X S) (theta join)
R
S = L(C(R X S)) (natural join)
where C equates same-name attributes in R
and S, and L includes all attributes of R and S
dropping duplicates
To improve a logical query plan, replace a
product followed by a selection with a join

Join algorithms are usually faster than doing
product followed by selection
40
Duplicate Elimination
Moving  down the tree is potentially beneficial as
it can reduce the size of intermediate relations
 Can be eliminated if argument has no duplicates




Legal to push  through product, join, selection,
and bag intersection


a relation with a primary key
a relation resulting from a grouping operator
Ex: (R X S) = (R) X (S)
Cannot push  through bag union, bag difference
or projection
41
Download