Evaluating Relational Operators

advertisement
Database Systems
(資料庫系統)
December 19, 2004
Lecture #12
By Hao-hua Chu (朱浩華)
1
Announcement
• Next week reading: Chapter 16
– Skip Chapter 15
• Hand in your assignment #5
2
Evaluating Relational Operators
Chapter 14
3
Outline
• Continue from Chapter 12 (Overview of Query Evaluation) and
discuss more details on the different possible algorithms to evaluate
relational operators efficiently.
• Selection Operation
• Projection Operation
• Join Operation
• Set Operations
• Aggregate Operations
4
Relational Operations
• We will consider how to implement:
–
–
–
–
–
–
Selection: Selects a subset of rows from relation.
Projection: Deletes unwanted columns from relation.
Join: Allows us to combine two relations.
Set-difference:Tuples in reln. 1, but not in reln. 2.
Union:Tuples in reln. 1 and in reln. 2.
Aggregation (SUM, MIN, etc.) and GROUP BY
5
Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
• Similar to old schema; rname added for variations.
– Rname is the name of the agent making the reservation on behalf of a
sailor.
• Reserves:
–
Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
• Sailors:
–
•
Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
Again, we will look at cost measured in # of disk I/Os.
6
Selection Operation
SELECT *
FROM Reserves R
WHERE R.rname=‘Joe’
• Several alternative algorithms for implementing selection.
• No algorithm is always better (disk I/O costs) than the others
• Depend on the file organizations (sorted, clustered index, etc.).
–
–
–
–
No Index, Unsorted Data
No Index, Sorted Data
B+ Tree Index
Hash Index
• Two types of selection conditions:
– Conjunctions (term_1 AND term_2)
– Disjunctions (term_1 OR term_2)
7
No Index for Selections
SELECT *
FROM Reserves R
WHERE R.rname=‘Joe’
• Unsorted Data, no Index (on rname)
– Scan the file
– Cost is 1000 page I/Os.
• Sorted Data (on rname), no Index (on rname)
– Binary search to find the first qualifying tuple, then scan to find
remaining qualifying tuples.
– Cost of binary search is O(log2 1000) ~ 10 page I/Os.
– Cost of scanning is the number of pages containing qualifying tuples.
8
Using B+ Tree Index for Selections
• Cost depends on (#qualifying tuples), and (whether the index is
clustered or not).
–
–
–
–
Cost of finding qualifying data entries (typically small) plus cost of retrieving
tuples (could be large without clustering).
Assume uniform distribution of names, about 10% of tuples qualify (100 pages,
10,000 tuples).
With clustered index, cost is little more than 100 page I/Os.
With unclustered index, cost can be up to 10,000 page I/Os.
•
•
Why? Each qualified tuple may be on a different page, worse than the file scan (1,000
page I/Os).
One possible refinement to bound the cost <= cost of file scan, how?
9
Refinement for Unclustered Indexes
1. Find qualifying data entries.
2. Sort the rid’s of the data tuples to be retrieved (e.g., by their
pageNos)
RID = <pageNO, slotNo>
3. Fetch rids in order. This ensures that each data page is looked at
just once.
• The cost is bounded by the number of page I/Os (no worse than file
scan).
10
Using Hash Index for Selections
• It is good only for equality selection conditions.
– It does not apply to range selections.
• The cost of using the hash index is a sum of
– Retrieving the bucket (overflow) page in the index requires only 1~2 page
I/Os.
– Retrieving all qualified tuples requires up to (# qualified tuples) page
I/Os, given that each qualified tuple can be on different page.
– Can apply the refinement of sorting of rids on PageNo.
11
General Selections
• So far, have considered simple selection condition in the form of
(R.rname=‘Joe’).
• A general selection condition is a boolean combination of
– (term_1 op term_2 … op term_n), where op can be AND | OR.
• Given a general selection condition:
(1) rewrite the general condition in conjunctive normal form (CNF):
(day < 8/9/02 AND rname = ‘Joe’) OR bid=5 OR sid=3
Can be transformed into
(day < 8/9/02 OR bid=5 OR sid=3) AND (rname=‘Joe’ OR bid=5 OR sid=3)
(2) Try to match conjunct with index (Chapter 12)
(3) Evaluating Selections with Conjunctions / with Disjunctions
12
Two Approaches in Evaluating Selections
with Conjunction
• First approach: Find the most selective access path, retrieve tuples
using it, and apply any remaining terms that don’t match the index:
Most selective access path: An index or file scan requiring fewest page I/Os.
– Selectivity of a conjunct is proportional to the reduction factor of applying
this conjunct to the tuples.
– Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day
can be used; then, bid=5 and sid=3 must be checked for each retrieved
tuple. Similarly, a hash index on <bid, sid> could be used;
day<8/9/94 must then be checked.
–
13
Intersection of Rids
• Second approach: (if we have 2 or more matching indexes
that use Alternatives (2) or (3) for data entries):
–
–
–
–
•
Get sets of rids of data tuples using each matching index.
Then intersect these sets of rids
Retrieve the tuples and apply any remaining terms.
(Intersection on rids, not on the tuples)
Example: day<8/9/94 AND bid=5 AND sid=3.
–
Assume a B+ tree index on day and an index on sid, both using
Alternative (2)
14
Examples: what is the most selective path?
Ex#1: (day < 8/9/02 OR rname=‘Joe’), index on rname
– Could check (rname=‘Joe’) using index, but (day < 8/9/02)
requires a file scan, so might as well as use file scan.
– File scan is the most selective path.
Ex#2: (day < 8/9/02 OR rname=‘Joe’) AND sid=3, indexes on
rname and sid
– (sid = 3) is the most selective path, similar to approach #1.
Ex#3: (day < 8/9/02 OR rname=‘Joe’), indexes on day and rname
– Union of Rids matching (day < 8/9/02) and matching
(rname=‘Joe’), similar to approach #2.
15
The Projection Operation
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
• A simple approach based on sorting:
(1) Scan R and produce a set of tuples that contain only the projected attributes
(2) External Merge-Sort this set of tuples.
(3) Scan the sorted result, comparing adjacent tuples to discard duplicates.
•
What is the cost?
–
–
–
–
Step (1) requires reading1,000 page I/Os, and writing out (assuming that projected
attributes is ¼ size of a original tuple) 250 page I/Os.
Step (2) requires (assuming 20 buffer pages) 2 passes (1 + logB-1 250/B). The cost is
2*2*250 = 1000 page I/Os.
Step (3) requires 250 page I/Os.
Total cost is 2,500 page I/Os.
16
The Projection Operation
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
• An improved approach on sorting:
–
–
–
(Combine step 1 with step 2): modify Pass 0 of external sort to
eliminate unwanted fields. Also use replacement sort to produce
runs of about 2B pages.
(Combine step 2 with step 3): Modify merging passes to eliminate
duplicates.
Cost: In Pass 0, read original relation (1000 page I/Os), write
out same number of smaller tuples (250 page I/Os). In the one
merging pass, read the runs (250 page I/Os). The total cost is
1500 page I/Os (compared to 2500 page I/Os in the simple
17
approach).
Projection Based on Hashing
• The basic idea is that duplicate tuples will be hashed to the same bucket. So you
can look at tuples in each bucket and eliminate duplicates.
• Partitioning phase: Read R using one input buffer. For each tuple, discard
unwanted fields, apply hash function h1 to choose one of B-1 output buffers.
–
Result is B-1 partitions (of tuples with no unwanted fields). 2 tuples from different partitions
guaranteed to be distinct.
• Duplicate elimination phase: For each partition, read it and build an in-memory
hash table, using hash fn h2 (<> h1) on all fields, while discarding duplicates.
– Why hashing 2nd time? To minimizing collisions (two unique tuples hashing to the same
bucket).
• Cost: In partitioning phase, read 1000 pages I/Os and write out 250 page I/Os. In elimination
phase, read 250 page I/Os. Total is 1500.
18
Discussion of Projection
• Sort-based approach is the standard, why?
– No need to worry if distribution of hash function is uniform.
– Result is sorted.
• If an index on the relation contains all projected attributes in its
search key, can do index-only scan.
–
Apply projection techniques to data entries (much smaller!)
19
Equality JoinsWith One Join Column
SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid
• R ∞ S is very common, so must be carefully optimized.
• R X S is large; so, R X S followed by a selection is inefficient.
• Assume: M pages in R, pR tuples per page, N pages in S, pS tuples
per page.
–
In our examples, R is Reserves and S is Sailors.
• We will consider more complex join conditions later.
20
Two Classes of Algorithms to Implement Join
Operation
• Algorithms in class 1 require enumerating all tuples in the
cross-product and discard tuples that do not meet the join
condition.
– Simple Nested Loops Join
– Blocked Nested Loops Join
• Algorithms in class 2 avoid enumerating the cross-product.
– Index Nested Loops Join
– Sort-Merge Join
– Hash Join
21
Simple Nested Loops Join
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add <r, s> to result
• For each tuple in the outer relation R, scan the entire inner
relation S (scan S total of pR * M times!).
–
•
Cost: M + pR * M * N = 1000 + 100*1000*500 I/Os => very
huge.
How can we improve simple nested loops join?
22
Page Oriented Nested Loops Join
foreach page of R do
foreach page of S do
for all matching tuples r in R-block and
s in S-page, add <r, s> to result
• Cost: M + M*N = 1000 + 1000*500 = 501,000 => still
huge.
• If smaller relation (S) is outer, cost = 500 + 500*1000 =
500,500
23
Block Nested Loops Join
foreach block of B-2 pages of R do
foreach page of S do
for all matching tuples r in R-block and
s in S-page, add <r, s> to result
• Use one page as an input buffer for scanning the inner S, one page
as the output buffer, and use all remaining pages to hold ``block’’ of
outer R.
–
For each matching tuple r in R-block, s in S-page, add <r, s> to result.
Then read next R-block, scan S, etc.
24
Block Nested Loops Join: Efficient Matching
Pairs
• If B is large, it may be slow to find matching pairs between tuples in
S-page and R-block (R-block has B-2 pages).
• The solution is to build a main-memory hash table for R-block.
R&S
Join Result
Hash table for block of R
...
...
...
Input buffer for S Output buffer
25
Examples of Block Nested Loops
• Cost: Scan of outer + #outer blocks * scan of inner
–
#outer blocks = ceiling (# of pages of outer / blocksize)
• With Reserves (R) as outer, and 102 buffer pages:
–
–
–
Cost of scanning R is 1000 I/Os; a total of 10 blocks.
Per block of R, scan Sailors (S); 10*500 I/Os.
Total cost = 1000 + 10 * 500 = 6000 page I/Os => huge
improvement over page-oriented nested loops join.
• With 100-page block of Sailors as outer:
–
–
–
•
Cost of scanning S is 500 I/Os; a total of 5 blocks.
Per block of S, we scan Reserves; 5*1000 I/Os.
Total cost = 500 + 5*1000 = 5500 page I/Os
For blocked access (block I/Os are more efficient), it may be best to
divide buffers evenly between R and S.
26
Index Nested Loops Join
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result
• If there is an index on the join column of one relation (say S), can
make it the inner and exploit the index.
–
Cost: M + ( (M*pR) * cost of finding matching S tuples)
• For each R tuple, cost of probing S index is about 1.2 for hash
index, 2-4 for B+ tree. Cost of finding S tuples (assume Alt. (2) or
(3) for data entries) depends on clustering.
–
–
Clustered index: 1 I/O (typical)
Unclustered index: up to 1 I/O per matching S tuple.
27
Examples of Index Nested Loops
• Hash-index (Alt. 2) on sid of Sailors (as inner):
–
–
–
Scan Reserves: 1000 page I/Os, 100*1000 tuples.
For each Reserves tuple: 1.2 I/Os to get data entry in index, plus 1 I/O to get (the exactly one)
matching Sailors tuple.
Total: 100 0+ 100,000 * 2.2 = 221,000 I/Os.
• Hash-index (Alt. 2) on sid of Reserves (as inner):
–
–
–
–
•
•
Scan Sailors: 500 page I/Os, 80*500 tuples.
For each Sailors tuple: 1.2 I/Os to find index page with data entries, plus cost of retrieving
matching Reserves tuples.
Assume uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Cost of retrieving
them is 1 or 2.5 I/Os depending on whether the index is clustered.
Total (Clustered): 500 + 40,000 * 2.2 = 88,500 I/Os.
Given choices, put the relation with higher # tuples as inner loop.
Index Nested Loop performs better than simple nested loop.
28
Sort-Merge Join
• Sort R and S on the join column [merge-sort], then scan them to do
a ``merge’’ (on join col.) [scan-merge], and output result tuples.
• Scan-merge:
–
–
–
Advance scan of R until current R-tuple >= current S tuple, then advance
scan of S until current S-tuple >= current R tuple; do this until current R
tuple = current S tuple.
At this point, all R tuples with same value in Ri (current R group) and all
S tuples with same value in Sj (current S group) match; output <r, s> for
all pairs of such tuples.
Then resume scanning R and S.
29
Review External Merge Sort (B=4)
3,4 6,2 9,4 8,7
5,6 3,1 2,10 15,3
16,5 13,8 19,1 32,8
Pass 0: Read four unsorted pages, sort them, and write them out.
Produce 3 runs of 4 pages each.
start
2,3 4,4 6,7 8,9
start
start
1,2 3,3 5,6 10,15
1,5 8,8 13,1619,32
merging
1,1 2,2 3,3 …
Pass 1: Read three pages, one page from each of 3 runs, merge them, and
write them out. Produce 1 run of 12 pages.
30
Scan-Merge
Gs
1
2
5
sid
22
28
28
44
58
sname
dustin
yuppy
lubber
guppy
rusty
rating
7
9
8
5
10
Tr
1
3
4
6
age
45.0
35.0 2a
55.5 2b
35.0
35.0
sid
28
28
31
31
31
58
bid
103
103
101
102
101
103
Ts
3a
3b
day
12/4/96
11/3/96
10/10/96
10/12/96
10/11/96
11/12/96
5a
6a
5b
6b
rname
guppy
yuppy
dustin
lubber
lubber
dustin
31
Cost of Merge-Sort Join
• R is scanned once; each S group is scanned once per matching R
tuple. (Multiple scans of an S group are likely to find needed pages
in buffer.)
• Cost: 2 M log M + 2 N log N + (M+N)
–
–
–
–
•
Assume enough buffer pages to sort both Reserves and Sailors in 2 passes
The cost of merge-sorting two relations is 2 M log M + 2 N log N.
The cost of scan-merge two sorted relations is M+N.
Total join cost: 2*2*1000 + 2*2*500 + 1000 + 500 = 7500 page
I/Os.
Any possible refinement to reduce cost?
32
Refinement of Sort-Merge Join
• Combine the merging phase in sorting with the scan-merge for the
join.
–
–
Allocate one buffer space for each run (in theL merge pass) in R & S.
Buffer size B > squar_root(L), where L is the size of the larger relation.Why?
•
–
–
# runs = 2(L/2B) = L/B < B [replacement sort]
Cost: read+write each relation in Pass 0 + read each relation in (only)
merging pass (not counting the writing of result tuples).
Cost goes down from 7500 to 4500 I/Os
R
S
sort-merge
scan-merge
33
Hash Join
• Hash both relations on the join attribute using the same hash
function h.
– Tuples in R-partition_i (bucket) can only match with tuples in Spartition_i.
• For i=1..k, check for matching pairs in R-partition_i and Spartition_i.
• For efficient matching pairs, apply hashing to tuples of R-partition
using another hash function h2.
34
Hash-Join
• Partition both relations
using hash fn h: R tuples
in partition i will only
match S tuples in partition
i.
• Read in a partition of R,
hash it using h2 (<> h!).
Scan matching partition of
S, search for matches.
Original
Relation
OUTPUT
1
Partitions
1
2
INPUT
2
hash
function
...
h
B-1
B-1
Disk
B main memory buffers
Partitions
of R & S
Disk
Join Result
hash
fn
Hash table for partition
Ri
h2
h2
Input buffer
for Si
Disk
Output
buffer
B main memory buffers
Disk35
Observations on Hash-Join
• #partitions k < B-1 (need one buffer page for reading), and B-2 >
size of largest partition to be held in memory.
• Assume uniformly sized partitions, and maximize k, we get:
–
k= B-1, and B-2 > M/(B-1), i.e., B must be > square_root(M)
M
• If we build an in-memory hash table to speed up the matching of
tuples, a little more memory is needed.
• If the hash function does not partition uniformly, one or more R
partitions may not fit in memory. Can apply hash-join technique
recursively to do the join of this R-partition with corresponding Spartition.
36
Cost of Hash-Join
• In partitioning phase, read+write both relns; 2(M+N). In
matching phase, read both relns; M+N I/Os.
• In our running example, this is a total of 4500 I/Os.
• Sort-Merge Join vs. Hash Join:
–
–
–
–
Same amount of buffer pages.
Same cost of 3(M+N) I/Os.
Hash Join shown to be highly parallelizable.
Sort-Merge less sensitive to data skew; result is sorted.
37
General Join Conditions
• So far, we have only discussed single equality join condition.
• How about equalities over several attributes? (e.g., R.sid=S.sid
AND R.rname=S.sname):
–
–
For Index Nested Loops join, build index on <sid, sname> on R (if R is
inner); or use existing indexes on sid or sname.
For Sort-Merge and Hash Join, sort/partition on combination of the two
join columns.
• How about inequality conditions? (e.g., R.rname < S.sname):
–
–
–
For Index Nested Loops join, need (clustered!) B+ tree index.
Hash Join, Sort Merge Join not applicable.
Block Nested Loops join, quite likely to be the best join method here.
38
Set Operations
• Intersection and cross-product are special cases of join.
– Intersection is equality on all fields in join condition. None for cross-product.
• Union (Distinct) and Except similar; we’ll do union.
• Sorting based approach to union:
–
–
–
Sort both relations (on combination of all attributes).
Scan sorted relations and merge them.
Alternative: Merge runs from Pass 0 for both relations.
• Hash based approach to union:
–
–
Partition R and S using hash function h.
For each S-partition, build in-memory hash table (using h2), scan corr. R-partition and
add tuples to table while discarding duplicates.
39
Aggregate Operations (AVG, MIN, etc.)
• Without grouping:
–
–
In general, requires scanning the relation and maintain some running information (sum of
age & number of sailors).
Given index whose search key includes all attributes in the SELECT or WHERE clauses, can
do index-only scan.
• With grouping:
–
–
–
SELECT AVG(S.age)
FROM Sailors S
GROUP-BY S.rating
Sort on group-by attributes (rating), then scan relation and compute aggregate for each
group.
Similar approach based on hashing on rating.
Given tree index whose search key includes all attributes in SELECT,WHERE and GROUP
BY clauses, can do index-only scan; if group-by attributes form prefix of search key, can
retrieve data entries/tuples in group-by order.
40
Impact of Buffering
• If several operations are executing concurrently, they share buffer pool;
therefore it is difficult to estimate the number of available buffer pages.
• Repeated access patterns interact with buffer replacement policy.
–
–
e.g., Inner relation is scanned repeatedly in Simple Nested Loop Join. With
enough buffer pages to hold inner, replacement policy does not matter.
Otherwise, replacement policy is critical. MRU is best, LRU is worst
(sequential flooding).
Does replacement policy matter for Block Nested Loops? (No, why?)
41
Download