Advanced Database Indexing
Yannis Theodoridis
Computer Technology Institute, Greece
ytheod@cti.gr
Tutorial outline
Part I: Introduction (what is and why do we need indexing?)
Part II: Advanced techniques (isn’t the B-tree “ubiquitous”?)
Part III: Theoretical issues (what will the future be?)
ADBIS/DASFAA 2000
Part I: Introduction
Part II: Advanced Techniques
Part III: Theoretical Issues
Outline of Part I
General issues
– Data modeling
– Query processing
– The need for indexing
Classic methods
– The “ubiquitous” B-tree
– Hashing
Multi-attribute (and multi-dimensional) methods
Database Management Systems
A DBMS is software for:
Efficient storage, processing and retrieval of structured data collections
Its modules are ...
– Tools for data definition and manipulation (e.g. SQL)
– Some extras (query optimization, transaction management, concurrency control, recovery etc.)
History of data management
50’s: File systems (batch processing, cards, tapes etc.)
60’s: Disk-resident systems (COBOL, sorting routines)
70’s: Database systems (hierarchical / network / relational model, SQL)
80’s: Novel data models (O-O, Active (rule-based))
90’s: Novel (non-traditional) data types (Spatial, Image, Multimedia, ...)
Data Models
Conceptual data model
– E-R Model
Logical data model
– Hierarchical model
– Network model
– Relational model
Relational model
‘WHO-IS-WHO’ TABLE
CODE  NAME           ADDRESS
1723  Gates, Bill    http://www.microsoft.com
1789  Clinton, Bill  http://www.whitehouse.gov
1812  Blair, Tony    http://www.number-10.gov.uk/
…     …              ...
‘SERVERS’ TABLE
ID          CODE
0089722031  1789
0062354024  1723
0075642312  1723
…           …
Query languages
Examples:
– QUEL, SQL, QBE
SQL properties:
– User-friendly interface
– Data definition AND manipulation
– Queries and updates on tables
– Incorporated transaction and recovery management
Query Languages (cont.)
Structured Query Language (SQL):
SELECT < attribute-list >
FROM < relation-list >
WHERE < condition >
Example:
SELECT name
FROM who-is-who W, servers S
WHERE W.code = S.code
AND S.id LIKE “008*”
Query Processing
Speed up processing by
– Using indexes on fields (at least on keys)
– Providing several join algorithms (hash-join, etc.)
Optimize query by
– Finding alternative Query Execution Plans (QEPs)
– Selecting the ‘optimal’ based on
• Heuristic rules
• Cost estimation
Execute the best strategy
Query Processing (cont.)
An SQL query is equivalent to a relational algebra expression
SELECT W.name
FROM who-is-who W, servers S
WHERE W.code = S.code AND S.id LIKE “008*”
is equivalent to
π W.name ( σ S.id LIKE “008*” ∧ W.code = S.code ( W × S ) )
Query Processing (cont.)
From the original …
π W.name ( σ S.id LIKE “008*” ∧ S.code = W.code ( W × S ) )
… to an ‘optimal’ QEP
π W.name ( ( σ S.id LIKE “008*” ( S ) ) ⋈ S.code = W.code W )
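The rewrite from the original plan to the ‘optimal’ one can be illustrated with a small sketch. It assumes the toy rows of the two tables shown earlier (row alignment is an assumption), treats LIKE “008*” as a prefix match, and uses hypothetical function names; it only counts how many row pairs each plan examines.

```python
# Toy relations; "code" is the join attribute (row contents are assumed).
who_is_who = [
    {"code": 1723, "name": "Gates, Bill"},
    {"code": 1789, "name": "Clinton, Bill"},
    {"code": 1812, "name": "Blair, Tony"},
]
servers = [
    {"id": "0089722031", "code": 1789},
    {"id": "0062354024", "code": 1723},
    {"id": "0075642312", "code": 1723},
]


def naive_plan(w_rows, s_rows):
    """Cartesian product first, then filter: examines |W| * |S| pairs."""
    examined, out = 0, []
    for w in w_rows:
        for s in s_rows:
            examined += 1
            if w["code"] == s["code"] and s["id"].startswith("008"):
                out.append(w["name"])
    return out, examined


def pushed_down_plan(w_rows, s_rows):
    """Selection on S first, then a hash join on code."""
    selected = [s for s in s_rows if s["id"].startswith("008")]
    by_code = {}
    for w in w_rows:
        by_code.setdefault(w["code"], []).append(w["name"])
    examined, out = 0, []
    for s in selected:
        for name in by_code.get(s["code"], []):
            examined += 1
            out.append(name)
    return out, examined
```

Both plans return the same names; the second touches far fewer pairs because the selection shrinks S before the join.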
Query Processing (cont.)
… thus the need for indexing
In commercial DBMSs we find
– B-trees
– Hashing
– Inverted files
B-trees
B-trees [Bayer & McCreight, 1972]
B+-trees [Knuth, 1973; Comer, 1979]
Based on ordering “<”
[Figure: a two-level tree with root keys 51, 91 and leaves (10 25 32 44), (51 60 77 80), (91 93 98)]
B-trees (cont.)
Internal nodes “direct traffic” and point to lower-level nodes
Leaf nodes include actual keys and pointers to tuples
Nodes are guaranteed to be at least 50% full
– split / merge routines
[Figure: a two-level tree with root keys 51, 91 and leaves (10 25 32 44), (51 60 77 80), (91 93 98)]
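How internal nodes “direct traffic” can be sketched on the two-level tree of the figure. This is a minimal illustration, not a full B+-tree: the lists and the function name are assumptions, and there is no insertion, splitting or merging.

```python
from bisect import bisect_right

# Root separators and leaves as read off the figure.
root_keys = [51, 91]
leaves = [[10, 25, 32, 44], [51, 60, 77, 80], [91, 93, 98]]


def search(key):
    """Return (leaf_index, found) for a key."""
    # Keys smaller than 51 go to leaf 0, keys in [51, 91) to leaf 1,
    # and keys >= 91 to leaf 2.
    leaf_index = bisect_right(root_keys, key)
    return leaf_index, key in leaves[leaf_index]
```

For example, `search(60)` is routed to the middle leaf, and `search(95)` is routed to the last leaf but not found there.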
Hashing
Basic idea: key transformation and mapping to buckets
Several variants:
– Extendible hashing [Fagin et al., 1979]
– Linear hashing [Litwin, 1980; Larson, 1982]
– etc.
[Figure: input keys (decimal), their hash values H(key) (binary), and the directory]
Linear Hashing
5 buckets, h0(k) = key mod 5; no overflow has occurred:
bucket 0: 10, 15
bucket 1: 21, 36
bucket 2: 7, 12
bucket 3: 3, 13
bucket 4: 29
Insertion of 8: due to overflow (of bucket #3), bucket #0 is rehashed using h1(k) = key mod 10:
bucket 0: 10
bucket 1: 21, 36
bucket 2: 7, 12
bucket 3: 3, 13, 8 (overflow)
bucket 4: 29
bucket 5: 15
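The split step of linear hashing can be sketched as follows. This is a simplified illustration under stated assumptions: a bucket capacity of 2 (not given on the slide), overflowing keys kept in the same Python list instead of a separate overflow page, and hypothetical class and attribute names.

```python
class LinearHash:
    def __init__(self, n0=5, capacity=2):
        self.n0 = n0              # initial number of buckets
        self.level = 0            # h_level(k) = k mod (n0 * 2**level)
        self.next = 0             # next bucket to be split
        self.capacity = capacity  # assumed bucket capacity
        self.buckets = [[] for _ in range(n0)]

    def _address(self, key):
        size = self.n0 * 2 ** self.level
        b = key % size
        if b < self.next:         # bucket already split in this round
            b = key % (2 * size)
        return b

    def insert(self, key):
        b = self._address(key)
        overflow = len(self.buckets[b]) >= self.capacity
        self.buckets[b].append(key)   # overflowing keys stay chained here
        if overflow:
            self._split()

    def _split(self):
        # The bucket at the split pointer is rehashed, not necessarily
        # the bucket that overflowed.
        size = self.n0 * 2 ** self.level
        old = self.buckets[self.next]
        self.buckets[self.next] = [k for k in old if k % (2 * size) == self.next]
        self.buckets.append([k for k in old if k % (2 * size) != self.next])
        self.next += 1
        if self.next == size:     # every bucket of this round has been split
            self.level += 1
            self.next = 0


table = LinearHash()
for k in [10, 15, 21, 36, 7, 12, 3, 13, 29]:
    table.insert(k)
table.insert(8)   # overflows bucket 3, which triggers the split of bucket 0
```

After the last insertion, key 15 has moved to the new bucket 5 while 8 remains as an overflow entry of bucket 3.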
Inverted files
For non-primary keys
– where a number of duplicates is expected
[Figure: inverted lists on two attributes, with entries such as ..., 1957, 1958, 1959, ..., 1969, ... and ..., 120K, ..., 160K, ..., 200K, ..., pointing to tuples 10, 25, 32]
Inverted files (cont.)
Several B+ trees
– one for each attribute (e.g. age, salary)
Postings lists
– include primary key values and
– pointers to the tuples of the data table
Alternative: Multi-dimensional access methods
– KDB-tree [Robinson, 1981]
– Grid file [Nievergelt et al., 1984]
– ...
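A minimal sketch of the inverted-file idea, assuming a hypothetical three-tuple table and using a plain dictionary per attribute as a stand-in for the B+-tree. Each attribute value maps to a postings list of primary keys, and a multi-attribute query intersects the lists.

```python
from collections import defaultdict

# Hypothetical tuples, keyed by primary key (values are illustrative only).
employees = {
    10: {"birthdate": 1959, "salary": 160_000},
    25: {"birthdate": 1957, "salary": 120_000},
    32: {"birthdate": 1969, "salary": 200_000},
}


def build_inverted(table, attribute):
    """Map each attribute value to its postings list of primary keys."""
    postings = defaultdict(list)
    for pk, row in table.items():
        postings[row[attribute]].append(pk)
    return dict(postings)


def lookup_range(postings, low, high):
    """Primary keys whose attribute value lies in [low, high]."""
    return sorted(pk for value, pks in postings.items()
                  if low <= value <= high for pk in pks)


by_birthdate = build_inverted(employees, "birthdate")
by_salary = build_inverted(employees, "salary")

# Conjunctive query: born in the 1950s AND salary of at least 150K.
born_50s = set(lookup_range(by_birthdate, 1950, 1959))
well_paid = set(lookup_range(by_salary, 150_000, 250_000))
result = sorted(born_50s & well_paid)
```

The intersection step is what a multi-dimensional access method avoids: it answers the two-attribute query with a single structure.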
Grid File
[Figure: a grid directory over two linear scales; several directory cells may hold pointers to the same bucket]
Linear scale on “Birthdate”: <1940, 1940-50, 1950-60, 1960-70, >1970
Linear scale on “Salary”: < 10K, 10-30K, 30-50K, 50-60K, 60-90K, 90-100K, > 100K
Grid File (cont.)
Pros:
– Efficient for point queries in the absence of skew
– Simple implementation
Cons:
– Not very good for range queries
– Deteriorates rapidly for skewed data
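A point query on a grid file can be sketched with the two linear scales of the figure. The directory layout below (all cells of one birthdate interval sharing a bucket) is a hypothetical choice made only to show bucket sharing; directory splitting and bucket overflow are left out.

```python
from bisect import bisect_right
from collections import defaultdict

# Interval boundaries of the two linear scales (salary in K).
BIRTH_SCALE = [1940, 1950, 1960, 1970]       # 5 intervals
SALARY_SCALE = [10, 30, 50, 60, 90, 100]     # 7 intervals


def cell(birthdate, salary_k):
    """Locate the directory cell of a (birthdate, salary) pair."""
    return (bisect_right(BIRTH_SCALE, birthdate),
            bisect_right(SALARY_SCALE, salary_k))


# Hypothetical directory: cells in the same birthdate row share one bucket.
directory = {(r, c): r for r in range(5) for c in range(7)}
buckets = defaultdict(list)


def insert(birthdate, salary_k):
    buckets[directory[cell(birthdate, salary_k)]].append((birthdate, salary_k))


def point_query(birthdate, salary_k):
    """Two scale lookups, one directory access, one bucket scan."""
    bucket = buckets[directory[cell(birthdate, salary_k)]]
    return [rec for rec in bucket if rec == (birthdate, salary_k)]


insert(1955, 45)
insert(1955, 95)
insert(1972, 120)
```

With skewed data many records would fall into a few cells, which is the deterioration listed under the cons.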
However …
Our world is not just strings and numbers …
– What if our key is of type point? Or image content?
– What is a point/range query on such a key?
Examples
– Find the countries that are affected by a nuclear accident (i.e., in a circle around the accident location)
– Find photos in our library similar to a given one.
Multi-dimensional indexing
Definition: The organization into external data structures of keys without an associated total ordering (of interest)
– An “ordering of interest” is an ordering we wish to use for range queries
– One can always impose “uninteresting” orderings on keys
Epilogue of Part I
Indexing is necessary for efficient database processing
– B-tree is ubiquitous for traditional applications
– However we need something more: multi-dimensional indexing
Thus … Advanced Indexing Techniques (Part II)
Epilogue of Part I (cont.)
Questions that arise [Hellerstein, 2000]
– What is the grand challenge in indexing?
– What would constitute a successful completion of the research agenda?
– Or, should we expect to continuously need to solve variants of the indexing problem?
Thus … Theoretical Issues (Part III)
Part I: Introduction
Part II: Advanced Techniques
Part III: Theoretical Issues
Outline of Part II
“Our world is more than strings and numbers”
– Spatial DB
– Moving Objects (or spatio-temporal) DB
– Image and Multimedia DB
Part II (cont.)
→ Spatial DB
– Moving Objects (or spatio-temporal) DB
– Image and Multimedia DB
Applications involving spatial data
Traditional GIS applications
– Urban planning
– Route optimization, market analysis
– Fire or pollution monitoring
– Public networks administration
etc.
Applications (cont.)
Novel applications
– Image and Multimedia databases
• medical databases
• video-on-demand
– Time-series databases
• management of time intervals
– Data warehouses
Spatial database features
Manipulation of a very large amount of data
– e.g. Tbytes of data per day from satellite images
Data distinction
– spatial vs. non-spatial (alphanumeric) data
Complex spatial relationships and operations
– Relationships: topological, directional, metric
– Operations: selection and join
Data Models
Two main approaches for spatial representation
– raster model (image-based partition of space)
– vector model (object-based partition of space)
[Figure: a house (H) and a river (R) on X and Y axes, shown as raster cells labeled H and R and as vector objects]
Requirements for efficient indexing
Specialized access methods are necessary, for the sake of performance, uniformity, etc.
Point and non-point objects need to be efficiently indexed and retrieved
Support of several spatial relationships is necessary
Popular indexing techniques
Raster Model:
– Quadtrees
Vector Model:
– K-D-B-trees, Quadtrees, Grid Files (for points),
– R-trees and variations (for non-point objects)
Techniques for raster
Quadtrees [Samet, 1984]
[Figure: a data set partitioned into quadrants 0, 1, 2, 3, with quadrant 2 subdivided into 20, 21, 22, 23, and its quadtree representation: a root with children 0, 1, 2, 3, where node 2 has children 20, 21, 22, 23]
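The recursive decomposition behind a region quadtree can be sketched as below. The quadrant order (NW, NE, SW, SE as 0 to 3), the 4×4 raster and the function name are assumptions for illustration; the raster side is assumed to be a power of two.

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform block, otherwise a list of four
    subtrees in the order NW, NE, SW, SE."""
    if size is None:
        size = len(grid)
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first            # uniform block: becomes a leaf
    half = size // 2
    return [
        build_quadtree(grid, x, y, half),                 # quadrant 0
        build_quadtree(grid, x + half, y, half),          # quadrant 1
        build_quadtree(grid, x, y + half, half),          # quadrant 2
        build_quadtree(grid, x + half, y + half, half),   # quadrant 3
    ]


raster = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
tree = build_quadtree(raster)
```

Here only quadrant 2 is non-uniform, so it alone is subdivided, mirroring the 20, 21, 22, 23 nodes of the figure.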
Techniques for vector
Using approximations instead of the exact geometry of shapes
– e.g. the Minimum Bounding Rectangle (MBR)
[Figure: example MBRs around countries labeled UK, IR, BE, FR, PO, SP]
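A short sketch of the MBR approximation, assuming shapes are given as lists of (x, y) vertices and rectangles as (xmin, ymin, xmax, ymax) tuples; both helper names are hypothetical. An MBR test acts as a filter: if the MBRs do not intersect, the shapes cannot intersect, but intersecting MBRs still need a check against the exact geometry.

```python
def mbr(points):
    """Minimum bounding rectangle of a list of (x, y) vertices."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


triangle = [(0, 0), (4, 0), (0, 4)]
triangle_mbr = mbr(triangle)

# This window overlaps the triangle's MBR but lies in the empty corner of
# it, so it is a false hit that a refinement step would have to discard.
window = (3, 3, 4, 4)
```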
Techniques for vector (cont.)
Several dozen indexing methods
– a survey in [Gaede and Guenther, 1998]
R-tree [Guttman, 1984]: the most popular
– Has been implemented in many commercial DBMS
– Has been studied extensively (both theoretically and experimentally)
– Several variations: R+-tree [Sellis et al., 1987], R*-tree [Beckmann et al., 1990], etc.
The R-tree
[Figure: rectangles A to N in the plane and the corresponding R-tree; the root holds entries A, B, C and the lower-level nodes hold entries D to N]
The R-tree
[Figure: the R-tree of rectangles A to N, with a point query and a range query drawn over the data space]
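Range and point queries on an R-tree both descend every subtree whose MBR overlaps the query window. The sketch below shows only that search; the node class, the coordinates and the two-leaf tree are hypothetical, and insertion and node splitting are not covered.

```python
def overlaps(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, entries, leaf):
        self.entries = entries   # list of (mbr, child node or object id)
        self.leaf = leaf


def range_query(node, window):
    """Collect ids of all leaf entries whose MBR overlaps the window."""
    hits = []
    for box, child in node.entries:
        if overlaps(box, window):
            if node.leaf:
                hits.append(child)
            else:
                hits.extend(range_query(child, window))
    return hits


# Hypothetical data: two leaves under one root.
leaf_a = Node([((0, 0, 2, 2), "D"), ((1, 3, 3, 5), "E")], leaf=True)
leaf_b = Node([((6, 6, 8, 8), "H"), ((7, 1, 9, 3), "I")], leaf=True)
root = Node([((0, 0, 3, 5), leaf_a), ((6, 1, 9, 8), leaf_b)], leaf=False)


def point_query(node, x, y):
    """A point query is a range query with a degenerate window."""
    return range_query(node, (x, y, x, y))
```

Because sibling MBRs may overlap, a query can follow several paths from the root, unlike a B-tree search.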
Queries supported by R-trees
Point / range queries
Spatial join queries
– [Brinkhoff et al., 1993]
Direction, topological, distance queries
– [Papadias et al., 1995]
k-Nearest neighbor queries
– [Roussopoulos et al., 1995]
k-Closest pair queries
– [Hjaltason and Samet, 1998; Corral et al., 2000]
Query processing
Example: “find all states in CMT zone with lakes inside”
SELECT state_name
FROM states, lakes
WHERE state_region overlap lake_boundary   (spatial join)
AND state_region in “CMT_zone”   (spatial selection)
Query processing (cont.)
SELECT state_name
FROM states, lakes
WHERE state_region overlap lake_boundary
AND state_region in “CMT_zone”
Assume two R-trees Rs and Rl
1. Range query on Rs
2. Join (Rs,Rl)
3. Find common set of results
or
1. Range query on Rs
2. Build Rs’ on-the-fly
3. Join (Rs’,Rl)
or …
Part II (cont.)
– Spatial DB
→ Moving Objects (or spatio-temporal) DB
– Image and Multimedia DB
Moving objects databases
Moving points / regions
Time (t-axis) is not just another dimension
– e.g. monotonically increasing
Discrete vs. continuous environment
[Figure: two plots over the x, y and t axes]
Applications for moving objects
Transportation
– Traffic surveillance, intelligent transportation systems (ITS)
Environmental monitoring
– Weather forecast, monitoring of natural phenomena
Multimedia
– Video databases, animated movies
Applications (cont.)
Query examples [Sistla & Wolfson, 2000]
– During the past year, how many times was bus #5 late by more than 10 min. at some station (past query)
– Taxi cabs within 1 mile of my location (present query)
– Trucks that will reach destination within 20 min. (future query)
– Make my PalmPilot geographically “context aware”
Requirements for efficient indexing
“Time” is to be considered a first-class citizen
– At least equivalent to “space”
– Not just another dimension
Novel spatio-temporal operators should be supported
– e.g. “enters”, not just “overlap_during”
Classification
Indexing past locations
– All ‘history’ (i.e., the trajectory) of an object is known
• In discrete space (a trajectory is a set of snapshots)
• In continuous space (a trajectory is represented by a function)
Indexing current and future locations
– Current location, speed and heading of an object are known
Indexing past locations
Straightforward approach
– The 3D R-tree [Theodoridis et al., 1996]
Overlapping trees approach
– The HR-tree [Nascimento and Silva, 1998]
Trajectory-based indexing
– The STR-tree and the TB-tree [Pfoser et al., 2000]
The HR-tree
[Figure: an R-tree at t0 (root R1 over nodes A, B, C and entries 1 to 9) and an R-tree at t1 (root R2 over nodes A1, B, C, where entry 3 has become 3a); the HR-tree keeps both roots R1 and R2 and shares the unchanged nodes B and C between them]
The TB-tree
Hybrid R-tree structure + total trajectory preservation
– one leaf node contains segments of only one trajectory
– neglecting spatial discrimination with respect to the two spatial dimensions
– no node splitting
The TB-tree (cont.)
A set of leaf nodes, each containing a partial trajectory
+ organized in a tree hierarchy
+ leaf nodes connected by a linked list
Indexing current locations
Straightforward indexing
– Quadtree structure [Sistla et al., 1997]
Indexing in dual space
– From lines (primal space) to points (dual space)
– Problems of different complexity [Kollios et al., 1999]
• 1-dimensional problem: objects moving on a line
• 2-dimensional problem: objects moving on a plane
• 1.5-dimensional problem: objects moving on predefined routes on a plane
Indexing in dual space
A trajectory is a function y = vt + a
Problem: a trajectory is not bounded along the t-axis
Solution: duality
– A trajectory (line in primal space) is represented by a point (v, a) in dual space (Hough-X transformation)
[Figure: trajectories of objects o1 to o4 in the (time, y) plane with a query rectangle [t1q, t2q] × [y1q, y2q], and the corresponding query region in the dual (v, a) plane between vmin and vmax]
Indexing in dual space (cont.)
Methodology: Use a Point Access Method in the dual Hough-X space
– K-d-tree based methods could be more efficient than R-tree based techniques
[Figure: two partitionings of the dual (v, a) plane]
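The duality transform for the 1-dimensional problem can be sketched as follows. Each object moving as y = vt + a becomes the point (v, a), and the query “which objects are within [y1q, y2q] at some time in [t1q, t2q]” becomes a test on that point. The function names are hypothetical, and the test is applied to one point at a time; a real system would store the dual points in a Point Access Method rather than scan them.

```python
def to_dual(y0, t0, v):
    """Object observed at position y0 at time t0 with velocity v.
    Returns the dual point (v, a) of the line y = v*t + a."""
    return (v, y0 - v * t0)


def in_query(dual_point, t1q, t2q, y1q, y2q):
    """True if the object lies in [y1q, y2q] at some time in [t1q, t2q]."""
    v, a = dual_point
    y_start = a + v * t1q
    y_end = a + v * t2q
    # Motion is linear, so the positions visited during the interval
    # form the range between y_start and y_end.
    lo, hi = min(y_start, y_end), max(y_start, y_end)
    return lo <= y2q and hi >= y1q
```

For a fixed query, the two comparisons are linear inequalities in v and a, which is why the query region in the dual plane is bounded by straight lines.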
Part II (cont.)
– Spatial DB
– Moving Objects (or spatio-temporal) DB
→ Image and Multimedia DB
Image and Multimedia data
Image DB:
– satellite photos, medical records
Video DB:
– collections of frames
Time-series DB:
– financial data, ECG signals
Similarity retrieval
Spatial similarity:
– Retrieval by browsing:
• A browser is used to explore images
– Retrieval by semantic attributes:
• Queries are formulated by using basic image attributes
– Retrieval by spatial similarity:
• Similar configurations of objects wrt a given image (or sketch) are requested
Similarity retrieval (cont.)
Visual similarity
– Content-based retrieval
– “content” could be color, shape, texture, etc.
Methodology
– Feature extraction (e.g. 16 colors)
– High-dimensional vectors
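The methodology above can be sketched with a 16-bin color histogram as the feature vector. Images are assumed to be given as lists of palette indices 0 to 15, Euclidean distance is an assumed similarity measure, and the image names are made up; the library is scanned linearly here, whereas the point of the slides is to index such vectors.

```python
import math


def color_histogram(pixels, bins=16):
    """Normalized histogram over palette indices in range(bins)."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    total = sum(counts)
    return [c / total for c in counts]


def distance(f, g):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def most_similar(query, library):
    """Name of the library image whose feature vector is closest."""
    return min(library, key=lambda name: distance(query, library[name]))


library = {
    "sunset": color_histogram([0, 0, 0, 1]),
    "forest": color_histogram([5, 5, 5, 6]),
}
query = color_histogram([0, 0, 1, 1])
best = most_similar(query, library)
```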
Spatial similarity
Attribute Relational Graphs
– [Petrakis and Faloutsos, 1997]
[Figure: an ARG of a face]
Nodes:
V0: c = face, l = 100
V1: c = nose, l = 20
V2: c = right eye, l = 15
V3: c = left eye, l = 15
Edges:
r01: a = 270
r02: a = 130
r03: a = 50
r12: a = 120
r13: a = 60
r23: a = 0
ARGs and R-trees
Methodology
– images are mapped to ARGs
– ARG information is stored in spatial access methods (R-trees)
– exact matching or similarity queries are transformed to range queries in R-trees
Similarity retrieval
– transformed into a problem of comparing the query ARG with all data ARGs
Visual similarity
[Figure: images P1, P2, P3 and a query image Q are mapped to feature vectors F(P1), F(P2), F(P3) and F(Q)]
Visual similarity (cont.)
Problem: R-trees (and Quadtrees, etc.) are inefficient in high dimensionality
– Grow exponentially with the dimensionality
Formally: “the curse of dimensionality”
Several proposals:
– TV-tree [Lin et al., 1994]
– SS-tree [White & Jain, 1996]
– X-tree [Berchtold et al., 1996]
– M-tree [Ciaccia et al., 1997]
The X-tree
eXtended node tree
– Assume a fanout of 3
– Instead of splitting N9, it becomes a supernode
[Figure: root (N9, N10, N11) over directory nodes N9 (N1, N2, N3), N10 (N5, N6) and N11 (N7, N8), with the supernode N9’ also accommodating N4; data nodes hold points P1, P2, …. Legend: (normal) directory node, supernode, data node]
Epilogue of Part II
Dozens of proposals, one for each particular sub-problem
Not a “universal” structure
Multi-dimensional indexing techniques are the “next wave” in commercial DB indexing
Open issue:
– Using an index in high-dim. could be worse than linear scanning [Weber et al., 1998]
Part I: Introduction
Part II: Advanced Techniques
Part III: Theoretical Issues
Outline of Part III
Generalized indexing
– GiST
Theory of indexability
The future is multi-dimensional …(?)
Motivation
Dozens of indexing structures
With current technology, a DBMS cannot implement more than a couple of indexing techniques
Complementary approaches
– Generalized indexing
– Theory of indexability
Motivation (cont.)
Generalized indexing:
– Provide the database infrastructure to implement (relatively easily) indexing techniques
Theory of indexability:
– Study multi-dimensional indexing in the worst case (as is done in main memory). A worst-case solution should be as ubiquitous as the B-tree
The two approaches are complementary
Generalized indexing
GiST: the Generalized Search Tree
– [Hellerstein et al., 1995]
– Generalizes the R-tree (which in turn generalized the B-tree)
– Can be used to implement
• B+-trees
• R-trees
• KDB-trees
• etc.
GiST details
Keys and predicates
– Keys are user-defined types
– Predicates (describe sets of keys) are user-defined
– So, keys (called entries) can be points, lines, time intervals, sets, etc.
Methods on keys
– Consistent, union, penalty, picksplit
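To make the four key methods concrete, here is a minimal sketch of how they might look for one key type, closed time intervals (lo, hi). The signatures and the simple median split are assumptions chosen for illustration, not the interface of any particular GiST implementation; the tree code that would call these methods is omitted.

```python
def consistent(key, query):
    """May the subtree described by `key` contain entries overlapping
    `query`? A false answer lets the search prune that subtree."""
    return key[0] <= query[1] and query[0] <= key[1]


def union(keys):
    """Smallest interval covering all given keys (the parent predicate)."""
    return (min(k[0] for k in keys), max(k[1] for k in keys))


def penalty(key, new):
    """How much `key` must grow to also cover `new`; insertion follows
    the child with the smallest penalty."""
    lo, hi = union([key, new])
    return (hi - lo) - (key[1] - key[0])


def pick_split(keys):
    """Divide the entries of an overflowing node into two groups,
    here simply at the median of the sorted intervals."""
    ordered = sorted(keys)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```

Swapping in rectangle versions of the same four functions would give R-tree behaviour, which is the sense in which the structure is generalized.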
Theory of indexability
Workloads + Indexing Schemes = Indexability
– [Hellerstein et al., 1997]
– Trade-off: space overhead vs. worst-case I/O cost
– Simplify the theory, suppress search issues
– Concentrate on lower bound results
Theory of indexability (cont.)
Workloads
– Model a dataset as a set I
– Model a query as a subset Q of I
– Size of the problem: N = |I|
– Size of Q = |Q|
– Workload: W = (I, Q) for Q = {Q1, …, Qq}
Indexing schemes
– Block size B
– Block: a set of objects of size B
– Indexing scheme: set S of blocks
Theory of indexability (cont.)
Cost Model:
– Two parameters, r and A
– Storage redundancy r: measure of space overhead
• how many times each item in the data set is stored: r = B|S| / N
– Access overhead A: measure of I/O cost
• how many times more blocks than necessary does a query retrieve
• Any query Q is covered by at most A ( |Q|/B ) blocks, 1 ≤ A ≤ B
Generalizes to higher dimensions
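The two cost parameters can be computed by brute force on a tiny example. The sketch assumes that |Q|/B is rounded up to whole blocks, finds a minimum cover by exhaustive search (feasible only for toy inputs), and uses a made-up dataset of 8 items with block size 2; all names are hypothetical.

```python
from itertools import combinations
from math import ceil


def redundancy(blocks, n, b):
    """Storage redundancy r = B|S| / N."""
    return b * len(blocks) / n


def min_cover(query, blocks):
    """Fewest blocks whose union contains the query (exhaustive search)."""
    for k in range(1, len(blocks) + 1):
        for combo in combinations(blocks, k):
            if query <= set().union(*combo):
                return k
    raise ValueError("query is not covered by the indexing scheme")


def access_overhead(queries, blocks, b):
    """Smallest A such that every query Q is covered by at most
    A * ceil(|Q| / B) blocks."""
    return max(min_cover(q, blocks) / ceil(len(q) / b) for q in queries)


# I = {0, ..., 7}, so N = 8; block size B = 2.
scheme = [frozenset({0, 1}), frozenset({2, 3}),
          frozenset({4, 5}), frozenset({6, 7})]
workload = [frozenset({0, 1}), frozenset({1, 2})]

# Storing items 1 and 2 a second time, in one extra block, trades space
# for I/O: r rises while A falls.
scheme_extra = scheme + [frozenset({1, 2})]
```

On this example the extra block raises r from 1.0 to 1.25 and lowers A from 2.0 to 1.0, which is the space versus I/O trade-off the theory studies.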
The future is multi-dimensional
“Trees have grown everywhere”
– [Sellis et al., 1997]
B-tree is the choice in 1-dim. space
R-tree is OK in 2-dim. space but …
– Curse of dimensionality
– Theory of indexability
More work is required
Epilogue
Indexing is necessary for efficient database management
Novel applications require new techniques
– Or a single “universal” one?
Unfortunately …
– No standard benchmarks for advanced indexing problems
– Relatively little work on methodologies for index experimentation and customization
Epilogue (cont.)
Of all the data, only a small percentage is in applications. Data is in
– Applications
– Web sites, etc.
Exabytes* of information (!!!) out there
– Including text, images, video, sound [Lesk, 1997]
Goal of the future: index data outside databases
(*) 1 Exabyte = 10^9 Gbytes or 10^18 bytes
About
Presentation in
– http://dias.cti.gr/~ytheod/research/ADBIS
Material based on the book “Advanced Database Indexing”
– Co-authors: Yannis Manolopoulos and Vassilis J. Tsotras
– Published by Kluwer Academic, 1999
Thank you !