Physical Database and Index Structures Databases He Tan

advertisement
He Tan
hetan@ida.liu.se
IISLAB
IDA
Databases
Physical Database and
Index Structures
Real world
Queries
Model
Databases
DBMS
He Tan
Answers
Processing of queries and
updates
Access to stored data
Physical
database
september 2008
1
1
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
What you will learn
Storage hierarchy
• How data is stored on disk
CPU
• How to speed up accessing of data by using indexes
september 2008
3
3
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
Primary storage
• Disk
Secondary storage
(fast, , accessible by CPU
small, expensive, volatile)
(slow, inaccessible by CPU
big, cheap, permanent, )
4
4
He Tan
hetan@ida.liu.se
IISLAB
IDA
Disk memory
• Cache memory
• Main memory
block
Disk
track
•
A track is divided into equal-sized blocks during disk
formatting (or initialization).
•
arm
actuator
read/write head
spindle
disk rotation
Block is the unit of transfer of data between disk and main
memory, e.g.
ƒ
ƒ
Read = copy block from disk to buffer in primary memory.
Write = the opposite way.
ƒ
R/w time = seek time + rotational delay + block transfer time.
cylinder
locate block
1-2 msec.
9-60 msec.
actuator
movement
september 2008
5
5
september 2008
6
6
1
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
Block transfer
A block is the unit of transfer of data between secondary memory and
primary memory
Physical database
• How database is stored on disk ?
ƒ How records are placed in disk
Approximately 1000 to 100000 times more expensive to perform one
disk operation than one memory operation
ƒ How records are placed in file
september 2008
•
Processor instruction
•
Memory fetch
•
Disk fetch
ƒ How file blocks are placed to disk blocks
7
7
He Tan
hetan@ida.liu.se
IISLAB
IDA
Data stored in files.
•
Records are allocated to blocks
A file (table) is a sequence of records
•
Suppose that
•
A record (tuple) is a collection of field values
•
Each field (attribute) has an elementary data type
•
A file contains records of the same or different length
•
A file contains records of the different length
ƒ r is the number of records in the file
ƒ B is the size in bytes of the block
ƒ R is the size in bytes of the record (fixed-length)
Records have same type, but fields have variable size.
Records have same type, but fields have multiple values.
Records have same type, but fields are optional.
Records have different types.
9
september 2008
•
Blocking factor:
B
bfr =  
R
•
Blocks needed:
 r 
b = 

 bfr 
10
10
He Tan
hetan@ida.liu.se
IISLAB
IDA
Spanned/Unspanned records
•
File and record storage
Wasted space per block = B – bfr * R.
•
File with 1 000 000 records (r)
Solution: Spanned records.
•
Block size (B) of 4 096 bytes
•
Record size (R) is 100 bytes (fixed-length, unspanned)
•
Blocking factor (bf) is
•
Blocks needed (b) is
wasted space
Unspanned
Unspanned
block i
wasted
record 1
record 2
block i+1
wasted
record 3
record 5
block i
Spanned
Spanned
11
File and record storage
•
•
september 2008
8
8
•
9
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
He Tan
hetan@ida.liu.se
IISLAB
IDA
Files and records
ƒ
ƒ
ƒ
ƒ
september 2008
10 −9 s
10 −8 s
10 −3 s
block i+1
record 1
record 3
record 2
record 4
record 3 p
 B   4096 
 R  =  100  = 40

  
 r  1000000 
 bfr  =  40  = 25000

  
record 5
11
september 2008
12
12
2
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
File organization
•
How records are placed in the file?
•
Heap files
•
Sorted files
•
Hash files
Heap files
Records are added to the end of the file. Hence,
ƒ Cheap record addition.
ƒ Expensive record retrieval, removal and update, since they imply linear
search:
b


• Average case:  2  block accesses.
• Worst case: b block accesses (if it doesn't exist or several exist).
ƒ Moreover, record removal implies waste of space. So, periodic
reorganization.
september 2008
13
13
He Tan
hetan@ida.liu.se
IISLAB
IDA
14
14
He Tan
hetan@ida.liu.se
IISLAB
IDA
Sorted files
•
september 2008
Hash files
Records ordered according to some field. Hence,
ƒ Cheap record retrieval based on the ordering field:
• All the records: access blocks sequentially.
• Next record: probably in the same block.
• Random record: binary search, then worst case implies
•
log2 b
The hash function applies to the hash field and returns the
position of the record in the array of records. E.g.
position = field mod n
block accesses.
ƒ Expensive record addition and deletion.
• Addition: a temporary unordered file (overflow file) + periodic reorganization.
• Deletion: deletion markers + periodic reorganization.
•
•
Collision: different field values hash to the same position.
ƒ
Solutions:
•
•
•
Is record updating cheap or expensive ?
Open addressing: Check subsequent positions until unused position is found.
Multiple hashing: Use a second hash function.
Chaining: Put the record in the overflow area and link it.
ƒ The search condition to locate the record.
ƒ The field to be modified.
september 2008
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
15
15
17
Contiguous allocation
•
Linked allocation
•
Linked clusters allocation
•
Indexed allocation
16
16
He Tan
hetan@ida.liu.se
IISLAB
IDA
From file blocks to disk blocks
•
september 2008
What you will learn
• How data is stored on disk
• How to speed up accessing of data by using indexes
17
september 2008
18
18
3
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
Indexing
Primary Index
•
one or more index files,
•
The data file is physically ordered on a key feild on the disk.
•
stores the physical addresses of records.
•
All data records are stored sequentially on a disk.
•
single-level ordered indexes
•
An ordered index file with two fields, an indexing field and a
block pointer.
ƒ Primary index
ƒ Clustering index
ƒ Secondary index
september 2008
ƒ The indexing field is specified on the key field on which the data
file is ordered.
19
19
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
He Tan
hetan@ida.liu.se
IISLAB
IDA
Primary Index
20
20
Blocks needed for primary index?
block
block anchor
•
File with 1 000 000 records (r)
•
Block size (B) of 4 096 bytes
•
Record size (R) 100 bytes (fixed, unspanned)
•
Index size (Ri) is 10 bytes (fixed , unspanned)
select * from EMPLOYEE where NAME = ’Wood, Donald’;
september 2008
21
21
He Tan
hetan@ida.liu.se
IISLAB
IDA
•
The data file is physically ordered on a nonkey field on the disk.
All data records sequentially are stored on a disk
•
An ordered index file with two fields, an indexing field and a
block pointer.
22
22
He Tan
hetan@ida.liu.se
IISLAB
IDA
Clustering Index
•
september 2008
Clustering index
block
ƒ The indexing field is specified on the nonkey field, on which the data file is
ordered.
september 2008
23
23
september 2008
24
24
4
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
Clustering index
Blocks needed for clustering index
•
File with 1000 000 records
•
Block size (B) of 4096
•
Record size 100 bytes (fix, unspanned)
•
Index size is 10 bytes (unspanned)
•
100 records of each value in average
select * from EMPLOYEE where DEPT = ’5’;
september 2008
25
25
He Tan
hetan@ida.liu.se
IISLAB
IDA
An ordered file with two fields, an indexing field and a block
pointer.
•
The data are not physically ordered on the indexing field
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
27
27
29
Secondary Index Example
•
File with 1000 000 elements (r)
•
Block size (B) of 4096
•
Record size (R) 100
•
Each index entry 10 bytes
Secondary index
secondary index
on a nonordering
key field
ƒ The indexing field is specified on a non-ordering field of the data file
september 2008
26
26
He Tan
hetan@ida.liu.se
IISLAB
IDA
Secondary Index
•
september 2008
september 2008
28
28
Secondary Index
secondary index
on a nonordering
nonkey field
29
5
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
Secondary Index Example
Terminology
•
File with 1000 000 elements (r)
•
Dense index: All records in the file occurs in the index
•
Block size (B) of 4096
•
•
Record size (R) 100
Sparse (nondense) index: Only some of the records occur in the
index.
•
Each index entry, R, is then 10 bytes
•
100 values that are identical (average)
•
Are a primary index/a clustering index/a secondary index dense
or sparse?
•
Option 2/ 3
31
31
He Tan
hetan@ida.liu.se
IISLAB
IDA
He Tan
hetan@ida.liu.se
IISLAB
IDA
32
32
He Tan
hetan@ida.liu.se
IISLAB
IDA
Summary: Main index types
•
september 2008
Summary: Properties of indexes
Primary index
ƒ Index the key field
ƒ File ordered by the key field
•
Clustering index
ƒ Index a nonkey field
ƒ File ordered by that nonkey field
•
Secondary index
ƒ Index on a field that is not used for file ordering.
september 2008
He Tan
hetan@ida.liu.se
IISLAB
IDA
september 2008
33
33
35
september 2008
He Tan
hetan@ida.liu.se
IISLAB
IDA
Example
34
34
Example
Assume a file with 5 000 students. There is in average 50 students
with the same family name. Each record is 200 bytes. Family
name takes 20 bytes and personal number 10. The block size is
4096 and a block pointer is 5 bytes.
Assume a file with 5 000 students. There is in average 50 students
with the same family name. Each record is 200 bytes. Family
name takes 20 bytes and personal number 10. The block size is
4096 and a block pointer is 5 bytes
The file is sorted on personal number.
The file is sorted on family name
•
An index on personal number (files sorted on personal number)
•
An index on personal number (files sorted on personal number)
•
An index on family name
•
An index on family name
35
september 2008
36
36
6
Download