CS4432: Database Systems II
Record and Page Formats
Chapter 12
CS 4432
1
Overview
Data Items
Records
Blocks
Files
Memory
What are the data items we want to store?
• a salary
• a name
• a picture
What we have available: bytes (1 byte = 8 bits)
To represent:
• Integer (short): 2 bytes
e.g., 35 is 00000000 00100011
To represent:
• Boolean
e.g., TRUE: 1111 1111
FALSE: 0000 0000
• Enumeration types:
e.g., RED -> 1, GREEN -> 3, BLUE -> 2, YELLOW -> 4, …
Can we use less than 1 byte/code?
Yes, but only if desperate...
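The byte encodings on the last two slides can be sketched in a few lines of Python. This is only an illustration: the big-endian byte order, the helper names (`encode_short`, `encode_bool`, `encode_color`, `to_bits`) and the one-byte width for enumeration codes are assumptions, not something the slides prescribe.

```python
import struct
from enum import IntEnum


class Color(IntEnum):
    """Enumeration codes taken from the slide (RED 1, GREEN 3, BLUE 2, YELLOW 4)."""
    RED = 1
    BLUE = 2
    GREEN = 3
    YELLOW = 4


def encode_short(value):
    # Assumption: 2-byte signed integer, most significant byte first.
    return struct.pack(">h", value)


def encode_bool(flag):
    # TRUE is all ones, FALSE is all zeros, one byte each.
    return b"\xff" if flag else b"\x00"


def encode_color(color):
    # Assumption: one byte per enumeration code.
    return struct.pack("B", int(color))


def to_bits(data):
    # Render bytes as groups of 8 bits, as drawn on the slides.
    return " ".join(f"{byte:08b}" for byte in data)


print(to_bits(encode_short(35)))
```

Packing several codes into less than one byte is possible with bit masks, at the cost of extra shifting on every access, which is why the slide suggests it only when space is very tight.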
To represent:
• Characters
– Various coding schemes suggested (e.g., ASCII)
Example:
A: 1000001
a: 1100001
5: 0110101
LF: 0001010
To represent: String of characters
– Null terminated
e.g., c a t <null>
– Length given
e.g., 3 c a t
– Fixed length
e.g., in Oracle, the string length is declared with the column:
name CHAR(20),
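The three string layouts can be compared side by side with a small Python sketch. The function names, the ASCII encoding, the one-byte length prefix and the blank padding for the fixed-length case are illustrative assumptions.

```python
def null_terminated(text):
    # Characters followed by a single zero byte.
    return text.encode("ascii") + b"\x00"


def length_prefixed(text):
    # Assumption: a one-byte length, so at most 255 characters.
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def fixed_length(text, width):
    # Pad with blanks up to the declared width, as for a CHAR(width) column.
    data = text.encode("ascii")
    if len(data) > width:
        raise ValueError("string longer than declared width")
    return data.ljust(width, b" ")


print(null_terminated("cat"))
print(length_prefixed("cat"))
print(fixed_length("cat", 20))
```

The length-prefixed form lets a reader skip over the string without scanning it, which is one reason variable-length items usually carry their length at the beginning.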
Key Points
• Fixed length items
• Variable length items
- usually length given at beginning
• Type of an item :
- tells us how to interpret (plus size if fixed)
Overview
Data Items
Records
Blocks
Files
Memory
Record - Collection of related data
items (called FIELDS)
E.g.: Employee record:
name CHAR(20),
salary NUMBER,
date-of-hire DATE,
...
Types of records:
• Main choices:
– FIXED vs VARIABLE FORMAT
– FIXED vs VARIABLE LENGTH
Fixed format
A SCHEMA contains information such as:
- # fields (attributes)
- type of each field (length)
- order of attributes in record
- meaning of each field (domain)
- constraints (primary key, etc).
The schema is stored once; it is not repeated in each record.
Example: fixed format & fixed length
Employee record
Schema:
(1) E.id, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code
Records:
55 | s m i t h | 02
83 | j o n e s | 01
We can simply concatenate fields.
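A minimal Python sketch of this fixed format, fixed length record: the schema lives in one `struct` format string, and each record is just the concatenated fields. The big-endian layout, the blank padding of the name and the helper names are assumptions made for illustration.

```python
import struct

# Schema, kept outside the records: 2-byte integer, 10 characters, 2-byte code.
EMPLOYEE = struct.Struct(">h10sh")


def pack_employee(emp_id, name, dept):
    # Pad the name with blanks so every record has the same length.
    return EMPLOYEE.pack(emp_id, name.encode("ascii").ljust(10), dept)


def unpack_employee(buf):
    emp_id, raw_name, dept = EMPLOYEE.unpack(buf)
    return emp_id, raw_name.decode("ascii").rstrip(), dept


records = pack_employee(55, "smith", 2) + pack_employee(83, "jones", 1)

# Because records are fixed length, record i starts at i * EMPLOYEE.size.
second = records[EMPLOYEE.size:2 * EMPLOYEE.size]
print(EMPLOYEE.size, unpack_employee(second))
```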
Variable format
• What :
– Not all fields are included in the record,
– and/or, fields possibly in different orders.
• Then :
– Record itself must contain its format,
i.e., it is “self-describing”
Why Variable Format ?
• “sparse” records
• repeating fields
• evolving formats
Example: variable format and length
Record: 2 | 5 | I | 46 | 4 | S | 4 | F O R D
2: # fields
5: code identifying field as E#
I: integer type
46: value of E#
4: code for Ename
S: string type
4: length of str.
F O R D: value of Ename
Field name codes could also be strings, i.e., TAGS
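The self-describing record above can be sketched in Python as follows. The exact byte widths (one byte for the field count, field code and string length, two bytes for an integer value) and the function names are assumptions chosen to mirror the slide, not a fixed standard.

```python
import struct


def encode_record(fields):
    """fields is a list of (field_code, value); value is an int or a str."""
    out = bytearray([len(fields)])            # number of fields
    for code, value in fields:
        out.append(code)                      # code identifying the field
        if isinstance(value, int):
            out += b"I" + struct.pack(">h", value)
        else:
            data = value.encode("ascii")
            out += b"S" + bytes([len(data)]) + data
    return bytes(out)


def decode_record(buf):
    count, pos, fields = buf[0], 1, []
    for _ in range(count):
        code, kind = buf[pos], buf[pos + 1:pos + 2]
        pos += 2
        if kind == b"I":
            (value,) = struct.unpack_from(">h", buf, pos)
            pos += 2
        elif kind == b"S":
            length = buf[pos]
            value = buf[pos + 1:pos + 1 + length].decode("ascii")
            pos += 1 + length
        else:
            raise ValueError("unknown type tag")
        fields.append((code, value))
    return fields


record = encode_record([(5, 46), (4, "FORD")])
print(decode_record(record))
```

The decoder needs no schema for the field order or count; it still has to know what the field codes 5 and 4 mean, which is why tags (strings) are sometimes used instead.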
• EXAMPLE:
variable format record with repeating fields
e.g., Employee has one or more children
3 | E_name: Fred | Child: Sally | Child: Tom
• Do repeating fields always require
variable format and length?
Repeating fields with fixed format & length
• Then allocate maximum number of repeating fields
• If not used, set to null
Example : a person and her hobbies.
Mary | Sailing | Chess | - (unused slot, set to null)
Many variants between
fixed - variable format:
Example1: Include record type in record
5 | 27 | ....
5: record type, tells me what to expect (i.e., points to schema)
27: record length
Record header - data at beginning
that describes record
May contain:
- pointer to schema (record type)
- length of record
- time stamp (create time, mod. time)
- other stuff (e.g., ROW-ID in Oracle)
Example2: Variant between FIXED/VAR format
• Hybrid format: one part is fixed, other is variable
E.g.: All employees have E#, name, dept; and other fields vary.
25 | Smith | Toy | 2 | Hobby:chess | retired
2: # of var fields
Also, many variations in internal
organization of record
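Just to show one:
(a) Length stored in front of each field:
3 | 10 | F1 | 5 | F2 | 12 | F3
3: # fields; 10, 5, 12: length of field
(b) Offsets stored in the record header:
3 | 32 | 5 | 15 | 20 | F1 | F2 | F3
header positions 0 to 4: # fields (3), total size (32), offsets (5, 15, 20)
F1 starts at offset 5, F2 at 15, F3 at 20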
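The offset-based organization can be sketched as below, using the same numbers as the slide (three fields of lengths 10, 5 and 12). One byte per header entry is an assumption that keeps the header at 5 bytes, and `build_record` and `get_field` are hypothetical helper names.

```python
def build_record(fields):
    """Header: # fields, total size, one offset per field (one byte each)."""
    count = len(fields)
    header_size = 2 + count
    offsets, pos = [], header_size
    for data in fields:
        offsets.append(pos)
        pos += len(data)
    total_size = pos
    if total_size > 255:
        raise ValueError("record too large for one-byte header entries")
    return bytes([count, total_size, *offsets]) + b"".join(fields)


def get_field(record, index):
    # Jump straight to a field using the offsets in the header.
    count, total_size = record[0], record[1]
    start = record[2 + index]
    end = record[3 + index] if index + 1 < count else total_size
    return record[start:end]


rec = build_record([b"a" * 10, b"b" * 5, b"c" * 12])
print(list(rec[:5]))
```

With offsets in the header, reading F3 does not require scanning F1 and F2 first, which is the main advantage over storing a length in front of each field.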
Question:
We have seen examples for :
* Fixed format and length records
* Variable format and length records
(a) Does fixed format and variable length
make sense?
(b) Does variable format and fixed length
make sense?
Next:
Data Items
Records
Blocks
Files
Memory
Goal: placing records into blocks
records are placed into blocks; the blocks make up a file
• assume fixed length blocks
• assume a single file (for now)
Options for storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) mixed record types – clustering
(4) split records
(5) sequencing
(6) indirection
(1) Separating records
Block: R1 | R2 | R3
(a) no need to separate if fixed size records.
(b) or, use special marker
(c) or, give record lengths (or offsets)
- within each record
- in block header
(2) Spanned vs. Unspanned
• Unspanned: each record lies within one block
block 1: R1 | R2
block 2: R3 | R4 | R5
...
• Spanned: a record may wrap across 2 blocks
block 1: R1 | R2 | R3 (a)
block 2: R3 (b) | R4 | R5 | R6 | R7 (a)
...
Spanned vs. unspanned:
• Unspanned is much simpler, but may
waste space…
• Spanned essential if
record size > block size
Example
10^6 records, each of size 2,050 bytes (fixed)
block size = 4096 bytes
block 1: R1 (2050 bytes) | wasted 2046
block 2: R2 (2050 bytes) | wasted 2046
• Utilization ≈ 50% -> ½ of space is wasted
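The arithmetic behind this example can be checked with a short Python sketch. It assumes the slide means 10^6 records and ignores block headers and the small per-fragment bookkeeping that spanned records need in practice; the function names are illustrative.

```python
def unspanned_blocks(num_records, record_size, block_size):
    # Each block holds only whole records.
    per_block = block_size // record_size
    if per_block == 0:
        raise ValueError("record larger than block: spanning is required")
    return -(-num_records // per_block)       # ceiling division


def spanned_blocks(num_records, record_size, block_size):
    # Records are laid end to end and may cross block boundaries.
    return -(-(num_records * record_size) // block_size)


def utilization(num_records, record_size, block_size, blocks):
    return (num_records * record_size) / (blocks * block_size)


N, R, B = 10**6, 2050, 4096
unspanned = unspanned_blocks(N, R, B)
spanned = spanned_blocks(N, R, B)
print(unspanned, utilization(N, R, B, unspanned))
print(spanned, utilization(N, R, B, spanned))
```

Under these assumptions the unspanned layout needs one block per record, about twice as many blocks as the spanned layout.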
(3) Mixed versus uniform record types
• Mixed - records of different types
(e.g., EMPLOYEE, DEPT)
allowed in same block
e.g., a block: EMP e1 | DEPT d1 | DEPT d2
Why do we want to mix?
• Answer: CLUSTERING
Records that are frequently
accessed together should be
placed into the same block
• Problems
Creates variable length records in block
Aim to avoid duplicates (how to cluster?)
Insert/deletes are harder
Example Clustering
Q1: select C_NAME, C_CITY, AMOUNT, …
from DEPOSIT, CUSTOMER
where DEPOSIT.C_NAME = CUSTOMER.C_NAME
a block layout:
CUSTOMER,NAME=SMITH
DEPOSIT,NAME=SMITH
DEPOSIT,NAME=SMITH
CUSTOMER,NAME=JONES
Question: Good idea or bad idea?
• If Q1 is frequent, with its join on the customer and
deposit relations, then clustering is good
• But if instead Q2 frequent with :
Q2: SELECT *
FROM CUSTOMER
then clustering is counter-productive
Compromise:
No mixing, but keep related
records in same cylinder ...
So Far: Storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
(4) Split records
Typically for hybrid format:
• Fixed part in one block
• Variable part in another block
Block with fixed recs.: R1 (a) | R2 (a)
Block with variable recs.: R1 (b) | R2 (b) | R2 (c)
(5) Sequencing
• Ordering records in file (and block) by some
key value
– Sequential file (i.e., sequenced file)
• Why sequencing ?
– Typically to make it possible to efficiently read
records in order
Sequencing Options
(a) Next record physically contiguous
... | R1 | Next (R1) | ...
(b) Linked
R1 -> Next (R1) (R1 holds a pointer to its successor)
What about INSERT/DELETE?
Sequencing Options
(c) Overflow area
Records in sequence: header | R1 | R2 | R3 | R4 | R5
Overflow area: R2.1 | R1.3 | R4.7
(6) Indirection Addressing
• How does one refer to records?
Rx
• Problem: Records can be on disk or in
(virtual) memory. Need common address, but
have different physical locations.
Many options, ranging from Physical to Indirect
Purely Physical Addressing
E.g., Record Address (ID) = Block ID + Offset in block
Block ID:
- Device ID
- Cylinder #
- Track #
- Block #
Fully Indirect Addressing
Solution: Record ID (Oracle: ROWID) as
global address, maintain a map table.
Map Table
Rec ID | Physical addr.
r | a
Rec ID r is looked up in the map table to obtain physical address a.
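A map table can be sketched as a dictionary from record ID to physical address. The `MapTable` class, its method names and the `(block, offset)` tuples used as physical addresses are illustrative assumptions.

```python
class MapTable:
    """Fully indirect addressing: everything outside refers to record IDs only."""

    def __init__(self):
        self._location = {}     # record ID -> physical address
        self._next_id = 1

    def insert(self, physical_addr):
        rec_id = self._next_id
        self._next_id += 1
        self._location[rec_id] = physical_addr
        return rec_id

    def lookup(self, rec_id):
        # The extra lookup is the cost of indirection.
        return self._location[rec_id]

    def move(self, rec_id, new_addr):
        # Moving a record only touches the map; record IDs held elsewhere stay valid.
        if rec_id not in self._location:
            raise KeyError(rec_id)
        self._location[rec_id] = new_addr


table = MapTable()
r = table.insert((7, 120))      # block 7, offset 120
table.move(r, (9, 0))
print(r, table.lookup(r))
```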
Tradeoff
Physical <-> Indirect
• Flexibility to move records (for deletions, insertions)
vs.
• Cost of indirection (lookup)
What to do: options in between?
Ex #1 : Indirection in block
A block: Block Header | Free space | R3 | R4 | R1 | R2
(the block header holds the offsets of the records within the block)
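Indirection inside a block is often implemented as a slot directory in the block header. The sketch below is a simplified model: the directory is kept as a Python list rather than serialized into the block, the space it would occupy is not accounted for, and the class and method names are hypothetical.

```python
class SlottedBlock:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []            # header: slot number -> (offset, length) or None
        self.free_end = size       # records grow from the end of the block

    def insert(self, record):
        start = self.free_end - len(record)
        if start < 0:
            raise ValueError("block full")
        self.data[start:self.free_end] = record
        self.free_end = start
        self.slots.append((start, len(record)))
        return len(self.slots) - 1          # slot number is the record's address

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            raise KeyError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None

    def compact(self):
        # Slide live records together; only header offsets change, not slot numbers.
        live = [(i, self.read(i)) for i, s in enumerate(self.slots) if s is not None]
        end = len(self.data)
        for i, rec in live:
            start = end - len(rec)
            self.data[start:end] = rec
            self.slots[i] = (start, len(rec))
            end = start
        self.free_end = end


block = SlottedBlock(64)
s1 = block.insert(b"aaaa")
s2 = block.insert(b"bb")
s3 = block.insert(b"cccccc")
block.delete(s1)
block.compact()
print(block.read(s2), block.read(s3), block.free_end)
```

Outside references use (block, slot number), so records can be moved within the block without updating any pointer stored elsewhere.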
Ex. #2: Use logical block #’s understood by the file system
instead of direct disk access
REC ID = File ID, Block #, Record # or Offset
File ID, Block # -> File System Map -> Physical Block ID
Recap: Storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
Other Topics in Chapter 12
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes
Deletion
Block
Rx
Options:
(a) Delete and immediately reclaim space
(b) Mark deleted
– May need chain of deleted records
(for re-use)
– Need a way to mark:
• special characters
• delete field
• in map
As usual, many tradeoffs...
• How expensive is it to move valid records
into free space for immediate reclaim?
• How much space is wasted?
– e.g., deleted records, delete fields, free
space chains,...
Concern with deletions
Dangling pointers
R1
?
Note: If pointers point to physical locations
(rather than ROWIDs), storing new data in the space of a
deleted record makes those pointers refer to the wrong data.
Solution #1: Do not worry
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
• Physical IDs
A block: tombstone (this space never re-used) | rest of deleted record (this space can be re-used)
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
• Logical IDs
map:
ID | LOC
7788 | (tombstone)
Never reuse ID 7788 nor space in map...
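Tombstones with logical IDs can be sketched as a map whose entry for a deleted record is overwritten with a marker instead of being removed. The `RecordMap` class, the `TOMBSTONE` sentinel and the starting ID are assumptions for illustration.

```python
TOMBSTONE = object()    # marker left in the map for a deleted record


class RecordMap:
    def __init__(self, first_id=7788):
        self._loc = {}
        self._next_id = first_id    # IDs only ever increase, so none is reused

    def insert(self, location):
        rec_id = self._next_id
        self._next_id += 1
        self._loc[rec_id] = location
        return rec_id

    def delete(self, rec_id):
        # Keep the map entry forever; the record's own space may be re-used.
        self._loc[rec_id] = TOMBSTONE

    def lookup(self, rec_id):
        location = self._loc[rec_id]
        return None if location is TOMBSTONE else location


rmap = RecordMap()
old = rmap.insert((3, 40))
rmap.delete(old)
new = rmap.insert((3, 40))      # re-uses the space, but under a fresh ID
print(rmap.lookup(old), rmap.lookup(new))
```

A stale pointer that still holds the old ID resolves to the tombstone rather than to the new record. The price is that the map can only grow.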
Solution #3 (?):
• Place record ID within every record
• When you follow a pointer, check if it
leads to correct record
pointer to 3-77 -> record with rec-id: 3-77
Does this work???
If space reused, won’t new record have same ID?
Insert
Easy case: records fixed length / not in sequence
– Insert new record at end of file
– or, in deleted slot
A little harder:
– If records are variable size, not as easy
– may not be able to reuse space (fragmentation)
Hard case: records in sequence
– If free space “close by”, not too bad...
– Or, use overflow idea...
– Or worst case, reorganize file ...
Interesting problems:
• How much free space
to leave in each block,
track, cylinder?
• How often do I
reorganize file +
overflow?
Buffer Management
• DB features needed
• Why LRU may be bad
• Pinned blocks
• Forced output
• Double buffering
• Swizzling
Read Textbook!
Pointer Swizzling
Issue: If records (objects) contain pointers to other objects,
those pointers must be translated when the objects are loaded into memory.
Disk: block 1, block 2 (Rec A)
Memory: block 1, block 2 (copy of Rec A)
One Option:
Translation Table
DB Addr | Mem Addr
Rec-A | Rec-A-inMem
Solution: Insert fields that represent pointers into map table.
Translate pointers as needed.
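The translation-table option can be sketched with a dictionary from database address to the in-memory object. Records are modelled as Python dicts and database addresses as `(block, offset)` tuples; both choices, and the function names `swizzle` and `unswizzle`, are assumptions made for this illustration.

```python
def swizzle(record, pointer_fields, table):
    """Replace DB addresses by in-memory references where the target is loaded."""
    for field in pointer_fields:
        value = record[field]
        if isinstance(value, tuple) and value in table:
            record[field] = table[value]


def unswizzle(record, pointer_fields, table):
    """Before writing back to disk, turn memory references into DB addresses again."""
    mem_to_db = {id(obj): addr for addr, obj in table.items()}
    for field in pointer_fields:
        target = record[field]
        if id(target) in mem_to_db:
            record[field] = mem_to_db[id(target)]


rec_a = {"name": "A"}
rec_b = {"name": "B", "next": (1, 0)}       # points to Rec A by DB address
table = {(1, 0): rec_a}                     # DB Addr -> Mem Addr

swizzle(rec_b, ["next"], table)
followed = rec_b["next"]                    # now a direct in-memory reference
unswizzle(rec_b, ["next"], table)
print(followed is rec_a, rec_b["next"])
```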
Another Option:
In-memory pointers need a “type” bit:
points to disk, or points to memory (marked M)
Swizzling Issues
• Must ‘unswizzle’
• Updating/writing of records
Swizzling Options
• Automatic
• On-demand
• No swizzling / program control
Comparison
• There are 1,000,001 ways to organize
my data on disk…
Which is right for me?
Issues:
Flexibility
Space Utilization
Complexity
Performance
To evaluate a given strategy,
compute the following parameters:
-> space used for expected data
- on average
-> expected time to :
- fetch record given key
- fetch record with next key
- insert/delete/update record
- read complete file
- reorganize file (maybe sort)
-> usage patterns / workload:
- how many/which user queries/updates
NEXT
Chapter 13 in book
How to find a record quickly,
given a key