Storage : Records and Files

advertisement
Records and Files
Storage Technology: Topic 3
Introduction to Database Systems
1
Record Formats: Fixed Length
F1
F2
F3
F4
L1
L2
L3
L4
Base address (B)
Address = B+L1+L2
Information about field types stored in
system catalogs.
 Direct access to i’th field.

Introduction to Database Systems
2
Record Formats: Variable Length

Two alternative formats (# fields is fixed):
F1
4
Field
Count
F2
$
F3
$
F4
$
$
Fields Delimited by Special Symbols
F1
F2
F3
F4
Array of Field Offsets
 Second offers direct access to i’th field, efficient storage
of nulls (special don’t know value); small directory overhead.
Introduction to Database Systems
3
Files of Records
Page or block is OK when doing I/O, but
higher levels of DBMS operate on records, and
files of records.
 FILE: a collection of pages, each containing a
collection of records. Must support:

– insert/delete/modify record
– read a particular record (specified using record id)
– scan all records (possibly with some conditions on
the records to be retrieved)
Introduction to Database Systems
4
Page Formats: Fixed Length Records
Slot 1
Slot 2
Slot 1
Slot 2
Free
Space
...
Slot N
...
Slot N
Slot M
N
PACKED
1 . . . 0 1 1M
number
of records
M ... 3 2 1
UNPACKED, BITMAP
number
of slots
 Record
id = <page id, slot #>. In first
alternative, moving records for free space
management changes rid; may not be acceptable.
Introduction to Database Systems
5
Page Formats: Variable Length Records
Rid = (i,N)
Page i
Rid = (i,2)
Rid = (i,1)
20
N
...
16
2
SLOT DIRECTORY
24
1
Pointer
to start
of free
space
 Can
move records on page without changing rid;
so, attractive for fixed-length records too.
Introduction to Database Systems
6
Unordered (Heap) Files
Simplest file structure contains records in no
particular order.
 As file grows and shrinks, disk pages are
allocated and de-allocated.
 To support record level operations, we must:

– keep track of the pages in a file
– keep track of free space on pages
– keep track of the records on a page

There are many alternatives for keeping track
of this.
Introduction to Database Systems
7
Heap File Implemented as a List
Data
Page
Data
Page
Data
Page
Pages with
Free Space
Header
Page
Data
Page

Data
Page
Data
Page
Full Pages
Each page contains 2 `pointers’ plus data.
Introduction to Database Systems
8
Heap File Using a Page Directory
Data
Page 1
Header
Page
Data
Page 2
DIRECTORY
Data
Page N
The entry for a page can include the number
of free bytes on the page.
 The directory is a collection of pages; linked
list implementation is just one alternative.

Introduction to Database Systems
9
Indexes

A Heap file allows us to retrieve records:
– by specifying the rid, or
– by scanning all records sequentially

Sometimes, we want to retrieve records by
specifying the values in one or more fields, e.g.,
– Find all students in the “CS” department
– Find all students with a gpa > 3

Indexes enable us to answer value-based
(associative) queries efficiently.
Introduction to Database Systems
10
Alternative File Organizations
Many alternatives exist, each ideal for some
situation , and not so good in others:
– Heap files: Suitable when typical access is a file
scan retrieving all records.
– Sorted Files: Best if records must be retrieved in
some order, or only a `range’ of records is needed.
– Hashed Files: Good for equality selections.
 File is a collection of buckets. Bucket = primary
page plus zero or more overflow pages.
 Hashing function h: h(r) = bucket in which
record r belongs. h looks at only some of the
fields of r, called the search fields.
Introduction to Database Systems
11
Cost Model for Analysis
Ignore CPU costs, for simplicity:
–
–
–
–
B: The number of data pages
R: Number of records per page
D: (Average) time to read or write disk page
Measuring number of page I/O’s ignores gains of
sequential I/O; thus, even I/O cost is only
approximated.
– Average-case analysis; based on several simplistic
assumptions.
 Good enough to show the overall trends!
Introduction to Database Systems
12
Cost of Operations
Scan all recs
Heap
File
BD
Sorted
File
BD
Hashed
File
1.25 BD
Equality Search
0.5 BD
D log2B
D
Range Search
BD
Insert
2D
D (log2B + # of 1.25 BD
pages with
matches)
2D
Search + BD
Delete
Search + D Search + BD
2D
 Several assumptions underlie these (rough) estimates!
Introduction to Database Systems
13
Assumptions
Single record insert and delete.
 Heap Files:

– Equality selection on key; exactly one match.
– Insert always at end of file.

Sorted Files:
– Files compacted after deletions.
– Selections on sort field(s).

Hashed Files:
– No overflow buckets, 80% page occupancy.
Introduction to Database Systems
14
Summary
Variable length record format with field offset
directory offers support for direct access to
i’th field and null values.
 Slotted page format supports variable length
records and allows records to move on page.
 File layer keeps track of pages in a file, and
supports abstraction of a collection of records.

– Linked list or directory data structure
– Sorted and hashed files for query processing
Introduction to Database Systems
15
Download