Database:

Chapter 10
Storage, Basic File Structure and Indexing
Storage Categories

A storage medium is required to store information/data.
• Primary memory
  – Can be accessed by the CPU directly
  – Fast, expensive, and limited in capacity
  – Volatile
• Secondary memory (SM)
  – Data on SM cannot be processed by the CPU directly; it must first be copied into primary memory
  – Slower, larger in capacity, and less expensive
  – Non-volatile
Secondary storage is the medium used for database storage.

Disks and Files



A DBMS stores information on (“hard”) disks.
This has major implications for DBMS design!
• READ: transfer data from disk to main memory (RAM).
• WRITE: transfer data from RAM to disk.
Both are high-cost operations relative to in-memory operations, so they must be planned carefully!
Disks



Disks are the secondary storage device of choice.
Data is stored and retrieved in units called disk blocks or pages.
Unlike RAM, the time to retrieve a disk page varies depending on its location on the disk.
• Therefore, the relative placement of pages on disk has a major impact on DBMS performance!
Components of a Disk
[Figure 10.1: components of a disk, showing platters, tracks, sectors, the disk head, the arm assembly, and arm movement]
Records and Files



A record consists of a collection of related data values or items (also called fields or columns).
A file is a sequence of records, which may be
• Fixed-length records
• Variable-length records
A database is stored as a collection of files.
Record Formats: Fixed Length
[Figure 10.2: a fixed-length record with fields F1 to F4 of lengths L1 to L4, starting at base address B; the address of F3 is B + L1 + L2]
Information about field types is the same for all records in a file; it is stored in the system catalogs.
Finding the i’th field does not require a scan of the record.
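The address calculation above can be sketched in a few lines. This is a minimal illustration, not a DBMS routine: the function name `field_address` and the byte lengths are hypothetical, chosen only to show that the i'th field is found by adding up the lengths of the fields before it.

```python
def field_address(base, field_lengths, i):
    """Address of the i-th field (1-based) of a fixed-length record.

    base          -- base address B of the record
    field_lengths -- [L1, L2, ...], as would be read from the system catalog
    """
    if not 1 <= i <= len(field_lengths):
        raise IndexError("field number out of range")
    # Skip over the lengths of fields 1 .. i-1; no scan of the record is needed.
    return base + sum(field_lengths[: i - 1])


# Hypothetical field lengths in bytes for F1..F4.
lengths = [4, 20, 8, 2]
print(field_address(1000, lengths, 3))  # B + L1 + L2 = 1000 + 4 + 20 = 1024
```

Because the lengths are the same for every record in the file, the offsets could also be computed once and cached.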
Fixed Length Records



Store record i starting from byte n * (i – 1), where n is the size of each record.
Record access is simple, but records may cross block boundaries.
Alternatives for deleting record i:
• move records i + 1, . . ., n to i, . . ., n – 1
• move record n to i
• do not move records, but link all free records on a free list
[Figure 10.3]
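As a rough sketch of the free-list alternative, the class below models a file of fixed-length records in memory. The class name `FixedLengthFile` and its methods are hypothetical, and a Python list stands in for the disk; the point is only that deleted slots are remembered and reused, and that record i starts at byte n * (i – 1).

```python
class FixedLengthFile:
    """In-memory model of a file of fixed-length records with a free list."""

    def __init__(self, record_size):
        self.n = record_size   # size of each record in bytes
        self.slots = []        # slots[k] holds record number k + 1, or None if free
        self.free = []         # free list: slot indexes of deleted records

    def offset(self, i):
        # Record i (1-based) starts at byte n * (i - 1).
        return self.n * (i - 1)

    def insert(self, record):
        if self.free:                  # reuse a deleted slot first
            k = self.free.pop()
            self.slots[k] = record
        else:                          # otherwise append at the end of the file
            self.slots.append(record)
            k = len(self.slots) - 1
        return k + 1                   # record number

    def delete(self, i):
        self.slots[i - 1] = None       # other records are not moved
        self.free.append(i - 1)


f = FixedLengthFile(record_size=40)
f.insert("A"); f.insert("B"); f.insert("C")
f.delete(2)
print(f.insert("D"))   # 2: the freed slot is reused
print(f.offset(3))     # 80
```

A real file would typically store the free-list links inside the deleted records themselves; a separate Python list is used here only to keep the sketch short.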
Variable-Length Records
Record Organization (on Disks)
[Figure 10.4: (a) unspanned and (b) spanned record organization]
File Organization & Access Method



File organization refers to the physical arrangement of the data in a file into records and pages on secondary storage.
An access method refers to the steps involved in storing and retrieving records from a file.
Some common file organizations and access methods are discussed next.
Unordered File







Also called a heap or a pile file.
The simplest file structure: records are kept in no particular order.
As the file grows and shrinks, disk pages are allocated and de-allocated.
New records are inserted at the end of the file.
To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on average, and is hence quite expensive.
Record insertion is quite efficient.
Reading the records in order of a particular field requires sorting the file records.
[Figure 10.5]
Ordered Files





Also called a sequential file.
File records are kept sorted by the values of an ordering field.
Insertion is expensive: records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
A binary search can be used to search for a record on its ordering field value. This requires reading and searching about log2(b) of the b file blocks on average, an improvement over linear search.
Reading the records in order of the ordering field is quite efficient.
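To make the block-count claim concrete, here is a small sketch of a binary search over the blocks of an ordered file. It assumes the file is a list of non-empty blocks, each a sorted list of ordering-field values; the function name and the block sizes are hypothetical. Each loop iteration stands for one block read.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file.

    blocks -- list of non-empty blocks, each a sorted list of key values
    Returns (found, blocks_read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]        # stands for one block read from disk
        reads += 1
        if key < block[0]:
            hi = mid - 1           # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1           # key can only be in a later block
        else:
            return key in block, reads   # search inside the block in memory
    return False, reads


# Hypothetical file: 1024 blocks of 4 keys each, holding keys 0..4095.
blocks = [list(range(b * 4, b * 4 + 4)) for b in range(1024)]
found, reads = binary_search_blocks(blocks, 2021)
print(found, reads)
```

With 1024 blocks the loop never reads more than 11 of them, whereas a linear search of an unordered file of the same size would read about 512 blocks on average.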
Hash Files






Hashing for disk files is called external hashing.
The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1. Typically, a bucket corresponds to one disk block (or a fixed number of disk blocks).
One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i = h(K) and h is the hashing function.
Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.
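The bucket and overflow scheme can be sketched as follows. This is an in-memory illustration under stated assumptions: the class name `HashFile` is hypothetical, keys are integers, and h(K) = K mod M is just one simple choice of hashing function. Each bucket has a fixed capacity, and records that do not fit go to a per-bucket overflow chain.

```python
class HashFile:
    """In-memory model of external hashing with per-bucket overflow chains."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        # Overflow records that hash to the same bucket are kept together.
        self.overflow = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % self.M            # assumed hash function for integer keys

    def insert(self, key, record):
        i = self.h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:                          # collision: the bucket is already full
            self.overflow[i].append((key, record))

    def search(self, key):
        i = self.h(key)                # go straight to the one relevant bucket
        for k, record in self.buckets[i] + self.overflow[i]:
            if k == key:
                return record
        return None


hf = HashFile(num_buckets=4, bucket_capacity=2)
for key, rec in [(1, "a"), (5, "b"), (9, "c")]:   # all three hash to bucket 1
    hf.insert(key, rec)
print(hf.search(9))   # c (found in the overflow chain of bucket 1)
```

A search touches only bucket h(K) and its overflow chain, which is why lookups on the hash key are cheap as long as the chains stay short.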
Indexing Structures for Files


An index is a data structure that allows the DBMS to locate particular records in a file more quickly and thereby speed up responses to user queries.
An index file consists of records (called index entries) of the form <search-key, pointer>.
– Any subset of the fields of a relation can be the search key for an index on the relation.
– Search key is not the same as key (a minimal set of fields that uniquely identifies a record in a relation).
Index files are typically much smaller than the original file.
Types of Indexes
There are different types of indexes:
• Single-level indexes
  – Primary indexes
  – Clustering indexes
  – Secondary indexes
• Multilevel indexes
Single Level Index



A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file.
The index is usually specified on one field of the file (although it could be specified on several fields).
One form of an index is a file of entries <field value, pointer to record>, ordered by field value.
[Figure 10.6]
Primary Index





Defined on an ordered data file.
The data file is ordered on a key field.
Includes one index entry for each block in the data file; the index entry holds the key field value of the first record in the block, which is called the block anchor.
A similar scheme can use the last record in a block.
A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file (the key of its anchor record) rather than for every search value.
[Figure 10.7]
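A sparse primary index of this kind can be sketched with Python's standard `bisect` module. The function names and the sample data are hypothetical; the data file is modelled as a list of blocks, each a list of (key, record) pairs sorted on the key. The index holds one (anchor key, block number) entry per block, and a lookup finds the last anchor that is not greater than the search key.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One index entry per block: (key of the block anchor, block number)."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]


def primary_lookup(index, blocks, key):
    anchors = [k for k, _ in index]
    pos = bisect_right(anchors, key) - 1   # last anchor <= key
    if pos < 0:
        return None                        # key is smaller than every anchor
    _, b = index[pos]
    for k, record in blocks[b]:            # one data-block read, scanned in memory
        if k == key:
            return record
    return None


# Hypothetical ordered data file: three blocks sorted on a key field.
blocks = [
    [(10, "r10"), (20, "r20"), (30, "r30")],
    [(40, "r40"), (50, "r50"), (60, "r60")],
    [(70, "r70"), (80, "r80")],
]
index = build_primary_index(blocks)
print(index)                              # [(10, 0), (40, 1), (70, 2)]
print(primary_lookup(index, blocks, 50))  # r50
```

The index has three entries for eight records, which illustrates why a sparse index is much smaller than the data file.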
Clustering Index



Defined on an ordered data file.
The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record.
Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
[Figure 10.8]
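The following sketch shows one way a clustering index could be modelled. The function names and sample data are hypothetical, and a Python dict stands in for the ordered index file. Each distinct value of the non-key ordering field maps to the first block containing it; a lookup starts there and reads forward while matching records keep appearing.

```python
def build_clustering_index(blocks):
    """Map each distinct field value to the first block that contains it."""
    index = {}
    for b, block in enumerate(blocks):
        for value, _ in block:
            index.setdefault(value, b)   # keep only the first block per value
    return index


def find_all(index, blocks, value):
    b = index.get(value)
    if b is None:
        return []
    out = []
    # Records with equal values are adjacent, so read forward until a block
    # contributes no matches.
    while b < len(blocks):
        matches = [record for v, record in blocks[b] if v == value]
        if not matches:
            break
        out.extend(matches)
        b += 1
    return out


# Hypothetical data file ordered on a non-key field (e.g. a department number).
blocks = [
    [(1, "a"), (1, "b")],
    [(1, "c"), (2, "d")],
    [(2, "e"), (3, "f")],
]
index = build_clustering_index(blocks)
print(index)                       # {1: 0, 2: 1, 3: 2}
print(find_all(index, blocks, 1))  # ['a', 'b', 'c']
```

Note that the index has one entry per distinct value (three here), not one per block or per record.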
Secondary Index




A secondary index provides a secondary means of accessing a file for which some primary access already exists.
The secondary index may be on a field that is a candidate key and has a unique value in every record, or on a nonkey field with duplicate values.
The index is an ordered file with two fields.
• The first field is of the same data type as some nonordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.
There can be many secondary indexes (and hence, indexing fields) for the same file.
A secondary index on a key field includes one entry for each record in the data file; hence, it is a dense index.
[Figure 10.9]
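A dense secondary index can be sketched as a sorted list of (field value, record pointer) entries, one per record. In this illustration the names are hypothetical, records are Python dicts, and a record pointer is modelled as a (block number, slot number) pair. The data file is ordered on `id`, while the index is built on the nonordering field `name`.

```python
from bisect import bisect_left


def build_secondary_index(blocks, field):
    """Dense index: one (field value, (block, slot)) entry per record."""
    entries = []
    for b, block in enumerate(blocks):
        for slot, record in enumerate(block):
            entries.append((record[field], (b, slot)))
    entries.sort()                     # the index is an ordered file
    return entries


def secondary_lookup(index, blocks, value):
    values = [v for v, _ in index]
    pos = bisect_left(values, value)   # first entry with this field value
    out = []
    while pos < len(index) and index[pos][0] == value:
        b, slot = index[pos][1]        # follow the record pointer
        out.append(blocks[b][slot])
        pos += 1
    return out


# Hypothetical data file ordered on "id"; "name" is a nonordering field.
blocks = [
    [{"id": 1, "name": "Lee"}, {"id": 2, "name": "Ann"}],
    [{"id": 3, "name": "Bob"}, {"id": 4, "name": "Ann"}],
]
name_index = build_secondary_index(blocks, "name")
print([r["id"] for r in secondary_lookup(name_index, blocks, "Ann")])  # [2, 4]
```

Because the data file is not sorted on `name`, the index needs an entry for every record, and the matching records may lie in different blocks.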