The Database Design and Implementation Process Phase 1: Requirements Collection and Analysis Phase 2: Conceptual Database Design Phase 3: Choice of DBMS Phase 4: Data Model Mapping (Logical Database Design) Phase 5: Physical Database Design <= You are here! Phase 6: Database System Implementation and Tuning Physical Design Inputs • Logical design from Phase 4 • Intended use of the database: applications, and queries & transactions (Q&T) • Expected frequency of invocations of Q&T • Timing constraints of these Q&T • Expected frequencies of update operations • “Uniqueness” of attributes: for designing access path (indexes) Physical DB design/Tuning Decisions • Storage level decisions: – file organization, size of table, record size, block size, I/O and device, DBMS functions optimization • Deciding on indexing • *Denormalization & view materialization • *Tuning queries – make use of indexes – avoid join and correlated query when possible ==> break the inner query to another query – order of table that affects join processing – views defined for every possible application, overkill – Union is better than “or” in the inner query – to match how individual query optimizer works Primary File Organization Sequential: heap Indexed: B-tree, B+, B* trees, ISAM, VSAM Hash file Indexes • Primary index – physical key order (unique) • Clustering index – Closeness properties – physically ordered on non-unique, non-key field – non-dense index (one for each indexing field but not for every record) • Secondary index – non-ordering field of the file – dense index: one entry for each record • Either primary or clustering (only one for a file); can have many secondary indexes • Single-level vs. Multi-level indexes Disk Storage Devices • Preferred secondary storage device for high storage capacity and low cost. • Data stored as magnetized areas on magnetic disk surfaces. • A disk pack contains several magnetic disks connected to a rotating spindle. • Disks are divided into concentric circular tracks on each disk surface. Disk Storage Devices -2 • Because a track usually contains a large amount of information, it is divided into smaller blocks or sectors. • The division of a track into sectors is hardcoded on the disk surface and cannot be changed. • A track is divided into blocks. The block size B is fixed for each system. Typical block sizes range from B=512 bytes to B=4096 bytes. • Whole blocks are transferred between disk and main memory for processing. Disk Storage Devices - 3 Disk Storage Devices - 4 • A read-write head moves to the track that contains the block to be transferred. • Disk rotation moves the block under the readwrite head for reading or writing. • Reading or writing a disk block is time consuming because of the seek time s and rotational delay (latency) rd. • Double buffering can be used to speed up the transfer of contiguous disk blocks. Disk Storage Devices - 5 Records • Fixed and variable length records • Records contain fields which have values of a particular type (e.g., amount, date, time, age) • Fields themselves may be fixed length or variable length • Variable length fields can be mixed into one record: separator characters or length fields are needed so that the record can be “parsed”. Blocking • Blocking: refers to storing a number of records in one block on the disk. • Blocking factor (bfr) refers to the number of records per block. • There may be empty space in a block if an integral # of records do not fit in one block. • Spanned Records: refer to records that exceed the size of one or more blocks and hence span a number of blocks. Files of Records • Databases are stored as files on disks. • A file is a sequence of records, where each record is a collection of data values (or items). • A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk. • Records are stored on disk blocks. • The blocking factor bfr for a file is the (average) number of file records stored in a disk block. • A file can have fixed-length records or variablelength records. Unordered Files • Also called a heap or a pile file. • New records are inserted at the end of the file. • To search for a record, a linear search through the file records is necessary (O(n)). • This requires reading and searching half the file blocks on the average, and is hence quite expensive. • Record insertion is quite efficient. • Reading the records in order of a particular field requires sorting the file records (O(nlogn)). Ordered Files • Also called sequential files. Records are kept sorted by the values of an ordering field. • Insertion is expensive: records must be inserted in the correct order – it is common to keep a separate unordered overflow (transaction) file for new records to improve insertion efficiency – this is periodically merged with the main ordered file. • A binary search can be used to search for a record on its ordering field value – requires reading and searching log2n of the file blocks on the average – a big improvement over linear search! • Reading the records in order of the ordering field is quite efficient. Average Access Times • The following table shows the average access time to access a specific record for a given type of file Hashed Files • Hashing for disk files is called external hashing • The file blocks are divided into M equal-sized buckets, numbered buck0, buck1, ..., buckM-1. – Typically, a bucket corresponds to one (or a fixed # of) disk block(s). • One of the file fields is designated to be the hash key. • The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function. – E.g. i = PK mod M • Search is very efficient on the hash key. Why? Hashed Files - 2 Hashed Files - 3 • What happens when a new record hashes to a bucket that is already full? • Called a collision. • What does a collision mean about your hashing function? • What to do about collision? – An overflow file is kept for storing such records. – Overflow records that hash to each bucket can be linked together. Hashed Files - 4 • There are numerous methods for collision resolution, including the following: – Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found. – Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. – Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary. Chaining Example Hashed Files - 5 • To reduce overflow records, a hash file is typically kept 70-80% full. • The hash function h should distribute the records uniformly among the buckets – otherwise, search time will be increased because many overflow records will exist. • Main disadvantages of static external hashing: – Fixed number of buckets M is a problem if the number of records in the file grows or shrinks. – Ordered access on the hash key is quite inefficient (requires sorting the records). Hashed Files - 6 Hashing - Exercise • A PARTS file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. • The file uses 8 buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Show how you would load these records into the file in the given order using the hash function h(K)=K mod 8. • BONUS: Calculate the average number of block accesses for a random retrieval on Part#. Indexes as Access Paths • A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file. • The index is usually specified on one field of the file (although it could be specified on several fields). • One form of an index is a file of entries <field value, pointer to record> which is ordered by field value • The index is called an access path on the field. Indexes as Access Paths - 2 • The index occupies considerably less space than the data because its entries are much smaller. • A binary search on the index yields a pointer to the file record. • Indexes can also be characterized as dense or sparse. – A dense index has an index entry for every search key value (and hence every record) in the data file. – A sparse (or nondense) index, on the other hand, has index entries for only some of the search values Indexes as Access Paths - 3 Example: Given the following data file: EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... ) Suppose that: record size R=150 bytes block size B=512 bytes r=30000 records Indexes as Access Paths - 4 Then, we get: – blocking factor Bfr= B div R= 512 div 150= 3 records/block – number of file blocks b= (r/Bfr)= (30,000/3)= 10,000 blocks For an index on the SSN field, – – – – – – assume the field size VSSN=9 bytes, assume the record pointer size PR=7 bytes. Then: index entry size RI=(VSSN+ PR)=(9+7)=16 bytes index blocking factor BfrI= B div RI= 512 div 16= 32 entries/block number of index blocks b= (r/ BfrI)= (30,000/32)= 938 blocks binary search needs log2bI= log2938= 10 block accesses This is compared to an average linear search cost of: (b/2)= 30,000/2= 15,000 block accesses If the file records are ordered, the binary search cost would be: log2b= log230,000= 15 block accesses Types of Single-Level Indexes • Primary Index – Defined on an ordered data file – The data file is ordered on a key field – Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor – A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value. Types of Single-Level Indexes • Clustering Index – Defined on an ordered data file – The data file is ordered on a non-key field unlike primary index, which requires that the ordering field of the data file have a distinct value for each record. – Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value. Clustering index with a separate block cluster for each group of records that share the same value for the clustering field. Types of Single-Level Indexes • Secondary Index – A secondary index provides a secondary means of accessing a file for which some primary access already exists. – The secondary index may be on a field which is a candidate key and has a unique value in every record, or a nonkey with duplicate values. – The index is an ordered file with two fields. • The first field is of the same data type as some nonordering field of the data file that is an indexing field. • The second field is either a block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the same file. – Includes one entry for each record in the data file; hence, it is a dense index A dense secondary index (with block pointers) on a nonordering key field of a file. A secondary index (with record pointers) on a nonkey field implemented using one level of indirection so that index entries are of fixed length and have unique field values. Properties of Index Types Multi-Level Indexes • Because a single-level index is an ordered file, we can create a primary index to the index itself – the original index file is called the first-level index and the index to the index is called the second-level index. • We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block • A multi-level index can be created for any type of firstlevel index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block A two-level primary index resembling ISAM (Indexed Sequential Access Method). Multi-Level Indexes • Such a multi-level index is a form of search tree; however, insertion and deletion of new index entries is a severe problem because every level of the index is an ordered file. • Solution? B-trees B-Tree Example • A node in a search tree with pointers to subtrees below it q q B-Tree Definition A B-tree of order M is a multiway search tree such that: • All leaves are on the bottom level. • All internal nodes (except the root node) have at least ceil(M/2) children. • The root node can have as few as 2 children if it is an internal node, and may have no children if the root node is a leaf (i.e., the tree consists only of the root node). • Each leaf node (other than the root node if it is a leaf) must contain at least ceil(M/2) - 1 keys. B-Tree Rules A B-Tree of order 4 must meet the following conditions: • The keys in each node are in ascending order. • At every given Node the following is true: – The subtree starting at Node.Branch[0] has only keys that are less than Node.Key[0]. – The subtree starting at Node.Branch[1] has only keys that are greater than Node.Key[0] and at the same time less than Node.Key[1]. – The subtree starting at Node.Branch[2] has only keys that are greater than Node.Key[1] and at the same time less than Node.Key[2]. – The subtree starting at Node.Branch[3] has only keys that are greater than Node.Key[2]. • Note: if less than the full number of keys are in the Node, these 4 conditions are truncated so that they speak of the appropriate number of keys and branches. Dynamic Multilevel Indexes Using B-Trees and B+-Trees • Because of the insertion and deletion problem, most multi-level indexes use B-tree or B+-tree data structures, which leave space in each tree node (disk block) to allow for new index entries • These data structures are variations of search trees that allow efficient insertion and deletion of new search values. • In B-Tree and B+-Tree data structures, each node corresponds to a disk block • Each node is kept between half-full and completely full Dynamic Multilevel Indexes Using B-Trees and B+-Trees - 2 • An insertion into a node that is not full is quite efficient; if a node is full the insertion causes a split into two nodes • Splitting may propagate to other tree levels • A deletion is quite efficient if a node does not become less than half full • If a deletion causes a node to become less than half full, it must be merged with neighboring nodes B-tree versus B+-tree • In a B-tree, pointers to data records exist at all levels of the tree • In a B+-tree, all pointers to data records exists at the leaf-level nodes • A B+-tree can have fewer levels (or higher capacity of search values) than the corresponding B-tree B-tree structures. (a) A node in a B-tree with q – 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6. The nodes of a B+-tree. (a) Internal node of a B+-tree with q –1 search values. (b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers. B-Tree Algorithm • When inserting an item, first search for it in the B-tree. • If the item is not there, this search will end at a leaf. • If there is room in this leaf, insert the item here (this may require moving some existing keys). • If this leaf node is full then it must be "split" with about half of the keys going into a new node to the right. • The median (middle) key is moved up to the parent node. (If the parent is full, it has to be split as well.) • Note: If the root node is ever split, the median key moves up into a new root node, thus causing the tree to increase in height by one. B-Tree Exercise • Create a B-Tree of Order 5, containing the following nodes (in the order of their insertion): – CNGAHEKQ MFWLTZDPRXYS • Remember: Order 5 means that each node can have a maximum of 5 children and 4 keys. • BONUS QUESTIONS: – What is the average number of block accesses for a random retrieval on {this tree, a full tree}? – What would the average numbers be if this was {a heap, a sorted file}?