Representing Data Elements

advertisement

Representing Data Elements

Fields, Records, Blocks

Variable-length Data

Modifying Records

Source: our textbook

1

Overview

 Attributes are represented by sequences of bytes, called fields

 Tuples are represented by collections of fields, called records

 Relations are represented by collections of records, called files

 Files are stored in blocks, using specialized data structures to support efficient modification and querying

2

Representing SQL Data Types

 integers and reals: built-in

 CHAR(n): array of n bytes

 VARCHAR(n): array of n+1 bytes (extra byte is either string length or null char)

 dates and times: fixed length strings

 etc.

3

Representing Tuples

 For now, assume all attributes (fields) are fixed length.

 Concatenate the fields

 Store the offset of each field in schema

0 30 286 287 297 name

CHAR(30)

30 bytes address

VARCHAR(255)

256 bytes gender

CHAR(1)

1 byte birthdate

DATE

10 bytes

4

More on Tuples

 Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8

 Previous example becomes:

0 32 288 292 304 name

CHAR(30)

30 bytes

+ 2 address

VARCHAR(255)

256 bytes gender

CHAR(1)

1 byte

+ 3 birthdate

DATE

10 bytes

+ 2 5

Record Headers

 Often it is convenient to keep some

"header" information in each record:

 a pointer to schema information

(attributes/fields, types, their order in the tuple, constraints)

 length of the record/tuple

 timestamp of last modification

6

Packing Records into Blocks

 Start with block header:

 timestamp of last modification/access

 offset of each record in the block, etc.

 Follow with sequence of records

 May end with some unused space header record 1 record 2 … record n-1record n

7

Representing Addresses

 Often addresses (pointers) are part of records:

 the application data in object-oriented databases

 as part of indexes and other data structures supporting the DBMS

 Every data item (block, record, etc.) has two addresses:

 database address : address on the disk

(typically 8-16 bytes)

 memory address , if the item is in memory

(typically 4 bytes)

8

Translation Table

 Provides mapping from database addresses to memory addresses for all blocks currently in memory

 Later we'll discuss how to implement it

9

Pointer Swizzling

 When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.

 Need a bit for each address to indicate if it is a disk address or a memory address.

 Why? Faster to follow memory pointers (only uses a single machine instruction).

10

Disk

Example of Swizzling

Main Memory read into main memory swizzled

Block 1 unswizzled

Block 2

11

Swizzling Policies

 Automatic swizzling: as soon as block is brought into memory, swizzle all relevant pointers

 Swizzling on demand: only swizzle a pointer if and when it is actually followed

 No swizzling: always refer to translation table

 Programmer control

12

Automatic Swizzling

 Locating all pointers within a block:

 refer to the schema, which will indicate where addresses are in the records

 for index structures, pointers are at known locations

 Update translation table with memory addresses of items in the block

 Update pointers in the block (in memory) with memory addresses, when possible, as obtained from translation table

13

Unswizzling

 When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses

 Use translation table again

 Important to have an efficient data structure for the translation table

14

Pinned Records and Blocks

 A block in memory is pinned if it cannot be safely written back to disk

 Indicate with a bit in the block header

 Reasons for pinning:

 related to failure recovery (more later)

 because of pointer swizzling

 If block B1 has swizzled pointer to an item in block B2, then B2 is pinned.

15

Unpinning a Block

 Consider each item in the block to be unpinned

 Keep in the translation table the places in memory holding swizzled pointers to that item (e.g., with a linked list)

 Unswizzle those pointers: use translation table to replace the memory addresses with database (disk) addresses

16

Variable Length Data

 Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)

 Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)

 Records that do not fit in a block (e.g., an MPEG of a movie)

17

Variable-Length Fields

 Store the fixed-length fields before the variable-length fields in each record

 Keep in the record header

 record length

 pointers to the beginnings of all the variable-length fields

 Book discusses variations on this idea

18

other header info

Variable Length Fields

to var len field 2 to var len field 3 fixed len field 1 fixed len field 2 var len field 1 var len field 2 var len field 3 record length

19

Variable-Format Records

 Represent by a sequence of tagged fields

 Each tagged field contains

 name

 type

 length, if not deducible from the type

 value

20

Splitting Records Across Blocks

 Called spanned records

 Useful when

 record size exceeds block size

 putting an integral number of records in a block wastes a lot of the block (e.g., record size is 51% of block size)

 Each record or fragment header contains

 bit indicating if it is a fragment

 if fragment then pointers to previous and next fragments of the record (i.e., a linked list)

21

Record Modification

 Modifications to records:

 insert

 delete

 update

 issues even with fixed-length records and fields

 even more involved with variable-length data

22

Inserting New Records

 If records need not be any particular order, then just find a block with enough empty space

 Later we'll see how to keep track of all the tuples of a given relation

 But what if blocks should be kept in a certain order, such as sorted on primary key?

23

header

Insertion in Order

unused record

4 record

3 record

2 record

1

If there is space in the block, then add the record

(going right to left), add a pointer to it (going left to right) and rearrange the pointers as needed.

24

What if Block is Full?

 Records are stored in several blocks, in sorted order

 One approach: keep a linked list of

"overflow" blocks for each block in the main sequence

 Another approach is described in the book

25

Deleting Records

 Try to reclaim space made available after a record is deleted

 If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers

 Additional mechanisms are based on keeping a linked list of available space and compacting when possible

26

Tombstones

 What about pointers to deleted records?

 We place a tombstone in place of each deleted record

 Tombstone is permanent

 Issue of where to place the tombstone

 Keep a tombstone bit in each record header: if this is a tombstone, then no need to store additional data

27

Updating Records

 For fixed-length records, there is no effect on the storage system

 For variable-length records:

 if length increases, like insertion

 if length decreases, like deletion except tombstones are not necessary

28

Download