Recap of Feb 20: Database Design Goals, Normalization, Normal Forms

• Goals for designing a database: a schema that
  – allows simple, easy-to-phrase queries
  – avoids redundancies (repetition of information)
  – avoids anomalies
  – gives good performance
• Normalization
– decompose complex relations
– Lossy decompositions
– Functional Dependencies
• Normal Forms: 1NF, 2NF, 3NF, BCNF
– BCNF or 3NF: lossless decomposition in both
• BCNF can’t always ensure dependency preservation
• 3NF sometimes requires null values or redundant information
Getting Physical: Storage and File
Structure (Chapter 11)
• Up until now we have examined database design from a
high-level conceptual view, passing over actual
implementation and underlying hardware.
– Appropriate focus for database users
– But hardware does have an influence on implementation, and
implementation does have an influence on what conceptual designs
will be more efficient and useful
• Now we get physical -- examine physical storage media to
give a background for later focus on implementation of the
data models and languages already described
Chapter 11
At this point we are focusing on the following sections:
• 11.1 Overview of Physical Storage Media
• 11.2 Magnetic Disks
• 11.3 RAID (very briefly)
• 11.4 Tertiary Storage
• 11.5 Storage Access
• 11.6 File Organization
• 11.7 Organization of Records in Files
• 11.8 Data-Dictionary Storage
Midterm!
• As per the syllabus, the Midterm will cover the pre-midterm lecture
notes plus sections 1,2,3 (except 3.4 & 3.5), 4, 6, 7-7.7, 11 (except 11.3
and 11.9) of the text
• We have examined all that material except chapter 11, which starts
today
• Scheduling the midterm depends upon the speed with which we get
through the material.
• It looks as if we’ll have enough time to schedule one day for review
before the midterm
• My current guess is that we might be able to schedule the midterm as
early as March 13; certainly no later than March 20. I’m aiming for
Tuesday, March 18.
Midterm Study and Homework #2
• Material you are responsible for:
– All material presented in class before the midterm
– Textbook sections 1,2,3 (except 3.4 & 3.5), 4, 6, 7-7.7, 11 (except 11.3 and
11.9)
• The homework questions from assignment 1 (exercises 1.1, 1.2, 1.3,
and 2.1-2.6) are all useful study aids, as are the questions from
homework assignment #2:
– 3.2, 3.3, 3.5, 3.6 -- 3.9, 3.16
– 4.1, 4.2, 4.4 -- 4.8
– 7.2, 7.4, 7.5, 7.11, 7.12, 7.15, 7.16, 7.21, 7.23
• Homework #2 is due Tuesday, March 11. I’ll try to have them back to
you, graded, Thursday March 13 so you can use them as a study aid
for the exam Tuesday, March 18.
Classification of Physical Storage Media
• Media are classified according to three characteristics:
– speed of access
– cost per unit of data
– reliability
• data loss on power failure or system crash
• physical failure of the storage device
• We can also differentiate storage as either
– volatile storage
– non-volatile storage
Physical Storage Media Overview (11.1)
• Typical media available are:
  – Cache
  – Main memory
  – Flash memory
  – Magnetic disk
  – Optical storage (CD or DVD)
  – Tape storage
Physical Storage Media -- Cache and
Main Memory
• Cache
– fastest and most costly form of storage
– volatile
– managed by computer system hardware
• Main memory
– fast access (10s to 100s of nanoseconds)
– generally too small or expensive to hold the entire database
• current capacities commonly used are up to a few gigabytes
• capacities have gone up and per-byte costs have decreased steadily,
roughly a factor of 2 every 2-3 years
– volatile
Physical Storage Media -- Flash Memory
• Also known as EEPROM -- Electrically Erasable Programmable Read-Only Memory
• non-volatile
• reading data is comparable to main memory speeds
• writing is more complex
– can’t overwrite a single location -- a whole bank of memory must be
erased to permit writing within that bank.
– erasing is only supported a limited number of times -- 10,000 to one
million erase cycles
– writes are slow (a few microseconds), and erases are slower
• cost comparable to main memory
• widely used in computer systems embedded in other devices, such as
digital cameras and hand-held computers
Physical Storage Media -- Magnetic Disk
• data is stored on a spinning disk and read/written magnetically
• primary medium for long-term storage of data
• typically stores the entire database
• data must be moved from disk to main memory for access, and written back for storage
  – much slower access than main memory (about which more later)
• direct access -- possible to read data on disk in any order, unlike magnetic tape
• capacities up to 100 GB
  – much larger capacity and cheaper cost/byte than main memory or flash memory
  – capacity doubles every two or three years
• survives power failures and system crashes
  – disk failure can destroy data, but this is rarer than system crashes
Physical Storage Media -- Optical
Storage
• Non-volatile; data is read optically from a spinning disk using a laser
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
• Write-once, Read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)
• Multiple-write versions also available (CD-RW, DVD-RW, and DVD-RAM)
• Reads and writes are slower than with magnetic disk
• Juke-box systems available for storing large volumes of data
– large numbers of removable disks
– several drives
– mechanism for automatic loading/unloading of disks
Physical Storage Media -- Tape Storage
• Non-volatile
• used primarily for backup (to recover from disk failure) and for
archival data
• sequential access -- much slower than disk
• very high capacity (40-300 GB tapes available)
• tape can be removed from drive; storage costs much cheaper than disk, but drives are expensive
• Juke-box systems available for storing large volumes of data
– e.g., remote sensing data, possibly hundreds of terabytes (10^12 bytes) or even a petabyte (10^15 bytes)
Storage Hierarchy
• Primary storage: fastest media but volatile
  – cache
  – main memory
• Secondary storage: next level in hierarchy; moderately fast access time, non-volatile
  – also called on-line storage
  – flash memory, magnetic disks
• Tertiary storage: lowest level in hierarchy; slower access time, non-volatile
  – also called off-line storage
  – optical storage, magnetic tape
Magnetic Disks (11.2)
• Read-write head
  – positioned very close to the platter surface (almost touching it)
  – reads or writes magnetically coded information
• Surface of platter divided into circular tracks
  – over 16,000 tracks per platter on typical hard disks
• Each track is divided into sectors
  – a sector is the smallest unit of data that can be read or written
  – sector size is typically 512 bytes
  – typically 200 (on inner tracks) to 400 (on outer tracks) sectors per track
Magnetic Disks (cont)
• To read/write a sector
  – disk arm swings to position head on the right track
  – platter spins continually; data is read/written as sector passes under head
• Head-disk assemblies
  – multiple disk platters on a single spindle (typically 2 to 4)
  – one head per platter, mounted on a common arm
• Cylinder i consists of the ith track of all the platters
Magnetic Disks (cont)
• Earlier generation disks were susceptible to head crashes
– disk spins constantly at 60, 120, even 250 revolutions per second
– head is very close to the surface; if it touches the surface it can
scrape the recording medium off the surface, wiping out data and
causing the removed medium to fly around, causing more head
crashes
– newer disks have less friable material; less subject to head crashes
Magnetic Disks (cont)
• Disk controller -- interfaces between the computer system and the disk drive
  – accepts high-level commands to read or write a sector
  – initiates actions such as moving the disk arm to the right track and actually reading or writing the data
  – computes and attaches checksums to each sector to verify that data is read back correctly
  – ensures successful writing by reading back each sector after writing it
  – performs remapping of bad sectors
• Multiple disks are connected to a computer system through a controller
  – the controller's functionality (checksums, bad-sector remapping) is often carried out by the individual disks, reducing the load on the controller
• Two disk interface standards are ATA (AT Attachment) and SCSI (Small Computer System Interface)
Disk Performance Measures
• Access time -- the time it takes from when a read or write request is
issued to when data transfer begins. Consists of:
– Seek time -- time it takes to reposition the arm over the correct track
• average seek time is 1/2 the worst case seek time
• 4 to 10 milliseconds on typical disks
– Rotational latency -- time it takes for the sector to appear under the head
• average latency is 1/2 the worst case latency
• 4 to 11 milliseconds on typical disk (5400 to 15000 rpm)
• Data Transfer Rate -- rate of retrieval from or storage to disk
– 4 to 8 MB per second is typical
– Multiple disks may share a controller, so the rate that controller can handle is also important. E.g., ATA-5: 66 MB/s, SCSI-3: 40 MB/s, Fibre Channel: 256 MB/s
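The measures above combine into a simple back-of-the-envelope estimate. The sketch below (in Python, with illustrative values picked from the ranges quoted above, not figures from the text) computes the average time to access one block:

```python
# Rough average access time for a single block:
#   access = avg seek + avg rotational latency + transfer time.
# All figures below are illustrative values from the typical ranges above.

AVG_SEEK_MS = 8.0         # seek time: 4-10 ms typical
RPM = 7200                # spindle speed: 5400-15000 rpm typical
TRANSFER_MB_PER_S = 6.0   # data transfer rate: 4-8 MB/s typical
BLOCK_KB = 4              # common block size

rotation_ms = 60_000.0 / RPM       # time for one full revolution
avg_latency_ms = rotation_ms / 2   # on average, half a revolution

transfer_ms = (BLOCK_KB / 1024) / TRANSFER_MB_PER_S * 1000

access_ms = AVG_SEEK_MS + avg_latency_ms + transfer_ms
print(f"{access_ms:.2f} ms")       # about 12.82 ms with these figures
```

Note that seek time plus rotational latency dominates: the transfer itself accounts for well under a millisecond of the total.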
Disk Performance Measures (cont.)
• Mean time to failure (MTTF) - the average time the disk is expected to
run continuously without any failure.
– Typically 3-5 years
– Sounds good, but if you have 1500 disks then 300 per year will fail, or
about 1 per day
– MTTF decreases as the disk ages
• RAID (Redundant Arrays of Independent Disks) (11.3)
– disk organization techniques that manage a large number of disks,
providing a view of a single disk of
• high capacity and high speed by using multiple disks in parallel
• high reliability by storing data redundantly, so that data can be recovered even
if a disk fails
• mean time to data loss can be as high as 500,000 to 1,000,000 hours on a RAID
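The "1500 disks, about one failure per day" claim above is simple arithmetic, sketched here (assuming independent disks and a constant failure rate, which is a simplification since, as noted, MTTF decreases as disks age):

```python
# Expected failure rate for a population of disks:
#   failures per year ~ (number of disks) / (MTTF in years).
num_disks = 1500
mttf_years = 5.0   # upper end of the 3-5 year range quoted above

failures_per_year = num_disks / mttf_years
failures_per_day = failures_per_year / 365

print(failures_per_year)           # 300.0
print(round(failures_per_day, 2))  # 0.82, i.e. roughly one per day
```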
Optimization of Disk-Block Access:
Motivation
• Requests for disk I/O are generated both by the file system and by the
virtual memory manager
• Each request specifies the address on the disk to be referenced in the
form of a block number
– a block is a contiguous sequence of sectors from a single track on one
platter
– block sizes range from 512 bytes to several K (4 -- 16K is typical)
– smaller blocks mean more transfers from disk; larger blocks make for more wasted space due to partially filled blocks
– the block is the standard unit of data transfer between disk and main memory
• Since disk access speed is much slower than main memory access,
methods for optimizing disk-block access are important
Optimization of Disk-Block Access:
Methods
• Disk-arm Scheduling: requests for several blocks may be
speeded up by requesting them in the order they will pass
under the head.
– If the blocks are on different cylinders, it is advantageous to ask for
them in an order that minimizes disk-arm movement
– Elevator algorithm -- move the disk arm in one direction until all
requests from that direction are satisfied, then reverse and repeat
– sequential access is 1-2 orders of magnitude faster than random access
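The elevator algorithm can be sketched in a few lines of Python (the track numbers and starting head position below are illustrative, not from the text):

```python
# Elevator (SCAN) disk-arm scheduling: serve all pending requests in the
# current direction of arm travel, then reverse and serve the rest.

def elevator_order(requests, head, direction=1):
    """Return the order in which pending track requests are served.

    requests:  iterable of track numbers with pending I/O
    head:      current arm position (track number)
    direction: +1 (moving toward higher tracks) or -1
    """
    ahead = sorted(t for t in requests if (t - head) * direction >= 0)
    behind = sorted((t for t in requests if (t - head) * direction < 0),
                    reverse=True)
    if direction < 0:            # flip so "ahead" is descending track order
        ahead = ahead[::-1]
        behind = behind[::-1]
    return ahead + behind

# Head at track 53 moving up: sweep up through 65..183, then back down.
print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```

Compared with first-come-first-served, the arm sweeps in each direction once instead of zig-zagging between distant cylinders.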
Optimization of Disk-Block Access:
Methods
• Non-volatile write buffers
– store written data in a RAM buffer rather than on disk
– write the buffer whenever it becomes full or when no other disk
requests are pending
– buffer must be non-volatile to protect from power failure
• called non-volatile random-access memory (NV-RAM)
• typically implemented with battery-backed-up RAM
– dramatic speedup on writes; with a reasonable-sized buffer write
latency essentially disappears
– why can’t we do the same for reads? (hints: ESP, clustering)
Optimization of Disk-Block Access:
Methods
• File organization (Clustering): reduce access time by
organizing blocks on disk in a way that corresponds
closely to the way we expect them to be accessed
– sequential files should be kept organized sequentially
– hierarchical files should be organized with mothers next to
daughters
– for joining tables (relations) put the joining tuples next to each
other
– over time fragmentation can become an issue
• restoration of disk structure (copy and rewrite, reordered) controls
fragmentation
Optimization of Disk-Block Access:
Methods
• Log-based file system
– does not update in-place, rather writes updates to a log disk
• essentially, a disk functioning as a non-volatile RAM write buffer
– all access in the log disk is sequential, eliminating seek time
– eventually updates must be propagated to the original blocks
• as with NV-RAM write buffers, this can occur at a time when no disk
requests are pending
• the updates can be ordered to minimize arm movement
– this can generate a high degree of fragmentation on files that
require constant updates
• fragmentation increases seek time for sequential reading of files
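The write path described above can be modeled as a toy sketch (class and method names are illustrative, not a real file-system API): writes append to a sequential log, and propagation later applies them to their home blocks in block order to minimize arm movement.

```python
class LogDisk:
    """Toy log-structured write path: append now, propagate later."""

    def __init__(self):
        self.log = []    # (block_id, data), append-only => sequential writes
        self.home = {}   # block_id -> data: the blocks' original locations

    def write(self, block_id, data):
        self.log.append((block_id, data))   # no seek: purely sequential

    def propagate(self):
        # Apply pending updates ordered by block number to minimize arm
        # movement; the sort is stable, so for a given block the latest
        # logged write is applied last and wins.
        for block_id, data in sorted(self.log, key=lambda entry: entry[0]):
            self.home[block_id] = data
        self.log.clear()

disk = LogDisk()
disk.write(42, "v1")
disk.write(7, "x")
disk.write(42, "v2")    # second update to block 42
disk.propagate()        # e.g. when no other disk requests are pending
print(disk.home)        # {7: 'x', 42: 'v2'}
```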
Storage Access (11.5)
• Basic concepts (some already familiar):
– block-based. A block is a contiguous sequence of sectors from a single
track; blocks are units of both storage allocation and data transfer
– a file is a sequence of records stored in fixed-size blocks (pages) on the
disk
– each block (page) has a unique address called its block ID (BID)
– optimization is done by reducing I/O, seek time, etc.
– database systems seek to minimize the number of block transfers between
the disk and memory. We can reduce the number of disk accesses by
keeping as many blocks as possible in main memory.
– Buffer - portion of main memory used to store copies of disk blocks
– buffer manager - subsystem responsible for allocating buffer space in
main memory and handling block transfer between buffer and disk
Buffer Management
• The buffer pool is the part of the main memory allocated for
temporarily storing disk blocks read from disk and made available to
the CPU
• The buffer manager is the subsystem responsible for the allocation and
the management of the buffer space (transparent to users)
• On a process (user) request for a block (page) the buffer manager:
  – checks to see if the page is already in the buffer pool
  – if so, passes the address to the process
  – if not, it loads the page from disk and then passes the address to the process
  – loading a page might require clearing (writing out) a page to make space
• Very similar to the way virtual memory managers work, although it can do a lot better (why?)
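That request flow can be sketched as a minimal LRU pool in Python (the `Disk` stand-in and all names here are illustrative, not a real DBMS interface):

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.pool = OrderedDict()   # page_id -> [data, dirty], LRU order

    def get_page(self, page_id):
        if page_id in self.pool:                # hit: already buffered
            self.pool.move_to_end(page_id)      # mark most recently used
            return self.pool[page_id][0]
        if len(self.pool) >= self.capacity:     # miss: clear a frame first
            victim, (data, dirty) = self.pool.popitem(last=False)
            if dirty:
                self.disk.write(victim, data)   # write out before reuse
        data = self.disk.read(page_id)          # load page from disk
        self.pool[page_id] = [data, False]
        return data

class Disk:                                     # stand-in for real I/O
    def __init__(self): self.blocks = {}
    def read(self, pid): return self.blocks.get(pid, f"page{pid}")
    def write(self, pid, data): self.blocks[pid] = data

bm = BufferManager(Disk(), capacity=2)
bm.get_page(1); bm.get_page(2); bm.get_page(3)  # third load evicts page 1
print(list(bm.pool))                            # [2, 3]
```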
Buffer Replacement Strategies
• Most operating systems use a LRU replacement scheme. In database
environments, MRU is better for some common operations (e.g., join)
– LRU strategy: replace the least recently used block
– MRU strategy: replace the most recently used block
• Sometimes it is useful to fasten or pin blocks to keep them available
during an operation and not let the replacement strategy touch them
– pinned block is thus a block that is not allowed to be written back to disk
• There are situations where it is necessary to write back a block to disk
even though the buffer space it occupies is not yet needed. This write
is called the forced output of a block; useful in recovery situations
• Toss-immediate strategy: free the space occupied by a block as soon as
the final tuple of that block has been processed
Buffer Replacement Strategies
• Most recently used (MRU) strategy: system must pin the block
currently being processed. After the final tuple of that block has been
processed the block is unpinned and becomes the most recently used
block. This is essentially “toss-immediate” with pinning, and works
very well with joins.
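A minimal sketch of this MRU-with-pinning behavior (all names are illustrative):

```python
class MRUPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = []        # access order, most recently used last
        self.pinned = set()    # blocks replacement must not touch

    def access(self, block, pin=False):
        if block in self.order:
            self.order.remove(block)
        elif len(self.order) >= self.capacity:
            self.evict()
        self.order.append(block)   # now the most recently used
        if pin:
            self.pinned.add(block)

    def unpin(self, block):
        self.pinned.discard(block)

    def evict(self):
        # MRU: scan from the most recently used end, skipping pinned blocks.
        for block in reversed(self.order):
            if block not in self.pinned:
                self.order.remove(block)
                return block
        raise RuntimeError("all blocks pinned")

pool = MRUPool(capacity=2)
pool.access("A", pin=True)   # A is being processed, so it is pinned
pool.access("B")
pool.unpin("A")              # A's last tuple has been processed
pool.access("C")             # evicts B, the most recently used unpinned block
print(pool.order)            # ['A', 'C']
```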
• The buffer manager can often use other information (design or
statistical) to predict the probability that a request will reference a
particular page
– e.g., the data dictionary is frequently accessed -- keep the data dictionary
blocks in main memory buffer
– if several pages are available for overwrite; choose the one that has the
lowest number of recent access requests to replace
Buffer Management (cont)
• The existing OS affects DBMS operations via:
  – read ahead, write behind
  – wrong replacement strategies
  – Unix is not good for a DBMS to run on top of
  – most commercial systems implement their own I/O on a raw disk partition
• Variations of buffer allocation:
  – common buffer pool for all relations
  – separate buffer pool for each relation
  – as above, but with relations borrowing space from each other
  – prioritized buffers for very frequently accessed blocks, e.g. the data dictionary
Buffer Management (cont)
• For each buffer the manager keeps the following:
– which disk and which block it is in
– whether the block is dirty (has been modified) or not (why?)
– information for the replacement strategy
• last time block was accessed
• whether it is pinned
• possible statistical information (access frequency etc.)
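That per-buffer bookkeeping can be summarized as a small record type (field names are illustrative; a real buffer manager would pack this into a frame header):

```python
from dataclasses import dataclass

@dataclass
class BufferFrame:
    disk_id: int              # which disk the block came from
    block_id: int             # which block on that disk
    dirty: bool = False       # modified in memory => must be written back
    pinned: bool = False      # currently in use; replacement must skip it
    last_access: float = 0.0  # timestamp for LRU/MRU decisions
    access_count: int = 0     # statistical information (access frequency)

frame = BufferFrame(disk_id=0, block_id=42)
frame.dirty = True            # an in-memory update touched the block
```

The dirty bit answers the "why?" above: a clean block can simply be dropped on eviction, while a dirty block must first be written back to disk.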