The Relational Model (cont’d) Introduction to Disks and Storage

advertisement
The Relational Model (cont’d)
Introduction to Disks and Storage
CS 186, Spring 2007, Lecture 3
Cow book Section 1.5, Chapter 3 (cont’d)
Cow book Chapter 9
Mary Roth
Administrivia
• Homework 0 due today 10 p.m.!
• Nathan and Erinaios posted their office hours on
class homepage
• Homework 1 available today from class web site
Submit team members online
Read thru description; we’ll talk more about it
after today’s lecture
• Questions from last time?
Outline
• What we learned last time
– Components of a DBMS
– Relational Data Model
• New stuff
– Storage, Disks and Files
Review: Components of a DBMS
• A DBMS is like an ogre; it has layers
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Today we go here…
Review: Relational Data Model
• Most widely used data model today.
• Relations:
– Schema : specifies name of relation, plus name
and type of each column.
– Instance : a table, with rows and columns that
contain data.
• SQL is a query language for relational data
model
– DDL: To define/modify/change schemas
– DML: To query data in table.
• Keys are a way to associate tuples in different
relations
Let’s return to our bank…
• Can we apply a relational model to our
bank spreadsheet?
CREATE TABLE CUSTOMERS (CustomerID INTEGER,
Name VARCHAR(128),
Address VARCHAR(256),
AccountID INTEGER);
CREATE TABLE ACCOUNTS (AccountID INTEGER,
Balance Double);
Customer
ID
Name
Address
Account
ID
Account
ID
Balance
314159
Frodo
Baggins
BagEnd
112358
112358
4500.00
271828
Sam
Gamgee
BagShot
Row
132124
132124
2000.00
42
Bilbo
Baggins
Rivendell
112358
Primary Keys
• A set of fields is a superkey if:
– No two distinct tuples can have same values in all key fields
• A set of fields is a key for a relation if :
– It is a superkey
– No subset of the fields is a superkey
• what if >1 key for a relation?
– One of the keys is chosen (by DBA) to be the primary key.
Other keys are called candidate keys.
• e.g.
– {sid, gpa} is an example of a superkey.
Students
– sid is a key for Students.
sid name
– what about name? login?
login
53666 Jones jones@cs
53688 Smith smith@eecs
53650 Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
Primary and Candidate Keys in SQL
•
•
Keys must be chosen and defined carefully!
• They imply semantics!
What does this set of key definitions imply about students?
CREATE TABLE Enrolled
(sid CHAR(20)
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid),
UNIQUE (cid, grade))
“Students can take only one course, and no two students
in a course receive the same grade.”
Primary and Candidate Keys in SQL
•
Better definition:
CREATE TABLE Enrolled
(sid CHAR(20)
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid))
“For a given student and course, there is a single
grade.”
Foreign Keys, Referential Integrity
• Foreign key: Set of fields in one relation
that is used to `refer’ to a tuple in another
relation.
– Must correspond to the primary key of the other
relation.
– Like a `logical pointer’.
• Plays the same role as the physical pointer in IMS
• If all foreign keys in a table refer to tuples
in the other, referential integrity is achieved
(i.e., no dangling references.)
Foreign Keys in SQL
• E.g. Only students listed in the Students relation
should be allowed to enroll for courses.
– sid is a foreign key referring to Students:
CREATE TABLE Enrolled
(sid CHAR(20),cid CHAR(20),grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
11111 English102 A
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
Let’s return to our bank…
• Can we define keys for our relations?
CREATE TABLE CUSTOMERS (CustomerID INTEGER NOT NULL,
Name VARCHAR(128),
Address VARCHAR(256),
AccountID INTEGER,
PRIMARY KEY(CustomerID),
FOREIGN KEY(accountid) references ACCOUNTS);
CREATE TABLE ACCOUNTS (AccountID INTEGER NOT NULL,
Balance Double,
PRIMARY KEY (AccountID));
• Why do we need NOT NULL?
• What would happen if I executed these commands
in this order?
Let’s return to our bank…
We’ll come back to these later…
• Write a SQL query (DML) that returns
the names and account balances for all
customers that have an account balance
> 2500.
• Write a SQL query (DML) that
withdraws $300 from Frodo’s account.
Intermission
Disks, Memory, and Files
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
You are here…
Disks and Files
• DBMS stores information on disks.
– Data must be transferred to and from disk and RAM
– READ: transfer data from disk to main memory (RAM).
– WRITE: transfer data from RAM to disk.
• READ and WRITE are expensive and must be
planned carefully!
– DBMS architecture is designed to minimize both
Why Not Store Everything in Main Memory?
• Costs too much. For ~$300,
PCConnection will sell you:
– ~1GB of RAM
– ~30GB of flash
– ~1 TB of disk
• Main memory is volatile. We want data
to be saved between runs. (Obviously!)
The Storage Hierarchy
Smaller, Faster
–Main memory (RAM) for
currently used data.
–Disk for the main
database (secondary
storage).
–Tapes for archiving older
versions of the data
(tertiary storage).
Bigger, Slower
Source: Operating Systems Concepts 5th Edition
Jim Gray’s Storage Latency Analogy:
How Far Away is the Data?
10 9
Andromeda
Tape /Optical
Robot
10 6 Disk
100
10
2
1
RAM
On Board Cache
On Chip Cache
Registers
2,000 Years
Pluto
Sacramento
2 Years
1.5 hr
This Lecture Hall
10 min
This Room
My Head
1 min
Disks
• Secondary storage device of choice.
• Main advantage over tapes:
– faster time to retrieve
– random access vs. sequential.
• Data is stored and retrieved in units called
disk blocks or pages.
• Unlike RAM, time to retrieve a disk block
varies depending upon location on disk.
– Therefore, relative placement of blocks on disk
has major impact on DBMS performance!
Components of a Disk
Disk head
The platters spin (say, 120 rps).
The arm assembly is moved
in or out to position a head
on a desired track. Tracks
under heads make a cylinder
(imaginary!).
Sector
Arm movement
Only one head
reads/writes at any
one time.
Arm assembly
Block size is a multiple
of sector size (which is fixed).

Spindle
Tracks
Platters
Accessing a Disk Page
• Time to access
(read/write) a disk
block:
Transfer time
Seek time
– seek time (moving
arms to position disk
head on track)
– rotational delay
(waiting for block to
rotate under head)
– transfer time (actually
moving data to/from
disk surface)
Arm movement
Rotational delay
Accessing a Disk Page
• Seek time and rotational delay dominate.
– Seek time varies between about 0.3 and 10msec
– Rotational delay varies from 0 to 4msec
– Transfer rate around .08msec per 8K block
• Key to lower I/O cost: reduce seek/rotation
delays!
Arranging Pages on Disk
• `Next’ block concept:
– blocks on same track, followed by
– blocks on same cylinder, followed by
– blocks on adjacent cylinder
• Blocks in a file should be arranged
sequentially on disk (by `next’), to minimize
seek and rotational delay.
• For a sequential scan, pre-fetching several
pages at a time is a big win!
Summary: Disk Space Manager
• Lowest layer of DBMS software manages space
on disk (using OS file system or not?).
• Higher levels call upon this layer to:
– allocate/de-allocate a page
– read/write a page
• Best if a request for a sequence of pages is
satisfied by pages stored sequentially on disk!
– Responsibility of disk space manager.
– Higher levels don’t know how this is done, or how
free space is managed.
– Though they may make performance assumptions!
• Hence disk space manager should do a decent job.
Buffer Management
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
You are here…
Disk Space Management
DB
• Data must be in RAM for DBMS to operate on it!
• Buffer Mgr hides the fact that not all data is in RAM
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
DB
choice of frame dictated
by replacement policy
• Buffer pool information table contains:
<frame#, pageid, pin_count, dirty>
Requesting a page
Higher level DBMS
component
I need
page 3
BUFFER POOL
Buf Mgr
22
disk page
I need
page 3
3
3
free frames
MAIN MEMORY
Disk Mgr
DISK
1
2
3
… 22 … 90
If requests can be predicted (e.g., sequential scans) pages can be
pre-fetched several pages at a time!

Releasing a page
Higher level DBMS
component
I read page 3
and I’m done
with it
BUFFER POOL
Buf Mgr
22
disk page
3
free frames
MAIN MEMORY
Disk Mgr
DISK
1
2
3
… 22 … 90
Releasing a page
Higher level DBMS
component
I wrote on
page 3 and I’m
done with it
BUFFER POOL
Buf Mgr
22
disk page
3’
3’
free frames
MAIN MEMORY
Disk Mgr
DISK
1
2
3’
3
… 22 … 90
More on Buffer Management
• Requestor of page must eventually unpin it,
and indicate whether page has been
modified:
– dirty bit is used for this.
• Page in pool may be requested many times,
– a pin count is used.
– To pin a page, pin_count++
– A page is a candidate for replacement iff pin
count == 0 (“unpinned”)
• CC & recovery may entail additional I/O when
a frame is chosen for replacement.
– Write-Ahead Log protocol; more later!
Download