Spring 2007 Midterm 1 Review Lectures 2-10 Cow book Chapters 1,3,4,5,8,9,10,11

advertisement
Spring 2007 Midterm 1 Review
Lectures 2-10
Cow book Chapters
1,3,4,5,8,9,10,11
Administrivia
• Midterm 1 – in class this Thursday!
– Closed book examination
– You will be allowed one 8.5” x 11” sheet of
notes (double sided).
• Sample questions on class web site
Review Outline
• Relational Data Model, Algebra, Calculus
and SQL
• Storage, Buffer Management and
Indexes
Review: DBMS components
•Talks to DBMS to manage data for a specific task
Database application
Query Optimization
and Execution
-> e.g. app to withdraw/deposit money or provide
a history of the account
•Figures out the best way to answer a question
-> There is always more than 1 way to skin a cat…!
•Provides generic ways to combine data
Relational Operators
Access Methods
-> Do you want a list of customers and accounts or
the total account balance of all customers?
•Provides efficient ways to extract data
-> Do you need 1 record or a bunch?
•Makes efficient use of RAM
Buffer Management
-> Think 1,000,000 simultaneous requests!
•Makes efficient use of disk space
Disk Space Management
DB
-> Think 300,000,000 accounts!
Review: ACID properties
• A DBMS ensures a database has ACID properties:
• Atomicity – nothing is ever half baked; database
changes either happen or they don’t.
• Consistency – you can’t peek at the data til it is
baked; database changes aren’t visible until they
are committed
• Isolation – concurrent operations have an
explainable outcome; multiple users can operate
on a database without conflicting
• Durability – what’s done is done; once a database
operation completes, it remains even if the database
crashes
Review: Relational Data Model
• Most widely used data model.
• Relation: made up of 2 parts:
– Schema : specifies name of relation, plus name and type of
each column.
• e.g.
Students(sid: string, name: string, login: string, age: integer, gpa:
real)
– Instance : a table, with rows and columns described by the
schema
• Introduced data independence
– Data layout on disk can change without affecting
applications using the data
• Keys contribute to data independence
– Relationships are determined by field value, not physical
pointers!
Review: Bank of Middle Earth
CustomerID
Name
Address
AccountID
Account
ID
Balance
314159
Frodo
Baggins
BagEnd
112358
112358
4500.00
132124
2000.00
271828
Sam
Gamgee
BagShot
Row
132124
42
Bilbo
Baggins
Rivendell
112358
Give me an example of…
•A super key for Accounts
•Good primary key choices for both
•A foreign key
•A possible check constraint
ALTER TABLE
ACCOUNTS
ADD
CONSTRAINT
CHECK_BAL
CHECK
(BALANCE>= 0)
Review: Query Languages
• Query languages provide 2 key advantages:
– Less work for user asking query
– More opportunities for optimization
• Algebra and safe calculus are simple and powerful models
for query languages for relational model
– Have same expressive power
– Algebra is more operational; calculus is more declarative
• SQL can express every query that is expressible in
relational algebra/calculus. (and more)
• Two sublanguages:
– DDL – Data Definition Language
• Define and modify schema (at all 3 levels)
– DML – Data Manipulation Language
• Queries and IUD (insert update delete)
Review: Basic DDL
CREATE TABLE CUSTOMERS
(CustomerID INTEGER NOT NULL,
Name VARCHAR(128),
Address VARCHAR(256),
AccountID INTEGER,
PRIMARY KEY(CustomerID),
FOREIGN KEY(AccountId)
REFERENCES ACCOUNTS);
CREATE TABLE ACCOUNTS
(AccountID INTEGER NOT NULL,
Balance Double,
PRIMARY KEY (AccountID));
Customer
ID
Name
Address
Account
ID
314159
Frodo
Baggins
BagEnd
112358
271828
Sam
Gamgee
BagShot
Row
132124
42
Bilbo
Baggins
Rivendell
112358
Account
ID
Balance
112358
4500.00
132124
2000.00
• Why do we need NOT NULL?
• What would happen if I executed these commands in this order?
Relational Algebra Review
Reserves
sid
22
58
bid
101
103
day
10/10/96
11/12/96
Basic operations:
•Selection ( σ )
•Projection ( π )
•Cross-product (  )
•Set-difference ( — )
•Union (  )
Sailors
Boats
sid
22
31
58
bid
101
102
103
104
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10
35.0
bname
Interlake
Interlake
Clipper
Marine
color
Blue
Red
Green
Red
: gives a subset of rows.
: deletes unwanted columns.
: combine two relations.
: tuples in relation 1, but not 2
: tuples in relation 1 appended with tuples in relation 2.
Additional operations:
•Intersection ()
:tuples that appear in both relations.
•Join (
)
:like  but only keep tuples where common fields are equal.
•Division ( / )
:tuples from relation 1 with matches in relation 2
Relational Algebra Review
Reserves
sid
22
58
bid
101
103
day
10/10/96
11/12/96
Sailors
Boats
sid
22
31
58
bid
101
102
103
104
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10
35.0
bname
Interlake
Interlake
Clipper
Marine
color
Blue
Red
Green
Red
Find names of sailors who’ve reserved a green boat
(σ color=‘Green’Boats)
(
( πsname
( πsid
(
( πbid
)
Reserves)
)
Sailors)
)
Relational Algebra Review
Sailors
Reserves
Boats
sid
sname
rating
age
1
Frodo
7
22
2
Bilbo
2
39
3
Sam
8
27
sid
bid
day
bid
bname
color
1
103
9/12
101
Nina
red
2
103
9/13
103
Pinta
blue
3
103
9/14
3
101
9/12
1
103
9/13
Find names of sailors who’ve reserved all boats
•First use division and renaming to find sids of sailors who reserved all boats
•Then join result with sailors and project to get their names
(
ρ
(Tempsids,
πsid,bid Reserves)
sid
bid
1
103
2
103
3
101
3
103
( π bid Boats)
/
πsname ( ( Tempsids
bid
101
103
Sailors) )
Tempsids
sid
=
3
)
Relational Calculus Review
• Variables
TRC: Variables are bound to tuples.
DRC: Variables are bound to domain elements (= column values)
• Constants
7, “Foo”, 3.14159, etc.
• Comparison operators
=, <>, <, >, etc.
• Logical connectives
 - not
 – and
 - or
 - implies
 - is a member of
• Quantifiers
X(p(X)): For every X, p(X) must be true
X(p(X)): There exists at least one X such that p(X) is true
Relational Calculus Review
Find names of sailors who have
reserved a green boat
{ N | S  Sailors
(S.name = N.name 
R  Reserves(S.sid = R.sid 
B  Boats(B.color = “Green” 
B.bid = R.bid)))}
Sailors
sid
S 22
S 31
S 58
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10
35.0
Reserves
R
R
sid
22
58
bid
101
103
day
10/10/96
11/12/96
Boats
sname
N
rusty
B
B
B
B
bid
101
102
103
104
bname
Interlake
Interlake
Clipper
Marine
color
Blue
Red
Green
Red
Relational Calculus Review
Boats
Sailors
sid
sname
rating
age
S 1
Frodo
7
22
S 2
Bilbo
2
39
3
Sam
8
27
S
sid
bid
day
bid
bname
color
R
1
103
9/12
B
101
Nina
red
R
2
103
9/13
B
103
Pinta
blue
R
3
103
9/14
R
3
101
9/12
R
1
103
9/13
Find names of sailors who’ve reserved all boats
N
Reserves
{N | SSailors (S.name = N.name 
BBoats (RReserves
sname
(S.sid = R.sid
Sam
 B.bid = R.bid))}
Basic SQL Query
DISTINCT: optional keyword indicating target-list : A list of attributes
answer should not contain duplicates.
of tables in relation-list
In SQL, default is that duplicates
are not eliminated! (Result is called
a “multiset”)
SELECT
[DISTINCT] target-list
FROM
relation-list
WHERE
qualification
qualification : Comparisons
combined using AND, OR and
NOT. Comparisons are Attr op
const or Attr1 op Attr2, where op is
one of ,,,, etc.
relation-list : A list of relation
names, possibly with a rangevariable after each name
Set Operators in SQL
• UNION
Set operators are
almost always used
with nested queries
– Returns the UNION of two sets with (same arity)
– UNION ALL retains duplicates in result
• INTERSECT
– Returns the INTERSECTION of two sets (with same arity)
• EXCEPT
– Set difference: A EXCEPT B returns tuples in A but not B
• IN/NOT IN
– A in B is true if A is a subset of B
• EXISTS/NOT EXISTS
– True if expression evaluates to a set with at least one member
• UNIQUE/NOT UNIQUE
– True if expression evaluates to a set with no duplicates
• Value <comparison op> ANY/ALL
– Value > ANY A is true if A contains at least one member that makes the
comparison true
– Value > ALL B is true if all members of A make the comparison true
SQL Review: Nested query
Find the names of sailors who’ve reserved boat #103
exactly once
SELECT S.sname
FROM Sailors S
WHERE UNIQUE (SELECT sid, bid
FROM Reserves R
WHERE R.bid=103 AND S.sid=R.sid)
1
2
3
Sailors
Reserves
sid sname rating age
sid
bid
day
S
1
Frodo 7
22
1
103
9/12
S
2
Bilbo
2
39
2
103
9/13
S
3
Sam
8
27
1
103
9/13
Aggregate Operators
Often used with
GROUP BY and
HAVING clauses
• Very powerful; enables computations over sets of tuples
• COUNT: returns a count of
tuples in the set
• AVG: returns average of column
values in the set
• SUM: returns sum of column
values in the set
• MIN, MAX: returns min (max)
value of column values in a set.
• DISTINCT can be added to
COUNT, AVG, SUM to perform
computation only over distinct
values.
SELECT COUNT (*)
FROM Sailors S
SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating=10
SELECT AVG(DISTINCT S.age)
FROM Sailors S
WHERE S.rating=10
Sailors who have reserved
all boats
Sailors
sid sname rating
age
1
Frodo
7
22
2
Bilbo
2
39
3
Sam
8
27
SELECT S.name
FROM Sailors S, reserves R
WHERE S.sid = R.sid
GROUP BY S.name, S.sid
HAVING COUNT(DISTINCT R.bid) =
( Select COUNT (*) FROM Boats)
Boats
sname
sid
Frodo
bid
bname
color
101
Nina
red
bid
102
Pinta
blue
1
102
103
Santa Maria red
Bilbo
2
101
Bilbo
2
102
sname
sid
count
Frodo
1
102
Frodo
1
1
count
sid
bid
day
Bilbo
2
103
Bilbo
2
3
3
1
102
9/12
2
102
9/12
Reserves
sname
sid
bid
2
101
9/14
Frodo
1
102,102
1
102
9/10
Bilbo
2
101, 102, 103
2
103
9/13
Review: Storage
• A DBMS is like an ogre; it has layers
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Disks and Files
• DBMS stores information on disks. Why?
• To work with information, DBMS moves data to
RAM.
– READ: transfer data from disk to main memory (RAM).
– WRITE: transfer data from RAM to disk.
• READ and WRITE are expensive. Why?
– must be planned carefully!
– DBMS architecture is designed to minimize both
The Storage Hierarchy
Smaller, Faster
–Main memory (RAM) for
currently used data.
–Disk for the main
database (secondary
storage).
–Tapes for archiving older
versions of the data
(tertiary storage).
Bigger, Slower
Source: Operating Systems Concepts 5th Edition
Components of a Disk
Disk head
The platters spin (say, 120 rps).
The arm assembly is moved
in or out to position a head
on a desired track. Tracks
under heads make a cylinder
(imaginary!).
Sector
Arm movement
Only one head
reads/writes at any
one time.
Arm assembly
Block size is a multiple
of sector size (which is fixed).

Spindle
Tracks
Platters
Disks are slow. Why?
• Time to access
(read/write) a disk
block:
Transfer time
Seek time
– seek time (moving
arms to position disk
head on track)
– rotational delay
(waiting for block to
rotate under head)
– transfer time (actually
moving data to/from
disk surface)
Arm movement
Rotational delay
Disk Space Manager
• Lowest layer of DBMS software manages space
on disk (using OS file system or not?).
• Higher levels call upon this layer to:
– allocate/de-allocate a page
– read/write a page
• Best if a request for a sequence of pages is
satisfied by pages stored sequentially on disk!
– Responsibility of disk space manager.
– Higher levels don’t know how this is done, or how
free space is managed.
– Though they may make performance assumptions!
• Hence disk space manager should do a decent job.
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
DB
choice of frame dictated
by replacement policy
• Buffer pool information table contains:
<frame#, pageid, pin_count, dirty>
Buffer Management
• Keeps a group a disk pages in memory
• Records whether each is pinned
– What happens when all pages pinned?
– Whan happens when a page is unpinned?
• Keeps track of whether pages are dirty
Buffer Management – Replacement
• What if all frames are used, but not
pinned, and a new page is requested?
• What pages are candidates for
replacement?
• How is the replaced page chosen?
Replacement Policies
• Least Recently Used (LRU)
• Most Recently Used (MRU)
• Clock
• Advantages? Disadvantages?
What is in Database Pages?
• Database contains files, which are made
up of…
• Pages, which are made up of…
• Records, which are made up of…
• Fields, which hold single values.
How are records organized?
• It depends on whether fields variable,
or fixed length
• In Minibase, array of type/offsets,
followed by data.
F1
F2
F3
Array of Field Offsets
F4
How are pages organized?
• It depends on whether records variable, fixed length.
• In Minibase, slot array at beginning of page, records
compacted at end of page.
• What happens if record deleted?
Rid = (i,N)
Page i
Rid = (i,2)
Rid = (i,1)
20
N
...
16
2
SLOT DIRECTORY
24
N
1 # slots
Pointer
to start
of free
space
How are files organized?
• Unordered Heap File: chained directory
pages, containing records that point to
data pages.
Data
Page 1
Header
Page
Data
Page 2
DIRECTORY
Data
Page N
Several possible file organizations
•
•
•
•
•
Heap Files
Sorted Files
Clustered Indexes
Unclustered Index + regular file
What are the tradeoffs?
– Scan
– Sort
– Equality Search
– Range Search
– Insertion/Deletion
Indexes
• Can be used to store data records (alt 1), or
be an auxillary data structure that referrs to
existing file of records (alt 2, 3)
• Many types of index (B-Tree, Hash Table, RTree, etc.)
• How do you choose the right index?
• Difference between clustered and unclustered
indexes?
Clustered vs. Unclustered Index
• Suppose that Alternative (2) is used for data entries,
and that the data records are stored in a Heap file.
– To build clustered index, first sort the Heap file (with some
free space on each block for future inserts).
– Overflow blocks may be needed for inserts. (Thus, order of
data recs is `close to’, but not identical to, the sort order.)
CLUSTERED
Index entries
direct search for
data entries
Data entries
UNCLUSTERED
Data entries
(Index File)
(Data file)
Data Records
Data Records
B-Trees: a common, flexible index
• What is a B-Tree?
• What goes in an index (interior) node?
• What goes in a leaf node?
• How do insertions and deletions work?
Any Questions?
See you here on Thursday…
Download