CMSC424: Database Design Lecture 4 CMSC424, Spring 2005

CMSC424: Database Design
Lecture 4
CMSC424, Spring 2005
Review: Relational Data Model
Key Abstraction: Relation
Mathematical relations
Given sets: R = {1, 2, 3}, S = {3, 4}
• R × S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
• A relation on R, S is any subset (⊆) of R × S (e.g., { (1, 4), (3, 4) })
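A minimal Python sketch of the two bullets above, using the slide's sets R and S. The variable names cross, rel and is_relation are illustrative only.

```python
from itertools import product

R = {1, 2, 3}
S = {3, 4}

# R × S: every ordered pair (r, s) with r in R and s in S.
cross = set(product(R, S))

# A relation on R, S is any subset of R × S.
rel = {(1, 4), (3, 4)}
is_relation = rel <= cross  # subset test

print(sorted(cross))
print(is_relation)
```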
Database relations
Given attribute domains:
Branches = { Downtown, Brighton, … }
Accounts = { A-101, A-201, A-217, … }
Balances = ℝ
Account ⊆ Branches × Accounts × Balances
= { (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }

bname     acct_no  balance
Downtown  A-101    500
Brighton  A-201    900
Brighton  A-217    500
Review: Terms and Definitions
1. Tables = Relations
2. Columns = Attributes
3. Rows = Tuples
4. Relation Schema (or Schema)
   A list of attributes and their domains
   We will require the domains to be atomic
   E.g. account(account-number, branch-name, balance)
5. Relation Instance
   A particular instantiation of a relation with actual values
   Will change with time
Bank Database: Schema
Account(bname, acct_no, balance)
Branch(bname, bcity, assets)
Depositor(cname, acct_no)
Borrower(cname, lno)
Customer(cname, cstreet, ccity)
Loan(bname, lno, amt)
Bank Database: An Instance
Account
bname     acct_no  balance
Downtown  A-101    500
Mianus    A-215    700
Perry     A-102    400
R.H.      A-305    350
Brighton  A-201    900
Redwood   A-222    700
Brighton  A-217    750

Branch
bname     bcity       assets
Downtown  Brooklyn    9M
Redwood   Palo Alto   2.1M
Perry     Horseneck   1.7M
Mianus    Horseneck   0.4M
R.H.      Horseneck   8M
Pownel    Bennington  0.3M
N. Town   Rye         3.7M
Brighton  Brooklyn    7.1M

Depositor
cname    acct_no
Johnson  A-101
Smith    A-215
Hayes    A-102
Turner   A-305
Johnson  A-201
Jones    A-217
Lindsay  A-222

Customer
cname     cstreet    ccity
Jones     Main       Harrison
Smith     North      Rye
Hayes     Main       Harrison
Curry     North      Rye
Lindsay   Park       Pittsfield
Turner    Putnam     Stanford
Williams  Nassau     Princeton
Adams     Spring     Pittsfield
Johnson   Alma       Palo Alto
Glenn     Sand Hill  Woodside
Brooks    Senator    Brooklyn
Green     Walnut     Stanford

Borrower
cname     lno
Jones     L-17
Smith     L-23
Hayes     L-15
Jackson   L-14
Curry     L-93
Smith     L-11
Williams  L-17
Adams     L-16

Loan
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Mianus    L-93  500
R.H.      L-11  900
Perry     L-16  1300
Review: Keys and Relations
As in the E/R Model:
1. Superkeys
• a set of attributes of a table for which every row has a distinct set of values
2. Candidate keys
• “minimal” superkeys
3. Primary keys
• DBA-chosen candidate keys
Act as Integrity Constraints
i.e., guard against illegal/invalid instances of a given schema
e.g., Branch = (bname, bcity, assets), with bname as the key

bname     bcity     assets
Brighton  Brooklyn  5M
Brighton  Boston    3M

Invalid!! (two rows share the key value bname = Brighton)
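A small sketch of how the invalid Branch instance above could be detected, assuming bname is the key of Branch (which is what the "Invalid!!" label suggests). The helper name is_superkey is hypothetical. Note that a check on one instance can only show a violation; a key is a constraint on every legal instance of the schema.

```python
def is_superkey(rows, attrs, key):
    """True if no two rows of this instance agree on all attributes in `key`."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        k = tuple(row[i] for i in idx)
        if k in seen:
            return False  # two rows share the same key value
        seen.add(k)
    return True

attrs = ("bname", "bcity", "assets")
bad_branch = [
    ("Brighton", "Brooklyn", "5M"),
    ("Brighton", "Boston", "3M"),
]

bname_ok = is_superkey(bad_branch, attrs, ["bname"])
pair_ok = is_superkey(bad_branch, attrs, ["bname", "bcity"])
print(bname_ok, pair_ok)
```

With bname as the declared key, the first check failing is what makes this instance illegal.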
More on Keys
Determining Primary Keys
If the relation schema is derived from E-R diagrams, we can determine the primary keys using the original entity and relationship sets
Otherwise, same way we do it for E-R diagrams
• Find candidate keys (minimal sets of attributes that can uniquely identify a tuple)
• Designate one of them to be the primary key
Foreign Keys
If a relation schema includes the primary key of another relation schema, that attribute (or set of attributes) is called a foreign key
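A hedged sketch of what a foreign key buys us. It assumes, for illustration only, that Depositor.acct_no is a foreign key referencing Account.acct_no; the lecture text does not spell out which foreign keys are declared. The helper name dangling is hypothetical.

```python
# acct_no values present in the Account instance
account_keys = {"A-101", "A-215", "A-102", "A-305", "A-201", "A-222", "A-217"}

depositor = [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
]

def dangling(referencing_rows, fk_index, referenced_keys):
    """Rows whose foreign-key value has no matching key in the referenced relation."""
    return [row for row in referencing_rows if row[fk_index] not in referenced_keys]

violations = dangling(depositor, 1, account_keys)
print(violations)  # empty list when every depositor row points at an existing account
```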
Schema Diagram for the Banking Enterprise
[Figure: schema diagram for the banking enterprise]
Relational Query Languages
Recall: Query = “Retrieval Program”
Language Examples:
Theoretical:
1. Relational Algebra
2. Relational Calculus
a. Tuple Relational Calculus (TRC)
b. Domain Relational Calculus (DRC)
Practical:
1. SQL (originally: SEQUEL from System R)
2. Quel (used in Ingres)
3. Datalog (Prolog-like – used in research lab systems)
Theoretical QL’s give semantics to Practical QL’s
Relational Algebra
Basic Operators
1. select ( σ )
2. project ( π )
3. union ( ∪ )
4. set difference ( – )
5. cartesian product ( × )
6. rename ( ρ )
[Diagram: Relation(s) → Relational Operator → Relation]
Select ( σ )
Notation: σ predicate (Relation)
Relation: Can be name of table, or another query
Predicate:
1. Simple
• attribute1 = attribute2
• attribute = constant value (also: ≠, <, >, ≤, ≥)
2. Complex
• predicate AND predicate
• predicate OR predicate
• NOT predicate
Examples:
σ bcity = “Brooklyn” (branch) =

bname     bcity     assets
Downtown  Brooklyn  9M
Brighton  Brooklyn  7.1M

σ assets > 8M (σ bcity = “Brooklyn” (branch)) =

bname     bcity     assets
Downtown  Brooklyn  9M
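The two selections above can be sketched in Python as follows. This is an illustration, not a DBMS implementation: relations are modeled as sets of tuples, assets are stored as floats in millions, and the helper name select is hypothetical.

```python
BRANCH_ATTRS = ("bname", "bcity", "assets")
branch = {
    ("Downtown", "Brooklyn", 9.0), ("Redwood", "Palo Alto", 2.1),
    ("Perry", "Horseneck", 1.7), ("Mianus", "Horseneck", 0.4),
    ("R.H.", "Horseneck", 8.0), ("Pownel", "Bennington", 0.3),
    ("N. Town", "Rye", 3.7), ("Brighton", "Brooklyn", 7.1),
}

def select(rows, attrs, pred):
    """σ: keep the tuples for which pred (given the row as a dict) is true."""
    return {row for row in rows if pred(dict(zip(attrs, row)))}

# σ bcity = "Brooklyn" (branch)
brooklyn = select(branch, BRANCH_ATTRS, lambda t: t["bcity"] == "Brooklyn")

# σ assets > 8M (σ bcity = "Brooklyn" (branch)): the input is itself a query result
rich = select(brooklyn, BRANCH_ATTRS, lambda t: t["assets"] > 8)

print(sorted(brooklyn))
print(sorted(rich))
```

Because select returns a relation, its output can be fed straight into another select, which is the point of the nested example.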
Project ( π )
Notation: π A1, …, An (Relation)
• Each Ai an attribute
• Idea: π selects columns (vs. σ which selects rows)
Examples:
π cstreet, ccity (customer) =
(duplicate rows are eliminated, since a relation is a set)

cstreet    ccity
Main       Harrison
North      Rye
Park       Pittsfield
Putnam     Stanford
Nassau     Princeton
Spring     Pittsfield
Alma       Palo Alto
Sand Hill  Woodside
Senator    Brooklyn
Walnut     Stanford

π bcity (σ assets > 5M (branch)) =

bcity
Brooklyn
Horseneck
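A sketch of projection on the customer instance, with relations modeled as Python sets of tuples (an assumption of this example). Using a set makes duplicate elimination automatic, which is why 12 customer rows yield only 10 (cstreet, ccity) pairs. The helper name project is hypothetical.

```python
CUSTOMER_ATTRS = ("cname", "cstreet", "ccity")
customer = {
    ("Jones", "Main", "Harrison"), ("Smith", "North", "Rye"),
    ("Hayes", "Main", "Harrison"), ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"), ("Turner", "Putnam", "Stanford"),
    ("Williams", "Nassau", "Princeton"), ("Adams", "Spring", "Pittsfield"),
    ("Johnson", "Alma", "Palo Alto"), ("Glenn", "Sand Hill", "Woodside"),
    ("Brooks", "Senator", "Brooklyn"), ("Green", "Walnut", "Stanford"),
}

def project(rows, attrs, keep):
    """π: keep only the listed columns; the set drops duplicate rows."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}

addresses = project(customer, CUSTOMER_ATTRS, ["cstreet", "ccity"])
print(len(customer), len(addresses))
```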
Union ( ∪ )
Notation: Relation1 ∪ Relation2
R ∪ S valid only if:
1. R, S have same number of columns (arity)
2. R, S corresponding columns have same domain (compatibility)
Example:
(π cname (depositor)) ∪ (π cname (borrower)) =

cname
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
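The union example can be sketched like this, again treating relations as sets of tuples. The union helper is hypothetical and checks only arity; domain compatibility is assumed rather than checked.

```python
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
}
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
}

def union(r, s):
    """R ∪ S, rejecting inputs whose tuples differ in arity."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("union requires relations of the same arity")
    return r | s

dep_names = {(cname,) for cname, _ in depositor}  # π cname (depositor)
bor_names = {(cname,) for cname, _ in borrower}   # π cname (borrower)
names = union(dep_names, bor_names)
print(sorted(names))
```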
Set Difference ( – )
Notation: Relation1 – Relation2
R – S valid only if:
1. R, S have same number of columns (arity)
2. R, S corresponding columns have same domain (compatibility)
Example:
(π bname (σ amt ≥ 1000 (loan))) – (π bname (σ balance ≥ 700 (account))) =

σ amt ≥ 1000 (loan):
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Perry     L-16  1300

σ balance ≥ 700 (account):
bname     acct_no  balance
Mianus    A-215    700
Brighton  A-201    900
Redwood   A-222    700
Brighton  A-217    750

Result:
bname
Downtown
Perry
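A sketch of the set difference example on the bank instance. It takes the account predicate to be balance ≥ 700, which is the predicate the intermediate account rows and the final result shown in this example are consistent with; treat that as an assumption of this sketch.

```python
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}
account = {
    ("Downtown", "A-101", 500), ("Mianus", "A-215", 700),
    ("Perry", "A-102", 400), ("R.H.", "A-305", 350),
    ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
}

# π bname (σ amt ≥ 1000 (loan))
big_loan_branches = {(bname,) for bname, _, amt in loan if amt >= 1000}

# π bname (σ balance ≥ 700 (account))  -- assumed predicate, see lead-in
big_account_branches = {(bname,) for bname, _, bal in account if bal >= 700}

result = big_loan_branches - big_account_branches
print(sorted(result))
```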
Cartesian Product ( × )
Notation: Relation1 × Relation2
R × S like cross product for mathematical relations:
• every tuple of R appended to every tuple of S
Example:
depositor × borrower =

depositor.cname  acct_no  borrower.cname  lno
Johnson          A-101    Jones           L-17
Johnson          A-101    Smith           L-23
Johnson          A-101    Hayes           L-15
Johnson          A-101    Jackson         L-14
Johnson          A-101    Curry           L-93
Johnson          A-101    Smith           L-11
Johnson          A-101    Williams        L-17
Johnson          A-101    Adams           L-16
Smith            A-215    Jones           L-17
…                …        …               …

How many tuples in the result?
A: 56 (7 depositor tuples × 8 borrower tuples)
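The tuple count can be checked with a short sketch. Relations are modeled as sets of tuples and each result tuple is a depositor tuple concatenated with a borrower tuple; the variable names are illustrative.

```python
from itertools import product

depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
}
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
}

# depositor × borrower: append every borrower tuple to every depositor tuple
cross = {d + b for d, b in product(depositor, borrower)}

print(len(depositor), len(borrower), len(cross))
```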
Rename ( ρ )
Notation: ρ identifier (Relation)
renames a relation, or
Notation: ρ identifier0 (identifier1, …, identifiern) (Relation)
renames relation and columns of n-column relation
Use:
massage relations to make ∪, – valid, or × more readable
Example:
ρ res (dcname, acctno, bcname, lno) (depositor × borrower) =

res =
dcname   acctno  bcname    lno
Johnson  A-101   Jones     L-17
Johnson  A-101   Smith     L-23
Johnson  A-101   Hayes     L-15
Johnson  A-101   Jackson   L-14
Johnson  A-101   Curry     L-93
Johnson  A-101   Smith     L-11
Johnson  A-101   Williams  L-17
Johnson  A-101   Adams     L-16
Smith    A-215   Jones     L-17
…        …       …         …
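Rename changes only the names attached to a relation, never its tuples. A minimal sketch, assuming a relation is modeled as a name, an attribute tuple and a frozenset of rows; the Relation type and rename helper are hypothetical, and only two product rows are listed to keep the example short.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    name: str
    attrs: tuple
    rows: frozenset

def rename(rel, new_name, new_attrs=None):
    """ρ: give the relation (and optionally its columns) new names."""
    if new_attrs is None:
        new_attrs = rel.attrs
    if len(new_attrs) != len(rel.attrs):
        raise ValueError("need exactly one new name per column")
    return Relation(new_name, tuple(new_attrs), rel.rows)

prod = Relation(
    "depositor_x_borrower",
    ("depositor.cname", "acct_no", "borrower.cname", "lno"),
    frozenset({
        ("Johnson", "A-101", "Jones", "L-17"),
        ("Smith", "A-215", "Jones", "L-17"),
    }),
)

res = rename(prod, "res", ("dcname", "acctno", "bcname", "lno"))
print(res.name, res.attrs)
```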
Example Query in RA
Determine lno’s for loans that are for an amount that is larger than the amt of some other loan (i.e., lno’s for all non-minimal loans).
Can do in steps:
Temp1 ← …
Temp2 ← … Temp1 …
…
1. Find the base data we need
Temp1 ← π lno,amt (loan)

lno   amt
L-17  1000
L-23  2000
L-15  1500
L-14  1500
L-93  500
L-11  900
L-16  1300

2. Make a copy of (1)
Temp2 ← ρ Temp2 (lno2,amt2) (Temp1)

lno2  amt2
L-17  1000
L-23  2000
L-15  1500
L-14  1500
L-93  500
L-11  900
L-16  1300

3. Take the cartesian product of 1 and 2
Temp3 ← Temp1 × Temp2

lno   amt   lno2  amt2
L-17  1000  L-17  1000
L-17  1000  L-23  2000
…     …     …     …
L-17  1000  L-16  1300
L-23  2000  L-17  1000
L-23  2000  L-23  2000
…     …     …     …
L-23  2000  L-16  1300
…     …     …     …

4. Select non-minimal loans
Temp4 ← σ amt > amt2 (Temp3)
5. Project on lno
Result ← π lno (Temp4)
… or, if you prefer…
π lno (σ amt > amt2 (π lno,amt (loan) × (ρ Temp2 (lno2,amt2) (π lno,amt (loan)))))
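The five steps can be traced in Python on the Loan instance. This is a sketch with relations as sets of tuples; since tuples are positional here, the rename in step 2 reduces to making a copy, with positions 2 and 3 of the product playing the roles of lno2 and amt2.

```python
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}

# 1. Temp1 ← π lno,amt (loan)
temp1 = {(lno, amt) for _, lno, amt in loan}

# 2. Temp2 ← ρ Temp2 (lno2,amt2) (Temp1): same tuples under new names
temp2 = set(temp1)

# 3. Temp3 ← Temp1 × Temp2, columns (lno, amt, lno2, amt2)
temp3 = {a + b for a in temp1 for b in temp2}

# 4. Temp4 ← σ amt > amt2 (Temp3)
temp4 = {t for t in temp3 if t[1] > t[3]}

# 5. Result ← π lno (Temp4)
result = {(t[0],) for t in temp4}

print(len(temp3), sorted(result))
```

Only L-93, the loan with the smallest amt (500), never appears on the larger side of a pair, so it is the one lno missing from the result.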
What we learned so far…
Relational Algebra Operators
1. Select
2. Project
3. Set Union
4. Set Difference
5. Cartesian Product
6. Rename
These are called fundamental operations
Formal Definition
Basic expression
A relation in the database
A constant relation
e.g. {(A-101, Downtown, 500), (A-215, Mianus, 700)…}
Let E1 and E2 be two relational-algebra expressions, then the following are also:
1. σP(E1), where P is a predicate on attributes in E1
2. πS(E1), where S is a list containing some attributes in E1
3. E1 ∪ E2
4. E1 – E2
5. E1 × E2
6. ρx(E1), where x is the new name for the result of E1
Relational Algebra
Redundant Operators
1. Natural Join ( ⋈ )
2. Division ( ÷ )
3. Outer Joins ( ⟕ ⟖ ⟗ )
4. Update ( ← ) (we’ve already been using)
• Redundant: Above can be expressed in terms of minimal RA
e.g. depositor ⋈ borrower =
π …(σ…(depositor × ρ…(borrower)))
• Added as convenience
Natural Join
Notation: Relation1 ⋈ Relation2
Idea: combines ρ, ×, σ
[Figure: example tables r and s, and the result r ⋈ s]
depositor ⋈ borrower
≡ π cname,acct_no,lno (σ cname=cname2 (depositor × ρ t(cname2,lno) (borrower)))
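The equivalence for depositor ⋈ borrower can be followed step by step in a short sketch (relations as sets of tuples; with positional tuples, the rename of borrower.cname to cname2 only affects how we refer to the column).

```python
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
}
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
}

# ρ t(cname2, lno) (borrower): same tuples, columns now called (cname2, lno)
t = set(borrower)

# depositor × t, columns (cname, acct_no, cname2, lno)
prod = {d + b for d in depositor for b in t}

# σ cname = cname2
matched = {row for row in prod if row[0] == row[2]}

# π cname, acct_no, lno
joined = {(row[0], row[1], row[3]) for row in matched}

print(sorted(joined))
```

Only customers who appear in both relations survive the selection, which is why the 56-tuple product shrinks to a handful of rows.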
Division
Notation: Relation1 ÷ Relation2
Idea: expresses “for all” queries

r:
A  B
α  1
α  2
α  3
β  1
γ  1
γ  3
γ  4
γ  6
δ  1
δ  2

s:
B
1
2

r ÷ s =
A
α
δ

Query: Find values for A in r which have corresponding B values for all B values in s
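A direct sketch of the "for all" reading of division on the r and s of this example. The helper name divide is hypothetical and handles only the two-column case r(A, B) ÷ s(B).

```python
r = {
    ("α", 1), ("α", 2), ("α", 3),
    ("β", 1),
    ("γ", 1), ("γ", 3), ("γ", 4), ("γ", 6),
    ("δ", 1), ("δ", 2),
}
s = {(1,), (2,)}

def divide(r, s):
    """r(A, B) ÷ s(B): A values that are paired in r with every B value in s."""
    candidates = {a for a, _ in r}
    return {(a,) for a in candidates if all((a, b) in r for (b,) in s)}

quotient = divide(r, s)
print(sorted(quotient))
```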
Division
Another way to look at it: ÷ and ×
Integer division: 17 ÷ 3 = 5
The largest value of i such that: i × 3 ≤ 17
Relational Division: r ÷ s = t
The largest value of t such that: (t × s ⊆ r)
(with r and s as in the previous example, t = {α, δ})
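The "largest t such that t × s ⊆ r" view can be checked with the standard rewriting of division in terms of the basic operators, r ÷ s = π A (r) – π A ((π A (r) × s) – r). The sketch below applies that identity to the two-column r and s from the division example and then verifies both containment and maximality; variable names are illustrative.

```python
r = {
    ("α", 1), ("α", 2), ("α", 3),
    ("β", 1),
    ("γ", 1), ("γ", 3), ("γ", 4), ("γ", 6),
    ("δ", 1), ("δ", 2),
}
s = {(1,), (2,)}

proj_a = {(a,) for a, _ in r}                        # π A (r)
all_pairs = {(a, b) for (a,) in proj_a for (b,) in s}  # π A (r) × s
missing = all_pairs - r                              # pairs r would need but lacks
t = proj_a - {(a,) for a, _ in missing}              # drop A values with a gap

# Containment: t × s ⊆ r
t_times_s = {(a, b) for (a,) in t for (b,) in s}
contained = t_times_s <= r

# Maximality: adding any other A value of r breaks containment
maximal = all(
    not ({(a, b) for (b,) in s} <= r)
    for (a,) in proj_a - t
)

print(sorted(t), contained, maximal)
```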
Division
A More Complex Example

r:
A  B  C  D  E
α  a  α  a  1
α  a  γ  a  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1

s:
D  E
a  1
b  1

t = r ÷ s:
A  B  C
α  a  γ
γ  a  γ