Review of relational model and SQL

CS 440
Database Management Systems
Lecture 2: Review of Relational Model and SQL
Announcements
• We have posted the first assignment.
– Written assignment
– Due on January 14th
• Practice questions with solutions will be posted today after class.
– The textbook website has solutions for some of its problems.
• Piazza is up and running.
– It is the preferred way to communicate with the course staff.
– If you have not been invited, email our TA: subraman@onid.oregonstate.edu
Relational Database Management
[Figure: layers of relational database management]
• Conceptual Design: Entity Relationship (ER) Model
• Schema: Relational Model
• Physical Layer: Files and Indexes
• This lecture: Relational Model & SQL
Relational Model
• Relational model defines:
– a way of organizing data: relations
– operations to query and/or manipulate the data
• Much easier to use than procedural languages.
– Say what you want instead of how to compute it.
• Everything is a relation.
– Both the stored data and the result of a query are relations.
Relation: example
Relation name: Book
Attribute names: Title, Price, Category, Year
Tuples:
Title          Price    Category  Year
MySQL          $102.1   computer  2001
Cell biology   $201.69  biology   1954
French cinema  $53.99   art       2002
NBA History    $63.65   sport     2010
Relation
• Attributes
– Atomic values
– atomic types: string, integer, real, date, …
• Each relation must have keys
– A key is a set of attributes whose values never repeat across tuples.
– A relation does not contain duplicate tuples.
• Reordering tuples does not change the relation.
• Reordering attributes does not change the relation.
Database Schema vs. Database Instance
• Schema of a Relation
– Name of the relation and names of its attributes.
– E.g.: Person(Name, Address, SSN)
– Types of the attributes
– Constraints on the values of the attributes
• Schema of the database
– Set of relation schemata
– E.g.: Person(Name, Address, SSN), Employment(Company, SSN)
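To make the schema/instance distinction concrete, here is a minimal sketch that declares the Person and Employment schemata in SQLite through Python's sqlite3 module. The column types, the key constraints, and the sample row are assumptions made for illustration; the slides only give the relation and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema: relation names, attribute names, types, and constraints.
# The types and key constraints below are assumed, not taken from the slides.
conn.executescript("""
    CREATE TABLE Person (
        Name    TEXT NOT NULL,
        Address TEXT,
        SSN     TEXT PRIMARY KEY
    );
    CREATE TABLE Employment (
        Company TEXT NOT NULL,
        SSN     TEXT NOT NULL REFERENCES Person(SSN)
    );
""")

# Instance: the tuples currently stored under that schema (hypothetical data).
conn.execute("INSERT INTO Person VALUES ('Ann', '1 Main St.', '000-00-0001')")

# The schema can be inspected independently of the instance.
person_columns = [row[1] for row in conn.execute("PRAGMA table_info(Person)")]
person_rows = conn.execute("SELECT * FROM Person").fetchall()
print(person_columns)
print(person_rows)
```

Inserting or deleting rows changes the instance only; the schema stays the same until a table definition is altered.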
Database Schema vs. Database Instance
• Schema: Book(Title, Price, Category, Year)
• Instance:
Title          Price    Category  Year
MySQL          $102.1   computer  2001
Cell biology   $201.69  biology   1954
French cinema  $53.99   art       2002
NBA History    $63.65   sport     2010
Relational algebra: operations on relations
• Basic operations:
– Selection (σ): Selects a subset of rows from a relation.
– Projection (π): Deletes unwanted columns from a relation.
– Cross-product (×): Allows us to combine two relations.
– Set-difference (−): Tuples in relation 1, but not in relation 2.
– Union (∪): Tuples in relation 1 or in relation 2.
• Additional operations:
– Intersection, join, …: Not essential, but (very!) useful.
• Since each operation returns a relation, operations can be composed. (Algebra is “closed”.)
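The basic operations can be sketched in a few lines of Python by modeling a relation as a set of tuples. This is only an illustration of the definitions above, not how a DBMS implements them; the helper names (`select`, `project`, `cross`) and the sample Sells(bar, beer, price) tuples are hypothetical.

```python
from itertools import product

# A relation is modeled as a set of tuples with a fixed attribute order.
# Hypothetical Sells(bar, beer, price) tuples.
sells = {("Blind pig", "Bud", 9), ("Quality", "Bud", 15)}
more_sells = {("Quality", "Bud", 15), ("Joe Bar", "Miller", 7)}

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, positions):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(t[i] for i in positions) for t in relation}

def cross(r, s):
    """Cross-product: pair every tuple of r with every tuple of s."""
    return {a + b for a, b in product(r, s)}

cheap = select(sells, lambda t: t[2] < 10)      # selection on price
bars = project(sells, [0])                      # projection on bar
union = sells | more_sells                      # union
difference = sells - more_sells                 # set-difference
pairs = cross(bars, project(more_sells, [1]))   # cross-product

# Closure: the output of one operation is a valid input to the next.
cheap_bars = project(select(sells, lambda t: t[2] < 10), [0])
```

Because every helper returns a set of tuples, the calls nest freely, which is what "closed" means here.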
Example Schema
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Projection
• Deletes attributes that are not in the projection list.
• Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

Likes:
drinker  beer
John     Bud Lite
Alice    Bud
Smith    Bud
Ron      Bud Lite

π_drinker(Likes):
drinker
John
Alice
Smith
Ron
Selection
• Selects rows that satisfy the selection condition.
• Schema of result identical to schema of (only) input relation.
• Result relation can be the input for another relational algebra operation! (Operator composition.)

Sells:
bar        beer  price
Blind pig  Bud   9
Quality    Bud   15

σ_price<10(Sells):
bar        beer  price
Blind pig  Bud   9
Union, Intersection, Set-Difference
• All of these operations take two input relations, which must be union-compatible:
– Same number of fields.
– “Corresponding” fields have the same type.
• What is the schema of result?
Cross-Product
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1.

Bars:
name       addr     license
Blind pig  1st St.  201
Quality    2nd St.  302

Frequents:
drinker  bar
John     Blind pig
Alice    Quality

Bars × Frequents:
name       addr     license  drinker  bar
Blind pig  1st St.  201      John     Blind pig
Quality    2nd St.  302      Alice    Quality
Blind pig  1st St.  201      Alice    Quality
Quality    2nd St.  302      John     Blind pig
Joins
R ⋈_c S = σ_c(R × S)

Bars:
name       addr     license
Blind pig  1st St.  201
Quality    2nd St.  302

Frequents:
drinker  bar
John     Blind pig
Alice    Quality

Bars ⋈_{Bars.name=Frequents.bar} Frequents:
name       addr     license  drinker  bar
Blind pig  1st St.  201      John     Blind pig
Quality    2nd St.  302      Alice    Quality
Joins
• Result schema is the same as that of the cross-product.
• Usually has fewer tuples than the cross-product, so it might be computed more efficiently.
• If the condition consists only of equalities, the join is called an equijoin.
• Natural Join: Equijoin on all common fields.
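A small sketch of the join definition, using the Bars and Frequents tuples from the slides in an in-memory SQLite database. The column types are assumptions. It checks that a selection over the cross-product and an explicit JOIN return the same tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bars (name TEXT, addr TEXT, license INTEGER);
    CREATE TABLE Frequents (drinker TEXT, bar TEXT);
    INSERT INTO Bars VALUES ('Blind pig', '1st St.', 201), ('Quality', '2nd St.', 302);
    INSERT INTO Frequents VALUES ('John', 'Blind pig'), ('Alice', 'Quality');
""")

# Cross-product: every Bars tuple paired with every Frequents tuple.
cross_product = conn.execute("SELECT * FROM Bars, Frequents").fetchall()

# Selection applied to the cross-product ...
filtered = conn.execute(
    "SELECT * FROM Bars, Frequents "
    "WHERE Bars.name = Frequents.bar ORDER BY Bars.name"
).fetchall()

# ... gives the same tuples as the explicit (equi)join.
joined = conn.execute(
    "SELECT * FROM Bars JOIN Frequents "
    "ON Bars.name = Frequents.bar ORDER BY Bars.name"
).fetchall()

print(len(cross_product), filtered == joined)
```

The two forms describe the same result; how the system actually computes it is left to the query processor.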
SQL
• A declarative language for querying data stored in relational databases
– Implements relational algebra with slight modifications.
• Many standards: SQL92, SQL99, …
– We focus on the core functionalities.
The Basic Form
SELECT returned attribute(s)    (one or more)
FROM relation(s)                (one or more)
WHERE conditions on the tuples of the relation(s)
1. Apply the WHERE clause’s conditions to the tuples of the relations in the FROM clause.
2. Return the values of the attributes in the SELECT clause.
Single Relation Query
What beers are made by Anheuser-Busch?
Beers:
name      manf
Bud       Anheuser-Busch
Bud Lite  Anheuser-Busch
Bud 2.0   Adams

SELECT name
FROM Beers
WHERE manf = 'Anheuser-Busch';

Result:
name
Bud
Bud Lite
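The single-relation query can be tried out with Python's sqlite3 module. This is a minimal sketch that loads the three Beers tuples from the slide into an in-memory database; the TEXT column types and the ORDER BY (added only to make the output order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name TEXT, manf TEXT);
    INSERT INTO Beers VALUES
        ('Bud', 'Anheuser-Busch'),
        ('Bud Lite', 'Anheuser-Busch'),
        ('Bud 2.0', 'Adams');
""")

# The ? placeholder is filled in by sqlite3 with the value passed alongside.
query = "SELECT name FROM Beers WHERE manf = ? ORDER BY name"
names = [row[0] for row in conn.execute(query, ("Anheuser-Busch",))]
print(names)
```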
Using *
What beers are made by Anheuser-Busch?
SELECT *
FROM Beers
WHERE manf = 'Anheuser-Busch';

Result:
name      manf
Bud       Anheuser-Busch
Bud Lite  Anheuser-Busch
WHERE clause
• May have complex conditions
• Logical operators: OR, AND, NOT
• Comparison operators: <, >, =, <>, …
• Type-specific operators: LIKE, …
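As an illustration of these operators, the sketch below combines LIKE, AND, OR, NOT, and comparisons in one WHERE clause. The Sells tuples are hypothetical, and SQLite is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Sells VALUES
        ('Blind pig', 'Bud', 9),
        ('Quality', 'Bud', 15),
        ('Quality', 'Bud Lite', 8),
        ('Joe Bar', 'Miller', 7);
""")

rows = conn.execute("""
    SELECT bar, beer, price
    FROM Sells
    WHERE beer LIKE 'Bud%'                 -- string pattern: names starting with Bud
      AND (price < 10 OR bar = 'Quality')  -- comparison combined with OR
      AND NOT bar = 'Blind pig'            -- NOT applied to a comparison
    ORDER BY price
""").fetchall()
print(rows)
```

Parentheses make the intended grouping explicit, since AND binds more tightly than OR.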
Null Values
• Some tuples may not contain a value for some of their attributes
– The operator did not enter the data
– The operator did not know the value
– …
• Ex: We do not know Fred’s salary.
– Storing 0.0 would be misleading: Fred is not on unpaid leave!
• Databases use the null value (NULL) for these cases
A value not like any other value!
• A tuple in the Sells relation:
bar      beer  price
Joe Bar  Bud   NULL

SELECT *
FROM Sells
WHERE price < 0.0 OR price >= 0.0;

Does not return the Joe Bar tuple: a comparison with NULL is never true.
A value not like any other value!
• A tuple in the Sells relation:
bar      beer  price
Joe Bar  Bud   NULL

SELECT *
FROM Sells
WHERE price IS NULL;

Returns the Joe Bar tuple.
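Both NULL queries can be checked side by side. The sketch below stores the Joe Bar tuple with a NULL price plus one hypothetical tuple with a known price, again assuming SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Sells VALUES ('Joe Bar', 'Bud', NULL), ('Quality', 'Bud', 15);
""")

# Neither comparison is true for a NULL price, so Joe Bar is filtered out.
compared = conn.execute(
    "SELECT bar FROM Sells WHERE price < 0.0 OR price >= 0.0"
).fetchall()

# IS NULL is the dedicated test for missing values.
is_null = conn.execute("SELECT bar FROM Sells WHERE price IS NULL").fetchall()

print(compared, is_null)
```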
Multi Relation Query: Join
• Join queries find relationships between different types of entities; these results often have more business value!
• Ex: Using relations Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Joe Bar.
SELECT Likes.beer
FROM Likes, Frequents
WHERE Frequents.bar = 'Joe Bar' AND
      Frequents.drinker = Likes.drinker;
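Here is the join query run against a few hypothetical Likes and Frequents tuples in SQLite. The data and the added ORDER BY are assumptions made so the result is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes (drinker TEXT, beer TEXT);
    CREATE TABLE Frequents (drinker TEXT, bar TEXT);
    INSERT INTO Likes VALUES ('John', 'Bud'), ('Alice', 'Miller'), ('Ron', 'Bud Lite');
    INSERT INTO Frequents VALUES
        ('John', 'Joe Bar'), ('Alice', 'Joe Bar'), ('Ron', 'Quality');
""")

beers = [row[0] for row in conn.execute("""
    SELECT Likes.beer
    FROM Likes, Frequents
    WHERE Frequents.bar = 'Joe Bar'
      AND Frequents.drinker = Likes.drinker
    ORDER BY Likes.beer
""")]
print(beers)
```

Ron's beer is missing from the result because Ron does not frequent Joe Bar.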
Join Queries
• Generally require processing a large number of tuples, which is time consuming.
• Relational Database Management Systems (RDBMS) have ways to process them efficiently.
– We talk more about this later in the course.
Subqueries
• SQL queries that appear in WHERE or FROM
parts of another query.
• Example: Using Sells(bar, beer, price), find the
bars that serve Miller for the same price Joe Bar
charges for Bud.
– Figure out Joe’s price for Bud : JoePrice
– Find bars that offer Miller at price = JoePrice
Subqueries
SELECT bar
FROM Sells
WHERE beer = 'Miller' AND price =
    (SELECT price          -- subquery: Joe Bar's price for Bud
     FROM Sells
     WHERE bar = 'Joe Bar'
       AND beer = 'Bud');
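The two-step plan (find Joe Bar's price for Bud, then match it against Miller prices) can be checked on a small hypothetical Sells instance. SQLite and the sample prices are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Sells VALUES
        ('Joe Bar', 'Bud', 8),
        ('Quality', 'Miller', 8),
        ('Blind pig', 'Miller', 9);
""")

# Step 1 on its own: Joe Bar's price for Bud.
joe_price = conn.execute(
    "SELECT price FROM Sells WHERE bar = 'Joe Bar' AND beer = 'Bud'"
).fetchone()[0]

# Steps 1 and 2 combined with a subquery in the WHERE clause.
bars = conn.execute("""
    SELECT bar
    FROM Sells
    WHERE beer = 'Miller' AND price =
        (SELECT price FROM Sells WHERE bar = 'Joe Bar' AND beer = 'Bud')
""").fetchall()
print(joe_price, bars)
```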
Subqueries: ALL, ANY
• We would like to compare a value to a set of values.
• Example: Using Sells(bar, beer, price), find the bars that serve Miller for a cheaper price than the price that every bar charges for Bud.
– Figure out the set of all prices for Bud: BudPrice.
– Find the bars that offer Miller at a cheaper price than all values in BudPrice.
Subqueries: ALL, ANY
SELECT bar
FROM Sells
WHERE beer = 'Miller' AND price < ALL
    (SELECT price          -- subquery: all prices for Bud
     FROM Sells
     WHERE beer = 'Bud');
• What if we use ANY instead of ALL?
– The query returns the bars that serve Miller for a cheaper price than the price that at least one bar charges for Bud.
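SQLite does not accept the ALL and ANY quantifiers, so the sketch below swaps in equivalent formulations rather than the slide's syntax: "price < ALL (…)" becomes "no Bud price is less than or equal to this price" (NOT EXISTS), and "price < ANY (…)" becomes "some Bud price is greater than this price" (EXISTS). The equivalence assumes no NULL prices, and the Sells tuples are hypothetical. On a system that supports the quantifiers, the query from the slide can be used directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Sells VALUES
        ('Joe Bar', 'Bud', 8),
        ('Quality', 'Bud', 10),
        ('Quality', 'Miller', 7),
        ('Blind pig', 'Miller', 9),
        ('Joe Bar', 'Miller', 11);
""")

# Stand-in for: price < ALL (SELECT price FROM Sells WHERE beer = 'Bud')
cheaper_than_all = conn.execute("""
    SELECT m.bar FROM Sells m
    WHERE m.beer = 'Miller' AND NOT EXISTS
        (SELECT 1 FROM Sells b WHERE b.beer = 'Bud' AND b.price <= m.price)
    ORDER BY m.bar
""").fetchall()

# Stand-in for: price < ANY (SELECT price FROM Sells WHERE beer = 'Bud')
cheaper_than_any = conn.execute("""
    SELECT m.bar FROM Sells m
    WHERE m.beer = 'Miller' AND EXISTS
        (SELECT 1 FROM Sells b WHERE b.beer = 'Bud' AND b.price > m.price)
    ORDER BY m.bar
""").fetchall()
print(cheaper_than_all, cheaper_than_any)
```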
Subqueries: IN
• We would like to check whether the result of a subquery contains a particular value.
• Example: Using Beers(name, manf) and Likes(drinker, beer), find the manf of each beer John likes.
SELECT manf
FROM Beers
WHERE name IN
    (SELECT beer           -- subquery: the set of beers John likes
     FROM Likes
     WHERE drinker = 'John');
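A runnable sketch of the IN example for John follows. The inner condition `drinker = 'John'` is inferred from the problem statement, and the Beers and Likes tuples are hypothetical; SQLite is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name TEXT, manf TEXT);
    CREATE TABLE Likes (drinker TEXT, beer TEXT);
    INSERT INTO Beers VALUES
        ('Bud', 'Anheuser-Busch'),
        ('Bud Lite', 'Anheuser-Busch'),
        ('Bud 2.0', 'Adams');
    INSERT INTO Likes VALUES
        ('John', 'Bud'), ('John', 'Bud 2.0'), ('Alice', 'Bud Lite');
""")

manfs = [row[0] for row in conn.execute("""
    SELECT manf
    FROM Beers
    WHERE name IN (SELECT beer FROM Likes WHERE drinker = 'John')
    ORDER BY manf
""")]
print(manfs)
```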
Subqueries: Exists
• We would like to check whether a subquery returns any result.
• Example: Using Beers(name, manf), find the
beers that are the only beer made by their
manufacturers.
SELECT name
FROM Beers b1
WHERE NOT EXISTS
(SELECT *
FROM Beers
WHERE manf=b1.manf AND
name <> b1.name);
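The NOT EXISTS query can be tested on the three-tuple Beers instance used earlier in the slides. The sketch assumes SQLite; the subquery is correlated, meaning it is conceptually re-evaluated for each candidate tuple b1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name TEXT, manf TEXT);
    INSERT INTO Beers VALUES
        ('Bud', 'Anheuser-Busch'),
        ('Bud Lite', 'Anheuser-Busch'),
        ('Bud 2.0', 'Adams');
""")

only_beers = [row[0] for row in conn.execute("""
    SELECT name
    FROM Beers b1
    WHERE NOT EXISTS
        (SELECT *
         FROM Beers
         WHERE manf = b1.manf       -- same manufacturer as the outer tuple
           AND name <> b1.name)     -- but a different beer
""")]
print(only_beers)
```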
Bag versus Set
• Duplicates are allowed in bags.
– {a, a, b, b, b} vs. {a, b}
• Generally, the results of SQL queries are bags.
SELECT name
FROM Beers;

Beers:
name      manf
Bud       Anheuser-Busch
Bud Lite  Anheuser-Busch
Bud       B-company

Result:
name
Bud
Bud Lite
Bud
Removing Duplicates
• Use DISTINCT
Beers:
name      manf
Bud       Anheuser-Busch
Bud Lite  Anheuser-Busch
Bud       B-company

SELECT DISTINCT name
FROM Beers;

Result:
name
Bud
Bud Lite
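The difference between the bag result and the DISTINCT result can be seen by running both queries on the same Beers instance. A minimal sketch, assuming SQLite; the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name TEXT, manf TEXT);
    INSERT INTO Beers VALUES
        ('Bud', 'Anheuser-Busch'),
        ('Bud Lite', 'Anheuser-Busch'),
        ('Bud', 'B-company');
""")

# Bag semantics: the duplicate 'Bud' is kept.
bag = [r[0] for r in conn.execute("SELECT name FROM Beers ORDER BY name")]

# Set semantics: DISTINCT removes the duplicate.
distinct = [r[0] for r in conn.execute("SELECT DISTINCT name FROM Beers ORDER BY name")]
print(bag, distinct)
```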
Set Operations
• R UNION S
– Returns the tuples found in relation R, in relation S, or in both.
• R INTERSECT S
– Returns the tuples common to relation R and relation S.
• R EXCEPT S
– Returns the tuples found in relation R but not in relation S.
Set Operations: Example
• Using relations Likes(drinker, beer), Sells(bar,
beer, price), and Frequents(drinker, bar), find the
drinkers and beers such that
– The drinker likes the beer, and
– The drinker frequents at least one bar that sells the
beer
• The “and” indicates that we should compute an intersection.
Set operations: Example
(SELECT * FROM Likes)            -- the drinker likes the beer
INTERSECT
(SELECT drinker, beer            -- the drinker frequents a bar that sells the beer
 FROM Sells, Frequents
 WHERE Frequents.bar = Sells.bar);
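The INTERSECT example can be run on a few hypothetical tuples. One caveat about the engine assumed here: SQLite does not accept parentheses around the operands of a set operation, so the sketch drops them and lists the Likes columns explicitly; the query is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes (drinker TEXT, beer TEXT);
    CREATE TABLE Frequents (drinker TEXT, bar TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Likes VALUES ('John', 'Bud'), ('Alice', 'Miller');
    INSERT INTO Frequents VALUES ('John', 'Joe Bar'), ('Alice', 'Quality');
    INSERT INTO Sells VALUES ('Joe Bar', 'Bud', 8), ('Quality', 'Bud', 10);
""")

pairs = conn.execute("""
    SELECT drinker, beer FROM Likes          -- the drinker likes the beer
    INTERSECT
    SELECT drinker, beer                     -- the drinker frequents a bar selling it
    FROM Sells, Frequents
    WHERE Frequents.bar = Sells.bar
    ORDER BY drinker
""").fetchall()
print(pairs)
```

Alice is excluded: she likes Miller, but the only bar she frequents sells Bud.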
Set Operations
• By default, the results of set operations in SQL do not have any duplicate tuples.
• We can keep the duplicates by adding ALL.
– … INTERSECT … → … INTERSECT ALL …
– … UNION … → … UNION ALL …
– … EXCEPT … → … EXCEPT ALL …
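The effect of ALL can be sketched with UNION, using two hypothetical single-column relations R and S. Only UNION ALL is shown because SQLite, the engine assumed in this sketch, does not accept INTERSECT ALL or EXCEPT ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (x TEXT);
    CREATE TABLE S (x TEXT);
    INSERT INTO R VALUES ('a'), ('a'), ('b');
    INSERT INTO S VALUES ('b'), ('c');
""")

# UNION removes duplicate tuples from the result.
as_set = [r[0] for r in conn.execute(
    "SELECT x FROM R UNION SELECT x FROM S ORDER BY x")]

# UNION ALL keeps every occurrence from both inputs.
as_bag = [r[0] for r in conn.execute(
    "SELECT x FROM R UNION ALL SELECT x FROM S ORDER BY x")]
print(as_set, as_bag)
```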