database fundamentals

COP5725
Advanced Database Systems
Spring 2016
DB Fundamentals
Tallahassee, Florida, 2016
What are Database Management Systems
DBMS is a system for providing EFFICIENT, CONVENIENT,
and SAFE MULTI-USER storage of and access to MASSIVE
amounts of PERSISTENT data
Example: Banking System
• Data
• Information on accounts, customers, balances, current interest
rates, transaction histories, etc.
• MASSIVE
• Many gigabytes at a minimum for big banks, more if we keep the history
of all transactions, even more if we keep images of checks -> far too
big to fit in main memory
• PERSISTENT
• Data outlives programs that operate on it
Example: Banking System
• SAFE:
– from system failures
– from malicious users
• CONVENIENT:
– simple commands to debit account, get balance, write statement,
transfer funds, etc.
– also unpredicted queries should be easy
• EFFICIENT:
– don't search all files in order to get balance of one account, get all
accounts with low balances, get large transactions, etc.
– massive data! -> DBMS's carefully tuned for performance
Multi-user Access
• Many people/programs accessing same database, or
even same data, simultaneously -> Need careful
controls
– Alex @ ATM1: withdraw $100 from account #007
get balance from database;
if balance >= 100 then balance := balance - 100;
dispense cash;
put new balance into database;
– Bob @ ATM2: withdraw $50 from account #007
get balance from database;
if balance >= 50 then balance := balance - 50;
dispense cash;
put new balance into database;
– Initial balance = 120. Final balance = ??
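A minimal sketch of one answer to the question above, assuming the two ATMs run with no concurrency control. It replays a single unlucky interleaving deterministically; the dictionary `db` and the helper names are illustrative stand-ins for the database, not part of the original example.

```python
# Deterministic replay of one bad interleaving of the two ATM withdrawals.
# The dict stands in for the database; all names here are illustrative.
db = {"007": 120}

def read_balance(account):
    return db[account]

def write_balance(account, value):
    db[account] = value

dispensed = 0

# Both ATMs read the balance before either one writes it back.
alex_balance = read_balance("007")   # Alex sees 120
bob_balance = read_balance("007")    # Bob also sees 120

if alex_balance >= 100:
    alex_balance -= 100
    dispensed += 100
    write_balance("007", alex_balance)   # database now holds 20

if bob_balance >= 50:
    bob_balance -= 50
    dispensed += 50
    write_balance("007", bob_balance)    # overwrites 20 with 70

final_balance = db["007"]
print(final_balance, dispensed)
```

Under this interleaving the account ends at 70 although $150 was dispensed from a balance of 120. If Bob's write lands before Alex's, the account ends at 20 instead. Run one after the other, whichever withdrawal comes second fails its balance check, so at most $100 is dispensed.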
Why File Systems Won’t Work
• Storing data: file system is limited
– size limit by disk or address space
– when system crashes we may lose data
– Password/file-based authorization insufficient
• Query/update:
– need to write a new C++/Java program for every new query
– need to worry about performance
• Concurrency: limited protection
– need to worry about interfering with other users
– need to offer different views to different users (e.g. registrar, students,
professors)
• Schema change:
– entails changing file formats
– need to rewrite virtually all applications
These limitations are what motivated the DBMS!
DBMS Architecture
[Figure: DBMS architecture. Users, web forms, applications, and the DBA submit queries, transactions, and DDL commands. Components shown: Query Parser, Query Rewriter, Query Optimizer, Query Executor; Transaction Manager (Concurrency Control with lock tables, Logging & Recovery); DDL Processor; Records and Indexes; Buffer Manager (main-memory buffer holding data, indexes, log, etc.); Storage Manager; Storage (data, metadata, indexes, log, etc.).]
Data Structuring: Model, Schema, Data
• Data model
– conceptual structuring of data stored in database
– ex: data is set of records, each with student-ID, name, address,
courses, photo
– ex: data is graph where nodes represent cities, edges represent
airline routes
• Schema versus data
– schema: describes how data is to be structured, defined at set-up
time, rarely changes (also called "metadata")
– data: actual "instance" of database, changes rapidly
– vs. types and variables in programming languages
Schema vs. Data
• Schema: name of the relation, name of each field, and type of each
field
– Students (Sid:string, Name:string, Age: integer, GPA: real)
– A template for describing a student
• Data: an example instance of the relation
Sid    Name    Age  GPA
0001   Alex    19   3.55
0002   Bob     22   3.10
0003   Chris   20   3.80
0004   David   20   3.95
0005   Eugene  21   3.30
Data Structuring: Model, Schema, Data
• Data definition language (DDL)
– commands for setting up schema of database
• Data Manipulation Language (DML)
– Commands to manipulate data in database:
• RETRIEVE, INSERT, DELETE, MODIFY
– Also called "query language"
People
• DBMS user: queries/modifies data
• DBMS application designer
– sets up schema, loads data, …
• DBMS administrator
– user management, performance tuning, …
• DBMS implementer: builds systems
Key Steps in Building DB Applications
• Step 0: pick an application domain
• Step 1: conceptual design
– Discuss with your teammate what to model in the application
domain
– Need a modeling language to express what you want
• ER model is the most popular such language
– Output: an ER diagram of the application domain
• Step 2: pick a type of DBMS
– Relational DBMS is most popular and is our focus
Key Steps in Building DB Applications
• Step 3: translate ER design to a relational schema
– Use a set of rules to translate from ER to relational schema
– Use a set of schema refinement rules to transform the above
relational schema into a good relational schema
• 1NF, 2NF, 3NF, BCNF, 4NF, …
– At this point
• You have a good relational schema on paper
Key Steps in Building DB Applications
• Step 4: Implement your relational schema in a relational DBMS using a
"database programming language" called SQL
– SELECT-FROM-WHERE-GROUP BY-HAVING
• Step 5: Ordinary users cannot interact with the
database directly and the database also cannot do
everything you want, hence write your application
program in C++, Java, PHP, etc. to handle the
interaction and take care of things that the database
cannot do
Constraints
• Constraint: an assertion about the database that must
be true at all times
– Part of the database schema
– Very important in database design
• Finding constraints is part of the modeling process
– Keys: social security number uniquely identifies a person
– Single-value constraints: a person can have only one father
– Referential integrity constraints: if you work for a company, it
must exist in the database
– Domain constraints: people's ages are between 0 and 150
– General constraints: all others (at most 30 students enroll in a
class)
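As a rough illustration of how some of these constraint kinds can be declared and enforced, here is a sketch using SQLite through Python's standard `sqlite3` module. The `Company` and `Person` tables and their columns are hypothetical, chosen only to mirror the examples in the list (key, domain, and referential integrity constraints).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Company (
        name TEXT PRIMARY KEY                            -- key constraint
    );
    CREATE TABLE Person (
        ssn     TEXT PRIMARY KEY,                        -- key constraint
        age     INTEGER CHECK (age BETWEEN 0 AND 150),   -- domain constraint
        company TEXT REFERENCES Company(name)            -- referential integrity
    );
    INSERT INTO Company VALUES ('Acme');
    INSERT INTO Person VALUES ('111', 30, 'Acme');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO Person VALUES ('111', 41, 'Acme')")
bad_age = rejected("INSERT INTO Person VALUES ('222', 200, 'Acme')")
missing_company = rejected("INSERT INTO Person VALUES ('333', 25, 'Nowhere')")
print(duplicate_key, bad_age, missing_company)
```

Each of the three inserts violates a different declared constraint, so the DBMS rejects all of them and the application does not have to re-check these rules itself.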
More about Keys
• Every entity must have a key
– why?
• A key can consist of more than one attribute
• There can be more than one key for an entity set
– Among all candidate keys, one key will be designated as primary
key
ER Model vs. Relational Model
• Both are used to model data
• ER model has many concepts
– Entities, relationships, attributes, etc.
– Well-suited for capturing the app. requirements
– Not well-suited for computer implementation
• Relational model
– Has just a single concept: relation (table)
– World is represented with a collection of tables
– Well-suited for efficient manipulations on computers
Relation: An Example
Products (name of the table, i.e. the relation)
Name          Price   Category     Manufacturer
Gizmo         19.99   Gadgets      Gizmo works
Power gizmo   29.99   Gadgets      Gizmo works
Single touch  149.99  Photography  Canon
Multi touch   203.99  Household    Hitachi
• Each column is a field (attribute)
• Each row is a record (tuple)
• Each column takes its values from a domain (atomic type)
Relations
• Schema vs. instance = columns vs. rows
• Schema of a relation
1. Relation name
2. Attribute names
3. Attribute types (domains)
• Schema of a database
– A set of relation schemas
• Questions
– When do you determine a schema (instance)?
– How often do you change your mind?
Relations
• The database maintains a current database state
• Updates to the data happen very frequently
– add a tuple
– delete a tuple
– modify an attribute in a tuple
• Updates to the schema are relatively rare, and rather
painful. Why?
Defining a Database Schema
• A database schema comprises declarations for the
relations (“tables”) of the database
• Simplest form of creation is:
CREATE TABLE <name> (
<list of elements>
);
• And you may remove a relation from the database
schema by:
DROP TABLE <name>;
Elements of Table Declarations
• The principal element is a pair consisting of an attribute
and a type
• The most common types are:
– INT or INTEGER (synonyms)
– REAL or FLOAT (synonyms)
– CHAR(n) = fixed-length string of n characters
– VARCHAR(n) = variable-length string of up to n characters
Example: Create Table
CREATE TABLE Sells (
    bar   CHAR(20),
    beer  VARCHAR(20),
    price REAL
);
Declaring Keys
• An attribute or list of attributes may be declared
PRIMARY KEY or UNIQUE
– Each says that the attribute(s) so declared functionally determine all
the attributes of the relation schema
– Single attribute keys
CREATE TABLE Beers (
name CHAR(20) UNIQUE,
manf CHAR(20)
);
Multi-attribute Keys
CREATE TABLE Sells (
    bar   CHAR(20),
    beer  VARCHAR(20),
    price REAL,
    PRIMARY KEY (bar, beer)
);
Foreign Keys
• A Foreign Key is a field whose values are keys in
another relation
– Must correspond to primary key of the second relation
– Like a 'logical pointer'
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
CREATE TABLE Enrolled (
    sid CHAR(20), cid CHAR(20), grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students,
    FOREIGN KEY (cid) REFERENCES Courses
);
Relational Algebra
• Querying the database: specify what we want from our
database
– Find all the people who earn more than $1,000,000 and pay taxes in
Tallahassee
• Could write in C++/Java, but a bad idea
• Instead use high-level query languages:
– Theoretical: Relational Algebra, Datalog
– Practical: SQL
– Relational algebra: a basic set of operations on relations that
provide the basic principles
What is an “Algebra”?
• Mathematical system consisting of:
– Operands --- variables or values from which new values can be
constructed
– Operators --- symbols denoting procedures that construct new
values from given values
• Examples
– Arithmetic algebra, linear algebra, Boolean algebra ……
• What are operands?
• What are operators?
What is Relational Algebra?
• An algebra
– Whose operands are relations or variables that represent relations
– Whose operators are designed to do common things that we need
to do with relations in a database
• relations as input, new relation as output
– Can be used as a query language for relations
Relational Operators at a Glance
• Five basic RA operations:
– Basic set operations
• union, difference (no intersection, no complement)
– Selection: σ
– Projection: π
– Cartesian product: ×
• When our relations have attribute names:
– Renaming: ρ
• Derived operations:
– Intersection, complement
– Joins (natural join, equi-join, theta join, semi-join)
Set Operations
• Union: all tuples in R1 or R2, denoted as R1 ∪ R2
– R1, R2 must have the same schema
– R1 ∪ R2 has the same schema as R1, R2
– Example:
• Active-Employees ∪ Retired-Employees
• Difference: all tuples in R1 and not in R2, denoted as
R1 − R2
– R1, R2 must have the same schema
– R1 − R2 has the same schema as R1, R2
– Example:
• All-Employees − Retired-Employees
Selection
• Returns all tuples which satisfy a condition, denoted as
σ_c(R)
– c is a condition built from =, <, >, AND, OR, NOT
– Output schema: same as input schema
– Find all employees with salary more than $40,000:
• σ_{Salary > 40000}(Employee)
Employee
SSN        Name   Dept-ID  Salary
111060000  Alex   1        30K
754320032  Bob    1        32K
983210129  Chris  2        45K
σ_{Salary > 40000}(Employee)
SSN        Name   Dept-ID  Salary
983210129  Chris  2        45K
Projection
• Unary operation: returns certain columns, denoted as
π_{A1,…,An}(R)
– Eliminates duplicate tuples!
– Input schema: R(B1, …, Bm)
– Condition: {A1, …, An} ⊆ {B1, …, Bm}
– Output schema: S(A1, …, An)
• Example: project social-security number and names:
– π_{SSN, Name}(Employee)
Employee
SSN        Name   Dept-ID  Salary
111060000  Alex   1        30K
754320032  Bob    1        32K
983210129  Chris  2        45K
π_{SSN, Name}(Employee)
SSN        Name
111060000  Alex
754320032  Bob
983210129  Chris
Selection vs. Projection
• Think of relation as a table
– How are they similar?
– How are they different?
– Why do you need both?
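One way to explore these questions is to model a relation as a set of tuples and write both operators by hand. The sketch below is an illustration only; the function names `select` and `project` and the tuple-based representation are assumptions, and salaries are written out in full instead of as 30K.

```python
# Relations as sets of tuples; a schema is a tuple of attribute names.
EMPLOYEE_SCHEMA = ("SSN", "Name", "DeptID", "Salary")
employee = {
    ("111060000", "Alex", 1, 30000),
    ("754320032", "Bob", 1, 32000),
    ("983210129", "Chris", 2, 45000),
}

def select(relation, schema, predicate):
    """Selection: keep the rows for which predicate(row-as-dict) is true."""
    return {row for row in relation if predicate(dict(zip(schema, row)))}

def project(relation, schema, attributes):
    """Projection: keep the named columns; the set type removes duplicates."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(row[i] for i in positions) for row in relation}

high_paid = select(employee, EMPLOYEE_SCHEMA, lambda r: r["Salary"] > 40000)
depts = project(employee, EMPLOYEE_SCHEMA, ["DeptID"])
print(high_paid)
print(depts)
```

Selection removes rows and keeps the schema; projection removes columns and can shrink the row count only through duplicate elimination, as with the two employees in department 1 here.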
Cartesian Product
• Each tuple in R1 with each tuple in R2, denoted as R1 × R2
– Input schemas R1(A1,…,An), R2(B1,…,Bm)
– Output schema is S(A1, …, An, B1, …, Bm)
– Very rare in practice, but joins are very common
– Example: Employee × Dependent
Example
Employee
SSN        Name
111060000  Alex
754320032  Brandy
Dependent
Employee-SSN  Dependent-Name
111060000     Chris
754320032     David
Employee × Dependent
SSN        Name    Employee-SSN  Dependent-Name
111060000  Alex    111060000     Chris
111060000  Alex    754320032     David
754320032  Brandy  111060000     Chris
754320032  Brandy  754320032     David
Renaming
• Does not change the relational instance
• Changes the relational schema only
– Notation: ρ_{S(B1,…,Bn)}(R)
– Input schema: R(A1, …, An)
– Output schema: S(B1, …, Bn)
• Example:
ρ_{Soc-sec-num, firstname}(Employee)
Employee
SSN        Name
111060000  Alex
754320032  Bob
983210129  Chris
ρ_{Soc-sec-num, firstname}(Employee)
Soc-sec-num  firstname
111060000    Alex
754320032    Bob
983210129    Chris
Set Operations: Intersection
• Intersection: all tuples both in R1 and in R2, denoted as
R1 ∩ R2
– R1, R2 must have the same schema
– R1 ∩ R2 has the same schema as R1, R2
– Example
• UnionizedEmployees ∩ RetiredEmployees
• Intersection is derived:
– R1 ∩ R2 = R1 − (R1 − R2)
why?
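The identity can be checked on a small case. The sketch below uses Python sets as relations; the two single-attribute relations are made-up sample data.

```python
# Check R1 ∩ R2 = R1 − (R1 − R2) on small example relations (values are made up).
r1 = {("Alex",), ("Bob",), ("Chris",)}
r2 = {("Bob",), ("Chris",), ("David",)}

only_in_r1 = r1 - r2          # tuples of R1 that are NOT in R2
derived = r1 - only_in_r1     # remove those; what remains is in both relations

print(derived)
```

R1 − R2 is exactly the part of R1 that lies outside R2, so removing it from R1 leaves the tuples that R1 shares with R2.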
Theta Join
• A join that involves a predicate θ, denoted as R1 ⋈_θ R2
– Input schemas: R1(A1,…,An), R2(B1,…,Bm)
– Output schema: S(A1,…,An,B1,…,Bm)
– Derived operator: R1 ⋈_θ R2 = σ_θ(R1 × R2)
• Take the product R1 × R2
• Then apply σ_θ to the result
• As for selection, θ can be any Boolean-valued condition
Theta Join: Example
Sells
Bar            Beer    Price
AJ's           Bud     2.5
AJ's           Miller  2.75
Michael's Pub  Bud     2.5
Michael's Pub  Corona  3.0
Bar
Name           Address
AJ's           1800 Tennessee
Michael's Pub  513 Gaines
BarInfo := Sells ⋈_{Sells.Bar = Bar.Name} Bar
Bar            Beer    Price  Name           Address
AJ's           Bud     2.5    AJ's           1800 Tennessee
AJ's           Miller  2.75   AJ's           1800 Tennessee
Michael's Pub  Bud     2.5    Michael's Pub  513 Gaines
Michael's Pub  Corona  3.0    Michael's Pub  513 Gaines
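The BarInfo example can be reproduced with a few lines of code that follow the definition directly: product first, then selection. This is a sketch; the function name `theta_join` and the list-of-tuples representation are assumptions made for illustration.

```python
from itertools import product

sells = [("AJ's", "Bud", 2.5), ("AJ's", "Miller", 2.75),
         ("Michael's Pub", "Bud", 2.5), ("Michael's Pub", "Corona", 3.0)]
bar = [("AJ's", "1800 Tennessee"), ("Michael's Pub", "513 Gaines")]

def theta_join(r1, r2, theta):
    """Cartesian product of r1 and r2, keeping pairs where theta(t1, t2) holds."""
    return [t1 + t2 for t1, t2 in product(r1, r2) if theta(t1, t2)]

# Join condition Sells.Bar = Bar.Name: column 0 of each relation.
bar_info = theta_join(sells, bar, lambda s, b: s[0] == b[0])
for row in bar_info:
    print(row)
```

The product has 4 × 2 = 8 pairs and the condition keeps 4 of them. A real DBMS avoids materializing the full product, but the result is defined as if it had.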
Natural Join
• Notation: R1 ⋈ R2
• Input Schema: R1(A1, …, An), R2(B1, …, Bm)
• Output Schema: S(C1,…,Cp)
– Where {C1, …, Cp} = {A1, …, An} ∪ {B1, …, Bm}
• Meaning: combine all pairs of tuples in R1 and R2 that
agree on the attributes:
– {A1,…,An} ∩ {B1,…, Bm} (called the join attributes)
Natural Join: Examples
Employee
SSN        Name
111060000  Alex
754320032  Brandy
Dependent
SSN        Dependent-Name
111060000  Chris
754320032  David
Employee ⋈ Dependent =
π_{SSN, Name, Dependent-Name}(σ_{Employee.SSN = Dependent.SSN}(Employee × Dependent))
SSN        Name    Dependent-Name
111060000  Alex    Chris
754320032  Brandy  David
Natural Join: Examples
R
A  B
X  Y
X  Z
Y  Z
Z  V
S
B  C
Z  U
V  W
Z  V
R ⋈ S
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
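The R ⋈ S result above can be recomputed with a small natural-join routine. This is a sketch under the assumption that relations are sets of tuples with explicit schemas; `natural_join` is a hypothetical helper, not a library function.

```python
def natural_join(r, r_schema, s, s_schema):
    """Join tuples that agree on all shared attribute names; keep each shared column once."""
    common = [a for a in r_schema if a in s_schema]
    s_extra = [a for a in s_schema if a not in common]
    out_schema = list(r_schema) + s_extra
    result = set()
    for t in r:
        for u in s:
            rt, su = dict(zip(r_schema, t)), dict(zip(s_schema, u))
            if all(rt[a] == su[a] for a in common):
                result.add(t + tuple(su[a] for a in s_extra))
    return out_schema, result

R = {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}
S = {("Z", "U"), ("V", "W"), ("Z", "V")}
schema, joined = natural_join(R, ("A", "B"), S, ("B", "C"))
print(schema)
print(sorted(joined))
```

The tuple (X, Y) of R finds no partner in S because no S tuple has B = Y, so it does not appear in the output.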
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the
schema of R ⋈ S?
• Given R(A, B, C), S(D, E), what is R ⋈ S?
• Given R(A, B), S(A, B), what is R ⋈ S?
Equi-join
• Special case of theta join: condition c contains only
conjunction of equalities
– Result schema is the same as that of Cartesian product
– May have fewer tuples than Cartesian product
– Most frequently used in practice: R1 ⋈_{A=B} R2
– Natural join is a particular case of equi-join
– A lot of research on how to do it efficiently
Building Complex Expressions
• Algebras allow us to express sequences of operations in
a natural way
– Example
• In arithmetic algebra: (x + 4)*(y - 3)
– Relational algebra allows the same
• Three notations, just as in arithmetic:
1. Sequences of assignment statements
2. Expressions with several operators
3. Expression trees
Sequences of Assignments
• Create temporary relation names
• Renaming can be implied by giving relations a list of
attributes
• Example: R3 := R1 JOIN_C R2 can be written:
R4 := R1 × R2
R3 := SELECT_C (R4)
Expressions with Several Operators
• Example: the theta-join R3 := R1 JOIN_C R2 can be
written: R3 := SELECT_C (R1 × R2)
• Precedence of relational operators:
1. Unary operators --- select, project, rename --- have highest
precedence, bind first
2. Then come products and joins
3. Then intersection
4. Finally, union and set difference bind last
• But you can always insert parentheses to force the
order you desire
Expression Trees
• Leaves are operands
– either variables standing for relations or particular constant
relations
• Interior nodes are operators, applied to their child or
children
Expression Tree: Examples
Given Bars(name, addr), Sells(bar, beer, price), find the names of all
the bars that are either on Tennessee St. or sell Bud for less than $3
UNION
    PROJECT name
        SELECT addr = "Tennessee St."
            Bars
    RENAME R(name)
        PROJECT bar
            SELECT price < 3 AND beer = "Bud"
                Sells
Question: How to do this?
• Using Sells(bar, beer, price), find the bars that sell two
different beers at the same price
Glimpse Ahead:
Efficient Implementations of Operators
• σ_{age >= 30 AND age <= 35}(Employees)
– Method 1: scan the file, test each employee
– Method 2: use an index on age
– Which one is better? Depends a lot…
• Employees ⋈ Relatives
– Iterate over Employees, then over Relatives
– Iterate over Relatives, then over Employees
– Sort Employees, Relatives, do "merge-join"
– "hash-join"
– Etc.
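To make two of these join strategies concrete, here is a sketch of a nested-loop join and a hash join over in-memory lists. It is only an illustration: the function names, the sample `relatives` rows, and joining on the first column of each relation are assumptions, and a real DBMS works on disk pages, not Python lists.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key_outer, key_inner):
    """Compare every outer tuple with every inner tuple: |outer| * |inner| comparisons."""
    return [o + i for o in outer for i in inner if o[key_outer] == i[key_inner]]

def hash_join(build, probe, key_build, key_probe):
    """Build a hash table on one input, then probe it once per tuple of the other."""
    table = defaultdict(list)
    for b in build:
        table[b[key_build]].append(b)
    return [b + p for p in probe for b in table.get(p[key_probe], [])]

employees = [("111060000", "Alex"), ("754320032", "Brandy")]
relatives = [("111060000", "Chris"), ("754320032", "David"), ("754320032", "Erin")]

nl = nested_loop_join(employees, relatives, 0, 0)
hj = hash_join(employees, relatives, 0, 0)
print(sorted(nl) == sorted(hj), len(hj))
```

Both methods return the same tuples. The nested loop does work proportional to the product of the input sizes, while the hash join touches each input roughly once, at the price of the memory for the hash table.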
Glimpse Ahead: Optimizations
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Person(ssn, name, phone number, city)
• Which is better:
σ_{price>100}(Product) ⋈ (Purchase ⋈ σ_{city=sea}(Person))
(σ_{price>100}(Product) ⋈ Purchase) ⋈ σ_{city=sea}(Person)
• Depends! This is the optimizer's job…
SQL
• Standard language for querying and manipulating data
– SQL stands for Structured Query Language
– Initially developed at IBM by Donald Chamberlin and Raymond
Boyce in the early 1970s, and called SEQUEL (Structured English Query
Language)
– Many standards out there: SQL-92 (also called SQL2), SQL:1999 (also called SQL3), and later revisions
– Vendors support various subsets of these standards
• Why SQL?
– A very-high-level language, in which the programmer is able to avoid
specifying a lot of data-manipulation details that would be necessary in
languages like C++
– Its queries are “optimized” quite well, yielding efficient query executions
Introduction
• Two sublanguages
– DDL – Data Definition Language
• define and modify schema
CREATE TABLE table_name
( { column_name data_type [ DEFAULT default_expr ] [
column_constraint [, ... ] ] | table_constraint } [, ... ] )
– DML – Data Manipulation Language
• Queries can be written intuitively
Select-From-Where
Select-From-Where Statements
• The principal form of a SQL query is:
SELECT desired attributes
FROM one or more tables
WHERE condition about tuples of the tables
Our Running Example
• Most of our SQL queries will be based on the following
database schema
– Underline indicates key attributes
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Select-From-Where Example
• Using Beers(name, manf), what beers are made by
Busch?
SELECT name
FROM Beers
WHERE manf = 'Busch';
name
'Bud'
'Bud Lite'
'Michelob'
• The answer is a relation with a single attribute, name,
and tuples with the name of each beer by Busch, such as
Bud
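The query can be tried against a throwaway database. The sketch below uses SQLite via Python's standard `sqlite3` module as a stand-in for whatever DBMS the course uses; the fourth row ('Other Ale', 'OtherCo') is made up so that the WHERE clause has something to filter out, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name CHAR(20) PRIMARY KEY, manf CHAR(20));
    INSERT INTO Beers VALUES ('Bud', 'Busch');
    INSERT INTO Beers VALUES ('Bud Lite', 'Busch');
    INSERT INTO Beers VALUES ('Michelob', 'Busch');
    INSERT INTO Beers VALUES ('Other Ale', 'OtherCo');  -- hypothetical non-Busch beer
""")

busch_beers = [row[0] for row in conn.execute(
    "SELECT name FROM Beers WHERE manf = 'Busch' ORDER BY name")]
print(busch_beers)
```

Note that SQL string literals use straight single quotes.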
Single-Relation Query
• Operation
1. Begin with the relation in the FROM clause
2. Apply the selection indicated by the WHERE clause
3. Apply the extended projection indicated by the SELECT clause
• Semantics
1. To implement this algorithm think of a tuple variable ranging
over each tuple of the relation mentioned in FROM
2. Check if the “current” tuple satisfies the WHERE clause
3. If so, compute the attributes or expressions of the SELECT
clause using the components of this tuple
* In SELECT clauses
• When there is one relation in the FROM clause, * in the
SELECT clause stands for “all attributes of this relation.”
• Example using Beers(name, manf):
SELECT *
FROM Beers
WHERE manf = 'Busch';
name        manf
'Bud'       'Busch'
'Bud Lite'  'Busch'
'Michelob'  'Busch'
Now, the result has each of the attributes of Beers
Renaming Attributes
• If you want the result to have different attribute names,
use “AS <new name>” to rename an attribute
• Example based on Beers(name, manf):
SELECT name AS beer, manf
FROM Beers
WHERE manf = 'Busch';
beer        manf
'Bud'       'Busch'
'Bud Lite'  'Busch'
'Michelob'  'Busch'
Expressions in SELECT Clauses
• Any expression that makes sense can appear as an
element of a SELECT clause
• Example: from Sells(bar, beer, price):
SELECT bar, beer,
       price * 120 AS priceInYen
FROM Sells;
bar    beer    priceInYen
Joe's  Bud     300
Sue's  Miller  360
…      …       …
Complex Conditions in WHERE Clause
• From Sells(bar, beer, price), find the price Joe’s Bar
charges for “cheap” beers:
SELECT price
FROM Sells
WHERE bar = 'Joe''s Bar' AND
      price < 5.0;
Selections
• What you can use in WHERE:
– attribute names of the relation(s) used in the FROM
– comparison operators: =, <>, <, >, <=, >=
– apply arithmetic operations: stockprice*2
– operations on strings (e.g., "||" for concatenation)
– lexicographic order on strings
– pattern matching: s LIKE p
– special stuff for comparing dates and times
NULL Values
• Tuples in SQL relations can have NULL as a value for one or more
components
• Meaning depends on context. Two common cases:
– Missing value : e.g., we know Joe’s Bar has some address, but we don’t know
what it is
– Inapplicable : e.g., the value of attribute spouse for an unmarried person
• The logic of conditions in SQL is really 3-valued logic: TRUE,
FALSE, UNKNOWN
– When any value is compared with NULL, the truth value is UNKNOWN
– A query only produces a tuple in the answer if its value for the WHERE clause
is TRUE (not FALSE or UNKNOWN)
Three-Valued Logic
• To understand how AND, OR, and NOT work in 3-valued
logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½,
AND = MIN; OR = MAX, NOT(x) = 1-x.
• Example:
TRUE AND (FALSE OR NOT(UNKNOWN))
= MIN(1, MAX(0, (1 - ½)))
= MIN(1, MAX(0, ½))
= MIN(1, ½)
= ½ (i.e., UNKNOWN)
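The numeric encoding above translates directly into code. This sketch uses exact fractions so that ½ compares cleanly; the function names `and3`, `or3`, and `not3` are made up for the illustration.

```python
from fractions import Fraction

TRUE, FALSE, UNKNOWN = Fraction(1), Fraction(0), Fraction(1, 2)

def and3(x, y):
    return min(x, y)

def or3(x, y):
    return max(x, y)

def not3(x):
    return 1 - x

result = and3(TRUE, or3(FALSE, not3(UNKNOWN)))
excluded_middle = or3(UNKNOWN, not3(UNKNOWN))  # "p OR NOT p" when p is UNKNOWN
print(result == UNKNOWN, excluded_middle == UNKNOWN)
```

The second expression shows that "p OR NOT p" is no longer guaranteed to be TRUE once UNKNOWN is a possible truth value.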
Surprising Example
• From the following Sells relation:
bar    beer  price
Joe's  Bud   NULL
SELECT bar
FROM Sells
WHERE price < 2.00 OR price >= 2.00;
• price < 2.00 is UNKNOWN
• price >= 2.00 is UNKNOWN
• so the whole OR is UNKNOWN, and Joe's is not in the answer
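The surprising behaviour is easy to observe on a real engine. The following sketch runs the query in SQLite through Python's `sqlite3` module (used here only as a convenient test bed) and then adds an explicit IS NULL test for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', NULL)")

# Looks like a tautology, but both comparisons are UNKNOWN for a NULL price.
rows = conn.execute(
    "SELECT bar FROM Sells WHERE price < 2.00 OR price >= 2.00").fetchall()

# Testing for NULL explicitly brings the tuple back.
rows_with_null_test = conn.execute(
    "SELECT bar FROM Sells "
    "WHERE price < 2.00 OR price >= 2.00 OR price IS NULL").fetchall()

print(rows, rows_with_null_test)
```

The first query returns nothing even though every non-NULL price would satisfy it; NULLs must be asked about with IS NULL, not with comparison operators.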
Multi-relation Queries
• Interesting queries often combine data from more than
one relation. We can address several relations in one
query by listing them all in the FROM clause.
– Distinguish attributes of the same name by "<relation>.<attribute>"
– Example: Using relations Likes(drinker, beer) and Frequents(drinker,
bar), find the beers liked by at least one person who frequents Joe's
Bar.
SELECT Likes.beer
FROM Likes, Frequents
WHERE Frequents.bar = 'Joe''s Bar' AND
      Frequents.drinker = Likes.drinker;
Semantics
• Almost the same as for single-relation queries:
1. Start with the (Cartesian) product of all the relations in the
FROM clause
2. Apply the selection condition from the WHERE clause
3. Project onto the list of attributes and expressions in the
SELECT clause
SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xn
WHERE Conditions
Translation to relational algebra: π_{a1,…,ak}(σ_Conditions(R1 × R2 × … × Rn))
Select-From-Where queries are precisely Select-Project-Join
Semantics
SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xn
WHERE Conditions
Answer = {}
for x1 in R1 do
    for x2 in R2 do
        …
            for xn in Rn do
                if Conditions
                    then Answer = Answer ∪ {(a1,…,ak)}
return Answer
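The nested-loop reading of SELECT-FROM-WHERE can be written as a short generic routine. This is a sketch of the semantics only, not of how a DBMS evaluates queries; the function name `select_from_where` and the sample Likes/Frequents rows are assumptions. It collects results in a list, so duplicates are kept as SQL does by default, whereas the pseudocode above builds a set.

```python
from itertools import product

def select_from_where(relations, condition, select_list):
    """Loop over every combination of tuples, keep those passing the condition."""
    answer = []
    for combo in product(*relations):      # one tuple variable per relation
        if condition(*combo):              # WHERE
            answer.append(select_list(*combo))  # SELECT
    return answer

# Made-up sample data for Likes(drinker, beer) and Frequents(drinker, bar).
likes = [("Sally", "Bud"), ("Fred", "Miller")]
frequents = [("Sally", "Joe's Bar"), ("Fred", "Sue's Bar")]

beers = select_from_where(
    [likes, frequents],
    lambda l, f: f[1] == "Joe's Bar" and f[0] == l[0],
    lambda l, f: l[1],
)
print(beers)
```

`itertools.product` plays the role of the n nested for-loops, which is why the cost of this naive evaluation grows with the product of the relation sizes.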
Explicit Tuple-Variables
• Sometimes, a query needs to use two copies of the
same relation
– Distinguish copies by following the relation name by the name of
a tuple-variable, in the FROM clause
– It’s always an option to rename relations this way, even when not
essential
SELECT s1.bar
FROM Sells s1, Sells s2
WHERE s1.beer = s2.beer AND
s1.price < s2.price;
SubQueries
• A parenthesized SELECT-FROM-WHERE statement (subquery) can
be used as a value in a number of places, including FROM and
WHERE clauses
– Example: in place of a relation in the FROM clause, we can place another
query, and then query its result
• Better use a tuple-variable to name tuples of the result
• Subqueries that return a scalar
– If a subquery is guaranteed to produce one tuple with one component, then
the subquery can be used as a value
• "Single" tuple often guaranteed by key constraint
• A run-time error occurs if there is no tuple or more than one tuple
Example
• From Sells(bar, beer, price), find the bars that serve
Miller for the same price Joe charges for Bud
– Two queries would surely work:
1. Find the price Joe charges for Bud
2. Find the bars that serve Miller at that price
– Or one query with a scalar subquery:
SELECT bar
FROM Sells
WHERE beer = 'Miller' AND
      price = (SELECT price
               FROM Sells
               WHERE bar = 'Joe''s Bar'
                     AND beer = 'Bud');
The IN Operator
• <tuple> IN <relation> is true if and only if the tuple is a
member of the relation
– <tuple> NOT IN <relation> means the opposite
– IN-expressions can appear in WHERE clauses
– The <relation> is often a subquery
Query: From Beers(name, manf) and Likes(drinker, beer), find the name and
manufacturer of each beer that Fred likes
SELECT *
FROM Beers
WHERE name IN ( SELECT beer
                FROM Likes
                WHERE drinker = 'Fred'
              );
The subquery computes the set of beers Fred likes
The Exists Operator
• EXISTS( <relation> ) is true if and only if the <relation>
is not empty
– Being a Boolean-valued operator, EXISTS can appear in WHERE
clauses
Query: From Beers(name, manf), find those beers that are the only
beer by their manufacturer
SELECT name
FROM Beers b1
WHERE NOT EXISTS(
    SELECT *
    FROM Beers
    WHERE manf = b1.manf AND
          name <> b1.name);
• The subquery returns the set of beers with the same manf as b1, but
not the same beer
• Scope rule: manf refers to the closest nested FROM with
a relation having that attribute
The Operator ANY
• x = ANY( <relation> ) is a Boolean condition meaning
that x equals at least one tuple in the relation
• Similarly, = can be replaced by any of the comparison
operators
– Example: x >= ANY( <relation> ) means x is greater than or equal to
at least one tuple in the relation
– Note tuples must have one component only
The Operator ALL
• x <> ALL( <relation> ) is true if and only if for every
tuple t in the relation, x is not equal to t
– That is, x is not a member of the relation.
• The <> can be replaced by any comparison operator
– Example: x >= ALL( <relation> ) means there is no tuple larger
than x in the relation
Query: From Sells(bar, beer, price), find the beer(s) sold for the highest price
SELECT beer
FROM Sells
WHERE price >= ALL(
    SELECT price
    FROM Sells);
The price from the outer Sells must not be less than any price
Bag (Set) Semantics for SFW Queries
• The SELECT-FROM-WHERE statement uses bag
semantics
– Selection: preserve the number of occurrences
– Projection: preserve the number of occurrences (no duplicate
elimination)
– Cartesian product, join: no duplicate elimination
• The default for union, intersection, and difference is set
semantics, and is expressed by the following forms,
each involving subqueries:
– ( subquery ) UNION ( subquery )
– ( subquery ) INTERSECT ( subquery )
– ( subquery ) EXCEPT ( subquery )
Example
• Happy Drinker: From relations Likes(drinker, beer), Sells(bar,
beer, price) and Frequents(drinker, bar), find the drinkers and
beers such that:
1. The drinker likes the beer, and
2. The drinker frequents at least one bar that sells the beer
(SELECT * FROM Likes)
INTERSECT
(SELECT drinker, beer
FROM Sells, Frequents
WHERE Frequents.bar = Sells.bar
);
The second subquery: the drinker frequents a bar that sells the beer
Set vs. Bag: Efficiency
• When doing projection in relational algebra, it is easier
to avoid eliminating duplicates
– Just work tuple-at-a-time
• When doing intersection or difference, it is most
efficient to sort the relations first
– At that point you may as well eliminate the duplicates anyway
Controlling Duplicate Elimination
• Force the result to be a set by SELECT DISTINCT
– From Sells(bar, beer, price), find all the different prices charged
for beers:
SELECT DISTINCT price
FROM Sells;
• Force the result to be a bag (i.e., don’t eliminate
duplicates) by ALL, as in
. . . UNION ALL . . .
– Lists drinkers who frequent more bars than they like beers, and
does so as many times as the difference of those counts
(SELECT drinker FROM Frequents)
EXCEPT ALL
(SELECT drinker FROM Likes);
Aggregations
• SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to produce that aggregation
on the column
– e.g. COUNT(*) counts the number of tuples
• Query: From Sells(bar, beer, price), find the average price
of Bud
SELECT AVG(price)
FROM Sells
WHERE beer = 'Bud';
Group By
• We may follow a SELECT-FROM-WHERE expression by
GROUP BY and a list of attributes
– The relation that results from the SELECT-FROM-WHERE is
grouped according to the values of all those attributes, and any
aggregation is applied only within each group
• Query: From Sells(bar, beer, price), find the average
price for each beer:
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer;
Example
• Query: From Sells(bar, beer, price) and Frequents
(drinker, bar), find for each drinker the average price
of Bud at the bars they frequent:
SELECT drinker, AVG(price)
FROM Frequents, Sells
WHERE beer = 'Bud' AND
      Frequents.bar = Sells.bar
GROUP BY drinker;
Compute drinker-bar-price tuples for Bud first, then group by drinker
Restriction on SELECT Lists With Aggregation
• If any aggregation is used, then each element of the
SELECT list must be either:
1. Aggregated, or
2. An attribute on the GROUP BY list
• Question: How about this query?
SELECT bar, MIN(price)
FROM Sells
WHERE beer = 'Bud';
Having Clause
• HAVING <condition> may follow a GROUP BY clause. If
so, the condition applies to each group, and groups not
satisfying the condition are eliminated
– These conditions may refer to any relation or tuple-variable in
the FROM clause
– They may refer to attributes of those relations, as long as the
attribute makes sense within a group; i.e., it is either:
1. A grouping attribute, or
2. Aggregated
Having Clause: Example
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR beer = 'Michelob';
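To see which groups survive the HAVING clause, the query can be run on a small made-up Sells table. The sketch below uses SQLite via Python's `sqlite3`; the bar names and prices are invented for illustration, the beer name is spelled 'Michelob' to match the stored data (string comparison here is case-sensitive), and ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Bar A", "Bud", 2.0), ("Bar B", "Bud", 3.0), ("Bar C", "Bud", 4.0),
    ("Bar A", "Miller", 3.0), ("Bar B", "Miller", 5.0),
    ("Bar C", "Michelob", 6.0),
])

rows = conn.execute("""
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR beer = 'Michelob'
    ORDER BY beer
""").fetchall()
print(rows)
```

Bud is kept because three bars sell it, Michelob is kept by name, and Miller is dropped because only two bars sell it. The HAVING condition uses only an aggregate and a grouping attribute, as the rule requires.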
General form of Grouping and Aggregation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
S = may contain attributes a1,…,ak and/or any aggregates, but NO
OTHER ATTRIBUTES
C1 = any condition on the attributes in R1,…,Rn
C2 = any condition on aggregate expressions or grouping
attributes
General form of Grouping and Aggregation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
Evaluation steps:
1. Compute the FROM-WHERE part, obtain a table with all attributes in
R1,…,Rn
2. Group by the attributes a1,…,ak
3. Compute the aggregates in C2 and keep only groups satisfying C2
4. Compute aggregates in S and return the result
Modifications
• A modification command does not return a result as a
query does, but it changes the database in some way
• There are three kinds of modifications:
1. Insert a tuple or tuples
2. Delete a tuple or tuples
3. Update the value(s) of an existing tuple or tuples
Insertion
• To insert a single tuple:
INSERT INTO <relation>
VALUES ( <list of values> );
• Example: add to Likes(drinker, beer) the fact that Sally
likes Bud:
INSERT INTO Likes
VALUES('Sally', 'Bud');
Specifying Attributes in INSERT
• We may add to the relation name a list of attributes
• There are two reasons to do so:
1. We forget the standard order of attributes for the relation
2. We don't have values for all attributes, and we want the system
to fill in missing components with NULL or a default value
• Another way to add the fact that Sally likes Bud to
Likes(drinker, beer):
INSERT INTO Likes(beer, drinker)
VALUES('Bud', 'Sally');
Inserting Many Tuples
• We may insert the entire result of a query into a
relation, using the form:
INSERT INTO <relation>
( <subquery> );
E.g.,
INSERT INTO Beers(name)
SELECT beer from Sells;
Example: Insert a Subquery
• Using Frequents(drinker, bar), enter into the new
relation PotBuddies (name) all of Sally’s “potential
buddies,” i.e., those drinkers who frequent at least one
bar that Sally also frequents
INSERT INTO PotBuddies
(SELECT d2.drinker
FROM Frequents d1, Frequents d2
WHERE d1.drinker = 'Sally' AND
      d2.drinker <> 'Sally' AND
      d1.bar = d2.bar
);
• d2.drinker is the other drinker
• The FROM and WHERE clauses produce pairs of Frequents tuples where the
first is for Sally, the second is for someone else, and the bars are
the same
Deletion
• To delete tuples satisfying a condition from some
relation:
DELETE FROM <relation>
WHERE <condition>;
• Example: Delete from Likes(drinker, beer) the fact that
Sally likes Bud:
DELETE FROM Likes
WHERE drinker = 'Sally' AND
      beer = 'Bud';
Delete all Tuples
• Make the relation Likes empty:
DELETE FROM Likes;
• Note no WHERE clause needed
Delete Many Tuples
• Delete from Beers(name, manf) all beers for which
there is another beer by the same manufacturer.
DELETE FROM Beers b
WHERE EXISTS
(
SELECT name FROM Beers a
WHERE a.manf = b.manf AND
      a.name <> b.name
);
The subquery returns the beers with the same manufacturer and a different
name from the name of the beer represented by tuple b
Semantics of Deletion
• Suppose Busch makes only Bud and Bud Lite, and
suppose we come to the tuple b for Bud first
– The subquery is nonempty, because of the Bud Lite tuple, so we
delete Bud
– Now, when b is the tuple for Bud Lite, do we delete that tuple
too?
• The answer is that we do delete Bud Lite as well. The
reason is that deletion proceeds in two stages:
1. Mark all tuples for which the WHERE condition is satisfied in
the original relation
2. Delete the marked tuples
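The difference between the two-stage rule and a naive delete-as-you-go loop can be simulated in a few lines. This is a plain-Python sketch of the semantics, not of any DBMS's implementation; the helper `has_sibling` and the extra 'Other Ale' row are hypothetical.

```python
def has_sibling(beer, relation):
    """True if relation holds another beer by the same manufacturer."""
    name, manf = beer
    return any(m == manf and n != name for n, m in relation)

original = [("Bud", "Busch"), ("Bud Lite", "Busch"), ("Other Ale", "OtherCo")]

# Two-stage deletion: mark against the ORIGINAL relation, then delete the marked tuples.
marked = [b for b in original if has_sibling(b, original)]
two_stage = [b for b in original if b not in marked]

# Naive alternative: delete while scanning, re-testing against the shrinking relation.
naive = list(original)
for b in original:
    if has_sibling(b, naive):
        naive.remove(b)

print(two_stage)
print(naive)
```

With marking, both Busch beers go. The naive loop deletes Bud, after which Bud Lite no longer has a sibling and wrongly survives, so its result would depend on the order in which tuples are visited.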
Updates
• To change certain attributes in certain tuples of a
relation:
UPDATE <relation>
SET <list of attribute assignments>
WHERE <condition on tuples>;
• Example: Change drinker Fred's phone number to 555-1212:
UPDATE Drinkers
SET phone = '555-1212'
WHERE name = 'Fred';
Update Several Tuples
• Raise by 7% every price that is below $3.00:
UPDATE Sells
SET price = price * 1.07
WHERE price < 3.0;
Views
• A view is a “virtual table”, a relation that is defined in
terms of the contents of other tables and views
– Declare by:
CREATE VIEW <name> AS <query>;
• In contrast, a relation whose value is really stored in the
database is called a base table
Example: View Definition
• CanDrink (drinker, beer) is a view “containing” the
drinker-beer pairs such that the drinker frequents at
least one bar that serves the beer:
CREATE VIEW CanDrink AS
SELECT drinker, beer
FROM Frequents, Sells
WHERE Frequents.bar = Sells.bar;
Example: Accessing a View
• You may query a view as if it were a base table
– There is a limited ability to modify views if the modification
makes sense as a modification of the underlying base table
• Example:
SELECT beer FROM CanDrink
WHERE drinker = 'Sally';
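A short experiment shows that a view stores no data of its own. The sketch below defines CanDrink in SQLite through Python's `sqlite3` (an assumed test bed), queries it, changes a base table, and queries it again; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Frequents (drinker TEXT, bar TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Frequents VALUES ('Sally', 'Joe''s Bar');
    INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 2.5);
    CREATE VIEW CanDrink AS
        SELECT drinker, beer
        FROM Frequents, Sells
        WHERE Frequents.bar = Sells.bar;
""")

query = "SELECT beer FROM CanDrink WHERE drinker = 'Sally' ORDER BY beer"
before = conn.execute(query).fetchall()

# Change a base table only; the view is never touched directly.
conn.execute("INSERT INTO Sells VALUES ('Joe''s Bar', 'Miller', 2.75)")
after = conn.execute(query).fetchall()

print(before, after)
```

The second result includes Miller even though nothing was inserted into CanDrink: each query on the view is answered from the current contents of Frequents and Sells.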
What Happens When a View Is Used?
• The DBMS starts by interpreting the query as if the
view were a base table
– Typical DBMS turns the query into something like relational
algebra
• The queries defining any views used by the query are
also replaced by their algebraic equivalents, and
“spliced into” the expression tree for the query
Example: View Expansion
PROJ beer
    SELECT drinker = 'Sally'
        CanDrink, expanded to:
        PROJ drinker, beer
            JOIN Frequents.bar = Sells.bar
                Frequents
                Sells
Have fun!