Lec. notes - Wright State University

advertisement
ISOM
MIS3150 Data and Information
Management
Query Languages –SQL
Arijit Sengupta
Structure of this semester
ISOM
MIS3150
1. Design
0. Intro
Database
Fundamentals
Conceptual
Modeling
Relational
Model
2. Querying
Query
Languages
Advanced
SQL
3. Advanced
Topics
Transaction
Management
4. Applications
Java DB
Applications –
JDBC
Data
Mining
Normalization
Newbie
Users
Designers
Developers
Professionals
Today’s Buzzwords
ISOM
•
•
•
•
•
•
•
•
Query Languages
Formal Query Languages
Procedural and Declarative Languages
Relational Algebra
Relational Calculus
SQL
Aggregate Functions
Nested Queries
Objectives
ISOM
At the end of the lecture, you should
• Get a formal as well as practical perspective on query
languages
• Have a background on query language basics (how they
came about)
• Be able to write simple SQL queries from the specification
• Be able to look at SQL queries and understand what it is
supposed to do
• Be able to write complex SQL queries involving nesting
• Execute queries on a database system
Set Theory Basics
ISOM
• A set: a collection of distinct items with
no particular order
• Set description:
 { b | b is a Database Book}
 {c | c is a city with a population of over a
million}
 {x | 1 < x < 10 and x is a natural number}
• Most basic set operation:
 Membership: x  S (read as x belongs to S if
x is in the set S)
Other Set Operations
ISOM
• Addition, deletion (note that adding an existing
item in the set does not change it)
• Set mathematics:
 Union R  S = { x | x  R or x  S}
 Intersection R  S = { x | x R and x  S}
 Set Difference R – S = { x | x  R and x  S}
 Cross-product R x S = { <x,y> | x  R and y  S}
• You can combine set operations much like
arithmetic operations: R – (S  T)
• Usually no well-defined precedence
Relational Query Languages
ISOM
• Query languages: Allow manipulation and retrieval
of data from a database.
• Relational model supports simple, powerful QLs:


Strong formal foundation based on logic.
Allows for much optimization.
• Query Languages != programming languages!



QLs not expected to be “Turing complete”.
QLs not intended to be used for complex calculations.
QLs support easy, efficient access to large data sets.
Formal Relational Query
Languages
ISOM
Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for
implementation:
Relational Algebra: More operational, very
useful for representing execution plans.
Relational Calculus: Lets users describe
what they want, rather than how to compute
it. (Non-operational, declarative.)
 Understanding Algebra & Calculus is key to
understanding SQL, query processing!
Structured Query Language
ISOM
• Need for SQL
 Operations on Data Types
 Definition Manipulation
 Operations on Sets
 Declarative (calculus) vs. Procedural (algebra)
• Evolution of SQL
 SEQUEL ..SQL_92 .. SQL_93
 SQL Dialects
Does SQL treat Relations as ‘Sets’?
Preliminaries
ISOM
• A query is applied to relation instances, and
the result of a query is also a relation
instance.


Schemas of input relations for a query are fixed
(but query will run regardless of instance!)
The schema for the result of a given query is also
fixed! Determined by definition of query language
constructs.
• Positional vs. named-field notation:


Positional notation easier for formal definitions,
named-field notation more readable.
Both used in SQL
Example Instances
ISOM
• Students, Registers, Courses relations for our examples.
R1
S1
sid cid
22 101
58 103
sid
22
31
58
semester
Fall 99
Spring 99
sname GPA
dustin
3.5
lubber
3.8
rusty
4.0
age
25.0
25.5
23.0
C1
cid cname
dept
101 Database CIS
103 Internet
ECI
S2
sid
28
31
44
58
sname GPA
yuppy
3.9
lubber
3.8
guppy
3.5
rusty
4.0
age
24.0
25.5
25.5
23.0
Relational Algebra
ISOM
• Basic operations:







Selection ( ) Selects a subset of rows from relation.
Projection ( ) Deletes unwanted columns from relation.
Cross-product ( ) Allows us to combine two relations.
Set-difference ( ) Tuples in reln. 1, but not in reln. 2.
Union (  ) Tuples in reln. 1 and in reln. 2.


• Additional operations:

Intersection, join, division, renaming: Not essential, but
(very!) useful.
• Since each operation returns a relation, operations
can be composed! (Algebra is “closed”.)
Projection
ISOM
• Deletes attributes that are not in
projection list.
• Schema of result contains
exactly the fields in the
projection list, with the same
names that they had in the
(only) input relation.
• Projection operator has to
eliminate duplicates! (Why??)
 Note: real systems typically
don’t do duplicate elimination
unless the user explicitly asks
for it. (Why not?)
sname
GPA
yuppy
lubber
guppy
rusty
3.9
3.8
3.5
4.0
 sname, gpa(S 2)
age
24.0
25.5
23.0
 age(S 2)
Vertical Slices
ISOM
• Projection
 Specifying Elements
No Specification
List all information about
Students

select
from
*
STUDENT;
(Student)
Algebra: projection
<A1,A2,...Am> (R)
Conditional
List IDs, names, and addresses of
all students

select StudentID, name, address
from STUDENT;
 StudentID, name, address (Student)
Does SQL treat Relations as
‘Sets’?
ISOM
What are the different salaries we pay to our
employees?
select
from
salary
EMPLOYEE;
OR is the following better?
select
from
DISTINCT salary
EMPLOYEE;
Selection
ISOM
• Selects rows that satisfy
selection condition.
• No duplicates in result!
(Why?)
• Schema of result identical
to schema of (only) input
relation.
• Result relation can be the
input for another relational
algebra operation!
(Operator composition.)
sid sname GPA
28 yuppy 3.9
58 rusty 4.0
age
35.0
35.0
 gpa3.8(S 2)
sname
yuppy
rusty
GPA
3.9
4.0
 sname,gpa(
(S 2))
gpa3.8
Horizontal Slices
ISOM
Algebra: selection
or restriction
(R)
• Restriction
 Specifying Conditions
Unconditional
List all students

select
from
*
STUDENT;
(Student)
Conditional
List all students with GPA > 3.0
select *
from STUDENT
where GPA > 3.0;


GPA > 3.0
(Student)
Specifying Conditions
ISOM
List all students in ...
select *
from
STUDENT
where city in (‘Boston’,’Atlanta’);
List all students in ...
select *
from
STUDENT
where zip not between 60115 and 60123;
Pattern Matching
ISOM
‘%’
‘_’
x
any string with n characters, n>=0
any single character.
exact sequence of string x.
List all CIS courses.
select *
from
COURSE
where course# like ‘CIS%’;
List all CIS 3200 level courses.
select *
from COURSE
where course# like ? ;
Missing or Incomplete Information
ISOM
•List all students whose address or telephone number is missing:
select
from
where
*
STUDENT
Address is null or GPA is null;
Horizontal and Vertical
ISOM
Query:
List all student ID, names and addresses who have
GPA > 3.0 and date of birth before Jan 1, 1980.
select
from
where
order by
StudentID, Name, Address
STUDENT
GPA > 3.0 and DOB < ‘1-Jan-80’
Name DESC;
Algebra:  StudentID,name, address ( GPA > 3.0 and DOB < ‘1-Jan-80’ (STUDENT))
Calculus: {t.StudentID, t.name, t.address | t  Student t.GPA > 3.0 
t.DOB < ‘1-Jan-80’}
Order by sorts result in descending (DESC) order.
Note: The default order is ascending (ASC) as in:
order by Name;
Union, Intersection, Set-Difference
ISOM
• All of these operations take
two input relations, which
must be union-compatible:
 Same number of fields.
 `Corresponding’ fields
have the same type.
• What is the schema of
result?
sid
sname
gpa
age
22
31
58
44
28
dustin
lubber
rusty
guppy
yuppy
3.5
3.8
4.0
3.5
3.9
25.0
25.5
23.0
25.5
24.0
S1 S2
sid sname gpa
31 lubber 3.8
58 rusty 4.0
S1 S2
sid
22
sname
dustin
gpa
3.5
S1 S2
age
25.5
23.0
age
25.0
Union
ISOM
List
students who live in Atlanta or GPA > 3.0
select
StudentID, Name, DOB, Address
from
STUDENT
where
Address = ‘Atlanta’
union
select
StudentID, Name, DOB, Address
from
STUDENT
where
GPA > 3.0;
Can we perform a Union on any two Relations ?
Union Compatibility
ISOM
Two
relations, A and B, are union-compatible
if
A and B contain a same number of attributes, and
The corresponding attributes of the two have the same domains
Examples
CIS=Student (ID: Did; Name: Dname; Address: Daddr; Grade: Dgrade);
Senior-Student (SName: Dname; S#: Did; Home: Daddr; Grade: Dgrade);
Course (C#: Dnumber; Title: Dstr; Credits: Dnumber)
Are CIS-Student and Senior-Student union compatible?
Are CIS-Student and Course union compatible?
What happens if we have duplicate tuples?
What will be the column names in the resulting Relation?
Union, Intersect, Minus
ISOM
select
CUSTNAME, ZIP
from
CUSTOMER
where
STATE = ‘MA’
UNION
select
SUPNAME, ZIP
from
SUPPLIER
where
STATE = ‘MA’
ORDER BY 2;
B
A
select
CUSTNAME, ZIP
from
CUSTOMER
where
STATE = ‘MA’
MINUS
select
SUPNAME, ZIP
from
SUPPLIER
where
STATE = ‘MA’
ORDER BY 2;
select
CUSTNAME, ZIP
from
CUSTOMER
where
STATE = ‘MA’
INTERSECT
select
SUPNAME, ZIP
from
SUPPLIER
where
STATE = ‘MA’
ORDER BY 2;
B
A
B
A
Cross-Product
ISOM
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1,
with field names `inherited’ if possible.
 Conflict: Both S1 and R1 have a field called sid.
(sid) sname GPA
Age (sid) cid
semester
22
dustin
3.5
25.0
22
101
Fall 99
22
dustin
3.5
25.0
58
103
Spring 99
31
lubber
3.8
25.5
22
101
Fall 99
31
lubber
3.8
25.5
58
103
Spring 99
58
rusty
4.0
23.0
22
101
Fall 99
58
rusty
4.0
23.0
58
103
Spring 99
 Renaming operator:
 (C(1 sid1, 5  sid 2), S1  R1)
Joins
ISOM
• Condition Join:
(sid)
22
31
sname
dustin
lubber
R  c S   c ( R  S)
GPA age (sid) cid Semester
3.5
25.0 58
103 Spring 99
3.8
25.5 58
103 Spring 99
S1 
R1
S1. sid  R1. sid
• Result schema same as that of crossproduct.
• Fewer tuples than cross-product, might be
able to compute more efficiently
• Sometimes called a theta-join.
Joins
ISOM
• Equi-Join: A special case of condition join where the
condition c contains only equalities.
sid
22
58
sname
dustin
rusty
GPA
3.5
4.0
S1 
age
25.0
23.0
sid
cid
101
103
semester
Fall 99
Spring 99
R1
• Result schema similar to cross-product, but only one
copy of fields for which equality is specified.
• Natural Join: Equijoin on all common fields.
Find names of students who have taken
course #103
ISOM


Solution 1:
 sname((
Solution 2:
Re gisters) Students)
cid 103
 (Temp1,
Re gisters)
cid 103
 (Temp2,Temp1 Students)
 sname (Temp2)

Solution 3:
 sname(
(Re gisters Students))
cid 103
Connecting/Linking Relations
ISOM
List
information about all students and the classes they are taking
Student
ID
s1
s2
s3
***
Name
Jose
Alice
Tome
***
***
***
***
***
***
Class
Emp#
e1
e3
e2
***
ID
C#
***
s1
BA 201
***
s2
CIS 300
***
s3
CIS 304
***
***
***
What can we use to connect/link Relations?
Join: Connecting relations so that relevant tuples can be retrieved.
Join
ISOM
Cartesian
Product
R1
Student: 30 tuples
R2
Class: 4 tuples
Total Number of Tuples in the Cartesian Product. ?
(match each tuple of student to every tuple of class)
Select tuples having identical Student Ids.
Expected number of such Tuples:
Join Selectivity
Join Forms
R1
R2
ISOM
• General Join Forms
 Equijoin
 Operator Dependent
• Natural Join
• Outer Join
select
from
where
s.*, c.*
STUDENT s, CLASS c
s.StudentID = c. SID;
R1
R2
=
x > y
<>
...
 Left
 Right
 Full
select
from
where
s.*, c.*
STUDENT s, CLASS c
s.StudentID = c.SID (+);
Find names of students who have taken a
CIS course
ISOM
• Information about departments only available in
Courses; so need an extra join:
 sname((

Courses ) Re gisters  Students)
dept 'CIS '
A more efficient solution:
 sname(
sid
((

Courses) Re gis)  Students)
cid dept 'CIS '
 A query optimizer can find this given the first solution!
Find students who have taken an MIS or a CS
course
ISOM
• Can identify all MIS or CS courses, then find students
who have taken one of these courses:
 (Temp1,(
dept 'MIS'dept 'CS '
Courses))
 sname(Temp1 Re gis  Students)


Can also define Temp1 using union! (How?)
What happens if
 is replaced by  in this query?
Find students who have taken a CIS and an ECI
Course
ISOM
• Previous approach won’t work! Must identify
students who have taken CIS courses, students who
have taken ECI courses, then find the intersection
(note that sid is a key for Students):
 (Temp1,
sid
 (Temp2,
((
sid
Courses ) Re gis))
dept 'CIS '
((
Courses ) Re gis))
dept 'ECI '
 sname((Temp1Temp2) Students)
Relational Calculus
ISOM
• Comes in two flavours: Tuple relational calculus
(TRC) and Domain relational calculus (DRC).
• Calculus has variables, constants, comparison ops,
logical connectives and quantifiers.



TRC: Variables range over (i.e., get bound to) tuples.
DRC: Variables range over domain elements (= field values).
Both TRC and DRC are simple subsets of first-order logic.
• Expressions in the calculus are called formulas. An
answer tuple is essentially an assignment of
constants to variables that make the formula evaluate
to true.
Find students with GPA > 3.7 who have
taken a CIS Course
ISOM
t |tStudents t.GPA 3.7
r rRe gisr.sid t.sid 





TRC:




c cCoursesc.cid  r.cid c.dept 'CIS'
 
 
 

 




DRC:





I , N ,G, A | I , N ,G, A StudentsG 3.7
Ir,Cr,S Ir,Cr,S Re gisIr  I 





C,CN , D C,CN , D Courses C  Cr  D 'CIS '





 
 
 
 
 
Find students who have taken all
CIS courses
ISOM
TRC:
t|tStudents 



c cCourses^c.dept 'CIS '








DRC:





r rRe gisr.sid t.sid r.cid  c.cid




  
 
  

  

I , N ,G, A | I , N ,G, A Students
C,CN ,D C,CN , D Courses^ D 'CIS '














Ir,Cr,S Ir,Cr,S Re gis I  Ir Cr C
How will you do this with Relational Algebra?
  
 
  
  
  
Monotonic and Non-Monotonic
Queries
ISOM
• Monotonic queries: queries for which the size of
the results either increase or stay the same as the
size of the inputs increase. The result size never
decreases
• Non-monotonic queries: queries for which it is
possible that the size of the result will DECREASE
when the size of the input increases
• Examples of each?
• Which of the algebra operations is non-monotonic?
• What does this signify?
Summaries and Aggregates
ISOM
Calculate the average GPA
select avg. (GPA)
from STUDENT,
Find the lowest GPA
select min (GPA) as minGPA
from STUDENT,
How many CIS majors?
select
from
where
Discarding duplicates
select avg (distinct GPA)
STUDENT
where major=‘CIS’
count (StudentId)
STUDENT
major=‘CIS’;
(is this above query correct?)
Aggregate Functions
ISOM
COUNT (attr)
attr
SUM (attr)
AVG (attr)
MAX (attr)
MIN (attr)
- a simple count of values in
-
sum of values in attr
average of values in attr
maximum value in attr
minimum value in attr
Take
effect after all the data is retrieved from the database
Applied to either the entire resulting relation or groups
Can’t be involved in any query qualifications (where clause)
Would
the following query be permitted?
select
StudentId
from
STUDENT
where
GPA = max (GPA);
Grouping Results Obtained
ISOM
Show
all students enrolled in each course.
select
cno, StudentID
from
REGISTRATION
group by cno;
Is this grouping OK?
Calculate the average GPA of students by county.
select
county, avg (GPA) as CountyGPA
from
STUDENT
group by county;
Calculate the enrollment of each class.
select
cno, year , term, count (StudentID) as enroll
from
REGISTRATION
group by cno, year, term;
Selections on Groups
ISOM

Show all CIS courses that are full.
select
cno, count (StudentID)
from
REGISTRATION
group by
cno
having
count (StudentID) > 29;
Grouping Results after Join
ISOM
Calculate
the average GPA of each class
select
from
where
group by
course#, avg (GPA)
STUDENT S, CLASS C
S.StudentID = C.SID
course#,
Nesting Queries
ISOM
SELECT
FROM
WHERE
}
attribute(s)
relation(S)
attr [not] {in | comparison operator | exists
( query statement(s) );
List names of students who are taking “BA201”
select
Name
from
Student
where
StudentID in
( select
StudentID
from
REGISTRATION
where course#=‘BA201’);
Sub Queries
ISOM
List all students enrolled in CIS courses
select name
from
STUDENT
where StudentId in
(select
StudentId
from
REGISTRATION
where
cno like ‘CIS%’);
List all courses taken by Student (Id 1011)
select cname
from
COURSE
where cnum = any
(select
cno
from
REGISTRATION
where
StudentId = 1011);
Sub Queries
ISOM
Who received the highest grade in CIS 8140
select
StudentId
from
REGISTRATION
where
cnum = ‘CIS 8140’ and
grade >=all
(select grade
from
REGISTRATION
where
cno = ‘CIS 8140’);
List all students enrolled in CIS courses.
select
name
from
STUDENT S
where
exists
(select *
from
REGISTRATION
where
StudentId = S.StudentId
and cno like ‘CIS%’);
Relational Views
ISOM
• Relations derived from other relations.
• Views have no stored tuples.
• Are useful to provide multiple user views.
View 1
View 2
Base
Relation 1
View N
Base
Relation 2
•What level in the three layer model do views belong?
•Which kind of independence do they support?
View Creation
ISOM
Create View view-name [ ( attr [ , attr ] ...) ]
AS subquery
[ with check option ] ;
DROP VIEW view-name;
 Create a view containing the student ID,
Name, Age and GPA for those who are
qualified to take 300 level courses, i.e., GPA
>=2.0.
View Options
ISOM
• With Check Option enforces the query
condition for insertion or update
To enforce the GPA >=2.0 condition on
all new student tuples inserted into the
view
• A view may be derived from multiple
base relations
Create a view that includes student IDs, student names
and their instructors’ names for all CIS 300 students.
View Retrieval
ISOM
Queries on views are the same as that on base
relations.
Queries on views are expanded into queries on
their base relations.
select
from
where
Name, Instructor-Name
CIS300-Student
Name = Instructor-Name;
View: Update
ISOM
Update on a view actually changes its base relation(s)!
update Qualified-Student
set
GPA = GPA-0.1
where
StudentID = ‘s3’;
insert into
Qualified-Student
values
( ‘s9’, ‘Lisa’, 4.0 )
insert into
Qualified-Student
values
( ‘s10’, ‘Peter’, 1.7 )
Why are some views not updateable?
What type of views are updateable?
Non-monotonic queries – again!
ISOM
• Need to use either MINUS or NOT
EXISTS!
• Find courses where no student has
gpa over 3.5
• Find students who have taken all
courses that Joe has taken
• How would you solve these?
Summary
ISOM
• SQL is a low-complexity, declarative query
language
• The good thing about being declarative is
that internally the query can be changed
automatically for optimization
• Good thing about being low-complexity?
 No SQL query ever goes into an infinite loop
 No SQL query will ever take indefinite amount of
space to get the solution
• Can be used for highly complex problems!
Download