Solution - UMass Amherst

CMPSCI 645 -- Homework 1
Due February 10, before class
Exercise 1 [24 points]: ER Modeling
Computer Sciences Department frequent fliers have been complaining to Amherst Airport officials about the
poor organization at the airport. As a result, the officials decided that all information related to the airport
should be organized using a DBMS, and you have been hired to design the database. Your first task is to
organize the information about all the airplanes stationed and maintained at the airport. The relevant
information is as follows:
• Every airplane has a registration number, and each airplane is of a specific model.
• The airport accommodates a number of airplane models, and each model is identified by a model number
  (e.g., DC-10) and has a capacity and a weight.
• A number of technicians work at the airport. You need to store the name, SSN, address, phone number,
  and salary of each technician.
• Each technician is an expert on one or more plane model(s), and his or her expertise may overlap with
  that of other technicians. This information about technicians must also be recorded.
• Technicians periodically perform tests to ensure that airplanes are still airworthy. Each test has a Federal
  Aviation Administration (FAA) test number, a name, and a maximum possible score.
• The FAA requires the airport to keep track of each time a given airplane is tested by a given technician
  using a given test. For each testing event, the information needed is the date, the number of hours the
  technician spent doing the test, and the score the airplane received on the test.
• A number of traffic controllers also work at the airport. Traffic controllers must have an annual medical
  examination. For each traffic controller, you must store the date of the most recent exam.
• All airport employees (including technicians and traffic controllers) belong to a union. You must store
  the union membership number of each employee. You can assume that each employee is uniquely
  identified by a social security number.
Draw an ER diagram for the airport database. Be sure to indicate the various attributes of each entity and
relationship set. Also specify the key and participation constraints for each relationship set. Specify any
necessary overlap and covering constraints as well (in English).
Note that you can also assume that every technician and traffic controller has a name, address, phone
number, and salary. The ER diagram needs to be revised accordingly.
Exercise 2 [16 points]: Relational Algebra
(a) Division [4 points]
Please express Division using the five basic relational operators. As with the example in class, let us denote
division by A/B where A = {x,y} and B={y}. For simplicity, assume that x, y are two attributes.
(b) Monotonicity [12 points]
A query or operator on relations is said to be monotonic if whenever we add a tuple to one of the input
relations, the result contains all the tuples that it contained before adding the tuple, plus perhaps more tuples.
That is, there is no way to remove tuples from the output by adding tuples to the input.
For each relational algebra operator below, state whether it is monotone.
(1) ∪
(2) ∩
(3) −
(4) ×
(5) σ
(6) π
Solution:
(a)
ρ(T, πx(A) × B)
ρ(U, T – A)
ρ(V, πx(U))
πx(A) - V
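The four steps above can be checked on a small instance. The following is a minimal sketch using Python sets, assuming A is a set of (x, y) pairs and B is modeled as a plain set of y values; the sample data and the helper names `project_x` and `divide` are hypothetical.

```python
def project_x(rel):
    """π_x: keep the x component of each (x, y) pair."""
    return {x for (x, _y) in rel}

def divide(a, b):
    """A / B built only from projection, cross product, and difference."""
    xs = project_x(a)
    t = {(x, y) for x in xs for y in b}  # T = π_x(A) × B: all pairs that must exist
    u = t - a                            # U = T − A: required pairs that are missing
    v = project_x(u)                     # V = π_x(U): x values that are disqualified
    return xs - v                        # π_x(A) − V

# Hypothetical toy instance (not from the assignment).
A = {("s1", "p1"), ("s1", "p2"), ("s2", "p1"),
     ("s3", "p1"), ("s3", "p2"), ("s3", "p3")}
B = {"p1", "p2"}
result = divide(A, B)
print(sorted(result))  # ['s1', 's3']
```

Here s2 is dropped because the pair (s2, p2) appears in T but not in A.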
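(b) (1) Monotone, (2) Monotone, (3) Not monotone, (4) Monotone, (5) Monotone, (6) Monotone
Difference R − S is not monotone because adding a tuple to S can remove a tuple from the result.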
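A concrete counterexample makes the answer for difference easy to verify. The sketch below uses Python sets with hypothetical one-attribute relations R and S; it shows that growing the second input of R − S shrinks the output, while growing the first input or either input of a union never does.

```python
R = {1, 2, 3}
S = {3}

before = R - S                 # {1, 2}
S_bigger = S | {2}             # add one tuple to the second input
after = R - S_bigger           # {1}
lost = before - after          # tuples that disappeared from the output

# Growing the FIRST input of a difference is harmless.
grown_first = (R | {4}) - S    # {1, 2, 4}, still contains `before`

# Union, by contrast, is monotone in both inputs.
union_before = R | S
union_after = R | S_bigger

print(lost)  # {2}
```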
Exercise 3 [16 points] Language Theory
Can the following queries be expressed using conjunctive queries? If your answer is yes, write the
conjunctive queries. If your answer is no, explain why.
(1) Find students who have taken ‘Database Systems’ but not ‘Operating Systems’.
(2) Find the age of the youngest student who has taken ‘Database Systems’.
(3) The following table lists direct cause effect relationships.
CauseEffect
Cause | Effect
------+-------
a     | d
b     | d
c     | e
d     | f
d     | g
e     | x
g     | x
Now we want to support the following query:
Among all causes that directly or indirectly contribute to the effect ‘X’, find other (direct or indirect)
effects that they contribute to.
Please write a SQL statement for the above query.
Hints:
• You can use the WITH construct in SQL to create temporary relations (either recursive or not) to be
used in the final query. The syntax of WITH is the following:
WITH RECURSIVE R AS (query)
<final query involving R and possibly other relations>;
• If you only want to create a nonrecursive temporary relation for use by another query, you can define
the temporary relation in the FROM clause or create a view.
(4) Show the intermediate steps of computing the table AllCauseEffect when you execute your query on the
given table.
Solution:
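(1) No, because the query is not monotonic: adding a tuple recording that a student has taken
‘Operating Systems’ can remove that student from the answer. As we know from the lecture, all
conjunctive queries are monotonic.
(2) No, for the same reason: adding a younger student who has taken ‘Database Systems’ removes the
previous minimum age from the answer.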
Note that to show inexpressibility, it is better to use the theorem on monotonicity. We will give partial credit if a
student states “we cannot express the query using relational operators including selection, projection, and join”.
This is not a perfect argument because it does not establish that no such expression exists; it may simply be
that we have not been lucky enough to find one.
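The monotonicity argument for (1) and (2) can be illustrated on a tiny instance. The exercise does not give a schema, so the sketch below assumes a hypothetical relation Takes(student, course) and a hypothetical mapping from student to age; the function names are illustrative only.

```python
def db_not_os(takes):
    """Students who took 'Database Systems' but not 'Operating Systems'."""
    db = {s for (s, c) in takes if c == "Database Systems"}
    os_ = {s for (s, c) in takes if c == "Operating Systems"}
    return db - os_

def youngest_db_age(ages, takes):
    """Age of the youngest student who took 'Database Systems' (as a set)."""
    db_ages = {ages[s] for (s, c) in takes if c == "Database Systems"}
    return {min(db_ages)} if db_ages else set()

# Hypothetical data.
ages = {"ann": 22, "bob": 19}
takes = {("ann", "Database Systems")}

q1_before = db_not_os(takes)
q1_after = db_not_os(takes | {("ann", "Operating Systems")})

q2_before = youngest_db_age(ages, takes)
q2_after = youngest_db_age(ages, takes | {("bob", "Database Systems")})

print(q1_before, q1_after)  # {'ann'} set()
print(q2_before, q2_after)  # {22} {19}
```

In both cases the input only grew, yet a tuple that was in the answer is no longer there, which a conjunctive query can never do.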
(3) SQL query:
WITH RECURSIVE AllCauseEffect(cause, effect) AS
( SELECT * FROM CauseEffect
  UNION
  SELECT R1.cause, R2.effect
  FROM AllCauseEffect R1, CauseEffect R2
  WHERE R1.effect = R2.cause )
SELECT C.cause, A.effect
FROM ( SELECT DISTINCT R.cause
       FROM AllCauseEffect R
       WHERE R.effect = 'X' ) AS C, AllCauseEffect A
WHERE C.cause = A.cause AND A.effect <> 'X';
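To try the query without access to the course server, one option is an in-memory SQLite database driven from Python. This is only a sketch under two assumptions: SQLite stands in for PostgreSQL, and the literal is written as lowercase 'x' so that it matches the values shown in the CauseEffect table (string comparison is case-sensitive).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CauseEffect(cause TEXT, effect TEXT)")
conn.executemany(
    "INSERT INTO CauseEffect VALUES (?, ?)",
    [("a", "d"), ("b", "d"), ("c", "e"), ("d", "f"),
     ("d", "g"), ("e", "x"), ("g", "x")],
)

QUERY = """
WITH RECURSIVE AllCauseEffect(cause, effect) AS
( SELECT cause, effect FROM CauseEffect
  UNION
  SELECT R1.cause, R2.effect
  FROM AllCauseEffect R1, CauseEffect R2
  WHERE R1.effect = R2.cause )
SELECT C.cause, A.effect
FROM ( SELECT DISTINCT R.cause
       FROM AllCauseEffect R
       WHERE R.effect = 'x' ) AS C, AllCauseEffect A
WHERE C.cause = A.cause AND A.effect <> 'x'
ORDER BY C.cause, A.effect
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The ORDER BY clause is added only to make the output deterministic. The causes e and g do contribute to x, but they have no other effects and so produce no rows.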
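(4) We execute a recursive query on this table to compute all cause/effect pairs.
Initially, AllCauseEffect contains the seven tuples of CauseEffect.
Iteration 1: we add tuples (a,f), (a,g), (b,f), (b,g), (c,x), and (d,x)
Iteration 2: we add tuples (a,x), (b,x)
Iteration 3: no new tuples are produced, so the computation stops.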
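The iterations can be reproduced with a simple fixpoint loop. The sketch below is a naive evaluation in plain Python (each round joins everything derived so far with the base table); a real engine may organize the work differently, but the tuples that are new in each round are the same for this input. The function name `trace_closure` is illustrative.

```python
BASE = {("a", "d"), ("b", "d"), ("c", "e"), ("d", "f"),
        ("d", "g"), ("e", "x"), ("g", "x")}

def trace_closure(base):
    """Return the full AllCauseEffect set and the new tuples found per round."""
    all_pairs = set(base)          # seed with the nonrecursive part
    rounds = []
    while True:
        # Join AllCauseEffect R1 with CauseEffect R2 on R1.effect = R2.cause.
        derived = {(c, e2) for (c, e1) in all_pairs
                   for (c2, e2) in base if e1 == c2}
        new = derived - all_pairs  # UNION discards tuples already present
        if not new:                # fixpoint reached
            break
        rounds.append(new)
        all_pairs |= new
    return all_pairs, rounds

closure, rounds = trace_closure(BASE)
for i, new in enumerate(rounds, start=1):
    print(f"Iteration {i}: {sorted(new)}")
```

For the given table this yields two productive rounds and a closure of 15 tuples.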
Exercise 4 [20 points] Queries in Relational Algebra
Consider the following relational schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)
The domain of each field is listed after the field name. Naturally, the Suppliers and Parts relations represent
supplier entities and part entities and their attributes. The Catalog relation lists the prices charged for parts by
suppliers.
Write the following queries in relational algebra:
(1) Find the Supplier names of the suppliers who supply a red part that costs less than 100 dollars.
πsname((σcolor='red' Parts) ⋈ (σcost < 100 Catalog) ⋈ Suppliers)
(2) Find the Supplier names of the suppliers who supply a red part that costs less than 100 dollars and a green
part that costs less than 100 dollars.
ρ(R1, πsid ((σcolor='red' Parts) ⋈ (σcost < 100 Catalog)) )
ρ(R2, πsid ((σcolor='green' Parts) ⋈ (σcost < 100 Catalog)) )
πsname ((R1 ∩ R2) ⋈ Suppliers)
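The intersection in query (2) is needed because no single Parts tuple is both red and green; the two conditions must be evaluated separately and combined on sid. A minimal sketch with Python sets follows, using a hypothetical toy instance of the schema (not the course dataset) and an illustrative helper `cheap_sids`.

```python
# Hypothetical toy instance: Parts(pid, pname, color), Catalog(sid, pid, cost),
# Suppliers(sid, sname, address).
parts = {(1, "bolt", "red"), (2, "nut", "green"), (3, "gear", "red")}
catalog = {(10, 1, 50.0), (10, 2, 80.0), (20, 1, 40.0),
           (20, 2, 150.0), (30, 3, 90.0)}
suppliers = {(10, "Acme", "1 Main St"), (20, "Best", "2 Oak Ave"),
             (30, "Core", "3 Elm Rd")}

def cheap_sids(color):
    """π_sid((σ_color Parts) ⋈ (σ_cost<100 Catalog)) for one color."""
    return {sid for (sid, pid, cost) in catalog if cost < 100
            for (p, _name, c) in parts if p == pid and c == color}

r1 = cheap_sids("red")      # suppliers with a cheap red part
r2 = cheap_sids("green")    # suppliers with a cheap green part
names = {sname for (sid, sname, _addr) in suppliers if sid in (r1 & r2)}
print(names)  # {'Acme'}
```

Supplier 20 sells a green part, but at 150 dollars, so it appears in R1 only.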
(3) Find the pids of parts supplied by at least two different suppliers.
ρ(R1, Catalog)
ρ(R2, Catalog)
πR1.pid(σR1.pid=R2.pid ∧ R1.sid≠R2.sid (R1 × R2))
(4) Find the names of suppliers that supply all red parts.
πsname( (πsid,pid (Catalog) / (πpid (σcolor='red' Parts))) ⋈ Suppliers)
(5) Find the names of suppliers that supply only the parts that cost less than 100 dollars.
πsname( ( πsid (Catalog) − πsid (σcost >= 100 Catalog) ) ⋈ Suppliers)
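Query (5) follows a common pattern for "only": start from all candidates and subtract those with at least one violating tuple. The sketch below replays it with Python sets on a hypothetical toy instance; supplier 40 is included to show that a supplier with no Catalog entries is not returned, since it never appears in πsid(Catalog).

```python
# Hypothetical toy instance: Catalog(sid, pid, cost), Suppliers(sid, sname).
catalog = {(10, 1, 50.0), (10, 2, 80.0), (20, 1, 40.0),
           (20, 2, 150.0), (30, 3, 90.0)}
suppliers = {(10, "Acme"), (20, "Best"), (30, "Core"), (40, "Dune")}

all_sids = {sid for (sid, _pid, _cost) in catalog}                 # π_sid(Catalog)
expensive = {sid for (sid, _pid, cost) in catalog if cost >= 100}  # π_sid(σ_cost>=100 Catalog)
only_cheap = all_sids - expensive                                  # difference

names = {sname for (sid, sname) in suppliers if sid in only_cheap}  # join with Suppliers
print(sorted(names))  # ['Acme', 'Core']
```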
Exercise 5 [24 points] SQL Queries using PostgreSQL
You will use PostgreSQL to execute queries on a sample dataset that consists of three tables conforming to
the schema in Exercise 4. Both the dataset and instructions for connecting to the PostgreSQL server are
available on the assignment web page.
Please take the following steps to complete this exercise.
Step 1: Place the dataset in an appropriate place in your home directory on the Edlab machines.
Step 2: Connect to the PostgreSQL server using
psql -h db-edlab.cs.umass.edu -p 7645
Step 3: Inside PostgreSQL, write a CREATE TABLE command for each file in the dataset. An example:
create table suppliers(
sid int,
sname varchar(30),
address varchar(40),
primary key (sid));
Step 4: Inside PostgreSQL, change your work directory to where your dataset is placed and load the
dataset into corresponding tables. E.g.
yanlei=> \cd 'path-of-dataset'
yanlei=> \copy suppliers FROM suppliers.txt with delimiter as ','
yanlei=> \copy parts FROM parts.txt with delimiter as ','
yanlei=> \copy catalog FROM catalog.txt with delimiter as ','
You can use a SELECT query to check the content of each table. E.g.,
yanlei=> select * from parts;
Step 5: Now you are ready to submit SQL queries to retrieve the required information.
Submission: Please create a directory ‘hw1’ under your home directory on the edlab machines. For each
query below, store in the ‘hw1’ directory a valid SQL query, named “Q1.sql”, “Q2.sql”, …, and “Q5.sql”.
Write SQL expressions for each of the following queries and execute them:
(1) Find the sids of suppliers who supply a red part and a green part.
SELECT DISTINCT C.sid
FROM parts P, catalog C
WHERE P.pid = C.pid AND P.color = 'Red' AND C.sid IN (
    SELECT DISTINCT C1.sid
    FROM catalog C1, parts P1
    WHERE P1.color = 'Green' AND C1.pid = P1.pid);
 sid
-----
   3
(1 row)
(2) Find the sname of suppliers who supply every red part.
SELECT S.sname
FROM suppliers S
WHERE NOT EXISTS (
    SELECT P.pid
    FROM parts P
    WHERE P.color = 'Red' AND NOT EXISTS (
        SELECT C.pid
        FROM catalog C
        WHERE C.pid = P.pid AND C.sid = S.sid));
        sname
----------------------
 Big Red Tool and Die
(1 row)
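The double NOT EXISTS reads as "there is no red part that this supplier fails to supply". A minimal sketch for experimenting with it is given below, assuming an in-memory SQLite database in place of PostgreSQL and a hypothetical toy instance with trimmed-down tables (not the course dataset).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers(sid INT PRIMARY KEY, sname TEXT);
CREATE TABLE parts(pid INT PRIMARY KEY, color TEXT);
CREATE TABLE catalog(sid INT, pid INT);
INSERT INTO suppliers VALUES (10, 'Acme'), (20, 'Best'), (30, 'Core');
INSERT INTO parts VALUES (1, 'Red'), (2, 'Green'), (3, 'Red');
INSERT INTO catalog VALUES (10, 1), (10, 2), (10, 3), (20, 1), (30, 2);
""")

EVERY_RED = """
SELECT S.sname
FROM suppliers S
WHERE NOT EXISTS (
    SELECT P.pid
    FROM parts P
    WHERE P.color = 'Red' AND NOT EXISTS (
        SELECT C.pid
        FROM catalog C
        WHERE C.pid = P.pid AND C.sid = S.sid))
ORDER BY S.sname
"""

names = [row[0] for row in conn.execute(EVERY_RED)]
print(names)  # ['Acme']

# With no red parts at all, the condition holds vacuously for every supplier.
conn.execute("DELETE FROM parts WHERE color = 'Red'")
names_no_red = [row[0] for row in conn.execute(EVERY_RED)]
print(names_no_red)  # ['Acme', 'Best', 'Core']
```

Best supplies red part 1 but not red part 3, so the inner NOT EXISTS succeeds for part 3 and Best is rejected.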
(3) Find the sids of suppliers who charge more for some part than the average cost of that part (averaged over
all the suppliers who supply that part).
SELECT DISTINCT C.sid
FROM catalog C
WHERE C.cost > (
    SELECT avg(C1.cost)
    FROM catalog C1
    WHERE C1.pid = C.pid);
 sid
-----
   1
   2
   3
(3 rows)
(4) For each part, find the sname of the supplier who charges the most for that part.
SELECT P.pid, S.sname
FROM parts P, suppliers S, catalog C
WHERE C.pid = P.pid AND C.sid = S.sid AND C.cost = (
    SELECT max(C1.cost)
    FROM catalog C1
    WHERE C1.pid = P.pid);
OR
SELECT C.pid, S.sname
FROM suppliers S, catalog C, (
    SELECT pid, max(cost) AS maxcost
    FROM catalog
    GROUP BY pid
    ) AS Temp
WHERE C.sid = S.sid AND C.pid = Temp.pid AND C.cost = Temp.maxcost;
 pid |         sname
-----+-----------------------
   4 | Acme Widget Suppliers
   3 | Big Red Tool and Die
   1 | Big Red Tool and Die
   8 | Perfunctory Parts
   9 | Perfunctory Parts
   5 | Alien Aircaft Inc.
   6 | Alien Aircaft Inc.
   7 | Alien Aircaft Inc.
(8 rows)
OR
 pid |         sname
-----+-----------------------
   4 | Acme Widget Suppliers
   1 | Big Red Tool and Die
   3 | Big Red Tool and Die
   8 | Perfunctory Parts
   9 | Perfunctory Parts
   7 | Alien Aircaft Inc.
   6 | Alien Aircaft Inc.
   5 | Alien Aircaft Inc.
(8 rows)
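The two formulations differ only in where the per-part maximum is computed, so they should return the same set of rows; the differing row order in the two outputs reflects the absence of an ORDER BY. The sketch below compares them on a hypothetical toy instance, assuming an in-memory SQLite database in place of PostgreSQL. Part 2 has a tie, and part 3 has no catalog entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers(sid INT PRIMARY KEY, sname TEXT);
CREATE TABLE parts(pid INT PRIMARY KEY);
CREATE TABLE catalog(sid INT, pid INT, cost REAL);
INSERT INTO suppliers VALUES (10, 'Acme'), (20, 'Best'), (30, 'Core');
INSERT INTO parts VALUES (1), (2), (3);
INSERT INTO catalog VALUES (10, 1, 50.0), (20, 1, 70.0),
                           (10, 2, 30.0), (30, 2, 30.0);
""")

CORRELATED = """
SELECT P.pid, S.sname
FROM parts P, suppliers S, catalog C
WHERE C.pid = P.pid AND C.sid = S.sid AND C.cost = (
    SELECT max(C1.cost) FROM catalog C1 WHERE C1.pid = P.pid)
ORDER BY P.pid, S.sname
"""

GROUPED = """
SELECT C.pid, S.sname
FROM suppliers S, catalog C, (
    SELECT pid, max(cost) AS maxcost FROM catalog GROUP BY pid
    ) AS Temp
WHERE C.sid = S.sid AND C.pid = Temp.pid AND C.cost = Temp.maxcost
ORDER BY C.pid, S.sname
"""

rows_correlated = conn.execute(CORRELATED).fetchall()
rows_grouped = conn.execute(GROUPED).fetchall()
print(rows_correlated)  # [(1, 'Best'), (2, 'Acme'), (2, 'Core')]
```

Both suppliers tied for the maximum on part 2 are reported, and part 3 does not appear in either result.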
(5) Find the sids of suppliers who supply only red parts.
SELECT S.sid
FROM suppliers S
WHERE NOT EXISTS (
    SELECT *
    FROM catalog C, parts P
    WHERE C.pid = P.pid AND C.sid = S.sid AND P.color <> 'Red');
 sid
-----
   2
(1 row)
(6) For every supplier that supplies only red parts, print the sid of the supplier and the total number of parts
that she supplies.
SELECT S.sid, count(C.pid)
FROM suppliers S, catalog C
WHERE S.sid = C.sid AND NOT EXISTS (
    SELECT *
    FROM catalog C2, parts P
    WHERE C2.sid = S.sid AND C2.pid = P.pid AND P.color <> 'Red')
GROUP BY S.sid;
 sid | count
-----+-------
   2 |     3
(1 row)
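One subtlety in queries (5) and (6) is worth checking on a small instance: as written, query (5) also returns a supplier with no catalog entries, because NOT EXISTS is vacuously true for it, whereas query (6) joins with catalog and therefore omits such a supplier. The sketch below assumes an in-memory SQLite database in place of PostgreSQL and a hypothetical toy instance in which supplier 30 supplies nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers(sid INT PRIMARY KEY);
CREATE TABLE parts(pid INT PRIMARY KEY, color TEXT);
CREATE TABLE catalog(sid INT, pid INT);
INSERT INTO suppliers VALUES (10), (20), (30);
INSERT INTO parts VALUES (1, 'Red'), (2, 'Green'), (3, 'Red');
INSERT INTO catalog VALUES (10, 1), (10, 3), (20, 1), (20, 2);
""")

ONLY_RED = """
SELECT S.sid
FROM suppliers S
WHERE NOT EXISTS (
    SELECT *
    FROM catalog C, parts P
    WHERE C.pid = P.pid AND C.sid = S.sid AND P.color <> 'Red')
ORDER BY S.sid
"""

ONLY_RED_COUNT = """
SELECT S.sid, count(C.pid)
FROM suppliers S, catalog C
WHERE S.sid = C.sid AND NOT EXISTS (
    SELECT *
    FROM catalog C2, parts P
    WHERE C2.sid = S.sid AND C2.pid = P.pid AND P.color <> 'Red')
GROUP BY S.sid
ORDER BY S.sid
"""

sids_q5 = [row[0] for row in conn.execute(ONLY_RED)]
rows_q6 = conn.execute(ONLY_RED_COUNT).fetchall()
print(sids_q5)  # [10, 30]
print(rows_q6)  # [(10, 2)]
```

Whether a supplier with no parts should count as supplying "only red parts" depends on how the question is read; on a dataset where every supplier has at least one catalog entry, the two queries agree on the set of sids.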