CMPSCI 645 -- Homework 1
Due February 10, before class

Exercise 1 [24 points]: ER Modeling

Computer Sciences Department frequent fliers have been complaining to Amherst Airport officials about the poor organization at the airport. As a result, the officials decided that all information related to the airport should be organized using a DBMS, and you have been hired to design the database. Your first task is to organize the information about all the airplanes stationed and maintained at the airport. The relevant information is as follows:

• Every airplane has a registration number, and each airplane is of a specific model.
• The airport accommodates a number of airplane models, and each model is identified by a model number (e.g., DC-10) and has a capacity and a weight.
• A number of technicians work at the airport. You need to store the name, SSN, address, phone number, and salary of each technician.
• Each technician is an expert on one or more plane model(s), and his or her expertise may overlap with that of other technicians. This information about technicians must also be recorded.
• Technicians periodically perform tests to ensure that airplanes are still airworthy. Each test has a Federal Aviation Administration (FAA) test number, a name, and a maximum possible score.
• The FAA requires the airport to keep track of each time a given airplane is tested by a given technician using a given test. For each testing event, the information needed is the date, the number of hours the technician spent doing the test, and the score the airplane received on the test.
• A number of traffic controllers also work at the airport. Traffic controllers must have an annual medical examination. For each traffic controller, you must store the date of the most recent exam.
• All airport employees (including technicians and traffic controllers) belong to a union. You must store the union membership number of each employee.
You can assume that each employee is uniquely identified by a social security number.

Draw an ER diagram for the airport database. Be sure to indicate the various attributes of each entity and relationship set. Also specify the key and participation constraints for each relationship set. Specify any necessary overlap and covering constraints as well (in English).

Note that you can also assume that every technician and traffic controller has a name, address, phone number, and salary. The ER diagram needs to be revised accordingly.

Exercise 2 [16 points]: Relational Algebra

(a) Division [4 points]
Express division using the five basic relational operators. As in the example from class, denote division by A/B, where A = {x, y} and B = {y}. For simplicity, assume that x and y are each a single attribute.

(b) Monotonicity [12 points]
A query or operator on relations is said to be monotonic if, whenever we add a tuple to one of the input relations, the result contains all the tuples that it contained before adding the tuple, plus perhaps more tuples. That is, there is no way to remove tuples from the output by adding tuples to the input. For each relational algebra operator below, state whether it is monotone.

(1) ∪   (2) ∩   (3) −   (4) ×   (5) σ   (6) π

Solution:

(a) ρ(T, π_x(A) × B)
    ρ(U, T − A)
    ρ(V, π_x(U))
    π_x(A) − V

    Here T pairs every x value in A with every y value in B, U holds the pairs that are missing from A, and V holds the x values that miss at least one y value.

(b) (1) Monotone, (2) Monotone, (3) Not monotone, (4) Monotone, (5) Monotone, (6) Monotone

    Difference is not monotone because adding a tuple to its second operand can remove a tuple from the result.

Exercise 3 [16 points]: Language Theory

Can the following queries be expressed using conjunctive queries? If your answer is yes, write the conjunctive queries. If your answer is no, explain why.

(1) Find students who have taken ‘Database Systems’ but not ‘Operating Systems’.

(2) Find the age of the youngest student who has taken ‘Database Systems’.

(3) The following table lists direct cause-effect relationships.
CauseEffect

 Cause | Effect
-------+--------
 a     | d
 b     | d
 c     | e
 d     | f
 d     | g
 e     | x
 g     | x

Now we want to support the following query: among all causes that directly or indirectly contribute to the effect ‘x’, find the other (direct or indirect) effects that they contribute to. Write a SQL statement for this query.

Hints:
• You can use the WITH construct in SQL to create temporary relations (either recursive or not) to be used in the final query. The syntax of WITH is the following:
    WITH RECURSIVE R AS (query)
    <final query involving R and possibly other relations>;
• If you only want to create a nonrecursive temporary relation for use by another query, you can define the temporary relation in the FROM clause or create a view.

(4) Show the intermediate steps of computing the table AllCauseEffect when you execute your query on the given table.

Solution:

(1) No, because the query is not monotonic. As we know from the lecture, all conjunctive queries are monotonic.

(2) No, for the same reason as above.

Note that for inexpressibility, it is better to use the theorem on monotonicity. We will give partial credit if a student states “we cannot express the query using relational operators including selection, projection, and join”. This is not a perfect argument, because it does not establish whether the query truly cannot be expressed using selection, projection, and join, or whether we simply have not been lucky enough to find a way.

(3) SQL query:

    WITH RECURSIVE AllCauseEffect(cause, effect) AS (
        SELECT * FROM CauseEffect
      UNION
        SELECT R1.cause, R2.effect
        FROM AllCauseEffect R1, CauseEffect R2
        WHERE R1.effect = R2.cause
    )
    SELECT C.cause, A.effect
    FROM ( SELECT DISTINCT R.cause
           FROM AllCauseEffect R
           WHERE R.effect = 'x' ) AS C,
         AllCauseEffect A
    WHERE C.cause = A.cause AND A.effect <> 'x';

(4) We execute a recursive query on this table to compute all cause/effect pairs.
Iteration 1: we add tuples (a,f), (a,g), (b,f), (b,g), (c,x), and (d,x)
Iteration 2: we add tuples (a,x) and (b,x)
A further iteration adds no new tuples, so the computation stops.

Exercise 4 [20 points]: Queries in Relational Algebra

Consider the following relational schema:

    Suppliers(sid: integer, sname: string, address: string)
    Parts(pid: integer, pname: string, color: string)
    Catalog(sid: integer, pid: integer, cost: real)

The domain of each field is listed after the field name. Naturally, the Suppliers and Parts relations represent supplier entities and part entities and their attributes. The Catalog relation lists the prices charged for parts by suppliers. Write the following queries in relational algebra:

(1) Find the names of the suppliers who supply a red part that costs less than 100 dollars.

    π_{sname}((σ_{color='red'} Parts) ⋈ (σ_{cost<100} Catalog) ⋈ Suppliers)

(2) Find the names of the suppliers who supply a red part that costs less than 100 dollars and a green part that costs less than 100 dollars.

    ρ(R1, π_{sid}((σ_{color='red'} Parts) ⋈ (σ_{cost<100} Catalog)))
    ρ(R2, π_{sid}((σ_{color='green'} Parts) ⋈ (σ_{cost<100} Catalog)))
    π_{sname}((R1 ∩ R2) ⋈ Suppliers)

(3) Find the pids of parts supplied by at least two different suppliers.

    ρ(R1, Catalog)
    ρ(R2, Catalog)
    π_{R1.pid}(σ_{R1.pid=R2.pid ∧ R1.sid≠R2.sid}(R1 × R2))

(4) Find the names of suppliers that supply all red parts.

    π_{sname}((π_{sid,pid}(Catalog) / π_{pid}(σ_{color='red'} Parts)) ⋈ Suppliers)

(5) Find the names of suppliers that supply only the parts that cost less than 100 dollars.

    π_{sname}((π_{sid}(Catalog) − π_{sid}(σ_{cost>=100} Catalog)) ⋈ Suppliers)

Exercise 5 [24 points]: SQL Queries using PostgreSQL

You will use PostgreSQL to execute queries on a sample dataset that consists of three tables conforming to the schema in Exercise 4. Both the dataset and instructions for connecting to the PostgreSQL server are available on the assignment web page. Please take the following steps to complete this exercise.
Step 1: Place the dataset in an appropriate place in your home directory on the Edlab machines.

Step 2: Connect to the PostgreSQL server using

    psql -h db-edlab.cs.umass.edu -p 7645

Step 3: Inside PostgreSQL, write a CREATE TABLE command for each file in the dataset. An example:

    create table suppliers(
        sid int,
        sname varchar(30),
        address varchar(40),
        primary key (sid));

Step 4: Inside PostgreSQL, change your working directory to where your dataset is placed and load the dataset into the corresponding tables. E.g.,

    yanlei=> \cd 'path-of-dataset'
    yanlei=> \copy suppliers FROM suppliers.txt with delimiter as ','
    yanlei=> \copy parts FROM parts.txt with delimiter as ','
    yanlei=> \copy catalog FROM catalog.txt with delimiter as ','

You can use a SELECT query to check the content of each table. E.g.,

    yanlei=> select * from parts;

Step 5: Now you are ready to submit SQL queries to retrieve the required information.

Submission: Please create a directory ‘hw1’ under your home directory on the Edlab machines. For each query below, store in the ‘hw1’ directory a valid SQL query, named “Q1.sql”, “Q2.sql”, …, and “Q5.sql”.

Write SQL expressions for each of the following queries and execute them:

(1) Find the sids of suppliers who supply a red part and a green part.

    SELECT DISTINCT C.sid
    FROM parts P, catalog C
    WHERE P.pid = C.pid AND P.color = 'Red'
      AND C.sid IN ( SELECT DISTINCT C1.sid
                     FROM catalog C1, parts P1
                     WHERE P1.color = 'Green' AND C1.pid = P1.pid);

     sid
    -----
       3
    (1 row)

(2) Find the sname of suppliers who supply every red part.

    SELECT S.sname
    FROM suppliers S
    WHERE NOT EXISTS ( SELECT P.pid
                       FROM parts P
                       WHERE P.color = 'Red'
                         AND NOT EXISTS ( SELECT C.pid
                                          FROM catalog C
                                          WHERE C.pid = P.pid AND C.sid = S.sid));

            sname
    ----------------------
     Big Red Tool and Die
    (1 row)

(3) Find the sids of suppliers who charge more for some part than the average cost of that part (averaged over all the suppliers who supply that part).
    SELECT DISTINCT C.sid
    FROM catalog C
    WHERE C.cost > ( SELECT avg(C1.cost)
                     FROM catalog C1
                     WHERE C1.pid = C.pid);

     sid
    -----
       1
       2
       3
    (3 rows)

(4) For each part, find the sname of the supplier who charges the most for that part.

    SELECT P.pid, S.sname
    FROM parts P, suppliers S, catalog C
    WHERE C.pid = P.pid AND C.sid = S.sid
      AND C.cost = ( SELECT max(C1.cost)
                     FROM catalog C1
                     WHERE C1.pid = P.pid);

OR

    SELECT C.pid, S.sname
    FROM suppliers S, catalog C,
         ( SELECT pid, max(cost) AS maxcost
           FROM catalog
           GROUP BY pid ) AS Temp
    WHERE C.sid = S.sid AND C.pid = Temp.pid AND C.cost = Temp.maxcost;

     pid |         sname
    -----+-----------------------
       4 | Acme Widget Suppliers
       3 | Big Red Tool and Die
       1 | Big Red Tool and Die
       8 | Perfunctory Parts
       9 | Perfunctory Parts
       5 | Alien Aircaft Inc.
       6 | Alien Aircaft Inc.
       7 | Alien Aircaft Inc.
    (8 rows)

OR

     pid |         sname
    -----+-----------------------
       4 | Acme Widget Suppliers
       1 | Big Red Tool and Die
       3 | Big Red Tool and Die
       8 | Perfunctory Parts
       9 | Perfunctory Parts
       7 | Alien Aircaft Inc.
       6 | Alien Aircaft Inc.
       5 | Alien Aircaft Inc.
    (8 rows)

(5) Find the sids of suppliers who supply only red parts.

    SELECT S.sid
    FROM suppliers S
    WHERE NOT EXISTS ( SELECT *
                       FROM catalog C, parts P
                       WHERE C.pid = P.pid AND C.sid = S.sid
                         AND P.color <> 'Red');

     sid
    -----
       2
    (1 row)

(6) For every supplier that supplies only red parts, print the sid of the supplier and the total number of parts that she supplies.

    SELECT S.sid, count(C.pid)
    FROM suppliers S, catalog C
    WHERE S.sid = C.sid
      AND NOT EXISTS ( SELECT *
                       FROM catalog C2, parts P
                       WHERE C2.sid = S.sid AND C2.pid = P.pid
                         AND P.color <> 'Red')
    GROUP BY S.sid;

     sid | count
    -----+-------
       2 |     3
    (1 row)
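
The NOT EXISTS pattern used in queries (5) and (6) can also be tried without access to the course server. The following is a minimal sketch that runs both queries through Python's built-in sqlite3 module on a small made-up instance. The rows, the names SCHEMA, SUPPLIERS, PARTS, CATALOG, and run_queries, and the added ORDER BY clauses are assumptions for illustration only; this is not the assignment dataset, and SQLite is only a stand-in for PostgreSQL.

```python
import sqlite3

# Hypothetical toy instance (NOT the assignment dataset).
SCHEMA = """
CREATE TABLE suppliers (sid INTEGER PRIMARY KEY, sname TEXT, address TEXT);
CREATE TABLE parts     (pid INTEGER PRIMARY KEY, pname TEXT, color TEXT);
CREATE TABLE catalog   (sid INTEGER, pid INTEGER, cost REAL,
                        PRIMARY KEY (sid, pid));
"""

SUPPLIERS = [(1, "S1", "addr1"), (2, "S2", "addr2"), (3, "S3", "addr3")]
PARTS = [(10, "bolt", "Red"), (11, "nut", "Red"), (12, "gear", "Green")]
# Supplier 1 sells a red and a green part, supplier 2 sells two red parts,
# and supplier 3 has no catalog rows at all.
CATALOG = [(1, 10, 5.0), (1, 12, 7.0), (2, 10, 6.0), (2, 11, 4.0)]

# Query (5): suppliers with no non-red part in their catalog.
# ORDER BY is added here only to make the output order deterministic.
ONLY_RED = """
SELECT S.sid
FROM suppliers S
WHERE NOT EXISTS ( SELECT *
                   FROM catalog C, parts P
                   WHERE C.pid = P.pid AND C.sid = S.sid
                     AND P.color <> 'Red')
ORDER BY S.sid;
"""

# Query (6): the same condition, joined with catalog to count parts.
ONLY_RED_COUNT = """
SELECT S.sid, count(C.pid)
FROM suppliers S, catalog C
WHERE S.sid = C.sid
  AND NOT EXISTS ( SELECT *
                   FROM catalog C2, parts P
                   WHERE C2.sid = S.sid AND C2.pid = P.pid
                     AND P.color <> 'Red')
GROUP BY S.sid
ORDER BY S.sid;
"""


def run_queries():
    """Load the toy instance into an in-memory database and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", SUPPLIERS)
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", PARTS)
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)", CATALOG)
    only_red = conn.execute(ONLY_RED).fetchall()
    counts = conn.execute(ONLY_RED_COUNT).fetchall()
    conn.close()
    return only_red, counts


if __name__ == "__main__":
    only_red, counts = run_queries()
    print("query (5):", only_red)
    print("query (6):", counts)
```

On this toy instance, query (5) returns suppliers 2 and 3: supplier 3 has no catalog rows, so the NOT EXISTS condition holds for it vacuously. Query (6) returns only supplier 2, with a count of 2, because the join with catalog drops suppliers that supply nothing.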