9/18 - SEAS - University of Pennsylvania

advertisement

Advanced SQL

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Zachary G. Ives

University of Pennsylvania

CIS 550 – Database & Information Systems

September 18, 2003

Administrivia

 Please turn in your answers to HW1 now

 Please take a copy of HW2

 Sign up for Oracle account at: http://www.seas.upenn.edu/ora/

 (Those who don’t have eniac accounts: please email me)

 Remember, decisions about groups and projects due by end of Friday in email

2

We’re Studying SQL: A Friendly Face

Over the Tuple Relational Calculus

SELECT [DISTINCT] {T

1

FROM {relation} T

1

.attrib, …, T

, {relation} T

WHERE {predicates}

2

, …

2

.attrib} from-list select-list qualification

 Queries can have set operators (UNION, EXCEPT,

…)

 Queries can be nested

 Often multiple ways of expressing the same query

3

Aggregation

 GROUP BY

SELECT {group-attribs} , {aggregate-operator}(attrib)

FROM {relation} T

1

, {relation} T

2

, …

WHERE {predicates}

GROUP BY {group-list}

 Aggregate operators

 AVG , COUNT , SUM , MAX , MIN

 DISTINCT keyword for AVG , COUNT , SUM

4

Some Examples

 Number of students in each course offering

 Number of different grades expected for each course offering

 Number of (distinct) students taking AI courses

5

Example Data Instance

STUDENT sid name

1 Jill

2 Qun

3 Nitin

4 Marty

Takes sid exp-grade

1 A

1

3

A

A

3

4

C

C cid

550-0103

700-1003

700-1003

501-0103

501-0103

COURSE cid

550-0103

700-1003

501-0103 subj sem

DB

AI

F03

S03

Arch F03

PROFESSOR Teaches

2

8 fid

1 name

Ives

Saul

Roth fid cid

1 550-0103

2 700-1003

8 501-0103

6

What If You Want to Only Show

Some Groups?

 The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):

SELECT C.subj, COUNT(S.sid)

FROM STUDENT S, Takes T, COURSE C

WHERE S.sid = T.sid AND T.cid = C.cid

GROUP BY subj

HAVING COUNT(S.sid) > 5

 Exercise: For each subject taught by at least two professors, list the minimum expected grade

7

Aggregation and Table Expressions

(aka Derived Relations)

 Sometimes need to compute results over the results of a previous aggregation:

SELECT subj, AVG(size)

FROM (

SELECT C.cid AS id, C.subj AS subj,

COUNT(S.sid) AS size

FROM STUDENT S, Takes T, COURSE C

WHERE S.sid = T.sid AND T.cid = C.cid

GROUP BY cid, subj)

GROUP BY subj

8

Thought Exercise…

 Tables are great, but…

 Not everyone is uniform – I may have a cell phone but not a fax

 We may simply be missing certain information

 We may be unsure about values

 How do we handle these things?

9

One Answer: Null Values

 We designate a special “null” value to represent

“unknown” or “N/A”

Name Home Fax

Sam 123-4567 NULL

Li 234-8972 234-8766

Maria 789-2312 789-2121

 But a question: what does:

SELECT * FROM CONTACT WHERE Fax < “789-1111” do?

10

Three-State Logic

 Need ways to evaluate boolean expressions and have the result be “unknown” (or T/F)

 Need ways of composing these three-state expressions using AND , OR , NOT :

T AND U = U T OR U = T

F AND U = F F OR U = U NOT U = U

U AND U = U U OR U = U

 Can also test for null-ness: attr IS NULL , attr IS NOT

NULL

 Finally: need rules for arithmetic, aggregation

11

Nulls and Joins

 Sometimes need special variations of joins:

 I want to see all courses and their students

 … But what if there’s a course with no students?

 Outer join:

 Most common is left outer join:

SELECT C.subj, C.cid, T.sid

FROM COURSE C LEFT OUTER JOIN Takes T

ON C.cid = T.cid

WHERE …

12

Revised Example Data Instance

STUDENT sid name

1 Jill

2 Qun

3 Nitin

4 Marty

Takes sid exp-grade

1 A

1

3

A

A

3

4

C

C cid

550-0103

700-1003

700-1003

501-0103

501-0103

COURSE cid

550-0103

700-1003

501-0103

555-0103 subj sem

DB

AI

F03

S03

Arch F03

Food F03

PROFESSOR Teaches

2

8 fid

1 name

Ives

Saul

Roth fid cid

1 550-0103

2 700-1003

8 501-0103

13

Warning on Outer Join

 Oracle doesn’t support standard SQL syntax here:

SELECT C.subj, C.cid, T.sid

FROM COURSE C , Takes T

WHERE C.cid =(+) T.cid

14

Beyond Null

 Can have much more complex ideas of incomplete or approximate information

 Probabilistic models (tuple 80% likely to be an answer)

 Naïve tables (can have variables instead of NULLs)

 Conditional tables (tuple IF some condition holds)

 … And what if you want “0 or more”?

 In relational databases, create a new table and foreign key

 But can have semistructured data (like XML)

15

Modifying the Database:

Inserting Data

 Inserting a new literal tuple is easy, if wordy:

INSERT INTO FACULTY(fid, name)

VALUES (4, ‘Simpson’)

 But we can also insert the results of a query!

INSERT INTO FACULTY(fid, name)

SELECT sid AS fid, name

FROM STUDENT

WHERE sid < 20

16

Deleting Tuples

 Deletion is a fairly simple operation:

DELETE

FROM STUDENT S

WHERE S.sid < 25

17

Updating Tuples

 What kinds of updates might you want to do?

UPDATE STUDENT S

SET S.sid = 1 + S.sid, S.name = ‘Janet’

WHERE S.name = ‘Jane’

18

Now, How Do I Talk to the DB?

 Generally, apps are in a different (“host”) language with embedded SQL statements

 Static: SQLJ, embedded SQL in C

 Runtime: ODBC, JDBC, ADO, OLE DB, …

 Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR  string or char[])

19

Embedded SQL in C

EXEC SQL BEGIN DECLARE SECTION int sid; char name[20];

EXEC SQL END DECLARE SECTION

EXEC SQL INSERT INTO STUDENT

VALUES (:sid, :name);

EXEC SQL

SELECT name, age

INTO :sid, :name

FROM STUDENT

WHERE sid < 20

20

The Impedance Mismatch and Cursors

 SQL is set-oriented – it returns relations

 There’s no relation type in most languages!

 Solution: cursor that’s opened, read

DECLARE sinfo CURSOR FOR

SELECT sid, name FROM STUDENT

OPEN sinfo; while (…) {

FETCH sinfo INTO :sid, :name

}

CLOSE sinfo;

21

JDBC: Dynamic SQL

 Roughly speaking, a Java version of ODBC

 See Chapter 6 of the text for more info

} import java.sql.*;

Connection conn = DriverManager.getConnection(…);

PreparedStatement stmt = conn.prepareStatement(“SELECT * FROM STUDENT”);

ResultSet rs = stmt.executeQuery(); while (rs.next()) { sid = rs.getInteger(1);

22

Database-Backed Web Sites

 We all know traditional static HTML web sites:

Web-Browser

HTTP-Request

GET ...

Web-Server

HTML-File

Load File

File-System

HTML-File

23

Common Gateway Interface (CGI)

Can have the web server invoke code (with parameters) to generate HTML

HTTP-Request

HTML-File

Web Server

Web Server

Load File

File-System

HTML

HTML?

Output Program?

Execute Program

File

I/O, Network, DB

24

CGI: Discussion

 Advantages:

 Standardized: works for every web-server, browser

 Flexible: Any language (C++, Perl, Java, …) can be used

 Disadvantages:

 Statelessness: query-by-query approach

 Inefficient: new process forked for every request

 Security: CGI programmer is responsible for security

 Updates: To update layout, one has to be a programmer

25

DB Access in Java

Java Applet

TCP/UDP

IP

Java-Server-Process

JDBC Driver manager

JDBC-

Driver

JDBC-

Driver

JDBC-

Driver

Sybase Oracle ...

Browser

JVM

26

Java Applets: Discussion

 Advantages:

 Can take advantage of client processing

 Platform independent – assuming standard java

 Disadvantages:

 Requires JVM on client; self-contained

 Inefficient: loading can take a long time ...

 Resource intensive: Client needs to be state of the art

 Restrictive: can only connect to server where applet was loaded from (for security … can be configured)

27

*SP Server pages

(IIS, Tomcat, …)

HTTP Request

HTML File

Web Server

Web Server

Load File

File-System

HTML

Output

HTML?

Script?

Server Extension

File

I/O, Network, DB

28

One Step Beyond: DB-Driven Web

Sites (Strudel, Cocoon, …)

HTTP Request

HTML File

Web Server

DB-Driven Web Server

Cache

Styles

HTML

Dynamic

HTML

Generation

Script?

Data

Other data sources

Local

Database

29

Wrapping Up

 We’ve seen how to query in SQL (DML)

 Basic foundation is TRC-based

 Subqueries and aggregation add extra power

 Nulls and outer joins add flexibility of representation

 We can update tables

 We’ve seen that SQL doesn’t precisely match standard host language semantics

 Embedded SQL

 Dynamic SQL

 Data-driven web sites

30

Remember…

 Groups and project choices due by email by end of day tomorrow – send to zives@cis , dinkar@gradient.cis

 Sign up for your Oracle account ASAP

31

Download