Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
September 18, 2003
Please turn in your answers to HW1 now
Please take a copy of HW2
Sign up for Oracle account at: http://www.seas.upenn.edu/ora/
(Those who don’t have eniac accounts: please email me)
Remember, decisions about groups and projects are due by email by the end of Friday
We’re Studying SQL: A Friendly Face Over the Tuple Relational Calculus
SELECT [DISTINCT] {T1.attrib, …, T2.attrib}   (select-list)
FROM {relation} T1, {relation} T2, …   (from-list)
WHERE {predicates}   (qualification)
Queries can have set operators (UNION, EXCEPT, …)
Queries can be nested
Often multiple ways of expressing the same query
GROUP BY
SELECT {group-attribs} , {aggregate-operator}(attrib)
FROM {relation} T1, {relation} T2, …
WHERE {predicates}
GROUP BY {group-list}
Aggregate operators
AVG , COUNT , SUM , MAX , MIN
DISTINCT keyword for AVG , COUNT , SUM
Number of students in each course offering
Number of different grades expected for each course offering
Number of (distinct) students taking AI courses
STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A          700-1003
3    C          501-0103
4    C          501-0103

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103
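The three aggregate queries listed before this instance can be tried out as follows. This is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. It assumes the Takes rows pair up in the order shown, and it renames exp-grade to exp_grade (an assumption, since a hyphen is not legal in an unquoted SQL column name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes  (sid INTEGER, exp_grade TEXT, cid TEXT);  -- exp_grade: renamed from exp-grade
INSERT INTO COURSE VALUES
  ('550-0103','DB','F03'), ('700-1003','AI','S03'), ('501-0103','Arch','F03');
-- Assumed row pairing for Takes (sid, exp_grade, cid)
INSERT INTO Takes VALUES
  (1,'A','550-0103'), (1,'A','700-1003'), (3,'A','700-1003'),
  (3,'C','501-0103'), (4,'C','501-0103');
""")

# Number of students in each course offering
students_per_course = conn.execute(
    "SELECT cid, COUNT(sid) FROM Takes GROUP BY cid ORDER BY cid").fetchall()

# Number of different grades expected for each course offering
grades_per_course = conn.execute(
    "SELECT cid, COUNT(DISTINCT exp_grade) FROM Takes GROUP BY cid ORDER BY cid"
).fetchall()

# Number of (distinct) students taking AI courses
ai_students = conn.execute(
    "SELECT COUNT(DISTINCT T.sid) FROM Takes T, COURSE C "
    "WHERE T.cid = C.cid AND C.subj = 'AI'").fetchone()[0]

print(students_per_course, grades_per_course, ai_students)
```

DISTINCT inside the aggregate matters in the last two queries: without it, a student enrolled in two AI offerings would be counted twice.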
What If You Want to Only Show Some Groups?
The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):
SELECT C.subj, COUNT(S.sid)
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
GROUP BY subj
HAVING COUNT(S.sid) > 5
Exercise: For each subject taught by at least two professors, list the minimum expected grade
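To see HAVING filter groups, the query above can be run on the small sample instance from the earlier slides. This is a sketch with Python's sqlite3 module. The threshold is lowered from 5 to 1 (an assumption made only so that the tiny instance produces output), and the Takes rows are assumed to pair up as listed in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE COURSE  (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes   (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin'), (4,'Marty');
INSERT INTO COURSE VALUES
  ('550-0103','DB','F03'), ('700-1003','AI','S03'), ('501-0103','Arch','F03');
INSERT INTO Takes VALUES
  (1,'A','550-0103'), (1,'A','700-1003'), (3,'A','700-1003'),
  (3,'C','501-0103'), (4,'C','501-0103');
""")

# WHERE filters tuples before grouping; HAVING filters whole groups afterwards.
big_subjects = conn.execute("""
    SELECT C.subj, COUNT(S.sid)
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.sid = T.sid AND T.cid = C.cid
    GROUP BY C.subj
    HAVING COUNT(S.sid) > 1      -- threshold lowered from the slide's 5
    ORDER BY C.subj
""").fetchall()

print(big_subjects)
```

The DB group has a single enrollment, so it is dropped by HAVING even though its tuples passed the WHERE clause.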
Aggregation and Table Expressions (aka Derived Relations)
Sometimes need to compute results over the results of a previous aggregation:
SELECT subj, AVG(size)
FROM (
SELECT C.cid AS id, C.subj AS subj,
COUNT(S.sid) AS size
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
GROUP BY C.cid, C.subj)
GROUP BY subj
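The two-level aggregation can be checked on the sample instance. The sketch below uses Python's sqlite3 module and assumes the Takes row pairing listed in the code. It also gives the derived relation an alias (sizes, a hypothetical name), which some systems require, and qualifies cid in the inner GROUP BY because both Takes and COURSE have a cid column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE COURSE  (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes   (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin'), (4,'Marty');
INSERT INTO COURSE VALUES
  ('550-0103','DB','F03'), ('700-1003','AI','S03'), ('501-0103','Arch','F03');
INSERT INTO Takes VALUES
  (1,'A','550-0103'), (1,'A','700-1003'), (3,'A','700-1003'),
  (3,'C','501-0103'), (4,'C','501-0103');
""")

avg_sizes = conn.execute("""
    SELECT subj, AVG(size)
    FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
          FROM STUDENT S, Takes T, COURSE C
          WHERE S.sid = T.sid AND T.cid = C.cid
          GROUP BY C.cid, C.subj) AS sizes   -- inner query: enrollment per offering
    GROUP BY subj                            -- outer query: average per subject
    ORDER BY subj
""").fetchall()

print(avg_sizes)
```

Each subject has only one offering in this instance, so the average equals that offering's enrollment; with several offerings per subject the outer AVG would combine them.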
Tables are great, but…
Not everyone is uniform – I may have a cell phone but not a fax
We may simply be missing certain information
We may be unsure about values
How do we handle these things?
We designate a special “null” value to represent “unknown” or “N/A”
Name Home Fax
Sam 123-4567 NULL
Li 234-8972 234-8766
Maria 789-2312 789-2121
But a question: what does
SELECT * FROM CONTACT WHERE Fax < '789-1111'
do?
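One way to explore the question is to run the query. The sketch below uses Python's sqlite3 module with the three CONTACT rows shown above; the column names Name, Home, and Fax are taken from the table header, and Fax is stored as text, so the comparison is a string comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CONTACT (Name TEXT, Home TEXT, Fax TEXT)")
conn.executemany("INSERT INTO CONTACT VALUES (?, ?, ?)", [
    ("Sam",   "123-4567", None),        # None is stored as SQL NULL
    ("Li",    "234-8972", "234-8766"),
    ("Maria", "789-2312", "789-2121"),
])

def names(where):
    """Return the Name column of the CONTACT rows that satisfy the condition."""
    return [row[0] for row in conn.execute(
        f"SELECT Name FROM CONTACT WHERE {where} ORDER BY Name")]

below     = names("Fax < '789-1111'")
not_below = names("NOT (Fax < '789-1111')")
missing   = names("Fax IS NULL")

print(below, not_below, missing)
```

Sam appears in neither the first result nor its negation: comparing NULL with anything yields "unknown", and WHERE keeps only rows whose condition is true.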
Need ways to evaluate boolean expressions and have the result be “unknown” (or T/F)
Need ways of composing these three-state expressions using AND , OR , NOT :
T AND U = U    T OR U = T
F AND U = F    F OR U = U    NOT U = U
U AND U = U    U OR U = U
Can also test for null-ness: attr IS NULL, attr IS NOT NULL
Finally: need rules for arithmetic, aggregation
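The truth table and the usual SQL rules for arithmetic and aggregation over nulls can be observed directly. This is a small sketch using Python's sqlite3 module, where SQL NULL comes back as None, true as 1, and false as 0. The table R(x) is hypothetical and exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three-valued logic: NULL plays the role of U (unknown).
u_and_t, u_and_f, u_or_t, u_or_f, not_u = conn.execute(
    "SELECT NULL AND 1, NULL AND 0, NULL OR 1, NULL OR 0, NOT NULL").fetchone()

# Arithmetic: any arithmetic expression involving NULL is NULL.
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]

# Aggregation: COUNT(*) counts rows; the other aggregates skip NULLs.
conn.execute("CREATE TABLE R (x INTEGER)")      # hypothetical table
conn.executemany("INSERT INTO R VALUES (?)", [(10,), (None,), (30,)])
count_rows, count_x, sum_x, avg_x = conn.execute(
    "SELECT COUNT(*), COUNT(x), SUM(x), AVG(x) FROM R").fetchone()

print(u_and_t, u_and_f, u_or_t, u_or_f, not_u)
print(null_plus_one, count_rows, count_x, sum_x, avg_x)
```

Note that AVG(x) divides by the number of non-null values (2), not by the number of rows (3).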
Sometimes need special variations of joins:
I want to see all courses and their students
… But what if there’s a course with no students?
Outer join:
Most common is left outer join:
SELECT C.subj, C.cid, T.sid
FROM COURSE C LEFT OUTER JOIN Takes T
ON C.cid = T.cid
WHERE …
STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A          700-1003
3    C          501-0103
4    C          501-0103

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03
555-0103  Food  F03

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103
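The left outer join from the previous slide can be run against this instance, in which course 555-0103 has no students. The sketch below uses Python's sqlite3 module, assumes the Takes rows pair up as listed in the code, and omits the elided WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes  (sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO COURSE VALUES
  ('550-0103','DB','F03'), ('700-1003','AI','S03'),
  ('501-0103','Arch','F03'), ('555-0103','Food','F03');
INSERT INTO Takes VALUES
  (1,'A','550-0103'), (1,'A','700-1003'), (3,'A','700-1003'),
  (3,'C','501-0103'), (4,'C','501-0103');
""")

# Inner join: a course with no matching Takes tuple disappears.
inner = conn.execute("""
    SELECT C.subj, C.cid, T.sid
    FROM COURSE C, Takes T
    WHERE C.cid = T.cid
    ORDER BY C.cid, T.sid""").fetchall()

# Left outer join: every COURSE tuple survives, padded with NULLs if needed.
outer = conn.execute("""
    SELECT C.subj, C.cid, T.sid
    FROM COURSE C LEFT OUTER JOIN Takes T
         ON C.cid = T.cid
    ORDER BY C.cid, T.sid""").fetchall()

print(len(inner), len(outer))
print([row for row in outer if row[2] is None])
```

The only difference between the two results is the extra Food tuple, whose sid is NULL (None in Python).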
Oracle releases before 9i don’t support the standard SQL syntax here; they mark the null-padded side of the join with (+) instead:
SELECT C.subj, C.cid, T.sid
FROM COURSE C, Takes T
WHERE C.cid = T.cid (+)
Can have much more complex ideas of incomplete or approximate information
Probabilistic models (tuple 80% likely to be an answer)
Naïve tables (can have variables instead of NULLs)
Conditional tables (tuple IF some condition holds)
… And what if you want “0 or more”?
In relational databases, create a new table and foreign key
But can have semistructured data (like XML)
Modifying the Database: Inserting Data
Inserting a new literal tuple is easy, if wordy:
INSERT INTO FACULTY(fid, name)
VALUES (4, 'Simpson')
But we can also insert the results of a query!
INSERT INTO FACULTY(fid, name)
SELECT sid AS fid, name
FROM STUDENT
WHERE sid < 20
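Both forms of INSERT can be tried in a scratch database. This is a minimal sketch using Python's sqlite3 module; the FACULTY table is created here without a key constraint (an assumption, since the slides do not give its definition), and STUDENT holds the four sample students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE FACULTY (fid INTEGER, name TEXT);   -- assumed schema, no key declared
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin'), (4,'Marty');
""")

# Insert a single literal tuple.
conn.execute("INSERT INTO FACULTY(fid, name) VALUES (4, 'Simpson')")

# Insert the result of a query: one new FACULTY tuple per qualifying student.
conn.execute("""
    INSERT INTO FACULTY(fid, name)
    SELECT sid AS fid, name
    FROM STUDENT
    WHERE sid < 20
""")

faculty = conn.execute(
    "SELECT fid, name FROM FACULTY ORDER BY fid, name").fetchall()
print(faculty)
```

Because no key was declared, fid 4 now appears twice (Marty and Simpson); if fid were a primary key, the second INSERT would be rejected.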
Deletion is a fairly simple operation:
DELETE
FROM STUDENT S
WHERE S.sid < 25
What kinds of updates might you want to do?
UPDATE STUDENT S
SET S.sid = 1 + S.sid, S.name = 'Janet'
WHERE S.name = 'Jane'
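The update and deletion statements can be exercised together. The sketch below uses Python's sqlite3 module and drops the tuple variable S, because SQLite does not accept the alias-qualified form in SET. The student (30, 'Jane') is hypothetical and is added so that the UPDATE has something to match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (sid INTEGER, name TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [
    (1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty"),
    (30, "Jane"),                       # hypothetical extra student
])

# UPDATE: right-hand sides are evaluated on the old tuple, so sid becomes 31.
updated = conn.execute(
    "UPDATE STUDENT SET sid = 1 + sid, name = 'Janet' WHERE name = 'Jane'"
).rowcount

# DELETE: removes every tuple that satisfies the predicate.
deleted = conn.execute("DELETE FROM STUDENT WHERE sid < 25").rowcount

remaining = conn.execute("SELECT sid, name FROM STUDENT").fetchall()
print(updated, deleted, remaining)
```

rowcount reports how many tuples each statement touched, which is a quick sanity check before committing a bulk change.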
Generally, apps are in a different (“host”) language with embedded SQL statements
Static: SQLJ, embedded SQL in C
Runtime: ODBC, JDBC, ADO, OLE DB, …
Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR maps to String or char[])
EXEC SQL BEGIN DECLARE SECTION;
int sid; char name[20];
EXEC SQL END DECLARE SECTION;
…
EXEC SQL INSERT INTO STUDENT
VALUES (:sid, :name);
EXEC SQL
SELECT sid, name
INTO :sid, :name
FROM STUDENT
WHERE sid < 20;
The Impedance Mismatch and Cursors
SQL is set-oriented – it returns relations
There’s no relation type in most languages!
Solution: a cursor that’s opened, read one tuple at a time, then closed
DECLARE sinfo CURSOR FOR
SELECT sid, name FROM STUDENT
…
OPEN sinfo;
while (…) {
FETCH sinfo INTO :sid, :name;
…
}
CLOSE sinfo;
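The same open / fetch / close pattern shows up in most host-language APIs. As a rough analogue (not embedded SQL itself), here is a sketch using the cursor object of Python's sqlite3 module on the sample STUDENT table; the list named seen is only there to collect what the loop reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (sid INTEGER, name TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")])

cur = conn.cursor()
# Roughly DECLARE sinfo CURSOR FOR ... followed by OPEN sinfo
cur.execute("SELECT sid, name FROM STUDENT ORDER BY sid")

seen = []
while True:
    row = cur.fetchone()          # roughly FETCH sinfo INTO :sid, :name
    if row is None:               # no more tuples in the result
        break
    sid, name = row               # one tuple at a time, as host-language values
    seen.append(f"{sid}:{name}")

cur.close()                       # roughly CLOSE sinfo
print(seen)
```

The host program never holds a relation as a value; it only ever sees the current tuple, which is how the cursor bridges the set-oriented and record-oriented views.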
JDBC: roughly speaking, a Java version of ODBC
See Chapter 6 of the text for more info
import java.sql.*;
Connection conn = DriverManager.getConnection(…);
PreparedStatement stmt = conn.prepareStatement("SELECT * FROM STUDENT");
…
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
  sid = rs.getInt(1);  // column 1 of STUDENT is sid
  …
}
We all know traditional static HTML web sites:
[Figure: the web browser sends an HTTP request (GET ...) to the web server, which loads the HTML file from the file system and returns it to the browser.]
Can have the web server invoke code (with parameters) to generate HTML
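[Figure: an HTTP request reaches the web server. If it names an HTML file, the server loads the file from the file system; if it names a program, the server executes the program (which may use file I/O, the network, or a DB) and returns its output as HTML.]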
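What such a program does can be sketched in a few lines: read the request parameters, query the database, and emit HTML. The example below is only an illustration in Python with the sqlite3 module. The function name render_page, the subj parameter, and the page layout are all hypothetical, and the sample data assumes the Takes row pairing listed in the code.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def make_db():
    """Build the sample instance in memory (stands in for the real DB)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE STUDENT (sid INTEGER, name TEXT);
    CREATE TABLE COURSE  (cid TEXT, subj TEXT, sem TEXT);
    CREATE TABLE Takes   (sid INTEGER, exp_grade TEXT, cid TEXT);
    INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin'), (4,'Marty');
    INSERT INTO COURSE VALUES
      ('550-0103','DB','F03'), ('700-1003','AI','S03'), ('501-0103','Arch','F03');
    INSERT INTO Takes VALUES
      (1,'A','550-0103'), (1,'A','700-1003'), (3,'A','700-1003'),
      (3,'C','501-0103'), (4,'C','501-0103');
    """)
    return conn

def render_page(conn, query_string):
    """Turn a query string such as 'subj=AI' into a complete HTML response."""
    subj = parse_qs(query_string).get("subj", [""])[0]
    rows = conn.execute(
        "SELECT DISTINCT S.name FROM STUDENT S, Takes T, COURSE C "
        "WHERE S.sid = T.sid AND T.cid = C.cid AND C.subj = ? "
        "ORDER BY S.name",
        (subj,),                  # bound parameter, never pasted into the SQL text
    ).fetchall()
    items = "".join(f"<li>{html.escape(name)}</li>" for (name,) in rows)
    return ("Content-Type: text/html\r\n\r\n"
            f"<html><body><h1>Students in {html.escape(subj)}</h1>"
            f"<ul>{items}</ul></body></html>")

page = render_page(make_db(), "subj=AI")
print(page)
```

Under CGI, the query string would typically arrive in the QUERY_STRING environment variable and the response would be written to standard output. Binding the parameter with ? and escaping the output are the program's own responsibility.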
Advantages of CGI:
Standardized: works for every web-server, browser
Flexible: Any language (C++, Perl, Java, …) can be used
Disadvantages:
Statelessness: query-by-query approach
Inefficient: new process forked for every request
Security: CGI programmer is responsible for security
Updates: To update layout, one has to be a programmer
[Figure: a Java applet running in the browser’s JVM connects over TCP/UDP/IP to a Java server process, which uses the JDBC driver manager and JDBC drivers to reach Sybase, Oracle, ...]
Advantages of Java applets:
Can take advantage of client processing
Platform independent – assuming standard Java
Disadvantages:
Requires JVM on client; self-contained
Inefficient: loading can take a long time ...
Resource intensive: Client needs to be state of the art
Restrictive: can only connect to server where applet was loaded from (for security … can be configured)
*SP Server Pages (IIS, Tomcat, …)
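[Figure: an HTTP request reaches the web server. If it names an HTML file, the server loads the file from the file system; if it names a script, a server extension runs the script (which may use file I/O, the network, or a DB) and returns its output as HTML.]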
One Step Beyond: DB-Driven Web Sites (Strudel, Cocoon, …)
[Figure: an HTTP request reaches a DB-driven web server, which performs dynamic HTML generation from styles, scripts, and data (a local database and other data sources), uses a cache, and returns the HTML file.]
We’ve seen how to query in SQL (DML)
Basic foundation is TRC-based
Subqueries and aggregation add extra power
Nulls and outer joins add flexibility of representation
We can update tables
We’ve seen that SQL doesn’t precisely match standard host language semantics
Embedded SQL
Dynamic SQL
Data-driven web sites
Groups and project choices due by email by end of day tomorrow – send to zives@cis, dinkar@gradient.cis
Sign up for your Oracle account ASAP