Introduction to Database Systems - LSIR

advertisement
Introduction to
Information Systems
SSC, Semester 6
Lecture 01
1
Staff
• Instructor: Karl Aberer
– PSE, Room 1.31, karl.aberer@epfl.ch
– Office hours: by appointment
• TAs:
–
–
–
–
Magdalena Punceva
Anwitaman Datta
Gleb Sklobetskyn
Roman Schmidt
2
Communications
• Web page: lsirww.epfl.ch
– Lectures will be available here
– Homeworks and solutions will be posted here
– The project description and resources will be here
• Mailing list:
– please subscribe (see instructions on the
Web page)
3
Textbook
Main textbook:
• Databases and Transaction Processing,
An application-oriented approach
Philip M. Lewis, Arthur Bernstein, Michael
Kifer, Addison-Wesley 2002.
4
Other Texts
Many classic textbooks (each of them will do it)
• Database Systems: The Complete Book, Hector Garcia-Molina,
Jeffrey Ullman, Jennifer Widom
• Database Management Systems, Ramakrishnan
• Fundamentals of Database Systems, Elmasri, Navathe
• Database Systems, Date (7. edition)
• Modern Database Management, Hoffer, (4. edition)
• Database Systems Concepts, Silverschatz, (4. edition)
5
Material on the Web
SQL Intro
• SQL for Web Nerds, by Philip Greenspun,
http://philip.greenspun.com/sql/
Java Technology
• java.sun.com
WWW Technology
• www.w3c.org
6
Outline for Today’s Lecture
• Overview of database systems
• Course Outline
• First Steps in SQL
7
What is behind this Web Site?
•
•
•
•
•
•
http://immo.search.ch/
Search on a large database
Specify search conditions
Many users
Updates
Access through a web interface
8
Database Management Systems
Database Management System = DBMS
• A collection of files that store the data
• A big C program written by someone else
that accesses and updates those files for you
Relational DBMS = RDBMS
• Data files are structured as relations (tables)
9
Where are RDBMS used ?
• Backend for traditional “database”
applications
– EFPL administration
• Backend for large Websites
– Immosearch
• Backend for Web services
– Amazon
10
Example of a Traditional
Database Application
Suppose we are building a system
to store the information about:
• students
• courses
• professors
• who takes what, who teaches what
11
Can we do it without a DBMS ?
Sure we can! Start by storing the data in files:
students.txt
courses.txt
professors.txt
Now write C or Java programs to implement
specific tasks
12
Doing it without a DBMS...
• Enroll “Mary Johnson” in “CSE444”:
Write a C/Java program to do the following:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
13
Problems without an DBMS...
• System crashes:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
CRASH !
– What is the problem ?
• Large data sets (say 50GB)
– Why is this a problem ?
• Simultaneous access by many users
– Lock students.txt – what is the problem ?
14
Enters a DMBS
“Two tier system” or “client-server”
connection
(ODBC, JDBC)
Data files
Database server
(someone else’s
C program)
Applications
15
Functionality of a DBMS
The programmer sees SQL, which has two components:
• Data Definition Language - DDL
• Data Manipulation Language - DML
– query language
Behind the scenes the DBMS has:
• Query engine
• Query optimizer
• Storage management
• Transaction Management (concurrency, recovery)
16
How the Programmer Sees the
DBMS
• Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...
• Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Charles’, ‘123456789’, ‘undergraduate’)
. . . .
17
How the Programmer Sees the
DBMS
• Tables:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
CSE444
CSE541
Takes:
Name
Charles
Dan
…
Category
undergrad
grad
…
Name
Databases
Operating systems
SSN
123-45-6789
123-45-6789
234-56-7890
CID
CSE444
CSE444
CSE142
…
Quarter
fall
winter
• Still implemented as files, but behind the scenes can
be quite complex
“data independence” = separate logical view
from physical implementation
18
Transactions
• Enroll “Mary Johnson” in “CSE444”:
BEGIN TRANSACTION;
INSERT INTO Takes
SELECT Students.SSN, Courses.CID
FROM Students, Courses
WHERE Students.name = ‘Mary Johnson’ and
Courses.name = ‘CSE444’
-- More updates here....
IF everything-went-OK
THEN COMMIT;
ELSE ROLLBACK
19
If system crashes, the transaction is still either committed or aborted
Transactions
• A transaction = sequence of statements that
either all succeed, or all fail
• Transactions have the ACID properties:
A = atomicity
C = consistency
I = isolation
D = durability
20
Queries
• Find all courses that “Mary” takes
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid
• What happens behind the scene ?
– Query processor figures out how to answer the
query efficiently.
21
Queries, behind the scene
Declarative SQL query
Imperative query execution plan:
sname
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid
cid=cid
sid=sid
name=“Mary”
Students
Takes
Courses
The optimizer chooses the best execution plan for a query
22
Database Systems
• The big commercial database vendors:
–
–
–
–
Oracle
IBM (with DB2)
Microsoft (SQL Server)
Sybase
• Some free database systems (Unix) :
– Postgres
– MySQL
– Predator
• In CSE444 we use SQL Server.
You can also choose MySQL, but less support
23
Databases and the Web
• Accessing databases through web interfaces
– Java programming interface (JDBC)
– Embedding into HTML pages (JSP)
– Access through http protocol (Web Services)
• Using Web document formats for data
definition and manipulation
– XML, Xquery, Xpath
– XML databases and messaging systems
24
Database Integration
• Combining data from different databases
– collection of data (wrapping)
– combination of data and generation of new
views on the data (mediation)
• Problem: heterogeneity
– access, representation, content
• Example revisited (Demo)
– http://immo.search.ch/
25
Other Trends in Databases
• Industrial
– Object-relational databases
– Main memory database systems
– Data warehousing and mining
• Research
– Peer-to-peer data management
– Stream data management
– Mobile data management
26
The Course
• Goal: Teaching RDBMS (standard) with a
strong emphasis on the Web
• Fortunately others already did it
– Alon Halevy, Dan Suciu, Univ. of Washington
– http://www.cs.washington.edu/education/course
s/cse444/
– http://www.acm.org/sigmod/record/issues/0309/
4.AlonLevy.pdf
27
Acknowledgement
• Build on UoW course
– many slides
– many exercise
– ideas for the project
• Main difference
– less theory
– will use real Web data in the project
28
Course Outline
(Details on the Web)
Part I
• SQL (Chapter 6)
• The relational data model (Chapter 3)
• Database design (Chapters 2, 3, 7)
• XML, XPath, XQuery
Part II
• Indexes (Chapter 13)
• Transactions and Recovery (Chapter 17 - 18)
Exam
29
Structure
• Prerequisites:
– Programming courses
– Data structures
• Work & Grading:
– Homeworks (4): 0%
– Exam (like homeworks): 50%
– Project: 50% (see next) – each phase graded separately
30
The Project
• Models the real data management needs of a
Web company
– Phase 1: Modelling and Data Acquisition
– Phase 2: Data integration and Applications
– Phase 3: Services
• "One can only start to appreciate database
systems by actually trying to use one" (Halevy)
• Any SW/IT company will love you for these
31
skills 
The Project – Side Effects
• Trains your soft skills
–
–
–
–
team work
deal with bugs, poor documentation, …
produce with limited time resources
project management and reporting
• Results useful for you personally
– Demo:
• Project should be fun 
32
Practical Concerns
•
•
•
•
•
New course, expect some hickups
Important to keep time schedule
Communication through Web
News Group
Student committee for regular feedback
(2 volunteers)
33
Week Date
Lecture
1 12.03.2004 Introduction, Basic SQL
Exercise
Project Start
Return
Ex. 1: SQL
2 19.03.2004 Advanced SQL
3 26.03.2004 Conceputal Modelling
Ex. 1 due
4 02.04.2004 Database Programming
1.1: Database Schema
1.2: Database Population
1.1
2.1: Database Integration
1.2
09.04.2004 Easter
16.04.2004 Easter
5 23.04.2004 Functional Dependencies Ex. 2: FD and RA
6 30.04.2004 Relational Algebra
7 07.05.2004 Introduction to XML
Ex. 2 due / Ex. 3: XML 2.2: Web Database Access
2.1
Ex. 3 due
3.1: Web Service Interface
2.2
3.2: Web Service Usage
3.1
8 14.05.2004 XML Query
9 21.05.2004 Web Services
10 28.05.2004 Concurrency
Ex. 4: Transactions
11 04.06.2004 Recovery
12 11.06.2004 Database Heterogeneity
Ex. 4 due
13 18.06.2004 Indexing
3.2
34
So what is this course about,
really ?
A bit of everything !
• Languages: SQL, XPath, XQuery
• Data modeling
• Theory ! (Functional dependencies, normal
forms)
• Algorithms and data structures (in the second half)
• Lots of implementation and hacking for the
project
• Most importantly: how to meet Real World needs
35
Download