CS541 Database Systems
CS 541 Lecture Slides
Sunil Prabhakar

Instructor
  Sunil Prabhakar
  LWSN 2142C
  Office Hours: catch me or by appointment
  sunil@cs.purdue.edu
  http://www.cs.purdue.edu/homes/sunil/
Teaching Assistant: Yasin Silva
  ysilva@cs.purdue.edu
  Office hours: TBA
  Assignments and Projects
March 22, 2016
Sunil Prabhakar
2

Course Information
Web page:
  http://www.cs.purdue.edu/homes/sunil/syllabi/CS541_Fall2004.html
  Projects, Assignments, Solutions, Slides
  Announcements: IMPORTANT
Email alias
  cs541@cs.purdue.edu
  mailer add me to cs541
WebCT
  Grades
  Check that you can log in

Course Description
Introductory graduate course on databases
Fundamental concepts & internals
Some coverage of database use (Oracle projects)
Will not teach how to use databases!!!
Focus on Relational Databases

Topics
DBMS Concepts and Architecture
Relational Database Model
Relational Languages (Algebra, Calculus, SQL)
Storage and Indexing
Query Processing
Query Optimization
Transaction Processing
  Concurrency Control
  Recovery
Advanced Topics: TBD (Mining, Indexing, Sensors, …)

Prerequisites
Data Structures
  Notions of trees, hashing, linked lists, etc.
Operating Systems
  I/O
Java
  Project 3 will be done in Java
  RMI
  Simple GUI

Text
Database System Concepts (4th Edition)
  Silberschatz, Korth, Sudarshan
  ISBN: 0-07-228363-7
  McGraw Hill
Supplemental Text:
  Concurrency Control and Recovery in Database Systems
  Bernstein, Hadzilacos, Goodman
  Out of print: available free on the Internet
  Link from course web page.

Grading Policy
Tentative:
  Written Assignments (2)       20%
  Programming Projects (3-4)    40%
  Mid-term Exam                 20%
  Final Exam                    20%
Final not comprehensive
Grading is curved
No extra credit assignments

Academic Integrity
CS Policy
  IMPORTANT: visit, read and accept!!!
  https://portals.cs.purdue.edu/student
  Need CS login and password.
Cheating will be taken very seriously.
Make sure that you are familiar with what CS considers to be cheating!!
You may discuss the problems, but the final solution must be your own.

Course Policy
NO LATE SUBMISSIONS
NO EXTENSIONS***
*** Exceptions only for documented medical reasons or family emergencies.

Databases
What is a database?
  Software to manage data.
Why do we need a database?
  Ease of development
  Efficiency
  Concurrency
  Reliability
  Ease of administration
  Data independence
Importance of databases?
  Increasing or decreasing? What is changing?

What is interesting?
Essential to modern applications?
  Data is a valuable commodity.
Is there anything challenging?
  Draws on PL, OS, Logic, Theory, …
  Novel solutions with wider applicability: Transactions, Locking, …
What remains to be done?
  Modern applications: Multimedia, Sensors, Streams, Data Warehouses, Data Mining, Privacy and Security, Knowledge, Data on the Web, XML, …

Abstraction
How to provide a generic, application-independent solution?
Data Models
  Abstract view of data
  Database efficiently supports this model
  Examples: Network, Relational, OO, O-R, …
  Most successful model: RELATIONAL
Users access the database as a black box that supports the model.
Languages are used to interact with this box:
  Relational Algebra, SQL
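
The relational abstraction can be made concrete with a small sketch. The Python below is a hypothetical illustration, not course material: a relation is modeled as a schema plus a set of tuples, and two relational algebra operators, selection (σ) and projection (π), are written as plain functions. The Student relation, its attributes, and the helper names `Relation`, `select`, and `project` are assumptions made for the example.

```python
from typing import Callable, FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    attrs: Tuple[str, ...]   # schema: ordered attribute names
    rows: FrozenSet[tuple]   # a set of tuples: no duplicates, no order


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """sigma_pred(r): keep only the tuples that satisfy the predicate."""
    keep = frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t))))
    return Relation(r.attrs, keep)


def project(r: Relation, attrs: Tuple[str, ...]) -> Relation:
    """pi_attrs(r): keep the named columns; duplicates collapse because rows form a set."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(attrs, frozenset(tuple(t[i] for i in idx) for t in r.rows))


# Hypothetical Student(sid, name, dept) relation.
student = Relation(
    ("sid", "name", "dept"),
    frozenset({(1, "Ann", "CS"), (2, "Bob", "EE"), (3, "Cy", "CS")}),
)

# Relational algebra: pi_name(sigma_{dept='CS'}(Student))
# SQL equivalent:     SELECT DISTINCT name FROM Student WHERE dept = 'CS'
cs_names = project(select(student, lambda t: t["dept"] == "CS"), ("name",))
print(sorted(cs_names.rows))  # [('Ann',), ('Cy',)]
```

The user states what to retrieve in terms of the model; nothing in the query says how the tuples are stored or scanned, which is what lets the DBMS act as a black box.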

Independence
Databases shield applications and users from internal details:
Physical data independence
  How data is stored (bits, pages, formats, etc.)
  Compare with the flat-file alternative
Logical data independence
  How data is structured logically.
  Allows changes to the logical organization of data without having to rebuild applications
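
Physical data independence can be sketched in the same style. In this hypothetical example (all class and function names are invented for illustration), the application talks only to a logical interface, `by_dept`, while two interchangeable storage layouts sit behind it: an unordered heap that is scanned in full, and a hash table keyed on department. Swapping one layout for the other changes the cost of a lookup, not its answer.

```python
from typing import Dict, List, Protocol, Tuple

Row = Tuple[int, str, str]  # (sid, name, dept): hypothetical schema


class StudentStore(Protocol):
    def insert(self, row: Row) -> None: ...
    def by_dept(self, dept: str) -> List[Row]: ...


class HeapFile:
    """Unordered list of rows; every lookup is a full scan."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self._rows.append(row)

    def by_dept(self, dept: str) -> List[Row]:
        return sorted(r for r in self._rows if r[2] == dept)


class HashIndexed:
    """Rows bucketed by dept in a hash table; a lookup touches one bucket."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[Row]] = {}

    def insert(self, row: Row) -> None:
        self._buckets.setdefault(row[2], []).append(row)

    def by_dept(self, dept: str) -> List[Row]:
        return sorted(self._buckets.get(dept, []))


def application(store: StudentStore) -> List[Row]:
    # The application uses only the logical interface; it never sees
    # whether the rows live in a heap or in a hash table.
    for row in [(1, "Ann", "CS"), (2, "Bob", "EE"), (3, "Cy", "CS")]:
        store.insert(row)
    return store.by_dept("CS")


assert application(HeapFile()) == application(HashIndexed())
print(application(HashIndexed()))  # [(1, 'Ann', 'CS'), (3, 'Cy', 'CS')]
```

With a flat-file design, by contrast, the scan or hashing logic would live inside every application, and changing the file layout would mean rewriting each of them.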

Concurrency Control & Recovery
Two highly desirable requirements:
  Enable multiple users to access the data at the same time.
  Automatic recovery from crashes.
Challenge:
  How to do this in an application-independent manner?
Solution:
  Transactions
  “Contract” between the DB black box and users.
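
The all-or-nothing side of the transaction contract can be demonstrated with Python's built-in `sqlite3` module, used here only as a convenient stand-in DBMS (the course projects use Oracle). The `account` table, its CHECK constraint, and the `transfer` and `balances` helpers are hypothetical. Using the connection as a context manager commits on success and rolls back if an exception escapes, so a failed debit also undoes the credit that preceded it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial rows
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst: both updates commit, or neither does."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Debiting below zero violates the CHECK constraint and raises
        # sqlite3.IntegrityError; the rollback then undoes the credit above too.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = make_db()
transfer(conn, "A", "B", 30)
print(balances(conn))  # {'A': 70, 'B': 80}
try:
    transfer(conn, "A", "B", 500)  # would leave A negative
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # unchanged: {'A': 70, 'B': 80}
```

The application never writes undo logic of its own; it only marks where a transaction begins and ends, and the DBMS guarantees the outcome. This sketch covers atomicity only; concurrent access and crash recovery are handled by the same contract but are not exercised here.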

Performance
Critical for databases
Research focus for many years
Must be transparent to the users
  Query processing & optimization
  Indexing, storage organization (data independence)
Challenge:
  How to optimize without understanding the semantics of an application?
Solution:
  Relational data model: a clean mathematical abstraction that allows alternative, equivalent evaluations of the same query
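
One such equivalence is selection pushdown: when a predicate p mentions only attributes of R, σ_p(R ⋈ S) and σ_p(R) ⋈ S return the same tuples. The sketch below uses hypothetical Student and Enrolled relations and a naive nested-loop join (both are assumptions for illustration) to check that the two plans agree and to count how many tuple pairs each one compares.

```python
from typing import List, Tuple

# Hypothetical relations: Student(sid, dept) and Enrolled(sid, course).
students: List[Tuple[int, str]] = [(1, "CS"), (2, "EE"), (3, "CS"), (4, "ME")]
enrolled: List[Tuple[int, str]] = [
    (1, "CS541"), (2, "EE201"), (3, "CS541"), (4, "ME100"), (1, "CS580"),
]


def nested_loop_join(r, s):
    """Naive equi-join on sid; also returns the number of tuple pairs compared."""
    out, pairs = [], 0
    for sid, dept in r:
        for sid2, course in s:
            pairs += 1
            if sid == sid2:
                out.append((sid, dept, course))
    return out, pairs


# Plan 1: join first, then select dept = 'CS'.
joined, cost1 = nested_loop_join(students, enrolled)
plan1 = {t for t in joined if t[1] == "CS"}

# Plan 2: push the selection below the join.
cs_students = [t for t in students if t[1] == "CS"]
joined2, cost2 = nested_loop_join(cs_students, enrolled)
plan2 = set(joined2)

assert plan1 == plan2  # same answer ...
print(cost1, cost2)    # ... but 20 vs. 10 tuple pairs compared
```

Because the rewrite follows from the algebra alone, the optimizer can apply it without knowing anything about what the application means by "CS" or "Enrolled". A real optimizer estimates plan costs from statistics rather than running every plan.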

This course
Study the relational model, ER model, languages.
Transactions
  Concurrency Control
  Recovery
Storage and File Structures
Indexing and Hashing
Query Processing and Optimization
Advanced Topics
  New data types, applications, multi-dimensional data, data warehousing, data mining, design, …