CS541 Database Systems
CS 541 Lecture Slides
Sunil Prabhakar
March 22, 2016

Instructor
  Sunil Prabhakar
    LWSN 2142C
    Office hours: catch me or by appointment
    sunil@cs.purdue.edu
    http://www.cs.purdue.edu/homes/sunil/
  Teaching Assistant: Yasin Silva
    ysilva@cs.purdue.edu
    Office hours: TBA
    Assignments and Projects

Course Information
  Web page: http://www.cs.purdue.edu/homes/sunil/syllabi/CS541_Fall 2004.html
    Projects, Assignments, Solutions, Slides
  Email alias: cs541@cs.purdue.edu
    Announcements: IMPORTANT
    mailer add me to cs541
  WebCT
    Grades
    Check that you can log in

Course Description
  Introductory graduate course on databases
  Fundamental concepts & internals
  Some coverage of use of databases (Oracle projects)
  Will not teach use of databases!!!
  Focus on Relational Databases

Topics
  DBMS Concepts and Architecture
  Relational Database Model
  Relational Languages (Algebra, Calculus, SQL)
  Storage and Indexing
  Query Processing
  Query Optimization
  Transaction Processing
    Concurrency Control
    Recovery
  Advanced Topics: TBD (Mining, Indexing, Sensors, …)

Pre-Requisites
  Data Structures
    Notions of trees, hashing, linked lists, etc.
  Operating Systems
    I/O
  Java
    Project 3 will be done in Java
    RMI
    Simple GUI

Text
  Database System Concepts (4th Edition)
    Silberschatz, Korth, Sudarshan
    McGraw Hill
    ISBN: 0-07-228363-7
  Supplemental Text: Concurrency Control and Recovery in Database Systems
    Bernstein, Hadzilacos, Goodman
    Out of print: available free on the Internet
    Link from course web page

Grading Policy
  Tentative
    Written Assignments (2)       20%
    Programming Projects (3-4)    40%
    Mid-term Exam                 20%
    Final Exam                    20%
  Final not comprehensive
  Grading is curved
  No extra credit assignments

Academic Integrity
  CS Policy
  IMPORTANT: visit, read and accept!!!
    https://portals.cs.purdue.edu/student
    Need CS login and password.
  Cheating will be taken very seriously.
  Make sure that you are familiar with what CS considers to be cheating!!
  You may discuss the problems, but the final solution must be your own.

Course Policy
  NO LATE SUBMISSIONS
  NO EXTENSIONS
  *** Only on documented medical reasons or family emergency.

Databases
  What is a database?
    Software to manage data.
  Why do we need a database?
    Ease of development
    Efficiency
    Concurrency
    Reliability
    Ease of administration
    Data independence
  Importance of databases?
    Increasing or decreasing?
    What is changing?

What is interesting?
  Essential to modern applications?
    Data is a valuable commodity.
  Is there anything challenging?
    Encompass PL, OS, Logic, Theory, …
    Novel solutions with wider applicability: Transactions, Locking, …
  What remains to be done?
    Modern applications: Multimedia, Sensors, Streams, Data Warehouses, Data Mining, Privacy and Security, Knowledge, Data on the Web, XML, …

Abstraction
  How to provide a generic, application-independent solution?
  Data Models
    Abstract view of data
    Database efficiently supports this model
    Examples: Network, Relational, OO, O-R, …
    Most successful model: RELATIONAL
  Users access the database as a black box that supports the model.
  Languages are used to interact with this box: Relational Algebra, SQL

Independence
  Databases allow applications and users to be shielded from the internal details:
  Physical data independence
    How data is stored (bits, pages, formats, etc.)
    Compare with flat file alternative
  Logical data independence
    How data is structured logically.
    Allows changes to the logical organization of data without having to rebuild applications.

Concurrency Control & Recovery
  Two highly desirable requirements:
    Enable multiple users to access the data at the same time.
    Automatic recovery from crashes.
  Challenge: How to do this in an application-independent manner?
  Solution: Transactions
    "Contract" between the DB black box and users.

Performance
  Critical for databases
    Research focus for many years
  Must be transparent to the users
    Query processing & optimization
    Indexing, storage organization (data independence)
  Challenge: How to optimize without understanding the semantics of an application?
  Solution: Relational data model
    Clean mathematical abstraction; allows for alternative equivalent evaluations

This course
  Study the relational model, ER model, languages.
  Transactions
    Concurrency Control
    Recovery
  Storage and File Structures
  Indexing and Hashing
  Query Processing and Optimization
  Advanced Topics
    New data types, applications, multi-dimensional data, data warehousing, data mining, design, …
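
Example: alternative equivalent evaluations
  The Performance slide says the relational model allows alternative equivalent evaluations of the same query. The Python sketch below is one small illustration of that idea, not material from the course. The relations (students, enrolled), their attributes, and the helper names (select, join, is_cs, as_set) are all hypothetical, and the nested-loop join is only the simplest way to write a join. It evaluates one query two ways, filtering after the join and filtering before it, and checks that both plans return the same rows.

```python
# Hypothetical toy relations, each a list of dict rows (not from the slides).
students = [
    {"sid": 1, "name": "Ana", "dept": "CS"},
    {"sid": 2, "name": "Bo", "dept": "EE"},
    {"sid": 3, "name": "Chen", "dept": "CS"},
]
enrolled = [
    {"sid": 1, "course": "CS541"},
    {"sid": 2, "course": "CS541"},
    {"sid": 3, "course": "CS503"},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Join on one shared attribute, written as a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def is_cs(row):
    """Predicate on an attribute that comes from the students relation."""
    return row["dept"] == "CS"


# Plan A: join first, then filter the joined rows.
plan_a = select(join(students, enrolled, "sid"), is_cs)

# Plan B: filter students first, then join the smaller input.
plan_b = join(select(students, is_cs), enrolled, "sid")


def as_set(rows):
    """Compare results as sets of rows, since relations are unordered."""
    return {tuple(sorted(row.items())) for row in rows}


same_answer = as_set(plan_a) == as_set(plan_b)
```

  Because the predicate only refers to attributes of students, moving the selection below the join cannot change the answer, but it shrinks the join input. An optimizer can make this kind of choice from the algebra alone, without knowing what the application means by the data.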