Introduction to Information Systems SSC, Semester 6 Lecture 01 1 Staff • Instructors: – Karl Aberer, BC 108, karl aberer at epfl ch – Philippe Cudré-Mauroux, BC 114, philippe cudre-mauroux at epfl ch – Office hours: by appointment • TAs: – Gleb Skobeltsyn (exercises) – Martin Rubli (project) 2 Communications • Web page: lsirww.epfl.ch – Lectures will be available here – Homeworks and solutions will be posted here – The project description and resources will be here • Newsgroup: – epfl.ic.cours.IIS 3 Textbook Main textbook: • Databases and Transaction Processing, An application-oriented approach Philip M. Lewis, Arthur Bernstein, Michael Kifer, Addison-Wesley 2002. 4 Other Texts Many classic textbooks (each of them will do it) • Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom • Database Management Systems, Ramakrishnan • Fundamentals of Database Systems, Elmasri, Navathe • Database Systems, Date (7. edition) • Modern Database Management, Hoffer, (4. edition) • Database Systems Concepts, Silverschatz, (4. edition) 5 Material on the Web SQL Intro • SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/ Java Technology • java.sun.com WWW Technology • www.w3c.org 6 The Course • Goal: Teaching RDBMS (standard) with a strong emphasis on the Web • Fortunately others already did it – Alon Halevy, Dan Suciu, Univ. of Washington – http://www.cs.washington.edu/education/course s/cse444/ – http://www.acm.org/sigmod/record/issues/0309/ 4.AlonLevy.pdf 7 Acknowledgement • Build on UoW course – many slides – many exercise – ideas for the project • Main difference – less theory – will use real Web data in the project 8 Outline for Today’s Lecture • Overview of database systems • Course Outline • First Steps in SQL 9 What is behind this Web Site? • • • • • • http://immo.search.ch/ Search on a large database Specify search conditions Many users Updates Access through a web interface 10 Database Management Systems Database Management System = DBMS • A collection of files that store the data • A big C program written by someone else that accesses and updates those files for you Relational DBMS = RDBMS • Data files are structured as relations (tables) 11 Where are RDBMS used ? • Backend for traditional “database” applications – EPFL administration • Backend for large Websites – Immosearch • Backend for Web services – Amazon 12 Example of a Traditional Database Application Suppose we are building a system to store the information about: • students • courses • professors • who takes what, who teaches what 13 Can we do it without a DBMS ? Sure we can! Start by storing the data in files: students.txt courses.txt professors.txt Now write C or Java programs to implement specific tasks 14 Doing it without a DBMS... • Enroll “Mary Johnson” in “CSE444”: Write a C/Java program to do the following: Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” 15 Problems without an DBMS... • System crashes: Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” CRASH ! – What is the problem ? • Large data sets (say 50GB) – Why is this a problem ? • Simultaneous access by many users – Lock students.txt – what is the problem ? 16 Enters a DBMS “Two tier system” or “client-server” connection (ODBC, JDBC) Data files Database server (someone else’s C program) Applications 17 Functionality of a DBMS The programmer sees SQL, which has two components: • Data Definition Language - DDL • Data Manipulation Language - DML – query language Behind the scenes the DBMS has: • Query engine • Query optimizer • Storage management • Transaction Management (concurrency, recovery) 18 How the Programmer Sees the DBMS • Start with DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... • Continue with DML to populate tables: INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . 19 How the Programmer Sees the DBMS • Tables: Students: SSN 123-45-6789 234-56-7890 Courses: CID CSE444 CSE541 Takes: Name Charles Dan … Category undergrad grad … Name Databases Operating systems SSN 123-45-6789 123-45-6789 234-56-7890 CID CSE444 CSE444 CSE142 … Quarter fall winter • Still implemented as files, but behind the scenes can be quite complex “data independence” = separate logical view from physical implementation 20 Transactions • Enroll “Mary Johnson” in “CSE444”: BEGIN TRANSACTION; INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ -- More updates here.... IF everything-went-OK THEN COMMIT; ELSE ROLLBACK 21 If system crashes, the transaction is still either committed or aborted Transactions • A transaction = sequence of statements that either all succeed, or all fail • Transactions have the ACID properties: A = atomicity (a transaction should be done or undone completely ) C = consistency (a transaction should transform a system from one consistent state to another consistent state) I = isolation (each transaction should happen independently of other transactions ) D = durability (completed transactions should remain permanent) 22 Queries • Find all courses that “Mary” takes SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid • What happens behind the scene ? – Query processor figures out how to answer the query efficiently. 23 Queries, behind the scene Declarative SQL query Imperative query execution plan: sname SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid cid=cid sid=sid name=“Mary” Students Takes Courses The optimizer chooses the best execution plan for a query 24 Database Systems • The big commercial database vendors: – – – – Oracle IBM (with DB2) Microsoft (SQL Server) Sybase • Some free database systems (Unix) : – Postgres – MySQL – Predator 25 Databases and the Web • Accessing databases through web interfaces – Java programming interface (JDBC) – Embedding into HTML pages (JSP) – Access through http protocol (Web Services) • Using Web document formats for data definition and manipulation – XML, Xquery, Xpath – XML databases and messaging systems 26 Database Integration • Combining data from different databases – collection of data (wrapping) – combination of data and generation of new views on the data (mediation) • Problem: heterogeneity – access, representation, content • Example revisited – http://immo.search.ch/ – http://www.swissimmo.ch 27 Other Trends in Databases • Industrial – Object-relational databases – Main memory database systems – Data warehousing and mining • Research – Peer-to-peer data management – Stream data management – Mobile data management 28 Course Outline (Details on the Web) Part I • SQL (Chapter 6) • The relational data model (Chapter 3) • Database design (Chapters 2, 3, 7) • XML, XPath, XQuery Part II • Indexes (Chapter 13) • Transactions and Recovery (Chapter 17 - 18) Exam 29 Structure • Prerequisites: – Programming courses – Data structures • Work & Grading: – Homeworks (4): 0% – Exam (like homeworks): 50% – Project: 50% (see next) – each phase graded separately 30 The Project • Models the real data management needs of a Web company – Phase 1: Modelling and Data Acquisition – Phase 2: Data integration and Applications – Phase 3: Services • "One can only start to appreciate database systems by actually trying to use one" (Halevy) • Any SW/IT company will love you for these 31 skills The Project – Side Effects • Trains your soft skills – – – – team work deal with bugs, poor documentation, … produce with limited time resources project management and reporting • Results useful for you personally – Demo: • Project should be fun 32 Practical Concerns • • • • • New course, expect some hickups Important to keep time schedule Communication through Web Newsgroup Student committee for regular feedback (2 volunteers) 33 Week Date Lecture 1 11.03.2005 Introduction, Basic SQL 2 18.03.2005 Advanced SQL 25.03.2005 Easter 01.04.2005 Easter 3 08.04.2005 Conceputal Modelling 4 15.04.2005 Database Programming 5 22.04.2005 Functional Dependencies 6 29.04.2005 Relational Algebra 7 06.05.2005 Introduction to XML 8 13.05.2005 XML Query 9 20.05.2005 Web Services 10 27.05.2005 Concurrency 11 03.06.2005 Recovery 08.06.2005 12 10.06.2005 Database Heterogeneity 13 17.06.2005 Indexing Exercise Project Deadlines Project Presentation Ex. 1: SQL (on machines) Correction Ex. 1 Ex. 2: FD and RA Phase 1 Corr. Ex. 2 / Ex. 3: XML Phase 2 Correction Ex. 3 Ex. 4: Transactions Phase 3 Correction Ex. 4 34 So what is this course about, really ? A bit of everything ! • Languages: SQL, XPath, XQuery • Data modeling • Theory ! (Functional dependencies, normal forms) • Algorithms and data structures (in the second half) • Lots of implementation and hacking for the project • Most importantly: how to meet Real World needs 35