Introduction to Information Systems SSC, Semester 6 Lecture 01 1 Staff • Instructor: Karl Aberer – PSE, Room 1.31, karl.aberer@epfl.ch – Office hours: by appointment • TAs: – – – – Magdalena Punceva Anwitaman Datta Gleb Sklobetskyn Roman Schmidt 2 Communications • Web page: lsirww.epfl.ch – Lectures will be available here – Homeworks and solutions will be posted here – The project description and resources will be here • Mailing list: – please subscribe (see instructions on the Web page) 3 Textbook Main textbook: • Databases and Transaction Processing, An application-oriented approach Philip M. Lewis, Arthur Bernstein, Michael Kifer, Addison-Wesley 2002. 4 Other Texts Many classic textbooks (each of them will do it) • Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom • Database Management Systems, Ramakrishnan • Fundamentals of Database Systems, Elmasri, Navathe • Database Systems, Date (7. edition) • Modern Database Management, Hoffer, (4. edition) • Database Systems Concepts, Silverschatz, (4. edition) 5 Material on the Web SQL Intro • SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/ Java Technology • java.sun.com WWW Technology • www.w3c.org 6 Outline for Today’s Lecture • Overview of database systems • Course Outline • First Steps in SQL 7 What is behind this Web Site? • • • • • • http://immo.search.ch/ Search on a large database Specify search conditions Many users Updates Access through a web interface 8 Database Management Systems Database Management System = DBMS • A collection of files that store the data • A big C program written by someone else that accesses and updates those files for you Relational DBMS = RDBMS • Data files are structured as relations (tables) 9 Where are RDBMS used ? • Backend for traditional “database” applications – EFPL administration • Backend for large Websites – Immosearch • Backend for Web services – Amazon 10 Example of a Traditional Database Application Suppose we are building a system to store the information about: • students • courses • professors • who takes what, who teaches what 11 Can we do it without a DBMS ? Sure we can! Start by storing the data in files: students.txt courses.txt professors.txt Now write C or Java programs to implement specific tasks 12 Doing it without a DBMS... • Enroll “Mary Johnson” in “CSE444”: Write a C/Java program to do the following: Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” 13 Problems without an DBMS... • System crashes: Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” CRASH ! – What is the problem ? • Large data sets (say 50GB) – Why is this a problem ? • Simultaneous access by many users – Lock students.txt – what is the problem ? 14 Enters a DMBS “Two tier system” or “client-server” connection (ODBC, JDBC) Data files Database server (someone else’s C program) Applications 15 Functionality of a DBMS The programmer sees SQL, which has two components: • Data Definition Language - DDL • Data Manipulation Language - DML – query language Behind the scenes the DBMS has: • Query engine • Query optimizer • Storage management • Transaction Management (concurrency, recovery) 16 How the Programmer Sees the DBMS • Start with DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... • Continue with DML to populate tables: INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . 17 How the Programmer Sees the DBMS • Tables: Students: SSN 123-45-6789 234-56-7890 Courses: CID CSE444 CSE541 Takes: Name Charles Dan … Category undergrad grad … Name Databases Operating systems SSN 123-45-6789 123-45-6789 234-56-7890 CID CSE444 CSE444 CSE142 … Quarter fall winter • Still implemented as files, but behind the scenes can be quite complex “data independence” = separate logical view from physical implementation 18 Transactions • Enroll “Mary Johnson” in “CSE444”: BEGIN TRANSACTION; INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ -- More updates here.... IF everything-went-OK THEN COMMIT; ELSE ROLLBACK 19 If system crashes, the transaction is still either committed or aborted Transactions • A transaction = sequence of statements that either all succeed, or all fail • Transactions have the ACID properties: A = atomicity C = consistency I = isolation D = durability 20 Queries • Find all courses that “Mary” takes SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid • What happens behind the scene ? – Query processor figures out how to answer the query efficiently. 21 Queries, behind the scene Declarative SQL query Imperative query execution plan: sname SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid cid=cid sid=sid name=“Mary” Students Takes Courses The optimizer chooses the best execution plan for a query 22 Database Systems • The big commercial database vendors: – – – – Oracle IBM (with DB2) Microsoft (SQL Server) Sybase • Some free database systems (Unix) : – Postgres – MySQL – Predator • In CSE444 we use SQL Server. You can also choose MySQL, but less support 23 Databases and the Web • Accessing databases through web interfaces – Java programming interface (JDBC) – Embedding into HTML pages (JSP) – Access through http protocol (Web Services) • Using Web document formats for data definition and manipulation – XML, Xquery, Xpath – XML databases and messaging systems 24 Database Integration • Combining data from different databases – collection of data (wrapping) – combination of data and generation of new views on the data (mediation) • Problem: heterogeneity – access, representation, content • Example revisited (Demo) – http://immo.search.ch/ 25 Other Trends in Databases • Industrial – Object-relational databases – Main memory database systems – Data warehousing and mining • Research – Peer-to-peer data management – Stream data management – Mobile data management 26 The Course • Goal: Teaching RDBMS (standard) with a strong emphasis on the Web • Fortunately others already did it – Alon Halevy, Dan Suciu, Univ. of Washington – http://www.cs.washington.edu/education/course s/cse444/ – http://www.acm.org/sigmod/record/issues/0309/ 4.AlonLevy.pdf 27 Acknowledgement • Build on UoW course – many slides – many exercise – ideas for the project • Main difference – less theory – will use real Web data in the project 28 Course Outline (Details on the Web) Part I • SQL (Chapter 6) • The relational data model (Chapter 3) • Database design (Chapters 2, 3, 7) • XML, XPath, XQuery Part II • Indexes (Chapter 13) • Transactions and Recovery (Chapter 17 - 18) Exam 29 Structure • Prerequisites: – Programming courses – Data structures • Work & Grading: – Homeworks (4): 0% – Exam (like homeworks): 50% – Project: 50% (see next) – each phase graded separately 30 The Project • Models the real data management needs of a Web company – Phase 1: Modelling and Data Acquisition – Phase 2: Data integration and Applications – Phase 3: Services • "One can only start to appreciate database systems by actually trying to use one" (Halevy) • Any SW/IT company will love you for these 31 skills The Project – Side Effects • Trains your soft skills – – – – team work deal with bugs, poor documentation, … produce with limited time resources project management and reporting • Results useful for you personally – Demo: • Project should be fun 32 Practical Concerns • • • • • New course, expect some hickups Important to keep time schedule Communication through Web News Group Student committee for regular feedback (2 volunteers) 33 Week Date Lecture 1 12.03.2004 Introduction, Basic SQL Exercise Project Start Return Ex. 1: SQL 2 19.03.2004 Advanced SQL 3 26.03.2004 Conceputal Modelling Ex. 1 due 4 02.04.2004 Database Programming 1.1: Database Schema 1.2: Database Population 1.1 2.1: Database Integration 1.2 09.04.2004 Easter 16.04.2004 Easter 5 23.04.2004 Functional Dependencies Ex. 2: FD and RA 6 30.04.2004 Relational Algebra 7 07.05.2004 Introduction to XML Ex. 2 due / Ex. 3: XML 2.2: Web Database Access 2.1 Ex. 3 due 3.1: Web Service Interface 2.2 3.2: Web Service Usage 3.1 8 14.05.2004 XML Query 9 21.05.2004 Web Services 10 28.05.2004 Concurrency Ex. 4: Transactions 11 04.06.2004 Recovery 12 11.06.2004 Database Heterogeneity Ex. 4 due 13 18.06.2004 Indexing 3.2 34 So what is this course about, really ? A bit of everything ! • Languages: SQL, XPath, XQuery • Data modeling • Theory ! (Functional dependencies, normal forms) • Algorithms and data structures (in the second half) • Lots of implementation and hacking for the project • Most importantly: how to meet Real World needs 35