C20.0046: Database Management Systems Lecture #1 Matthew P. Johnson Stern School of Business, NYU Spring, 2005 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 1 Personnel Instructor: Matthew P. Johnson mjohnson@stern Office hours: tba, 8-171, KMC please visit! Tutor/TF/grader: tba… M.P. Johnson, DBMS, Stern/NYU, Spring 2005 2 Communications Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/ syllabus course policies may move in the future… Blackboard web site Some materials will be available here Discussion board send general-interest messages here to benefit all! Go to http://sternclasses.nyu.edu Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 3 Acknowledgements Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials See classpage for other related, antecedent DBMS courses M.P. Johnson, DBMS, Stern/NYU, Spring 2005 4 What Is a Database? A very large, integrated collection of data. Models real-world enterprise. Entities Relationships students, courses, instructors, TAs George is currently taking C20.0046 Dick is currently teaching C20.0046 Condi is currently TA-ing C20.0046 but took it last semester A Database Management System (DBMS) is a software package designed to store and manage databases. M.P. Johnson, DBMS, Stern/NYU, Spring 2005 5 Databases are everywhere: Ordering a pizza Databases involved? Pizza Hut’s DB 1. stores previous orders by customer stores previous credit cards used Credit card records 2. huge databases of (attempted) purchases location, date, amount, parties Got approved by credit-report companies phone company’s records 3. 4. Local Usage Details (“Pull his LUDs, Lenny.”) Caller ID 5. ensures reported address matches destination M.P. Johnson, DBMS, Stern/NYU, Spring 2005 6 Your wallet is full of DB records Driver’s license Credit cards NYUCard Medical insurance card Social security card Gym membership Money (serial numbers) Maybe even photos (ids on back) M.P. Johnson, DBMS, Stern/NYU, Spring 2005 7 Databases are everywhere Q: Websites backed by DBMSs? retail: Amazon, etc. data-mining: Page You Made search engines: Google, etc. directories: Internic, etc. searchable DBs: IMDB, tvguide.com, etc. Q: Non-web examples of DBMSs? criminal/terrorist: TIA airline bookings NYPD’s CompStat all serious crime stats by precinct Retailers: Wal-Mart, etc. when to re-order, purchase patterns, data-mining Genomics! M.P. Johnson, DBMS, Stern/NYU, Spring 2005 8 Example of a Traditional DB App Suppose we are building a system to store the information about: checking accounts savings accounts account holders state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Spring 2005 9 Can we do it without a DBMS? Sure we can! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks M.P. Johnson, DBMS, Stern/NYU, Spring 2005 10 Doing it without a DBMS... Transfer $100 from George’s savings to checking: Write a C program to do the following: Read savings.txt Find&update the record “George” balance -= 100 Write savings.txt Read checking.txt Find&update the record “George” balance += 100 Write checking.txt M.P. Johnson, DBMS, Stern/NYU, Spring 2005 11 Problems without an DBMS... 1. System crashes: Read savings.txt Find&update the rec “George.” Write savings.txt Read checking.txt Find&update the rec “George” Write checking.txt CRASH ! Q: What is the problem ? A: George lost his $100 Same problem even if reordered 2. Simultaneous access by many users George and Dick visit ATMs at same Lock checking.txt before each use–what is the problem? M.P. Johnson, DBMS, Stern/NYU, Spring 2005 12 Problems without an DBMS... 3.Large data sets (say 500GB) No indices Why is this a problem? Finding “George” in huge flatfile is expensive Modifications intractable without better data structures “George” “Georgie” is very expensive Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Spring 2005 13 Problems without an DBMS... 5.Security? File system may be insecure File system security may be coarse 6.Application programming interface (API)? suppose need other apps to access DB 7.How to interact with other DBMSs? M.P. Johnson, DBMS, Stern/NYU, Spring 2005 14 General problems to solve In building our own system, many Qs arise: how do we store the data? (file organization, etc.) how do we query the data? (write programs…) make sure that updates don’t mess things up? leave the DB “consistent” provide different views on the data? e.g., ATM user’s view v. bank teller’s view how do we deal with crashes? Too hard! Go buy a DBMS! Q: How does a DBMS solve these problems? A: See third part of course M.P. Johnson, DBMS, Stern/NYU, Spring 2005 15 Big issue: Transaction processing Grouping of several queries (or other database actions) into one transaction ACID properties Atomicity Consistency constraints on relationships Isolation all or nothing concurrency control Simulated solipsism Durability Crash recovery M.P. Johnson, DBMS, Stern/NYU, Spring 2005 16 Atomicity & Durability Saw how George lost $100 with makeshift software A DBMS prevents this outcome xacts are all or nothing One idea: Keep a log (history) of all actions in set of xacts Durability: Use log to redo or undo certain ops in crash recovery Atomicity: don’t really commit changes until end Then, all at once M.P. Johnson, DBMS, Stern/NYU, Spring 2005 17 Isolation Concurrent execution is essential for performance. Interleaving actions of different user programs can lead to inconsistency: Frequent, slow disk accesses don’t waste CPU – keep running e.g., two programs simultaneously withdraw from the same account DBMS ensures such problems don’t arise: users can pretend they are using a single-user system. M.P. Johnson, DBMS, Stern/NYU, Spring 2005 18 Isolation Contrast with a file in two Notepads Strategy: ignore multiple users whichever saves last wins first save is overwritten Contrast with a file in two Words Strategy: blunt isolation One can edit To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Spring 2005 19 Consistency Each xant (on a consistent DB) must leave it in a consistent state can define integrity constraints checks the defined claims about the data remain true M.P. Johnson, DBMS, Stern/NYU, Spring 2005 20 Data Models Any DBMS uses a data model: collection of concepts for describing data Schema: description of partic set of data, using some data model Relational data model: most widely used (by far) data model Oracle, DB2, SQLServer, other SQL DBMSs main concept: relation ~ table of rows & columns a rel’s schema defines its fields M.P. Johnson, DBMS, Stern/NYU, Spring 2005 21 Example: university database Conceptual schema: Physical schema: Students(ssn: string, name: string, login: string, age: int, gpa: real) Courses(cid: string, cname: string, credits: int) Enrolled(sid:string, cid:string, grade: string) Relations stored as unordered text files. Indices on first column of each rel External Schema (View): Course_info(ssn: string, name: string) My_courses(cname: string, grade: string) M.P. Johnson, DBMS, Stern/NYU, Spring 2005 22 How the programmer sees the DBMS Start with DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... Continue with DML to populate tables: INSERT INTO Students VALUES(‘Howard, ‘123456789’, ‘undergraduate’) . . . . M.P. Johnson, DBMS, Stern/NYU, Spring 2005 23 How the programmer sees the DBMS Tables: Takes: Students: SSN 123-45-6789 234-56-7890 Courses: CID C20.0046 C20.0056 Name Howard Wesley … Category undergrad grad … SSN 123-45-6789 CID C20.0046 123-45-6789 C20.0056 234-56-7890 C20.0046 … CName Databases Advanced Software semester Spring, 2004 Spring, 2004 Fall, 2003 Still implemented as files, but behind the scenes can be quite complex “data independence” = separate logical view from physical implementation M.P. Johnson, DBMS, Stern/NYU, Spring 2005 24 Querying: Structured Query Language Find all the students who have taken C20.0046: Find all the students who C20.0046 last fall: SELECT SSN FROM Takes WHERE CID=‘C20.0046’ AND Semester=‘Fall, 2003’ Find the students’ names: SELECT SSN FROM Takes WHERE CID=‘C20.0046’ SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID=‘C20.0046’ AND Semester=‘Fall, 2003’ Query processor does this efficiently. M.P. Johnson, DBMS, Stern/NYU, Spring 2005 25 Database Industry Relational databases are a great success of theoretical ideas. based on most “theoretical” type of math there is: set theory DBMS companies are among the largest software companies in the world. Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), etc. Also opensource: MySQL, PostgreSQL, etc. $20B+ industry. XML (“semi-structured data”) also important New lingua franca for exchanging data M.P. Johnson, DBMS, Stern/NYU, Spring 2005 26 The Study of DBMS Several aspects: This course covers all three Modeling and design of databases DBMS programming: querying and update DBMS implementation though more time on first two Also will look at some more advanced areas XML, data-mining, LDAP? M.P. Johnson, DBMS, Stern/NYU, Spring 2005 27 Databases are used by DB app programmers desktop app programmers web developers Database administrators (DBAs) design schemas security/authorization crash recovery tuning better paid than programmers! Everyone else (perhaps indirectly) “You may not be interested in databases, but databases are interested in you.” - Trotsky M.P. Johnson, DBMS, Stern/NYU, Spring 2005 28 Course outline Database design: The relational model: Entity/Relationship models Modeling constraints Relational algebra Transforming E/R models to relational schemas SQL Views and triggers M.P. Johnson, DBMS, Stern/NYU, Spring 2005 29 Outline (Continued) Connecting to a database from a programming language Storage and indexing Transactions XML Advanced topics May change as course progresses partly in response to audience M.P. Johnson, DBMS, Stern/NYU, Spring 2005 30 Textbook Database Management Systems by Raghu Ramakrishnan, Johannes Gehrke 3 edition (August 14, 2002) Available: NYU bookstore Amazon/BN (may be cheaper) Amazon.co.uk (may be cheaper still) Links on class page M.P. Johnson, DBMS, Stern/NYU, Spring 2005 31 SQL Readings Optional reference: SQL in a Nutshell Online (free) SQL tutorials include: A Gentle Introduction to SQL (http://sqlzoo.net/) SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Spring 2005 32 Grading Prerequisites: Programming experience Work & Grading: presumably C/C++/Java Homework 30%: O(4) Project: 30% - see below. Midterm (closed book/notes): 15% Final (closed book/notes): 20% Class participation: 5% Stern Curve Class attendance is required Absences will seriously affect your total grade M.P. Johnson, DBMS, Stern/NYU, Spring 2005 33 The Project: design end-to-end DB app data model creation of DB in Oracle/Mysql Identify entities (and fields), relationships Identify resulting relations (tables) Insertion of real(alistic) data (web) app for accessing/modifying data Identification of “interesting” questions to ask Production of DBMS interface Work in pairs/threes (start forming now) Choose topic on your own: previous e.g.s online Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Spring 2005 34 Collaboration Homework and exams done individually Project done with your team members only Non-cited use of others’ problem solutions, code, etc. = plagiarism See Stern’s stern academic honesty policy Contact me if you’re at all unclear before a particular case Cite any materials used if you’re at all unclear after a particular case M.P. Johnson, DBMS, Stern/NYU, Spring 2005 35 On-going Feedback Don’t wait until the end-of-semester course evals to complain or give feedback on how to improve course. (It’s too late for you then!) Come see me early on during my office hours or send me email with your concerns “We’re in touch, so you be in touch.” M.P. Johnson, DBMS, Stern/NYU, Spring 2005 36 Summary DBMS used to maintain, query large datasets. Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development. Database skills are critical in financial services, marketing and other business areas! M.P. Johnson, DBMS, Stern/NYU, Spring 2005 37 So what is this course about, really? A bit of everything! Languages: SQL, XPath, XQuery Data modeling Some theory! Algorithms and data structures (in the third part) Functional dependencies, normal forms e.g., how to find most efficient schema for data e.g., indices make data much faster to find – but how? Lots of implementation and hacking for the project Business DBMS examples/cases Most importantly: how to meet real-world needs M.P. Johnson, DBMS, Stern/NYU, Spring 2005 38 For next time Get the book Read chapters 1 & 2 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 39 For right now: written survey name previous cs/is/math/logic courses previous programming experience career plans: programmer, DBA, MBA, etc. why taking class M.P. Johnson, DBMS, Stern/NYU, Spring 2005 40