CS3431 – Database Systems I Introduction Murali Mani What is a Database System? • Database: a large collection of related data. usually too large to fit in computer memory at once Focus: information, rather than computation Database Mangement System (DBMS) Software that allows us to create, use and maintain a database. Murali Mani Database Applications E-commerce: inventory of books, CDs etc at Amazon, B&N etc. Banks, Airlines Universities GIS (Maps) – find restaurants closest to WPI WWW (World Wide Web) Bio-informatics (genome data) Digital Libraries Murali Mani Focus of this course Tabular View of Data: Airline System Flight Passenger FlewIn flightNo start destination miles 101 BOS LAX 3000 102 PVD LAX 2900 pName ffNumber DoB milesEarned Joe 1001 1980 12000 Mary 1002 1981 11000 flightNo ffNumber date 101 1001 Jan 4 102 1002 Jan 5 Murali Mani Focus of this Course: RDBMS Tabular view of data: Relational Model Data Model: A collection of concepts used for describing data Structures, Constraints, Operations Schema: Describes structures and constraints for a given application. RDBMS: Relational Database Management Systems Software that allows us to create, use and maintain a relational database. Murali Mani Levels of Abstraction View1 • External schema (view) describes how users see the data • Logical schema describes the logical structures used • Physical schema describes files and indexes Murali Mani View2 Logical Schema Physical Schema disk View3 Levels of Abstraction: Example Logical Schema: Flight, Passenger, FlewIn Physical Schema Index on flightNo for Flight Index on flightNo for FlewIn Views NoOfPassengers (flightNo, date, numPassengers) Murali Mani Why use DBMS, and not files? Data independence and efficient access Reduced application development time Data integrity: Ensure consistency of data even with multiple users Recovery from crashes, security etc. Murali Mani Data independence and efficient access Data independence Logical Data Independence: Logical schema can change, but views need not change Physical Data Independence: Physical schema such as indexes can change, but logical schema ned not change. Efficient Access Indexes allow you to see only the “necessary” portion of data, as opposed to sequential access in files. Murali Mani Reduced application development time Higher level of data abstraction Queries are written in a high level language tailored for database applications. SELECT pname FROM Passengers WHERE flightNo = 101 Murali Mani Data Integrity Concurrent Access, DBMS ensures data is consistent eg: multiple airline staff trying to reserve a seat for different customers. Ideas: Transactions – grouping multiple instructions (reads/writes) into one atomic unit Locks – locking of resources (tables) Murali Mani Recovery from Crashes, Security etc If the system crashes in the middle of a transaction, recovery should be possible. Ideas: logging, commit/rollback of transactions Also other features such as security, access control, privileges etc to facilitate administration. Murali Mani Who use databases? End users DB application programmers Database Administrators Database design Security, Authorization Data availability, crash recovery Database tuning (for performance) Murali Mani Why study DBMS? Need to process large amounts of data keep increasing Video, WWW, geographic information systems (GIS), genome data, digital libraries DBMS research is one of the most exciting areas in Computer Science !! Murali Mani What will we learn in this course? Database Design Operations for Relational Model: Relational Algebra SQL: Representing the application requirements formally in a conceptual model (ER, Entity Relationship Model) Translate an ER schema to relational schema Analyze the goodness of relational schema designed using normalization theory. DDL (Data Definition Language) DML (Data Manipulation Language) Briefly study indexes, transactions, logging, security Murali Mani Course Logistics Web Page: http://www.cs.wpi.edu/~cs3431/b05 Lectures M, T, R – regular lectures F – discussion on project, H/Ws Grading H/W assignments (mostly 5): 10% Projects (in 3 phases): 25% mid term (Nov 18): 30% Final (Dec 15): 30% Class participation: 5% Murali Mani H/Ws and Projects H/Ws will be due Friday before class. Project will be done in 3 phases Phase 0: Due Nov 11, 4:00 pm by email Phase 1: Due Nov 29, 4:00 pm (via turnin) Phase 2: Due Dec 13, 4:00 pm (via turnin) Late submissions Marks for late submissions will not count. However we will be happy to grade them. Murali Mani Tips for doing well Exams Master the topics Master the topics on time Project Ensure that you are on schedule Additional investigations can get up to 6 additional points. Murali Mani Office Hours Will be posted on the web. Make use of the office hours to ensure you are mastering the materials. Murali Mani Introductory Material Sets, Relations and Functions Murali Mani Sets Unordered collection of objects Characteristics Unordered No duplicates (no object appears more than once in a set) Eg: Set of passengers, set of flights Recall the main set operations Union, intersection, complement Check subset Murali Mani Relations Given multiple sets A1, A2, …, An, a relation is a set of n-tuples of the form (a11, a12, …, a1n), where a11 is an element of A1, a12 is an element of A2, and so on. Eg: suppose the set of course = {DB1, DB2}, the set of TAs = {Hong, Song}, then a relation between these two sets could be {(DB1, Hong), (DB1, Song), (DB2, Hong)} Murali Mani Functions Given two sets A, B, a function f from A to B is denoted as f: A B. This maps any value of A to one value of B. Eg: consider function from faculty members to depts {(Mike Gennert CS), (Peter Hansen Humanities)} Characteristics A is called domain B is called range No value of A can map to multiple B’s. Murali Mani Functions Injection (one to one): Surjections (onto) No 2 values in A map to the same B Eg: set of Husbands set of wives Every value in B has at least 1 value in A that maps to it Bijections One to one and onto Murali Mani