Database Management Systems: Introduction
CS 178 Database Management Systems, Fall 2004
• Instructor: Bhagi Narahari, narahari@gwu.edu
• Office hours: Tues, Wed 4-6pm; Fri 1-2pm
• TAs:
  - Lab: Ali Khoshgozaran, alikosh@gwu.edu
  - HWs: Stefan

Course Information - URL
• All course material will be placed at the URL: www.seas.gwu.edu/~narahari/cs178/
• Lab materials will be on www.seas.gwu.edu/~cs178/
• All course announcements will be placed on the web -- check once a week!

Course Info on Web
• Notes in PDF or HTML
  - Will post material for each topic on the day that we cover the material in class
• Homeworks in PDF or HTML
• Project description and requirements
• Oracle "Getting started" and demo files
• Exam and project schedule will be placed
• All announcements - schedule, changes, etc.

Course Requirements: Grading
• Homeworks 15%
• Programming assignments 15%
• Late submissions incur a penalty of 10% for each day late, up to a maximum penalty of 60%
• Two exams (in class): 45-50%
• Final project (demos required): 20-25%

Course Requirements & Rules
• Homeworks
  - written (theory): relational algebra, file structures and indexing, query optimization
  - programming assignments using Oracle SQL, PL/SQL, Oracle Forms, JDBC/Java
• Project
  - 2-person team
  - Final project due by last week of class
  - Grading criteria explained on project page
  - Will use Oracle or MySQL -- TBD

Academic Integrity Policy
• www.cs.gwu.edu/academics/integrity.html
  - details and FAQ
• No collaboration (of any sort) on homeworks
• No collaboration among teams
  - within a team, each member must have a clear role, i.e., clearly partitioned tasks for each team member
• Violation of integrity policy -- default is maximum punishment (at least F for course)

Project
• A set of "applications" will be posted on the web site after teams are finalized (after next class).
• A clear set of "minimum" requirements will be specified. Note that meeting the minimum requirements does not imply an A grade on the project.
• A portion of the project grade will be "competitive".
• Clear deadlines for specific steps in the project will be posted.

Lab Sections and TA
• Lab sections conducted by TA (Ali)
  - Does Tuesday 2pm work for all??
• Lab sections will cover
  - Intro to Oracle: SQL, PL/SQL, Forms, JDBC
  - Short tutorials, including application development
  - Clarifications on programming assignments
  - Help with analysis of the project (but not with the design of the project)
• There will be another TA for homework and lecture questions -- details coming soon.

Prerequisites
• CS 141/151 Programming and Data Structures
  - Languages: Java is required (can do project using Oracle Forms and Reports)
• CS 52/156 Operating system basics
• CS 133 Discrete Math/Logic
• CS 52/136 Computer Architecture/Organization

Accounts, Team Partners etc.
• For Oracle account: email your name and your hobbes username to the lab TA (Ali)
• Submit the requested background info to the TA during the first lab
  - You will NOT be allowed to work on the project unless you have submitted the background info.

Outline
• Introduction to DBMS
• Introduction to Relational DBMS
• Logical level design of Relational Databases
  - Formal query languages: relational algebra
  - Query languages: SQL
  - Relational schema design and normal forms, tuning
• Physical Database Design
  - Storage, indexing, file structures
  - Query processing and optimization, i.e., how things work
• Concurrency and Recovery; intro to transaction processing
• Overview of Performance Modelling
• Advanced topics, time permitting: Security and Privacy, GIS, Data Mining, OLAP

What Is a DBMS?
• A database is a very large, integrated collection of data.
• Models real-world enterprise.
  - Entities (e.g., students, courses)
  - Relationships (e.g., Jimmy Page is taking CS178)
• A Database Management System (DBMS) is the software to store/retrieve and manage databases.

Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.

Why Study Databases??
• Nothing to do on Mon, Wed 2-3pm!
• Shift from computation to information processing
  - at the "low end": scramble to webspace (a mess!)
  - at the "high end": scientific applications
• Datasets increasing in diversity and volume.
  - Digital libraries, interactive video, Human Genome project, GIS...
  - need for DBMS exploding
• DBMS encompasses most of CS
  - OS, languages, theory, "A"I, multimedia, logic

Why Study Databases??
• Information gathering is the first step to analysis
  - tons of data can be collected easily using current technology
• To effectively analyze data, must
  - collect relevant data
  - store it in a manner amenable to efficient access
  - provide infrastructure for ease of programming
• Data analysis methods are the current emphasis in the market

Data Analysis
• Data Warehousing
  - nothing but a big database (remember: you can charge your client more if you say warehouse instead of DB!!!)
• OLAP (on-line analytical processing)
  - multidimensional view of the data
  - (car types, month, number of sales per month for each car type) can be viewed as 3-dimensional data
  - can hypothesize better with a different data view

Data Analysis: Data Mining
• Data mining: finding 'hidden patterns' in data, i.e., patterns and relationships that are not 'obvious'
  - purchasing patterns of supermarket customers
  - Pattern: 40% of customers who buy beer also buy diapers.
• How do you use the above pattern/knowledge to improve your marketing strategy??
  - Leave it to the Business Majors to worry about!!
• Data mining is the "engine" behind personalization software

Okay, back to CS178
• Why the discussion on data mining etc.?
  - Analysis is important to make informed decisions
  - efficient analysis requires efficient storage & design
  - efficient storage & design requires study of DBMS!
• Data mining and other data analysis tools are current trends
  - they require a solid background in database design and analysis!!
• and remember: DBMS is the basic backbone in Transaction Processing systems!

Course Outline: Schedule of Topics
• "logical level" - Part 1
  - how is data represented at the logical level
  - how to design a good logical database for an application
  - what is a good design?
• "physical level" - Part 2
  - how to store data on disks and memory
  - how to efficiently implement logical level operators at "machine level"
  - note similarity to programming language implementation
• Overview of Performance Modelling

Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  - Main concept: relation, basically a table with rows and columns.
  - Every relation has a schema, which describes the columns, or fields.
• Other data models:
  - Network
  - Hierarchical
  - OO

Schemas and Instances
• Schema: overall design of the database
  - schema is defined at DB design time
• Instance: collection of info stored at a particular time
  - data changes but schema does not
• var cust1: customer
  - cust1 is a variable of type customer; the structure of customer is the schema, the value of cust1 is the instance

Levels of Abstraction
• Many views, single conceptual (logical) schema and physical schema.
  - Views describe how users see the data.
  - Conceptual schema defines logical structure.
  - Physical schema describes the files and indexes used.
[Figure: View 1, View 2, View 3 above the Conceptual Schema, which sits above the Physical Schema]
* Schemas are defined using the Data Definition Language (DDL);
* data is modified/queried using the Data Manipulation Language (DML).

Levels of Abstraction..
• Another approach is the three schema architecture
• Physical or Internal Level: how data is stored
• Conceptual Level: describes what data is stored
  - also called Record Level
• View Level: describes only part of the data
  - ATM machine

Example: University Database
• Conceptual schema:
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Courses(cid: string, cname: string, credits: integer)
  - Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  - Relations stored as unordered files.
  - Index on first column of Students.
• External Schema (View):
  - Course_info(cid: string, enrollment: integer)

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.
[Figure: layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Note on figure: these layers must consider concurrency control and recovery.]

Key concepts
• Data independence
  - Program/data independence => Application/data independence
• Concurrency control
• Recovery from failure
• Supports transaction processing

Data Independence
• Applications insulated from how data is structured and stored.
• Ability to modify a schema definition without affecting the next level
• Logical data independence: protection from changes in the logical structure of data.
• Physical data independence: protection from changes in the physical structure of data.
• Why is this an advantage??
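One way to make the University Database example and the idea of physical data independence concrete is the minimal sketch below. It is not part of the course material: it uses Python's built-in sqlite3 module rather than Oracle, the SQLite column types are an assumed mapping of the slide's string/integer/real, and the sample rows and the index name students_sid_idx are made up for illustration. The queries are run before and after an index is added (a change to the physical schema only) and return the same answers.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Conceptual schema (DDL), following the slide; the SQLite type names are an assumption.
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- External schema: users of this view never touch the base tables directly.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Made-up sample rows (DML), for illustration only.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("s1", "Jones", "jones@cs", 18, 3.4),
    ("s2", "Smith", "smith@ee", 19, 3.8),
])
conn.execute("INSERT INTO Courses VALUES (?, ?, ?)",
             ("CS178", "Database Management Systems", 3))
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("s1", "CS178", "A"),
    ("s2", "CS178", "B"),
])


def course_info():
    """Query the external schema (the view)."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()


def student_name(sid):
    """Query the conceptual schema by the first column of Students."""
    return conn.execute(
        "SELECT name FROM Students WHERE sid = ?", (sid,)).fetchone()[0]


before = (course_info(), student_name("s2"))

# Physical schema change only: add an index on the first column of Students.
conn.execute("CREATE INDEX students_sid_idx ON Students(sid)")

after = (course_info(), student_name("s2"))

# The application-level queries and their answers are unchanged.
print(before == after)
```

The application code (the two query functions) did not have to change when the index was added, which is one answer to why data independence is an advantage.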
Data Definition and Manipulation Languages
• Data definition language (DDL): used to specify the database schema
• Data manipulation language (DML): allows users to access or manipulate data as organized by the data model
  - procedural DMLs: require the user to specify what data is needed and how to get it
  - non-procedural DMLs: require the user to specify what data is needed without specifying how to get it

Query Languages
• Formal query languages: Relational algebra, Relational Calculus, Domain calculus
• Commercial query languages: SQL, QUEL
• SQL: "descendant" of SEQUEL; mostly relational algebra and some aspects of relational calculus
  - has procedural and non-procedural aspects

Concurrency Control
• Concurrent execution of user programs is essential for good DBMS performance.
  - Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
• DBMS ensures such problems don't arise: users can pretend they are using a single-user system.

Transaction: An Execution of a DB Program
• Key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
  - Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!

Scheduling Concurrent Transactions
• DBMS ensures that execution of a set of concurrent transactions {T1, ... , Tn} is equivalent to some serial execution T1' ... Tn'.
• Maintain consistency -- transactions should not interfere with each other
• Avoid conflicts for resources -- design a protocol that ensures "mutual exclusion"
  - examples ???

Recovery System
• Ability to recover from system failures
  - Loss of main memory data (from power failures)
  - Disk failure
• Recovery system must ensure that the database is in a consistent state after failure
  - recover from system crashes -- log file methods
• DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a Xact.

What Next
• Start with Data Models
  - Relational model with a little ER model intro
• Formal query languages: relational algebra
• SQL
• Database schema design: how to design a "good" schema, how to measure "good"?
  - Normal Forms (3NF, BCNF)
• Demonstrate concepts learnt on a commercial DBMS: Oracle

A little history
• File Processing: earliest form of information storage systems
  - each user keeps the files needed for a specific application
  - one user keeps track of student fees and payments
  - a second user keeps files on student grades
• In DBMS: a single instance of the data is maintained and accessed by different users

File Processing and Database
• Existence of catalog/data dictionary in DBMS; in file processing this is part of the application
• Data abstraction provided by DBMS: provides a conceptual view of the data without details on how it is stored
• Program-data independence: DBMS programs are written independent of specific data
• Support of multiple user views

Progression of Database Systems
• 1950s: file processing, IBM's RAMAC system
• 1960s: first generalized DBMS, IBM Sabre
• 1970s: Relational model proposed, INGRES, System R, query languages SEQUEL (SQL), QUEL
• 1980s: DBMS for PCs, commercial RDBMS: Oracle, Sybase, Informix
• 1990s: Object-relational DBMS, Multimedia, Spatial/GIS, Data Mining/OLAP, distributed DB

Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are well-paid! ☺
• DBMS use in emerging applications such as the Human Genome project is exploding

Data Models
• Conceptual or object-based logical models
  - ER model, OO: concepts close to user perception
• Record-based/Representational models: provide concepts understood by users but not too far from physical
  - Relational, Network, Hierarchical
• Physical data models: describe details of how data is stored, record formats...

Record-based Logical Models
• Relational model: represents data and their relationships by a collection of tables. (Recall: a relation is a collection of tuples)
• Network model: collection of records, with relationships between them represented by links or pointers
• Hierarchical model: similar to network, but organized as a collection of trees
• Note: the relational model does not use pointers

Entity Relationship Model
• Based on a collection of real-world objects or concepts called entities; ex: employee, student
• Attribute represents properties of an entity; ex: s.s.num
• Relationship represents interaction between entities
• Constraints to which the database must conform
• Overall logical structure represented by an ER diagram showing entity sets, relationships, attributes
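The link between the ER model and the relational model can be sketched roughly as follows. This is an illustration, not the course's own mapping: it uses Python's built-in sqlite3 module instead of Oracle, and the table names Student, Course and Takes, the column names, and the sample values are all hypothetical. Entity sets become tables, attributes become columns, and a relationship becomes a table that refers to entities by key values rather than by pointers. A foreign-key declaration stands in for a constraint to which the database must conform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Entity sets: one table each, attributes as columns.
    CREATE TABLE Student (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Course  (cid TEXT PRIMARY KEY);

    -- Relationship "takes": links entities by key values, not by pointers.
    CREATE TABLE Takes (
        ssn TEXT NOT NULL REFERENCES Student(ssn),
        cid TEXT NOT NULL REFERENCES Course(cid),
        PRIMARY KEY (ssn, cid)
    );
""")

# Hypothetical sample data echoing "Jimmy Page is taking CS178".
conn.execute("INSERT INTO Student VALUES (?, ?)", ("000-00-0001", "Jimmy Page"))
conn.execute("INSERT INTO Course VALUES (?)", ("CS178",))
conn.execute("INSERT INTO Takes VALUES (?, ?)", ("000-00-0001", "CS178"))

# Following the relationship is a join on matching values.
rows = conn.execute("""
    SELECT s.name, t.cid
    FROM Student AS s JOIN Takes AS t ON s.ssn = t.ssn
""").fetchall()
print(rows)

# The constraint rejects a relationship that refers to a non-existent student.
try:
    conn.execute("INSERT INTO Takes VALUES (?, ?)", ("999-99-9999", "CS178"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

A network or hierarchical system would instead store a link from the student record to the course record; here the shared key value in Takes plays that role.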