CS3431 – Database Systems I
Introduction
Instructor: Mohamed Eltabakh (meltabakh@cs.wpi.edu)

Today's Lecture
• Overview of Database Management Systems
• Course Logistics

What is a Database System?
• A software platform for managing large amounts of data
• Managing means storing, querying, indexing, and structuring the data
• Different names refer to the same thing: database systems, database management systems, DBMS

What is a Database System? (Cont'd)
What's inside a DBMS:
• A collection of interrelated data (e.g., for a given application)
• A set of programs to secure and access the data
• An environment that is both convenient and efficient to use
Usually the data is too large to fit in computer memory at once, so it is stored on disk.
Usually many users want to access this data, and they want to do so fast.
Databases touch all aspects of our lives. We use them without even knowing it!

Database Applications
Have you ever used a database application?
• E-commerce: books, equipment, etc. at Amazon
• Banks: your valuable $$ and ATM transactions
• Airlines: managing flights to get you places
• Universities: managing student enrollment
• GIS (maps): finding the restaurants closest to WPI
• Bioinformatics: genome data
Data is everywhere. To manage it efficiently, we need a DBMS.

Why use a DBMS, and not files?
Using file systems has several drawbacks:
• Data redundancy and inconsistency
  - Multiple file formats, duplication of information in different files
  - Multiple record formats within the same file
  - No order enforced between fields
• Difficulty in accessing data
  - Need to write a new program to carry out each new task
• Integrity problems: constraints such as these are hard to enforce
  - Account balance >= 0
  - A student cannot take the same course twice

Why use a DBMS, and not files? (Cont'd)
• Concurrent access by multiple users: many users need to access/update the data at the same time
• Security problems: hard to give users access to some, but not all, of the data
• Recovery from crashes: the system may crash while the data is being updated
• Maintenance problems: hard to search for or update a field, and hard to add new fields

DBMS Provides Solutions
• Data consistency, even with multiple users
• Efficient access to the data
• Data integrity embedded in the DBMS
• Recovery from crashes, and security

Basic Terminology
• Data Model: tools used for describing the data
• Data Schema: describes the structures for a particular application, using the given model
• Database: a collection of actual data that conforms to a given schema
• Database Management System (DBMS): a software platform that allows us to create, store, use, and maintain a database
• SQL & Data Manipulation Language (DML): a language to manipulate (e.g., update or query) the data

Data Model
A collection of tools for describing:
• Data objects
• Data relationships
• Data semantics
• Data constraints
Several data models exist:
• Relational model (we will learn this model)
• Entity-Relationship (ER) data model (we will learn this model)
• Object-based data models (object-oriented)
• Semi-structured data model (XML)
• Other, older models: network model, hierarchical model

Example: ER Model
A graphical model for describing entities, attributes, and relationships.

Data Schema
• Captures the relationships between objects ("entities") in an application
• Schemas can be represented graphically or textually

Query Language (SQL)
A language for accessing and manipulating the data organized by the appropriate data model.
SQL: Structured Query Language
    SELECT ID, Name FROM Student WHERE address = '320FL';

Query Language
Two classes of languages:
• Procedural: the user specifies what data is required and how to get that data
• Declarative (non-procedural): the user specifies what data is required without specifying how to get it
DBMSs use SQL, a declarative language:
    SELECT ID, Name FROM Student WHERE address = '320FL';

A Big Picture of What You Will Learn

You Will Learn
• Data Model: the Relational Model and the Entity-Relationship (ER) Model
• Data Schema: how to put the pieces together to build a schema describing the application
• Database: build an actual database and manipulate the data
• Database Management System (DBMS): we will use Oracle
• Query Language: the SQL language

Relational Data Model: Overview
• The most widely used model today
• A tabular representation of the data
• Main concepts:
  - Relations (tables): basically a table with rows and columns
  - Every relation has a schema, which describes its columns, or fields (a field is also called an attribute)

Example Database: Relational Tabular View of Data in an Airline System

Flight
flightNo  start  destination  miles
101       BOS    LAX          3000
102       PVD    LAX          2900

Passenger
pName  freqFlyerID  DoB   milesEarned
Mike   3433         1980  12000
Mary   5872         1981  11000

Travel
flightNo  freqFlyerID  date
101       3433         Jan 4
102       5872         Jan 5

This tabular view of data is called the "Relational Model".

Entity-Relationship Model: Overview
• Models the application as a collection of entities and relationships
• Represented using an Entity-Relationship Diagram (ERD)

SQL: Overview
• SQL is a non-procedural language for accessing the data inside a database
• External programs, e.g., in C or Java, typically access the database using:
  - Language extensions that allow embedded SQL
  - ODBC: Open Database Connectivity
  - JDBC: Java Database Connectivity

Logical vs. Physical
How is this information stored?
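The SQL overview notes that external programs reach the database through interfaces such as ODBC or JDBC. As a rough analogue, the sketch below uses Python's standard-library sqlite3 module (a DB-API 2.0 driver), with an in-memory SQLite database standing in for Oracle. It loads the airline tables from the relational example and runs one declarative query. The column types and keys are assumptions, because the slides only show the column names and rows.

```python
import sqlite3

# In-memory SQLite database; SQLite stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")

# Logical schema: the three relations from the airline example.
# Column types and keys are assumptions; the slides give only names and rows.
conn.executescript("""
CREATE TABLE Flight (
    flightNo    INTEGER PRIMARY KEY,
    start       TEXT NOT NULL,
    destination TEXT NOT NULL,
    miles       INTEGER
);
CREATE TABLE Passenger (
    pName       TEXT NOT NULL,
    freqFlyerID INTEGER PRIMARY KEY,
    DoB         INTEGER,        -- only the birth year appears on the slide
    milesEarned INTEGER
);
CREATE TABLE Travel (
    flightNo    INTEGER REFERENCES Flight(flightNo),
    freqFlyerID INTEGER REFERENCES Passenger(freqFlyerID),
    date        TEXT            -- kept as the text shown on the slide
);
""")

# Rows copied from the example tables; '?' placeholders bind the values.
conn.executemany("INSERT INTO Flight VALUES (?, ?, ?, ?)",
                 [(101, "BOS", "LAX", 3000), (102, "PVD", "LAX", 2900)])
conn.executemany("INSERT INTO Passenger VALUES (?, ?, ?, ?)",
                 [("Mike", 3433, 1980, 12000), ("Mary", 5872, 1981, 11000)])
conn.executemany("INSERT INTO Travel VALUES (?, ?, ?)",
                 [(101, 3433, "Jan 4"), (102, 5872, "Jan 5")])

# Declarative query: state WHAT is wanted (who flew out of a given airport,
# and when), not HOW to find it; the DBMS picks the join order and access paths.
query = """
SELECT p.pName, t.date
FROM Passenger AS p
JOIN Travel AS t ON t.freqFlyerID = p.freqFlyerID
JOIN Flight AS f ON f.flightNo = t.flightNo
WHERE f.start = ?
"""
rows = conn.execute(query, ("BOS",)).fetchall()
print(rows)  # [('Mike', 'Jan 4')]
```

The query names only the relations, the join conditions, and the filter. Nothing in it says which table to read first or whether to use an index. That is the declarative style described in the query-language slides.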
Levels of Abstraction (View of Data)
An architecture for a database system:
• View level: describes how users see the data
• Logical level: describes the logical structures used
  - Relational model
  - ER model
• Physical level: describes files and indexes (usually hidden from users)
(Figure source: Database System Concepts, 5th Edition, ©Silberschatz, Korth and Sudarshan)

Levels of Abstraction: Airline Application Example
• View level (external schema):
  - NoOfPassengers(flightNo, date, numPassengers)
  - Hide employees' salaries
• Logical (conceptual) level: Flight, Passenger, and Travel tables
• Physical level:
  - Flight table stored as a sorted file on the flight number
  - Index on the flightNo attribute of the Flight relation
These levels of abstraction lead to "data independence".

Data Independence
• A DBMS has three levels of abstraction
• Data independence is the ability to modify one level without affecting the other levels
• Physical data independence: the physical schema (e.g., indexes) can change, but the logical schema need not change
  - Protection from changes in the physical structure of the data
• Logical data independence: the logical schema can change, but the views need not change
  - Protection from changes in the logical structure of the data

Other Advanced Topics
• Efficient access
• Query optimization
• Concurrency control
• Recovery control
• Big Data analytics
>> We will not have time to study these subjects during the course
>> It is important to know that they exist and what each one means

Efficient Access
Indexing: indexes give direct access to the "necessary" portion of the data, as opposed to sequential access in files. For example, an index lets the DBMS find a given customer directly, without scanning all customers.

Query Optimization
For a query such as
    SELECT ID, Name FROM Student WHERE address = '320FL';
the query optimizer:
• Generates many alternative ways to answer the query
• Estimates the cost (expected execution time) of each alternative
• Automatically determines and prepares an optimal (or near-optimal) access plan for getting the data
Optimizer = "The Bread and Butter of a DBMS!"
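The airline levels-of-abstraction example and the indexing and optimization slides can be tried out together. The sketch below again uses Python's sqlite3, with SQLite standing in for Oracle. It defines the NoOfPassengers view over Travel, runs a lookup on Flight, then adds an index on flightNo. The index name idx_flight_flightNo is hypothetical. The sketch also assumes Flight has no key, so the index is the only fast access path. SQLite's EXPLAIN QUERY PLAN shows the optimizer switching from a full scan to an index search while the query's answer stays the same, a small instance of physical data independence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: Flight deliberately has no key or index yet (an assumption
# for this sketch), so a lookup by flightNo has to scan the table.
conn.executescript("""
CREATE TABLE Flight (flightNo INTEGER, start TEXT, destination TEXT, miles INTEGER);
CREATE TABLE Travel (flightNo INTEGER, freqFlyerID INTEGER, date TEXT);
INSERT INTO Flight VALUES (101, 'BOS', 'LAX', 3000), (102, 'PVD', 'LAX', 2900);
INSERT INTO Travel VALUES (101, 3433, 'Jan 4'), (102, 5872, 'Jan 5');
""")

# View level: an external schema derived from the Travel relation.
conn.execute("""
CREATE VIEW NoOfPassengers AS
SELECT flightNo, date, COUNT(*) AS numPassengers
FROM Travel
GROUP BY flightNo, date
""")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT destination FROM Flight WHERE flightNo = 101"
before_rows, before_plan = conn.execute(query).fetchall(), plan(query)

# Physical level: add an index (hypothetical name). The logical schema,
# the view, and the query text are all left untouched.
conn.execute("CREATE INDEX idx_flight_flightNo ON Flight(flightNo)")
after_rows, after_plan = conn.execute(query).fetchall(), plan(query)

print(before_rows == after_rows)  # True: same answer either way
print(before_plan)                # a full scan of Flight
print(after_plan)                 # a search of Flight using the new index
print(conn.execute("SELECT * FROM NoOfPassengers ORDER BY flightNo").fetchall())
```

The exact plan text differs between SQLite versions (for example "SCAN Flight" versus "SCAN TABLE Flight"). For that reason the comments describe the plans rather than quote them. The query sees the view and tables only. Whether the index exists is a physical-level decision that the optimizer accounts for on its own.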
Concurrency Control
• The DBMS ensures the data stays consistent under concurrent access
  - E.g., multiple airline staff trying to reserve seats for different customers
• Concepts:
  - Transactions: grouping multiple instructions (reads/writes) into one atomic unit
  - Locks: locking of resources (e.g., tables)

Recovery Control
• If the system crashes in the middle of a transaction, recovery must be provided: we cannot afford to lose data or leave it inconsistent
• Concepts:
  - Logging of transactions' actions
  - Ability to redo or undo transactions

Big Data Analytics
Related terms: large-scale data management, big data analytics, data science and analytics
• How to manage very large amounts of data and extract value and knowledge from them

Data Explosion
• 2 billion Internet users by 2011
• 1.3 billion RFID tags in 2005; 30 billion RFID tags by 2010
• 4.6 billion mobile phones worldwide
• Capital market data volumes grew 1,750% from 2003 to 2006
• World Data Centre for Climate: 220 terabytes of Web data and 9 petabytes of additional data
• Twitter processes 7 terabytes of data every day
• Facebook processes 10 terabytes of data every day

Who Uses Databases?
• End users
• DB application programmers
• Database administrators, responsible for:
  - Database design
  - Security and authorization
  - Data availability and crash recovery
  - Database tuning (for performance)

Summary: Why Study DBMS?
• We need to process large amounts of data efficiently: video, the WWW, computer games, geographic information systems (GIS), genome data, digital libraries, etc.
• To make use of all the functionality provided by DBMSs
• DB administrators and programmers hold rewarding jobs
• DB research is one of the most exciting areas in Computer Science!
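To return to the concurrency and recovery control slides: a transaction groups several writes into one atomic unit. The undo side of recovery means that a failed transaction leaves no partial effects behind. The sketch below illustrates this with Python's sqlite3 and reuses the "account balance >= 0" integrity constraint from the files-vs.-DBMS slides as a CHECK constraint. The Account table and the transfer and balances helpers are hypothetical names for illustration. The sketch shows only the atomicity SQLite provides. It does not demonstrate locking or crash recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the integrity constraint "balance >= 0" is declared
# once in the schema and enforced by the DBMS on every write.
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Hypothetical helper: move money as one atomic unit (both updates or neither)."""
    try:
        # The connection's context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the rollback also undid
        # the credit already applied to dst by the first UPDATE.
        return False

def balances():
    """Hypothetical helper: current balances keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM Account ORDER BY id"))

print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 80}
print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```

The credit to B runs first on purpose. When the debit from A then violates the constraint, B's balance shows no trace of it. Without the transaction, B would keep money that never left A.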