Lecture 1 DBMS & More Atif Farid Mohammad Adjunct Professor UNCC Introductions How many of you know? PhP or ColdFusion or JavaScript How many of you are? Grads or Undergrads What are your majors? CS or SIS or Something Else DBMS & More… Textbook: • Fundamentals of Database Systems, Elmarsi/Navathe, Benjamin, 2011 (Sixth Edition) Workload: • (30 %) 3 In Class Quiz • (30%) Midterm exam • (40% ) One project, parts 1 & 2 Any Questions ? Objectives • • • • • • • Introduction Example of database systems Database characteristics People associated with database Advantage of Using a DBMS Database implications When Not to use a DBMS 8 • Databases: Introduction – Used to maintain information and to present data to users • Examples of Databases include: – Reservations systems (Hotel, Car, Airline) – Transactions processing systems (Online Banking systems, local library) – Investigations systems(Scientific Database systems) – Multimedia database systems – Geographic information systems (GIS) 9 Core Database Terminologies: 1 • Data? – Any information (most likely in electronic form) worth preserving • E.g., names, addresses, grades, etc. • Database? – A collection of related data describing the activities of one or more organizations ◦ Organized (or structured) for access and modification ◦ Preserved over a long period – E.g. University Database 10 Core Terminologies: 3 • Query? an operation that extracts specified data from the database E.g. get the list of all courses and grades taken by “Smith” • Relation? an organization of data into a two-dimensional table, rows (tuples) represent basic entities or facts of some sort columns (attributes or fields) represent properties of those entities. • Schema? a description of the structure of the data in a database also known as metadata 11 Database Def-1 A database is a shared collection of logically related data that is stored to meet the requirements of different users of an organization Database Def-2 A database is a self-describing collection of integrated records Database Def-3 A database models a particular real world system in the computer in the form of data The concept of a shared organizational database Management Planning Marketing Control Sales Product Development Corporate Database Accounting Accounts Receivable Accounts Payable Manufacturing Scheduling Production A bit of History Computer initially used for computational/ engineering purposes Commercial applications introduced File Processing System File Processing System A collection of programs that perform services for the end-users such as production of reports File Processing Systems Examination Registration Library Applications Examination Applications Registration Applications Library Data Files Examination Data Files Registration Data Files Library Program and Data Interdependence File Processing Systems Library Exam Registration Reg_Number Reg_Number Reg_Number Name Name Name Father Name Address Father Name Books Issued Class Phone Fine Semester Address Grade Class Duplication of Data Incompatible Formats Vulnerable to Inconsistency Database Approach Advantages of Database Approach Library Examination Registration Library Applications Examination Applications Registration Applications Database Management System - Data Sharing - Controlled Redundancy University Students Database - Data Independence - Better Data Integrity Data & Information Company: Super Soft Dept: Sales Emp Name Age Salary Matt Damon 23 55 Bruce Willis 24 55 Katie Holmes 20 40 Robert Langdon 19 20 Schema Database Applications Database Management System (DBMS) Other Advantages Data consistency Better data security Faster development of new applications They also provide Economy of scale Better concurrency control Better backup and recovery procedures Disadvantages Higher costs Conversion cost More difficult recovery Typical Components Software interact Application Programs develop End users Application Programmers “What” to get DBMS Data maintain Database Administrators “How” to get Database design Database Designers Users Levels of Data Real World Data Entity, Attribute e.g. A student, A class name Metadata Data Occurrences Student record, Data item occurrence e.g. ‘s001’, ’Amir’, ‘CS101’ Record type, Data item type e.g. Student record type Employee name, age, qual, sal Emp Name text Age number Sal number John Durso 23 55 Lisa Smith 24 55 Cindy Bates 20 40 Braden Sams 19 20 Levels of Data Real-world data Metadata Data Occurrence Database Users Application Programmers End Users – Naïve – Sophisticated Roles in the Database Environment • Data Administrator (DA) • Database Administrator (DBA) • Database Designers (Logical and Physical) • Application Programmers • End Users (naive and sophisticated) Functions of DBA Schema definition Granting data access Routine Maintenance Backups Monitoring disk space Monitoring jobs running Database Properties • A database must – Represent some aspects of real world – Collection of data must be logically coherent and meaningful – Database is designed, build, and populated with data for specific purpose 34 Database Construction • Database can be constructed – Manually • E.g., a library card catalog – Computerized system • A Specific set of applications • Database Mgt systems 35 Database Mgt system (DBMS) • Database management system (DBMS)? – A collection of programs that enables users • • • • To define (specifying data type, etc) To construct (storing) To manipulate (reading, writing) To share (simultaneous access) • To protect (security & privacy protection) • To modify (changing requirements) • Database system= DBMS + Database 36 37 Example: University Database System • A typical university database system maintaining information regarding – Students – Courses – Grades 38 Example: A University Database 39 Database Engineering • Design of any DB application starts – Requirements definition and analysis – Conceptual Design • Performed using Entity relationship Diagram – Logical Design • Performed using Orcale / Mysquel, etc – Physical Design (storing/accessing/indexing) 40 Main Characteristics of DB • Self-Describing nature of a database system ( catalog =Data + metadata) • Insulation between programs and data ( program-data-independence) • Support of multiple views of the data (virtual data) • Sharing of data and multi-user transaction processing (concurrency control) – Transaction and atomicity • ACID 41 People involved in Database • Involved two types of people – people on the Scene – people behind the Scene 42 People on the Scene – Database administrators (DBA) • Authorization, coordination, supervision of DB – Database designers • Defining, building, maintaining, etc – End Users • Casual, naive, sophisticated – Software Engineer • system analysts and application programmers 43 People behind the Scene • DBMS system designers and implementers • Tool developers • Operators and maintenance personnel 44 File System vs. Database Mgt Systems • Suppose an organization needs to manage a large collection of data, say, 500 GB (i.e., 500X1024 MB) • Data is supposed to be accessed concurrently by employees • Changes made to the data must be applied consistently • Access to the data must be restricted 45 File systems vs. A DBMS (cont’) • Using file systems, the data is stored in OS files – Not enough MM – Difficulty to directly access data (with 32-bit , we can access 4GB) – Need to write special program to answer each question – Duplicated efforts – inconsistency – Concurrent accesses – Security policies –… 46 Primary Advantages of Using a DBMS: 1 • Controlling Redundancy – Duplicate space and efforts – inconsistency • Restricting Unauthorized Access – Security and authorization subsystem • Providing persistent storage for program objects and data structure – Impedance mismatch problem (incompatibility between PL and DBMS) • Deriving new information from existing ones (view) 47 Primary Advantages of Using a DBMS: 2 • • • • • • Providing Multiple User Interfaces Representing Complex Relationships among Data Enforcing Integrity Constraints Providing Backup and crash Recovery Scheduling concurrent accesses to the data Reducing application development time 48 Implication of the Database Approach • Standards can be enforced (data and display format) • Application development time can be reduced • Flexibility and maintainability • Availability of up-to date information • Economics of Scale (i.e., reducing overall costs of operations, sources and mgt) 49 When Not to Use A DBMS • When there is unnecessary overhead resulted from: – High initial up-front cost – Generality that a DBMS provides for defining and processing data – Overhead for providing • • • • security concurrency recovery integrity 50 When to Use File System • A File system is desirable under these conditions: – The database and application are simple, welldefined, and unchangeable – Single-user access to the data is required – performance 51 So, what is a Database? • Collection of data central to some enterprise • Essential to operation of enterprise – Contains the only record of enterprise activity • An asset in its own right – Historical data can guide enterprise strategy – Of interest to other enterprises • State of database mirrors state of enterprise – Database is persistent 52 What is a Database Management System? • A Database Management System (DBMS) is a program that manages a database: – Supports a high-level access language (e.g. SQL). – Application describes database accesses using that language. – DBMS interprets statements of language to perform requested database access. 53 What is a Transaction? • When an event in the real world changes the state of the enterprise, a transaction is executed to cause the corresponding change in the database state – With an on-line database, the event causes the transaction to be executed in real time • A transaction is an application program with special properties - discussed later - to guarantee it maintains database correctness 54 What is a Transaction Processing System? • Transaction execution is controlled by a TP monitor – Creates the abstraction of a transaction, analogous to the way an operating system creates the abstraction of a process – TP monitor and DBMS together guarantee the special properties of transactions • A Transaction Processing System consists of TP monitor, databases, and transactions 55 transactions Transaction Processing System DBMS database DBMS database Transaction Processing Monitor 56 System Requirements • High Availability: on-line => must be operational while enterprise is functioning • High Reliability: correctly tracks state, does not lose data, controlled concurrency • High Throughput: many users => many transactions/sec • Low Response Time: on-line => users are waiting 57 System Requirements (con’t) • Long Lifetime: complex systems are not easily replaced – Must be designed so they can be easily extended as the needs of the enterprise change • Security: sensitive information must be carefully protected since system is accessible to many users – Authentication, authorization, encryption 58 Roles in Design, Implementation, and Maintenance of a TPS • System Analyst - specifies system using input from customer; provides complete description of functionality from customer’s and user’s point of view • Data Scientists: use all available data sources (internal and external) to analyze and gain insights to help decision makers. • Database Designer - specifies structure of data that will be stored in database. 59 Roles in Design, Implementation and Maintenance of a TPS (con’t) • Application Programmer - implements application programs (transactions) that access data and support enterprise rules • Database Administrator - maintains database once system is operational: space allocation, performance optimization, database security • System Administrator - maintains transaction processing system: monitors interconnection of HW and SW modules, deals with failures and congestion 60 OLTP vs. OLAP • On-line Transaction Processing (OLTP) – Day-to-day handling of transactions that result from enterprise operation – Maintains correspondence between database state and enterprise state • On-line Analytic Processing (OLAP) – Analysis of information in a database for the purpose of making management decisions 61 OLAP • Analyzes historical data (terabytes) using complex queries • Due to volume of data and complexity of queries, OLAP often uses a data warehouse • Data Warehouse - (offline) repository of historical data generated from OLTP or other sources • Data Mining - use of warehouse data to discover relationships that might influence enterprise strategy 62 Examples - Supermarket • OLTP – Event is 3 cans of soup and 1 box of crackers bought; update database to reflect that event • OLAP – Last winter in all stores in northeast, how many customers bought soup and crackers together? • Data Mining – Are there any interesting combinations of foods that customers frequently bought together? 63 Big Data Era • Every 2 days we create as much information as we did from the dawn of civilization up to 2003. (2010, Eric Shmidt, Google) • Science: Astronomy, Physics, Bioinformatics, Neuroinfomatics, earth science, etc. • Business: Automobile, Healthcare, Financial, infotaiment • The automotive industry is projected to be the 2nd largest generator of data by 2015 64 Big Data Opportunities to Auto • Google self-driving car (generates 1GB/s). • Recommendation/alert to customers. • Manufactures can better understand customers and market trends. • Better driving behaviors, better cars, less accidents, and the bottom line is happier customers. 65 Big Data Opportunities to Auto • Google self-driving car (generates 1GB/s). • Recommendation/alert to customers. • Manufactures can better understand customers and market trends. • Better driving behaviors, better cars, less accidents, and the bottom line is happier customers. 66 Why is Database Important in the Future? “In the future, … any given discipline advances is likely to depend on… database, workflow management, visualization, and cloud computing technologies. “ G. Bell, T. Hey, A. Szalay, “Beyond the Data Deluge,” Science, Vol. 323, no. 5919, pp. 1297-1298, 2009. Turing Awardees in DB Charles Bachman (1973) Edgar F. Codd (1981) Jim Gray (1998) 68 Charles Bachman Developer of IDS: the first database system Edgar. F. Codd Inventor of the Relational Model 70 Jim Gray Founder of Transaction Processing 71 Questions ?