Part 1: Introduction
Junping Sun, Database Systems

Questions
1. What is a database?
2. What is a database system?
3. What is a database management system?
4. What types of data can a database management system handle?
5. What are the science and techniques of database management?
6. What are the most important issues in database systems?
7. What are the differences between database system approaches and other approaches?
8. What is going on in the current database field?

Introduction
• Purposes and Objectives of Database Systems
• Abstraction and View of Data
• Data Models
• Database Languages: Data Definition Language vs. Data Manipulation Language
• Transaction and Storage Management
• Database Administrator and Users
• Database System Architectures
• Development of Database Systems and Current Trends
• Issues in Database Systems and Applications

Database
• A logically coherent collection of data with some inherent semantic meaning.
From a system point of view:
• A database is a collection of interrelated data in cluster format.
From a management point of view:
• A database is a collection of interrelated data that models an enterprise activity.
• A database contains information about one particular enterprise.

Database Systems
• Database System = Databases + Database Management Systems
• It is a collection of interrelated files and a set of programs that allow users to access and modify these files.
• It provides a common repository through which a group of interested users can access the data in the database.
• It is a computerized data-keeping system.
• Database systems are designed to manage large bodies of information.
• A database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access.
• If data are to be shared among several users, the system must avoid possible anomalous results.

Database Management Systems
• A software subsystem that manages databases. It enables concurrent users to create, operate, and maintain a database, and it offers data definition and manipulation facilities.
• It is an interface between the computer operating system and a database system.
• It is also an interface between database end users and the database system.
• The primary goal of the database management system (DBMS) is to provide an environment that is both convenient and efficient to use in retrieving and storing database information.

Database Management System as an Interface
[Figure: layered diagram, from top to bottom: applications; system software (the DBMS and other system software); operating systems; computer hardware.]

A Simplified Database System Environment
[Figure: users/programmers submit application programs and queries to the DBMS software, which consists of software to process queries/programs and software to access stored data; the DBMS reads the stored database definition (meta-data) and the stored databases.]

Physical Centralized Database Architecture
[Figure not recoverable.]

Simplified Logical Client-Server Architecture
[Two figures, not recoverable.]

Purposes of Database Systems
• Database systems were developed to handle the following difficulties of typical file-processing systems supported by conventional operating systems:
  • Data redundancy and inconsistency
  • Difficulty in accessing data
  • Data isolation: multiple files and formats
  • Integrity problems
  • Atomicity of updates
  • Concurrent access by multiple users
  • Security problems

Data Abstraction and View of Data
Objectives of Database Systems:
• A major purpose of a database system is to provide users with an abstract view of the data.
• The system hides certain details of how the data are stored and maintained.
• For the database system to be usable, it must retrieve data efficiently. This concern has led to the design of complex internal data structures for the representation of data in the database.
• Since many database-system end users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system.

The Three Levels of Data Abstraction
[Figure: the view level (view 1, view 2, ..., view n) sits above the logical level, which sits above the physical level.]

Physical Level
• The lowest level of abstraction describes how the data are actually stored.
• At the physical level, complex low-level data structures are described in detail.
• It uses the physical data model to describe the complete details of data storage structures, access paths to the database, and how the data are distributed in the address space.
Record Format and Data Blocking:
• Fixed-length vs. variable-length records
• Spanned vs. unspanned
File Organization and Access Paths:
• Sequential vs. random (hashing)
• Indices: static vs. dynamic, balanced vs. unbalanced, single-dimension vs. multi-dimension

Logical Level
• It describes what data are stored in the database, and what logical relationships exist among those data.
• The entire database is thus described in terms of a small number of relatively simple structures.
• Although the implementation of the simple structures at the logical level may involve complex physical-level structures, users of the logical level do not need to be aware of this complexity.
• The logical level of abstraction is used by database administrators, who must decide what information is to be stored in the database.
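
The split between the logical and physical levels can be illustrated with a small sketch. It assumes Python's standard `sqlite3` module and a hypothetical `account` table; the table, column, and index names are illustrative and do not come from the slides. The `CREATE TABLE` statement plays the role of the logical description, while the index is a physical access path that may change how rows are located but not what the query returns.

```python
import sqlite3


def accounts_over(conn, threshold):
    """Logical-level query: states what is wanted, not how to find it."""
    rows = conn.execute(
        "SELECT account_number FROM account WHERE balance > ? "
        "ORDER BY account_number",
        (threshold,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")

# Logical level: which data are stored (hypothetical table for illustration).
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-201", 900)],
)

before = accounts_over(conn, 600)

# Physical level: add an access path (an index). Neither the table
# definition nor the query text has to change.
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

after = accounts_over(conn, 600)
print(before, after)
```

The query function is written once and reused unchanged after the index is created, which is the point of hiding physical structures behind the logical level.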

View Level
• The highest level of abstraction describes only part of the entire database.
• Despite the use of simpler structures at the logical level, some complexity remains because of the large size of the database.
• Many users of the database system are not concerned with all of the information; such users need to access only a part of the database.
• The view level of abstraction is defined so that the interaction between each group of users and the database system is simplified.
• A database system may provide many views for the same database.

Three Levels of Database Architecture (ANSI/SPARC)
[Figure: end users work with external views 1 to N at the view level; a view/logical mapping (m:n) connects the external views to the logical schema at the logical level; a logical/physical mapping (1:1) connects the logical schema to the physical schema at the physical level, which describes the stored databases.]

An Example of the Three Levels
External (PL/1):
  DCL 1 EMPP,
        2 EMP# CHAR(6),
        2 SAL  FIXED BIN(31);
External (COBOL):
  01 EMPC.
     02 EMPNO  PIC X(6).
     02 DEPTNO PIC X(4).
Logical:
  EMPLOYEE
    EMPLOYEE_NUMBER   CHARACTER (6)
    DEPARTMENT_NUMBER CHARACTER (4)
    SALARY            NUMERIC (5)
Physical:
  STORED_EMP LENGTH = 18
    PREFIX TYPE = BYTE(6), OFFSET = 0
    EMP#   TYPE = BYTE(6), OFFSET = 6, INDEX = EMPX
    DEPT#  TYPE = BYTE(4), OFFSET = 12
    PAY    TYPE = FULLWORD, OFFSET = 16

Database Instances
• The collection of information stored in the database at a particular moment is called a database instance, or database state.
• It is the actual content of the database at a particular point in time.
• Databases change over time as information is inserted, deleted, and updated.

Database Schema
• The overall design of a database is called the database schema.
• It is the logical structure of the database.
• The schema is changed infrequently.
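
The distinction between schema and instance, and the EMPLOYEE example of the three levels, can be approximated in SQL. This is only a sketch: it assumes Python's standard `sqlite3` module, and the view names `emp_pl1` and `emp_cobol`, the lowercase column names, and the sample rows are hypothetical. The table and view definitions stand for the schema, which stays fixed, while the rows stand for the instance, which changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical schema: one table describing EMPLOYEE (names are illustrative).
conn.execute(
    "CREATE TABLE employee ("
    " employee_number TEXT, department_number TEXT, salary INTEGER)"
)

# External views: each user group sees only part of the logical schema.
conn.execute(
    "CREATE VIEW emp_pl1 AS "
    "SELECT employee_number AS emp_no, salary AS sal FROM employee"
)
conn.execute(
    "CREATE VIEW emp_cobol AS "
    "SELECT employee_number AS empno, department_number AS deptno "
    "FROM employee"
)

# Instance at time t1: one row (hypothetical data).
conn.execute("INSERT INTO employee VALUES ('E00001', 'D010', 52000)")
pl1_rows_t1 = conn.execute("SELECT emp_no, sal FROM emp_pl1").fetchall()

# Instance at time t2: the content changes, the schema does not.
conn.execute("INSERT INTO employee VALUES ('E00002', 'D020', 48000)")
cobol_rows_t2 = conn.execute(
    "SELECT empno, deptno FROM emp_cobol ORDER BY empno"
).fetchall()

print(pl1_rows_t1)
print(cobol_rows_t2)
```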
A database schema corresponds to a type definition in a programming language. A variable of a given type has a particular value at a given instant; the value of a variable in a programming language corresponds to an instance of a database schema.
Database Schemas at Three Levels:
• Each database has a physical schema, a logical schema, and subschemas.
• Database systems support one physical schema, one logical schema, and several subschemas.

Database Hierarchy Structure
[Figure: at the user level, user groups 1 to n each see their own subschema (user view, external schema); a subschema-to-conceptual-schema mapping leads to the conceptual level, where the DBMS holds the database schema (conceptual/logical schema, the DBA view); a conceptual-to-internal mapping leads to the physical level, where the operating system holds the internal/physical storage schema (the system programmer view).]

Data Independence
• The ability to modify a schema definition at one level without affecting a schema definition at the next higher level is called data independence.
Physical Data Independence:
• The ability to modify the physical schema without requiring changes to either the logical schema or the application programs.
• Modifications at the physical level are occasionally necessary to improve database performance.
Logical Data Independence:
• The ability to modify the logical schema without causing application programs to be rewritten.
• Modifications at the logical level are necessary whenever the logical structure of the database is altered.

Data Models
• Data Model = Schema + Operations + Constraints
• A collection of tools for describing data, data relationships, data semantics, and data constraints.
• Underlying the structure of a database is the data model.
• A set of concepts that can be used to describe the structure of a database.
• A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.
The various data models that have been proposed fall into three groups:
• object-based logical models
• record-based logical models
• physical models

Object-Based Logical Models
• Object-based logical models are used in describing data at the logical and view levels.
• They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be specified explicitly.
• Several of the more widely known ones are:
  • The entity-relationship model [Chen 1976]
  • The object-oriented model [Kim 1990b]
  • The semantic data model [Hull and King 1987]
  • The functional data model [Sibley and Kerschberg 1977] [Shipman 1981]

A Sample E-R Diagram
[Figure: entity customer (attributes social-security, customer-name, customer-street, customer-city) is connected through the relationship depositor to entity account (attributes account-number, balance).]

Record-Based Logical Models
• Record-based logical models are used in describing data at the logical and view levels.
• In contrast to object-based data models, they are used both to specify the overall logical structure of the database and to provide a higher-level description of the implementation.
• Record-based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length.
• The simplicity of record-based logical models is in contrast to many of the object-based models, whose richer structure often leads to variable-length records at the physical level.

Relational Data Model:
• The relational, network, and hierarchical models are the three most widely accepted record-based logical data models.
• The relational model differs from the network and hierarchical models in that it does not use pointers or physical links.
• The relational model relates records by the values they contain in each table. This freedom from the use of pointers allows a formal mathematical foundation to be defined.
Physical Data Model:
• Physical data models are used to describe data at the lowest level. They capture aspects of database system implementation.

A Sample Relational Database
customer-name  social-security  customer-street  customer-city  account-number
Johnson        192-83-7465      Alma             Palo Alto      A-101
Smith          019-28-3746      North            Rye            A-215
Hayes          677-89-9011      Main             Harrison       A-102
Turner         182-73-6091      Putnam           Stamford       A-201
Johnson        192-83-7465      Alma             Palo Alto      A-201
Jones          321-12-3123      Main             Harrison       A-217
Lindsay        336-66-9999      Park             Pittsfield     A-222
Smith          019-28-3746      North            Rye            A-201

A Sample Network Database
[Figure: customer records (Johnson, Smith, Hayes, Turner, Jones, Lindsay, each with social-security number, street, and city) are linked by pointers to account records A-101 (500), A-215 (700), A-102 (400), A-305 (350), A-201 (900), A-217 (750), and A-222 (700). The individual links are not recoverable.]

A Sample Hierarchical Database
[Figure: a tree in which each customer record (Johnson, Smith, Hayes, Turner, Jones, Lindsay) is the parent of that customer's account records. The account records shown are A-101 (500), A-201 (900), A-102 (400), A-217 (750), A-215 (700), A-201 (900), A-305 (350), and A-222 (700); A-201 appears twice.]

Database Languages
• A database language is a description and implementation of a data model.
• It is an interface between database users and a database system.
• Database Language = Data Definition Language (DDL) + Data Manipulation Language (DML)
• The DDL defines the database schema.
• The DML describes database operations such as queries, insertion, deletion, and update.

Data Definition Language
• A database schema is specified by a set of definitions expressed in a special language called a data definition language.
• The result of compilation of DDL statements is a set of tables that is stored in a special file called the data dictionary or data directory.
Data Dictionary:
• It is a file that contains meta-data, that is, data about data (a description of the data).
• The data dictionary is consulted before actual data are read or modified in the database.
Data Storage and Definition Language:
• The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language.
• The result of compilation of these definitions is a set of instructions that specify the implementation details of the database schemas.

Data Manipulation Language
Data Manipulation:
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification (update) of information stored in the database
Data Manipulation Language:
• It is a language that enables users to access or manipulate data as organized by the appropriate data model.
• A query is a statement requesting the retrieval of information.
• The portion of a DML that involves information retrieval is called a query language. It is common practice to use the terms query language and data manipulation language synonymously.

DML: Procedural vs. Non-Procedural Languages
There are basically two types of DMLs.
Procedural Data Manipulation Languages:
• They require a user to specify both what data are needed and how to get those data.
• Most procedural DMLs are sets of procedure call statements and must be embedded in a general-purpose programming language.
• Procedural DMLs need to make use of programming language constructs such as looping and if-then.
Non-Procedural Data Manipulation Languages:
• They require a user to specify what data are needed without specifying how to get those data.
• Non-procedural DML statements can be either entered interactively or embedded in a general-purpose programming language.

Transaction Management
Transaction:
• A transaction is a collection of operations that performs a single logical function in a database application.
• It is the execution of a program that includes database access operations.
• Each transaction is a unit of both atomicity and consistency.
Transaction Management:
• The transaction management component of a DBMS ensures that the database remains in a consistent (correct) state despite system failures (e.g. power failures and operating system crashes) and transaction failures.

Properties of Transactions (ACID)
• To ensure integrity of the data, the database system is required to maintain the following properties of transactions.
Atomicity:
• A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
Consistency:
• A correct execution of the transaction must take the database from one consistent state to another.
Isolation:
• A transaction should not make its updates visible to other transactions until it is committed. Although multiple transactions may execute concurrently, the system guarantees that each transaction is unaware of other transactions executing concurrently.
Durability:
• Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failures.

Storage Management
Storage Manager:
• A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
  • interaction with the file manager
  • efficient storing, retrieving, and updating of data

Event Sequences of Record Retrieval in an Application
[Figure: the application program (with its status area and user work area), the DBMS (with the subschema, the schema, and the system buffer), and the operating system (with the physical storage schema and the database) are connected by arrows numbered 1 to 11, matching the steps listed here.]
1. The application program sends a request to read a record. The program provides the search key, the data type, and other details.
2. The DBMS checks the subschema used by program A and the data type.
3. The DBMS looks up the global database schema (conceptual/logical schema) and determines the required items.
4. The DBMS looks up the physical storage schema description and decides which record is to be retrieved.
5. The DBMS sends a request to the operating system, indicating the requested record.
6. The operating system retrieves the requested record from the storage device.
7. The requested record is sent from the storage device to the system buffer.
8. The DBMS compares the subschema and the schema, determines the required items, and transforms the data record from schema format to subschema format.
9. The DBMS sends the data in the system buffer to the program's user work area.
10. The DBMS provides the application program with a DB_status value, including any errors.
11. The application program can then use and process the data.

Event Sequences from Different Views
[Figure: application programs AP1 and AP2, each with a status area (S) and a user work area (UWA), form the application programmer view; the AP1 and AP2 subschemas, the schema, and the system buffer inside the DBMS form the DBA view; the database and the physical storage schema under the operating system form the system programmer view.]
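
The eleven retrieval steps can be traced in a toy model. The sketch below is not a real DBMS interface: the schema tuple, the subschema mapping, the in-memory `STORAGE` dictionary, the function names, and the status strings are all hypothetical stand-ins chosen for illustration. It shows only how a stored record passes through the schema and is reshaped into the subschema format before it reaches the program's work area.

```python
# Hypothetical logical schema: item names in stored order.
LOGICAL_SCHEMA = ("employee_number", "department_number", "salary")

# Hypothetical subschema of one program: external name -> logical name.
SUBSCHEMA = {"EMPNO": "employee_number", "DEPTNO": "department_number"}

# Stand-in for records on a storage device, keyed by search key.
STORAGE = {"E00001": ("E00001", "D010", 52000)}


def os_read(key):
    """Steps 5-7: the 'operating system' fetches the stored record."""
    return STORAGE.get(key)


def dbms_read(key, subschema):
    """Steps 2-10: return (user_work_area, db_status) for a read request."""
    # Steps 2-3: check the subschema items against the logical schema.
    for logical_item in subschema.values():
        if logical_item not in LOGICAL_SCHEMA:
            return None, "INVALID_SUBSCHEMA"
    # Steps 4-7: locate the record and bring it into the system buffer.
    system_buffer = os_read(key)
    if system_buffer is None:
        return None, "NOT_FOUND"
    # Step 8: interpret the buffer using the schema, then project it
    # into the subschema format.
    logical_record = dict(zip(LOGICAL_SCHEMA, system_buffer))
    work_area = {
        external: logical_record[logical]
        for external, logical in subschema.items()
    }
    # Steps 9-10: hand over the work area together with a status value.
    return work_area, "OK"


# Steps 1 and 11: the application issues the request and uses the result.
record, status = dbms_read("E00001", SUBSCHEMA)
missing, missing_status = dbms_read("E99999", SUBSCHEMA)
print(status, record)
print(missing_status, missing)
```

The salary item never reaches this program because its subschema does not name it, which mirrors the role of the subschema in step 8.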

DBMS Capabilities
• Persistence: the ability of data to persist across different program invocations.
• Secondary storage management
• Transactions: the DBMS should support a sequence of primitive atomic reads and writes against the database.
• Concurrency control: the DBMS should support transactions of multiple users concurrently accessing the database.
• Recovery: it is necessary to restore the database to a consistent state after crashes.
• Ad hoc query facility: the facility should be reasonably declarative.
• Security
• Integrity: the DBMS maps one consistent state onto another.
• Performance

Database Administrator
• Coordinates all the activities of the database system.
• The database administrator has a good understanding of the information resources and needs of an enterprise.
The Duties of Database Administrators:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting users authority to access the database
• Specifying integrity constraints
• Acting as liaison with users
• Monitoring performance and responding to changes in requirements

Database Users
• Users are differentiated by the way they expect to interact with the system.
Application Programmers:
• Interact with the system through the DML
Sophisticated Users:
• Form requests in a database query language
Specialized Users:
• Write specialized database applications that do not fit into the traditional data processing framework
Naive Users:
• Invoke one of the permanent application programs that have been written previously

Classifications of Database Management Systems
• Classification Based on Database Models: Hierarchical, Network, Relational, Object-Oriented DBMS
• Classification Based on the Number of Users: Single-User vs. Multi-User DBMS
• Classification Based on the Number of Sites: Centralized vs. Distributed DBMS
• Classification Based on Geographical Locations: Local vs. Long-Haul DBMS
• Classification Based on the Type of Databases Involved: Homogeneous vs. Heterogeneous DBMS
• Classification Based on the Autonomous Capability: Non-Federated vs. Federated DBMS
• Classification Based on the Type of Media: Single or Simple Media vs. Multimedia DBMS
• Classification Based on the Time and Dimension Space: Temporal DBMS and Spatial DBMS
• Classification Based on the Extensibility: Closed vs. Open DBMS

The Evolution of Databases
[Figure: evolution diagram with the labels file systems; network; hierarchical; relational; object-oriented languages; semantic models; complex object models; object-oriented databases; hypermedia; information retrieval; artificial intelligence; intelligent databases. The arrows between them are not recoverable.]

Generations of Database Systems
1. File Systems: ISAM (Indexed Sequential Access Method), VSAM (Virtual Storage Access Method)
2. Hierarchical Database Systems: IMS/VS (Information Management System/Virtual Storage), IMS DB/DC (Database/Data Communication)
3. Network Database Systems: CODASYL DBTG (Conference on Data Systems Languages, Database Task Group)
4. Relational Database Systems: the relational data model by E. F. Codd in 1970; System R, INGRES, System 2000 after 1976
5. Next Generation Database Systems: Object-Relational Databases vs. Object-Oriented Databases

Evolution of Database Systems
Evolution from File Management Systems to DBMS
• Separation of Data Description and Data Manipulation
• Data Integrity, Sharing, and Security
• Minimal Redundancy and Storage Space
• Easy Data Administration and Control
Evolution from Network and Hierarchical DBMS to Relational DBMS
• Logical and Physical Data Independence
• Procedural to Nonprocedural Interfaces Evolution (Navigation to Non-navigation)
• Enriched Data Model Based on Predicate Logic Concept
Evolution from Non-Object-Oriented DBMS to Object-Oriented DBMS
• Traditional Database Applications to Non-traditional Database Applications

New Database Applications
• CAD/CAE/CAM/CAP/CASE/CIM Databases
• Cartographical and Geological Databases
• Data Intensive Knowledge Based Systems
• Geographical Databases and Information Systems
• Historical Databases
• Graphical, Image, Pictorial and Visual Databases
• Office Automation and Office Information Systems
• Real Time Database Systems
• Scientific Database and Medical Applications: Genomic Databases; Organic Compound, Spectral Analysis, Genetic Encoding; Chromatography Patterns
• Software Engineering Databases: Configuration and Project Management
• Statistical Databases
• System Services and Network Management and Modeling

Comparative View of Different Data Processing Approaches
Software Engineering with Programming Languages:
• Abstract Data Type: Data Structure + Operations
• Object-Oriented Programming Languages: Class + Messages
Artificial Intelligence:
• Declarative Knowledge vs. Procedural Knowledge
• Extensional Knowledge (Facts) vs. Intensional Knowledge (Rules)
Database Management Systems:
• Schema (Data Definitions) and Operations (Data Manipulations)
Comparison:
• Main-memory-based search methods vs. secondary-storage-based search methods
• Function-based vs. data-organization-oriented

Issues in Database Systems
Database Design:
Conceptual Design
• It produces a high-level, abstract representation of reality (the mini-world).
Logical Design
• It translates this representation into specifications that can be implemented on and processed by a computer system.
Physical Design
• It determines the physical storage structures and access methods required for efficient access to the contents of a database from secondary storage devices.
• It depends on the DBMS.
Database Management System Design:
• To design system software that provides the user with a means of communicating with the database efficiently.
• A DBMS is a system oriented toward an implementation data model.
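
As a sketch of the logical design step, the sample E-R diagram shown earlier (customer, depositor, account) can be translated into three relational tables. The example assumes Python's standard `sqlite3` module; the underscore column names and the choice of keys are assumptions made for illustration, and the rows are a subset of the sample data from the slides. The `CREATE TABLE` statements are DDL, and the inserts and the query are DML, with the query written non-procedurally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: one table per entity, plus one table for the depositor relationship.
conn.executescript(
    """
    CREATE TABLE customer (
        social_security TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    );
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        INTEGER
    );
    CREATE TABLE depositor (
        social_security TEXT REFERENCES customer (social_security),
        account_number  TEXT REFERENCES account (account_number),
        PRIMARY KEY (social_security, account_number)
    );
    """
)

# DML: insert a subset of the sample data.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("192-83-7465", "Johnson", "Alma", "Palo Alto"),
        ("019-28-3746", "Smith", "North", "Rye"),
    ],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-201", 900)],
)
conn.executemany(
    "INSERT INTO depositor VALUES (?, ?)",
    [
        ("192-83-7465", "A-101"),
        ("192-83-7465", "A-201"),
        ("019-28-3746", "A-215"),
        ("019-28-3746", "A-201"),
    ],
)

# Non-procedural query: records are related by values, not by pointers.
johnson_accounts = conn.execute(
    "SELECT a.account_number, a.balance "
    "FROM customer c "
    "JOIN depositor d ON d.social_security = c.social_security "
    "JOIN account a ON a.account_number = d.account_number "
    "WHERE c.customer_name = ? "
    "ORDER BY a.account_number",
    ("Johnson",),
).fetchall()
print(johnson_accounts)
```

Storing the relationship in its own table lets account A-201 be shared by two customers without repeating the customer's street and city, as the single flat sample table does.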

Phases of Database Design for Large Databases
[Figure, partly recoverable:]
Stage 1: Requirements Collection and Analysis (data requirements; processing requirements)
Stage 2: Conceptual Design: conceptual and external schema design (DBMS-independent); transaction design (DBMS-independent)
Stage 3: Data Model Conceptual & [remainder of label not recoverable]
Stage 4: Physical Design: internal schema design (DBMS-dependent); frequencies, performance constraints
Stage 5: Implementation: DML statements

Database Design Stages
[Figure: data processing requirements and information requirements feed Stage I, requirement analysis, which produces the requirement specification; Stage II, conceptual design, produces the information structure; Stage III, implementation design, takes DBMS characteristics into account and produces the logical database structure; Stage IV, physical design, takes hardware/operating system characteristics into account.]

Important Issues in Database System Domain
Semantics of Database Systems:
• A database should be correctly designed in terms of database application semantics at the logical level.
Performance of Database Systems:
• A database system should be able to provide the quickest possible response to any type of user query.
• This requires that the database schema be optimally designed at both the logical level (database modeling) and the physical level (database tuning).

Hierarchy of Data Types
[Figure: nested hierarchy listing, in order, 5th normal form, 4th normal form, Boyce-Codd normal form, 3rd normal form, 2nd normal form, 1st normal form, non-first normal form (NF2), and semi-structured data (hypermedia/hypertext); the labels Relational DBMS and Object-Oriented DBMS mark regions of the hierarchy.]

Issues in Applications of Computer Information Systems
[Figure: layered diagram, from top to bottom: data processing (search), which should be correct, efficient, and user friendly; information storage and retrieval; search algorithms and access mechanisms; data structures; physical storage; computer hardware.]

Database Technology
• It is a comprehensive application of computer science and other technologies.

Computer science area                    Database counterpart
Compiler                                 Database Languages
Data Structures & Algorithms             Storage Structures & Data Access
Operating Systems                        Concurrency Control
System Analysis, Modeling and Design     Database Modeling and Design
Software Engineering                     DB & DBMS Development
AI and Expert Systems                    Heuristic Search and DDS
Optimization Theory                      Query Optimization
User Interface and Human Factors         Database Interfaces (GUI)
Network and Distributed Systems          Distributed Database Systems
Mathematical Predicate Logic             Database Theory

Current Trends of Database Systems
• From the Simple Data Model to the Complex Object Data Model (Pure Object-Oriented Data Model vs. Object-Relational Data Model)
• From Discrete Data Model to Probabilistic Data Model
• From Simple/Single Media Data to Multimedia/Hypermedia Data
• From Structured Data to Semi-Structured Data
• From Single Database to Multidatabase Systems (Homogeneous vs. Heterogeneous); Data Warehousing
• From Single Dimensional Applications to Multidimensional Applications (On-Line Transaction Processing vs. On-Line Analytical Processing)
• From Data Query to Data Mining and Knowledge Query
• From Simple Text-based Database User Interfaces to Intelligent Graphical User Interfaces
• From Centralized Database Systems to Distributed/Parallel Database Systems (Internet vs. Intranet)
• From Single Processor Computer (Server) to Multiple Processor Computer System (Server)
• From Single Tier to Multiple Tiers
• From Fat Client Applications to Thin Client Applications

Database and DBMS Revisited
• Programs = Data Structures + Algorithms (Niklaus Wirth)
• Database System = Databases + Database Management System(s)
• Database Management System = Data Model + Data Structures + Algorithms