Chapter 2 DataBase System Concepts and Architecture 2.1 Data Models, Schemas, and Instances 2.2 DBMS Architecture and Data Independence 2.3 Database Languages and Interfaces 2.4 The Database System Environment 2.5 Classification of Database Management Systems 2.6 Summary 2-1 2-1 2.1 Data Models, Schemas, and Instances ‧data types ‧relationships Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey. Provide data abstraction Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. ‧generic operation: insert, delete, modify, retrieve ‧user-defined operations 2-2 2-2 2.1.1 Categories of Data Models: - Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.) ‧entity ‧attribute ‧relationship - Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer. ‧record formats ‧record ordering ‧access paths - Implementation (record-oriented) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details. ‧relational ‧network ‧hierarchical 2-2 2-3 2.1.2 Schemas, Instances and Database State cf database Database Schema (meta-data): The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database. Schema Diagram: A diagrammatic display of (some aspects of ) a database schema. (refer to Fig 2.1 2-5) Database Instance: The actual data stored in a database at a particular moment in time. Also called database state ( or occurrence, snapshot) (refer to Fig 1.2 2-6) Each schema construct has its own current set of instances. The database schema changes very infrequently. The database state changes every time the database is updated. Schema is also called intension, whereas state is called extension. 2-3 2-4 Figure 2.1 Schema diagram for UNIVERSITY database schema construct Known data: name of record types, data items 2-4a 2-5 Figure 1.2 UNIVERSITY Database 2-4 2-6 define empty state load initial state update valid state state satisfy database schema update 2-3 2-7 2.2 DBMS Architecture and Data Independence 2.2.1 Three-Schema Architecture Proposed to support DBMS characteristics of: - Insulation of programs and data/program and operations (program-data and program-operation independence) - Support of multiple views of the data. - Use of catalog (database description) Defines DBMS schema at three levels: (see 2-9) - Internal schema at the internal level to describe data storage structures and access paths. Typically uses a physical data model. - Conceptual schema at the conceptual level to describe the structure and constraints for the whole database. Uses a conceptual or an implementation data model. - External schema at the external level to describe the various user views. Usually uses the same data model as the conceptual level or high-level data model. Mappings among schema levels are also needed. Programs refer to an external schema, 2-5 schema for execution 2-8 and are mapped by the DBMS to the internal Figure 2.2 The Three-schema architecture 2-6 2-6 2-9 2.2.2 Data Independence By adding or removing a record type or data item to · expand the database (2-11) · reduce the database Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs. Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema. Reorganize physical files to improve performance e.g. List all sections offered in Fall 1998 When a schema at a lower level is changed, only the mappings between this schema and higher-lever schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed since they refer to the external schemas. Disadvantages of two levels of mappings: Overhead during compilation or execution of a query or program 2-7 2-10 UNIVERSITY Conceptual Schema STUDENT (Name, Student Number, Class, Major) COURSE (Course Name, Course Number, Credit, Dept) PREREQUISITE (Course Number, Prerequisite Number) SECTION (Section Id, Course Number, Semester, Year, Instructor) GRADE_REPORT(Student Number, Section Id , Grade) UNIVERSITY External Schema TRANSCRIPT(Student Name, Course Number, Grade, Semester, Year, Section Id) derived from STUDENT, SECTION, GRADE_REPORT PREREQUISITES(Course Name, Course Number, Prerequisites) derived from PREREQUISITE, COURSE Change GRADE-REPORT Schema Construct GRADE_REPORT (Student Number, Student Name, Section Id, Course Number, Grade) Change Mapping (& View Definition) TRANSCRIPT derived from SECTION, GRADE_REPORT 2-7a 2-11 2.3 Database Languages and Interfaces provide appropriate languages and interfaces for each category of users. 2.3.1 DBMS Languages Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. DDL Compiler Data Manipulation Language (DML): Used to specify database retrievals and updates (insertion, deletion, modifications) - DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language). - Alternatively, stand-alone DML commands can be applied directly (query language). 2-8 2-12 Types of DML -Procedural DML: • Also called record-at-a-time (record-oriented) or low-level DML • Must be embedded in a programming language. • Searches for and retrieves individual database records and uses looping and other constructs of the host programming language to retrieve multiple records. -Declarative or non-procedural DML: • Also called set-at-a-time (set-oriented) or high-level DML. • Can be used as a stand-alone query language or can be embedded in a programming language. • Searches for and retrieves information from multiple related database records in a single command. - host language: general-purpose language - data sublanguage: DML - C++ 2-9 2-13 2.3.2 DBMS Interfaces - Stand-alone query language interfaces. (casual end user) - Programmer interfaces for embedding DML in programming languages: (programmer) -Pre-compiler Approach -Procedure (Subroutine) Call Approach - User-friendly interfaces: -Menu-based Interfaces for Browsing. -Forms-based Interfaces. -Graphical User Interfaces. -Natural language Interfaces -Combination of the above -Interfaces for Parametic Users (using function keys) - Interfaces for the DBA: -Creating accounts, granting authorizations -Setting system parameters 2-10 -Changing schemas or access path 2-14 2.4 The Database System Environment 2.4.1 DBMS Component Modules Figure 2.3 2-11 2-15 2.4.2 Database System Utilities To perform certain functions such as: - Loading data stored in files into a database. Conversion tool - Backing up the database periodically on storage. - File reorganizing database file structures. - Report generation utilities. - Performance monitoring utilities. - Other functions, such as sorting, user monitoring, data compression, etc. 2-12 2-16 2.4.3 Tools, Application Environments, and Communications Facilities Data dictionary utility: - Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc. (comment) -Active data dictionary is accessed by DBMS software and users/DBA. -Passive data dictionary is accessed by users/DBA only. Communications Facilities - Allow users at locations remote from the database system site to access the database. DB (DBMS)/DC (Data Communication System) 2-12 2-17 2.5 Classification of Database Management Systems Based on the data model used: •Data models -Traditional: Relational, Network (see 2-19), Hierarchical - Emerging: Object-oriented, Semantic, Entity- Relationship, other. Other classifications: •Number of users : Single-user (typically used with personal computers) vs. multi-user (most DBMSs) •Number of sites: Centralized (uses a single computer) vs. distributed (uses multiple computers). Homogeneous vs. Heterogeneous • Cost of DBMS software. $10,000~100,000 $100~3,000 •Types of access paths used. (inverted file structures, …) •Purpose general purpose special purpose e.g. airline reservations, telephone directory, on-line transaction processing system 2-13 2-18 Figure 2.4 A Network Schema 2-14 2-19