Chapter 1
Introduction to Database System < PART 2 >
Instructors: Churee Techawut
CS (204)321 Database System I

Outlines
1) Basic definitions
2) Database system environment
3) Examples of database
4) Typical DBMS functionality
5) Major characteristics of database approach
6) Different types of database users
7) Additional characteristics of database approach
8) When not to use a DBMS
9) Components of a database system
10) Database system concepts and architecture

Database System Concepts and Architecture
1) Data models
2) Schemas vs. instances
3) Three-schema architecture
4) Data independence
5) DBMS language
6) DBMS interface
7) Database system environment
8) Database system utilities
9) Database architectures
10) Classification of DBMS

Data Models
Data model
“A set of concepts that can be used to describe the structure of a database (data types and relationships) and certain constraints that the database should obey.”
Data model operations
“Operations for specifying database retrievals and updates by referring to the concepts of the data model.”
Operations on the data model may include basic operations and user-defined operations (e.g. COMPUTE_GPA is a user-defined operation that can be applied to a STUDENT object).

Categories of data models
1) Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data.
2) Physical (low-level, internal) data models: Provide concepts that describe the details of how data is stored in the computer.
3) Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.

Schemas VS. Instances
In any data model it is important to distinguish between the description of the database and the database itself.
Database schema (or meta-data)
“The description of the database.
It includes a description of the database structure and the constraints that should hold on the database.”
The database schema is specified during database design and is not expected to change frequently.
e.g.
   Name: string
   StudentNumber: string
   Class: integer
   Major: string

Schema diagram
“A diagrammatic display of a database schema – structure of each record type (not the actual instances of a record).”
   STUDENT:       Name, StudentNumber, Class, Major
   COURSE:        CourseName, CourseNumber, CreditHours, Department
   PREREQUISITE:  CourseNumber, PrerequisiteNumber
   SECTION:       SectionIdentifier, CourseNumber, Semester, Year, Instructor
   GRADE_REPORT:  StudentNumber, SectionIdentifier, Grade

Schema construct
“An object within the schema.” e.g. STUDENT, COURSE.
Database instances
“The actual data stored in a database at a particular moment in time. Also called database state or occurrence.”
Many database instances can be constructed to correspond to a particular database schema.

[Figure: database states. Defining a new DB (specifying the DB schema, which is stored in the DBMS catalog) gives a database in the empty state; loading the data for the first time gives the initial state; after each update operation the DBMS ensures that the database is in a valid state.]

Distinction
The database schema does not change frequently, but the database state changes every time the database is updated.
The schema is also called the intension, whereas the state is called the extension.

Three-Schema Architecture
The three-schema architecture was proposed to support the following DBMS characteristics:
1) Program-data independence.
2) Support of multiple views of the data.
The goal of the three-schema architecture is to separate the user applications from the physical database.

Schemas can be defined at the following three levels.
1) Internal schema at the internal level
   Describes physical storage structures and access paths.
   Typically uses a physical data model.
2) Conceptual schema at the conceptual level
   Describes the structure (such as entities, data types, relationships) and constraints for the whole database.
   Uses a conceptual or implementation data model.
3) External schemas at the external level
   Describe the various user views.
   Usually use the same data model as the conceptual level.

[Figure: the three-schema architecture. End users see external views 1 to n at the external level; these map to the conceptual schema at the conceptual level, which maps to the internal schema at the internal level, which in turn maps to the stored database.]
Source: Elmasri R. & Navathe S.B. (1994) Fundamentals of database systems.

Mappings among schema levels are needed to transform requests and data.
If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user’s external view.
Programs refer to an external schema, and their requests are mapped by the DBMS to the internal schema for execution.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at the physical level.

Data Independence
Two types of data independence:
1) Logical data independence
   The capacity to change the conceptual schema without having to change the external schemas and their application programs.
2) Physical data independence
   The capacity to change the internal schema without having to change the conceptual (or external) schemas.

In a DBMS that fully supports data independence, when a schema at a lower level is changed, only the mappings between this schema and the higher-level schemas need to be changed.
The higher-level schemas themselves are unchanged.
Therefore, the application programs need not be changed, since they refer to the external schemas.

DBMS Language
Once the design of a database is completed and a DBMS is chosen to implement the database:
Data Definition Language (DDL) is used by the DBA and by database designers to define the conceptual schema for the database; in DBMSs that do not strictly separate the levels, it also defines the internal schema and any mapping between the two.
Storage Definition Language (SDL) is used to specify the internal schema.
View Definition Language (VDL) is used to specify user views (external schemas) and their mappings to the conceptual schema.

Once the database schemas are compiled and the database is populated with data:
Data Manipulation Language (DML) is used to specify database retrievals and updates.
DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C or an Assembly Language.
In object-oriented systems, the host and data sublanguages typically form one integrated language such as C++.
Alternatively, a high-level DML used in a stand-alone interactive manner is called a query language.

Types of DML
1) Procedural DML (record-at-a-time or low-level DML)
   Must be embedded in a programming language.
   Typically retrieves individual records from the database, and uses looping and other constructs of the host programming language to retrieve multiple records.
   Specifies how to retrieve data.
   e.g. DML calls embedded in COBOL, C, etc.
2) Declarative or non-procedural DML (set-at-a-time or high-level DML)
   Used as a stand-alone query language or embedded in a programming language.
   Typically retrieves information from multiple related database records in a single command.
   Specifies what data to retrieve rather than how to retrieve it.
   e.g. SQL

DBMS Interface
Stand-alone query language interfaces
Programmer interfaces for embedding DML in programming languages:
1) Pre-compiler approach
2) Procedure call approach

User-friendly interfaces provided by a DBMS
1) Menu-based interface
   No need to memorize the specific commands and syntax of a query language.
2) Graphical interface
   The query is specified via a schema diagram; can be combined with menus.
3) Forms-based interface
   Usually programmed for parametric users, who fill out the form entries to insert new data; serves as an interface to canned transactions.
4) Natural language interface
   Accepts and interprets requests written in English or some other language.
5) Combination of the above

Other DBMS Interface
Speech as input and output
Web browser as an interface
Interfaces for parametric users (e.g., bank tellers)
   Have a small set of operations.
   Use function keys to minimize the number of keystrokes.
Interface for the DBA
   Uses privileged commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structure of a database.

Database System Environment
[Figure: database system environment.]
Source: Elmasri R. & Navathe S.B. (1994) Fundamentals of database systems

The DBMS component modules are as follows.
1) Stored data manager controls access to DBMS information stored on disk.
2) DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. It also compiles commands into object code for database access.
3) Run-time database processor handles database access at run time by executing the request.
4) Query compiler parses and analyzes a query.
5) Precompiler extracts DML commands from an application program written in a host programming language.
6) DML compiler compiles DML commands into object code for database access. The rest of the program is sent to the host language compiler.

Database System Utilities
Common database utilities have the following types of functions:
1) Loading existing data files into the database.
2) Backing up the database periodically.
3) Reorganizing database file structures to improve performance.
4) Generating reports.
5) Monitoring database usage and providing statistics to the DBA.
Other functions include sorting, user monitoring, data compression, etc.

Data dictionary is an important and very useful utility.
It is used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.
Data dictionary vs. DBMS catalog
Combination of catalog/data dictionary:
   Active data dictionary is accessed by the DBMS software and by users/DBA.
   Passive data dictionary is accessed by users/DBA only.

Database Architectures
[Figure: two-tier architecture (user and application on the client, connected over the network to the database system on the server) and three-tier architecture (user and application client on the client, connected over the network to the application server and database system on the server).]
Source: Silberschatz A., Korth, H.F. & Sudarshan S. (2006) Database system concepts.

Two-Tier Client-Server Architectures
Application on the client machine invokes database system functionality at the server machine through query language statements.
Application program interfaces such as ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) are used for the interaction.

Three-Tier Client-Server Architectures
Client machine communicates with the application server only, which means it does not contain any direct database calls.
Application server communicates with a database system to access data.
Appropriate for large applications and web applications.

Classification of DBMS
Based on the data model used:
   Traditional: Relational, Network, Hierarchical
   Emerging: Object-oriented, Object-relational
Other classifications:
   Single-user (typically used with micro-computers) vs. multi-user (most DBMSs).
   Centralized (uses a single computer with one database) vs. distributed (uses multiple computers, multiple databases).
   Distributed systems include client-server based database systems, in which a set of database servers supports a set of clients.

Data model is the main criterion used to classify DBMSs.
1) Relational data model represents a database as a collection of tables.
2) Network model represents data as record types and a limited type of 1:N relationship, called a set type.
3) Hierarchical model represents data as hierarchical tree structures. Each hierarchy represents a number of related records.
4) Object-oriented model defines a database in terms of objects, their properties, and their operations. Objects with the same structure and behavior belong to a class.
5) Object-relational model combines the relational data model and the object-oriented model to define complex data types.

Example of Relational Data Model

STUDENT
Name    StudentNumber   Class   Major
Smith   17              1       COSC
Brown   8               2       COSC

COURSE
CourseName                  CourseNumber   CreditHours   Department
Intro to Computer Science   COSC1310       4             COSC
Data Structures             COSC3320       4             COSC
Discrete Mathematics        MATH2410       3             MATH
Database                    COSC3380       3             COSC

PREREQUISITE
CourseNumber   PrerequisiteNumber
COSC3380       COSC3320
COSC3380       MATH2410
COSC3320       COSC1310

SECTION
SectionIdentifier   CourseNumber   Semester   Year   Instructor
85                  MATH2410       Fall       91     King
92                  COSC1310       Fall       91     Anderson
102                 COSC3320       Fall       92     Knuth
112                 MATH2410       Fall       92     Chang
119                 COSC1310       Fall       92     Anderson
135                 COSC3380       Fall       92     Stone

GRADE_REPORT
StudentNumber   SectionIdentifier   Grade
17              112                 B
17              119                 C
8               85                  A
8               92                  A
8               102                 B
8               135                 A

Source: Elmasri R. & Navathe S.B. (1994) Fundamentals of database systems

Example of a Network Schema
[Figure: network schema. Labels shown: STUDENT, COURSE, SECTION, PREREQUISITE, GRADE_REPORT, COURSE_OFFERINGS, STUDENT_GRADES, SECTION_GRADES, IS_A, HAS_A.]
Source: Elmasri R. & Navathe S.B. (1994) Fundamentals of database systems

Example of a Hierarchical Schema
   DEPARTMENT:  DNAME, DNUMBER, MGRNAME, MGRSTARTDATE
   EMPLOYEE:    NAME, SSN, BDATE, ADDRESS
   PROJECT:     PNAME, PNUMBER, PLOCATION
Source: Elmasri R. & Navathe S.B. (1994) Fundamentals of database systems
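
Worked sketch (not part of the original slides)
Several ideas from this part can be tried on the STUDENT and GRADE_REPORT tables of the relational example: schema vs. state, declarative vs. record-at-a-time DML, and an external view that survives a change to the conceptual schema. The sketch below is only one possible illustration. It assumes Python as the host language and SQLite (through the standard sqlite3 module) as the DBMS; neither is prescribed by the course. The view name STUDENT_NAMES and the added column Email are hypothetical, and the record-at-a-time loop only imitates a procedural DML, since SQLite itself offers SQL only.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")

# DDL: define the schema (intension). The database is now in its empty state.
conn.executescript("""
    CREATE TABLE STUDENT (
        Name          TEXT,
        StudentNumber INTEGER PRIMARY KEY,
        Class         INTEGER,
        Major         TEXT
    );
    CREATE TABLE GRADE_REPORT (
        StudentNumber     INTEGER REFERENCES STUDENT(StudentNumber),
        SectionIdentifier INTEGER,
        Grade             TEXT
    );
""")
empty_state = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

# DML updates: loading the sample rows moves the database to its initial state.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                 [("Smith", 17, 1, "COSC"), ("Brown", 8, 2, "COSC")])
conn.executemany("INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
                 [(17, 112, "B"), (17, 119, "C"), (8, 85, "A"),
                  (8, 92, "A"), (8, 102, "B"), (8, 135, "A")])

# Declarative (set-at-a-time) DML: one command states WHAT to retrieve.
declarative = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.Name
    FROM STUDENT s
    JOIN GRADE_REPORT g ON g.StudentNumber = s.StudentNumber
    WHERE g.Grade = 'A'
    ORDER BY s.Name
""")]

# Record-at-a-time style: the host language loops over individual records
# and decides HOW to combine them.
procedural = set()
students = conn.execute("SELECT StudentNumber, Name FROM STUDENT").fetchall()
for student_number, name in students:
    grades = conn.execute(
        "SELECT Grade FROM GRADE_REPORT WHERE StudentNumber = ?",
        (student_number,))
    for (grade,) in grades:
        if grade == "A":
            procedural.add(name)

# External schema sketched as a view (STUDENT_NAMES is a hypothetical name).
conn.execute("CREATE VIEW STUDENT_NAMES AS SELECT Name, Major FROM STUDENT")
before = conn.execute("SELECT * FROM STUDENT_NAMES ORDER BY Name").fetchall()

# Changing the conceptual schema (hypothetical Email column) leaves the view,
# and any program written against it, unchanged: logical data independence.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Email TEXT")
after = conn.execute("SELECT * FROM STUDENT_NAMES ORDER BY Name").fetchall()
```

Both retrieval styles find the same students, but only the declarative query leaves the access strategy to the DBMS. The view plays the role of an external schema: programs that read STUDENT_NAMES do not need to change when STUDENT gains a column.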