Subject Name: Database Management Systems
Subject Code: 20ISE43A
Department: ISE
20ISE43A DATABASE MANAGEMENT SYSTEMS
Engineered for Tomorrow

Objective
• The objective of this course is to prepare students to design and implement database information systems. It teaches a three-stage methodology for designing relational database applications: conceptual, logical, and physical database modeling and design.
• In the first stage, students build a conceptual data model (ER diagram) that is independent of all physical considerations. In the second stage, they transform this model into a logical model (relational database). Finally, they translate the logical data model into a physical design (SQL) for the target DBMS.

Module–I INTRODUCTION

Module 1 - TOPICS
• Introduction, an example
• Characteristics of the database approach
• Advantages of using the DBMS approach
• A brief history of database applications
• Data models, schemas and instances
• Three-schema architecture and data independence
• Database languages and interfaces
• The database system environment
• Centralized and client-server architectures
• Conceptual data modelling using entities and relationships
• Entity types, entity sets, attributes, roles, and structural constraints
• Weak entity types, ER diagrams, example

Learning Outcomes
• Students will be able to understand the database system environment in detail, including its users, characteristics, and advantages.
• Basic terminology such as data models, schemas, and instances is introduced in this unit.
• Students will learn about the types of database systems based on their distribution.

Why Database Approach
Databases touch all aspects of our lives.

1. INTRODUCTION
• In the early days, data was managed with file systems.
• Drawbacks:
  – Data redundancy
    • Multiple file formats, duplication of information in different files
  – No data recovery
    • No backup of data
  – Inconsistent data
    • The same update has to be applied to multiple files
  – Limited data sharing and excessive program maintenance
  – Hard to add new constraints or change existing ones

1.1 Basic Definitions
• Database:
  – A collection of related data.
• Data:
  – Real-world facts that can be recorded and have an implicit meaning.
• Database Management System (DBMS):
  – A software package/system that facilitates the creation and maintenance of a computerized database (e.g. Oracle, SQL Server, MS Access, DB2, MySQL).
• Database System:
  – The DBMS software together with the data itself. Sometimes the applications are also included (e.g. Flipkart).

A database has the following implicit properties:
• A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

1.2 Database system environment

1.3 DBMS Functionality
• Define a particular database in terms of its data types, structures, and constraints
• Construct or load the initial database contents on a secondary storage medium
• Manipulate the database:
  – Retrieval: querying, generating reports
  – Modification: insertions, deletions and updates to its content
  – Accessing the database through Web applications
• Process and share the database among a set of concurrent users and application programs, while keeping all data valid and consistent
• Other features:
  – Protection or security measures to prevent unauthorized access
  – “Active” processing to take internal actions on data
  – Presentation and visualization of data
  – Maintaining the database and associated programs over the lifetime of the database application
    • Called database, software, and system maintenance

2. Example
• Mini-world for the example: part of a UNIVERSITY environment.
• Some mini-world entities:
  – STUDENTs
  – COURSEs
  – SECTIONs (of COURSEs)
  – (academic) DEPARTMENTs
  – INSTRUCTORs
• Some relationships:
  – SECTIONs are of specific COURSEs
  – STUDENTs take SECTIONs
  – COURSEs have prerequisite COURSEs
  – INSTRUCTORs teach SECTIONs
  – COURSEs are offered by DEPARTMENTs
  – STUDENTs major in DEPARTMENTs

2.1 Class Diagram Example

3. Characteristics of the Database Approach
• Self-describing nature of a database system:
  – A DBMS catalog stores the description of a particular database (e.g. data structures, types, and constraints). This description is called meta-data (data about data).
  – This allows the DBMS software to work with different database applications.
• Insulation between programs and data:
  – Called program-data independence.
  – Allows changing data structures and storage organization without having to change the DBMS access programs.

[Figure: Database catalog]

• Data abstraction:
  – A data model is used to hide storage details and present the users with a conceptual view of the database.
  – Programs refer to the data model constructs rather than data storage details.
• Support of multiple views of the data:
  – Each user may see a different view of the database, which describes only the data of interest to that user.
• Sharing of data and multi-user transaction processing:
  – Allows a set of concurrent users to retrieve from and update the database.
  – Concurrency control within the DBMS guarantees that each transaction is correctly executed or aborted.
  – The recovery subsystem ensures that each completed transaction has its effect permanently recorded in the database.
  – OLTP (Online Transaction Processing) is a major part of database applications. It allows hundreds of concurrent transactions to execute per second.

4. Advantages of Using the Database Approach
• Controlling redundancy in data storage and in development and maintenance efforts.
  – Sharing of data among multiple users.
• Restricting unauthorized access to data.
• Providing persistent storage for program objects
  – In object-oriented DBMSs (see Chapters 11-12)
• Providing storage structures (e.g. indexes) for efficient query processing
• Providing backup and recovery services.
• Providing multiple interfaces to different classes of users.
• Representing complex relationships among data.
• Enforcing integrity constraints on the database.
• Drawing inferences and actions from the stored data using deductive and active rules
• Others:
  – Enforcing standards
• Flexibility to change data structures:
  – Database structure may evolve as new requirements are defined.
• Availability of current information:
  – Extremely important for on-line transaction systems such as airline, hotel, and car reservations.
• Economies of scale:
  – Wasteful overlap of resources and personnel can be avoided by consolidating data and applications across departments.

5. Historical Development of Database Technology
• Early database applications:
  – The hierarchical and network models were introduced in the mid 1960s and dominated during the seventies.
  – A large share of worldwide database processing still occurs using these models.
• Relational model based systems:
  – The relational model was originally introduced in 1970 and was heavily researched and experimented with at IBM Research and several universities.
  – Relational DBMS products emerged in the 1980s.
• Object-oriented and emerging applications:
  – Object-Oriented Database Management Systems (OODBMSs) were introduced in the late 1980s and early 1990s to cater to the need for complex data processing in CAD and other applications.
    • Their use has not taken off much.
  – Many relational DBMSs have incorporated object database concepts, leading to a new category called object-relational DBMSs (ORDBMSs).
  – Extended relational systems add further capabilities (e.g. for multimedia data, XML, and other data types).
• Data on the Web and E-commerce applications:
  – The Web contains data in HTML (Hypertext Markup Language) with links among pages.
  – This has given rise to a new set of applications, and E-commerce is using new standards like XML (eXtensible Markup Language).
  – Script programming languages such as PHP and JavaScript allow generation of dynamic Web pages that are partially generated from a database.
    • They also allow database updates through Web pages.
• New functionality is being added to DBMSs in the following areas:
  – Scientific applications
  – XML (eXtensible Markup Language)
  – Image storage and management
  – Audio and video data management
  – Data warehousing and data mining
  – Spatial data management
  – Time series and historical data management

6. Data Models
• Data model:
  – A set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey.
• Data model structure and constraints:
  – Constructs are used to define the database structure.
  – Constructs typically include elements (and their data types) as well as groups of elements (e.g. entity, record, table), and relationships among such groups.
  – Constraints specify some restrictions on valid data; these constraints must be enforced at all times.
• Data model operations:
  – These operations are used for specifying database retrievals and updates by referring to the constructs of the data model.
  – Operations on the data model may include basic model operations (e.g. generic insert, delete, update) and user-defined operations (e.g. compute_student_gpa, update_inventory).

6.1 Categories of Data Models
• Conceptual (high-level, semantic) data models:
  – Provide concepts that are close to the way many users perceive data.
    • (Also called entity-based or object-based data models.)
• Physical (low-level, internal) data models:
  – Provide concepts that describe details of how data is stored in the computer. These are usually specified in an ad-hoc manner through DBMS design and administration manuals.
• Implementation (representational) data models:
  – Provide concepts that fall between the above two, used by many commercial DBMS implementations (e.g. relational data models used in many commercial systems).

6.2 Schemas versus Instances
• Database schema:
  – The description of a database.
  – Includes descriptions of the database structure, data types, and the constraints on the database.
• Schema diagram:
  – An illustrative display of (some aspects of) a database schema.
• Schema construct:
  – A component of the schema or an object within the schema, e.g. STUDENT, COURSE.

6.3 Database State
• Database state:
  – The actual data stored in a database at a particular moment in time, that is, the collection of all the data in the database.
  – Also called database instance (or occurrence or snapshot).
    • The term instance is also applied to individual database components, e.g. record instance, table instance, entity instance.
• Initial database state:
  – The database state when it is initially loaded into the system.
• Valid state:
  – A state that satisfies the structure and constraints of the database.
• Distinction:
  – The database schema changes very infrequently.
  – The database state changes every time the database is updated.
• The schema is also called the intension.
• The state is also called the extension.

[Figure: Database Schema]

7. Three-Schema Architecture
• Proposed to support DBMS characteristics of:
  – Program-data independence.
  – Support of multiple views of the data.
• Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization.
• Defines DBMS schemas at three levels:
  – Internal schema at the internal level to describe physical storage structures and access paths (e.g. indexes).
    • Typically uses a physical data model.
  – Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users.
    • Uses a conceptual or an implementation data model.
  – External schemas at the external level to describe the various user views.
    • Usually uses the same data model as the conceptual level.

[Figure: The three-schema architecture]

• Mappings among schema levels are needed to transform requests and data.
  – Programs refer to an external schema, and are mapped by the DBMS to the internal schema for execution.
  – Data extracted from the internal DBMS level is reformatted to match the user’s external view (e.g. formatting the results of an SQL query for display in a Web page).

7.1 Data Independence
• Logical data independence:
  – The capacity to change the conceptual schema without having to change the external schemas and their associated application programs.
• Physical data independence:
  – The capacity to change the internal schema without having to change the conceptual schema.
  – For example, the internal schema may be changed when certain file structures are reorganized or new indexes are created to improve database performance.
• When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence.
• The higher-level schemas themselves are unchanged.
  – Hence, the application programs need not be changed since they refer to the external schemas.

8. DBMS Languages
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
  – High-level or non-procedural languages: these include the relational language SQL
    • May be used in a standalone way or may be embedded in a programming language
  – Low-level or procedural languages:
    • These must be embedded in a programming language
• Storage Definition Language (SDL)
  – Specifies the internal schema
• View Definition Language (VDL)
  – Specifies user views and their mappings to the conceptual schema

8.1 DBMS Languages
• Data Definition Language (DDL):
  – Used by the DBA and database designers to specify the conceptual schema of a database.
  – In many DBMSs, the DDL is also used to define internal and external schemas (views).
  – In some DBMSs, a separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
    • SDL is typically realized via DBMS commands provided to the DBA and database designers.
• Data Manipulation Language (DML):
  – Used to specify database retrievals and updates.
  – DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java.
    • A library of functions can also be provided to access the DBMS from a programming language.
  – Alternatively, stand-alone DML commands can be applied directly (called a query language).

8.2 DBMS Interfaces
• Stand-alone query language interfaces
  – Example: entering SQL queries at the DBMS interactive SQL interface (e.g. SQL*Plus in ORACLE)
• Programmer interfaces for embedding DML in programming languages
• User-friendly interfaces
  – Menu-based, forms-based, graphics-based, etc.
• Menu-based interfaces for Web clients or browsing
• Forms-based interfaces
• Graphical user interfaces
• Natural language interfaces
• Speech input and output
• Interfaces for parametric users
• Interfaces for the DBA

9. DBMS Component Modules
• The DBMS system structure is divided into two component modules.
• Upper module: the different users of the database environment and their interfaces.
• Lower module: the internals of the DBMS responsible for storage of data and processing of transactions.

• UPPER module
  – DBA staff
    • Define the database and make changes to its description using the DDL and other privileged commands.
    • The DBMS catalog stores the descriptions of the schemas that are processed by the DDL compiler.
    • The catalog includes names and sizes of the files, data types, storage details of each file, mapping information among schemas, and constraints.
  – Casual users (persons with occasional need)
    • Use menu-based or form-based query interfaces.
    • The query compiler parses the queries, analyzes the data elements for correctness, and converts the query into an internal form.
    • The internal query is passed through the query optimizer for rearrangement of operations and elimination of redundancies.
  – Application programmers
    • Write programs in host languages.
    • The precompiler separates the DML commands from the host program.
    • DML commands are submitted to the DML compiler and the rest of the program to the host language compiler.
    • The outputs of the DML compiler and the host compiler are linked to form the compiled transaction, whose executable code includes calls to the runtime database processor.
  – Parametric users
    • Supply the parameters (e.g. account number and amount for a bank withdrawal) to the compiled transactions.
  – The privileged commands, the executable query plan, and the compiled transactions with their runtime parameters are executed by the runtime database processor. It refers to the data dictionary when performing updates.
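
The idea of a compiled (canned) transaction that a parametric user only supplies parameters to can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module stands in for the DBMS; the `account` table, its columns, and the `withdraw` function are hypothetical names chosen for the bank-withdrawal example, not part of any real system.

```python
import sqlite3

def withdraw(conn, account_no, amount):
    """Canned transaction: debit `amount` from `account_no`, or change nothing."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT balance FROM account WHERE account_no = ?", (account_no,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown account")
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_no = ?",
            (amount, account_no),
        )

# Hypothetical schema and sample data, set up by the "DBA" side.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1001, 500)")
conn.commit()

# The parametric user supplies only the runtime parameters.
withdraw(conn, 1001, 200)
balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 1001"
).fetchone()[0]
print(balance)  # 300
```

The SQL text is fixed in advance and only the `?` placeholders vary at run time, which mirrors the split between the compiled transaction and the parameters supplied by the parametric user.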

• LOWER module
  – The stored data manager carries out the input and output operations between the disk and main memory and provides support for database management, including management of access paths and indexes.
  – The file manager deals with the data structure options for storing data in the database and maintains the metadata as well as indexes for various files.
  – The interfacing between main memory and disk is handled by the buffer manager.
  – The transaction manager is responsible for concurrency and crash recovery by maintaining a record of all changes to the database.

9.1 Database System Utilities
• To perform certain functions such as:
  – Loading data stored in files into a database. Includes data conversion tools.
  – Backing up the database periodically on tape.
  – Reorganizing database file structures.
  – Report generation utilities.
  – Performance monitoring utilities.
  – Other functions, such as sorting, user monitoring, data compression, etc.

9.2 Other Tools
• Data dictionary / repository:
  – Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.
  – An active data dictionary is accessed by DBMS software and users/DBA.
  – A passive data dictionary is accessed by users/DBA only.
• Application development environments and CASE (computer-aided software engineering) tools:
  – Examples:
    • PowerBuilder (Sybase)
    • JBuilder (Borland)
    • JDeveloper 10G (Oracle)

10. Database Design
• Conceptual design (the ER model is used at this stage):
  – What are the entities and relationships in the enterprise?
  – What information about these entities and relationships should we store in the database?
  – What are the integrity constraints or business rules that hold?
  – A database 'schema' in the ER model can be represented pictorially (ER diagrams).
  – An ER diagram can be mapped into a relational schema.
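
As a rough illustration of mapping an ER design into a relational schema, the sketch below takes the "STUDENTs major in DEPARTMENTs" relationship from the university example and turns it into two tables linked by a foreign key. It assumes Python's built-in sqlite3 module; the column names (`dept_code`, `student_no`, `major`) and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

conn.executescript("""
-- Entity set DEPARTMENT becomes a table keyed by a hypothetical dept_code.
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
);
-- Entity set STUDENT; the many-to-one "majors in" relationship is folded
-- into the student table as a foreign key column.
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    major      TEXT NOT NULL REFERENCES department(dept_code)
);
""")

conn.execute("INSERT INTO department VALUES ('ISE', 'Information Science')")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'ISE')")

# A student whose major is not a known department violates the constraint.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'XYZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

A many-to-many relationship such as "STUDENTs take SECTIONs" would instead need its own table holding the keys of both participating entity sets.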

Modeling
• A database can be modeled as:
  – a collection of entities,
  – relationships among entities.
• An entity is an object that exists and is distinguishable from other objects.
  – Example: specific person, company, event, plant
• Entities have attributes.
  – Example: people have names and addresses
• An entity set is a set of entities of the same type that share the same properties.
  – Example: set of all persons, companies, trees, holidays

Attributes
• An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
• Domain: the set of permitted values for each attribute
• Attribute types:
  – Simple and composite attributes
  – Single-valued and multi-valued attributes
    • Example of a multi-valued attribute: phone_numbers
  – Derived attributes
    • Can be computed from other attributes
    • Example: age, given date_of_birth

Mapping Cardinality Constraints
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
  – One to one
  – One to many
  – Many to one
  – Many to many

Mapping Cardinalities
[Figures: one-to-one and one-to-many mappings; many-to-one and many-to-many mappings]

11. ER Model Basics
[Figure: Employees entity set with attributes ssn, name, lot]
• Entity: real-world object distinguishable from other objects.
  – An entity is described (in the DB) using a set of attributes.
• Entity set: a collection of similar entities, e.g. all employees.
  – All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
  – Each entity set has a key.
  – Each attribute has a domain.

ER Model Basics (Contd.)
[Figure: Works_In relationship set (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget); Reports_To relationship set on Employees with roles supervisor and subordinate]
• Relationship: association among two or more entities, e.g. Attishoo works in the Pharmacy department.
• Relationship set: collection of similar relationships.
  – An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
  – The same entity set could participate in different relationship sets, or in different “roles” in the same set.

Relationship Sets
• A relationship is an association among several entities.
• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets:
  {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
  where (e1, e2, …, en) is a relationship.

Degree of a Relationship Set
• Refers to the number of entity sets that participate in a relationship set.
• Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
• Relationship sets may involve more than two entity sets.
  – Example: suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between the entity sets employee, job, and branch.
• Relationships between more than two entity sets are rare. Most relationships are binary.

Additional features of the ER model: Key Constraints
[Figure: Manages relationship set (attribute since) between Employees (ssn, name, lot) and Departments (did, dname); panels labelled 1-to-1, 1-to-Many, Many-to-1, Many-to-Many]
• Consider Works_In: an employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Manages.

Participation Constraints
Does every department have a manager?
If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
• Every Departments entity must appear in an instance of the Manages relationship.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since)]

12. Weak Entities
• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
  – The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
  – The weak entity set must have total participation in this identifying relationship set.
[Figure: Employees (ssn, name, lot), identifying relationship Policy (cost), weak entity set Dependents (pname, age)]

Weak Entity Sets
• An entity set that does not have a primary key is referred to as a weak entity set.
• The existence of a weak entity set depends on the existence of an identifying entity set.
  – It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set.
  – The identifying relationship is depicted using a double diamond.
• The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set.
• The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.

Weak Entity Sets (Cont.)
• We depict a weak entity set by double rectangles.
• We underline the discriminator of a weak entity set with a dashed line.
• payment_number: discriminator of the payment entity set
• Primary key for payment: (loan_number, payment_number)

Aggregation
• Used when we have to model a relationship involving (entity sets and) a relationship set.
  – Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
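
One possible relational reading of aggregation is sketched below, using this section's Employees / Projects / Departments example in which a Sponsors relationship is itself monitored by an employee. It assumes Python's built-in sqlite3 module; the table layout and the sample rows are illustrative assumptions, and other mappings are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement in SQLite

conn.executescript("""
CREATE TABLE employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
CREATE TABLE projects    (pid INTEGER PRIMARY KEY, started_on TEXT, pbudget REAL);
CREATE TABLE departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

-- Relationship set Sponsors between Departments and Projects.
CREATE TABLE sponsors (
    did   INTEGER REFERENCES departments(did),
    pid   INTEGER REFERENCES projects(pid),
    since TEXT,
    PRIMARY KEY (did, pid)
);

-- Monitors relates an employee to a sponsorship as a whole (the aggregate),
-- not to a project or a department on its own.
CREATE TABLE monitors (
    did   INTEGER,
    pid   INTEGER,
    ssn   TEXT NOT NULL REFERENCES employees(ssn),
    until TEXT,
    PRIMARY KEY (did, pid),  -- at most one monitoring employee per sponsorship
    FOREIGN KEY (did, pid) REFERENCES sponsors(did, pid)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO employees VALUES ('123-22-3666', 'Attishoo', 48)")
conn.execute("INSERT INTO projects VALUES (1, '2024-01-01', 10000.0)")
conn.execute("INSERT INTO departments VALUES (51, 'Pharmacy', 50000.0)")
conn.execute("INSERT INTO sponsors VALUES (51, 1, '2024-02-01')")
conn.execute("INSERT INTO monitors VALUES (51, 1, '123-22-3666', '2024-12-31')")

# Monitoring a sponsorship that does not exist is rejected.
try:
    conn.execute("INSERT INTO monitors VALUES (51, 2, '123-22-3666', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

The composite foreign key from `monitors` to `sponsors` is what expresses "a relationship participating in another relationship" at the relational level.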
[Figure: Employees (ssn, name, lot) connected by Monitors (until) to the aggregate formed by Sponsors (since) between Projects (pid, started_on, pbudget) and Departments (did, dname, budget)]
• Aggregation vs. ternary relationship:
  – Monitors is a distinct relationship, with a descriptive attribute.
  – Also, we can say that each sponsorship is monitored by at most one employee.

Aggregation
• Consider the ternary relationship works_on, which we saw earlier.
• Suppose we want to record managers for tasks performed by an employee at a branch.

Aggregation (Cont.)
• Relationship sets works_on and manages represent overlapping information.
  – Every manages relationship corresponds to a works_on relationship.
  – However, some works_on relationships may not correspond to any manages relationships.
    • So we can’t discard the works_on relationship.
• Eliminate this redundancy via aggregation:
  – Treat the relationship as an abstract entity.
  – Allows relationships between relationships.
  – Abstraction of a relationship into a new entity.

[Figure: E-R Diagram With Aggregation]

Binary vs. Ternary Relationships
• If each policy is owned by just 1 employee, and each dependent is tied to the covering policy, the first diagram is inaccurate.
• What are the additional constraints in the 2nd diagram?
[Figure: Bad design: ternary relationship Covers among Employees (ssn, name, lot), Policies (policyid, cost) and Dependents (pname, age). Better design: binary relationships Purchaser (Employees, Policies) and Beneficiary (Dependents, Policies)]

Binary vs. Ternary Relationships (Contd.)
• The previous example illustrated a case when two binary relationships were better than one ternary relationship.
• An example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute:
  – S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S.
  – How do we record qty?
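
The Contracts argument can be checked on a tiny data set. The sketch below, assuming Python's built-in sqlite3 module and hypothetical key columns `sid`, `pid` and `did`, stores the three binary facts and shows that they suggest a supplier/part/department combination even though no contract has been recorded, and that qty only has a home in the ternary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Three binary relationship sets (key columns are illustrative).
CREATE TABLE can_supply (sid TEXT, pid TEXT, PRIMARY KEY (sid, pid));
CREATE TABLE needs      (did TEXT, pid TEXT, PRIMARY KEY (did, pid));
CREATE TABLE deals_with (did TEXT, sid TEXT, PRIMARY KEY (did, sid));

-- The ternary relationship set, which is the only place qty can be stored.
CREATE TABLE contracts (
    sid TEXT, pid TEXT, did TEXT,
    qty INTEGER NOT NULL,
    PRIMARY KEY (sid, pid, did)
);

INSERT INTO can_supply VALUES ('S1', 'P1');
INSERT INTO needs      VALUES ('D1', 'P1');
INSERT INTO deals_with VALUES ('D1', 'S1');
""")

# Combinations that the binary facts alone would suggest.
implied = conn.execute("""
    SELECT c.sid, c.pid, n.did
    FROM can_supply c
    JOIN needs n      ON n.pid = c.pid
    JOIN deals_with w ON w.did = n.did AND w.sid = c.sid
""").fetchall()

# Contracts that were actually agreed: none so far.
actual = conn.execute("SELECT sid, pid, did, qty FROM contracts").fetchall()

print(implied)  # [('S1', 'P1', 'D1')]
print(actual)   # []
```

Joining the binary tables produces a candidate triple but cannot say whether a contract exists or for what quantity, which is the point made in the text.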

Summary of Conceptual Design
• Conceptual design follows requirements analysis.
  – Yields a high-level description of the data to be stored.
• The ER model is popular for conceptual design.
  – Constructs are expressive and close to the way people think about their applications.
• Basic constructs: entities, relationships, and attributes (of entities and relationships).
• Some additional constructs: weak entities, ISA hierarchies, and aggregation.
• Note: there are many variations on the ER model.

Summary of ER (Contd.)
• Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
  – Some constraints (notably, functional dependencies) cannot be expressed in the ER model.
  – Constraints play an important role in determining the best database design for an enterprise.

Summary of ER (Contd.)
• ER design is subjective. There are often many ways to model a given scenario. Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
  – Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
• Ensuring good database design: the resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.

Example
• Assume we have the following application that models cricket teams. Capture the following information in an ER diagram.
• We have a set of teams. Each team has an ID, a name, a stadium, and the country to which the team belongs.
• Each team has many players, and each player belongs to one team.
• Each player has a number, name, DoB, start year, and jersey number that he uses.
• Teams play matches. In each match there is a host team and a guest team. The match takes place in the stadium of the host team.
• For each match we need to keep track of the following:
  – The date on which the match is played
  – The final result of the match
  – The players who participated in the match. For each player: how many runs he scored, whether or not he took wickets, and whether or not he scored 100’s.
• Each match has exactly three umpires. For each umpire we have an ID, name, DoB, and years of experience. Two umpires are main umpires and the other one is the third umpire.

[Figure: ER diagram for the cricket example. Entity sets: UMPIRE, TEAM, MATCHES, PLAYER, MATCHPLAYER. Relationships: Host, Guest, Belongs_to, In_match, Plays. Attribute labels: ID, Name, DOB, ExpYears, Is Main, role, Country, Stadium, Date, Host Score, Guest Score, P_Num, P_Name, Dob, Start_Year, Jersey_No, Runs, Wickets, 100’s]

Example
• A company stores information on its shipped items. The requirements are as follows:
• Shipped items are characterized by item number (unique), weight, insurance amount, destination, and delivery date. Shipped items are received at a single trade center.
• Trade centers are characterized by their type, ID, and address. Shipped items are dispatched via one or more standard modes of transportation.
• Each transportation is characterized by a unique schedule number, a mode (e.g. airways, waterways, roadways), and a delivery route.
• Create an E-R diagram that captures this information.

[Figure: ER diagram for the shipped-items example. Shipped Items (Item_No, Weight, InsuranceAmt, Destination, DeliverDate) is related N:N through Via to Transportation (ScheduleNo, Mode: Airways / Waterways / Roadways, Route), and N:1 through Received From to Trade Center (ID, Type, Address: Street, City, ZIP, Country)]
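
To connect the shipped-items example back to the logical and physical design stages, here is one way its ER diagram might be translated into tables. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, the table and column names are adapted from the diagram labels, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement in SQLite

conn.executescript("""
CREATE TABLE trade_center (
    center_id   INTEGER PRIMARY KEY,
    center_type TEXT,
    street TEXT, city TEXT, zip TEXT, country TEXT   -- composite Address
);
-- "Received From" is many-to-one, so it becomes a foreign key on the item.
CREATE TABLE shipped_item (
    item_no       INTEGER PRIMARY KEY,
    weight        REAL,
    insurance_amt REAL,
    destination   TEXT,
    delivery_date TEXT,
    center_id     INTEGER NOT NULL REFERENCES trade_center(center_id)
);
CREATE TABLE transportation (
    schedule_no INTEGER PRIMARY KEY,
    mode  TEXT NOT NULL CHECK (mode IN ('airways', 'waterways', 'roadways')),
    route TEXT
);
-- "Via" is many-to-many, so it gets its own table.
CREATE TABLE shipped_via (
    item_no     INTEGER REFERENCES shipped_item(item_no),
    schedule_no INTEGER REFERENCES transportation(schedule_no),
    PRIMARY KEY (item_no, schedule_no)
);
""")

# Made-up sample data.
conn.execute(
    "INSERT INTO trade_center VALUES (1, 'regional', '1 Main St', 'Bengaluru', '560001', 'India')"
)
conn.execute(
    "INSERT INTO shipped_item VALUES (100, 2.5, 500.0, 'Chennai', '2024-03-01', 1)"
)
conn.executemany(
    "INSERT INTO transportation VALUES (?, ?, ?)",
    [(10, 'roadways', 'R-1'), (11, 'airways', 'A-7')],
)
conn.executemany("INSERT INTO shipped_via VALUES (?, ?)", [(100, 10), (100, 11)])

# Which modes of transportation carry item 100?
modes = [row[0] for row in conn.execute("""
    SELECT t.mode
    FROM shipped_via v
    JOIN transportation t ON t.schedule_no = v.schedule_no
    WHERE v.item_no = 100
    ORDER BY t.mode
""")]
print(modes)  # ['airways', 'roadways']
```

The NOT NULL foreign key captures "received at a single trade center", while the separate `shipped_via` table captures "dispatched via one or more" transportations; the "at least one" part is not enforced by this schema.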