Database Management System Overview:
The term database management system consists of two parts. They are:
1. Database
2. Management System

What is a Database?
To understand a database, we first need to start from data, which is the basic building block of any DBMS.

Data: Facts, figures, statistics etc. having no particular meaning on their own (e.g. 1, Raman, 19).

Record: A collection of related data items. In the example above the three data items had no meaning, but if we organize them in the following way, they collectively represent meaningful information.

Roll   Name    Age
1      Raman   19

Table or Relation: A collection of related records.

Roll   Name    Age
1      Raman   19
2      Jay     20
3      Raj     18

The columns of this table or relation are called Fields or Attributes (these notes also call them Domains, although strictly a domain is the set of values an attribute may take). The rows are called Tuples or Records.

Database: A database is a collection of inter-related data, organized so that the data can be retrieved, inserted and deleted efficiently. In relational terms, it is a collection of related relations. Consider the following collection of tables:

Roll   Name    Age
1      Raman   19
2      Jay     20
3      Raj     18

For example: The college database organizes the data about the admin, staff, students, faculty etc.
In a relational database, data is organized strictly in row and column format. The data items within one row (tuple or record) may belong to different data types. On the other hand, all the data items within a single column (attribute) are of the same data type.

Database Management System:
A database management system is software which is used to manage the database. For example: MySQL, Oracle and SQL Server are popular database management systems used in many different applications.

DBMS and its applications:
A database management system is a computerized record-keeping system. The overall purpose of a DBMS is to allow users to define, store, retrieve and update the information contained in the database on demand. Some of the major areas of application are as follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources

File System:
A data file is a collection of related records stored on a storage medium such as a hard disk or optical disc, while a database is a collection of data organized in a manner that allows access, retrieval, and use of that data.
Keeping information in a file processing system has a number of major disadvantages:
● Data redundancy and inconsistency
  ✔ Multiple file formats, duplication of information in different files
● Difficulty in accessing data
  ✔ Need to write a new program to carry out each new task
● Data isolation
  ✔ Multiple files and formats
● Integrity problems
  ✔ Hard to add new constraints or change existing ones
● Atomicity problems
  ✔ Failures may leave the database in an inconsistent state with partial updates carried out
  ✔ E.g. a transfer of funds from one account to another should either complete or not happen at all
● Security problems

DBMS VS File System:
● DBMS: DBMS is a collection of data. In DBMS, the user is not required to write the procedures.
  File system: File system is a collection of data. In this system, the user has to write the procedures for managing the data.
● DBMS: Gives an abstract view of data that hides the details.
  File system: Exposes the details of data representation and storage.
● DBMS: Provides a crash recovery mechanism, i.e., the DBMS protects the user from system failure.
  File system: Doesn't have a crash recovery mechanism, i.e., if the system crashes while some data is being entered, the content of the file may be lost.
● DBMS: Provides a good protection mechanism.
  File system: It is very difficult to protect a file under the file system.
● DBMS: Takes care of concurrent access to data using some form of locking.
  File system: Concurrent access causes many problems, such as one user reading a file while another is deleting or updating some information in it.

Database Abstraction:
Database systems are made up of complex data structures.
To ease user interaction with the database, the developers hide irrelevant internal details from users. This process of hiding irrelevant details from the user is called data abstraction.

Internal or Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. The complex data structure details are visible at this level.
Conceptual or logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.

Database Architecture:
The DBMS design depends upon its architecture. DBMS architecture depends upon how users are connected to the database to get their requests done.

Types of DBMS Architecture:
1-Tier Architecture:
● In this architecture, the database is directly available to the user.
● The 1-Tier architecture is used for development of local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture:
● The 2-Tier architecture is the same as basic client-server.
● In the two-tier architecture, applications on the client end can communicate directly with the database at the server side. For this interaction, APIs like ODBC and JDBC are used.
3-Tier Architecture:
● The 3-Tier architecture contains another layer (an application server) between the client and the database server.
● In this architecture, the client can't communicate with the database server directly; every request passes through the middle layer.

Data Independence:
Data independence is defined as a property of a DBMS that lets you change the database schema at one level of a database system without having to change the schema at the next higher level.
In DBMS there are two types of data independence:
1. Physical data independence
2. Logical data independence
Before we learn Data Independence, a refresher on Database Levels is important.
The database has 3 levels: the internal (physical) level, the conceptual (logical) level and the view (external) level.

Physical Data Independence:
● Physical data independence helps you to separate the conceptual level from the internal/physical level.
● It allows you to provide a logical description of the database without the need to specify physical structures.
● With physical independence, you can change the physical storage structures or devices without any effect on the conceptual schema.

Examples of changes under Physical Data Independence:
Due to physical independence, none of the changes below will affect the conceptual layer.
● Using a new storage device like a hard drive or magnetic tapes
● Modifying the file organization technique in the database
● Switching to different data structures
● Changing the access method
● Modifying indexes
● Changes to compression techniques or hashing algorithms
● Change of location of the database, say from the C drive to the D drive

Logical Data Independence:
Logical data independence is the ability to change the conceptual schema without changing
1. External views
2. External programs
Compared to physical data independence, logical data independence is more challenging to achieve.

Examples of changes under Logical Data Independence:
● Adding, modifying or deleting an attribute, entity or relationship without a rewrite of existing application programs
● Merging two records into one
● Breaking an existing record into two or more records

Importance of Data Independence:
● Helps you to improve the quality of the data.
● Database system maintenance becomes affordable.
● Enforcement of standards and improvement in database security.
● You don't need to alter data structures in application programs.

Models of Database Architecture: Hierarchical, Network and Relational Models
Conceptually, there are three broad options with regard to database models. These are:
a. Hierarchical model
b. Network model
c. Relational model

(a) Hierarchical model: This model presents data to users in a hierarchy of data elements that can be represented as a sort of inverted tree.
(b) Network model: In the network model of database, there are no levels, and a record can have any number of owners and can also own several records. In other words, this model resembles the hierarchical model, the only difference being that a record can have more than one parent.
(c) Relational model: The most recent of the three, and the most popular, is the relational database model. This model was developed to overcome the complexity and inflexibility of the earlier two models in handling databases with many-to-many relationships between entities.

Database Schema and instance:
A database schema is the skeleton structure that represents the logical view of the entire database. It is a blueprint of how our data will look. It doesn't hold data itself, but instead describes the shape of the data and how it might relate to other tables or models.
An entry in our database is an instance of the database schema. It contains all of the properties described in the schema.

Database Languages:
Database languages are used to read, update and store data in a database. There are several such languages that can be used for this purpose; one of them is SQL (Structured Query Language).

1. Data Definition Language:
DDL stands for Data Definition Language.
● It is used to define the database structure or pattern.
● It is used to create schemas, tables, indexes, constraints, etc. in the database.
● Using DDL statements, you can create the skeleton of the database.
● Data definition language is used to store metadata such as the number of tables and schemas, their names, indexes, the columns in each table, constraints, etc.
● Here are some tasks that come under DDL:
Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to add comments to the data dictionary.
These commands are used to update the database schema; that is why they come under Data Definition Language.

2. Data Manipulation Language:
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. Here are some tasks that come under DML:
Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete records from a table (all of them, or only those matching a condition).
Merge: It performs an UPSERT operation, i.e., insert or update.

3. Data Control Language:
DCL stands for Data Control Language. It is used to control access to the stored data. DCL commands are as follows:
1. GRANT
2. REVOKE
They are used to grant access permissions to, or revoke them from, any database user.

Interfaces in DBMS:
A database management system (DBMS) interface is a user interface which allows queries to be input to a database without using the query language itself.
● Menu-based interfaces for web clients or browsing
● Forms-based interfaces
● Graphical user interfaces
● Interfaces for the DBA

Data Models in DBMS:
● A data model is a logical structure of the database.
● Data models are used to show how data is stored, connected, accessed and updated in the database management system.
● We use a set of symbols and text to represent the information so that members of the organization can communicate and understand it.

Entity-Relationship Data Model:
● An ER model is the logical representation of data as objects and relationships among them.
● In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
● It is simple and easy to understand, so developers can use it to communicate with the stakeholders.

An ER diagram has the following three components:
Entities: An entity is a real-world thing. It can be a person, place, or even a concept.
Example: Teachers, Students, Course, Building, Department etc. are some of the entities of a School Management System.
● In the ER diagram, an entity is represented as a rectangle.
Attributes: An attribute is a real-world property of an entity. It is a characteristic of that entity.
Example: The entity teacher has properties like teacher id, salary, age, etc.
● An ellipse is used to represent an attribute.
Relationship: A relationship tells how two entities are related.
Example: Teacher works for a department.
● A diamond or rhombus is used to represent the relationship.

Weak Entity:
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute of its own. The weak entity is represented by a double rectangle.

Attributes:
● Key Attribute
The key attribute is used to represent the main characteristic of an entity. It represents a primary key. The key attribute is represented by an ellipse with the text underlined.
● Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
● Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute. For example, a student can have more than one phone number.
● Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse. For example, a person's age changes over time and can be derived from another attribute like Date of birth.
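The attribute kinds above can be made concrete with a minimal Python sketch. The `Student` class, its field names, and the sample date of birth and phone numbers are all hypothetical, chosen only for illustration: `roll` plays the key attribute, `phone_numbers` is multivalued, and the age is derived from the date of birth rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Student:
    roll: int                # key attribute (hypothetical name)
    name: str                # simple attribute
    date_of_birth: date      # stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if the birthday has not yet occurred in `today`'s year.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


# Sample data is made up for the example.
raman = Student(1, "Raman", date(2005, 6, 15), ["555-0101", "555-0102"])
print(raman.age_on(date(2024, 6, 14)))  # 18 (day before the birthday)
print(raman.age_on(date(2024, 6, 15)))  # 19 (on the birthday)
```

Passing the reference date in explicitly, instead of reading the system clock, keeps the derived value reproducible.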
Mapping Constraint:
A mapping constraint is a data constraint that expresses the number of entities to which another entity can be related via a relationship set. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship: When a single instance of an entity is associated with a single instance of another entity, it is called a one to one relationship.
2. One to Many Relationship: When a single instance of an entity is associated with more than one instance of another entity, it is called a one to many relationship.
3. Many to One Relationship: When more than one instance of an entity is associated with a single instance of another entity, it is called a many to one relationship. For example, many students can study in a single college, but a student cannot study in many colleges at the same time.
4. Many to Many Relationship: When more than one instance of an entity is associated with more than one instance of another entity, it is called a many to many relationship.

Keys:
● Keys play an important role in the relational database.
● A key is used to uniquely identify any record or row of data in a table. It is also used to establish and identify relationships between tables.
For example: In a Student table, ID is used as a key because it is unique for each student. In a PERSON table, passport number, license number and SSN are keys since they are unique for each person.

Types of key:
1. Super Key:
● A super key is a set of attributes which can uniquely identify a tuple.
● In an EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination is also a super key.
2. Candidate key:
● A candidate key is an attribute or set of attributes which can uniquely identify a tuple.
● CANDIDATE KEY is a set of attributes that uniquely identify tuples in a table.
A candidate key is a super key with no redundant attributes, i.e. a minimal super key.
● Every table must have at least one candidate key. A table can have multiple candidate keys but only a single primary key.
3. Primary key:
● It is the candidate key chosen to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table.
● In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We could instead select License Number or Passport Number as the primary key, since they are also unique.
4. Foreign key:
● A foreign key is a column of a table which is used to point to the primary key of another table.
● In a company, every employee works in a specific department, and employee and department are two different entities. So we can't store the information of the department in the employee table. That's why we link these two tables through the primary key of one table, which is stored as a foreign key in the other.
● The use of a foreign key is simply to link the attributes of two tables together with the help of a primary key attribute. Thus, it is used for creating and maintaining the relationship between the two relations.

SID    Name   Marks   Department   Course
       A      78      CS           C1
       B      60      EE           C2
       A      78      CS           C2
       B      60      EE           C3
       C      80      IT           C2

Participation Constraint:
● A participation constraint specifies the existence of an entity when it is related to another entity in a relationship type.
● Minimum cardinality is the minimum number of instances of an entity that can be associated with each instance of another entity.
● Maximum cardinality is the maximum number of instances of an entity that can be associated with each instance of another entity.
There are two types of participation constraints: Total and Partial Participation.

Total Participation:
● It specifies that each entity in the entity set must compulsorily participate in at least one relationship instance in that relationship set.
● Total participation is represented using a double line between the entity set and relationship set.

Partial Participation:
● It specifies that each entity in the entity set may or may not participate in a relationship instance in that relationship set.
● Partial participation is represented using a single line between the entity set and relationship set.

Generalization:
Generalization uses a bottom-up approach in which two or more lower-level entities combine to form a new higher-level entity. For example, if two entities have two common attributes, Name and Address, we can create a new generalized entity, Person, which holds the common attributes of both entities.

Specialization:
● It is a process in which an entity is divided into sub-entities.
● Specialization is a top-down process.
● The idea behind specialization is to find the subsets of entities that have a few distinguishing attributes.
● For example: Consider an entity Employee, which can be further classified into the sub-entities Technician, Engineer and Accountant because these sub-entities have some distinguishing attributes.

Aggregation:
Aggregation is a process in which a single entity alone is not able to make sense in a relationship, so the relationship of two entities acts as one entity.

Reduction of ER diagram to Table:
The database can be represented using the ER notations, and these notations can be reduced to a collection of tables.
1. A strong entity set with only simple attributes will require only one table in the relational model.
● Attributes of the table will be the attributes of the entity set.
● The primary key of the table will be the key attribute of the entity set.
2. A strong entity set with any number of composite attributes will require only one table in the relational model.
● During conversion, the simple attributes of the composite attributes are taken into account and not the composite attribute itself.
3. A strong entity set with any number of multivalued attributes will require two tables in the relational model.
● One table will contain all the simple attributes with the primary key.
● The other table will contain the primary key and all the multivalued attributes.
5. Translating Relationship Set into a Table
A relationship set will require one table in the relational model. Attributes of the table are:
● Primary key attributes of the participating entity sets.
● Its own descriptive attributes.
6. For Binary Relationships with Cardinality Ratios
The following four cases are possible:
Case-01: Binary relationship with cardinality ratio m:n
Case-02: Binary relationship with cardinality ratio 1:n
Case-03: Binary relationship with cardinality ratio m:1
Case-04: Binary relationship with cardinality ratio 1:1

● For Binary Relationship with Cardinality Ratio m:n
Here, three tables will be required:
1. A ( a1 , a2 )
2. R ( a1 , b1 )
3. B ( b1 , b2 )
● For Binary Relationship with Cardinality Ratio 1:n
Here, two tables will be required:
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )
● For Binary Relationship with Cardinality Ratio m:1
Here, two tables will be required:
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )
● For Binary Relationship with Cardinality Ratio 1:1
Here, two tables will be required. Combine 'R' with either 'A' or 'B'.
Way-01:
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )
Way-02:
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )

Extended ER Diagram:
● Enhanced entity-relationship (EER) diagrams are an expanded version of ER diagrams.
● EER models are helpful tools for designing databases with high-level models.
● With their enhanced features, you can plan databases more thoroughly by describing the properties and constraints with more precision.
An EER diagram provides you with all the elements of an ER diagram while adding:
● Subclasses and Super classes.
● Specialization and Generalization.
● Category or union type.
● Aggregation.
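As a rough illustration of the first item in that list, sub classes and super classes map naturally onto class inheritance. The sketch below is only an analogy in Python, not how a DBMS stores such entities; the attribute names (`shape_id`, `colour`, `side`, `radius`) are assumptions made up for the example.

```python
from dataclasses import dataclass
import math


@dataclass
class Shape:
    """Super class: attributes shared by every shape (hypothetical names)."""
    shape_id: int
    colour: str


@dataclass
class Square(Shape):
    """Sub class: inherits shape_id and colour, adds its own attribute."""
    side: float

    def area(self) -> float:
        return self.side ** 2


@dataclass
class Circle(Shape):
    """Another sub class of Shape with a different distinguishing attribute."""
    radius: float

    def area(self) -> float:
        return math.pi * self.radius ** 2


sq = Square(shape_id=1, colour="red", side=2.0)
ci = Circle(shape_id=2, colour="blue", radius=1.0)

print(sq.colour, sq.area())    # inherited attribute, then sub class behaviour
print(isinstance(sq, Shape))   # a member of a sub class is also a member of the super class
```

The `isinstance` check mirrors the EER rule that every sub class entity is also a super class entity.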
Features of EER Model:
● EER creates a design more accurate to database schemas.
● It reflects the data properties and constraints more precisely.
● It includes all modeling concepts of the ER model.
● A diagrammatic technique helps in displaying the EER schema.
● It includes the concepts of specialization and generalization.
● It is used to represent a collection of objects that is the union of objects of different entity types.

A. Sub Class and Super Class
● The sub class and super class relationship leads to the concept of inheritance.
● The relationship between sub class and super class is denoted with a symbol in the diagram.
1. Super Class
● A super class is an entity type that has a relationship with one or more subtypes.
● An entity cannot exist in the database merely by being a member of a sub class; it must also be a member of the super class.
For example: The Shape super class has the sub groups Square, Circle and Triangle.
2. Sub Class
● A sub class is a group of entities with unique attributes.
● A sub class inherits properties and attributes from its super class.
For example: Square, Circle and Triangle are sub classes of the Shape super class.

B. Specialization and Generalization
1. Generalization
● Generalization is the process of forming a general entity which contains the properties common to all the generalized entities.
● It is a bottom-up approach, in which two lower-level entities combine to form a higher-level entity.
● Generalization is the reverse process of specialization.
● It defines a general entity type from a set of specialized entity types.
● It minimizes the difference between the entities by identifying the common features.
2. Specialization
● Specialization is a process that defines a group of entities which is divided into sub groups based on their characteristics.
● It is a top-down approach, in which one higher-level entity can be broken down into two lower-level entities.
● It maximizes the difference between the members of an entity by identifying the unique characteristics or attributes of each member.
● It defines one or more sub classes for the super class and also forms the superclass/subclass relationship.

C. Category or Union
● A category represents a single sub class that has a relationship with more than one super class.
● It can be a total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds possession of a car) or a company. The category (sub class) Owner is a subset of the union of the three super classes Company, Bank and Person. A category member must exist in at least one of its super classes.

D. Aggregation
● Aggregation is a process that represents a relationship between a whole object and its component parts.
● It abstracts a relationship between objects and views the relationship as an object.
● It is a process in which two entities are treated as a single entity.

Degree of Relationship:
● The degree of a relationship is the number of entity types that participate in the relationship.
● By looking at an E-R diagram, we can simply tell the degree of a relationship, i.e. the number of entity types connected to a relationship is the degree of that relationship.
For example, suppose we have two entity types ‘Customer’ and ‘Account’ and they are linked using a primary key and foreign key. We can say that the degree of the relationship is 2 because two entities are taking part in the relationship.
Based on the number of entity types that are connected, we have the following degrees of relationship:
● Unary
● Binary
● Ternary
● N-ary

Unary (degree 1):
A unary relationship exists when both the participating entity types are the same. In this case we say that the degree of the relationship is 1.
● For example, suppose we have many students who belong to a particular club, like a dance club or basketball club, and some of them are club leads. So, a particular group of students is managed by their respective club lead, and the club leads are chosen from the students.
● So, ‘Student’ is the only entity type participating here.
● We can say that the minimum degree of a relationship is one.

Binary (degree 2):
● A binary relationship exists when exactly two entity types participate.
● When such a relationship is present we say that the degree is 2.
● It is easy to deal with such relationships as they can be easily converted into relational tables.
For example, we have two entity types ‘Customer’ and ‘Account’, where each ‘Customer’ has an ‘Account’ which stores the account details of the ‘Customer’. Since we have two entity types participating, we call it a binary relationship.

Ternary (degree 3):
● A ternary relationship exists when exactly three entity types participate.
● When such a relationship is present we say that the degree is 3.
● As the number of entity types in the relationship increases, it becomes more complex to convert them into relational tables.
For example, we have three entity types ‘Employee’, ‘Department’ and ‘Location’. The relationship between these entities is defined as: an employee works in a department, and an employee works at a particular location. We have three entity types participating in the relationship, so it is a ternary relationship. The degree of this relationship is 3.

N-ary (n degree):
● An N-ary relationship exists when ‘n’ entity types are participating.
● So, any number of entity types can participate in a relationship. There is no limit on the maximum number of entity types that can participate.

Database Structure:
A DBMS is software that allows access to data stored in a database and provides an easy and effective method of:
● Defining the information.
● Storing the information.
● Manipulating the information.
● Protecting the information from system crashes or data theft.
● Differentiating access permissions for different users.
A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
The functional components of a database system can be broadly divided into the storage manager and the query processor components. The storage manager is important because databases typically require a large amount of storage space. The query processor is important because it helps the database system simplify and facilitate access to data.

1. Query Processor:
● It translates the requests (queries) received from the end user via an application program into instructions.
● It also executes the user request which is received from the DML compiler.
2. Storage Manager:
● The storage manager is a program that provides an interface between the data stored in the database and the queries received. It is also known as the Database Control System.
● It maintains the consistency and integrity of the database by applying the constraints, and it executes the DCL statements.
● It is responsible for updating, storing, deleting, and retrieving data in the database.
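To see a DBMS applying constraints in practice, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `department` and `employee` tables, their columns and the sample rows are assumptions made up for the example; SQLite stands in for any relational DBMS. The schema is defined with DDL, rows are added with DML, and two inserts that would break a primary key or a foreign key are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled

# DDL: define the schema together with the constraints to be enforced.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,
           dept_id INTEGER NOT NULL REFERENCES department(dept_id)
       )"""
)

# DML: insert rows that satisfy the constraints (sample data is made up).
conn.execute("INSERT INTO department VALUES (10, 'CS')")
conn.execute("INSERT INTO employee VALUES (1, 'Raman', 10)")


def try_insert(sql: str) -> bool:
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_key_ok = try_insert("INSERT INTO employee VALUES (1, 'Jay', 10)")  # emp_id 1 already exists
dangling_fk_ok = try_insert("INSERT INTO employee VALUES (2, 'Raj', 99)")    # department 99 does not exist

rows = conn.execute("SELECT emp_id, name FROM employee").fetchall()
print(duplicate_key_ok, dangling_fk_ok, rows)  # False False [(1, 'Raman')]
```

Both bad inserts fail inside the database engine rather than in application code, which is the practical benefit of declaring constraints in the schema.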