Chapter Seven
Basic Data Storage

Importance of this chapter
With the rapid growth of diverse applications across the globe, managing the huge volume of user information they produce is of utmost importance. The main focus of this chapter is how data are stored in the data stores behind software, websites, automated systems and other tools, and how those data stores are established and managed.

Expected outcome of this chapter
• Describe database and DBMS concepts, terminology, and architecture.
• Describe the basic concepts necessary for a good understanding of database design and implementation.
• Describe the conceptual modeling techniques used in database systems.
• Describe the relational data model, its integrity constraints and update operations, and the operations of the relational algebra.

7.1. An introduction to database
Database: A database is a structured and self-describing collection of data. It stores data and data definitions (metadata) and manages data consistency and integrity. A database contains data and metadata stored in a table-like format with columns (attributes) and rows (records). Metadata describes the data definitions. The tool that is used to manage a database is called a Database Management System (DBMS). It provides common functionalities and interfaces for managing and controlling database activities.

Database functionalities:
• Data Storage Management
• Data Transformation and Presentation
• Security
• Multi-user Access Control
• Backup and Recovery
• Data Integrity
• Database Access Language
• Database Communication Interface
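The table-like format described in this section, with columns, rows and metadata, can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes SQLite (through Python's built-in sqlite3 module) as the DBMS, and the table name students, its columns and the sample rows are hypothetical values chosen for the example.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database and return (metadata, rows)."""
    conn = sqlite3.connect(":memory:")  # temporary database held in memory

    # The table definition is part of the metadata kept by the DBMS.
    conn.execute("CREATE TABLE students (sid INTEGER, name TEXT, gpa REAL)")

    # Each inserted tuple becomes one row (record) of the table.
    conn.executemany(
        "INSERT INTO students (sid, name, gpa) VALUES (?, ?, ?)",
        [(1, "Asha", 3.6), (2, "Rahim", 3.1)],
    )

    # Ask SQLite for the metadata: column names and declared types.
    metadata = [
        (row[1], row[2]) for row in conn.execute("PRAGMA table_info(students)")
    ]

    # Ask for the data itself.
    rows = conn.execute(
        "SELECT sid, name, gpa FROM students ORDER BY sid"
    ).fetchall()
    conn.close()
    return metadata, rows


if __name__ == "__main__":
    metadata, rows = build_demo()
    print("Columns (metadata):", metadata)
    print("Rows (data):", rows)
```

The same database answers questions about its data and about its own structure, which is what "self-describing" means in the definition above.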
Applications of databases:
• Banking: transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions

Storing data in a database has advantages over storing data in a file system (.doc files, Excel files, etc.). In a database, data can be easily manipulated (added, deleted, updated), which is difficult in a file processing system because there are many different file types.

Advantages:
• High data quality, integrity, and consistency
• Reduced data redundancy and application maintenance
• Easy access and sharing
• Scalable
• Improved security
• Specialized and productive management tool

Major disadvantages:
• Increased complexity
• Greater impact of failure

Problems related to topics
Problem 1: What is a database? Why do we use a database?
Solution: A database is a structured and self-describing collection of data that is used to store data and data definitions (metadata) and to manage data consistency and integrity. A database is used for:
• Data Storage Management
• Data Transformation and Presentation
• Security
• Multi-user Access Control
• Backup and Recovery
• Data Integrity
• Database Access Language
• Database Communication Interface

Problem 2: What are the advantages and disadvantages of a database?
Solution: The advantages and disadvantages of a database are given below.
Advantages:
• High data quality, integrity, and consistency
• Reduced data redundancy and application maintenance
• Easy access and sharing
• Scalable
• Improved security
• Specialized and productive management tool
Major disadvantages:
• Increased complexity
• Greater impact of failure

Exercises:
1. What are the contents of a database? What is metadata?
2. Write the names of some applications where a database is used.
3.
Why is storing data in a file system more difficult than storing data in a database?

7.2. Database architecture: Table, Fields and Records
Relational Database: A relational database (RDB) is a collection of data sets organized by tables, records and columns. RDBs also establish well-defined relationships between database tables. In the simplest terms, a relational database is one that presents information in tables with rows and columns and shows the relations between them.

All the database design and data manipulation tasks are carried out by a Database Management System (DBMS). It is computer software designed for the purpose of managing databases based on a variety of data models. Regardless of the data model, the logical structure of the database is called the Schema, and the actual content of the database at a particular point in time is called an Instance.

In a database, data are organized in relations (tables), which may be linked by constraints. Columns define what information is to be stored and rows contain the individual records. For example, in a university database there can be tables to store information about courses, instructors, students, sections, etc. Figure (1) is a sample database design for a university.

Columns are called fields, and each row contains a record which stores a value for each field. For example, figure (2) shows a database table for storing the data about Instructors. Here ID, name, dept_name and salary are the fields, and {2222, Einstein, Physics, 95000} is a record.

Database table properties:
• Each table has a unique name.
• All values in a row describe one instance.
• All values in a column are of the same kind.
• Each row is distinct.
• A cell of the table holds a single value.
• Each column has a unique name.
• There is no ordering of rows.
• A NULL value can also be stored in a table.

Problems related to topics
Problem 1: What is DBMS?
Solution: The database design and data manipulation tasks are carried out by a Database Management System (DBMS).
It is computer software designed for the purpose of managing databases based on a variety of data models.

Problem 2: What are schema and instance?
Solution: The logical structure of the database is called the Schema, and the actual content of the database at a particular point in time is called an Instance.

Problem 3: What are the properties of a database table?
Solution: Database table properties:
• Each table has a unique name.
• All values in a row describe one instance.
• All values in a column are of the same kind.
• Each row is distinct.
• A cell of the table holds a single value.
• Each column has a unique name.
• There is no ordering of rows.
• A NULL value can also be stored in a table.

Exercises:
1. What is a relational database? How are data stored in a relational database? What is a field?
2. Find the tables necessary for a Library Management System / Hospital Management System.
3. Identify the fields of each table for the above-mentioned systems.

7.3. Designing a database: Entity, Attributes and Relationships
Data Model: A detailed model that captures the overall structure of organizational data while being independent of any implementation considerations. Data modelling involves examining the data objects in a system and identifying the relationships between these objects. There are different ways to model a database:
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semi-structured data model (XML)

The Entity Relationship Diagram (ERD) is a form of data modelling which is widely used in designing a database. This model uses a graphical representation of entities and their relationships to each other, based on which tables are created in a database. The primary purpose of an ERD is to document the logical structure of a database.

Entity: An entity is an object that exists and is distinguishable from other objects. Example: a specific person, company, event, plant.
Attribute: An entity is represented by a set of attributes, that is, descriptive properties possessed by the entity. Example: instructor = (ID, name, street, city, salary), course = (course_id, title, credits)

Relationship: A relationship is an association among several entities. Example: students are enrolled in courses; an instructor teaches courses.

The primary task while modelling a database is to determine all the entities and their attributes. Entities are denoted using a rectangle. For example, in a university database the following entities may be present:
[Figure: entities of a university database]

Attributes are denoted using an ellipse. Entity relationships are denoted using a rhombus. For example, for the Students entity the attributes would be sid, name, age and gpa, and for Courses the attributes would be cid and title. Both entities are related using the relation enrol as follows:
[Figure: ER diagram of Students and Courses related by enrol]

Problems related to topics:
Problem 1: What is a data model? Write the names of different data models.
Solution: Data Model: A detailed model that captures the overall structure of organizational data while being independent of any implementation considerations. Different kinds of data models are as follows:
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semi-structured data model (XML)

Problem 2: What are entity and attribute? Discuss with example.
Solution: Entity: An entity is an object that exists and is distinguishable from other objects. Example: a specific person, company, event, plant.
Attribute: An entity is represented by a set of attributes, that is, descriptive properties possessed by the entity. Example: instructor = (ID, name, street, city, salary), course = (course_id, title, credits).
For example, in a university management system, there can be entities like faculty, departments, classrooms, students and courses. Each of these entities will have several attributes.
Suppose that the attributes of the Students entity are SID, name, age and GPA. Another entity, Courses, has attributes like CID and title. This is demonstrated in the diagram below:
[Figure: Students and Courses entities with their attributes]

Exercises:
1. Find all the entities of an online bookshop management database. Find out all the attributes for each entity.

7.4. Designing a database: Keys
It is important that any entity in an entity set be uniquely identifiable. In practice, we use the values of certain attributes to uniquely identify an entity. For example, in a bank database the customer's full information can be brought up using the customer's SSN.

In a database table, keys are defined to identify each record distinctly. A key can be a single attribute or a set of attributes. For example, in a Person table a person can be uniquely identified by the SSN or by a combination of First Name, Last Name and SSN. In practice, these combinations of attributes are classified using four types of keys:
• Super key
• Candidate key
• Primary key
• Foreign key

Primary key: A primary key is the candidate key, often a single field, that is most appropriate to be the main reference key for the table. The primary key must contain unique values, must never be null and must uniquely identify each record in the table. For example, in a Students table, using only {StudentID} it is possible to identify each record distinctly. So, this is the primary key for the table.

Foreign Key: A foreign key is generally a primary key from one table that appears as a field in another table to establish a relation between the first and second table. For example, consider the relationship between Students and Courses. Student information is stored in the Students table and course information is stored in the Courses table. How can these two tables show which student takes which course? This is done using a foreign key field {courseId} in the Students table, which contains values of the {courseId} field of the Courses table.
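The primary key and foreign key described above can be tried out with a short sketch. It assumes SQLite through Python's built-in sqlite3 module as the DBMS. The table and field names Students, Courses, StudentID and courseId come from the text; the name and title columns, the sample rows and the helper function names are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued.

```python
import sqlite3


def open_university_db():
    """Create the Students and Courses tables with one row in each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key checks in SQLite

    # courseId is the primary key of Courses.
    conn.execute(
        "CREATE TABLE Courses (courseId INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # StudentID is the primary key of Students; courseId is a foreign key
    # that must match an existing courseId in Courses.
    conn.execute(
        "CREATE TABLE Students ("
        " StudentID INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " courseId INTEGER REFERENCES Courses(courseId))"
    )
    conn.execute("INSERT INTO Courses VALUES (101, 'Databases')")
    conn.execute("INSERT INTO Students VALUES (1, 'Asha', 101)")
    conn.commit()
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was stored, False if a key constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def students_with_courses(conn):
    """Follow the foreign key to list each student with the course title."""
    return conn.execute(
        "SELECT s.name, c.title FROM Students s"
        " JOIN Courses c ON s.courseId = c.courseId ORDER BY s.StudentID"
    ).fetchall()


if __name__ == "__main__":
    conn = open_university_db()
    sql = "INSERT INTO Students (StudentID, name, courseId) VALUES (?, ?, ?)"
    print(try_insert(conn, sql, (1, "Rahim", 101)))  # StudentID 1 is already used
    print(try_insert(conn, sql, (2, "Rahim", 999)))  # course 999 does not exist
    print(try_insert(conn, sql, (2, "Rahim", 101)))  # valid row
    print(students_with_courses(conn))
```

Following the text, the foreign key is placed in the Students table, so in this sketch each student row refers to a single course.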
Problems related to topics
Problem 1: Why do we need to use keys in a database? Write the names of the keys used in databases.
Solution: In a database table, keys are defined to identify each record distinctly. It is important that any entity in an entity set be uniquely identifiable. In practice, we use the values of certain attributes to uniquely identify an entity; these attributes are called keys. A key can be a single attribute or a set of attributes. Different kinds of keys are used in designing a database. They are:
• Super key
• Candidate key
• Primary key
• Foreign key

Problem 2: What is a Primary key? Describe with example.
Solution: Primary key: A primary key is the candidate key, often a single field, that is most appropriate to be the main reference key for the table. The primary key must contain unique values, must never be null and must uniquely identify each record in the table. For example, in a Students table, using only {StudentID} it is possible to identify each record distinctly. So, this is the primary key for the table.

Problem 3: For a movie database, identify the Primary Key and Foreign Key in the Directors and Movies tables.
Solution: In the Directors table, DirectorID is the primary key. Similarly, in the Movies table, MovieID is the primary key. To establish a relationship between these two tables, the DirectorID field would be used as a foreign key in the Movies table, referencing the primary key field DirectorID of the Directors table.

Exercises:
1. What is a Foreign Key? Why do we use a Foreign Key in a database? Explain with example.
2. For an online book shop database, identify the Primary Key and Foreign Key in the Customers and Orders tables.

Points to Remember
1. A database is a structured and self-describing collection of data.
2. The logical structure of the database is called the Schema, and the actual content of the database at a particular point in time is called an Instance.
3.
A database contains data and metadata stored in a table-like format with columns (attributes) and rows (records). Columns are called fields, and each row contains a record which stores a value for each field.
4. An entity is an object that exists and is distinguishable from other objects. It is represented by a set of attributes, that is, descriptive properties possessed by the entity. Entities are connected by different relationships.
5. Keys are defined to identify each record distinctly. Several types of keys are used in designing a database.

Vocabulary
DBMS – Database Management System. A tool to manage or manipulate data using a database.
RDB – Relational Database. A collection of data sets organized by tables, records and columns.
ERD – Entity Relationship Diagram. A form of data modelling which uses a graphical representation of entities and their relationships to each other, based on which tables are created in a database.
Schema – The design of the database. The logical structure of the database is called the Schema.
Instance – The actual content of the database at a particular point in time.
Key – Keys are defined to identify each record distinctly. A key can be a single attribute or a set of attributes.
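The difference between Schema and Instance can be seen in a small sketch. It assumes SQLite through Python's built-in sqlite3 module as the DBMS, and uses the instructor fields and the sample record from section 7.2. The function name and the column types are hypothetical choices made for the example.

```python
import sqlite3


def schema_and_instances():
    """Return (schema_unchanged, instance_before, instance_after)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor ("
        " ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
    )

    def schema():
        # SQLite keeps the table definition (the schema) in sqlite_master.
        return conn.execute(
            "SELECT sql FROM sqlite_master"
            " WHERE type = 'table' AND name = 'instructor'"
        ).fetchone()[0]

    def instance():
        # The instance is whatever the table contains right now.
        return conn.execute(
            "SELECT ID, name, dept_name, salary FROM instructor ORDER BY ID"
        ).fetchall()

    schema_before, instance_before = schema(), instance()
    conn.execute("INSERT INTO instructor VALUES (2222, 'Einstein', 'Physics', 95000)")
    schema_after, instance_after = schema(), instance()
    conn.close()
    return schema_before == schema_after, instance_before, instance_after


if __name__ == "__main__":
    unchanged, before, after = schema_and_instances()
    print("Schema unchanged:", unchanged)
    print("Instance before insert:", before)
    print("Instance after insert:", after)
```

Adding a record changes the instance, while the schema stays the same until the table design itself is altered.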