Database Design and Applications (CSIZG518/SEZG518/SSZG518) (S2-22)
BITS Pilani, Pilani Campus
Prof Uma Maheswari

DDA course content
1. Introduction and Overview of DBMS
2. Conceptual Database Design (ER and EER Modeling)
3. Relational Model
4. Relational Algebra and Calculus
5. SQL
6. Schema Refinement and Normal Forms
7. Disk Storage
8. Hashing and Indexing
9. Transaction Management and Concurrency Control
10. Database Recovery
11. DB security
12. Query processing and optimization
13. Schema-less DB - NoSQL introduction

Session 1
Introduction to Database Management Systems (DBMS): Concepts and Architecture

Learning objectives
– Introduction to RDBMS
– RDBMS vs traditional file systems
– Three-schema architecture and data independence
– DBMS architecture
– Data dictionary, DB design phases
Refer: T1-Chapter 1 and 2; RL: 1.1, 1.2

What is a Database?
A single unit of information is called a datum; data is the plural. By data, we mean known facts that can be recorded and that have implicit meaning. A database is:
– A collection of related data
– A representation of some aspect of the real world
– A logically coherent collection of data with inherent meaning
– Built for a specific purpose
This collection of related data with an implicit meaning is a database.

Types of Data
Type        Examples
Character   A, N, $
Number      12, 56.9, 54/78, -5.6777, 89,677
Boolean     1 or 0, true or false
Sets        {RED, YELLOW, GREEN, BLUE}, {USD, INR, AUD, EURO, POUND}
Date        03-08-2019, 1/5/2018
Currency    $56678, Rs 34900

What is a Database?
DATABASE: A collection of related data with an implicit meaning is called a database.
Examples of databases:
– Employee database (everything about the employees of one or more branches of an organization)
– Sales database (everything about sales and salespersons of one or more branches of an organization)
– User database (everyone who uses the system, with their credentials and log details)
and so on.

Database Size
• A database can be large or small, depending on how the application uses data: its volume, variety, velocity, value, and validity.
• The term large database is used for databases with several dozen gigabytes of data and a schema with more than 30 or 40 distinct entity types. This covers a wide array of databases used in government, industry, and financial and commercial institutions.
• Application systems for these databases are called transaction processing systems (TPS) because of the large transaction volumes and rates that are required.

[Figure: Types of data used by applications]

Types of Databases and Database Applications
Traditional database applications
– Store textual or numeric information
Multimedia databases
– Store images, audio clips, and video streams digitally
Geographic information systems (GIS)
– Store and analyze maps, weather data, and satellite images
Data warehouses and online analytical processing (OLAP) systems
• Extract and analyze useful business information from very large databases
• Support decision making
Real-time and active database technology
• Control industrial and manufacturing processes
Time series DB
• Financial data
• The volatility of stock trading

DBMS and its OPERATIONS

What is a Database Management System (DBMS)?
• A collection of programs
• Enables users to create and maintain a database
Def: The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.
How do we share the DB? Through a server.

[Figure: Traditional file processing]

[Figure: How does a DBMS look?]

[Figure: DBMS environment]

What operations can you do with a DBMS?
– Defining a database
– Constructing the database
– Manipulating a database
– Sharing a database
– Querying a database

DBMS operations in detail
Defining a database
– Specify the data types, structures, and constraints of the data to be stored.
– The database definition, or descriptive information, is also stored by the DBMS in the form of a database catalog or dictionary; it is called metadata.
Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS.
Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database, and generating reports from the data.

What is the sharing operation in a DBMS?
– Sharing a database allows multiple users and programs to access the database simultaneously.

How do we access a database from application programs?
An application program accesses a database by sending queries to the DBMS. There are three main integration approaches:
– Embed SQL in the host language (Embedded SQL)
– Use a special API to call SQL commands (JDBC, ODBC)
– Allow 'external' code to be executed from within SQL
What is a query? A DBMS operation that causes some data to be retrieved.

Embedded SQL
{
  int a;
  /* ... */
  EXEC SQL SELECT salary INTO :a FROM Employee WHERE SSN=876543210;
  /* ... */
  printf("The salary is %d\n", a);
  /* ... */
}

Using SQL in an API call
app.post("/api/getAllcustomers", (req, res) => {
  const { eid } = req.body;
  console.log("eid is ..", eid);
  // Parameterized query: $1 is bound to eid
  pool.query(
    "SELECT * FROM customers WHERE customercity = $1",
    [eid],
    (error, results) => {
      if (error) {
        console.log(error);
        // Report the failure so the client is not left waiting
        res.status(500).json({ error: "query failed" });
      } else {
        res.status(200).json(results.rows);
      }
    }
  );
});

Allow 'external' code to be executed from within SQL
Example of Python code execution in SQL Server:
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
a = 1
b = 2
c = a + b
print("Example instruction on Python")
print("Result =", c)
';
To allow external scripts in the Python language, you must enable the system parameter "external scripts enabled" in SQL Server. This is done with the system procedure sp_configure:
EXECUTE sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

[Figure: Mobile app accessing a database]

CASE STUDY: UNIVERSITY database design

Examples of Queries and Updates
Examples of queries:
– Retrieve the transcript
– List the names of students who took the section of the 'Database' course offered in fall 2008 and their grades in that section
– List the prerequisites of the 'Database' course
Examples of updates:
– Change the class of 'Smith' to sophomore
– Create a new section for the 'Database' course for this semester
– Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester

Features of DBMS
The features of a DBMS are: data independence, backup and restore, transaction and concurrency control, data security, and data integrity.

Protection for the DB
Protection includes:
– System protection (against hardware or software malfunction or crashes)
– Security protection (against unauthorized or malicious access)
[Figure: Hacking a DB]

What does "maintaining the DB system" mean?
– Allow the system to evolve as requirements change over time
Maintenance is therefore the set of activities that keep a database running smoothly:
1. It is performed by people who are comfortable and familiar with the database system and the specifics of the particular database.
2. Databases hold a library of information in a well-organized, accessible format. Maintenance keeps the database clean and well organized so that it does not lose functionality.
3. Backing up the data
4. Checking for signs of corruption in the database
5. Server maintenance

Database approach
Characteristics of the database approach:
1. Self-describing nature of a database system
2. Insulation between programs and data, and data abstraction
3. Support of multiple views of the data
4. Sharing of data and multiuser transaction processing

Database approach
1. Self-describing nature of a database system
The database system contains a complete definition of its structure and constraints.
Metadata
– Describes the structure of the database
The database catalog is used by:
– DBMS software
– Database users who need information about the database structure

Database approach
2. Insulation between programs and data implies "data abstraction"
Program-data independence
– The structure of data files is stored in the DBMS catalog, separately from the access programs
Program-operation independence
– Operations are specified in two parts:
• The interface includes the operation name and the data types of its arguments
• The implementation can be changed without affecting the interface

Database approach
3. Support of multiple views of the data
View
– A subset of the database
– Contains virtual data derived from the database files but not explicitly stored
Multiuser DBMS
– Users have a variety of distinct applications
– Must provide facilities for defining multiple views
[Figure: Multiple views of the data]

Database approach
4. Sharing of data and multiuser transaction processing
Allow multiple users to access the database at the same time.
Concurrency control software
– Ensures that several users trying to update the same data do so in a controlled manner
• The result of the updates is correct
Online transaction processing (OLTP) applications
– Multiuser DBMS software must ensure that concurrent transactions operate correctly and efficiently.

What is a Transaction in a DB?
A transaction may cause some data to be read and some data to be written into the database, e.g., when you credit or debit your bank account.
Example: moving 500 from A's account to B's account
A's account:
  Open_Account(A)
  Old_Balance = A.balance
  New_Balance = Old_Balance - 500
  A.balance = New_Balance
  Close_Account(A)
B's account:
  Open_Account(B)
  Old_Balance = B.balance
  New_Balance = Old_Balance + 500
  B.balance = New_Balance
  Close_Account(B)

Three-schema architecture
[Figure: Concept of the 3 levels or layers of a DB. Each layer has to handle two issues: how to store the data and how to view the stored data.]
[Figure: 3-layer architecture]

Three-schema architecture
The three levels or layers of the database (DBMS) architecture are the external level, the conceptual (or logical) level, and the physical (or internal) level.
Each level has a schema and a data abstraction: there is a schema at each level of the database architecture, and there is an abstraction at each level as well. These abstractions are called the external, conceptual, and physical level abstractions respectively.
These abstractions are needed to view the data, while the schema at each level describes the data at that level of abstraction.

External Schema
External schemas, which usually are also in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users.
Any given database has exactly one conceptual schema and one physical schema, because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users.
Each external schema consists of a collection of one or more views and relations from the conceptual schema.
SQL queries are placed at this level.

Conceptual Schema
The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database.
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)
The design of conceptual schemas is called "conceptual database design".

Physical Schema
The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes.
We must decide what file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations. Indexes can be based on hashing or on trees.
Decisions about the physical schema are based on an understanding of how the data is typically accessed.
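As a rough illustration of the three levels just described, the sketch below uses Python's built-in sqlite3 module. This is an assumption made for convenience; the slides do not prescribe a DBMS. The table stands for the conceptual schema, the index for a physical-schema decision, and the view for an external schema. The names idx_faculty_fname and faculty_public, and the sample row, are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the stored relation.
conn.execute("CREATE TABLE Faculty (fid TEXT PRIMARY KEY, fname TEXT, sal REAL)")

# Physical schema decision: an auxiliary index to speed up lookups by name.
conn.execute("CREATE INDEX idx_faculty_fname ON Faculty (fname)")

# External schema: a view for users who must not see salaries.
conn.execute("CREATE VIEW faculty_public AS SELECT fid, fname FROM Faculty")

conn.execute("INSERT INTO Faculty VALUES ('F1', 'Rao', 90000.0)")

# Users of the external schema query the view, not the stored table.
rows = conn.execute("SELECT * FROM faculty_public").fetchall()
print(rows)
```

Dropping or adding the index changes only how the data is accessed; the view and the query against it stay the same.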
The process of arriving at a good physical schema is called physical database design.

[Figure: Example 1: 3-level schema]
[Figure: Example 2: 3-level schema]

Data Independence
A database system normally contains a lot of data in addition to the users' data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of the users. If all the data were interdependent, every such change would be a tedious and highly complex job.

DATA INDEPENDENCE
Data independence is the property of a DBMS that lets you change the database schema at one level of a database system without having to change the schema at the next higher level. Data independence helps you keep data separated from all the programs that make use of it.
The data abstraction at each level makes this data independence possible.
Types of data independence
There are two types: logical and physical data independence.

DATA INDEPENDENCE
1. Logical data independence:
Logical data is data about the database; that is, it describes how data is organized inside. For example, a table (relation) stored in the database and all the constraints applied to that relation.
Logical data independence is the ability to change this logical description without affecting the external schemas, the application programs, or the data actually stored on disk. If we make some changes to the table format, the data residing on the disk should not change.

DATA INDEPENDENCE
2. Physical data independence:
All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the ability to change the physical storage without impacting the schema or logical data.
For example, if we want to change or upgrade the storage system itself, say by replacing hard disks with SSDs, this should not have any impact on the logical data or schemas.

Data Independence
Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another level. The layers are independent of each other but mapped to each other.

Mapping
Mapping is used to transform requests and responses between the various levels of the database architecture. There are two types of mapping: external/conceptual mapping and conceptual/internal mapping.
In external/conceptual mapping, the request is transformed from the external level to the conceptual schema.
In conceptual/internal mapping, the DBMS transforms the request from the conceptual level to the internal level.
Mapping adds processing time, so it is a poor fit for small databases.
[Figure: Mapping]

[Figure: DBMS architecture]

Client-server concept
The DBMS may be on a centralized machine or server. The clients (end users or programs, whether standalone or web/mobile applications) access the database on the server. Several users or apps may be trying to read or write data in the database located on the server at the same time.

Types of DBMS Architecture
DBMS architectures are either single-tier or multi-tier. An n-tier architecture divides the whole system into n related but independent modules, which can be modified, altered, changed, or replaced independently.
There are different layers, such as the presentation layer (UI), the application layer (business logic or programs), and the data layer, where the actual database is stored.
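The separation into layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accounts table, the function names (open_database, fetch_balance, can_withdraw, render), and the business rule are all hypothetical, and in a real n-tier deployment each layer would typically run as a separate process or on a separate machine rather than as functions in one script.

```python
import sqlite3

# Data layer: owns the database and is the only code that issues SQL.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('A', 1200)")
    return conn

def fetch_balance(conn, owner):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE owner = ?", (owner,)
    ).fetchone()
    return None if row is None else row[0]

# Application layer: business logic, expressed without any SQL.
def can_withdraw(conn, owner, amount):
    balance = fetch_balance(conn, owner)
    return balance is not None and balance >= amount

# Presentation layer: turns the result into something the user sees.
def render(owner, amount, allowed):
    verdict = "allowed" if allowed else "refused"
    return f"Withdrawal of {amount} by {owner}: {verdict}"

conn = open_database()
message = render("A", 500, can_withdraw(conn, "A", 500))
print(message)
```

Because each layer talks only to the one below it, the storage engine or the user interface can be replaced without touching the business rule.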
In a 1-tier architecture the user works directly on the DBMS, and all changes are made by the DBMS itself. There is no separate client or server.

DBMS Architecture
In a 2-tier architecture the presentation layer (UI or app) runs on your mobile or computer as the client program, and the server holds the database. Because the database is on the server, the data can be kept safe from unauthorised users.
[Figure: DBMS architecture, 2-tier]

DBMS ARCHITECTURE: 3 TIER
A 3-tier architecture has 3 layers or modules: the client or presentation layer, which is the app on a mobile or computer; the application layer on a server, which contains the business logic modules; and the data layer, which contains the actual databases.
[Figure: DBMS architecture, 3-tier]

Example of Three-Tier Architecture
A common environment for using a database has three tiers of processors:
1. Web servers: talk to the user.
2. Application servers: execute the business logic.
3. Database servers: get what the app servers need from the database.

[Figure: What is stored in a DB?]

What is METADATA?
Metadata: data about data is metadata. It describes the data itself.
[Figure: Example of metadata]

DB or SYSTEM CATALOG
Every database stores information about its objects. This information can be structure, definition, purpose, storage, number of columns and records, dependencies, access rights, owner, and so on. This useful information about the data in the database is also called metadata.
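For instance, SQLite exposes its catalog as an ordinary table named sqlite_master, and column-level metadata through PRAGMA table_info. SQLite is used here only as a convenient example through Python's sqlite3 module; catalog names and layouts differ between DBMSs. The delivery table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery (delivery_id INTEGER PRIMARY KEY, delivery_date TEXT)"
)

# The catalog is itself queried with SQL: one row per database object.
catalog_rows = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# table_info rows are (cid, name, type, notnull, dflt_value, pk);
# keep only the column name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(delivery)")]

print(catalog_rows)
print(columns)
```

The DBMS created and filled the catalog entry on its own when the CREATE TABLE statement ran; the program never inserted anything into sqlite_master.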
These metadata are also stored as rows and columns of a table. The collection of these metadata is stored in the system catalog or data dictionary. When the database is created, a system catalog is automatically created as well. This catalog keeps track of the objects created and of the changes made to each object in the database. Every database has its own system catalog.

Database / System Catalog
The information stored in the catalog is called metadata, and it describes the structure of the primary database.
[Figure: Example of a database catalog]

SYSTEM or DB CATALOG
Who creates the system catalog?
When you create a database, the system catalog is automatically created by the DBMS.

SYSTEM CATALOG
The system catalog contains information such as the following:
• User accounts and default settings
• Privileges and other security information
• Performance statistics
• Object sizing
• Object growth
• Table structure and storage
• Index structure and storage
• Information on other database objects, such as views, synonyms, triggers, and stored procedures
• Table constraints and referential integrity information
• User sessions
• Auditing information
• Internal database settings
• Locations of database files

What is a Query?
Queries: questions involving the data stored in a DBMS.
Types of query languages:
– Formal query languages: relational algebra and relational calculus
– Commercial query language: SQL statements

DB Architecture
Database architecture deals with the design, development, implementation, and maintenance of computer programs that store and organize information for businesses, agencies, and institutions.
The development and implementation of software to meet the needs of users is done by the data architect.
The design of a DBMS depends on its architecture, which can be centralized or hierarchical.
Data abstraction: hiding the implementation and storage details of the database.
[Figure: Example of DB architecture; the green boxes are tables]

Data Dictionary
Creating a data dictionary for an online delivery system

Delivery table: contains details about the delivery
Primary Key | Attribute               | Data Type | Size | Description
Y           | Delivery_id             | INTEGER   | -    | Unique ID of delivery
            | Delivery_date           | Date      | -    | Date of the delivery
            | Delivery_person_name    | VARCHAR   | 50   | Name of the person who does the delivery of a specific order
            | Delivery_person_contact | VARCHAR   | 20   | Contact of the person who does the delivery of a specific order

Order table: contains details about the order
Primary Key | Attribute   | Data Type | Size | Description
Y           | Order_id    | INTEGER   | -    | Unique ID of order
            | Cust_id     | INTEGER   | -    | Unique ID of customer
            | Delivery_id | INTEGER   | -    | Unique ID of delivery
            | Date        | Date      | -    | Date of a specific order
            | Branch_id   | INTEGER   | -    | Unique ID of the branch

[Figures: DBMS functions, applications, advantages, and disadvantages]

WHAT IS ERP OR CRM?
These packages identify a set of common tasks (e.g., inventory management, human resources planning, financial analysis) encountered by a large number of organizations and provide a general application layer to carry out these tasks.
The data is stored in a relational DBMS, and the application layer can be customized to different companies. This leads to lower overall costs for the companies, compared to the cost of building the application layer from scratch.
Extending database capabilities for new applications
– Extensions to better support specialized requirements for applications
– Enterprise resource planning (ERP)
– Customer relationship management (CRM)
Databases versus information retrieval
– Information retrieval (IR)
• Deals with books, manuscripts, and various forms of library-based articles

ERP
ERP applications are most commonly deployed in a distributed and often widely dispersed manner. While the servers may be centralized, the clients are usually spread over multiple locations throughout the enterprise.
Enterprise resource planning software can be used to automate and simplify individual activities across a business or organization, such as accounting and procurement, project management, customer relationship management, risk management, compliance, and supply chain operations.

ERP systems
SAP
SAP has multiple ERP offerings: ByDesign, Manufacturing, and Business One.
Sage
Aimed at firms in manufacturing, distribution, and services, Sage offers a suite of products providing a range of key ERP tools.
Microsoft Dynamics
This has developed into a full suite of ERP products, and now includes applications for, amongst other things, financial management, human resources, and supply chain management.
What are the resources?
Money, human resources, machinery, land, telecom spectrum, oil fields, coal mine licenses, etc.
E-governance: E-seva, ELCOT
https://it.tn.gov.in/en/ELCOT/e-interventions
Salesforce.com
Salesforce.com is one of the biggest names in ERP and customer relationship management (CRM) solutions, which can be used across a variety of sectors. The firm offers a range of products depending on a business's size and sector.

CRM
Customer relationship management (CRM) is a business strategy to acquire and manage the most valuable customer relationships.
CRM requires a customer-centric business philosophy and culture to support effective marketing, sales, and service processes.
A CRM system stores the history of the relationship between seller and customer. Based on this data, brands optimize employee performance, implement new marketing tools, improve service levels, and drive revenue growth.

CRM examples
A CRM platform includes the following functions:
• organizes data for easy interpretation,
• prompts CSRs for useful information,
• protects against duplicate records,
• flags incomplete records.
CRM has been in use in the public domain for quite some time now. Examples include the IRCTC (the Indian Railways ticketing web portal), the Road Transport Authority services, the municipal corporations and their allied departments for health and governance, as well as legal and utility services such as electricity and gas operators. These have all been progressively managing citizen data and relationships using CRM techniques.
As a part of the massive software-as-a-service (SaaS) market, CRM technology represents the fastest-growing category of enterprise software.
The major players in the CRM market are Adobe Systems, Microsoft, Oracle, Salesforce, and SAP.
Example products: Salesforce Sales Cloud, Zoho CRM, Hubspot

[Figure: Timeline of DBMS]

When Not to Use a DBMS
It is more desirable to use regular files for:
– Simple, well-defined database applications not expected to change at all
– Stringent, real-time requirements that may not be met because of DBMS overhead
– Embedded systems with limited storage capacity
– No multiple-user access to data

DBMS WORKERS or Users
System analysts
– Determine the requirements of end users
Application programmers
– Implement these specifications as programs
DBMS system designers and implementers
– Design and implement the DBMS modules and interfaces as a software package
Tool developers
– Design and implement tools
Operators and maintenance personnel
– Responsible for the running and maintenance of the hardware and software environment for the database system

DBMS WORKERS or Users
Database administrators (DBA) are responsible for:
– Authorizing access to the database
– Coordinating and monitoring its use
– Acquiring software and hardware resources
Database designers are responsible for:
– Identifying the data to be stored
– Choosing appropriate structures to represent and store this data
End users
– People whose jobs require access to the database
– Types:
• Casual end users
• Naive or parametric end users
• Sophisticated end users
• Standalone users

QUESTIONS??

Phases of Database Design
1. Requirements specification and analysis
2. Conceptual design
3. Logical design
4. Physical design
[Figure: Phases of database design]

Phases of Database Design
1. Requirements specification and analysis
• The requirements collection and analysis phase produces both data requirements and functional requirements.
• The data requirements are used as a source for the database design.
• The data requirements should be specified in as detailed and complete a form as possible.

Phases of Database Design
2. Conceptual design
• The result of this phase is an Entity-Relationship (ER) diagram or UML class diagram. It is a high-level data model of the specific application area.
• It describes how different entities (objects, items) are related to each other. It also describes what attributes (features) each entity has. It includes the definitions of all the concepts (entities, attributes) of the application area.
• During or after the conceptual schema design, the basic data model operations can be used to specify the high-level user operations identified during the functional analysis. This also serves to confirm that the conceptual schema meets all the identified functional requirements.

Phases of Database Design
3. Logical design
A. Create relation schemas
• The result of the logical design phase (or data model mapping phase) is a set of relation schemas. The ER diagram or class diagram is the basis for these relation schemas.
• Creating the relation schemas is quite a mechanical operation. There are rules for how the ER model or class diagram is transferred to relation schemas.
• The relation schemas are the basis for the table definitions. In this phase (if not done in the previous phase) the primary keys and foreign keys are defined.

Phases of Database Design
3. Logical design
B. Normalization
Normalization is the last part of the logical design. The goal of normalization is to eliminate redundancy and potential update anomalies.
• Redundancy means that the same data is saved more than once in a database. An update anomaly is a consequence of redundancy: if a piece of data is saved in more than one place, the same data must be updated in more than one place.
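A small sketch of such an update anomaly, using Python's sqlite3 module. The enrolled_flat table and its rows are invented for illustration: the course name is stored once per enrolment, so an update that touches only one row leaves the copies inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: cname is repeated for every student enrolled in the course.
conn.execute("CREATE TABLE enrolled_flat (sid TEXT, cid TEXT, cname TEXT)")
conn.executemany(
    "INSERT INTO enrolled_flat VALUES (?, ?, ?)",
    [("S1", "C1", "Databases"), ("S2", "C1", "Databases")],
)

# The course is renamed, but only one of the two copies gets updated.
conn.execute(
    "UPDATE enrolled_flat SET cname = 'Database Systems' "
    "WHERE sid = 'S1' AND cid = 'C1'"
)

# Course C1 now appears under two different names.
names = conn.execute(
    "SELECT DISTINCT cname FROM enrolled_flat WHERE cid = 'C1' ORDER BY cname"
).fetchall()
print(names)
```

Two names now exist for the same course, which is exactly the inconsistency that storing the same data in more than one place makes possible.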
• Normalization is a technique by which one can modify the relation schemas to reduce the redundancy. Each normalization phase adds more relations (tables) to the database.

Phases of Database Design
4. Physical design
The goal of the last phase of database design, physical design, is to implement the database. At this phase one must know which database management system (DBMS) is used. For example, different DBMSs have different data types and different names for them.
The SQL statements to create the database are written. The indexes, the integrity constraints (rules), and the users' access rights are defined. Finally, the data to test the database is added.

E.g., UNIVERSITY database
STEP A: ANALYZE THE PROBLEM
Step 1: What does this DB do?
It holds information concerning students, courses, and grades in a university environment.
Step 2: What are the data records?
– STUDENT
– COURSE
– SECTION
– GRADE_REPORT
– PREREQUISITE
Step 3: Specify the structure of the records of each file by specifying the data type for each data element
– String of alphabetic characters
– Integer
– Etc.

UNIVERSITY database contd…
Step B: Relate the records.
Step 4: Construct the UNIVERSITY database
– Store data to represent each student, course, section, grade report, and prerequisite as a record in the appropriate file
Step 5: Identify the relationships among the records
Step 6: Manipulation involves querying and updating

[Figure: UNIVERSITY database design]

Session 2
Topic: ER and EER Design Practice Tutorial

LEARNING OUTCOME
– ER designing
– EER designing
REFER: T1-Chapter 3 and 4; Sections: 3.1, 3.3-3.9, 4.1-4.3, 4.6

Roadmap for ER design
1. Entity and its types: strong, weak
2. Attributes and their types
3. Keys: primary, foreign, super, candidate, composite
4. Relationship and degree of relationship
a. Mapping constraints or cardinality constraints
b. Participation constraints
c. Weak and strong relationships, degree
d. When to use binary, ternary, and higher degrees
e. Two entities can have multiple relationships
f. Redundant relationships and removing the redundancy
5. ER design steps
6. Example of ER design

[Figure: Domains, attributes, tuples, and relations]
[Figure: Example of entity and entity set: Student]

ATTRIBUTE in ER diagram
1. An attribute takes a NULL value if it does not have a value, e.g., FName = NULL, Nationality = NULL, Gender = NULL
2. Domain: the set of permitted values for that attribute.
E.g., dom(pval): any value between 0 and +1; dom(votecast_age): any value from 18 upward.

TYPES OF ATTRIBUTES

TYPES OF ATTRIBUTES (contd.)

Attributes
NOTE: The primary key is underlined.

KEYS
Entities and relationships are distinguished using various keys.
• A key is a combination of one or more attributes, e.g., social-security number, or the combination of name and social-security number.
• A superkey is a key, defined either for an entity set or a relationship set, that uniquely identifies an entity, e.g., social-security number, phone number, or the combination of name and social-security number. (Note: UNIQUE and NOT NULL)
• A candidate key is a minimal superkey that uniquely identifies either an entity or a relationship, e.g., social-security number, phone number. (Note: UNIQUE and NOT NULL)
• A primary key is a candidate key that is chosen by the database designer to identify the entities of an entity set. (Note: UNIQUE and NOT NULL)
• A composite key is a candidate key with more than one attribute that identifies the entities of an entity set, e.g., {stud_id, stud_email, stud_name}.

KEYS
• A foreign key is a set of one or more attributes in one entity set (relation) that refers to the primary key of another. For example, the primary key of a strong entity set appears as a foreign key in a weak entity set that depends on it, where it is combined with the weak entity set's discriminator.
• The primary key of a weak entity set is formed by the primary key of the strong entity set on which it is existence-dependent, plus the weak entity set's discriminator (partial key).

Attributes
Construct ER: Employee has SSN, salary, Age, Bdate, phone numbers, and Address. The complete address of each employee has area, city, and state, while the street address has door# and apt no.

Answer these:
1. Determine the type of attributes in Customer(name, age, addr, phno, DOB). (Figure: book table)
2. If an Employee can reside at the HQ and Bcity.
Also, this addr is further divided into door#, street and city. What will this attribute addr be?
3. Consider the book table: find the super key, primary key, and candidate key.
4. For the person entity, find the Pk and Ck.
5. If K1 = {ID} and K2 = {name, addr}, which one should be the Pk?
6. Find the composite key attributes or alternate key attributes:
Order{Custid, orderid, sales}
Student{Sid, name, addr, mark}

ER notations

Relationship
Relationships have:
1. Degree
2. Mapping or cardinality constraint: how many entities of one entity set participate in the relationship with how many entities of another entity set.
3. Role names
4. Participation constraints
5. Attributes (relationships can have attributes).

Types of Relationship based on degree of relation
BINARY, TERNARY, QUATERNARY, N-ARY

Questions
1. What are the redundant relationships in Fig 1?
2. Draw an ER diagram: an employee has a name and an id and works on a project for a number of hours. A project has a name, and an employee uses many machines identified by the mcid attribute.
3. Represent the scenario “employee who supervises other employees” in ER.
4. Convert Fig 2 into 1:N relationships.
5. What are the redundant relationships in Fig 3?

ER design steps
1. Identify nouns; these become entity types.
2. Identify all attributes of each entity type.
3. Mark the Pk and partial keys (if they exist) for each entity type.
3a. Identify whether each entity type is weak or strong.
4. Identify the verbs in the problem statement; these become relationships.
a. Determine the degree of the relationship as unary, binary or ternary.
b. Determine the mapping (cardinality) constraint as 1:1, 1:N, N:1 or M:N.
c. Determine the participation constraint as total or partial.
d. A relationship is weak only if one entity type in the relationship is weak.
e. Determine if we have multiple relations.
f.
determine rolename if any. g. determine if relationship has any attributes. h. Redundant relationships and removing the redundancy. 5. Determine any aggregation is needed 6. Loop 1 to 5 until the design is acceptable (ie., only when it captures all data in problem statement). BITS Pilani, Pilani Campus Problem 1 Consider a mail order database in which employees take orders for parts from customers. The data requirements are summarized as follows: The mail order company has employees identified by a unique employee number, their first and last names, and a zip code where they are located. Customers of the company are uniquely identified by a customer number. In addition, their first and last names and a zip code where they are located are recorded. The parts being sold by the company are identified by a unique part number. In addition, a part name, their price, and quantity in stock are recorded. Orders placed by customers are taken by employees and are given a unique order number. Each order may contain certain quantities of one or more parts and their received date as well as a shipped date is recorded. Design an Entity-Relationship diagram for the mail order database. BITS Pilani, Pilani Campus Solution Step 1: Identify the Entities: Employee, Customer, Parts, Order Step 2: Identify the attributes of each entity: Employee => enum, fname, lname, zipcode Customer => custnum, fname,lname,zipcode Parts => partnum, partname, price, qtyinhand Order =>ordernum,recvddate, qty , shippeddate Step 3: Consider the “The mail order company has employees identified by a unique employee number, their first and last names, and a zip code where they are located” BITS Pilani, Pilani Campus Solution Step 4 : 2.Consider the “ Customers of the company are uniquely identified by a customer number. 
In addition, their first and last names and a zip code where they are located are recorded.”
Step 5: Consider “The parts being sold by the company are identified by a unique part number. In addition, a part name, their price, and quantity in stock are recorded.”

Solution
Step 6: Consider “Orders placed by customers are taken by employees and are given a unique order number. Each order may contain certain quantities of one or more parts and their received date as well as a shipped date is recorded.”
BR: 1 employee serves only one customer at a time.

BR: 1 employee serves only one customer at a time.
What if the BR is that 1 employee can serve more than 1 customer at a time?

WHERE SHOULD “QTY” be stored? (Figure: Order "has" Parts)

Solution 2

Solution:

Solution 2

Problem 2
UPS prides itself on having up-to-date information on the processing and current location of each shipped item. To do this, UPS relies on a companywide information system. Shipped items are the heart of the UPS product tracking information system. Shipped items can be characterized by item number (unique), weight, dimensions, insurance amount, destination, and final delivery date. Shipped items are received into the UPS system at a single retail center. Retail centers are characterized by their type, uniqueID, and address. Shipped items make their way to their destination via one or more standard UPS transportation events (i.e., flights, truck deliveries). These transportation events are characterized by a unique scheduleNumber, a type (e.g., flight, truck), and a deliveryRoute. Please create an Entity Relationship diagram that captures this information about the UPS system. Be certain to indicate identifiers and cardinality constraints.
BITS Pilani, Pilani Campus These transportation events are characterized by a unique scheduleNumber, a type (e.g, flight, truck), and a deliveryRoute. Shipped items can be characterized by item number (unique), weight, dimensions, insurance amount, destination, and final delivery date. . Retail centers are characterized by their type, uniqueID, and address • Shipped items are received into the UPS system at a single retail center. • Shipped items make their way to their destination via one or more standard UPS transportation events (i.e., flights, truck deliveries). BITS Pilani, Pilani Campus Problem 3 A friend is interested in keeping track of information about his album collection. He is not concerned about whether or not the albums are CDs, tapes, LPs, etc. Also, assume that he does not have any compilation albums—that is, each album has songs from a single band. For each album, he wants to store which band recorded the album, the title, the year, and the chronology (e.g. this is the 4th album for that band). He also wants to store the songs, including title, length, track number, and writer(s). Of course, if two bands record the same song, they might have different track numbers and lengths. For each band (group or individual), he also wants to store the names of all of the band members. For each band member, he needs their first and last names, and country of origin. Consider both band members and songwriters as musicians. BITS Pilani, Pilani Campus store the songs, including title, length, track number, and writer(s). For each album, he wants to store which band recorded the album, the title, the year, and the chronology (e.g. this is the 4th album for that band). For each band (group or individual), he also wants to store the names of all of the band members. For each band member, he needs their first and last names, and country of origin. 
Problem 4
Construct an E-R diagram for a car-insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents.

Problem 5
Consider a database used to record the marks that students get in different exams of different course offerings.

Problem
Suppose you are given the following requirements for a simple database for the National Hockey League (NHL):
– the NHL has many teams,
– each team has a name, a city, a coach, a captain, and a set of players,
– each player belongs to only one team,
– each player has a name, a position (such as left wing or goalie), a skill level, and a set of injury records,
– a team captain is also a player,
– a game is played between two teams (referred to as host_team and guest_team) and has a date (such as May 11th, 1999) and a score (such as 4 to 2).
Construct a clean and concise ER diagram for the NHL database.

Practise

Roadmap for EER design
1. ENTITY and its types: strong, weak
2. ENTITIES: Generalization and Specialization
3. Determine if lattices exist
4. For each entity determine its attributes and their types
5. Keys: Primary, Foreign, Super, Candidate, Composite
6. Relationship and degree of relationship
a. Mapping constraints or cardinality constraint
b. Participation constraint
c. Relationship weak and strong, degree
d. When to use binary, ternary and higher degree
e. Two entities can have multiple relations
f. Redundant relationships and removing the redundancy
7. ER and EER design steps
8. Example of EER design

EER design steps
1. Specialization: extracting a subclass from an entity set.
2. Generalization: combining one or more entity sets into a higher-level entity. a.
Disjoint generalization: an entity belongs to at most one lower-level entity set.
b. Overlapping generalization: entities may belong to multiple lower-level entity sets.
c. Hierarchy: each entity set is the object of only one “ISA” relationship.
3. Lattice: entity sets may belong to multiple “ISA” relationships.
4. Constraints on specialization and generalization:
– Based on membership: disjoint (d) or overlapping (o).
– Based on definition:
  User-defined constraint: membership is defined manually (by the database user).
  Predicate-defined (condition-defined) constraint: membership in a subclass is determined by a condition (predicate).
– Based on completeness:
  Total constraint: every entity must belong to a lower-level entity set.
  Partial constraint: entities are not required to belong to a lower-level entity set.
5. Aggregation: grouping part of a schema into a larger unit.
6. LOOP from ER step 1 to EER step 5 until the design is accepted.

ER design steps
1. Identify nouns; these become entity types.
2. Identify all attributes of each entity type.
3. Mark the Pk and partial keys (if they exist) for each entity type.
3a. Identify whether each entity type is weak or strong.
4. Identify the verbs in the problem statement; these become relationships.
a. Determine the degree of the relationship as unary, binary or ternary.
b. Determine the mapping (cardinality) constraint as 1:1, 1:N, N:1 or M:N.
c. Determine the participation constraint as total or partial.
d. A relationship is weak only if one entity type in the relationship is weak.
e. Determine if we have multiple relations.
f. Determine role names, if any.
g. Determine if the relationship has any attributes.
h. Identify redundant relationships and remove the redundancy.
5. Determine whether any aggregation is needed.
6. Loop over steps 1 to 5 until the design is acceptable (i.e., only when it captures all data in the problem statement).
EER design points
Developing an ER diagram presents several choices, including the following:
• Should a concept be modelled as an entity or an attribute?
• Should a concept be modelled as an entity or a relationship?
• What are the relationship sets and their participating entity sets?
• Should we use binary or ternary relationships?
• Should we use aggregation?
• UNIONs should be avoided.
• ER design is subjective. There are often many ways to model a given scenario. Analyzing alternatives can be tricky, especially for a large enterprise.
• Common choices include:
– Entity vs. attribute
– Key for the entity / whether to store or discard an attribute
– Entity vs. relationship
– Binary or n-ary relationship
– Use of ISA hierarchies (EER)
– Use of aggregation

Specialization and Generalization
Total and partial participation

Example
a. To represent that a person can be an employee or a customer only.
b. To represent that a person can be both a student and staff.

Example 1. The EER diagram below describes the database of a training center, including information about its members, training activities and bookings. Each member is identified through his/her e-mail address. Gold members can book any training activity, while common members can only book core activities. For each training activity, the database stores the schedule (week, week day, and time), the room, the leader and the e-mail of the leader. Each leader leads several activities per week, but the same activities every week. Training activities are yoga, core and aerobics.

Example 2: A nonprofit organization depends on a number of different types of persons for its successful operation. The organization is interested in the following attributes for all of these persons: SSN, name, Address, City, State, Zip and Telephones.
Three types of persons are of greatest interest: employees, volunteers, and donors. Employees have only a Date Hired attribute, and volunteers have only a Skill attribute. Donors have only a relationship (named Donates) with an Item entity type. A donor must have donated one or more items, and an item may have no donors, or one or more donors. There are persons other than employees, volunteers, and donors who are of interest to the organization so that a person may not belong to any of these three groups and may also belong to more than one group at the same time. BITS Pilani, Pilani Campus There are persons other than employees, volunteers, and donors who are of interest to the organization so that a person may not belong to any of these three groups and may also belong to more than one group at the same time. BITS Pilani, Pilani Campus Example 3 Consider a bank, and model the following two aspects: • There are three different kinds of ACCOUNTs, namely SAVINGS_ACCTs, CHECKING_ACCTs and TRUSts. For each ACCOUNT we have to take care of its TRANSACTIONs. Each TRANSACTION has a type such as “deposit”, “withdrawal” or “check”. Furthermore, each transaction has a date/time (consisting of a date and a time) and an amount. • There are different kinds of LOANS, namely CAR_LOANS, HOME_LOANS, CREDIT_LINE and PERSONAL ones. For each LOAN we have to take care of its PAYMENTs. Each PAYMENT has a type, date and amount BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus Example 4: Rules Entertainment is a chain of theaters owned by former husband and wife actors /entertainers who, for some reason, can’t get a job performing anymore. The owners want a database to track what is playing or has played on each screen in each theater of their chain at different times of the day. A theater (identified by a Theater ID and described by a theater name and location) contains one or more screens for viewing various movies. 
Within each theater each screen is identified by its number and is described by the seating capacity for viewing the screen. Movies are scheduled for showing in time slots each day. Each screen can have different time slots on different days (i.e., not all screens in the same theater have movies starting at the same time, and even on different days the same movie may play at different times on the same screen). For each time slot, the owners also want to know the end time of the time slot (assume all slots end on the same day the slot begins), attendance during that time slot, and the price charged for attendance in that time slot. Each movie (which can be either a trailer, feature, or commercial) is identified by a Movie ID and further described by its title, duration, and type (i.e., trailer, feature, or commercial). In each time slot, one or more movies are shown. The owners want to also keep track of in what sequence the movies are shown (e.g., in a time slot there might be two trailers, followed by two commercials, followed by a feature film, and closed with another commercial). BITS Pilani, Pilani Campus Answer BITS Pilani, Pilani Campus Example 5 BITS Pilani, Pilani Campus EER Models Design: Example BITS Pilani, Pilani Campus EER Models Design: Example BITS Pilani, Pilani Campus EER Models Design: Example BITS Pilani, Pilani Campus EER Models Design: Example BITS Pilani, Pilani Campus SOLUTION BITS Pilani, Pilani Campus Example BITS Pilani, Pilani Campus EXAMPLE ER : Consider a CONFERENCE_REVIEW database in which researchers submit their research papers for consideration. Reviews by reviewers are recorded for use in the paper selection process. The database system caters primarily to reviewers who record answers to evaluation questions for each paper they review and make recommendations regarding whether to accept or reject the paper. The data requirements are summarized as follows: ■ Authors of papers are uniquely identified by e-mail id. 
First and last names are also recorded.
■ Each paper is assigned a unique identifier by the system and is described by a title, abstract, and the name of the electronic file containing the paper.
■ A paper may have multiple authors, but one of the authors is designated as the contact author.
■ Reviewers of papers are uniquely identified by e-mail address. Each reviewer’s first name, last name, phone number, affiliation, and topics of interest are also recorded.
■ Each paper is assigned between two and four reviewers. A reviewer rates each paper assigned to him or her on a scale of 1 to 10 in four categories: technical merit, readability, originality, and relevance to the conference. Finally, each reviewer provides an overall recommendation regarding each paper.
■ Each review contains two types of written comments: one to be seen by the review committee only and the other as feedback to the author(s).

Solution:

Solution 2

TextBook Example ER
The COMPANY database keeps track of a company’s employees, departments, and projects. Suppose that after the requirements collection and analysis phase, the database designers provide the following description of the miniworld, the part of the company that will be represented in the database.
• The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
• A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
• The database will store each employee’s name, Social Security number, address, salary, sex (gender), and birth date. An employee is assigned to one department, but may work on several projects, which are not necessarily controlled by the same department.
It is required to keep track of the current number of hours per week that an employee works on each project, as well as the direct supervisor of each employee (who is another employee).
• The database will keep track of the dependents of each employee for insurance purposes, including each dependent’s first name, sex, birth date, and relationship to the employee.

ER Diagram

Session 3
Topic: Relational model and Logical design (ER/EER to RM) and Normalization
LEARNING OUTCOME
– Relational model concepts
– Relational data model constraints
– ER/EER to RM
– FDs
– Normalization
REFER: T1-Chapter 5 Sections: 5.1-5.3

Data Model and schema
• The data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data.
• Data models are used to represent the data, how it is stored in the database, and the relationships between data items. A data model is like an architect's building plan: it helps to build conceptual models and to set relationships between data items. It is an abstract model that organizes data description, data semantics, and consistency constraints of data.
There are 3 different types of data models: conceptual data models, logical data models, and physical data models, and each one has a specific purpose.
1. Conceptual Data Model: This data model defines WHAT the system contains. It is typically created by business stakeholders and data architects. The purpose is to organize, scope and define business concepts and rules. This is a high-level data model where we use ER, EER and UML to represent the business concepts and organize data.
2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by data architects and business analysts. The purpose is to develop a technical map of rules and data structures.
A schema (relational schema) belongs to the logical data model. Schema means a logical view.
3. Physical Data Model: This data model describes HOW the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. The purpose is the actual implementation of the database. Tables as they are stored in files belong to the physical data model.

Relational model concepts

Relational Model Concepts
What is a RELATION SCHEMA?

Characteristics of relations
• A relation is a set: tuples are not in any order, and there are no duplicates.
• Flat relational model: values are atomic, not structures or lists.
• NULL (ω): “information missing”, “not applicable”; ambiguous semantics; not a member of any domain.
• The order in which the attributes are listed in a table is irrelevant.
• The NULL value is used for don't know, not applicable, or value undefined.
• Values of attributes: for a relation to be in First Normal Form, each of its attribute domains must consist of atomic (neither composite nor multi-valued) values.
Notation

Relational Model Concepts
Informal Terms         Formal Terms
Table                  Relation
Column                 Attribute/Domain
Row                    Tuple
Values in a column     Domain
Table Definition       Schema of a Relation
Populated Table        Extension

Explicit or Schema-based constraints
Constraints are conditions that must hold on all valid relation instances. There are three main types of constraints:
1. Key constraints
a. (Domain) Each attribute value must be either null (which is really a non-value) or drawn from the domain of that attribute.
b. (Key) For any two distinct tuples t1 and t2 in a relation state r of R, we have the constraint that t1[SK] ≠ t2[SK], where SK is a superkey of R.
2. Entity integrity constraints
The primary key attributes PK of each relation schema cannot have null values in any tuple.
3.
Referential integrity constraints
A tuple in one relation that refers to a tuple in another relation must refer to an existing tuple.

Referential Integrity Constraint
Statement of the constraint: the value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
(1) a value of an existing primary key PK in the referenced relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part of its own primary key.

Relational DB and Relational DB Schemas

One possible database state for the COMPANY relational database schema.

Relations together with a set of integrity constraints.

Operations and Constraint violation
We now see how violations happen with each operation:
– INSERT a tuple.
– DELETE a tuple.
– MODIFY/UPDATE a tuple.

Update Operations on Relations and violations
■ Operation: Update the salary of the EMPLOYEE tuple with Ssn = ‘999887777’ to 28000.
Result: Acceptable.
■ Operation: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 1.
Result: Acceptable.
■ Operation: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 7.
Result: Unacceptable, because it violates referential integrity.
■ Operation: Update the Ssn of the EMPLOYEE tuple with Ssn = ‘999887777’ to ‘987654321’.
Result: Unacceptable, because it violates the primary key constraint by repeating a value that already exists as a primary key in another tuple; it also violates referential integrity constraints because there are other relations that refer to the existing value of Ssn.

Insert operation and handle violation
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion violates the entity integrity constraint (NULL for the primary key Ssn), so it is rejected.
■ Operation: Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
Result: This insertion violates the key constraint because another tuple with the same Ssn value already exists in the EMPLOYEE relation, and so it is rejected.
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
Result: This insertion violates the referential integrity constraint specified on Dno in EMPLOYEE because no corresponding referenced tuple exists in DEPARTMENT with Dnumber = 7.
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion satisfies all constraints, so it is acceptable.

Delete operations and constraint violations
■ Operation: Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10.
Result: This deletion is acceptable and deletes exactly one tuple.
■ Operation: Delete the EMPLOYEE tuple with Ssn = ‘999887777’.
Result: This deletion is not acceptable, because there are tuples in WORKS_ON that refer to this tuple. Hence, if the tuple in EMPLOYEE is deleted, referential integrity violations will result.
■ Operation: Delete the EMPLOYEE tuple with Ssn = ‘333445555’.
Result: This deletion will result in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON, and DEPENDENT relations.

In-Class Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this schema.

LEARNING OUTCOME
– Mapping ER Constructs to relations
– Mapping Class hierarchies
REFER: T1-Chapter 5 Sections: 5.1-5.3

Database Modelling and Implementation Process (Problem statement)

Mapping ER Constructs to relations
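As a recap of the constraint-violation slides earlier in this session, the accepted and rejected operations can be reproduced on a small database. The following is a minimal sketch using Python's built-in sqlite3 module, not the course's reference implementation. The table layout is a trimmed-down assumption (only Ssn, Salary and Dno are kept for EMPLOYEE), and the starting rows, including the salaries and the second WORKS_ON row with Pno = 30, are assumed sample data chosen so that the slide's operations behave as described.

```python
import sqlite3


def build_company_db():
    """Create a tiny in-memory slice of the COMPANY schema (columns trimmed)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE department (dnumber INTEGER PRIMARY KEY NOT NULL);
        CREATE TABLE employee (
            ssn    TEXT PRIMARY KEY NOT NULL,  -- key and entity integrity
            salary INTEGER,
            dno    INTEGER NOT NULL REFERENCES department(dnumber)
        );
        CREATE TABLE works_on (
            essn TEXT NOT NULL REFERENCES employee(ssn),
            pno  INTEGER NOT NULL,
            PRIMARY KEY (essn, pno)
        );
        -- Assumed sample rows (not taken from the slides).
        INSERT INTO department VALUES (1), (4), (5);
        INSERT INTO employee VALUES ('999887777', 25000, 4), ('987654321', 43000, 4);
        INSERT INTO works_on VALUES ('999887777', 10), ('999887777', 30);
    """)
    return conn


def attempt(conn, sql):
    """Run one statement; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        conn.commit()
        return True
    except sqlite3.IntegrityError as err:
        conn.rollback()
        print("rejected:", err)
        return False


conn = build_company_db()
results = {
    # Insert operations
    "insert_null_ssn": attempt(conn, "INSERT INTO employee VALUES (NULL, 28000, 4)"),
    "insert_dup_ssn": attempt(conn, "INSERT INTO employee VALUES ('999887777', 28000, 4)"),
    "insert_bad_dno": attempt(conn, "INSERT INTO employee VALUES ('677678989', 28000, 7)"),
    "insert_ok": attempt(conn, "INSERT INTO employee VALUES ('677678989', 28000, 4)"),
    # Update operations
    "update_salary": attempt(conn, "UPDATE employee SET salary = 28000 WHERE ssn = '999887777'"),
    "update_dno_1": attempt(conn, "UPDATE employee SET dno = 1 WHERE ssn = '999887777'"),
    "update_dno_7": attempt(conn, "UPDATE employee SET dno = 7 WHERE ssn = '999887777'"),
    "update_ssn": attempt(conn, "UPDATE employee SET ssn = '987654321' WHERE ssn = '999887777'"),
    # Delete operations
    "delete_works_on": attempt(conn, "DELETE FROM works_on WHERE essn = '999887777' AND pno = 10"),
    "delete_employee": attempt(conn, "DELETE FROM employee WHERE ssn = '999887777'"),
}
for name, accepted in results.items():
    print(name, "->", "accepted" if accepted else "rejected")
```

The sketch rejects violating operations outright, which is only one of the possible responses; a DBMS can also be told to cascade or set nulls on delete, and that choice is a design decision this example does not explore.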
ER Constructs to relations

ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
Step 2: Mapping of Weak Entity Types
Step 3: Mapping of Binary 1:1 Relationship Types
Step 4: Mapping of Binary 1:N Relationship Types
Step 5: Mapping of Binary M:N Relationship Types
Step 6: Mapping of Multivalued Attributes
Step 7: Mapping of N-ary Relationship Types
Mapping EER Model Constructs to Relations
Step 8: Options for Mapping Specialization or Generalization
Step 9: Mapping of Union Types (Categories)

ER-to-Relational Mapping Steps
Step 7: Mapping of N-ary Relationship Types (non-binary relationships)
For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S.
Example: The relationship type SUPPLY in the ER diagram on the next slide. This can be mapped to the relation SUPPLY shown in the relational schema, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.

ER-to-Relational Mapping Steps
FIGURE 4.11 Ternary relationship types. (a) The SUPPLY relationship.

ER-to-Relational Mapping Steps
FIGURE 7.3 Mapping the n-ary relationship type SUPPLY from Figure 4.11a.
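Step 7 can be sketched in executable form. The snippet below is an illustration under assumptions, written with Python's built-in sqlite3 module: the key attributes SNAME, PARTNO and PROJNAME come from the slide, while the participating relation names (supplier, part, project), the quantity attribute and the sample rows are hypothetical stand-ins for whatever the figure actually shows.

```python
import sqlite3

supply_db = sqlite3.connect(":memory:")
supply_db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
supply_db.executescript("""
    -- Relations for the three participating entity types (names assumed).
    CREATE TABLE supplier (sname    TEXT    PRIMARY KEY NOT NULL);
    CREATE TABLE part     (partno   INTEGER PRIMARY KEY NOT NULL);
    CREATE TABLE project  (projname TEXT    PRIMARY KEY NOT NULL);

    -- Step 7: a new relation for the ternary relationship. Each participant's
    -- primary key becomes a foreign key, and together they form the primary key.
    CREATE TABLE supply (
        sname    TEXT    NOT NULL REFERENCES supplier(sname),
        partno   INTEGER NOT NULL REFERENCES part(partno),
        projname TEXT    NOT NULL REFERENCES project(projname),
        quantity INTEGER NOT NULL,   -- assumed simple attribute of the relationship
        PRIMARY KEY (sname, partno, projname)
    );

    -- Hypothetical sample rows.
    INSERT INTO supplier VALUES ('Acme');
    INSERT INTO part VALUES (101);
    INSERT INTO project VALUES ('Bridge'), ('Tower');
""")


def add_supply(conn, sname, partno, projname, quantity):
    """Insert one SUPPLY tuple; return False if a key or FK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO supply VALUES (?, ?, ?, ?)",
            (sname, partno, projname, quantity),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


first = add_supply(supply_db, "Acme", 101, "Bridge", 50)
other_project = add_supply(supply_db, "Acme", 101, "Tower", 20)    # same supplier and part, new project
duplicate = add_supply(supply_db, "Acme", 101, "Bridge", 5)        # repeats the full three-part key
unknown_project = add_supply(supply_db, "Acme", 101, "Tunnel", 5)  # no such project row
print(first, other_project, duplicate, unknown_project)
```

Because the key is the full combination of the three foreign keys, the same supplier can supply the same part to different projects, but a given (supplier, part, project) combination is recorded only once.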
ER to RM tutorials
Problem 1, Solution 1, Alternate Solution
Problem 2, Solution
Problem 3, Solution
TEXTBOOK Problem: COMPANY DB (COMPANY ER diagram and Solution)

EER-to-Relational Mapping Steps
Step 8: Options for Mapping Specialization or Generalization
Option 8A: Multiple relations, superclass and subclasses. Create a relation for the superclass, including the superclass attributes. Create a relation for each subclass, which includes the primary key of the superclass (which acts as a foreign key) and the attributes specific to that subclass. This works for any specialization (partial, total, disjoint, overlapping).
Option 8B: Multiple relations, subclass relations only. Create a relation for each subclass, with the attributes of both the superclass and the subclass. This only works for total specializations, meaning that every entity in the superclass must belong to at least one subclass. Otherwise, members of the superclass that do not belong to a subclass will not be represented.

FIGURE 7.4, using Option 8A: multiple relations, superclass and subclasses. Create a relation for the superclass, including the superclass attributes. Create a relation for each subclass, which includes the primary key of the superclass (which acts as a foreign key) and the attributes specific to that subclass.
This works for any specialization (partial, total, disjoint, overlapping).

FIGURE 7.4, using Option 8B: multiple relations, subclass relations only.

EER-to-Relational Mapping Steps
Option 8C: Single relation with one type attribute. Create a single relation with all the attributes of the superclass and all the attributes of every subclass. Include a 'Type' attribute, the discriminating attribute that indicates which subclass the row belongs to. This only works if the specialization is disjoint, meaning a superclass entity cannot be a member of more than one subclass.
Option 8D: Single relation with multiple type attributes. Create a single relation with all the attributes of the superclass and all the attributes of every subclass. Include a Boolean 'Type' attribute for each subclass, which indicates whether the row belongs to that subclass. This works with overlapping specializations, where a superclass entity may belong to more than one subclass.

FIGURE 7.4, using Option 8C: single relation with one type attribute.
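Option 8C can be sketched as a single table with a discriminating attribute. This is a minimal sketch, assuming SQLite through Python's sqlite3 module. The EMPLOYEE superclass, its three subclasses, all column names, and the helper accepts are assumptions for illustration, not the schema on the slide. The CHECK clauses use SQLite's NULL-safe IS operator, so other database systems may need a different spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EMPLOYEE (
    SSN          TEXT PRIMARY KEY,
    NAME         TEXT NOT NULL,
    -- Discriminating 'Type' attribute; NULL means "belongs to no subclass".
    JOB_TYPE     TEXT CHECK (JOB_TYPE IN ('SECRETARY', 'TECHNICIAN', 'ENGINEER')),
    TYPING_SPEED INTEGER,   -- SECRETARY attribute
    TGRADE       TEXT,      -- TECHNICIAN attribute
    ENG_TYPE     TEXT,      -- ENGINEER attribute
    -- A subclass attribute may be filled in only for rows of that subclass.
    CHECK (TYPING_SPEED IS NULL OR JOB_TYPE IS 'SECRETARY'),
    CHECK (TGRADE       IS NULL OR JOB_TYPE IS 'TECHNICIAN'),
    CHECK (ENG_TYPE     IS NULL OR JOB_TYPE IS 'ENGINEER')
)""")


def accepts(row):
    """Return True if EMPLOYEE accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


secretary_ok = accepts(("111", "Ann", "SECRETARY", 80, None, None))
engineer_ok = accepts(("222", "Bob", "ENGINEER", None, None, "Civil"))
no_subclass_ok = accepts(("333", "Cy", None, None, None, None))
mixed_ok = accepts(("444", "Di", "ENGINEER", 90, None, None))  # engineer with typing speed
print(secretary_ok, engineer_ok, no_subclass_ok, mixed_ok)
```

Because each row carries exactly one JOB_TYPE value, a row cannot belong to two subclasses, which is why this layout fits disjoint specializations only.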
FIGURE 7.4, using Option 8D: single relation with multiple type attributes.

EER-to-Relational Mapping Steps
Step 9: Mapping of Union Types (Categories)
For mapping a category whose defining superclasses have different keys, specify a new key attribute, called a surrogate key, when creating the relation that corresponds to the category. This relation includes the attributes of the category, with the surrogate key as its primary key. The surrogate key is also added as a foreign key to each relation that corresponds to a defining superclass of the category.

EER to RM tutorials
Problem 4, Solution
Problem 5, Solution 1, Solution 2 (multiple relations, superclass and subclasses, Option 8A)
Problem 6, Solution

FUNCTIONAL DEPENDENCIES
What is a functional dependency?
A functional dependency (FD) holds when one attribute (or set of attributes) determines another attribute in a relation. Functional dependencies play a vital role in telling a good database design from a bad one.
A functional dependency is denoted by an arrow →. The functional dependency of Y on X is written X → Y.
Example: if we know the value of Employee number, we can obtain Employee Name, City, Salary, etc. By this, we can say that City, Employee Name, and Salary are functionally dependent on Employee number.
Employee number  Employee Name  Salary  City
1                Dana           50000   San Francisco
2                Francis        38000   London
3                Andrew         25000   Tokyo

Functional Dependencies
Determine the FDs for the following schema.
Determine whether the FD eid → {ename, age} is valid or not.

Functional Dependencies
Armstrong's axioms are:
Reflexivity: if Y is a subset of X, then X → Y (in particular X → X: a set of attributes determines itself).
Augmentation: if X → Y, then XZ → YZ.
Transitivity: if X → Y and Y → Z, then X → Z.
The following rules can be derived from the axioms:
Additivity or Union: if X → Y and X → Z, then X → YZ.
Projectivity or Decomposition: if X → YZ, then X → Y and X → Z.
Pseudo-transitivity: if X → Y and YZ → W, then XZ → W.

Types of Functional Dependencies
Trivial dependency: X → Y is a trivial functional dependency if Y is a subset of X, for example {X, Y} → X. The dependencies X → X and Y → Y are also trivial.
Emp_id  Emp_name
AS555   Harry
AS811   George
AS999   Kevin
Consider this table with two columns, Emp_id and Emp_name. {Emp_id, Emp_name} → Emp_id is a trivial functional dependency, as Emp_id is a subset of {Emp_id, Emp_name}.

Types of Functional Dependencies
Non-trivial functional dependency: A functional dependency A → B is non-trivial when it holds and B is not a subset of A.
Example:
Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Apple      Tim Cook       57
{Company} → {CEO} (if we know the Company, we know the CEO name). CEO is not a subset of Company, hence this is a non-trivial functional dependency.

Types of Functional Dependencies
A multivalued dependency occurs when there are multiple independent multivalued attributes in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a relation. It requires that certain tuples be present in a relation.
Car_model  Maf_year  Color
H001       2017      Metallic
H001       2017      Green
H005       2018      Metallic
H005       2018      Blue
H010       2015      Metallic
H033       2012      Gray
Maf_year and Color are independent of each other but dependent on Car_model. In this example, these two columns are said to be multivalued dependent on Car_model. This dependence is written with a double arrow:
Car_model →→ Maf_year
Car_model →→ Color

Types of Functional Dependencies
A transitive dependency is a functional dependency that is formed indirectly by two functional dependencies.
{Company} → {CEO} (if we know the company, we know its CEO's name)
{CEO} → {Age} (if we know the CEO, we know the age)
Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Alibaba    Jack Ma        54
Therefore, by the rule of transitivity, {Company} → {Age} should hold. That makes sense: if we know the company name, we can find the age of its CEO.
Note: A transitive dependency can only occur in a relation of three or more attributes.

Types of Functional Dependencies
Full functional dependency: An FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
Partial FD: A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.

FD
Advantages of Functional Dependency
Identifying functional dependencies helps avoid data redundancy.
Therefore the same data is not repeated at multiple locations in the database.
It helps you maintain the quality of data in the database.
It helps you define the meanings and constraints of databases.
It helps you identify bad designs.
It helps you find facts about the database design.

Logical Design - Normalization

Normalization: 1NF
A relation is in 1NF if every attribute holds only atomic values. An attribute of a table cannot hold multiple values; it must be single-valued. First normal form disallows multivalued attributes, composite attributes, and their combinations.
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
Relation EMPLOYEE is not in 1NF because of the multivalued attribute EMP_PHONE.
SOLUTION: The EMPLOYEE table converted into 1NF is shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab

Normalization: 1NF
1NF (table values are single and atomic)
Check whether the table is in unnormalized form, UNF (i.e., table values are not single and atomic).
Yes: The relation should have no multivalued attributes or nested relations, so make all values single and atomic.
Remedy: Form new relations for each multivalued attribute or nested relation.

Normalization: 2NF
For 2NF, the relation must be in 1NF, and all non-key attributes must be fully functionally dependent on the primary key.
Example: Assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
Is the table in 2NF?
In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:

Normalization: 2NF
TEACHER_DETAIL table:
TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38

TEACHER_SUBJECT table:
TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer

Normalization: 2NF
2NF (no partial dependency of non-prime attributes, NPA, on primary key attributes)
Check whether the table is already in 1NF (i.e., values in the table are single and atomic).
Yes: Check whether the table is in 2NF (i.e., no partial dependency of NPA on PK attributes).
TEST for 2NF: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it, i.e., prime attributes (PA) → fully dependent NPAs.
FD 1: {SSN, Pnumber} → Hours
FD 2: SSN → Ename
FD 3: Pnumber → {Pname, Plocation}
After decomposing along these FDs, the tables are in 2NF.

2NF example

Normalization: 3NF
A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the primary key. 3NF is used to reduce data duplication and to help achieve data integrity.
A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a superkey.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
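The two 3NF conditions can be checked mechanically once the FDs and candidate keys are known. The following is a minimal sketch, not a full normalization tool. The function names closure and violations_3nf are made up for illustration, and the FDs are an assumed reading of the EMPLOYEE_DETAIL example used in these slides (EMP_ID determines EMP_NAME and EMP_ZIP; EMP_ZIP determines EMP_STATE and EMP_CITY).

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(schema, fds, candidate_keys):
    """List (X, A) pairs where X -> A fails both 3NF conditions."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        x_is_superkey = closure(lhs, fds) >= set(schema)   # condition 1
        for attr in sorted(rhs - lhs):                     # skip the trivial part
            if not x_is_superkey and attr not in prime:    # condition 2 fails too
                bad.append((set(lhs), attr))
    return bad


# Assumed schema, FDs, and candidate key for EMPLOYEE_DETAIL.
SCHEMA = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
FDS = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_NAME", "EMP_ZIP"})),
    (frozenset({"EMP_ZIP"}), frozenset({"EMP_STATE", "EMP_CITY"})),
]
KEYS = [frozenset({"EMP_ID"})]

bad = violations_3nf(SCHEMA, FDS, KEYS)
for lhs, attr in bad:
    print(sorted(lhs), "->", attr, "violates 3NF")
```

Only the dependencies with EMP_ZIP on the left are reported, because EMP_ZIP is not a superkey and EMP_STATE and EMP_CITY are not prime.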
EMPLOYEE_DETAIL table:
EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
Superkeys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}

Normalization: 3NF
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
Is the table in 3NF? The non-prime attributes EMP_STATE and EMP_CITY are transitively dependent on the superkey EMP_ID, which violates the rule of third normal form.
SOLUTION: We need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:
EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal

Normalization: 3NF
3NF (no transitive dependency)
Check whether the table is already in 2NF (i.e., NPA are fully functionally dependent on PA).
YES: Check whether the table is in 3NF (i.e., no transitive dependency).
Test: The relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Or: A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
■ It is fully functionally dependent on every key of R.
■ It is nontransitively dependent on every key of R.
Remedy: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
Example: SSN → Dnumber and Dnumber → {Dname, DmgrSSN}

3NF example

BCNF (Boyce-Codd NF)
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table. That is, the table should be in 3NF, and for every FD the LHS is a superkey.
Example: Assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
Candidate key: {EMP_ID, EMP_DEPT}
In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Is the table in BCNF? No: the determinants EMP_ID and EMP_DEPT are not superkeys on their own.

BCNF (Boyce-Codd NF)
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
EMP_DEPT_MAPPING table:
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID
For the EMP_DEPT table: EMP_DEPT
For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
Now the design is in BCNF, because the left side of both functional dependencies is a key.

BCNF (Boyce-Codd NF)
Example: Consider a relation schema BOOK_RATING(ISBN, Book_title, R_ID, Rating). The candidate keys are (ISBN, R_ID) and (Book_title, R_ID). This relation schema is not in BCNF: both candidate keys are composite and overlapping, and the dependencies between ISBN and Book_title (assumed here to determine each other) have determinants that are not superkeys. However, it is in 3NF.
Remedy: The problem can be resolved by decomposing this relation schema into two relation schemas as shown here.
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, ISBN, Rating)
Or
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, Book_title, Rating)
Now all these relation schemas are in BCNF.
Note that BCNF is the most desirable normal form, as it ensures the elimination of all redundancy that can be detected using functional dependencies.
Note: If there is only one determinant upon which other attributes depend and it is a candidate key, 3NF and BCNF are identical.

Normalization: 4NF (MVD)
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute. A multivalued dependency therefore always requires at least three attributes.
Example: Suppose a bike manufacturer produces two colors (white and black) of each model every year.
BIKE_MODEL  MANUF_YEAR  COLOR
M2011       2008        White
M2001       2008        Black
M3001       2013        White
M3001       2013        Black
M4006       2017        White
M4006       2017        Black
Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other. In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These dependencies are written:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".

Normalization: 4NF
A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multivalued dependency whose left side is not a superkey. A multivalued dependency A →→ B exists when, for a single value of A, a set of values of B exists independently of the other attributes.
Example: Is the table in 4NF?
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multivalued dependency on STU_ID, which leads to unnecessary repetition of data.

Normalization: 4NF
To bring the above table into 4NF, we decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey

Normalization: 5NF
A table is in 5NF if and only if it is in 4NF and every join dependency in it is implied by the candidate keys.
Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.

Tutorial Session 5: RA, SQL
LEARNING OUTCOME: RA
Example 1
EXAMPLE
LEARNING OUTCOME: SQL

SQL tutorials
Find the names and ages of all sailors.
SELECT DISTINCT S.sname, S.age
FROM Sailors S

Find all sailors with a rating above 7.
SELECT S.sid, S.sname, S.rating, S.age
FROM Sailors AS S
WHERE S.rating > 7

Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid = 103

Find the sids of sailors who have reserved a red boat.
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.bid = R.bid AND B.color = 'red'

SQL tutorials
Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'

Find all sailors with a rating less than 20.
SELECT S.sid, S.sname, S.rating, S.age
FROM Sailors AS S
WHERE S.rating < 20

Find the colors of boats reserved by Lubber.
SELECT B.color
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'

Find the names of sailors who have reserved at least one boat.
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

SQL tutorials
Compute increments for the ratings of persons who have sailed two different boats on the same day.
SELECT S.sname, S.rating + 1 AS rating
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND S.sid = R2.sid AND R1.day = R2.day AND R1.bid <> R2.bid

Find the ages of sailors whose name begins and ends with B and has at least three characters.
SELECT S.age
FROM Sailors S
WHERE S.sname LIKE 'B_%B'

Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2
WHERE S.sid = R1.sid AND R1.bid = B1.bid AND S.sid = R2.sid AND R2.bid = B2.bid AND B1.color = 'red' AND B2.color = 'green'
Or, using INTERSECT (UNION would instead return sailors who have reserved a red or a green boat):
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
INTERSECT
SELECT S2.sname
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
Note that intersecting on sname can give a wrong answer if two different sailors share a name; intersecting on sid avoids this.

Session 6
Topic: Logical design II (FD and Normalization)
LEARNING OUTCOME
FDs
Normalization
REFER: T1-Chapter 5 Sections: 5.1- 5.3

FUNCTIONAL DEPENDENCIES
What is a functional dependency?
In the example on this slide, Employee Name and Salary are functionally dependent on Employee number.
A functional dependency (FD) holds when one attribute (or set of attributes) determines another attribute in a relation. Functional dependencies play a vital role in telling a good database design from a bad one.
A functional dependency is denoted by an arrow →. The functional dependency of Y on X is written X → Y.
Example: if we know the value of Employee number, we can obtain Employee Name and Salary.
Written as: Employee number → {Employee Name, Salary}
Employee number  Employee Name  Salary
1                Dana           50000
2                Francis        38000
3                Andrew         25000
A functional dependency is a constraint between two sets of attributes in a relation from a database.

Functional Dependencies: Exercise
Determine the FDs for the following schema. Which is a valid FD?
TEXT → COURSE, TEACHER → COURSE, TEACHER → TEXT, COURSE → TEXT
For a relation state of TEACH, TEXT → COURSE is a possible functional dependency. However, TEACHER → COURSE, TEACHER → TEXT, and COURSE → TEXT are ruled out.
Determine whether the FD eid → {ename, age} is valid or not.
Determine whether the FD eid → ename is valid or not.

Functional Dependencies

Types of Functional Dependencies
Trivial dependency: X → Y is a trivial functional dependency if Y is a subset of X, for example {X, Y} → X. The dependencies X → X and Y → Y are also trivial.
Emp_id  Emp_name
AS555   Harry
AS811   George
AS999   Kevin
Consider this table with two columns, Emp_id and Emp_name. Here X is the superkey {Emp_id, Emp_name}, so {Emp_id, Emp_name} → Emp_id is a trivial functional dependency, as Emp_id is a subset of {Emp_id, Emp_name}.

Types of Functional Dependencies
Non-trivial functional dependency: A functional dependency X → Y is non-trivial when it holds and Y is not a subset of X.
Example:
Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Apple      Tim Cook       57
{Company} → {CEO} (if we know the Company, we know the CEO name)
Here X = Company, which is the candidate key (CK), and Y = CEO. Since CEO is not a subset of Company, this is a non-trivial functional dependency.

Types of Functional Dependencies
A multivalued dependency occurs when there are multiple independent multivalued attributes in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a relation. It requires that certain tuples be present in a relation.
Car_model  Maf_year  Color
H001       2017      Metallic
H001       2017      Green
H005       2018      Metallic
H005       2018      Blue
H010       2015      Metallic
H033       2012      Gray
Maf_year and Color are independent of each other but dependent on Car_model. In this example, these two columns are said to be multivalued dependent on Car_model. This dependence is written with a double arrow: Car_model →→ Maf_year and Car_model →→ Color.

Types of Functional Dependencies
A transitive dependency is a functional dependency that is formed indirectly by two functional dependencies.
{Company} → {CEO} (if we know the company, we know its CEO's name)
{CEO} → {Age} (if we know the CEO, we know the age)
Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Alibaba    Jack Ma        54
Therefore, by the rule of transitivity, {Company} → {Age} should hold. That makes sense: if we know the company name, we can find the age of its CEO.
Note: A transitive dependency can only occur in a relation of three or more attributes.

Types of Functional Dependencies
Full functional dependency: An FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
Here X is the key {SSN, Pnumber} and Y is Hours. If we remove SSN from X, then Pnumber → Hours is not valid. Similarly, if we remove Pnumber from X, then SSN → Hours is also not valid. So {SSN, Pnumber} → Hours is a full FD.
Partial FD: A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.
Here X is the key {SSN, Pnumber} and Y is Ename. If we remove Pnumber from X, FD2: SSN → Ename still holds, so {SSN, Pnumber} → Ename is a partial FD. Similarly, FD3: Pnumber → {Pname, Plocation} makes {SSN, Pnumber} → {Pname, Plocation} a partial FD.

FD
Advantages of Functional Dependency
Identifying functional dependencies helps avoid data redundancy, so the same data is not repeated at multiple locations in the database.
It helps you maintain the quality of data in the database.
It helps you define the meanings and constraints of databases.
It helps you identify bad designs.
It helps you find facts about the database design.

Key Attributes and their types:
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. In a practical relational database, each relation schema must have a primary key. If no candidate key is known for a relation, the entire relation can be treated as a default superkey.

Prime and Non-Prime attributes
Prime attribute (PA): An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. For example, both SSN and PNUMBER are prime attributes of WORKS_ON.
Non-prime attribute (NPA): An attribute of relation schema R is called a non-prime attribute if it is not a member of any candidate key, e.g., Hours is a non-prime attribute of WORKS_ON.
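The definitions above (candidate key, prime and non-prime attributes, partial FD) can be computed from a set of FDs. The following is a brute-force sketch suitable only for small classroom schemas. The function names are made up for illustration, and the schema is an assumed relation holding the attributes of FD1 to FD3 from these slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Try attribute sets from smallest to largest; keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(frozenset(attrs))
    return keys


def partial_dependencies(keys, non_prime, fds):
    """Pairs (part of a key, non-prime attribute) where the part alone determines it."""
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) & non_prime:
                    found.add((frozenset(part), attr))
    return found


# Assumed relation built from FD1, FD2, and FD3.
SCHEMA = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
FDS = [
    (frozenset({"SSN", "Pnumber"}), frozenset({"Hours"})),     # FD1
    (frozenset({"SSN"}), frozenset({"Ename"})),                # FD2
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),  # FD3
]

keys = candidate_keys(SCHEMA, FDS)
prime = set().union(*keys)
non_prime = SCHEMA - prime
partial = partial_dependencies(keys, non_prime, FDS)
print("candidate keys:", [sorted(k) for k in keys])
print("prime:", sorted(prime), "non-prime:", sorted(non_prime))
```

Hours does not appear in the partial dependencies because it needs the whole key {SSN, Pnumber}; the other three non-prime attributes depend on only one part of the key.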
Closure of Attributes using the FDs:
Let R = {A, B, C, D, E, F} and let F = {A → BC, E → CF, B → E, CD → EF, F → D} be a set of FDs.
1. Compute the closure of the set of attributes {A, B} under the given set of FDs.
2. Compute the closure of the attribute {A} under the given set of FDs.

Attribute closure and Extraneous Attributes:
F = {A → D, BC → A, BC → D, C → B, E → A, E → D}
Determine the extraneous attributes.

Closure of FD set or F+:
The set of functional dependencies that is logically implied by F is called the closure of F and is written as F+.

Problems using Armstrong's axioms: Problems 1 and 2

Canonical cover or minimal set of FDs:
1. Singleton RHS
2. Extraneous attributes removed
3. Redundant FDs removed
R(A, B, C) and F = {A → B, AB → C}. Find the minimal cover.
A canonical cover is allowed to have more than one attribute on the RHS; a minimal cover is not. For example, the canonical cover may be A → BC where the minimal cover would be A → B, A → C.

Canonical cover or minimal set of FDs:
A minimal cover of a set of functional dependencies E is a minimal set of functional dependencies F that is equivalent to E; in particular, every dependency in E is in the closure F+ of F.
A set of functional dependencies F is minimal if it satisfies the following conditions:
(i) Every dependency in F has a single attribute for its right-hand side.
(ii) We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
(iii) We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Canonical cover or minimal cover:
F = {A → B, AB → C, D → AC, D → E} and G = {A → BC, D → AB}. Does F cover G?
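The closure, cover, and minimal-cover exercises above can be checked with a few lines of code. This is a minimal sketch, assuming single-letter attribute names so that a string such as "BC" stands for the set {B, C}. The helpers fd, closure, covers, and minimal_cover are hypothetical names, and minimal_cover follows the three steps listed on the slide (it returns one minimal cover; others may exist).

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes, e.g. fd("AB", "C")."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every FD in g is in f+, tested with attribute closures under f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def minimal_cover(fds):
    # Step 1: singleton right-hand sides.
    work = [(lhs, frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop extraneous left-hand-side attributes.
    reduced = []
    for lhs, rhs in work:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, work):
                lhs.remove(a)
        reduced.append((frozenset(lhs), rhs))
    # Step 3: drop FDs that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))  # remove exact duplicates, keep order
    for item in list(result):
        rest = [x for x in result if x != item]
        if item[1] <= closure(item[0], rest):
            result = rest
    return result


F1 = [fd("A", "BC"), fd("E", "CF"), fd("B", "E"), fd("CD", "EF"), fd("F", "D")]
closure_ab = closure("AB", F1)
closure_a = closure("A", F1)

F = [fd("A", "B"), fd("AB", "C"), fd("D", "AC"), fd("D", "E")]
G = [fd("A", "BC"), fd("D", "AB")]
f_covers_g = covers(F, G)
g_covers_f = covers(G, F)

cover = minimal_cover([fd("A", "B"), fd("AB", "C")])
print(sorted(closure_ab), sorted(closure_a), f_covers_g, g_covers_f)
```

With these inputs, both closures contain all six attributes, F covers G but G does not cover F (D → E cannot be derived from G), and the minimal cover of {A → B, AB → C} is {A → B, A → C}.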
Problem
Find the candidate key (minimal key), the prime attributes, and the non-prime attributes (NPA):
R(A, B, C, D, E, F), F = {C → F, E → A, EC → D, A → B}
Find the redundant and non-redundant FDs:
R(A, B, C, D), F = {ABC → D, BC → D}

Logical Design - Normalization

Normalization
Note: Decomposing relations should preserve dependencies (i.e., the FDs of the original relation are not lost).

1NF

Normalization: 1NF
A relation is in 1NF if every attribute holds only atomic values. An attribute of a table cannot hold multiple values; it must be single-valued.
1NF disallows relations within relations or relations as attribute values within tuples. The only attribute values permitted by 1NF are single atomic (or indivisible) values.
First normal form disallows multivalued attributes, composite attributes, and their combinations.
EMPLOYEE table
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
Relation EMPLOYEE is not in 1NF because of the multivalued attribute EMP_PHONE.
SOLUTION: The EMPLOYEE table converted into 1NF is shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab

Normalization: 1NF
1NF (table values are single and atomic)
Check whether the table is in unnormalized form, UNF (i.e., table values are not single and atomic).
Yes: The relation should have no multivalued attributes or nested relations, so make all values single and atomic.
Remedy: Form new relations for each multivalued attribute or nested relation.

2NF

Normalization: 2NF
For 2NF, the relation must be in 1NF.
All non-key attributes are fully functional dependent on the primary key ie., no partial dependency. Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject. Candidate Keys/PK: {Teacher_Id, Subject} Non prime attribute: Teacher_Age TEACHER table TEACHER_ID SUBJECT TEACHER_AGE 25 25 47 83 83 Chemistry Biology English Math Computer 30 30 35 38 38 Is the Table in 2NF? In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF. To convert the given table into 2NF, we decompose it into two tables: 2NF requires that all data elements in a table are full functionally dependent on the table's primary key. • If data clement only dependent on part of primary key ie., partial dependent, then they are parsed out to separate tables. BITS Pilani, Pilani Campus Normalization : 2 NF TEACHER_DETAIL table: TEACHER_ID TEACHER_AGE 25 30 47 35 83 38 TEACHER_SUBJECT table: TEACHER_ID SUBJECT 25 Chemistry 25 Biology 47 English 83 Math 83 Computer BITS Pilani, Pilani Campus Normalization : 2 NF FD 1: SSN, Pnumber -> hours FD 2: SSN ename FD 3: Pnumber Pname, Plocation Now table in 2NF BITS Pilani, Pilani Campus 3 NF BITS Pilani, Pilani Campus Normalization: 3 NF A relation will be in 3NF If it is in 2NF and not contain any transitive partial dependency. 3NF is used to reduce the data duplication. It is also used to achieve the data integrity. If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form. A relation is in third normal form if it holds at-least one of the following conditions for every non-trivial function dependency X → Y. 1.X is a super key. 2.Y is a prime attribute, i.e., each element of Y is part of some candidate key. 
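The two 3NF conditions above can be tested directly on a small schema. This is a hedged sketch: closure, candidate_keys and violations_3nf are illustrative names, candidate keys are found by brute force (fine only for a handful of attributes), and only the FDs that are explicitly listed are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets that determine all of R."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            # A superset of a key already found is a superkey, not a minimal key.
            if any(k <= combo for k in keys):
                continue
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return keys


def violations_3nf(attrs, fds):
    """Return (lhs, attribute) pairs that break both 3NF conditions."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in rhs - lhs:  # skip the trivial part of the FD
            lhs_is_superkey = closure(lhs, fds) == attrs
            if not lhs_is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


# TEACHER table from the 2NF example: TEACHER_ID -> TEACHER_AGE
teacher = {"TEACHER_ID", "SUBJECT", "TEACHER_AGE"}
teacher_fds = [({"TEACHER_ID"}, {"TEACHER_AGE"})]
print(candidate_keys(teacher, teacher_fds))
print(violations_3nf(teacher, teacher_fds))

# TEACHER_DETAIL after decomposition
detail = {"TEACHER_ID", "TEACHER_AGE"}
print(violations_3nf(detail, teacher_fds))
```

For the TEACHER table the sketch reports TEACHER_ID -> TEACHER_AGE as a violation (the partial dependency that already breaks 2NF), while TEACHER_DETAIL has none.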
EMPLOYEE_DETAIL table:
EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}

Normalization: 3NF
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID.
Is the table in 3NF?
The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
SOLUTION: Move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:
EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:
EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal

Normalization: 3NF
3NF (no transitive dependency)
Check: Is the table already in 2NF (i.e., are the NPA fully functionally dependent on the key)?
If yes: Check if the table is in 3NF (i.e., no transitive dependency).
Test: The relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Or: A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
■ It is fully functionally dependent on every key of R.
■ It is nontransitively dependent on every key of R.
Remedy: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
Or we can say: A relation schema R is in 3NF if, whenever a nontrivial functional dependency X -> A holds in R, either X is a superkey of R or A is a prime attribute of R.
Example of a transitive dependency: SSN -> Dnumber and Dnumber -> Dname, DmgrSSN.

Boyce Codd NF or BCNF

BCNF (BOYCE CODD NF)
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every nontrivial functional dependency X → Y, X is a super key of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:
EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   DEPT_NO_OF_EMP
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

Candidate key: {EMP_ID, EMP_DEPT}
Super keys: {EMP_ID, EMP_DEPT} and every superset of it, e.g. {EMP_ID, EMP_DEPT, EMP_COUNTRY}, and so on.
In the above table the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, DEPT_NO_OF_EMP}
Is the table in BCNF?
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each is the LHS of an FD.
To convert it to BCNF: The table is decomposed so that each FD remains valid and the LHS of every FD is a super key of its table.

BCNF (BOYCE CODD NF)
To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:
EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:
EMP_DEPT     DEPT_TYPE   DEPT_NO_OF_EMP
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:
EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, DEPT_NO_OF_EMP}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies is a key.
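The BCNF test (the LHS of every nontrivial FD must be a super key) can be sketched for the EMPLOYEE example. This is an illustration under stated assumptions: the names closure, bcnf_violations and local_fds are hypothetical, and local_fds simply keeps the listed FDs whose attributes all fall inside a table, which is a simplification of a full projection of F+.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose left side is not a super key of the relation."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not attrs <= closure(lhs, fds)]


def local_fds(attrs, fds):
    """Simplification: keep only the listed FDs that lie entirely inside `attrs`."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds if (lhs | rhs) <= attrs]


FDS = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "DEPT_NO_OF_EMP"}),
]
EMPLOYEE = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "DEPT_NO_OF_EMP"}

# The original table: both FDs have a left side that is not a super key.
print(len(bcnf_violations(EMPLOYEE, FDS)))

# The three tables of the decomposition.
tables = {
    "EMP_COUNTRY": {"EMP_ID", "EMP_COUNTRY"},
    "EMP_DEPT": {"EMP_DEPT", "DEPT_TYPE", "DEPT_NO_OF_EMP"},
    "EMP_DEPT_MAPPING": {"EMP_ID", "EMP_DEPT"},
}
results = {name: bcnf_violations(cols, local_fds(cols, FDS))
           for name, cols in tables.items()}
print(results)
```

The original EMPLOYEE table yields two violations, and each of the three decomposed tables yields none, which matches the conclusion of the slide.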
BCNF (BOYCE CODD NF)
Example: Consider a relation schema BOOK_RATING(ISBN, Book_title, R_ID, Rating).
Two candidate keys = {(ISBN, R_ID), (Book_title, R_ID)}.
This relation schema is not in BCNF: both candidate keys are composite and overlapping, and the dependency between ISBN and Book_title has a determinant that is not a super key. However, it is in 3NF.
Remedy: The problem can be resolved by decomposing this relation schema into two relation schemas:
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, ISBN, Rating)
Or
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, Book_title, Rating)
Now, all these relation schemas are in BCNF.
Note that BCNF is the most desirable normal form as it ensures the elimination of all redundancy that can be detected using functional dependencies.
Note: If there is only one determinant upon which other attributes depend and it is a candidate key, 3NF and BCNF are identical.

BCNF (BOYCE CODD NF)
Normalize the relation PROFESSOR so that it is in BCNF.
The PROFESSOR relation is decomposed into two relations: PROF 1 and PROF 2.

BCNF (BOYCE CODD NF)
Note: Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
A relation in 3NF can fail to be in BCNF only if
1. the candidate keys in the relation are composite keys (that is, they are not single-attribute keys),
2. there is more than one candidate key in the relation, and
3. the keys overlap, that is, some attributes in the keys are common.

BCNF (BOYCE CODD NF)
Ex: A table of student-tutor pairs (ID, TutorID, TutorSIN), where each student may have only one tutor, but each tutor may have many students.
This table is subject to insertion anomalies, as both the TutorID and the TutorSIN must be entered whenever a tutor-student pair is entered. So decompose to convert to BCNF.
Candidate keys are: {ID, TutorID} and {ID, TutorSIN}
TutorID → TutorSIN and TutorSIN → TutorID hold, but because both TutorID and TutorSIN are prime attributes these FDs do not violate 3NF.
Neither TutorID nor TutorSIN alone is a superkey, and thus BCNF is violated.

4 NF

Normalization: 4NF (MVD)
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute, which is why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer which produces two colors (white and black) of each model every year.

BIKE_MODEL   MANUF_YEAR   COLOR
M2011        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The representation of these dependencies is shown below:
1. BIKE_MODEL ->> MANUF_YEAR
2. BIKE_MODEL ->> COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".

Normalization: 4NF
A relation is in 4NF if it is in Boyce Codd normal form and has no multivalued dependency.
For a dependency A ->> B, if for a single value of A multiple values of B exist independently of the other attributes, then the relation has a multivalued dependency.
Is the table in 4NF?
Example: STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing.
So there is a multivalued dependency on STU_ID, which leads to unnecessary repetition of data.

Normalization: 4NF
To bring the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE
STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STUDENT_HOBBY
STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey

BCNF to 4NF:
An entity type is in 4NF if it is in BCNF and there are no multivalued dependencies between its attribute types.
Any entity type in BCNF is transformed into 4NF as follows:
(i) Detect any multivalued dependencies.
(ii) Decompose the entity type.

5 NF

5NF
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).
Is the table in 5NF?

SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1

No, it is not.
In the above table, John takes both the Computer and the Math class for Semester 1, but he does not take the Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who will be taking that subject, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we cannot leave the other two columns blank.
5NF
To bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:

P1
SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math
Semester 1   Chemistry
Semester 2   Math

P2
SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3
SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   John
Semester 1   John
Semester 2   Akash
Semester 1   Praveen

Normalization: 5NF
A table is said to be in 5NF if and only if it is in 4NF and every join dependency in it is implied by the candidate keys.

Lossless Decomposition or Nonadditive JD
A decomposition {R1, R2, ..., Rn} of a relation R is called a lossless decomposition for R if the natural join of R1, R2, ..., Rn produces exactly the relation R.
A decomposition is lossless if we can recover the original relation:
R(A, B, C)
Decompose into R1(A, B) and R2(A, C)
Recover R'(A, B, C) by joining R1 and R2
Thus, R' = R

Lossy and Lossless decomposition
Problems: For each decomposition of R(A, B, C), find if it is a lossless or lossy join decomposition (JD).
1. R1(A, B) and R2(B, C)
2. R1(A, C) and R2(B, C)

R =
A   B   C
1   2   1
2   5   3
3   3   3

Example: Explain whether the following is a lossy or lossless join decomposition: R(A, B, C, D, E) and F = {A -> B, B -> C, D -> C, D -> E}, where R is decomposed into R1(A, B, D) and R2(C, D, E).
Solution (tableau method; a = distinguished symbol, b = non-distinguished symbol):

       A   B   C   D   E
R1     a   a   b   a   b
R2     b   b   a   a   a

Both rows agree on D, so D -> C and D -> E make the C and E entries of row R1 distinguished:

R1     a   a   a   a   a

Row R1 now consists entirely of distinguished symbols. Therefore LOSSLESS JD.
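For a decomposition into exactly two relations, the lossless-join property can also be checked with the standard FD test: the decomposition is lossless if the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies it to the example above; the function names and the single-letter attribute encoding are assumptions made for illustration, and the second decomposition is a made-up counterexample.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 must follow from the FDs."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# F = {A->B, B->C, D->C, D->E} on R(A, B, C, D, E)
F = [(set("A"), set("B")), (set("B"), set("C")),
     (set("D"), set("C")), (set("D"), set("E"))]

# Decomposition from the example: R1(A, B, D), R2(C, D, E)
print(is_lossless_binary("ABD", "CDE", F))

# Hypothetical counterexample: R1(A, B, C), R2(C, D, E) share only C
print(is_lossless_binary("ABC", "CDE", F))
```

Here R1 ∩ R2 = {D} and D+ = {C, D, E} = R2, so the first decomposition is lossless, in agreement with the tableau result; the counterexample shares only C, whose closure is {C}, so it is lossy.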
Example
Given: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Required FDs:
branch-name -> branch-city assets
loan-number -> amount branch-name
Decompose Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Show that the decomposition is a lossless decomposition.

Problems:
1. R(A, B, C, D, E) and F = {A -> D, B -> C, AB -> E}. Is it in 1NF, 2NF, 3NF or BCNF?
2. R = (A, B, C, D, E). We decompose it into R1 = (A, B, C), R2 = (A, D, E). The set of functional dependencies is: A → BC, CD → E, B → D, E → A. Show that this decomposition is a lossless-join decomposition.
3. R(A, B, C) and F = {A -> B, B -> C}. Is it in 3NF or not?
4. R(A, B, C, D) and F = {A -> BCD, BC -> D, D -> B}. Is it in BCNF or not?

NFs Overview
• Functional dependencies (FD): a tool to detect redundancies in schemas.
• Relations can be in different normal forms: the higher the normal form, the fewer the redundancies. But there is a trade-off (see above).
• If a relation is in BCNF, it is free of redundancies that can be detected using FDs. Thus, trying to decompose into BCNF is a good heuristic.
• If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations.
• Decompositions can be lossless and/or dependency-preserving. We must consider whether all FDs are preserved.
• If a dependency-preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), we should consider decomposition into 3NF.

Tutorial Session 6: Data storage, Indexing and Normalization.
LEARNING OUTCOME
Secondary storage devices (files, records, blocks on disks)
B and B+ trees
Hashing techniques (internal & external)
REFER: T1-Chapter 13 Sections: 13.1-13.8

Disk Parameters Calculation:
Usually, the disk manufacturer provides an average seek time s in milliseconds. The typical range of average seek time is 4 to 10 msec.
If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by
rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec.
Block transfer time (btt) = B/tr msec, where B is the block size in bytes and tr is the transfer rate in bytes/msec.
Transfer rate tr = (track size in bytes) / (time for one disk revolution in msec).
The average time needed to find and transfer a block, given its block address, is estimated by (s + rd + btt) msec.

Disk Parameters Calculation:
To transfer k noncontiguous blocks that are on the same cylinder, we need approximately s + (k * (rd + btt)) msec.
For consecutive blocks, the rotational delay is saved for all but the first block, so the estimate for transferring k consecutive blocks is s + rd + (k * btt) msec.
The bulk transfer rate (btr) takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then btr = (B/(B + G)) * tr bytes/msec.
The estimated time to read k blocks consecutively stored on the same cylinder becomes s + rd + (k * (B/btr)) msec.
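The disk timing formulas above translate directly into a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the input values (2400 rpm, 512-byte blocks, 128-byte gaps, 20 blocks per track, 30 msec seek time) are illustrative assumptions rather than parameters of a real disk.

```python
def rotational_delay_ms(rpm):
    """Average rotational delay: half of one revolution, in msec."""
    return 30000.0 / rpm


def transfer_rate(track_bytes, rpm):
    """Bytes per msec: one full track passes under the head per revolution."""
    revolution_ms = 60000.0 / rpm
    return track_bytes / revolution_ms


# Illustrative parameters (assumed values)
B, G = 512, 128            # block size and interblock gap, in bytes
blocks_per_track = 20
rpm = 2400
s = 30.0                   # average seek time, in msec
k = 20                     # number of blocks to transfer

track_bytes = blocks_per_track * (B + G)
tr = transfer_rate(track_bytes, rpm)      # transfer rate, bytes/msec
rd = rotational_delay_ms(rpm)             # rotational delay, msec
btt = B / tr                              # block transfer time, msec
btr = (B / (B + G)) * tr                  # bulk transfer rate, bytes/msec

one_block = s + rd + btt                         # find and transfer one block
k_same_cylinder = s + k * (rd + btt)             # k noncontiguous blocks, same cylinder
k_consecutive = s + rd + k * btt                 # k consecutive blocks, gaps ignored
k_consecutive_gaps = s + rd + k * (B / btr)      # k consecutive blocks, gaps included

for name, value in [("tr", tr), ("rd", rd), ("btt", btt), ("btr", btr),
                    ("one_block", one_block),
                    ("k_same_cylinder", k_same_cylinder),
                    ("k_consecutive", k_consecutive),
                    ("k_consecutive_gaps", k_consecutive_gaps)]:
    print(f"{name} = {value:.2f}")
```

With these assumed values the sketch gives approximately tr = 512 bytes/msec, rd = 12.5 msec, btt = 1 msec and btr = 409.6 bytes/msec, so one block takes about 43.5 msec and 20 consecutive blocks about 62.5 msec (67.5 msec when gaps are included).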
Placing file records on Disk

Disk Parameters Calculation: Formula
A. Usually, the disk manufacturer provides an average seek time s in milliseconds.
B. The typical range of average seek time is 4 to 10 msec.
C. If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec. Equivalently, time for one revolution = (60 * 1000)/p msec, and rd = (time for one revolution)/2.
D. Block transfer time (btt) = B/tr msec, where B is the block size and tr is the transfer rate.
E. Transfer rate tr = (track size in bytes) / (time for one disk revolution in msec).
F. The average time needed to find and transfer a block, given its block address, is estimated by (s + rd + btt) msec.
G. To transfer k noncontiguous blocks that are on the same cylinder, we need approximately s + (k * (rd + btt)) msec.
H. For consecutive blocks, the rotational delay is saved for all but the first block, so the estimate for transferring k consecutive blocks is s + rd + (k * btt) msec.
I. The bulk transfer rate (btr) takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then btr = (B/(B + G)) * tr bytes/msec.
J. The estimated time to read k blocks consecutively stored on the same cylinder becomes s + rd + (k * (B/btr)) msec.
K. Blocking factor bfr = floor(B/R), where B is the block size in bytes and R is the record size in bytes.

Problem
Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block size B = 512 bytes, interblock gap size G = 128 bytes, number of blocks per track = 20, number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
(a) What is the total capacity of a track and what is its useful capacity (excluding interblock gaps)?
(b) How many cylinders are there?
(c) What is the total capacity and the useful capacity of a cylinder?
(d) What is the total capacity and the useful capacity of a disk pack?
(e) Suppose the disk drive rotates the disk pack at a speed of 2400 rpm (revolutions per minute); what is the transfer rate in bytes/msec and the block transfer time btt in msec? What is the average rotational delay rd in msec? What is the bulk transfer rate?
(f) Suppose the average seek time is 30 msec. How much time does it take (on the average) in msec to locate and transfer a single block given its block address?
(g) Calculate the average time it would take to transfer 20 random blocks and compare it with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.
NOTE: Write the units after computed values.

Solution
(a) Block size B = 512 bytes, interblock gap size G = 128 bytes, number of blocks per track = 20.
Storage taken by 1 block = block size + 1 gap = (512 + 128) = 640 bytes
Total track size = 20 * (512 + 128) = 12800 bytes = 12.8 Kbytes
Useful capacity of a track = 20 * 512 = 10240 bytes = 10.24 Kbytes (i.e., excluding the gap size)
(b) Number of cylinders = number of tracks per surface = 400
(c) A disk pack consists of 15 double-sided disks, so a cylinder has 15 * 2 = 30 tracks.
Total cylinder capacity = 15 * 2 * 20 * (512 + 128) = 384000 bytes = 384 Kbytes
Useful cylinder capacity = 15 * 2 * 20 * 512 = 307200 bytes = 307.2 Kbytes (i.e., excluding the gap size)
(d) The number of tracks per surface is 400.
Total capacity of a disk pack = 15 * 2 * 400 * 20 * (512 + 128) = 153600000 bytes = 153.6 Mbytes
Useful capacity of a disk pack = 15 * 2 * 400 * 20 * 512 = 122880000 bytes = 122.88 Mbytes (i.e., excluding the gap size)

Solution
(e) Using formula E: transfer rate tr = (total track size in bytes)/(time for one disk revolution in msec)
tr = 12800 / ((60 * 1000)/2400) = 12800/25 = 512 bytes/msec
Using formula D: block transfer time btt = B/tr = 512/512 = 1 msec
Using formula C: average rotational delay rd = (time for one disk revolution in msec)/2 = 25/2 = 12.5 msec
Using formula I: bulk transfer rate btr = tr * (B/(B + G)) = 512 * (512/640) = 409.6 bytes/msec
(f) Using formula F: average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec
(g) Using the values calculated in the previous steps:
time to transfer 20 random blocks = 20 * (s + rd + btt) = 20 * 43.5 = 870 msec
time to transfer 20 consecutive blocks using double buffering = s + rd + 20 * btt = 30 + 12.5 + (20 * 1) = 62.5 msec
(A more accurate estimate of the latter can be calculated using the bulk transfer rate as follows: time to transfer 20 consecutive blocks using double buffering = s + rd + ((20 * B)/btr) = 30 + 12.5 + (10240/409.6) = 42.5 + 25 = 67.5 msec.)
NOTE: Write the units after computed values.

Problem
Suppose the XAT supplier company stores its information in files. A Supplier file has rec = 1000 records of fixed length. Each record has the following fields/columns (sizes in bytes): sup# (10), part# (10), pname (200), pdescp (700) and a deletion marker byte. The file is stored on a disk whose parameters are: block size B = 1024 bytes; interblock gap size G = 200 bytes; number of blocks per track = 25; number of tracks per surface = 500. A disk pack consists of 18 double-sided disks, seek time s = 20 msec, rotational delay rd = 12.5 msec, and rotation speed = 2000 rpm.
a. Calculate the record size in bytes.
b. Calculate the blocking factor and the number of file blocks b, assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored contiguously.
d. Assume that the file is ordered by part#; by doing a binary search, calculate the time it takes to search for a record given its part# value.
NOTE: Write the units after computed values.
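The quantities asked for in the Supplier file problem can be computed step by step with the formulas from the formula slide. The sketch below is an illustration, not an official solution: variable names are made up, and it takes s = 20 msec and rd = 12.5 msec exactly as stated in the problem rather than deriving rd from the rotation speed.

```python
import math

# Field sizes in bytes, as listed in the problem statement
fields = {"sup#": 10, "part#": 10, "pname": 200, "pdescp": 700,
          "deletion_marker": 1}
num_records = 1000
B, G = 1024, 200           # block size and interblock gap, in bytes
blocks_per_track = 25
rpm = 2000
s, rd = 20.0, 12.5         # seek time and rotational delay as given, in msec

# a. Record size
R = sum(fields.values())

# b. Blocking factor and number of blocks (unspanned: records never split)
bfr = B // R
b = math.ceil(num_records / bfr)

# Transfer rates
revolution_ms = 60000.0 / rpm
track_bytes = blocks_per_track * (B + G)
tr = track_bytes / revolution_ms          # bytes/msec
btt = B / tr                              # msec per block
btr = (B / (B + G)) * tr                  # bytes/msec, gaps included

# c. Linear search reads half of the blocks on average
half = b / 2
linear_contiguous = s + rd + half * (B / btr)
linear_scattered = half * (s + rd + btt)

# d. Binary search touches about ceil(log2 b) blocks
binary_search = math.ceil(math.log2(b)) * (s + rd + btt)

print(f"R = {R} bytes, bfr = {bfr}, b = {b} blocks")
print(f"tr = {tr:.2f} bytes/msec, btt = {btt:.3f} msec, btr = {btr:.2f} bytes/msec")
print(f"linear search, contiguous: {linear_contiguous:.2f} msec")
print(f"linear search, scattered:  {linear_scattered:.2f} msec")
print(f"binary search:             {binary_search:.2f} msec")
```

Under these assumptions the record size is 921 bytes, only one record fits per block, and the file occupies 1000 blocks; a contiguous linear search costs roughly 632.5 msec, a non-contiguous one roughly 16752 msec, and a binary search roughly 335 msec.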
Solution
a. Calculate the record size in bytes.
Record size R = 10 + 10 + 200 + 700 + 1 = 921 bytes.
b. Calculate the blocking factor and the number of file blocks b, assuming an unspanned organization.
Unspanned means that a record is never split across blocks: if a record cannot fit in the space left in a block, the whole record is stored in the next block.
Using formula K: blocking factor bfr = floor(B/R), where B is the block size in bytes and R is the record size in bytes.
Given B = 1024 bytes: bfr = floor(B/R) = floor(1024/921) = floor(1.112) = 1 (i.e., 1 record per block).
Therefore for 1000 records we need 1000 blocks: b = ceil(rec/bfr) = ceil(1000/1) = 1000 blocks.

Solution
c. Calculate the average time it takes to find a record by doing a linear search on the file.
(i) The file blocks are stored contiguously, and double buffering is used.
On average a linear search reads half of the file blocks: Hbs = b/2 = 1000/2 = 500 blocks.
Total track size = 25 * (1024 + 200) = 30600 bytes
Time for 1 revolution of the disk = 60 * 1000 / rpm = 60 * 1000/2000 = 30 msec
tr = track size / time for 1 revolution = 30600/30 = 1020 bytes/msec
btr = (B/(B + G)) * tr = (1024/(1024 + 200)) * 1020 = 853.33 bytes/msec
Given seek time s = 20 msec and rd = 12.5 msec:
Time = s + rd + (Hbs * B/btr) = 20 + 12.5 + (500 * (1024/853.33)) = 632.5 msec = 0.6325 sec

Solution
(ii) The file blocks are not stored contiguously.
Time = Hbs * (s + rd + btt), where btt = B/tr, tr = 30600 bytes / 30 msec = 1020 bytes/msec, Hbs = 500 and B = 1024.
btt = 1024/1020 = 1.004 msec
Given seek time s = 20 msec and rd = 12.5 msec:
Time = Hbs * (s + rd + btt) = 500 * (20 + 12.5 + 1.004) = 16752 msec = 16.752 sec
d. Assume that the file is ordered by part#; by doing a binary search, calculate the time it takes to search for a record given its part# value.
Time = ceil(log2 b) * (s + rd + btt), with b = 1000 blocks (calculated above)
= ceil(log2 1000) * (20 + 12.5 + 1.004) = 10 * 33.504 = 335.04 msec
NOTE: Write the units after computed values.

Question?
Without rearranging the actual records, can you put them in a particular order based on a key field or fields?
Sol: By indexing or hashing.
Indexing works like the index at the end of a book: you search the index (linearly or with binary search) for the term and read off the page number where that term appears in the book. The catch is that both the file (the actual data) and an index to it are needed.
Hashing takes O(1) time on average: it does not need to search but computes the location of the data in the file.

B+ TREE Tutorials

CRUD on B+ tree
Create a B+ tree with the keys 2, 5, 7, 10, 13, 16, 20, 22, 23, 24.
Delete 23, 10.

HASHING TECHNIQUE
Hashing is a technique where data is stored in data blocks whose addresses are generated by a hash function.
The storage location where these records are stored is called a data block or data bucket. A data bucket is capable of storing one or more records.

Structure of extendible hashing
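The idea that a hash function generates the address of the data bucket holding a record can be sketched with a tiny in-memory structure. This is an illustrative assumption-laden example, not the extendible hashing scheme: the class name HashFile, the modulo hash function, the fixed number of buckets and the sample records are all made up.

```python
class HashFile:
    """Toy static hash file: each bucket holds zero or more (key, record) pairs."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def address(self, key):
        # Hash function: the bucket address is computed, not searched for.
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.address(key)].append((key, record))

    def search(self, key):
        # Only the one bucket given by the hash function is scanned.
        for k, record in self.buckets[self.address(key)]:
            if k == key:
                return record
        return None


hash_file = HashFile(num_buckets=4)
for key in [2, 5, 7, 10, 13, 16, 20, 22, 23, 24]:
    hash_file.insert(key, f"record-{key}")

print(hash_file.address(23))
print(hash_file.search(23))
print(hash_file.search(99))
```

Keys that hash to the same address share a bucket, so a lookup scans only that bucket instead of the whole file; a real hash file additionally has to handle buckets that overflow a disk block.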