Combined DDA Class PPT

Database Design and Applications
(CSIZG518/SEZG518/SSZG518) (S2-22)
BITS Pilani
Pilani Campus
Prof Uma Maheswari
DDA Course Content
1. Introduction and Overview of DBMS
2. Conceptual Database Design (ER and EER Modeling)
3. Relational Model
4. Relational Algebra and Calculus
5. SQL
6. Schema Refinement and Normal Forms
7. Disk Storage
8. Hashing and Indexing
9. Transaction Management and Concurrency Control
10. Database Recovery
11. Database Security
12. Query Processing and Optimization
13. Schema-less Databases: Introduction to NoSQL
BITS Pilani, Pilani Campus
Session 1
Introduction to Database Management Systems (DBMS)
Concepts and Architecture
Learning Objectives
Introduction to RDBMS
RDBMS vs. traditional file systems
Three-schema architecture
Data independence
DBMS architecture
Data dictionary
Database design phases
Refer: T1, Chapters 1 and 2; RL: 1.1, 1.2
What Is a Database?
A single unit of information is called a datum; data is its plural. By data, we mean known facts that can be recorded and that have implicit meaning. A database:
– Is a collection of related data
– Represents some aspect of the real world
– Is a logically coherent collection of data with inherent meaning
– Is built for a specific purpose
In short, a collection of related data with an implicit meaning is a database.
Types of Data (with examples)
Character: A, N, $
Number: 12, 56.9, 54/78, -5.6777, 89,677
Date: 03-08-2019, 1/5/2018
Currency: $56678, Rs 34900
Boolean: 1 or 0, true or false
Sets: {RED, YELLOW, GREEN, BLUE}, {USD, INR, AUD, EURO, POUND}
What Is a Database? (Examples)
A collection of related data with an implicit meaning is called a database.
Examples of databases:
• Employee database: everything about the employees of one or more branches of an organization
• Sales database: everything about sales and salespeople in one or more branches of an organization
• User database: everyone who uses the system, with their credentials and log details, and so on
Database Size:
• The size of a database can range from small to very large, depending on how the application uses its data in terms of volume, variety, velocity, value and validity.
• The term "large database" typically refers to a database with several dozen gigabytes of data and a schema with more than 30 or 40 distinct entity types; such databases cover a wide array of applications in government, industry, and financial and commercial institutions.
• Application systems for these databases are called transaction processing systems (TPS) because of the large transaction volumes and rates they must support.
Types of data used by Applications:
Types of Databases and Database Applications
Traditional database applications
– Store textual or numeric information
Multimedia databases
– Store images, audio clips, and video streams digitally
Geographic information systems (GIS)
– Store and analyze maps, weather data, and satellite images
Data warehouses and online analytical processing (OLAP) systems
– Extract and analyze useful business information from very large databases
– Support decision making
Real-time and active database technology
– Controls industrial and manufacturing processes
Time-series databases
– Financial data, e.g., the volatility of stock trading
DBMS and Its Operations
What Is a Database Management System (DBMS)?
• A collection of programs
• Enables users to create and maintain a database
Definition:
The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.
How do we share the database? Through a database server.
Traditional file processing
How Does a DBMS Look?
DBMS Environment
What Operations Can You Perform with a DBMS?
• Defining a database
• Constructing the database
• Manipulating a database
• Sharing a database
• Querying a database
DBMS operations in detail
Defining a database
– Specify the data types, structures, and constraints of the data to be stored.
– The database definition or descriptive information is also stored by the
DBMS in the form of a database catalog or dictionary; it is called metadata.
Constructing the database is the process of storing the data on some storage
medium that is controlled by the DBMS.
Manipulating a database includes functions such as querying the database to
retrieve specific data, updating the database and generating reports from the
data.
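As a minimal sketch of the defining, constructing and manipulating operations, the following uses Python's built-in sqlite3 module, with an in-memory SQLite database standing in for a DBMS. The Student table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS-managed database.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute("""
    CREATE TABLE Student (
        sid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        major TEXT
    )
""")

# Constructing: store data on the storage medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO Student (sid, name, major) VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS"), (21, "Lee", "Math")],
)
conn.commit()

# Manipulating: query the database to retrieve specific data...
cs_students = [row[0] for row in conn.execute(
    "SELECT name FROM Student WHERE major = ? ORDER BY name", ("CS",))]

# ...and update it.
conn.execute("UPDATE Student SET major = ? WHERE sid = ?", ("Math", 8))
conn.commit()
math_count = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE major = 'Math'").fetchone()[0]

print(cs_students)  # ['Brown', 'Smith']
print(math_count)   # 2
```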
What Is the Sharing Operation in a DBMS?
– Sharing a database allows multiple users and programs to access the database simultaneously.
How Do We Access a Database from Application Programs?
An application program accesses a database by sending queries to the DBMS. There are three main integration approaches:
– Embed SQL in the host language (embedded SQL)
– Use a special API to call SQL commands (JDBC, ODBC)
– Allow ‘external’ code to be executed from within SQL
What Is a Query?
A DBMS operation that causes some data to be retrieved.
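Embedded SQL (C host language; the EXEC SQL statements are translated by a precompiler such as ECPG or Pro*C)
{
    EXEC SQL BEGIN DECLARE SECTION;
    int a;                  /* host variable that receives the salary */
    EXEC SQL END DECLARE SECTION;
    /* ... */
    EXEC SQL SELECT salary INTO :a
             FROM Employee
             WHERE SSN = 876543210;
    /* ... */
    printf("The salary is %d\n", a);   /* requires #include <stdio.h> */
    /* ... */
}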
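Using SQL in an API Call (Node.js with Express and node-postgres; app and pool are assumed to be set up elsewhere)
app.post("/api/getAllcustomers", (req, res) => {
  const { city } = req.body;            // city sent by the client
  console.log("city is ..", city);
  pool.query(
    "SELECT * FROM customers WHERE customercity = $1",  // $1 is a bound parameter
    [city],
    (error, results) => {
      if (error) {
        console.log(error);
        res.status(500).json({ error: "Query failed" });  // always send a response
      } else {
        res.status(200).json(results.rows);
      }
    }
  );
});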
Allow ‘external’ code to be executed from within SQL
Example of Python code execution in SQL Server:
EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'
a = 1
b = 2
c = a + b
print("Example instruction on Python")
print("Result =", c)';
To allow the use of external scripts in the Python language, you must enable the system parameter “external scripts enabled” in SQL Server. This is done with the system procedure sp_configure:
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
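sp_execute_external_script is specific to SQL Server. A loosely analogous idea, sketched here on the assumption that SQLite via Python's sqlite3 module is an acceptable stand-in, is to register Python code with Connection.create_function so that SQL statements can call it like a built-in function. The grade_points function and the Enrolled table are hypothetical.

```python
import sqlite3

def grade_points(grade):
    """Hypothetical mapping from a letter grade to grade points."""
    return {"A": 4, "B": 3, "C": 2, "D": 1}.get(grade, 0)

conn = sqlite3.connect(":memory:")
# Register the Python function under an SQL name, with its number of arguments.
conn.create_function("grade_points", 1, grade_points)

conn.execute("CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [(17, "CS101", "A"), (17, "CS102", "B"), (8, "CS101", "C")])

# The external (Python) code now runs from within the SQL query.
rows = conn.execute(
    "SELECT sid, AVG(grade_points(grade)) FROM Enrolled GROUP BY sid ORDER BY sid"
).fetchall()
print(rows)  # [(8, 2.0), (17, 3.5)]
```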
Mobile Access to a Database
CASE STUDY: UNIVERSITY Database Design
Examples of Queries and Updates:
Examples of queries:
– Retrieve the transcript
– List the names of students who took the section of the ‘Database’
course offered in fall 2008 and their grades in that section
– List the prerequisites of the ‘Database’ course
Examples of updates:
– Change the class of ‘Smith’ to sophomore
– Create a new section for the ‘Database’ course for this semester
– Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last
semester
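A few of the queries and updates above can be sketched in SQL, run here through Python's sqlite3 module. The table layouts, course and section numbers, and the coding of class (1 = freshman, 2 = sophomore) are assumptions made for illustration, not the course's official UNIVERSITY schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (student_number INTEGER PRIMARY KEY, name TEXT,
                          class INTEGER);  -- 1 = freshman, 2 = sophomore, ...
    CREATE TABLE COURSE (course_number TEXT PRIMARY KEY, course_name TEXT);
    CREATE TABLE SECTION (section_id INTEGER PRIMARY KEY, course_number TEXT,
                          semester TEXT, year INTEGER);
    CREATE TABLE GRADE_REPORT (student_number INTEGER, section_id INTEGER, grade TEXT);
    CREATE TABLE PREREQUISITE (course_number TEXT, prerequisite_number TEXT);

    INSERT INTO STUDENT VALUES (17, 'Smith', 1), (8, 'Brown', 2);
    INSERT INTO COURSE VALUES ('CS3380', 'Database'), ('CS1310', 'Intro to CS'),
                              ('CS3320', 'Data Structures');
    INSERT INTO SECTION VALUES (85, 'CS3380', 'Fall', 2008),
                               (92, 'CS1310', 'Fall', 2008),
                               (135, 'CS3380', 'Spring', 2009);
    INSERT INTO GRADE_REPORT VALUES (8, 85, 'A'), (8, 92, 'A'), (17, 92, 'B');
    INSERT INTO PREREQUISITE VALUES ('CS3380', 'CS3320'), ('CS3380', 'CS1310');
""")

# Query: students who took the 'Database' section offered in fall 2008, with grades.
fall_2008_db = conn.execute("""
    SELECT S.name, G.grade
    FROM STUDENT S
    JOIN GRADE_REPORT G ON S.student_number = G.student_number
    JOIN SECTION SE ON G.section_id = SE.section_id
    JOIN COURSE C ON SE.course_number = C.course_number
    WHERE C.course_name = 'Database' AND SE.semester = 'Fall' AND SE.year = 2008
    ORDER BY S.name
""").fetchall()

# Query: prerequisites of the 'Database' course.
prereqs = [r[0] for r in conn.execute("""
    SELECT P.prerequisite_number
    FROM PREREQUISITE P JOIN COURSE C ON P.course_number = C.course_number
    WHERE C.course_name = 'Database'
    ORDER BY P.prerequisite_number
""")]

# Updates, committed together by the connection's context manager.
with conn:
    conn.execute("UPDATE STUDENT SET class = 2 WHERE name = 'Smith'")    # to sophomore
    conn.execute("INSERT INTO SECTION VALUES (136, 'CS3380', 'Fall', 2009)")  # new section
    conn.execute("INSERT INTO GRADE_REPORT VALUES (17, 135, 'A')")      # grade for Smith

print(fall_2008_db)  # [('Brown', 'A')]
print(prereqs)       # ['CS1310', 'CS3320']
```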
Features of DBMS
The features of a DBMS are:
• Data independence
• Backup and restore
• Transaction and concurrency control
• Data security
• Data integrity
Protection for the Database
Protection includes:
– System protection against hardware or software malfunctions (crashes)
– Security protection against unauthorized or malicious access, such as hacking the database
What Is “Maintaining the DB System”?
– Allowing the system to evolve as requirements change over time.
Database maintenance is therefore an activity designed to keep a database running smoothly. It involves:
1. Work performed by people who are comfortable with and familiar with the database system and the specifics of the particular database
2. Keeping the database clean and well organized so that it does not lose functionality (databases maintain a library of information in a well-organized, accessible format)
3. Backing up the data
4. Checking for signs of corruption in the database
5. Server maintenance
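Items 3 and 4 can be sketched with Python's sqlite3 module, assuming SQLite as the DBMS: Connection.backup (Python 3.7 and later) copies a live database, and PRAGMA integrity_check returns the single row 'ok' when it finds no problems. Production DBMSs ship their own backup and consistency tools; the Employee table here is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Employee (ssn TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
source.execute("INSERT INTO Employee VALUES ('876543210', 'Smith', 30000)")
source.commit()

# 3. Backing up the data: copy every page of the live database into another one.
backup = sqlite3.connect(":memory:")  # in practice, a file on separate storage
source.backup(backup)

# 4. Checking for signs of corruption: SQLite reports 'ok' if none are found.
status = backup.execute("PRAGMA integrity_check").fetchone()[0]
copied = backup.execute("SELECT name, salary FROM Employee").fetchall()

print(status)  # ok
print(copied)  # [('Smith', 30000)]
```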
Database approach
Characteristics of the Database Approach:
1. Self-describing nature of a database system
2. Insulation between programs and data, and data abstraction
3. Support of multiple views of the data
4. Sharing of data and multiuser transaction processing
Database approach
1. Self-Describing Nature of a Database System
The database system contains a complete definition of the database structure and constraints.
Metadata
– Describes the structure of the database
The database catalog is used by:
– DBMS software
– Database users who need information about the database structure
Database approach
2. Insulation between programs and data implies “data abstraction”
• Program-data independence
– The structure of data files is stored in the DBMS catalog separately from the access programs
• Program-operation independence
– Operations are specified in two parts:
• Interface: includes the operation name and the data types of its arguments
• Implementation: can be changed without affecting the interface
Database Approach
3. Support of multiple views of the data
• View
– A subset of the database
– Contains virtual data derived from the database files, but is not explicitly stored
• Multiuser DBMS
– Users have a variety of distinct applications
– The DBMS must provide facilities for defining multiple views
Database Approach: Multiple Views of the Data
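The idea that a view holds virtual data can be sketched in SQLite through Python's sqlite3 module. The TRANSCRIPT view and the tables beneath it are hypothetical; because the view's rows are computed from the base tables when it is queried, an update to a base table shows up in the view without touching the view itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (student_number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE GRADE_REPORT (student_number INTEGER, course TEXT, grade TEXT);
    INSERT INTO STUDENT VALUES (17, 'Smith'), (8, 'Brown');
    INSERT INTO GRADE_REPORT VALUES (17, 'Database', 'B'), (8, 'Database', 'A');

    -- A view is a named query over the base tables; its rows are not stored.
    CREATE VIEW TRANSCRIPT AS
        SELECT S.name, G.course, G.grade
        FROM STUDENT S JOIN GRADE_REPORT G ON S.student_number = G.student_number;
""")

before = conn.execute("SELECT grade FROM TRANSCRIPT WHERE name = 'Smith'").fetchone()[0]

# Update only the base table; the view reflects it because it is recomputed on use.
conn.execute("UPDATE GRADE_REPORT SET grade = 'A' WHERE student_number = 17")
after = conn.execute("SELECT grade FROM TRANSCRIPT WHERE name = 'Smith'").fetchone()[0]

print(before, after)  # B A
```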
Database Approach
4. Sharing of data and multiuser transaction processing
Allow multiple users to access the database at the same time.
• Concurrency control software
– Ensures that several users trying to update the same data do so in a controlled manner, so that the result of the updates is correct
• Online transaction processing (OLTP) applications
A multiuser DBMS must ensure that concurrent transactions operate correctly and efficiently.
What Is a Transaction in a DB?
A transaction may cause some data to be read from, and some data to be written into, the database.
E.g., when you credit or debit your bank account. Transferring 500 from A’s account to B’s account:
A’s account:
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s account:
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
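The transfer above can be sketched as one atomic transaction using Python's sqlite3 module. The Account table and the transfer helper are hypothetical; the point is that the debit and the credit are committed together or rolled back together, whether the failure comes from a CHECK constraint (an overdraft) or from the credit step (an unknown account).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (owner TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
            cur = conn.execute("UPDATE Account SET balance = balance + ? WHERE owner = ?",
                               (amount, dst))
            if cur.rowcount != 1:
                raise ValueError(f"unknown account {dst!r}")
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM Account ORDER BY owner"))

ok = transfer(conn, "A", "B", 500)         # committed: A = 500, B = 700
overdraft = transfer(conn, "A", "B", 900)  # CHECK fails on the debit, nothing changes
bad_dst = transfer(conn, "A", "Z", 100)    # debit is undone because the credit failed
print(ok, overdraft, bad_dst, balances(conn))  # True False False {'A': 500, 'B': 700}
```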
Three-schema architecture
Concept of the Three Levels (Layers) of a DB
Each level has to handle two issues:
• How to store the data
• How to view the stored data
3-Layer Architecture
Three schema architecture
The three levels (layers) of the database or DBMS architecture are the external, conceptual (logical), and physical (internal) levels.
• Each level has a schema and a data abstraction.
• These abstractions are called the external, conceptual and physical level abstractions, respectively.
• The abstractions are needed to view the data, while the schema at each level describes the data at that level of the database.
External Schema
• External schemas, which are usually also expressed in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users.
• Any given database has exactly one conceptual schema and one physical schema, because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users.
• Each external schema consists of a collection of one or more views and relations from the conceptual schema.
• Users' SQL queries are posed at this level.
Conceptual Schema
The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS.
In a relational DBMS, the conceptual schema describes all relations that are stored in the database. For example, for a university:
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)
The design of conceptual schemas is called “conceptual database design”.
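As a sketch, the conceptual schema above can be written as SQL table definitions, here for SQLite via Python's sqlite3 module, mapping string to TEXT, real to REAL and integer to INTEGER. The primary keys are assumptions, since the schema lists only attribute names and types.

```python
import sqlite3

CONCEPTUAL_SCHEMA = """
CREATE TABLE Faculty  (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Rooms    (rno INTEGER PRIMARY KEY, address TEXT, capacity INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT, PRIMARY KEY (sid, cid));
CREATE TABLE Teaches  (fid TEXT, cid TEXT, PRIMARY KEY (fid, cid));
CREATE TABLE Meets_In (cid TEXT, rno INTEGER, time TEXT, PRIMARY KEY (cid, rno, time));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(CONCEPTUAL_SCHEMA)

# The DBMS records these definitions in its catalog (sqlite_master in SQLite).
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Courses', 'Enrolled', 'Faculty', 'Meets_In', 'Rooms', 'Teaches']
```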
Physical Schema
• The physical schema specifies additional storage details. Essentially, it summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes.
• We must decide which file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations. Indexes can be based on hashing or on trees.
• Decisions about the physical schema are based on an understanding of how the data is typically accessed. The process of arriving at a good physical schema is called physical database design.
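A physical-design decision such as adding an index can be sketched in SQLite via Python's sqlite3 module (SQLite indexes are B-trees). The index name idx_enrolled_cid and the generated rows are hypothetical. EXPLAIN QUERY PLAN reports whether the planner uses the index; its exact wording varies across SQLite versions, but the query's result set does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [(f"s{i}", f"c{i % 10}", "A") for i in range(1000)])

query = "SELECT sid, grade FROM Enrolled WHERE cid = ?"
rows_before = conn.execute(query, ("c3",)).fetchall()

# Physical design: an auxiliary index on cid to speed up lookups by course.
conn.execute("CREATE INDEX idx_enrolled_cid ON Enrolled (cid)")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("c3",)))
rows_after = conn.execute(query, ("c3",)).fetchall()

print(plan)  # should mention idx_enrolled_cid
print(len(rows_before), sorted(rows_before) == sorted(rows_after))  # 100 True
```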
Example 1: 3-level schema
Example 2: 3-level schema
Data Independence
A database system normally contains a lot of data in addition to users’ data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database.
But as a DBMS expands, it needs to change over time to satisfy the requirements of its users. If all the data were interdependent, making these changes would become a tedious and highly complex job.
DATA INDEPENDENCE
Data independence is the property of a DBMS that lets you change the database schema at one level of a database system without having to change the schema at the next higher level. Data independence keeps data separated from the programs that use it.
Data abstraction at each level makes this data independence possible.
Types of data independence
There are two types: logical and physical data independence.
DATA INDEPENDENCE
1. Logical data independence:
Logical data is data about the database, that is, information about how data is managed inside it; for example, a table (relation) stored in the database and all the constraints applied to that relation.
Logical data independence is the ability to change the conceptual (logical) schema, for example by adding a column to a table, without having to change the external schemas or the application programs written against them.
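Logical data independence can be sketched in SQLite via Python's sqlite3 module. The application below (a hypothetical app_query) reads only through an external view, StudentContact, so adding a column to the underlying Student table changes the conceptual schema without requiring any change to the view or the application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO Student VALUES (1, 'Smith', '555-0101');
    -- External schema: the application only ever sees this view.
    CREATE VIEW StudentContact AS SELECT name, phone FROM Student;
""")

def app_query(conn):
    """Hypothetical application code written against the external view."""
    return conn.execute("SELECT name, phone FROM StudentContact").fetchall()

before = app_query(conn)

# Change the conceptual schema: add a new attribute to the base table.
conn.execute("ALTER TABLE Student ADD COLUMN email TEXT")
conn.execute("UPDATE Student SET email = 'smith@example.com' WHERE sid = 1")

after = app_query(conn)  # same application code, same result
print(before == after, after)  # True [('Smith', '555-0101')]
```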
DATA INDEPENDENCE
2. Physical data independence:
All the schemas are logical, while the actual data is stored in bit format on the disk. Physical data independence is the ability to change the physical (internal) schema without impacting the conceptual schema or the logical data.
For example, if we change or upgrade the storage system itself, say by replacing hard disks with SSDs, this should not have any impact on the logical data or schemas.
Data Independence
Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another layer. The layers are independent of one another but mapped to each other.
Mapping
Mapping transforms requests and responses between the levels of the database architecture. There are two types of mapping: external/conceptual mapping and conceptual/internal mapping.
In external/conceptual mapping, the DBMS transforms a request from the external level into one against the conceptual schema.
In conceptual/internal mapping, the DBMS transforms a request from the conceptual level into one against the internal level.
Mapping is not well suited to small databases, because it takes extra time.
DBMS ARCHITECTURE
Client and Server Concept
The DBMS may be on a centralized machine or server. The clients (end users or programs, whether standalone or web/mobile applications) access the database on the server. Several users or apps may be trying to read or write data in the database on the server at the same time.
Types of DBMS Architecture
DBMS architectures are either single-tier or multi-tier. An n-tier architecture divides the whole system into n related but independent modules, which can be modified, altered, changed or replaced independently. Typical layers are the presentation layer (UI), the application layer (business logic or programs) and the data layer, where the actual database is stored.
In a 1-tier architecture, the user works directly on the DBMS and all changes are made by the DBMS itself, so there is no separate client or server.
DBMS Architecture
In a 2-tier architecture, the presentation layer (UI or app) runs on your mobile or computer as the client program, and the server holds the database.
Because the database is on the server, it can be better protected from unauthorized users.
DBMS ARCHITECTURE – 2 TIER
DBMS ARCHITECTURE – 3 TIER
A 3-tier architecture has three layers (modules):
• The client or presentation layer: the app on a mobile or computer
• The application layer on the server: the business logic modules
• The data layer: the actual databases
DBMS ARCHITECTURE
Example of Three-Tier Architecture
A common environment for using a database has three tiers of processors:
1. Web servers: talk to the user.
2. Application servers: execute the business logic.
3. Database servers: get what the app servers need from the database.
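The division of labor among the three tiers can be sketched with three layers of plain Python functions, with SQLite standing in for the database server. All function and table names are hypothetical, and in a real deployment each tier would usually run on separate machines and communicate over a network rather than through direct function calls.

```python
import sqlite3

# Tier 3: database server, reached only through the data-access function below.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Account (owner TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO Account VALUES ('A', 1000)")
db.commit()

def data_layer_get_balance(owner):
    row = db.execute("SELECT balance FROM Account WHERE owner = ?", (owner,)).fetchone()
    return None if row is None else row[0]

# Tier 2: application server, holding the business logic.
def app_layer_can_withdraw(owner, amount):
    balance = data_layer_get_balance(owner)
    return balance is not None and 0 < amount <= balance

# Tier 1: web server, talking to the user.
def web_layer_handle(owner, amount):
    if app_layer_can_withdraw(owner, amount):
        return f"Withdrawal of {amount} approved"
    return "Withdrawal refused"

print(web_layer_handle("A", 500))   # Withdrawal of 500 approved
print(web_layer_handle("A", 5000))  # Withdrawal refused
```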
What Is Stored in a DB?
What Is METADATA?
Metadata is data about data: it describes the data itself.
Example
DB or SYSTEM CATALOG
Every database stores information about its own objects: their structure, definition, purpose, storage, number of columns and records, dependencies, access rights, owner and other details. This information about the data in the database is also called metadata, and it is itself stored as rows and columns of tables.
The collection of this metadata is stored in the system catalog or data dictionary. When a database is created, a system catalog is automatically created as well. The catalog keeps track of the objects created in the database and the changes made to each object. Every database has its own system catalog.
Database / System Catalog
The information stored in the catalog is called metadata, and it describes the structure of the primary database.
Example of Database Catalog
SYSTEM or DB CATALOG
Who creates the system catalog?
When you create a database, the system catalog is automatically created by the DBMS.
SYSTEM CATALOG
The system catalog contains information such as the following:
• User accounts and default settings
• Privileges and other security information
• Performance statistics
• Object sizing
• Object growth
• Table structure and storage
• Index structure and storage
• Information on other database objects, such as views, synonyms, triggers, and stored procedures
• Table constraints and referential integrity information
• User sessions
• Auditing information
• Internal database settings
• Locations of database files
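In SQLite, the system catalog is the sqlite_master table (also reachable as sqlite_schema in recent versions), which the DBMS creates and maintains automatically. The sketch below, using Python's sqlite3 module and hypothetical objects, creates a table, an index, a view and a trigger, then reads their catalog entries back with an ordinary query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssn TEXT PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    CREATE INDEX idx_employee_dept ON Employee (dept);
    CREATE VIEW HighEarners AS SELECT name FROM Employee WHERE salary > 50000;
    CREATE TRIGGER no_negative_salary BEFORE INSERT ON Employee
        WHEN NEW.salary < 0 BEGIN SELECT RAISE(ABORT, 'negative salary'); END;
""")

# Each catalog row describes one database object: its type, its name and the
# table it belongs to (the sql column, not selected here, holds its definition).
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name").fetchall()
for entry in catalog:
    print(entry)
```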
What Is a Query?
Queries in a DBMS are questions involving the data stored in the DBMS.
Types of query languages:
• Formal query languages: relational algebra and relational calculus
• Commercial query language: SQL statements
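The difference between a formal and a commercial query language can be sketched by expressing one query both ways: selection and projection written as hypothetical Python helpers over sets of tuples, and the equivalent SQL run through sqlite3. SELECT DISTINCT is used because relational algebra has set semantics; the Student rows are made up.

```python
import sqlite3

STUDENT = [(17, "Smith", "CS"), (8, "Brown", "CS"), (21, "Lee", "Math")]

# Relational algebra over Python sets of tuples (positions stand for attributes).
def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *positions):
    """pi: keep only the given attribute positions (duplicates vanish in a set)."""
    return {tuple(t[i] for i in positions) for t in relation}

# pi_name(sigma_major='CS'(STUDENT))
algebra_result = project(select(set(STUDENT), lambda t: t[2] == "CS"), 1)

# The same query in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sid INTEGER, name TEXT, major TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", STUDENT)
sql_result = set(conn.execute("SELECT DISTINCT name FROM Student WHERE major = 'CS'"))

print(sorted(algebra_result))        # [('Brown',), ('Smith',)]
print(algebra_result == sql_result)  # True
```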
DB Architecture
Database architecture deals with the design, development, implementation and maintenance of computer programs that store and organize information for businesses, agencies and institutions. A data architect develops and implements the software to meet the needs of users.
The design of a DBMS depends on its architecture, which can be centralized or hierarchical.
Data abstraction: hiding the implementation and storage details of the database.
Example of DB architecture
The green boxes are tables.
Data Dictionary
Creating a Data Dictionary for an Online Delivery System

Delivery Table: contains details about the delivery
Primary Key | Attribute               | Data Type | Size | Description
Y           | Delivery_id             | INTEGER   | -    | Unique ID of delivery
            | Delivery_date           | DATE      | -    | Date of the delivery
            | Delivery_person_name    | VARCHAR   | 50   | Name of the person who delivers the specific order
            | Delivery_person_contact | VARCHAR   | 20   | Contact details of the delivery person

Order Table: contains details about the order
Primary Key | Attribute   | Data Type | Size | Description
Y           | Order_id    | INTEGER   | -    | Unique ID of order
            | Cust_id     | INTEGER   | -    | Unique ID of customer
            | Delivery_id | INTEGER   | -    | Unique ID of delivery
            | Date        | DATE      | -    | Date of a specific order
            | Branch_id   | INTEGER   | -    | Unique ID of the branch
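Part of a data dictionary like the Delivery table above can be generated from the catalog. A sketch assuming SQLite via Python's sqlite3 module: PRAGMA table_info returns one row per column (cid, name, type, notnull, dflt_value, pk). SQLite does not store column descriptions, so the DESCRIPTIONS mapping is a hypothetical, hand-maintained addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Delivery (
        Delivery_id             INTEGER PRIMARY KEY,
        Delivery_date           DATE,
        Delivery_person_name    VARCHAR(50),
        Delivery_person_contact VARCHAR(20)
    )
""")

# Hand-maintained descriptions (SQLite itself does not store column comments).
DESCRIPTIONS = {
    "Delivery_id": "Unique ID of delivery",
    "Delivery_date": "Date of the delivery",
    "Delivery_person_name": "Name of the person who delivers the specific order",
    "Delivery_person_contact": "Contact details of the delivery person",
}

def data_dictionary(conn, table):
    """Return (primary key flag, attribute, declared type, description) rows."""
    rows = []
    # PRAGMA arguments cannot be bound as parameters, so `table` must be trusted.
    for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({table})"):
        rows.append(("Y" if pk else "", name, col_type, DESCRIPTIONS.get(name, "")))
    return rows

for row in data_dictionary(conn, "Delivery"):
    print(" | ".join(row))
```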
DBMS Functions, Pros and Cons
DBMS
DBMS Applications
DBMS Advantages
DBMS Disadvantages
WHAT IS ERP OR CRM?
These packages identify a set of common tasks (e.g., inventory management, human resources planning, financial analysis) encountered by a large number of organizations and provide a general application layer to carry out these tasks.
The data is stored in a relational DBMS, and the application layer can be customized for different companies, which lowers the overall cost for each company compared with building the application layer from scratch.
Extending database capabilities for new applications
– Extensions to better support specialized application requirements
– Enterprise resource planning (ERP)
– Customer relationship management (CRM)
Databases versus information retrieval
– Information retrieval (IR)
• Deals with books, manuscripts, and various forms of library-based articles
ERP
ERP applications are most commonly deployed in a distributed and often widely dispersed manner. While the servers may be centralized, the clients are usually spread across multiple locations throughout the enterprise.
Enterprise Resource Planning software can be used to automate and simplify individual activities across a business or organization, such as accounting and procurement, project management, customer relationship management, risk management, compliance, and supply chain operations.
ERP systems
SAP
SAP has multiple ERP offerings: By Design, Manufacturing, and Business One.
Sage
Aimed at firms in manufacturing, distribution and services, Sage offers a suite of products providing a range of key ERP tools.
Microsoft Dynamics
This has developed into a full suite of ERP products, and now includes applications for, amongst other things, financial management, human resources, and supply chain management.
Salesforce.com
Salesforce.com is one of the biggest names in ERP and customer relationship management (CRM) solutions, which can be used across a variety of sectors. The firm offers a range of products depending on a business's size and sector.
What are the resources?
Money, human resources, machinery, land, telecom spectrum, oil fields, coal mine licenses, etc.
e-governance
E-seva
ELCOT https://it.tn.gov.in/en/ELCOT/e-interventions
CRM
Customer Relationship Management (CRM) is a business strategy to acquire and manage the most valuable customer relationships. CRM requires a customer-centric business philosophy and culture to support effective marketing, sales and service processes.
A CRM system stores the history of the relationship between seller and customer. Based on this data, brands optimize employee performance, implement new marketing tools, improve service levels, and drive revenue growth.
CRM examples
A CRM platform includes the following functions:
• Organizes data for easy interpretation
• Prompts customer service representatives (CSRs) for useful information
• Protects against duplicate records
• Flags incomplete records
CRM has been used in the public domain for quite some time now. Examples include IRCTC (the Indian Railways ticketing web portal), Road Transport Authority services, municipal corporations and their allied departments for health and governance, and legal and utility services such as electricity and gas operators. These have all been progressively managing citizen data and relationships using CRM techniques.
As part of the massive software-as-a-service (SaaS) market, CRM technology represents the fastest-growing category of enterprise software. The major players in the CRM market are Adobe Systems, Microsoft, Oracle, Salesforce, and SAP. Examples of CRM products: Salesforce Sales Cloud, Zoho CRM, HubSpot.
TIMELINE OF DBMS
When Not to Use a DBMS
More desirable to use regular files for:
– Simple, well-defined database applications not expected to
change at all
– Stringent, real-time requirements that may not be met because
of DBMS overhead
– Embedded systems with limited storage capacity
– No multiple-user access to data
DBMS WORKERS or Users
System analysts
– Determine the requirements of end users
Application programmers
– Implement these specifications as programs
DBMS system designers and implementers
– Design and implement the DBMS modules and interfaces as a software package
Tool developers
– Design and implement tools
Operators and maintenance personnel
– Responsible for running and maintaining the hardware and software environment for the database system
DBMS WORKERS or Users (contd.)
Database administrators (DBA) are responsible for:
– Authorizing access to the database
– Coordinating and monitoring its use
– Acquiring software and hardware resources
Database designers are responsible for:
– Identifying the data to be stored
– Choosing appropriate structures to represent and store this data
End users: people whose jobs require access to the database. Types:
– Casual end users
– Naive or parametric end users
– Sophisticated end users
– Standalone users
QUESTIONS?
Phases of Database Design
1. Requirements specification and analysis
2. Conceptual design
3. Logical design
4. Physical design
Phases of Database Design
1. Requirements specification and analysis
• The requirements collection and analysis phase produces both data requirements and functional requirements.
• The data requirements are used as the source for database design.
• The data requirements should be specified in as detailed and complete a form as possible.
Phases of Database Design
2. Conceptual design
• The result of this phase is an Entity-Relationship (ER) diagram or UML class
diagram. It is a high-level data model of the specific application area.
• It describes how different entities (objects, items) are related to each other.
It also describes what attributes (features) each entity has. It includes the
definitions of all the concepts (entities, attributes) of the application area.
• During or after the conceptual schema design, the basic data model
operations can be used to specify the high-level user operations identified
during the functional analysis. This also serves to confirm that the conceptual
schema meets all the identified functional requirements.
Phases of Database Design
3. Logical design
A. Create relation schemas
• The result of the logical design phase (or data model mapping phase) is a set of relation schemas. The ER diagram or class diagram is the basis for these relation schemas.
• Creating the relation schemas is a fairly mechanical operation: there are rules for how an ER model or class diagram is transferred to relation schemas.
• The relation schemas are the basis for table definitions. In this phase (if not done in the previous phase), the primary keys and foreign keys are defined.
Phases of Database Design
3. Logical design
B. Normalization
Normalization is the last part of the logical design. The goal of normalization is to eliminate redundancy and potential update anomalies.
• Redundancy means that the same data is saved more than once in a database.
• An update anomaly is a consequence of redundancy: if a piece of data is saved in more than one place, it must be updated in every one of those places.
• Normalization is a technique for modifying the relation schemas to reduce redundancy. Each normalization step typically decomposes relations, adding more relations (tables) to the database.
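The update anomaly can be illustrated with a small hypothetical example in SQLite via Python's sqlite3 module: when a department's location is repeated in every employee row, an update applied to only one row leaves two different locations for the same department; after decomposing into Employee and Department, the location is stored once and a single update suffices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: dept_location is repeated for every employee in the department.
    CREATE TABLE EmpDept (ename TEXT, dname TEXT, dept_location TEXT);
    INSERT INTO EmpDept VALUES ('Smith', 'Research', 'Houston'),
                               ('Wong',  'Research', 'Houston');

    -- Normalized: the department's location is stored exactly once.
    CREATE TABLE Employee (ename TEXT PRIMARY KEY, dname TEXT);
    CREATE TABLE Department (dname TEXT PRIMARY KEY, dept_location TEXT);
    INSERT INTO Employee VALUES ('Smith', 'Research'), ('Wong', 'Research');
    INSERT INTO Department VALUES ('Research', 'Houston');
""")

# Update anomaly: the move is recorded in only one of the redundant rows.
conn.execute("UPDATE EmpDept SET dept_location = 'Bellaire' WHERE ename = 'Smith'")
unnormalized = conn.execute(
    "SELECT COUNT(DISTINCT dept_location) FROM EmpDept WHERE dname = 'Research'"
).fetchone()[0]

# Normalized design: one update, and every employee sees the new location.
conn.execute("UPDATE Department SET dept_location = 'Bellaire' WHERE dname = 'Research'")
normalized = conn.execute("""
    SELECT COUNT(DISTINCT D.dept_location)
    FROM Employee E JOIN Department D ON E.dname = D.dname
    WHERE E.dname = 'Research'
""").fetchone()[0]

print(unnormalized, normalized)  # 2 1
```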
Phases of Database Design
4. Physical design
• The goal of the last phase of database design, physical design, is to implement the database. At this phase one must know which database management system (DBMS) is used.
• For example, different DBMSs use different names for data types and support different data types.
• The SQL statements to create the database are written. The indexes, the integrity constraints (rules) and the users' access rights are defined.
• Finally, test data is added to the database.
E.g., UNIVERSITY database
STEP A: ANALYZE THE PROBLEM
Step 1: What does this DB do?
It holds information concerning students, courses, and grades in a university environment.
Step 2: What are the data records?
– STUDENT
– COURSE
– SECTION
– GRADE_REPORT
– PREREQUISITE
Step 3: Specify the structure of the records of each file by specifying the data type of each data element:
– String of alphabetic characters
– Integer
– Etc.
UNIVERSITY database contd…
Step B: Relate the records.
Step 4: Construct the UNIVERSITY database
– Store data to represent each student, course, section, grade report, and prerequisite as a record in the appropriate file
Step 5: Define the relationships among the records
Step 6: Manipulate the database by querying and updating it
UNIVERSITY Database designed as:
Session 2
Topic: ER and EER Design Practice Tutorial
LEARNING OUTCOME
• ER designing
• EER designing
REFER:
T1, Chapters 3 and 4; Sections 3.1, 3.3-3.9, 4.1-4.3, 4.6
Roadmap for ER design
1. ENTITY and its types: strong, weak
2. Attributes and their types
3. Keys: primary, foreign, super, candidate, composite
4. Relationships and the degree of a relationship
a. Mapping (cardinality) constraints
b. Participation constraints
c. Weak and strong relationships, degree
d. When to use binary, ternary, and higher-degree relationships
e. Two entities can have multiple relationships
f. Identifying and removing redundant relationships
5. ER design steps
6. Example of ER design
Domains, Attributes, Tuples, and Relations
Example of entity and entity set
Student
ATTRIBUTE in ER diagram
1. NULL value: assigned to an attribute when that attribute has no value,
e.g.,
FName = NULL
Nationality = NULL
Gender = NULL
2. Domain: the set of permitted values for an attribute,
e.g.,
dom(pval): any value between 0 and +1
dom(votecast_age): any value of 18 or above
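The two domains above can be enforced with SQL CHECK constraints. A minimal SQLite sketch, with the VOTER table and its rows as hypothetical examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE VOTER (
    voter_id      INTEGER PRIMARY KEY,
    fname         TEXT,                                -- no constraint: NULL allowed
    pval          REAL CHECK (pval BETWEEN 0 AND 1),   -- dom(pval): 0 to +1
    votecast_age  INTEGER CHECK (votecast_age >= 18)   -- dom(votecast_age): 18+
)""")

def try_insert(row):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO VOTER VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "Asha", 0.25, 30)),   # within both domains
    try_insert((2, None, 0.5, 18)),      # NULL fname is allowed
    try_insert((3, "Ravi", 1.5, 40)),    # pval outside 0..1
    try_insert((4, "Mina", 0.1, 16)),    # votecast_age below 18
]
print(results)  # [True, True, False, False]
```

A CHECK rejects a row only when its condition is false, so a NULL pval would still be accepted; add NOT NULL if the attribute must always have a value.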
TYPES OF ATTRIBUTES
TYPES OF ATTRIBUTES (contd.)
Attributes
NOTE:
The primary key is underlined.
KEYS
Entities and relationships are distinguished using various keys.
A key is a combination of one or more attributes, e.g., social-security number, or the combination of name and social-security number.
A superkey is a set of one or more attributes, defined for either an entity set or a relationship set, that uniquely identifies an entity (or relationship), e.g., social-security number, phone number, or the combination of name and social-security number. (Note: UNIQUE and NOT NULL)
A candidate key is a minimal superkey (no proper subset of it is also a superkey) that uniquely identifies either an entity or a relationship, e.g., social-security number, phone number. (Note: UNIQUE and NOT NULL)
A primary key is a candidate key that is chosen by the database designer to identify the entities of an entity set. (Note: UNIQUE and NOT NULL)
A composite key is a candidate key with more than one attribute that identifies the entities of an entity set,
e.g., {stud_id, stud_email, stud_name}. This set is a composite candidate key only if no proper subset of it (such as {stud_id} alone) is already unique; otherwise it is a composite superkey.
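Superkey uniqueness and candidate-key minimality can be tested against a relation instance with a short Python sketch. The helper names and sample rows are hypothetical. An instance can only disprove a key (by exhibiting two tuples that agree on it); whether a set of attributes really is a key is decided by the meaning of the data, not by one snapshot.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs (checked on this instance only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """A superkey none of whose proper, non-empty subsets is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical student instance
students = [
    {"stud_id": 1, "stud_email": "a@x.edu", "stud_name": "Asha"},
    {"stud_id": 2, "stud_email": "b@x.edu", "stud_name": "Asha"},
    {"stud_id": 3, "stud_email": "c@x.edu", "stud_name": "Ravi"},
]

print(is_superkey(students, ["stud_id", "stud_email", "stud_name"]))       # True
print(is_candidate_key(students, ["stud_id", "stud_email", "stud_name"]))  # False: {stud_id} alone is unique
print(is_candidate_key(students, ["stud_id"]))                             # True
print(is_superkey(students, ["stud_name"]))                                # False: two students named Asha
```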
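KEYS
A foreign key is a set of one or more attributes in one (referencing) relation whose values must match values of the primary key of another (referenced) relation. Foreign keys are also used to identify weak entity sets: a weak entity set has no key of its own, so its primary key is formed from the primary key of the strong (owner) entity set on which it is existence-dependent, which serves as a foreign key, together with the weak entity set's own discriminator (partial key).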
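A weak entity set and its foreign key can be sketched in SQLite as follows. The EMPLOYEE/DEPENDENT names follow the textbook COMPANY example, but the columns and rows are a simplified, hypothetical subset, and ON DELETE CASCADE is one possible way to model existence dependency, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Ssn   TEXT NOT NULL PRIMARY KEY,
    Name  TEXT
);
-- Weak entity: PK = owner's key (Essn, also a foreign key) + discriminator (Dependent_name)
CREATE TABLE DEPENDENT (
    Essn            TEXT NOT NULL REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,
    Dependent_name  TEXT NOT NULL,
    Relationship    TEXT,
    PRIMARY KEY (Essn, Dependent_name)
);
INSERT INTO EMPLOYEE VALUES ('333445555', 'Franklin Wong');
INSERT INTO DEPENDENT VALUES ('333445555', 'Alice', 'Daughter');
INSERT INTO DEPENDENT VALUES ('333445555', 'Theodore', 'Son');
""")

try:
    # Rejected: no EMPLOYEE with this Ssn, so the dependent would have no owner
    conn.execute("INSERT INTO DEPENDENT VALUES ('999999999', 'Joy', 'Spouse')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner also removes its dependents (existence dependency)
conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '333445555'")
remaining = conn.execute("SELECT COUNT(*) FROM DEPENDENT").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```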
Attributes
Attributes
Construct ER:
An employee has an SSN, salary, age, Bdate (birth date), phone numbers, and an address. The complete address has area, city, and state, while street-add has door# and apt no.
Attributes
Answer these:
1. Determine the type of each attribute in Customer(name, age, addr, phno, DOB) and in the book table.
2. An Employee can reside at both the HQ and Bcity, and this addr is further divided into door#, street, and city. What type of attribute is addr?
3. Consider the book table: find the super key, primary key, and candidate key.
4. For the person entity, find the PK and CK.
5. If K1 = {ID} and K2 = {name, addr}, which one should be the PK?
6. Find the composite key attributes or alternate key attributes:
Order{Custid, orderid, sales}
Student{Sid, name, addr, mark}
ER notations
Relationship
Relationships have:
1. Degree
2. A mapping (cardinality) constraint: how many entities of one entity type participate in the relationship with how many entities of another entity type
3. Role names
4. Participation constraints
5. Attributes (relationships can have attributes)
Relationship
Types of relationships based on degree
BINARY
TERNARY
QUATERNARY
N-ARY
Questions
1. What are the redundant relationships in Fig 1?
Fig 1
2. Draw an ER diagram for the following: an employee has a name and an id and works on a project for a number of hours. A project has a name, and an employee uses many machines, each identified by the mcid attribute.
3. Represent the scenario “employee who supervises other employees” in ER.
4. Convert Fig 2 into 1:N relationships.
Fig 2
Fig 3
5. What are the redundant relationships in Fig 3?
ER design steps
1. Identify the nouns; they become entity types.
2. Identify all attributes of each entity type.
3. Mark the PK and partial keys (if any) for each entity type.
3a. Identify whether each entity type is weak or strong.
4. Identify the verbs in the problem statement; they become relationships.
a. Determine the degree of each relationship: unary, binary, or ternary.
b. Determine the mapping (cardinality) constraint: 1:1, 1:N, N:1, or M:N.
c. Determine the participation constraint: total or partial.
d. A relationship is weak (identifying) only if one entity type in the relationship is weak.
e. Determine whether there are multiple relationships between the same entity types.
f. Determine role names, if any.
g. Determine whether the relationship has any attributes.
h. Identify and remove redundant relationships.
5. Determine whether any aggregation is needed.
6. Repeat steps 1 to 5 until the design is acceptable (i.e., only when it captures all the data in the problem statement).
Problem 1
Consider a mail order database in which employees take orders for parts from customers. The data requirements are summarized as follows:
• The mail order company has employees, each identified by a unique employee number, with their first and last names and the zip code where they are located.
• Customers of the company are uniquely identified by a customer number.
• In addition, their first and last names and the zip code where they are located are recorded.
• The parts being sold by the company are identified by a unique part number. In addition, a part name, its price, and the quantity in stock are recorded.
• Orders placed by customers are taken by employees and are given a unique order number. Each order may contain certain quantities of one or more parts, and its received date and shipped date are recorded.
Design an Entity-Relationship diagram for the mail order database.
Solution
Step 1: Identify the entities: Employee, Customer, Parts, Order
Step 2: Identify the attributes of each entity:
Employee => enum, fname, lname, zipcode
Customer => custnum, fname, lname, zipcode
Parts => partnum, partname, price, qtyinhand
Order => ordernum, recvddate, qty, shippeddate
Step 3:
Consider the statement “The mail order company has employees identified by a unique employee number, their first and last names, and a zip code where they are located.”
Solution
Step 4:
2. Consider the statement “Customers of the company are uniquely identified by a customer number. In addition, their first and last names and a zip code where they are located are recorded.”
Step 5:
3. Consider the statement “The parts being sold by the company are identified by a unique part number. In addition, a part name, their price, and quantity in stock are recorded.”
Solution
Step 6:
BR (business rule): 1 employee serves only one customer at a time.
4. Consider the statement “Orders placed by customers are taken by employees and are given a unique order number. Each order may contain certain quantities of one or more parts and their received date as well as a shipped date is recorded.”
BR: 1 employee serves only one customer at a time.
What if the BR is that 1 employee can serve more than 1 customer at a time?
WHERE SHOULD “QTY” BE STORED?
(Figure: Order and Parts connected by the “has” relationship)
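One common answer, assuming the has relationship between Order and Parts is M:N: qty describes a particular (order, part) pair, so it becomes an attribute of the relationship and is stored in the table that represents has. A SQLite sketch follows; the table name ORDERS is used because ORDER is an SQL keyword, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PARTS  (partnum INTEGER NOT NULL PRIMARY KEY, partname TEXT, price REAL, qtyinhand INTEGER);
CREATE TABLE ORDERS (ordernum INTEGER NOT NULL PRIMARY KEY, recvddate TEXT, shippeddate TEXT);
-- The M:N 'has' relationship becomes its own table; qty depends on the
-- (order, part) pair, so it lives here rather than in ORDERS or PARTS.
CREATE TABLE HAS (
    ordernum INTEGER NOT NULL REFERENCES ORDERS(ordernum),
    partnum  INTEGER NOT NULL REFERENCES PARTS(partnum),
    qty      INTEGER NOT NULL CHECK (qty > 0),
    PRIMARY KEY (ordernum, partnum)
);
INSERT INTO PARTS  VALUES (10, 'bolt', 0.5, 1000), (20, 'nut', 0.2, 5000);
INSERT INTO ORDERS VALUES (1, '2023-01-05', '2023-01-07');
INSERT INTO HAS    VALUES (1, 10, 40), (1, 20, 25);
""")

# Each part in order 1 carries its own quantity
rows = conn.execute("SELECT partnum, qty FROM HAS WHERE ordernum = 1 ORDER BY partnum").fetchall()

# The same part cannot be listed twice in one order (PK on the pair)
try:
    conn.execute("INSERT INTO HAS VALUES (1, 10, 5)")
    dup_rejected = False
except sqlite3.IntegrityError:
    dup_rejected = True
print(rows, dup_rejected)  # [(10, 40), (20, 25)] True
```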
Solution 2
Solution:
Solution 2
Problem 2
UPS prides itself on having up-to-date information on the processing and current location of each shipped item. To do this, UPS relies on a company-wide information system. Shipped items are the heart of the UPS product tracking information system. Shipped items can be characterized by item number (unique), weight, dimensions, insurance amount, destination, and final delivery date. Shipped items are received into the UPS system at a single retail center. Retail centers are characterized by their type, uniqueID, and address. Shipped items make their way to their destination via one or more standard UPS transportation events (e.g., flights, truck deliveries). These transportation events are characterized by a unique scheduleNumber, a type (e.g., flight, truck), and a deliveryRoute. Please create an Entity-Relationship diagram that captures this information about the UPS system. Be certain to indicate identifiers and cardinality constraints.
These transportation events are characterized by a unique scheduleNumber, a type (e.g., flight, truck), and a deliveryRoute.
Shipped items can be characterized by item number (unique), weight, dimensions, insurance amount, destination, and final delivery date.
• Retail centers are characterized by their type, uniqueID, and address.
• Shipped items are received into the UPS system at a single retail center.
• Shipped items make their way to their destination via one or more standard UPS transportation events (e.g., flights, truck deliveries).
Problem 3
A friend is interested in keeping track of information about his album collection. He is not concerned
about whether or not the albums are CDs, tapes, LPs, etc.
Also, assume that he does not have any compilation albums—that is, each album has songs from a
single band.
For each album, he wants to store which band recorded the album, the title, the year, and the
chronology (e.g. this is the 4th album for that band). He also wants to store the songs, including
title, length, track number, and writer(s).
Of course, if two bands record the same song, they might have different track numbers and lengths.
For each band (group or individual), he also wants to store the names of all of the band members.
For each band member, he needs their first and last names, and country of origin.
Consider both band members and songwriters as musicians.
store the songs,
including title,
length, track
number, and
writer(s).
For each album, he wants to store
which band recorded the album, the
title, the year, and the chronology
(e.g. this is the 4th album for that
band).
For each band (group
or individual), he also
wants to store the
names of all of the
band members.
For each band
member, he needs
their first and last
names, and country
of origin.
Problem 4
Construct an E-R diagram for a car-insurance company
whose customers own one or more cars each. Each car
has associated with it zero to any number of recorded
accidents.
Problem 5
Consider a database used to record the marks that
students get in different exams of different course
offerings.
Problem 6
Suppose you are given the following requirements for a simple database for the National Hockey League (NHL):
• the NHL has many teams,
• each team has a name, a city, a coach, a captain, and a set of players,
• each player belongs to only one team,
• each player has a name, a position (such as left wing or goalie), a skill level, and a set of injury records,
• a team captain is also a player,
• a game is played between two teams (referred to as host_team and guest_team) and has a date (such as May 11th, 1999) and a score (such as 4 to 2).
Construct a clean and concise ER diagram for the NHL database.
Practice, practice, practice…
Roadmap for EER design
1. ENTITY and its types: strong, weak
2. Entity generalization and specialization
3. Determine whether there are lattices
4. For each entity, determine its attributes and their types
5. Keys: primary, foreign, super, candidate, composite
6. Relationships and the degree of a relationship
a. Mapping (cardinality) constraints
b. Participation constraints
c. Weak and strong relationships, degree
d. When to use binary, ternary, and higher-degree relationships
e. Two entities can have multiple relationships
f. Identifying and removing redundant relationships
7. ER and EER design steps
8. Example of EER design
EER design steps
1. Specialization: extracting a subclass from an entity set.
2. Generalization: combining one or more entity sets into a higher-level entity set.
a. Disjoint generalization: an entity belongs to at most one lower-level entity set.
b. Overlapping generalization: an entity may belong to multiple lower-level entity sets.
c. Hierarchy: each entity set is a subclass in only one “ISA” relationship (it has a single superclass).
3. Lattice: an entity set may be a subclass in multiple “ISA” relationships.
4. Constraints on specialization/generalization:
based on membership: d (disjoint), o (overlapping)
based on definition:
• Condition-defined (predicate-defined) constraint: membership in a subclass is determined by a predicate; an entity that satisfies the condition becomes a member of that subclass.
• User-defined constraint: membership is assigned manually by the database user.
based on completeness:
• Total constraint: every entity must belong to some lower-level entity set.
• Partial constraint: entities are not required to belong to any lower-level entity set.
5. Aggregation: grouping part of a schema into a larger unit.
6. Loop from ER step 1 to EER step 5 until the design is accepted.
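The membership (d/o) and completeness (total/partial) constraints can be checked mechanically on a given population of entities. The helper `check_specialization` and the sample ids below are hypothetical; the check inspects one snapshot of data, whereas the constraints themselves are declared in the schema.

```python
def check_specialization(superclass, subclasses):
    """
    superclass: set of entity ids in the superclass (e.g., PERSON)
    subclasses: dict mapping subclass name -> set of entity ids
    Returns (disjoint, total) for this particular population.
    """
    members = list(subclasses.values())
    covered = set().union(*members)
    # Every subclass entity must also be a superclass entity
    assert covered <= superclass, "subclass member not in superclass"
    # Disjoint: no entity appears in two different subclasses
    disjoint = all(not (a & b) for i, a in enumerate(members) for b in members[i + 1:])
    # Total: every superclass entity belongs to at least one subclass
    total = superclass <= covered
    return disjoint, total

person = {1, 2, 3, 4}
print(check_specialization(person, {"EMPLOYEE": {1, 2}, "CUSTOMER": {3}}))   # (True, False): disjoint, partial
print(check_specialization(person, {"STUDENT": {1, 2}, "STAFF": {2, 3, 4}}))  # (False, True): overlapping, total
```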
EER design points
Developing an ER diagram presents several choices, including the following:
• Should a concept be modelled as an entity or an attribute?
• Should a concept be modelled as an entity or a relationship?
• What are the relationship sets and their participating entity sets? Should we use binary or ternary relationships?
• Should we use aggregation?
• UNIONs should be avoided.
• ER design is subjective. There are often many ways to model a given scenario. Analyzing alternatives can be tricky, especially for a large enterprise.
• Common choices include:
– Entity vs. attribute
– Key for the entity / whether to store or discard an attribute
– Entity vs. relationship
– Binary or n-ary relationship
– Use of ISA hierarchies (EER)
– Use of aggregation
Specialization and Generalization
Total and partial participation
Example
a. Represent that a person can be an employee or a customer only.
b. Represent that a person can be both a student and staff.
Example 1.
The EER diagram below describes the database of a training center, including information about its members, training activities, and bookings.
• Each member is identified by his/her e-mail address. Gold members can book any training activity, while common members can only book core activities.
• For each training activity, the database stores the schedule (week, week day, and time), the room, the leader, and the e-mail of the leader. Each leader leads several activities per week, but the same activities every week.
• Training activities are yoga, core, and aerobics.
Example 2:
A nonprofit organization depends on a number of different types of persons for its
successful operation. The organization is interested in the following attributes for all
of these persons: SSN, name, Address, City, State, Zip and Telephones. Three types
of persons are of greatest interest: employees, volunteers, and donors.
Employees have only a Date Hired attribute, and volunteers have only a Skill
attribute. Donors have only a relationship (named Donates) with an Item entity type.
A donor must have donated one or more items, and an item may have no donors, or
one or more donors. There are persons other than employees, volunteers, and donors
who are of interest to the organization so that a person may not belong to any of these
three groups and may also belong to more than one group at the same time.
There are persons other than
employees, volunteers, and
donors who are of interest to the
organization so that a person may
not belong to any of these three
groups and may also belong to
more than one group at the same
time.
Example 3
Consider a bank, and model the following two aspects:
• There are three different kinds of ACCOUNTs, namely SAVINGS_ACCTs, CHECKING_ACCTs, and TRUSTs. For each ACCOUNT we have to keep track of its TRANSACTIONs. Each TRANSACTION has a type such as “deposit”, “withdrawal”, or “check”. Furthermore, each transaction has a date/time (consisting of a date and a time) and an amount.
• There are different kinds of LOANs, namely CAR_LOANs, HOME_LOANs, CREDIT_LINEs, and PERSONAL loans. For each LOAN we have to keep track of its PAYMENTs. Each PAYMENT has a type, date, and amount.
Example 4:
Rules Entertainment is a chain of theaters owned by former husband-and-wife actors/entertainers who, for some
reason, can’t get a job performing anymore. The owners want a database to track what is playing or has played
on each screen in each theater of their chain at different times of the day. A theater (identified by a Theater ID
and described by a theater name and location) contains one or more screens for viewing various movies. Within
each theater each screen is identified by its number and is described by the seating capacity for viewing the
screen. Movies are scheduled for showing in time slots each day. Each screen can have different time slots on
different days (i.e., not all screens in the same theater have movies starting at the same time, and even on
different days the same movie may play at different times on the same screen). For each time slot, the owners
also want to know the end time of the time slot (assume all slots end on the same day the slot begins),
attendance during that time slot, and the price charged for attendance in that time slot. Each movie (which can
be either a trailer, feature, or commercial) is identified by a Movie ID and further described by its title, duration,
and type (i.e., trailer, feature, or commercial). In each time slot, one or more movies are shown. The owners
want to also keep track of in what sequence the movies are shown (e.g., in a time slot there might be two
trailers, followed by two commercials, followed by a feature film, and closed with another commercial).
Answer
Example 5
EER Models Design: Example
SOLUTION
Example
EXAMPLE ER:
Consider a CONFERENCE_REVIEW database in which researchers submit their research papers for consideration. Reviews by reviewers are recorded for use in the paper selection process. The database system caters primarily to reviewers who record answers to evaluation questions for each paper they review and make recommendations regarding whether to accept or reject the paper. The data requirements are summarized as follows:
■ Authors of papers are uniquely identified by e-mail id. First and last names are also recorded.
■ Each paper is assigned a unique identifier by the system and is described by a title, abstract, and the name of the electronic file containing the paper.
■ A paper may have multiple authors, but one of the authors is designated as the contact author.
■ Reviewers of papers are uniquely identified by e-mail address. Each reviewer’s first name, last name, phone number, affiliation, and topics of interest are also recorded.
■ Each paper is assigned between two and four reviewers. A reviewer rates each paper assigned to him or her on a scale of 1 to 10 in four categories: technical merit, readability, originality, and relevance to the conference. Finally, each reviewer provides an overall recommendation regarding each paper.
■ Each review contains two types of written comments: one to be seen by the review committee only and the other as feedback to the author(s).
Solution:
Solution 2
Textbook Example ER
The COMPANY database keeps track of a company’s employees, departments, and projects. Suppose that after the requirements collection and analysis phase, the database designers provide the following description of the miniworld (the part of the company that will be represented in the database).
The company is organized into departments.
• Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
• A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
• The database will store each employee’s name, Social Security number, address, salary, sex (gender), and birth date. An employee is assigned to one department, but may work on several projects, which are not necessarily controlled by the same department. It is required to keep track of the current number of hours per week that an employee works on each project, as well as the direct supervisor of each employee (who is another employee).
• The database will keep track of the dependents of each employee for insurance purposes, including each dependent’s first name, sex, birth date, and relationship to the employee.
ER Diagram
Session 3
Topic: Relational Model and Logical Design (ER/EER to RM) and Normalization
LEARNING OUTCOME
• Relational model concepts
• Relational data model constraints
• ER/EER to RM
• FDs
• Normalization
REFER:
T1-Chapter 5
Sections:
5.1-5.3
Data Model and schema
• The data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data.
• Data models are used to represent data and how it is stored in the database, and to set the relationships between data items.
A data model is like an architect's building plan, which helps to build conceptual models and set relationships between data items. It is thus an abstract model that organizes data description, data semantics, and consistency constraints of data.
There are 3 different types of data models: conceptual data models, logical data models, and physical data models, and each one has a specific purpose.
1. Conceptual Data Model: This data model defines WHAT the system contains. This model is typically created by business stakeholders and data architects. The purpose is to organize, scope, and define business concepts and rules. This is a high-level data model in which we use ER, EER, and UML to represent the business concepts and organize data.
2. Logical Data Model: This data model defines HOW the system should be implemented, regardless of the DBMS. This model is typically created by data architects and business analysts. The purpose is to develop a technical map of rules and data structures. A schema (relational schema) belongs to the logical data model; a schema is a logical view.
3. Physical Data Model: This data model describes HOW the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. The purpose is the actual implementation of the database. Tables, as they are actually stored in files, belong to the physical data model.
Relational model concepts
Relational Model Concepts
What is a RELATION SCHEMA?
Characteristics of relations
• A relation is a set: its tuples are not in any order and have no duplicates.
• Flat relational model: values are atomic, not structures or lists.
• NULL (ω): “information missing” or “not applicable”; it has ambiguous semantics and is not a member of any domain.
• The order in which the attributes are listed in a table is irrelevant.
• The NULL value is used for “don't know”, “not applicable”, or “value undefined”.
• Values of attributes: for a relation to be in First Normal Form, each of its attribute domains must consist of atomic (neither composite nor multivalued) values.
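The set semantics above can be mimicked with a Python frozenset of tuples. This is only an analogy: Python tuples are positional, so the sketch fixes one attribute order, whereas a relation schema treats attribute order as irrelevant. The sample tuples are hypothetical.

```python
# A relation modelled as a frozenset of tuples (a sketch, not a DBMS):
# sets ignore order and discard duplicates, like a relation.
r1 = frozenset({
    ("123", "Smith", "CS"),
    ("456", "Wong", "EE"),
})
r2 = frozenset({
    ("456", "Wong", "EE"),
    ("123", "Smith", "CS"),
    ("123", "Smith", "CS"),   # duplicate tuple is silently dropped
})
print(r1 == r2)   # True: same tuples, order irrelevant
print(len(r2))    # 2

# A list-valued phone attribute is not atomic (violates 1NF); Python also
# refuses such a tuple as a set member, because lists are not hashable.
try:
    frozenset({("789", "Zelaya", ["555-1234", "555-9876"])})
    flat = True
except TypeError:
    flat = False
print(flat)       # False
```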
Notation
Relational Model Concepts
Informal Terms | Formal Terms
Table | Relation
Column | Attribute/Domain
Row | Tuple
Values in a column | Domain
Table Definition | Schema of a Relation
Populated Table | Extension
Explicit or Schema-based constraints
Constraints are conditions that must hold on all valid relation instances.
There are three main types of constraints:
1. Domain and key constraints
a. Domain constraint: each attribute value must be either NULL (which is really a non-value) or drawn from the domain of that attribute.
b. Key constraint: for any two distinct tuples t1 and t2 in a relation state r of R, t1[SK] ≠ t2[SK], where SK is a superkey of R.
2. Entity integrity constraint: the primary key attributes PK of each relation schema cannot have NULL values in any tuple.
3. Referential integrity constraint: a tuple in one relation that refers to another relation must refer to an existing tuple in that relation.
Referential Integrity Constraint
Statement of the constraint
The value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
(1) an existing value of the corresponding primary key PK in the referenced relation R2, or
(2) NULL.
In case (2), the FK in R1 should not be part of its own primary key.
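Both cases of the constraint can be seen in SQLite. The schema is a hypothetical two-table subset of COMPANY; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER NOT NULL PRIMARY KEY, Dname TEXT);
CREATE TABLE EMPLOYEE (
    Ssn  TEXT NOT NULL PRIMARY KEY,
    Dno  INTEGER REFERENCES DEPARTMENT(Dnumber)   -- FK, NULL allowed
);
INSERT INTO DEPARTMENT VALUES (1, 'Headquarters'), (4, 'Administration'), (5, 'Research');
""")

def insert_employee(ssn, dno):
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?)", (ssn, dno))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    insert_employee("111111111", 5),     # case (1): Dnumber 5 exists
    insert_employee("222222222", None),  # case (2): NULL FK
    insert_employee("333333333", 7),     # neither: no department 7
]
print(results)  # ['accepted', 'accepted', 'rejected']
```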
Relational DB and Relational DB Schemas
One possible database state for the COMPANY relational database schema.
Relations together with a set of integrity constraints.
Operations and Constraint violation
We now see how violations happen when we:
• INSERT a tuple.
• DELETE a tuple.
• MODIFY/Update a tuple.
Update Operations on Relations and Violations
■ Operation: Update the salary of the EMPLOYEE tuple with Ssn = ‘999887777’ to 28000.
Result: Acceptable.
■ Operation: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 1.
Result: Acceptable.
■ Operation: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 7.
Result: Unacceptable, because it violates referential integrity.
■ Operation: Update the Ssn of the EMPLOYEE tuple with Ssn = ‘999887777’ to ‘987654321’.
Result: Unacceptable, because it violates the primary key constraint by repeating a value that already exists as a primary key in another tuple; it also violates referential integrity constraints because there are other relations that refer to the existing value of Ssn.
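These four updates can be replayed against a small hypothetical subset of the COMPANY schema in SQLite. SQLite raises a single error per statement, so the output shows one reason even though the last update breaks two constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE EMPLOYEE (
    Ssn    TEXT NOT NULL PRIMARY KEY,
    Salary INTEGER,
    Dno    INTEGER REFERENCES DEPARTMENT(Dnumber)
);
CREATE TABLE WORKS_ON (
    Essn TEXT NOT NULL REFERENCES EMPLOYEE(Ssn),
    Pno  INTEGER NOT NULL,
    PRIMARY KEY (Essn, Pno)
);
INSERT INTO DEPARTMENT VALUES (1), (4), (5);
INSERT INTO EMPLOYEE VALUES ('999887777', 25000, 4), ('987654321', 43000, 4);
INSERT INTO WORKS_ON VALUES ('999887777', 10);
""")

def run(sql):
    try:
        conn.execute(sql)
        return "acceptable"
    except sqlite3.IntegrityError as exc:
        return f"unacceptable ({exc})"

results = [
    run("UPDATE EMPLOYEE SET Salary = 28000 WHERE Ssn = '999887777'"),
    run("UPDATE EMPLOYEE SET Dno = 1 WHERE Ssn = '999887777'"),
    run("UPDATE EMPLOYEE SET Dno = 7 WHERE Ssn = '999887777'"),            # no department 7
    run("UPDATE EMPLOYEE SET Ssn = '987654321' WHERE Ssn = '999887777'"),  # duplicate key
]
for r in results:
    print(r)
```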
Insert operations and handling violations
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion violates the entity integrity constraint (NULL for the primary key Ssn), so it is rejected.
■ Operation: Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
Result: This insertion violates the key constraint because another tuple with the same Ssn value already exists in the EMPLOYEE relation, and so it is rejected.
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
Result: This insertion violates the referential integrity constraint specified on Dno in EMPLOYEE because no corresponding referenced tuple exists in DEPARTMENT with Dnumber = 7.
■ Operation: Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion satisfies all constraints, so it is acceptable.
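A sketch of the four insertions in SQLite, using a hypothetical four-column EMPLOYEE and a few sample rows assumed from the COMPANY state. NOT NULL is declared on Ssn explicitly because SQLite, for historical reasons, otherwise accepts NULL in a non-INTEGER primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE EMPLOYEE (
    Fname     TEXT,
    Ssn       TEXT NOT NULL PRIMARY KEY,   -- entity integrity: PK may not be NULL
    Super_ssn TEXT REFERENCES EMPLOYEE(Ssn),
    Dno       INTEGER REFERENCES DEPARTMENT(Dnumber)
);
INSERT INTO DEPARTMENT VALUES (1), (4), (5);
INSERT INTO EMPLOYEE VALUES ('James', '888665555', NULL, 1),
                            ('Jennifer', '987654321', '888665555', 4),
                            ('Alicia', '999887777', '987654321', 4);
""")

def try_insert(row):
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

outcomes = [
    try_insert(("Cecilia", None, None, 4)),                 # NULL Ssn
    try_insert(("Alicia", "999887777", "987654321", 4)),    # duplicate Ssn
    try_insert(("Cecilia", "677678989", "987654321", 7)),   # no department 7
    try_insert(("Cecilia", "677678989", None, 4)),          # satisfies all constraints
]
for o in outcomes:
    print(o)
```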
Delete operations and constraint violations
■ Operation: Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10.
Result: This deletion is acceptable and deletes exactly one tuple.
■ Operation: Delete the EMPLOYEE tuple with Ssn = ‘999887777’.
Result: This deletion is not acceptable, because there are tuples in WORKS_ON that refer to this tuple. Hence, if the tuple in EMPLOYEE is deleted, referential integrity violations will result.
■ Operation: Delete the EMPLOYEE tuple with Ssn = ‘333445555’.
Result: This deletion will result in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON, and DEPENDENT relations.
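The deletions can be replayed in SQLite on a small hypothetical subset of the data (supervisor links simplified). With no ON DELETE action declared, the foreign key rejects deleting a referenced row; declaring ON DELETE CASCADE or ON DELETE SET NULL instead would propagate the deletion or null out the references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Ssn       TEXT NOT NULL PRIMARY KEY,
    Super_ssn TEXT REFERENCES EMPLOYEE(Ssn)
);
CREATE TABLE WORKS_ON (
    Essn TEXT NOT NULL REFERENCES EMPLOYEE(Ssn),   -- default action: reject the delete
    Pno  INTEGER NOT NULL,
    PRIMARY KEY (Essn, Pno)
);
INSERT INTO EMPLOYEE VALUES ('333445555', NULL), ('999887777', NULL), ('123456789', '333445555');
INSERT INTO WORKS_ON VALUES ('999887777', 10), ('999887777', 30), ('333445555', 10);
""")

def run(sql):
    try:
        cur = conn.execute(sql)
        return ("acceptable", cur.rowcount)
    except sqlite3.IntegrityError:
        return ("unacceptable", 0)

outcomes = [
    run("DELETE FROM WORKS_ON WHERE Essn = '999887777' AND Pno = 10"),
    run("DELETE FROM EMPLOYEE WHERE Ssn = '999887777'"),   # still referenced by WORKS_ON (Pno 30)
    run("DELETE FROM EMPLOYEE WHERE Ssn = '333445555'"),   # referenced by WORKS_ON and by a Super_ssn
]
print(outcomes)  # [('acceptable', 1), ('unacceptable', 0), ('unacceptable', 0)]
```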
In-Class Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this schema.
LEARNING OUTCOME
• Mapping ER constructs to relations
• Mapping class hierarchies
REFER:
T1-Chapter 5
Sections:
5.1-5.3
Database Modelling and Implementation Process
(Figure: the modelling and implementation process, starting from the problem statement)
Mapping ER Constructs to relations
ER Constructs to relations
• ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
Step 2: Mapping of Weak Entity Types
Step 3: Mapping of Binary 1:1 Relationship Types
Step 4: Mapping of Binary 1:N Relationship Types
Step 5: Mapping of Binary M:N Relationship Types
Step 6: Mapping of Multivalued Attributes
Step 7: Mapping of N-ary Relationship Types
• Mapping EER Model Constructs to Relations
Step 8: Options for Mapping Specialization or Generalization
Step 9: Mapping of Union Types (Categories)
ER-to-Relational Mapping Steps
Step 7: Mapping of N-ary Relationship Types (non-binary relationships).
• For each n-ary relationship type R, where n > 2, create a new relation S to represent R.
• Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types.
• Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S.
Example:
• The relationship type SUPPLY in the ER diagram (Figure 4.11a) can be mapped to the relation SUPPLY shown in the relational schema (Figure 7.3), whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
Figure 4.11: Ternary relationship types. (a) The SUPPLY relationship.
Figure 7.3: Mapping the n-ary relationship type SUPPLY from Figure 4.11a.
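Step 7 applied to SUPPLY can be sketched in SQLite as follows. This is an illustration under assumptions: the entity tables are reduced to their key columns, and Quantity stands in for a relationship attribute. The composite primary key rejects a second row for the same (Sname, Partno, Projname) triple. The foreign keys reject references to parts that do not exist. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Sketch of Step 7 for the ternary SUPPLY relationship. The entity tables are
# trimmed to their keys; Quantity is an assumed relationship attribute.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE SUPPLIER (Sname TEXT PRIMARY KEY);
CREATE TABLE PART     (Partno TEXT PRIMARY KEY);
CREATE TABLE PROJECT  (Projname TEXT PRIMARY KEY);
CREATE TABLE SUPPLY (
    Sname    TEXT REFERENCES SUPPLIER(Sname),
    Partno   TEXT REFERENCES PART(Partno),
    Projname TEXT REFERENCES PROJECT(Projname),
    Quantity INTEGER,                        -- simple attribute of SUPPLY
    PRIMARY KEY (Sname, Partno, Projname)    -- combination of the three FKs
);
INSERT INTO SUPPLIER VALUES ('S1');
INSERT INTO PART     VALUES ('P1');
INSERT INTO PROJECT  VALUES ('J1');
INSERT INTO SUPPLY   VALUES ('S1', 'P1', 'J1', 100);
""")

def try_insert(row):
    """Return True if SQLite accepts the SUPPLY row, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO SUPPLY VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(("S1", "P1", "J1", 5)))  # False: same (Sname, Partno, Projname) again
print(try_insert(("S1", "P9", "J1", 5)))  # False: part P9 does not exist
```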
ER to RM tutorials
Problem 1
Solution 1
Alternate Solution
Problem 2
Solution
Problem 3
Solution
Textbook Problem: COMPANY DB
COMPANY: ER
Solution
EER-to-Relational Mapping Steps
Step 8: Options for Mapping Specialization or Generalization.
Option 8A: Multiple relations, superclass and subclasses.
• Create a relation for the superclass, including the superclass attributes.
• Create a relation for each subclass that includes the primary key of the superclass (which also acts as a foreign key) and the attributes specific to that subclass.
• This works for any specialization (partial, total, disjoint, or overlapping).
Option 8B: Multiple relations, subclass relations only.
• Create a relation for each subclass, containing the attributes of both the superclass and that subclass.
• This only works for total specializations, meaning that every entity in the superclass must belong to at least one subclass. Otherwise, members of the superclass that do not belong to any subclass will not be represented.
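The following is a hedged SQLite sketch contrasting Options 8A and 8B. It uses a hypothetical VEHICLE specialization with subclasses CAR and TRUCK; all table and column names are assumptions. Under 8A, a vehicle that belongs to no subclass still has a row in VEHICLE. Under 8B, there is no superclass table to hold it, which is why 8B needs a total specialization.

```python
import sqlite3

# Hypothetical VEHICLE specialization with subclasses CAR and TRUCK; every table
# and column name here is an assumption chosen for illustration.
OPTION_8A = """
CREATE TABLE VEHICLE (Vehicle_id INTEGER PRIMARY KEY, Price REAL);
CREATE TABLE CAR   (Vehicle_id INTEGER PRIMARY KEY REFERENCES VEHICLE(Vehicle_id),
                    Max_speed INTEGER);
CREATE TABLE TRUCK (Vehicle_id INTEGER PRIMARY KEY REFERENCES VEHICLE(Vehicle_id),
                    Tonnage REAL);
"""
OPTION_8B = """
CREATE TABLE CAR   (Vehicle_id INTEGER PRIMARY KEY, Price REAL, Max_speed INTEGER);
CREATE TABLE TRUCK (Vehicle_id INTEGER PRIMARY KEY, Price REAL, Tonnage REAL);
"""

def build(ddl):
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn

# Option 8A: a superclass row plus a subclass row sharing the same key.
a = build(OPTION_8A)
a.executescript("""
INSERT INTO VEHICLE VALUES (1, 20000);
INSERT INTO CAR     VALUES (1, 180);
INSERT INTO VEHICLE VALUES (2, 15000);  -- in no subclass: fine for a partial specialization
""")
cars_8a = a.execute("""SELECT v.Vehicle_id, v.Price, c.Max_speed
                       FROM VEHICLE v JOIN CAR c ON c.Vehicle_id = v.Vehicle_id""").fetchall()
vehicles_8a = a.execute("SELECT COUNT(*) FROM VEHICLE").fetchone()[0]

# Option 8B: superclass attributes are copied into each subclass table, and
# there is no VEHICLE table, so vehicle 2 above would have nowhere to go.
b = build(OPTION_8B)
b.execute("INSERT INTO CAR VALUES (1, 20000, 180)")
tables_8b = [r[0] for r in b.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(cars_8a, vehicles_8a)  # [(1, 20000.0, 180)] 2
print(tables_8b)             # ['CAR', 'TRUCK']
```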
Figure 7.4 (Option 8A): Multiple relations, superclass and subclasses.
Figure 7.4 (Option 8B): Multiple relations, subclass relations only.
EER-to-Relational Mapping Steps
Option 8C: Single relation with one type attribute.
• Create a single relation with all the attributes of the superclass and all the attributes of every subclass.
• Include a 'Type' attribute, the discriminating attribute that indicates which subclass the row belongs to.
• This only works if the specialization is disjoint, meaning a superclass entity cannot be a member of more than one subclass.
Option 8D: Single relation with multiple type attributes.
• Create a single relation with all the attributes of the superclass and all the attributes of every subclass.
• Include a Boolean 'Type' attribute (flag) for each subclass, indicating whether the row belongs to that subclass.
• This works with overlapping specializations, because several flags can be true when a superclass entity belongs to more than one subclass.
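Options 8C and 8D can be sketched the same way. The EMPLOYEE (disjoint) and PART (overlapping) tables below are hypothetical, and all their column names are assumptions. Under 8C, each row stores one discriminating Job_type value. Under 8D, each row stores one 0/1 flag per subclass, so a single row can belong to both subclasses.

```python
import sqlite3

# Hypothetical EMPLOYEE (disjoint) and PART (overlapping) specializations;
# table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 8C: one discriminating type attribute; disjoint subclasses only.
CREATE TABLE EMPLOYEE (
    Ssn          TEXT PRIMARY KEY,
    Name         TEXT,
    Job_type     TEXT CHECK (Job_type IN ('SECRETARY', 'ENGINEER')),
    Typing_speed INTEGER,   -- SECRETARY attribute; NULL in other rows
    Eng_type     TEXT       -- ENGINEER attribute; NULL in other rows
);
-- Option 8D: one Boolean flag per subclass; overlapping subclasses allowed.
CREATE TABLE PART (
    Part_no       TEXT PRIMARY KEY,
    Mflag         INTEGER NOT NULL CHECK (Mflag IN (0, 1)),  -- manufactured part?
    Pflag         INTEGER NOT NULL CHECK (Pflag IN (0, 1)),  -- purchased part?
    Drawing_no    TEXT,     -- manufactured-part attribute
    Supplier_name TEXT      -- purchased-part attribute
);
INSERT INTO EMPLOYEE VALUES ('111', 'Ann', 'SECRETARY', 70, NULL);
INSERT INTO EMPLOYEE VALUES ('222', 'Raj', 'ENGINEER', NULL, 'Civil');
INSERT INTO PART VALUES ('P1', 1, 1, 'D-10', 'Acme');  -- both subclasses at once
INSERT INTO PART VALUES ('P2', 0, 1, NULL, 'Acme');
""")

engineers = conn.execute(
    "SELECT Ssn, Eng_type FROM EMPLOYEE WHERE Job_type = 'ENGINEER'").fetchall()
in_both = conn.execute(
    "SELECT Part_no FROM PART WHERE Mflag = 1 AND Pflag = 1").fetchall()
print(engineers)  # [('222', 'Civil')]
print(in_both)    # [('P1',)]
```

The NULL-filled subclass columns are the usual price of the single-relation options.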
[Figure: Option 8C, single relation with one type attribute]
[Figure: Option 8D, single relation with multiple type attributes]
EER-to-Relational Mapping Steps
Step 9: Mapping of Union Types (Categories).
• For mapping a category whose defining superclasses have different keys, specify a new key attribute, called a surrogate key, when creating the relation that corresponds to the category.
• The category relation contains the attributes of the category, with the surrogate key as its primary key. The surrogate key is also included as a foreign key in each relation that corresponds to a superclass of the category.
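The following is a minimal sketch of Step 9. It assumes a hypothetical OWNER category whose superclasses PERSON and BANK have different keys; all names are assumptions. Owner_id is the surrogate key: it is the primary key of OWNER and a foreign key in each superclass relation. As a result, a table that refers to an owner (here an assumed REGISTERED_VEHICLE) needs only one column.

```python
import sqlite3

# Hypothetical OWNER category over superclasses PERSON (key Ssn) and BANK
# (key Bname); all names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE OWNER (Owner_id INTEGER PRIMARY KEY);   -- surrogate key of the category
CREATE TABLE PERSON (
    Ssn      TEXT PRIMARY KEY,
    Name     TEXT,
    Owner_id INTEGER REFERENCES OWNER(Owner_id)      -- surrogate key as a foreign key
);
CREATE TABLE BANK (
    Bname    TEXT PRIMARY KEY,
    Baddress TEXT,
    Owner_id INTEGER REFERENCES OWNER(Owner_id)
);
-- A relationship with the category refers only to the surrogate key.
CREATE TABLE REGISTERED_VEHICLE (
    License_plate TEXT PRIMARY KEY,
    Owner_id      INTEGER NOT NULL REFERENCES OWNER(Owner_id)
);
INSERT INTO OWNER VALUES (1), (2);
INSERT INTO PERSON VALUES ('123', 'Asha', 1);
INSERT INTO BANK VALUES ('SBI', 'Pilani', 2);
INSERT INTO REGISTERED_VEHICLE VALUES ('RJ-01', 1), ('RJ-02', 2);
""")

# Resolve each vehicle's owner, whichever superclass the owner belongs to.
owners = conn.execute("""
    SELECT v.License_plate, COALESCE(p.Name, b.Bname)
    FROM REGISTERED_VEHICLE v
    LEFT JOIN PERSON p ON p.Owner_id = v.Owner_id
    LEFT JOIN BANK   b ON b.Owner_id = v.Owner_id
    ORDER BY v.License_plate
""").fetchall()
print(owners)  # [('RJ-01', 'Asha'), ('RJ-02', 'SBI')]
```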
EER to RM tutorials
Problem 4
Solution
Problem 5
Solution 1
Solution 2: Multiple relations, superclass and subclasses (Option 8A)
Problem 6
Solution
Solution (continued)
FUNCTIONAL DEPENDENCIES
FD
What is a functional dependency?
A functional dependency (FD) holds when the value of one attribute (or set of attributes) determines the value of another attribute in a relation. Functional dependencies play a vital role in distinguishing good database designs from bad ones.
• A functional dependency is denoted by an arrow →.
• The functional dependency of Y on X is written X → Y (X determines Y).
Example: if we know the value of Employee number, we can obtain Employee Name, City, Salary, etc. So Employee Name, City, and Salary are functionally dependent on Employee number.
Employee number | Employee Name | Salary | City
1 | Dana | 50000 | San Francisco
2 | Francis | 38000 | London
3 | Andrew | 25000 | Tokyo
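The definition above can be tested against data: X → Y is violated as soon as two tuples agree on X but differ on Y. Below is a minimal Python sketch using the employee table from this slide; the function name `fd_holds` is illustrative. Note that an instance can only refute an FD. A True result shows the data is consistent with the FD, not that the FD is a rule of the schema.

```python
from typing import Dict, List, Sequence, Tuple

def fd_holds(rows: List[Dict[str, object]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on all LHS attributes but differ on the RHS."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X value already seen with a different Y
            return False
    return True

employees = [
    {"Employee number": 1, "Employee Name": "Dana", "Salary": 50000, "City": "San Francisco"},
    {"Employee number": 2, "Employee Name": "Francis", "Salary": 38000, "City": "London"},
    {"Employee number": 3, "Employee Name": "Andrew", "Salary": 25000, "City": "Tokyo"},
]
print(fd_holds(employees, ["Employee number"], ["Employee Name", "City", "Salary"]))  # True

# A hypothetical extra row reusing employee number 1 with another city refutes the FD.
bad = employees + [{"Employee number": 1, "Employee Name": "Dana",
                    "Salary": 50000, "City": "Delhi"}]
print(fd_holds(bad, ["Employee number"], ["City"]))  # False
```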
Functional Dependencies
Determine which FDs hold for the following schema.
Determine whether the FD eid → {ename, age} is valid.
Functional Dependencies
Armstrong's axioms (inference rules):
• Reflexivity: if Y ⊆ X, then X → Y. In particular, X → X (a set of attributes determines itself).
• Augmentation: if X → Y, then XZ → YZ.
• Transitivity: if X → Y and Y → Z, then X → Z.
The following rules can be derived from the three axioms above:
• Additivity or Union: if X → Y and X → Z, then X → YZ.
• Projectivity or Decomposition: if X → YZ, then X → Y and X → Z.
• Pseudo-transitivity: if X → Y and YZ → W, then XZ → W.
Types of Functional Dependencies
Trivial dependency: a functional dependency X → Y is trivial if Y is a subset of X, that is, if the dependent attributes are already included in the determinant.
For example, {X, Y} → X is trivial. The dependencies X → X and Y → Y are also trivial.
Consider this table with two columns, Emp_id and Emp_name:
Emp_id | Emp_name
AS555 | Harry
AS811 | George
AS999 | Kevin
{Emp_id, Emp_name} → Emp_id is a trivial functional dependency, as Emp_id is a subset of {Emp_id, Emp_name}.
Types of Functional Dependencies
Non-trivial functional dependency: a functional dependency A → B is non-trivial when B is not a subset of A.
Example:
Company | CEO | Age
Microsoft | Satya Nadella | 51
Google | Sundar Pichai | 46
Apple | Tim Cook | 57
{Company} → {CEO} (if we know the company, we know the CEO's name).
CEO is not a subset of Company, hence this is a non-trivial functional dependency.
Types of Functional Dependencies
A multivalued dependency occurs when there are multiple independent multivalued attributes in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a relation: it requires that certain tuples be present in the relation.
Car_model | Maf_year | Color
H001 | 2017 | Metallic
H001 | 2017 | Green
H005 | 2018 | Metallic
H005 | 2018 | Blue
H010 | 2015 | Metallic
H033 | 2012 | Gray
• Maf_year and Color are independent of each other but both depend on Car_model.
• In this example, these two columns are said to be multivalued dependent on Car_model.
• This dependence is written with a double arrow:
car_model →→ maf_year
car_model →→ color
Types of Functional Dependencies
A transitive dependency is a functional dependency that is formed indirectly by two other functional dependencies.
{Company} → {CEO} (if we know the company, we know its CEO's name)
{CEO} → {Age} (if we know the CEO, we know the CEO's age)
Company | CEO | Age
Microsoft | Satya Nadella | 51
Google | Sundar Pichai | 46
Alibaba | Jack Ma | 54
Therefore, by the rule of transitivity, {Company} → {Age} should hold. That makes sense: if we know the company name, we can find its CEO's age.
Note: a transitive dependency can only occur in a relation of three or more attributes.
Types of Functional Dependencies
Full functional dependency: an FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
Partial functional dependency: an FD X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.
FD
Advantages of Functional Dependency
• Functional dependencies help avoid data redundancy, so the same data is not repeated at multiple locations in the database.
• They help maintain the quality of data in the database.
• They help define the meaning and constraints of a database.
• They help identify bad designs.
• They help capture facts about the database design.
Logical Design - Normalization
Normalization
Normalization: 1NF
A relation is in 1NF if:
• It contains only atomic values.
• No attribute of a table holds multiple values; each attribute must be single-valued.
• First normal form disallows multivalued attributes, composite attributes, and their combinations.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE
14 | John | 7272826385, 9064738238 | UP
20 | Harry | 8574783832 | Bihar
12 | Sam | 7390372389, 8589830302 | Punjab
Relation EMPLOYEE is not in 1NF because of the multivalued attribute EMP_PHONE.
SOLUTION:
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE
14 | John | 7272826385 | UP
14 | John | 9064738238 | UP
20 | Harry | 8574783832 | Bihar
12 | Sam | 7390372389 | Punjab
12 | Sam | 8589830302 | Punjab
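The 1NF fix above amounts to emitting one row per phone number. The following small Python sketch assumes EMP_PHONE arrives as a comma-separated string; the function `to_1nf` is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str, str, str]

# Unnormalized EMPLOYEE rows from the slide: EMP_PHONE holds a list of numbers.
unnormalized = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows: List[Row]) -> List[Row]:
    """Repeat the other columns once per phone number so EMP_PHONE is atomic."""
    flat = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):  # assumes numbers are comma-separated
            flat.append((emp_id, name, phone.strip(), state))
    return flat

for row in to_1nf(unnormalized):
    print(row)
```

The other remedy, forming a new relation for the multivalued attribute, would store (EMP_ID, EMP_PHONE) pairs in their own table instead of repeating EMP_NAME and EMP_STATE.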
Normalization: 1NF
1NF (table values are single and atomic)
Check whether the table is in Unnormalized Form, UNF (i.e., table values are not single and atomic).
Yes: the relation should have no multivalued attributes or nested relations, so make all values single and atomic.
Remedy: form new relations for each multivalued attribute or nested relation.
Normalization: 2NF
In 2NF:
• The relation must be in 1NF.
• All non-key attributes are fully functionally dependent on the primary key.
Example: assume a school stores data about teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table (candidate key {TEACHER_ID, SUBJECT}):
TEACHER_ID | SUBJECT | TEACHER_AGE
25 | Chemistry | 30
25 | Biology | 30
47 | English | 35
83 | Math | 38
83 | Computer | 38
Is the table in 2NF? In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
Normalization: 2NF
TEACHER_DETAIL table:
TEACHER_ID | TEACHER_AGE
25 | 30
47 | 35
83 | 38
TEACHER_SUBJECT table:
TEACHER_ID | SUBJECT
25 | Chemistry
25 | Biology
47 | English
83 | Math
83 | Computer
Normalization: 2NF
2NF (no partial dependency of NPAs on the PK attributes)
Check whether the table is already in 1NF (i.e., all values in the table are single and atomic).
Yes: check whether the table is in 2NF (i.e., no partial dependency of NPAs on the PK attributes).
Test for 2NF: for relations whose primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key.
Remedy: decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
I.e., each PA (partial key) is grouped with the NPAs that are fully dependent on it.
FD1: {SSN, Pnumber} → Hours
FD2: SSN → Ename
FD3: Pnumber → Pname, Plocation
Decomposing into (SSN, Pnumber, Hours), (SSN, Ename), and (Pnumber, Pname, Plocation) puts the table in 2NF.
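The 2NF test can be mechanised with attribute closure: a non-prime attribute is partially dependent on the key if some proper subset of the key already determines it. The following Python sketch applies this to the EMP_PROJ FDs listed above. `fd`, `closure` and `partial_dependencies` are illustrative helpers, not a library API.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+: add the RHS of every FD whose LHS is already inside."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key: Set[str], non_prime: Set[str],
                         fds: List[FD]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each proper subset of the key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_prime
            if determined:
                found[part] = sorted(determined)
    return found

emp_proj = [fd("SSN Pnumber", "Hours"), fd("SSN", "Ename"),
            fd("Pnumber", "Pname Plocation")]
print(partial_dependencies({"SSN", "Pnumber"},
                           {"Hours", "Ename", "Pname", "Plocation"}, emp_proj))
# {('Pnumber',): ['Plocation', 'Pname'], ('SSN',): ['Ename']}
```

Each entry of the result names one new relation for the remedy. Hours is absent because only the full key determines it.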
2NF example:
Normalization: 3NF
A relation is in 3NF if:
• It is in 2NF and does not contain any transitive dependency.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
Formally, a relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a superkey.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
EMPLOYEE_DETAIL table:
EMP_ID | EMP_NAME | EMP_ZIP | EMP_STATE | EMP_CITY
222 | Harry | 201010 | UP | Noida
333 | Stephan | 02228 | US | Boston
444 | Lan | 60007 | US | Chicago
555 | Katharine | 06389 | UK | Norwich
666 | John | 462007 | MP | Bhopal
Superkeys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Normalization: 3NF
Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
Is the table in 3NF?
No. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the superkey (EMP_ID), which violates the rule of third normal form.
SOLUTION
Move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_ZIP
222 | Harry | 201010
333 | Stephan | 02228
444 | Lan | 60007
555 | Katharine | 06389
666 | John | 462007
EMPLOYEE_ZIP table:
EMP_ZIP | EMP_STATE | EMP_CITY
201010 | UP | Noida
02228 | US | Boston
60007 | US | Chicago
06389 | UK | Norwich
462007 | MP | Bhopal
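The following Python sketch implements the formal 3NF condition: for each listed FD X → A, it flags the FD when X is not a superkey and A is not prime. It assumes the candidate keys are supplied by the caller and checks only the FDs given, which is enough for this example. The helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(schema: Set[str], fds: List[FD], keys: List[Set[str]]):
    """List (X, A) for listed FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:     # X is a superkey: always allowed
            continue
        for a in sorted(rhs - lhs):         # ignore trivial parts of the RHS
            if a not in prime:
                bad.append((sorted(lhs), a))
    return bad

employee_detail = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
detail_fds = [fd("EMP_ID", "EMP_NAME EMP_ZIP"), fd("EMP_ZIP", "EMP_STATE EMP_CITY")]
print(violates_3nf(employee_detail, detail_fds, [{"EMP_ID"}]))
# [(['EMP_ZIP'], 'EMP_CITY'), (['EMP_ZIP'], 'EMP_STATE')]
```

Running the same check on EMPLOYEE_ZIP alone (key EMP_ZIP) reports no violations.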
Normalization: 3NF
3NF (no transitive dependency)
Check whether the table is already in 2NF (i.e., NPAs are fully functionally dependent on the PA).
Yes: check whether the table is in 3NF (i.e., no transitive dependency).
Test: the relation should not have a non-key attribute functionally determined by another non-key attribute (or by a set of non-key attributes). That is, there should be no transitive dependency of a non-key attribute on the primary key.
Or
A relation schema R is in 3NF if every non-prime attribute of R meets both of the following conditions:
■ It is fully functionally dependent on every key of R.
■ It is non-transitively dependent on every key of R.
Remedy: decompose and set up a relation that includes the non-key attribute(s) that functionally determine(s) other non-key attribute(s).
Example: SSN → Dnumber and Dnumber → Dname, DmgrSSN
3NF example
BCNF (BOYCE-CODD NF)
BCNF is an advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS is a superkey.
Example: assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID | EMP_COUNTRY | EMP_DEPT | DEPT_TYPE | EMP_DEPT_NO
264 | India | Designing | D394 | 283
264 | India | Testing | D394 | 300
364 | UK | Stores | D283 | 232
364 | UK | Developing | D283 | 549
In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
Is the table in BCNF?
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
BCNF (BOYCE-CODD NF)
To convert the given table into BCNF, we decompose it into three tables.
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
EMP_COUNTRY table:
EMP_ID | EMP_COUNTRY
264 | India
364 | UK
EMP_DEPT table:
EMP_DEPT | DEPT_TYPE | EMP_DEPT_NO
Designing | D394 | 283
Testing | D394 | 300
Stores | D283 | 232
Developing | D283 | 549
EMP_DEPT_MAPPING table:
EMP_ID | EMP_DEPT
264 | Designing
264 | Testing
364 | Stores
364 | Developing
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now the design is in BCNF, because the left side of both functional dependencies is a key.
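The BCNF condition (every non-trivial X → Y has a superkey on the left) can be checked the same way. The following is a hedged sketch with hypothetical helpers. For the decomposed tables, it simply passes the FDs whose attributes fall inside each table. That works here, but it is not a general projection of F+ onto each table.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema: Set[str], fds: List[FD]):
    """Non-trivial listed FDs whose LHS is not a superkey of the schema."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue                        # trivial FD: never a violation
        if not closure(lhs, fds) >= schema:
            bad.append((sorted(lhs), sorted(rhs - lhs)))
    return bad

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fd1 = fd("EMP_ID", "EMP_COUNTRY")
fd2 = fd("EMP_DEPT", "DEPT_TYPE EMP_DEPT_NO")
print(bcnf_violations(employee, [fd1, fd2]))
# [(['EMP_ID'], ['EMP_COUNTRY']), (['EMP_DEPT'], ['DEPT_TYPE', 'EMP_DEPT_NO'])]

# After decomposition, each table is checked against the FDs that lie inside it.
decomposed = {
    "EMP_COUNTRY": ({"EMP_ID", "EMP_COUNTRY"}, [fd1]),
    "EMP_DEPT": ({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd2]),
    "EMP_DEPT_MAPPING": ({"EMP_ID", "EMP_DEPT"}, []),
}
for name, (attrs, local_fds) in decomposed.items():
    print(name, bcnf_violations(attrs, local_fds))  # each prints []
```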
BCNF (BOYCE-CODD NF)
Example:
Consider the relation schema BOOK_RATING(ISBN, Book_title, R_ID, Rating).
The candidate keys are (ISBN, R_ID) and (Book_title, R_ID).
Both candidate keys are composite as well as overlapping. Here ISBN and Book_title determine each other (ISBN → Book_title and Book_title → ISBN), and neither left side is a superkey, so this relation schema is not in BCNF. However, it is in 3NF, because ISBN and Book_title are both prime attributes.
Remedy: the problem can be resolved by decomposing this relation schema into two relation schemas, as shown here:
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, ISBN, Rating)
Or
BOOK_TITLE_INFO(ISBN, Book_title) and REVIEW(R_ID, Book_title, Rating)
Now, all these relation schemas are in BCNF. Note that BCNF is the most desirable normal form, as it ensures the elimination of all redundancy that can be detected using functional dependencies.
Note:
If there is only one determinant upon which other attributes depend, and it is a candidate key, 3NF and BCNF are identical.
Normalization: 4NF (MVD)
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
A multivalued dependency involves at least two attributes that depend on a third attribute, which is why it always requires at least three attributes.
Example: suppose a bike manufacturer produces two colors (white and black) of each model every year.
BIKE_MODEL | MANUF_YEAR | COLOR
M2011 | 2008 | White
M2011 | 2008 | Black
M3001 | 2013 | White
M3001 | 2013 | Black
M4006 | 2017 | White
M4006 | 2017 | Black
Here, the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
These dependencies are represented as:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
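An MVD X →→ Y holds in an instance when, for every X value, all combinations of the associated Y values and remaining (Z) values appear. The following Python sketch tests this on the bike table above. The helper names are illustrative, and the second instance is a hypothetical variant.

```python
from typing import Dict, List, Sequence, Set, Tuple

def mvd_holds(rows: List[Dict[str, str]], x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->> Y on one relation instance.

    Z is every attribute outside X and Y. For each X value, the (Y, Z)
    combinations present must be the full cross product of the Y values and
    Z values seen with that X value.
    """
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]
    groups: Dict[Tuple, Set[Tuple[Tuple, Tuple]]] = {}
    for r in rows:
        key = tuple(r[a] for a in x)
        groups.setdefault(key, set()).add(
            (tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if len(pairs) != len(ys) * len(zs):  # some combination is missing
            return False
    return True

def make_rows(*triples):
    return [dict(zip(("BIKE_MODEL", "MANUF_YEAR", "COLOR"), t)) for t in triples]

bikes = make_rows(("M2011", "2008", "White"), ("M2011", "2008", "Black"),
                  ("M3001", "2013", "White"), ("M3001", "2013", "Black"),
                  ("M4006", "2017", "White"), ("M4006", "2017", "Black"))
print(mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # True
print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))       # True

# Hypothetical variant: M4006 now has two years but not every year/color pair.
broken = bikes[:5] + make_rows(("M4006", "2018", "Black"))
print(mvd_holds(broken, ["BIKE_MODEL"], ["COLOR"]))      # False
```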
Normalization: 4NF
A relation is in 4NF:
• if it is in Boyce-Codd normal form and has no non-trivial multivalued dependency X →→ Y in which X is not a superkey.
• For a dependency A →→ B, if multiple values of B exist for a single value of A, the dependency is a multivalued dependency.
Example
STUDENT table:
STU_ID | COURSE | HOBBY
21 | Computer | Dancing
21 | Math | Singing
34 | Chemistry | Dancing
74 | Biology | Cricket
59 | Physics | Hockey
Is the table in 4NF?
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multivalued dependency on STU_ID, which leads to unnecessary repetition of data.
Normalization: 4NF
To bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE table:
STU_ID | COURSE
21 | Computer
21 | Math
34 | Chemistry
74 | Biology
59 | Physics
STUDENT_HOBBY table:
STU_ID | HOBBY
21 | Dancing
21 | Singing
34 | Dancing
74 | Cricket
59 | Hockey
Normalization: 5NF
A table is in 5NF if and only if it is in 4NF and every join dependency in it is implied by the candidate keys.
Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
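For intuition about join dependencies, a relation instance satisfies JD(R1, ..., Rn) when the natural join of its projections onto R1, ..., Rn gives back exactly the original tuples. The sketch below tests that on one instance only, which is weaker than the 5NF condition: it says nothing about whether each Ri is a superkey. It reuses the bike data from the 4NF example. The helper names are illustrative, and the variant instance is hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Tup = FrozenSet[Tuple[str, str]]  # a tuple stored as a set of (attribute, value) pairs

def project(rows: List[Dict[str, str]], attrs: Sequence[str]) -> Set[Tup]:
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(left: Set[Tup], right: Set[Tup]) -> Set[Tup]:
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):  # agree on shared attrs
                out.add(frozenset({**ld, **rd}.items()))
    return out

def jd_holds(rows: List[Dict[str, str]], components: List[Sequence[str]]) -> bool:
    """True if joining the projections on every component gives back exactly the rows.
    The join always contains the original rows; extra (spurious) tuples mean failure."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(r.items()) for r in rows}

ATTRS = ("BIKE_MODEL", "MANUF_YEAR", "COLOR")
bikes = [dict(zip(ATTRS, t)) for t in [
    ("M2011", "2008", "White"), ("M2011", "2008", "Black"),
    ("M3001", "2013", "White"), ("M3001", "2013", "Black"),
    ("M4006", "2017", "White"), ("M4006", "2017", "Black")]]
components = [["BIKE_MODEL", "MANUF_YEAR"], ["BIKE_MODEL", "COLOR"]]
print(jd_holds(bikes, components))   # True: this decomposition is lossless here

broken = bikes[:5] + [dict(zip(ATTRS, ("M4006", "2018", "Black")))]
print(jd_holds(broken, components))  # False: the join adds spurious M4006 tuples
```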
Tutorial Session 5
RA, SQL
LEARNING OUTCOME
• RA
Example 1
Example
LEARNING OUTCOME
• SQL
SQL tutorials
Find the names and ages of all sailors.
SELECT DISTINCT S.sname, S.age
FROM Sailors S
Find all sailors with a rating above 7.
SELECT S.sid, S.sname, S.rating, S.age
FROM Sailors AS S
WHERE S.rating > 7
Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid = 103
Find the sids of sailors who have reserved a red boat.
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.bid = R.bid AND B.color = 'red'
SQL tutorials
Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
Find all sailors with a rating less than 20.
SELECT S.sid, S.sname, S.rating, S.age
FROM Sailors AS S
WHERE S.rating < 20
Find the colors of boats reserved by Lubber.
SELECT B.color
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'
Find the names of sailors who have reserved at least one boat.
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
SQL tutorials
Compute increments for the ratings of persons who have sailed two different boats on the same day.
SELECT S.sname, S.rating + 1 AS rating
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND S.sid = R2.sid
  AND R1.day = R2.day AND R1.bid <> R2.bid
Find the ages of sailors whose name begins and ends with B and has at least three characters.
SELECT S.age
FROM Sailors S
WHERE S.sname LIKE 'B_%B'
Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2
WHERE S.sid = R1.sid AND R1.bid = B1.bid
  AND S.sid = R2.sid AND R2.bid = B2.bid
  AND B1.color = 'red' AND B2.color = 'green'
Or
SELECT S.sname FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
INTERSECT
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
Note: replacing INTERSECT with UNION would instead find sailors who reserved a red or a green boat.
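To see the difference between the INTERSECT and UNION forms, the queries can be run against a small sample instance with Python's sqlite3 module. The rows below are hypothetical sample data, not taken from the slides.

```python
import sqlite3

# Hypothetical sample instance of the Sailors/Boats/Reserves schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Boats    (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (31, 'Lubber', 8, 55.5),
                           (64, 'Horatio', 7, 35.0);
INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
                         (103, 'Clipper', 'green');
INSERT INTO Reserves VALUES (22, 102, '1998-10-10'), (22, 103, '1998-10-10'),
                            (31, 102, '1998-11-12'), (64, 101, '1998-09-05');
""")

RED = """SELECT S.sname FROM Sailors S, Reserves R, Boats B
         WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'"""
GREEN = RED.replace("'red'", "'green'")

both = conn.execute(f"{RED} INTERSECT {GREEN}").fetchall()          # red AND green
either = conn.execute(f"{RED} UNION {GREEN} ORDER BY 1").fetchall()  # red OR green
self_join = conn.execute("""
    SELECT S.sname FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2
    WHERE S.sid = R1.sid AND R1.bid = B1.bid AND S.sid = R2.sid AND R2.bid = B2.bid
      AND B1.color = 'red' AND B2.color = 'green'""").fetchall()
print(both)       # [('Dustin',)]
print(either)     # [('Dustin',), ('Lubber',)]
print(self_join)  # [('Dustin',)]
```

Lubber reserved only a red boat, so he appears in the UNION result but not in the INTERSECT or self-join results.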
Session 6:
Topic: Logical design II (FD and Normalization)
LEARNING OUTCOME
• FDs
• Normalization
REFER: T1-Chapter 5, Sections 5.1-5.3
FUNCTIONAL DEPENDENCIES
FD
What is a functional dependency?
A functional dependency holds when one attribute (or set of attributes) determines another attribute in a DBMS. Functional dependencies play a vital role in distinguishing good database designs from bad ones.
• A functional dependency is denoted by an arrow →.
• The functional dependency of Y on X is written X → Y.
Example: if we know the value of Employee number, we can obtain Employee Name and Salary. So Employee Name and Salary are functionally dependent on Employee number.
Written as:
Employee number → Employee Name, Salary
Employee number | Employee Name | Salary
1 | Dana | 50000
2 | Francis | 38000
3 | Andrew | 25000
A functional dependency is a constraint between two sets of attributes in a relation from a database.
Functional Dependencies: Exercise
Determine the FDs for the following schema.
Which FD is valid: TEXT → COURSE, TEACHER → COURSE, TEACHER → TEXT, or COURSE → TEXT?
Figure: a relation state of TEACH with a possible functional dependency TEXT → COURSE. However, TEACHER → COURSE, TEACHER → TEXT, and COURSE → TEXT are ruled out.
Determine whether the FD eid → {ename, age} is valid.
Determine whether the FD eid → ename is valid.
Types of Functional Dependencies
Trivial dependency: a functional dependency X → Y is trivial if Y is a subset of X, e.g., {X, Y} → X.
The following dependencies are also trivial: X → X and Y → Y.
Consider this table with two columns, Emp_id and Emp_name:
Emp_id | Emp_name
AS555 | Harry
AS811 | George
AS999 | Kevin
Take X = {Emp_id, Emp_name}, which is a superkey.
{Emp_id, Emp_name} → Emp_id is a trivial functional dependency, as Emp_id is a subset of {Emp_id, Emp_name}.
Types of Functional Dependencies
Non-trivial functional dependency: a functional dependency X → Y is non-trivial when it holds and Y is not a subset of X.
Example:
Company | CEO | Age
Microsoft | Satya Nadella | 51
Google | Sundar Pichai | 46
Apple | Tim Cook | 57
{Company} → {CEO} (if we know the company, we know the CEO's name).
Here X = Company, which is a candidate key (CK), and Y = CEO. Since CEO is not a subset of Company, this is a non-trivial functional dependency.
Types of Functional Dependencies
A multivalued dependency occurs when there are multiple independent multivalued attributes in a single table.
A multivalued dependency is a complete constraint between two sets of attributes in a relation: it requires that certain tuples be present in the relation.
Car_model | Maf_year | Color
H001 | 2017 | Metallic
H001 | 2017 | Green
H005 | 2018 | Metallic
H005 | 2018 | Blue
H010 | 2015 | Metallic
H033 | 2012 | Gray
• Maf_year and Color are independent of each other but both depend on Car_model.
• In this example, these two columns are said to be multivalued dependent on Car_model.
• This dependence is represented as:
car_model →→ maf_year and car_model →→ color
Types of Functional Dependencies
A transitive dependency is a functional dependency that is formed indirectly by two other functional dependencies.
{Company} → {CEO} (if we know the company, we know its CEO's name)
{CEO} → {Age} (if we know the CEO, we know the CEO's age)
Represented as {Company} → {CEO} and {CEO} → {Age}.
Company | CEO | Age
Microsoft | Satya Nadella | 51
Google | Sundar Pichai | 46
Alibaba | Jack Ma | 54
Therefore, by the rule of transitivity, {Company} → {Age} should hold. That makes sense: if we know the company name, we can find its CEO's age.
Note: a transitive dependency can only occur in a relation of three or more attributes.
Types of Functional Dependencies
Full functional dependency: an FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
Example (FD1: {SSN, Pnumber} → Hours): X is the key {SSN, Pnumber}.
If we remove SSN from X, then Pnumber → Hours is not valid.
Likewise, if we remove Pnumber from X, then SSN → Hours is also not valid.
This is called a full FD.
Partial FD: a functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.
Example: X is the key {SSN, Pnumber} and Y is Ename. If we remove Pnumber from X, FD2: SSN → Ename still holds. This is called a partial FD.
Likewise, {SSN, Pnumber} → Pname, Plocation is also a partial FD, because FD3: Pnumber → Pname, Plocation holds.
FD
Advantages of Functional Dependency
• Functional dependencies help avoid data redundancy, so the same data is not repeated at multiple locations in the database.
• They help maintain the quality of data in the database.
• They help define the meaning and constraints of a database.
• They help identify bad designs.
• They help capture facts about the database design.
Key Attributes and their types:
If a relation schema has more than one key, each is called a candidate key.
One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
In a practical relational database, each relation schema must have a primary key.
If no candidate key is known for a relation, the entire relation (the set of all its attributes) can be treated as a default superkey.
Prime and Non-Prime attributes
Prime attributes (PA):
An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R.
For example, in WORKS_ON, both SSN and PNUMBER are prime attributes.
Non-prime attributes (NPA):
An attribute of relation schema R is called a non-prime attribute if it is not a member of any candidate key.
E.g., Hours is a non-prime attribute of WORKS_ON.
Closure of Attributes using the FDs:
Let R = {A, B, C, D, E, F} and a set of FDs F = {A → BC, E → CF, B → E, CD → EF, F → D}.
1. Compute the closure of the set of attributes {A, B} under the given set of FDs.
2. Compute the closure of the attribute {A} under the given set of FDs.
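The standard closure algorithm repeatedly adds the right-hand side of any FD whose left-hand side is already in the result, until nothing changes. The following Python sketch assumes single-letter attribute names, so 'AB' means {A, B}; `parse` and `closure` are illustrative helpers. Applied to the exercise, it returns ABCDEF for both closures, which means {A} alone already determines all of R.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def parse(text: str) -> List[FD]:
    """Parse 'A->BC, E->CF' into FDs; assumes single-letter attribute names."""
    fds = []
    for item in text.split(","):
        lhs, rhs = item.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds

def closure(attrs: Iterable[str], fds: List[FD]) -> str:
    """X+: repeatedly add the RHS of every FD whose LHS is already in X+."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return "".join(sorted(result))

F = parse("A->BC, E->CF, B->E, CD->EF, F->D")
print(closure("AB", F))  # ABCDEF
print(closure("A", F))   # ABCDEF
```

The trace for {A} is: A → BC adds B and C, B → E adds E, E → CF adds F, and F → D adds D.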
Attribute closure and Extraneous Attributes:
F = {A → D, BC → A, BC → D, C → B, E → A, E → D}. Determine the extraneous attributes.
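An attribute B in X is extraneous in X → Y when (X − {B})+, computed under F, already contains Y. The following sketch applies this test to each left-hand-side attribute of the exercise FDs. It does not look for extraneous right-hand-side attributes, and the helper names are illustrative.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def parse(text: str) -> List[FD]:
    """Parse 'A->D, BC->A' into FDs; assumes single-letter attribute names."""
    fds = []
    for item in text.split(","):
        lhs, rhs = item.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def extraneous_lhs(fds: List[FD]) -> List[Tuple[str, str]]:
    """(FD, attribute) pairs where attribute B of X is extraneous in X -> Y,
    i.e. (X - {B})+ computed under the full set F already contains Y."""
    found = []
    for lhs, rhs in fds:
        if len(lhs) < 2:
            continue                      # a single-attribute LHS cannot shrink
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, fds):
                label = "".join(sorted(lhs)) + "->" + "".join(sorted(rhs))
                found.append((label, b))
    return found

F = parse("A->D, BC->A, BC->D, C->B, E->A, E->D")
print(extraneous_lhs(F))  # [('BC->A', 'B'), ('BC->D', 'B')]
```

B is extraneous in both FDs because C → B already holds, giving C+ = ABCD.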
Closure of FD set, or F+:
The set of functional dependencies that is logically implied by F is called the closure of F and is written F+.
Problems using Armstrong's axioms:
Canonical cover or minimal set of FDs:
1. Singleton RHS
2. Extraneous attributes removed
3. Redundant FDs removed
R(A, B, C) and F = {A → B, AB → C}. Find the minimal cover.
• A canonical cover is "allowed" to have more than one attribute on the RHS.
• A minimal cover cannot.
As an example, the canonical cover may be "A → BC" where the minimal cover would be "A → B, A → C".
Canonical cover or minimal set of FDs:
A minimal cover of a set of functional dependencies E is a set of functional dependencies F that is equivalent to E (every dependency in E is in the closure F+ of F, and every dependency in F is in E+) and that is minimal.
A set of functional dependencies F is minimal if it satisfies the following conditions:
(i) Every dependency in F has a single attribute for its right-hand side.
(ii) We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
(iii) We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
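The three conditions translate directly into an algorithm: split right-hand sides, shrink left-hand sides, then drop implied FDs. The following Python sketch uses the same single-letter-attribute assumption; the helper names are illustrative. The order in which FDs are examined can change which minimal cover is produced, so the result is one minimal cover, not necessarily the only one.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def parse(text: str) -> List[FD]:
    """Parse 'A->B, AB->C' into FDs; assumes single-letter attribute names."""
    fds = []
    for item in text.split(","):
        lhs, rhs = item.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds: List[FD]) -> List[FD]:
    # (i) Split every right-hand side into single attributes.
    g = [(lhs, frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # (ii) Drop extraneous left-hand attributes: B is extraneous in X -> A if A is
    #      in (X - {B})+ under g (each such reduction keeps the set equivalent).
    reduced = []
    for lhs, rhs in g:
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        reduced.append((lhs, rhs))
    reduced = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    # (iii) Drop any dependency implied by the remaining ones.
    result = list(reduced)
    for f in reduced:
        others = [h for h in result if h != f]
        if f[1] <= closure(f[0], others):
            result = others
    return result

def show(fds: List[FD]) -> List[str]:
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds)

print(show(minimal_cover(parse("A->B, AB->C"))))  # ['A->B', 'A->C']
print(show(minimal_cover(parse("A->D, BC->A, BC->D, C->B, E->A, E->D"))))
# ['A->D', 'C->A', 'C->B', 'E->A']
```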
BITS Pilani, Pilani Campus
Canonical cover or minimal cover:
F = {A ->B, AB->C, D->AC, D->E} and G = {A->BC, D->AB}. Find if F covers G?
Problem
Find the candidate (minimal) keys, the prime attributes and the non-prime attributes (NPA).
R(A,B,C,D,E,F)
F={C->F,E->A, EC->D, A->B}
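Candidate keys can be found by brute force: test attribute sets in increasing size and keep those whose closure is all of R and that contain no smaller key. A sketch assuming single-letter attribute names and (lhs, rhs) string pairs:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`; FDs are (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(combo, fds) == set(attrs):
                keys.append("".join(combo))
    return keys


R = "ABCDEF"
F = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
keys = candidate_keys(R, F)
prime = set("".join(keys))
print(keys)                    # ['CE']
print(sorted(set(R) - prime))  # ['A', 'B', 'D', 'F'] (non-prime attributes)
```

C and E never appear on a right-hand side, so every key must contain them; {C, E}+ is already all of R. Brute force is exponential in the number of attributes, which is acceptable only for small classroom schemas.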
Find the non-redundant and redundant FDs:
R(A,B,C,D) F={ABC->D, BC->D}
Logical Design - Normalization
Normalization
Note: Decomposing relations should preserve DEPENDENCIES (i.e., the FDs of the original relation are not lost).
1 NF
Normalization: 1NF
A relation will be in 1NF:
• If it contains only atomic values.
• An attribute of a table cannot hold multiple values; it must hold only single-valued attributes.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
1NF disallows relations within relations or relations as attribute values within tuples. The only attribute values permitted by 1NF are single atomic (or indivisible) values.
EMPLOYEE table
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE
14 | John | 7272826385, 9064738238 | UP
20 | Harry | 8574783832 | Bihar
12 | Sam | 7390372389, 8589830302 | Punjab
Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
SOLUTION:
The EMPLOYEE table converted into 1NF is shown below:
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE
14 | John | 7272826385 | UP
14 | John | 9064738238 | UP
20 | Harry | 8574783832 | Bihar
12 | Sam | 7390372389 | Punjab
12 | Sam | 8589830302 | Punjab
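The 1NF conversion of EMPLOYEE amounts to "one row per value of the multi-valued attribute". A small sketch in Python, assuming the unnormalized table is held as tuples whose EMP_PHONE field is a list (an illustrative representation, not part of the slides):

```python
# Unnormalized EMPLOYEE rows: (EMP_ID, EMP_NAME, EMP_PHONE list, EMP_STATE)
employee_unf = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so every attribute value is atomic."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


for row in to_1nf(employee_unf):
    print(row)  # first line printed: (14, 'John', '7272826385', 'UP')
```

The result has the five rows of the 1NF table; EMP_ID alone no longer identifies a row, so the key of this flattened table becomes {EMP_ID, EMP_PHONE}.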
Normalization : 1NF
1NF (table values are single and atomic)
Check if the table is in Unnormalized Form, UNF (i.e., table values are not single and atomic).
Yes: The relation should have no multivalued attributes or nested relations, so make all values single and atomic.
Remedy: Form new relations for each multivalued attribute or nested relation.
2 NF
Normalization: 2 NF
In 2NF,
• The relation must be in 1NF.
• All non-key attributes are fully functionally dependent on the primary key, i.e., there is no partial dependency.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
Candidate Keys/PK: {TEACHER_ID, SUBJECT}
Non-prime attribute: TEACHER_AGE
TEACHER table
TEACHER_ID | SUBJECT | TEACHER_AGE
25 | Chemistry | 30
25 | Biology | 30
47 | English | 35
83 | Math | 38
83 | Computer | 38
Is the table in 2NF?
In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables.
2NF requires that all data elements in a table are fully functionally dependent on the table's primary key.
• If a data element is dependent on only part of the primary key, i.e., partially dependent, it is moved out to a separate table.
Normalization : 2 NF
TEACHER_DETAIL table:
TEACHER_ID | TEACHER_AGE
25 | 30
47 | 35
83 | 38
TEACHER_SUBJECT table:
TEACHER_ID | SUBJECT
25 | Chemistry
25 | Biology
47 | English
83 | Math
83 | Computer
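The 2NF decomposition is just two projections of TEACHER, and a natural join on TEACHER_ID should rebuild the original rows. A sketch with the table held as Python tuples (the variable names are chosen for this example):

```python
# TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE), copied from the table above
teacher = [
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
]

# Projection: a set comprehension drops the duplicate (TEACHER_ID, AGE) pairs
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = sorted({(tid, subj) for tid, subj, _ in teacher})

# Natural join on TEACHER_ID should give back exactly the original rows
rejoined = sorted(
    (tid, subj, age)
    for tid, subj in teacher_subject
    for other_tid, age in teacher_detail
    if other_tid == tid
)

print(teacher_detail)               # [(25, 30), (47, 35), (83, 38)]
print(rejoined == sorted(teacher))  # True
```

Each teacher's age is now stored once instead of once per subject, which is exactly the redundancy the partial dependency TEACHER_ID -> TEACHER_AGE caused.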
Normalization : 2 NF
FD 1: {SSN, Pnumber} -> Hours
FD 2: SSN -> Ename
FD 3: Pnumber -> {Pname, Plocation}
Decomposing along FD 2 and FD 3 removes the partial dependencies, so the resulting tables are in 2NF.
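Detecting the partial dependencies among FD 1 to FD 3 can be sketched as: an FD is partial if its LHS is a proper subset of the key and its RHS contains non-key attributes. This sketch assumes {SSN, Pnumber} is the only candidate key (so every attribute outside it is non-prime) and checks only the FDs as listed, not every FD in F+.

```python
def partial_dependencies(key, fds):
    """Return (lhs, nonprime_rhs) for FDs whose LHS is a proper subset of key."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        nonprime = set(rhs) - key
        if set(lhs) < key and nonprime:  # "<" is proper subset for sets
            found.append((sorted(lhs), sorted(nonprime)))
    return found


fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),        # FD 1: full dependency
    ({"SSN"}, {"Ename"}),                   # FD 2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD 3
]
for lhs, rhs in partial_dependencies({"SSN", "Pnumber"}, fds):
    print(lhs, "->", rhs)
# ['SSN'] -> ['Ename']
# ['Pnumber'] -> ['Plocation', 'Pname']
```

Each reported FD becomes its own table (SSN, Ename) and (Pnumber, Pname, Plocation), while (SSN, Pnumber, Hours) keeps the full dependency.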
3 NF
Normalization: 3 NF
A relation will be in 3NF
• if it is in 2NF and does not contain any transitive dependency.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
EMPLOYEE_DETAIL table:
EMP_ID | EMP_NAME | EMP_ZIP | EMP_STATE | EMP_CITY
222 | Harry | 201010 | UP | Noida
333 | Stephan | 02228 | US | Boston
444 | Lan | 60007 | US | Chicago
555 | Katharine | 06389 | UK | Norwich
666 | John | 462007 | MP | Bhopal
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}... and so on
Candidate key: {EMP_ID}
Normalization: 3 NF
Non-prime attributes:
In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID.
Is the table in 3NF?
The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
SOLUTION
We move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_ZIP
222 | Harry | 201010
333 | Stephan | 02228
444 | Lan | 60007
555 | Katharine | 06389
666 | John | 462007
EMPLOYEE_ZIP table:
EMP_ZIP | EMP_STATE | EMP_CITY
201010 | UP | Noida
02228 | US | Boston
60007 | US | Chicago
06389 | UK | Norwich
462007 | MP | Bhopal
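The 3NF rule (for every non-trivial X → A, X is a superkey or A is prime) can be checked mechanically with attribute closure. A sketch, assuming the candidate keys are supplied by the caller and that only the listed FDs are checked (not all of F+); FDs are (lhs, rhs) pairs of attribute-name sets.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`; FDs are (lhs, rhs) sets of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attrs, fds, keys):
    """List (lhs, attr) pairs where X -> A has a non-superkey X and non-prime A."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if not superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


detail = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
       ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})]
print(violations_3nf(detail, fds, [{"EMP_ID"}]))
# [(['EMP_ZIP'], 'EMP_CITY'), (['EMP_ZIP'], 'EMP_STATE')]
```

Running the same check on EMPLOYEE (key EMP_ID) and EMPLOYEE_ZIP (key EMP_ZIP) with their own FDs reports no violations.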
Normalization: 3 NF
3NF (no transitive dependency)
Check if the table is already in 2NF (i.e., non-prime attributes are fully functionally dependent on the primary key).
YES: Check if the table is in 3NF (i.e., no transitive dependency).
Test: The relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Or
A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
■ It is fully functionally dependent on every key of R.
■ It is nontransitively dependent on every key of R.
Remedy: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
OR
we can say: A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either X is a superkey of R or A is a prime attribute of R.
Example: SSN → Dnumber and Dnumber → {Dname, DmgrSSN}, so Dname and DmgrSSN are transitively dependent on SSN.
3NF example
Boyce Codd NF
or
BC NF
BCNF ( BOYCE CODD NF)
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID | EMP_COUNTRY | EMP_DEPT | DEPT_TYPE | DEPT_NO_OF_EMP
264 | India | Designing | D394 | 283
264 | India | Testing | D394 | 300
364 | UK | Stores | D283 | 232
364 | UK | Developing | D283 | 549
Candidate key: {EMP_ID, EMP_DEPT}
In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, DEPT_NO_OF_EMP}
Super keys: {EMP_ID, EMP_DEPT}, {EMP_ID, EMP_DEPT, EMP_COUNTRY}, {EMP_ID, EMP_DEPT, DEPT_TYPE}... and so on (every superset of the candidate key)
Is the table in BCNF?
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert it to BCNF:
The table is decomposed so that each FD is valid and every FD has a super key on its LHS.
BCNF ( BOYCE CODD NF)
To convert the given table into BCNF, we decompose it into three tables:
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, DEPT_NO_OF_EMP}
EMP_COUNTRY table:
EMP_ID | EMP_COUNTRY
264 | India
364 | UK
EMP_DEPT table:
EMP_DEPT | DEPT_TYPE | DEPT_NO_OF_EMP
Designing | D394 | 283
Testing | D394 | 300
Stores | D283 | 232
Developing | D283 | 549
EMP_DEPT_MAPPING table:
EMP_ID | EMP_DEPT
264 | Designing
264 | Testing
364 | Stores
364 | Developing
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies is a key.
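The BCNF test is simpler than the 3NF one: every non-trivial FD must have a superkey on its LHS, with no exception for prime attributes. A sketch over the listed FDs only (a complete test of a decomposed table would also need the FDs projected from F+); FDs are (lhs, rhs) pairs of attribute-name sets.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`; FDs are (lhs, rhs) sets of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """FDs X -> Y that are non-trivial and whose X is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= set(attrs):
            bad.append((sorted(lhs), sorted(rhs)))
    return bad


employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "DEPT_NO_OF_EMP"}
emp_fds = [({"EMP_ID"}, {"EMP_COUNTRY"}),
           ({"EMP_DEPT"}, {"DEPT_TYPE", "DEPT_NO_OF_EMP"})]
print(bcnf_violations(employee, emp_fds))  # both FDs are reported
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [emp_fds[0]]))  # []
```

The same function applied to BOOK_RATING with ISBN → Book_title, Book_title → ISBN and {ISBN, R_ID} → Rating reports the first two FDs, which is why that schema is in 3NF but not in BCNF.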
BCNF ( BOYCE CODD NF)
Example: Consider a relation schema
BOOK_RATING(ISBN, Book_title, R_ID, Rating).
Two candidate keys = {(ISBN, R_ID), (Book_title, R_ID)}.
This relation schema is not in BCNF since both the candidate keys are composite as well as overlapping (e.g., ISBN → Book_title holds, but ISBN alone is not a superkey). However, it is in 3NF.
Remedy: The problem can be resolved by decomposing this relation schema into two relation schemas as shown here:
BOOK_TITLE_INFO(ISBN, Book_title) and
REVIEW(R_ID, ISBN, Rating)
Or
BOOK_TITLE_INFO(ISBN, Book_title) and
REVIEW(R_ID, Book_title, Rating)
Now, all these relation schemas are in BCNF. Note that BCNF is the most desirable normal form as it ensures the elimination of all redundancy that can be detected using functional dependencies.
Note:
If there is only one determinant upon which other attributes depend and it is a candidate key, 3NF and BCNF are identical.
BCNF ( BOYCE CODD NF)
BCNF ( BOYCE CODD NF)
Normalize the relation PROFESSOR so that it is in BCNF.
The PROFESSOR relation is decomposed into two relations: PROF 1 and PROF 2.
BCNF ( BOYCE CODD NF)
Note: Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
A relation in 3NF can fail to be in BCNF only if all of the following hold:
1. the candidate keys in the relation are composite keys (that is, they are not single-attribute keys)
2. there is more than one candidate key in the relation
3. the keys overlap, that is, some attributes in the keys are common.
BCNF ( BOYCE CODD NF)
Example: each student may have only one tutor, but each tutor may have many students.
This table is subject to insertion anomalies, as both the TutorID and the TutorSIN must be entered whenever a tutor-student pair is entered.
So decompose it to convert it to BCNF.
Candidate keys are: {ID, TutorID} and {ID, TutorSIN}
TutorID → TutorSIN and TutorSIN → TutorID, but because both TutorID and TutorSIN are prime attributes, these FDs do not violate 3NF.
Neither TutorID nor TutorSIN alone is a superkey, and thus BCNF is violated.
4 NF
Normalization : 4NF ( MVD )
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute; that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors (white and black) of each model every year.
BIKE_MODEL | MANUF_YEAR | COLOR
M2011 | 2008 | White
M2011 | 2008 | Black
M3001 | 2013 | White
M3001 | 2013 | Black
M4006 | 2017 | White
M4006 | 2017 | Black
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
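An MVD X →→ Y holds in a relation instance when, for every X-value, the (Y, Z) combinations present are exactly all pairings of its Y-values with its Z-values, where Z is the set of remaining attributes. A sketch of that instance-level check, assuming rows are Python dicts; the bike rows follow the table above with both 2008 rows under model M2011, and the `broken` instance is hypothetical.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a relation instance given as a list of dicts.

    Z is every remaining attribute. Within each X-group, the observed
    (Y, Z) combinations must be exactly every Y-value paired with every Z-value.
    """
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for r in rows:
        key = tuple(r[a] for a in x)
        groups[key].add((tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


bikes = [
    {"BIKE_MODEL": m, "MANUF_YEAR": yr, "COLOR": c}
    for m, yr, c in [
        ("M2011", 2008, "White"), ("M2011", 2008, "Black"),
        ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
        ("M4006", 2017, "White"), ("M4006", 2017, "Black"),
    ]
]
print(mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # True

# Hypothetical instance: two years for M3001, but not every year/color pairing
broken = [
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2014, "COLOR": "Black"},
]
print(mvd_holds(broken, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # False
```

An instance check like this can only refute an MVD; whether BIKE_MODEL →→ COLOR is a rule of the schema is a design decision about all future data.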
Normalization : 4NF
A relation will be in 4NF
• if it is in Boyce Codd normal form and has no multi-valued dependency.
• For a dependency A →→ B, if for a single value of A multiple values of B exist, then the relation has a multi-valued dependency.
Is the table in 4NF?
Example
STUDENT
STU_ID | COURSE | HOBBY
21 | Computer | Dancing
21 | Math | Singing
34 | Chemistry | Dancing
74 | Biology | Cricket
59 | Physics | Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
Normalization : 4NF
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID | COURSE
21 | Computer
21 | Math
34 | Chemistry
74 | Biology
59 | Physics
STUDENT_HOBBY
STU_ID | HOBBY
21 | Dancing
21 | Singing
34 | Dancing
74 | Cricket
59 | Hockey
BCNF to 4NF:
An entity type is in 4NF if it is in BCNF and there are no multivalued dependencies between its attribute types.
An entity type in BCNF is transformed into 4NF by:
(i) Detecting any multivalued dependencies.
(ii) Decomposing the entity type.
Normalization : 4NF
Normalization : 4NF
5 NF
5NF
A relation is in 5NF
• if it is in 4NF and does not contain any join dependency (other than those implied by its candidate keys), and joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Is the table in 5NF?
SUBJECT | LECTURER | SEMESTER
Computer | Anshika | Semester 1
Computer | John | Semester 1
Math | John | Semester 1
Math | Akash | Semester 2
Chemistry | Praveen | Semester 1
NO, it's not!
• In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2.
• In this case, the combination of all these fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not know the subject or who will be taking it, so we leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
5NF
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER | SUBJECT
Semester 1 | Computer
Semester 1 | Math
Semester 1 | Chemistry
Semester 2 | Math
P2
SUBJECT | LECTURER
Computer | Anshika
Computer | John
Math | John
Math | Akash
Chemistry | Praveen
P3
SEMESTER | LECTURER
Semester 1 | Anshika
Semester 1 | John
Semester 2 | Akash
Semester 1 | Praveen
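Why three tables and not two: joining any two of the projections can produce spurious rows, while the three-way join restores the original table exactly. A sketch that verifies this on the lecture table, with relations as lists of Python dicts; `project`, `natural_join` and `as_set` are helper names introduced for this example.

```python
lectures = [
    {"SUBJECT": "Computer", "LECTURER": "Anshika", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Computer", "LECTURER": "John", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Math", "LECTURER": "John", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Math", "LECTURER": "Akash", "SEMESTER": "Semester 2"},
    {"SUBJECT": "Chemistry", "LECTURER": "Praveen", "SEMESTER": "Semester 1"},
]


def project(rows, attrs):
    """Projection with duplicate elimination; rows stay dicts."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


def as_set(rows):
    return {(x["SUBJECT"], x["LECTURER"], x["SEMESTER"]) for x in rows}


p1 = project(lectures, ["SEMESTER", "SUBJECT"])
p2 = project(lectures, ["SUBJECT", "LECTURER"])
p3 = project(lectures, ["SEMESTER", "LECTURER"])
two_way = natural_join(p1, p2)
three_way = natural_join(two_way, p3)
print(len(as_set(two_way)))                   # 7: two spurious rows
print(as_set(three_way) == as_set(lectures))  # True
```

The spurious rows from P1 ⋈ P2 are (Math, Akash, Semester 1) and (Math, John, Semester 2); joining with P3 filters them out.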
Normalization: 5NF
A table is said to be in the 5NF if and only if it is in 4NF and every Join dependency in it is implied by the candidate key.
Lossless (Nonadditive) Join Decomposition
A decomposition {R1, R2,…, Rn} of a relation R is called a lossless decomposition for R if the natural join of R1, R2,…, Rn produces exactly the relation R.
A decomposition is lossless if we can recover R:
R(A, B, C)
Decompose into R1(A, B) and R2(A, C)
Recover R'(A, B, C) = R1 ⋈ R2
Thus, R' = R
Lossy and Lossless decomposition Problems:
1. R(A,B,C) → R1(A,B) and R2(B,C)
Find if this is a lossless or lossy decomposition (JD).
R =
A | B | C
1 | 2 | 1
2 | 5 | 3
3 | 3 | 3
2. R(A,B,C) → R1(A,C) and R2(B,C)
Find if this is a lossless or lossy decomposition (JD).
R =
A | B | C
1 | 2 | 1
2 | 5 | 3
3 | 3 | 3
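On a given instance, a decomposition can be tested by projecting, joining back and comparing with R. A sketch assuming the instance rows (1, 2, 1), (2, 5, 3) and (3, 3, 3) as read from the slide; note that an instance check can prove a decomposition lossy, but showing it lossless for every legal instance requires the FDs.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; result is a set of tuples."""
    return {tuple(r[a] for a in attrs) for r in rows}


def join_back(rows, attrs1, attrs2):
    """Natural join of two projections, as tuples over the sorted attributes."""
    all_attrs = sorted(set(attrs1) | set(attrs2))
    common = [a for a in attrs1 if a in attrs2]
    out = set()
    for t in project(rows, attrs1):
        d1 = dict(zip(attrs1, t))
        for u in project(rows, attrs2):
            d2 = dict(zip(attrs2, u))
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                out.add(tuple(merged[a] for a in all_attrs))
    return out


R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 5, "C": 3}, {"A": 3, "B": 3, "C": 3}]
original = project(R, ["A", "B", "C"])
print(join_back(R, ["A", "B"], ["B", "C"]) == original)          # True
print(sorted(join_back(R, ["A", "C"], ["B", "C"]) - original))   # [(2, 3, 3), (3, 5, 3)]
```

In problem 1 the shared attribute B has distinct values in every row, so nothing spurious appears; in problem 2 the value C = 3 is shared by two rows, and the join pairs them in every combination.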
Example :
Explain whether the following is a lossy or lossless join decomposition: R(A, B, C, D, E) and F = {A->B, B->C, D->C, D->E}, where R is decomposed into R1(A, B, D) and R2(C, D, E).
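Solution (tableau method; D marks a distinguished symbol, i.e., the relation of that row contains the attribute; a blank is a non-distinguished symbol):
Initial tableau:
      A   B   C   D   E
R1    D   D       D
R2            D   D   D
Both rows agree on D, so applying D -> C and D -> E copies the distinguished C and E symbols into the R1 row:
      A   B   C   D   E
R1    D   D   D   D   D
R2            D   D   D
The R1 row is now entirely distinguished.
Therefore LOSSLESS JD.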
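For a decomposition into exactly two relations, a shortcut to the tableau is the standard binary test: R1, R2 is lossless if (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 follows from F, i.e., the closure of the common attributes contains R1 or R2. This is a different technique from the tableau above, shown here as a sketch; attribute names can be single letters or strings in a list.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`; FDs are (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 must determine R1 or R2."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c


F = [("A", "B"), ("B", "C"), ("D", "C"), ("D", "E")]
print(lossless_binary("ABD", "CDE", F))  # True: {D}+ = {C, D, E} contains R2
```

The same call settles the Lending-schema example (the common attribute branch-name determines all of Branch-schema) and problem 2 below (A+ = {A, B, C, D, E} contains R1).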
Example
Given:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Required FDs:
branch-name → branch-city, assets
loan-number → amount, branch-name
Decompose Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Show that the decomposition is a lossless decomposition.
Problems:
1. R(A,B,C,D,E) and F={A->D,B->C, AB->E} Is it in 1NF, 2NF,
3NF or BCNF?
2. R = (A, B, C, D, E). We decompose it into R1 = (A, B, C),
R2 = (A, D, E). The set of functional dependencies is: A → BC,
CD → E, B → D, E → A. Show that this decomposition is a
lossless-join decomposition.
Problems:
3. R(A,B,C) and F={A->B, B->C}. Is it in 3NF or not?
4. R(A,B,C,D) and F={A->BCD, BC->D, D->B}. Is it in BCNF or not?
NFs Overview
• Functional dependencies (FDs): a tool to detect redundancies in schemas.
• Relations can be in different normal forms: the higher the normal form, the fewer the redundancies. But there is a trade-off (see above).
• If a relation is in BCNF, it is free of redundancies that can be detected using FDs. Thus, trying to decompose into BCNF is a good heuristic.
• If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations.
• Decompositions can be lossless and/or dependency-preserving.
• We must consider whether all FDs are preserved.
• If a dependency-preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), we should consider decomposition into 3NF.
BITS Pilani
Pilani Campus
Tutorial Session 6:
Data storage, Indexing and Normalization.
LEARNING OUTCOME
• Secondary storage devices (files, records, blocks on disks)
• B and B+ trees
• Hashing techniques (internal & external)
REFER:
T1-Chapter 13
Sections: 13.1-13.8
Disk Parameters Calculation:
• Usually, the disk manufacturer provides an average seek time s in milliseconds. The typical range of average seek time is 4 to 10 msec.
• If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by
rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec.
• Block transfer time (btt) = B/tr msec, where B is the block size and tr is the transfer rate.
• Transfer rate tr = track size in bytes / time for one disk revolution (msec).
• The average time needed to locate and transfer a block, given its block address, is estimated by (s + rd + btt) msec.
Disk Parameters Calculation:
• To transfer k noncontiguous blocks that are on the same cylinder, one after another, we need approximately s + (k * (rd + btt)) msec.
• If the k blocks are consecutive, the rotational delay is incurred only for the first block, so the estimate for transferring k consecutive blocks is s + rd + (k * btt) msec.
• The bulk transfer rate (btr) takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then btr = (B/(B + G)) * tr bytes/msec.
• The estimated time to read k blocks consecutively stored on the same cylinder becomes s + rd + (k * (B/btr)) msec.
Placing file records on Disk
Disk Parameters Calculation:
Formula
A. Usually, the disk manufacturer provides an average seek time s in milliseconds.
B. The typical range of average seek time is 4 to 10 msec.
C. If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by
rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec.
Time for one disk revolution = (60 * 1000)/p msec, and rd = (time for one revolution)/2.
D. Block transfer time (btt) = B/tr msec, where B is the block size and tr is the transfer rate.
E. Transfer rate tr = track size in bytes / time for one disk revolution (msec).
F. The average time needed to locate and transfer a block, given its block address, is estimated by (s + rd + btt) msec.
G. To transfer k noncontiguous blocks that are on the same cylinder, one after another, we need approximately s + (k * (rd + btt)) msec.
H. If the k blocks are consecutive, the rotational delay is incurred only for the first block, so the estimate for transferring k consecutive blocks is s + rd + (k * btt) msec.
I. The bulk transfer rate (btr) takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then btr = (B/(B + G)) * tr bytes/msec.
J. The estimated time to read k blocks consecutively stored on the same cylinder becomes s + rd + (k * (B/btr)) msec.
K. Blocking factor Bfr = floor(B/R), where B is the block size in bytes and R is the record size in bytes.
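Formulas C to K map one-to-one onto small functions, which makes the worked problems below easy to check. A sketch in Python; the function names are chosen for this example, times are in msec, sizes in bytes and rates in bytes/msec.

```python
def rotational_delay(p):                         # C: rd = 30000 / p msec
    return 30000 / p


def transfer_rate(track_bytes, p):               # E: track size / revolution time
    return track_bytes / (60 * 1000 / p)


def block_transfer_time(B, tr):                  # D: btt = B / tr
    return B / tr


def random_block_time(s, rd, btt):               # F: s + rd + btt
    return s + rd + btt


def noncontiguous_same_cylinder(k, s, rd, btt):  # G: s + k * (rd + btt)
    return s + k * (rd + btt)


def consecutive_blocks(k, s, rd, btt):           # H: s + rd + k * btt
    return s + rd + k * btt


def bulk_transfer_rate(B, G, tr):                # I: btr = (B / (B + G)) * tr
    return B / (B + G) * tr


def consecutive_blocks_bulk(k, s, rd, B, btr):   # J: s + rd + k * (B / btr)
    return s + rd + k * (B / btr)


def blocking_factor(B, R):                       # K: floor(B / R)
    return B // R


print(rotational_delay(2400))           # 12.5
print(transfer_rate(12800, 2400))       # 512.0
print(blocking_factor(1024, 921))       # 1
```

Integer floor division (`//`) gives the floor in formula K for positive sizes; the other formulas use ordinary division.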
Problem
Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block
size B=512 bytes, interblock gap size G=128 bytes, number of blocks per track=20, number of tracks per
surface=400. A disk pack consists of 15 double-sided disks.
(a) What is the total capacity of a track and what is its useful capacity (excluding interblock gaps)?
(b) How many cylinders are there?
(c) What is the total capacity and the useful capacity of a cylinder?
(d) What is the total capacity and the useful capacity of a disk pack?
(e) Suppose the disk drive rotates the disk pack at a speed of 2400 rpm (revolutions per minute); what is the
transfer rate in bytes/msec and the block transfer time btt in msec? What is the average rotational delay rd in
msec? What is the bulk transfer rate?
(f) Suppose the average seek time is 30 msec. How much time does it take (on the average) in msec to locate
and transfer a single block given its block address?
(g) Calculate the average time it would take to transfer 20 random blocks and compare it with the time it would
take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.
Solution
(a) Using block size B = 512 bytes, interblock gap size G = 128 bytes and number of blocks per track = 20:
Storage taken by 1 block = block size + interblock gap = 512 + 128 = 640 bytes
A track has 20 blocks, so its total capacity (total track size) = 20 * (512 + 128) = 12800 bytes = 12.8 Kbytes
Useful capacity of a track (i.e., excluding the gap size) = 20 * 512 = 10240 bytes = 10.24 Kbytes
(b) Number of cylinders = number of tracks per surface = 400
(c) Since a disk pack consists of 15 double-sided disks:
Total cylinder capacity = 15 * 2 * 20 * (512 + 128) = 384000 bytes = 384 Kbytes
Useful cylinder capacity (i.e., excluding the gap size) = 15 * 2 * 20 * 512 = 307200 bytes = 307.2 Kbytes
(d) Since the number of tracks per surface = 400:
Total capacity of a disk pack = 15 * 2 * 400 * 20 * (512 + 128) = 153600000 bytes = 153.6 Mbytes
Useful capacity of a disk pack (i.e., excluding the gap size) = 15 * 2 * 400 * 20 * 512 = 122880000 bytes = 122.88 Mbytes
NOTE: Write the units after computed values.
Solution
(e) Using formula E:
Transfer rate tr = (total track size in bytes) / (time for one disk revolution in msec)
tr = (12800) / ((60 * 1000) / (2400)) = (12800) / (25) = 512 bytes/msec
Using formula D:
Block transfer time btt = B / tr = 512 / 512 = 1 msec
Using formula C:
Average rotational delay rd = (time for one disk revolution in msec) / 2 = 25 / 2 = 12.5 msec
Using formula I:
Bulk transfer rate btr = tr * (B / (B + G)) = 512 * (512 / 640) = 409.6 bytes/msec
(f) Using formula F:
Average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec
(g) Using the values calculated in the previous steps:
Time to transfer 20 random blocks = 20 * (s + rd + btt) = 20 * 43.5 = 870 msec
Time to transfer 20 consecutive blocks using double buffering (formula H) = s + rd + 20 * btt = 30 + 12.5 + (20 * 1) = 62.5 msec
(A more accurate estimate of the latter can be calculated using the bulk transfer rate, formula J:
time to transfer 20 consecutive blocks using double buffering = s + rd + ((20 * B) / btr) = 30 + 12.5 + (10240 / 409.6) = 42.5 + 25 = 67.5 msec)
NOTE: Write the units after computed values.
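The arithmetic for parts (a) to (g) can be checked with a short Python script. It restates the given parameters as variables (the names are chosen for this example) and applies the formulas listed earlier; sizes are in bytes and times in msec.

```python
B, G = 512, 128            # block size and interblock gap (bytes)
blocks_per_track = 20
tracks_per_surface = 400
surfaces = 15 * 2          # 15 double-sided disks
rpm, s = 2400, 30          # rotation speed and average seek time (msec)

track_total = blocks_per_track * (B + G)    # (a) 12800 bytes
track_useful = blocks_per_track * B         # (a) 10240 bytes
cylinders = tracks_per_surface              # (b) 400
cyl_total = surfaces * track_total          # (c) 384000 bytes
cyl_useful = surfaces * track_useful        # (c) 307200 bytes
pack_total = cylinders * cyl_total          # (d) 153600000 bytes
pack_useful = cylinders * cyl_useful        # (d) 122880000 bytes

rev_ms = 60 * 1000 / rpm                    # 25 msec per revolution
tr = track_total / rev_ms                   # (e) 512 bytes/msec
btt = B / tr                                # (e) 1 msec
rd = rev_ms / 2                             # (e) 12.5 msec
btr = tr * B / (B + G)                      # (e) 409.6 bytes/msec
one_block = s + rd + btt                    # (f) 43.5 msec
random_20 = 20 * one_block                  # (g) 870 msec
consecutive_20 = s + rd + 20 * btt          # (g) 62.5 msec
consecutive_20_btr = s + rd + 20 * B / btr  # (g) about 67.5 msec

print(track_total, cyl_total, pack_total, one_block, random_20)
```

The gap between 62.5 and 67.5 msec is the interblock gaps: with btr, each block effectively costs 640 bytes of rotation instead of 512.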
Problem
Let us say the XAT supplier company has stored its information in files.
A Supplier file has rec = 1000 records of fixed length.
Each record has the following fields/columns (in bytes):
sup# (10), part# (10), pname (200), pdescp (700) and a deletion marker byte.
The file is stored on a disk whose parameters are: block size B = 1024 bytes; interblock gap size G = 200 bytes; number of blocks per track = 25; number of tracks per surface = 500. A disk pack consists of 18 double-sided disks; seek time s = 20 msec, rotational delay rd = 12.5 msec and rotation speed = 2000 rpm.
a. Calculate the record size in bytes.
b. Calculate the blocking factor and the number of file blocks b, assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if
(i) the file blocks are stored contiguously, and double buffering is used;
(ii) the file blocks are not stored contiguously.
d. Assume that the file is ordered by part#; by doing a binary search, calculate the time it takes to search for a
record given its part# value.
Solution
a. Calculate the record size in bytes.
Record size = 10 + 10+ 200+ 700 +1 = 921 bytes.
b. Calculate the blocking factor and the number of file
blocks b, assuming an unspanned organization.
Understand unspanned ie., if the last record
cannot fit in that block that whole record is stored
in the next consecutive or stored in a different
block.
Using formula from above
Blocking factor = Bfr = floor(B/R)
where B – block size in bytes and R is record size in
bytes.
Given in the problem block size B = 1024 bytes;
interblock gap size G = 200bytes; number of blocks per
track = 25; number of tracks per surface = 500.
B = 1024 bytes
Bfr = floor(B/R) = floor(1024/921)= floor(1.113) = 1.
(ie., 1 record in one block)
Therefore for 1000 records we need 1000 blocks.
b = ceil(rec/ bfr) = ceil(1000/1) = 1000 blocks.
BITS Pilani, Pilani Campus
Solution
c. Calculate the average time it takes to find a record (i)The file blocks are stored contiguously, and double buffering is
by doing a linear search on the file
used;
Using the formula:
T = s + rd + Hbs x (B/btr)
where Hbs is the number of blocks searched and btr is the bulk transfer rate.
A linear search reads, on average, half of the file blocks:
Hbs = b/2 = 1000/2 = 500 blocks.
Track size = 25 x (1024 + 200) = 30600 bytes
Time for 1 revolution of the disk = 60 x 1000 / rpm = 60 x 1000 / 2000 = 30 msec
Transfer rate tr = track size / time per revolution = 30600 bytes / 30 msec = 1020 bytes/msec
Bulk transfer rate btr = (B/(B + G)) x tr = (1024/(1024 + 200)) x 1020 ≈ 853 bytes/msec
Given seek time s = 20 msec and rd = 12.5 msec:
T = s + rd + Hbs x (B/btr)
= 20 + 12.5 + (500 x (1024/853)) ≈ 632.73 msec ≈ 0.633 sec
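A minimal sketch of the part (c)(i) calculation, using the disk parameters from the problem statement and b = 1000 blocks from part (b). Keeping btr unrounded (853.33 bytes/msec, so B/btr = 1.2 msec) gives 632.5 msec; the slide's value of about 632.7 msec comes from rounding btr to 853 before dividing.

```python
B, G = 1024, 200          # block size and interblock gap, bytes
blocks_per_track = 25
rpm = 2000
s, rd = 20.0, 12.5        # seek time and rotational delay, msec
b = 1000                  # number of file blocks (part b)

track_size = blocks_per_track * (B + G)      # 30600 bytes
rev_time = 60 * 1000 / rpm                   # msec per revolution: 30
tr = track_size / rev_time                   # transfer rate: 1020 bytes/msec
btr = (B / (B + G)) * tr                     # bulk transfer rate, about 853.33 bytes/msec

hbs = b / 2                                  # linear search reads half the blocks on average
# contiguous blocks + double buffering: one seek and one rotational delay,
# then the blocks stream in at the bulk transfer rate
t_linear_contig = s + rd + hbs * (B / btr)

print(f"btr = {btr:.2f} bytes/msec, time = {t_linear_contig:.2f} msec")
```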
Solution
(ii) the file blocks are not stored contiguously.
Each block read now needs its own seek and rotational delay, so use:
T = Hbs x (s + rd + btt)
where btt = B/tr is the block transfer time, tr = number of bytes on a track / time per revolution = 30600 bytes / 30 msec = 1020 bytes/msec, Hbs = 500 and B = 1024 bytes.
btt = 1024 / (30600/30) = 1024/1020 ≈ 1.004 msec
Given seek time s = 20 msec and rd = 12.5 msec:
T = Hbs x (s + rd + btt) = 500 x (20 + 12.5 + 1.004) = 500 x 33.504
= 16752 msec ≈ 16.75 sec.
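The same parameters reproduce part (c)(ii). This is a sketch under the problem's assumptions; note that 500 x 33.504 evaluates to 16752 msec.

```python
B, G = 1024, 200          # block size and interblock gap, bytes
blocks_per_track = 25
rpm = 2000
s, rd = 20.0, 12.5        # seek time and rotational delay, msec
b = 1000                  # number of file blocks (part b)

tr = blocks_per_track * (B + G) / (60 * 1000 / rpm)   # 1020 bytes/msec
btt = B / tr                                          # block transfer time, about 1.004 msec
hbs = b / 2                                           # half the blocks on average
# scattered blocks: every block read pays its own seek and rotational delay
t_linear_scattered = hbs * (s + rd + btt)

print(f"btt = {btt:.3f} msec, time = {t_linear_scattered:.1f} msec")
```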
d. Assume that the file is ordered by part#; by doing a
binary search, calculate the time it takes to search for a
record given its part# value.
Binary search reads about ceil(log2 b) blocks, each with its own seek, rotational delay and block transfer:
T = ceil(log2 b) x (s + rd + btt)
Since b = 1000 blocks (calculated above), ceil(log2 1000) = ceil(9.966) = 10, so
T = 10 x (20 + 12.5 + 1.004) = 10 x 33.504 = 335.04 msec
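And part (d): binary search touches ceil(log2 b) blocks, and since log2 1000 ≈ 9.966 the ceiling is 10 block reads. The snippet reuses btt = 1024/1020 msec from part (c)(ii).

```python
import math

s, rd = 20.0, 12.5        # seek time and rotational delay, msec
btt = 1024 / 1020         # block transfer time from part (c)(ii), msec
b = 1000                  # number of file blocks (part b)

probes = math.ceil(math.log2(b))       # block reads needed by binary search
t_binary = probes * (s + rd + btt)     # each probe is a random block access

print(f"probes = {probes}, time = {t_binary:.2f} msec")
```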
Question?
Without rearranging the actual records, can you put them in a
particular order based on a key field or fields?
Sol:
By indexing or hashing.
Indexing works like the index at the end of a book: you search the index (linearly or by binary search)
to find a term and get the page number where that term appears in the
book. The catch is that two structures are needed: the file (actual data) and an index on it.
Hashing does not search at all: a hash function computes the record's location in
the file directly from the key, giving O(1) average access.
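A small Python sketch of the indexing idea. The parts data, block numbers and the helper names `find` and `scan_in_key_order` are hypothetical; the point is that the data file keeps its original order while a separate sorted index, searched with `bisect`, gives key-ordered access.

```python
import bisect

# Hypothetical data file: records stay in arrival order, one record per block.
data_file = [(42, "bolt"), (7, "nut"), (19, "washer"), (3, "rivet")]   # (part#, name)

# Separate index: (key, block number) pairs kept sorted by key.
index = sorted((part_no, blk) for blk, (part_no, _) in enumerate(data_file))
index_keys = [k for k, _ in index]

def find(part_no):
    """Binary-search the index, then read only the block it points to."""
    i = bisect.bisect_left(index_keys, part_no)
    if i < len(index_keys) and index_keys[i] == part_no:
        return data_file[index[i][1]]
    return None

def scan_in_key_order():
    """Visit records in part# order without moving them in the data file."""
    return [data_file[blk] for _, blk in index]

print(find(19), scan_in_key_order())
```

The data file is never sorted; only the much smaller index is, which is why an index can give ordered access without rearranging the records.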
B+ TREE Tutorials
CRUD on B+ tree
Create a B+ tree by inserting the keys
2, 5, 7, 10, 13, 16, 20, 22, 23, 24,
then delete 23 and 10.
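A minimal sketch of B+ tree insertion for this exercise. The tree order is not stated in the text, so a maximum of 3 keys per node is an assumption, as are the class and method names. Leaf splits copy the first key of the new right leaf up to the parent; internal splits push the middle key up. Deletion (of 23 and 10) is not covered by this sketch.

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child pointers (internal nodes only)
        self.next = None     # right-sibling pointer (leaf nodes only)

class BPlusTree:
    def __init__(self, max_keys=3):
        self.max_keys = max_keys          # assumed capacity: at most 3 keys per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                         # root overflowed: tree grows by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)
        else:
            i = bisect.bisect_right(node.keys, key)   # child whose range covers key
            split = self._insert(node.children[i], key)
            if split:
                sep, right = split
                node.keys.insert(i, sep)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right   # copy up: key also stays in the leaf
        sep = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep, right                 # push up: key moves to the parent only

    def leaves(self):
        """Follow the leaf chain from the leftmost leaf."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node:
            out.append(node.keys)
            node = node.next
        return out

tree = BPlusTree(max_keys=3)
for k in [2, 5, 7, 10, 13, 16, 20, 22, 23, 24]:
    tree.insert(k)
print(tree.root.keys, tree.leaves())
```

With this split rule the ten keys end in five leaves [2, 5], [7, 10], [13, 16], [20, 22], [23, 24] under internal nodes [7, 13] and [23], with root [20]. A different tree order or split convention gives a different but equally valid tree.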
HASHING TECHNIQUE
Hashing Technique
Hashing is a technique in which data is stored in data
blocks whose addresses are generated by applying a hash
function to a key field. The memory location where these records are
stored is called a data block or data bucket. Each data
bucket can store one or more records.
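The bucket idea can be sketched in Python as a static hash file. Everything here is hypothetical: the class name, the hash function h(K) = K mod M (division method), M = 4 buckets, a capacity of 2 records per bucket, and a per-bucket overflow list for records that do not fit.

```python
class StaticHashFile:
    """Hypothetical static hash file: M buckets, each holding up to
    `capacity` records; extra records go to that bucket's overflow list."""

    def __init__(self, num_buckets=4, capacity=2):
        self.M = num_buckets
        self.capacity = capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % self.M   # assumed hash function: division method

    def insert(self, key, record):
        i = self.h(key)       # bucket address computed from the key
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            self.overflow[i].append((key, record))

    def search(self, key):
        i = self.h(key)       # no index search: go straight to the bucket
        for k, rec in self.buckets[i] + self.overflow[i]:
            if k == key:
                return rec
        return None

hf = StaticHashFile(num_buckets=4, capacity=2)
for part_no in [12, 7, 20, 3, 28]:
    hf.insert(part_no, f"part-{part_no}")
print(hf.search(28), hf.search(5))
```

Here 12, 20 and 28 all hash to bucket 0, so 28 lands in that bucket's overflow list; finding it still needs only the computed bucket address plus a scan of that bucket and its overflow.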
Structure of extendible hashing
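The figure for this slide is not reproduced here, but the usual structure of extendible hashing (a directory of 2^d pointers indexed by d bits of the hash value, a global depth d, and a local depth per bucket) can be sketched as follows. Assumptions, all hypothetical: non-negative integer keys hashed by h(K) = K, the low-order d bits select the directory entry (some textbooks use the high-order bits instead), distinct keys, and a bucket capacity of 2.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth   # number of hash bits this bucket distinguishes
        self.capacity = capacity         # max records per bucket (assumed)
        self.keys = []

class ExtendibleHashFile:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.capacity = capacity
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _dir_index(self, key):
        # assumed h(K) = K; take the low-order global_depth bits
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._dir_index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._dir_index(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # double the directory: entries i and i + 2**d share a bucket
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # split the full bucket on the next hash bit
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, self.capacity)
        bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                   # redistribute the old bucket's keys
            self.directory[self._dir_index(k)].keys.append(k)
        self.insert(key)                     # retry; may split again

ehf = ExtendibleHashFile(capacity=2)
for k in [1, 3, 5, 2]:
    ehf.insert(k)
print(ehf.global_depth, [b.keys for b in ehf.directory])
```

With keys 1, 3, 5, 2 the bucket for low bit 1 overflows, so the directory doubles to global depth 2 and that bucket splits; directory entries 00 and 10 still share one bucket of local depth 1, so only the overflowing bucket had to be split.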
Problems: