DBMS-1

1) What do you understand by a data model? Explain the difference between a conceptual data model and the internal model.

Ans.
Understanding Data Model: A data model is a conceptual representation and abstraction of the structure of a database. It defines how data is stored, organized, and manipulated within a database system. Data models help in understanding the relationships between different data elements and serve as a blueprint for designing and implementing databases. They provide a way to visually represent data and its relationships, allowing for effective communication between database designers, developers, and stakeholders.

Difference between Conceptual Data Model and Internal Model:

1. Conceptual Data Model:
• Purpose: The conceptual data model focuses on representing high-level concepts and relationships between entities in the real world. It aims to capture the essential business rules and requirements without being concerned with how data is physically stored or implemented.
• Abstraction Level: It is an abstraction at the highest level, emphasizing clarity and simplicity in representing entities, attributes, and relationships.
• Audience: Conceptual data models are primarily designed for business stakeholders, users, and analysts who want to understand the structure and meaning of the data in the organization.
• Example: An entity-relationship diagram (ERD) is a common tool used for creating conceptual data models. It represents entities as well as relationships between them.

2. Internal Model:
• Purpose: The internal model, also known as the physical data model, deals with how data is actually stored and accessed in a database system. It is concerned with implementation details, such as data storage structures, indexing mechanisms, and optimization techniques.
• Abstraction Level: It is a lower-level abstraction that specifies how data is stored on disk or in memory, taking into consideration the performance and efficiency of data retrieval.
• Audience: Internal models are designed for database administrators, system architects, and developers who need to optimize and manage the physical storage and retrieval of data.
• Example: SQL Data Definition Language (DDL) statements, which define tables, indexes, and constraints in a relational database, are part of the internal model.
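A minimal sketch of internal-model DDL in SQL (the table and index names here are illustrative, not from any particular schema):

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,   -- logical structure of the relation
    Name      VARCHAR(50)
);
-- A physical-level choice: a secondary index to speed up lookups by Name.
CREATE INDEX idx_students_name ON Students (Name);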
2) What are the main steps of database design? Explain them in brief.

Ans. The database design process involves several key steps to ensure the creation of an efficient, organized, and well-structured database system. Here are the main steps of database design explained briefly:

1. Requirement Analysis:
• Description: In this initial phase, the database designer collaborates with stakeholders to gather and understand the requirements of the database system. This includes identifying the information needs, user expectations, and business rules that the database must adhere to.

2. Conceptual Design:
• Description: The conceptual design phase involves creating a high-level conceptual model that represents the essential entities, relationships, and attributes in the system. Techniques like Entity-Relationship Diagrams (ERD) are commonly used. This step focuses on the overall structure of the database without concern for implementation details.

3. Normalization:
• Description: Normalization is the process of organizing data to eliminate redundancy and dependency issues. It involves decomposing tables into smaller, related tables to minimize data redundancy and improve data integrity. Normal forms, from First Normal Form (1NF) up to Fifth Normal Form (5NF), are applied to achieve a well-structured database.
4. Data Model Refinement:
• Description: Building on the conceptual design, the data model is refined to address normalization concerns and improve overall design efficiency. This step may involve adjusting entities, attributes, and relationships based on normalization results and additional analysis.

5. Physical Design:
• Description: In the physical design phase, the conceptual model is translated into a physical model that defines how data will be stored and accessed in the database system. Decisions regarding data types, indexing, partitioning, and storage structures are made to optimize performance.

6. Implementation:
• Description: The implementation phase involves translating the physical design into a database management system (DBMS) specific language or script. This includes creating tables, defining constraints, specifying indexes, and setting up other necessary elements within the chosen DBMS.

7. Testing and Evaluation:
• Description: The designed database is rigorously tested to ensure it meets the specified requirements and functions as intended. Testing involves validating data integrity, accuracy, and performance. Feedback from users and stakeholders is considered to refine the design further.

8. Deployment and Maintenance:
• Description: Once testing is successful, the database is deployed for actual use. Continuous monitoring and maintenance activities, including backups, security updates, and performance tuning, are carried out to ensure the ongoing reliability and efficiency of the database system.

3) Explain the entity integrity and referential integrity constraints. How are they useful in database design?

Ans. Entity Integrity Constraint:
Entity integrity is a fundamental concept in database design, and it is enforced through the use of entity integrity constraints. The primary goal of entity integrity is to ensure that each row (record) in a database table is uniquely identifiable. In other words, it ensures that no primary key value is NULL, and each record can be uniquely identified by its primary key.

Key points about entity integrity constraints:

1. Primary Key Constraint:
• A primary key is a column or a set of columns in a table that uniquely identifies each record.
• The primary key must have a unique value for each record, and it cannot contain NULL values.

2. Uniqueness and Identification:
• The primary key enforces the uniqueness of records, making each record identifiable by its primary key value.
• Ensures that there are no duplicate records based on the primary key.

3. Enforcement:
• Implemented using the PRIMARY KEY constraint in SQL.
• The primary key is typically defined when creating a table and is used to uniquely identify records.

Referential Integrity Constraint:
Referential integrity is another crucial concept in database design, and it is enforced through referential integrity constraints. This type of constraint ensures the consistency and accuracy of relationships between tables in a relational database.

Key points about referential integrity constraints:

1. Foreign Key Constraint:
• A foreign key is a column or a set of columns in a table that refers to the primary key in another table.
• It establishes a link between two tables, creating a relationship.

2. Relationship Consistency:
• Ensures that relationships between tables are consistent and valid.
• The values in the foreign key column must match the values in the referenced primary key column.

3. Enforcement:
• Implemented using the FOREIGN KEY constraint in SQL.
• Specifies that the values in the foreign key column must match the values in the primary key column of the referenced table.
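A minimal sketch of how both constraints are declared in SQL (the Departments/Employees tables are hypothetical, introduced here only for illustration):

CREATE TABLE Departments (
    DeptID   INT PRIMARY KEY,    -- entity integrity: unique and non-NULL
    DeptName VARCHAR(50)
);

CREATE TABLE Employees (
    EmpID  INT PRIMARY KEY,
    DeptID INT,
    FOREIGN KEY (DeptID) REFERENCES Departments (DeptID)  -- referential integrity
);

-- The DBMS now rejects an employee row whose DeptID has no matching parent:
-- INSERT INTO Employees VALUES (1, 99);   -- fails if DeptID 99 does not exist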
Usefulness in Database Design:

1. Data Integrity:
• Entity integrity constraints ensure that each record is uniquely identifiable, maintaining the integrity of the data at the individual record level.
• Referential integrity constraints ensure that relationships between tables are consistent, preventing orphaned or dangling records.

2. Consistency:
• Referential integrity constraints ensure that relationships between tables are consistent, reflecting the real-world relationships accurately.

3. Avoiding Orphans and Dangling References:
• Referential integrity constraints prevent the creation of orphaned records (records in a child table without a corresponding parent record) or dangling references (references to non-existent records).

4. Simplified Querying:
• Well-defined relationships through foreign keys simplify querying and retrieval of related data from multiple tables.

5. Ease of Maintenance:
• Constraints contribute to the maintainability of the database by providing a structured way to define and enforce rules on the data.

4) Explain, with the help of examples, the concept of insertion anomalies and deletion anomalies.

Ans. Insertion Anomalies:
Insertion anomalies occur when it is challenging to add data to the database without violating the integrity constraints. There are three main types of insertion anomalies:

1. Incomplete Information:
• Suppose we have a table to store information about courses and their instructors. The table has columns for CourseID, CourseName, and InstructorName.

CourseID | CourseName      | InstructorName
1        | Database Design | Smith
2        | Algorithms      | Johnson

If we want to insert a new course but don't yet know the instructor, we can't insert the record without violating the integrity constraint:

CourseID | CourseName      | InstructorName
1        | Database Design | Smith
2        | Algorithms      | Johnson
3        | Data Mining     | (unknown)

This incomplete information leads to an insertion anomaly.

2. Redundant Data:
• If the same information is repeated for multiple records, it can lead to redundancy and inconsistencies.

CourseID | CourseName      | InstructorName
1        | Database Design | Smith
2        | Algorithms      | Johnson
3        | Data Mining     | Smith

In this example, if an instructor changes, we need to update multiple records, leading to potential inconsistencies.
3. Inability to Add Certain Information:
• The structure of the table might restrict the ability to add certain types of information.

CourseID | CourseName      | InstructorName
1        | Database Design | Smith
2        | Algorithms      | Johnson

If we want to add a new instructor without assigning them to a course, it may not be possible in the current structure:

CourseID | CourseName      | InstructorName
1        | Database Design | Smith
2        | Algorithms      | Johnson
3        | (unknown)       | Brown

This inability to add an instructor without a course creates an insertion anomaly.

Deletion Anomalies:
Deletion anomalies occur when removing data from the database results in unintended loss of information. Common cases include:

1. Loss of Entire Data:
• Suppose we have a table that stores information about instructors and the courses they teach.

InstructorID | InstructorName | CourseID | CourseName
1            | Smith          | 1        | Database Design
2            | Johnson        | 2        | Algorithms

If an instructor teaches only one course and decides not to teach anymore, deleting the record results in the loss of both the instructor and the course information.

2. Loss of Specific Information:
• Deleting a record can result in the loss of specific information, such as a course taught by an instructor.

InstructorID | InstructorName | CourseID | CourseName
1            | Smith          | 1        | Database Design
1            | Smith          | 2        | Algorithms

If we delete the record for the course "Algorithms," we lose information about that specific course taught by the instructor.
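A hedged SQL sketch of how such a schema produces both anomalies (assuming InstructorName is declared NOT NULL, which is one way the insertion constraint above could be enforced):

CREATE TABLE Courses (
    CourseID       INT PRIMARY KEY,
    CourseName     VARCHAR(50),
    InstructorName VARCHAR(50) NOT NULL
);

-- Insertion anomaly: a course with an unknown instructor cannot be stored.
-- INSERT INTO Courses VALUES (3, 'Data Mining', NULL);   -- rejected

-- Deletion anomaly: removing the only row for 'Algorithms' also erases
-- the fact that Johnson is an instructor at all.
DELETE FROM Courses WHERE CourseName = 'Algorithms';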
5) Given R with FD set F = {A → B, BC → D, D → BC, DE → ∅}, find the number of redundant FDs in F.

Ans. To determine the number of redundant functional dependencies (FDs) in the given set F = {A → B, BC → D, D → BC, DE → ∅}, we can use closure computation based on Armstrong's axioms. The number of redundant FDs is the difference between the total number of FDs and the number of essential FDs. Essential FDs are those that cannot be derived from the others.

Armstrong's Axioms:
1. Reflexivity: If Y is a subset of X, then X → Y.
2. Augmentation: If X → Y, then XZ → YZ for any Z.
3. Transitivity: If X → Y and Y → Z, then X → Z.

Essential Steps:
1. Start with the given FDs:
F = {A → B, BC → D, D → BC, DE → ∅}
2. Find the closure of each attribute set:
o A+ = {A, B}
o BC+ = {B, C, D}
o D+ = {D, B, C}
o DE+ = {D, E, B, C}
3. Identify the essential FDs:
o A → B is essential.
o BC → D is essential.
o D → BC is essential.
o DE → ∅ is essential.

Conclusion:
There are no redundant FDs in the given set F. All the FDs are essential and cannot be derived from the others. The closure computation confirms that each FD provides additional information.
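To make the closure computation concrete, here is one derivation written out step by step (the other closures follow the same pattern):

A+ : start with {A}; A → B applies because A ⊆ {A}, adding B; no remaining FD has its left side contained in {A, B}, so A+ = {A, B}.
DE+ : start with {D, E}; D → BC applies because D ⊆ {D, E}, adding B and C; nothing further applies, so DE+ = {B, C, D, E}.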
6) What is the goal of query optimization? Why is optimization important?

Ans. The goal of query optimization in the context of database management systems is to improve the efficiency and performance of queries by finding the most efficient execution plan for a given SQL query. The optimization process aims to minimize the time and resources required to retrieve the desired results while considering various factors such as indexes, join strategies, and access methods.

Importance of Query Optimization:

1. Improved Performance:
• Query optimization helps in generating execution plans that minimize the overall query execution time. This is crucial for systems dealing with large datasets and complex queries.

2. Resource Utilization:
• Optimized queries make efficient use of system resources, including CPU, memory, and storage. This is particularly important in large-scale database systems where resource consumption directly impacts the overall system performance.

3. Reduced Response Time:
• By choosing the most efficient execution plan, query optimization reduces the response time for users and applications, leading to a more responsive and user-friendly system.

4. Cost Reduction:
• Optimized queries reduce the workload on the database server, leading to lower operational costs. This is important in terms of hardware requirements, energy consumption, and maintenance costs.

5. Concurrency and Scalability:
• Efficient queries contribute to better system concurrency and scalability. With optimized queries, multiple users can simultaneously access and manipulate data without significant performance degradation.

6. Adaptability to Changing Workloads:
• Query optimization allows database systems to adapt to varying workloads. By dynamically adjusting execution plans based on current system conditions, the database can handle changing query patterns and load.

7. Index Utilization:
• Optimization involves selecting appropriate indexes to speed up data retrieval. Efficient use of indexes reduces the number of disk I/O operations, leading to faster query execution.

8. Join Strategies:
• Optimization involves choosing the most efficient join strategies, such as nested loop joins, hash joins, or merge joins. The selection depends on the size of tables, available indexes, and system resources.

9. Query Rewrite:
• Some optimization techniques involve rewriting queries to an equivalent but more efficient form. This can include transforming subqueries, simplifying expressions, or using specific syntax to guide the optimizer.
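In practice, you can inspect the plan the optimizer chose with EXPLAIN, supported in systems such as MySQL and PostgreSQL (output format differs between DBMSs; the table names here are illustrative):

EXPLAIN
SELECT s.Name, c.CourseName
FROM Students s
JOIN Enrollments e ON e.StudentID = s.StudentID
JOIN Courses c ON c.CourseID = e.CourseID
WHERE s.Age > 21;
-- The plan reveals the chosen join order, join strategy, and index usage.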
7) What is normalization? Explain the first and second normal forms using appropriate examples.

Ans.
Normalization is a database design process that involves organizing tables and their relationships to reduce data redundancy and improve data integrity. The normalization process consists of several normal forms, each building on the previous one, with the goal of systematically organizing data to avoid certain types of anomalies and redundancies.

First Normal Form (1NF):
A relation is in First Normal Form (1NF) if it meets the following criteria:
• Atomic Values: Each cell in the table must contain only atomic (indivisible) values, and these values must be of the same data type.
• Unique Column Names: Each column in the table must have a unique name.
• Ordering of Rows and Columns: The order in which data is stored does not matter, and there is no significance to the order of columns.

Example:
Consider the following unnormalized table that violates 1NF:

StudentID | Courses
101       | Math, Physics
102       | Chemistry, Biology

This table violates 1NF because the "Courses" column contains multiple values (non-atomic values) separated by commas. To bring it into 1NF, we split the multi-valued rows:

StudentID | Course
101       | Math
101       | Physics
102       | Chemistry
102       | Biology

Now each cell contains atomic values, and the "Courses" column has been replaced with a new "Course" column, adhering to 1NF.

Second Normal Form (2NF):
A relation is in Second Normal Form (2NF) if it is already in 1NF and all non-prime attributes (attributes not part of any candidate key) are fully functionally dependent on the entire primary key.

Example:
Consider the following table:

EmployeeID | ProjectID | ProjectName | EmployeeName
1          | 101       | ProjectA    | Alice
2          | 102       | ProjectB    | Bob
3          | 101       | ProjectA    | Carol

In this table, the composite key is {EmployeeID, ProjectID}; ProjectName depends only on ProjectID, while EmployeeName depends only on EmployeeID, so both are only partially dependent on the key. To bring it into 2NF, we split the table so that every non-prime attribute depends on a whole key:

ProjectID | ProjectName
101       | ProjectA
102       | ProjectB

EmployeeID | EmployeeName
1          | Alice
2          | Bob
3          | Carol

EmployeeID | ProjectID
1          | 101
2          | 102
3          | 101
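A minimal DDL sketch of this 2NF decomposition (the column types are assumptions):

CREATE TABLE Projects (
    ProjectID   INT PRIMARY KEY,
    ProjectName VARCHAR(50)          -- depends only on ProjectID
);

CREATE TABLE Employees (
    EmployeeID   INT PRIMARY KEY,
    EmployeeName VARCHAR(50)         -- depends only on EmployeeID
);

CREATE TABLE EmployeeProjects (
    EmployeeID INT REFERENCES Employees (EmployeeID),
    ProjectID  INT REFERENCES Projects (ProjectID),
    PRIMARY KEY (EmployeeID, ProjectID)
);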
8) During its execution, a transaction passes through several states, until it finally commits or aborts. List all possible sequences of states through which a transaction may pass. Explain why each state transition may occur. OR Define the different states of a transaction with a proper diagram.

Ans. In a database management system, a transaction goes through various states during its execution. The typical states that a transaction can pass through include:

1. Active (A):
• The initial state where the transaction is actively executing its operations.

2. Partially Committed (PC):
• The transaction has completed its execution successfully, and it is about to be committed. However, the system has not yet guaranteed the permanency of changes.

3. Committed (C):
• The transaction has completed successfully, and its changes are now permanent and visible to other transactions.

4. Failed (F):
• The transaction has encountered an error during its execution, making it impossible to complete successfully. It will be rolled back to undo any changes made.

5. Aborted (Abo):
• The transaction has been rolled back, and any changes made during its execution have been undone. The database is brought back to its state before the transaction started.

6. Terminated (T):
• The transaction has either committed or aborted, and it is no longer actively participating in the execution.

Now, let's explore the possible sequences of states and the reasons for each state transition:

1. Active → Committed (A → C):
• The transaction has executed all its operations successfully, and it is ready to make its changes permanent. This transition occurs when the transaction issues a commit statement, and the system ensures that it can be committed.

2. Active → Failed (A → F):
• The transaction has encountered an error or violation of a constraint during its execution, making it impossible to proceed. The system detects the failure and transitions the transaction to the failed state.

3. Failed → Aborted (F → Abo):
• The system decides to abort the transaction due to a failure. The transaction is rolled back to undo any changes made, bringing the database back to its state before the transaction started.

4. Partially Committed → Committed (PC → C):
• The system has successfully committed the transaction's changes after ensuring that it is safe to do so. The transition occurs after the system has guaranteed the permanency of changes.

5. Partially Committed → Aborted (PC → Abo):
• An error or failure occurs after the transaction has been partially committed. The system decides to abort the transaction, undoing any changes made during the partial commitment.

6. Active → Aborted (A → Abo):
• The transaction or the system decides to abort the transaction for some reason, possibly due to a user-initiated rollback or a detected issue during execution.

7. Active → Terminated (A → T):
• The transaction has completed its execution, and it is either committed or aborted. It enters the terminated state as it is no longer actively participating in the execution.

8. Failed → Terminated (F → T):
• After the transaction has been rolled back, it enters the terminated state.
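A hedged sketch of these states driven from SQL (statement names vary slightly across DBMSs; the table and account numbers are illustrative):

BEGIN;                                          -- transaction becomes Active
UPDATE Accounts SET Balance = Balance - 100 WHERE AccNo = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccNo = 2;
COMMIT;                                         -- Partially Committed, then Committed

-- On an error instead, the application issues:
-- ROLLBACK;                                    -- Failed -> Aborted -> Terminated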
9) Explain the difference between relational algebra and relational calculus.

Ans.

Basis of Comparison   | Relational Algebra                                      | Relational Calculus
Language Type         | It is a procedural language.                            | It is a declarative (non-procedural) language.
Procedure             | Specifies how to obtain the result.                     | Specifies what result we have to obtain.
Order                 | The order in which the operations are to be performed is specified. | The order is not specified.
Domain                | It is independent of the domain.                        | It can be domain-dependent, in the case of domain relational calculus.
Programming language  | It is nearer to a programming language.                 | It is nearer to natural language than to a programming language.
Inclusion in SQL      | SQL includes only some features from relational algebra. | SQL is based to a greater extent on tuple relational calculus.
10) Discuss 4 basic features of ODBMS. What are the advantages?

Ans.
Here are four basic features of ODBMS and their associated advantages:

1. Object-Oriented Data Model:
ODBMS uses an object-oriented data model, where data is represented as objects, and these objects can encapsulate both data and the operations that can be performed on the data.
Advantages:
• Encapsulation: Objects encapsulate both data and behavior, promoting a more modular and maintainable code structure.
• Inheritance: Object-oriented models support inheritance, allowing the creation of hierarchies and reuse of code, leading to more efficient and scalable system development.
• Polymorphism: ODBMS supports polymorphism, enabling the use of different object types interchangeably, enhancing flexibility and code reusability.

2. Complex Data Types:
ODBMS allows the use of complex data types, such as arrays, lists, and structures, within objects, providing more flexibility in modeling real-world entities and relationships.
Advantages:
• Rich Data Modeling: Complex data types allow for a more accurate representation of real-world entities and relationships, leading to a more intuitive and natural data model.
• Improved Query Capabilities: The ability to use complex data types in queries enables more powerful and expressive query languages for retrieving and manipulating data.

3. Persistence:
ODBMS provides a mechanism for the persistent storage of objects, allowing data to survive beyond the lifespan of the application.
Advantages:
• Data Integrity: Persistent storage ensures that data is preserved even if the application is shut down or the system is restarted, maintaining data integrity.
• Efficient Storage: ODBMS optimizes storage for object-oriented structures, reducing the need for complex mappings between objects and relational tables.

4. Concurrency Control:
ODBMS incorporates mechanisms for managing concurrent access to the database by multiple users or processes.
Advantages:
• Concurrency Management: ODBMS supports concurrent access, allowing multiple users to read and write data simultaneously without compromising consistency.
• Transaction Support: ODBMS provides transaction management features to ensure the atomicity, consistency, isolation, and durability (ACID properties) of database transactions.

Advantages of ODBMS in General:
• Improved Developer Productivity: The use of an object-oriented data model simplifies the mapping between the application code and the database, reducing development time and effort.
• Enhanced Data Modeling: ODBMS facilitates a more natural and intuitive representation of complex relationships and structures, making it easier to model real-world scenarios.
• Increased Flexibility: The support for complex data types and object-oriented features enhances the flexibility of data modeling, allowing developers to adapt to changing requirements more easily.
• Reduced Impedance Mismatch: ODBMS minimizes the impedance mismatch between the object-oriented programming paradigm used in applications and the relational model used in traditional databases, leading to more seamless integration.
11) Explain the three-schema architecture of DBMS?

Ans.
The three-schema architecture is a conceptual framework that was proposed by the database community as a means to separate the user applications from the physical database. It divides the database system into three components, or "schemas," each serving a specific purpose. This architecture provides a clear and modular structure for designing database systems. The three schemas are as follows:

1. User Schema (External Schema):
Purpose: The user schema represents the way data is viewed and accessed by individual users or applications. It defines the logical structure and organization of data as seen by a specific user or group of users.
Components:
• User Views: Describes how data appears to specific users or applications. It includes subsets of data, specific fields, and customized structures tailored to meet the requirements of individual user perspectives.
• User Operations: Specifies the operations and transactions that users can perform on the data.

2. Logical Schema (Conceptual Schema):
Purpose: The logical schema represents the overall logical structure of the entire database as seen by the database administrator or designer. It provides an abstract representation of the data model, including entities, relationships, constraints, and the meaning of the data.
Components:
• Entity-Relationship Diagrams (ERDs): Illustrates the entities, relationships, and attributes in the database, offering a high-level view of the data model.
• Integrity Constraints: Defines rules and constraints that ensure the accuracy and consistency of the data across the database.

3. Physical Schema:
Purpose: The physical schema describes how data is stored, indexed, and retrieved at the physical level. It represents the actual implementation details of the database on the underlying hardware, such as storage structures, indexing mechanisms, and access paths.
Components:
• Indexes: Specifies the indexes created on tables to optimize data retrieval.
• Storage Structures: Defines how data is stored on disk, including details such as file organization, clustering, and partitioning.
• Access Paths: Describes the methods used to access and retrieve data efficiently.
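The external schema is often realized with SQL views over the logical schema; a minimal sketch (the table and column names are assumptions):

-- Logical schema table:
CREATE TABLE Employees (
    EmpID  INT PRIMARY KEY,
    Name   VARCHAR(50),
    Salary DECIMAL(10,2)
);

-- External schema: a user view that hides the Salary column.
CREATE VIEW EmployeeDirectory AS
SELECT EmpID, Name FROM Employees;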
12) Give an example of a weak entity set? Explain why it is weak with an ER diagram.

Ans. A weak entity set is an entity set that does not have a primary key attribute that uniquely identifies its entities independently of other entities. It depends on another entity, called the "owner" or "parent" entity, for identification. The existence of a weak entity is meaningful only in the context of the owning entity. The relationship between a weak entity set and its owning entity set is typically represented by a "strong" or "identifying" relationship.

Let's consider an example of a weak entity set:

Strong Entity: Professor
• Attributes: ID (Primary Key), Name, Salary, City

Weak Entity: Dependent
• Attributes: Name (Partial Key), DOB, Relation
• Dependent on the Professor entity for identification.

Explanation: In this scenario, the Professor entity is considered a strong entity with the primary key attribute "ID." On the other hand, the Dependent entity is a weak entity because it does not have a primary key that uniquely identifies it independently of the Professor entity. The "Name" attribute in the Dependent entity is a partial key, meaning it is not sufficient on its own to uniquely identify a Dependent.

• The "Professor" entity has a primary key attribute "ID."
• The "Dependent" entity is a weak entity set with the partial key attribute "Name" and attributes like DOB and Relation.
• The "Dependent" entity has a foreign key attribute "ProfessorID," which establishes an identifying relationship with the "Professor" entity. This means that a Dependent entity is uniquely identified within the context of a specific Professor entity.
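A minimal DDL sketch of this identifying relationship (column types are assumptions): the Dependent's key is the combination of its partial key and the owner's key.

CREATE TABLE Professor (
    ID     INT PRIMARY KEY,
    Name   VARCHAR(50),
    Salary DECIMAL(10,2),
    City   VARCHAR(50)
);

CREATE TABLE Dependent (
    ProfessorID INT REFERENCES Professor (ID),  -- identifying relationship
    Name        VARCHAR(50),                    -- partial key
    DOB         DATE,
    Relation    VARCHAR(30),
    PRIMARY KEY (ProfessorID, Name)             -- identified only via the owner
);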
13) What do you mean by integrity constraints? Explain each with a proper example.

Ans. Integrity constraints are rules that are defined on a database schema to ensure the accuracy, consistency, and reliability of the data stored in a relational database. These constraints help maintain data integrity and prevent the entry of inconsistent or invalid data. There are several types of integrity constraints, each serving a specific purpose. Here are the main types of integrity constraints, along with explanations and examples:

1. Entity Integrity Constraint:
Definition: Ensures that each row in a table has a unique and non-null primary key value.
Example: In a "Students" table, the "StudentID" column is the primary key. The entity integrity constraint ensures that each student has a unique identifier (StudentID) and that the identifier cannot be null.

CREATE TABLE Students (
    StudentID INT,
    FirstName VARCHAR(50),
    LastName  VARCHAR(50),
    CONSTRAINT PK_Students PRIMARY KEY (StudentID)
);

2. Referential Integrity Constraint:
Definition: Ensures that relationships between tables remain consistent. It requires that foreign key values in a child table match primary key values in the parent table.
Example: In an "Orders" table, the "CustomerID" column is a foreign key referencing the "Customers" table's primary key.

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    OrderDate  DATE,
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

3. Domain Integrity Constraint:
Definition: Enforces valid data types, formats, and ranges for columns.
Example: Ensuring that a "BirthDate" column only contains valid dates.

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    BirthDate  DATE CHECK (BirthDate >= '1900-01-01' AND BirthDate <= CURRENT_DATE)
);

4. Key Constraint:
Definition: Ensures that all values in a column or a combination of columns are unique.
Example: Ensuring that each email address in a "Users" table is unique.

CREATE TABLE Users (
    UserID   INT PRIMARY KEY,
    Email    VARCHAR(255) UNIQUE,
    Password VARCHAR(50)
);
14) Explain Armstrong's axioms for functional dependencies.

Ans. Armstrong's axioms are a set of inference rules that form the foundation for reasoning about functional dependencies in a relational database. These axioms were introduced by William W. Armstrong and provide a systematic way to derive new functional dependencies from given ones. The axioms help in understanding and manipulating the closure of sets of functional dependencies.

Here are the three main Armstrong's axioms:

1. Reflexivity Axiom (Reflexivity Rule):
• Axiom: If Y is a subset of X, then X → Y.
• Explanation: This axiom states that if a set of attributes Y is a subset of another set X, then X functionally determines Y.
• Example: A → A (reflexivity, since A ⊆ A).

2. Augmentation Axiom (Augmentation Rule):
• Axiom: If X → Y, then XZ → YZ for any set of attributes Z.
• Explanation: If a set of attributes X functionally determines another set Y, then adding any set of attributes Z to both sides preserves the dependency: XZ functionally determines YZ.
• Example: If A → B, then AC → BC.

3. Transitivity Axiom (Transitivity Rule):
• Axiom: If X → Y and Y → Z, then X → Z.
• Explanation: If X functionally determines Y, and Y functionally determines Z, then X functionally determines Z. This rule allows us to derive new functional dependencies by transitive relationships.

Example: Consider a set of functional dependencies:
1. A → B
2. BC → D
3. D → E

Applying Armstrong's axioms:
• If A → B and B → C, then A → C (transitivity).
• If BC → D, then BCD → D (augmentation with D).
• If A → B and BC → D, then AC → D (augment A → B with C to get AC → BC, then apply transitivity with BC → D).

15) Given R(A, B, C, D, E, F) with FDs {A → C, B → E, AB → C, C → D, E → F}, normalize R up to BCNF.

Ans.
To normalize a relation R up to Boyce-Codd Normal Form (BCNF), we need to follow a step-by-step process.

Given Relation R(A, B, C, D, E, F) with Functional Dependencies (FDs): {A → C, B → E, AB → C, C → D, E → F}

Step 1: Identify Candidate Keys
• Determine the closure of attribute sets to find candidate keys.
Closure of {A}+ = {A, C, D, F}
Closure of {B}+ = {B, E, F}
Both {A} and {B} are candidate keys.
Step 2: Check for BCNF Violations
• Check if there are any FDs where the left side is not a superkey.
AB → C violates BCNF, as {A}+ = {A, C, D, F} does not contain B.

Step 3: Decompose the Relation
• Decompose the relation to remove the BCNF violation.
R1(A, B, C, D, F), R2(B, E)

Step 4: Check Decomposed Relations
• Check if the decomposed relations are in BCNF.
For R1(A, B, C, D, F):
• No BCNF violation, as {A}+ = {A, C, D, F} contains all attributes.
For R2(B, E):
• No BCNF violation, as {B}+ = {B, E, F} contains all attributes.

Result: The normalized relations are R1(A, B, C, D, F) and R2(B, E), and they are in Boyce-Codd Normal Form (BCNF).
16) What is multivalued functional dependency? Explain 4NF with an example.

Ans.
Multivalued Functional Dependency (MVD):
An MVD occurs when there is a dependency between two sets of attributes in a relation, indicating that for a given value of one set of attributes, there can be multiple values associated with another set of attributes.

Consider a relation R(StudentID, Course, Instructor) with an MVD StudentID β†  Course, meaning that for a specific student, there can be multiple courses.

Example:

StudentID | Course    | Instructor
1         | Math      | Mr. A
1         | Physics   | Mr. B
2         | Chemistry | Mr. C
2         | Physics   | Mr. B

In this example, the MVD StudentID β†  Course indicates that each student can be enrolled in multiple courses.

Fourth Normal Form (4NF):
A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and has no non-trivial multivalued dependencies.

Now, let's consider the relation EmployeeProjects(EmployeeID, Project, Skill) with the following dependencies:
• EmployeeID β†  Project
• EmployeeID β†  Skill
• Project β†  Skill

Example:

EmployeeID | Project  | Skill
1          | ProjectA | Java
1          | ProjectA | SQL
1          | ProjectB | Python
2          | ProjectB | Java
2          | ProjectC | SQL

Here, the MVD Project β†  Skill indicates that for a specific project, there can be multiple skills.

To bring the relation into 4NF, we decompose it into two tables:

Employee_Projects_1:

EmployeeID | Project
1          | ProjectA
1          | ProjectB
2          | ProjectB
2          | ProjectC

Project_Skills:

Project  | Skill
ProjectA | Java
ProjectA | SQL
ProjectB | Python
ProjectB | Java
ProjectC | SQL

Now, both tables are in BCNF, and there are no non-trivial multivalued dependencies within each table. This decomposition ensures that the original relation is in 4NF. Each table represents a distinct aspect of the original information, avoiding redundancy and adhering to the principles of normalization.
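A minimal DDL sketch of the 4NF decomposition above (column types are assumptions):

CREATE TABLE Employee_Projects_1 (
    EmployeeID INT,
    Project    VARCHAR(50),
    PRIMARY KEY (EmployeeID, Project)
);

CREATE TABLE Project_Skills (
    Project VARCHAR(50),
    Skill   VARCHAR(50),
    PRIMARY KEY (Project, Skill)
);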
17) What are the deadlock and starvation problems in database concurrent transactions?

Ans. Deadlock:
Definition: A deadlock occurs in a database system when two or more transactions are blocked indefinitely, each waiting for the other to release a resource. Essentially, it's a situation where transactions are unable to proceed because each is holding a resource that the other needs.

Causes:
1. Circular Wait: Transactions form a circular chain, with each transaction in the chain holding a resource that the next transaction in the chain is waiting for.
2. No Preemption: Resources cannot be forcibly taken away from a transaction; they can only be released voluntarily.

Example: Consider two transactions, T1 and T2. If T1 has locked Resource A and is waiting for Resource B, and T2 has locked Resource B and is waiting for Resource A, a deadlock occurs (see the SQL sketch after the starvation discussion below).

Deadlock Prevention and Handling:
1. Lock Ordering: Enforce a strict order in which transactions can request and acquire locks to avoid circular waits.
2. Timeouts: Implement timeouts for transactions. If a transaction doesn't acquire all required locks within a certain time, it may release its locks and restart.
3. Deadlock Detection: Periodically check for the presence of a deadlock using detection algorithms. If detected, take corrective actions like aborting one or more transactions.

Starvation:
Definition: Starvation occurs when a transaction is delayed or blocked indefinitely from making progress because other transactions continuously obtain resources, preventing it from executing.

Causes:
1. Priority Inversion: Lower-priority transactions might be continuously preempted by higher-priority transactions, preventing them from making progress.
2. Resource Monopoly: If a transaction consistently acquires resources, it might prevent other transactions from accessing those resources.

Example: Consider a situation where there are multiple transactions, but one transaction with higher priority always gets access to resources, preventing lower-priority transactions from executing.
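A hedged sketch of the T1/T2 deadlock above using SELECT ... FOR UPDATE row locking (as in, e.g., PostgreSQL or MySQL/InnoDB; the Resources table is illustrative):

-- Session 1:
BEGIN;
SELECT * FROM Resources WHERE ResID = 'A' FOR UPDATE;   -- locks A
-- Session 2:
BEGIN;
SELECT * FROM Resources WHERE ResID = 'B' FOR UPDATE;   -- locks B
-- Session 1 (blocks, waiting for B):
SELECT * FROM Resources WHERE ResID = 'B' FOR UPDATE;
-- Session 2 (closes the circular wait; the DBMS detects the deadlock
-- and aborts one of the two transactions):
SELECT * FROM Resources WHERE ResID = 'A' FOR UPDATE;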
18) Explain the ACID properties of a transaction in detail.

Ans. ACID is an acronym that stands for the four key properties of a database transaction: Atomicity, Consistency, Isolation, and Durability. These properties ensure that transactions are reliable and maintain data integrity, even in the face of errors or system failures.

Here's a detailed explanation of each ACID property:

1. Atomicity:
• A transaction is an indivisible unit of work.
• It either executes completely, or not at all.
• If any part of the transaction fails, the entire transaction is rolled back to its initial state, ensuring no partial changes are made.
• Example: A money transfer between accounts must either complete in full or not happen at all, to avoid inconsistencies.

2. Consistency:
• A transaction must transform the database from one consistent state to another.
• It must adhere to all defined integrity constraints and business rules.
• Example: If a transaction involves debiting one account and crediting another, the total balance across both accounts must remain consistent.

3. Isolation:
• Each transaction must execute in isolation from other concurrently running transactions.
• Changes made by one transaction should not be visible to others until it commits.
• This prevents inconsistencies and conflicts caused by interleaved operations.
• Example: Two users attempting to modify the same product inventory simultaneously should not interfere with each other's changes.

4. Durability:
• Once a transaction commits (successfully completes), its changes must be permanent.
• They must be persisted to the database and survive system failures or power outages.
• Example: If a transaction records a payment, that payment record must be preserved even in the event of a system crash.
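The money-transfer example used above, written as one atomic SQL transaction (the Accounts table and amounts are assumptions):

BEGIN;
UPDATE Accounts SET Balance = Balance - 500 WHERE AccNo = 'A';
-- A crash here leaves no trace of the partial debit (atomicity),
-- and the total across A and B never changes (consistency).
UPDATE Accounts SET Balance = Balance + 500 WHERE AccNo = 'B';
COMMIT;   -- once acknowledged, the change survives crashes (durability)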
19) What is the use of an index in a database management system? Explain primary and secondary indexing with proper diagrams.

Ans. Use of Index in Database Management System (DBMS):
An index in a database is a data structure that improves the speed of data retrieval operations on a database table. It works like the index in a book, allowing the database management system to locate and access the rows in a table quickly. Indexing is crucial for efficient querying and retrieval of data, especially in large databases.

Primary Index in DBMS:
A primary index is an ordered file of fixed-length records with two fields. The first field is the same as the primary key, and the second field points to the specific data block. In the primary index, there is always a one-to-one relationship between the entries in the index table. Primary indexing in DBMS is further divided into two types:
• Dense Index
• Sparse Index

Dense Index:
In a dense index, an index record is created for every search key value in the database. This helps you to search faster but needs more space to store index records. In this indexing method, each index record contains the search key value and a pointer to the actual record on the disk.

Sparse Index:
A sparse index contains index records for only some of the values in the file. It helps to resolve the space issues of dense indexing. In this indexing technique, a range of index columns stores the same data block address, and when data needs to be retrieved, that block address is fetched.

Secondary Index in DBMS:
A secondary index can be generated from a field which has a unique value for each record, and it should be a candidate key. It is also known as a non-clustering index. Let's understand secondary indexing with a database index example: in a bank account database, data is stored sequentially by acc_no, but you may want to find all accounts of a specific branch of ABC bank. Here, you can have a secondary index for that search key: each index record points to a bucket that contains pointers to all the records with that specific search-key value. This two-level database indexing technique is used to reduce the mapping size of the first level; for the first level, a large range of numbers is selected so that the mapping size always remains small.
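A minimal sketch of declaring these indexes in SQL (index names are illustrative; in most systems the primary, clustering index comes with the key itself):

CREATE TABLE Accounts (
    AccNo   INT PRIMARY KEY,     -- rows stored ordered by AccNo (primary index)
    Branch  VARCHAR(50),
    Balance DECIMAL(12,2)
);
-- Secondary (non-clustering) index on a non-ordering field:
CREATE INDEX idx_accounts_branch ON Accounts (Branch);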
16 MARKS

Q1
a. What is lossy decomposition? Check whether the following decompositions are lossy or lossless.
(i) Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE, F = {A→C, B→C, C→D, DE→C, CE→A}
(ii) R(XYZWQ), FD = {X→Z, Y→Z, Z→W, WQ→Z, ZQ→X}, R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
b. Eliminate redundant FDs from
(i) F = {X→Y, Y→X, Y→Z, Z→Y, X→Z, Z→X}
(ii) F = {X→YZ, ZW→P, P→Z, W→XPQ, XYQ, YW, WQ→YZ}

Ans.
a. Lossy Decomposition:
Lossy decomposition refers to a situation in database normalization where the decomposition of a relation into multiple smaller relations results in a loss of information, making it impossible to reconstruct the original relation. In other words, the join of the decomposed relations does not produce the same result as the original relation.

Let's check the given decompositions:

(i) Decomposition: R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE, F = {A→C, B→C, C→D, DE→C, CE→A}
To check for lossy decomposition, we need to see if the natural join of the decomposed relations is equal to the original relation:
R1 ⋈ R2 ⋈ R3 ⋈ R4 ⋈ R5 = AD ⋈ AB ⋈ BE ⋈ CDE ⋈ AE
If this is equal to R = ABCDE, the decomposition is lossless; otherwise, it is lossy.

(ii) Decomposition: R(XYZWQ), FD = {X→Z, Y→Z, Z→W, WQ→Z, ZQ→X}, R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
Similarly, we check whether R1 ⋈ R2 ⋈ R3 ⋈ R4 ⋈ R5 is equal to R = XYZWQ.

b. Eliminate Redundant FDs:
(i) FD Set: F = {X→Y, Y→X, Y→Z, Z→Y, X→Z, Z→X}
To eliminate redundant FDs, we can use Armstrong's axioms and closure calculation:
1. Start with the given set of FDs.
2. Check for redundancy by calculating the closure of the left side, without the FD under test, and seeing if it still implies the right side.
After eliminating the redundant FDs, we get a minimal cover.

(ii) FD Set: F = {X→YZ, ZW→P, P→Z, W→XPQ, XYQ, YW, WQ→YZ}
Similarly, apply Armstrong's axioms to eliminate redundant FDs and obtain a minimal cover.
Q2
a. A database is being constructed to keep track of the teams and games of a sports league. A team has a number of players, not all of whom participate in each game. It is desired to keep track of the players participating in each game for each team, the positions they played in that game, and the result of the game. Try to design an ER schema diagram for this application, stating any assumptions you make. Choose your favourite sport (soccer, football, baseball, ...).
b. What are the basic operations for a relational language? How are basic operations represented in relational algebra, TRC, DRC, and SQL?

Ans.
The basic operations in a relational language, such as relational algebra, Tuple Relational Calculus (TRC), Domain Relational Calculus (DRC), and SQL, are generally aimed at retrieving, manipulating, or combining data in relational databases. Below are the basic operations and their representations in each of these relational languages:

1. Relational Algebra:
Relational algebra operations include:
1. Selection (σ): Selects rows from a relation based on a given condition.
2. Projection (π): Retrieves specific columns from a relation.
3. Union (∪): Combines tuples from two relations, eliminating duplicates.
4. Difference (−): Subtracts tuples from one relation that are also in another relation.
5. Cross Product (×): Produces a Cartesian product of two relations.
6. Rename (ρ): Renames the attributes of a relation.

Example (in relational algebra):
• Selection: σAge>21(Students)
• Projection: πName,GPA(Students)
• Union: R ∪ S
• Difference: R − S
• Cross Product: R × S
• Rename: ρNewName(R)

2. Tuple Relational Calculus (TRC):
In TRC, the basic operations include:
1. Selection: Specifies a condition to filter tuples.
2. Projection: Lists the attributes to be included in the result.
3. Join: Combines tuples from two relations based on a common attribute.
4. Rename: Renames attributes.

Example (in TRC):
• Selection: {t | t ∈ Students ∧ t.Age > 21}
• Projection: {t.Name, t.GPA | t ∈ Students}
• Join: {s, c | s ∈ Students, c ∈ Courses ∧ s.ID = c.StudentID}
• Rename: {t.NewName | t ∈ R}
3. Domain Relational Calculus (DRC):
DRC is similar to TRC but quantifies over domain values rather than tuples.

Example (in DRC):
• Selection: {t | t ∈ Students ∧ t.Age > 21}
• Projection: {t.Name, t.GPA | t ∈ Students}
• Join: {s, c | s ∈ Students, c ∈ Courses ∧ s.ID = c.StudentID}
• Rename: {t.NewName | t ∈ R}

4. SQL (Structured Query Language):
In SQL, the basic operations include:
1. SELECT: Corresponds to both selection and projection.
2. FROM: Specifies the tables from which to retrieve data.
3. JOIN: Combines data from multiple tables based on a condition.
4. WHERE: Applies conditions to filter rows.
5. GROUP BY: Groups rows based on specified attributes.
6. ORDER BY: Sorts the result set based on specified columns.
7. INSERT, UPDATE, DELETE: Modify data in the tables.

Example (in SQL):
• Selection: SELECT * FROM Students WHERE Age > 21;
• Projection: SELECT Name, GPA FROM Students;
• Join: SELECT Students.Name, Courses.CourseName FROM Students JOIN Courses ON Students.ID = Courses.StudentID;
• Rename: SELECT Name AS NewName FROM R;
Q3
a. What is serializability? Explain conflict serializability and view serializability.

Ans.
Serializability in the context of database transactions refers to the property where the execution of a set of transactions produces results that are equivalent to some serial execution of those transactions. In other words, the final state of the database should be the same as if the transactions were executed one after the other in some order. Serializability ensures the consistency of the database despite concurrent execution of transactions.

Conflict Serializability:
Conflict serializability is a particular form of serializability that focuses on conflicts between transactions. A conflict occurs when two transactions access the same data item, and at least one of them is a write operation. There are two types of conflicts:
1. Read-Write Conflict (RW): Occurs when one transaction reads a data item, and another transaction writes to the same data item.
2. Write-Write Conflict (WW): Occurs when two transactions write to the same data item.

Example for Conflict Serializability:

T1   | T2   | T3
R(X) |      |
     |      | R(X)
W(Y) |      |
     | W(X) |
     |      | R(Y)
     | W(Y) |

Now, we will list all the conflicting operations and determine whether the schedule is conflict serializable using a precedence graph. Two operations are conflicting if they belong to different transactions, operate on the same data item, and at least one of them is a write operation.
1. R3(X) and W2(X) [ T3 -> T2 ]
2. W1(Y) and R3(Y) [ T1 -> T3 ]
3. W1(Y) and W2(Y) [ T1 -> T2 ]
4. R3(Y) and W2(Y) [ T3 -> T2 ]
Constructing the precedence graph, we see there are no cycles in the graph. Therefore, the schedule is conflict serializable. The equivalent serial schedule is T1 -> T3 -> T2.

View Serializability:
View serializability is another form of serializability that focuses on the final outcome seen by users (views) of the database. It allows more flexibility in the scheduling of transactions as long as the views presented to users are consistent with some serial order.
A schedule is view serializable if, for every pair of transactions Ti and Tj in the schedule, the following conditions hold:
1. If Ti completes before Tj starts, then the view of the database seen by Tj is the same as if Ti had executed first.
2. If Tj reads a data item written by Ti, then the view of the database seen by Tj is the same as if Ti had executed just before Tj.

Example for View Serializability:
Let us consider the following transaction schedule and test it for view serializability.

T1   | T2   | T3
R(A) |      |
     | W(A) |
     |      | R(A)
     |      | W(A)
W(A) |      |

As we know, if a schedule is conflict serializable, then it is also view serializable. So first let us check for conflict serializability. The conflicting operations for this schedule are:
1. R1(A) and W2(A) [ T1 -> T2 ]
2. R1(A) and W3(A) [ T1 -> T3 ]
3. W2(A) and R3(A) [ T2 -> T3 ]
4. W2(A) and W1(A) [ T2 -> T1 ]
5. W2(A) and W3(A) [ T2 -> T3 ]
6. R3(A) and W1(A) [ T3 -> T1 ]
7. W3(A) and W1(A) [ T3 -> T1 ]
Constructing the precedence graph for these conflicting operations, we find cycles (for example, T1 -> T2 -> T1), so the schedule is not conflict serializable, and view serializability must be checked directly against the two conditions above.

b. Test if the following schedule is conflict serializable or not: R1(A), R2(D), W1(B), R2(B), W3(B), R4(B), W2(C), R5(C), W4(E), R5(E), W5

Ans.
To determine if the given schedule is conflict serializable, we can use the precedence graph method. The precedence graph helps visualize the dependencies between transactions based on read and write operations. If the graph is acyclic, the schedule is conflict serializable.

Given the schedule:
R1(A), R2(D), W1(B), R2(B), W3(B), R4(B), W2(C), R5(C), W4(E), R5(E), W5

Let's construct the precedence graph:
1. Transaction Nodes: Create a node for each transaction.
• Nodes: T1, T2, T3, T4, T5
2. Directed Edges: For each pair of conflicting operations, draw a directed edge from the transaction that performs the earlier operation to the one that performs the later operation.
• Edges: T1 → T2, T1 → T3, T1 → T4, T2 → T3, T3 → T4, T2 → T5, T4 → T5

The graph is as follows:

T1
/ | \
v v v
T2 T3 T4
\ | /
v
T5

Conclusion:
Examining the graph for cycles: every edge flows from T1 toward T5 and the graph is acyclic, so the schedule is conflict serializable. One equivalent serial order is T1 → T2 → T3 → T4 → T5.
Q4
a. Explain various locking techniques for concurrency control.

Ans.
Concurrency control in a database system is crucial to ensure that multiple transactions can execute concurrently without interfering with each other, preserving the consistency of the database. Locking is a widely used technique for concurrency control. Here are various locking techniques:

1. Binary Locks (Two-Phase Locking):
• Basic Idea:
o Transactions acquire locks before accessing data items and release locks when done.
o Two phases: Growing Phase (acquiring locks) and Shrinking Phase (releasing locks).
• Types of Locks:
o Shared Lock (S): Multiple transactions can hold shared locks on the same item simultaneously.
o Exclusive Lock (X): Only one transaction can hold an exclusive lock on an item.
• Protocol:
o No transaction can request a new lock once it releases any lock.
o A transaction cannot release any lock until it has acquired all the locks it needs.

2. Multigranularity Locks:
• Basic Idea:
o Locks can be acquired at various levels of granularity (e.g., at the level of a page, table, or database).
o Reduces contention by allowing transactions to lock only the portion of the data they need.
• Example:
o A transaction might lock a specific row, a page, or an entire table.

3. Deadlock Handling:
• Timeouts and Detection:
o Transactions are given a certain amount of time to complete, and if they cannot acquire required locks within this time, they are rolled back.
o Detection algorithms identify circular wait conditions and resolve deadlocks.

4. Timestamp-Based Concurrency Control:
• Basic Idea:
o Assign a unique timestamp to each transaction representing its start time.
o Use timestamps to determine the order of operations and resolve conflicts.
• Types:
o Timestamp Ordering Protocol: Uses timestamps to order transactions and prevent conflicts.
o Thomas Write Rule: Allows a transaction to write if its timestamp is greater than the timestamp of the last transaction that wrote the item.
5. Optimistic Concurrency Control:
• Basic Idea:
o Transactions proceed without locking resources.
o A validation phase checks for conflicts before committing.
o If conflicts are detected, transactions are rolled back and restarted.

6. Two-Phase Commit (2PC):
• Basic Idea:
o Ensures atomicity of distributed transactions.
o The coordinator sends a "prepare" message to all participants, and participants reply with an acknowledgment.
o If all participants agree, the coordinator sends a "commit" message; otherwise, it sends an "abort" message.

7. Read and Write Locks:
• Basic Idea:
o Introduces separate read and write locks to allow multiple transactions to read a data item simultaneously while ensuring exclusive access for writing.
o Reduces contention for read operations.
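A hedged sketch of shared versus exclusive row locks in SQL (FOR SHARE / FOR UPDATE as in, e.g., PostgreSQL; other systems use different syntax, and the Accounts table is illustrative):

BEGIN;
-- Shared lock: other readers are allowed, writers are blocked.
SELECT Balance FROM Accounts WHERE AccNo = 1 FOR SHARE;
-- Exclusive lock: all other lockers are blocked until COMMIT.
SELECT Balance FROM Accounts WHERE AccNo = 2 FOR UPDATE;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccNo = 2;
COMMIT;   -- shrinking phase: all locks released together (strict 2PL)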
b. Describe optimistic concurrency control techniques.

Ans.
Optimistic Concurrency Control (OCC) is a concurrency control technique that allows transactions to proceed without acquiring locks on data items during the execution phase. Instead of locking resources, OCC allows transactions to execute freely and checks for conflicts at the end of the transaction, during the validation phase. If conflicts are detected, the transaction is rolled back and restarted. Here are the key characteristics and techniques associated with Optimistic Concurrency Control:

1. Timestamps:
• Transactions are assigned unique timestamps that represent their start times.
• Timestamps are used to determine the order of operations and to detect conflicts.

2. Validation Phase:
• Transactions proceed without acquiring locks during the execution phase.
• At the end of the transaction, a validation phase is performed to check for conflicts with other transactions.

3. Conflict Detection:
• Conflicts are detected by comparing the timestamps of transactions and the data items they have accessed.
• Common types of conflicts include:
o Read-Write Conflict: If a transaction attempts to write to an item that has been read by a later transaction.
o Write-Write Conflict: If two transactions attempt to write to the same item.

4. Rollback and Restart:
• If conflicts are detected during the validation phase, the transaction is rolled back.
• The transaction is then restarted with a new timestamp, and the process repeats.

5. Example Algorithm: Thomas Write Rule:
• The Thomas Write Rule is an optimistic concurrency control protocol used to manage conflicts in write operations.
• If a transaction T wants to write to an item, it can proceed only if its timestamp is greater than the timestamp of the last transaction that wrote to that item.

6. Benefits of Optimistic Concurrency Control:
• Reduces contention for locks, allowing for greater parallelism in transaction execution.
• The optimistic approach is well-suited for scenarios where conflicts are infrequent.

7. Drawbacks:
• Increased rollbacks: Transactions may be rolled back and restarted more frequently in case of conflicts.
• Additional overhead: The validation phase introduces extra overhead to check for conflicts.

8. Applications:
• Optimistic Concurrency Control is often used in scenarios where contention for resources is low, and conflicts are expected to be infrequent.

9. Example Scenario: Multi-Version Concurrency Control (MVCC):
• MVCC maintains multiple versions of data items with different timestamps.
• Allows for concurrent read and write operations without conflicts by ensuring that each transaction reads a consistent snapshot of the database.
Q5 What are the various types of database failures? Explain the log-based recovery scheme by showing immediate and deferred database modification with a proper example.

Ans.
Database failures can occur due to various reasons, and recovery mechanisms are crucial to ensure the integrity and consistency of the database. Here are the various types of database failures:

Types of Database Failures:

1. Transaction Failures:
o Occur when a transaction cannot proceed due to an error.
o Can result from application errors, hardware failures, or system crashes.

2. System Failures:
o Result from hardware or software faults that cause the entire system to crash.
o Can lead to loss of data in memory.

3. Disk Failures:
o Occur when one or more disks storing the database become unavailable or fail.
o Examples include disk corruption, file system errors, or accidental deletion.

4. Media Failures:
o Involve the loss or corruption of data due to issues with storage media.
o Can result in data loss if not handled properly.

Log-Based Recovery Scheme:
Log-based recovery is a technique used to recover the database after a failure. It involves maintaining a transaction log that records all changes made to the database during transactions. There are two approaches to log-based recovery: immediate and deferred.

Immediate Database Modification:
In this approach, changes made by a transaction are immediately written to the database and the log. If a failure occurs, the recovery manager uses the log to undo or redo transactions.
Example:
1. Transaction T modifies data item X.
2. The modification is immediately applied to the database and logged.
3. If a failure occurs before the commit, the recovery manager uses the log to undo the changes made by the incomplete transaction.

Deferred Database Modification:
In this approach, changes made by a transaction are first recorded in the log. The actual modifications to the database are deferred until the transaction is committed. If a failure occurs, the recovery manager uses the log to undo or redo transactions.
Example:
1. Transaction T modifies data item X.
2. The modification is recorded in the log but not applied to the database.
3. If the transaction is committed, the changes are applied to the database.
4. If a failure occurs before the commit, the recovery manager uses the log to undo or redo the changes.

Example Scenario:
Example Scenario:
Suppose we have a simple transaction that transfers money from one account to another:
1. Transaction T:
o Reads the balance of account A.
o Deducts Rs. 10000 from the balance of account A.
o Adds Rs. 10000 to the balance of account B.
2. Immediate Modification:
o Changes are immediately applied to the database and logged.
Transaction Log:
Timestamp | Operation | Data
T1        | Read(A)   |
T1        | Write(A)  | -10000
T1        | Write(B)  | +10000
If a failure occurs before the commit, the recovery manager uses the log to undo the changes.
3. Deferred Modification:
o Changes are recorded in the log but not applied to the database immediately.
Transaction Log:
Timestamp | Operation | Data
T1        | Read(A)   |
T1        | Write(A)  | -10000
T1        | Write(B)  | +10000
If the transaction is committed, the changes are applied to the database. (A small replay sketch follows.)
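As a rough sketch of deferred-modification recovery, the following replays a log and applies only the operations of transactions that reached COMMIT. The log record format is invented for this example.

# Sketch: redo-only recovery for deferred database modification.
# Log records are (txn_id, op, item, delta); "COMMIT" marks completion.
log = [
    ("T1", "WRITE", "A", -10000),
    ("T1", "WRITE", "B", +10000),
    ("T1", "COMMIT", None, None),
]

def recover(log, db):
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    for txn, op, item, delta in log:
        if op == "WRITE" and txn in committed:
            db[item] = db.get(item, 0) + delta  # redo committed writes only
    return db

print(recover(log, {"A": 50000, "B": 20000}))  # {'A': 40000, 'B': 30000}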
Q6
Consider the following relation R(A, B, C, D, E, F) with the set of functional dependencies FD = {AB → C, BC → D, DE → F, BC → AG, ABG → DF}.
a) Find the closure of each determinant.
b) Find the candidate key.
c) Find the canonical cover.
Ans.
Given Relation and Functional Dependencies:
R(A, B, C, D, E, F) with FDs: {AB → C, BC → D, DE → F, BC → AG, ABG → DF}
(Note: since G appears in the dependencies, the attribute set is effectively {A, B, C, D, E, F, G}.)
a) Find the Closure of Each Determinant:
The determinants in FD are AB, BC, DE, and ABG.
1. AB+ = {A, B, C, D, F, G} (AB → C, then BC → D, BC → AG, and ABG → DF).
2. BC+ = {A, B, C, D, F, G} (BC → D and BC → AG, then AB → C and ABG → DF).
3. DE+ = {D, E, F} (DE → F).
4. ABG+ = {A, B, C, D, F, G} (AB → C, then BC → D and ABG → DF).
No single attribute is a determinant, so A+ = {A}, B+ = {B}, C+ = {C}, D+ = {D}, E+ = {E}, F+ = {F}, and G+ = {G}.
b) Find the Candidate Key:
To find the candidate keys, we check the closure of possible combinations of attributes. B and E never appear on the right-hand side of any dependency, so every candidate key must contain both. BE+ = {B, E}, so BE alone is not a key. Adding one more attribute:
ABE+ = {A, B, C, D, E, F, G} and BCE+ = {A, B, C, D, E, F, G}.
Removing any attribute from ABE or BCE leaves a set whose closure does not cover all attributes. Therefore, ABE and BCE are the candidate keys. (The closure routine sketched after this answer can be used to verify these results.)
c) Find the Canonical Cover:
The canonical cover is obtained by eliminating redundant dependencies and extraneous attributes:
1. Combine dependencies with the same left side: BC → D and BC → AG become BC → ADG.
2. Remove extraneous attributes: in ABG → DF, G is extraneous because AB+ already contains D and F, leaving AB → DF; D is then redundant on the right (AB → C and BC → ADG already give AB → D), leaving AB → F.
3. No further reduction is possible.
Canonical Cover:
{AB → CF, BC → ADG, DE → F}
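A short, generic attribute-closure routine (the standard algorithm, sketched in Python) that reproduces the closures above:

# Attribute closure: repeatedly apply FDs whose left side is covered.
def closure(attrs, fds):
    """attrs: set of attributes; fds: list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(set("AB"), set("C")), (set("BC"), set("D")),
       (set("DE"), set("F")), (set("BC"), set("AG")),
       (set("ABG"), set("DF"))]
print(sorted(closure(set("ABE"), fds)))  # ['A','B','C','D','E','F','G']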
2 MARKS
1) In Relational model what do you mean by cardinality?
Ans. Cardinality refers to the relationship between the number of tuples (rows) in one table and the number of tuples in another table. The three common cardinalities are:
• One-to-one
• One-to-many
• Many-to-many
2) How can you map a conceptual model to a relational model?
Ans. Mapping a conceptual model to a relational model involves identifying the entities, attributes, and relationships in the conceptual model and transforming them into tables, columns, and foreign keys in the relational model. Each entity becomes a table, attributes become columns, and relationships are represented through foreign keys.
3) What is the use of DML in DBMS?
Ans. Data Manipulation Language (DML) in a Database Management System (DBMS) is used to perform operations such as insertion, updating, retrieval, and deletion of data in a database. It allows users and applications to interact with the data stored in the database. Common DML commands include INSERT, UPDATE, and DELETE.
4) List two reasons why we may choose to define a view?
Ans.
1. To simplify complex queries: Views provide a virtual representation of the data, hiding the underlying complexity.
2. Security: Views can restrict access to certain columns or rows, ensuring that users only see the data they are authorized to access.
5) A primary key if combined with a foreign key creates what?
Ans. When a primary key from one table is combined with a foreign key in another table, it creates a referential integrity constraint. This ensures that values in the foreign key column match values in the primary key column, establishing a relationship between the two tables.
6) Explain the following terms associated with relational database design: Primary Key, Secondary Key, Foreign Key?
Ans.
• Primary Key: A primary key is a unique identifier for a record in a table. It ensures that each record can be uniquely identified and is used to establish relationships with other tables.
• Secondary Key: A secondary key is an attribute (or set of attributes) used to retrieve records, typically through a secondary index. It provides an alternative means of accessing data but, unlike a primary key, need not be unique.
• Foreign Key: A foreign key is a column or set of columns in a table that refers to the primary key of another table. It establishes a link between the two tables, enforcing referential integrity.
7) What is ACID property?
Ans. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee reliable processing of database transactions.
• Atomicity: Transactions are treated as a single, indivisible unit of work.
• Consistency: Transactions bring the database from one consistent state to another.
• Isolation: Transactions operate independently of each other, ensuring that the outcome of one transaction does not affect others.
• Durability: Once a transaction is committed, its effects are permanent and survive system failures.
8) What is Phantom Phenomenon?
Ans. The phantom phenomenon is a concurrency control issue in which a transaction retrieves a set of records based on a certain condition, and another transaction inserts or deletes records matching that condition before the first transaction completes. Re-running the same query can then return "phantom" records that appeared in the meantime, or miss records that were deleted.
9) What is the possible violation if an application program uses isolation level "Repeatable Read"?
Ans. Under "Repeatable Read", dirty reads and non-repeatable reads are prevented, but phantom reads remain possible: a transaction that re-executes a range query may see new rows inserted (or miss rows deleted) by other transactions in the meantime.
10) Which protocol always ensures recoverable schedule?
Ans. Strict Two-Phase Locking (Strict 2PL) always ensures a recoverable schedule. Because every transaction holds its exclusive locks until it commits or aborts, no transaction can read data written by an uncommitted transaction, which guarantees recoverability (and, in fact, cascadeless schedules). Note that basic 2PL by itself guarantees serializability, not recoverability, and neither variant prevents deadlocks.
11) What is metadata? Give an example?
Ans. Metadata is data that describes other data, providing information about its structure, content, usage, and management.
• Example: In a library database, metadata might include:
o Book title, author, publication date, ISBN
o Member name, address, membership status
o Borrowing dates, due dates, renewal history
12) Differentiate between schema and instance?
Ans.
• A schema is the overall design of a database, including its structure, constraints, and relationships.
• An instance, on the other hand, is a snapshot of the database at a specific moment, representing the actual data stored at that time.
13) Explain how update command works in SQL?
Ans. The UPDATE command in SQL is used to modify existing records in a table.
• Syntax: UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
• Example: UPDATE customers SET email = 'new_email@example.com' WHERE customer_id = 123;
14) Explain briefly about the object-oriented data model?
Ans.
• The object-oriented data model organizes data into objects, which are instances of classes.
• Each object has attributes (data fields) and methods (procedures).
• It supports encapsulation, inheritance, and polymorphism, providing a way to model complex real-world entities (a short sketch follows).
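A minimal illustration, in Python and purely as a sketch, of objects, attributes, methods, and inheritance as the object-oriented data model describes them:

# A class models an entity type; objects are its instances.
class Account:
    def __init__(self, owner, balance):
        self.owner = owner          # attribute (data field)
        self.balance = balance

    def deposit(self, amount):      # method (procedure)
        self.balance += amount

class SavingsAccount(Account):      # inheritance: a specialized entity
    def add_interest(self, rate):
        self.balance *= (1 + rate)

acct = SavingsAccount("Asha", 1000)
acct.deposit(500)
acct.add_interest(0.05)
print(acct.balance)  # 1575.0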
15) Define Foreign Key? Write an example to explain it.
Ans.
• A Foreign Key is a column or a set of columns in a table that refers to the primary key of another table. It establishes a link between the two tables.
• Example: If we have an "Orders" table with a foreign key "CustomerID" referencing the primary key "CustomerID" in the "Customers" table, it ensures that each order is associated with a valid customer.
16) Explain the role of DBA in DBMS?
Ans. A Database Administrator (DBA) is responsible for managing and maintaining a database system. Their role includes tasks such as database design, security management, data backup and recovery, performance monitoring, and ensuring data integrity. DBAs play a crucial role in the efficient and secure functioning of a database.
17) Define a Transaction in database? Explain the dirty read problem?
Ans. A transaction is a logical unit of work that consists of one or more SQL statements; the ACID properties (Atomicity, Consistency, Isolation, Durability) ensure its reliability. The dirty read problem occurs when one transaction reads uncommitted changes made by another transaction, potentially leading to inaccuracies if the second transaction is rolled back.
18) What is trivial functional dependency?
Ans.
• A trivial functional dependency is one in which the right-hand side is a subset of the left-hand side; that is, X → Y is trivial whenever Y ⊆ X.
• For example, {A, B} → A is a trivial functional dependency.
19) Explain the use of hashing in Index structures?
Ans. Hashing is a technique used in index structures to map keys to storage locations, providing efficient retrieval of records. A hash function converts a key into a hash code, and the code determines where the record is stored. Hashing is commonly used in hash indexes to speed up equality-based lookups; it does not help with range queries. (A toy sketch follows.)
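A toy bucket-based hash index, as a sketch only; real hash indexes add overflow handling, disk paging, and dynamic resizing (e.g., extendible or linear hashing).

# Toy static hash index: a hash function maps each key to a bucket.
NUM_BUCKETS = 8

class HashIndex:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, record_ptr):
        b = hash(key) % NUM_BUCKETS        # hash code picks the bucket
        self.buckets[b].append((key, record_ptr))

    def lookup(self, key):
        b = hash(key) % NUM_BUCKETS        # same function on lookup
        return [ptr for k, ptr in self.buckets[b] if k == key]

idx = HashIndex()
idx.insert(123, "page 7, slot 2")
print(idx.lookup(123))  # ['page 7, slot 2']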