
DBMS PYQs

2018 – 19
10.a) Define catalog management in distributed database.
Ans: Efficient catalog management in distributed databases is critical to ensure satisfactory performance related to site
autonomy, view management, and data distribution and replication. Catalogs are databases themselves containing
metadata about the distributed database system.
Three popular management schemes for distributed catalogs are centralized catalogs, fully replicated catalogs,
and partially replicated catalogs. The choice of the scheme depends on the database itself as well as the access patterns of the
applications to the underlying data.
• Centralized Catalogs: In this scheme, the entire catalog is stored at one single site. Owing to its central nature, it is easy to implement. On the other hand, reliability, availability, autonomy, and distribution of processing load are adversely impacted. For read operations from non-central sites, the requested catalog data is locked at the central site and is then sent to the requesting site. On completion of the read operation, an acknowledgement is sent to the central site, which in turn unlocks this data. All update operations must be processed through the central site. This can quickly become a performance bottleneck for write-intensive applications.
• Fully Replicated Catalogs: In this scheme, identical copies of the complete catalog are present at each site. This scheme facilitates faster reads by allowing them to be answered locally. However, all updates must be broadcast to all sites. Updates are treated as transactions, and a centralized two-phase commit scheme is employed to ensure catalog consistency. As with the centralized scheme, write-intensive applications may cause increased network traffic due to the broadcast associated with the writes.
• Partially Replicated Catalogs: The centralized and fully replicated schemes restrict site autonomy since they must ensure a consistent global view of the catalog. Under the partially replicated scheme, each site maintains complete catalog information on data stored locally at that site. Each site is also permitted to cache entries retrieved from remote sites. However, there are no guarantees that these cached copies will be the most recent and updated. The system tracks catalog entries for the site where an object was created and for the sites that contain copies of this object. Any changes to copies are propagated immediately to the original (birth) site. Retrieving updated copies to replace stale data may be delayed until an access to this data occurs. In general, fragments of relations across sites should be uniquely accessible. Also, to ensure data distribution transparency, users should be allowed to create synonyms for remote objects and use these synonyms for subsequent referrals.
10.b) Explain the disadvantage of Distributed DBMS.
Ans:
• Increased complexity − The distributed DBMS must hide the distribution from users and coordinate query processing, concurrency control, and recovery across several sites, which makes the software considerably more complex than a centralized DBMS.
• Higher cost − Additional hardware, network infrastructure, and maintenance effort are required to keep the distributed sites running and synchronized.
• Security − Since the data is stored at multiple sites and transferred over a network, access control and network security have to be managed for every site and every communication link.
• Integrity control is more difficult − Enforcing integrity constraints that span several sites may require a large amount of communication between the sites.
• Distributed database design is more complex − Decisions about fragmentation, replication, and allocation of data to sites have to be made in addition to ordinary database design.
• Lack of standards and experience − There are fewer tools, methodologies, and experienced personnel for distributed databases than for centralized ones.
11. Short Notes:
a) Wait-Die and Wound-Wait:
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with a conflicting lock by
another transaction, then one of the two possibilities may occur −
• If TS(Ti) < TS(Tj) − that is, Ti, which is requesting the conflicting lock, is older than Tj − then Ti is allowed to wait until the data item is available.
• If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by
another transaction, one of two possibilities may occur −
• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
In both cases, the transaction that enters the system at a later stage is aborted.
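The two schemes differ only in which transaction waits and which is aborted. A minimal sketch in Python, assuming each transaction carries the timestamp it was assigned on entry (the function and names are illustrative, not part of any real DBMS API):

def resolve_conflict(requester_ts, holder_ts, scheme):
    """Decide what the requesting transaction should do when the lock it wants
    is held with a conflicting lock by another transaction.
    requester_ts, holder_ts: timestamps assigned on entry (smaller = older).
    scheme: "wait-die" or "wound-wait".
    Returns the action taken, seen from the requester's point of view."""
    if scheme == "wait-die":
        # Older requester waits; younger requester dies (is rolled back).
        return "WAIT" if requester_ts < holder_ts else "DIE (restart later, same timestamp)"
    elif scheme == "wound-wait":
        # Older requester wounds (aborts) the holder; younger requester waits.
        return "WOUND holder (holder restarts)" if requester_ts < holder_ts else "WAIT"
    raise ValueError("unknown scheme")

# T1 (ts=5) is older than T2 (ts=9); the other transaction holds the lock.
print(resolve_conflict(5, 9, "wait-die"))    # WAIT
print(resolve_conflict(9, 5, "wait-die"))    # DIE (restart later, same timestamp)
print(resolve_conflict(5, 9, "wound-wait"))  # WOUND holder (holder restarts)
print(resolve_conflict(9, 5, "wound-wait"))  # WAIT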
b) View and Conflict serializability in DBMS:
1. Conflict Serializability: Two schedules are said to be conflict equivalent if all the conflicting operations in both the schedules get executed in the same order. If a schedule is conflict equivalent to its serial schedule, then it is called a Conflict Serializable Schedule.
   View Serializability: Two schedules are said to be view equivalent if the order of initial reads, final writes and update operations is the same in both the schedules. If a schedule is view equivalent to its serial schedule, then it is called a View Serializable Schedule.
2. Conflict Serializability: If a schedule is conflict serializable, then it is also view serializable.
   View Serializability: If a schedule is view serializable, then it may or may not be conflict serializable.
3. Conflict Serializability: Conflict equivalence can be easily achieved by reordering the operations of two transactions; therefore, conflict serializability is easy to achieve.
   View Serializability: View equivalence is rather difficult to achieve, as both transactions should perform similar actions in a similar manner. Thus, view serializability is difficult to achieve.
4. Conflict Serializability: For a transaction T1 writing a value of A that no one else reads, but where some later transaction, say T2, writes its own value of A, W(A) cannot be placed in positions where it is never read.
   View Serializability: If a transaction T1 writes a value of A that no other transaction reads (because some later transaction, say T2, writes its own value of A), W(A) can be placed in positions of the schedule where it is never read.
c) Tuple and Domain relation calculus:
1. Definition
   TRC: The Tuple Relational Calculus (TRC) is used to select tuples from a relation. Tuples with specific range values, tuples with certain attribute values, and so on can be selected.
   DRC: The Domain Relational Calculus (DRC) employs a list of attributes from which to choose, based on the condition. It is similar to TRC, but instead of selecting entire tuples, it selects attributes.
2. Representation of variables
   TRC: In TRC, the variables represent the tuples from specified relations.
   DRC: In DRC, the variables represent values drawn from a specified domain.
3. Tuple / Domain
   TRC: A tuple is a single element of a relation. In database terms, it is a row.
   DRC: A domain is equivalent to a column data type and any constraints on the value of the data.
4. Filtering
   TRC: The filtering variable uses tuples of a relation.
   DRC: The filtering is done based on the domain of attributes.
5. Return value
   TRC: The predicate expression condition associated with the TRC is used to test every row using a tuple variable and return those tuples that meet the condition.
   DRC: DRC takes advantage of domain variables and, based on the condition set, returns the required attribute or column that satisfies the criteria of the condition.
6. Membership condition
   TRC: The query cannot be expressed using a membership condition.
   DRC: The query can be expressed using a membership condition.
7. Query language
   TRC: QUEL (Query Language) is the query language related to it.
   DRC: QBE (Query-By-Example) is the query language related to it.
8. Similarity
   TRC: It reflects traditional pre-relational file structures.
   DRC: It is more similar to logic as a modeling language.
9. Syntax
   TRC: Notation: {T | P(T)} or {T | Condition(T)}
   DRC: Notation: {a1, a2, a3, …, an | P(a1, a2, a3, …, an)}
10. Example
   TRC: {T | EMPLOYEE(T) AND T.DEPT_ID = 10}
   DRC: { | < EMPLOYEE > DEPT_ID = 10 }
11. Focus
   TRC: Focuses on selecting tuples from a relation.
   DRC: Focuses on selecting values from a relation.
12. Variables
   TRC: Uses tuple variables (e.g., t).
   DRC: Uses domain (scalar) variables (e.g., a1, a2, …, an).
13. Expressiveness
   TRC: Less expressive.
   DRC: More expressive.
14. Ease of use
   TRC: Easier to use for simple queries.
   DRC: More difficult to use for simple queries.
15. Use case
   TRC: Useful for selecting tuples that satisfy a certain condition or for retrieving a subset of a relation.
   DRC: Useful for selecting specific values or for constructing more complex queries that involve multiple relations.
d) Checkpoint, Rollback and Commit:
Checkpoint: A checkpoint is a process that saves the current state of the database to disk. This includes all transactions that have been committed, as well as any changes that have been made to the database but not yet committed. The checkpoint process also records, in the log, all transactions that have occurred since the last checkpoint; this log is used to recover the database in the event of a system failure or crash. When a checkpoint occurs, the DBMS writes a copy of the current state of the database to disk so that the database can be recovered quickly in the event of a failure.
Commit: COMMIT in SQL is a transaction control language command that is used to permanently save the changes made in the transaction to the tables/database. The database cannot regain its previous state after the execution of COMMIT.
Rollback: ROLLBACK in SQL is a transaction control language command that is used to undo transactions that have not been saved in the database. The command can only be used to undo changes since the last COMMIT.
Difference between COMMIT and ROLLBACK
1. COMMIT permanently saves the changes made by the current transaction. / ROLLBACK undoes the changes made by the current transaction.
2. The transaction cannot undo changes after COMMIT has executed. / The transaction reaches its previous state after ROLLBACK.
3. COMMIT is applied when the transaction completes successfully. / ROLLBACK occurs when the transaction is aborted, executes incorrectly, or a system failure occurs.
4. The COMMIT statement permanently saves the state when all the statements have executed successfully without any error. / If any operation fails during the transaction, the changes cannot be permanently saved, and they can be undone using the ROLLBACK statement.
5. Syntax of COMMIT: COMMIT; / Syntax of ROLLBACK: ROLLBACK;
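A small sketch of COMMIT and ROLLBACK in application code, using Python's built-in sqlite3 module; the accounts table and its values are made up for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")          # throw-away in-memory database
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 300)")
conn.commit()                               # COMMIT: these rows are now permanent

try:
    # Transfer 200 from account 1 to account 2 as one transaction.
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.commit()                           # both updates are saved together
except sqlite3.Error:
    conn.rollback()                         # ROLLBACK: undo everything since the last COMMIT

print(conn.execute("SELECT * FROM accounts").fetchall())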
e) Reference architecture of distributed DBMS:
1. Data in a distributed system is usually fragmented and replicated, and the reference architecture must take this fragmentation and replication into account.
2. The reference architecture of a distributed DBMS consists of the following schemas:
o A set of global external schemas.
o A global conceptual schema.
o A fragmentation schema and an allocation schema.
o A set of schemas for each local DBMS.
Global external schema − In a distributed system, user applications and user access to the distributed database are represented by a number of global external schemas. This is the topmost level in the reference architecture; it describes the part of the distributed database that is relevant to different users.
Global conceptual schema- The GCS represents the logical description of entire database as if it is not distributed. This
level contains definitions of all entities, relationships among entities and security and integrity information of whole
databases stored at all sites in a distributed system.
Fragmentation schema and allocation schema- The fragmentation schema describes how the data is to be logically
partitioned in a distributed database. The GCS consists of a set of global relations, and the mapping between the global
relations and fragments is defined in the fragmentation schema.
The allocation schema is a description of where the data (fragments) are to be located, taking account of any replication. The type of mapping in the allocation schema determines whether the distributed database is redundant or non-redundant. In the case of redundant data distribution, the mapping is one-to-many, whereas in the case of non-redundant data distribution, the mapping is one-to-one.
Local schemas- In a distributed database system, the physical data organization at each machine is probably different,
and therefore it requires an individual internal schema definition at each site, called local internal schema.
To handle fragmentation and replication issues, the logical organization of data at each site is described by a third layer
in the architecture called local conceptual schema.
The GCS is the union of all local conceptual schemas thus the local conceptual schemas are mappings of the global
schema onto each site. This mapping is done by local mapping schemas.
This architecture provides a very general conceptual framework for understanding distributed databases.
2017 – 18
2. What is Data Dictionary in DBMS?
Ans: A data dictionary contains metadata i.e., data about the database. The data dictionary is very important as it
contains information such as what is in the database, who is allowed to access it, where is the database physically stored
etc. The users of the database normally don't interact with the data dictionary, it is only handled by the database
administrators.
The data dictionary in general contains information about the following −
• Names of all the database tables and their schemas.
• Details about all the tables in the database, such as their owners, their security constraints, when they were created, etc.
• Physical information about the tables, such as where they are stored and how.
• Table constraints such as primary key attributes, foreign key information, etc.
• Information about the database views that are visible.
This is a data dictionary describing a table that contains employee details.
Field Name: EmployeeNumber | Data Type: Integer | Field Size for display: 10 | Description: Unique ID of each employee | Example: 1645000001
Field Name: Name | Data Type: Text | Field Size for display: 20 | Description: Name of the employee | Example: David Heston
Field Name: Date of Birth | Data Type: Date/Time | Field Size for display: 10 | Description: DOB of employee | Example: 08/03/1995
Field Name: Phone Number | Data Type: Integer | Field Size for display: 10 | Description: Phone number of employee | Example: 6583648648
3. Explain two phase locking protocol.
Ans:
2PL locking protocol
Every transaction will lock and unlock the data item in two different phases.
• Growing Phase − All the locks are acquired in this phase, and no lock is released. Once the transaction has obtained all the locks it needs, the second phase (shrinking phase) starts.
• Shrinking Phase − No new locks are acquired in this phase; all the changes to the data items are noted (stored) and then the locks are released.
In the growing phase, the transaction reaches a point where all the locks it may need have been acquired. This point is called the LOCK POINT. After the lock point has been reached, the transaction enters the shrinking phase.
Two phase locking is of two types −
• Strict two-phase locking protocol − A transaction can release a shared lock after the lock point, but it cannot release any exclusive lock until the transaction commits. This protocol produces cascadeless schedules.
• Rigorous two-phase locking protocol − A transaction cannot release any lock, either shared or exclusive, until it commits.
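A minimal sketch of the two-phase rule itself in Python (no real lock manager; the class and method names are illustrative): once a transaction has released any lock, it may not acquire new ones.

class TwoPhaseLockingTransaction:
    """Sketch of the two-phase rule only: after the first unlock (lock point
    passed), no further lock may be acquired."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire {item} in shrinking phase")
        self.held.add(item)      # growing phase: acquire locks

    def unlock(self, item):
        self.shrinking = True    # first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseLockingTransaction("T1")
t.lock("A"); t.lock("B")         # growing phase
t.unlock("A")                    # lock point passed, shrinking phase begins
t.unlock("B")
# t.lock("C")                    # would raise: violates the two-phase rule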
5. Discuss different levels of views.
Ans: There are mainly 3 levels of data abstraction:
Physical: This is the lowest level of data abstraction. It tells us how the data is actually stored in memory. Access methods like sequential or random access and file organization methods like B+ trees and hashing are used at this level. Usability, the size of memory, and the number of times the records are accessed are factors that we need to consider while designing the database. Suppose we need to store the details of an employee: the blocks of storage and the amount of memory used for this purpose are kept hidden from the user.
Logical: This level comprises the information that is actually stored in the database in the form of tables. It also stores the relationships among the data entities in relatively simple structures. At this level, the information available to the user at the view level is unknown. We can store the various attributes of an employee, and relationships, e.g., with the manager, can also be stored.
View: This is the highest level of abstraction. Only a part of the actual database is viewed by the users. This level exists
to ease the accessibility of the database by an individual user. Users view data in the form of rows and columns. Tables
and relations are used to store data. Multiple views of the same database may exist. Users can just view the data and
interact with the database, storage and implementation details are hidden from them.
6. What is weak entity set? Discuss with suitable example.
Ans: A weak entity set is an entity set that does not have a primary key of its own; its entities are identified by combining a partial key with the primary key of the owner (strong) entity through an identifying relationship. Weak entities are represented with a double rectangular box in the ER diagram, and the identifying relationships are represented with a double diamond. Partial key attributes are represented with dotted lines.
Example-1:
In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is the identifying relationship and ‘Payment
Number’ is the partial key. Primary Key of the Loan along with the partial key would be used to identify the records.
7. a) Why distributed deadlocks occur?
Ans: Deadlock is a state of a database system having two or more transactions, when each transaction is waiting for a
data item that is being locked by some other transaction. A deadlock can be indicated by a cycle in the wait-for-graph.
This is a directed graph in which the vertices denote transactions and the edges denote waits for data items.
For example, in the following wait-for-graph, transaction T1 is waiting for data item X which is locked by T3. T3 is waiting
for Y which is locked by T2 and T2 is waiting for Z which is locked by T1. Hence, a waiting cycle is formed, and none of
the transactions can proceed executing.
7.b) What are distributed wait for graph and local wait for graph? How wait for graph helps in deadlock detection?
Under what circumstances global wait for graph biased deadlock handling leads to unnecessary rollback?
Ans:
Wait-for Graph
o This is a suitable method for deadlock detection. In this method, a graph is created based on the transactions and the locks they request and hold. If the created graph has a cycle or closed loop, then there is a deadlock.
o The wait-for graph is maintained by the system: it contains an edge for every transaction that is waiting for some data item held by another transaction. The system keeps checking whether there is any cycle in the graph.
In a distributed DBMS, each site maintains a local wait-for graph containing only the transactions and data items at that site, while the global (distributed) wait-for graph is the union of all the local graphs. A deadlock that spans several sites may show no cycle in any local graph and appear only in the global graph. Because the global graph is assembled from local information that may already be out of date, the system can detect a cycle that no longer exists (a phantom deadlock) and abort a transaction unnecessarily; this is the circumstance in which global wait-for-graph based deadlock handling leads to unnecessary rollback.
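A minimal sketch of deadlock detection on a wait-for graph, using depth-first search to look for a cycle; the example graph encodes the T1 → T3 → T2 → T1 scenario from question 7.a, and the function name is illustrative:

def has_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it is waiting for.
    Returns True if the wait-for graph contains a cycle (i.e., a deadlock)."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    colour = {t: WHITE for t in wait_for}

    def dfs(t):
        colour[t] = GREY
        for nxt in wait_for.get(t, ()):
            if colour.get(nxt, WHITE) == GREY:       # back edge => cycle
                return True
            if colour.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in wait_for)

# T1 waits for T3 (which holds X), T3 waits for T2 (holds Y), T2 waits for T1 (holds Z).
print(has_deadlock({"T1": {"T3"}, "T3": {"T2"}, "T2": {"T1"}}))   # True
print(has_deadlock({"T1": {"T2"}, "T2": set(), "T3": set()}))     # False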
8.a) Define equivalence transformation.
Ans: The equivalence rule says that expressions of two forms are the same or equivalent because both expressions
produce the same outputs on any legal database instance. It means that we can possibly replace the expression of the
first form with that of the second form and replace the expression of the second form with an expression of the first
form. Thus, the optimizer of the query-evaluation plan uses such an equivalence rule or method for transforming
expressions into the logically equivalent one.
8.b) What is parametric query?
Ans: Parameterized SQL queries allow you to place parameters in an SQL query instead of a constant value. A parameter
takes a value only when the query is executed, which allows the query to be reused with different values and for
different purposes. Parameterized SQL statements are available in some analysis clients, and are also available through
the Historian SDK.
For example, you could create the following conditional SQL query, which contains a parameter for the collector name:
SELECT * FROM ihtags WHERE collectorname = ? ORDER BY tagname
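The same idea in application code, sketched with Python's sqlite3 module; the employees table and its values are hypothetical. The '?' placeholder is filled in at execution time, so the statement can be reused with different values:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 10), ("Ravi", 20), ("Meena", 10)])

# One parameterized statement, reused with different parameter values.
query = "SELECT name FROM employees WHERE dept_id = ? ORDER BY name"
for dept in (10, 20):
    rows = conn.execute(query, (dept,)).fetchall()
    print(dept, [r[0] for r in rows])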
9.a) What do you mean by referential integrity?
Ans: The Referential Integrity Rule in a DBMS is based on primary and foreign keys. The rule states that a foreign key must have a matching primary key: a reference from one table to another table should be valid.
Referential Integrity Rule example −
<Employee> (EMP_ID, EMP_NAME, DEPT_ID)
<Department> (DEPT_ID, DEPT_NAME, DEPT_ZONE)
The rule states that the DEPT_ID in the Employee table has a matching valid DEPT_ID in the Department table.
To allow a join, the referential integrity rule also requires that the primary key and the foreign key have the same data type.
10.c) Write down the basic time stamp methods.
Ans: Every transaction is issued a timestamp based on when it enters the system. Suppose, if an old transaction Ti has
timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages
concurrent execution such that the timestamps determine the serializability order. The timestamp ordering protocol
ensures that any conflicting read and write operations are executed in timestamp order. Whenever some
Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp
of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is not violated. This describes the Basic TO protocol in
the following two cases.
1. Whenever a transaction T issues a W_item(X) operation, check the following conditions:
o If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
o Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a transaction T issues a R_item(X) operation, check the following conditions:
o If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
o If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).
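A compact sketch of these two rules in Python, keeping R_TS and W_TS per data item in dictionaries; the names are illustrative, and a real scheduler would also restart the aborted transaction:

read_ts, write_ts = {}, {}   # R_TS(X) and W_TS(X), defaulting to 0

def basic_to(op, trans_ts, item):
    """Apply the Basic Timestamp Ordering rules for a read or write request.
    Returns 'EXECUTE' if the operation may proceed, or 'ABORT' if the issuing
    transaction must be rolled back."""
    r, w = read_ts.get(item, 0), write_ts.get(item, 0)
    if op == "write":
        if r > trans_ts or w > trans_ts:      # a younger transaction already read/wrote X
            return "ABORT"
        write_ts[item] = trans_ts
    elif op == "read":
        if w > trans_ts:                      # a younger transaction already wrote X
            return "ABORT"
        read_ts[item] = max(r, trans_ts)
    return "EXECUTE"

print(basic_to("read", 5, "X"))    # EXECUTE, and R_TS(X) becomes 5
print(basic_to("write", 3, "X"))   # ABORT: an older write arrives after a younger read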
11.a) Distributed Transparency:
Distribution transparency is the property of distributed databases by the virtue of which the internal details of the
distribution are hidden from the users. The DDBMS designer may choose to fragment tables, replicate the fragments
and store them at different sites.
The three dimensions of distribution transparency are −
o Location transparency
o Fragmentation transparency
o Replication transparency
Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored locally at the user's site. The end user should be completely oblivious to the fact that the table or its fragments are stored at a remote site in the distributed database system. The address of the remote site(s) and the access mechanisms are completely hidden.
Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact
that the table the user is querying on is actually a fragment or union of some fragments. It also conceals the fact that the
fragments are located at diverse sites.
Replication Transparency
Replication transparency ensures that replication of the database is hidden from the users. It enables users to query a table as if only a single copy of the table exists.
Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user
updates a data item, the update is reflected in all the copies of the table. However, this operation should not be known
to the user. This is concurrency transparency. Also, in case of failure of a site, the user can still proceed with his queries
using replicated copies without any knowledge of failure. This is failure transparency.
11.b) Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator.
Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly, it can lead to strange situations. For example, when we try to update one data item that has copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We try to delete a record, but parts of it are left undeleted because, without our being aware of it, the data is also saved somewhere else.
• Insertion anomalies − We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a
relation must have atomic domains. The values in an atomic domain are indivisible units.
To convert a relation (table) to First Normal Form, we re-arrange it so that each attribute contains only a single value from its pre-defined domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute that is part of a candidate key is known as a prime attribute.
• Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime attribute.
For a relation to be in Second Normal Form, every non-prime attribute should be fully functionally dependent on the candidate key. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.
We see here in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent upon both and not on either of the prime key attributes individually. But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second Normal Form. We therefore break the relation into two relations, so that no partial dependency remains.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy −
• No non-prime attribute is transitively dependent on a prime key attribute.
• For any non-trivial functional dependency X → A, either −
o X is a superkey, or
o A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can
be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID →
Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that −
•
For any non-trivial functional dependency, X → A, X must be a super-key.
In the decomposed relations, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Which confirms that both the relations are in BCNF.
11.d) Fragmentation in DDBMS:
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments.
Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of horizontal and vertical). Horizontal
fragmentation can further be classified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called "reconstructiveness."
Advantages of Fragmentation
• Since data is stored close to the site of usage, efficiency of the database system is increased.
• Local query optimization techniques are sufficient for most queries since data is locally available.
• Since irrelevant data is not available at the sites, security and privacy of the database system can be
maintained.
Disadvantages of Fragmentation
• When data from different fragments are required, the access speeds may be very low.
• In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
• Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a
site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain
reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical fragmentation can be used
to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a Student table having
the following schema.
STUDENT
Regd_No
Name
Course
Address
Semester
Fees
Marks
Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the database as
follows −
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more fields. Horizontal
fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of
the original base table.
For example, in the student schema, if the details of all students of Computer Science Course needs to be maintained at
the School of Computer Science, then the designer will horizontally fragment the database as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = "Computer Science";
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques are used. This is the most
flexible fragmentation technique since it generates fragments with minimal extraneous information. However,
reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.
2016 – 17
7.b) Explain the advantages of Distributed DBMS.
Ans:
• The database is easier to expand, as it is already spread across multiple systems and it is not too complicated to add a system.
• The distributed database can have the data arranged according to different levels of transparency, i.e., data with different transparency levels can be stored at different locations.
• The database can be stored according to the departmental information in an organisation. In that case, it is easier to support organisational hierarchical access.
• If there were a natural catastrophe such as a fire or an earthquake, not all of the data would be destroyed, as it is stored at different locations.
• It is cheaper to create a network of systems containing a part of the database. This database can also be easily increased or decreased.
• Even if some of the data nodes go offline, the rest of the database can continue its normal functions.
9.a) What is Package in Oracle? What is its advantage.
Ans: A package is a collection object that contains definitions for a group of related small functions or programs.
It includes various entities like the variables, constants, cursors, exceptions, procedures, and many more. All packages
have a specification and a body.
Advantages
• Helps in making the code modular.
• Provides security by hiding the implementation details.
• Helps in improving the functionality.
• Makes it easy to use the pre-compiled code.
• Allows the user to get quick authorization and access.
2015 – 16
3. Describe about view in SQL.
Ans: Views in SQL are a kind of virtual table. A view also has rows and columns, as in a real table in the database.
We can create a view by selecting fields from one or more tables present in the database. A View can either have all the
rows of a table or specific rows based on certain condition.
We can create View using CREATE VIEW statement. A View can be created from a single table or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
6. Define the concept of aggregation.
Ans: In aggregation, the relationship between two entities is treated as a single entity; the relationship together with its corresponding entities is aggregated into a higher-level entity.
For example, the Center entity, the Course entity, and the "offers" relationship between them act as a single entity that participates in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, they will never enquire about the Course alone or about the Center alone; they will enquire about both.
8.c) Write a short note on Triggers in a database.
Ans: A trigger is a stored procedure in the database that is automatically invoked whenever a specified event occurs in the database. For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are being updated.
Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing trigger with the trigger_name.
2. [before | after]: Specifies when the trigger will be executed.
3. {insert | update | delete}: Specifies the DML operation.
4. on [table_name]: Specifies the name of the table associated with the trigger.
5. [for each row]: Specifies a row-level trigger, i.e., the trigger will be executed for each row being affected.
6. [trigger_body]: Provides the operation to be performed when the trigger is fired.
BEFORE and AFTER of Trigger:
BEFORE triggers run the trigger action before the triggering statement is run. AFTER triggers run the trigger action after
the triggering statement is run.
10.a) What is Query Optimization? Write the steps of Query Optimization.
Ans: Query optimization is of great importance for the performance of a relational database, especially for the execution
of complex SQL statements. A query optimizer decides the best methods for implementing each query.
The query optimizer selects, for instance, whether or not to use indexes for a given query, and which join methods to
use when joining multiple tables. These decisions have a tremendous effect on SQL performance, and query
optimization is a key technology for every application, from operational Systems to data warehouse and analytical
systems to content management systems.
The various principles of query optimization are as follows −
• Understand how your database is executing your query − The first phase of query optimization is understanding what the database is doing. Different databases have different commands for this. For example, in MySQL, one can use the "EXPLAIN [SQL Query]" keyword to see the query plan. In Oracle, one can use "EXPLAIN PLAN FOR [SQL Query]" to see the query plan.
• Retrieve as little data as possible − The more information returned from the query, the more resources the database has to expend to process and ship these records. For example, if you only need to fetch one column from a table, do not use 'SELECT *'.
• Store intermediate results − Sometimes the logic for a query can be quite complex. It is possible to produce the desired results through the use of subqueries, inline views, and UNION-type statements. With those approaches, the intermediate results are not saved in the database but are used directly within the query. This can lead to performance issues, particularly when the intermediate results have a huge number of rows.
Steps of Query Optimization:
Query optimization involves three steps, namely query tree generation, plan generation, and query plan code
generation.
Step 1 − Query Tree Generation
A query tree is a tree data structure representing a relational algebra expression. The tables of the query are
represented as leaf nodes. The relational algebra operations are represented as the internal nodes. The root represents
the query as a whole.
During execution, an internal node is executed whenever its operand tables are available. The node is then replaced by
the result table. This process continues for all internal nodes until the root node is executed and replaced by the result
table.
For example, let us consider the following schemas −
EMPLOYEE (EmpID, EName, Salary, DeptNo, DateOfJoining)
DEPARTMENT (DNo, DName, Location)
Example 1
Let us consider the query as the following.
$$\pi_{EmpID} (\sigma_{EName = \text{"ArunKumar"}} (EMPLOYEE))$$
The corresponding query tree will be −
Example 2
Let us consider another query involving a join.
$$\pi_{EName, Salary} (\sigma_{DName = \text{"Marketing"}} (DEPARTMENT) \bowtie_{DNo=DeptNo} EMPLOYEE)$$
Following is the query tree for the above query.
Step 2 − Query Plan Generation
After the query tree is generated, a query plan is made. A query plan is an extended query tree that includes access
paths for all operations in the query tree. Access paths specify how the relational operations in the tree should be
performed. For example, a selection operation can have an access path that gives details about the use of B+ tree index
for selection.
Besides, a query plan also states how the intermediate tables should be passed from one operator to the next, how
temporary tables should be used and how operations should be pipelined/combined.
Step 3− Code Generation
Code generation is the final step in query optimization. It is the executable form of the query, whose form depends upon
the type of the underlying operating system. Once the query code is generated, the Execution Manager runs it and
produces the results.
Approaches to Query Optimization
Among the approaches for query optimization, exhaustive search and heuristics-based algorithms are mostly used.
Exhaustive Search Optimization
In these techniques, for a query, all possible query plans are initially generated and then the best plan is selected.
Though these techniques provide the best solution, it has an exponential time and space complexity owing to the large
solution space. For example, dynamic programming technique.
Heuristic Based Optimization
Heuristic based optimization uses rule-based optimization approaches for query optimization. These algorithms have
polynomial time and space complexity, which is lower than the exponential complexity of exhaustive search-based
algorithms. However, these algorithms do not necessarily produce the best query plan.
Some of the common heuristic rules are −
• Perform select and project operations before join operations. This is done by moving the select and project operations down the query tree. This reduces the number of tuples available for the join.
• Perform the most restrictive select/project operations first, before the other operations.
• Avoid cross-product operations, since they result in very large intermediate tables.
10.b) What is schedule and when it is called conflict serializable?
Ans: A series of operations from one transaction to another transaction is known as a Schedule.
A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting
operations.
Conflicting operations: Two operations are said to be conflicting if all of the following conditions are satisfied:
• They belong to different transactions.
• They operate on the same data item.
• At least one of them is a write operation.
10.c) Describe briefly the process how one can test whether a non-serial-schedule is conflict serializable or not.
Ans: Consider the following schedule:
S1: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)
If Oi and Oj are two operations in a transaction and Oi < Oj (Oi is executed before Oj), then the same order will follow in the
schedule as well. Using this property, we can get two transactions of schedule S1:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible Serial Schedules are: T1->T2 or T2->T1
-> Swapping non-conflicting operations R2(A) and R1(B) in S1, the schedule becomes,
S11: R1(A), W1(A), R1(B), W2(A), R2(A), W1(B), R2(B), W2(B)
-> Similarly, swapping non-conflicting operations W2(A) and W1(B) in S11, the schedule becomes,
S12: R1(A), W1(A), R1(B), W1(B), R2(A), W2(A), R2(B), W2(B)
S12 is a serial schedule in which all operations of T1 are performed before starting any operation of T2. Since S1 has been transformed into the serial schedule S12 by swapping non-conflicting operations, S1 is conflict serializable.
Let us take another Schedule:
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)
Two transactions will be:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible Serial Schedules are: T1->T2 or T2->T1
Original Schedule is as:
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)
Swapping non-conflicting operations R1(A) and R2(B) in S2, the schedule becomes,
S21: R2(A), W2(A), R2(B), W1(A), R1(B), W1(B), R1(A), W2(B)
Similarly, swapping non-conflicting operations W1(A) and W2(B) in S21, the schedule becomes,
S22: R2(A), W2(A), R2(B), W2(B), R1(B), W1(B), R1(A), W1(A)
In schedule S22, all operations of T2 are performed first, but operations of T1 are not in order (order should be R1(A),
W1(A), R1(B), W1(B)). So S2 is not conflict serializable.
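Swapping operations by hand quickly becomes tedious; an equivalent and commonly used test builds a precedence graph (one node per transaction, an edge Ti → Tj for every pair of conflicting operations where Ti acts first) and checks it for a cycle. A rough sketch in Python, assuming a schedule written as (transaction, operation, item) tuples:

from itertools import combinations

def is_conflict_serializable(schedule):
    """schedule: list of (transaction, op, item) tuples, op in {'R', 'W'},
    given in execution order. Returns True iff the precedence graph is acyclic."""
    edges = set()
    for (i, (t1, op1, x1)), (j, (t2, op2, x2)) in combinations(enumerate(schedule), 2):
        # Conflicting pair: different transactions, same item, at least one write.
        if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
            edges.add((t1, t2))          # earlier operation's transaction points to the later one

    nodes = {t for t, _, _ in schedule}
    graph = {t: {b for a, b in edges if a == t} for t in nodes}

    def cyclic(node, path):
        if node in path:
            return True
        return any(cyclic(n, path | {node}) for n in graph[node])

    return not any(cyclic(t, set()) for t in nodes)

# S2 from above: R2(A) W2(A) R1(A) W1(A) R1(B) W1(B) R2(B) W2(B)
s2 = [("T2","R","A"), ("T2","W","A"), ("T1","R","A"), ("T1","W","A"),
      ("T1","R","B"), ("T1","W","B"), ("T2","R","B"), ("T2","W","B")]
print(is_conflict_serializable(s2))   # False: edges T2->T1 (on A) and T1->T2 (on B) form a cycle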
11. Short Notes.
a) Distributed Failures:
Designing a reliable system that can recover from failures requires identifying the types of failures with which the
system has to deal. In a distributed database system, we need to deal with four types of failures: transaction failures
(aborts), site (system) failures, media (disk) failures, and communication line failures. Some of these are due to
hardware and others are due to software.
1. Transaction Failures: Transactions can fail for a number of reasons. Failure can be due to an error in the transaction
caused by incorrect input data as well as the detection of a present or potential deadlock. Furthermore, some
concurrency control algorithms do not permit a transaction to proceed or even to wait if the data that they attempt to
access are currently being accessed by another transaction. This might also be considered a failure. The usual approach
to take in cases of transaction failure is to abort the transaction, thus resetting the database to its state prior to the start
of this transaction.
2. Site (System) Failures: The reasons for system failure can be traced back to a hardware or to a software failure. The
system failure is always assumed to result in the loss of main memory contents. Therefore, any part of the database that
was in main memory buffers is lost as a result of a system failure. However, the database that is stored in secondary
storage is assumed to be safe and correct. In distributed database terminology, system failures are typically referred to
as site failures, since they result in the failed site being unreachable from other sites in the distributed system. We
typically differentiate between partial and total failures in a distributed system. Total failure refers to the simultaneous
failure of all sites in the distributed system; partial failure indicates the failure of only some sites while the others remain
operational.
3. Media Failures: Media failure refers to the failures of the secondary storage devices that store the database. Such
failures may be due to operating system errors, as well as to hardware faults such as head crashes or controller failures.
The important point is that all or part of the database that is on the secondary storage is considered to be destroyed and
inaccessible. Duplexing of disk storage and maintaining archival copies of the database are common techniques that deal
with this sort of catastrophic problem. Media failures are frequently treated as problems local to one site and therefore
not specifically addressed in the reliability mechanisms of distributed DBMSs.
4. Communication Failures: There are a number of types of communication failures. The most common ones are the
errors in the messages, improperly ordered messages, lost messages, and communication line failures. The first two
errors are the responsibility of the computer network; we will not consider them further. Therefore, in our discussions of
distributed DBMS reliability, we expect the underlying computer network hardware and software to ensure that two
messages sent from a process at some originating site to another process at some destination site are delivered without
error and in the order in which they were sent. Lost or undeliverable messages are typically the consequence of
communication line failures or (destination) site failures. If a communication line fails, in addition to losing the
message(s) in transit, it may also divide the network into two or more disjoint groups. This is called network partitioning.
If the network is partitioned, the sites in each partition may continue to operate. In this case, executing transactions that
access data stored in multiple partitions becomes a major issue.
b) OODBMS and ORDBMS:
OODBMS
The object-oriented database system is an extension of an object-oriented programming language that includes DBMS
functions such as persistent objects, integrity constraints, failure recovery, transaction management, and query
processing. These systems feature object description language (ODL) for database structure creation and object query
language (OQL) for database querying. Some examples of OODBMS are ObjectStore, Objectivity/DB, GemStone, db4o,
Giga Base, and Zope object database.
ORDBMS
An object-relational database system is a relational database system that has been extended to incorporate object-oriented characteristics. Database schemas and the query language natively support objects, classes, and inheritance. Furthermore, it permits data model expansion with new data types and procedures, unlike pure relational systems.
Oracle, DB2, Informix, PostgreSQL (UC Berkeley research project), etc. are some of the ORDBMSs.
Difference between OODBMS and ORDBMS:
1. OODBMS stands for Object Oriented Database Management System. / ORDBMS stands for Object Relational Database Management System.
2. Object-oriented databases, like object-oriented programming, represent data in the form of objects and classes. / An object-relational database is based on both the relational and the object-oriented database models.
3. OODBMSs support ODL/OQL. / ORDBMSs add object-oriented functionality to SQL.
4. Every object-oriented system has a different set of constraints that it can accommodate. / Keys, entity integrity, and referential integrity are constraints of an object-relational database.
5. The efficiency of query processing is low. / Processing of queries is quite efficient.
2015
2. Why BCNF is stronger than 3NF? “All candidate key(s) is / are super key(s), but all super key(s) is / are not candidate
key(s)” - justify.
Ans:
1. BCNF is a normal form used in database normalization.
2. 3NF is the third normal form used in database normalization.
3. BCNF is stricter than 3NF because every relation in BCNF is also in 3NF, but every relation in 3NF is not necessarily in BCNF.
4. In BCNF, for every non-trivial functional dependency X → A, X must be a super key (candidate key); 3NF relaxes this by additionally allowing A to be a prime attribute.
5. Hence BCNF is stricter than 3NF.
All candidate key(s) is / are super key(s), but all super key(s) is / are not candidate key(s)
• A super key is a single key or a group of multiple keys that can uniquely identify tuples in a table.
• Super keys can contain redundant attributes that might not be important for identifying tuples.
• Super keys are a superset of candidate keys.
• Candidate keys are a subset of super keys. They contain only those attributes which are required to identify tuples uniquely.
• All candidate keys are super keys, but the vice-versa is not true.
3. What is functional dependency? Explain with an example.
Ans: Functional dependency refers to the relation of one attribute of the database to another. With the help of functional
dependency, the quality of the data in the database can be maintained.
The symbol for representing functional dependency is -> (arrow).
Example of Functional Dependency
Consider the following table.
Employee Number | Name | City | Salary
1 | bob | Bangalore | 25000
2 | Lucky | Delhi | 40000
The details of the name of the employee, salary and city are obtained by the value of the number of Employee (or id of
an employee). So, it can be said that the city, salary and the name attributes are functionally dependent on the attribute
Employee Number.
Example
SSN -> ENAME, read as "ENAME is functionally dependent on SSN", or "SSN determines ENAME".
PNUMBER -> {PNAME, PLOCATION} (PNUMBER determines PNAME and PLOCATION)
{SSN, PNUMBER} -> HOURS (SSN and PNUMBER combined determine HOURS)
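A functional dependency X → Y holds in a table when no two rows agree on X but differ on Y. A small sketch that checks this over rows represented as dictionaries; the table and attribute names below are illustrative:

def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.
    rows: list of dicts; lhs, rhs: tuples of attribute names."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False            # same X value maps to two different Y values
        seen[x] = y
    return True

employees = [
    {"EmpNo": 1, "Name": "bob",   "City": "Bangalore", "Salary": 25000},
    {"EmpNo": 2, "Name": "Lucky", "City": "Delhi",     "Salary": 40000},
]
print(fd_holds(employees, ("EmpNo",), ("Name", "City", "Salary")))  # True
print(fd_holds(employees, ("City",), ("EmpNo",)))                   # True here, but not guaranteed in general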
8.a) What are the criteria to be satisfied during fragmentation?
Ans: Fragmentation is a process of dividing the whole or full database into various sub tables or sub relations so that
data can be stored in different systems. The small pieces of sub relations or sub tables are called fragments. These
fragments are called logical data units and are stored at various sites. It must be made sure that the fragments are such
that they can be used to reconstruct the original relation (i.e., there isn’t any loss of data).
In the fragmentation process, let’s say, if a table T is fragmented and is divided into a number of fragments say T1, T2,
T3…. TN. The fragments contain sufficient information to allow the restoration of the original table T. This restoration
can be done by the use of UNION or JOIN operations on the various fragments. This process is called data fragmentation. All of these fragments are independent, which means that no fragment can be derived from the others. The users need not be logically concerned about fragmentation; that is, they should not have to be aware that the data is fragmented. This property is called fragmentation independence, or fragmentation transparency.
8.b) Describe how replication affects the implementation of distributed database.
Ans: Distributed Database Replication is the process of creating and maintaining multiple copies (redundancy) of data in
different sites. The main benefit it brings to the table is that duplication of data ensures faster retrieval. This eliminates
single points of failure and data loss issues if one site fails to deliver user requests, and hence provides you and your
teams with a fault-tolerant system.
However, Distributed Database Replication also has some disadvantages. To ensure accurate and correct responses to
user queries, data must be constantly updated and synchronized at all times. Failure to do so will create inconsistencies
in data, which can hamper business goals and decisions for other teams.
2015 – 16 (9.a)
Give an example of a relation schema R and a set of dependencies such that R is in BCNF, but is not in 4NF.
Ans: Consider the relation schema R = (A, B, C) with no non-trivial functional dependencies, but with the multivalued dependency
A →→ B
Since R has no non-trivial functional dependencies, it is trivially in BCNF. However, A →→ B is a non-trivial multivalued dependency and A is not a super key of R, so R is not in 4NF.
2016 – 17 (5)
What is indexing and what are the different types of indexing?
Ans:
• Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.
• An index is a type of data structure. It is used to locate and access the data in a database table quickly. Indexes can be created on one or more database columns.
• The first column of the index is the search key, which contains a copy of the primary key or candidate key of the table. The values of the search key are stored in sorted order so that the corresponding data can be accessed easily.
• The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
The common types of indexing are primary indexing (on the ordering key field), clustering indexing (on an ordering non-key field), and secondary indexing (on a non-ordering field); indexes may also be single-level or multi-level (e.g., B+ tree indexes). A small sketch of the search-key/data-reference idea follows.
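A minimal illustration of that two-column structure in Python: a sorted list of (search key, data reference) pairs searched with binary search. The record addresses are made up; a real DBMS would store block addresses and typically use B+ trees rather than a flat list.

import bisect

# index entries: (search key, data reference), kept sorted by search key
index = [(101, "block-7"), (205, "block-2"), (310, "block-9"), (452, "block-4")]
keys = [k for k, _ in index]

def lookup(key):
    """Return the data reference (disk block address) for a search key, if present."""
    pos = bisect.bisect_left(keys, key)      # binary search over the sorted keys
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]
    return None

print(lookup(310))   # block-9
print(lookup(999))   # None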
(Extra)
Recovery in DBMS.
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and
robustness of a DBMS depends on its complex architecture and its underlying hardware and system software. If it fails or
crashes amid transactions, it is expected that the system would follow some sort of algorithm or techniques to recover
lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can’t go any further. This is
called transaction failure where only a few transactions or processes are hurt.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some code error or some internal error condition.
• System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and cause the system to
crash. For example, interruptions in power supply may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail
frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which
destroys all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
• Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded on the chipset itself. For example, main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity, but slower in accessibility. Examples may include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify
the data items. Transactions are made of various operations, which are atomic in nature. But according to ACID
properties of DBMS, atomicity of transactions as a whole must be maintained, that is, either all the operations are
executed or none.
When a DBMS recovers from a crash, it should maintain the following −
• It should check the states of all the transactions that were being executed.
• A transaction may have been in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or needs to be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining the atomicity of a
transaction −
• Maintaining the log of each transaction, and writing the log records onto some stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are made on volatile memory, and later the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction. It is important that the
logs are written prior to the actual modification and stored on a stable storage media, which is failsafe.
Log-based recovery works as follows −
• The log file is kept on a stable storage medium.
• When a transaction enters the system and starts execution, it writes a log record about it: <Tn, Start>
• When the transaction modifies an item X, it writes a log record of the form <Tn, X, V1, V2>, which reads as: Tn has changed the value of X from V1 to V2.
• When the transaction finishes, it logs: <Tn, commit>
The database can be modified using two approaches −
• Deferred database modification − All logs are written on to the stable storage and the database is updated when a transaction commits.
• Immediate database modification − Each log record follows an actual database modification. That is, the database is modified immediately after every operation.
Recovery with Concurrent Transactions
When more than one transaction are being executed in parallel, the logs are interleaved. At the time of recovery, it
would become hard for the recovery system to backtrack all logs, and then start recovering. To ease this situation, most
modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the memory space available in the
system. As time passes, the log file may grow too big to be handled at all. Checkpoint is a mechanism where all the
previous logs are removed from the system and stored permanently in a storage disk. Checkpoint declares a point
before which the DBMS was in consistent state, and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
• The recovery system reads the logs backwards from the end to the last checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list are redone using their log records.
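A rough sketch of how the undo-list and redo-list are built from the log read after a crash; the log format below follows the <Tn, Start>/<Tn, Commit> records above, and the tuples are simplified placeholders (the actual undo/redo of data values is omitted):

def classify_transactions(log):
    """log: list of (transaction, record) tuples read after a crash, e.g.
    ("T1", "start"), ("T1", "commit").  Returns (redo_list, undo_list)."""
    started, committed, aborted = [], set(), set()
    for txn, record in log:
        if record == "start" and txn not in started:
            started.append(txn)
        elif record == "commit":
            committed.add(txn)
        elif record == "abort":
            aborted.add(txn)
    redo_list = [t for t in started if t in committed]                      # start + commit seen
    undo_list = [t for t in started if t not in committed and t not in aborted]  # start seen, no commit/abort
    return redo_list, undo_list

crash_log = [("T1", "start"), ("T1", "commit"),
             ("T2", "start"),
             ("T3", "start"), ("T3", "commit")]
print(classify_transactions(crash_log))   # (['T1', 'T3'], ['T2'])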