The introduction of databases has changed the way data is stored in organisations. A
database is a shared collection of logically related data, together with a description
of this data, designed to meet the information needs of an organisation. In the early
stages of their introduction, database systems were used for managing data in business
applications, record keeping and reservation systems, which had four requirements:
• Efficiency: fast access to and modification of large amounts of data.
• Resilience: the ability of the data to survive any kind of hardware or
software crash.
• Access control: simultaneous access to data by multiple users in a consistent
manner, and the prevention of unauthorised access to data.
• Persistence: the maintenance of data over long periods of time, independent of
any programs that access the data.
Database Management Systems, which enable users to define, create,
maintain, and control access to the data, are now used in almost every computing
environment to organise, create and maintain important collections of information.
In the 1970s there were two popular approaches to constructing database
management systems. The first approach, exemplified by IBM’s Information
Management System (IMS), has a data model that requires all data records to be assembled
into a collection of trees. Consequently, some records are root records and all others
have unique parent records. The query language permits an application programmer
to navigate from root records to the records of interest, accessing one record at a time.
IBM restricted IMS to the management of hierarchies of records in order
to allow the use of serial storage devices, most notably magnetic tape, which was a
requirement at that time. Although it was one of the first commercial DBMSs, IMS
is still the hierarchical DBMS used by most large mainframe
installations.
At that time, another significant development was the emergence of IDS
(Integrated Data Store) from General Electric. This development led to a new type
of database system known as the network DBMS. The network model was designed to
represent more complex data relationships than those that can be represented
by hierarchical database systems. It was developed according to the standards of
the Conference on Data Systems Languages (CODASYL), which suggested that a collection of DBMS
records be arranged into a directed graph. Again, a navigational query language was
designed, by which an application program could move from a specific entry point
record to the desired information.
Both tree based (Hierarchical) and graph based (Network) approaches to data
management have several fundamental disadvantages. Consider the following
examples:
1. To answer a specific database request, an application programmer,
skilled in performing disk-oriented optimization, must write a complex program to
navigate through the database. For example, the company president cannot, at short
notice, receive a response to the query "How many employees in the Widget
department will retire in the next three years?" unless a program exists to count
departmental retirees.
2. When the structure of the database changes, as it will whenever new
kinds of information are added, application programs usually need to be rewritten.
As a result, the database systems of the 1970s were costly to use, both because of the
low-level interface between the application program and the DBMS and because the
dynamic nature of user data mandates continual program maintenance.
The relational data model, proposed by Dr. E. F. Codd, offered a
fundamentally different approach to data storage. It was introduced to overcome
several shortcomings of the earlier generations of hierarchical and network database
systems, which suffered from limited independence between the data and the application programs.
Although physical design remains an important issue with relational systems, the use
of declarative query languages and a uniform logical data model insulates the
programmer from concerns with physical storage.
Earlier systems required a navigational approach to locating and accessing data. Both
earlier models rely on nested structures encapsulating one-to-many relationships. To
retrieve instances of related entity types, a program must start from a predefined
access point and locate a record of one type. It then iterates over the related records of
the next type, and so on until all required data has been found. Where a
record can be reached in more than one way, efficiency depends on the
programmer’s ability to select the access path that is appropriate to the query.
The navigational approach is well suited to certain classes of problems,
particularly those involving complex nested objects, recursive relationships and graph
traversal.
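As a rough illustration of the navigational style, the following Python sketch stores
records as a tree and answers the retirement query from the earlier example by walking
from a root record to its children, one record at a time. The record layout, the field
names (name, retire_year) and the sample data are assumptions made for illustration;
they are not the actual IMS or CODASYL interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record in a hierarchical store: one parent, any number of children."""
    kind: str
    data: dict
    children: list = field(default_factory=list)


def count_retirees(root, department, first_year, last_year):
    """Navigate root -> department -> employee, visiting one record at a time."""
    count = 0
    for dept in root.children:            # step 1: locate the department record
        if dept.kind != "department" or dept.data["name"] != department:
            continue
        for emp in dept.children:         # step 2: iterate over its employee records
            if emp.kind == "employee" and first_year <= emp.data["retire_year"] <= last_year:
                count += 1
    return count


# Hypothetical sample data.
company = Record("company", {"name": "Acme"}, [
    Record("department", {"name": "Widget"}, [
        Record("employee", {"name": "Ann", "retire_year": 2026}),
        Record("employee", {"name": "Bob", "retire_year": 2040}),
    ]),
    Record("department", {"name": "Gadget"}, [
        Record("employee", {"name": "Cy", "retire_year": 2025}),
    ]),
])

widget_retirees = count_retirees(company, "Widget", 2025, 2027)
```

Note that the traversal order is fixed in the program: if the tree were reorganised, for
example by grouping employees under locations instead of departments, this function
would have to be rewritten.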
Codd’s main objective in developing the relational model was to provide a
sound theoretical basis for a simple means of manipulating stored data. The data is
represented by simple tabular data structures (relations), and users access the data
through a high-level, nonprocedural (or declarative) query language. Instead of
writing an algorithm to obtain desired records one at a time, the application
programmer is only required to specify a predicate that identifies the desired records
or combination of records. A query optimizer in the DBMS translates the predicate
specification into an algorithm that performs the database access needed to answer the query.
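To illustrate the declarative style, the sketch below expresses the earlier retirement
query as a predicate, using Python's standard sqlite3 module with an in-memory
database. The employee table, its columns and the sample rows are assumptions chosen
for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, retire_year INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Widget", 2026), ("Bob", "Widget", 2040), ("Cy", "Gadget", 2025)],
)

# The predicate states WHAT is wanted; the DBMS decides HOW to find the rows.
query = """
    SELECT COUNT(*) FROM employee
    WHERE department = ? AND retire_year BETWEEN ? AND ?
"""
(widget_retirees,) = conn.execute(query, ("Widget", 2025, 2027)).fetchone()

# Adding an index changes the physical design, but the query text stays the same.
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")
(after_index,) = conn.execute(query, ("Widget", 2025, 2027)).fetchone()
```

The program contains no loop over records and no access path; whether the index is used
is left to the query optimizer.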
The model was intended to maximize data independence and minimize
redundancy. Wherever possible, application programs should be shielded from
changes in the data structure and organisation of data. If possible, all data values
should be stored only once, which not only minimises storage requirements but also
avoids the possibility of update anomalies when more than one copy of a value
needs to be changed.
The Relational Database Management System, which is based on the
relational model developed by Dr. E. F. Codd, has become the dominant data processing
software in use today.
Databases are sometimes regarded as electronic islands: distinct and
generally inaccessible places, like remote islands. This may be a result of
geographical separation, incompatible computer architectures, incompatible
communication protocols, and so on. Distributed DBMSs should help resolve the
islands-of-information problem.
In the late 1970s there was a realization that organizations are fundamentally
decentralized and require databases at multiple sites. For example, information about
the California customers of a company might be stored on a machine in Los Angeles,
while data about the New England customers could exist on a machine in Boston.
Such data distribution moves the data closer to the people who are responsible for it
and reduces remote communication costs. Furthermore, the decentralized system is
more likely to be available when crashes occur. If a single, central site goes down, all
data is unavailable. However, if one of several regional sites goes down, only part of
the total database is inaccessible. Moreover, if the company chooses to pay the cost of
multiple copies of important data, then a single site failure need not cause data
inaccessibility. This led to the emergence of the Distributed Database Management
System.
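The availability argument can be made concrete with a small sketch. The Python function
below takes a placement of data partitions on sites and reports which partitions can
still be reached when some sites are down. The site names (HQ, LA, Boston) and the
partition names are illustrative assumptions based on the example above.

```python
def reachable(placement, failed_sites):
    """Return the set of partitions that still have a copy on a working site.

    placement maps a partition name to the set of sites holding a copy of it.
    """
    failed = set(failed_sites)
    return {part for part, sites in placement.items() if sites - failed}


# Centralized: all data on one site.
central = {"california": {"HQ"}, "new_england": {"HQ"}}
# Distributed: each region's data is stored at its regional site.
regional = {"california": {"LA"}, "new_england": {"Boston"}}
# Distributed, with a second copy of each partition at the other site.
replicated = {"california": {"LA", "Boston"}, "new_england": {"Boston", "LA"}}

central_after_failure = sorted(reachable(central, ["HQ"]))
regional_after_failure = sorted(reachable(regional, ["LA"]))
replicated_after_failure = sorted(reachable(replicated, ["LA"]))
```

With one failed site, the centralized placement loses everything, the regional placement
loses only the partition stored at that site, and the replicated placement loses nothing.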
A Distributed Database Management System (DDBMS) consists of a single
logical database that is split into a number of fragments. Each fragment is stored on
one or more computers under the control of a separate DBMS, with the computers
connected by a communications network. Each site is capable of independently
processing user requests that require access to local data (that is, each site has some
degree of local autonomy) and is also capable of processing data stored on other
computers in the network.
Users access the distributed database via applications. Applications are
classified as those that do not require data from other sites (local applications) and
those that do require data from other sites (global applications). We require a
DDBMS to have at least one global application.
A DDBMS therefore has the following characteristics. It manages a collection of
logically related shared data, which is split into a number of fragments. These fragments
may be replicated and are allocated to sites that are linked by a network. The data at
each site is under the control of a DBMS that can handle local applications autonomously.
Each DBMS participates in at least one global application. It is not necessary for every
site to have its own local database; a site can be used only to run applications as well.
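A minimal sketch of the local/global distinction, assuming a hypothetical allocation of
fragments to sites: an application is treated as local if every fragment it needs has a
copy at the site where it runs, and as global otherwise. The fragment and site names
are made up for illustration.

```python
# Hypothetical allocation: fragment name -> sites that hold a copy of it.
ALLOCATION = {
    "customers_ca": {"LA"},
    "customers_ne": {"Boston"},
    "products": {"LA", "Boston"},   # a replicated fragment
}


def classify(app_site, fragments_needed, allocation=ALLOCATION):
    """Return 'local' if all needed fragments are stored at app_site, else 'global'."""
    for fragment in fragments_needed:
        if app_site not in allocation[fragment]:
            return "global"
    return "local"


billing_la = classify("LA", ["customers_ca", "products"])
national_report = classify("LA", ["customers_ca", "customers_ne"])
```

Here billing_la is a local application, while national_report needs the Boston fragment
and is therefore global. A site that stores no fragments at all can still run
applications, but every one of them will be global.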
Summary
The database management system has revolutionised the way data is stored,
managed and accessed. The introduction of the network and hierarchical database
management systems shifted data processing from traditional file-based
systems to the database environment. With the introduction of the relational database
management system, the emphasis moved to data independence and reduced redundancy.
As a result of data independence, applications were no longer required to understand the
storage architecture. The era of distributed systems then dramatically changed the
way data is transferred from one place to another: data transfer is now just
seconds away.
Database technology has taken us from a paradigm of data processing in which each
application defined and maintained its own data to one in which data is defined and
administered centrally. Recent times have seen rapid developments in network and data
communication technology, epitomized by the internet, mobile and wireless
computing, and intelligent devices. With the combination of these two
technologies, distributed database technology may change the mode of working from
centralized to decentralized.
What is a Distributed Database Management System?
A distributed database is a collection of multiple, logically interrelated databases
distributed over a network. A Distributed Database Management System (distributed
DBMS) is a software system that permits the management of distributed databases
and makes the distribution transparent to the users.
A distributed database system (DDBS) is not merely a collection of files that are
individually stored at each node of a computer network. To form a DDBS, the files should
not only be logically related, but there should also be structure among the files, and
access should be via a common interface.
Advantages of DDBMS:
Transparent Management of Distributed and Replicated Data
A transparent system is one that hides the implementation details from the user. The user
is unaware of the complexities involved in managing, storing and
retrieving the data. Depending on the level of transparency, the user can retrieve data
from any site very easily. The advantage of a fully transparent DBMS is the high
level of support that it provides for the development of complex applications.
Data Independence
Data independence is a fundamental form of transparency that we look for within a
DBMS. Data definition can occur at two levels. At one level the logical structure of
the data is defined, and at the other level the physical structure of the data is defined.
The former is commonly known as the schema definition, whereas the latter is referred
to as the physical data description. As a result we have two types of data
independence: logical data independence and physical data independence. Logical
data independence refers to the immunity of user applications to changes in the
logical structure of the database. Physical data independence deals with hiding the
details of the storage structure from user applications.
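One common way to approximate both kinds of independence is to let applications query a
view rather than the base tables. The sketch below is only an illustration under that
assumption, using Python's standard sqlite3 module; the table, view and column names
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann', 'Boston')")
conn.execute("CREATE VIEW customer_view AS SELECT id, name, city FROM customer")

# The application only ever issues this query against the view.
APP_QUERY = "SELECT name, city FROM customer_view WHERE id = ?"
before = conn.execute(APP_QUERY, (1,)).fetchone()

# Logical change: the city moves into a separate address table.
conn.execute("CREATE TABLE address (customer_id INTEGER, city TEXT)")
conn.execute("INSERT INTO address SELECT id, city FROM customer")
conn.execute("CREATE TABLE customer_new (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer_new SELECT id, name FROM customer")
conn.execute("DROP VIEW customer_view")
conn.execute("DROP TABLE customer")
conn.execute(
    "CREATE VIEW customer_view AS "
    "SELECT c.id, c.name, a.city FROM customer_new c "
    "JOIN address a ON a.customer_id = c.id"
)
after_logical_change = conn.execute(APP_QUERY, (1,)).fetchone()

# Physical change: a new index alters the storage structure only.
conn.execute("CREATE INDEX idx_address_customer ON address (customer_id)")
after_physical_change = conn.execute(APP_QUERY, (1,)).fetchone()
```

The application query is never edited, yet it returns the same row before and after
both changes.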
Network Transparency
In a centralized database environment, only the data needs to be shielded from the user.
In a distributed database management environment there is, apart from the data, another
resource that has to be shielded: the network. The user should be protected
from the operational details of the network and, if possible, should not even be
aware of the existence of a network. The user would then be unable to
differentiate between a distributed database and a centralized one. This type of
transparency is called network transparency.
Replication Transparency
For performance, reliability and availability reasons, it is usually desirable to be able
to distribute data in a replicated fashion across the machines on a network. Replication
helps performance, since diverse and conflicting user requirements can be more easily
accommodated. For example, data that is commonly accessed by one user can be
placed on that user’s local machine as well as on the machine of another user with the same
access requirements. This increases the locality of reference. Furthermore, if one of
the machines fails, a copy of the data is still available on another machine on the
network.
If data is replicated, it is preferable for the user not to be aware of the existence of
multiple copies, although in practice this is somewhat more difficult to implement.
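The sketch below hints at where the difficulty comes from. It uses a deliberately simple
read-one/write-all policy, which is an assumption chosen for illustration and not a
method described in this text: reads are served by any surviving copy, but a write must
reach every copy to keep the replicas identical. The class and site names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item stored as copies on several sites (illustrative only)."""

    def __init__(self, sites):
        self.copies = {site: None for site in sites}   # site -> stored value
        self.down = set()                              # sites currently unavailable

    def write(self, value):
        """Write-all: update every copy so that the replicas stay identical."""
        if self.down:
            raise RuntimeError("write-all needs every replica site to be up")
        for site in self.copies:
            self.copies[site] = value

    def read(self, preferred_site):
        """Read-one: use the local copy if possible, else any surviving copy."""
        others = [s for s in self.copies if s != preferred_site]
        for site in [preferred_site] + others:
            if site in self.copies and site not in self.down:
                return self.copies[site]
        raise RuntimeError("no replica is reachable")


price = ReplicatedItem(["LA", "Boston"])
price.write(42)
price.down.add("LA")          # the user's local machine fails
value = price.read("LA")      # still answered, from the Boston copy
```

The caller never names a replica when reading, which is the transparency the user sees.
The cost appears on the write path, which under this policy is blocked whenever any
replica site is down.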
Fragmentation Transparency
In a Distributed Database Management System it is generally advisable to
divide each database relation into smaller fragments and treat each fragment as a separate
database object. This is done to improve performance, reliability and availability.
Fragmentation can also reduce the negative effects of replication, since each replica is
not the full relation but only a subset of it.
When database objects are fragmented, we have to deal with queries that were specified
on entire relations but must now be executed on fragments. We have to ensure that the user is
unaware that the data is fragmented.
There are mainly two types of fragmentation. The first is horizontal fragmentation,
in which a fragment consists of a subset of the tuples of a relation. Horizontal fragmentation groups
together the tuples in a relation that are collectively used by the important transactions. A
horizontal fragment is produced by specifying a predicate that performs a restriction
on the tuples in the relation. This is done using the selection operation of the relational algebra.
The selection operation groups together tuples that have some common property.
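A minimal sketch of horizontal fragmentation in Python, with a relation modelled as a
list of dictionaries. The CUSTOMER relation, its attributes and the fragmentation
predicate are assumptions for illustration.

```python
# A relation as a list of tuples (dictionaries); the sample data is hypothetical.
CUSTOMER = [
    {"id": 1, "name": "Ann", "state": "CA"},
    {"id": 2, "name": "Bob", "state": "MA"},
    {"id": 3, "name": "Cy", "state": "CA"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Two horizontal fragments defined by complementary predicates.
customer_ca = select(CUSTOMER, lambda row: row["state"] == "CA")
customer_other = select(CUSTOMER, lambda row: row["state"] != "CA")

# The original relation is recovered as the union of the fragments.
reconstructed = sorted(customer_ca + customer_other, key=lambda row: row["id"])
```

Because the two predicates are complementary, every tuple lands in exactly one fragment
and the union of the fragments reproduces the original relation.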
The second type of fragmentation is vertical fragmentation, in which a fragment consists of a subset
of the attributes of a relation. Vertical fragmentation groups together the attributes in a
relation that are used jointly by the important transactions. A vertical fragment is
defined using the projection operation of the relational algebra. Apart from horizontal
and vertical fragmentation we also have mixed fragmentation, which consists of a
horizontal fragment that is subsequently vertically fragmented, or a vertical
fragment that is subsequently horizontally fragmented. A mixed fragment is
defined using the selection and projection operations of the relational algebra.
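Vertical and mixed fragmentation can be sketched in the same way. In the Python example
below, the EMPLOYEE relation and its attributes are hypothetical, and each vertical
fragment keeps the key attribute id so that the relation can be rebuilt with a join;
that is a standard design assumption rather than something stated in this text.

```python
# Hypothetical relation, modelled as a list of dictionaries.
EMPLOYEE = [
    {"id": 1, "name": "Ann", "dept": "Widget", "salary": 50000},
    {"id": 2, "name": "Bob", "dept": "Gadget", "salary": 60000},
]


def project(relation, attributes):
    """Relational projection: keep only the named attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Vertical fragments: each one keeps the key so the relation can be rebuilt.
emp_public = project(EMPLOYEE, ["id", "name", "dept"])
emp_payroll = project(EMPLOYEE, ["id", "salary"])
rebuilt = join_on_key(emp_public, emp_payroll, "id")

# Mixed fragment: a vertical fragment that is then horizontally fragmented.
widget_public = select(emp_public, lambda row: row["dept"] == "Widget")
```

Horizontal fragments are recombined by union, whereas vertical fragments are recombined
by a join on the shared key.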