The introduction of databases has changed the way data is stored in organisations. A database is a shared collection of logically related data, together with a description of this data, designed to meet the information needs of an organisation. In the early stages of their introduction, database systems were used to manage data in business applications such as record-keeping and reservation systems, which had four requirements:

1. Efficiency in the access to and modification of large amounts of data.
2. Resilience, or the ability of the data to survive any kind of hardware or software crash.
3. Access control, allowing multiple users to access the data simultaneously in a consistent manner while preventing unauthorised access.
4. Persistence, the maintenance of data over long periods of time independently of any program that accesses it.

Database Management Systems (DBMSs), which enable users to define, create, maintain and control access to the data, are now used in almost every computing environment to organise, create and maintain important collections of information.

In the 1970s there were two popular approaches to constructing database management systems. The first, exemplified by IBM's Information Management System (IMS), has a data model that requires all data records to be assembled into a collection of trees. Consequently, some records are root records and every other record has a unique parent record. The query language permits an application programmer to navigate from root records to the records of interest, accessing one record at a time. IBM restricted IMS to the management of hierarchies of records to allow the use of serial storage devices, most notably magnetic tape, which was a requirement at that time. Although it was one of the first commercial DBMSs, IMS remains the hierarchical DBMS of choice in many large mainframe installations. At about the same time, another significant development was the emergence of the Integrated Data Store (IDS) from General Electric.
These developments led to a new type of database system known as the network DBMS. The network model was developed partly to represent more complex data relationships than hierarchical database systems could. It was developed according to the standards of the Conference on Data Systems Languages (CODASYL), which proposed that collections of database records be arranged into a directed graph. Again, a navigational query language was designed, by which an application program could move from a specific entry-point record to the desired information.

Both the tree-based (hierarchical) and graph-based (network) approaches to data management have several fundamental disadvantages. Consider the following examples:

1. To answer a specific database request, an application programmer skilled in disk-oriented optimization must write a complex program to navigate through the database. For example, the company president cannot, at short notice, receive a response to the query "How many employees in the Widget department will retire in the next three years?" unless a program already exists to count departmental retirees.
2. When the structure of the database changes, as it will whenever new kinds of information are added, application programs usually need to be rewritten.

As a result, the database systems of the 1970s were costly to use, both because of the low-level interface between the application program and the DBMS and because the dynamic nature of user data mandated continual program maintenance.

The relational data model, proposed by Dr E. F. Codd, offered a fundamentally different approach to data storage. It was introduced to overcome several shortcomings of the earlier hierarchical and network database systems, which offered only limited independence between the data and the application programs.
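To make the first disadvantage concrete, here is a minimal Python sketch of the kind of navigational program the Widget question would require. It assumes a toy in-memory hierarchy (company, then department, then employee); the `Record` class, its fields and `count_widget_retirees` are hypothetical illustrations, not the actual IMS or CODASYL programming interface. Note how the program hard-codes the access path: it starts at the root and walks each level in turn, so a change to the hierarchy would mean rewriting it.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record in a toy hierarchical database (hypothetical structure).

    Every non-root record is reachable only through its unique parent.
    """
    kind: str                      # record type, e.g. "COMPANY", "DEPT", "EMP"
    fields: dict
    children: list = field(default_factory=list)


def count_widget_retirees(root, current_year, horizon=3):
    """Navigate from the root record, one record at a time, counting Widget
    employees whose retirement year lies in [current_year, current_year + horizon)."""
    count = 0
    for dept in root.children:                       # level 1: department records
        if dept.kind != "DEPT" or dept.fields["name"] != "Widget":
            continue
        for emp in dept.children:                    # level 2: employees of this department
            if emp.kind != "EMP":
                continue
            year = emp.fields["retire_year"]
            if current_year <= year < current_year + horizon:
                count += 1
    return count


company = Record("COMPANY", {"name": "Acme"}, [
    Record("DEPT", {"name": "Widget"}, [
        Record("EMP", {"name": "Ann", "retire_year": 2025}),
        Record("EMP", {"name": "Ben", "retire_year": 2031}),
        Record("EMP", {"name": "Cai", "retire_year": 2026}),
    ]),
    Record("DEPT", {"name": "Gadget"}, [
        Record("EMP", {"name": "Dee", "retire_year": 2025}),
    ]),
])

print(count_widget_retirees(company, 2024))  # 2
```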
Although physical design remains an important issue in relational systems, the use of declarative query languages and a uniform logical data model insulates the programmer from concerns about physical storage. Earlier systems required a navigational approach to locating and accessing data. Both earlier models rely on nested structures encapsulating one-to-many relationships. To retrieve instances of related entity types, a program must start from a predefined access point and locate a record of one type. It then iterates over the related records of the next type, and so on, until all the required data has been found. Where a record can be reached in more than one way, efficiency depends on the programmer's ability to select the access path appropriate to the query. The navigational approach is nonetheless well suited to certain classes of problems, particularly those involving complex nested objects, recursive relationships and graph traversal.

Codd's main objective in developing the relational model was to provide a sound theoretical basis for a simple means of manipulating stored data. In the relational model, data is represented by simple tabular structures (relations), and users access the data through a high-level, nonprocedural (declarative) query language. Instead of writing an algorithm to obtain the desired records one at a time, the application programmer is only required to specify a predicate that identifies the desired records or combination of records. A query optimizer in the DBMS translates the predicate into an algorithm that accesses the database to answer the query. The model was intended to maximize data independence and minimize redundancy. Wherever possible, application programs should be shielded from changes in the structure and organisation of the data.
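For contrast, here is a minimal declarative version of the Widget question, using Python's built-in `sqlite3` module as a stand-in relational DBMS. The `employee` table and its columns are assumptions for illustration. The program states only the predicate (the department is Widget and the retirement year falls within the next three years); SQLite's query planner decides how to find the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, retire_year INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Widget", 2025), ("Ben", "Widget", 2031),
     ("Cai", "Widget", 2026), ("Dee", "Gadget", 2025)],
)


def widget_retirees(conn, current_year, horizon=3):
    """Declarative query: only the predicate is stated; SQLite's planner
    chooses how to locate the rows. 'Next three years' is taken here to mean
    [current_year, current_year + horizon)."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM employee "
        "WHERE dept = ? AND retire_year >= ? AND retire_year < ?",
        ("Widget", current_year, current_year + horizon),
    ).fetchone()
    return n


print(widget_retirees(conn, 2024))  # 2
```

If the president later asks a different question, only the predicate in the query string changes; no new navigation code is needed.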
If possible, every data value should be stored only once; this not only minimises storage requirements but also avoids the possibility of update anomalies when more than one copy of a value needs to be changed. The relational database management system, based on the relational model developed by Dr E. F. Codd, has become the dominant data-processing software in use today.

Databases are sometimes regarded as electronic islands: distinct and generally inaccessible places, like remote islands. This may be a result of geographical separation, incompatible computer architectures, incompatible communication protocols, and so on. Distributed DBMSs should help resolve this islands-of-information problem. In the late 1970s there was a realization that organizations are fundamentally decentralized and require databases at multiple sites. For example, information about a company's California customers might be stored on a machine in Los Angeles, while data about its New England customers could reside on a machine in Boston. Such data distribution moves the data closer to the people who are responsible for it and reduces remote communication costs. Furthermore, a decentralized system is more likely to remain available when crashes occur. If a single central site goes down, all data is unavailable; if one of several regional sites goes down, only part of the total database is inaccessible. Moreover, if the company chooses to pay the cost of keeping multiple copies of important data, a single site failure need not make any data inaccessible. These considerations led to the emergence of the Distributed Database Management System.

A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network.
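The availability argument above can be sketched in a few lines of Python. This is an illustrative model only: an allocation maps each (hypothetical) fragment name to the set of sites holding a copy, and a fragment stays accessible as long as at least one of those sites is up. The three allocations mirror the three cases in the text: a single central site, regional sites without copies, and regional sites with replicated copies.

```python
def available_fragments(allocation, failed_sites):
    """Fragments that still have at least one copy on a site that is up.

    allocation maps a fragment name to the set of sites storing a copy of it.
    """
    return {frag for frag, sites in allocation.items() if sites - failed_sites}


# Hypothetical fragment and site names, following the California / New England example.
centralized = {
    "customers_CA": {"Central"},
    "customers_NE": {"Central"},
}
regional = {
    "customers_CA": {"LosAngeles"},
    "customers_NE": {"Boston"},
}
regional_replicated = {
    "customers_CA": {"LosAngeles", "Boston"},
    "customers_NE": {"Boston", "LosAngeles"},
}

print(available_fragments(centralized, {"Central"}))             # set(): nothing is accessible
print(available_fragments(regional, {"LosAngeles"}))             # {'customers_NE'}: partial loss
print(available_fragments(regional_replicated, {"LosAngeles"}))  # both fragments survive
```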
Each site is capable of independently processing user requests that require access to local data (that is, each site has some degree of local autonomy) and is also capable of processing data stored on other computers in the network. Users access the distributed database via applications, which are classified as those that do not require data from other sites (local applications) and those that do (global applications). We require a DDBMS to have at least one global application. A DDBMS is therefore:

A collection of logically related shared data which is split into a number of fragments. These fragments may be replicated and are allocated to sites which are linked by a communications network. The data at each site is under the control of a DBMS which can handle local applications autonomously. Each DBMS participates in at least one global application.

It is not necessary for every site to have its own local database; a site can also be used purely to run applications.

Summary

Database management systems have revolutionised the way data is stored, managed and accessed. The introduction of the network and hierarchical database management systems shifted data processing from traditional file-based systems to the database environment. With the introduction of the relational database management system, the emphasis moved to data independence and the reduction of redundancy; as a result of data independence, applications were no longer required to understand how the data was physically organised. The era of distributed systems then dramatically changed the way data is transferred from one place to another, so that remote data is now only seconds away. Database technology has taken us from a paradigm of data processing in which each application defined and maintained its own data to one in which data is defined and administered centrally.
In recent years there have also been rapid developments in network and data communication technology, epitomized by the Internet, mobile and wireless computing, and intelligent devices. With the combination of these two technologies, database and network, distributed database technology may change the mode of working from centralized to decentralized.

What is a Distributed Database Management System?

A distributed database is a collection of multiple, logically interrelated databases distributed over a network. A Distributed Database Management System (distributed DBMS) is a software system that permits the management of the distributed database and makes the distribution transparent to the users. A distributed database system (DDBS) is not simply a collection of files that can be individually stored at each node of a computer network. To form a DDBS, the files should not only be logically related but there should also be structure among them, and access should be via a common interface.

Advantages of DDBMS

Transparent Management of Distributed and Replicated Data

A transparent system hides implementation details from the user. The user is unaware of the complexities involved in managing, storing and retrieving the data and, depending on the level of transparency, can easily retrieve data from any site. The advantage of a fully transparent DBMS is the high level of support it provides for the development of complex applications.

Data Independence

Data independence is a fundamental form of transparency that we look for within a DBMS. Data definition can occur at two levels: at one level the logical structure of the data is defined, and at the other the physical structure. The former is commonly known as the schema definition, whereas the latter is referred to as the physical data description. As a result, there are two types of data independence: logical data independence and physical data independence.
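Physical data independence, in which application programs are shielded from changes to the storage structure, can be illustrated with SQLite via Python's standard `sqlite3` module (again only a stand-in; the `customer` table is hypothetical). The application function below states only which rows it wants. Adding an index changes the physical storage structure and may change the access path SQLite chooses, but the application code and its result stay the same.

```python
import sqlite3

QUERY = "SELECT name FROM customer WHERE region = ? ORDER BY name"


def california_customers(conn):
    # The application names only the data it wants; it never mentions files,
    # indexes or access paths.
    return [row[0] for row in conn.execute(QUERY, ("California",))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cno INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "California"), (2, "Bob", "New England"), (3, "Cy", "California")],
)

before = california_customers(conn)

# A change to the physical storage structure: add an index on region.
conn.execute("CREATE INDEX idx_customer_region ON customer(region)")

after = california_customers(conn)  # same application code, same answer
print(before, after)                # ['Ada', 'Cy'] ['Ada', 'Cy']

# The access path SQLite now chooses; the wording of the plan varies by SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("California",)):
    print(row[-1])
```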
Logical data independence refers to the immunity of user applications to changes in the logical structure of the database. Physical data independence deals with hiding the details of the storage structure from user applications.

Network Transparency

In a centralized database environment, only the data needs to be shielded from the user. In a distributed environment a second resource must also be shielded: the network. The user should be protected from the operational details of the network and, if possible, should not even be aware that a network exists, so that there is no apparent difference between a distributed and a centralized database. This type of transparency is called network transparency.

Replication Transparency

For performance, reliability and availability reasons, it is usually desirable to be able to distribute data in a replicated fashion across the machines on a network. Replication helps performance because diverse and conflicting user requirements can be more easily accommodated. For example, data that is commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements, which increases the locality of reference. Furthermore, if one of the machines fails, a copy of the data is still available on another machine on the network. If data is replicated, it is preferable for the user not to be aware of the existence of multiple copies, although this is difficult to achieve fully in practice.

Fragmentation Transparency

In a distributed database management system it is generally advisable to divide each database relation into smaller fragments and to treat each fragment as a separate database object. This is done to improve performance, reliability and availability. Fragmentation can also reduce the cost of replication, since each replica need not be the full relation but only a subset of it.
When database objects are fragmented, queries that are expressed over an entire relation have to be executed on its fragments, and the user should remain unaware that the data is fragmented. There are two main types of fragmentation. The first is horizontal fragmentation, in which each fragment consists of a subset of the tuples of a relation. Horizontal fragmentation groups together the tuples in a relation that are collectively used by the important transactions. A horizontal fragment is produced by specifying a predicate that performs a restriction on the tuples in the relation, using the selection operation of the relational algebra; the selection groups together tuples that share some common property. The second type is vertical fragmentation, in which each fragment consists of a subset of the attributes of a relation. Vertical fragmentation groups together the attributes in a relation that are used jointly by the important transactions. A vertical fragment is defined using the projection operation of the relational algebra. Apart from horizontal and vertical fragmentation there is also mixed fragmentation, which consists of a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is subsequently horizontally fragmented. A mixed fragmentation is defined using both the selection and projection operations of the relational algebra.
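A minimal Python sketch of the three kinds of fragmentation, assuming a hypothetical `customer` relation represented as a list of dictionaries. The helper functions `select` and `project` play the roles of the relational-algebra selection and projection operations; union and a join on the key are then used to check that the original relation can be rebuilt from its fragments, which is what allows the system to hide fragmentation from the user.

```python
def select(relation, predicate):
    """Selection: keep the tuples satisfying the predicate (a horizontal fragment)."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """Projection onto attrs, removing duplicate tuples (a vertical fragment)."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def join_on(left, right, key):
    """Join two vertical fragments on a key that is unique in `right`."""
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left if t[key] in index]


customer = [
    {"cno": 1, "name": "Ada", "region": "California", "credit": 500},
    {"cno": 2, "name": "Bob", "region": "New England", "credit": 300},
    {"cno": 3, "name": "Cy", "region": "California", "credit": 900},
]

# Horizontal fragmentation: one selection predicate per region.
h_ca = select(customer, lambda t: t["region"] == "California")
h_ne = select(customer, lambda t: t["region"] == "New England")

# Vertical fragmentation: two projections, both keeping the key cno.
v1 = project(customer, ["cno", "name", "region"])
v2 = project(customer, ["cno", "credit"])

# Mixed fragmentation: a horizontal fragment that is then vertically fragmented.
m_ca_credit = project(h_ca, ["cno", "credit"])

# Reconstruction: union of horizontal fragments, join of vertical fragments.
print(sorted(h_ca + h_ne, key=lambda t: t["cno"]) == customer)  # True
print(join_on(v1, v2, "cno") == customer)                        # True
print(m_ca_credit)  # [{'cno': 1, 'credit': 500}, {'cno': 3, 'credit': 900}]
```

Keeping the key in every vertical fragment is a design choice that makes reconstruction a simple join; without it the vertical fragments could not be reliably recombined.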