Distributed Database Definition:
A distributed database is a database that consists of two or more files located at different sites, either on the same network or on entirely different networks. Portions of the database are stored in multiple physical locations, and processing is distributed among multiple database nodes.

A distributed database management system (DDBMS) integrates the data logically so that it can be managed centrally, as if it were all stored in the same location. The DDBMS synchronizes the data periodically and ensures that updates and deletes performed at one location are automatically reflected in the data stored elsewhere.

By contrast, a centralized database consists of a single database file located at one site and accessed over a single network.

Distributed databases are designed to meet workload requirements without requiring changes to the database application. They address availability, fault tolerance, throughput, latency, scalability, and many other problems that can arise from relying on a single machine and a single database.

Features of Distributed Database:
In general, distributed databases include the following features:
Location independent
Distributed query processing: Distributed databases answer queries in a distributed environment that manages data at multiple sites. High-level queries are transformed into a query execution plan, which makes them simpler to manage.
Distributed transaction management: Keeps the distributed database consistent through commit protocols, distributed concurrency control techniques, and distributed recovery methods, even with many concurrent transactions and failures.
Hardware independent
Operating system independent
Network independent
Transaction transparency
DBMS independent

Advantages of Distributed Database:
Modular Development: Modular development means that a distributed database can be expanded to new locations or units by adding new servers and data to the existing setup and connecting them to the distributed system. This type of expansion does not interrupt the functioning of the distributed database.
Reliability: Distributed databases offer greater reliability than centralized databases. When a centralized database fails, the whole system comes to a complete stop. A distributed database keeps functioning when failures occur, delivering reduced performance until the issue is resolved.
Lower Communication Cost: Storing data locally reduces the communication cost of data manipulation in distributed databases. Local data storage is not possible in centralized databases.
Better Response: Efficient data distribution lets a distributed database respond faster when user requests can be met locally. In centralized databases, every user request passes through the central machine, which processes all requests. This increases response time, especially under a high volume of queries.

Disadvantages of Distributed Database:
Costly Software: Ensuring data transparency and coordination across multiple sites often requires expensive software.
Large Overhead: Many operations across multiple sites require numerous calculations and, when database replication is used, constant synchronization, which causes significant processing overhead.
Data Integrity: A possible issue with database replication is data integrity, which can be compromised when the same data is updated at multiple sites.
Improper Data Distribution: Responsiveness to user requests largely depends on proper data distribution.
That means responsiveness can suffer if data is not correctly distributed across multiple sites.

Practical Application of Distributed Database:
Your bank's systems, for instance, are distributed systems. Some systems handle the actual transactions against your checking and savings accounts, entire distributed systems process credit- and debit-card transactions, and still other systems deal with mortgages and loans, yet they all operate on your accounts.

Consider someone using a debit card to make a purchase. For illustration, let’s look at this diagram:
With the advent of “chipped” cards, even your card is part of the distributed system, but we will leave it out of this description.

1. The store has a Point-of-Sale (POS) system that handles the sale. In most cases this is a “System” in its own right. Unless you’re at a small neighborhood store, it has many “Terminals,” one at each cashier’s station, plus local servers that transfer data to corporate servers, and so on.
2. The card reader may be part of the POS, or, as is common nowadays, a separate system (e.g., Stripe or Square). In that case, when you swipe the card or tap your phone, that “System” sends the data to both the POS and the Payment Processor. This often involves the device connecting to a gateway server and pushing data to it, and the gateway then passes the data on to other systems and servers.
3. The Payment Processor is another “System,” and the card reader sends your information there. Its system is also distributed across many tiers of servers, each receiving the data, potentially transforming it, and processing it in some way. Eventually the processor has to send that data to the bank.
4. Your Bank is another “System.” The payment processor sends the bank data about the transaction you’re trying to make.
This request arrives at a “gateway” server, which may transform the information into a format that makes sense for the bank’s internal systems and then passes it on. Some servers and systems check for fraud or run other analytics, while others check whether there are sufficient funds to cover the transaction. All of this comes together, and finally some system or server gives the go-ahead to process the transaction, and that decision is sent back to the Payment Processor.

So just by buying groceries and paying with a credit or debit card, you’re using a massive distributed system in which each “layer” is composed of yet more distributed systems.
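To make the layering in this walkthrough concrete, here is a minimal Python sketch of the debit-card flow. It is an illustration under heavy simplifying assumptions: every class, method, and field name (PointOfSale, CardReader, PaymentProcessor, BankSystem, AuthRequest, AuthResponse) is hypothetical. All the “Systems” are collapsed into in-process objects, and the bank performs only two checks. In reality each layer would be a separate service reached over the network, and each would itself be distributed.

```python
from dataclasses import dataclass


@dataclass
class AuthRequest:
    """Hypothetical authorization message sent from the card reader onward."""
    card_number: str
    amount_cents: int
    merchant: str


@dataclass
class AuthResponse:
    """Hypothetical answer that travels back from the bank to the POS."""
    approved: bool
    reason: str


class BankSystem:
    """The bank: a gateway in front of separate fraud and funds checks."""

    def __init__(self, balances: dict[str, int], blocked_cards: set[str]):
        self.balances = balances            # stands in for the accounts service
        self.blocked_cards = blocked_cards  # stands in for the fraud service's data

    def gateway(self, request: AuthRequest) -> AuthResponse:
        # Translate the external message into the bank's internal format.
        internal = {"account": request.card_number, "amount": request.amount_cents}
        if not self.fraud_check(internal):
            return AuthResponse(False, "suspected fraud")
        if not self.funds_check(internal):
            return AuthResponse(False, "insufficient funds")
        # Both checks passed: give the go-ahead and debit the account
        # (a simplification; real systems usually place a hold and settle later).
        self.balances[internal["account"]] -= internal["amount"]
        return AuthResponse(True, "approved")

    def fraud_check(self, internal: dict) -> bool:
        return internal["account"] not in self.blocked_cards

    def funds_check(self, internal: dict) -> bool:
        account, amount = internal["account"], internal["amount"]
        return account in self.balances and self.balances[account] >= amount


class PaymentProcessor:
    """Receives card data from the reader and forwards it to the bank."""

    def __init__(self, bank: BankSystem):
        self.bank = bank  # a real processor would route among many banks

    def authorize(self, request: AuthRequest) -> AuthResponse:
        return self.bank.gateway(request)


class CardReader:
    """Forwards card data to the processor and reports the answer to the POS."""

    def __init__(self, processor: PaymentProcessor):
        self.processor = processor

    def tap(self, card_number: str, amount_cents: int, merchant: str) -> AuthResponse:
        return self.processor.authorize(AuthRequest(card_number, amount_cents, merchant))


class PointOfSale:
    """The store's POS: records a sale only if the bank approved it."""

    def __init__(self, reader: CardReader, merchant: str):
        self.reader = reader
        self.merchant = merchant
        self.completed_sales: list[int] = []

    def checkout(self, card_number: str, amount_cents: int) -> AuthResponse:
        response = self.reader.tap(card_number, amount_cents, self.merchant)
        if response.approved:
            self.completed_sales.append(amount_cents)
        return response


# Usage: wire the layers together, outermost (POS) to innermost (bank).
bank = BankSystem(balances={"card-001": 5000}, blocked_cards={"card-999"})
pos = PointOfSale(CardReader(PaymentProcessor(bank)), merchant="Grocery Store")

print(pos.checkout("card-001", 1250))   # AuthResponse(approved=True, reason='approved')
print(pos.checkout("card-001", 10000))  # AuthResponse(approved=False, reason='insufficient funds')
print(pos.checkout("card-999", 100))    # AuthResponse(approved=False, reason='suspected fraud')
print(bank.balances["card-001"])        # 3750
```

Each layer knows only about the next one, which mirrors the description above. The POS never talks to the bank directly, and the bank’s gateway is the only place where the external message is translated into the internal format.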