CAP Theorem is the idea that a distributed database system can provide only 2 of these 3 guarantees: Consistency,
Availability and Partition Tolerance.
CAP Theorem is very important in the Big Data world, especially when we need to make trade-offs between the three.
1. Partition Tolerance
This condition states that the system continues to run even when any number of messages are delayed or dropped by the network between nodes.
A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages. When dealing with modern distributed systems, Partition Tolerance is not an option. It’s a necessity. Hence, when a partition occurs, we have to trade between Consistency and Availability.
2. Consistency
This condition states that all nodes see the same data at the same time. Simply put, a read operation returns the value of the most recent write operation, no matter which node serves the read. In the transactional sense, a system has consistency if a transaction starts with the system in a consistent state and ends with the system in a consistent state. In this model, a system can (and does) shift into an inconsistent state during a transaction, but the entire transaction gets rolled back if there is an error during any stage in the process.
3. High Availability
This condition states that every request gets a response, whether that response reports success or failure. Achieving availability in a distributed system requires that the system remains operational 100% of the time. Every client gets a response, regardless of the state of any individual node in the system. This property is simple to check: either you can submit read/write commands, or you cannot. Hence, the nodes need to be online and reachable at all times.
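The trade-off between Consistency and Availability during a partition can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `Replica` and `Cluster` classes, the two-node layout, and the `"CP"`/`"AP"` mode flag are all hypothetical names invented here, not the behavior of any real database.

```python
class Replica:
    """One node holding a single value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas joined by a link that can be partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas may now disagree.
            replica.value = value
            return
        # Healthy network: apply the write to both replicas.
        replica.value = value
        other.value = value

    def read(self, replica):
        return replica.value


def demo(mode):
    """Write once, cut the link, then try a second write."""
    cluster = Cluster(mode)
    cluster.write(cluster.a, "v1")
    cluster.partitioned = True
    try:
        cluster.write(cluster.a, "v2")
        accepted = True
    except RuntimeError:
        accepted = False
    return accepted, cluster.read(cluster.a), cluster.read(cluster.b)
```

In this sketch, `demo("CP")` rejects the second write and both replicas still agree, while `demo("AP")` accepts it and the two replicas return different values until the partition heals.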
A transaction is a single logical unit of work which accesses and possibly modifies the contents of a database.
Transactions access data using read and write operations.
In order to maintain consistency in a database before and after a transaction, certain properties are followed.
These are called ACID properties.
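As a minimal sketch of what "all or nothing" means for a transaction, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the sample balances are assumptions made up for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which undoes the credit above as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # enough funds: committed
failed = transfer(conn, "alice", "bob", 500)  # not enough funds: rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits one account before the debit fails, so the database is briefly in an inconsistent state inside the transaction; the rollback restores the balances left by the first transfer.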
ACID describes a set of properties which guarantee a database transaction is reliable.
CAP is a theorem that describes how the laws of physics dictate that a distributed system MUST make a tradeoff among desirable characteristics.
As you can see, these terms technically refer to different things. The way in which they are related is that a distributed database which guarantees ACID transactions MUST choose consistency over availability according to the CAP Theorem (i.e. it is a CP system).
If a distributed database chooses availability over consistency in accordance with the CAP Theorem (i.e. it is an AP system), it cannot provide ACID transactions.
The more reliable your network, the lower the probability you’ll need to think about CAP.
Most NoSQL stores lack true ACID transactions.
Different CC models:
challenges for data management
1. Network Connection Dependency
You must always have an internet connection in order to send files to the cloud and retrieve them.
2. Limited Features
Not all cloud providers are created equal. When you use cloud computing for storage and backup, you should ideally be working with a provider who offers unlimited bandwidth. You may also experience limited storage space or accessibility. SaaS offerings often begin with a free package, but you will be charged for premium offerings and extra space.
3. Loss of Control
You are, essentially, trusting another party to take care of your data. You are trusting that they will maintain their data centers and servers with the same care as you would, if not more. You have to trust that your provider’s data centers are compliant and secured both physically and online.
Cloud hacking cases as recent as the past few months have shown that not all cloud providers are as secure as they claim to be. As a business, you can’t afford to have sensitive information about your company or your clients fall victim to hackers.
The types of NoSQL databases, and the names of the database systems that fall into each category, are listed below.
MongoDB falls into the category of document-based NoSQL databases.
Key value store: Memcached, Redis, Coherence
Tabular: HBase, Bigtable, Accumulo
Document based: MongoDB, CouchDB, Cloudant
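To make the three categories more concrete, the sketch below models the same user record in each style using plain Python dictionaries. It does not use any of the listed products; the record contents, the `profile` column family name, and the `find` helper are hypothetical and only mimic the shape of each data model.

```python
import json

# Key-value store: an opaque value looked up by a single key.
# The store does not interpret the value; the application parses it.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Tabular (wide-column): row key -> column family -> column -> value.
tabular = {"42": {"profile": {"name": "Ada", "city": "London"}}}

# Document based: nested, self-describing documents that can be
# queried by their fields.
documents = [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]


def find(docs, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


city_from_kv = json.loads(key_value["user:42"])["city"]
city_from_tabular = tabular["42"]["profile"]["city"]
matches = find(documents, name="Ada")
```

The key-value lookup needs the exact key and leaves parsing to the caller, the tabular lookup addresses a single column inside a row, and the document query filters on a field without knowing the key in advance.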
When a huge amount of data needs to be stored and retrieved.
The relationships between the data you store are not that important.
The data changes over time and is not structured.
Support for constraints and joins is not required at the database level.
The data is growing continuously and you need to scale the database regularly to handle it.