CAP Theorem
- applies to distributed data stores that replicate data across nodes
- when a network partition occurs, choose between consistency and availability
Consistency
- every read returns the most recent write, or an error
Availability
- Every non-failing node returns a non-error response to a valid request, without a guarantee that it reflects the most recent write
Partition Tolerance
- Implied, always assumed in a distributed system
- If nodes are disconnected from each other, the system should still function
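The trade-off during a partition can be sketched with a toy replica. This is a minimal illustration, not how any particular database behaves: the `Replica` class, its `mode` labels ("CP", "AP"), and the `partitioned` flag are all hypothetical names introduced here.

```python
class Replica:
    """A toy replica holding one value; all names are illustrative only."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = "v1"         # last value this replica has seen
        self.partitioned = False  # True when cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # Availability first (or no partition): answer with local data.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True  # simulate a network partition

print(ap.read())  # answers, though the value may be stale
try:
    cp.read()
except RuntimeError as err:
    print(err)    # the CP replica gives up availability instead
```

Once the partition heals (`partitioned = False`), both replicas answer reads again.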
PACELC
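- Given a Partition (P), choose Availability (A) or Consistency (C)
- Guaranteeing both A and C is impossible during a partition, and network failures are inevitable
- Else (E), in normal operation, favour Latency (L) or Consistency (C)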
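The "Else" branch can be sketched as a choice between waiting for replicas before acknowledging a write (favour consistency) and acknowledging immediately (favour latency). The function name `acknowledge_write`, the delay values, and the 1 ms local write cost below are assumptions made for illustration only.

```python
def acknowledge_write(value, replicas, replica_delays_ms, favour_consistency):
    """Return the latency (ms) the client sees before the write is acknowledged.

    All numbers are illustrative assumptions, not measurements.
    """
    local_ms = 1  # assumed cost of writing to the local node
    if favour_consistency:
        # EC: wait until every replica has applied the write.
        for replica in replicas:
            replica["value"] = value
        return local_ms + max(replica_delays_ms)  # replicas are contacted in parallel
    # EL: acknowledge immediately; replicas would catch up in the background.
    return local_ms


delays = [20, 80]  # hypothetical replication delays in ms

sync_replicas = [{"value": "v1"}, {"value": "v1"}]
sync_latency = acknowledge_write("v2", sync_replicas, delays, favour_consistency=True)

async_replicas = [{"value": "v1"}, {"value": "v1"}]
async_latency = acknowledge_write("v2", async_replicas, delays, favour_consistency=False)

print(sync_latency, [r["value"] for r in sync_replicas])
print(async_latency, [r["value"] for r in async_replicas])
```

With the low-latency choice, the replicas still hold the old value at the moment of acknowledgement, so a read from them can be stale.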
Advantages and disadvantages of a centralised database
Advantages
- Data maintenance, such as updating and taking backups of a centralised database, is easy.
- Centralised databases provide stronger consistency and ACID transactions than distributed databases.
- Centralised databases provide a much simpler programming model for the end programmers as compared to distributed databases.
- It’s more efficient for businesses whose data is small enough to reside on a single node.
Disadvantages
- A centralised database can slow down, causing high latency for end users, when the number of queries per second approaches single-node limits.
- A centralised database has a single point of failure. If that node fails, the whole database becomes inaccessible, so its probability of being unavailable is much higher.
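A rough arithmetic sketch of the single-point-of-failure claim, assuming node failures are independent and using an illustrative 99% per-node availability (both are assumptions, not figures from these notes):

```python
def unavailability(node_availability, replicas=1):
    """Probability that every copy is down at once, assuming independent failures."""
    return (1 - node_availability) ** replicas


single = unavailability(0.99)      # one centralised node
triple = unavailability(0.99, 3)   # three independent replicas, for comparison

print(f"single node down: {single:.6f}")
print(f"all three replicas down: {triple:.6f}")
```

Real failures are often correlated (shared power, network, or software bugs), so the replicated figure is optimistic.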
Advantages and disadvantages of a distributed database
Advantages
- Data access can be fast in a distributed database because data is retrieved from the nearest database shard or from the most frequently used one.
- Data with different levels of distribution transparency can be stored in separate places.
- Intensive transactions consisting of queries can be divided into multiple optimised subqueries, which can be processed in a parallel fashion.
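The parallel subquery idea can be sketched as a scatter-gather over shards. The `SHARDS` data, the `subquery` function, and the in-memory lists standing in for remote nodes are all hypothetical; a real system would send each subquery over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds a horizontal slice of an "orders" relation.
SHARDS = [
    [{"id": 1, "amount": 30}, {"id": 2, "amount": 70}],
    [{"id": 3, "amount": 50}],
    [{"id": 4, "amount": 20}, {"id": 5, "amount": 10}],
]


def subquery(shard):
    """Subquery run on one shard: a partial SUM(amount)."""
    return sum(row["amount"] for row in shard)


def total_amount(shards):
    """Scatter the subquery to all shards in parallel, then gather and combine."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(subquery, shards))
    return sum(partials)


print(total_amount(SHARDS))
```

This works cleanly here because a sum can be combined from partial sums; aggregates such as a median need more coordination.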
Disadvantages
- Sometimes, data is required from multiple sites, which takes longer than reading from a single site because of the extra network round trips.
- Relations are partitioned vertically or horizontally among different nodes. Therefore, operations such as joins need to reconstruct complete relations by carefully fetching data. These operations can become much more expensive and complex.
- It’s difficult to maintain consistency of data across sites in the distributed database, and it requires extra measures.
- Updates and backups in distributed databases take longer because data must be synchronised across sites.
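The cost of reconstructing a partitioned relation, noted in the list above, can be sketched for the vertical case. `NODE_A`, `NODE_B`, and `fetch` are hypothetical stand-ins for two remote nodes and a network call; the sample rows are made up.

```python
# Hypothetical vertical fragments of a "users" relation, stored on two nodes.
NODE_A = {1: {"name": "Ada"}, 2: {"name": "Lin"}}    # id -> name
NODE_B = {1: {"email": "ada@example.com"},
          2: {"email": "lin@example.com"}}           # id -> email


def fetch(node, key):
    """Stand-in for a network round trip to one node."""
    return node.get(key, {})


def reconstruct_user(user_id):
    """Rebuild one full row by joining fragments on the shared primary key."""
    row = {"id": user_id}
    for node in (NODE_A, NODE_B):   # one remote fetch per fragment
        row.update(fetch(node, user_id))
    return row


print(reconstruct_user(1))
```

Every full row costs one fetch per fragment, which is why joins over partitioned relations become more expensive than on a single node.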