Design Requirements

Functional requirements: These represent the features a user of the designed system will be able to use. For example, the system will allow a user to search for content using the search bar.
Non-functional requirements (NFRs): These are the criteria by which a user of the system will judge whether it is usable. NFRs may include requirements such as high availability, low latency, and scalability.
Requirements Definitions

A service is scalable when its performance increases in proportion to the resources added.

If you have a performance problem, your system is slow for a single user.
If you have a scalability problem, your system is fast for a single user but slow under heavy load.

Latency is the time between performing some action and receiving the result.
Throughput is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
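
As a rough illustration of the two definitions above, the sketch below computes both metrics from a list of request timings. The `Request` type, the function names, and the sample timings are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: float  # seconds, when the action was issued
    end: float    # seconds, when the result was received


def mean_latency(requests):
    """Average time from performing an action to receiving its result."""
    return sum(r.end - r.start for r in requests) / len(requests)


def throughput(requests):
    """Completed requests per second over the observed window."""
    window = max(r.end for r in requests) - min(r.start for r in requests)
    return len(requests) / window


# Four requests handled two at a time: each takes 0.5 s, the window is 1 s.
sample = [
    Request(0.0, 0.5),
    Request(0.0, 0.5),
    Request(0.5, 1.0),
    Request(0.5, 1.0),
]
print(mean_latency(sample))  # 0.5
print(throughput(sample))    # 4.0
```

In this sample, handling two requests at a time doubles throughput compared with handling them one by one, while the latency of each request stays at 0.5 s.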

Consistency - Every read receives the most recent write or an error
Availability - Every read receives a response, without a guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures

Waiting for a response from a partitioned node might result in a timeout error.
- CP is a good choice if the business requires atomic reads and writes
- Example: a banking system

Responses return the most readily available version of the data on any node, which might not be the latest.
- AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
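
The difference between the two behaviours can be sketched with a toy replica that has been cut off from the rest of the cluster. This is only an illustrative model, not how any particular database implements it; `Replica`, `read_cp`, and `read_ap` are hypothetical names.

```python
class Replica:
    def __init__(self, value, version):
        self.value = value
        self.version = version
        self.reachable_from_leader = True  # set to False during a partition


def read_cp(replica, latest_version):
    """CP-style read: fail rather than return data that may be stale."""
    if not replica.reachable_from_leader or replica.version < latest_version:
        raise TimeoutError("cannot confirm the latest write during a partition")
    return replica.value


def read_ap(replica):
    """AP-style read: always answer with whatever the replica has."""
    return replica.value


replica = Replica(value="balance=100", version=1)
replica.reachable_from_leader = False  # partition; the leader is now at version 2

print(read_ap(replica))  # balance=100 (available, but stale)
try:
    read_cp(replica, latest_version=2)
except TimeoutError as exc:
    print("CP read failed:", exc)
```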

Weak consistency

Eventual consistency

After a write, reads will eventually see it (typically in milliseconds). Data is replicated asynchronously.
- DNS and email
- Dynamo and Cassandra
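
A minimal sketch of asynchronous replication, assuming a single primary and a single replica held in memory. The class and method names are hypothetical, and `replicate()` stands in for a background process that a real system would run continuously.

```python
from collections import deque


class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Apply queued writes to the replica in the order they were made.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("user:1", "alice")
print(store.read_from_replica("user:1"))  # None: the replica has not caught up
store.replicate()
print(store.read_from_replica("user:1"))  # alice
```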

Strong consistency

Active-passive
- Heartbeats are sent between the active and passive servers. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service
- The length of downtime is determined by whether the passive server is already running or needs to start up
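
The heartbeat check described above might look roughly like the following sketch. The timeout value and the `PassiveServer` class are assumptions made for illustration. Timestamps are passed in explicitly so the logic is easy to follow, and the actual IP takeover is left as a comment.

```python
class PassiveServer:
    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = 0.0
        self.is_active = False

    def on_heartbeat(self, now):
        # Record when the active server was last heard from.
        self.last_heartbeat = now

    def check(self, now):
        """Promote this server if the active one has been silent too long."""
        silent_for = now - self.last_heartbeat
        if not self.is_active and silent_for > self.timeout_seconds:
            self.is_active = True  # a real system would also take over the IP
        return self.is_active


standby = PassiveServer(timeout_seconds=3.0)
standby.on_heartbeat(now=10.0)
print(standby.check(now=12.0))  # False: the last heartbeat was 2 s ago
print(standby.check(now=14.5))  # True: 4.5 s of silence exceeds the timeout
```
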
Active-active
- Both servers are managing traffic
- DNS should know the IP addresses of both servers
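
One simple way to spread traffic across both servers is round-robin selection that skips servers marked as down. The sketch below is a toy stand-in for what DNS or a load balancer would do; the `RoundRobinResolver` class and the IP addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinResolver:
    def __init__(self, addresses):
        self.healthy = {address: True for address in addresses}
        self._order = cycle(addresses)

    def mark_down(self, address):
        self.healthy[address] = False

    def resolve(self):
        # Try each address at most once per call, skipping unhealthy ones.
        for _ in range(len(self.healthy)):
            address = next(self._order)
            if self.healthy[address]:
                return address
        raise RuntimeError("no healthy servers")


resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2"])
print([resolver.resolve() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']

resolver.mark_down("10.0.0.1")
print([resolver.resolve() for _ in range(2)])
# ['10.0.0.2', '10.0.0.2']
```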

Fail-over adds more hardware and additional complexity.
There is a potential for data loss if the active system fails before any newly written data can be replicated to the passive system.