Design Requirements
- Functional requirements: These represent the features a user of the designed system will be able to use. For example, the system will allow a user to search for content using the search bar.
- Non-functional requirements (NFRs): These are the quality criteria by which a user judges whether the system is usable. NFRs may include requirements like high availability, low latency, scalability, and so on.
- Requirements Definitions
Performance vs scalability
A service is scalable if its performance increases in proportion to the resources added.
- If you have a performance problem, your system is slow for a single user
- If you have a scalability problem, your system is fast for a single user but slow under heavy load
Latency vs throughput
- Latency is the time taken to perform an action and receive its result
- Throughput is the number of such actions or results per unit of time
- Generally, you should aim for maximal throughput with acceptable latency.
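
The difference between the two measures can be sketched with a small calculation. This is only an illustration: the function name `latency_and_throughput` and the sample timings are hypothetical, and each request is assumed to be recorded as a (start, end) pair in milliseconds.

```python
def latency_and_throughput(requests_ms):
    """Return (average latency in ms, throughput in requests per second).

    requests_ms is a list of (start_ms, end_ms) pairs, one per request.
    """
    latencies = [end - start for start, end in requests_ms]
    avg_latency_ms = sum(latencies) / len(latencies)
    # Throughput counts completed requests over the whole observation window.
    window_ms = max(end for _, end in requests_ms) - min(start for start, _ in requests_ms)
    throughput_per_s = len(requests_ms) * 1000 / window_ms
    return avg_latency_ms, throughput_per_s


# Hypothetical data: four requests handled one after another, 100 ms each.
sequential = [(0, 100), (100, 200), (200, 300), (300, 400)]
# Hypothetical data: four requests handled concurrently, each slowed to 200 ms.
concurrent = [(0, 200), (0, 200), (0, 200), (0, 200)]
```

With these made-up numbers, the sequential run averages 100 ms per request at 10 requests per second, while the concurrent run averages 200 ms at 20 requests per second. Throughput doubled at the cost of higher latency, which is the trade-off the last bullet refers to.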
Availability vs consistency
- Consistency - Every read receives the most recent write or an error
- Availability - Every read receives a response, with no guarantee that it contains the most recent version of the information
- Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
CA
- Cannot exist in real-world distributed applications, because network failures are unavoidable and partition tolerance therefore has to be supported
CP
- Waiting for a response from a partitioned node might result in a timeout error.
- Good choice if the business requires atomic reads and writes
- Example: banking systems
AP
- Responses return the most readily available version of the data on any node, which might not be the latest.
- AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
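
The CP and AP behaviours above can be contrasted with a toy model. This is a simplified sketch, not a real replication protocol: `Replica`, `read_cp`, and `read_ap` are hypothetical names, and the CP read assumes that every write was acknowledged by a majority of replicas.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: str
    version: int
    reachable: bool = True  # False simulates a node cut off by a partition


def read_cp(replicas):
    """CP-style read: refuse to answer unless a majority of replicas is reachable."""
    reachable = [r for r in replicas if r.reachable]
    if len(reachable) <= len(replicas) // 2:
        raise TimeoutError("majority of replicas unreachable")
    # Assumption: writes reached a majority, so the newest version is in this set.
    return max(reachable, key=lambda r: r.version).value


def read_ap(replicas):
    """AP-style read: answer from the first reachable replica, even if it is stale."""
    for r in replicas:
        if r.reachable:
            return r.value
    raise ConnectionError("no replica reachable")


# Hypothetical partition: only the replica holding the old value is reachable.
partitioned = [
    Replica("old", 1),
    Replica("new", 2, reachable=False),
    Replica("new", 2, reachable=False),
]
```

Under this partition, `read_ap(partitioned)` returns the stale value `"old"`, while `read_cp(partitioned)` raises `TimeoutError` rather than risk returning outdated data.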
Consistency patterns
Weak consistency
- After a write, reads may or may not see it.
- Memcached
- UDP
Eventual consistency
- After a write, reads will eventually see it (typically in milliseconds). Data is replicated asynchronously.
Strong consistency
- After a write, reads will see it. Data is replicated synchronously.
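
The difference between asynchronous and synchronous replication can be sketched with an in-memory toy. The `Primary` class and its method names are hypothetical, and a single replica is modelled as a plain dictionary. A real system would replicate over the network.

```python
class Primary:
    """Toy primary node with one replica, modelled as in-memory dictionaries."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.data = {}
        self.replica = {}
        self.pending = []  # writes accepted but not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Strong consistency: the replica is updated before the write returns.
            self.replica[key] = value
        else:
            # Eventual consistency: the write returns now and is replicated later.
            self.pending.append((key, value))

    def replicate_pending(self):
        """Apply queued writes to the replica (the asynchronous step)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_from_replica(self, key):
        return self.replica.get(key)
```

With `Primary(synchronous=True)`, a read from the replica sees the value immediately after `write`. With `Primary(synchronous=False)`, the same read returns `None` until `replicate_pending()` has run.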
Availability patterns
Fail-over
- Active-passive
- Heartbeats are sent between the active and passive servers. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service
- The length of downtime depends on whether the passive server is already running or needs to start up
- Active-active
- Both servers manage traffic
- DNS needs to know the IP addresses of both servers
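
The active-passive heartbeat check can be sketched as follows. This is a minimal illustration under stated assumptions: `PassiveNode` and `timeout_s` are hypothetical names, timestamps are passed in explicitly instead of being read from a clock, and the actual IP takeover is reduced to setting a flag.

```python
class PassiveNode:
    """Toy passive server that promotes itself when heartbeats stop arriving."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical tolerance for missing heartbeats
        self.last_heartbeat = None
        self.active = False

    def on_heartbeat(self, now):
        """Record the time of the latest heartbeat from the active server."""
        self.last_heartbeat = now

    def check(self, now):
        """Promote this node if the last heartbeat is older than the timeout."""
        if (
            not self.active
            and self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        ):
            # A real system would take over the IP address and resume service here.
            self.active = True
        return self.active
```

For example, with `timeout_s=3.0` and a heartbeat recorded at time 0.0, `check(2.0)` leaves the node passive and `check(4.0)` promotes it. This sketch never promotes a node that has not yet seen a heartbeat, which is one possible design choice among several.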
Disadvantages
- Fail-over adds more hardware and additional complexity
- Data can be lost if the active system fails before newly written data has been replicated to the passive system
Replication
Availability in numbers
- Three 9s (99.9% availability)
- 1m 26.4s downtime per day
- Four 9s (99.99% availability)
- 8.64s downtime per day
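
The downtime figures follow directly from the availability percentage. The helper below is a small sketch (the name `downtime_seconds` is hypothetical) that uses `Decimal` so the arithmetic stays exact.

```python
from decimal import Decimal

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def downtime_seconds(availability_percent, period_seconds=SECONDS_PER_DAY):
    """Allowed downtime in seconds for an availability given as a string, e.g. "99.9"."""
    unavailable_percent = Decimal(100) - Decimal(availability_percent)
    return period_seconds * unavailable_percent / 100
```

`downtime_seconds("99.9")` gives 86.4 seconds per day (1m 26.4s), and `downtime_seconds("99.99")` gives 8.64 seconds per day. Passing a longer `period_seconds` yields the corresponding figure for a week, month, or year.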