Design a rate limiter
Rate Limiter
- Benefits
- Prevent resource starvation caused by Denial of Service (DoS) attacks
- Reduce cost
- Especially for companies that rely on paid third-party APIs
- Prevent servers from being overloaded
- Filter out bots and misbehaving users
Step 1 - Understand the problem and establish design scope
Questions to ask
- What kind of rate limiter?
- Server or client side
- Does the rate limiter throttle based on IP, user ID, or other properties?
- What is the scale?
- A startup or a large company?
- Will the system work in a distributed environment?
- Is the rate limiter a separate service, or should it be implemented in application code?
- Do we need to inform users who are throttled?
Example requirements
- Accurately limit excessive requests
- Low latency (must not slow down HTTP response time)
- Distributed rate limiting (shareable across multiple servers and processes)
- High fault tolerance (a rate limiter failure must not affect the entire system)
- Use as little memory as possible
- Exception handling (show clear exception messages to throttled users)
Step 2 - Propose high-level design and get buy-in
- Can be implemented on the client side, on the server side, or as middleware between client and server
- Usually not on the client side, since client requests can easily be forged by malicious users
- Return HTTP 429 (Too Many Requests) when a client exceeds the limit
- Since cloud micro-services have become widely used, rate limiting is usually implemented within a component called API gateway.
How to implement the rate limiter
- Evaluate the technology stack; make sure the current programming language is efficient enough to implement rate limiting on the server side
- Whether you use a third-party gateway or build your own determines how much control you have over the implementation and algorithms
- Whether you already use micro-services: if an API gateway is in place, the rate limiter can be added to it
- Building your own rate limiter takes time and effort; evaluate against available engineering resources
Algorithms
Token bucket
- Details
- A bucket has a predefined token capacity
- A refiller puts a fixed number of tokens into the bucket at a preset rate; once the bucket is full, extra tokens overflow
- Each request consumes one token: if enough tokens are left, one is taken and the request goes through; otherwise the request is dropped
- Algorithm
- Bucket size
- Refill rate - number of tokens put into the bucket every second
- How many buckets do we need? (clarify this requirement; a minimal sketch follows this section)
- A bucket per API endpoint - such as friend request, post...
- Or a bucket per IP address
- If there is an overall request limit, a single global bucket shared by all requests
- Pro:
- Easy to implement
- Memory efficient
- Allows bursts of requests, as long as there are tokens left
- Cons:
- Only two parameters (bucket size and refill rate); tuning them properly to specific needs can be challenging
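A minimal in-process sketch of the token bucket idea (class and method names are my own, not from the chapter):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity            # bucket size (max tokens)
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)       # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time; overflow tokens are discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False                        # not enough tokens: drop the request
```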
Leaking bucket algorithm
- Detail
- Similar to token bucket, except that requests are processed at a fixed rate
- Usually implemented with a FIFO queue (see the sketch after this section)
- When a request arrives, check whether the queue is full: drop the request if it is, otherwise add it to the queue
- Requests are pulled from the queue and processed at regular intervals
- Algorithm
- Bucket size
- Outflow rate
- Pros:
- Memory efficient (limited queue size)
- Suits use cases where a stable outflow rate is required
- Cons:
- A burst of traffic fills the queue with old requests; if they are not processed in time, recent requests get rate limited
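A minimal sketch of the leaking bucket, again with hypothetical names: a bounded FIFO queue admits requests, and a worker drains it at a fixed outflow rate:

```python
import time
from collections import deque

class LeakingBucket:
    def __init__(self, bucket_size: int, outflow_per_sec: float):
        self.queue = deque()                       # FIFO queue of pending requests
        self.bucket_size = bucket_size
        self.outflow_interval = 1.0 / outflow_per_sec

    def allow(self, request) -> bool:
        if len(self.queue) >= self.bucket_size:
            return False                           # queue full: drop the request
        self.queue.append(request)                 # otherwise enqueue it
        return True

    def drain_forever(self, process):
        # Worker loop: pull one request per interval, giving a fixed outflow rate.
        while True:
            if self.queue:
                process(self.queue.popleft())
            time.sleep(self.outflow_interval)
```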
Fixed window counter
- Details
- Divide the timeline into fixed-size time windows and assign a counter to each window
- Each request increments the counter by one
- Once the counter reaches the predefined threshold, new requests are dropped until a new time window starts
- Pros:
- Memory efficient
- Easy to understand
- Setting quota per time interval fits certain use cases
- Cons:
- A spike in traffic at the edges of a window can let through more requests than the quota: the last few requests of the previous window plus the first few of the current window may together exceed the limit for the rolling interval that spans the boundary (a sketch follows this section)
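A sketch of a fixed window counter keyed per client (names are assumptions; a real version would keep the counters in Redis rather than a dict):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, threshold: int, window_seconds: int):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)   # (client, window index) -> request count

    def allow(self, client: str) -> bool:
        window = int(time.time() // self.window_seconds)   # index of current window
        key = (client, window)
        self.counters[key] += 1
        # Note: this sketch never evicts old windows; Redis EXPIRE would handle that.
        return self.counters[key] <= self.threshold
```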
Sliding window log
- Details
- Keep track of request timestamps, usually in a cache (e.g. sorted sets in Redis)
- When a new request comes in, remove all outdated timestamps (those older than the start of the current rolling window)
- Add the new request's timestamp to the log; the request is allowed if the log size is within the limit
- Pros:
- Accurate
- Cons:
- Memory heavy: timestamps of rejected requests may still be stored in the log (see the sketch below)
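An in-memory sketch of the log (illustrative; the chapter keeps the log in a cache such as Redis sorted sets):

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()                 # timestamps of recent requests

    def allow(self) -> bool:
        now = time.time()
        # Evict timestamps that fell out of the rolling window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        self.log.append(now)               # even rejected requests leave a timestamp
        return len(self.log) <= self.limit
```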
Sliding window counter
- Details
- Combines the fixed window counter and the sliding window log (a sketch follows this section)
- Estimated count = requests in current window + requests in previous window × overlap percentage of the rolling window with the previous window
- e.g. 5 requests in the previous minute and 3 in the current one: a request arriving 30% into the current window sees 3 + 5 × 70% = 6.5 requests
- Pros:
- Smooth out spike of traffic, since the rate is based on the average rate of the previous window
- Memory efficient
- Cons:
- Assumes that the request in previous window is evenly distributed
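A sketch of that estimate for a single client (hypothetical class; a production version would store both counters in Redis):

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0                # requests in the previous window
        self.curr_count = 0                # requests in the current window
        self.curr_start = time.time()

    def allow(self) -> bool:
        now = time.time()
        elapsed = int((now - self.curr_start) // self.window)
        if elapsed >= 1:
            # Roll forward; if more than one window passed, the previous one is empty.
            self.prev_count = self.curr_count if elapsed == 1 else 0
            self.curr_count = 0
            self.curr_start += elapsed * self.window
        # Weight the previous window by how much the rolling window overlaps it.
        overlap = 1.0 - (now - self.curr_start) / self.window
        if self.curr_count + self.prev_count * overlap < self.limit:
            self.curr_count += 1
            return True
        return False
```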
High level architecture
- Keep a counter of how many requests have been sent by the same user, IP address, etc.
- Keep it in an in-memory cache to avoid the slowness of disk access
- Redis is a popular choice, offering two useful commands (a sketch follows)
- INCR: increment the counter by 1
- EXPIRE: set a timeout on the counter; the counter is deleted automatically once the timeout expires
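A minimal sketch using the redis-py client (the key naming and default limits are assumptions, not from the chapter):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def allow(user_id: str, limit: int = 5, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}"
    count = r.incr(key)                    # INCR: increment the counter by 1
    if count == 1:
        r.expire(key, window_seconds)      # EXPIRE: counter resets when window ends
    return count <= limit
```

Note that INCR and EXPIRE are issued as two separate commands here, which is exactly the race condition addressed in the deep dive below.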
Step 3 - Design deep dive
Rate limiting rules
- Lyft open-sourced their rate-limiting component; rules are written as configuration files, e.g.:
domain: auth
descriptors:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5
- This rule means clients are not allowed to log in more than 5 times in 1 minute
Exceeding the rate limit
- The API returns HTTP response code 429 (Too Many Requests)
- In some cases, throttled requests can be enqueued and processed later
- Information such as the remaining number of allowed requests within the window and the number of seconds to wait before retrying is returned in HTTP headers (example below)
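For illustration, a throttled response might look like this (header names follow the common X-Ratelimit-* convention; exact names vary by API):

```
HTTP/1.1 429 Too Many Requests
X-Ratelimit-Limit: 5
X-Ratelimit-Remaining: 0
X-Ratelimit-Retry-After: 30
```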
Distributed environment
Race condition
- Locks are the most obvious solution
- However, locks significantly slow down the system
- Common workarounds (a sketch of the first follows):
- Lua scripts executed atomically inside Redis
- Sorted sets data structure in Redis
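A sketch of the Lua-script approach with redis-py (the script body and key naming are my own assumptions): the increment and expiry execute as one atomic unit inside Redis, so concurrent rate limiter instances cannot interleave a read-modify-write.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# INCR and EXPIRE run atomically inside Redis, eliminating the race
# between concurrent rate limiter instances.
LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""
increment = r.register_script(LUA)

def allow(user_id: str, limit: int = 5, window_seconds: int = 60) -> bool:
    count = increment(keys=[f"rate:{user_id}"], args=[window_seconds])
    return count <= limit
```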
Synchronisation issue
- With multiple rate limiters and a stateless web tier, a client's requests may hit different rate limiter instances, so counter state must be shared across all of them
- Use a centralised data store like Redis
Performance optimisation
- A multi-data-center setup is crucial to serve users across different geo-locations with low latency
- Synchronise data between data centers with an eventual consistency model
Monitoring
- Monitor the rate limiting algorithm and rules to ensure they are effective
- e.g. rules may be too strict if too many valid requests are dropped
- Or rules may need loosening to accommodate spikes in traffic such as flash sales
Step 4 - Wrap up
Additional discussions
Hard vs soft rate limiting
- Hard: Cannot exceed the threshold
- Soft: Can exceed the threshold for a short period
Rate limiting at different layers (HTTP operates at application layer 7; rate limiting can also be applied at lower layers, e.g. by IP address at layer 3)
Design the client to avoid being rate limited
- Avoid making frequent API calls (use a client-side cache)
- Catch exceptions so the client can recover gracefully from throttling
- Add sufficient back-off time to retry logic (see the sketch below)
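A sketch of that retry logic (the URL, header name, and retry policy are illustrative assumptions):

```python
import random
import time
import urllib.request
from urllib.error import HTTPError

def call_with_backoff(url: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return urllib.request.urlopen(url)
        except HTTPError as e:
            if e.code != 429:
                raise                       # only retry when rate limited
            retry_after = e.headers.get("X-Ratelimit-Retry-After")
            # Honour the server's hint if present, else exponential backoff + jitter.
            wait = float(retry_after) if retry_after else 2 ** attempt + random.random()
            time.sleep(wait)
    raise RuntimeError("rate limited: retries exhausted")
```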