Design a unique ID generator in distributed systems

Question

Step 1 - Understand the problem and establish design scope

Questions to ask

What are the characteristics of the unique IDs
For each new record, does ID increment by 1
Do IDs only contains numerical values
What is the ID length requirement
What is the scale of the system

Example requirement

ID must be unique
IDs are numerical values only
IDs fit into 64-bit
IDs are ordered by date
Ability to generate over 10,000 unique IDs per second

Step 2 - Propose high-level design and get buy-in

Multi-master replication

Relies on the auto_increment feature of database, where the ID increase by k (k = number of server in user) each time
Cons:
- Hard to scale with multiple data centers
- IDs do not go up with time across server
- It does not scale well when a server is added or removed

UUID

Pros:
- Scalable
- No synchronisation issues
Cons:
- IDs are 128 bits rather than 64
- IDs do not go up with time
- IDs could be non-numeric

Ticket server

Use auto_increment in a single database server
Pros:
- Numeric
- Easy to implement
Cons:
- SOF

Twitter snowflake approach

CleanShot 2024-10-19 at 18.26.46@2x.png

Step 3 - Design deep dive

Timestamp ensure the IDs can be ordered by date - 41 bits
- The maximum timestamp can be represented in 41 bits (in ms) is around 69 years
Sequence number - 12 bits
- Field is 0 unless more than one ID is generated per millisecond
- Maximum 4096 new IDs per ms
Datacenter ID - 5 bits
- 32 centers
Machine ID - 5 bits
1 extra bit for future uses, can distinguish positive and negative integers

Step 4 - Wrap up

Talk about clock synchronisation across timestamps
Modify the length of each section
Availability discussion - since this is a mission critical system