Cache
Types of cache
- Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.
- CDNs are considered a type of cache.
- Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.
- Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.
Application caching
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.
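As a minimal sketch of application caching (assuming a local Redis instance, the redis-py client, and an illustrative `user:<id>` key scheme, none of which are specified above), the example below stores a serialised object in RAM with a short TTL:

```python
import json

import redis

# Assumption: a Redis server is running locally; key names are illustrative.
cache = redis.Redis(host="localhost", port=6379, db=0)

def cache_user_profile(user_id, profile):
    # Store the serialised object in RAM with a 60-second TTL so that
    # 'cold' entries expire rather than accumulating.
    cache.set(f"user:{user_id}", json.dumps(profile), ex=60)

def get_user_profile(user_id):
    raw = cache.get(f"user:{user_id}")
    return json.loads(raw) if raw is not None else None
```

Because values live in RAM, hits avoid a disk-backed database round trip, and the TTL keeps memory use bounded alongside the cache's own eviction policy.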
High performance
- Consistent hashing distributes keys across cache shards; finding the shard for a key takes O(log(N)) time, where N is the number of cache shards.
- Inside a cache server, keys are located using hash tables that require constant time on average.
- The LRU eviction approach accesses and updates cache entries in constant time using a doubly linked list (see the sketch after this list).
- Communication between cache clients and servers is done over lightweight TCP or UDP protocols, which is also fast.
- Adding replicas spreads a high request load across machines, reducing the performance penalty of a hotspot on a single server.
- Data is added, retrieved, and served from RAM, so the latency of these operations is very low.
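The sketch below illustrates two of the building blocks above (the class names, shard labels, and virtual-node count are illustrative assumptions): a consistent-hash ring whose shard lookup is a binary search over the sorted ring positions, matching the O(log(N)) lookup, and an LRU cache built on Python's OrderedDict, which pairs a hash table with a doubly linked list so that get/put run in constant time.

```python
import hashlib
from bisect import bisect
from collections import OrderedDict

class ConsistentHashRing:
    """Maps keys to cache shards; lookup is a binary search over the ring."""

    def __init__(self, shards, vnodes=100):
        # Place several virtual nodes per shard on the ring for smoother balance.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_shard(self, key):
        # Binary search for the first ring position at or after the key's hash.
        idx = bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

class LRUCache:
    """Hash table + doubly linked list (OrderedDict) gives O(1) get/put."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_shard("user:42"))  # -> one of the shard names
```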
Redis
Redis has the following additional features:
- Persistence option
- Built-in data structures such as sorted sets and lists (sketched below)
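A brief sketch of those data structures, assuming a local Redis instance and the redis-py client (key names are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Sorted set: members are kept ordered by score, handy for leaderboards.
r.zadd("leaderboard", {"alice": 120, "bob": 95})
top = r.zrange("leaderboard", 0, -1, desc=True, withscores=True)

# List: push and pop from either end, handy as a simple work queue.
r.rpush("jobs", "encode-video", "send-email")
next_job = r.lpop("jobs")
```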
There are multiple levels at which you can cache, falling into two general categories: database queries and objects (a query-level caching sketch follows this list):
- Row-level
- Query-level
- Fully-formed serialisable objects
- Fully-rendered HTML
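As a sketch of query-level caching (the `db` and `cache` handles and the key prefix are assumptions, not part of the notes above), the query text is hashed into a cache key and the result set is cached with a TTL:

```python
import hashlib
import json

def cached_query(sql, db, cache, ttl=60):
    # Query-level caching: key the cache on a hash of the query text.
    key = "query:" + hashlib.sha256(sql.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    rows = db.execute(sql)                    # assumed database handle
    cache.set(key, json.dumps(rows), ex=ttl)  # assumed Redis-style cache handle
    return rows
```

A drawback of keying on the query string is that it is hard to invalidate the cached result when a single underlying row changes.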
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.
When to update cache
Cache-aside (Memcached)
- Process (sketched below)
- Look for entry in cache, resulting in a cache miss
- Load entry from the database
- Add entry to cache
- Return entry
- Only requested data is cached, which avoids filling up the cache with data that isn't requested.
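A minimal cache-aside read path, assuming a Redis-style cache handle, a hypothetical `db.find_user` helper, and an illustrative key scheme:

```python
import json

def get_user(user_id, cache, db, ttl=3600):
    key = f"user:{user_id}"
    cached = cache.get(key)                   # 1. look for the entry in the cache
    if cached is not None:
        return json.loads(cached)             #    cache hit: return immediately
    user = db.find_user(user_id)              # 2. cache miss: load from the database
    cache.set(key, json.dumps(user), ex=ttl)  # 3. add the entry to the cache (with a TTL)
    return user                               # 4. return the entry
```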
Disadvantages
- Each cache miss results in three trips, which can cause a noticeable delay.
- Data can become stale if it is updated in the database.
- Can be mitigated by setting a time-to-live (TTL) on cache entries
Write through
- Process (sketched below)
- Application adds/updates entry in cache
- Cache synchronously writes entry to data store
- Return
- Data in the cache is not stale
- Users are generally more tolerant of latency when writing data than when reading it
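A minimal write-through sketch; in a real deployment the cache layer itself performs the synchronous write to the data store, which is emulated here with a hypothetical `db.upsert_user` call:

```python
import json

def save_user(user, cache, db):
    # Write-through: update the cache and synchronously persist the entry
    # before returning, so the cached copy is never stale.
    key = f"user:{user['id']}"
    cache.set(key, json.dumps(user))
    db.upsert_user(user)   # the call blocks until the data store write completes
    return user
```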
Disadvantages
- Slower overall due to the synchronous write to the data store
- Most data written might never be read, which can be minimised with a TTL.
Write behind (write-back)
- Process (sketched below)
- Add/update entry in cache
- Asynchronously write entry to the data store, improving write performance
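A minimal write-behind sketch using an in-process queue and a background thread; the `cache`/`db` handles and the `upsert_user` helper are assumptions, and a production system would typically use a durable queue instead:

```python
import json
import queue
import threading

write_queue = queue.Queue()

def save_user(user, cache):
    # Write-behind: update the cache and enqueue the write; the caller
    # returns without waiting for the data store.
    cache.set(f"user:{user['id']}", json.dumps(user))
    write_queue.put(user)

def flush_worker(db):
    # Background worker drains the queue and persists entries asynchronously.
    while True:
        user = write_queue.get()
        db.upsert_user(user)
        write_queue.task_done()

def start_flush_worker(db):
    threading.Thread(target=flush_worker, args=(db,), daemon=True).start()
```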
Disadvantages
- There could be data loss if the cache goes down prior to its contents hitting the data store.
- It is more complex to implement write-behind than it is to implement cache-aside or write-through.
Refresh-ahead
- You can configure the cache to automatically refresh any recently accessed entry prior to its expiration (see the sketch after this list).
- If the cache cannot accurately predict which items will be needed in the future, performance can end up worse than without refresh-ahead.
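A minimal refresh-ahead sketch, assuming a Redis-style cache handle (whose `ttl` command reports the seconds remaining before expiry) and a caller-supplied `load_from_db` function; the window and TTL values are illustrative, and the refresh is done inline here although a production cache would usually do it asynchronously:

```python
import json

TTL = 60              # illustrative expiry, in seconds
REFRESH_WINDOW = 10   # refresh when fewer than this many seconds remain

def get_with_refresh_ahead(key, cache, load_from_db):
    value = cache.get(key)
    if value is None:
        # Ordinary miss: load and cache the entry.
        fresh = load_from_db(key)
        cache.set(key, json.dumps(fresh), ex=TTL)
        return fresh
    if cache.ttl(key) < REFRESH_WINDOW:
        # The entry was accessed shortly before expiring, so refresh it now
        # and the next reader avoids the database round trip.
        fresh = load_from_db(key)
        cache.set(key, json.dumps(fresh), ex=TTL)
        return fresh
    return json.loads(value)
```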
Cache disadvantages
- Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
- Cache invalidation is a difficult problem; there is additional complexity in deciding when to update the cache.
- Need to make application changes, such as adding Redis or Memcached.
Memcached vs Redis
- Memcached is preferred for smaller, simpler read-heavy systems, whereas Redis is useful for systems that are complex and are both read- and write-heavy.
- Simplicity - Memcached requires manual management of its clusters, whereas Redis automates most of the data partitioning tasks.
- Persistence - Redis can persist data to disk; Memcached cannot.
- Data types - Memcached only supports simple string/blob values, whereas Redis supports strings, hashes, lists, sets, sorted sets, and more.