Database Caching Overview

Ace your system design interview

Caching is a data storage technique that plays an essential role in designing scalable internet applications. A cache is any data store that can store and retrieve data quickly for future use.

This enables faster response times and decreases the load on other parts of your system.

So why do we need caching?

Without caching, computers and the internet would be impossibly slow, because every request would pay the full access time of retrieving data from its original source. Caches take advantage of a principle called locality to store data closer to where it is likely to be needed.

In a looser sense, caching can also refer to storing pre-computed data that would otherwise be expensive to produce on demand, such as personalized news feeds and analytics reports.

How does caching work?

First, you can use an in-memory application cache. Storing data directly in the application's memory is a fast and straightforward option. Still, each server must maintain its own cache, which increases overall memory demands and the cost of the system.
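As a rough illustration (the class and names below are mine, not from the video), an in-process cache can be as simple as a dictionary held in the application's memory with a TTL check on read:

```python
import time

class AppCache:
    """A minimal in-process cache: a dict plus a per-entry expiry time."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]  # stale entry, drop it
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)
```

Because this dictionary lives inside one process, every application server keeps its own copy, which is exactly why the memory cost grows with the number of servers.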

Second, you can use a distributed in-memory cache. For example, a separate caching server such as Memcached or Redis can be used to store data so that multiple servers can read and write from the same cache.
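As a minimal sketch, assuming the redis-py client and a shared cache host (the hostname below is illustrative), any server can read what another server wrote:

```python
import redis

# All application servers point at the same cache host, so a value
# written by one server is immediately visible to the others.
cache = redis.Redis(host="cache.internal", port=6379)

cache.set("session:42", "alice", ex=3600)  # write with a one-hour TTL
print(cache.get("session:42"))             # b'alice', from any server
```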

Finally, you can use a file system cache, which stores commonly accessed files. CDNs are one example of a distributed file system cache that takes advantage of geographic locality.

Caching policies

If caching is so great, why not cache everything? Well, there are two main reasons: cost and accuracy.

Since caching is meant to be fast and temporary, it's often implemented with more expensive and less resilient hardware than other types of storage. For this reason, caches are typically smaller than the primary data storage system. They must selectively choose which data to keep and which to remove or evict.

This selection process, known as a caching policy, frees up space for the more relevant data that will be needed next.

Some common caching policies include:

  • First In, First Out (FIFO). Like a queue, this policy evicts whichever item was added longest ago and keeps the most recently added items.

  • Least Recently Used (LRU). This policy tracks when items were last retrieved and evicts whichever item has gone unaccessed the longest.

  • Least Frequently Used (LFU). This policy tracks how often items are retrieved and evicts whichever item is used least frequently, regardless of when it was last accessed.
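As one concrete example, LRU is often sketched with an ordered map: move a key to the "most recent" end on every access and evict from the opposite end when the cache is full. A minimal Python version (the capacity and names are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
```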

What is cache coherence?

One final consideration is how to ensure appropriate cache consistency.

A [[write-through]] cache updates the cache and the underlying store simultaneously, meaning there is no chance either can go out of date. It also simplifies the system.

With a [[write-behind]] cache, updates to the underlying store happen asynchronously. This may lead to temporary inconsistency, but it speeds writes up considerably.
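The difference between the two can be sketched roughly as follows (db_write, the queue, and the worker thread are hypothetical stand-ins, not from the video):

```python
import queue
import threading

cache = {}
write_queue = queue.Queue()

def db_write(key, value):
    ...  # stand-in for the real database write

def write_through(key, value):
    # Cache and database are updated in the same call, so the caller
    # waits for both and the two can never drift apart.
    cache[key] = value
    db_write(key, value)

def write_behind(key, value):
    # Only the cache is updated synchronously; the database write is
    # queued and applied later, which is faster for the caller but
    # briefly inconsistent (and lossy if the cache is lost first).
    cache[key] = value
    write_queue.put((key, value))

def flush_worker():
    while True:
        key, value = write_queue.get()
        db_write(key, value)

threading.Thread(target=flush_worker, daemon=True).start()
```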

Another option is the [[cache-aside]] or [[lazy-loading]] pattern, where data is loaded into the cache on demand. First, the application checks the cache for the requested data. If it is not there, the application fetches the data from the data store and updates the cache. This simple strategy keeps the cached data relatively fresh, provided you choose a cache eviction policy and TTL combination that matches your data access patterns.
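A cache-aside read path might look like this sketch (the Redis client, the TTL value, and the load_product function are assumptions for illustration):

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 600  # limited TTL so cached entries eventually refresh

def load_product(product_id):
    ...  # stand-in for the real data-store query

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:               # 1. check the cache first
        return json.loads(cached)
    product = load_product(product_id)   # 2. on a miss, go to the data store
    cache.set(key, json.dumps(product), ex=TTL_SECONDS)  # 3. populate the cache
    return product
```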

How to design a cache

There are three big decisions you will have to make when designing a cache; a rough sketch of how they map to concrete settings follows the list.

  • How big should the cache be?

  • How should I evict the cache data?

  • Which expiration policy should I choose?
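As a rough illustration, with Redis these three decisions map to a memory limit, an eviction policy, and per-key TTLs; the values below are placeholders, not recommendations:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

r.config_set("maxmemory", "256mb")               # 1. how big the cache is
r.config_set("maxmemory-policy", "allkeys-lru")  # 2. how data is evicted when full
r.set("report:2023-01", "...", ex=86400)         # 3. expiration: this key lives one day
```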

This is a digest of a YouTube video I used to practice my English listening and dictation:
Database Caching for System Design Interviews