Caching is the highest-leverage performance tool in a microservice platform — and the easiest to get subtly, dangerously wrong. A good cache turns a 200ms database call into a 2ms lookup; a bad one serves stale data, stampedes your database when a hot key expires, or quietly drifts out of sync across instances. This deep dive covers caching in Java microservices: the Spring Cache abstraction, Redis as a distributed cache, invalidation, and the failure modes that cause incidents.
```mermaid
flowchart TD
    Req[Request] --> Q{In cache?}
    Q -->|hit| Ret[Return cached value]
    Q -->|miss| DB[(Database)]
    DB --> Pop[Populate cache with TTL]
    Pop --> Ret
```
Cache data that is read far more than written, expensive to produce, and tolerant of slight staleness: reference data, computed aggregates, the results of slow downstream calls, rendered fragments. Do not cache data that must be perfectly fresh (account balances at the moment of a transaction), is cheap to fetch anyway, or is unique per request (no reuse, so no hit). The cache only helps when the same value is read many times — measure the hit rate to confirm it’s earning its keep.
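To make "measure the hit rate" concrete, here is a minimal sketch of a cache wrapper that counts hits and misses. `MeasuredCache` is a hypothetical name, and the `ConcurrentHashMap` is only a stand-in for a real cache (it has no TTL and no size bound); a production setup would more likely read the same number from its cache library, for example Caffeine's `recordStats()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

/** Illustrative wrapper (hypothetical name) that counts cache hits and misses. */
class MeasuredCache<K, V> {
    // Stand-in store: no TTL and no size bound, for illustration only.
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    /** Returns the cached value, or loads and stores it on a miss. */
    V get(K key, Function<K, V> loader) {
        V cached = store.get(key);
        if (cached != null) {
            hits.increment();
            return cached;
        }
        misses.increment();
        // Concurrent misses on the same key may each call the loader;
        // that is acceptable here because the goal is only to count.
        V loaded = loader.apply(key);
        if (loaded != null) {
            store.put(key, loaded);
        }
        return loaded;
    }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

A hit rate that stays low after warm-up suggests the data is not reused enough to justify caching it.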
| | In-process (Caffeine) | Distributed (Redis) |
|---|---|---|
| Latency | Nanoseconds (local heap) | ~1ms (network hop) |
| Shared across instances | No — each instance has its own | Yes |
| Survives restart | No | Yes |
| Capacity | Bounded by heap | Large, independent |
Use Caffeine for small, hot, read-mostly data where per-instance copies are fine. Use Redis when instances must share state, the cache should survive restarts, or entries are too big to hold everywhere. High-throughput systems often layer both — a near cache (Caffeine) in front of Redis — to cut even the 1ms network hop for the hottest keys.
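The layered lookup can be sketched in plain Java as follows. This is an illustration under stated assumptions, not a production near cache: `RemoteCache` and `TwoLevelCache` are hypothetical names, the remote interface stands in for Redis, and the local `ConcurrentHashMap` stands in for Caffeine without the size bound and short TTL a real near cache needs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Stand-in for a shared cache such as Redis (hypothetical interface). */
interface RemoteCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
}

/** Near cache in front of a shared cache in front of the source of truth. */
class TwoLevelCache<K, V> {
    // Stand-in for Caffeine; a real near cache needs a size bound and a short TTL.
    private final Map<K, V> near = new ConcurrentHashMap<>();
    private final RemoteCache<K, V> remote;
    private final Function<K, V> source;

    TwoLevelCache(RemoteCache<K, V> remote, Function<K, V> source) {
        this.remote = remote;
        this.source = source;
    }

    V get(K key) {
        V local = near.get(key);
        if (local != null) {
            return local; // level 1 hit: no network hop
        }
        V value;
        Optional<V> shared = remote.get(key);
        if (shared.isPresent()) {
            value = shared.get(); // level 2 hit: one network hop
        } else {
            value = source.apply(key); // full miss: go to the source
            if (value != null) {
                remote.put(key, value);
            }
        }
        if (value != null) {
            near.put(key, value);
        }
        return value;
    }

    /** Drops only this instance's copy; other instances keep theirs. */
    void invalidateLocal(K key) {
        near.remove(key);
    }
}
```

The cost of the extra layer is that each instance's near cache can lag the shared cache, so near-cache entries are usually given a much shorter lifetime than the entries in Redis.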
Spring decouples your code from the cache provider behind annotations. Add the Redis starter, enable caching, and annotate methods — switching providers later is configuration, not a rewrite.
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
```
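```java
import java.time.Duration;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;

@Configuration
@EnableCaching
class CacheConfig {
    @Bean
    RedisCacheConfiguration cacheConfig() {
        return RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(10)) // ALWAYS set a TTL
            .disableCachingNullValues();      // never store null in Redis
    }
}
```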
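```java
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
class ProductService {
    private final ProductRepository repo; // hypothetical repository, not shown here
    ProductService(ProductRepository repo) { this.repo = repo; }

    // "unless" skips null results, which a cache configured with disableCachingNullValues() rejects
    @Cacheable(cacheNames = "product", key = "#id", unless = "#result == null")
    public Product byId(String id) { return repo.findById(id); } // slow lookup, runs only on a miss
    @CachePut(cacheNames = "product", key = "#p.id")
    public Product update(Product p) { return repo.save(p); } // always runs, then refreshes the cached value
    @CacheEvict(cacheNames = "product", key = "#id")
    public void delete(String id) { repo.deleteById(id); } // removes the cached entry on delete
}
```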
@Cacheable returns the cached value on a hit and runs the method only on a miss; @CachePut always runs and updates the cache; @CacheEvict removes entries. The single most important line is the TTL — see below.
Cache-aside (check the cache, load from the source on a miss, then populate the cache) is what @Cacheable does and the default for most workloads.

Every cached entry needs a time-to-live. A TTL is your safety net: even if invalidation logic misses a case, stale data self-heals when the entry expires. The trade-off is direct: short TTLs mean fresher data but more misses (more load); long TTLs mean more staleness but better hit rates. Tune per data type: seconds for fast-moving data, hours for stable reference data. A cache with no TTL is both a memory leak and a guaranteed staleness bug.
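To show how a TTL bounds staleness, here is a minimal in-memory sketch of a cache-aside read with expiry. It is plain Java rather than Spring or Redis, `TtlCache` is a hypothetical name, and the injected `Clock` is an assumption made so that expiry can be exercised without waiting.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside lookup where every entry carries an expiry time (illustrative sketch). */
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final Instant expiresAt;

        Entry(T value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Clock clock;

    TtlCache(Duration ttl, Clock clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        Instant now = clock.instant();
        Entry<V> entry = store.get(key);
        if (entry != null && now.isBefore(entry.expiresAt)) {
            return entry.value; // fresh hit
        }
        // Miss or expired entry: reload from the source and restart the TTL.
        V loaded = loader.apply(key);
        if (loaded != null) {
            store.put(key, new Entry<>(loaded, now.plus(ttl)));
        } else {
            store.remove(key);
        }
        return loaded;
    }
}
```

With a 10-minute TTL, a value that changed at the source without any invalidation is served stale for at most 10 minutes before the next read reloads it.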
“There are only two hard things in computer science: cache invalidation and naming things.” In a distributed system the challenge is that data changes in one service while cached copies live in Redis (and maybe in each instance’s near cache). Strategies, roughly in order of strength:
- Invalidate on write with @CacheEvict/@CachePut. Correct as long as all writers go through that path.

Key design matters too: namespace keys clearly (product:{id}), and avoid the temptation to “clear the whole cache” on any change, because that turns one write into a stampede.
A popular key expires; simultaneously hundreds of requests miss and all hit the database to recompute the same value — a stampede that can knock over the very source you were protecting. Defenses:
- Request coalescing: take a lock (Redis SETNX or Redisson’s lock) so only one request rebuilds the value while others wait or briefly serve stale.
- Randomized TTLs: add a little jitter so hot keys do not all expire at the same moment.
- Stale-while-refresh: optionally keep serving the old value while a background refresh runs.

Redis is a dependency, and dependencies fail. A cache should be an optimization, not a single point of failure: if Redis is down, the service should fall back to the source (slower but working), not error out. Configure short Redis timeouts so a slow cache fails fast rather than adding latency to every request, and wrap cache access so a Redis outage degrades performance instead of availability. Test this path explicitly.
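Wrapping cache access so an outage degrades to a miss can look roughly like the sketch below. `CacheClient` and `FailSafeReader` are hypothetical names, and the interface stands in for whatever Redis client is in use; with Spring's annotations, the comparable hook is a custom `CacheErrorHandler`.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical cache client; a real one may throw on timeouts or outages. */
interface CacheClient<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
}

/** Reads through the cache but treats any cache failure as a miss. */
class FailSafeReader<K, V> {
    private final CacheClient<K, V> cache;
    private final Function<K, V> source;

    FailSafeReader(CacheClient<K, V> cache, Function<K, V> source) {
        this.cache = cache;
        this.source = source;
    }

    V read(K key) {
        try {
            Optional<V> cached = cache.get(key);
            if (cached.isPresent()) {
                return cached.get();
            }
        } catch (RuntimeException e) {
            // Cache unavailable or timed out: fall through to the source.
            // A real service would log this and increment a metric.
        }
        V value = source.apply(key);
        try {
            if (value != null) {
                cache.put(key, value);
            }
        } catch (RuntimeException e) {
            // Populating the cache is best effort; the read already succeeded.
        }
        return value;
    }
}
```

The fallback only protects availability if the source can absorb the extra load, which is one reason to exercise this path under realistic traffic.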
- Caching nulls: do not cache “not found” results by accident (disableCachingNullValues).
- Unbounded memory: set maxmemory and an eviction policy (e.g. allkeys-lru) so Redis sheds cold keys instead of OOMing.

Caching done right is deliberate: cache read-heavy, expensive, staleness-tolerant data; reach for Redis when state must be shared; always set a TTL; plan invalidation before you ship; protect hot keys from stampedes; and make the system survive the cache being down. Spring’s cache abstraction makes the mechanics easy. The engineering is in the staleness, invalidation, and failure decisions, and in measuring the hit rate so you know the cache is actually paying for itself.
When should I use a distributed cache like Redis instead of an in-memory cache?
Use an in-process cache (Caffeine) for small, hot, read-mostly data local to one instance. Use Redis when multiple service instances must share cached data, when the cache must survive restarts, or when entries are too large to hold per-instance. Many systems layer both (near cache + Redis).
What is a cache stampede and how do you prevent it?
A stampede happens when a popular key expires and many concurrent requests all miss and hit the database at once. Prevent it with request coalescing (a lock so only one thread refills), slightly randomized TTLs to avoid synchronized expiry, and optionally serving stale data while refreshing in the background.
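Two of these defenses fit in a short sketch. `SingleFlight` coalesces concurrent loads of the same key inside one JVM, and `Ttls.jittered` spreads expiry times; both names are hypothetical. This is in-process coalescing only, so coordinating across instances would still need a shared lock such as the Redis-based one mentioned earlier.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** Ensures only one caller per key runs the loader; the rest wait for its result. */
class SingleFlight<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another caller is already loading this key: wait for its result.
            // If that load failed, join() throws a CompletionException.
            return existing.join();
        }
        try {
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e);
            throw e;
        } finally {
            // Clear the slot so a later miss can trigger a fresh load.
            inFlight.remove(key, mine);
        }
    }
}

/** TTL helper that randomizes expiry so hot keys do not all expire together. */
final class Ttls {
    private Ttls() {}

    /** Returns base plus or minus up to the given fraction (0.0 to 1.0) of base. */
    static Duration jittered(Duration base, double fraction) {
        long baseMs = base.toMillis();
        long spread = (long) (baseMs * fraction);
        long offset = ThreadLocalRandom.current().nextLong(-spread, spread + 1);
        return Duration.ofMillis(baseMs + offset);
    }
}
```

In a cache-aside read, the loader passed to `load` would recheck the cache, query the database, and write the result back with a jittered TTL, so callers arriving after the load finishes get a cache hit instead of starting another load.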