Caching in Java Microservices: Redis, Spring Cache & Invalidation

Caching is the highest-leverage performance tool in a microservice platform — and the easiest to get subtly, dangerously wrong. A good cache turns a 200ms database call into a 2ms lookup; a bad one serves stale data, stampedes your database when a hot key expires, or quietly drifts out of sync across instances. This deep dive covers caching in Java microservices: the Spring Cache abstraction, Redis as a distributed cache, invalidation, and the failure modes that cause incidents.

TL;DR: Use Spring’s cache abstraction over Redis for data shared across instances. Always set a TTL — a cache without expiry is a memory leak and a staleness bug. Cache-aside is the default pattern. Plan invalidation up front (it’s the hard part). Protect hot keys from stampedes, design for cache and Redis being unavailable, and never cache without measuring the hit rate.


flowchart TD
  Req[Request] --> Q{In cache?}
  Q -->|hit| Ret[Return cached value]
  Q -->|miss| DB[(Database)]
  DB --> Pop[Populate cache with TTL]
  Pop --> Ret

Cache-aside: serve from the cache on a hit; load from the source and populate on a miss.
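The flow in the diagram fits in a few lines of plain Java. This is a minimal in-process sketch, not Spring's implementation. `CacheAside` is a hypothetical class name, a `ConcurrentHashMap` stands in for the cache, and the clock is injected so expiry can be tested.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Minimal cache-aside store: check the cache, load on a miss, populate with a TTL. */
class CacheAside<K, V> {
  private record Entry<T>(T value, long expiresAtNanos) {}

  private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
  private final Duration ttl;
  private final LongSupplier clock; // System::nanoTime in production, a fake clock in tests

  CacheAside(Duration ttl, LongSupplier clock) {
    this.ttl = ttl;
    this.clock = clock;
  }

  V get(K key, Function<K, V> loadFromSource) {
    long now = clock.getAsLong();
    Entry<V> e = store.get(key);
    if (e != null && now - e.expiresAtNanos() < 0) { // overflow-safe nanoTime comparison
      return e.value();                               // hit: return the cached value
    }
    V fresh = loadFromSource.apply(key);              // miss or expired: go to the database
    store.put(key, new Entry<>(fresh, now + ttl.toNanos())); // populate with a TTL
    return fresh;
  }
}
```

Passing the loader per call keeps the cache ignorant of where data comes from, which is also how `@Cacheable` treats the annotated method: it is simply the thing to run on a miss.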

What to cache (and what not to)

Cache data that is read far more than written, expensive to produce, and tolerant of slight staleness: reference data, computed aggregates, the results of slow downstream calls, rendered fragments. Do not cache data that must be perfectly fresh (account balances at the moment of a transaction), is cheap to fetch anyway, or is unique per request (no reuse, so no hit). The cache only helps when the same value is read many times — measure the hit rate to confirm it’s earning its keep.

In-process vs distributed

	In-process (Caffeine)	Distributed (Redis)
Latency	Nanoseconds (local heap)	~1ms (network hop)
Shared across instances	No — each instance has its own	Yes
Survives restart	No	Yes
Capacity	Bounded by heap	Large, independent

Use Caffeine for small, hot, read-mostly data where per-instance copies are fine. Use Redis when instances must share state, the cache should survive restarts, or entries are too big to hold everywhere. High-throughput systems often layer both — a near cache (Caffeine) in front of Redis — to cut even the 1ms network hop for the hottest keys.
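A two-level lookup can be sketched without any libraries. In this sketch (all names hypothetical), `RemoteCache` stands in for a Redis client, `FakeRemoteCache` is an in-memory fake so the code runs anywhere, and a plain `ConcurrentHashMap` plays the near cache that Caffeine would provide. The local map has no size bound or TTL for brevity; a real near cache needs both (for example Caffeine's `maximumSize` and `expireAfterWrite`).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Stand-in for a Redis client; a real one adds TTLs and network timeouts. */
interface RemoteCache {
  Optional<String> get(String key);
  void put(String key, String value);
}

/** In-memory fake so the sketch runs without Redis. */
class FakeRemoteCache implements RemoteCache {
  final Map<String, String> data = new ConcurrentHashMap<>();
  int gets; // counts simulated network round trips (not thread-safe; demo only)
  public Optional<String> get(String key) { gets++; return Optional.ofNullable(data.get(key)); }
  public void put(String key, String value) { data.put(key, value); }
}

/** Near cache (L1, per instance) in front of a shared remote cache (L2). */
class TwoLevelCache {
  private final Map<String, String> local = new ConcurrentHashMap<>(); // Caffeine in a real service
  private final RemoteCache remote;

  TwoLevelCache(RemoteCache remote) { this.remote = remote; }

  String get(String key, Function<String, String> loadFromSource) {
    String v = local.get(key);
    if (v != null) return v;                    // L1 hit: no network hop
    Optional<String> shared = remote.get(key);
    if (shared.isPresent()) {                   // L2 hit: warm the local copy
      local.put(key, shared.get());
      return shared.get();
    }
    String fresh = loadFromSource.apply(key);   // miss everywhere: load, then fill both levels
    remote.put(key, fresh);
    local.put(key, fresh);
    return fresh;
  }

  void evictLocal(String key) { local.remove(key); } // hook for invalidation events
}
```

An L2 hit also warms L1, so each instance pays the network hop for a key at most once until its local copy is evicted. That is also why a near cache needs an invalidation path: a write seen by Redis is not seen by the local copies.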

Spring’s cache abstraction over Redis

Spring decouples your code from the cache provider behind annotations. Add the Redis starter, enable caching, and annotate methods — switching providers later is configuration, not a rewrite.

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

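import java.time.Duration;

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext.SerializationPair;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
class CacheConfig {
  @Bean
  RedisCacheConfiguration cacheConfig() {
    return RedisCacheConfiguration.defaultCacheConfig()
        .entryTtl(Duration.ofMinutes(10))   // ALWAYS set a TTL
        .disableCachingNullValues()         // never store "not found"
        .serializeValuesWith(SerializationPair.fromSerializer(
            new GenericJackson2JsonRedisSerializer()));  // JSON instead of JDK serialization
  }
}

@Service
class ProductService {
  private final ProductRepository repo;  // Product / ProductRepository: your own entity and Spring Data repository
  ProductService(ProductRepository repo) { this.repo = repo; }

  // Runs only on a miss; `unless` skips caching null, since null values are disabled above
  @Cacheable(cacheNames = "product", key = "#id", unless = "#result == null")
  public Product byId(String id) { return repo.findById(id).orElse(null); }

  @CachePut(cacheNames = "product", key = "#p.id")  // always runs, then refreshes the entry
  public Product update(Product p) { return repo.save(p); }

  @CacheEvict(cacheNames = "product", key = "#id")  // removes the entry on delete
  public void delete(String id) { repo.deleteById(id); }
}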

@Cacheable returns the cached value on a hit and runs the method only on a miss; @CachePut always runs and updates the cache; @CacheEvict removes entries. The single most important line is the TTL — see below.
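Spring applies these annotations through a proxy around the bean (proxy mode is the default), so only calls that arrive through the proxy are intercepted; a method that calls `byId` on `this` skips the cache. A JDK-only sketch makes that visible. It uses `java.lang.reflect.Proxy` rather than Spring's actual `CacheInterceptor`, and `Catalog`, `SlowCatalog` and `CachingProxy` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface Catalog {
  String byId(String id);
  String describe(String id);
}

class SlowCatalog implements Catalog {
  int lookups; // counts real (uncached) lookups
  public String byId(String id) { lookups++; return "product-" + id; }
  // Self-invocation: calls byId on `this`, not on the proxy, so the cache is never consulted
  public String describe(String id) { return "[" + byId(id) + "]"; }
}

/** Caches one method's results, standing in for what a @Cacheable interceptor does. */
class CachingProxy {
  static <T> T wrap(T target, Class<T> iface, String cachedMethod) {
    Map<List<Object>, Object> cache = new ConcurrentHashMap<>();
    InvocationHandler handler = (proxy, method, args) -> {
      if (!method.getName().equals(cachedMethod)) {
        return method.invoke(target, args);          // not "annotated": pass straight through
      }
      List<Object> key = List.of(args);              // cache key = the method arguments
      Object hit = cache.get(key);
      if (hit != null) return hit;                   // hit: the target method is skipped
      Object result = method.invoke(target, args);   // miss: run it and remember the result
      cache.put(key, result);
      return result;
    };
    return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }
}
```

Calling `catalog.byId("1")` twice through the proxy runs the lookup once, but `catalog.describe("1")` runs it again, because the inner call never leaves the target object. The usual fixes are to move the cached method into a separate bean or to call it through the injected proxy.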

Caching patterns

Cache-aside (lazy loading) — the app checks the cache, and on a miss loads from the source and populates it. This is what @Cacheable does and the default for most workloads.
Write-through — writes go to the cache and the source together, keeping them consistent at the cost of write latency.
Write-behind — writes hit the cache and are flushed to the source asynchronously; fast but risks loss on failure. Use rarely and carefully.
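The difference between the two write patterns is where the caller stops waiting. A JDK-only sketch, assuming a hypothetical `Database` class as the system of record and plain maps as the cache: write-through returns only after both writes, while write-behind returns after the cache write and hands the database write to a background thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the system of record. */
class Database {
  final Map<String, String> rows = new ConcurrentHashMap<>();
  void save(String key, String value) { rows.put(key, value); }
}

class WriteThroughCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Database db;
  WriteThroughCache(Database db) { this.db = db; }

  void put(String key, String value) {
    db.save(key, value);     // source first: if this throws, the cache is left untouched
    cache.put(key, value);   // the caller has waited for both writes
  }
  String get(String key) { return cache.get(key); }
}

class WriteBehindCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Database db;
  private final ExecutorService flusher = Executors.newSingleThreadExecutor(); // preserves write order
  WriteBehindCache(Database db) { this.db = db; }

  void put(String key, String value) {
    cache.put(key, value);                      // the caller returns after the cache write
    flusher.submit(() -> db.save(key, value));  // queued: lost if the process dies first
  }
  String get(String key) { return cache.get(key); }

  void shutdown() {
    flusher.shutdown();
    try {
      flusher.awaitTermination(5, TimeUnit.SECONDS); // wait for queued writes to drain
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Writing the source first in write-through means a failed database write leaves the cache unchanged rather than holding a value that was never persisted. In write-behind, anything still in the executor's queue when the process dies is gone, which is the loss risk described above.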

TTLs and the staleness trade-off

Every cached entry needs a time-to-live. A TTL is your safety net: even if invalidation logic misses a case, stale data self-heals when the entry expires. The trade-off is straightforward: short TTLs mean fresher data but more misses (and more load on the source); long TTLs mean better hit rates but more staleness. Tune per data type: seconds for fast-moving data, hours for stable reference data. A cache with no TTL is both a memory leak and a guaranteed staleness bug.
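To put rough numbers on the trade-off, here is an idealized model (an assumption, not a measurement): a single key read at a steady random (Poisson) rate of r reads per second, no explicit invalidation, and an entry that expires T seconds after it is loaded. Each cycle starts with one miss and is followed on average by rT hits, so the hit rate is rT / (1 + rT), while the worst-case staleness is T. `TtlTradeoff` is a hypothetical helper name.

```java
/** Idealized single-key model; real hit rates also depend on key distribution and eviction. */
class TtlTradeoff {
  /** Expected hit rate for a key read at readsPerSecond with a TTL of ttlSeconds. */
  static double expectedHitRate(double readsPerSecond, double ttlSeconds) {
    double hitsPerCycle = readsPerSecond * ttlSeconds; // reads served while the entry is live
    return hitsPerCycle / (1 + hitsPerCycle);          // every cycle begins with one miss
  }
}
```

At 5 reads per second, a 1-second TTL gives 5/6, about 83% hits, with at most 1 second of staleness; a 60-second TTL gives 300/301, about 99.7% hits, but entries can be up to a minute stale. The curve flattens quickly, which is why very long TTLs on hot keys buy little extra hit rate for a lot of extra staleness.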

Invalidation — the genuinely hard part

“There are only two hard things in computer science: cache invalidation and naming things.” In a distributed system the challenge is that data changes in one service while cached copies live in Redis (and maybe in each instance’s near cache). Strategies, roughly in order of strength:

TTL-only — accept staleness up to the TTL. Simplest; fine for tolerant data.
Explicit eviction — evict/update on write with @CacheEvict/@CachePut. Correct as long as all writers go through that path.
Event-driven invalidation — publish a change event (e.g. on Kafka or Redis pub/sub) so every instance invalidates its near cache when data changes elsewhere. Necessary when you run a local cache layer.
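Event-driven invalidation reduces to two steps: writers publish the changed key, and every instance evicts it from its near cache. In this sketch `InvalidationBus` and `NearCache` are hypothetical names, and the bus is a synchronous in-process stand-in for a Kafka topic or a Redis pub/sub channel. Real brokers deliver asynchronously, and Redis pub/sub does not deliver messages to a subscriber that is disconnected when they are published, so a short TTL on the near cache remains the backstop for a missed event.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** In-process stand-in for a pub/sub topic carrying the keys of changed records. */
class InvalidationBus {
  private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
  void subscribe(Consumer<String> onChange) { subscribers.add(onChange); }
  void publish(String changedKey) { subscribers.forEach(s -> s.accept(changedKey)); }
}

/** One service instance's near cache, kept in sync by change events. */
class NearCache {
  private final Map<String, String> local = new ConcurrentHashMap<>();
  NearCache(InvalidationBus bus) { bus.subscribe(local::remove); } // evict the key on every event
  void put(String key, String value) { local.put(key, value); }
  String get(String key) { return local.get(key); }
}
```

The event carries only the key (for example `product:42`), not the new value; each instance reloads on its next miss, which keeps events small and avoids racing a newer value with an older one.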

Key design matters too: namespace keys clearly (product:{id}), and avoid the temptation to “clear the whole cache” on any change — that turns one write into a stampede.

Cache stampede (thundering herd)

A popular key expires; simultaneously hundreds of requests miss and all hit the database to recompute the same value — a stampede that can knock over the very source you were protecting. Defenses:

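Request coalescing — a per-key lock (Redis SET with the NX and PX options, or Redisson’s RLock) so only one request rebuilds the value while others wait or briefly serve stale data. The lock’s expiry keeps a crashed holder from blocking the key forever.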
Jittered TTLs — add randomness to expiry so many keys don’t expire at the same instant after a bulk load.
Stale-while-revalidate — serve the slightly-expired value immediately and refresh in the background.
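Request coalescing within a single JVM can be sketched with one `CompletableFuture` per key: the first miss registers a future and does the load, and concurrent misses for the same key wait on that future instead of querying the database. `SingleFlight` and `jitteredTtl` are hypothetical names. This only coalesces within one instance; across instances you still need a distributed lock such as a Redis SET with NX and PX. The jitter helper adds a random extra of up to a given fraction of the base TTL.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** Per-key request coalescing inside one JVM: concurrent misses share a single load. */
class SingleFlight<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

  V load(K key, Function<K, V> loader) {
    CompletableFuture<V> mine = new CompletableFuture<>();
    CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
    if (existing != null) {
      return existing.join();          // another caller is loading: wait for its result
    }
    try {
      V value = loader.apply(key);     // only this caller hits the database
      mine.complete(value);
      return value;
    } catch (RuntimeException e) {
      mine.completeExceptionally(e);   // waiters see the same failure
      throw e;
    } finally {
      inFlight.remove(key, mine);      // the next miss after this starts a fresh load
    }
  }

  /** Base TTL plus a random extra of up to jitterFraction of it, so bulk-loaded keys don't expire together. */
  static Duration jitteredTtl(Duration base, double jitterFraction) {
    long maxExtraMillis = (long) (base.toMillis() * jitterFraction);
    return base.plusMillis(ThreadLocalRandom.current().nextLong(maxExtraMillis + 1));
  }
}
```

In a real service the loader passed to `load` would read the database and populate the cache; callers that arrive after the entry is written get ordinary cache hits and never reach `SingleFlight` at all.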

Design for the cache failing

Redis is a dependency, and dependencies fail. A cache should be an optimization, not a single point of failure — if Redis is down, the service should fall back to the source (slower but working), not error out. Configure short Redis timeouts so a slow cache fails fast rather than adding latency to every request, and wrap cache access so a Redis outage degrades performance instead of availability. Test this path explicitly.
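A minimal fail-open wrapper, sketched in plain Java: any exception from the cache is counted and swallowed, and the request falls through to the source. `SharedCache` (a stand-in for a Redis-backed client) and `FailOpenCache` are hypothetical names. In Spring the equivalent hook is a `CacheErrorHandler` returned from `CachingConfigurer.errorHandler()`; Spring's `LoggingCacheErrorHandler` logs cache exceptions instead of rethrowing them. The fail-fast part belongs in the client configuration, for example the `spring.data.redis.timeout` property in Spring Boot 3.x.

```java
import java.util.Optional;
import java.util.function.Function;

/** Stand-in for a Redis-backed cache client; real clients throw on timeouts or lost connections. */
interface SharedCache {
  Optional<String> get(String key);
  void put(String key, String value);
}

/** Treats the cache as an optimization: any cache failure falls back to the source. */
class FailOpenCache {
  private final SharedCache cache;
  private final Function<String, String> source;
  int cacheErrors; // export as a metric in a real service (not thread-safe; demo only)

  FailOpenCache(SharedCache cache, Function<String, String> source) {
    this.cache = cache;
    this.source = source;
  }

  String get(String key) {
    try {
      Optional<String> hit = cache.get(key);
      if (hit.isPresent()) return hit.get();
    } catch (RuntimeException e) {
      cacheErrors++;                    // cache down or slow: degrade, don't fail
    }
    String value = source.apply(key);   // slower, but still correct
    try {
      cache.put(key, value);
    } catch (RuntimeException e) {
      cacheErrors++;                    // best effort: a failed cache write is not a failed request
    }
    return value;
  }
}
```

Counting the swallowed errors matters: a fail-open cache hides outages from users, so the error count is how operators find out the cache is down.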

Operational pitfalls

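Caching nulls/errors unintentionally — cache a “not found” and you can serve it long after the record appears. Decide deliberately: Spring’s disableCachingNullValues() rejects nulls, so pair it with unless = "#result == null" on @Cacheable; otherwise a null result raises an exception instead of simply not being cached.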
Big keys / big values — a multi-MB value or a giant collection in one key hurts Redis latency for everyone. Keep entries small.
Serialization mismatches — pin a stable serializer (e.g. JSON) so a class change doesn’t make existing cached entries unreadable across a deploy.
No eviction policy on Redis — set maxmemory and an eviction policy (e.g. allkeys-lru) so Redis sheds cold keys instead of OOMing.
Flying blind — export hit/miss metrics (Micrometer) and watch the hit rate; a cache with a 5% hit rate is just added complexity.
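The number to watch is hits / (hits + misses). A minimal counter pair, sketched with JDK `LongAdder`s, shows the arithmetic; `HitRateTracker` is a hypothetical name. In a Spring Boot service with Actuator, Micrometer publishes cache metrics (a `cache.gets` meter tagged with `result` hit or miss) instead, and for Redis caches recent Boot versions record them only when statistics are enabled (`spring.cache.redis.enable-statistics`).

```java
import java.util.concurrent.atomic.LongAdder;

/** Minimal hit/miss counters; LongAdder keeps increments cheap under contention. */
class HitRateTracker {
  private final LongAdder hits = new LongAdder();
  private final LongAdder misses = new LongAdder();

  void recordHit() { hits.increment(); }
  void recordMiss() { misses.increment(); }

  /** Fraction of lookups served from the cache, or 0 when there have been no lookups. */
  double hitRate() {
    long h = hits.sum(), m = misses.sum();
    return (h + m) == 0 ? 0.0 : (double) h / (h + m);
  }
}
```

One hit in twenty lookups yields 0.05, the 5% case above: the cache adds a network hop to nineteen requests to save one.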

Takeaways

Caching done right is deliberate: cache read-heavy, expensive, staleness-tolerant data; reach for Redis when state must be shared; always set a TTL; plan invalidation before you ship; protect hot keys from stampedes; and make the system survive the cache being down. Spring’s cache abstraction makes the mechanics easy — the engineering is in the staleness, invalidation, and failure decisions, and in measuring the hit rate so you know the cache is actually paying for itself.

Frequently asked questions

When should I use a distributed cache like Redis instead of an in-memory cache?
Use an in-process cache (Caffeine) for small, hot, read-mostly data local to one instance. Use Redis when multiple service instances must share cached data, when the cache must survive restarts, or when entries are too large to hold per-instance. Many systems layer both (near cache + Redis).

What is a cache stampede and how do you prevent it?
A stampede happens when a popular key expires and many concurrent requests all miss and hit the database at once. Prevent it with request coalescing (a lock so only one thread refills), slightly randomized TTLs to avoid synchronized expiry, and optionally serving stale data while refreshing in the background.
