Extending the Platform Cache Library: Distributed Locking, Cache Warming and Cache Stampede Protection

Introduction

In the previous article, we built a reusable caching library that provides:

  • L1 Cache using EhCache
  • L2 Cache using Redis or GemFire
  • Strategy Pattern based provider selection
  • Kafka and Spring Event based invalidation
  • Auto configuration
  • Metrics and monitoring

While this solution works well for most workloads, large-scale production systems face additional challenges:

  • Multiple pods rebuilding the same cache simultaneously
  • Cache stampede during peak traffic
  • Cold cache after deployments
  • Expensive database queries triggered repeatedly
  • Traffic spikes after cache expiration

To solve these problems, we will enhance our caching library with:

  • Distributed Locking
  • Cache Warming
  • Cache Stampede Protection
  • Double Check Locking
  • Randomized TTL
  • Background Refresh
  • Cache Preloading

The goal remains unchanged:

Microservices should only add a dependency and configure properties.

The library handles the complexity.


Current Architecture

Client
   |
Controller
   |
Service
   |
Platform Cache Library
   |
-------------------------
|                       |
L1 Cache           L2 Cache
EhCache       Redis/GemFire
|
Database

New Enhanced Architecture

Client
   |
Controller
   |
Service
   |
Platform Cache Library
   |
----------------------------
|            |             |
L1 Cache   L2 Cache    Lock Manager
EhCache   Redis/GemFire
                         |
                  Distributed Lock
                         |
                    Redis/GemFire

Additional Components:

Cache Warming Engine

Stampede Protection

Background Refresh Engine

Distributed Lock Service

Problem 1: Cache Stampede

Assume:

product:1001

Expires at:

10:00:00 AM

At:

10:00:01

5000 requests arrive.

Without protection:

5000 Requests
      |
Cache Miss
      |
5000 DB Calls

Result:

  • Database overload
  • Thread pool exhaustion
  • Increased latency

Solution: Distributed Locking

Only one pod should rebuild a given cache entry at a time.

All other requests wait for that pod to populate the cache instead of querying the database themselves.


Lock Provider Interface

Create a new package:

com.company.cache.lock

DistributedLockProvider

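public interface DistributedLockProvider {

    /**
     * Tries to acquire the lock once, without blocking.
     *
     * @param timeoutSeconds lease after which the lock expires on its own
     * @return true if the caller now holds the lock
     */
    boolean acquireLock(
            String lockKey,
            long timeoutSeconds);

    /** Releases the lock identified by lockKey. */
    void releaseLock(
            String lockKey);

}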
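For unit tests it helps to have an implementation of this contract that needs no Redis or GemFire. The following is a minimal sketch, assuming that timeoutSeconds is the lease duration; InMemoryLockProvider is a hypothetical name, and the class works inside one JVM only, so it is a test double and not a distributed lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Same contract as the interface above, repeated so the sketch compiles on its own.
interface DistributedLockProvider {
    boolean acquireLock(String lockKey, long timeoutSeconds);
    void releaseLock(String lockKey);
}

// Hypothetical test double: single JVM only, NOT a real distributed lock.
final class InMemoryLockProvider implements DistributedLockProvider {

    // lockKey -> lease expiry on the System.nanoTime() scale
    private final ConcurrentHashMap<String, Long> leases = new ConcurrentHashMap<>();

    @Override
    public boolean acquireLock(String lockKey, long timeoutSeconds) {
        long now = System.nanoTime();
        long expiry = now + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        boolean[] acquired = {false};
        // compute() is atomic per key, so only one caller can take a free or expired lease.
        leases.compute(lockKey, (key, current) -> {
            if (current == null || current - now <= 0) {
                acquired[0] = true;
                return expiry;
            }
            return current;
        });
        return acquired[0];
    }

    @Override
    public void releaseLock(String lockKey) {
        leases.remove(lockKey);
    }
}
```

A second acquireLock on the same key returns false until the first holder releases the lock or the lease runs out.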

Redis Lock Implementation

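@Component
@ConditionalOnProperty(
        name="cache.l2.type",
        havingValue="redis")
public class RedisLockProvider
        implements DistributedLockProvider {

    // Deletes the key only if it still holds the caller's token
    // (RedisScript types: org.springframework.data.redis.core.script)
    private static final RedisScript<Long> RELEASE_SCRIPT =
            new DefaultRedisScript<>(
                    "if redis.call('get', KEYS[1]) == ARGV[1] "
                  + "then return redis.call('del', KEYS[1]) "
                  + "else return 0 end",
                    Long.class);

    @Autowired
    private RedisTemplate<String,String>
            redisTemplate;

    // Tokens of the locks held by the current thread.
    // Assumes a lock is released on the thread that acquired it.
    private final ThreadLocal<Map<String,String>> tokens =
            ThreadLocal.withInitial(HashMap::new);

    @Override
    public boolean acquireLock(
            String lockKey,
            long timeout) {

        String token = UUID.randomUUID().toString();

        Boolean success =
                redisTemplate
                    .opsForValue()
                    .setIfAbsent(
                        lockKey,
                        token,
                        Duration.ofSeconds(timeout));

        if(Boolean.TRUE.equals(success)) {
            tokens.get().put(lockKey, token);
            return true;
        }

        return false;
    }

    @Override
    public void releaseLock(
            String lockKey) {

        String token = tokens.get().remove(lockKey);

        // If the lease expired and another pod took the lock,
        // the token no longer matches and nothing is deleted
        if(token != null) {
            redisTemplate.execute(
                    RELEASE_SCRIPT,
                    Collections.singletonList(lockKey),
                    token);
        }
    }
}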
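A lock that carries a TTL can expire while its owner is still loading from the database. If release is an unconditional delete, the late owner then removes a lock that already belongs to another pod. The sketch below illustrates the difference between delete and compare-and-delete. FakeLockStore is a hypothetical in-memory stand-in with a manual clock, not a Redis client; the pod names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a key-value store with per-key expiry.
final class FakeLockStore {

    private static final class Entry {
        final String owner;
        final long expiresAtMillis;

        Entry(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    // Manual clock so that the example is deterministic.
    final AtomicLong clockMillis = new AtomicLong();

    // Stores the owner only if the key is free or its lease has expired.
    boolean setIfAbsent(String key, String owner, long ttlMillis) {
        long now = clockMillis.get();
        boolean[] stored = {false};
        entries.compute(key, (k, entry) -> {
            if (entry == null || entry.expiresAtMillis <= now) {
                stored[0] = true;
                return new Entry(owner, now + ttlMillis);
            }
            return entry;
        });
        return stored[0];
    }

    // Unconditional delete: removes the lock no matter who holds it.
    void delete(String key) {
        entries.remove(key);
    }

    // Compare-and-delete: removes the lock only if the caller still owns it.
    boolean deleteIfOwner(String key, String owner) {
        boolean[] removed = {false};
        entries.computeIfPresent(key, (k, entry) -> {
            if (entry.owner.equals(owner)) {
                removed[0] = true;
                return null; // returning null removes the mapping
            }
            return entry;
        });
        return removed[0];
    }

    // Current owner, or null when the key is free or expired.
    String ownerOf(String key) {
        Entry entry = entries.get(key);
        boolean live = entry != null && entry.expiresAtMillis > clockMillis.get();
        return live ? entry.owner : null;
    }
}
```

If pod-1 acquires the lock, its lease expires, and pod-2 acquires it next, then deleteIfOwner called by pod-1 leaves the lock of pod-2 untouched, whereas delete would remove it and let a third pod start a parallel rebuild.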

GemFire Lock Implementation

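@Component
@ConditionalOnProperty(
        name="cache.l2.type",
        havingValue="gemfire")
public class GemFireLockProvider
        implements DistributedLockProvider {

    // Caution: ReentrantLock lives inside one JVM, so this implementation
    // only serializes threads within a single pod. It does not coordinate
    // rebuilds across pods; that requires a lock shared by all members.
    // The timeout argument is unused because an in-process lock has no lease.
    private final ConcurrentHashMap<
            String,
            ReentrantLock> locks =
            new ConcurrentHashMap<>();

    @Override
    public boolean acquireLock(
            String key,
            long timeout) {

        return locks
                .computeIfAbsent(
                        key,
                        k -> new ReentrantLock())
                .tryLock();
    }

    @Override
    public void releaseLock(
            String key) {

        ReentrantLock lock = locks.get(key);

        // Guard against a missing entry and against unlocking
        // a lock that the calling thread does not hold
        if(lock != null && lock.isHeldByCurrentThread()) {

            lock.unlock();
        }
    }
}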

Enhanced Cache Manager

Current implementation:

L1

↓

L2

↓

DB

Updated flow:

L1

↓

L2

↓

Acquire Distributed Lock

↓

Check Cache Again

↓

Load DB

↓

Update Cache

↓

Release Lock

Double Check Locking

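@SuppressWarnings("unchecked")
public <T> T get(
        String key,
        Supplier<T> supplier,
        long ttl) {

    // 1. L1 lookup
    Object local =
            localCache.get(key);

    if(local != null) {

        return (T)local;
    }

    // 2. L2 lookup
    Object distributed =
            provider.get(key);

    if(distributed != null) {

        localCache.put(key, distributed);

        return (T)distributed;
    }

    String lockKey =
            "lock:" + key;

    // 3. Only the lock owner rebuilds the entry
    //    (10 = lease in seconds, see cache.lock.timeout)
    boolean acquired =
            lockProvider.acquireLock(
                    lockKey,
                    10);

    if(acquired) {

        try {

            // 4. Second check: another pod may have populated
            //    L2 between our miss and our lock acquisition
            distributed =
                    provider.get(key);

            if(distributed != null) {

                localCache.put(key, distributed);

                return (T)distributed;
            }

            T value =
                    supplier.get();

            // Null results are not cached
            if(value != null) {

                provider.put(
                        key,
                        value,
                        ttl);

                localCache.put(
                        key,
                        value);
            }

            return value;

        } finally {

            lockProvider.releaseLock(
                    lockKey);
        }

    }

    // 5. Everyone else polls L2 until the owner has written the value
    return waitForCache(key);
}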

Waiting Requests Strategy

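Requests that fail to acquire the lock poll the L2 cache until the lock owner has written the value, for at most one second (10 retries of 100 ms).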

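@SuppressWarnings("unchecked")
private <T> T waitForCache(
        String key) {

    int retries = 10;

    while(retries-- > 0) {

        Object value =
                provider.get(key);

        if(value != null) {

            return (T)value;
        }

        try {

            Thread.sleep(100);

        } catch(InterruptedException e) {

            // Restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();

            break;
        }
    }

    // Timed out or interrupted: the caller must handle null
    return null;
}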

This reduces:

5000 DB Calls

↓

1 DB Call
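The effect can be reproduced without Redis or a database. The sketch below is a single-JVM simulation under stated assumptions: StampedeDemo is a hypothetical class, a ConcurrentHashMap stands in for the L2 cache, another one stands in for the lock, and the loader sleeps 50 ms to imitate a slow query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical single-JVM simulation of the lock + second-check flow.
final class StampedeDemo {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Boolean> locks = new ConcurrentHashMap<>();

    String get(String key, Supplier<String> loader) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String lockKey = "lock:" + key;
        if (locks.putIfAbsent(lockKey, Boolean.TRUE) == null) { // lock acquired
            try {
                cached = cache.get(key); // second check under the lock
                if (cached != null) {
                    return cached;
                }
                String value = loader.get();
                cache.put(key, value);
                return value;
            } finally {
                locks.remove(lockKey);
            }
        }
        // Not the lock owner: poll the cache for up to about one second.
        for (int attempt = 0; attempt < 100; attempt++) {
            cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            sleep(10);
        }
        return null;
    }

    // Fires concurrent lookups for one key and returns how often the loader ran.
    static int simulate(int requests) {
        StampedeDemo demo = new StampedeDemo();
        AtomicInteger dbCalls = new AtomicInteger();
        Supplier<String> loader = () -> {
            dbCalls.incrementAndGet();
            sleep(50); // imitates a slow database query
            return "Product 1001";
        };
        ExecutorService pool = Executors.newFixedThreadPool(16);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                start.await();
                return demo.get("product:1001", loader);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return dbCalls.get();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The second check is what keeps the count at one: a thread that missed the cache before the owner wrote the value, and acquires the lock only after the owner released it, finds the value on the re-read and skips the loader.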

Problem 2: Simultaneous Expiry

Assume:

100000 keys

TTL:

300 Seconds

All expire together.

This creates traffic spikes.


Randomized TTL

Add jitter.


Utility Class

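public class TtlUtil {

    // Returns ttl plus a random extra in [0, ttl * jitterPercent / 100)
    public static long calculateTtl(
            long ttl,
            int jitterPercent) {

        long maxJitter =
                ttl * jitterPercent / 100;

        // nextLong(bound) throws IllegalArgumentException for bound <= 0
        if(maxJitter <= 0) {

            return ttl;
        }

        long jitter =
                ThreadLocalRandom
                        .current()
                        .nextLong(maxJitter);

        return ttl + jitter;
    }
}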

Update Cache Write

long finalTtl =
        TtlUtil.calculateTtl(
                ttl,
                20);

provider.put(
        key,
        value,
        finalTtl);

Example:

Base TTL = 300, jitter = 20%

Actual TTL (always between 300 and 359, because jitter is only added):

312
345
356
303
324

No mass expiration.
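How much jitter flattens the spike can be estimated with a small simulation. The sketch below is an illustration only: JitterDemo and worstSecond are hypothetical names, and it assumes all keys are written at the same instant, which is the worst case described above.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper that measures how jitter spreads expirations.
final class JitterDemo {

    // Adds a random extra in [0, ttl * jitterPercent / 100); never shortens the TTL.
    static long withJitter(long ttlSeconds, int jitterPercent) {
        long maxJitter = ttlSeconds * jitterPercent / 100;
        if (maxJitter <= 0) {
            return ttlSeconds;
        }
        return ttlSeconds + ThreadLocalRandom.current().nextLong(maxJitter);
    }

    // Largest number of keys that expire within the same second,
    // assuming every key is written at the same moment.
    static int worstSecond(int keys, long ttlSeconds, int jitterPercent) {
        long maxJitter = ttlSeconds * jitterPercent / 100;
        int[] expiriesPerSecond = new int[(int) Math.max(1, maxJitter)];
        int worst = 0;
        for (int i = 0; i < keys; i++) {
            int offset = (int) (withJitter(ttlSeconds, jitterPercent) - ttlSeconds);
            expiriesPerSecond[offset]++;
            worst = Math.max(worst, expiriesPerSecond[offset]);
        }
        return worst;
    }
}
```

With 100000 keys, a base TTL of 300 seconds and 0% jitter, worstSecond returns 100000. With 20% jitter the expirations are spread over 60 distinct seconds, so the busiest second carries roughly one sixtieth of the keys.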


Problem 3: Cold Cache After Deployment

Deployment occurs.

All pods restart.

Cache becomes empty.

Every request hits DB.


Cache Warming

Preload cache during startup.


New Package

com.company.cache.warming

CacheWarmer Interface

public interface CacheWarmer {

    void warm();
}

Product Cache Warmer Example

@Component
public class ProductCacheWarmer
        implements CacheWarmer {

    @Autowired
    private ProductRepository repository;

    @Autowired
    private MultiLevelCacheManager cache;

    @Override
    public void warm() {

        repository.findTop100Products()
                .forEach(product -> {

                    cache.put(
                        "product:" +
                        product.getId(),
                        product,
                        3600);
                });
    }
}

Startup Listener

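@Component
@ConditionalOnProperty(
        name="cache.warming.enabled",
        havingValue="true")
public class CacheWarmupRunner
        implements ApplicationRunner {

    private static final Logger log =
            LoggerFactory.getLogger(CacheWarmupRunner.class);

    // Optional: a service may define no warmers at all
    @Autowired(required = false)
    private List<CacheWarmer>
            warmers = Collections.emptyList();

    @Override
    public void run(
            ApplicationArguments args) {

        warmers.forEach(warmer -> {
            try {
                warmer.warm();
            } catch(RuntimeException e) {
                // One failing warmer should not abort startup
                log.warn("Cache warmer failed", e);
            }
        });
    }
}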
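Warming is an optimization, so it is reasonable to let startup continue when one warmer fails. The following is a plain-Java sketch of that idea without Spring; SafeWarmupRunner is a hypothetical name, and the CacheWarmer interface is repeated so that the example compiles on its own.

```java
import java.util.List;

// Same contract as the CacheWarmer interface above.
interface CacheWarmer {
    void warm();
}

// Hypothetical runner that isolates failures between warmers.
final class SafeWarmupRunner {

    // Runs every warmer in order and returns how many of them failed.
    static int runAll(List<CacheWarmer> warmers) {
        int failures = 0;
        for (CacheWarmer warmer : warmers) {
            try {
                warmer.warm();
            } catch (RuntimeException e) {
                // A cold cache is slower but still correct, so keep going.
                failures++;
            }
        }
        return failures;
    }
}
```

The returned count can feed a metric or a log line so that a silently cold cache does not go unnoticed.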

Microservice Configuration

cache.warming.enabled=true

Selective Warming

Warm only critical data.

Examples:

Products

Reference Data

Country Codes

Configuration

Feature Flags

Avoid warming:

Orders

Invoices

Transactions

Background Refresh

Instead of waiting for entries to expire, refresh hot keys proactively on a schedule.


Refresh Configuration

cache.refresh.enabled=true

cache.refresh.interval=300

Refresh Scheduler

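// cache.refresh.interval is assumed to be in seconds. Without timeUnit
// (available since Spring Framework 5.3) it is read as milliseconds.
@Scheduled(
        fixedDelayString =
        "${cache.refresh.interval}",
        timeUnit = TimeUnit.SECONDS)
public void refresh() {

    hotKeys.forEach(key -> {

        Object value =
                refreshProvider.load(key);

        // Keep the existing entry when the source returns nothing
        if(value != null) {

            // The TTL should exceed the refresh interval,
            // otherwise entries can expire between two runs
            cache.put(
                    key,
                    value,
                    300);
        }
    });
}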
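A refresh run should never make the cache worse than it was. The sketch below shows one pass that leaves the stale entry in place when the loader fails or returns null. RefreshPass and refreshOnce are hypothetical names, and a plain Map stands in for the cache.

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single refresh pass over a set of hot keys.
final class RefreshPass {

    // Reloads each key and returns how many entries were replaced.
    // On a loader failure or a null result the stale entry is kept.
    static int refreshOnce(
            Collection<String> hotKeys,
            Function<String, Object> loader,
            Map<String, Object> cache) {

        int refreshed = 0;
        for (String key : hotKeys) {
            try {
                Object value = loader.apply(key);
                if (value != null) {
                    cache.put(key, value);
                    refreshed++;
                }
            } catch (RuntimeException e) {
                // Keep serving the stale value; the next pass retries.
            }
        }
        return refreshed;
    }
}
```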

Tracking Hot Keys

Count hits per key so that the refresh scheduler can pick the most frequently read keys.

ConcurrentHashMap<
        String,
        AtomicLong>
        hitCounter;
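One way to turn the hit counter into a list of hot keys is sketched below. HotKeyTracker, recordHit, hottest and reset are hypothetical names; the sketch assumes that the cache manager calls recordHit on every read and that the scheduler asks for the top keys before each run.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical tracker built on the hitCounter map shown above.
final class HotKeyTracker {

    private final ConcurrentHashMap<String, AtomicLong> hitCounter =
            new ConcurrentHashMap<>();

    // Called on every cache read.
    void recordHit(String key) {
        hitCounter.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns up to `limit` keys ordered by hit count, highest first.
    List<String> hottest(int limit) {
        return hitCounter.entrySet().stream()
                // Snapshot the counts first so that concurrent increments
                // cannot change the ordering in the middle of the sort.
                .map(e -> new AbstractMap.SimpleImmutableEntry<String, Long>(
                        e.getKey(), e.getValue().get()))
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Clears all counters so that the next window starts fresh.
    void reset() {
        hitCounter.clear();
    }
}
```

Resetting the counters periodically keeps keys that were popular last week from staying in the refresh set forever, and it bounds the memory used by the map.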

Auto Configuration Updates

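@ConfigurationProperties(
        prefix="cache")
public class CacheProperties {

    // Nested objects mirror the dotted keys, for example
    // cache.warming.enabled and cache.refresh.interval
    private Warming warming = new Warming();
    private Refresh refresh = new Refresh();
    private Lock lock = new Lock();
    private Ttl ttl = new Ttl();

    public static class Warming {
        private boolean enabled;
    }

    public static class Refresh {
        private boolean enabled;
        private int interval;
    }

    public static class Lock {
        private boolean enabled;
        private int timeout;
    }

    public static class Ttl {
        private int jitter;
    }

    // Getters and setters omitted for brevity
}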

Updated Properties

Redis

cache.enabled=true

cache.l1.enabled=true

cache.l2.type=redis

cache.lock.enabled=true

cache.lock.timeout=10

cache.ttl.jitter=20

cache.warming.enabled=true

cache.refresh.enabled=true

cache.refresh.interval=300

cache.invalidation.type=kafka

GemFire

cache.enabled=true

cache.l2.type=gemfire

cache.lock.enabled=true

cache.lock.timeout=10

cache.warming.enabled=true

cache.refresh.enabled=true

New Metrics

Add:

cache.lock.acquired

cache.lock.failed

cache.warming.count

cache.refresh.count

cache.stampede.prevented

cache.wait.count

Micrometer:

Counter.builder(
        "cache.stampede.prevented")
       .register(registry)
       .increment();

Final Production Architecture

                 Client
                    |
               Load Balancer
                    |
        -------------------------
        |           |           |
       Pod1        Pod2        Pod3
        |           |           |
      EhCache     EhCache     EhCache
        \           |          /
         \          |         /
              Redis/GemFire
                    |
            Distributed Lock
                    |
                Database

Features Provided by Library:

✓ L1 Cache (EhCache)

✓ L2 Cache (Redis/GemFire)

✓ Distributed Locking

✓ Cache Stampede Protection

✓ Double Check Locking

✓ Randomized TTL

✓ Cache Warming

✓ Background Refresh

✓ Kafka Invalidation

✓ Spring Event Invalidation

✓ Metrics & Monitoring

✓ Auto Configuration


What Changes in Microservices?

Almost nothing.

Add dependency:

<dependency>
    <groupId>com.company.platform</groupId>
    <artifactId>platform-cache-lib</artifactId>
</dependency>

Configure:

cache.l2.type=redis

cache.warming.enabled=true

cache.lock.enabled=true

Optional:

@Component
public class ProductWarmer
        implements CacheWarmer {

    @Override
    public void warm() {

        // preload important data
    }
}

The library handles everything else.


Conclusion

With these enhancements, the caching framework evolves from a simple L1/L2 cache abstraction into a full-fledged enterprise caching platform. It now protects downstream systems from cache stampedes, automatically warms critical data after deployments, coordinates cache rebuilds across pods using distributed locks, and proactively refreshes hot data before expiration.

The result is a production-ready caching platform that can be adopted consistently across dozens of Spring Boot microservices with minimal code changes and centralized operational control.
