Infrastructure · 2 min read

Edge caching strategies that actually work at scale

How we reduced origin load by 94% using stale-while-revalidate, surrogate keys, and fine-grained cache invalidation.

David Park
Infrastructure Engineer

We run a multi-tenant SaaS platform serving ~50M requests per month. Our origin servers were drowning. Database CPU sat at 80% during peak hours. API p99 latency hit 2.3 seconds.

The fix was not horizontal scaling. It was smarter caching.

The cache hierarchy

Modern edge platforms give you multiple cache layers:

  1. Browser cache: Cache-Control headers
  2. CDN edge: PoP-level caching
  3. Regional cache: Shared across nearby users
  4. Origin cache: Redis, Memcached, or in-memory

We use all four, with different TTLs and invalidation strategies.
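Layers 1–3 can be steered with standard Cache-Control directives: max-age governs the browser, while s-maxage overrides it for shared caches (edge and regional). A minimal sketch, assuming a helper that builds the header (the function name is illustrative, not part of any CDN API):

```typescript
// Sketch: different TTLs per cache layer via standard Cache-Control directives.
// max-age applies to the browser; s-maxage overrides it for shared caches
// (CDN edge and regional). Origin-side caching (Redis etc.) is separate.
function cacheHeaders(browserTtl: number, cdnTtl: number): Record<string, string> {
  return {
    "Cache-Control": `public, max-age=${browserTtl}, s-maxage=${cdnTtl}`,
  };
}

// Example: browsers keep a copy for 60s, shared caches for 10 minutes.
const headers = cacheHeaders(60, 600);
```

Keeping the browser TTL short while letting shared caches hold content longer means a purge at the CDN takes effect quickly for all users.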

Stale-while-revalidate

The most impactful change was aggressive stale-while-revalidate (SWR):

Cache-Control: public, max-age=60, stale-while-revalidate=86400

This means:

  • First request fetches from origin, caches for 60 seconds
  • Subsequent requests serve from cache instantly
  • After 60 seconds, the next request still serves stale data but triggers a background revalidation
  • Origin has 24 hours to update before cache truly expires

For our product catalog, this reduced origin requests by 89% with no user-visible staleness.
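The four bullets above describe a small state machine: fresh entries are served directly, stale-but-within-window entries are served while a background refresh runs, and fully expired entries block on the origin. A minimal in-process sketch of that logic (a real CDN implements this per edge PoP; the class and names here are hypothetical):

```typescript
// Minimal sketch of the stale-while-revalidate state machine for one cache key.
type Entry<T> = { value: T; storedAt: number };

class SwrCache<T> {
  private entry?: Entry<T>;
  private revalidating = false;

  constructor(
    private fetchFresh: () => Promise<T>,
    private maxAgeMs: number,    // e.g. 60_000 for max-age=60
    private swrWindowMs: number, // e.g. 86_400_000 for stale-while-revalidate=86400
  ) {}

  async get(now = Date.now()): Promise<T> {
    const e = this.entry;
    if (e) {
      const age = now - e.storedAt;
      if (age <= this.maxAgeMs) return e.value; // fresh: serve from cache
      if (age <= this.maxAgeMs + this.swrWindowMs) {
        this.revalidate(now);                   // stale: serve, refresh in background
        return e.value;
      }
    }
    // cold or fully expired: block on the origin
    const value = await this.fetchFresh();
    this.entry = { value, storedAt: now };
    return value;
  }

  private revalidate(now: number): void {
    if (this.revalidating) return; // collapse concurrent revalidations
    this.revalidating = true;
    this.fetchFresh()
      .then((value) => { this.entry = { value, storedAt: now }; })
      .finally(() => { this.revalidating = false; });
  }
}
```

The revalidation guard matters at scale: without it, a burst of requests against a stale entry would all trigger origin fetches, recreating the thundering-herd problem SWR is meant to avoid.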

Surrogate key invalidation

The hard part of caching is invalidation. We tag every cached response with surrogate keys:

  • product:12345
  • category:electronics
  • pricing:tier-pro

When a product updates, we purge all responses tagged with product:12345. This invalidates the product page, category listings, search results, and recommendation carousels simultaneously.
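The mechanism is a reverse index from tag to cached responses. A sketch, assuming a hypothetical in-process store (CDNs like Fastly expose the same idea via a Surrogate-Key response header):

```typescript
// Sketch of surrogate-key purging: each cached response carries a set of tags,
// and purging one tag evicts every response that carries it.
class TaggedCache {
  private responses = new Map<string, { body: string; tags: string[] }>();
  private byTag = new Map<string, Set<string>>();

  set(url: string, body: string, tags: string[]): void {
    this.responses.set(url, { body, tags });
    for (const tag of tags) {
      if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
      this.byTag.get(tag)!.add(url);
    }
  }

  get(url: string): string | undefined {
    return this.responses.get(url)?.body;
  }

  purge(tag: string): number {
    const urls = this.byTag.get(tag) ?? new Set<string>();
    for (const url of urls) this.responses.delete(url);
    this.byTag.delete(tag);
    return urls.size; // number of responses invalidated
  }
}
```

Because a response can carry many tags (product:12345 and category:electronics at once), a single purge call evicts every page that rendered that product, with no need to enumerate URLs.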

Fine-grained API caching

GraphQL makes CDN caching tricky because every query is a POST to /graphql. We solved this by:

  1. Persisted queries: Every unique query is registered under its SHA-256 hash and served via GET
  2. Automatic query analysis: We extract entity types from the AST and attach surrogate keys
  3. Partial responses: Hot fields (prices, stock) are fetched separately from cold fields (descriptions, images)
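The persisted-query step is the one that makes CDN caching possible at all: identical queries map to identical GET URLs, which edge caches can key on. A sketch of the hashing, in the spirit of Apollo's automatic persisted queries (the URL shape here is illustrative, not a specific CDN's API):

```typescript
import { createHash } from "node:crypto";

// Sketch: turn a GraphQL query into a cacheable GET request by addressing it
// with its SHA-256 hash instead of POSTing the full query body.
function persistedQueryUrl(
  query: string,
  variables: Record<string, unknown> = {},
): string {
  const hash = createHash("sha256").update(query).digest("hex");
  const params = new URLSearchParams({
    id: hash,
    variables: JSON.stringify(variables),
  });
  return `/graphql?${params.toString()}`;
}

// Identical queries always produce the same URL, so the CDN can cache the response.
const url = persistedQueryUrl("query { product(id: 12345) { name price } }");
```

Variables still appear in the URL, so each distinct variable set gets its own cache entry; the hash only deduplicates the query text itself.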

Results

Metric                  Before        After
Origin requests         48M/month     2.9M/month
API p99 latency         2.3s          89ms
Database CPU            78%           12%
CDN cache hit ratio     34%           97%

When not to cache

Caching is not universal. We never cache:

  • User-specific dashboards
  • Real-time financial data
  • Write operations
  • Admin panels

The key is understanding your data's consistency requirements and designing cache boundaries accordingly.

Edge Caching · Vercel · Performance · CDN