Edge caching strategies that actually work at scale
How we reduced origin load by 94% using stale-while-revalidate, surrogate keys, and fine-grained cache invalidation.
We run a multi-tenant SaaS platform serving ~50M requests per month. Our origin servers were drowning. Database CPU sat at 80% during peak hours. API p99 latency hit 2.3 seconds.
The fix was not horizontal scaling. It was smarter caching.
The cache hierarchy
Modern edge platforms give you multiple cache layers:
- Browser cache: Cache-Control headers
- CDN edge: PoP-level caching
- Regional cache: Shared across nearby users
- Origin cache: Redis, Memcached, or in-memory
We use all four, with different TTLs and invalidation strategies.
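The fall-through behavior of these layers can be sketched in a few lines. This is an illustrative model, not our production code: the layer names, TTL values, and `LayeredCache` class are all hypothetical.

```python
import time

# Layer name -> TTL in seconds (illustrative values, not our real config)
LAYERS = [
    ("browser", 60),         # Cache-Control max-age on the client
    ("edge", 300),           # CDN PoP
    ("regional", 900),       # shared across nearby PoPs
    ("origin_cache", 3600),  # Redis/Memcached in front of the database
]

class LayeredCache:
    def __init__(self):
        # layer name -> {key: (value, stored_at)}
        self.stores = {name: {} for name, _ in LAYERS}

    def get(self, key, fetch_origin):
        now = time.time()
        # Check each layer in order, closest to the user first
        for name, ttl in LAYERS:
            entry = self.stores[name].get(key)
            if entry and now - entry[1] < ttl:
                return entry[0], name  # hit at this layer
        # Full miss: hit the origin, then populate every layer on the way back
        value = fetch_origin(key)
        for name, _ in LAYERS:
            self.stores[name][key] = (value, now)
        return value, "origin"
```

Because each layer has its own TTL, an entry can expire at the browser while still being served from the edge, so the origin only sees a request once every layer has aged out.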
Stale-while-revalidate
The most impactful change was aggressive stale-while-revalidate (SWR):
Cache-Control: public, max-age=60, stale-while-revalidate=86400
This means:
- First request fetches from origin, caches for 60 seconds
- Subsequent requests serve from cache instantly
- After 60 seconds, the next request still serves stale data instantly but triggers a background revalidation
- Stale responses may be served for up to 24 hours; after that window, the entry truly expires and the next request blocks on the origin
For our product catalog, this reduced origin requests by 89% with no user-visible staleness.
Surrogate key invalidation
The hard part of caching is invalidation. We tag every cached response with surrogate keys:
product:12345 category:electronics pricing:tier-pro
When a product updates, we purge all responses tagged with product:12345. This invalidates the product page, category listings, search results, and recommendation carousels simultaneously.
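The tag-and-purge mechanics can be sketched with an in-memory index. This is illustrative only: the `SurrogateCache` class and its store are assumptions, standing in for the CDN's actual purge API.

```python
class SurrogateCache:
    def __init__(self):
        self.responses = {}  # url -> cached response body
        self.keys = {}       # surrogate key -> set of urls tagged with it

    def put(self, url, body, surrogate_keys):
        self.responses[url] = body
        for key in surrogate_keys:
            self.keys.setdefault(key, set()).add(url)

    def purge(self, surrogate_key):
        # One purge call invalidates every response carrying this tag
        for url in self.keys.pop(surrogate_key, set()):
            self.responses.pop(url, None)

cache = SurrogateCache()
cache.put("/products/12345", "...", ["product:12345", "category:electronics"])
cache.put("/categories/electronics", "...", ["category:electronics"])
cache.purge("product:12345")  # drops the product page, keeps the category page
```

The inverse index from key to URLs is what makes the purge cheap: the caller never needs to know which pages embed the product, only its tag.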
Fine-grained API caching
GraphQL makes CDN caching tricky because every query is a POST to /graphql. We solved this by:
- Persisted queries: Every unique query gets a SHA256 hash and is served via GET
- Automatic query analysis: We extract entity types from the AST and attach surrogate keys
- Partial responses: Hot fields (prices, stock) are fetched separately from cold fields (descriptions, images)
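The persisted-query step can be sketched as a register-then-resolve flow. The registry and function names here are hypothetical; in practice the hash travels in the GET request and an edge worker looks up the stored query before forwarding to the GraphQL server.

```python
import hashlib

REGISTRY = {}  # sha256 hash -> full query text (illustrative in-memory store)

def persist(query: str) -> str:
    """Register a query once; clients then send only its hash."""
    h = hashlib.sha256(query.encode()).hexdigest()
    REGISTRY[h] = query
    return h

def resolve(query_hash: str) -> str:
    """What an edge worker would do for GET /graphql?hash=<h>."""
    return REGISTRY[query_hash]

h = persist("query { product(id: 12345) { name price } }")
# The client now issues a cacheable GET carrying only `h`,
# so identical queries collapse onto one CDN cache entry.
```

Because the hash is deterministic, every client issuing the same query produces the same URL, which is what lets the CDN treat a formerly uncacheable POST as an ordinary cacheable GET.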
Results
| Metric | Before | After |
|---|---|---|
| Origin requests | 48M/month | 2.9M/month |
| API p99 latency | 2.3s | 89ms |
| Database CPU | 78% | 12% |
| CDN cache hit ratio | 34% | 97% |
When not to cache
Caching is not universal. We never cache:
- User-specific dashboards
- Real-time financial data
- Write operations
- Admin panels
The key is understanding your data's consistency requirements and designing cache boundaries accordingly.