Caching, Batching, and Queueing
Trade consistency for latency and throughput; decouple producers from consumers.
TL;DR
Caching stores the results of expensive computations in fast storage, trading consistency for latency; use TTLs and the cache-aside pattern. Batching combines multiple requests into one for efficiency, which pays off for database writes and API calls. Queueing decouples producers from consumers and protects against overload; use it for async tasks and background jobs. All three trade simplicity for performance, so apply them carefully. Cache invalidation is hard; prefer TTL-based expiration. Batch size trades latency (small batches = low latency) for throughput (large batches = high throughput).
Learning Objectives
- Design cache strategies (cache-aside, write-through, write-behind)
- Implement cache invalidation and TTL policies
- Identify opportunities for batching API calls, database writes, and event processing
- Use queues for decoupling and resilience
- Balance consistency, latency, and throughput trade-offs
- Monitor cache hit rates and queue depths
Motivating Scenario
A recommendation engine calls a database for every product view—1000 requests/sec → 1000 database queries/sec. Adding Redis caching (80% hit rate) reduces to 200 queries/sec. Batching write operations cuts database load by 10x. Using a queue for async recommendations decouples the view from computation. Together, these patterns drop latency from 500ms to 50ms and allow horizontal scaling.
Core Concepts
Caching Patterns
Cache-Aside
- Check the cache first; on a miss, fetch from the source and populate the cache
- Pros: Only caches data that is actually accessed; simple
- Cons: Cold cache on startup; cache misses add latency
- Best for: Read-heavy, evolving data (products, profiles)
Write-Through
- Write to the cache AND the source simultaneously
- Pros: Cache is always consistent with the source
- Cons: Write latency = max(cache, source); double write cost
- Best for: Critical data where a stale cache is unacceptable (see the sketch after this list)
Write-Behind
- Write to the cache immediately; sync to the source asynchronously
- Pros: Fast writes; asynchronous durability
- Cons: Data loss risk if the cache fails before the sync
- Best for: Non-critical, high-write-throughput data (logs, metrics)
Refresh-Ahead
- Proactively refresh the cache before the TTL expires
- Pros: Cache stays warm; no miss latency
- Cons: May refresh data nobody requests; added complexity
- Best for: Popular, predictable data (top products)
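To make write-through concrete, here is a minimal sketch assuming a Redis cache and a hypothetical db_write callable standing in for your data-access layer (neither is prescribed by the pattern itself):

import json
import redis

r = redis.from_url("redis://localhost")

def save_product_write_through(product: dict, db_write) -> None:
    """Write-through: update the source of truth and the cache together.

    db_write is a hypothetical callable that persists the product;
    swap in your real data-access layer.
    """
    db_write(product)  # 1. write to the source of truth
    r.set(f"product:{product['id']}", json.dumps(product))  # 2. mirror the value into the cache
    # If either write raises, the caller sees the exception;
    # write latency is roughly max(db, cache) plus two round-trips.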
Cache Invalidation Strategies
Cache invalidation is hard. Choose one:
- TTL (Time-To-Live): Expire after N seconds. Simple; risks staleness.
- Versioning: Include a version in the key (e.g., product:v2:123). Clean; requires upstream updates (see the sketch after this list).
- Event-Driven: Listen to change events; invalidate on updates. Consistent; adds complexity.
- Broadcast Invalidation: Cluster-wide delete on write. Strong consistency; network overhead.
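A minimal sketch of the versioning strategy, assuming a Redis cache and a hypothetical per-product version counter key (product:version:<id>) introduced only for illustration:

import json
import redis

r = redis.from_url("redis://localhost")

def versioned_key(product_id: int) -> str:
    # A per-product version counter; bumping it on every update makes old
    # cache entries unreachable without an explicit delete.
    version = int(r.get(f"product:version:{product_id}") or 0)
    return f"product:v{version}:{product_id}"

def on_product_update(product_id: int) -> None:
    # Bump the version; stale entries simply age out via their TTL.
    r.incr(f"product:version:{product_id}")

def get_cached(product_id: int):
    cached = r.get(versioned_key(product_id))
    return json.loads(cached) if cached else None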
Batching
Combine multiple requests to reduce overhead:
- Database writes: Instead of 100 individual INSERTs, one BATCH INSERT.
- API calls: Combine 10 product lookups into one /products?ids=1,2,3... request.
- Event processing: Group 1000 events into one batch for the Kafka producer.
Trade-off: Batch size vs. latency. Size 1 = low latency, low throughput. Size 1000 = high latency, high throughput.
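A rough back-of-the-envelope sketch of that trade-off; the arrival rate and per-request overhead below are assumed numbers, not measurements:

# Assumed numbers for illustration only.
arrival_rate = 1000           # items per second (assumed)
per_request_overhead_ms = 5   # fixed cost of one round-trip (assumed)

for batch_size in (1, 100, 1000):
    # Worst case: the first item in a batch waits for the batch to fill.
    fill_wait_ms = batch_size / arrival_rate * 1000
    # The fixed round-trip overhead is amortized across the batch.
    overhead_per_item_ms = per_request_overhead_ms / batch_size
    print(f"size={batch_size}: added latency <= {fill_wait_ms:.0f} ms, "
          f"overhead/item = {overhead_per_item_ms:.3f} ms")

# size=1:    ~1 ms added wait, 5 ms overhead per item (low latency, low throughput)
# size=1000: up to 1000 ms added wait, 0.005 ms overhead per item (high latency, high throughput)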
Queueing
Decouple producers (request handlers) from consumers (workers):
Request → API → Queue → Workers → Database
Benefits:
- Resilience: If workers are slow, queue absorbs load.
- Scalability: Add workers without changing API.
- Decoupling: API doesn't care how work is processed.
Challenges:
- Ordering: Some queues don't preserve order.
- Durability: Queue failure = lost messages (unless persistent).
- Latency: Queue adds round-trip time.
Practical Examples
Cache-Aside Pattern (Python)
import redis
import json
from typing import Optional
class ProductCache:
    def __init__(self, redis_url="redis://localhost"):
        self.redis = redis.from_url(redis_url)
        self.ttl = 3600  # 1 hour

    def get(self, product_id: int) -> Optional[dict]:
        """Get product from cache or fetch from DB."""
        cache_key = f"product:{product_id}"

        # Try cache first
        cached = self.redis.get(cache_key)
        if cached:
            print(f"Cache hit for {product_id}")
            return json.loads(cached)

        # Cache miss: fetch from database
        print(f"Cache miss for {product_id}")
        product = self._fetch_from_db(product_id)
        if product:
            # Populate cache
            self.redis.setex(
                cache_key,
                self.ttl,
                json.dumps(product)
            )
        return product

    def invalidate(self, product_id: int):
        """Invalidate cache on update."""
        cache_key = f"product:{product_id}"
        self.redis.delete(cache_key)

    def _fetch_from_db(self, product_id: int) -> Optional[dict]:
        # Simulate database fetch
        return {"id": product_id, "name": "Widget"}

# Usage
cache = ProductCache()
product = cache.get(1)  # Cache miss, fetches from DB
product = cache.get(1)  # Cache hit
cache.invalidate(1)     # Invalidate on update
Batching (Database)
import psycopg2
from typing import List

def insert_logs_unbatched(logs: List[dict]):
    """Slow: N database round-trips."""
    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    for log in logs:
        # N separate INSERT statements
        cur.execute(
            "INSERT INTO logs (user_id, action) VALUES (%s, %s)",
            (log["user_id"], log["action"])
        )
    conn.commit()
    conn.close()
    # 10,000 logs = 10,000 queries

def insert_logs_batched(logs: List[dict], batch_size=1000):
    """Fast: Batch INSERT statements."""
    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    for i in range(0, len(logs), batch_size):
        batch = logs[i:i + batch_size]
        # Multi-row INSERT
        args_str = ",".join(
            cur.mogrify("(%s, %s)", (log["user_id"], log["action"])).decode()
            for log in batch
        )
        cur.execute(f"INSERT INTO logs (user_id, action) VALUES {args_str}")
    conn.commit()
    conn.close()
    # 10,000 logs = 10 queries
# Benchmarks
# Unbatched: 2.5 seconds
# Batched (size=1000): 0.1 seconds → 25x speedup
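If you are already on psycopg2, the same effect can be had with the execute_values helper in psycopg2.extras; a brief sketch against the same hypothetical logs table:

from psycopg2.extras import execute_values

def insert_logs_execute_values(conn, logs, page_size=1000):
    # execute_values expands the single %s placeholder into multi-row
    # VALUES lists of up to page_size rows per statement.
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO logs (user_id, action) VALUES %s",
            [(log["user_id"], log["action"]) for log in logs],
            page_size=page_size,
        )
    conn.commit()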
Queueing (RabbitMQ)
import pika
import json

# Producer: Fast response, async work
def process_order(order):
    """Respond quickly, queue work."""
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # Declare queue (idempotent); durable=True so the queue itself survives a broker restart
    channel.queue_declare(queue='order_processing', durable=True)

    # Publish order to queue
    channel.basic_publish(
        exchange='',
        routing_key='order_processing',
        body=json.dumps(order),
        properties=pika.BasicProperties(delivery_mode=2)  # Persistent message
    )
    connection.close()
    return {"status": "queued"}  # Fast response

# Consumer: Process in background
def consume_orders():
    """Background worker processing orders."""
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='order_processing', durable=True)

    def callback(ch, method, properties, body):
        order = json.loads(body)
        print(f"Processing order: {order}")
        # Simulate long-running work (handlers defined elsewhere in the application)
        process_payment(order)
        send_confirmation_email(order)
        # Acknowledge after success
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(
        queue='order_processing',
        on_message_callback=callback,
        auto_ack=False  # Manual ack ensures no data loss
    )
    print("Waiting for orders...")
    channel.start_consuming()

# Run producer in API
# Run consumer in background worker
Common Pitfalls
Pitfall 1: Caching for too long; data becomes stale
- Risk: Users see outdated information.
- Fix: Set reasonable TTLs; validate freshness requirements.
Pitfall 2: Cache thundering herd
- Risk: When cache expires, all requests hit database simultaneously.
- Fix: Use refresh-ahead or randomize TTL expiration (jitter), as sketched below.
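A minimal jitter sketch; the base TTL and jitter fraction are illustrative choices:

import random

BASE_TTL = 300         # seconds; illustrative
JITTER_FRACTION = 0.1  # spread expirations by +/-10%

def jittered_ttl() -> int:
    # Randomizing the TTL prevents many keys cached at the same moment
    # from all expiring (and hitting the database) at the same moment.
    jitter = BASE_TTL * JITTER_FRACTION
    return int(BASE_TTL + random.uniform(-jitter, jitter))

# e.g. redis_client.setex(key, jittered_ttl(), value)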
Pitfall 3: Batch size too large; latency balloons
- Risk: User waits for batch to fill; feels slow.
- Fix: Use max-wait timeout (e.g., batch after 100 items OR 100ms).
Pitfall 4: Queue doesn't persist; messages lost
- Risk: Worker crashes; queued work is lost.
- Fix: Use a durable queue (RabbitMQ with persistence, Kafka) and manual acks, as sketched below.
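A minimal pika sketch of that durable setup, reusing the order_processing queue name from the example above:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# durable=True: the queue definition survives a broker restart.
# (Re-declaring an existing queue with different flags raises a channel error.)
channel.queue_declare(queue='order_processing', durable=True)

# delivery_mode=2: the message itself is written to disk.
channel.basic_publish(
    exchange='',
    routing_key='order_processing',
    body='{"order_id": 123}',
    properties=pika.BasicProperties(delivery_mode=2),
)
connection.close()
# Consumers should use auto_ack=False and ack only after successful processing.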
Real-World Performance Analysis
E-Commerce Product Search
Without Optimization: Query database for every search → 5000 queries/sec at peak → Database at 95% CPU
With Caching (Redis):
- Cache product search results with 5-minute TTL
- 80% cache hit rate → 1000 queries/sec to database
- Database CPU drops to 20%
- Latency: 500ms (database) → 50ms (cache)
With Batching:
- Batch product reviews indexing (update once per hour instead of per-review)
- Replaces ~100k per-review index updates per day with a single batched update per hour
- 99% throughput increase for real requests
With Queueing:
- User purchases trigger async email, recommendation, analytics tasks
- API response: 100ms (just writes the order to the database and confirms)
- Email/analytics processed in background
- Users see instant confirmation instead of waiting 3-5 seconds
Combined Impact:
- Latency: 5 seconds → 100ms (50x improvement)
- Throughput: 100 requests/sec → 10,000 requests/sec (100x)
- Database CPU: 95% → 15%
- User experience: "slow and frustrating" → "fast and responsive"
Cache Invalidation Strategies in Detail
TTL-Based Expiration:
// Simplest: automatic expiration
cache.set('product:123', product, { ttl: 300 }); // 5 minutes
// Risk: if the product price changes at 4:59, users see the stale price for 1 minute
// Trade-off: simpler implementation vs temporary staleness
Event-Driven Invalidation:
// When product updates, invalidate cache immediately
function updateProduct(id, newData) {
  db.update('products', id, newData);
  cache.delete(`product:${id}`); // Instant invalidation
}
// Next request: cache miss, fetches fresh data
// Trade-off: more complex but always fresh
Hybrid Approach (Recommended):
// Combine both: event-driven + TTL fallback
function updateProduct(id, newData) {
  db.update('products', id, newData);
  cache.set(`product:${id}`, newData, { ttl: 300 }); // Immediate fresh value + TTL fallback
  publishEvent('product:updated', id); // Other services can react
}
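The same hybrid can be sketched with the Python Redis client used earlier in the chapter; the product:updated channel name and the db_update callable are assumptions for illustration:

import json
import redis

r = redis.from_url("redis://localhost")

def update_product(product_id: int, new_data: dict, db_update) -> None:
    # db_update is a hypothetical data-access callable.
    db_update(product_id, new_data)
    # Event-driven refresh plus TTL fallback in one step:
    r.setex(f"product:{product_id}", 300, json.dumps(new_data))
    # Let other services react (channel name is illustrative).
    r.publish("product:updated", str(product_id))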
Batch Processing Timing
Size-or-Timeout Batching (fixed parameters):
# Wait 100ms or until the batch fills, whichever comes first
batch_timeout = 100  # ms
batch_size = 1000

async def add_to_batch(item):
    batch.append(item)  # `batch` and `flush_batch` are assumed to exist in the surrounding module
    if len(batch) >= batch_size:
        await flush_batch()
# A timer flushes any partial batch after the timeout
Issues: If items arrive slowly, users wait up to 100ms for response
Adaptive Batching (Better):
# For write-heavy: larger batch, longer wait is acceptable
# For latency-critical: smaller batch, flush sooner
if is_batch_write:
    timeout_ms = 500  # Can afford to wait
    size = 5000
else:
    timeout_ms = 50   # Need fast response
    size = 100
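For reference, a self-contained sketch of a size-or-timeout batcher built on asyncio; the flush target (a print) is a placeholder for the real batched write:

import asyncio

class Batcher:
    """Flush when the batch reaches max_size OR max_wait seconds elapse."""

    def __init__(self, max_size=100, max_wait=0.1):
        self.max_size = max_size
        self.max_wait = max_wait
        self.items = []
        self._timer = None

    async def add(self, item):
        self.items.append(item)
        if len(self.items) >= self.max_size:
            await self.flush()
        elif self._timer is None:
            # Start the timeout clock when the first item of a batch arrives.
            self._timer = asyncio.get_running_loop().call_later(
                self.max_wait, lambda: asyncio.create_task(self.flush())
            )

    async def flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if not self.items:
            return
        batch, self.items = self.items, []
        print(f"flushing {len(batch)} items")  # placeholder for the real write

async def main():
    b = Batcher(max_size=3, max_wait=0.05)
    for i in range(7):
        await b.add(i)       # flushes twice on size
    await asyncio.sleep(0.1)  # the timer flushes the remainder

asyncio.run(main())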
Queue Depth Monitoring
Queue Depth Patterns:
- Normal: baseline queue depth of 1-3 (small transient bursts)
- Peak: a sudden spike in requests pushes depth to ~50; workers catch up and it drops back to ~5
- Danger: depth of 100+ and still growing → workers can't keep up
Action: When queue depth exceeds 1000, auto-scale workers (add 10% more) or alert on-call.
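A minimal sketch of a depth check with pika; the threshold reuses the illustrative number above, and the scale/alert hook is a placeholder:

import pika

def check_queue_depth(queue_name="order_processing", alert_threshold=1000):
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    # passive=True only inspects the queue; it fails if the queue doesn't exist.
    declare_ok = channel.queue_declare(queue=queue_name, passive=True)
    depth = declare_ok.method.message_count
    connection.close()
    if depth > alert_threshold:
        print(f"ALERT: {queue_name} depth {depth} > {alert_threshold}")
        # placeholder: trigger autoscaling or page on-call here
    return depth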
Cache Layer Trade-Offs by Technology
- In-Memory (Node/Python process): Fastest (microseconds); lost on restart; single-instance only
- Redis: Fast (milliseconds); persistence options; distributed; adds network latency
- Memcached: Simple and very fast; no persistence; good for truly temporary data
- Database Query Cache: Slowest (milliseconds to seconds); no separate infrastructure to operate
- CDN Edge Cache: Best for geographic distribution; can't customize per-user
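For comparison with the Redis example earlier, a minimal in-process TTL cache, the simplest tier in the list above; it is not thread-safe and its contents vanish when the process exits:

import time

class InProcessTTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazy expiration on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)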
Advanced Patterns
Stale-While-Revalidate
// Serve cached (may be stale), update in background
async function getProduct(id) {
  const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours
  const cached = cache.get(id);
  if (cached && Date.now() - cached.time < MAX_AGE_MS) {
    // Serve immediately (even if 8 hours old)
    updateInBackground(id); // Non-blocking, eventual consistency
    return cached.value;
  }
  // Cache miss (or too stale): fetch fresh
  const fresh = await db.get(id);
  cache.set(id, { value: fresh, time: Date.now() });
  return fresh;
}
// Benefits:
// - Fast response (served from cache)
// - Fresh data eventually (background update)
// - No "thundering herd" (update in background, not on cache miss)
Distributed Caching with Consistency
// Multi-DC setup: cache in each region
// DC1 Redis ← → Sync ← → DC2 Redis
class DistributedCache {
  async set(key, value) {
    await Promise.all([
      this.localCache.set(key, value),
      this.remoteCache.set(key, value) // Replicate to other regions
    ]);
  }

  async get(key) {
    // Check local first (latency: ~1ms);
    // fall back to remote (latency: ~50ms across the network)
    return (await this.localCache.get(key)) ?? (await this.remoteCache.get(key));
  }
}
// Trade-off: higher consistency (both regions have same data) vs latency (wait for remote)
Next Steps
- Profile your application — Use APM tools to find expensive operations.
- Apply caching strategically — Cache-aside for read-heavy, avoid over-caching.
- Tune TTLs — Monitor cache hit rates, adjust based on freshness requirements.
- Identify batching opportunities — Database writes, API calls, bulk operations.
- Implement queueing — Decouple producers from consumers for resilience.
- Monitor and measure — Track cache hit rate, batch sizes, queue depths in production.
- Load test — Verify that caching/batching/queueing actually improve throughput.