Chatty Services Anti-Pattern

Services making excessive network calls for operations that could be done locally or batched.

TL;DR

Chatty services make excessive individual API calls instead of batching requests. A typical pattern: loop through 100 items, making one request per item (100 network hops). Each hop adds 10-50ms of latency, so the total time stretches to seconds instead of milliseconds. Fix: design batch APIs, add caching layers, and use prefetching strategies to reduce network chattiness.

Learning Objectives

  • Identify chatty service interactions in your architecture
  • Understand the performance cost of excessive network calls
  • Design batch APIs and caching strategies
  • Implement prefetching and CQRS patterns
  • Measure and monitor network efficiency

Motivating Scenario

Imagine an order service that needs to enrich 50 orders with customer data and payment history from two separate services. A naive implementation loops through each order, fetching customer details (1 API call per order) and payment history (1 more API call per order). Result: 100 network calls, each 20ms = 2 seconds minimum. A user clicks "view my orders" and waits 2 seconds for data that could load in 100ms with proper batching. The service separation that was meant to improve scalability instead creates a performance bottleneck.

Core Concepts

The Cost of Network Calls

Every network call incurs overhead: serialization, network transmission, deserialization, and routing. Even on a fast network (1ms round-trip latency), that per-call overhead adds up quickly once calls are made in a loop. Consider:

  • 1 batched request for 100 items: ~20ms
  • 100 individual requests: 100 × 20ms = 2000ms (100x slower)

This overhead is why microservices require different API design than monoliths. In a monolith, function calls are microseconds; in distributed systems, calls are milliseconds—a 1000x difference.
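
To make the arithmetic concrete, the toy calculation below uses the assumed 20ms per-call figure from the bullets above; real per-call latency will vary:

PER_CALL_MS = 20   # assumed round-trip time per request
ITEMS = 100

sequential_ms = ITEMS * PER_CALL_MS   # one call per item, one after another
batched_ms = PER_CALL_MS              # one batch call carrying all 100 IDs

print(f"sequential: {sequential_ms} ms")   # 2000 ms
print(f"batched:    {batched_ms} ms")      # ~20 ms plus a slightly larger payload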

Common Chatty Patterns

The most common forms are per-item loops (N+1 calls for N items), sequential fan-out in which each call waits for the previous one, and re-fetching the same slowly-changing data on every request.

Practical Example

# Order Service - BAD: chatty calls
import requests

def get_orders_with_details(customer_id):
    orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer_id)
    result = []

    for order in orders:
        # Call User Service once per order (50 calls for 50 orders)
        user = requests.get(f"http://user-service/users/{order.user_id}")

        # Call Payment Service once per order (50 more calls)
        payment = requests.get(f"http://payment-service/payments/{order.payment_id}")

        result.append({
            "order": order,
            "user": user.json(),
            "payment": payment.json(),
        })

    return result  # 100 HTTP calls for 50 orders
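
For contrast, here is a minimal sketch of the batched version. It assumes the downstream services expose hypothetical POST /users/batch and POST /payments/batch endpoints that accept a list of IDs; the exact routes and payload shapes will differ in your system.

# Order Service - GOOD: two batch calls instead of 100
import requests

def get_orders_with_details_batched(customer_id):
    orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer_id)

    user_ids = list({o.user_id for o in orders})
    payment_ids = list({o.payment_id for o in orders})

    # Hypothetical batch endpoints; adjust URL and payload shape to your services
    users = requests.post("http://user-service/users/batch",
                          json={"ids": user_ids}).json()
    payments = requests.post("http://payment-service/payments/batch",
                             json={"ids": payment_ids}).json()

    users_by_id = {u["id"]: u for u in users}
    payments_by_id = {p["id"]: p for p in payments}

    return [{
        "order": order,
        "user": users_by_id.get(order.user_id),
        "payment": payments_by_id.get(order.payment_id),
    } for order in orders]  # 2 HTTP calls for 50 orders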

When to Use / When to Avoid

Chatty Pattern (Avoid)
  1. Loop through items, one API call per item
  2. No batching support in downstream services
  3. Each call waits for previous to complete
  4. Latency: O(n) where n = number of items
  5. Fragile: If downstream is slow, entire flow blocks
Batch Pattern (Prefer)
  1. Single API call for multiple items
  2. Downstream service processes all at once
  3. Parallel execution possible
  4. Latency: O(1) or O(log n)
  5. Resilient: One batch operation instead of many

Patterns & Pitfalls

  • Batch APIs: Design downstream services to accept multiple IDs (e.g., POST /users/batch with ids: [1,2,3]). Clients accumulate IDs and request them all at once, typically cutting call counts by 100x.
  • Caching: Cache frequently accessed data in local memory or Redis and accept eventual consistency. Most user and product data changes slowly, making it a good fit for a 5-minute TTL (a minimal sketch follows this list).
  • Prefetching: Anticipate what you'll need and fetch it proactively. On user login, prefetch recent orders, preferences, and recommendations to reduce on-demand calls during critical paths.
  • CQRS / local read models: Separate read models from write models. Maintain a local copy of frequently-read data from other services; write updates trigger events that keep your copy in sync.
  • Pitfall, oversized batches: Don't make batch sizes too large. If you batch 10,000 items and the timeout is 30s, the call fails. Use reasonable batch sizes (100-1000 is typical) with retry logic.
  • Pitfall, stale caches: Caches can return stale data. Publish events when data changes, subscribe to them, and invalidate affected entries. This is a hard problem; plan for it.
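
To illustrate the caching pattern, here is a minimal sketch using an in-process dictionary with a 5-minute TTL and a hypothetical fetch_user call; in production this would more likely live in Redis:

import time

_CACHE = {}          # key -> (expires_at, value)
TTL_SECONDS = 300    # 5-minute TTL, per the guidance above

def get_user_cached(user_id):
    key = f"user:{user_id}"
    entry = _CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                       # cache hit: no network call
    value = fetch_user(user_id)               # hypothetical call to the user service
    _CACHE[key] = (time.time() + TTL_SECONDS, value)
    return value

def invalidate_user(user_id):
    # Call this from a "user updated" event handler to avoid serving stale data
    _CACHE.pop(f"user:{user_id}", None)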

Design Review Checklist

  • Identified hot paths with excessive API calls?
  • Downstream services support batch endpoints?
  • Batch sizes reasonable (100-1000 typical)?
  • Caching strategy defined for read-heavy operations?
  • Cache invalidation events published on data changes?
  • Prefetching implemented for predictable patterns?
  • Fallback behavior defined if batch call fails?
  • Monitoring in place to detect chatty patterns?
  • Load tests show acceptable latency with real data volumes?
  • Circuit breakers protect against cascading failures?

Self-Check

  • What happens if you make 100 sequential API calls at 10ms each? 1000ms = 1 second minimum. With batching, could be 20-30ms total.
  • When is caching appropriate? When data is read-heavy and changes are infrequent. Most user profiles, product catalogs, configuration—perfect candidates.
  • How do you detect chatty interactions? Monitor API call counts per logical operation. If count > 10 for simple operation, likely chatty. Use APM tools.
  • What's the difference between batching and caching? Batching groups many calls into one. Caching eliminates calls by storing data locally.
  • How do you handle cache invalidation? Publish events when data changes. Consumers subscribe and invalidate. Or use TTLs and accept staleness.

Next Steps

  1. Audit current services — Measure API call counts for typical operations using APM tools
  2. Design batch endpoints — Add POST /batch endpoints that accept arrays of IDs (a minimal sketch follows this list)
  3. Implement caching — Add Redis or in-memory cache for frequently accessed data
  4. Set cache TTLs — Start with 5 minutes; adjust based on data freshness requirements
  5. Use GraphQL — If complexity justifies it, collapse multi-service reads into a single query
  6. Monitor improvements — Measure latency reduction after optimizations
  7. Document patterns — Create team guidelines on when to batch vs. cache vs. fetch
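
As a starting point for step 2, here is a minimal sketch of a batch endpoint. It assumes FastAPI and a hypothetical load_users data-access helper; adapt the route and payload to your own services.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class BatchRequest(BaseModel):
    ids: list[int]

@app.post("/users/batch")
def get_users_batch(req: BatchRequest):
    # One database/service round trip for the whole batch instead of one per ID
    users = load_users(req.ids)  # hypothetical helper returning a list of dicts
    return {"users": users}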

Advanced Optimization Techniques

Technique 1: Request Collapsing

Merge multiple concurrent requests into one.

class RequestCollapser {
  constructor(fetchFn, delayMs = 10) {
    this.fetchFn = fetchFn;   // batch fetch function: (ids) => Promise<results>
    this.delayMs = delayMs;   // how long to wait while collecting requests
    this.pending = null;      // timer handle for the next flush
    this.queue = [];          // requests collected since the last flush
  }

  async fetch(...args) {
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ args, resolve, reject });
    });

    // Schedule a flush if one isn't already pending
    if (!this.pending) {
      this.pending = setTimeout(() => this._flush(), this.delayMs);
    }

    return promise;
  }

  async _flush() {
    const queue = this.queue;
    this.queue = [];
    this.pending = null;

    if (queue.length === 0) return;

    // Deduplicate IDs so each appears only once in the batch call
    const uniqueIds = [...new Set(queue.map(q => q.args[0]))];

    try {
      const results = await this.fetchFn(uniqueIds);
      const resultMap = new Map(results.map(r => [r.id, r]));

      // Resolve every caller with its own result
      for (const item of queue) {
        const [id] = item.args;
        item.resolve(resultMap.get(id));
      }
    } catch (error) {
      for (const item of queue) {
        item.reject(error);
      }
    }
  }
}

// Usage
const userFetcher = new RequestCollapser(async (ids) => {
  return userService.batch(ids); // Single batch call
});

// These 5 calls become 1 request
userFetcher.fetch(1);
userFetcher.fetch(2);
userFetcher.fetch(1); // Duplicate ID
userFetcher.fetch(3);
userFetcher.fetch(2);

Technique 2: GraphQL (Single Request, Multiple Resources)

Instead of N+1 API calls, one GraphQL query gets everything needed.

query GetOrderDetails($orderId: ID!) {
  order(id: $orderId) {
    id
    total
    customer {
      id
      name
      email
    }
    items {
      id
      productId
      quantity
    }
    payment {
      status
      method
    }
  }
}

Without GraphQL: 4 API calls (order, customer, items, payment). With GraphQL: 1 API call, backend resolves dependencies.

Technique 3: Async/Await with Parallel Execution

import asyncio

async def get_order_details(order_id):
    order = await orders_db.get(order_id)

    # Fetch the remaining resources in parallel instead of one after another
    customer, payment, inventory = await asyncio.gather(
        customers_service.get(order.customer_id),
        payments_service.get(order.payment_id),
        inventory_service.check(order.item_ids),
    )

    return {
        'order': order,
        'customer': customer,
        'payment': payment,
        'inventory': inventory,
    }

# If sequential (bad): 3 waits of ~20ms each = ~60ms
# If parallel (good): max(20ms, 20ms, 20ms) = ~20ms

Measuring Chatty Patterns

Metric 1: API Call Count Per Operation

def track_api_calls(operation_name, operation):
    # Assumes metrics.api_call_count is a counter incremented by your HTTP client
    initial_count = metrics.api_call_count

    operation()  # run the operation being measured

    calls = metrics.api_call_count - initial_count

    print(f"{operation_name}: {calls} API calls")
    if calls > 5:
        print(f"  WARNING: High call count for {operation_name}")

Metric 2: Latency Breakdown

Operation: GetOrderWithDetails
Total latency: 500ms
Breakdown:
- Database query: 50ms
- User service call: 100ms
- Payment service call: 150ms
- Inventory service call: 100ms
- Serialization: 100ms

Insight: 350ms (70%) spent in external calls

Optimization: Batch the user/payment/inventory lookups into 1 call
Expected latency: 50ms (database) + 200ms (single batched call) + 100ms (serialization) = 350ms
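
A small helper along these lines can flag operations where external calls dominate. The numbers mirror the breakdown above; the 50% threshold is an arbitrary example:

def external_call_share(breakdown_ms, external_keys):
    # breakdown_ms: mapping of span name -> duration in milliseconds
    total = sum(breakdown_ms.values())
    external = sum(breakdown_ms[k] for k in external_keys)
    return external / total

breakdown = {"db": 50, "user_svc": 100, "payment_svc": 150,
             "inventory_svc": 100, "serialization": 100}
share = external_call_share(breakdown, ["user_svc", "payment_svc", "inventory_svc"])
print(f"{share:.0%} of latency spent in external calls")  # 70%
if share >= 0.5:
    print("Candidate for batching or caching")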


One Takeaway: Chatty services are a hidden performance killer. Each network call adds 10-100ms of latency. One endpoint calling ten others means 100-1000ms of extra latency. Fix with batching (reduce number of calls), caching (eliminate calls), or GraphQL (combine requests). Monitor API call counts religiously.
