
Overuse of Synchronous Calls Anti-Pattern

Every inter-service communication is synchronous, creating blocking dependencies and reducing resilience.

TL;DR

Overusing synchronous calls means treating every inter-service communication as request/response. Service A calls B and blocks waiting for the response; B calls C and blocks in turn. Latency accumulates (5 sequential calls at 100 ms each = 500 ms). If any service in the chain is slow or down, the whole request fails, with no buffering or graceful degradation. Solution: use asynchronous messaging for non-critical operations. Services publish events; others consume them when ready. This reduces caller latency (the caller responds immediately), improves resilience (failures don't cascade), and enables independent scaling.

Learning Objectives

  • Understand synchronous vs. asynchronous communication trade-offs
  • Identify which operations should be synchronous vs. async
  • Implement message queues for decoupling
  • Design resilient systems with graceful degradation
  • Use circuit breakers and timeouts for remaining sync calls
  • Monitor and measure improvements in latency and reliability

Motivating Scenario

An e-commerce system synchronously sends an email confirmation after every order, so Order Service must wait for Email Service to respond. Email Service is slow (500 ms to send), so every order creation takes at least 500 ms. A traffic spike causes Email Service to back up. Order requests start timing out, customers abandon their carts, and revenue drops. Yet in each case the order itself had already been created successfully; the email is just a nice-to-have. If the email were sent asynchronously (Order Service publishes an event and returns immediately), the traffic spike wouldn't matter: orders would succeed, and emails would be sent eventually with no customer impact.

Core Concepts

The Cost of Synchronous Communication

Every synchronous call is a potential failure point and a latency contributor. In distributed systems, this compounds quickly.
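
To make the compounding concrete, here is a small back-of-the-envelope sketch. The 100 ms per hop comes from the example above; the 99.9% per-service availability and the assumption that failures are independent are illustrative guesses, not measurements.

```python
# Illustrative per-hop figures; replace them with values measured in your system.
HOP_LATENCY_MS = 100        # from the 100 ms x 5 example
HOP_AVAILABILITY = 0.999    # assumed "three nines" per service


def chain_latency_ms(hops: int, per_hop_ms: float = HOP_LATENCY_MS) -> float:
    """Sequential blocking calls add their latencies together."""
    return hops * per_hop_ms


def chain_availability(hops: int, per_hop: float = HOP_AVAILABILITY) -> float:
    """A request succeeds only if every hop succeeds (independent failures assumed)."""
    return per_hop ** hops


for hops in (1, 3, 5):
    print(f"{hops} hops: {chain_latency_ms(hops):.0f} ms, "
          f"{chain_availability(hops):.3%} available")
```

Under these assumptions, five chained services turn 99.9% per-service availability into roughly 99.5% for the whole request, while latency grows linearly with the number of hops.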

Synchronous vs. Asynchronous Characteristics

| Aspect | Synchronous | Asynchronous |
| --- | --- | --- |
| Blocking | Yes (waits for response) | No (returns immediately) |
| Failure Propagation | Cascades (A fails if B fails) | Isolated (B's failure queued, retried) |
| Latency | Accumulates (100ms × 5 calls = 500ms) | Minimal to caller (immediate response) |
| Buffering | None (request lost if recipient down) | Queue buffers requests |
| Use Case | Critical path (auth, payment validation) | Background tasks (email, analytics) |
| Scaling | Harder (scale all together) | Easier (independent scaling) |

Practical Example

# Order Service - everything synchronous
class OrderService:
    def __init__(self, payment_service, email_service, analytics_service):
        self.payment = payment_service
        self.email = email_service
        self.analytics = analytics_service

    def create_order(self, user_id, items):
        total = sum(item.price for item in items)

        # Synchronous: must wait for payment to complete
        payment = self.payment.charge(user_id, total)  # 100ms
        if not payment.success:
            raise Exception("Payment failed")

        order = self.save_order(user_id, items, total, payment.id)

        # Synchronous: must wait for email to send
        self.email.send_confirmation(user_id, order.id, items)  # 500ms
        # ^ Customer waits 500ms for email that is just a nice-to-have

        # Synchronous: must wait for analytics to log
        self.analytics.log_purchase(user_id, total, order.id)  # 50ms
        # ^ Customer waits 50ms for analytics that nobody cares about

        return order  # Total: 650ms for customer

# Problem: Email Service is slow on Black Friday
# Order creation blocks waiting for email
# Orders start timing out
# Customers see errors and leave
# Revenue lost, even though orders were already saved
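
For contrast, one way the same flow could look with email and analytics moved off the request path. This is a minimal sketch, not a production design: `InMemoryEventBus` is a hypothetical in-process stand-in for a real broker, the `order.created` event name and payload are assumptions, and `StubPayment` exists only so the example runs on its own.

```python
import queue
import threading
from types import SimpleNamespace


class InMemoryEventBus:
    """Hypothetical in-process stand-in for a broker such as Kafka or RabbitMQ."""

    def __init__(self):
        self._queue = queue.Queue()
        self._handlers = {}

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # Enqueue and return at once; the publisher never waits on consumers.
        self._queue.put((event_type, payload))

    def start(self):
        threading.Thread(target=self._consume, daemon=True).start()

    def _consume(self):
        while True:
            event_type, payload = self._queue.get()
            for handler in self._handlers.get(event_type, []):
                try:
                    handler(payload)
                except Exception as exc:
                    # A failing consumer must not affect the publisher.
                    print(f"{event_type} handler failed: {exc!r}")
            self._queue.task_done()

    def drain(self):
        """Block until queued events are handled (for demos and tests only)."""
        self._queue.join()


class OrderService:
    def __init__(self, payment_service, event_bus):
        self.payment = payment_service
        self.events = event_bus
        self._orders = []

    def create_order(self, user_id, items):
        total = sum(item.price for item in items)

        # Still synchronous: the order cannot proceed without a payment result.
        payment = self.payment.charge(user_id, total)
        if not payment.success:
            raise RuntimeError("Payment failed")

        order = SimpleNamespace(id=len(self._orders) + 1, user_id=user_id,
                                total=total, payment_id=payment.id)
        self._orders.append(order)

        # Asynchronous: email and analytics react to this event in their own time.
        self.events.publish("order.created",
                            {"order_id": order.id, "user_id": user_id, "total": total})
        return order


# Usage with stub collaborators (all hypothetical).
class StubPayment:
    def charge(self, user_id, total):
        return SimpleNamespace(success=True, id="pay-1")


handled = []
bus = InMemoryEventBus()
bus.subscribe("order.created", lambda e: handled.append(("email", e["order_id"])))
bus.subscribe("order.created", lambda e: handled.append(("analytics", e["order_id"])))
bus.start()

service = OrderService(StubPayment(), bus)
order = service.create_order("user-1", [SimpleNamespace(price=20), SimpleNamespace(price=5)])
bus.drain()
print(order.id, order.total, handled)
```

In this sketch the customer-facing latency is bounded by the payment call alone; a slow or failing email handler only delays the email. An in-memory queue loses events if the process dies, which is why a durable broker is the usual choice in practice.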

When to Use / When to Avoid

Synchronous (Limited Use)
  1. Critical path operations needing immediate decision
  2. Auth validation (user exists, token valid)
  3. Payment processing (need success/failure response)
  4. Inventory checks (stock availability)
  5. Keep to minimum; typically 1-2 calls per operation

Asynchronous (Default)
  1. Email, SMS notifications (can be delayed)
  2. Analytics, logging, metrics (non-blocking)
  3. Reporting and data warehouse updates (eventual)
  4. Image processing, file conversions (background jobs)
  5. Publish events; handle responses asynchronously

Patterns & Pitfalls

Publish events for state changes and let interested services subscribe and process them asynchronously. This decouples producers from consumers and lets each side scale independently.
Use Kafka, RabbitMQ, or a cloud queue service. Producers publish; consumers process when ready. The queue buffers requests and absorbs bursts, so a slow consumer does not block its producers.
For the synchronous calls that remain, use circuit breakers. If a downstream service is down, fail fast instead of waiting for every call to time out, and degrade gracefully.
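
A minimal circuit breaker sketch follows, assuming a simple consecutive-failure threshold and a fixed reset timeout. The class, its parameter names, and `flaky_inventory_check` are hypothetical; real libraries add thread safety, metrics, and richer half-open handling.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky_inventory_check(sku):
    raise ConnectionError("inventory service unreachable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky_inventory_check, "sku-1")
    except CircuitOpenError:
        outcomes.append("fast-fail")   # no network call was attempted
    except ConnectionError:
        outcomes.append("error")       # the call was attempted and failed
print(outcomes)
```

After two real failures the breaker opens, and later calls are rejected immediately. The caller can catch `CircuitOpenError` and fall back to a degraded response instead of hanging.
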
Every synchronous call must have a timeout. If a call exceeds its limit (for example, 5 seconds), fail it and retry, with backoff and only when the operation is safe to repeat. This prevents cascading hangs.
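
One way to enforce a deadline with bounded retries, sketched with the standard library's `asyncio.wait_for`. The helper name `call_with_timeout`, its defaults, and the simulated `slow_then_fast` dependency are assumptions for illustration.

```python
import asyncio


async def call_with_timeout(make_call, timeout_s=5.0, retries=2, backoff_s=0.1):
    """Await make_call() under a deadline; retry a bounded number of times.

    Only retry calls that are safe to repeat (idempotent).
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise  # give up and let the caller degrade or report failure
            await asyncio.sleep(backoff_s * 2 ** attempt)  # exponential backoff


attempts = 0


async def slow_then_fast():
    """Simulated dependency: the first call hangs, later calls answer at once."""
    global attempts
    attempts += 1
    await asyncio.sleep(1.0 if attempts == 1 else 0)
    return "ok"


result = asyncio.run(call_with_timeout(slow_then_fast, timeout_s=0.05, backoff_s=0.01))
print(result, attempts)
```

`asyncio.wait_for` cancels the awaited call when the deadline passes, so a hung dependency cannot hold the caller indefinitely.
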
Publishing an event and then blocking until the handler responds defeats the purpose of async. Use event sourcing or callbacks for coordination, not blocking waits.
Events will sometimes fail to process (a consumer is down, a handler has a bug, and so on). Route events that keep failing to a dead letter queue, and monitor and alert on it.
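
The retry-then-dead-letter flow can be sketched in a few lines. This is an in-memory illustration only: `process_with_dlq`, `max_attempts`, and the `send_email` handler are hypothetical, and real brokers provide their own dead letter mechanisms.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Run handler on each message, retrying failures up to max_attempts.

    Returns the dead letters: messages that still failed after the last attempt.
    """
    pending = deque((msg, 1) for msg in messages)
    dead_letters = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
        except Exception as exc:
            if attempt >= max_attempts:
                # Park the message with context so someone can inspect and replay it.
                dead_letters.append(
                    {"message": msg, "error": repr(exc), "attempts": attempt}
                )
            else:
                pending.append((msg, attempt + 1))  # requeue for another try
    return dead_letters


def send_email(msg):
    if msg["to"] is None:
        raise ValueError("missing recipient")


dead = process_with_dlq([{"to": "a@example.com"}, {"to": None}], send_email)
print(len(dead), dead[0]["attempts"])
```

The healthy message is processed once, while the poison message is retried and then set aside rather than blocking the queue. The size of `dead_letters` is the number worth alerting on.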

Design Review Checklist

  • Identified critical path operations (must be sync)?
  • Non-critical operations converted to async events?
  • Maximum 2-3 synchronous calls per operation?
  • Message queue infrastructure in place (Kafka, RabbitMQ)?
  • Event contracts documented and versioned?
  • Circuit breakers protecting all sync calls?
  • Timeouts configured for all sync calls (< 5s)?
  • Dead letter queue monitoring for failed events?
  • Idempotent event handlers (safe to retry)?
  • Graceful degradation if async services are down?
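
The idempotency item in the checklist can be illustrated with a small wrapper that skips events it has already handled. This is a sketch under assumptions: events carry a unique `event_id` field (hypothetical), and the in-memory set stands in for a durable store such as a database table.

```python
class IdempotentHandler:
    """Wrap a handler so that redelivered events are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: a durable store in a real system

    def __call__(self, event):
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self._handler(event)
        # Record only after success, so a failed attempt can still be retried.
        self._seen.add(event_id)
        return True


sent = []
handler = IdempotentHandler(lambda e: sent.append(e["order_id"]))
event = {"event_id": "evt-1", "order_id": 42}

results = [handler(event), handler(event)]  # the broker redelivers the same event
print(results, sent)
```

With this wrapper, a queue that delivers at least once can retry freely without sending the same confirmation twice.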

Self-Check

  • Which operations should be synchronous? Auth, payment, inventory, validation. Anything where you need immediate success/failure decision.
  • Which should be async? Email, analytics, logging, reporting, notifications. Anything that can happen later without impacting user experience.
  • What if Email Service is slow? With async, it doesn't block anyone. The queue absorbs the backlog, failed sends are retried, and the user never waits.
  • How do I handle failures in async? Idempotent handlers (safe to retry) + dead letter queues (failed messages) + monitoring (alerts on failures).
  • How do I know if I'm overusing sync? If a single operation makes more than 3 synchronous calls, you are probably overusing them. Measure latencies; anything over 200 ms is suspicious.
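
The last check can be automated roughly as follows. `SyncCallAudit` is a hypothetical helper; the thresholds mirror the rule of thumb above (more than 3 calls, or any call over 200 ms), and applying the 200 ms limit per call rather than per operation is an assumption.

```python
import time
from contextlib import contextmanager


class SyncCallAudit:
    """Record the synchronous calls made during one operation."""

    def __init__(self, max_calls=3, max_call_ms=200.0):
        self.max_calls = max_calls
        self.max_call_ms = max_call_ms
        self.calls = []  # (name, elapsed_ms) pairs

    @contextmanager
    def track(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.calls.append((name, elapsed_ms))

    def warnings(self):
        found = []
        if len(self.calls) > self.max_calls:
            found.append(f"{len(self.calls)} synchronous calls (limit {self.max_calls})")
        for name, ms in self.calls:
            if ms > self.max_call_ms:
                found.append(f"{name} took {ms:.0f} ms (limit {self.max_call_ms:.0f} ms)")
        return found


audit = SyncCallAudit()
for name in ("auth", "payment", "email", "analytics"):
    with audit.track(name):
        pass  # the real blocking call would go here
print(audit.warnings())
```

Wrapping each blocking call in `audit.track(...)` during a request gives a per-operation count and timing that can be logged or asserted in tests.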

Next Steps

  1. Audit all service calls — Map which are sync, which are async
  2. Identify non-critical sync calls — Candidates for async conversion
  3. Design events — Define what events to publish
  4. Implement message queue — Set up Kafka, RabbitMQ, or cloud equivalent
  5. Migrate one flow — Convert email/analytics from sync to async
  6. Add circuit breakers — Protect remaining sync calls
  7. Monitor improvements — Measure latency reduction and reliability gains

References