Synchronous vs Asynchronous Communication
Master the fundamental choice that determines coupling, latency, and availability of your system
TL;DR
Synchronous: A calls B, waits for response. Tight coupling, simple flow, but low availability. Asynchronous: A sends a message to B, continues immediately. Loose coupling, scales well, but more complex. Most systems use both: synchronous for user-facing operations needing immediate feedback, asynchronous for background work and inter-service communication.
Learning Objectives
- Understand the difference between synchronous and asynchronous communication
- Recognize coupling implications of each approach
- Apply each style to appropriate scenarios
- Understand how to combine both effectively
Motivating Scenario
Your e-commerce system receives an order, and the customer is waiting for confirmation. Payment processing might take seconds. Email delivery might take minutes. Inventory updates might queue up. If you make the customer wait for all of it synchronously, the request is likely to time out. If you respond immediately and do everything asynchronously, the customer never learns whether the order went through. The solution: synchronous for operations whose result the customer must see (payment), asynchronous for operations that can complete eventually (inventory, email).
Synchronous Communication
Definition: Service A sends a request to Service B and waits for a response before continuing.
Characteristics:
- Blocking: A doesn't proceed until B responds
- Request-response pattern
- Immediate feedback
- Tight coupling (A depends on B being up)
- Simple to reason about
- Limited scalability (waiting requests consume resources)
Timing:
Time 0: A sends request ──────────→ B
Time 1: B processes
Time 2: A receives response ←────── B
A continues
When to Use:
- User-facing operations needing immediate response
- Operations where failure is obvious (show error to user)
- Operations where response is required to proceed
- Small microservices environments with good networks
Example:
User clicks "Purchase"
├─ Check inventory (sync) ────→ Inventory Service
├─ Wait for response ✓
├─ Process payment (sync) ────→ Payment Service
├─ Wait for response ✓
└─ Return confirmation to user ✓
Time: ~500ms-2s (depending on latencies)
If Inventory Service is down: User sees error immediately
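The flow above can be sketched in a few lines. This is a minimal illustration, not a real client: `check_inventory`, `charge_payment`, and the sleep-based delays are hypothetical stand-ins for blocking network calls.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a (simulated) downstream service that is down."""


def check_inventory(item_id, delay=0.01, up=True):
    # Hypothetical stand-in for a blocking call to an Inventory Service.
    if not up:
        raise ServiceUnavailable("inventory")
    time.sleep(delay)  # the caller is blocked for the whole round trip
    return True


def charge_payment(amount, delay=0.02):
    # Hypothetical stand-in for a blocking call to a Payment Service.
    time.sleep(delay)
    return {"success": True, "amount": amount}


def purchase(item_id, amount, inventory_up=True):
    """Synchronous flow: each step waits for the previous one to respond."""
    started = time.perf_counter()
    try:
        check_inventory(item_id, up=inventory_up)
        payment = charge_payment(amount)
    except ServiceUnavailable as exc:
        # The failure is visible to the caller right away.
        return {"status": "error", "reason": f"{exc} service unavailable"}
    elapsed = time.perf_counter() - started
    return {"status": "confirmed", "paid": payment["amount"], "elapsed": elapsed}
```

The elapsed time is roughly the sum of the two delays, which is the cost of blocking on each dependency in turn.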
Asynchronous Communication
Definition: Service A sends a message to Service B (usually through a broker) and continues immediately without waiting for processing.
Characteristics:
- Non-blocking: A continues after sending
- Fire-and-forget or publish-subscribe
- Decoupled: A doesn't care if B is up
- No immediate feedback
- Complex to reason about
- High scalability (no waiting)
Timing:
Time 0: A sends message ──────→ Queue ←─── B subscribing
A continues immediately
Time 1: B picks up message
Time 2: B processes
(A already finished long ago)
When to Use:
- Background operations (emails, analytics)
- Work that takes time (image processing)
- Operations where immediate feedback isn't needed
- Decoupling services
- Event-driven architectures
- Handling spikes in traffic
Example:
User clicks "Purchase"
├─ Check inventory (sync) ────→ Inventory Service ✓
├─ Send order event (async) ──→ Message Queue
└─ Return confirmation immediately ✓
Later, asynchronously:
├─ Payment Service picks up order
├─ Process payment
└─ Send payment confirmation email
Time: ~100ms (only for critical path)
If Email Service is down: Order still processes, email sent later
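A minimal sketch of the same idea, using the standard library's `queue.Queue` as a stand-in for a message broker and a thread as the consumer. The names `place_order` and `payment_worker` are illustrative assumptions, not part of any real API.

```python
import queue
import threading

order_events = queue.Queue()   # stands in for a message broker
processed = []                 # records what the consumer handled


def place_order(order_id):
    """Producer: enqueue the event and return without waiting for processing."""
    order_events.put({"type": "order.created", "order_id": order_id})
    return {"status": "accepted", "order_id": order_id}


def payment_worker():
    """Consumer: runs independently and picks up messages when it can."""
    while True:
        event = order_events.get()
        if event is None:          # sentinel used to stop the worker
            order_events.task_done()
            break
        processed.append(event["order_id"])
        order_events.task_done()


# The producer succeeds even though no consumer is running yet.
response = place_order(42)

worker = threading.Thread(target=payment_worker, daemon=True)
worker.start()
order_events.join()            # demo only: wait until the backlog is drained
order_events.put(None)
worker.join()
```

Note that `place_order` returns before any consumer exists. The message simply waits in the queue, which is what lets the producer stay available while a consumer is down.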
Comparison
Typical synchronous mechanisms:
- REST API calls
- gRPC requests
- Database queries
Typical asynchronous mechanisms:
- Message queues
- Event streams
- Webhooks
The Latency Difference
Hybrid Approaches
The best systems use both:
- Synchronous for critical path: User needs immediate feedback
- Asynchronous for background: Processing that can happen later
Hybrid Example:
@app.post('/orders')
def create_order(request):
    # Synchronous: check inventory, required for the decision
    if not inventory.has_stock(request.item_id):
        return error('Out of stock')

    # Synchronous: process payment, required for the decision
    payment_result = payment_service.charge(request.amount)
    if not payment_result.success:
        return error('Payment failed')

    # Create order (synchronous)
    order = Order.create(...)

    # Asynchronous: send confirmation email (can happen later)
    queue.send('email.new_order', order.id)

    # Asynchronous: update analytics (can happen later)
    queue.send('analytics.order_created', order.id)

    # Return immediately with confirmed order
    return success(order)
Failure Handling
Architecture Patterns for Sync/Async
Saga Pattern (Distributed Transactions)
import asyncio


class OrderSaga:
    """Coordinate order creation across multiple services."""

    async def create_order(self, order_data):
        """
        Orchestrate order across multiple services.
        Synchronous critical path, asynchronous updates.
        """
        try:
            # Synchronous: must succeed or the entire order fails
            order = await self.create_order_in_db(order_data)

            # Asynchronous: individual service failures don't fail the order,
            # but they must be tracked for compensation
            tasks = [
                self.reserve_inventory(order),
                self.authorize_payment(order),
                self.create_shipment(order),
                self.send_confirmation_email(order)
            ]
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Track failures for compensation
            failures = [r for r in results if isinstance(r, Exception)]
            if failures:
                # Log failures, trigger compensation
                await self.handle_partial_failure(order, failures)
            return order
        except Exception:
            # Roll back on critical failure
            await self.rollback_order(order_data)
            raise

    async def handle_partial_failure(self, order, failures):
        """
        Compensate for failed async operations.
        Example: inventory reserved but payment failed.
        """
        for failure in failures:
            await self.compensation_service.compensate(order, failure)
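The compensation step is the core of a saga, so it may help to see it in isolation. The sketch below is one common way to structure it, assuming each step is paired with an undo action; `run_saga`, `SagaStepFailed`, and the step functions are hypothetical names chosen for illustration.

```python
class SagaStepFailed(Exception):
    """Raised when a saga step fails after earlier steps were compensated."""


def run_saga(steps, context):
    """Run (name, action, compensate) steps in order.

    If a step raises, undo the steps that already completed, newest first.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action(context)
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo(context)
            raise SagaStepFailed(name) from exc
        completed.append((name, compensate))
    return context


log = []


def reserve_inventory(ctx):
    log.append("reserve")


def release_inventory(ctx):
    log.append("release")


def authorize_payment(ctx):
    raise RuntimeError("card declined")


def void_payment(ctx):
    log.append("void")


steps = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("authorize_payment", authorize_payment, void_payment),
]

failed_step = None
try:
    run_saga(steps, {"order_id": 1})
except SagaStepFailed as failed:
    failed_step = str(failed)
```

Here the payment step fails, so only the already completed inventory reservation is undone. The failed step's own compensation is not run because the step never took effect.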
Outbox Pattern (Reliable Publishing)
import logging

logger = logging.getLogger(__name__)


class OrderService:
    """Ensure events are published even if the system crashes."""

    async def create_order(self, order_data):
        # Create the order and its event in the same transaction
        async with db.transaction():
            order = await self.orders.create(order_data)
            # Write event to outbox (same transaction)
            await self.outbox.insert({
                'event_type': 'OrderCreated',
                'order_id': order.id,
                'payload': order.to_dict(),
                'published': False
            })
        return order

    # Later (separate process): publish events
    # Even if the app crashes, events are in the DB and will be retried
    async def publish_pending_events(self):
        events = await self.outbox.find_unpublished()
        for event in events:
            try:
                await self.message_broker.publish(event)
                await self.outbox.mark_published(event.id)
            except Exception as e:
                logger.error(f"Failed to publish event {event.id}: {e}")
                # Will retry on next run
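For a version that can be run end to end, here is a minimal sketch using an in-memory SQLite database. The table layout and the `publish` callback are assumptions made for the example. A real relay would talk to an actual broker and run on a schedule.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT, "
    "published INTEGER DEFAULT 0)"
)


def create_order(item):
    """Insert the order and its event atomically: both commit or neither does."""
    with conn:  # commits on success, rolls back if the block raises
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "item": item})),
        )
    return order_id


def publish_pending(publish):
    """Relay: push unpublished rows to the broker, marking each one on success."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            continue  # leave the row unpublished; the next run retries it
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run. Delivery is therefore at-least-once, and consumers should tolerate duplicates.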
Self-Check
Which communication style for each?
- User clicks "Send Email" - needs immediate confirmation? Async (return immediately, send email in background)
- Processing a batch of images overnight? Async (background job, no immediate response)
- Checking account balance? Sync (user needs immediate response)
- Recording analytics events? Async (not critical, eventual consistency OK)
- Processing a refund? Sync for validation, async for notification
- Loading product catalog? Sync with caching
- Updating inventory after purchase? Async with retries
- Validating user input on form? Sync (immediate feedback)
- Sending SMS notification? Async (can fail gracefully)
- Checking if username available? Sync (user needs answer)
Synchronous is simple but scales poorly. Asynchronous is complex but scales well. Use synchronous for critical decisions, asynchronous for everything else.
Next Steps
- Messaging Details: Read Messaging
- API Gateway: Learn API Gateway
- Resilience: Explore Timeouts and Retries
Advanced Synchronous Patterns
Request-Response with Timeouts
Always set timeouts on synchronous calls. Infinite waits cause cascading failures:
import logging
import time

import requests
from requests.exceptions import Timeout

logger = logging.getLogger(__name__)


def call_with_timeout(url, timeout_seconds=5):
    try:
        response = requests.get(url, timeout=timeout_seconds)
        return response.json()
    except Timeout:
        # Handle timeout - don't wait forever
        logger.error(f"Request to {url} timed out after {timeout_seconds}s")
        # Re-raise so callers (such as the circuit breaker below) can count the failure
        raise


class CircuitBreakerOpen(Exception):
    """Raised when the circuit is open and the call is rejected."""


# Usage with circuit breaker
class CircuitBreaker:
    def __init__(self, failure_threshold=5):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.last_failure_time = None
        self.state = 'closed'  # closed, open, half-open

    def call(self, func, *args, **kwargs):
        if self.state == 'open':
            # Recently failed, don't retry yet
            if time.time() - self.last_failure_time > 60:
                self.state = 'half-open'
            else:
                raise CircuitBreakerOpen()
        try:
            result = func(*args, **kwargs)
            self.failures = 0
            self.state = 'closed'
            return result
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.failure_threshold:
                self.state = 'open'
            raise


# Usage
breaker = CircuitBreaker(failure_threshold=3)


def call_inventory_service():
    return breaker.call(lambda: call_with_timeout(
        'https://inventory.service/check',
        timeout_seconds=2
    ))
Synchronous Request Chaining
Be careful with long chains of synchronous calls:
Request Chain Pattern (ANTI-PATTERN):
Client → API Gateway (100ms)
→ User Service (100ms)
→ Auth Service (100ms)
→ Product Service (100ms)
→ Inventory Service (100ms)
→ Price Service (100ms)
Total: 600ms (each call must wait for previous)
Problem: Any single slow service makes the entire chain slow, because latencies add up along the chain.
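The arithmetic is easy to check. The sketch below adds up per-hop latency and, as a related assumption not stated in the chain above, multiplies per-hop availability to show how a synchronous chain compounds both. The 99.9% figure is purely illustrative.

```python
import math


def chain_latency_ms(hop_latencies_ms):
    """Sequential calls: total latency is the sum of every hop."""
    return sum(hop_latencies_ms)


def chain_availability(hop_availabilities):
    """Every hop must be up for the request to succeed, so availabilities multiply."""
    return math.prod(hop_availabilities)


healthy = chain_latency_ms([100] * 6)              # six hops at 100ms each
one_slow = chain_latency_ms([100] * 5 + [2000])    # a single slow hop dominates
availability = chain_availability([0.999] * 6)     # assumed 99.9% per hop
```

With six hops the healthy chain takes 600ms, one slow hop pushes it to 2500ms, and the combined availability drops below that of any single service.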
Solution: Parallelize where possible:
import asyncio


async def get_order_details(order_id):
    # Parallel requests instead of sequential
    user_task = get_user_details()
    product_task = get_product_details()
    inventory_task = get_inventory_status()

    # Wait for all to complete
    user, products, inventory = await asyncio.gather(
        user_task, product_task, inventory_task
    )
    return {
        'user': user,
        'products': products,
        'inventory': inventory
    }

# Total time: max(100ms, 100ms, 100ms) = 100ms (not 300ms)
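To see the difference measured rather than asserted, the following self-contained sketch simulates three calls with `asyncio.sleep`. The `fetch` helper and the delay value are assumptions for the demo. Real network calls would replace the sleep.

```python
import asyncio
import time

NAMES = ("user", "products", "inventory")


async def fetch(name, delay):
    # Hypothetical stand-in for a network call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def sequential(delay):
    # Each call starts only after the previous one has finished.
    return [await fetch(name, delay) for name in NAMES]


async def parallel(delay):
    # All calls are in flight at once; gather preserves argument order.
    return await asyncio.gather(*(fetch(name, delay) for name in NAMES))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    started = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - started


seq_result, seq_time = timed(sequential(0.05))
par_result, par_time = timed(parallel(0.05))
```

Both versions return the same data, but the sequential one takes about three times the single-call delay while the parallel one takes about one.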
Advanced Asynchronous Patterns
Event-Driven Architecture
Instead of direct calls, publish events that other services subscribe to:
class OrderService:
    def create_order(self, order_data):
        order = Order.create(order_data)
        self.order_repo.save(order)

        # Publish event instead of calling other services
        self.event_bus.publish('order.created', {
            'order_id': order.id,
            'user_id': order.user_id,
            'items': order.items,
            'total': order.total
        })
        return order


# Other services subscribe independently
class PaymentService:
    def on_order_created(self, event):
        order_id = event['order_id']
        # Charge payment asynchronously
        self.process_payment(order_id)


class NotificationService:
    def on_order_created(self, event):
        user_id = event['user_id']
        # Send confirmation email
        self.send_confirmation_email(user_id)


class InventoryService:
    def on_order_created(self, event):
        items = event['items']
        # Allocate inventory
        self.allocate_items(items)
Benefits: Services are loosely coupled. New subscribers can be added without changing OrderService. If one subscriber fails, the others aren't affected.
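The isolation between subscribers depends on how the bus delivers events. A minimal in-process sketch follows. The `EventBus` class here is a hypothetical illustration, and a production system would use a broker with durable delivery instead.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish-subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.errors = []

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver to every subscriber; one failing handler must not block the rest.
        for handler in list(self._handlers[topic]):
            try:
                handler(event)
            except Exception as exc:
                self.errors.append((handler.__name__, exc))


bus = EventBus()
charged, allocated = [], []


def charge_payment(event):
    charged.append(event["order_id"])


def send_email(event):
    raise RuntimeError("SMTP server unavailable")


def allocate_items(event):
    allocated.extend(event["items"])


for handler in (charge_payment, send_email, allocate_items):
    bus.subscribe("order.created", handler)

bus.publish("order.created", {"order_id": 7, "items": ["book", "pen"]})
```

The email handler fails, yet payment and inventory still run, and the publisher needs no changes when a new handler is subscribed.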
Request-Reply Pattern with Correlation ID
For async request-reply, use correlation IDs to match responses:
import asyncio
import uuid


class AsyncRequestReply:
    def __init__(self, message_broker):
        self.broker = message_broker
        self.pending_requests = {}

    async def send_request(self, service_name, request_data, timeout=10):
        # Generate unique ID
        correlation_id = str(uuid.uuid4())

        # Register the future before publishing so a fast reply is not missed
        future = asyncio.get_running_loop().create_future()
        self.pending_requests[correlation_id] = future

        # Send request
        self.broker.publish(f'{service_name}.requests', {
            'correlation_id': correlation_id,
            'payload': request_data
        })

        # Wait for response; raises asyncio.TimeoutError after `timeout` seconds
        try:
            return await asyncio.wait_for(future, timeout=timeout)
        finally:
            self.pending_requests.pop(correlation_id, None)

    def handle_response(self, message):
        correlation_id = message['correlation_id']
        future = self.pending_requests.get(correlation_id)
        if future is not None and not future.done():
            future.set_result(message['payload'])
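A runnable sketch of the full round trip, with two `asyncio.Queue` objects standing in for the request and reply channels. The `responder` task plays the remote service, and all names here are assumptions made for the demo.

```python
import asyncio
import uuid


async def responder(request_q, reply_q):
    """Hypothetical remote service: answers each request, echoing its ID."""
    while True:
        msg = await request_q.get()
        await reply_q.put({
            "correlation_id": msg["correlation_id"],
            "payload": msg["payload"].upper(),
        })


async def reply_router(reply_q, pending):
    """Match each reply to the waiting caller via its correlation ID."""
    while True:
        msg = await reply_q.get()
        future = pending.pop(msg["correlation_id"], None)
        if future is not None and not future.done():
            future.set_result(msg["payload"])


async def request(request_q, pending, payload, timeout=1.0):
    correlation_id = str(uuid.uuid4())
    future = asyncio.get_running_loop().create_future()
    pending[correlation_id] = future          # register before sending
    await request_q.put({"correlation_id": correlation_id, "payload": payload})
    try:
        return await asyncio.wait_for(future, timeout)
    finally:
        pending.pop(correlation_id, None)


async def main():
    request_q, reply_q, pending = asyncio.Queue(), asyncio.Queue(), {}
    workers = [
        asyncio.create_task(responder(request_q, reply_q)),
        asyncio.create_task(reply_router(reply_q, pending)),
    ]
    try:
        # Two requests in flight at once; each gets its own reply back.
        return await asyncio.gather(
            request(request_q, pending, "ping"),
            request(request_q, pending, "pong"),
        )
    finally:
        for task in workers:
            task.cancel()


results = asyncio.run(main())
```

Because replies are matched by ID rather than by arrival order, several requests can share the same reply channel without being confused with one another.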
Dead Letter Queues
Messages that fail repeatedly go to a dead letter queue for inspection:
import logging

logger = logging.getLogger(__name__)


class RobustMessageProcessor:
    def __init__(self, queue, max_retries=3):
        self.queue = queue
        self.max_retries = max_retries
        self.dead_letter_queue = queue.dead_letter_queue

    async def process_messages(self):
        while True:
            msg = await self.queue.receive()  # assumed to be a dict-like message
            retry_count = msg.get('retry_count', 0)
            try:
                await self.handle_message(msg)
                await self.queue.acknowledge(msg)
            except Exception as e:
                if retry_count < self.max_retries:
                    # Retry: re-queue a copy with an incremented counter,
                    # then acknowledge the original so it is not delivered twice
                    await self.queue.send({**msg, 'retry_count': retry_count + 1})
                    await self.queue.acknowledge(msg)
                    logger.warning(f"Retrying message {msg.get('id')}: {e}")
                else:
                    # Max retries exceeded: send to dead letter queue
                    await self.dead_letter_queue.send({
                        'original_message': msg,
                        'error': str(e),
                        'retry_count': retry_count
                    })
                    await self.queue.acknowledge(msg)
                    logger.error(f"Message {msg.get('id')} moved to DLQ: {e}")
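The retry-then-park logic can be exercised without a broker. The sketch below uses an in-memory `deque` as the queue and a list as the dead letter queue. `process_with_dlq` and the sample handler are hypothetical names for illustration.

```python
from collections import deque


def process_with_dlq(messages, handler, max_retries=3):
    """Drain `messages`; retry failures, then park them in a dead letter list."""
    pending = deque({"body": m, "retry_count": 0} for m in messages)
    done, dead_letters = [], []
    while pending:
        msg = pending.popleft()
        try:
            done.append(handler(msg["body"]))
        except Exception as exc:
            if msg["retry_count"] < max_retries:
                msg["retry_count"] += 1
                pending.append(msg)     # re-queue at the back for another attempt
            else:
                dead_letters.append({"original_message": msg, "error": str(exc)})
    return done, dead_letters


attempts = {}


def handler(body):
    attempts[body] = attempts.get(body, 0) + 1
    if body == "poison":
        raise ValueError("cannot parse")          # fails every time
    if body == "flaky" and attempts[body] == 1:
        raise TimeoutError("try again")           # fails once, then succeeds
    return body.upper()


done, dead = process_with_dlq(["ok", "flaky", "poison"], handler, max_retries=2)
```

The flaky message succeeds on its retry, while the poison message is attempted once plus `max_retries` more times and then moved aside so it cannot block the rest of the queue.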
Choosing Sync vs Async: Decision Tree
Use this decision tree to determine the best approach:
Do you need immediate feedback?
├─ YES: "Is it user-facing (direct request)?"
│   ├─ YES: Synchronous (API call, REST request)
│   │   └─ Examples: Login, fetch product, check balance
│   └─ NO: "Can it fail gracefully?"
│       ├─ YES: Asynchronous with user notification
│       │   └─ Example: Generating report, video encoding
│       └─ NO: Synchronous (critical operation)
│           └─ Example: Process payment, create order
└─ NO: Asynchronous (background job)
    ├─ Can happen later: Message queue
    │   └─ Examples: Send email, update analytics, cleanup
    └─ Time-critical but not user-facing: Event stream
        └─ Examples: Real-time notifications, audit logging
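The same tree can be written as a small function, which makes the branch order explicit. The function name, parameters, and returned labels are illustrative assumptions. The branching follows the tree above.

```python
def choose_style(needs_immediate_feedback, user_facing=False,
                 can_fail_gracefully=False, time_critical=False):
    """Return the communication style suggested by the decision tree."""
    if needs_immediate_feedback:
        if user_facing:
            return "synchronous"
        if can_fail_gracefully:
            return "asynchronous with user notification"
        return "synchronous (critical operation)"
    if time_critical:
        return "asynchronous via event stream"
    return "asynchronous via message queue"


login = choose_style(needs_immediate_feedback=True, user_facing=True)
report = choose_style(needs_immediate_feedback=True, can_fail_gracefully=True)
email = choose_style(needs_immediate_feedback=False)
audit_log = choose_style(needs_immediate_feedback=False, time_critical=True)
```

Treat the result as a starting point rather than a rule, since many operations combine both styles, as the checkout example shows.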
Real-World Trade-Offs Example
E-commerce checkout:
class OutOfStock(Exception):
    pass


class PaymentFailed(Exception):
    pass


class CheckoutService:
    async def checkout(self, order):
        # SYNCHRONOUS: critical for user experience
        # Must complete or fail immediately
        try:
            # 1. Validate inventory (sync): must know if in stock
            if not self.inventory.has_stock(order.items):
                raise OutOfStock()

            # 2. Process payment (sync): must know if payment succeeded
            payment = self.payment.charge(order.user_id, order.total)
            if not payment.success:
                raise PaymentFailed()

            # 3. Create order record (sync): must persist before responding
            saved_order = self.order_repo.save(order)

            # ASYNCHRONOUS: can happen in background
            # User doesn't wait for these

            # 4. Send confirmation email (async): can arrive later
            self.queue.send('email.order_confirmation', {'order_id': saved_order.id})

            # 5. Update analytics (async): not critical
            self.queue.send('analytics.checkout_completed', {'order_id': saved_order.id})

            # 6. Notify warehouse (async): has time window
            self.queue.send('warehouse.new_order', {'order_id': saved_order.id})

            # User gets response immediately
            return {'status': 'success', 'order_id': saved_order.id}
        except (OutOfStock, PaymentFailed):
            # User sees error immediately
            raise
Critical Path (Synchronous): about 300ms total
- Inventory check: 50ms
- Payment processing: 200ms
- Order creation: 50ms
Background Tasks (Asynchronous): Happen later
- Email sent in 1-5 seconds
- Analytics updated in 10 seconds
- Warehouse notified in 30 seconds
The user sees the order confirmation in about 300ms, even though the background work takes up to another 30 seconds or so to finish.