Synchronous vs Asynchronous Communication

Master the fundamental choice that determines coupling, latency, and availability of your system

TL;DR

Synchronous: A calls B and waits for the response. Tight coupling and a simple flow, but lower availability, because A fails whenever B is down. Asynchronous: A sends a message to B and continues immediately. Loose coupling and good scalability, but more complexity. Most systems use both: synchronous for user-facing operations that need immediate feedback, asynchronous for background work and inter-service communication.

Learning Objectives

Understand the difference between synchronous and asynchronous communication
Recognize coupling implications of each approach
Apply each style to appropriate scenarios
Understand how to combine both effectively

Motivating Scenario

Your e-commerce system receives an order. The customer is waiting for confirmation. Payment processing might take seconds. Email delivery might take minutes. Inventory updates might queue up. If you make the customer wait for all of it synchronously, the request times out. If you respond immediately and do everything asynchronously, the customer never learns whether the order went through. The solution: synchronous for operations that decide the outcome (payment), asynchronous for operations that can complete later (inventory, email).

Synchronous Communication

Definition: Service A sends a request to Service B and waits for a response before continuing.

Characteristics:

Blocking: A doesn't proceed until B responds
Request-response pattern
Immediate feedback
Tight coupling (A depends on B being up)
Simple to reason about
Limited scalability (waiting requests consume resources)

Timing:

Time 0: A sends request ──────────→ B
Time 1:                     B processes
Time 2: A receives response ←────── B
       A continues

When to Use:

User-facing operations needing immediate response
Operations where a failure should be surfaced immediately (show the error to the user)
Operations where response is required to proceed
Small microservices environments with good networks

Example

User clicks "Purchase"
├─ Check inventory (sync) ────→ Inventory Service
├─ Wait for response ✓
├─ Process payment (sync) ────→ Payment Service
├─ Wait for response ✓
└─ Return confirmation to user ✓

Time: ~500ms-2s (depending on latencies)
If Inventory Service is down: User sees error immediately
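
The blocking flow above can be sketched in a few lines. This is a minimal illustration, not a real client: `check_inventory`, `process_payment`, and `ServiceDown` are hypothetical stand-ins, and `time.sleep` simulates network latency.

```python
import time

class ServiceDown(Exception):
    """Raised by a simulated service that is unavailable."""

def check_inventory(item_id, up=True):
    # Simulated remote call: the caller blocks for the whole duration.
    time.sleep(0.02)
    if not up:
        raise ServiceDown("inventory")
    return True

def process_payment(amount):
    # Second simulated remote call; it cannot start until the first returns.
    time.sleep(0.03)
    return {"success": True, "amount": amount}

def purchase(item_id, amount, inventory_up=True):
    """Each step blocks until the previous one has returned."""
    started = time.perf_counter()
    try:
        check_inventory(item_id, up=inventory_up)
        payment = process_payment(amount)
    except ServiceDown as exc:
        # The failure surfaces immediately to the caller.
        return {"status": "error", "reason": f"{exc} service unavailable"}
    elapsed = time.perf_counter() - started
    return {"status": "confirmed", "payment": payment, "elapsed_s": elapsed}
```

The elapsed time is roughly the sum of both simulated calls, and a downed dependency turns directly into an error for the caller.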

Asynchronous Communication

Definition: Service A sends a message to Service B (usually through a broker) and continues immediately without waiting for processing.

Characteristics:

Non-blocking: A continues after sending
Fire-and-forget or publish-subscribe
Decoupled: A doesn't need B to be up, only the broker
No immediate feedback
Complex to reason about
High scalability (no waiting)

Timing:

Time 0: A sends message ──────→ Queue ←─── B subscribing
       A continues immediately
Time 1: B picks up message
Time 2:        B processes
       (A already finished long ago)

When to Use:

Background operations (emails, analytics)
Work that takes time (image processing)
Operations where immediate feedback isn't needed
Decoupling services
Event-driven architectures
Handling spikes in traffic

Example

User clicks "Purchase"
├─ Check inventory (sync) ────→ Inventory Service ✓
├─ Send order event (async) ──→ Message Queue
├─ Return confirmation immediately ✓

Later, asynchronously:
├─ Payment Service picks up order
├─ Process payment
└─ Send payment confirmation email

Time: ~100ms (only for critical path)
If Email Service is down: Order still processes, email sent later
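
A minimal sketch of the same idea, assuming an in-process `queue.Queue` as a stand-in for the message broker and a thread as the consumer. The names `place_order`, `worker`, and `processed` are illustrative only.

```python
import queue
import threading

order_events = queue.Queue()   # stands in for the message broker
processed = []                 # records what the background worker handled

def place_order(order_id):
    """Critical path: enqueue the event and return without waiting."""
    order_events.put({"type": "order.created", "order_id": order_id})
    return {"status": "accepted", "order_id": order_id}

def worker():
    """Background consumer: handles events whenever they arrive."""
    while True:
        event = order_events.get()
        if event is None:          # sentinel: shut the worker down
            order_events.task_done()
            break
        processed.append(event["order_id"])  # e.g. charge, then send email
        order_events.task_done()

response = place_order(42)     # returns before any processing has happened
consumer = threading.Thread(target=worker, daemon=True)
consumer.start()               # the consumer may even start after the producer
order_events.join()            # demo only: wait so the result can be inspected
order_events.put(None)
consumer.join()
```

Note that the producer got its response while the consumer was not yet running, which is the decoupling the section describes.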

Comparison

Synchronous

REST API calls
gRPC requests
Database queries

Asynchronous

Message queues
Event streams
Webhooks

The Latency Difference

[Figure: Latency, synchronous vs asynchronous]

Hybrid Approaches

The best systems use both:

Synchronous for critical path: User needs immediate feedback
Asynchronous for background: Processing that can happen later

Hybrid Example

@app.post('/orders')
def create_order(request):
    # Synchronous: Check inventory, required for decision
    if not inventory.has_stock(request.item_id):
        return error('Out of stock')

    # Synchronous: Process payment, required for decision
    payment_result = payment_service.charge(request.amount)
    if not payment_result.success:
        return error('Payment failed')

    # Create order (synchronous)
    order = Order.create(...)

    # Asynchronous: Send confirmation email (can happen later)
    queue.send('email.new_order', order.id)

    # Asynchronous: Update analytics (can happen later)
    queue.send('analytics.order_created', order.id)

    # Return immediately with confirmed order
    return success(order)

Failure Handling

[Figure: Failure handling in sync vs async]

Architecture Patterns for Sync/Async

Saga Pattern (Distributed Transactions)

import asyncio

class OrderSaga:
    """Coordinate order creation across multiple services."""

    async def create_order(self, order_data):
        """
        Orchestrate order across multiple services.
        Synchronous critical path, asynchronous updates.
        """
        try:
            # Synchronous: Must succeed or entire order fails
            order = await self.create_order_in_db(order_data)

            # Asynchronous: Individual service failures don't fail order
            # But must be tracked for compensation
            tasks = [
                self.reserve_inventory(order),
                self.authorize_payment(order),
                self.create_shipment(order),
                self.send_confirmation_email(order)
            ]

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Track failures for compensation
            failures = [r for r in results if isinstance(r, Exception)]
            if failures:
                # Log failures, trigger compensation
                await self.handle_partial_failure(order, failures)

            return order

        except Exception:
            # Rollback on critical failure
            await self.rollback_order(order_data)
            raise

    async def handle_partial_failure(self, order, failures):
        """
        Compensate for failed async operations.
        Example: inventory reserved but payment failed.
        """
        for failure in failures:
            await self.compensation_service.compensate(order, failure)
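
Compensation is commonly implemented by pairing each step with an undo action and running the undo actions of completed steps in reverse order. The following is a minimal sketch under that assumption; `run_saga`, `SagaStepFailed`, and the step names are hypothetical and not part of the class above. Unlike the concurrent version above, it runs the steps one after another.

```python
import asyncio

class SagaStepFailed(Exception):
    """Raised after compensation, naming the step that failed."""

async def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the steps that
    already completed, newest first, then raise SagaStepFailed.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            await action()
        except Exception as exc:
            for _, undo in reversed(completed):
                await undo()
            raise SagaStepFailed(name) from exc
        completed.append((name, compensation))
    return [name for name, _ in completed]

log = []

def record(entry):
    """Build an async step that just records that it ran."""
    async def _step():
        log.append(entry)
    return _step

async def failing_payment():
    raise RuntimeError("card declined")

steps = [
    ("reserve_inventory", record("reserve"), record("release")),
    ("authorize_payment", failing_payment, record("void_payment")),
    ("create_shipment", record("ship"), record("cancel_shipment")),
]

try:
    asyncio.run(run_saga(steps))
except SagaStepFailed as failed:
    outcome = f"compensated after {failed} failed"
```

Here the inventory reservation is released because payment failed, and the shipment step never runs.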

Outbox Pattern (Reliable Publishing)

import logging

logger = logging.getLogger(__name__)

class OrderService:
    """Ensure events are published even if system crashes."""

    async def create_order(self, order_data):
        # Create the order and its event in the same transaction
        async with db.transaction():
            order = await self.orders.create(order_data)

            # Write event to outbox (same transaction)
            await self.outbox.insert({
                'event_type': 'OrderCreated',
                'order_id': order.id,
                'payload': order.to_dict(),
                'published': False
            })

        return order

    # Run later from a separate process (background worker or scheduled job).
    # Even if the app crashes, events are in the DB and will be retried.
    async def publish_pending_events(self):
        events = await self.outbox.find_unpublished()
        for event in events:
            try:
                await self.message_broker.publish(event)
                await self.outbox.mark_published(event.id)
            except Exception as e:
                logger.error(f"Failed to publish event {event.id}: {e}")
                # Will retry on next run
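
To make the transactional guarantee concrete, here is a runnable sketch that uses an in-memory SQLite database in place of the real datastore. The table layout and the `create_order` / `publish_pending` helpers are assumptions made for illustration, and the `publish` callback stands in for a broker client.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, event_type TEXT,"
    " payload TEXT, published INTEGER DEFAULT 0)"
)

def create_order(item):
    # One transaction: the order row and its event commit or roll back together.
    with db:
        cur = db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        order_id = cur.lastrowid
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "item": item})),
        )
    return order_id

def publish_pending(publish):
    """Relay: publish unpublished rows, marking each one only after success."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox"
        " WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for event_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            break  # leave the row unpublished; the next run retries it
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
        sent += 1
    return sent

def broker_down(event_type, payload):
    raise ConnectionError("broker unavailable")

delivered = []
create_order("book")
first_run = publish_pending(broker_down)    # broker down: nothing is marked
second_run = publish_pending(lambda t, p: delivered.append((t, p)))
```

Because a row is marked only after a successful publish, a crash between the two steps leads to the event being published again on the next run. Delivery is therefore at-least-once, and consumers should tolerate duplicates.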

Self-Check

Which communication style for each?

User clicks "Send Email" - needs immediate confirmation? Async (return immediately, send email in background)
Processing a batch of images overnight? Async (background job, no immediate response)
Checking account balance? Sync (user needs immediate response)
Recording analytics events? Async (not critical, eventual consistency OK)
Processing a refund? Sync for validation, async for notification
Loading product catalog? Sync with caching
Updating inventory after purchase? Async with retries
Validating user input on form? Sync (immediate feedback)
Sending SMS notification? Async (can fail gracefully)
Checking if username available? Sync (user needs answer)

One Takeaway

Synchronous is simple but scales poorly. Asynchronous is complex but scales well. Use synchronous for critical decisions, asynchronous for everything else.

Next Steps

Messaging Details: Read Messaging
API Gateway: Learn API Gateway
Resilience: Explore Timeouts and Retries

Advanced Synchronous Patterns

Request-Response with Timeouts

Always set timeouts on synchronous calls. Infinite waits cause cascading failures:

import logging
import time

import requests
from requests.exceptions import Timeout

logger = logging.getLogger(__name__)

def call_with_timeout(url, timeout_seconds=5):
    try:
        response = requests.get(url, timeout=timeout_seconds)
        return response.json()
    except Timeout:
        # Don't wait forever. Log and re-raise so that callers
        # (such as the circuit breaker below) can count the failure.
        logger.error(f"Request to {url} timed out after {timeout_seconds}s")
        raise

class CircuitBreakerOpen(Exception):
    """Raised when the circuit is open and the call is rejected."""

# Circuit breaker: stop calling a dependency that keeps failing
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self.last_failure_time = None
        self.state = 'closed'  # closed, open, half-open

    def call(self, func, *args, **kwargs):
        if self.state == 'open':
            # Recently failed, don't retry yet
            if time.time() - self.last_failure_time > self.reset_timeout:
                self.state = 'half-open'  # allow one trial call
            else:
                raise CircuitBreakerOpen()

        try:
            result = func(*args, **kwargs)
            self.failures = 0
            self.state = 'closed'
            return result
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.failure_threshold:
                self.state = 'open'
            raise

# Usage
breaker = CircuitBreaker(failure_threshold=3)

def call_inventory_service():
    return breaker.call(lambda: call_with_timeout(
        'https://inventory.service/check',
        timeout_seconds=2
    ))
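
To see the breaker's state transitions in isolation, the sketch below uses an injectable clock so that no real waiting is needed. `Breaker`, `CircuitOpenError`, and the `clock` parameter are illustrative names, not the API of any library.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""

class Breaker:
    def __init__(self, failure_threshold=3, reset_after=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError()
            self.state = "half-open"   # allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"
        return result

now = [0.0]                           # fake clock, advanced by hand
breaker = Breaker(failure_threshold=2, reset_after=60.0, clock=lambda: now[0])

def failing():
    raise TimeoutError("inventory timed out")

states = []
for _ in range(2):
    try:
        breaker.call(failing)
    except TimeoutError:
        pass
states.append(breaker.state)          # open after two consecutive failures

try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    states.append("rejected")         # fails fast while the circuit is open

now[0] = 61.0                         # pretend a minute has passed
states.append(breaker.call(lambda: "ok"))
states.append(breaker.state)
```

While open, the breaker rejects calls without touching the dependency. After the reset window, a single successful trial call closes it again.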

Synchronous Request Chaining

Be careful with long chains of synchronous calls:

Request Chain Pattern (ANTI-PATTERN):
Client → API Gateway (100ms)
       → User Service (100ms)
       → Auth Service (100ms)
       → Product Service (100ms)
       → Inventory Service (100ms)
       → Price Service (100ms)

Total: 600ms (each call must wait for previous)

Problem: Any single slow service makes the entire chain slow. Latencies add up along the chain.

Solution: Parallelize where possible:

import asyncio

async def get_order_details(order_id):
    # Parallel requests instead of sequential
    user_task = get_user_details()
    product_task = get_product_details()
    inventory_task = get_inventory_status()

    # Wait for all to complete
    user, products, inventory = await asyncio.gather(
        user_task, product_task, inventory_task
    )

    return {
        'user': user,
        'products': products,
        'inventory': inventory
    }

# Total time: max(100ms, 100ms, 100ms) = 100ms (not 300ms)
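
The difference can be measured directly. In this sketch `fetch` is a hypothetical stand-in for a service call, with `asyncio.sleep` simulating a network round trip.

```python
import asyncio
import time

async def fetch(name, delay=0.05):
    await asyncio.sleep(delay)   # stands in for a network round trip
    return name

async def sequential():
    started = time.perf_counter()
    results = [
        await fetch("user"),
        await fetch("products"),
        await fetch("inventory"),
    ]
    return results, time.perf_counter() - started

async def parallel():
    started = time.perf_counter()
    results = await asyncio.gather(
        fetch("user"), fetch("products"), fetch("inventory")
    )
    return list(results), time.perf_counter() - started

seq_results, seq_elapsed = asyncio.run(sequential())
par_results, par_elapsed = asyncio.run(parallel())
```

Both variants return the same data, but the sequential one takes roughly the sum of the delays while the parallel one takes roughly the longest single delay. Parallelizing only works for calls that do not depend on each other's results.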

Advanced Asynchronous Patterns

Event-Driven Architecture

Instead of direct calls, publish events that other services subscribe to:

class OrderService:
    def create_order(self, order_data):
        order = Order.create(order_data)
        self.order_repo.save(order)

        # Publish event instead of calling other services
        self.event_bus.publish('order.created', {
            'order_id': order.id,
            'user_id': order.user_id,
            'items': order.items,
            'total': order.total
        })

        return order

# Other services subscribe independently
class PaymentService:
    def on_order_created(self, event):
        order_id = event['order_id']
        # Charge payment asynchronously
        self.process_payment(order_id)

class NotificationService:
    def on_order_created(self, event):
        user_id = event['user_id']
        # Send confirmation email
        self.send_confirmation_email(user_id)

class InventoryService:
    def on_order_created(self, event):
        items = event['items']
        # Allocate inventory
        self.allocate_items(items)

Benefits: Services loosely coupled. New subscribers can be added without changing OrderService. If one subscriber fails, others aren't affected.
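
A minimal in-process sketch of the publish/subscribe mechanics, including the isolation of a failing subscriber. `EventBus` here is a hypothetical toy that dispatches in the caller's thread; a real broker would deliver events asynchronously and durably.

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)
        self.errors = []

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            try:
                handler(event)
            except Exception as exc:
                # One failing subscriber must not block the others.
                self.errors.append((handler.__name__, exc))

bus = EventBus()
handled = []

def charge_payment(event):
    handled.append(("payment", event["order_id"]))

def send_email(event):
    raise RuntimeError("SMTP unavailable")

def allocate_inventory(event):
    handled.append(("inventory", event["order_id"]))

# The publisher knows only the topic name, not who is listening.
for handler in (charge_payment, send_email, allocate_inventory):
    bus.subscribe("order.created", handler)

bus.publish("order.created", {"order_id": 7})
```

The email handler fails, yet payment and inventory still process the event, and adding a fourth subscriber would not require touching the publisher.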

Request-Reply Pattern with Correlation ID

For async request-reply, use correlation IDs to match responses:

import asyncio
import uuid

class AsyncRequestReply:
    def __init__(self, message_broker):
        self.broker = message_broker
        self.pending_requests = {}

    async def send_request(self, service_name, request_data, timeout=10):
        # Generate unique ID
        correlation_id = str(uuid.uuid4())

        # Register the pending reply before sending,
        # so that a fast response is not missed
        future = asyncio.get_running_loop().create_future()
        self.pending_requests[correlation_id] = future

        try:
            # Send request
            self.broker.publish(f'{service_name}.requests', {
                'correlation_id': correlation_id,
                'payload': request_data
            })

            # Wait for the response, giving up after `timeout` seconds
            return await asyncio.wait_for(future, timeout=timeout)
        finally:
            del self.pending_requests[correlation_id]

    def handle_response(self, message):
        correlation_id = message['correlation_id']
        future = self.pending_requests.get(correlation_id)
        if future is not None and not future.done():
            future.set_result(message['payload'])
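
The sketch below exercises the pattern end to end without a broker. `InMemoryRequestReply` and `responder` are hypothetical: an `asyncio.Queue` plays the request channel, and the simulated service deliberately replies in reverse order to show that correlation IDs, not arrival order, match replies to callers.

```python
import asyncio
import uuid

class InMemoryRequestReply:
    """In-process stand-in for a broker with a reply channel."""

    def __init__(self):
        self.requests = asyncio.Queue()
        self.pending = {}

    async def send_request(self, payload, timeout=1.0):
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self.pending[correlation_id] = future      # register before sending
        await self.requests.put(
            {"correlation_id": correlation_id, "payload": payload}
        )
        try:
            return await asyncio.wait_for(future, timeout=timeout)
        finally:
            self.pending.pop(correlation_id, None)

    def handle_response(self, message):
        future = self.pending.get(message["correlation_id"])
        if future is not None and not future.done():
            future.set_result(message["payload"])

async def responder(rr, expected):
    """Simulated service: collects requests, then replies in reverse order."""
    received = [await rr.requests.get() for _ in range(expected)]
    for request in reversed(received):
        rr.handle_response({
            "correlation_id": request["correlation_id"],
            "payload": request["payload"] * 2,
        })

async def main():
    rr = InMemoryRequestReply()
    service = asyncio.create_task(responder(rr, expected=2))
    replies = await asyncio.gather(rr.send_request(1), rr.send_request(2))
    await service
    return replies

replies = asyncio.run(main())
```

Each caller receives the reply to its own request even though the replies arrived in the opposite order.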

Dead Letter Queues

Messages that fail repeatedly go to a dead letter queue for inspection:

class RobustMessageProcessor:
    def __init__(self, queue, max_retries=3):
        self.queue = queue
        self.max_retries = max_retries
        self.dead_letter_queue = queue.dead_letter_queue

    async def process_messages(self):
        while True:
            msg = await self.queue.receive()

            retry_count = msg.get('retry_count', 0)

            try:
                await self.handle_message(msg)
                await self.queue.acknowledge(msg)
            except Exception as e:
                if retry_count < self.max_retries:
                    # Retry: re-queue a copy with an incremented counter, then
                    # acknowledge the original so it is not delivered twice
                    msg['retry_count'] = retry_count + 1
                    await self.queue.send(msg)
                    await self.queue.acknowledge(msg)
                    logger.warning(f"Retrying message {msg.get('id')}: {e}")
                else:
                    # Max retries exceeded: send to dead letter queue
                    await self.dead_letter_queue.send({
                        'original_message': msg,
                        'error': str(e),
                        'retry_count': retry_count
                    })
                    await self.queue.acknowledge(msg)
                    logger.error(f"Message {msg.get('id')} moved to DLQ: {e}")
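
The retry-then-dead-letter flow can be exercised without a broker. This is a minimal sketch, assuming an in-memory `deque` as the queue and a plain list as the dead letter queue; `drain` and `handler` are illustrative names.

```python
from collections import deque

def drain(messages, handler, max_retries=3):
    """Process messages; retry failures up to max_retries, then dead-letter."""
    queue = deque({"body": m, "retry_count": 0} for m in messages)
    done, dead_letters = [], []
    while queue:
        msg = queue.popleft()
        try:
            done.append(handler(msg["body"]))
        except Exception as exc:
            if msg["retry_count"] < max_retries:
                msg["retry_count"] += 1
                queue.append(msg)      # back of the queue for another attempt
            else:
                dead_letters.append({
                    "original_message": msg["body"],
                    "error": str(exc),
                    "retry_count": msg["retry_count"],
                })
    return done, dead_letters

attempts = {}

def handler(body):
    attempts[body] = attempts.get(body, 0) + 1
    if body == "poison":
        raise ValueError("cannot parse")       # fails on every attempt
    if body == "flaky" and attempts[body] < 2:
        raise ConnectionError("temporary")     # fails once, then succeeds
    return body.upper()

done, dead_letters = drain(["ok", "flaky", "poison"], handler, max_retries=3)
```

The transient failure recovers on its first retry, while the message that can never be processed ends up in the dead letter list after one initial attempt plus three retries.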

Choosing Sync vs Async: Decision Tree

Use this decision tree to determine the best approach:

Do you need immediate feedback?
├─ YES: "Is it user-facing (direct request)?"
│  ├─ YES: Synchronous (API call, REST request)
│  │  └─ Examples: Login, fetch product, check balance
│  └─ NO: "Can it fail gracefully?"
│     ├─ YES: Asynchronous with user notification
│     │  └─ Example: Generating report, video encoding
│     └─ NO: Synchronous (critical operation)
│        └─ Example: Process payment, create order
└─ NO: Asynchronous (background job)
   ├─ Can happen later: Message queue
   │  └─ Examples: Send email, update analytics, cleanup
   └─ Time-critical but not user-facing: Event stream
      └─ Examples: Real-time notifications, audit logging
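
The tree can also be written as a small function, which makes the branch order explicit. The function name, parameters, and return strings are illustrative only.

```python
def choose_style(needs_immediate_feedback, user_facing=False,
                 can_fail_gracefully=False, time_critical=False):
    """Walk the decision tree above and return the suggested style."""
    if needs_immediate_feedback:
        if user_facing:
            return "synchronous"
        if can_fail_gracefully:
            return "asynchronous with user notification"
        return "synchronous"
    if time_critical:
        return "asynchronous via event stream"
    return "asynchronous via message queue"

# Examples taken from the tree
login = choose_style(True, user_facing=True)
report = choose_style(True, can_fail_gracefully=True)
payment = choose_style(True)
email = choose_style(False)
audit_log = choose_style(False, time_critical=True)
```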

Real-World Trade-Offs Example

E-commerce checkout:

class CheckoutService:
    async def checkout(self, order):
        # SYNCHRONOUS: Critical for user experience
        # Must complete or fail immediately

        try:
            # 1. Validate inventory (sync) — must know if in stock
            if not self.inventory.has_stock(order.items):
                raise OutOfStock()

            # 2. Process payment (sync) — must know if payment succeeded
            payment = self.payment.charge(order.user_id, order.total)
            if not payment.success:
                raise PaymentFailed()

            # 3. Create order record (sync) — must persist before responding
            saved_order = self.order_repo.save(order)

            # ASYNCHRONOUS: Can happen in background
            # User doesn't wait for these

            # 4. Send confirmation email (async): delivery can lag by a few seconds
            self.queue.send('email.order_confirmation', {'order_id': saved_order.id})

            # 5. Update analytics (async) — not critical
            self.queue.send('analytics.checkout_completed', {'order_id': saved_order.id})

            # 6. Notify warehouse (async) — has time window
            self.queue.send('warehouse.new_order', {'order_id': saved_order.id})

            # User gets response immediately
            return {'status': 'success', 'order_id': saved_order.id}

        except (OutOfStock, PaymentFailed) as e:
            # User sees error immediately
            raise

Critical Path (Synchronous): about 300ms total

Inventory check: 50ms
Payment processing: 200ms
Order creation: 50ms

Background Tasks (Asynchronous): Happen later

Email sent in 1-5 seconds
Analytics updated in 10 seconds
Warehouse notified in 30 seconds

The user sees the order confirmation in about 300ms, even though the background work takes up to about 30 seconds to finish.
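
The arithmetic behind these numbers, as a small sketch. It takes the figures listed above as upper bounds and assumes the three background consumers run independently of each other, so the slowest one determines when all background work is done.

```python
# Latencies in milliseconds, taken from the lists above (upper bounds)
critical_path_ms = {"inventory_check": 50, "payment": 200, "order_creation": 50}
background_ms = {"email": 5_000, "analytics": 10_000, "warehouse": 30_000}

# Sequential synchronous steps add up
user_wait_ms = sum(critical_path_ms.values())

# Independent consumers overlap, so the slowest one bounds completion
background_done_ms = max(background_ms.values())

# Hypothetical worst case: everything done synchronously, one step after another
all_sync_ms = user_wait_ms + sum(background_ms.values())
```

Under these assumptions the user waits 300ms instead of roughly 45 seconds, which is the case for keeping only the decision-critical steps on the synchronous path.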

