Event-Driven & Reactive

Event-Driven Architecture (EDA) is a paradigm where systems respond to events—immutable facts about something that has happened. Producers publish events without knowing who will consume them, enabling loose coupling and asynchronicity. Reactive programming is a related paradigm that focuses on composing asynchronous and event-based programs with observable streams, providing tools for managing backpressure and complex data flows.

"In an event-driven system, the flow of control is determined by events. This is a fundamental shift from traditional, request-driven architectures." — Jonas Bonér

A typical event-driven topology with producers, a broker, and consumers.

Core ideas

  • Event as a Fact: An event is an immutable record of a business fact (e.g., OrderPlaced, PaymentProcessed). It contains all necessary data for a consumer to act.
  • Producers and Consumers: Producers emit events to a message broker or event bus. Consumers subscribe to topics and react to events asynchronously.
  • Loose Coupling: Producers and consumers are independent. They don't need to know about each other, which allows them to be developed, deployed, and scaled separately.
  • Backpressure: Consumers signal to producers when they are overwhelmed, allowing the system to gracefully handle load by slowing down producers, buffering, or dropping events.
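These ideas can be sketched minimally in Python. The `Event` dataclass and in-memory `EventBus` below are hypothetical illustrations of the producer/consumer contract, not a real broker API: the producer publishes to a topic without knowing its subscribers.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)  # frozen: an event is an immutable fact
class Event:
    type: str
    data: Dict[str, Any]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class EventBus:
    """Toy in-memory bus: producers publish by topic, consumers subscribe."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subs.setdefault(topic, []).append(handler)

    def publish(self, topic: str, evt: Event) -> None:
        # The producer never references a consumer: loose coupling
        for handler in self._subs.get(topic, []):
            handler(evt)

bus = EventBus()
received: List[Event] = []
bus.subscribe("orders", received.append)
bus.publish("orders", Event("OrderPlaced", {"order_id": "o-1", "total": 42}))
```

A real broker adds durability, partitioning, and delivery guarantees on top of this dispatch shape; the contract between the two sides stays the same.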

Examples

Sequential control flow for handling a single event.
consumer.py
```python
import asyncio
from typing import Any, Dict, Set

processed: Set[str] = set()

async def handle_event(evt: Dict[str, Any]) -> None:
    # Idempotency guard: skip events we have already handled
    key = evt["id"]
    if key in processed:
        return

    # Business logic (pure-ish), then side effects
    order_total = sum(i["qty"] * i["price"] for i in evt["items"])
    await persist_result({"order_id": key, "total": order_total})
    # Mark as processed only after the side effect succeeds
    processed.add(key)

async def persist_result(doc: Dict[str, Any]) -> None:
    await asyncio.sleep(0)  # simulate non-blocking IO

async def consumer(stream):
    # stream is an async iterator yielding events
    sem = asyncio.Semaphore(50)  # backpressure via concurrency cap
    async for evt in stream:
        await sem.acquire()
        asyncio.create_task(_run(evt, sem))

async def _run(evt, sem):
    try:
        await handle_event(evt)
    finally:
        sem.release()
```
When to Use vs. When to Reconsider
When to Use
  1. High-throughput, asynchronous workflows: Ideal for systems that need to handle many concurrent requests, like IoT data ingestion, real-time notifications, or financial tickers.
  2. Decoupling microservices: Allows services to evolve independently. A producer can change without affecting consumers, as long as the event contract is maintained.
  3. Streaming data processing: Perfect for Change Data Capture (CDC), log processing, and real-time analytics where data is treated as an infinite stream.
When to Reconsider
  1. Simple, synchronous request/response: The complexity of brokers, delivery semantics, and asynchronous logic is overkill for simple CRUD services.
  2. Systems requiring strong transactional consistency: Achieving end-to-end transactional guarantees in a distributed, event-driven system is extremely complex.
  3. When a clear, linear control flow is needed: Debugging and reasoning about a system where control flow is distributed across many independent consumers can be challenging.

Operational Considerations

  • Delivery semantics: choose your guarantee: at-most-once (fast, but lossy), at-least-once (retries, but requires idempotent consumers), or exactly-once (complex, often emulated).
  • Idempotency: consumers must be designed to handle duplicate events without causing incorrect side effects. This is critical for at-least-once delivery.
  • Backpressure and lag: monitor consumer lag and queue depth. Implement backpressure strategies (e.g., bounded buffers, rate limiting) to prevent consumers from being overwhelmed.
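The at-least-once pattern can be sketched as a retry loop with a dead-letter fallback. This is a simplified illustration, not a broker feature: the function name, the in-memory `dead_letters` list, and the fixed attempt count are all assumptions, and a real system would add exponential backoff between attempts.

```python
from typing import Any, Callable, Dict, List, Optional

def deliver_at_least_once(
    evt: Dict[str, Any],
    handler: Callable[[Dict[str, Any]], None],
    max_attempts: int = 3,
    dead_letters: Optional[List[Dict[str, Any]]] = None,
) -> bool:
    """Retry the handler; route poison messages to a dead-letter queue."""
    for _attempt in range(max_attempts):
        try:
            handler(evt)
            return True  # acknowledged
        except Exception:
            continue  # a real system would back off before retrying
    # Exhausted retries: quarantine for inspection instead of blocking the stream
    if dead_letters is not None:
        dead_letters.append(evt)
    return False
```

Because the handler may run more than once for the same event, this scheme only works when the handler is idempotent, which is exactly why the two concerns are listed together above.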

Design Review Checklist

  • Is the event schema well-defined and versioned?
  • Are delivery semantics (at-most/at-least/exactly-once) explicitly defined and handled?
  • Are all consumers idempotent?
  • Is there a strategy for handling backpressure?
  • How are failed events handled (retries, dead-letter queues)?
  • Is distributed tracing in place to track the flow of an event across multiple services?

Edge cases

  • Duplicate or out-of-order events: enforce idempotency keys and tolerate reordering with sequence numbers or versioning.
  • Poison messages: implement retry with backoff and a Dead Letter Queue (DLQ) with alerting and quarantine procedures.
  • Large payloads: prefer reference-based events (IDs/URIs) over embedding large blobs; set broker size limits.
  • Slow consumers: use bounded buffers, consumer groups, and autoscaling; apply backpressure or drop policies where acceptable.
  • Multi-tenant isolation: partition topics by tenant or include tenant IDs and authorize at the topic/partition level.
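The out-of-order case above can be handled with per-key version numbers: apply an event only if its version is newer than the last one applied. The `VersionedStore` class below is a minimal sketch of that idea, assuming each event carries a monotonically increasing version per entity.

```python
from typing import Any, Dict

class VersionedStore:
    """Apply events by version; stale (out-of-order) events are dropped."""

    def __init__(self) -> None:
        self._state: Dict[str, Dict[str, Any]] = {}
        self._versions: Dict[str, int] = {}

    def apply(self, key: str, version: int, data: Dict[str, Any]) -> bool:
        # Reject duplicates and late arrivals in one comparison
        if version <= self._versions.get(key, -1):
            return False
        self._versions[key] = version
        self._state[key] = data
        return True

    def get(self, key: str) -> Dict[str, Any]:
        return self._state[key]

store = VersionedStore()
store.apply("order-1", 2, {"status": "shipped"})
store.apply("order-1", 1, {"status": "placed"})  # arrives late: ignored
```

Last-writer-wins by version is only one policy; systems that must not lose intermediate states buffer and reorder instead of dropping.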

Observability

  • Logs: emit structured logs with correlation/trace IDs on produce, consume, handle, and ack/nack paths.
  • Metrics: track publish latency, end-to-end event age, consumer lag, DLQ rates, retry counts, and handler success/error rates.
  • Traces: propagate context via headers/attributes (e.g., W3C Trace Context) across producer → broker → consumer spans.
  • Dashboards/alerts: alert on growing lag, DLQ spikes, and increased end-to-end latency versus SLOs.
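A sketch of the logging point above: attach a correlation ID at the producing edge so every downstream hop can emit structured logs carrying it. The `produce` and `emit_log` helpers are illustrative names, not a real library API.

```python
import json
import uuid
from typing import Any, Dict

def produce(evt: Dict[str, Any]) -> Dict[str, Any]:
    # Assign the correlation ID once, at the edge; consumers only propagate it
    evt.setdefault("correlation_id", str(uuid.uuid4()))
    return evt

def emit_log(stage: str, evt: Dict[str, Any], **fields: Any) -> str:
    """One structured log line per stage: produce, consume, handle, ack/nack."""
    record = {
        "stage": stage,
        "event_id": evt.get("id"),
        "correlation_id": evt.get("correlation_id"),
        **fields,
    }
    return json.dumps(record, sort_keys=True)

evt = produce({"id": "e-1", "type": "OrderPlaced"})
line = emit_log("consume", evt, consumer="billing")
```

In a traced system the same propagation happens via message headers (e.g., W3C Trace Context) rather than the payload, so spans from producer, broker, and consumer join into one trace.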

Testing

  • Contract tests: validate event schemas (Avro/JSON Schema/Protobuf) and evolution rules with consumer-driven contracts.
  • Idempotency tests: replay the same event N times and assert single side-effect.
  • Failure tests: inject broker timeouts, handler exceptions, and verify retry/backoff/DLQ behavior.
  • Load tests: simulate bursty event streams to verify backpressure and autoscaling responses.
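The idempotency test above can be written as a replay loop: hand the same event to the consumer N times and assert exactly one side effect. The `OrderProjector` consumer is a self-contained stand-in for whatever handler is under test.

```python
from typing import Any, Dict, List, Set

class OrderProjector:
    """Idempotent consumer: replays must not duplicate side effects."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.side_effects: List[str] = []

    def handle(self, evt: Dict[str, Any]) -> None:
        if evt["id"] in self._seen:
            return
        self.side_effects.append(evt["id"])  # the observable side effect
        self._seen.add(evt["id"])

def replay_is_idempotent(n: int = 5) -> bool:
    projector = OrderProjector()
    evt = {"id": "evt-123", "type": "OrderPlaced"}
    for _ in range(n):
        projector.handle(evt)  # replay the same event N times
    return len(projector.side_effects) == 1
```

The same structure works against a real consumer by replaying from the broker and asserting on the database or outbound calls instead of an in-memory list.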

Security, Privacy, and Compliance

  • Data minimization: ensure that no sensitive data (PII, credentials) leaks into events. Use tokenization or reference-based payloads (e.g., orderId instead of the full order details) and require consumers to fetch sensitive data via authenticated, authorized APIs. Encrypt event payloads at rest and in transit.
  • Authentication and authorization: producers and consumers must authenticate with the broker (e.g., via mTLS, SASL). Enforce topic-level authorization so that only approved services can publish or subscribe to specific topics. This prevents unauthorized data access and rogue producers.
  • Replay protection: idempotency checks based on a unique event ID are the primary defense against replay attacks, where an attacker re-sends a valid event to trigger duplicate processing. Including a timestamp in the event and rejecting events older than a certain threshold can also help.
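The replay defense described above combines both checks: a freshness window on the event's timestamp plus unique-ID deduplication. This sketch assumes the event carries an `id` and a Unix `timestamp` field; the five-minute window and the in-memory `seen_ids` set are illustrative choices (production systems use a shared store with expiry).

```python
import time
from typing import Any, Dict, Optional, Set

seen_ids: Set[str] = set()
MAX_AGE_SECONDS = 300  # reject events older than five minutes

def accept(evt: Dict[str, Any], now: Optional[float] = None) -> bool:
    """Replay defense: freshness window plus unique-ID deduplication."""
    now = time.time() if now is None else now
    if now - evt["timestamp"] > MAX_AGE_SECONDS:
        return False  # too old: outside the freshness window
    if evt["id"] in seen_ids:
        return False  # already processed: likely a replay
    seen_ids.add(evt["id"])
    return True
```

The timestamp check bounds how long the deduplication set must retain IDs, which is what makes the combination practical at scale.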

References

  1. Bonér, J., et al. (2014). The Reactive Manifesto.
  2. Richards, M., & Ford, N. (2020). Fundamentals of Software Architecture. O'Reilly Media.
  3. Kafka Documentation. (n.d.). Security.