Distributed Tracing

In a monolith, a request enters at one point and exits at another, which is straightforward to follow. In a distributed system, a single request might touch 5 to 10 services, along with databases, queues, and caches. A 500 ms request might spend 100 ms in service A before it calls service B, 200 ms in service B before it calls service C, 50 ms in service C, and 150 ms crossing the network between them. Which service caused the slowness? Where did the request wait? Tracing answers these questions by following the request through every service and recording the time spent at each step.
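To make the 500 ms example concrete, here is a minimal sketch of how recorded spans can be turned into a per-step time breakdown. The `Span` class, the `self_times` helper, and the specific start and end timestamps are illustrative assumptions, not part of any tracing library; the timestamps were chosen only so that the totals match the numbers above. The sketch also assumes that a span's children run one after another and never overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """One timed step of a request (hypothetical model, not a library type)."""
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_times(spans: List[Span]) -> Dict[str, int]:
    """Return the time each span spent on its own work.

    Self time = span duration minus the durations of its direct children.
    Assumes children of the same parent do not overlap in time.
    """
    child_total: Dict[str, int] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = (
                child_total.get(span.parent_id, 0) + span.duration_ms
            )
    return {
        span.name: span.duration_ms - child_total.get(span.span_id, 0)
        for span in spans
    }


# Assumed timestamps for the 500 ms request described above.
# The "network" spans stand for the outbound call as seen by the caller,
# so their self time is the time spent outside the callee.
TRACE = [
    Span("1", None, "service A", 0, 500),
    Span("2", "1", "network A->B", 100, 500),
    Span("3", "2", "service B", 150, 450),
    Span("4", "3", "network B->C", 350, 450),
    Span("5", "4", "service C", 375, 425),
]

if __name__ == "__main__":
    breakdown = self_times(TRACE)
    for name, ms in sorted(breakdown.items(), key=lambda kv: -kv[1]):
        print(f"{name:14s} {ms:4d} ms")
```

Sorting by self time puts the largest contributor first, which is the question a trace is meant to answer: here, service B accounts for 200 ms of the 500 ms total, and the two network hops together account for 150 ms.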

This section covers:

  • Trace Context Propagation: How correlation IDs flow through services
  • Service Maps: Visualizing which services call which
  • OpenTelemetry: Standard instrumentation for tracing