Performance & Scalability

Performance and scalability are inseparable: a system must respond quickly under baseline load (performance) and maintain that responsiveness as load increases (scalability). This section covers latency budgets, load testing strategies, profiling bottlenecks, and optimization techniques like caching and batching.

Quick mental model

[Figure: performance and scalability decision tree]

Key concepts

  • Latency: Time for a single request to complete, usually reported as percentiles (P50, P95, P99) rather than an average.
  • Throughput: Requests per second the system can handle.
  • Scalability: Maintaining performance as load increases, by scaling horizontally (more machines) or vertically (bigger machines).
  • SLA: Service Level Agreement—a contractual commitment to customers (e.g., 99.9% availability).
  • SLO: Service Level Objective—an internal target that backs the SLA (e.g., P99 < 200ms).
  • Bottleneck: The resource that saturates first, limiting overall throughput.
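Percentile latencies are worth computing by hand at least once, because tail values dominate P95/P99 in a way averages hide. A minimal sketch using the nearest-rank method, with illustrative sample values (not from any real system):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical per-request latencies in milliseconds.
latencies = [12, 15, 14, 200, 16, 13, 18, 450, 17, 15]

p50 = percentile(latencies, 50)  # 15  — the "typical" request
p95 = percentile(latencies, 95)  # 450 — a single slow outlier sets the tail
p99 = percentile(latencies, 99)  # 450
```

Note how one 450ms outlier drives both P95 and P99 while P50 stays at 15ms; this is why SLOs are usually stated against high percentiles, not means.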

Common patterns

  1. Caching: Store frequently accessed data in fast, temporary storage.
  2. Batching: Combine multiple requests into one for efficiency.
  3. Queueing: Decouple producers from consumers to prevent overload.
  4. Sharding: Partition data across servers for parallel access.
  5. Compression: Reduce network payload size.
  6. Async I/O: Avoid blocking on I/O to improve concurrency.
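To make the first pattern concrete, here is a minimal in-memory cache with a per-entry time-to-live (TTL). This is an illustrative sketch, not a production cache: the class name and API are assumptions, and it has no size bound or eviction policy beyond expiry.

```python
import time

class TTLCache:
    """Minimal in-memory cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        # Record the value along with the monotonic time at which it expires.
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None  # miss: never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # expired: evict lazily and report a miss
            return None
        return value

cache = TTLCache(ttl_seconds=30)
cache.set("user:42", {"name": "Ada"})
profile = cache.get("user:42")  # hit until 30s elapse, then None
```

The usual wiring is cache-aside: on a miss, fetch from the slow store, `set` the result, and return it. Choosing the TTL is the real design decision, since it bounds how stale a cached value can be.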