Load Testing in Pipelines
Validate capacity assumptions with load testing before production; detect regressions automatically.
TL;DR
Manual load testing is expensive and rare. Automated load tests in CI/CD run on every commit, measuring latency, throughput, and resource utilization. Establish baseline metrics on the main branch, then compare PR changes against it. If latency increases by more than 5% or error rate spikes above thresholds, fail the build. This prevents performance regressions from shipping. Load tests don't require production-scale resources—test at 10% scale if constrained. Focus on bottleneck endpoints (slowest, most CPU-intensive) and critical resources (database queries, cache operations). Load testing in staging catches problems before production deployment.
Learning Objectives
- Design realistic load tests for CI/CD pipelines
- Measure and establish performance baselines
- Detect performance regressions automatically
- Run cost-effective load tests with limited resources
- Configure meaningful regression detection thresholds
- Interpret load test results and act on findings
Motivating Scenario
Your product ships a new feature that handles user requests differently. The code looks efficient, passes functional tests, and deploys smoothly. Three days later, customer complaints arrive: the system feels slower. Investigation reveals a 12% latency increase on the critical user search endpoint. Rollback costs an hour of downtime and customer trust.
A load test in your CI/CD pipeline would have caught this before merge. The regression would have been visible and fixable during code review, where the fix takes minutes. This pattern repeats: developers make innocuous changes, their aggregate effect degrades performance, and the first signal is customer impact.
Core Concepts
Key Metrics in Load Testing
Latency percentiles: Measure response time at p50, p95, and p99. The tail matters most: even when 99% of requests are fast, the slowest 1% still lands on real users and degrades their experience, so track p99 rather than only the average.
Throughput: Requests per second the system handles before degradation. Measure stable throughput (after initial warmup) and peak throughput.
Error rate: Percentage of requests that fail under load. Should remain near 0% for acceptable load levels. If error rate rises under load, you've found the breaking point.
Resource utilization: CPU, memory, database connections. Identify bottlenecks—if CPU is 95% but memory is 30%, you're CPU-bound.
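As a concrete illustration, the sketch below derives these metrics from raw request samples in plain JavaScript (runnable with Node.js, no k6 required); the sample shape and field names are assumptions of this example rather than any tool's API:

```javascript
// Minimal sketch: derive load-test metrics from raw request samples.
// The sample shape ({ durationMs, status }) is illustrative, not a k6 API.

function percentile(sortedValues, p) {
  // Nearest-rank percentile on an ascending-sorted array.
  const idx = Math.min(
    sortedValues.length - 1,
    Math.ceil((p / 100) * sortedValues.length) - 1
  );
  return sortedValues[Math.max(0, idx)];
}

function summarize(samples, testDurationSeconds) {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const failures = samples.filter((s) => s.status >= 400).length;
  return {
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),                    // tail latency
    throughput: samples.length / testDurationSeconds,  // requests per second
    error_rate: failures / samples.length,             // fraction of failed requests
  };
}

// Example: three samples collected over a 1-second window.
console.log(
  summarize(
    [
      { durationMs: 120, status: 200 },
      { durationMs: 180, status: 200 },
      { durationMs: 950, status: 500 },
    ],
    1
  )
);
```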
Load Testing Patterns
Constant load: Fixed request rate (1000 req/s for 5 minutes). Simple, reproducible.
Ramp-up load: Gradually increase request rate. Models gradual traffic growth, reveals when degradation begins.
Spike load: Sudden jump to high request rate. Models traffic spikes, reveals spike resilience.
For CI/CD, constant load is usually the best fit: it is predictable, quick to run, and produces results that are directly comparable from run to run.
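These patterns map directly onto k6 scenario configuration. The sketch below shows all three side by side; the rates, durations, and startTime offsets are placeholders, and in a CI pipeline you would normally keep only the constant scenario:

```javascript
import http from 'k6/http';

// Sketch: the three load patterns expressed as k6 scenarios.
// All numbers are placeholders; pick one scenario per CI run.
export const options = {
  scenarios: {
    // Constant load: fixed 1000 req/s for 5 minutes (best for CI/CD).
    constant: {
      executor: 'constant-arrival-rate',
      rate: 1000,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 200,
    },
    // Ramp-up load: grow from 100 to 1000 req/s to find where degradation begins.
    ramp_up: {
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      stages: [{ target: 1000, duration: '5m' }],
      startTime: '5m', // run after the constant scenario (illustrative offset)
    },
    // Spike load: jump from 100 to 2000 req/s almost instantly, then recover.
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 400,
      stages: [
        { target: 2000, duration: '10s' },
        { target: 100, duration: '1m' },
      ],
      startTime: '10m', // illustrative offset
    },
  },
};

export default function () {
  http.get('https://api.example.com/search?q=test'); // same placeholder endpoint as below
}
```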
Practical Example
- k6 Load Test Script
- GitHub Actions CI Integration
- Metrics Comparison Script
```javascript
import http from 'k6/http';
import { check, group } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('errors');
const searchLatency = new Trend('search_latency');

export const options = {
  stages: [
    { duration: '30s', target: 100 }, // Ramp-up to 100 VUs
    { duration: '2m', target: 100 },  // Stay at 100 VUs
    { duration: '30s', target: 0 },   // Ramp-down
  ],
  thresholds: {
    'http_req_duration': ['p(95)<500', 'p(99)<1000'], // 95% under 500ms, 99% under 1s
    'errors': ['rate<0.01'],                          // Error rate below 1%
  },
};

export default function () {
  group('Search Endpoint', () => {
    const response = http.get('https://api.example.com/search?q=test');
    const success = check(response, {
      'status 200': (r) => r.status === 200,
      'latency < 500ms': (r) => r.timings.duration < 500,
    });
    errorRate.add(!success);
    searchLatency.add(response.timings.duration);
  });

  group('Details Endpoint', () => {
    const response = http.get('https://api.example.com/details/123');
    check(response, {
      'status 200': (r) => r.status === 200,
    });
    errorRate.add(response.status !== 200);
  });
}
```
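The comparison step later in the pipeline expects a small JSON file with p99, throughput, and error_rate fields. One way to produce it, sketched below, is a `handleSummary` hook appended to the script above; the output filename and field names are conventions of this example, and `summaryTrendStats` in `options` must be extended so `p(99)` appears in the summary data (by default k6 reports only up to p(95)):

```javascript
// Sketch: a handleSummary hook appended to the k6 script above. It reduces
// the end-of-test summary to the three fields the comparison script expects.
// Assumption: options.summaryTrendStats includes 'p(99)', e.g.
//   summaryTrendStats: ['avg', 'med', 'p(95)', 'p(99)'],
// and the p99/throughput/error_rate names are this example's convention.
export function handleSummary(data) {
  const summary = {
    p99: data.metrics.http_req_duration.values['p(99)'],
    throughput: data.metrics.http_reqs.values.rate, // requests per second over the run
    error_rate: data.metrics.errors ? data.metrics.errors.values.rate : 0,
  };
  return {
    'summary.json': JSON.stringify(summary, null, 2), // file the CI job can rename or copy
    stdout: JSON.stringify(summary, null, 2) + '\n',  // or capture via shell redirection
  };
}
```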
```yaml
name: Performance Tests

on: [pull_request]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Establish baseline (main branch)
        run: |
          git fetch origin main
          git checkout origin/main
          docker build -t app:main .
          docker run -d --name app-main -p 8080:8080 app:main
          sleep 10 # Wait for startup
          npm run load-test:main > baseline.json
          docker rm -f app-main          # free the port for the PR build
          git checkout ${{ github.sha }} # return to the PR code checked out above

      - name: Deploy PR build to staging
        run: |
          docker build -t app:pr-${{ github.event.number }} .
          docker run -d -p 8080:8080 app:pr-${{ github.event.number }}
          sleep 10 # Wait for startup

      - name: Run load tests (PR changes)
        run: npm run load-test:pr > current.json

      - name: Compare results
        # The comparison script exits non-zero on a regression, which fails the build.
        run: npm run compare-metrics baseline.json current.json

      - name: Comment results on PR
        if: always() # post results even when the comparison step failed
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const current = JSON.parse(fs.readFileSync('current.json'));
            const baseline = JSON.parse(fs.readFileSync('baseline.json'));
            const latencyChange = ((current.p99 - baseline.p99) / baseline.p99 * 100).toFixed(2);
            const comment = `Performance Test Results:\n- P99 Latency: ${current.p99}ms (${latencyChange > 0 ? '+' : ''}${latencyChange}%)\n- Throughput: ${current.throughput} req/s\n- Errors: ${(current.error_rate * 100).toFixed(2)}%`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });
```
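Note that `load-test:main`, `load-test:pr`, and `compare-metrics` are npm scripts assumed by this example rather than standard commands: the first two would wrap `k6 run` against the staging deployment and emit the compact JSON summaries (for instance via the `handleSummary` sketch above), and `compare-metrics` would invoke the comparison script shown next.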
```python
#!/usr/bin/env python3
import json
import sys


def compare_metrics(baseline_file, current_file):
    with open(baseline_file) as f:
        baseline = json.load(f)
    with open(current_file) as f:
        current = json.load(f)

    # Define acceptable thresholds
    thresholds = {
        'p99_latency': 0.05,  # 5% relative increase
        'throughput': -0.05,  # 5% relative decrease
        'error_rate': 0.005,  # 0.5 percentage-point absolute increase
    }

    regressions = []

    # Check P99 latency
    latency_change = (current['p99'] - baseline['p99']) / baseline['p99']
    if latency_change > thresholds['p99_latency']:
        regressions.append(
            f"P99 latency regressed: {baseline['p99']}ms → {current['p99']}ms ({latency_change*100:.1f}%)"
        )

    # Check throughput
    throughput_change = (current['throughput'] - baseline['throughput']) / baseline['throughput']
    if throughput_change < thresholds['throughput']:
        regressions.append(
            f"Throughput degraded: {baseline['throughput']} → {current['throughput']} req/s ({throughput_change*100:.1f}%)"
        )

    # Check error rate
    error_change = current['error_rate'] - baseline['error_rate']
    if error_change > thresholds['error_rate']:
        regressions.append(
            f"Error rate increased: {baseline['error_rate']*100:.2f}% → {current['error_rate']*100:.2f}%"
        )

    if regressions:
        print("Performance Regressions Detected:")
        for regression in regressions:
            print(f"  ✗ {regression}")
        return 1
    else:
        print("✓ No regressions detected")
        return 0


if __name__ == '__main__':
    sys.exit(compare_metrics(sys.argv[1], sys.argv[2]))
```
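Assuming the script is saved as, say, compare_metrics.py, it runs as `python3 compare_metrics.py baseline.json current.json`; it prints each detected regression and exits non-zero, which is what makes the Compare results step in the workflow above fail the build.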
When to Use / When Not to Use
Use when:
- Services with strict availability/latency SLOs
- Critical customer-facing endpoints
- High-traffic scenarios where performance matters
- Cost-sensitive systems (efficiency = revenue)
- Microservices architecture with interdependencies
Avoid when:
- Internal tools with flexible SLOs
- Batch processing with no real-time requirements
- One-off tests (not recurring in CI/CD)
- Exploratory testing without established baselines
- Development environment testing
Patterns and Pitfalls
Design Review Checklist
- Do you run load tests on every pull request?
- Is there a baseline performance measurement for the main branch?
- Are regression thresholds defined (e.g., 5% latency increase)?
- Do failed load tests block merge or just warn?
- Are you testing at least the top 3 bottleneck endpoints?
- Is warm-up time included in test setup?
- Are staging resources sufficient for test load?
- Do developers understand why a test failed?
- Are results tracked over time (trends, improvements)?
- Is load testing documentation updated after each incident?
Self-Check
- What are your top 3 slowest endpoints? Are they load-tested?
- What's your regression detection threshold? Why that number?
- How long do load tests take? Is that acceptable for every PR?
- Can a new engineer understand a failed load test result?
- Do you correlate load test regressions with actual production impact?
Next Steps
- Identify bottleneck endpoints: Profile production traffic, find slowest endpoints by p99 latency and request volume
- Establish baseline: Run load tests on the main branch daily for a week, then compute a stable aggregate across runs (a sketch follows this list)
- Set regression thresholds: Choose thresholds based on your SLOs (e.g., a 5% latency increase, a 5% throughput decrease, or a 0.5 percentage-point error-rate rise, matching the comparison script above)
- Integrate into CI/CD: Add load tests to pull request checks; fail builds if regressions detected
- Measure effectiveness: Track how many regressions are caught pre-deployment vs. discovered in production
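For the baseline step above, a single run is a noisy reference. The sketch below folds several main-branch summaries into one baseline file; the file handling and the choice of the median as the aggregate are assumptions of this example:

```javascript
// Sketch: combine several nightly main-branch summaries into one stable baseline.
// Usage (illustrative): node build-baseline.js runs/*.json > baseline.json
const fs = require('fs');

const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

const files = process.argv.slice(2);
if (files.length === 0) {
  console.error('usage: node build-baseline.js <run1.json> [run2.json ...]');
  process.exit(1);
}

const runs = files.map((f) => JSON.parse(fs.readFileSync(f, 'utf8')));

const baseline = {
  p99: median(runs.map((r) => r.p99)),
  throughput: median(runs.map((r) => r.throughput)),
  error_rate: median(runs.map((r) => r.error_rate)),
};

// The median resists a single unlucky run better than the mean.
process.stdout.write(JSON.stringify(baseline, null, 2) + '\n');
```
Publishing the resulting baseline.json as a build artifact (or refreshing it on a schedule) keeps every pull request comparing against the same reference.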
References
- Newman, S. (2015). Building Microservices. O'Reilly Media.
- Gregg, B. (2013). Systems Performance: Enterprise and the Cloud. Prentice Hall.
- k6 Documentation: k6 Load Testing.