
Performance, Load, Stress, Spike, and Soak Testing

Validate latency, throughput, and scalability under various load conditions.

TL;DR

Performance testing validates system behavior under load, and different load patterns test different things: load testing gradually increases traffic to the expected peak and verifies that P99 latency meets SLOs; stress testing pushes beyond the expected load to find the breaking point; spike testing increases load suddenly to validate auto-scaling; soak testing runs constant load over hours or days to detect memory leaks and gradual degradation. Use tools like JMeter, k6, or Locust. Run tests in a staging environment that mirrors production, and automate them in CI/CD for continuous performance validation. Define SLOs (e.g., P99 < 500 ms) before testing, and monitor the system while the load tests run.

Learning Objectives

After reading this article, you will understand:

  • The difference between load, stress, spike, and soak testing
  • How to define performance SLOs and success criteria
  • How to design realistic load tests
  • How to interpret performance metrics (latency, throughput, error rate)
  • Best practices for performance testing
  • How to identify and fix performance bottlenecks

Motivating Scenario

Your microservices platform handles 1,000 requests/second in production. During Black Friday, traffic spikes to 10,000 requests/second. You never tested at that scale; the system crashes. Auto-scaling is misconfigured, databases hit connection limits, and users experience timeouts.

Performance tests catch this before it happens: load tests validate that you can handle the expected peak, spike tests validate that auto-scaling responds when traffic suddenly jumps, and soak tests (constant load for 24 hours) reveal memory leaks. Run before production load hits, they surface bottlenecks while there is still time to fix them.

Core Concepts

Types of Performance Testing

Different load patterns test different aspects of system resilience.

| Type   | What                              | Why                       | Duration   | Peak Load      |
|--------|-----------------------------------|---------------------------|------------|----------------|
| Load   | Gradual increase to expected peak | Validate SLOs             | 20-60 min  | 1x expected    |
| Stress | Push beyond limits                | Find breaking point       | 10-30 min  | 2-5x expected  |
| Spike  | Sudden jump in load               | Validate auto-scaling     | 5-15 min   | 2-3x expected  |
| Soak   | Constant load over long time      | Detect leaks, degradation | 4-24 hours | 0.5x expected  |

Key Metrics

Latency (Response Time): How long requests take

  • P50, P95, P99 percentiles, not the average (see the sketch at the end of this section)
  • SLO example: "P99 < 500ms"

Throughput: Requests per second the system handles

  • Measured at different load levels
  • Find the plateau where throughput stops increasing

Error Rate: % of failed requests

  • Should be 0% or very low (< 0.1%)
  • A spike in the error rate indicates the breaking point

Resource Utilization: CPU, memory, disk, network

  • CPU usually 70-80% at peak load (headroom for spikes)
  • Memory should be stable (not growing = no leak)
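
To make the "percentiles, not averages" point concrete, here is a small standalone JavaScript sketch (not part of any k6 script) that computes the mean and the P50/P95/P99 of a latency sample. The latency values are invented purely for illustration.

// percentiles.js — illustrative only; the latency values below are made up
function percentile(sortedMs, p) {
  // Nearest-rank percentile over an ascending-sorted array of latencies in milliseconds
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[idx];
}

// 100 requests: most are fast, a handful are slow outliers
const latencies = Array.from({ length: 95 }, () => 120 + Math.random() * 60)
  .concat([900, 1100, 1300, 1800, 2400])
  .sort((a, b) => a - b);

const mean = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;

console.log(`mean ≈ ${mean.toFixed(0)} ms`); // looks tolerable despite the slow tail
console.log(`P50  = ${percentile(latencies, 50).toFixed(0)} ms`);
console.log(`P95  = ${percentile(latencies, 95).toFixed(0)} ms`);
console.log(`P99  = ${percentile(latencies, 99).toFixed(0)} ms`); // exposes the tail the mean hides

The same percentiles are what k6 thresholds such as 'p(99)<500' check directly, as the examples below show.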

Practical Example

// k6 load test: Gradual increase to expected peak
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },   // Ramp up to 100 virtual users
    { duration: '10m', target: 100 },  // Stay at 100 users
    { duration: '5m', target: 0 },     // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(99)<500'], // P99 latency < 500ms
    'http_req_failed': ['rate<0.01'],   // Error rate < 1%
  },
};

export default function () {
  // Simulate a user fetching a product
  const res = http.get('https://api.example.com/products/123');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains product': (r) => r.body.includes('Product Name'),
  });
}

// Spike test: Sudden increase in load
// (in practice, this object would be exported as `options` from its own script)
export const spikeTest = {
  stages: [
    { duration: '2m', target: 100 },  // Normal load
    { duration: '2m', target: 500 },  // Sudden spike
    { duration: '3m', target: 500 },  // Sustain the spike
    { duration: '2m', target: 100 },  // Back to normal
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(99)<1000'], // Latency is allowed to increase during the spike
    'http_req_failed': ['rate<0.05'],    // Tolerate more failures during the spike
  },
};

// Soak test: Constant load over extended time
// (likewise, exported as `options` from a dedicated soak script)
export const soakTest = {
  stages: [
    { duration: '5m', target: 100 },  // Ramp up
    { duration: '8h', target: 100 },  // Constant load for 8 hours
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(99)<500'],
    'http_req_failed': ['rate<0.01'],
  },
};
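
The table above also lists stress testing, which the examples so far do not cover. Below is a minimal sketch of a stress-test script, reusing the same hypothetical https://api.example.com endpoint; the target levels (up to 5x the 100-user baseline) and durations are illustrative, not prescriptive. Each configuration would live in its own script and be run with, for example, `k6 run stress-test.js`.

// stress-test.js — sketch of a k6 stress test (illustrative targets)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },  // Expected load
    { duration: '5m', target: 200 },  // 2x expected
    { duration: '5m', target: 300 },  // 3x expected
    { duration: '5m', target: 500 },  // 5x expected — looking for the breaking point
    { duration: '5m', target: 0 },    // Ramp down and observe recovery
  ],
  thresholds: {
    // A stress test aims to find limits, so the gate is looser than an SLO check
    'http_req_failed': ['rate<0.10'],
  },
};

export default function () {
  // Hypothetical endpoint, matching the earlier examples
  const res = http.get('https://api.example.com/products/123');
  check(res, { 'status is 200': (r) => r.status === 200 });

  // Randomized think time (1-4s) to approximate real user pacing rather than constant hammering
  sleep(1 + Math.random() * 3);
}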

When to Use / When Not to Use

Use Performance Testing When:
  1. You need to validate that the system meets latency SLOs
  2. You're approaching a major scale milestone (e.g., Black Friday)
  3. You've made infrastructure changes (database upgrade, new caching layer)
  4. You're introducing a new feature that might impact performance
  5. You want to establish baseline metrics before optimization
Avoid Performance Testing When:
  1. You haven't defined SLOs (what are you validating?)
  2. Your staging environment doesn't mirror production
  3. You're testing individual function performance (use profilers instead)
  4. The cost of testing exceeds the risk of performance issues
  5. You have no way to implement changes based on test results

Patterns and Pitfalls

Performance Testing Best Practices and Anti-Patterns

Do:
  • Define SLOs first: before testing, define success criteria (e.g., P99 < 500ms).
  • Realistic load patterns: simulate actual user behavior, not constant throughput.
  • Staging mirrors production: same database version, same infrastructure scale (at least proportionally).
  • Monitor resources: track CPU, memory, and disk during tests to identify bottlenecks.
  • Separate concerns: test the database separately, the cache separately, and the full stack together.
  • Document results: save baseline metrics and compare after changes.
  • Iterate: fix bottlenecks, retest, and measure improvements.
  • Automate in CI/CD: run performance tests before releases and fail the pipeline if SLOs are violated (see the sketch after this list).

Avoid:
  • No SLOs: testing without knowing the success criteria.
  • Unrealistic load: constant 1,000 RPS doesn't match actual (bursty) traffic patterns.
  • Staging != production: testing on an underpowered staging environment produces results that don't apply to production.
  • Load from the wrong location: testing from the office network doesn't include geographic latency.
  • No baseline: without a before/after comparison, you can't tell whether changes helped.
  • Ignoring percentiles: tracking only average latency; P99 tells the real story.
  • Single test: one load test doesn't prove scalability; test multiple scenarios.
  • No capacity planning: testing finds the bottleneck, but there is no plan to fix it.
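
As a sketch of the "fail if SLOs are violated" practice: k6 thresholds can be written in object form with abortOnFail, so a run stops early once an SLO is clearly breached. The endpoint and SLO numbers below are placeholders.

// ci-load-test.js — thresholds act as the release gate (placeholder SLO values)
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '5m', target: 100 },
    { duration: '10m', target: 100 },
    { duration: '5m', target: 0 },
  ],
  thresholds: {
    // Object form: abort the run early once the SLO is breached (after an initial grace period)
    http_req_duration: [{ threshold: 'p(99)<500', abortOnFail: true, delayAbortEval: '1m' }],
    http_req_failed: [{ threshold: 'rate<0.01', abortOnFail: true }],
  },
};

export default function () {
  http.get('https://api.example.com/products/123'); // placeholder endpoint
}

When any threshold fails, `k6 run ci-load-test.js` exits with a non-zero status, so the surrounding pipeline step fails and the release does not proceed.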

Design Review Checklist

  • SLOs (Service Level Objectives) defined before testing
  • Load test simulates realistic user behavior and traffic patterns
  • Staging environment matches production in scale and configuration
  • Database, cache, and third-party services are production-like
  • Tests measure P50, P95, P99 latencies (not just averages)
  • Throughput, error rate, and resource utilization tracked
  • Load tests run for sufficient duration (at least 15-20 minutes)
  • Spike tests validate auto-scaling works as expected
  • Soak tests detect memory leaks and long-term degradation
  • Tests fail if SLOs violated (gates deployment if needed)
  • Results documented with baseline metrics for comparison
  • Bottlenecks identified and prioritized for optimization
  • Performance tests run in CI/CD (nightly or on demand)
  • Team has capacity to implement changes based on results
  • Monitoring dashboards created for metrics tracked in tests

Self-Check Questions

  • Q: What's the difference between load testing and stress testing? A: Load testing validates SLOs at the expected peak load. Stress testing pushes load beyond the expected peak to find the system's breaking point and observe how it fails.

  • Q: Why measure P99 latency instead of the average? A: The average hides outliers. P99 is the latency that 99% of requests stay under; the remaining 1% see worse, and that tail is exactly what the average masks.

  • Q: What causes latency spikes during load tests? A: GC pauses, database connection pool exhaustion, resource contention. Identify by monitoring resource utilization during tests.

  • Q: Should you test from your office network? A: No. Test from a location that simulates production geography. Latency varies by region.

  • Q: How often should you run performance tests? A: Before major releases (always). Nightly for high-traffic services. Ad hoc when making performance-impacting changes.

Next Steps

  1. Define SLOs — P99 latency, error rate, throughput targets
  2. Design load scenarios — Match actual traffic patterns
  3. Set up test environment — Staging that mirrors production
  4. Run baseline tests — Establish metrics before changes (see the sketch after this list)
  5. Identify bottlenecks — CPU, memory, database, network?
  6. Implement fixes — Cache, database tuning, autoscaling config
  7. Retest and compare — Measure improvements
  8. Automate in CI/CD — Run tests before releases
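
For steps 4 and 7, a minimal sketch of persisting a run's results with k6's handleSummary hook so a baseline can be kept and compared against later runs; the file name is an arbitrary choice.

// Appended to any of the k6 scripts above: persist the run summary for baseline comparison
export function handleSummary(data) {
  // Write the full metrics summary to a JSON file that can be diffed against later runs
  return {
    'baseline-summary.json': JSON.stringify(data, null, 2),
  };
}

Archiving baseline-summary.json after the baseline run gives the "retest and compare" step something concrete to compare against.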

References

  1. k6 Performance Testing
  2. Apache JMeter
  3. Locust Load Testing
  4. Google Cloud Performance Testing Guide