Cost Monitoring and FinOps Integration
Track infrastructure costs, attribute costs to services, and optimize spend.
TL;DR
- Track costs: measure total infrastructure spend per month and per service.
- Attribute costs: know which service costs how much and which team owns each cost.
- Optimize: identify what can be reduced without impacting users.
- Tools: resource tagging (label by service and team), showback (visibility without charge), chargeback (costs count against team budgets).
- Target: dedicate 30% of engineering effort to cost optimization and aim for 30% savings.
- Measure cost per transaction: a falling cost per unit of work means efficiency is improving.
- Monitor trends: is cost growing faster than revenue? Investigate spikes immediately; they usually indicate scaling events, inefficient code, or wasted resources that need addressing.
Learning Objectives
- Measure and track total infrastructure costs
- Attribute costs to services and responsible teams
- Identify waste and elimination opportunities
- Right-size resources based on cost-benefit analysis
- Present cost data to stakeholders and engineers
- Build a cost-conscious engineering culture
Motivating Scenario
Your infrastructure team notices the monthly cloud bill jumped from $250K to $340K in a single month. No major feature shipped, and there was no traffic increase. Investigation reveals three problems: (1) a deprecated service still running with no traffic, consuming $40K/month; (2) a microservice with inefficient database queries consuming 5x more compute than needed; (3) old snapshots and backups that are no longer needed but were never cleaned up.
Without cost visibility, these problems persist indefinitely. With cost monitoring and attribution, each team sees their service cost. Engineers become cost-conscious: they optimize queries, clean up resources, and right-size instances. The engineering team owns cost as seriously as they own latency.
Core Concepts
Cost Attribution Model
Tagging strategy: Label every resource (compute, storage, database) with metadata:
- service: which service owns this resource (payment-service, user-api, analytics)
- team: which team owns the service (payments-team, platform-team)
- environment: prod, staging, dev
- cost-center: which business unit to bill, used for chargeback
From tags, cloud providers (AWS, GCP, Azure) can generate cost reports broken down by service, team, or cost-center.
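As a sketch of how the tagging policy can be enforced, the script below (Python with boto3; it assumes AWS credentials are configured, and the required-tag list mirrors the strategy above rather than any AWS default) scans EC2 instances and reports any that are missing the attribution tags, so untagged spend gets caught before it becomes unattributable.
#!/usr/bin/env python3
"""Report EC2 instances missing required cost-attribution tags (illustrative sketch)."""
import boto3

# Required tag keys: an assumption based on the tagging strategy described above.
REQUIRED_TAGS = {"Service", "Team", "Environment", "CostCenter"}


def find_untagged_instances():
    """Return (instance_id, missing_tag_keys) pairs for instances lacking required tags."""
    ec2 = boto3.client("ec2")
    missing = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"] for t in instance.get("Tags", [])}
                absent = REQUIRED_TAGS - tags
                if absent:
                    missing.append((instance["InstanceId"], sorted(absent)))
    return missing


if __name__ == "__main__":
    for instance_id, absent in find_untagged_instances():
        print(f"{instance_id}: missing tags {', '.join(absent)}")
Run it on a schedule or in CI; the same check extends naturally to RDS, S3, and other taggable resources.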
Showback vs. Chargeback
Showback: "Your team's services cost $50K/month. Here's the breakdown." Visibility without financial impact. Builds awareness without forcing strict accountability.
Chargeback: "Your team's budget is $40K/month. You're using $50K. You need to optimize or request more budget." Financial accountability drives behavior change. Creates tension between engineering and finance if not managed carefully.
Most organizations start with showback and graduate to chargeback over time.
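As a minimal sketch of the chargeback check, the snippet below compares per-team spend against budgets; all figures are hypothetical and would normally come from the tag-based cost reports described above.
# Hypothetical monthly figures; in practice these come from tag-based cost reports.
monthly_spend = {"payments": 50_000, "platform": 38_000, "data": 29_000}
team_budgets = {"payments": 40_000, "platform": 45_000, "data": 30_000}

for team, spend in sorted(monthly_spend.items()):
    budget = team_budgets[team]
    if spend > budget:
        overage = spend - budget
        print(f"{team}: ${spend:,} spent vs ${budget:,} budget "
              f"(over by ${overage:,}, {overage / budget:.0%})")
    else:
        print(f"{team}: ${spend:,} spent vs ${budget:,} budget (within budget)")
Under showback, the same report is purely informational; under chargeback, the overage triggers an optimization or budget discussion.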
Key Cost Metrics
Cost per transaction: If your service processes 1M requests/month at $10K cost, that's $0.01 per request. Track this trend. Decreasing cost per unit = increasing efficiency.
Cost growth vs. revenue growth: Healthy companies: cost growth < revenue growth. If cost grows faster than revenue, margins compress. Signal: something is inefficient.
Utilization: If you're paying for 100 CPU cores but using only 40, 60% of what you pay for is idle. Target: 60-70% utilization at peak, which leaves headroom for spikes without massive waste.
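The sketch below computes all three metrics from hypothetical monthly figures, so the numbers are purely illustrative.
# Hypothetical inputs for one service over two consecutive months.
cost_this_month, cost_last_month = 10_000.0, 9_200.0           # USD
requests_this_month = 1_000_000
revenue_this_month, revenue_last_month = 400_000.0, 385_000.0  # USD
cores_provisioned, cores_used_at_peak = 100, 40

cost_per_txn = cost_this_month / requests_this_month
cost_growth = (cost_this_month - cost_last_month) / cost_last_month
revenue_growth = (revenue_this_month - revenue_last_month) / revenue_last_month
utilization = cores_used_at_peak / cores_provisioned

print(f"Cost per transaction: ${cost_per_txn:.4f}")
print(f"Cost growth {cost_growth:.1%} vs revenue growth {revenue_growth:.1%}"
      + (" -- cost is outpacing revenue" if cost_growth > revenue_growth else ""))
print(f"Peak compute utilization: {utilization:.0%} (target: 60-70%)")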
Practical Example
The three examples below cover resource tagging in Terraform, a monthly cost report query in SQL, and a cost anomaly detection script in Python.
AWS Resource Tagging
# Terraform: tag all resources for cost attribution
resource "aws_instance" "api_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.large"

  tags = {
    Name        = "api-server-1"
    Service     = "user-api"
    Team        = "platform"
    Environment = "prod"
    CostCenter  = "engineering"
    ManagedBy   = "terraform"
    Project     = "core-infrastructure"
  }
}

resource "aws_rds_cluster" "payments_db" {
  cluster_identifier = "payments-db"
  engine             = "aurora-postgresql"

  tags = {
    Service     = "payment-service"
    Team        = "payments"
    Environment = "prod"
    CostCenter  = "engineering"
    Criticality = "high"
  }
}

resource "aws_s3_bucket" "data_lake" {
  bucket = "company-data-lake"

  tags = {
    Service     = "analytics"
    Team        = "data"
    Environment = "prod"
    CostCenter  = "analytics"
  }
}
Monthly Cost Report Query
-- Generate a monthly cost report from the AWS Cost and Usage Report (queried via Athena)
-- Aggregate by Service and Team tags
SELECT
  resource_tags_user_service AS service,
  resource_tags_user_team AS team,
  SUM(CAST(unblended_cost AS DECIMAL(10,2))) AS total_cost,
  COUNT(DISTINCT resource_id) AS resource_count,
  ROUND(SUM(CAST(unblended_cost AS DECIMAL(10,2))) /
        SUM(SUM(CAST(unblended_cost AS DECIMAL(10,2)))) OVER () * 100, 2) AS pct_of_total
FROM aws_cost_and_usage_reports
WHERE billing_period = '2025-02'
  AND resource_tags_user_environment = 'prod'
GROUP BY
  resource_tags_user_service,
  resource_tags_user_team
ORDER BY total_cost DESC;

-- Example results:
-- service       | team     | total_cost | resource_count | pct_of_total
-- user-api      | platform | 45200.00   | 12             | 32.06
-- payment-svc   | payments | 38900.00   | 8              | 27.59
-- analytics-svc | data     | 32100.00   | 15             | 22.77
-- admin-portal  | platform | 24800.00   | 5              | 17.59
Cost Anomaly Detection
#!/usr/bin/env python3
"""Detect cost anomalies from AWS Cost Explorer and alert to Slack."""
import os
from datetime import datetime, timedelta

import boto3
import requests


def get_cost_anomalies():
    """Detect services whose latest daily cost exceeds 1.5x their 30-day average."""
    ce_client = boto3.client('ce')

    # Get daily costs for the last 30 days
    end_date = datetime.now().date()
    start_date = end_date - timedelta(days=30)

    response = ce_client.get_cost_and_usage(
        TimePeriod={
            'Start': start_date.isoformat(),
            'End': end_date.isoformat(),
        },
        Granularity='DAILY',
        Metrics=['UnblendedCost'],
        GroupBy=[
            {'Type': 'TAG', 'Key': 'Service'},
            {'Type': 'TAG', 'Key': 'Environment'},
        ],
    )

    # Collect the daily cost series per service/environment pair.
    # Cost Explorer returns tag group keys in "TagKey$tag-value" form.
    services = {}
    for result in response['ResultsByTime']:
        for group in result['Groups']:
            service = group['Keys'][0].split('$', 1)[-1]
            env = group['Keys'][1].split('$', 1)[-1]
            cost = float(group['Metrics']['UnblendedCost']['Amount'])
            services.setdefault(f"{service}/{env}", []).append(cost)

    # Detect anomalies: latest daily cost > 1.5x the period average
    anomalies = []
    for service_env, costs in services.items():
        avg = sum(costs) / len(costs)
        latest = costs[-1]
        if avg > 0 and latest > avg * 1.5:
            anomaly_pct = (latest - avg) / avg * 100
            anomalies.append({
                'service': service_env,
                'expected_cost': round(avg, 2),
                'actual_cost': round(latest, 2),
                'increase_pct': round(anomaly_pct, 1),
                'severity': 'HIGH' if anomaly_pct > 50 else 'MEDIUM',
            })

    return sorted(anomalies, key=lambda x: x['increase_pct'], reverse=True)


if __name__ == '__main__':
    anomalies = get_cost_anomalies()
    if anomalies:
        print("Cost Anomalies Detected:")
        for a in anomalies:
            print(f"  {a['service']}: ${a['actual_cost']} "
                  f"(was ${a['expected_cost']}, +{a['increase_pct']}%) [{a['severity']}]")

        # Send an alert to Slack with the top 5 anomalies
        slack_message = {
            'text': f"Cost anomalies detected: {len(anomalies)} services",
            'blocks': [
                {
                    'type': 'section',
                    'text': {'type': 'mrkdwn', 'text': '*Cost Anomalies Detected*'},
                }
            ] + [
                {
                    'type': 'section',
                    'text': {
                        'type': 'mrkdwn',
                        'text': (f"*{a['service']}*\nCost: ${a['actual_cost']} "
                                 f"(expected ${a['expected_cost']}) +{a['increase_pct']}%"),
                    },
                }
                for a in anomalies[:5]  # Top 5 anomalies
            ],
        }
        requests.post(os.environ['SLACK_WEBHOOK'], json=slack_message)
When to Use / When Not to Use
Cost monitoring and FinOps practices pay off for:
- High cloud infrastructure spend (>$50K/month)
- Multiple teams sharing infrastructure
- Rapidly scaling systems with unpredictable growth
- Cost-sensitive business with margin pressures
- Multi-service architecture with shared resources
They add less value for:
- Startups with minimal cloud spend (<$10K/month)
- Single-team organizations
- Fixed infrastructure (on-premises)
- Services with stable, predictable usage
- Early-stage projects in active development
Patterns and Pitfalls
Design Review Checklist
- Are all production resources tagged by service and team?
- Do you measure cost per transaction or per unit of work?
- Is cost data visible to engineers (not just finance)?
- Do you identify and remove unused resources monthly?
- Are cost anomalies detected and investigated automatically?
- Does each team have a cost budget or quota?
- Are cost trends monitored (growth vs. revenue, utilization)?
- Is cost data used in architecture design decisions?
- Do you conduct monthly cost reviews with service owners?
- Are reserved instances or commitments used for stable workloads? (a break-even sketch follows this checklist)
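On the last checklist item, a simple break-even calculation is usually enough to decide whether to commit; the rates below are hypothetical placeholders, not actual AWS prices.
# Hypothetical hourly rates and baseline; substitute your real pricing and usage.
ON_DEMAND_RATE = 0.10      # USD per instance-hour
RESERVED_RATE = 0.065      # USD per instance-hour with a 1-year commitment
BASELINE_INSTANCES = 20    # steady-state instances that run 24/7 all year
HOURS_PER_YEAR = 24 * 365

on_demand_cost = BASELINE_INSTANCES * ON_DEMAND_RATE * HOURS_PER_YEAR
reserved_cost = BASELINE_INSTANCES * RESERVED_RATE * HOURS_PER_YEAR
savings = on_demand_cost - reserved_cost

print(f"On-demand: ${on_demand_cost:,.0f}/year")
print(f"Reserved:  ${reserved_cost:,.0f}/year")
print(f"Savings:   ${savings:,.0f}/year ({savings / on_demand_cost:.0%})")
Only the stable baseline should be committed; spiky or uncertain workloads stay on demand.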
Self-Check
- What's your largest cost driver? Why does it cost that much?
- What's your cost per transaction across all services?
- How has your cost per transaction changed month-over-month?
- Can you explain a 20% spike in your cloud bill?
- What's your utilization for compute, database, and storage?
Next Steps
- Implement tagging: Add service, team, environment, cost-center tags to all resources
- Generate cost reports: Set up monthly automated reports broken down by service and team
- Identify waste: Audit for unused resources, overprovisioned instances, and old snapshots (see the sketch after this list)
- Establish baselines: Measure cost per transaction by service for the last quarter
- Share visibility: Publish cost data to engineers; celebrate improvements, investigate increases
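For the waste audit referenced above, the sketch below (Python with boto3, assuming AWS credentials; the 90-day cutoff is an arbitrary illustrative threshold) lists unattached EBS volumes and old self-owned snapshots, the kind of forgotten resources from the motivating scenario. It only prints candidates; deletion should go through the owning team identified by the resource tags.
#!/usr/bin/env python3
"""List unattached EBS volumes and old snapshots as cleanup candidates (illustrative sketch)."""
from datetime import datetime, timedelta, timezone

import boto3

AGE_CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)  # illustrative threshold
ec2 = boto3.client("ec2")

# Unattached volumes: billed every month while serving no instance.
for page in ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        print(f"Unattached volume {vol['VolumeId']}: "
              f"{vol['Size']} GiB, created {vol['CreateTime']:%Y-%m-%d}")

# Snapshots owned by this account and older than the cutoff.
for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snap in page["Snapshots"]:
        if snap["StartTime"] < AGE_CUTOFF:
            print(f"Old snapshot {snap['SnapshotId']}: "
                  f"{snap['VolumeSize']} GiB, started {snap['StartTime']:%Y-%m-%d}")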
References
- FinOps Foundation. FinOps Principles & Practices ↗️
- AWS Cost Optimization. AWS Cost Optimization Guide ↗️
- Humble, J., Molesky, J., & O'Reilly, B. (2015). Lean Enterprise. O'Reilly Media ↗️