Secrets Management

Secure storage, access control, and rotation of sensitive credentials

TL;DR

Secrets are sensitive credentials: passwords, API keys, encryption keys, database credentials. Never commit them to code; fetch them at runtime from a vault (environment variables injected at deploy time are a weaker fallback). A vault (HashiCorp Vault, AWS Secrets Manager) stores secrets encrypted, controls access, and audits reads. A KMS (Key Management Service) controls encryption keys, performs or authorizes encryption, and prevents direct access to the master key. Rotation periodically replaces secrets (API key v1 → v2) to limit breach impact. Implement: centralized vault + audit logs + rotation every 30-90 days.

Learning Objectives

  • Understand secret types and lifecycle
  • Implement centralized secret storage (Vault, AWS Secrets Manager)
  • Design rotation strategies
  • Prevent secrets leaks (code, logs, memory)
  • Audit secret access

Motivating Scenario

Problem: Database password hardcoded in source code (GitHub). Contractor has production access forever. Developer leaves; password still valid. API key exposed in Docker image; attacker uses it for weeks.

Solution: Vault stores DB password encrypted. Code requests at runtime. Access logged. Password rotated every 30 days (old one becomes invalid). Contractor's session revoked immediately on departure. API key rotated weekly; breach window is hours, not weeks.

Core Concepts

Secret Types

Database Credentials: Username + password for database connections.

API Keys: Long-lived credentials for third-party APIs (Stripe, AWS, Slack).

Encryption Keys: Keys used to encrypt/decrypt data at rest.

Certificates & Keys: TLS certs, SSH keys, signing keys.

Tokens: OAuth tokens, API tokens, JWT signing keys.

Passwords: User passwords (usually hashed, not stored plaintext).

Vault Architecture

Application
↓ (authenticate with token/role)

Secret Vault (encrypted storage)
├─ Database credentials (encrypted)
├─ API keys (encrypted)
├─ SSH keys (encrypted)
├─ Access audit logs
└─ Rotation policies

What vault does:
✓ Encrypts secrets at rest (AES-256)
✓ Controls access (who can read what secret)
✓ Logs every read (audit trail)
✓ Rotates secrets automatically
✗ Never requires secrets to be stored directly in code
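
The access-control and audit-trail behavior listed above can be sketched with a small in-memory model. This is a toy under stated assumptions, not how Vault is implemented: `ToyVault`, its method names, and the identities are hypothetical, and encryption at rest is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToyVault:
    """In-memory stand-in for a vault: access control plus an audit trail."""
    _secrets: dict = field(default_factory=dict)   # path -> secret value
    _grants: dict = field(default_factory=dict)    # identity -> set of allowed paths
    audit_log: list = field(default_factory=list)  # one entry per read attempt

    def put(self, path: str, value: str) -> None:
        self._secrets[path] = value

    def grant(self, identity: str, path: str) -> None:
        self._grants.setdefault(identity, set()).add(path)

    def read(self, identity: str, path: str) -> str:
        allowed = path in self._grants.get(identity, set())
        # Record every attempt, allowed or denied, but never the secret value.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "identity": identity,
            "path": path,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{identity} may not read {path}")
        return self._secrets[path]


vault = ToyVault()
vault.put("database/prod", "s3cr3t")
vault.grant("billing-service", "database/prod")
password = vault.read("billing-service", "database/prod")  # logged and allowed
```

Denied reads are logged too, which is what makes the audit trail useful for spotting probing by an identity that should not have access.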

KMS (Key Management Service)

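KMS manages encryption keys, not your data. A common pattern is envelope encryption:

Write: App requests a data key → KMS returns plaintext data key + wrapped (encrypted) copy
App encrypts data locally, stores encrypted_data + wrapped key in database, discards plaintext key
Read: App retrieves encrypted_data + wrapped key from database
App sends wrapped key to KMS → KMS checks if app is authorized → returns plaintext data key
App decrypts data locally, then discards the plaintext key again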

Advantages:

  • App never holds the master key (it stays inside KMS)
  • Audit trail of all key usage
  • Rotate the master key without re-encrypting: KMS keeps earlier key versions, so existing data stays decryptable
  • Compliance (HIPAA, PCI: key access logged)
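
A minimal sketch of envelope encryption with a KMS-like object may make this flow concrete. Everything here is an assumption for illustration: `ToyKMS` is hypothetical, and the cipher is a toy stream cipher built from HMAC-SHA256 so the example runs on the standard library alone. A real system would call the KMS API and use an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode). Illustration only:
    it has no integrity check; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Holds master keys internally; callers only ever see data keys."""

    def __init__(self) -> None:
        self._master_keys = [os.urandom(32)]  # list index = key version

    def rotate_master_key(self) -> None:
        # Old versions are kept so existing wrapped keys stay decryptable.
        self._master_keys.append(os.urandom(32))

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext_dek, wrapped_dek)."""
        dek = os.urandom(32)
        version = len(self._master_keys) - 1
        nonce = os.urandom(16)
        body = _xor_stream(self._master_keys[version], nonce, dek)
        return dek, bytes([version]) + nonce + body

    def unwrap_data_key(self, wrapped: bytes) -> bytes:
        version, nonce, body = wrapped[0], wrapped[1:17], wrapped[17:]
        return _xor_stream(self._master_keys[version], nonce, body)


def encrypt_record(kms: ToyKMS, plaintext: bytes) -> dict:
    dek, wrapped = kms.generate_data_key()
    nonce = os.urandom(16)
    # Store the wrapped key next to the ciphertext; the plaintext DEK is dropped.
    return {"wrapped_dek": wrapped, "nonce": nonce,
            "ciphertext": _xor_stream(dek, nonce, plaintext)}


def decrypt_record(kms: ToyKMS, record: dict) -> bytes:
    dek = kms.unwrap_data_key(record["wrapped_dek"])
    return _xor_stream(dek, record["nonce"], record["ciphertext"])
```

Because each wrapped key records the master key version that protected it, `rotate_master_key()` does not invalidate records written earlier. In this sketch the application does see the data key briefly in memory; what it never sees is the master key.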

Rotation Strategy

API Key Rotation:

Day 1: api_key_v1 valid
Day 31: api_key_v2 created, both valid
Day 32: api_key_v1 revoked, only v2 valid
(If v1 leaked on day 28, attacker has 4 days of access)

Database Password Rotation:

Day 1: password_v1 set in Vault
Day 31: password_v2 generated, set in DB and Vault
Day 32: password_v1 deleted from Vault
(New connections using password_v1 are rejected; in most databases, already-open sessions continue until they reconnect)

Encryption Key Rotation (without re-encrypting data):

Day 1: DEK (Data Encryption Key) created, encrypts data
Day 31: DEK_v2 created
Day 32: DEK_v1 marked as retired (but can decrypt old data)
(New data encrypted with DEK_v2; old data readable with DEK_v1)
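
The API key timeline above can be checked with a few lines of arithmetic. This is a sketch: `KeyVersion`, `valid_keys`, and `exposure_days` are hypothetical names, and days are plain integers matching the schedule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyVersion:
    name: str
    created_day: int
    revoked_day: Optional[int] = None  # None means not yet revoked

    def is_valid_on(self, day: int) -> bool:
        if day < self.created_day:
            return False
        return self.revoked_day is None or day < self.revoked_day


def valid_keys(versions: list[KeyVersion], day: int) -> list[str]:
    return [v.name for v in versions if v.is_valid_on(day)]


def exposure_days(version: KeyVersion, leaked_day: int) -> Optional[int]:
    """Days an attacker could use a leaked key; None if it is never revoked."""
    if version.revoked_day is None:
        return None
    return max(0, version.revoked_day - leaked_day)


schedule = [
    KeyVersion("api_key_v1", created_day=1, revoked_day=32),
    KeyVersion("api_key_v2", created_day=31),
]
print(valid_keys(schedule, 31))        # ['api_key_v1', 'api_key_v2']
print(valid_keys(schedule, 32))        # ['api_key_v2']
print(exposure_days(schedule[0], 28))  # 4
```

Without a revocation date the exposure is unbounded, which is the case rotation is meant to eliminate.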

Practical Examples

# Using HashiCorp Vault CLI

# Store a secret
vault kv put secret/database/prod \
    username=postgres \
    password=securepassword123

# Read a secret
vault kv get secret/database/prod
# Returns: username=postgres, password=securepassword123

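# List enabled audit devices (this shows configuration, not access history)
vault audit list
vault read sys/audit
# The access history itself is written to the audit device, e.g. a log file:
# vault audit enable file file_path=/var/log/vault_audit.log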
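
An application would normally read the same secret over Vault's HTTP API instead of the CLI. The sketch below uses only the standard library and assumes the `secret/` mount is a KV version 2 engine (the default in dev mode); the function names are hypothetical, and a client library such as hvac would usually replace this in practice.

```python
import json
import urllib.request


def build_kv_v2_read(addr: str, token: str, mount: str, path: str) -> urllib.request.Request:
    """Build a GET request for a KV v2 secret (note the extra 'data/' segment)."""
    url = f"{addr.rstrip('/')}/v1/{mount}/data/{path}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def parse_kv_v2(body: bytes) -> dict:
    """KV v2 nests the key/value pairs under data.data."""
    return json.loads(body)["data"]["data"]


def read_secret(addr: str, token: str, mount: str, path: str) -> dict:
    """Fetch and decode one secret; requires a reachable Vault server."""
    request = build_kv_v2_read(addr, token, mount, path)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_kv_v2(response.read())
```

A call such as `read_secret("https://vault.example.com", token, "secret", "database/prod")` would return the username and password as a dictionary. The token should come from an auth method at runtime, not from source code.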

When to Use / When Not to Use

Use Vault When
  1. Managing many secrets (API keys, passwords, certs)
  2. Team needs to share secrets securely
  3. Audit trail required
  4. Rotation policies needed
  5. Multi-environment (dev, staging, prod)

Use KMS When
  1. Encrypting data at rest
  2. Key management required
  3. Compliance needs (audit encryption access)
  4. Decryption must be logged
  5. Separation of encryption key access

Patterns and Pitfalls

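Pitfall: Secrets in environment variables. They can be read from the process environment (e.g. /proc/<pid>/environ), get dumped into logs and crash reports, and are baked into container images when set at build time.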

Pattern: Use Vault + authentication (IAM role, JWT token). App proves identity, Vault returns secret.

Pitfall: Hardcoding secrets (password = "abc123" in code). Code reaches GitHub, contractors, logs, backups.

Pattern: Rotate all secrets by default. API keys every 30-90 days. Database passwords every 30 days. Encryption keys: decrypt with old, encrypt with new.

Pitfall: No backup plan if Vault goes down. App can't fetch secrets; service outage.

Pattern: Cache secrets locally for short period (5 min) with fallback. If Vault unavailable, use cached copy briefly.
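
A short-lived cache with a stale fallback might look like the following sketch. `CachedSecret` and its parameters are hypothetical; `fetch` stands for whatever call retrieves the secret from the vault, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Caches one secret for ttl seconds; if a refresh fails, serves the
    stale copy for up to max_stale additional seconds."""

    def __init__(self, fetch: Callable[[], str], ttl: float = 300.0,
                 max_stale: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._max_stale = max_stale
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        age = None if self._fetched_at is None else now - self._fetched_at
        if age is not None and age < self._ttl:
            return self._value  # fresh cache hit, no call to the vault
        try:
            self._value = self._fetch()
            self._fetched_at = now
            return self._value
        except Exception:
            # Vault unreachable: fall back only while the copy is not too old.
            if age is not None and age < self._ttl + self._max_stale:
                return self._value
            raise
```

Bounding the stale window matters: an unbounded fallback would keep serving a secret that rotation has already invalidated.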

Pitfall: Logging secrets accidentally. "DEBUG: using password=secret123". Sanitize logs.
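
One way to sanitize logs is a filter that masks values before any handler sees them. The sketch below uses Python's standard `logging` module; the regular expression is a hypothetical starting point that only catches `key=value` or `key: value` pairs with an obviously sensitive key name.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks sensitive.
_SENSITIVE = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\s*[=:]\s*\S+"
)


class RedactSecrets(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, then redact the final text.
        message = record.getMessage()
        record.msg = _SENSITIVE.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)
        record.args = None
        return True  # keep the record, just with the value masked


logger = logging.getLogger("app")
logger.addFilter(RedactSecrets())
logger.warning("using password=%s", "secret123")  # emitted as: using password=[REDACTED]
```

Pattern-based redaction is a safety net, not a guarantee. The more reliable fix is to never pass secret values to logging calls in the first place.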

Pattern: Separate encryption keys by environment. Prod key never used for dev data.

Design Review Checklist

  • Centralized secret storage in place (Vault, AWS Secrets Manager, etc.)
  • No secrets in code, config files, environment vars (except tokens to fetch from Vault)
  • Access control enforced (who can read which secret)
  • Audit logging enabled (every secret access logged)
  • Rotation policy defined and automated (30-90 days)
  • Backup and recovery tested
  • TLS/mTLS for Vault communication
  • Secret lifecycle documented (creation, rotation, revocation)
  • Teams can request/approve new secrets
  • Secrets never logged or displayed
  • Encryption keys managed separately (KMS)

Self-Check

  • Why shouldn't API keys be stored in environment variables directly?
  • What's the advantage of rotating secrets regularly?
  • How would you implement zero-downtime API key rotation?

One Takeaway

Centralized vault + rotation + audit logging = reduced blast radius and full compliance trail for every secret accessed.

Real-World Implementation Examples

Complete Secrets Management Setup

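# Start the Vault server (a new cluster must also be initialized and unsealed:
# vault operator init, then vault operator unseal)
vault server -config=/etc/vault/config.hcl

# Enable AppRole auth and define a role for the app
vault auth enable approle
vault write auth/approle/role/myapp \
    token_policies="app-policy" \
    bind_secret_id=true

# Generate secret ID for app authentication (-f: write with no data)
SECRET_ID=$(vault write -f -field=secret_id auth/approle/role/myapp/secret-id)

# Enable the database secrets engine and configure the connection.
# Vault fills {{username}}/{{password}} from the fields below, so the
# credentials are not embedded in the URL. "root"/"password" are placeholders.
vault secrets enable database
vault write database/config/pg \
    plugin_name=postgresql-database-plugin \
    allowed_roles="readonly" \
    connection_url="postgresql://{{username}}:{{password}}@postgres:5432" \
    username="root" \
    password="password"

# Role that issues short-lived users; VALID UNTIL lets PostgreSQL expire them too
vault write database/roles/readonly \
    db_name=pg \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
    default_ttl="1h" \
    max_ttl="24h"

# Application requests dynamic credentials (temporary, auto-revoked)
vault read database/creds/readonly
# Returns (illustrative): username=v-approle-readonly-xyz, password=temporary-1h-duration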

Secrets Rotation Without Downtime

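# Zero-downtime API key rotation strategy.
# The vault_client members used here (generate_api_key, audit_log,
# management_token) and the provider endpoint are illustrative placeholders;
# substitute your client's and provider's actual key-management APIs.

import threading
import time
from datetime import datetime, timezone

import requests


class APIKeyManager:
    def __init__(self, vault_client, grace_period_seconds=5):
        self.vault = vault_client
        self.current_key = None
        self.next_key = None
        self.grace_period_seconds = grace_period_seconds
        self.rotation_lock = threading.Lock()

    def get_current_key(self):
        """Get the active API key."""
        return self.current_key

    def rotate_key(self):
        """Rotate to next key without downtime."""
        with self.rotation_lock:
            # Generate new key in Vault
            self.next_key = self.vault.generate_api_key()
            old_key = self.current_key

            # Update third-party service to accept BOTH keys temporarily
            # (on the first rotation there is no old key to keep)
            accepted = [k for k in (old_key, self.next_key) if k is not None]
            self.update_third_party_service(accepted)

            # Switch active key while the old one is still accepted
            self.current_key = self.next_key
            self.next_key = None

            # Wait for in-flight requests using old key to complete
            time.sleep(self.grace_period_seconds)  # Grace period

            # Update third-party to accept only new key
            self.update_third_party_service([self.current_key])

            # Log rotation for audit (never log the key itself)
            self.vault.audit_log(
                "key_rotated",
                {"timestamp": datetime.now(timezone.utc).isoformat()},
            )

    def update_third_party_service(self, accepted_keys):
        """Update external service configuration."""
        response = requests.post(
            "https://api.stripe.com/v1/account/api_keys",  # placeholder endpoint
            json={"accepted_keys": accepted_keys},
            headers={"Authorization": f"Bearer {self.vault.management_token}"},
            timeout=10,
        )
        if response.status_code != 200:
            # Report the status only; the response body may echo key material
            raise RuntimeError(f"Failed to update service: HTTP {response.status_code}")


# Usage: rotate during off-peak or continuously with canary.
# vault_client is assumed to be constructed elsewhere.
manager = APIKeyManager(vault_client)
manager.rotate_key()  # Blocks this caller for the grace period; get_current_key() callers are not blocked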

Multi-Environment Secret Separation

# Vault configuration: separate secrets per environment

vault secrets enable -path=dev kv
vault secrets enable -path=staging kv
vault secrets enable -path=prod kv

# Dev can be less restricted
vault policy write dev-policy - <<EOF
path "dev/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
EOF

# Staging more restricted
vault policy write staging-policy - <<EOF
path "staging/*" {
  capabilities = ["read", "list"]  # No delete/update
}
EOF

# Prod most restricted (audit everything, no delete)
vault policy write prod-policy - <<EOF
path "prod/*" {
  capabilities = ["read", "list"]
}
EOF

# Application in prod:
# 1. Assumes prod IAM role (EC2, ECS, Lambda, etc.)
# 2. Vault verifies identity
# 3. Returns a short-lived Vault token (1 hour here) used to read secrets
# 4. App must re-authenticate after 1 hour (automatic)
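
The effect of these three policies can be sketched as a lookup table plus a path check. This is a deliberately simplified model: it handles only exact paths and a trailing `*`, and it ignores rule priority, explicit deny, and other features of Vault's real policy evaluation. `POLICIES` and `is_allowed` are hypothetical names.

```python
POLICIES = {
    "dev-policy": {"dev/*": {"create", "read", "update", "delete", "list"}},
    "staging-policy": {"staging/*": {"read", "list"}},
    "prod-policy": {"prod/*": {"read", "list"}},
}


def is_allowed(policy_name: str, path: str, capability: str) -> bool:
    """Exact path match, or prefix match when the rule ends in '*'."""
    for pattern, capabilities in POLICIES[policy_name].items():
        if pattern.endswith("*"):
            matches = path.startswith(pattern[:-1])
        else:
            matches = path == pattern
        if matches and capability in capabilities:
            return True
    return False  # default deny: anything not granted is refused


print(is_allowed("dev-policy", "dev/db", "delete"))    # True
print(is_allowed("prod-policy", "prod/db", "delete"))  # False
print(is_allowed("prod-policy", "dev/db", "read"))     # False
```

The last case is the point of separating mounts per environment: a production identity holds no capability on dev paths, and the reverse is also true.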

Secrets in CI/CD Pipeline

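# GitHub Actions example
# Assumes a Vault JWT auth mount at auth/jwt with a role named "github-deploy",
# and that the vault CLI and jq are available on the runner.

name: Deploy
on: [push]

permissions:
  id-token: write   # required to request the OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      VAULT_ADDR: https://vault.example.com
    steps:
      # DO NOT use hard-coded secrets!
      # DO NOT use GitHub secrets for long-term credentials!

      # Instead: Use OIDC to authenticate with vault, get short-lived credentials
      - name: Authenticate with Vault
        id: authenticate
        run: |
          # GitHub provides OIDC token (returned as JSON in the "value" field)
          JWT=$(curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=vault.example.com" | jq -r '.value')

          # Exchange for Vault token
          VAULT_TOKEN=$(curl -sS -X POST "$VAULT_ADDR/v1/auth/jwt/login" \
            -d "{\"role\": \"github-deploy\", \"jwt\": \"$JWT\"}" | jq -r '.auth.client_token')

          # Mask the token in logs, then expose it to later steps
          echo "::add-mask::$VAULT_TOKEN"
          echo "token=$VAULT_TOKEN" >> "$GITHUB_OUTPUT"

      - name: Deploy with temporary credentials
        env:
          VAULT_TOKEN: ${{ steps.authenticate.outputs.token }}
        run: |
          # Get temporary AWS credentials from Vault
          AWS_CREDS=$(vault read -format=json aws/creds/deploy)
          export AWS_ACCESS_KEY_ID=$(echo "$AWS_CREDS" | jq -r '.data.access_key')
          export AWS_SECRET_ACCESS_KEY=$(echo "$AWS_CREDS" | jq -r '.data.secret_key')
          # For STS-based roles, also export AWS_SESSION_TOKEN from .data.security_token

          # Deploy with temporary credentials (expire after 1 hour)
          # Even if CI/CD repo is compromised, credentials are 1-hour temporary
          aws s3 sync . s3://my-bucket --delete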

Secrets Management Best Practices

What to Protect

Definitely Vault:

  • Database passwords
  • API keys
  • Private keys (certificates, signing keys)
  • Encryption keys (data encryption keys)
  • OAuth tokens
  • AWS credentials

Maybe Vault (depends on sensitivity):

  • Configuration (feature flags, endpoints) — may belong in config management
  • Non-secret app settings

Don't Vault (not secret):

  • Public certificates (public by design; only the matching private keys need protection)
  • Public API endpoints

Rotation Frequency

  • API Keys: 30-90 days (or on compromise)
  • Database Passwords: 30 days
  • Encryption Keys: 90 days (with support for old keys)
  • SSH Keys: 365 days or on employee departure
  • OAuth tokens: Based on provider (often auto-expired)
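
These intervals can be turned into a simple overdue check over a secrets inventory. The sketch assumes the upper bound of each range above as the maximum age; `MAX_AGE_DAYS`, `rotation_due`, `overdue`, and the inventory format are hypothetical.

```python
from datetime import date, timedelta

# Maximum age per secret type, using the upper bounds listed above.
MAX_AGE_DAYS = {
    "api_key": 90,
    "database_password": 30,
    "encryption_key": 90,
    "ssh_key": 365,
}


def rotation_due(secret_type: str, last_rotated: date, today: date) -> bool:
    """True when the secret is at or past its maximum age."""
    return today - last_rotated >= timedelta(days=MAX_AGE_DAYS[secret_type])


def overdue(inventory: list[dict], today: date) -> list[str]:
    """Names of secrets that need rotation, in inventory order."""
    return [s["name"] for s in inventory
            if rotation_due(s["type"], s["last_rotated"], today)]


inventory = [
    {"name": "stripe", "type": "api_key", "last_rotated": date(2024, 4, 1)},
    {"name": "pg-prod", "type": "database_password", "last_rotated": date(2024, 4, 25)},
    {"name": "deploy-ssh", "type": "ssh_key", "last_rotated": date(2023, 5, 1)},
]
print(overdue(inventory, today=date(2024, 6, 1)))  # ['pg-prod', 'deploy-ssh']
```

A check like this is useful as an alert for secrets that automated rotation missed. It does not replace the rotation itself.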

High-Risk Patterns to Avoid

Pattern | Risk | Fix
Hardcoded secrets | Found in code search, history, backups | Vault + rotation
Secrets in env vars | Visible in process list, core dumps, logs | Vault API at runtime
Secrets in config files | Checked into git, visible in containers | Vault injection at startup
Shared secrets across environments | Compromise of dev affects prod | Separate secrets per environment
No rotation | Compromise window is permanent | Automatic rotation every 30-90 days
No audit logging | Can't detect who accessed secrets | Centralized logging with alerting

Next Steps

  • Read Encryption at Rest for key management beyond secrets
  • Study Authentication & MFA for protecting Vault access itself
  • Explore Audit Logging for tracking secret lifecycle
  • Implement HashiCorp Vault or AWS Secrets Manager in your infrastructure
  • Set up automatic rotation policies for all secret types
  • Audit and remove hardcoded secrets from existing codebases