
Network Policies & mTLS

TL;DR

Network policies implement zero-trust networking: whitelist permitted traffic and deny everything else. mTLS (mutual TLS) encrypts service-to-service traffic and authenticates both parties using certificates. Use a service mesh (Istio, Linkerd) for automatic mTLS, or manage certificates yourself with cert-manager. Together, the two layers reduce the blast radius if a service is compromised. Start with policies and cert-manager; add a service mesh when you need advanced observability and traffic management.

Learning Objectives

  • Design network policies to restrict inter-service communication using zero-trust principles.
  • Implement mTLS for service-to-service encryption and mutual authentication.
  • Automate certificate provisioning and rotation with cert-manager.
  • Evaluate service mesh architecture for advanced networking and observability.
  • Monitor and debug encrypted traffic with network policies in place.
  • Troubleshoot common certificate and network policy issues.

Motivating Scenario

A compromised microservice attempts to exfiltrate data from sensitive services. Without network policies, the attack succeeds (default allow). With policies, unauthorized egress is blocked. mTLS ensures communication partners authenticate each other, preventing man-in-the-middle attacks and credential theft. You need both layers of defense: network policies prevent unwanted connections; mTLS ensures that only legitimate services can communicate.

Mental Model

Zero-trust networking: network policies whitelist; mTLS encrypts and authenticates.

Core Concepts

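Network Policies: Firewall rules at the pod level. Kubernetes allows all pod-to-pod traffic by default, so apply a default-deny policy first, then whitelist permitted ingress/egress. Policies select pods by label and filter peers by pod labels and namespace labels. They are enforced by the cluster's network plugin (CNI); a plugin without NetworkPolicy support ignores them silently.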

mTLS (Mutual TLS): Both client and server authenticate using certificates. Server proves its identity; client proves its identity. Prevents MITM attacks and spoofing. Essential for zero-trust.

Service Mesh: Sidecar proxies (for example, Envoy in Istio) in each pod intercept traffic and handle mTLS, retries, rate limiting, and observability. A mesh adds operational complexity but simplifies security policy management.

Certificate Management: Automatic provisioning, renewal, and rotation using cert-manager or mesh-integrated systems (Istio CA). Reduces manual toil and certificate expiration surprises.

Zero-Trust: Assume no network is inherently trusted; verify every request (identity via mTLS, authorization via policies, encryption in transit).

Blast Radius: The scope of damage if a service is compromised. Network policies and mTLS limit blast radius by restricting what a compromised service can access.
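
As a minimal sketch of the Certificate Management entry above, a cert-manager Certificate resource could request one certificate that a service presents both as a server and as a client. The names order-api-tls and internal-ca, the key settings, and the durations are illustrative assumptions, not values taken from this guide; the ClusterIssuer is assumed to exist already.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: order-api-tls            # hypothetical name
  namespace: production
spec:
  secretName: order-api-tls      # Secret that receives tls.crt and tls.key (plus ca.crt if the issuer provides it)
  duration: 2160h                # 90 days
  renewBefore: 720h              # start renewal 30 days before expiry
  dnsNames:
    - order-api.production.svc.cluster.local
  usages:
    - server auth                # certificate can be presented by the server side
    - client auth                # and by the client side of an mTLS connection
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always       # generate a new private key on every renewal
  issuerRef:
    name: internal-ca            # assumed ClusterIssuer, not defined here
    kind: ClusterIssuer
    group: cert-manager.io
```

The application still has to be configured to require and verify client certificates; issuing the certificate alone does not enforce mTLS.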

Practical Examples

# Start with deny-all policy - default deny ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  # No rules = deny everything. This is your safety net.
---
# Order Service: Accept from API Gateway, call Payment Service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-api
  policyTypes:
    - Ingress
    - Egress

  # Allow ingress from API Gateway only.
  # namespaceSelector and podSelector in the same list item are ANDed:
  # the peer must be an api-gateway pod in a namespace labeled name=ingress.
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress
          podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080

  # Allow egress to Payment Service
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: payment-api
      ports:
        - protocol: TCP
          port: 8080

    # Allow egress to PostgreSQL
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432

    # Allow DNS (critical - without this, service discovery fails).
    # kubernetes.io/metadata.name is set automatically on every namespace;
    # DNS can fall back to TCP for large responses, so allow both protocols.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Payment Service: Accept from Order Service, call Database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes:
    - Ingress
    - Egress

  # Only Order Service can call Payment Service
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: order-api
      ports:
        - protocol: TCP
          port: 8080

  # Egress to database
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432

    # DNS (UDP and TCP; the namespace label below is set automatically)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Database: Accept from Order and Payment Services only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress

  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: order-api
        - podSelector:
            matchLabels:
              app: payment-api
      ports:
        - protocol: TCP
          port: 5432

Real-World Production Scenarios

Scenario 1: Multi-Tenant SaaS

Each tenant's workload is isolated via network policies. Tenant A pods cannot see Tenant B's data paths:

# Tenant A Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-a-isolation
  namespace: tenant-a
spec:
  podSelector:
    matchLabels:
      tenant: a
  policyTypes:
    - Ingress
    - Egress

  # Ingress: only from the ingress namespace or from other Tenant A pods
  # (separate list items under "from" are ORed)
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress
        - podSelector:
            matchLabels:
              tenant: a

  # Egress: Tenant A database only, DNS
  egress:
    - to:
        - podSelector:
            matchLabels:
              db: tenant-a
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

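With mTLS, each tenant's certificates are signed by a tenant-specific CA, and each tenant's services trust only their own CA. If one of Tenant A's private keys is compromised, an attacker can impersonate only Tenant A's services; other tenants reject certificates issued under Tenant A's CA.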
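
One way to sketch a tenant-specific CA is a namespaced cert-manager Issuer of type CA. The names tenant-a-ca and tenant-a-ca-key-pair are hypothetical, and the Secret holding the CA certificate and key is assumed to have been created in the tenant-a namespace beforehand.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer                     # namespaced: can only issue certificates inside tenant-a
metadata:
  name: tenant-a-ca              # hypothetical name
  namespace: tenant-a
spec:
  ca:
    # Assumed to exist: a kubernetes.io/tls Secret in tenant-a containing
    # the CA certificate (tls.crt) and its private key (tls.key).
    secretName: tenant-a-ca-key-pair
```

Using an Issuer rather than a ClusterIssuer keeps the signing scope inside the tenant's namespace, which matches the isolation goal of this scenario.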

Scenario 2: Migrating from HTTP to mTLS

Gradual rollout using Istio PERMISSIVE mode:

# Phase 1: PERMISSIVE - accept both mTLS and plaintext
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
---
# Phase 2: Monitor metrics; once traffic is all mTLS, switch to STRICT
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
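
Once STRICT mode is on, the identities proven by mTLS can also drive authorization, which is what the checklist item on authorization policies refers to. The following is a sketch only: it assumes the Order Service runs under a service account named order-api and that the mesh uses the default cluster.local trust domain, neither of which is stated in this guide.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-api-allow-order-api   # hypothetical name
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-api                # policy applies to Payment Service pods
  action: ALLOW
  rules:
    - from:
        - source:
            # Assumed identity: service account "order-api" in namespace "production"
            principals:
              - cluster.local/ns/production/sa/order-api
      to:
        - operation:
            ports: ["8080"]
```

When an ALLOW policy selects a workload, requests that match none of its rules are denied, so this complements the network policy with an identity-based check.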

Scenario 3: Certificate Rotation Without Downtime

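cert-manager renews certificates automatically. By default it starts renewal once two-thirds of the certificate's lifetime has passed, which is 30 days before expiry for the default 90-day duration; set spec.renewBefore to change this. Pods that mount the certificate Secret as a volume receive the updated files, but the application has to reload them. To check on renewals: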

# Check certificate expiration
kubectl get certificate -n production -o wide

# Monitor renewals
kubectl logs -n cert-manager deploy/cert-manager -f | grep renew

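With Istio, workload certificates for mesh mTLS are issued and rotated by the mesh itself: the sidecar's agent requests them from istiod and hands them to Envoy over SDS, so rotation needs no pod restart.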
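
For workloads that read a cert-manager Secret directly, the mount could look like the sketch below. The Secret name order-api-tls, the mount path, and the image reference are placeholders chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-api
  template:
    metadata:
      labels:
        app: order-api
    spec:
      containers:
        - name: order-api
          image: registry.example.com/order-api:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: tls
              mountPath: /etc/tls      # app reads tls.crt and tls.key from here
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: order-api-tls  # hypothetical Secret written by cert-manager
```

The Secret is mounted as a whole volume on purpose: files mounted through subPath do not receive updates when the Secret changes. The kubelet refreshes the files after a short delay, and the application still needs to re-read them, for example by watching the directory or reloading on a timer.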

Common Mistakes and Pitfalls

Debugging Network Policies and mTLS

# 1. Check if network policies are applied
kubectl get networkpolicies -n production -o wide

# 2. Describe a specific policy
kubectl describe networkpolicy order-service-policy -n production

# 3. Test connectivity (inside cluster, from one pod to another)
kubectl exec -it <order-pod> -n production -- sh

# Inside order pod shell:
# Test DNS
nslookup payment-api.production.svc.cluster.local

# Test TCP connection
nc -zv payment-api.production.svc.cluster.local 8080

# 4. Check pod labels (policies match on labels)
kubectl get pods -n production --show-labels

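# 5. Monitor network traffic (tcpdump, only in permissive environments; mTLS payloads show up encrypted)
kubectl exec <pod> -n production -- tcpdump -i eth0 -A

# 6. Test with wget from a temporary busybox pod (runs in the default namespace, so the policy should block it)
kubectl run -it --image=busybox --rm debug -- sh
# Inside (blocked connections hang rather than fail fast, so set a timeout):
# wget -T 5 -O- http://order-api.production.svc.cluster.local:8080/health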

Decision Checklist

  • Default-deny network policy in place?
  • Whitelist policies defined for each service pair?
  • DNS egress allowed in network policies?
  • mTLS enabled for service-to-service communication?
  • Certificates auto-renewed before expiration?
  • Certificate rotation process verified (no downtime)?
  • Service mesh (if used) monitoring traffic in real-time?
  • mTLS mode is STRICT (not PERMISSIVE) in production?
  • Authorization policies restrict service pairs (not just mTLS)?
  • Monitoring alerts for certificate expiry?
  • Runbook for debugging network policy failures?
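
The certificate-expiry alert from the checklist could be sketched as a PrometheusRule. This assumes the Prometheus Operator is installed and that cert-manager's metrics endpoint is being scraped; the rule name, the 14-day threshold, and the severity label are illustrative choices.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-expiry           # hypothetical name
  namespace: production
spec:
  groups:
    - name: certificates
      rules:
        - alert: CertificateExpiringSoon
          # Metric exported by cert-manager: expiry time as a Unix timestamp.
          # Fires when less than 14 days of validity remain.
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 14 days"
```

With renewal starting 30 days before expiry, a certificate that still has under 14 days left usually means renewal has been failing, which is the condition worth paging on.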

Self-Check

  • How do network policies reduce blast radius of compromised services?
  • What is mTLS, and why is mutual authentication important?
  • How do you automatically renew certificates without downtime?
  • When would you add a service mesh vs native Kubernetes security?
  • How do you debug a network policy that's blocking legitimate traffic?
  • What happens if you forget DNS in your egress policies?

One Takeaway

Zero-trust networking (network policies + mTLS) is essential for production systems. Start with policies and cert-manager; add a service mesh when observability and advanced routing justify the complexity. Always test policies in a staging environment first: misconfigured policies silently break production.

Next Steps

References