Fallacies of Distributed Computing
Identify the eight false assumptions that derail distributed systems architects and learn to design around them
TL;DR
Eight assumptions that seem obvious for single-machine systems fail catastrophically in distributed systems: the network is reliable, latency is zero, bandwidth is infinite, the network is secure, topology doesn't change, there's one administrator, transport cost is zero, and the network is homogeneous. Designing for distributed systems means rejecting each assumption and building systems where failures are expected, not exceptional.
Learning Objectives
- Understand each of the eight fallacies and why it's false
- Recognize real-world consequences of designing based on these assumptions
- Apply design patterns that don't depend on any fallacy
- Evaluate your architecture for hidden assumptions
Motivating Scenario
A team deploys their microservice application confidently. Their local testing shows everything works perfectly. Requests get routed between services, data flows seamlessly, and response times are minimal. But on day one of production, a distant data center develops network issues. Packets are delayed, timeouts cascade through the system, and the entire service becomes unresponsive.
The engineers had assumed the network was reliable (fallacy #1), that latency was negligible (fallacy #2), and that bandwidth was sufficient (fallacy #3). None of these assumptions held under real conditions. They built for the happy path and paid the price.
The Eight Fallacies
1. The Network is Reliable
The Fallacy: Assuming messages always arrive, and arrive in order.
Reality: Networks drop packets, reorder messages, and partition. Distributed systems operate in an environment where communication failure is normal.
Cost of Belief: Building systems with no retry logic, losing messages silently, or assuming acknowledgments mean success.
How to Design Around It:
- Implement timeouts for all network calls
- Use idempotent operations to enable safe retries
- Design for eventual consistency where immediate success isn't guaranteed
- Monitor packet loss and latency as critical metrics
Go:
// ❌ WRONG - No timeout, no retry
response, err := http.Get("http://service/api/data")
// ✅ CORRECT - Timeout and exponential backoff
client := &http.Client{
    Timeout: 5 * time.Second,
}

var response *http.Response
var err error
for attempt := 0; attempt < 3; attempt++ {
    response, err = client.Get("http://service/api/data")
    if err == nil {
        break
    }
    // Back off 1s, 2s, 4s before the next attempt
    backoff := time.Duration(1<<attempt) * time.Second
    time.Sleep(backoff)
}
Python:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# ❌ WRONG - No timeout or retry
response = requests.get("http://service/api/data")

# ✅ CORRECT - Timeout and exponential backoff
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
def fetch_data():
    response = requests.get(
        "http://service/api/data",
        timeout=5,
    )
    return response.json()
2. Latency is Zero
The Fallacy: Assuming network calls are as fast as local function calls.
Reality: Network latency is orders of magnitude greater than local computation. A 100ms round trip seems small until you chain ten service calls (1 second total).
Cost of Belief: Synchronous request chains, N+1 query patterns, and unpredictable tail latencies in distributed systems.
How to Design Around It:
- Use asynchronous communication patterns
- Batch requests to reduce round trips
- Cache aggressively to avoid network calls
- Design APIs for efficient resource fetching
Node.js:
// ❌ WRONG - Three sequential round trips; latencies add up
const user = await fetchUser(userId);
const profile = await fetchProfile(user.profileId);
const preferences = await fetchPreferences(user.id);

// ✅ CORRECT - Run independent calls in parallel; only the profile
// fetch has to wait for the user record (two round trips, not three)
const user = await fetchUser(userId);
const [profile, preferences] = await Promise.all([
  fetchProfile(user.profileId),
  fetchPreferences(userId)
]);
// ✅ BETTER - Batch query to eliminate round trips
const data = await fetchUserWithRelated(userId);
Go:
// ❌ WRONG - Sequential calls chain latency
user, _ := fetchUser(userID)
profile, _ := fetchProfile(user.ProfileID)
prefs, _ := fetchPreferences(userID)

// ✅ CORRECT - Run the independent calls concurrently; only the
// profile lookup has to wait for the user record
user, _ := fetchUser(userID)
profileChan := make(chan *Profile)
prefsChan := make(chan *Preferences)
go func() { p, _ := fetchProfile(user.ProfileID); profileChan <- p }()
go func() { p, _ := fetchPreferences(userID); prefsChan <- p }()
profile := <-profileChan
prefs := <-prefsChan
3. Bandwidth is Infinite
The Fallacy: Assuming you can send as much data as needed without cost.
Reality: Bandwidth is limited and expensive. Transferring gigabytes across continents incurs both monetary cost and latency.
Cost of Belief: Chatty APIs, over-fetching data, and massive response payloads that overwhelm networks.
How to Design Around It:
- Minimize payload sizes through compression and selection
- Use efficient serialization formats (Protocol Buffers, MessagePack)
- Implement pagination for large datasets
- Cache frequently accessed data
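To make the pagination and compression points above concrete, here is a minimal Go sketch of a handler that serves a large collection one page at a time and gzip-compresses the response when the caller advertises support for it. The /items endpoint, the 100-item page size, and the in-memory dataset are illustrative assumptions, not a prescribed API.

Go:
package main

import (
    "compress/gzip"
    "encoding/json"
    "net/http"
    "strconv"
    "strings"
)

// Hypothetical in-memory dataset standing in for a large query result.
var items = make([]string, 10000)

func listItems(w http.ResponseWriter, r *http.Request) {
    // Paginate: ship 100 records per request instead of the whole set.
    const pageSize = 100
    page, _ := strconv.Atoi(r.URL.Query().Get("page"))
    if page < 0 {
        page = 0
    }
    start := page * pageSize
    if start > len(items) {
        start = len(items)
    }
    end := start + pageSize
    if end > len(items) {
        end = len(items)
    }

    w.Header().Set("Content-Type", "application/json")

    // Compress when the caller can handle it; JSON text compresses well.
    if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
        w.Header().Set("Content-Encoding", "gzip")
        gz := gzip.NewWriter(w)
        defer gz.Close()
        json.NewEncoder(gz).Encode(items[start:end])
        return
    }
    json.NewEncoder(w).Encode(items[start:end])
}

func main() {
    http.HandleFunc("/items", listItems)
    http.ListenAndServe(":8080", nil)
}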
4. The Network is Secure
The Fallacy: Assuming only legitimate traffic reaches your services.
Reality: Networks are hostile. Eavesdropping, spoofing, and injection attacks are constant threats.
Cost of Belief: Services that leak sensitive data, accept forged messages, or are vulnerable to replay attacks.
How to Design Around It:
- Use TLS/SSL for all inter-service communication
- Authenticate all requests with cryptographic credentials
- Implement authorization checks, not just authentication
- Use mutual TLS for service-to-service communication
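The sketch below shows one way to combine these points using Go's standard library: a server that requires mutual TLS, so callers are authenticated with certificates rather than trusted by network position. The certificate file names (ca.pem, server.pem, server-key.pem) and the port are assumptions for illustration.

Go:
package main

import (
    "crypto/tls"
    "crypto/x509"
    "log"
    "net/http"
    "os"
)

func main() {
    // Trust only the internal CA and require every client to present a
    // certificate signed by it -- authentication, not just encryption.
    caCert, err := os.ReadFile("ca.pem")
    if err != nil {
        log.Fatal(err)
    }
    caPool := x509.NewCertPool()
    caPool.AppendCertsFromPEM(caCert)

    server := &http.Server{
        Addr: ":8443",
        TLSConfig: &tls.Config{
            ClientCAs:  caPool,
            ClientAuth: tls.RequireAndVerifyClientCert,
            MinVersion: tls.VersionTLS12,
        },
    }
    // server.pem / server-key.pem identify this service to its callers.
    log.Fatal(server.ListenAndServeTLS("server.pem", "server-key.pem"))
}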
5. Topology Doesn't Change
The Fallacy: Assuming fixed network structure and static service locations.
Reality: Services scale horizontally, instances fail and restart, and network paths change constantly.
Cost of Belief: Hard-coded service addresses, connection pools that become stale, and routing logic that breaks when services scale.
How to Design Around It:
- Implement service discovery (DNS, service registries, mesh)
- Use client-side or server-side load balancing
- Design for horizontal scaling from the start
- Expect instances to appear and disappear
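As a minimal illustration of dynamic discovery, the Go sketch below resolves a service name at call time instead of caching an address, so instances that appear or disappear are picked up on the next request. The name orders.internal, the port, and the take-the-first-address selection are simplifying assumptions; DNS SRV records, a service registry, or a service mesh would normally fill this role.

Go:
package discovery

import (
    "fmt"
    "net"
    "net/http"
    "time"
)

var client = &http.Client{Timeout: 3 * time.Second}

// CallOrders resolves the service name on every call rather than caching
// an IP address, so replaced or newly scaled instances are reachable.
func CallOrders(path string) (*http.Response, error) {
    addrs, err := net.LookupHost("orders.internal")
    if err != nil {
        return nil, fmt.Errorf("resolving orders service: %w", err)
    }
    if len(addrs) == 0 {
        return nil, fmt.Errorf("no instances registered for orders service")
    }
    // A production client would load-balance across addrs and retry on
    // failure; here we simply use the first address returned.
    url := fmt.Sprintf("http://%s:8080%s", addrs[0], path)
    return client.Get(url)
}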
6. There is One Administrator
The Fallacy: Assuming a single entity manages the entire network.
Reality: Large systems span multiple teams, organizations, and network domains with different policies and capabilities.
Cost of Belief: Systems that break when crossing organizational boundaries, inability to operate at enterprise scale, and coordination nightmares.
How to Design Around It:
- Use standardized protocols and APIs
- Design for federation and composition
- Assume heterogeneous ownership and governance
- Minimize coordination requirements between domains
7. Transport Cost is Zero
The Fallacy: Assuming network operations are free.
Reality: Transport has financial cost (data transfer charges), operational cost (infrastructure), and performance cost (latency, energy).
Cost of Belief: Designs that create excessive network traffic, resulting in high cloud bills and degraded performance.
How to Design Around It:
- Minimize cross-region communication
- Cache aggressively
- Use local-first designs where possible
- Monitor and optimize data transfer patterns
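A small read-through cache is often the cheapest way to cut transfer cost. The Go sketch below wraps a remote fetch function with a TTL cache so repeated reads are served locally; the fetch callback and the TTL are assumptions supplied by the caller, not a specific library API.

Go:
package cache

import (
    "sync"
    "time"
)

type entry struct {
    value   []byte
    expires time.Time
}

// Cache serves repeated reads from memory and only pays the network
// transfer cost when an entry is missing or expired.
type Cache struct {
    mu      sync.Mutex
    entries map[string]entry
    fetch   func(key string) ([]byte, error) // remote call, e.g. cross-region
    ttl     time.Duration
}

func New(fetch func(string) ([]byte, error), ttl time.Duration) *Cache {
    return &Cache{entries: map[string]entry{}, fetch: fetch, ttl: ttl}
}

func (c *Cache) Get(key string) ([]byte, error) {
    c.mu.Lock()
    e, ok := c.entries[key]
    c.mu.Unlock()
    if ok && time.Now().Before(e.expires) {
        return e.value, nil // served locally: no transfer cost
    }
    val, err := c.fetch(key) // pay the transfer cost only on a miss
    if err != nil {
        return nil, err
    }
    c.mu.Lock()
    c.entries[key] = entry{value: val, expires: time.Now().Add(c.ttl)}
    c.mu.Unlock()
    return val, nil
}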
8. The Network is Homogeneous
The Fallacy: Assuming all network segments have uniform characteristics.
Reality: Networks vary widely: Wi-Fi vs. Ethernet, LAN vs. WAN, satellite vs. fiber. Requirements differ by network type.
Cost of Belief: Solutions that work in data centers but fail on edge networks, or vice versa.
How to Design Around It:
- Support multiple protocols and configurations
- Implement adaptive algorithms that adjust to conditions
- Test across diverse network conditions
- Design for both high-bandwidth and low-bandwidth scenarios
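One way to accommodate heterogeneous networks is to measure conditions and adapt, as in the Go sketch below: probe one round trip, then pick a batch size suited to the link. The /healthz endpoint and the RTT thresholds are illustrative assumptions, not tuned values.

Go:
package adaptive

import (
    "net/http"
    "time"
)

// ProbeRTT measures one round trip to a lightweight health endpoint.
func ProbeRTT(base string) (time.Duration, error) {
    start := time.Now()
    resp, err := http.Get(base + "/healthz")
    if err != nil {
        return 0, err
    }
    resp.Body.Close()
    return time.Since(start), nil
}

// PageSize returns a smaller batch on slow links (mobile, satellite) and
// a larger one on fast data-center links.
func PageSize(rtt time.Duration) int {
    switch {
    case rtt > 300*time.Millisecond:
        return 25
    case rtt > 50*time.Millisecond:
        return 100
    default:
        return 500
    }
}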
Practical Consequences
The fallacies interact to create complex failure modes:
- Fallacies #1, #2, #3 Together: Lead to cascading timeouts. When a service becomes slow, requests back up, creating thundering herds of retries that make everything worse (a jittered backoff, sketched after this list, is one mitigation).
- Fallacies #5, #6 Together: Create configuration management nightmares. Hard-coded addresses in a federated system mean changes require coordinating across organizations.
- Fallacies #4, #7 Together: Lead to unencrypted data transfer to save bandwidth, exposing sensitive information.
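A common counter to retry-driven thundering herds is to add jitter to the backoff, as sketched below in Go; the base delay, cap, and full-jitter strategy are illustrative choices rather than the only option.

Go:
package retry

import (
    "math/rand"
    "time"
)

// Backoff returns an exponential delay with full jitter for the given
// zero-based attempt number, capped at max, so simultaneous retries from
// many clients spread out instead of arriving in one burst.
func Backoff(attempt int, base, max time.Duration) time.Duration {
    d := base << attempt // base, 2*base, 4*base, ...
    if d <= 0 || d > max {
        d = max // guard against overflow and cap the delay
    }
    // Full jitter: pick a uniformly random point in [0, d).
    return time.Duration(rand.Int63n(int64(d)))
}

For example, Backoff(2, time.Second, 30*time.Second) returns a random delay somewhere below four seconds, so a crowd of clients retrying the same failure does not arrive at the same instant.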
Design Review Checklist
- Services do not depend on network being reliable (timeouts, retries implemented)
- Latency expectations are realistic (not assuming millisecond RPCs)
- Bandwidth efficiency is considered (payload sizes, compression)
- Security is built-in, not bolted-on (TLS, authentication, authorization)
- Service discovery is dynamic, not hard-coded
- Federation is possible (services can be owned by different teams)
- Transport costs are understood and optimized
- Network heterogeneity is accommodated (multiple protocols, adaptivity)
Self-Check
Can you explain what would happen if:
- A network partition isolated your database service from your API service?
- Network latency doubled between your microservices?
- A service instance crashed and restarted with a different IP address?
Every fallacy has a design pattern that addresses it. The fallacies aren't strokes of bad luck; they describe how networks actually behave. Design for them from day one.
Next Steps
- Understand the Trade-offs: Read CAP & PACELC Theorems
- Choose Consistency: Learn about Consistency Models
- Implement Resilience: Explore Timeouts and Retries
References
- Deutsch, L. P. (1994). "The Eight Fallacies of Distributed Computing". Sun Microsystems.
- Rotem-Gal-Oz, A. (2006). "Fallacies of Distributed Computing Explained". Published online.
- Kleppmann, M. (2017). "Designing Data-Intensive Applications". O'Reilly Media.