Exception Handling & Waivers
Create a lightweight process for justified deviations from standards without compromising overall coherence.
TL;DR
Create a lightweight process for justified deviations from standards without compromising overall coherence. Success in this area comes from balancing clarity with autonomy, establishing lightweight processes that serve teams, and continuously evolving based on feedback and organizational growth.
Learning Objectives
- Understand the purpose and scope of exception handling & waivers
- Learn practical implementation approaches and best practices
- Recognize common pitfalls and how to avoid them
- Build sustainable processes that scale with your organization
- Mentor others in applying these principles effectively
Motivating Scenario
Your organization faces a challenge that exception handling & waivers directly addresses. Without clear processes and alignment, teams work in silos, making duplicate decisions or conflicting choices. Investments are wasted, knowledge doesn't transfer, and teams reinvent wheels repeatedly. This section provides frameworks, templates, and practices to move forward with confidence and coherence.
Core Concepts
Purpose and Value
Exception Handling & Waivers matters because it creates clarity without creating bureaucracy. When processes are lightweight and transparent, teams understand what decisions matter and can move fast with safety.
Key Principles
- Clarity: Make the "why" behind processes explicit
- Lightweight: Every process should create more value than it costs
- Transparency: Document criteria so teams know what to expect
- Evolution: Regularly review and refine based on experience
- Participation: Include affected teams in designing processes
Implementation Pattern
Most successful implementations follow this pattern: understand current state, design minimal viable process, pilot with early adopters, gather feedback, refine, and scale.
Governance Without Bureaucracy
The hard part is scaling without creating approval bottlenecks. This requires clear decision criteria, asynchronous review mechanisms, and truly delegating decisions to teams.
Practical Example
- Process Implementation
- Standard Template
- Governance Model
# Exception Handling & Waivers - Implementation Roadmap
Week 1-2: Discovery & Design
- Understand current pain points
- Design minimal viable process
- Identify early adopter teams
- Create templates and documentation
Week 3-4: Pilot & Feedback
- Run process with pilot teams
- Gather feedback weekly
- Make quick adjustments
- Document lessons learned
Week 5-6: Refinement & Documentation
- Incorporate feedback
- Create training materials
- Prepare communication plan
- Build tools to support process
Week 7+: Scaling & Iteration
- Roll out to all teams
- Monitor adoption metrics
- Gather feedback monthly
- Continuously improve based on learning
# Exception Handling & Waivers - Quick Reference
## What This Is
[One sentence explanation]
## When to Use This
- Situation 1
- Situation 2
- Situation 3
## Process Steps
1. [Step with owner and timeline]
2. [Step with owner and timeline]
3. [Step with owner and timeline]
## Success Criteria
- [Measurable outcome 1]
- [Measurable outcome 2]
## Roles & Responsibilities
- [Role 1]: [Specific responsibility]
- [Role 2]: [Specific responsibility]
## Decision Criteria
- [Criterion that allows action]
- [Criterion that requires escalation]
- [Criterion that allows exception]
## Common Questions
Q: What if...?
A: [Clear answer]
Q: Who decides...?
A: [Clear authority]
# Governance Approach
Decision Tier 1: Team-Level (Own It)
- Internal team decisions
- No cross-team impact
- Timeline: Team decides
- Authority: Tech Lead
- Process: Documented in code review
Decision Tier 2: Cross-Team (Collaborate)
- Affects multiple teams or shared systems
- Requires coordination
- Timeline: 1-2 weeks
- Authority: System/Solution Architect
- Process: ADR review, stakeholder feedback
Decision Tier 3: Org-Level (Align)
- Organization-wide impact
- Strategic implications
- Timeline: 2-4 weeks
- Authority: Enterprise Architect
- Process: Design review, exception evaluation
Escape Hatch: Exception
- Justified deviation from standard
- Time-boxed (3-6 months)
- Requires rationale and review plan
- Authority: Role + affected team lead
Core Principles in Practice
- Make the Why Clear: Teams will follow processes they understand the purpose of
- Delegate Authority: Push decisions down; keep strategy centralized
- Use Asynchronous Review: Documents and ADRs scale better than meetings
- Measure Impact: Track metrics that show whether process is working
- Iterate Quarterly: Regular review keeps processes relevant
Success Indicators
✓ Teams proactively engage in the process ✓ 80%+ adoption without enforcement ✓ Clear reduction in the pain point the process addresses ✓ Minimal time overhead (less than 5% of team capacity) ✓ Positive feedback in retrospectives
Pitfalls to Avoid
❌ Process theater: Requiring documentation no one reads ❌ Over-standardization: Same rules for all teams and all decisions ❌ Changing frequently: Processes need 3-6 months to stabilize ❌ Ignoring feedback: Refusing to adapt based on experience ❌ One-size-fits-all: Different teams need different process levels ❌ No documentation: Unwritten processes get inconsistently applied
Related Concepts
This practice connects to:
- Architecture Governance & Organization (overall structure)
- Reliability & Resilience (ensuring systems stay healthy)
- Documentation & ADRs (capturing decisions and rationale)
- Team Structure & Communication (enabling effective collaboration)
Checklist: Before You Implement
- Clear problem statement: "This process solves [X]"
- Stakeholder input: Teams that will use it helped design it
- Minimal viable version: Start simple, add complexity only if needed
- Success metrics: Define what "better" looks like
- Communication plan: How will people learn about this?
- Pilot plan: Early adopters to validate before scaling
- Review schedule: When will we revisit and refine?
Self-Check
- Can you explain the purpose of this process in one sentence? If not, it's too complex.
- Do 80% of teams engage without being forced? If not, reconsider its value.
- Have you measured the actual impact? Or are you assuming it works?
- When did you last gather feedback? If >3 months, do it now.
Takeaway
The best processes are rarely the most comprehensive ones. They're the ones teams choose to follow because they see the value. Start lightweight, measure impact, gather feedback, and iterate. A simple process that 90% of teams adopt is infinitely better than a perfect process that 30% of teams bypass.
Exception Process in Practice
Example 1: Technology Stack Exception
Standard: "Use TypeScript for all web services"
Exception Request: "Use Python/FastAPI for data processing service"
Justification:
- Existing team has 5+ years Python experience
- None with TypeScript
- 3-month ramp-up cost for learning TypeScript
- Fast training with Python
Time-Box:
- 6 months initially
- Review at 3 months to assess success
Acceptance Criteria:
- Project delivered on schedule
- Code quality same as TypeScript services
- Team comfort increases over time
Outcome:
- Approved for 6 months
- Review scheduled for month 3
- At month 3: Team skilled, service stable → extend to 12 months
- At month 12: Codebase high quality, team considers this strategic → permanent exception
Example 2: Security Exception
Standard: "All database connections use TLS 1.3 with certificate validation"
Exception Request: "Internal database uses TLS 1.2 without certificate validation"
Justification:
- Legacy database doesn't support TLS 1.3
- Upgrade costs $50K+ and takes 2 months
- Database is internal network only
- Temporary until upgrade
Time-Box:
- 4 months (database upgrade timeline)
Acceptance Criteria:
- Monitoring for unauthorized database access
- Monthly access reviews
- No data exfiltration attempts
Risk Acceptance:
- Security team acknowledges and accepts risk
- Documented in risk register
- Escalation path defined
Outcome:
- Approved with enhanced monitoring
- Monthly audits scheduled
- Upgrade planned for next quarter
- Exception automatically expires when database upgraded
Example 3: Performance Exception
Standard: "API responses under 200ms p99"
Exception Request: "Search API: 500ms p99 for complex queries on initial launch"
Justification:
- New search engine (Elasticsearch) being integrated
- Complex queries require time to optimize
- Performance improvements planned (caching, indexing)
- User experience acceptable despite higher latency
Time-Box:
- 3 months
Acceptance Criteria:
- Monthly performance improvements documented
- caching layer added by month 2
- Query optimization by month 3
- Reach 200ms p99 by month 3
Monitoring:
- Daily p99 latency tracked
- Alert if p99 increases
Outcome:
- Approved with performance improvement plan
- Implementation monitoring confirms steady improvement
- At month 3: p99 reaches 150ms → exception revoked
Exception Request Template
# Exception Request Form
## What is the standard?
[Describe the architecture standard being requested as exception]
## Why can't you follow the standard?
[Explain constraints: cost, timeline, skills, etc.]
## What's the impact of not following the standard?
[Be honest: performance, security, maintainability consequences]
## Time-box
[When will you revisit? When will you fix it?]
## Acceptance criteria
[How will we know this exception is working?]
## Metrics/monitoring
[What will you measure?]
## Risk acceptance
[Who is accepting this risk? Stakeholder sign-off]
## Review schedule
[When will you check status?]
## Exit plan
[How will you return to standard?]
When Exceptions Become Anti-Patterns
Warning Signs:
1. "We've had an exception for 18 months"
→ Exception is becoming the standard
2. "Everyone has an exception to this"
→ Standard is too strict; rethink it
3. "We forgot about the exception"
→ No monitoring/oversight; uncontrolled deviations
4. "There's no exit plan"
→ Permanent debt accumulating
Solution:
- Quarterly review of all exceptions
- Kill exceptions that aren't actively being resolved
- If many exceptions exist, revise the standard
- Make exception visibility automatic (dashboards, alerts)
Next Steps
- Define the problem: What specifically are you trying to solve?
- Understand current state: How do teams work today?
- Design minimally: What's the smallest change that creates value?
- Pilot with volunteers: Find early adopters who see the value
- Gather feedback: Weekly for the first month, then monthly
- Refine and scale: Incorporate feedback and expand gradually
- Automate tracking: Build tooling to track exceptions
- Establish review cadence: Monthly, then quarterly exception reviews
- Use exceptions as feedback: Use exception patterns to identify standard revisions
- Celebrate resolved exceptions: Public recognition when exceptions exit
References
- ISO/IEC/IEEE 42010: Systems and Software Engineering ↗️
- Martin Fowler: Architecture Decision Records ↗️
- Forsgren, Humble, Kim: Accelerate ↗️
- "Principled Exceptions in Software Architecture" (Martin Fowler)