
Right to Erasure & Data Portability

Implement user data rights: deletion and export on demand

TL;DR

GDPR Article 17 (Right to Erasure): User can request deletion; you must delete within 30 days. Exceptions: legal obligations (contracts, audits), fraud investigation, retention policies. Article 20 (Data Portability): User can request export in portable format (JSON, CSV); you must provide within 30 days. Challenges: cascading deletes (user delete → related orders, comments), backups (data lives in backups after deletion), derived data (user ID in analytics tables). Build deletion as first-class feature, not afterthought. Track data lineage to enable complete deletion. Maintain audit trail of all deletions.

Learning Objectives

By the end of this article, you will understand:

  • GDPR right to erasure and right to data portability
  • How to implement deletion workflows at scale
  • Handling cascading deletes across systems
  • Data export formats and compliance
  • Backups and disaster recovery implications
  • Legal holds that override erasure requests
  • Audit trails for deletion verification

Motivating Scenario

User requests deletion under GDPR. Your process: 1) Delete from the user table, 2) Done. But the user ID still appears in orders, comments, reviews, analytics events, and backups. A compliance audit finds orders still linked to the deleted user, and you fail. Correct approach: 1) Mark the user for deletion, 2) Delete from all tables (orders and other cascades), 3) Delete from analytics (remove user_id from events), 4) Schedule backup deletion, 5) Verify the deletion propagated, 6) Maintain an audit log. Thirty days later, you can prove the user's data is gone.

Core Concepts

Data Deletion Workflow: Cascading and Archival

Right to Erasure (GDPR Article 17)

User rights:

  • Request deletion of personal data
  • Organization must delete within 30 days
  • No fee for deletion request

Exceptions (legitimate reasons to retain):

  1. Legal obligation (contract, audit trail, tax law)
  2. Fraud investigation (suspicious deletion request)
  3. Public interest (news archives)
  4. Another lawful basis still applies (e.g., the user's consent to a separate purpose has not been withdrawn)
  5. Legal hold (litigation pending)

Right to Data Portability (GDPR Article 20)

User rights:

  • Request export of personal data
  • Format: commonly used, portable (JSON, CSV, XML)
  • Machine-readable (not PDF scan)
  • Provided within 30 days

Includes:

  • All directly provided data (profile, settings, content)
  • Observed data (activity logs, usage history); purely inferred data such as behavioral profiles is generally out of scope
  • Excludes: data not "personal" (anonymized, aggregated)

Caveat: the export need not reveal commercially sensitive material (algorithms, trade secrets)
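
To make the export side concrete, here is a minimal sketch that gathers a user's data into machine-readable JSON. It assumes a generic DB-API-style connection and hypothetical table and column names (users, orders, name, email, total); substitute your own data lineage map.

import json
from datetime import datetime

def export_user_data(db_connection, user_id: int) -> str:
    """Gather a user's personal data into a single machine-readable JSON document."""
    export = {"user_id": user_id, "exported_at": datetime.utcnow().isoformat()}

    # Hypothetical tables/columns; replace with your real data lineage map.
    profile = db_connection.execute(
        "SELECT name, email, created_at FROM users WHERE user_id = %s",
        (user_id,)
    ).fetchone()
    export["profile"] = {
        "name": profile[0],
        "email": profile[1],
        "created_at": profile[2].isoformat(),   # ISO 8601 dates keep it portable
    }

    orders = db_connection.execute(
        "SELECT order_id, total, placed_at FROM orders WHERE user_id = %s",
        (user_id,)
    ).fetchall()
    export["orders"] = [
        {"order_id": o[0], "total": float(o[1]), "placed_at": o[2].isoformat()}
        for o in orders
    ]

    return json.dumps(export, indent=2)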

Cascading Deletes & Data Lineage

Problem: User ID appears in dozens of tables. Foreign key constraints can help, but non-relational systems don't enforce them.

Solution: Track data lineage

User 123
├─ orders (user_id FK)
├─ payments (user_id)
├─ comments (user_id)
├─ reviews (user_id)
└─ preferences (user_id)

Analytics events
└─ events table (user_id column)

Cache
├─ redis: user:123:profile
└─ redis: user:123:orders

Search
├─ elasticsearch: users index (user_id)
└─ elasticsearch: comments index (user_id)

Backups
├─ daily backup (contains all)
└─ weekly archive (contains all)

Deletion must handle all.
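
A minimal sketch of driving that fan-out from a lineage registry follows. It assumes redis-py and elasticsearch-py style clients; the key patterns and index names are illustrative.

from typing import Callable, Dict

def delete_from_cache(redis_client, user_id: int) -> None:
    """Purge cached copies (redis-py style client assumed)."""
    redis_client.delete(f"user:{user_id}:profile", f"user:{user_id}:orders")

def delete_from_search(es_client, user_id: int) -> None:
    """Purge search documents (elasticsearch-py style client assumed)."""
    for index in ("users", "comments"):
        es_client.delete_by_query(
            index=index, body={"query": {"term": {"user_id": user_id}}}
        )

def purge_user_everywhere(user_id: int,
                          stores: Dict[str, Callable[[int], None]]) -> Dict[str, str]:
    """Run every purge step registered in the lineage map; record the outcome per store."""
    results = {}
    for name, purge in stores.items():
        try:
            purge(user_id)
            results[name] = "deleted"
        except Exception as exc:      # don't stop; failed stores are retried later
            results[name] = f"failed: {exc}"
    return results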

Backup & Archival Complications

Backups contain deleted data:

  • Daily backup 2025-02-14 includes user 123
  • User deleted 2025-02-15
  • Backup retention: 30 days
  • Until 2025-03-16, user data accessible via backup

Options:

  1. Exclude deleted users from future backups (existing backups can't be modified in place)
  2. Accept backup retention > deletion request latency
  3. Rewrite backups to remove deleted user (expensive)
  4. Purge-on-restore: during restore, filter deleted users

GDPR requirement: Document backup retention in privacy policy. If backup retention outlasts deletion, disclose.
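
One way to implement option 4 (purge-on-restore) is a tombstone table of deleted user IDs that restored data is filtered against. A minimal sketch, assuming a hypothetical deleted_users table kept outside the backup set:

def purge_on_restore(restored_db, tombstone_db) -> int:
    """After restoring a backup, strip rows for users deleted since the backup was taken."""
    # Hypothetical deleted_users tombstone table, maintained outside the backup set.
    deleted_ids = [
        row[0] for row in
        tombstone_db.execute("SELECT user_id FROM deleted_users").fetchall()
    ]

    tables = ("orders", "comments", "reviews", "preferences", "users")
    purged = 0
    for user_id in deleted_ids:
        for table in tables:
            cursor = restored_db.execute(
                f"DELETE FROM {table} WHERE user_id = %s", (user_id,)
            )
            purged += cursor.rowcount
    restored_db.commit()
    return purged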

Practical Example

from datetime import datetime, timedelta
from enum import Enum

class DeletionStatus(Enum):
    REQUESTED = "requested"
    VERIFIED = "verified"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

class DeletionRequest:
    def __init__(self, user_id: int, request_timestamp: datetime):
        self.user_id = user_id
        self.request_timestamp = request_timestamp
        self.deadline = request_timestamp + timedelta(days=30)
        self.status = DeletionStatus.REQUESTED
        self.audit_log = []

    def verify_identity(self, verification_token: str) -> bool:
        """Step 1: Verify user identity."""
        # verify_token is an application-specific helper that checks the token
        # against the verification link emailed to the user.
        is_valid = verify_token(verification_token, self.user_id)

        if is_valid:
            self.status = DeletionStatus.VERIFIED
            self.audit_log.append({
                "action": "identity_verified",
                "timestamp": datetime.utcnow().isoformat()
            })
        return is_valid

    def execute_deletion(self, db_connection):
        """Step 2: Execute cascading deletes."""
        self.status = DeletionStatus.IN_PROGRESS

        try:
            # Delete from child tables first so foreign keys referencing users
            # are satisfied (or use soft delete + a cleanup job).
            tables_to_delete = [
                "orders", "comments", "reviews", "preferences",
                "notifications", "saved_items"
            ]

            for table in tables_to_delete:
                db_connection.execute(
                    f"DELETE FROM {table} WHERE user_id = %s",
                    (self.user_id,)
                )

            # Delete from the primary user table last.
            db_connection.execute(
                "DELETE FROM users WHERE user_id = %s",
                (self.user_id,)
            )

            db_connection.commit()

            self.status = DeletionStatus.COMPLETED
            self.audit_log.append({
                "action": "deletion_executed",
                "timestamp": datetime.utcnow().isoformat(),
                "tables_affected": tables_to_delete + ["users"]
            })

        except Exception as e:
            db_connection.rollback()
            self.status = DeletionStatus.FAILED
            self.audit_log.append({
                "action": "deletion_failed",
                "error": str(e),
                "timestamp": datetime.utcnow().isoformat()
            })
            raise

    def verify_deletion_complete(self, db_connection) -> bool:
        """Step 3: Verify user data is gone."""
        # Check all tables for remaining records. Select only user_id so the
        # UNION ALL works across tables with different schemas.
        count_query = """
            SELECT COUNT(*) FROM (
                SELECT user_id FROM orders WHERE user_id = %s
                UNION ALL
                SELECT user_id FROM comments WHERE user_id = %s
                -- ... more tables
            ) AS remaining
        """

        result = db_connection.execute(count_query, (self.user_id, self.user_id))
        remaining_count = result.fetchone()[0]

        verified = remaining_count == 0
        self.audit_log.append({
            "action": "deletion_verified",
            "remaining_records": remaining_count,
            "verified": verified,
            "timestamp": datetime.utcnow().isoformat()
        })

        return verified

# Usage (db is an open database connection)
request = DeletionRequest(user_id=123, request_timestamp=datetime.utcnow())
request.verify_identity("token_abc123xyz")
request.execute_deletion(db)
is_complete = request.verify_deletion_complete(db)
print(f"Deletion complete: {is_complete}")

When to Use / When Not to Use

Honor Deletion
  1. User requests deletion (and no exception applies)
  2. Account no longer active (user departed)
  3. Data no longer necessary for purpose
  4. User revokes consent
  5. GDPR applies to EU residents
  6. No active legal hold
Refuse / Defer Deletion
  1. Fraud investigation ongoing
  2. Contract requires retention (7yr)
  3. Tax law mandates retention (7yr)
  4. Court order / legal hold
  5. Regulatory compliance
  6. Public interest (news archive)

Patterns & Pitfalls

Hard delete immediately removes data and is the cleanest path to GDPR compliance. Soft delete only marks records as deleted while retaining them, which is risky because the data is still accessible. If you use soft delete as an interim step, be transparent that the data is logically deleted but physically remains until cleanup. Hard delete is cleanest but harder to undo: there is no recovery from mistakes. A sketch of the soft delete plus cleanup-job pattern follows.
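
A minimal sketch of that pattern, assuming a nullable deleted_at column on users and a short grace period before physical removal:

from datetime import datetime, timedelta

def soft_delete_user(db_connection, user_id: int) -> None:
    """Mark the user deleted; the data remains until the cleanup job runs."""
    db_connection.execute(
        "UPDATE users SET deleted_at = %s WHERE user_id = %s",
        (datetime.utcnow(), user_id)
    )
    db_connection.commit()

def hard_delete_expired(db_connection, grace_period_days: int = 7) -> None:
    """Cleanup job: physically remove rows soft-deleted longer than the grace period."""
    cutoff = datetime.utcnow() - timedelta(days=grace_period_days)
    db_connection.execute(
        "DELETE FROM users WHERE deleted_at IS NOT NULL AND deleted_at < %s",
        (cutoff,)
    )
    db_connection.commit()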
The user is deleted from the primary database, but copies remain in Redis caches, Elasticsearch indexes, and analytics pipelines (Kafka topics, warehouses). Deletion must be coordinated across all stores. One approach: publish a UserDeleted event that every system consumes to delete its own copy. Risk: a system misses the event and the user remains partly recoverable. See the sketch below.
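
A minimal sketch of publishing that UserDeleted event, assuming a Kafka-style producer and a hypothetical user-deletion-events topic:

import json
from datetime import datetime

def publish_user_deleted(producer, user_id: int) -> None:
    """Emit a UserDeleted event for every downstream store to consume and act on."""
    event = {
        "type": "UserDeleted",
        "user_id": user_id,
        "emitted_at": datetime.utcnow().isoformat(),
    }
    # Kafka-style producer assumed; keying by user_id keeps retries for the
    # same user on one partition.
    producer.send(
        "user-deletion-events",
        key=str(user_id).encode(),
        value=json.dumps(event).encode(),
    )
    producer.flush()

# Each consumer (cache, search, analytics) deletes its copy and writes an ack;
# a missing ack is what the reconciliation scan looks for.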
Verifying that a user no longer exists across 100 tables is slow. Optimize by maintaining a data lineage map of the systems that hold user data and checking only those, or run periodic scans for orphaned user_ids (e.g., monthly reconciliation), as in the sketch below.
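
A minimal sketch of such a reconciliation scan, assuming the child tables come from your data lineage map and that deleted users no longer have a row in users:

from typing import Dict, List

def find_orphaned_user_ids(db_connection, tables: List[str]) -> Dict[str, List[int]]:
    """Report user_ids still present in child tables after the users row is gone."""
    orphans = {}
    for table in tables:
        rows = db_connection.execute(
            f"""
            SELECT DISTINCT t.user_id
            FROM {table} AS t
            LEFT JOIN users AS u ON u.user_id = t.user_id
            WHERE u.user_id IS NULL
            """
        ).fetchall()
        if rows:
            orphans[table] = [r[0] for r in rows]
    return orphans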
Anonymize instead of deleting: remove sensitive identifiers but keep aggregates, replacing the name with 'Anonymous User', the exact age with a band like '30-40', and the location with a country like 'USA'. This balances privacy and analytics, but if the user can be re-identified (e.g., by linking to other data), the record is not truly anonymized and GDPR still applies.
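
A minimal sketch of that anonymize-in-place approach, with hypothetical column names (age_band, location); whether the result counts as anonymized rather than merely pseudonymized depends on re-identification risk:

def anonymize_user(db_connection, user_id: int) -> None:
    """Replace direct identifiers with coarse values; keep the row for aggregate analytics."""
    db_connection.execute(
        """
        UPDATE users
        SET name = 'Anonymous User',
            email = NULL,
            age_band = '30-40',      -- hypothetical coarse bucket column
            location = 'USA'         -- country only, no city or zip
        WHERE user_id = %s
        """,
        (user_id,)
    )
    db_connection.commit()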
Users expect human-readable exports (CSV, JSON), not raw database dumps. Include headers, friendly column names, and dates in ISO format. Avoid binary formats, custom schemas, and incomplete data.
Thirty days sounds long until you realize that cascading deletes take time, backups need to rotate out, and analytics must be rebuilt. Start immediately; don't wait. Have the deletion workflow documented and tested before users ever request it.

Design Review Checklist

  • Documented which data is personal (GDPR scope)
  • Identified all systems storing personal data (data lineage map)
  • Designed cascading delete logic: what deletes when?
  • For each table: how to delete user records efficiently?
  • Planned deletion in derived data: analytics, caches, search indexes
  • Documented backup retention vs deletion latency
  • Implemented deletion request verification (identity confirmation)
  • Built deletion workflow: request → verify → delete → confirm
  • Designed data export: JSON/CSV format, include all personal data
  • Audit trail: log all deletions/exports with timestamp and reason

Self-Check

  • What's the 30-day deadline for? (Hint: GDPR Article 17 window)
  • Why hard delete instead of soft delete? (Hint: true GDPR compliance)
  • What happens to backups after user deletion? (Hint: still contain data until backup expires)
  • How do you export data for portability? (Hint: JSON/CSV, machine-readable, all personal data)
  • What exceptions allow refusing deletion? (Hint: fraud investigation, legal hold, contract)

Next Steps

  • Map data lineage: find all systems with personal data
  • Implement deletion workflow: request → verify → cascade delete → confirm
  • Design data export: JSON/CSV with all personal fields
  • Test deletion: delete test user, verify from all systems
  • Document process: SOP for handling deletion/export requests

References