Copy-Paste Programming

Duplicating code instead of extracting shared functionality, violating the DRY principle.

TL;DR

Copy-paste programming duplicates code across your codebase, violating the DRY (Don't Repeat Yourself) principle. When you fix a bug in one copy, you forget about the others. When requirements change, you must update every copy consistently. This leads to subtle bugs, increased maintenance burden, and frustration. Extract shared code into reusable functions immediately.

Learning Objectives

You will be able to:

  • Identify duplicated code patterns in your codebase
  • Understand why duplication is expensive to maintain
  • Extract duplicated logic into reusable functions
  • Measure and track duplication in your codebase
  • Implement refactoring strategies to eliminate duplication
  • Design generic functions that work across multiple scenarios

Motivating Scenario

Your team is building an e-commerce platform. In the first sprint, you write user registration validation:

// In UserController.js
if (!email.includes('@')) throw new Error('Invalid email');
if (password.length < 8) throw new Error('Password too short');
if (name.length < 2) throw new Error('Name too short');

Three months later, you add a profile update feature. Your colleague copies this validation:

// In ProfileController.js
if (!email.includes('@')) throw new Error('Invalid email');
if (password.length < 8) throw new Error('Password too short');
if (name.length < 2) throw new Error('Name too short');

A bug is discovered: email validation is too lenient (it accepts addresses with no dot in the domain, such as user@localhost). You fix it in UserController:

if (!email.includes('@') || !email.includes('.')) throw new Error('Invalid email');

But ProfileController still uses the old validation. Now some users can update their profile with invalid emails. The bug takes three days to discover and causes angry customer support tickets.

This scenario repeats throughout the codebase. Duplicated validation, duplicated formatting logic, duplicated error handling. Each duplication is a ticking time bomb.

Core Explanation

Why Copy-Paste Happens

  • It's fast: Copying is faster than designing generic functions
  • It works: The code works for your immediate use case
  • It seems harmless: "I'll refactor this later"
  • Risk aversion: "What if the new context requires different behavior?"

The Hidden Cost

Copy-paste has a deferred cost that compounds:

| Change | Cost with Duplication | Cost without Duplication |
|---|---|---|
| Fix bug in 1 place | Search 3-5 times for duplication, fix each copy, risk forgetting one | Fix once, done |
| Add feature flag | Update 4 copies, test 4 times | Update once, test once |
| Change error message | Manual search + replace in multiple files | One line change |
| Onboarding new dev | "Watch out for email validation in these 6 places" | "Use validateUser() function" |

Six months on, a single "quick copy" can easily have cost tens of hours of maintenance and caused several bugs.

The Psychological Impact

Developers become afraid to change duplicated code. "If I change this, I might break the other place." This fear causes stagnation and technical debt accumulation.

Pattern Visualization

[Figure: Code Duplication Cascade (diagram not reproduced)]

Code Examples

user_service.py
def register_user(email, password, name):
    """Register a new user."""
    # Duplicated validation
    if '@' not in email or '.' not in email:
        raise ValueError('Invalid email')
    if len(password) < 8:
        raise ValueError('Password must be 8+ characters')
    if not name or len(name) < 2:
        raise ValueError('Name must be at least 2 characters')
    # ... continue with registration
    return create_user(email, password, name)


def update_profile(user_id, email, password, name):
    """Update user profile."""
    # SAME VALIDATION COPIED HERE!
    if '@' not in email or '.' not in email:
        raise ValueError('Invalid email')
    if len(password) < 8:
        raise ValueError('Password must be 8+ characters')
    if not name or len(name) < 2:
        raise ValueError('Name must be at least 2 characters')
    # ... continue with update
    return update_user(user_id, email, password, name)


def reset_password(email, new_password):
    """Reset user password."""
    # Third copy of the password rule; easy to miss when the minimum changes
    if len(new_password) < 8:
        raise ValueError('Password must be 8+ characters')
    # ... continue with reset
    return update_user_password(email, new_password)


def invite_user(inviter_id, invitee_email):
    """Invite user by email."""
    # Duplicated email validation (slightly different!)
    if '@' not in invitee_email:  # Missing '.' check!
        raise ValueError('Invalid email')
    # ... continue with invitation
    return send_invitation(inviter_id, invitee_email)
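
For contrast, here is a minimal sketch of how the four functions above could share one set of validators. The module name validators.py, the constant names, and validate_user() are assumptions made for illustration; the rules themselves are copied from the duplicated checks.

```python
# validators.py (hypothetical module name)
MIN_PASSWORD_LENGTH = 8
MIN_NAME_LENGTH = 2


def validate_email(email):
    """Single source of truth for the email rule used in the examples above."""
    if '@' not in email or '.' not in email:
        raise ValueError('Invalid email')


def validate_password(password):
    """Password length rule; change the constant once to change it everywhere."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f'Password must be {MIN_PASSWORD_LENGTH}+ characters')


def validate_name(name):
    """Name rule: non-empty and at least MIN_NAME_LENGTH characters."""
    if not name or len(name) < MIN_NAME_LENGTH:
        raise ValueError(f'Name must be at least {MIN_NAME_LENGTH} characters')


def validate_user(email, password, name):
    """Run every check that registration and profile updates both need."""
    validate_email(email)
    validate_password(password)
    validate_name(name)
```

With something like this in place, register_user and update_profile would each call validate_user(email, password, name), reset_password would call validate_password(new_password), and invite_user would call validate_email(invitee_email), which also closes the missing '.' check.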

Patterns and Pitfalls

Why Duplication Spreads

1. The "Quick Copy" Excuse: "I'll just copy this validation for now, refactor later." Later never comes. The code ships, users depend on it, and refactoring becomes harder.

2. Subtle Variations: Duplication isn't always exact. One copy validates email.length > 0, another uses email != null. These subtle differences cause bugs when requirements change.

3. Copy-Paste Inheritance: One team copies code from another team. Both teams now maintain separate copies, which diverge over time.

4. Fear of Generalization: "What if these two use cases diverge later?" It is better to extract with parameters than to duplicate. If the use cases do diverge, refactor then.
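
As a rough illustration of "extract with parameters", the sketch below keeps one length rule and passes the differences in as arguments. The helper require_min_length() and the stricter admin case are hypothetical, invented only to show a use case that diverges without being copied.

```python
def require_min_length(value, minimum, field):
    """Shared length rule; the per-call differences are parameters."""
    if value is None or len(value) < minimum:
        raise ValueError(f'{field} must be at least {minimum} characters')


def validate_password(password, minimum=8):
    """Default password rule from the examples above (8+ characters)."""
    require_min_length(password, minimum, 'Password')


def validate_admin_password(password):
    """Hypothetical diverging use case: stricter limit, same underlying rule."""
    validate_password(password, minimum=12)
```

If the two cases later need genuinely different logic, validate_admin_password() can be rewritten on its own without touching the callers of validate_password().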

Duplication Metrics

Duplication Ratio = (Duplicated Lines) / (Total Lines)
- < 5%: Acceptable
- 5-10%: Concerning, plan refactoring
- > 10%: Crisis, major refactoring needed

Tools: SonarQube, PMD's Copy/Paste Detector (CPD), and jscpd can measure duplication automatically.
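
To make the ratio concrete, here is a simplified estimator. It is only a sketch: it treats a line as duplicated when it belongs to a block of `window` consecutive non-blank lines that appears more than once, whereas real tools compare tokens rather than stripped lines. The function name and the `window` parameter are assumptions.

```python
from collections import defaultdict


def duplication_ratio(files, window=3):
    """Estimate (duplicated lines) / (total lines) for a {name: source} mapping.

    A line counts as duplicated if it is part of a `window`-line block whose
    whitespace-stripped text appears more than once anywhere in `files`.
    """
    lines = []  # (file name, stripped line); blank lines are skipped
    for name, source in files.items():
        for raw in source.splitlines():
            text = raw.strip()
            if text:
                lines.append((name, text))

    # Map each window of consecutive lines (within one file) to its start indexes.
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = lines[i:i + window]
        if len({name for name, _ in block}) == 1:
            seen[tuple(text for _, text in block)].append(i)

    # Every line covered by a repeated window is counted once.
    duplicated = set()
    for starts in seen.values():
        if len(starts) > 1:
            for start in starts:
                duplicated.update(range(start, start + window))

    return len(duplicated) / len(lines) if lines else 0.0
```

For two four-line files that share their first three lines, this returns 0.75: six of the eight lines sit inside a repeated block.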

When This Happens / How to Detect

Red Flags:

  1. Same validation logic in 3+ places
  2. Bug fixed in one place, reappears in another copy
  3. Copy-paste commits: "Copy user validation from X to Y"
  4. Variable/function names identical in multiple files
  5. Changes to one copy don't propagate to others
  6. IDE shows "Similar code" warnings
  7. Tests for the same logic in multiple test files

Automated Detection:

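# SonarQube: run an analysis with the SonarScanner CLI (configured via sonar-project.properties);
# duplicated blocks are then reported under the project's Duplications measure
sonar-scanner

# PMD's Copy/Paste Detector (PMD 7 CLI syntax; older releases use a different launcher)
pmd cpd --minimum-tokens 100 --dir src/ --language ecmascript

# Simple grep for exact duplicates (-F matches the text literally)
grep -rnF "email.includes('@')" src/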

How to Fix / Refactor

Step 1: Identify Duplication

# Find the same validation in multiple places
grep -rni --include='*.js' "validate.*email" src/

Step 2: Understand the Variations

Compare the copies. If they are identical, extraction is straightforward. If they vary, extract the common parts and turn the differences into parameters.
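
One way to see the variations is to diff the copies directly. The sketch below uses Python's standard difflib module on two of the email checks from user_service.py; show_variations() is a hypothetical helper name.

```python
import difflib

registration_copy = """\
if '@' not in email or '.' not in email:
    raise ValueError('Invalid email')
"""

invitation_copy = """\
if '@' not in invitee_email:
    raise ValueError('Invalid email')
"""


def show_variations(a, b, a_name='copy_a', b_name='copy_b'):
    """Return a unified diff so the real differences between copies stand out."""
    return ''.join(difflib.unified_diff(
        a.splitlines(keepends=True),
        b.splitlines(keepends=True),
        fromfile=a_name,
        tofile=b_name,
    ))


print(show_variations(registration_copy, invitation_copy,
                      'register_user', 'invite_user'))
```

Lines that differ show up with - and + prefixes, while the unchanged raise line is the common part to extract. Here the diff exposes both a rename (email vs. invitee_email, which becomes a parameter) and a real behavioral difference (the missing '.' check, which needs a decision).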

Step 3: Create Shared Function/Utility

Extract to a shared module that all code uses:

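// validators/user-validator.js
export const validateEmail = (email) => {
  // Same rule as the fixed UserController check
  if (!email.includes('@') || !email.includes('.')) throw new Error('Invalid email');
};

// All controllers import and use it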

Step 4: Gradually Migrate

Update one consumer at a time to use the shared function. Test each migration.
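
Before switching a consumer over, it can help to check whether the shared function behaves differently from the inline copy it replaces. The following is a sketch under assumptions: legacy_invite_check() stands in for the inline check in invite_user, and the sample addresses are made up.

```python
def legacy_invite_check(email):
    """The inline check copied into invite_user (no '.' test)."""
    if '@' not in email:
        raise ValueError('Invalid email')


def validate_email(email):
    """The shared rule the consumers are migrating to."""
    if '@' not in email or '.' not in email:
        raise ValueError('Invalid email')


def accepts(check, value):
    """True if `check` lets `value` through without raising ValueError."""
    try:
        check(value)
    except ValueError:
        return False
    return True


def behaviour_changes(old, new, samples):
    """Return the sample inputs on which the old and new checks disagree."""
    return [s for s in samples if accepts(old, s) != accepts(new, s)]


samples = ['ann@example.com', 'ann@example.co.uk', 'ann@localhost', 'not-an-email']
print(behaviour_changes(legacy_invite_check, validate_email, samples))
# prints ['ann@localhost']
```

An empty list means the migration is a pure refactoring for those inputs. A non-empty list, as here, means the migration also changes behavior, so it deserves its own test and a note in the commit.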

Step 5: Remove Old Copies

Once all consumers are migrated, delete the old duplicated code.

Operational Considerations

Refactoring Under Pressure:

Copy-paste is tempting when deadlines loom. Resist it: the time saved today is usually repaid several times over in maintenance.

Code Review:

Make duplication detection a code review practice. "This validation already exists in X module, use that instead."

Automated Enforcement:

Use static analysis tools (SonarQube, ESLint plugins) to flag potential duplication, and block merges when the duplication ratio exceeds the threshold your team has agreed on.
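
A merge gate can be as small as the sketch below. It assumes the duplicated and total line counts have already been extracted from whatever tool you run; the function name and message wording are hypothetical, and the 5% and 10% thresholds are the ones from the Duplication Metrics section.

```python
ACCEPTABLE = 0.05  # below 5%: acceptable
CRISIS = 0.10      # above 10%: major refactoring needed


def duplication_gate(duplicated_lines, total_lines, max_ratio=CRISIS):
    """Return (exit_code, message); a non-zero exit code should fail the build."""
    ratio = duplicated_lines / total_lines if total_lines else 0.0
    if ratio > max_ratio:
        return 1, f'FAIL: duplication {ratio:.1%} exceeds {max_ratio:.0%}'
    if ratio >= ACCEPTABLE:
        return 0, f'WARN: duplication {ratio:.1%}, plan refactoring'
    return 0, f'OK: duplication {ratio:.1%}'
```

A CI step would print the message and pass the exit code to sys.exit(), so that a ratio above the limit blocks the merge while the 5-10% band only warns.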

Design Review Checklist

  • Is the same validation logic in only one place?
  • Do similar functions use extracted helpers, not copy-paste?
  • Are error messages consistent across similar validations?
  • Can you change a requirement and update once, not in 5 places?
  • Are utility functions in a shared location, not scattered?
  • Did code review catch and prevent duplication?
  • Are duplicate detection tools (SonarQube) part of CI/CD?
  • Can you trace a shared behavior to a single source?
  • Are there no commented-out or 'legacy' copies of functions?
  • Would a new developer know to use validateEmail() instead of writing it again?
  • Are tests for shared logic in one test file, not duplicated?

Showcase

Signals of Copy-Paste Programming

  • Email validation in 5 different files
  • Bug fixed in UserController, appears in ProfileController
  • Similar function names: validateEmail(), validateUserEmail(), checkEmail()
  • Slightly different error messages for same validation
  • Copy-paste commits in git log
  • Duplication ratio > 10%

Signals of Well-Factored Code

  • Email validation in validators/email-validator.js only
  • Bug fix applies everywhere automatically
  • Single validateEmail() function used everywhere
  • Consistent error messages from one source
  • Commits create new shared utilities
  • Duplication ratio < 5%

Self-Check

  1. Can you find the email validation in 10 seconds? If it's in 3+ places, you have duplication.

  2. If you change the password requirement from 8 to 12 characters, how many files need updating? Should be one.

  3. Do your tests duplicate test cases for the same logic? If yes, the logic should be shared.

Next Steps

  • Identify: Run duplication analysis on your codebase
  • Extract: Choose one duplicated pattern and create a shared utility
  • Migrate: Update all consumers to use the shared utility
  • Verify: Ensure tests pass, behavior unchanged
  • Prevent: Add SonarQube/linting to catch future duplication

One Takeaway

Every duplicated line is a potential future bug. Extract shared logic early, even if it seems simple: the maintenance cost of duplication grows with every copy and every change.
