
ORMs: Active Record vs Data Mapper

Abstraction patterns for database persistence with different trade-offs

TL;DR

Active Record: Objects know how to persist themselves (User.save(), Post.find(id)). Simpler, more intuitive, but tightly couples entities to database. Better for CRUD-heavy applications with straightforward domains.

Data Mapper: Separate objects from persistence logic (UserRepository.save(user)). More testable, cleaner architecture, enables complex domains. Higher upfront complexity. Better for larger systems with intricate business logic.

Choose based on domain complexity: simple CRUD → Active Record; complex domains with lots of business logic → Data Mapper.

Learning Objectives

  • Understand Active Record and Data Mapper patterns deeply
  • Recognize trade-offs in coupling vs simplicity vs testability
  • Design persistence layers that match your domain complexity
  • Know when each pattern excels and when to avoid it
  • Optimize query patterns and avoid common pitfalls

Core Concepts

Active Record vs Data Mapper architecture

Pattern Comparison

Active Record
  1. Entity objects contain persistence logic
  2. Example: User.save(), Post.find(id), user.destroy()
  3. Tight coupling to database schema
  4. Simpler for CRUD operations
  5. Less testable (must mock database)
  6. Frameworks: Rails (ActiveRecord), Laravel (Eloquent), Django ORM
  7. Best for: Small to medium CRUD apps, rapid prototyping
Data Mapper
  1. Entities are plain objects (POJOs/DTOs)
  2. Example: UserRepository.save(user), userRepository.findById(id)
  3. Loose coupling to persistence
  4. Better for complex domain logic
  5. Highly testable (mock repository easily)
  6. Frameworks: SQLAlchemy, Doctrine, Hibernate/JPA, TypeORM (Data Mapper mode)
  7. Best for: Large systems, complex domains, testing-first development

Practical Examples

# Active Record: the entity knows how to persist itself

class User < ApplicationRecord
  # Associations
  has_many :posts
  has_many :comments

  # Validations (business rules mixed with persistence)
  validates :email, presence: true, uniqueness: true,
                    format: { with: URI::MailTo::EMAIL_REGEXP }
  validates :name, presence: true, length: { minimum: 2, maximum: 100 }

  # Scopes (queries)
  scope :active, -> { where(active: true) }
  scope :recent, -> { order(created_at: :desc) }

  # Domain methods (business logic)
  def send_welcome_email
    WelcomeMailer.welcome_email(self).deliver_now
  end

  def full_name
    "#{first_name} #{last_name}"
  end
end

# Usage - very intuitive
user = User.new(email: "alice@example.com", name: "Alice")
user.save # INSERT INTO users ...

# Find queries
user = User.find(1)
user = User.find_by(email: "alice@example.com")
active_users = User.active.recent.limit(10)

# Update
user.update(email: "alice.smith@example.com")

# Delete
user.destroy

# Associations are lazy-loaded by default (N+1 risk!)
users = User.all
users.each do |u|
  puts u.posts.count # Separate COUNT query per user!
end

# Fix: eager-load the association
users = User.includes(:posts) # Loads all posts in one extra query

Pitfalls of Active Record:

  • N+1 query problem (forgetting to eager-load associations)
  • Business logic mixed with persistence logic
  • Difficult to test (requires database or mocking entire AR)
  • Scope pollution (many scopes make model hard to understand)
  • Cannot reuse object without persistence concern
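For contrast, here is a minimal Data Mapper sketch in plain Python. The entity carries no persistence logic; a repository owns all of it. The `User` and `InMemoryUserRepository` names are illustrative, not from any specific framework, and the in-memory store stands in for a real database-backed mapper:

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Plain entity: no persistence logic, no ORM base class
@dataclass
class User:
    email: str
    name: str
    id: Optional[int] = None

# The repository owns all persistence concerns
class InMemoryUserRepository:
    def __init__(self) -> None:
        self._users: Dict[int, User] = {}
        self._next_id = 1

    def save(self, user: User) -> User:
        if user.id is None:
            user.id = self._next_id
            self._next_id += 1
        self._users[user.id] = user
        return user

    def find_by_id(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)

    def find_by_email(self, email: str) -> Optional[User]:
        return next((u for u in self._users.values() if u.email == email), None)

# Business logic depends only on the repository interface, so tests
# can swap in this in-memory version - no database or mocking needed.
repo = InMemoryUserRepository()
alice = repo.save(User(email="alice@example.com", name="Alice"))
assert repo.find_by_id(alice.id) is alice
```

This is exactly the testability advantage the comparison above claims: production code receives a database-backed repository, tests receive the in-memory one, and the entity never notices.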

Common ORM Problems and Solutions

Design Patterns: Query Optimization

# Requires: from sqlalchemy.orm import joinedload, contains_eager

# WRONG: N+1 queries
users = session.query(User).all()  # 1 query
for user in users:
    print(user.posts)  # N additional queries (1 per user)

# RIGHT: Eager loading with a join (1 query total)
users = session.query(User).options(joinedload(User.posts)).all()

# RIGHT: Explicit join with contains_eager
users = (session.query(User)
         .join(User.posts)
         .options(contains_eager(User.posts))
         .distinct()
         .all())

Decision Checklist

  • Domain complexity: simple CRUD or complex business logic?
  • Team experience: familiar with pattern choice?
  • Testing strategy: need isolated unit tests or integration tests?
  • Query patterns identified and optimized?
  • N+1 query problem acknowledged and addressed?
  • Eager loading strategy planned (joins vs separate queries)?
  • Transaction boundaries clear?
  • Bulk operations use batch queries?
  • Testing approach supports chosen pattern?
  • Lazy loading gotchas documented?
  • Query performance monitored (logging slow queries)?
  • Migration strategy for schema changes planned?
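The "bulk operations" checklist item is worth a concrete sketch. Using only Python's stdlib sqlite3 (the `users` table and its columns are made up for the example), the key idea is one batched statement instead of one round-trip per entity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")

rows = [(f"user{i}@example.com", 1) for i in range(500)]

# WRONG (slow): one round-trip per row
# for email, active in rows:
#     conn.execute("INSERT INTO users (email, active) VALUES (?, ?)", (email, active))

# RIGHT: one batched statement for hundreds of rows
conn.executemany("INSERT INTO users (email, active) VALUES (?, ?)", rows)

# Bulk update/delete: express the change as a single SQL statement
# instead of loading each entity, mutating it, and saving it back
conn.execute("UPDATE users SET active = 0 WHERE email LIKE 'user4%'")
conn.execute("DELETE FROM users WHERE active = 0")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # -> 389 (111 of the 500 emails start with 'user4')
```

Most ORMs expose the same idea directly, e.g. SQLAlchemy's `Query.update()` / `Query.delete()`, which emit a single UPDATE/DELETE without loading entities into the session.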

Self-Check

  • What's the N+1 query problem and how do you solve it?
  • When would you choose Active Record over Data Mapper?
  • How do lazy loading and eager loading differ, and what are the tradeoffs?
  • What testing advantages does Data Mapper provide?
  • How do you efficiently delete/update hundreds of entities?
  • Why might mixing business logic into entities be problematic?

Active Record is simpler but couples entities to the database; Data Mapper separates concerns at the cost of extra code. Choose based on your domain complexity and testing needs; larger systems usually benefit from Data Mapper's separation.

One Takeaway

Choose your ORM pattern based on domain complexity and testing requirements. Simple CRUD apps: Active Record (faster to build). Large systems with complex business logic: Data Mapper (easier to test and maintain). Either way, master query optimization (eager loading, bulk operations) to avoid performance pitfalls.

Next Steps

  • Explore Query Patterns for optimal data retrieval strategies
  • Learn Indexing Strategies for query performance
  • Study Caching Patterns for ORM optimization (query result caching)
  • Dive into Performance Optimization for ORMs (profiling, monitoring)
  • Master Transaction Management and ACID properties

References

Real-World ORM Problems and Solutions

The N+1 Query Problem in Detail

Scenario: Display list of blog posts with author names.

# Requires: from sqlalchemy.orm import contains_eager, selectinload

# WRONG: N+1 queries (1 for posts + 1 per post for the author)
posts = session.query(Post).all()  # Query 1: SELECT * FROM posts
for post in posts:
    print(f"{post.title} by {post.author.name}")  # Queries 2..N+1: SELECT * FROM users WHERE id = ?

# 100 posts = 101 SQL queries! 😱

# RIGHT: Eager loading with a join (1 query total)
posts = (session.query(Post)
         .join(Post.author)
         .options(contains_eager(Post.author))
         .all())
# Query 1: SELECT posts.*, users.* FROM posts JOIN users ON ...
# 100 posts = 1 SQL query ✓

# RIGHT: Select-in eager loading (2 queries total)
posts = session.query(Post).options(selectinload(Post.author)).all()
# Query 1: SELECT * FROM posts
# Query 2: SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5)
# Smarter: loads all authors in one IN() query, not one query per post

Lazy Loading vs Eager Loading

Strategy        Query Count  Latency               Suitable For
Lazy            N+1          High but distributed  Single objects (not lists)
Join Eager      1            Low, predictable      Small relations
Subquery Eager  2            Low, predictable      Most cases (recommended)
Explicit        Variable     Depends on query      Complex queries

Memory Leaks in ORMs

Common cause: Sessions not closed, objects accumulate in memory.

# WRONG: Memory leak (the session's identity map keeps every object)
session = Session()
for i in range(1_000_000):
    user = User(email=f"user{i}@example.com")
    session.add(user)
session.commit()
# The session now holds 1M User objects in memory (GBs!)

# RIGHT: Commit in batches and clear the session periodically
session = Session()
for i in range(1_000_000):
    user = User(email=f"user{i}@example.com")
    session.add(user)

    if i % 10_000 == 0:
        session.commit()       # Flush every 10K rows
        session.expunge_all()  # Drop flushed objects from the session

session.commit()

Transaction Deadlocks

Problem: Two transactions lock each other.

# Transaction 1          # Transaction 2
update_account(A)        # update_account(B)
update_account(B)        # update_account(A)  <- DEADLOCK!

# RIGHT: Always acquire locks in the same order
def transfer_money(account_a_id, account_b_id, amount):
    # Lock in ascending id order so every transaction agrees on the order
    first_id, second_id = sorted((account_a_id, account_b_id))

    with db.transaction():
        locked = {}
        for acc_id in (first_id, second_id):
            locked[acc_id] = (session.query(Account)
                              .filter(Account.id == acc_id)
                              .with_for_update()
                              .one())

        # Apply the transfer in the intended direction, regardless of lock order
        locked[account_a_id].balance -= amount
        locked[account_b_id].balance += amount
        # No deadlock: locks are always taken in ascending id order
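Lock ordering prevents this particular deadlock, but under load the database may still abort a transaction (a deadlock victim or serialization failure), and the standard companion pattern is to retry it. A pure-Python sketch, with a stand-in `DeadlockDetected` exception since the real exception class depends on your driver:

```python
import random
import time

class DeadlockDetected(Exception):
    """Stand-in for the driver-specific deadlock / serialization error."""

def run_with_retry(txn, max_attempts=3, base_delay=0.05):
    """Run a transactional callable, retrying on deadlock with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            # Jittered exponential backoff so retried transactions don't collide again
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

# Simulated transaction that deadlocks once, then succeeds
calls = {"n": 0}
def flaky_transfer():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DeadlockDetected()
    return "committed"

print(run_with_retry(flaky_transfer))  # -> committed
```

The retry must wrap the whole transaction (begin through commit), never an individual statement, so the aborted work is replayed from a clean state.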

Query Performance Tuning

Index strategies:

# SLOW: Full table scan
users = session.query(User).filter(User.email == 'alice@example.com').all()  # O(n)

# FAST: Index on email
# In schema: CREATE INDEX idx_users_email ON users(email)
users = session.query(User).filter(User.email == 'alice@example.com').all()  # O(log n)

# SLOW: Multiple conditions, no index
posts = session.query(Post).filter(
    Post.author_id == 1,
    Post.created_at > '2025-01-01',
    Post.status == 'PUBLISHED',
).all()

# FAST: Composite index
# CREATE INDEX idx_posts_author_date_status ON posts(author_id, created_at DESC, status)
posts = session.query(Post).filter(
    Post.author_id == 1,
    Post.created_at > '2025-01-01',
    Post.status == 'PUBLISHED',
).all()  # Uses the index efficiently
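Indexes only help the queries you know about; the checklist's "logging slow queries" item is how you find the rest. If your ORM doesn't log them for you, a timing wrapper around query execution is a simple starting point. A stdlib-only sketch with sqlite3 (the 50 ms threshold and the `timed_execute` helper are arbitrary choices for illustration):

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("slow_queries")

SLOW_QUERY_SECONDS = 0.05  # arbitrary threshold; tune for your workload

def timed_execute(conn, sql, params=()):
    """Execute a query and warn when it exceeds the slow-query threshold."""
    start = time.perf_counter()
    cursor = conn.execute(sql, params)
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_QUERY_SECONDS:
        log.warning("slow query (%.3fs): %s", elapsed, sql)
    return cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
timed_execute(conn, "INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
rows = timed_execute(conn, "SELECT * FROM users").fetchall()
print(len(rows))  # -> 1
```

Real ORMs usually offer hooks for the same idea (for example, SQLAlchemy's engine event system), so you can log timings centrally instead of wrapping every call site.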