Skip to main content

Query Patterns: Pagination, Filtering, Sorting

Efficient patterns for large result sets with user controls

TL;DR

Offset-limit pagination is simple but slow on large offsets. Cursor-based pagination uses ordered unique columns for constant-time fetches. Filtering requires strategic indexes. Sorting is expensive; push to application if possible. Design queries to use indexes on WHERE and ORDER BY clauses.

Learning Objectives

  • Understand pagination trade-offs (offset-limit vs cursor)
  • Design efficient filtering with indexes
  • Optimize sorting in queries
  • Handle common query patterns at scale

Pagination Strategies

Offset-Limit
  1. Simple implementation
  2. User-facing pagination UI
  3. Small datasets (<10K rows)
  4. Unknown total count OK
Cursor-Based
  1. Large datasets (>1M rows)
  2. Consistent ordering critical
  3. Real-time feeds (social media)
  4. Performance at scale

Practical Examples

-- Simple pagination: Page 2, 20 per page
SELECT id, title, author FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

-- Problem: As offset grows, MySQL scans all rows up to offset
-- Offset 1,000,000 scans 1M rows, then returns 20

-- Optimization: Use indexed column for WHERE
SELECT id, title, author FROM posts
WHERE status = 'published' AND created_at < '2025-02-10'
ORDER BY created_at DESC
LIMIT 20;

Patterns and Pitfalls

Common Query Pattern Mistakes

Mistake 1: No Index on WHERE Clause

-- WRONG: Full table scan even with where clause
SELECT * FROM orders WHERE customer_id = 123; -- No index
-- Scans all 10M rows, returns 500

-- RIGHT: Index on WHERE column
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Index lookup: instant

-- Impact: 10 seconds → 10ms (1000x improvement)

Mistake 2: Sorting Without Index

-- WRONG: In-memory sort, expensive
SELECT * FROM posts WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20; -- Without index, MySQL sorts 100K results in memory

-- RIGHT: Index supports both WHERE and ORDER
CREATE INDEX idx_posts_status_created ON posts(status, created_at DESC);

-- Impact: 500ms → 20ms

Mistake 3: SELECT without LIMIT

# WRONG: Can return millions of rows
result = db.query("SELECT * FROM users")

# RIGHT: Always paginate
result = db.query("SELECT * FROM users LIMIT 20 OFFSET 0")

# Impact: Crashes client, exhausts server memory

Mistake 4: Sorting by Computed Column

-- WRONG: Computes for every row, no index
SELECT * FROM products
ORDER BY price * (1 - discount) DESC; -- Expensive calculation

-- RIGHT: Precompute and index
CREATE INDEX idx_final_price ON products(final_price DESC);

-- Impact: 2 seconds → 50ms

Self-Check

  • Why is offset pagination slow for large offsets?
  • How does cursor-based pagination maintain consistency?
  • What index would you create for filtering+sorting?
  • When would you cache counts instead of computing?
  • Can you identify which queries in your application lack proper indexes?
  • What's the largest offset your application currently uses?
  • Are you monitoring query performance (explain plans)?
  • Do you know which queries are causing slowness?
info

Use offset-limit for small datasets and simple UIs; switch to cursor-based pagination for large result sets. Always ensure WHERE and ORDER BY clauses use indexes.

Next Steps

  • Explore Indexing Strategies for query optimization
  • Learn Caching Patterns for computed values
  • Study Search Engines for complex filtering
  • Dive into Query Performance monitoring

Advanced Query Optimization Patterns

Range Queries with Indexes

-- Create composite index for range queries
CREATE INDEX idx_posts_created_at_status ON posts(created_at DESC, status);

-- Efficient range query using index
SELECT id, title, created_at
FROM posts
WHERE created_at BETWEEN '2025-01-01' AND '2025-02-14'
AND status = 'published'
ORDER BY created_at DESC
LIMIT 20;

-- Verify index is used
EXPLAIN ANALYZE
SELECT * FROM posts
WHERE created_at > '2025-01-01'
AND status = 'published'
ORDER BY created_at DESC
LIMIT 20;

-- Without index: sequential scan (slow)
-- With index: index range scan (fast)

Full-Text Search Pagination

-- Full-text search with pagination
CREATE INDEX idx_posts_fts ON posts USING GIN(to_tsvector('english', content));

SELECT id, title, ts_rank(to_tsvector('english', content), plainto_tsquery('postgresql')) as rank
FROM posts
WHERE to_tsvector('english', content) @@ plainto_tsquery('postgresql')
ORDER BY rank DESC
LIMIT 20 OFFSET 0;

-- Cursor-based version (more efficient)
SELECT id, title, ts_rank(to_tsvector('english', content), plainto_tsquery('postgresql')) as rank
FROM posts
WHERE to_tsvector('english', content) @@ plainto_tsquery('postgresql')
AND (rank, id) < (previous_rank, previous_id) -- Cursor
ORDER BY rank DESC, id DESC
LIMIT 20;

Complex Filtering with Multiple Conditions

class AdvancedQueryBuilder:
"""Build complex queries with multiple filters."""

def build_product_query(self, filters: dict) -> str:
"""Build query with flexible filtering."""
query = "SELECT * FROM products WHERE 1=1"
params = []

# Dynamic WHERE clauses
if 'category' in filters:
query += " AND category = %s"
params.append(filters['category'])

if 'price_min' in filters and 'price_max' in filters:
query += " AND price BETWEEN %s AND %s"
params.extend([filters['price_min'], filters['price_max']])

if 'in_stock' in filters and filters['in_stock']:
query += " AND quantity > 0"

if 'rating_min' in filters:
query += " AND avg_rating >= %s"
params.append(filters['rating_min'])

if 'search' in filters:
# Full-text search
query += " AND to_tsvector(title || ' ' || description) @@ plainto_tsquery(%s)"
params.append(filters['search'])

# Sorting
sort_field = filters.get('sort', 'created_at')
sort_dir = filters.get('sort_dir', 'DESC')
query += f" ORDER BY {sort_field} {sort_dir}"

# Pagination
limit = min(int(filters.get('limit', 20)), 100) # Max 100
offset = int(filters.get('offset', 0))
query += f" LIMIT {limit} OFFSET {offset}"

return query, params

# Usage
builder = AdvancedQueryBuilder()
query, params = builder.build_product_query({
'category': 'electronics',
'price_min': 50,
'price_max': 500,
'in_stock': True,
'rating_min': 4.0,
'search': 'laptop',
'sort': 'price',
'sort_dir': 'ASC',
'limit': 20,
'offset': 0
})

Lazy Loading and N+1 Prevention

class LazyLoadingOptimization:
"""Avoid N+1 query problem."""

# WRONG: N+1 queries
def get_users_with_posts_bad(self):
users = self.db.query("SELECT * FROM users")
for user in users:
user.posts = self.db.query(
"SELECT * FROM posts WHERE user_id = %s", [user.id]
) # N queries!
return users

# RIGHT: Single query with JOIN
def get_users_with_posts_good(self):
return self.db.query("""
SELECT u.*, p.*
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
ORDER BY u.id, p.created_at DESC
""")

# RIGHT with cursor-based pagination
def get_users_paginated_with_posts(self, cursor=None, limit=20):
query = """
SELECT u.id, u.name, p.id as post_id, p.title
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
"""

if cursor:
query += f" WHERE u.id > {cursor}"

query += " ORDER BY u.id, p.created_at DESC"
query += f" LIMIT {limit + 1}"

rows = self.db.query(query)

# Group by user
result = {}
for row in rows:
if row['id'] not in result:
result[row['id']] = {
'id': row['id'],
'name': row['name'],
'posts': []
}
if row['post_id']:
result[row['id']]['posts'].append({
'id': row['post_id'],
'title': row['title']
})

return list(result.values())[:limit]

Performance Benchmarking

import time
import statistics

class QueryBenchmark:
"""Benchmark query performance."""

def benchmark_pagination_strategy(self, strategy_name: str, query_func):
"""Measure latency and throughput."""
times = []

for _ in range(10):
start = time.time()
result = query_func()
elapsed = time.time() - start
times.append(elapsed)

return {
'strategy': strategy_name,
'min_ms': min(times) * 1000,
'max_ms': max(times) * 1000,
'avg_ms': statistics.mean(times) * 1000,
'p99_ms': sorted(times)[9] * 1000
}

def compare_pagination_strategies(self):
"""Compare offset vs cursor pagination."""
offset_result = self.benchmark_pagination_strategy(
'offset-limit',
lambda: self._query_offset_pagination()
)

cursor_result = self.benchmark_pagination_strategy(
'cursor-based',
lambda: self._query_cursor_pagination()
)

print(f"Offset: {offset_result['avg_ms']:.2f}ms (p99: {offset_result['p99_ms']:.2f}ms)")
print(f"Cursor: {cursor_result['avg_ms']:.2f}ms (p99: {cursor_result['p99_ms']:.2f}ms)")

if cursor_result['avg_ms'] < offset_result['avg_ms']:
print("✓ Cursor pagination is faster")

References

  • PostgreSQL Documentation on Indexes
  • MySQL Optimization Guide
  • "Use the Index, Luke" by Markus Winand
  • "High Performance MySQL" by Schwartz, Zaitsev, Tkachenko
  • "Designing Data-Intensive Applications" by Martin Kleppmann