GraphQL

Design flexible, client-driven APIs with precise data fetching

TL;DR

GraphQL is a query language for APIs that lets clients request exactly the data they need. Unlike REST, where the server fixes each resource's representation, GraphQL lets the client specify the shape and depth of the response. A single query can fetch a user and their recent orders in one round trip. The server defines a schema (types, fields, and their rules), and resolvers fetch the data for each field. The N+1 problem and batching complexity are real concerns, but patterns like DataLoader mitigate them. GraphQL excels when clients need flexibility; REST excels when resources are simple and stable. Choose based on how diverse your clients are and how quickly the API evolves.

Learning Objectives

Design GraphQL schemas for clarity and performance
Distinguish queries, mutations, and subscriptions
Understand resolver patterns and N+1 problems
Implement batching and caching strategies
Decide when GraphQL makes sense vs REST

Motivating Scenario

A mobile app needs a user profile: name, email, and recent orders with order totals. A REST approach requires at least three kinds of calls: /users/{id}, /users/{id}/orders, and /orders/{id}/products (once per order, to calculate totals). A web app needs the same data plus the user's address, which means another endpoint. A partner portal doesn't need orders at all. Each client has different data requirements.

GraphQL lets each client request exactly what it needs in one query. The mobile app fetches user, order, and product data in one round trip. The web app queries the same fields plus the address. The partner portal queries only name and email. The server doesn't bloat responses with unused data.

Core Concepts

Schema, Types, and Fields

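A GraphQL schema defines the shape of the data clients can query. Object types group named fields, scalars are primitive values (such as String, Int, and ID), and enums restrict a field to a fixed set of allowed values. A trailing ! marks a field as non-null.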

type User {
  id: ID!
  name: String!
  email: String!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  COMPLETED
  CANCELLED
}

type Query {
  user(id: ID!): User
  users(limit: Int = 20, offset: Int = 0): [User!]!
}

Resolvers

Resolvers are functions that fetch data for each field. The user field resolver queries the database. The orders field resolver fetches orders for that user.
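As a rough illustration of the resolver idea above, the sketch below resolves one query by calling a plain Python function per field. The in-memory USERS and ORDERS data and the function names are assumptions made for this example; they are not part of any GraphQL library.

```python
# Hypothetical in-memory data standing in for a database.
USERS = {"1": {"id": "1", "name": "Alice", "email": "alice@example.com"}}
ORDERS = [
    {"id": "o1", "user_id": "1", "total": 20.0},
    {"id": "o2", "user_id": "1", "total": 35.5},
]


def resolve_user(_parent, args):
    """Resolver for Query.user: look the user up by ID."""
    return USERS.get(args["id"])


def resolve_user_orders(user, _args):
    """Resolver for User.orders: receives the parent user object."""
    return [o for o in ORDERS if o["user_id"] == user["id"]]


def execute_user_query(user_id):
    """Mimic an executor handling: user(id) { name orders { id total } }."""
    user = resolve_user(None, {"id": user_id})
    if user is None:
        return {"user": None}
    orders = resolve_user_orders(user, {})
    return {
        "user": {
            "name": user["name"],
            "orders": [{"id": o["id"], "total": o["total"]} for o in orders],
        }
    }


result = execute_user_query("1")
```

Each resolver only knows how to produce its own field from its parent value, which is what lets the executor assemble any shape the client asks for.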

The N+1 Problem

Naive resolvers cause N+1 queries: fetch user (1 query), then for each order, fetch product details (N queries). With 100 orders, that's 101 queries.
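To make the arithmetic concrete, here is a small sketch that counts queries for a naive fetch. CountingDB is a hypothetical stand-in for a real database client, introduced only for this example.

```python
class CountingDB:
    """Hypothetical database stand-in that counts every query it serves."""

    def __init__(self, orders, products):
        self.orders = orders
        self.products = products
        self.query_count = 0

    def orders_for_user(self, user_id):
        self.query_count += 1
        return [o for o in self.orders if o["user_id"] == user_id]

    def product_by_id(self, product_id):
        self.query_count += 1
        return self.products[product_id]


def naive_fetch(db, user_id):
    """One query for the order list, then one query per order."""
    orders = db.orders_for_user(user_id)
    return [
        {"order": o["id"], "product": db.product_by_id(o["product_id"])["name"]}
        for o in orders
    ]


products = {i: {"id": i, "name": f"product-{i}"} for i in range(100)}
orders = [{"id": i, "user_id": "u1", "product_id": i} for i in range(100)]
db = CountingDB(orders, products)
rows = naive_fetch(db, "u1")
print(db.query_count)  # 101: one for the orders plus one per order
```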

Batching and DataLoader

Batching collects multiple field requests and fetches data efficiently. DataLoader caches requests within a query execution, preventing duplicate database hits.
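The mechanism can be sketched in a few lines of asyncio. This is not the DataLoader library itself, only a minimal illustration of the same idea; the BatchLoader class and the fetch_products function are hypothetical names for this example.

```python
import asyncio


class BatchLoader:
    """Minimal sketch of DataLoader-style batching with a per-request cache."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn    # async: list of keys -> list of values, same order
        self.cache = {}             # key -> Future, so a repeated key is fetched once
        self.pending = []           # keys waiting for the next batch
        self.dispatch_task = None

    def load(self, key):
        if key in self.cache:
            return self.cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.cache[key] = future
        self.pending.append(key)
        if self.dispatch_task is None:
            # Runs on a later loop turn, after other resolvers have queued their keys.
            self.dispatch_task = loop.create_task(self._dispatch())
        return future

    async def _dispatch(self):
        keys, self.pending, self.dispatch_task = self.pending, [], None
        try:
            values = await self.batch_fn(keys)
        except Exception as exc:
            for key in keys:
                self.cache[key].set_exception(exc)
            return
        for key, value in zip(keys, values):
            self.cache[key].set_result(value)


calls = []


async def fetch_products(ids):
    calls.append(list(ids))  # record each batch so the effect is visible
    return [{"id": i, "name": f"product-{i}"} for i in ids]


async def main():
    loader = BatchLoader(fetch_products)
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))


results = asyncio.run(main())
```

Three load calls result in a single batch containing keys 1 and 2; the repeated key is served from the cache. As with DataLoader, a loader like this should live for one request only so cached values are not shared between users.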

Practical Example

The example has three parts: schema design, a client query, and the N+1 problem with its solution.

# ✅ Clear, well-structured schema
scalar DateTime  # custom scalar; GraphQL has no built-in DateTime

type User {
  id: ID!
  name: String!
  email: String!
  createdAt: DateTime!
  orders(limit: Int = 10, offset: Int = 0): [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

type OrderItem {
  product: Product!
  quantity: Int!
  price: Float!
}

type Product {
  id: ID!
  name: String!
  sku: String!
}

enum OrderStatus {
  PENDING
  SHIPPED
  DELIVERED
}

type Query {
  user(id: ID!): User
  users(limit: Int = 20): [User!]!
}

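type Mutation {
  createOrder(userId: ID!, items: [OrderItemInput!]!): Order
}

# Input type referenced by createOrder (fields are illustrative)
input OrderItemInput {
  productId: ID!
  quantity: Int!
}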

# Client requests exactly what it needs
query GetUserOrders {
  user(id: "123") {
    name
    email
    orders(limit: 5) {
      id
      total
      status
      items {
        product {
          name
        }
        quantity
        price
      }
    }
  }
}

# Response: Only requested fields
{
  "data": {
    "user": {
      "name": "Alice",
      "email": "alice@example.com",
      "orders": [
        {
          "id": "order-1",
          "total": 99.98,
          "status": "DELIVERED",
          "items": [
            {
              "product": { "name": "Widget" },
              "quantity": 2,
              "price": 49.99
            }
          ]
        }
      ]
    }
  }
}

// ❌ NAIVE RESOLVER - N+1 queries
const resolvers = {
  User: {
    orders: async (user) => {
      // Fetches orders for this user
      return db.query('SELECT * FROM orders WHERE user_id = ?', user.id);
    }
  },
  Order: {
    items: async (order) => {
      // Each order fetches its own items: N additional queries for N orders
      return db.query('SELECT * FROM order_items WHERE order_id = ?', order.id);
    }
  },
  OrderItem: {
    // product is a field of OrderItem, so its resolver belongs here
    product: async (item) => {
      // Each item fetches its own product: one more query per item
      return db.query('SELECT * FROM products WHERE id = ?', item.product_id);
    }
  }
};

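// ✅ BATCHED RESOLVER - Efficient
const DataLoader = require('dataloader');

// Create loaders per request (for example in the context factory) so the
// cache is never shared between users.
const productLoader = new DataLoader(async (productIds) => {
  // Fetch all products in one query instead of one per item
  const products = await db.query(
    'SELECT * FROM products WHERE id IN (?)',
    [productIds]
  );
  // Return results in the same order as the input keys
  const byId = new Map(products.map(p => [p.id, p]));
  return productIds.map(id => byId.get(id) ?? new Error(`Product ${id} not found`));
});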

const resolvers = {
  User: {
    orders: async (user) => {
      return db.query('SELECT * FROM orders WHERE user_id = ?', user.id);
    }
  },
  Order: {
    items: async (order) => {
      return db.query('SELECT * FROM order_items WHERE order_id = ?', order.id);
    }
  },
  OrderItem: {
    product: async (item) => {
      // DataLoader batches these requests
      return productLoader.load(item.product_id);
    }
  }
};

REST vs GraphQL

Use REST When

Simple, stable resources
Caching infrastructure important
Clients have similar data needs
HTTP semantics matter (PUT, DELETE)
Team familiar with HTTP conventions

Use GraphQL When

Diverse clients with different needs
Frequent API evolution
Over-fetching is costly (mobile)
Complex data relationships
Single endpoint preferred

Performance Considerations

Depth Limiting

Prevent deeply nested queries:

# This query digs 10 levels deep - wasteful and dangerous
query {
  user {
    orders {
      items {
        product {
          supplier {
            contacts {
              company {
                employees {
                  manager {
                    department {
                      budget
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

# Solution: set max depth to 4-5
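A depth check can be sketched as a short recursive function. Real servers run this kind of validation on the parsed query document; to keep the example self-contained, the selection set here is represented as nested dicts, and the names query_depth and check_depth are hypothetical.

```python
def query_depth(selection):
    """Return the nesting depth of a selection set given as nested dicts.

    A leaf field maps to an empty dict; an empty selection has depth 0.
    """
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def check_depth(selection, max_depth=5):
    """Raise ValueError if the selection nests deeper than max_depth."""
    depth = query_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


shallow = {"user": {"name": {}, "orders": {"id": {}, "total": {}}}}
deep = {
    "user": {"orders": {"items": {"product": {"supplier": {"contacts": {"name": {}}}}}}}
}

shallow_depth = check_depth(shallow)  # accepted: depth 3
try:
    check_depth(deep)
    deep_rejected = False
except ValueError:
    deep_rejected = True  # depth 7 exceeds the limit of 5
```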

Query Complexity Scoring

Assign costs to fields:

type User {
  id: String! # Cost: 1
  name: String! # Cost: 1
  orders(limit: Int): [Order!]! # Cost: limit (can return 1000 items)
}

type Order {
  id: String! # Cost: 1
  items: [OrderItem!]! # Cost: 1 per item × count
}

# Query complexity, assuming 10 orders with 5 items each:
#   user (1) + orders (limit: 10 → 10) + items (10 orders × 5 → 50) = 61
# Set a threshold (for example, reject queries costing more than 100) to prevent DoS
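The scoring above can be reproduced with a small recursive estimator. This is a sketch under simple assumptions: each field is a dict with an expected item count (1 for single objects) and optional children, and the threshold of 100 comes from the comment above. The field layout and function names are hypothetical.

```python
MAX_COST = 100


def estimate_cost(field):
    """Cost = expected item count, plus each item's child costs."""
    count = field.get("count", 1)
    children = field.get("children", [])
    return count + count * sum(estimate_cost(child) for child in children)


def is_allowed(field, max_cost=MAX_COST):
    return estimate_cost(field) <= max_cost


def user_query(order_limit, items_per_order=5):
    """Build the cost tree for: user { orders(limit) { items } }."""
    return {
        "name": "user",
        "count": 1,
        "children": [
            {
                "name": "orders",
                "count": order_limit,
                "children": [{"name": "items", "count": items_per_order}],
            }
        ],
    }


small = user_query(order_limit=10)
large = user_query(order_limit=1000)
print(estimate_cost(small))  # 61, matching the worked example
```

With a limit of 1000 orders the same query costs 6001, so it is rejected before any resolver runs.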

Caching Strategies

Problem: GraphQL requests are usually POSTs to a single endpoint, so standard HTTP caches, which key on URL and method, cannot tell one query from another.

Solutions:

Field-level caching: Cache resolver results

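from functools import lru_cache

# `db` is the application's database handle (defined elsewhere).
# lru_cache never expires entries, so this suits data that rarely changes;
# call get_product.cache_clear() after product updates.
@lru_cache(maxsize=1000)
def get_product(product_id):
    return db.query("SELECT * FROM products WHERE id = ?", product_id)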

Persisted queries: Pre-define queries, send query ID instead of full query

# Client sends: { "id": "GetUserOrders", "variables": { "userId": "123" } }
# Server executes pre-defined query

HTTP caching with persisted queries: Use GET requests, set Cache-Control headers
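The persisted-query and GET-caching ideas in the list above could look roughly like this. The PERSISTED_QUERIES registry, the function names, and the /graphql endpoint path are assumptions for illustration; real servers and clients each have their own persisted-query conventions.

```python
import json
from urllib.parse import urlencode

# Hypothetical registry of queries the server agreed to run, keyed by ID.
PERSISTED_QUERIES = {
    "GetUserOrders": (
        "query GetUserOrders($userId: ID!) "
        "{ user(id: $userId) { name orders { id total } } }"
    ),
}


def resolve_persisted(request):
    """Map a persisted-query request to (query text, variables)."""
    query_id = request.get("id")
    if query_id not in PERSISTED_QUERIES:
        raise KeyError(f"unknown persisted query: {query_id!r}")
    return PERSISTED_QUERIES[query_id], request.get("variables", {})


def cacheable_url(request, endpoint="/graphql"):
    """Build a GET URL that an HTTP cache can use as its key."""
    variables = json.dumps(
        request.get("variables", {}), sort_keys=True, separators=(",", ":")
    )
    return f"{endpoint}?{urlencode({'id': request['id'], 'variables': variables})}"


request = {"id": "GetUserOrders", "variables": {"userId": "123"}}
query_text, variables = resolve_persisted(request)
url = cacheable_url(request)
```

Sorting the variable keys keeps the URL stable, so two clients sending the same variables in a different order still hit the same cache entry.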

Subscriptions: Real-time Updates

Real-time data via WebSocket:

subscription OnOrderCreated {
  orderCreated {
    id
    status
    total
  }
}

Implementation complexity:

WebSocket connection management
Broadcasting to subscribed clients
Unsubscription cleanup
Backpressure handling (what if updates come faster than client can process?)
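One possible answer to the backpressure question is a bounded queue per subscriber that drops the oldest update when a client falls behind. The sketch below leaves out the WebSocket transport entirely; the OrderEvents class and its drop-oldest policy are assumptions for this example, and other policies (disconnecting slow clients, coalescing updates) are equally valid.

```python
import asyncio


class OrderEvents:
    """In-process pub/sub with a bounded queue per subscriber."""

    def __init__(self, max_pending=2):
        self.max_pending = max_pending
        self.subscribers = set()

    def subscribe(self):
        queue = asyncio.Queue(maxsize=self.max_pending)
        self.subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        # Cleanup on disconnect so publish() stops feeding a dead client.
        self.subscribers.discard(queue)

    def publish(self, event):
        for queue in self.subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest update for this slow client
            queue.put_nowait(event)


async def demo():
    events = OrderEvents(max_pending=2)
    slow_client = events.subscribe()
    for order_id in ("o1", "o2", "o3"):
        events.publish({"orderCreated": {"id": order_id}})
    received = [
        slow_client.get_nowait()["orderCreated"]["id"]
        for _ in range(slow_client.qsize())
    ]
    events.unsubscribe(slow_client)
    return received, len(events.subscribers)


received, remaining = asyncio.run(demo())
```

The slow client ends up with the two most recent orders; the oldest update was dropped rather than letting the queue grow without bound.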

Monitoring and Observability

Track GraphQL-specific metrics:

Query execution time (median, p99)
N+1 query occurrences
Cache hit rate
Resolver performance
Query complexity distribution
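Resolver timings for the metrics above can be collected with a decorator. This is a sketch that keeps samples in memory; a production setup would export them to a metrics system instead. The timed decorator, RESOLVER_TIMINGS store, and nearest-rank percentile helper are hypothetical names for this example.

```python
import math
import time
from collections import defaultdict
from functools import wraps
from statistics import median

# field name -> list of durations in seconds (in-memory only, for illustration)
RESOLVER_TIMINGS = defaultdict(list)


def timed(field_name):
    """Decorator that records how long each resolver call takes."""

    def decorator(resolver):
        @wraps(resolver)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return resolver(*args, **kwargs)
            finally:
                RESOLVER_TIMINGS[field_name].append(time.perf_counter() - start)

        return wrapper

    return decorator


def percentile(samples, pct):
    """Nearest-rank percentile; pct is between 0 and 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@timed("User.orders")
def resolve_orders(user):
    return [{"id": "o1", "user_id": user["id"]}]


orders = resolve_orders({"id": "1"})
samples = RESOLVER_TIMINGS["User.orders"]
summary = {"median": median(samples), "p99": percentile(samples, 99)}
```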

Design Review Checklist

Common Pitfalls

Pitfall: Poorly Designed Mutations

Mutations should clearly express side effects:

# Bad: vague return type
type Mutation {
  updateUser(id: ID!, data: JSON): String  # returns what? error message? success message?
}

# Good: explicit return type with errors
type Mutation {
  updateUser(id: ID!, input: UpdateUserInput!): UpdateUserPayload!
}

type UpdateUserPayload {
  success: Boolean!
  user: User  # null if failed
  errors: [UserError!]!
}

type UserError {
  field: String!
  message: String!
}
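On the server side, a resolver for this mutation might build the payload as follows. This is a sketch: the in-memory USERS store and the specific checks are assumptions, and only the payload shape (success, user, errors) comes from the schema above.

```python
# Hypothetical in-memory store standing in for a database.
USERS = {"1": {"id": "1", "name": "Alice", "email": "alice@example.com"}}


def update_user(user_id, input_data):
    """Return a dict shaped like UpdateUserPayload."""
    errors = []
    user = USERS.get(user_id)
    if user is None:
        errors.append({"field": "id", "message": "User not found"})
    name = input_data.get("name")
    if name is not None and not name.strip():
        errors.append({"field": "name", "message": "Name must not be blank"})
    if errors:
        # Expected failures travel in the payload, not as thrown exceptions.
        return {"success": False, "user": None, "errors": errors}
    user.update({k: v for k, v in input_data.items() if v is not None})
    return {"success": True, "user": user, "errors": []}


ok = update_user("1", {"name": "Alice B."})
missing = update_user("999", {"name": "Bob"})
blank = update_user("1", {"name": "   "})
```

Because the payload always has the same shape, clients can handle success and failure with one code path and show field-level messages directly.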

Pitfall: Unvalidated Input

# Bad: no validation
input CreateUserInput {
  email: String
  age: Int
}

# Good: required fields plus constraints (@validate is a custom directive, not built into GraphQL)
input CreateUserInput {
  email: String! @validate(format: "email")
  age: Int! @validate(min: 13, max: 150)
  name: String! @validate(minLength: 2, maxLength: 100)
}
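Whether or not a directive enforces them, the same rules can be checked in the resolver. The sketch below mirrors the constraints from the schema above; the function name and the deliberately simple email pattern are assumptions for illustration, not a complete email validator.

```python
import re

# Intentionally loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_create_user(data):
    """Return a list of {field, message} errors; empty means valid."""
    errors = []
    email = data.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append({"field": "email", "message": "Must be a valid email address"})
    age = data.get("age")
    if isinstance(age, bool) or not isinstance(age, int) or not 13 <= age <= 150:
        errors.append({"field": "age", "message": "Must be between 13 and 150"})
    name = data.get("name")
    if not isinstance(name, str) or not 2 <= len(name) <= 100:
        errors.append({"field": "name", "message": "Must be 2 to 100 characters"})
    return errors


valid = validate_create_user({"email": "alice@example.com", "age": 30, "name": "Alice"})
invalid = validate_create_user({"email": "not-an-email", "age": 9, "name": "A"})
```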

Pitfall: Breaking Schema Changes

A GraphQL schema is a contract. Removing or renaming a field, or changing its type, breaks every client that queries it.

# BAD: removing a field breaks existing queries
type User {
  id: ID!
  name: String!
  # removed: email: String!
}

# GOOD: deprecate, then remove later
type User {
  id: ID!
  name: String!
  email: String! @deprecated(reason: "Use contactEmail instead")  # type unchanged while deprecated
  contactEmail: String
}

# After clients migrate: remove email

Self-Check

What problem does DataLoader solve in GraphQL resolvers?
- Answer: The N+1 problem. It collects the keys requested while a query executes and fetches them in one batched query instead of one query per item, and it caches results for the duration of the request.
Why is query complexity scoring important?
- Answer: It prevents denial-of-service (DoS) attacks. Without it, a client could request a million nested items in one query and overwhelm the server.
When might REST be preferable to GraphQL?
- Answer: Simple, stable resources (CRUD), caching critical (HTTP caching works well), team unfamiliar with GraphQL, client needs are homogeneous.
How do you handle errors in GraphQL (no HTTP status codes)?
- Answer: Return errors in response with error codes and messages. Client checks errors array and data values.
What's the difference between query and mutation?
- Answer: A query only reads data and should have no side effects; a mutation modifies state. Top-level mutation fields also execute serially, while query fields may execute in parallel. Use mutations for writes.

One Takeaway

GraphQL empowers clients with precise data fetching, but resolvers require careful design to avoid N+1 performance cliffs. Use DataLoader, complexity scoring, and depth limiting to build performant APIs.

Next Steps

Read API Security for GraphQL-specific auth patterns
Study Error Formats for GraphQL error responses
Explore Versioning Strategies for GraphQL schema evolution

References

GraphQL Official Specification (graphql.org)
GraphQL Best Practices (How to GraphQL)
DataLoader Pattern (facebook/dataloader)
GraphQL Performance (Apollo Docs)
