Skip to main content

Graph Databases

Optimize relationship traversals for social networks, recommendations, and knowledge graphs

TL;DR

Graph databases (Neo4j, ArangoDB) store nodes and relationships as first-class entities, optimizing traversal queries. Ideal for social networks, recommendation engines, and knowledge graphs. Trade-off: not suitable for analytics-only workloads, less horizontal scale than document stores, relationship counts affect query complexity.

Learning Objectives

  • Understand property graph model (nodes, relationships, properties)
  • Design graph schemas for access patterns
  • Recognize when graph queries outperform relational JOINs
  • Write efficient traversal queries in Cypher

Motivating Scenario

Social network: find all friends-of-friends who like jazz within 100 miles. Relational requires multiple JOINs. Graph database: single traversal. LinkedIn recommendation: "People You May Know" - find users with mutual connections. Graph: milliseconds. RDBMS: seconds or timeouts.

Core Concepts

Practical Example

-- Create nodes and relationships
CREATE (alice:Person {name: 'Alice', age: 30, email: 'alice@example.com'})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (carol:Person {name: 'Carol', age: 31})
CREATE (jazz:Genre {name: 'Jazz'})
CREATE (rock:Genre {name: 'Rock'})

-- Create relationships with properties
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
CREATE (alice)-[:KNOWS {since: 2021}]->(carol)
CREATE (bob)-[:KNOWS {since: 2019}]->(carol)
CREATE (alice)-[:LIKES]->(jazz)
CREATE (bob)-[:LIKES]->(jazz)
CREATE (carol)-[:LIKES]->(rock)

-- Find Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name, friend.age

-- Find friends-of-friends (two-hop)
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN DISTINCT fof.name

-- Find mutual friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(mutual)
<-[:KNOWS]-(bob:Person {name: 'Bob'})
RETURN mutual.name

-- Find users interested in same music
MATCH (alice:Person {name: 'Alice'})-[:LIKES]->(genre)
<-[:LIKES]-(similar:Person)
RETURN similar.name, genre.name

-- Collaborative filtering: books liked by similar users
MATCH (me:Person {name: 'Alice'})-[:LIKES]->(thing)
<-[:LIKES]-(similar:Person)
-[:LIKES]->(recommendation)
WHERE NOT (me)-[:LIKES]->(recommendation)
RETURN recommendation.name, COUNT(*) as score
ORDER BY score DESC

-- Shortest path
MATCH (alice:Person {name: 'Alice'}), (carol:Person {name: 'Carol'})
CALL apoc.algo.dijkstra(alice, carol, 'KNOWS', 'weight')
YIELD path, weight
RETURN path, weight

-- With aggregation
MATCH (person:Person)-[r:KNOWS]->()
RETURN person.name, COUNT(r) as friend_count
ORDER BY friend_count DESC
LIMIT 10

When to Use Graph Databases / When Not to Use

Use Graph Databases When
  1. Relationship traversals are common
  2. Multi-hop queries frequent (friends-of-friends)
  3. Recommendation engine needed
  4. Knowledge graphs or semantic search
  5. Social networks, collaboration
Use RDBMS When
  1. Simple queries without traversals
  2. Primarily analytical workloads
  3. Schema is mostly tables without relationships
  4. High-volume write throughput needed
  5. Complex multi-table transactions

Patterns and Pitfalls

Design Review Checklist

  • Node types and relationship types clearly defined
  • Relationship directions optimize common queries
  • Properties indexed appropriately
  • Recursive queries have depth limits
  • Query patterns don't create cartesian products
  • Recommendation caching strategy defined
  • Backup and recovery procedures documented
  • Graph size monitored for memory constraints
  • Query performance profiled and optimized
  • Monitoring for slow queries and timeouts

Graph Database Design Patterns

Social Network Graph Design

Nodes: Person, followed by depth of connections Relationships: FOLLOWS (directed), KNOWS (undirected), BLOCKS (directional)

-- Design: follows are directional (A follows B doesn't mean B follows A)
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})

-- Alice follows Bob (Alice sees Bob's tweets)
CREATE (alice)-[:FOLLOWS]->(bob)
-- Bob follows Alice (Bob sees Alice's tweets)
CREATE (bob)-[:FOLLOWS]->(alice)
-- Carol follows Bob
CREATE (carol)-[:FOLLOWS]->(bob)

-- Find followers (who follows me?)
MATCH (alice:Person {name: 'Alice'})<-[:FOLLOWS]-(follower)
RETURN follower.name

-- Find following (who do I follow?)
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(following)
RETURN following.name

-- Find mutuals (we follow each other)
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(mutual)
<-[:FOLLOWS]-(alice)
RETURN mutual.name

-- Recommend: users followed by my followers
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(follower)
-[:FOLLOWS]->(recommendation:Person)
WHERE NOT (alice)-[:FOLLOWS]->(recommendation)
AND recommendation != alice
RETURN DISTINCT recommendation.name, COUNT(*) as mutual_followers
ORDER BY mutual_followers DESC
LIMIT 10

Recommendation Graph

-- Nodes: User, Product, Category
-- Relationships: LIKES, IN_CATEGORY, SIMILAR_TO

-- Find products liked by similar users
MATCH (me:User {id: 'user123'})
-[:LIKES]->(product)<-[:LIKES]-(similar:User)
-[:LIKES]->(recommendation:Product)
WHERE NOT (me)-[:LIKES]->(recommendation)
AND recommendation != product
WITH recommendation, COUNT(similar) as score
ORDER BY score DESC
LIMIT 5
RETURN recommendation.name, score

Knowledge Graph

-- Nodes: Concept, Person, Place
-- Relationships: IS_A, RELATED_TO, LOCATED_IN, AUTHORED_BY

-- Find all related concepts to AI
MATCH (ai:Concept {name: 'AI'})
-[:RELATED_TO*1..3]-(related:Concept)
RETURN DISTINCT related.name

-- Find experts on a topic
MATCH (topic:Concept)<-[:AUTHORED_BY]-(expert:Person)
RETURN expert.name, COUNT(*) as papers
ORDER BY papers DESC

Performance Optimization

Preventing Cartesian Products

-- BAD: Creates cartesian product (all combinations)
MATCH (a:Person)-[:KNOWS]-(b:Person)
MATCH (c:Person)-[:KNOWS]-(d:Person)
RETURN *
-- If there are 1000 KNOWS relationships, returns 1,000,000 rows

-- GOOD: Filter early to reduce combinations
MATCH (a:Person {name: 'Alice'})-[:KNOWS]-(b:Person)
MATCH (b)-[:KNOWS]-(c:Person)
WHERE c.name != 'Alice'
RETURN DISTINCT c.name
-- Finds friends-of-friends; filters reduce result set

Query Planning with EXPLAIN

-- Analyze query execution plan
EXPLAIN MATCH (p:Person)-[:KNOWS]->(:Person)-[:LIKES]->(m:Movie)
RETURN p, m

-- Output shows:
-- - Which indexes are used
-- - Estimated rows at each step
-- - Search methods (node index, relationship scan, etc.)

Index Strategy

-- Create indexes for common query starts
CREATE INDEX ON :Person(name)
CREATE INDEX ON :User(email)

-- Compound indexes for multi-property filters
CREATE INDEX ON :Product(category, price)

-- Relationship type indexes
CREATE INDEX ON :KNOWS

-- Full-text search index
CALL db.index.fulltext.createNodeIndex("personSearch", ["Person"], ["name", "bio"])

Graph Database Limitations

Memory Constraints: Graph stays in memory. Large graphs (millions of nodes) may not fit in memory.

No Complex Transactions: ACID transactions limited to single node or small subgraph.

Not for Bulk Analytics: Analytics workloads (scanning all nodes) less efficient than data warehouses.

Sharding Complexity: Distributing graphs across servers is hard (relationships cross shards).

Self-Check

  • How would you design a follow graph for Twitter? Nodes: User. Relationships: FOLLOWS (directed). User A follows User B if A wants to see B's tweets.
  • What's the difference between friend and follower relationships? FRIEND: bidirectional (both must accept). FOLLOWS: directional (one-way).
  • When would you use a graph instead of relational JOINs? When relationship traversals are frequent, deep, and multi-hop (friends-of-friends). RDBMS becomes slow with many JOINs.
  • How do you prevent cartesian explosions? Filter early; use WHERE clauses to reduce result set size before subsequent MATCH clauses. Set depth limits on recursive queries.

Example: Finding "People You May Know"

  • Graph: Simple traversal (find friends' friends, exclude existing friends)
  • RDBMS: Complex JOINs across User, Friendship, User tables; slower and harder to understand.
info

Graph databases excel at relationship traversals and recommendation algorithms, but aren't suitable for analytics-only workloads. Use them when multi-hop queries are frequent and relationships are first-class citizens in your domain model.

Next Steps

  • Explore Social Network Patterns for follower/following design
  • Learn Recommendation Algorithms using graph traversals
  • Study Knowledge Graph design patterns
  • Dive into Query Optimization for Cypher

References

  • Neo4j Official Documentation
  • "Graph Databases" by Ian Robinson
  • Cypher Query Language Guide
  • APOC Library Documentation