Graph Databases

Optimize relationship traversals for social networks, recommendations, and knowledge graphs

TL;DR

Graph databases (Neo4j, ArangoDB) store nodes and relationships as first-class entities, optimizing traversal queries. Ideal for social networks, recommendation engines, and knowledge graphs. Trade-off: not suitable for analytics-only workloads, less horizontal scale than document stores, relationship counts affect query complexity.

Learning Objectives

Understand property graph model (nodes, relationships, properties)
Design graph schemas for access patterns
Recognize when graph queries outperform relational JOINs
Write efficient traversal queries in Cypher

Motivating Scenario

Social network: find all friends-of-friends who like jazz within 100 miles. Relational requires multiple JOINs. Graph database: single traversal. LinkedIn recommendation: "People You May Know" - find users with mutual connections. Graph: milliseconds. RDBMS: seconds or timeouts.

Core Concepts

Practical Example

Neo4j (Cypher)
Python + Neo4j Driver
Node.js + Neo4j

-- Create nodes and relationships
CREATE (alice:Person {name: 'Alice', age: 30, email: 'alice@example.com'})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (carol:Person {name: 'Carol', age: 31})
CREATE (jazz:Genre {name: 'Jazz'})
CREATE (rock:Genre {name: 'Rock'})

-- Create relationships with properties
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
CREATE (alice)-[:KNOWS {since: 2021}]->(carol)
CREATE (bob)-[:KNOWS {since: 2019}]->(carol)
CREATE (alice)-[:LIKES]->(jazz)
CREATE (bob)-[:LIKES]->(jazz)
CREATE (carol)-[:LIKES]->(rock)

-- Find Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name, friend.age

-- Find friends-of-friends (two-hop)
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN DISTINCT fof.name

-- Find mutual friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(mutual)
<-[:KNOWS]-(bob:Person {name: 'Bob'})
RETURN mutual.name

-- Find users interested in same music
MATCH (alice:Person {name: 'Alice'})-[:LIKES]->(genre)
<-[:LIKES]-(similar:Person)
RETURN similar.name, genre.name

-- Collaborative filtering: books liked by similar users
MATCH (me:Person {name: 'Alice'})-[:LIKES]->(thing)
<-[:LIKES]-(similar:Person)
-[:LIKES]->(recommendation)
WHERE NOT (me)-[:LIKES]->(recommendation)
RETURN recommendation.name, COUNT(*) as score
ORDER BY score DESC

-- Shortest path
MATCH (alice:Person {name: 'Alice'}), (carol:Person {name: 'Carol'})
CALL apoc.algo.dijkstra(alice, carol, 'KNOWS', 'weight')
YIELD path, weight
RETURN path, weight

-- With aggregation
MATCH (person:Person)-[r:KNOWS]->()
RETURN person.name, COUNT(r) as friend_count
ORDER BY friend_count DESC
LIMIT 10

from neo4j import GraphDatabase

class SocialGraphDB:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
    
    def close(self):
        self.driver.close()
    
    def create_person(self, name, age, email):
        with self.driver.session() as session:
            session.run(
                "CREATE (p:Person {name: $name, age: $age, email: $email})",
                name=name, age=age, email=email
            )
    
    def add_friendship(self, person1_name, person2_name):
        with self.driver.session() as session:
            session.run(
                """
                MATCH (p1:Person {name: $p1}), (p2:Person {name: $p2})
                CREATE (p1)-[:KNOWS]->(p2)
                """,
                p1=person1_name, p2=person2_name
            )
    
    def get_friends_of_friends(self, person_name):
        with self.driver.session() as session:
            result = session.run(
                """
                MATCH (p:Person {name: $name})-[:KNOWS]->()-[:KNOWS]->(fof)
                RETURN DISTINCT fof.name as name, fof.age as age
                """,
                name=person_name
            )
            return [dict(record) for record in result]
    
    def recommend_connections(self, person_name, limit=10):
        """People you may know: mutual friends"""
        with self.driver.session() as session:
            result = session.run(
                """
                MATCH (p:Person {name: $name})-[:KNOWS]->(mutual)
                <-[:KNOWS]-(recommendation:Person)
                WHERE NOT (p)-[:KNOWS]->(recommendation)
                RETURN recommendation.name as name, COUNT(mutual) as mutual_friends
                ORDER BY mutual_friends DESC
                LIMIT $limit
                """,
                name=person_name, limit=limit
            )
            return [dict(record) for record in result]

# Usage
db = SocialGraphDB("bolt://localhost:7687", "neo4j", "password")

# Create people
db.create_person("Alice", 30, "alice@example.com")
db.create_person("Bob", 28, "bob@example.com")
db.create_person("Carol", 31, "carol@example.com")

# Create connections
db.add_friendship("Alice", "Bob")
db.add_friendship("Alice", "Carol")
db.add_friendship("Bob", "Carol")

# Get recommendations
recommendations = db.recommend_connections("Alice", limit=5)
for person in recommendations:
    print(f"{person['name']} ({person['mutual_friends']} mutual friends)")

db.close()

const neo4j = require('neo4j-driver');

class SocialGraph {
  constructor(uri, user, password) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(user, password));
  }
  
  async close() {
    await this.driver.close();
  }
  
  async createPerson(name, age, email) {
    const session = this.driver.session();
    try {
      await session.run(
        'CREATE (p:Person {name: $name, age: $age, email: $email})',
        { name, age, email }
      );
    } finally {
      await session.close();
    }
  }
  
  async addFriendship(person1, person2) {
    const session = this.driver.session();
    try {
      await session.run(
        `MATCH (p1:Person {name: $p1}), (p2:Person {name: $p2})
         CREATE (p1)-[:KNOWS]->(p2)`,
        { p1: person1, p2: person2 }
      );
    } finally {
      await session.close();
    }
  }
  
  async getFriendOfFriends(personName) {
    const session = this.driver.session();
    try {
      const result = await session.run(
        `MATCH (p:Person {name: $name})-[:KNOWS]->()-[:KNOWS]->(fof)
         RETURN DISTINCT fof.name as name, fof.age as age`,
        { name: personName }
      );
      
      return result.records.map(record => ({
        name: record.get('name'),
        age: record.get('age')
      }));
    } finally {
      await session.close();
    }
  }
  
  async recommendConnections(personName, limit = 10) {
    const session = this.driver.session();
    try {
      const result = await session.run(
        `MATCH (p:Person {name: $name})-[:KNOWS]->(mutual)
           <-[:KNOWS]-(recommendation:Person)
         WHERE NOT (p)-[:KNOWS]->(recommendation)
         RETURN recommendation.name as name, COUNT(mutual) as mutual_friends
         ORDER BY mutual_friends DESC
         LIMIT $limit`,
        { name: personName, limit }
      );
      
      return result.records.map(record => ({
        name: record.get('name'),
        mutualFriends: record.get('mutual_friends').toNumber()
      }));
    } finally {
      await session.close();
    }
  }
}

// Usage
(async () => {
  const graph = new SocialGraph('bolt://localhost:7687', 'neo4j', 'password');
  
  await graph.createPerson('Alice', 30, 'alice@example.com');
  await graph.createPerson('Bob', 28, 'bob@example.com');
  
  await graph.addFriendship('Alice', 'Bob');
  
  const recommendations = await graph.recommendConnections('Alice', 5);
  recommendations.forEach(r => {
    console.log(`${r.name} (${r.mutualFriends} mutual friends)`);
  });
  
  await graph.close();
})();

When to Use Graph Databases / When Not to Use

Use Graph Databases When

Relationship traversals are common
Multi-hop queries frequent (friends-of-friends)
Recommendation engine needed
Knowledge graphs or semantic search
Social networks, collaboration

Use RDBMS When

Simple queries without traversals
Primarily analytical workloads
Schema is mostly tables without relationships
High-volume write throughput needed
Complex multi-table transactions

Patterns and Pitfalls

Design Review Checklist

Graph Database Design Patterns

Nodes: Person, followed by depth of connections Relationships: FOLLOWS (directed), KNOWS (undirected), BLOCKS (directional)

-- Design: follows are directional (A follows B doesn't mean B follows A)
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})

-- Alice follows Bob (Alice sees Bob's tweets)
CREATE (alice)-[:FOLLOWS]->(bob)
-- Bob follows Alice (Bob sees Alice's tweets)
CREATE (bob)-[:FOLLOWS]->(alice)
-- Carol follows Bob
CREATE (carol)-[:FOLLOWS]->(bob)

-- Find followers (who follows me?)
MATCH (alice:Person {name: 'Alice'})<-[:FOLLOWS]-(follower)
RETURN follower.name

-- Find following (who do I follow?)
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(following)
RETURN following.name

-- Find mutuals (we follow each other)
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(mutual)
<-[:FOLLOWS]-(alice)
RETURN mutual.name

-- Recommend: users followed by my followers
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(follower)
-[:FOLLOWS]->(recommendation:Person)
WHERE NOT (alice)-[:FOLLOWS]->(recommendation)
  AND recommendation != alice
RETURN DISTINCT recommendation.name, COUNT(*) as mutual_followers
ORDER BY mutual_followers DESC
LIMIT 10

Recommendation Graph

-- Nodes: User, Product, Category
-- Relationships: LIKES, IN_CATEGORY, SIMILAR_TO

-- Find products liked by similar users
MATCH (me:User {id: 'user123'})
-[:LIKES]->(product)<-[:LIKES]-(similar:User)
-[:LIKES]->(recommendation:Product)
WHERE NOT (me)-[:LIKES]->(recommendation)
  AND recommendation != product
WITH recommendation, COUNT(similar) as score
ORDER BY score DESC
LIMIT 5
RETURN recommendation.name, score

Knowledge Graph

-- Nodes: Concept, Person, Place
-- Relationships: IS_A, RELATED_TO, LOCATED_IN, AUTHORED_BY

-- Find all related concepts to AI
MATCH (ai:Concept {name: 'AI'})
-[:RELATED_TO*1..3]-(related:Concept)
RETURN DISTINCT related.name

-- Find experts on a topic
MATCH (topic:Concept)<-[:AUTHORED_BY]-(expert:Person)
RETURN expert.name, COUNT(*) as papers
ORDER BY papers DESC

Performance Optimization

Preventing Cartesian Products

-- BAD: Creates cartesian product (all combinations)
MATCH (a:Person)-[:KNOWS]-(b:Person)
MATCH (c:Person)-[:KNOWS]-(d:Person)
RETURN *
-- If there are 1000 KNOWS relationships, returns 1,000,000 rows

-- GOOD: Filter early to reduce combinations
MATCH (a:Person {name: 'Alice'})-[:KNOWS]-(b:Person)
MATCH (b)-[:KNOWS]-(c:Person)
WHERE c.name != 'Alice'
RETURN DISTINCT c.name
-- Finds friends-of-friends; filters reduce result set

Query Planning with EXPLAIN

-- Analyze query execution plan
EXPLAIN MATCH (p:Person)-[:KNOWS]->(:Person)-[:LIKES]->(m:Movie)
RETURN p, m

-- Output shows:
-- - Which indexes are used
-- - Estimated rows at each step
-- - Search methods (node index, relationship scan, etc.)

Index Strategy

-- Create indexes for common query starts
CREATE INDEX ON :Person(name)
CREATE INDEX ON :User(email)

-- Compound indexes for multi-property filters
CREATE INDEX ON :Product(category, price)

-- Relationship type indexes
CREATE INDEX ON :KNOWS

-- Full-text search index
CALL db.index.fulltext.createNodeIndex("personSearch", ["Person"], ["name", "bio"])

Graph Database Limitations

Memory Constraints: Graph stays in memory. Large graphs (millions of nodes) may not fit in memory.

No Complex Transactions: ACID transactions limited to single node or small subgraph.

Not for Bulk Analytics: Analytics workloads (scanning all nodes) less efficient than data warehouses.

Sharding Complexity: Distributing graphs across servers is hard (relationships cross shards).

Self-Check

How would you design a follow graph for Twitter? Nodes: User. Relationships: FOLLOWS (directed). User A follows User B if A wants to see B's tweets.
What's the difference between friend and follower relationships? FRIEND: bidirectional (both must accept). FOLLOWS: directional (one-way).
When would you use a graph instead of relational JOINs? When relationship traversals are frequent, deep, and multi-hop (friends-of-friends). RDBMS becomes slow with many JOINs.
How do you prevent cartesian explosions? Filter early; use WHERE clauses to reduce result set size before subsequent MATCH clauses. Set depth limits on recursive queries.

Example: Finding "People You May Know"

Graph: Simple traversal (find friends' friends, exclude existing friends)
RDBMS: Complex JOINs across User, Friendship, User tables; slower and harder to understand.

info

Graph databases excel at relationship traversals and recommendation algorithms, but aren't suitable for analytics-only workloads. Use them when multi-hop queries are frequent and relationships are first-class citizens in your domain model.

Next Steps

Explore Social Network Patterns for follower/following design
Learn Recommendation Algorithms using graph traversals
Study Knowledge Graph design patterns
Dive into Query Optimization for Cypher

References

Neo4j Official Documentation
"Graph Databases" by Ian Robinson
Cypher Query Language Guide
APOC Library Documentation

Graph Databases

TL;DR​

Learning Objectives​

Motivating Scenario​

Core Concepts​

Practical Example​

When to Use Graph Databases / When Not to Use​

Patterns and Pitfalls​

Design Review Checklist​

Graph Database Design Patterns​

Social Network Graph Design​

Recommendation Graph​

Knowledge Graph​

Performance Optimization​

Preventing Cartesian Products​

Query Planning with EXPLAIN​

Index Strategy​

Graph Database Limitations​

Self-Check​

Next Steps​

References​