
Heap Applications

TL;DR

Heaps efficiently solve top-K problems in O(n log k) using min-heap of size k. Running median uses two-heap approach: max-heap for lower half, min-heap for upper half, maintaining balance for O(log n) insertion. Merging k sorted lists with heap costs O(n log k) by repeatedly extracting minimum. Dijkstra's algorithm uses priority queue for O((V+E) log V) shortest paths. Recognizing when heaps apply transforms inefficient solutions into optimized algorithms.

Core Concepts

Top-K Problems

Top-K largest uses min-heap of size k. Iterate array: if heap size < k, insert; else if current > heap min, remove min and insert current. Final heap contains k largest. Time O(n log k), space O(k). Better than sorting O(n log n) when k << n.

Top-K frequent counts frequencies first (hash map O(n)), then applies top-K algorithm. Heap comparator uses frequency. O(n + k log n) total.

Top-K closest points calculates distance from origin, then applies top-K. Heap size k minimizes memory for large arrays.
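Since the closest-points variant has no counterpart in the Practical Example below, here is a minimal sketch under the same pattern. The name kClosest and the tuple layout are illustrative; squared distance is used to avoid an unnecessary square root.

```python
import heapq

def kClosest(points, k):
    """Keep the k points closest to the origin using a max-heap of size k.

    heapq is a min-heap, so squared distances are negated to simulate
    a max-heap: evict the current farthest point when a closer one appears.
    """
    heap = []  # entries: (-squared_distance, x, y)
    for x, y in points:
        d = -(x * x + y * y)
        if len(heap) < k:
            heapq.heappush(heap, (d, x, y))
        elif d > heap[0][0]:  # closer than the current farthest kept point
            heapq.heapreplace(heap, (d, x, y))
    return [(x, y) for _, x, y in heap]
```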

Variations: for top-K smallest, flip the logic and use a max-heap of size k, evicting the maximum whenever a smaller element appears.
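The k-smallest variant can be sketched by negating values so Python's min-heap behaves as a max-heap (the name topKSmallest is illustrative):

```python
import heapq

def topKSmallest(nums, k):
    """Find the k smallest elements with a max-heap of size k.

    heapq only provides a min-heap, so values are stored negated:
    -heap[0] is the largest of the k candidates kept so far.
    """
    heap = [-x for x in nums[:k]]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num < -heap[0]:  # smaller than the current k-th smallest
            heapq.heapreplace(heap, -num)
    return sorted(-x for x in heap)
```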

Running Median with Two Heaps

Two-heap invariant: the max-heap stores the lower half (all ≤ median), the min-heap stores the upper half (all ≥ median). The max-heap's size equals the min-heap's size or exceeds it by exactly one.

Insertion compares the new value to the max-heap's root. If it is smaller or equal, add it to the max-heap; else add it to the min-heap. Then rebalance: if the size difference exceeds one, move the root from the larger heap to the smaller.

Median query: if sizes equal, average both roots; else return max-heap root (the larger). O(1) query after O(log n) insertion.

Time complexity is O(log n) per insertion and O(1) per query, enabling processing of an unbounded stream with no extra space beyond the two heaps.

Merging K Sorted Lists

Naive sequential pairwise merging (merge list 1 into list 2, the result into list 3, and so on) costs O(nk); pairing the k-1 merges into a balanced divide-and-conquer tree brings it down to O(n log k), since each element then participates in only O(log k) two-way merges.
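A minimal sketch of the divide-and-conquer pairwise strategy, using plain Python lists rather than linked lists and leaning on heapq.merge for each two-way merge (the name merge_pairwise is illustrative):

```python
import heapq

def merge_pairwise(lists):
    """Divide-and-conquer pairwise merge of k sorted lists: O(n log k).

    Each round halves the number of lists, so every element takes part
    in O(log k) two-way merges.
    """
    if not lists:
        return []
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            if i + 1 < len(lists):
                merged.append(list(heapq.merge(lists[i], lists[i + 1])))
            else:
                merged.append(lists[i])  # odd one out carries over
        lists = merged
    return lists[0]
```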

Heap approach: maintain min-heap of current heads from each list (k elements initially). Extract min, output it, insert next element from same list. Continue until all lists exhausted. O(n log k) time since each of n elements enters/exits heap.

Implementation stores (value, list_index, node) tuples. The heap comparator uses the value; the list index serves as a tie-breaker so that nodes are never compared directly.

Advantages over pairwise: cleaner implementation, same optimal complexity.

Dijkstra's Algorithm with Priority Queue

Algorithm maintains priority queue of (distance, vertex) pairs. Start from source with distance 0. Repeatedly: extract vertex with minimum distance, relax outgoing edges. If destination has lower distance, insert new (distance, vertex) pair.

Termination occurs when all vertices have been visited, or early, once the destination is extracted; at that point the recorded distance to the destination is the shortest-path length.

Time complexity O((V+E) log V) with binary heap. V vertex extractions (each log V), E edge relaxations (each log V).

Important: the priority queue can hold multiple entries per vertex, so outdated ones must be skipped. On extraction, either check a visited set or discard any entry whose distance exceeds the vertex's current best.
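To make the stale-entry check and early exit concrete, here is a minimal single-destination sketch; the name dijkstra_to and the adjacency-list shape (vertex → list of (neighbor, weight) pairs) are assumptions for illustration.

```python
import heapq

def dijkstra_to(graph, start, goal):
    """Single-destination Dijkstra with lazy deletion of stale entries.

    graph maps each vertex to a list of (neighbor, weight) pairs.
    Returns the shortest distance from start to goal, or inf if unreachable.
    """
    dist = {start: 0}
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue  # stale entry: u was already finalized
        if u == goal:
            return d  # first extraction of the goal is optimal
        visited.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')
```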

Key Algorithms and Techniques

Top-K with Min-Heap of Size K

Keep k elements; remove min when larger appears.

Two-Heap Median Maintaining Balance

Alternate moving roots between heaps to maintain size invariant.

Merge K Lists with Priority Queue

Heap of k heads; extract min, add next from same list.

Dijkstra with Priority Queue

Repeatedly extract minimum distance vertex, relax edges.

Practical Example

import heapq
from collections import Counter

# Top-K largest elements
def topKLargest(nums, k):
    """Find the k largest elements using a min-heap of size k."""
    heap = nums[:k]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heapreplace(heap, num)
    return sorted(heap, reverse=True)

# Top-K frequent elements
def topKFrequent(nums, k):
    """Find the k most frequent elements."""
    count = Counter(nums)
    heap = [(-freq, num) for num, freq in count.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]

# Running median
class MedianFinder:
    def __init__(self):
        self.small = []  # max-heap (values stored negated)
        self.large = []  # min-heap

    def addNum(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)

        # Balance heaps: len(small) must be len(large) or len(large) + 1
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def findMedian(self):
        if len(self.small) == len(self.large):
            return (-self.small[0] + self.large[0]) / 2.0
        return float(-self.small[0])

# Merge K sorted lists
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def mergeKLists(lists):
    """Merge k sorted linked lists."""
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            # (value, list index, node): the index breaks ties so that
            # nodes are never compared directly
            heapq.heappush(heap, (lst.val, i, lst))

    dummy = ListNode(0)
    current = dummy
    while heap:
        val, idx, node = heapq.heappop(heap)
        current.next = node
        current = current.next
        if node.next:
            heapq.heappush(heap, (node.next.val, idx, node.next))

    return dummy.next

# Dijkstra's algorithm
def dijkstra(graph, start):
    """Find shortest paths from start to all vertices."""
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    pq = [(0, start)]

    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # outdated entry, skip
        for v, weight in graph[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                heapq.heappush(pq, (dist[v], v))

    return dist

# Test
print(f"Top-3 largest [1,2,3,5,6,4]: {topKLargest([1,2,3,5,6,4], 3)}")
print(f"Top-2 frequent [1,1,1,2,2,3]: {topKFrequent([1,1,1,2,2,3], 2)}")

mf = MedianFinder()
for num in [1, 2, 3, 4, 5]:
    mf.addNum(num)
print(f"Median: {mf.findMedian()}")

Common Pitfalls

warning

Incorrect heap size in top-K: Using heap size > k wastes space and increases complexity. Maintain exactly k elements.

Not balancing heaps in median: Forgetting to move roots between heaps after insertion breaks median query. Always balance after insertion.

Skipping outdated entries in Dijkstra: Heap contains multiple entries per vertex. Must check if vertex visited before processing. Otherwise explores same vertex multiple times.

Wrong heap type: Top-K largest needs min-heap (extract minimum from k largest). Top-K smallest needs max-heap. Easy to confuse.

Merging with incorrect tuple ordering: Tuple elements determine heap ordering. Put the comparison key (value) first, and include a tie-breaker such as the list index so that non-comparable payloads (e.g. list nodes) are never compared directly.

Self-Check

  1. Why is top-K with min-heap O(n log k)? When is this better than sorting?
  2. Explain two-heap median approach. Why must sizes stay balanced?
  3. How does Dijkstra avoid reprocessing vertices? What role does visited checking play?

One Takeaway

info

Heaps solve selection and ranking problems optimally. Top-K leverages heap size k, two-heap median elegantly balances streams, and priority queues power graph algorithms. Recognizing these patterns transforms complex problems into clean heap-based solutions.

Real-World Applications

Stock Market: Top N Movers

# Find top 10 stocks by percentage gain today
import heapq

class StockTracker:
    def __init__(self, top_k=10):
        self.top_k = top_k
        self.heap = []  # min-heap of (gain %, symbol)

    def add_stock(self, symbol, current_price, open_price):
        gain = (current_price - open_price) / open_price * 100

        if len(self.heap) < self.top_k:
            heapq.heappush(self.heap, (gain, symbol))
        elif gain > self.heap[0][0]:
            heapq.heapreplace(self.heap, (gain, symbol))

    def get_top_movers(self):
        return sorted(self.heap, reverse=True)

# Usage: track 10,000 stocks, keep only the top 10
tracker = StockTracker(top_k=10)
for stock in all_stocks:
    tracker.add_stock(stock.symbol, stock.current_price, stock.open_price)

top_movers = tracker.get_top_movers()
# e.g. [(15.2, 'NVDA'), (12.8, 'TSLA'), ...]

Job Scheduling: Earliest Deadline First

# Schedule jobs to meet deadlines
import heapq

class Job:
    def __init__(self, name, duration, deadline):
        self.name = name
        self.duration = duration
        self.deadline = deadline

    def __lt__(self, other):
        return self.deadline < other.deadline

def schedule_jobs(jobs):
    """Schedule jobs earliest deadline first."""
    pq = list(jobs)
    heapq.heapify(pq)  # min-heap by deadline via Job.__lt__

    current_time = 0
    schedule = []

    while pq:
        job = heapq.heappop(pq)
        schedule.append((job.name, current_time, current_time + job.duration))
        current_time += job.duration

        if current_time > job.deadline:
            print(f"WARNING: Job {job.name} missed deadline")

    return schedule

# Usage
jobs = [
    Job("Task A", 2, 5),  # 2 units, due at time 5
    Job("Task B", 1, 3),  # 1 unit, due at time 3
    Job("Task C", 3, 8),  # 3 units, due at time 8
]

schedule = schedule_jobs(jobs)
# Task B: 0-1 (meets deadline 3)
# Task A: 1-3 (meets deadline 5)
# Task C: 3-6 (meets deadline 8)

System Load Balancer: Least Loaded Server

# Distribute requests to the least loaded servers
import heapq

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.load_heap = [(0, server_id) for server_id, _ in enumerate(servers)]
        heapq.heapify(self.load_heap)

    def assign_request(self):
        """Assign a request to the least loaded server."""
        current_load, server_id = heapq.heappop(self.load_heap)

        # Processing the request adds 1 unit of load on this server
        new_load = current_load + 1
        heapq.heappush(self.load_heap, (new_load, server_id))

        return server_id

    def complete_request(self, server_id):
        """Server finished processing a request."""
        # Simplified: in reality, track completion time and decrement load
        pass

# Usage: 100 requests arrive, distribute the load
lb = LoadBalancer(servers=[()] * 4)  # 4 servers
for i in range(100):
    server = lb.assign_request()
    # server "server" processes the request

Performance Comparison

| Problem         | Algorithm          | Time                | Space | Notes                                  |
| --------------- | ------------------ | ------------------- | ----- | -------------------------------------- |
| Top-K largest   | Min-heap of size K | O(n log K)          | O(K)  | Better than O(n log n) when K << n     |
| Running median  | Two heaps          | O(log n) per insert | O(n)  | O(1) query, ideal for streams          |
| Merge K lists   | Priority queue     | O(n log K)          | O(K)  | Cleaner than pairwise merging          |
| Dijkstra        | Priority queue     | O((V+E) log V)      | O(V)  | Fast shortest paths for sparse graphs  |
| Task scheduling | Min-heap           | O(n log n)          | O(n)  | Greedy approach optimal                |
