
Heap Applications

TL;DR

Heaps efficiently solve top-K problems in O(n log k) using min-heap of size k. Running median uses two-heap approach: max-heap for lower half, min-heap for upper half, maintaining balance for O(log n) insertion. Merging k sorted lists with heap costs O(n log k) by repeatedly extracting minimum. Dijkstra's algorithm uses priority queue for O((V+E) log V) shortest paths. Recognizing when heaps apply transforms inefficient solutions into optimized algorithms.

Core Concepts

Top-K Problems

Top-K largest uses min-heap of size k. Iterate array: if heap size < k, insert; else if current > heap min, remove min and insert current. Final heap contains k largest. Time O(n log k), space O(k). Better than sorting O(n log n) when k << n.

Top-K frequent counts frequencies first (hash map O(n)), then applies top-K algorithm. Heap comparator uses frequency. O(n + k log n) total.

Top-K closest points calculates distance from origin, then applies top-K. Heap size k minimizes memory for large arrays.
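Since the closest-points variant has no counterpart in the Practical Example below, here is a minimal sketch under the same pattern. The name kClosest and the tuple layout are illustrative; squared distance is used to avoid an unnecessary square root.

```python
import heapq

def kClosest(points, k):
    """Keep the k points closest to the origin using a max-heap of size k.

    heapq is a min-heap, so squared distances are negated to simulate
    a max-heap: evict the current farthest point when a closer one appears.
    """
    heap = []  # entries: (-squared_distance, x, y)
    for x, y in points:
        d = -(x * x + y * y)
        if len(heap) < k:
            heapq.heappush(heap, (d, x, y))
        elif d > heap[0][0]:  # closer than the current farthest kept point
            heapq.heapreplace(heap, (d, x, y))
    return [(x, y) for _, x, y in heap]
```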

Variations: for top-K smallest, flip the logic and use a max-heap of size k, evicting the maximum whenever a smaller element appears.
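The k-smallest variant can be sketched by negating values so Python's min-heap behaves as a max-heap (the name topKSmallest is illustrative):

```python
import heapq

def topKSmallest(nums, k):
    """Find the k smallest elements with a max-heap of size k.

    heapq only provides a min-heap, so values are stored negated:
    -heap[0] is the largest of the k candidates kept so far.
    """
    heap = [-x for x in nums[:k]]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num < -heap[0]:  # smaller than the current k-th smallest
            heapq.heapreplace(heap, -num)
    return sorted(-x for x in heap)
```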

Running Median with Two Heaps

Two-heap invariant: the max-heap stores the lower half (all ≤ median), the min-heap stores the upper half (all ≥ median). The max-heap's size equals the min-heap's size or exceeds it by exactly one.

Insertion compares the new value to the max-heap's root. If it is smaller or equal, add it to the max-heap; else add it to the min-heap. Then rebalance: if the size difference exceeds one, move the root from the larger heap to the smaller.

Median query: if sizes equal, average both roots; else return max-heap root (the larger). O(1) query after O(log n) insertion.

Time complexity is O(log n) per insertion and O(1) per query, enabling processing of an unbounded stream with no extra space beyond the two heaps.

Merging K Sorted Lists

Naive sequential pairwise merging (merge list 1 into list 2, the result into list 3, and so on) costs O(nk); pairing the k-1 merges into a balanced divide-and-conquer tree brings it down to O(n log k), since each element then participates in only O(log k) two-way merges.
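A minimal sketch of the divide-and-conquer pairwise strategy, using plain Python lists rather than linked lists and leaning on heapq.merge for each two-way merge (the name merge_pairwise is illustrative):

```python
import heapq

def merge_pairwise(lists):
    """Divide-and-conquer pairwise merge of k sorted lists: O(n log k).

    Each round halves the number of lists, so every element takes part
    in O(log k) two-way merges.
    """
    if not lists:
        return []
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            if i + 1 < len(lists):
                merged.append(list(heapq.merge(lists[i], lists[i + 1])))
            else:
                merged.append(lists[i])  # odd one out carries over
        lists = merged
    return lists[0]
```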

Heap approach: maintain min-heap of current heads from each list (k elements initially). Extract min, output it, insert next element from same list. Continue until all lists exhausted. O(n log k) time since each of n elements enters/exits heap.

Implementation stores (value, list_index, node) tuples. The heap comparator uses the value; the list index serves as a tie-breaker so that nodes are never compared directly.

Advantages over pairwise: cleaner implementation, same optimal complexity.

Dijkstra's Algorithm with Priority Queue

Algorithm maintains priority queue of (distance, vertex) pairs. Start from source with distance 0. Repeatedly: extract vertex with minimum distance, relax outgoing edges. If destination has lower distance, insert new (distance, vertex) pair.

Termination occurs when all vertices have been visited, or early, once the destination is extracted; at that point the recorded distance to the destination is the shortest-path length.

Time complexity O((V+E) log V) with binary heap. V vertex extractions (each log V), E edge relaxations (each log V).

Important: the priority queue can hold multiple entries per vertex, so outdated ones must be skipped. On extraction, either check a visited set or discard any entry whose distance exceeds the vertex's current best.
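To make the stale-entry check and early exit concrete, here is a minimal single-destination sketch; the name dijkstra_to and the adjacency-list shape (vertex → list of (neighbor, weight) pairs) are assumptions for illustration.

```python
import heapq

def dijkstra_to(graph, start, goal):
    """Single-destination Dijkstra with lazy deletion of stale entries.

    graph maps each vertex to a list of (neighbor, weight) pairs.
    Returns the shortest distance from start to goal, or inf if unreachable.
    """
    dist = {start: 0}
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue  # stale entry: u was already finalized
        if u == goal:
            return d  # first extraction of the goal is optimal
        visited.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')
```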

Key Algorithms and Techniques

Top-K with Min-Heap of Size K

Keep k elements; remove min when larger appears.

Two-Heap Median Maintaining Balance

Alternate moving roots between heaps to maintain size invariant.

Merge K Lists with Priority Queue

Heap of k heads; extract min, add next from same list.

Dijkstra with Priority Queue

Repeatedly extract minimum distance vertex, relax edges.

Practical Example

import heapq
from collections import Counter

# Top-K largest elements
def topKLargest(nums, k):
    """Find the k largest elements using a min-heap of size k."""
    heap = nums[:k]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heapreplace(heap, num)
    return sorted(heap, reverse=True)

# Top-K frequent elements
def topKFrequent(nums, k):
    """Find the k most frequent elements."""
    count = Counter(nums)
    heap = [(-freq, num) for num, freq in count.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]

# Running median
class MedianFinder:
    def __init__(self):
        self.small = []  # max-heap (values stored negated)
        self.large = []  # min-heap

    def addNum(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)

        # Balance heaps: len(small) must be len(large) or len(large) + 1
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def findMedian(self):
        if len(self.small) == len(self.large):
            return (-self.small[0] + self.large[0]) / 2.0
        return float(-self.small[0])

# Merge K sorted lists
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def mergeKLists(lists):
    """Merge k sorted linked lists."""
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            # (value, list index, node): the index breaks ties so that
            # nodes are never compared directly
            heapq.heappush(heap, (lst.val, i, lst))

    dummy = ListNode(0)
    current = dummy
    while heap:
        val, idx, node = heapq.heappop(heap)
        current.next = node
        current = current.next
        if node.next:
            heapq.heappush(heap, (node.next.val, idx, node.next))

    return dummy.next

# Dijkstra's algorithm
def dijkstra(graph, start):
    """Find shortest paths from start to all vertices."""
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    pq = [(0, start)]

    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # outdated entry, skip
        for v, weight in graph[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                heapq.heappush(pq, (dist[v], v))

    return dist

# Test
print(f"Top-3 largest [1,2,3,5,6,4]: {topKLargest([1,2,3,5,6,4], 3)}")
print(f"Top-2 frequent [1,1,1,2,2,3]: {topKFrequent([1,1,1,2,2,3], 2)}")

mf = MedianFinder()
for num in [1, 2, 3, 4, 5]:
    mf.addNum(num)
print(f"Median: {mf.findMedian()}")

Common Pitfalls

warning

Incorrect heap size in top-K: Using heap size > k wastes space and increases complexity. Maintain exactly k elements.

Not balancing heaps in median: Forgetting to move roots between heaps after insertion breaks median query. Always balance after insertion.

Skipping outdated entries in Dijkstra: Heap contains multiple entries per vertex. Must check if vertex visited before processing. Otherwise explores same vertex multiple times.

Wrong heap type: Top-K largest needs min-heap (extract minimum from k largest). Top-K smallest needs max-heap. Easy to confuse.

Merging with incorrect tuple ordering: Tuple elements determine heap ordering. Put the comparison key (value) first, and include a tie-breaker such as the list index so that non-comparable payloads (e.g. list nodes) are never compared directly.

Self-Check

  1. Why is top-K with min-heap O(n log k)? When is this better than sorting?
  2. Explain two-heap median approach. Why must sizes stay balanced?
  3. How does Dijkstra avoid reprocessing vertices? What role does visited checking play?

One Takeaway

info

Heaps solve selection and ranking problems optimally. Top-K leverages heap size k, two-heap median elegantly balances streams, and priority queues power graph algorithms. Recognizing these patterns transforms complex problems into clean heap-based solutions.

Real-World Applications

Stock Market: Top N Movers

# Find top 10 stocks by percentage gain today
import heapq

class StockTracker:
    def __init__(self, top_k=10):
        self.top_k = top_k
        self.heap = []  # min-heap of (gain %, symbol)

    def add_stock(self, symbol, current_price, open_price):
        gain = (current_price - open_price) / open_price * 100

        if len(self.heap) < self.top_k:
            heapq.heappush(self.heap, (gain, symbol))
        elif gain > self.heap[0][0]:
            heapq.heapreplace(self.heap, (gain, symbol))

    def get_top_movers(self):
        return sorted(self.heap, reverse=True)

# Usage: track 10,000 stocks, keep only the top 10
tracker = StockTracker(top_k=10)
for stock in all_stocks:
    tracker.add_stock(stock.symbol, stock.current_price, stock.open_price)

top_movers = tracker.get_top_movers()
# e.g. [(15.2, 'NVDA'), (12.8, 'TSLA'), ...]

Job Scheduling: Earliest Deadline First

# Schedule jobs to meet deadlines
import heapq

class Job:
    def __init__(self, name, duration, deadline):
        self.name = name
        self.duration = duration
        self.deadline = deadline

    def __lt__(self, other):
        return self.deadline < other.deadline

def schedule_jobs(jobs):
    """Schedule jobs earliest deadline first."""
    pq = list(jobs)
    heapq.heapify(pq)  # min-heap by deadline via Job.__lt__

    current_time = 0
    schedule = []

    while pq:
        job = heapq.heappop(pq)
        schedule.append((job.name, current_time, current_time + job.duration))
        current_time += job.duration

        if current_time > job.deadline:
            print(f"WARNING: Job {job.name} missed deadline")

    return schedule

# Usage
jobs = [
    Job("Task A", 2, 5),  # 2 units, due at time 5
    Job("Task B", 1, 3),  # 1 unit, due at time 3
    Job("Task C", 3, 8),  # 3 units, due at time 8
]

schedule = schedule_jobs(jobs)
# Task B: 0-1 (meets deadline 3)
# Task A: 1-3 (meets deadline 5)
# Task C: 3-6 (meets deadline 8)

System Load Balancer: Least Loaded Server

# Distribute requests to the least loaded servers
import heapq

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.load_heap = [(0, server_id) for server_id, _ in enumerate(servers)]
        heapq.heapify(self.load_heap)

    def assign_request(self):
        """Assign a request to the least loaded server."""
        current_load, server_id = heapq.heappop(self.load_heap)

        # Processing the request adds 1 unit of load on this server
        new_load = current_load + 1
        heapq.heappush(self.load_heap, (new_load, server_id))

        return server_id

    def complete_request(self, server_id):
        """Server finished processing a request."""
        # Simplified: in reality, track completion time and decrement load
        pass

# Usage: 100 requests arrive, distribute the load
lb = LoadBalancer(servers=[()] * 4)  # 4 servers
for i in range(100):
    server = lb.assign_request()
    # server "server" processes the request

Performance Comparison

| Problem         | Algorithm          | Time                | Space | Notes                                  |
| --------------- | ------------------ | ------------------- | ----- | -------------------------------------- |
| Top-K largest   | Min-heap of size K | O(n log K)          | O(K)  | Better than O(n log n) when K << n     |
| Running median  | Two heaps          | O(log n) per insert | O(n)  | O(1) query, ideal for streams          |
| Merge K lists   | Priority queue     | O(n log K)          | O(K)  | Cleaner than pairwise merging          |
| Dijkstra        | Priority queue     | O((V+E) log V)      | O(V)  | Fast shortest paths for sparse graphs  |
| Task scheduling | Min-heap           | O(n log n)          | O(n)  | Greedy approach optimal                |
