Skip to main content

Searching Algorithms

Various searching algorithms from linear search to advanced interpolation search

TL;DR

Binary search queries sorted data in O(log n), exponentially faster than linear search's O(n). Interpolation search exploits uniform distribution for O(log log n) average case. Ternary search finds peaks/valleys in unimodal functions. Off-by-one errors and integer overflow are common pitfalls; use mid = left + (right - left) / 2 to prevent bugs. Choose linear for unsorted small datasets, binary for sorted large data, interpolation for uniformly distributed integers.

Learning Objectives

  • Understand binary search invariants and boundary conditions
  • Implement search variants (first/last occurrence, insert position, range)
  • Recognize when linear, binary, ternary, exponential, or interpolation search applies
  • Identify and fix off-by-one errors and integer overflow bugs
  • Optimize search for special data structures (rotated arrays, 2D matrices)
  • Analyze time/space complexity for different search scenarios

Motivating Scenario

Your analytics platform searches a sorted log of 1 billion events for a transaction ID. Linear search: 500M comparisons, 10 seconds. Binary search: 30 comparisons, 5 milliseconds. A user-facing autocomplete searches product names in lexicographic order with 10 million items. Interpolation search on uniformly distributed inventory suggests O(log log n) performance. A feature flag system needs to find the first timestamp greater than "now"; binary search on log entries enables sub-millisecond lookup. Search algorithm choice directly impacts query latency and system responsiveness.

Core Concepts

Search algorithm decision tree: consider data structure, distribution, and query type.

Binary Search: Divides sorted range in half; eliminates half the search space per iteration. Requires sort order and careful boundary handling.

Interpolation Search: Estimates target position using linear interpolation; faster on uniformly distributed data but can degrade to O(n) on clustered data.

Ternary Search: Divides range into thirds; useful for finding peaks/valleys in unimodal functions, not sorted arrays.

Exponential Search: Finds range by doubling, then binary searches; efficient for unbounded or infinite sequences.

Boundary Conditions: Inclusive vs exclusive endpoints; off-by-one errors are common. Standard pattern: left = 0, right = n - 1.

Practical Example

def binary_search(arr, target):
"""Standard binary search: returns index or -1."""
left, right = 0, len(arr) - 1

while left <= right:
mid = left + (right - left) // 2 # Prevent overflow
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1

return -1

def binary_search_first(arr, target):
"""Find first (leftmost) occurrence of target."""
left, right = 0, len(arr) - 1
result = -1

while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
result = mid
right = mid - 1 # Continue left
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1

return result

def binary_search_last(arr, target):
"""Find last (rightmost) occurrence of target."""
left, right = 0, len(arr) - 1
result = -1

while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
result = mid
left = mid + 1 # Continue right
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1

return result

def binary_search_insert(arr, target):
"""Find position to insert target to maintain order."""
left, right = 0, len(arr)

while left < right:
mid = left + (right - left) // 2
if arr[mid] < target:
left = mid + 1
else:
right = mid

return left

# Usage: Events sorted by timestamp
events = [100, 200, 200, 200, 300, 400, 500]
print(binary_search(events, 200)) # 1 (any occurrence)
print(binary_search_first(events, 200)) # 1 (first 200)
print(binary_search_last(events, 200)) # 3 (last 200)
print(binary_search_insert(events, 250)) # 4 (insert position)

# Range query: all elements equal to 200
first = binary_search_first(events, 200)
last = binary_search_last(events, 200)
count = last - first + 1 # 3 elements

When to Use / When NOT to Use

Binary Search
  1. Data is sorted and random-accessible (array, sorted list)
  2. Need O(log n) exact match or range queries
  3. Data size > 100 elements; linear search becomes slow
  4. Preferred default for sorted data
Linear Search
  1. Data is unsorted and cannot be sorted
  2. Dataset is small (< 100 items)
  3. Need first occurrence without duplicates
  4. One-time search on already-sorted linked list (no random access)
Interpolation Search
  1. Data is uniformly distributed (prices, IDs, timestamps)
  2. Can achieve O(log log n) on well-distributed data
  3. Integer or numeric keys
  4. Drawback: degrades to O(n) on skewed/clustered data
Exponential Search
  1. Unbounded or infinite array (streams, generators)
  2. Size of array unknown beforehand
  3. Better than binary search when target is near beginning
  4. Drawback: requires append-only or sequential access

Patterns & Pitfalls

Standard: left=0, right=n-1 (inclusive both). Update: mid=left+(right-left)//2, then left=mid+1 or right=mid-1. Avoids off-by-one.
For first: when found, update result then search left (right=mid-1). For last: search right (left=mid+1). Separate loops avoid errors.
Run binarysearchfirst() and binarysearchlast(), then count=last-first+1. Two O(log n) searches, no iteration needed.
mid = (left + right) / 2 overflows. Use mid = left + (right - left) / 2 instead. Standard pattern in all languages.
Forgetting right = len(arr) vs right = len(arr)-1. Forgetting left < right vs left <= right. Unit test all boundary cases.
If arr[mid] == target but no update to left/right, loop repeats. Always move boundaries: left=mid+1 or right=mid-1.

Design Review Checklist

  • Data is sorted before binary search is called?
  • Using mid = left + (right - left) / 2 to prevent overflow?
  • Loop condition is left <= right (or < right) and consistently updated?
  • Boundary test cases: empty array, single element, target at edges?
  • First/last occurrence variant handles duplicates correctly?
  • Range queries use two binary searches, not iteration?
  • Rotated array: correctly identifies sorted side before pruning?
  • 2D matrix: understands row-wise vs column-wise vs fully sorted?
  • Unbounded array: exponential search finds range correctly?
  • Linear search only used for unsorted or very small data?

Self-Check

  • Why does mid = (left + right) / 2 overflow, and how do you fix it?
  • How do you find first and last occurrence of a duplicate in O(log n)?
  • What is the difference between binary search and interpolation search?
  • When would you use exponential search vs binary search?
  • How do you search in a rotated sorted array without knowing the pivot?

Next Steps

References