Skip to main content

Set Operations

TL;DR

Sets efficiently store unique elements enabling O(1) lookups. Union combines all unique elements from multiple sets in O(m+n) time. Intersection finds common elements by iterating smaller set and checking membership in larger. Difference returns elements in first set but not second. Duplicate detection uses sets to track seen elements—first encounter goes to set, second encounter identifies duplicate. These fundamental operations power numerous algorithmic patterns and data structure problems.

Core Concepts

Union, Intersection, and Difference

Union (A ∪ B) combines all unique elements from both sets. Implementation: create new set, add all elements from A, then all from B (duplicates automatically eliminated). Time O(m+n) where m, n are set sizes. Result contains union of both.

Intersection (A ∩ B) finds elements in both sets. Optimize by iterating smaller set, checking existence in larger. Time O(min(m,n)). Result contains only common elements.

Difference (A - B) contains elements in A but not in B. Iterate A, exclude elements in B. Time O(m). Result asymmetric; A - B differs from B - A.

Symmetric difference (A ⊕ B) contains elements in either A or B but not both. Compute (A - B) ∪ (B - A). Time O(m+n). Mathematically (A ∪ B) - (A ∩ B) yields same result.

Subset and Superset Relationships

Subset (A ⊆ B) means all elements of A exist in B. Check by verifying every A element in B. O(|A|) time for membership checks.

Proper subset (A ⊂ B) is subset but A ≠ B. Check subset and |A| < |B|.

Superset (A ⊇ B) means A contains all B elements. Equivalent to B being subset of A.

Power set generates all 2^n possible subsets of n-element set. Recursive approach: power set of 3 builds from power set of 2 by either including or excluding 3.

Duplicate Detection and Unique Elements

Set-based duplicate detection tracks seen elements. First occurrence: add to set. Second occurrence: identified as duplicate. O(n) time, O(n) space.

XOR-based detection works for array containing one unique element and rest duplicates. XOR all elements; duplicates cancel, leaving unique. O(n) time, O(1) space but only works for specific cases.

Frequency map counts occurrences. Elements with count > 1 are duplicates. Enables counting duplicates and finding first duplicate.

Remove duplicates maintains insertion order using set to track seen, only keeping first occurrence.

Set-Based Filtering

Filter by membership removes elements not in reference set. Iterate collection, keep only elements in set.

Find missing elements when expecting range 1 to n. Create set from input, check range for missing values.

Common elements from multiple collections uses intersection repeatedly or checks membership in all sets.

Distinct elements from collection simply converts to set then back to collection.

Key Algorithms and Techniques

Union Algorithm

Create new set, add all elements from both sets. Duplicates automatically handled.

Intersection with Optimization

Iterate smaller set, check membership in larger. Early termination possible.

Difference Computation

Iterate first set, exclude elements in second set.

Duplicate Detection Loop

Single pass maintaining seen set; second encounter identifies duplicate.

Subset Verification

Check if all elements of potential subset exist in candidate superset.

Practical Example

# Set operations
def set_union(a, b):
"""Union of two sets: A ∪ B."""
return set(a) | set(b)

def set_intersection(a, b):
"""Intersection of two sets: A ∩ B."""
return set(a) & set(b)

def set_difference(a, b):
"""Difference of two sets: A - B."""
return set(a) - set(b)

def symmetric_difference(a, b):
"""Symmetric difference: elements in either but not both."""
return set(a) ^ set(b)

# Subset and superset checks
def is_subset(a, b):
"""Check if A is subset of B."""
return set(a).issubset(set(b))

def is_superset(a, b):
"""Check if A is superset of B."""
return set(a).issuperset(set(b))

# Duplicate detection
def find_duplicates(arr):
"""Find all duplicate elements in array."""
seen = set()
duplicates = set()
for num in arr:
if num in seen:
duplicates.add(num)
seen.add(num)
return list(duplicates)

def first_duplicate(arr):
"""Find first duplicate by appearance order."""
seen = set()
for num in arr:
if num in seen:
return num
seen.add(num)
return None

def remove_duplicates_ordered(arr):
"""Remove duplicates while preserving order."""
seen = set()
result = []
for num in arr:
if num not in seen:
seen.add(num)
result.append(num)
return result

# Find unique element (all others appear twice)
def find_unique_element(arr):
"""Find element appearing once when rest appear twice."""
unique_set = set()
for num in arr:
if num in unique_set:
unique_set.remove(num)
else:
unique_set.add(num)
return unique_set.pop()

# Find missing numbers
def find_missing(arr, n):
"""Find missing numbers in range 1 to n."""
num_set = set(arr)
return [i for i in range(1, n + 1) if i not in num_set]

# Test cases
print(f"Union [1,2,3] and [3,4,5]: {set_union([1,2,3], [3,4,5])}")
print(f"Intersection [1,2,3] and [3,4,5]: {set_intersection([1,2,3], [3,4,5])}")
print(f"Difference [1,2,3] and [3,4,5]: {set_difference([1,2,3], [3,4,5])}")
print(f"Find duplicates [1,2,2,3,3,3]: {find_duplicates([1,2,2,3,3,3])}")
print(f"First duplicate [1,3,2,1]: {first_duplicate([1,3,2,1])}")
print(f"Remove duplicates [1,2,2,3,1]: {remove_duplicates_ordered([1,2,2,3,1])}")
print(f"Find unique [1,1,2,2,3]: {find_unique_element([1,1,2,2,3])}")
print(f"Is [1,2] subset of [1,2,3]? {is_subset([1,2], [1,2,3])}")

Common Pitfalls

warning

Modifying set during iteration: Removing elements while iterating causes skipped elements. Create new set or use copy.

Intersection optimization ignored: Iterating larger set then checking smaller is slower. Always iterate the smaller set.

Duplicate detection with first occurrence tracking missing: Store first occurrence index to find earliest duplicate, not just identify existence.

Symmetric difference vs regular difference confusion: A ⊕ B ≠ A - B. Symmetric difference includes B's unique elements too.

Order loss in set operations: Sets don't preserve order. For order-preserving operations, track seen with insertion order (Python 3.7+ dicts, Java LinkedHashSet).

Self-Check

  1. Explain union, intersection, and difference operations. How do complexity and correctness differ?
  2. How do sets detect duplicates in linear time? What is the space trade-off?
  3. Why optimize intersection by iterating the smaller set? Calculate complexity for both approaches.

One Takeaway

info

Sets provide O(1) membership checking enabling efficient union, intersection, and difference operations. Duplicate detection via sets transforms O(n²) naive checks to O(n). These fundamental operations underpin filtering, grouping, and comparison tasks across algorithms.

References

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
  • McDowell, G. L. (2015). Cracking the Coding Interview (6th ed.). CareerCup.