Disjoint Sets
You have a collection of elements. Some of them belong together, and you need to answer one question over and over: are these two in the same group?
Think about a social network. Alice knows Bob, Bob knows Carol. Are Alice and Carol in the same friend cluster? Or picture a map with cities and roads. You keep adding roads between cities. After each road, you want to know: can you drive from city X to city Y?
The operations are simple. You need to find which group an element belongs to, and you need to merge two groups when you discover a connection between them. That's it. Two operations, and the whole challenge is making them fast.
Starting point: every element alone
We start from the simplest possible state: every element is its own group. Each one points to itself, meaning "I am my own representative." We call that representative the root.
Now we need to merge groups and check membership. The simplest idea is to keep a list for each group. To merge two groups, you append one list to the other. To check if two elements are in the same group, you scan through until you find them.
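A minimal sketch of the list idea, with one extra assumption layered on: a lookup table from element to its current list, so the membership check is a direct comparison rather than a scan. The ListGroups name and its methods are illustrative, not a standard API.

    // Naive list-based grouping: each group is an array, and a table
    // maps every element to the array it currently belongs to.
    class ListGroups {
      constructor(elements) {
        this.groupOf = new Map(); // element -> the array holding its group
        for (const x of elements) {
          this.groupOf.set(x, [x]); // every element starts alone
        }
      }

      sameGroup(x, y) {
        return this.groupOf.get(x) === this.groupOf.get(y);
      }

      merge(x, y) {
        let gx = this.groupOf.get(x);
        let gy = this.groupOf.get(y);
        if (gx === gy) return;
        if (gx.length < gy.length) [gx, gy] = [gy, gx]; // move the smaller list
        for (const z of gy) {
          gx.push(z);              // append each moved element...
          this.groupOf.set(z, gx); // ...and repoint its table entry
        }
      }
    }

That loop inside merge is the bookkeeping the next section complains about: every moved element has to be repointed, one by one.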
The list approach hits a wall
With lists, merging two groups means updating every element in the smaller list to point to the new group. If you have 100 elements in one group and 50 in another, that's 50 pointer updates. Do this thousands of times and you're spending all your time on bookkeeping.
Toggle between the two representations below. Notice the cost difference.
With lists, merge cost scales with the size of the list being moved. Across 10,000 elements and 5,000 merges, an unlucky sequence can pile up millions of pointer updates, and even always moving the smaller list only softens the blow. We need a representation where merging is cheaper.
What if groups were trees?
Here's the idea: instead of a flat list, make each group a tree. Every element stores a pointer to its parent. The root of the tree is the group's representative.
Now merging two groups is trivial: point one root at the other. That's one pointer change, regardless of how many elements are in each group.
Step through the union below and watch how little work it takes.
One pointer change. The entire second tree is now part of the first, and we didn't touch a single element inside either tree.
Finding which group an element belongs to means walking up parent pointers until you reach a root (a node that points to itself). Checking if two elements are in the same group means comparing their roots.
Click on different nodes below to see the path each one takes to reach its root.
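Here is that representation as a bare sketch, before any of the optimizations that follow. The names makeSet, find, union, and connected are the conventional ones; a Map stands in for the parent pointers:

    // The unoptimized tree representation: parent pointers and nothing else.
    const parent = new Map();

    function makeSet(x) {
      parent.set(x, x); // a root points to itself
    }

    function find(x) {
      while (parent.get(x) !== x) {
        x = parent.get(x); // walk up until we reach a self-pointing root
      }
      return x;
    }

    function union(x, y) {
      const rootX = find(x);
      const rootY = find(y);
      if (rootX !== rootY) {
        parent.set(rootY, rootX); // one pointer change merges the groups
      }
    }

    function connected(x, y) {
      return find(x) === find(y);
    }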
Running into the next wall
Trees fixed our merge problem, but they introduced a new one. What happens if you keep merging in an unlucky order? The tree degenerates into a chain, a linked list in disguise. Finding the root of the last element now takes as many steps as there are elements.
Toggle between the degenerate and balanced shapes below. Notice how find(E) goes from 4 steps to 1.
A degenerate tree is no better than a list for lookups. We've shifted the cost from union to find, but we haven't actually eliminated it. We need trees that stay flat.
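With the plain union from the sketch above, an unlucky merge order builds that chain directly:

    // Each union below makes the newest element the root, so the older
    // tree dangles one level deeper every time.
    for (let i = 1; i <= 5; i++) makeSet(i);
    for (let i = 2; i <= 5; i++) union(i, i - 1);
    // Parent chain: 1 -> 2 -> 3 -> 4 -> 5. find(1) now walks 4 pointers,
    // one per element above it.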
Flattening trees on the way up
When you walk up a chain to find the root, you visit every node along the path. Why not point all of them directly at the root while you're there? You're already reading each pointer; writing it costs almost nothing extra.
This technique is called path compression. After one find, every node on the traversed path points directly to the root. Future finds for any of those nodes take a single step.
The cost of the first find stays the same: you still walk the whole path. But you pay that cost once. Every subsequent find for any node you touched drops to one step. Over many operations, the amortized cost per find approaches constant time.
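Written against the parent map from the earlier sketch, path compression is one extra pass. A minimal iterative version (the implementation at the end of this article uses the equivalent recursive form):

    // find with path compression: locate the root first, then walk the
    // path a second time and repoint every node directly at it.
    function findCompressing(x) {
      let root = x;
      while (parent.get(root) !== root) {
        root = parent.get(root); // first pass: find the root
      }
      while (parent.get(x) !== root) {
        const next = parent.get(x);
        parent.set(x, root); // second pass: flatten the path
        x = next;
      }
      return root;
    }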
Keeping trees short from the start
Path compression fixes tall trees after the fact. But we can also prevent them from getting tall in the first place.
The idea is simple: when merging two trees, always make the shorter tree point to the taller one. We track an approximate height, called rank, for each root. When two trees have the same rank, either root can win, and the winner's rank increases by one. Otherwise, the shorter tree joins the taller one and no rank changes.
Why does this work? Think about when tree height increases. It only happens when you merge two trees of equal rank, and that merge at least doubles the node count. So a tree of rank h contains at least 2^h nodes, which means that with n nodes the maximum height is log₂(n). A million elements? Maximum height of 20. A billion? 30.
The combined effect
Path compression and union by rank work together. Union by rank prevents trees from getting tall. Path compression flattens whatever height remains. Together, the amortized cost per operation drops to O(α(n)), where α is the inverse Ackermann function, a function that grows so slowly it's effectively a constant.
How slowly? α(1,000,000) is at most 4. Even α(10^80), an input roughly the number of atoms in the observable universe, is at most 5. For any dataset you'll ever work with, each operation takes at most a handful of steps.
Drag the sliders below to see how the different implementations compare at different scales.
Putting it to work: connected components
Given a graph, how do you find which vertices are reachable from which? That's the connected components problem. With disjoint sets, you process each edge once: union the two endpoints. When you're done, vertices with the same root are in the same component.
Step through the edges below and watch the components form.
One pass through the edge list. No recursion, no visited arrays, no stack management. Each union is nearly constant time.
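As a sketch, reusing makeSet, union, and find from the earlier bare-tree code (the grouping step at the end is just bookkeeping to read the answer back out):

    // Connected components in one pass: one union per edge, then group
    // the vertices by their root.
    function components(vertices, edges) {
      for (const v of vertices) makeSet(v);
      for (const [u, v] of edges) union(u, v);
      const byRoot = new Map(); // root -> vertices in that component
      for (const v of vertices) {
        const root = find(v);
        if (!byRoot.has(root)) byRoot.set(root, []);
        byRoot.get(root).push(v);
      }
      return [...byRoot.values()];
    }

    // components(["A", "B", "C", "D"], [["A", "B"], ["C", "D"]])
    // -> [["A", "B"], ["C", "D"]]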
Putting it to work: minimum spanning trees
Kruskal's algorithm builds a minimum spanning tree by processing edges in order of weight. For each edge, it asks: are the endpoints already connected? If not, add the edge and merge the components.
That "are they connected?" check is exactly what disjoint sets do. Without them, you'd need a graph traversal for every edge. With them, it's a comparison of two find results.
Notice step 4: the edge A-C is skipped because A and C are already in the same component. That cycle detection, checking whether two nodes share a root, is what makes Kruskal's algorithm efficient.
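A sketch of the whole algorithm on top of the same three functions from the earlier code, with edges assumed to be [weight, u, v] triples:

    // Kruskal's MST: sort edges by weight, keep an edge only when its
    // endpoints are in different components.
    function kruskal(vertices, edges) {
      for (const v of vertices) makeSet(v);
      const tree = [];
      const byWeight = [...edges].sort((a, b) => a[0] - b[0]);
      for (const [weight, u, v] of byWeight) {
        if (find(u) !== find(v)) { // would this edge close a cycle?
          union(u, v);
          tree.push([weight, u, v]);
        }
      }
      return tree;
    }

    // kruskal(["A", "B", "C"], [[1, "A", "B"], [2, "B", "C"], [3, "A", "C"]])
    // keeps A-B and B-C, and skips A-C, exactly as in step 4 above.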
The implementation
The beauty of disjoint sets is how little code they require. Two maps (parent and rank) and a handful of short methods: makeSet to register an element, find, union, and a connected convenience wrapper. Path compression is a single extra line inside find. Union by rank is a three-way branch inside union.
class DisjointSet {
  parent = {}; // parent[x] = x's parent in the tree
  rank = {};   // rank[x] = approximate height of the tree rooted at x

  makeSet(x) {
    this.parent[x] = x; // a new element is its own root...
    this.rank[x] = 0;   // ...in a tree of height zero
  }

  find(x) {
    if (this.parent[x] !== x) {
      this.parent[x] = this.find(this.parent[x]); // path compression
    }
    return this.parent[x];
  }

  union(x, y) {
    const rootX = this.find(x);
    const rootY = this.find(y);
    if (rootX === rootY) return; // already in the same group

    // union by rank: the shorter tree points to the taller tree
    if (this.rank[rootX] < this.rank[rootY]) {
      this.parent[rootX] = rootY;
    } else if (this.rank[rootX] > this.rank[rootY]) {
      this.parent[rootY] = rootX;
    } else {
      this.parent[rootY] = rootX;
      this.rank[rootX] += 1;
    }
  }

  connected(x, y) {
    return this.find(x) === this.find(y);
  }
}
No rotations, no rebalancing logic, no complex invariants. The recursive find with path compression and the rank comparison in union are the entire optimization.
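A quick usage sketch:

    const ds = new DisjointSet();
    for (const name of ["alice", "bob", "carol", "dave"]) ds.makeSet(name);

    ds.union("alice", "bob");
    ds.union("bob", "carol");

    console.log(ds.connected("alice", "carol")); // true: linked through bob
    console.log(ds.connected("alice", "dave"));  // false: dave is still alone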
One limitation worth noting: disjoint sets only merge groups. They never split them. If you need to remove an element from a group or divide a group in two, you need a different data structure. But for the pattern of "start separate, merge on connection, query membership," nothing is faster.