Probability and Counting
Luck. Coincidence. Randomness. Uncertainty. Risk. We use these words all the time, usually in a vague, casual way. The problem is that probability often contradicts our gut feelings.
If we rely on intuitions that turn out to be wrong, we make bad predictions and overconfident decisions. How do we reason about uncertainty without fooling ourselves?
The goal of this post is to introduce probability as a logical framework for quantifying uncertainty and randomness. We'll also aim to strengthen intuition, both when our initial guesses coincide with logical reasoning and when we're not so lucky.
Why Study Probability?
Mathematics is the logic of certainty; probability is the logic of uncertainty. It shows up in statistics (using data to learn about the world), machine learning (almost everything depends on probability), genetics (inheritance is random), finance (pricing options requires modeling randomness), and medicine (clinical trials are designed around probability). If you want to understand how the world works, you need probability.
Sample Spaces and Pebble World
The mathematical framework for probability is built around sets. Imagine that an experiment is performed, resulting in one out of a set of possible outcomes.
Sample space S: The set of all possible outcomes of an experiment.
Event A: A subset of the sample space S. We say A occurred if the actual outcome is in A.
When the sample space is finite, we can visualize it as Pebble World. Each pebble represents an outcome, and an event is a set of pebbles.
Click the different event buttons to see which pebbles are included in each event. Notice how the union A ∪ B includes 8 pebbles (those in A or B, including the one in both), while the intersection A ∩ B includes only 1 pebble (the one in both A and B).
Set operations make it easy to build new events from existing ones:
- Union A ∪ B: The event that A or B (or both) occurs
- Intersection A ∩ B: The event that both A and B occur
- Complement Aᶜ: The event that A does not occur
In Pebble World, probability behaves like mass: the total mass of all pebbles is 1, and the probability of an event is the total mass of pebbles in that event. If all pebbles have equal mass, this becomes simple counting.
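If you want to play with Pebble World outside the widget, here's a minimal sketch in Python: the sample space is a set of pebbles, events are subsets, and with equal masses probability reduces to counting. The nine pebbles and the specific events below are made-up numbers chosen to mirror the picture above.

```python
# A toy Pebble World: nine equally weighted pebbles (illustrative numbers).
S = set(range(1, 10))        # sample space: pebbles 1 through 9
A = {1, 2, 3, 4, 5}          # event A
B = {5, 6, 7, 8}             # event B, sharing pebble 5 with A

def prob(event):
    """Equal-mass pebbles: probability is just counting."""
    return len(event) / len(S)

print(prob(A | B))   # union: 8 pebbles -> 8/9
print(prob(A & B))   # intersection: 1 pebble -> 1/9
print(prob(S - A))   # complement of A: 4 pebbles -> 4/9
```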
The Naive Definition of Probability
When all outcomes are equally likely, probability becomes a counting problem:
P(A) = |A| / |S| = (number of favorable outcomes) / (total number of outcomes)
This "naive" definition is powerful when it applies, but requires two conditions:
- The sample space S must be finite
- All outcomes must be equally likely
The naive definition is often misapplied through arguments like "either it will happen or it won't, so it's 50-50." This reasoning isn't even internally consistent: it would assign probability 1/2 to both "life on Mars" and "intelligent life on Mars," even though the latter should clearly be less probable than the former.
The naive definition is applicable when:
- There's symmetry making outcomes equally likely (fair coins, well-shuffled decks)
- Outcomes are equally likely by design (simple random sampling)
- We use it as a null model to see if equally likely outcomes match observed data
How to Count: The Multiplication Rule
Many counting problems can be solved using the multiplication rule:
If Experiment A has a possible outcomes, and for each of those outcomes Experiment B has b possible outcomes, then the compound experiment has a × b possible outcomes.
Switch between different experiments and click "Expand All" to see the complete tree of outcomes. Notice how the total count is always the product of choices at each level.
Some important observations:
- Order doesn't matter for counting: Whether you choose the cone first or the flavor first, there are still 2 × 3 = 6 possibilities
- Each branch must have the same count: If waffle cones had different flavor options than cake cones, we couldn't simply multiply
The multiplication rule naturally leads to formulas for sampling. With replacement: n^k possibilities (each sample has n choices). Without replacement: n × (n-1) × ... × (n-k+1) possibilities (choices decrease as items are used).
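To make the rule concrete, here's a quick sketch using Python's itertools.product for the ice cream example; the particular cone and flavor names are just placeholders.

```python
from itertools import product

# Hypothetical menu, purely for illustration.
cones = ["cake", "waffle"]
flavors = ["chocolate", "vanilla", "strawberry"]

# Compound experiment: pick a cone, then a flavor.
orders = list(product(cones, flavors))
print(len(orders))   # 2 * 3 = 6, the multiplication rule in action
print(orders)        # every (cone, flavor) pair, listed explicitly
```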
Sampling: With vs Without Replacement
A fundamental distinction in counting is whether we're sampling with replacement (an item can be chosen multiple times) or without replacement (once chosen, an item is no longer available).
Toggle between "With Replacement" and "Without Replacement" modes. Draw samples and notice how items become unavailable in without-replacement mode. Compare the total count formulas at the bottom.
Sampling with replacement: n^k ordered outcomes (inexhaustible items)
Sampling without replacement: n!/(n-k)! ordered outcomes (requires k ≤ n)
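One way to sanity-check both formulas is to enumerate the ordered samples directly; the values of n and k below are arbitrary.

```python
from itertools import product, permutations
from math import factorial

n, k = 5, 3                     # arbitrary small values
items = range(n)

with_repl = list(product(items, repeat=k))      # ordered samples, with replacement
without_repl = list(permutations(items, k))     # ordered samples, without replacement

print(len(with_repl), n ** k)                                # 125 125
print(len(without_repl), factorial(n) // factorial(n - k))   # 60 60
```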
Permutations vs Combinations
The question that determines which formula to use: does order matter?
- Permutation: An ordered arrangement. ABC is different from CBA.
- Combination: An unordered selection. {A, B, C} is the same as {C, B, A}.
Adjust n and k, then toggle between "Order Matters" and "Order Doesn't Matter" to see how the count changes. Notice that the permutation count is k! times the combination count (k! is the number of ways to arrange k items).
The relationship between them:
P(n,k) = C(n,k) × k!
Because each combination of k items can be arranged in k! different orders.
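A quick check of this relationship with Python's math module (the values of n and k are arbitrary):

```python
from math import comb, perm, factorial

n, k = 10, 4                        # arbitrary values
print(perm(n, k))                   # P(10,4) = 5040 ordered arrangements
print(comb(n, k) * factorial(k))    # 210 combinations, each ordered k! ways -> 5040
```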
Binomial Coefficients and Pascal's Triangle
The binomial coefficient C(n,k), read "n choose k," counts the number of ways to choose k items from n items (order doesn't matter).
C(n,k) = n! / (k! × (n-k)!)
Adjust n and k to see the highlighted position in Pascal's Triangle. Notice how each entry equals the sum of the two entries directly above it; this is Pascal's Rule: C(n,k) = C(n-1,k-1) + C(n-1,k).
Important identities:
- Symmetry: C(n,k) = C(n,n-k): choosing who's IN is the same as choosing who's OUT
- Pascal's Rule: C(n,k) = C(n-1,k-1) + C(n-1,k): explains the triangle structure
- Sum of row: C(n,0) + C(n,1) + ... + C(n,n) = 2^n: total number of subsets
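All three identities are easy to verify numerically; here's a quick check for one arbitrarily chosen row of the triangle.

```python
from math import comb

n = 8                                        # an arbitrary row of Pascal's Triangle
for k in range(n + 1):
    assert comb(n, k) == comb(n, n - k)                          # symmetry
for k in range(1, n):
    assert comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)     # Pascal's Rule
print(sum(comb(n, k) for k in range(n + 1)), 2 ** n)             # row sum: 256 256
```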
Story Proofs
A story proof proves an identity by interpreting both sides as counting the same thing in different ways. This is often more insightful than algebraic manipulation.
Click on people to select them for the committee. Notice that you're simultaneously determining who's NOT on the committee, and there are exactly the same number of ways to make either choice.
Story proofs turn abstract algebra into intuitive arguments. To prove C(n,k) = C(n,n-k): choosing a committee of k from n people is the same as choosing the n-k people who are NOT on the committee.
The Birthday Problem
One of probability's most counterintuitive results: How many people do you need in a room for there to be a 50% chance that two share a birthday?
The answer: just 23 people.
Adjust the number of people and run simulations to see how the empirical probability matches the theoretical calculation. With 23 people, you should see matches about half the time.
Why is this surprising? The key insight is that with 23 people, there are C(23,2) = 253 pairs that could match. We're not asking whether anyone shares YOUR birthday; we're asking whether ANY two people share A birthday.
The probability of NO match with k people is:
P(no match) = (365/365) × (364/365) × (363/365) × ... × ((366-k)/365)
The birthday problem illustrates a common error: underestimating the number of ways things can happen. With k people there are C(k,2) pairs; this count grows quadratically, making matches far more likely than intuition suggests.
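If you'd rather check than trust the formula, here's a small sketch that computes the exact no-match product and also estimates the probability by simulation (10,000 trials is an arbitrary choice).

```python
import random

def p_match(k, days=365):
    """Exact probability that at least two of k people share a birthday."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (days - i) / days
    return 1 - p_no_match

def simulate(k, trials=10_000, days=365):
    """Empirical estimate: fraction of simulated rooms containing a shared birthday."""
    hits = sum(len(set(random.randrange(days) for _ in range(k))) < k
               for _ in range(trials))
    return hits / trials

print(round(p_match(23), 4))   # about 0.5073
print(simulate(23))            # should land close to the exact value
```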
Inclusion-Exclusion Principle
When events overlap, we can't just add their probabilities. The inclusion-exclusion principle corrects for overcounting:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Adjust the probabilities using the sliders. Watch how the Venn diagram changes and verify that the union probability follows the inclusion-exclusion formula.
The intuition: When we add P(A) + P(B), the intersection is counted twice. We subtract P(A ∩ B) to count it exactly once.
For three events:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)
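You can confirm the three-event version on a small equally likely sample space, such as a single roll of a fair die (the three events below are just an example); using Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

S = set(range(1, 7))
A, B, C = {2, 4, 6}, {4, 5, 6}, {2, 3, 5}   # even, greater than 3, prime

def p(event):
    return Fraction(len(event), len(S))

lhs = p(A | B | C)
rhs = p(A) + p(B) + p(C) - p(A & B) - p(A & C) - p(B & C) + p(A & B & C)
print(lhs, rhs)   # 5/6 5/6: every outcome except rolling a 1
```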
Application: Poker Hand Probabilities
Let's apply our counting techniques to calculate exact probabilities of poker hands.
Click through different poker hands to see how the counting breaks down. Each calculation uses binomial coefficients to count the favorable outcomes, then divides by C(52,5) total possible hands.
For example, a full house (three of one rank, two of another):
- Choose the rank for three-of-a-kind: C(13,1) = 13 ways
- Choose which 3 suits: C(4,3) = 4 ways
- Choose the rank for the pair: C(12,1) = 12 ways
- Choose which 2 suits: C(4,2) = 6 ways
- Total: 13 × 4 × 12 × 6 = 3,744 favorable hands
- Probability: 3,744 / 2,598,960 ≈ 0.144%
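The same arithmetic in Python, using math.comb:

```python
from math import comb

total_hands = comb(52, 5)                                          # 2,598,960
full_house = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2)   # 3,744
print(full_house, total_hands)
print(f"{full_house / total_hands:.5%}")                           # about 0.14406%
```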
Probability Axioms
The naive definition is restrictive. The general definition of probability requires just two axioms:
- P(S) = 1, P(∅) = 0: Something must happen, and nothing is impossible
- Countable additivity: For disjoint events A₁, A₂, ..., P(A₁ ∪ A₂ ∪ ...) = P(A₁) + P(A₂) + ...
From these axioms, we can derive:
- P(Aᶜ) = 1 - P(A)
- If A ⊆ B, then P(A) ≤ P(B)
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
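To see that these consequences don't depend on equally likely outcomes, here's a tiny probability space with unequal, made-up outcome probabilities; additivity is the only rule the code uses.

```python
# A tiny probability space with unequal outcome probabilities (made-up numbers).
P_outcome = {"a": 0.5, "b": 0.25, "c": 0.25}

def P(event):
    """Additivity: the probability of an event is the sum over its outcomes."""
    return sum(P_outcome[x] for x in event)

S = set(P_outcome)
A, B = {"a"}, {"a", "b"}

print(P(S))                                # 1.0: something must happen
print(P(S - A), 1 - P(A))                  # complement rule: 0.5 0.5
print(P(A) <= P(B))                        # monotonicity: A ⊆ B gives P(A) ≤ P(B) -> True
print(P(A | B), P(A) + P(B) - P(A & B))    # inclusion-exclusion: 0.75 0.75
```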
The axioms tell us how probability must behave mathematically, but not how to interpret it. Frequentists interpret probability as long-run frequency. Bayesians interpret it as degree of belief. Both views are useful for building intuition.
Common Pitfalls
Leibniz's Mistake
Even great mathematicians can err. Leibniz wrongly argued that rolling two dice gives equal probability for sums of 11 and 12, claiming each can be attained in "only one way."
The error: treating the dice as indistinguishable. If we label them A and B:
- Sum of 11: (A=5, B=6) or (A=6, B=5): 2 ways
- Sum of 12: (A=6, B=6): 1 way
Antidote: Always label your objects. If Leibniz had thought of "left die" and "right die," he wouldn't have made this mistake.
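Labeling the dice is also the easy way to check this by brute force; here's a short enumeration of all 36 ordered rolls.

```python
from itertools import product

# Label the dice, as the antidote suggests: each outcome is (left die, right die).
rolls = list(product(range(1, 7), repeat=2))    # 36 equally likely ordered outcomes

ways_11 = [r for r in rolls if sum(r) == 11]    # [(5, 6), (6, 5)]
ways_12 = [r for r in rolls if sum(r) == 12]    # [(6, 6)]
print(len(ways_11) / 36, len(ways_12) / 36)     # 2/36 vs 1/36
```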
The Naive Definition Trap
Not all outcomes are equally likely. "It will either happen or it won't" doesn't mean 50-50. Before using the naive definition, verify that symmetry or design actually makes outcomes equally likely.
Where to Go Next
This post covers the counting machinery that probability is built on. The next step is conditional probability, where you ask: given that I know something happened, how does that change what I believe? That leads to Bayes' rule, which is where probability really starts to get useful for reasoning about the real world.