Conditional Probability: How Beliefs Update with Evidence
Probability is a language for expressing our degrees of belief, or uncertainties, about events. But beliefs don't exist in a vacuum. Whenever we observe new evidence, whether we obtain data, see something happen, or learn a fact, our uncertainties should change.
A new observation consistent with an existing belief makes us more confident in that belief. A surprising observation throws that belief into question. Conditional probability is the concept that addresses this fundamental question: how should we update our beliefs in light of the evidence we observe?
The Importance of Thinking Conditionally
When we learn something new, how do we know if it should make us more or less confident in what we already believe?
Conditional probability is essential for scientific, medical, and legal reasoning. It's how we incorporate evidence into our understanding of the world in a logical, coherent manner.
Here's a key insight: all probabilities are conditional. There is always background knowledge or assumptions built into every probability statement, whether stated explicitly or not.
Why all probabilities are conditional
Consider the probability that it will rain today. Before looking outside, you might estimate this based on historical rainfall patterns. But which patterns? Just this month? This season? This location or nearby areas?
To determine any probability, we must decide what background information to condition on. These choices affect the answer. Different people might reasonably come up with different prior probabilities, but everyone can agree on how to update based on new evidence.
A Concrete Example: The Weather
Suppose on a morning we assess the probability that it will rain today: P(R) = 0.2.
We look outside and see ominous clouds in the sky. Now the probability of rain should increase. We denote this new probability as P(R | C) (read as "probability of R given C"), where C is the event of seeing ominous clouds. When we go from P(R) to P(R|C), we say we are "conditioning on C."
As the day progresses, we might observe more evidence: wind picking up, temperature dropping, other weather indicators. Each new observation lets us update our probabilities. Finally, if we observe that it does start raining, then P(R | it is raining) = 1.
Conditioning as a Problem-Solving Tool
Beyond updating beliefs, conditioning is powerful for solving complicated problems. The strategy: decompose a complex probability problem into simpler conditional probability problems.
Just as in computer science we break large problems into bite-sized pieces, in probability we reduce complicated problems to simpler conditional probability problems. A technique called first-step analysis allows us to solve problems with multiple stages by conditioning on what happens in the first step.
Conditioning is the soul of statistics. It's both how we update beliefs to reflect evidence and how we solve complex probability problems.
Definition and Intuition
The Mathematical Definition
If A and B are events with P(B) > 0, then the conditional probability of A given B, denoted P(A|B), is defined as:
P(A|B) = P(A ∩ B) / P(B)
That is, the probability of both A and B happening, divided by the probability of B.
Here, A is the event whose uncertainty we want to update, and B is the evidence we observe or want to treat as given. We call P(A) the prior probability of A (before updating based on evidence) and P(A|B) the posterior probability of A (after updating).
Important: Understanding the conditioning bar
When we write P(A|B), the event appearing after the vertical bar is the evidence we have observed or are conditioning on. P(A|B) is the probability of A given the evidence B, not the probability of some entity called "A|B."
There is no such event as "A|B"; the bar is purely notational. It tells us which information we're treating as known.
Key Property
For any event A with P(A) > 0: P(A|A) = 1
This makes perfect sense: if we learn that A has occurred, our updated probability that A occurred is 1 (certainty). If conditional probability didn't give us this result, we'd need a completely different definition!
Intuitive Interpretation: The Pebble World
Imagine a finite sample space where outcomes are visualized as pebbles with total mass 1. Event A is a set of pebbles, and event B is another set.
- Remove incompatible outcomes: When we learn B occurred, we eliminate all pebbles outside of B (they're incompatible with our evidence).
- Focus on what remains: the pebbles of A that survive are exactly those in A ∩ B, with total mass P(A ∩ B).
- Renormalize: We divide all masses by P(B) so the new total mass is 1. The renormalized mass in A is P(A|B) = P(A ∩ B) / P(B).
In this way, probabilities are updated consistently with observed evidence. Outcomes contradicting the evidence are discarded, and their mass is redistributed among remaining outcomes, preserving relative probabilities.
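A minimal Python sketch of this pebble picture (the outcome labels and masses below are made up purely for illustration):

```python
# Pebble world: outcomes with masses summing to 1 (illustrative values).
masses = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
A = {"a", "b"}        # event A
B = {"b", "c"}        # observed evidence B

# Step 1: discard pebbles outside B.  Step 2: renormalize by P(B).
p_B = sum(m for o, m in masses.items() if o in B)
posterior = {o: m / p_B for o, m in masses.items() if o in B}

# P(A|B) is the renormalized mass of the pebbles of A that survived.
p_A_given_B = sum(m for o, m in posterior.items() if o in A)
print(round(p_A_given_B, 3))  # 0.2 / 0.5 = 0.4
```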
Frequentist Interpretation
Alternatively, imagine repeating an experiment many times and recording outcomes. The conditional probability P(A|B) is the fraction of times A occurs, restricting attention to trials where B occurred.
If we repeated an experiment 10,000 times and B occurred in 3,000 trials, and in those 3,000 trials A occurred 900 times, then P(A|B) ≈ 900/3000 = 0.3.
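To make this frequentist reading concrete, here is a small simulation sketch; the experiment (two fair dice) and the events A and B are my own illustrative choices:

```python
import random

random.seed(0)
N = 100_000
count_B = count_AB = 0

for _ in range(N):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    B = d1 >= 4          # evidence: first die shows at least 4
    A = d1 + d2 >= 9     # event of interest: sum is at least 9
    if B:
        count_B += 1
        if A:
            count_AB += 1

# P(A|B) estimated as the fraction of B-trials in which A also occurred.
print(count_AB / count_B)  # close to the exact value (9/36) / (18/36) = 0.5
```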
Two Classic Examples with Subtleties
The Two Children Problem
Martin Gardner posed this famous puzzle in the 1950s:
Part 1: Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?
Part 2: Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?
Gardner's answers were 1/2 for Part 1 and 1/3 for Part 2. For decades, people debated why the answers should differ. The key is understanding exactly what information we condition on.
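One way to see why the answers differ is to enumerate the four equally likely birth orders and condition on each piece of information. A small sketch, assuming each child is independently equally likely to be a boy or a girl:

```python
from itertools import product

# All equally likely (older, younger) combinations: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Part 1: condition on "the older child is a girl".
older_girl = [f for f in families if f[0] == "G"]
print(sum(f == ("G", "G") for f in older_girl) / len(older_girl))  # 1/2

# Part 2: condition on "at least one child is a boy".
at_least_one_boy = [f for f in families if "B" in f]
print(sum(f == ("B", "B") for f in at_least_one_boy) / len(at_least_one_boy))  # 1/3
```

Conditioning on "the older child is a girl" leaves two equally likely families (GG and GB), so the answer is 1/2; conditioning on "at least one is a boy" leaves three (BB, BG, GB), only one of which is two boys, so the answer is 1/3.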
The importance of how data was collected
This problem illustrates a fundamental principle in statistics: it's essential to think carefully about how the sample was collected, not just what the raw data shows.
Two different ways of learning about the children (being told the sex of the older child specifically vs. learning only that at least one child is a boy) give different conditional probabilities. The statistical process matters!
Bayes' Rule: The Most Useful Formula
From the definition of conditional probability, we can derive something remarkable. Multiplying both sides of the definition by P(B) gives P(A ∩ B) = P(A|B) · P(B). By the same reasoning with the roles of A and B swapped, P(A ∩ B) = P(B|A) · P(A). Setting these equal and solving for P(A|B) gives Bayes' rule:
P(A|B) = P(B|A) · P(A) / P(B)
In words: Posterior = Likelihood × Prior / Evidence.
This seems circular at first: we've expressed the conditional probability P(A|B) in terms of another conditional probability, P(B|A). The power of Bayes' rule is that P(B|A) often turns out to be much easier to find directly than P(A|B) (or vice versa).
A Medical Testing Example
Fred is tested for a rare disease that afflicts 1% of the population. The test is 95% accurate, meaning:
- P(test positive | has disease) = 0.95 (sensitivity)
- P(test negative | doesn't have disease) = 0.95 (specificity)
Fred's test comes back positive. What's the probability he actually has the disease?
Intuitively, you might think: 95% accurate, so 95% chance he has it. But that's wrong! We need Bayes' rule.
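Here is the Bayes' rule computation, as a minimal Python sketch (the variable names are my own):

```python
p_disease = 0.01          # prior: 1% of the population has the disease
sens = 0.95               # sensitivity: P(test positive | has disease)
spec = 0.95               # specificity: P(test negative | doesn't have disease)

# P(test positive) by the law of total probability.
p_positive = sens * p_disease + (1 - spec) * (1 - p_disease)

# Bayes' rule: P(has disease | test positive).
p_disease_given_positive = sens * p_disease / p_positive
print(round(p_disease_given_positive, 3))  # about 0.161, i.e. roughly 16%
```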
Why Is This Surprising?
Most people find this shocking. The test is 95% accurate, yet only a 16% chance Fred has the disease?
The key: the disease is rare. Even though false positives are uncommon (5% of healthy people test positive), there are many more healthy people than sick people. So in a population of 10,000:
- 100 have the disease; 95 test positive (true positives)
- 9,900 are healthy; 495 test positive (false positives)
- Total positive tests: 590
- Probability of disease given positive test: 95/590 ≈ 16%
The false positives vastly outnumber the true positives! This is why medical tests need to be interpreted carefully and second opinions are valuable.
The Law of Total Probability
Often we want to find P(B) but don't know it directly. We can decompose B into disjoint pieces and use the Law of Total Probability:
If A₁, A₂, ..., Aₙ partition the sample space (disjoint and exhaustive), then:
P(B) = P(B|A₁)P(A₁) + P(B|A₂)P(A₂) + ... + P(B|Aₙ)P(Aₙ)
In other words, the total probability of B is a weighted sum of conditional probabilities. To compute P(B), we divide the sample space into disjoint slices (A₁, A₂, ..., Aₙ), find the conditional probability of B within each slice, then take a weighted sum where the weights are the probabilities P(Aᵢ).
The Random Coin Problem
You have one fair coin (heads with probability 1/2) and one biased coin (heads with probability 3/4). You pick one at random and flip it three times. It lands heads all three times.
What's the probability you picked the fair coin?
Scenario: Three flips all landed Heads. Which coin did you pick?
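Here is how the law of total probability and Bayes' rule combine to answer this, under the stated setup (each coin equally likely to be picked); a minimal Python sketch:

```python
p_fair = 0.5                      # prior: each coin equally likely to be picked
p_hhh_given_fair = 0.5 ** 3       # (1/2)^3 = 1/8
p_hhh_given_biased = 0.75 ** 3    # (3/4)^3 = 27/64

# Law of total probability: P(HHH), averaging over which coin was picked.
p_hhh = p_hhh_given_fair * p_fair + p_hhh_given_biased * (1 - p_fair)

# Bayes' rule: P(fair coin | HHH).
p_fair_given_hhh = p_hhh_given_fair * p_fair / p_hhh
print(round(p_fair_given_hhh, 3))  # 8/35, about 0.229
```

Three heads in a row is evidence for the biased coin, so the probability that you picked the fair coin drops from 1/2 to 8/35, about 0.23.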
Conditional Probabilities Are Probabilities
Here's a profound insight: when we condition on event E, we're effectively putting ourselves in a universe where E is known to be true. In this new universe, the laws of probability operate exactly as before.
Conditional probabilities satisfy all the axioms of probability:
- Conditional probabilities are between 0 and 1
- P(A|E) + P(Aᶜ|E) = 1 (the complement rule still holds)
- If A and B are disjoint, then P(A ∪ B | E) = P(A|E) + P(B|E)
- Any result we derived about probability still holds with conditioning on E
All probabilities are conditional on background knowledge
In fact, we can think of every probability statement as conditional. There's always background knowledge K, even if unspoken. Then P(A) is shorthand for P(A|K).
When you say "the probability of rain today is 20%," you're implicitly conditioning on your knowledge of the climate, season, location, and current weather patterns. Different people might assign different prior probabilities based on different background knowledge, but everyone can agree on how to update those beliefs given new evidence.
Independence: When Conditioning Changes Nothing
Sometimes, learning that B occurred tells us absolutely nothing about whether A occurs. In this case, A and B are independent.
Events A and B are independent if and only if:
P(A ∩ B) = P(A) · P(B)
Or equivalently (provided the conditional probabilities are defined, i.e. P(A) > 0 and P(B) > 0): P(A|B) = P(A) and P(B|A) = P(B)
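For instance, with two fair dice, the events "the first die shows a 6" and "the sum is 7" are independent, which a direct enumeration confirms (the dice example is my own illustration):

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls
A = {o for o in outcomes if o[0] == 6}            # first die shows a 6
B = {o for o in outcomes if sum(o) == 7}          # the sum is 7

p = lambda event: len(event) / len(outcomes)
print(p(A & B), p(A) * p(B))  # both equal 1/36, so A and B are independent
```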
Independence vs. Disjointness
These are very different concepts! If A and B are disjoint (mutually exclusive), then P(A ∩ B) = 0, so they're definitely not independent (unless one has probability 0). Disjoint events provide strong information about each other: if A happens, B definitely didn't.
Independence means the events provide no information about each other. Knowing A happened doesn't change the probability of B.
Conditional Independence
Events can be independent overall but dependent when we condition on something. Or vice versa.
Two events can be:
- Conditionally independent given E, but not independent (see the sketch after this list)
- Independent, but not conditionally independent given E
- Both? Neither? Context matters!
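As a concrete illustration of the first case, reuse the two coins from the random coin problem: pick the fair coin or the biased coin at random and flip it twice. Given which coin you picked, the flips are independent; unconditionally, they are not, because the first flip carries information about which coin you hold. A minimal sketch (this particular pairing of events is my own illustration):

```python
# Coin chosen uniformly at random: fair (heads 1/2) or biased (heads 3/4).
p_heads = {"fair": 0.5, "biased": 0.75}

# A = first flip heads, B = second flip heads.
# Given the coin, the flips are independent:
#   P(A and B | coin) = P(A | coin) * P(B | coin) for each coin.

# Unconditionally, average over the coin choice:
p_A = sum(0.5 * p for p in p_heads.values())        # 0.625 (same for B)
p_AB = sum(0.5 * p * p for p in p_heads.values())    # 0.40625

print(p_AB, p_A * p_A)  # 0.40625 vs 0.390625: not equal, so A and B are dependent
```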
The Coherency of Bayes' Rule
An important property: it doesn't matter whether you update your beliefs sequentially (incorporating evidence one piece at a time) or all at once. You get the same answer.
If Fred takes two independent tests for the disease and both come back positive, you can either:
- Update sequentially: Use the first positive test to get a posterior probability (say, 16%), then use that as your new prior for the second test.
- Update all at once: Use both positive tests together with the original prior.
Both approaches yield the same answer. This is powerful: it means the order in which we encounter evidence doesn't matter, only the total evidence.
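A quick numerical check of this coherency, reusing the medical-test numbers from above and assuming the two tests are conditionally independent given disease status:

```python
def bayes_update(prior, sens, spec):
    """Posterior probability of disease after one positive test."""
    return sens * prior / (sens * prior + (1 - spec) * (1 - prior))

prior, sens, spec = 0.01, 0.95, 0.95

# Sequential updating: test 1, then test 2 with the first posterior as the new prior.
sequential = bayes_update(bayes_update(prior, sens, spec), sens, spec)

# All-at-once updating: likelihood of two positives is sens^2 if diseased,
# (1 - spec)^2 if healthy.
all_at_once = sens**2 * prior / (sens**2 * prior + (1 - spec)**2 * (1 - prior))

print(round(sequential, 4), round(all_at_once, 4))  # both about 0.7848
```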
A Classic Problem: Monty Hall
One of the most famous problems in probability is the Monty Hall Problem. It beautifully illustrates how conditioning on new information can dramatically change probabilities, and why many people find conditional probability counterintuitive.
The setup: You're on a game show. There are three doors. Behind one is a car, behind two are goats. You choose a door. The host, who knows where the car is, opens one of the other doors and reveals a goat. Now you have a choice: stick with your original choice, or switch to the remaining unopened door. Should you switch?
Three doors. One car, two goats. Choose wisely!
The answer is yes, you should switch. By switching, your probability of winning increases from 1/3 to 2/3. This counterintuitive result stumped many people when it was widely publicized in the 1990s. The key insight is that the host's action of revealing a goat provides information that changes the conditional probabilities: you're not choosing between two equally likely doors, because the host's knowledge makes the remaining door more likely to hide the car.
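A short simulation sketch makes the 1/3 vs. 2/3 split easy to verify:

```python
import random

random.seed(0)
N = 100_000
wins_stick = wins_switch = 0

for _ in range(N):
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that is neither the player's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    switched = next(d for d in doors if d != pick and d != opened)
    wins_stick += (pick == car)
    wins_switch += (switched == car)

print(wins_stick / N, wins_switch / N)  # about 1/3 by sticking, about 2/3 by switching
```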
Practical Takeaways
- Always condition on your knowledge: Probabilities are only meaningful in context. Different background knowledge can justify different prior probabilities.
- Bayes' rule connects what we can measure to what we want to know: Often P(evidence|hypothesis) is easier to compute than P(hypothesis|evidence).
- Beware the base rate: Even if evidence is reliable, if the base rate is low, a positive result might still be more likely to be false than true.
- How the data was collected matters: Different ways of learning the same information can yield different conditional probabilities.
- Conditioning is a problem-solving tool: Break complex problems into cases, solve each with conditional probability, combine with the law of total probability.
Next Steps
Conditional probability is the foundation for understanding:
- Hypothesis testing: How do we decide if evidence supports a theory?
- Bayesian inference: How do we systematically update beliefs?
- Information theory: How much does evidence reduce uncertainty?
- Decision-making: How do we choose optimally under uncertainty?
- Machine learning: How do models learn from data?
All of these rest on the bedrock of conditional probability.
Remember: Conditioning is the soul of statistics. When you learn something new, use conditional probability to update your beliefs logically and consistently.