Part 2: Central Tendency & Spread
In Part 1, we learned that larger samples give more reliable estimates. But once you have your sample, what do you do with it? Usually, you summarize it with a single number - an "average."
But which average? And why does this choice matter mathematically?
Two definitions of "center"
When someone says "average," they usually mean one of two things.
The mean (μ) is the arithmetic average:
μ = (x₁ + x₂ + ... + xₙ) / n = (1/n) × Σxᵢ
The median is the middle value when sorted:
Median = x₍(n+1)/2₎ for odd n
       = (x₍n/2₎ + x₍n/2+1₎) / 2 for even n
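To make these definitions concrete, here's a minimal sketch in plain Python that implements each formula directly (no libraries needed):

```python
def mean(xs):
    # Balance point: the total of all values divided by the count.
    return sum(xs) / len(xs)

def median(xs):
    # Position: the middle element of the sorted data.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average the two middle values

print(mean([10, 12, 14, 16, 18]))    # 14.0
print(median([10, 12, 14, 16, 18]))  # 14
```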
Both measure "center," but they define it differently. The mean is a balance point - the value where the data "balances" like a seesaw. The median is a position - literally the middle element.
Key Question: If both measure the center, when does it matter which one you use?
Answer: When your data has outliers. Let's prove why mathematically.
Why outliers break the mean
The mathematical vulnerability
Consider a dataset [10, 12, 14, 16, 18] with mean = 14 and median = 14.
Now add one outlier: 100.
New mean:
μ' = (10 + 12 + 14 + 16 + 18 + 100) / 6
= 170 / 6
= 28.33
New median:
Sorted: [10, 12, 14, 16, 18, 100]
Middle values: 14 and 16
Median = (14 + 16) / 2 = 15
The mean jumped from 14 to 28.33 (+102%), while the median only moved from 14 to 15 (+7%).
Experiment 1: Start with the default outlier value of 100. See how dramatically the mean shifts compared to the original.
Experiment 2: Click "Show Mathematical Derivation" to see the step-by-step calculation of exactly how much the mean shifts.
Experiment 3: Try extreme outlier values (150, 200). Watch how the percentage change in mean grows with outlier magnitude.
The key insight: The mean's formula includes every value directly, so extreme values have outsized influence.
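You can reproduce the worked example above in a few lines - a quick sketch using Python's built-in statistics module:

```python
import statistics

data = [10, 12, 14, 16, 18]
contaminated = data + [100]  # add the single outlier

print(statistics.mean(data), statistics.median(data))  # 14 14
print(statistics.mean(contaminated),                   # 28.33... (+102%)
      statistics.median(contaminated))                 # 15.0     (+7%)
```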
The breakdown point
How do we quantify this vulnerability? We use the breakdown point - the minimum fraction of contaminated data points needed to make an estimator arbitrarily wrong.
Definition: Breakdown Point
The breakdown point of an estimator is the smallest proportion of observations that must be replaced by arbitrary values to make the estimator give arbitrarily large or small values.
Mean's breakdown point: 0%
Proof: Take any dataset. Replace just ONE value with M. As M → ∞:
Mean' = (Original Sum - replaced value + M) / n
→ ∞ as M → ∞
Even a single contaminated point - a fraction 1/n of the data - can make the mean arbitrarily large. As n → ∞, this fraction → 0, so the mean's breakdown point is 0%.
Median's breakdown point: 50%
Proof: The median depends only on which value occupies the middle position. To push the median beyond the original data's range, contaminating values must take over that middle position - which requires replacing at least half the data points.
- Replace 49% of points with ∞: the middle position is still held by original data, so the median stays bounded
- Replace 51% of points with ∞: median = ∞
Therefore, breakdown point = 50%.
Experiment 1: Set contamination to 20%. See the mean shift dramatically while the median stays stable.
Experiment 2: Increase contamination to 45%. Median still unchanged!
Experiment 3: Push contamination to 50% or higher. NOW the median breaks.
The math in action: You can contaminate nearly half your data and the median will still give you a valid answer. The mean breaks with a single bad point.
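If you want to replicate the contamination experiment outside the visualization, here's a minimal sketch (the 100-point dataset and the value 1e9 as a stand-in for ∞ are arbitrary choices):

```python
import statistics

n = 100
data = list(range(n))  # clean data: 0..99, median 49.5

for fraction in (0.20, 0.45, 0.49, 0.51):
    k = int(n * fraction)
    # Replace the k smallest points with a huge value (our stand-in for ∞).
    contaminated = [1e9] * k + data[k:]
    print(f"{fraction:.0%}: mean={statistics.mean(contaminated):.4g}, "
          f"median={statistics.median(contaminated)}")
```

Below 50% contamination the median stays within the range of the original data; at 51% it jumps to the contaminating value, exactly as the proof predicts. The mean explodes at every contamination level.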
The breakdown point explains the trade-off. The mean (0% breakdown) is an excellent estimator when data is clean but can be ruined by a single bad point. The median (50% breakdown) stays reliable even when nearly half your data is garbage.
This is why income statistics report the median (income data is skewed by a handful of billionaires) while physics experiments use the mean (measurements are carefully controlled).
Deriving the standard deviation
We've established that the mean and median tell us about center. But two datasets can have the same center yet be completely different. Dataset A: [49, 50, 51] has mean = 50 and is very tight. Dataset B: [0, 50, 100] also has mean = 50 but is very spread out.
We need a measure of spread. Let's derive one from first principles.
Attempt 1: average deviation (fails!)
Intuition: measure how far each point is from the mean, then average.
Average Deviation = Σ(xᵢ - μ) / n
Problem: positive and negative deviations cancel!
For [49, 50, 51] with μ = 50:
Deviations: -1, 0, +1
Sum: -1 + 0 + 1 = 0
Average deviation = 0
This is useless - it always equals zero (which we can prove: Σ(xᵢ - μ) = Σxᵢ - nμ = nμ - nμ = 0).
Attempt 2: average absolute deviation
Fix the cancellation by taking absolute values. This gives the mean absolute deviation (MAD):
MAD = Σ|xᵢ - μ| / n
This works! For [49, 50, 51]: MAD = (1 + 0 + 1) / 3 = 0.67
But absolute values are mathematically inconvenient (not differentiable at zero, harder to work with algebraically).
The standard deviation
Instead of absolute values, square the deviations (makes them all nonnegative), then take the square root at the end:
Derivation of Standard Deviation
Step 1: Calculate squared deviations
(xᵢ - μ)² for each i
Step 2: Average them (this is the variance)
σ² = Σ(xᵢ - μ)² / n
Step 3: Take square root (to return to original units)
σ = √[Σ(xᵢ - μ)² / n]
Why square then square root? Squaring makes all terms nonnegative (solves cancellation) and penalizes larger deviations more (4² = 16 vs 2² = 4). Taking the square root returns to the original units - if the data is in dollars, σ is in dollars.
Step through the derivation: Use Previous/Next to walk through each mathematical step. On each step, the visualization updates to show what's being calculated.
Watch the squaring: When you reach step 3, see how the squared deviations appear below each data point. Notice that the point furthest from the mean contributes the most to σ.
Compare the final result: Step 6 shows how taking the square root gives us a value in the same units as our original data.
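The three steps translate directly into code - a from-scratch sketch using only math.sqrt:

```python
import math

def population_std(xs):
    mu = sum(xs) / len(xs)                 # the mean
    squared = [(x - mu) ** 2 for x in xs]  # Step 1: squared deviations
    variance = sum(squared) / len(xs)      # Step 2: average them (σ²)
    return math.sqrt(variance)             # Step 3: back to original units (σ)

print(population_std([49, 50, 51]))  # ≈ 0.82  - the tight dataset A
print(population_std([0, 50, 100]))  # ≈ 40.82 - the spread-out dataset B
```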
Why use σ instead of MAD? First, variance (σ²) has nice algebraic properties - Var(X + Y) = Var(X) + Var(Y) for independent variables, which doesn't hold for MAD. Second, for normal data, about 68% falls within ±1σ and 95% within ±2σ. Third, the whole machinery of statistics is built on variance - remember Standard Error = σ/√n from Part 1.
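The additivity property is easy to check by simulation - a sketch in which the two uniform distributions are arbitrary choices:

```python
import random
import statistics

random.seed(0)
x = [random.uniform(0, 10) for _ in range(100_000)]  # Var(X) ≈ 100/12 ≈ 8.33
y = [random.uniform(0, 5) for _ in range(100_000)]   # Var(Y) ≈ 25/12 ≈ 2.08

print(statistics.pvariance(x) + statistics.pvariance(y))    # ≈ 10.4
print(statistics.pvariance([a + b for a, b in zip(x, y)]))  # ≈ 10.4 - matches
```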
Same mean, different reality
Here's where all this matters in practice.
Product reviews
Two products both have a mean rating of 4.0 stars.
Product A has σ = 0.3 stars. Almost all reviews fall between 3.7 and 4.3 stars - a consistent experience, low-risk purchase.
Product B has σ = 1.5 stars. Reviews spread from 2.5 to 5 - people love it or hate it. High-risk purchase.
The mean tells you nothing about this difference. Standard deviation reveals it.
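A quick simulation makes this concrete - a sketch with invented Gaussian rating distributions (left unbounded for simplicity; real star ratings are clipped to 1-5):

```python
import random
import statistics

random.seed(1)

# Product A: consistent experience - ratings tightly clustered around 4.0
a = [random.gauss(4.0, 0.3) for _ in range(10_000)]
# Product B: polarizing product - same mean rating, far wider spread
b = [random.gauss(4.0, 1.5) for _ in range(10_000)]

for name, ratings in (("A", a), ("B", b)):
    print(name,
          round(statistics.mean(ratings), 2),    # both ≈ 4.0
          round(statistics.pstdev(ratings), 2))  # ≈ 0.3 vs ≈ 1.5
```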
Experiment with outliers: Add outliers one at a time and watch the statistics update.
Watch the median resist: Even with 4 outliers (out of 9 total points), the median barely budges while the mean shifts dramatically.
Find the threshold: Keep adding outliers until the median finally breaks. Count how many outliers it took - that's the 50% breakdown point in action!
When to use what
| Situation | Use Mean | Use Median |
|---|---|---|
| Symmetric, clean data | ✓ | ✓ |
| Known outliers exist | ✗ | ✓ |
| Reporting income/prices | ✗ | ✓ |
| Physical measurements (carefully controlled) | ✓ | Either |
| You need to do further calculations | ✓ | Depends |

| Spread Measure | When to Use |
|---|---|
| Standard Deviation (σ) | Clean, symmetric data; when you'll use it for further statistics |
| IQR (Interquartile Range) | Data with outliers; robust alternative |
| Range | Quick sanity check only; very sensitive to outliers |
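For the robust alternative in the table, here's a short sketch computing the IQR with statistics.quantiles (Python 3.8+), reusing the outlier dataset from earlier:

```python
import statistics

data = [10, 12, 14, 16, 18, 100]  # the outlier example from earlier

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print("IQR:", q3 - q1)                            # 5.0   - barely notices the outlier
print("σ:  ", round(statistics.pstdev(data), 2))  # 32.15 - inflated by it
```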
When someone reports just one summary statistic, they're hiding information. "Average salary: $100,000" - is that mean or median? What's the spread? "Average rating: 4.5 stars" - standard deviation of 0.2 or 1.5? "Average return: 8%" - what was the worst year?
Always ask for both center and spread. Better yet, ask to see the distribution.
What we derived
Starting from the question "how do we summarize data?", we derived the mean (Σxᵢ/n) as a balance point with 0% breakdown point, and the median as a position with 50% breakdown point. We showed why outliers break the mean - the formula includes every value directly. And we derived the standard deviation (√[Σ(xᵢ-μ)²/n]) from the need to measure spread without cancellation.
The choice between mean and median isn't arbitrary - it's determined by your data's contamination level and your tolerance for risk.
In Part 3, we'll look at shape - and derive the Central Limit Theorem, which explains why sample means are approximately normally distributed regardless of the population's shape.