Don't Assume Aggregates Tell the Whole Story

Look beyond averages and totals to uncover the hidden patterns in your data's distribution.

Key Takeaways

  • Averages and totals can mask critical information by flattening a varied distribution into a single number; identical averages can represent completely different realities.
  • When two groups have the same average but different distributions (one consistent, one polarized), decisions based only on the average can lead to the wrong intervention for at least one of them.
  • Using visualization techniques that show distribution details—like histograms, box plots, and dot plots—helps reveal patterns, outliers, and potential equity issues that would otherwise remain hidden.
  • Breaking data down by relevant segments (demographics, time periods, categories) before aggregating helps identify disparities and provides a more complete understanding of what's happening in your data.

Real-world Example

Course Satisfaction Ratings

Average-Only Analysis

Both Course A and Course B have the same average satisfaction rating of 3.5/5. Based on this, the L&D team concludes that both courses are performing similarly and require the same level of improvement.

Distribution Analysis

Looking at the distribution reveals very different stories:

Course A: 3.5/5 avg
Distribution (1-5 scale): consistently moderate ratings

Course B: 3.5/5 avg
Distribution (1-5 scale): polarized "love it or hate it" ratings

The full distribution reveals that Course A needs minor improvements across the board, while Course B has a fundamental design issue where it works extremely well for some learners but fails completely for others.

Same average, completely different improvement strategies needed!
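
A minimal sketch of the same effect in numbers. The ratings below are hypothetical, invented only so that both lists average 3.5; they are not the actual survey responses for either course.

```python
from statistics import mean, pstdev

# Hypothetical ratings on a 1-5 scale (illustrative only, not real survey data).
course_a = [3, 3, 3, 3, 3, 4, 4, 4, 4, 4]   # clustered around the middle
course_b = [1, 1, 1, 2, 5, 5, 5, 5, 5, 5]   # split between low and high

for name, ratings in [("Course A", course_a), ("Course B", course_b)]:
    # pstdev is the population standard deviation: a simple measure of spread.
    print(f"{name}: mean={mean(ratings):.2f}  stdev={pstdev(ratings):.2f}")
```

Both lists report a mean of 3.50, but the standard deviation for the second list is several times larger, which is the signal the average alone hides.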

How to Apply This Principle

1. Show the Distribution

Use visualization types that reveal spread:

  • Histograms for frequency patterns
  • Box plots for quartile distribution
  • Violin plots for density
  • Dot plots for individual data points
  • Averages paired with distribution visuals
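
Even without a charting library, a quick text histogram can expose the shape of a distribution. The sketch below uses only the Python standard library; the function name `text_histogram` and the ratings are hypothetical, chosen for illustration.

```python
from collections import Counter

def text_histogram(ratings, scale=range(1, 6)):
    """Return one line per rating value, with one '#' per response."""
    counts = Counter(ratings)  # missing values count as 0
    return [f"{value}: {'#' * counts[value]}" for value in scale]

# Hypothetical polarized ratings (illustrative only).
course_b = [1, 1, 1, 2, 5, 5, 5, 5, 5, 5]

for line in text_histogram(course_b):
    print(line)
```

The empty rows for ratings 3 and 4 make the "love it or hate it" gap visible at a glance, which a single average of 3.5 cannot show.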

2. Break Down by Segments

Disaggregate data across key dimensions:

  • Demographic categories
  • Geographic regions
  • Time periods
  • Teams or departments
  • Product categories
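
One way to disaggregate before averaging, sketched with the standard library. The segment labels ("new hires", "experienced") and the ratings are assumptions made up for this example, as is the helper name `mean_by_segment`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (segment, rating) records; segments and values are invented.
responses = [
    ("new hires", 1), ("new hires", 1), ("new hires", 2), ("new hires", 1),
    ("experienced", 5), ("experienced", 5), ("experienced", 5),
    ("experienced", 5), ("experienced", 5), ("experienced", 5),
]

def mean_by_segment(records):
    """Group ratings by segment, then average within each group."""
    groups = defaultdict(list)
    for segment, rating in records:
        groups[segment].append(rating)
    return {segment: mean(values) for segment, values in groups.items()}

overall = mean(rating for _, rating in responses)
by_segment = mean_by_segment(responses)

print("overall:", overall)
for segment, value in by_segment.items():
    print(f"{segment}: {value}")
```

In this made-up data the overall mean is 3.5, while the per-segment means sit near opposite ends of the scale, so the overall figure describes neither group well.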

3. Report Distributional Statistics

Include measures beyond just averages:

  • Median (less sensitive to outliers)
  • Range (min/max spread)
  • Standard deviation (variation)
  • Interquartile range (middle 50%)
  • Skewness (asymmetry of the distribution)
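
The measures listed above can be computed together in a few lines. This is a sketch under stated assumptions: the ratings are hypothetical, `summarize` is an illustrative helper name, quartiles use the default method of `statistics.quantiles`, and skewness is the simple population formula (mean cubed deviation divided by the cubed standard deviation). Other tools may use slightly different quartile or skewness definitions.

```python
from statistics import mean, median, pstdev, quantiles

def summarize(values):
    """Return the mean plus the distributional statistics listed above."""
    q1, _, q3 = quantiles(values, n=4)      # quartile cut points
    m, sd = mean(values), pstdev(values)
    # Population skewness; defined as 0.0 here when there is no variation.
    skew = mean((v - m) ** 3 for v in values) / sd ** 3 if sd else 0.0
    return {
        "mean": m,
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "stdev": sd,
        "iqr": q3 - q1,
        "skewness": skew,
    }

# Hypothetical ratings on a 1-5 scale (illustrative only).
course_a = [3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
course_b = [1, 1, 1, 2, 5, 5, 5, 5, 5, 5]

for name, ratings in [("Course A", course_a), ("Course B", course_b)]:
    print(name, summarize(ratings))
```

With these invented numbers the two lists share a mean of 3.5, yet the second has a median of 5, a much wider interquartile range, and negative skewness, each of which points to the cluster of low ratings that the mean conceals.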

"The average obscures as much as it reveals. Once we look beyond summaries to understand the full distribution of our data, we unlock insights that have been hiding in plain sight."
— Amanda Cox, Data Journalist