Outliers Deserve Attention (Not Just Removal)

Discover valuable insights hidden in the extreme data points others might discard.

Key Takeaways

  • Outliers often reflect real phenomena, such as exceptional performance, system failures, or unusual circumstances. Removing them automatically can hide what they reveal about your processes or data.
  • Using visualization techniques that reveal individual data points, like scatter plots, box plots, or dot plots, helps you identify outliers that might otherwise be hidden in summary statistics or bar charts.
  • A systematic approach to outliers (flag them for review, investigate their causes, document the findings) can expose system weaknesses, exceptional performers, and process improvement opportunities.
  • Consider excluding an outlier only after a thorough investigation. Even then, document the reasoning and run the analysis both with and without it.

Real-world Example

Customer Feedback Survey Analysis

Approach 1: Automatic Outlier Removal

An L&D team receives customer satisfaction survey results with an average of 4.6/5 and removes the single score of 1/5 as a statistical outlier. They report "overwhelmingly positive feedback" and make no changes to their program.

Approach 2: Outlier Investigation

The same team notices the 1/5 rating and investigates further. The comment associated with this outlier reveals a critical accessibility issue: "Course was impossible to complete with a screen reader." This leads to fixing an accessibility problem that would have affected many other users over time.

The outlier was a signal of a real problem affecting a minority of users—removing it would have hidden a crucial insight for improvement.

How to Apply This Principle

1. Visualize Individual Data Points

Choose visualization methods that reveal outliers:

  • Scatter plots to show relationships
  • Box plots to summarize the distribution and mark points beyond the whiskers
  • Dot plots to display individual values
  • Histograms to reveal distribution shape
  • Highlight outliers with distinct colors
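
Before choosing colors, it helps to decide which points count as outliers. A minimal sketch, assuming the common box-plot convention of fences at 1.5 times the interquartile range beyond the quartiles; the function names and the completion-time data are hypothetical, and other quantile methods would give slightly different fences. Points are flagged for review, not removed.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" treats the data as the full population of interest.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Pair every value with an is_outlier flag; nothing is dropped."""
    lower, upper = iqr_fences(values, k)
    return [(v, v < lower or v > upper) for v in values]

# Hypothetical course completion times in minutes (illustrative data only).
completion_minutes = [42, 45, 47, 44, 46, 43, 48, 45, 120]

for value, is_outlier in flag_outliers(completion_minutes):
    # Every point is listed; flagged ones get a marker a chart could map to a color.
    print(f"{value:>4}  {'REVIEW' if is_outlier else ''}")
```

The flag can then drive a distinct color or marker in whichever plotting tool you use, so the outlier stays visible in the chart instead of disappearing into an average.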

2. Investigate Systematically

Create a protocol for handling outliers:

  • Verify data accuracy (rule out entry or measurement errors)
  • Investigate contextual factors
  • Look for patterns among outliers
  • Document findings and hypotheses
  • Analyze with and without outliers
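
The last step of this protocol can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: the response data is hypothetical (nine scores of 5 and one score of 1, chosen to average 4.6 like the survey example above), and the review threshold of 2 or lower is an arbitrary choice for the sketch.

```python
import statistics

def summarize(scores):
    """Basic summary statistics for a list of scores."""
    return {
        "n": len(scores),
        "mean": round(statistics.mean(scores), 2),
        "median": statistics.median(scores),
    }

def compare_with_and_without(records, needs_review):
    """Summarize all records and the non-flagged subset, keeping flagged rows for follow-up."""
    flagged = [r for r in records if needs_review(r)]
    kept = [r for r in records if not needs_review(r)]
    return {
        "with": summarize([r["score"] for r in records]),
        "without": summarize([r["score"] for r in kept]),
        "flagged_for_review": flagged,  # documented, not silently discarded
    }

# Hypothetical survey responses (illustrative data only).
responses = [{"score": 5, "comment": ""} for _ in range(9)]
responses.append({
    "score": 1,
    "comment": "Course was impossible to complete with a screen reader.",
})

# Assumed review rule for this sketch: any score of 2 or lower.
report = compare_with_and_without(responses, lambda r: r["score"] <= 2)

print("With outliers:   ", report["with"])
print("Without outliers:", report["without"])
for row in report["flagged_for_review"]:
    print("Needs review:", row["score"], "-", row["comment"])
```

Reporting both summaries side by side shows how much the conclusion depends on the flagged rows, and the review list keeps the comment attached to the score so the cause can be investigated.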

3. Extract Actionable Insights

Turn outlier analysis into improvements:

  • Identify process breakdowns
  • Learn from exceptional performance
  • Discover hidden user segments
  • Flag equity or access issues
  • Generate hypotheses for testing

"Your outliers are trying to tell you something important. When we rush to remove them from our analysis without investigation, we silence the voices that might be pointing to our biggest opportunities for improvement."
— Hadley Wickham, Data Scientist