A comprehensive guide to understanding data types, measurement levels, and choosing the right visualization
Data visualization is both an art and a science. Creating effective visualizations requires an understanding of the nature of your data, including data types and measurement levels. This guide walks you through the fundamental concepts and helps you choose the most appropriate visualization techniques for your data.
The type and level of measurement of your data determine which statistical analyses are appropriate and which visualization methods will be most effective. Making informed choices about visualization leads to clearer communication and more accurate interpretation of your data.
The most fundamental distinction in data types is between qualitative and quantitative data.
Also known as categorical data, qualitative data describes qualities or characteristics that cannot be measured numerically. It can be observed but not measured.
Examples: Color, gender, race, hair color, country, taste, smell
Quantitative data can be counted or measured using numbers. It represents quantities, amounts, or ranges.
Examples: Height, weight, age, temperature, scores, counts, prices
Qualitative data addresses the "what" or "which type" questions, while quantitative data addresses "how much" or "how many" questions. Understanding this distinction is the first step in proper data analysis and visualization.
Data can be classified into four levels of measurement, which determine what mathematical operations can be performed on the data and what visualizations are appropriate.
Nominal data consists of categories with no inherent order or ranking. Each value represents a distinct category.
Operations allowed: Mode, frequency count, percentage
Common visualizations: Bar charts, pie charts, treemaps
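The operations listed for nominal data (mode, frequency count, percentage) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method; the `brands` sample is made up for the example.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (car brands have no inherent order).
brands = ["Toyota", "Ford", "Toyota", "Honda", "Ford", "Toyota"]

counts = Counter(brands)                     # frequency count per category
total = sum(counts.values())
percentages = {brand: 100 * n / total for brand, n in counts.items()}
mode, mode_count = counts.most_common(1)[0]  # most frequent category

# Print categories from most to least frequent, as a bar chart would show them.
for brand, n in counts.most_common():
    print(f"{brand:<8}{n:>3}  {percentages[brand]:5.1f}%")
print("mode:", mode)
```

Note that nothing here relies on the order of the categories or on arithmetic between them, which is why these are the only summaries available at the nominal level.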
Ordinal data consists of categories with a clear, meaningful order or ranking, but the differences between values are not uniform or quantifiable.
Operations allowed: Mode, median, frequency count, percentages, rank ordering
Common visualizations: Bar charts, stacked bars, heatmaps, dot plots
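To see why the median is allowed for ordinal data while the mean is not, consider the sketch below. It ranks each response by its position on the scale and picks the middle one. The `LEVELS` scale and the `responses` sample are assumptions made for the example.

```python
from statistics import median_low

# Hypothetical ordered scale; the order is part of the variable's definition.
LEVELS = ["High school", "Bachelor's", "Master's", "Doctorate"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical survey responses.
responses = ["Bachelor's", "High school", "Master's", "Bachelor's", "Doctorate"]

# median_low always returns a value that occurs in the data, so the result
# maps back to a real category even when the number of responses is even.
ranks = [RANK[r] for r in responses]
median_level = LEVELS[median_low(ranks)]

print("median level:", median_level)
# Averaging the ranks would assume equal spacing between levels, which
# ordinal data does not guarantee, so no mean is computed here.
```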
Interval data has order and equal distances between values, but no true zero point. The zero point is arbitrary and doesn't represent the complete absence of the measured attribute.
Operations allowed: Mean, median, mode, standard deviation, addition, subtraction
Common visualizations: Line charts, histograms, heatmaps, area charts
Ratio data has all the properties of interval data plus a true zero point that represents the complete absence of the measured attribute.
Operations allowed: All mathematical operations (addition, subtraction, multiplication, division)
Common visualizations: All quantitative chart types, including bar charts, line charts, scatter plots, histograms, and box plots
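The practical difference between interval and ratio data shows up when you divide. The sketch below compares two temperatures in Celsius (interval, arbitrary zero) and in Kelvin (ratio, true zero). The helper name `celsius_to_kelvin` and the sample values are chosen for the example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale reading to the ratio-scale Kelvin equivalent."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios only make sense when zero means "none of the quantity".
ratio_c = b_c / a_c   # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = b_k / a_k   # about 1.035, the physically meaningful ratio

print(f"difference: {diff_c:.1f} °C, {diff_k:.1f} K")
print(f"ratio in °C: {ratio_c:.3f}, ratio in K: {ratio_k:.3f}")
```

The same reasoning is why a bar chart of Celsius values can mislead: bar length implies a ratio that the scale does not support.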
Each level of measurement includes all the properties and allowed operations of the levels below it. Ratio is the highest level, allowing all mathematical operations, while nominal is the lowest, allowing only equality comparisons and counts.
Measurement Level | Ordering | Equal Intervals | True Zero | Example |
---|---|---|---|---|
Nominal | No | No | No | Car brands |
Ordinal | Yes | No | No | Education levels |
Interval | Yes | Yes | No | Temperature (°C) |
Ratio | Yes | Yes | Yes | Height (cm) |
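
The cumulative nature of the four levels can be expressed as a small lookup. This is one possible encoding under the assumption that each level simply adds operations to the levels below it. The names `Level`, `ADDED_OPERATIONS`, and `allowed_operations` are hypothetical.

```python
from enum import IntEnum

class Level(IntEnum):
    """Measurement levels, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations each level adds on top of the levels below it,
# taken from the "Operations allowed" lists in this guide.
ADDED_OPERATIONS = {
    Level.NOMINAL: ["mode", "frequency count", "percentage"],
    Level.ORDINAL: ["median", "rank ordering"],
    Level.INTERVAL: ["mean", "standard deviation", "addition", "subtraction"],
    Level.RATIO: ["multiplication", "division"],
}

def allowed_operations(level: Level) -> list[str]:
    """Return every operation permitted at `level`, including inherited ones."""
    operations = []
    for lower in Level:              # iterates in definition order
        if lower <= level:
            operations.extend(ADDED_OPERATIONS[lower])
    return operations

print(allowed_operations(Level.ORDINAL))
```

A lookup like this can guard an analysis step, for example by refusing to compute a mean when a column is tagged as ordinal.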
Quantitative data can be further classified as discrete or continuous, which affects how we collect, analyze, and visualize the data.
Discrete data can only take specific values, typically counted as whole numbers with gaps between possible values.
Examples: Counts and whole-number scores, such as the number of items sold or goals scored
Continuous data can take any value within a range, including decimals and fractions. There are no gaps between possible values.
Examples: Height, weight, temperature
Discrete data is often visualized using bar charts, while continuous data is typically visualized using histograms or density plots that show the distribution across the range of possible values.
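The difference in treatment can be sketched as follows: discrete values are counted one by one, while continuous values are first grouped into equal-width bins. Both samples and the `histogram` helper are assumptions made for the example, and the bin edges are chosen by hand.

```python
from collections import Counter

# Discrete: each distinct value gets its own bar.
children_per_household = [0, 2, 1, 2, 3, 0, 2, 1]   # hypothetical counts
discrete_counts = Counter(children_per_household)

# Continuous: values are grouped into equal-width bins before counting.
heights_cm = [158.2, 171.5, 165.0, 180.3, 169.9, 175.1, 162.4, 177.8]  # hypothetical

def histogram(values, bin_start, bin_width, n_bins):
    """Count values per bin [start + i*width, start + (i+1)*width).

    Values outside the covered range are clamped into the first or last bin.
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - bin_start) // bin_width)
        index = min(max(index, 0), n_bins - 1)
        counts[index] += 1
    return counts

bins = histogram(heights_cm, bin_start=150, bin_width=10, n_bins=4)

for value in sorted(discrete_counts):
    print(f"{value} children: {'#' * discrete_counts[value]}")
for i, n in enumerate(bins):
    print(f"{150 + 10 * i}-{160 + 10 * i} cm: {'#' * n}")
```

Changing `bin_width` changes the shape of the histogram, which is a design decision that discrete bar charts do not require.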
Data visualization serves several key purposes that help us understand and communicate data more effectively.
Visualizations allow us to compare values, categories, or changes over time more easily than looking at raw numbers.
Example visualizations: Bar charts, spider/radar charts, bullet charts
Visualizations can reveal the shape, center, and spread of data distributions, helping identify patterns, outliers, and central tendencies.
Example visualizations: Histograms, box plots, violin plots, density plots
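A box plot is built from a handful of summary numbers, which can be computed directly. The sketch below uses the standard library's `statistics.quantiles` and the common 1.5 × IQR rule for flagging outliers. That rule is a widespread convention rather than something this guide specifies, and the sample values are made up.

```python
import statistics

# Hypothetical measurements with one unusually large value.
values = [12, 15, 14, 10, 18, 20, 16, 13, 45]

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range and the conventional 1.5 * IQR fences (an assumption).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"min={min(values)} q1={q1} median={median} q3={q3} max={max(values)}")
print("outliers:", outliers)
```

Different tools use slightly different quartile definitions, so the exact box edges may vary between libraries.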
Visualizations help us understand how parts relate to the whole and how different categories contribute to a total.
Example visualizations: Pie charts, stacked bar charts, treemaps, area charts
Visualizations can reveal correlations, patterns, and connections between variables that might not be apparent in raw data.
Example visualizations: Scatter plots, bubble charts, heatmaps, network diagrams
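A scatter plot is often paired with a correlation coefficient that summarizes the linear relationship it shows. Here is a minimal sketch of Pearson's r written out by hand. The function name `pearson_r` and the sample data are assumptions for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical paired observations.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A single coefficient can hide curved or clustered patterns, so it complements the scatter plot rather than replacing it.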
Visualizations help us understand trends, cycles, and anomalies in time-series data.
Example visualizations: Line charts, area charts, candlestick charts, Gantt charts
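One common way to make a trend easier to see on a line chart is to overlay a moving average. The following sketch assumes a simple unweighted window. The function name and the `monthly` values are hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average over each full window of `values`."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly figures with short-term ups and downs.
monthly = [100, 120, 90, 130, 110, 150]
smoothed = moving_average(monthly, window=3)

for i, value in enumerate(smoothed):
    print(f"months {i + 1}-{i + 3}: {value:.1f}")
```

A wider window smooths more aggressively but also hides short cycles and anomalies, so the window size should match the question being asked.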
Visualizations can display how data varies across geographic regions and reveal spatial patterns.
Example visualizations: Choropleth maps, cartograms, dot density maps, flow maps
People can often spot a pattern in a chart faster than in a table of numbers or a paragraph of text. A well-designed visualization can communicate complex patterns and insights at a glance, making data more accessible and actionable.
Selecting the appropriate visualization depends on your data type, measurement level, and what you're trying to communicate.
Data Type | Comparison | Distribution | Composition | Relationship | Time Series |
---|---|---|---|---|---|
Nominal | Bar chart, Spider chart | Bar chart, Dot plot | Pie chart, Treemap | Network diagram, Heatmap | Stacked bar chart |
Ordinal | Bar chart, Dot plot | Bar chart, Dot plot | Stacked bar chart | Heatmap, Bubble chart | Line chart, Area chart |
Interval | Bar chart, Bullet chart | Histogram, Box plot | Stacked area chart | Scatter plot, Bubble chart | Line chart, Area chart |
Ratio | Bar chart, Bullet chart | Histogram, Box plot, Violin plot | Stacked area chart, 100% charts | Scatter plot, Bubble chart | Line chart, Area chart |
When choosing a visualization, consider your audience, the complexity of your data, and the main message you want to convey. Sometimes simpler visualizations are more effective than complex ones.
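The selection table can be turned into a small lookup that suggests chart types for a given measurement level and purpose. This is a sketch that transcribes the table as written. The names `RECOMMENDED` and `suggest_charts` are hypothetical, and the suggestions are starting points rather than rules.

```python
# Chart suggestions keyed by measurement level, then by purpose,
# transcribed from the selection table in this guide.
RECOMMENDED = {
    "nominal": {
        "comparison": ["bar chart", "spider chart"],
        "distribution": ["bar chart", "dot plot"],
        "composition": ["pie chart", "treemap"],
        "relationship": ["network diagram", "heatmap"],
        "time series": ["stacked bar chart"],
    },
    "ordinal": {
        "comparison": ["bar chart", "dot plot"],
        "distribution": ["bar chart", "dot plot"],
        "composition": ["stacked bar chart"],
        "relationship": ["heatmap", "bubble chart"],
        "time series": ["line chart", "area chart"],
    },
    "interval": {
        "comparison": ["bar chart", "bullet chart"],
        "distribution": ["histogram", "box plot"],
        "composition": ["stacked area chart"],
        "relationship": ["scatter plot", "bubble chart"],
        "time series": ["line chart", "area chart"],
    },
    "ratio": {
        "comparison": ["bar chart", "bullet chart"],
        "distribution": ["histogram", "box plot", "violin plot"],
        "composition": ["stacked area chart", "100% chart"],
        "relationship": ["scatter plot", "bubble chart"],
        "time series": ["line chart", "area chart"],
    },
}

def suggest_charts(level: str, purpose: str) -> list[str]:
    """Look up chart suggestions; matching is case-insensitive."""
    try:
        return RECOMMENDED[level.strip().lower()][purpose.strip().lower()]
    except KeyError:
        raise ValueError(f"no suggestion for {level!r} / {purpose!r}") from None

print(suggest_charts("Ratio", "Distribution"))
```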
Here's a guide to the most widely used visualization types, their appropriate uses, and the data types they work best with.
Box plots: Best for showing distribution, central tendency, and outliers.
Heatmaps: Best for showing patterns in a matrix of data or relationships between variables.
Treemaps: Best for showing hierarchical data and part-to-whole relationships.
Maps: Best for showing spatial patterns and geographic distributions.
Network diagrams: Best for showing connections and relationships between entities.
Bar charts: Best for comparing categories and showing ranking.
Line charts: Best for showing trends over time and continuous data.
Pie charts: Best for showing composition when parts add up to a meaningful whole.
Scatter plots: Best for showing relationships between two continuous variables.
Histograms: Best for showing the distribution of continuous data.
For complex data or multifaceted stories, combining multiple visualization types in a dashboard can provide a more complete picture than any single chart.
Follow these steps to choose the most appropriate visualization for your data:
Data: Monthly sales figures for five product categories over two years
Data types:
Purpose: Show sales trends over time and compare performance between categories
Potential visualizations:
The measurement level of your data significantly impacts which visualization types are most effective. Let's explore some real-world examples that demonstrate how to properly visualize different data types.
Consider this dataset about project durations across different phases and projects:
Data: Duration (in hours) of each project, by stage
Fields: Phase in Project, Duration (Hours), Project, Stakeholder
This requires analyzing nominal data (project phases) against ratio data (hours).
A bar chart with an overlaid line effectively shows both total hours (bars) and average hours (line) by phase. The Design phase clearly takes the longest.
This requires comparing nominal data (projects) with a breakdown of ratio data (hours) by phase.
A stacked bar chart allows comparison of total project length while showing the composition of time spent in each phase. Project 2 took the longest time overall.
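The numbers behind both charts come from two simple aggregations of the same records: one grouped by phase, one grouped by project. The sketch below uses invented figures, not the dataset described above. The "Testing" phase and all hour values are assumptions chosen only to mirror the pattern the text describes.

```python
from collections import defaultdict

# Hypothetical (project, phase, hours) records; not the guide's actual data.
records = [
    ("Project 1", "Design", 40), ("Project 1", "Development", 15), ("Project 1", "Testing", 25),
    ("Project 2", "Design", 55), ("Project 2", "Development", 20), ("Project 2", "Testing", 30),
    ("Project 3", "Design", 35), ("Project 3", "Development", 10), ("Project 3", "Testing", 20),
]

hours_by_phase = defaultdict(list)     # feeds the bar-plus-line chart
stack_by_project = defaultdict(dict)   # feeds the stacked bar chart
for project, phase, hours in records:
    hours_by_phase[phase].append(hours)
    stack_by_project[project][phase] = hours

# Bars: total hours per phase. Line: average hours per phase.
phase_totals = {phase: sum(h) for phase, h in hours_by_phase.items()}
phase_means = {phase: sum(h) / len(h) for phase, h in hours_by_phase.items()}

# Stacked bars: the height of each bar is the project's total duration.
project_totals = {p: sum(segments.values()) for p, segments in stack_by_project.items()}

longest_phase = max(phase_totals, key=phase_totals.get)
longest_project = max(project_totals, key=project_totals.get)
print("longest phase:", longest_phase, phase_totals[longest_phase])
print("longest project:", longest_project, project_totals[longest_project])
```

Keeping the per-phase segments in `stack_by_project` preserves the composition, so the same structure can drive either a stacked or a grouped bar chart.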
When your primary goal is comparing total project durations with composition by phase:
The stacked bar chart makes it easy to compare total heights (total project duration) while seeing the contribution of each phase.
When your goal is comparing the pattern of phase durations across projects:
A connected scatter plot reveals patterns across phases. All projects show the same pattern: Design takes longest, Development is shortest.
Nominal data (like Project, Stakeholder) is typically shown along an axis with categorical scales, using position to distinguish categories. Ordinal data (like survey responses, priority levels) maintains a specific order in visualizations and can use color intensity to reinforce ordering. Interval/Ratio data (like Hours) requires proportional visual encoding through position, length, or area.
When working with ordinal data, maintaining the correct order significantly improves visualization clarity:
When survey responses are displayed alphabetically or randomly, the pattern is difficult to discern and can be misleading.
When survey responses are properly ordered from "Strongly Disagree" to "Strongly Agree," the distribution pattern becomes immediately clear.
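Most charting tools sort category labels alphabetically unless told otherwise, so the scale order usually has to be supplied explicitly. The sketch below contrasts the two orderings with a text-based bar chart. The three middle labels and all counts are assumptions, since only the endpoints of the scale are named above.

```python
# Assumed five-point scale; only the two endpoints come from the text.
LIKERT_ORDER = [
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree",
]

# Hypothetical response counts.
counts = {
    "Agree": 34,
    "Disagree": 12,
    "Neutral": 20,
    "Strongly Agree": 25,
    "Strongly Disagree": 9,
}

alphabetical = sorted(counts)                       # default, misleading order
ordered = sorted(counts, key=LIKERT_ORDER.index)    # order defined by the scale

print("Alphabetical:", alphabetical)
print("Scale order:")
for label in ordered:
    bar = "#" * (counts[label] // 2)   # one mark per two responses
    print(f"  {label:<18} {bar} {counts[label]}")
```

With the scale order applied, the skew toward agreement is visible at a glance, which the alphabetical listing obscures.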
Different data types require different visualization approaches, even when answering similar questions:
For nominal data like product categories, a bar chart is appropriate because there's no inherent order. Categories can be arranged by frequency for easier comparison.
For interval data like temperature, a line chart is appropriate because it emphasizes the continuous nature of the data and shows trends over time.
Begin with a specific question or story you want to tell with your data. This will guide all subsequent visualization decisions.
Select visualizations that match your data type, measurement level, and communication goals using the frameworks outlined in this guide.
Remove chart junk, unnecessary decorations, and redundant elements. Focus on making your data the star of the visualization.
Use color to highlight important data points, distinguish categories, or represent values. Be mindful of color blindness and cultural associations.
Include descriptive titles, axis labels, legends, and data labels where appropriate. Your visualization should be understandable without additional explanation.
Provide context that helps viewers interpret the data correctly. This might include baselines, comparisons, or annotations of important events.
The ultimate purpose of data visualization is not to make data look pretty, but to make it more understandable. A good visualization should help viewers gain insights they wouldn't easily see in raw numbers.