Heat to Height

A study comparing the physical dimensionality of heat maps

Tyler Wiederich

University of Nebraska, Lincoln

Introduction

Consider the following examples:

  • Percentage of car engine malfunctions by mileage and age
  • Average student grade measured by number of hours spent studying and number of hours spent sleeping
  • Count of beetles caught in sticky traps across the coordinates of a field

What do these all have in common?

They all report a summary statistic across two explanatory variables.

Heat Maps

When the two explanatory variables are quantitative or ordinal, heat maps are a popular choice for displaying summary statistics.

Figure 1: Example of a 2D heat map

3D Heat Maps

Given the three-dimensional nature of the data, heat maps are sometimes represented in three dimensions, using height to convey the summary statistic.

(a) Figure from Golladay (1977)
(b) Figure from “3D Population Density of the US - HomeArea.com” (n.d.)
Figure 2: Examples of 3D heat maps.

Which chart to use?

Few studies have evaluated how effectively information can be extracted from 2D and 3D heat maps.

  • Accuracy in estimations is worse for volumes than for areas (Croxton and Stein 1932).

    • Also theorized by Cleveland and McGill (1984)
  • 3D heat maps have lower error rates than 2D heat maps in virtual reality (Kraus et al. 2020).

One common limitation of these studies is that they rely on two-dimensional displays of the heat maps, either on paper or as digital renderings, so the 3D charts are not truly 3D.

Motivation

Data physicalization is the process of converting data into three-dimensional objects. While typically used for artistic representations of data, statistical graphics can also take advantage of this process. Some immediate benefits include:

  • Increased accessibility of data for users with low vision
  • Educational tools for explaining data structures
  • More intuitive interactions
  • Potential for increased memorability and/or engagement

Study Overview

While there are many methods of evaluating the qualities of a “good” chart (Vanderplas, Cook, and Hofmann 2020), we focus on the accuracy of numerical extractions from the chart. This is a common technique used in data visualization studies.

Does the numerical accuracy of ratio estimates differ with the dimensionality and projection of the chart type? It is important to note that direct translations between 2D and 3D heat maps require different visual cues.

(a) Flatter height scale
(b) Stretched height scale
Figure 3: Two figures representing the volcano dataset (R Core Team 2024) using different height scaling factors. Both figures use the same color scale, but there is no “correct” translation of color into the measurable height.

Methods

Stimuli

The design of the 3D heat map experiment uses the method of constant stimuli: ratios are estimated with respect to one stimulus height that remains the same.

\[ S=\text{Stimuli} \]

Setting 50 as the constant and 90 as the maximum, a sequence of stimuli is chosen by equally partitioning the ratios between \(50/50=1\) and \(50/90\approx0.556\). The same ratios are used when setting 50 as the maximum in the stimuli pair.
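
As a rough illustration of the partitioning described above, the sketch below builds equally spaced ratios between 50/90 and 1 and pairs each ratio with 50 as either the smaller or the larger value. The function name and the choice of five ratio levels are assumptions made here so that the count comes out to nine pairs; the exact levels used in the study are not stated on this slide.

```python
def stimulus_pairs(constant=50.0, maximum=90.0, n_ratios=5):
    """Return (smaller, larger, ratio) tuples for a constant-stimulus design.

    n_ratios=5 is an assumption for illustration, not a value from the study.
    """
    smallest_ratio = constant / maximum
    pairs = []
    for i in range(n_ratios):
        # Equally spaced ratios from constant/maximum up to 1.
        ratio = smallest_ratio + (1.0 - smallest_ratio) * i / (n_ratios - 1)
        if i == n_ratios - 1:
            ratio = 1.0  # pin the last level to exactly 1 to avoid rounding drift
        # The constant is the smaller value in the pair.
        pairs.append((constant, constant / ratio, ratio))
        # The constant is the larger value; skip the duplicate 50/50 pair.
        if i < n_ratios - 1:
            pairs.append((constant * ratio, constant, ratio))
    return pairs


if __name__ == "__main__":
    for smaller, larger, ratio in stimulus_pairs():
        print(f"{smaller:6.2f} vs {larger:6.2f}  ratio = {ratio:.3f}")
```

Under these assumptions, five ratio levels give four pairs above 50, four pairs below 50, and the shared 50/50 pair.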

(a) Data set 1
(b) Data set 2
Figure 4: Placement of stimuli in the heat map data sets. Transparent bars represent randomized values.

Chart Types

(a) 2D Digital
(b) 3D Digital
(c) 3D Printed
Figure 5: Chart types representing heat map data set 1.

Experimental Design

For a full replicate, there are \(3\times2\times9=54\) treatment combinations:

  • 3 media types: 2D digital (2dd), 3D digital (3dd), and 3D printed (3dp)
  • 2 datasets
  • 9 pairs of stimuli

Way too many trials for a single participant’s attention span!

Our main interest is the difference between media types, measured at a given ratio and dataset. To accomplish this and to reduce the number of trials per participant, we create blocks from 4 of the 9 possible stimuli pairs, which gives the following number of trials per block:

\[ 2\times3\times4=24 \]
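
The two counts above can be checked by enumerating the treatment combinations directly. This is a small sketch only: the pair labels 1 to 9 and the example block are hypothetical placeholders, not the assignments used in the study.

```python
from itertools import product

MEDIA = ("2dd", "3dd", "3dp")   # media type codes from the slide
DATASETS = (1, 2)
PAIRS = tuple(range(1, 10))     # hypothetical labels for the 9 stimuli pairs

# A full replicate crosses every media type, dataset, and stimuli pair.
full_replicate = list(product(MEDIA, DATASETS, PAIRS))


def block_trials(pairs_in_block):
    """Cross the pairs assigned to one block with every media type and dataset."""
    return list(product(MEDIA, DATASETS, pairs_in_block))


if __name__ == "__main__":
    print(len(full_replicate))               # 54
    print(len(block_trials((1, 3, 5, 7))))   # 24
```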

Figure 6: Balanced Incomplete Block Design. Each stimuli pair in a block is fully crossed with media type and dataset.
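
For readers unfamiliar with balanced incomplete block designs, the standard counting conditions are \(r(k-1)=\lambda(v-1)\) and \(bk=vr\), where \(v\) is the number of treatments, \(k\) the block size, \(r\) the number of blocks containing each treatment, \(b\) the number of blocks, and \(\lambda\) the number of blocks in which any two treatments appear together. The sketch below finds the smallest parameters that satisfy these necessary conditions for 9 stimuli pairs in blocks of 4. It does not construct a design, and the number of blocks actually used in the study is not stated here; the function name is hypothetical.

```python
def bibd_parameters(v, k, max_lambda=100):
    """Smallest (lambda, r, b) meeting the BIBD counting conditions, or None."""
    for lam in range(1, max_lambda + 1):
        # r(k - 1) = lambda(v - 1) must give an integer r.
        if (lam * (v - 1)) % (k - 1) != 0:
            continue
        r = lam * (v - 1) // (k - 1)
        # b * k = v * r must give an integer b.
        if (v * r) % k != 0:
            continue
        return {"lambda": lam, "r": r, "b": v * r // k}
    return None


if __name__ == "__main__":
    print(bibd_parameters(9, 4))
```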

Hypotheses

For each trial in the experiment, we ask two questions adapted from Cleveland and McGill (1984).

  1. Which value in a stimuli pair represents a larger quantity?
  2. If the larger value in the stimuli pair represents 100 units, how many units is the smaller value?

From these questions, we test the following hypotheses:

H1: Do differences exist in identifying the larger value in a stimuli pair across different stimuli pairs, chart types, and underlying data sets?

H2: Does numerical accuracy of ratio estimations differ across stimuli pairs, chart types, and underlying data sets?

Statistical Methods

We use (generalized) linear mixed models to account for the distribution of each response and to allocate variation to participants and to the ordering of charts. These are extensions of the methods presented in Stat 218.

Question 1: With a binary response, a binomial distribution is chosen to model the probability of correctly identifying the larger value in a stimuli pair.

Question 2: Since ratio estimates are continuous responses, we model the mean of the error metric using a normal distribution.

Results

Demographics

196 participants completed the experiment as part of a project in the curriculum of Stat 218. Demographic information is provided below.

Age            Count
19-25          191
26-30          2
31-35          0
36-40          1
41-45          2
46 and over    0

Gender                  Count
Female                  129
Male                    66
Prefer not to answer    1

Education                     Count
High School or Less           19
Some Undergraduate Courses    161
Undergraduate Degree          9
Some Graduate Courses         3
Graduate Degree               2
Prefer not to answer          2

Question 1

Which value represents a larger quantity?

There were 4072 trials completed. From these, 3218 responses were correct in identifying the larger quantity in the pair of stimuli (79.03%).

Figure 7: Proportion of correct responses to Question 1

There were 12 combinations of data sets and stimuli pairs where there was evidence that the odds of successfully identifying the larger stimulus value were larger for 3D charts than for 2D charts. This was particularly prevalent when the values in a stimuli pair were approximately the same size.

Figure 8: Estimated probability of success
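
Since the comparisons are stated in terms of odds while the figure shows probabilities, a short sketch of how the two scales relate may help. The probabilities in the example are hypothetical values chosen for illustration, not estimates from the study, and the function names are placeholders.

```python
import math


def odds(p):
    """Odds of success for a probability p strictly between 0 and 1."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """How many times larger the odds are under condition a than under b."""
    return odds(p_a) / odds(p_b)


def inv_logit(eta):
    """Map a log-odds value (the scale of a logistic model) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # Hypothetical example: 80% correct on one chart type, 50% on another.
    print(odds_ratio(0.8, 0.5))
    print(inv_logit(math.log(odds(0.8))))
```

An odds ratio above 1 means the first condition has the larger odds of a correct response.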

Question 2

If the larger value you selected above represents 100 units, how many units is the smaller value?

When excluding incorrect responses to Question 1 and when the stimuli were the same value, there were 3602 responses. Error is measured as follows:

\[ \text{Error}=\log_2(|\text{judged percent}-\text{true percent}|) \]
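
A minimal sketch of the error metric as written above. Two choices here are assumptions rather than part of the slide: an exact match returns negative infinity, because the logarithm of zero is undefined, and the optional `offset` argument allows a small constant inside the logarithm, as in the 1/8 used by Cleveland and McGill (1984).

```python
import math


def log_abs_error(judged_percent, true_percent, offset=0.0):
    """log2 of the absolute difference between judged and true percent.

    offset defaults to 0 to match the formula on the slide; returning -inf
    for an exact match is an assumption made for this sketch.
    """
    difference = abs(judged_percent - true_percent) + offset
    if difference == 0:
        return float("-inf")
    return math.log2(difference)


if __name__ == "__main__":
    true_percent = 100 * 50 / 90  # the 50 vs 90 pair, about 55.56
    print(log_abs_error(60, true_percent))
    print(log_abs_error(75, 75, offset=1 / 8))
```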

Figure 9: Boxplots of responses to Question 2

Unlike Question 1, there were no significant interactions among stimuli pairs, datasets, and media types. This means that estimated errors can be calculated separately for each treatment factor. The effects of stimuli pair (p-value = 0.04335) and media type (p-value < 0.0001) were both significant.

Figure 10: Estimated log absolute error for stimuli pairs.

Figure 11: Estimated log absolute error for chart media types.

Discussion

Conclusion

  1. In certain situations, there is evidence that the larger value in a pair is harder to identify in 2D heat maps than in 3D heat maps. However, digital and physical 3D heat maps did not exhibit differences in identifying larger values.

  2. Estimations of ratios were worse for 2D charts than for 3D charts. There was no detectable difference between the digital and physical 3D heat maps.

Future work

  1. It was sometimes observed that participants would estimate values that were not the ratio of the stimuli pair. A more sophisticated model could account for this discrepancy.

  2. Very few options exist for creating 3D printed charts. In R, an open-source environment for statistical computing, the rayshader and rgl packages provide only limited support. Creating software that produces 3D-print files for these types of charts would allow for broader implementation of physical statistical graphics. This could help increase accessibility for people with low vision or serve as an educational tool for explaining statistical concepts.
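
To make the idea of producing a 3D-print file concrete, here is a minimal sketch that writes a grid of values as rectangular bars in the ASCII STL format, which most slicing software can read. It is written in Python for illustration only and is not the workflow used to produce the printed charts in this study. The function names, `cell_size`, and `z_scale` are assumptions, and touching bars share faces, which some slicers may need to repair.

```python
import math


def _unit_normal(a, b, c):
    """Unit normal of triangle (a, b, c) by the right-hand rule."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(k * k for k in n))
    return tuple(k / length for k in n)


def _bar_triangles(x0, y0, x1, y1, height):
    """Yield the 12 triangles of one bar resting on the plane z = 0."""
    corner = {(i, j, k): (x1 if i else x0, y1 if j else y0, height if k else 0.0)
              for i in (0, 1) for j in (0, 1) for k in (0, 1)}
    # Each face lists its corners counter-clockwise as seen from outside.
    faces = [
        [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],  # bottom
        [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],  # top
        [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],  # front
        [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],  # back
        [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],  # left
        [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],  # right
    ]
    for a, b, c, d in faces:
        yield corner[a], corner[b], corner[c]
        yield corner[a], corner[c], corner[d]


def heatmap_to_stl(values, cell_size=10.0, z_scale=1.0, name="heatmap"):
    """Return ASCII STL text with one bar per cell of a 2D list of values."""
    lines = [f"solid {name}"]
    for r, row in enumerate(values):
        for c, value in enumerate(row):
            height = float(value) * z_scale
            if height <= 0:
                raise ValueError("bar heights must be positive")
            x0, y0 = c * cell_size, r * cell_size
            for tri in _bar_triangles(x0, y0, x0 + cell_size, y0 + cell_size, height):
                nx, ny, nz = _unit_normal(*tri)
                lines.append(f"  facet normal {nx:.6f} {ny:.6f} {nz:.6f}")
                lines.append("    outer loop")
                for x, y, z in tri:
                    lines.append(f"      vertex {x:.6f} {y:.6f} {z:.6f}")
                lines.append("    endloop")
                lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("heatmap.stl", "w") as handle:
        handle.write(heatmap_to_stl([[1, 2], [3, 4]]))
```

The `z_scale` argument corresponds to the height scaling choice discussed earlier: the same data can be printed flatter or more stretched.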

If you are interested in following the progress of this study, you can visit https://github.com/TWiedRW/ch3-heat3d for data and write-ups of the experiment.

Thank you!

References

“3D Population Density of the US - HomeArea.com.” n.d. https://www.homearea.com/featured/3d-population-density/#3128000. Accessed September 17, 2025.
Barfield, Woodrow, and Robert Robless. 1989. “The Effects of Two- or Three-Dimensional Graphics on the Problem-Solving Performance of Experienced and Novice Decision Makers.” Behaviour & Information Technology 8 (5): 369–85. https://doi.org/10.1080/01449298908914567.
Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. https://doi.org/10.1080/01621459.1984.10478080.
Croxton, Frederick E., and Harold Stein. 1932. “Graphic Comparisons by Bars, Squares, Circles, and Cubes.” Journal of the American Statistical Association 27 (177): 54–60. https://doi.org/10.1080/01621459.1932.10503227.
Golladay, Mary. 1977. A Statistical Report on the Condition of Education in the United States. 1975-1979: DHEW Publication. U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.
Kraus, Matthias, Katrin Angerbauer, Juri Buchmüller, Daniel Schweitzer, Daniel A. Keim, Michael Sedlmair, and Johannes Fuchs. 2020. “Assessing 2D and 3D Heatmaps for Comparative Analysis: An Empirical Study.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–14. Honolulu HI USA: ACM. https://doi.org/10.1145/3313831.3376675.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Vanderplas, Susan, Dianne Cook, and Heike Hofmann. 2020. “Testing Statistical Charts: What Makes a Good Graph?” Annual Review of Statistics and Its Application 7 (1): 61–88. https://doi.org/10.1146/annurev-statistics-031219-041252.