A study comparing the physical dimensionality of heat maps
University of Nebraska, Lincoln
Consider the following examples:
What do these all have in common?
They all report a summary statistic across two explanatory variables.
When both explanatory variables are quantitative or ordinal qualitative variables, heat maps are a popular choice for displaying summary statistics.
Figure 1: Example of a 2D heat map
Given the three-dimensional nature of the data, heat maps are sometimes represented in three dimensions, using height to convey the summary statistic.
Few studies have evaluated how effectively readers extract information from 2D and 3D heat maps.
Accuracy in estimations is worse for volumes than for areas (Croxton and Stein 1932).
One common limitation of these studies is that they are restricted to two-dimensional displays of the heat maps, whether on paper or in digital renderings. These 3D charts are not truly 3D.
Data physicalization is the process of converting data into three-dimensional objects. While typically used for artistic representations of data, statistical graphics can also take advantage of this process. Some immediate benefits include:
While there are many methods of evaluating the qualities of a “good” chart (Vanderplas, Cook, and Hofmann 2020), we focus on the accuracy of numerical extractions from the chart. This is a common technique used in data visualization studies.
Does numerical accuracy of ratio estimations differ between dimensionality and projections of chart types? It is important to note that direct translations of 2D and 3D heat maps require different visual cues.
The design of the 3D heat map experiment uses the method of constant stimuli: ratios are estimated with respect to one stimulus height that remains the same.
\[ S=\text{Stimuli} \]
Setting 50 as the constant and 90 as the maximum, a sequence of stimuli is chosen by equally partitioning the ratios between \(50/50=1\) and \(50/90\approx0.556\). The same ratios are used when setting 50 as the maximum in the stimuli pair.
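The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's own code: the function names are hypothetical, and the choice of five equally spaced ratios is an assumption (it happens to give nine distinct stimulus values once the shared 50/50 pair is counted once, but the poster does not state the step count).

```python
def ratio_sequence(constant=50.0, maximum=90.0, n_ratios=5):
    """Equally spaced ratios from constant/constant = 1 down to constant/maximum.

    n_ratios is an assumed value; the source does not give the step count.
    """
    low = constant / maximum
    step = (1.0 - low) / (n_ratios - 1)
    return [1.0 - i * step for i in range(n_ratios)]


def comparison_values(constant=50.0, maximum=90.0, n_ratios=5):
    """Stimulus values paired with the constant, in both directions."""
    ratios = ratio_sequence(constant, maximum, n_ratios)
    # Constant is the smaller value in the pair: partner = constant / ratio.
    larger_partners = [constant / r for r in ratios]
    # Constant is the larger value in the pair: partner = constant * ratio.
    smaller_partners = [constant * r for r in ratios]
    # The 50/50 pair appears in both lists, so deduplicate after rounding.
    return sorted({round(v, 2) for v in larger_partners + smaller_partners})


ratios = ratio_sequence()
values = comparison_values()
print([round(r, 3) for r in ratios])
print(values)
```

Working on the ratio scale rather than the value scale keeps the comparisons evenly spread in terms of what participants are asked to judge.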
For a full replicate, there are \(3\times2\times9=54\) treatment combinations:
Way too many trials for a single participant’s attention span!
Our main interest is the difference between media types, measured at a given ratio and dataset. To accomplish this and to reduce the number of trials per participant, we use 4 of the 9 possible stimuli pairs to create blocks.
\[ 2\times3\times4=24 \]
Figure 6: Balanced Incomplete Block Design. Each stimuli pair in a block is fully crossed with media type and dataset.
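The two counts above follow from fully crossing the factor levels, which can be checked by enumeration. The level labels below are placeholders assumed for illustration, and taking the first four stimuli pairs as a block is arbitrary; the actual block assignments come from the balanced incomplete block design in Figure 6.

```python
from itertools import product

# Placeholder labels (assumptions); only the number of levels comes from the text.
media_types = ["2D digital", "3D digital", "3D printed"]
datasets = ["dataset 1", "dataset 2"]
stimuli_pairs = [f"pair {i}" for i in range(1, 10)]

# Full replicate: every media type x dataset x stimuli pair.
full_replicate = list(product(media_types, datasets, stimuli_pairs))

# One block: 4 of the 9 stimuli pairs, fully crossed with dataset and media type.
block_pairs = stimuli_pairs[:4]  # arbitrary choice, for counting only
block_trials = list(product(datasets, media_types, block_pairs))

print(len(full_replicate), len(block_trials))  # 54 24
```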
For each trial in the experiment, we ask two questions adapted from Cleveland and McGill (1984).
From these questions, we test the following hypotheses:
H1: Do differences exist in identifying the larger value in a stimuli pair across different stimuli pairs, chart types, and underlying data sets?
H2: Does numerical accuracy of ratio estimations differ across stimuli pairs, chart types, and underlying data sets?
We use (Generalized) Linear Mixed Models to account for the distribution of each response and to allocate variation among participants and the ordering of charts. These are extensions of the methods presented in Stat 218.
Question 1: With a binary response, a binomial distribution is chosen to model the proportion of successfully identifying the larger value in a stimuli pair.
Question 2: Since ratio estimates are continuous responses, we model the average error metric using a normal distribution.
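One way models of this kind might be written is sketched here. The additive fixed effects, the catch-all interaction term, and the single participant random effect are assumptions for illustration, not the exact specification fitted in the study. All symbols are hypothetical: \(i\) indexes media type, \(j\) dataset, \(k\) stimuli pair, and \(l\) participant, and the two models have separate parameters even though the same letters are reused.

\[ \text{logit}(\pi_{ijkl})=\mu+\alpha_i+\beta_j+\gamma_k+(\text{interactions})_{ijk}+p_l,\qquad p_l\sim N(0,\sigma_p^2) \]

\[ \text{Error}_{ijkl}=\mu+\alpha_i+\beta_j+\gamma_k+(\text{interactions})_{ijk}+p_l+\varepsilon_{ijkl},\qquad \varepsilon_{ijkl}\sim N(0,\sigma^2) \]

Here \(\pi_{ijkl}\) is the probability of correctly identifying the larger value in Question 1, and \(\text{Error}_{ijkl}\) is the error metric for Question 2.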
196 participants completed the experiment as part of a project in the curriculum of Stat 218. Demographic information is provided below.
| 19-25 | 26-30 | 31-35 | 36-40 | 41-45 | 46 and over |
|---|---|---|---|---|---|
| 191 | 2 | 0 | 1 | 2 | 0 |
| Female | Male | Prefer not to answer |
|---|---|---|
| 129 | 66 | 1 |
| High School or Less | Some Undergraduate Courses | Undergraduate Degree | Some Graduate Courses | Graduate Degree | Prefer not to answer |
|---|---|---|---|---|---|
| 19 | 161 | 9 | 3 | 2 | 2 |
Which value represents a larger quantity?
There were 4072 trials completed. From these, 3218 responses were correct in identifying the larger quantity in the pair of stimuli (79.03%).
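As a quick check of the arithmetic, the overall proportion can be recomputed and converted to the odds and log-odds scale on which a binomial model with a logit link operates. This is a pooled figure only; it ignores the grouping of trials within participants, so it is not a substitute for the mixed-model estimates.

```python
import math

trials = 4072   # completed trials reported above
correct = 3218  # correct responses reported above

proportion = correct / trials
odds = correct / (trials - correct)  # correct responses per incorrect response
log_odds = math.log(odds)            # logit of the pooled proportion

print(f"proportion correct: {proportion:.4f}")  # proportion correct: 0.7903
print(f"odds: {odds:.2f}, log-odds: {log_odds:.2f}")
```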
Figure 7: Proportion of correct responses to Question 1
There were 12 combinations of data sets and stimuli pairs with evidence that the odds of successfully identifying the larger stimulus value were higher for 3D charts than for 2D charts. This was particularly prevalent when the values in a stimuli pair were approximately the same size.
Figure 8: Estimated probability of success
If the larger value you selected above represents 100 units, how many units is the smaller value?
When excluding incorrect responses to Question 1 and when the stimuli were the same value, there were 3602 responses. Error is measured as follows:
\[ \text{Error}=\log_2(|\text{judged percent}-\text{true percent}|) \]
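A minimal sketch of this error metric follows. The function names and the `offset` parameter are assumptions added for illustration: the formula above has no offset, so it is undefined when the judged and true percents coincide. Cleveland and McGill (1984) added 1/8 inside the logarithm for that reason, and the parameter lets that variant be tried without changing the default.

```python
import math


def true_percent(value_a, value_b):
    """Smaller value as a percent of the larger one, matching the Question 2 wording."""
    smaller, larger = sorted((value_a, value_b))
    return 100.0 * smaller / larger


def log2_abs_error(judged_percent, true_pct, offset=0.0):
    """log2 of the absolute difference between judged and true percent.

    offset=0.0 follows the formula as written; offset=0.125 gives the
    Cleveland and McGill (1984) variant.
    """
    diff = abs(judged_percent - true_pct) + offset
    if diff == 0:
        return float("-inf")  # log2(0) is undefined; flag exact answers explicitly
    return math.log2(diff)


# Example: a participant judges 60 when the stimuli are 50 and 90.
print(log2_abs_error(60, true_percent(50, 90)))
```

Because the metric is on a log scale, each one-unit increase corresponds to a doubling of the absolute error in percentage points.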
Figure 9: Boxplots of responses to Question 2
Unlike Question 1, there were no significant interactions between the stimuli pairs, datasets, or media types. This means that estimated errors can be calculated separately for each treatment factor. We observed that the effect of stimuli pair was significant (p-value = 0.04335) and the effect of media type was significant (p-value < .0001).
Figure 10: Estimated log absolute error for stimuli pairs.
Figure 11: Estimated log absolute error for chart media types.
In certain situations, there is evidence that it is harder to identify the larger value in a pair with 2D heat maps than with 3D heat maps. However, digital and physical 3D heat maps did not exhibit differences in identifying larger values.
Estimations of ratios were worse for 2D charts than for 3D charts. There was no detectable difference between the digital and physical 3D heat maps.



It was sometimes observed that participants would estimate values that were not the ratio of the stimuli pair. A more sophisticated model could account for this discrepancy.
Very few options exist for creating 3D printed charts. In R, an open-source statistical software package, the rayshader and rgl packages provide only limited support. Creating software that produces 3D-print files for these types of charts would allow for broader implementation of physical statistical graphics. This could help increase accessibility for people with low vision or serve as an educational tool for explaining statistical concepts.
If you are interested in following the progress of this study, you can visit https://github.com/TWiedRW/ch3-heat3d for data and write-ups of the experiment.
