Visualization in R

A SC3L Workshop

Tyler Wiederich

University of Nebraska-Lincoln

About us

  • We are the Statistical Cross-disciplinary Collaboration & Consulting Lab (SC3L) from the UNL Department of Statistics.
  • We offer free statistical consulting services to students, faculty, and staff within IANR at UNL.
  • Workshops! Hosted from 2-3pm on Wednesdays and 10-11am on Thursdays.

Data Visualization in R

Why visualize data?

Data visualization is an important step in understanding the relationships among the variables in your dataset.

  • How does Factor A affect my response?
  • Is there an interaction between Factor A and Factor B?
  • Are there outliers in my dataset?

The basics

R has multiple methods of creating visualizations, but our focus will be on the ggplot2 package. This package follows the Grammar of Graphics approach, in which a graph is built by layering different building blocks on top of one another.

The basics: data format

Data needs to be tidy: each row holds one observation and each column holds one variable.

Trt1 Trt2 Rep response
1 1 1 9.37
2 1 1 9.04
1 2 1 10.84
2 2 1 10.94
1 1 2 10.00
2 1 2 9.63
1 2 2 9.62
2 2 2 10.00

Example: not tidy

Average diamond price by cut and color
color  Fair   Good  Very Good  Premium  Ideal
E      11156  4535  1703       4739     1799
G      5924   NA    2684       5720     3266
H      3862   NA    4527       5438     3448
D      NA     2406  3424       1809     5855
F      NA     4549  8948       2880     1780
I      NA     1952  4532       4480     3734
J      NA     NA    NA         6348     1720

Example: tidy

Average diamond price by cut and color.
cut color price
Fair E 11156
Fair G 5924
Fair H 3862
Good D 2406
Good E 4535
Good F 4549
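
As a rough sketch of how the wide table could be reshaped into the tidy one, the code below uses pivot_longer() from the tidyr package. The tidyr package is an assumption here (it is not among the packages this workshop installs), and the small `wide` data frame is typed in by hand from a few cells of the table above purely for illustration.

```r
library(tidyr)  # assumption: tidyr is installed; it is not loaded elsewhere in these slides

# A hand-typed corner of the "not tidy" table: one row per color, one column per cut
wide <- data.frame(
  color = c('E', 'G', 'H'),
  Fair  = c(11156, 5924, 3862),
  Good  = c(4535, NA, NA)
)

# Gather the cut columns into a single 'cut' column and their values into 'price'.
# values_drop_na = TRUE drops the combinations that have no price.
long <- pivot_longer(wide, cols = -color,
                     names_to = 'cut', values_to = 'price',
                     values_drop_na = TRUE)

long
```

The result has one row per color and cut combination, which is the shape ggplot2 expects. The rows come out ordered by color rather than by cut, so the order differs from the tidy table above even though the content matches.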

The basics: syntax

ggplot(data = data, mapping = aes(...)) + 
  geom_FUNCTION(aes(...), ...) + 
  scale_FUNCTION(...) +
  facet_FUNCTION(...) + 
  labs(title = '', subtitle = '', x = '', y = '') + 
  theme_FUNCTION(...) + 
  coord_FUNCTION(...)
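
To make the template concrete, here is one way the slots might be filled in. This is only an illustrative sketch: the mpg dataset (which ships with ggplot2) and every label and option below are stand-ins chosen for the example, not part of the workshop material.

```r
library(ggplot2)

# mpg ships with ggplot2; it is used here only as a stand-in dataset
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = class), alpha = 0.7) +   # geom: draw one point per car
  scale_color_brewer(palette = 'Dark2') +         # scale: choose the color palette
  facet_wrap(~ drv) +                             # facet: one panel per drive type
  labs(title = 'Highway mileage vs. engine size',
       x = 'Engine displacement (L)',
       y = 'Highway miles per gallon',
       color = 'Vehicle class') +
  theme_bw() +                                    # theme: overall look of the plot
  coord_cartesian(ylim = c(10, 45))               # coord: zoom in without dropping data

p  # printing the object draws the plot
```

Each `+` adds one building block, so lines can be removed or reordered to see what each layer contributes.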

Preliminaries

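# Install the packages once; this line can be skipped in later sessions
install.packages(c('ggplot2', 'palmerpenguins', 'ggthemes'))

# Load the packages at the start of every session
library(ggplot2)
library(palmerpenguins)
library(ggthemes)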

Example 1

Example 1: penguins

Palmer Penguins Data
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species)) + 
  geom_point(aes(shape = species), alpha = 2/3, size = 1) + 
  theme_bw() + 
  labs(x = 'Bill length (mm)', y = 'Bill depth (mm)',
       title = 'Bill length vs. Bill depth', subtitle = 'By species',
       color = 'Species', shape = 'Species', caption = 'Source: palmerpenguins') +
  facet_grid(.~island) + 
  theme(aspect.ratio = 1/2,
        legend.position = 'bottom')

Example 2

Does body mass differ between penguin species?
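
One possible way to approach this question is a boxplot of body mass for each species. The sketch below is an assumption about how the exercise might be answered, not the workshop's official solution; the object names `penguins_mass` and `mass_plot` are made up for the example.

```r
library(ggplot2)
library(palmerpenguins)

# Drop penguins with a missing body mass so ggplot2 does not warn about removed rows
penguins_mass <- penguins[!is.na(penguins$body_mass_g), ]

mass_plot <- ggplot(data = penguins_mass,
                    mapping = aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot(alpha = 2/3) +
  theme_bw() +
  labs(x = 'Species', y = 'Body mass (g)',
       title = 'Body mass by species',
       caption = 'Source: palmerpenguins') +
  theme(legend.position = 'none')  # the x-axis already names each species

mass_plot
```

A boxplot is only one choice; swapping geom_boxplot() for geom_violin() or geom_jitter() shows the same comparison with more detail about the distribution.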

Example 3

Using the ggplot2::economics dataset, plot the median duration of unemployment over time.

ggplot(data = economics, mapping = aes(x = date, y = uempmed)) + 
  # geom_line(color = 'black') + 
  geom_area(fill = 'skyblue', color = 'black') +
  theme_bw() + 
  labs(x = '',
       y = 'Median duration of unemployment\n(in weeks)',
       title = 'Longer unemployment during Great Recession',
       subtitle = 'in the United States',
       caption = 'Source: ggplot2::economics') + 
  scale_x_date(date_breaks = '5 years', date_labels = '%Y') + 
  scale_y_continuous(limits = c(0,27), expand = c(0,0)) + 
  theme(aspect.ratio = 1/2)

Saving your visualization

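library(dplyr)  # provides the %>% pipe

# economics2000s is a yearly summary with columns year and mean_unemploy;
# its construction is not shown on this slide
myplot <- economics2000s %>% 
  ggplot(mapping = aes(x = year, y = mean_unemploy)) + 
  geom_bar(stat = 'identity', width = 1,
           color = 'black', fill = 'skyblue') + 
  labs(x = '', y = 'Unemployment rate (%)', title = 'Unemployment rate in the United States') +
  scale_y_continuous(limits = c(0, 5), expand = c(0,0)) + 
  scale_x_continuous(breaks = 2000:2015) +
  theme_bw() + theme(aspect.ratio = 1/2)

# Pass the plot explicitly so the saved file does not depend on the last plot drawn
ggsave('unemploy.png', plot = myplot, width = 6, dpi = 600)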
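
The plot code above relies on a data frame named economics2000s whose construction is not shown. The sketch below is a guess at how such a summary might be built from ggplot2::economics. It assumes the rate is the number of unemployed people as a percentage of the total population, averaged within each year from 2000 onward; the original slides may have defined it differently.

```r
library(ggplot2)  # provides the economics dataset

# Assumption: "unemployment rate" means unemployed persons as a percentage of
# the total population. Both columns are recorded in thousands, so the units cancel.
econ <- economics
econ$year <- as.integer(format(econ$date, '%Y'))
econ$rate <- 100 * econ$unemploy / econ$pop

# Keep the years from 2000 onward and average the monthly rates within each year
econ <- econ[econ$year >= 2000, ]
economics2000s <- aggregate(rate ~ year, data = econ, FUN = mean)
names(economics2000s) <- c('year', 'mean_unemploy')

head(economics2000s)
```

If the file's height matters, ggsave() also accepts a height argument; when it is omitted, the height is taken from the current graphics device.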

Wrap-up

Thank you

Additional resources

Visit our website to schedule an appointment! https://statistics.unl.edu/sc3lhelp-desk/