Lab - Creating Accessible Data Visualizations in R

TidyTuesday, a weekly social project for individuals to practice their data visualization skills in R, is one of many spaces rendered inaccessible due to a lack of alternative text for screen reading technology. An analysis of TidyTuesday images posted on Twitter between 2018 and 2021 by Silvia Canelón and Elizabeth Hare revealed that 3% of the data visualizations had alternative text and 84% of the images were described by default as “image”. There is still a great deal of progress to be made concerning accessibility not only on Twitter but across all platforms.

The main goal of this lab is to introduce you to elements of accessibility in data visualization which you will be using throughout the course and in your final projects. Creating accessible visualizations involves many facets. In this lab we will focus primarily on writing and including alternative text for plots and choosing effective color schemes for your data.

Getting started

Packages

In this lab we will work with two principal packages: ggpattern which allow us to customize the pattern of our plots and tidyverse which is a collection of packages including ggplot2 for doing data analysis and visualization in a “tidy” way. We will also be using the countrycode package to format and preprocess the data for the chocolate bar visualizations.

These packages are already installed for you. You can load the packages by running the following in the Console.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

library(tidyverse)
library(openintro)
library(countrycode)
library(ggpattern)
library(ggplot2)
library(RColorBrewer)

Note that the packages are also loaded with the same commands in your R Markdown document.

Data

The data frames we will be working with today are called chocolate and penguins and are both from the TidyTuesday GitHub repository.

You can load the data frames by running the following in the Console.

chocolate <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv')

penguins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv')

Part I: Writing Alternative Text

Please watch Amy Cesal’s guide to writing alternative text for data visualization from the Outlier 2021 conference on Youtube. According to this video, what are the three components that alternative text for data visualization should include?

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 1”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

First, we will create a new continent variable using mutate that will allow us to explore chocolate production and cacao origin by continent.

chocolate <- chocolate %>%
  mutate(continent = countrycode(country_of_bean_origin, 
                                 origin = "country.name", 
                                 destination = "continent")
         )

Next, we will filter the data for chocolate bars produced by companies located in the USA that are of a single-country bean origin, and then count the number of chocolate bars produced by bean country of origin and continent.

chocolate %>%
  filter(company_location == "U.S.A.", 
         country_of_bean_origin != "Blend") %>%
  group_by(country_of_bean_origin, continent) %>%
  summarise(count = n()) %>%
  ggplot(mapping = aes(x = fct_reorder(country_of_bean_origin,
                                       count), 
                       y = count, 
                       fill = continent)
         ) +
  geom_col() + 
  coord_flip() +
  labs(
    title = "Bean Origin of Chocolate Bars 
            Manufactured in the U.S.A.", 
    y = "Number of Chocolate Bars 
        Manufactured in the U.S.A.", 
    x = "Country of Bean Origin", 
    fill = "Continent") +
  scale_fill_viridis_d()

With the three components alternative text for data visualizations should include in mind, justify why each of the four alternative text examples below describing the above image meets or doesn’t meet standards for alternative text for data visualizations:

Bar chart where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.
Bar chart of the number of chocolate bars manufactured in the U.S.A by country and continent of bean origin where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.
Bar chart of the number of chocolate bars manufactured in the U.S.A by country and continent of bean origin.
Number of chocolate bars manufactured in the U.S.A by country and continent of bean origin where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 2”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Part II: Choosing Effective Color Schemes

According to the Colour Blind Awareness organization, approximately 300 million people are affected by color blindness worldwide. When creating data visualizations, the wrong choice of colors can easily render a figure inaccessible. As such, it is critical to choose a color palette that is accessible for all members of your audience. Two such palettes are the Viridis color scales included in the ggplot2 package, and the Colorbrewer palettes, using the RColorBrewer package.

The Viridis color scales are designed to be accessible for those who are visually impaired because the colors are differentiable in both color and black-and-white. For discrete data, use + scale_color_viridis_d() or + scale_fill_viridis_d() depending on whether you are specifying the color or fill of your data visualization. Similarly, use + scale_color_viridis_c() or + scale_fill_viridis_c() for continuous data. You can refer to the Viridis color scales documentation for examples.

In choosing a color palette, you should consider the kind of data that you are dealing with for the data visualization at hand. You can create your own colorblind safe color palette based on your preferred number of colors and whether your data is numeric or categorical using the ColorBrewer. This tool produces HEX color codes that may be used to define the colors of data visualizations in R.

The RColorBrewer package palettes include both sequential and diverging color palettes for numerical data. To visualize incremental changes in data, choose from the below palettes:

display.brewer.all(type = "seq", colorblindFriendly = TRUE)

Diagram showing colors for eighteen R Color Brewer color blind friendly sequential color palettes including a 9 color gradient. Possible palettes include YlOrRd, YlOrBr, YlGnBu, YlGn, Reds, RdPu, Purples, PuRd, PuBuGn, PuBu, OrRd, Oranges, Greys, Greens, GnBu, BuPu, BuGn, Blues. Short hand colors are: Yl is yellow, Or is orange, Rd is red, Br is Brown, Gn is green, Bu is blue, Pu is purple

To visualize data that falls above or below a starting point, choose from the below palettes:

display.brewer.all(type = "div", colorblindFriendly = TRUE)

Diagram showing colors for six RColorBrewer color blind friendly sequential color palettes including an 11 color gradient. Possible palettes include RdYlBu, RdBu, PuOr, PRGn, PiYG, BRBG. Color abbreviations are: Yl is yellow, Or is orange, Rd is red, Br is Brown, Gn is green, Bu is blue, Pu is purple, BG is Blue Green, YG is yellow green, PR is a dark purple

To show the alternative text that you have written, include the following in the code chunk header:

```{r, fig.alt = "Add Alternative Text Here"}

add code here

```

Now, try your hand at making data visualizations more accessible by focusing on the color palette and alternative text of this data visualization. Remember to include the chart type, type of data, and reason for including the chart in your alternative text:

penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm,
                             na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(
                       species, mean_bill), 
                       y = mean_bill, 
                       fill = species)
         ) +
  geom_col() + 
  coord_flip() +
  labs(title = "Average Bill Length of Three Penguin Species", 
       x = "Penguin Species", 
       y = "Average Bill Length in Millimeters", 
       fill = "Penguin Species")

🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 3”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

As outlined by CR Ferreira in “Two Simple Steps to Create Colorblind-Friendly Data Visualizations”, published in Towards Data Science, you should practice double-coding to create accessible data visualizations. This means that in addition to choosing a colorblind-friendly color palette, you should utilize different geometrical shapes, line patterns, and fill textures in your data visualizations.

Observe the following use of accessible color palettes and patterns to make below visualization accessible:

sample_dataframe <- data.frame(
                    categories = c("a", "b", "c", "d"), 
                    values = c(2.3, 1.9, 3.2, 1))

sample_plot <- ggplot(sample_dataframe,
                      aes(categories, values)) +
  geom_col_pattern(
    aes(pattern = categories, 
        fill = categories, 
        pattern_fill = categories), 
    colour = "black", 
    pattern_density = 0.35, 
    pattern_key_scale_factor = 1.3
    ) +
  theme_bw() +
  labs(
    title = "Sample Plot Illustrating 
            Double-Coding for Accessibility",
    subtitle = "Using Color and Patterns"
    ) +
  scale_pattern_fill_viridis_d() + 
  theme(legend.position = "none") + 
  coord_fixed(ratio = 1)

sample_plot

Now, fill in the blanks in the following code chunk to improve the accessibility of the penguin species visualization in the same way. Once you have added your code change the chunk options eval = FALSE to eval = TRUE:

penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm,
            na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(
                       species,mean_bill), 
                       y = mean_bill,
                       fill = species)
         ) +
  geom_col_pattern(
    aes(pattern = _____ , 
        fill = _____, 
        pattern_fill = _____),
    colour = "black", 
    pattern_density = 0.35, 
    pattern_key_scale_factor = 1.3) +
  theme_bw() +
  labs(
    title = "Average Bill Length of
            Three Penguin Species", 
    x = "Penguin Species", 
    y = "Average Bill Length in Millimeters",
    fill = "Penguin Species"
    ) +
  scale_pattern_fill______ + 
  theme(legend.position = "none") + 
  coord_flip()

🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 4”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Compare your color palette and alternative text for this data visualization with others in your group. As a group, combine your ideas to decide on the final data visualization.

🧶 ✅ ⬆️ You should pause again, commit changes with the commit message “Add Exercise 5” and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Reflection: What’s difficult about applying this criteria to more complex visualizations? What purpose do these elements provide for you as the creator of data visualizations as well as your audience?

🧶 ✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Finish Lab X! 💪“, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.

Lab - Creating Accessible Data Visualizations in R

Getting started

Packages

Data

Part I: Writing Alternative Text

Part II: Choosing Effective Color Schemes

Summary of Amy Cesal’s Alternative Text Guide

Further Resources

Acknowledgements