Lab - Creating Accessible Data Visualizations in R

TidyTuesday, a weekly social project for individuals to practice their data visualization skills in R, is one of many spaces rendered inaccessible due to a lack of alternative text for screen reading technology. An analysis of TidyTuesday images posted on Twitter between 2018 and 2021 by Silvia Canelón and Elizabeth Hare revealed that 3% of the data visualizations had alternative text and 84% of the images were described by default as “image”. There is still a great deal of progress to be made concerning accessibility not only on Twitter but across all platforms.

The main goal of this lab is to introduce you to elements of accessibility in data visualization which you will be using throughout the course and in your final projects. Creating accessible visualizations involves many facets. In this lab we will focus primarily on writing and including alternative text for plots and choosing effective color schemes for your data.

Getting started

Packages

In this lab we will work with two principal packages: ggpattern, which allows us to customize the patterns in our plots, and tidyverse, a collection of packages, including ggplot2, for doing data analysis and visualization in a “tidy” way. We will also use the countrycode package to format and preprocess the data for the chocolate bar visualizations.

These packages are already installed for you. You can load the packages by running the following in the Console.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(openintro)
library(countrycode)
library(ggpattern)
library(ggplot2)
library(RColorBrewer)

Note that the packages are also loaded with the same commands in your R Markdown document.

Data

The data frames we will be working with today are called chocolate and penguins and are both from the TidyTuesday GitHub repository.

You can load the data frames by running the following in the Console.

chocolate <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv')

penguins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv')

Part I: Writing Alternative Text

  1. Please watch Amy Cesal’s guide to writing alternative text for data visualization from the Outlier 2021 conference on YouTube. According to this video, what are the three components that alternative text for data visualization should include?

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 1”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

First, we will create a new continent variable using mutate that will allow us to explore chocolate production and cacao origin by continent.

chocolate <- chocolate %>%
  mutate(continent = countrycode(country_of_bean_origin, 
                                 origin = "country.name", 
                                 destination = "continent")
         )
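
To see what countrycode returns, here is a minimal sketch on a few hand-typed origin values. The `origins` vector is made up for illustration and is not taken from the data. The sketch assumes that entries such as "Blend", which are not country names, come back as NA; `warn = FALSE` only silences the warning about unmatched values.

```r
library(countrycode)

# Hypothetical origin values, including one that is not a country name
origins <- c("Ghana", "Peru", "Vietnam", "Blend")

continents <- countrycode(origins,
                          origin = "country.name",
                          destination = "continent",
                          warn = FALSE)

# Pair each origin with its matched continent
data.frame(origins, continents)
```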

Next, we will filter the data for chocolate bars produced by companies located in the USA that are of a single-country bean origin, and then count the number of chocolate bars produced by bean country of origin and continent.

chocolate %>%
  filter(company_location == "U.S.A.", 
         country_of_bean_origin != "Blend") %>%
  group_by(country_of_bean_origin, continent) %>%
  summarise(count = n()) %>%
  ggplot(mapping = aes(x = fct_reorder(country_of_bean_origin,
                                       count), 
                       y = count, 
                       fill = continent)
         ) +
  geom_col() + 
  coord_flip() +
  labs(
    title = "Bean Origin of Chocolate Bars\nManufactured in the U.S.A.", 
    y = "Number of Chocolate Bars\nManufactured in the U.S.A.", 
    x = "Country of Bean Origin", 
    fill = "Continent") +
  scale_fill_viridis_d() 
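
The filter-then-count step can be tried on a small hand-made table before running it on the full data. The `toy` data frame below is invented for illustration, and `count()` is used as a shorthand for `group_by()` followed by `summarise(count = n())`.

```r
library(dplyr)

# Invented rows that mimic two columns of the chocolate data
toy <- tibble(
  company_location = c("U.S.A.", "U.S.A.", "France", "U.S.A."),
  country_of_bean_origin = c("Peru", "Blend", "Peru", "Peru")
)

toy_counts <- toy %>%
  # Keep U.S.A. manufacturers and drop bars made from blended beans
  filter(company_location == "U.S.A.",
         country_of_bean_origin != "Blend") %>%
  # One row per bean origin with the number of bars
  count(country_of_bean_origin, name = "count")

toy_counts
```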

  2. With the three components that alternative text for data visualizations should include in mind, justify why each of the four alternative text examples below describing the above image meets or doesn’t meet standards for alternative text for data visualizations:

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 2”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Part II: Choosing Effective Color Schemes

According to the Colour Blind Awareness organization, approximately 300 million people are affected by color blindness worldwide. When creating data visualizations, the wrong choice of colors can easily render a figure inaccessible. As such, it is critical to choose a color palette that is accessible for all members of your audience. Two such options are the Viridis color scales, included in the ggplot2 package, and the ColorBrewer palettes, available through the RColorBrewer package.

The Viridis color scales are designed to be accessible to people with color vision deficiencies: the colors remain distinguishable both in color and in black-and-white. For discrete data, use + scale_color_viridis_d() or + scale_fill_viridis_d() depending on whether you are specifying the color or fill of your data visualization. Similarly, use + scale_color_viridis_c() or + scale_fill_viridis_c() for continuous data. You can refer to the Viridis color scales documentation for examples.
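
The difference between the `_c` and `_d` variants can be sketched with R’s built-in mtcars data. This dataset is not part of the lab and is used here only as an assumption-free stand-in: `hp` is continuous, while `factor(cyl)` is discrete.

```r
library(ggplot2)

# Continuous variable (hp) mapped to color, so the _c variant applies
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  scale_color_viridis_c()

# Discrete variable (cyl as a factor) mapped to color, so the _d variant applies
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_viridis_d()

p_continuous
p_discrete
```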

In choosing a color palette, you should consider the kind of data that you are dealing with for the data visualization at hand. Using the ColorBrewer tool, you can create your own colorblind-safe color palette based on your preferred number of colors and whether your data is numeric or categorical. This tool produces HEX color codes that may be used to define the colors of data visualizations in R.
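
HEX codes can also be pulled directly in R. The sketch below uses `brewer.pal()` from RColorBrewer to get three codes from the qualitative Dark2 palette and passes them to `scale_fill_manual()`. The `toy` data frame and the choice of Dark2 are assumptions made for illustration only.

```r
library(ggplot2)
library(RColorBrewer)

# Check that the chosen palette is flagged as colorblind friendly
brewer.pal.info["Dark2", "colorblind"]

# Three HEX codes from the qualitative Dark2 palette
dark2_hex <- brewer.pal(n = 3, name = "Dark2")

# Invented data with one bar per category
toy <- data.frame(group = c("a", "b", "c"), value = c(3, 5, 2))

p_brewer <- ggplot(toy, aes(x = group, y = value, fill = group)) +
  geom_col() +
  # Use the HEX codes as the fill colors, one per category
  scale_fill_manual(values = dark2_hex)

p_brewer
```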

The RColorBrewer package includes both sequential and diverging color palettes for numerical data. To visualize incremental changes in data, choose from the palettes below:

display.brewer.all(type = "seq", colorblindFriendly = TRUE)

Diagram showing colors for eighteen RColorBrewer colorblind-friendly sequential color palettes, each a 9-color gradient. Possible palettes include YlOrRd, YlOrBr, YlGnBu, YlGn, Reds, RdPu, Purples, PuRd, PuBuGn, PuBu, OrRd, Oranges, Greys, Greens, GnBu, BuPu, BuGn, Blues. Color abbreviations are: Yl is yellow, Or is orange, Rd is red, Br is brown, Gn is green, Bu is blue, Pu is purple.

To visualize data that falls above or below a starting point, choose from the palettes below:

display.brewer.all(type = "div", colorblindFriendly = TRUE)

Diagram showing colors for six RColorBrewer colorblind-friendly diverging color palettes, each an 11-color gradient. Possible palettes include RdYlBu, RdBu, PuOr, PRGn, PiYG, BrBG. Color abbreviations are: Yl is yellow, Or is orange, Rd is red, Br is brown, Gn is green, Bu is blue, Pu is purple, BG is blue green, YG is yellow green, PR is a dark purple.

To show the alternative text that you have written, include the following in the code chunk header:

```{r, fig.alt = "Add Alternative Text Here"}

add code here

```

  3. Now, try your hand at making data visualizations more accessible by focusing on the color palette and alternative text of this data visualization. Remember to include the chart type, type of data, and reason for including the chart in your alternative text:

penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm,
                             na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(
                       species, mean_bill), 
                       y = mean_bill, 
                       fill = species)
         ) +
  geom_col() + 
  coord_flip() +
  labs(title = "Average Bill Length of Three Penguin Species", 
       x = "Penguin Species", 
       y = "Average Bill Length in Millimeters", 
       fill = "Penguin Species") 

🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 3”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

As outlined by CR Ferreira in “Two Simple Steps to Create Colorblind-Friendly Data Visualizations”, published in Towards Data Science, you should practice double-coding to create accessible data visualizations. This means that in addition to choosing a colorblind-friendly color palette, you should utilize different geometrical shapes, line patterns, and fill textures in your data visualizations.
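
Double-coding with shapes can be sketched as follows, using R’s built-in mtcars data as a stand-in (an assumption for illustration, since it is not part of this lab). Each cylinder group gets both its own color and its own point shape, so the groups stay distinguishable even if the colors are not.

```r
library(ggplot2)

p_double <- ggplot(mtcars,
                   aes(x = wt, y = mpg,
                       color = factor(cyl),
                       shape = factor(cyl))) +
  geom_point(size = 3) +
  scale_color_viridis_d() +
  # Giving both legends the same title merges them into one
  labs(x = "Weight (1000 lbs)",
       y = "Miles per Gallon",
       color = "Cylinders",
       shape = "Cylinders")

p_double
```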

  4. Observe the following use of accessible color palettes and patterns to make the visualization below accessible:

sample_dataframe <- data.frame(
                    categories = c("a", "b", "c", "d"), 
                    values = c(2.3, 1.9, 3.2, 1))

sample_plot <- ggplot(sample_dataframe,
                      aes(categories, values)) +
  geom_col_pattern(
    aes(pattern = categories, 
        fill = categories, 
        pattern_fill = categories), 
    colour = "black", 
    pattern_density = 0.35, 
    pattern_key_scale_factor = 1.3
    ) +
  theme_bw() +
  labs(
    title = "Sample Plot Illustrating\nDouble-Coding for Accessibility",
    subtitle = "Using Color and Patterns"
    ) +
  scale_pattern_fill_viridis_d() + 
  theme(legend.position = "none") + 
  coord_fixed(ratio = 1)

sample_plot

Now, fill in the blanks in the following code chunk to improve the accessibility of the penguin species visualization in the same way. Once you have added your code, change the chunk option eval = FALSE to eval = TRUE:

penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm,
            na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(
                       species,mean_bill), 
                       y = mean_bill,
                       fill = species)
         ) +
  geom_col_pattern(
    aes(pattern = _____ , 
        fill = _____, 
        pattern_fill = _____),
    colour = "black", 
    pattern_density = 0.35, 
    pattern_key_scale_factor = 1.3) +
  theme_bw() +
  labs(
    title = "Average Bill Length of\nThree Penguin Species", 
    x = "Penguin Species", 
    y = "Average Bill Length in Millimeters",
    fill = "Penguin Species"
    ) +
  scale_pattern_fill______ + 
  theme(legend.position = "none") + 
  coord_flip()

🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 4”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  5. Compare your color palette and alternative text for this data visualization with others in your group. As a group, combine your ideas to decide on the final data visualization.

🧶 ✅ ⬆️ You should pause again, commit changes with the commit message “Add Exercise 5” and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  6. Reflection: What’s difficult about applying these criteria to more complex visualizations? What purpose do these elements serve for you as the creator of data visualizations as well as for your audience?

🧶 ✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Finish Lab X! 💪”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.

Summary of Amy Cesal’s Alternative Text Guide

In the video that you watched above, Amy Cesal provided a guide to writing alternative text for data visualization. Adding alternative text to images increases accessibility, especially for individuals with some level of visual impairment who use assistive technology like screen readers. Without alternative text, people miss out on content simply because it’s visual.

Amy Cesal outlines three components of effective alternative text for data visualization. First, describe the chart type of the data visualization to give the audience context for understanding the rest of the visual. Then, include the type of data included in the chart by referring to the x and y axis labels, for example. Finally, add the reason for including the chart by telling the audience what to look for and what makes the chart meaningful. For readers who want to explore further, link to the data source that was used to produce the chart in the surrounding text rather than in the alternative text.
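
One way to keep the three components in view while drafting is to assemble them in code. The helper below, `compose_alt_text()`, is hypothetical and not part of any package, and the example describes a scatterplot of R’s built-in mtcars data rather than a plot from this lab.

```r
# Hypothetical helper: joins the three components into one alt-text string
compose_alt_text <- function(chart_type, data_type, reason) {
  paste0(chart_type, " of ", data_type, ", ", reason, ".")
}

alt_text <- compose_alt_text(
  chart_type = "Scatterplot",
  data_type  = "car weight against miles per gallon for the 32 cars in the mtcars data",
  reason     = "where heavier cars tend to get fewer miles per gallon"
)

alt_text
```

A string built this way could then be supplied through the fig.alt chunk option, assuming it is defined in an earlier chunk.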

Further Resources

Find a non-exhaustive list of Data Visualization Accessibility resources on the dataviza11y GitHub repository.

Want to go further? You can check the accessibility of a webpage using the WebAIM tool.

Acknowledgements

This lab was created by Bates College students, Liza Dubinsky and Max Devon, in collaboration with Dr. Laurie Baker as part of a Bates Faculty Development Fund.