TidyTuesday, a weekly social project for individuals to practice their data visualization skills in R, is one of many spaces rendered inaccessible due to a lack of alternative text for screen reading technology. An analysis of TidyTuesday images posted on Twitter between 2018 and 2021 by Silvia Canelón and Elizabeth Hare revealed that 3% of the data visualizations had alternative text and 84% of the images were described by default as “image”. There is still a great deal of progress to be made concerning accessibility not only on Twitter but across all platforms.
The main goal of this lab is to introduce you to elements of accessibility in data visualization which you will be using throughout the course and in your final projects. Creating accessible visualizations involves many facets. In this lab we will focus primarily on writing and including alternative text for plots and choosing effective color schemes for your data.
In this lab we will work with two principal packages: ggpattern, which allows us to customize the patterns in our plots, and tidyverse, which is a collection of packages, including ggplot2, for doing data analysis and visualization in a “tidy” way. We will also use the countrycode package to format and preprocess the data for the chocolate bar visualizations.
These packages are already installed for you. You can load the packages by running the following in the Console.
library(tidyverse)
library(openintro)
library(countrycode)
library(ggpattern)
library(ggplot2)
library(RColorBrewer)
Note that the packages are also loaded with the same commands in your R Markdown document.
The data frames we will be working with today are called chocolate and penguins and are both from the TidyTuesday GitHub repository.
You can load the data frames by running the following in the Console.
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 1”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
First, we will create a new continent variable using mutate that will allow us to explore chocolate production and cacao origin by continent.
chocolate <- chocolate %>%
  mutate(continent = countrycode(country_of_bean_origin,
                                 origin = "country.name",
                                 destination = "continent")
  )
Next, we will filter the data for chocolate bars produced by companies located in the USA that are of a single-country bean origin, and then count the number of chocolate bars produced by bean country of origin and continent.
chocolate %>%
  # Keep bars made by U.S. companies from single-origin beans
  filter(company_location == "U.S.A.",
         country_of_bean_origin != "Blend") %>%
  # Count bars per country of bean origin (and its continent)
  group_by(country_of_bean_origin, continent) %>%
  summarise(count = n()) %>%
  # Order countries by count so the bars are sorted
  ggplot(mapping = aes(x = fct_reorder(country_of_bean_origin, count),
                       y = count,
                       fill = continent)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Bean Origin of Chocolate Bars\nManufactured in the U.S.A.",
    y = "Number of Chocolate Bars\nManufactured in the U.S.A.",
    x = "Country of Bean Origin",
    fill = "Continent") +
  scale_fill_viridis_d()
Horizontal bar chart. Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for the bean origin of 154, 115, 87, and 86 bars respectively. Bars are color-coded by continent, with most beans originating from the Americas (blue).
Horizontal bar chart of the number of chocolate bars manufactured in the U.S.A. by country and continent of bean origin. Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively. Bars are color-coded by continent, with most beans originating from the Americas (blue).
Horizontal bar chart of the number of chocolate bars manufactured in the U.S.A. by country and continent of bean origin.
Number of chocolate bars manufactured in the U.S.A. by country and continent of bean origin. Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively. Bars are color-coded by continent, with most beans originating from the Americas (blue).
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 2”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
In the video that you watched above, Amy Cesal provided a guide to writing alternative text for data visualization. Adding alternative text to images increases accessibility, especially for individuals with some level of visual impairment who use assistive technology like screen readers. Without alternative text, people miss out on content simply because it’s visual.
Amy Cesal outlines three components of effective alternative text for data visualization. First, describe the chart type to give the audience context for understanding the rest of the visual. Then, state the type of data shown in the chart, for example by referring to the x- and y-axis labels. Finally, give the reason for including the chart by telling the audience what to look for and what makes the chart meaningful. For further exploration, link to the data source used to produce the chart in the surrounding text rather than in the alternative text.
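The three components above can be assembled mechanically. The sketch below is illustrative only: build_alt_text() is a hypothetical helper written for this example, not a function from any package, and the wording is adapted from the chocolate bar chart earlier in this lab.

```r
# Hypothetical helper (not from any package): join the three components
# of alternative text (chart type, type of data, takeaway) into one string.
build_alt_text <- function(chart_type, data_description, takeaway) {
  paste0(chart_type, " of ", data_description, ". ", takeaway)
}

alt_text <- build_alt_text(
  chart_type = "Horizontal bar chart",
  data_description = paste(
    "the number of chocolate bars manufactured in the U.S.A.",
    "by country and continent of bean origin"
  ),
  takeaway = "Most beans originate from the Americas."
)

alt_text
```

Writing the components separately makes it easy to spot when one of them is missing.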
Keep in mind as you write your alternative text that a commonly cited guideline, often attributed to the Web Content Accessibility Guidelines (WCAG), is to limit alternative text to about 125 characters. Include enough text to convey the key takeaways from the data visualization while keeping the character count low.
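The 125-character guideline is easy to check in R with nchar(). The following is a minimal sketch; check_alt_length() is a hypothetical helper written for this example, and the sample text is made up.

```r
# Hypothetical helper: report how long an alt text string is and whether
# it fits within a character limit (125 by default).
check_alt_length <- function(alt_text, limit = 125) {
  n_chars <- nchar(alt_text)
  list(n_chars = n_chars, within_limit = n_chars <= limit)
}

short_alt <- paste(
  "Horizontal bar chart of U.S.A.-made chocolate bars by bean origin.",
  "Most beans come from the Americas."
)

check_alt_length(short_alt)
```

If the text is over the limit, trim details that the surrounding prose already covers before cutting the chart type, the data description, or the takeaway.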
According to the Colour Blind Awareness organization, approximately 300 million people worldwide are affected by color blindness. A poor choice of colors can easily render a figure inaccessible to them, so it is critical to choose a color palette that works for all members of your audience. Two good options are the Viridis color scales, included in the ggplot2 package, and the ColorBrewer palettes, available through the RColorBrewer package.
The Viridis color scales are designed to be accessible to people with color vision deficiency: the colors remain distinguishable both in color and when converted to black-and-white. For discrete data, use + scale_color_viridis_d() or + scale_fill_viridis_d() depending on whether you are specifying the color or fill of your data visualization. Similarly, use + scale_color_viridis_c() or + scale_fill_viridis_c() for continuous data. You can refer to the Viridis color scales documentation for examples.
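As a quick illustration of the discrete and continuous variants, here is a minimal sketch. The data frame demo is made up for this example and is not part of the lab data.

```r
library(ggplot2)

# Small made-up data frame used only for illustration.
demo <- data.frame(
  group = c("a", "b", "c"),
  value = c(3, 5, 2)
)

# Discrete variable mapped to fill: use scale_fill_viridis_d().
discrete_plot <- ggplot(demo, aes(x = group, y = value, fill = group)) +
  geom_col() +
  scale_fill_viridis_d()

# Continuous variable mapped to color: use scale_color_viridis_c().
continuous_plot <- ggplot(demo, aes(x = group, y = value, color = value)) +
  geom_point(size = 4) +
  scale_color_viridis_c()

discrete_plot
continuous_plot
```

The suffix (_d or _c) follows the type of the variable, and the prefix (fill or color) follows the aesthetic it is mapped to.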
When choosing a color palette, consider the kind of data you are visualizing. Using the ColorBrewer tool, you can create your own colorblind-safe palette based on your preferred number of colors and whether your data is numeric or categorical. The tool produces HEX color codes that can be used to define the colors of data visualizations in R.
The RColorBrewer package includes both sequential and diverging color palettes for numerical data. To visualize incremental changes in data, choose one of the sequential palettes.
To visualize data that falls above or below a starting point, choose one of the diverging palettes.
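The palettes of each type can also be listed directly from R. The sketch below relies on brewer.pal.info, a data frame shipped with RColorBrewer that records each palette's category and whether it is flagged as colorblind friendly; the choice of "Blues" and five colors is arbitrary.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette (row names are palette names),
# with columns for the category ("seq", "div", or "qual") and a
# colorblind-friendly flag.
info <- brewer.pal.info

# Sequential palettes flagged as colorblind friendly.
sequential <- rownames(info[info$category == "seq" & info$colorblind, ])

# Diverging palettes flagged as colorblind friendly.
diverging <- rownames(info[info$category == "div" & info$colorblind, ])

sequential
diverging

# Extract HEX codes from one palette, here five shades of "Blues".
brewer.pal(n = 5, name = "Blues")

# Preview only the colorblind-friendly palettes.
display.brewer.all(colorblindFriendly = TRUE)
```

In a ggplot2 plot, a palette name from either list can be passed to scale_fill_brewer(palette = ...) or scale_color_brewer(palette = ...).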
To show the alternative text that you have written, include the following in the code chunk header:
```{r, fig.alt = "Add Alternative Text Here"}
add code here
```
penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm, na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(species, mean_bill),
                       y = mean_bill,
                       fill = species)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Bill Length of Three Penguin Species",
       x = "Penguin Species",
       y = "Average Bill Length in Millimeters",
       fill = "Penguin Species")
🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 3”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
As outlined by CR Ferreira in “Two Simple Steps to Create Colorblind-Friendly Data Visualizations”, published in Towards Data Science, you should practice double-coding to create accessible data visualizations. This means that in addition to choosing a colorblind-friendly color palette, you should utilize different geometrical shapes, line patterns, and fill textures in your data visualizations.
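Double-coding with shapes and line patterns needs nothing beyond ggplot2. The sketch below maps one variable to color, point shape, and line type at once; the data frame demo is made up for this example, and end = 0.8 is an arbitrary choice to avoid the lightest yellow.

```r
library(ggplot2)

# Made-up data used only for illustration.
demo <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2.0, 2.8, 3.9, 1.2, 1.9, 2.4),
  group = rep(c("a", "b"), each = 3)
)

# Map the same variable to color, point shape, and line type so that the
# groups stay distinguishable without relying on color alone.
double_coded <- ggplot(demo, aes(x = x, y = y, color = group,
                                 shape = group, linetype = group)) +
  geom_line() +
  geom_point(size = 3) +
  scale_color_viridis_d(end = 0.8)

double_coded
```

Because all three aesthetics use the same variable, ggplot2 merges them into a single legend.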
sample_dataframe <- data.frame(
  categories = c("a", "b", "c", "d"),
  values = c(2.3, 1.9, 3.2, 1))
sample_plot <- ggplot(sample_dataframe,
                      aes(categories, values)) +
  # Map the same variable to pattern, fill, and pattern fill
  geom_col_pattern(
    aes(pattern = categories,
        fill = categories,
        pattern_fill = categories),
    colour = "black",
    pattern_density = 0.35,
    pattern_key_scale_factor = 1.3
  ) +
  theme_bw() +
  labs(
    title = "Sample Plot Illustrating\nDouble-Coding for Accessibility",
    subtitle = "Using Color and Patterns"
  ) +
  scale_pattern_fill_viridis_d() +
  theme(legend.position = "none") +
  coord_fixed(ratio = 1)
sample_plot
Now, fill in the blanks in the following code chunk to improve the accessibility of the penguin species visualization in the same way. Once you have added your code, change the chunk option eval = FALSE to eval = TRUE:
penguins %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm, na.rm = TRUE)) %>%
  ggplot(mapping = aes(x = fct_reorder(species, mean_bill),
                       y = mean_bill,
                       fill = species)) +
  geom_col_pattern(
    aes(pattern = _____,
        fill = _____,
        pattern_fill = _____),
    colour = "black",
    pattern_density = 0.35,
    pattern_key_scale_factor = 1.3) +
  theme_bw() +
  labs(
    title = "Average Bill Length of\nThree Penguin Species",
    x = "Penguin Species",
    y = "Average Bill Length in Millimeters",
    fill = "Penguin Species"
  ) +
  scale_pattern_fill______ +
  theme(legend.position = "none") +
  coord_flip()
🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 4”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
🧶 ✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Finish Lab X! 💪”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.
Find a non-exhaustive list of Data Visualization Accessibility resources on the dataviza11y GitHub.
Want to go further? You can check the accessibility of a webpage using the WebAIM tool.
This lab was created by Bates College students, Liza Dubinsky and Max Devon, in collaboration with Dr. Laurie Baker as part of a Bates Faculty Development Fund.