TidyTuesday, a weekly social project for individuals to practice their data visualization skills in R, is one of many spaces rendered inaccessible due to a lack of alternative text for screen reading technology. An analysis of TidyTuesday images posted on Twitter between 2018 and 2021 by Silvia Canelón and Elizabeth Hare revealed that 3% of the data visualizations had alternative text and 84% of the images were described by default as “image”. There is still a great deal of progress to be made concerning accessibility not only on Twitter but across all platforms.
The main goal of this lab is to introduce you to elements of accessibility in data visualization which you will be using throughout the course and in your final projects. Creating accessible visualizations involves many facets. In this lab we will focus primarily on writing and including alternative text for plots and choosing effective color schemes for your data.
In this lab we will work with two principal packages: ggpattern which allow us to customize the pattern of our plots and tidyverse which is a collection of packages including ggplot2 for doing data analysis and visualization in a “tidy” way. We will also be using the countrycode package to format and preprocess the data for the chocolate bar visualizations.
These packages are already installed for you. You can load the packages by running the following in the Console.
::opts_chunk$set(warning = FALSE, message = FALSE) knitr
library(tidyverse)
library(openintro)
library(countrycode)
library(ggpattern)
library(ggplot2)
library(RColorBrewer)
Note that the packages are also loaded with the same commands in your R Markdown document.
The data frames we will be working with today are called
chocolate
and penguins
and are both from the
TidyTuesday
GitHub repository.
You can load the data frames by running the following in the Console.
<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv')
chocolate
<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv') penguins
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 1”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
First, we will create a new continent variable using
mutate
that will allow us to explore chocolate production
and cacao origin by continent.
<- chocolate %>%
chocolate mutate(continent = countrycode(country_of_bean_origin,
origin = "country.name",
destination = "continent")
)
Next, we will filter the data for chocolate bars produced by companies located in the USA that are of a single-country bean origin, and then count the number of chocolate bars produced by bean country of origin and continent.
%>%
chocolate filter(company_location == "U.S.A.",
!= "Blend") %>%
country_of_bean_origin group_by(country_of_bean_origin, continent) %>%
summarise(count = n()) %>%
ggplot(mapping = aes(x = fct_reorder(country_of_bean_origin,
count), y = count,
fill = continent)
+
) geom_col() +
coord_flip() +
labs(
title = "Bean Origin of Chocolate Bars
Manufactured in the U.S.A.",
y = "Number of Chocolate Bars
Manufactured in the U.S.A.",
x = "Country of Bean Origin",
fill = "Continent") +
scale_fill_viridis_d()
Bar chart where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.
Bar chart of the number of chocolate bars manufactured in the U.S.A by country and continent of bean origin where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.
Bar chart of the number of chocolate bars manufactured in the U.S.A by country and continent of bean origin.
Number of chocolate bars manufactured in the U.S.A by country and continent of bean origin where 20 of the 38 countries are in the Americas, 11 of the 38 countries are in Africa, and the Dominican Republic, Peru, Venezuela, and Ecuador are the top four countries responsible for 154, 115, 87, and 86 bars respectively.
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with the commit message “Add Exercise 2”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
According to the Colour Blind Awareness organization, approximately 300 million people are affected by color blindness worldwide. When creating data visualizations, the wrong choice of colors can easily render a figure inaccessible. As such, it is critical to choose a color palette that is accessible for all members of your audience. Two such palettes are the Viridis color scales included in the ggplot2 package, and the Colorbrewer palettes, using the RColorBrewer package.
The Viridis color scales are designed to be accessible for those who
are visually impaired because the colors are differentiable in both
color and black-and-white. For discrete data, use
+ scale_color_viridis_d()
or
+ scale_fill_viridis_d()
depending on whether you are
specifying the color or fill of your data visualization. Similarly, use
+ scale_color_viridis_c()
or
+ scale_fill_viridis_c()
for continuous data. You can refer
to the Viridis
color scales documentation for examples.
In choosing a color palette, you should consider the kind of data that you are dealing with for the data visualization at hand. You can create your own colorblind safe color palette based on your preferred number of colors and whether your data is numeric or categorical using the ColorBrewer. This tool produces HEX color codes that may be used to define the colors of data visualizations in R.
The RColorBrewer package palettes include both sequential and diverging color palettes for numerical data. To visualize incremental changes in data, choose from the below palettes:
display.brewer.all(type = "seq", colorblindFriendly = TRUE)
To visualize data that falls above or below a starting point, choose from the below palettes:
display.brewer.all(type = "div", colorblindFriendly = TRUE)
To show the alternative text that you have written, include the following in the code chunk header:
```{r, fig.alt = "Add Alternative Text Here"}
add code here
```
%>%
penguins group_by(species) %>%
summarise(mean_bill = mean(bill_length_mm,
na.rm = TRUE)) %>%
ggplot(mapping = aes(x = fct_reorder(
species, mean_bill), y = mean_bill,
fill = species)
+
) geom_col() +
coord_flip() +
labs(title = "Average Bill Length of Three Penguin Species",
x = "Penguin Species",
y = "Average Bill Length in Millimeters",
fill = "Penguin Species")
🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 3”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
As outlined by CR Ferreira in “Two Simple Steps to Create Colorblind-Friendly Data Visualizations”, published in Towards Data Science, you should practice double-coding to create accessible data visualizations. This means that in addition to choosing a colorblind-friendly color palette, you should utilize different geometrical shapes, line patterns, and fill textures in your data visualizations.
<- data.frame(
sample_dataframe categories = c("a", "b", "c", "d"),
values = c(2.3, 1.9, 3.2, 1))
<- ggplot(sample_dataframe,
sample_plot aes(categories, values)) +
geom_col_pattern(
aes(pattern = categories,
fill = categories,
pattern_fill = categories),
colour = "black",
pattern_density = 0.35,
pattern_key_scale_factor = 1.3
+
) theme_bw() +
labs(
title = "Sample Plot Illustrating
Double-Coding for Accessibility",
subtitle = "Using Color and Patterns"
+
) scale_pattern_fill_viridis_d() +
theme(legend.position = "none") +
coord_fixed(ratio = 1)
sample_plot
Now, fill in the blanks in the following code chunk to improve the
accessibility of the penguin species visualization in the same way. Once
you have added your code change the chunk options
eval = FALSE
to eval = TRUE
:
%>%
penguins group_by(species) %>%
summarise(mean_bill = mean(bill_length_mm,
na.rm = TRUE)) %>%
ggplot(mapping = aes(x = fct_reorder(
species,mean_bill), y = mean_bill,
fill = species)
+
) geom_col_pattern(
aes(pattern = _____ ,
fill = _____,
pattern_fill = _____),
colour = "black",
pattern_density = 0.35,
pattern_key_scale_factor = 1.3) +
theme_bw() +
labs(
title = "Average Bill Length of
Three Penguin Species",
x = "Penguin Species",
y = "Average Bill Length in Millimeters",
fill = "Penguin Species"
+
) +
scale_pattern_fill______ theme(legend.position = "none") +
coord_flip()
🧶 ✅ ⬆️ This is another good place to pause, knit, commit changes with the commit message “Add Exercise 4”, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
🧶 ✅ ⬆️ You should pause again, commit changes with the commit message “Add Exercise 5” and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
🧶 ✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Finish Lab X! 💪“, and push. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.
In the video that you watched above, Amy Cesal provided a guide to writing alternative text for data visualization. Adding alternative text to images increases accessibility, especially for individuals with some level of visual impairment who use assistive technology like screen readers. Without alternative text, people miss out on content simply because it’s visual.
Amy Cesal outlines three components of effective alternative text for data visualization. First, describe the chart type of the data visualization to give the audience context for understanding the rest of the visual. Then, include the type of data included in the chart by referring to the x and y axis labels, for example. Finally, add the reason for including the chart by telling the audience what to look for and what makes the chart meaningful. Link to the data source that was used to produce the chart in the surrounding text rather than in the alternative text for further exploration.
Find a non-exhaustive list of Data Visualization Accessibility resources on the dataviza11y github
Want to go further? You can check the accessibility of a webpage using the WebAim tool.
This lab was created by Bates College students, Liza Dubinsky and Max Devon, in collaboration with Dr. Laurie Baker as part of a Bates Faculty Development Fund.