HW 03 - Road traffic accidents

Photo by Clark Van Der Beken on Unsplash Photo by Clark Van Der Beken on Unsplash

In this assignment we’ll look at traffic accidents in Edinburgh. The data are made available online by the UK Government. It covers all recorded accidents in Edinburgh in 2018 and some of the variables were modified for the purposes of this assignment.

Getting started

Click on the assignment link. On Github locate your homework repo, which should be named hw-03-accidents-YOUR_GITHUB_USERNAME or something similar. Grab the URL of the repo, and clone it in RStudio. First, open the R Markdown document hw-03.Rmd and Knit it. Make sure it compiles without errors. The output will be in the file markdown .md file with the same name.

Warm up

Before we introduce the data, let’s warm up with some simple exercises.

Packages

We’ll use the tidyverse package for much of the data wrangling and visualization and the data lives in the dsbox package.

You can load them by running the following in your Console:

#install.packages("tidyverse")
library(tidyverse)
# install.packages("devtools")
#library(devtools)
#devtools::install_github("rstudio-education/dsbox")
library(dsbox)

Note, if the above code does not run, you will want to uncomment (remove the #) and install the required packages. You will then want to recommend (add the #) so that the package does not get installed everytime you knit.

Data

The data can be found in the dsbox package, and it’s called accidents. Since the dataset is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package. You can find out more about the dataset by inspecting its documentation, which you can access by running ?accidents in the Console or using the Help menu in RStudio to search for accidents. You can also find this information here.

Occasionally RStudio Cloud and Github have trouble connecting. So the dataset is also in the folder and can be loaded using this code instead.

accidents <- read_csv("accidents.csv")

Exercises

  1. How many observations (rows) does the dataset have? Instead of hard coding the number in your answer, use inline code.

Hint: The Markdown Quick Reference sheet has an example of inline R code that might be helpful. You can access it from the Help menu in RStudio. It is also referenced in this section of the RMarkdown cookbook.

  1. Run View(accidents) in your Console to view the data in the data viewer. What does each row in the dataset represent?

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. Recreate the following plot, and describe in context of the data. In your answer, don’t forget to label your R chunk as well (where it says label-me-1). Your label should be short, informative, shouldn’t include spaces, and shouldn’t shouldn’t repeat a previous label.

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. Create another data visualisation based on these data and interpret it. You can choose any variables and any type of visualisation you like, but it must have at least three variables, e.g. a scatterplot of x vs. y isn’t enough, but if points are coloured by z, that’s fine. In your answer, don’t forget to label your R chunk as well (where it says label-me-2).

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards and review the md document on GitHub to make sure you’re happy with the final state of your work. Check also the self checklist in your Github Readme to check to see if you missed a solution or interpretation that was expected on your lab/hw.