Photo by Daniel Cheung on Unsplash
This week we’ll do some data gymnastics to refresh and review what we learned over the past few weeks using (simulated) data from Lego sales in 2018 for a sample of customers who bought Legos in the US.
Click on your assignment link and go to the GitHub repo. Clone the repo in RStudio and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
# install.packages("tidyverse")
library(tidyverse)
# install.packages("devtools")
# library(devtools)
# devtools::install_github("rstudio-education/dsbox")
library(dsbox)
Note: if the above code does not run, uncomment the install lines (remove the #) and install the required packages. Then comment them out again (add the # back) so that the packages do not get reinstalled every time you knit.
The data can be found in the dsbox package, and it’s called lego_sales. Since the dataset is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package. You can find out more about the dataset by inspecting its documentation, which you can access by running ?lego_sales in the Console or using the Help menu in RStudio to search for lego_sales. You can also find this information here.
Answer the following questions using pipelines. For each question, state your answer in a sentence, e.g. “In this sample, the three most common first names of purchasers are …”. Note that the answers to all questions are within the context of this particular sample of sales, i.e. you shouldn’t make inferences about the population of all Lego sales based on this sample.
What are the three most common first names of purchasers?
What are the three most common themes of Lego sets purchased?
Among the most common theme of Lego sets purchased, what is the most common subtheme?
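The three questions above share one pattern: count how often each value occurs, sort, and keep the top few. Here is a minimal sketch of that pattern on a small made-up tibble. The toy_sales data, its column names, and its values are invented for illustration and are not taken from lego_sales, so treat this as a starting point rather than a solution.

```r
library(tidyverse)

# Made-up data for illustration only; not taken from lego_sales.
toy_sales <- tibble(
  buyer    = c("Ana", "Ben", "Ana", "Cy", "Ana", "Ben", "Dee"),
  category = c("Space", "City", "Space", "City", "Space", "Space", "Castle"),
  subcat   = c("Rockets", "Police", "Rovers", "Fire", "Rockets", "Rockets", "Knights")
)

# Count occurrences of each buyer, most frequent first, and keep the top two.
top_buyers <- toy_sales %>%
  count(buyer, sort = TRUE) %>%
  slice_head(n = 2)

# Restrict to a single category, then count subcategories within it.
top_subcat <- toy_sales %>%
  filter(category == "Space") %>%
  count(subcat, sort = TRUE) %>%
  slice_head(n = 1)

top_buyers
top_subcat
```

Note that slice_head() cuts at a fixed number of rows and so drops ties at the cutoff; slice_max(n, n = 2) would keep tied rows instead, which may matter when you state your answer in a sentence.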
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”.
Hint: Use the case_when() function.
Hint: You will need to consider quantity of purchases.
Hint: You will need to consider quantity of purchases as well as price of Lego sets.
Which Lego theme has made the most money for Lego?
Which area code has spent the most money on Legos? In the US the area code is the first 3 digits of a phone number.
Hint: The str_sub() function will be helpful here!
Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
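The hints above point at three techniques: case_when() for binning a numeric variable, multiplying quantity by price before summing, and str_sub() for pulling a fixed-position prefix out of a string. Below is a rough sketch of all three on made-up data. The toy_orders tibble, every column name in it, and the simplified age bins are hypothetical; check ?lego_sales for the actual variable names before adapting any of it.

```r
library(tidyverse)

# Made-up orders for illustration only; columns and values are hypothetical.
toy_orders <- tibble(
  age      = c(17, 22, 40, 63, 22),
  phone    = c("919-555-0101", "212-555-0102", "919-555-0103",
               "415-555-0104", "212-555-0105"),
  price    = c(10, 20, 5, 50, 20),
  quantity = c(2, 1, 4, 1, 3)
)

toy_orders <- toy_orders %>%
  mutate(
    # case_when() evaluates conditions in order; the first TRUE one wins,
    # so each later condition only sees rows not matched earlier.
    age_bin = case_when(
      age <= 18 ~ "18 and under",
      age <= 35 ~ "19 - 35",
      TRUE      ~ "36 and over"
    ),
    # str_sub() extracts characters by position: here positions 1 through 3.
    prefix = str_sub(phone, 1, 3),
    # Money spent on a row is unit price times number of units bought.
    spent = price * quantity
  )

# Summarise units and spending per prefix, highest spending first.
spend_by_prefix <- toy_orders %>%
  group_by(prefix) %>%
  summarise(units = sum(quantity), total_spent = sum(spent)) %>%
  arrange(desc(total_spent))

spend_by_prefix
```

In this toy data the prefix with the most units bought is not the prefix with the highest spending, which is why the hints distinguish between considering quantity alone and considering quantity together with price.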
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards, and review the md document on GitHub to make sure you’re happy with the final state of your work. Also go through the self-checklist in your GitHub README to see whether you missed a solution or interpretation that was expected in your lab/hw.