Photo by Madeleine Kohler on Unsplash
Once upon a time, people travelled all over the world, and some stayed in hotels and others chose to stay in other people’s houses that they booked through Airbnb. Recent developments in Edinburgh regarding the growth of Airbnb and its impact on the housing market means a better understanding of the Airbnb listings is needed. Using data provided by Airbnb, we can explore how Airbnb availability and prices vary by neighbourhood.
Click on the assignment link for this assignment. It will create a
repository for you on github with the name (or some similar version)
hw-02-airbnb-edi-YOUR_GITHUB_USERNAME
. Grab the URL of the
repo, and clone it in RStudio. First, open the R Markdown document
hw-02.Rmd
and Knit it. Make sure it compiles without
errors. The output will be in the file markdown .md
file
with the same name.
Before we introduce the data, let’s warm up with some simple exercises.
We’ll use the tidyverse package for much of the data wrangling and visualization and the data lives in the dsbox package. We will be using the ggridges package as well to make a ridges plots.
You can load them by running the following in your Console:
#install.packages("tidyverse")
library(tidyverse)
# install.packages("devtools")
# library(devtools)
# devtools::install_github("rstudio-education/dsbox")
library(dsbox)
# install.packages("ggridges")
library(ggridges)
Note:
The dsbox package is loaded using a package call to
devtools
. devtools
is used to load packages
installed on GitHub. You can type ??dsbox
within the
console or do a google search to figure out where it is loaded
from.
If the above code does not run, you will want to uncomment (remove the #) and install the required packages. You will then want to recommend (add the #) so that the package does not get installed everytime you knit.
The data can be found in the dsbox package, and it’s
called edibnb
. Since the dataset is distributed with the
package, we don’t need to load it separately; it becomes available to us
when we load the package.
Occasionally RStudio Cloud and Github have trouble connecting. So the dataset is also in the data folder and can be loaded using this code instead.
<- read_csv("data/edibnb.csv") edibnb
You can view the dataset as a spreadsheet using the
View()
function. Note that you should not put this function
in your R Markdown document, but instead type it directly in the
Console, as it pops open a new window (and the concept of popping open a
window in a static document doesn’t really make sense…). When you run
this in the console, you’ll see the following data
viewer window pop up.
View(edibnb)
You can find out more about the dataset by inspecting its
documentation, which you can access by running ?edibnb
in
the Console or using the Help menu in RStudio to search for
edibnb
. You can also find this information on
the RStudio Education Github page for the package.
Hint: The Markdown Quick Reference sheet has an example of inline R code that might be helpful. You can access it from the Help menu in RStudio. It is also referenced in this section of the RMarkdown cookbook.
View(edibnb)
in your Console to view the data in
the data viewer. What does each row in the dataset represent?Hint: It’s something you can book.
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Each column represents a variable. We can get a list of the variables
in the data frame using the names()
function.
names(edibnb)
You can find descriptions of each of the variables in the help file
for the dataset, which you can access by running ?edibnb
in
your Console.
Note: The plot will give a warning about some observations with non-finite values for price being removed. Don’t worry about the warning, it simply means that 199 listings in the data didn’t have prices available, so they can’t be plotted.
ggplot(data = ___, mapping = aes(x = ___)) +
geom_histogram(binwidth = ___) +
facet_wrap(~___)
Let’s de-construct this code:
ggplot()
is the function we are using to build our
plot, in layers.🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Hint: 4a will give you the names you will use to filter the neighbourhoods before making the plot in 4b.
New
term: Whenever you have more than one pipe %>%
it is called a pipeline 😎.
4a. Use a single pipeline to identity the neighbourhoods with the top five median listing prices.
Hint:
Did your pipeline return NAs for the median listing prices? Use
?median
to find out how the argument na.rm
might be useful in your code.
4b. Then, in another pipeline filter the data for these five neighbourhoods and make ridge plots of the distributions of listing prices in these five neighbourhoods.
Hint: Check out the tutorial on R Graph Gallery on how to make a rideline plot.
4c. In a third pipeline calculate the minimum, mean, median, standard deviation, IQR, and maximum listing price in each of these neighbourhoods.
Use the visualisation and the summary statistics to describe the distribution of listing prices in the neighbourhoods.
(Your final answer for 4 will include three pipelines, one of which ends in a visualisation, and a narrative.)
review_scores_rating
) across
neighbourhoods. You get to decide what type of visualisation to create
and there is more than one correct answer! In your answer, include a
brief interpretation of how Airbnb guests rate properties in general and
how the neighbourhoods compare to each other in terms of their
ratings.Tip: Looking for ideas? Check out R Graph Gallery for some types of graphs could you use to plot a distribution? What ones have we learned so far in class?
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards and review the md document on GitHub to make sure you’re happy with the final state of your work. Finally, look at the Readme and go through the self checklist to check that you’ve remembered to add your interpretations to the document