Photo by Viktor Kern on Unsplash
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are more than 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems because of their important role in traffic, environmental, and health issues.
Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data generated by these systems make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.
Source: UCI Machine Learning Repository - Bike Sharing Dataset
Click on your assignment link, clone it in RStudio and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
We’ll use the tidyverse package for much of the data wrangling and visualisation, and the data lives in the dsbox package. Remember to install these packages before loading them, using install.packages("tidyverse"). Loading the dsbox package is a little more involved: we first need to install devtools and then install the package from GitHub.
install.packages("devtools")
library(devtools)
devtools::install_github("rstudio-education/dsbox")
You can then load them by running the following in your Console:
library(tidyverse)
library(dsbox)
The data can be found in the dsbox package, and it’s called dcbikeshare. Since the dataset is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package.
In case there are problems with loading the dsbox package, I have also included the dataset in the data folder. You can load it directly from there if you have any issues.
You can find out more about the dataset by inspecting its documentation, which you can access by running ?dcbikeshare in the Console or using the Help menu in RStudio to search for dcbikeshare. You can also find this information here.
The data include daily bike rental counts (by members and casual users) of Capital Bikeshare in Washington, DC in 2011 and 2012 as well as weather information on these days. The original data sources are http://capitalbikeshare.com/system-data and http://www.freemeteo.com.
Note: You will find it helpful to review the case_when function and the functions in the forcats package for this exercise.
Recode the season variable to be a factor with meaningful level names as outlined in the codebook, with spring as the baseline level.
Recode the binary variables holiday and workingday to be factors with levels no (0) and yes (1), with no as the baseline level.
Recode the yr variable to be a factor with levels 2011 and 2012, with 2011 as the baseline level.
Recode the weathersit variable as 1 - clear, 2 - mist, 3 - light precipitation, and 4 - heavy precipitation, with clear as the baseline.
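The recoding pattern can be sketched on a small made-up data frame, which is a rough illustration rather than a worked solution. The object names toy and toy_recoded are hypothetical, and the season mapping used here (1 = winter, 2 = spring, 3 = summer, 4 = fall) and the yr mapping (0 = 2011, 1 = 2012) are assumptions that should be checked against ?dcbikeshare before being applied to the real data.

```r
library(dplyr)
library(forcats)

# Toy stand-in for dcbikeshare; the values are invented for illustration.
toy <- data.frame(
  season     = c(1, 2, 3, 4),
  holiday    = c(0, 1, 0, 0),
  workingday = c(1, 0, 1, 1),
  yr         = c(0, 1, 0, 1),
  weathersit = c(1, 2, 3, 1)
)

toy_recoded <- toy %>%
  mutate(
    # Assumed codebook mapping; verify with ?dcbikeshare.
    season = case_when(
      season == 1 ~ "winter",
      season == 2 ~ "spring",
      season == 3 ~ "summer",
      season == 4 ~ "fall"
    ),
    # fct_relevel() puts the listed levels first, so spring is the baseline.
    season = fct_relevel(season, "spring", "summer", "fall", "winter"),
    # The first entry of `levels` is the baseline level.
    holiday    = factor(if_else(holiday == 1, "yes", "no"), levels = c("no", "yes")),
    workingday = factor(if_else(workingday == 1, "yes", "no"), levels = c("no", "yes")),
    yr         = factor(if_else(yr == 0, "2011", "2012"), levels = c("2011", "2012")),
    weathersit = factor(
      case_when(
        weathersit == 1 ~ "clear",
        weathersit == 2 ~ "mist",
        weathersit == 3 ~ "light precipitation",
        weathersit == 4 ~ "heavy precipitation"
      ),
      levels = c("clear", "mist", "light precipitation", "heavy precipitation")
    )
  )

# Inspect the level order; the first level is the baseline.
levels(toy_recoded$season)
levels(toy_recoded$weathersit)
```

Listing the levels explicitly keeps a level such as heavy precipitation in the factor even when no row in the data takes that value.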
Calculate raw temperature, feeling temperature, humidity, and windspeed as their values given in the dataset multiplied by the maximum raw values stated in the codebook for each variable. Instead of writing over the existing variables, create new ones with concise but informative names.
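A minimal sketch of the rescaling step on made-up values follows. The maxima used here (41 for temperature, 50 for feeling temperature, 100 for humidity, 67 for windspeed) are assumptions recalled from the dataset documentation and should be confirmed with ?dcbikeshare; the new column names are only one hypothetical naming choice.

```r
library(dplyr)

# Toy stand-in for the normalised columns; the values are invented.
toy <- data.frame(
  temp      = c(0.5, 1.0),
  atemp     = c(0.4, 1.0),
  hum       = c(0.8, 0.5),
  windspeed = c(0.2, 0.1)
)

toy_raw <- toy %>%
  mutate(
    # Assumed maxima; check the codebook before using them on the real data.
    temperature_raw         = temp * 41,
    feeling_temperature_raw = atemp * 50,
    humidity_raw            = hum * 100,
    windspeed_raw           = windspeed * 67
  )

toy_raw
```

Because mutate() adds columns under new names, the original normalised variables remain available alongside the raw ones.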
Check that the sum of casual and registered adds up to cnt for each record.
Hint: One way of doing this is to create a new column that takes on the value TRUE if they add up and FALSE if not, and then check whether all values in that column are TRUE. But this is only one way; you might come up with another.
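The approach described in the hint might look like the sketch below, shown on a made-up data frame. The names toy, toy_checked, and cnt_matches are hypothetical.

```r
library(dplyr)

# Toy stand-in with invented counts.
toy <- data.frame(
  casual     = c(10, 20, 5),
  registered = c(90, 180, 45),
  cnt        = c(100, 200, 50)
)

# New logical column: TRUE where casual + registered equals cnt.
toy_checked <- toy %>%
  mutate(cnt_matches = casual + registered == cnt)

# TRUE only if every record adds up.
all(toy_checked$cnt_matches)

# Number of records that do not add up.
sum(!toy_checked$cnt_matches)
```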
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Fit a linear model predicting total daily bike rentals from daily temperature. Write the linear model, interpret the slope and the intercept in context of the data, and determine and interpret the \(R^2\).
Fit another linear model predicting total daily bike rentals from daily feeling temperature. Write the linear model, interpret the slope and the intercept in context of the data, and determine and interpret the \(R^2\). Is temperature or feeling temperature a better predictor of bike rentals? Explain your reasoning.
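As a rough sketch of the mechanics only, the two single-predictor models can be fitted with base R's lm() and compared by their \(R^2\). The data below are simulated, so the numbers say nothing about the real bike rentals, and the column names temperature_raw and feeling_temperature_raw are hypothetical names for the rescaled variables.

```r
# Simulated toy data; not the real dcbikeshare values.
set.seed(1)
n <- 40
toy <- data.frame(temperature_raw = runif(n, 0, 35))
toy$feeling_temperature_raw <- 1.1 * toy$temperature_raw + rnorm(n, sd = 2)
toy$cnt <- 1200 + 160 * toy$temperature_raw + rnorm(n, sd = 500)

# One model per predictor.
fit_temp  <- lm(cnt ~ temperature_raw, data = toy)
fit_atemp <- lm(cnt ~ feeling_temperature_raw, data = toy)

# Intercept and slope of each fitted line.
coef(fit_temp)
coef(fit_atemp)

# Proportion of variability in cnt explained by each model.
r_squared <- c(
  temperature         = summary(fit_temp)$r.squared,
  feeling_temperature = summary(fit_atemp)$r.squared
)
r_squared
```

With a single predictor in each model, comparing the two \(R^2\) values directly is reasonable; with different numbers of predictors, adjusted \(R^2\) would be the fairer comparison.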
Fit a model predicting total daily bike rentals from season, year, whether the day is a holiday or not, whether the day is a working day or not, the weather category, temperature, feeling temperature, humidity, and windspeed, as well as the interaction between feeling temperature and holiday. Record the adjusted \(R^2\) of the model.
Write the linear models for holidays and non-holidays. Is the slope of temperature the same or different for these two models? How about the slope for feeling temperature? Why or why not?
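To see how an interaction term produces different slopes for the two groups, here is a deliberately reduced sketch with only feeling temperature and holiday as predictors, fitted to simulated data. It is not the full model from the exercise, and the column names are hypothetical. In an R formula, a * b expands to both main effects plus their interaction.

```r
# Simulated toy data; not the real dcbikeshare values.
set.seed(2)
n <- 60
toy <- data.frame(
  feeling_temperature_raw = runif(n, 0, 40),
  holiday = factor(rep(c("no", "no", "yes"), length.out = n), levels = c("no", "yes"))
)
toy$cnt <- 1000 + 150 * toy$feeling_temperature_raw -
  50 * toy$feeling_temperature_raw * (toy$holiday == "yes") +
  rnorm(n, sd = 400)

# Main effects plus the feeling temperature by holiday interaction.
fit_int <- lm(cnt ~ feeling_temperature_raw * holiday, data = toy)
b <- coef(fit_int)

# Non-holidays are the baseline, so their slope is the main-effect coefficient.
slope_non_holiday <- b[["feeling_temperature_raw"]]

# On holidays the interaction coefficient is added to the slope.
slope_holiday <- b[["feeling_temperature_raw"]] +
  b[["feeling_temperature_raw:holidayyes"]]

# The intercept shifts by the holiday main effect in the same way.
intercept_non_holiday <- b[["(Intercept)"]]
intercept_holiday     <- b[["(Intercept)"]] + b[["holidayyes"]]

# Adjusted R-squared penalises the number of predictors.
summary(fit_int)$adj.r.squared
```

Predictors that are not part of the interaction keep a single slope for both groups, while a predictor that interacts with holiday gets a separate slope for each level.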
Interpret the slopes of season and feeling temperature. If the slopes are different for holidays and non-holidays, make sure to interpret both. If the variable has multiple levels, make sure you interpret all of the slope coefficients associated with it.
Interpret the intercept. If the intercept is different for holidays and non-holidays, make sure to interpret both.
According to this model, assuming everything else is the same, in which season does the model predict total daily bike rentals to be highest and which to be the lowest?
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards, and review the md document on GitHub to make sure you’re happy with the final state of your work.