Photo by Mauro Mora on Unsplash
The GSS gathers data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviours, and attributes. Hundreds of trends have been tracked since 1972. In addition, since the GSS adopted questions from earlier surveys, trends can be followed for up to 70 years.
The GSS contains a standard core of demographic, behavioural, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events.
In this assignment we analyze data from the 2016 GSS, using it to estimate values of population parameters of interest about US adults.[1]
Follow the GitHub assignment link, clone the repo in RStudio, and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
We’ll use the tidyverse package for much of the data wrangling and visualisation and the tidymodels package for modelling; the data lives in the dsbox package. Remember to install the packages first; you can then load them by running the following in your Console:
library(tidyverse)
library(dsbox)
library(tidymodels)
The data can be found in the dsbox package, and it’s called gss16. It is also included in your data folder. Since the dataset is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package. You can find out more about the dataset by inspecting its documentation, which you can access by running ?gss16 in the Console or using the Help menu in RStudio to search for gss16. You can also find this information here.
In 2016, the GSS added a new question on harassment at work. The question is phrased as follows:
Over the past five years, have you been harassed by your superiors or co-workers at your job, for example, have you experienced any bullying, physical or psychological abuse?
Answers to this question are stored in the harass5 variable in our dataset.
What are the possible responses to this question and how many respondents chose each of these answers?
What percent of the respondents for whom this question is applicable (i.e. excluding NAs and Does not apply responses) have been harassed by their superiors or co-workers at their job?
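As a minimal sketch of the counting and percentage steps, the example below uses a small made-up tibble rather than gss16 itself. The response labels shown are assumptions for illustration; the labels in the real data may be spelled differently, so check them with count() first.

```r
library(dplyr)

# Toy stand-in for the harass5 column; these values are invented for illustration.
toy <- tibble(
  harass5 = c("Yes", "No", "No", "Does not apply", NA, "No", "Yes", "Does not apply")
)

# Tally every response, including NA.
toy_counts <- toy %>%
  count(harass5)

# Keep only applicable responses, then compute the share of each answer.
toy_applicable <- toy %>%
  filter(!is.na(harass5), harass5 != "Does not apply") %>%
  count(harass5) %>%
  mutate(percent = 100 * n / sum(n))

toy_applicable
```

In this toy data five responses are applicable, so the percentages are computed out of five rather than out of all eight rows. The same pattern of filtering first and then dividing by sum(n) carries over to the real variable.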
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
The 2016 GSS also asked respondents how many hours and minutes they spend on email weekly. The responses to these questions are recorded in the emailhr and emailmin variables. For example, if the response is 2.5 hrs, this would be recorded as emailhr = 2 and emailmin = 30.
Create a new variable called email that combines these two variables to report the number of minutes the respondents spend on email weekly.
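The arithmetic behind this step is hours times 60 plus minutes. A small sketch on invented values (not GSS data) shows the idea, including what happens when one of the two parts is missing:

```r
library(dplyr)

# Invented hours/minutes pairs; the first row mirrors the 2.5 hrs example.
toy <- tibble(
  emailhr  = c(2, 0, 10, NA),
  emailmin = c(30, 45, 0, 15)
)

# Convert hours to minutes and add the leftover minutes.
toy <- toy %>%
  mutate(email = emailhr * 60 + emailmin)

toy$email
# 150  45 600  NA
```

Because arithmetic with NA returns NA, a respondent missing either part ends up with a missing total. Whether that is the desired behaviour is a judgment call worth stating in your write-up.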
Visualize the distribution of this new variable. Find the mean and the median number of minutes respondents spend on email weekly. Is the mean or the median a better measure of the typical amount of time Americans spend on email weekly? Why?
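To see why the choice between mean and median matters, consider a sketch with made-up values in which one respondent reports a very large number of minutes. The data and the bin width here are assumptions chosen only to illustrate skew.

```r
library(ggplot2)

# Made-up weekly email minutes with one very heavy user; not GSS data.
toy <- data.frame(email = c(0, 30, 60, 60, 120, 180, 240, 3000, NA))

# na.rm = TRUE drops the missing value before summarising.
email_mean   <- mean(toy$email, na.rm = TRUE)    # 461.25
email_median <- median(toy$email, na.rm = TRUE)  # 90

# A histogram makes the long right tail visible.
p <- ggplot(toy, aes(x = email)) +
  geom_histogram(binwidth = 60) +
  labs(x = "Minutes on email per week", y = "Respondents")
```

The single extreme value pulls the mean far above what most of the toy respondents report, while the median barely moves. Check whether the real distribution shows a similar tail before deciding which summary to report.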
Create another new variable, snap_insta, that is coded as “Yes” if the respondent reported using either Snapchat (snapchat) or Instagram (instagrm), and “No” if not. If the recorded value was NA for both of these questions, the value in your new variable should also be NA.
Calculate the percentage of Yes’s for snap_insta among those who answered the question, i.e. excluding NAs.
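One way to sketch this kind of three-way coding is with case_when(), shown here on invented data. The "Yes"/"No" labels are assumed for illustration; confirm how snapchat and instagrm are actually coded before adapting it.

```r
library(dplyr)

# Invented responses covering each combination of interest; not GSS data.
toy <- tibble(
  snapchat = c("Yes", "No", NA, NA, "No"),
  instagrm = c("No", "No", "Yes", NA, NA)
)

toy <- toy %>%
  mutate(
    snap_insta = case_when(
      snapchat == "Yes" | instagrm == "Yes" ~ "Yes",          # uses at least one
      is.na(snapchat) & is.na(instagrm)     ~ NA_character_,  # both missing
      TRUE                                  ~ "No"            # everything else
    )
  )

toy$snap_insta
# "Yes" "No"  "Yes" NA    "No"

# Percentage of "Yes" among those with a non-missing value.
pct_yes <- 100 * mean(toy$snap_insta == "Yes", na.rm = TRUE)
pct_yes
# 50
```

Note the last row: one "No" plus one NA is coded as "No" in this sketch, since only the both-missing case is required to be NA. Taking the mean of a logical vector with na.rm = TRUE gives the proportion among respondents who answered.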
What are the possible responses to the question “Last week were you working full time, part time, going to school, keeping house, or what?” and how many respondents chose each of these answers? Note that this information is stored in the wrkstat variable.
Fit a model predicting email (number of minutes per week spent on email) from educ (number of years of education), wrkstat, and snap_insta. Interpret the slopes for each of these variables.
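The sketch below fits a linear model with one numeric and two categorical predictors on invented data, using base R's lm() to stay self-contained. This is an assumption about tooling: a tidymodels workflow would fit the same kind of model. The values and the three wrkstat labels are made up for illustration.

```r
# Invented data with the same variable names as the assignment; not GSS data.
toy <- data.frame(
  email = c(30, 120, 300, 60, 600, 240, 90, 480),
  educ  = c(10, 12, 16, 12, 18, 16, 11, 17),
  wrkstat = factor(c(
    "Working fulltime", "Working parttime", "Working fulltime", "Keeping house",
    "Working fulltime", "Working parttime", "Keeping house", "Working fulltime"
  )),
  snap_insta = factor(c("No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"))
)

# Main-effects linear model; factor predictors are expanded into indicator columns.
fit <- lm(email ~ educ + wrkstat + snap_insta, data = toy)

# One slope for educ, plus one coefficient per non-baseline factor level.
coef(fit)
```

The educ coefficient is the expected change in weekly email minutes for each additional year of education, holding the other predictors constant. Each wrkstat and snap_insta coefficient is a difference relative to that factor's baseline level, which by default is the first level alphabetically ("Keeping house" and "No" in this toy data).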
Create a predicted values vs. residuals plot for this model. Are there any issues with the model? If yes, describe them.
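As a hedged illustration of what such a plot can reveal, the sketch below simulates a deliberately curved relationship, fits a straight line to it, and plots residuals against predicted values. The simulated data and the object names are assumptions; the point is only the shape of the diagnostic.

```r
library(ggplot2)

# Simulate a curved relationship that a straight line will fit poorly.
set.seed(1)
x <- runif(100, 0, 10)
y <- exp(0.3 * x) + rnorm(100)

fit <- lm(y ~ x)

# Collect predicted values and residuals in one data frame.
diag_df <- data.frame(
  predicted = fitted(fit),
  residual  = resid(fit)
)

# Residuals should scatter evenly around zero if the model is adequate.
p <- ggplot(diag_df, aes(x = predicted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Predicted values", y = "Residuals")
```

A clear curve or a fan shape in this kind of plot suggests that a linear model is missing structure or that the residual variance is not constant. A patternless band around zero is what a well-behaved model would show.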
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
The 2016 GSS also asked respondents whether they think of themselves as liberal or conservative (polviews) and whether they think science research is necessary and should be supported by the federal government (advfront).
The question on science research (advfront) is worded as follows:
Even if it brings no immediate benefits, scientific research that advances the frontiers of knowledge is necessary and should be supported by the federal government.
The possible responses to this question are Strongly agree, Agree, Disagree, Strongly disagree, Don’t know, No answer, and Not applicable.
The question on political views (polviews) is worded as follows:
We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal (point 1) to extremely conservative (point 7). Where would you place yourself on this scale?
Note: The levels of this variable are spelled inconsistently: “Extremely liberal” vs. “Extrmly conservative”. Since this is the spelling that shows up in the data, you need to make sure this is how you spell the levels in your code.
The possible responses to this question are Extremely liberal, Liberal, Slightly liberal, Moderate, Slghtly conservative, Conservative, and Extrmly conservative. Responses that were originally Don’t know, No answer, and Not applicable are already mapped to NAs upon data import.
In a new variable, recode advfront such that Strongly agree and Agree are mapped to "Yes", and Disagree and Strongly disagree are mapped to "No". The remaining levels can be left as is. Don’t overwrite the existing advfront; instead, pick a different, informative name for your new variable.
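A minimal sketch of this kind of recode, on invented character data, might look like the following. The name advfront_yn is a hypothetical choice, not a required one, and the "Dont know" label is made up for the toy data. If the real column is a factor, it would likely need as.character() before being used in the fallback branch.

```r
library(dplyr)

# Invented responses stored as character strings; not GSS data.
toy <- tibble(
  advfront = c("Strongly agree", "Agree", "Disagree", "Strongly disagree", "Dont know", NA)
)

toy <- toy %>%
  mutate(
    # advfront_yn is a hypothetical name; the original column is left untouched.
    advfront_yn = case_when(
      advfront %in% c("Strongly agree", "Agree")       ~ "Yes",
      advfront %in% c("Disagree", "Strongly disagree") ~ "No",
      TRUE                                             ~ advfront  # leave other values as is
    )
  )

toy$advfront_yn
# "Yes" "Yes" "No"  "No"  "Dont know" NA
```

Using %in% rather than == keeps NA rows from matching either group, so missing values fall through to the final branch and stay missing.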
In a new variable, recode polviews such that Extremely liberal, Liberal, and Slightly liberal are mapped to "Liberal", and Slghtly conservative, Conservative, and Extrmly conservative are mapped to "Conservative". The remaining levels can be left as is. Make sure that the levels are in a reasonable order. Don’t overwrite the existing polviews; instead, pick a different, informative name for your new variable.
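The recode plus level ordering can be sketched as below on invented data. The name polviews_3 is hypothetical, and the level spellings follow those quoted in this assignment. Setting the levels explicitly with factor() is one way to control the order, since a character vector would otherwise sort alphabetically in plots and tables.

```r
library(dplyr)

# Invented responses using the spellings quoted in the assignment; not GSS data.
toy <- tibble(
  polviews = c("Extremely liberal", "Moderate", "Slghtly conservative",
               "Extrmly conservative", "Liberal", NA)
)

toy <- toy %>%
  mutate(
    # polviews_3 is a hypothetical name for the collapsed variable.
    polviews_3 = case_when(
      polviews %in% c("Extremely liberal", "Liberal", "Slightly liberal") ~ "Liberal",
      polviews %in% c("Slghtly conservative", "Conservative",
                      "Extrmly conservative")                              ~ "Conservative",
      TRUE                                                                 ~ polviews
    ),
    # Fix the order from liberal to conservative instead of alphabetical.
    polviews_3 = factor(polviews_3, levels = c("Liberal", "Moderate", "Conservative"))
  )

levels(toy$polviews_3)
# "Liberal" "Moderate" "Conservative"
```

Without the explicit levels, "Conservative" would come first alphabetically, which would put the middle category at the end of any plot axis.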
Create a visualization that displays the relationship between these two new variables and interpret it.
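One common choice for two categorical variables is a bar chart filled to 100% within each group, so that groups of different sizes can be compared. The sketch below uses invented data and the hypothetical variable names polviews_3 and advfront_yn; other plot types could serve equally well.

```r
library(ggplot2)

# Invented recoded responses; not GSS data.
toy <- data.frame(
  polviews_3 = factor(
    rep(c("Liberal", "Moderate", "Conservative"), times = c(4, 3, 3)),
    levels = c("Liberal", "Moderate", "Conservative")
  ),
  advfront_yn = c("Yes", "Yes", "Yes", "No",   # Liberal
                  "Yes", "Yes", "No",          # Moderate
                  "Yes", "No", "No")           # Conservative
)

# Row-wise proportions: the share of each answer within each political group.
props <- prop.table(table(toy$polviews_3, toy$advfront_yn), margin = 1)

# position = "fill" rescales each bar to proportions, matching props.
p <- ggplot(toy, aes(x = polviews_3, fill = advfront_yn)) +
  geom_bar(position = "fill") +
  labs(x = "Political views", y = "Proportion", fill = "Supports research")
```

Printing props alongside the plot gives the exact shares that the bar heights represent, which helps when writing the interpretation.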
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards, and review the md document on GitHub to make sure you’re happy with the final state of your work.
[1] Smith, Tom W., Peter Marsden, Michael Hout, and Jibum Kim. General Social Surveys, 1972-2016 [machine-readable data file]. Principal Investigator, Tom W. Smith; Co-Principal Investigator, Peter V. Marsden; Co-Principal Investigator, Michael Hout; Sponsored by National Science Foundation. NORC ed. Chicago: NORC at the University of Chicago [producer and distributor]. Data accessed from the GSS Data Explorer website at gssdataexplorer.norc.org.