class: center, middle, inverse, title-slide

# Visualising categorical data
## College of the Atlantic
### https://coa-dataviz.netlify.app/
---
class: middle

# Recap

---

## Variables

- **Numerical** variables can be classified as **continuous** or **discrete** based on whether the variable can take on an infinite number of values or only non-negative whole numbers, respectively.
- If the variable is **categorical**, we can determine if it is **ordinal** based on whether or not the levels have a natural ordering.

---

### Data

```r
library(tidyverse)  # provides %>%, select(), glimpse(), and ggplot()
library(openintro)  # provides the loans_full_schema dataset

loans <- loans_full_schema %>%
  select(loan_amount, interest_rate, term, grade, state, annual_income, homeownership, debt_to_income)
glimpse(loans)
```

```
## Rows: 10,000
## Columns: 8
## $ loan_amount    <int> 28000, 5000, 2000, 21600, 23000, 5000, 2~
## $ interest_rate  <dbl> 14.07, 12.61, 17.09, 6.72, 14.07, 6.72, ~
## $ term           <dbl> 60, 36, 36, 36, 36, 36, 60, 60, 36, 36, ~
## $ grade          <ord> C, C, D, A, C, A, C, B, C, A, C, B, C, B~
## $ state          <fct> NJ, HI, WI, PA, CA, KY, MI, AZ, NV, IL, ~
## $ annual_income  <dbl> 90000, 40000, 40000, 30000, 35000, 34000~
## $ homeownership  <fct> MORTGAGE, RENT, RENT, RENT, RENT, OWN, M~
## $ debt_to_income <dbl> 18.01, 5.04, 21.15, 10.16, 57.96, 6.46, ~
```

---
class: middle

# Bar plot

---

## Bar plot

```r
ggplot(loans, aes(x = homeownership)) +
  geom_bar()
```

<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-3-1.png" width="60%" style="display: block; margin: auto;" />

---

## Segmented bar plot

```r
ggplot(loans, aes(x = homeownership, 
*                 fill = grade)) +
  geom_bar()
```

<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-4-1.png" width="60%" style="display: block; margin: auto;" />

---

## Segmented bar plot

```r
ggplot(loans, aes(x = homeownership, fill = grade)) +
* geom_bar(position = "fill")
```

<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-5-1.png" width="60%" style="display: block; margin: auto;" />

---

.question[
Which bar plot is a more useful representation for visualizing the relationship between homeownership and grade?
]

.pull-left[
<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Customizing bar plots

.panelset[
.panel[.panel-name[Plot]
<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-8-1.png" width="60%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
*ggplot(loans, aes(y = homeownership, fill = grade)) +
  geom_bar(position = "fill") +
* labs(
*   x = "Proportion",
*   y = "Homeownership",
*   fill = "Grade",
*   title = "Grades of Lending Club loans",
*   subtitle = "and homeownership of lendee"
* )
```
]
]

---
class: middle

# Relationships between numerical and categorical variables

---

## Already talked about...

- Colouring and faceting histograms and density plots
- Side-by-side box plots

---

## Violin plots

```r
ggplot(loans, aes(x = homeownership, y = loan_amount)) +
  geom_violin()
```

<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-9-1.png" width="60%" style="display: block; margin: auto;" />

---

## Ridge plots

```r
library(ggridges)
ggplot(loans, aes(x = loan_amount, y = grade, fill = grade, color = grade)) +
  geom_density_ridges(alpha = 0.5)
```

<img src="u2-d04-viz-cat_files/figure-html/unnamed-chunk-10-1.png" width="60%" style="display: block; margin: auto;" />

---

## Acknowledgements

* This course builds on materials from [Data Science in a Box](https://datasciencebox.org/), developed by Mine Çetinkaya-Rundel, which are adapted under the [Creative Commons Attribution Share Alike 4.0 International](https://github.com/rstudio-education/datascience-box/blob/master/LICENSE.md) license.
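
---

## Appendix: the numbers behind `position = "fill"`

The segmented bar plot drawn with `position = "fill"` shows, for each homeownership level, the share of loans in each grade. As a rough sketch, those shares can be computed by hand with `dplyr`. The names `grade_props` and `prop` are made up for this illustration and do not appear elsewhere in the slides; the code assumes the `dplyr` and `openintro` packages are installed.

```r
library(dplyr)
library(openintro)

# Count loans for every homeownership / grade combination,
# then divide by the total within each homeownership level.
# (grade_props and prop are illustrative names, not from the slides.)
grade_props <- loans_full_schema %>%
  count(homeownership, grade) %>%
  group_by(homeownership) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

grade_props
```

Within each homeownership level the `prop` values add up to 1, which is why every bar in the filled plot has the same height and only the split between grades differs.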