class: center, middle, inverse, title-slide

# Tidy data
## College of the Atlantic

---

## Tidy data

>Happy families are all alike; every unhappy family is unhappy in its own way.
>
>Leo Tolstoy

--

.pull-left[
**Characteristics of tidy data:**

- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
]

--

.pull-right[
**Characteristics of untidy data:**

!@#$%^&*()
]

---

##

.question[
What makes this data not tidy?
]

<img src="img/hyperwar-airplanes-on-hand.png" width="70%" style="display: block; margin: auto;" />

.footnote[
Source: [Army Air Forces Statistical Digest, WW II](https://www.ibiblio.org/hyperwar/AAF/StatDigest/aafsd-3.html)
]

---

.question[
What makes this data not tidy?
]

<br>

<img src="img/hiv-est-prevalence-15-49.png" width="70%" style="display: block; margin: auto;" />

.footnote[
Source: [Gapminder, Estimated HIV prevalence among 15-49 year olds](https://www.gapminder.org/data)
]

---

.question[
What makes this data not tidy?
]

<br>

<img src="img/us-general-economic-characteristic-acs-2017.png" width="85%" style="display: block; margin: auto;" />

.footnote[
Source: [US Census Fact Finder, General Economic Characteristics, ACS 2017](https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_17_5YR_DP03&src=pt)
]

---

## Displaying vs. summarising data

.panelset[
.panel[.panel-name[Output]

.pull-left[
```
##                     name height   mass
## 1         Luke Skywalker    172   77.0
## 2                  C-3PO    167   75.0
## 3                  R2-D2     96   32.0
## 4            Darth Vader    202  136.0
## 5            Leia Organa    150   49.0
## 6              Owen Lars    178  120.0
## 7     Beru Whitesun lars    165   75.0
## 8                  R5-D4     97   32.0
## 9      Biggs Darklighter    183   84.0
## 10        Obi-Wan Kenobi    182   77.0
## 11      Anakin Skywalker    188   84.0
## 12        Wilhuff Tarkin    180     NA
## 13             Chewbacca    228  112.0
## 14              Han Solo    180   80.0
## 15                Greedo    173   74.0
## 16 Jabba Desilijic Tiure    175 1358.0
## 17        Wedge Antilles    170   77.0
## 18      Jek Tono Porkins    180  110.0
## 19                  Yoda     66   17.0
## 20             Palpatine    170   75.0
## 21             Boba Fett    183   78.2
## 22                 IG-88    200  140.0
## 23                 Bossk    190  113.0
## 24      Lando Calrissian    177   79.0
## 25                 Lobot    175   79.0
## 26                Ackbar    180   83.0
## 27            Mon Mothma    150     NA
## 28          Arvel Crynyd     NA     NA
## 29 Wicket Systri Warrick     88   20.0
## 30             Nien Nunb    160   68.0
## 31          Qui-Gon Jinn    193   89.0
## 32           Nute Gunray    191   90.0
## 33         Finis Valorum    170     NA
## 34         Jar Jar Binks    196   66.0
## 35          Roos Tarpals    224   82.0
## 36            Rugor Nass    206     NA
## 37              Ric Olié    183     NA
## 38                 Watto    137     NA
## 39               Sebulba    112   40.0
## 40         Quarsh Panaka    183     NA
## 41        Shmi Skywalker    163     NA
## 42            Darth Maul    175   80.0
## 43           Bib Fortuna    180     NA
## 44           Ayla Secura    178   55.0
## 45              Dud Bolt     94   45.0
## 46               Gasgano    122     NA
## 47        Ben Quadinaros    163   65.0
## 48            Mace Windu    188   84.0
## 49          Ki-Adi-Mundi    198   82.0
## 50             Kit Fisto    196   87.0
## 51             Eeth Koth    171     NA
## 52            Adi Gallia    184   50.0
## 53           Saesee Tiin    188     NA
## 54           Yarael Poof    264     NA
## 55              Plo Koon    188   80.0
## 56            Mas Amedda    196     NA
## 57          Gregar Typho    185   85.0
## 58                 Cordé    157     NA
## 59           Cliegg Lars    183     NA
## 60     Poggle the Lesser    183   80.0
## 61       Luminara Unduli    170   56.2
## 62         Barriss Offee    166   50.0
## 63                 Dormé    165     NA
## 64                 Dooku    193   80.0
## 65   Bail Prestor Organa    191     NA
## 66            Jango Fett    183   79.0
## 67            Zam Wesell    168   55.0
## 68       Dexter Jettster    198  102.0
## 69               Lama Su    229   88.0
## 70               Taun We    213     NA
## 71            Jocasta Nu    167     NA
## 72         Ratts Tyerell     79   15.0
## 73                R4-P17     96     NA
## 74            Wat Tambor    193   48.0
## 75              San Hill    191     NA
## 76              Shaak Ti    178   57.0
## 77              Grievous    216  159.0
## 78               Tarfful    234  136.0
## 79       Raymus Antilles    188   79.0
## 80             Sly Moore    178   48.0
## 81            Tion Medon    206   80.0
## 82                  Finn     NA     NA
## 83                   Rey     NA     NA
## 84           Poe Dameron     NA     NA
## 85                   BB8     NA     NA
## 86        Captain Phasma     NA     NA
## 87         Padmé Amidala    165   45.0
```
]

.pull-right[
```
## # A tibble: 3 x 2
##   gender    avg_ht
##   <chr>      <dbl>
## 1 feminine    165.
## 2 masculine   177.
## 3 <NA>        181.
```
]

]

.panel[.panel-name[Code]

.pull-left[
```r
starwars %>%
  select(name, height, mass)
```
]

.pull-right[
```r
starwars %>%
  group_by(gender) %>%
  summarize(
    avg_ht = mean(height, na.rm = TRUE)
  )
```
]

]

]

---

## Acknowledgements

* This course builds on the materials from [Data Science in a Box](https://datasciencebox.org/), developed by Mine Çetinkaya-Rundel, which are adapted under the [Creative Commons Attribution Share Alike 4.0 International](https://github.com/rstudio-education/datascience-box/blob/master/LICENSE.md) license.
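
---

## Sketch: tidying a wide table

The tidy-data characteristics listed earlier ("each variable forms a column", "each observation forms a row") can be illustrated with a small reshaping example. This is only a sketch: it assumes the tidyr package is installed, and the table `wide` is hypothetical (made-up countries and values, not taken from any of the sources shown in these slides). Each year starts out as its own column, and `pivot_longer()` gathers those columns into a single `year` variable so that each row is one country-year observation.

```r
library(tidyr)

# Hypothetical wide table: one column per year.
# Countries and values are made up for illustration only.
wide <- data.frame(
  country = c("A", "B"),
  `2008` = c(0.1, 1.5),
  `2009` = c(0.2, 1.4),
  check.names = FALSE  # keep the year column names as "2008", "2009"
)

# Gather every column except `country` into two new columns:
# `year` (the old column names) and `prevalence` (the cell values).
tidy <- wide %>%
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "prevalence"
  )

tidy
```

The result has one row per country-year pair. Because `year` is built from column names it is stored as character; wrap it in `as.integer()` if numeric years are needed.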