Fitting and interpreting models

# Fitting and interpreting models
## <br><br> College of the Atlantic

---

# Models with numerical explanatory variables

---

## Data: Paris Paintings

```r
pp <- read_csv("data/paris-paintings.csv", na = c("n/a", "", "NA"))
```

- Number of observations: 3393
- Number of variables: 61

---

## Goal: Predict height from width

`$$height_{i} = \beta_0 + \beta_1 \times width_{i} + \varepsilon_{i}$$`

---

## Step 1: Specify model

```r
linear_reg()
```

```
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
```

---

## Step 2: Set model fitting *engine*

```r
linear_reg() %>%
  set_engine("lm") # lm: linear model
```

```
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
```

---

## Step 3: Fit model & estimate parameters

... using **formula syntax**

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp)
```

```
## parsnip model object
## 
## 
## Call:
## stats::lm(formula = Height_in ~ Width_in, data = data)
## 
## Coefficients:
## (Intercept)     Width_in  
##      3.6214       0.7808
```
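The `Call:` line shows that, with the `lm` engine, parsnip passes the formula to `stats::lm()`. The sketch below makes that equivalence concrete by fitting the same formula with `lm()` directly. The Paris paintings file may not be at hand, so it uses a hypothetical simulated data frame, `sim_pp`. Its true intercept and slope are set near the estimates above, so the printed coefficients will be close to, but not the same as, the slide.

```r
# Hypothetical stand-in for the Paris paintings data: simulated widths and heights
set.seed(1)
sim_pp <- data.frame(Width_in = runif(200, min = 5, max = 60))
sim_pp$Height_in <- 3.6 + 0.78 * sim_pp$Width_in + rnorm(200, sd = 8)

# Same formula syntax as the parsnip call, passed straight to stats::lm()
ht_wt_lm <- lm(Height_in ~ Width_in, data = sim_pp)
coef(ht_wt_lm)
```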

---

## A closer look at model output

```
## parsnip model object
## 
## 
## Call:
## stats::lm(formula = Height_in ~ Width_in, data = data)
## 
## Coefficients:
## (Intercept)     Width_in  
##      3.6214       0.7808
```

---

## A tidy look at model output

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    3.62    0.254        14.3 8.82e-45
## 2 Width_in       0.781   0.00950      82.1 0
```
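The `statistic` column is the estimate divided by its standard error. `p.value` is a two-sided p-value from a t distribution with n - 2 degrees of freedom. The sketch below recomputes both from base R's coefficient table. It uses simulated data (an assumption, not the paintings), so the numbers differ from the slide.

```r
set.seed(2)
x <- runif(100, 5, 60)
y <- 3.6 + 0.78 * x + rnorm(100, sd = 8)
fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- summary(fit)$coefficients

# Recompute the statistic and p-value columns by hand
t_stat <- coef_tab[, "Estimate"] / coef_tab[, "Std. Error"]
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
cbind(coef_tab, t_by_hand = t_stat, p_by_hand = p_val)
```

A p-value printed as `0`, as for `Width_in` above, is a value too small to represent as a double. It is not literally zero.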

---

## Slope and intercept

- **Slope:** For each additional inch the painting is wider, the height is expected to be higher, on average, by 0.781 inches.

--
- **Intercept:** Paintings that are 0 inches wide are expected to be 3.62 inches high, on average. (Does this make sense?)
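To make these interpretations concrete, the sketch below plugs the estimates from the model output into the fitted equation. The helper `predict_height()` is hypothetical, and the width of 20 inches is an arbitrary illustrative value.

```r
# Coefficients copied from the model output above
b0 <- 3.6214
b1 <- 0.7808

# Hypothetical helper: predicted height (inches) for a given width (inches)
predict_height <- function(width) b0 + b1 * width

predict_height(20)                        # 3.6214 + 0.7808 * 20 = 19.2374
predict_height(21) - predict_height(20)   # one extra inch of width adds b1
predict_height(0)                         # a 0-inch-wide painting: the intercept
```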

---

## Correlation does not imply causation

Remember this when interpreting model coefficients.

---

# Parameter estimation

---

## Linear model with a single predictor

- We're interested in `$\beta_0$` (population parameter for the intercept) and `$\beta_1$` (population parameter for the slope) in the following model:

`$$y_{i} = \beta_0 + \beta_1~x_{i} + \varepsilon_{i}$$`

--
- Tough luck, you can't have them...

--
- So we use sample statistics to estimate them:

`$$\hat{y}_{i} = b_0 + b_1~x_{i}$$`

---

## Least squares regression

- The regression line minimizes the sum of squared residuals.

--
- If `$e_i = y_i - \hat{y}_i$`, then the regression line minimizes 
`$\sum_{i = 1}^n e_i^2$`.
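With one predictor, this minimization has a closed-form solution: `$b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$` and `$b_0 = \bar{y} - b_1 \bar{x}$`. The sketch below computes both by hand on simulated data (an assumption, not the paintings) and compares them with `lm()`. It then checks that moving either coefficient away from these values increases the sum of squared residuals. The helper `ssr()` is hypothetical.

```r
set.seed(3)
x <- runif(50, 5, 60)
y <- 3.6 + 0.78 * x + rnorm(50, sd = 8)

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Hypothetical helper: sum of squared residuals for any candidate line
ssr <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

fit <- lm(y ~ x)
c(by_hand_b0 = b0, by_hand_b1 = b1)
coef(fit)

# Nudging either coefficient away from the least squares values raises the SSR
ssr(b0, b1) < ssr(b0 + 0.5, b1)
ssr(b0, b1) < ssr(b0, b1 + 0.01)
```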

---

## Visualizing residuals

---

## Visualizing residuals (cont.)

---

## Visualizing residuals (cont.)

---

## Properties of least squares regression

- The regression line goes through the center of mass point, the coordinates corresponding to average `$x$` and average `$y$`, `$(\bar{x}, \bar{y})$`:

`$$\bar{y} = b_0 + b_1 \bar{x} ~ \rightarrow ~ b_0 = \bar{y} - b_1 \bar{x}$$`

--
- The slope has the same sign as the correlation coefficient: `$b_1 = r \frac{s_y}{s_x}$`

--
- The sum of the residuals is zero: `$\sum_{i = 1}^n e_i = 0$`

--
- The residuals and `$x$` values are uncorrelated
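Each of these properties can be checked numerically. The sketch below does so on a simulated data set (an assumption, not the paintings). Each line should print `TRUE`.

```r
set.seed(4)
x <- runif(80, 5, 60)
y <- 3.6 + 0.78 * x + rnorm(80, sd = 8)
fit <- lm(y ~ x)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])
e  <- resid(fit)

# 1. The line passes through (x-bar, y-bar)
isTRUE(all.equal(b0 + b1 * mean(x), mean(y)))
# 2. b1 = r * s_y / s_x
isTRUE(all.equal(b1, cor(x, y) * sd(y) / sd(x)))
# 3. Residuals sum to zero (up to floating-point error)
abs(sum(e)) < 1e-8
# 4. Residuals are uncorrelated with x
abs(cor(e, x)) < 1e-8
```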

---
class: middle

# Model checking

---

## "Linear" models

- We're fitting a "linear" model, which assumes a linear relationship between our explanatory and response variables.
- But how do we assess this?

---

## Graphical diagnostic: residuals plot

.panelset[
.panel[.panel-name[Plot]
<img src="u4-d02-fitting-interpreting-models_files/figure-html/unnamed-chunk-10-1.png" width="60%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ht_wt_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp)

*ht_wt_fit_aug <- augment(ht_wt_fit$fit)

ggplot(ht_wt_fit_aug, mapping = aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, color = "gray", lty = "dashed") +
  labs(x = "Predicted height", y = "Residuals")
```
]
.panel[.panel-name[Augment]

```r
ht_wt_fit_aug
```

```
## # A tibble: 3,135 x 9
##    .rown~1 Heigh~2 Width~3 .fitted  .resid    .hat .sigma .cooksd
##    <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
##  1 1            37    29.5   26.7  10.3    3.99e-4   8.30 3.10e-4
##  2 2            18    14     14.6   3.45   3.96e-4   8.31 3.42e-5
##  3 3            13    16     16.1  -3.11   3.61e-4   8.31 2.54e-5
##  4 4            14    18     17.7  -3.68   3.37e-4   8.31 3.30e-5
##  5 5            14    18     17.7  -3.68   3.37e-4   8.31 3.30e-5
##  6 6             7    10     11.4  -4.43   4.98e-4   8.31 7.09e-5
##  7 7             6    13     13.8  -7.77   4.18e-4   8.30 1.83e-4
##  8 8             6    13     13.8  -7.77   4.18e-4   8.30 1.83e-4
##  9 9            15    15     15.3  -0.333  3.77e-4   8.31 3.04e-7
## 10 10            9     7      9.09 -0.0870 6.01e-4   8.31 3.30e-8
## # ... with 3,125 more rows, 1 more variable: .std.resid <dbl>,
## #   and abbreviated variable names 1: .rownames, 2: Height_in,
## #   3: Width_in
```
]
]

---

## More on `augment()`

```r
glimpse(ht_wt_fit_aug)
```

```
## Rows: 3,135
## Columns: 9
## $ .rownames  <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9",~
## $ Height_in  <dbl> 37, 18, 13, 14, 14, 7, 6, 6, 15, 9, 9, 16, 1~
## $ Width_in   <dbl> 29.5, 14.0, 16.0, 18.0, 18.0, 10.0, 13.0, 13~
## $ .fitted    <dbl> 26.65490, 14.55256, 16.11415, 17.67574, 17.6~
## $ .resid     <dbl> 10.3451004, 3.4474447, -3.1141481, -3.675740~
## $ .hat       <dbl> 0.0003991488, 0.0003961825, 0.0003611963, 0.~
## $ .sigma     <dbl> 8.303538, 8.305367, 8.305409, 8.305336, 8.30~
## $ .cooksd    <dbl> 3.099689e-04, 3.416655e-05, 2.541574e-05, 3.~
## $ .std.resid <dbl> 1.24600543, 0.41522347, -0.37507338, -0.4427~
```
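For an `lm` fit, the columns that `augment()` adds match quantities that base R computes directly. The sketch below rebuilds them on simulated data (an assumption, not the paintings). The column-to-function mapping is an assumption about how broom builds these columns, so treat it as a cross-check rather than broom's documented implementation. It also confirms the usual formula for a standardized residual.

```r
set.seed(5)
x <- runif(60, 5, 60)
y <- 3.6 + 0.78 * x + rnorm(60, sd = 8)
fit <- lm(y ~ x)

aug_base <- data.frame(
  .fitted    = fitted(fit),            # predicted values
  .resid     = resid(fit),             # observed minus predicted
  .hat       = hatvalues(fit),         # leverage
  .sigma     = influence(fit)$sigma,   # residual SD with observation i left out
  .cooksd    = cooks.distance(fit),    # influence on the fitted coefficients
  .std.resid = rstandard(fit)          # residual scaled by its estimated SD
)
head(aug_base)

# Standardized residual = residual / (sigma * sqrt(1 - leverage))
sigma_hat <- summary(fit)$sigma
all.equal(unname(aug_base$.std.resid),
          unname(aug_base$.resid / (sigma_hat * sqrt(1 - aug_base$.hat))))
```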

---

## Looking for...

- Residuals distributed randomly around 0
- With no visible pattern along the x or y axes
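To see what "no visible pattern" means, the sketch below simulates two data sets with the same linear trend. In one, the error spread is constant. In the other, the spread grows with x, which produces a fan shape. It then compares the residual spread at high versus low predicted values. The helper `spread_ratio()` is hypothetical and written only for this illustration. It is a rough numeric stand-in for looking at the plot, not a formal test.

```r
set.seed(6)
n <- 1000
x <- runif(n, 1, 10)

# Constant spread: error SD does not depend on x
y_good <- 2 + 3 * x + rnorm(n, sd = 2)
# Fan shape: error SD grows with x (non-constant variance)
y_fan  <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

# Hypothetical helper: SD of residuals above vs below the median prediction
spread_ratio <- function(fit) {
  f <- fitted(fit)
  r <- resid(fit)
  upper <- f > median(f)
  sd(r[upper]) / sd(r[!upper])
}

spread_ratio(lm(y_good ~ x))  # close to 1
spread_ratio(lm(y_fan ~ x))   # well above 1
```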

---

## Not looking for...

---

## Not looking for...

---

## Not looking for...

---

## Not looking for...

---

.question[
What patterns does the residuals plot reveal that should make us question whether the current model is a good fit for modeling the height of the paintings?
]

---

# Models with categorical explanatory variables

---

## Categorical predictor with 2 levels

.pull-left-narrow[
.small[

```
## # A tibble: 3,393 x 3
##    name      Height_in landsALL
##    <chr>         <dbl>    <dbl>
##  1 L1764-2          37        0
##  2 L1764-3          18        0
##  3 L1764-4          13        1
##  4 L1764-5a         14        1
##  5 L1764-5b         14        1
##  6 L1764-6           7        0
##  7 L1764-7a          6        0
##  8 L1764-7b          6        0
##  9 L1764-8          15        0
## 10 L1764-9a          9        0
## 11 L1764-9b          9        0
## 12 L1764-10a        16        1
## 13 L1764-10b        16        1
## 14 L1764-10c        16        1
## 15 L1764-11         20        0
## 16 L1764-12a        14        1
## 17 L1764-12b        14        1
## 18 L1764-13a        15        1
## 19 L1764-13b        15        1
## 20 L1764-14         37        0
## # ... with 3,373 more rows
```
]
]
.pull-right-wide[
- `landsALL = 0`: No landscape features
- `landsALL = 1`: Some landscape features
]

---

## Height & landscape features

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL), data = pp) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          22.7      0.328      69.1 0       
## 2 factor(landsALL)1    -5.65     0.532     -10.6 7.97e-26
```

---

## Height & landscape features

`$$\widehat{Height_{in}} = 22.7 - 5.645~landsALL$$`

- **Intercept:** Paintings that don't have landscape features are expected, on average, to be 22.7 inches tall.

- **Landscapes:** Paintings with landscape features are expected, on average, to be 5.645 inches shorter than paintings without landscape features.
  - Compares baseline level (`landsALL = 0`) to the other level (`landsALL = 1`)
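
With a single two-level predictor, the intercept is the mean response of the baseline group, and the coefficient is the difference between the two group means. The sketch below checks this on simulated data. The values 22.7 and -5.6 are borrowed from the output above only to make the simulation look similar; the data are not the Paris paintings.

```r
set.seed(7)
lands  <- rbinom(300, size = 1, prob = 0.4)
height <- 22.7 - 5.6 * lands + rnorm(300, sd = 14)

fit <- lm(height ~ factor(lands))
coef(fit)

# Intercept = mean height of the baseline group (lands = 0)
mean(height[lands == 0])
# Coefficient = difference in group means (lands = 1 minus lands = 0)
mean(height[lands == 1]) - mean(height[lands == 0])
```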
    
---

## Graphical diagnostic: residuals plot

.panelset[
.panel[.panel-name[Plot]
<img src="u4-d02-fitting-interpreting-models_files/figure-html/unnamed-chunk-20-1.png" width="60%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ht_land_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL), data = pp)

*ht_land_fit_aug <- augment(ht_land_fit$fit)

ggplot(ht_land_fit_aug, mapping = aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, color = "gray", lty = "dashed") +
  labs(x = "Predicted height", y = "Residuals")
```
]
.panel[.panel-name[Augment]

```r
ht_land_fit_aug
```

```
## # A tibble: 3,141 x 9
##    .rowna~1 Heigh~2 facto~3 .fitted .resid    .hat .sigma .cooksd
##    <chr>      <dbl> <fct>     <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
##  1 1             37 0          22.7  14.3  5.13e-4   14.5 2.51e-4
##  2 2             18 0          22.7  -4.68 5.13e-4   14.5 2.68e-5
##  3 3             13 1          17.0  -4.03 8.39e-4   14.5 3.26e-5
##  4 4             14 1          17.0  -3.03 8.39e-4   14.5 1.85e-5
##  5 5             14 1          17.0  -3.03 8.39e-4   14.5 1.85e-5
##  6 6              7 0          22.7 -15.7  5.13e-4   14.5 3.01e-4
##  7 7              6 0          22.7 -16.7  5.13e-4   14.5 3.41e-4
##  8 8              6 0          22.7 -16.7  5.13e-4   14.5 3.41e-4
##  9 9             15 0          22.7  -7.68 5.13e-4   14.5 7.22e-5
## 10 10             9 0          22.7 -13.7  5.13e-4   14.5 2.29e-4
## # ... with 3,131 more rows, 1 more variable: .std.resid <dbl>,
## #   and abbreviated variable names 1: .rownames, 2: Height_in,
## #   3: `factor(landsALL)`
```
]
]

---

## Height, Width & landscape features

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL) + Width_in, data = pp) %>%
  tidy()
```

```
## # A tibble: 3 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          5.62    0.269        20.9 1.24e-90
## 2 factor(landsALL)1   -5.02    0.292       -17.2 3.70e-63
## 3 Width_in             0.777   0.00909      85.4 0
```

---

## Height, Width & landscape features

`$$\widehat{Height_{in}} = 5.615 - 5.017~landsALL + 0.777 \times width_{i}$$`

- **Intercept:** Paintings that don't have landscape features and are 0 inches wide are expected, on average, to be 5.615 inches tall.

- **Landscapes:** Paintings with landscape features are expected, on average, to be 5.017 inches shorter than paintings of the same width without landscape features.
  - Compares baseline level (`landsALL = 0`) to the other level (`landsALL = 1`)

- **Slope:** For each additional inch in width, paintings are expected, on average, to be 0.777 inches taller, holding landscape features constant.
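
This model has no interaction term, so it describes two parallel lines, one for each value of `landsALL`, that share the width slope. Plugging the estimates into the equation at a few illustrative widths shows that the landscape gap is the same at every width. The helper `predict_ht()` is hypothetical.

```r
# Coefficients from the fitted equation above
b0 <- 5.615; b_lands <- -5.017; b_width <- 0.777

# Hypothetical helper: predicted height for a width and a landsALL value (0 or 1)
predict_ht <- function(width, landsALL) b0 + b_lands * landsALL + b_width * width

widths  <- c(10, 20, 30)
no_land <- predict_ht(widths, landsALL = 0)
land    <- predict_ht(widths, landsALL = 1)

land - no_land        # same gap at every width: the landscape coefficient
diff(no_land) / 10    # change per extra inch of width: the width coefficient
```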
    
---

## Graphical diagnostic: residuals plot

.panelset[
.panel[.panel-name[Plot]
<img src="u4-d02-fitting-interpreting-models_files/figure-html/unnamed-chunk-22-1.png" width="60%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ht_wt_land_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL) + Width_in, data = pp)

*ht_wt_land_fit_aug <- augment(ht_wt_land_fit$fit)

ggplot(ht_wt_land_fit_aug, mapping = aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, color = "gray", lty = "dashed") +
  labs(x = "Predicted height", y = "Residuals")
```
]
.panel[.panel-name[Augment]

```r
ht_wt_land_fit_aug
```

```
## # A tibble: 3,135 x 10
##    .rown~1 Heigh~2 facto~3 Width~4 .fitted  .resid    .hat .sigma
##    <chr>     <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1 1            37 0          29.5    28.5  8.47   5.88e-4   7.94
##  2 2            18 0          14      16.5  1.51   5.98e-4   7.94
##  3 3            13 1          16      13.0 -0.0251 8.75e-4   7.94
##  4 4            14 1          18      14.6 -0.578  8.53e-4   7.94
##  5 5            14 1          18      14.6 -0.578  8.53e-4   7.94
##  6 6             7 0          10      13.4 -6.38   7.03e-4   7.94
##  7 7             6 0          13      15.7 -9.71   6.20e-4   7.94
##  8 8             6 0          13      15.7 -9.71   6.20e-4   7.94
##  9 9            15 0          15      17.3 -2.27   5.78e-4   7.94
## 10 10            9 0           7      11.1 -2.05   8.09e-4   7.94
## # ... with 3,125 more rows, 2 more variables: .cooksd <dbl>,
## #   .std.resid <dbl>, and abbreviated variable names
## #   1: .rownames, 2: Height_in, 3: `factor(landsALL)`,
## #   4: Width_in
```
]
]

---

## Log-transformed height: residuals plot

.panelset[
.panel[.panel-name[Plot]
<img src="u4-d02-fitting-interpreting-models_files/figure-html/unnamed-chunk-24-1.png" width="60%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ht_wt_land_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(log(Height_in) ~ factor(landsALL) + Width_in, data = pp)

*ht_wt_land_fit_aug <- augment(ht_wt_land_fit$fit)

ggplot(ht_wt_land_fit_aug, mapping = aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, color = "gray", lty = "dashed") +
  labs(x = "Predicted logged height", y = "Residuals")
```
]
.panel[.panel-name[Augment]

```r
ht_wt_land_fit_aug
```

```
## # A tibble: 3,135 x 10
##    .rown~1 log(H~2 facto~3 Width~4 .fitted  .resid    .hat .sigma
##    <chr>     <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1 1          3.61 0          29.5    3.15  0.456  5.88e-4  0.400
##  2 2          2.89 0          14      2.64  0.248  5.98e-4  0.400
##  3 3          2.56 1          16      2.47  0.0966 8.75e-4  0.400
##  4 4          2.64 1          18      2.53  0.105  8.53e-4  0.400
##  5 5          2.64 1          18      2.53  0.105  8.53e-4  0.400
##  6 6          1.95 0          10      2.51 -0.564  7.03e-4  0.400
##  7 7          1.79 0          13      2.61 -0.818  6.20e-4  0.400
##  8 8          1.79 0          13      2.61 -0.818  6.20e-4  0.400
##  9 9          2.71 0          15      2.68  0.0326 5.78e-4  0.400
## 10 10         2.20 0           7      2.41 -0.214  8.09e-4  0.400
## # ... with 3,125 more rows, 2 more variables: .cooksd <dbl>,
## #   .std.resid <dbl>, and abbreviated variable names
## #   1: .rownames, 2: `log(Height_in)`, 3: `factor(landsALL)`,
## #   4: Width_in
```
]
]

---
## Residuals Summary

- Non-constant variance is one of the most common model violations; however, it can often be fixed by transforming the response (y) variable.

--
- The most common transformation for a right-skewed response variable is the log transform, `$\log(y)$`, which is especially helpful when the skew is extreme.

--
- This transformation is also useful for variance stabilization.

--
- When the response variable is log-transformed, the interpretation of 
the slope changes: *"For each unit increase in x, y is expected, on average, to be higher/lower <br> by a factor of `$e^{b_1}$`."*

--
- Another useful transformation is the square root, `$\sqrt{y}$`, especially 
when the response variable is a count.
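The multiplicative interpretation follows from back-transforming: `$e^{b_0 + b_1 (x + 1)} / e^{b_0 + b_1 x} = e^{b_1}$`. The sketch below checks this on simulated right-skewed data with multiplicative errors (an assumption, not the paintings). Under symmetric errors on the log scale, exponentiating a log-scale prediction gives a typical, median-like value of y rather than its mean.

```r
set.seed(8)
x <- runif(500, 0, 10)
# Right-skewed response: multiplicative (log-normal) errors
y <- exp(1 + 0.2 * x + rnorm(500, sd = 0.4))

log_fit <- lm(log(y) ~ x)
b1 <- unname(coef(log_fit)["x"])

# Back-transformed predictions at x = 3 and x = 4
pred <- exp(predict(log_fit, newdata = data.frame(x = c(3, 4))))

exp(b1)             # multiplicative factor per unit increase in x
pred[2] / pred[1]   # the same ratio, computed from predictions
```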

---
class: inverse

# Additional Slides

---

## Relationship between height and school

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ school_pntg, data = pp) %>%
  tidy()
```

```
## # A tibble: 7 x 5
##   term            estimate std.error statistic p.value
##   <chr>              <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)        14.0       10.0     1.40  0.162  
## 2 school_pntgD/FL     2.33      10.0     0.232 0.816  
## 3 school_pntgF       10.2       10.0     1.02  0.309  
## 4 school_pntgG        1.65      11.9     0.139 0.889  
## 5 school_pntgI       10.3       10.0     1.02  0.306  
## 6 school_pntgS       30.4       11.4     2.68  0.00744
## 7 school_pntgX        2.87      10.3     0.279 0.780
```

---

## Dummy variables

- When a categorical explanatory variable has several levels, it is encoded as **dummy variables**: one 0/1 indicator for every level except the baseline (here, `A`)
- Each coefficient describes the expected difference between heights in that particular school and the baseline level
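
The encoding in the table that follows can be reproduced with base R's `model.matrix()`. By default it uses treatment contrasts, so the first factor level (`A`) becomes the baseline and each other level gets its own 0/1 column. The short `school` vector is illustrative only.

```r
# One painting per school label; "A" is the first level, so it is the baseline
school <- factor(c("A", "D/FL", "F", "G", "I", "S", "X"))

# Treatment (dummy) coding: an intercept plus one 0/1 column per non-baseline level
mm <- model.matrix(~ school)
mm
```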

---

## Categorical predictor with 3+ levels

.pull-left-wide[
<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> school_pntg </th>
   <th style="text-align:center;"> D_FL </th>
   <th style="text-align:center;"> F </th>
   <th style="text-align:center;"> G </th>
   <th style="text-align:center;"> I </th>
   <th style="text-align:center;"> S </th>
   <th style="text-align:center;"> X </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> A </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> D/FL </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> F </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> G </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> I </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> S </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> X </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(68, 1, 84, 1) !important;"> 0 </td>
   <td style="text-align:center;width: 10em; color: white !important;background-color: rgba(122, 209, 81, 1) !important;"> 1 </td>
  </tr>
</tbody>
</table>
]
.pull-right-narrow[
.small[

```
## # A tibble: 3,393 x 3
##    name      Height_in school_pntg
##    <chr>         <dbl> <chr>      
##  1 L1764-2          37 F          
##  2 L1764-3          18 I          
##  3 L1764-4          13 D/FL       
##  4 L1764-5a         14 F          
##  5 L1764-5b         14 F          
##  6 L1764-6           7 I          
##  7 L1764-7a          6 F          
##  8 L1764-7b          6 F          
##  9 L1764-8          15 I          
## 10 L1764-9a          9 D/FL       
## 11 L1764-9b          9 D/FL       
## 12 L1764-10a        16 X          
## 13 L1764-10b        16 X          
## 14 L1764-10c        16 X          
## 15 L1764-11         20 D/FL       
## 16 L1764-12a        14 D/FL       
## 17 L1764-12b        14 D/FL       
## 18 L1764-13a        15 D/FL       
## 19 L1764-13b        15 D/FL       
## 20 L1764-14         37 F          
## # ... with 3,373 more rows
```
]
]

---

## Relationship between height and school

.small[
- **Austrian school (A)** paintings are expected, on average, to be **14 inches** tall.
- **Dutch/Flemish school (D/FL)** paintings are expected, on average, to be **2.33 inches taller** than *Austrian school* paintings.
- **French school (F)** paintings are expected, on average, to be **10.2 inches taller** than *Austrian school* paintings.
- **German school (G)** paintings are expected, on average, to be **1.65 inches taller** than *Austrian school* paintings.
- **Italian school (I)** paintings are expected, on average, to be **10.3 inches taller** than *Austrian school* paintings.
- **Spanish school (S)** paintings are expected, on average, to be **30.4 inches taller** than *Austrian school* paintings.
- Paintings whose school is **unknown (X)** are expected, on average, to be **2.87 inches taller** than *Austrian school* paintings.
]
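
Each school's expected height is the intercept plus that school's coefficient. For example, Spanish school paintings are expected to be about 14.0 + 30.4 = 44.4 inches tall. The sketch below adds the rounded estimates from the `tidy()` output for every school, so the sums inherit that rounding.

```r
# Estimates copied from the tidy() output for Height_in ~ school_pntg
est <- c(
  "(Intercept)" = 14.0, "school_pntgD/FL" = 2.33, "school_pntgF" = 10.2,
  "school_pntgG" = 1.65, "school_pntgI" = 10.3, "school_pntgS" = 30.4,
  "school_pntgX" = 2.87
)

# Expected height per school: baseline intercept plus that school's offset
school_means <- c(A = est[["(Intercept)"]], est[["(Intercept)"]] + est[-1])
names(school_means) <- c("A", "D/FL", "F", "G", "I", "S", "X")
round(school_means, 2)
```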

---
## Acknowledgements

* This course builds on materials from [Data Science in a Box](https://datasciencebox.org/), developed by Mine Çetinkaya-Rundel and adapted under the [Creative Commons Attribution Share Alike 4.0 International](https://github.com/rstudio-education/datascience-box/blob/master/LICENSE.md) license.