STAT 213: Polynomials and Interactions

The plotly package will let us make some interactive 3D plots

library(mosaic)
library(plotly)

Education Spending and SAT Scores

First we load the data.

data(SAT)

Controlling for Participation Rate

Let’s fit a model to predict SAT using 1. taking test, and 2. expenditure / pupil

m_frac_expend <- lm(sat ~ frac + expend, data = SAT)
m_frac_expend

## 
## Call:
## lm(formula = sat ~ frac + expend, data = SAT)
## 
## Coefficients:
## (Intercept)         frac       expend  
##     993.832       -2.851       12.287

coefficients(m_frac_expend) %>% round(digits = 2)

## (Intercept)        frac      expend 
##      993.83       -2.85       12.29

This chunk defines some convenience functions to make plotting easier

get_prediction_grid <- function(x, z, pred_fun, data)
{
  require(mosaic)
  xmin <- min(data[[x]])
  xmax <- max(data[[x]])
  zmin <- min(data[[z]])
  zmax <- max(data[[z]])
  xmesh <- seq(xmin, xmax, length = 100)
  zmesh <- seq(zmin, zmax, length = 100)
  yhat <- outer(xmesh, zmesh, pred_fun)
  return(list(x1 = xmesh, x2 = zmesh, yhat = t(yhat)))
}

plot_sat_model_3D <- function(model)
{
  f_hat <- makeFun(model)
  surface <- get_prediction_grid("frac", "expend", f_hat, SAT)
  SAT %>%
    plot_ly(x = ~frac, y = ~expend, z = ~sat, size = 0.5, opacity = 0.5) %>%
    add_markers() %>%
    add_surface(x = surface$x1, y = surface$x2, z = surface$yhat)
}

Visualizing the Data and Model in 3D

Let’s make a 3D plot of the two variable model.

plot_sat_model_3D(m_frac_expend)

How do the residuals look?

gf_point(residuals(m_frac_expend) ~ fitted(m_frac_expend)) %>%
  gf_smooth()

Not great. We probably want a quadratic term for frac… but first…

An Interaction Model

Let’s fit a model with an interaction

library(plotly)
m_interaction <- lm(sat ~ frac + expend + frac:expend, data = SAT)

coefficients(m_interaction) %>% round(2)

## (Intercept)        frac      expend frac:expend 
##     1057.12       -4.23        0.63        0.24

plot_sat_model_3D(m_interaction)

gf_point(residuals(m_interaction) ~ fitted(m_interaction)) %>%
  gf_smooth()

Not necessarily what we needed in this case.

A Quadratic Control for `frac`

Let’s add a quadratic term for frac to the no-interaction model

m_frac2_expend <- lm(sat ~ frac + I(frac^2) + expend, data = SAT)

coefficients(m_frac2_expend) %>% round(digits = 2)

## (Intercept)        frac   I(frac^2)      expend 
##     1051.89       -6.38        0.05        7.91

plot_sat_model_3D(m_frac2_expend)

gf_point(residuals(m_frac2_expend) ~ fitted(m_frac2_expend)) %>%
  gf_smooth()

That definitely looks better…

Quadratic and Interaction Together

What if we have both the quadratic term and the interaction?

m_frac2_expend_interaction <- 
  lm(sat ~ frac + expend + I(frac^2) + expend:frac, data = SAT)

coefficients(m_frac2_expend_interaction) %>% round(digits = 2)

## (Intercept)        frac      expend   I(frac^2) frac:expend 
##     1020.33       -6.04       14.70        0.05       -0.15

plot_sat_model_3D(m_frac2_expend_interaction)

gf_point(residuals(m_frac2_expend_interaction) ~ fitted(m_frac2_expend_interaction)) %>%
  gf_smooth()

Nested Tests to Answer Targeted Questions

Does expenditure have additional predictive value after controlling for participation rate?

Let’s fit a reduced model that doesn’t use expenditure.

m_frac2 <- lm(sat ~ frac + I(frac^2), data = SAT)

Now we can do a nested test to compare this to our model that has expenditure as well as the interaction.

anova(m_frac2, m_frac2_expend_interaction)

## Analysis of Variance Table
## 
## Model 1: sat ~ frac + I(frac^2)
## Model 2: sat ~ frac + expend + I(frac^2) + expend:frac
##   Res.Df   RSS Df Sum of Sq      F  Pr(>F)  
## 1     47 34780                              
## 2     45 30467  2      4313 3.1852 0.05084 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Seems like the evidence is on the borderline of the standard threshold for statistical significance.

STAT 213: Polynomials and Interactions

Colin Reimer Dawson

July 7, 2021

Education Spending and SAT Scores

Controlling for Participation Rate

Visualizing the Data and Model in 3D

An Interaction Model

A Quadratic Control for `frac`

Quadratic and Interaction Together

Nested Tests to Answer Targeted Questions

STAT 213: Polynomials and Interactions

Colin Reimer Dawson

July 7, 2021

Education Spending and SAT Scores

Controlling for Participation Rate

Visualizing the Data and Model in 3D

An Interaction Model

A Quadratic Control for frac

Quadratic and Interaction Together

Nested Tests to Answer Targeted Questions

A Quadratic Control for `frac`