The plotly
package will let us make some interactive 3D plots
First we load the data.
Let’s fit a model to predict SAT using 1. taking test, and 2. expenditure / pupil
##
## Call:
## lm(formula = sat ~ frac + expend, data = SAT)
##
## Coefficients:
## (Intercept) frac expend
## 993.832 -2.851 12.287
## (Intercept) frac expend
## 993.83 -2.85 12.29
This chunk defines some convenience functions to make plotting easier
get_prediction_grid <- function(x, z, pred_fun, data)
{
require(mosaic)
xmin <- min(data[[x]])
xmax <- max(data[[x]])
zmin <- min(data[[z]])
zmax <- max(data[[z]])
xmesh <- seq(xmin, xmax, length = 100)
zmesh <- seq(zmin, zmax, length = 100)
yhat <- outer(xmesh, zmesh, pred_fun)
return(list(x1 = xmesh, x2 = zmesh, yhat = t(yhat)))
}
plot_sat_model_3D <- function(model)
{
f_hat <- makeFun(model)
surface <- get_prediction_grid("frac", "expend", f_hat, SAT)
SAT %>%
plot_ly(x = ~frac, y = ~expend, z = ~sat, size = 0.5, opacity = 0.5) %>%
add_markers() %>%
add_surface(x = surface$x1, y = surface$x2, z = surface$yhat)
}
Let’s make a 3D plot of the two variable model.
How do the residuals look?
Not great. We probably want a quadratic term for frac
… but first…
Let’s fit a model with an interaction
## (Intercept) frac expend frac:expend
## 1057.12 -4.23 0.63 0.24
Not necessarily what we needed in this case.
frac
Let’s add a quadratic term for frac
to the no-interaction model
## (Intercept) frac I(frac^2) expend
## 1051.89 -6.38 0.05 7.91
That definitely looks better…
What if we have both the quadratic term and the interaction?
## (Intercept) frac expend I(frac^2) frac:expend
## 1020.33 -6.04 14.70 0.05 -0.15
gf_point(residuals(m_frac2_expend_interaction) ~ fitted(m_frac2_expend_interaction)) %>%
gf_smooth()
Does expenditure have additional predictive value after controlling for participation rate?
Let’s fit a reduced model that doesn’t use expenditure.
Now we can do a nested test to compare this to our model that has expenditure as well as the interaction.
## Analysis of Variance Table
##
## Model 1: sat ~ frac + I(frac^2)
## Model 2: sat ~ frac + expend + I(frac^2) + expend:frac
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 47 34780
## 2 45 30467 2 4313 3.1852 0.05084 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Seems like the evidence is on the borderline of the standard threshold for statistical significance.