Getting Set Up

If you missed the last class, you’ll need to run the following code (preferably in the console) to create a configuration file for compiling Stan models. This should only need to be done once per user account, so if you did this last time you don’t need to do it again now.

source("initialize-compiler.R")

Next, we’ll load the R package that interfaces with Stan, tell it to use multiple CPU cores so the chains can run in parallel, and have it cache compiled models so they don’t need to be recompiled every time we sample.

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

The Data

The data we’ll use is similar to the dataset from last time, but rather than consisting of a single set of presumed i.i.d. observations, we’ll have binary data from S individual sources (for example, the people in the fictitious ESP experiment).

  1. Read in the data by running the chunk below, and then examine the resulting dataset by clicking on hierarchical_data in the Environment to open a “spreadsheet” view of the data. Take note of how it’s structured: a column called s with ten 1s followed by ten 2s, and a column called y with 0 and 1 “outcome” data. This is a canonical format for data like this: a “grouping” variable that tells us which source the observation came from, and then the outcome variable itself. We could represent data from arbitrarily many individuals this way by just allowing \(s\) to take on additional values.
library(readr)   # read_csv() lives in the readr package
hierarchical_data <- read_csv("hierarchical-bernoulli.csv")
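
If you’d rather check from the console than from the spreadsheet view, a quick look like the one below works just as well (this assumes the columns are named s and y, as described above):

## Peek at the first few rows
head(hierarchical_data)

## Count how many observations each source contributes
table(hierarchical_data$s)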

Recall that we defined a hierarchical model where each person had their own value of \(\theta\), and where the \(\theta\)s were themselves assumed to be distributed around a mean \(\omega\), which described the “average success rate” in the general population. We also defined a parameter \(\kappa\), which is meant to represent the degree to which members of the population have similar long run success rates (that is, how “tightly” the individual \(\theta\)s cluster around \(\omega\)).

The formal model we defined in class looked like so:

The individual \(y\)s are assumed to come from a Bernoulli distribution with a parameter \(\theta_s\) which is specific to person \(s\):

\[ y_{n|s} | \theta_s \sim Bernoulli(\theta_s) \]

We used a Beta prior for the \(\theta_s\)s whose parameters were a function of the population-level “hyperparameters” \(\omega\) and \(\kappa\). I’ll put a subscript \(\theta\) on these parameters just as a reminder that they are governing the distribution of the \(\theta\)s.

\[ \theta_s \sim Beta(\kappa_\theta\omega_\theta, \kappa_\theta(1-\omega_\theta)) \]

Since we wanted to be able to learn about the characteristics of the population as a whole, rather than using fixed values for \(\omega_\theta\) and \(\kappa_\theta\), we put “hyperpriors” on them as well.
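
To see why this parameterization is convenient, recall that a \(Beta(a, b)\) distribution has mean \(a/(a+b)\), and that \(a + b\) acts as a “concentration” (the prior’s virtual sample size). A quick check in R, with arbitrary illustrative values rather than anything we’ll use in the model:

omega <- 0.7
kappa <- 15
a <- kappa * omega
b <- kappa * (1 - omega)

a / (a + b)   # recovers omega: the mean of the Beta
a + b         # recovers kappa: the concentration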

For consistency and easier interpretability, I’m going to write the parameters of the hyperprior on \(\omega\) in terms of the mean and “concentration”, rather than \(a\) and \(b\):

\[ \omega_\theta \sim Beta(\gamma_\omega\mu_\omega, \gamma_\omega(1 - \mu_\omega)) \]

  2. How would you interpret the parameters \(\mu_\omega\) and \(\gamma_\omega\)? Hint: note the parallels between the prior on \(\omega\) and the prior on \(\theta\).

For the prior on \(\kappa\), we’re using a Gamma distribution, a continuous distribution over the positive reals, which is the range we want for \(\kappa\). Recall that in the nonhierarchical model, the sum of the prior parameters \(a\) and \(b\), which plays the same role \(\kappa\) plays here, could be interpreted as the “virtual sample size” of the prior. In other words, it’s the weight the prior carries relative to the data, measured in units of a number of data points.

I will again use a modified parameterization for interpretability reasons.

\[ \kappa_\theta \sim Gamma(\mu^2_\kappa \gamma_\kappa, \mu_\kappa \gamma_\kappa) \]

The conventional first parameter of a Gamma distribution is called the “shape” parameter, and the second is called the “rate” parameter. I’ve written these in terms of the mean \(\mu_\kappa\) and a “precision” parameter \(\gamma_\kappa\). That way \(\mu_\kappa\) directly expresses our expectation about the value of \(\kappa\), and \(\gamma_\kappa\) expresses (more or less) our prior confidence in that estimate.
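
If the algebra behind that claim isn’t obvious, a quick simulation (using the same values we’ll pass to Stan below) shows that this shape/rate choice gives a distribution whose mean is \(\mu_\kappa\) and whose standard deviation is \(1/\sqrt{\gamma_\kappa}\):

mu_kappa_demo    <- 10
gamma_kappa_demo <- 0.1

## Draws from Gamma(shape = mu^2 * gamma, rate = mu * gamma)
kappa_draws <- rgamma(1e5,
                      shape = mu_kappa_demo^2 * gamma_kappa_demo,
                      rate  = mu_kappa_demo * gamma_kappa_demo)

mean(kappa_draws)   # close to mu_kappa_demo (10)
sd(kappa_draws)     # close to 1 / sqrt(gamma_kappa_demo), about 3.16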

Let’s try to modify the bernoulli.stan program in order to express this model.

The chunk below has eval = FALSE set so that it doesn’t try to run if we Knit, because the hierarchical-bernoulli.stan model file doesn’t exist yet: we need to create it. Once we have the model defined, set eval = TRUE and then try to run the sampler.

The fixed (by us) top-level parameters will be passed to Stan as if they were “data”, since Stan is not trying to learn these; it just treats them as constants. We will talk more later about “sanity checks” for selecting these parameters, but we’ll just use these values for now.

## We can define our top-level hyperparameters here:
mu_omega    <- 0.5
gamma_omega <- 10
mu_kappa    <- 10
gamma_kappa <- 0.1
hierarchical_model <- stan(
  file = "hierarchical-bernoulli.stan", 
  model_name = "hierarchical_bernoulli", 
  data = c(
    hierarchical_data, 
    N = 20, 
    S = 2, 
    mu_omega    = mu_omega,
    gamma_omega = gamma_omega,
    mu_kappa    = mu_kappa,
    gamma_kappa = gamma_kappa))
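
A common stumbling block is a mismatch between the names in the list passed as data and the names declared in the Stan program’s data block. If the sampler complains, it can help to inspect exactly what you’re passing (a purely diagnostic check, not part of the model):

str(c(
  hierarchical_data,
  N = 20,
  S = 2,
  mu_omega    = mu_omega,
  gamma_omega = gamma_omega,
  mu_kappa    = mu_kappa,
  gamma_kappa = gamma_kappa))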

Let’s examine some basic plots and summary output. Set eval = TRUE once the sampler has run successfully and then run the chunks.

plot(hierarchical_model, pars = c("theta[1]", "theta[2]", "omega_theta"))
plot(hierarchical_model, pars = "kappa_theta")
print(hierarchical_model)
pairs(
  hierarchical_model, 
  pars = c("theta[1]", "theta[2]", "omega_theta", "kappa_theta"))
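
Beyond the built-in plots, you can pull the posterior draws out of the fitted object and summarize them however you like. For example (assuming your Stan program uses the parameter names shown in the plots above):

## Posterior draws as a named list, one entry per parameter
posterior_draws <- rstan::extract(hierarchical_model)

## Posterior mean and a 95% credible interval for the population-level omega
mean(posterior_draws$omega_theta)
quantile(posterior_draws$omega_theta, probs = c(0.025, 0.975))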