- Multilevel models from a frequentist perspective
- The Bayesian approach
- Markov Chain Monte Carlo (MCMC)
- Examples
GDR Vision - Paris 4/10/2018
Matrix notation:
\[ \bf{Y} = \bf{X}\beta + \epsilon \] \[ \left( \begin{array}{c} y_1 \\ \vdots \\ y_m \end{array} \right) = \left( \begin{array}{cccc} 1 & x_{11} & \ldots & x_{1n}\\ \vdots & \vdots & \vdots & \vdots\\ 1 & x_{m1} & \ldots & x_{mn} \end{array} \right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{array} \right) + \left( \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_m \end{array} \right) \]
Matrix multiplication: \[ \left( \begin{array}{cc} a & b \\ c& d \end{array} \right) \left( \begin{array}{c} 1 \\ 2 \end{array} \right) = 1 \left( \begin{array}{c} a \\ c \end{array} \right) + 2 \left( \begin{array}{c} b \\ d \end{array} \right) = \left( \begin{array}{c} a + 2b \\ c+ 2d \end{array} \right) \]
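The same rule can be checked numerically in R (a minimal sketch, with arbitrary values a = 1, b = 2, c = 3, d = 4):

# numerical check of the matrix multiplication example above
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)  # rows: (a, b) and (c, d)
A %*% c(1, 2)                                       # returns (a + 2b, c + 2d) = (5, 11)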
General formulation in matrix notation \[ \textbf{Y} = \textbf{X}\beta + \textbf{Z}b + \epsilon \] where \(\textbf{X}\) and \(\textbf{Z}\) are the known fixed-effects and random-effects regressor matrices.
The components of the residual error vector \(\epsilon \sim \mathcal{N}\left( 0, \sigma^2 \right)\) are assumed to be i.i.d. (independent and identically distributed).
The random-effect components, \(b \sim \mathcal{N}\left( 0, \Omega \right)\) are assumed to be normally distributed with mean 0, however they are not necessarily independent (the components \(b_j\) can be correlated, and correlations can be estimated). Example:\[ \left[ {\begin{array}{c} {{b_0}}\\ {{b_1}} \end{array}} \right] \sim\cal N \left( {\left[ {\begin{array}{c} 0 \\ 0 \end{array}} \right],\Omega = \left[ {\begin{array}{c} {{\mathop{\rm Var}} \left( b_0 \right)} & {{\mathop{\rm cov}} \left( {{b_0},{b_1}} \right)}\\ {{\mathop{\rm cov}} \left( {{b_0},{b_1}} \right)} & {{\mathop{\rm Var}} \left( b_1 \right)} \end{array}} \right]} \right) \]
Parameters (\(\beta\), \(\sigma^2\) and \(\Omega\)) are estimated by maximizing the likelihood function, which is the probability of the data, given the parameters (but treated as a function of the parameters, keeping the data fixed).
To obtain the marginal density of the data, the probability of the data conditional on the random effects is integrated with respect to the marginal density of the random effects \[ L\left(\beta,\sigma^2,\Omega \mid \text{data}\right) = \int p \left(\text{data} \mid \beta,\sigma^2, b\right) \, p\left(b \mid \Omega\right) \, db \]
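As a toy numerical illustration of this integral (a minimal sketch, not part of the original example), consider a single observation from one cluster with a random intercept; the numerically integrated marginal likelihood matches the known analytic result:

# toy example: one observation y from a cluster with random intercept b ~ N(0, omega^2)
y <- 0.5; beta <- 0; sigma <- 1; omega <- 2

# marginal likelihood: integrate the conditional density of y over the density of b
marg_lik <- integrate(function(b) dnorm(y, mean = beta + b, sd = sigma) * dnorm(b, sd = omega),
                      lower = -Inf, upper = Inf)$value
marg_lik
# analytic check: marginally y ~ N(beta, sigma^2 + omega^2)
dnorm(y, mean = beta, sd = sqrt(sigma^2 + omega^2))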
The \(\textsf{R}\) library lme4
provides two methods for estimating parameters: Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML).
ML estimates tend to be biased and underestimate the variance components (e.g. \(\Omega\)).
REML provides less biased variance estimates: conceptually it is similar to Bessel's correction for the sample variance (using \(n-1\) instead of \(n\) in the denominator).
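In lme4 the two methods are selected with the REML argument; a minimal sketch, using the sleepstudy dataset introduced below:

library(lme4)
m_reml <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = TRUE)   # default
m_ml   <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)  # plain ML
# the ML variance components are typically slightly smaller than the REML ones
VarCorr(m_reml)
VarCorr(m_ml)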
sleepstudy is a dataset in the lme4 package, containing reaction time data from 18 subjects who were restricted to 3 hours of sleep per night for 10 days.
Questions:
str(sleepstudy)
'data.frame':	180 obs. of  3 variables:
 $ Reaction: num  250 259 251 321 357 ...
 $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
 $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
Fitting the model with lme4:
m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy)
summary(m.1)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138
The function confint() makes it easy to compute bootstrapped 95% confidence intervals for the parameters:
CI_fixef <- confint(m.1, method = "boot", nsim = 500, oldNames = F)
print(CI_fixef, digits = 2)
                              2.5 % 97.5 %
sd_(Intercept)|Subject        12.81  35.63
cor_Days.(Intercept)|Subject  -0.53   0.81
sd_Days|Subject                3.57   8.32
sigma                         22.61  28.44
(Intercept)                  237.02 264.76
Days                           7.54  13.20
To test whether the by-subject random slope for Days is needed, we can compare models with and without it using a likelihood-ratio test:

# m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy)
m.2 <- lmer(Reaction ~ 1 + Days + (1 | Subject), sleepstudy)
anova(m.1, m.2)
refitting model(s) with ML (instead of REML)
Data: sleepstudy
Models:
m.2: Reaction ~ 1 + Days + (1 | Subject)
m.1: Reaction ~ 1 + Days + (1 + Days | Subject)
    Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
m.2  4 1802.1 1814.8 -897.04   1794.1                             
m.1  6 1763.9 1783.1 -875.97   1751.9 42.139      2  7.072e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Shrinkage: the predicted \({\hat b}_j\) (conditional modes of the random effects) can be seen as a "compromise" between the within-subject estimates and the population mean.
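One way to visualize this shrinkage (a minimal sketch, assuming the model m.1 fitted above) is to compare per-subject least-squares estimates with the subject-specific coefficients of the multilevel model:

library(lme4)
# per-subject ordinary least-squares estimates
ols <- coef(lmList(Reaction ~ Days | Subject, sleepstudy))
# subject-specific coefficients from the multilevel model (fixed + random effects)
mlm <- coef(m.1)$Subject
plot(ols[, 2], mlm[, 2], pch = 19, xlab = "OLS slope", ylab = "lmer slope")
abline(h = fixef(m.1)[2], lty = 2)  # population-mean slope: lmer slopes are pulled toward it
abline(0, 1)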
Say you want to sample from a density \(P({\bf x})\), but you can only evaluate a function \(f({\bf x})\) that is proportional to the density \(P({\bf x})\):
Let's use the sampling algorithm to infer parameter values for one participant of the sleepstudy dataset.
First, write a function that computes the (log-)likelihood, keeping the data fixed.
x <- sleepstudy$Days[sleepstudy$Subject == "308"]
y <- sleepstudy$Reaction[sleepstudy$Subject == "308"]

loglik <- function(par) {
    # par = [intercept, slope, log(residual Std.)]
    pred <- par[1] + par[2] * x
    return(sum(dnorm(y, mean = pred, sd = exp(par[3]), log = T)))
}
Next, choose your priors.
Write a function that computes the log of the joint prior density for a combination of parameters.
logprior <- function(par) {
    intercept_prior <- dnorm(par[1], mean = 250, sd = 180, log = T)
    slope_prior <- dnorm(par[2], mean = 20, sd = 20, log = T)
    sd_prior <- dnorm(par[3], mean = 4, sd = 1, log = T)
    return(intercept_prior + slope_prior + sd_prior)
}
Put together likelihood and prior to obtain the (log) posterior density
logposterior <- function(par) { return(loglik(par) + logprior(par)) }
Choose an arbitrary starting point and the proposal density \(g({\bf x}_{n+1}|{\bf x}_n)\).
# initial parameters
startvalue <- c(250, 20, 5)

# proposal density
proposalfunction <- function(par) {
    return(rnorm(3, mean = par, sd = c(15, 5, 0.2)))
}
This function iteratively executes the steps we have seen before.
run_metropolis_MCMC <- function(startvalue, iterations) {
    chain <- array(dim = c(iterations + 1, 3))
    chain[1, ] <- startvalue
    for (i in 1:iterations) {
        # draw a random proposal
        proposal <- proposalfunction(chain[i, ])
        # ratio of posterior density between new and old values
        a <- exp(logposterior(proposal) - logposterior(chain[i, ]))
        # accept/reject
        if (runif(1) < a) {
            chain[i + 1, ] <- proposal
        } else {
            chain[i + 1, ] <- chain[i, ]
        }
    }
    return(chain)
}
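The chain used in the next step can then be generated by calling this function; the number of iterations here is an arbitrary choice (it only needs to exceed the burn-in used below):

set.seed(123)
chain <- run_metropolis_MCMC(startvalue, iterations = 20000)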
Having the samples from the posterior distribution, summarising uncertainty is simply a matter of counting values. Here I calculate a 95% Bayesian credible interval for the slope.
# remove initial 'burn in' samples
burnIn <- 5000
slope_samples <- chain[-(1:burnIn), 2]

# mean of posterior distribution
mean(slope_samples)
[1] 21.75772

# 95% Bayesian credible interval
alpha <- 0.05
round(c(quantile(slope_samples, probs = alpha/2),
        quantile(slope_samples, probs = 1 - alpha/2)), digits = 2)
 2.5% 97.5% 
10.82 33.27
Example: fitting the sleepstudy dataset with RStan (see installation instructions at github.com/stan-dev/rstan/wiki/RStan-Getting-Started). The model can be expressed as \[ y_{j} \sim \text{Normal}\left( \beta_0 + b_{0j} + \left( \beta_1 + b_{1j} \right){\rm{Days}}, \,\, \sigma_{\epsilon}^2 \right)\\ b \sim \text{Normal}\left( 0, \,\, \Omega \right)\\ \]
Priors: \[ \beta_0 \sim \text{Normal} \left(0.3,1 \right)\\ \beta_1 \sim \text{Normal} \left(0.01,0.5 \right)\\ \sigma_{\epsilon} \sim \text{HalfCauchy} \left(0, 0.5\right)\\ \]
\[ \begin{align*} \Omega & = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \textbf{R} \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right)\\ & = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right) \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \\ & = \left( \begin{array}{cc} \sigma_0 & \rho\sigma_0 \\ \rho\sigma_1 & \sigma_1 \end{array} \right) \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \\ & = \left( \begin{array}{cc} \sigma_0^2 & \rho\sigma_0\sigma_1 \\ \rho\sigma_0\sigma_1 & \sigma_1^2 \end{array} \right)\\ \end{align*} \]
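A quick numerical check of this decomposition in R (a minimal sketch with arbitrary values for \(\sigma_0\), \(\sigma_1\) and \(\rho\)):

sigma_0 <- 2; sigma_1 <- 0.5; rho <- 0.3
S <- diag(c(sigma_0, sigma_1))            # diagonal matrix of standard deviations
R <- matrix(c(1, rho, rho, 1), nrow = 2)  # correlation matrix
S %*% R %*% S                             # covariance matrix Omega, as in the last line above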
The Stan model is written in a separate text file with the .stan extension, and is organized in blocks:
- data: declaration of all dependent/independent variables
- parameters: declare the free parameters of the model
- model: define likelihood function and priors
- transformed parameters: apply any transformation to the parameters, before computing the likelihood
- generated quantities: derive any quantities of interest based on data or parameters

The data block:
data {
  int<lower=1> N;                  // number of observations
  real RT[N];                      // reaction time (transformed in seconds)
  int<lower=0,upper=9> Days[N];    // predictor
  int<lower=1> J;                  // number of subjects
  int<lower=1,upper=J> Subject[N]; // subject id
}
RStan requires the dataset to be organized as a list object
d_stan <- list(Subject = as.numeric(factor(sleepstudy$Subject,
                                           labels = 1:length(unique(sleepstudy$Subject)))),
               Days = sleepstudy$Days,
               RT = sleepstudy$Reaction/1000,
               N = nrow(sleepstudy),
               J = length(unique(sleepstudy$Subject)))
The parameters and transformed parameters blocks:

parameters {
  vector[2] beta;              // fixed-effects parameters
  real<lower=0> sigma_e;       // residual std
  vector<lower=0>[2] sigma_u;  // random effects standard deviations
  cholesky_factor_corr[2] L_u; // L_u is the Cholesky factor of the correlation matrix
  matrix[2,J] z_u;             // random effect matrix
}

transformed parameters {
  matrix[2,J] u;
  u = diag_pre_multiply(sigma_u, L_u) * z_u; // use Cholesky to set correlation
}
For efficiency and numerical stability (see Stan manual) the correlation matrix \(\textbf{R}\) is parametrized by its Cholesky factor \(\textbf{L}\), which can be seen as the square root of \(\textbf{R}\), i.e. \[\textbf{L}\textbf{L}^T=\textbf{R} = \left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right) \] If \(\textbf{Z}_{\text{uncorr}}\) is a 2x\(n\) matrix whose rows are 2 samples from uncorrelated random variables, and \[ \textbf{Z}_{\text{corr}} = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \textbf{L} \, \textbf{Z}_{\text{uncorr}} \] then \(\textbf{Z}_{\text{corr}}\) has the desired variances \(\sigma_0^2,\sigma_1^2\) and correlation \(\rho\). (\(\textbf{L}\) is the lower-triangular Cholesky factor.)
Why does that work? Say the two uncorrelated samples have zero mean and unit variance; then the covariance matrix is the identity matrix, \(\mathbb{E}\left( {Z{Z^T}} \right) - \mathbb{E}\left( Z \right) \mathbb{E}\left(Z \right)^T = \mathbb{E} \left( Z Z^T \right) = I\), and is identical to the correlation matrix.
Say the desired correlation matrix is \(\textbf{R}\); since it is symmetric and positive definite it is possible to obtain the Cholesky factorization \(L{L^T} = \textbf{R}\).
If we transform the samples with \(X=LZ\), the new covariance (correlation) matrix is \[ \begin{align} \mathbb{E} \left(XX^T\right) &= \mathbb{E} \left((LZ)(LZ)^T \right) \\ &= \mathbb{E} \left(LZ Z^T L^T\right) \\ &= L \mathbb{E} \left(ZZ^T \right) L^T \\ &= LIL^T = LL^T = \textbf{R} \\ \end{align} \]
The third step is justified because the expected value is a linear operator, therefore \(\mathbb{E}(cX) = c\mathbb{E}(X)\). Also \((AB)^T = B^T A^T\): the order of the factors reverses.
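This can be checked empirically in R (a minimal sketch with arbitrary values; cor() applied to the transformed samples recovers \(\rho\)):

set.seed(1)
rho <- 0.7
R <- matrix(c(1, rho, rho, 1), nrow = 2)
L <- t(chol(R))                                # chol() returns the upper factor, so transpose
Z_uncorr <- matrix(rnorm(2 * 1e5), nrow = 2)   # 2 x n samples of uncorrelated standard normals
Z_corr <- L %*% Z_uncorr
cor(t(Z_corr))                                 # off-diagonal entries are close to 0.7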
The model block:

model {
  real mu; // conditional mean of the dependent variable

  //priors
  L_u ~ lkj_corr_cholesky(2);   // LKJ prior for the correlation matrix
  to_vector(z_u) ~ normal(0,1); // before Cholesky, random effects are normal variates with SD=1
  sigma_u ~ cauchy(0, 0.5);     // SD of random effects (vectorized)
  sigma_e ~ cauchy(0, 0.5);     // prior for residual standard deviation
  beta[1] ~ normal(0.3, 1);     // prior for fixed-effect intercept
  beta[2] ~ normal(0.01, 0.5);  // prior for fixed-effect slope

  //likelihood
  for (i in 1:N){
    mu = beta[1] + u[1,Subject[i]] + (beta[2] + u[2,Subject[i]])*Days[i];
    RT[i] ~ normal(mu, sigma_e);
  }
}
library(rstan)
options(mc.cores = parallel::detectCores())  # indicate stan to use multiple cores if available

sleep_model <- stan(file = "sleep_model_v2.stan", data = d_stan,
                    iter = 2000, chains = 4)
traceplot(sleep_model, pars = c("beta", "sigma_e", "sigma_u"), inc_warmup = F)
print(sleep_model, pars = c("beta", "sigma_e"), probs = c(0.025, 0.975), digits = 3)
Inference for Stan model: sleep_model_v2.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean    sd  2.5% 97.5% n_eff Rhat
beta[1] 0.251       0 0.008 0.236 0.267  2545    1
beta[2] 0.010       0 0.002 0.007 0.014  2409    1
sigma_e 0.026       0 0.002 0.023 0.029  4000    1

Samples were drawn using NUTS(diag_e) at Sun Sep 23 14:37:56 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Posterior distribution of fixed-effects parameters.
pairs(sleep_model, pars = c("beta", "sigma_e", "sigma_u"))
Examine random effects correlation.
# Use the L matrices to compute the correlation matrices
L_u <- extract(sleep_model, pars = "L_u")$L_u
cor_u <- apply(L_u, 1, function(x) tcrossprod(x)[1, 2])
print(signif(quantile(cor_u, probs = c(0.025, 0.5, 0.975))))
      2.5%        50%      97.5% 
-0.4676860  0.0597705  0.5986940
generated quantities {
  real y_rep[N];
  matrix[2,J] u_rep; // matrix of simulated random effects

  for (j in 1:J) {
    u_rep[1:2,j] = multi_normal_cholesky_rng(rep_vector(0, 2),
                                             diag_matrix(sigma_u) * (L_u));
  }
  for (i in 1:N){
    // simulate the model
    y_rep[i] = normal_rng(beta[1] + u_rep[1,Subject[i]]
                          + (beta[2] + u_rep[2,Subject[i]])*Days[i], sigma_e);
  }
}
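The simulated y_rep values can then be used for a posterior predictive check, e.g. with the bayesplot package (a minimal sketch, assuming the model was re-compiled and re-fit with this generated quantities block included):

library(bayesplot)
y_rep <- extract(sleep_model, pars = "y_rep")$y_rep   # matrix: posterior draws x observations
# overlay the observed RT distribution with 100 simulated datasets
ppc_dens_overlay(y = d_stan$RT, yrep = y_rep[1:100, ])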
Comparison with lme4
random effects estimates.
# extract samples
samples_rf0 <- extract(sleep_model, pars = "u")$u[, 1, ]
samples_rf1 <- extract(sleep_model, pars = "u")$u[, 2, ]

# compute posterior mean
rf_int <- apply(samples_rf0, 2, mean)
rf_slp <- apply(samples_rf1, 2, mean)

# plot
par(mfrow = c(1, 2))
plot(rf_int, ranef(m.1)$Subject[, 1], pch = 19,
     xlab = "Stan", ylab = "lme4", main = "Intercept")
plot(rf_slp, ranef(m.1)$Subject[, 2], pch = 19,
     xlab = "Stan", ylab = "lme4", main = "Slope")
The package rstanarm automatically prepares Stan code from a model specified in standard R syntax, e.g.

library(rstanarm)
sleepstudy$Reaction_s <- sleepstudy$Reaction/1000
sleep_model_2 <- stan_lmer(Reaction_s ~ Days + (Days | Subject),
                           data = sleepstudy, cores = 2,
                           iter = 2000, chains = 4)
- rstanarm chooses default priors automatically (use prior_summary() to examine them)
- a similar package is brms (Bürkner, 2017)

summary(sleep_model_2, pars = c("(Intercept)", "Days"), digits = 3)
Model Info:
 function:     stan_lmer
 family:       gaussian [identity]
 formula:      Reaction_s ~ Days + (Days | Subject)
 algorithm:    sampling
 priors:       see help('prior_summary')
 sample:       4000 (posterior sample size)
 observations: 180
 groups:       Subject (18)

Estimates:
             mean    sd   2.5%   25%   50%   75% 97.5%
(Intercept) 0.251 0.007 0.237 0.247 0.251 0.256 0.265
Days        0.010 0.002 0.007 0.009 0.010 0.011 0.014

Diagnostics:
             mcse  Rhat n_eff
(Intercept) 0.000 1.000  1790
Days        0.000 1.004  1364

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure
of effective sample size, and Rhat is the potential scale reduction factor on
split chains (at convergence Rhat=1).
Fitting the model becomes almost as easy as with lme4, but it is important to check that sampling was successful (e.g. \(\hat R\), …). For example, with only two observations and no explicit priors (i.e. flat priors), the posterior is so diffuse that the chains drift to extreme values:

dat <- list(y = c(-1, 1))

stan_code <- "
data{
  real y[2];
}
parameters{
  real mu;
  real<lower=0> sigma;
}
model{
  y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 2000, chains = 4)
rstan::traceplot(m_, pars = c("mu", "sigma"), inc_warmup = F)
Adding weakly informative priors regularizes the posterior and sampling converges:

dat <- list(y = c(-1, 1))

stan_code <- "
data{
  real y[2];
}
parameters{
  real mu;
  real<lower=0> sigma;
}
model{
  mu ~ normal(1, 10);
  sigma ~ cauchy(0, 1);
  y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 2000, chains = 4)
rstan::traceplot(m_, pars = c("mu", "sigma"), inc_warmup = F)
Another issue is non-identifiability: here alpha[1] and alpha[2] enter the likelihood only through their sum, so without priors they cannot be identified separately and the chains do not mix:

dat <- list(y = rnorm(100, mean = 0, sd = 1))

stan_code <- "
data{
  real y[100];
}
parameters{
  vector[2] alpha;
  real<lower=0> sigma;
}
model{
  real mu;
  mu = alpha[1] + alpha[2];
  y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 4000, chains = 2)
rstan::traceplot(m_, pars = c("alpha", "sigma"), inc_warmup = F)
Again, priors mitigate the problem:

dat <- list(y = rnorm(100, mean = 0, sd = 1))

stan_code <- "
data{
  real y[100];
}
parameters{
  vector[2] alpha;
  real<lower=0> sigma;
}
model{
  real mu;
  alpha ~ normal(0, 10);
  mu = alpha[1] + alpha[2];
  y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 4000, chains = 2)
rstan::traceplot(m_, pars = c("alpha", "sigma"), inc_warmup = F)
Often psychometric functions are fit to individual data, and group analyses are run on the estimated parameters. This approach neglects the uncertainty of the individual estimates (subject-specific standard errors) and imbalances in the number of repetitions/trials.
Inferences can be improved by adopting a (generalized linear) multilevel approach (GLMM) \[ p \left(y \mid b\right) = \Phi \left( \textbf{X} \beta + \textbf{Z} b \right) \\ b \sim \mathcal{N}\left( 0, \Omega \right) \]
This model can be fit with the glmer function of the lme4 library. However, it is not a realistic model: sometimes subjects push buttons at random and make stimulus-independent errors (lapses). We can extend the model with an extra parameter \(\lambda\) indicating the probability of lapsing \[ p \left(y \mid b\right) = \lambda+\left( 1-2\lambda \right) \times \Phi \left( \textbf{X} \beta + \textbf{Z} b \right) \\ b \sim \mathcal{N}\left( 0, \Omega \right)\\ \]
d <- read.table("bisection2.txt", sep = "\t", header = T) str(d) 'data.frame': 4593 obs. of 5 variables: $ id : int 4 4 4 4 4 4 4 4 4 4 ... $ dx : num 0.04 -0.04 -0.53 -0.53 -0.36 0.44 0.53 0.69 0.93 -1.09 ... $ rr : int 1 1 0 1 0 1 0 1 0 0 ... $ acc : int 1 0 1 0 1 1 0 1 0 1 ... $ cond: int 1 2 1 2 2 1 2 2 2 2 ...
Declare all the variables in the data block.

data {
  int<lower=1> N;
  int<lower=0,upper=1> rr[N];
  real ds[N];
  int<lower=0,upper=1> cond[N];
  int<lower=1> J;
  int<lower=1,upper=J> id[N];
}
Define the free parameters in the parameters block.

parameters {
  vector[4] beta;                    // fixed-effects parameters
  vector<lower=0,upper=1>[J] lambda; // lapse rate
  vector<lower=0>[2] beta_ab;        // population parameters of lapse rate
  vector<lower=0>[4] sigma_u;        // random effects standard deviations
  cholesky_factor_corr[4] L_u;       // L_u is the Cholesky factor of the correlation matrix
  matrix[4,J] z_u;                   // random effect matrix
}
In the transformed parameters
block apply any transformations needed to compute the likelihood.
transformed parameters {
  matrix[4,J] u;
  u = diag_pre_multiply(sigma_u, L_u) * z_u; // use Cholesky to set correlation
}
Define priors and likelihood in the model
block.
model {
  real mu; // linear predictor

  //priors
  L_u ~ lkj_corr_cholesky(2);   // LKJ prior for the correlation matrix
  to_vector(z_u) ~ normal(0,1); // before Cholesky, random eff. are normal variates with SD=1
  sigma_u ~ cauchy(0, 1);       // SD of random effects (vectorized)
  beta[1] ~ normal(0, 2);       // prior for intercept (cond 1)
  beta[2] ~ normal(0, 4);       // prior for diff. in intercept (cond 2 - cond 1)
  beta[3] ~ normal(0, 1);       // prior for slope (cond 1)
  beta[4] ~ normal(0, 2);       // prior for diff. in slope (cond 2 - cond 1)
  beta_ab[1] ~ gamma(4,1);      // hyper priors for lapse rate distribution
  beta_ab[2] ~ gamma(1,0.2);
  lambda ~ beta(beta_ab[1], beta_ab[2]); // lapse rate

  //likelihood
  for (i in 1:N){
    mu = beta[1] + u[1,id[i]] + cond[i]*(beta[2] + u[2,id[i]])
         + (beta[3] + u[3,id[i]] + cond[i]*(beta[4] + u[4,id[i]]))*ds[i];
    rr[i] ~ bernoulli((1-lambda[id[i]])*Phi(mu) + lambda[id[i]]/2);
  }
}
Prepare the data as a list for Stan.
d_stan <- list(id = as.numeric(factor(d$id, labels = 1:length(unique(d$id)))),
               ds = d$dx,
               rr = d$rr,
               N = nrow(d),
               J = length(unique(d$id)),
               cond = d$cond - 1)
str(d_stan)
List of 6
 $ id  : num [1:4593] 1 1 1 1 1 1 1 1 1 1 ...
 $ ds  : num [1:4593] 0.04 -0.04 -0.53 -0.53 -0.36 0.44 0.53 0.69 0.93 -1.09 ...
 $ rr  : int [1:4593] 1 1 0 1 0 1 0 1 0 0 ...
 $ N   : int 4593
 $ J   : int 13
 $ cond: num [1:4593] 0 1 0 1 1 0 1 1 1 1 ...
Run sampling
bisection_model <- stan(file = "bisection_stan_v1.stan", data = d_stan, iter = 4000, chains = 4)
Check the convergence by visualising the chains
traceplot(bisection_model, pars = c("beta", "sigma_u", "beta_ab"), inc_warmup = F)
Visualize posterior distribution
pairs(bisection_model, pars = c("beta", "sigma_u"))
print(bisection_model, pars = c("beta", "sigma_u", "beta_ab"),
      probs = c(0.025, 0.975), digits = 3)
Inference for Stan model: bisection_stan_v1.
4 chains, each with iter=4000; warmup=2000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=8000.

             mean se_mean    sd   2.5%  97.5% n_eff  Rhat
beta[1]    -0.577   0.007 0.329 -1.231  0.068  2512 1.001
beta[2]     0.430   0.004 0.216  0.001  0.865  3396 1.001
beta[3]     1.661   0.005 0.247  1.194  2.156  2596 1.000
beta[4]    -0.946   0.003 0.177 -1.312 -0.625  2926 1.000
sigma_u[1]  1.152   0.004 0.251  0.767  1.725  3708 1.000
sigma_u[2]  0.689   0.003 0.168  0.421  1.081  4004 1.000
sigma_u[3]  0.705   0.003 0.170  0.434  1.099  2604 1.001
sigma_u[4]  0.440   0.003 0.132  0.214  0.740  2404 1.001
beta_ab[1]  1.233   0.011 0.567  0.435  2.594  2458 1.000
beta_ab[2]  9.303   0.082 4.554  3.029 20.679  3110 1.000

Samples were drawn using NUTS(diag_e) at Thu Sep 27 09:44:39 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Dedicated packages (e.g. bayesplot) make it easier to summarize the results.
library(bayesplot)
posterior2 <- extract(bisection_model)
mcmc_intervals(posterior2, regex_pars = c("beta\\[[1-4]", "sigma_u\\[[1-4]"),
               prob = 0.8, prob_outer = 0.95, point_est = "mean")
mcmc_areas(posterior2, regex_pars = c("beta\\[[1-4]", "sigma_u\\[[1-4]"),
           prob = 0.8, point_est = "mean")
Fixed effects are parametrised according to the standard GLM formulation (in terms of \(\beta_0, \beta_1, \dots\)). These can easily be transformed into the psychometric function parameters \(\mu\) and \(\sigma\).
beta_par <- extract(bisection_model, pars = "beta")$beta
jnd_1 <- 1/beta_par[, 3]
jnd_2 <- 1/(beta_par[, 3] + beta_par[, 4])
jnd_diff <- jnd_2 - jnd_1   # posterior distribution of JND differences
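The same samples also give the posterior distribution of the PSE (the point of subjective equality, where \(\Phi(\cdot)=0.5\)); a minimal sketch following the parametrization of the model above:

# PSE = -intercept/slope, following mu = beta1 + cond*beta2 + (beta3 + cond*beta4)*ds
pse_1 <- -beta_par[, 1]/beta_par[, 3]
pse_2 <- -(beta_par[, 1] + beta_par[, 2])/(beta_par[, 3] + beta_par[, 4])
pse_diff <- pse_2 - pse_1   # posterior distribution of the difference in PSE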
Calculate Highest Posterior Density (HPD) intervals (the narrowest interval containing the specified probability mass, e.g. 0.95).
library(coda)  # provides as.mcmc() and HPDinterval()
jnd_ci_1 <- HPDinterval(as.mcmc(jnd_1), prob = 0.95)
jnd_ci_2 <- HPDinterval(as.mcmc(jnd_2), prob = 0.95)
jnd_ci_diff <- HPDinterval(as.mcmc(jnd_diff), prob = 0.95)
Plot
Hmisc::errbar(c("sigma 1", "sigma 2", "difference"), c(mean(jnd_1), mean(jnd_2), mean(jnd_diff)), yplus = c(jnd_ci_1[2], jnd_ci_2[2], jnd_ci_diff[2]), yminus = c(jnd_ci_1[1], jnd_ci_2[1], jnd_ci_diff[1]))
d <- read.table("cueingdd.txt", header = T, sep = "\t") d$alpha_rad <- d$alpha_deg/180 * pi d$rerr <- d$dir * angDiff(d$alpha_rad, d$resp) # transform in signed angular differences str(d) 'data.frame': 4608 obs. of 8 variables: $ id : Factor w/ 8 levels "sj01","sj02",..: 1 1 1 1 1 1 1 1 1 1 ... $ cue : int 2 2 2 2 2 1 2 2 2 2 ... $ alpha_deg: int 220 270 130 180 320 50 330 90 300 80 ... $ location : int 2 4 2 2 3 4 1 4 1 3 ... $ dir : int 1 -1 1 1 1 1 1 1 -1 1 ... $ resp : num -1.77 -0.35 1.84 -2.12 -0.38 1.22 0.37 1.57 -1.96 1.87 ... $ rerr : num 0.673 -1.221 -0.429 1.022 0.318 ... $ alpha_rad: num 3.84 4.71 2.27 3.14 5.59 ...
Declare the variables in the data block.

data {
  int<lower=1> N;
  real<lower=-pi(),upper=pi()> rerr[N];
  int<lower=1,upper=2> cue[N];
  int<lower=1> J;
  int<lower=1,upper=J> id[N];
}
Define the free parameters in the parameters block.

parameters {
  // group-level
  vector[2] mu;                // bias
  vector[2] logk;              // k (log scale)
  vector[2] logilambda;        // mixing parameter (logit scale)

  // subject specific
  vector<lower=0>[6] sigma_u;  // random effects standard deviations
  cholesky_factor_corr[6] L_u; // L_u is the Cholesky factor
  matrix[6,J] z_u;             // random effect matrix
}
In the transformed parameters block the parameters are transformed as required for the computation of the likelihood (to ensure that \(k>0\) and \(0<\lambda<1\)).

transformed parameters {
  matrix<lower=0,upper=1>[2,J] lambda_j;
  matrix<lower=0>[2,J] kappa_j;
  matrix[2,J] mu_j;
  matrix[6,J] u;

  u = diag_pre_multiply(sigma_u, L_u) * z_u;

  for (j in 1:J){
    mu_j[1,j] = mu[1] + u[1,j];
    mu_j[2,j] = mu[2] + u[2,j];
    kappa_j[1,j] = exp(logk[1] + u[3,j]);
    kappa_j[2,j] = exp(logk[2] + u[4,j]);
    lambda_j[1,j] = inv_logit(logilambda[1] + u[5,j]);
    lambda_j[2,j] = inv_logit(logilambda[2] + u[6,j]);
  }
}
Define priors and likelihood in the model block.

model {
  // priors
  L_u ~ lkj_corr_cholesky(2);
  to_vector(z_u) ~ normal(0,1);
  sigma_u ~ normal(0, 10);
  mu ~ normal(0.5, 10);
  logk ~ normal(2, 20);
  logilambda ~ normal(-4, 30);

  // likelihood
  for (n in 1:N){
    if (kappa_j[cue[n],id[n]] < 100)
      target += log_mix(lambda_j[cue[n],id[n]],
                        uniform_lpdf(rerr[n] | -pi(), pi()),
                        von_mises_lpdf(rerr[n] | mu_j[cue[n],id[n]], kappa_j[cue[n],id[n]]));
    else
      // for large kappa the von Mises density is numerically unstable,
      // so it is replaced by its normal approximation (sd = 1/sqrt(kappa))
      target += log_mix(lambda_j[cue[n],id[n]],
                        uniform_lpdf(rerr[n] | -pi(), pi()),
                        normal_lpdf(rerr[n] | mu_j[cue[n],id[n]], sqrt(1/kappa_j[cue[n],id[n]])));
  }
}
mu1 <- extract(mix_model, pars = "mu[1]", inc_warmup = F, permuted = FALSE)
mu2 <- extract(mix_model, pars = "mu[2]", inc_warmup = F, permuted = FALSE)
mu1 <- c(mu1[, 1, 1], mu1[, 2, 1])/pi * 180
mu2 <- c(mu2[, 1, 1], mu2[, 2, 1])/pi * 180
HPDinterval(as.mcmc(mu2 - mu1), prob = 0.95)
         lower    upper
var1 -2.806921 12.10085
attr(,"Probability")
[1] 0.95
k1 <- extract(mix_model, pars = "logk[1]", inc_warmup = F, permuted = FALSE)
k2 <- extract(mix_model, pars = "logk[2]", inc_warmup = F, permuted = FALSE)

# transform concentration parameter k into a standard deviation
k2SD <- function(k) sqrt(-2 * log(besselI(k, nu = 1)/besselI(k, nu = 0)))

sd1 <- k2SD(exp(c(k1[, 1, 1], k1[, 2, 1])))/pi * 180
sd2 <- k2SD(exp(c(k2[, 1, 1], k2[, 2, 1])))/pi * 180
HPDinterval(as.mcmc(sd2 - sd1), prob = 0.95)
       lower    upper
var1 1.98266 10.74307
attr(,"Probability")
[1] 0.95