- Multilevel models from a frequentist perspective
- The Bayesian approach
- Markov Chain Monte Carlo (MCMC)
- Examples
GDR Vision - Paris 4/10/2018
Matrix notation:
\[ \bf{Y} = \bf{X}\beta + \epsilon \] \[ \left( \begin{array}{c} y_1 \\ \vdots \\ y_m \end{array} \right) = \left( \begin{array}{cccc} 1 & x_{11} & \ldots & x_{1n}\\ \vdots & \vdots & \vdots & \vdots\\ 1 & x_{m1} & \ldots & x_{mn} \end{array} \right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{array} \right) + \left( \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_m \end{array} \right) \]
Matrix multiplication: \[ \left( \begin{array}{cc} a & b \\ c& d \end{array} \right) \left( \begin{array}{c} 1 \\ 2 \end{array} \right) = 1 \left( \begin{array}{c} a \\ c \end{array} \right) + 2 \left( \begin{array}{c} b \\ d \end{array} \right) = \left( \begin{array}{c} a + 2b \\ c+ 2d \end{array} \right) \]
General formulation in matrix notation \[ \textbf{Y} = \textbf{X}\beta + \textbf{Z}b + \epsilon \] where \(\textbf{X}\) and \(\textbf{Z}\) are the known fixed-effects and random-effects regressor matrices.
The components of the residual error vector \(\epsilon \sim \mathcal{N}\left( 0, \sigma^2 \right)\) are assumed to be i.i.d. (independent and identically distributed).
The random-effect components, \(b \sim \mathcal{N}\left( 0, \Omega \right)\) are assumed to be normally distributed with mean 0, however they are not necessarily independent (the components \(b_j\) can be correlated, and correlations can be estimated). Example:\[ \left[ {\begin{array}{c} {{b_0}}\\ {{b_1}} \end{array}} \right] \sim\cal N \left( {\left[ {\begin{array}{c} 0 \\ 0 \end{array}} \right],\Omega = \left[ {\begin{array}{c} {{\mathop{\rm Var}} \left( b_0 \right)} & {{\mathop{\rm cov}} \left( {{b_0},{b_1}} \right)}\\ {{\mathop{\rm cov}} \left( {{b_0},{b_1}} \right)} & {{\mathop{\rm Var}} \left( b_1 \right)} \end{array}} \right]} \right) \]
Parameters (\(\beta\), \(\sigma^2\) and \(\Omega\)) are estimated by maximizing the likelihood function, which is the probability of the data, given the parameters (but treated as a function of the parameters, keeping the data fixed).
The probability of the data conditional to the random effects is integrated with respect to the marginal density of the random effects, to obtain the marginal density of the data \[ L\left(\beta,\sigma^2,\Omega \mid \text{data}\right) = \int p \left(\text{data} \mid \beta,\sigma^2, b\right) \, p\left(b \mid \Omega\right) \, db \]
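To make the integral concrete, here is a minimal sketch in R (all numerical values are illustrative assumptions) that computes the marginal density of one subject's observations in a simple random-intercept model by numerical integration:
# Marginal density of one subject's data, integrating out the random intercept b_j
# (values below are made up for illustration)
y_j <- c(1.2, 0.8, 1.5)      # observations of one subject
beta0 <- 1; sigma <- 0.5     # fixed intercept and residual SD
omega <- 0.3                 # SD of the random intercepts
integrand <- function(b) {
  sapply(b, function(bj)
    prod(dnorm(y_j, mean = beta0 + bj, sd = sigma)) * dnorm(bj, mean = 0, sd = omega))
}
integrate(integrand, lower = -Inf, upper = Inf)$value   # marginal density of y_j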
The \(\textsf{R}\) library lme4 provides two methods for estimating parameters: Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML).
ML estimates tend to be biased and to underestimate the variance components (e.g. \(\Omega\)).
REML provides less biased variance estimates: conceptually, it can be seen as similar to Bessel's correction for the sample variance (using \(n-1\) instead of \(n\) in the denominator).
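As a reminder of the analogy, a quick sketch in R (illustrative values): dividing by \(n\) underestimates the variance on average, while Bessel's correction divides by \(n-1\):
# Bessel's correction: n in the denominator biases the variance estimate downward
set.seed(1)
n <- 5
x <- rnorm(n, mean = 0, sd = 1)        # true variance = 1
sum((x - mean(x))^2) / n               # "ML-style" estimate, biased downward on average
sum((x - mean(x))^2) / (n - 1)         # Bessel-corrected estimate (same as var(x))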
sleepstudy is a dataset in the lme4 package, with reaction time data from 18 subjects who were restricted to 3 hours of sleep per night for 10 days.
Questions:
str(sleepstudy)
'data.frame': 180 obs. of 3 variables:
 $ Reaction: num 250 259 251 321 357 ...
 $ Days    : num 0 1 2 3 4 5 6 7 8 9 ...
 $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
The model can be fit with the lmer function of lme4:
m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject),
    sleepstudy)
summary(m.1)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy
REML criterion at convergence: 1743.6
Scaled residuals:
Min 1Q Median 3Q Max
-3.9536 -0.4634 0.0231 0.4634 5.1793
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 612.09 24.740
Days 35.07 5.922 0.07
Residual 654.94 25.592
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.825 36.84
Days 10.467 1.546 6.77
Correlation of Fixed Effects:
(Intr)
Days -0.138
The function confint() makes it easy to compute bootstrapped 95% CIs of the parameters
CI_fixef <- confint(m.1, method = "boot", nsim = 500, oldNames = F)
print(CI_fixef, digits = 2)
2.5 % 97.5 %
sd_(Intercept)|Subject 12.81 35.63
cor_Days.(Intercept)|Subject -0.53 0.81
sd_Days|Subject 3.57 8.32
sigma 22.61 28.44
(Intercept) 237.02 264.76
Days 7.54 13.20
Does the random slope improve the fit? Compare with a model that has only a random intercept:
# m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days|Subject),
#             sleepstudy)
m.2 <- lmer(Reaction ~ 1 + Days + (1 | Subject), sleepstudy)
anova(m.1, m.2)
refitting model(s) with ML (instead of REML)
Data: sleepstudy
Models:
m.2: Reaction ~ 1 + Days + (1 | Subject)
m.1: Reaction ~ 1 + Days + (1 + Days | Subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m.2 4 1802.1 1814.8 -897.04 1794.1
m.1 6 1763.9 1783.1 -875.97 1751.9 42.139 2 7.072e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Shrinkage: the predicted \({\hat b}_j\) (conditional modes of the random effects) can be seen as a "compromise" between the within-subject estimates and the population mean.
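With the sleepstudy model fit above, the shrinkage can be seen directly by comparing per-subject OLS slopes with the subject-specific slopes implied by m.1 (a minimal sketch):
# Per-subject OLS slopes (no pooling) vs. multilevel slopes (partial pooling)
ols_slopes <- sapply(split(sleepstudy, sleepstudy$Subject),
                     function(s) coef(lm(Reaction ~ Days, data = s))["Days"])
mlm_slopes <- coef(m.1)$Subject[, "Days"]  # fixed slope + conditional mode of b_1j
range(ols_slopes)   # wider spread across subjects
range(mlm_slopes)   # pulled toward the population mean slope (~10.5)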
Say you want to sample from a density \(P({\bf x})\), but you can only evaluate a function \(f({\bf x})\) that is proportional to the density \(P({\bf x})\):
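The Metropolis algorithm does this by proposing a random move at each step and accepting it with probability \(\min\left(1, f(\text{proposal})/f(\text{current})\right)\); a generic one-dimensional sketch (the target here is an unnormalized standard normal, an arbitrary choice for illustration):
# Generic 1-D Metropolis sampler: only f(x), proportional to the target density, is needed
f <- function(x) exp(-x^2 / 2)               # unnormalized standard normal (illustration)
n_iter <- 10000
x <- numeric(n_iter)
x[1] <- 0                                    # arbitrary starting point
for (i in 2:n_iter) {
  proposal <- rnorm(1, mean = x[i - 1], sd = 1)     # symmetric proposal
  if (runif(1) < f(proposal) / f(x[i - 1])) {
    x[i] <- proposal                                # accept
  } else {
    x[i] <- x[i - 1]                                # reject, keep current value
  }
}
hist(x, breaks = 50)                         # approximates the target density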
Let's use the sampling algorithm to infer parameter values for one participant of the sleepstudy dataset.
First, write a function that computes the (log-)likelihood, keeping the data fixed.
x <- sleepstudy$Days[sleepstudy$Subject == "308"]
y <- sleepstudy$Reaction[sleepstudy$Subject == "308"]
loglik <- function(par) {
# par = [intercept, slope, log(residual Std.)]
pred <- par[1] + par[2] * x
return(sum(dnorm(y, mean = pred, sd = exp(par[3]), log = T)))
}
Next, choose your priors
Write a function that computes the log of the joint prior density for a combination of parameters
logprior <- function(par) {
intercept_prior <- dnorm(par[1], mean = 250, sd = 180,
log = T)
slope_prior <- dnorm(par[2], mean = 20, sd = 20, log = T)
sd_prior <- dnorm(par[3], mean = 4, sd = 1, log = T)
return(intercept_prior + slope_prior + sd_prior)
}
Put together likelihood and prior to obtain the (log) posterior density
logposterior <- function(par) {
return(loglik(par) + logprior(par))
}
Choose an arbitrary starting point and the proposal density \(g({\bf x}_{n+1}|{\bf x}_n)\)
# initial parameters
startvalue <- c(250, 20, 5)
# proposal density
proposalfunction <- function(par) {
return(rnorm(3, mean = par, sd = c(15, 5, 0.2)))
}
This function iteratively executes the steps described above
run_metropolis_MCMC <- function(startvalue, iterations) {
chain <- array(dim = c(iterations + 1, 3))
chain[1, ] <- startvalue
for (i in 1:iterations) {
# draw a random proposal
proposal <- proposalfunction(chain[i, ])
# ratio of posterior density between new and old values
a <- exp(logposterior(proposal) - logposterior(chain[i,
]))
# accept/reject
if (runif(1) < a) {
chain[i + 1, ] <- proposal
} else {
chain[i + 1, ] <- chain[i, ]
}
}
return(chain)
}
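To obtain the samples used below, the sampler has to be run; the number of iterations here is an arbitrary assumption (chosen to be larger than the burn-in removed later):
set.seed(1)
iterations <- 20000   # assumed number of iterations, an arbitrary choice
chain <- run_metropolis_MCMC(startvalue, iterations)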
Having the samples from the posterior distribution, summarising uncertainty is simply a matter of counting values. Here I calculate a 95% Bayesian credible interval for the slope.
# remove initial 'burn in' samples
burnIn <- 5000
slope_samples <- chain[-(1:burnIn), 2]
# mean of posterior distribution
mean(slope_samples)
[1] 21.75772
# 95% Bayesian credible interval
alpha <- 0.05
round(c(quantile(slope_samples, probs = alpha/2), quantile(slope_samples,
probs = 1 - alpha/2)), digits = 2)
2.5% 97.5%
10.82 33.27
We can now fit the multilevel model to the full sleepstudy dataset using Stan (via the R package RStan; see installation instructions at github.com/stan-dev/rstan/wiki/RStan-Getting-Started). The model can be expressed as \[ y_{j} \sim \text{Normal}\left( \beta_0 + b_{0j} + \left( \beta_1 + b_{1j} \right){\rm{Days}}, \,\, \sigma_{\epsilon}^2 \right)\\ b \sim \text{Normal}\left( 0, \,\, \Omega \right)\\ \]
Priors\[ \beta_0 \sim \text{Normal} \left(0.3,1 \right)\\ \beta_1 \sim \text{Normal} \left(0.01,0.5 \right)\\ \sigma_{\epsilon} \sim \text{HalfCauchy} \left(0, 0.5\right)\\ \]
\[ \begin{align*} \Omega & = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \textbf{R} \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right)\\ & = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right) \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \\ & = \left( \begin{array}{cc} \sigma_0 & \rho\sigma_0 \\ \rho\sigma_1 & \sigma_1 \end{array} \right) \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \\ & = \left( \begin{array}{cc} \sigma_0^2 & \rho\sigma_0\sigma_1 \\ \rho\sigma_0\sigma_1 & \sigma_1^2 \end{array} \right)\\ \end{align*} \]
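A quick numerical check of this decomposition in R, with illustrative values close to the lme4 estimates above:
# Omega = diag(sigma) %*% R %*% diag(sigma)   (values are illustrative)
sigma_0 <- 24.7; sigma_1 <- 5.9; rho <- 0.07
S <- diag(c(sigma_0, sigma_1))
R <- matrix(c(1, rho, rho, 1), nrow = 2)
S %*% R %*% S   # variances on the diagonal, rho*sigma_0*sigma_1 off-diagonal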
Stan models are written in separate text files with a .stan extension and are organized into named blocks:
- data: declaration of all dependent/independent variables
- parameters: declare the free parameters of the model
- model: define the likelihood function and the priors
- transformed parameters: apply any transformation to the parameters, before computing the likelihood
- generated quantities: derive any quantities of interest based on data or parameters

The data block:
data {
int<lower=1> N; // number of observations
real RT[N]; // reaction time (transformed in seconds)
int<lower=0,upper=9> Days[N]; // predictor
int<lower=1> J; // number of subjects
int<lower=1,upper=J> Subject[N]; // subject id
}
RStan requires the dataset to be organized as a list object
d_stan <- list(Subject = as.numeric(factor(sleepstudy$Subject,
labels = 1:length(unique(sleepstudy$Subject)))), Days = sleepstudy$Days,
RT = sleepstudy$Reaction/1000, N = nrow(sleepstudy),
J = length(unique(sleepstudy$Subject)))
The parameters block:
parameters {
vector[2] beta; // fixed-effects parameters
real<lower=0> sigma_e; // residual std
vector<lower=0>[2] sigma_u; // random effects standard deviations
cholesky_factor_corr[2] L_u; // L_u is the Choleski factor of the correlation matrix
matrix[2,J] z_u; // random effect matrix
}
transformed parameters {
matrix[2,J] u;
u = diag_pre_multiply(sigma_u, L_u) * z_u; // use Cholesky to set correlation
}
For efficiency and numerical stability (see the Stan manual) the correlation matrix \(\textbf{R}\) is parametrized by its Cholesky factor \(\textbf{L}\), which can be seen as the square root of \(\textbf{R}\), i.e. \[\textbf{L}\textbf{L}^T=\textbf{R} = \left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right) \] If \(\textbf{Z}_{\text{uncorr}}\) is a 2\(\times n\) matrix whose rows are 2 samples from uncorrelated random variables, then \[ \textbf{Z}_{\text{corr}} = \left( \begin{array}{cc} \sigma_0 & 0 \\ 0 & \sigma_1 \end{array} \right) \textbf{L} \, \textbf{Z}_{\text{uncorr}} \] has the desired variances \(\sigma_0^2,\sigma_1^2\) and correlation \(\rho\) (\(\textbf{L}\) is the lower-triangular Cholesky factor).
Why does that work? Say the two uncorrelated samples have zero mean and unit variance; then the covariance matrix is the identity matrix, \(\mathbb{E}\left( {Z{Z^T}} \right) - \mathbb{E}\left( Z \right) \mathbb{E}\left(Z \right)^T = \mathbb{E} \left( Z Z^T \right) = I\), and it is identical to the correlation matrix.
Say the desired correlation matrix is \(\textbf{R}\); since it is symmetric and positive definite, it is possible to obtain the Cholesky factorization \(L{L^T} = \textbf{R}\).
If we transform the samples with \(X=LZ\), the new covariance (correlation) matrix is \[ \begin{align} \mathbb{E} \left(XX^T\right) &= \mathbb{E} \left((LZ)(LZ)^T \right) \\ &= \mathbb{E} \left(LZ Z^T L^T\right) \\ &= L \mathbb{E} \left(ZZ^T \right) L^T \\ &= LIL^T = LL^T = \textbf{R} \\ \end{align} \]
The third step is justified because the expected value is a linear operator, therefore \(\mathbb{E}(cX) = c\mathbb{E}(X)\). We also used \((AB)^T = B^T A^T\): transposition reverses the order of the factors.
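The same trick can be checked numerically in R (a sketch with arbitrary values):
# Inducing a desired correlation with the Cholesky factor of R
set.seed(1)
sigma_0 <- 2; sigma_1 <- 0.5; rho <- 0.6      # arbitrary illustrative values
R <- matrix(c(1, rho, rho, 1), nrow = 2)
L <- t(chol(R))                               # lower-triangular factor, L %*% t(L) == R
Z_uncorr <- matrix(rnorm(2 * 1e5), nrow = 2)  # 2 x n matrix of uncorrelated samples
Z_corr <- diag(c(sigma_0, sigma_1)) %*% L %*% Z_uncorr
apply(Z_corr, 1, sd)                          # approximately sigma_0 and sigma_1
cor(Z_corr[1, ], Z_corr[2, ])                 # approximately rho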
The model block:
model {
real mu; // conditional mean of the dependent variable
//priors
L_u ~ lkj_corr_cholesky(2); // LKJ prior for the correlation matrix
to_vector(z_u) ~ normal(0,1); // before Cholesky, random effects are normal variates with SD=1
sigma_u ~ cauchy(0, 0.5); // SD of random effects (vectorized)
sigma_e ~ cauchy(0, 0.5); // prior for residual standard deviation
beta[1] ~ normal(0.3, 1); // prior for fixed-effect intercept
beta[2] ~ normal(0.01, 0.5); // prior for fixed-effect slope
//likelihood
for (i in 1:N){
mu = beta[1] + u[1,Subject[i]] + (beta[2] + u[2,Subject[i]])*Days[i];
RT[i] ~ normal(mu, sigma_e);
}
}
library(rstan)
options(mc.cores = parallel::detectCores()) # tell Stan to use multiple cores if available
sleep_model <- stan(file = "sleep_model_v2.stan", data = d_stan,
iter = 2000, chains = 4)
traceplot(sleep_model, pars = c("beta", "sigma_e", "sigma_u"),
inc_warmup = F)
print(sleep_model, pars = c("beta", "sigma_e"), probs = c(0.025,
0.975), digits = 3)
Inference for Stan model: sleep_model_v2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 97.5% n_eff Rhat
beta[1] 0.251 0 0.008 0.236 0.267 2545 1
beta[2] 0.010 0 0.002 0.007 0.014 2409 1
sigma_e 0.026 0 0.002 0.023 0.029 4000 1
Samples were drawn using NUTS(diag_e) at Sun Sep 23 14:37:56 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Posterior distribution of fixed-effects parameters.
pairs(sleep_model, pars = c("beta", "sigma_e", "sigma_u"))
Examine random effects correlation.
# Use the L matrices to compute the correlation matrices
L_u <- extract(sleep_model, pars = "L_u")$L_u
cor_u <- apply(L_u, 1, function(x) tcrossprod(x)[1, 2])
print(signif(quantile(cor_u, probs = c(0.025, 0.5, 0.975))))
2.5% 50% 97.5%
-0.4676860 0.0597705 0.5986940
generated quantities {
real y_rep[N];
matrix[2,J] u_rep; // matrix of simulated random effects
for (j in 1:J) {
u_rep[1:2,j] = multi_normal_cholesky_rng(rep_vector(0, 2),
diag_matrix(sigma_u) * (L_u));
}
for (i in 1:N){ // simulate the model
y_rep[i] = normal_rng(beta[1] + u_rep[1,Subject[i]] +
(beta[2] + u_rep[2,Subject[i]])*Days[i],
sigma_e);
}
}
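Once the model is re-fit with this generated quantities block, the simulated datasets can be used for a posterior predictive check; a minimal sketch (it assumes y_rep is available in the fitted object):
# Posterior predictive check: compare simulated and observed reaction times
y_rep <- extract(sleep_model, pars = "y_rep")$y_rep   # draws x observations matrix
hist(apply(y_rep, 1, mean), breaks = 40,
     main = "Posterior predictive mean RT (s)", xlab = "mean RT")
abline(v = mean(d_stan$RT), col = "red", lwd = 2)     # observed mean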
Comparison with lme4 random effects estimates.
# extract samples
samples_rf0 <- extract(sleep_model, pars = "u")$u[, 1, ]
samples_rf1 <- extract(sleep_model, pars = "u")$u[, 2, ]
# compute posterior mean
rf_int <- apply(samples_rf0, 2, mean)
rf_slp <- apply(samples_rf1, 2, mean)
par(mfrow = c(1, 2)) # plot
plot(rf_int, ranef(m.1)$Subject[, 1], pch = 19, xlab = "Stan",
ylab = "lme4", main = "Intercept")
plot(rf_slp, ranef(m.1)$Subject[, 2], pch = 19, xlab = "Stan",
ylab = "lme4", main = "Slope")
The package rstanarm automatically prepares the Stan code from a model specified in standard R syntax, e.g.
library(rstanarm)
sleepstudy$Reaction_s <- sleepstudy$Reaction/1000
sleep_model_2 <- stan_lmer(Reaction_s ~ Days + (Days | Subject),
data = sleepstudy, cores = 2, iter = 2000, chain = 4)
rstanarm uses weakly informative default priors (use prior_summary() to examine the defaults). An alternative package with a similar formula interface is brms (Bürkner, 2017).
summary(sleep_model_2, pars = c("(Intercept)", "Days"),
digits = 3)
Model Info:
function: stan_lmer
family: gaussian [identity]
formula: Reaction_s ~ Days + (Days | Subject)
algorithm: sampling
priors: see help('prior_summary')
sample: 4000 (posterior sample size)
observations: 180
groups: Subject (18)
Estimates:
mean sd 2.5% 25% 50% 75% 97.5%
(Intercept) 0.251 0.007 0.237 0.247 0.251 0.256 0.265
Days 0.010 0.002 0.007 0.009 0.010 0.011 0.014
Diagnostics:
mcse Rhat n_eff
(Intercept) 0.000 1.000 1790
Days 0.000 1.004 1364
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
It is important to check that sampling was successful (e.g. \(\hat R\), …). The following toy examples show how improper (flat) priors or weakly identified parameters can lead to sampling problems.
dat <- list(y = c(-1, 1))
stan_code <- "
data{
real y[2];
}
parameters{
real mu;
real<lower=0> sigma;
}
model{
y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 2000,
chains = 4)
rstan::traceplot(m_, pars = c("mu", "sigma"), inc_warmup = F)
dat <- list(y = c(-1, 1))
stan_code <- "
data{
real y[2];
}
parameters{
real mu;
real<lower=0> sigma;
}
model{
mu ~ normal(1, 10);
sigma ~ cauchy(0, 1);
y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 2000,
chains = 4)
rstan::traceplot(m_, pars = c("mu", "sigma"), inc_warmup = F)
dat <- list(y = rnorm(100, mean = 0, sd = 1))
stan_code <- "
data{
real y[100];
}
parameters{
vector[2] alpha;
real<lower=0> sigma;
}
model{
real mu;
mu = alpha[1] + alpha[2];
y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 4000,
chains = 2)
rstan::traceplot(m_, pars = c("alpha", "sigma"), inc_warmup = F)
dat <- list(y = rnorm(100, mean = 0, sd = 1))
stan_code <- "
data{
real y[100];
}
parameters{
vector[2] alpha;
real<lower=0> sigma;
}
model{
real mu;
alpha ~ normal(0, 10);
mu = alpha[1] + alpha[2];
y ~ normal(mu, sigma);
}
"
m_ <- stan(model_code = stan_code, data = dat, iter = 4000,
chains = 2)
rstan::traceplot(m_, pars = c("alpha", "sigma"), inc_warmup = F)
Often psychometric functions are fit to individual data and the group analysis is done on the estimated parameters. This approach neglects the subject-specific standard errors and imbalances in the number of repetitions/trials.
Inferences can be improved by adopting a (generalized linear) multilevel approach (GLMM) \[ p \left(y \mid b\right) = \Phi \left( \textbf{X} \beta + \textbf{Z} b \right) \\ b \sim \mathcal{N}\left( 0, \Omega \right) \]
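A minimal sketch of how such a probit GLMM could be fit with glmer; the formula and the variable names (rr, dx, cond, id) are assumptions matching the bisection dataset loaded below:
library(lme4)
# Probit GLMM: fixed effects of condition and stimulus, correlated random effects
# by subject (assumes the bisection data frame 'd' loaded further below)
m_glmm <- glmer(rr ~ cond * dx + (cond * dx | id),
                family = binomial(link = "probit"), data = d)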
This model can be fit with the glmer function of the lme4 library (as in the sketch above). However, it is not a realistic model: sometimes subjects push buttons at random and make stimulus-independent errors (lapses). We can extend the model with an extra parameter \(\lambda\) indicating the probability of lapsing: \[ p \left(y \mid b\right) = \lambda+\left( 1-2\lambda \right) \times \Phi \left( \textbf{X} \beta + \textbf{Z} b \right) \\ b \sim \mathcal{N}\left( 0, \Omega \right)\\ \]
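To see what the lapse parameter does, a small sketch plotting the standard and the lapse-augmented psychometric functions (the value of \(\lambda\) and the stimulus range are arbitrary):
# Lapses compress the psychometric function into the range [lambda, 1 - lambda]
x <- seq(-3, 3, length.out = 200)       # arbitrary stimulus range
lambda <- 0.05                          # arbitrary lapse rate
plot(x, pnorm(x), type = "l", xlab = "stimulus", ylab = "p(response)")
lines(x, lambda + (1 - 2 * lambda) * pnorm(x), lty = 2)  # with lapses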
d <- read.table("bisection2.txt", sep = "\t", header = T)
str(d)
'data.frame': 4593 obs. of 5 variables:
$ id : int 4 4 4 4 4 4 4 4 4 4 ...
$ dx : num 0.04 -0.04 -0.53 -0.53 -0.36 0.44 0.53 0.69 0.93 -1.09 ...
$ rr : int 1 1 0 1 0 1 0 1 0 0 ...
$ acc : int 1 0 1 0 1 1 0 1 0 1 ...
$ cond: int 1 2 1 2 2 1 2 2 2 2 ...
Declare all the variables in the data block.
data {
int<lower=1> N;
int<lower=0,upper=1> rr[N];
real ds[N];
int<lower=0,upper=1> cond[N];
int<lower=1> J;
int<lower=1,upper=J> id[N];
}
Define the free parameters in the parameters block
parameters {
vector[4] beta; // fixed-effects parameters
vector<lower=0,upper=1>[J] lambda; // lapse rate
vector<lower=0>[2] beta_ab; // population parameters of lapse rate
vector<lower=0>[4] sigma_u; // random effects standard deviations
cholesky_factor_corr[4] L_u; // L_u is the Choleski factor of the correlation matrix
matrix[4,J] z_u; // random effect matrix
}
In the transformed parameters block apply any transformations needed to compute the likelihood.
transformed parameters {
matrix[4,J] u;
u = diag_pre_multiply(sigma_u, L_u) * z_u; // use Cholesky to set correlation
}
Define priors and likelihood in the model block.
model {
real mu; // linear predictor
//priors
L_u ~ lkj_corr_cholesky(2); // LKJ prior for the correlation matrix
to_vector(z_u) ~ normal(0,1); // before Cholesky, random eff. are normal variates with SD=1
sigma_u ~ cauchy(0, 1); // SD of random effects (vectorized)
beta[1] ~ normal(0, 2); // prior for intercept (cond 1)
beta[2] ~ normal(0, 4); // prior for slope (cond 1)
beta[3] ~ normal(0, 1); // prior for diff. in intercept (cond 1)
beta[4] ~ normal(0, 2); // prior for diff. in slope (cond 1)
beta_ab[1] ~ gamma(4,1); // hyper priors for lapse rate distribution
beta_ab[2] ~ gamma(1,0.2);
lambda ~ beta(beta_ab[1] ,beta_ab[2]); //lapse rate
//likelihood
for (i in 1:N){
mu = beta[1] + u[1,id[i]] + cond[i]*(beta[2]+u[2,id[i]])
+ (beta[3] + u[3,id[i]] + cond[i]*(beta[4] + u[4,id[i]] ))*ds[i];
rr[i] ~ bernoulli((1-lambda[id[i]])*Phi(mu)+lambda[id[i]]/2);
}
}
Prepare the data as a list for Stan.
d_stan <- list(id = as.numeric(factor(d$id, labels = 1:length(unique(d$id)))),
ds = d$dx, rr = d$rr, N = nrow(d), J = length(unique(d$id)),
cond = d$cond - 1)
str(d_stan)
List of 6
$ id : num [1:4593] 1 1 1 1 1 1 1 1 1 1 ...
$ ds : num [1:4593] 0.04 -0.04 -0.53 -0.53 -0.36 0.44 0.53 0.69 0.93 -1.09 ...
$ rr : int [1:4593] 1 1 0 1 0 1 0 1 0 0 ...
$ N : int 4593
$ J : int 13
$ cond: num [1:4593] 0 1 0 1 1 0 1 1 1 1 ...
Run sampling
bisection_model <- stan(file = "bisection_stan_v1.stan",
data = d_stan, iter = 4000, chains = 4)
Check the convergence by visualising the chains
traceplot(bisection_model, pars = c("beta", "sigma_u", "beta_ab"),
inc_warmup = F)
Visualize posterior distribution
pairs(bisection_model, pars = c("beta", "sigma_u"))
print(bisection_model, pars = c("beta", "sigma_u", "beta_ab"),
probs = c(0.025, 0.975), digits = 3)
Inference for Stan model: bisection_stan_v1.
4 chains, each with iter=4000; warmup=2000; thin=1;
post-warmup draws per chain=2000, total post-warmup draws=8000.
mean se_mean sd 2.5% 97.5% n_eff Rhat
beta[1] -0.577 0.007 0.329 -1.231 0.068 2512 1.001
beta[2] 0.430 0.004 0.216 0.001 0.865 3396 1.001
beta[3] 1.661 0.005 0.247 1.194 2.156 2596 1.000
beta[4] -0.946 0.003 0.177 -1.312 -0.625 2926 1.000
sigma_u[1] 1.152 0.004 0.251 0.767 1.725 3708 1.000
sigma_u[2] 0.689 0.003 0.168 0.421 1.081 4004 1.000
sigma_u[3] 0.705 0.003 0.170 0.434 1.099 2604 1.001
sigma_u[4] 0.440 0.003 0.132 0.214 0.740 2404 1.001
beta_ab[1] 1.233 0.011 0.567 0.435 2.594 2458 1.000
beta_ab[2] 9.303 0.082 4.554 3.029 20.679 3110 1.000
Samples were drawn using NUTS(diag_e) at Thu Sep 27 09:44:39 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Dedicated packages (e.g. bayesplot) make it easier to summarize the results.
library(bayesplot)
posterior2 <- extract(bisection_model)
mcmc_intervals(posterior2, regex_pars = c("beta\\[[1-4]",
"sigma_u\\[[1-4]"), prob = 0.8, prob_outer = 0.95, point_est = "mean")
Dedicated packages (e.g. bayesplot) make it easier to summarize the results.
mcmc_areas(posterior2, regex_pars = c("beta\\[[1-4]", "sigma_u\\[[1-4]"),
prob = 0.8, point_est = "mean")
Fixed effects are parametrised according to the standard GLM formulation (in terms of \(\beta_0, \beta_1, \dots\)). These can easily be transformed into the psychometric function parameters \(\mu\) and \(\sigma\).
beta_par <- extract(bisection_model, pars = "beta")$beta
jnd_1 <- 1/beta_par[, 3]
jnd_2 <- 1/(beta_par[, 3] + beta_par[, 4])
jnd_diff <- jnd_2 - jnd_1  # posterior distribution of JND differences
Calculate Highest Posterior Density (HPD) intervals (the narrowest interval containing the specified probability mass, e.g. 0.95).
library(coda)  # provides as.mcmc() and HPDinterval()
jnd_ci_1 <- HPDinterval(as.mcmc(jnd_1), prob = 0.95)
jnd_ci_2 <- HPDinterval(as.mcmc(jnd_2), prob = 0.95)
jnd_ci_diff <- HPDinterval(as.mcmc(jnd_diff), prob = 0.95)
Plot
Hmisc::errbar(c("sigma 1", "sigma 2", "difference"), c(mean(jnd_1),
mean(jnd_2), mean(jnd_diff)), yplus = c(jnd_ci_1[2],
jnd_ci_2[2], jnd_ci_diff[2]), yminus = c(jnd_ci_1[1],
jnd_ci_2[1], jnd_ci_diff[1]))
d <- read.table("cueingdd.txt", header = T, sep = "\t")
d$alpha_rad <- d$alpha_deg/180 * pi
d$rerr <- d$dir * angDiff(d$alpha_rad, d$resp) # transform into signed angular differences
str(d)
'data.frame': 4608 obs. of 8 variables:
$ id : Factor w/ 8 levels "sj01","sj02",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cue : int 2 2 2 2 2 1 2 2 2 2 ...
$ alpha_deg: int 220 270 130 180 320 50 330 90 300 80 ...
$ location : int 2 4 2 2 3 4 1 4 1 3 ...
$ dir : int 1 -1 1 1 1 1 1 1 -1 1 ...
$ resp : num -1.77 -0.35 1.84 -2.12 -0.38 1.22 0.37 1.57 -1.96 1.87 ...
$ rerr : num 0.673 -1.221 -0.429 1.022 0.318 ...
$ alpha_rad: num 3.84 4.71 2.27 3.14 5.59 ...
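The helper angDiff used above is not defined here; a plausible implementation (an assumption, not the original definition) computes the signed angular difference wrapped to \((-\pi, \pi]\):
# Hypothetical helper: signed angular difference between two angles (radians)
angDiff <- function(a, b) {
  d <- b - a
  atan2(sin(d), cos(d))   # wraps the difference into (-pi, pi]
}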
The data block:
data {
int<lower=1> N;
real<lower=-pi(),upper=pi()> rerr[N];
int<lower=1,upper=2> cue[N];
int<lower=1> J;
int<lower=1,upper=J> id[N];
}
The parameters block:
parameters {
// group-level
vector[2] mu; // bias
vector[2] logk; // k (log scale)
vector[2] logilambda; // mixing parameter (probit scale)
// subject specific
vector<lower=0>[6] sigma_u; // random effects standard deviations
cholesky_factor_corr[6] L_u; // L_u is the Choleski factor
matrix[6,J] z_u; // random effect matrix
}
The transformed parameters block: here the parameters are transformed as required for the computation of the likelihood (to ensure that \(k>0\) and \(0<\lambda<1\)).
transformed parameters {
matrix<lower=0,upper=1>[2,J] lambda_j;
matrix<lower=0>[2,J] kappa_j;
matrix[2,J] mu_j;
matrix[6,J] u;
u = diag_pre_multiply(sigma_u, L_u) * z_u;
for (j in 1:J){
mu_j[1,j]= mu[1] + u[1,j];
mu_j[2,j]= mu[2] + u[2,j];
kappa_j[1,j]= exp(logk[1] + u[3,j]);
kappa_j[2,j]= exp(logk[2] + u[4,j]);
lambda_j[1,j]= inv_logit(logilambda[1] + u[5,j]);
lambda_j[2,j]= inv_logit(logilambda[2] + u[6,j]);
}
}
The model block:
model {
// priors
L_u ~ lkj_corr_cholesky(2);
to_vector(z_u) ~ normal(0,1);
sigma_u ~ normal(0, 10);
mu ~ normal(0.5, 10);
logk ~ normal(2, 20);
logilambda ~ normal(-4, 30);
// likelihood
for (n in 1:N){
if (kappa_j[cue[n],id[n]] < 100)
target += log_mix(lambda_j[cue[n],id[n]],
uniform_lpdf(rerr[n] | -pi(), pi()),
von_mises_lpdf(rerr[n] | mu_j[cue[n],id[n]], kappa_j[cue[n],id[n]]));
else
target += log_mix(lambda_j[cue[n],id[n]],
uniform_lpdf(rerr[n] | -pi(), pi()),
normal_lpdf(rerr[n] | mu_j[cue[n],id[n]], sqrt(1/kappa_j[cue[n],id[n]])));
}
}
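To visualize what this mixture implies, a sketch of the response-error density for arbitrary parameter values (the von Mises density is coded by hand here, since base R does not provide it):
# Mixture of a uniform (guesses) and a von Mises (stimulus-driven responses)
dvonmises <- function(x, mu, k) exp(k * cos(x - mu)) / (2 * pi * besselI(k, nu = 0))
x <- seq(-pi, pi, length.out = 400)
lambda <- 0.1; mu <- 0; k <- 5          # arbitrary illustrative values
dens <- lambda * dunif(x, -pi, pi) + (1 - lambda) * dvonmises(x, mu, k)
plot(x, dens, type = "l", xlab = "response error (rad)", ylab = "density")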
mu1 <- extract(mix_model, pars = "mu[1]", inc_warmup = F,
permuted = FALSE)
mu2 <- extract(mix_model, pars = "mu[2]", inc_warmup = F,
permuted = FALSE)
mu1 <- c(mu1[, 1, 1], mu1[, 2, 1])/pi * 180
mu2 <- c(mu2[, 1, 1], mu2[, 2, 1])/pi * 180
HPDinterval(as.mcmc(mu2 - mu1), prob = 0.95)
lower upper
var1 -2.806921 12.10085
attr(,"Probability")
[1] 0.95
k1 <- extract(mix_model, pars = "logk[1]", inc_warmup = F,
permuted = FALSE)
k2 <- extract(mix_model, pars = "logk[2]", inc_warmup = F,
permuted = FALSE)
# transform concentration parameter k into a standard
# deviation
k2SD <- function(k) sqrt(-2 * log(besselI(k, nu = 1)/besselI(k,
nu = 0)))
sd1 <- k2SD(exp(c(k1[, 1, 1], k1[, 2, 1])))/pi * 180
sd2 <- k2SD(exp(c(k2[, 1, 1], k2[, 2, 1])))/pi * 180
HPDinterval(as.mcmc(sd2 - sd1), prob = 0.95)
lower upper
var1 1.98266 10.74307
attr(,"Probability")
[1] 0.95