- Multilevel models from a frequentist perspective
- The Bayesian approach
- Markov Chain Monte Carlo (MCMC)
- Examples
Matrix notation:
\[ \textbf{Y} = \textbf{X}\beta + \epsilon \] \[ \left( \begin{array}{c} y_1 \\ \vdots \\ y_m \end{array} \right) = \left( \begin{array}{cccc} 1 & x_{11} & \ldots & x_{1n}\\ \vdots & \vdots & \vdots & \vdots\\ 1 & x_{m1} & \ldots & x_{mn} \end{array} \right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{array} \right) + \left( \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_m \end{array} \right) \]
Matrix multiplication: \[ \left( \begin{array}{cc} a & b \\ c& d \end{array} \right) \left( \begin{array}{c} 1 \\ 2 \end{array} \right) = 1 \left( \begin{array}{c} a \\ c \end{array} \right) + 2 \left( \begin{array}{c} b \\ d \end{array} \right) = \left( \begin{array}{c} a + 2b \\ c+ 2d \end{array} \right) \]
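The same arithmetic can be checked directly in R (the matrix entries here are arbitrary illustrative values):

```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)  # columns: (a, c) = (1, 3), (b, d) = (2, 4)
v <- c(1, 2)

A %*% v                    # matrix product
1 * A[, 1] + 2 * A[, 2]    # same result, as a weighted sum of the columns
```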
General formulation in matrix notation \[ \textbf{Y} = \textbf{X}\beta + \textbf{Z}b + \epsilon \] where \(\textbf{X}\) and \(\textbf{Z}\) are the known fixed-effects and random-effects regressor matrices.
The components of the residual error vector \(\epsilon \sim \mathcal{N}\left( 0, \sigma^2 \right)\) are assumed to be i.i.d. (independent and identically distributed).
The random-effect components \(b \sim \mathcal{N}\left( 0, \Omega \right)\) are assumed to be normally distributed with mean 0; however, they are not necessarily independent (the components \(b_j\) can be correlated, and the correlations can be estimated). Example: \[ \left[ \begin{array}{c} b_0 \\ b_1 \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \Omega = \left[ \begin{array}{cc} \operatorname{Var}\left( b_0 \right) & \operatorname{cov}\left( b_0, b_1 \right) \\ \operatorname{cov}\left( b_0, b_1 \right) & \operatorname{Var}\left( b_1 \right) \end{array} \right] \right) \]
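For intuition, correlated random effects like these can be simulated with `MASS::mvrnorm`; the entries of \(\Omega\) below are made up for illustration:

```r
library(MASS)  # for mvrnorm()

# Illustrative covariance matrix: Var(b0) = 600, Var(b1) = 35, cov(b0, b1) = 10
Omega <- matrix(c(600, 10,
                   10, 35), nrow = 2, byrow = TRUE)

# Draw one pair (b0, b1) for each of 18 hypothetical subjects
b <- mvrnorm(n = 18, mu = c(0, 0), Sigma = Omega)
cov(b)  # sample covariance; approaches Omega as n grows
```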
Parameters (\(\beta\), \(\sigma^2\) and \(\Omega\)) are estimated by maximizing the likelihood function, which is the probability of the data, given the parameters (but treated as a function of the parameters, keeping the data fixed).
The probability of the data conditional on the random effects is integrated with respect to the marginal density of the random effects, to obtain the marginal density of the data: \[ L\left(\beta,\sigma^2,\Omega \mid \text{data}\right) = \int p \left(\text{data} \mid \beta,\sigma^2, b\right) \, p\left(b \mid \Omega\right) \, db \]
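As a toy illustration of this integral (not how lme4 computes it; for linear mixed models it has a closed form), one can evaluate the marginal likelihood of a few observations from one subject under a random-intercept model by numerical integration. All values below are made up:

```r
y     <- c(250, 265, 280)  # hypothetical observations for one subject
beta  <- 255               # fixed-effect mean
sigma <- 20                # residual SD
omega <- 30                # SD of the random intercept b

# Conditional likelihood p(data | beta, sigma^2, b) times the
# random-effect density p(b | Omega), vectorized over b for integrate()
integrand <- function(b) {
  sapply(b, function(bi) {
    prod(dnorm(y, mean = beta + bi, sd = sigma)) * dnorm(bi, mean = 0, sd = omega)
  })
}

integrate(integrand, lower = -Inf, upper = Inf)$value  # marginal likelihood
```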
The \(\textsf{R}\) library lme4 provides two methods for estimating parameters: Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML).
ML estimates tend to be biased, underestimating the variance components (e.g. \(\Omega\)).
REML provides less biased variance estimates: conceptually, it can be seen as analogous to Bessel's correction for the sample variance (using \(n-1\) instead of \(n\) in the denominator).
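In lmer() the two methods are selected with the REML argument; a quick way to see the difference, using the sleepstudy model introduced below:

```r
library(lme4)

m.reml <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy, REML = TRUE)
m.ml   <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy, REML = FALSE)

# The ML variance components are typically slightly smaller than the REML ones
VarCorr(m.reml)
VarCorr(m.ml)
```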
sleepstudy
sleepstudy is a dataset in the lme4 package, containing reaction-time data from 18 subjects who were restricted to 3 hours of sleep per night for 10 days.
```
str(sleepstudy)
'data.frame': 180 obs. of 3 variables:
 $ Reaction: num 250 259 251 321 357 ...
 $ Days    : num 0 1 2 3 4 5 6 7 8 9 ...
 $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
```
Model fitting with lme4:
```r
m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy)
```
```
summary(m.1)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.9536 -0.4634  0.0231  0.4634  5.1793

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740
          Days         35.07    5.922  0.07
 Residual             654.94   25.592
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138
```
The function confint() makes it easy to compute bootstrapped 95% confidence intervals for the parameters:
```
CI_fixef <- confint(m.1, method = "boot", nsim = 500, oldNames = FALSE)
print(CI_fixef, digits = 2)
                               2.5 % 97.5 %
sd_(Intercept)|Subject         12.81  35.63
cor_Days.(Intercept)|Subject   -0.53   0.81
sd_Days|Subject                 3.57   8.32
sigma                          22.61  28.44
(Intercept)                   237.02 264.76
Days                            7.54  13.20
```
Is the random slope for Days needed? We can compare the model to one without it using a likelihood-ratio test:
```
# m.1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject),
#             sleepstudy)
m.2 <- lmer(Reaction ~ 1 + Days + (1 | Subject), sleepstudy)
anova(m.1, m.2)
refitting model(s) with ML (instead of REML)
Data: sleepstudy
Models:
m.2: Reaction ~ 1 + Days + (1 | Subject)
m.1: Reaction ~ 1 + Days + (1 + Days | Subject)
    Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
m.2  4 1802.1 1814.8 -897.04   1794.1
m.1  6 1763.9 1783.1 -875.97   1751.9 42.139      2  7.072e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Shrinkage: the predicted \({\hat b}_j\) (conditional modes of the random effects) can be seen as a "compromise" between the within-subject estimates and the population mean.
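One way to see this shrinkage, sketched with lme4's lmList(), which fits a separate OLS regression per subject:

```r
# Per-subject OLS estimates (no pooling)
m.list <- lmList(Reaction ~ Days | Subject, sleepstudy)

# Compare: within-subject slopes, shrunken subject-level slopes from the
# mixed model (fixed effect + conditional mode), and the population slope
cbind(
  within = coef(m.list)[, "Days"],
  shrunk = coef(m.1)$Subject[, "Days"],
  pooled = fixef(m.1)["Days"]
)
```

The shrunken estimates mostly lie between each subject's own slope and the population mean, with the most extreme subjects pulled in the most.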
Say you want to sample from a density \(P({\bf x})\), but you can only evaluate a function \(f({\bf x})\) that is proportional to the density \(P({\bf x})\):
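The Metropolis algorithm handles exactly this situation: start somewhere, propose a random move, and accept it with probability \(\min\left(1, f(\text{proposal})/f(\text{current})\right)\). A minimal one-dimensional sketch (function and variable names are mine; the sleepstudy example below extends the same idea to a parameter vector):

```r
# Random-walk Metropolis sampler.
# f_log: log of a function proportional to the target density.
metropolis <- function(f_log, x0, n_iter = 5000, step = 1) {
  x <- numeric(n_iter)
  x[1] <- x0
  for (i in 2:n_iter) {
    proposal  <- x[i - 1] + rnorm(1, 0, step)     # symmetric proposal
    log_ratio <- f_log(proposal) - f_log(x[i - 1])
    if (log(runif(1)) < log_ratio) {
      x[i] <- proposal      # accept the move
    } else {
      x[i] <- x[i - 1]      # reject: stay at the current value
    }
  }
  x
}

# Example: sample a standard normal from its unnormalized log-density
samples <- metropolis(function(x) -x^2 / 2, x0 = 0)
c(mean = mean(samples), sd = sd(samples))  # should be close to 0 and 1
```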
Let's use the sampling algorithm to infer parameter values for one participant of the sleepstudy dataset.
First, write a function that computes the log-likelihood of the parameters, keeping the data fixed.
```r
x <- sleepstudy$Days[sleepstudy$Subject == "308"]
y <- sleepstudy$Reaction[sleepstudy$Subject == "308"]

loglik <- function(par) {
  # par = [intercept, slope, log(residual SD)]
  pred <- par[1] + par[2] * x
  sum(dnorm(y, mean = pred, sd = exp(par[3]), log = TRUE))
}
```
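A quick sanity check that the function behaves sensibly (the parameter values are arbitrary guesses, chosen only for illustration):

```r
loglik(c(250, 10, log(30)))  # a plausible starting point
loglik(c(0, 0, log(30)))     # should be much lower for implausible values
```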
Next, choose your priors.
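For example, a log-prior function in the same style as the log-likelihood above; these weakly informative choices are made up for illustration, not prescribed by the method:

```r
logprior <- function(par) {
  dnorm(par[1], mean = 250, sd = 100, log = TRUE) +  # intercept
  dnorm(par[2], mean = 0,   sd = 50,  log = TRUE) +  # slope
  dnorm(par[3], mean = 3,   sd = 2,   log = TRUE)    # log residual SD
}

# The unnormalized log-posterior, the target for the sampler, is the sum
logposterior <- function(par) loglik(par) + logprior(par)
```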