Effect sizes
Estimating meta-analytic models in R (metafor package, Viechtbauer (2010))
Random-effects model
Meta-regression
Multilevel meta-analysis
Bias detection
A summary statistic that captures the direction and size of the "effect" of interest (e.g. the mean difference between two conditions, groups, etc.)

Unstandardized vs. standardized

Standardized effect size measures should be used when it is not possible to express results in the same units across different studies; e.g. see Baguley (2009)
For each effect size we need to know how reliable it is. This is typically quantified by the standard error of the effect size. Formally, we have \(\hat{\theta}_k = \theta_k + \epsilon_k\), where \(\theta_k\) is the true effect size of study \(k\), \(\hat{\theta}_k\) the observed effect size, and \(\epsilon_k\) the sampling error.

The sampling error is the 'uncontrolled' variability (accounting for the fact that even an exact replication of a study would likely give a different result).

If we were to repeat the experiment many times (ideally infinitely many) we would obtain the sampling distribution of the observed effect \(\hat{\theta}_k\).

Given a single study, we can never know \(\epsilon_k\); however, we can estimate its variability. The standard error (SE) of the effect size is the standard deviation of the sampling distribution, and therefore it is an estimate of how reliable a study is.
Example: the standard error of the mean is \(SE = \frac{s}{\sqrt{n}}\), where \(s\) is the sample standard deviation and \(n\) the sample size. As \(n\) increases, the standard error decreases in proportion to \(1/\sqrt{n}\).

e.g. study A has twice the sample size of study B \(\implies\) the sampling error of study A is expected to be \(\frac{1}{\sqrt{2}} \approx 70\%\) that of study B.
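This \(1/\sqrt{n}\) scaling is easy to verify by simulation. A minimal sketch (my own illustration, not part of the original material; the sample sizes 50 and 100 are arbitrary):

```r
# Simulate the sampling distribution of the mean for two hypothetical studies,
# where study A has twice the sample size of study B.
set.seed(1)
sim_se <- function(n, n_sims = 10000) {
  # empirical SD of the sample mean across simulated replications
  sd(replicate(n_sims, mean(rnorm(n, mean = 0, sd = 1))))
}
se_B <- sim_se(50)   # study B: n = 50
se_A <- sim_se(100)  # study A: n = 100, twice that of B
se_A / se_B          # close to 1/sqrt(2), i.e. about 0.71
```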
Standard error for Cohen's d:
\[SE(d) = \sqrt{\frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}}\]
For the meta-analytic approach taken in metafor, we need to ensure that effect size measures are not restricted in their range (e.g. correlations, which are bounded between −1 and 1, are usually transformed to Fisher's z).
The dataset power_pose contains 6 pre-registered replications testing whether participants reported higher "feeling-of-power" ratings after adopting expansive as opposed to constrictive body postures.
power_pose <- read.csv("https://mlisi.xyz/files/workshops/meta_analyses/data/power_pose.csv")
str(power_pose)

## 'data.frame': 6 obs. of 7 variables:
##  $ study          : chr "Bailey et al." "Ronay et al." "Klaschinski et al." "Bombari et al." ...
##  $ n_high_power   : int 46 53 101 99 100 135
##  $ n_low_power    : int 48 55 99 101 100 134
##  $ mean_high_power: num 2.62 2.12 3 2.24 2.58 ...
##  $ mean_low_power : num 2.39 1.95 2.73 1.99 2.44 ...
##  $ sd_high_power  : num 0.932 0.765 0.823 0.927 0.789 ...
##  $ sd_low_power   : num 0.935 0.792 0.9 0.854 0.977 ...
Calculate the standardized effect size (Cohen's d) and its standard error for each study, and add them as additional columns in the dataset.

Additional: compute also the t statistic and the one-sided p-value for each study.
Formulae:

\[\text{Cohen's } d = \frac{\bar{M}_1 - \bar{M}_2}{\sigma_{pooled}}\]

\[SE(d) = \sqrt{\frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}}\]

\[\sigma_{pooled} = \sqrt{\frac{(n_1-1)\sigma_1^2 + (n_2-1)\sigma_2^2}{n_1+n_2-2}}\]

\[t = \frac{\bar{M}_1 - \bar{M}_2}{\sigma_{pooled}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]
# function for computing the pooled SD
pooled_SD <- function(sdA, sdB, nA, nB){
  # denominator is nA + nB - 2, as in the formula above
  sqrt((sdA^2*(nA-1) + sdB^2*(nB-1))/(nA+nB-2))
}
# function for the standard error of Cohen's d
cohen_d_se <- function(n1, n2, d){
  sqrt((n1+n2)/(n1*n2) + (d^2)/(2*(n1+n2)))
}
# apply functions & create new columns in the dataset (using dplyr::mutate)
power_pose <- power_pose %>%
  mutate(pooled_sd = pooled_SD(sd_high_power, sd_low_power, n_high_power, n_low_power),
         mean_diff = mean_high_power - mean_low_power,
         cohen_d = mean_diff/pooled_sd,
         SE = cohen_d_se(n_high_power, n_low_power, cohen_d))
# function for computing the pooled SD
pooled_SD <- function(sdA, sdB, nA, nB){
  # denominator is nA + nB - 2, as in the formula above
  sqrt((sdA^2*(nA-1) + sdB^2*(nB-1))/(nA+nB-2))
}
# function for the standard error of Cohen's d
cohen_d_se <- function(n1, n2, d){
  sqrt((n1+n2)/(n1*n2) + (d^2)/(2*(n1+n2)))
}
# apply functions & create new columns in the dataset (base R)
power_pose$pooled_sd <- with(power_pose, pooled_SD(sd_high_power, sd_low_power, n_high_power, n_low_power))
power_pose$mean_diff <- with(power_pose, mean_high_power - mean_low_power)
power_pose$cohen_d <- with(power_pose, mean_diff/pooled_sd)
power_pose$SE <- with(power_pose, cohen_d_se(n_high_power, n_low_power, cohen_d))
par(mfrow=c(1,2))
n <- with(power_pose, n_high_power + n_low_power)
plot(n, power_pose$SE, ylab="SE(Cohen's d)", cex=1.5)
plot(power_pose$mean_diff, power_pose$cohen_d,
     xlab = "mean difference (5-point Likert scale)",
     ylab = "Cohen's d", cex=1.5)
We have defined \(\theta_k\) as the true effect size of study \(k\), and \(\hat{\theta}_k = \theta_k + \epsilon_k\) as the observed effect size.

In the random-effects model, we further assume that \(\theta_k\) is a random sample from a population of possible studies that measure the same effect but differ in their methods or sample characteristics. These differences introduce heterogeneity among the true effects.

The random-effects model accounts for this heterogeneity by treating it as an additional level of randomness; that is, we assume that \(\theta_k = \mu + \zeta_k\), where \(\mu\) is the average of the true effects and \(\zeta_k \sim \mathcal{N}(0, \tau^2)\) are random, study-specific fluctuations in the true effect. The variance \(\tau^2\) quantifies the amount of heterogeneity among true effects.
Putting it all together, we have \(\hat{\theta}_k = \mu + \zeta_k + \epsilon_k\), with \(\zeta_k \sim \mathcal{N}(0, \tau^2)\) and \(\epsilon_k \sim \mathcal{N}(0, \sigma_k^2)\).

Or equivalently, in more compact notation: \(\hat{\theta}_k \sim \mathcal{N}(\theta_k, \sigma_k^2)\), with \(\theta_k \sim \mathcal{N}(\mu, \tau^2)\).
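To make the generative model concrete, here is a simulation sketch (my own illustration; it assumes the metafor package is installed, and the values of \(\mu\), \(\tau\) and the \(\sigma_k\) are arbitrary). Data are drawn from the two-level model above and the parameters are then recovered with rma:

```r
library(metafor)

set.seed(123)
K       <- 200                           # number of simulated studies
mu      <- 0.3                           # true average effect
tau     <- 0.2                           # heterogeneity SD (tau^2 = 0.04)
sigma_k <- runif(K, 0.05, 0.3)           # per-study sampling SDs

theta_k   <- rnorm(K, mu, tau)           # true effects:     theta_k ~ N(mu, tau^2)
theta_hat <- rnorm(K, theta_k, sigma_k)  # observed effects: theta_hat ~ N(theta_k, sigma_k^2)

fit <- rma(yi = theta_hat, vi = sigma_k^2)
c(mu_hat = as.numeric(fit$b), tau2_hat = fit$tau2)  # estimates should be near 0.3 and 0.04
```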
The metafor package

The function rma allows us to fit the random-effects model very easily.
For the power_pose example:
str(power_pose)
## 'data.frame': 6 obs. of 11 variables:
##  $ study          : chr "Bailey et al." "Ronay et al." "Klaschinski et al." "Bombari et al." ...
##  $ n_high_power   : int 46 53 101 99 100 135
##  $ n_low_power    : int 48 55 99 101 100 134
##  $ mean_high_power: num 2.62 2.12 3 2.24 2.58 ...
##  $ mean_low_power : num 2.39 1.95 2.73 1.99 2.44 ...
##  $ sd_high_power  : num 0.932 0.765 0.823 0.927 0.789 ...
##  $ sd_low_power   : num 0.935 0.792 0.9 0.854 0.977 ...
##  $ pooled_sd      : num 0.929 0.775 0.86 0.889 0.886 ...
##  $ mean_diff      : num 0.234 0.177 0.275 0.252 0.13 ...
##  $ cohen_d        : num 0.252 0.229 0.319 0.284 0.147 ...
##  $ SE             : num 0.207 0.193 0.142 0.142 0.142 ...
we can estimate the model with this command:
library(metafor)
res <- rma(cohen_d, SE^2, data = power_pose)
Model output:
summary(res)
## 
## Random-Effects Model (k = 6; tau^2 estimator: REML)
## 
##   logLik  deviance       AIC       BIC      AICc 
##   4.0374   -8.0748   -4.0748   -4.8559    1.9252 
## 
## tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0139)
## tau (square root of estimated tau^2 value):      0
## I^2 (total heterogeneity / total variability):   0.00%
## H^2 (total variability / sampling variability):  1.00
## 
## Test for Heterogeneity:
## Q(df = 5) = 1.2981, p-val = 0.9351
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub 
##   0.2230  0.0613  3.6358  0.0003  0.1028  0.3432  *** 
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the random-effects model, \(\tau^2\) directly quantifies the heterogeneity.

There are other measures of heterogeneity, which can also be applied to non-random-effects models:
Weighted sum of the squared differences between the effect size estimate of each study, \(\hat{\theta}_k\), and the meta-analytic effect size estimate, \(\hat{\mu}\):

\[Q = \sum_{k=1}^{K} w_k (\hat{\theta}_k - \hat{\mu})^2, \quad \text{where } w_k = \frac{1}{\sigma_k^2}\]

Under the null hypothesis that all effect size differences are caused only by sampling error (no heterogeneity), \(Q\) is approximately distributed as \(\chi^2\) with \(K-1\) degrees of freedom.
Percentage of variability in the observed effect sizes that is not caused by sampling error:

\[I^2 = \frac{Q - (K-1)}{Q}\]

where \(K\) is the total number of studies.
\[H^2 = \frac{Q}{K-1}\]

It can be interpreted as the ratio of the observed variation to the variation expected due to sampling error.

\(H^2 > 1\) indicates heterogeneity.
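These statistics are straightforward to compute by hand. A sketch with made-up illustrative numbers (loosely mimicking the power_pose effect sizes; none of these values come from a real dataset):

```r
# Hand computation of Q, I^2 and H^2 from observed effects and sampling variances
theta_hat <- c(0.25, 0.23, 0.32, 0.28, 0.15, 0.16)    # observed effect sizes
v         <- c(0.21, 0.19, 0.14, 0.14, 0.14, 0.12)^2  # sampling variances sigma_k^2
K <- length(theta_hat)

w      <- 1 / v                            # inverse-variance weights
mu_hat <- sum(w * theta_hat) / sum(w)      # pooled (fixed-effect) estimate
Q      <- sum(w * (theta_hat - mu_hat)^2)
I2     <- 100 * max(0, (Q - (K - 1)) / Q)  # as a percentage, truncated at 0
H2     <- Q / (K - 1)
p_het  <- pchisq(Q, df = K - 1, lower.tail = FALSE)  # heterogeneity test p-value
```

With these numbers \(Q < K-1\), so \(I^2\) truncates to 0% and \(H^2 < 1\): the observed variability is no larger than expected from sampling error alone.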
Since \(\tau^2 \approx 0\), we could also estimate a fixed-effect model (option method="FE"):
summary(rma(cohen_d, SE^2, data = power_pose, method="FE"))
## 
## Fixed-Effects Model (k = 6)
## 
##   logLik  deviance       AIC       BIC      AICc 
##   5.0141    1.2981   -8.0283   -8.2365   -7.0283 
## 
## I^2 (total heterogeneity / total variability):  0.00%
## H^2 (total variability / sampling variability): 0.26
## 
## Test for Heterogeneity:
## Q(df = 5) = 1.2981, p-val = 0.9351
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub 
##   0.2230  0.0613  3.6358  0.0003  0.1028  0.3432  *** 
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The metafor package also has a handy function to make forest plots:
forest(res)
With only a few additions, we can make a publication-quality figure:
forest(res, slab=study, xlim=c(-5,2), xlab="Cohen's d",
       ilab=round(cbind(with(power_pose, n_high_power + n_low_power),
                        mean_low_power, sd_low_power,
                        mean_high_power, sd_high_power), digits=2),
       ilab.xpos=c(-3.5, -2.5, -2, -1.25, -0.75),
       cex=.75, header="Authors", mlab="")
par(cex=.75, font=2)
text(c(-2.5, -2, -1.25, -0.75), res$k+2, c("mean", "Std.", "mean", "Std."))
text(c(-2.25, -1), res$k+3, c("Low power", "High power"))
text(-3.5, res$k+2, "N")
# this adds the results of the heterogeneity test
text(-5, -1, pos=4, cex=0.9,
     bquote(paste("RE Model (Q = ", .(formatC(res$QE, digits=2, format="f")),
                  ", df = ", .(res$k - res$p),
                  ", p = ", .(formatC(res$QEp, digits=2, format="f")), "; ",
                  I^2, " = ", .(formatC(res$I2, digits=1, format="f")), "%)")))
towels contains results of a set of studies that investigated whether people reuse towels in hotels more often if they are provided with a descriptive norm (a message stating that the majority of guests actually reused their towels), as opposed to a control condition in which they were simply encouraged to reuse the towels (with a message about the protection of the environment).

towels <- read.csv("https://mlisi.xyz/files/workshops/meta_analyses/data/towels.csv")
str(towels)
## 'data.frame': 7 obs. of 3 variables:
##  $ study: chr "Goldstein, Cialdini, & Griskevicius (2008), Exp. 1" "Goldstein, Cialdini, & Griskevicius (2008), Exp. 2" "Schultz, Khazian, & Zaleski (2008), Exp. 2" "Schultz, Khazian, & Zaleski (2008), Exp. 3" ...
##  $ logOR: num 0.381 0.305 0.206 0.251 0.288 ...
##  $ SE   : num 0.198 0.136 0.192 0.17 0.824 ...
ThirdWave contains studies examining the effect of so-called "third wave" psychotherapies on perceived stress in college students. The effect size is the standardized mean difference between a treatment and control group at post-test.

load(url('https://mlisi.xyz/files/workshops/meta_analyses/data/thirdwave.rda'))
str(ThirdWave)
## 'data.frame': 18 obs. of 8 variables:
##  $ Author              : chr "Call et al." "Cavanagh et al." "DanitzOrsillo" "de Vibe et al." ...
##  $ TE                  : num 0.709 0.355 1.791 0.182 0.422 ...
##  $ seTE                : num 0.261 0.196 0.346 0.118 0.145 ...
##  $ RiskOfBias          : chr "high" "low" "high" "low" ...
##  $ TypeControlGroup    : chr "WLC" "WLC" "WLC" "no intervention" ...
##  $ InterventionDuration: chr "short" "short" "short" "short" ...
##  $ InterventionType    : chr "mindfulness" "mindfulness" "ACT" "mindfulness" ...
##  $ ModeOfDelivery      : chr "group" "online" "group" "group" ...
Example taken from the book by Harrer, Cuijpers, Furukawa, and Ebert (2021).
Study-level variables ('moderators') can also be included as predictors in meta-analytic models, leading to so-called 'meta-regression' analyses.
Example, with 1 study-level moderator variable Xk
ˆθk=β0+βXk+ζk+ϵk
where the intercept β0 corresponds to the average true effect/outcome μ when the value of the moderator variable Xk is equal to zero.
 | Author | TE | seTE | RiskOfBias | TypeControlGroup | InterventionDuration | InterventionType | ModeOfDelivery
---|---|---|---|---|---|---|---|---
1 | Call et al. | 0.71 | 0.26 | high | WLC | short | mindfulness | group |
2 | Cavanagh et al. | 0.35 | 0.20 | low | WLC | short | mindfulness | online |
3 | DanitzOrsillo | 1.79 | 0.35 | high | WLC | short | ACT | group |
4 | de Vibe et al. | 0.18 | 0.12 | low | no intervention | short | mindfulness | group |
5 | Frazier et al. | 0.42 | 0.14 | low | information only | short | PCI | online |
6 | Frogeli et al. | 0.63 | 0.20 | low | no intervention | short | ACT | group |
7 | Gallego et al. | 0.72 | 0.22 | high | no intervention | long | mindfulness | group |
8 | Hazlett-Stevens & Oren | 0.53 | 0.21 | low | no intervention | long | mindfulness | book |
9 | Hintz et al. | 0.28 | 0.17 | low | information only | short | PCI | online |
10 | Kang et al. | 1.28 | 0.34 | low | no intervention | long | mindfulness | group |
11 | Kuhlmann et al. | 0.10 | 0.19 | high | no intervention | short | mindfulness | group |
12 | Lever Taylor et al. | 0.39 | 0.23 | low | WLC | long | mindfulness | book |
Does the duration of intervention influence the effect size?
# transform duration into a dummy variable
ThirdWave$duration_dummy <- ifelse(ThirdWave$InterventionDuration=="long", 1, 0)
res2 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, mods = duration_dummy)
print(res2)
## 
## Mixed-Effects Model (k = 18; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0630 (SE = 0.0401)
## tau (square root of estimated tau^2 value):             0.2510
## I^2 (residual heterogeneity / unaccounted variability): 58.30%
## H^2 (unaccounted variability / sampling variability):   2.40
## R^2 (amount of heterogeneity accounted for):            23.13%
## 
## Test for Residual Heterogeneity:
## QE(df = 16) = 37.5389, p-val = 0.0018
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 2.6694, p-val = 0.1023
## 
## Model Results:
## 
##          estimate      se    zval    pval    ci.lb   ci.ub 
## intrcpt    0.4701  0.1001  4.6970  <.0001   0.2739  0.6663  *** 
## mods       0.2739  0.1676  1.6338  0.1023  -0.0547  0.6025 
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Does adding the moderators significantly improve the ability of the model to explain the observed effect sizes?
# refit models using maximum likelihood estimation (ML)
res0 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, method="ML")
res2 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, mods = duration_dummy, method="ML")
# run likelihood ratio test of nested models
anova(res0, res2)
## 
##         df     AIC     BIC    AICc  logLik    LRT   pval      QE  tau^2      R^2 
## Full     3 20.6903 23.3614 22.4046 -7.3452               37.5389 0.0423          
## Reduced  2 21.5338 23.3145 22.3338 -8.7669 2.8435 0.0917 45.5026 0.0715 40.7999%
Introduce a hierarchically higher level of randomness, in which observed effect sizes are grouped into "clusters" that share some similarities (for example, many effect sizes nested within one study, or many studies nested within subgroups defined by the nationality of the participants).

Formally: \(\hat{\theta}_{ij} = \mu + \zeta^{(2)}_{ij} + \zeta^{(3)}_{j} + \epsilon_{ij}\), where \(\zeta^{(2)}_{ij}\) are the (level-2) fluctuations of effect \(i\) within cluster \(j\), and \(\zeta^{(3)}_{j}\) are the (level-3) fluctuations between clusters.
load(url('https://mlisi.xyz/files/workshops/meta_analyses/data/Chernobyl.rda'))
str(Chernobyl)
## 'data.frame': 33 obs. of 8 variables:
##  $ author   : chr "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" ...
##  $ cor      : num 0.206 0.269 0.205 0.267 0.932 ...
##  $ n        : num 91 91 92 92 559 559 559 559 559 559 ...
##  $ z        : num 0.209 0.275 0.208 0.274 1.671 ...
##  $ se.z     : num 0.1066 0.1066 0.106 0.106 0.0424 ...
##  $ var.z    : num 0.0114 0.0114 0.0112 0.0112 0.0018 ...
##  $ radiation: chr "low" "low" "low" "low" ...
##  $ es.id    : chr "id_1" "id_2" "id_3" "id_4" ...
 | author | cor | n | z | se.z | var.z | radiation | es.id
---|---|---|---|---|---|---|---|---
1 | Aghajanyan & Suskov (2009) | 0.21 | 91 | 0.21 | 0.11 | 0.01 | low | id_1 |
2 | Aghajanyan & Suskov (2009) | 0.27 | 91 | 0.28 | 0.11 | 0.01 | low | id_2 |
3 | Aghajanyan & Suskov (2009) | 0.20 | 92 | 0.21 | 0.11 | 0.01 | low | id_3 |
4 | Aghajanyan & Suskov (2009) | 0.27 | 92 | 0.27 | 0.11 | 0.01 | low | id_4 |
5 | Alexanin et al. (2010) | 0.93 | 559 | 1.67 | 0.04 | 0.00 | low | id_5 |
6 | Alexanin et al. (2010) | 0.44 | 559 | 0.48 | 0.04 | 0.00 | low | id_6 |
7 | Alexanin et al. (2010) | 0.90 | 559 | 1.47 | 0.04 | 0.00 | low | id_7 |
8 | Alexanin et al. (2010) | 0.95 | 559 | 1.88 | 0.04 | 0.00 | low | id_8 |
Studies reported correlations between mutation rates in humans and the radioactivity due to the nuclear fallout caused by the 1986 Chernobyl reactor disaster.
Most studies reported multiple effect sizes (e.g. because they used several methods to measure mutations or considered several subgroups, such as exposed parents versus their offspring).
In R, we can fit this model using the rma.mv function, setting the random argument to indicate the clustering structure:
model_cherobyl <- rma.mv(yi = z, V = var.z,
                         slab = author,
                         data = Chernobyl,
                         random = ~ 1 | author/es.id,
                         # equivalent formulations below:
                         # random = list(~ 1 | author, ~ 1 | interaction(author, es.id)),
                         # random = list(~ 1 | author, ~ 1 | es.id),
                         method = "REML")
summary(model_cherobyl)
## 
## Multivariate Meta-Analysis Model (k = 33; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc 
## -21.1229   42.2458   48.2458   52.6430   49.1029 
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed        factor 
## sigma^2.1  0.1788  0.4229     14     no        author 
## sigma^2.2  0.1194  0.3455     33     no  author/es.id 
## 
## Test for Heterogeneity:
## Q(df = 32) = 4195.8268, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub 
##   0.5231  0.1341  3.9008  <.0001  0.2603  0.7860  *** 
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
forest(model_cherobyl)
library(metafor)
dat <- dat.konstantopoulos2011
head(dat)
## 
##   district school study year     yi    vi 
## 1       11      1     1 1976 -0.180 0.118 
## 2       11      2     2 1976 -0.220 0.118 
## 3       11      3     3 1976  0.230 0.144 
## 4       11      4     4 1976 -0.300 0.144 
## 5       12      1     5 1989  0.130 0.014 
## 6       12      2     6 1989 -0.260 0.014
Results from 56 studies on the effects of modified school calendars (comparing more frequent but shorter intermittent breaks to a longer summer break) on student achievement. The effect yi is the standardized difference between the groups in each study, and vi is its sampling variance.
(More info about this dataset at this link)
 | district | school | study | year | yi | vi
---|---|---|---|---|---|---
1 | 11 | 1 | 1 | 1976 | -0.18 | 0.12 |
2 | 11 | 2 | 2 | 1976 | -0.22 | 0.12 |
3 | 11 | 3 | 3 | 1976 | 0.23 | 0.14 |
4 | 11 | 4 | 4 | 1976 | -0.30 | 0.14 |
5 | 12 | 1 | 5 | 1989 | 0.13 | 0.01 |
6 | 12 | 2 | 6 | 1989 | -0.26 | 0.01 |
7 | 12 | 3 | 7 | 1989 | 0.19 | 0.01 |
8 | 12 | 4 | 8 | 1989 | 0.32 | 0.02 |
9 | 18 | 1 | 9 | 1994 | 0.45 | 0.02 |
10 | 18 | 2 | 10 | 1994 | 0.38 | 0.04 |
11 | 18 | 3 | 11 | 1994 | 0.29 | 0.01 |
12 | 27 | 1 | 12 | 1976 | 0.16 | 0.02 |
13 | 27 | 2 | 13 | 1976 | 0.65 | 0.00 |
14 | 27 | 3 | 14 | 1976 | 0.36 | 0.00 |
15 | 27 | 4 | 15 | 1976 | 0.60 | 0.01 |
16 | 56 | 1 | 16 | 1997 | 0.08 | 0.02 |
Funnel plot
Funnel plot from Carter and McCullough (2014) visualizing bias in 198 published tests of the ego-depletion effect.
Image taken from Lakens (2023), link
Funnel plot for the ThirdWave dataset:

res <- rma(yi=TE, vi=seTE^2, data = ThirdWave)
funnel(res, level=c(90, 95, 99),
       shade=c("white", "gray55", "gray75"),
       refline=0, legend=TRUE,
       xlab="Effect size (standardized mean difference)",
       refline2=res$b)
Egger’s regression test for funnel plot asymmetry
regtest(res, model="lm")
## 
## Regression Test for Funnel Plot Asymmetry
## 
## Model:     weighted regression with multiplicative dispersion
## Predictor: standard error
## 
## Test for Funnel Plot Asymmetry: t = 4.6770, df = 16, p = 0.0003
## Limit Estimate (as sei -> 0):   b = -0.3407 (CI: -0.7302, 0.0487)
with(ThirdWave, summary(lm(I(TE/seTE) ~ I(1/seTE))))
## 
## Call:
## lm(formula = I(TE/seTE) ~ I(1/seTE))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82920 -0.55227 -0.03299  0.66857  2.05815 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.1111     0.8790   4.677 0.000252 ***
## I(1/seTE)    -0.3407     0.1837  -1.855 0.082140 .  
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.096 on 16 degrees of freedom
## Multiple R-squared: 0.177, Adjusted R-squared: 0.1255 
## F-statistic: 3.44 on 1 and 16 DF, p-value: 0.08214
Egger's regression test, tidyverse style:

ThirdWave %>%
  mutate(y = TE/seTE, x = 1/seTE) %>%
  lm(y ~ x, data = .) %>%
  summary()
## 
## Call:
## lm(formula = y ~ x, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82920 -0.55227 -0.03299  0.66857  2.05815 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.1111     0.8790   4.677 0.000252 ***
## x            -0.3407     0.1837  -1.855 0.082140 .  
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.096 on 16 degrees of freedom
## Multiple R-squared: 0.177, Adjusted R-squared: 0.1255 
## F-statistic: 3.44 on 1 and 16 DF, p-value: 0.08214
Baguley, T. (2009). "Standardized or simple effect size: What should be reported?" In: British Journal of Psychology 100.3, pp. 603-617. DOI: https://doi.org/10.1348/000712608X377117.
Harrer, M., P. Cuijpers, T. A. Furukawa, and D. D. Ebert (2021). Doing Meta-Analysis With R: A Hands-On Guide. 1st ed. Boca Raton, FL and London: Chapman & Hall/CRC Press. ISBN: 9780367610074.
Lakens, D. (2023). Improving Your Statistical Inferences. https://doi.org/10.5281/zenodo.6409077.
Viechtbauer, W. (2010). "Conducting meta-analyses in R with the metafor package". In: Journal of Statistical Software 36.3, pp. 1-48. https://doi.org/10.18637/jss.v036.i03.
mark
Predictors | Estimates | std. Error | CI | p
---|---|---|---|---
(Intercept) | 0.6388 | 0.0235 | 0.5927 – 0.6850 | <0.001 |
gender [male] | -0.0317 | 0.0210 | -0.0730 – 0.0096 | 0.132 |
Random Effects | ||||
σ2 | 0.0052 | |||
τ00 student | 0.0095 | |||
τ00 module | 0.0019 | |||
ICC | 0.6851 | |||
N student | 163 | |||
N module | 4 | |||
Observations | 650 | |||
Marginal R2 / Conditional R2 | 0.009 / 0.688 |
Example: model on 2nd year marks of psychology students
## Linear mixed model fit by REML ['lmerMod']
## Formula: mark ~ gender + (1 | student) + (1 | module)
##    Data: .
## 
## REML criterion at convergence: -1201.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.59763 -0.62613  0.03198  0.61472  2.32079 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  student  (Intercept) 0.009515 0.09754 
##  module   (Intercept) 0.001882 0.04338 
##  Residual             0.005238 0.07237 
## Number of obs: 650, groups: student, 163; module, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.63882    0.02349  27.190
## gendermale  -0.03174    0.02103  -1.509
## 
## Correlation of Fixed Effects:
##            (Intr)
## gendermale -0.165
The variance of the expected mark for females is the standard error of the intercept, squared: \(\sigma^2_{female} = 0.0235^2\).

The variance of the expected mark for males is the sum of the squared standard error of the intercept and the squared standard error of the gender effect, plus twice their covariance: \(\sigma^2_{male} = \sigma^2_{female} + 0.0210^2 + \underbrace{2r\,\sigma_{female} \times 0.0210}_{\text{covariance}}\), where \(r\) is the correlation of the fixed effects.
(See the variance of a sum of random variables: \(\sigma^2_{A+B} = \sigma^2_A + \sigma^2_B + \underbrace{2r\sigma_A\sigma_B}_{\mathrm{cov}(A,B)}\).)

To calculate the pooled variance, we take into account that there are 530 females and 120 males in the dataset:
\[\sigma^2_{pooled} = \frac{(530-1)\,\sigma^2_{female} + (120-1)\,\sigma^2_{male}}{530+120-2}\]

The effect size as estimated by the LMM is then
\[\text{Cohen's } d = \frac{\beta_{gender}}{\sqrt{\sigma^2 + \sigma^2_{pooled} + \tau_{student} + \tau_{module}}} \approx -0.15\]
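The arithmetic above can be sketched in a few lines of R, plugging in the estimates printed in the model summary (the variable names are mine):

```r
# Fixed-effects estimates from the lmer output above
se_f <- 0.0235    # SE of the intercept (expected mark, females)
se_g <- 0.0210    # SE of the gender effect
r    <- -0.165    # correlation of the fixed effects
n_f  <- 530       # number of females
n_m  <- 120       # number of males

var_f <- se_f^2                                 # variance of expected mark, females
var_m <- se_f^2 + se_g^2 + 2 * r * se_f * se_g  # variance of expected mark, males
var_pooled <- ((n_f - 1) * var_f + (n_m - 1) * var_m) / (n_f + n_m - 2)
```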
Effect sizes
Estimating meta-analytic models in R (metafor
package, Viechtbauer (2010))
Random-effects model
Meta-regression
Multilevel meta analysis
Bias detection
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
o | Tile View: Overview of Slides |
Esc | Back to slideshow |
R
Effect sizes
Estimating meta-analytic models in R (metafor
package, Viechtbauer (2010))
Random-effects model
Meta-regression
Multilevel meta analysis
Bias detection
A summary statistics that capture the direction and size of the "effect" of interest (e.g. the mean difference between two conditions, groups, etc.)
Unstandardized vs. standardized
A summary statistics that capture the direction and size of the "effect" of interest (e.g. the mean difference between two conditions, groups, etc.)
Unstandardized vs. standardized
Standardize effect size measures should be used when it is not possible to quantify express results in the same units across different studies; e.g. see Baguley (2009)
A summary statistics that capture the direction and size of the "effect" of interest (e.g. the mean difference between two conditions, groups, etc.)
Unstandardized vs. standardized
Standardize effect size measures should be used when it is not possible to quantify express results in the same units across different studies; e.g. see Baguley (2009)
For each effect size we need to know how reliable it is. This is typically quantified by the standard error of the effect size. Formally we have:

$$\hat{\theta}_k = \theta_k + \epsilon_k$$

where $\theta_k$ is the true effect size of study $k$ and $\epsilon_k$ is the sampling error.

The sampling error is the 'uncontrolled' variability (accounting for the fact that even an exact replication of a study would likely give a different result).

If we were to repeat the experiment many times (ideally infinitely many) we would obtain the sampling distribution of the observed effect $\hat{\theta}_k$.
Given a single study, we can never know $\epsilon_k$; however, we can estimate its variability. The standard error (SE) of the effect size is the standard deviation of the sampling distribution, and therefore it is an estimate of how reliable a study is.
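To make this concrete, here is a small simulation sketch (with made-up values, not part of the original exercise): the standard deviation of the sample means across many simulated replications closely matches the theoretical standard error $s/\sqrt{n}$.

```r
# simulate the sampling distribution of the mean:
# draw many samples of size n and record each sample mean
set.seed(1)
n <- 50
many_means <- replicate(10000, mean(rnorm(n, mean = 0, sd = 1)))

# the SD of the simulated sampling distribution approximates
# the theoretical standard error, 1/sqrt(n)
sd(many_means)
1/sqrt(n)
```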
Example: the standard error of the mean is $SE = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ the sample size. As $n$ increases, the standard error decreases in proportion to $\sqrt{n}$; e.g. if study A has twice the sample size of study B, the sampling error of study A is expected to be $\frac{1}{\sqrt{2}} \approx 70\%$ that of study B.
The standard error of Cohen's d is:

$$SE(d) = \sqrt{\frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}}$$
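For instance (hypothetical group sizes), the formula can be evaluated directly; note how larger samples yield a smaller standard error:

```r
# standard error of Cohen's d, from the two group sizes and d itself
cohen_d_se <- function(n1, n2, d){
  sqrt((n1 + n2)/(n1*n2) + d^2/(2*(n1 + n2)))
}

cohen_d_se(50, 50, 0.3)    # two groups of 50: about 0.20
cohen_d_se(200, 200, 0.3)  # two groups of 200: about 0.10
```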
For the meta-analytic approach taken in metafor, we need to ensure that effect size measures are not restricted in their range.
The dataset power_pose contains 6 pre-registered replications testing whether participants reported higher "feeling-of-power" ratings after adopting expansive as opposed to constrictive body postures.

power_pose <- read.csv("https://mlisi.xyz/files/workshops/meta_analyses/data/power_pose.csv")
str(power_pose)

## 'data.frame': 6 obs. of 7 variables:
##  $ study          : chr "Bailey et al." "Ronay et al." "Klaschinski et al." "Bombari et al." ...
##  $ n_high_power   : int 46 53 101 99 100 135
##  $ n_low_power    : int 48 55 99 101 100 134
##  $ mean_high_power: num 2.62 2.12 3 2.24 2.58 ...
##  $ mean_low_power : num 2.39 1.95 2.73 1.99 2.44 ...
##  $ sd_high_power  : num 0.932 0.765 0.823 0.927 0.789 ...
##  $ sd_low_power   : num 0.935 0.792 0.9 0.854 0.977 ...
Calculate standardized effect size (Cohen's d) and its standard error for each study, and add it as an additional column in the dataset.
Additional: Compute also the t statistic and the one-sided p-value for each study
Formulae:

$$d = \frac{\bar{M}_1-\bar{M}_2}{\sigma_{pooled}}$$

$$SE(d) = \sqrt{\frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}}$$

$$\sigma_{pooled} = \sqrt{\frac{(n_1-1)\sigma_1^2 + (n_2-1)\sigma_2^2}{n_1+n_2-2}}$$

$$t = \frac{\bar{M}_1-\bar{M}_2}{\sigma_{pooled}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$
# function for computing the pooled SD
pooled_SD <- function(sdA, sdB, nA, nB){
  sqrt((sdA^2*(nA-1) + sdB^2*(nB-1))/(nA+nB-2))  # denominator is n1 + n2 - 2
}
# function for the standard error of Cohen's d
cohen_d_se <- function(n1, n2, d){
  sqrt((n1+n2)/(n1*n2) + (d^2)/(2*(n1+n2)))
}
# apply functions & create new columns in the dataset (using dplyr::mutate)
power_pose <- power_pose %>%
  mutate(pooled_sd = pooled_SD(sd_high_power, sd_low_power, n_high_power, n_low_power),
         mean_diff = mean_high_power - mean_low_power,
         cohen_d = mean_diff/pooled_sd,
         SE = cohen_d_se(n_high_power, n_low_power, cohen_d))
# function for computing the pooled SD
pooled_SD <- function(sdA, sdB, nA, nB){
  sqrt((sdA^2*(nA-1) + sdB^2*(nB-1))/(nA+nB-2))  # denominator is n1 + n2 - 2
}
# function for the standard error of Cohen's d
cohen_d_se <- function(n1, n2, d){
  sqrt((n1+n2)/(n1*n2) + (d^2)/(2*(n1+n2)))
}
# apply functions & create new columns in the dataset (base R)
power_pose$pooled_sd <- with(power_pose, pooled_SD(sd_high_power, sd_low_power, n_high_power, n_low_power))
power_pose$mean_diff <- with(power_pose, mean_high_power - mean_low_power)
power_pose$cohen_d <- with(power_pose, mean_diff/pooled_sd)
power_pose$SE <- with(power_pose, cohen_d_se(n_high_power, n_low_power, cohen_d))
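The solutions above do not cover the 'additional' part of the exercise (the t statistic and the one-sided p-value). A possible sketch, following the formulae given earlier (the helper names `t_stat` and `p_one_sided` are my own, not from the workshop materials):

```r
# t statistic for a two-sample comparison, using the pooled SD
t_stat <- function(m1, m2, sd_pooled, n1, n2){
  (m1 - m2) / (sd_pooled * sqrt(1/n1 + 1/n2))
}

# one-sided p-value: probability of a t value this large or larger
# under the null hypothesis, with n1 + n2 - 2 degrees of freedom
t_stat(2.62, 2.39, 0.929, 46, 48)   # e.g. first study (Bailey et al.): about 1.2
p_one_sided <- function(t, n1, n2){
  pt(t, df = n1 + n2 - 2, lower.tail = FALSE)
}
```

These helpers can then be applied column-wise (e.g. inside `mutate()`, as above) to add `t` and `p` columns to the dataset.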
par(mfrow=c(1,2))
n <- with(power_pose, n_high_power + n_low_power)
plot(n, power_pose$SE, ylab="SE(Cohen's d)", cex=1.5)
plot(power_pose$mean_diff, power_pose$cohen_d, xlab = "mean difference (5-point Likert scale)", ylab="Cohen's d", cex=1.5)
We have defined $\theta_k$ as the true effect size of study $k$ and $\hat{\theta}_k = \theta_k + \epsilon_k$ as the observed effect size.

In the random-effects model, we further assume that $\theta_k$ is a random sample from a population of possible studies that measure the same effect but differ in their methods or sample characteristics. These differences introduce heterogeneity among the true effects.

The random-effects model accounts for this heterogeneity by treating it as an additional level of randomness; that is, we assume that $\theta_k = \mu + \zeta_k$, where $\mu$ is the average of the true effects, and $\zeta_k \sim N(0, \tau^2)$ are random, study-specific fluctuations in the true effect. The variance $\tau^2$ quantifies the amount of heterogeneity among true effects.
Putting it all together, we have that $\hat{\theta}_k = \mu + \zeta_k + \epsilon_k$, with

$$\zeta_k \sim N(0, \tau^2), \quad \epsilon_k \sim N(0, \sigma_k^2)$$
Or equivalently, in more compact notation:

$$\hat{\theta}_k \sim N(\theta_k, \sigma_k^2), \quad \theta_k \sim N(\mu, \tau^2)$$
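A short simulation makes the generative model concrete (a sketch with arbitrary values for $\mu$, $\tau$ and the $\sigma_k$, chosen only for illustration): the total variance of the observed effects combines heterogeneity and sampling error.

```r
# simulate K observed effect sizes under the random-effects model
set.seed(42)
K     <- 5000
mu    <- 0.3                      # average true effect
tau   <- 0.2                      # SD of true effects (heterogeneity)
sigma <- runif(K, 0.1, 0.3)       # per-study sampling SDs

theta_k     <- rnorm(K, mean = mu, sd = tau)         # true effects
theta_hat_k <- rnorm(K, mean = theta_k, sd = sigma)  # observed effects

# total variability = tau^2 + average sampling variance
var(theta_hat_k)
tau^2 + mean(sigma^2)
```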
The metafor package: the function rma allows us to fit the random-effects model very easily. For the power_pose example:
str(power_pose)
## 'data.frame': 6 obs. of 11 variables:
##  $ study          : chr "Bailey et al." "Ronay et al." "Klaschinski et al." "Bombari et al." ...
##  $ n_high_power   : int 46 53 101 99 100 135
##  $ n_low_power    : int 48 55 99 101 100 134
##  $ mean_high_power: num 2.62 2.12 3 2.24 2.58 ...
##  $ mean_low_power : num 2.39 1.95 2.73 1.99 2.44 ...
##  $ sd_high_power  : num 0.932 0.765 0.823 0.927 0.789 ...
##  $ sd_low_power   : num 0.935 0.792 0.9 0.854 0.977 ...
##  $ pooled_sd      : num 0.929 0.775 0.86 0.889 0.886 ...
##  $ mean_diff      : num 0.234 0.177 0.275 0.252 0.13 ...
##  $ cohen_d        : num 0.252 0.229 0.319 0.284 0.147 ...
##  $ SE             : num 0.207 0.193 0.142 0.142 0.142 ...
we can estimate the model with this command:
library(metafor)
res <- rma(cohen_d, SE^2, data = power_pose)
Model output:
summary(res)
##
## Random-Effects Model (k = 6; tau^2 estimator: REML)
##
##  logLik  deviance       AIC       BIC      AICc
##  4.0374   -8.0748   -4.0748   -4.8559    1.9252
##
## tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0139)
## tau (square root of estimated tau^2 value):      0
## I^2 (total heterogeneity / total variability):   0.00%
## H^2 (total variability / sampling variability):  1.00
##
## Test for Heterogeneity:
## Q(df = 5) = 1.2981, p-val = 0.9351
##
## Model Results:
##
## estimate      se    zval    pval   ci.lb   ci.ub
##   0.2230  0.0613  3.6358  0.0003  0.1028  0.3432  ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the random-effects model, the parameter $\tau^2$ directly quantifies the heterogeneity.

There are other measures of heterogeneity which can be applied also to non-random-effects models:
Cochran's $Q$: the weighted sum of the squared differences between the effect size estimate of each study, $\hat{\theta}_k$, and the meta-analytic effect size estimate, $\hat{\mu}$:

$$Q = \sum_{k=1}^{K} w_k \left(\hat{\theta}_k - \hat{\mu}\right)^2, \quad \text{where } w_k = \frac{1}{\sigma_k^2}$$

Under the null hypothesis that all effect size differences are caused only by sampling error (no heterogeneity), $Q$ is approximately distributed as $\chi^2$ with $K-1$ degrees of freedom.
$I^2$: the percentage of variability in the observed effect sizes that is not caused by sampling error:

$$I^2 = \frac{Q - (K-1)}{Q}$$

where $K$ is the total number of studies.
$$H^2 = \frac{Q}{K-1}$$

It can be interpreted as the ratio of the observed variability to the variability expected from sampling error alone; $H^2 > 1$ indicates heterogeneity.
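These statistics can also be computed by hand from the effect sizes and their standard errors. A sketch (`heterogeneity_stats` is my own helper, using the inverse-variance weighted mean as $\hat{\mu}$; note that for random-effects models metafor reports $I^2$ and $H^2$ based on the $\tau^2$ estimate rather than directly on $Q$):

```r
# Cochran's Q, I^2 and H^2 from effect sizes and their standard errors
heterogeneity_stats <- function(theta_hat, se){
  w      <- 1/se^2                    # inverse-variance weights
  mu_hat <- sum(w*theta_hat)/sum(w)   # weighted (fixed-effect) estimate
  K      <- length(theta_hat)
  Q      <- sum(w*(theta_hat - mu_hat)^2)
  list(Q  = Q,
       p  = pchisq(Q, df = K - 1, lower.tail = FALSE),
       I2 = max(0, (Q - (K - 1))/Q),  # truncated at 0 by convention
       H2 = Q/(K - 1))
}
```

Applied to the power_pose data (columns `cohen_d` and `SE`), this should reproduce the $Q$ statistic reported by `rma()` above.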
Since $\tau^2 \approx 0$, we could also estimate a fixed-effect model (option method="FE"):
summary(rma(cohen_d, SE^2, data = power_pose, method="FE"))
##
## Fixed-Effects Model (k = 6)
##
##  logLik  deviance       AIC       BIC      AICc
##  5.0141    1.2981   -8.0283   -8.2365   -7.0283
##
## I^2 (total heterogeneity / total variability):  0.00%
## H^2 (total variability / sampling variability): 0.26
##
## Test for Heterogeneity:
## Q(df = 5) = 1.2981, p-val = 0.9351
##
## Model Results:
##
## estimate      se    zval    pval   ci.lb   ci.ub
##   0.2230  0.0613  3.6358  0.0003  0.1028  0.3432  ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The metafor package also has a handy function to make forest plots:

forest(res)
With only a few additions we can make a publication-quality figure:
forest(res, slab=study, xlim=c(-5,2), xlab="Cohen's d",
       ilab=round(cbind(with(power_pose, n_high_power + n_low_power),
                        mean_low_power, sd_low_power,
                        mean_high_power, sd_high_power), digits=2),
       ilab.xpos=c(-3.5, -2.5, -2, -1.25, -0.75),
       cex=.75, header="Authors", mlab="")
par(cex=.75, font=2)
text(c(-2.5,-2,-1.25,-0.75), res$k+2, c("mean", "Std.", "mean", "Std."))
text(c(-2.25,-1), res$k+3, c("Low power", "High power"))
text(-3.5, res$k+2, "N")
# this adds the results of the heterogeneity test
text(-5, -1, pos=4, cex=0.9,
     bquote(paste("RE Model (Q = ", .(formatC(res$QE, digits=2, format="f")),
                  ", df = ", .(res$k - res$p),
                  ", p = ", .(formatC(res$QEp, digits=2, format="f")), "; ",
                  I^2, " = ", .(formatC(res$I2, digits=1, format="f")), "%)")))
The dataset towels contains results of a set of studies that investigated whether people reuse towels in hotels more often if they are provided with a descriptive norm (a message stating that the majority of guests actually reused their towels), as opposed to a control condition in which they were simply encouraged to reuse the towels (with a message about the protection of the environment).

towels <- read.csv("https://mlisi.xyz/files/workshops/meta_analyses/data/towels.csv")
str(towels)

## 'data.frame': 7 obs. of 3 variables:
##  $ study: chr "Goldstein, Cialdini, & Griskevicius (2008), Exp. 1" "Goldstein, Cialdini, & Griskevicius (2008), Exp. 2" "Schultz, Khazian, & Zaleski (2008), Exp. 2" "Schultz, Khazian, & Zaleski (2008), Exp. 3" ...
##  $ logOR: num 0.381 0.305 0.206 0.251 0.288 ...
##  $ SE   : num 0.198 0.136 0.192 0.17 0.824 ...
The dataset ThirdWave contains studies examining the effect of so-called "third wave" psychotherapies on perceived stress in college students. The effect size is the standardized mean difference between a treatment and control group at post-test.

load(url('https://mlisi.xyz/files/workshops/meta_analyses/data/thirdwave.rda'))
str(ThirdWave)

## 'data.frame': 18 obs. of 8 variables:
##  $ Author              : chr "Call et al." "Cavanagh et al." "DanitzOrsillo" "de Vibe et al." ...
##  $ TE                  : num 0.709 0.355 1.791 0.182 0.422 ...
##  $ seTE                : num 0.261 0.196 0.346 0.118 0.145 ...
##  $ RiskOfBias          : chr "high" "low" "high" "low" ...
##  $ TypeControlGroup    : chr "WLC" "WLC" "WLC" "no intervention" ...
##  $ InterventionDuration: chr "short" "short" "short" "short" ...
##  $ InterventionType    : chr "mindfulness" "mindfulness" "ACT" "mindfulness" ...
##  $ ModeOfDelivery      : chr "group" "online" "group" "group" ...

Example taken from the book by Harrer, Cuijpers, Furukawa, and Ebert (2021).
Study-level variables ('moderators') can also be included as predictors in meta-analytic models, leading to so-called 'meta-regression' analyses.
Example, with one study-level moderator variable $X_k$:

$$\hat{\theta}_k = \beta_0 + \beta X_k + \zeta_k + \epsilon_k$$

where the intercept $\beta_0$ corresponds to the average true effect/outcome $\mu$ when the value of the moderator variable $X_k$ is equal to zero.
Does the duration of intervention influence the effect size?
# transform duration into a dummy variable
ThirdWave$duration_dummy <- ifelse(ThirdWave$InterventionDuration=="long", 1, 0)
res2 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, mods = duration_dummy)
print(res2)
##
## Mixed-Effects Model (k = 18; tau^2 estimator: REML)
##
## tau^2 (estimated amount of residual heterogeneity):     0.0630 (SE = 0.0401)
## tau (square root of estimated tau^2 value):             0.2510
## I^2 (residual heterogeneity / unaccounted variability): 58.30%
## H^2 (unaccounted variability / sampling variability):   2.40
## R^2 (amount of heterogeneity accounted for):            23.13%
##
## Test for Residual Heterogeneity:
## QE(df = 16) = 37.5389, p-val = 0.0018
##
## Test of Moderators (coefficient 2):
## QM(df = 1) = 2.6694, p-val = 0.1023
##
## Model Results:
##
##          estimate      se    zval    pval    ci.lb   ci.ub
## intrcpt    0.4701  0.1001  4.6970  <.0001   0.2739  0.6663  ***
## mods       0.2739  0.1676  1.6338  0.1023  -0.0547  0.6025
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Does adding the moderators significantly improve the ability of the model to explain the observed effect sizes?
# refit models using maximum likelihood estimation (ML)
res0 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, method="ML")
res2 <- rma(yi=TE, vi=seTE^2, data = ThirdWave, mods = duration_dummy, method="ML")
# run likelihood ratio test of nested models
anova(res0, res2)
##
##         df     AIC     BIC    AICc   logLik    LRT   pval      QE  tau^2      R^2
## Full     3 20.6903 23.3614 22.4046  -7.3452                37.5389 0.0423
## Reduced  2 21.5338 23.3145 22.3338  -8.7669 2.8435 0.0917 45.5026 0.0715 40.7999%
Introduce a hierarchically higher level of randomness, in which observed effect sizes are grouped in "clusters" that share some similarities (for example, many effect sizes are nested in one study; or many studies are nested in subgroups depending on the nationality of the participants).

Formally:

$$\hat{\theta}_{ij} = \mu + \zeta^{(2)}_{ij} + \zeta^{(3)}_{j} + \epsilon_{ij}$$
load(url('https://mlisi.xyz/files/workshops/meta_analyses/data/Chernobyl.rda'))
str(Chernobyl)

## 'data.frame': 33 obs. of 8 variables:
##  $ author   : chr "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" "Aghajanyan & Suskov (2009)" ...
##  $ cor      : num 0.206 0.269 0.205 0.267 0.932 ...
##  $ n        : num 91 91 92 92 559 559 559 559 559 559 ...
##  $ z        : num 0.209 0.275 0.208 0.274 1.671 ...
##  $ se.z     : num 0.1066 0.1066 0.106 0.106 0.0424 ...
##  $ var.z    : num 0.0114 0.0114 0.0112 0.0112 0.0018 ...
##  $ radiation: chr "low" "low" "low" "low" ...
##  $ es.id    : chr "id_1" "id_2" "id_3" "id_4" ...
The studies reported correlations between radioactivity due to the nuclear fallout caused by the 1986 Chernobyl reactor disaster and mutation rates in humans.
Most studies reported multiple effect sizes (e.g. because they used several methods to measure mutations or considered several subgroups, such as exposed parents versus their offspring).
In R, we can fit such a model using the rma.mv function, setting the random argument to indicate the clustering structure:
model_cherobyl <- rma.mv(yi = z, V = var.z, slab = author, data = Chernobyl,
                         random = ~ 1 | author/es.id,
                         # equivalent formulations below:
                         # random = list(~ 1 | author, ~ 1 | interaction(author, es.id)),
                         # random = list(~ 1 | author, ~ 1 | es.id),
                         method = "REML")
summary(model_cherobyl)
##
## Multivariate Meta-Analysis Model (k = 33; method: REML)
##
##    logLik  Deviance       AIC       BIC      AICc
##  -21.1229   42.2458   48.2458   52.6430   49.1029
##
## Variance Components:
##
##             estim    sqrt  nlvls  fixed        factor
## sigma^2.1  0.1788  0.4229     14     no        author
## sigma^2.2  0.1194  0.3455     33     no  author/es.id
##
## Test for Heterogeneity:
## Q(df = 32) = 4195.8268, p-val < .0001
##
## Model Results:
##
## estimate      se    zval    pval   ci.lb   ci.ub
##   0.5231  0.1341  3.9008  <.0001  0.2603  0.7860  ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
forest(model_cherobyl)
library(metafor)
dat <- dat.konstantopoulos2011
head(dat)

##   district school study year     yi    vi
## 1       11      1     1 1976 -0.180 0.118
## 2       11      2     2 1976 -0.220 0.118
## 3       11      3     3 1976  0.230 0.144
## 4       11      4     4 1976 -0.300 0.144
## 5       12      1     5 1989  0.130 0.014
## 6       12      2     6 1989 -0.260 0.014
Results from 56 studies on the effects of modified school calendars (comparing more frequent but shorter intermittent breaks to a longer summer break) on student achievement. The effect yi is the standardized difference between the groups in each study, and vi is the corresponding sampling variance.
(More info about this dataset at this link)
Funnel plot
Funnel plot from Carter and McCullough (2014) visualizing bias in 198 published tests of the ego-depletion effect.
Image taken from Lakens (2023).
Funnel plot for the ThirdWave dataset:

res <- rma(yi=TE, vi=seTE^2, data = ThirdWave)
funnel(res, level=c(90, 95, 99),
       shade=c("white", "gray55", "gray75"),
       refline=0, legend=TRUE,
       xlab="Effect size (standardized mean difference)",
       refline2=res$b)
Egger’s regression test for funnel plot asymmetry
regtest(res, model="lm")
##
## Regression Test for Funnel Plot Asymmetry
##
## Model:     weighted regression with multiplicative dispersion
## Predictor: standard error
##
## Test for Funnel Plot Asymmetry: t = 4.6770, df = 16, p = 0.0003
## Limit Estimate (as sei -> 0):   b = -0.3407 (CI: -0.7302, 0.0487)
The same test can be carried out 'by hand' with a simple linear regression:

with(ThirdWave, summary(lm(I(TE/seTE) ~ I(1/seTE))))
##
## Call:
## lm(formula = I(TE/seTE) ~ I(1/seTE))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.82920 -0.55227 -0.03299  0.66857  2.05815
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.1111     0.8790   4.677 0.000252 ***
## I(1/seTE)    -0.3407     0.1837  -1.855 0.082140 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.096 on 16 degrees of freedom
## Multiple R-squared: 0.177,  Adjusted R-squared: 0.1255
## F-statistic: 3.44 on 1 and 16 DF, p-value: 0.08214
Egger’s regression test, tidyverse style:
ThirdWave %>%
  mutate(y = TE/seTE, x = 1/seTE) %>%
  lm(y ~ x, data = .) %>%
  summary()
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.82920 -0.55227 -0.03299  0.66857  2.05815
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.1111     0.8790   4.677 0.000252 ***
## x            -0.3407     0.1837  -1.855 0.082140 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.096 on 16 degrees of freedom
## Multiple R-squared: 0.177,  Adjusted R-squared: 0.1255
## F-statistic: 3.44 on 1 and 16 DF, p-value: 0.08214
Baguley, T. (2009). "Standardized or simple effect size: What should be reported?" In: British Journal of Psychology 100.3, pp. 603-617. DOI: https://doi.org/10.1348/000712608X377117.
Harrer, M., Cuijpers, P., Furukawa, T. A., and Ebert, D. D. (2021). Doing Meta-Analysis With R: A Hands-On Guide. 1st ed. Boca Raton, FL and London: Chapman & Hall/CRC Press. ISBN: 9780367610074.
Lakens, D. (2023). Improving Your Statistical Inferences. https://doi.org/10.5281/zenodo.6409077.
Viechtbauer, W. (2010). "Conducting meta-analyses in R with the metafor package". In: Journal of Statistical Software 36.3, pp. 1-48. https://doi.org/10.18637/jss.v036.i03.
Outcome: mark

| Predictors | Estimates | Std. Error | CI | p |
|---|---|---|---|---|
| (Intercept) | 0.6388 | 0.0235 | 0.5927 – 0.6850 | <0.001 |
| gender [male] | -0.0317 | 0.0210 | -0.0730 – 0.0096 | 0.132 |
| Random Effects | | | | |
| σ² | 0.0052 | | | |
| τ00 student | 0.0095 | | | |
| τ00 module | 0.0019 | | | |
| ICC | 0.6851 | | | |
| N student | 163 | | | |
| N module | 4 | | | |
| Observations | 650 | | | |
| Marginal R² / Conditional R² | 0.009 / 0.688 | | | |
Example: model on 2nd year marks of psychology students
## Linear mixed model fit by REML ['lmerMod']
## Formula: mark ~ gender + (1 | student) + (1 | module)
##    Data: .
##
## REML criterion at convergence: -1201.4
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -2.59763 -0.62613  0.03198  0.61472  2.32079
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  student  (Intercept) 0.009515 0.09754
##  module   (Intercept) 0.001882 0.04338
##  Residual             0.005238 0.07237
## Number of obs: 650, groups: student, 163; module, 4
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.63882    0.02349  27.190
## gendermale  -0.03174    0.02103  -1.509
##
## Correlation of Fixed Effects:
##            (Intr)
## gendermale -0.165
The variance of the expected mark for females is the standard error of the intercept squared: $0.0235^2 = \sigma^2_{female}$.

The variance of the expected mark for males is the sum of the standard error of the intercept squared and the standard error of the effect of gender squared, plus twice their covariance:

$$\sigma^2_{male} = \sigma^2_{female} + 0.0210^2 + \underbrace{2 r \sigma_{female} \times 0.0210}_{\text{covariance}}$$

where $r$ is the correlation of the fixed effects.

(See the variance of the sum of random variables: $\sigma^2_{A+B} = \sigma^2_A + \sigma^2_B + \underbrace{2 r \sigma_A \sigma_B}_{cov(A,B)}$.)
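This identity holds exactly for sample variances and covariances, which is easy to verify on simulated data (arbitrary numbers, for illustration only):

```r
# check var(A + B) = var(A) + var(B) + 2*cov(A, B)
set.seed(7)
A <- rnorm(1000, sd = 2)
B <- 0.5*A + rnorm(1000, sd = 1)   # B is correlated with A

var(A + B)
var(A) + var(B) + 2*cov(A, B)      # same value: the identity is exact
```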
To calculate the pooled variance, we take into account that there are 530 females and 120 males in the dataset:

$$\sigma^2_{pooled} = \frac{(530-1)\sigma^2_{female} + (120-1)\sigma^2_{male}}{530+120-2}$$
Here is the estimate of the effect size as estimated by the LMM model:

$$\text{Cohen's } d = \frac{\beta_{gender}}{\sqrt{\sigma^2 + \sigma^2_{pooled} + \tau_{student} + \tau_{module}}} \approx -0.15$$