I’m taking comps pretty soon so this is my summary document regarding meta-analysis.

MAs give us an average estimate across settings, tests, and people after correcting for noise. In a bare-bones MA, we correct only for sampling error. In a full MA, we correct for sampling error, unreliability, and range restriction. I’ll demonstrate a full MA here where we assume direct (rather than indirect) range restriction.

# Steps

Literature Review

- Create inclusion criteria for how you are going to select studies
- Find relevant articles

Code Articles

- Measures
- Reliability
- SD
- Means
- Effect sizes
- Moderators

Calculate Meta-Analytic Estimate

- Calculate meta-analytic effect size
- Calculate its variance

In this post, I’m focusing only on step 3, the calculations, even though steps 1 and 2 are arguably the more important pieces.

## Calculating MA Estimate and Variance

Within this step, there are many substeps:

Calculate the MA effect estimate (typically from correlations or cohen’s d’s).

Within each study gathered from our literature review…

Correct the observed correlation for range restriction, which produces a rr-corrected correlation

Use the rr-corrected correlation along with the criterion reliability to correct for unreliability, which produces operational validity

Then, use the operational validities along with sample sizes to correct for sampling error and produce a sample-size-weighted meta-analytic correlation

Calculate the variance in our MA effect estimate

Within each study gathered from our literature review…

- Compute a correction factor for unreliability on X
- Compute a correction factor for unreliability on Y
- Compute a correction factor for range restriction
Combine all of those together

Compute the error variance for a given observed correlation and correct it using the combined correction factor. This step produces the sampling error correction

Calculate the average sampling error from the sampling error corrections

Calculate the observed error variance

The MA variance estimate is equal to the observed error variance - the average sampling error

Before we begin, here is a peak at the (mock) data set. I reviewed four studies and compiled their observed effect sizes – in this case we’re going to use correlations. Let’s say that our IV is dancing ability and our DV is life satisfaction, both are continuous variables. We are interested in the meta-analytic correlation between dancing ability and life satisfaction.

```
## study restricted_predictor_sd unrestricted_predictor_sd
## 1 1 14 20
## 2 2 13 20
## 3 3 16 20
## 4 4 18 20
## predictor_reliability criterion_reliability sample_size
## 1 0.94 0.75 50
## 2 0.73 0.80 100
## 3 0.82 0.83 125
## 4 0.75 0.94 240
## observed_correlation
## 1 0.32
## 2 0.10
## 3 0.25
## 4 0.40
```

- Study = an ID number for each study in my meta-analysis
- Restricted SD = the standard deviation of scores on dancing ability within the study
- Unrestricted SD = the standard deviation of scores on on dancing ability across a larger population – from a manual, prior studies, known SDs, field reports, etc.
- Predictor reliability = the reliability of the measure used to assess dancing ability within the study
- Criterion reliability = the reliability of the measure used to assess life satisfaction within the study
- Sample size = how many people were observed within the study
- Observed correlation = the correlation between dancing ability and life satisfaction within the study

### a) Calculate the MA correlation

For each study gathered from our literature review…

#### 1)

Correct the observed correlation for range restriction, which produces a rr-corrected correlation

\[\begin{equation} r_{RR} = \dfrac{ \left(\dfrac{US_{x}}{RS_{x}}\right)r_{xy} } {\sqrt{1 + r^2_{xy}\left(\dfrac{US^2_{x}}{RS^2_{x}} - 1\right)} } \end{equation}\]

where \(r_{RR}\) is the correlation that is corrected for range restriction, \(US_x\) is the unrestricted SD on dancing ability, \(RS_x\) is the restricted SD on dancing ability, and \(r_{xy}\) is the correlation between dancing ability and life satisfaction. We are going to compute \(r_{RR}\) for every study.

```
df <- df %>%
mutate(r_RR =
((unrestricted_predictor_sd / restricted_predictor_sd)*observed_correlation) / sqrt(
1 + ((observed_correlation^2) * ((unrestricted_predictor_sd / restricted_predictor_sd) -1))
)
)
```

#### 2)

Use the rr-corrected correlation along with the criterion reliability to correct for unreliability, which produces operational validity

\[\begin{equation} r_{ov} = \dfrac{r_{RR}}{\sqrt{r_{yy}}} \end{equation}\]

where \(r_{ov}\) is the operational validity of dancing ability and life satisfaction, \(r_{RR}\) is the correlation we calculated in step 1 (the range-restriction-corrected correlation) between dancing ability and life satisfaction, and \(r_{yy}\) is the reliability of the criterion, life satisfaction.

```
df <- df %>%
mutate(r_ov =
r_RR / sqrt(criterion_reliability))
```

#### 3)

Then, use the operational validities along with sample sizes to correct for sampling error and produce a sample-size-weighted meta-analytic correlation

\[\begin{equation} \rho = \dfrac{\sum{w_sr_{ov_{i}}}}{\sum{w_s}} \end{equation}\]

where \(\rho\) is the meta-analytic estimate, \(r_{ov_{i}}\) is the operational validity between dancing ability and life satisfaction for each study, and \(w_s\) is the sample size for each study.

```
ovs_by_sample_size <- df$sample_size * df$r_ov
ma_correlation <- sum(ovs_by_sample_size) / sum(df$sample_size)
df <- df %>%
mutate(ma_correlation = ma_correlation)
```

### b) Calculate the variance in our MA effect estimate

Compile all of the corrections – steps 4 through 7

#### 4)

Compute the correction factor for unreliability on X, dancing ability (take the square root of the reliability)

```
df <- df %>%
mutate(cf_x = sqrt(predictor_reliability))
```

#### 5)

Compute the correction factor for unreliability on Y, life satisfaction (take the square root of the reliability)

```
df <- df %>%
mutate(cf_y = sqrt(criterion_reliability))
```

#### 6)

Compute the correction factor for range restriction

\[\begin{equation} a_{rr} = \dfrac{1}{ \left(\left(\dfrac{US_x}{RS_x}\right)^2 - 1\right)r_{xy}^2 + 1} \end{equation}\]

where all terms are defined above.

```
df <- df %>%
mutate(cf_rr = 1 /
( ((unrestricted_predictor_sd / restricted_predictor_sd)^2 - 1)*(observed_correlation^2) + 1 )
)
```

#### 7)

Combine all of those correction factors together into one common correction factor, \(A\).

```
df <- df %>%
mutate(A = cf_x*cf_y*cf_rr)
```

#### 8)

Compute the error variance for a given observed correlation and correct it using the combined correction factor.

This part takes three steps.

```
I: Compute the sample size weighted observed correlation
- Essentially the same thing as step 3 but using observed correlations rather than operational validities
```

\[\begin{equation} r_{wa} = \dfrac{\sum{w_sr_{xy_{i}}}}{\sum{w_s}} \end{equation}\]

```
ss_times_correlations <- df$sample_size*df$observed_correlation
wa_correlation <- sum(ss_times_correlations) / sum(df$sample_size)
```

`II: Compute the error variance on the observed correlation for each study`

\[\begin{equation} \sigma^2_e = \dfrac{\left(1-r_{wa}^2\right)^2}{N-1} \end{equation}\]

where \(\sigma^2_e\) is the error variance (for each study), \(r_{wa}\) is the weighted average observed correlation between dancing ability and life satisfaction that we computed above, and \(N\) is the sample size.

```
df <- df %>%
mutate(sigma2e =
((1 - wa_correlation^2)^2) / (sample_size - 1)
)
```

`III: Compute the sampling error correction for each study`

\[\begin{equation} Var_{ec} = \dfrac{\sigma^2_e}{A^2} \end{equation}\]

where \(Var_{ec}\) is the sampling error correction, \(\sigma^2_e\) is what we just calculated above, the error variance on the observed correlation for each study, and \(A\) is the combined correction factor for each study.

```
df <- df %>%
mutate(var_ec = sigma2e^2 / A^2)
```

#### 9)

Calculate the average sampling error

\[\begin{equation} Ave_{var_{ec}} = \dfrac{\sum{w_sVar_{ec}}}{\sum{w_s}} \end{equation}\]

where \(Ave_{var_{ec}}\) is the average sampling error, \(w_s\) is the sample size for each study, and \(Var_{ec}\) is the sampling error correction for each individual study.

```
ss_times_varec <- df$sample_size*df$var_ec
ave_var_ec <- sum(ss_times_varec) / sum(df$sample_size)
```

#### 10)

Calculate the observed error variance

\[\begin{equation} var_r = \dfrac{\sum{w_s\left(r_{xy} - r_{ov}\right)^2}}{\sum{w_s}} \end{equation}\]

where all terms are defined above.

```
ss_times_r_minus_ov <- df$sample_size*((df$observed_correlation - df$r_ov)^2)
var_r <- sum(ss_times_r_minus_ov) / sum(df$sample_size)
```

#### 11)

The MA variance estimate is equal to the observed error variance - the average sampling error

\[\begin{equation} Var_p = var_r - Ave_{var_{ec}} \end{equation}\]

`var_p <- var_r - ave_var_ec`

### Recap

What a nightmare. Here’s a recap:

Correct the observed correlations for unreliability and range restriction, use them to compute a sample-size-weighted MA correlation coefficient

Make a bunch of corrections and compute the average sampling error, and subtract that from the observed variance of the correlation coefficient to get a sense for the MA correlation coefficient variance

Now we can calculate credibility and confidence intervals.

### Credibility Interval

Gives us a sense for whether or not moderators are at play.

\[\begin{equation} \textrm{95 credibility interval} = \rho +- 1.96*\sqrt{Var_p} \end{equation}\]

```
upper_cred_i = ma_correlation + (1.96 * sqrt(var_p))
lower_cred_i = ma_correlation - (1.96 * sqrt(var_p))
upper_cred_i
```

`## [1] 0.5532591`

`lower_cred_i`

`## [1] 0.2024129`

Credibility Ratio: if \(\dfrac{ave_{var_{ec}}}{var_{r}}\) is lower than 0.75, then moderators may be at play.

`ave_var_ec / var_r`

`## [1] 0.01211969`

### Confidence Interval

\[\begin{equation} \textrm{95 confidence interval} = r_{ov} +- 1.96*SE_{r_{ov}} \end{equation}\]

where \(SE_{r_{ov}}\) is the standard error of the operational validities and is calculated as…

\[\begin{equation} SE_{r_{ov}} = \dfrac{SD_{r_{ov}}}{\sqrt{k}} \end{equation}\]

where \(k\) is the number of studies.

```
se_r_ov = sd(df$r_ov) / sqrt(length(df$study))
upper_ci = ma_correlation + (1.96 * se_r_ov)
lower_ci = ma_correlation - (1.96 * se_r_ov)
upper_ci
```

`## [1] 0.5263396`

`lower_ci`

`## [1] 0.2293324`

Bo\(^2\)m =)