Statistical Reporting Practices: What the Research Shows

A Review of Published Meta-Analyses on Effect Sizes, Power, and Chi-Square Testing

Type: Literature Review  |  Last Updated: January 30, 2026  |  Sources: Peer-reviewed meta-analyses and systematic reviews

Overview

This review synthesizes findings from major meta-analyses examining statistical reporting practices in psychology and related fields. The evidence reveals persistent gaps between recommended practices and actual reporting, with implications for research reproducibility and interpretation.

Key Findings from the Literature

Statistical Power: A Persistent Problem

Jacob Cohen first documented the power problem in 1962, finding that abnormal psychology studies had only about 48% power to detect medium effects—essentially a coin flip. More than 50 years later, the situation has not meaningfully improved.

Szucs & Ioannidis (2017) analyzed 26,841 statistical records from 3,801 papers published in cognitive neuroscience and psychology journals (2011-2014) and found that median statistical power fell well below the conventional 80% benchmark.
Source: PLoS Biology 15(3): e2000797. doi:10.1371/journal.pbio.2000797
Stanley, Carter & Doucouliagos (2018) reviewed 200 meta-analyses spanning nearly 8,000 individual studies and found a median statistical power of about 36%.
Source: Psychological Bulletin 144(12): 1325-1346. doi:10.1037/bul0000169

Power by Research Domain

Domain | Median Power | Source
Neuroscience | 21% | Button et al. (2013)
Psychology (overall) | 36% | Stanley et al. (2018)
Applied Psychology | 52%* | Mone et al. (1996)
Intelligence Research | 12%** | Nuijten et al. (2020)

*For medium effects. **For small effects, median N = 60.

Effect Size Reporting

The American Psychological Association has required effect size reporting since 2001, yet compliance remains inconsistent.

Sun, Pan & Wang (2010) reviewed 1,243 articles from 14 journals (2005-2007) and found that only 49% reported effect sizes.
Source: Journal of Educational Psychology 102(4): 989-1004.

Improvement Over Time

Fritz, Scherndl & Kühberger (2012) documented a growth rate in effect size reporting of approximately 2% per year between 1990 and 2007. More recent data suggests continued improvement, with some journals now achieving near-complete compliance:

Period | Effect Size Reporting Rate | Source
1990s | ~20-30% | Fritz et al. (2012)
2005-2007 | 49% | Sun et al. (2010)
Post-2020 (Social/Personality) | 97% | Farmus et al. (2023)
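Reporting an effect size alongside the chi-square statistic takes only a few lines of code. As an illustrative sketch (plain Python, no external libraries; the function name is our own), Cramér's V for an r × c contingency table can be computed directly from the observed counts:

```python
import math

def cramers_v(table):
    """Cramér's V effect size for an r x c contingency table (list of lists)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square statistic from observed vs. expected counts
    chi2 = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
        / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    k = min(len(table), len(table[0]))  # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))

print(round(cramers_v([[30, 10], [20, 40]]), 3))  # → 0.408
```

For a 2×2 table, V reduces to the phi coefficient, so the same function covers the most common case.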

Chi-Square Test Assumptions

The chi-square test requires certain conditions to produce valid p-values. The most commonly cited rule comes from Cochran (1954):

Cochran's Rule: Avoid using the chi-square test when more than 20% of cells have expected frequencies less than 5, or when any cell has an expected frequency less than 1.
Source: Cochran WG (1954) Some methods for strengthening the common χ² tests. Biometrics 10(4): 417-451.
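Cochran's rule can be checked mechanically before running the test. The sketch below (plain Python; the function name is our own) computes the expected counts E = (row total × column total) / N and flags tables that violate either part of the rule:

```python
def cochran_check(table):
    """Check Cochran's (1954) rule on expected frequencies for a chi-square test.

    Returns (ok, expected), where ok is False if more than 20% of cells have
    expected counts below 5, or any cell has an expected count below 1.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    cells = [e for row in expected for e in row]
    frac_below_5 = sum(e < 5 for e in cells) / len(cells)
    ok = frac_below_5 <= 0.20 and min(cells) >= 1
    return ok, expected

# Small table: half the expected counts fall below 5, violating the rule
ok, exp = cochran_check([[2, 8], [5, 5]])
print(ok)  # → False: prefer Fisher's exact test here
```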

When to Use Alternative Tests

Condition | Recommended Test | Rationale
Any expected count < 5 (2×2 table) | Fisher's exact test | Computes an exact p-value
Total N < 20 | Fisher's exact test | Chi-square approximation unreliable
>20% of cells with E < 5 | Combine categories or Fisher's | Low counts inflate the Type I error rate
Paired/matched data | McNemar's test | Independence assumption violated
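For 2×2 tables that fail these checks, Fisher's exact test follows directly from the hypergeometric distribution. Here is a minimal two-sided implementation as an illustrative sketch (in practice, scipy.stats.fisher_exact does the same job):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)

    def p(x):  # probability of a table with x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = p(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs + 1e-12)

# Fisher's classic "lady tasting tea" table [[3, 1], [1, 3]]
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # → 0.4857
```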

Research by Camilli & Hopkins (1978) and others suggests that Yates' continuity correction is overly conservative; modern practice favors Fisher's exact test when the assumptions are not met.
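The conservatism of the correction is easy to see numerically. The sketch below (plain Python; the function name is our own) computes the 2×2 chi-square statistic with and without Yates' correction, using the identity that for df = 1 the chi-square survival function equals erfc(sqrt(x / 2)):

```python
from math import erfc, sqrt

def chi2_2x2(table, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates' correction.

    Returns (statistic, p_value); for df = 1 the p-value is erfc(sqrt(stat / 2)),
    which equals the chi-square(1) survival function.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_tot[i] * col_tot[j] / n
            diff = abs(table[i][j] - e)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff ** 2 / e
    return stat, erfc(sqrt(stat / 2))

table = [[12, 5], [6, 12]]
print(chi2_2x2(table)[1] < chi2_2x2(table, yates=True)[1])  # → True
```

In this example the uncorrected p-value (about .028) falls below .05 while the corrected one (about .062) does not, which is exactly the conservatism at issue.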

Implications for Researchers

Based on this review, we recommend:

  1. Conduct a priori power analysis — Only 2.9% of studies report doing this, yet it's essential for adequate sample sizes
  2. Always report effect sizes with confidence intervals — Required by APA since 2001, but still underreported
  3. Check chi-square assumptions — Use Fisher's exact test when expected frequencies are low
  4. Report exact p-values — Not just "p < .05" but the actual value
  5. Interpret effect size magnitude — A significant p-value with negligible effect size has limited practical value
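For the common df = 1 case, an a priori power analysis (recommendation 1) needs nothing beyond the normal distribution: the chi-square statistic is then the square of a normal variable with mean w·sqrt(N). A minimal sketch under that assumption (function names are our own; dedicated tools such as G*Power or statsmodels cover the general case):

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def chi2_power_df1(w, n, alpha=0.05):
    """Power of a df = 1 chi-square test at Cohen's effect size w and sample size n.

    For df = 1 the statistic is the square of a normal variable with mean
    w * sqrt(n), so power has a closed form via the normal CDF.
    """
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    lam = w * sqrt(n)
    return _Z.cdf(lam - z_crit) + _Z.cdf(-lam - z_crit)

def n_for_power(w, target=0.80, alpha=0.05):
    """Smallest n that reaches the target power (simple upward search)."""
    n = 2
    while chi2_power_df1(w, n, alpha) < target:
        n += 1
    return n

print(n_for_power(0.3))  # → 88: sample size for 80% power at a medium effect
```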

Tools for Better Practices

CrossTabs.com automatically calculates the statistics recommended above.

Try CrossTabs.com Free →

References

  1. Button KS, Ioannidis JPA, Mokrysz C, et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365-376. doi:10.1038/nrn3475
  2. Cochran WG (1954). Some methods for strengthening the common χ² tests. Biometrics, 10(4), 417-451.
  3. Cohen J (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145-153.
  4. Fritz A, Scherndl T, Kühberger A (2012). A comprehensive review of reporting practices in psychological journals. Theory & Psychology, 23, 98-122. doi:10.1177/0959354312436870
  5. Mone MA, Mueller GC, Mauland W (1996). The perceptions and usage of statistical power in applied psychology and management research. Personnel Psychology, 49(1), 103-120.
  6. Stanley TD, Carter EC, Doucouliagos H (2018). What meta-analyses reveal about the replicability of psychological research. Psychological Bulletin, 144(12), 1325-1346. doi:10.1037/bul0000169
  7. Sun S, Pan W, Wang LL (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989-1004.
  8. Szucs D, Ioannidis JPA (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3), e2000797. doi:10.1371/journal.pbio.2000797
How to Cite This Review
CrossTabs (2026). Statistical Reporting Practices: What the Research Shows. CrossTabs.com. Retrieved from https://crosstabs.com/pages/research-chi-square-practices-2026.html

About this review: This page synthesizes findings from peer-reviewed meta-analyses and systematic reviews. All statistics are sourced from published research. CrossTabs.com provides free statistical tools designed to support best practices in categorical data analysis.