Reproducibility in Psychology
Context
This summary concerns the accuracy and reliability of reproducibility metrics in psychology: the field has been grappling with a well-documented replication crisis for over a decade. Here is a curated set of key findings and references that highlight the scope and causes of the problem:
📉 1. Reproducibility Project: Psychology (Open Science Collaboration, 2015)
Only 36% of 100 replication attempts of studies from top psychology journals produced statistically significant results, compared to 97% of the original studies.
The average effect size in replications was about half that of the originals.
Conclusion: Even high-quality studies often fail to replicate, suggesting systemic issues in research design, analysis, and publication incentives.
📖 Read the full study in Science
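To see why significance-filtered originals run hot while replications shrink, here is a minimal simulation sketch in Python. This is not code or data from the project itself; the true effect size, sample size, and publication filter are all illustrative assumptions.

```python
# Sketch: why replication effect sizes shrink when originals are filtered
# for significance. Assumes a small true effect (d = 0.2) and n = 30 per
# group; all numbers are illustrative, not from the 2015 paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, sims = 0.2, 30, 20_000

def cohens_d(a, b):
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

published, replications = [], []
for _ in range(sims):
    a = rng.normal(true_d, 1, n)   # treatment group
    b = rng.normal(0.0, 1, n)      # control group
    t, p = stats.ttest_ind(a, b)
    if p < 0.05 and t > 0:         # only significant results get "published"
        published.append(cohens_d(a, b))
        # an unfiltered replication of the same true effect
        ra, rb = rng.normal(true_d, 1, n), rng.normal(0.0, 1, n)
        replications.append(cohens_d(ra, rb))

print(f"mean published d:   {np.mean(published):.2f}")    # inflated, ~0.6
print(f"mean replication d: {np.mean(replications):.2f}") # near the true 0.2
```

Under these assumptions, the published effects are inflated by the significance filter while the unfiltered replications cluster around the true effect, mirroring the roughly halved effect sizes the project reported.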
🧠 2. Statistical Assumptions and Logical Fallacies
A 2022 study in Frontiers in Psychology found that many psychological studies violate key statistical assumptions (e.g., normality, homoscedasticity).
It also critiques the misuse of p-values, noting logical fallacies like the transposed conditional and affirming the consequent.
These flaws undermine the reliability of statistical inferences and inflate false positives.
📖 Frontiers in Psychology – Statistical Assumptions and Reproducibility
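As a concrete illustration of the kinds of assumption checks at issue, here is a short Python sketch using SciPy's standard diagnostics. The simulated data and the specific choice of tests are assumptions for illustration, not the paper's own analysis.

```python
# Sketch: checking the assumptions behind an independent-samples t-test
# before trusting its p-value. Group data is simulated for illustration;
# group B is deliberately non-normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, 40)
group_b = rng.exponential(1.0, 40)

# Normality: Shapiro-Wilk test, run per group
for name, g in [("A", group_a), ("B", group_b)]:
    w, p = stats.shapiro(g)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")

# Homoscedasticity: Levene's test across groups
_, p_var = stats.levene(group_a, group_b)
print(f"Levene p = {p_var:.3f}")

# If assumptions fail, prefer Welch's t-test or a rank-based alternative
t_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
u_test = stats.mannwhitneyu(group_a, group_b)
print(f"Welch p = {t_welch.pvalue:.3f}, Mann-Whitney p = {u_test.pvalue:.3f}")
```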
🧪 3. Meta-Analyses and Replication Rates
The Many Labs projects found that only 50–54% of classic psychology effects replicated with p < .05.
Effect sizes were consistently smaller in replications than in original studies.
Subfields like social psychology had particularly low replication rates.
📖 Reflections on the Reproducibility Project – Springer
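Low statistical power by itself can produce replication rates in this range even when the underlying effect is real. Here is a minimal Python sketch; the effect size and sample size are illustrative assumptions chosen to give roughly 50% power.

```python
# Sketch: low power alone caps replication rates. With a true effect of
# d = 0.4 and n = 50 per group (roughly 50% power), about half of faithful
# replications "fail" at p < .05. Illustrative numbers only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_d, n, sims = 0.4, 50, 10_000
hits = sum(
    stats.ttest_ind(rng.normal(true_d, 1, n), rng.normal(0, 1, n)).pvalue < 0.05
    for _ in range(sims)
)
print(f"replication rate: {hits / sims:.0%}")  # close to 50%, not 95%
```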
🧰 4. Measurement Reliability and Validity
Many psychological instruments lack test-retest reliability, inter-rater reliability, or internal consistency.
Even when reliable, they may not be valid—i.e., they don’t measure what they claim to measure.
📖 Reliability in Psychology Research – SimplyPsychology
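For readers who want to compute these quantities, here is a Python sketch of two standard checks: Cronbach's alpha (internal consistency) from its textbook formula, and test-retest reliability as a Pearson correlation. The simulated scale data is purely illustrative.

```python
# Sketch: two common reliability checks on simulated questionnaire data.
import numpy as np

rng = np.random.default_rng(3)

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulate a 5-item scale driven by one latent trait plus item noise
trait = rng.normal(0, 1, (200, 1))
items = trait + rng.normal(0, 0.8, (200, 5))
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")

# Test-retest: the same trait measured twice, with occasion-specific noise
time1 = trait[:, 0] + rng.normal(0, 0.5, 200)
time2 = trait[:, 0] + rng.normal(0, 0.5, 200)
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r: {r:.2f}")
```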
🧩 5. Structural Incentives and Publication Bias
Researchers are incentivized to publish novel, significant results, not replications.
Practices like p-hacking, HARKing (hypothesizing after results are known), and selective reporting contribute to irreproducibility.
📖 Quantity Over Quality? – UC Press
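A small simulation makes the cost of p-hacking concrete. The sketch below assumes one common variant, testing several outcome measures and reporting whichever comes out significant; the sample size and number of outcomes are illustrative.

```python
# Sketch: how p-hacking inflates false positives. With no true effect at
# all, testing 5 outcome measures and reporting the "best" one pushes the
# nominal 5% error rate far higher. Purely illustrative simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, sims, n_outcomes = 30, 10_000, 5
false_pos = 0
for _ in range(sims):
    # the null hypothesis is true for every outcome measure
    ps = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
          for _ in range(n_outcomes)]
    if min(ps) < 0.05:  # report only the most favorable outcome
        false_pos += 1
print(f"false-positive rate: {false_pos / sims:.0%}")  # ~23%, not 5%
```

With five independent tests, the chance of at least one p < .05 under the null is 1 − 0.95⁵ ≈ 23%, which is why selective reporting alone can flood the literature with unreplicable findings.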
🧠 Summary Table

| Issue | Impact on Reproducibility |
|---|---|
| Low replication rates | Only ~36–50% of studies replicate |
| Inflated effect sizes | Replications show smaller effects |
| Statistical misuse | Misinterpretation of p-values, flawed assumptions |
| Measurement problems | Poor reliability and questionable validity |
| Incentive structures | Favor novelty over verification |