Monday, March 23, 2026
Surface Scan

The Replication Crisis: Most of What You Know About Psychology Is Wrong

science · psychology · philosophy · history · neuroscience

What Is This?

In 2011, the field of psychology began eating itself.

Diederik Stapel, a prominent Dutch social psychologist with dozens of high-profile publications, was found to have fabricated his data wholesale — inventing the experimental results in his studies rather than collecting them. His work had been cited hundreds of times, taught in courses, and covered by major media outlets. He eventually confessed to fraud in 26 papers, and the investigation found problems with many more.

Stapel was the most spectacular case of fraud, but fraud was not the crisis. Fraud is rare. The crisis was something more mundane and more widespread: most published psychology findings, conducted honestly and analysed in good faith, could not be reproduced when other researchers tried.

In 2015, the Open Science Collaboration published the most systematic test to date: they attempted to replicate 100 studies from three major psychology journals. Only 39% replicated successfully. Not 39% of bad studies — 39% of published, peer-reviewed, often widely cited studies from respected journals.^1

The failure modes were not random. Certain types of findings failed at much higher rates: social priming effects (environmental cues subtly influencing behaviour), ego depletion (willpower as a depletable resource), unconscious bias effects, and many findings in social psychology generally. Others held up better: cognitive effects with strong mechanistic explanations, findings from clinical trials, results with large effect sizes.

The pattern was not "science is unreliable." It was something more specific: a set of structural incentives in academic publishing had produced a systematic bias toward false positives, and those false positives had accumulated for decades, becoming the basis for textbooks, TED talks, corporate training programmes, parenting advice, and popular psychology.

The replication crisis is the name for this reckoning — the ongoing process of testing the accumulated body of behavioural science findings and discovering which ones actually hold.

Why Does It Matter?

  • The studies you've heard of are disproportionately likely to be wrong. The most famous psychology experiments — the ones that became pop-psychology staples, TED talks, Malcolm Gladwell chapters, corporate training modules — are exactly the ones most likely to have failed replication. The selection mechanism is perverse: dramatic, counterintuitive findings get published (because they're surprising), get covered (because they're interesting), and get replication attempts (because they're famous). And they fail. The Stanford Prison Experiment was largely fabricated. Power posing (Amy Cuddy's famous 2010 paper and 2012 TED talk — "your body language shapes who you are") failed to replicate on both of its core claims. Ego depletion — the idea that willpower is a finite resource that depletes like a muscle — failed a massive 23-lab pre-registered replication in 2016. Carol Dweck's growth mindset findings have partially replicated, but at smaller effect sizes than the original papers reported. Priming effects (exposure to elderly-related words makes you walk slower, etc.) have almost uniformly failed.^2
  • The structural cause is incentives, not fraud. Most researchers were not fabricating data like Stapel. They were doing something subtler: p-hacking — running multiple analyses until one crossed the p < 0.05 threshold for statistical significance, then reporting only that analysis (a simulation sketch after this list shows how quickly this inflates false positives). Or HARKing (Hypothesising After the Results are Known) — collecting data, finding a pattern, then writing the paper as if they had predicted the pattern in advance. Or running small samples whose high variance made it more likely that random noise would generate a "significant" finding. Or simply stopping data collection when the results looked good rather than at a predetermined sample size. None of these are fraud. All of them produce false positives. And all of them were standard practice because the incentive was clear: publish or perish, and only positive results get published.^3
  • The crisis spread to medicine, nutrition, and economics. Psychology was the field that most publicly confronted the problem, but the underlying incentive structure — publish or perish, preference for novel positive findings, small samples, flexible analysis — is shared across empirical science. John Ioannidis's 2005 paper "Why Most Published Research Findings Are False" demonstrated this for biomedical research. Nutrition science has an especially severe replication problem: most of what we "know" about which foods are healthy was derived from epidemiological studies with enormous confounding factors (people who eat salads also exercise and have higher incomes). The claim that dietary fat causes heart disease — one of the most influential nutritional recommendations of the 20th century, reshaping the food industry and public health policy — has been systematically undermined by subsequent research.
  • It requires a new personal epistemology. The practical implication is uncomfortable: you need to weight claimed psychological and behavioural findings differently based on their replication status and the conditions under which they were produced. Pre-registered studies (where hypotheses, methods, and sample sizes are logged before data collection) are far more reliable than standard published studies. Large samples are more reliable than small ones. Effect sizes matter — an effect that accounts for 2% of variance is practically negligible even if it's statistically significant (see the worked example after this list). Studies with biological or mechanistic explanations are more reliable than findings that rely purely on statistical patterns. First-person evidence (what actually happens in your own experience) should be weighted more heavily than you've been taught — because the published study on the topic may be wrong.
  • Some things did replicate, and those should now be weighted more heavily. Basic cognitive psychology — memory, attention, perception — replicates at higher rates than social psychology. Well-established effects with large effect sizes replicate better than subtle, small effects. Clinical psychology findings from randomised controlled trials are generally more reliable than lab-based behavioural findings. Behavioural economics findings are mixed: loss aversion is robust; many specific framing effects are not. The replication crisis is not a reason to abandon empirical psychology — it's a reason to be much more selective about which empirical findings you use as the basis for action.
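
To make the p-hacking mechanics concrete, here is a minimal simulation sketch in Python. All parameter values are illustrative, not drawn from any particular study: a simulated researcher runs the same null experiment with five independent outcome measures and reports whichever first crosses p < 0.05.

```python
# Minimal sketch: testing several outcomes and reporting the first
# "significant" one inflates the false-positive rate well past the
# nominal 5%. All parameter values here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000   # simulated studies, all with NO true effect
n_per_group = 20         # small samples, typical of the pre-crisis era
n_outcomes = 5           # five outcome measures, best one reported

false_positives = 0
for _ in range(n_experiments):
    for _ in range(n_outcomes):
        treatment = rng.normal(0.0, 1.0, n_per_group)  # null is true:
        control = rng.normal(0.0, 1.0, n_per_group)    # identical groups
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            false_positives += 1
            break  # report the "significant" outcome, ignore the rest

print(f"False-positive rate: {false_positives / n_experiments:.1%}")
# Nominal rate: 5%. With five shots at significance it is roughly
# 1 - 0.95**5, i.e. about 23%.
```

Add optional stopping and a few covariate choices on top of this and the effective rate climbs further; this is Gelman's "garden of forking paths" (see Key People below) in miniature.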
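
The effect-size point can be made just as concretely. A worked example, with numbers chosen purely for illustration: a correlation of r ≈ 0.14 explains just 2% of variance, yet in a sample of 10,000 it is overwhelmingly "significant".

```python
# Illustrative arithmetic: an effect explaining 2% of variance is
# wildly "significant" at large n, yet practically negligible.
import math

r = math.sqrt(0.02)  # r ≈ 0.141, i.e. r² = 2% of variance explained
n = 10_000

# Standard t-statistic for a correlation coefficient; at df = 9,998
# the t-distribution is effectively the normal distribution.
t = r * math.sqrt((n - 2) / (1 - r**2))
p = math.erfc(t / math.sqrt(2))  # two-sided normal tail, = 2 * (1 - Φ(t))

print(f"r = {r:.3f}, t = {t:.1f}, p ≈ {p:.0e}")
# t ≈ 14.3 and p on the order of 1e-46: every statistical bar is
# cleared, while the effect stays too small to matter in practice.
```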

Key People & Players

Brian Nosek (University of Virginia / Center for Open Science) — The researcher who led the 2015 Reproducibility Project, the most comprehensive empirical test of the replication crisis. He has spent his career building the infrastructure for open science — pre-registration platforms, data sharing standards, replication journals — to structurally fix the incentive problems.^4

John Ioannidis (Stanford) — One of the most cited scientists alive, and the researcher who documented the replication problem in biomedical research. His 2005 paper "Why Most Published Research Findings Are False" is among the most widely read papers in medical science. His ongoing work documents the poor quality of much clinical research.^5
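
The core argument of that paper fits in a few lines. The positive predictive value (PPV), the chance that a "significant" finding is actually true, depends on the prior odds R that a tested hypothesis is true, the statistical power 1 − β, and the significance threshold α. Here is a sketch using the paper's no-bias formula; the scenario numbers are our own illustrative assumptions, and Ioannidis's full analysis adds a bias term that makes everything worse.

```python
# Ioannidis (2005), no-bias case:
#   PPV = (1 - beta) * R / ((1 - beta) * R + alpha)
# where R is the prior odds that a tested hypothesis is true.
# Scenario values below are illustrative assumptions, not estimates
# of any particular field.

def ppv(R: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a statistically significant finding is true."""
    return power * R / (power * R + alpha)

scenarios = [
    ("Well-powered confirmatory test", 1.00, 0.80),  # 1:1 odds, 80% power
    ("Underpowered exploratory study", 0.10, 0.35),  # 1:10 odds, 35% power
    ("Long-shot hypothesis hunting",   0.01, 0.20),  # 1:100 odds, 20% power
]

for name, R, power in scenarios:
    print(f"{name}: PPV = {ppv(R, power):.0%}")
# Roughly 94%, 41%, and 4%: in the latter two regimes, most
# "significant" findings are false, which is exactly the paper's point.
```

Pre-registration and larger samples attack this from both directions: they raise effective power and stop the analytical flexibility that silently inflates α.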

Daniel Kahneman (1934–2024) — Nobel Laureate and author of Thinking, Fast and Slow. In 2012, he wrote an open letter warning social priming researchers that their field was facing a "train wreck" — that the findings he had cited in his own book were probably not replicating. Several of them didn't. Kahneman's willingness to publicly acknowledge that findings he'd endorsed were probably wrong was unusual and admirable.^6

Philip Zimbardo (Stanford) — Creator of the Stanford Prison Experiment (1971), one of the most famous psychology studies ever conducted. Journalist Ben Blum's 2018 investigation found previously unpublished recordings and documents showing that guards had been explicitly coached on how to behave, that participants had been pressured to remain in the experiment, and that Zimbardo himself had actively shaped events in ways that contradicted his later presentations. The "participants spontaneously developed sadistic guard behaviour" finding was essentially fabricated.^7

Amy Cuddy (Harvard Business School) — Her "power posing" paper (2010, with Dana Carney and Andy Yap) claimed that adopting expansive postures for two minutes changed hormone levels and increased risk tolerance. The TED talk was watched 70 million times. Dana Carney later published an explicit disavowal of the finding: she no longer believed in power posing. The hormone effects failed to replicate. Cuddy maintains that subjective feelings of confidence may still be affected; most researchers remain sceptical.^8

Andrew Gelman (Columbia) — Statistician who has written most clearly about the methodological problems underlying the crisis: the "garden of forking paths" (how analytical flexibility generates false positives), why p-values are not what most people think they are, and what statistical reforms would actually help.

The Current State

The replication crisis has produced two broad responses: reform and entrenchment.

The reform movement has made significant progress. Pre-registration (logging hypotheses and methods before data collection) is now standard in many journals. The Open Science Framework provides infrastructure for data and materials sharing. Replication studies are now publishable in dedicated journals. Many major journals have adopted registered reports (where peer review happens before data collection, ensuring publication regardless of results). Sample sizes have increased. Effect size reporting is more common.

The entrenchment is also real. Many researchers whose careers were built on findings that have now failed to replicate have not publicly updated their views. The textbooks have not been updated — intro psychology courses still teach findings that are now known not to replicate. TED talks remain online. Corporate training programmes continue to teach power posing. The popular consciousness lags the scientific correction by a decade or more.

The fields most affected (ranked roughly by severity of the crisis):

  1. Social priming / environmental effects on behaviour
  2. Unconscious bias and implicit attitude measures
  3. Ego depletion and related willpower findings
  4. Many educational psychology interventions (growth mindset at large scale)
  5. Nutrition epidemiology
  6. Much of social neuroscience from before pre-registration became standard
  7. Early experimental economics findings with small samples

Fields less affected:

  • Basic cognitive psychology (memory, attention, learning)
  • Clinical psychology (RCTs for well-established treatments)
  • Well-replicated behavioural economics effects (loss aversion, anchoring in large-scale replications)
  • Psychophysiology with clear mechanistic grounding

Best Resources to Learn More

  • "Why Most Published Research Findings Are False" by John Ioannidis (2005) — The foundational paper. Free online. Methodologically dense but the abstract and discussion are accessible.^9
  • Vox: "The Stanford Prison Experiment was massively influential — and massively wrong" — The best single accessible account of the crisis and its most famous cases.^10
  • Science Fictions by Stuart Ritchie (2020) — The most readable book-length treatment of the replication crisis across science fields. Covers fraud, bias, negligence, and hype as the four failure modes.^11
  • Center for Open Science (cos.io) — The organisation building the infrastructure for fixing the crisis: pre-registration, data sharing, replication databases.^12
  • Calling Bullshit by Carl Bergstrom & Jevin West — The most practical guide to evaluating empirical claims. Teaches the specific tools for identifying which findings to trust.^13
