The Accidental Law: Why 0.05 Rules Science and Why the ASA Wants It Gone
- Christos Nikolaou
- Nov 24
- 3 min read
In modern research, the value p = 0.05 often functions as a magical cliff edge. If a result lands on 0.049, it is celebrated as a discovery; if it lands on 0.051, it is dismissed as noise.
This rigid threshold determines funding, publication, and drug approvals. Yet, there is no mathematical derivation that proves 0.05 is the "correct" threshold for truth. It is arguably the most influential number in science, but its dominance is essentially an accident of history—a fossil of the 1920s printing industry that has outlived its usefulness.
In 2016, the American Statistical Association (ASA) released a landmark statement aimed at correcting course. To understand its recommendations, we must first understand how we got here.
The Accident of the Printing Press
Before the era of computers, scientists could not calculate an exact p-value (like p = 0.032) for their specific data. Doing so meant evaluating the tail area of a probability distribution by hand, a calculation far too laborious to repeat for every experiment.
Instead, researchers relied on reference tables published in textbooks. The most influential of these was Sir Ronald Fisher’s Statistical Methods for Research Workers, published in 1925.
Fisher faced a practical constraint: he could not print a table for every possible probability. To save space, he tabulated critical values for a few specific milestones, most notably 0.05, 0.02, and 0.01. Because 0.05 was often the first column in the table, it became the default "first hurdle" for researchers.
If Fisher had possessed a modern computer—or if he had simply chosen to print a column for 0.06—the standard of evidence in modern science might be entirely different today.
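For contrast, here is a minimal sketch (not from the original article; the t-statistic and degrees of freedom are made up) of what that calculation looks like today: SciPy returns an exact p-value in one call, next to the Fisher-style critical values one would otherwise have looked up in a printed table.

```python
# Hedged sketch with hypothetical numbers: an exact p-value versus
# a Fisher-style critical-value lookup.
from scipy import stats

t_stat, df = 2.10, 15  # made-up t-statistic and degrees of freedom

# The exact two-sided p-value: the calculation that was impractical by hand.
p_exact = 2 * stats.t.sf(abs(t_stat), df)
print(f"exact p-value: {p_exact:.3f}")

# What Fisher's tables offered instead: critical values at a few fixed levels.
for alpha in (0.05, 0.02, 0.01):
    critical = stats.t.ppf(1 - alpha / 2, df)
    print(f"alpha = {alpha}: reject if |t| > {critical:.3f}")
```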
The Conflicted Hybrid
The current confusion in statistics arises because modern practice is an incompatible hybrid of two rival philosophies that were never meant to be combined.
1. The Fisher Approach (Evidence)
Sir Ronald Fisher viewed the p-value as a fluid measure of evidence against a null hypothesis. He famously described 1 in 20 (0.05) as merely a "convenient" benchmark for judging whether a result was worth investigating further. He explicitly warned against treating it as dogma:
"No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses..." (Fisher, 1956)
2. The Neyman-Pearson Approach (Decisions)
Fisher’s rivals, Jerzy Neyman and Egon Pearson, argued for a strict decision-making framework, primarily for industrial quality control. They believed researchers should set a fixed Error Rate (α) before the experiment and strictly "Accept" or "Reject" based on that line.
The Modern Mishmash
Today, researchers often calculate an exact p-value (Fisher’s method) but judge it against a rigid 0.05 cutoff (Neyman-Pearson’s method). This creates a false sense of certainty, where a complex continuum of evidence is forced into a binary "True/False" outcome.
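As a rough illustration (hypothetical data, not from the article), the same test can be reported in either register: Fisher's continuous p-value, or Neyman-Pearson's binary decision against a pre-set alpha. Modern practice typically does both at once.

```python
# Hedged sketch with simulated, made-up data showing the two styles of report.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=0.4, scale=1.0, size=30)  # hypothetical samples
control = rng.normal(loc=0.0, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(treatment, control)

# Fisher: report the p-value as a continuous measure of evidence.
print(f"Fisher-style report: p = {p_value:.3f}")

# Neyman-Pearson: fix alpha in advance and make a binary decision.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"Neyman-Pearson decision at alpha = {alpha}: {decision}")
```

Neither report is wrong on its own; the confusion arises when the exact p-value is treated as if it were the pre-registered error rate.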
The ASA Statement (2016)
Recognising that this rigid reliance on 0.05 was leading to a crisis of reproducibility (including "p-hacking" and the "file-drawer effect"), the American Statistical Association released a formal statement on statistical significance.
The ASA’s Executive Director, Ron Wasserstein, noted that "The p-value was never intended to be a substitute for scientific reasoning." To steer research into a "post p < 0.05 era," the ASA established six principles:
The Six Principles
P-values are not truth detectors. They indicate how incompatible the data are with a specified statistical model. They do not measure the probability that the studied hypothesis is true, nor the probability that the data were produced by random chance alone.
Decisions should not rely on a threshold. Scientific conclusions and business/policy decisions should not be based only on whether a p-value passes a specific threshold (like 0.05). The "cliff edge" is an illusion.
Inference requires transparency. Proper inference requires full reporting. If a researcher performs multiple analyses but only reports the one that yields p < 0.05, the p-value is rendered invalid.
P-values do not measure size or importance. A result can be statistically significant (p < 0.05) without being practically important (e.g., a drug reduces headache duration by 2 seconds). Conversely, a large, important effect might fail to reach significance in a small study (see the sketch after this list).
A p-value is not evidence by itself. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. It must be interpreted in context.
There is no substitute for context. Good statistical practice emphasises study design, understanding the phenomenon, and interpreting results in context—not just calculating a number.
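To make the size-versus-significance point concrete, here is a small simulation (hypothetical data, not from the ASA statement): a negligible effect crosses p < 0.05 because the sample is enormous, while a much larger effect in a small study usually does not.

```python
# Hedged sketch with simulated data: significance tracks sample size
# as much as it tracks effect size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Negligible effect (a 0.03-standard-deviation shift) but a huge sample.
big_a = rng.normal(0.00, 1.0, size=50_000)
big_b = rng.normal(0.03, 1.0, size=50_000)
print("tiny effect, n = 50,000 per group:", stats.ttest_ind(big_a, big_b).pvalue)

# Sizeable effect (a 0.5-standard-deviation shift) but only 12 subjects per group.
small_a = rng.normal(0.0, 1.0, size=12)
small_b = rng.normal(0.5, 1.0, size=12)
print("large effect, n = 12 per group:   ", stats.ttest_ind(small_a, small_b).pvalue)
```

The two p-values describe effects of very different practical importance, which the numbers alone cannot distinguish.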
Conclusion: The Post-0.05 Era
The 0.05 threshold is not a law of nature; it is a compromise made for 1920s textbook formatting. When we treat it as a law, we risk dismissing sound research simply because the study was too small or too noisy to clear an arbitrary line.
The ASA recommends moving away from the binary label “statistically significant” toward a holistic view of evidence. Researchers should report confidence intervals, effect sizes, and—crucially—the full context of their methodology, rather than relying on a single, accidental number to declare the truth.
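A minimal sketch of what such a report might look like in code (hypothetical data; the variable names and numbers are illustrative, not from the article): the p-value is still present, but the effect size and confidence interval carry the substance.

```python
# Hedged sketch: report effect size and a confidence interval alongside the p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
drug = rng.normal(0.6, 1.0, size=40)     # made-up measurements
placebo = rng.normal(0.0, 1.0, size=40)

t_stat, p_value = stats.ttest_ind(drug, placebo)

# Effect size: Cohen's d, the mean difference scaled by the pooled standard deviation.
n1, n2 = len(drug), len(placebo)
pooled_var = ((n1 - 1) * drug.var(ddof=1) + (n2 - 1) * placebo.var(ddof=1)) / (n1 + n2 - 2)
pooled_sd = np.sqrt(pooled_var)
cohens_d = (drug.mean() - placebo.mean()) / pooled_sd

# 95% confidence interval for the difference in means (Student's t, pooled variance).
diff = drug.mean() - placebo.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}, "
      f"95% CI for the difference: [{ci_low:.2f}, {ci_high:.2f}]")
```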