For example, authors, editors, and reviewers use “statistical (non)significance” as a filter to select which results to publish. This distorts the literature, because the effects of published interventions are biased upward in magnitude. It also encourages harmful research practices that yield results attaining “statistical significance.”
From: Blakeley B. McShane, Eric T. Bradlow, John G. Lynch, Jr., and Robert J. Meyer, “‘Statistical Significance’ and Statistical Reporting: Moving Beyond Binary,” Journal of Marketing.
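The upward bias induced by such filtering can be seen in a small simulation (an illustrative sketch, not code from the article; the true effect size, sample size, and number of studies are arbitrary assumptions): only the studies that happen to cross the 0.05 threshold are “published,” and their average estimated effect overstates the true one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

true_effect = 0.2     # assumed true standardized effect
n_per_group = 50      # assumed sample size per arm
n_studies = 10_000    # number of simulated studies

all_estimates, published = [], []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    estimate = treatment.mean() - control.mean()
    _, p_value = stats.ttest_ind(treatment, control)
    all_estimates.append(estimate)
    if p_value < 0.05:            # the significance filter: only these get "published"
        published.append(estimate)

print(f"true effect:              {true_effect:.2f}")
print(f"mean over all studies:    {np.mean(all_estimates):.2f}")
print(f"mean over published only: {np.mean(published):.2f}")  # biased upward in magnitude
```

Under these assumed parameters, the mean of the “published” estimates comes out substantially larger than the true effect, even though each individual study is analyzed correctly.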
Decisions are seldom necessary in scientific reporting; when they are, they are best left to end users such as managers and clinicians. In such cases, decisions should be made using a decision analysis that integrates the costs, benefits, and probabilities of all possible consequences via a loss function (which typically varies dramatically across stakeholders), not via arbitrary thresholds applied to statistical summaries such as P-values (“statistical (non)significance”), which, outside of certain specialized applications such as industrial quality control, are insufficient for this purpose.
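As an informal illustration of such a decision analysis (a minimal sketch; the probability and loss values below are entirely hypothetical and not prescribed by the article), the expected loss of adopting versus not adopting an intervention can be compared for stakeholders whose loss functions differ:

```python
# A minimal sketch of loss-based decision making; every number below
# (probabilities and losses) is a hypothetical placeholder for illustration.
def expected_loss(p_effective: float,
                  loss_adopt_if_ineffective: float,
                  loss_skip_if_effective: float) -> dict:
    """Expected loss of each action, given P(intervention is effective)."""
    return {
        "adopt": (1 - p_effective) * loss_adopt_if_ineffective,
        "skip": p_effective * loss_skip_if_effective,
    }

# The same evidence (here, a 60% chance the intervention works) can imply
# different decisions for stakeholders with different loss functions.
stakeholders = {
    "manager":   expected_loss(0.60, loss_adopt_if_ineffective=1.0,  loss_skip_if_effective=5.0),
    "clinician": expected_loss(0.60, loss_adopt_if_ineffective=10.0, loss_skip_if_effective=2.0),
}

for name, losses in stakeholders.items():
    best_action = min(losses, key=losses.get)
    print(f"{name}: {losses} -> {best_action}")
```

With these hypothetical numbers the manager adopts while the clinician does not, even though both face identical statistical evidence, which is the sense in which the appropriate decision depends on the stakeholder’s loss function rather than on a single P-value threshold.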
If the P-value is less than 0.05, the effect is declared “statistically significant,” the assumption of no effect is rejected, and it is concluded that the intervention has an effect in the real world. If the P-value is above 0.05, the effect is declared “statistically nonsignificant,” the assumption of no effect is not rejected, and it is concluded that the intervention has no effect in the real world.
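A minimal sketch of this dichotomous rule (with simulated placeholder data, and SciPy’s two-sample t test standing in for whatever analysis a study actually uses):

```python
import numpy as np
from scipy import stats

# Simulated placeholder data; a real study would substitute its own analysis.
rng = np.random.default_rng(0)
treatment = rng.normal(0.3, 1.0, 40)   # hypothetical treatment-group outcomes
control = rng.normal(0.0, 1.0, 40)     # hypothetical control-group outcomes

_, p_value = stats.ttest_ind(treatment, control)

# The dichotomous rule: a continuous P-value is collapsed into a binary verdict,
# whether P turns out to be 0.049 or 0.051.
if p_value < 0.05:
    verdict = "statistically significant -> the null of no effect is rejected"
else:
    verdict = "statistically nonsignificant -> the null of no effect is not rejected"

print(f"P = {p_value:.3f}: {verdict}")
```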
Criticisms of NHST