Statistical significance is a concept that even established researchers get completely wrong. If you don’t believe me, just read the list of increasingly desperate descriptions of non-significant results (compiled by Matthew Hankins).
I think this confusion is largely to do with language. After doing a hypothesis test, the word ‘significant’ has a precise meaning: it means that the probability of observing a result at least as extreme as the given one, if the null hypothesis were true, has been calculated and found to be less than some arbitrary pre-determined significance level. This arbitrary level is usually taken to be 0.05, corresponding to a 1 in 20 chance that you’d see a result at least that extreme if the null hypothesis were true. (If you followed that explanation you’ve almost certainly heard it before.)
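For anyone who prefers code to words, here is a minimal sketch of that definition. It assumes a two-sided test whose statistic is a z-score that follows a standard normal distribution under the null hypothesis; the function names are mine, made up for illustration, and real analyses will often use a different test.

```python
from statistics import NormalDist

ALPHA = 0.05  # the conventional (and arbitrary) significance level


def two_sided_p_value(z: float) -> float:
    """Probability, if the null hypothesis were true, of seeing a z statistic
    at least as extreme as the observed one (in either direction)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_significant(z: float, alpha: float = ALPHA) -> bool:
    """'Significant' in the statistical sense: the p-value is below alpha."""
    return two_sided_p_value(z) < alpha


if __name__ == "__main__":
    # Hypothetical z statistics, chosen only to show both outcomes.
    for z in (1.0, 2.5):
        print(f"z = {z}: p = {two_sided_p_value(z):.4f}, "
              f"significant = {is_significant(z)}")
```

Notice that nothing in the calculation asks whether the effect is large, useful or interesting. It only asks how surprising the data would be if the null hypothesis were true.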
However, this meaning of ‘significant’ is different to its everyday meaning.
When you say a result is ‘significant’ to most non-statisticians, they’re likely to start thinking of any of the following closely related words: notable, noteworthy, worthy of attention, remarkable, outstanding, important… This is clearly how it gets (mis)used in practice.
Conversely, saying a result is ‘not significant’ sounds like you’re saying it is none of those things.
Is it any wonder that people become obsessed with whether the p-value passes that arbitrary p < 0.05 threshold when they hear in Applied Stats 101 that their result won’t be ‘important’ unless it does?
Things are further complicated by the fact that ‘clinical significance’ is also a thing. I’ve noticed that, particularly in medical studies, it’s not uncommon to talk about results as being ‘significant’ and imply that they’re clinically significant or important, whereas in fact they’re probably not.
The Wikipedia page on statistical significance stresses:
“The term significance does not imply importance and the term statistical significance is not the same as research, theoretical, or practical significance.” (source)
It’s clear that this message has failed to get through to thousands of students and researchers.
Therefore, I would like to suggest a new word to be used in place of ‘significant’ after performing a hypothesis test: ‘psignificant’.
When spoken, the p at the start should be aspirated (‘puh-significant’) to remind everyone that this interpretation is inextricably linked to a p-value from a statistical test and is not the same as the everyday meaning of ‘significant’.
With this new word, I look forward to statements like this appearing in published papers:
“the difference in values is psignificant (p < 0.05) but is too small to be of clinical significance”
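To show how a statement like that can come about, here is a rough sketch with entirely made-up numbers: two equally sized groups, a tiny difference in means, and a very large sample. It assumes a two-sample z-test with a known common standard deviation, and the 5-unit ‘minimum clinically important difference’ is a hypothetical threshold I picked for illustration, not a value from any real study.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_summary(mean_diff: float, sd: float, n_per_group: int):
    """Two-sample z-test from summary statistics.

    Assumes two groups of equal size and a known common standard deviation.
    Returns the z statistic and the two-sided p-value.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical values, for illustration only.
MEAN_DIFF = 0.5                       # observed difference between group means
SD = 10.0                             # assumed common standard deviation
N_PER_GROUP = 10_000                  # a very large study
ALPHA = 0.05                          # conventional significance level
MIN_CLINICALLY_IMPORTANT_DIFF = 5.0   # assumed threshold for "matters in practice"

z, p = z_test_from_summary(MEAN_DIFF, SD, N_PER_GROUP)
psignificant = p < ALPHA
clinically_significant = abs(MEAN_DIFF) >= MIN_CLINICALLY_IMPORTANT_DIFF

print(f"z = {z:.2f}, p = {p:.4f}")
print(f"psignificant: {psignificant}")
print(f"clinically significant: {clinically_significant}")
```

With a sample this large the tiny difference comfortably clears p < 0.05, yet it is a tenth of the size that would matter under the assumed threshold. The two questions are answered by two separate comparisons, which is the whole point of giving them separate words.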
This post is just a dumb suggestion. But I don’t think it’s completely fair to blame non-statisticians for misusing p-values when the language used to describe them is misleading.