Why We Habitually Engage in Null-Hypothesis Significance Testing…

You should head over to PLOS to read this paper by Jonah Stunt et al. It’s the first qualitative study I’ve come across at PLOS, but it’s definitely worth a read to better understand why we’re still surrounded by p-values.

One thing I missed in the paper is a hint that we don’t have to engage in frequentist null-hypothesis significance testing. I realize that the authors are interested in the sociology of science here, but the article contains plenty of statements about how difficult it’d be to learn alternative methods. It doesn’t have to be: we do have packages like rstanarm or software like JASP that do not leave much room for such excuses.

Stunt, Jonah, Leonie van Grootel, Lex Bouter, David Trafimow, Trynke Hoekstra, and Michiel de Boer. 2021. “Why We Habitually Engage in Null-Hypothesis Significance Testing: A Qualitative Study.” PLOS ONE 16(10):e0258330. doi: 10.1371/journal.pone.0258330.

That hairy caterpillar

Textbooks on Bayesian inference often refer to a ‘hairy caterpillar’ when describing the traceplot and what it should look like. It’s easy to come across examples of what things should look like, such as this hairy caterpillar:

Hairy caterpillar

Likewise, we often see examples of the autocorrelation plot where everything is fine, with a quick decrease to values around zero:

Autocorrelation

What seems less common are examples of what things should not look like, such as a traceplot that does not look like a ‘hairy caterpillar’ at all, or autocorrelation that really does not want to behave. Here I provide examples of both. How about this beauty, where each chain seems to be up to its own thing? This definitely does not look like convergence, nor do the chains mix well.

No hairy caterpillar
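To put a rough number on chains that are each up to their own thing, here is a minimal sketch that compares the variance between chains with the variance within them. It uses the classic (non-split) Gelman-Rubin statistic, which is my addition and not something the plot above shows. The simulated chains, the function name `gelman_rubin` and the chain centres are made up for illustration; they are not the draws from my model.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    # Average of the variances inside each chain
    within = statistics.fmean(statistics.variance(c) for c in chains)
    # Variance of the chain means (this is B / n in the usual notation)
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


rng = random.Random(42)

# Hypothetical 'hairy caterpillar': four chains exploring the same distribution
mixed_chains = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]

# Hypothetical problem case: each chain sits around its own value
stuck_chains = [[rng.gauss(centre, 1) for _ in range(1000)] for centre in (0, 2, 4, 6)]

rhat_good = gelman_rubin(mixed_chains)
rhat_bad = gelman_rubin(stuck_chains)
print(f"well-mixed chains: {rhat_good:.3f}")
print(f"chains doing their own thing: {rhat_bad:.3f}")
```

Values close to 1 go with the hairy caterpillar; values well above 1 mean the chains disagree about where the posterior is. In practice I would rely on the diagnostics reported by the sampling software rather than on this hand-rolled version.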

Or how about this trend? We need to stretch the definition of ‘quickly’ beyond any recognition to argue that this resembles a quick decrease.

Bad autocorrelation
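For anyone who wants to reproduce the contrast without a misbehaving model at hand, here is a small sketch of how the autocorrelation behind such plots can be computed. The chains are simulated AR(1) series standing in for MCMC draws; the persistence values 0.1 and 0.99, the seed and the function names are assumptions chosen purely for illustration.

```python
import random


def autocorrelation(chain, max_lag):
    """Sample autocorrelation of a single chain for lags 0..max_lag."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    variance = sum(c * c for c in centered) / n
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n * variance)
        for lag in range(max_lag + 1)
    ]


def simulate_ar1(rho, n_draws, seed):
    """Toy stand-in for MCMC draws: an AR(1) series with lag-1 correlation rho."""
    rng = random.Random(seed)
    scale = (1 - rho ** 2) ** 0.5  # keeps the stationary variance at 1
    draws = [rng.gauss(0, 1)]
    for _ in range(n_draws - 1):
        draws.append(rho * draws[-1] + scale * rng.gauss(0, 1))
    return draws


good_chain = simulate_ar1(rho=0.1, n_draws=5000, seed=1)   # nearly independent draws
bad_chain = simulate_ar1(rho=0.99, n_draws=5000, seed=1)   # very sticky draws

acf_good = autocorrelation(good_chain, max_lag=20)
acf_bad = autocorrelation(bad_chain, max_lag=20)

for lag in (1, 5, 10, 20):
    print(f"lag {lag:2d}: good = {acf_good[lag]: .2f}, bad = {acf_bad[lag]: .2f}")
```

The nearly independent chain drops to values around zero after the first lag, while the sticky chain is still strongly correlated many lags later, which is the pattern in the plot above.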

So yes, it’s back to the drawing board for this model… longer chains (with more thinning) may not suffice here.