P-values are hard enough to understand — they appear ‘magically’ on the screen — so how can we best communicate the problem of p-hacking? How about using Yahtzee as an analogy to explain the intuition behind p-hacking?
In Yahtzee, players roll five dice to make predetermined combinations (e.g. three of a kind, full house). They are allowed three turns, and can lock dice in between. Importantly for the analogy, players decide which combination they want to use for their round only after the three turns. (“I threw these dice, let’s see which combination fits best…”) This is what adds an element of strategy to the game, and lets players optimize their expected (average) points.
Compare this with pre-registration (according to Wikipedia, this is actually a variant of the Yahtzee variant Yatzy — or is Yahtzee a variant of Yatzy? Whatever.): players choose a predetermined combination before throwing their dice. (“Now I’m going to try for a full house. Let’s see if the dice play along…”)
If the implications are not clear enough, we can play a couple of rounds to see which way yields higher scores. Clearly, the Yahtzee way leads to (significantly?) more points — and a much smaller likelihood of ending up with 0 points because we failed to get, say, the full house we announced before throwing the dice. Sadly, though, p-values are designed for the forced Yatzy variant.
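The comparison is easy to simulate. Here is a minimal sketch — not real Yahtzee: a single roll with no rerolls, and just three toy categories — contrasting picking the best-fitting category after seeing the dice with announcing one category beforehand:

```python
import random
from collections import Counter

def roll():
    return [random.randint(1, 6) for _ in range(5)]

def score(dice, category):
    """Toy scoring for three simplified categories."""
    counts = sorted(Counter(dice).values(), reverse=True)
    if category == "three_of_a_kind":
        return sum(dice) if counts[0] >= 3 else 0
    if category == "full_house":
        return 25 if counts[:2] == [3, 2] else 0
    if category == "chance":
        return sum(dice)
    return 0

CATEGORIES = ["three_of_a_kind", "full_house", "chance"]

def yahtzee_way(dice):
    # Choose the best-scoring category AFTER seeing the dice.
    return max(score(dice, c) for c in CATEGORIES)

def yatzy_way(dice, announced="full_house"):
    # Commit to a category BEFORE the roll; a miss scores 0.
    return score(dice, announced)

random.seed(1)
rolls = [roll() for _ in range(10_000)]
post_hoc = sum(yahtzee_way(d) for d in rolls) / len(rolls)
pre_reg = sum(yatzy_way(d) for d in rolls) / len(rolls)
print(f"choose after rolling: {post_hoc:.1f} points on average")
print(f"announce beforehand:  {pre_reg:.1f} points on average")
```

The gap is dramatic: picking the category after the fact guarantees at least the “chance” score, while announcing a full house in advance scores 0 on most rolls.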
Pre-analysis plans (PAP) are rightly becoming more common (though they are still not common enough, I think), but here’s a reason to write up a PAP that I have never seen mentioned before: pre-analysis plans can be immensely useful for yourself!
So, you have come up with a clever analysis, and writing the PAP has helped you sharpen what exactly you are looking for. You then collect your data, finish off another project, and … what was it exactly I was going to do with these data? Did I need to recode the predictor variable? Yes, it happens, and a pre-analysis plan is an ideal reminder to get back into the project: a PAP can be like a good lab journal or good documentation of our data and analysis — a reminder to our future selves.
We probably all know that pre-registration of experiments is a good thing. It’s a real solution to what is increasingly called ‘p-hacking’: doing analyses until you find a statistically significant association (which you then report).
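The multiplicity at the heart of p-hacking is easy to simulate: under the null hypothesis, p-values are uniformly distributed, so trying several analyses and reporting the best one inflates the false-positive rate far beyond the nominal 5%. A toy sketch (the figure of 20 analyses per study is an assumption for illustration):

```python
import random

random.seed(42)
ALPHA = 0.05
N_TESTS = 20        # analyses tried per "study" (illustrative assumption)
N_STUDIES = 100_000

# Under the null, each p-value is uniform on [0, 1]. A p-hacker runs
# many analyses and reports a result if ANY of them is "significant".
hacked_hits = 0
for _ in range(N_STUDIES):
    pvals = [random.random() for _ in range(N_TESTS)]
    if min(pvals) < ALPHA:
        hacked_hits += 1

print(f"single pre-registered test: ~{ALPHA:.0%} false positives")
print(f"best of {N_TESTS} analyses:      ~{hacked_hits / N_STUDIES:.0%} false positives")
# theory: 1 - 0.95**20, roughly 64%
```

With 20 tries, a “significant” finding turns up in roughly two out of three studies even when there is nothing to find — the Yahtzee player scores points almost every round.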
One problem is that most pre-registration protocols are pretty complicated, and as researchers in the social sciences we usually don’t have the inclination or incentives to follow complicated protocols typically designed for biomedical experiments. A probably more reasonable approach is AsPredicted: nine simple and straightforward questions, and a pre-registration that remains private until it is made public (but can be shared with reviewers).