Understanding p-hacking through Yahtzee?

P-values are hard enough to understand (they appear ‘magically’ on the screen), so how can we best communicate the problem of p-hacking? How about using Yahtzee as an analogy to explain the intuition behind p-hacking?

In Yahtzee, players roll five dice to make predetermined combinations (e.g. three of a kind, full house). On each turn they may roll up to three times, and they can lock dice between rolls. Importantly for the analogy, players decide which combination to use for the round only after rolling. (“I threw these dice, let’s see what combination fits best…”) This adds an element of strategy to the game and lets players optimize their expected (average) points.
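The “let’s see what combination fits best” step can be sketched in R. The helper `fitting_combinations()` is hypothetical and covers only a simplified subset of the scoring categories (no upper section, no chance, no joker rules). It lists every combination that a single roll of five dice satisfies, from which the player then picks.

```r
# Hypothetical helper: which (simplified) Yahtzee combinations does a roll fit?
fitting_combinations <- function(dice) {
  counts <- table(dice)                       # how often each face appears
  has_run <- function(run) all(run %in% dice) # are all faces of a run present?
  fits <- c(
    three_of_a_kind = max(counts) >= 3,
    four_of_a_kind  = max(counts) >= 4,
    full_house      = identical(sort(as.vector(counts)), c(2L, 3L)),
    small_straight  = has_run(1:4) || has_run(2:5) || has_run(3:6),
    large_straight  = has_run(1:5) || has_run(2:6),
    yahtzee         = max(counts) == 5
  )
  names(fits)[fits]                           # names of all fitting combinations
}

fitting_combinations(c(3, 3, 3, 5, 5))
```

For this roll the helper returns both `"three_of_a_kind"` and `"full_house"`, and the player is free to score whichever pays more.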

Compare this with pre-registration. Its equivalent in the game is a variant in which players choose a predetermined combination before throwing their dice (according to Wikipedia, this is actually a variant of the Yahtzee variant Yatzy; or is Yahtzee a variant of Yatzy? Whatever.). (“Now I’m going to try a full house. Let’s see if the dice play along…”)

If the implications are not clear enough, we can play a couple of rounds to see which approach gets higher scores. Clearly, the Yahtzee way leads to (significantly?) more points, and a much smaller chance of ending up with 0 points because we failed to get, say, the full house we announced before throwing the dice. Sadly, though, p-values are designed for the forced Yatzy variant.
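To translate the analogy back to p-values, here is a minimal simulation sketch in R. The function `simulate_phacking()` and its defaults are hypothetical, and it assumes Welch t-tests on normally distributed data. The null hypothesis is true for every outcome, so every “significant” result is a false positive. Testing only the announced outcome corresponds to the forced variant; testing all outcomes and keeping the smallest p-value corresponds to picking the combination after rolling. With five independent outcomes, theory predicts a false-positive rate of about 1 - 0.95^5 ≈ 0.23 for the second strategy, rather than 0.05.

```r
# Hypothetical simulation: no real effect on any outcome (null is always true)
simulate_phacking <- function(n_studies = 1000, n_outcomes = 5,
                              n_per_group = 30, seed = 42) {
  set.seed(seed)
  # each study tests every outcome; result is an n_outcomes x n_studies matrix
  p_values <- replicate(n_studies, {
    vapply(seq_len(n_outcomes), function(k) {
      # both groups drawn from the same distribution
      t.test(rnorm(n_per_group), rnorm(n_per_group))$p.value
    }, numeric(1))
  })
  c(
    # forced variant: only the announced (first) outcome counts
    preregistered = mean(p_values[1, ] < 0.05),
    # Yahtzee way: report whichever outcome gives the smallest p-value
    choose_after  = mean(apply(p_values, 2, min) < 0.05)
  )
}

simulate_phacking()
```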

Image: cc-by by Joe King

Defending the Decimals? Not so Fast!

In a recent article in Sociological Science, Jeremy Freese comes to the defence of ‘foolishly false precision’, as he calls it. To cut a short story even shorter, the paper argues for including the conventional three decimals when reporting research findings, at least as long as the research community continues to rely so much (too much) on p-values. The reason is that three decimals let readers recover precise p-values, whereas often it is simply reported whether the results were above or below a specific level of significance.
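Why the decimals matter for recovering p-values can be sketched in R. The helper `recover_p()` is hypothetical and assumes a large-sample Wald (z) test, so the two-sided p-value follows from the estimate divided by its standard error. The numbers are made up for illustration: the same coefficient reported with three decimals and with two gives noticeably different recovered p-values (roughly 0.06 versus roughly 0.10 here).

```r
# Two-sided p-value from an estimate and its standard error,
# assuming a large-sample Wald (z) test
recover_p <- function(estimate, std_error) {
  z <- estimate / std_error
  2 * pnorm(-abs(z))
}

recover_p(0.048, 0.026)  # reported with three decimals
recover_p(0.05, 0.03)    # the same numbers rounded to two decimals
```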

While I share the concerns presented in the paper, I think it may actually do more harm than good. Yes, in the academic literature, simply appearing more precise than one is will fool nobody with even a little statistical training. What we lose, however, by including tables with three or four decimals is communication. It is easier to see that 0.5 is bigger than 0.3 (and roughly by how much) than to compare, say, 0.4958 and 0.307. Cut decimals or keep them? I think we should do both: cut them as much as we can in the main text (graphics would be very strong contenders there), and keep them in the appendix or online supplementary material (as I argued a year ago; and if reviewers think otherwise, ignore them!). That’s exactly in the spirit of Jeremy Freese’s paper, I think: give those doing meta-analyses the numbers they need, while keeping the main text nice and clean.
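A minimal sketch of doing both in R, assuming the estimates sit in a data frame. The helper `split_precision()` is hypothetical: it rounds the numeric columns for the main text and writes the unrounded values to a CSV file for the supplementary material (a temporary file here). The term labels and standard errors are made up for illustration.

```r
# Hypothetical estimates: the two values from the text plus made-up standard errors
estimates <- data.frame(
  term      = c("a", "b"),
  estimate  = c(0.4958, 0.307),
  std_error = c(0.1213, 0.0894)
)

# Round numeric columns for the main text; keep full precision in a CSV supplement
split_precision <- function(est, digits_main = 1,
                            supplement_file = tempfile(fileext = ".csv")) {
  main <- est
  is_num <- vapply(main, is.numeric, logical(1))
  main[is_num] <- lapply(main[is_num], round, digits = digits_main)
  write.csv(est, supplement_file, row.names = FALSE)  # unrounded values
  list(main = main, supplement_file = supplement_file)
}

out <- split_precision(estimates)
out$main  # estimates shown as 0.5 and 0.3
```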

p<0.05 in Sweave

Here’s a very simple way to include p-levels in Sweave. Let’s assume you want to mention a correlation coefficient in your text: \Sexpr{} will do just that.

\Sexpr{round(cor.test(x, y)$estimate,2)}

You can easily include the p-level, too.

\Sexpr{round(cor.test(x, y)$p.value,2)}

Except that’s not how it’s usually done. Normally we report whether the p-value is smaller than a certain threshold, and by convention only a few thresholds are used (0.1, 0.05, 0.01, 0.001).

Enter a very simple function (I’d include this in my first Sweave block where I load the data):

plevel <- function(x, strict = FALSE) {
  # levels of p-values, for Sweave
  # strict cuts at 0.05, otherwise cuts at 0.1
  if (x > 0.1  && !strict) p <- "p>0.1"   # not significant
  if (x > 0.1  && strict)  p <- "p>0.05"  # not significant
  if (x <= 0.1 && !strict) p <- "p<0.1"   # significant at the 0.1 level
  if (x <= 0.1 && strict)  p <- "p>0.05"  # not significant
  # check from the loosest to the strictest threshold,
  # so that smaller p-values overwrite the label
  if (x <= 0.05)  p <- "p<0.05"   # significant
  if (x <= 0.01)  p <- "p<0.01"   # significant
  if (x <= 0.001) p <- "p<0.001"  # significant
  p
}


This automates the procedure, and the reported thresholds will always match the actual p-values. I could make this function simpler by leaving out the strict argument, or, of course, adjust the thresholds.
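One way to make it simpler is a sketch using base R’s `cut()`, which also handles a whole vector of p-values at once. The name `plevel_vec()` is hypothetical; it drops the strict argument and uses right-closed intervals, so each boundary value gets the same label as with the `<=` checks in `plevel()`.

```r
# Hypothetical vectorised variant of plevel(), without the strict argument
plevel_vec <- function(x) {
  as.character(cut(x,
    breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
    labels = c("p<0.001", "p<0.01", "p<0.05", "p<0.1", "p>0.1"),
    right  = TRUE))  # intervals are (a, b], matching the <= checks
}

plevel_vec(c(0.0005, 0.003, 0.03, 0.07, 0.4))
```

Adjusting the thresholds then means editing only the `breaks` and `labels` vectors, which must stay the same length minus one.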

So, here’s how I use this in Sweave: some text ($r=\Sexpr{round(cor.test(x, y)$estimate,2)}$, $\Sexpr{plevel(cor.test(x, y)$p.value)}$) some more text.
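Since the line above calls `cor.test()` twice, a small wrapper could run the test once and return the whole math-mode string, to be used as `\Sexpr{report_cor(x, y)}`. This is a hypothetical helper, not part of Sweave; it inlines the same thresholds as `plevel()` (non-strict) via `cut()` and formats r with a fixed number of decimals.

```r
# Hypothetical helper: run cor.test() once and format r plus its p-level
report_cor <- function(x, y, digits = 2) {
  test <- cor.test(x, y)
  level <- as.character(cut(test$p.value,
    breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
    labels = c("p<0.001", "p<0.01", "p<0.05", "p<0.1", "p>0.1")))
  r <- formatC(unname(test$estimate), format = "f", digits = digits)
  # dollar signs switch LaTeX into math mode, as in the example above
  sprintf("$r=%s$, $%s$", r, level)
}
```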

The dollar signs (math mode) mean that I get nice typography for the numbers and operators.