p-hacking: try it yourself!

It’s not new, but it’s still worth sharing:

The instructions go: “You’re a social scientist with a hunch: The U.S. economy is affected by whether Republicans or Democrats are in office. Try to show that a connection exists, using real data going back to 1948. For your results to be publishable in an academic journal, you’ll need to prove that they are ‘statistically significant’ by achieving a low enough p-value.”

The tool is here: https://projects.fivethirtyeight.com/p-hacking/

For more on p-hacking, and on why “success” in the exercise above is not what it seems, see Wikipedia.
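To see why, here is a minimal simulation in R. It is not how the FiveThirtyEight tool works; it simply assumes an outcome that is pure noise, 20 hypothetical candidate predictors (also pure noise), and a researcher who reports only the smallest p-value. Under these made-up settings, about 1 - 0.95^20, or roughly 64%, of “studies” find a significant effect that does not exist.

```r
# Simulated p-hacking: the outcome is unrelated to every predictor, yet
# trying many specifications and reporting the best one often "works".
set.seed(1948)

n_obs   <- 70    # hypothetical sample size per study
n_specs <- 20    # hypothetical number of specifications tried
n_sims  <- 2000  # number of simulated studies

hacked_p <- replicate(n_sims, {
  y <- rnorm(n_obs)                           # outcome: pure noise
  X <- matrix(rnorm(n_obs * n_specs), n_obs)  # candidate predictors: pure noise
  # p-value of the correlation test for each candidate predictor
  p <- apply(X, 2, function(x) cor.test(x, y)$p.value)
  min(p)                                      # report only the "best" one
})

mean(hacked_p < 0.05)  # share of studies that "find" an effect
1 - 0.95^n_specs       # expected share for independent tests at alpha = 0.05
```

With a single pre-specified test the false-positive rate would stay at 5%; it is the freedom to keep trying that inflates it.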

Anonymizing Microdata

In the age of data linkage, protecting microdata is as relevant as ever. Fortunately, there are R packages available to help:

That’s another excuse for not sharing data busted.

Replication as learning

As part of the course on applied statistics I’m teaching, my students have to try to replicate a published paper (or, more typically, part of its analysis). It’s an excellent opportunity to apply what they have learned in the course, and probably the best way to teach researcher degrees of freedom and how much (or how little) we should trust the results of a single study. It’s also an excellent reminder to do better than much of the published research at reporting the details of the analyses we undertake. Common problems include undocumented case selection (where not everyone remains in the sample), opaque recoding of variables, and variables that are never described. An interesting case is the difference between what the authors wanted to do (e.g. restrict the sample to voters) and what they apparently did (e.g. forgot to do so). One day, I hope this exercise will become obsolete: the day my students can simply download the replication code…

Image: CC-by-nd Tina Sherwood Imaging https://flic.kr/p/8iz7qS

Access to knitr cache

What’s the easiest way to access the knitr cache? Alastair Rushworth had the answer: all we need is the lazyLoad() function, plus some detective work to find the right cache files (but you’re labelling your chunks, aren’t you?).

So: lazyLoad("name-of-that-cached-rdx-file-in-the-cache"), and note that there is no file extension here: lazyLoad() takes the file base shared by the .rdb and .rdx files…
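The detective work can be wrapped in a small helper. This is a sketch, assuming knitr’s usual cache layout in which each cached chunk is stored as files named <label>_<hash>.rdb/.rdx/.RData in the cache directory; the function name load_knitr_cache() is hypothetical. It loads the objects into a fresh environment so they cannot overwrite anything in the global workspace.

```r
# Hypothetical helper: lazy-load the cached objects of one knitr chunk.
# Assumes cache files are named <label>_<hash>.rdx / .rdb in cache_dir.
load_knitr_cache <- function(cache_dir, label) {
  # All lazy-load index files in the cache directory
  rdx <- list.files(cache_dir, pattern = "\\.rdx$", full.names = TRUE)
  base <- basename(rdx)
  prefix <- paste0(label, "_")
  # What follows "<label>_" must be a hex hash, so that a label such as
  # "fit" does not also match the cache of a chunk labelled "fit_model"
  rest <- substring(base, nchar(prefix) + 1)
  rdx <- rdx[startsWith(base, prefix) & grepl("^[0-9a-f]+\\.rdx$", rest)]
  if (length(rdx) == 0) stop("no cache found for chunk '", label, "'")
  if (length(rdx) > 1) warning("several caches match; using the most recent")
  rdx <- rdx[which.max(file.mtime(rdx))]
  # lazyLoad() wants the file base: the path without .rdx/.rdb
  filebase <- sub("\\.rdx$", "", rdx)
  env <- new.env()
  lazyLoad(filebase, envir = env)
  env
}
```

Usage, with a hypothetical cache directory and chunk label: `cached <- load_knitr_cache("mydoc_cache/html", "model-fit")`, then `ls(cached)` to see what the chunk created.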

Comment on Reproducibility

There’s a ‘technical comment’ in Science on a recent paper that stirred quite a debate by suggesting that the reproducibility of psychological science is quite low. The comment argues that once the original paper’s results are corrected for error, power, and bias, little evidence remains for a reproducibility crisis. As always in Science, short and to the point. And there’s a response to the comment, too.

Gilbert, Daniel T., Gary King, Stephen Pettigrew, and Timothy D. Wilson. 2016. ‘Comment on “Estimating the Reproducibility of Psychological Science”’. Science 351 (6277): 1037–1037. doi:10.1126/science.aad7243.