Here’s a link to a blog post by Andrew Gelman that deserves to be read more widely. It reports on a paper in a field remote to what I’m doing, but the issues is about correlation coefficients — a staple in much of what we do. Apparently the authors of the paper must have thought that a correlation coefficient of 0.02 is not enough to get published, and resorted to binning the data. Binning data in itself is not a bad thing, it can be quite useful for graphs, for instance. However, they then calculated and report the correlation of the binned data. Not so miraculously, the correlation coefficient increases; they average out unexplained variance.
Maybe I really should change the framing of my statistics course to focus on how to lie and cheat with statistics: I guess the students will learn just as much about good statistics this way.