You’re shown a simple scatter plot and enter the correlation coefficient you think goes with it. If you’re close enough, you get coins; if you’re too far off, you lose a heart. There’s even a two-player mode. Basic gaming stuff, but you also build an intuition for what those correlation coefficients we throw around all the time actually mean.
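If you want to train the same intuition in R, here is a minimal sketch. It is not necessarily how the game generates its plots. It simulates two standard normal variables whose population correlation is exactly rho, using y = rho * x + sqrt(1 - rho^2) * e. It then draws the scatter plot and reveals the sample correlation only after you have made your guess. The helper simulate_correlated() and the values n = 500 and rho = 0.6 are hypothetical, illustrative choices.

```r
# Sketch: draw a scatter plot with a chosen correlation, guess, then check.
# simulate_correlated() is a hypothetical helper, not from any package.
simulate_correlated <- function(n, rho) {
  x <- rnorm(n)
  # Mix x with independent noise; var(y) = 1 and cov(x, y) = rho,
  # so the population correlation of x and y is exactly rho
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  data.frame(x = x, y = y)
}

set.seed(1)  # fixed seed so the example can be repeated
d <- simulate_correlated(n = 500, rho = 0.6)
plot(d$x, d$y, xlab = "x", ylab = "y", main = "Guess the correlation")

# Reveal the sample correlation after making your guess
r <- cor(d$x, d$y)
print(round(r, 2))
```

The sample correlation will be close to, but not exactly, the rho you asked for.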

There’s more, though. The game also serves (another) serious purpose: Omar Wagih is collecting the data to analyse how we mortals perceive correlations in scatter plots.

Adding text labels to a scatter plot in R is easy. The basic function is text(), and here’s an example of how you can use it to create these plots.

For the example, I’m creating random data. Since the data are random, your plots will look different. In this fictitious example, I look at the relationship between a policy indicator and performance. It is conventional to put the outcome variable on the Y axis and the predictor on the X axis, but in this example there’s no relationship to reality anyway… I set minimum and maximum values for the random variables because I originally jotted down this code as an explanation for a replication. In this example, we have 25 observations, for 25 units I call “cantons”. The third line creates a character vector with the letters “A” to “Y”; these are the labels!
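A minimal sketch of the data step described above might look like this. The minimum and maximum values are placeholders (the original ones aren’t given here), and the names policy, performance and cantons are hypothetical. The set.seed() call is optional: it makes the random draws repeatable, and without it your plots will differ every time.

```r
set.seed(123)                                 # optional: repeatable random draws
policy <- runif(25, min = 0, max = 100)       # placeholder min/max values
performance <- runif(25, min = 1, max = 6)    # placeholder min/max values
cantons <- LETTERS[1:25]                      # labels "A" to "Y"

# Quick look at the first few cantons
head(data.frame(cantons, policy, performance))
```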

For the scatter plot on the left, we use plot(). Then we add the trend line with abline() and lm(). To add the labels, we use text(): the first argument gives the X value of each point, the second the Y value (so R knows where to place the text), and the third the corresponding label. The argument pos=1 tells R to draw the label below the point; pos=2, pos=3 and pos=4 place it to the left, above and to the right instead.
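A minimal sketch of the left-hand plot, again with placeholder data and hypothetical variable names: plot() draws the points, abline(lm(...)) adds the linear trend line, and text() places each label below its point.

```r
# Placeholder data (hypothetical ranges)
set.seed(123)
policy <- runif(25, min = 0, max = 100)
performance <- runif(25, min = 1, max = 6)
cantons <- LETTERS[1:25]

plot(policy, performance, xlab = "Policy indicator", ylab = "Performance")
abline(lm(performance ~ policy))       # trend line from a simple regression
text(policy, performance, cantons, pos = 1)  # x, y, label; pos = 1 is below
```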

The scatter plot on the right is similar, but here we plot the labels instead of the dots. There are two differences in the code: First, we add type="n" to set up the scatter plot without drawing any points (an empty plot, if you will). Second, when we add the text in the third line of the code, we leave out pos=1, because we want the labels centred exactly where the points would be.
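The same sketch for the right-hand plot, with the same placeholder data and hypothetical names. type = "n" sets up the axes and scales without drawing points. text() is called without pos, so each label is centred on its coordinates.

```r
# Placeholder data (hypothetical ranges)
set.seed(123)
policy <- runif(25, min = 0, max = 100)
performance <- runif(25, min = 1, max = 6)
cantons <- LETTERS[1:25]

plot(policy, performance, type = "n",  # empty plot: axes only, no points
     xlab = "Policy indicator", ylab = "Performance")
abline(lm(performance ~ policy))       # same trend line as before
text(policy, performance, cantons)     # no pos: labels sit on the points
```

To get the two versions side by side, call par(mfrow = c(1, 2)) before the first plot() call.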

Today I spent quite some time trying to figure out why I couldn’t use the scatterplot function (from the car package) for one specific variable, while it worked for every other variable. I got stuck at the error “Error in if (transform != FALSE | length(transform) == ncol(x)) { : argument is of length zero.”

It was only when I used str() to examine the structure of the variables that I realized the scatterplot function does not work with the output of scale(). Normally, I could use scatterplot(y~x | country). In this particular case, however, I used x <- scale(x1) + scale(x2) +... to create the new variable. scatterplot(y~x1 | country) worked perfectly, as did scatterplot(y~x2 | country), but the scaled variable did not. It turns out that scale() does not return a plain numeric vector: it returns a one-column matrix with additional attributes (“scaled:center” and “scaled:scale”), and this apparently breaks the scatterplot function. Once I knew this, the solution seemed obvious: strip this additional information with as.numeric(), as in scatterplot(y~as.numeric(x) | country).
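A minimal base-R sketch of what str() reveals, using made-up values for x1 and x2 (hypothetical, not the original data). scale() returns a one-column matrix that carries scaling attributes. The sum of two scaled variables is still such a matrix, and as.numeric() turns it back into a plain vector.

```r
# Hypothetical example variables
x1 <- c(3, 5, 8, 2, 7, 4)
x2 <- c(10, 14, 9, 12, 15, 11)

# Sum of standardised variables, as in x <- scale(x1) + scale(x2) + ...
x <- scale(x1) + scale(x2)
str(x)          # a 6 x 1 matrix with "scaled:center"/"scaled:scale" attributes
is.matrix(x)    # TRUE

# Drop the matrix dimensions and all extra attributes
x_num <- as.numeric(x)
str(x_num)      # a plain numeric vector of length 6
```

With car loaded, x_num (or as.numeric(x) inside the formula) can then be passed to scatterplot() like any other numeric variable.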