## Guess the Correlation!

Here’s a fantastic way to kill a few minutes and better understand correlation coefficients: http://guessthecorrelation.com/

You’re shown a simple scatter plot and enter the correlation coefficient you think goes with it. If you’re close enough, you get coins; if you’re too far off, you lose a heart. There’s even a two-player mode. Basic gaming stuff, but you also build an intuition for what those correlation coefficients we’re throwing around all the time actually mean.

There’s more, though. The game also serves (another) serious purpose: Omar Wagih is collecting the data to analyse how we mortals perceive correlations in scatter plots.
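If you want to practise the same guessing offline, here is a minimal sketch in R. It is not how the game generates its plots (that is not described here); it simply assumes two normally distributed variables with a population correlation `rho` that you pick yourself. The names `n`, `rho`, `x`, `y` and `r` are made up for this illustration.

```r
# Sketch: simulate a scatter plot with a chosen population correlation.
# All names and values below are illustrative assumptions.
set.seed(42)    # arbitrary seed so the same plot comes back each run
n   <- 100      # number of points
rho <- 0.6      # population correlation to aim for

x <- rnorm(n)
# Mix x with independent noise so that cor(x, y) is rho in expectation
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

plot(x, y, bty = "n", main = "Guess the correlation")

# The sample correlation differs a little from rho
r <- cor(x, y)
print(round(r, 2))
```

Make your guess from the plot first, then compare it with the printed value.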

## How to add text labels to a scatter plot in R?

Adding text labels to a scatter plot in R is easy. The basic function is `text()`, and here’s a reproducible example of how you can use it:

For the example, I’m creating random data, so your plots will look different from mine. In this fictitious example, I look at the relationship between a policy indicator and performance. It is conventional to put the outcome variable on the Y axis and the predictor on the X axis, but in this example there’s no relationship to reality anyway… The reason I chose min and max values for the random variables is that I jotted down this code as an explanation for a replication. We have 25 observations, for 25 units I call “cantons”. The third line creates a character vector of the letters “A” to “Y”; these are the labels!

```r
policy = runif(25, min=0.4, max=0.7)
perfor = runif(25, min=500, max=570)
canton = sapply(65:89, function(x) rawToChar(as.raw(x)))
```
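Because the data are random, every run gives a different plot. A minimal sketch, assuming you want the same draw each time: call `set.seed()` before generating the data. The seed value 123 is arbitrary and not part of the original code. The sketch also shows that the built-in constant `LETTERS` gives the same labels as the `sapply()` line; `canton_alt` is a name made up for the comparison.

```r
# Sketch: same data generation, but repeatable thanks to a fixed seed.
set.seed(123)  # arbitrary seed (an assumption, not in the original code)

policy <- runif(25, min = 0.4, max = 0.7)
perfor <- runif(25, min = 500, max = 570)

# Original approach: ASCII codes 65 to 89 converted to characters
canton <- sapply(65:89, function(x) rawToChar(as.raw(x)))

# Shorter alternative using the built-in LETTERS constant
canton_alt <- LETTERS[1:25]

# Both approaches yield the labels "A" to "Y"
print(all(canton == canton_alt))
```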

For the scatter plot on the left, we use `plot()`. Then we add the trend line with `abline()` and `lm()`. To add the labels, we use `text()`: the first argument gives the X value of each point, the second argument the Y value (so R knows where to place the text), and the third argument is the corresponding label. The argument `pos=1` tells R to draw the label underneath the point; the other values change that position (`pos=2` to the left, `pos=3` above, `pos=4` to the right).

```r
plot(policy ~ perfor, bty="n", ylab="Policy Indicator", xlab="Performance", main="Policy and Performance")
abline(lm(policy ~ perfor), col="red")
text(perfor, policy, canton, pos=1)
```
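To see what the different `pos` values do, here is a small sketch that labels four dummy points, one for each position. The coordinates and the names `x` and `y` are arbitrary choices for this illustration.

```r
# Sketch: one dummy point per pos value (coordinates are arbitrary)
x <- 1:4
y <- rep(1, 4)

plot(x, y, xlim = c(0, 5), ylim = c(0, 2), pch = 19, bty = "n",
     xlab = "", ylab = "", main = "Label positions")

# Place each label relative to its point using pos = 1, 2, 3, 4
for (p in 1:4) {
  text(x[p], y[p], labels = paste0("pos=", p), pos = p)
}
```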

The scatter plot on the right is similar, but here we actually plot the labels instead of the dots. There are two differences in the code: First, we add `type="n"` to create the scatter plot without actually drawing any circles (an empty plot if you will). Second, when we add the text in the third line of the code, we do not have `pos=1`, because we want to place the labels exactly where the points are.

```r
plot(policy ~ perfor, bty="n", type="n", ylab="Policy Indicator", xlab="Performance", main="Policy and Performance")
abline(lm(policy ~ perfor), col="red")
text(perfor, policy, canton)
```
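The left/right arrangement mentioned above can be produced in base graphics with `par(mfrow = ...)`. The following is a sketch under that assumption. It regenerates the random data so that it runs on its own, and `op` is just a name for the saved graphics settings.

```r
# Sketch: draw both variants side by side in one figure.
policy <- runif(25, min = 0.4, max = 0.7)
perfor <- runif(25, min = 500, max = 570)
canton <- LETTERS[1:25]            # labels "A" to "Y"

op <- par(mfrow = c(1, 2))         # 1 row, 2 columns; keep old settings

# Left: points with labels underneath
plot(policy ~ perfor, bty = "n", ylab = "Policy Indicator",
     xlab = "Performance", main = "Points and labels")
abline(lm(policy ~ perfor), col = "red")
text(perfor, policy, canton, pos = 1)

# Right: labels drawn in place of the points
plot(policy ~ perfor, bty = "n", type = "n", ylab = "Policy Indicator",
     xlab = "Performance", main = "Labels only")
abline(lm(policy ~ perfor), col = "red")
text(perfor, policy, canton)

par(op)                            # restore the previous layout
```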

## scatterplot() with scales

Today I spent quite some time trying to figure out why I couldn’t use the `scatterplot` function (from the package `car`) for one specific variable, while it worked for every other variable. I got stuck at the error “Error in if (transform != FALSE | length(transform) == ncol(x)) { : argument is of length zero”.

It was only when I used `str` on the variables to examine their structure that I realized that the `scatterplot` function does not work with the output of `scale`. Normally I could use `scatterplot(y~x | country)`. In this particular case, I had used `x <- scale(x1) + scale(x2) +...` to create the new variable. `scatterplot(y~x1 | country)` worked perfectly, as did `scatterplot(y~x2 | country)`, but the scaled variable did not. It turns out that `scale` returns a matrix and attaches additional information as attributes (the centering and scaling values), which breaks the `scatterplot` function. Once I knew this, the solution seemed obvious: strip this additional information with `as.numeric`: `scatterplot(y~as.numeric(x) | country)`.
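The difference is easy to inspect in base R without `car`. The sketch below uses made-up data; `x1`, `x2` and `x_plain` are placeholder names, and only two scaled variables are added. It does not call `scatterplot` and does not try to reproduce the error, which may depend on the installed version of `car`.

```r
# Sketch: what scale() returns, and what as.numeric() strips away.
set.seed(1)                 # arbitrary seed for made-up data
x1 <- rnorm(10)
x2 <- rnorm(10)

x <- scale(x1) + scale(x2)  # sum of two scaled variables

str(x)                      # a one-column matrix with extra attributes
print(is.matrix(x))
print(names(attributes(x)))

x_plain <- as.numeric(x)    # plain numeric vector, attributes dropped
str(x_plain)
print(is.null(attributes(x_plain)))
```

The values are unchanged by `as.numeric`; only the matrix dimensions and the scaling attributes are removed.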