Plot and boxplot in R without borders

R can be pretty counter-intuitive at times, usually for historical reasons. Here’s one I’ve forgotten several times: drawing a plot and a boxplot without the border/box around it. The default (top row) is the plot with a border/box around it.

For the plot() function, we need the argument bty="n", that’s the box type "n" (for none, I guess). For the boxplot, we need frame=FALSE. As far as I can tell, this is passed on to bxp(), which does the actual drawing and has an argument frame.plot, so the effect is the same even though it does not set bty itself. Oddly enough, frame=FALSE is not mentioned in the help on the boxplot() function.

The precise code for these plots: a bunch of values in myvalues, some random values in xrandom to spread out the points, and then this:

par(mfrow=c(2,2)) # 2 x 2 plots
plot(myvalues ~ xrandom, xlim=c(0.5, 1.5), xlab="", ylab="")
boxplot(myvalues, xlim=c(0.5, 1.5), xlab="", ylab="")
plot(myvalues ~ xrandom, xlim=c(0.5, 1.5), xlab="", ylab="", bty="n")
boxplot(myvalues, xlim=c(0.5, 1.5), xlab="", ylab="", frame=FALSE)
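
To try this without my data, here is a minimal sketch with made-up values. The numbers in myvalues and xrandom are simulated (an assumption, not the values behind the figure), and I only draw the two borderless versions side by side:

```r
set.seed(1)                                   # reproducible made-up data
myvalues <- rnorm(30, mean = 10, sd = 2)      # stand-in for the real values
xrandom  <- runif(30, min = 0.8, max = 1.2)   # only used to spread the points horizontally

par(mfrow = c(1, 2))                          # 1 x 2 plots
# scatter plot without the surrounding box
plot(myvalues ~ xrandom, xlim = c(0.5, 1.5), xlab = "", ylab = "", bty = "n")
# boxplot without the surrounding frame
boxplot(myvalues, xlim = c(0.5, 1.5), xlab = "", ylab = "", frame = FALSE)
```

The axes are still drawn in both cases; only the box around the plotting region goes away.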

The Politicization of Immigration in Portugal between 1995 and 2014: A European Exception?

Out now, an extension of the SOM project to Portugal.

Notwithstanding the doubling of the foreign population settled in the country in the early 2000s, the diminished salience and the absence of significant political conflict suggest that immigration failed to become politicized in Portugal.

Happy to see this published, and excellent to see my figures in print so that we can directly compare the results for Portugal with the other seven countries in the original SOM project and the SOM book.

Carvalho, João, and Mariana Carmo Duarte. 2020. ‘The Politicization of Immigration in Portugal between 1995 and 2014: A European Exception?’ JCMS: Journal of Common Market Studies. https://doi.org/10.1111/jcms.13048.
Van der Brug, Wouter, Gianni D’Amato, Joost Berkhout, and Didier Ruedin, eds. 2015. The Politicisation of Migration. Abingdon: Routledge.

Visualize correlations in R

There are rare cases when a graphic is not better than a table to help us understand our quantitative results. One simple yet common table we’re staring at ever so often is a table of correlation coefficients: how strongly do different variables correlate with one another? We’re scanning the table for numbers close to +1 and close to -1, but there’s a better way: visualize!

The R package corrplot offers a ready-made solution:

library(corrplot)
dat=matrix(c(0.11128257, -0.38968561, 0.11765272, -0.07089879, -0.19715366, -0.48083950, 0.54760745, -0.49410370, -0.42443391), nrow=3)
corrplot(dat)

Here we load the corrplot package and create some data so that we can plot something; normally this would be a correlation matrix for a selection of variables. Then we simply call corrplot() and we’re done.
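
With real data, the matrix would typically come from cor(). A minimal sketch, assuming the built-in mtcars data and four columns I picked for illustration (neither is from the example above), and assuming corrplot is installed:

```r
# pick a few numeric variables from a built-in data set (illustrative choice)
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# correlation matrix: symmetric, values between -1 and +1, ones on the diagonal
cormat <- cor(vars)

# only draw if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cormat)
}
```

Because cor() keeps the column names, the plot is labelled with the variable names rather than plain numbers.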

There are many ways to tweak the plots, but in all versions we get a quicker and better overview of the variables that correlate than staring at a large table.

Here are some variants of the above:

par(mfrow=c(2,2))
corrplot(dat, method = "shade")
corrplot(dat, diag=FALSE)
corrplot(dat, method = "square")
corrplot(dat, method = "number")

Contour Plot Breaks Off?

Today I experimented with the good old contour plots in R. I plotted my points rather large, because there is quite some uncertainty around their precise placement. In this particular case, I start with an empty plot and a custom range, and add the points separately. Note the cex=8 to draw extra large points.

plot(c(80, 740), c(180, 740) , type='n', xlab="", ylab="", bty="n", main="")
points(jitter(x), jitter(y), cex=8, pch=19, col="#AA449950")

Then I added contours, and they were cut off, breaking off where I expected them to go around the dots. Why are there incomplete lines at the top and bottom?

It turns out (a.k.a. read the manual) that kde2d sets the default limits to the range of the data: lims = c(range(x), range(y)). I guess this is quite reasonable in other cases. Now my big dots obviously cover more than the strict range of values, so all I needed to do was set my own lims in kde2d.

Here’s the entire code for the plot:

plot(c(80, 740), c(180, 740) , type='n', xlab="", ylab="", bty="n", main="")
points(jitter(x), jitter(y), cex=8, pch=19, col="#AA449950")
library(MASS)
# z = kde2d(x, y, n=50) # this one didn't work out
z = kde2d(x, y, n=50, lims=c(80, 740, 180, 740))
contour(z, drawlabels=FALSE, nlevels=6, col="#AA4499", add=TRUE)
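
To see the effect of lims without my data, here is a minimal sketch with simulated points. The values of x and y are made up (an assumption), while the limits are the ones used above:

```r
library(MASS)

set.seed(1)                      # reproducible made-up coordinates
x <- runif(40, 200, 600)
y <- runif(40, 300, 650)

# default: the density grid stops at the most extreme data points
z_default <- kde2d(x, y, n = 50)
# custom limits: the grid covers the whole plotting region
z_wide <- kde2d(x, y, n = 50, lims = c(80, 740, 180, 740))

range(z_default$x)               # same as range(x)
range(z_wide$x)                  # spans 80 to 740

plot(c(80, 740), c(180, 740), type = "n", xlab = "", ylab = "", bty = "n", main = "")
points(x, y, cex = 8, pch = 19, col = "#AA449950")
contour(z_wide, drawlabels = FALSE, nlevels = 6, col = "#AA4499", add = TRUE)
```

contour() can only draw where the grid has values, so with z_default the lines stop at the edge of the data range.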

Correlations Graphics in R

Correlations are some of the basics in quantitative analysis, and they are well suited for graphical examination. Using plots we can see whether it is justified to assume a linear relationship between the variables, for example. Scatter plots are our friends here, and with two variables it is as simple as calling plot() in R:

plot(var1, var2)

If we have more than two variables, it can be useful to plot a scatter plot matrix: multiple scatter plots in one go. The pairs() command is built in, but in my view not the most useful one out there. Here we use cbind() to combine a few variables, and specify that we don’t want to see the same scatter plots (rotated) in the upper panel.

pairs(cbind(var1, var2, var3, var4) , upper.panel=NULL)
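
A minimal sketch to make this runnable: the four variables are simulated here (an assumption, they are not from any real data set), with var2 built to correlate positively and var3 negatively with var1:

```r
set.seed(1)                               # reproducible made-up data
var1 <- rnorm(100)
var2 <- var1 + rnorm(100, sd = 0.5)       # positively related to var1
var3 <- -var1 + rnorm(100)                # negatively related to var1
var4 <- rnorm(100)                        # unrelated noise

vars <- cbind(var1, var2, var3, var4)

# scatter plot matrix, lower panels only
pairs(vars, upper.panel = NULL)

# the corresponding correlation coefficients, for comparison with the plots
round(cor(vars), 2)
```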

A more flexible method is provided by the car package with the scatterplotMatrix() function. If this is not flexible enough, we can always split the plot and draw whatever we need, but that’s not for today.

library(car)
scatterplotMatrix(cbind(var1, var2, var3, var4))

If we have many more variables, it’s necessary to draw multiple plots to be able to see what is going on. However, sometimes after having checked that the associations are more or less linear, we’re simply interested in the strength and direction of the correlations for many combinations of variables. I guess the classic approach is staring at a large table of correlation coefficients, but as is often the case, graphics can make your life easier, in this case library(corrplot):

library(corrplot)
# object_with_many_variables needs to be a correlation matrix, e.g. the output of cor()
corrplot(object_with_many_variables, method="circle", type="lower", diag=FALSE)

This is certainly more pleasant than staring at a table…

For all these commands, R offers plenty of ways to tweak the output.