An Ode to Low R²

It’s the time of the year when many of us do our share of grading. In my case, it’s quantitative projects, and every time I’m impressed by how much the students learn. One thing that sometimes annoys me is how many of them (MA students) insist on interpreting R² in absolute terms (rather than using it to compare similar models, for instance). That’s something they seem to learn in their BA course:

[in this simple model with three predictor variables], we only explain 3% of the variance; it’s a ‘bad’ model.

I paraphrased, of course. But I have started to like low R² values: they are a testament to the complexity of humans and their social world. They are a testament to the fact that we are not machines, and that we live in a world where quantitative analysis is about tendencies. Just imagine a world in which, knowing your age and gender, I could perfectly predict your political preferences… So there you have it: low R² values are great!
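To make the point concrete, here is a minimal sketch in R with simulated data. The variable names, effect sizes, and noise level are all invented for illustration and are not taken from any student project. The three predictors have real effects on the outcome, but individual variation dominates, so R² should come out at only a few percent while the tendencies are still estimated clearly.

```r
# Simulated example: real tendencies, lots of individual variation.
# All names and numbers below are assumptions made for illustration.
set.seed(1)
n <- 5000
age       <- runif(n, 18, 90)
female    <- rbinom(n, 1, 0.5)
education <- sample(1:5, n, replace = TRUE)

# The outcome depends on all three predictors, plus a large noise term
outcome <- 0.02 * age + 0.3 * female + 0.2 * education + rnorm(n, sd = 3)

fit <- lm(outcome ~ age + female + education)
r2  <- summary(fit)$r.squared  # share of variance explained
ci  <- confint(fit)            # 95% confidence intervals for the coefficients

print(round(r2, 3))
print(round(ci, 3))
```

A low R² here says that people differ a lot, not that the model is ‘bad’: the confidence intervals show whether the tendencies themselves are estimated with reasonable precision.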

Stuff we do here: TV report on discrimination (in French)

Here’s a short TV report (in French) on discrimination in the labour market in Switzerland, drawing on a field study by my (former) colleagues at Neuchâtel:

Swiss nationals of foreign origin have to send 30% more applications to get a job interview, according to a study by the University of Neuchâtel. Some communities are more affected, particularly people with family names from the Balkans.

Contour Plot Breaks Off?

Today I experimented with the good old contour plots in R. I plotted my points rather large, because there is quite some uncertainty around their precise placement. In this particular case, I start with an empty plot and a custom range, and add the points separately. Note the cex=8 to draw extra large points.

plot(c(80, 740), c(180, 740) , type='n', xlab="", ylab="", bty="n", main="")
points(jitter(x), jitter(y), cex=8, pch=19, col="#AA449950")

Then I added contours, and they were cut off, breaking off where I expected them to go around the dots. Why are there incomplete lines at the top and bottom?

It turns out (a.k.a. read the manual) that kde2d, from the MASS package, sets its default limits to the range of the data: lims = c(range(x), range(y)). I guess this is quite reasonable in other cases. Now my big dots obviously cover more than the strict range of values, so all I needed to do was set my own lims in kde2d.
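A quick way to check this behaviour is to compare the grid that kde2d returns with and without custom lims. This is only a sketch: the coordinates below are made up and stand in for the real x and y.

```r
library(MASS)  # kde2d() lives in the MASS package

set.seed(42)
# Made-up coordinates standing in for the real data
x <- runif(30, 200, 600)
y <- runif(30, 300, 650)

z_default <- kde2d(x, y, n = 50)                               # default lims
z_custom  <- kde2d(x, y, n = 50, lims = c(80, 740, 180, 740))  # custom lims

# The evaluation grid spans exactly the limits that were used
print(range(z_default$x))  # same as range(x)
print(range(z_custom$x))
print(range(z_custom$y))
```

With the default limits the density is only evaluated between the smallest and largest data values, so contour() has nothing to draw beyond them.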

Here’s the entire code for the plot:
library(MASS) # provides kde2d()
# x and y hold the point coordinates
plot(c(80, 740), c(180, 740) , type='n', xlab="", ylab="", bty="n", main="")
points(jitter(x), jitter(y), cex=8, pch=19, col="#AA449950")
# z = kde2d(x, y, n=50) # this one didn't work out: default lims = c(range(x), range(y))
z = kde2d(x, y, n=50, lims=c(80, 740, 180, 740)) # lims set to the plot range
contour(z, drawlabels=FALSE, nlevels=6, col="#AA4499", add=TRUE)

C4P: Workshop on Survey Experiments in Migration and Integration Research

Flavia Fossati and I are organizing an international workshop on “Survey Experiments in Migration and Integration Research” and would like to cordially invite you to contribute a paper to this event, which will be hosted at the University of Lausanne (IDHEAP) on June 4-5th 2020.

This is the third in a series of international workshops, previously held in Switzerland at the Universities of Lausanne and Berne, that aim to gather experts on the topic for in-depth discussions of their work in progress.

In this edition of the workshop on Survey Experiments in Migration and Integration Research, some panels will focus on the methodology of survey experiments, and others on the immigration and integration research carried out by means of such experimental methods.

The event will be accompanied by two keynote speeches, one by Prof. Katrin Auspurg (University of Munich) and one by Prof. Donald Green (Columbia University).

Please apply by following this link:

Deadline February 15th 2020.