A colleague recently commented that he is confused about where I stand with regard to the academic use of MIPEX data. Apparently I have been both rather critical and quite enthusiastic about it. I guess this sums it up quite well. I’ve always been critical of the (historical) lack of a theoretical base for the indicators used, and of the often uncritical use of the aggregate scores as indicators of ‘immigration policy’ in the literature. I’m enthusiastic about its coverage (compared to other indices), the effort to keep it up-to-date, and the availability of the detailed data.

A few years back, I verified that it is OK to use the MIPEX as a scale (as is often done), while highlighting redundancy in the items and showing that such scales could be improved.

In the context of the SOM project, we demonstrated that it is feasible to expand the MIPEX indicators back in time. We did so for seven countries back to 1995. I refined these data by using the qualitative descriptions provided to identify the year of each change, giving year-on-year changes since 1995 for the seven SOM countries. These data are experimental in that they rely on the documentation rather than original research. If that’s not enough, Camilla and I then created a complete time series of the MIPEX indicators in Switzerland since 1848. This showed that we definitely can go back in time, but also that quite a few of the things MIPEX measures were not regulated a century ago.

Even with the short time span covered by the SOM data, these data are quite insightful.

Later I proposed a different approach: re-assembling! The idea is generic and does not apply to the MIPEX alone: make use of the many indicators in the database, but use your own theory to pick and choose the ones you consider most appropriate (rather than being constrained by the presentation in the MIPEX publications). I have demonstrated that the MIPEX data can be used to closely approximate the Koopmans et al. data, while covering a wider range of countries and capturing changes over time. Now we can have both theory and coverage!

And yes, we can apply these data to gain new insights, like the nature of the politicization of immigrant groups.


Inspired by a reference to using MIPEX data in Anna Zamora-Kapoor, Petar Kovincic, and Charles Causey’s review on anti-foreigner sentiments, I decided to post a few comments. Basically I agree with the authors on the benefits of systematic comparative data, but this does not necessarily lead to a blanket recommendation of MIPEX data.

MIPEX data have many advantages, including relatively wide coverage and the fact that they provide measures over time (even more so for some countries).

The history of the MIPEX means that it is probably not as soundly based on theory as we would want it to be in academic research (i.e. if we want to use it as a scale, which was not its original purpose). For a number of indicators it is not entirely clear why they were chosen. That said, most of the indicators seem to hold up relatively well empirically. By relatively well I mean that MIPEX could be used as a scale, but it could be improved in several ways. In particular, fewer indicators would suffice.

On a different note, I have reservations about measurement invariance: do the different MIPEX indicators actually measure the same thing in different countries? As we are looking at aggregate data, empirical tests such as confirmatory factor analysis (CFA) do not apply.

There are other similar indicators, nicely summarized in Koopmans, R., I. Michalowski, and S. Waibel. 2012. “Citizenship rights for immigrants: National political processes and cross-national convergence in Western Europe, 1980–2008.” American Journal of Sociology 117 (4): 1202–1245. doi:10.1086/662707.

What is important is that no one measure of citizenship rights will be suitable for all research questions. The limitations of existing data sets should encourage us to produce better data sets for academic research whenever necessary. In many cases MIPEX comes with clear advantages: readily available, directly comparable to other research, wide coverage, and coverage over time. At the same time, there is no rule that all the indicators need to be used, or that other indicators cannot be added to create a new measure without having to start from scratch.

scatterplot() with scales

Today I spent quite some time trying to figure out why I couldn’t use the scatterplot function (from the car package) for one specific variable, while it worked for every other variable. I got stuck at the error “Error in if (transform != FALSE | length(transform) == ncol(x)) { : argument is of length zero.”

It was only when I used str on the variables to examine their structure that I realized that the scatterplot function does not work with scales. Normally I could use scatterplot(y~x | country). In this particular case, I had used x <- scale(x1) + scale(x2) + ... to create the new variable. scatterplot(y~x1 | country) worked perfectly, as did scatterplot(y~x2 | country), but the scale did not. It turns out that scale() does not return a plain numeric vector but a one-column matrix with additional attributes, and this extra structure breaks the scatterplot function. Once I knew this, the solution seemed obvious: strip out this additional information with as.numeric: scatterplot(y~as.numeric(x) | country).
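A minimal sketch of the problem and the fix, using made-up variable names and random data for illustration (it assumes the car package is installed):

```r
library(car)  # provides scatterplot()

# Hypothetical example data
d <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50),
                country = rep(c("CH", "NL"), 25))

# scale() returns a one-column matrix, not a plain numeric vector,
# and the sum of scaled variables keeps that matrix structure:
x <- scale(d$x1) + scale(d$x2)
str(x)  # shows a num [1:50, 1] matrix rather than num [1:50]

# scatterplot(d$y ~ x | d$country)  # fails: "argument is of length zero"

# Stripping the matrix structure with as.numeric() makes it work:
scatterplot(d$y ~ as.numeric(x) | d$country)
```

The same as.numeric() trick applies whenever a function expects a plain numeric vector but receives the matrix that scale() produces.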