MIPEX? MIPEX!

A colleague recently commented that he is confused about where I stand with regard to the academic use of MIPEX data. Apparently I have been both rather critical of it and quite enthusiastic about it. I guess this sums it up quite well. I’ve always been critical of the (historical) lack of a theoretical base for the indicators used, and of the often uncritical use of the aggregate scores as indicators of ‘immigration policy’ in the literature. I’m enthusiastic about its coverage (compared to other indices), the effort to keep it up-to-date, and the availability of the detailed data.

A few years back, I verified that it is OK to use the MIPEX as a scale (as is often done), while highlighting redundancy in the items and showing that such scales could be improved:

In the context of the SOM project, we demonstrated that it is feasible to extend the MIPEX indicators back in time. We did so for 7 countries back to 1995. I refined these data by using the qualitative descriptions provided to identify the year of each change, giving year-on-year changes since 1995 for the 7 SOM countries. These data are experimental in that they rely on the documentation and not on original research. As if that were not enough, Camilla and I then created a complete time series of the MIPEX indicators in Switzerland since 1848. This showed that we can definitely go back in time, but also that quite a few of the things MIPEX measures were not regulated a century ago.

Even with the short time span covered by the SOM data, these data are quite insightful:

Later I proposed a different approach: re-assembling! The idea is generic and does not apply to the MIPEX alone: make use of the many indicators in the database, but use your own theory to pick and choose the ones you consider most appropriate (rather than being constrained by the presentation in the MIPEX publications). I have demonstrated that the MIPEX data can be used to closely approximate the Koopmans et al. data, while immediately covering a wider range of countries and allowing us to observe changes over time. Now we can have theory and coverage!
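To make the re-assembling idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the country labels, the indicator names, the scores, and the two dimensions are hypothetical and are not actual MIPEX items or the selection used in the paper. It only shows the mechanics of picking indicators according to your own theory and averaging them into your own scores.

```python
from statistics import mean

# Hypothetical indicator scores (0-100) per country.
# Names and values are invented for illustration; they are NOT MIPEX data.
indicators = {
    "A": {"residence_requirement": 50, "dual_nationality": 100,
          "religious_rights": 0, "labour_market_access": 100},
    "B": {"residence_requirement": 0, "dual_nationality": 0,
          "religious_rights": 50, "labour_market_access": 50},
    "C": {"residence_requirement": 100, "dual_nationality": 50,
          "religious_rights": 100, "labour_market_access": 0},
}

# Theory-driven selection: which indicators belong to which dimension.
# The dimensions and their items are assumptions chosen for this example.
dimensions = {
    "individual_equality": ["residence_requirement", "dual_nationality"],
    "cultural_difference": ["religious_rights"],
}


def recombine(indicators, dimensions):
    """Average the selected indicators per dimension for each country."""
    return {
        country: {
            dim: mean(scores[item] for item in items)
            for dim, items in dimensions.items()
        }
        for country, scores in indicators.items()
    }


scores = recombine(indicators, dimensions)
for country, dims in scores.items():
    print(country, dims)
```

Indicators that are not part of the theory (here the hypothetical labour_market_access) are simply left out, which is the whole point: the database offers the pieces, the theory decides which ones to use.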

And yes, we can apply these data to gain new insights, like the nature of the politicization of immigrant groups:

MIPEX as Measure of Citizenship Models

MIPEX are currently launching their latest release (with a shiny new website), and their data are often used in academic research. Earlier I showed that the MIPEX can indeed be used as scales (as is often done), although there is scope for improving these scales. Put differently, from a statistical point of view, the dimensions and sub-dimensions in the MIPEX data are not optimal. There are two approaches to this: First, we can reduce the data complexity by removing items that are not strongly associated with the others. Second, we can use the redundancy in the data, and pick and mix the indicators.
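One common way to spot items that are only weakly associated with the rest of a scale is the item-rest correlation: correlate each item with the sum of all other items. The sketch below is a generic illustration in Python, not necessarily the procedure used in the working paper, and the item names and scores are invented toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def item_rest_correlations(items):
    """Correlate each item with the sum of all remaining items.

    items: dict mapping item name -> list of scores (one per country).
    """
    n = len(next(iter(items.values())))
    result = {}
    for name, values in items.items():
        rest = [
            sum(v[i] for other, v in items.items() if other != name)
            for i in range(n)
        ]
        result[name] = pearson(values, rest)
    return result


# Toy data: hypothetical items scored for five hypothetical countries.
toy = {
    "item_a": [0, 50, 50, 100, 100],
    "item_b": [0, 0, 50, 50, 100],
    "item_c": [0, 50, 100, 100, 100],
    "item_d": [100, 0, 50, 0, 50],  # built to run against the other items
}

r = item_rest_correlations(toy)
for name, value in sorted(r.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.2f}")
```

Items at the top of the printed list are the candidates for removal if the aim is a shorter, more coherent scale.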

In a paper just published in the SSQ, I demonstrate this by recombining bits and pieces of the MIPEX to create citizenship scores that closely match those in Koopmans et al. On the one hand, this is a demonstration that we can easily create more valid constructs when recombining existing data sources like the MIPEX. On the other hand, I obtain classifications of citizenship models in many more countries than previous endeavours, and with less effort. As a by-product, I can validate the citizenship typology presented by Koopmans et al. by showing the existence of ethnic-pluralistic citizenship models (segregationism), previously only predicted on a theoretical basis.

Koopmans, Ruud, Paul Statham, Marco Giugni, and Florence Passy. 2005. Contested Citizenship: Immigration and Cultural Diversity in Europe. Minneapolis: University of Minnesota Press.
Ruedin, Didier. 2015. “Increasing Validity by Recombining Existing Indices: MIPEX as a Measure of Citizenship Models.” Social Science Quarterly. doi:10.1111/ssqu.12162.

MIPEX and Naturalization Policies

In a recent working paper Thomas Huddleston and Maarten Peter Vink demonstrate that the different dimensions covered by the MIPEX indicators all tend to correlate strongly with naturalization policies. A country tough on naturalization tends to be tough on other aspects of immigration and integration policies.

While it didn’t make a direct reference to this debate, my 2011 working paper on the reliability of the MIPEX as a scale fully supports this. In this working paper I show that all MIPEX indicators combined form a reliable scale, but I also highlight redundancies. These findings actually paved the way for my recent post on remastering MIPEX indicators depending on the research question.

Cronbach’s Alpha with Zero-Inflated Data

Cronbach’s alpha is a common way to test the internal consistency of scales. In a recent scale I constructed, I got an excellent alpha, and started wondering to what extent the many zeros in my data were the cause. Basically, a sizeable proportion of the respondents answered “no” to all the questions, and I wanted to know to what extent this drives the alpha rather than having picked good questions.
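For reference, the standard definition of Cronbach’s alpha for a scale with k items is given below. The symbols are introduced here and do not come from the post: \sigma^2_{Y_i} is the variance of item i, and \sigma^2_X is the variance of the total score (the sum of all items).

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)
```

If many respondents score zero on every item, the items co-vary more, so the variance of the total score grows relative to the sum of the item variances and alpha goes up. This is why the zeros are a plausible suspect.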

What I needed was a baseline, which I simulated in R (code).

[Figure: zialpha. Cronbach’s alpha (y-axis) against the number of zeros (x-axis).]

Basically I start with a random draw, and then gradually replace the values with zeros (could be any value). We can see that many zeros (x-axis) are required to drive the alpha (y-axis). If more than about half the values are zeros, we should probably start being a bit more careful in interpreting alphas directly.
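The logic of this simulation can be sketched in a few lines. What follows is not the original R code but a rough Python re-implementation under assumptions of my choosing: 500 respondents, 5 binary yes/no items drawn independently at random, and zeros introduced by turning whole respondents into all-“no” rows. The function names, the defaults, and the seed are hypothetical, and the exact numbers will differ from the figure.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents (each a list of item scores)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate(n=500, k=5, steps=10, seed=1):
    """Replace a growing share of respondents with all-zero rows.

    Assumed set-up (not from the original post): independent random
    0/1 answers, so the alpha without extra zeros is close to zero.
    """
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]
    results = []
    for step in range(steps):
        share = step / steps          # 0.0, 0.1, ..., 0.9
        n_zero = int(share * n)       # respondents turned into all zeros
        data = [[0] * k for _ in range(n_zero)] + rows[n_zero:]
        results.append((share, cronbach_alpha(data)))
    return results


results = simulate()
for share, alpha in results:
    print(f"share of all-zero respondents: {share:.1f}  alpha: {alpha:.2f}")
```

The loop stops at a share of 0.9 on purpose: if every respondent is zero on every item, all variances are zero and alpha is undefined. Changing the number of items, the distribution of the random draw, or the way zeros are introduced shifts the curve, so this is a baseline for one particular set-up, not a general threshold.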

A quick conversation with William Revelle confirmed that I’m not looking at what he calls “lumpy data”, and that factor analysis was indeed the correct reaction. Given the way Cronbach’s alpha reacts to zero-inflation, factor analyses may be a necessary addition to the alpha when more than half the values are zeros.

Cronbach, Lee J. 1951. “Coefficient alpha and the internal structure of tests.” Psychometrika 16 (3): 297–334. doi:10.1007/BF02310555.

Revelle, William. 2013. Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois. http://CRAN.R-project.org/package=psych.