How (Not) to Study Ideological Representation

David Broockman has an important paper on political representation apparently forthcoming in LSQ.

He notes two ways to study the political representation of issues, policies, and preferences. On the one hand we can examine citizen-elite congruence issue by issue. On the other hand, we can calculate “policy scores” to capture ideal points of overall ideologies and compare these between citizens and the elite. The paper convincingly demonstrates that the latter approach is flawed in the sense that it doesn’t really capture political representation in the way we generally understand it.

Broockman, David E. 2015. “Approaches to Studying Policy Representation.” Legislative Studies Quarterly.

Should We Use Stop Words?

When using automatic content analysis like Wordscores or Wordfish, we can apply a stop word list, i.e. remove very common words before the analysis. This is a contentious issue, with recommendations ranging from always using stop words to avoiding them altogether. What to do?

To me this sounded more like an empirical question than something beliefs could settle. Using professionally translated texts (i.e. party manifestos available in two languages), I examined how stop words affect predicted scores (i.e. party positions). Lowe & Benoit (2013) highlight that words considered a priori uninformative can nonetheless help predict party positions, which can be used as an argument against using stop words. In my analysis, I applied only a short stop word list, consisting almost entirely of grammatical terms like articles and conjunctions (function words). It turns out that removing these words almost entirely eliminates the impact of language on predicted scores. Put differently, removing words that really carry no meaning can improve the predictions.
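To make the mechanics concrete, here is a minimal sketch of how a short stop word list changes Wordscores-style predictions. It is a simplified illustration of the general Wordscores logic (word scores as weighted averages of reference positions, without any rescaling of the predicted scores), not the code used in the paper. The stop word list, the function names, and the toy texts are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately short list of English function words.
STOP_WORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in"})


def tokenize(text, stop_words=frozenset()):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in stop_words]


def relative_frequencies(tokens):
    """Share of each word among all tokens of one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(reference_texts, positions, stop_words=frozenset()):
    """Score each word as the average of the reference positions,
    weighted by how strongly the word is associated with each text."""
    freqs = [relative_frequencies(tokenize(t, stop_words))
             for t in reference_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f in freqs)
        scores[w] = sum(f.get(w, 0.0) / total * a
                        for f, a in zip(freqs, positions))
    return scores


def score_text(text, scores, stop_words=frozenset()):
    """Predicted position: frequency-weighted mean of the word scores,
    using only words that appeared in the reference texts."""
    tokens = [t for t in tokenize(text, stop_words) if t in scores]
    if not tokens:
        raise ValueError("text contains no scored words")
    freqs = relative_frequencies(tokens)
    return sum(f * scores[w] for w, f in freqs.items())


# Toy example: two reference texts with assumed positions +1 and -1.
refs = ["lower taxes and free markets", "more welfare and public services"]
positions = [1.0, -1.0]
new_text = "lower taxes and more taxes"

without_list = score_text(new_text, word_scores(refs, positions))
with_list = score_text(new_text,
                       word_scores(refs, positions, STOP_WORDS),
                       STOP_WORDS)
print(without_list, with_list)
```

In this toy setup the function word "and" appears equally in both reference texts, so it receives a score in the middle of the scale and pulls the predicted position towards the centre. Dropping it leaves only the substantive words to determine the score.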

So should we use stop words? Yes, but we don’t need many of them: a short list of words that clearly carry no substantive information seems to be a good idea.

Lowe, Will, and Kenneth Benoit. 2013. “Validating estimates of latent traits from textual data using human judgment as a benchmark.” Political Analysis 21 (3): 298–313. doi:10.1093/pan/mpt002.

Ruedin, D. 2013. “The Role of Language in the Automatic Coding of Political Texts.” Swiss Political Science Review 19 (4): 539–45. doi:10.1111/spsr.12050.

The role of source language in Wordscores etc.

My paper on the role of source language in the automatic coding of political texts (Wordscores, dictionary coding) is now available online. I make use of Swiss party manifestos to examine the impact of source language on party positions derived from the manifestos: does it matter if a French or German manifesto is used? The conclusion is that both stemming and particularly stop words are important to obtain comparable results for Wordscores, while the keyword-based dictionary approach is not affected by language differences. Replication material is available on my Dataverse.

MIPEX and Naturalization Policies

In a recent working paper Thomas Huddleston and Maarten Peter Vink demonstrate that the different dimensions covered by the MIPEX indicators all tend to correlate strongly with naturalization policies. A country tough on naturalization tends to be tough on other aspects of immigration and integration policies.

While it didn’t make a direct reference to this debate, my 2011 working paper on the reliability of the MIPEX as a scale fully supports this. In this working paper I show that all MIPEX indicators combined are a reliable scale, but also highlight redundancies. These findings laid the groundwork for my recent post on remastering MIPEX indicators depending on the research question.
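For readers unfamiliar with scale reliability, a common statistic is Cronbach's alpha, which compares the variance of the individual indicators with the variance of their sum. The sketch below assumes this statistic for illustration only; the function name and the toy data are hypothetical, and the numbers are not MIPEX scores.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of indicator columns.

    Each element of `items` holds one indicator's scores across the
    same set of cases (e.g. countries), in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two indicators")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all indicators must cover the same cases")
    # Total scale score per case: the sum across indicators.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Sum of the (sample) variances of the individual indicators.
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Toy data: three made-up indicators measured in five cases.
indicators = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(indicators), 3))
```

Values close to 1 indicate that the indicators move together. Very high values across many indicators can also point to redundancy, because several indicators then carry much the same information.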

MIPEX Remastered

The MIPEX (Migrant Integration Policy Index) is a relatively widely used index. I have demonstrated empirically that it can be used as a scale, but have voiced some concerns about the weak theoretical foundation.

The number of countries covered by the MIPEX is increasing, and there are 148 indicators available. In an attempt to make the most of these data, I have picked the parts of the MIPEX that most closely fit the typology developed in Koopmans et al. (2005).

To get a better handle on developments over time, I use the SOM extension of MIPEX, and here is how the situation in Austria and the Netherlands has changed over time.


We can discuss the labels, but there are clear differences between countries, and citizenship regimes are clearly dynamic. This means that, yes, citizenship regimes are worth investigating, but country dummies will fail to provide an accurate picture.

Koopmans, Ruud, Paul Statham, Marco Giugni, and Florence Passy. 2005. Contested Citizenship: Immigration and Cultural Diversity in Europe. Minneapolis: University of Minnesota Press.

Ruedin, Didier. 2011. “The reliability of MIPEX indicators as scales.” SOM Working Paper 3: 1–19.