This post might serve as a reminder to myself and others doing research on immigrants and their descendants that nationality is not a mechanism. Put differently, if you discover that people with nationality A differ from people with nationality B in a given characteristic, you have not yet explained anything.
It feels rather obvious when put this way, but it is easily forgotten when it comes to multiple regression models. So often we throw in a control variable like “foreign national” or “foreign born” without thinking about why we do so, or which alternative explanation we think we are capturing. Clearly, a person’s passport or place of birth is used as a shorthand or proxy for something else, but for what exactly?
Let’s consider the commonly used variables of migration background or migration origin. Short of calling a particular section of society essentially different (which we probably don’t want to do), there is a range of concepts we might be trying to capture: the experience of (racial) discrimination, having a different skin colour, having a different religion, holding different values, having poor language skills, being working class, having additional cultural perspectives and experiences, transnational ties, or a combination of these.
Knowing what we’re after is essential for understanding our results. Sometimes it is necessary to use proxies like immigrant origin, but we need to specify the mechanism we’re trying to capture. Depending on the mechanism, who should count as being of immigrant origin can differ considerably, especially for children of immigrants, individuals of “mixed” background, and naturalized individuals. Poor language skills, for example, are most likely to affect (first-generation) immigrants; but the experience of racial discrimination probably does not disappear just because it was my grandparents rather than me who came to this country.
In a new paper, Graham Brown and Arnim Langer introduce a general class of social distance measures. They build on the widespread sense that many measures of diversity and disparity are closely related, and demonstrate exactly how they are related. By clarifying these relationships, the paper should make it easier to choose an appropriate measure for the analysis at hand.
The one thing I’m still not convinced by is the title of the paper: while they clearly define what they mean by social distance, my sociological training keeps interfering, and social distance does not seem a fitting term for a characteristic of a society. Perhaps it’s easier to talk of the more concrete instances of ethnic diversity or income disparity.
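To give a feel for why diversity and disparity measures belong together, here is a short Python sketch using two standard textbook measures: fractionalization (ethnic diversity) and the Gini coefficient (income disparity). Both can be written as an average distance over all pairs of individuals; they differ only in how the distance between two people is defined. This is an illustration of the shared structure under those textbook definitions, not Brown and Langer’s own formulation of their general class. Pairs include each person with themselves, which is what makes fractionalization equal to 1 minus the sum of squared group shares.

```python
from itertools import product


def mean_pairwise_distance(values, distance):
    """Average distance over all ordered pairs (i, j), including i == j."""
    n = len(values)
    total = sum(distance(a, b) for a, b in product(values, repeat=2))
    return total / (n * n)


def fractionalization(groups):
    """Ethnic diversity: probability that two people drawn at random
    (with replacement) belong to different groups, i.e. a 0/1 distance."""
    return mean_pairwise_distance(groups, lambda a, b: 0.0 if a == b else 1.0)


def gini(incomes):
    """Income disparity: mean absolute difference between pairs,
    divided by twice the mean income."""
    mean_income = sum(incomes) / len(incomes)
    return mean_pairwise_distance(incomes, lambda a, b: abs(a - b)) / (2 * mean_income)


# Hypothetical toy data
print(fractionalization(["A", "A", "B", "C"]))  # 0.625
print(gini([0, 0, 0, 10]))                     # 0.75
```

Swapping in a different distance function (for example, a distance between religious or linguistic categories) yields further measures of the same family.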
Brown, Graham K., and Arnim Langer. 2016. “A General Class of Social Distance Measures.” Political Analysis, March, mpw002. doi:10.1093/pan/mpw002.
It’s the time of the year when I make my students read codebooks (to choose a data set). It often strikes me how complex survey questions can be, especially once we take into account introductions and explanations. The aim is clear: precision, ruling out alternative understandings. Often, such careful wording is (or seems to be) the only tool we have to ensure measurement validity.
Against this background, a paper by Sebastian Lundmark et al. finds that minimally balanced questions are best for measuring generalized trust: asking whether “most people can be trusted or that you need to be very careful in dealing with people” (fully balanced) is beaten by questions that simply ask whether it is “possible to trust people.”
Lundmark, Sebastian, Mikael Gilljam, and Stefan Dahlberg. 2015. “Measuring Generalized Trust: An Examination of Question Wording and the Number of Scale Points.” Public Opinion Quarterly, October, nfv042. doi:10.1093/poq/nfv042.
David Broockman has an important paper on political representation, apparently forthcoming in Legislative Studies Quarterly (LSQ).
He notes two ways to study the political representation of issues, policies, and preferences. On the one hand, we can examine citizen-elite congruence issue by issue. On the other hand, we can calculate “policy scores” that capture ideal points on an overall ideological dimension and compare these between citizens and the elite. The paper convincingly demonstrates that the latter approach is flawed, in the sense that it does not really capture political representation as we generally understand it.
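A deliberately simple toy example in Python shows how the two approaches can diverge. The issue positions are invented, and the “policy score” here is just the share of liberal positions, a crude stand-in for the scaling models actually used to estimate ideal points; this is not Broockman’s own example or method.

```python
def policy_score(positions):
    """Crude one-dimensional summary: share of liberal (1) positions.

    A stand-in only; real studies estimate ideal points with scaling models.
    """
    return sum(positions) / len(positions)


def issue_congruence(citizens, elite):
    """Share of issues on which the elite position matches the citizens' position."""
    matches = sum(c == e for c, e in zip(citizens, elite))
    return matches / len(citizens)


# Hypothetical positions on six issues (1 = liberal, 0 = conservative).
citizens = [1, 1, 1, 0, 0, 0]  # e.g. majority opinion on each issue
elite = [0, 0, 0, 1, 1, 1]     # e.g. a legislator's votes on the same issues

print(policy_score(citizens), policy_score(elite))  # 0.5 0.5
print(issue_congruence(citizens, elite))            # 0.0
```

In this toy case, identical summary scores coexist with disagreement on every single issue, so matching scores alone cannot tell us whether citizens get the policies they want.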
Broockman, David E. 2015. “Approaches to Studying Policy Representation.” Legislative Studies Quarterly.
When using automatic content analysis like Wordscores or Wordfish, a list of stop words may be used to remove words before scaling. This is a contentious issue: recommendations range from always removing stop words to arguing that removing them is a bad thing. What to do?
To me, this sounded more like an empirical question than something beliefs could settle. Using professionally translated texts (i.e. party manifestos available in two languages), I examined how stop words affect predicted scores (i.e. party positions). Lowe & Benoit (2013) highlight that words considered a priori uninformative can nonetheless help predict party positions. This can be used as an argument against using stop words. In my analysis, I used only a few stop words, consisting almost entirely of grammatical terms (function words) like articles and conjunctions. It turns out that removing these words can almost entirely remove the impact of language on predicted scores. Put differently, removing words that really carry no meaning can improve the predictions.
So should we use stop words? Yes, but we don’t need many, and removing words that clearly carry no substantive information seems to be a good idea.
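The mechanics of removing a short list of function words can be sketched in Python with a bare-bones version of Wordscores, following the original formulas of Laver, Benoit and Garry: raw scores only, without the usual rescaling, and not the software used in the analysis above. The texts, reference scores, and stop-word list are hypothetical toy inputs, and the example says nothing about the language effect itself. Because “the” and “and” occur at similar rates in both reference texts, they receive word scores near the middle and pull the virgin text towards the centre; dropping them leaves the score to the substantive words.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_scores):
    """Score every word: S_w = sum_r P(r | w) * A_r, with
    P(r | w) = F_wr / sum_r F_wr (F_wr = relative frequency of w in r,
    A_r = a priori position of reference text r)."""
    freqs = [relative_frequencies(text) for text in reference_texts]
    vocabulary = set().union(*freqs)
    scores = {}
    for word in vocabulary:
        f = [fr.get(word, 0.0) for fr in freqs]
        total = sum(f)
        scores[word] = sum(fi / total * a for fi, a in zip(f, reference_scores))
    return scores


def score_virgin_text(tokens, scores):
    """Raw score of a new text: frequency-weighted mean of the word scores,
    using only words that received a score (no rescaling applied)."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("virgin text contains no scored words")
    freqs = relative_frequencies(scored)
    return sum(f * scores[word] for word, f in freqs.items())


def remove_stop_words(tokens, stop_words):
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy texts: a "left" reference (-1) and a "right" reference (+1).
left = "the state and the schools".split()
right = "the market and the tax cuts".split()
virgin = "the state and the schools and the market".split()
stop_words = {"the", "and"}  # a deliberately short list of function words

raw = score_virgin_text(virgin, word_scores([left, right], [-1.0, 1.0]))
filtered = score_virgin_text(
    remove_stop_words(virgin, stop_words),
    word_scores([remove_stop_words(t, stop_words) for t in (left, right)], [-1.0, 1.0]),
)
print(round(raw, 3), round(filtered, 3))  # -0.182 -0.333
```

A real application would need proper tokenization, and Wordfish rests on a different (Poisson scaling) model, so this only shows where stop words enter the Wordscores calculation.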
Lowe, Will, and Kenneth Benoit. 2013. “Validating estimates of latent traits from textual data using human judgment as a benchmark.” Political Analysis 21 (3): 298–313. doi:10.1093/pan/mpt002.
Ruedin, Didier. 2013. “The Role of Language in the Automatic Coding of Political Texts.” Swiss Political Science Review 19 (4): 539–45. doi:10.1111/spsr.12050.