Are Low-Skilled Minorities Discriminated Against More?

Today a colleague asked me whether our recent meta-analysis drew any inferences on whether low-skilled minorities are discriminated against more than highly-skilled minorities. It does so only at the margins, mostly in the supplementary material (S13). And to be precise, with the data at hand we can’t say anything about the skills of the applicants; we’re talking about the skills level required for the job in question.

What about the average call-back ratios by the skills level of the job? The data are available on Dataverse: doi:10.7910/DVN/ZU8H79.

First we load the data file.

disc = read.csv("meta-clean.csv", header=TRUE, sep=",", fileEncoding="UTF-8")

Then we simply average across skills levels (using aggregate). For the meta-analytic regression analysis, refer to the supplementary material. Here we only look at the “subgroup” level, and store the averages in a variable called x.

# the call-back ratio column is assumed to be called "ratio" here; adjust to the actual column name
x = aggregate(disc$ratio[disc$global=="subgroup"], by=list(Global=disc$global[disc$global=="subgroup"], Skills=disc$skills[disc$global=="subgroup"]), mean, na.rm=TRUE)
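To see what this aggregate() call returns, here is a minimal, self-contained sketch on toy data; the columns global, skills, and the call-back ratio column (assumed here to be called ratio) merely mimic the structure of the real data file:

```r
# Toy data mimicking the structure of meta-clean.csv
# ("ratio" is an assumed name for the call-back ratio column)
toy <- data.frame(global = c("subgroup", "subgroup", "subgroup", "study"),
                  skills = c("low", "low", "high", "high"),
                  ratio  = c(2.0, 1.5, 1.25, 1.5))

# Restrict to the "subgroup" level, then average the ratio by skills level
keep <- toy$global == "subgroup"
res <- aggregate(toy$ratio[keep],
                 by = list(Global = toy$global[keep], Skills = toy$skills[keep]),
                 mean, na.rm = TRUE)
res  # one row per skills level, with the group mean in column x
```

aggregate() returns the grouping variables plus a column called x holding the group means, which is why the third column of the result is the one of interest below.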

Since I want a figure, I sort the result, and I exclude the call-back rate for studies where the skills level was not indicated. Then I add the labels.

p = sort(x[2:4,3])
names(p) = c("high skills", "mixed skills", "low skills")

Finally, here’s the figure. I specify the ylim to include zero so as not to suggest bigger differences than there are.

barplot(p, ylim=c(0,2.2), bty="n", ylab="Average Call-Back Ratio")

The difference between “high” and “low” is statistically significant in a t-test (p=0.002).
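For transparency, such a comparison could be run along these lines; this is a sketch on simulated data, and ratio and skills are assumed column names rather than necessarily those in meta-clean.csv:

```r
# Simulated stand-in for the subgroup-level data
# ("ratio" and "skills" are assumed column names)
set.seed(42)
sub <- data.frame(ratio  = c(rnorm(30, mean = 1.2, sd = 0.2),   # "high skills"
                             rnorm(30, mean = 1.8, sd = 0.2)),  # "low skills"
                  skills = rep(c("high", "low"), each = 30))

# Welch two-sample t-test of high- vs low-skills call-back ratios
res <- t.test(ratio ~ skills, data = sub)
res$p.value
```

Note that t.test() defaults to the Welch version, which does not assume equal variances in the two groups.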

Also on Figshare.

I also looked at the ISCO-88 codes. Now, the level of detail included in the different studies varies greatly, and the data file includes text rather than numbers, because some cells include non-numeric characters. After struggling a bit with as.numeric on factors, I chose a different approach using our good friend sapply.

I create a new variable for the 1-digit ISCO-88 codes. There are 781 rows. For each row, I convert what’s there into a character string (in case it isn’t already), then use substr to cut the first character, and then turn this into numbers.

disc$isco88_1 = sapply(1:781, function(x) as.numeric(substr(as.character(disc$isco88[x]), 1, 1)))
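On a toy vector the conversion looks like this; entries whose first character is non-numeric come out as NA (with a coercion warning), which is exactly what we want for messy cells:

```r
# Toy ISCO-88 codes stored as text, one with a non-numeric prefix
isco <- c("5131", "24", "ca. 42", "912")
first_digit <- sapply(seq_along(isco),
                      function(x) as.numeric(substr(as.character(isco[x]), 1, 1)))
first_digit  # 5, 2, NA, 9
```

(substr() is in fact vectorized, so the sapply() is not strictly necessary, but it keeps the per-row logic explicit.)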

We can again run aggregate to average across occupation levels.

# "ratio" is again an assumed name for the call-back ratio column
aggregate(disc$ratio[disc$global=="subgroup"], by=list(Global=disc$global[disc$global=="subgroup"], ISCO88=disc$isco88_1[disc$global=="subgroup"]), mean, na.rm=TRUE)

ISCO88        x
     2 1.629796
     4 1.422143
     5 2.142449

I am not including all the output, because there are too few cases for some of the levels:

ISCO-88 level   1   2   3   4   5   7   8   9
N               3  68   8  36  62   7  11  12
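Counts like these can be obtained with table(), which drops NA values by default; a quick sketch on a toy vector of 1-digit codes:

```r
# Toy vector of 1-digit ISCO-88 codes (NA = code not recoverable)
isco88_1 <- c(2, 2, 4, 5, 5, 5, NA, 9)
tb <- table(isco88_1)  # NA values are dropped by default
tb
```

On the real data the equivalent call would be table(disc$isco88_1[disc$global == "subgroup"]).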

Zschirnt, Eva and Didier Ruedin. 2016. “Ethnic discrimination in hiring decisions: A meta-analysis of correspondence tests 1990–2015”, Journal of Ethnic and Migration Studies. Forthcoming. doi:10.1080/1369183X.2015.1133279

Taking Notes on Readings/Papers

Here’s something I’ve meant to share for a while now. I use Zotero to manage things I read (articles, books, conference papers, etc.), but what follows is applicable to any similar software. I keep notes on everything I read, and over the years this has evolved into something quite structured (a template, in fact). The fact that it is structured is quite useful when I come back to a paper after a while. Here’s the template:

Research question:
Dependent variable:
Explanatory variable:
Data:
Method:
Mechanism:
Results:
Notes:

Obviously, not all papers will have something for each heading. While the heading research question is rather innocuous, it’s unfortunately not always as easy to fill in as it should be. The dependent variable is the quantity of interest; under explanatory variable I include the main explanations. I tend to include control variables here, too, albeit in brackets.

Data describes the data sources: the survey used, the countries covered, the population covered, the N; or “experts”, “ABM”, or even “data free”, whatever seems the most adequate description. Method is for methodological details. While usually we are more interested in the results than in how they were obtained, a quick glance at the methods (and data) can be really helpful in determining how much weight I want to give a particular result.

The heading mechanism is often challenging to fill in, simply because many papers do not state their mechanisms explicitly, or because the theory section is not tightly connected to the empirical part. I’m not lamenting here; I guess I’m guilty of this, too…

Often my interest is in the results section, where I summarize the main findings. The heading notes takes everything else: free-form notes.

The whole thing is (deliberately) rather flexible, but it helps with two things: (1) reading papers with some focus, and (2) having notes in a format that allows me to retrieve relevant information more quickly (here lies the advantage of a database over Anki, but obviously only when things can be found).
