Today a colleague asked me whether our recent meta-analysis drew any inferences on whether low-skilled minorities are discriminated against more than highly skilled minorities. It does so only at the margins, mostly in the supplementary material (S13). To be precise, with the data at hand we can’t say anything about the skills of the applicants; what we can look at is the skills level required for the job.
What about the average call-back ratios by skills level of the job? The data are available on Dataverse: doi:10.7910/DVN/ZU8H79.
First we load the data file.
disc = read.csv("meta-clean.csv", header=TRUE, sep=",", fileEncoding="UTF8")
Then we simply average across skills levels (using aggregate). For the meta-analytic regression analysis, refer to the supplementary material. Here we only look at the “subgroup” level, and store the averages in a variable called x.
x = aggregate(disc$relative.call.back.rate[disc$global=="subgroup"], by=list(Global=disc$global[disc$global=="subgroup"], Skills=disc$skills[disc$global=="subgroup"]), mean, na.rm=TRUE)
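The same kind of subgroup average can be written a little more compactly with the formula interface of aggregate. What follows is only a minimal sketch on a made-up toy data frame: the values are invented, the skills labels "high" and "low" are assumptions, and only the column names follow the data file used above.

```r
# Toy data with the same column names as the data file; all values are invented.
toy = data.frame(
  global = c("subgroup", "subgroup", "subgroup", "subgroup", "overall"),
  skills = c("high", "high", "low", "low", "high"),
  relative.call.back.rate = c(1.2, 1.4, 1.8, NA, 1.5)
)

# Keep only the subgroup rows, then average the call-back ratio by skills level.
# With the formula interface, rows with missing values are dropped by default.
sub = toy[toy$global == "subgroup", ]
x_toy = aggregate(relative.call.back.rate ~ skills, data = sub, FUN = mean)
print(x_toy)
```

Subsetting once and reusing the result avoids repeating the disc$global=="subgroup" condition three times, which makes it harder to get the filters out of sync.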
Since I want a figure, I’m sorting the result, and I don’t include the call-back rate for studies where the skills level was not indicated. Then I add the labels.
p = sort(x[2:4,3])
names(p) = c("high skills", "mixed skills", "low skills")
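Assigning the labels after sorting works as long as the sorted order is known in advance. A slightly more defensive variant, sketched here on invented numbers with hypothetical level names, attaches the labels first so that they travel with the values when sorting.

```r
# Invented averages, in the order aggregate() might return them.
# The level names "high", "low" and "mixed" are assumptions for illustration.
x_toy = data.frame(
  Skills = c("high", "low", "mixed"),
  x = c(1.4, 1.9, 1.6)
)

# Attach a label to each value first, then sort: sort() keeps the names.
p_toy = setNames(x_toy$x, paste(x_toy$Skills, "skills"))
p_toy = sort(p_toy)
print(p_toy)
```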
Finally, here’s the figure. I specify the ylim to include zero so as not to suggest bigger differences than there are.
barplot(p, ylim=c(0,2.2), bty="n", ylab="Average Call-Back Ratio")
The difference between “high” and “low” is statistically significant in a t-test (p=0.002).
Also on Figshare.
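For readers who want to run a comparison like the t-test between “high” and “low” mentioned above, here is a rough sketch on invented numbers. The post does not say which variant of the test was used; this sketch assumes R's default, Welch's unequal-variance test, and the level names "high" and "low" are assumptions as well.

```r
# Toy data; the call-back ratios are invented for illustration only.
toy = data.frame(
  skills = c(rep("high", 5), rep("low", 5)),
  relative.call.back.rate = c(1.1, 1.3, 1.2, 1.4, 1.0, 1.8, 2.1, 1.7, 2.0, 1.9)
)

# Split the ratios by skills level.
high = toy$relative.call.back.rate[toy$skills == "high"]
low  = toy$relative.call.back.rate[toy$skills == "low"]

# t.test() runs Welch's two-sample test by default (var.equal = FALSE).
tt = t.test(high, low)
print(tt$p.value)
```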
I also looked at the ISCO-88 codes. Now, the level of detail included in the different studies varies greatly, and the data file includes text rather than numbers, because some cells include non-numeric characters. After struggling a bit with as.numeric on factors, I chose a different approach using our good friend sapply.
I create a new variable for the 1-digit ISCO-88 codes. There are 781 rows. For each row, I convert what’s there into a character string (in case it isn’t already), then use substr to cut the first character, and then turn this into numbers.
disc$isco88_1 = sapply(1:781, function(x) as.numeric(substr(as.character(disc$isco88[x]), 1, 1)))
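The trouble with as.numeric on factors is that it returns the internal integer codes of the factor rather than the labels, which is why the conversion to character comes first. Since as.character, substr and as.numeric are all vectorized, the same idea also works without sapply. A minimal sketch on an invented stand-in for the ISCO-88 column:

```r
# Toy stand-in for the ISCO-88 column: a factor mixing codes and text (invented).
isco_toy = factor(c("2411", "5", "4190", "n/a", "52"))

# Convert the labels (not the internal factor codes) to character,
# take the first character of each, and turn it into a number.
# Non-numeric entries become NA; suppressWarnings() hides the coercion warning.
first_digit = suppressWarnings(as.numeric(substr(as.character(isco_toy), 1, 1)))
print(first_digit)
```

This version does not need the number of rows to be written out by hand.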
We can again run aggregate to average across occupation levels.
aggregate(disc$relative.call.back.rate[disc$global=="subgroup"], by=list(Global=disc$global[disc$global=="subgroup"], ISCO88=disc$isco88_1[disc$global=="subgroup"]), mean, na.rm=TRUE)
ISCO88 x
2 1.629796
4 1.422143
5 2.142449
I am not including all the output, because there are too few cases for some of the levels:
ISCO-88 Level    1    2    3    4    5    7    8    9
N                3   68    8   36   62    7   11   12
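Counts like these can be obtained with table. The post does not show how the N row was produced, so the following is only a sketch of one way to do it, on invented data with the same column names.

```r
# Toy data with the column names used above; all values are invented.
toy = data.frame(
  global = c("subgroup", "subgroup", "subgroup", "subgroup", "overall"),
  isco88_1 = c(2, 2, 5, NA, 2)
)

# Count the subgroup rows per 1-digit ISCO-88 level.
# table() leaves out missing values by default.
n_toy = table(toy$isco88_1[toy$global == "subgroup"])
print(n_toy)
```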
Zschirnt, Eva and Didier Ruedin. 2016. “Ethnic discrimination in hiring decisions: A meta-analysis of correspondence tests 1990–2015”, Journal of Ethnic and Migration Studies. Forthcoming. doi:10.1080/1369183X.2015.1133279