Over at the BI team, there’s a nice summary of the lack of evidence on unconscious bias and diversity training. Note in particular the difference between perceived “effectiveness” and the lack of evidence that behaviour actually changed. As usual, the focus is really on application and the question of what should be done. Discrimination is too serious an issue to leave to feel-good check-box exercises!
I’m sure I’m not the first to notice, but it seems to me that peer review encourages p-hacking. Try this: (1) pre-register a regression analysis before doing the analysis and writing the paper (in your lab notes, or actually on OSF), (2) do the analysis, and (3) submit. How often do we get recommendations or demands to change the model during the peer-review process? How about controlling for X, shouldn’t you do Y, you should do Z, and so on.
Unless we’re looking at a registered report, we’re being asked to change the model. Typically we don’t know whether these suggestions are based on theory or on the empirical results. In the former case, we should probably do a new pre-registration and redo the analysis; sometimes we catch important things like post-treatment bias… In the latter case, simply resist?
And as reviewers, we should probably be conscious of this, in addition to the additional work we’re asking authors to do; we know that at this stage authors will typically do anything to get the paper accepted.
I’ve long been critical of population estimates as ‘evidence’ of racism, and now there is even less reason to treat them as such. The basic ‘evidence’ is as follows: there are say 5% immigrants in country X, you ask the general population, and their mean estimate is maybe that there are 15% immigrants in the country. Shocking: they overestimate the immigrant population, which is ‘evidence’ that the general population is generally racist (I enjoyed this phrase). I’ve been critical of this for three reasons. First, we don’t generally tell survey participants what we mean by ‘immigrants’, but use a specific definition (foreign citizens, foreign born) for the supposedly correct answer. Second, why should members of the general population have a good grasp of the size of the immigrant population? We might be able to estimate the share of immigrants in our personal network, but that’s not the same as estimating population shares. Third, if we see this as evidence of racism, we assume that the threat perspective is dominant.
It turns out, however, that there is a general human tendency to overestimate the population share of small groups: immigrants, homosexuals, you name it. David Landy and colleagues demonstrate that this tendency to overestimate small groups goes hand in hand with a tendency to underestimate large groups: a pull towards the average. There’s nothing particular about immigrants here, and nothing about racism either.
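This pull towards the average can be illustrated with a toy model that shrinks the true share towards a central anchor in log-odds space. The weight and anchor below are made-up illustrative values, not parameters fitted by Landy and colleagues:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def perceived_share(true_share, weight=0.6, anchor=0.5):
    """Toy model: blend the true population share with a central
    anchor in log-odds space, pulling estimates towards the average.
    `weight` and `anchor` are illustrative assumptions."""
    blended = weight * logit(true_share) + (1 - weight) * logit(anchor)
    return inv_logit(blended)
```

With these (made-up) parameters, a true share of 5% comes out at roughly 15%, while a true share of 90% comes out at about 79%: small groups are overestimated and large groups underestimated by the very same mechanism, with no hostility towards any particular group required.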
Landy, D., B. Guay, and T. Marghetis. 2017. ‘Bias and Ignorance in Demographic Perception’. Psychonomic Bulletin & Review, August, 1–13. https://doi.org/10.3758/s13423-017-1360-2.
Photo: CC-by-nc-nd by IceBone
When Eva Zschirnt and I were working on the meta-analysis on ethnic discrimination in hiring, I also ran one of these tests for publication bias (included in the supplementary material S12). According to the test, there are a couple of studies “missing”, and we left this as a puzzle. Here’s what I wrote at the time: “Given that studies report discrimination against minority groups rather consistently, we suspect that a study finding no difference between the minority and majority population, or even one that indicates positive discrimination in favour of the minority groups would actually be easier to publish.” (emphasis in original).
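The details of the test are in the supplementary material; as a general illustration of how such tests work, a common choice is Egger’s regression test, which regresses standardized effect sizes on precision and reads funnel-plot asymmetry off the intercept. A minimal sketch, not necessarily the test we used:

```python
import numpy as np

def egger_intercept(effects, ses):
    """Egger's regression test for funnel-plot asymmetry:
    regress standardized effect sizes (effect / SE) on precision
    (1 / SE). An intercept far from zero hints at small-study
    effects, e.g. 'missing' studies on one side of the funnel."""
    effects, ses = np.asarray(effects), np.asarray(ses)
    z = effects / ses
    precision = 1 / ses
    slope, intercept = np.polyfit(precision, z, 1)
    return intercept
```

In practice one would also want a standard error and p-value for the intercept (as, for instance, the regtest() function in R’s metafor package provides); the sketch only shows the quantity being tested.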
We were actually quite confident that we had not missed many studies. One way out is to dismiss the assumptions behind the tests for publication bias. Perhaps a soft target, but who are we to say that there are no missing studies?
Here’s another explanation that didn’t occur to me at the time, and that nobody we asked about it came up with. It’s just a guess, and will remain one. David Neumark suggested a correction for what he calls the “Heckman critique” in 2012. We were aware of this, but I did not connect the dots until reading David Neumark and Judith Rich‘s 2016 NBER working paper, where they apply this correction to nine existing correspondence tests. They find that the level of discrimination is often over-estimated without the correction: “For the labor market studies, in contrast, the evidence is less robust; in about half of cases covered in these studies, the estimated effect of discrimination either falls to near zero or becomes statistically insignificant.”
This means that the “Heckman critique” seems justified, and at least in the labour market some of the field experiments seem to overstate the degree of discrimination. Assuming that this is not unique to the papers they could re-examine, the distribution of effect sizes in the meta-analysis would be a bit different and include more studies towards the no-discrimination end. I can imagine that in this case, the test for publication bias would no longer suggest “missing” studies. Put differently, these “missing” studies were not missing, but reported biased estimates.
The unfortunate bit is that we cannot find out, because the correction provided by David Neumark has data requirements that not all existing studies can meet. But at least I have a potential explanation for that puzzle: bias of a different kind than publication bias and the so-called file-drawer problem.
Neumark, David. 2012. ‘Detecting Discrimination in Audit and Correspondence Studies.’ Journal of Human Resources 47 (4): 1128–57.
Neumark, David, and Judith Rich. 2016. “Do Field Experiments on Labor and Housing Markets Overstate Discrimination? Re-Examination of the Evidence.” NBER Working Papers w22278 (May). http://www.nber.org/papers/w22278.pdf.
Zschirnt, Eva, and Didier Ruedin. 2016. “Ethnic Discrimination in Hiring Decisions: A Meta-Analysis of Correspondence Tests 1990–2015.” Journal of Ethnic and Migration Studies 42 (7): 1115–34. doi:10.1080/1369183X.2015.1133279.
Why do reviews always come back in batches? Send out (say) three papers spread out nicely over a couple of months, and you’ll get the reviews almost at the same time… Well, surely the always bit wouldn’t hold up to any empirical test, but it’s striking how randomness plays out from time to time (or is it striking that it still feels striking even when we know what randomness can look like?).
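That feeling can be put to a quick simulation. Suppose (all numbers made up for illustration) three reviews each arrive at a uniformly random point in a 120-day window; how often do at least two of them land within a week of each other?

```python
import numpy as np

def clustered_reviews(n_papers=3, window=120, gap=7, sims=10_000, seed=0):
    """Monte Carlo estimate of how often at least two of n_papers
    reviews arrive within `gap` days of each other, with arrival
    dates drawn uniformly over a `window`-day period."""
    rng = np.random.default_rng(seed)
    arrivals = np.sort(rng.uniform(0, window, size=(sims, n_papers)), axis=1)
    min_gap = np.diff(arrivals, axis=1).min(axis=1)
    return (min_gap < gap).mean()
```

With these assumptions the reviews cluster in roughly three out of ten simulated runs: pure randomness produces “batches” far more often than intuition suggests.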