Cronbach’s alpha is a common way to assess the internal consistency of scales. For a scale I constructed recently, I got an excellent alpha and started wondering to what extent the many zeros in my data were the cause. A sizeable proportion of the respondents answered “no” to all the questions, and I wanted to know how much of the alpha is driven by these all-zero responses rather than by having picked good questions.
What I needed was a baseline, which I simulated in R (code).
I start with a random draw and then gradually replace the values with zeros (it could be any constant value). We can see that many zeros (x-axis) are required to drive up the alpha (y-axis). If more than about half the values are zeros, we should probably start being a bit more careful in interpreting alphas directly.
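To make the idea of such a baseline concrete, here is a minimal sketch in Python (standard library only). It is not the R code linked above, and the specifics are my assumptions for illustration: items are independent yes/no answers drawn as fair coin flips, there are 5 items and 1,000 respondents, and "replacing with zeros" means turning whole respondents into all-zero rows, as in the case of people answering "no" to everything. The names `cronbach_alpha` and `simulate` are made up for this example.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed scale score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate(n=1000, k=5, seed=1):
    """Return (share of all-zero respondents, alpha) pairs for shares 0.0 to 0.9."""
    rng = random.Random(seed)
    # Baseline (assumption): independent random yes/no answers, so alpha should be near 0
    data = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]
    results = []
    for step in range(10):
        share = step / 10
        n_zero = round(share * n)
        # Replace the first n_zero respondents with all-zero rows, keep the rest random
        zeroed = [[0] * k for _ in range(n_zero)] + data[n_zero:]
        results.append((share, cronbach_alpha(zeroed)))
    return results


if __name__ == "__main__":
    for share, alpha in simulate():
        print(f"{share:.1f}  {alpha:.3f}")
```

Under exactly these assumptions the alpha has a simple population value, kp / (1 + kp), where k is the number of items and p the share of all-zero respondents. With 5 items that is about 0.33 at 10% zeros and about 0.71 at 50% zeros. The curve in this sketch therefore depends on the number of items as well as on the share of zeros, so it is worth rerunning with a k that matches the scale at hand.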
A quick conversation with William Revelle confirmed that I’m not looking at what he calls “lumpy data”, and that factor analysis was indeed the correct reaction. Given the way Cronbach’s alpha reacts to zero-inflation, factor analyses may be a necessary addition to the alpha when more than half the values are zeros.
Cronbach, Lee J. 1951. “Coefficient alpha and the internal structure of tests.” Psychometrika 16 (3): 297–334. doi:10.1007/BF02310555.
Revelle, William. 2013. psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois. http://CRAN.R-project.org/package=psych.