I have mentioned Cees van der Eijk’s measure of agreement before, as well as Leik’s measure of ordinal consensus. Unsurprisingly, others have come across this issue, dissatisfied with the widespread use of standard deviations (which can be inappropriate for ordinal data). Tastle & Wierman (2007) take quite a different approach, using Shannon entropy as their starting point. I have added their measure to my R package agrmt on R-Forge, and will push it through to CRAN once the documentation is up to scratch. It’s interesting how many different approaches have been developed to address the same problem; clearly the existing solutions have not spread widely enough to prevent duplicated effort.
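To make the entropy-based idea concrete, here is a minimal Python sketch of the consensus measure as I read it in Tastle & Wierman (2007): Cns = 1 + Σ p_i log2(1 − |X_i − μ| / d), where p_i is the share of responses in category X_i, μ is the mean category and d = X_max − X_min is the width of the scale. This is not the agrmt code; the function name `tastle_wierman_consensus` is hypothetical, and by default it assumes the categories are scored 1 to k.

```python
import math


def tastle_wierman_consensus(freqs, categories=None):
    """Consensus (Tastle & Wierman 2007) for a vector of category frequencies.

    Cns = 1 + sum_i p_i * log2(1 - |X_i - mu| / d), with mu the mean
    category and d = X_max - X_min the width of the scale.
    """
    if categories is None:
        # Assumption: categories are scored 1, 2, ..., k
        categories = list(range(1, len(freqs) + 1))
    total = sum(freqs)
    p = [f / total for f in freqs]
    mu = sum(pi * x for pi, x in zip(p, categories))
    d = max(categories) - min(categories)
    acc = 0.0
    for pi, x in zip(p, categories):
        if pi > 0:  # empty categories contribute nothing (and avoid log2(0))
            acc += pi * math.log2(1 - abs(x - mu) / d)
    return 1 + acc


print(tastle_wierman_consensus([0, 0, 10, 0, 0]))  # everyone in one category
print(tastle_wierman_consensus([5, 0, 0, 0, 5]))   # split between the extremes
print(tastle_wierman_consensus([2, 2, 2, 2, 2]))   # uniform distribution
```

Full agreement on one category gives 1, an even split between the two extreme categories gives 0, and a uniform distribution on a five-point scale lands in between (about 0.43).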
Tastle, W., and M. Wierman. 2007. ‘Consensus and dissention: A measure of ordinal dispersion’. International Journal of Approximate Reasoning 45 (3): 531–545.
In 1966 Robert K. Leik introduced a measure of ordinal consensus based on cumulative frequency distributions. Like Cees van der Eijk’s measure of agreement “A” and its derived measure of polarization, it can be used to express agreement or polarization. One difference is that in Leik’s measure, a uniform distribution of frequencies (all categories equally common) does not always give the same value; the value depends on the number of categories. Leik defends this, arguing that a uniform distribution should only be considered the mid-point between agreement and polarization if the number of categories is very large. With a small number of categories, polarization may simply be a result of chance.
Here’s a graphical summary of how Leik’s measure of ordinal dispersion behaves with increasing numbers of categories (consensus is defined as 1 minus dispersion), as outlined in table 3 of the article.
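The behaviour can be checked numerically. Below is a minimal Python sketch of Leik’s dispersion, assuming the usual formulation D = 2 Σ d_a / (m − 1), where F_a is the cumulative proportion up to category a, d_a = F_a if F_a ≤ 0.5 and 1 − F_a otherwise, and m is the number of categories. This is not the agrmt implementation; the function name `leik_dispersion` is hypothetical. Applied to uniform distributions, dispersion is 1 with two categories and approaches 0.5 as categories are added, which is the pattern Leik describes.

```python
def leik_dispersion(freqs):
    """Leik's ordinal dispersion D = 2 * sum(d_a) / (m - 1).

    d_a = F_a if F_a <= 0.5 else 1 - F_a, where F_a is the cumulative
    proportion up to category a. Consensus is 1 - D.
    """
    m = len(freqs)
    if m < 2:
        raise ValueError("need at least two categories")
    total = sum(freqs)
    cumulative = 0.0
    s = 0.0
    # The last category always has F = 1 (d = 0), so it can be skipped
    for f in freqs[:-1]:
        cumulative += f / total
        s += cumulative if cumulative <= 0.5 else 1 - cumulative
    return 2 * s / (m - 1)


# Uniform distributions with an increasing number of categories
for m in (2, 3, 4, 5, 7, 10, 50):
    d = leik_dispersion([1] * m)
    print(m, "dispersion:", round(d, 3), "consensus:", round(1 - d, 3))
```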
Leik’s measure of ordinal dispersion is available in the latest version of the R package agrmt (version 0.27, not yet on CRAN).
Leik, R. 1966. ‘A measure of ordinal consensus’. Pacific Sociological Review 9 (2): 85–90.
In survey data, actual distributions are frequently discrete and non-normal. The mean may thus be inappropriate to summarize the central tendency, and the median too coarse because it is constrained to the categories actually present in the data. On a five-point scale, the median can only fall on one of the five categories, and thus does not reflect smaller changes in the distribution. The interpolated median adjusts the median position within its category to capture such changes.
The concept of interpolated medians is nicely described at http://aec.umich.edu/median.php (at the bottom), or at http://www.weekscomputing.com/webhelp/hs520.htm (with a nice graph).
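As a sketch of the idea, assume equally spaced categories of width w, each treated as the interval [c − w/2, c + w/2] with its responses spread evenly inside it. If c is the median category, B the number of responses below c, E the number in c, and n the total, the interpolated median is c − w/2 + w(n/2 − B)/E. The Python function below applies this grouped-data formula; its name `interpolated_median` is hypothetical and it is not the psych implementation. For a scale like -1, -0.5, 0, 0.5, 1, set `width=0.5`.

```python
from collections import Counter


def interpolated_median(values, width=1.0):
    """Interpolated (grouped-data) median of ordinal responses.

    Assumes categories are spaced `width` apart and that the responses in
    each category are spread evenly over [category - width/2, category + width/2].
    """
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    half = len(values) / 2
    below = 0  # number of responses in categories below the current one
    for category in sorted(counts):
        at = counts[category]
        if below + at >= half:
            # The halfway point lies inside this category: interpolate within it
            lower_bound = category - width / 2
            return lower_bound + width * (half - below) / at
        below += at


responses = [1, 2, 2, 3, 3, 3, 4, 5]
print(interpolated_median(responses))
```

For these responses the ordinary median is 3, while the interpolated median is about 2.83: only one of the three responses in category 3 is needed to reach the halfway point, so the estimate sits a third of the way into that category’s interval.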
In R, the interpolated median can easily be calculated with the package psych, using the
To illustrate the difference, here is a simple plot using data from the SOM project. I use the positional variable (all actors in Austria combined) by year. It is immediately apparent that the three ways of expressing central tendency differ. The median position (in blue) is clearly constrained by the available categories (in this case -1, -0.5, 0, 0.5, 1). With its correction, the interpolated median reveals changes over time in much more detail.
OK, not a very pretty graph, but it does illustrate the point.