Measuring Consensus

I have mentioned Cees van der Eijk’s measure of agreement before, and Leik’s measure of ordinal consensus. Unsurprisingly, others have come across this issue too, discontent with the widespread use of standard deviations (inappropriate as they can be for ordinal data). Tastle & Wierman (2007) take a quite different approach, with the Shannon entropy as the starting point. I have added this to my R package agrmt on R-Forge, and will push it through to CRAN once the documentation is up to scratch. It’s interesting how many different approaches have been developed to address the same problem; clearly the different solutions have not spread widely enough to prevent duplicated effort.

Tastle, W., and M. Wierman. 2007. Consensus and dissention: A measure of ordinal dispersion. International Journal of Approximate Reasoning 45 (3): 531-545.
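As a rough illustration of the entropy-based idea, here is a minimal base-R sketch of the consensus formula as I read it in Tastle & Wierman (2007): Cns(X) = 1 + Σ p_i log2(1 − |X_i − μ| / d), where p_i are the relative frequencies, μ is the mean position and d is the width of the scale. The function name cns_sketch and the default category positions 1..m are assumptions made for this example; this is not the code in agrmt.

```r
# Sketch of an entropy-style consensus measure (after Tastle & Wierman 2007).
# V:   frequency vector, one count per ordered category
# pos: numeric positions of the categories (assumed to be 1..m if not given)
cns_sketch <- function(V, pos = seq_along(V)) {
  p    <- V / sum(V)               # relative frequencies
  mu   <- sum(p * pos)             # mean position
  d    <- max(pos) - min(pos)      # width of the scale
  keep <- p > 0                    # empty categories contribute nothing
  1 + sum(p[keep] * log2(1 - abs(pos[keep] - mu) / d))
}

cns_sketch(c(0, 0, 10, 0, 0))  # everyone in one category
cns_sketch(c(5, 0, 0, 0, 5))   # split between the two extremes
cns_sketch(c(2, 2, 2, 2, 2))   # all categories equally common
```

The first call gives 1 (full consensus) and the second gives 0 (no consensus); the equal distribution falls in between.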

Leik’s Measure of Ordinal Consensus

In 1966 Robert K. Leik introduced a measure of ordinal consensus based on cumulative frequency distributions. It can be used to express agreement or polarization, just like Cees van der Eijk’s measure of agreement “A”, and its derived measure of polarization. One difference is that in Leik’s measure, an equal distribution of frequencies (all categories equally common) does not always give the same value: the value depends on the number of categories. Leik defends this, arguing that an equal distribution should only be considered the mid-point between agreement and polarization if the number of categories is very large. With a small number of categories, polarization may simply be a result of chance.

Here’s a graphical summary of how Leik’s measure of ordinal dispersion behaves with increasing numbers of categories (consensus is defined as 1 minus dispersion), as outlined in table 3 of the article.
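The same behaviour can be tabulated with a few lines of base R. This is a sketch of Leik’s dispersion as I understand it from the 1966 article: take the cumulative relative frequencies F_i, measure each one’s distance to the nearer of 0 and 1, and scale the sum by the number of categories, D = 2 Σ d_i / (m − 1). The name leik_d_sketch is made up for this example and is not the function in agrmt.

```r
# Sketch of Leik's ordinal dispersion D (after Leik 1966).
# V: frequency vector, one count per ordered category
leik_d_sketch <- function(V) {
  m  <- length(V)                      # number of categories
  Fi <- cumsum(V / sum(V))             # cumulative relative frequencies
  di <- ifelse(Fi <= 0.5, Fi, 1 - Fi)  # distance to the nearer of 0 and 1
  2 * sum(di) / (m - 1)
}

# Dispersion of an equal distribution for 2 to 10 categories
m <- 2:10
uniform_d <- sapply(m, function(k) leik_d_sketch(rep(1, k)))
round(data.frame(categories = m, dispersion = uniform_d, consensus = 1 - uniform_d), 3)
```

For an equal distribution the dispersion starts at 1 with two categories and falls towards 0.5 as categories are added, which is the pattern Leik argues for.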

Leik’s measure of ordinal dispersion is available in the latest version of the package agrmt (version 0.27, not yet on CRAN).

Leik, R. 1966. ‘A measure of ordinal consensus’. Pacific Sociological Review 9 (2): 85–90.

Calculating Polarization

How can we quantify the polarization of a party system, or the polarization of opinions? Polarization exists when the population is divided in its opinions. If we measure these opinions on an ordered scale (as is commonplace), we’re looking for peaks at two non-adjacent positions. An ideal type would be 50% for an issue, and 50% against it.

Considering the opposite ideal type helps us pin down what we mean by polarization. If all positions are equally popular, we cannot really speak of polarization, but an equal distribution is not its logical opposite. The opposite of polarization is agreement: everyone takes the same position on an issue.

To quantify polarization, we can work backwards from Cees van der Eijk’s (2001) measure of agreement by inverting it. I’ve written up a few functions to do this in R.
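To make the inversion concrete: Van der Eijk’s A runs from −1 (perfect polarization) through 0 (all positions equally popular) to 1 (perfect agreement). One simple way to turn this around, and it is an assumption for illustration rather than a statement of what the package does, is to rescale A linearly so that polarization runs from 0 to 1. The helper name below is hypothetical.

```r
# Hypothetical helper: rescale agreement A (-1 to 1) to polarization (0 to 1).
# The linear inversion (1 - A) / 2 is an assumption made for illustration.
polarization_from_agreement <- function(A) {
  stopifnot(all(A >= -1 & A <= 1))  # A is only defined on [-1, 1]
  (1 - A) / 2
}

# The three ideal types: full agreement, equal distribution, full polarization
polarization_from_agreement(c(agreement = 1, uniform = 0, polarized = -1))
```

With this rescaling the three ideal types map to 0, 0.5 and 1 respectively.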

Van der Eijk, C. 2001. “Measuring agreement in ordered rating scales.” Quality and Quantity 35(3): 325-341.

Bootstrapping in R

Bootstrapping uses resampling to assign measures of accuracy to an estimate, and it is easy to do in R. When I first used it, it took me a while to figure out the double subscripts needed, so here is how to do it.

First, we’ll need the boot package from CRAN. Here’s an example using polarization from my agrmt package. After installing the packages, we define a function. In this case, I use polarization(collapse(POSIT, pos=w)) to calculate the point estimate of polarization. POSIT is the variable of interest: I am interested in the polarization in this positional variable. To use this as a function for bootstrapping, a second argument is necessary, which boot fills with the indices of each resample: p <- function(x,y) {polarization(collapse(x[y], pos=w))}.

The call to boot is simple: boot(data, function, replications), as long as you have a function with double subscripts, that is, one that takes the data and an index vector. So we would use mean(x[y]) rather than mean(x) in the function.

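library(boot); library(agrmt) # boot for resampling, agrmt for polarization and collapse
w <- c(-1,-0.5,0,0.5,1) # positions at which data could occur, used by polarization function: the pos=w argument
collapsed <- c(0,2,2,0,0) # the same data as d in collapsed form (frequencies at each position in w); *not* used
d <- c(0,0,-.5,-.5) # raw data; used; normally a variable
p <- function(x,y) {polarization(collapse(x[y], pos=w))} # statistic: x is the data, y the resampled indices
z <- boot(d,p,500) # bootstrapping: data=d, function=p, 500 draws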
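To see what the second subscript is for, here is a hand-rolled sketch of roughly what happens on each draw, using only base R and the mean as the statistic. The data and the names x, stat, idx and reps are made up for this example; boot itself does more than this (it also keeps the estimate on the original data, for instance).

```r
set.seed(1)                        # reproducible resamples
x <- c(0, 0, -0.5, -0.5, 1, 0.5)   # made-up raw data
stat <- function(x, y) mean(x[y])  # statistic taking the data and an index vector

# One resample: draw n indices with replacement, then apply the statistic
idx <- sample(seq_along(x), replace = TRUE)
stat(x, idx)

# Many resamples: repeat the same step and collect the estimates
reps <- replicate(500, stat(x, sample(seq_along(x), replace = TRUE)))
sd(reps)                           # bootstrap standard error of the mean
```

The index vector is what changes from draw to draw; the data stay the same, which is why the function has to subset with x[y].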

To come back to my example, calculating standard errors is simple: sd(as.numeric(boot(POSIT,p,500)$t)). So we run 500 draws of the function p defined above, and take the standard deviation of the 500 bootstrap estimates as the standard error.

Here’s an example using the mean: p <- function(x,y) {mean(x[y], na.rm=TRUE)}
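Put together, a complete run with the mean might look like the sketch below. The data are simulated for the example (including one missing value, so that na.rm has something to do); only boot() and the $t and $t0 components of its result are used.

```r
library(boot)                      # ships with standard R installations

set.seed(42)
x <- c(rnorm(50), NA)              # made-up data with one missing value
p <- function(x, y) {mean(x[y], na.rm = TRUE)}

z <- boot(x, p, 500)               # data, function, number of replications
z$t0                               # point estimate on the original data
sd(as.numeric(z$t))                # bootstrap standard error
```

Here z$t holds the 500 bootstrap estimates, one per draw, and z$t0 is simply the function applied to the original data.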

Using functions may appear cumbersome at first sight, but once you use sapply to calculate many standard errors at once, for instance, it becomes much easier. sapply(1980:2010, function(x) sd(as.numeric(boot(POSIT[YEAR == x],p,500)$t))) will get the standard errors for each of the 31 years.
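A small self-contained version of the same pattern, with made-up data for three years instead of my real variables, and the mean as the statistic so that it runs without agrmt:

```r
library(boot)

set.seed(7)
# Made-up data: 20 positions per year for three years
YEAR  <- rep(2008:2010, each = 20)
POSIT <- sample(c(-1, -0.5, 0, 0.5, 1), length(YEAR), replace = TRUE)

p <- function(x, y) {mean(x[y], na.rm = TRUE)}

# One bootstrap standard error per year
se_by_year <- sapply(2008:2010, function(yr) sd(as.numeric(boot(POSIT[YEAR == yr], p, 500)$t)))
names(se_by_year) <- 2008:2010     # label the results by year
se_by_year
```

Swapping in the polarization function for p gives the standard errors of polarization by year in exactly the same way.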