Using Shading in Coefficient Plots

Having recently discovered the R package denstrip, I was struck by intuitive it is to use shading to express uncertainty. If you need convincing, check out the paper by Jennifer Lundquist and Ken Lin linked below.

coefplot.fadeI wondered whether this would also work well for generic coefficient plots to examine regression results. The proof is in the pudding, so here’s some quickly thrown together code (also on Gist).

fade <- function(x, labels=names(coef(x)), expo=FALSE, xlab="", ylab="", bty="n", ...) {
# argument: a regression, additional arguments passed on to plot() and text()
coe <- summary(x)$coefficients[,1] # extract coefficients
cse <- summary(x)$coefficients[,2] # standard errors
len <- length(coe) # how many coefficients (without intercept)
if(expo == TRUE) { # exponential form
coe <- exp(coe)
cse <- exp(cse)
ran <- c(min(coe)- 3*max(cse), max(coe)+3*max(cse)) # range of values
plot(0, ylim=c(1.5,len+0.5), xlim=ran, xlab=xlab, ylab=ylab, bty=bty, type="n", ..., axes=FALSE) # empty plot
# title, xlab, ylab, etc. can be used here.
for(i in 2:len) {
dens <- rnorm(mean=coe[i], sd=cse[i], n=1000)
denstrip(dens, at=i) # passing arguments conflicts with plot()
text(ran[1],i, labels[i], adj = c(0,0), ...) # adj to left-align
# cex, col can be used here

Well, it does work (see figure above). If we add an abline(v=0, lty=3), we can easily highlight the zero line.

coefplot.armHowever, intuitive as it is, I’m not really convinced. Here’s the coefficient plot the arm package produces. While it doesn’t use shading, lines of different thickness are used to indicate the standard errors — nicely de-emphasizing the end-point. There’s more emphasis on the point estimate (i.e. the coefficient itself: the square), while my code hides this. I did try adding a white line at the point estimate (third figure, below), but this doesn’t resolve my biggest worry: the amount of ink that is used to convey the message… I’m just not convinced that the shading adds that much to the relatively simple coefficient plots in the package arm.


(There’s an argument width that could be added to line calling denstrip, like width=0.2, but it hasn’t convinced me enough.)

Jackson, Christopher H. 2008. “Displaying Uncertainty with Shading.” The American Statistician 62 (4): 340–47. doi:10.1198/000313008X370843.

Lundquist, Jennifer H., and Ken-Hou Lin. 2015. “Is Love (Color) Blind? The Economy of Race among Gay and Straight Daters.” Social Forces, March, sov008. doi:10.1093/sf/sov008.

How to Create Coefficient Plots in R the Easy Way

Presenting regression analyses as figures (rather than tables) has many advantages, despite what some reviewers may think

coefplottables2graphs has useful examples including R code, but there’s a simpler way. There’s an R package for (almost) everything, and (of course) you’ll find one to produce coefficient plots. Actually there are several ones.

The one I end up using most is the coefplot function in the package arm. It handles most common models out of the box. For those it doesn’t, you can simply supply the coefficients. Here’s the code for the coefficient plot shown. The first two lines are just to get the data in case you’re interested in full replication.

The default in arm is to use a vertical layout, so coefplot(m1) works wonderfully. Often I prefer the horizontal layout, which is easily done with vertical=FALSE; I also add custom margins so that the variable names are fully visible.

library(car) # for the example data
data(Duncan) # example data
m1 = lm(prestige ~ income + education + type, data=Duncan)
library(arm) # for coefplot()
coefplot(m1, vertical=FALSE, mar=c(5.5,2.5,2,2))

If I want to plot the coefficients of a model not supported, like Cox Proportional Hazard survival models, all it takes is to supply the coefficients. The coefplot function takes many arguments as we would expect it. Here’s an example. I supply the coefficients and SD as required (using subsets from the results), specify the variable names, and set the limits of the y-axis. In this example, I specify the columns with the coefficients [,1] and the SD [,2]. Specifying variable names is something I often do, because we don’t want to communicate using our (internal) variable names; even I find myself struggling to understand what some variables stood for when I get back to old results after a few months. Usually there is no need to specify the limits of the axes (ylim), but I included this here to show that the function takes many standard arguments, and because if we compare different models this can be useful. Speaking of comparing models, running coefplot a second time with add=TRUE does just this.

variableNames = c("Intercept", "Income", "Education", "Type Prof", "Type W.Coll")
coefplot(summary(m1)$coefficients[,1], summary(m1)$coefficients[,2], vertical=FALSE, varnames=variableNames, ylim=c(-5, 25), main="")

There’s also a dedicated package for coefficient plots called coefplot, but somehow it often failed me. There’s also Ben Bolker‘s coefplot2; it requires installation with the following code (it’s not on CRAN):

install.packages("coefplot2", repos="", type="source")

You might also be interested in the visreg package. Although it doesn’t do coefficient plots, it visualizes regression analyses so that you can see the data alongside the results.

Finally, for those happy to code in R, have a look at the figures (and code) by Carlisle Rainey. That’s nice for polishing the results for publication, but seems a bit complicated for a first look at the results. Indeed, even if we go for the table in the end, doing coefficient plots are a very useful tool for us researchers to understand the analyses we run, actually see what these figures amount to!

P.S. There are also legitimate reasons to keep the table, of course, but that’s something for the supplementary file.