Bootstrapping uses resampling to assign measures of accuracy, such as standard errors, to sample estimates, and it is easy to do in R. When I first used it, it took me a while to figure out the double subscripts needed, so here is how to do it.
First, we’ll need the boot package from CRAN. Here’s an example using polarization from my agrmt package. After installing the packages, we define a function. In this case, I use polarization(collapse(POSIT, pos=w)) to calculate the point estimate of polarization. POSIT is the variable of interest: I am interested in the polarization in this positional variable. To use this as a function for bootstrapping, a second subscript is necessary, because boot() calls the function with the data and a vector of resampled indices, and the function has to use those indices to pick out the observations: p <- function(x,y) {polarization(collapse(x[y], pos=w))}.
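To see why the second argument is needed, here is a minimal base-R sketch of what a bootstrap does with it: draw a vector of positions with replacement, pass it as the second argument, and let the function subset the data with x[y]. This is a simplified illustration, not the boot package's own code, and the names stat_mean and manual_boot are hypothetical.

```r
# Statistic in the two-argument form that boot() expects:
# x is the full data, y is a vector of (resampled) indices.
stat_mean <- function(x, y) mean(x[y])

# Hypothetical helper: a bare-bones nonparametric bootstrap.
manual_boot <- function(data, statistic, R) {
  n <- length(data)
  vapply(seq_len(R), function(i) {
    idx <- sample.int(n, n, replace = TRUE)  # resample positions with replacement
    statistic(data, idx)                     # evaluate the statistic on that resample
  }, numeric(1))
}

set.seed(1)
x <- c(2, 4, 4, 5, 7, 9)
reps <- manual_boot(x, stat_mean, 500)
sd(reps)  # bootstrap estimate of the standard error of the mean
```

Passing seq_along(x) as the indices gives back the ordinary estimate on the full data, which is how the original point estimate relates to the resampled ones.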
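The call to the boot package is simple: boot(data, function, replications) (the package's own argument names are data, statistic and R), as long as you have a function with double subscripts: the first argument is the data, the second is the vector of indices that boot() draws for each replication, and the function must use it to subset the data. So we would use mean(x[y]) rather than mean(x) in the function.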
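A quick way to convince yourself that the subscript matters is to compare both versions. In this sketch (simulated data; the function names good and bad are just for illustration), the function that ignores its second argument returns the full-sample mean on every draw, so the bootstrap replicates do not vary at all.

```r
library(boot)

set.seed(42)
x <- rnorm(50)

good <- function(x, y) mean(x[y])  # uses the resampled indices
bad  <- function(x, y) mean(x)     # ignores them

b_good <- boot(x, good, 500)
b_bad  <- boot(x, bad, 500)

sd(b_good$t[, 1])  # positive: replicates vary across resamples
sd(b_bad$t[, 1])   # (numerically) zero: every replicate is the full-sample mean
```

R does not warn about the second version, which is why the mistake is easy to make.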
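library(boot)  # boot()
library(agrmt) # polarization() and collapse()
w <- c(-1,-0.5,0,0.5,1) # positions at which data could occur, used by polarization function: the pos=w argument
p <- function(x,y) {polarization(collapse(x[y], pos=w))} # x = data, y = resampled indices
collapsed <- c(0,2,2,0,0) # what collapse(d, pos=w) returns; *not* used
d <- c(0,0,-.5,-.5) # raw data; used; normally a variable
z <- boot(d,p,500) # bootstrapping: data=d, function=p, 500 draws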
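The object returned by boot() stores the estimate on the original data in t0 and the replicates in the matrix t, with one row per draw and one column per statistic. Here is a short sketch using the mean instead of polarization, so that it runs without agrmt (the data vector and the name p_mean are made up for illustration):

```r
library(boot)

set.seed(7)
d <- c(3, 1, 4, 1, 5, 9, 2, 6)
p_mean <- function(x, y) mean(x[y])  # two-argument statistic
z <- boot(d, p_mean, 500)

z$t0      # statistic on the original data, here mean(d)
dim(z$t)  # 500 rows (draws) by 1 column (one statistic)
z$R       # number of bootstrap replications
```

Printing z shows the original estimate together with the bootstrap bias and standard error.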
To come back to my example, calculating standard errors is simple: sd(as.numeric(boot(POSIT,p,500)$t)). So we run 500 draws of the function p defined above and take the standard deviation of the 500 replicates, which is the bootstrap estimate of the standard error.
Here’s an example using the mean: p <- function(x,y) {mean(x[y], na.rm=TRUE)}
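Because the bootstrap standard error of a mean should be close to the familiar formula sd/sqrt(n), the mean makes a convenient sanity check. This is a sketch with simulated data; the data and the comparison are my additions, not part of the example above. Note that with na.rm=TRUE the missing value is resampled along with everything else and then dropped, so individual draws may use slightly different numbers of observations.

```r
library(boot)

set.seed(123)
x <- c(rnorm(99, mean = 10, sd = 2), NA)  # simulated data with one missing value

p <- function(x,y) {mean(x[y], na.rm=TRUE)}

se_boot    <- sd(as.numeric(boot(x, p, 500)$t))          # bootstrap standard error
se_formula <- sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))  # textbook SE of the mean

c(bootstrap = se_boot, formula = se_formula)  # should be close, not identical
```

The two numbers differ a little because of the random draws and because the bootstrap effectively divides by n rather than n - 1.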
Using functions may appear cumbersome at first sight, but it pays off once you use sapply to calculate many standard errors at once: sapply(1980:2010, function(x) sd(as.numeric(boot(POSIT[YEAR == x],p,500)$t))) will get the standard errors for each of the 31 years.
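To try the sapply pattern without the original survey data, here is a sketch with simulated stand-ins for POSIT and YEAR. Both variables are made up here, as is the choice of the mean as the statistic; with the real data you would use the polarization function p instead.

```r
library(boot)

set.seed(2024)
YEAR  <- rep(1980:2010, each = 40)  # simulated: 40 respondents per year
POSIT <- sample(c(-1, -0.5, 0, 0.5, 1), length(YEAR), replace = TRUE)

p <- function(x,y) {mean(x[y], na.rm=TRUE)}

ses <- sapply(1980:2010, function(x) sd(as.numeric(boot(POSIT[YEAR == x], p, 500)$t)))
names(ses) <- 1980:2010  # label each standard error with its year
ses
```

vapply(1980:2010, ..., numeric(1)) would do the same while also checking that every call returns a single number.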