With the R command sapply() we can easily apply a function to each element of a vector or list. Here I simply want to highlight that sapply() can be used within sapply(): it can be nested.
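As a minimal illustration of nesting, using made-up numbers rather than real data: the inner sapply() returns a vector for each value of the outer one, and sapply() binds these vectors together as the columns of a matrix.

```r
# Outer sapply() loops over i = 1..3, inner sapply() over j = 1..4
m <- sapply(1:3, function(i) sapply(1:4, function(j) i * j))

dim(m)   # [1] 4 3  -> rows come from the inner loop (j), columns from the outer (i)
m[2, 3]  # [1] 6    -> j = 2, i = 3
```

The same rule holds for any nesting: the inner sapply() determines the rows, the outer one the columns.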
First, a simple application: I have several countries in a dataset, and want to generate a table for each of them.
sapply(c("AT", "DE", "CH"), function(x) round(prop.table(table(object[country == x]))*100, 1))
Step by step: the table() function counts how many cases there are in each category of the variable object. The subscript [country == x] restricts this to the cases from one country: in each round, R replaces x with one of the items provided, first “AT”, then “DE”, and finally “CH”. The prop.table() function turns the counts into proportions. Here I also multiply these by 100 to get percentages, and round them to one decimal place. The function(x) part tells R that we define our own function, and that x is its argument. The first argument of sapply() is the vector of countries I want to use. I end up with a table with the countries across the columns and the categories of my variable down the rows. This only works out as a neat table if every country has the same set of categories, for example when object is a factor; otherwise sapply() returns a list.
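To try the one-liner without the original data, here is a minimal sketch on a small, made-up dataset; object and country are hypothetical vectors standing in for the real variables. Storing object as a factor means every country's table has the same three categories, even where a category is empty, so sapply() can combine them into a matrix.

```r
# Made-up toy data: 4 respondents per country
country <- rep(c("AT", "DE", "CH"), each = 4)
object  <- factor(c("a", "a", "b", "c",    # AT
                    "a", "b", "b", "b",    # DE (no "c" at all)
                    "c", "c", "a", "b"),   # CH
                  levels = c("a", "b", "c"))

# Percentage distribution of object within each country, one decimal place
tab <- sapply(c("AT", "DE", "CH"),
              function(x) round(prop.table(table(object[country == x])) * 100, 1))
tab
# Columns AT, DE, CH; rows a, b, c. Column DE holds 25, 75 and 0.
```

If object were a plain character vector instead, DE's table would have only two entries and sapply() would fall back to returning a list.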
With just three countries, using sapply() may seem trivial, but the same code scales to all the countries in the dataset without getting any longer. Using sapply() rather than for() loops has two important advantages. First, it is often faster than a loop that grows its result step by step, although a for() loop that preallocates its result is usually about as quick. Second, we usually end up with (much) more compact code, reducing the risk of mistakes when copying and pasting code.
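A minimal sketch of both points, again with made-up data: passing unique(country) instead of a hand-typed vector runs the code on every country present in the data, and the for() loop computes the same result (a simple mean, chosen here only for illustration) with noticeably more bookkeeping.

```r
# Made-up toy data
country <- c("AT", "AT", "DE", "DE", "CH", "CH", "FR")
value   <- c(1, 3, 2, 4, 5, 7, 6)

# sapply(): one line, every country in the data, names added automatically
by_sapply <- sapply(unique(country), function(x) mean(value[country == x]))

# Equivalent for() loop: preallocate, name, and fill the result by hand
cs <- unique(country)
by_loop <- numeric(length(cs))
names(by_loop) <- cs
for (i in seq_along(cs)) {
  by_loop[i] <- mean(value[country == cs[i]])
}

all.equal(by_sapply, by_loop)  # TRUE
```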
Let’s assume our dataset includes variation over time as well as across countries. We can simply nest two sapply() commands. Here we have code to calculate the median salience by country and year.
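cy <- c("AT", "DE", "CH")   # countries of interest
yr <- 2000:2010             # years of interest
# outer sapply() loops over countries (columns), inner sapply() over years (rows)
country.salience <- sapply(cy, function(x) sapply(yr, function(y) median(salience[country == x & year == y], na.rm=TRUE)))
rownames(country.salience) <- yr
colnames(country.salience) <- cy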
At the top I define the countries of interest and the years I want to examine. At the core of the code, we take the median of the variable salience for country x and year y, ignoring missing values. The outer sapply() runs this on the countries chosen, and the inner sapply() runs it on the years chosen for each country, so the result has one row per year and one column per country. Country/year combinations without any data come out as NA. The last two lines simply name the rows and columns to give a readily accessible table.
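Here is a self-contained sketch of the nested version on a tiny, made-up dataset (two countries, hypothetical salience scores). The year 2003 is included in yr but has no observations, which shows how empty country/year cells come out as NA.

```r
# Made-up toy data: two observations per country and year, 2000-2002 only
country  <- rep(c("AT", "DE"), each = 6)
year     <- rep(rep(2000:2002, each = 2), times = 2)
salience <- c(1, 3,  2, 4,  5, NA,   # AT: 2000, 2001, 2002
              2, 2,  6, 8,  1, 9)    # DE: 2000, 2001, 2002

cy <- c("AT", "DE")
yr <- 2000:2003   # 2003 is deliberately absent from the data

# Outer loop over countries (columns), inner loop over years (rows)
country.salience <- sapply(cy, function(x) {
  sapply(yr, function(y) {
    median(salience[country == x & year == y], na.rm = TRUE)
  })
})
rownames(country.salience) <- yr
country.salience
# AT column: 2, 3, 5, NA;  DE column: 2, 7, 5, NA
```

Because cy is a character vector, sapply() already names the columns after it (USE.NAMES = TRUE is the default), so only the row names for the years have to be added by hand.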
What if we now want the interpolated median instead of the median? We simply replace that part of the code. In contrast to copy-and-paste code, we make the change once, not once for every country/year combination.
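To make the swap concrete, here is a minimal sketch. Base R has no interpolated-median function, so interp_median() is a hypothetical helper written for this example. It assumes integer-coded responses and uses one common convention: each value v stands for the interval from v - 0.5 to v + 0.5, and the median is interpolated within the category that contains the middle case. Only the function call inside the nested sapply() changes.

```r
# Hypothetical helper (not from base R): interpolated median for integer-coded data
interp_median <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  # No data (or NA present with na.rm = FALSE): return NA, as median() would
  if (length(x) == 0 || anyNA(x)) return(NA_real_)
  n       <- length(x)
  vals    <- sort(unique(x))
  counts  <- tabulate(match(x, vals), nbins = length(vals))
  cum     <- cumsum(counts)
  k       <- which(cum >= n / 2)[1]  # index of the median category
  n_below <- cum[k] - counts[k]      # cases strictly below that category
  # lower bound of the category + share of the category needed to reach n/2
  (vals[k] - 0.5) + (n / 2 - n_below) / counts[k]
}

# Made-up toy data: three observations per country and year
country  <- rep(c("AT", "DE"), each = 6)
year     <- rep(rep(2000:2001, each = 3), times = 2)
salience <- c(1, 2, 2,  3, 3, 4,   # AT: 2000, 2001
              2, 2, 2,  1, 3, 3)   # DE: 2000, 2001
cy <- c("AT", "DE")
yr <- 2000:2001

# Same nested structure as before; only median() has been replaced
country.salience <- sapply(cy, function(x) sapply(yr, function(y)
  interp_median(salience[country == x & year == y])))
rownames(country.salience) <- yr
country.salience
# AT: 1.75, 3.25;  DE: 2.00, 2.75 (plain medians would be 2, 3 and 2, 3)
```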