# Double sapply()

With the R command `sapply()` we can easily apply a function many times. Here I simply want to highlight that `sapply()` can be used within `sapply()`: it can be nested.
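As a minimal illustration outside any dataset, the sketch below nests one `sapply()` inside another to build a small multiplication table. The name `tab` is illustrative only. The inner `sapply()` loops over `j` for a fixed `i`, and the outer one loops over `i`. Because every inner call returns a vector of the same length, R binds those vectors as the columns of a matrix.

```r
# Inner sapply(): runs over j (rows) for a fixed i
# Outer sapply(): runs over i (columns); equal-length results are bound into a matrix
tab <- sapply(1:4, function(i) sapply(1:3, function(j) i * j))
print(tab)  # a 3 x 4 matrix; the cell in row j, column i holds i * j
```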

First, a simple application: I have several countries in a dataset, and want to generate a table for each of them.

```r
sapply(c("AT", "DE", "CH"), function(x)
  round(prop.table(table(object[country == x])) * 100, 1))
```

Step by step: the `table()` function counts how many cases there are in each category of the variable `object`. The subscript `[country == x]` restricts this count to the cases from one country. In each round, `sapply()` replaces `x` with one of the items provided: first “AT”, then “DE”, and finally “CH”. The `prop.table()` function turns the counts into proportions. Here I also multiply these by 100 to get percentages, and round them to one decimal place. The `function(x)` part tells R that we define our own function, and that `x` is its argument. The first argument to `sapply()` is the vector of countries I want to use. I end up with a table with the countries across and the categories of the variable `object` down.
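The snippet assumes that the vectors `object` and `country` already exist. Here is a self-contained sketch with made-up data; the country codes match the post, but the values and the category labels "no"/"yes" are hypothetical. We make `object` a factor so that every country's table has the same categories. If one country lacked a category, the tables would differ in length and `sapply()` would return a list instead of a matrix.

```r
# Made-up data: four respondents per country
country <- rep(c("AT", "DE", "CH"), each = 4)
object <- factor(c("yes", "no", "no", "yes",     # AT
                   "yes", "yes", "yes", "no",    # DE
                   "no", "no", "no", "yes"),     # CH
                 levels = c("no", "yes"))

# Percentage in each category of object, one column per country
shares <- sapply(c("AT", "DE", "CH"), function(x)
  round(prop.table(table(object[country == x])) * 100, 1))
print(shares)  # rows: "no", "yes"; columns: "AT", "DE", "CH"
```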

With just three countries, the gain from `sapply()` is rather small, but how about running the code on all countries in the dataset? We can replace the vector of country codes with `unique(country)`. Using `sapply()` rather than `for()` loops has two important advantages. First, it can be faster than a loop that grows its result step by step. Second, we usually end up with (much) more compact code, reducing the risk of mistakes when copying and pasting code.
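To compare the two approaches, here is a sketch on the same kind of made-up data that runs over `unique(country)` once with `sapply()` and once with a `for()` loop. The helper name `pct()` is ours, not from the post. The loop version has to preallocate the result matrix and name its rows and columns by hand, which is part of why the `sapply()` version is more compact.

```r
# Made-up data, as before
country <- rep(c("AT", "DE", "CH"), each = 4)
object <- factor(c("yes", "no", "no", "yes", "yes", "yes", "yes", "no",
                   "no", "no", "no", "yes"), levels = c("no", "yes"))

# Hypothetical helper: percentages for one country
pct <- function(x) round(prop.table(table(object[country == x])) * 100, 1)

# sapply() over every country present in the data
by_sapply <- sapply(unique(country), pct)

# The same result with a for() loop: preallocate, name, then fill column by column
cys <- unique(country)
by_loop <- matrix(NA_real_, nrow = nlevels(object), ncol = length(cys),
                  dimnames = list(levels(object), cys))
for (cy in cys) by_loop[, cy] <- pct(cy)
```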

Let’s assume our dataset includes variation over time as well as across countries. We can simply nest two `sapply()` commands. Here we have code to calculate the median salience by country and year.

```r
cy <- c("AT", "DE", "CH")
yr <- 2000:2010
country.salience <- sapply(cy, function(x)
  sapply(yr, function(y) median(salience[country == x & year == y], na.rm = TRUE)))
rownames(country.salience) <- yr
colnames(country.salience) <- cy
```

At the top I define the countries of interest and the years I want to examine. At the core of the code, we take the median of the variable `salience` for country x and year y. The outer `sapply()` runs this over the countries chosen, and the inner `sapply()` runs it over the years chosen, so each country becomes a column and each year a row. The last two lines simply name the rows and columns to give a readily accessible table.
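Here is a self-contained sketch of the nested version with simulated salience scores. The data are invented; only the nesting mirrors the post. Because `cy` is a character vector, `sapply()` already uses it as column names (its default `USE.NAMES = TRUE`), so in this sketch only the row names need to be set.

```r
# Simulated panel: 5 observations for every country-year combination (made-up data)
set.seed(1)
d <- expand.grid(obs = 1:5, year = 2000:2010, country = c("AT", "DE", "CH"),
                 stringsAsFactors = FALSE)
d$salience <- runif(nrow(d))
d$salience[c(3, 50, 120)] <- NA   # a few missing values, dropped via na.rm = TRUE

# Vectors named as in the post
country <- d$country
year <- d$year
salience <- d$salience

cy <- c("AT", "DE", "CH")
yr <- 2000:2010
# Outer sapply(): one column per country; inner sapply(): one row per year
country.salience <- sapply(cy, function(x)
  sapply(yr, function(y) median(salience[country == x & year == y], na.rm = TRUE)))
rownames(country.salience) <- yr
```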

What if we now want the interpolated median instead of the median? We simply replace that part of the code. Unlike copied-and-pasted code, we make the change once, not once for every country/year combination.
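One way to make the statistic a single point of change is to pass it as an argument. Below, `by_country_year()` and `interp_median()` are hypothetical helpers. The interpolated median follows one common definition for integer-valued data in unit-width categories: L + (N/2 − F)/f, where L is the value of the median category minus 0.5, F is the number of observations below that category, and f is the number within it. This may not be the definition the post has in mind.

```r
# Hypothetical helper: nested sapply() with the summary statistic as an argument
by_country_year <- function(values, country, year, cy, yr,
                            stat = function(v) median(v, na.rm = TRUE)) {
  out <- sapply(cy, function(x)
    sapply(yr, function(y) stat(values[country == x & year == y])))
  rownames(out) <- yr
  out
}

# Hypothetical interpolated median for integer-valued data (e.g. 1-5 ratings)
interp_median <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  if (n == 0) return(NA_real_)
  m <- x[ceiling(n / 2)]       # median category
  below <- sum(x < m)          # F: observations below the median category
  within <- sum(x == m)        # f: observations in the median category
  (m - 0.5) + (n / 2 - below) / within
}

# Made-up 1-5 ratings for two countries and two years
set.seed(2)
country <- rep(c("AT", "DE"), each = 20)
year <- rep(rep(2000:2001, each = 10), times = 2)
salience <- sample(1:5, 40, replace = TRUE)

medians <- by_country_year(salience, country, year, c("AT", "DE"), 2000:2001)
interp <- by_country_year(salience, country, year, c("AT", "DE"), 2000:2001,
                          stat = interp_median)
```

Switching statistics now means changing only the `stat` argument. The nesting itself stays untouched.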

## 5 thoughts on “Double sapply()”

1. Didier Ruedin says:

Do I get brownie points for using a quadruple sapply? This week I had fun calculating exponential moving averages on my data: a first sapply to get the average for the 7-year span, a second sapply to do this for all 20 years in the dataset, a third sapply to do this for all the parties in the dataset, and a fourth to do this for all 8 countries in the dataset. That’s a single line of code producing a lot of calculations… Just to show that nesting sapply functions really can be useful!

2. Glen says:

Hi Didier, if possible, I would love to see that quadruple sapply code. I am working on a project using similar data points. Thank you.

3. Didier Ruedin says:

Hi Glen. Thanks for checking in. I’m not sure what exactly I was working on five years ago, but the principle is the same as described in the post above.

4. Didier Ruedin says:

Hi Glen. Here it comes:

```r
countries <- unique(country)
everything <- sapply(countries, function(cy) {
  parties <- unique(party[country == cy])  # all parties in this country
  this <- sapply(parties, function(z) sapply(1990:2013, function(y)
    mav(as.numeric(sapply(seq(y - 3, y + 3), function(x)
      experts.raw[country == cy & party == z & year == x])))))
  rownames(this) <- 1990:2013
  colnames(this) <- parties
  this
})
```

`mav()` is a moving average function. I wrapped the values passed to it in an `sapply()` to retrieve each of the seven years in my dataset separately.

```r
mav <- function(x, k = 3) {
  n <- …  # the rest of the definition was not preserved
}
```

The dataset consists of several columns, of which “country”, “year”, “party”, and “experts.raw” are relevant. In “experts.raw” we have the party position from an expert survey; the other three columns identify the country, year, and party. There are plenty of missing values because we do not have expert estimates for all the elections.

- In the first `sapply()`, I ask R to run the code for each of the countries.
- In the second `sapply()`, I ask R to run the code for each party in a country.
- In the third `sapply()`, I ask R to run the code for each year (for a given party in a given country).
- The fourth `sapply()` collects the existing expert estimates for the year examined +/- 3 years (i.e. 7 years), and the moving average is calculated on them.
- After closing all the relevant brackets, I just label the rows and columns.
- The resulting object “everything” is a list of 8 matrices (one per country), each with a column for each party in that country and a row for each year from 1990 to 2013. The cells give the moving average of the expert positions.
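Because the body of `mav()` is missing, here is a runnable sketch of the same four-level structure on made-up data. `window_mean()` is a hypothetical stand-in that simply averages the available values in the seven-year window. It is neither the author's `mav()` nor an exponential moving average. The innermost function returns `NA` when there is no row for a year, for example when the window reaches back before 1990. The argument `simplify = FALSE` keeps the result a list of per-country matrices even if all countries happened to have the same number of parties.

```r
# Made-up long-format data: country, party, year, experts.raw
set.seed(3)
d <- expand.grid(year = 1990:2013, party = c("P1", "P2", "P3"),
                 country = c("AA", "BB"), stringsAsFactors = FALSE)
d <- d[!(d$country == "BB" & d$party == "P3"), ]   # countries differ in parties
d$experts.raw <- round(runif(nrow(d), 0, 10), 1)
d$experts.raw[sample(nrow(d), 40)] <- NA           # elections without estimates

# Hypothetical stand-in for mav(): plain mean of the non-missing values
window_mean <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

years <- 1990:2013
everything <- sapply(unique(d$country), function(cy) {        # 1: countries
  parties <- unique(d$party[d$country == cy])
  this <- sapply(parties, function(z)                          # 2: parties (columns)
    sapply(years, function(y)                                  # 3: years (rows)
      window_mean(sapply(seq(y - 3, y + 3), function(x) {      # 4: the 7-year window
        v <- d$experts.raw[d$country == cy & d$party == z & d$year == x]
        if (length(v) == 1) v else NA_real_                    # no row -> NA
      }))))
  rownames(this) <- years
  this
}, simplify = FALSE)
```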
