Calculating Standard Deviations on Specific Columns/Variables in R

When calculating the mean across a set of variables (or columns) in R, we have rowMeans() and colMeans() at our disposal. What do we do if we want to calculate, say, the standard deviation? There are a couple of packages offering such a function, but there is no need, because we have apply().

Let’s start with creating some data, a matrix with 3 columns full of random numbers.

M <- matrix(rnorm(30), ncol=3)

This gives us something like this:
            [,1]        [,2]        [,3]
 [1,] -0.3533716 -1.12408752  0.09979301
 [2,]  0.6099991 -0.48712761  0.22566861
 [3,] -0.9374809 -1.10497004 -0.26493616
 [4,] -0.5243967 -0.66074559  0.16858864
 [5,]  0.2094733 -0.45156576 -0.27735151
 [6,]  0.6800691  1.82395926 -0.18114150
 [7,]  0.1862829  0.43073422  0.14464538
 [8,] -1.0130029 -1.52320349 -1.74322076
 [9,]  1.1886103  0.09653443 -1.95614608
[10,] -0.9953963 -1.15683775  1.61106346

Now comes apply(). The second argument, 1, indicates that we want to apply the specified function (here: sd(), but we can use any function we want) to each row, that is, across the columns of that row. We can use 2 instead to apply it to each column.

apply(M, 1, sd)

This gives us the standard deviations for each row:

[1] 0.6187682 0.5566979 0.4446021 0.4447124 0.3426177 1.0058659 0.1545623
[8] 0.3745954 1.5966433 1.5535429

We can quickly check whether these numbers are correct:

sd(c(-0.3533716, -1.12408752, 0.09979301))

[1] 0.6187682
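
The same check can be run over every row at once rather than by copying numbers by hand. The following is only a minimal sketch: the seed and the names row_sd, manual_sd and col_sd are assumptions added for illustration, so the values will differ from the matrix printed above. It also calls apply() with margin 2 for contrast, which returns one value per column.

```r
set.seed(42)                       # assumed seed, only for reproducibility
M <- matrix(rnorm(30), ncol = 3)   # 10 rows, 3 columns

row_sd <- apply(M, 1, sd)          # margin 1: one standard deviation per row

# Recompute each row's standard deviation explicitly and compare
manual_sd <- numeric(nrow(M))
for (i in seq_len(nrow(M))) {
  manual_sd[i] <- sd(M[i, ])
}
all.equal(row_sd, manual_sd)       # TRUE when both approaches agree

col_sd <- apply(M, 2, sd)          # margin 2: one standard deviation per column
length(row_sd)                     # 10, one per row
length(col_sd)                     # 3, one per column
```

If the two lengths come out swapped, the margin argument is the first thing to check.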

Of course we can choose the variables or columns we want, for example with apply(M[,2:3], 1, sd), or by combining the columns of interest with cbind().
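
Here is a short sketch of these selection options side by side. The seed, the data frame df and its column names x, y and z are hypothetical, introduced only to show selection by name. All three calls should give the same row-wise standard deviations.

```r
set.seed(42)                       # assumed seed, only for reproducibility
M <- matrix(rnorm(30), ncol = 3)

# Select a range of columns by index
sd_a <- apply(M[, 2:3], 1, sd)

# Bind the chosen columns together explicitly with cbind()
sd_b <- apply(cbind(M[, 2], M[, 3]), 1, sd)
all.equal(sd_a, sd_b)              # TRUE: same columns, same result

# With a data frame, columns can be picked by name (names are hypothetical)
df <- data.frame(x = M[, 1], y = M[, 2], z = M[, 3])
sd_c <- apply(df[, c("y", "z")], 1, sd)
all.equal(sd_a, unname(sd_c))      # TRUE: apply() converts df to a matrix first
```

Note that selecting a single column, as in M[, 2], returns a plain vector, so apply() needs at least two columns here (or drop = FALSE) to work on a matrix.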
