Replication as learning

As part of the course on applied statistics I’m teaching, my students have to try to replicate a published paper (or, more typically, part of its analysis). It’s an excellent opportunity to apply what they have learned in the course, and probably the best way to teach researcher degrees of freedom and how much we should trust the results of a single study. It’s also an excellent reminder to do better than much of the published research in providing proper details of the analysis we undertake. Common problems include not describing the selection of cases (where not everyone remains in the sample), opaque recoding of variables, and variables that are not described at all. An interesting case is the difference between what the authors wanted to do (e.g. restrict the sample to voters) and what they apparently did (e.g. forget to do so). One day, I hope this exercise will become obsolete: the day my students can readily download replication code…

Image: CC-by-nd Tina Sherwood Imaging https://flic.kr/p/8iz7qS

Calculating Standard Deviations on Specific Columns/Variables in R

When calculating the mean across a set of variables (or columns) in R, we have colMeans() at our disposal. What do we do if we want to calculate, say, the standard deviation? There are a couple of packages offering such a function, but there is no need, because we have apply().

Let’s start with creating some data, a matrix with 3 columns full of random numbers.

M <- matrix(rnorm(30), ncol=3)

This gives us something like this:
[,1] [,2] [,3]
[1,] -0.3533716 -1.12408752 0.09979301
[2,] 0.6099991 -0.48712761 0.22566861
[3,] -0.9374809 -1.10497004 -0.26493616
[4,] -0.5243967 -0.66074559 0.16858864
[5,] 0.2094733 -0.45156576 -0.27735151
[6,] 0.6800691 1.82395926 -0.18114150
[7,] 0.1862829 0.43073422 0.14464538
[8,] -1.0130029 -1.52320349 -1.74322076
[9,] 1.1886103 0.09653443 -1.95614608
[10,] -0.9953963 -1.15683775 1.61106346

Now comes apply(), where the 1 indicates that we want to apply the specified function (here: sd(), but we can use any function we want) to each row, that is, across the columns; we can use 2 to apply it to each column instead.

apply(M, 1, sd)

This gives us the standard deviations for each row:

[1] 0.6187682 0.5566979 0.4446021 0.4447124 0.3426177 1.0058659 0.1545623
[8] 0.3745954 1.5966433 1.5535429

We can quickly check whether these numbers are correct by calculating the standard deviation of the first row by hand:

sd(c(-0.3533716, -1.12408752, 0.09979301))

[1] 0.6187682

Of course we can choose just the variables or columns we want, for example apply(M[,2:3], 1, sd), or by combining columns with cbind().
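
For completeness, here is a quick sketch of both variants, using the matrix M created above (nothing beyond base R is assumed). The cbind() call gives the same row-wise result as the subsetting above, and margin 2 gives the column-wise standard deviations, the direct analogue of colMeans() mentioned at the start:

apply(cbind(M[,2], M[,3]), 1, sd)  # row-wise SD over columns 2 and 3; same as apply(M[,2:3], 1, sd)
apply(M, 2, sd)                    # column-wise SDs: the direct analogue of colMeans()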

Another one to watch: jamovi for stats

Here’s another open statistical program to watch: jamovi. Like JASP, jamovi is built on top of R. Unlike JASP, jamovi is not focused on Bayesian analysis, but wants to be community driven. This means it has plugins (‘modules’) through which others can contribute missing functionality. With its easy-to-use interface, familiar from JASP, jamovi is bound to appeal to many researchers, and those familiar with SPSS will find their way around without problems. This is definitely one to watch.

Factorizing Error (in Zelig)

Today I re-ran some code in R and was greeted with an error: “Error in factorize(formula, data, f_out = TRUE)” and (more specifically) “Unable to find variable to convert to factor.” I immediately suspected the as.factor(x) among the predictor variables to be the culprit, but since this was an analysis that had worked a few days earlier (on a different machine), I quickly searched the web and found nothing. For some reason, in this case the as.factor(x) inside the formula did not work, and the solution was simple: create a new (factor) variable separately and then run the regression analysis with the new variable. So instead of:

z <- zelig(y ~ x1 + as.factor(x2) + x3, model="normal", data=d)

I first create the new variable:

d$x2_factor <- as.factor(d$x2)

and then run the regression analysis with the new variable:

z <- zelig(y ~ x1 + x2_factor + x3, model="normal", data=d)
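
To make the workaround easier to try out, here is a minimal self-contained sketch with made-up data; the variable names y, x1, x2, x3 and the data frame d are just placeholders (not the original analysis), and the Zelig package is assumed to be installed:

library(Zelig)

# Made-up example data (placeholders only)
set.seed(42)
d <- data.frame(y  = rnorm(100),
                x1 = rnorm(100),
                x2 = sample(1:3, 100, replace = TRUE),  # categorical variable stored as numbers
                x3 = rnorm(100))

# Convert to a factor before the call, rather than inside the formula
d$x2_factor <- as.factor(d$x2)
z <- zelig(y ~ x1 + x2_factor + x3, model = "normal", data = d)
summary(z)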

I just thought I’d share this in case someone else comes across this error and doesn’t find it obvious what the solution is.

Students are getting better = happy

I have just finished grading a batch of reports and am happy to see that my students this year seem to be doing better. What makes me particularly happy is that there were fewer “serious flaws”.