Turning R into SPSS?

I have written about several free alternatives to SPSS, including PSPP, jamovi, and JASP. Bob Muenchen has reviewed a few more options: Deducer, RKWard, Rattle, and the good old R Commander (in the screenshot on the left). There is also a review of Blue Sky Statistics, another option for those seeking SPSS “simplicity” with R power underneath.

Blue Sky Statistics is available for Windows and is open source; the developers make money from paid support. It comes with a polished interface and a data editor reminiscent of Excel. I was very happy to see that Blue Sky Statistics offers many options for data handling, like recoding, merging, computing variables, or subsetting. That’s much better than what, say, jamovi offers at the moment.

The dialogs are quite intuitive if you are familiar with SPSS, and they can also produce R code. This is a feature we know from the R Commander, and ostensibly the aim is to help users wean themselves off the graphical interface and move to the console. Nice as the idea is, it is defeated by custom commands like BSkyOpenNewDataset() that are not part of everyday R.

The models offered by Blue Sky Statistics are fine for many uses, just not for those living on the cutting edge. A nice touch is the interactive tables in the output, which you can customize to some degree.

Exciting as Blue Sky Statistics and other GUIs are at first sight, I’m gradually becoming less excited about GUIs for R. Probably the biggest hurdle is the “hey, this is all text!” shock when you first open R (or, more typically these days, RStudio). Once you realize that the real challenge is to make the right choices and then interpret your results, you become less hung up about the “right” software. Once you realize that you’ll have to remember something either way (where to click, or what to type), copying and pasting code fragments becomes less daunting. If you restrict yourself to a few basic commands like lm(), plot(), and summary(), R isn’t that difficult. Sure, R can be hard when you come across idiosyncrasies because different developers use different naming conventions. But then there are also the moments when you realize how many ready-made solutions (i.e., packages) are available, and that with R you really are in control of your analysis. And the day you learn about replication and knitr, there’s hardly a way back.

One reason I kept looking for GUIs was my MA students. I’m excited to see more and more of them choosing RStudio over SPSS (they are given the choice; we currently use both in parallel)… so there might simply be no need for turning R into SPSS.

 

Correlation Graphics in R

Correlations are among the basics of quantitative analysis, and they are well suited for graphical examination. Using plots, we can see, for example, whether it is justified to assume a linear relationship between the variables. Scatter plots are our friends here, and with two variables it is as simple as calling plot() in R:

plot(var1, var2)

If we have more than two variables, it can be useful to plot a scatter plot matrix: multiple scatter plots in one go. The pairs() command is built in, but in my view not the most useful one out there. Here we use cbind() to combine a few variables and specify that we don’t want to see the same scatter plots again (with the axes swapped) in the upper panel.

pairs(cbind(var1, var2, var3, var4), upper.panel=NULL)
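Since var1 to var4 are placeholders, here is a self-contained sketch using the built-in mtcars data (the choice of columns is arbitrary). It shows one way to tweak pairs() through its panel arguments: panel.smooth() from base R adds a lowess curve to the lower panels, which helps when judging linearity, and panel_cor() is a hypothetical helper, modelled on the example in ?pairs, that prints Pearson’s r in the upper panels instead of the mirrored plots.

```r
# Hypothetical helper: print the correlation coefficient in an upper panel
panel_cor <- function(x, y, ...) {
  usr <- par("usr")           # remember the panel's coordinate system
  on.exit(par(usr = usr))     # and restore it when we are done
  par(usr = c(0, 1, 0, 1))    # unit coordinates, so the text is centred
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, formatC(r, format = "f", digits = 2))
  invisible(r)
}

# Built-in data as a stand-in for var1 ... var4
vars4 <- mtcars[c("mpg", "hp", "wt", "qsec")]

pairs(vars4,
      lower.panel = panel.smooth,  # scatter plot plus lowess curve
      upper.panel = panel_cor)     # correlation coefficient as text
```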

A more flexible option is scatterplotMatrix() from the car package. If this is not flexible enough, we can always split the plot and draw whatever we need, but that’s not for today.

library(car)
scatterplotMatrix(cbind(var1, var2, var3, var4))

If we have many more variables, it’s necessary to draw multiple plots to see what is going on. However, sometimes, after having checked that the associations are more or less linear, we’re simply interested in the strength and direction of the correlations for many combinations of variables. I guess the classic approach is staring at a large table of correlation coefficients, but as is often the case, graphics can make your life easier, in this case the corrplot package:

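library(corrplot)
# corrplot() expects a correlation matrix, not the raw (numeric) data, so compute it with cor() first
corrplot(cor(object_with_many_variables), method="circle", type="lower", diag=FALSE)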
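If we still want numbers, for instance to decide which pairs to look at first, a short base-R sketch can turn the same correlation matrix into a list of variable pairs sorted by the size of the coefficient. It uses the built-in mtcars data as a stand-in for object_with_many_variables; pairs_tab is just an illustrative name. symnum() from the stats package adds a compact symbolic view of the whole matrix, a text-only cousin of corrplot().

```r
# Correlation matrix of all 11 (numeric) variables in mtcars
m <- cor(mtcars, use = "pairwise.complete.obs")

# Each pair once: positions in the lower triangle, diagonal excluded
idx <- which(lower.tri(m), arr.ind = TRUE)
pairs_tab <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  r    = m[idx]
)

# Strongest associations first, whatever their sign
pairs_tab <- pairs_tab[order(-abs(pairs_tab$r)), ]
head(pairs_tab, 5)

# Compact symbolic display of the whole matrix
symnum(m)
```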

This is certainly more pleasant than staring at a table…

For all these commands, R offers plenty of ways to tweak the output.

Custom Tables of Descriptive Statistics in R

Here’s how we can quite easily and flexibly create tables of descriptive statistics in R. Of course, we can simply use summary(variable_name), but its output is not what you’d include in a manuscript, and so not what you want when compiling a document in knitr/R Markdown.

First, we identify the variables we want to summarize, since our dataset often includes many more variables:

vars <- c("variable_1", "variable_2", "variable_3")

Note that these are the variable names in quotes. Second, we use lapply() to calculate whatever summary statistic we want. This is where flexibility kicks in: have you ever tried to include an interpolated median in such a table? In R it is just as easy as the mean. Here’s an example with the mean, minimum, maximum, and median:

v_mean <- lapply(dataset[vars], mean, na.rm=TRUE)
v_min <- lapply(dataset[vars], min, na.rm=TRUE)
v_max <- lapply(dataset[vars], max, na.rm=TRUE)
v_med <- lapply(dataset[vars], median, na.rm=TRUE)
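Base R has no interpolated median, but any function that takes a vector can be dropped into the same lapply() call. Below is a minimal sketch of the usual grouped-data formula, L + w(n/2 - F)/f, where L is the lower boundary of the median class, w the class width, F the number of observations below that class, and f the number in it. It assumes integer-valued scores such as Likert items with a class width of 1; interp_median() is a hypothetical helper, not a function from a package, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: interpolated median for grouped (e.g. Likert) data,
# assuming each distinct value is the midpoint of a class of width `width`
interp_median <- function(x, na.rm = FALSE, width = 1) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0 || anyNA(x)) return(NA_real_)
  n <- length(x)
  m <- sort(x)[ceiling(n / 2)]   # median class: first value reaching n/2 cumulatively
  below <- sum(x < m)            # F: observations below the median class
  within <- sum(x == m)          # f: observations in the median class
  (m - width / 2) + width * (n / 2 - below) / within
}

# Toy data frame standing in for `dataset` (made-up values)
dataset <- data.frame(
  item_1 = c(1, 2, 2, 3, 3, 3, 4, 5),
  item_2 = c(2, 2, 3, 4, 4, 5, NA, 5)
)
vars <- c("item_1", "item_2")

# Used exactly like mean() or median() in the lapply() calls above
v_imed <- lapply(dataset[vars], interp_median, na.rm = TRUE)
v_imed
```

For item_1, the ordinary median is 3, while the interpolated median is 2.5 + (4 - 3)/3, or about 2.83, reflecting that more of the middle observations sit at or below 3.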

Too many digits? We can use round() to get rid of them. The kable() function we’ll use in a minute has a digits argument that in principle allows rounding at the very end, but unfortunately it often fails on me. Rounding:

v_mean <- round(as.numeric(v_mean), 2)

Now we only need to bring the different summary statistics together:

v_tab <- cbind(mean=v_mean, min=v_min, max=v_max, median=v_med)
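Because lapply() returns lists, cbind() here creates a matrix whose cells are list elements rather than plain numbers. That may well be why kable()’s digits argument seems to fail: it is meant for numeric columns. A possible alternative, sketched below with the built-in mtcars data standing in for dataset, is sapply(), which simplifies each result to a named numeric vector, so the combined table is an ordinary numeric matrix that round() handles in one go.

```r
# mtcars stands in for `dataset`; the three variables are an arbitrary choice
dataset <- mtcars
vars <- c("mpg", "hp", "wt")

# sapply() returns named numeric vectors instead of lists
v_mean <- sapply(dataset[vars], mean, na.rm = TRUE)
v_min  <- sapply(dataset[vars], min, na.rm = TRUE)
v_max  <- sapply(dataset[vars], max, na.rm = TRUE)
v_med  <- sapply(dataset[vars], median, na.rm = TRUE)

# An ordinary numeric matrix, so every column can be rounded at once
v_tab <- round(cbind(mean = v_mean, min = v_min, max = v_max, median = v_med), 2)
rownames(v_tab) <- c("Miles per gallon", "Gross horsepower", "Weight (1000 lbs)")
v_tab
```

In a knitr/R Markdown document, kable(v_tab) then prints this as before; with a numeric matrix, kable(v_tab, digits = 2) should also work in place of round().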

And add useful variable labels:

rownames(v_tab) <- c("Variable 1", "A description of variable 2", "Variable 3")

Finally, we use kable() from the knitr package to generate a decent table:

library(knitr)
kable(v_tab)

If this looks complicated, bear in mind that with no additional work you can change the order of the variables and include any summary statistics. That’s table A1 in the appendix sorted.