Turning R into SPSS?

I have written about several free alternatives to SPSS, including PSPP, Jamovi, and JASP. Bob Muenchen has reviewed a few more options: Deducer, RKWard, Rattle, and the good old R Commander (in the screenshot on the left). We also find a review of Blue Sky Statistics, another option for those seeking SPSS “simplicity” with R power underneath.

Blue Sky Statistics is available for Windows and is open source; the developers make money from paid support. It comes with a polished interface and a data editor that is reminiscent of Excel. I was very happy to see that Blue Sky Statistics offers many options for data handling, like recoding, merging, computing variables, or subsetting; that’s much better than what, say, jamovi offers at the moment.

The dialogs are quite intuitive if you are familiar with SPSS, and they can also produce R code. This is a feature we know from the R Commander, and ostensibly the aim is to help users wean themselves off the graphical interface and move to the console. Nice as the idea is, it is defeated by custom commands like BSkyOpenNewDataset() that we wouldn’t use in ordinary R code.
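For comparison, here is a minimal sketch of the data-handling steps mentioned earlier (recoding, computing a variable, merging, subsetting) in plain base R, without any GUI-specific helper functions. The data frames, variable names, and age groups are hypothetical, invented only for illustration.

```r
# Hypothetical survey data (names and values invented for illustration)
survey <- data.frame(
  id     = 1:6,
  age    = c(23, 35, 47, 52, 61, 19),
  gender = c(1, 2, 2, 1, 2, 1)
)
regions <- data.frame(id = 1:6, region = c("N", "S", "S", "N", "E", "E"))

# Recoding: turn numeric codes into a labelled factor
survey$gender <- factor(survey$gender, levels = c(1, 2), labels = c("male", "female"))

# Computing a new variable: age groups from age in years
survey$age_group <- cut(survey$age, breaks = c(0, 29, 49, Inf),
                        labels = c("18-29", "30-49", "50+"))

# Merging two data sets on a common key
combined <- merge(survey, regions, by = "id")

# Subsetting rows (age 50 and over) and columns
older <- subset(combined, age >= 50, select = c(id, age, region))
older
```

Each operation is a single call to a standard base-R function, which is the kind of code a GUI could show its users if the aim really is to move them to the console.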

The models offered by Blue Sky Statistics are fine for many uses, at least for those not working at the cutting edge. A nice touch is the interactive tables in the output, which you can customize to some degree.

Exciting as Blue Sky Statistics and other GUIs are at first sight, I’m gradually becoming less excited about GUIs for R. Probably the biggest hurdle is the “hey, this is all text!” shock when you first open R (or, typically, RStudio these days). Once you realize that the real challenge is to make the right choices and then interpret your results, you become less hung up about the “right” software. Once you realize that you’ll have to remember something either way (where to click, or what to type), copying and pasting code fragments becomes less daunting. If you restrict yourself to a few basic commands like lm(), plot(), and summary(), R isn’t that difficult. Sure, R can be hard when you come across idiosyncrasies because different developers use different naming conventions. But then there are also the moments when you realize that so many ready-made solutions (i.e., packages) are available, and that with R you really are in control of your analysis. And the day you learn about replication and knitr, there’s hardly a way back.
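To illustrate how far those few basic commands go, a minimal sketch using the mtcars data set that ships with R; regressing fuel efficiency on car weight is just an example model.

```r
# mtcars ships with R (datasets package), so nothing needs to be downloaded
fit <- lm(mpg ~ wt, data = mtcars)  # fuel efficiency regressed on car weight

summary(fit)                        # coefficients, standard errors, R-squared

plot(mpg ~ wt, data = mtcars)       # scatterplot of the raw data
abline(fit)                         # add the fitted regression line
```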

One reason I kept looking for GUIs was my MA students. I’m excited to see more and more of them choosing RStudio over SPSS (they are given the choice; we currently use both in parallel), so there might simply be no need for turning R into SPSS.


Why We Should Watch JASP

JASP (“a fresh way to do statistics”) has been around for a while now, but it is a project I am watching closely. Even though the name explicitly does not stand for “just another statistics program”, that is certainly a useful mnemonic until Google ranks the page higher. JASP comes with a clean interface that will feel familiar to SPSS users, but it actually improves on SPSS on many fronts to make it easier to use. That’s a nice touch.

JASP uses notebook-style output, like we know it from IPython and Jupyter, with live preview. The live preview is great, as users can immediately see what consequences their choices have. Unfortunately, this only really works for relatively small datasets, such as experiments or a simple population survey with a thousand or so respondents. Unlike some similar solutions, JASP does not show the code to the user, which makes for clean output. At the same time, we can go back to the analysis and modify the output at any time. That’s slightly easier than finding the corresponding code in, say, R Markdown and recompiling.

As a nice touch, there is integration with OSF and SocArXiv. This means that if I wrote a paper based on an analysis carried out in JASP, I could upload the analysis file alongside the paper, and anyone could see the output and modify it online.

JASP uses R to do the calculations, which bodes well for the range of analyses it can offer in the future. Unfortunately, what’s on offer at the moment remains limited. For my purposes, then, JASP is not (yet) a replacement for SPSS, just like PSPP. The way JASP implements what it offers, however, is the reason why we should watch this project: it is easy to use, and it lets users choose a Bayesian analysis for everything.

While JASP uses R underneath, there does not appear to be a plugin system (other than getting involved in developing the project itself). I think that user-provided extensions might be just what is needed to make this project take off, because, generally speaking, this is the kind of program I can see myself using in teaching (alongside RStudio) and for simple analyses. For more advanced analyses, R remains the go-to application.

In Praise of PSPP

In an earlier assessment of PSPP as a replacement for SPSS, I mentioned some of the reasons why I think PSPP cannot yet fulfil its ambition of being such a replacement. I’m happy to report that PSPP now does logistic regressions, but the point of this post is to highlight one of its strengths in a practical application: PSPP is super fast.

I frequently use an old, underpowered netbook, and usually that’s enough computing power for basic analyses (most mobile phones these days are more powerful). A few days ago, I wanted to run a very simple analysis on the longitudinal WVS data. We’re looking at a 500 MB SPSS file here, and all I wanted to do was calculate a new variable and then get the mean by country and year. Really basic stuff, except that I only have 1 GB of RAM available (I did say underpowered).

What happened next: I gave up waiting for R to load the data file (the RData file provided by the WVS is 1.3 GB); it took too long. Opening PSPP is a breeze on this machine (and easily beats opening SPSS on the brand-new Windows machine I’m provided with; I cannot imagine how long SPSS would take on the netbook). PSPP wasn’t actually fast: each step took around five minutes (calculating the new variable, sorting/splitting by country and year, and running frequencies to get the mean). But that is about as long as I waited for R to load the data before giving up. Moreover, PSPP played nicely and did not lock up the computer, so I could actually do some other work at the same time; R can be more demanding.
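For comparison, the two steps themselves (a new variable, then its mean by country and year) are short in R once the data are loaded; the bottleneck on the netbook was loading, not the analysis. A minimal sketch, assuming a small invented data frame: the column names `country`, `year`, `trust1` and `trust2` are hypothetical and not the actual WVS variable names.

```r
# Invented example data standing in for the WVS file (one row per respondent)
wvs <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(1990, 1990, 2000, 1990, 2000, 2000),
  trust1  = c(1, 3, 2, 4, 2, 4),
  trust2  = c(3, 1, 2, 2, 4, 4)
)

# Step 1: calculate a new variable (here, the average of two items)
wvs$trust_index <- (wvs$trust1 + wvs$trust2) / 2

# Step 2: mean of the new variable by country and year
means <- aggregate(trust_index ~ country + year, data = wvs, FUN = mean)
means
```

The formula interface of aggregate() drops rows with missing values by default (na.action = na.omit), so any missing-value codes should be recoded to NA before this step.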

Why I (also) teach using R/RStudio

My colleagues are sometimes surprised to learn that I teach statistics using SPSS and R/RStudio in parallel. (Part of this is due to the misconception that R is hard to learn, which ignores that there are more difficult problems, like proper model specification and the interpretation of results.) In my opinion, there are many benefits in doing so; here’s an unordered (and incomplete) list:
– introduction to a statistics package that remains available after students leave university and lose access to the SPSS site licence (between jobs, when moving to another university, or outside academia)
– exposure to a different paradigm, making the shift to other software like Stata or SAS appear less threatening
– understanding that it doesn’t matter what package we use for basic statistics (we could even do it by hand)
– that line on the CV
– overcoming limitations in SPSS (ever tried to plot an interaction effect the way you want it?)
– ensuring that those who want to progress to more advanced (contemporary) methods actually can (being “future ready”)
– encouraging a mindset that we are in control of the analyses, not the software package
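As an example of the interaction-effect point, a minimal base-R sketch that plots predicted values from a model with an interaction term. It uses the built-in mtcars data (weight by transmission type); the model is only an illustration, not a recommended specification.

```r
# Interaction between weight and transmission type (am: 0 = automatic, 1 = manual)
fit <- lm(mpg ~ wt * factor(am), data = mtcars)

# Prediction grid: a range of weights for each transmission type
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  am = c(0, 1)
)
grid$pred <- predict(fit, newdata = grid)

# Raw data, coloured by transmission type
plot(mpg ~ wt, data = mtcars, col = mtcars$am + 1, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# One fitted line per transmission type: non-parallel lines show the interaction
for (a in c(0, 1)) {
  sub_grid <- grid[grid$am == a, ]
  lines(sub_grid$wt, sub_grid$pred, col = a + 1, lwd = 2)
}
legend("topright", legend = c("automatic", "manual"), col = 1:2, lwd = 2)
```

Because the plot is built from predict() on a grid of our choosing, we decide which values, ranges and groups are shown, rather than accepting a default chart.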

At the same time, I acknowledge that many students have been exposed to SPSS before and feel more at ease when they can see the menu bar. (And should the university ever get rid of that site licence, PSPP will work nicely in parallel with R/RStudio.)

Overcoming Warnings when Importing SPSS Files to R

R can import SPSS files quite easily, using the package foreign and its read.spss() function. It works so well out of the box that I usually choose the SPSS file when downloading secondary data (hint: look at the argument use.value.labels, depending on how you want your data).
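A minimal sketch of the two settings of use.value.labels. The file name is hypothetical, and the code only runs the import if such a file exists in the working directory.

```r
library(foreign)  # recommended package, installed together with R

# Hypothetical file name: replace with the SPSS file you downloaded
sav_file <- "secondary_data.sav"

if (file.exists(sav_file)) {
  # Variables with value labels become factors whose levels are the labels
  with_labels <- read.spss(sav_file, to.data.frame = TRUE, use.value.labels = TRUE)

  # Variables keep their numeric codes instead
  with_codes <- read.spss(sav_file, to.data.frame = TRUE, use.value.labels = FALSE)
}
```

Labels are convenient for tables and plots; numeric codes are often easier when recoding or when the codes themselves carry meaning.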

Sometimes R isn’t so happy, throwing warnings like “Unrecognized record type 7, subtype 18 encountered in system file”. Warnings in R are generally there for a reason. Usually these ones seem to concern variable and data attributes stored in the SPSS file, but to be sure, simply convert the SPSS file into SPSS Portable format (*.por rather than *.sav). Don’t have SPSS? Enter PSPP, a free (open source) program that can help you out (for Windows, check directly on this site).
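Rather than silencing such warnings with suppressWarnings(), it can help to capture them and look at them one by one. A minimal sketch using base R’s withCallingHandlers(); the helper name collect_warnings() and the SPSS file name are hypothetical.

```r
# Evaluate an expression and return its value together with the messages of
# any warnings it raised, so each warning can be inspected rather than ignored
collect_warnings <- function(expr) {
  warnings_seen <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")  # record the warning, then continue
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Base-R example that raises a warning
res <- collect_warnings(as.integer(c("1", "two")))
res$warnings

# With an SPSS file (hypothetical name):
# res <- collect_warnings(foreign::read.spss("secondary_data.sav", to.data.frame = TRUE))
# res$warnings
```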

PSPP can open SPSS files faster than SPSS, and under File > Save As... there’s the option to save as a Portable file (rather than the default System File) at the bottom left of the dialog. If you import this portable SPSS file into R, there should be no errors or warnings.
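If you convert files often, the same step can be scripted instead of clicked. A sketch, assuming PSPP’s command-line tool pspp-convert is installed and on the PATH and, as I understand its documentation, picks the output format from the file extension; the helper names por_name() and sav_to_por() and the file name are hypothetical.

```r
library(foreign)  # provides read.spss(), which reads both .sav and .por files

# Hypothetical helper: derive the portable-file name from the .sav name
por_name <- function(sav_file) sub("\\.sav$", ".por", sav_file, ignore.case = TRUE)

# Hypothetical helper: convert with pspp-convert (assumed to be on the PATH);
# the output format is taken from the .por extension of the target file
sav_to_por <- function(sav_file, por_file = por_name(sav_file)) {
  if (!nzchar(Sys.which("pspp-convert"))) stop("pspp-convert not found on the PATH")
  status <- system2("pspp-convert", c(shQuote(sav_file), shQuote(por_file)))
  if (status != 0) stop("pspp-convert exited with status ", status)
  por_file
}

# Usage with a hypothetical file name:
# por <- sav_to_por("secondary_data.sav")
# dat <- read.spss(por, to.data.frame = TRUE)
```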