Turning R into SPSS?

I have written about several free alternatives to SPSS, including PSPP, jamovi, and JASP. Bob Muenchen has reviewed a few more options: Deducer, RKWard, Rattle, and the good old R Commander. He has also reviewed Blue Sky Statistics, another option for those seeking SPSS “simplicity” with R power underneath.

Blue Sky Statistics is available for Windows and is open source; the developers make money from paid support. It comes with a polished interface and a data editor reminiscent of Excel. I was very happy to see that Blue Sky Statistics offers many options for data handling, like recoding, merging, computing variables, or subsetting — much better than what, say, jamovi offers at the moment.

The dialogs are quite intuitive if you are familiar with SPSS, and they can also produce R code. This is a feature we know from the R Commander, and ostensibly the aim is to let users wean themselves off the graphical interface and move to the console. Nice as the idea is, it is defeated by custom commands like BSkyOpenNewDataset() that we don’t normally use in R.

The models offered by Blue Sky Statistics are fine for many uses — for those not living on the cutting edge. A nice touch is the interactive tables in the output, which you can customize to some degree.

Exciting as Blue Sky Statistics and other GUIs are at first sight, I’m gradually becoming less excited about GUIs for R. Probably the biggest challenge is the “hey, this is all text!” shock when you first open R (or, typically, RStudio these days). Once you realize that the real challenge is to make the right choices and then interpret your results, you become less hung up about the “right” software. Once you realize that you’ll have to remember things either way — where to click, or what to type — copying and pasting code fragments becomes less daunting.

If you restrict yourself to a few basic commands like lm(), plot(), and summary(), R isn’t that difficult. Sure, when you come across idiosyncrasies because different developers use different naming conventions, R can be hard. But then there are also the moments where you realize just how many ready-made solutions (i.e. packages) are available, and that with R you really are in control of your analysis. And the day you learn about replication and knitr, there’s hardly a way back.
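To illustrate, a complete basic analysis needs little more than those three commands — here on the built-in mtcars dataset as a stand-in for your own data:

```r
# Fit a simple linear model: fuel consumption as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficients, standard errors, R-squared
summary(fit)

# Scatter plot with the fitted regression line added
plot(mpg ~ wt, data = mtcars)
abline(fit)
```

That really is the whole workflow for a first look at a bivariate relationship.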

One reason I kept looking for GUIs was my MA students. I’m excited to see more and more of them choosing RStudio over SPSS (they are given the choice; we currently use both in parallel)… so there might simply be no need for turning R into SPSS.

 

Another one to watch: jamovi for stats

Here’s another open statistical program to watch: jamovi. Like JASP, jamovi is built on top of R. Unlike JASP, jamovi is not focused on Bayesian analysis, but wants to be community-driven: it has plugins (‘modules’) through which others can contribute missing functionality. With its easy-to-use interface, familiar from JASP, jamovi is bound to appeal to many researchers, and those familiar with SPSS will find their way around without problems. This is definitely one to watch.

In Praise of PSPP

In an earlier assessment of PSPP as a replacement for SPSS, I mentioned some of the reasons I think PSPP cannot yet fulfil this ambition. I’m happy to report that PSPP now does logistic regressions, but the point of this post is to highlight one of its strengths in a practical application: PSPP is super fast.

I frequently use an old, underpowered netbook, and usually that’s enough computing power for basic analyses (most mobile phones these days are more powerful). A few days ago, I wanted to run a very simple analysis on the longitudinal WVS data. We’re looking at a 500 MB SPSS file here, and all I wanted to do was calculate a new variable and then get its mean by country and year. Really basic stuff, except that I only have 1 GB of RAM available (I did say underpowered).

What happened next: I gave up on R loading the data file (the RData file provided by the WVS is 1.3 GB); it simply took too long. Opening PSPP is a breeze on this machine (and easily beats opening SPSS on the brand-new Windows machine I’m provided with; I cannot imagine how long SPSS would take on this netbook). PSPP wasn’t actually fast either: each of the steps (calculating the new variable, sorting/splitting by country and year, frequencies statistics to get the mean) took around 5 minutes. That’s about as long as I waited for R to load the data before giving up. Moreover, PSPP played nicely and did not lock up the computer, so I could actually do some other work at the same time; R can be more demanding.
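For the record, those steps can also be run as PSPP syntax rather than through the menus. A minimal sketch, where the file name, the variable trust, and the recoding rule are all hypothetical (only country and year are standard WVS variables):

```
GET FILE='wvs_longitudinal.sav'.

* Calculate the new variable (the recoding rule here is made up).
COMPUTE trust_high = (trust <= 2).

* Sort and split the file by country and year.
SORT CASES BY country year.
SPLIT FILE BY country year.

* Get the mean per group, suppressing the frequency tables themselves.
FREQUENCIES /VARIABLES=trust_high /STATISTICS=MEAN /FORMAT=NOTABLE.
```

Because PSPP processes the file case by case, this syntax runs in roughly constant memory, which is exactly why it copes where R's load-everything approach fails on 1 GB of RAM.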

Overcoming Warnings when Importing SPSS Files to R

R can import SPSS files quite easily, using the foreign package and its read.spss() function. It usually works so well out of the box that I usually choose the SPSS file when downloading secondary data (hint: look at the argument use.value.labels, depending on how you want your data).
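A minimal sketch of such an import (the file name is hypothetical):

```r
library(foreign)  # ships with R

# use.value.labels = TRUE turns labelled SPSS variables into factors;
# set it to FALSE to keep the underlying numeric codes instead.
dat <- read.spss("survey.sav",
                 to.data.frame = TRUE,
                 use.value.labels = TRUE)

str(dat)  # check variable types and labels
```

Note to.data.frame = TRUE: by default read.spss() returns a list, which is rarely what you want for analysis.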

Sometimes R isn’t so happy, throwing warnings like “Unrecognized record type 7, subtype 18 encountered in system file”. Generally, warnings in R are there for a reason. These particular ones usually seem to concern variable and data attributes stored in the SPSS file, but to be sure, simply convert the SPSS file into SPSS Portable format (*.por rather than *.sav). Don’t have SPSS? Enter PSPP, a free (open source) program that can help you out (for Windows, check directly on this site).

PSPP can open SPSS files faster than SPSS itself, and under File > Save as... there’s the option to save as a Portable file (rather than the default System File), at the bottom left of the dialog. If you import this portable file to R, there should be no errors or warnings.

How I ended up with R

In my first stats course, we used SPSS, as is commonly the case. I was aware that there are alternatives; Stata in particular was used by many of the senior researchers. Nevertheless, SPSS was what I got to know first, and it was OK. I kept ranting about the slow graphical interface on the Mac (at least in more recent versions, SPSS seems quite responsive once it has started up). At first there seemed to be no point in trying something else. Having a penchant for open source, however, I did try R once or twice, but without a manual at hand, I was simply lost: there was no apparent way to get the data in, and it all seemed just cumbersome. Why bother, anyway; I did have my SPSS.

Three things happened next. First, I kept hearing about R in discussions. Second, an advanced stats course was offered, and it came with the option of a crash course in R. I didn’t hesitate, and with an instructor, R wasn’t so difficult any more. In fact, after one afternoon session I felt confident enough that I could find my way around R. Third, I hit the limits of SPSS: I needed propensity score matching, and the SPSS macro I found on the web didn’t work on my version of SPSS. Should I invest in learning SPSS Basic, or do the thing in R? I figured that if I had hit the limits of SPSS once, it was likely to happen again. From then onwards, I used SPSS and R in parallel: SPSS for basic stuff and recoding (etc.), R when SPSS couldn’t handle the task at hand.

Then I changed university (to one where I had no right to SPSS on my laptop) and never looked back. Having realized how easy it is to program in R, I now only touch SPSS when I have to (read: for teaching).