Overcoming Warnings when Importing SPSS Files to R

R can import SPSS files quite easily, using the package foreign and the read.spss command. It usually works quite well out of the box, so well that I usually choose the SPSS file when downloading secondary data (hint: look at the argument use.value.labels depending on how you want your data).
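As a minimal sketch of the import described above: the file used here is the example file `electric.sav`, which I assume ships with the foreign package; it is only a stand-in for your own .sav file. The two calls differ only in use.value.labels, which controls whether labelled values arrive as factors or as the underlying codes.

```r
library(foreign)

# Example file bundled with the foreign package (an assumption for
# illustration; replace with the path to your own .sav file)
sav <- system.file("files", "electric.sav", package = "foreign")

# Value labels become factor levels
with_labels <- read.spss(sav, to.data.frame = TRUE, use.value.labels = TRUE)

# Keep the underlying codes instead of the labels
with_codes <- read.spss(sav, to.data.frame = TRUE, use.value.labels = FALSE)

# Compare how the variables are stored in each version
str(with_labels)
str(with_codes)
```

Setting to.data.frame = TRUE is my own choice here; by default read.spss returns a list.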

Sometimes R isn’t so happy, throwing warnings like “Unrecognized record type 7, subtype 18 encountered in system file”. Generally, warnings in R are there for a reason. These particular warnings usually seem to concern variable and data attributes in the SPSS file that R does not recognize, but to be sure, simply convert the SPSS file into SPSS Portable (*.por rather than *.sav). Don’t have SPSS? Enter PSPP, a free (open source) program that can help you out! (for Windows, check directly on this site).

PSPP can open SPSS files faster than SPSS, and under File > Save as... there’s the option to save as a Portable file (rather than the default System File) at the bottom left of the dialog. If you import this (portable) SPSS file to R, there should be no errors or warnings.
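To check that the portable file really imports cleanly, something along these lines might help. This is only a sketch: `read_spss_checked` is a hypothetical helper of my own (not part of foreign), and `survey.por` is a placeholder file name. The helper collects any warnings raised during the import rather than letting them scroll past.

```r
library(foreign)

# Hypothetical helper: import an SPSS file and collect any warnings
# raised along the way, instead of printing them to the console
read_spss_checked <- function(path) {
  warnings_seen <- character(0)
  data <- withCallingHandlers(
    read.spss(path, to.data.frame = TRUE),
    warning = function(w) {
      # Record the message, then continue without printing the warning
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(data = data, warnings = warnings_seen)
}

por_file <- "survey.por"  # placeholder name; use the file saved from PSPP

if (file.exists(por_file)) {
  result <- read_spss_checked(por_file)
  if (length(result$warnings) == 0) {
    message("Imported without warnings")
  } else {
    print(result$warnings)
  }
}
```

The same helper can be run on the original .sav file to compare the warnings before and after the conversion.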

How I ended up with R

In my first stats course, we used SPSS, as is commonly the case. I was aware that there are alternatives; Stata in particular was used by many of the senior researchers. Nevertheless, SPSS was what I got to know first, and it was OK, although I kept ranting about the slow graphical interface on the Mac. (At least in more recent versions, SPSS seems quite responsive once it’s started up.) At first there seemed to be no point in trying something else. Having a penchant for open source, however, I did try R once or twice, but without a manual at hand, I was simply lost: there was no apparent way to get the data in, and it just seemed cumbersome. Why bother, anyway; I did have my SPSS.

Three things happened next. First, I kept hearing about R during discussions. Second, an advanced stats course was offered, and it came with the option of a crash course in R. I didn’t hesitate, and with an instructor, R wasn’t so difficult any more. In fact, after one afternoon session I felt confident enough I could find my way around R. Third, I hit the limits of SPSS. I needed propensity score matching. The SPSS macro I found on the web didn’t work on my version of SPSS. Should I invest in learning SPSS Basic, or do the thing in R? I figured that if I hit the limits of SPSS once, this is likely to happen again. From then onwards, I used SPSS and R in parallel: SPSS for basic stuff and recoding (etc.), R when SPSS couldn’t handle the task at hand.

Then I changed university (to one where I didn’t have the right to install SPSS on my laptop) and never looked back. Having realized how easy it is to program in R, I now only touch SPSS when I have to (aka teaching).

Introduction to Statistics Using PSPP

A few weeks back I argued that PSPP is not (yet) a real replacement for SPSS. I also wrongly claimed that there are no introductions to statistics that use PSPP. I had book-length introductions in mind, but alas, “no” is not quite the right word. Today, I give you The PSPP Guide: An introduction to statistical analysis. This isn’t a proper review, nor an endorsement, simply because I haven’t actually read the book.

Nonetheless, here are some observations (from looking over the table of contents). First off, the book does not seem to introduce much beyond PSPP’s capabilities. On the one hand, this is great for the readers; on the other hand, when teaching, there are many things I want my students to be aware of: doing statistics is one thing, reading and interpreting another. I note chapter 6, which sidesteps current shortcomings by using the graphing capabilities in OpenOffice. The 2014 version of the book includes factor analysis, keeping up with PSPP. That said, I personally cannot envisage teaching an introduction to statistics without mentioning logistic regressions.

Given the active development of PSPP I have no doubt that we will see more books like this in the future (and probably from more reputable publishers, too), but frankly, I can’t see myself using a book that doesn’t cover some of the methods I consider essential.

Is PSPP a replacement for SPSS?

PSPP is sometimes touted as a replacement for SPSS (including by its creators). Well, it isn’t (this is often the case with open source alternatives; the ambition and reality do not quite match). By stating plainly that PSPP is not a replacement for SPSS, I don’t mean to dismiss PSPP.

First off, PSPP is under active development, and getting hold of the latest version can be a bit difficult. For Windows, this site often has the most up-to-date version; for Linux/Debian you’ll need to be on an “unstable” release or compile your own (which I doubt many will want to do given that we’re looking at an SPSS replacement, not R or Octave).

Second, recent releases cover many basic functions needed for an introductory statistics course. The GUI frequently lags a bit behind the underlying capabilities, so some functions will only be available using SYNTAX. Oddly enough, the PSPP team copy the SPSS interface quite well, including things that could readily be improved (e.g. why do we have tabs for the “Data View” and the “Variable View”, but a separate window for the results or syntax? Why mix the two?).

So PSPP can readily do tables, ANOVA, linear and logistic regressions, and recoding variables. Unfortunately, and this is why PSPP is not even a replacement for basic SPSS users, there are bits and pieces missing even in the basic functions. On the positive side, PSPP has a cleaner interface than SPSS, on the negative side some features are just not there. Unless users follow a course designed specifically with PSPP in mind, they will frequently hit a wall. The same is the case for SYNTAX. Users will be able to run SPSS syntax with no problem, as long as PSPP has the commands implemented. Again, when using code from the many websites helping SPSS users, unfortunately PSPP users will frequently hit a wall.

What do I mean by bits and pieces missing? Let’s take a linear regression. It’s there, the familiar box with arrows to choose variables. Now I may want some multi-collinearity statistics, too. Ah, sorry, doesn’t exist yet. So I can build a model, but do not even have one of the most basic means to check whether it is any good. For this reason I am not surprised that there are not many introductions to statistics using PSPP… it’s just not there yet.

One thing I missed a lot is that PSPP does not remember the last input. So if I run a regression and want to add another variable, I’ll have to start from scratch in PSPP, entering each variable. Graphing is lacking or very poor.

With the advancements in RStudio, R Commander, etc., I sometimes wonder whether PSPP is just advancing too slowly. Having said all this, I want to end on a positive note. PSPP has got quite stable in recent releases; it’s got a price tag that is hard to beat and the moral superiority of being truly open source. And finally, it is fast, much faster than SPSS!

Creating Sparklines in MS Word

A few years back I prepared a step-by-step guide to creating sparklines in SPSS and MS Word. I came up with the idea (while writing a physics lab report in high school) long before learning from Edward Tufte about what is today widely known as sparklines. After a recent comment by a colleague (“you can do that in Word?”), I’ve decided to put this tutorial online.

The general principle applies to any software package used to create the original graphs, including MS Excel.