Introduction to Statistics Using PSPP

A few weeks back I argued that PSPP is not (yet) a real replacement for SPSS. I also claimed, wrongly, that there are no introductions to statistics that use PSPP. I had book-length introductions in mind, but alas, "no" is not quite the right word. Today, I give you The PSPP Guide: An introduction to statistical analysis. This isn’t a proper review, nor an endorsement, simply because I haven’t actually read the book.

Nonetheless, here are some observations from looking over the table of contents. First off, the book does not seem to introduce much beyond PSPP’s capabilities. On the one hand, this is great for readers, who can follow along without hitting missing features. On the other hand, when teaching, there are many things I want my students to be aware of: doing statistics is one thing, reading and interpreting another. I note that chapter 6 sidesteps current shortcomings by using the graphing capabilities of OpenOffice. The 2014 version of the book includes factor analysis, keeping up with PSPP. That said, I personally cannot envisage teaching an introduction to statistics without mentioning logistic regressions.

Given the active development of PSPP I have no doubt that we will see more books like this in the future (and probably from more reputable publishers, too), but frankly, I can’t see myself using a book that doesn’t cover some of the methods I consider essential.

Is PSPP a replacement for SPSS?

PSPP is sometimes touted as a replacement for SPSS (including by its creators). Well, it isn’t (this is often the case with open source alternatives; the ambition and reality do not quite match). By stating plainly that PSPP is not a replacement for SPSS, I don’t mean to dismiss PSPP.

First off, PSPP is under active development, and getting hold of the latest version can be a bit difficult. For Windows, this site often has the most up-to-date version; for Linux/Debian you’ll need to be on an “unstable” release or compile your own (which I doubt many will want to do given that we’re looking at an SPSS replacement, not R or Octave).

Second, recent releases cover many basic functions needed for an introductory statistics course. The GUI frequently lags a bit behind the underlying capabilities, so some functions will only be available using SYNTAX. Oddly enough, the PSPP team copy the SPSS interface quite well, including things that could readily be improved (e.g. why do we have tabs for the “Data View” and the “Variable View”, but separate windows for the results and the syntax? Why mix the two?).

So PSPP can readily do tables, ANOVA, linear and logistic regressions, and recoding variables. Unfortunately, and this is why PSPP is not even a replacement for basic SPSS users, there are bits and pieces missing even in the basic functions. On the positive side, PSPP has a cleaner interface than SPSS; on the negative side, some features are just not there. Unless users follow a course designed specifically with PSPP in mind, they will frequently hit a wall. The same is the case for SYNTAX. Users will be able to run SPSS syntax with no problem, as long as PSPP has the commands implemented. But when using code from the many websites that help SPSS users, PSPP users will again frequently hit a wall.

What do I mean by bits and pieces missing? Let’s take a linear regression. It’s there, the familiar box with arrows to choose variables. Now I may want some multi-collinearity statistics, too. Ah, sorry, that doesn’t exist yet. So I can build a model, but do not even have one of the most basic means to check whether it is any good. For this reason I am not surprised that there are not many introductions to statistics using PSPP… it’s just not there yet.
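
To make concrete what is missing: the usual multi-collinearity statistic is the variance inflation factor (VIF), which for predictor j equals 1 / (1 - R²) from regressing that predictor on all the others. A known shortcut is that the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The following is only a rough sketch in plain Python of that shortcut; it is not how SPSS or PSPP compute anything, and the function names (`correlation_matrix`, `invert`, `vif`) are made up for illustration.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [sqrt(sum(x * x for x in col)) for col in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor for each predictor column, in input order."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

Calling `vif([x1, x2, x3])` with the predictor columns as lists of numbers returns one value per predictor. Values near 1 suggest little collinearity; common rules of thumb start to worry somewhere around 5 or 10, though those cut-offs are conventions rather than hard limits.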

One thing I missed a lot is that PSPP does not remember the last input. So if I run a regression and want to add another variable, I’ll have to start from scratch in PSPP, entering each variable. Graphing is lacking or very poor.

With the advancements in RStudio, R Commander, etc., I sometimes wonder whether PSPP is just advancing too slowly. Having said all this, I want to end on a positive note. PSPP has become quite stable in recent releases; it has a price tag that is hard to beat, and the moral superiority of being truly open source. And finally, it is fast, much faster than SPSS!