If you aren’t already, you should be using Zotero. It amazes me to see researchers ‘managing’ their references manually these days: it’s complicated, time-consuming, error-prone, and simply unnecessary. There are many options out there for managing references, but you should look at the free and open Zotero. You can install it on all your devices, you’re not limited in the number of citations you can store, you can take it with you when you change workplace, and you’re not even restricted to Zotero’s built-in features, because there are plugins. Seamless integration with word processors doesn’t set it apart from the competition, but getting stuff into Zotero takes no effort at all: a single click in your web browser. You get free syncing, too. There really is no excuse not to keep track of what you are reading.

After grabbing Zotero, you probably want Zotfile, too. Zotfile manages the PDF versions of your research articles. In my view, the most useful feature is the ability to extract highlighted text from a PDF. It’s so practical that sometimes I don’t even take proper notes (the main points you should store in your brain anyway).

Image credit: Zotero, Zotfile


I have recently explored open-source approaches to computer-assisted qualitative data analysis (CAQDA). As is common with open-source software, there are several options available; as is also often the case, many of them cannot keep up with the commercial packages, or have been abandoned.

Here I wanted to highlight just three options.

RQDA is built on top of R, which is perhaps not the most obvious choice, but one that can have advantages. The documentation is steadily improving, making it clearer that RQDA covers the main features we’ve come to expect from CAQDA software. I find it a bit fiddly because of the many windows it tends to open, especially when working on a small screen.
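For the curious, getting started is short (assuming R itself is already installed; RQDA is on CRAN, and its RQDA() function launches the graphical interface):

```r
# install once from CRAN, then launch the GUI
install.packages("RQDA")
library(RQDA)
RQDA()
```

Being an R package also means your coding work sits next to your analysis scripts, which is part of the appeal despite the unusual choice of platform.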

Colloquium is Java-based, which makes it run almost everywhere. It offers a rather basic feature set, and tags can only be assigned to lines (which also implies that lines are the unit of analysis). Where it shines, though, is how it enables working in two languages in parallel.

CATMA is web-based but runs without Flash, so it should run pretty much anywhere. It offers basic manual and automatic coding, but there’s one feature we really should care about: CATMA does TEI. This means that CATMA offers a standardized XML export that should remain usable in the future, and facilitates sharing documents along with the accompanying coding. That’s quite exciting.
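To give a flavour of why a TEI export matters: here is a minimal, purely illustrative sketch of TEI-style markup for a coded passage (the element names follow the TEI guidelines; the exact structure CATMA exports may well differ, and the text and category are made up):

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>
        <!-- the ana attribute points to a coding category defined elsewhere -->
        <seg ana="#trust">I felt the committee listened to us.</seg>
      </p>
    </body>
  </text>
</TEI>
```

Because this is plain, standardized XML rather than a proprietary project file, another researcher (or a future you) can read both the document and the coding without the original software.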

What I find difficult to judge at the moment is whether TEI will be adopted by other CAQDA software. Atlas.ti does some XML, but as far as I know it’s not TEI. And would TEI be more useful to future researchers than an SQLite database like the one RQDA produces?

Alienating open-source contributors?

Some time ago, I came across a blog post highlighting how open-source contributors can be alienated by maintainers. Tim Jurka describes his unpleasant experience of sending an updated version of an R package to CRAN. He highlights the short and impersonal messages from CRAN maintainers and an apparent contradiction, and generally felt alienated by the process. Interestingly, he offers four lessons to be learnt:

– don’t alienate volunteers — everyone in the R community is a volunteer, and it doesn’t benefit the community when you’re unnecessarily rude.
– understand volunteers have other commitments — while the core R team is doing an excellent job building a statistical computing platform, not everyone can make the same commitment to an open-source project.
– open-source has limited resources — every contribution helps.
– be patient — not everyone can operate on the same level, and new members will need to be brought up to speed on best practices.

I guess everyone would sign up to these, but oddly enough my experience with the team running CRAN has always been of the kind Tim Jurka cites as a positive example: brief, but courteous. What is definitely missing from that blog post, though, is an appreciation that the people running R and CRAN are volunteers, too!

Is PSPP a replacement for SPSS?

PSPP is sometimes touted as a replacement for SPSS (including by its creators). Well, it isn’t (as is often the case with open-source alternatives, ambition and reality do not quite match). By stating plainly that PSPP is not a replacement for SPSS, I don’t mean to dismiss PSPP.

First off, PSPP is under active development, and getting hold of the latest version can be a bit difficult. For Windows, this site often has the most up-to-date version; for Linux/Debian you’ll need to be on an “unstable” release or compile your own (which I doubt many will want to do, given that we’re looking at an SPSS replacement, not R or Octave).

Second, recent releases cover many of the basic functions needed for an introductory statistics course. The GUI frequently lags a bit behind the underlying capability, so some functions are only available via SYNTAX. Oddly enough, the PSPP team copy the SPSS interface quite faithfully, including things that could readily be improved (e.g. why do we have tabs for the “Data View” and the “Variable View”, but separate windows for the results and the syntax? Why mix the two approaches?).

So PSPP can readily produce tables, run ANOVAs and linear and logistic regressions, and recode variables. Unfortunately, and this is why PSPP is not a replacement even for basic SPSS users, there are bits and pieces missing even in these basic functions. On the positive side, PSPP has a cleaner interface than SPSS; on the negative side, some features are just not there. Unless users follow a course designed specifically with PSPP in mind, they will frequently hit a wall. The same goes for SYNTAX: users will be able to run SPSS syntax without problems, as long as PSPP has the commands implemented. But when using code from the many websites helping SPSS users, PSPP users will, unfortunately, frequently hit a wall.

What do I mean by bits and pieces missing? Let’s take a linear regression. It’s there, the familiar box with arrows to choose variables. Now I may want some multicollinearity statistics, too. Ah, sorry, those don’t exist yet. So I can build a model, but lack even one of the most basic means to check whether it is any good. For this reason I am not surprised that there are not many introductions to statistics using PSPP… it’s just not there yet.
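To make this concrete: in SPSS, collinearity diagnostics are requested on the STATISTICS subcommand of REGRESSION, along the lines of the following (the variable names here are hypothetical, and it is exactly keywords like COLLIN and TOL that PSPP may not have implemented, while the rest of the command runs fine):

```
REGRESSION
  /STATISTICS COEFF R ANOVA COLLIN TOL
  /DEPENDENT satisfaction
  /METHOD=ENTER age income.
```

This is the pattern throughout: the command exists, but a subcommand or keyword copied from an SPSS textbook or website quietly isn’t there.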

One thing I miss a lot is that PSPP does not remember the last input. So if I run a regression and then want to add another variable, I have to start from scratch, entering each variable again. Graphing is also lacking, or very poor.

With the advancements in RStudio, R Commander, etc., I sometimes wonder whether PSPP is just advancing too slowly. Having said all this, I want to end on a positive note. PSPP has become quite stable in recent releases; it has a price tag that’s hard to beat, and the moral superiority of being truly open source. And finally, it is fast, much faster than SPSS!