Getting Qualtrics data into R

Data collected in Qualtrics come in a funny way when exported to CSV: the first two lines are headers. Simply using read.csv() will mess things up, because it expects only one header line. We can skip lines at the beginning of the file (with the skip argument), but there is no immediately obvious way to skip only the second line.

Of course there is an R package for that, but when I tried, the qualtRics package was very slow:

library(qualtRics)
raw_data <- readSurvey("qualtrics_survey.csv")

raw_data <- readSurvey("qualtrics_survey_legacy.csv", legacyFormat = TRUE) # if two header rows at the top

As an alternative, you could import just the header row of your survey and then attach it to a second import that skips the header lines. Actually, here’s a simpler way of doing just that:

# read all lines, drop line 2 (the question wording), then parse the rest as CSV
everything <- readLines("qualtrics_survey_legacy.csv")
wanted <- everything[-2]
mydata <- read.csv(textConnection(wanted), header = TRUE, stringsAsFactors = FALSE)
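For comparison, the header-first alternative mentioned above can be sketched in base R as follows. To keep it self-contained, it writes a tiny, made-up stand-in for a legacy export to a temporary file; the column names and responses are hypothetical.

```r
# Made-up stand-in for a legacy Qualtrics export: variable names on line 1,
# question wording on line 2, responses from line 3 onward
legacy_csv <- c(
  "ResponseID,Q1,Q2",
  "ResponseID,How old are you?,Do you agree?",
  "R_1,34,Yes",
  "R_2,51,No"
)
path <- tempfile(fileext = ".csv")
writeLines(legacy_csv, path)

# Step 1: read just the header to get the variable names
# (nrows = 1 also parses line 2 as a data row, which we simply discard)
header <- names(read.csv(path, nrows = 1, check.names = FALSE))

# Step 2: skip both header lines and attach the names from step 1
mydata <- read.csv(path, skip = 2, header = FALSE, col.names = header,
                   stringsAsFactors = FALSE)
str(mydata)
```

Note that skip counts physical lines, not CSV records: if the question wording on line 2 contains line breaks inside quotes, both this approach and dropping everything[-2] will remove the wrong lines.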

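If you get the warning “EOF within quoted string”, don’t ignore it: it indicates problems with double quoting. Adding quote = "" to your import code gets around it, but note that this switches off quote handling entirely, so any field that contains a comma will be split across columns.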

If you are willing to violate the principle of not touching the raw data file, you could open the survey in a spreadsheet like Excel or LibreOffice Calc and delete the unwanted rows.

Given all these options, the most reliable way I have found to get Qualtrics data into R (as in: unlike the above, it hasn’t failed me so far) is yet another one:
1. export as SPSS (rather than CSV)
2. use library(haven)
3. read_spss()
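The three steps can be sketched as follows. To keep the example self-contained, it first writes a tiny, made-up labelled data set to a temporary .sav file with haven’s write_sav(); the variables ResponseId and Q1 and their labels are hypothetical. With a real export, only the read_spss() call, pointed at your downloaded file, is needed.

```r
library(haven)

# Self-contained stand-in for a Qualtrics SPSS export (hypothetical variables)
path <- tempfile(fileext = ".sav")
demo <- data.frame(ResponseId = c("R_1", "R_2", "R_3"))
demo$Q1 <- labelled(c(1, 2, 1), labels = c(Agree = 1, Disagree = 2),
                    label = "Do you agree?")
write_sav(demo, path)

# Step 3: read the SPSS file; value and variable labels come along as attributes
survey <- read_spss(path)

# Optionally turn value-labelled columns into ordinary R factors
survey$Q1 <- as_factor(survey$Q1)
levels(survey$Q1)
```

A side benefit of the SPSS route is that the question wording survives as variable labels instead of occupying an extra row in the data.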

Turning R into SPSS?

I have written about several free alternatives to SPSS, including PSPP, jamovi, and JASP. Bob Muenchen has reviewed a few more options: Deducer, RKWard, Rattle, and the good old R Commander (in the screenshot on the left). There is also a review of Blue Sky Statistics, another option for those seeking SPSS “simplicity” with R power underneath.

Blue Sky Statistics is available for Windows and is open source; the developers make money from paid support. It comes with a polished interface and a data editor reminiscent of Excel. I was very happy to see that Blue Sky Statistics offers many options for data handling, like recoding, merging, computing variables, or subsetting; that’s much better than what, say, jamovi offers at the moment.

The dialogs are quite intuitive if you are familiar with SPSS, and they can also produce R code. This is a feature we know from the R Commander, and ostensibly the aim is to help users wean themselves off the graphical interface and move to the console. Nice as the idea is, it is undermined by custom commands like BSkyOpenNewDataset() that you wouldn’t normally use in R.

The models offered by Blue Sky Statistics are fine for many uses, at least for those not living on the cutting edge. The interactive tables in the output, which you can customize to some degree, are a nice touch.

Exciting as Blue Sky Statistics and other GUIs are at first sight, I’m gradually becoming less excited about GUIs for R. The biggest hurdle is probably the “hey, this is all text!” shock when you first open R (or, typically, RStudio these days). Once you realize that the real challenge is making the right analytical choices and then interpreting your results, you become less hung up about the “right” software. Once you realize that you’ll have to remember something either way (where to click or what to type), copying and pasting code fragments becomes less daunting. If you restrict yourself to a few basic commands like lm(), plot(), and summary(), R isn’t that difficult. Sure, R can be hard when you come across idiosyncrasies because different developers use different naming conventions. But then there are also the moments when you realize that so many ready-made solutions (i.e. packages) are available and that with R you really are in control of your analysis. And the day you learn about replication and knitr, there’s hardly a way back.
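To illustrate how little is needed, here is a minimal example using only those basic commands (plus abline() to draw the regression line) on R’s built-in mtcars data; the choice of variables is arbitrary.

```r
# Built-in mtcars data: how does fuel economy (mpg) relate to car weight (wt)?
model <- lm(mpg ~ wt, data = mtcars)
summary(model)                  # coefficients, standard errors, R-squared

plot(mpg ~ wt, data = mtcars)   # scatterplot of the raw data
abline(model)                   # add the fitted regression line
```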

One reason I kept looking for GUIs was my MA students. I’m excited to see more and more of them choosing RStudio over SPSS (they are given the choice; we currently use both in parallel), so there might simply be no need to turn R into SPSS.


Another one to watch: jamovi for stats

Here’s another open statistical program to watch: jamovi. Like JASP, jamovi is built on top of R. Unlike JASP, jamovi is not focused on Bayesian analysis; instead, it aims to be community driven. This means it has plugins (‘modules’) through which others can contribute missing code. With its easy-to-use interface, familiar from JASP, jamovi is bound to appeal to many researchers, and those familiar with SPSS will find their way around without problems. This is definitely one to watch.

In Praise of PSPP

In an earlier assessment of PSPP as a replacement for SPSS, I mentioned some of the reasons why I think PSPP cannot yet fulfil its ambition of being such a replacement. I’m happy to report that PSPP now does logistic regression, but the point of this post is to highlight one of its strengths in a practical application: PSPP is super fast.

I frequently use an old, underpowered netbook, and usually that’s enough computing power for basic analyses (most mobile phones these days are more powerful). A few days ago, I wanted to run a very simple analysis on the longitudinal World Values Survey (WVS) data. We’re looking at a 500 MB SPSS file here, and all I wanted to do was calculate a new variable and then get the mean by country and year. Really basic stuff, except that I only have 1 GB of RAM available (I did say underpowered).
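For comparison, the same kind of task in R might look like the sketch below. The toy data set and the variable names (country, year, trust1, trust2) are hypothetical stand-ins, not actual WVS variables. Reading only the needed columns with the col_select argument of haven’s read_sav() is one way to reduce memory use; whether that would have been enough for the full WVS file on 1 GB of RAM is untested.

```r
library(haven)

# Toy stand-in for a large survey file; all variable names are hypothetical
toy <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  year    = c(1990, 1990, 1990, 2000, 2000),
  trust1  = c(1, 3, 2, 4, 4),
  trust2  = c(3, 3, 2, 2, 4),
  unused  = c(9, 9, 9, 9, 9)
)
path <- tempfile(fileext = ".sav")
write_sav(toy, path)

# Read only the columns needed for the analysis
wvs <- read_sav(path, col_select = c(country, year, trust1, trust2))

# Compute the new variable, then its mean by country and year
wvs$trust <- (wvs$trust1 + wvs$trust2) / 2
means <- aggregate(trust ~ country + year, data = wvs, FUN = mean)
means
```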

What happened next: I gave up waiting for R to load the data file (the RData file provided by the WVS is 1.3 GB); it took too long. Opening PSPP is a breeze on this machine (and easily beats opening SPSS on the brand-new Windows machine I’m provided with; I cannot imagine how long SPSS would take on the netbook). It wasn’t actually fast: each step (calculating the new variable, sorting/splitting by country and year, running frequencies with statistics to get the mean) took around 5 minutes. That’s about as long as I waited for R to load the data before giving up. Moreover, PSPP played nicely and did not lock up the computer, so I could actually do some other work at the same time; R can be more demanding.