Getting Qualtrics data into R

Data collected in Qualtrics come in a funny way when exported to CSV: the first two lines are headers. Simply using read.csv() will mess things up, because it expects only one header line. We can skip lines at the beginning with the skip argument, but there is no immediately obvious way to skip only the second line.

Of course there is an R package for that, but when I tried, the qualtRics package was very slow:

library(qualtRics)

raw_data <- readSurvey("qualtrics_survey.csv")

raw_data <- readSurvey("qualtrics_survey_legacy.csv", legacyFormat = TRUE) # if two rows at the top

As an alternative, you could import just the header of your survey, and then join it to an import where you skip the header lines. Actually, here’s a better way of doing just this:

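# read the file as plain text, one element per line
everything = readLines("qualtrics_survey_legacy.csv")
# drop the second header line
wanted = everything[-2]
# parse the remaining lines as CSV
mydata = read.csv(textConnection(wanted), header = TRUE, stringsAsFactors = FALSE)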
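For comparison, the first alternative mentioned above (import just the header, then do a second import that skips the header lines) might look roughly like this. It is only a sketch: the toy file and its contents are made up so that the example runs on its own.

```r
# Toy stand-in for a legacy Qualtrics export (hypothetical content):
# line 1 = variable names, line 2 = question text, then the data.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "V1,Q1_1,Q1_2",
  "ResponseID,Private sector,Public sector",
  "R_001,1,",
  "R_002,,1"
), csv_file)

# Step 1: read only the top of the file to get the variable names.
header <- names(read.csv(csv_file, nrows = 1))

# Step 2: skip both header lines and attach the names from step 1.
mydata <- read.csv(csv_file, skip = 2, header = FALSE,
                   col.names = header, stringsAsFactors = FALSE)
str(mydata)
```

This reads the file twice, which is why dropping the second line with readLines() is the tidier option.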

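If you get the warning “EOF within quoted string”, don’t ignore it: it indicates a problem with quote characters in the data, typically an opening quote that is never closed, and rows can go missing as a result. Adding quote = "" to your import code turns quote handling off.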
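Here is a small sketch of what this looks like in practice. The data are made up; the second data row contains an opening quote that is never closed. I capture whatever R reports with the default settings rather than promising a particular message, and then read the same lines with quote handling switched off.

```r
# Made-up data: the second data row has a quote that is never closed.
lines <- c(
  "id,comment",
  "1,good",
  "2,\"unfinished comment",
  "3,fine"
)

# Default quoting: capture whatever R reports instead of letting it scroll by.
msg <- tryCatch(
  { read.csv(text = lines); "no complaint" },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
print(msg)

# Quoting switched off: every line becomes a row, and the stray quote
# stays in the text as an ordinary character.
plain <- read.csv(text = lines, quote = "", stringsAsFactors = FALSE)
nrow(plain)
```

Keep in mind that with quote = "" a properly quoted field that contains a comma is split at that comma, so check the number of columns and rows after the import.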

If you are willing to violate the principle of not touching the raw data file, you could open the survey in a spreadsheet like Excel or LibreOffice Calc and delete the unwanted rows.

Given all these options, I found that the most reliable way (as in: contrary to the above, it hasn’t failed me so far) to get Qualtrics data into R is yet another one:
1. export as SPSS (rather than CSV)
2. use library(haven)
3. read_spss()
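
The three steps can be sketched as follows. To keep the example runnable without a real export, it first writes a tiny .sav file with haven; the variable org_type and its labels are made up. With a real export you would only need the library() and read_spss() lines, pointing read_spss() at your own file.

```r
library(haven)

# Stand-in for a Qualtrics SPSS export: write a small .sav file first so the
# example runs on its own (the variable and its labels are made up).
sav_file <- tempfile(fileext = ".sav")
demo <- data.frame(org_type = c(1, 2, 1))
demo$org_type <- labelled(demo$org_type, labels = c(Private = 1, Public = 2))
write_sav(demo, sav_file)

# Steps 2 and 3 from the list: with haven loaded, read the SPSS file.
mydata <- read_spss(sav_file)

# Value labels come along with the data; as_factor() turns them into a factor.
mydata$org_type <- as_factor(mydata$org_type)
table(mydata$org_type)
```

A side benefit of the SPSS route is that the value labels travel with the data, so there is no second header line to deal with.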

Barplots across variables in R

Here’s a good example of how useful sapply can be. I have some data from Qualtrics, and each response is coded in its own variable. Let’s say there is a question on what kind of organization respondents work in, with 10 response categories. Qualtrics produces 10 variables, each with 1 if the box was ticked, and empty otherwise (structure shown just below). With the default CSV import, these blank cells are turned into NA. Here’s a simple way to produce a barplot in this case (in R, of course).
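
To make that structure concrete, here is a minimal sketch with made-up data, cut down to three of the ten tick-box variables. It shows how the blank cells end up as NA after a default CSV import.

```r
# Made-up extract: one row per respondent, one column per tick box,
# 1 if ticked and blank otherwise (only three of the ten columns shown).
csv_text <- c(
  "Q1_1,Q1_2,Q1_3",
  "1,,",
  ",1,",
  "1,,1"
)
mydata <- read.csv(text = csv_text)
mydata
```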


types = sapply(1:10, function(i) sum(get(paste("Q1_",i,sep="")), na.rm=TRUE))

Let’s take this step by step. To count frequencies, we simply use sum(), with the argument na.rm=TRUE because the variables only contain 1 and NA. get() is used to find the variable specified by a string; the string is created with paste(). In this case, the variable names are Q1_1, Q1_2, Q1_3, … Q1_9, Q1_10. By using paste(), we combine the “Q1_” part with the counter variable i, with no separation (sep="").

The whole thing is then wrapped up in sapply(), with the counter variable i defined to take values from 1 to 10; the function(i) part defines an anonymous function that takes the counter as its argument. So sapply() passes each value of the counter to the function we specified, which calculates the sum for one variable Q1_i at a time, and collects the ten results in a vector.
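
One thing to be aware of: get() only finds Q1_1 and friends if they exist as objects R can see, for example after attach(). Here is a sketch of the same idea that looks the columns up inside a data frame instead. The data frame mydata and its values are made up, and there are only three tick boxes to keep it short.

```r
# Made-up data with the same layout, reduced to three tick boxes.
mydata <- data.frame(
  Q1_1 = c(1, NA, 1, 1),
  Q1_2 = c(NA, 1, NA, NA),
  Q1_3 = c(NA, NA, 1, NA)
)

# Same logic as above, but looking the columns up in the data frame
# by name instead of calling get().
types <- sapply(1:3, function(i) sum(mydata[[paste0("Q1_", i)]], na.rm = TRUE))
types

# Equivalent shortcut: column sums over the selected variables.
types2 <- colSums(mydata[paste0("Q1_", 1:3)], na.rm = TRUE)
types2
```

The colSums() version returns the same counts and keeps the variable names, which can be handy for labelling later.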

Now I can simply do a barplot, and add the names.arg argument to specify the labels.

(Here I specified the colours: barplot(types, col=rainbow(10)) to have a catchy image at the top of this post, albeit one where colours have no meaning: so-called chart-junk).
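
A minimal sketch of the labelled barplot, with made-up counts and category labels. It draws into a temporary PDF so that it also runs where no screen device is available; in an interactive session you can leave out the pdf() and dev.off() lines.

```r
# Made-up counts and labels for three categories.
types <- c(3, 1, 1)
org_labels <- c("Private", "Public", "Non-profit")

# Draw into a temporary PDF so the sketch also runs without a screen device.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
mids <- barplot(types, names.arg = org_labels, ylab = "Respondents")
dev.off()
```

barplot() returns the midpoints of the bars, stored here in mids, which is useful if you later want to add text above the bars.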

Mousetracking in Qualtrics

Qualtrics is a widely used web service for surveys. It’s got plenty of useful features, one of which is the ability to include JavaScript. Jackson Walters has very kindly put up full instructions on how this can be used to track the respondents’ mouse in a particular question, following up a post on Stack Overflow. If you have ever looked for mouse tracking in a self-administered web survey, look no further (assuming that you or your institute has a subscription with Qualtrics, of course).