readODS and column specifications

The R package readODS allows you to import ODS spreadsheets to R. It’s slow, but it works. In an attempt to speed up things, I thought providing column types would help. I didn’t find an improvement, but I noticed that the documentation wasn’t really clear (“refer to readr::type_convert to specify cols specification”). It seems like you do need to refer to type_convert to understand how to specify column types, and then feed them to readODS like this:

col_types=cols(VAR1 = "f", VAR2 ="i")

etc.

So an entire call would be:

data = read_ods("spreadsheet.ods", col_names=TRUE, col_types=cols(VAR1 = "f", VAR2 = "i", VAR3= "-"))

Note: I had to explicitly use library(readr) before calling read_ods(), otherwise the cols() function was not available.

Unfortunately, the short approach using col_types=as.col_spec(“fi-“) does not seem to work.

Error: `quantile.haven_labelled()` not implemented.

Today I got an Error: `quantile.haven_labelled()` not implemented on a database imported into R via library(haven) when trying to see the results of a simple linear regression model. What I needed was the zap_labels() function, which strips the value labels (and user-defined NA). Then I run the model on the new dataset, and all was good.

dataset2 = zap_labels(dataset)

Getting Qualtrics data into R

Data collected in Qualtrics come in a funny way when exported to CSV: the first two lines are headers. Simply using read.csv() will mess things up, because typically we only have one line as header. We can skip empty lines at the beginning, but there is no immediately obvious way to skip only the second line.

Of course there is an R package for that, but when I tried, the qualtRics package was very slow:

devtools::install_github("ropensci/qualtRics")
library(qualtRics)
raw_data <- readSurvey("qualtrics_survey.csv")

library(qualtRics)
raw_data <- readSurvey("qualtrics_survey_legacy.csv", legacyFormat=T) # if two rows at the top

As an alternative, you could import just the header of your survey, and then join it to an import where you skip the header lines. Actually, here’s a better way of doing just this:

everything = readLines("qualtrics_survey_legacy.csv")
wanted = everything[-2]
mydata = read.csv(textConnection(wanted), header = TRUE, stringsAsFactors = FALSE)

If you get an error “EOF within quoted string”, don’t ignore it: It indicates problems with double quoting, so add quote = "" to your import code.

If you are willing to violate the principle of not touching the raw data file, you could open the survey in a spreadsheet like Excel or LibreOffice Calc and delete the unwanted rows.

Given all these options, I found the most reliable way (as in: contrary to the above, it hasn’t failed me so far) to get Qualtrics data into R yet another one:
1. export as SPSS (rather than CSV)
2. use library(haven)
3. read_spss()

Overcoming Warnings when Importing SPSS Files to R

R can import SPSS files quite easily, using the package foreign and the read.spss command. It usually works quite well out of the box, so well that I usually choose the SPSS file when downloading secondary data (hint: look at the argument use.value.labels depending on how you want your data).

Sometimes R isn’t so happy, throwing warnings like “Unrecognized record type 7, subtype 18 encountered in system file”. Generally warnings in R are there for a reason. Usually these seem to be variable and data attributes in SPSS, but to be sure, simply convert the SPSS file into SPSS Portable (*.por rather than *.sav). Don’t have SPSS? Enter PSPP , a free (open source) program that can help you out! (for Windows, check directly on this site).

pspp-saveas-por PSPP can open SPSS files faster than SPSS, and under File > Save as... there’s the option to save as a Portable file (rather than the default System File) at the bottom left of the dialog. If you import this (portable) SPSS file to R, there should be no errors or warnings.