Getting Qualtrics data into R

Data collected in Qualtrics come in a funny way when exported to CSV: the first two lines are headers. Simply using read.csv() will mess things up, because typically we only have one line as header. We can skip empty lines at the beginning, but there is no immediately obvious way to skip only the second line.

Of course there is an R package for that, but when I tried, the qualtRics package was very slow:

devtools::install_github("ropensci/qualtRics")
library(qualtRics)
raw_data <- readSurvey("qualtrics_survey.csv")

library(qualtRics)
raw_data <- readSurvey("qualtrics_survey_legacy.csv", legacyFormat=T) # if two rows at the top

As an alternative, you could import just the header of your survey, and then join it to an import where you skip the header lines. Actually, here’s a better way of doing just this:

everything = readLines("qualtrics_survey_legacy.csv")
wanted = everything[-2]
mydata = read.csv(textConnection(wanted), header = TRUE, stringsAsFactors = FALSE)

If you get an error “EOF within quoted string”, don’t ignore it: It indicates problems with double quoting, so add quote = "" to your import code.

If you are willing to violate the principle of not touching the raw data file, you could open the survey in a spreadsheet like Excel or LibreOffice Calc and delete the unwanted rows.

Given all these options, I found the most reliable way (as in: contrary to the above, it hasn’t failed me so far) to get Qualtrics data into R yet another one:
1. export as SPSS (rather than CSV)
2. use library(haven)
3. read_spss()

Counting Articles in Nexis Uni

We wanted to count the number of articles on a number of keywords that were published in specific newspapers. This is a measure of salience. So we have a list of keywords (e.g. Boris Johnson, Jeremy Corbyn, Jo Swinson, Caroline Lucas), a date range (e.g. 1 January 2017 to 31 December 2017), and Nexis Uni to get the articles. In this case, we were not interested in the contents of the articles, so I downloaded the meta data (headline, publication, an empty “summary” column, and the date of the publication). Nexis Uni gives me a spreadsheet per keyword. With some 30 keywords, I did not want a manual approach.

What we wanted in the end was a spreadsheet, with all dates in the date range and the number of articles for each keyword as a column. Here’s how I did this in R:

sheets = list.files(pattern="*.XLSX")

This gives me a list of all the XLSX files in the folder. I use library(readxl) because it imports the dates properly, unlike some of the other options to open XLSX files in R.

library(readxl)

for (i in 1:length(sheets)) {
assign(paste("N", i, sep=""), read_excel(sheets[i]))
}

Here I chose a loop to read in the data from the Nexis spreadsheets, each into a separate container. I guess I could have used a list or something, but for this project speed was no concern.

Next, we need a list of dates in the date range. The seq() already works, but to make this match the date format from the Excel documents, I need to wrap everything in the as.POSIXlt.

dat = as.POSIXlt(seq(from=as.Date("2017-01-01"), to=as.Date("2017-12-31"), by=1))

Now comes the actual work. First I create a vector with the dates of the articles in the Nexis spreadsheet for each keyword. Some dates in the date range have no articles, others have one, others still have more than one. I then use sapply() to match the dates, and colSums() to count the number of articles for a given day.

for (i in 1:length(sheets)) {
assign(paste("D", i, sep=""), eval(parse(text=paste("N", i, "$Date", sep=""))))
assign(paste("S", i, sep=""), eval(parse(text=paste("sapply(dat, function(x) D", i, "== x)", sep=""))))
assign(paste("H", i, sep=""), eval(parse(text=paste("colSums(S", i, ")", sep=""))))
}

At this stage I regret not using a list or data frame, but this code combines the different variables:

hits = data.frame(dat)
for (i in 1:length(sheets)) {
hits = cbind(hits, eval(parse(text=paste("H", i, sep=""))))
}

And then I add the names of the sheets and I have a data frame I can export with write.csv() or whatever.

colnames(hits) = c("date", sheets)

LaTeX to Word

Ever needed to convert a LaTeX document to Word, like to submit it to a social sciences journal insisting on MS Word format? There are several options out there, including using Adobe Reader to save the PDF as a Word document. In my experience, the best results can be obtained when using MS Word to open the PDF document (yes, MS word can open PDF documents!). Obviously you’ll have to check everything carefully, but recent version of Word even seem to handle most equations right.

Alternatively, write in Pandoc Markdown to start with (or Sciflow if you need online collaborative writing), and you can create beautiful PDF as well as Word document, whatever you need.

Use MS Word to Convert PDF Files

Recently we had to convert a PDF file to MS Word so that we could benefit from the Track Changes feature in MS Word. The proofreader did not want to use the commenting tools in Adobe/Acrobat because he found them inefficient to propose changes in the text. (Yes, he could make direct changes, but it takes much more time.) We had a LaTeX source file and faced the common challenge of turning this into a Word file. I remembered that Adobe/Acrobat can export PDF to Word files, but as I have experienced many times, the output did not satisfy me at all. I also tried pandoc, but it turned out that we used bits of LaTeX pandoc cannot (yet) handle. When checking the output, I discovered that Word can open PDF files. We quite liked the output and had to tidy up only a few bits and pieces to have an acceptable Word file.

We could have avoided this challenge by using markdown and pandoc to start with… my usual approach these days.