Factorizing Error (in Zelig)

Today I re-ran some code in R and was greeted with the error “Error in factorize(formula, data, f_out = TRUE)” and, more specifically, “Unable to find variable to convert to factor.” I immediately suspected the as.factor(x) among the predictor variables to be the culprit, but since this analysis had worked a few days earlier (on a different machine), I quickly searched the web and found nothing. For some reason, in this case the as.factor(x) inside the formula did not work, and the solution was simple: create a new (factor) variable separately and then run the regression analysis with the new variable. So instead of:

z <- zelig(y ~ x1 + as.factor(x2) + x3, model="normal", data=d)

I first create the new variable:

d$x2_factor <- as.factor(d$x2)

and then run the regression analysis with the new variable:

z <- zelig(y ~ x1 + x2_factor + x3, model="normal", data=d)
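Here is a minimal sketch of the workaround on made-up data. It uses lm() from base R as a stand-in for zelig(), so it runs without Zelig installed; the data frame and its values are assumptions for illustration only.

```r
# Toy data; all names and values are made up for illustration.
set.seed(1)
d <- data.frame(
  y  = rnorm(30),
  x1 = rnorm(30),
  x2 = rep(c(1, 2, 3), times = 10),
  x3 = rnorm(30)
)

# Convert the grouping variable to a factor outside the formula.
d$x2_factor <- as.factor(d$x2)

# Confirm the conversion before fitting.
is.factor(d$x2_factor)
levels(d$x2_factor)

# lm() stands in for zelig() here so the sketch runs in base R.
m <- lm(y ~ x1 + x2_factor + x3, data = d)
names(coef(m))
```

Checking is.factor() before fitting makes it easy to see whether the conversion itself or the model call is the problem.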

I just thought I’d share this in case someone else comes across this error and doesn’t find it obvious what the solution is.

Attitudes of the Moderate Left to Immigrants

I have just uploaded a “pre-print” to SocArXiv with a very simple description of how the moderate left in Switzerland (partisans of the Socialists and the Greens) see immigration. The summary is available here: SocArXiv. It’s a pre-print in quotation marks, because it’s clearly not something I have submitted (or will submit) to a proper journal in this form. I’ve put it online, though, to make it available (including the PSPP code to get the results; yes, PSPP). You will find descriptive statistics (cross-tabulations) from the 2015 Swiss Electoral Study (SELECTS), and see that the moderate left is more open to foreigners than other respondents on all measures of attitudes and for all definitions of the moderate left used (probably not surprisingly).

Set Encoding When Importing CSV Data in R Under Windows

This is another simple thing I keep on looking up: how to set the encoding when importing CSV data in R under Windows. I need this when my data file is in UTF-8 (pretty standard these days), but I’m using R under Windows; or when I have a Windows-encoded file when using R elsewhere. The default encoding in Windows is not UTF-8, and R uses the default encoding — well, by default. Typically this is not an issue unless my data file contains accented characters in strings, which can lead to garbled text when the wrong encoding is set/assumed.

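The solution is quite simple: add fileEncoding="" to the read.csv() command to tell R which encoding the file is in, like this:

x <- read.csv("datafile.csv", fileEncoding="Windows-1252")

or like this:

x <- read.csv("datafile.csv", fileEncoding="UTF-8")

The similarly named encoding="" argument does not convert the file; it only marks the imported strings as Latin-1 or UTF-8. So encoding="UTF-8" can also work for a UTF-8 file under Windows, but encoding="Windows-1252" has no effect.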
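To check that an import decodes accented characters correctly, a small round trip can help. The sketch below builds a Windows-1252 file from made-up data (the column names and values are assumptions, not from my data), reads it back with the fileEncoding argument of read.csv(), and compares the result with the expected text. It assumes an R session running in a UTF-8 locale.

```r
# Made-up CSV content with accented characters, written with \u escapes
# so the script itself stays plain ASCII.
txt <- "city,n\nZ\u00fcrich,1\nGen\u00e8ve,2\n"

# Convert the text to Windows-1252 bytes and write them to a temporary file.
bytes <- iconv(txt, from = "UTF-8", to = "Windows-1252", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(bytes, tmp)

# Tell read.csv() which encoding the file is in; R converts it on import.
x <- read.csv(tmp, fileEncoding = "Windows-1252", stringsAsFactors = FALSE)

# Compare the imported strings with the expected text.
x$city == c("Z\u00fcrich", "Gen\u00e8ve")
```

If the comparison comes back FALSE, the declared encoding probably does not match the file.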

Pre-Registration: A Reasonable Approach

We probably all know that pre-registration of experiments is a good thing. It’s a real solution to what is increasingly called ‘p-hacking’: doing analyses until you find a statistically significant association (which you then report).

One problem is that most pre-registration protocols are pretty complicated, and as researchers in the social sciences we usually have neither the inclination nor the incentives to follow complicated protocols typically designed for biomedical experiments. A probably more reasonable approach is AsPredicted: it asks 9 simple and straightforward questions, and the pre-registration remains private until it is made public (but can be shared with reviewers).