Interpolated Median in R

In survey data, distributions are frequently discrete and non-normal. The mean may thus be inappropriate for summarizing the central tendency, and the median too coarse because it is constrained to the actual categories in the data. On a five-point scale, the median can only fall on one of these categories, and thus does not reflect smaller shifts in the distribution. The interpolated median adjusts the median position to reflect such shifts.

The concept of interpolated medians is nicely described at (at the bottom), or at (with a nice graph).

In R, the interpolated median can easily be calculated with the package psych, using the interp.median function.
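As a rough illustration of the idea, the sketch below computes an interpolated median by hand in base R. It uses the common grouped-data formula, in which each category is treated as an interval of width w centred on its value. The function name interp_median_sketch is made up for this example and is not the code inside psych; the commented-out last line shows how the package function would be called, assuming psych is installed.

```r
# Hand-rolled sketch of the interpolated median (hypothetical helper,
# not the psych implementation).
interp_median_sketch <- function(x, w = 1) {
  x <- x[!is.na(x)]          # drop missing values
  n <- length(x)
  m <- median(x)             # ordinary median
  below <- sum(x < m)        # observations below the median category
  at <- sum(x == m)          # observations in the median category
  if (at == 0) return(m)     # median falls between categories: nothing to interpolate
  # lower bound of the median category plus the share of that
  # category needed to reach the n/2-th observation
  (m - w / 2) + w * (n / 2 - below) / at
}

symmetric <- c(1, 2, 3, 3, 3, 4, 5)
skewed    <- c(1, 2, 3, 3, 3, 4, 4, 5, 5)

median(symmetric)                # 3
median(skewed)                   # 3
interp_median_sketch(symmetric)  # 3
interp_median_sketch(skewed)     # about 3.33

# With the package (assumes psych is installed):
# psych::interp.median(skewed, w = 1)
```

Both vectors have the same ordinary median, but the interpolated median of the second one is pulled upwards by the extra high values. For a scale whose categories are 0.5 apart, w would presumably be set to 0.5.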

To illustrate the difference, here is a simple plot using data from the SOM project. I use the positional variable (all actors in Austria combined) by year. It is immediately apparent that the three ways of expressing central tendency differ. The median position (in blue) is clearly constrained by the categories available (in this case -1, -0.5, 0, 0.5, 1). With its corrections, the interpolated median offers much more detail on changes over time.

OK, not a very pretty graph, but it does illustrate the point.

scatterplot() with scales

Today I spent quite some time trying to figure out why I couldn’t use the scatterplot function (from the package car) for one specific variable, while it worked for every other variable. I got stuck at the error “Error in if (transform != FALSE | length(transform) == ncol(x)) { : argument is of length zero.”

It was only when I used str on the variables to examine their structure that I realized that the scatterplot function does not work with the output of scale. Normally I could use scatterplot(y~x | country). In this particular case, I had created the new variable with x <- scale(x1) + scale(x2) +.... scatterplot(y~x1 | country) worked perfectly, as did scatterplot(y~x2 | country), but the scaled variable did not. It turns out that scale does not return a plain vector: it returns a matrix that carries additional information (the centring and scaling values, stored as attributes), which breaks the scatterplot function. Once I knew this, the solution seemed obvious: strip this additional information with as.numeric: scatterplot(y~as.numeric(x) | country).
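To see what str reveals here, the following small sketch uses made-up numbers (x1 and x2 are invented for the example, not the original data) and base R only, so it does not call scatterplot itself.

```r
# Made-up example data; the real variables are not shown in this post.
x1 <- c(2, 4, 6, 8)
x2 <- c(1, 3, 2, 5)

x <- scale(x1) + scale(x2)

str(x)                # a one-column matrix with extra attributes
is.matrix(x)          # TRUE
names(attributes(x))  # includes "dim", "scaled:center", "scaled:scale"

# as.numeric() drops the dimensions and the attributes,
# leaving a plain numeric vector.
x_plain <- as.numeric(x)
is.matrix(x_plain)            # FALSE
is.null(attributes(x_plain))  # TRUE
```

The values themselves are unchanged by as.numeric; only the matrix structure and the stored centring and scaling values are removed.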

Wordscores/JFreq with Long Manifestos

Today I ran into an unexpected error when using Wordscores in R. I used JFreq 0.5.4 to calculate the word frequencies from 35 parties with rather long party manifestos. This resulted in a 3.4M CSV file with 42462 columns. R would throw an error regarding read.table when I called the wfm function from Austin (0.2) to import the word frequencies: “Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names”. Well, the file seems too wide to open.

The solution I found was to use the old JFreq 0.2.5, which produces the output the other way around (rows/columns switched). Even if it is a bit slower than the newer JFreq, having a rather long (as opposed to wide) CSV with the word frequencies does not seem to pose problems.
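I did not establish the exact cause of the error, but one way to inspect a file like this is to count the fields on each line with count.fields from base R. The sketch below writes a tiny made-up CSV to a temporary file (the content is invented for the example and is not JFreq output) and compares the number of fields in the header with the number in the data rows.

```r
# Tiny invented CSV: the header has fewer fields than the data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("word_a,word_b",
             "doc1,3,0",
             "doc2,1,2"), path)

# Number of comma-separated fields on each line of the file.
fields <- count.fields(path, sep = ",")
fields         # 2 3 3
table(fields)  # how many lines have each field count
```

With a real file, lines whose field count differs from the rest point to where the header and the data disagree. Words containing quote characters can confuse the count, so the quote argument may need adjusting.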

Moving Averages in R

To the best of my knowledge, R does not have a built-in function to calculate moving averages. Using the filter function, however, we can write a short function for moving averages:

mav <- function(x,n=5){stats::filter(x,rep(1/n,n), sides=2)}

We can then use the function on any data: mav(data), or mav(data,11) if we want to specify a different number of data points than the default 5; plotting works as expected: plot(mav(data)).

In addition to the number of data points over which to average, we can also change the sides argument of the filter function: sides=2 centres the window on each point and uses values on both sides, sides=1 uses the current and past values only.
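A small sketch of the difference, using a variant of the function above with sides exposed as an argument (that extra argument is my addition for this example):

```r
# Variant of mav with the sides argument passed through to stats::filter.
mav <- function(x, n = 5, sides = 2) {
  stats::filter(x, rep(1 / n, n), sides = sides)
}

x <- 1:10

# Centred window of 3: the first and last values cannot be computed.
centred <- as.numeric(mav(x, 3))
centred   # NA 2 3 4 5 6 7 8 9 NA

# Trailing window of 3 (current and two past values):
# the first two values cannot be computed.
trailing <- as.numeric(mav(x, 3, sides = 1))
trailing  # NA NA 2 3 4 5 6 7 8 9
```

Either way the result has the same length as the input, with NA wherever the window runs off the end of the data.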

No Syntax Highlighting for R in Notepad++

Notepad++ is a versatile text editor for Windows. Unfortunately, in the current version some themes do not offer syntax highlighting for R scripts. This can be fixed relatively easily by adding the relevant entry from a theme that does include it. In Windows > Program Files > Notepad++, open the theme that does not work (e.g. vim Dark Blue.xml) and one that does work (e.g. stylers.model.xml). Simply copy over the entry for R, and change the colours to match those of your theme.

You might have to start Notepad++ as administrator (right-click on the program entry in the start menu). Let’s hope the next release has this bug fixed…