Updated PLZ – Cantons Tools for R

Thanks to Eva Van Belle, who pointed out issues with Appenzell postcodes, I’m happy to announce an update to the postcode-to-canton conversion script for R. It’s essentially a database of Swiss postcodes (PLZ) and the canton each one is in. For 16 postcodes only a probabilistic assignment is possible, and this is handled by siding with the (typically much) larger municipality.

Convert Swiss postcodes to cantons: https://gist.github.com/druedin/6690720

Two simple helper functions to go with: https://gist.github.com/druedin/8758265
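
To illustrate the idea, here is a minimal sketch of such a lookup. It is not the code from the gist: the four-row table and the names plz_table and plz_to_canton are made up for illustration, and the real script covers all Swiss postcodes.

```r
# Hypothetical miniature lookup table (illustration only)
plz_table <- data.frame(
  plz    = c(1201, 3011, 4051, 8001),
  canton = c("GE", "BE", "BS", "ZH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the canton for each postcode, NA if unknown
plz_to_canton <- function(plz) {
  plz_table$canton[match(plz, plz_table$plz)]
}

plz_to_canton(c(8001, 3011, 9999))
```

Postcodes that are not in the table come back as NA, which makes it easy to spot entries that need a closer look.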

readODS and column specifications

The R package readODS allows you to import ODS spreadsheets into R. It’s slow, but it works. In an attempt to speed things up, I thought providing column types would help. I didn’t find an improvement, but I noticed that the documentation wasn’t really clear (“refer to readr::type_convert to specify cols specification”). It seems you do need to refer to type_convert to understand how to specify column types, and then feed them to read_ods() like this:

col_types = cols(VAR1 = "f", VAR2 = "i")


So an entire call would be:

data = read_ods("spreadsheet.ods", col_names = TRUE, col_types = cols(VAR1 = "f", VAR2 = "i", VAR3 = "-"))

Note: I had to explicitly use library(readr) before calling read_ods(), otherwise the cols() function was not available.

Unfortunately, the short approach using col_types=as.col_spec("fi-") does not seem to work.
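
To see what the one-letter codes stand for, here is a small sketch that uses readr directly. It is only an illustration under my own assumptions: it converts an in-memory data frame with type_convert() rather than reading an ODS file, and VAR1 and VAR2 are placeholder names.

```r
library(readr)

# Placeholder data: all columns start out as character
raw <- data.frame(
  VAR1 = c("a", "b", "a"),
  VAR2 = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Abbreviated specification: "f" = factor, "i" = integer ("-" would skip a column)
spec_short <- cols(VAR1 = "f", VAR2 = "i")

# The same specification spelled out with the col_*() functions
spec_long <- cols(VAR1 = col_factor(), VAR2 = col_integer())

converted <- type_convert(raw, col_types = spec_short)
str(converted)
```

The spelled-out form is more verbose but easier to read back later, and either form should be usable wherever a cols() specification is expected.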

Error: `quantile.haven_labelled()` not implemented.

Today I got the error `quantile.haven_labelled()` not implemented when trying to see the results of a simple linear regression model on a dataset imported into R via library(haven). What I needed was the zap_labels() function, which strips the value labels (and user-defined NA). Then I ran the model on the new dataset, and all was good.

dataset2 = zap_labels(dataset)
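
A minimal, self-contained sketch of the same fix, using a small made-up dataset. The variable names and values are invented for illustration, and labelled() is used here only to mimic what an imported file might contain.

```r
library(haven)

# Made-up data: x carries value labels, as a variable imported via haven might
dataset <- data.frame(y = c(2.1, 3.9, 6.2, 7.8, 10.1))
dataset$x <- labelled(c(1, 2, 3, 4, 5), labels = c(low = 1, high = 5))

# Strip the value labels from all labelled columns
dataset2 <- zap_labels(dataset)

# x is now a plain numeric vector, so the usual model summary works
model <- lm(y ~ x, data = dataset2)
summary(model)
```

If the labels carry information you still need, it may be safer to keep the original dataset around and use the stripped copy only for modelling.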

Visualize correlations in R

It is rare that a graphic is not better than a table at helping us understand our quantitative results. A simple yet common kind of table we stare at every so often is the table of correlation coefficients: how strongly do different variables correlate with one another? We scan these tables for numbers close to +1 and close to -1, but there’s a better way: visualize!

The R package corrplot offers a ready-made solution:

library(corrplot)
dat = matrix(c(0.11128257, -0.38968561, 0.11765272, -0.07089879, -0.19715366, -0.48083950, 0.54760745, -0.49410370, -0.42443391), nrow=3)
corrplot(dat)

Here we load the corrplot package and create some data so that we have something to plot; normally this would be a selection of variables. Then we simply call corrplot() and we’re done.

There are many ways to tweak the plots, but in all versions we get a quicker and better overview of the variables that correlate than we would by staring at a large table.

Here are some variants of the above:

corrplot(dat, method = "shade")
corrplot(dat, diag=FALSE)
corrplot(dat, method = "square")
corrplot(dat, method = "number")
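
With real data, the matrix would typically come from cor() applied to a selection of variables. A minimal sketch, assuming the built-in mtcars data and three of its columns (my choice for illustration, not part of the example above):

```r
library(corrplot)

# Select a few numeric variables and compute their correlation matrix
vars <- mtcars[, c("mpg", "hp", "wt")]
M <- cor(vars)

# Plot the matrix; diag = FALSE hides the uninformative diagonal of 1s
corrplot(M, diag = FALSE)
```

Unlike the toy matrix above, a matrix produced by cor() is symmetric, so each pair of variables appears twice in the plot.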

rtweet with Premium to search the archive

The R package rtweet does a great job of connecting R to Twitter. If you need tweets older than the past 7 days, Twitter offers two additional APIs (with different syntax).

If you access Twitter archives with rtweet and have a Premium subscription on Twitter, the current version of rtweet sends requests in batches of n=100, but Premium (currently) allows batches of up to n=500. This means you use 5 requests where 1 would suffice. Kevin Taylor has provided a fix for this, which he also mentioned in the issues of rtweet. Using the fix is easy (much easier than the description in the issues thread suggests):


This will replace any installed version of rtweet. You probably want this version if you’re on Twitter Premium; for the free Sandbox, n=100 is correct. Perhaps this is why rtweet has not implemented the fix yet?
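
To see what the batch size means for the number of requests, here is a back-of-the-envelope calculation. The helper requests_needed() is hypothetical and the archive size of 10,000 tweets is an arbitrary assumption.

```r
# Hypothetical helper: number of API requests needed for a given batch size
requests_needed <- function(n_tweets, batch_size) {
  ceiling(n_tweets / batch_size)
}

requests_needed(10000, 100)  # batches of 100
requests_needed(10000, 500)  # batches of 500
```

For 10,000 tweets this gives 100 requests with batches of 100 and 20 requests with batches of 500, the same factor of 5 as above.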

Image credit: CC-by-nc by diarnst